Web Indicator Reports
Here are the steps necessary to collect web data and calculate a range of indicators for a collection of publications, including the Mean Normalised Log-transformed Citation Score (MNLCS) and the Normalised Proportion Cited (NPC).
- Step 1: Identify the group of publications to be assessed and categorise them by field (e.g., using Scopus or WoS subject categories).
- Step 2: Save the article information (authors, title, journal, publication year) in a standard tab-delimited format, with a separate file for each subject category/year combination. First, discard publications in small subject/year combinations (e.g., <100 publications). Then create one tab-delimited file for each remaining subject/year combination, with one line per publication. Each line should contain the author names in a standard format (ideally following the Scopus or Web of Science format), the publication year, the article title and the journal name (omit the journal name for books). The first line of the file should contain header information. Here is an example of the format for journal articles and for books. If your data is in a spreadsheet, it can be saved in this format using the Save As command and selecting the Plain text (tab delimited) format. If the records are in Scopus or the Web of Science, choose the tab-delimited format when saving them. The filename for each file must contain the subject name and year, and end with -[group].txt, where [group] is replaced by a name for the collection of articles. Use the same [group] name for all files containing publications from the same group.
- Step 3: For each retained subject/year combination, a benchmarking sample of articles from the rest of the world is needed. For this world reference set, download all articles in the Scopus/WoS field/year (if possible) or a large balanced sample (e.g., the first and last 5000 articles published in the category). Filter out any large trade or art journals with a high proportion of uncited articles. Name the files using the standard Webometric Analyst naming convention so that each filename contains the subject name and year, and ends with -world.txt. These filenames must exactly match the group filenames, except for replacing -[group].txt with -world.txt. All of the files should be stored within a single folder that does not contain any other files.
- Here is a small artificial example of a complete set of publication data files in structured name format, with all publications in a single file being from the same field and year, and each group file corresponding to a world file.
- Step 4: Decide which alternative indicators are to be used for the data. Webometric Analyst supports Wikipedia citations, PowerPoint citations, syllabus mentions, grey literature citations, Google Patent citations and general web citations. Start Webometric Analyst, close the Startup Wizard and select the appropriate indicator type from the Make Searches menu. Use this to create a file of searches for each of your original data files.
- Here is an example of the files created for Wikipedia citations.
- Step 5: Since Bing API searches need to be paid for after the first free 1,000, unless you have a budget, the next stage is to generate a random sample of articles from the world and group sets (e.g., 500 per set) and use these samples instead of the full set. For this, from the Make Searches menu, select the Replace search files with a random sample up to a maximum number menu option and instruct Webometric Analyst to replace all the search files with random samples of 500.
- Step 6: Use Webometric Analyst to run all the searches. For this, start Webometric Analyst, open the Wizard by selecting Link Analysis Wizard from the File menu, enter your search key, click OK, click the Run All Searches In File button and select one of the search files. Wait for Webometric Analyst to finish, then click the same button again and select another file. Repeat this until all the files have been run. The picture below shows some of the files generated for PowerPoint searches, together with two additional files created in Step 7. Example files for Wikipedia searches.
- Step 7: Use Webometric Analyst to calculate MNLCS and EMNPC, with confidence limits for both. For this, start Webometric Analyst, close the Startup Wizard and then select Calculate MNLCS, gMNCS and NPC for a set of web searches (structured file names) from the Reports menu. When requested, select the folder containing all of the files. This will create two new files. The file called all_data.txt contains all of the data extracted from the searches in a format that can be loaded into a stats package or spreadsheet. This is a backup file in case you want to calculate your own indicators. The file called report.txt contains MNLCS and EMNPC values for each individual file in a long list at the top. Near the end of the file, it then reports tables of the combined MNLCS and EMNPC values for the whole collection. This is the main part of the results. [See below for a sample output]
- Step 8: If you want MNLCS and EMNPC calculated separately for each year, then create new folders, one for each year, and copy all the files from each year into the relevant year folder. Repeat the step above for each year folder.
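Before running the searches, it can help to confirm that the folder follows the naming convention from Steps 2 and 3. The sketch below is not part of Webometric Analyst; the function name check_file_pairs is hypothetical, and it assumes that the group name is whatever follows the last hyphen in the filename. It lists group files with no matching -world.txt file, and world files with no group file.

```python
from pathlib import Path

WORLD_SUFFIX = "-world.txt"  # naming convention described in Step 3


def check_file_pairs(folder):
    """Return (group files lacking a world file, world files lacking a group file)."""
    names = sorted(p.name for p in Path(folder).iterdir() if p.is_file())
    world_prefixes = {n[: -len(WORLD_SUFFIX)] for n in names if n.endswith(WORLD_SUFFIX)}

    groups_by_prefix = {}
    for name in names:
        if name.endswith(WORLD_SUFFIX) or not name.endswith(".txt"):
            continue
        # Assumption: the group name is whatever follows the last hyphen.
        prefix, sep, _group = name[: -len(".txt")].rpartition("-")
        if sep:
            groups_by_prefix.setdefault(prefix, []).append(name)

    missing_world = sorted(
        name
        for prefix, files in groups_by_prefix.items()
        if prefix not in world_prefixes
        for name in files
    )
    unused_world = sorted(
        prefix + WORLD_SUFFIX for prefix in world_prefixes if prefix not in groups_by_prefix
    )
    return missing_world, unused_world
```

For example, a folder holding the hypothetical files "Chemistry 2012-Spain.txt" and "Chemistry 2012-world.txt" would return two empty lists, whereas a group file without its world counterpart would appear in the first list.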
Sample report
Source of web search results: C:\Users\Public\Documents\data\Wikipedia citations structured names
Total number of world files (e.g., one per field and year): 2

World File: Biochemistry Molecular Biology Alcohol 2012 world_wiki
Queries : 498
Arithmetic mean (unique URLs) : 0.026104
Arithmetic mean (unique domains) : 0.024096
Arithmetic mean (unique sites) : 0.020080
Arithmetic mean (unique STLDs) : 0.020080
Arithmetic mean (unique TLDs) : 0.020080
Geometric mean (95%CI) of unique URLs : 0.016255 (0.005730, 0.026891)
Geometric mean (95%CI) of unique domains : 0.015668 (0.005704, 0.025732)
Geometric mean (95%CI) of unique sites : 0.014016 (0.005297, 0.022810)
Geometric mean (95%CI) of unique STLDs : 0.014016 (0.005297, 0.022810)
Geometric mean (95%CI) of unique TLDs : 0.014016 (0.005297, 0.022810)
Mean (95%CI) of log(1+unique URLs) : 0.016125 (0.005714, 0.026536)
Mean (95%CI) of log(1+unique domains) : 0.015547 (0.005687, 0.025407)
Mean (95%CI) of log(1+unique sites) : 0.013919 (0.005283, 0.022554)
Mean (95%CI) of log(1+unique STLDs) : 0.013919 (0.005283, 0.022554)
Mean (95%CI) of log(1+unique TLDs) : 0.013919 (0.005283, 0.022554)
Proportion non-zero (95%CI) : 0.020080 (0.010943, 0.036565)
MNLCS - mean (95%CI) of world normalised log(1+unique URLs) [Population version]: 1.000000 (0.354343, 1.645657)
MNLCS - mean (95%CI) of world normalised log (1_unique URLs) [Sample version]: 1.000000 (0.321747, 3.108037)
MNLCS - mean (95%CI) of world normalised log(1+unique domains) [Population version]: 1.000000 (0.365825, 1.634175)
MNLCS - mean (95%CI) of world normalised log (1_unique domains) [Sample version]: 1.000000 (0.331823, 3.013657)
MNLCS - mean (95%CI) of world normalised log(1+unique sites) [Population version]: 1.000000 (0.379564, 1.620436)
MNLCS - mean (95%CI) of world normalised log (1_unique sites) [Sample version]: 1.000000 (0.343900, 2.907818)
MNLCS - mean (95%CI) of world normalised log(1+unique STLDs) [Population version]: 1.000000 (0.379564, 1.620436)
MNLCS - mean (95%CI) of world normalised log (1_unique STLDs) [Sample version]: 1.000000 (0.343900, 2.907818)
MNLCS - mean (95%CI) of world normalised log(1+unique TLDs) [Population version]: 1.000000 (0.379564, 1.620436)
MNLCS - mean (95%CI) of world normalised log (1_unique TLDs) [Sample version]: 1.000000 (0.343900, 2.907818)
EMNPC - world normalised proportion non-zero (95%CI) [ie risk ratio]: 1.000000 (0.428985, 2.331082)

Group file: Spain_wiki. In set: Biochemistry Molecular Biology Alcohol 2012
Queries : 193
Arithmetic mean (unique URLs) : 0.020725
Arithmetic mean (unique domains) : 0.020725
Arithmetic mean (unique sites) : 0.020725
Arithmetic mean (unique STLDs) : 0.020725
Arithmetic mean (unique TLDs) : 0.020725
Geometric mean (95%CI) of unique URLs : 0.014469 (0.000255, 0.028886)
Geometric mean (95%CI) of unique domains : 0.014469 (0.000255, 0.028886)
Geometric mean (95%CI) of unique sites : 0.014469 (0.000255, 0.028886)
Geometric mean (95%CI) of unique STLDs : 0.014469 (0.000255, 0.028886)
Geometric mean (95%CI) of unique TLDs : 0.014469 (0.000255, 0.028886)
Mean (95%CI) of log(1+unique URLs) : 0.014366 (0.000255, 0.028476)
Mean (95%CI) of log(1+unique domains) : 0.014366 (0.000255, 0.028476)
Mean (95%CI) of log(1+unique sites) : 0.014366 (0.000255, 0.028476)
Mean (95%CI) of log(1+unique STLDs) : 0.014366 (0.000255, 0.028476)
Mean (95%CI) of log(1+unique TLDs) : 0.014366 (0.000255, 0.028476)
Proportion non-zero (95%CI) : 0.020725 (0.008089, 0.052069)
MNLCS - mean (95%CI) of world normalised log(1+unique URLs) [Population version]: 0.890917 (0.015827, 1.766008)
MNLCS - mean (95%CI) of world normalised log(1+unique URLs) [Sample version]: 0.890917 (0.015768, 3.039885)
MNLCS - mean (95%CI) of world normalised log(1+unique domains) [Population version]: 0.924021 (0.016415, 1.831627)
MNLCS - mean (95%CI) of world normalised log(1+unique domains) [Sample version]: 0.924021 (0.016356, 3.074937)
MNLCS - mean (95%CI) of world normalised log(1+unique sites) [Population version]: 1.032124 (0.018336, 2.045913)
MNLCS - mean (95%CI) of world normalised log(1+unique sites) [Sample version]: 1.032124 (0.018272, 3.337906)
MNLCS - mean (95%CI) of world normalised log(1+unique STLDs) [Population version]: 1.032124 (0.018336, 2.045913)
MNLCS - mean (95%CI) of world normalised log(1+unique STLDs) [Sample version]: 1.032124 (0.018272, 3.337906)
MNLCS - mean (95%CI) of world normalised log(1+unique TLDs) [Population version]: 1.032124 (0.018336, 2.045913)
MNLCS - mean (95%CI) of world normalised log(1+unique TLDs) [Sample version]: 1.032124 (0.018272, 3.337906)
World normalised proportion non-zero (95%CI) [ie risk ratio]: 1.032124 (0.346414, 3.075162)

World File: Chemistry Alcohol 2012 world_wiki
Queries : 498
Arithmetic mean (unique URLs) : 0.002008
Arithmetic mean (unique domains) : 0.002008
Arithmetic mean (unique sites) : 0.002008
Arithmetic mean (unique STLDs) : 0.002008
Arithmetic mean (unique TLDs) : 0.002008
Geometric mean (95%CI) of unique URLs : 0.001393 (-0.001363, 0.004156)
Geometric mean (95%CI) of unique domains : 0.001393 (-0.001363, 0.004156)
Geometric mean (95%CI) of unique sites : 0.001393 (-0.001363, 0.004156)
Geometric mean (95%CI) of unique STLDs : 0.001393 (-0.001363, 0.004156)
Geometric mean (95%CI) of unique TLDs : 0.001393 (-0.001363, 0.004156)
Mean (95%CI) of log(1+unique URLs) : 0.001392 (-0.001364, 0.004148)
Mean (95%CI) of log(1+unique domains) : 0.001392 (-0.001364, 0.004148)
Mean (95%CI) of log(1+unique sites) : 0.001392 (-0.001364, 0.004148)
Mean (95%CI) of log(1+unique STLDs) : 0.001392 (-0.001364, 0.004148)
Mean (95%CI) of log(1+unique TLDs) : 0.001392 (-0.001364, 0.004148)
Proportion non-zero (95%CI) : 0.002008 (0.000355, 0.011285)
MNLCS - mean (95%CI) of world normalised log(1+unique URLs) [Population version]: 1.000000 (-0.980000, 2.980000)
MNLCS - mean (95%CI) of world normalised log (1_unique URLs) [Sample version]: 1.000000 (NaN, NaN)
MNLCS - mean (95%CI) of world normalised log(1+unique domains) [Population version]: 1.000000 (-0.980000, 2.980000)
MNLCS - mean (95%CI) of world normalised log (1_unique domains) [Sample version]: 1.000000 (NaN, NaN)
MNLCS - mean (95%CI) of world normalised log(1+unique sites) [Population version]: 1.000000 (-0.980000, 2.980000)
MNLCS - mean (95%CI) of world normalised log (1_unique sites) [Sample version]: 1.000000 (NaN, NaN)
MNLCS - mean (95%CI) of world normalised log(1+unique STLDs) [Population version]: 1.000000 (-0.980000, 2.980000)
MNLCS - mean (95%CI) of world normalised log (1_unique STLDs) [Sample version]: 1.000000 (NaN, NaN)
MNLCS - mean (95%CI) of world normalised log(1+unique TLDs) [Population version]: 1.000000 (-0.980000, 2.980000)
MNLCS - mean (95%CI) of world normalised log (1_unique TLDs) [Sample version]: 1.000000 (NaN, NaN)
EMNPC - world normalised proportion non-zero (95%CI) [ie risk ratio]: 1.000000 (0.104375, 9.580795)

Group file: Spain_wiki. In set: Chemistry Alcohol 2012
Queries : 282
Arithmetic mean (unique URLs) : 0.003546
Arithmetic mean (unique domains) : 0.003546
Arithmetic mean (unique sites) : 0.003546
Arithmetic mean (unique STLDs) : 0.003546
Arithmetic mean (unique TLDs) : 0.003546
Geometric mean (95%CI) of unique URLs : 0.002461 (-0.002406, 0.007352)
Geometric mean (95%CI) of unique domains : 0.002461 (-0.002406, 0.007352)
Geometric mean (95%CI) of unique sites : 0.002461 (-0.002406, 0.007352)
Geometric mean (95%CI) of unique STLDs : 0.002461 (-0.002406, 0.007352)
Geometric mean (95%CI) of unique TLDs : 0.002461 (-0.002406, 0.007352)
Mean (95%CI) of log(1+unique URLs) : 0.002458 (-0.002409, 0.007325)
Mean (95%CI) of log(1+unique domains) : 0.002458 (-0.002409, 0.007325)
Mean (95%CI) of log(1+unique sites) : 0.002458 (-0.002409, 0.007325)
Mean (95%CI) of log(1+unique STLDs) : 0.002458 (-0.002409, 0.007325)
Mean (95%CI) of log(1+unique TLDs) : 0.002458 (-0.002409, 0.007325)
Proportion non-zero (95%CI) : 0.003546 (0.000626, 0.019810)
MNLCS - mean (95%CI) of world normalised log(1+unique URLs) [Population version]: 1.765957 (-1.730638, 5.262553)
MNLCS - mean (95%CI) of world normalised log(1+unique URLs) [Sample version]: 1.765957 (NaN, NaN)
MNLCS - mean (95%CI) of world normalised log(1+unique domains) [Population version]: 1.765957 (-1.730638, 5.262553)
MNLCS - mean (95%CI) of world normalised log(1+unique domains) [Sample version]: 1.765957 (NaN, NaN)
MNLCS - mean (95%CI) of world normalised log(1+unique sites) [Population version]: 1.765957 (-1.730638, 5.262553)
MNLCS - mean (95%CI) of world normalised log(1+unique sites) [Sample version]: 1.765957 (NaN, NaN)
MNLCS - mean (95%CI) of world normalised log(1+unique STLDs) [Population version]: 1.765957 (-1.730638, 5.262553)
MNLCS - mean (95%CI) of world normalised log(1+unique STLDs) [Sample version]: 1.765957 (NaN, NaN)
MNLCS - mean (95%CI) of world normalised log(1+unique TLDs) [Population version]: 1.765957 (-1.730638, 5.262553)
MNLCS - mean (95%CI) of world normalised log(1+unique TLDs) [Sample version]: 1.765957 (NaN, NaN)
World normalised proportion non-zero (95%CI) [ie risk ratio]: 1.765957 (0.184564, 16.897165)

The table below contains the same information as above and can be cut and pasted into a spreadsheet for convenience.
======================================================================================================
Set (e.g.,Field/Year)	Group	Queries	Arithmetic mean (unique URLs)	Arithmetic mean (unique domains)	Arithmetic mean (unique sites)	Arithmetic mean (unique STLDs)	Arithmetic mean (unique TLDs)	Proportion (95%CI) non-zero	Lower95	Upper95	Mean (95%CI) of log(1+unique URLs)	Lower95	Upper95	Mean (95%CI) of log(1+unique domains)	Lower95	Upper95	Mean (95%CI) of log(1+unique sites)	Lower95	Upper95	Mean (95%CI) of log(1+unique STLDs)	Lower95	Upper95	Mean (95%CI) of log(1+unique TLDs)	Lower95	Upper95	Geometric mean (95%CI) of unique URLs	Lower95	Upper95	Geometric mean (95%CI) of unique domains:	Lower95	Upper95	Geometric mean (95%CI) of unique sites	Lower95	Upper95	Geometric mean (95%CI) of unique STLDs	Lower95	Upper95	Geometric mean (95%CI) of unique TLDs	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+unique URLs)	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+unique domains)	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+unique sites)	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+unique STLDs)	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+uniqueTLDs)	Lower95	Upper95	EMNPC - world normalised proportion non-zero (risk ratio)	Lower95	Upper95
Biochemistry Molecular Biology Alcohol 2012 world_wiki	World	498	0.026104	0.024096	0.020080	0.020080	0.020080	0.020080	0.010943	0.036565	0.016125	0.005714	0.026536	0.015547	0.005687	0.025407	0.013919	0.005283	0.022554	0.013919	0.005283	0.022554	0.013919	0.005283	0.022554	0.016255	0.005730	0.026891	0.015668	0.005704	0.025732	0.014016	0.005297	0.022810	0.014016	0.005297	0.022810	0.014016	0.005297	0.022810	1.000000	0.321747	3.108037	1.000000	0.331823	3.013657	1.000000	0.343900	2.907818	1.000000	0.343900	2.907818	1.000000	0.343900	2.907818	1.000000	0.428985	2.331082
Biochemistry Molecular Biology Alcohol 2012	Spain_wiki	193	0.020725	0.020725	0.020725	0.020725	0.020725	0.020725	0.008089	0.052069	0.014366	0.000255	0.028476	0.014366	0.000255	0.028476	0.014366	0.000255	0.028476	0.014366	0.000255	0.028476	0.014366	0.000255	0.028476	0.014469	0.000255	0.028886	0.014469	0.000255	0.028886	0.014469	0.000255	0.028886	0.014469	0.000255	0.028886	0.014469	0.000255	0.028886	0.890917	0.015768	3.039885	0.924021	0.016356	3.074937	1.032124	0.018272	3.337906	1.032124	0.018272	3.337906	1.032124	0.018272	3.337906	1.032124	0.346414	3.075162
Chemistry Alcohol 2012 world_wiki	World	498	0.002008	0.002008	0.002008	0.002008	0.002008	0.002008	0.000355	0.011285	0.001392	-0.001364	0.004148	0.001392	-0.001364	0.004148	0.001392	-0.001364	0.004148	0.001392	-0.001364	0.004148	0.001392	-0.001364	0.004148	0.001393	-0.001363	0.004156	0.001393	-0.001363	0.004156	0.001393	-0.001363	0.004156	0.001393	-0.001363	0.004156	0.001393	-0.001363	0.004156	1.000000	NaN	NaN	1.000000	NaN	NaN	1.000000	NaN	NaN	1.000000	NaN	NaN	1.000000	NaN	NaN	1.000000	0.104375	9.580795
Chemistry Alcohol 2012	Spain_wiki	282	0.003546	0.003546	0.003546	0.003546	0.003546	0.003546	0.000626	0.019810	0.002458	-0.002409	0.007325	0.002458	-0.002409	0.007325	0.002458	-0.002409	0.007325	0.002458	-0.002409	0.007325	0.002458	-0.002409	0.007325	0.002461	-0.002406	0.007352	0.002461	-0.002406	0.007352	0.002461	-0.002406	0.007352	0.002461	-0.002406	0.007352	0.002461	-0.002406	0.007352	1.765957	NaN	NaN	1.765957	NaN	NaN	1.765957	NaN	NaN	1.765957	NaN	NaN	1.765957	NaN	NaN	1.765957	0.184564	16.897165
======================================================================================================
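
The per-file figures in the sample report are related to one another in a simple way, which the following sketch illustrates. It is an illustration under stated assumptions rather than Webometric Analyst's own code: the function names are hypothetical, natural logarithms are assumed, and confidence limits are not attempted. The geometric mean is taken as exp(mean of ln(1+c)) - 1, and the per-set MNLCS as the group mean of ln(1+c) divided by the world mean of ln(1+c).

```python
import math


def mean_log(counts):
    """Arithmetic mean of ln(1 + c) over a list of counts."""
    return sum(math.log(1 + c) for c in counts) / len(counts)


def geometric_mean(counts):
    """Offset geometric mean: exp(mean of ln(1 + c)) - 1."""
    return math.exp(mean_log(counts)) - 1


def set_mnlcs(group_counts, world_counts):
    """Group mean of ln(1 + c) divided by the world mean for the same set."""
    world_mean = mean_log(world_counts)
    if world_mean == 0:
        raise ValueError("world mean of ln(1+c) is zero; the ratio is undefined")
    return mean_log(group_counts) / world_mean
```

These relations are consistent with the sample values above: exp(0.016125) - 1 is about 0.01626, in line with the reported geometric mean of 0.016255, and 0.014366 / 0.016125 is about 0.8909, in line with the reported URL MNLCS of 0.890917 for the Spain Biochemistry file.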

The Mean Normalised Log-transformed Citation Scores (MNLCS) in the table below are the best to use to compare the group overall with the world average if there are multiple different world averages (e.g., different fields and/or years). For each group, they are the average of the ln(1+c) values, each divided by the world average ln(1+c) for its file (e.g., field and year). The world average MNLCS should always be 1. MNLCS values above 1 indicate that the group average is higher than the world average; MNLCS values below 1 indicate that the group average is lower than the world average.
WARNING! MNLCS POPULATION confidence limits below are optimistic because they do not take into account the variability in the world average value.
- Please use only the MNLCS SAMPLE confidence limits. These are adjusted from the population limits using the weighted average Feiller Expansion calculation.
- NaN in the Sample confidence limits mean that these are impossible to calculate and are effectively infinite.
======================================================================================================
Group	SampleSize	URLMNLCS	URLLower95Sample	URLUpper95Sample	URLLower95Population	URLUpper95Population	domainMNLCS	domainLower95Sample	domainUpper95Sample	domainLower95Population	domainUpper95Population	siteMNLCS	siteLower95Sample	siteUpper95Sample	siteLower95Population	siteUpper95Population	STLDMNLCS	STLDLower95Sample	STLDUpper95Sample	STLDLower95Population	STLDUpper95Population	TLDMNLCS	TLDLower95Sample	TLDUpper95Sample	TLDLower95Population	TLDUpper95Population
World	996	1	NaN	NaN	-0.0407826005900878	2.04078260059009	1	NaN	NaN	-0.0390180539676497	2.03901805396765	1	NaN	NaN	-0.0369442738815229	2.03694427388152	1	NaN	NaN	-0.0369442738815229	2.03694427388152	1	NaN	NaN	-0.0369442738815229	2.03694427388152
Spain_wiki	475	1.41041481914271	NaN	NaN	-0.694482453870358	3.51531209215579	1.4238653112085	NaN	NaN	-0.683270007425786	3.53100062984278	1.46778947368421	NaN	NaN	-0.647218248937197	3.58279719630562	1.46778947368421	NaN	NaN	-0.647218248937197	3.58279719630562	1.46778947368421	NaN	NaN	-0.647218248937197	3.58279719630562
======================================================================================================
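
The overall MNLCS point estimate described above can be sketched as follows. This is a minimal illustration with hypothetical function names, not the program's implementation, and it does not attempt the population or sample confidence limits. Each article's ln(1+c) is divided by the world mean for its own field/year set, and the results are averaged over all of the group's articles, which is equivalent to weighting each per-set MNLCS by its sample size.

```python
import math


def overall_mnlcs(group_counts, world_counts):
    """Average world-normalised ln(1+c) over every group article.

    Both arguments map a set name (e.g., field/year) to a list of counts.
    """
    total = 0.0
    n_articles = 0
    for set_name, counts in group_counts.items():
        world = world_counts[set_name]
        world_mean = sum(math.log(1 + c) for c in world) / len(world)
        if world_mean == 0:
            raise ValueError(f"world mean is zero for {set_name!r}")
        total += sum(math.log(1 + c) / world_mean for c in counts)
        n_articles += len(counts)
    return total / n_articles


def combine_set_scores(scores):
    """Same quantity computed from per-set (MNLCS, sample size) pairs."""
    return sum(score * n for score, n in scores) / sum(n for _, n in scores)
```

With the per-set URL values from the sample report, combine_set_scores([(0.890917, 193), (1.765957, 282)]) gives approximately 1.41041, in line with the URLMNLCS value reported for Spain_wiki in the table above.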

Overall proportion non-zero calculations - not recommended because biased against groups with more articles in categories with a high world proportion of non-zero values.
======================================================================================================
Group       N    Proportion_Positive  Lower95   Upper95
world_wiki  996  0.011044             0.006178  0.019668
Spain_wiki  475  0.010526             0.004504  0.024402
======================================================================================================
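
The report does not say how the confidence limits for a proportion are calculated. The limits shown for the world rows happen to agree with a Wilson score interval (for example, 11 non-zero results out of 996 queries), so the sketch below assumes that method; this is an inference from the numbers, not a statement about the program, and the function name is hypothetical.

```python
import math


def wilson_interval(nonzero, n, z=1.959964):
    """Wilson score interval for a proportion (z defaults to the 95% level)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = nonzero / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width
```

Calling wilson_interval(11, 996) returns limits close to the 0.006178 and 0.019668 shown for world_wiki above.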

Field equalised proportion non-zero and EMNPC calculations - all group sample sizes are set to the arithmetic mean group sample size for sets with at least one publication
======================================================================================================
Group       N    AvProportionNonzero  Lower95   Upper95   EMNPC     Lower95   Upper95
world_wiki  996  0.011044             0.006178  0.019668  1.000000  0.443690  2.253826
Spain_wiki  475  0.012136             0.005490  0.026609  1.098836  0.417754  2.890315
======================================================================================================
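
The EMNPC point estimate in this table can be reproduced with the short sketch below. It is a reading of the sample numbers rather than the program's code: the function name and the input layout are assumptions, and the confidence limits are not attempted. Each field/year set contributes its proportion of non-zero results with equal weight, sets in which the group has no publications are skipped, and the group average is divided by the world average.

```python
def emnpc(group_sets, world_sets):
    """Field-equalised ratio of group to world proportion non-zero.

    Both arguments map a set name to a (non-zero count, sample size) pair.
    """
    # Only sets in which the group has at least one publication are used.
    used = [name for name, (_, n) in group_sets.items() if n > 0]
    if not used:
        raise ValueError("the group has no publications in any set")
    group_avg = sum(group_sets[s][0] / group_sets[s][1] for s in used) / len(used)
    world_avg = sum(world_sets[s][0] / world_sets[s][1] for s in used) / len(used)
    if world_avg == 0:
        raise ValueError("world proportion non-zero is zero; the ratio is undefined")
    return group_avg / world_avg
```

Using counts implied by the sample report (4 of 193 and 1 of 282 for Spain; 10 of 498 and 1 of 498 for the world), the function returns approximately 1.0988, in line with the EMNPC of 1.098836 shown above.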

MNPC calculations - similar to the above (ignore this table if EMNPC is reported above). Confidence intervals are the weighted average of the confidence intervals for each individual field/year set.
======================================================================================================
Group       N    MNPC      Lower95   Upper95    MNPCLower95boot  MNPCUpper95boot  EMNPCLower95boot  EMNPCUpper95boot
world_wiki  996  1.000000  0.266680  5.955938   1.000000         NaN              1.000000          1.000000
Spain_wiki  475  1.467789  0.250326  11.281067  0.080648         Infinity         0.234574          3.096373
======================================================================================================