Web Indicator Reports

Here are the steps necessary to collect web data and calculate a range of indicators for a collection of publications, including the Mean Normalised Log-transformed Citation Score (MNLCS) and the Normalised Proportion Cited (NPC).

  1. Step 1: Identify the group of publications to be assessed and categorise them by field (e.g., using Scopus or WoS subject categories).
  2. Step 2: Save the article information (authors, title, journal, publication year) in a standard tab-delimited format, with a separate file for each subject category/year combination. First, discard publications that are in small subject/year combinations (e.g., <100 publications). Then create one tab-delimited file for each remaining subject/year combination, with one line per publication. Each line should contain the author names in a standard format (ideally following the Scopus or Web of Science format), the publication year, the article title and the journal name (omit the journal name for books). The first line of the file should contain header information. Here is an example of the format for journal articles and for books. If your data is in a spreadsheet, it can be saved in this format using the Save As command and selecting the Plain text (tab delimited) format. If the records are in Scopus or the Web of Science, then choose the tab-delimited format when saving them. The filename for each file must contain the subject name and year, and end with -[group].txt, where [group] should be replaced by a name for the collection of articles. The same [group] name should be used for all files containing publications from the same group.
  3. Step 3: For each retained subject/year combination, a benchmarking sample of articles from the rest of the world is needed. For this, download all articles in the Scopus/WoS field/year (if possible) or a large balanced sample (e.g., the first and last 5000 articles published in the category) to form the world reference set. Filter out any large trade or art journals with a high proportion of uncited articles. Name the files using the standard Webometric Analyst naming convention so that each filename contains the subject name and year, and ends with -world.txt. These filenames must exactly match the group filenames, except for replacing -[group].txt with -world.txt. All of the files should be stored within a single folder that does not contain any other files.
    1. Here is a small artificial example of a complete set of publication data files in structured name format, with all publications in a single file being from the same field and year, and each group file corresponding to a world file.
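
The naming rule in Step 3 (every -[group].txt file needs a -world.txt file with an otherwise identical name) can be checked before any searches are run. The following is a minimal Python sketch, not part of Webometric Analyst; the function names and the example filenames are hypothetical.

```python
from pathlib import Path


def unmatched_group_files(filenames, group):
    """Return the group filenames that have no matching -world.txt file.

    filenames: iterable of bare file names (no folder part).
    group: the [group] name used in the -[group].txt suffix.
    """
    names = set(filenames)
    suffix = f"-{group}.txt"
    missing = []
    for name in sorted(names):
        if name.endswith(suffix):
            # Swap the group suffix for the world suffix and look it up.
            world_name = name[: -len(suffix)] + "-world.txt"
            if world_name not in names:
                missing.append(name)
    return missing


def check_folder(folder, group):
    """Apply the same check to every file in a folder on disk."""
    return unmatched_group_files((p.name for p in Path(folder).iterdir()), group)


if __name__ == "__main__":
    example = [
        "Chemistry 2012-Spain.txt",
        "Chemistry 2012-world.txt",
        "Physics 2012-Spain.txt",  # no world file, so it is reported
    ]
    print(unmatched_group_files(example, "Spain"))
```

An empty result means every group file has a world counterpart; any names returned need a world file added (or need to be removed) before continuing.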

  1. Step 5: Bing API searches must be paid for after the first free 1,000. Unless you have a budget for this, the next stage is to generate a random sample of articles from each world and group set (e.g., 500 per set) and use these samples instead of the full sets. For this, from the Make Searches menu, select the Replace search files with a random sample up to a maximum number menu option and instruct Webometric Analyst to replace all the search files with random samples of 500.
  2. Step 6: Use Webometric Analyst to run all the searches. For this, start Webometric Analyst, open the Wizard by selecting Link Analysis Wizard from the File menu, enter your search key, click OK, click the Run All Searches In File button and select one of the search files. Wait for Webometric Analyst to finish and then click the same button again and select another file. Repeat this until all the files have been run. The picture below shows some of the files generated for PowerPoint searches, together with two additional files created in Step 7. Example files for Wikipedia searches.
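
Webometric Analyst performs the random sampling in Step 5 itself through the menu option. Purely to illustrate what "a random sample up to a maximum number" means, here is a Python sketch that samples the lines of a file while keeping its first line. It assumes the file has a header line, as the publication files from Step 2 do; whether the search files have one is not stated here. The function name and parameters are hypothetical.

```python
import random


def sample_lines(lines, max_items=500, seed=None):
    """Keep the header line and at most max_items randomly chosen data lines.

    lines: list of text lines, with the header assumed to be lines[0].
    seed: optional value for a reproducible sample.
    """
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    if len(rows) <= max_items:
        # Small files are returned unchanged.
        return list(lines)
    rng = random.Random(seed)
    # Sample positions without replacement, then restore the original order.
    keep = sorted(rng.sample(range(len(rows)), max_items))
    return [header] + [rows[i] for i in keep]


if __name__ == "__main__":
    demo = ["Authors\tYear\tTitle\tJournal"] + [f"row {i}" for i in range(2000)]
    print(len(sample_lines(demo, max_items=500, seed=1)))
```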

  1. Step 7: Use Webometric Analyst to calculate MNLCS and EMNPC and confidence limits for both. For this, start Webometric Analyst, close the Startup Wizard and then select Calculate MNLCS, gMNCS and NPC for a set of web searches (structured file names) from the Reports menu. When requested, select the folder containing all of the files. This will create two new files. The file called all_data.txt contains all of the data extracted from the searches in a format that can be loaded into a stats package or spreadsheet. This is a backup file in case you want to calculate your own indicators. The file called report.txt lists the MNLCS and EMNPC values for each individual file at the top. Near the end of the file, it then reports tables of the combined MNLCS and EMNPC values for the whole collection. This is the main part of the results. [See below for a sample output]
  2. Step 8: If you want MNLCS and EMNPC calculated separately for each year, then create new folders, one for each year, and copy all the files from each year into the relevant year folder. Repeat the step above for each year folder.
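
The per-year folders in Step 8 can be created by hand, or with a short script such as the Python sketch below. It assumes each filename contains exactly one four-digit year (19xx or 20xx), as the naming convention in Steps 2 and 3 requires; the function names and folder arguments are hypothetical.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# A four-digit year starting 19 or 20 that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def group_by_year(filenames):
    """Map each year found in a filename to the list of files for that year."""
    by_year = defaultdict(list)
    for name in filenames:
        match = YEAR.search(name)
        if match:  # files without a recognisable year are skipped
            by_year[match.group(1)].append(name)
    return dict(by_year)


def copy_into_year_folders(source, dest):
    """Copy every file in source into dest/<year>/, creating folders as needed."""
    source, dest = Path(source), Path(dest)
    names = [p.name for p in source.iterdir() if p.is_file()]
    for year, files in group_by_year(names).items():
        target = dest / year
        target.mkdir(parents=True, exist_ok=True)
        for name in files:
            shutil.copy2(source / name, target / name)
```

Each resulting year folder can then be selected in turn when repeating Step 7.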

Sample report

Source of web search results: C:\Users\Public\Documents\data\Wikipedia citations structured names
   Total number of world files (e.g., one per field and year): 2
World File: Biochemistry Molecular Biology Alcohol 2012 world_wiki
   Queries   : 498
   Arithmetic mean (unique URLs)   : 0.026104
   Arithmetic mean (unique domains)   : 0.024096
   Arithmetic mean (unique sites)   : 0.020080
   Arithmetic mean (unique STLDs)   : 0.020080
   Arithmetic mean (unique TLDs)   : 0.020080
   Geometric mean (95%CI) of unique URLs   : 0.016255 (0.005730, 0.026891)
   Geometric mean (95%CI) of unique domains   : 0.015668 (0.005704, 0.025732)
   Geometric mean (95%CI) of unique sites   : 0.014016 (0.005297, 0.022810)
   Geometric mean (95%CI) of unique STLDs   : 0.014016 (0.005297, 0.022810)
   Geometric mean (95%CI) of unique TLDs   : 0.014016 (0.005297, 0.022810)
   Mean (95%CI) of log(1+unique URLs)   : 0.016125 (0.005714, 0.026536)
   Mean (95%CI) of log(1+unique domains)   : 0.015547 (0.005687, 0.025407)
   Mean (95%CI) of log(1+unique sites)   : 0.013919 (0.005283, 0.022554)
   Mean (95%CI) of log(1+unique STLDs)   : 0.013919 (0.005283, 0.022554)
   Mean (95%CI) of log(1+unique TLDs)   : 0.013919 (0.005283, 0.022554)
   Proportion non-zero (95%CI)           : 0.020080 (0.010943, 0.036565)
   MNLCS - mean (95%CI) of world normalised log(1+unique URLs)   [Population version]: 1.000000 (0.354343, 1.645657)
   MNLCS - mean (95%CI) of world normalised log (1_unique URLs)       [Sample version]: 1.000000 (0.321747, 3.108037)
   MNLCS - mean (95%CI) of world normalised log(1+unique domains)   [Population version]: 1.000000 (0.365825, 1.634175)
   MNLCS - mean (95%CI) of world normalised log (1_unique domains)       [Sample version]: 1.000000 (0.331823, 3.013657)
   MNLCS - mean (95%CI) of world normalised log(1+unique sites)   [Population version]: 1.000000 (0.379564, 1.620436)
   MNLCS - mean (95%CI) of world normalised log (1_unique sites)       [Sample version]: 1.000000 (0.343900, 2.907818)
   MNLCS - mean (95%CI) of world normalised log(1+unique STLDs)   [Population version]: 1.000000 (0.379564, 1.620436)
   MNLCS - mean (95%CI) of world normalised log (1_unique STLDs)       [Sample version]: 1.000000 (0.343900, 2.907818)
   MNLCS - mean (95%CI) of world normalised log(1+unique TLDs)   [Population version]: 1.000000 (0.379564, 1.620436)
   MNLCS - mean (95%CI) of world normalised log (1_unique TLDs)       [Sample version]: 1.000000 (0.343900, 2.907818)
   EMNPC - world normalised proportion non-zero (95%CI) [ie risk ratio]: 1.000000 (0.428985, 2.331082)
Group file: Spain_wiki. In set: Biochemistry Molecular Biology Alcohol 2012
   Queries   : 193
   Arithmetic mean (unique URLs)   : 0.020725
   Arithmetic mean (unique domains)   : 0.020725
   Arithmetic mean (unique sites)   : 0.020725
   Arithmetic mean (unique STLDs)   : 0.020725
   Arithmetic mean (unique TLDs)   : 0.020725
   Geometric mean (95%CI) of unique URLs   : 0.014469 (0.000255, 0.028886)
   Geometric mean (95%CI) of unique domains   : 0.014469 (0.000255, 0.028886)
   Geometric mean (95%CI) of unique sites   : 0.014469 (0.000255, 0.028886)
   Geometric mean (95%CI) of unique STLDs   : 0.014469 (0.000255, 0.028886)
   Geometric mean (95%CI) of unique TLDs   : 0.014469 (0.000255, 0.028886)
   Mean (95%CI) of log(1+unique URLs)   : 0.014366 (0.000255, 0.028476)
   Mean (95%CI) of log(1+unique domains)   : 0.014366 (0.000255, 0.028476)
   Mean (95%CI) of log(1+unique sites)   : 0.014366 (0.000255, 0.028476)
   Mean (95%CI) of log(1+unique STLDs)   : 0.014366 (0.000255, 0.028476)
   Mean (95%CI) of log(1+unique TLDs)   : 0.014366 (0.000255, 0.028476)
   Proportion non-zero (95%CI)           : 0.020725 (0.008089, 0.052069)
   MNLCS - mean (95%CI) of world normalised log(1+unique URLs)   [Population version]: 0.890917 (0.015827, 1.766008)
   MNLCS - mean (95%CI) of world normalised log(1+unique URLs)       [Sample version]: 0.890917 (0.015768, 3.039885)
   MNLCS - mean (95%CI) of world normalised log(1+unique domains)   [Population version]: 0.924021 (0.016415, 1.831627)
   MNLCS - mean (95%CI) of world normalised log(1+unique domains)       [Sample version]: 0.924021 (0.016356, 3.074937)
   MNLCS - mean (95%CI) of world normalised log(1+unique sites)   [Population version]: 1.032124 (0.018336, 2.045913)
   MNLCS - mean (95%CI) of world normalised log(1+unique sites)       [Sample version]: 1.032124 (0.018272, 3.337906)
   MNLCS - mean (95%CI) of world normalised log(1+unique STLDs)   [Population version]: 1.032124 (0.018336, 2.045913)
   MNLCS - mean (95%CI) of world normalised log(1+unique STLDs)       [Sample version]: 1.032124 (0.018272, 3.337906)
   MNLCS - mean (95%CI) of world normalised log(1+unique TLDs)   [Population version]: 1.032124 (0.018336, 2.045913)
   MNLCS - mean (95%CI) of world normalised log(1+unique TLDs)       [Sample version]: 1.032124 (0.018272, 3.337906)
   World normalised proportion non-zero (95%CI) [ie risk ratio]: 1.032124 (0.346414, 3.075162)
 
World File: Chemistry Alcohol 2012 world_wiki
   Queries   : 498
   Arithmetic mean (unique URLs)   : 0.002008
   Arithmetic mean (unique domains)   : 0.002008
   Arithmetic mean (unique sites)   : 0.002008
   Arithmetic mean (unique STLDs)   : 0.002008
   Arithmetic mean (unique TLDs)   : 0.002008
   Geometric mean (95%CI) of unique URLs   : 0.001393 (-0.001363, 0.004156)
   Geometric mean (95%CI) of unique domains   : 0.001393 (-0.001363, 0.004156)
   Geometric mean (95%CI) of unique sites   : 0.001393 (-0.001363, 0.004156)
   Geometric mean (95%CI) of unique STLDs   : 0.001393 (-0.001363, 0.004156)
   Geometric mean (95%CI) of unique TLDs   : 0.001393 (-0.001363, 0.004156)
   Mean (95%CI) of log(1+unique URLs)   : 0.001392 (-0.001364, 0.004148)
   Mean (95%CI) of log(1+unique domains)   : 0.001392 (-0.001364, 0.004148)
   Mean (95%CI) of log(1+unique sites)   : 0.001392 (-0.001364, 0.004148)
   Mean (95%CI) of log(1+unique STLDs)   : 0.001392 (-0.001364, 0.004148)
   Mean (95%CI) of log(1+unique TLDs)   : 0.001392 (-0.001364, 0.004148)
   Proportion non-zero (95%CI)           : 0.002008 (0.000355, 0.011285)
   MNLCS - mean (95%CI) of world normalised log(1+unique URLs)   [Population version]: 1.000000 (-0.980000, 2.980000)
   MNLCS - mean (95%CI) of world normalised log (1_unique URLs)       [Sample version]: 1.000000 (NaN, NaN)
   MNLCS - mean (95%CI) of world normalised log(1+unique domains)   [Population version]: 1.000000 (-0.980000, 2.980000)
   MNLCS - mean (95%CI) of world normalised log (1_unique domains)       [Sample version]: 1.000000 (NaN, NaN)
   MNLCS - mean (95%CI) of world normalised log(1+unique sites)   [Population version]: 1.000000 (-0.980000, 2.980000)
   MNLCS - mean (95%CI) of world normalised log (1_unique sites)       [Sample version]: 1.000000 (NaN, NaN)
   MNLCS - mean (95%CI) of world normalised log(1+unique STLDs)   [Population version]: 1.000000 (-0.980000, 2.980000)
   MNLCS - mean (95%CI) of world normalised log (1_unique STLDs)       [Sample version]: 1.000000 (NaN, NaN)
   MNLCS - mean (95%CI) of world normalised log(1+unique TLDs)   [Population version]: 1.000000 (-0.980000, 2.980000)
   MNLCS - mean (95%CI) of world normalised log (1_unique TLDs)       [Sample version]: 1.000000 (NaN, NaN)
   EMNPC - world normalised proportion non-zero (95%CI) [ie risk ratio]: 1.000000 (0.104375, 9.580795)
Group file: Spain_wiki. In set: Chemistry Alcohol 2012
   Queries   : 282
   Arithmetic mean (unique URLs)   : 0.003546
   Arithmetic mean (unique domains)   : 0.003546
   Arithmetic mean (unique sites)   : 0.003546
   Arithmetic mean (unique STLDs)   : 0.003546
   Arithmetic mean (unique TLDs)   : 0.003546
   Geometric mean (95%CI) of unique URLs   : 0.002461 (-0.002406, 0.007352)
   Geometric mean (95%CI) of unique domains   : 0.002461 (-0.002406, 0.007352)
   Geometric mean (95%CI) of unique sites   : 0.002461 (-0.002406, 0.007352)
   Geometric mean (95%CI) of unique STLDs   : 0.002461 (-0.002406, 0.007352)
   Geometric mean (95%CI) of unique TLDs   : 0.002461 (-0.002406, 0.007352)
   Mean (95%CI) of log(1+unique URLs)   : 0.002458 (-0.002409, 0.007325)
   Mean (95%CI) of log(1+unique domains)   : 0.002458 (-0.002409, 0.007325)
   Mean (95%CI) of log(1+unique sites)   : 0.002458 (-0.002409, 0.007325)
   Mean (95%CI) of log(1+unique STLDs)   : 0.002458 (-0.002409, 0.007325)
   Mean (95%CI) of log(1+unique TLDs)   : 0.002458 (-0.002409, 0.007325)
   Proportion non-zero (95%CI)           : 0.003546 (0.000626, 0.019810)
   MNLCS - mean (95%CI) of world normalised log(1+unique URLs)   [Population version]: 1.765957 (-1.730638, 5.262553)
   MNLCS - mean (95%CI) of world normalised log(1+unique URLs)       [Sample version]: 1.765957 (NaN, NaN)
   MNLCS - mean (95%CI) of world normalised log(1+unique domains)   [Population version]: 1.765957 (-1.730638, 5.262553)
   MNLCS - mean (95%CI) of world normalised log(1+unique domains)       [Sample version]: 1.765957 (NaN, NaN)
   MNLCS - mean (95%CI) of world normalised log(1+unique sites)   [Population version]: 1.765957 (-1.730638, 5.262553)
   MNLCS - mean (95%CI) of world normalised log(1+unique sites)       [Sample version]: 1.765957 (NaN, NaN)
   MNLCS - mean (95%CI) of world normalised log(1+unique STLDs)   [Population version]: 1.765957 (-1.730638, 5.262553)
   MNLCS - mean (95%CI) of world normalised log(1+unique STLDs)       [Sample version]: 1.765957 (NaN, NaN)
   MNLCS - mean (95%CI) of world normalised log(1+unique TLDs)   [Population version]: 1.765957 (-1.730638, 5.262553)
   MNLCS - mean (95%CI) of world normalised log(1+unique TLDs)       [Sample version]: 1.765957 (NaN, NaN)
   World normalised proportion non-zero (95%CI) [ie risk ratio]: 1.765957 (0.184564, 16.897165)
 
The table below contains the same information as above and can be cut and pasted into a spreadsheet for convenience.
   ======================================================================================================
   Set (e.g.,Field/Year)	Group	Queries	Arithmetic mean (unique URLs)	Arithmetic mean (unique domains)	Arithmetic mean (unique sites)	Arithmetic mean (unique STLDs)	Arithmetic mean (unique TLDs)	Proportion (95%CI) non-zero	Lower95	Upper95	Mean (95%CI) of log(1+unique URLs)	Lower95	Upper95	Mean (95%CI) of log(1+unique domains)	Lower95	Upper95	Mean (95%CI) of log(1+unique sites)	Lower95	Upper95	Mean (95%CI) of log(1+unique STLDs)	Lower95	Upper95	Mean (95%CI) of log(1+unique TLDs)	Lower95	Upper95	Geometric mean (95%CI) of unique URLs	Lower95	Upper95	Geometric mean (95%CI) of unique domains:	Lower95	Upper95	Geometric mean (95%CI) of unique sites	Lower95	Upper95	Geometric mean (95%CI) of unique STLDs	Lower95	Upper95	Geometric mean (95%CI) of unique TLDs	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+unique URLs)	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+unique domains)	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+unique sites)	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+unique STLDs)	Lower95	Upper95	MNLCS - mean (95%CI) of world normalised log(1+uniqueTLDs)	Lower95	Upper95	EMNPC - world normalised proportion non-zero (risk ratio)	Lower95	Upper95
   Biochemistry Molecular Biology Alcohol 2012 world_wiki	World	498	0.026104	0.024096	0.020080	0.020080	0.020080	0.020080	0.010943	0.036565	0.016125	0.005714	0.026536	0.015547	0.005687	0.025407	0.013919	0.005283	0.022554	0.013919	0.005283	0.022554	0.013919	0.005283	0.022554	0.016255	0.005730	0.026891	0.015668	0.005704	0.025732	0.014016	0.005297	0.022810	0.014016	0.005297	0.022810	0.014016	0.005297	0.022810	1.000000	0.321747	3.108037	1.000000	0.331823	3.013657	1.000000	0.343900	2.907818	1.000000	0.343900	2.907818	1.000000	0.343900	2.907818	1.000000	0.428985	2.331082
   Biochemistry Molecular Biology Alcohol 2012	Spain_wiki	193	0.020725	0.020725	0.020725	0.020725	0.020725	0.020725	0.008089	0.052069	0.014366	0.000255	0.028476	0.014366	0.000255	0.028476	0.014366	0.000255	0.028476	0.014366	0.000255	0.028476	0.014366	0.000255	0.028476	0.014469	0.000255	0.028886	0.014469	0.000255	0.028886	0.014469	0.000255	0.028886	0.014469	0.000255	0.028886	0.014469	0.000255	0.028886	0.890917	0.015768	3.039885	0.924021	0.016356	3.074937	1.032124	0.018272	3.337906	1.032124	0.018272	3.337906	1.032124	0.018272	3.337906	1.032124	0.346414	3.075162
   Chemistry Alcohol 2012 world_wiki	World	498	0.002008	0.002008	0.002008	0.002008	0.002008	0.002008	0.000355	0.011285	0.001392	-0.001364	0.004148	0.001392	-0.001364	0.004148	0.001392	-0.001364	0.004148	0.001392	-0.001364	0.004148	0.001392	-0.001364	0.004148	0.001393	-0.001363	0.004156	0.001393	-0.001363	0.004156	0.001393	-0.001363	0.004156	0.001393	-0.001363	0.004156	0.001393	-0.001363	0.004156	1.000000	NaN	NaN	1.000000	NaN	NaN	1.000000	NaN	NaN	1.000000	NaN	NaN	1.000000	NaN	NaN	1.000000	0.104375	9.580795
   Chemistry Alcohol 2012	Spain_wiki	282	0.003546	0.003546	0.003546	0.003546	0.003546	0.003546	0.000626	0.019810	0.002458	-0.002409	0.007325	0.002458	-0.002409	0.007325	0.002458	-0.002409	0.007325	0.002458	-0.002409	0.007325	0.002458	-0.002409	0.007325	0.002461	-0.002406	0.007352	0.002461	-0.002406	0.007352	0.002461	-0.002406	0.007352	0.002461	-0.002406	0.007352	0.002461	-0.002406	0.007352	1.765957	NaN	NaN	1.765957	NaN	NaN	1.765957	NaN	NaN	1.765957	NaN	NaN	1.765957	NaN	NaN	1.765957	0.184564	16.897165
   ======================================================================================================
 
The mean Normalised Log-transformed Citation Scores (MNLCS) in the table below are the best to use to compare the group overall with the world average if there are multiple different world averages (e.g., different fields and/or years).
   For each group they are the average of ln(1+c) values, divided by the world average ln(1+c) for the file (e.g., field and year).
   The world average MNLCS should always be 1.
   MNLCS values above 1 indicate that the group average is higher than the world average; MNLCS values below 1 indicate that the group average is lower than the world average
WARNING! MNLCS POPULATION confidence limits below are optimistic because they do not take into account the variability in the world average value.
   - Please use only the MNLCS SAMPLE confidence limits. These are adjusted from the population limits using the weighted average Feiller Expansion calculation.
   - NaN in the Sample confidence limits mean that these are impossible to calculate and are effectively infinite.
   ======================================================================================================
   Group	SampleSize	URLMNLCS	URLLower95Sample	URLUpper95Sample	URLLower95Population	URLUpper95Population	domainMNLCS	domainLower95Sample	domainUpper95Sample	domainLower95Population	domainUpper95Population	siteMNLCS	siteLower95Sample	siteUpper95Sample	siteLower95Population	siteUpper95Population	STLDMNLCS	STLDLower95Sample	STLDUpper95Sample	STLDLower95Population	STLDUpper95Population	TLDMNLCS	TLDLower95Sample	TLDUpper95Sample	TLDLower95Population	TLDUpper95Population
   World	996	1	NaN	NaN	-0.0407826005900878	2.04078260059009	1	NaN	NaN	-0.0390180539676497	2.03901805396765	1	NaN	NaN	-0.0369442738815229	2.03694427388152	1	NaN	NaN	-0.0369442738815229	2.03694427388152	1	NaN	NaN	-0.0369442738815229	2.03694427388152
   Spain_wiki	475	1.41041481914271	NaN	NaN	-0.694482453870358	3.51531209215579	1.4238653112085	NaN	NaN	-0.683270007425786	3.53100062984278	1.46778947368421	NaN	NaN	-0.647218248937197	3.58279719630562	1.46778947368421	NaN	NaN	-0.647218248937197	3.58279719630562	1.46778947368421	NaN	NaN	-0.647218248937197	3.58279719630562
   ======================================================================================================
Overall proportion non-zero calculations - not recommended because biased against groups with more articles in categories with a high world proportion of non-zero values.
   ======================================================================================================
   Group	N	Proportion_Positive	Lower95	Upper95
   world_wiki	996	0.011044	0.006178	0.019668
   Spain_wiki	475	0.010526	0.004504	0.024402
   ======================================================================================================
Field equalised proportion non-zero and EMNPC calculations - all group sample sizes are set to the arithmetic mean group sample size for sets with at least one publication
   ======================================================================================================
   Group	N	AvProportionNonzero	Lower95	Upper95	EMNPC	Lower95	Upper95
   world_wiki	996	0.011044	0.006178	0.019668	1.000000	0.443690	2.253826
   Spain_wiki	475	0.012136	0.005490	0.026609	1.098836	0.417754	2.890315
   ======================================================================================================
 
MNPC calculations - similar to the above (ignore this table if EMNPC is reported above). Confidence intervals are the weighted average of the confidence intervals for each individual field/year set.
   ======================================================================================================
   Group	N	MNPC	Lower95	Upper95	MNPCLower95boot	MNPCUpper95boot	EMNPCLower95boot	EMNPCUpper95boot
   world_wiki	996	1.000000	0.266680	5.955938	1.000000	NaN	1.000000	1.000000
   Spain_wiki	475	1.467789	0.250326	11.281067	0.080648	Infinity	0.234574	3.096373
   ======================================================================================================