- Start by downloading Webometric Analyst from http://lexiurl.wlv.ac.uk/ and signing up for a Windows Azure account key, following the instructions on that site.
- Select a set of up to 20 queries to test (with more than 20 you may have to pay Bing!). You could pick random keywords from the list of words from English tweets in 2012 here, or you may have your own idea about which searches you would like to test. Use Windows Notepad to create a plain text file containing the queries, one per line (e.g., here).
- Select a set of search markets to test. You could pick the complete set of English search markets (e.g., here) or select from the Bing search market list. The list of search markets should be saved into a plain text file created with Windows Notepad, one market per line, and saved without any extra spaces (example).
- Now prepare to run the searches by starting Webometric Analyst, entering your Windows Azure account key, and selecting Classic Interface from the Wizard. In the Search Options menu, check the "Run each query once for each search market…" option. Also in the Search Options menu, select the Change Search Options menu item to produce a dialog box of search options. In this dialog box, check "Disable Location Detection" (to stop Bing from customising the search results to your current geographic location) and click OK.
- Now run the searches (via the Bing API) by clicking on the Run All Searches in File button and, when requested, selecting the file of queries and the file of search markets. It may take an hour or so to complete all the searches.
- To analyse the overlap between searches and to count how often each URL appears in the results of the different searches, select "Compare Overlap and Order for two or more Long Results Files…" from the Utilities menu of the Webometric Analyst classic interface, then select the folder containing the files. The folder should contain only the long results files (i.e., those with "long results" in their filename); move any other files into a different folder. The results produced will be the following:
- Overlap10: (not needed) a table of the overlap size in top 10 results between all possible pairs of search markets.
- Overlap10_matrix: a matrix of the overlap size in top 10 results between all possible pairs of search markets (same information as Overlap10 but in matrix format).
- Overlap_10JaccardDist: a matrix of Jaccard distances between top 10 results of the different markets.
- OverlapTop10URLs: a table of all URLs in the top 10 results of each query, with one column per search market containing a 1 if the URL appears in the results for that search market.
- OverlapAll: (not needed) a table of the overlap size in the full set of result URLs between all possible pairs of search markets.
- OverlapAll_matrix: a matrix of the overlap size in the full set of result URLs between all possible pairs of search markets (same information as OverlapAll but in matrix format).
- OverlapAllURLs: a table of all URLs in the full set of result URLs for each query, with one column per search market containing a 1 if the URL appears in the results for that search market.
- Overlap_AllJaccardDist: a matrix of Jaccard distances between all the results of the different markets.
- Analyse the results with your choice of spreadsheet and/or statistics program.
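
For readers who want to check the Jaccard distance matrices by hand, or recompute them in their own analysis, the following is a minimal Python sketch of the standard Jaccard distance between two sets of result URLs (one minus the size of the intersection divided by the size of the union). It is not Webometric Analyst's own code, and the program may differ in details such as how results from different queries are combined. The function names, market codes, and URLs below are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of URLs.

    Two empty sets are treated as identical (distance 0.0);
    this is a convention assumed here, not taken from the tool.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_jaccard(results_by_market):
    """Map each unordered pair of markets to its Jaccard distance."""
    markets = sorted(results_by_market)
    return {
        (m1, m2): jaccard_distance(results_by_market[m1], results_by_market[m2])
        for m1, m2 in combinations(markets, 2)
    }


# Hypothetical result sets: market code -> set of URLs returned for one query.
example = {
    "en-GB": {"a.example/1", "b.example/2", "c.example/3"},
    "en-US": {"a.example/1", "b.example/2", "d.example/4"},
    "en-AU": {"e.example/5"},
}

distances = pairwise_jaccard(example)
for (m1, m2), d in distances.items():
    print(f"{m1} vs {m2}: {d:.2f}")
```

A distance of 0 means two markets returned exactly the same URLs, and a distance of 1 means they shared none. In the hypothetical data above, en-GB and en-US share two of four distinct URLs, giving a distance of 0.5.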