Query Splitting

What is it? Query splitting is a technique for getting additional results beyond the usual maximum of 1000 results that a search engine returns per query. Most search engines, including Bing, stop after returning 1000 matches for a search, and this is sometimes not enough for webometric purposes. In practice, a search engine with over 1000 matches sometimes stops at a figure somewhere between 850 and 1000, so even if a query returns only 850 results there may be significantly more in the search engine's database.

How does it work? (Level 1) If query splitting is enabled in Webometric Analyst and a query returns over 850 results, then Webometric Analyst suspects that there may be many additional results not returned by the search engine. It automatically constructs two logically disjoint sub-queries, submits them, and combines the results of all three queries (the original and the two sub-queries). If the level is higher than 1 then this process is repeated on the sub-queries for each additional level.
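The process above can be sketched in a few lines of Python. This is only an illustration and not Webometric Analyst's actual code: the search callable, the list of split terms and the "term" / "-term" syntax for building the two disjoint sub-queries are all assumptions made for the example, as is the choice to apply the 850-result test again to each sub-query.

```python
from typing import Callable, List, Set

THRESHOLD = 850  # from the text: more results than this suggests truncation


def split_search(query: str,
                 search: Callable[[str], List[str]],
                 split_terms: List[str],
                 level: int) -> Set[str]:
    """Return the URLs found for `query`, splitting up to `level` times.

    `search` is a hypothetical function that submits one query and returns
    the list of URLs the search engine gives back (at most about 1000).
    `split_terms` is a hypothetical list of words used to split the query.
    """
    results = search(query)
    urls = set(results)  # a set removes duplicate URLs automatically
    if level <= 0 or len(results) <= THRESHOLD or not split_terms:
        return urls
    term, remaining = split_terms[0], split_terms[1:]
    # Two logically disjoint sub-queries: pages containing the term and
    # pages not containing it (assumed query syntax).
    for sub_query in (f"{query} {term}", f"{query} -{term}"):
        urls |= split_search(sub_query, search, remaining, level - 1)
    return urls
```

Because a page either contains the split term or does not, the two sub-queries cannot match the same page, so between them they can return up to twice as many distinct URLs as the original query alone.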

What are the disadvantages? Level n query splitting uses up to 2^(n+1)-1 times as many queries as normal searching, so it takes longer and may use up your 5000 searches before you finish your task. For example, level 1 query splitting uses up to 3 times as many searches, level 2 up to 7 times as many and level 3 up to 15 times as many.
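The arithmetic can be checked with a short Python sketch. The function names are invented for this example; the 5000-search quota and the cap of 1000 results per query are the figures quoted on this page, and the numbers are worst cases that assume every query gets split.

```python
def queries_used(level: int) -> int:
    """Worst-case number of queries submitted for one search at this level."""
    return 2 ** (level + 1) - 1


def max_results(level: int, per_query_cap: int = 1000) -> int:
    """Upper bound on results if every query returned the full cap."""
    return queries_used(level) * per_query_cap


def searches_within_quota(level: int, quota: int = 5000) -> int:
    """How many original searches fit in the quota in the worst case."""
    return quota // queries_used(level)


for level in (1, 2, 3):
    print(level, queries_used(level), max_results(level),
          searches_within_quota(level))
# prints:
# 1 3 3000 1666
# 2 7 7000 714
# 3 15 15000 333
```

So with level 3 splitting, a quota of 5000 searches could be exhausted after as few as 333 original searches.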

How do I use query splitting in the Wizard? If you are using the Wizard, click the advanced option and select a query splitting level of 2 when asked. This will give up to 7,000 results per search (probably closer to 4,000 though). If you really need lots of extra results then select a query splitting level of 3, which should give roughly 7,000-15,000 results but uses lots of searches per query, so you may run out of uses of your Azure key before your searches finish.

How do I use query splitting in the classic interface? If you are using the classic interface, follow these steps to get lots of information about a single search:

  1. Start Webometric Analyst and select the option to go to the Classic Interface.
  2. Look on the right for the Advanced section and click the Extended URL lists tab.
  3. Check the Use Query Splitting... option (the default is the maximum level of query splitting, which can take a long time and many queries). If you just want double or quadruple the results, change the "max splits..." option to 1 or 2.
  4. Click Run searches in a File and select the plain text file containing the searches (as normal for the classic interface).
  5. Once the searches have finished, the results will all be in a plain text file with a similar name to the original file. If you need the results summarised and duplicate URLs eliminated (there will be many duplicates in the results) then please use the Reports menu, "make a set of.. " option.

Please note that query splitting takes up a lot of searches. The programme will stop when the query limit has been reached and can be restarted after a month, so please be careful with this feature and only use it if really necessary.

A published technical report about query splitting: