Creating networks and key term/user lists from Twitter (Advanced)
The basic (much simpler) instructions are here.
**Please see also a PowerPoint presentation from Kim Holmberg summarising ways of creating Twitter Networks in Webometric Analyst (please save as filetype .pptx rather than .zip to view it)**
These notes summarise how to identify the key terms and users in a set of Twitter data and how to create two types of network from the tweets collected. The notes describe how to create the networks from tweets that you have already collected, so the first step is to collect the data using Webometric Analyst (see below for important advice) or another tool. The notes here describe how to create the following types of information.
- Lists of the most important words, #hashtags or @users
- Co-mention networks for words, #hashtags or @users
- Direct tweet networks of who tweets whom
Summary of key steps
- Collect tweets
- Identify key terms and/or users for the tweets
- Create the lists
- Create the networks
Collecting tweets
The tweets should have been collected either by listing users to follow or keywords to search for in tweets, or both. It is no longer possible to download tweets with Webometric Analyst, but it still has some functions for analysing data collected before Twitter/X stopped free data sharing.
Identifying key terms and/or users for the tweets
Webometric Analyst can identify the most important words, hashtags and users based upon their relative frequency in texts for each label. For each label, the most important terms are those that occur frequently for that label and rarely for other labels. Webometric Analyst uses the chi-squared metric to estimate the importance of terms. It will produce a list of terms for each label and their importance rating, as follows.
- Using Windows Notepad or otherwise (not a word processor), create a plain text file listing the labels that you want to analyse for your data, one per line. For instance, this might be the queries used or labels for groups of queries. These should exactly correspond to the labels in the label column of your data file.
- Start Webometric Analyst, close the Wizard by clicking the top-right hand corner X sign, and select menu item: Tab-sep. text| Calculate chi-squared of all words in col n by categories in col m in plain text tab-separated [e.g., for multiple twitter searches]
- Follow the Webometric Analyst instructions about file selection, column numbers etc. carefully and wait for it to finish. Make sure that you say Yes to including hashtags and @ symbols in the results.
- Load the resulting file into a spreadsheet.
- Sort the file on the chi-squared column in descending numerical order for the label that you are interested in. (e.g., in Excel, select all the columns in the worksheet, then right click the appropriate chi-squared column and select Sort descending from the right-click menu OR select the sort button from the Data menu tab, select the appropriate chi-square column, and choose Z-A sort.)
- The most important hashtags, users and terms will be listed in descending order at the top of the file, mixed together. You will have to manually pick out the ones that you need. For example:
- to identify the top 50 users, select the first 50 terms starting with @
- to select the top 50 hashtags, select the first 50 terms starting with #
- to identify the top 50 terms, select the first 50 terms not starting with @ or #.
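The scoring and selection steps above can be sketched in Python. This is a minimal illustration only, assuming the standard Pearson chi-squared statistic on a 2x2 table (texts with or without the term, for this label or for the other labels). The exact calculation inside Webometric Analyst may differ, and the names `chi_squared_by_label` and `split_top_terms` are hypothetical.

```python
from collections import Counter


def chi_squared_by_label(texts_by_label):
    """Return {label: {term: chi2}} from a 2x2 table of texts containing each term."""
    # Count, for each label, how many texts contain each term (once per text).
    doc_freq = {label: Counter() for label in texts_by_label}
    for label, texts in texts_by_label.items():
        for text in texts:
            doc_freq[label].update(set(text.lower().split()))

    total_texts = sum(len(texts) for texts in texts_by_label.values())
    total_freq = Counter()
    for counts in doc_freq.values():
        total_freq.update(counts)

    scores = {}
    for label, texts in texts_by_label.items():
        n_label = len(texts)
        n_other = total_texts - n_label
        scores[label] = {}
        for term, a in doc_freq[label].items():
            b = total_freq[term] - a  # other-label texts containing the term
            c = n_label - a           # this-label texts without the term
            d = n_other - b           # other-label texts without the term
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            chi2 = total_texts * (a * d - b * c) ** 2 / denom if denom else 0.0
            scores[label][term] = chi2
    return scores


def split_top_terms(term_scores, n=50):
    """Split terms ranked by descending score into top-n users, hashtags and words."""
    ranked = sorted(term_scores, key=term_scores.get, reverse=True)
    users = [t for t in ranked if t.startswith("@")][:n]
    hashtags = [t for t in ranked if t.startswith("#")][:n]
    words = [t for t in ranked if not t.startswith(("@", "#"))][:n]
    return users, hashtags, words


# Made-up example data: two labels with three short texts each.
example = {
    "cats": ["#cats are great", "i love #cats", "my cat sleeps"],
    "dogs": ["#dogs are great", "i love #dogs", "my dog barks"],
}
scores = chi_squared_by_label(example)
users, hashtags, words = split_top_terms(scores["cats"], n=2)
```

Note that chi-squared is symmetric: a term that is unusually rare for a label can also score highly, so it is worth checking that a top-ranked term really is frequent for the label of interest.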
If you only have one set of queries then it is impossible to identify the most important terms because a comparison is needed for this. Instead you can follow the above procedure but sort on the raw term frequencies instead of the chi-squared values. The top terms will be common words like "it" and "the" and you will have to manually identify topic-related words from the sorted list.
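For the single-label case, a raw frequency count can be sketched as follows. This is an illustration rather than what Webometric Analyst does internally; `raw_term_frequencies` and the tiny `STOPWORDS` set are hypothetical, and dropping stop words is an optional shortcut that does not replace checking the list by hand.

```python
from collections import Counter

# Illustrative stop list only; a real one would be much longer.
STOPWORDS = {"it", "the", "a", "is", "at", "and", "to", "of"}


def raw_term_frequencies(texts, drop_stopwords=False):
    """Return (term, count) pairs sorted by descending raw frequency."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    if drop_stopwords:
        for word in STOPWORDS:
            counts.pop(word, None)
    return counts.most_common()


# Made-up example texts.
sample = ["the cat sat", "the cat ran", "the dog"]
all_terms = raw_term_frequencies(sample)
topic_terms = raw_term_frequencies(sample, drop_stopwords=True)
```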
Example: The spreadsheet here covers words extracted from different scholarly disciplines or fields.
Creating co-mention networks for words, #hashtags or @users
Co-mention networks are networks based upon how often words, hashtags or users co-occur in tweets. For instance, if the data set contains the following three tweets:
- [From @rajvordeman] Hello @mikethelwall and @amitabhbachchan
- [From @mikethelwall] @rajvordeman is @amitabhbachchan at #work?
- [From @mikethelwall] @amitabhbachchan are you at #work or #pub?
Then in terms of the three types of co-mention:
- There is one co-mention of @mikethelwall and @amitabhbachchan (tweet 1) and one co-mention of @rajvordeman and @amitabhbachchan (tweet 2)
- There is one co-mention of #work and #pub (tweet 3)
- There are many co-mentions of ordinary words (e.g., "is" and "at" in tweet 2).
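Co-mention counting for the three example tweets can be sketched in Python. This is a minimal illustration under two assumptions: only names and hashtags inside the tweet text count (the sender is ignored), and each pair counts at most once per tweet. The function name `co_mentions` is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Text of the three example tweets, without the sender.
tweets = [
    "Hello @mikethelwall and @amitabhbachchan",
    "@rajvordeman is @amitabhbachchan at #work?",
    "@amitabhbachchan are you at #work or #pub?",
]


def co_mentions(texts, pattern):
    """Count unordered pairs of pattern matches that occur in the same text."""
    counts = Counter()
    for text in texts:
        # Deduplicate and sort so each pair has one canonical order.
        items = sorted(set(re.findall(pattern, text.lower())))
        counts.update(combinations(items, 2))
    return counts


user_pairs = co_mentions(tweets, r"@\w+")
hashtag_pairs = co_mentions(tweets, r"#\w+")
```

Here `hashtag_pairs` contains the single pair `("#pub", "#work")` with a count of 1, from tweet 3.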
To obtain a co-mention network of tweets, click the
ALTERNATIVE METHOD: To obtain co-mention networks of any type complete the following instructions.
- Using Windows Notepad or a more powerful alternative such as Notepad Plus Plus (but not a word processor), create a plain text file listing all the words/users/hashtags that you would like to calculate co-mentions for. This should normally be the top 50 words/users/hashtags extracted using the section above.
- If you haven't done it already, create the plain text label file, as described in the section above.
- Start Webometric Analyst, close the Wizard by clicking the top-right hand corner X sign, and select menu item: Tab-sep. text| Make co-occurrence matrix and Pajek file for word list or user list using col m of file [e.g., after listing the top words/tweeters/hashtags above]
- Follow the Webometric Analyst instructions about file selection, column numbers etc. carefully and wait for it to finish. Make sure that you say Yes to including hashtags and @ symbols in the results.
- This produces two files, a matrix and a Pajek network file.
- To read the matrix, load it into a spreadsheet.
- To view the network, either load it into Pajek (if you are familiar with it) or follow the instructions below to load it into Webometric Analyst's network drawing part. For the latter, start Webometric Analyst, close the Wizard by clicking the top-right hand corner X sign, and select menu item: File| Draw Network and then load the network either using the File|Load command or by dragging the filename onto the title bar.
- To arrange the network neatly, select the menu item: Layout|Automatic.
- To colour in the network very approximately by community, select menu item Partition|Colour network by Community|As many communities as possible.
- More information about drawing networks is available.
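For readers who want to see what a Pajek network file contains, the sketch below writes pair counts in the basic Pajek .net layout: a `*Vertices` section listing numbered node labels, followed by `*Edges` (undirected) or `*Arcs` (directed) lines of the form "source target weight". The helper `to_pajek` is hypothetical, and the file that Webometric Analyst produces may include extra attributes not shown here.

```python
def to_pajek(pair_counts, directed=False):
    """Return Pajek .net text for a {(node_a, node_b): weight} mapping."""
    nodes = sorted({node for pair in pair_counts for node in pair})
    # Pajek numbers vertices from 1.
    index = {node: i for i, node in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[node]} "{node}"' for node in nodes]
    lines.append("*Arcs" if directed else "*Edges")
    for (source, target), weight in sorted(pair_counts.items()):
        lines.append(f"{index[source]} {index[target]} {weight}")
    return "\n".join(lines) + "\n"


# Made-up example: one undirected link between two hashtags.
pajek_text = to_pajek({("#pub", "#work"): 1})
```

The resulting text can be saved with a .net extension and opened in Pajek.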
Creating direct tweet networks for @users
Direct tweet networks are based upon how often tweets from one @user contain the names of other @users. For example, consider the tweets:
- [From @rajvordeman] Hello @mikethelwall and @amitabhbachchan
- [From @mikethelwall] @rajvordeman is @amitabhbachchan at #work?
- [From @mikethelwall] @amitabhbachchan are you at #work or #pub?
Then in terms of messaging:
- @rajvordeman -> @mikethelwall has a count of 1 (tweet 1)
- @rajvordeman -> @amitabhbachchan has a count of 1 (tweet 1)
- @mikethelwall -> @rajvordeman has a count of 1 (tweet 2)
- @mikethelwall -> @amitabhbachchan has a count of 2 (tweets 2 and 3)
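Direct tweet counts for the three example tweets can be sketched in Python. This is an illustration only, assuming that every @user named in a tweet's text counts as a target of the sender and that each target counts at most once per tweet. The function name `direct_tweet_counts` is hypothetical.

```python
import re
from collections import Counter

# (sender, text) pairs for the three example tweets.
tweets = [
    ("@rajvordeman", "Hello @mikethelwall and @amitabhbachchan"),
    ("@mikethelwall", "@rajvordeman is @amitabhbachchan at #work?"),
    ("@mikethelwall", "@amitabhbachchan are you at #work or #pub?"),
]


def direct_tweet_counts(sender_text_pairs):
    """Count directed (sender, mentioned user) pairs, once per tweet."""
    counts = Counter()
    for sender, text in sender_text_pairs:
        sender = sender.lower()
        for target in set(re.findall(r"@\w+", text.lower())):
            if target != sender:  # ignore self-mentions
                counts[(sender, target)] += 1
    return counts


arcs = direct_tweet_counts(tweets)
```

Because the pairs are ordered, `("@rajvordeman", "@mikethelwall")` and `("@mikethelwall", "@rajvordeman")` are counted separately, which is what makes the resulting network directed.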
To obtain direct tweet networks complete the following instructions.
- Using Windows Notepad or a more powerful alternative such as Notepad Plus Plus (but not a word processor), create a plain text file listing all the @users that you would like to calculate messages between. This should normally be the top 50 @users extracted using the section above although you may have a reason for creating a different list.
- If you haven't done it already, create the plain text label file, as described in the section above.
- Start Webometric Analyst, close the Wizard by clicking the top-right hand corner X sign, and select menu item: Tab-sep. text| Make direct tweet frequency matrix and Pajek file for user list using col m of file [e.g., tweeters list, tweets]
- Follow the Webometric Analyst instructions about file selection, column numbers etc. carefully and wait for it to finish.
- This produces two files, a matrix and a Pajek network file.
- To read the matrix, load it into a spreadsheet.
- To view the network, either load it into Pajek (if you are familiar with it) or follow the instructions below to load it into Webometric Analyst's network drawing part. For the latter, start Webometric Analyst, close the Wizard by clicking the top-right hand corner X sign, and select menu item: File| Draw Network and then load the network either using the File|Load command or by dragging the filename onto the title bar.
- To arrange the network neatly, select the menu item: Layout|Automatic.
- To colour in the network very approximately by community, select menu item Partition|Colour network by Community|As many communities as possible. Alternatively, if you have logical categories for the nodes already, colour in the nodes yourself using a different colour for each category. Right-clicking on a node gives a menu that includes an option to colour a node.
- More information about drawing networks is available.
Example: The network below was created from digital humanities tweets, using the option to ignore @ and # symbols when processing the data, so hashtags, usernames and keywords are all mixed together. The network was drawn with Webometric Analyst and manually tidied up by moving nodes around to make the pattern clearer and by recolouring the digitalhumanities node from blue to red.