Big Data Text Analysis
This is a method to investigate a broad topic (e.g., dancing) with many subtopics (e.g., dance styles) by analysing the words used in comments on relevant videos.
Guideline data size recommendations, based on the number of comments you collect:
The CTFC method comprises both data gathering and analysis. Whilst the complete method involves many different types of analyses, a particular application can ignore irrelevant ones. This method is fully described in an academic paper that is currently being reviewed. This page supports this paper with specific instructions for the software.
Note that Steps 1.4 and 1.6 can now be carried out in Mozdeh instead of Webometric Analyst, using the YouTube data collection tab in the Mozdeh startup wizard.
If you are studying one or more YouTube channels, the first five actions can be skipped: instead, enter the channel ID(s) in the YouTube query interface and select the Channel IDs option in the Data Collection dialog box (the sixth action: Comment downloading).
This completes the creation of a Mozdeh project with predominantly English comments on each video.
This produces a time series graph of all the comments. (a) Start Mozdeh and load the English version of the project. (b) From the Analyse menu, select the Graph Time Series submenu. (c) Enter a blank search and click Create Graph with Boolean Search. (d) To save the graph to a file, click Show Graph Formatting Options, then click Print Graph and select a printer that saves the results to a file (e.g., PDF or Microsoft document format; most computers have one of these, although a commercial product produces particularly good results: www.peernet.com/conversion-software/pdf-to-tiff-converter/). [Select Search from the Analyse menu to return to the main screen.]
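For readers who want to check the graph against the raw data, the counting behind a comment time series can be sketched in a few lines of Python. This is only an illustration and not Mozdeh's implementation: the function name `comments_per_month` and the assumption that each comment carries an ISO 8601 timestamp string are hypothetical.

```python
from collections import Counter
from datetime import datetime

def comments_per_month(timestamps):
    """Count comments per calendar month from ISO 8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # A trailing "Z" is rewritten because datetime.fromisoformat
        # only accepts it from Python 3.11 onwards.
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[dt.strftime("%Y-%m")] += 1
    # Return the months in chronological order.
    return dict(sorted(counts.items()))

# Hypothetical timestamps, for illustration only.
sample = ["2016-01-03T10:00:00Z", "2016-01-20T08:30:00Z", "2016-02-01T12:00:00Z"]
print(comments_per_month(sample))
```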
This produces a list of terms that associate with each subtopic compared to the others. First, start Mozdeh and load the English version of the project. Then complete the following steps for each subtopic. (a) Select the subtopic query from the topic/label box. (b) Enter a blank search and click Boolean Search. (c) Click the Calculate Word Frequencies for all Search Matches button. (d) Copy the results (in a text box on the right of the screen) to a spreadsheet by (i) right clicking in the word frequencies box, (ii) clicking Select All, (iii) right clicking in the word frequencies box, (iv) clicking Copy and (v) switching to the spreadsheet and pasting the text to it.
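The idea of finding terms that associate with one subtopic more than with the others can be illustrated with a small Python sketch. It compares the proportion of comments containing each word inside and outside the subtopic, and ranks over-represented words with a 2x2 Pearson chi-square. The statistic, the tokenisation and all function names here are assumptions made for illustration; the page does not state which measure Mozdeh reports.

```python
import re
from collections import Counter

def doc_freq(comments):
    """Count, for each word, how many comments contain it at least once."""
    counts = Counter()
    for text in comments:
        counts.update(set(re.findall(r"[a-z']+", text.lower())))
    return counts

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def distinctive_terms(subtopic, others, top=10):
    """Rank words over-represented in `subtopic` comments relative to `others`."""
    if not subtopic or not others:
        return []
    f_sub, f_oth = doc_freq(subtopic), doc_freq(others)
    n_sub, n_oth = len(subtopic), len(others)
    scored = []
    for word, a in f_sub.items():
        c = f_oth.get(word, 0)
        # Keep only words that are proportionally more common in the subtopic.
        if a / n_sub > c / n_oth:
            scored.append((chi_square_2x2(a, n_sub - a, c, n_oth - c), word))
    # Highest chi-square first; ties broken alphabetically.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [word for _, word in ranked[:top]]

# Hypothetical comments, for illustration only.
salsa = ["love this salsa spin", "salsa footwork is great", "great salsa music"]
ballet = ["love this ballet", "ballet pointe work is great", "great music"]
print(distinctive_terms(salsa, ballet, top=3))
```

With real data, very rare words would normally be filtered out before ranking, since a chi-square on tiny counts is unreliable.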
For the whole project:
(a) Start Mozdeh and load the English version of the project. (b) Click the Advanced Search Tab if it is not already visible. Select Female from the User Gender drop-down box. (c) Enter a blank search and click Boolean Search. (d) Click the Calculate Word Frequencies for all Search Matches button. (e) Copy the results (in a text box on the right of the screen) to a spreadsheet by (i) right clicking in the word frequencies box, (ii) clicking Select All, (iii) right clicking in the word frequencies box, (iv) clicking Copy and (v) switching to the spreadsheet and pasting the text to it. (f) Repeat the above from (b) with Male selected in the User Gender drop-down box.
Before running the subproject gender analysis (or the subproject sentiment analysis), a subproject data file must be built in Mozdeh for each subproject. Carry out the following steps for each subproject. (a) From the Subprojects menu, click [Use all data – ignore all subprojects]. (b) Enter a blank search in the search box at the top left-hand corner of the screen. (c) Click the Save tab and check the Make Subproject From Search Matches option. (d) Click Boolean Search and then enter a name for the subproject (without quotes) in the dialog box. This creates a file listing all comments associated with the subproject.
For each subproject the process is the same as above except with a subproject selected (new step c):
(a) Start Mozdeh and load the English version of the project. (b) Click the Advanced Search Tab if it is not already visible. Select Female from the User Gender drop-down box. (c) From the Subprojects menu, click Select Subproject. Click on the subproject name (possibly ending in .dat) in the new dialog box. [Until the subproject is changed, all future operations apply only to texts in that subproject.] (d) Enter a blank search and click Boolean Search. (e) Click the Calculate Word Frequencies for all Search Matches button. (f) Copy the results (in a text box on the right of the screen) to a spreadsheet by (i) right clicking in the word frequencies box, (ii) clicking Select All, (iii) right clicking in the word frequencies box, (iv) clicking Copy and (v) switching to the spreadsheet and pasting the text to it. (g) Repeat the above from (b) with Male selected in the User Gender drop-down box.
For the whole project:
(a) Start Mozdeh and load the English version of the project. (b) Click the + button in the sentiment section of the main interface to restrict the results to comments that are at least moderately positive and not moderately negative. (c) Enter a blank search and click Boolean Search. (d) Click the Calculate Word Frequencies for all Search Matches button. (e) Copy the results (in a text box on the right of the screen) to a spreadsheet by (i) right clicking in the word frequencies box, (ii) clicking Select All, (iii) right clicking in the word frequencies box, (iv) clicking Copy and (v) switching to the spreadsheet and pasting the text to it. (f) Repeat the above from (b), this time clicking the – button to the right of the + button.
Conduct Step 4a unless it has already been done.
For each subproject the process is the same as above except with a subproject selected (new step c):
(a) Start Mozdeh and load the English version of the project. (b) Click the Advanced Search Tab if it is not already visible. Click the + button in the sentiment section of the main interface to restrict the results to comments that are at least moderately positive and not moderately negative. (c) From the Subprojects menu, click Select Subproject. Click on the subproject name (possibly ending in .dat) in the new dialog box. [Until the subproject is changed, all future operations apply only to texts in that subproject.] (d) Enter a blank search and click Boolean Search. (e) Click the Calculate Word Frequencies for all Search Matches button. (f) Copy the results (in a text box on the right of the screen) to a spreadsheet by (i) right clicking in the word frequencies box, (ii) clicking Select All, (iii) right clicking in the word frequencies box, (iv) clicking Copy and (v) switching to the spreadsheet and pasting the text to it. (g) Repeat the above from (b), this time clicking the – button to the right of the + button.
This creates networks of subtopics. (a) Start Mozdeh and load the English version of the project. (b) From the Network menu click Make networks of post similarity between labels.
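To give a feel for what a similarity network between labels involves, the sketch below builds one word-count vector per label and links two labels whenever the cosine similarity of their vectors exceeds a threshold. The similarity measure, the threshold and every name in the code are assumptions chosen for illustration; the page does not say how Mozdeh computes post similarity.

```python
import math
import re
from collections import Counter
from itertools import combinations

def word_vector(comments):
    """Total word counts over all comments carrying one label."""
    counts = Counter()
    for text in comments:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

def similarity_edges(comments_by_label, threshold=0.0):
    """Return (label_a, label_b, similarity) for each pair above the threshold."""
    vectors = {label: word_vector(c) for label, c in comments_by_label.items()}
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score > threshold:
            edges.append((a, b, round(score, 3)))
    return edges

# Hypothetical labelled comments, for illustration only.
data = {
    "salsa": ["great spin"],
    "tango": ["great spin"],
    "ballet": ["pointe shoes"],
}
print(similarity_edges(data))
```

The resulting edge list can be loaded into any network drawing tool; labels with no shared vocabulary simply remain unconnected.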
Step 1, part 4, original method with Webometric Analyst: Video list generation: (a) Enter the subtopic queries into a plain text file, one per line. (b) Download Webometric Analyst from http://lexiurl.wlv.ac.uk to a Windows computer. (c) Sign up for a YouTube API key, which gives permission to submit automatic YouTube queries, at http://lexiurl.wlv.ac.uk/searcher/YouTubeKeyRegister.html. (d) Start Webometric Analyst, close the Wizard, and select the YouTube tab in the main interface. (e) Click Search for Videos Matching Each Query in File, select the plain text file of subtopic queries and wait for it to finish. Delete all of the files produced except the one of title matches (i.e., where the queries match the title of the videos).
Step 1, part 6, original method with Webometric Analyst: Comment downloading: (a) Start Webometric Analyst, close the Wizard, and select the YouTube tab in the main interface. (b) Click Get YouTube Comments for List of Video IDs, select the filtered file of query title matches and wait for YouTube to deliver all the comments. This may take days. Note: this can now all be done in Mozdeh instead of Webometric Analyst.
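If the subtopic queries are already held in a list or spreadsheet, the one-query-per-line file from step (a) can be produced with a short script rather than by hand. The following is a minimal sketch: the function name `write_query_file`, the file name and the choice to drop blank and duplicate queries are all assumptions, not requirements stated by Webometric Analyst.

```python
import os
import tempfile
from pathlib import Path

def write_query_file(queries, path):
    """Write one query per line, skipping blank and duplicate entries."""
    cleaned = []
    for query in queries:
        query = " ".join(query.split())  # collapse stray whitespace
        if query and query not in cleaned:
            cleaned.append(query)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned

# Hypothetical subtopic queries, written to a temporary folder for illustration.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "queries.txt")
    print(write_query_file(["salsa  dancing", "", "tango", "salsa dancing"], target))
```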
Step 1, part 7, original method with Webometric Analyst: (a) Start Mozdeh to get as far as the New Project screen before closing it. This configures folders on your computer. (b) Start Webometric Analyst, close the Wizard, and select the YouTube tab in the main interface. (c) Click Convert YouTube Comments to Mozdeh Format, select the filtered file of query title matches (filename ending in TM.txt or TM), and select the option to process a maximum of one comment per user. This exports the comments to a Mozdeh project.
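The effect of processing a maximum of one comment per user can be pictured with the sketch below, which keeps the first comment seen from each user and discards the rest. Which comment Webometric Analyst actually retains is not stated on this page, so keeping the first is an assumption, as are the function name and the (user, text) data layout.

```python
def one_comment_per_user(comments):
    """Keep only the first comment encountered from each user.

    `comments` is an iterable of (user_id, text) pairs.
    """
    seen = set()
    kept = []
    for user, text in comments:
        if user not in seen:
            seen.add(user)
            kept.append((user, text))
    return kept

# Hypothetical comments, for illustration only.
raw = [("u1", "great video"), ("u2", "nice"), ("u1", "watching again")]
print(one_comment_per_user(raw))
```

Limiting each user to one comment stops a few prolific commenters from dominating the word frequency results.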
Step 1, part 8, original method with Webometric Analyst: Start Mozdeh, select the new project of YouTube comments created by Webometric Analyst and wait for Mozdeh to ingest the comments ready for analysis.
Made by the University of Wolverhampton during the CREEN and CyberEmotions EU projects and updated at the University of Sheffield.