SentiStrength
Automatic sentiment analysis of up to 16,000 social web texts per second with up to human level accuracy for English - other languages available or easily added.
SentiStrength estimates the strength of positive and negative sentiment in short texts, even for informal language. It has human-level accuracy for short social web texts in English, except political texts. SentiStrength reports two sentiment strengths:
-1 (not negative) to -5 (extremely negative)
1 (not positive) to 5 (extremely positive)
Why does it use two scores? Because research from psychology has revealed that we process positive and negative sentiment in parallel - hence mixed emotions.
SentiStrength can also report binary (positive/negative), trinary (positive/negative/neutral) and single scale (-4 to +4) results. SentiStrength was originally developed for English and optimised for general short social web texts but can be configured for other languages and contexts by changing its input files - some variants are demonstrated below.
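As an illustration of how the two strengths relate to the other output types, here is a minimal Python sketch. It is not SentiStrength's own code: the function names are hypothetical, and the rule used (adding the two scores, which yields exactly the -4 to +4 range, and taking the sign for trinary) is an assumption.

```python
def to_single_scale(positive: int, negative: int) -> int:
    """Combine the dual scores into one value in -4..+4 (assumed rule: simple sum)."""
    if not 1 <= positive <= 5:
        raise ValueError("positive strength must be 1..5")
    if not -5 <= negative <= -1:
        raise ValueError("negative strength must be -5..-1")
    return positive + negative


def to_trinary(positive: int, negative: int) -> int:
    """Return 1 (positive), -1 (negative) or 0 (neutral), assuming the sign of the sum decides."""
    total = to_single_scale(positive, negative)
    return (total > 0) - (total < 0)


if __name__ == "__main__":
    print(to_single_scale(3, -1), to_trinary(3, -1))
```

A text scored 1 and -1 (no positive and no negative sentiment) comes out as neutral under this assumed rule.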
Quick Tests (English version):
Other languages: Finnish, German, Dutch, Spanish, Russian, Portuguese, French, Arabic, Polish, Persian, Swedish, Greek, Welsh, Italian, Turkish.
Download SentiStrength
SentiStrength is free for academic research and is certified safe by Softpedia. Please contact the author for the commercial Java version or a commercial license for the online version. The free version runs under Windows only and is provided without liability or guarantees for any uses. Downloading SentiStrength and/or the configuration files signifies acceptance of these conditions. This version does not contain the keyword or domain classification facilities.
- Free SentiStrength Windows download, and SentiStrength Java download (free for academic uses). Make sure to save the program AND the data files (this does not do the binary/trinary/scale classifications).
- If SentiStrength gives an error message when starting, please try downloading and installing the "Microsoft .NET Framework Redistributable Package" and then try running SentiStrength again.
- Remember to use Register New Location in the File menu to point SentiStrength to the location of the data files as soon as it loads, unless they are saved in the default location C:\SentStrength_Data\.
Here is a tutorial on SentiStrength from Prof Laeeq Khan of Ohio University.
Buy SentiStrength
A commercial licence for SentiStrength is available for £1000 - please contact m.thelwall -at- wlv.ac.uk. The Java version of SentiStrength is normally used commercially.
SentiStrength is used by computing, language technology and market research companies in the US, Europe and Australia. Some use the default English version and others have translated it into different languages or adapted it to integrate with their existing language technology systems. Commercial users range from small start-ups to one of the world's top 10 largest corporations.
Java Version
The Java version of SentiStrength is similar to the Windows version in core functions but has additional capabilities - see the SentiStrength Java manual (updated February 2017), the Mac users' starting instructions (which probably also help on Linux) and the command list. In addition to the standard type, it can conduct binary (positive/negative), trinary (positive/neutral/negative) and single-scale (-4 very negative to +4 very positive) classifications, and can conduct keyword-oriented and domain-oriented classifications. It also has a special mode for binary and trinary classification on longer texts. It allows wildcards in the idiom list file. The Java version can be downloaded with the link above. It can process about 16,000 tweets per second.
For RJB users, here is some sample RJB code from Adam Pantanowitz, University of the Witwatersrand.
For Python users, here is some sample Python code from Alec Larsen, University of the Witwatersrand. For Python 3+ try this sample Python code (or text version) variant instead. Unzip this Jupyter Notebook to use.
For GATE users, here is a GATE wrapper to import SentiStrength from Mark Greenwood and Diana Maynard, University of Sheffield. Instructions for using the GATE plugin from Alex ibollit Stepanenko.
For Weka users, there is a SentiStrength wrapper for the AffectiveTweets Weka package by Felipe Bravo-Marquez of Waikato University that can be installed via the Weka package manager.
Thank you to Sooyeon Jeong of MIT for fixing SentiStrength to work for Android apps.
About SentiStrength
SentiStrength is a sentiment analysis (opinion mining) program. It is described and evaluated in the following peer-reviewed academic articles:
- Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2544–2558.
- Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal of the American Society for Information Science and Technology, 63(1), 163-173.
- Thelwall, M., & Buckley, K. (2013). Topic-based sentiment analysis for the Social Web: The role of mood and issue-related words. Journal of the American Society for Information Science and Technology, 64(8), 1608–1617.
- Thelwall, M., Buckley, K., Paltoglou, G., Skowron, M., Garcia, D., Gobron, S., Ahn, J., Kappas, A., Küster, D. & Holyst, J.A. (2013). Damping Sentiment Analysis in Online Communication: Discussions, monologs and dialogs. In: A. Gelbukh (Ed.): CICLing 2013, Part II, LNCS 7817, pp. 1-12. Springer, Heidelberg.
- [Turkish version] G. Vural, B. B. Cambazoglu, P. Senkul, and O. Tokgoz (2013) A framework for sentiment analysis in Turkish: Application to polarity detection of movie reviews in Turkish, Computer and Information Sciences III, pp 437-445.
- Thelwall, M. (2017). Heart and soul: Sentiment strength detection in the social web with SentiStrength (summary book chapter). In: Holyst, J. (Ed.) Cyberemotions: Collective emotions in cyberspace. Berlin, Germany: Springer (pp. 119-134). doi:10.1007/978-3-319-43639-5_7
- [Spanish version] Vilares Calvo, D., Thelwall, M., & Alonso, M.A. (2015). The megaphone of the people? Spanish SentiStrength for real-time analysis of political tweets. Journal of Information Science. [Introduces an improved Spanish sentiment strength detection version of SentiStrength and shows Spanish political tweets tend to amplify news stories.]
- Thelwall, M. (2018). Gender bias in sentiment analysis. Online Information Review, 42(1), 45-57.
- Culpeper, J., Findlay, A., Cortese, B. & Thelwall, M. (2018). Measuring emotional temperatures in Shakespeare’s drama. English Text Construction, 11(1), 10-37. [early modern English version]
It has been applied in the following research projects, amongst others.
- Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2), 406-418.
- Kucuktunc, O., Cambazoglu, B.B., Weber, I., & Ferhatosmanoglu, H. (2012). A large-scale sentiment analysis for Yahoo! Answers, Proceedings of the 5th ACM International Conference on Web Search and Data Mining. [Used in Yahoo!]
- Weber, I., Ukkonen, A., & Gionis, A. (2012). Answers, not links: extracting tips from yahoo! answers to address how-to web queries, Proceedings of the fifth ACM international conference on Web search and data mining (WSDM '12). [Used in Yahoo!]
- Pfitzner, R., Garas, A., & Schweitzer, F. (2012). Emotional divergence influences information spreading in Twitter, ICWSM-12.
- Garas, A., Garcia, D., Skowron, M., & Schweitzer, F. (2012). Emotional persistence in online chatting communities. Scientific Reports, 2, article 402.
- Mihai Grigore and Christoph Rosenkranz (2011). Increasing the willingness to collaborate online: an analysis of sentiment-driven interactions in peer content production. ICIS 2011 Proceedings. Paper 20.
- G. Vural, B. B. Cambazoglu, and P. Senkul (2012). Sentiment-focused web crawling, Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp 2020-2024.
- Giorgos Giannopoulos, Ingmar Weber, Alejandro Jaimes, Timos Sellis (2012). Diversifying User Comments on News Articles, Web Information Systems Engineering (WISE 2012). Lecture Notes in Computer Science, pp 100-113.
- From Greg Merritt: New Cities Foundation (2012), Connected Commuting: Research and Analysis on the New Cities Foundation Task Force in San Jose (SentiStrength is mentioned on page 16). See also Crowdsourcing your Commute (New York Times).
- Witherspoon, C. & Stone, D. (2013). Analysis and Sentiment Detection in Online Reviews of Tax Professionals: A Comparison of Three Software Packages, Journal of Emerging Technologies in Accounting, 10(1), 89-115.
- Zheludev, I., Smith, R., & Aste, T. (2014). When can social media lead financial markets? Scientific Reports, 4. doi:10.1038/srep04213.
- Durahim, A. O., & Coşkun, M. (2015). #iamhappybecause: Gross National Happiness through Twitter analysis and big data. Technological Forecasting and Social Change, 99, 92-105. [Uses the Turkish version]
- Calefato, F., Lanubile, F. & Novielli, N. (2018). How to ask for technical help? Evidence-based guidelines for writing questions on Stack Overflow. Information and Software Technology Journal, 94, 186-207.
- Livas, C., Delli, K., & Pandis, N. (2018). “My Invisalign experience”: content, metrics and comment sentiment analysis of the most popular patient testimonials on YouTube. Progress in Orthodontics, 19(1), 3.
- Culpeper, J., Archer, D., Findlay, A. & Thelwall, M. (2018). John Webster, the dark and violent playwright? ANQ: A Quarterly Journal of Short Articles Notes and Reviews, 31(3), 201-210. doi:10.1080/0895769X.2018.1445515
- Orasan, C. (2018). Aggressive Language Identification Using Word Embeddings and Sentiment Features. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), p. 113 - 119, Santa Fe, USA, August 25. [Software used]
- Baviera, T., Sampietro, A., & García-Ull, F. J. (2019). Political conversations on Twitter in a disruptive scenario: The role of “party evangelists” during the 2015 Spanish general elections. The Communication Review, 22(2), 117-138.
- Toussaint, P. A., Renner, M., Lins, S., Thiebes, S., & Sunyaev, A. (2022). Direct-to-Consumer Genetic Testing on Social Media: Topic Modeling and Sentiment Analysis of YouTube Users' Comments. JMIR Infodemiology, 2(2), e38749.
SentiStrength is part of various social media analysis software suites, including:
- Mozdeh Big data text analysis (Windows)
- COSMOS Open Data Analytics software (multi-platform)
- Chorus data harvesting and visual analytics suite for Twitter (Windows)
Press coverage and initiatives
- Reading the Riots: Investigating England's Summer of Disorder Guardian online.
- SentiStrength classified London Olympics tweets with the results put up in lights on the EDF Energy London Eye. YouTube videos: Barge 1, Barge 2, Inside Barge, LondonEye lightshow.
- Time Magazine: Want to Light Up the London Eye? Just Tweet That the Olympics Are 'Totes Amazeballs', July 27, 2012.
- Wall Street Journal: Hidden Message in the London Eye.
- UK Daily Telegraph article, p. 27, 19 July 2012, "Happy Olympic tweeters to light up London Eye" in "the world's first social media driven light show".
- BBC News Article: 20 July 2012, London Eye Olympic Twitter positivity lightshow launched.
- UK Daily Mirror article, 20 July 2012, The mood of the nation: Tweets to power spectacular London 2012 light show.
- Sydney Morning Herald: London Eye to become giant Twitter mood ring during Olympics, July 25, 2012.
- Voice of Russia: Twitter embraces Olympics, colors tweets in emotions and sends messages from space, July 24, 2012.
- SportPrimeur, 20 July 2012, London Eye twitter-uitlaatklep tijdens Spelen.
- SentiStrength classified tweets for the 2014 Super Bowl, with the results transformed into a lightshow on the Empire State Building.
- NBC New York, Jan 28, 2014: Fans to Pick Colors in Empire State Building Super Bowl Light Shows.
- CBS Local, Jan 27, 2014: Start Tweeting! Super Bowl Debates To Light Up Empire State Building.
- Forbes Jan 30, 2014: Verizon's Super Bowl Scheme Is To Save $4 Million And Light Up The Sky.
- LA Times Jan 30, 2014: Empire State Building lights up for Super Bowl, Chinese New Year.
- Belfast Telegraph, Jan 29, 2014: Empire State building lights up in Super Bowl team colors.
- Fox CT, Jan 29, 2014: #WhosGonnaWin? Tweets Pick Colors Of Empire State Building.
Classifying texts with SentiStrength
To get SentiStrength to classify one or more texts, put the texts into a plain text file with one text per line. Select Analyse All Texts in File from the Sentiment Strength Analysis menu and select the text file. The output will be a copy of the file with positive and negative classifications added at the end of each line, preceded by tabs. Individual texts can also be classified by selecting Analyse One Text from the Sentiment Strength Analysis menu. Sets of text files can also be processed.
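The file handling described above can be sketched in Python. This is only an illustration: the helper names are hypothetical, and the output layout assumed here (text, then a tab, the positive score, another tab, the negative score) is inferred from the description and should be checked against a real output file.

```python
def to_input_lines(texts):
    """Return one line per text; inner tabs and line breaks are collapsed to single spaces."""
    return [" ".join(text.split()) for text in texts]


def parse_output_line(line):
    """Split an output line into (text, positive, negative).

    Assumes the two classifications are the last two tab-separated fields,
    positive first; adjust if the real output differs.
    """
    text, positive, negative = line.rstrip("\n").rsplit("\t", 2)
    return text, int(positive), int(negative)


if __name__ == "__main__":
    # Write the input file that would then be selected in the menu.
    with open("texts.txt", "w", encoding="utf-8") as handle:
        handle.write("\n".join(to_input_lines(["I love it!", "Not\tgood\nat all"])) + "\n")
```

Splitting from the right keeps any tabs that remain inside the text itself from being mistaken for score separators.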
Optimising SentiStrength term weights
The positive and negative term weights can be found in the EmotionLookupTable.txt file in the SentStrength_Data folder. These can be manually adjusted by editing the file. Alternatively, they can be automatically fine-tuned with a classified text collection. To fine-tune the EmotionLookupTable.txt values used by SentiStrength, first create a collection of texts that have been classified by humans with positive (1-5) and negative (1-5) sentiment strengths. Put these into a plain text file in which each line has the format: positive – tab – negative – tab – text (negative sentiment can be entered as either positive or negative integers). The set should contain at least 500 texts. Select Optimise the emotion dictionary weights from the Sentiment Strength Analysis menu and SentiStrength will create a new term strength list that is optimised for the sentiment in the new texts. To use the new strengths, save a copy of the original strength list and then replace it with the new list.
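A training file in the positive, tab, negative, tab, text layout could be produced with a short Python sketch like the one below. The function names are hypothetical, and the 500-text check simply mirrors the recommendation above rather than any rule enforced by SentiStrength.

```python
def format_training_line(positive, negative, text):
    """Build one 'positive<TAB>negative<TAB>text' line from a human classification."""
    if not 1 <= positive <= 5:
        raise ValueError("positive strength must be 1..5")
    if not 1 <= abs(negative) <= 5:
        raise ValueError("negative strength must be 1..5 or -1..-5")
    clean_text = " ".join(text.split())  # keep each text on a single line without tabs
    return f"{positive}\t{negative}\t{clean_text}"


def build_training_file(rows):
    """Join (positive, negative, text) rows into the content of a plain text file."""
    return "\n".join(format_training_line(*row) for row in rows) + "\n"


def has_enough_texts(rows, minimum=500):
    """Check the recommended minimum size of the human-classified set."""
    return len(rows) >= minimum


if __name__ == "__main__":
    rows = [(3, -1, "so happy today"), (1, -4, "this is awful")]
    print(build_training_file(rows), has_enough_texts(rows))
```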
Assessing the accuracy of SentiStrength
To assess the accuracy of SentiStrength on a set of texts, a sample must first be classified and formatted as above. The human classifications can then be compared with the SentiStrength classifications on the same sample.
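One way to carry out this comparison outside the program is sketched below in Python. The choice of measures (exact-match accuracy and Pearson correlation) is an assumption made for illustration; the function names are hypothetical, and the same functions would be applied separately to the positive and the negative scores.

```python
from math import sqrt


def exact_accuracy(human, predicted):
    """Proportion of texts where the program's score equals the human score."""
    if len(human) != len(predicted) or not human:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(h == p for h, p in zip(human, predicted)) / len(human)


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)  # undefined if either list is constant


if __name__ == "__main__":
    human_positive = [1, 2, 3, 4, 5]
    program_positive = [1, 2, 3, 4, 4]
    print(exact_accuracy(human_positive, program_positive))
    print(pearson(human_positive, program_positive))
```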
Alternatively, if one data set is available to optimise the word strength list and the same set is to be used for validation then the 10-fold cross-validation procedure can be used. This uses 90% of the data to train the term weights and the remaining 10% to assess the accuracy of the adjusted weights. This is repeated 10 times with a different 10% left out and the total results are reported. To run a 10-fold cross-validation, create the classified text as above and select Run a 10-fold cross-validation to assess the above algorithm from the Sentiment Strength Analysis menu.
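The 90%/10% partitioning behind a 10-fold cross-validation can be illustrated with a short Python sketch. It shows only how the folds are formed; the weight training itself happens inside SentiStrength and is not reproduced here. The function name and the fixed shuffle seed are hypothetical choices.

```python
import random


def ten_fold_splits(items, folds=10, seed=0):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    for k in range(folds):
        test = shuffled[k::folds]
        train = [item for i, item in enumerate(shuffled) if i % folds != k]
        yield train, test


if __name__ == "__main__":
    for train, test in ten_fold_splits(range(100)):
        print(len(train), len(test))
```

With 100 classified texts, each round trains on 90 and tests on the remaining 10, and every text is used for testing exactly once.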
Extra resources:
- Six sets of at least 1000 human coded texts, each coded by three independent coders.
- Coding manual for sentiment in texts (pdf version) (for Tweets, but easily changed for other texts)
- SentiStrength based 6-hour sentiment analysis course.
Support files
The various files with SentiStrength contain information used in the algorithm and may be customised.
- The EmotionLookUpTable is just a list of emotion-bearing words, each one with the word then a tab, then an integer 1 to 5 or -1 to -5. This can be edited and extended. Note that strengths of +1 and -1 have no effect on the program. There are some in the list, just to indicate that these words have been considered but not used. Each word can end with a wild card * but this can only go at the end.
- The EmoticonLookUpTable is as above but for a list of emoticons.
- EnglishWordList.txt is just a list of English words - it is used for the part of the algorithm that tries to correct words with non-standard spellings.
- NegatingWordList.txt reverses the polarity of subsequent words - e.g., not happy is negative.
- BoosterWordList.txt increases sentiment intensity - e.g., very happy is more positive than happy.
- SlangLookupTable.txt – replaces common slang with equivalent words or expressions
- IdiomLookupTable.txt – overrides the sentiment strength of the individual words in the phrase
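The lookup table layout described in the list above (word, tab, integer strength, optional trailing * wildcard) can be read with a small Python sketch such as the following. This is not SentiStrength's own matching code: the function names are hypothetical, and letting the longest matching wildcard prefix win is an assumption.

```python
def parse_lookup_table(lines):
    """Read 'term<TAB>strength' lines into exact-match and wildcard-prefix dictionaries."""
    exact, prefixes = {}, {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip lines that do not have a term and a strength
        term = parts[0].strip().lower()
        try:
            strength = int(parts[1])
        except ValueError:
            continue  # skip lines whose strength is not an integer
        if term.endswith("*") and len(term) > 1:
            prefixes[term[:-1]] = strength
        elif term and not term.endswith("*"):
            exact[term] = strength
    return exact, prefixes


def term_strength(word, exact, prefixes):
    """Return the listed strength for a word, or None if the word is not covered."""
    word = word.lower()
    if word in exact:
        return exact[word]
    matches = [prefix for prefix in prefixes if word.startswith(prefix)]
    if matches:
        return prefixes[max(matches, key=len)]  # assumption: longest prefix wins
    return None


if __name__ == "__main__":
    # Hypothetical entries, not copied from the real EmotionLookupTable.txt.
    exact, prefixes = parse_lookup_table(["happy\t3", "hat*\t-4", "ok\t1"])
    print(term_strength("hated", exact, prefixes))
```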
Language customisation
SentiStrength can be adjusted for other languages by translating the term list EmotionLookupTable.txt and adding any other sentiment-bearing words that have been omitted. Note that the sentiment scores for terms should be in the range 2 to 5 (positive) or -2 to -5 (negative). A score of +1 or -1 means neutral and neutral terms are ignored. A training corpus in the new language is recommended to help adjust the term weight strengths (see Optimising SentiStrength term weights). You will need the Java version or Windows version 2.2 of SentiStrength to cope with accented characters or characters not found in English as well as some additional linguistic features. International lexicons are here but please note the licence terms.
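Before loading a translated term list, the score ranges given above can be checked with a short Python sketch. The function names and the placeholder terms are hypothetical; only the ranges (2 to 5, -2 to -5, and +1/-1 as neutral) come from the text.

```python
def classify_lexicon_entry(strength):
    """Label a term strength as 'used', 'neutral' (ignored) or 'invalid' (out of range)."""
    if 2 <= abs(strength) <= 5:
        return "used"
    if abs(strength) == 1:
        return "neutral"
    return "invalid"


def check_lexicon(entries):
    """Group (term, strength) pairs by how their strength would be treated."""
    report = {"used": [], "neutral": [], "invalid": []}
    for term, strength in entries:
        report[classify_lexicon_entry(strength)].append(term)
    return report


if __name__ == "__main__":
    # Placeholder terms standing in for translated words.
    print(check_lexicon([("word_a", 3), ("word_b", 1), ("word_c", -6), ("word_d", 0)]))
```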
The following files will also need to be translated or replaced with a local equivalent (see the extra instructions):
- EmoticonLookupTable.txt - check the strengths are appropriate and add any common new national variations
- SlangLookupTable.txt – replace with a list of common slang in the new language
- EnglishWordList.txt – replace with a word list of correct spellings in the new language (many such lists are on the web, but this step is optional)
- NegatingWordList.txt – translate/replace with a list of negating words in the new language
- IdiomLookupTable.txt – replace with a list of common idioms in the new language
- BoosterWordList.txt – translate/replace with a list of booster words in the new language – words that emphasise the strength of emotion in any subsequent words
- QuestionWords.txt – translate/replace with a list of words in the new language that reliably indicate that a question is being asked
You will also need to register a list of non-English common multiple letters (e.g., ii is common in some languages but not English). For the Java version please see the manual for this option. For the Windows version, please check the options menu for this customisation. Spell-checking can also be completely disabled in both versions, if needed.
Negating words occurring after sentiment words (e.g., "I am happy not" is OK in German but not English) can be customised in the Java version of SentiStrength but not the Windows version, sorry. The Java version may need the utf8 option to read the input files, if in UTF8 rather than ASCII format (note that utf8 does not always work on ANSI text files so it should not be used as the default).
SentiStrength versions for other languages
Would you like to help? If you are a linguist with knowledge of any of these languages then you could help by:
- Checking the dictionaries for accuracy and missing sentiment words
- Reporting any badly classified texts.
Please email m dot thelwall at wlv.ac.uk if you would like to help. This makes a good student project.
Classifiers with some testing (8)
Thank you to Eismont Polina, Efanova Iuliia, Konovalova Svetlana, Losev Viktor and Velichko Alena of Saint Petersburg State University of Aerospace Instrumentation, Department of Applied Linguistics for help with the first Russian version. (+ve correl. 0.28-0.47, -ve correl. 0.31-0.46 on tweets - the second number is overfitted due to testing on the evaluation data set, so the real correlation is probably about 0.35 for both). 3972 human-classified Russian tweets.
Thank you to Юлия Павлова, Olessia Koltsova and Sergei Koltsov for the second Russian version. It was developed by the Laboratory for Internet Studies, National Research University Higher School of Economics (NRU HSE), and supported by the Russian Humanitarian Research Foundation and NRU HSE.
There is also a Turkish sentiment strength classifier that is a variant of SentiStrength created by Gural VURAL, METU Computer Eng. Dept. This is available on the same basis as the Java version.
The early modern English version was created by Jonathan Culpeper, Alison Findlay and Beth Cortese of Lancaster University. See paper.
Citation context classifiers are also available.
Hindi. SentiStrength cannot cope with the unicode marks in Hindi but Ashutosh Khanna has created Hindi resources and code that can be used for a similar sentiment analysis. He welcomes additions, comments or suggestions (ashutosh.khanna26 [at] gmail.com).
Completely untested classifiers (9) [just for fun: please email m dot thelwall at wlv.ac.uk if you would like to help improve them - this makes a good student project for linguists or computer scientists, together with testing the results, and making a small corpus of sentiment-classified texts! Here are the language files for 9 of these languages - please improve them and send back if you like. Please also send us a few positive and negative words from any languages not listed here and we will make a new version for your language!]
Thank you to Usman A. Adam (Kaduna State University) for the Hausa version.
Basic classifiers (10) that recognise only a few sentiment words. Please email m dot thelwall at wlv.ac.uk if you would like to help improve them or to send a list of at least 10 common sentiment words for any language. We can't get Hindi, Punjabi and Bengali to work at the moment, sorry (see above for a partial Hindi solution). This is due to word segmentation issues related to these languages containing characters classified as 'marks' in unicode. Also, the Chinese simplified and traditional and the Japanese are artificial versions that add spaces between words or phrases into the language. Thank you to Dr Fajri Koto for allowing some words from InSet to be used in the Indonesian version and to Sri Milawati Asshagab for suggesting it.
Domain customisation
SentiStrength can be adjusted for other domains (e.g., Twitter, product reviews) by adding new relevant words and sentiment strengths to the term list EmotionLookupTable.txt and adjusting any relevant existing term strengths. The other files can also be adjusted, as for language customisation. For example, the file EmotionLookupTableGeneral.txt in the download zipfile contains a slightly adjusted set of term weights to cope with more impersonal communication than MySpace. In this alternative file, the word "love" has a higher strength because it is less likely to be used in formulaic message endings, such as "love from" or "love u" or "love x".
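A domain adjustment of this kind amounts to overriding some strengths and adding new terms. The Python sketch below illustrates the idea on in-memory data; the function names, the example terms and all the strengths shown are hypothetical and are not taken from the real lookup files.

```python
def apply_domain_overrides(base, overrides):
    """Return a new term->strength table with domain-specific values taking priority."""
    merged = dict(base)
    merged.update(overrides)
    return merged


def to_lookup_lines(table):
    """Render a table as 'term<TAB>strength' lines, sorted by term."""
    return [f"{term}\t{strength}" for term, strength in sorted(table.items())]


if __name__ == "__main__":
    general = {"love": 3, "good": 2}      # hypothetical starting strengths
    domain = {"love": 4, "laggy": -3}     # hypothetical domain adjustments
    for line in to_lookup_lines(apply_domain_overrides(general, domain)):
        print(line)
```

Keeping the original table untouched and writing the merged result to a separate file makes it easy to switch back to the general-purpose weights.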
Stress and relaxation
SentiStrength's sister program TensiStrength detects the strength of stress and relaxation in social media texts.
Data mining
The data mining menu and ARFF menu items are not part of the main SentiStrength functionality nor documented. Please ignore them unless they make sense to you.
Other
For further issues, please see the Frequently Asked Questions.
SentiStrength was produced as part of the CyberEmotions project, funded by the EU (FP7).