SentiStrength

Automatic sentiment analysis of up to 16,000 social web texts per second with up to human-level accuracy for English; other languages are available or can easily be added.

SentiStrength estimates the strength of positive and negative sentiment in short texts, even for informal language. It has human-level accuracy for short social web texts in English, except political texts. SentiStrength reports two sentiment strengths:

-1 (not negative) to -5 (extremely negative)

1 (not positive) to 5 (extremely positive)

Why does it use two scores? Because research from psychology has revealed that we process positive and negative sentiment in parallel - hence mixed emotions.

SentiStrength can also report binary (positive/negative), trinary (positive/negative/neutral) and single scale (-4 to +4) results. SentiStrength was originally developed for English and optimised for general short social web texts but can be configured for other languages and contexts by changing its input files - some variants are demonstrated below.
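
The relationship between the dual scores and the other output formats can be illustrated with a short Python sketch. It assumes that the single-scale result is the sum of the two scores (so 5 and -1 give +4, and 1 and -5 give -4) and that the stronger side decides the trinary and binary results. The binary tie-break (`tie`) is a hypothetical setting, so check the SentiStrength manual for the rules the program actually applies.

```python
# Sketch only: these conversion rules are assumptions for illustration,
# not taken from the SentiStrength source code.


def to_scale(pos: int, neg: int) -> int:
    """Combine a dual score (pos 1..5, neg -1..-5) into one value in -4..+4."""
    if not (1 <= pos <= 5 and -5 <= neg <= -1):
        raise ValueError(f"dual score out of range: {pos}, {neg}")
    return pos + neg  # e.g. 5 + (-1) = +4 and 1 + (-5) = -4


def to_trinary(pos: int, neg: int) -> int:
    """1 = positive, 0 = neutral, -1 = negative; the stronger side wins."""
    scale = to_scale(pos, neg)
    return (scale > 0) - (scale < 0)


def to_binary(pos: int, neg: int, tie: int = 1) -> int:
    """1 = positive, -1 = negative; `tie` is a hypothetical tie-break setting."""
    trinary = to_trinary(pos, neg)
    return trinary if trinary != 0 else tie
```

For example, `to_scale(3, -1)` returns 2 and `to_trinary(2, -2)` returns 0 (neutral).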

Other languages: Finnish, German, Dutch, Spanish, Russian, Portuguese, French, Arabic, Polish, Persian, Swedish, Greek, Welsh, Italian, Turkish.

Download SentiStrength

SentiStrength is free for academic research and is certified safe by Softpedia. Please contact the author for the commercial Java version or a commercial licence for the online version. The free version runs under Windows only and is provided without liability or guarantees for any use. Downloading SentiStrength and/or the configuration files signifies acceptance of these conditions. This version does not contain the keyword or domain classification facilities.

Here is a tutorial on SentiStrength from Prof Laeeq Khan of Ohio University.

Buy SentiStrength

A commercial licence for SentiStrength is available for £1000 - please contact m.thelwall -at- wlv.ac.uk. The Java version of SentiStrength is normally used commercially.

SentiStrength is used by computing, language technology and market research companies in the US, Europe and Australia. Some use the default English version and others have translated it into different languages or adapted it to integrate with their existing language technology systems. Commercial users range from small start-ups to one of the world's 10 largest corporations.

Java Version

The Java version of SentiStrength is similar to the Windows version in core functions but has additional capabilities: see the SentiStrength Java manual (updated February 2017), the Mac users' starting instructions (probably also useful for Linux) and the command list. In addition to the standard dual classification, it can conduct binary (positive/negative), trinary (positive/neutral/negative) and single-scale (-4 very negative to +4 very positive) classifications, as well as keyword-oriented and domain-oriented classifications. It also has a special mode for binary and trinary classification of longer texts and allows wildcards in the idiom list file. The Java version can be downloaded with the link above. It can process about 16,000 tweets per second.

For RJB users, here is some sample RJB code from Adam Pantanowitz, University of the Witwatersrand.

For Python users, here is some sample Python code from Alec Larsen, University of the Witwatersrand. For Python 3+, try this variant of the sample Python code (or text version) instead. Unzip this Jupyter Notebook to use it.
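
If you prefer to write your own wrapper, the Java version can also be driven from Python through the standard `subprocess` module. The sketch below is not the sample code linked above. It assumes the `stdin` and `sentidata` command-line options (texts read one per line from standard input, data folder given after `sentidata`) and an output line per text that starts with the positive and negative scores; check the command list for the options your version supports.

```python
import subprocess
from typing import List, Tuple

# Assumed command-line options: "stdin" (read texts from standard input) and
# "sentidata <folder>" (location of the data files). Check the Java command
# list for the options your version actually supports.


def build_command(jar_path: str, data_folder: str) -> List[str]:
    """Build the java command that classifies texts read from standard input."""
    return ["java", "-jar", jar_path, "stdin", "sentidata", data_folder]


def parse_scores(line: str) -> Tuple[int, int]:
    """Parse an output line assumed to start with '<positive> <negative>'."""
    fields = line.split()  # handles tab- or space-separated output
    return int(fields[0]), int(fields[1])


def classify(texts: List[str], jar_path: str, data_folder: str) -> List[Tuple[int, int]]:
    """Send one text per line and read back one score pair per line."""
    # Newlines inside a text would split it into two texts, so flatten them.
    payload = "\n".join(t.replace("\n", " ") for t in texts) + "\n"
    result = subprocess.run(
        build_command(jar_path, data_folder),
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_scores(line) for line in result.stdout.splitlines() if line.strip()]
```

For example, `classify(["I love you", "I hate frogs"], "SentiStrength.jar", "SentStrength_Data/")` would return one (positive, negative) pair per text, provided Java is installed; the jar and folder paths here are placeholders.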

For GATE users, here is a GATE wrapper to import SentiStrength from Mark Greenwood and Diana Maynard, University of Sheffield. Instructions for using the GATE plugin are available from Alex ibollit Stepanenko.

For Weka users, there is a SentiStrength wrapper in the AffectiveTweets Weka package by Felipe Bravo-Marquez of Waikato University that can be installed via the Weka package manager.

Thank you to Sooyeon Jeong of MIT for fixing SentiStrength to work for Android apps.

About SentiStrength

SentiStrength is a sentiment analysis (opinion mining) program. It is described and evaluated in the following peer-reviewed academic articles:

It has been applied in the following research projects, amongst others.

SentiStrength is part of various social media analysis software suites, including:

Press coverage and initiatives

Classifying texts with SentiStrength

To get SentiStrength to classify one or more texts, put the texts into a plain text file with one text per line. Select Analyse All Texts in File from the Sentiment Strength Analysis menu and select the text file. The output will be a copy of the file with positive and negative classifications added at the end of each line, preceded by tabs. Individual texts can also be classified by selecting Analyse One Text from the Sentiment Strength Analysis menu. Sets of text files can also be processed.
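
Because each output line ends with two tab-separated scores, the results are easy to read back into another program. This Python sketch assumes exactly that layout (text, tab, positive, tab, negative) and normalises the negative score to the range -1 to -5 in case it is written without a minus sign.

```python
from typing import Iterable, List, Tuple

# Assumed line layout: text<TAB>positive<TAB>negative (as described above).


def parse_output_line(line: str) -> Tuple[str, int, int]:
    """Split 'text<TAB>positive<TAB>negative' into its three parts."""
    # rsplit keeps any tabs that occur inside the original text.
    text, pos, neg = line.rstrip("\r\n").rsplit("\t", 2)
    return text, int(pos), -abs(int(neg))  # store negative strength as -1..-5


def read_output(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Parse every non-blank line of an output file."""
    return [parse_output_line(line) for line in lines if line.strip()]
```

For example, `with open("texts_out.txt", encoding="utf-8") as f: rows = read_output(f)`, where texts_out.txt is a hypothetical name standing for whatever output file SentiStrength produced.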

Optimising SentiStrength term weights

The positive and negative term weights are stored in the EmotionLookupTable.txt file in the SentStrength_Data folder. They can be adjusted manually by editing the file or fine-tuned automatically with a classified text collection. To fine-tune the EmotionLookupTable.txt values used by SentiStrength, first create a collection of texts that have been classified by humans with positive (1-5) and negative (1-5) sentiment strengths. Put these into a plain text file in which each line has the format: positive – tab – negative – tab – text (negative sentiment can be entered as either positive or negative integers). The set should contain at least 500 texts. Select Optimise the emotion dictionary weights from the Sentiment Strength Analysis menu and SentiStrength will create a new term strength list that is optimised for the sentiment in the new texts. To use the new strengths, save a copy of the original strength list and then replace it with the new list.
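
Before running the optimisation, it can help to check the training file. The sketch below parses the positive, tab, negative, tab, text format described above, accepts the negative score with or without a minus sign, rejects scores outside 1-5, and complains when the set is smaller than the recommended 500 texts. The function names are illustrative and are not part of SentiStrength.

```python
from typing import Iterable, List, Tuple

MIN_TEXTS = 500  # minimum set size recommended for optimisation


def parse_training_line(line: str) -> Tuple[int, int, str]:
    """Parse 'positive<TAB>negative<TAB>text'; negative may be given as 3 or -3."""
    pos_s, neg_s, text = line.rstrip("\r\n").split("\t", 2)
    pos, neg = int(pos_s), -abs(int(neg_s))  # store negative scores as -1..-5
    if not (1 <= pos <= 5 and -5 <= neg <= -1):
        raise ValueError(f"score out of range in line: {line!r}")
    return pos, neg, text


def load_training_set(lines: Iterable[str], min_texts: int = MIN_TEXTS) -> List[Tuple[int, int, str]]:
    """Parse all non-blank lines and check the set is large enough."""
    rows = [parse_training_line(line) for line in lines if line.strip()]
    if len(rows) < min_texts:
        raise ValueError(f"only {len(rows)} texts; at least {min_texts} recommended")
    return rows
```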

Assessing the accuracy of SentiStrength

To assess the accuracy of SentiStrength on a set of texts, a sample must first be classified by humans and formatted as above. The human classifications can then be compared with the SentiStrength classifications on the same sample.
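
The comparison can be done with any statistics tool. As an illustration, this Python sketch computes three common agreement measures for one score type at a time (run it once for positive and once for negative scores): the proportion of exact matches, the proportion within one point, and the Pearson correlation. The choice of measures is an assumption, not SentiStrength's own evaluation report.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare(human: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Agreement measures for one score type (e.g. all positive scores)."""
    if len(human) != len(predicted) or not human:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(human)
    pairs = list(zip(human, predicted))
    return {
        "exact": sum(h == p for h, p in pairs) / n,
        "within_1": sum(abs(h - p) <= 1 for h, p in pairs) / n,
        "correlation": pearson(human, predicted),
    }
```
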
Alternatively, if only one data set is available both to optimise the term strength list and to validate the result, the 10-fold cross-validation procedure can be used. This uses 90% of the data to train the term weights and the remaining 10% to assess the accuracy of the adjusted weights. This is repeated 10 times, leaving out a different 10% each time, and the combined results are reported. To run a 10-fold cross-validation, create the classified text file as above and select Run a 10-fold cross-validation to assess the above algorithm from the Sentiment Strength Analysis menu.
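
The fold construction can be sketched as follows. This Python version shuffles the classified texts with a fixed seed, deals them into 10 folds, and returns the 10 (train, test) pairs, so that every text is held out exactly once. It only illustrates the splitting step; SentiStrength's own procedure may order or assign texts differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 10, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; every item is held out exactly once."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)  # fixed seed makes the folds reproducible
    folds = [order[i::k] for i in range(k)]  # k roughly equal folds
    splits = []
    for held_out in folds:
        held = set(held_out)
        train = [items[j] for j in order if j not in held]  # about 90% when k = 10
        test = [items[j] for j in held_out]  # about 10% when k = 10
        splits.append((train, test))
    return splits
```

Each train list would be used to optimise the term weights and each test list would then be classified with those weights; pooling the ten sets of test predictions gives the combined result.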

Extra resources:

Support files

The various files supplied with SentiStrength contain information used by the algorithm and can be customised.

Language customisation

SentiStrength can be adjusted for other languages by translating the term list EmotionLookupTable.txt and adding any other sentiment-bearing words that are missing. Note that the sentiment scores for terms should be in the range 2 to 5 (positive) or -2 to -5 (negative). A score of +1 or -1 means neutral, and neutral terms are ignored. A training corpus in the new language is recommended to help adjust the term weight strengths (see Optimising SentiStrength term weights). You will need the Java version or Windows version 2.2 of SentiStrength to cope with accented characters or characters not found in English, as well as some additional linguistic features. International lexicons are here but please note the licence terms.
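
When translating EmotionLookupTable.txt, a quick consistency check can catch scores outside the allowed ranges. The sketch below assumes that each line holds a term, a tab and an integer score (a term may end in a `*` wildcard, which is kept as part of the key). It skips neutral +1/-1 entries, since SentiStrength ignores them, and reports malformed or out-of-range lines.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed line layout: term<TAB>score, e.g. "happ*<TAB>3" (hypothetical entry).


def check_lookup_table(lines: Iterable[str]) -> Tuple[Dict[str, int], List[str]]:
    """Return the usable term weights and a list of problems found."""
    weights: Dict[str, int] = {}
    problems: List[str] = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) < 2:
            problems.append(f"line {n}: expected term<TAB>score")
            continue
        term = parts[0].strip()
        try:
            score = int(parts[1])
        except ValueError:
            problems.append(f"line {n}: non-integer score {parts[1]!r}")
            continue
        if score in (1, -1):
            continue  # neutral terms are ignored
        if not 2 <= abs(score) <= 5:
            problems.append(f"line {n}: score {score} outside 2..5 / -2..-5")
            continue
        weights[term] = score
    return weights, problems
```

For example, `check_lookup_table(["bueno\t3", "raro\t7"])` keeps `bueno` and reports the score 7 as out of range.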

The following files will also need to be translated or replaced with a local equivalent (see the extra instructions):

You will also need to register a list of repeated letters that are common in the new language but not in English (e.g., ii is common in some languages but not in English). For the Java version, please see the manual for this option; for the Windows version, please check the options menu for this customisation. Spell-checking can also be disabled completely in both versions, if needed.
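
Why the registration matters can be shown with a simplified model of repeated-letter handling. In this hypothetical Python sketch (not SentiStrength's actual spelling algorithm), any run of a repeated letter is cut to a double if that double is registered as valid for the language, and to a single letter otherwise.

```python
from itertools import groupby
from typing import Set

# Hypothetical simplification of spelling correction for emphatic lengthening;
# `allowed_doubles` plays the role of the registered list of repeated letters.


def squeeze_repeats(word: str, allowed_doubles: Set[str]) -> str:
    """Shorten each run of a repeated letter to a double (if allowed) or a single."""
    out = []
    for ch, run in groupby(word):
        length = len(list(run))
        if length >= 2 and ch * 2 in allowed_doubles:
            out.append(ch * 2)  # registered double letter: keep two
        else:
            out.append(ch)  # unregistered repeat (or single letter): keep one
    return "".join(out)
```

With `{"ii"}` registered, the Finnish word kiitos survives unchanged; with an empty set it is damaged to kitos, while English haaappy is still reduced to happy when `pp` is registered.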

Negating words that occur after sentiment words (e.g., "I am happy not" is OK in German but not in English) can be customised in the Java version of SentiStrength but not in the Windows version, sorry. The Java version may need the utf8 option to read input files that are encoded in UTF-8 rather than ASCII (note that utf8 does not always work on ANSI text files, so it should not be used as the default).
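
The difference between the two negation orders can be sketched with a toy scorer. In this hypothetical Python example, a sentiment word is negated when a negating word appears within `window` words before it, or also after it when `negate_after` is enabled. Negation is modelled as a simple sign flip, chosen for illustration rather than taken from SentiStrength.

```python
from typing import Dict, List, Set

# Toy model: parameter names and the sign-flip rule are illustrative assumptions.


def score_with_negation(tokens: List[str], lexicon: Dict[str, int], negators: Set[str],
                        negate_after: bool = False, window: int = 1) -> List[int]:
    """Return the (possibly negated) score of each sentiment word found."""
    scores = []
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        before = tokens[max(0, i - window):i]  # words preceding the sentiment word
        after = tokens[i + 1:i + 1 + window] if negate_after else []  # following words
        negated = any(t in negators for t in before + after)
        scores.append(-lexicon[tok] if negated else lexicon[tok])
    return scores
```

With `negate_after=True`, "i am happy not" scores -3 instead of 3 when happy has a strength of 3.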

SentiStrength versions for other languages

Would you like to help? If you are a linguist with knowledge of any of these languages then you could help by:

Please email m dot thelwall at wlv.ac.uk if you would like to help. This makes a good student project.

Classifiers with some testing (8)

Created by Sebastijan R. Maček, Sonja Debevc and Gregor Zamuda of STA, the Slovenian Press Agency. The version on this website does not work for words with accented characters.

Created by Hannes Pirker, Interaction Technologies Group at the Austrian Research Institute for Artificial Intelligence (OFAI), with additions from Elias Kyewski of the University of Duisburg. The German version can be obtained via this download form.

The Dutch version was created by Ella Smorenburg, supervised by Emmeke Veltmeijer MSc and dr. Charlotte Gerritsen at the Vrije Universiteit Amsterdam. Download Dutch files (these work best with the Java version of SentiStrength).

Created by David Vilares from previous versions made by David Garcia. Incorporates terms from Ancora and SOCAL. Human-coded sentiment strength test and development sets of Spanish tweets. See also an English-Spanish code-switching corpus from this paper.

Made with pride in the European Union by Dr. Eng. Cosmo "Pyo" DI MILLE and his team.

Thank you to Eismont Polina, Efanova Iuliia, Konovalova Svetlana, Losev Viktor and Velichko Alena of Saint Petersburg State University of Aerospace Instrumentation, Department of Applied Linguistics, for help with the first Russian version (positive correlation 0.28-0.47 and negative correlation 0.31-0.46 on tweets; the higher figure in each range is overfitted because it was obtained by testing on the evaluation data set, so the real correlation is probably about 0.35 for both). 3972 human-classified Russian tweets.

Thank you to Юлия Павлова, Olessia Koltsova and Sergei Koltsov for the second Russian version. It was developed by the Laboratory for Internet Studies, National Research University Higher School of Economics (NRU HSE), and supported by the Russian Humanitarian Research Foundation and NRU HSE.

There is also a Turkish sentiment strength classifier that is a variant of SentiStrength created by Gural VURAL, METU Computer Eng. Dept. This is available on the same basis as the Java version.

(this version includes only part of Gural VURAL's system so its results are not as good as Gural VURAL's full version.)

Created by Jonathan Culpeper, Alison Findlay and Beth Cortese of Lancaster University. See paper.

Citation context classifiers are also available.

Hindi. SentiStrength cannot cope with the Unicode marks in Hindi but Ashutosh Khanna has created Hindi resources and code that can be used for a similar sentiment analysis. He welcomes additions, comments or suggestions (ashutosh.khanna26 [at] gmail.com).

Completely untested classifiers (9) [just for fun: please email m dot thelwall at wlv.ac.uk if you would like to help improve them - this makes a good student project for linguists or computer scientists, together with testing the results, and making a small corpus of sentiment-classified texts! Here are the language files for 9 of these languages - please improve them and send back if you like. Please also send us a few positive and negative words from any languages not listed here and we will make a new version for your language!]

Thank you to Usman A. Adam (Kaduna State University) for the Hausa version.

Basic classifiers (10) that recognise only a few sentiment words. Please email m dot thelwall at wlv.ac.uk if you would like to help improve them or to send a list of at least 10 common sentiment words for any language. We can't get Hindi, Punjabi and Bengali to work at the moment, sorry (see above for a partial Hindi solution). This is due to word segmentation problems: these languages contain characters classified as 'marks' in Unicode. Also, the Simplified Chinese, Traditional Chinese and Japanese versions are artificial: they add spaces between words or phrases, which these languages do not normally use. Thank you to Dr Fajri Koto for allowing some words from InSet to be used in the Indonesian version and to Sri Milawati Asshagab for suggesting it.
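
The segmentation problem with Unicode marks can be reproduced in a few lines of Python. Assuming a tokeniser that, like many, treats only letters as word characters, the Devanagari vowel signs in हिंदी ("Hindi") break the word into fragments, whereas accepting both letters (category L) and marks (category M) keeps it whole. This illustrates the cause only; it is not SentiStrength's own tokeniser.

```python
import unicodedata
from typing import Callable, List


def tokenize(text: str, is_word_char: Callable[[str], bool]) -> List[str]:
    """Split text into maximal runs of characters accepted by is_word_char."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def letters_only(ch: str) -> bool:
    # str.isalpha() accepts Unicode letters (L*) but rejects combining marks (M*).
    return ch.isalpha()


def letters_and_marks(ch: str) -> bool:
    # Unicode general categories starting with "L" (letters) or "M" (marks).
    return unicodedata.category(ch)[0] in ("L", "M")


hindi = "\u0939\u093f\u0902\u0926\u0940"  # the word "Hindi" written in Devanagari
```

Here `tokenize(hindi, letters_only)` returns only the two consonants as separate fragments, while `tokenize(hindi, letters_and_marks)` returns the whole word.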

Domain customisation

SentiStrength can be adjusted for other domains (e.g., Twitter, product reviews) by adding new relevant words and sentiment strengths to the term list EmotionLookupTable.txt and adjusting any relevant existing term strengths. The other files can also be adjusted, as for language customisation. For example, the file EmotionLookupTableGeneral.txt in the download zipfile contains a slightly adjusted set of term weights to cope with more impersonal communication than MySpace. In this alternative file, the word "love" has a higher strength because it is less likely to be used in formulaic message endings, such as "love from" or "love u" or "love x".
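
Domain customisation amounts to adding and re-weighting entries in the term list. The Python sketch below merges a set of domain overrides into a base term list, dropping any term whose override is +1 or -1 (neutral) and rejecting strengths outside 2 to 5. The terms and scores in the usage note are hypothetical, not values from EmotionLookupTable.txt or EmotionLookupTableGeneral.txt.

```python
from typing import Dict

# Sketch only: term lists are modelled as plain {term: score} dictionaries.


def apply_domain_overrides(base: Dict[str, int], overrides: Dict[str, int]) -> Dict[str, int]:
    """Return a new term list with domain terms added or re-weighted."""
    merged = dict(base)  # leave the original list untouched
    for term, score in overrides.items():
        if score in (1, -1):
            merged.pop(term, None)  # +1/-1 means neutral, so the term is dropped
        elif 2 <= abs(score) <= 5:
            merged[term] = score  # add a new term or replace an existing strength
        else:
            raise ValueError(f"{term!r}: score {score} outside 2..5 / -2..-5")
    return merged
```

For example, with a base list `{"love": 3, "hate": -4}`, the overrides `{"love": 4, "laggy": -3, "hate": 1}` yield `{"love": 4, "laggy": -3}`.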

Stress and relaxation

SentiStrength's sister program TensiStrength detects the strength of stress and relaxation in social media texts.

Data mining

The data mining menu and ARFF menu items are neither part of the main SentiStrength functionality nor documented. Please ignore them unless they make sense to you.

Other

For further issues, please see the Frequently Asked Questions.

SentiStrength was produced as part of the CyberEmotions project, funded by the EU (FP7).