Sentiment analysis is the process of identifying, measuring, and interpreting positive or negative opinions expressed in large amounts of text data on the Internet. This method is used in recommendation systems, news analysis, political science, marketing, and sociological research. Sentiment analysis can help customers choose products based on reviews, recognize users’ true search needs, and identify extremist resources. In addition, such analysis can be used to study how social media posts affect the effectiveness of marketing policies, to gauge consumer reactions to a company’s products, and even to predict stock market movements based on social media sentiment.
Significant progress has been made in sentiment analysis over the past few years, especially with the application of deep neural networks to text processing. However, problems arise when a trained model is transferred from one domain to another. For example, a model trained to analyze restaurant reviews will not work well on bank reviews. Scientists today are looking for ways to make the transfer of models between domains faster and more efficient, which would save a lot of money and effort. A related challenge is how to quickly and inexpensively improve the quality of neural-network sentiment analysis in a specific domain.
Scientists from the Intelligent Systems Laboratory of Vyatka State University have developed an approach that makes it possible to quickly transfer a sentiment analysis model from one domain to another. The authors found that when a universal sentiment analysis model, trained on a large collection of varied texts from a certain domain, is transferred to another domain, the quality of analysis is low. This means that the model needs to be fine-tuned. The authors determined that fine-tuning a universal model requires only a few hundred labeled texts from the new domain, rather than the thousands or tens of thousands needed for initial training. The research is large-scale: for the first time for the Russian language, hundreds of experiments were carried out on 30 sentiment-annotated corpora from 12 domains, containing more than 280 thousand texts. Such a large volume of research material reinforces the validity of the conclusions.
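To make the fine-tuning idea concrete, here is a minimal toy sketch in Python. It is not the authors' model: a tiny word-weight perceptron stands in for the neural network, and the restaurant and bank reviews, as well as the names `train`, `predict`, and `accuracy`, are invented for illustration. It only shows the mechanism: a model trained in one domain is trained further on a handful of labeled texts from the new domain instead of being trained from scratch.

```python
from collections import defaultdict

# Toy stand-in for a neural model: one weight per word (hypothetical, for illustration only).

def predict(weights, text):
    """Return 1 (positive) if the summed word weights are above zero, else -1 (negative)."""
    score = sum(weights[w] for w in text.lower().split())
    return 1 if score > 0 else -1

def train(weights, examples, epochs=10):
    """Perceptron updates in place. Starting from already-trained weights is fine-tuning."""
    for _ in range(epochs):
        for text, label in examples:
            if predict(weights, text) != label:
                for w in text.lower().split():
                    weights[w] += label
    return weights

def accuracy(weights, examples):
    """Fraction of examples whose predicted label matches the given label."""
    return sum(predict(weights, t) == y for t, y in examples) / len(examples)

# Invented source-domain data (restaurant reviews).
restaurant_train = [
    ("delicious food friendly staff", 1),
    ("cold food rude staff", -1),
    ("great atmosphere delicious dessert", 1),
    ("rude waiter long wait", -1),
]

# Invented target-domain data (bank reviews): a few labeled texts plus a held-out test set.
bank_train = [
    ("low fees fast approval", 1),
    ("hidden fees long queue", -1),
    ("fast transfer helpful support", 1),
    ("card blocked rude support", -1),
]
bank_test = [
    ("fast approval helpful support", 1),
    ("hidden fees card blocked", -1),
]

# 1) Initial training on the source domain.
base = train(defaultdict(float), restaurant_train)
base_acc = accuracy(base, bank_test)

# 2) Fine-tuning: continue training a copy of the same weights on the target domain.
tuned = train(base.copy(), bank_train)
tuned_acc = accuracy(tuned, bank_test)

print("bank accuracy before fine-tuning:", base_acc)
print("bank accuracy after fine-tuning: ", tuned_acc)
```

In this toy setup, accuracy on the two held-out bank reviews rises from 0.5 to 1.0 after fine-tuning. The numbers illustrate the mechanism only and say nothing about the results reported in the study.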
Additionally, the authors trained a cross-domain Russian-language model that effectively analyzes sentiment in different domains, and made it publicly available. They also annotated and released a new sentiment-labeled text corpus, RuNews, comprising 1,823 news texts, and obtained sentiment analysis quality scores that exceed the best previously reported results on 7 test corpora.
“The main task we solve in our work is improving the quality of sentiment analysis using a neural network in a certain domain (for example, when analyzing bank reviews). It is desirable to do this quickly and cheaply. The key problem in this case is that in the domain of interest, as a rule, there is no high-quality labelled corpus of texts, that is, texts that have been processed and provided with additional information, such as labels, tags or descriptions,” says the leader of the project, Evgeny Kotelnikov, professor at Vyatka State University.
The material has been prepared with the financial support of the Ministry of Education of Russia within the federal project “Popularization of science and technology”.