For my master's thesis in Human Media Interaction at the University of Twente, I worked on applying sentiment analysis technologies to emotion detection in online communities.
I am proud of this work, which took 10 months of programming and writing, and I felt it deserved to be converted from PDF to HTML. After graduating I also gave a lecture to other HMI students, and there was some interest in studying the topic further. I am tempted to dive back into it myself and apply deep learning techniques to get better results.
In summary, the experiment aimed to automatically detect emotions along the two axes of core affect: positive vs. negative and aroused vs. non-aroused. A machine learning approach was used, in which algorithms were trained on Twitter posts that were labeled automatically based on the emoticons they contain. The algorithms were tested on the online community Greenpeace Greenwire.
Title page
Detecting core affect in e-communities using sentiment analysis technologies
Author: Niels van der Molen
Committee:
Prof. dr. F.M.G. de Jong
Dr. M. Theune
Dr. ir. H.J.A. op den Akker
T.W. Potze BSc (Open Social)
A thesis submitted in fulfilment of the requirements for the degree of Master of Science in Human Media Interaction, Department of Electrical Engineering, Mathematics and Computer Science
August 2014
Abstract
Automatically detecting affect in the text of users who interact in online communities could enable novel applications for e-community platforms. For example, if an alert function responded to affect expressed by users of a social media platform, moderators could spend more effort on users who express negative affect. Furthermore, community developers could take affective information into consideration when altering functionalities of the platform, which could improve the user and community experience.
To study whether human annotators can accurately label affect in text, some content of the active users of the e-community platform Greenpeace Greenwire was annotated on the dimensions of core affect by three annotators. The inter-rater agreement was calculated using Krippendorff’s alpha, with values of 0.534 for the valence dimension and 0.326 for the activity dimension. The average weighted Cohen’s kappa is 0.536 for the valence dimension and 0.335 for the activity dimension. This indicates that both the valence and activity dimensions of the circumplex model of core affect can be labeled by human annotators in text from e-communities. The annotated dataset was used for testing the affect detection algorithms.
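As an illustration of the second agreement measure, the sketch below computes a weighted Cohen's kappa for each pair of annotators and averages the results. It is not the thesis code. The linear weighting scheme, the function names, and the three-point label scale in the usage note are assumptions made for the example; the abstract does not state which weights or scale were used, and averaging over annotator pairs is assumed because Cohen's kappa is defined for two raters.

```python
from itertools import combinations


def weighted_kappa(a, b, categories):
    """Linearly weighted Cohen's kappa for two raters.

    `categories` lists the ordered label values (an assumed ordinal scale).
    """
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (rater a label, rater b label).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1 / n

    # Marginal label proportions per rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)      # linear disagreement weight
            num += w * observed[i][j]     # weighted observed disagreement
            den += w * row[i] * col[j]    # weighted chance disagreement
    if den == 0:
        raise ValueError("kappa is undefined: both raters used one category")
    return 1.0 - num / den


def mean_pairwise_kappa(ratings, categories):
    """Average the weighted kappa over all pairs of raters."""
    pairs = list(combinations(ratings, 2))
    return sum(weighted_kappa(a, b, categories) for a, b in pairs) / len(pairs)
```

With three annotators, `mean_pairwise_kappa([r1, r2, r3], [-1, 0, 1])` averages the three pairwise values, where `r1`, `r2`, and `r3` are equal-length label lists. For a single pair of raters, scikit-learn's `cohen_kappa_score(a, b, weights="linear")` computes the same quantity.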
Algorithms were created that detect polarity and arousal values in text, based on the dimensions of core affect. Furthermore, an algorithm was tested that classifies affect versus non-affect in text, trained on data from Wikipedia and Twitter. To obtain training data for the algorithms, tweets were labeled automatically on polarity and arousal using emoticons. The results of the affect detection algorithms are promising, although the annotated Greenwire dataset contained only a small number of low-arousal and negative examples.
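The emoticon-based labeling idea can be sketched as follows. This is a minimal illustration, not the thesis implementation: the emoticon lexicon, the function name `label_by_emoticon`, and the choices to discard ambiguous tweets and strip the emoticons from the training text are all assumptions. An arousal lexicon would have the same shape with different labels.

```python
# Hypothetical polarity lexicon; the emoticons actually used in the thesis
# are not listed in the abstract.
POLARITY = {
    "positive": {":)", ":-)", ":D", "=)"},
    "negative": {":(", ":-(", ":'("},
}


def label_by_emoticon(tweet, lexicon):
    """Return (text_without_emoticons, label), or None if no single label fits.

    A tweet is labeled only when it contains emoticons from exactly one
    class, so unlabeled and mixed-signal tweets are skipped.
    """
    tokens = tweet.split()
    all_emoticons = set().union(*lexicon.values())
    matched = {
        label
        for label, emoticons in lexicon.items()
        if any(tok in emoticons for tok in tokens)
    }
    if len(matched) != 1:
        return None
    # Remove the emoticons so a classifier cannot simply memorize the label.
    text = " ".join(tok for tok in tokens if tok not in all_emoticons)
    return text, matched.pop()


examples = [
    "Great news today :)",
    "Missed the train again :(",
    "No emoticon in this one",
]
training_data = [
    pair
    for pair in (label_by_emoticon(t, POLARITY) for t in examples)
    if pair is not None
]
```

Tokenizing on whitespace keeps the sketch short, but it misses emoticons attached directly to a word (for example `today:)`), so a real pipeline would need a more careful tokenizer.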
Chapters
Chapter 1: Introduction
Chapter 2: Background – published soon
Chapter 3: Datasets – published soon
Chapter 4: Annotating Greenwire Dataset – published soon
Chapter 5: Methodology – published soon
Chapter 6: Implementation – published soon
Chapter 7: Results – published soon
Chapter 8: Conclusions – published soon