Data set construction and exploratory experiments for cyberbullying detection

Title: Data set construction and exploratory experiments for cyberbullying detection
Publication Type: Talk
Year of Publication: 2014
Authors: Van Hee, C, Verhoeven, B, Lefever, E, De Pauw, G, Hoste, V, Daelemans, W
Conference/Workshop/...: Presented at ATILA 2014, Ghent, Belgium
Date Published: 11/2014
Abstract: In the current era of online interactions, both positive and negative experiences are abundant on the web. As in real life, these negative experiences can have quite an impact on our youngsters. Recent research reports cybervictimization rates among teenagers between 3% and 24% (Olweus, 2012; Patchin & Hinduja, 2012). In the research project AMiCA (Automatic Monitoring for Cyberspace Applications), we strive to automatically detect harmful content such as cyberbullying on social networks. We collected data from social networking sites and by simulating cyberbullying events with volunteer youngsters. This dataset was annotated for a number of fine-grained categories related to cyberbullying, such as insults and threats. More broadly, the severity of cyberbullying in the post, as well as the author's role in the cyberbullying event (i.e. harasser, victim or bystander), were defined. We present the results of our preliminary experiments, in which we try to determine whether an online utterance is harmful (i.e. contains cyberbullying) or not. Moreover, we explore the feasibility of classifying online posts into four categories (threats, insults, sexual talk and defensive statements). These results have provided insights into the difficulty and learnability of this task.