Automatic classification of participant roles in cyberbullying: Can we detect victims, bullies, and bystanders in social media text?

Title: Automatic classification of participant roles in cyberbullying: Can we detect victims, bullies, and bystanders in social media text?
Publication Type: Journal Article
Year of Publication: 2020
Authors: Jacobs, G, Van Hee, C, Hoste, V
Journal: Natural Language Engineering
Pagination: 26
Date Published: 11
Keywords: bullying participants, cyberbullying, lt3, social media text, text classification
Abstract

Successful prevention of cyberbullying depends on the adequate detection of harmful messages. Given the impossibility of human moderation on the Social Web, intelligent systems are required to identify clues of cyberbullying automatically. Much work on cyberbullying detection focuses on detecting abusive language without analyzing the severity of the event or the participants involved. Automatic analysis of participant roles in cyberbullying traces enables targeted bullying prevention strategies. In this paper, we aim to automatically detect different participant roles involved in textual cyberbullying traces, including bullies, victims, and bystanders. We describe the construction of two cyberbullying corpora (a Dutch and an English corpus) that were both manually annotated with bullying types and participant roles, and we perform a series of multiclass classification experiments to determine the feasibility of text-based cyberbullying participant role detection. The representative datasets present a data imbalance problem for which we investigate feature filtering and data resampling as skew mitigation techniques. We investigate the performance of feature-engineered single and ensemble classifier setups as well as transformer-based pretrained language models (PLMs). Cross-validation experiments revealed promising results for the detection of cyberbullying roles using PLM fine-tuning techniques, with the best classifier for English (RoBERTa) yielding a macro-averaged F1-score of 55.84%, and the best one for Dutch (RobBERT) yielding an F1-score of 56.73%. Experiment replication data and source code are available at https://osf.io/nb2r3.
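
The abstract reports macro-averaged F1-scores on imbalanced role classes. As a rough illustration of what that metric measures, the following is a minimal sketch in plain Python, not the authors' evaluation code. The function name `macro_f1` and the role labels in the usage example are illustrative assumptions and are not taken from the paper or its replication data.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Each class counts equally, so a rare role (for example a bystander
    class) affects the score as much as a frequent one.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # true class g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["bully", "victim", "bystander", "bully"]
pred = ["bully", "victim", "bully", "bully"]
score = macro_f1(gold, pred)
```

In this toy example, three of four predictions are correct, yet the completely missed minority class pulls the macro average well below the accuracy. This is why macro averaging is a common choice for skewed class distributions such as the one the abstract describes.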

URL: https://doi.org/10.1017/S135132492000056X
DOI: 10.1017/s135132492000056x