Fight hate speech on Social Media:
I’ve had the privilege of presenting my work on offensive text classification in code-switched Hindi-English language at both ACL’18 and EMNLP’18, two top-tier NLP conferences. Based on the great feedback I’ve received, the outreach and awareness I’ve been able to raise, and the number of people I’ve been able to involve in tackling mental health problems on social media, I’ve decided to dive deeper into the motivation and methodology for this work.
Our team has had the great fortune of receiving grants for both our papers and we’re elated to have been given the platform to not only learn but also bring back everything we’ve learned at these forums to our community. The motivation to write this post is the same as why I write my other blog posts or give talks; I want to give back to the community, I want to introduce students to making the web a more accepting, open and secure place by leveraging AI and the aspects of AI for social good.
The papers can be found in ACL Anthology:
A deeper look into the problem
The rampant use of offensive content on social media is destructive to a progressive society, as it tends to promote abuse, violence and chaos and severely impacts individuals at different levels. Specifically, in the Indian subcontinent, the number of Internet users is rising rapidly due to inexpensive data. With this rise comes the problem of hate speech and offensive and abusive posts on social media. Social media is rife with such offensive content, which can be broadly classified as abusive or hate-inducing on the basis of the severity and target of the discrimination.
Hate speech vs Abusive speech: Is there a difference?
Hate speech is an act of offending a person or a group as a whole on the basis of certain key attributes such as religion, race, sexual orientation, gender, ideological background, mental and physical disability.
Abusive speech is offensive speech with a vague target and mild intention to hurt the sentiments of the receiver.
Hinglish: What and Why?
What is Hinglish?
Hinglish is formed of words spoken in the Hindi language but written in Roman script instead of the Devanagari script, and it is a major contributor to the tremendously high volume of offensive content online. Hinglish is a pronunciation-based bilingual language that has no fixed grammar rules. It extends its grammatical setup from native Hindi, accompanied by a plethora of slurs, slang and phonetic variations due to regional influence.
Is Hinglish that commonly used to warrant studies for offensive text classification?
Most social media platforms delete such offensive content when either (i) someone reports it manually or (ii) an offensive content classifier detects it automatically. However, people often write offensive content in code-switched languages so that classifiers trained on English cannot detect it, which necessitates an efficient classifier for offensive content in code-switched languages. In 2015, India ranked fourth on the Social Hostilities Index with an index value of 8.7 out of 10, making it imperative to filter the tremendously high volume of offensive online content in Hinglish.
The Challenges
Hinglish has the following characteristics:
1. It is formed of words spoken in Hindi (Indic) language but written in Roman script instead of the standard Devanagari script.
2. It is one of the many pronunciation-based pseudo-languages created natively by social media users for ease of communication.
3. It has no fixed grammar rules but rather borrows the grammatical setup from native Hindi and complements it with Roman script along with a plethora of slurs, slang and phonetic variations due to regional influence.
Hence, such a code-switched language presents challenging limitations in terms of the randomised spelling variations in explicit words due to a foreign script and compounded ambiguity arising due to the various interpretations of words in different contextual situations.
Another challenge worth consideration in dealing with Hinglish is the demographic divide between the users of Hinglish relative to total active users globally. This poses a serious limitation as the tweet data in Hinglish language is a small fraction of the large pool of tweets generated, necessitating the use of selective methods to process such tweets in an automated fashion.
Formulating the problem
- Creation of a dataset consisting of tweets to identify offensive, abusive and hate-speech in Hinglish language.
- Ascertaining a diverse set of features to build a robust classifier.
- Development of models based on features selected and exploration of deep learning approaches with transfer learning.
In the rest of this article, I will cover the technical aspects of each of these subproblems in depth.
Data: Collection and Annotation
HOT Dataset
HOT is a manually annotated dataset that was created using the Twitter Streaming API by selecting tweets having more than 3 Hinglish words. The tweets were collected over 4 months, from November 2017 to February 2018. A geo-location restriction was imposed so that only tweets originating in the Indian subcontinent were made part of the corpus. The collected corpus initially had 25667 tweets, which were then filtered to remove tweets containing only URLs, containing only images or videos, having fewer than 3 words, written in non-English and non-Hinglish scripts, and duplicates.
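To make the filtering criteria above concrete, here is a minimal sketch of how such a filter could look. It is not the code used to build HOT: the tiny `HINGLISH_LEXICON`, the ASCII-only script check and the function names are all assumptions made for illustration, and the image/video check is left out because it needs tweet metadata rather than text.

```python
import re

URL_RE = re.compile(r"https?://\S+")

# Hypothetical mini-lexicon of romanised Hindi words (assumption);
# a real lexicon would be far larger.
HINGLISH_LEXICON = {"kya", "hai", "nahi", "bhai", "tum", "kar", "raha", "bas", "chal"}


def is_roman_script(text):
    """True if every alphabetic character is ASCII (English or romanised Hindi)."""
    return all(ch.isascii() for ch in text if ch.isalpha())


def keep_tweet(text, min_words=3, min_hinglish=4):
    """Check one tweet: enough words, Roman script, more than 3 Hinglish words."""
    no_urls = URL_RE.sub(" ", text)
    words = re.findall(r"[a-z]+", no_urls.lower())
    if len(words) < min_words:
        return False  # URL-only or too short
    if not is_roman_script(no_urls):
        return False  # contains a non-Roman script
    hinglish_count = sum(w in HINGLISH_LEXICON for w in words)
    return hinglish_count >= min_hinglish


def filter_corpus(tweets):
    """Drop duplicates (ignoring case, URLs and spacing) and tweets failing keep_tweet."""
    seen = set()
    kept = []
    for tweet in tweets:
        key = " ".join(URL_RE.sub(" ", tweet).lower().split())
        if key in seen:
            continue
        seen.add(key)
        if keep_tweet(tweet):
            kept.append(tweet)
    return kept
```

In practice the lexicon lookup is the weak point, since Hinglish spellings vary so much; the sketch only shows where each criterion would sit in the pipeline.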
t-SNE plot of the HOT dataset
HOT Annotation
The annotation of HOT tweets was done by three annotators with sufficient background in NLP research. The tweets were labeled as hate speech if they satisfied one or more of the following conditions: (i) the tweet used a sexist or racial slur to target a minority, (ii) it contained undignified stereotyping or (iii) it supported a problematic hashtag such as #ReligiousSc*m.
Examples of tweets in the HOT dataset
Methodology
Preprocessing
Preprocessing is often overlooked, but is one of the most crucial steps for NLP problems. The tweets obtained from data sources were channeled through the following pre-processing pipeline with the aim to transform them into semantic feature vectors.
- The first pre-processing step was the removal of punctuation, URLs, user mentions {@mentions} and numbers {0–9}.
- Hashtags and emoticons were replaced by their textual counterparts, and all tweets were converted to lower case.
- Stopword removal followed by transliteration and translation of each word in the Hinglish tweet to the corresponding English word using http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/.
- Creating word embedding layers.
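The pipeline above can be sketched in a few lines of Python. Treat this as an illustration under assumptions rather than our actual implementation: the `EMOTICONS`, `STOPWORDS` and `HINGLISH_TO_ENGLISH` tables are tiny hypothetical stand-ins (the real translation step used the dictionary linked above), and the embedding layer is left out.

```python
import re

# Hypothetical lookup tables (assumptions for illustration only).
EMOTICONS = {":)": "happy", ":(": "sad"}
STOPWORDS = {"hai", "ka", "ki", "the", "is", "a"}
HINGLISH_TO_ENGLISH = {"chal": "go", "bhaag": "run"}


def preprocess(tweet):
    """Turn a raw tweet into a list of cleaned, translated tokens."""
    text = re.sub(r"https?://\S+", " ", tweet)   # remove URLs
    text = re.sub(r"@\w+", " ", text)            # remove user mentions
    for emoticon, word in EMOTICONS.items():     # emoticon -> text
        text = text.replace(emoticon, f" {word} ")
    text = re.sub(r"#(\w+)", r"\1", text)        # hashtag -> plain word
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)        # drop punctuation and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]
    # Word-by-word translation; unknown words are passed through unchanged.
    return [HINGLISH_TO_ENGLISH.get(t, t) for t in tokens]
```

For example, `preprocess("@user Chal bhaag :) #Hinglish http://t.co/x 123!!")` yields the tokens `go`, `run`, `happy` and `hinglish`, which would then be mapped to word embeddings.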
MIMCT Model
The MIMCT model has a split architecture consisting of two major components:
- Primary and Secondary Inputs.
- CNN-LSTM binary channel neural network.
MIMCT Model
Does this model have to be this complex?
You’re probably thinking there must be simpler models that can solve this problem. There’s a lot happening here, and while the paper describes this more formally, I’ll attempt to describe this in the easiest way possible.
Apart from the regular embedding inputs, additional hierarchical contextual features are required to complement the overall classification of the textual data. These features focus on sentiment and on tailor-made abuses that may not be present in a regular dictionary corpus. This helps to overcome a serious bottleneck in the classification task, one that could be a prominent reason for the high misclassification of the abusive and hate-inducing classes in the baseline and basic transfer learning approaches. The multiple modalities added to the MIMCT model as secondary inputs are:
- Sentiment score: Positive/Neutral/Negative
- LIWC Features: Generative labels for authenticity, psychological state, etc.
- Profanity vector: Presence of specific bad words in Hinglish.
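As a rough sketch of how these secondary inputs could be assembled into one vector, consider the snippet below. The profanity list holds placeholder tokens, the sentiment label is assumed to come from some external sentiment tool, and the LIWC values are passed in as plain numbers because LIWC itself is a separate tool; none of these names come from the paper.

```python
# Placeholder profanity lexicon (assumption); the real list holds Hinglish terms.
PROFANITY_LIST = ["badword1", "badword2", "badword3"]
SENTIMENTS = ("positive", "neutral", "negative")


def profanity_vector(tokens, lexicon=PROFANITY_LIST):
    """Binary vector: 1 where the lexicon word occurs in the tweet, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in lexicon]


def sentiment_one_hot(label):
    """One-hot encode a positive/neutral/negative sentiment label."""
    return [1 if label == s else 0 for s in SENTIMENTS]


def secondary_input(tokens, sentiment_label, liwc_features=()):
    """Concatenate sentiment, LIWC and profanity features into one vector."""
    return (
        sentiment_one_hot(sentiment_label)
        + list(liwc_features)
        + profanity_vector(tokens)
    )
```

The resulting vector is what would be fed alongside the embedding-based primary input.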
Transfer Learning via CNNs and LSTMs
The proposal to apply transfer learning is inspired by the fact that, despite the small-sized dataset, it provides a relative performance increase at a reduced storage and computational cost (Bengio, 2012). Deep learning models pre-trained on the English Offensive Tweet (EOT) dataset learn the low-level features of English-language tweets. The weights of the initial convolutional layers are frozen while the last few layers are kept trainable, so that when the model is retrained on the HOT dataset, it learns to extract high-level features corresponding to syntax variations in the translated Hinglish language.
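The freezing step can be illustrated without any deep learning framework. The sketch below uses a hypothetical `Layer` record with a `trainable` flag (Keras layers expose an attribute of the same name), and keeping the last two layers trainable is my assumption, since "last few" is not pinned down above.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Stand-in for a network layer; only the name and trainable flag matter here."""
    name: str
    trainable: bool = True


def freeze_initial_layers(layers, n_trainable_tail):
    """Freeze every layer except the last n_trainable_tail ones (in place)."""
    cutoff = len(layers) - n_trainable_tail
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return layers


# Model pre-trained on EOT, about to be retrained on HOT.
pretrained = [Layer(n) for n in
              ("conv_1", "conv_2", "conv_3", "dropout", "flatten", "dense_64", "dense_out")]
freeze_initial_layers(pretrained, n_trainable_tail=2)
trainable_names = [layer.name for layer in pretrained if layer.trainable]
```

After this call only `dense_64` and `dense_out` would be updated during retraining, while the convolutional layers keep the weights learned on English tweets.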
Diving deeper into the architecture
CNN: Convolutional 1D layer (filter size=15, kernel size=3) → Convolutional 1D (filter size=12, kernel size=3) → Convolutional 1D (filter size=10, kernel size=3) → Dropout (0.2) → Flatten Layer → Dense Layer (64 units, activation = ’relu’) → Dense Layer (3 units, activation = ’softmax’)
LSTM: LSTM layer (h=64, dropout=0.25, recurrent dropout=0.3) → Dense (64 units, activation = ’relu’) → Dense (3 units, activation = ’sigmoid’)
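To get a feel for how small these two channels are, the following sketch works out the flattened feature size and the trainable parameter counts with the standard formulas. It rests on several assumptions: "filter size" is read as the number of filters, convolutions use stride 1 with no padding, the embedding layer is excluded, and the sequence length and embedding dimension are values you supply.

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a stride-1, unpadded 1D convolution."""
    return length - kernel_size + 1


def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def lstm_params(input_dim, hidden):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (hidden * (input_dim + hidden) + hidden)


def cnn_summary(seq_len, emb_dim, conv_filters=(15, 12, 10), kernel_size=3,
                dense_units=64, n_classes=3):
    """Return (flattened size, total parameters) for the CNN channel."""
    length, channels, total = seq_len, emb_dim, 0
    for filters in conv_filters:
        total += conv1d_params(channels, filters, kernel_size)
        length, channels = conv1d_out_len(length, kernel_size), filters
    flat = length * channels  # dropout and flatten add no parameters
    total += dense_params(flat, dense_units) + dense_params(dense_units, n_classes)
    return flat, total


def lstm_summary(emb_dim, hidden=64, dense_units=64, n_classes=3):
    """Return total parameters for the LSTM channel."""
    return (lstm_params(emb_dim, hidden)
            + dense_params(hidden, dense_units)
            + dense_params(dense_units, n_classes))
```

Calling `cnn_summary(seq_len, emb_dim)` and `lstm_summary(emb_dim)` with your own input shape shows that most of the CNN's parameters sit in the first dense layer, which is also the part that stays trainable during transfer learning.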
Results
Results for non-offensive, abusive, hate-inducing tweet classification on EOT, HOT and the HOT dataset with transfer learning (TFL) for Glove, Twitter Word2vec and FastText embeddings
Key takeaways:
- SVM supplemented with TF-IDF features gives peak performance when compared to other configurations of baseline supervised classifiers.
- TF-IDF is the most effective feature for semantically representing Hinglish text and gives better performance than both Bag of Words Vector and Character N-grams.
- After transfer learning, the model performances on HOT improve significantly, strengthening the argument that there was a positive transfer of features from English to Hinglish tweet data.
- While sentiment score isn’t very useful, the combination of profanity vector and LIWC features leads to huge improvements.
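Since TF-IDF comes out on top among the baseline features, here is a minimal sketch of how the weights are computed. It uses the textbook definition (term frequency times log of inverse document frequency); libraries often apply smoothed variants, so the exact numbers in our experiments may differ, and the function name is just illustrative.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenised documents.

    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that appears in every tweet gets a weight of zero, while rare and distinctive terms (which offensive slurs often are) get higher weights. These vectors are what a baseline classifier such as an SVM would consume.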
Results of the MIMCT model with various input features on HOT, compared to the previous baseline. Primary inputs are enclosed within parentheses, and secondary inputs are enclosed within square brackets.
Error Analysis
- Creative word morphing: Human annotators as well as the classifier misidentified the tweet ’chal bhaag m*mdi’, which translates in English as ’go run m*mdi’, as non-offensive instead of hate-inducing. Here ’m*mdi’ is an indigenous way of referring to a particular minority that has been morphed to escape possible identification.
- Indirect hate: The tweet ’Bas kar ch*tiye m***rsa educated’ was correctly identified by our annotators as hate-inducing, but the classifier identified it as abusive. This is because pre-processing turns this tweet into ’Limit it m*ther f*cking religious school educated’, which loses its contextual reference to the customs and traditions of a particular community.
- Uncommon Hinglish words: The work in its present form does not deal with uncommon and unknown Hinglish words. These may arise due to spelling variations, homonyms, grammatical incorrectness, mixing of foreign languages, the influence of regional dialects or negligence due to the subjective nature of the transliteration process.
- Analysis of code-mixed words: It has been shown in previous research that bilingual languages tend to be biased in favour of code-mixing of certain words at specific locations in text. Contextual investigation in this direction can be useful to eliminate the subjective problem of Hinglish to English transliteration in future work.
- Possible overfitting on homogeneous data: The data usually present on social media portals tends to be noisy and often repetitive in content. The skew in the class balance of the dataset, coupled with training a deep layered model, may lead to overfitting and may induce a large variation between expected and real-world results. We suspect this might be inherent in the present experiments and can be overcome by extracting data from heterogeneous sources to model a real-life scenario.
Conclusion and Future work
The main contributions of our work can be summarised as follows:
- We build an annotated Hinglish Offensive Tweet (HOT) dataset.
- We ascertain the usefulness of transfer learning for classifying offensive Hinglish tweets.
- We build a novel MIMCT model that outperforms the baseline models on HOT.
The future work entails:
- Feature selection for more optimal feature subsets.
- Exploration of GRU based models.
- Looking into stacked ensemble of shallow CNNs.
