Divide and Conquer: An Ensemble Approach for Hostile Post Detection in Hindi

The NLP community has recently taken an interest in the challenging task of hostile post detection. This paper presents our system for the Constraint 2021 shared task on "Hostile Post Detection in Hindi". The data for this shared task was collected from Twitter and Facebook and is provided in Hindi Devanagari script. The task is a multi-label, multi-class classification problem in which each data instance is annotated with one or more of five classes: fake, hate, offensive, defamation, and non-hostile. To solve it, we propose a two-level architecture composed of BERT-based classifiers and statistical classifiers. Our team, 'Albatross', achieved a coarse-grained hostility F1 score of 0.9709 on the Hostile Post Detection in Hindi subtask and ranked 2nd out of 45 teams. Our submissions ranked 2nd and 3rd out of 156 total submissions, with coarse-grained hostility F1 scores of 0.9709 and 0.9703, respectively. Our fine-grained scores are also very encouraging and can be improved with further fine-tuning. The code is publicly available.
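
The multi-label setup and the idea of a two-level pipeline can be illustrated with a minimal sketch. The abstract does not say how the two levels are divided, so the split used here is an assumption: level 1 decides hostile versus non-hostile, and level 2 runs one binary classifier per hostile class. The names `to_multi_hot`, `two_level_predict`, and the keyword-based toy classifiers are hypothetical stand-ins for illustration only, not the BERT-based or statistical models used by the authors.

```python
from typing import Callable, Dict, List

# The four hostile classes named in the abstract; "non-hostile" is the fifth class.
HOSTILE_CLASSES = ["fake", "hate", "offensive", "defamation"]
NON_HOSTILE = "non-hostile"


def to_multi_hot(labels: List[str]) -> List[int]:
    """Encode a post's label set as a 0/1 vector over the four hostile classes.

    A non-hostile post maps to the all-zero vector.
    """
    return [1 if c in labels else 0 for c in HOSTILE_CLASSES]


def two_level_predict(
    post: str,
    is_hostile: Callable[[str], bool],
    fine_classifiers: Dict[str, Callable[[str], bool]],
) -> List[str]:
    """ASSUMED two-level cascade (not confirmed by the source).

    Level 1: a coarse classifier decides hostile vs. non-hostile.
    Level 2: one binary classifier per hostile class assigns fine-grained labels.
    An empty list means level 1 flagged the post but no level-2 classifier fired.
    """
    if not is_hostile(post):
        return [NON_HOSTILE]
    return [c for c in HOSTILE_CLASSES if fine_classifiers[c](post)]


# Hypothetical keyword-based stand-ins for trained models.
toy_is_hostile = lambda post: "!" in post
toy_fine = {
    "fake": lambda post: "rumour" in post,
    "hate": lambda post: "hate" in post,
    "offensive": lambda post: "idiot" in post,
    "defamation": lambda post: "liar" in post,
}

prediction = two_level_predict("rumour spreads!", toy_is_hostile, toy_fine)
```

In this sketch a post can receive several fine-grained labels at once, which matches the multi-label annotation described above, while the coarse-grained decision is isolated in its own step.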
