TeamMX at PoliticEs 2022: Analysis of Feature Sets in Spanish Author Profiling for Political Ideology

José Luis Ochoa-Hernández*, Yuridiana Alemán

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Natural Language Processing (NLP) is evolving rapidly and is becoming a powerful tool, especially in combination with Machine Learning algorithms. It is expanding into areas where it was previously little known, such as automatic programming systems based on the GPT-3 model, market and sales prediction, and even risk detection in banking systems based on written exchanges between branch managers or directors of the same bank. Short texts, such as comments and reviews posted on social networks like Twitter, Facebook, or YouTube, are becoming relevant in several domains. We used the corpus provided by the IberLEF 2022 PoliticEs task to extract political ideology information. The task focuses on identifying gender, profession, and political spectrum from a binary (Left, Right) and a multi-class (Left, Right, Moderate-Left, Moderate-Right) perspective. Eight methods are proposed; six of them did not produce the expected results but contributed to the two best ones. We carried out a customized stopword study together with experiments such as best unique words per category, a set-based study, transition point, and others to extract the features. Random Forest, SVM, and Neural Network algorithms, run with default parameters in scikit-learn, were then used to identify the categories. We obtained a Macro F1 value of 0.7984; the highest value achieved was 0.8270, in the Profession category.
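As a rough illustration of two ideas named in the abstract, the sketch below shows one possible reading of "best unique words per category" (words that, after stopword removal, occur in only one category) and a plain computation of Macro F1. It is not the authors' implementation: the stopword list, the toy texts, and the function names `tokenize`, `unique_words_per_category`, and `macro_f1` are all hypothetical, and it uses only the Python standard library rather than scikit-learn.

```python
from collections import Counter

# Hypothetical custom stopword list; the list actually used in the paper is not given here.
STOPWORDS = {"de", "la", "el", "que", "y", "en", "los", "a"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


def unique_words_per_category(texts, labels, top_k=2):
    """Return, per category, the most frequent words seen in no other category.

    This is an assumed interpretation of a set-based "unique words" feature.
    """
    counts = {}
    for text, label in zip(texts, labels):
        counts.setdefault(label, Counter()).update(tokenize(text))

    result = {}
    for label, counter in counts.items():
        # Vocabulary of every other category.
        others = set()
        for other_label, other_counter in counts.items():
            if other_label != label:
                others.update(other_counter)
        unique = [(w, c) for w, c in counter.items() if w not in others]
        # Most frequent first; ties broken alphabetically for determinism.
        unique.sort(key=lambda wc: (-wc[1], wc[0]))
        result[label] = [w for w, _ in unique[:top_k]]
    return result


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data invented for this example only.
texts = [
    "igualdad y derechos sociales",
    "derechos sociales en la sanidad",
    "libertad y bajada de impuestos",
    "impuestos bajos y libertad",
]
labels = ["left", "left", "right", "right"]

features = unique_words_per_category(texts, labels)
score = macro_f1(["left", "left", "right", "right"],
                 ["left", "right", "right", "right"])
```

In a fuller pipeline, word lists like `features` would be turned into feature vectors and passed to classifiers such as the Random Forest, SVM, and Neural Network models mentioned above, with `macro_f1` (or scikit-learn's equivalent metric) used for evaluation.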

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3202
State: Published - 2022
Event: 2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruña, Spain
Duration: 20 Sep 2022 → …

Bibliographical note

Publisher Copyright:
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Keywords

  • Author profiling
  • Authorship analysis
  • Authorship attribution
  • Linguistic features
  • Natural language processing
