Natural Language Processing (NLP) is evolving more and more every day and it is becoming a very powerful tool, especially when it works in combination with Machine Learning algorithms, as it is making ventures into areas in which it was not well known, such as automatic programming systems based on the GPT-3 model, the market or sales prediction, even, the risk detection in banking systems on the basis of written exchanges between branch managers or directors of the same bank. The so-called short texts, comments/reviews made on social networks like Twitter, Facebook or Youtube, are becoming relevant in several domains. The corpus provided by the IberLEF 2022 Task - PoliticEs was used for extract political ideology information, it was focused on the identification of the gender, the profession, and the political spectrum from a binary (Left, Right) and multi-class perspective (Left, Right, Moderate-Left and Moderate-Right). Eight methods are proposed, six of them didn't have the expected results, but contributed to the two best ones. We implemented a customized stopwords study for our research in collaboration with experiments such as Best unique words per category, Set-based study, Transition point and others to extract the features, then Random Forest, SVM and Neural Network algorithms with default parameters and the Scikit learn tool were used to identify the categories. Obtaining a Macro F1 value of 0.7984 and the highest value achieved was 0.8270 in the category of Profession.
|CEUR Workshop Proceedings
|Published - 2022
|2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruna, Spain
Duration: 20 Sep 2022 → …
Bibliographical notePublisher Copyright:
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
- Author profiling
- Authorship analysis
- Authorship attribution
- Linguistic features
- Natural language processing