Sentiment analysis of comments in Classical Arabic and Moroccan Dialect: book recommendation and publications on Covid-19.

Engineer: Mohamed Guassim
Organisation: HENCEFORTH
Language: French
Promotion: 2020
Year: 3

Abstract

Most brands and companies, across sectors, are aware of the importance of the opinions expressed about their products and services. Monitoring their e-reputation by analyzing large numbers of posts and comments from social networks is therefore an effective way to improve their image, propose new offers and promotions, run marketing campaigns, and make the right decision at the right time to maximize profit and minimize expenses.

This report is written in that context. It addresses a Machine Learning approach to analyzing sentiment in comments written in Modern Standard Arabic or Moroccan Dialect. The work begins with the selection of a book-review dataset for Modern Standard Arabic and the manual annotation, in three categories (positive, negative or objective), of a dataset of 18,959 comments on Covid-19 collected from Facebook posts. We then pre-process and explore the textual comments to better understand the dataset and its characteristics. Next, we present the steps taken to train and adapt Machine Learning and Deep Learning models to classify these comments into 2 and 3 classes, and then into 5 classes for a deeper analysis. For sentiment analysis, several studies were carried out in order to build adequate models. Combinations of N-grams, Stemming, Light Stemming and Stopword removal were all tested.

Experimental results show that Convolutional Neural Networks (CNN), Support Vector Machines (SVM) and Naive Bayes (NB) are effective models for sentiment analysis. We achieved 93% accuracy with the SVM model for 3-class classification, 81% accuracy with the CNN model for 5-class classification, and 93.7% accuracy with the NB model for 2-class classification. These results are very satisfactory when compared with recent research that used the same models for the same purpose.
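To make the kind of pipeline mentioned above more concrete, here is a minimal sketch of word N-gram features with Stopword removal feeding a Naive Bayes classifier. It is an illustration only, not the implementation used in the report: the stopword list, the toy comments, and the names `ngram_features` and `NaiveBayes` are all assumptions, and stemming is left out for brevity.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, tiny stopword list (illustration only).
STOPWORDS = {"في", "من", "على", "و"}


def ngram_features(text, n_max=2, stopwords=STOPWORDS):
    """Split on whitespace, drop stopwords, return all word 1..n_max-grams."""
    tokens = [t for t in text.split() if t not in stopwords]
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = ngram_features(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy comments, NOT taken from the report's datasets.
# Rough meanings, in order: "a very wonderful book", "a beautiful and
# enjoyable story", "a very bad book", "a boring and bad story".
train_texts = [
    "كتاب رائع جدا",
    "قصة جميلة و ممتعة",
    "كتاب سيء جدا",
    "قصة مملة و سيئة",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("كتاب رائع")  # "a wonderful book"
print(prediction)
```

On this toy data the classifier labels the last comment as positive, because the unigram and bigram it shares with the positive examples outweigh the smoothed counts on the negative side. A real system would add a third "objective" class, a stemmer or light stemmer, and a far larger annotated corpus.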