Customer Review Sentiment Analysis

The project aims to analyse the sentiment of Amazon reviews with the help of machine learning and deep learning. Sentiment analysis (or opinion mining) uses natural language processing and machine learning to interpret and classify emotions in subjective data. This helps in understanding the overall sentiment of customers as well as predicting the sentiment of incoming reviews. Additionally, the models were deployed to Heroku, and a Flutter application was developed to predict the sentiment of any input sentence via API calls.

The reviews were collected from the dataset provided by Jianmo Ni, available HERE. They were first cleaned by removing stopwords and lemmatizing each word of the sentence to obtain its root. Feature extraction was then performed on the cleaned reviews using two approaches: binary/boolean features and count vectorization.
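The cleaning step can be sketched as follows; the project relies on NLTK's stopword corpus and WordNetLemmatizer, which the tiny stopword set and lemma table below merely stand in for:

```python
import re

# Tiny stand-ins for NLTK's stopword list and WordNetLemmatizer,
# used here only to keep the sketch self-contained.
STOPWORDS = {"the", "is", "a", "an", "and", "it", "this", "was"}
LEMMAS = {"works": "work", "loved": "love", "broken": "break"}

def clean_review(text):
    # Lowercase, keep word characters only, drop stopwords, map words to roots.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]

print(clean_review("This product works and it is loved"))
# → ['product', 'work', 'love']
```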

  • True/False approach:
    • In this approach, the presence of certain words (features) in the training data is tested and fitted against the resulting sentiment.
    • Each feature is assigned a sentiment probability based on how often each classification occurs when the feature is true (present) or false (absent).
    • Only the occurrence of a feature is considered, not its frequency.
  • Count Vectorization approach:
    • In this approach, the words (features) are converted into a vector (matrix) storing the number of occurrences of each feature, i.e. its frequency.
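The difference between the two representations can be illustrated in plain Python (a sketch; the vocabulary and review tokens are hypothetical):

```python
from collections import Counter

def boolean_features(tokens, vocabulary):
    # True/False approach: only the presence/absence of each feature matters.
    present = set(tokens)
    return {word: (word in present) for word in vocabulary}

def count_features(tokens, vocabulary):
    # Count-vectorization approach: the frequency of each feature matters.
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["good", "bad", "great"]
review = ["good", "good", "great"]
print(boolean_features(review, vocab))  # {'good': True, 'bad': False, 'great': True}
print(count_features(review, vocab))    # [2, 0, 1]
```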


The following classifiers were utilized:

  • From NLTK (True/False based):
    • Bernoulli Naive Bayes
  • From scikit-learn (Count Vectorization - unigrams and bigrams):
    • Multinomial Naive Bayes
    • Support Vector Machines
    • Decision Tree
  • From TensorFlow (pre-trained text embedding layer trained on the English Google News 7B corpus):
    • LSTM
    • ANN
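A minimal sketch of the count-vectorized pipeline, here with Multinomial Naive Bayes (the toy corpus and labels stand in for the cleaned review data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for the cleaned Amazon reviews.
train_texts = ["great product love it", "works great highly recommend",
               "terrible waste of money", "broke after one day awful"]
train_labels = ["pos", "pos", "neg", "neg"]

# ngram_range=(1, 2) extracts both unigram and bigram counts,
# mirroring the uni-gram/bi-gram variants reported below.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(train_texts, train_labels)
print(model.predict(["love this great product"])[0])  # → pos
```

Swapping `MultinomialNB()` for `LinearSVC()` or `DecisionTreeClassifier()` gives the SVM and DT variants with the same vectorization.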

The models were developed using Google Colab (Link HERE).

To perform live testing of the models as well as deployment, an application named ‘ARSA’ (Amazon Review Sentiment Analysis) was developed (Code HERE). The Flask program and ML models were uploaded to Heroku via the GitHub REPO and the server was deployed. Within the server, the Flask API waits for a POST request, which is sent by the Flutter app when the user presses the ‘PREDICT’ button. The request carries the data in JSON format and invokes the ‘/predict’ route, where all the analysis takes place.
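The server side can be sketched as below, assuming the review text arrives under a "text" key (the field name and the stub classifier are illustrative; the real app loads the trained models):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_sentiment(text):
    # Stub standing in for the deployed ML models.
    return "pos" if "good" in text.lower() else "neg"

@app.route("/predict", methods=["POST"])
def predict():
    # The Flutter app POSTs the review as JSON when 'PREDICT' is pressed.
    data = request.get_json()
    return jsonify({"sentiment": predict_sentiment(data.get("text", ""))})

# On Heroku the app would typically be served by a WSGI server such as
# gunicorn rather than Flask's built-in development server.
```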


Application Screenshots:


Classification Report of all models:

| Model | Class | Precision | Recall | F1-score | Support |
|-------|-------|-----------|--------|----------|---------|
| SVM (uni-grams) | neg | 0.83 | 0.51 | 0.63 | 2446 |
| | pos | 0.86 | 0.97 | 0.91 | 7554 |
| | accuracy | | | 0.85 | 10000 |
| | macro avg | 0.84 | 0.74 | 0.77 | 10000 |
| | weighted avg | 0.85 | 0.85 | 0.84 | 10000 |
| SVM (bi-grams) | neg | 0.83 | 0.53 | 0.65 | 2446 |
| | pos | 0.86 | 0.97 | 0.91 | 7554 |
| | accuracy | | | 0.86 | 10000 |
| | macro avg | 0.85 | 0.75 | 0.78 | 10000 |
| | weighted avg | 0.86 | 0.86 | 0.85 | 10000 |
| DT (uni-grams) | neg | 0.56 | 0.54 | 0.55 | 2446 |
| | pos | 0.85 | 0.86 | 0.86 | 7554 |
| | accuracy | | | 0.78 | 10000 |
| | macro avg | 0.70 | 0.70 | 0.70 | 10000 |
| | weighted avg | 0.78 | 0.78 | 0.78 | 10000 |
| DT (bi-grams) | neg | 0.56 | 0.54 | 0.55 | 2446 |
| | pos | 0.85 | 0.86 | 0.86 | 7554 |
| | accuracy | | | 0.79 | 10000 |
| | macro avg | 0.71 | 0.70 | 0.70 | 10000 |
| | weighted avg | 0.78 | 0.79 | 0.78 | 10000 |
| MNB (uni-grams) | neg | 0.70 | 0.66 | 0.68 | 2352 |
| | pos | 0.90 | 0.91 | 0.90 | 7648 |
| | accuracy | | | 0.85 | 10000 |
| | macro avg | 0.80 | 0.79 | 0.79 | 10000 |
| | weighted avg | 0.85 | 0.85 | 0.85 | 10000 |
| MNB (bi-grams) | neg | 0.68 | 0.73 | 0.70 | 2352 |
| | pos | 0.90 | 0.91 | 0.90 | 7648 |
| | accuracy | | | 0.85 | 10000 |
| | macro avg | 0.79 | 0.81 | 0.80 | 10000 |
| | weighted avg | 0.86 | 0.85 | 0.85 | 10000 |
| Neural Network | neg | 0.73 | 0.63 | 0.68 | 1276 |
| | pos | 0.88 | 0.92 | 0.90 | 3724 |
| | accuracy | | | 0.85 | 5000 |
| | macro avg | 0.80 | 0.78 | 0.79 | 5000 |
| | weighted avg | 0.84 | 0.85 | 0.84 | 5000 |
| LSTM | neg | 0.66 | 0.75 | 0.70 | 1276 |
| | pos | 0.91 | 0.87 | 0.89 | 3724 |
| | accuracy | | | 0.84 | 5000 |
| | macro avg | 0.78 | 0.81 | 0.79 | 5000 |
| | weighted avg | 0.85 | 0.84 | 0.84 | 5000 |

It was observed that DT had the lowest accuracy (0.78 - 0.79), while the rest of the models performed similarly (0.84 - 0.86). The neural network models achieved accuracy (0.84 - 0.85) comparable to that of the simpler ML counterparts; however, it must be noted that this was achieved with half the data fed to the ML models. In that sense, the neural network models outperform the ML models. On the other hand, the neural network models are comparatively heavier to deploy than the ML models, as they require more computational power as well as memory.