Convolutional Neural Network Method in Determining
Pfizer Vaccination Sentiment Analysis
Riza Adrianti Supono1*, Hizkia Abiel Muljana2
Master of Information Systems Management, Universitas Gunadarma, Depok, Indonesia1*
Information Systems, Faculty of Computer Science and Industrial Technology, Universitas Gunadarma, Depok, Indonesia2
Email: [email protected]1*,
[email protected]2
ABSTRACT
Coronavirus disease (COVID-19) is caused by the SARS-CoV-2 virus, which attacks the human respiratory system. Because of the rapid spread of infection, the WHO declared COVID-19 a pandemic. Over time, several vaccines have been developed that are thought to reduce the likelihood of infection. One of these vaccines is Pfizer. The side effects of the Pfizer vaccine have given rise to pros and cons among the public. Therefore, sentiment analysis was carried out on public opinion using data sourced from tweets on Twitter. The method used to build the model is the Convolutional Neural Network (CNN). The model was built and evaluated on 1,159 training data and 773 test data. It obtained an accuracy of 98.87% on the training data and 69.46% on the test data.
Keywords: Pfizer Vaccination, Sentiment Analysis,
Convolutional Neural Network
Coronavirus disease (COVID-19) is caused by the SARS-CoV-2 virus, which attacks the human respiratory system (Zhang et al., 2020). The first case occurred in Wuhan, China at the end of 2019, and the disease then spread to all parts of the world, leading the World Health Organization (WHO) to declare COVID-19 a pandemic (Chaurasiya et al., 2020). In response, various countries tried to suppress the spread by closing access to several areas (Saadat et al., 2020), which had a large impact on the lifestyle of many groups in society (Wilkinson, 2020). Researchers and medical personnel therefore sought to overcome the virus by finding a suitable vaccine (Krubiner et al., 2021). So far, several vaccines have been developed that can help reduce the likelihood of contracting COVID-19 (Sultana et al., 2020).
One of the vaccines that has been successfully developed is Pfizer (Bernal et al., 2021). This vaccine was produced through a collaboration between BioNTech, Fosun and Pfizer (Feix & Feix, 2021). The Pfizer vaccine is highly effective at stimulating the production of antibodies against COVID-19, with a reported effectiveness of 90% (Inchingolo et al., 2022). According to covid.go.id, the vaccine arrived in Indonesia in August 2021 and has been distributed in stages, both as dose 1 for those not yet vaccinated and as dose 2.
On January 11, 2022, the government decided to administer a 3rd vaccine dose (booster). The program began on January 12, 2022 and prioritized elderly and vulnerable groups who had received their 2nd dose more than 2 months earlier. One of the vaccines used for the booster program is Pfizer (Yehezkie & Ramatillah, 2023).
Use of the vaccine causes several side effects, which has given rise to pros and cons in public opinion. This opinion can be found on several social media platforms available in Indonesia. Social media helps people communicate over long distances, even with people they have never met. Apart from communication between individuals, social media can be used to disseminate information and public opinion. One of the social media platforms commonly used by Indonesians is Twitter. According to Angeline Puput Giovani et al., Twitter has become popular in Indonesian society because of its simplicity and ease of use, and because users can freely express their views or opinions. Users can search for information by typing certain keywords. Twitter has also become a forum for public opinion on particular topics. Therefore, to make it easier to respond to a given topic, sentiment analysis can be carried out and its results used as a basis for assessing public response.
According to Sukma Nindi Listyarini, sentiment analysis is the computational study of individual attitudes, opinions and emotions towards an entity. Entities can represent individuals, events, or topics. Algorithms commonly used for sentiment analysis in Indonesian are Naive Bayes, Maximum Entropy (ME), Support Vector Machine (SVM), and Decision Tree. Meanwhile, research on sentiment analysis in English has applied a deep learning method, namely the Convolutional Neural Network (CNN), which produces much better output than other algorithms, namely precision 7%, recall 8%, and F1 score 9%.
Previous related research provides an important foundation for understanding Convolutional Neural Network (CNN) methods. Azhar Eka Mulia Wiguna et al. (2021) applied CNN to detect threatening speech in Twitter posts, with a system accuracy of 80.63%. Hans Juwiantho et al. (2020) developed a Word2Vec model for Twitter sentiment analysis in Indonesian, achieving an average accuracy of up to 76.40%. Research by Sukma Nindi Listyarini and Dimas Aryo Anggoro achieved a highest accuracy of 90% in analyzing regional election activities using CNN. Kzar (2023) also used CNN for product sentiment analysis, with the best model achieving an accuracy of 81.4%. Finally, Parameswari (2022) achieved a highest accuracy of 86% in analyzing opinions about the environment in Depok City.
The methodology consists of several stages: system requirements analysis, data collection, data preprocessing, sentiment labeling, vector representation of the data, model training and testing, and display of the sentiment results.
This section explains the stages carried out in the research to produce the planned output. The first stage analyzes the system requirements of the research. The second stage collects tweets in Indonesian with the keyword "Pfizer" to be used as the dataset. The third stage preprocesses the data through case folding, text cleaning, normalization, stopword removal and stemming. The fourth stage labels the dataset as "positive", "neutral" or "negative". The fifth stage converts the labeled data into vectors. The sixth stage builds a CNN model, which is trained, validated and tested to obtain the accuracy of the model. The seventh stage visualizes the results of the model.
System requirements analysis is a stage that aims to determine the functional and non-functional requirements of the research.
Functional requirements in this research:
a. Data preparation results.
b. Classification results: positive, neutral and negative.
c. Results of the Convolutional Neural Network (CNN) method.
Non-functional requirements in this research:
a. Software requirements:
1) MacOS High Sierra version 10.13.6
2) Google Chrome
3) Google Colab
4) Python
5) Google Sheets
b. Hardware requirements:
1) 2.5GHz Intel Core i5
2) 4 GB 1600 MHz DDR3
3) 500GB HDD storage
This research uses secondary data obtained by collecting tweets in Indonesian that contain the keyword "Pfizer". The data was obtained by crawling with the tweepy and twint libraries. The crawling process collected a total of 5,131 tweets published between April 9, 2022 and June 6, 2022. Table 1 shows some examples of the collected tweets.
Table 1. Raw Tweets
| No | Date | Username | Tweet |
|---|---|---|---|
| 1 | 2022-06-06 11:47:01 | Ndhr | Anyway, you can check the Malaysian version of the complete vaccine definition here https://t.co/WDfimMvRbD The point is, if you are 60+ and/or have 1-2 Sinovac or AstraZeneca vaccines, you must get a booster. If you're 60 or under, just get the Pfizer or Moderna vaccine twice and consider being fully vaccinated. |
| 2 | 2022-06-06 10:04:37 | Style_ID | FDA Accepts Pfizer Application for Covid-19 Vaccine for Children Under 5 Years Read more information at https://t.co/ujudUa62kz #styleid #fda #vaksinasicovid19 #vaksinpfizer #vaksinanak #health |
| 3 | 2022-06-06 9:54:53 | SuperB | RT @clairvoyant_cl: Has anyone tried the main whole virus vaccine (Sinovac, Sinopharm), booster 1 mRNA (Pfizer, Moderna), bo... |
The data preprocessing stage converts unstructured data into more structured data, which is needed to support data processing at the next stage.
Case folding is the initial step in turning raw data into data that is ready to use. It converts every letter in the data to lower case.
Text cleaning removes URLs, punctuation, special characters and repeated spaces from the tweet data. Table 2 shows how this stage works.
Table 2. Cleaning Text
| No | Tweet | Tweet_Clean |
|---|---|---|
| 1 | Anyway, you can check the Malaysian version of the complete vaccine definition here https://t.co/WDfimMvRbD The point is, if you are 60+ and/or have 1-2 Sinovac or AstraZeneca vaccines, you must get a booster. If you're 60 or under, just get the Pfizer or Moderna vaccine twice and consider being fully vaccinated. | Anyway, you can check the Malaysian version of the complete vaccine definition here https://t.co/wdfimmvrbd Basically, if you are 60+, and/or have 1-2 Sinovac or AstraZeneca vaccines, you need a booster. If you are 60 or under, just get the Pfizer or Moderna vaccine twice and consider being fully vaccinated. |
| 2 | FDA Accepts Pfizer Application for Covid-19 Vaccine for Children Under 5 Years Read more information at https://t.co/ujudUa62kz #styleid #fda #vaksinasicovid19 #vaksinpfizer #vaksinanak #health | FDA accepts Pfizer application for Covid-19 vaccine for children under 5 years old see complete information at https://t.co/ujudua62kz #styleid #fda #vaksinasicovid19 #vaksinpfizer #vaksinanak #health |
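The case folding and text cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code (their function code appears only in figures). The regular expressions and the function names `case_fold` and `clean_text` are assumptions made for this sketch.

```python
import re

# Assumed pattern for links; the paper does not give its exact rule.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def case_fold(text: str) -> str:
    """Convert every letter in the tweet to lower case."""
    return text.lower()


def clean_text(text: str) -> str:
    """Remove URLs, punctuation, special characters and repeated spaces."""
    text = URL_PATTERN.sub(" ", text)
    # Anything that is not a letter, digit or whitespace becomes a space.
    text = re.sub(r"[^0-9a-zA-Z\s]", " ", text)
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


raw = "FDA Accepts Pfizer Application for Covid-19 Vaccine https://t.co/ujudUa62kz #fda #health"
cleaned = clean_text(case_fold(raw))
print(cleaned)
```

Folding case before cleaning keeps the two steps independent, so either one can be adjusted without touching the other.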
Normalization converts non-standard words into their standard forms, following the rules of written Indonesian.
Table 3. Normalization
| No | Tweet | Tweet_Clean |
|---|---|---|
| 1 | Anyway, the Malaysian version of the complete vaccine definition can be checked here. Basically, if you are old, and/or the vaccine is Sinovac or AstraZeneca, you must get a booster. If the vaccine is lower than Pfizer or Moderna, I'd consider being fully vaccinated. | Anyway, the Malaysian version of the complete vaccine definition can be checked here. The point is, if you are old, and/or the vaccine is Sinovac or Astrazeneca, you must get a booster. If the vaccine is lower than Pfizer or Moderna, consider being fully vaccinated. |
| 2 | fda accepts pfizer application for covid vaccine - children under 1 year old see more information at inasicovid inanak | fda accepts pfizer application for covid vaccine for children under 1 year old see complete information at inasicovid inanak |
| 3 | _cl: has anyone tried the main whole virus vaccine (sinovac, sinopharm), mrna booster (pfizer, moderna), bo... | _cl: has anyone tried the main whole virus vaccine (sinovac, sinopharm), mRNA booster (pfizer, moderna), bo... |
Stopword
removal is a stage for removing words that are considered unimportant and do
not affect analysis activities.
Table 4. Stopword Removal
| No | Tweet | Tweet_Clean |
|---|---|---|
| 1 | Anyway, the Malaysian version of the complete vaccine definition can be checked here. The point is, if you are old, and/or the vaccine is Sinovac or Astrazeneca, you must get a booster. If the vaccine is lower than Pfizer or Moderna, consider being fully vaccinated. | Anyway, check the Malaysian version of the complete vaccine definition here. Yes, the point is that if you are old, the Sinovac Astrazeneca vaccine requires a booster. If the vaccine is lower than Pfizer or Moderna, consider fully vaccinated. |
| 2 | fda accepts pfizer application for covid vaccine for children under 1 year old see complete information at inasicovid inanak | FDA accepts Pfizer Covid vaccine application for under-year-olds. See complete information about Covid-19 in children |
| 3 | _cl: has anyone tried the main whole virus vaccine (sinovac, sinopharm), mRNA booster (pfizer, moderna), bo... | _cl: have you ever tried the main whole virus vaccine (sinovac, sinopharm), mrna booster (pfizer, moderna), bo... |
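Normalization and stopword removal both amount to dictionary lookups over tokens, which the following sketch illustrates. The paper does not list its normalization dictionary or stopword list, so `SLANG_TO_STANDARD` and `STOPWORDS` below are small hypothetical examples chosen only to show the mechanism.

```python
# Hypothetical lookup tables; the paper does not publish the ones it used.
SLANG_TO_STANDARD = {
    "gak": "tidak",   # informal "not"
    "yg": "yang",     # abbreviation of "which/that"
    "udah": "sudah",  # informal "already"
    "dgn": "dengan",  # abbreviation of "with"
}
STOPWORDS = {"yang", "di", "dan", "itu", "ini"}


def normalize(tokens):
    """Replace non-standard tokens with their standard form when known."""
    return [SLANG_TO_STANDARD.get(token, token) for token in tokens]


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


tokens = "yg udah vaksin pfizer gak demam".split()
normalized = normalize(tokens)
filtered = remove_stopwords(normalized)
print(normalized)
print(filtered)
```

Running normalization first matters here: "yg" only matches the stopword "yang" after it has been expanded.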
Stemming
is the process of changing a word back to its basic form. This can happen by
deleting affixes at the beginning and end of a word. The stemming process can
be seen in Table 5.
Table 5. Stemming
| No | Tweet | Tweet_Clean |
|---|---|---|
| 1 | Anyway, check the Malaysian version of the complete vaccine definition here. Yes, the point is that if you are old, the Sinovac Astrazeneca vaccine requires a booster. If the vaccine is lower than Pfizer or Moderna, consider fully vaccinated. | Anyway, the Malaysian version of the complete vaccine definition, check here. Yes, the point is, if the vaccine is old, Sinovac AstraZeneca requires a booster, if it's lower than the Pfizer and Moderna vaccines, consider being fully vaccinated. |
| 2 | FDA accepts Pfizer Covid vaccine application for under-year-olds. See complete information about Covid-19 in children | FDA accepts Pfizer Covid vaccine application for under-year-olds. Check out complete information about Nanak's Covid-19 vaccine |
| 3 | _cl: have you ever tried the main whole virus vaccine (sinovac, sinopharm), mrna booster (pfizer, moderna), bo... | cl has tried the principal whole virus vaccine Sinovac Sinopharm booster mrna pfizer moderna bo |
Labeling in this research uses TextBlob, which assigns a polarity value to each tweet. Based on this value, the tweet is given one of the labels -1, 0 or 1. If the value returned by TextBlob is smaller than 0, the tweet is labeled negative. If the value is greater than 0, it is labeled positive, and if the value is equal to 0, it is labeled neutral.
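The thresholding rule above can be written as a small function. This is a sketch under the assumption that the polarity score has already been computed; in TextBlob such a score is typically read from `TextBlob(text).sentiment.polarity`, which is not called here so that the example runs without extra dependencies. The names `label_from_polarity` and `LABEL_NAMES` are hypothetical.

```python
def label_from_polarity(polarity: float) -> int:
    """Map a polarity score to -1 (negative), 0 (neutral) or 1 (positive)."""
    if polarity < 0:
        return -1
    if polarity > 0:
        return 1
    return 0


# Readable names for the three numeric labels.
LABEL_NAMES = {-1: "negative", 0: "neutral", 1: "positive"}

# Example scores standing in for values a sentiment library would return.
for score in (-0.375, 0.0, 0.6):
    print(score, LABEL_NAMES[label_from_polarity(score)])
```

Note that only a score of exactly 0 yields the neutral label under this rule, so even a very weak polarity is counted as positive or negative.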
After data preprocessing, the dataset contains a total of 1,932 tweets with three labels: 797 positive, 835 neutral and 300 negative.
The training dataset makes up 60% of the dataset, namely 1,159 tweets.
The test dataset makes up 40% of the dataset, namely 773 tweets.
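As a quick check of these figures, the sketch below splits 1,932 items 60/40. The paper does not say how the split was drawn, so the shuffling, the seed and the function name `split_dataset` are assumptions for illustration only.

```python
import random


def split_dataset(items, train_fraction=0.6, seed=42):
    """Shuffle the items and split them into training and test portions."""
    items = list(items)
    random.Random(seed).shuffle(items)  # the seed is an arbitrary choice
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]


# Stand-in for the 1,932 preprocessed tweets.
train, test = split_dataset(range(1932))
print(len(train), len(test))
```

Since 60% of 1,932 is 1,159.2, rounding gives 1,159 training items and leaves 773 for testing, matching the counts above.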
In this section, we use the Keras
library, namely Tokenizer, to convert text into word index or binary vector
form.
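To illustrate what a word index is, the following pure-Python sketch builds one by ranking words by frequency and then maps each text to a list of integers. It is meant to mirror the general idea behind Keras' Tokenizer (more frequent words receive smaller indices, starting at 1), not to reproduce its exact behavior; the function names and the tiny corpus are assumptions.

```python
from collections import Counter


def fit_word_index(texts):
    """Assign index 1 to the most frequent word, 2 to the next, and so on."""
    counts = Counter(word for text in texts for word in text.split())
    # sorted() is stable, so words with equal counts keep first-seen order.
    ranked = sorted(counts, key=lambda word: -counts[word])
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace each known word with its integer index; skip unknown words."""
    return [[word_index[w] for w in text.split() if w in word_index]
            for text in texts]


corpus = ["vaksin booster pfizer", "vaksin pfizer demam", "vaksin booster"]
word_index = fit_word_index(corpus)
sequences = texts_to_sequences(corpus, word_index)
print(word_index)
print(sequences)
```

Because tweets differ in length, the resulting sequences usually need to be padded to a common length before they can be fed to a CNN.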
The CNN model architecture used is as follows:
This section explains CNN model training, followed by model validation and model testing.
Model training uses 60% of the dataset. This data is used to train a CNN model to classify the sentiment of tweets. The results are then evaluated with a confusion matrix consisting of one true positive cell, two false positive cells, one true negative cell, two false negative cells, one true neutral cell and two false neutral cells.
Table 6. Confusion Matrix
| Actual \ Predicted | Negative | Neutral | Positive |
|---|---|---|---|
| Negative | True Negative | False Neutral | False Positive |
| Neutral | False Negative | True Neutral | False Positive |
| Positive | False Negative | False Neutral | True Positive |
Based
on table 6, true negative shows data that is predicted correctly with negative
sentiment, while false negative shows data that is predicted incorrectly with
negative sentiment. True positive shows data that is predicted correctly with
positive sentiment, while false positive shows data that is predicted
incorrectly with positive sentiment. True neutral indicates data that is
predicted correctly with neutral sentiment, while false neutral indicates data
that is predicted incorrectly with neutral sentiment.
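A three-class confusion matrix of this kind can be built by counting (actual, predicted) pairs. The sketch below assumes rows are actual labels and columns are predicted labels, in the order negative, neutral, positive; the sample labels are invented purely to exercise the function.

```python
LABELS = ["negative", "neutral", "positive"]


def confusion_matrix(actual, predicted, labels=LABELS):
    """Count predictions: rows are actual labels, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Invented labels, used only to demonstrate the counting.
actual = ["negative", "neutral", "positive", "neutral", "positive"]
predicted = ["negative", "positive", "positive", "neutral", "neutral"]
matrix = confusion_matrix(actual, predicted)
for label, row in zip(LABELS, matrix):
    print(label, row)
```

The diagonal holds the correctly predicted items (true negative, true neutral, true positive), and every off-diagonal cell is one of the "false" cases.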
Model testing uses 40% of the dataset as test data. The test data is not included in the data used for model training. Testing uses a confusion matrix containing true positive, false positive, true negative, false negative, true neutral and false neutral values.
The final stage designs a visualization of the sentiment analysis results for Pfizer. The visualization aims to make the analyzed data easier to understand, and takes the form of a pie chart and a word cloud.
IMPLEMENTATION AND TESTING
The data preprocessing stage is carried out on all of the collected data. It consists of case folding, text cleaning, normalization, stopword removal and stemming. The following figures show the program code used during data preprocessing.

Figure 2. Case folding function code

Figure 3. Cleaning text function code

Figure 4. Normalization function code

Figure 5. Stopword removal function code

Figure 6. Stemming function code
Labeling uses the TextBlob library, with the Indonesian text first translated into English to obtain the polarity value. This value is then used to determine the label of the text: if the value is greater than 0 the text is labeled positive, if it is equal to 0 the text is labeled neutral, and if it is smaller than 0 the text is labeled negative. Labeling results can be seen in Table 7.
Table 7. Tweet Labeling
| No | Tweet Preprocessing Results | Label |
|---|---|---|
| 1 | Pfizer booster vaccine is effective against omicron in children | Positive |
| 2 | information on pfizer booster vaccine magelang dongg | Neutral |
| 3 | the weak pfizer booster fever | Negative |
Word embedding aims to convert tweet data into vectors. The word embedding results can be seen in Table 8.
Table 8. Word Embedding Results
| No | Tweet Preprocessing Results | Vector |
|---|---|---|
| 1 | Pfizer booster vaccine is effective against omicron in children | [2, 3, 1, 55, 113, 62, 25] |
| 2 | information on pfizer booster vaccine magelang dongg | [34, 2, 3, 1, 1140, 1086] |
| 3 | the weak pfizer booster fever | [193, 626, 3, 1, 29] |
CNN model training was carried out on 60% of the entire dataset, namely 1,159 tweets. The highest accuracy on the training data is 1.0, reached at the 100th epoch, while the highest accuracy on the validation data is 0.6947, reached at the 25th epoch, so this research uses a model trained for 25 epochs. However, if you look at table 8, the loss value in the validation data is higher than the loss value in the training data. This indicates overfitting, where the model performs better on the training data than on data it has not seen.

Figure 7. Training Data Confusion Matrix
In Figure 7, the model correctly predicted 501 of the 505 tweets whose actual label is negative and classified the remaining 4 as neutral. Of the 483 tweets labeled neutral, the model predicted 474 correctly and classified 9 as positive. The model also correctly predicted 171 tweets with a positive label.
Testing of the CNN model was carried out on 40% of the entire dataset, namely 773 tweets, using the model trained for 25 epochs. Figure 8 shows the confusion matrix generated from the test data.

Figure 8. Test Data Confusion Matrix
In Figure 8, the model correctly predicted 214 of the 261 tweets with an actual negative label, but classified 33 as neutral and 14 as positive. Of the 411 tweets labeled neutral, the model correctly predicted 263, but classified 102 as negative and 46 as positive. Of the 101 positive tweets, the model correctly predicted 60, but classified 18 as negative and 23 as neutral.
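The reported test accuracy can be recomputed from the counts given for Figure 8. The sketch below enters those counts as a matrix (rows are actual labels, columns are predicted labels, in the order negative, neutral, positive) and also derives per-class precision and recall, which the paper does not report; the helper names are assumptions.

```python
LABELS = ["negative", "neutral", "positive"]

# Counts as described for the test data in Figure 8.
TEST_MATRIX = [
    [214, 33, 14],   # actual negative
    [102, 263, 46],  # actual neutral
    [18, 23, 60],    # actual positive
]


def accuracy(matrix):
    """Share of all items that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def recall(matrix, i):
    """Correct predictions for class i divided by its actual count."""
    return matrix[i][i] / sum(matrix[i])


def precision(matrix, i):
    """Correct predictions for class i divided by all predictions of i."""
    return matrix[i][i] / sum(row[i] for row in matrix)


print(f"accuracy: {accuracy(TEST_MATRIX):.4f}")
for i, label in enumerate(LABELS):
    print(f"{label}: precision={precision(TEST_MATRIX, i):.4f}, "
          f"recall={recall(TEST_MATRIX, i):.4f}")
```

The diagonal sums to 537 of 773 tweets, which is consistent with the validation accuracy of 0.6947 quoted earlier.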
At this stage, the results of the sentiment analysis are visualized. The visualization uses a line plot to show the total number of tweets made on each date, a pie chart to show the percentage of negative, neutral and positive labels, and a word cloud to show the most frequently used words.
Sentiment analysis of the Pfizer vaccine using a Convolutional Neural Network (CNN) has been successfully carried out on Indonesian-language tweets taken from Twitter. After preprocessing, the dataset was reduced from 5,131 tweets to 1,932 tweets, which were then classified into three labels: positive, neutral, and negative. The largest share belonged to the neutral label (43.2%), followed by the positive label (41.3%) and the negative label (15.5%). The CNN model achieved a training accuracy of 98.87% on 1,159 training data and a testing accuracy of 69.46% on 773 test data. Future research could use a larger dataset and a wider time span to increase modeling accuracy.
REFERENCES
Bernal, J. L., Andrews, N., Gower, C., Robertson, C., Stowe,
J., Tessier, E., Simmons, R., Cottrell, S., Roberts, R., & O'Doherty, M.
(2021). Effectiveness of the Pfizer-BioNTech and Oxford-AstraZeneca vaccines on
covid-19 related symptoms, hospital admissions, and mortality in older adults
in England: test negative case-control study. BMJ, 373.
Chaurasiya, P., Pandey, P., Rajak, U.,
Dhakar, K., Verma, M., & Verma, T. (2020). Epidemic and challenges of
coronavirus disease-2019 (COVID-19): India response. Available at SSRN
3569665.
Feix, T., & Feix, T. (2021).
Developing a COVID-19 vaccine to save the world. Valuing Digital Business
Designs and Platforms: An Integrated Strategic and Financial Valuation
Framework, 75–113.
Inchingolo, A. D., Malcangi, G., Ceci,
S., Patano, A., Corriero, A., Vimercati, L., Azzollini, D., Marinelli, G.,
Coloccia, G., & Piras, F. (2022). Effectiveness of SARS-CoV-2 vaccines for
short-and long-term immunity: a general overview for the pandemic contrast. International
Journal of Molecular Sciences, 23(15), 8485.
Juwiantho, H., Setiawan, E. I., Santoso,
J., & Purnomo, M. H. (2020). Sentiment analysis twitter bahasa indonesia
berbasis word2vec menggunakan deep convolutional neural network. Jurnal
Teknologi Informasi Dan Ilmu Komputer, 7(1), 181–188.
Krubiner, C. B., Faden, R. R., Karron, R.
A., Little, M. O., Lyerly, A. D., Abramson, J. S., Beigi, R. H., Cravioto, A.
R., Durbin, A. P., & Gellin, B. G. (2021). Pregnant women & vaccines
against emerging epidemic threats: ethics guidance for preparedness, research,
and response. Vaccine, 39(1), 85–120.
Kzar, B. I., & Safi, H. H. (2023).
Systematic review of sentiment analysis and predict sarcastic. Journal of
Al-Qadisiyah for Computer Science and Mathematics, 15(2), Page-166.
Parameswari, P. L., & Prihandoko, P.
(2022). Penggunaan Convolutional Neural Network Untuk Analisis Sentimen Opini
Lingkungan Hidup Kota Depok di Twitter. Jurnal Ilmiah Teknologi Dan Rekayasa,
27(1), 29–42.
Saadat, S., Rawtani, D., & Hussain,
C. M. (2020). Environmental perspective of COVID-19. Science of the Total
Environment, 728, 138870.
Sultana, J., Mazzaglia, G., Luxi, N.,
Cancellieri, A., Capuano, A., Ferrajolo, C., de Waure, C., Ferlazzo, G., &
Trifirò, G. (2020). Potential effects of vaccinations on the prevention of
COVID-19: rationale, clinical evidence, risks, and public health
considerations. Expert Review of Vaccines, 19(10), 919–936.
Wiguna, A. E. M., Nasrun, M., &
Nugrahaeni, R. A. (2021). Deteksi Ujaran Ancaman Berbasis Website Pada
Postingan Media Sosial Twitter Menggunakan Metode Convolutional Neural Network.
EProceedings of Engineering, 8(1).
Wilkinson, R. G. (2020). The impact of
inequality: How to make sick societies healthier. Routledge.
Yehezkie, M. P., & Ramatillah, D. L.
(2023). Evaluation Comparison of the Effectiveness of Full Dose Pfizer
Vaccine with Pfizer Booster Society in Indonesia.
Zhang, Y., Geng, X., Tan, Y., Li, Q., Xu,
C., Xu, J., Hao, L., Zeng, Z., Luo, X., & Liu, F. (2020). New understanding
of the damage of SARS-CoV-2 infection outside the respiratory system. Biomedicine
& Pharmacotherapy, 127, 110195.
Copyright holder: Riza Adrianti Supono, Hizkia Abiel Muljana (2024)