Contextual Market Basket Analysis during Covid-19

Market Basket Analysis is one application of Data Mining. It helps identify buying patterns formed from items purchased together in the same transaction. One of its problems is that customer needs vary according to season and time of day, especially during the Covid-19 period. For this purpose, an Artificial Neural Network (ANN) approach connected to Market Basket Analysis is used to analyze and compare purchasing patterns and to identify the rules formed before and after Covid-19. Several rule changes were found, caused by changes in people's behavior patterns.


INTRODUCTION
Since the World Health Organization (WHO) declared the coronavirus outbreak (Covid-19) a pandemic, there have been many changes in various sectors, one of which is the retail business sector (Bre, Gimenez, & Fachinotti, 2018; Gangurde, Kumar, & Gore, 2017; Setiawan & Mulyanti, 2020a). In particular, the Indonesian government's appeal to stay at home to combat the Covid-19 pandemic and the implementation of large-scale social restrictions (PSBB) in various regions have affected people's buying behavior patterns. The pandemic makes it difficult for retailers to make sales forecasts (Ariana & Asana, 2013; Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Haykin, 2009; Setiabudi, Budhi, Purnama, & Noertjahyana, 2011).
One of the retail sectors that can still survive during the Covid-19 pandemic is the sale of essential ingredients and food. However, a store that sells various necessities requires an extensive database to manage the number of transactions that occur and the number of types of goods sold. To increase sales, the existing database can be scanned for patterns in people's buying behavior, which is why Data Mining is needed (Abdulsalam, Adewole, Akintola, & Hambali, 2014; Agarwal, 2013; Han, Pei, & Tong, 2022; Hann & Kamber, 2000).
Data Mining as used in retail is generally known as Shopping Basket Analysis (Market Basket Analysis). Market Basket Analysis helps analyze the items in a buyer's shopping basket within the same transaction and moment (Alawadh & Barnawi, 2022; Chen, Tang, Shen, & Hu, 2005a; Kaur & Kang, 2016a). Applied to this year's transactions, Market Basket Analysis can help identify goods that tend to be purchased together with other products and find the right product combinations under the conditions of the Covid-19 pandemic (Agrawal & Srikant, 1994; Bhargav, Mathur, & Bhargav, 2014; Kaur & Kang, 2016b). This is very beneficial for companies in the daily retail sector, which can market these products together during the pandemic to increase purchases (Chen, Tang, Shen, & Hu, 2005b; Mansur & Kuncoro, 2012). However, Market Basket Analysis uses a lot of data, which makes the analysis difficult for daily retail companies. In general, buying habits and patterns are found with Association Rules, which capture the relationships between different items. The algorithm used to determine the most frequent itemsets for the Association Rules is Apriori (Herianda et al., 2018; Saputra & Sibarani, 2020; Setiawan & Mulyanti, 2020b).
However, a common problem with the Apriori algorithm is that it uses an iterative approach, in which candidate sets and frequently purchased itemsets must be found repeatedly. In addition, differences in purchasing patterns between normal times (before Covid-19) and the Covid-19 period must be found for specified times (holidays) (Dhanabhakyam, 2011; Fahrudin, 2019; Utari & Hakim, 2015).
In this paper, we use a partially connected Artificial Neural Network (ANN) approach with a single-layer feed-forward network to solve this problem. With the same approach, we also compare and identify changes between purchasing patterns in normal times (before the Covid-19 pandemic) and purchasing patterns during the Covid-19 pandemic. This is expected to help find future purchasing patterns at a specific time, adjusted to the buying patterns before Covid-19.

METHOD
The proposed method for solving this problem is shown in Figure 1 below.
Figure 1. Data compilation diagram until the data is ready to use
Based on the research methodology in Fig. 4, the important steps carried out to complete this research are planning, collecting, and processing the data before the data are ready to be used for analysis.

A. Data Collection
To obtain the Market Basket Analysis results, the data needed are purchase transactions of goods from the last three years. The data are cleaned and adjusted to the existing transactions so that they are ready to be processed. The data used include more than 1000 transactions, with 1 to 282 variants of goods purchased per transaction. An example of the available dataset, taken from the existing transactions, is shown in Table I.

B. Data Pre-processing
To find purchasing patterns before and after the Covid-19 pandemic, the data are identified and separated. The period before the pandemic is identified based on the time Covid-19 entered Indonesia, which is estimated to be March 2020. The buying pattern before the Covid-19 pandemic is derived from data from the previous two years, 2018 to 2019. The data coverage is therefore 2018-2019 for before Covid-19 and 2020 for after Covid-19. However, because there are too many items to consider, the data are adjusted to make them easier to process: the items are grouped into 27 categories according to the type of goods, which makes it easier to draw conclusions.
The data are thus divided into two seasons: before the Covid-19 pandemic (before Covid-19) and after the Covid-19 pandemic (after Covid-19). The data before Covid-19 are transactions from before Covid-19 entered Indonesia and show what normal purchases looked like in the two preceding years. The data after Covid-19 are transactions from after Covid-19 entered Indonesia. Together, the data span 2018 to 2020.
The cleaned data are then identified again and divided by time (period), namely Christmas, Eid (Lebaran), and Ordinary Days, as can be seen in Table II.
Table II. Data that is ready to be processed based on the number of existing transaction data and the most categories before and after Covid-19.
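The split by season and period described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the cut-off of 1 March 2020 is a day-level reading of "March 2020", the Eid month per year is supplied by hand and is not taken from the paper, and the names `season_of` and `period_of` are hypothetical.

```python
from datetime import date

# Assumption: "March 2020" is read as a cut-off on 1 March 2020.
COVID_START = date(2020, 3, 1)

# Assumption (not from the paper): the calendar month that contains
# Eid al-Fitr in each year covered by the data.
EID_MONTH = {2018: 6, 2019: 6, 2020: 5}
CHRISTMAS_MONTH = 12


def season_of(day: date) -> str:
    """Label a transaction date as before or after the arrival of Covid-19."""
    return "before covid-19" if day < COVID_START else "after covid-19"


def period_of(day: date) -> str:
    """Label a transaction date as Christmas, Eid, or Ordinary Days by month."""
    if day.month == CHRISTMAS_MONTH:
        return "Christmas"
    if day.month == EID_MONTH.get(day.year):
        return "Eid"
    return "Ordinary Days"
```

For example, `period_of(date(2019, 12, 25))` falls in the Christmas period of the pre-Covid season, while every month other than the two holiday months is labelled as Ordinary Days.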

C. Generate Association Rule
In addition to identifying the season (before and after the Covid-19 pandemic), the time must be identified. The data before Covid-19 are divided into two parts based on the year, 2018 and 2019. By comparing the 2018 data with the 2019 data, the author can determine which purchasing patterns are the same. The same data also reveal changes in purchasing patterns in certain months. In addition, once the buying patterns before the Covid-19 pandemic have been found, they can be used to identify possible purchasing patterns in the year before or after the Covid-19 pandemic.
All transaction data identified previously are processed using Association Rules to produce purchase pattern data (Market Basket Analysis) per year. The Association Rules are generated using the Apriori algorithm. With Apriori, all datasets that fall within a certain time are processed to obtain the collection of frequent itemsets. The itemsets are tied to the time of the data being processed: for example, the 2018 data are processed on their own, separately from the 2019 data, and so on. The analysis uses a confidence value that is adjusted to the resulting rules.
Before the transaction data can be used, they must be adapted to machine learning models, for example with one-hot encoding. One-hot encoding converts a categorical variable into N binary columns, where N is the number of unique values in the original column. The data are transformed based on Transaction_id and the name of the item: a value of 1 means the item is included in the transaction, and 0 means it is not. The data are then grouped again by transaction number and date, and items that share the same transaction number are combined into one group.
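A minimal sketch of this encoding in plain Python is given below. The rows are hypothetical and are not the paper's data, and the function name `one_hot_encode` is an assumption made for illustration. The paper does not state which library it used.

```python
from collections import defaultdict

# Hypothetical rows of (Transaction_id, item category); not the paper's data.
rows = [
    (1, "Fragrance"),
    (1, "Cotton Cleaner"),
    (2, "Cigarettes"),
    (2, "Fragrance"),
    (3, "Beauty"),
]


def one_hot_encode(rows):
    """Group rows by transaction and emit one binary column per unique item."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)

    # N columns, where N is the number of unique item values.
    columns = sorted({item for _, item in rows})

    # 1 if the item is in the transaction, 0 otherwise.
    encoded = {
        transaction_id: [1 if column in items else 0 for column in columns]
        for transaction_id, items in sorted(baskets.items())
    }
    return columns, encoded


columns, encoded = one_hot_encode(rows)
```

Here `columns` is the alphabetically sorted list of categories, and each transaction maps to one row of 0/1 flags in the same column order.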
Because the encoded data are numerical, they can be processed using the Apriori algorithm, which is computed here in Python. After the frequent itemsets are found, rules must be generated to obtain the antecedent and consequent of each rule. The minimum support and confidence values are tuned to the amount of data available so that neither too many nor too few rules are produced. The processed data are then analyzed and adjusted until they are ready for use.
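The two steps above, finding frequent itemsets and then deriving rules with an antecedent and a consequent, can be illustrated with a small self-contained Apriori sketch. This is not the authors' implementation: the function names, the example baskets, and the threshold values are assumptions chosen only to show the mechanics.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that are frequent at this level.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


# Hypothetical baskets of item categories; not the paper's data.
transactions = [
    {"Fragrance", "Cotton Cleaner"},
    {"Fragrance", "Cotton Cleaner", "Beauty"},
    {"Cigarettes", "Beauty"},
    {"Fragrance", "Cotton Cleaner"},
]
frequent = apriori(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.8)
```

With these illustrative thresholds, the pair Fragrance and Cotton Cleaner is the only frequent pair, and it yields one rule in each direction. Raising or lowering `min_support` and `min_confidence` changes how many rules survive, which is the tuning the text refers to.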
When the data is ready for use, it will enter the stage of using a Neural Network with a system design diagram, as can be seen in

D. Weight Assignment
The data produced by the Association Rule step for each season (before and after the Covid-19 pandemic) are then given weights. The weight value is set according to the data category and the most frequent data in each period, taken from the previous Apriori process.
After a weight has been assigned for each period, the combinations of all products before and after Covid-19 at a specific time are evaluated. If the calculated result for a combination is positive, the combination is valid; if it is negative, the combination is invalid.
The store's sales transactions can be accumulated. The accumulation shows which categories of goods are purchased the most in the existing transactions, and these form the best categories to use for the Neural Network calculations.
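One possible way to turn accumulated category counts into weights is sketched below. The weight values 1, 0, and -5 are the example values used later in the paper. The ranking rule (the top two categories get 1, the next one gets 0, the rest get -5), the counts, and the name `assign_weights` are assumptions made for illustration and are not stated by the authors.

```python
from collections import Counter


def assign_weights(category_counts, inputs, top_n=2, frequent_n=3):
    """Map each input category to a weight based on its purchase rank.

    Assumed rule: the top_n most purchased categories get 1, categories
    ranked within frequent_n (but not top_n) get 0, all others get -5.
    """
    ranked = [name for name, _ in Counter(category_counts).most_common()]
    most_purchased = set(ranked[:top_n])
    frequently_purchased = set(ranked[:frequent_n])

    weights = []
    for category in inputs:
        if category in most_purchased:
            weights.append(1)
        elif category in frequently_purchased:
            weights.append(0)
        else:
            weights.append(-5)
    return weights


# Hypothetical accumulated purchase counts for one period.
counts = {"Beauty": 120, "Cotton Cleaner": 110, "Fast Food": 90, "Germ Cleaner": 15}
inputs = ["Beauty", "Cotton Cleaner", "Fast Food", "Germ Cleaner"]
weights = assign_weights(counts, inputs)
```

With these hypothetical counts the weights come out as 1, 1, 0, and -5 for x1 to x4.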

E. Apply Neural Network (FFNN)
To maximize the work of Apriori, an additional Artificial Neural Network (ANN) is used, namely a partially connected single-layer feed-forward network. Several previous studies suggest that an ANN can support the use of Apriori. With the ANN, the transaction data for the items generated by the Association Rules are entered in the input layer as x1=1, x2=2, x3=3, x4=4. Each Apriori result is then used as an input that is connected to each neuron. Because only part of the neural network is used, no hidden layer is needed.
We therefore use a single-layer, partially connected feed-forward neural network, as shown in Fig 7 below, to solve this problem.

Fig. 3. Artificial Neural Network model
For this Neural Network implementation, a feed-forward neural network algorithm is applied to find the sigma value. The algorithm used to implement the Neural Network in Market Basket Analysis for predictive models is as follows. First, from the results obtained, the combinations are turned into input nodes based on the most frequent categories, adjusted to the existing rules. For each combination of input nodes, the sigma value is calculated using the weight of each item, taken from the weight[i] array. If the sigma value is positive, the combination is considered valid; otherwise, it is set as invalid.
The single-layer feed-forward neural network is applied separately to three periods, namely Christmas, Eid, and Ordinary Days.
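The sigma calculation described above can be sketched in Python as follows. The weights 1, 1, -5, -5 are the example values the paper uses for the Eid period before Covid-19, with input x_i carrying the value i. The function names and the choice to enumerate every combination of two or more inputs are assumptions of this sketch.

```python
from itertools import combinations


def sigma(combo, weights):
    """Weighted sum for one combination of 1-based input indices.

    Input x_i carries the value i, so each term is weight[i] * i.
    """
    return sum(weights[i - 1] * i for i in combo)


def evaluate_combinations(weights, threshold=0):
    """Return {combination: (sigma, is_valid)} for all combinations of size >= 2."""
    n = len(weights)
    results = {}
    for size in range(n, 1, -1):
        for combo in combinations(range(1, n + 1), size):
            value = sigma(combo, weights)
            # A combination is valid only if sigma exceeds the threshold of 0.
            results[combo] = (value, value > threshold)
    return results


# x1 = Fragrance, x2 = Cotton Cleaner, x3 = Cigarettes, x4 = Beauty
weights = [1, 1, -5, -5]
results = evaluate_combinations(weights)
valid = [combo for combo, (_, is_valid) in results.items() if is_valid]
```

For four inputs this enumerates 11 combinations. With the example weights, the full combination gives a sigma of -32 and only the pair (1, 2) has a positive sigma.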

RESULTS AND DISCUSSION
The analysis with MBA (Market Basket Analysis) and MBA + FFNN (Feed-forward Neural Network) on minimarket sales data before and after Covid-19 produced the results displayed in the following tables.

Eid
From the Eid rules, 4 categories were selected: 2 Eid categories from before Covid-19 and 2 from after Covid-19. The categories chosen were Fragrance and Cotton Cleaner from before Covid-19, and Cigarettes and Beauty to represent Eid after Covid-19.
Suppose the input items are x1 = Fragrance, x2 = Cotton Cleaner, x3 = Cigarettes, x4 = Beauty. Before Covid-19, Fragrance (x1) and Cotton Cleaner (x2) were the most purchased categories, so each is given a weight of, for example, 1. Because Cigarettes (x3) and Beauty (x4) are among the items that are rarely purchased, they are given a weight of -5. Applying the first combination (1, 2, 3, and 4) gives (1*1) + (1*2) + ((-5)*3) + ((-5)*4) = -32. The value is negative, that is, below the allocated threshold value of 0, so this combination is false and cannot be set in the basket. The same calculation can be applied to the input neurons for all the remaining combinations.
• For the second combination (1, 2, and 3), the output of the summation function will be: (1*1) + (1*2) + ((-5)*3) = -12
• For the third combination (1, 2, and 4), the output of the summation function will be: (1*1) + (1*2) + ((-5)*4) = -17
• For the fourth combination (2, 3, and 4), the output of the summation function will be: (1*2) + ((-5)*3) + ((-5)*4) = -33
• For the fifth combination (1, 3, and 4), the output of the summation function will be: (1*1) + ((-5)*3) + ((-5)*4) = -34
• For the sixth combination (1 and 2), the output of the summation function will be: (1*1) + (1*2) = 3
• For the seventh combination (1 and 3), the output of the summation function will be: (1*1) + ((-5)*3) = -14
• For the eighth combination (1 and 4), the output of the summation function will be: (1*1) + ((-5)*4) = -19
• For the ninth combination (2 and 3), the output of the summation function will be: (1*2) + ((-5)*3) = -13
• For the tenth combination (2 and 4), the output of the summation function will be: (1*2) + ((-5)*4) = -18
• For the 11th combination (3 and 4), the output of the summation function will be: ((-5)*3) + ((-5)*4) = -35
It can be interpreted that there is only one valid combination (one that gives an output greater than 0), namely (1,2); all the others are invalid combinations. The valid and invalid combinations for the Eid rule before Covid-19 can be seen in Table IV. For the Eid data before Covid-19, only Fragrance and Cotton Cleaner were valid categories, because Cigarettes and Beauty were not among the most frequent categories in the pre-Covid data. In the post-Covid Eid data, by contrast, the most frequent categories are Cigarettes (x1) and Beauty (x2), so the same frequent items are found.

Christmas
Suppose the input items are x1 = Beauty, x2 = Cotton Cleaner, x3 = Fast Food, x4 = Germ Cleaner.
For the calculation, the neurons (1,2,3,4) are initially attached by connecting all inputs (x1,x2,x3,x4), so the output of the first combination is (w1*1) + (w2*2) + (w3*3) + (w4*4), where wi is the weight assigned to input xi.
Before Covid-19, Beauty (x1) and Cotton Cleaner (x2) were the most purchased categories, so each is given a weight of, for example, 1. Fast Food (x3) is also among the frequently purchased categories, so it is given a weight of 0. Lastly, Germ Cleaner (x4) is among the items that are rarely purchased, so it is given a weight of -5. Applying the first combination gives (1*1) + (1*2) + (0*3) + ((-5)*4) = -17. The valid and invalid combinations obtained with this formula can be seen in Table VI.
After/during Covid-19, Fast Food (x3) and Germ Cleaner (x4) are the most purchased categories, so each is given a weight of, for example, 1. Beauty (x1) is also among the frequently purchased categories, so it is given a weight of 0. Lastly, Cotton Cleaner (x2) is among the items that are rarely purchased, so it is given a weight of -5. Applying the first combination gives (0*1) + ((-5)*2) + (1*3) + (1*4) = -3.
At Christmas before Covid-19, the most frequent categories from Apriori (Table VI) are Beauty and Cotton Cleaner, with Fast Food also among them, and Table VII shows that the frequent items generated using MBA+FFNN give the same result. Likewise, for the Christmas data after Covid-19, the most frequent categories are Fast Food and Germ Cleaner, with Beauty still among the frequent categories in Table VI, and Table VII shows that the frequent items generated using MBA+FFNN again give the same result.

Ordinary Days
Besides Eid (Lebaran) and Christmas, the comparison also covers ordinary days outside the two major holidays. Ordinary Days comprise the 10 months other than the months in which the Eid and Christmas holidays fall. From the ordinary-day rules, 4 categories were selected: 2 from before Covid-19 and 2 from after Covid-19. The categories chosen were Germ Cleaner and Cotton Cleaner from before Covid-19, and Cigarettes and Beauty to represent an ordinary day after Covid-19.
Before Covid-19, Germ Cleaner (x1) and Cotton Cleaner (x2) were the most purchased categories, so each is given a weight of, for example, 1. Cigarettes (x3) are among the items that are rarely purchased, so they are given a weight of -5. Lastly, Beauty (x4) is also among the frequently purchased categories, so it is given a weight of 0. Applying the first combination gives (1*1) + (1*2) + ((-5)*3) + (0*4) = -12. Table VIII shows the resulting valid and invalid combinations.
After/during Covid-19, Cigarettes (x3) and Beauty (x4) were the most purchased categories, so each is given a weight of, for example, 1. Germ Cleaner (x1) is also among the frequently purchased categories, so it is given a weight of 0. Lastly, Cotton Cleaner (x2) is among the items that are rarely purchased, so it is given a weight of -5. Applying the first combination gives (0*1) + ((-5)*2) + (1*3) + (1*4) = -3. The results can be seen in Table IX.
For the Ordinary Days data before Covid-19, Germ Cleaner and Cotton Cleaner were the most frequent categories, with Beauty still among the frequent categories, and Table VIII shows similar results. In the post-Covid data, the most frequent categories are Cigarettes and Beauty, with Germ Cleaner still among the frequent categories, and Table IX shows the same results.
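To illustrate how the valid combinations shift between the two seasons, the Ordinary Days weights given above can be run through the same sigma rule and the two result sets compared. The sketch below derives its output only from the rule as stated in the text. It is not taken from Tables VIII and IX, and the function and variable names are hypothetical.

```python
from itertools import combinations


def valid_combinations(weights, threshold=0):
    """Return the set of input-index combinations whose sigma exceeds the threshold."""
    n = len(weights)
    valid = set()
    for size in range(2, n + 1):
        for combo in combinations(range(1, n + 1), size):
            # Input x_i carries the value i, so each term is weight[i] * i.
            sigma = sum(weights[i - 1] * i for i in combo)
            if sigma > threshold:
                valid.add(combo)
    return valid


# x1 = Germ Cleaner, x2 = Cotton Cleaner, x3 = Cigarettes, x4 = Beauty
before = valid_combinations([1, 1, -5, 0])   # weights before Covid-19
after = valid_combinations([0, -5, 1, 1])    # weights after/during Covid-19

only_before = before - after   # combinations that stop being valid
only_after = after - before    # combinations that become valid
still_valid = before & after   # combinations valid in both seasons
```

Under these weights, combinations containing Cotton Cleaner drop out after Covid-19, combinations containing Cigarettes appear, and the pair Germ Cleaner and Beauty, (1, 4), is valid in both seasons.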

CONCLUSION
Based on the results of this research, purchasing patterns in general changed significantly between the periods before and after Covid-19. Before Covid-19, for example, people's activities took place mostly outside the home, and people's finances could still be considered normal. Needs and wants therefore tended to be met by buying ready-to-eat (instant) food, while needs related to cooking, such as kitchen utensils, spices, and so on, tended to be minimal or ignored. During the pandemic, however, the government required people to carry out their activities at home and to maintain their health by consuming nutritious food, taking vitamins and medicines, and prioritizing cleanliness in the home environment to avoid Covid-19. People's behavior, which was initially consumptive (eating outside the home), became more selective (cooking at home), and for those who tested positive for Covid-19, finances were redirected toward recovery.
