
ISSN: P 2720-9938 E 2721-5202

ECOTOURISM RECOMMENDATIONS BASED ON SENTIMENTS USING SKYLINE QUERY AND APACHE-SPARK

Nouval Trezandy Lapatta

Faculty of Teacher Training and Education, Tadulako University, Indonesia

Email: [email protected]

ARTICLE INFO
Received: 21 April 2022; Revised: 02 May 2022; Accepted: 15 May 2022

ABSTRACT
Selecting an ecotourism destination is a challenging task in online transactions. The process must take into account personal considerations, such as cost or distance, as well as attractive eco-points, such as specific scenery or rare and unique landscapes. Few tourists have the required information about particular local resources. The proposed recommender system helps tourists obtain advice on appropriate ecotourism destinations, based on visitor sentiment and their own preferences. This work proposes a skyline query method based on the Sort Filter Skyline algorithm in the Apache Spark cluster computing framework to build recommendations. The sentiment analysis process using the SentiStrength algorithm achieves an accuracy of 78.3% and an F-measure of 84.5%. These results indicate that the proposed recommender system can detect positive responses from visitors, ensuring that tourists receive the best ecotourism recommendations with positive sentiment. Apache Spark with three computer nodes achieves execution times 213.7 times faster on correlated data, 240 times faster on independent data, and 288.1 times faster on anti-correlated data than a single computing method.
Keywords: Ecotourism; recommender system; skyline query; sentiment; Apache Spark

Introduction

Recommender systems are software tools and techniques that provide suggestions for items that will be useful to users (Ricci, Rokach, & Shapira, 2011). These suggestions are intended to support users in various decision-making processes (Mahmood & Ricci, 2009), such as choosing news, music (Sunitha & Adilakshmi, 2018), or tourist destinations (Gavalas, Konstantopoulos, Mastakas, & Pantziou, 2014). The growth of decision support systems and increasing data sizes have led researchers to seek new recommendation methods that efficiently retrieve useful insights from multi-dimensional datasets (Kalyvas & Tzouramanis, 2017). An efficient recommendation method should be able to provide the best advice for any combination of user preferences. For example, in tourism, a user may want advice on the nearest tourist destinations at a low cost. When no object meets these criteria exactly, the recommender system must still provide interesting alternative suggestions that satisfy the original evaluation criteria, such as being near and cheap, as closely as possible.

Tourism activities have various characteristics, including tourists' preferences in choosing a destination. For example, when choosing an ecotourism object (Damanik & Weber, 2006), a user might consider rare or unique flora and fauna, beautiful scenery, ease of access, or available facilities. Because of the many criteria involved and the growing size of the data, searching the database with conventional methods requires heavy computation and may not produce the expected results.

In recent years, the skyline query has become an important topic in database research for extracting interesting objects from multi-dimensional datasets. Skyline query processing is used in many applications that require multi-criteria decision making without combining the criteria into an aggregate scoring function to determine the best results for user preferences. The skyline operator (Borzsony, Kossmann, & Stocker, 2001) selects a set of interesting objects from a large dataset based on evaluation criteria. Interesting objects are those not dominated by any other object in the data. An object p dominates another object q if p is at least as good as q on all criteria and strictly better on at least one criterion (Djatna & Morimoto, 2009). The skyline query can therefore be used to find interesting ecotourism objects with certain characteristics that are not dominated by other objects.
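The dominance relation can be stated compactly in code. The following is an illustrative sketch, not the authors' implementation: objects are tuples of attribute values, and a hypothetical `larger_is_better` flag per attribute records whether a criterion is maximized (such as a rating) or minimized (such as distance or cost).

```python
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float],
              larger_is_better: Sequence[bool]) -> bool:
    """True if p is at least as good as q on every criterion
    and strictly better on at least one."""
    strictly_better = False
    for pv, qv, maximize in zip(p, q, larger_is_better):
        better = pv > qv if maximize else pv < qv
        worse = pv < qv if maximize else pv > qv
        if worse:
            return False  # p loses on this criterion, so it cannot dominate q
        if better:
            strictly_better = True
    return strictly_better


def naive_skyline(objects: List[tuple], larger_is_better: Sequence[bool]) -> List[tuple]:
    """Keep every object that no other object dominates (quadratic number of tests)."""
    return [q for q in objects
            if not any(dominates(p, q, larger_is_better)
                       for p in objects if p is not q)]


# Hypothetical destinations as (rating, distance_km): rating is maximized, distance minimized.
places = [(5, 10.0), (4, 5.0), (4, 12.0), (3, 2.0)]
print(naive_skyline(places, larger_is_better=(True, False)))
# [(5, 10.0), (4, 5.0), (3, 2.0)]
```

Here (4, 12.0) drops out because (5, 10.0) is both better rated and closer, while the other three each win on at least one criterion.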

A previous study developed a mobile ecotourism recommendation system using spatial data of ecotourism objects, user profiles, and visit frequency, but the data used were static (Rosmawarni & Djatna, 2013). Another study modelled tourism recommendations by measuring the similarity between a user's profile and the characteristics of a tourism object extracted from the object's social media account (Khotimah, Djatna, & Nurhadryani, 2014). However, these characteristics do not fully represent the tourism object, because the account also posts content unrelated to the object and its activities. This study attempts to address these shortcomings of previous research.

Dynamic and relevant ecotourism recommendations require representative, up-to-date data on each ecotourism object. Tourism sites such as TripAdvisor use not only general preferences, such as distance or cost, but also input from tourists, such as ratings and comments. This approach has a weakness: someone can give a good rating with a negative comment, or vice versa. Sentiment analysis can be applied to determine precisely whether visitor comments are positive, negative, or neutral (Medhat, Hassan, & Korashy, 2014), and the resulting sentiment score can then be combined with the rating given. Through this process, tourists are expected to receive ecotourism recommendations with the best ratings and positive responses.

The number of preferences considered, the large amount of data, and the sentiment analysis process together lead to high computational complexity. The computation needed to produce these recommendations is therefore impractical with conventional methods running on a single computer. One solution is to implement skyline query processing with a cluster computing method (Ramdani, Djatna, & Sukoco, 2018), which processes the given task on multiple computers in parallel. This work uses Apache Spark as the cluster computing framework, with the expectation that it will increase the speed of generating recommendations. In summary, this study aims to develop a skyline query that generates sentiment-based ecotourism recommendations using Apache Spark. The proposed recommender system has been implemented in an Android mobile application.

Method

The tools used in this study consist of hardware for the cluster architecture and mobile application testing devices, and software for developing the recommender system. The cluster consists of four virtual machine nodes on Google Cloud Platform (GCP), with the specifications shown in Table 1. One computer acts as the primary node, which sends tasks through the cluster manager to be processed simultaneously on the other three computers, the worker nodes, which act as executors.
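The paper does not detail how the skyline computation is divided among the three worker nodes. One common pattern, shown here only as a hedged sketch in plain Python (no Spark required), is to compute a local skyline on each data partition independently and then compute the skyline of the merged local results on the primary node. The pattern is correct because any object in the global skyline is also undominated within its own partition. In PySpark, the local step would roughly correspond to `rdd.mapPartitions(...)` followed by `collect()`; the round-robin partitioning and all names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Obj = Tuple[float, ...]


def dominates(p: Obj, q: Obj) -> bool:
    # Attributes are assumed normalized so that larger is better.
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(objs: Sequence[Obj]) -> List[Obj]:
    # Brute force; any single-node skyline algorithm (e.g., SFS) would work here.
    return [q for q in objs if not any(dominates(p, q) for p in objs)]


def partitioned_skyline(objs: Sequence[Obj], n_workers: int = 3) -> List[Obj]:
    # Hypothetical round-robin split standing in for the data sent to each worker node.
    partitions = [list(objs[i::n_workers]) for i in range(n_workers)]
    # Local step (one per worker): skyline of each partition, computed independently.
    candidates = [o for part in partitions for o in skyline(part)]
    # Global step (primary node): skyline of the merged local skylines.
    return skyline(candidates)


data = [(0.9, 0.2), (0.5, 0.5), (0.1, 0.95), (0.4, 0.4), (0.85, 0.1), (0.3, 0.9)]
print(partitioned_skyline(data))
# [(0.9, 0.2), (0.5, 0.5), (0.1, 0.95), (0.3, 0.9)]
```

Only the local skylines, usually far smaller than the raw partitions, have to travel back to the primary node.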

Table 1

Cluster architecture specifications

Component	Configuration	Version
Processor	2 vCPU	Broadwell
Memory	8 GB	DDR3
Storage media	500 GB	HDD
Operating system	Ubuntu 18.04 LTS	1.4-ubuntu18

The minimum specifications of the mobile devices used for implementation testing are shown in Table 2, and the software used to develop the recommender system is listed in Table 3. Ecotourism data were collected from the Indonesian Ministry of Environment and Forestry, the TripAdvisor site, and the Google Maps API.

Table 3

Software used

Software	Version	Function
Anaconda	5.2.0	Python package management and environment
Android Studio	3.4.1	Integrated Development Environment (IDE) for Android development
Apache Spark	2.4.2	Cluster-computing framework
Jupyter Notebook	4.4.0	Editor for Python application development
Kotlin	1.3.31	Programming language for Android development
Python	3.7.3	Application Programming Interface (API) language used in Apache Spark

The method consists of two stages: pre-processing, and development of the recommendation method, which is processed within the Apache Spark cluster framework, as shown in Figure 1.

Figure 1. The flow of research stages

Table 4

Sentiment dictionaries

Dictionary	Example words	Sentiment score
Booster	sangat ("very")	2
Emoticon	:(	-3
Idiom	besar kepala ("arrogant", literally "big head")	-4
Negation	bukan ("not")	-1
Negative	kotor ("dirty")	-3
Positive	senang ("happy")	4
Question	apakah ("whether", question word)	0

d. Sentiment Analysis

The information obtained in the previous stage is a list of comments given by tourists about each ecotourism object. Sentiment analysis is applied to these comments to determine tourists' sentiment toward the ecotourism object. The algorithm used for sentiment analysis is SentiStrength (Thelwall, Buckley, Paltoglou, Cai, & Kappas, 2010), a lexicon-based classification algorithm that uses rules and additional non-lexical linguistic information to measure the sentiment strength of short English texts (Wahid & Azhari, 2016).
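To make the lexicon-based idea concrete, the following is a heavily simplified sketch of SentiStrength-style scoring, not the SentiStrength implementation. It uses the Table 4 entries for `sangat`, `:(`, `bukan`, `kotor`, and `senang`; the scores for `bagus` (4) and `jauh` (-1) are hypothetical values chosen for illustration. As assumptions, a booster strengthens the next sentiment word, the negation score of -1 is applied as a sign multiplier, and idioms, question words, and score capping are omitted.

```python
from typing import Tuple

# Table 4 entries, plus two hypothetical word scores ("bagus", "jauh").
SENTIMENT = {"senang": 4, "kotor": -3, ":(": -3, "bagus": 4, "jauh": -1}
BOOSTER = {"sangat": 2}
NEGATION = {"bukan": -1}  # assumed to act as a sign multiplier


def max_scores(text: str) -> Tuple[int, int]:
    """Return (maximum positive, maximum negative) scores for a text.
    1 and -1 mean no positive / no negative sentiment was found."""
    max_pos, max_neg = 1, -1
    boost, sign = 0, 1  # pending modifiers for the next sentiment word
    for token in text.lower().split():
        if token in BOOSTER:
            boost += BOOSTER[token]
        elif token in NEGATION:
            sign *= NEGATION[token]
        elif token in SENTIMENT:
            base = SENTIMENT[token]
            # A booster pushes the score further from zero in its own direction.
            score = sign * (base + boost if base > 0 else base - boost)
            max_pos, max_neg = max(max_pos, score), min(max_neg, score)
            boost, sign = 0, 1
        else:
            boost, sign = 0, 1  # modifiers only apply to the very next word
    return max_pos, max_neg


print(max_scores("Pemandangannya sih sangat bagus tapi tempatnya agak jauh :("))
# (6, -3)
```

With these assumed entries, the sketch yields the same maximum scores as the worked comment in Table 6.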

SentiStrength uses separate positive and negative scales. This design is based on psychological research stating that humans can, to a certain extent, experience positive and negative emotions simultaneously and independently (Norman et al., 2011). SentiStrength produces a positive score from 1 to 5 and a negative score from -1 to -5: a magnitude of 1 indicates that the sentence lacks positive or negative sentiment, while a magnitude of 5 indicates very strong positive or negative sentiment (Thelwall, Buckley, & Paltoglou, 2012). The sentiment class of a comment text is then decided by comparing the maximum positive score with the magnitude of the maximum negative score, using the following rules:

1) If positive > negative, then positive sentiment.

2) If positive < negative, then negative sentiment.

3) If positive = negative, then neutral sentiment.

The class score is the difference between the maximum positive score and the magnitude of the maximum negative score. The class score is then added to the rating score, and the highest value obtained becomes the value of the sentiment attribute, which is used as a preference to rank the recommended ecotourism objects in the next stage.
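The three rules and the class-score computation can be written down directly. This minimal sketch compares the positive score with the magnitude of the negative score, as the rules require when negative scores are reported as -1 to -5. The function names are illustrative, and the rating of 5 in the usage line is a hypothetical value.

```python
def classify(max_pos: int, max_neg: int) -> str:
    """Rules 1-3: compare the maximum positive score with the
    magnitude of the maximum negative score."""
    neg = abs(max_neg)
    if max_pos > neg:
        return "positive"
    if max_pos < neg:
        return "negative"
    return "neutral"


def class_score(max_pos: int, max_neg: int) -> int:
    # Positive strength minus negative strength (magnitude).
    return max_pos - abs(max_neg)


def sentiment_attribute(max_pos: int, max_neg: int, rating: int) -> int:
    # Class score plus the rating given by the visitor.
    return class_score(max_pos, max_neg) + rating


print(classify(6, -3), class_score(6, -3))    # positive 3
print(sentiment_attribute(6, -3, rating=5))   # 8
```

The scores 6 and -3 are those of the worked comment in Table 6, which the paper classifies as positive with a class score of 3.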

e. Skyline Query

The recommended objects are ranked using the skyline query method in the Apache Spark cluster framework. The skyline query algorithm used is Sort Filter Skyline (SFS) (Chomicki, Godfrey, Gryz, & Liang, 2003), an improvement on its predecessor, Block Nested Loop (BNL). Like the naive nested-loop algorithm, BNL repeatedly reads the set of tuples and eliminates objects by finding other objects in the dataset that dominate them. Its performance is sensitive to the number of dimensions and to the underlying data distribution.

SFS improves on BNL by pre-sorting the input dataset according to a monotone preference function, such as the sum of an object's values on all dimensions or, in an optimized form, its entropy. Pre-sorting ensures that an object p that dominates another object q is visited before q. This reduces the number of pairwise comparisons between objects and makes SFS progressive: every object that passes the filter is immediately a final skyline object. Because fewer dominance tests are performed, SFS is significantly more efficient than BNL. The following are the attributes of each ecotourism object used to generate recommendations.

1) Flora: rare plants, medicinal plants, forests.

2) Fauna: endemic or rare animals.

3) Sceneries: beautiful spots for photography.

4) Facilities: number of tourist facilities or playgrounds.

5) Access: access to ecotourism locations.

6) Rate: the ecotourism rating given by tourists on Google Maps.

7) Distance: the value of the distance between tourists and the ecotourism location. This value is obtained from the distance measurement process using Equation 1.

8) Sentiment: a sentiment score from tourist comments. This score is obtained from the sentiment analysis process using the SentiStrength algorithm.
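The presort-then-filter idea of SFS can be sketched in a few lines. This is an illustrative single-level version, not the authors' Spark code. It assumes all attributes are already normalized so that larger is better, and it uses the sum of attribute values as the monotone presort function, one of the options mentioned above.

```python
from typing import List, Sequence, Tuple

Obj = Tuple[float, ...]


def dominates(p: Obj, q: Obj) -> bool:
    # All attributes normalized so that larger is better.
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def sfs(objs: Sequence[Obj]) -> List[Obj]:
    """Single-level Sort Filter Skyline with a sum-based presort."""
    # Descending by sum; the tuple itself breaks ties, so a dominating object
    # (componentwise >=) still sorts first even if float sums happen to tie.
    ordered = sorted(objs, key=lambda t: (sum(t), t), reverse=True)
    window: List[Obj] = []
    for t in ordered:
        # Each object is tested only against confirmed skyline objects.
        if not any(dominates(s, t) for s in window):
            window.append(t)  # progressive: t is already a final skyline object
    return window


data = [(0.9, 0.2), (0.5, 0.5), (0.1, 0.95), (0.4, 0.4), (0.85, 0.1), (0.3, 0.9)]
print(sfs(data))  # [(0.3, 0.9), (0.9, 0.2), (0.1, 0.95), (0.5, 0.5)]
```

Unlike a nested-loop scan, no object already in the window is ever evicted, which is what makes the output progressive.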

This study extends the skyline query by implementing multi-level skyline queries (Kodama, Iijima, Guo, & Ishikawa, 2009). The basic skyline algorithm works well, but the resulting skyline may contain only a small number of objects, and a user who wants to compare several destinations would not be satisfied by such a result. If the number of skyline objects is smaller than the number requested by the user, the next skyline level is computed after removing the skyline objects already obtained from the candidate list.

Given the ecotourism object data, with all its attributes and preferences, the recommendations for each ecotourism object t, where t = 1, 2, 3, ..., n, are ranked through dominance tests using the SFS algorithm in the following stages.

1) Presorting is based on the entropy value obtained from Equation 2.

$$E(t) = \sum_{i=1}^{d} \ln\left(t[a_i] + 1\right) \qquad (2)$$

where E(t) is the entropy value of object t, t[a_i] is the normalized value of object t on the i-th attribute, and d is the number of attributes.

2) The object at the top of the pre-sorted data (highest entropy) is the first skyline object.

3) Each subsequent object t is tested for dominance against the current skyline objects in a window (S).

4) If t is dominated by a skyline object in S, discard t.

5) If no skyline object in S dominates t, add t to S as a new skyline object.

6) If the number of skyline objects generated is smaller than the number requested by the user, remove the current skyline objects from the candidate list and repeat the steps on the remaining candidates.
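The six steps can be combined into a short multi-level SFS sketch. Several parts are assumptions, not details given in the paper: min-max normalization to [0, 1] with the distance attribute inverted, and the entropy of Equation 2 taken in the form E(t) = Σ ln(t[a_i] + 1) used in the SFS literature. Run on only the eight objects listed in Table 8 (the remaining candidates are not published), the sketch places the first four objects in Level 1 and the other four in Level 2, the same split Table 8 reports for k = 6.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def normalize(rows: Sequence[Vec], larger_is_better: Sequence[bool]) -> List[List[float]]:
    """Min-max normalize each attribute to [0, 1] (assumed scheme); attributes
    where smaller is better (distance) are inverted so that 1 is always best."""
    cols = list(zip(*rows))
    out = [[0.0] * len(cols) for _ in rows]
    for j, (col, maximize) in enumerate(zip(cols, larger_is_better)):
        lo, hi = min(col), max(col)
        for i, v in enumerate(col):
            x = (v - lo) / (hi - lo) if hi > lo else 1.0
            out[i][j] = x if maximize else 1.0 - x
    return out


def entropy(t: Vec) -> float:
    # Assumed form of Equation 2: sum over attributes of ln(t[a_i] + 1).
    return sum(math.log(v + 1.0) for v in t)


def dominates(p: Vec, q: Vec) -> bool:
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def sfs_level(vecs: Sequence[Vec], candidates: List[int]) -> List[int]:
    """Steps 1-5: presort candidate indices by descending entropy, then filter."""
    ordered = sorted(candidates, key=lambda i: (entropy(vecs[i]), tuple(vecs[i])),
                     reverse=True)
    window: List[int] = []
    for i in ordered:
        if not any(dominates(vecs[s], vecs[i]) for s in window):
            window.append(i)
    return window


def multilevel_skyline(vecs: Sequence[Vec], k: int) -> List[List[int]]:
    """Step 6: compute further levels until at least k objects are found."""
    remaining = list(range(len(vecs)))
    levels: List[List[int]] = []
    while remaining and sum(map(len, levels)) < k:
        level = sfs_level(vecs, remaining)
        levels.append(level)
        remaining = [i for i in remaining if i not in level]  # next candidates
    return levels


names = ["TN Pancar Mountain", "TN Ujung Kulon", "SM Sawal Mountain",
         "TN Halimun Salak Mountain", "SM Muara Angke", "TN Ciremai Mountain",
         "TN Gede Pangrango Mountain", "TWA Cimanggu"]
# Attributes: flora, fauna, sceneries, facilities, access, rate, distance (km), sentiment.
rows = [(2, 3, 3, 3, 3, 5, 1.5, 8), (3, 3, 3, 3, 3, 5, 153.8, 4),
        (3, 2, 2, 3, 2, 5, 180.3, 9), (3, 3, 2, 3, 2, 3, 13.1, 4),
        (2, 3, 3, 3, 3, 5, 51.2, 7), (3, 3, 3, 3, 3, 5, 187.3, 1),
        (2, 2, 3, 2, 2, 5, 33.1, 3), (2, 2, 2, 2, 2, 3, 30.8, 3)]
prefs = (True, True, True, True, True, True, False, True)  # only distance is minimized
levels = multilevel_skyline(normalize(rows, prefs), k=6)
for n, level in enumerate(levels, start=1):
    print(n, sorted(names[i] for i in level))
```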

f. Implementation of Mobile Applications

The recommender system workflow in the Android mobile application is shown in Figure 2. The user inputs preferences such as the current location or the distance from the user. The list of ecotourism objects is obtained from the database, while detailed information is obtained from the Google Maps API. The recommended objects are then ranked with the skyline query method within Apache Spark, and the resulting ecotourism recommendations are displayed on the user's device.
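As a hedged illustration of the Google Maps lookups in this workflow, the snippet below only builds request URLs for the Place Details and Distance Matrix web services with the standard library; it does not send them. The JSON endpoints and the `place_id`, `fields`, `origins`, `destinations`, and `key` parameters follow Google's public web-service documentation. The API key, place ID, and coordinates are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://maps.googleapis.com/maps/api"
API_KEY = "YOUR_API_KEY"  # placeholder, not a real key


def place_details_url(place_id: str,
                      fields: str = "name,rating,formatted_address") -> str:
    # Place Details web service, JSON output, restricted to a few fields.
    return f"{BASE}/place/details/json?" + urlencode(
        {"place_id": place_id, "fields": fields, "key": API_KEY})


def distance_matrix_url(origin: tuple, destination: tuple) -> str:
    # Distance Matrix web service; coordinates are passed as "lat,lng".
    return f"{BASE}/distancematrix/json?" + urlencode({
        "origins": f"{origin[0]},{origin[1]}",
        "destinations": f"{destination[0]},{destination[1]}",
        "key": API_KEY,
    })


# Hypothetical user and destination coordinates.
print(distance_matrix_url((-6.595, 106.816), (-6.717, 106.533)))
```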

Results And Discussion

This section explains the results of sentiment analysis applied to visitor comments to identify the most positive responses. It also discusses the development of the recommendation method using skyline queries and the execution-time improvement obtained from the Apache Spark cluster computing implementation compared with the single computing method.

A. Sentiment Analysis

Sentiment analysis is performed on visitor comments from the Google Maps application using the SentiStrength algorithm to generate a sentiment class: positive, negative, or neutral. The sentiment class is obtained by comparing the maximum positive and maximum negative scores, while the class score is the difference between the maximum positive score and the magnitude of the maximum negative score. Summing the class score and the rating given by tourists yields the sentiment attribute value, which is used to rank the recommended objects with the skyline query algorithm in the next stage.

Table 6 illustrates the sentiment classification of an Indonesian comment text using the SentiStrength algorithm. The maximum positive score (6) is greater than the magnitude of the maximum negative score (-3), so the comment is classified as positive, with a class score of 3.

Table 7

Evaluation of sentiment classification results

Actual \ Predicted	Positive	Neutral	Negative	True	False	Total
Positive	185	18	8	185	26	211
Neutral	29	92	5	92	34	126
Negative	13	7	11	11	20	31
Total	227	117	24	288	80	368
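The accuracy and F-measure reported in the abstract can be recomputed from Table 7, taking each class's correctly classified count from the True column. In the sketch below, the 84.5% figure matches the F1 score of the positive class (precision 185/227, recall 185/211), which suggests that this is the F-measure being reported; the function names are illustrative.

```python
# Table 7 as a confusion matrix: rows = actual class, columns = predicted class,
# both ordered (positive, neutral, negative).
CM = [
    [185, 18, 8],
    [29, 92, 5],
    [13, 7, 11],
]


def accuracy(cm) -> float:
    # Correct predictions (diagonal) over all predictions.
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def f1(cm, c: int) -> float:
    tp = cm[c][c]
    precision = tp / sum(row[c] for row in cm)  # column total: predicted as c
    recall = tp / sum(cm[c])                    # row total: actually c
    return 2 * precision * recall / (precision + recall)


print(f"accuracy = {accuracy(CM):.3f}")       # accuracy = 0.783
print(f"F1 (positive) = {f1(CM, 0):.3f}")     # F1 (positive) = 0.845
```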

B. Development of Recommendation Method

Based on available attributes and preferences given by users, ecotourism object recommendations are generated using the skyline query through the Sort Filter Skyline (SFS) algorithm with a multi-level skyline query method. This test is carried out using the preferences as follows:

1) For the distance attribute, a smaller value is better.

2) For the other attributes, a larger value is better.

3) The number of recommendations expected by the user (k) is 6, with a maximum distance of 200 km.

The result of the ranking process is shown in Table 8. With these preferences, the first pass (Level 1) produces 4 skyline objects, fewer than the 6 requested by the user. A Level 2 skyline query is therefore run after removing the Level 1 skyline objects from the candidate list. Level 2 yields another 4 skyline objects, for a total of 8. Because this meets the user's request, the skyline query stops at the second level, and the results are delivered to the user.

Table 8

Result of multi-level skyline query with k=6

Level	Skyline object	1*	2*	3*	4*	5*	6*	7**	8*	Candidates	Time (ms)
1	TN Pancar Mountain	2	3	3	3	3	5	1.5	8	15	0.998
	TN Ujung Kulon	3	3	3	3	3	5	153.8	4
	SM Sawal Mountain	3	2	2	3	2	5	180.3	9
	TN Halimun Salak Mountain	3	3	2	3	2	3	13.1	4
2	SM Muara Angke	2	3	3	3	3	5	51.2	7	11	0.996
	TN Ciremai Mountain	3	3	3	3	3	5	187.3	1
	TN Gede Pangrango Mountain	2	2	3	2	2	5	33.1	3
	TWA Cimanggu	2	2	2	2	2	3	30.8	3

* Attribute values before normalization; attributes 1 to 8 are flora, fauna, sceneries, facilities, access, rate, distance, and sentiment. ** Distance attribute (in km).

Table 9

Speedup results of cluster computing method

Number of objects	Correlated			Independent			Anti-correlated
	1 node	2 nodes	3 nodes	1 node	2 nodes	3 nodes	1 node	2 nodes	3 nodes
2000	4.7	6.8	10.0	7.8	9.8	13.4	9.2	9.6	14.3
4000	15.8	26.2	38.4	28.6	33.0	46.7	34.4	37.5	49.3
6000	25.8	53.9	79.5	61.1	69.1	102.3	67.2	80.1	99.9
8000	46.4	101.0	146.0	96.0	125.8	185.7	105.1	118.8

Table 2

Minimum specifications of mobile devices

Component	Configuration
Memory	2 GB
System Development Kit (SDK)	API 24 Android 7.0 Nougat
Screen resolution	1280 x 800

API	Endpoint	Output
Distance Matrix	https://maps.googleapis.com/maps/api/distancematrix/parameter	Distance and route to the object
Open Weather	https://api.openweathermap.org/data/version/weather?parameter	The weather around the object
Place Details	https://maps.googleapis.com/maps/api/place/details/parameter	Detailed information about the object

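Table 6

Sentiment classification of an Indonesian comment text

Information	Value
Comment text	Pemandangannya sih sangat bagus tapi tempatnya agak jauh :( ("The scenery is very beautiful, but the place is rather far :(")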
Classified text	pemandangannya sih sangat bagus [6] tapi tempatnya agak jauh [-1] :( [-3]
Maximum positive score	6
Maximum negative score	-3
Sentiment class	Positive
Class score	3