I need a 1,000-word literature review of two articles, so 500 words for each.
My topic: Application of text mining to product satisfaction and feedback
Literature Review: Describe two (2) references (must be research articles from a journal, conference, academic report or thesis) that are relevant to your topic. Include the general background of the references, the dataset used, details of how the text mining process is applied, and the relevant findings and conclusions. Discuss the implications of the references for the current project.
Exploring healthcare/health-product ecommerce satisfaction: A text mining and machine learning application
Article in Journal of Business Research · October 2020
DOI: 10.1016/j.jbusres.2020.10.043
4 authors, including Swagato Chatterjee (Indian Institute of Technology Kharagpur) and Jiwan Sharma (Indian Institute of Technology Kharagpur)
Journal of Business Research xxx (xxxx) xxx-xxx
Swagato Chatterjee a,1,⁎, Divesh Goyal b,1, Atul Prakash b,1, Jiwan b,1
a Vinod Gupta School of Management, Indian Institute of Technology, Kharagpur, Kharagpur, West Bengal 721302, India
b Indian Institute of Technology, Kharagpur, Kharagpur, West Bengal 721302, India
ABSTRACT
In the digital era, online channels have become an inevitable part of healthcare services, making healthcare/health-product e-commerce an important area of study. However, how customer satisfaction is reflected, and how it differs across subgroups of this industry, remains unexplored. Additionally, extant literature has mainly relied on consumer surveys for customer-satisfaction research, ignoring the huge amount of data available online. The current study fills these gaps. Using 186,057 reviews of 619 e-commerce firms from 29 subcategories of the healthcare/health-product industry, posted on a review website between 2008 and 2018, we used text-mining, machine-learning and econometric techniques to find which core and augmented service aspects, and which emotions, are more important in which service contexts for reflecting and predicting customer satisfaction. Our study contributes to the healthcare/health-product marketing and services literature by suggesting an automated, machine-learning-based methodology for insight generation. It also helps healthcare/health-product e-commerce managers design and deliver e-commerce services better.
ARTICLE INFO
Keywords: Health-product ecommerce; Text mining; Sentiment; Emotion; Customer satisfaction; Online reviews
1. Introduction
Consumer perception is vital for any organization, whether it is product- or service-based. Both positive and negative perceptions, expressed as consumer feedback and reviews, are crucial for organizations to gauge their consumer base. Consumer reviews provide information that helps organizations compute metrics such as customer satisfaction (CSAT) and net promoter score (NPS) (Ho-Dac, Carson, & Moore, 2013). With deep internet penetration even in the remotest locations, consumers today are hooked online and share information and views on various online platforms through consumer reviews (Park, Gu, Leung, & Konana, 2014). Some organizations provide a space on their website where consumers can share their views through standardized quantitative or rating-based fields. Others collect textual reviews, and at times both coexist (Siering, Deokar, & Janze, 2018).
‘Textual reviews’, in which a consumer can pour his/her heart out in either frustration or happiness, are certainly among the best sources of ‘informative content’. Through this medium, organizations get a detailed understanding of consumer sentiments and emotions. Organizations also gain key insights into ‘consumer psychology’: how a consumer initially perceived a product/service vis-à-vis how s/he evaluated it after acquisition (Ye, Zhang, & Law, 2009).
This insight specifically helps multifaceted service industries, such as healthcare organizations, to dig deeper into how such sentiments, emotions and evaluations actually lead consumers to give ratings. Importantly, many healthcare/health-product ecommerce organizations are now in the fray, and almost all of them seem to prefer an omni-channel approach, so this understanding gains further relevance.
Extant literature has extensively discussed how consumer reviews affect the decision-making of both existing and new customers, and their overall perceptions of the organization and its brand (Sharp, 2011). Studies focused on healthcare services have elaborated on the rationales that make an online review helpful, almost to the point of it being ‘invaluable’ (Sandars & Walsh, 2009). However, extant literature has not examined how textual reviews can be used to find the reflectors and predictors of customer satisfaction in healthcare/health-product ecommerce (Sandars & Walsh, 2009; Sharp, 2011). This understanding is important because it will help healthcare/health-product ecommerce managers improve service design, customer relationship management and the handling of customer reviews. The current study fills this gap. The key research questions are: (a) How do consumers' opinions of various service attributes lead to their overall satisfaction with healthcare/health-product ecommerce? (b) Does the importance of such attributes vary depending on the ecommerce subcategory? (c) Can textual reviews be used to answer the above questions? (d) Do the emotions expressed in such reviews reflect customer satisfaction?
⁎ Corresponding author. E-mail address: swagato1987@gmail.com (S. Chatterjee)
1 All authors contributed equally.
https://doi.org/10.1016/j.jbusres.2020.10.043
Received 10 January 2020; Received in revised form 12 October 2020; Accepted 14 October 2020
Available online xxx
0148-2963/© 2020.
Herein, we analyze consumer reviews and ratings specifically for the healthcare/health-product industry, primarily healthcare/health-product ecommerce. First, we analyzed the text of multiple reviews to explore the diverse core and augmented (C&A) service aspects on which consumers base their textual reviews. Then we explored how overall and attribute-wise sentiments and emotions lead to CSAT. Next, we show how healthcare/health-product ecommerce contexts change these relationships. For example, consumer expectations of a pharmacy and drugs ecommerce firm are expected to differ from those of a beauty products ecommerce firm. We explore such aspects in the third step. Finally, we check the power of these variables to predict consumer satisfaction.
The rest of this paper is structured as follows. The next section covers the theoretical model, followed by the methodology and the results. A discussion of the theoretical and practical implications follows. We conclude by highlighting the limitations and outlining the scope for future research.
2. Background study
2.1. Ecommerce customer satisfaction and customer ratings
Customer rating (CR) has been a major variable for marketers in assessing the progress of their actions (Anderson, Fornell, & Lehmann, 1994). CR is known to enhance customer purchases, whether new or repeat, which naturally results in organizational profitability (Anderson et al., 1994; Söderlund, 1998).
But what drives CR is a question that has occupied researchers for decades (Anderson & Sullivan, 1993; Martensen, Gronholdt, & Kristensen, 2000; Mouwen, 2015). This is particularly so where consumer heterogeneity and multiple business models exist (Grewal, Chandrashekaran, & Citrin, 2010). In fact, it is this ‘heterogeneity’ that leads consumers to attach differing importance to different service attributes.
To explain CSAT in ecommerce, extant literature has explored various underlying constructs such as value, trust and service quality (Oh, 1999; Szymanski & Hise, 2000; Taylor and Baker, 1994; Zeithaml, Parasuraman, & Malhotra, 2002), using survey-based methods for data collection (Pappas, Pateli, Giannakos, & Chrissikopoulos, 2014; Wang et al., 2019). Nevertheless, it is important to understand user-generated ratings using user-generated information, as such information would be free of various biases. Studies using user-generated content have focused on how pre-purchase and post-purchase attribute-wise ratings affect CR or CSAT in ecommerce (Posselt & Gerstner, 2005). Some have also tried to check whether the impact of such variables changes over time and across product categories (Dholakia & Zhao, 2010; You, Bhatnagar, & Ghose, 2016). However, qualitative and quantitative data need to be combined to reflect or predict CR, because textual reviews are often a rich source of information and a correct manifestation of consumer opinions. Extant literature has not focused on this aspect while studying consumer satisfaction with ecommerce firms using user-generated content (Dholakia & Zhao, 2010; Posselt & Gerstner, 2005). Our study is crucial in the sense that it acts as a bridge between theory and practice. We propose a methodology that creates insights about user ratings from textual reviews, using text mining along with econometric and machine learning methods.
2.2. Online reviews
Online reviews are also extremely informative for ‘prospective consumers’, who may be uninformed or even ill-informed. Reviews, especially consistent ones, affect the purchase decision-making process. Organizations therefore often go out of their way to obtain positive reviews and ratings, which in turn helps them leverage their brand worth and brand value.
Extant literature has covered many dimensions of online reviews vis-à-vis their relevance and importance in consumer decision-making, which in turn affects organizations' bottom line (Chevalier & Mayzlin, 2006; Duan, Gu, & Whinston, 2008). Extant literature has also examined pricing and promotional strategies for organizations, for whom reviews are like a multi-period game (Ajorlou, Jadbabaie, & Kakhbod, 2016). In the first half of the game, the focus is on generating favorable online reviews. The second half leverages the positive impact of the first half, which goes on to affect price, sales and profits (Ajorlou et al., 2016). Another area explored in the past is the underlying consumer psychology that leads to favorable reviews, together with how organizations motivate consumers to write them (Hennig-Thurau, Gwinner, Walsh, & Gremler, 2004; Mowen, Park, & Zablah, 2007). What motivates bandwagon behavior in providing incongruous online reviews has also been explored (Cheung & Lee, 2012). Some researchers have found that different types of customers or purchase contexts can lead to differences in preferences and in the drivers of customer satisfaction (Ahani et al., 2019; Xu, 2020).
We focus here on both the metric and textual aspects of online reviews, thereby encompassing consumer sentiments and emotions holistically (Chatterjee, 2019; Siering et al., 2018). Interestingly, extant literature in healthcare services has not combined qualitative and quantitative information while explaining consumer satisfaction (Ng & Luk, 2019). In that sense, our study contributes to the extant literature (Ng & Luk, 2019). Moreover, attribute-wise sentiment mining and emotion mining have remained very limited, especially in the healthcare marketing literature.
2.3. Text mining
Admittedly, ‘structured data’ is more comprehensible and useful. Nevertheless, ‘unstructured data’ can yield much more information if it is analyzed by combining qualitative and quantitative techniques. ‘Text mining’ is one way of generating quantitative insights even from unstructured textual data. ‘Text analytics’, on the other hand, transforms the data produced by text mining into actionable insights. Text mining is used in document classification, topic modelling, translation, language identification, fake news detection, semantic mining, and chatbot development. Herein, following Hotho, Nürnberger, and Paaß (2005), we first pre-process the textual reviews. We then apply text mining to create analyzable data, and finally use text analytics to generate actionable insights.
However, before text-mining methods can be applied, the unstructured text data must be cleaned and made ready, and text pre-processing serves precisely this purpose. Pre-processing includes three phases: stop-word removal, stemming and important-word identification (Vijayarani, Ilamathi, & Nithya, 2015). Stop words are words such as pronouns and prepositions that do not ‘add value’ in the research context. Removing them reduces the size of the text data and speeds up processing. It also helps ensure that the important data critical for text analysis are not lost in the mix (Feldman & Sanger, 2007). Removing ‘stop words’ involves multiple methods, such as the mutual information method, the Zipf's law method, the classic method, and term-based random sampling. Once the clean dataset is available, the words are stemmed to develop connections within sentences and to reduce similar information content (Vijayarani et al., 2015). Stemming may be done by word truncation or by statistical and/or mixed methods. The sole objective of both stemming and stop-word removal is to find the most important words. A word that is common across the whole corpus is less important. A word repeated often within a ‘particular’ document, however, is very important. This logic is captured in term frequency-inverse document frequency (TF-IDF) scores, which are used as a proxy for the importance of words (Feldman & Sanger, 2007; Vijayarani et al., 2015).
The bag-of-words (BoW) and parts-of-speech (POS) methods have been used for text analytics based on the recommendations of Brill (1995). We used POS tagging to identify the part of speech of each word. This is a very common method for feature selection from text (Asghar, Khan, Ahmad, & Kundi, 2014). Generally, when consumers articulate their views, the objects of those views are nouns, while the views themselves are adjectives. For example, in “The bar in this hotel is classy”, ‘bar’ is the noun and ‘classy’ is the adjective. After POS tagging, we apply the BoW method to the nouns with the highest TF-IDF measures, which are therefore the most important (Salton & Buckley, 1988), and group them under various service aspects. Scores of these words are further used in other data mining techniques to generate additional insights (Chatterjee, 2019).
Sentiment mining for ecommerce aspects, as referred to above, is the most common text analytics method. It can be done at the document, sentence or feature level. The two most common ways of identifying sentiment are the lexicon-based approach and the statistical learning based approach (Feldman, 2013). In the lexicon-based approach, the most common way of identifying the sentiment of a text is to sum the sentiment scores of all the words in the text. The statistical learning based approach is an alternative, in which pre-labelled data are used with various cutting-edge machine learning techniques.
Given the prominence and importance of online reviews today, sentiment mining has become crucial in providing essential information, especially through overall as well as feature-wise sentiment mining techniques (Siering et al., 2018). Unfortunately, extant research in the context of healthcare has not explored this feature enough (Chatterjee, 2019; Popescu & Etzioni, 2007; Siering et al., 2018). Because we use these techniques, our study gains more salience in terms of its contribution to extant literature (Popescu & Etzioni, 2007).
3. Hypotheses development
Exploring the antecedents of CSAT, such as service quality, trust and perceived value, has been an important research domain in extant literature (Garbarino & Johnson, 1999; Oh, 1999). There have also been attempts to understand how individual service attributes can lead to CSAT, as they are more accessible and diagnostic in nature. Moreover, according to the accessibility-diagnosticity (AD) model, such accessible and diagnostic input variables lead to consumer outcomes. Individual service attributes can therefore be considered primary drivers of CSAT ratings (Vaidyanathan, 2000). However, these variables have different levels of accessibility and diagnosticity, which depend on consumer knowledge and/or lack of information. According to the AD model, the influence of the memory of an input A on attitude formation increases with A's own accessibility and diagnosticity. It decreases with the accessibility and diagnosticity of other inputs (Lynch, 2006). Extending the above, the evaluation of various service aspects would have a varied impact on consumers' overall satisfaction, depending on the accessibility and diagnosticity of each aspect and of the other aspects.
As per the multiple pathway anchoring and adjustment (MPAA) model, both the personal characteristics of consumers (inside-out) and the multiple attributes of services (outside-in) build the consumer's overall attitude towards a service (Cohen & Reed, 2006). Therefore, a combination of multiple internal and external forces leads to the final consumer outcomes, for example in his/her purchase decision-making process. Factors that may affect attitude formation include direct/imagined experience with the object, the analytical attitude formation method, analogical reasoning, and value- and social-identity-driven attitudes (Cohen & Reed, 2006). All of these together suggest that consumer attitude formation results from a complex mix of various types of factors.
Textual reviews provided by consumers give a vivid description of their experiences with a service (Chatterjee, 2019). The sentiments expressed in textual reviews about C&A service aspects are therefore a rich source of information about a consumer's attitudes. As discussed earlier, such information is both accessible and diagnosable through text mining techniques. This makes these sentiments primary reflectors of CSAT, as suggested by the AD model (Lynch, 2006; Vaidyanathan, 2000). Extant literature has explored the differential impact of two types of service aspects on consumer outcomes. Core aspects provide basic benefits, such as food in a restaurant, and augmented aspects provide additional benefits, such as live music in a restaurant (Chatterjee, 2019). In a health ecommerce setting, it is important to study how sentiments towards C&A service aspects lead to overall satisfaction.
For managing consumer reviews in open online channels, it is also important to predict how changes in C&A service aspects can affect consumer ratings. The predictive power of C&A service aspects for overall satisfaction is therefore also an important area of study. Therefore, we posit:
H1 Sentiment towards C&A service attributes has a positive relationship with CSAT in the healthcare/health-product ecommerce industry.
As per the MPAA model, along with service attributes, a consumer's personal characteristics affect his/her overall attitude towards a service (Cohen & Reed, 2006). Consumer characteristics typically find expression through consumer emotions, which in turn lead to consumer outcomes. As per the MPAA model, consumption emotions act as a medium for both the outside-in and inside-out expressions of consumers, and thus influence overall satisfaction (Cohen & Reed, 2006). Understandably, while positive emotions lead to favorable judgements, negative emotions may lead to harsher evaluations.
Textual reviews are certainly rich in information about consumer emotions. Extant literature has examined how consumer emotions can be extracted from textual reviews (Chatterjee, 2019). However, unlike sentiment scores, emotion scores are multidimensional (Westbrook & Oliver, 1991). Positive emotions include joy, trust and surprise, while negative emotions include sadness, disgust and anger. Importantly, such emotions do not necessarily fall within the same dimension. They vary not only in valence and degree but also in meaning and source (Westbrook & Oliver, 1991). Extant literature has dealt with this aspect in detail, establishing that such emotions reflect a consumer's attitude and behavior (Chatterjee, 2019; Laros & Steenkamp, 2005).
However, extant literature has suggested that negative emotions lead to more diagnosticity (Filieri, 2016), which makes the input variable stronger as per the AD model. Further, Cavanaugh, MacInnis, and Weiss (2016) have categorized emotions based on valence and arousal. For instance, sadness is a negative emotion that is low on arousal, whereas anger, despite also being a negative emotion, is high on arousal. Additionally, a high-arousal emotion is highly accessible because it overrides other cognitive processing (Filieri, 2016; Salehan & Kim, 2016), so high-arousal emotions have stronger effects. All these together lead to an interesting focal point for our study: whether the arousal, degree or valence of an emotion leads to different consumer outcomes.
The relationship between consumption emotions and customer satisfaction has been explained by the pleasure-arousal (PA) model (Ladhari, 2007). It suggests that the pleasure and arousal components of consumption emotions lead to a positive cognitive state, which in turn results in satisfaction and positive WOM. Extending the above, we argue that consumption emotions are expressed in textual reviews. The consumption emotions expressed in textual reviews can therefore reflect consumer satisfaction. As per the PA model, since pleasure affects satisfaction, positive emotions are expected to be positively related to satisfaction. High-arousal emotions are also expected to be more strongly related to satisfaction than low-arousal emotions (Ladhari, 2007). Therefore, we posit:
H2a Overall sentiment in the textual review has significant relationships with CSAT in the healthcare/health-product ecommerce industry.
H2b Emotions expressed in the textual review have significant relationships with CSAT in the healthcare/health-product ecommerce industry.
Extant literature has suggested that consumers attach differential importance to various service aspects depending on the service context (Xu, 2020). For instance, consumers attach differential importance to service features in restaurants with different business models, such as fine dining vs. fast food. In an adventure travel business context, the relative importance of C&A service aspects tends to vary with gender, demographics, travel goals and level of adventure (Matzler, Füller, Renzl, Herting, & Späth, 2008). In the hotel industry, the attribute-level information generated from textual reviews has different influences on customer satisfaction depending on the type of hotel (Xu, 2020). It has also been found that the factors consumers talk about and the factors that lead to their satisfaction can be different sets of variables (Xu, 2020). The health-oriented ecommerce industry also consists of various types of ecommerce firms. Some are generic, while others focus on certain product segments, such as personal care, drugs and pharmacy, eye care, skincare and home health care. Therefore, the relative importance of C&A services vis-à-vis consumer emotions in reflecting and predicting consumer outcomes is expected to differ across these contexts. Specific knowledge of such feature importance would help managers create marketing plans focused on their own industry, while helping them manage customer reviews better.
Therefore, we further posit:
H3 The relationship strengths of the overall sentiment, aspect-wise sentiments and emotions expressed in the textual review vary depending on the type of healthcare/health-product ecommerce.
All of the above hypotheses are important because, following Xu (2020), what consumers state in their reviews and what actually drives their satisfaction can be very different. This is because the underlying mechanism of review writing and the underlying mechanism of customer satisfaction can differ (Xu, 2020). Although we rely on the truthfulness of the reviews and the emotions expressed, a trivial relationship between the sentiments, emotions and overall satisfaction may not hold. Further probing is therefore important. Our approach differs from that of Xu (2020) in that we adopted text mining and machine learning techniques along with econometric techniques to explain and predict customer satisfaction.
4. Empirical study
4.1. Data and processing
We collected data about healthcare/health-product ecommerce firms from trustpilot.com, a website that collects customer reviews about all types of ecommerce. We collected 186,057 reviews under the ‘Health and wellbeing’ category, which included 29 sub-categories covering 619 healthcare/health-product ecommerce firms, posted between 2008 and 2018. For each review, the dataset had the CR, a proxy of CSAT, on a 1 to 5 point scale (1 = highly dissatisfied, 5 = highly satisfied), along with the textual review (title and main content) of the ecommerce firm. Fig. 1 summarizes the data processing and analysis framework. First, we removed numbers, stop words, blank spaces, punctuation etc. to create the initial pre-processed corpus. Next, we used the NRC Word-Emotion Association Lexicon (also called EmoLex), created by Mohammad and Turney (2013). This lexicon has been found suitable for consumer-review-based analysis (Chatterjee, 2019; Siering et al., 2018). From it, we obtained the overall sentiments (negative, positive) and the 8 basic emotions from the text, as listed in Table 1. Similar methodology is common in the information systems, data science and marketing literature (Dang, Zhang, & Chen, 2010; Mostafa, 2013; Taboada, Brooke, Tofiloski, Voll, & Stede, 2011).
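The pre-processing and lexicon-scoring steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. `TOY_LEXICON` and `STOP_WORDS` are hypothetical stand-ins for NRC EmoLex and a full stop-word list. Overall sentiment is computed here as (positive - negative) / (positive + negative), which is an assumed scoring rule chosen only because it yields a value between -1 and 1.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a complete list.
STOP_WORDS = {"the", "a", "an", "and", "i", "it", "was", "is", "my", "to", "of", "for", "this"}

# Toy stand-in for NRC EmoLex (assumption): each word maps to the categories
# it is associated with (two sentiments plus the eight basic emotions).
TOY_LEXICON = {
    "love": {"positive", "joy"},
    "happy": {"positive", "joy", "trust"},
    "late": {"negative", "anger"},
    "refund": {"anticipation"},
    "awful": {"negative", "anger", "disgust", "sadness"},
    "safe": {"positive", "trust"},
}

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]


def preprocess(text: str) -> list[str]:
    """Lower-case, replace numbers and punctuation with spaces, split, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def score_review(text: str) -> dict:
    """Count lexicon hits per emotion and derive a bounded overall sentiment."""
    counts = Counter()
    for tok in preprocess(text):
        for category in TOY_LEXICON.get(tok, ()):
            counts[category] += 1
    pos, neg = counts["positive"], counts["negative"]
    # Assumed scoring rule: net polarity scaled to [-1, 1]; 0.0 when no polar word occurs.
    sentiment = (pos - neg) / (pos + neg) if pos + neg else 0.0
    scores = {emotion: counts[emotion] for emotion in EMOTIONS}
    scores["sentiment"] = sentiment
    return scores
```

For example, `score_review("Love the cream, but delivery was 3 days late and the box was awful.")` counts two anger hits and one joy hit. It returns a sentiment of about -0.33, because one positive word is set against two negative ones. Per-review scores of this kind are the sort of variables that would then enter the explanatory and predictive models.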
For sub-category-wise analysis, we chose the top six sub-categories by the number of reviews available: fitness and nutrition
Fig. 1. Flowchart for data handling and model building.
Table 1
Summary statistics of the variables in the models.
Beauty and
Wellness
Drugs and
Pharmacy
Maximum
4.52
1.06
1
5
0.40
0.41
−1
1
0.36
0.31
−1
1
0.21
1.05
0.17
0.30
0.98
0.33
0.41
1.23
4.39
0.65
1.52
0.58
0.85
1.32
0.89
0.81
1.75
1.19
0
0
0
0
0
0
0
0
1
21
44
17
49
38
26
21
50
5
0.4
0.43
−1
1
0.36
0.31
−1
1
Cosmetics
Skincare
Customer
Rating
Overall
Sentiment
Title
Overall
sentiment
Review
Anger
Anticipation
Disgust
Fear
Joy
Sadness
Surprise
Trust
Customer
Rating
Overall
Sentiment
Title
Overall
sentiment
Review
Anger
Anticipation
Disgust
Fear
Joy
Sadness
Surprise
Trust
Customer
Rating
Overall
Sentiment
Title
Overall
sentiment
Review
Anger
Anticipation
Disgust
Fear
Joy
Sadness
Surprise
Trust
2.26
0.74
0.29
1.14
0.24
0.36
1.06
0.44
4.53
3.06
1.67
0.79
1.79
0.73
0.96
1.53
1.07
1.04
0
0
0
0
0
0
0
0
1
84
56
18
44
17
22
38
26
5
0.4
0.41
−1
1
0.36
0.3
−1
1
2.28
0.6
0.24
1.08
0.2
0.31
1.1
0.34
4.64
2.66
1.42
0.68
1.52
0.65
0.9
1.41
0.93
0.91
0
0
0
0
0
0
0
0
1
53
51
17
31
15
49
23
26
5
0.4
0.4
−1
1
0.37
0.31
−1
1
1.84
0.46
0.17
0.92
0.11
0.24
0.74
0.26
2.15
1.23
0.57
1.31
0.45
0.78
1.05
0.8
0
0
0
0
0
0
0
0
57
26
12
28
9
27
29
16
Standard
Deviation
Minimum
Maximum
4.42
1.17
1
5
0.39
0.43
−1
1
Fitness and
Nutrition
Customer
Rating
Overall
Sentiment
Title
Overall
sentiment
Review
Anger
Anticipation
Disgust
Fear
Joy
Sadness
Surprise
Trust
Customer
Rating
Overall
Sentiment
Title
Overall
sentiment
Review
Anger
Anticipation
Disgust
Fear
Joy
Sadness
Surprise
Trust
Customer
Rating
Overall
Sentiment
Title
Overall
sentiment
Review
Anger
Anticipation
Disgust
Fear
Joy
Sadness
Surprise
Trust
Customer
Rating
Overall
Sentiment
Title
Overall
sentiment
Review
Anger
Anticipation
Disgust
Fear
Joy
Sadness
Surprise
Trust
Minimum
All
Mean
Standard
Deviation
Mean
Eye treatment
0.37
0.32
−1
1
1.98
0.51
0.21
1.01
0.16
0.22
0.94
0.28
4.56
2.54
1.35
0.68
1.52
0.58
0.67
1.27
0.81
1.05
0
0
0
0
0
0
0
0
1
57
30
14
26
16
17
26
22
5
0.41
0.41
−1
1
0.41
0.33
−1
1
1.83
0.41
0.16
0.95
0.13
0.17
0.87
0.21
4.34
2.35
1.2
0.58
1.41
0.49
0.59
1.19
0.71
1.23
0
0
0
0
0
0
0
0
1
67
29
14
29
10
16
21
25
5
0.38
0.41
−1
1
0.37
0.33
−1
1
1.68
0.42
0.14
0.92
0.12
0.17
0.72
0.23
2.02
1.08
0.5
1.26
0.44
0.55
0.95
0.67
0
0
0
0
0
0
0
0
41
24
8
16
6
10
10
13
(40,708), beauty and wellness (35,065), drugs and pharmacy (15,443),
cosmetic (13,121), skincare (12,795) and eye treatment (10,269). We
extracted attribute-specific sentiments only for these six sub-categories,
because the relevant attributes differ across sub-categories. We followed
the bag-of-words method suggested by Chatterjee (2019) for finding
sentiments attribute-wise: first, we identified the nouns that occurred in
at least 5% of the reviews (using the package developed by Nguyen, Nguyen,
Pham, and Pham (2016)). Next, 4 experts and 9 users of healthcare/health-product
ecommerce helped us group these nouns into service attributes. The
final list of nouns for each service attribute is given in a
supplementary file. The attributes found were service, product,
delivery, price, facility, equipment and time. To obtain attribute-wise
sentiments, we split the texts into sentences, checked whether at least
one word in a sentence related to an existing attribute, and then computed
the sentiment of such sentences. For example, a review in the beauty and
wellness segment says: “Love the product I ordered (BRANDNAME) – so
pigmented and long-wearing. it’s hard to believe it’s not conventionally
made. Love the free shipping and the eco-conscious packaging!”. Based on
the bag of words, the part relevant to the attribute “product” is “Love
the product I ordered and the eco-conscious packaging!”; the sentiment of
this portion is used as the sentiment of “product” for the given review.
Table 1 gives the statistical summary of the data.
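The sentence-level, attribute-wise scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the attribute-to-noun dictionary and the tiny word-level sentiment lexicon are hypothetical stand-ins (the paper's noun lists are in its supplementary file), and whole sentences are scored rather than the sub-sentence spans shown in the example review.

```python
import re
from collections import defaultdict

# Hypothetical attribute -> noun lists (the paper's actual lists are in its supplementary file).
ATTRIBUTE_NOUNS = {
    "product": {"product", "packaging", "quality"},
    "delivery": {"shipping", "delivery", "courier"},
    "price": {"price", "cost", "discount"},
}

# Toy word-level lexicon with scores in [-1, 1]; an assumption, not the paper's lexicon.
LEXICON = {"love": 1.0, "great": 1.0, "good": 0.5, "bad": -1.0, "late": -1.0, "broken": -1.0}


def tokenize(text: str) -> list:
    """Lower-case the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_sentiment(sentence: str) -> float:
    """Mean lexicon score of the sentence's words (0.0 if no word is in the lexicon)."""
    scores = [LEXICON[w] for w in tokenize(sentence) if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def attribute_sentiments(review: str) -> dict:
    """Score each attribute by averaging the sentiment of the sentences that mention it."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    per_attribute = defaultdict(list)
    for sentence in sentences:
        words = set(tokenize(sentence))
        for attribute, nouns in ATTRIBUTE_NOUNS.items():
            if words & nouns:  # at least one word relates to this attribute
                per_attribute[attribute].append(sentence_sentiment(sentence))
    return {a: sum(v) / len(v) for a, v in per_attribute.items()}


review = "Love the product I ordered. The shipping was late and the packaging was broken."
print(attribute_sentiments(review))  # {'product': 0.0, 'delivery': -1.0}
```

Here “product” is mentioned in a positive and a negative sentence, so its averaged score is neutral, while “delivery” appears only in the negative one; attributes that are never mentioned (price) receive no score.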
4.4. Feature importance comparisons
As a robustness check of the results obtained from the explanatory models, we further analyzed the predictive models to obtain the feature importance of the various emotions and aspect-wise sentiments for each sub-category. The supplementary file gives the detailed feature importance scores, expressed as percentages so that the importance of all emotions and aspect-wise sentiments sums to 100.
As per the results, joy, anger and disgust are the most important emotions; anger plays a very important role for cosmetics and disgust for eye-treatment. Unlike the regression results, the predictive models assign little importance to fear. Anticipation, sadness, trust and surprise are of less importance.
Among the service aspects, product, service and delivery are the most important compared with the other four aspects. Service plays a very important role for fitness and nutrition, while product plays a very important role in the beauty and wellness and cosmetics sub-categories. In general, price plays a small role in the beauty and wellness category. Time is a crucial aspect for eye treatment. Figs. 2 and 3 give a graphical representation of these results.
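The percentage-based importance scores described above can be illustrated with a minimal scikit-learn sketch. It assumes impurity-based importances from a random forest (one of the predictive models used; the paper does not state which importance measure it reports), an illustrative subset of five emotions, and synthetic data in which only joy and anger drive the rating. scikit-learn already normalizes `feature_importances_` to sum to 1, so scaling by 100 gives the percentage form.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = ["joy", "anger", "disgust", "fear", "trust"]  # illustrative subset of emotions
X = rng.uniform(0, 1, size=(500, len(features)))
# Synthetic rating driven by joy (+) and anger (-) only; not the paper's data.
y = 3 + 1.5 * X[:, 0] - 1.2 * X[:, 1] + rng.normal(0, 0.3, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# Impurity-based importances rescaled so that all features together sum to 100 (%).
importance_pct = 100 * model.feature_importances_ / model.feature_importances_.sum()
ranking = sorted(zip(features, importance_pct), key=lambda t: t[1], reverse=True)
for name, pct in ranking:
    print(f"{name:8s} {pct:5.1f}%")
```

The uninformative emotions still receive a few percent each because fully grown trees also split on noise, which is one reason such scores are best read comparatively rather than as effect sizes.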
4.2. Explanatory models
We used linear regression analysis to assess the explanatory power of the insights generated from the review text, in line with extant literature (Chatterjee, 2019; Siering et al., 2018). We also estimated ordered logistic models, both to allow for non-linear relationships and because the dependent variable is a categorical rating (Chatterjee & Mandal, 2020). We analyzed the data as a whole and by sub-category. For the overall analysis, we used only the sentiment and emotion scores of the whole text and the title sentiment. For the sub-category analysis, we considered attribute-wise sentiments in addition to these variables.
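Table 2 reports adjusted R² for the linear models. A minimal numpy sketch of that statistic is shown below, on synthetic data with assumed coefficients (not the paper's estimates): it fits OLS with an intercept and computes adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors. The ordered logistic models would need a dedicated routine (for example, statsmodels' `OrderedModel`) and are not shown.

```python
import numpy as np


def ols_adjusted_r2(X: np.ndarray, y: np.ndarray):
    """Fit y = b0 + X b by least squares; return (coefficients, adjusted R^2)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, adj_r2


rng = np.random.default_rng(42)
n = 1000
title_sent = rng.uniform(-1, 1, n)  # synthetic title sentiment in [-1, 1]
body_sent = rng.uniform(-1, 1, n)   # synthetic body sentiment in [-1, 1]
anger = rng.poisson(0.3, n)         # synthetic anger word counts
# Synthetic ratings from assumed coefficients plus noise (illustration only).
rating = 4.0 + 0.6 * title_sent + 0.8 * body_sent - 0.25 * anger + rng.normal(0, 0.8, n)

coef, adj_r2 = ols_adjusted_r2(np.column_stack([title_sent, body_sent, anger]), rating)
print("coefficients:", np.round(coef, 2), "adjusted R^2:", round(adj_r2, 3))
```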
The overall analysis suggests that the sentiment of the title and the body best reflects consumer satisfaction. Among emotions, anger and fear have a very strong negative effect on satisfaction, while joy has a strong positive effect. Uncertain emotions such as anticipation and surprise, though positive in valence, have a negative relationship with overall satisfaction. The effects of the other emotions, though statistically significant, are very small. These results support H2a and H2b.
When comparing the sub-categories, the above-mentioned effects of the overall sentiment of the title and the body, along with the emotions, hold true, further supporting H2a and H2b. Some emotions are specifically associated with some sub-categories: disgust with drugs and pharmacy, eye-treatment and beauty and wellness; sadness with skincare and eye-treatment. Among the ecommerce attributes, product assortment, services available and delivery are the most important for most sub-categories. Time aspects are most important for eye-treatment along with drugs and pharmacy. Price is important for cosmetics, beauty and wellness along with drugs and pharmacy; however, as per the ordered logistic regression, it has no relationship with customer satisfaction for any category. Equipment and facility are important for the fitness, drugs and skincare sub-categories. These results suggest that aspect-wise sentiment can reflect customer satisfaction, thus supporting H1. However, the relative importance of overall sentiment, aspect-wise sentiments and emotions expressed in the textual review varies depending on the type of healthcare/health-product ecommerce, which supports H3.
The models were free from multi-collinearity and heteroscedasticity
issues. Table 2 summarizes the models.
5. Discussions
CSAT, which is an attitudinal aspect of consumer outcome, is of paramount importance to organizations looking for a metric to assess consumer outcomes (Chang, 2015; Söderlund, 1998). Herein, based on user-generated information, we elaborate upon the antecedents of CSAT, specifically in healthcare/health-product ecommerce. We used textual qualitative reviews and, through text mining along with natural language processing techniques, attempted to derive insights from them. Our first salient finding concerns the sentiments and emotions expressed in a textual review about the overall service. We also extracted the sentiments expressed about specific service attributes, based on keywords and a bag-of-words approach. These insights, drawn from qualitative reviews, are what we use to explain consumer outcomes. We also explored how the relationships explained above vary across the multitude of business models.
Based on the regression results and the feature importance scores, we can conclude that both core and augmented (C&A) service attributes play a very important role across the ‘types of ecommerce firms’, especially in reflecting and predicting CSAT. This can be explained using the MPAA model, in which consumers use various pathways while building attitudes (Cohen & Reed, 2006). These pathways include both the personal characteristics of consumers (inside-out) and multiple attributes of services (outside-in). Therefore, a combination of multiple internal and external forces leads to customer satisfaction, as supported by the results. The core attributes of ecommerce, i.e., product assortment, services available and delivery, are the most important. This is expected as per the accessibility-diagnosticity (AD) model, as the core attributes are more accessible and diagnostic (Vaidyanathan, 2000). The relative importance of product is higher for beauty and wellness as well as cosmetics, while that of services is higher for fitness and nutrition. This is along expected lines, because the beauty and cosmetics sub-categories are heavily product-centric: it is product performance and product quality that essentially lead to CSAT. On the other hand, success in fitness and nutrition often depends on consumers’ motivation and discipline, which may be improved by services provided by ecommerce firms. Thus, service plays an important role in nutrition and fitness. Among the augmented aspects, time gets higher importance for eye-treatment along with drugs and pharmacy; for these two sub-categories, on-time delivery and on-time service are crucial. Price does have some importance for cosmetics, beauty and wellness along with drugs and pharmacy, as such sub-categories are often
dependent on multiple and regular usage of products. Equipment and fa
4.3. Predictive models
We also assessed how well the overall sentiment and emotion scores and the aspect-wise sentiment scores predict overall satisfaction. We used 100-fold cross-validation to check the out-of-sample validity of the predictive models. The methods used were linear regression, XGBoost, random forest and decision tree (CART), all of which are commonly used for comparing the explanatory power of machine learning models. As per Table 3, while XGBoost and random forest show better predictive power in terms of lower root mean square error (RMSE), the linear regression model performs almost equally well. Thus, linear regression can be used for predictive analysis too, with the advantage that the regression model is theoretically explainable.
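The out-of-sample RMSE comparison can be sketched with scikit-learn's k-fold cross-validation. Assumptions: synthetic features with a linear ground truth stand in for the review-derived scores; 10 folds are used instead of the paper's 100 to keep the run short; XGBoost is omitted because it lives in a separate package (its `XGBRegressor` could be added to the dictionary if installed).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 600
# Synthetic stand-ins for sentiment/emotion scores and a linear rating rule (assumption).
X = rng.uniform(-1, 1, size=(n, 4))
y = 4 + 0.7 * X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(0, 0.8, n)

cv = KFold(n_splits=10, shuffle=True, random_state=0)  # the paper reports 100 folds
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Decision tree (CART)": DecisionTreeRegressor(random_state=0),
}
rmse = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so RMSE is returned negated.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()
    print(f"{name:22s} RMSE = {rmse[name]:.3f}")
```

Because the synthetic ground truth here is linear, linear regression should do about as well as the ensemble; with real review features the ensembles may edge ahead, as the paper reports for its data.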
Table 2
Explanatory Models.
Model
Regression
Variables
Overall
Nutrition and
Fitness
Beauty and
Wellness
Drugs and
Pharmacy
Skincare
Eyetreatment
Cosmetics
AdjR2
AIC
(Intercept)
Overall Sentiment
Title
Overall Sentiment
Body
Anger
Anticipation
Disgust
0.2913
0.3451
0.2843
0.2234
0.3647
0.3351
0.4056
4.01***
0.63***
3.76***
0.79***
3.99***
0.62***
4.25***
0.44***
4.11***
0.5***
3.68***
0.71***
3.86***
0.67***
0.86***
0.95***
0.79***
0.58***
0.67***
1.03***
0.89***
Fear
Joy
Sadness
Surprise
Trust
service
product
delivery
price
time
facility
equipment
1|2
2|3
3|4
4|5
Ordered Logistic Regression
Overall
Nutrition and
Fitness
Beauty and
Wellness
Drugs and
Pharmacy
Skincare
Eyetreatment
Cosmetics
252,654
62,557
47,562
18,218
15,746
15,499
19,227
1.53***
1.62***
1.60***
1.32***
1.26***
1.57***
1.46***
2.83***
2.90***
2.75***
2.44***
2.33***
2.77***
2.67***
−0.23***
−0.13***
−0.20*
−0.49***
−0.14***
−0.09
(NS)
0.05 (NS)
−0.28***
−0.06***
−0.08***
−0.29***
−0.07***
−0.05***
−0.29***
−0.07***
0.03*
−0.2***
−0.06 ***
−0.19 ***
−0.25***
−0.08***
−0.1***
−0.14***
−0.05***
−0.27***
−0.33***
−0.04***
−0.11***
−0.41***
−0.16***
−0.03**
−0.43***
−0.13***
0.02 (NS)
−0.46***
−0.18***
−0.12***
−0.23**
−0.22***
−0.26**
−0.13***
−0.15***
−0.09***
−0.15 ***
0.04*
−0.04*
−0.25***
−0.25***
−0.19***
−0.28***
0.12***
−0.08***
−0.08***
−0.01
(NS)
0.12***
−0.03***
−0.06***
0.03***
0.12***
−0.03***
−0.08***
−0.01 (NS)
0.1***
−0.06 ***
−0.11 ***
−0.01 (NS)
0.17***
−0.04*
−0.19***
−0.04**
0.31***
−0.05(NS)
−0.31***
−0.01 (NS)
0.36***
−0.11*
−0.47***
−0.00 (NS)
0.11***
0.09***
0.13**
0.11***
−0.07***
−0.08***
−0.01
(NS)
0.14***
0.26***
−0.13***
−0.26***
−0.01(NS)
0.07***
0.13***
−0.18***
−0.09***
−0.01
(NS)
0.16***
−0.02
(NS)
0.14***
−0.12***
−0.12***
0.03*
0.33***
0.02 (NS)
0.51*
−0.40***
−0.24***
−0.08
(NS)
0.07
(NS)
0.32***
−0.27***
−0.34***
0.03
(NS)
0.62**
0.02 (NS)
0.14***
0.12***
0.12***
0.11*
0.12***
0.00 (NS)
0.27**
0.20 (NS)
0.32*
0.2***
0.15***
0.09*
0.31***
0.15***
0.17*
0.11 (NS)
−0.02 (NS)
0.74**
−0.02 (NS)
0.11***
0.08*
0.11
(NS)
−0.02
(NS)
−0.04
(NS)
−0.01
(NS)
0.01 (NS)
0.18***
−0.27**
0.00 (NS)
0.08 (NS)
0.04 (NS)
0.28***
0.05 (NS)
0.25***
NS = not significant; * denotes p < 0.05 and *** denotes p < 0.0001.
−0.04 (NS)
1.03***
1.37 (NS)
−0.55***
−0.43
(NS)
−0.36
(NS)
−0.51*
−2.32***
−1.69***
−0.98***
0.01 (NS)
−2.76***
−2.19***
−1.51***
−0.48***
−2.43***
−1.98***
−1.37***
−0.33***
1.02***
−0.27**
−2.37***
−1.72***
−1.05***
−0.08***
0.73**
−1.92***
−1.37***
−0.69***
0.31***
−0.0 (NS)
0.33***
−0.14*
−0.48***
−0.02
(NS)
−0.01
(NS)
−0.26
(NS)
−0.11
(NS)
−0.17
(NS)
3.37***
0.23***
−0.11*
−0.22**
−0.00
(NS)
0.33*
−1.94***
−1.26***
−0.53***
0.43***
−2.18***
−1.55***
−0.85***
0.09*
0.17 (NS)
0.19 (NS)
0.38 (NS)
Table 3
RMSE scores of predictive models.
Drugs & Pharmacy
Eye Treatment
Skincare
Cosmetic
1.03
0.92
0.97
1.19
0.92
0.92
0.92
0.92
0.82
0.78
0.82
0.99
1.01
0.93
0.99
1.18
0.91
0.83
0.87
1.06
0.99
0.9
0.95
1.14
Beauty & wellness
Linear Regression
Xgboost
Random Forest
Decision tree
Fitness & Nutrition
Fig 2. Average feature importance of emotions.
Fig 3. Average feature importance of service aspects.
cility on the other hand, is important only for nutrition and fitness
sub-category, as such firms often work in an omni-channel mode, where brick-and-mortar facilities and online ecommerce work conjointly. Thus, we conclude that the feature importance of consumer sentiments towards C&A service attributes varies depending on the type of healthcare/health-product ecommerce. This finding is in line with previous research on the relative importance of core vs. augmented service aspects (Byrd, Canziani, Hsieh, Debbage, & Sonmez, 2016; Ravald & Grönroos, 1996).
The results of the regression and predictive models affirm the explanatory and predictive power of consumer emotions. A consumer’s overall sentiment and title sentiment reflect and predict his/her satisfaction the most. It is therefore not surprising that the most important emotions are the higher-arousal ones: anger, fear, disgust and joy. This can be explained by the PA model, which suggests
comes is related to customer satisfaction. The relationship of experiential emotions and customer satisfaction is supported by PA model (Ladhari, 2007). Thus our study strengthens the above model.
Our third theoretical contribution is that we explored how the comparative importance of the various attribute-wise qualitative evaluations for consumer outcomes differs across healthcare service contexts (i.e., the sub-categories of healthcare and wellbeing). Additionally, we found that the importance of overall textual sentiment and textual emotions in reflecting CSAT changes across healthcare service contexts. Past studies have examined how the service context impacts consumer evaluations (Ekinci & Riley, 2003; Matzler et al., 2008; Xu, 2020; Xu, Benbasat, & Cenfetelli, 2013), but such studies are almost non-existent for the healthcare/health-product ecommerce context. The health-oriented ecommerce industry also consists of various types of ecommerce, including generic and more niche ecommerce. This varying context leads to differences in the relative importance of C&A services vis-à-vis consumer emotions in reflecting and predicting consumer satisfaction. Specific knowledge of such feature importance would help managers create marketing plans focused on their own industry, while helping them manage customer reviews better. Therefore, this is also a pioneering study in the healthcare CSAT literature.
We found that both core and augmented service attributes play crucial roles in reflecting and predicting CSAT in the healthcare/health-product ecommerce context. This finding strengthens the MPAA model by providing additional evidence of its validity (Cohen & Reed, 2006). According to the MPAA model, attitude formation happens through inside-out and outside-in pathways, and multiple internal (sentiments and emotions) and external (evaluations of service aspects) variables contribute to building satisfaction. The current study confirms this and contributes to the literature on the use of the MPAA model in consumer behavior (Hasford & Farmer, 2016; Lynch, 2006).
that consumption emotions with higher arousal are more strongly related to satisfaction (Ladhari, 2007). Specifically, disgust is more important in the drugs and pharmacy and eye-treatment sub-categories, while anger is more important for cosmetics. This is in line with the consumer identity literature on cosmetics usage (Fabricant & Gould, 1993). Consumers of cosmetics ecommerce often have an external locus of identity, i.e., they look for social acknowledgement. Therefore, the purchase context is psychologically distant, and any service failure in such a context could lead to high-arousal negative emotions such as anger (Davis, Gross, & Ochsner, 2011; Tatavarthy, Chatterjee, & Sharma, 2019). On the other hand, low-arousal emotions such as anticipation, sadness and trust are of less importance, as supported by the PA model (Ladhari, 2007). Sadness does have some importance in the skincare category, possibly because skincare is often a psychologically close context, related to a consumer’s identity (Lazar, 2011). Therefore, any service failure in skincare ecommerce may lead to low-arousal negative emotions such as sadness (Davis et al., 2011). Based on the results, we conclude that consumer sentiments and emotions do reflect as well as predict CSAT in the healthcare/health-product ecommerce industry. However, the feature importance of consumer emotions tends to vary depending on the type of healthcare/health-product ecommerce (del Bosque & San Martín, 2008).
5.1. Theoretical and methodological contribution
The paper makes a number of theoretical and methodological contributions. Our first theoretical contribution is that the paper is a pioneering effort at exploring how qualitative evaluations of service aspects relate to CSAT, especially in the healthcare context (Siering et al., 2018). The importance of textual reviews has been extensively highlighted in extant literature (Brill, 1995; Hotho et al., 2005); however, methods for combining qualitative and quantitative data have remained unexplored (Siering et al., 2018). Though the contribution in the healthcare context is applied rather than core theoretical in nature, the nuances of the healthcare and health-product ecommerce context are very important. Healthcare being a multi-faceted service context, the theoretical underpinnings of customer satisfaction can result in very different reflectors and predictors, as found in our study. Thus, the current contribution is important and unique in the healthcare context. The above is also a methodological contribution to the literature on ecommerce satisfaction. Studies of ecommerce satisfaction have mostly been survey-based, with latent constructs such as value, trust and service quality as antecedents (Oh, 1999; Pappas et al., 2014; Szymanski & Hise, 2000; Taylor & Baker, 1994; Wang et al., 2019; Zeithaml et al., 2002). While some studies also used user-generated content to study e-commerce satisfaction and tried to find the influence of pre- and post-purchase attributes, they mostly relied on quantitative data obtained from review websites (Dholakia & Zhao, 2010; Posselt & Gerstner, 2005; You et al., 2016). This is the pioneering study that focuses on ecommerce satisfaction using both quantitative and qualitative information from user-generated content, thus contributing to extant literature (Dholakia & Zhao, 2010; Posselt & Gerstner, 2005; You et al., 2016).
Our second important contribution is that, in extant literature dealing with textual reviews, most attention has been given to overall or aspect-wise sentiments (Salehan & Kim, 2016; Siering et al., 2018; Ye et al., 2009), while the use of textual emotion scores has been limited thus far (Ahani et al., 2019; Wang et al., 2019). In this study, we extracted emotions from textual reviews to explore how they relate to consumer outcomes (Salehan & Kim, 2016; Ye et al., 2009). While sentiments about various core and augmented attributes are expressed in the text, the emotions are often related to the results and experience of using the ecommerce platform and the healthcare product/service. Thus, by including
the emotion elements in our study, we also explore how experiential out
5.2. Managerial implications
As regards managerial implications, the primary implication concerns service design, which is often a nontrivial decision for healthcare/health-product ecommerce firms, given that healthcare is an amalgamation of multiple service aspects whose importance may not yet be completely known. We chose to focus on both C&A service aspects. The study finds that the former (core aspects) have higher importance than the latter (augmented aspects). However, the relative importance of these aspects varies across healthcare/health-product ecommerce service contexts, defined as the sub-categories of the healthcare/health-product ecommerce industry. Therefore, prioritization and resource allocation decisions during the service design process should take this into account. When there is a resource allocation problem, ecommerce firms can use the regression models as objective functions and make investment decisions for each service aspect that will improve CSAT.
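One way to read the suggestion of using the regression models as objective functions: if predicted CSAT is linear in aspect-wise sentiment, then maximizing the predicted gain under a budget, with a cap on how much each aspect's sentiment can be improved, is a fractional knapsack problem, which a greedy pass ordered by coefficient per unit cost solves exactly. The coefficients, unit costs, caps and budget below are hypothetical; the paper specifies neither cost data nor an optimization procedure.

```python
def allocate_budget(coefs, unit_costs, max_gain, budget):
    """Maximize sum(coefs[a] * x[a]) subject to sum(unit_costs[a] * x[a]) <= budget
    and 0 <= x[a] <= max_gain[a]. Greedy by coefficient per unit cost is optimal
    for this linear problem (fractional knapsack)."""
    plan = {a: 0.0 for a in coefs}
    # Aspects with a non-positive coefficient cannot raise predicted CSAT, so skip them.
    candidates = [a for a in coefs if coefs[a] > 0]
    candidates.sort(key=lambda a: coefs[a] / unit_costs[a], reverse=True)
    remaining = budget
    for a in candidates:
        if remaining <= 0:
            break
        plan[a] = min(max_gain[a], remaining / unit_costs[a])
        remaining -= plan[a] * unit_costs[a]
    predicted_gain = sum(coefs[a] * plan[a] for a in plan)
    return plan, predicted_gain


# Hypothetical inputs: regression coefficients, cost per unit of sentiment improvement, caps.
coefs = {"product": 0.30, "delivery": 0.20, "service": 0.12, "price": 0.05}
unit_costs = {"product": 2.0, "delivery": 1.0, "service": 1.5, "price": 2.0}
max_gain = {a: 0.5 for a in coefs}
plan, gain = allocate_budget(coefs, unit_costs, max_gain, budget=1.5)
print(plan, round(gain, 3))
# {'product': 0.5, 'delivery': 0.5, 'service': 0.0, 'price': 0.0} 0.25
```

Delivery is funded first because it yields the most predicted CSAT per unit cost, then product; the budget runs out before service. If decisions were discrete (fund an aspect or not), an integer program would be needed instead.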
Our study also provides a comparative analysis of predictive models based on econometrics and machine learning, and suggests that the econometric models perform almost as well as the most common machine learning models. Moreover, the information generated from qualitative reviews can also be used to predict CSAT. Thus, the study offers an automated system that can readily find the reflectors of CSAT in various service contexts, suggesting where a healthcare/health-product ecommerce firm should focus. Ecommerce firms process huge datasets, so automated predictive models that suggest potential service designs, and automated review management systems for handling consumer reviews, are important. This methodology, which combines predictive machine learning models with text mining, can extract relevant information from the text automatically and find the relative importance of that information in predicting customer satisfaction, thus giving managers important marketing information in a dynamic, ever-changing world.
Finally, the study suggests that both sentiments and emotions have explanatory power in reflecting CSAT, and that this relative explanatory power varies depending on the type of healthcare/health-product ecommerce studied. During service failures, ecommerce contexts that are closer to self-identity (such as skincare) tend to evoke low-arousal emotions such as sadness, while ecommerce contexts that are closer to social identity (such as cosmetics) tend to evoke high-arousal emotions such as anger. This understanding is important for healthcare/health-product ecommerce service managers in service design, more specifically in service recovery and customer relationship management strategy. For instance, based on this understanding, skincare ecommerce firms should focus on ensuring low-arousal positive emotions (trust) through their recovery strategy, while cosmetics ecommerce firms should try to induce high-arousal positive emotions (joy). The communication content and the recovery measures should be designed accordingly.
Asghar, M.Z., Khan, A., Ahmad, S., & Kundi, F.M. (2014). A review of feature extraction in
sentiment analysis. Journal of Basic and Applied Scientific Research, 4(3), 181–186.
Brill, E. (1995). Transformation-based error-driven learning and natural language
processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4),
543–565.
Byrd, E.T., Canziani, B., Hsieh, Y.C.J., Debbage, K., & Sonmez, S. (2016). Wine tourism:
Motivating visitors through core and supplementary services. Tourism Management,
52, 19–29. doi:10.1016/j.tourman.2015.06.009.
Cavanaugh, L.A., MacInnis, D.J., & Weiss, A.M. (2016). Perceptual dimensions
differentiate emotions. Cognition and Emotion, 30(8), 1430–1445. doi:10.1080/
02699931.2015.1070119.
Chang, K.C. (2015). How travel agency reputation creates recommendation behavior.
Industrial Management & Data Systems, 115(2), 332–352. doi:10.1080/
15256480802557283.
Chatterjee, S. (2019). Explaining customer ratings and recommendations by combining
qualitative and quantitative user generated contents. Decision Support Systems, 119,
14–22. doi:10.1016/j.dss.2019.02.008.
Chatterjee, S., & Mandal, P. (2020). Traveler preferences from online reviews: Role
of travel goals, class and culture. Tourism Management, 80, 104108. doi:10.1016/
j.tourman.2020.104108.
Cheung, C.M., & Lee, M.K. (2012). What drives consumers to spread electronic word
of mouth in online consumer-opinion platforms. Decision Support Systems, 53(1),
218–225. doi:10.1016/j.dss.2012.01.015.
Chevalier, J.A., & Mayzlin, D. (2006). The effect of word of mouth on sales: Online book
reviews. Journal of Marketing Research, 43(3), 345–354. doi:10.1509/jmkr.43.3.345.
Cohen, J.B., & Reed, A. (2006). A multiple pathway anchoring and adjustment (MPAA)
model of attitude generation and recruitment. Journal of Consumer Research, 33(1),
1–15. doi:10.1086/504121.
Dang, Y., Zhang, Y., & Chen, H. (2010). A lexicon-enhanced method for sentiment
classification: An experiment on online product reviews. IEEE Intelligent Systems,
25(4), 46–53. doi:10.1109/MIS.2009.105.
Davis, J.I., Gross, J.J., & Ochsner, K.N. (2011). Psychological distance and emotional
experience: What you see is what you get. Emotion, 11(2), 438. doi:10.1037/
a0021783.
del Bosque, I.R., & San Martín, H. (2008). Tourist satisfaction a cognitive-affective model.
Annals of Tourism Research, 35(2), 551–573. doi:10.1016/j.annals.2008.02.006.
Dholakia, R.R., & Zhao, M. (2010). Effects of online store attributes on customer
satisfaction and repurchase intentions. International Journal of Retail & Distribution
Management, 38(7), 482–496. doi:10.1108/09590551011052098.
Duan, W., Gu, B., & Whinston, A.B. (2008). The dynamics of online word-of-mouth and
product sales—An empirical investigation of the movie industry. Journal of Retailing,
84(2), 233–242. doi:10.1016/j.jretai.2008.04.005.
Ekinci, Y., & Riley, M. (2003). An investigation of self-concept: Actual and ideal
self-congruence compared in the context of service evaluation. Journal of Retailing
and Consumer Services, 10(4), 201–214. doi:10.1016/S0969-6989(02)00008-5.
Fabricant, S.M., & Gould, S.J. (1993). Women’s makeup careers: An interpretive study
of color cosmetic use and “face value”. Psychology & Marketing, 10(6), 531–548.
doi:10.1002/mar.4220100606.
Feldman, R., & Sanger, J. (2007). The text mining handbook: Advanced approaches in
analyzing unstructured data. Cambridge University Press.
Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications
of the ACM, 56(4), 82–89. doi:10.1145/2436256.2436274.
Filieri, R. (2016). What makes an online consumer review trustworthy? Annals of Tourism
Research, 58, 46–64. doi:10.1016/j.annals.2015.12.019.
Garbarino, E., & Johnson, M.S. (1999). The different roles of satisfaction, trust, and
commitment in customer relationships. Journal of Marketing, 63(2), 70–87.
doi:10.1177/002224299906300205.
Grewal, R., Chandrashekaran, M., & Citrin, A.V. (2010). Customer satisfaction
heterogeneity and shareholder value. Journal of Marketing Research, 47(4), 612–626.
doi:10.1509/jmkr.47.4.612.
Hasford, J., & Farmer, A. (2016). Responsible you, despicable me: Contrasting competitor
inferences from socially responsible behavior. Journal of Business Research, 69(3),
1234–1241. doi:10.1016/j.jbusres.2015.09.009.
Hennig-Thurau, T., Gwinner, K.P., Walsh, G., & Gremler, D.D. (2004). Electronic
word-of-mouth via consumer-opinion platforms: What motivates consumers to
articulate themselves on the internet? Journal of Interactive Marketing, 18(1), 38–52.
doi:10.1002/dir.10073.
Ho-Dac, N.N., Carson, S.J., & Moore, W.L. (2013). The effects of positive and negative
online customer reviews: Do brand strength and category maturity matter? Journal of
Marketing, 77(6), 37–53. doi:10.1509/jm.11.0011.
Hotho, A., Nürnberger, A., & Paaß, G. (2005). A brief survey of text mining. Ldv Forum,
20(1), 19–62.
Ladhari, R. (2007). The effect of consumption emotions on satisfaction and word-of-mouth
communications. Psychology & Marketing, 24(12), 1085–1108. doi:10.1002/
mar.20195.
Laros, F.J., & Steenkamp, J.B.E. (2005). Emotions in consumer behavior: A hierarchical
approach. Journal of Business Research, 58(10), 1437–1445. doi:10.1016/
j.jbusres.2003.09.013.
Lazar, M.M. (2011). The right to be beautiful: Postfeminist identity and consumer beauty
advertising. New femininities (pp. 37–51). London: Palgrave Macmillan.
Lynch, J.G., Jr. (2006). Accessibility-diagnosticity and the multiple pathway anchoring
and adjustment model. Journal of Consumer Research, 33(1), 25–27. doi:10.1086/504129.
5.3. Limitations and future scope
We have not studied the psychological mechanism through which the sentiments and emotions felt by consumers shape their attitudes. Future research could use textual reviews in psychological experiments to clarify this. How such a mechanism leads to the differential importance of C&A attributes, sentiments and emotions across healthcare/health-product ecommerce contexts should also be explored. Other variables, such as the cultural and socioeconomic background of consumers, may also affect this mechanism; we could not study them owing to lack of data, and future researchers could do so. The results can also be extended to other healthcare/health-product ecommerce contexts, especially omni-channel contexts, which the current study does not cover. Possible bandwagon behavior in providing incongruous online reviews can also be explored in the context of healthcare (Cheung & Lee, 2012). The bandwagon effect for a healthcare product such as cosmetics may be high, but it may be absent in eye-care, depending on how sensitive eye-care is for a customer compared with cosmetics. Future researchers can focus on this. While using online reviews makes data collection and information generation easier, one must keep in mind that online reviews may not be truly representative of the customer population. Consumers’ demographics and psychographics drive their willingness to post reviews on online channels (Manner, 2017). Therefore, while a large dataset, as in our study, may reduce the impact of this bias, future researchers may try to overcome this limitation by including multiple review channels or by combining survey-based and online-review-based findings.
Appendix A. Supplementary material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jbusres.2020.10.043.
References
Ahani, A., Nilashi, M., Yadegaridehkordi, E., Sanzogni, L., Tarik, A.R., Knox, K., …
Ibrahim, O. (2019). Revealing customers’ satisfaction and preferences through online
review analysis: The case of Canary Islands hotels. Journal of Retailing & Consumer
Services, 51, 331–343. doi:10.1016/j.jretconser.2019.06.014.
Ajorlou, A., Jadbabaie, A., & Kakhbod, A. (2016). Dynamic pricing in social networks:
The word-of-mouth effect. Management Science, 64(2), 971–979. doi:10.1287/
mnsc.2016.2657.
Anderson, E.W., Fornell, C., & Lehmann, D.R. (1994). Customer satisfaction, market
share, and profitability: Findings from Sweden. Journal of Marketing, 53–66.
doi:10.2307/1252310.
Anderson, E.W., & Sullivan, M.W. (1993). The antecedents and consequences of customer
satisfaction for firms. Marketing Science, 12(2), 125–143. doi:10.1287/
mksc.12.2.125.
Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods
for sentiment analysis. Computational Linguistics, 37(2), 267–307. doi:10.1162/
COLI_a_00049.
Vaidyanathan, R. (2000). The role of brand familiarity in internal reference price
formation: An accessibility-diagnosticity perspective. Journal of Business and
Psychology, 14(4), 605–624. doi:10.1023/A:1022942330911.
Vijayarani, S., Ilamathi, M.J., & Nithya, M. (2015). Preprocessing techniques for text
mining-an overview. International Journal of Computer Science & Communication
Networks, 5(1), 7–16.
Wang, W.M., Tian, Z.G., Li, Z., Wang, J.W., Vatankhah Barenji, A., & Cheng, M.N. (2019).
Supporting the construction of affective product taxonomies from online customer
reviews: An affective-semantic approach. Journal of Engineering Design, 30(10–12),
445–476. doi:10.1080/09544828.2019.1642460.
Westbrook, R.A., & Oliver, R.L. (1991). The dimensionality of consumption emotion
patterns and consumer satisfaction. Journal of Consumer Research, 18(1), 84–91.
doi:10.1086/209243.
Xu, J.D., Benbasat, I., & Cenfetelli, R.T. (2013). Integrating service quality with system and
information quality: An empirical test in the e-service context. MIS Quarterly, 37(3),
777–794. doi:10.25300/MISQ/2013/37.3.05.
Xu, X. (2020). Examining an asymmetric effect between online customer reviews emphasis
and overall satisfaction determinants. Journal of Business Research, 106, 196–210.
doi:10.1016/j.jbusres.2018.07.022.
Ye, Q., Zhang, Z., & Law, R. (2009). Sentiment classification of online reviews to travel
destinations by supervised machine learning approaches. Expert Systems with
Applications, 36(3), 6527–6535. doi:10.1016/j.eswa.2008.07.035.
You, Y., Bhatnagar, A., & Ghose, S. (2016). Customer satisfaction with E-Retailers: The role
of product type in the relative importance of attributes. Journal of Internet Commerce,
15(3), 274–291. doi:10.1080/15332861.2016.1212314.
Zeithaml, V.A., Parasuraman, A., & Malhotra, A. (2002). Service quality delivery through
web sites: A critical review of extant knowledge. Journal of the Academy of Marketing
Science, 30(Fall), 362–375.
Manner, C.K. (2017). Who posts online customer reviews? The role of sociodemographics
and personality traits. Journal of Consumer Satisfaction, Dissatisfaction and
Complaining Behavior, 30, 23.
Martensen, A., Gronholdt, L., & Kristensen, K. (2000). The drivers of customer satisfaction
and loyalty: Cross-industry findings from Denmark. Total Quality Management,
11(4–6), 544–553. doi:10.1080/09544120050007878.
Mohammad, S.M., & Turney, P.D. (2013). Crowdsourcing a word–emotion association
lexicon. Computational Intelligence, 29(3), 436–465. doi:10.1111/j.1467-8640.2012.00460.x.
Mostafa, M.M. (2013). More than words: Social networks’ text mining for consumer brand
sentiments. Expert Systems with Applications, 40(10), 4241–4251. doi:10.1016/
j.eswa.2013.01.019.
Mouwen, A. (2015). Drivers of customer satisfaction with public transport services.
Transportation Research Part A: Policy and Practice, 78, 1–20. doi:10.1016/
j.tra.2015.05.005.
Mowen, J.C., Park, S., & Zablah, A. (2007). Toward a theory of motivation and personality
with application to word-of-mouth communications. Journal of Business Research,
60(6), 590–596. doi:10.1016/j.jbusres.2006.06.007.
Matzler, K., Füller, J., Renzl, B., Herting, S., & Späth, S. (2008). Customer satisfaction with
Alpine ski areas: The moderating effects of personal, situational, and product factors.
Journal of Travel Research, 46(4), 403–413. doi:10.1177/0047287507312401.
Ng, J.H., & Luk, B.H. (2019). Patient satisfaction: Concept analysis in the healthcare
context. Patient Education and Counseling, 102(4), 790–796. doi:10.1016/
j.pec.2018.11.013.
Nguyen, D.Q., Nguyen, D.Q., Pham, D.D., & Pham, S.B. (2016). A robust
transformation-based learning approach using ripple down rules for part-of-speech
tagging. AI Communications, 29(3), 409–422. doi:10.3233/AIC-150698.
Oh, H. (1999). Service quality, customer satisfaction, and customer value: A holistic
perspective. International Journal of Hospitality Management, 18(1), 67–82.
doi:10.1016/S0278-4319(98)00047-4.
Pappas, I.O., Pateli, A.G., Giannakos, M.N., & Chrissikopoulos, V. (2014). Moderating
effects of online shopping experience on customer satisfaction and repurchase
intentions. International Journal of Retail & Distribution Management, 42(3),
187–204. doi:10.1108/IJRDM-03-2012-0034.
Park, J.H., Gu, B., Leung, A.C.M., & Konana, P. (2014). An investigation of information
sharing and seeking behaviors in online investment communities. Computers in
Human Behavior, 31, 1–12. doi:10.1016/j.chb.2013.10.002.
Popescu, A.M., & Etzioni, O. (2007). Extracting product features and opinions from
reviews. Natural language processing and text mining (pp. 9–28). London: Springer.
Posselt, T., & Gerstner, E. (2005). Pre-sale vs. post-sale e-satisfaction: Impact on repurchase
intention and overall satisfaction. Journal of Interactive Marketing, 19(4), 35–47.
doi:10.1002/dir.20048.
Ravald, A., & Grönroos, C. (1996). The value concept and relationship marketing.
European Journal of Marketing, 30(2), 19–30. doi:10.1108/03090569610106626.
Salehan, M., & Kim, D.J. (2016). Predicting the performance of online consumer reviews:
A sentiment mining approach to big data analytics. Decision Support Systems, 81,
30–40. doi:10.1016/j.dss.2015.10.006.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval.
Information Processing & Management, 24(5), 513–523. doi:10.1016/
0306-4573(88)90021-0.
Sandars, J., & Walsh, K. (2009). The use of online word of mouth opinion in online
learning: A questionnaire survey. Medical Teacher, 31(4), 325–327. doi:10.1080/
01421590802204403.
Sharp, J. (2011). Brand awareness and engagement: A case study in healthcare social
media. Frontiers of Health Services Management, 28(2), 29–33.
Siering, M., Deokar, A.V., & Janze, C. (2018). Disentangling consumer recommendations:
Explaining and predicting airline recommendations based on online reviews. Decision
Support Systems, 107, 52–63. doi:10.1016/j.dss.2018.01.002.
Söderlund, M. (1998). Customer satisfaction and its consequences on customer behaviour
revisited: The impact of different levels of satisfaction on word-of-mouth, feedback to
the supplier and loyalty. International Journal of Service Industry Management, 9(2),
169–188. doi:10.1108/09564239810210532.
Szymanski, D.M., & Hise, R.T. (2000). E-satisfaction: An initial examination. Journal of
Retailing, 76(3), 309–322. doi:10.1016/S0022-4359(00)00035-X.
Tatavarthy, A.D., Chatterjee, S., & Sharma, P. (2019). Exploring the moderating role
of construal levels on the impact of process versus outcome attributes on service
evaluations. Journal of Service Theory and Practice, 30, 1–40. doi:10.1108/
JSTP-10-2018-0229.
Biography
Dr. Swagato Chatterjee is a researcher, consultant, teacher and academician. He has over
7 years of experience in marketing, operations and analytics. He has worked with companies like Coca Cola, Times of India, Technosoft, Mitsubishi, Nomura, Yes Bank, CSC,
Ernst and Young, Genpact in various consultancy and training assignments related to analytics. He has published in reputed international journals such as Decision Support Systems,
Tourism Management, International Journal of Hospitality Management, Journal of Business
and Industrial Marketing, Journal of Consumer Marketing, Journal of Strategic Marketing, Journal of Indian Business Research, and Global Business Review, among others, and has presented at various national and international conferences. He holds a BTech from IIT Kharagpur and a PhD
in marketing from IIM Bangalore. Currently, he is an Assistant Professor at Vinod Gupta
School of Management, IIT Kharagpur in the area of marketing and analytics.
Divesh Goyal is currently a student at IIT Kharagpur pursuing his B.Tech. in Metallurgical and Materials Engineering and M.Tech. in Entrepreneurship Engineering. His interests
lie in finance, marketing, and analytics. He has work experience at a
startup as well as at a globally recognized business school, IIM Udaipur. He looks forward to
working in big data and artificial intelligence and aspires to be an entrepreneur.
Atul Prakash is pursuing his M.Sc. in economics in the Department of Humanities and Social Sciences at IIT Kharagpur. His research interests lie in the domain of microeconomics,
behavioural finance, trade, and analytics. He has first-hand experience in the field of data
analytics which includes projects developing prediction and forecasting models. He looks
forward to exploring the applications of modern analytic tools and methodologies in trade
and finance.
Jiwan is an undergraduate at IIT Kharagpur pursuing his B.Tech. in Agricultural & Food
Engineering and M.Tech. in Financial Engineering. His interests lie in portfolio optimization, quantitative finance, and analytics. He has worked on projects involving risk modelling, time series forecasting, and big data analytics. He looks forward to exploring the
applications of machine learning in marketing and finance.
Journal of Hospitality Marketing & Management
ISSN: 1936-8623 (Print) 1936-8631 (Online) Journal homepage: http://www.tandfonline.com/loi/whmm20
Understanding Satisfied and Dissatisfied Hotel
Customers: Text Mining of Online Hotel Reviews
Katerina Berezina, Anil Bilgihan, Cihan Cobanoglu & Fevzi Okumus
To cite this article: Katerina Berezina, Anil Bilgihan, Cihan Cobanoglu & Fevzi Okumus (2015):
Understanding Satisfied and Dissatisfied Hotel Customers: Text Mining of Online Hotel
Reviews, Journal of Hospitality Marketing & Management, DOI: 10.1080/19368623.2015.983631
To link to this article: http://dx.doi.org/10.1080/19368623.2015.983631
Accepted online: 27 Feb 2015. Published online: 27 Feb 2015.
Journal of Hospitality Marketing & Management, 00:1–24, 2015
Copyright © Taylor & Francis Group, LLC
Understanding Satisfied and Dissatisfied Hotel
Customers: Text Mining of Online Hotel
Reviews
KATERINA BEREZINA
College of Hospitality and Technology Leadership, University of South Florida
Sarasota–Manatee, Sarasota, Florida, USA
ANIL BILGIHAN
College of Business, Florida Atlantic University, Boca Raton, Florida, USA
CIHAN COBANOGLU
College of Hospitality and Technology Leadership, University of South Florida
Sarasota–Manatee, Sarasota, Florida, USA
FEVZI OKUMUS
Rosen College of Hospitality Management, University of Central Florida, Orlando, Florida, USA
This article aims to examine the underpinnings of satisfied and
dissatisfied hotel customers. A text-mining approach was used to compare
online reviews by satisfied and dissatisfied customers. Online reviews of 2,510 hotel guests were collected
from TripAdvisor.com for Sarasota, Florida. The research findings
revealed some common categories that are used in both positive and negative reviews, including place of business (e.g., hotel,
restaurant, and club), room, furnishing, members, and sports.
Study results further indicate that satisfied customers who are willing to recommend a hotel to others refer to intangible aspects of
their hotel stay, such as staff members, more often than dissatisfied customers. On the other hand, dissatisfied customers more
frequently mention the tangible aspects of the hotel stay, such as
furnishing and finances. The study offers clear theoretical and
managerial implications for understanding satisfied and dissatisfied
customers through text mining of hotel reviews and ratings posted on
review websites, social media, blogs, and other online platforms.
KEYWORDS hotel reviews, text mining, user-generated content,
customer satisfaction, dissatisfaction
Address correspondence to Katerina Berezina, College of Hospitality and Technology
Leadership, University of South Florida Sarasota–Manatee, 8350 N. Tamiami Trail, Sarasota, FL
34243, USA. E-mail: katerina@katerinaberezina.com
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/whmm.
INTRODUCTION
Hotels operate in a competitive and dynamic environment (Verma, Victorino,
Karniouchina, & Feickert, 2007; Wilkins, 2010). The challenges of running a
hotel business stem from the fragmentation and complexity of the
lodging industry (Okumus, Altinay, & Chathoth, 2010). Aside from this,
increasing commoditization of hotel products makes it more difficult for
hotel companies to compete for their customers. Starkov and Price (2007)
suggested that customers select hotels based on the following criteria: familiarity, brand image, implementation of customer retention programs, and
core offering or value of the hotel. Given this, it is important to understand
what makes customers return or not return to a hotel, what makes them
recommend a hotel to their friends and relatives or not recommend it, what
image a property/brand has, and what features create value for customers.
Hotels employ different tools to assess and address customer satisfaction
and behavioral intentions. These tools may include placing comment cards
in the guest rooms, employing service recovery techniques to address in-house service failures, distributing postdeparture guest satisfaction surveys,
and introducing follow-up measures for those problems that could not be
resolved in-house. Even though hotels make efforts to assess customer satisfaction and,
if necessary, to recover it, a persistent problem is guests’
unwillingness to share their experiences and provide feedback to hotels.
Previous research suggests that the majority of customers do not act on the
dissatisfactory service that they receive and are reluctant to complain to the
service provider (Ekiz & Au, 2011; Ekiz, Khoo-Lattimore, & Memarzadeh,
2012). Such reluctance to complain and provide feedback may deprive hotels of
an opportunity to perform service recovery and improve their service
levels. At the same time, it is important to note that the Internet
makes it easier for customers to share their experience via review websites,
social media, blogs, and other online platforms. The abundance of customer
reviews posted on the Internet is available not only to hotel managers, but
also to other consumers who may base their purchasing decisions on the
information provided online (Dickinger & Mazanec, 2008).
An emerging dependence on the Internet as the source of information
for decision-making regarding tourism products strengthens the need for
more research on electronic reviews (Sparks & Browning, 2011). It is
important for hotel managers to use the customer review information
available to them online in order to better understand their customers and
improve hotel performance. However, the online medium generates such
a large volume of information that it may be difficult for managers to
review and evaluate all of it. For this reason, this article adopts a
text-mining approach, which allows for the extraction of meaningful patterns
from large volumes of textual information (Lau, Lee, & Ho, 2005; Turban,
Sharda, & Delen, 2010). Most previous studies rely on the overall ratings of hotels (e.g., Ramanathan & Ramanathan, 2011). The current research
instead uses customer recommendation, which is a stronger measure of customer
experience. Detailed ratings carry more information about user preferences
than single overall ratings alone (Jannach, Zanker, & Fuchs, 2014). Opinion
mining captures subjectivity in terms of the semantic orientation associated with the constituents of a text (Gräbner, Zanker, Fliedl, & Fuchs, 2012;
Taboada, Brooke, Tofiloski, Voll, & Stede, 2011).
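To make the idea of semantic orientation concrete, the following is a minimal Python sketch of a lexicon-based scorer. The word list, its scores, and the function name `semantic_orientation` are hypothetical illustrations, not the lexicon or procedure used by the cited authors; real lexicon-based systems rely on much larger dictionaries and treat intensifiers and negation more carefully than the simple sign flip shown here.

```python
import re

# Hypothetical toy lexicon mapping words to semantic orientation scores
# (positive = favorable, negative = unfavorable). Illustrative only.
LEXICON = {
    "clean": 2, "friendly": 2, "great": 3, "helpful": 2, "comfortable": 2,
    "dirty": -2, "rude": -3, "noisy": -2, "broken": -2, "expensive": -1,
}
NEGATORS = {"not", "never", "no"}


def semantic_orientation(text: str) -> int:
    """Sum lexicon scores over the words of a review.

    A word immediately preceded by a negator has its sign flipped,
    a deliberately simple stand-in for proper negation handling.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


print(semantic_orientation("The staff were friendly and the room was clean"))  # 4
print(semantic_orientation("Room was not clean and very noisy"))  # -4
```

A positive total suggests a favorable review and a negative total an unfavorable one; whether such scores agree with reviewers' Yes/No recommendations would have to be checked on real data.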
In summary, this article aims to examine the underpinnings of satisfied and dissatisfied customers by applying a text-mining approach to
online hotel reviews. This is achieved by comparing the online hotel
reviews of satisfied customers who are willing to recommend the property
to others with those of dissatisfied customers who would not recommend the
property where they stayed. Study results should allow us to
understand what aspects of amenities and services offered by hotels generate
positive comments and what aspects generate negative ones.
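The comparison described above can be sketched in Python as follows: each review is tagged with the categories it mentions, and the share of reviews mentioning each category is computed separately for reviews that do and do not recommend the hotel. The keyword-to-category map `CATEGORIES` is a hypothetical stand-in; the study derived its categories with PASW Modeler's text analytics, which this sketch does not reproduce.

```python
import re
from collections import Counter

# Hypothetical keyword-to-category map; the category names echo those reported
# in the abstract, but the keyword lists are illustrative assumptions.
CATEGORIES = {
    "staff": "members", "manager": "members", "receptionist": "members",
    "bed": "furnishing", "carpet": "furnishing", "sofa": "furnishing",
    "price": "finances", "fee": "finances", "charge": "finances",
    "room": "room", "suite": "room",
}


def category_share(reviews):
    """Share of reviews that mention each category at least once."""
    counts = Counter()
    for text in reviews:
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update({CATEGORIES[w] for w in words if w in CATEGORIES})
    n = len(reviews)
    return {cat: counts[cat] / n for cat in counts} if n else {}


def compare_groups(reviews, recommends):
    """Split reviews by a Yes/No recommendation flag; return (yes, no) shares."""
    yes = [r for r, rec in zip(reviews, recommends) if rec]
    no = [r for r, rec in zip(reviews, recommends) if not rec]
    return category_share(yes), category_share(no)


reviews = [
    "The staff was lovely",
    "Staff helped us a lot",
    "Bed was broken and the fee was high",
    "Dirty carpet and rude staff",
]
yes_share, no_share = compare_groups(reviews, [True, True, False, False])
print(yes_share)  # {'members': 1.0}
print(no_share)
```

Reporting shares of reviews, rather than raw word counts, keeps one long review from dominating a category; with real data the two groups would also differ greatly in size, which shares handle naturally.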
REVIEW OF LITERATURE
Hotel Guest Satisfaction and Behavioral Intentions
Identifying satisfied and dissatisfied customers has been an important
research theme among scholars from various disciplines including engineering, management, marketing, and hospitality (Chow & Zhang, 2008;
Pizam & Ellis, 1999). The concept of guest satisfaction and dissatisfaction
has been comprehensively examined by marketing and consumer behavior
researchers. These postpurchase behaviors are acknowledged to be of great
importance to firms because of their influence on repeat purchases and
word-of-mouth (WOM) recommendations. In a nutshell, satisfaction reinforces positive attitudes toward the brand and leads to a greater likelihood
that the same brand will be purchased again. On the other hand, dissatisfaction may lead to negative brand attitudes and weaken the likelihood of
buying the same brand again.
One of the key approaches to answer the questions of customer satisfaction and potential future behavior is measuring service quality (Bharwani
& Jauhari, 2013; Buttle, 1996; Crick & Spencer, 2011; Cronin & Taylor,
1992; Dortyol, Varinli, & Kitapci, 2014; Gummesson, 2014; Ladhari, 2012;
Parasuraman, Zeithaml, & Berry, 1985; Prentice, 2013; Qu & Sit, 2007; Torres
& Kline, 2013; Yee, Yeung, & Cheng, 2010). Service quality is a level of service delivery based on the customer perception (Zeithaml, Bitner, & Gremler,
2006). Perceived service quality is a part of a broader concept of customer
satisfaction and behavioral intentions incorporating customer loyalty and
WOM communications (Prasad, Wirtz, & Yu, 2014; Prentice, 2013).
Hotel guests use a variety of elements to evaluate the quality of service
that they receive during their stay (Pizam & Ellis, 1999; Wilkins, Merrilees,
& Herington, 2007). Research indicates that customer satisfaction is affected
by both tangible and intangible aspects of service quality (Ekinci, Dawes, &
Massey, 2008; Prentice, 2013; Torres & Kline, 2013). The intangible elements
are service-related, such as assurance, customer service, and empathy, whereas
tangible elements relate to the physical facilities of the hotel, such as the
appearance of hotel personnel and the cleanliness of the room (Ramanathan &
Ramanathan, 2011). Research suggests that service failure may affect
the perception of service quality, satisfaction, and future behavioral intentions (Berezina, Cobanoglu, Miller, & Kwansa, 2012; Han & Back, 2007;
Prentice, 2013; Tarn, 1999). Therefore, identifying the attributes that
enhance customer satisfaction and ensure customer loyalty is important for
hotels.
Hoteliers aim to make customers satisfied and keep them coming back
to their properties. It is cheaper to keep an existing hotel guest than to invest
in finding new customers (Tyrrell & Woods, 2005). Furthermore, research
indicates that increasing customer retention rates by 5% may increase profits
by 25% to 95% (Reichheld & Schefter, 2000). Gefen (2002) points
out that acquiring new customers is more expensive than keeping loyal
ones, while serving loyal customers is cheaper than serving new customers.
In addition, loyal customers spend more and frequently refer new customers to
a supplier, providing another rich source of profits (Bowen & Shoemaker,
1998; Shoemaker & Lewis, 1999).
The growth and penetration of the Internet expand the effect of referrals
from loyal customers. However, dissatisfied customers may also be valuable for hotels. First, they may assist hotels by pointing out the problematic
areas of hotel operations that may require careful attention and improvement (Harrison-Walker, 2001). Another reason to value dissatisfied
customers is the service recovery paradox, which states that customer satisfaction is even higher among
customers who have experienced a service failure followed by service
recovery than among customers who received proper service
the first time (Harrison-Walker, 2001; Hoffman & Bateson, 2010; Zeithaml
et al., 2006). The literature indicates that service recovery strategies
increase customer loyalty (Cranage & Mattila, 2005). However, unaddressed complaints
may result in dissatisfaction, low repeat-purchase levels, and negative WOM (Mattila & Mount, 2003). In order to avoid such
negative consequences, Harrison-Walker (2001) suggested that companies
should embrace customer complaints for their own benefit. Harrison-Walker recommended that companies develop the necessary outlets for customers to complain, including website resources, call centers, and chat
options.
At the same time, negative WOM could be harmful to companies
(Bambauer-Sachse & Mangold, 2011). Customers tend to seek out negative reviews specifically because negative information is considered
more diagnostic and informative than positive or neutral information.
Negative information is weighted more heavily in decision-making than positive information (Herr, Kardes, & Kim, 1991). Negative WOM could deter
potential customers from considering a particular product or brand, thereby damaging the company’s reputation and financial strength (Sundaram,
Mitra, & Webster, 1998). It could also go viral very quickly in today’s connected world and possibly diminish brand equity and image, reduce sales,
and, in extreme cases, force businesses to close.
Evaluating Customer Satisfaction on Web 2.0
Traditionally, hoteliers and academics have assessed service quality quantitatively
using guest comment cards and questionnaires. However, the development
of the Internet and consumer-generated content provides a strong opportunity for a qualitative approach to service quality. The development of the
Internet has led to its second generation, referred to as Web 2.0, a term first used
by O’Reilly in 2004. O’Reilly (2005) defined Web 2.0 as “a set of economic, social, and technology trends that collectively form the basis for the
next generation of the Internet—a more mature, distinctive medium characterized by user participation, openness, and network effects” (p.1). The
technology of the second-generation Internet (Web 2.0) usually includes
tools that allow people to collaborate and share
information online. Examples of these include, but are not limited to, social
networking, instant messaging, social bookmarking, mash-ups, blogs, virtual worlds, podcasts, web videos, and wikis (Kasavana, Nusair, & Teodosic,
2010).
The most developed area of Web 2.0 within travel is consumer
reviews (O’Connor, 2008; Litvin, Goldsmith, & Pan, 2008; Nusair, Bilgihan,
& Okumus, 2013). Examples of travel review websites include Expedia and TripAdvisor. A Pew Internet & American Life Project
study (2006) reports that searching for travel-related information is one of
the most popular online activities. Research indicates that people use
online travel referrals for travel planning (Cox, Burgess, Sellitto, & Buultjens,
2009; Mackay, McVetty, & Vogt, 2005; Litvin et al., 2008; Nusair, Bilgihan,
Okumus, & Cobanoglu, 2013; Stringam & Gerdes, 2010). Furthermore, over
5 million travelers a month visit VirtualTourist.com to seek travel
reviews and tips (Lee & Gretzel, 2006); approximately 20 million people
visit TripAdvisor to read other travelers’ reviews every month (Yoo, Lee, &
Gretzel, 2007). Recommendations provided by other consumers based on
their tourism experiences are suggested to be not only the most preferred
sources of travel information, but also the most influential sources for travel
decision-making (Pan, MacLaurin, & Crotts, 2007). Online consumer reviews
empower guests by allowing them to access “more accurate, up-to-date
information about products” (Kucuk & Krishnamurthy, 2007). Aside from
customers, management could also benefit from online comments, which
report service strengths and weaknesses and are therefore of considerable
utility when studying customer relationship management (Cho, Im, & Hiltz,
2003). User-generated content creates opportunities for hotels to gain a better
understanding of their guests (Barreda & Bilgihan, 2013).
Literature suggests that hotel guest reviews are characterized by a
growing importance and impact on the consumer decision-making process
and hotel selection (Bulchand-Gidumal, Melián-González, & López-Valcárcel,
2011; O’Connor, 2008; Gretzel & Yoo, 2008; Xie, Miao, Kuo, & Lee, 2011).
Results of previous studies suggest that approximately 90% of travelers find
hotel reviews to be helpful (Gretzel & Yoo, 2008; Stringam, Gerdes, &
Vanleeuwen, 2010). According to the 2010 Portrait of American Travelers
(YPartnership/ Harrison Group, 2010) the top preferred choices for finding
travel information and prices include online travel agencies and review web
sites such as Expedia (56%), Travelocity (52%), and Orbitz (46%).
Product and service reviews are an increasingly important type of user-generated content, as they provide a valuable source of information that helps
customers make good purchasing decisions. Previous research shows that
user-generated online reviews significantly influence online sales:
a 10% increase in traveler review ratings boosts online bookings by
more than 5% (Ye, Law, Gu, & Chen, 2011). Predictions suggest that
online reviews influence more than US$10 billion in online travel purchases
annually (Compete, 2007).
The sphere of consumer-generated content was studied by surveying
Internet users and investigating their opinions about hotel reviews (Gretzel
& Yoo, 2008). Stringam et al. (2010) conducted a study on hotel ratings that
demonstrated the dominance of positive reviews in the online media (about
74% of the reviewers would recommend the property where they stayed
to others). This study revealed a high positive correlation between service
subcategory ratings, overall satisfaction, and intentions to recommend the
hotel to others. Ekiz et al. (2012) investigated the online complaints in the
luxury hotel context. They identified two main categories in online consumer
complaints: room for improvement (physical attributes of the hotel room and
the quality of the amenities provided in the room) and hotel staff attitudes
(misbehaviors, bad attitude, lack of knowledge, skill, and passion of the
staff).
In the area of travel reviews, text mining has been utilized in order to
classify pleasant reviews by satisfied customers and unpleasant reviews by
dissatisfied customers (García-Barriocanal, Sicilia, & Korfiatis, 2010). These
researchers utilize shallow natural language processing (NLP) in order to
identify emotion-based review categories for reviews in Spanish. They suggest that hotel guest reviews can serve as a complementary source for hotel
quality evaluation. Qualitative analysis of London hotels’ online reviews by
O’Connor (2010) identified the 10 topics most commonly mentioned in the
reviews: hotel location, room size, staff (good service),
cleanliness, breakfast, in-room facilities, comfortable, temperature, dirty, and
maintenance. Pekar and Ou (2008) deployed opinion mining and investigated the relationship between subjective expressions and references to
hotel room features. However, they did not offer managerial implications
from a services marketing aspect. Barreda and Bilgihan (2013) investigated
the main themes that motivate guests to evaluate hotels on Web 2.0. Their
findings identify hotel cleanliness as a common concern in guests’ expectations. Guests were found to be more likely to write positive reviews for
hotels that are conveniently located near attractions, shopping, airports, and
restaurants. Guests were also positively influenced by the quality of service
received.
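Classifying reviews as pleasant or unpleasant is a supervised text classification task. The Python sketch below uses a multinomial Naive Bayes classifier with add-one smoothing, a standard technique swapped in here for illustration; it is not the shallow NLP approach of García-Barriocanal et al. (2010), and the training sentences and labels are made up.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and keep alphabetic runs as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        """Log prior plus smoothed log likelihoods; unseen words are ignored."""
        v = len(self.vocab)
        return self.log_priors[c] + sum(
            math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            for w in tokens(text)
            if w in self.vocab
        )

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


texts = [
    "clean room friendly staff",
    "great location lovely pool",
    "dirty room rude staff",
    "noisy and broken air conditioning",
]
labels = ["pleasant", "pleasant", "unpleasant", "unpleasant"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("friendly staff and great pool"))  # pleasant
print(model.predict("rude staff dirty room"))  # unpleasant
```

Working in log space avoids numeric underflow when reviews are long; in practice a library implementation and a labelled set of thousands of reviews would replace this toy setup.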
RESEARCH METHOD
The purpose of this research is to identify patterns in hotel reviews
regarding the aspects that make hotel guests satisfied with the hotel and inspire
them to recommend the property to others, and, on the other hand, to identify
the negative patterns that cause guest dissatisfaction. Text mining
was chosen as the research method on the premise that this approach is
capable of finding meaningful patterns
in the vast amount of information generated by hotel guests’ reviews (Lau
et al., 2005; Turban et al., 2010). Text mining “explores data in text files
to establish valuable patterns and rules that indicate trends and significant
features about specific topics” (Lau et al., 2005, p. 345).
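As a rough illustration of pattern extraction from review text, the Python sketch below lowercases each review, tokenizes it, removes stop words, and counts term frequencies across the collection. The short `STOP_WORDS` set is an assumption made for brevity; this is a generic preprocessing pipeline, not the PASW Modeler procedure the study used.

```python
import re
from collections import Counter

# Small illustrative stop-word list (assumption); practical projects usually
# rely on a fuller list, such as those distributed with NLTK or scikit-learn.
STOP_WORDS = {
    "the", "a", "an", "and", "was", "were", "is", "to", "of", "in",
    "we", "our", "it", "very",
}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_terms(reviews, k=5):
    """Return the k most frequent terms across all reviews."""
    counts = Counter()
    for text in reviews:
        counts.update(tokenize(text))
    return counts.most_common(k)


reviews = [
    "The room was clean and the staff was friendly.",
    "Clean room, great location.",
    "Our room was dirty.",
]
print(top_terms(reviews, 2))  # [('room', 3), ('clean', 2)]
```

Frequent terms such as these are a typical starting point for grouping words into higher-level categories like those reported in the study.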
Sample and Data
Sarasota, Florida, in the United States was selected as a location of primary
focus for this article. Sarasota is a popular, vibrant, and fast-growing destination. It has been recognized with awards such as Orbitz.com’s “Top
10 Fastest Growing Domestic Beach Destinations” in 2008 and TripAdvisor
Traveler’s Choice Award of 2011. Its popularity continues to grow, as reflected in annual visitor numbers: Visit Sarasota County recorded 759,800 visitors
staying in paid accommodations in 2010, 827,000 in 2011, 894,100 in 2012, and 941,400 in 2013, the majority of whom
came for leisure purposes.
This location was selected as a destination that offers a variety of
travel experiences, including beach vacations, business and meetings, art
and heritage tourism, leisure and sport activities, medical tourism, and ecotourism (Sarasota Convention and Visitors Bureau, 2009). Due to the variety
of tourism types developed in Sarasota, the city also offers a wide selection of different hotel properties, such as leisure/beach hotels, resorts,
business/conference hotels, limited service, select service, and full-service
properties. The types of accommodations offered include 47% condos, 31%
hotels/motels, 7% apartments, 7% houses, 4% mobile homes, 2% campsites.
At the same time, Sarasota is a relatively small destination compared to other
popular travel destinations in the United States. This allowed researchers to
collect all available reviews for Sarasota hotels while conducting the study.
All available online reviews for Sarasota hotels were collected from
TripAdvisor.com. The TripAdvisor website enables travelers to access information about hotels, flights, restaurants, vacation rentals, cruises, and other
travel products. Users can post comments, share trip ideas/pictures, and
express their reviews on hotels, restaurants, and destinations. TripAdvisor
contains more than 100 million travel-related reviews from travelers from
all over the world. These reviews cover more than 2.5 million businesses,
116,000 destinations, and 1.1 million accommodations (TripAdvisor, 2013).
TripAdvisor was selected for this study, as it is one of the largest repositories
of travel-related reviews.
The data for this study were collected using an online robot developed
for the purposes of this research. A total of 2,510 reviews were recorded
in an Excel file. The data file contains the categories listed in Table 1, which represent the
usual attributes of consumer reviews on TripAdvisor. A list of
all hotels included in this study, with their star ratings,
property types, and numbers of reviews, is presented in Table 2.
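As a sketch of how scraped reviews with the attributes listed in Table 1 might be stored, the Python code below holds each review as a typed record and writes the records to CSV, a format Excel can open. The class `ReviewRecord`, its attribute names, and the sample values are hypothetical and cover only a subset of the Table 1 fields.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ReviewRecord:
    """One scraped review; a hypothetical subset of the Table 1 fields."""
    quote: str            # review title
    hotel_name: str
    trip_type: str
    comment: str          # review body
    cleanliness: int      # 1 (terrible) to 5 (excellent)
    service: int          # 1 (terrible) to 5 (excellent)
    recommendation: str   # "Yes" or "No"


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ReviewRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


sample = ReviewRecord("Great stay", "Hotel A", "Couples", "Loved it", 5, 5, "Yes")
print(to_csv([sample]))
```

Keeping ratings as integers and the recommendation as an explicit Yes/No field makes the later split into satisfied and dissatisfied groups a simple filter.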
The reviews that were included in the study were mainly (84.87%) provided by hotel guests traveling for leisure or leisure-related purposes (e.g.,
quality time with family, romantic getaway, personal event, etc.). Business
travelers accounted for 11.70% of all reviewers. Table 3 provides information about travelers’ purpose of the trip. These statistics are in line with
Sarasota’s leisure-dominated market composition (Sarasota Convention and
Visitors Bureau, 2009).
TABLE 1 Hotel review categories
Field | Explanation
Quote | Title of the guest review; in most cases it conveys the overall feeling about the hotel
Hotel name | Name of the observed hotel
User name | Username of the reviewer
Contributions | Number of reviews posted by a particular user on TripAdvisor.com
Location | The reviewer's place of residence
Trip type | Categories: business; couples; family; couples, family getaway; friends getaway; solo travel
Comment | The body of the review
Value, rooms, location, cleanliness, service, and sleep quality | Numerical rating scores that guests gave to each of these categories, ranging from 1 (terrible) to 5 (excellent)
Date of stay | The date the reviewer stayed at the hotel
Visit type | Categories: business; hobbies/interest/culture; honeymoon; leisure; personal event; quality time with family; romantic getaway; other
Travelers |
Age group |
Member since | Refers to the reviewer's membership on TripAdvisor.com
Recommendation | "Yes" or "No"; represents the likelihood of recommending the hotel to others
Downloaded by [Universite Laval] at 21:57 24 September 2015
Internal Validity
In terms of internal validity, it is crucial for this research to divide the reviews correctly into positive and negative categories. The researchers assumed that customers' opinions about a hotel (expressed through hotel ratings and comments) are consistent with their recommendations to other travelers (Yes/No). To check the internal validity of the reviews, correlations between the rating scores and the recommendation scores were computed. The results are presented in Table 4. The analysis revealed strong, significant positive correlations for all rating variables except location, for which the correlation was medium. Next, 20 reviews were randomly selected for content analysis. The first two authors of this study read these reviews to verify that their content actually reflected the reviewer's intention to recommend or not recommend the hotel. The internal validity check was positive, so the reviews were divided into two categories based on customer recommendations.
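The validity check and the split described above can be sketched in code. The study reports correlations between rating scores and the Yes/No recommendation but does not say which coefficient was used. This sketch assumes the recommendation is coded as 1 (Yes) or 0 (No) and computes Pearson's r, which for a binary variable equals the point-biserial correlation. The function names and the four toy reviews are hypothetical and are not data from the study.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; when one variable is 0/1 this is the point-biserial r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def rating_recommendation_correlations(reviews: List[dict], categories: Sequence[str]) -> Dict[str, float]:
    """Correlate each rating category with the recommendation coded as 1 (Yes) / 0 (No)."""
    rec = [1.0 if r["recommendation"] == "Yes" else 0.0 for r in reviews]
    return {c: pearson([float(r[c]) for r in reviews], rec) for c in categories}


def split_by_recommendation(reviews: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Divide reviews into positive (Yes) and negative (No) groups."""
    positive = [r for r in reviews if r["recommendation"] == "Yes"]
    negative = [r for r in reviews if r["recommendation"] == "No"]
    return positive, negative


# Toy data for illustration only.
toy = [
    {"service": 5, "location": 4, "recommendation": "Yes"},
    {"service": 4, "location": 5, "recommendation": "Yes"},
    {"service": 2, "location": 4, "recommendation": "No"},
    {"service": 1, "location": 3, "recommendation": "No"},
]
print(rating_recommendation_correlations(toy, ["service", "location"]))
pos, neg = split_by_recommendation(toy)
print(len(pos), len(neg))  # 2 2
```

Significance testing, which the study also reports, is left out of this sketch. A statistics package would normally supply the p-values.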
Modeling and Word Categorization
A text-mining approach using PASW Modeler was applied to the Comment field of the data file to identify patterns in guest comments about the hotel. The Text Analytics module of PASW Modeler converts unstructured data into a more structured form by extracting concepts and relationships from textual information. The current research did not rely on the stance-shift analysis that considers syntax and
K. Berezina et al.
TABLE 2 Hotel reviews included in the study
Hotel name
Lido Beach Resort
The Ritz-Carlton, Sarasota
Helmsley Sandcastle Hotel
Hyatt Regency Sarasota
Holiday Inn Sarasota–Lido Beach
Southland Inn
Hotel Ranola
Hotel Indigo Sarasota
Holiday Inn Express Sarasota–Siesta Key Area
Best Western Midtown
La Quinta Inn & Suites Sarasota
Country Inn & Suites I-75
Hibiscus Suites Inn
Tropical Breeze Resort & Spa
Hyatt Place Sarasota/Bradenton Airport
AmericInn Hotel & Suites of Sarasota
Homewood Suites by Hilton Sarasota
Coquina On The Beach
Hilton Garden Inn Sarasota–Bradenton Airport
SpringHill Suites Sarasota Bradenton
Comfort Inn Sarasota
Sleep Inn
Golden Host Resort
Holiday Inn Sarasota–Airport
Residence Inn Sarasota Bradenton
Comfort Inn
Hampton Inn Sarasota–I-75 Bee Ridge
Suntide Island Beach Club
Holiday Inn Sarasota–Lakewood Ranch
Quality Inn & Suites Airport
Days Inn Sa...
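PASW Modeler's Text Analytics module is proprietary, and its linguistic concept extraction is not reproduced here. As a much simpler stand-in, the sketch below counts content words in the comments of positive and negative reviews so that the most frequent terms of each group can be compared. The tokenizer, the small stop-word list, and the sample comments are assumptions made for illustration. This is plain term-frequency counting, not the concept and relationship extraction used in the study.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# A deliberately small stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "to", "of", "in",
              "very", "but", "we", "our", "it"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_terms(comments: Iterable[str], k: int = 5) -> List[Tuple[str, int]]:
    """Return the k most frequent terms across a group of comments."""
    counts: Counter = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts.most_common(k)


# Hypothetical comments standing in for the two recommendation groups.
positive_comments = ["The staff was friendly and the beach was beautiful.",
                     "Friendly staff, clean room, great beach."]
negative_comments = ["The room was dirty and the staff was rude.",
                     "Dirty bathroom, noisy room."]

print(top_terms(positive_comments, 3))  # [('staff', 2), ('friendly', 2), ('beach', 2)]
print(top_terms(negative_comments, 3))  # [('room', 2), ('dirty', 2), ('staff', 1)]
```

Comparing the two lists shows which words set satisfied guests apart from dissatisfied ones. The study's word categorization pursues the same goal with a richer, linguistics-based extraction.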