Capturing Customer Heterogeneity using a Finite Mixture PLS Approach

Carsten Hahn, SAP AG, Neurottstraße, 69190 Walldorf, Germany
Michael D. Johnson, University of Michigan Business School, 701 Tappan Street, Ann Arbor, Michigan 48109-1234, USA
Andreas Herrmann, University of St. Gallen, MCM Institute, Blumenbergplatz 9, 9000 St. Gallen, Switzerland
Frank Huber, University of St. Gallen, MCM Institute, Blumenbergplatz 9, 9000 St. Gallen, Switzerland

Abstract
An approach for capturing unobserved customer heterogeneity in structural equation modeling is proposed based on partial least squares. The method uses a modified finite-mixture distribution approach. An empirical analysis using quality, customer satisfaction and loyalty data for convenience stores illustrates the advantages of the new method vis-a-vis a traditional market segmentation scheme based on well-known grouping variables. The results confirm the assumption of heterogeneity in the individuals’ perception of the antecedents and consequences of satisfaction and their relationships. The results also illustrate how the finite-mixture approach complements and provides insights over and above a traditional segmentation scheme.

1 Introduction
Understanding customers requires an understanding of segment-level differences or heterogeneity. One traditional approach for understanding heterogeneity is to use separate marketing research (interviews, focus groups and surveys) to identify a priori segments upon which subsequent research and analysis is based. A more recent trend in marketing research is to determine the segments when analyzing customer data using a latent class or finite mixture approach. Our particular interest is to better understand heterogeneity within structural equation modeling (SEM) in marketing, specifically in models that link customer perceptions of quality and price to customer satisfaction and loyalty.
For example, are customer satisfaction and loyalty for a convenience store driven primarily by “convenience” for some customers and “safety” for others? Two popular SEM methods for estimating such models are covariance structure analysis, or CSA (using programs such as LISREL), and partial least squares, or PLS [4]. There are occasions and contexts where researchers prefer to use PLS when estimating a quality, satisfaction and loyalty model. But while the finite mixture approach has been added to and applied with covariance structure analysis [5], it has not been integrated with PLS. The goal of this research is to integrate the advantages of PLS with the advantages of a finite mixture approach to market segmentation. The integration is unique because it leverages the advantages of a least-squares procedure when operationalizing a satisfaction model and the advantages of a maximum-likelihood-based approach when deriving market segments. We compare and contrast the approach with a more traditional a priori segmentation scheme using data from a national convenience store survey. The new approach both complements the traditional segmentation scheme and provides unique insights into the drivers of customer satisfaction. We begin by describing satisfaction modeling and approaches to estimating such models. We then describe our Finite Mixture PLS approach, our empirical study, and the results. The advantages and disadvantages of the approach vis-a-vis traditional market segmentation are then described and discussed.

2 Quality, satisfaction and loyalty modeling
Models that link customer perceptions of quality and price to satisfaction and subsequent loyalty, or satisfaction models, have become common applications of SEM in marketing.
Satisfaction models typically include the concrete attributes that describe a product or service, the benefits or consequences these attributes provide customers, a customer’s overall evaluation of their purchase and consumption experience (customer satisfaction), and the behavioral intentions or behaviors that result (such as repurchase, product recommendations or word-of-mouth, cross-selling, or price tolerance). These models rest heavily on expectancy-value model formulations, where beliefs about the consumption experience (quality dimensions and price) affect customer satisfaction as a type of overall evaluation or attitude, which in turn affects customers’ behavioral intentions and behaviors. A key feature of satisfaction models is that the benefit, satisfaction, and loyalty constructs in the models are inherently abstract or latent variables. The most common way to empirically measure these latent variables is through the use of multiple concrete proxies or measurement variables. Benefits are measured using their attributes, satisfaction is measured using different overall evaluation standards (such as overall satisfaction, overall performance versus expectations, overall performance versus an ideal), and loyalty is often measured using behavioral intentions (such as the likelihood of repurchase or recommendation to others). Statistical estimation of a satisfaction model must accommodate the fact that the model is a network of cause-and-effect relationships (as from quality, to satisfaction, to loyalty) that contains latent variables. There are two popular methods for estimating models of this type, partial least squares (PLS) and covariance structure analysis. The methods are more complementary than competing. Their use should depend on both the purpose of the analysis and the research context.
For example, because the aim of covariance structure analysis is to explain relationships, and it is based on maximum likelihood estimation, it is particularly well suited to evaluating the relative fit of competing theoretical models. Yet there are frequent occasions in marketing research when PLS is the preferred method. PLS is essentially an iterative estimation procedure that integrates principal-components analysis with multiple regression. Whereas CSA explains covariance, the objective of PLS is to explain variance in the endogenous variables in a satisfaction model that have bottom-line managerial relevance (satisfaction, loyalty, profit). Because the latent variables in PLS are easily operationalized as principal components or weighted indices of the measurement variables, they provide managers with explicit benchmarks for evaluating their performance. When this performance information is combined with the impact scores from the regression estimates, managers have both the impact and performance information that they need to make key resource allocation decisions. Bagozzi/Yi (1994) delineate three contextual factors that also influence the choice of method. They argue that PLS is preferred over CSA when: (1) sample sizes are small, (2) the data to be analyzed is not multivariate normal (as when distributions are highly skewed), and (3) improper or non-convergent results are likely (as when estimating a complex model with many variables and parameters). Consider that satisfaction models often use small samples, especially at the segment level. Quality and customer satisfaction data is also marked by large negative skewness. And satisfaction models are often large and complex, involving multiple abstract benefits and dozens of attributes. These arguments illustrate why researchers prefer PLS when operationalizing an existing structural model, such as a company’s existing customer satisfaction model. 
PLS is, for example, used to estimate all of the major national satisfaction index models. PLS also has its disadvantages. One is that PLS tends to underestimate path coefficients and overestimate loadings. As Bagozzi and Yi argue, however, this means that the significant results of a PLS analysis can be given more credence because the test is more conservative. Other limitations of PLS are that jackknife or bootstrap procedures are needed to obtain standard errors of the parameter estimates and, because PLS is a limited-information estimation method, its estimates are not as efficient as full-information estimates. Overall, however, there are clear reasons for integrating the advantages of PLS with the advantages of the finite mixture approach in a satisfaction context. When estimating structural equation models, researchers frequently treat data as if they were collected from a single population. This is unlikely to be the norm in customer satisfaction research. In multidimensional expectancy-value models, customers from different market segments can have very different belief structures. Thus the impact that different drivers have on satisfaction, and their level of performance, likely varies from segment to segment. Typically, heterogeneity in structural equation models has been addressed by assuming that consumers can be assigned to segments a priori on the basis of demographic variables, usage levels, or other proxies for the underlying segments. A limitation of the a priori approach is that heterogeneity is often not captured adequately by well-known observable variables. Jedidi/Jagpal/DeSarbo (1997a, 1997b) propose a new approach based on CSA in which heterogeneous groups are identified simultaneously with the structural equation model using a finite mixture framework. Arminger/Stein (1997) propose a more general method based on covariance structure estimates.
An alternative is to develop a hierarchical Bayesian methodology for treating heterogeneity in structural equation models. An important advantage of this methodology is that it automatically provides individual-specific estimates of model parameters and factor scores. This is valuable information for marketing managers who want to implement a relationship-marketing concept based on individual customer-to-supplier relationships. However, the researcher requires some meaningful a priori information about the parameters and more than one observation from at least some individuals. Again, our goal is to combine the advantages of PLS with the finite mixture approach. Figure 1 provides a taxonomy of methods that seek to capture heterogeneity in structural equation models and shows where our proposed approach fits in the taxonomy. Substantively, the approach allows the marketing manager to perform response-based market segmentation where all consumers or customers in a segment are homogeneous in terms of the model’s path coefficients. As we will show, the approach complements a priori segmentation by capturing heterogeneity within existing, well-known segments. Methodologically, the approach contributes to marketing research by allowing researchers to detect unobservable, discrete moderating factors that account for heterogeneity among consumers. It combines the advantages of predicting path coefficients, using PLS, with the maximum likelihood estimation of a finite mixture model. Conceptually, the approach extends a priori segmentation methods to prediction-oriented structural equation models. In contrast to the existing approaches of Jedidi et al. (1997a, 1997b, 1996) and Ansari et al. (2000), our approach is more management oriented, as we can consider both formative and reflective measures in our model. In addition, if all exogenous variables of the inner model are assumed to be formative and the endogenous variables reflective, the model provides a simulation environment.
3 The finite mixture partial least squares approach
Wold (1966) originally developed the PLS approach as an algorithm for least squares (LS) estimation of path models with latent variables. Each latent variable (LV) is indirectly observed by a block of manifest variables (MVs). PLS predicts the linear conditional expectation relationship between dependent and independent variables. As this approach is based on predictor specification, PLS is quite different from covariance structure analysis such as LISREL, which focuses on a causality concept based on accounting for covariances [22]. A path model with latent variables (structural equation model) consists of an inner model (inner relations, structural model, substantive part) and an outer model. The inner model depicts the relationships among the latent variables as posited by substantive theory. Let $i = 1, \dots, N$ index the subjects (observations, individuals); $\eta_i$ = the vector of the endogenous variables in the inner model for subject $i$; $\xi_i$ = the vector of the exogenous variables in the inner model for subject $i$. The inner relations can be expressed by:

$$B\eta_i + \Gamma\xi_i = \zeta_i \qquad (1)$$

where $B$ $(Q \times Q)$ and $\Gamma$ $(Q \times P)$ are path coefficient matrices, $Q$ is the number of endogenous variables, $P$ is the number of exogenous variables, and $\zeta_i$ is a random vector of residuals [23]. Outer relations define the relationships between the manifest variables (indicators) and the latent variables (components). Two kinds of outer relationships can be specified: reflective and formative. PLS allows for either type of relationship. Let $x_i$ = the vector of observed measures for the exogenous LVs for subject $i$; $y_i$ = the vector of observed measures for the endogenous LVs for subject $i$.
The outer relations for the reflective (outward) model can be expressed by:

$$y = \Lambda_y \eta + \varepsilon_y \qquad (2a)$$
$$x = \Lambda_x \xi + \varepsilon_x \qquad (2b)$$

where $\Lambda_y$ $(K \times Q)$ and $\Lambda_x$ $(L \times P)$ are the matrices of loadings that relate the latent variables to their measures, $K$ is the number of indicators for the endogenous variables, $L$ is the number of indicators for the exogenous variables, and the $\varepsilon$’s are residuals, usually interpreted as measurement errors or noise. In the formative case, the relationships between the LVs and their indicators are defined as:

$$\eta = \pi_\eta y + \delta_\eta \qquad (3a)$$
$$\xi = \pi_\xi x + \delta_\xi \qquad (3b)$$

where the $\pi$’s are multiple regression coefficients and the $\delta$’s are the residuals from the regressions. The usual PLS algorithm estimates $B$, $\Gamma$, the $\Lambda$’s and the $\pi$’s with an iterative scheme of partial least squares and calculates the scores of $\eta$ and $\xi$ for every individual. $\eta$ and $\xi$ are assumed to be multivariate normally distributed [24]. The result is an aggregate predictor specification based on the constraints of $B$ and $\Gamma$ for the whole sample. Conceptually, heterogeneity in a satisfaction model is concentrated in the path coefficients that relate quality factors and price to satisfaction and subsequent loyalty [25]. The proposed model is designed to capture this heterogeneity.
It assumes that $\eta_i$ is distributed as a finite mixture of conditional multivariate normal densities [26], $f_{i|k}(\cdot)$:

$$\eta_i \sim \sum_{k=1}^{K} \rho_k\, f_{i|k}(\eta_i \mid \xi_i, B_k, \Gamma_k, \psi_k) = \sum_{k=1}^{K} \rho_k \left[ \frac{|B_k|}{(2\pi)^{Q/2}\, |\psi_k|^{1/2}} \exp\!\left( -\tfrac{1}{2} (B_k \eta_i + \Gamma_k \xi_i)'\, \psi_k^{-1} (B_k \eta_i + \Gamma_k \xi_i) \right) \right] \qquad (4)$$

where $k = 1, \dots, K$ indexes the latent classes; $m = 1, \dots, Q$ the endogenous variables; $j = 1, \dots, P$ the exogenous variables; $B_k = ((\beta_{rmk}))$ is the $(Q \times Q)$ matrix of coefficients of the endogenous variables for latent class $k$ ($r = 1, \dots, Q$); $\Gamma_k = ((\gamma_{mjk}))$ is the $(Q \times P)$ matrix of coefficients of the exogenous variables for latent class $k$; $\psi_k$ is the $(Q \times Q)$ matrix with the variances of each regression of the inner model on the diagonal and zeros elsewhere; and $\rho = (\rho_1, \dots, \rho_K)$ is the vector of the $K$ mixing proportions of the finite mixture (of which $K - 1$ are independent), such that $\rho_k > 0$ and $\sum_{k=1}^{K} \rho_k = 1$. Assuming that the $\eta_i$ vectors are independent, the likelihood function for the $N$ vectors $(\eta_1, \dots, \eta_N)$ is given by:

$$L = \prod_{i=1}^{N} \left[ \sum_{k=1}^{K} \rho_k \frac{|B_k|}{(2\pi)^{Q/2}\, |\psi_k|^{1/2}} \exp\!\left( -\tfrac{1}{2} (B_k \eta_i + \Gamma_k \xi_i)'\, \psi_k^{-1} (B_k \eta_i + \Gamma_k \xi_i) \right) \right] \qquad (5)$$

The mixing proportions $\rho$ can be construed as prior probabilities of any subject belonging to the $K$ latent classes. The posterior probability of membership $\hat{P}_{ik}$ for subject $i$ in class $k$ can be computed using Bayes’ theorem, conditional on the estimates of the class-specific parameters $\hat{\rho}_k, \hat{B}_k, \hat{\Gamma}_k, \hat{\psi}_k$, via:

$$\hat{P}_{ik} = \frac{\hat{\rho}_k\, f_{i|k}(\eta_i \mid \xi_i, \hat{B}_k, \hat{\Gamma}_k, \hat{\psi}_k)}{\sum_{k'=1}^{K} \hat{\rho}_{k'}\, f_{i|k'}(\eta_i \mid \xi_i, \hat{B}_{k'}, \hat{\Gamma}_{k'}, \hat{\psi}_{k'})} \qquad (6)$$

3.1 Identification of the Model
Mixtures of multivariate normal densities are typically identified [27], but the model specified in Equation 4 is not identified if all elements in $\Gamma = (\Gamma_1, \dots, \Gamma_K)$, $B = (B_1, \dots, B_K)$ and $\psi = (\psi_1, \dots, \psi_K)$ are free. Identification in this context requires placing restrictions on model parameters. The most common restrictions set some elements of $\Gamma$, $B$, and $\psi$ to zero or some other constant, whereas others entail the imposition of equality or inequality constraints on parameters [28]. In our model only the diagonal elements of $\psi$ are free.
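The class-conditional density in Eq. (4) and the likelihood in Eq. (5) can be evaluated directly from the PLS scores. The NumPy sketch below is illustrative only: the function names are hypothetical, each $\psi_k$ is passed as the vector of its diagonal (the restriction of Section 3.1), and the log-likelihood is accumulated with a log-sum-exp to avoid numerical underflow. Because Eq. (1) is written as $B\eta + \Gamma\xi = \zeta$, a path such as $\eta_1 = \gamma\xi_1 + \zeta_1$ enters the sketch as $B = 1$ and $\Gamma = -\gamma$.

```python
import numpy as np

def class_log_density(eta, xi, B, Gamma, psi_diag):
    """ln f_{i|k} from Eq. (4) for every subject (hypothetical helper).

    eta: (N, Q) endogenous LV scores; xi: (N, P) exogenous LV scores;
    B: (Q, Q); Gamma: (Q, P); psi_diag: (Q,) diagonal of psi_k.
    """
    Q = eta.shape[1]
    zeta = eta @ B.T + xi @ Gamma.T            # row i is (B eta_i + Gamma xi_i)'
    _, logabsdet_B = np.linalg.slogdet(B)      # ln|B_k| (Jacobian term)
    quad = np.sum(zeta**2 / psi_diag, axis=1)  # zeta' psi^{-1} zeta for diagonal psi
    return (logabsdet_B - 0.5 * Q * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(psi_diag)) - 0.5 * quad)

def mixture_log_likelihood(eta, xi, rho, Bs, Gammas, psis):
    """ln L from Eq. (5), summing over classes with a stable log-sum-exp."""
    logs = np.column_stack([np.log(r) + class_log_density(eta, xi, B, G, p)
                            for r, B, G, p in zip(rho, Bs, Gammas, psis)])
    top = logs.max(axis=1, keepdims=True)
    return float(np.sum(top[:, 0] + np.log(np.exp(logs - top).sum(axis=1))))
```

Splitting a single class into two identical copies with proportions 0.5 each leaves $\ln L$ unchanged, which is a quick sanity check on the mixing-proportion arithmetic.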
The off-diagonal parameters are constrained to zero. The free parameters of $\Gamma$ and $B$ are determined by the specification of the inner model: only the parameters that correspond to paths in the inner model are free, whereas all other parameters are constrained to zero.

3.2 Estimation of the Model via the EM Algorithm
The likelihood of the model developed in the previous section can be maximized using the EM algorithm [29]. The algorithm consists of an expectation step (E-step) and a maximization step (M-step). Another optimization routine, such as Newton-Raphson or Fletcher-Reeves, could also be used to maximize the likelihood function, but convergence is not ensured with these two methods. The EM algorithm is attractive because it is easy to program and its convergence is ensured. The estimation procedure can be described as a two-stage process. In the first stage a PLS solution is estimated on the aggregate sample in order to obtain predicted scores of the latent variables, $\eta$ and $\xi$, for each respondent individually. In the second stage the predicted scores of the latent variables are used as dependent and independent variables in a set of regressions of the inner model, defined by the constraints of $B$ and $\Gamma$ respectively. Every endogenous variable is the regressand of an OLS regression, whereas the regressors come from a subset of the endogenous and exogenous variables. All regression equations are computed independently, in line with the PLS assumption. Consequently, the matrix $\psi_k$ for latent class $k$ is a diagonal matrix with the variances of the partial regressions on the diagonal. Our segmentation approach relaxes the second stage by implementing a finite mixture model with this set of regression equations. The modification of the M-step is described later. In order to present an EM formulation, we introduce nonobserved data via the indicator function $z_{ik} = 1$ if subject $i$ belongs to class $k$, and $z_{ik} = 0$ otherwise.
We assume that the nonobserved data in the vector $z_i = (z_{i1}, \dots, z_{iK})$ are independently and identically multinomially distributed with probabilities $\rho_k$. The joint likelihood of $\eta_i$ and $z_i$ is

$$L_i(\eta_i, z_i \mid \xi_i, B_k, \Gamma_k, \psi_k, \rho_k) = \prod_k \left[ \rho_k\, f(\eta_i \mid \xi_i, B_k, \Gamma_k, \psi_k) \right]^{z_{ik}} \qquad (7)$$

The complete likelihood over all subjects is

$$L = \prod_i \prod_k \left[ \rho_k\, f(\eta_i \mid \xi_i, B_k, \Gamma_k, \psi_k) \right]^{z_{ik}} \qquad (8)$$

and the log-likelihood is

$$\ln L = \sum_i \sum_k z_{ik} \ln f(\eta_i \mid \xi_i, B_k, \Gamma_k, \psi_k) + \sum_i \sum_k z_{ik} \ln \rho_k \qquad (9)$$

The matrix $Z = (z_1, \dots, z_N)$ is treated as missing data. The EM algorithm starts with an E-step, in which the expectation of $\ln L$ is evaluated over the conditional distribution of the nonobserved data $Z$, given the predicted values of $\eta_i$ and $\xi_i$ (obtained from the observed data $y_i$ and $x_i$) and the provisional estimates $B^*$, $\Gamma^*$, $\psi^*$ and $\rho^*$ of the parameters $B$, $\Gamma$, $\psi$ and $\rho$ respectively. The provisional estimates can be computed from randomly drawn membership probabilities $P_{ik}$, or set by the analyst based on assumptions and/or prior knowledge about the classes and the coefficients. The expectation of the log-likelihood function is

$$E(\ln L;\ \xi_i, \rho = \rho^*, B = B^*, \Gamma = \Gamma^*, \psi = \psi^* \mid \eta_i) = \sum_i \sum_k E(z_{ik};\ \xi_i, \rho^*, B^*, \Gamma^*, \psi^* \mid \eta_i) \ln f(\eta_i \mid \xi_i, B_k, \Gamma_k, \psi_k) + \sum_i \sum_k E(z_{ik};\ \xi_i, \rho^*, B^*, \Gamma^*, \psi^* \mid \eta_i) \ln \rho_k \qquad (10)$$

The conditional expectation of $z_{ik}$ can be calculated as

$$E(z_{ik};\ \xi_i, \rho = \rho^*, B = B^*, \Gamma = \Gamma^*, \psi = \psi^* \mid \eta_i) = \frac{\rho_k^*\, f(\eta_i \mid \xi_i, B_k^*, \Gamma_k^*, \psi_k^*)}{\sum_{k'} \rho_{k'}^*\, f(\eta_i \mid \xi_i, B_{k'}^*, \Gamma_{k'}^*, \psi_{k'}^*)} \qquad (11)$$

Comparing (11) with (6) reveals that the posterior membership probability $P^*_{ik}$ for subject $i$ in class $k$, evaluated with the provisional estimates, is

$$P^*_{ik} = E(z_{ik};\ \xi_i, \rho^*, B^*, \Gamma^*, \psi^* \mid \eta_i) \qquad (12)$$

The nonobserved data in matrix $Z$ are replaced by the posterior probabilities calculated on the basis of the provisional estimates. Thus equation (10) becomes

$$E(\ln L;\ \xi_i, \rho = \rho^*, B = B^*, \Gamma = \Gamma^*, \psi = \psi^* \mid \eta_i) = \sum_i \sum_k P^*_{ik} \ln f(\eta_i \mid \xi_i, B_k, \Gamma_k, \psi_k) + \sum_i \sum_k P^*_{ik} \ln \rho_k \qquad (13)$$
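The E-step of Eqs. (11) and (12) only needs the class-conditional log-densities and the provisional mixing proportions. The sketch below (function names hypothetical) normalizes in log space for numerical stability. The random start is one possible reading of "randomly drawn membership probabilities"; drawing each subject's row from a flat Dirichlet distribution is our assumption, not a detail given in the text.

```python
import numpy as np

def e_step(log_dens, rho):
    """Posterior membership probabilities P*_ik of Eqs. (11)-(12).

    log_dens: (N, K) array of ln f(eta_i | xi_i, B*_k, Gamma*_k, psi*_k)
    rho:      (K,) provisional mixing proportions rho*_k
    """
    log_num = log_dens + np.log(rho)               # ln(rho*_k f_ik)
    log_num -= log_num.max(axis=1, keepdims=True)  # guard against underflow
    num = np.exp(log_num)
    return num / num.sum(axis=1, keepdims=True)    # normalize over classes k

def random_memberships(n, k, seed=0):
    """Hypothetical random start: one Dirichlet(1, ..., 1) row per subject."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(k), size=n)
```

For two subjects with class densities (0.2, 0.2) and (0.1, 0.3) and equal mixing proportions, `e_step` returns posteriors (0.5, 0.5) and (0.25, 0.75).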
In the M-step we maximize equation (9), with each $z_{ik}$ replaced by its new provisional expectation $P^*_{ik}$, with respect to the parameters, subject to the restrictions $\rho_k > 0$ and $\sum_k \rho_k = 1$, in order to obtain revised parameter estimates. These revised estimates are then used in the subsequent E-step to calculate new estimates of $P_{ik}$. These are used as expectations of $z_{ik}$ in the next M-step to obtain new estimates of the parameters, and so forth. In our approach the M-step consists of a number of independent OLS regressions, one for each regression in the inner model. The regressions of the inner model relate each endogenous variable (as dependent variable) to the exogenous and endogenous variables that serve as its predictors (as independent variables). These relationships are defined via $B$ and $\Gamma$. Thus, for each endogenous variable, an OLS regression is computed in the M-step. We use the maximum likelihood estimators of the coefficients and the variance; for normally distributed residuals, the coefficient estimator is identical to the least-squares estimator. Let $m = 1, \dots, Q$ index the independent regressions of the inner model; $A_m$ = the number of exogenous variables used as regressors in regression $m$; $B_m$ = the number of endogenous variables used as regressors in regression $m$; $Y_{mi}$ = the value of the regressand of regression $m$ for individual $i$; and $X_{mi}$ = the $((A_m + B_m) \times 1)$ regressor vector of regression $m$ for individual $i$. We obtain the parameters of the regression for endogenous variable $m$ with

$$Y_{mi} = \eta_{mi}, \qquad X_{mi} = (E_{mi}, N_{mi})'$$

where

$$E_{mi} = \begin{cases} (\xi_{1i}, \dots, \xi_{A_m i}) & \text{if } A_m \ge 1, \text{ where } \xi_{a_m}\ (a_m = 1, \dots, A_m) \text{ are the exogenous regressors of regression } m, \\ () & \text{otherwise,} \end{cases}$$

$$N_{mi} = \begin{cases} (\eta_{1i}, \dots, \eta_{B_m i}) & \text{if } B_m \ge 1, \text{ where } \eta_{b_m}\ (b_m = 1, \dots, B_m) \text{ are the endogenous regressors of regression } m, \\ () & \text{otherwise,} \end{cases}$$

and the closed-form OLS analytic expressions for $\tau_{mk}$ and $\omega_{mk}$:

$$\tau_{mk} = \Big[ \sum_i P_{ik}\, X_{mi} X_{mi}' \Big]^{-1} \Big[ \sum_i P_{ik}\, X_{mi} Y_{mi} \Big] \qquad (14)$$

with $\tau_{mk} = \big( ((\gamma_{m a_m k})), ((\beta_{m b_m k})) \big)'$, and

$$\omega_{mk} = \frac{\sum_i P_{ik}\, (Y_{mi} - X_{mi}'\tau_{mk})^2}{N \rho_k} \qquad (15)$$

with $\omega_{mk}$ = cell $(m, m)$ of $\psi_k$.
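For one inner-model regression $m$ and one class $k$, Eqs. (14) and (15) amount to a weighted least-squares fit with the posterior probabilities as weights. The sketch below assumes, as the equations do, that there is no intercept term. It also includes the usual EM update $\rho_k = \frac{1}{N}\sum_i P_{ik}$, which the text implies but does not print; with that update, $N\rho_k = \sum_i P_{ik}$ is the denominator of Eq. (15). Function names are hypothetical.

```python
import numpy as np

def m_step_regression(Y, X, P_k):
    """Weighted OLS for inner-model regression m in class k (Eqs. 14-15).

    Y:   (N,) scores of the regressand eta_m
    X:   (N, A_m + B_m) scores of its exogenous and endogenous regressors
    P_k: (N,) posterior probabilities P_ik for class k
    """
    XtWX = X.T @ (P_k[:, None] * X)               # sum_i P_ik X_mi X_mi'
    XtWY = X.T @ (P_k * Y)                        # sum_i P_ik X_mi Y_mi
    tau = np.linalg.solve(XtWX, XtWY)             # Eq. (14) without an explicit inverse
    resid = Y - X @ tau
    omega = np.sum(P_k * resid**2) / np.sum(P_k)  # Eq. (15); N * rho_k = sum_i P_ik
    return tau, omega

def update_rho(P):
    """Mixing proportions rho_k = (1/N) sum_i P_ik (standard EM update)."""
    return P.mean(axis=0)
```

A subject with $P_{ik} = 0$ has no influence on class $k$, so an outlying case that belongs to another class does not distort this class's coefficients.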
The result of each independent regression serves as a new provisional estimate for the next E-step iteration of the EM algorithm. The E-step and the M-step are applied alternately until no further improvement in the log-likelihood function is possible according to a pre-specified convergence criterion. Although convergence to at least a local optimum is guaranteed, different starting values of the parameters must be used to investigate whether a solution is only a local optimum.

3.3 Model Selection
When applying the above model to data, the actual number of classes $K$ is unknown and must be inferred from the data. The problem of identifying the number of classes still lacks a satisfactory statistical solution [30]. The likelihood ratio test statistic, for example, is not valid in a mixture model because it is not asymptotically chi-square distributed. Bozdogan/Sclove (1984) propose using Akaike’s (1974) information criterion (AIC) for determining the number of classes in a mixture model:

$$AIC_K = -2 \ln L + c\, N_K \qquad (16)$$

where $c = 2$ is a constant and $N_K$ is the number of free parameters:

$$N_K = (K - 1) + KR + KQ \qquad (17)$$

$R$ is the number of predictor variables in all regressions of the inner model. The constant $c$ in AIC imposes a penalty on the likelihood that weighs the increase in fit (more parameters yield a higher likelihood) against the additional number of parameters estimated [31]. Two criteria penalize the likelihood more heavily: Schwarz’s (1978) Bayesian information criterion

$$BIC_K = -2 \ln L + (\ln I)\, N_K \qquad (18)$$

where $c = \ln I$, and the consistent Akaike information criterion

$$CAIC_K = -2 \ln L + (\ln I + 1)\, N_K \qquad (19)$$

where $c = \ln I + 1$; here $I$ denotes the number of subjects. All of these measures, however, are heuristics for model selection. To assess the separation of the segments, an entropy statistic [32] can be used to investigate the degree of separation in the estimated individual class probabilities:

$$EN_K = 1 - \frac{\sum_i \sum_k -P_{ik} \ln P_{ik}}{I \ln K} \qquad (20)$$

$EN_K$ is a relative measure and is bounded between 0 and 1.
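The model selection heuristics of Eqs. (16) to (19) and the entropy statistic $EN_K$ are simple functions of $\ln L$, the parameter count and the posterior matrix. The sketch below is illustrative only (hypothetical function names). It treats $0 \ln 0$ as 0 in the entropy, so that perfectly crisp memberships give $EN_K = 1$ and uniform memberships give $EN_K = 0$.

```python
import numpy as np

def n_free_params(K, R, Q):
    """Eq. (17): (K - 1) mixing proportions + K*R path coefficients + K*Q variances."""
    return (K - 1) + K * R + K * Q

def information_criteria(lnL, K, R, Q, I):
    """AIC, BIC and CAIC (Eqs. 16, 18, 19) for a K-class solution on I subjects."""
    n_k = n_free_params(K, R, Q)
    return {"AIC": -2 * lnL + 2 * n_k,
            "BIC": -2 * lnL + np.log(I) * n_k,
            "CAIC": -2 * lnL + (np.log(I) + 1) * n_k}

def entropy_en(P):
    """Relative entropy EN_K of an (I, K) posterior matrix; 1 = perfectly separated."""
    I, K = P.shape
    Pc = np.clip(P, 1e-300, 1.0)  # avoids ln(0); the P factor makes those terms 0
    return 1.0 - np.sum(-P * np.log(Pc)) / (I * np.log(K))
```

For a fixed $K$, all three criteria rank competing solutions by $\ln L$ alone; they differ only when comparing different numbers of classes.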
Values close to 1 indicate that the derived classes are well separated. In addition, the entropy measure indicates whether a solution is interpretable. For example, a solution with good heuristic values but a poor entropy measure, say $EN_K = 0$, cannot be interpreted accurately. The segments are then “fuzzy,” which means that subjects belong only partially to a class. The fuzziness of the derived class memberships makes the managerial implications equally “fuzzy.”

4 Empirical application
We illustrate the Finite Mixture PLS approach using a national survey of customers’ perceived quality, satisfaction and loyalty with convenience stores. The survey was sponsored by the National Association of Convenience Stores (NACS) and based on a representative cross-section of convenience store customers and stores in the United States. The interview methodology (computer-aided telephone interviews) and random sampling procedure were the same as those used for the American Customer Satisfaction Index (ACSI) survey. The data were collected in December 1998, and the sample included 1,025 customers who were selected to be representative of the demographic profile of convenience store consumers. In terms of demographics, 42.4% were male and 57.6% were female. The age range was from 18 to 81, with a median age of 37. The sample was also broadly distributed across income and education levels. The NACS satisfaction model is presented in Figure 2. The figure shows nine latent variables, benefits or consequences that are immediate antecedents of satisfaction in the model. Satisfaction, in turn, affects customer loyalty; both are also latent variables in the model. Each latent variable is operationalized using multiple proxies or survey measures rated on 10-point scales (see Table 1). The satisfaction measures are the same as those used in the ACSI survey, while the loyalty measures are rated likelihoods of revisiting the convenience store and recommending it to others.
A correlation matrix of the measurement variables is shown in Table 2. The measures of satisfaction and loyalty are assumed to be reflective, whereas the remaining measures are assumed to be formative. This follows the assumptions of the ACSI [34].

4.1 Aggregate results
The aggregate PLS results are shown in Table 3. The inner-model estimates from PLS are identical to those of our new approach with K = 1. According to the aggregate path coefficients, the largest drivers of satisfaction are perceived safety (0.193), store layout (0.174), prices (0.168) and separate take out (0.152). The smallest drivers of satisfaction are products (0.039) and motorist services (0.039). This suggests that, on the whole, both products and gas services are relatively undifferentiated across convenience stores. Satisfaction has a large and significant impact on loyalty in the model (0.625). The aggregate results for K = 1 provide important benchmark values for the goodness-of-fit measures AIC, BIC and CAIC, which are shown in the first row of Table 4. The next section discusses the parameter constraints and the results for K greater than one.

4.2 Disaggregate finite mixture results
We applied our new approach for a varying number of classes K. The impacts should be greater than or equal to zero (for both the aggregate and disaggregate models), as all of the survey questions are valenced in the same direction (higher values are more attractive, such as higher quality or more attractive prices). If we assume that all drivers of satisfaction are independent of each other, the impacts can be interpreted as the increase in satisfaction that results from an increase in any particular price or quality driver. Our initial disaggregate solutions contained some small, negative coefficients for the drivers. Under the assumption of independence, such coefficients would be difficult to interpret.
For example, a negative impact of service on satisfaction would mean that higher-quality service lowers overall satisfaction. To prevent such non-interpretable solutions, we constrained our coefficients to be greater than or equal to zero. Consequently, we obtain only local but interpretable optima for our coefficients. In addition, we used the constrained solution with the minimum values of AIC, BIC and CAIC. Table 4 shows the goodness-of-fit statistics for model selection (log-likelihood (LnL), AIC, BIC, CAIC) and the entropy measure (EN) described above for K = 1, 2, 3, 4, 5 and 6. In our application the model selection heuristics do not contradict one another; if they did, the more conservative BIC should be preferred. In the last section we mentioned the problems associated with locally optimal solutions. Therefore, we estimated each class option (2 through 6) ten times with different random starting values to increase the chance of finding a global maximum. For K = 2 the algorithm always found the same solution. For K = 3 we obtained different solutions that were close together, but the solution with the values shown in Table 4 was the one with the best goodness-of-fit measures (the smallest values of AIC, BIC and CAIC). The solutions for K = 4, K = 5 and K = 6 showed greater variance. This shows that finding a global maximum becomes more difficult as the number of classes K increases. One reason for this phenomenon is the high dimensionality of the solution space in which the local and global maxima lie. The success of iterative processes like the EM algorithm depends on a good (plausible) set of starting values. As we do not know any good (or even plausible) sets of starting values, we have to increase the number of alternative solutions. Therefore, for K = 4, K = 5 and K = 6 we ran the approach twenty instead of ten times. Table 4 shows the solution with the minimum AIC, BIC and CAIC for K = 4, K = 5 and K = 6. The R2-value of the aggregate version is 0.63.
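The text does not state how the non-negativity constraint was imposed inside the algorithm. One possible sketch, assuming the constraint is enforced in each M-step regression, replaces the weighted OLS of Eq. (14) by weighted non-negative least squares via SciPy's `scipy.optimize.nnls`, scaling the rows by $\sqrt{P_{ik}}$ so that the weighted sum of squares is minimized. The multi-start loop wraps a hypothetical `run_em(rng)` callable that is assumed to return `(lnL, solution)`; it is not the authors' code.

```python
import numpy as np
from scipy.optimize import nnls

def m_step_nonneg(Y, X, P_k):
    """Weighted least squares with tau >= 0 (one possible constrained M-step)."""
    w = np.sqrt(P_k)
    tau, _ = nnls(w[:, None] * X, w * Y)  # min sum_i P_ik (Y_i - X_i' tau)^2, tau >= 0
    resid = Y - X @ tau
    omega = np.sum(P_k * resid**2) / np.sum(P_k)
    return tau, omega

def best_of_starts(run_em, n_starts, seed=0):
    """Run a hypothetical EM routine from several random starts; keep the highest ln L.

    For a fixed K, the highest ln L also gives the smallest AIC, BIC and CAIC.
    """
    rng = np.random.default_rng(seed)
    results = [run_em(rng) for _ in range(n_starts)]
    return max(results, key=lambda r: r[0])
```

Following the procedure described above, `n_starts` would be 10 for K = 2 and 3 and 20 for K = 4 to 6.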
The R2-value of the K = 5 class solution is 0.88. This indicates that the explained variance has been substantially improved by going from one to five segments. It is important to emphasize that the approach can be used in either an exploratory or a confirmatory fashion. If the researcher has a priori information about the true values of the model parameters, he or she can integrate this information when setting the starting values. If the algorithm finds a solution that corresponds to or is very similar to the starting values, this suggests that the prior information was a good starting point. In contrast, if the researcher wants to find new information about his or her model, a number of different starting values are used to reduce the risk that the result is only a local optimum in the high-dimensional solution space. We focus on the 5-class solution, for which the AIC, BIC and CAIC measures are minimized. This solution also has the best entropy measure among the K = 2 through 6 solutions (EN = 0.43). Note that the selection of the most interpretable solution is the same as for the unconstrained case mentioned earlier: the 5-class solution. Table 5 presents the path coefficients (impacts) for each of the 5 classes, where each class represents a relatively homogeneous group of customers. Going forward, we refer to these classes as market segments. Segments one through five comprise 10.7%, 36.8%, 17.2%, 27.7% and 7.6% of the overall survey population, respectively. For segment one, satisfaction is almost synonymous with safety, which has an impact of 0.984. The next highest and only other significant driver is cleanliness, with an impact of 0.291. Segment two, the largest segment, is more balanced in that service, prices, cleanliness, convenience and safety all have significant impacts (ranging from 0.185 to 0.260). This segment also shows the largest impact of satisfaction on loyalty (0.863).
Segment three is quite different from either of the previous two segments, as store layout and separate take out are the main drivers, with impacts of 0.495 and 0.485. Segment four, the second largest overall, is the most price-sensitive segment; here the impact of price is 0.210. Store layout and separate take out also have significant impacts for this segment. Segment five is marked by the importance placed on store layout and convenience, with impacts of 0.595 and 0.312 respectively. These shoppers want to find what they need and get in and out of the store quickly. A membership probability is calculated for each customer in each segment. The entropy measure EN = .43 for K = 5 gives an aggregate indication of how strongly customers belong to one particular segment. However, the entropy measure says nothing about what this means for each segment. For example, a customer can belong to four segments with a membership probability of, say, .10 each and to a fifth segment with .60; alternatively, a customer can belong to each of the five segments with a membership probability of .20. In addition, the differences in the entropy statistic between the solutions are small. A more detailed investigation of the membership probabilities of the five-segment solution is therefore useful. Table 6 shows the number of customers who belong to a segment with a membership probability higher than .80, .60, .50 and .40 respectively. In our application only 130 customers belong to one segment with a membership probability higher than .80, which is about 13% of the whole sample. Ideally, membership probabilities should point as uniquely as possible to one specific segment, so that the probability is near 1. In reality, however, the lower membership probabilities illustrate the complexity of measuring response-based variables. Table 6 shows that 686 customers out of 1,025 belong to one segment with a probability higher than .50.
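A Table 6-style breakdown and the subject-level counterpart of the entropy statistic can both be read off the posterior matrix. The sketch below uses hypothetical function names. `membership_counts` counts, per segment, the subjects whose posterior exceeds each threshold; `subject_entropy` scores one subject's membership vector on the same 0 to 1 scale as $EN_K$, and averaging it over all subjects reproduces $EN_K$.

```python
import numpy as np

def membership_counts(P, thresholds=(0.80, 0.60, 0.50, 0.40)):
    """For each threshold, the number of subjects per segment with P_ik above it."""
    return {t: (P > t).sum(axis=0) for t in thresholds}

def subject_entropy(p):
    """Normalized entropy 1 - H(p)/ln K of one membership vector (1 = crisp)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # treat 0 * ln 0 as 0
    return 1.0 - float(-(nz * np.log(nz)).sum()) / np.log(len(p))
```

Applied to the two customers in the example above, (.60, .10, .10, .10, .10) scores higher than (.20, .20, .20, .20, .20), which scores 0, even though neither customer clears a .80 threshold for any segment.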
With 686 of the 1,025 customers assigned to one segment with a probability above .50, our 5-segment solution is a fairly good approximation for grouping 1,025 different individuals into 5 segments.

4.3 Post hoc analyses of the segments

To augment our interpretation of the segment-level results, we conducted post hoc analyses of the posterior probabilities of membership based on a model from Ramaswamy et al. (1993):

$$Q_{ik} = \sum_{u} Z_{iu}\,\delta_{uk} + v_{ik}, \qquad (21)$$

with $Q_{ik} = \ln(P_{ik}/P_i)$, where $P_i = \bigl(\prod_{k} P_{ik}\bigr)^{1/K}$ is the geometric mean of the posterior probabilities, $Z_{iu}$ is the value of descriptive variable $u$ for individual $i$, $\delta_{uk}$ is the impact coefficient of variable $u$ for segment $k$, and $v_{ik}$ is a normally distributed random disturbance.

The descriptive variables in our study, collected as part of the convenience store survey, are: gender (male/female), age (in years), number of household members (5 categories), user frequency (daily, weekly, occasional user), education (3 categories), income (3 categories), 7-11 store user (yes/no) and neighborhood store user (yes/no). The 7-11 brand was by far the most frequently rated convenience store in the sample, hence its use as a descriptive variable. Also common were neighborhood store users, who, when asked “At which convenience store do you shop most often?”, named a store that is not part of a franchise system. This variable picks up the unique nature of “Mom and Pop” stores (typically family owned), which make up a large proportion of the industry. Table 7 shows the impact coefficients from the post hoc analysis of our 5-segment solution. Overall, there are relatively few significant descriptors for the five segments. One exception is gender, which is significantly related to segments two through five. For segment one, the safety-conscious segment, household size is the largest descriptor: the larger the family, the greater the concern over safety. This is plausible, as larger families have more children who run errands or meet friends at convenience stores.
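Equation (21) can be estimated by ordinary least squares once the posterior probabilities are available. The sketch below is a minimal illustration, assuming NumPy, a matrix `post` of posterior probabilities (one row per customer, one column per segment, all entries strictly positive) and a design matrix `z` of descriptive variables that includes a constant column. These names are hypothetical, and significance tests for the δ coefficients are omitted. Because each row of Q is centred on the log of the geometric mean, the estimated coefficients for any descriptor sum to zero across segments.

```python
import numpy as np


def centred_log_posteriors(post):
    """Q_ik = ln(P_ik / P_i), where P_i is the geometric mean of row i of post."""
    log_p = np.log(post)
    return log_p - log_p.mean(axis=1, keepdims=True)


def posthoc_coefficients(post, z):
    """OLS estimate of delta in Q = Z delta + V, one equation per segment.

    post: (I, K) posterior membership probabilities
    z:    (I, U) descriptive variables (include a column of ones for an intercept)
    returns delta with shape (U, K), one column per segment
    """
    q = centred_log_posteriors(post)
    delta, *_ = np.linalg.lstsq(z, q, rcond=None)
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical descriptors: intercept, a 0/1 dummy (e.g. gender), a numeric variable.
    z = np.column_stack([np.ones(200), rng.integers(0, 2, size=200), rng.normal(size=200)])
    post = rng.dirichlet(np.ones(5), size=200)  # hypothetical posteriors for K = 5
    delta = posthoc_coefficients(post, z)
    print(delta.shape)  # (3, 5)
    print(np.round(delta.sum(axis=1), 10))  # approximately 0 for every descriptor
```

Since every segment equation shares the same regressors, solving all K columns in one `lstsq` call is equivalent to running K separate OLS regressions.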
Segment two, which had the most significant drivers of satisfaction (dominated by cleanliness), consists primarily of women who visited 7-11 stores. Segment three shoppers, for whom store layout and separate take-out food are the dominant drivers, are primarily women who were not weekly shoppers. Segment four, the price-conscious segment, is distinguished only by a higher proportion of women. In contrast, the store layout and convenience segment (the “get me in and out quickly” segment) is predominantly male.

Clearly, our analysis demonstrates that the results of an aggregate satisfaction model can be very misleading. Aggregate analysis hides meaningful subsets of customers who are more homogeneous in their satisfaction drivers. While some customers are concerned predominantly with safety, other customers' satisfaction is driven by convenience or price. Our post hoc analysis also makes clear that the segments cannot be identified well using simple descriptive variables. This is natural, as segments exist not at the level of descriptive variables but at the level of benefits, consequences and needs35. Yet marketing managers often require such variables to derive implications for marketing action. Gender is clearly one variable that helps to differentiate at least one segment. User frequency could be another. Managers in the convenience store industry pay particular attention to daily, weekly and occasional users and how their needs differ. In the next section, we use these a priori groupings to segment customers before the customer satisfaction model is estimated.

4.4 Disaggregate PLS results: A priori segmentation

Table 8 shows the PLS results for the a priori segmentation based on the daily, weekly and occasional user segments (n = 265, 300 and 436, respectively). The solutions show that each segment has five to seven significant drivers of satisfaction, none of which has a particularly large impact.
Some notable differences are the increased importance of separate take-out for daily users (who likely obtain more of their meals from the stores), the importance of service to weekly customers, and the importance of products to occasional users. But the pattern of results for each segment is similar to what we found for the aggregate sample. The finite mixture-based segments show much more pronounced differences in satisfaction drivers across segments. This suggests that, while the user frequency groups are homogeneous with respect to usage, they are still quite heterogeneous in their satisfaction drivers. The solution for each group is still an aggregate of different coefficients for the drivers of satisfaction.

4.5 Disaggregate PLS results: A priori and Finite Mixture PLS segmentation

To illustrate this heterogeneity and show how the Finite Mixture PLS approach adds insight to an existing a priori segmentation scheme, we applied the new approach to the daily users segment. This “heavy user” group is of obvious importance to convenience stores and a major focus of their marketing activity. But not all daily users are necessarily looking for the same things from their convenience store. When we applied our new approach to the daily users, a two-segment solution emerged based on the ln-likelihood and minimal values of the AIC, BIC and CAIC statistics (entropy = 0.60). The results are shown in Table 9. Satisfaction for segment 1 (k = 1, 22.8% of the sample) is driven predominantly by store layout and separate take-out food, followed by motorist services. These “daily shoppers” fill their grocery baskets, stomachs and vehicles at their local convenience stores. They also appreciate high-quality service. In contrast, segment 2 customers (k = 2, 77.2% of the sample) are more sensitive to safety, prices and cleanliness. These “daily stoppers” seem to stop to get just what they need.
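Why an a priori group such as the daily users can still yield an aggregate of different coefficients can be shown with a toy calculation. The sketch below is a hypothetical illustration, not part of the study: it uses a single standardized driver, ordinary least squares without an intercept in place of PLS, noiseless responses, and made-up segment impacts of 0.8 and 0.2. The function names are illustrative only.

```python
def slope_through_origin(xs, ys):
    """OLS slope of y on x without an intercept: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def pooled_group_slope(beta_a, beta_b, copies_a, copies_b,
                       driver=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Impact estimated for one a priori group that mixes two latent segments.

    Each segment contributes `copies` blocks of the same hypothetical driver
    scores, with noiseless responses y = beta * x using its own beta.
    """
    xs, ys = [], []
    for beta, copies in ((beta_a, copies_a), (beta_b, copies_b)):
        for _ in range(copies):
            xs.extend(driver)
            ys.extend(beta * x for x in driver)
    return slope_through_origin(xs, ys)


if __name__ == "__main__":
    # 75% of the group behaves like segment A, 25% like segment B.
    print(round(pooled_group_slope(0.8, 0.2, copies_a=3, copies_b=1), 4))  # 0.65
```

The pooled value 0.65 = 0.75 × 0.8 + 0.25 × 0.2 lies between the two segment impacts and describes neither segment. If the driver distributions differed across segments, the weights would become each segment's share of the driver's sum of squares, but the estimate would still be a blend of segment-level impacts.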
Satisfaction also has a much stronger effect on loyalty for the segment 1 “shoppers” (0.823) than for the segment 2 “stoppers” (0.460). When we applied the post hoc analysis described above to the two-segment solution, one main difference emerged: segment 1 customers are significantly more likely to shop at 7-11 stores.

5 Discussion and conclusions

An emerging solution for capturing heterogeneity in market response is to use a latent class approach such as a finite mixture model. But whereas latent class methods are based on maximum likelihood estimation, operationalizing a satisfaction model often requires a least squares-based procedure. As an SEM methodology, PLS (partial least squares) is particularly well suited to estimating and operationalizing satisfaction models in practice. PLS accommodates the skewed data and small sample sizes common in satisfaction research and, compared with other techniques, is less prone to non-convergent or improper solutions. For managers, the performance scores and impacts that emerge from PLS analysis provide the diagnostic and benchmark information required to set priorities for improvement. The goal of this article has been to merge the advantages of least squares estimation, when estimating a satisfaction model, with the advantages of maximum likelihood estimation, when deriving market segments. Our Finite Mixture PLS approach is designed to capture heterogeneity in structural equation models that link quality and price drivers to satisfaction and subsequent loyalty. It derives segments empirically and directly estimates the model relationships within them. The advantage of the approach compared with an a priori segmentation scheme is that the derived segments are homogeneous in terms of model relationships. The approach estimates both the segment proportions and the degree to which individual customers belong to particular segments, and the results can be assessed statistically with goodness-of-fit measures.
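The idea of deriving segments and estimating segment-specific relationships at the same time can be illustrated with the EM algorithm (Dempster et al. 1977) for a finite mixture of regressions. The sketch below is a simplified stand-in, not the authors' Finite Mixture PLS implementation: it assumes that latent variable scores have already been computed (for example, by an aggregate PLS run), models only one endogenous construct with normally distributed errors, and uses hypothetical function and variable names. Each iteration re-estimates the mixing proportions and the segment-specific impacts by weighted least squares, then updates every customer's posterior membership probabilities.

```python
import numpy as np


def fit_mixture_regression(x, y, k, n_iter=200, seed=0):
    """EM for y_i = x_i' beta_k + e_ik, e_ik ~ N(0, sigma_k^2), mixing weights pi_k.

    x: (I, P) predictor scores incl. a column of ones; y: (I,) endogenous scores.
    Returns mixing proportions, (k, P) coefficients, (I, k) posteriors and
    the log-likelihood of the final parameter estimates.
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    resp = rng.dirichlet(np.ones(k), size=n)  # random start for the posteriors
    for _ in range(n_iter):
        # M-step: segment sizes and weighted least squares within each segment.
        pi = resp.mean(axis=0)
        betas = np.empty((k, p))
        sigmas = np.empty(k)
        for j in range(k):
            w = resp[:, j]
            xw = x * w[:, None]
            betas[j] = np.linalg.solve(x.T @ xw, xw.T @ y)
            resid = y - x @ betas[j]
            sigmas[j] = np.sqrt((w * resid**2).sum() / w.sum())
        # E-step: log of pi_k times the normal density, per customer and segment.
        resid = y[:, None] - x @ betas.T
        log_dens = (np.log(pi) - np.log(sigmas) - 0.5 * np.log(2 * np.pi)
                    - 0.5 * (resid / sigmas) ** 2)
        # Log-sum-exp over segments for numerically stable normalization.
        top = log_dens.max(axis=1, keepdims=True)
        log_norm = top + np.log(np.exp(log_dens - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_dens - log_norm)
    return pi, betas, resp, float(log_norm.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x1 = rng.normal(size=400)
    x = np.column_stack([np.ones(400), x1])
    # Hypothetical data: 240 customers with impact 1.5, 160 with impact -1.0.
    slope = np.where(np.arange(400) < 240, 1.5, -1.0)
    y = slope * x1 + rng.normal(scale=0.1, size=400)
    pi, betas, resp, log_lik = fit_mixture_regression(x, y, k=2)
    print(np.round(pi, 2), np.round(betas[:, 1], 2))
```

In practice the fit would be repeated from several seeds and for several values of K, keeping the start with the highest log-likelihood and choosing K by the information criteria. Segment labels are only identified up to relabeling.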
The proposed Finite Mixture PLS model thus extends the existing Partial Least Squares approach to one of the central issues in marketing theory and practice: segmentation. When we apply the Finite Mixture PLS analysis to a national survey of quality, satisfaction and loyalty for convenience store customers, it reveals significant heterogeneity. Our five-segment solution identifies clear differences among customers who value, for example, safety, separate take-out and store layout, or prices. Another interesting observation is that when we conduct a post hoc analysis relating descriptive variables to the segments, relatively few significant predictors emerge. Exceptions include gender, household size and frequency of usage. This finding is consistent with the prevailing view in marketing that segments exist at the level of benefits, consequences and needs, while descriptive variables such as age, gender and frequency of use may be weak proxies36. To illustrate the problem, we analyzed a prominent a priori segment, daily users, using the Finite Mixture PLS approach. The results again reveal clear differences in both the drivers of satisfaction and the effect of satisfaction on loyalty. Whereas “daily shoppers” value store layout, separate take-out food and motorist services, “daily stoppers” value safety, prices and cleanliness. Our findings reinforce an underlying premise in marketing that is often lost in practice, particularly in the measurement and management of customer satisfaction. Satisfaction studies often rely on concrete, descriptive attributes of the product, service and customer segment. According to our findings, and in line with means-end theory, customers do not purchase a package of attributes but rather a complex of benefits or even a set of values. Moreover, the benefit segments themselves are not easily described using traditional demographic variables.
Applied satisfaction models should strive to capture both the abstract nature of satisfaction drivers and satisfaction-based market segments. Finite mixture-based segments built upon a latent variable modeling approach such as PLS can go a long way toward explaining variance in satisfaction judgments. They also help companies draw more reasonable conclusions than those based on descriptive variables alone, such as frequency of usage. One limitation of the proposed approach (mentioned in Footnote 4) is that it does not consider interaction effects in the inner model. In addition, following the standard assumptions of the PLS approach, we assume that the regressions of the inner model are independent. Future research should address these aspects and use large-scale simulation studies to test the Finite Mixture PLS method in different marketing applications where heterogeneity is present. Another avenue for further research is a more thorough identification of market segments. The post hoc analysis sheds relatively little light on demographically identifiable segments, which limits the practical usefulness of these segmentation results: management cannot readily identify which customers belong to each market segment. One reviewer suggested a concomitant variable approach that reparameterizes the mixing proportions as direct functions of the demographics. A formal comparison could reveal which of the two models fits best.

References

Akaike, Hirotugu (1974), A new look at the statistical model identification, in: IEEE Transactions on Automatic Control, Vol. 19, pp. 716-723.
Ansari, Asim/Jedidi, Kamel/Jagpal, Harsharan S. (2000), A hierarchical Bayesian methodology for treating heterogeneity in structural equation models, in: Marketing Science, Vol. 19, pp. 328-347.
Arminger, Gerhard/Stein, Petra (1997), Finite mixtures of covariance structure models with regressors, in: Sociological Methods & Research, Vol. 26, pp. 148-182.
Bagozzi, Richard P. (1982), A field investigation of causal relations among cognitions, affects, intentions, and behavior, in: Journal of Marketing Research, Vol. 19, pp. 562-584.
Bagozzi, Richard P. (1994), Structural equation models in marketing research: Basic principles, in: Bagozzi, Richard P. (ed.), Principles of Marketing Research, pp. 317-385.
Bagozzi, Richard P./Yi, Y. (1994), Advanced topics in structural equation models, in: Bagozzi, Richard P. (ed.), Advanced Methods of Marketing Research, pp. 1-52.
Best, Roger J. (2000), Market-based management: Strategies for growing customer value and profitability.
Bozdogan, Hamparsum/Sclove, Stanley L. (1984), Multi-sample cluster analysis using Akaike's information criterion, in: Annals of the Institute of Statistical Mathematics, Vol. 36, pp. 163-180.
Brusco, Michael J./Cradit, J. Dennis/Stahl, Stephanie (2002), A simulated annealing heuristic for a bicriterion partitioning problem in market segmentation, in: Journal of Marketing Research, Vol. 39, pp. 99-109.
Dempster, Arthur P./Laird, Nan M./Rubin, Donald B. (1977), Maximum likelihood from incomplete data via the EM algorithm, in: Journal of the Royal Statistical Society: Series B, Vol. 39, pp. 1-38.
Diamantopoulos, Adamantios/Winklhofer, Heidi M. (2001), Index construction with formative indicators: An alternative to scale development, in: Journal of Marketing Research, Vol. 38, pp. 269-277.
Dillon, William R./White, John B./Rao, Vithala R./Filak, Dong (1997), Good science: Use structural equation models to decipher complex customer relationships, in: Marketing Research, Vol. 9, pp. 22-31.
Fornell, Claes (1987), A second generation of multivariate analysis: Classification of methods and implications for marketing research, in: Houston, Michael J. (ed.), Review of Marketing 1987.
Fornell, Claes (1995), The quality of economic output: Empirical generalizations about its distribution and association to market share, in: Marketing Science, Vol. 14, pp. G203-G211.
Fornell, Claes/Bookstein, Fred L. (1982), Two structural equation models: LISREL and PLS applied to consumer exit-voice theory, in: Journal of Marketing Research, Vol. 19, pp. 440-452.
Fornell, Claes/Cha, Joe (1994), Partial least squares, in: Bagozzi, Richard P. (ed.), Advanced Methods of Marketing Research, pp. 52-78.
Fornell, Claes/Johnson, Michael D./Anderson, Eugene W./Cha, Joe/Bryant, Barbara E. (1996), The American customer satisfaction index: Nature, purpose and findings, in: Journal of Marketing, Vol. 60, pp. 7-18.
Gustafsson, Anders/Johnson, Michael D. (1997), Bridging the quality-satisfaction gap, in: Quality Management Journal, Vol. 4, pp. 27-43.
Hahn, Carsten H. (2002), Segmentspezifische Kundenzufriedenheitsanalyse.
Jedidi, Kamel/Jagpal, Harsharan S./DeSarbo, Wayne S. (1997a), Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity, in: Marketing Science, Vol. 16, pp. 39-59.
Jedidi, Kamel/Jagpal, Harsharan S./DeSarbo, Wayne S. (1997b), STEMM: A general finite mixture structural equation model, in: Journal of Classification, Vol. 14, pp. 23-50.
Jedidi, Kamel/Ramaswamy, Venkatram/DeSarbo, Wayne S./Wedel, Michel (1996), On estimating finite mixtures of multivariate regression and simultaneous equation models, in: Structural Equation Modeling, Vol. 3, pp. 266-289.
Johnson, Michael D./Gustafsson, Anders (2000), Improving customer satisfaction, loyalty and profit: An integrated measurement and management system.
Johnson, Michael D./Gustafsson, Anders/Andreassen, Tor W./Lervik, Line/Cha, Joe (2001), The evolution and future of national customer satisfaction index models, in: Journal of Economic Psychology, Vol. 22, pp. 217-245.
Jöreskog, Karl G. (1977), Structural equation models in the social sciences: Specification, estimation, and testing, in: Krishnaiah, Paruchuri R. (ed.), Applications of Statistics, pp. 265-287.
Kamakura, Wagner A./Russell, Gary (1989), A probabilistic choice model for market segmentation and elasticity structure, in: Journal of Marketing Research, Vol. 26, pp. 379-390.
Kamakura, Wagner A./Wedel, Michel/Agrawal, John (1994), Concomitant variable latent class models for conjoint analysis, in: International Journal of Research in Marketing, Vol. 11, pp. 451-464.
Martilla, John A./James, John C. (1977), Importance-performance analysis, in: Journal of Marketing, Vol. 41, pp. 77-79.
McLachlan, Geoffrey J./Basford, Kaye E. (1988), Mixture models: Inference and applications to clustering.
McLachlan, Geoffrey J./Krishnan, Triyan (1997), The EM algorithm and extensions.
Muthén, Bengt O. (1989), Latent variable modeling in heterogeneous populations, in: Psychometrika, Vol. 54, pp. 557-585.
Ramaswamy, Venkatram/DeSarbo, Wayne S./Reibstein, David J./Robinson, William T. (1993), An empirical pooling approach for estimating marketing mix elasticities with PIMS data, in: Marketing Science, Vol. 12, pp. 103-124.
Schwarz, Gideon (1978), Estimating the dimension of a model, in: Annals of Statistics, Vol. 6, pp. 461-464.
Steenkamp, Jan-Benedict E. M./Baumgartner, Hans (2000), On the use of structural equation models for marketing modeling, in: International Journal of Research in Marketing, Vol. 17, pp. 195-202.
Steenkamp, Jan-Benedict E. M./van Trijp, Hans C. M. (1996), Quality guidance: A consumer-based approach to food quality improvement using partial least squares, in: European Review of Agricultural Economics, Vol. 23, pp. 195-215.
Wedel, Michel/Kamakura, Wagner A. (1999), Market segmentation: Conceptual and methodological foundations, Second Edition.
White, Michael E. (1997), Customer satisfaction for the Ann Arbor Soccer Referee Association, Working Paper, University of Michigan Business School.
Wold, Herman (1966), Estimation of principal components and related models by iterative least squares, in: Krishnaiah, Paruchuri R. (ed.), Multivariate analysis: Proceedings of an international symposium held in Dayton, Ohio, pp. 391-420.