Determining Attribute Importance in a Service Satisfaction Model

Anders Gustafsson
Karlstad University

Michael D. Johnson
University of Michigan

Determining the importance that customers place on the product and service attributes that drive their satisfaction with, and loyalty to, service providers is an essential part of a firm’s resource allocation process. An unsettled issue is whether importance measures should come directly from customers or be derived statistically and, if so, how. The authors compare direct importance ratings with a variety of methods for statistically deriving attribute importance in a customer satisfaction model. Using three data sets, the methods are compared on criteria that include their ability to explain variation in satisfaction, to identify customers’ more important attributes, and to be interpretable. The findings suggest that because each of the tested methods has its strengths and weaknesses, it is essential to choose a method that is compatible with the research goals and context.

Keywords: attribute importance; customer satisfaction modeling

The importance that customers place on service quality attributes as drivers of satisfaction and loyalty is a critical input to a firm’s resource allocation strategy and quality improvement efforts. From a strategy standpoint, logical areas for improvement are those that are both important to customers and on which a firm is doing poorly (Martilla and James 1977). From a quality improvement standpoint, attribute importance and performance measures are critical inputs to internal change processes and tools such as quality function deployment (Mazur 1998). However, there remains a paucity of research that compares alternative methods for determining attribute importance measures within the context of a customer satisfaction model (Johnson and Gustafsson 1997).
A major unsettled issue is whether importance measures should come directly from customers or be derived statistically from satisfaction and loyalty evaluations. There are pervasive arguments for deriving importance measures statistically from overall evaluations (Dillon et al. 1997; Gustafsson and Johnson 1997). Yet the available empirical research that compares direct and derived importance measures suggests that direct measures predict future behavior well (Griffin and Hauser 1993). Direct measures of importance have been particularly popular in a service setting in the context of service quality measures such as SERVQUAL (Parasuraman, Zeithaml, and Berry 1988).

The goal of this research is to provide insight into this debate through a systematic study of alternative methods for determining the drivers of satisfaction and loyalty. We argue and show that there is relatively wide variation in statistically derived importance measures depending on the approach used and that direct measures fall somewhere between the best and the worst of the statistical methods. The performance of each method and resulting importance measures is examined using several criteria. These include a method’s ability to explain variation in customer satisfaction and loyalty, to provide diagnostic importance measures, and to be interpretable.

We begin by describing customer satisfaction and loyalty modeling and the methods that we compare. In addition to direct importance ratings, we examine different statistical methods for deriving importance measures including multiple regression (MR), normalized pairwise estimation (NPE), two variations on partial least squares (PLS), and a type of principal components regression (PCR). We then develop our criteria for evaluating importance measures. An empirical analysis of three data sets highlights several interesting results. Generally, the MR and NPE approaches explain more variation in satisfaction.
NPE has the additional advantage of avoiding negative measures that are difficult to interpret. The measures derived using PLS and PCR are more diagnostic in terms of being able to identify customers’ most important attributes. There is also evidence that direct importance ratings are “forward-looking” as their relative ability to explain customer loyalty is greater than their relative ability to explain satisfaction per se.

CUSTOMER SATISFACTION MODELING

Identifying product and service attributes of import to customers has been a major focus of consumer and market researchers for decades. More traditional approaches to identifying importance include conjoint measurement, which focuses on evaluations of product and service concepts within controlled settings (Green and Srinivasan 1990), and choice modeling (as through analysis of diary or scanner-panel data; Guadagni and Little 1983). More recent attention has turned to understanding the attributes and benefits that drive evaluations of the consumption experience, or customer satisfaction, and subsequent loyalty (Dillon et al. 1997; Gustafsson and Johnson 1997; Ryan, Rayner, and Morrison 1999; Steenkamp and van Trijp 1996).

In this study, we use company-specific survey instruments that have been developed over time for the purpose of understanding customers’ consumption experiences. The surveys are designed with a customer satisfaction model in mind. We want to determine what attributes and benefits have the most impact on, and explain most variation in, customer satisfaction. There are other alternatives to identifying service qualities in need of improvement, such as the SERVQUAL instrument (Parasuraman, Zeithaml, and Berry 1988). This instrument identifies the service quality gaps between rated expectations and performance to identify and develop improvement priorities. Although the SERVQUAL questions are derived from a five-factor structure, the approach does not estimate a satisfaction model.
Moreover, research has shown that the SERVQUAL scale does not always exhibit the five-factor structure it was designed for (Carman 1990; Johnson et al. 2001). Thus, the approach is less appropriate when modeling customer satisfaction or loyalty per se.

We focus specifically on cumulative customer satisfaction, defined as an overall evaluation of a customer’s purchase and consumption experience to date (Fornell 1992; Fornell et al. 1996). Other research focuses on more transaction-specific satisfaction (Boulding et al. 1993; Oliver 1997). Because customer loyalty and repurchase decisions are based on a broader consumption history, and because cumulative satisfaction explains more variation in loyalty (Lervik Olsen and Johnson 2003), cumulative satisfaction is simply more useful for our purposes. We define customer loyalty as a customer’s predisposition to repurchase from a product or service provider (a behavioral intention), which serves as a proxy for actual retention and profit (Fornell et al. 1996).

In contrast to traditional conjoint measurement or choice modeling, cumulative satisfaction models emphasize customers’ perceptions of product performance, the overall evaluations that result, and the behavioral intentions they create. Cumulative satisfaction models rest heavily on multidimensional expectancy-value model formulations (Bagozzi 1992; Fishbein and Ajzen 1975) that use latent variables (Gustafsson and Johnson 1997; Johnson et al. 2001). Accordingly, customers have distinguishable beliefs or perceptions regarding their consumption experience that we label customer benefits. Each of these benefits is described using one or more concrete product attributes. These benefits are the primary antecedents of customer satisfaction as a type of overall evaluation of the consumption experience. This satisfaction, in turn, influences customers’ behavioral intentions in the form of a predisposition to repurchase the product or service (loyalty).
Figure 1 illustrates a satisfaction model using a subset of attributes and benefits from our empirical study of pharmacy services. The benefits include the accessibility of the pharmacies, the quality of the physical environment or premises, and the quality of prescription services. Each benefit is itself a function of multiple underlying attributes. Whereas the attributes describe more concrete and descriptive aspects of a product or service offering, the benefits describe the more general or abstract qualities that customers derive from the attributes (Gustafsson and Johnson 1997). The perceived accessibility or convenience of the pharmacy is, for example, a function of how easy the pharmacy is to get to, how easy it is to park, and the opening hours.

One of the primary functions of such a model is to provide input to a firm’s priority setting and subsequent resource allocation decisions. The priority setting process requires two key inputs. One is the relative importance of the various attributes and benefits toward improving customer satisfaction. The other is performance data on the attributes and benefits. Performance benchmarks are obtained from the survey-based attributes and benefits that drive satisfaction, often relative to a set of direct competitors within a market segment. Importance-performance analysis then determines where a firm should concentrate its resources to improve performance (Martilla and James 1977). The essential aspects to improve are those where importance is high and performance is low. This effectively focuses resources where they have the greatest impact on satisfaction and subsequent loyalty. Those aspects where performance and impact are both high illuminate a firm’s competitive advantage. It is essential to at least maintain, if not improve, performance on these drivers. When both importance and performance are low, customers are telling us not to waste resources improving these areas.
More interesting is the low importance/high performance category. This may be an area where resources have been wasted in the past because the improvements were not important to customers. Alternatively, these may be drivers of satisfaction that customers consider basic and necessary. Such benefits and attributes may be consistently provided to customers and, as a result, have little to no impact on satisfaction. In this research, we focus specifically on the determination of importance.

METHODS FOR DETERMINING IMPORTANCE

A key feature of satisfaction models is that the benefit, satisfaction, and loyalty constructs in the models are inherently abstract or latent variables. The most common way to empirically measure these latent variables is through the use of multiple concrete proxies or measurement variables. These measures should cover different unique dimensions in the latent variables; they should not be the same questions repeated in slightly different ways (Drolet and Morrison 2001; Garpentine 2001). Benefits are measured using their attributes, satisfaction is measured using different overall evaluation standards (such as overall satisfaction, overall performance versus expectations, overall performance versus an ideal), and loyalty is measured using behavioral intentions (such as the likelihood of repurchase or recommendation to others).

There are two general methods for determining the importance of attributes and benefits in a satisfaction model. The researcher can ask customers directly how important attributes are using scale ratings, point allocation, or paired comparison rating. The alternative is to statistically estimate the satisfaction model to derive attribute and benefit importance. We compare direct scale ratings of attribute importance to five common approaches to statistical estimation: (a) MR, (b) NPE, (c) PLS with reflective attribute specifications, (d) PLS with formative attribute specifications, and (e) a type of PCR.
Multiple Regression (MR)

In MR, the researcher simply regresses an entire set of product attribute ratings against some dependent variable of interest, such as satisfaction (Griffin and Hauser 1993). This approach is the easiest to implement statistically. It is also the most problematic. The primary problem with this approach is that it does not take into account the distinction between measurement and latent variables in a satisfaction model. As a result, multicollinearity among the independent variables can be severe. Consider that most of the multicollinearity within a customer satisfaction model exists among multiple measures of the same underlying constructs (such as the attributes of a given benefit or the multiple measures of satisfaction). As described later, reflective PLS and PCR remove much of this multicollinearity using the measurement variables to operationalize the latent variables as indices prior to running the regressions. The primary advantage of the MR approach is that it uses all available information in the attribute performance measures to explain satisfaction. However, the severe multicollinearity in the case of MR may cause some positively valenced attributes to have negative coefficients that are difficult to interpret (Ryan, Rayner, and Morrison 1999).

Normalized Pairwise Estimation (NPE)

NPE is a very simple algorithm for dealing with multicollinearity that has become popular among practitioners. The procedure is described in Rust and Donthu (2003) as follows. First, correlations are obtained between each of the predictor variables and the dependent variable. An ordinary least squares (OLS) multiple regression is run, and the $R^2$ obtained. If the predictors are uncorrelated, then the sum of the squared correlations equals the $R^2$ from the multiple regression. If the predictors are correlated, however, the sum of the squared correlations will be larger than the $R^2$.
Let us call the sum of the squared correlations $S^2$, and let $r_i^2$ be the square of the correlation between predictor (attribute) $i$ and the dependent variable (e.g., satisfaction). Then the estimated importance measure for predictor $i$ is equal to $r_i R / S$, so that the squared importance measures sum to $R^2$. Conceptually, NPE adjusts individual correlations based on the total correlation in the model.

Like MR, NPE uses all available information in the independent variables to explain the dependent variable. The particular advantage of the NPE approach is that it addresses the multicollinearity problem that occurs with multiple regression (where overfitting can result in negative importance measures). As long as the correlation between an attribute and satisfaction is positive, the importance measure is positive.

A disadvantage with both MR and NPE is that these methods do not use the type of model structure in Figure 1. No distinction is made between measurement variables (such as attributes) and more latent variables (such as customer benefits, satisfaction, and loyalty). Both MR and NPE are more data driven than theory driven. As a result, a company may not take advantage of the lens of the customer (Johnson and Gustafsson 2000). Furthermore, the information generated by NPE is correlation based, and correlations do not have the desired interpretation as impact scores. The attributes never compete with each other to explain variation in satisfaction. This results in two potential limitations. First, attribute importance measures using NPE may be less diagnostic than other methods. Second, NPE may overstate the importance of a particular attribute.
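The NPE steps described above can be illustrated with a short numerical sketch. This is a minimal illustration under stated assumptions, not the procedure as implemented by Rust and Donthu (2003): the function name `npe_importance`, the use of NumPy, and the inclusion of an intercept in the OLS fit are choices made here for illustration only.

```python
import numpy as np

def npe_importance(X, y):
    """Sketch of normalized pairwise estimation (hypothetical helper).

    X: (n, k) array of attribute ratings; y: (n,) dependent variable.
    Returns the k importance measures r_i * R / S and the OLS R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: pairwise correlation between each attribute and y.
    r = np.array([np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])])

    # Step 2: OLS multiple regression (intercept included, an assumption) and its R^2.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

    # Step 3: rescale each correlation by R / S, where S^2 is the sum of squared correlations.
    S = np.sqrt(np.sum(r ** 2))
    R = np.sqrt(max(r2, 0.0))
    return r * R / S, r2
```

Because each measure is a rescaled correlation, its sign follows the sign of the pairwise correlation, and the squared measures add up to the regression $R^2$ by construction.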
Partial Least Squares (PLS) Using Reflective or Formative Measures

In contrast, statistical estimation of a satisfaction model through PLS accommodates the fact that the model is a network of cause-and-effect relationships (as from benefits, to satisfaction, to loyalty) that contains latent variables (Gustafsson and Johnson 1997; Johnson and Gustafsson 2000; Ryan, Rayner, and Morrison 1999; Steenkamp and van Trijp 1996). We focus for the moment on PLS using reflective measures. PLS is essentially an iterative estimation procedure that integrates principal components analysis with multiple regression (Fornell and Cha 1994; Wold 1966). The objective of PLS is to explain variance in the endogenous variables in a satisfaction model (such as satisfaction, loyalty, or profit). Because PLS is based on principal components, the latent variables are operationalized as weighted indices of their measurement variables. And because these constructs are used as input to regression models, the weights and path coefficients relating attributes to benefits to satisfaction are akin to beta coefficients or impact scores.

Measurement variables in a PLS model can be operationalized as either reflective or formative. The reflective PLS procedure extracts the first principal component from each subset of measures for the various latent variables in a model and then uses these principal components within a system of regression models. Going back to Figure 1, one equation would explain satisfaction with pharmacy services using the three benefits as independent variables, whereas a second equation would explain loyalty using satisfaction as an independent variable. The algorithm then goes through a series of iterations in an attempt to improve the ability to explain variation in the dependent measures in the regression models by adjusting the principal component weights.
Performance measures for the latent variables in a PLS model are operationalized as principal components, or simple weighted averages of the measurement variables. Attribute-level impacts are determined by multiplying the unstandardized weight that an attribute contributes toward a benefit times the impact the benefit has on satisfaction in the regressions (Gustafsson and Johnson 1997).

An alternative within PLS is to specify formative indicators for the attribute to benefit relationships. The reflective versus formative distinction is illustrated using the simple causal model in Figure 2 in which two benefits affect satisfaction, which in turn influences loyalty. In each case, the latent variables are described using multiple measurement variables (attributes A1 through A6 for the benefits, measures S1 through S3 for satisfaction, and measures L1 through L3 for loyalty). One of the benefits in Figure 2 specifies a reflective relationship between the latent and measurement variables. The latent variable is reflected in the measurement variables as indicated by the direction of the arrows for attributes A1 through A3. This approach takes the theoretical or latent variable as the starting point and proposes or implies specific observable events or measures; the observations are reflective of the underlying constructs. Contrast this with the other benefit in Figure 2 where attributes A4 through A6 are specified as formative. The benefit in this case is assumed to be made up of, or defined by, a collection of measurement variables. In essence, the reflective case presumes that the observable measures are dependent on the latent or abstract construct, whereas the formative case presumes that the latent variable is defined by, or dependent on, what is observed (Fornell and Cha 1994). Computationally, the weights in PLS are determined in different ways depending on the formative or reflective specification.
When estimating a PLS model with reflective indicators, the latent variable indices are akin to principal components. The weights are simply the attribute-level loadings after rescaling. They are a function of the correlation among a construct’s measurement variables. In the formative specification mode, the multiple measures of a latent variable are regressed directly against another latent variable (e.g., the attributes of “service” are regressed directly against satisfaction), and the unstandardized multiple regression coefficients are the measurement variable weights.

A disadvantage of PLS is that it ignores some information in the independent variables (attributes). The benefit indices do not retain all of the information in the original attributes. Thus, PLS should explain less variation in satisfaction than MR or NPE. And although PLS reduces the amount of multicollinearity in the model (because the highest correlations are among the attributes of a given benefit), it does not eliminate the problem. As formative PLS includes some use of multiple regression at the measurement variable level (regressing subsets of attributes against satisfaction), it shares some of the same problems inherent in the multiple regression approach.

The choice of reflective or formative depends on several factors. The measurement variable weights under a formative specification are most directly interpretable as impact scores (as is the case for the path coefficients) because they compete with each other to explain satisfaction. The implication is that formative weights may be more diagnostic or vary from attribute to attribute within a given benefit. At the same time, a reflective specification is more defensible in most applications. For constructs such as satisfaction and loyalty, which are likely manifested in a wide variety of measures, Johnson and Gustafsson (1997) argued that only a reflective specification is justified.
Finding a finite number of subdimensions or components that make up or define a construct such as overall satisfaction is problematic. If, however, there are a specified and reasonable number of attributes that define a particular customer benefit, a formative specification is appropriate. Put simply, a formative specification is much more demanding in that the latent variable should be more completely specified or defined. For example, Johnson and Gustafsson (1997) contrasted reflective and formative attribute specifications within a PLS-based satisfaction model for a large furniture retailer and found that the formative weights were better at distinguishing important from unimportant attributes. But small negative weights for some of the attributes were difficult to interpret. We test between PLS using reflective attribute specifications and PLS using formative attribute specifications. However, in both cases, satisfaction is a reflective construct.

An alternative to PLS for estimating structural equation models with latent variables is a maximum likelihood–based procedure such as covariance structure analysis (CSA) using, for example, LISREL (Jöreskog 1970). Whereas PLS is prediction oriented, CSA focuses on explaining covariance or the strength of relationships. CSA is a very appropriate method when testing among alternative model specifications based on strong theory and data (Lervik Olsen and Johnson 2003). However, several considerations make CSA less appropriate when operationalizing an existing quality or satisfaction model. Foremost, the latent variables in CSA are based on true-score theory. One implication is that CSA is restricted to reflective indicators. Another implication is that, when CSA models are used to develop attribute weights in an applied setting, the researcher typically reverts to OLS-based regression.
It is simply not possible to interpret the weights at the measurement variable level for CSA in the same manner as when PLS is used. Because the main focus in this research is on making comparisons of measurement variables, CSA is less appropriate. In addition, CSA requires larger sample sizes that are not always available in practice (Fornell and Bookstein 1982). Compared to PLS, for example, CSA is more susceptible to improper or nonconvergent results when estimating a complex model with many measurement variables (Bagozzi and Yi 1994).

Principal Components Regression (PCR)

An alternative to both reflective and formative PLS is a hybrid approach that combines the use of principal components analysis and MR. This approach is termed PCR (Frank and Friedman 1993; Massy 1965). In traditional PCR, all the attribute ratings for a product are factor-analyzed to produce a set of independent components or factors (Ryan, Rayner, and Morrison 1999). The problem is that the approach is completely data driven and atheoretical. We use a variation on PCR that is designed to estimate an existing satisfaction model, such as a model that is based on theory, which has evolved in a company over time or is developed through qualitative research (the lens of the customer). Using the approach, the benefit categories (attribute clusters) in the model are used to structure the analysis. The researcher extracts the first principal component from each subset of measures for each benefit, the satisfaction measures as a group, the loyalty measures as a group, and so on. The principal components are then used as input to a series of regression models. As with PLS, the attribute-level weights are calculated by multiplying the weight of an attribute on a benefit times the beta coefficient or impact score for that benefit on satisfaction.
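The model-based PCR variation described above can be sketched in a few lines. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the helper names `first_component` and `pcr_attribute_impacts` are hypothetical, attributes are standardized before extraction, component weights are rescaled to sum to one (so each index is a weighted average), each block is assumed to contain at least two measures, and only the satisfaction equation is estimated.

```python
import numpy as np

def first_component(block):
    """First principal component of one block of measures (hypothetical helper).

    Returns weights rescaled to sum to 1 (an assumption) and the component scores.
    """
    Z = np.asarray(block, dtype=float)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)   # standardize each measure
    corr = np.corrcoef(Z, rowvar=False)
    _, vecs = np.linalg.eigh(corr)                     # eigenvalues in ascending order
    w = vecs[:, -1]                                    # eigenvector of the largest eigenvalue
    w = w / w.sum()                                    # rescale; also fixes the arbitrary sign
    return w, Z @ w

def pcr_attribute_impacts(benefit_blocks, satisfaction_measures):
    """Attribute impact = attribute weight on its benefit * benefit coefficient on satisfaction.

    benefit_blocks: dict mapping a benefit name to an (n, k) array of attribute ratings.
    satisfaction_measures: (n, m) array of satisfaction measures.
    """
    weights, scores = {}, []
    for name, block in benefit_blocks.items():
        w, s = first_component(block)
        weights[name] = w
        scores.append(s)

    # Satisfaction index: first component of the satisfaction measures.
    _, sat_index = first_component(satisfaction_measures)

    # Regress the satisfaction index on the benefit indices (with intercept).
    B = np.column_stack([np.ones(len(sat_index))] + scores)
    coef, *_ = np.linalg.lstsq(B, sat_index, rcond=None)
    betas = coef[1:]

    return {name: weights[name] * b for name, b in zip(benefit_blocks, betas)}
```

Unlike PLS, nothing here iterates: the component weights are computed once per block and never adjusted to improve the fit of the regression.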
This approach to PCR is actually a special case of PLS in which all observable variables or measures are reflective and there is no iteration or adjustment in the measurement variable weights. An important advantage of PCR is that it is relatively easy to implement using a variety of existing statistical packages, whereas PLS requires special software. It is conceptually very close to PLS in that it takes into account the difference between latent and measurement variables and provides regression-based impact scores for the path coefficients. Research in other domains suggests that PCR provides model fits that are quite close to PLS (Frank and Friedman 1993). The primary disadvantage of PCR compared to PLS is that it ignores residual variation among the measurement variables of different constructs; the variable weights are not adjusted to explain more variation in the dependent variables in a model. And like PLS, PCR does not use all of the information in the attribute measures and may not eliminate overfitting due to multicollinearity.

Direct Importance Measures

A categorically different approach is to ask customers directly for importance information. The two most common of the direct approaches are direct rating and point allocation methods (Bottomley, Doyle, and Green 2000; Griffin and Hauser 1993; Doyle, Green, and Bottomley 1997). Using direct ratings, respondents rate the importance of individual benefits or attributes on a scale ranging, for example, from not at all important to very important (Jaccard, Brinberg, and Ackerman 1986). These direct ratings are similar to the ratings of what customers “should expect” or desire in the SERVQUAL model (Parasuraman, Zeithaml, and Berry 1988). Using point allocation methods, respondents allocate a given number (say 100) importance points among a set of attributes. A third approach, used primarily in the quality area, involves the use of paired comparison ratings.
Here respondents rate the relative importance of attribute pairs.

On the basis of prior research, we use direct ratings of attribute importance as a basis of comparison to the statistical methods. The paired comparison approach, as typified by the analytic hierarchy process (Saaty 1980), is relatively difficult for respondents to provide. It has also been criticized for producing arbitrary measures of importance (Dyer 1990). Griffin and Hauser (1993) found that the direct rating and point allocation methods yield similar results. However, other research shows systematic differences between direct ratings and point allocation (Bottomley, Doyle, and Green 2000; Doyle, Green, and Bottomley 1997). Specifically, Bottomley, Doyle, and Green (2000) showed that direct ratings are preferred for two important reasons. First, respondents prefer direct ratings to point allocation. Second, direct ratings provide more stable weights. In our experience, direct importance ratings are also more common in practice.

All of the direct methods assume that customers understand both what the researcher means by “important” and what attributes are important to them. Even if customers know what is important to them, they must be willing to tell you. As a result, direct importance measures may result in socially acceptable or status quo answers and poor discrimination (as when customers rate all attributes as relatively important). The amount of information in self-reported importance measures also drops off as the number of attributes increases (Scott and Wright 1976).

Because statistical estimation of attribute and benefit importance is more objective and unbiased, it is arguably superior to direct customer ratings (Gustafsson and Johnson 1997; Hayes 1998). Yet the only empirical study in which direct and derived importance measures are compared shows the opposite result.
Griffin and Hauser (1993) compared direct importance measures with those obtained using multiple regression. They report on data obtained from a consumer products firm where importance was measured using three direct methods (a 9-point direct rating scale and two forms of point allocation: a constant sum scale of 100 points and an anchored scale in which 10 points are allocated to the most important attribute and up to 10 points are allocated to other attributes). Their results reveal high reliability among the three direct measures. Moreover, these scales each correlate highly with customer preference for seven product concepts from a product development team, which varied on the different performance dimensions. The authors then regressed attribute performance ratings for the consumer product against a rating of satisfaction. They found that the revealed importance measures did not correlate with preferences for the hypothetical product concepts. They also report briefly on another study in which revealed importance measures were obtained for a high-cost durable product. In both studies, they found several negative coefficients (where some positively scaled attributes have a negative effect on satisfaction) and poor face validity for the revealed importance measures.

However, these results should be interpreted with caution. The direct measures were only compared to the multiple regression approach, which is arguably the weakest of the statistical approaches examined here. Moreover, customer satisfaction is meant to describe a customer’s accumulated experience with an existing product or service, not his or her interest in, or preferences for, hypothetical product concepts.

EVALUATION CRITERIA

We compare the different methods for determining importance on a series of criteria to evaluate their performance in a satisfaction-modeling context.
All comparisons are made on the measurement variable level because three of the methods (MR, NPE, and direct rating) only produce results at this level. Our criteria are a method’s ability to (a) explain variation in customer satisfaction; (b) identify customers’ most important attributes; (c) avoid negative, uninterpretable importance measures; and (d) explain variation in loyalty. Although any one method may perform well on any one of the criteria, our goal is to understand which method or methods perform well across all the criteria and just how great the differences are.

Variation Explained

As the fitting objective of the OLS estimation methods is to explain variation in the dependent variables, percentage of variance explained ($R^2$) is a natural criterion on which to evaluate them (Fornell and Cha 1994). This criterion is also used to evaluate the direct ratings of importance using a simple multiattribute model formulation that is analogous to the multiple regression approach. The attribute ratings, importance measures, satisfaction measures, and loyalty measures in our empirical studies are all rated on 1- to 10-point scales. For each individual, we multiply the rated importance measures times the rated performance measures for each attribute, sum the products, and divide by the sum of the weights (to normalize the function) as follows:

$$Y_j = \frac{\sum_i x_{ij}\, p_{ij}}{\sum_i x_{ij}} \qquad (1)$$

where $Y_j$ is the predicted satisfaction (or loyalty) for individual $j$, $x_{ij}$ is rated importance of attribute $i$ for individual $j$, and $p_{ij}$ is rated performance for attribute $i$ for individual $j$. Note that this function uses individual-level weights (importance ratings) to predict overall satisfaction or loyalty. In contrast, the statistical approaches produce aggregate-level weights estimated across respondents.
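As a small worked illustration of the normalized multiattribute formulation just described, the sketch below computes an importance-weighted prediction for each respondent. The function names and the example ratings are hypothetical, and computing $R^2$ as the squared correlation between predicted and observed values is an assumption made here for illustration; the source does not specify the calculation.

```python
import numpy as np

def weighted_prediction(importance, performance):
    """Importance-weighted average of performance ratings, one value per respondent.

    importance: (n, k) individual-level weights, or a (k,) vector shared by all respondents.
    performance: (n, k) performance ratings.
    """
    p = np.asarray(performance, dtype=float)
    x = np.broadcast_to(np.asarray(importance, dtype=float), p.shape)
    return (x * p).sum(axis=1) / x.sum(axis=1)

def r_squared(predicted, observed):
    """Squared Pearson correlation (an assumed definition of R^2 for this sketch)."""
    return np.corrcoef(predicted, observed)[0, 1] ** 2

# Hypothetical ratings for two respondents and two attributes.
importance = np.array([[10.0, 5.0], [2.0, 8.0]])
performance = np.array([[8.0, 6.0], [4.0, 9.0]])
pred_individual = weighted_prediction(importance, performance)
```

For the first respondent the prediction is (10·8 + 5·6) / (10 + 5) = 110/15 ≈ 7.33, and for the second it is (2·4 + 8·9) / (2 + 8) = 8.0. Passing a single weight vector instead of a matrix applies the same weights to every respondent.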
The contrast between individual-level and aggregate-level weights suggests an alternative to the weights in equation (1), in which we calculate a second predicted satisfaction (or loyalty) for individual j, denoted $Y_{j\cdot}$, as follows:

$$Y_{j\cdot} = \frac{\sum_{i} \bar{x}_{i}\, p_{ij}}{\sum_{i} \bar{x}_{i}} \qquad (2)$$

where $\bar{x}_{i}$ is the average importance rating on attribute i across a similar population of customers (respondents) and $p_{ij}$ is, as in equation (1), the rated performance for attribute i for individual j. It is unclear at this point which of the two versions will be the better predictor of satisfaction and/or loyalty. Equation (1) has the advantage of using individually customized versus aggregate-level weights. Equation (2) has the advantage of aggregation over potentially error-laden individual responses. The R²s are calculated directly.

Diagnosticity

Diagnosticity is the ability of the method to identify just which attributes and benefits are most important, or most diagnostic, in affecting customer satisfaction. When setting priorities for quality improvement, the emphasis is on identifying just which area a company should invest in to ensure or increase satisfaction. Doyle, Green, and Bottomley (1997) showed how different direct methods vary with respect to their diagnosticity. Their benchmark for comparison is a linear relationship between the importance measures and their rank order. Deviations from linearity directly affect the measures’ diagnosticity. For example, the authors show that direct ratings (from not at all important to very important) deviate from linearity in a concave fashion. That is, respondents are less able to distinguish among the relatively important attributes than among the less important attributes. Following Doyle, Green, and Bottomley’s (1997) approach, we test for diagnosticity by regressing both the attribute ranking and a quadratic term for the ranking against the importance measures for a particular method. We look for two results. First is a large and significant linear relationship between the ranks and the importance measures. Second is a convex relationship (negative quadratic) between rank and importance.
The latter suggests that the method is particularly good at distinguishing among the more important attributes in a set. In contrast, a concave relationship (positive quadratic) between rank and importance indicates a distinct lack of diagnosticity among the more important attributes in a set.

Negative Measures

Using a direct rating scale, all of the attribute importance measures are positive as long as the scale values are positive. As described earlier, the multicollinearity across attribute ratings for a given offering can create problems in the form of negative weights (beta coefficients) for the regression-based methods. Based on our earlier discussion, these problems should be the most severe for the multiple regression approach because it completely ignores the relatively high collinearity among multiple measures of the same latent variable. Although NPE uses an MR to determine the total amount of correlation in a model, its estimates are based on correlations, which will only be negative if the correlation itself is negative. The PLS and PCR approaches use the satisfaction model (where attributes are used to create benefit-level indices) to reduce the multicollinearity to a level that produces meaningful benefit impacts. Attribute weights or impacts are then calculated by multiplying the weight or contribution of an attribute to its benefit times the impact of the benefit on satisfaction. For both PCR and PLS with reflective attributes, there should be relatively few negative importance weights. Because PLS with formative attributes is more similar to multiple regression, it should fall somewhere in between. We examine the incidence of negative measures across the methods.

Predictions

In sum, an apparent inconsistency in the satisfaction literature prompted our study. While statistical methods should produce more objective attribute and benefit importance measures, the one empirical study to date supports direct measures as superior.
Our proposed explanation is that the performance of direct importance measures depends critically on the statistical benchmark. MR and NPE should explain the most variation in satisfaction and loyalty. Whereas MR should be very prone to negative, uninterpretable importance measures in the face of multicollinearity, NPE should not. But as NPE relies on pairwise correlations, where attributes do not compete to explain a dependent variable, importance measures obtained from this method should be less diagnostic and may be inflated. The advantage of PLS and PCR is that these methods build on a firm’s satisfaction model and recognize latent variables. These models constitute a firm’s operational theory that relates attributes to benefits, benefits to satisfaction, and satisfaction to loyalty. We expect these methods to avoid much of the multicollinearity problem that plagues MR yet provide diagnostic importance measures. Finally, we expect direct importance measures to perform somewhere between the best and the worst of the statistical methods. Beyond such general predictions, our study is exploratory. Prior research has simply not compared the range of methods studied here.

EMPIRICAL STUDY

Our empirical study consists of making comparisons across three data sets collected specifically for the purpose of the research. All of the satisfaction models for the data sets follow the basic structure described in Figure 1, where groups of attributes provide customers with particular benefits and these benefits drive satisfaction. We place the following boundaries on our analyses and presentation for brevity. For comparison reasons, only attribute-level importance measures are examined. Recall that benefit-level impacts are used to calculate attribute-level importance in those methods where the satisfaction model is used to create latent variables (reflective PLS, formative PLS, and PCR).
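The two-step calculation recalled above (an attribute's weight on its benefit, multiplied by that benefit's impact on satisfaction) can be sketched as follows. This is only a minimal illustration in the spirit of a principal components approach, not the authors' exact procedure: it assumes each benefit index is the first principal component of its standardized attribute ratings and that benefit impacts come from an ordinary least squares regression. All function names and the simulated data are hypothetical.

```python
import numpy as np


def standardize(a):
    """Center each column and scale it to unit variance."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def first_component_weights(z):
    """Weights of the first principal component of a standardized block."""
    # eigh returns eigenvalues in ascending order, so the last column
    # belongs to the largest eigenvalue.
    _, vectors = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    w = vectors[:, -1]
    return w if w.sum() >= 0 else -w  # resolve the arbitrary sign


def attribute_importance(blocks, satisfaction):
    """Attribute importance = attribute weight x benefit impact.

    blocks: list of (n_respondents, n_attributes) arrays, one per benefit,
    each with at least two attributes (an assumption of this sketch).
    """
    zs = [standardize(b) for b in blocks]
    ws = [first_component_weights(z) for z in zs]
    # Benefit indices: one component score per benefit.
    scores = np.column_stack([z @ w for z, w in zip(zs, ws)])
    X = np.column_stack([np.ones(len(scores)), scores])
    y = standardize(satisfaction)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    impacts = coef[1:]  # impact of each benefit index on satisfaction
    return [w * impact for w, impact in zip(ws, impacts)]


# Simulated (hypothetical) data: two benefits with 3 and 2 attributes.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=200), rng.normal(size=200)
block1 = f1[:, None] + 0.5 * rng.normal(size=(200, 3))
block2 = f2[:, None] + 0.5 * rng.normal(size=(200, 2))
satisfaction = 0.6 * f1 + 0.3 * f2 + 0.3 * rng.normal(size=200)
importances = attribute_importance([block1, block2], satisfaction)
```

Grouping attributes into benefit indices before regressing is what keeps highly correlated attributes from competing directly with one another; the scale of the resulting importances depends on how the indices are normalized, so only their relative sizes are meaningful here.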
For the multiple regression, NPE, and direct ratings, attribute-level importance measures are the only output. We focus on a limited model in which satisfaction or loyalty is the endogenous or dependent variable. Finally, we use a principal component of satisfaction and loyalty measures to provide the multiple regression, NPE, and direct measures with dependent variables that are comparable (in abstraction and sensitivity) with that used in the structural modeling methods.

Data Sets

Three data sets were collected from convenience samples of university students who completed written surveys as part of their course work. Three services for which the students had significant experience were surveyed: postal services, a supermarket chain, and pharmacy services (usable observations of 99, 91, and 70, respectively). Each company’s or agency’s own satisfaction survey was used as a base, into which direct importance ratings were incorporated. Each attribute was rated on a direct-rating scale from 1 (not at all important) to 10 (very important). The survey measures used to operationalize the benefit and satisfaction indices for each data set are shown in Appendix A. We take the companies’ models and associated surveys as given because our interest is in operationalizing an existing model (versus developing a better model). The models and surveys are all currently being used by the companies or organizations from which they are taken and have evolved over time based on both qualitative data and quantitative analysis. Before presenting our main results, we use diagnostic information from the reflective PLS estimation results to evaluate the quality of the structural models. The data were first screened to make sure that respondents had the proper experience with all the constructs.
The only data set that posed a problem was the pharmacy survey, where 24 out of the original 94 respondents (25.5%) were removed based on a lack of experience with prescription services. The criterion used was simply that these respondents had not filled out that section of the questionnaire. The next step was to eliminate all attributes (variables) that had too many missing values. Replacing missing values using methods such as average value substitution leads to an underestimation of the beta coefficients due to reduced variation. Consequently, the variables need to be screened to make sure that they contain a sufficient percentage of responses. Downey and King (1998) argued and showed that average value substitution is appropriate when there are 20% or fewer values missing. This rule was applied to our data sets. Overall, missing values were not a large problem. All variables with more than 20% missing values were removed. Most variables (80%) had less than 5% missing values, and there were three variables that had between 10% and 20% missing values. A small number of latent variables with only a single measurement variable were also eliminated from the analyses to maximize any observable differences among the estimation methods.

Model Quality

As noted, the quality of the measurement models may affect how well the different methods perform. In the case of a poor measurement model, as when the latent variables in the model are more highly correlated with each other than with their measurement variables, the models may not be measuring what they purport to measure. If the models are poor, even PLS may have significant problems with multicollinearity. Model quality, in this case, is a reflection of the company’s or research agency’s prior qualitative research and model development. There are guidelines to apply when determining the quality of models based on PLS output (Fornell and Cha 1994).
One is that the measurement variable loadings in PLS should exceed .707 to ensure that at least half of the variance in the observed variables is shared with the construct (the squared correlation equals the variance explained, where .707² ≈ 50%). This criterion is referred to as communality. The second criterion, used to evaluate the discriminant validity of the model, is to explore whether each latent variable shares more variance with its measures than it does with other constructs in the model. This may be examined by looking at the percentage of measurement loadings that exceed the latent variable correlations. According to these criteria, the postal services model with 15 measurement variables is the best model; there are no communality problems, and the measurement loadings always exceed associated latent variable correlations. The supermarket model is the largest with 33 indicators. In this case, two out of eight benefit-level latent variables have an average communality just below 50%, whereas 92% of the measurement loadings exceed the latent variable correlations. The model with the most problems among the three models is the pharmacy model (with 29 measures). Here three out of nine latent variables have an average communality below 50%, and 88% of the indicators’ loadings exceed the latent variable correlations. Although all of the models are relatively good from a measurement standpoint and representative in our view of what one sees in practice, some have more weaknesses than others. It is natural for estimation problems to grow as the quality of the measurement models declines. From a quasi-experimental standpoint, the variation in model quality provides a natural basis for examining how the differences among the estimation techniques vary from model to model.

Results

Variation explained. Table 1 shows the variance in satisfaction explained (R²) for the methods and models.
As MR and NPE use all of the information in the attribute measures, they naturally explain the largest amount of variance (78% to 85%, with an average of 82%). The formative PLS models are next in explained variance with an average of 74%, followed by the reflective PLS and PCR methods with 64% and 63%, respectively. Across the models, reflective PLS and PCR are quite similar. One of the more interesting results is the ability of the direct measures to predict customer satisfaction, at least in an absolute sense. When building a model at the individual level, we reach an average R² of 46%. When aggregated importance measures are used, the average R² increases to 52%. Although lower than the R² averages for the statistical methods, the direct measures are not that much worse than, for example, the reflective PLS and PCR methods (especially for the postal service and supermarket data sets). It is also clear that the direct measures benefit from using more aggregate-level measures.

Diagnosticity. As described earlier, we follow Doyle, Green, and Bottomley’s (1997) approach to evaluate the distribution of importance weights and their diagnosticity. Specifically, the importance measures obtained from each method and model are used to estimate the following regression equation:

$$\text{importance} = \beta_0 + \beta_1\,\text{rank} + \beta_2\,\text{rank}^2_{\text{residual}} + \varepsilon$$

where importance is the estimated (or directly rated) importance for each attribute and rank is the rank order importance of the attribute. We rank the importances from 1 (most important) to n (least important) for each individual method. The rank² residual is a term obtained by first regressing the square of the rankings against the ranking to remove the linear component. This reduces the collinearity between the linear and quadratic terms to provide more stable estimates. Recall that we look for a significant linear relationship between the ranks and importance measures as revealed by a significant negative coefficient for rank.
That is, importance should decrease as an attribute’s rank increases from 1 to n. For a method to provide diagnosticity among the more important attributes, we also look for a significant negative quadratic term for the rank² residual. That is, importance should decrease at a decreasing rate as an attribute’s rank increases. The results are presented in Table 2. The various methods, including the direct ratings, are similar with respect to the linear effect of rank. The differences across methods are primarily with respect to the quadratic term. The formative and reflective PLS methods are most likely to highlight those attributes that are most important to customers based on a greater incidence of significantly negative quadratic terms. For example, all three of the data sets have a significant negative effect for the rank² residual under reflective PLS. In contrast, the MR estimates show only a linear relationship between importance and rank, and NPE shows two significant positive quadratic terms. To illustrate, Figure 3 shows the distribution of importance weights for each statistical method using the pharmacy data set. The figure uses the rank order from the reflective PLS model as a basis for comparison. The convex nature of the reflective PLS results is visible in the figure. The figure also reveals some similarity between the formative PLS and multiple regression results. Another important finding from Table 2 is that the direct ratings show a significant positive quadratic effect in all three data sets. This suggests that the direct ratings are more diagnostic among the lower ranked attributes than among the higher ranked attributes. Figure 4 illustrates the finding using the distribution of direct importance ratings for the pharmacy data.
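The diagnosticity regression described above can be sketched as follows, assuming ordinary least squares and ranks assigned from 1 (most important) to n (least important). The function name and the example importance values are hypothetical, and the sketch returns only the coefficients: it does not compute the significance tests reported in Table 2, and the sign of the quadratic coefficient should be read against the coding conventions used in the text.

```python
import numpy as np


def diagnosticity_fit(importance):
    """Regress importance on rank and the residualized squared rank."""
    importance = np.asarray(importance, dtype=float)
    n = len(importance)
    # Rank 1 goes to the largest importance value (ties broken by order).
    order = np.argsort(-importance, kind="stable")
    rank = np.empty(n)
    rank[order] = np.arange(1, n + 1)
    # Regress rank^2 on rank and keep the residual, which removes the
    # linear component from the quadratic term.
    linear = np.column_stack([np.ones(n), rank])
    squared = rank ** 2
    beta, *_ = np.linalg.lstsq(linear, squared, rcond=None)
    rank2_residual = squared - linear @ beta
    # Final regression with intercept, rank, and the residualized term.
    X = np.column_stack([np.ones(n), rank, rank2_residual])
    coef, *_ = np.linalg.lstsq(X, importance, rcond=None)
    return {
        "intercept": float(coef[0]),
        "rank": float(coef[1]),
        "rank2_residual": float(coef[2]),
    }


# Hypothetical importances that fall off exactly linearly with rank.
fit = diagnosticity_fit([10, 8, 6, 4, 2])
```

In this perfectly linear example the rank coefficient is -2 and the quadratic coefficient is zero, which corresponds to the linear benchmark of Doyle, Green, and Bottomley (1997); any curvature in the importance values would show up as a nonzero coefficient on the residualized term.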
As reported above, and consistent with Doyle, Green, and Bottomley’s (1997) results, there is both a significant linear effect for rank and a significantly positive quadratic effect for rank in all three data sets for the direct ratings. Thus, the direct ratings are poor at distinguishing among the more important attributes in each model. The concave nature of the relationships illustrates how the method is more likely to identify what is least important to customers than what is most important to customers. It is worth noting that the most important attributes according to the direct ratings are distributed across the different latent variables in each model. The results are not just a case of all the attributes of a particular benefit being equally important. Interestingly, the NPE approach is similarly undiagnostic for the supermarket and pharmacy data sets. Recall that this is likely due to the fact that the NPE approach is based on pairwise correlations. The attribute performance measures do not compete with each other to explain variation (except when computing the total correlation). The size of the positive quadratic terms for the direct measures is generally larger than the size of the negative quadratic terms for the statistically derived importance measures (see Table 2). This suggests that the concavity observed for the direct ratings is greater than the convexity observed using, for example, the PLS methods. Another interesting observation is that the PLS and PCR approaches are most similar on this criterion when the quality of the measurement model is higher, as for the postal services and supermarket data sets. Figure 3 illustrates another difference between the NPE estimates and the other statistical methods. The NPE importance measures are higher. Table 3 shows the average level of importance by statistical method and model (data set). The differences are greatest for the pharmacy data set and smallest for the postal service data set.
This is consistent with the earlier argument that, because the method is correlation based where importance measures do not compete with each other to explain satisfaction, the NPE approach may overstate importance. Again, however, the differences are greatest when the measurement model quality is lower (the pharmacy data). When the model quality is higher, as for the postal service and supermarket data, the NPE estimates are only marginally higher than the other approaches. Negative measures. Another evaluation criterion is each method’s ability to avoid negative weights or measures that, for the statistical methods, are difficult to interpret as importance. Recall that the direct ratings are positive by definition. Table 4 shows the total number of negative measures in each case as well as the number that are negative and significant (in parentheses). Our expectation was that the NPE approach should perform quite well on this criterion, followed by the PLS and PCR methods, whereas MR should perform the worst. As can be seen from Table 4, these predictions were generally confirmed. In the case of MR, 30 out of 87 attributes show negative values, 3 of which are significant. Although one might argue that these 3 attribute coefficients are significantly negative by chance, they represent 33% of all significant attributes for the MR results. The majority of the negative measures for the PLS and PCR methods come from the pharmacy and supermarket models. The negative coefficients in these cases are quite small, most of which are only negative at the fourth decimal point. Because NPE avoids overfitting by relying on pairwise correlations, negative measures should only occur when the correlations themselves are negative. Two negative but nonsignificant correlations occurred for the pharmacy data set. Performance of the direct measures. 
Although the direct importance measures show no negative values by definition, they explain less variation in satisfaction and are less diagnostic than the other methods. One explanation is that the direct measures focus more on what is salient or important going forward, whereas the statistically derived measures reflect what is more diagnostic among customers’ more recent experiences (satisfaction). If this is the case, then the direct measures should perform better when used to explain a more forward-looking dependent variable. In the case of a satisfaction model, a natural forward-looking variable to examine is loyalty as a measure of customers’ predisposition to behave favorably toward the product or company. This loyalty is operationalized in our data sets using either a measure of future repurchase likelihood or recommendation likelihood. We tested this explanation by looking at the gap between satisfaction variance explained and loyalty variance explained for each method. If direct ratings are tapping into a more long-run importance or salience, their relative ability to explain loyalty should improve. From Table 5, we find that the average variance explained for loyalty using the statistical methods is 27% for PCR, 34% for reflective PLS, 37% for MR and NPE, and 38% for formative PLS. This is comparable to what is found in the American Customer Satisfaction Index, where on average 36% of the variance in the loyalty latent variable is explained (Fornell et al. 1996). For the direct ratings, the average variance explained for loyalty is 33% for the aggregate-level measures and 30% for the individual measures. The observed gaps in variation explained between the statistical methods and the aggregate direct measures are much smaller than the earlier reported gaps in satisfaction variance explained (see Table 1). This lends credence to the notion that direct measures are tapping into a more forward-looking importance or salience.
As a result, their performance improves when explaining loyalty rather than satisfaction.

SUMMARY AND DISCUSSION

Academics and practitioners alike use various methods to determine the importance of attributes and benefits in a service quality and satisfaction model. These include direct ratings of importance and statistically derived impact scores. Despite the attention given to customer satisfaction and loyalty in recent years, very little research has explored which of these methods provides the better measures of importance. Although there are good arguments for deriving importance measures statistically from overall evaluations, prior empirical evidence suggests that direct measures are superior to at least one statistical method, MR. In this article, we examined a wide variety of methods for determining importance: MR, NPE, PLS with reflective attribute specifications, PLS with formative attribute specifications, a variation on PCR, and direct importance ratings. We compared the various methods on several criteria including (a) satisfaction variance explained, (b) diagnosticity (the ability to identify what attributes are most important to customers), (c) the incidence of negative and uninterpretable measures, and (d) loyalty variance explained. The results are summarized in Table 6. Based on our summary findings in Table 6, it is clear that no one method outperforms all the others; all methods have strengths and weaknesses. While the MR approach explains the largest amount of variance in satisfaction, it suffers the most from multicollinearity based on the incidence of negative importance estimates. NPE explains the same amount of variance as does multiple regression. It avoids the problem of overfitting that can affect the regression-based methods because it relies on correlations that are adjusted based on the total correlation in a model.
However, our results reveal that because the attributes do not compete with each other to explain variation in satisfaction, NPE importance measures are less diagnostic (less able to identify customers’ most important attributes). NPE measures may also overstate importance somewhat depending on the quality of the measurement model. The advantage of the PLS and PCR methods is that they recognize the existence of latent variables (such as customer benefits) and leverage a company’s satisfaction model or implicit theory that relates attributes to benefits to satisfaction and loyalty. By using the structure in a satisfaction model to limit the problems of multicollinearity while allowing benefits (and attributes in the case of formative PLS) to compete with each other, these regression-based approaches provide more diagnostic importance measures. Our approach to PCR is most similar to the PLS methods when the quality of the measurement model is high. The major limitation of these approaches is that they do not use all of the information in the attribute measures and thus explain less variation in satisfaction. The primary implication of these results is that the choice of method should suit the company’s and researcher’s purpose. If, for example, the goal is simply to explain variation and provide attribute importance measures that completely avoid the problems of multicollinearity, then NPE would be a suitable approach. If, in contrast, the goal is to identify those benefits and attributes that are most diagnostic in affecting satisfaction, our results suggest that the reflective PLS method is the method of choice, and formative PLS and PCR are close substitutes. Recall, however, that the choice between reflective and formative PLS depends on just how comprehensive the attribute specifications are.
We also contrasted the statistical methods with direct measures of importance based on Griffin and Hauser’s (1993) findings that direct measures outperform the MR approach with respect to, for example, the incidence of negative estimates that are difficult to interpret. Our results confirm these findings. However, a statistical approach that models product benefits and satisfaction as latent variables within a system of cause-and-effect relationships outperforms the direct ratings. As noted, the direct ratings fail to explain nearly as much variation in satisfaction as do the other methods. Direct ratings also suffer from a severe “concavity bias” (Doyle, Green, and Bottomley 1997) in that they distinguish more among the less important attributes in a set than among the more important attributes in a set. A particularly interesting finding with respect to the direct ratings is that they explain nearly as much variation in loyalty as do the statistical methods. Our conclusion is that statistical estimates of importance identify those attributes that have had the greatest impact on a customer’s more recent consumption experiences, whereas direct ratings capture what is more globally salient to customers and thus important over time. As direct and derived ratings contain somewhat different and complementary information, an implication of our results is that researchers might gainfully employ both measures to operationalize importance as a more latent construct to explain loyalty. Overall, our study provides a more broad-based comparison of satisfaction research methods than previously available and yields some important conclusions. First, the NPE approach avoids the primary problem that plagues MR (multicollinearity) but is not as diagnostic as the other methods. Second, PLS using reflective indicators generally performs the best in terms of identifying attributes that are most important to customers.
This is consistent with the arguments that this method is particularly well suited to operationalizing a customer satisfaction model for the purposes of driving quality improvement efforts (Gustafsson and Johnson 1997; Johnson and Gustafsson 2000; Ryan, Rayner, and Morrison 1999). Finally, we find evidence that direct importance ratings are forward-looking as their relative ability to explain customer loyalty is greater than their relative ability to explain satisfaction per se.

One potential limitation of our study is that we assume linear relationships between the attributes and satisfaction. The classic Kano Model predicts the possibility of nonlinear relationships between performance attributes and satisfaction either over time or across a heterogeneous customer base (see Johnson and Gustafsson 2000). This prediction is supported in prior service research (Anderson and Mittal 2000; Kumar 2002). However, our experience is that the relationships between attributes and satisfaction are essentially linear at a given point in time for a relatively homogeneous population, such as the samples used here. Consistent with this argument, initial inspection of all the attribute-satisfaction relationships in our data sets supported our assumption of linear relationships. Nonlinear relationships, when they occur, are more likely to exist between satisfaction and loyalty per se (Auh and Johnson 1997; Mittal and Kamakura 2001).

Another limitation is that we contrast multiple statistical approaches with a single direct importance measure. This was because previous research identified direct ratings as either superior or equivalent to other direct measures and easier to collect (versus point allocation methods or paired comparisons). However, based on the relative strength of the direct measures in explaining loyalty, future research might include a wider range of direct importance measures for comparison.
Our results also reveal that the similarities and differences between the methods are a function of the quality of the measurement model being used. However, we only examine this in a quasi-experimental sense. Future research using simulated data could more systematically test our finding that the better the model, the more robust the output of the estimation methods.

REFERENCES

Anderson, Eugene W. and Vikas Mittal (2000), “Strengthening the Satisfaction-Profit Chain,” Journal of Service Research, 3 (November), 107-20.
Auh, Seigyoung and Michael D. Johnson (1997), “The Complex Relationship between Customer Satisfaction and Loyalty for Automobiles,” in Customer Retention in the Automotive Industry: Quality, Satisfaction and Loyalty, M. D. Johnson, A. Herrmann, F. Huber, and A. Gustafsson, eds. Wiesbaden, Germany: Gabler, 141-66.
Bagozzi, Richard P. (1992), “The Self-Regulation of Attitudes, Intentions, and Behavior,” Social Psychology Quarterly, 55 (2), 178-204.
_____ and Youjae Yi (1994), “Advanced Topics in Structural Equation Models,” in Advanced Methods of Marketing Research, Richard P. Bagozzi, ed. Cambridge, MA: Blackwell, 1-52.
Bottomley, Paul A., John R. Doyle, and Rodney H. Green (2000), “Testing the Reliability of Weight Elicitation Methods: Direct Rating versus Point Allocation,” Journal of Marketing Research, 37 (November), 508-13.
Boulding, William, Ajay Kalra, Richard Staelin, and Valarie A. Zeithaml (1993), “A Dynamic Process Model of Service Quality: From Expectations to Behavioral Intentions,” Journal of Marketing Research, 30 (February), 7-27.
Carman, James M. (1990), “Consumer Perception of Service Quality: An Assessment of the Service Quality Dimension,” Journal of Retailing, 56 (3), 33-55.
Dillon, William R., John B. White, Vithala R. Rao, and Doug Filak (1997), “Good Science: Use Structural Equation Models to Decipher Complex Customer Relationships,” Marketing Research, 9 (Winter), 22-31.
Downey, R. G. and Craig V. King (1998), “Missing Data in Likert Ratings: A Comparison of Replacement Methods,” Journal of General Psychology, 125 (2), 175-92.
Doyle, John R., Rodney H. Green, and Paul A. Bottomley (1997), “Judging Relative Importance: Direct Rating and Point Allocation Are Not Equivalent,” Organizational Behavior and Human Decision Processes, 70 (April), 65-72.
Drolet, Aimee L. and Donald G. Morrison (2001), “Do We Really Need Multiple-Item Measures in Service Research?” Journal of Service Research, 3 (3), 196-204.
Dyer, James S. (1990), “Remarks on the Analytical Hierarchy Process,” Management Science, 36 (March), 249-58.
Fishbein, M. and I. Ajzen (1975), Belief, Attitude, Intention, and Behavior: An Introduction to Theory and Research. Reading, MA: Addison-Wesley.
Fornell, Claes (1992), “A National Customer Satisfaction Barometer: The Swedish Experience,” Journal of Marketing, 56 (January), 6-21.
_____ and Fred L. Bookstein (1982), “Two Structural Equation Models: LISREL and PLS Applied to Consumer Exit-Voice Theory,” Journal of Marketing Research, 19 (November), 440-52.
_____ and Jaesung Cha (1994), “Partial Least Squares,” in Advanced Methods of Marketing Research, Richard P. Bagozzi, ed. Cambridge, MA: Blackwell, 52-78.
_____, Michael D. Johnson, Eugene W. Anderson, Jaesung Cha, and Barbara Everitt Bryant (1996), “The American Customer Satisfaction Index: Nature, Purpose and Findings,” Journal of Marketing, 60 (October), 7-18.
Frank, Ildiko E. and Jerome H. Friedman (1993), “A Statistical View of Some Chemometrics Regression Tools,” Technometrics, 35 (2), 109-35.
Grapentine, Terry H. (2001), “A Practitioner’s Comment on Aimee L. Drolet and Donald G. Morrison’s ‘Do We Really Need Multiple-Item Measures in Service Research?’” Journal of Service Research, 4 (2), 155-58.
Green, Paul E. and V. Srinivasan (1990), “Conjoint Analysis in Marketing: New Developments and Directions,” Journal of Marketing, 54 (October), 3-19.
Griffin, Abbie and John R. Hauser (1993), “The Voice of the Customer,” Marketing Science, 12 (1), 1-27.
Guadagni, Peter M. and John D. C. Little (1983), “A Logit Model of Brand Choice Calibrated on Scanner Data,” Marketing Science, 2 (Summer), 203-38.
Gustafsson, Anders and Michael D. Johnson (1997), “Bridging the Quality-Satisfaction Gap,” Quality Management Journal, 4 (3), 27-43.
Hayes, Bob E. (1998), Measuring Customer Satisfaction: Survey Design, Use, and Statistical Analysis Methods, 2nd ed. Milwaukee, WI: ASQ Quality Press.
Jaccard, James, David Brinberg, and Lee J. Ackerman (1986), “Assessing Attribute Importance: A Comparison of Six Methods,” Journal of Consumer Research, 12 (March), 463-68.
Johnson, Michael D. and Anders Gustafsson (1997), “Bridging the Gap II: Measuring and Prioritizing Customer Needs,” in Proceedings of the Third Annual International QFD Symposium: Volume 2, Anders Gustafsson, Bo Bergman, and Fredrick Ekdahl, eds. Linköping, Sweden: Linköping University, 21-34.
_____ and _____ (2000), Improving Customer Satisfaction, Loyalty and Profit: An Integrated Measurement and Management System. San Francisco: Jossey-Bass.
_____, _____, Tor Wallin Andreassen, Line Lervik, and Jaesung Cha (2001), “The Evolution and Future of National Customer Satisfaction Index Models,” Journal of Economic Psychology, 22 (April), 217-45.
Jöreskog, Karl G. (1970), “A General Method for Analysis of Covariance Structures,” Biometrika, 57, 239-51.
Kumar, Piyush (2002), “The Impact of Performance, Cost, and Competitive Considerations on the Relationship between Satisfaction and Re-purchase Intent in Business Markets,” Journal of Service Research, 5 (August), 55-68.
Lervik Olsen, Line and Michael D. Johnson (2003), “Service Equity, Satisfaction, and Loyalty: From Transaction-Specific to Cumulative Evaluations,” Journal of Service Research, 5 (3), 184-95.
Martilla, John A. and John C. James (1977), “Importance-Performance Analysis,” Journal of Marketing, 41 (January), 77-79.
Massy, William F. (1965), “Principal Components Regression in Exploratory Statistical Research,” Journal of the American Statistical Association, 60, 234-46.
Mazur, Glenn (1998), “QFD for Service Industries,” in The QFD Handbook, Jack V. ReVell, John W. Moran, and Charles A. Cox, eds. New York: John Wiley, 139-62.
Mittal, Vikas and Wagner Kamakura (2001), “Satisfaction, Repurchase Intent, and Repurchase Behavior: Investigating the Moderating Effects of Customer Characteristics,” Journal of Marketing Research, 38 (1), 131-42.
Oliver, Richard L. (1997), Satisfaction: A Behavioral Perspective on the Consumer. New York: McGraw-Hill.
Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry (1988), “SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality,” Journal of Retailing, 64 (Spring), 12-40.
Rust, Roland T. and Naveen Donthu (2003), “Addressing Multi-collinearity in Customer Satisfaction Measurement,” working paper, Robert H. Smith School of Business, University of Maryland at College Park.
Ryan, Michael J., Robert Rayner, and Andy Morrison (1999), “Diagnosing Customer Loyalty Drivers: Partial Least Squares vs. Regression,” Marketing Research, 11 (Summer), 19-26.
Saaty, T. L. (1980), The Analytic Hierarchy Process. New York: McGraw-Hill.
Scott, James and Peter Wright (1976), “Modeling an Organization Buyer’s Product Evaluation Strategy,” Journal of Marketing Research, 13 (May), 211-24.
Steenkamp, Jan-Benedict E. M. and Hans C. M. van Trijp (1996), “Quality Guidance: A Consumer-Based Approach to Food Quality Improvement Using Partial Least Squares,” European Review of Agricultural Economics, 23, 195-215.
Wold, Hermann (1966), “Estimation of Principal Components and Related Models by Iterative Least Squares,” in Multivariate Analysis: Proceedings of an International Symposium Held in Dayton, Ohio, P. R. Krishnaiah, ed. New York: Academic Press, 391-420.