To provide a penalty analysis of a consumer data set, that is, to investigate how liking or acceptability of a product decreases when product attributes are not at their optimal intensity.
Penalty Analysis (PA) is a popular consumer science analysis method. It examines how liking or acceptability of a product decreases when product attributes are not at the optimal intensity. Liking is typically measured on a 7- or 9-point scale, while attribute intensities are measured using either just-about-right (JAR) or check-all-that-apply (CATA; binary) scales. Note that both liking and product attribute data are provided by the same consumer.
JAR questions ask the consumer to rate the intensity of a specified attribute in a product on an agreement scale, the most common being a five-point scale: much too little, too little, just about right, too much and much too much. Although other variants exist (e.g., 7-point JAR), all pivot around a central about right response. For CATA data the scale is binary: the attribute is either perceived (1) or absent (0).
PA works by first assessing the distribution of JAR/CATA responses for each level of the respective scale: the absolute number and percentage of consumers who responded at each level are calculated. For JAR data, some pre-processing follows: it is most common to merge categories to form three final levels for subsequent analysis: too little, about right and too much. Thus, the levels much too little and too little are collapsed into one level of response, as are too much and much too much into another. The same applies to the variants of the JAR scale: levels are collapsed to form three levels. EyeOpenR accepts a variety of JAR scales and will pre-process the data automatically. No pre-processing is required for CATA data.
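As an illustration, the collapsing and tallying step might be sketched as follows. This is a minimal Python sketch with hypothetical response codes and data, not EyeOpenR's actual implementation:

```python
from collections import Counter

# Hypothetical 5-point JAR responses for one attribute of one product
# (1 = much too little, 2 = too little, 3 = just about right,
#  4 = too much, 5 = much too much)
jar = [2, 3, 3, 1, 4, 3, 3, 5, 2, 3, 3, 4]

# Collapse to three levels: 1-2 -> too little, 3 -> about right, 4-5 -> too much
collapse = {1: "too little", 2: "too little", 3: "about right",
            4: "too much", 5: "too much"}
levels = [collapse[r] for r in jar]

counts = Counter(levels)                                   # absolute numbers
proportions = {lvl: n / len(levels) for lvl, n in counts.items()}  # percentages
```

With the data above, `counts` gives 3 consumers at too little, 6 at about right and 3 at too much, i.e. proportions of 25%, 50% and 25%.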
After collapsing the JAR data to three levels, the mean liking of a product is calculated for each of the three levels. For example, the mean liking is obtained for all consumers who report Product X to have too little of attribute Y, about right intensity of Y, and too much of attribute Y. There are now three mean scores per attribute, per product. The same protocol applies to CATA data: mean liking is compared between those who report an attribute as present vs. absent. Thus, for CATA data there are two mean scores per attribute, per product.
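The per-level means can be sketched as below, again in Python with hypothetical paired data (collapsed JAR level and liking on a 9-point scale for each consumer):

```python
from statistics import mean

# Hypothetical consumer data for one product/attribute:
# (collapsed JAR level, liking on a 9-point scale)
responses = [
    ("too little", 5), ("too little", 6),
    ("about right", 8), ("about right", 7), ("about right", 9),
    ("too much", 6), ("too much", 5),
]

# Group liking scores by collapsed level
by_level = {}
for level, liking in responses:
    by_level.setdefault(level, []).append(liking)

# Three mean scores per attribute, per product
mean_liking = {level: mean(scores) for level, scores in by_level.items()}
```

Here `mean_liking` is 5.5 for too little, 8.0 for about right and 5.5 for too much.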
PA for JAR data continues by calculating the difference in mean liking between about right and the two non-optimal means. These differences are known as penalties or mean drops in the literature. A weighted penalty is also computed, which multiplies the proportion of consumers in the non-optimal category by the mean drop. For CATA data, the difference in means is known as the mean impact: it represents the difference in liking due to an attribute being present vs. absent in a product.
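Given the per-level means and proportions, the penalties and weighted penalties follow directly. A minimal sketch with hypothetical summary numbers:

```python
# Hypothetical summary for one attribute of one product
mean_liking = {"too little": 5.5, "about right": 8.0, "too much": 6.0}
proportion = {"too little": 0.25, "about right": 0.55, "too much": 0.20}

# Mean drop (penalty): about-right mean minus the non-optimal mean
mean_drop = {lvl: mean_liking["about right"] - mean_liking[lvl]
             for lvl in ("too little", "too much")}

# Weighted penalty: proportion of consumers in the non-optimal
# category multiplied by the mean drop
weighted_penalty = {lvl: proportion[lvl] * mean_drop[lvl] for lvl in mean_drop}
```

For these numbers the mean drops are 2.5 (too little) and 2.0 (too much), giving weighted penalties of 0.625 and 0.4 respectively.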
In summary, there are two important steps in PA: firstly, calculating the distribution of responses across the respective scale used; secondly, calculating the mean drop/impact (the difference in liking) for consumers responding too little/too much vs. about right (JAR), or present vs. absent (CATA). Regarding proportions, in the case of JAR it is unreasonable to expect all consumers to rate an attribute (e.g., sweetness) as about right. A general rule of thumb is that no more than 20% should report either too little or too much, and that these two proportions should be roughly the same size (i.e., both less than 20% and both approximately equal). If fewer consumers than this threshold report non-optimal intensity then, in general, the attribute is deemed to be at a sufficiently about right level for the majority of consumers.
If more than a chosen threshold report non-optimal intensity, then it is important to understand whether there is a difference in liking between this group and those reporting about right. In general, a mean drop of 1 point or more (on a 9-point hedonic scale) is interpreted as business-relevant, although this varies. Nonetheless, a relatively high mean drop combined with a high proportion of consumers reporting non-optimal intensity is cause for concern.
For CATA data, it is unreasonable to expect that 0% or 100% of consumers report that a product has a particular attribute, as the meaning of that attribute differs between consumers. Nevertheless, the same principles apply as with JAR data: first, one inspects the distribution of the proportions of absent/present responses. A 20% threshold is often used, meaning that at least 20% of consumers must perceive an attribute as present for penalties to be interpreted. A large mean impact indicates that the attribute being present/absent is important for consumer liking; it is crucial to correctly interpret the means of the absent and present categories to determine the direction of the effect.
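The CATA calculation can be sketched in the same style. The sign of the mean impact carries the direction: positive when presence of the attribute is associated with higher liking, negative when presence is associated with lower liking (hypothetical data):

```python
from statistics import mean

# Hypothetical CATA data for one attribute of one product:
# (attribute perceived as present? 1/0, liking on a 9-point scale)
responses = [(1, 7), (1, 8), (1, 8), (0, 5), (0, 6), (1, 7), (0, 4), (1, 8)]

present = [liking for flag, liking in responses if flag == 1]
absent = [liking for flag, liking in responses if flag == 0]

# Proportion perceiving the attribute (compare against the 20% threshold)
prop_present = len(present) / len(responses)

# Mean impact: mean liking when present minus mean liking when absent
mean_impact = mean(present) - mean(absent)
```

Here 62.5% of consumers perceive the attribute and the mean impact is +2.6, i.e. presence of the attribute is associated with higher liking.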
One popular way to present PA results is to visualize proportion and mean drop by plotting the two on the X and Y axes respectively, per product. This provides diagnostic information as to how one could improve a product’s characteristics. Attributes in the upper right quadrant are of concern: a high proportion of consumers (X-axis) report a non-optimal intensity level and the respective mean drop/impact is high (Y-axis). Different companies have their own action standards regarding what defines this area, which is sometimes referred to as a ‘danger zone’ or ‘critical corner’, but in general a proportion of more than 20% (X-axis) and a mean drop of more than 1 point (Y-axis) is interpreted as unsatisfactory.
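Flagging attributes that fall in this quadrant amounts to a simple pair of threshold checks. A sketch, using the illustrative 20%/1-point action standard described above and hypothetical per-attribute summaries:

```python
# Hypothetical per-attribute summaries for one product:
# (proportion reporting non-optimal intensity, mean drop in liking)
results = {
    "sweetness too little": (0.35, 1.4),
    "bitterness too much":  (0.15, 1.8),
    "thickness too little": (0.28, 0.6),
}

# Illustrative action standard: more than 20% of consumers (X-axis)
# and a mean drop of more than 1 point (Y-axis)
danger_zone = [attr for attr, (prop, drop) in results.items()
               if prop > 0.20 and drop > 1.0]
```

Only "sweetness too little" meets both criteria here; the other two fail on one axis each.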
As a result of PA it may be tempting to conclude that, to improve liking, the intensity needs to change in accordance with consumer feedback. Note, however, that from a developer’s perspective things are never this simple: firstly, if a high proportion rates the intensity as, say, “too little”, then increasing the intensity will likely affect the proportion who reported “just right”; secondly, in the domain of food products, attributes show multi-collinearity (i.e., they are correlated and unlikely to be independent); lastly, in a consumer test where liking and descriptors are rated in quick succession there is likely to be a halo effect, whereby the degree of liking colours the perceived intensity of several attributes. Nevertheless, with these cautions in mind, PA continues to be widely adopted in the industry.
A recent addition to PA is to collect scores for an ideal product, that is, consumers additionally rate whether an attribute is present or absent for a hypothetical ideal product. When working with CATA data, there are four possible situations: the attribute is present in both the test and the ideal product; present in the test product but not in the ideal; absent in the test product but present in the ideal; or absent in both.
The analysis then proceeds as described in the preceding section: proportions of each response level are calculated, followed by the calculation of mean drops.
As a recap: when an attribute is present in a test product but not in the ideal product, the test product has too much of said attribute; likewise, when an attribute is not present in a test product but is in the ideal product, the test product has too little of said attribute; when the test and ideal match, the attribute can be thought of as about right (JAR).
Use of R packages: car, SensoMineR