How Do Cuppers Cup? Evaluating and Evolving Elements of the SCA Cupping Protocol | 25, Issue 18

Dr. JORGE BERNY and Dr. MARIO FERNÁNDEZ-ALDUENDA share initial results of a collaborative study examining how cuppers cup and exploring the potential impacts of a proposed component of the reengineered cupping protocol.

If you survey a group of coffee cuppers on whether the SCA Cupping Protocol and Form are subjective or objective (as the SCA did in its 2020–2021 perception study), you’ll find opinions are divided. Roughly half the respondents felt the form is mainly subjective, half categorized it as mainly objective, and quite a few sat at either extreme. Deeper conversation in interviews, however, turned up a useful paradigm for understanding why opinions are so mixed: “intersubjectivity.” The prevailing perception, that the protocol is objective when cuppers are well trained (or “calibrated”), suggests that cuppers’ results reflect neither an objective reality nor each cupper’s individual preferences. Instead, cuppers strive to evaluate coffees based on a collective, but context-dependent, understanding of which attributes are desirable or undesirable.

But the tools we use to evaluate coffee, our senses, are human and so are, by nature, subjective; we are subject to cultural norms, to personal preferences, and to our environment in that specific moment of coffee evaluation, particularly if we are assessing quality as opposed to describing flavor. Is it possible to “train away” our humanity to achieve a singular norm across all cuppers’ impressions of quality? In short, no: any attempt to regulate perceptions of “good” and “bad” merely substitutes one set of preferences for another.

Advances in sensory science offer us a path forward by framing objectivity and subjectivity through two separate kinds of tests: analytical quality measurements, like taste intensity or body level, are considered objective; value judgment (like grade, preference, liking, or acceptability) is considered subjective. In sensory science, these analytical quality measurement tests are known as “descriptive” tests; “value judgment” tests are known as “affective” tests.

As we’ve continued on our journey to evolve the SCA’s Coffee Value Assessment System, we realized that the existing cupping form combines these two types of tests, something sensory science advises against. The following feature, outlining a combined effort between World Coffee Research and the Coffee Science Foundation to understand how the current cupping protocol is used by cuppers, offers us a fascinating glimpse of how these affective (or preference) tests within the existing form determine (or assign) value through scoring. This work by Dr. Jorge Berny and Dr. Mario Fernández-Alduenda sets the stage for a future where we can confidently outline which markets prefer which attributes, an important step in the SCA’s longer journey to improve market access and equity throughout the specialty coffee supply chain.

Mary Basco
Research and Knowledge Development Programs Manager, SCA


The practice of cupping, by and large, is applied as a quality control process: Is this coffee the one I thought I was buying? Is it what I want to buy?

As the Specialty Coffee Association’s cupping protocol has become well established and universally used, other potential users have emerged, from roaster-retailers to researchers. As our industry has expanded its understanding of coffee, sensory science, and value creation, there has never been a better time for us to evaluate and evolve this set of tools. But in addition to thinking through how coffee is evaluated (and what characteristics are considered during evaluation), it’s important to understand who is doing the evaluating, too—and while the wide-reaching user survey and accompanying semi-structured interviews completed in 2020–2021[1] offer incredibly valuable insight, there’s nothing quite like studying how people use a tool in real time.

In these new potential uses for an evolved cupping protocol and form—specifically research conducted by World Coffee Research (WCR) to develop breeding targets—we found an opportunity to collaborate on a research project that would meet the needs of both WCR and the SCA’s Coffee Science Foundation (CSF). For the CSF, this work would give us the opportunity to explore not only how the current form and protocol were being used by cuppers, but also one of the proposed components of a reengineered SCA Coffee Value Assessment System (i.e., cupping protocol and form): a descriptive, “check-all-that-apply” compartment for flavor attributes. For WCR, it would offer an opportunity to explore whether there is a more efficient way to identify emerging coffee varieties with specialty potential quickly and easily.

Evaluating varieties during the breeding process is complicated: you need to check a plant’s performance across several traits, each of which uses a different evaluation methodology. As in other crop breeding research, coffee is evaluated for productivity, resistance to stressors, and quality. But what is “quality”? We’re coming to terms with the idea that “quality” isn’t as singular as we previously thought; ideas about quality are diverse. While cupping may be the method to evaluate quality, it’s not necessarily suited for coffee breeding: critically, because you have to wait many years for a new tree to produce coffee, but also because the cupping process itself is cost- and time-intensive. Breeding programs may need to evaluate hundreds or thousands of individual trees, and cupping is not ideally suited for this “high-throughput” evaluation. From a breeding perspective, we need other ways of quickly checking a plant’s potential for quality, but before we can do that, we need to understand how the existing quality evaluation tools are being used. If all other factors are equal, and the cuppers follow the same protocol, how consistent is the assessment within and between cuppers? Does preference play a large role? If so, what are the cuppers assessing differently, and why? Can the protocol of evaluation be optimized to assess preferences more accurately in a sample evaluation?

So many factors can affect the coffee in a cupping bowl: the variety, climate, agronomic management, harvesting time, processing methods, drying, storage, roasting, preparation (to name a few!). Before we can evaluate how cuppers cup, we (WCR and the CSF) first needed to try to minimize these variables as much as possible. To do this, we centrally roasted 36 wet-processed F1 hybrids (grown at a single farm, Finca Aquiares in Costa Rica) and 3 control samples (grade production wet-processed “blenders” from Costa Rica, Honduras, and Nicaragua) to be prepared as cupping samples with Third Wave Water before being evaluated by 8 highly experienced industry cuppers. To avoid bias, all samples were recoded, including two repetitions of each control sample, and the order of evaluation was randomized for each cupper. The cuppers followed the SCA cupping protocol, and recorded the presence of flavor descriptors and other attributes.
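The blinding and randomization step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the study team actually used: the function name, the three-digit blind codes, and the fixed seed are assumptions, and the sketch assumes that "two repetitions of each control sample" means 36 + 3 × 2 = 42 coded cups per cupper.

```python
import random

def build_serving_plan(n_hybrids=36, n_controls=3, control_reps=2,
                       n_cuppers=8, seed=2023):
    """Assign blind codes to samples and a random serving order per cupper."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible

    # One cup per hybrid, plus repeated cups of each control sample
    # (hypothetical labels; the study's real sample names are not given).
    samples = [f"hybrid_{i + 1}" for i in range(n_hybrids)]
    for c in range(n_controls):
        for rep in range(control_reps):
            samples.append(f"control_{c + 1}_rep{rep + 1}")

    # Unique three-digit blind codes hide sample identity from the cuppers.
    codes = rng.sample(range(100, 1000), len(samples))
    code_of = dict(zip(samples, codes))

    # Each cupper gets the same coded cups in an independent random order.
    orders = {}
    for cupper in range(1, n_cuppers + 1):
        order = list(code_of.values())
        rng.shuffle(order)
        orders[cupper] = order
    return code_of, orders

code_of, orders = build_serving_plan()
print(len(code_of), "coded cups; cupper 1 starts with", orders[1][:5])
```

Because the repeated controls carry different codes, a cupper cannot tell that two cups hold the same coffee, which is what makes a repeatability check possible.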

 

Evidence of Intersubjectivity

Using the replicated checks, we found a high repeatability in total cupping score when the same cupper assessed a repeated sample. As expected, some of the coffees scored higher than others on average (the differences were statistically significant), and different cuppers scored differently from one another (again, differences were statistically significant). This suggests that, although cuppers have different preferences, if a moderate number of evaluations were made of the same samples, we’d likely get a good estimate of the average quality. (A large survey of industry professionals who regularly use the cupping protocol conducted by the SCA in 2020–2021 identified this as “intersubjectivity,” where “cuppers’ results do not reflect an objective reality nor each cupper’s individual hedonic, or liking, reactions to a coffee,” but instead evaluate based on a sense of “cupping criteria” which “define an attribute’s desirability or undesirability for an abstract collectivity,” i.e., “the market.”)[2] Similarly, some cuppers tend to score samples higher or lower overall than other cuppers do.

The data suggest that personal preferences among cuppers can impact how a cupper will evaluate the different varieties. In other words, not all cuppers ranked the same varieties equally highly; in scientific terms, this is called the “variety and cupper interaction,” and it was marginally statistically significant (p = 0.08). While this isn’t a complete surprise, further investigation of preference using a cluster analysis of the total score across cuppers and varieties found two distinct clusters of samples and three groups of cuppers. Figure 1 is a principal component analysis (PCA) biplot showing the two variety preference clusters as well as the projections of the cuppers. Although you’ll note two circles (one blue and one red, indicating clusters of coffee samples), you can also see how cupper preferences are distributed and clustered with the lines emanating from the center: lines close together show cuppers with similar preferences. Here, you can see that cuppers 1 and 7 form a group in terms of their preference (i.e., they scored coffees similarly), cuppers 2, 4, 5, 6, and 8 form another group, and cupper 3 generally scored inversely to the other cuppers.
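To make the idea of "cuppers with similar preferences" concrete, here is a minimal Python sketch that groups cuppers by how strongly their scores correlate. It is a simplified stand-in, not the cluster analysis or PCA used in the study: the greedy correlation-threshold grouping, the 0.5 threshold, the cupper labels, and every score below are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def group_cuppers(scores, threshold=0.5):
    """Greedy grouping: a cupper joins the first group in which every
    member's scores correlate with theirs above the threshold."""
    groups = []
    for cupper, row in scores.items():
        for group in groups:
            if all(pearson(row, scores[other]) > threshold for other in group):
                group.append(cupper)
                break
        else:
            groups.append([cupper])  # no compatible group: start a new one
    return groups

# Made-up total scores for five samples (NOT the study's data).
scores = {
    "cupper_A": [84.0, 85.5, 83.0, 86.0, 82.5],
    "cupper_B": [83.5, 85.0, 82.5, 86.5, 82.0],
    "cupper_C": [85.0, 83.0, 85.5, 82.5, 86.0],  # scores roughly inversely
}
print(group_cuppers(scores))
```

In this toy data, cupper_A and cupper_B rise and fall together, so they land in one group, while cupper_C, who scores inversely, ends up alone, much like the outlying cupper in the study.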

An alternate view of the same data, presented in Figure 2, offers a clearer indication of these preferences—even if it looks a little like a random linkage of Star Wars TIE fighters. Where Figure 1 showed us the clusters of samples as red and blue circles of different scores, Figure 2 shows the scores of each sample cluster across the three different cupper clusters. The bits that look like stretched and condensed TIE fighters (or like many sideways letter “H”s of various widths) indicate the variation in scores for this cupper cluster for each sample cluster. If the H is narrow, there wasn’t a large difference in scores between cuppers; if the H is very stretched, it indicates more variance between scores within the group.

So, what does this different view show us? First, we see that there was a lot more variance in the scores of sample cluster 1 (in blue) than sample cluster 2 (in red). We can also see evidence of preference in the scores: because the red line and the blue line (the mean of the scores for each sample cluster for each cupper cluster) aren’t consistently parallel, we know that not all cuppers preferred the same samples. Not only did the cluster of cuppers 2, 4, 5, 6, and 8 (CL2_4_5_6_8) rate sample cluster 2 more highly than the cluster of cuppers 1 and 7 (CL1_7) did, they also clearly preferred sample cluster 2 overall, as evidenced by the slight upward slope between their two groups in red. More strikingly, cupper 3 preferred the samples in cluster 1: the blue line sits above the red line.

 

Figure 1. Principal component analysis of total cupping score across cuppers and hybrid coffee samples. Each dot or triangle is a unique coffee sample and each line is a cupper. Lines close together show cuppers with similar preferences.

Figure 2. Mean total score comparison by cupper cluster group (horizontal axis, from left to right, cuppers 1 and 7, cuppers 2, 4, 5, 6, and 8, and cupper 3) for each sample cluster (sample cluster 1 in blue; sample cluster 2 in red).

 

 

Road-Testing a Descriptive Approach

It shouldn’t come as too much of a surprise to learn that there are differences in preferences among cuppers (we are human, after all!), but we wondered if it would be possible to learn why these differences exist. Were some cuppers likely to score in line with a clear preference for (or dislike of) a particular flavor attribute? To see if there was any correlation, we asked the cuppers to list descriptors freely, capturing over 200 attributes in total. Of these, we chose attributes that were present in more than 10% of the samples for further evaluation, then subjected them to something known as “stepwise model selection” to find the smallest subset of attributes that explained the most variation across samples. This produced nine attributes: “chocolate,” “floral,” “fruit,” “juicy,” “medium,” “mild,” “milk chocolate,” “rounded,” and “sweet.”
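The first filtering step (keeping only attributes noted in more than 10% of the samples) is simple enough to sketch in Python. The sketch below assumes descriptors are pooled per sample across cuppers, which the article does not specify, and it does not attempt the stepwise model selection that followed. The function name and all sample notes are hypothetical.

```python
from collections import Counter

def frequent_attributes(descriptors_by_sample, min_share=0.10):
    """Return attributes noted in more than min_share of the samples."""
    n_samples = len(descriptors_by_sample)
    counts = Counter()
    for descriptors in descriptors_by_sample.values():
        counts.update(set(descriptors))  # count each attribute once per sample
    return sorted(a for a, c in counts.items() if c / n_samples > min_share)

# Made-up free descriptors for ten samples (NOT the study's data).
notes = {
    "s01": ["fruit", "sweet"],
    "s02": ["fruit", "juicy"],
    "s03": ["chocolate", "sweet"],
    "s04": ["sweet", "tobacco"],
    "s05": ["fruit"],
    "s06": ["chocolate"],
    "s07": ["sweet"],
    "s08": ["juicy", "sweet"],
    "s09": ["fruit", "sweet"],
    "s10": ["chocolate", "fruit"],
}
print(frequent_attributes(notes))
```

Here "tobacco" appears in exactly one of ten samples, which is not more than 10%, so it is dropped before any modeling.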

Next, we wanted to understand how the perception of these attributes impacted scoring. To do this, we compared a linear model for all the cuppers with a model for each subset of the cupper cluster, allowing us to estimate the effect of an attribute on the overall score if all other factors were held constant. In Figure 3, you can see that two of the nine attributes, “mild” and “medium,” have a negative estimate compared to the others: when cuppers noted the presence of these attributes, the overall cupping score was between 0.7 and 1 point lower. The remaining seven attributes (“milk chocolate,” “juicy,” “floral,” “rounded,” “chocolate,” “fruit,” and “sweet”) impacted scores positively, listed here in order from highest impact to lowest. Interestingly, when comparing cupper clusters, “fruit” and “sweet” yielded the most striking differences, with cupper 3 associating “fruit” more negatively and “sweet” more positively than the other cuppers.
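For readers curious about what "the effect of an attribute if all other factors were held constant" means in practice, the Python sketch below fits an ordinary least squares model of total score on 0/1 attribute indicators. It is a bare-bones illustration under stated assumptions, not the authors' model: the helper names are hypothetical, only two attributes are used, and the scores are invented so that "juicy" adds exactly 1.0 point and "mild" subtracts exactly 0.8.

```python
def solve(a, b):
    """Solve a small linear system a.x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def attribute_effects(rows, attributes):
    """Least squares fit of total score on 0/1 attribute indicators."""
    # Design matrix: intercept column plus one indicator per attribute.
    X = [[1.0] + [1.0 if a in noted else 0.0 for a in attributes]
         for noted, _ in rows]
    y = [score for _, score in rows]
    p = len(attributes) + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * s for r, s in zip(X, y)) for i in range(p)]
    return dict(zip(["intercept"] + list(attributes), solve(xtx, xty)))

# Made-up evaluations (NOT study data): (attributes noted, total score).
rows = [
    ({"juicy"}, 85.0),
    ({"mild"}, 83.2),
    ({"juicy", "mild"}, 84.2),
    (set(), 84.0),
    ({"juicy"}, 85.0),
    (set(), 84.0),
]
effects = attribute_effects(rows, ["juicy", "mild"])
print({name: round(value, 2) for name, value in effects.items()})
```

Each coefficient is the estimated change in score when that attribute is noted and the other attribute is unchanged. Fitting the same model separately for each cupper cluster is one way to compare how clusters weigh an attribute.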

To explore this in more depth, we looked at the distribution of scores for all the cupper clusters, comparing the mean scores across all samples specifically for “fruit.” In Figure 4, instead of tracking the sample clusters across the cupper clusters, we analyze how the presence of the attribute “fruit” impacts the total SCA score for the different groups of cuppers (0 = not present, 1 = present). Although, in general, it seems that the samples with the fruit attribute had higher scores overall, the “interactions” of the lines for each cupper cluster tell us a more nuanced story. First, for all cuppers except cupper 3, the presence of “fruit” is associated with higher SCA total scores. However, for cuppers 1 and 7 (indicated in green) the impact of the presence of “fruit” in a coffee shifts the score even higher than for cuppers 2, 4, 5, 6, and 8. Cupper 3, marked in red in Figure 4, is an outlier—they were the only cupper for whom mean scores for a coffee were lower when they marked “fruit” as an attribute present in the coffee sample.
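The comparison behind this kind of interaction plot amounts to four or more group means: the average score with and without the attribute, computed separately for each cupper cluster. A minimal Python sketch follows, with hypothetical cluster labels and invented scores rather than the study's data.

```python
from collections import defaultdict
from statistics import mean

def fruit_effect_by_cluster(evaluations):
    """Mean total score with and without 'fruit', per cupper cluster."""
    buckets = defaultdict(list)
    for cluster, fruit_present, score in evaluations:
        buckets[(cluster, fruit_present)].append(score)
    result = {}
    for cluster in sorted({c for c, _ in buckets}):
        absent = mean(buckets[(cluster, 0)])
        present = mean(buckets[(cluster, 1)])
        # A positive shift means 'fruit' goes with higher scores in this cluster.
        result[cluster] = {"absent": absent, "present": present,
                           "shift": present - absent}
    return result

# Made-up records (NOT study data): (cupper cluster, fruit noted 0/1, total score).
evaluations = [
    ("group_X", 0, 83.0), ("group_X", 0, 84.0),
    ("group_X", 1, 85.0), ("group_X", 1, 86.0),
    ("group_Y", 0, 84.0), ("group_Y", 0, 85.0),
    ("group_Y", 1, 83.0), ("group_Y", 1, 83.0),
]
for cluster, stats in fruit_effect_by_cluster(evaluations).items():
    print(cluster, stats)
```

When the shifts differ in size or sign between clusters, the lines in an interaction plot are not parallel; a negative shift for one cluster corresponds to lines that cross.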

 

Figure 3. Impact of specific attributes on total score by cupper cluster. As an example, an attribute plotted on the right-hand half of the graph for a given cluster will have a positive impact on the cupping scores resulting from that cluster. The circles indicate the mean impact on the score, with the horizontal lines running through them indicating the window of variation.

Figure 4. Effect of the presence of “fruit” flavor in coffee on the total score by cupper cluster. On the left, total score of samples with no “fruit.” On the right, total score of samples with “fruit”—the “fruit” flavor is correlated with higher scores for the green and blue cupper clusters.

 

 

Integrating the Results into the Coffee Value Assessment System

From a WCR perspective, we learned something important: relying on one or very few cuppers for sample evaluation can bias the assessment of samples for overall cupping quality. We also took another step forward in understanding what attributes are valuable to coffee buyers. WCR, the CSF, and the SCA are expanding on this research with a larger study that includes 230 coffee samples and feedback from 40 coffee companies in 20 countries. WCR will compare cupper feedback on these samples with lab tests (chemistry, metabolomics, and near-infrared spectroscopy) to explore possibilities for high-volume, low-cost methods of predicting cup quality.

From the SCA’s perspective, the results of this study confirmed that our plans to evolve the cupping form and protocol are on the right track, if we want to create a system that rewards multiple ideas of quality. Under the specialty coffee industry’s current cupping paradigm, the results from cupper 3 in this study would have been dismissed as coming from a “poor” or at least highly “uncalibrated” cupper. Such an easy dismissal of a cupper’s results when unaligned with the “correct” cupping scores implies a misconception about the nature of cupping results. Cupping results are affective—they do not reflect an objective property of coffee, but the cupper’s judgment about the coffee quality. Such judgment may be more or less “inter-subjective,” inasmuch as the cupper tries to cup on behalf of their market’s preferences as opposed to personal liking, but it remains subjective—and there should not be right or wrong answers for affective judgments. In the evolved cupping paradigm, which we tested here, cupper 3’s scores aren’t thrown out because they’re an outlier. Their preference is not wrong, especially if they can clearly articulate what they like and (extrapolating into the future) value. Cupper 3 is, in fact, a buyer, and perhaps potential coffee suppliers would want to align with cupper 3’s criteria.

This study also confirms the desirability of some of our industry’s favorite flavor attributes: “juicy,” “rounded,” and “chocolaty” coffees were universally rewarded with higher scores. However, some attributes usually considered very desirable, such as “floral” and “fruity,” are not necessarily equally appealing to all cuppers. As this type of study progresses, we shall be able to further characterize the cuppers and their preference drivers, and ultimately be able to say something like “the fruity character is considered positive in Market A, whereas it is not desirable in Market B.” This is one of the ways in which the SCA hopes to rebalance the power of buyers relative to growers and processors: not only by making market information more accessible, but also by embedding mechanisms into coffee quality evaluations that allow for a variety of preferences. ◇


Dr. JORGE C. BERNY MIER Y TERAN is the Breeding and Technical Manager at World Coffee Research. Dr. MARIO R. FERNÁNDEZ-ALDUENDA is the SCA’s Technical Officer.

To learn more about the SCA’s evolution of the existing cupping system into a coffee value assessment system, visit sca.coffee/cupping.


References

[1] Understanding and Evolving the Specialty Coffee Association Coffee Value Assessment System: Results of the 2020–2021 User Perception Study and Proposed Evolution. https://sca.coffee/cupping-perception-study.

[2] https://sca.coffee/cupping-perception-study or https://sca.coffee/cupping-perception-study-es.

