Clustering 115 Hop Varieties

Finding “Close Siblings” and “Distant Cousins”

TL;DR: I used fancy computer things to group hops by the similarity of their oil composition. Nine groups of hops came out of it. These groups could be used to choose which hops to trial. The most similar and least similar hops are listed by hop name. Four hop varieties were selected, and four beers will be made!


Descriptions of hops online are often not persuasive enough to get me to try them out in a beer. If you’re like me, you might go by the short description, a word-of-mouth suggestion, or maybe even the name of the hop. Some hop varieties have an abundance of descriptive information available online that seems rather reliable. For example, I do believe that Citra expresses “citrus” after reading a lot of reviews. But what about the hops that have short three-word descriptions? This is what this exercise has been all about: those rarely used hops that people just don’t talk about or seem to use. Are you limiting yourself by using only your favorite hops? Is your favorite hop still out there, and you just haven’t used it yet because it has a horrible name that scares people away? Merkur? Smaragd? Dr. Rudi? Should these rarely talked about hops just die a slow death?

My hypothesis: probably not.

Therefore, I decided to *cluster* hop varieties (no pun intended), using some readily available information from the YCH website. Hop oil composition seems to be the most readily available and most talked about data of hops (other than sensory descriptions), so these oils will be the focus.

Available Data:

The YCH website is free. It has a good amount of information on hops and their constituent oils. Hops have hundreds of compounds that all, in some way or another, contribute to the final expression of that unique hop in a beer. In this data, we only have 11 measurements. These 11 seem to be the most used and most widely reported measures, and may help describe not only the character of bitterness in the beer, but also the final flavors and *maybe* some aromas.

Hop Measurements Used:

| Hop Measurement | Description | Unit |
| --- | --- | --- |
| Total Oil | the total amount of oil present in the sample | mL/100g |
| Alpha Acid | acids that isomerize during the boil that bring bitterness in beer | % of Weight |
| Beta Acid | acids that boil off quickly, and may be better utilized in lower temperature situations or as a first wort contribution | % of Weight |
| B-Pinene | piney and/or spicy | % of Total Oil |
| Caryophyllene | “woody” | % of Total Oil |
| Cohumulone | higher percentages are commonly known to impart a “harsher bitterness” although the validity of this claim is sometimes disputed | % of Alpha Acid |
| Farnesene | floral | % of Total Oil |
| Geraniol | floral, sweet, or rose | % of Total Oil |
| Humulene | woody and/or piney | % of Total Oil |
| Linalool | floral or orange | % of Total Oil |
| Myrcene | green, resinous, or herbal | % of Total Oil |

Questions I Want Answered:

  1. For each hop, what are the hops most like it?
  2. For each hop, what are the hops least like it?
  3. Which hops would make an interesting experiment based on the results?


For the nitty gritty:

Github: link

RPubs: link

Condensed Overview of My Process:

  1. Got data from YCH Hops website
  2. Cleaned the data, i.e. deleted hop blends and hops with a large percentage of unavailable data
  3. Imputed missing data for variables that have <10% missing values (used aregImpute via Hmisc package in R)
  4. Calculated the distance between hop varieties (scaled distance)
  5. Used k-means clustering algorithm to find grouping of similar hops (I chose the gap statistic method to derive number of clusters)
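Steps 4 and 5 can be sketched roughly as follows. This is a stdlib-only Python stand-in, not the R code from the repo: the hop names and numbers are invented, only three measurements are used instead of 11, and the number of clusters is fixed by hand rather than chosen with the gap statistic.

```python
import math
import random

# Hypothetical measurements (alpha acid %, total oil mL/100g, myrcene % of oil).
# Names and numbers are invented for illustration, not taken from YCH.
hops = {
    "hop_a": [12.0, 1.5, 50.0],
    "hop_b": [13.0, 1.7, 55.0],
    "hop_c": [12.5, 1.6, 52.0],
    "hop_d": [4.0, 0.6, 20.0],
    "hop_e": [4.5, 0.7, 25.0],
    "hop_f": [5.0, 0.5, 22.0],
}


def zscore_columns(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / (len(col) - 1))
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    """Straight-line distance between two scaled hops."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen hops
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


names = list(hops)
scaled = zscore_columns([hops[n] for n in names])
labels = dict(zip(names, kmeans(scaled, k=2)))
print(labels)
```

Scaling first matters because the units differ so much: without it, a percentage that ranges over tens of points would swamp total oil, which only ranges over a few mL/100g.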


  1. Figure 1 below shows the results of the clustering (high-res link). The overlaps suggest that some hops could plausibly fit in another group. Since k-means already assigns each hop to exactly one group, I’ll have to look into an algorithm that allows overlap (e.g. neo-kmeans). For the purposes of trialing hops, I think this is adequate for now.
  2. Group Tables further below show the members of the 9 clusters of hops.
  3. If you so wish, download a package (xlsx) of the entire rankings, groups, and distances.

Figure 1:

[Image: high-res cluster plot of the hop varieties] (high-res link)

Group Tables:

| Group 1 | Group 2 | Group 3 |
| --- | --- | --- |
| Columbus | Bravo | Blanc |
| Ekuanot HBC366 | Brewers Gold (US) | Bobek |
| Ella | Centennial | Mandarina Bavaria |
| Galaxy | Chinook | Motueka |
| Magnum (US) | Comet | Riwaka |
| Millennium | Galena | Super Pride |
| Polaris | HBC 438 | Waimea |
| Summit | HBC 682 | |
| Tomahawk | Mosaic HBC369 | |
| Zeus | Simcoe | |

| Group 4 | Group 5 | Group 6 |
| --- | --- | --- |
| Amarillo | Bramling Cross | Admiral |
| Cascade | East Kent Golding | Ahtanum |
| Citra HBC394 | Fuggle | Bitter Gold |
| Crystal | Golding (US) | Brewers Gold (DE) |
| Horizon | Golding (UK) | Bullion |
| Loral HBC291 | Hallertau (US) | Celeia |
| Merkur | Hallertau Mittelfruh | Cluster |
| Mt. Rainier | Helga | First Gold |
| Nugget (US) | Liberty | Newport |
| Palisade | Mt. Hood | Northern Brewer (US) |
| Rakau | Northdown | Pilgrim |
| Sorachi Ace | Northern Brewer (DE) | Sussex |
| Tahoma | Pacifica | Target |
| Triskel | Perle (US) | |
| Ultra | Perle (DE) | |
| | Savinjski Golding | |
| | Tettnang (US) | |
| | Whitbread Golding | |

| Group 7 | Group 8 | Group 9 |
| --- | --- | --- |
| Chelan | Aramis | Saaz (CZ) |
| Dr. Rudi | Aurora | Santiam |
| Green Bullet | Challenger | Select |
| Herkules | Glacier | Spalt |
| Huell Melon | HBC 431 | Sterling |
| Nelson Sauvin | Hersbrucker | Sylva |
| Nugget (DE) | Kohatu | Tettnang (DE) |
| Pacific Gem | Magnum (DE) | Wai-iti |
| Pacific Jade | Opal | |
| Pioneer | Premiant | |
| Pride of Ringwood | Saphir | |
| Southern Cross | Smaragd | |
| Wakatu | Strisselspalt | |


After looking over the data, some interesting observations can be made. This may be cherry picking, but:

  1. Columbus, Tomahawk, and Zeus (where the term CTZ comes from) were all extremely similar and grouped close together, as should be expected. Also, if you didn’t know, Zeus is actually genetically different from the other two.
  2. Genetically similar hops did cluster together, e.g. Mosaic & Simcoe, Fuggle & Willamette, Centennial & Brewers Gold. But sometimes they did not, e.g. Citra & its ancestors: Hallertau Mittelfruh/Tettnang/Brewer’s Gold/East Kent Golding.
  3. Identical genetics are not necessarily more important than terroir, e.g. Magnum US vs DE, Golding US vs UK, Tettnang US vs DE. I expected genetically identical hops to be next to each other in distance, but this did not turn out to be true. This may be a limitation of the data I used, but it may also be due to the effects of soil composition and nutrition, farming process, and weather.
  4. Group 5 has a notable amount of “noble” varieties of hops, while Group 7 has a high number of hops from Oceania. If I had better knowledge of more hops, I may be able to pick out general characteristics of these groups better.

A troubling issue with the results is the overlap between groups, which suggests I may have to revisit which algorithm I use to compute clusters. I may also have to reduce the number of variables using PCA or some other form of dimension reduction. Considering there may be better ways to cluster, new data to use, or a new way to trial hops (e.g. removing bitterness from the equation), I consider this an ongoing project that will have to be revisited from time to time.

Overall, from my own experiences, I think these are usable results and worthy enough to hatch a few plans. If you want to use this, I’d suggest naming a few hops you like and finding others like them. Remember, these groups take into account all variables provided, which, to my eyes, seem to straddle qualities of bitterness, flavor, and some aroma characteristics. Perhaps I’ll remove total oil, alpha and beta measures to see what the groupings would look like if the hops were used exclusively under isomerization temperatures. Another day perhaps.
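To see why dropping those three measures could reshuffle things, here is a small Python sketch with made-up hops and only five of the measurements. It is an illustration under those assumptions, not a re-run of the actual analysis: the column names, hop names, and values are all hypothetical.

```python
import math

FEATURES = ["total_oil", "alpha", "beta", "myrcene", "linalool"]
BITTERING = {"total_oil", "alpha", "beta"}  # the measures proposed for removal

# Hypothetical hops: hop_b matches hop_a on bittering, hop_c matches it on aroma.
hops = {
    "hop_a": [2.0, 13.0, 4.0, 50.0, 0.6],
    "hop_b": [2.1, 13.5, 4.2, 30.0, 0.3],
    "hop_c": [1.0, 5.0, 6.0, 49.0, 0.6],
    "hop_d": [1.2, 6.0, 3.0, 25.0, 0.2],
}


def zscore_columns(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / (len(col) - 1))
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def nearest(target, keep):
    """Name of the hop closest to `target`, using only the features in `keep`."""
    idx = [i for i, f in enumerate(FEATURES) if f in keep]
    names = list(hops)
    scaled = dict(
        zip(names, zscore_columns([[hops[n][i] for i in idx] for n in names]))
    )

    def dist(name):
        return math.sqrt(
            sum((x - y) ** 2 for x, y in zip(scaled[target], scaled[name]))
        )

    return min((n for n in names if n != target), key=dist)


with_everything = nearest("hop_a", set(FEATURES))
aroma_only = nearest("hop_a", set(FEATURES) - BITTERING)
print(with_everything, aroma_only)
```

In this toy data the closest match for hop_a switches from the hop with similar alpha acid to the hop with similar oil fractions once the bittering columns are left out, which is the kind of change the re-run would be looking for.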

Future Plans:

At the end of the day, all of this means nothing… unless I can prove it may be a good way to trial hops. I think the best way to use these results is for me to put the damn mouse and keyboard down and make beer. Let’s see if these distance values and groupings make any sort of sense in the real world.

So I’ll make 4 SMaSH (Single Malt and Single Hop) beers using these 4 hops: HBC 682, Pacifica, Helga, & Challenger.

Why these? Well, according to the distance measure, HBC 682 and Pacifica are the “most distant” pair and so perhaps the least like each other. Of these two distant hops, Pacifica has a close sibling, Helga (closer than any hop is to HBC 682). And, of all of the hops, Challenger had the lowest average distance to all other hops, which may lead one to believe it is the “most average” and most like all other hops, a middle-of-the-road kind of gal.
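The three selection rules (most distant pair, closest sibling, lowest average distance) boil down to a few lookups on the distance table. A minimal Python sketch, assuming the data has already been scaled; the hop names and coordinates below are hypothetical placeholders in two dimensions, not the real 11-variable data.

```python
import math

# Hypothetical, already-scaled hops in two dimensions (invented values).
scaled = {
    "hop_v": (-2.0, 1.0),
    "hop_w": (-3.0, 0.0),
    "hop_x": (3.0, 0.0),
    "hop_y": (2.5, 0.5),
    "hop_z": (0.0, 0.0),
}


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Distance for every ordered pair of distinct hops.
dist = {
    (a, b): euclidean(scaled[a], scaled[b])
    for a in scaled
    for b in scaled
    if a != b
}


def most_distant_pair():
    """The two hops with the largest distance between them."""
    return max(dist, key=dist.get)


def closest_sibling(name):
    """The hop with the smallest distance to `name`."""
    return min((b for a, b in dist if a == name), key=lambda b: dist[(name, b)])


def most_average():
    """The hop with the lowest mean distance to all other hops."""
    def mean_distance(name):
        others = [d for (a, _), d in dist.items() if a == name]
        return sum(others) / len(others)

    return min(scaled, key=mean_distance)


far_a, far_b = most_distant_pair()
print(far_a, far_b, closest_sibling(far_b), most_average())
```

Here the two extremes play the role of HBC 682 and Pacifica, the hop sitting next to one of them plays Helga, and the hop in the middle of everything plays Challenger.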

Please let me know what you think via message or comment!

Next up: recipe building!

Part 2: Click here!

4 thoughts on “Clustering 115 Hop Varieties”

  1. “Perhaps I’ll remove total oil, alpha and beta measures to see what the groupings would look like if the hops were used exclusively under isomerization temperatures.” – what do you mean when you refer to “under isomerization temperatures”? Are you saying, when utilized at temperatures below isomerization (sub-170°F)?


    1. Yep, under 170 F. The groupings now cluster partially based on alpha acid percentage (so higher alpha hops may be in the same group, unless the other variables highly correlate a lower alpha hop with those higher alpha hops). At the end of the day, if I want a hop similar to Simcoe for dry hopping or whirlpool, I don’t necessarily desire a hop that has similar alpha acid percentage, just flavor and aroma contribution, since I can always adjust usage rates to get the same total oil or “pungency” (which is another study altogether). I imagine I’ll re-run this if people actually read my blog.

