Finding “Close Siblings” and “Distant Cousins”
TL;DR: I used fancy computer things to group hops with similar oil composition. Nine groups of hops came out of it. These groups could be used when selecting hops to trial. The most similar and least similar hops are listed by hop name. Four hop varieties were selected, and four beers will be made!
- Github: link
- RPubs: link
- Part 2: Recipe Building
- Part 3!: New Methodology, & Looking at Bittering vs. Flavor/Aroma
Descriptions of hops online are often not persuasive enough to get me to try them out in a beer. If you’re like me, you might go by the short description, a word-of-mouth suggestion, or maybe even the name of the hop. Some hop varieties have a huge abundance of descriptive information available online that seems rather reliable. For example, I do believe that Citra expresses “citrus” after reading a lot of reviews. But what about the hops that have short three-word descriptions? That is what this exercise is all about: those rarely used hops that people just don’t seem to talk about or use. Are you limiting yourself by using only your favorite hops? Is your favorite hop still out there, and you just haven’t used it yet because it has a horrible name that scares people away? Merkur? Smaragd? Dr. Rudi? Should these rarely talked-about hops just die a slow death?
My hypothesis: probably not.
Therefore, I decided to *cluster* hop varieties (no pun intended), using some readily available information from the YCH website. Hop oil composition seems to be the most readily available and most talked about data on hops (other than sensory descriptions), so these oils will be the focus.
The YCH website is free. They have a good amount of information on hops and their constituent oils. Hops have hundreds of compounds that all, in some way or another, contribute to the final expression of that unique hop in a beer. In this data, we only have 11 measurements. These 11 seem to be the most used and most prevalent measures, and may help describe not only the character of bitterness in the beer, but also the final flavors and *maybe* some aromas.
Hop Measurements Used:
|Measurement||Description||Unit|
|Total Oil||the total amount of oil present in the sample||mL/100g|
|Alpha Acid||acids that isomerize during the boil and bring bitterness to beer||% of Weight|
|Beta Acid||acids that boil off quickly, and may be better utilized in lower temperature situations or as a first wort contribution||% of Weight|
|B-Pinene||piney and/or spicy||% of Total Oil|
|Caryophyllene||“woody”||% of Total Oil|
|Cohumulone||higher percentages are commonly known to impart a “harsher bitterness” although the validity of this claim is sometimes disputed||% of Alpha Acid|
|Farnesene||floral||% of Total Oil|
|Geraniol||floral, sweet, or rose||% of Total Oil|
|Humulene||woody and/or piney||% of Total Oil|
|Linalool||floral or orange||% of Total Oil|
|Myrcene||green, resinous, or herbal||% of Total Oil|
Questions I Want Answered:
- For each hop, what are the hops most like it?
- For each hop, what are the hops least like it?
- Which hops would make an interesting experiment based on the results?
For the nitty gritty:
Condensed Overview of My Process:
- Got data from YCH Hops website
- Cleaned the data, i.e. deleted hop blends and hops with a large percentage of unavailable data
- Imputed missing data for variables that have <10% missing values (used aregImpute via Hmisc package in R)
- Calculated the distance between hop varieties (scaled distance)
- Used k-means clustering algorithm to find grouping of similar hops (I chose the gap statistic method to derive number of clusters)
- Figure 1 below shows the results of the clustering (high-res link). The overlaps suggest that some hops could possibly fit in another group. Since k-means assigns each hop to exactly one group, I’ll have to look into an overlapping clustering algorithm (e.g. neo-kmeans) or a soft clustering algorithm. For the purposes of trialing hops, I think this is adequate for now.
- The group tables further below show the members of the 9 clusters of hops.
- If you so wish, download a package (xlsx) of the full rankings, groups, and distances.
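To make the scaling, distance, and k-means steps above a bit more concrete, here is a minimal Python sketch. It is not the R code behind this post. The hop names and numbers are made-up toy values (not YCH data), the distance is assumed to be Euclidean on z-scored columns, and the deterministic farthest-first start is a simplification chosen so the example gives the same answer every run.

```python
import math
from statistics import mean, pstdev

# Toy values, NOT real YCH data: hypothetical hops, three made-up measurements each.
toy_hops = {
    "hop_a": [1.0, 12.0, 40.0],
    "hop_b": [1.1, 11.5, 42.0],
    "hop_c": [2.5, 5.0, 20.0],
    "hop_d": [2.4, 5.5, 22.0],
    "hop_e": [0.5, 8.0, 60.0],
    "hop_f": [0.6, 8.5, 58.0],
}


def scale(rows):
    """Z-score every column so no single measurement dominates the distance."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def farthest_first(points, k):
    """Deterministic start (an assumption of this sketch): begin with the first
    point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations: assign each point to its nearest center, then
    move each center to the mean of its members, until nothing changes."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([mean(col) for col in zip(*members)] if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels


names = list(toy_hops)
scaled = scale([toy_hops[n] for n in names])
labels = kmeans(scaled, k=3)

groups = {}
for name, lab in zip(names, labels):
    groups.setdefault(lab, []).append(name)
print(groups)
```

Each hop ends up with exactly one label, which is what makes k-means a hard clustering method. Choosing the number of groups (the gap statistic in this post) is a separate step that the sketch leaves out by fixing `k=3`.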
|Group 1||Group 2||Group 3|
|Ekuanot HBC366||Brewers Gold (US)||Bobek|
|Group 4||Group 5||Group 6|
|Cascade||East Kent Golding||Ahtanum|
|Citra HBC394||Fuggle||Bitter Gold|
|Crystal||Golding (US)||Brewers Gold (DE)|
|Loral HBC291||Hallertau (US)||Celeia|
|Mt. Rainier||Helga||First Gold|
|Palisade||Mt. Hood||Northern Brewer (US)|
|Sorachi Ace||Northern Brewer (DE)||Sussex|
|Group 7||Group 8||Group 9|
|Huell Melon||HBC 431||Sterling|
|Nugget (DE)||Kohatu||Tettnang (DE)|
|Pacific Gem||Magnum (DE)||Wai-iti|
|Pride of Ringwood||Saphir|
After looking over the data, some interesting observations can be made. This may be cherry picking, but:
- Columbus, Tomahawk, and Zeus (where the term CTZ comes from) were all extremely similar and grouped close together, as should be expected. Also, if you didn’t know, Zeus is actually genetically different from the other two.
- Genetically similar hops did cluster together, e.g. Mosaic & Simcoe, Fuggle & Willamette, Centennial & Brewers Gold. But sometimes they did not, e.g. Citra & its ancestors: Hallertau Mittelfruh/Tettnang/Brewer’s Gold/East Kent Golding.
- Identical genetics are not necessarily more important than terroir, e.g. Magnum US vs DE, Golding US vs UK, Tettnang US vs DE. I expected genetically identical hops to be next to each other in distance, but this did not turn out to be true. This may be a limitation of the data I used, but it may also be due to the effects of soil composition and nutrition, farming process, and weather.
- Group 5 has a notable amount of “noble” varieties of hops, while Group 7 has a high number of hops from Oceania. If I had better knowledge of more hops, I may be able to pick out general characteristics of these groups better.
A troubling issue with the results is the group overlap, which suggests I may have to revisit which algorithm I use to compute clusters. I may also have to reduce the number of variables using PCA or some other form of dimension reduction. Since there may be better ways to cluster, new data to use, or a new way to trial hops (e.g. removing bitterness from the equation), I consider this an ongoing project that will have to be revisited from time to time.
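For readers curious what the PCA idea would involve, here is a small Python sketch that finds the first principal component by power iteration on the covariance matrix. It is an illustration only, not something run for this post. The rows are made-up toy numbers, the function names are my own, and in practice you would feed it the scaled hop measurements.

```python
import math


def covariance(rows):
    """Sample covariance matrix of the columns (rows are observations)."""
    n, d = len(rows), len(rows[0])
    mus = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mus)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=500):
    """Power iteration: repeatedly multiply a vector by the covariance matrix
    and renormalize; it settles on the direction of largest variance.
    Assumes the start vector is not orthogonal to that direction."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


# Toy rows, NOT hop data: the first two columns move together, the third is noise.
toy_rows = [
    [1.0, 2.0, 0.3],
    [2.0, 4.1, 0.1],
    [3.0, 5.9, 0.2],
    [4.0, 8.0, 0.4],
]
cov = covariance(toy_rows)
direction, variance = first_component(cov)
share = variance / sum(cov[i][i] for i in range(len(cov)))
print("direction:", [round(x, 3) for x in direction])
print("share of total variance:", round(share, 3))
```

When a single component carries most of the variance, as it does in this toy data, several of the original variables are telling the same story, and clustering on a few components instead of every variable becomes a reasonable thing to try.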
Overall, from my own experience, I think these are usable results, good enough to hatch a few plans. If you want to use this, I’d suggest naming a few hops you like and finding others like them. Remember, these groups take into account all variables provided, which, to my eyes, seem to straddle qualities of bitterness, flavor, and some aroma characteristics. Perhaps I’ll remove the total oil, alpha, and beta measures to see what the groupings would look like if the hops were used exclusively under isomerization temperatures. Another day perhaps.
At the end of the day, all of this means nothing… unless I can show it is a good way to trial hops. I think the best way to use these results is for me to put the damn mouse and keyboard down and make beer. Let’s see if these distance values and groupings make any sort of sense in the real world.
So I’ll make 4 SMaSH (Single Malt and Single Hop) beers using these 4 hops: HBC 682, Pacifica, Helga, & Challenger.
Why these? Well, according to the distance measure, HBC 682 and Pacifica are the “most distant” pair and perhaps the least like each other. Of these two distant hops, Pacifica has a close sibling: Helga (closer than any relative of HBC 682). And, of all of the hops, Challenger had the lowest average distance to all other hops, which may lead one to believe it is the “most average” and most like all other hops, a middle of the road kind of gal.
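The three picks above (most distant pair, closest sibling, most average hop) all fall out of a pairwise distance table. Here is a minimal Python sketch of that logic. The hop names and coordinates are made-up placeholders standing in for scaled measurements, not the real data or the real results.

```python
import math
from itertools import combinations

# Toy coordinates, NOT real hop data: hypothetical hops in a made-up scaled space.
toy = {
    "hop_a": (0.0, 0.0),
    "hop_b": (0.0, 2.0),
    "hop_c": (5.0, 0.0),
    "hop_d": (2.0, 1.0),
    "hop_e": (-3.0, 0.0),
}


def distance(first, second):
    """Euclidean distance between two hops, looked up by name."""
    return math.dist(toy[first], toy[second])


def nearest(name):
    """The 'close sibling': the other hop with the smallest distance."""
    return min((h for h in toy if h != name), key=lambda h: distance(name, h))


def mean_distance(name):
    """Average distance from one hop to every other hop."""
    others = [h for h in toy if h != name]
    return sum(distance(name, h) for h in others) / len(others)


# The "distant cousins": the pair with the largest distance between them.
farthest_pair = max(combinations(toy, 2), key=lambda pair: distance(*pair))
# The "most average" hop: the one with the lowest mean distance to the rest.
most_average = min(toy, key=mean_distance)

print("most distant pair:", farthest_pair)
print("closest sibling of", farthest_pair[0], "is", nearest(farthest_pair[0]))
print("most average hop:", most_average)
```

Swapping the toy dictionary for real scaled measurements would give the same kind of ranking as the xlsx package, assuming a Euclidean distance.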
Please let me know what you think via message or comment!
Next up: recipe building!
Part 2: Click here!