Wine appreciation requires language. But the way you use language depends on what you consider to be a “good tasting note.” What is good? What’s the norm?
…writing is a learned activity, no different in that regard from hitting a golf ball or playing the piano. Yes, some people naturally do it better than others. But apart from a few atypical autodidacts (who exist in all disciplines), there’s no practical way to learn to write, hit a golf ball, or play the piano without guidance on many points, large and small. And everyone, even the autodidact, requires considerable effort and practice in learning the norms. The norms are important even to those who ultimately break them to good effect. Bryan A. Garner, Garner’s Modern American Usage (2009, p. 104)
Famous critics and formal tasting systems provide models/norms/reference points. But how good are those norms?
What does “green apple, citrus peel, medium+ acidity” mean, exactly?
Models are useful, but only if we don’t lose touch with what is actually going on. So let’s calibrate our models to reality. What is being written in practice? Do we really know what the norms are, or are we just imagining things?
To the best of our knowledge, this is the first attempt to use NLP (natural language processing) algorithms to find structure in wine notes. Algorithms are tools we can exploit to explore what wine means to us. We are excited by this opportunity to shine new light on wine words (see our last couple of posts for background).
Models and Reality
We’ve built a tool that allows us to cluster wine words based on whether or not they occur in similar contexts. The idea is to understand wine words by analyzing their usage in context: what company does a word keep? For example, are the words oyster and sea used in similar contexts to describe breezy whites?
In case you’re wondering, the tool was easy to build. There are many pre-packaged NLP techniques that can be cobbled together to do what you want. This is the golden age of data exploration, after all! We’re a bit surprised no-one else has done what we’re doing with wine words before, but it’s fun to be first!
Tasting Model Review
We have intuitive notions about how wine descriptors relate. For example, WSET has a guide to wine-word usage based on a few broad categories called the “Systematic Approach to Tasting”. Then there’s the classic “Aroma Wheel” originally due to UC Davis Professor Ann C. Noble. A beautiful new attempt to categorize and visualize wine descriptors is due to WineFolly, see “Wine Descriptors & what they mean”.
These tasting systems / word categorizations are models. If you adhere to one of these models, your wine notes will be somewhat predictable:
If you like the aroma wheel, “blackcurrant” and “cassis” may frequently occur together in your notes to describe intense, fruity red wines.
If you’re following the WSET system, cassis isn’t a “standard word” so perhaps blackcurrant and black cherry would be more likely to occur together.
If WineFolly’s poster is on your wall, I expect a different style of note entirely: this Syrah is “fleshy”, “flamboyant” and “plummy”.
Tasting systems / models / norms — whatever you want to call them, we’re being loose with words here — are important. They’re especially important when you start learning about the conventional tasting wisdom. I want to think of a model as a lens that flattens something complicated and turns it into digestible conceptual chunks.
A map is not the territory it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness. Alfred Korzybski, Science and Sanity (1933, p. 58)
Rethinking the Models: Calibrating to real notes
So the whole point here is that rather than starting with a model, we want to turn things around and start with what the critics are actually writing.
There are many ways of building a tasting system from a collection of wine notes. The questions you have to answer are: how do I summarize the main features of the text? And what does “similar” mean? Is blackberry similar to raspberry if the two words consistently occur in the same sentence? Or are they similar if they consistently have the same neighbouring words?
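To make the two notions of "similar" concrete, here is a minimal sketch in Python. The toy notes are invented for illustration, and the function names (`sentence_cooccurrence`, `neighbour_profiles`, `cosine`) are hypothetical helpers, not the tool behind the plots. The first measure counts how often two words share a note; the second compares the words that sit next to each of them.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Invented toy notes, for illustration only (not from the real collection).
NOTES = [
    "ripe blackberry and fresh acidity",
    "ripe raspberry and fresh acidity",
    "blackberry with raspberry on the nose",
    "smoky oak and firm tannins",
]

def sentence_cooccurrence(notes):
    """Count how often each (alphabetically ordered) word pair shares a note."""
    pairs = Counter()
    for note in notes:
        words = sorted(set(note.split()))
        pairs.update(combinations(words, 2))
    return pairs

def neighbour_profiles(notes, window=1):
    """For each word, count the words found within `window` positions of it."""
    profiles = defaultdict(Counter)
    for note in notes:
        words = note.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    profiles[word][words[j]] += 1
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

cooc = sentence_cooccurrence(NOTES)
profiles = neighbour_profiles(NOTES)

# Notion 1: do the two words turn up in the same note?
same_sentence = cooc[("blackberry", "raspberry")]
# Notion 2: do the two words keep the same company?
same_neighbours = cosine(profiles["blackberry"], profiles["raspberry"])
berry_vs_oak = cosine(profiles["blackberry"], profiles["oak"])

print(same_sentence, round(same_neighbours, 3), round(berry_vs_oak, 3))
```

In this toy collection, blackberry and raspberry share a note only once, yet their neighbour profiles are much more alike than those of blackberry and oak. The two definitions can therefore disagree, which is why the choice matters.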
It’s a bit like the process Bendor Grosvenor et al. rely on in the BBC’s Fake or Fortune: bounce light off the painting’s surface, record what comes back, let algorithms tell you what elemental features make up the composition, then compare the features to your model of what real paintings by the artist should be like.
Down and Dirty with the Algos
OK, back to the business of building wine-word models by looking at what is actually being written.
To start with, let’s focus on a simple kind of lens that may be familiar from a statistics course: PCA (principal component analysis). (If it’s not familiar, it really doesn’t matter; we’re just trying to look clever by mentioning fancy acronyms. A good rule to live by: judge the acronym by the picture it produces. We will meet more lenses and pictures over the next three posts.)
Our input is a high-dimensional set of features which summarise our dataset of tasting notes…
… Think of it as follows: wine contains a vast amount of information, which you boil down to a tasting note. The neural network algorithm boils down a large number of tasting note sentences and represents them as 200-dimensional vectors. The purpose of the lens is to then focus this high-dimensional summary of tasting notes down to three dimensions.
200 dimensions are hard to keep in your head, but we can visualize three dimensions on a simple plot: two axes plus a third dimension represented on a colour scale. Different lenses highlight different facets. So depending on which looking glass we use, we’ll get a different perspective. Some may be more useful / intuitive than others.
- Wines -> tasted by critics
- Critics write tasting notes -> critics publish tasting notes
- Database of tasting notes -> algo summarizes main features of wine note collection
- 200-dimensional representation of tasting vocab -> lens focuses it into 3D
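The last step of the pipeline above, the lens, can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the word vectors are random stand-ins (10 dimensions instead of the real high-dimensional ones, so that pure Python stays fast), and the PCA is computed by power iteration with deflation simply to avoid dependencies. In practice you would more likely reach for a library such as scikit-learn. All helper names are hypothetical.

```python
import random
from math import sqrt

def centre(rows):
    """Subtract each column's mean so every feature is centred on zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centred):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(row[i] * row[j] for row in centred) / (n - 1)
             for j in range(d)] for i in range(d)]

def mat_vec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def leading_eigenpair(matrix, iters=500, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    norm = sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iters):
        product = mat_vec(matrix, vec)
        norm = sqrt(sum(x * x for x in product))
        if norm == 0.0:  # nothing left to find (zero matrix)
            break
        vec = [x / norm for x in product]
    eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec

def pca(rows, k=3):
    """Project rows onto their first k principal components."""
    centred = centre(rows)
    cov = covariance(centred)
    d = len(cov)
    eigenvalues, components = [], []
    for _ in range(k):
        lam, vec = leading_eigenpair(cov)
        eigenvalues.append(lam)
        components.append(vec)
        # Deflation: remove the component just found, then look for the next.
        cov = [[cov[i][j] - lam * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    coords = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centred]
    return coords, eigenvalues

# Random stand-in vectors: NOT real embeddings of these words.
WORDS = ["berry", "cherry", "tomato", "gooseberry", "cranberry", "strawberry",
         "redcurrant", "raspberry", "blackcurrant", "cassis", "blueberry",
         "mulberry", "blackberry"]
DIMS = 10  # assumption: small stand-in for the real dimensionality
SCALES = [5.0, 3.0, 2.0] + [0.5] * (DIMS - 3)
rng = random.Random(42)
vectors = [[rng.gauss(0.0, s) for s in SCALES] for _ in WORDS]

coords, eigenvalues = pca(vectors, k=3)
for word, (x, y, colour) in zip(WORDS, coords):
    print(f"{word:12s} x={x:6.2f} y={y:6.2f} colour={colour:6.2f}")
```

Each word ends up with three numbers: two for its position on the plot and one for its colour.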
Let’s start by looking at how different kinds of berries feature in our wine note collection as seen through the PCA lens. The size of the bubbles in the plot below indicates how often the descriptor shows up in the collection of notes.
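For the curious, bubble sizes of this kind can be derived with a simple word count. The sketch below is an assumption-laden illustration: the notes are invented, the descriptor list is a hand-picked subset, and scaling bubble area (rather than radius) linearly with frequency is our choice here, not necessarily how the plot was drawn.

```python
from collections import Counter

# Invented toy notes, for illustration only.
NOTES = [
    "Dark cherry, cassis and a hint of raspberry.",
    "Cherry and raspberry, with bright acidity.",
    "Gooseberry and citrus; crisp finish.",
    "Black cherry and blackcurrant.",
]
DESCRIPTORS = {"cherry", "raspberry", "blackcurrant", "cassis", "gooseberry"}

def descriptor_counts(notes, descriptors):
    """Count how often each descriptor appears across all notes."""
    counts = Counter()
    for note in notes:
        for token in note.lower().split():
            word = token.strip(".,;:!?\"'()")
            if word in descriptors:
                counts[word] += 1
    return counts

def bubble_areas(counts, max_area=1000.0):
    """Scale counts so the most frequent descriptor gets `max_area`."""
    top = max(counts.values())
    return {word: max_area * n / top for word, n in counts.items()}

counts = descriptor_counts(NOTES, DESCRIPTORS)
areas = bubble_areas(counts)
for word, n in counts.most_common():
    print(f"{word:12s} count={n} area={areas[word]:.0f}")
```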
Through the Looking Glass
The horizontal axis on the graph is the most telling. The further apart two words are, the more dissimilar they are through our lens. In particular, tomato (yes, a tomato is technically a berry) and gooseberry are out on their own. You’d expect this, right?
Note that even though tomato and gooseberry are close together in the picture, their colours are quite different. (This means that in a 3-D plot, they’d be far apart.)
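The point about colour acting as a third dimension is easy to check with a little arithmetic. The coordinates below are hypothetical, chosen only to mimic two bubbles that sit close together on the page but have very different colours. They are not read off the actual plot.

```python
from math import dist

# Hypothetical (x, y, colour) coordinates, not values from the real plot.
tomato = (4.0, 1.0, -3.0)
gooseberry = (4.5, 1.2, 3.0)

# Distance on the flat picture: ignore the colour coordinate.
flat_distance = dist(tomato[:2], gooseberry[:2])
# Distance in the full 3-D space: colour counts as a third axis.
full_distance = dist(tomato, gooseberry)

print(round(flat_distance, 2), round(full_distance, 2))
```

With these made-up numbers the flat distance is about 0.54, while the 3-D distance is about 6.02: neighbours on paper, strangers in space.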
“Berry” and “cherry” also stand out. Both are more generic than the other descriptors. In particular, cherry is usually divided up into “black” or “red”. So effectively we are combining two quite distinctive descriptors into one here, which explains why it stands out so vibrantly.
Cranberry, strawberry and redcurrant are all red and fresh and somewhat young Pinot-like, but why is raspberry not in this cluster as I would have expected? According to our lens, raspberry is closer to blackcurrant and cassis than to strawberry.
That’s it for today. Feel free to use this picture to inform your use of “berry”. For example, you could separate the graph above into 4-5 clusters:
When you’re writing a note and you’ve used the words “blueberry” and “blackcurrant” then you’ve covered most of the space on this picture already. So perhaps it’s time to move on to secondary flavour descriptors? On the other hand, perhaps you want to reinforce the impression that this is where the flavour’s at? In that case, why not add “cherry” and “mulberry” to the mix?
See you next time for more wine words in context!
In the meantime you may enjoy (if you haven’t already) Making Peace in the Language Wars, and Tense Present: Democracy, English, and the Wars over Usage. That’s really what this is all about.
Additional Material (added on 03.04)
In response to questions on PCA and variance explained (see comments).
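For reference, here is the usual textbook definition of "variance explained" (a standard formula, not figures from our dataset). Write the eigenvalues of the covariance matrix of the d-dimensional word vectors as λ_1 ≥ λ_2 ≥ … ≥ λ_d. The share of the total variance captured by the first k principal components is then:

```latex
\text{variance explained by the first } k \text{ components}
  = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{d} \lambda_j},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0
```

In this post d = 200 and k = 3. A ratio close to 1 would mean the three-dimensional picture loses little; a small ratio would mean much of the structure lives in the dimensions we cannot see.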