Each year for the NUIG Postgrad Ball (hosted by Microsoc, NUIG’s Microbiology Society), we need to provide wine. We are but poor students, and have no sommelier for the department. So, we set out to find the best wine in Galway in February 2019. The best wine for 7 euro or under, that is. As this is the second year of such an analysis, we have some loose priors from last year’s results that we may be able to incorporate.
We recruited 15 volunteers, including yours truly. While I did not label the wines, we can still consider this a single-blind study. We covered the labels of the wines with blank paper, removed corks to prevent the cork-vs-screw-cap bias, and labeled the bottles White [1-5] or Red [1-5], as needed. Red wines were opened an hour prior to tasting to breathe, and white wines were chilled for at least 30 minutes. Samples were poured in order of bottle number (white 1-5, then red 1-5). A vase for excess wine was provided to prevent the pressure to finish one’s glass. Approximately 5-15 ml was poured into either a wine or champagne glass (depending on availability). Participants were instructed to note characteristics of the wine and rate it on a scale from 1 to 10.
One deviation from last year’s analysis was the presence of a lovingly prepared booklet for keeping track of tasting notes, and for comparing tasting notes to those provided by the manufacturer. While one could consider this a bias, we considered it a way to counteract the negativity associated with cheap wine. So, instead of expecting a bad wine, tasters might be expecting a floral bouquet, or oakiness.
The code for this analysis can be found at https://github.com/nickp60/weird_one_offs/tree/master/cheapwine.
Let’s read in the tasting data; I entered the data, having collected the booklets after the tasting.
Here I added in the tasting order as well:
dat <- read.csv("./data/wine2019.csv", stringsAsFactors = F)
dat$order <- rep(1:10, nrow(dat)/10)
str(dat)
## 'data.frame': 150 obs. of 7 variables:
## $ Wine : int 1 2 3 4 5 1 2 3 4 5 ...
## $ Person : int 1 1 1 1 1 1 1 1 1 1 ...
## $ preference: chr "red" "red" "red" "red" ...
## $ category : chr "white" "white" "white" "white" ...
## $ rating : int 7 4 3 8 2 6 7 8 7 1 ...
## $ notes : chr "good" "tasteless" "tasteless" "good,tasty" ...
## $ order : int 1 2 3 4 5 6 7 8 9 10 ...
# I removed 3 missing values
library(dplyr)
dat <- dat %>% filter(!is.na(rating))
One difference between this analysis and the 2018 analysis was the addition of a self-reported preference: does a taster generally prefer red or white wine? We wanted to see how that affected the ratings.
One taster reported “none” instead of red or white. To avoid adding extra complexity, we imputed their preference based on their responses:
dat %>% filter(preference == "none") %>% group_by(category) %>% summarize(mean=mean(rating))
## # A tibble: 2 x 2
## category mean
## <chr> <dbl>
## 1 red 5.8
## 2 white 6
They have a slight preference for white, so we will include them in that category. We also coded white/red as 1/2 for later analysis (for both preference and category).
So do we have an equal spread of red vs white preferences?
dat$preference <- ifelse(dat$preference == "none", "white", dat$preference)
ggplot(dat %>% distinct(Person, preference), aes(x=preference, fill=preference)) + geom_histogram(stat="count", color="black") + scale_fill_manual(values = mycolors, guide=F) + labs(title = "Taster's preferences", y="Number of Tasters", x="") + wine_theme()
## Warning: Ignoring unknown parameters: binwidth, bins, pad
No! More people prefer red wine.
dat$preference <- as.factor(paste("Prefers", dat$preference))
dat$pref <- ifelse(dat$preference == "Prefers white", 1, 2)
dat$cat <- ifelse(dat$category == "white", 1, 2)
str(dat)
## 'data.frame': 147 obs. of 9 variables:
## $ Wine : int 1 2 3 4 5 1 2 3 4 5 ...
## $ Person : int 1 1 1 1 1 1 1 1 1 1 ...
## $ preference: Factor w/ 2 levels "Prefers red",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ category : chr "white" "white" "white" "white" ...
## $ rating : int 7 4 3 8 2 6 7 8 7 1 ...
## $ notes : chr "good" "tasteless" "tasteless" "good,tasty" ...
## $ order : int 1 2 3 4 5 6 7 8 9 10 ...
## $ pref : num 2 2 2 2 2 2 2 2 2 2 ...
## $ cat : num 1 1 1 1 1 2 2 2 2 2 ...
Now, what do the ratings look like?
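A plot like the one here could be produced along these lines (a sketch, assuming `dat` as prepared above; the original’s `mycolors` palette and `wine_theme()` are omitted):

```r
# Sketch: distribution of ratings per wine, split by category.
# Assumes `dat` has been read and cleaned as above.
library(ggplot2)
ggplot(dat, aes(x = factor(Wine), y = rating, fill = category)) +
  geom_boxplot() +
  facet_wrap(~category) +
  labs(title = "Ratings by wine", x = "Wine", y = "Rating")
```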
So we can start to see a few interesting things. We see some trends toward preferring certain wines. One curious thing (corroborated by anecdotal evidence) is that there appears to be a fatigue effect; we will come back to that later. First, let’s take a quick look to see whether there are any crazy tasters:
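The per-taster view here could be sketched as follows (again assuming `dat` as above, without the original theme):

```r
# Sketch: per-taster rating distributions, to spot unusually harsh or
# generous tasters. Assumes `dat` as above.
library(ggplot2)
ggplot(dat, aes(x = factor(Person), y = rating)) +
  geom_boxplot() +
  labs(title = "Ratings by taster", x = "Taster", y = "Rating")
```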
Interestingly, it looks like people’s ratings mostly followed their professed preferences, so that’s good for consistency.
But which wine is best?
I have, for better or worse, drunk the Bayesian Kool-Aid, so a natural next step is to model the data in such a manner.
The reasons for doing this are several-fold: we have a small number of tasters, repeated measurements from each taster, per-taster variation in generosity, possible fatigue over the tasting order, and color preferences that may shift ratings. All of these make for difficult statistical analysis by conventional methods, but are prime material for building a hierarchical model.
The file <./models/wine2019.stan> contains the model itself, which I will describe here. I tried to keep the naming consistent with last year’s analysis.
For those unfamiliar with Bayesian modelling, the general premise is to use the data and any prior information to estimate different parameters of a model, in a probability-based manner. This is done using the Stan modeling software, which has an interface for use with R.
From the plots above, it looks like the rating of a wine is affected linearly by time (i.e., the tasting order, via tasting fatigue), plus a taster’s preference for red vs white, plus a taster’s general generosity (see the graph above: taster 15 had a low average rating, while taster 4 had a generally high average rating).
We let \(N\) be the number of datapoints, ranging from \(n_0 ... n_N\). \(Nwine\) identifies both which wine and the tasting order (white 1-5, then red 1-5). We stored a 0/1 value for whether a given wine’s color was preferred by the taster.
The parameters we are estimating:

- \(b\): the intercept of the linear effect of time/tasting order; i.e., the rating before time/order affected the rating
- \(m\): the slope of the linear effect of time/tasting order
- \(not\_my\_fav\): how much a rating is affected by not liking the wine color in general
- \(personal\_bias\): the per-taster variation in overall ratings
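I have not reproduced the contents of ./models/wine2019.stan here, but a minimal model with these parameters might look something like the following sketch (the data block names, the per-category slope, and the priors are my assumptions, not the actual file):

```stan
data {
  int<lower=1> N;                      // number of ratings
  int<lower=1> Npeople;                // number of tasters
  int<lower=1, upper=10> order[N];     // tasting order (white 1-5, then red 1-5)
  int<lower=1, upper=2> cat[N];        // 1 = white, 2 = red
  int<lower=0, upper=1> not_pref[N];   // 1 if the wine's color is not the taster's preference
  int<lower=1> person[N];              // taster id
  vector[N] rating;
}
parameters {
  real b;                              // intercept: rating before any order effect
  vector[2] m;                         // slope of the order effect, per category
  real not_my_fav;                     // shift from rating a non-preferred color
  vector[Npeople] personal_bias;       // per-taster offset
  real<lower=0> sigma;
}
model {
  personal_bias ~ normal(0, 2);
  for (n in 1:N)
    rating[n] ~ normal(b + m[cat[n]] * order[n]
                         + not_my_fav * not_pref[n]
                         + personal_bias[person[n]],
                       sigma);
}
```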
We have fit the data to the model, and now we need to check the fit. First, we can plot the Rhat statistic, which should be very close to 1, indicating good mixing of the Markov sampling chains. We can also visualize the chains themselves.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## 'pars' not specified. Showing first 10 parameters by default.
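Diagnostics like these can be produced with rstan’s built-in helpers; a sketch, where `fit` is my assumed name for the stanfit object (not shown above):

```r
# Sketch, assuming the model was fit with rstan and stored in `fit`.
library(rstan)
stan_rhat(fit)   # histogram of Rhat; values near 1 indicate good chain mixing
traceplot(fit)   # chain traces; shows the first 10 parameters by default
```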
The model fits well! This was not the case for the first several models, due either to bugs in the code or to experimenting with non-linear fits. The model also improved greatly after adding the personal variation parameter.
So what effect of preference do we see? First, let’s look at whether there is a fatigue effect with time, by viewing the estimates for the slope of the fit:
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
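The interval plots in this section (with their 80% and 95% intervals) match the defaults of rstan’s plot method; something like the following sketch, again assuming the stanfit object is named `fit`:

```r
# Sketch: 80% / 95% posterior intervals for the slope parameter `m`.
plot(fit, pars = "m")
```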
It looks like it: at least for the white wines, there is indeed a negative trend associated with the order, potentially indicating fatigue. Interestingly, this does not appear to hold for the red wines, even though they were tasted after the whites.
How about an effect when rating non-preferred wines?
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
There is a slight but non-significant increase in the rating when rating non-preferred wines. Huh.
How about person-to-person bias?
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
We can see the \(personal\_bias\) parameter recapitulating the bias we saw in the earlier plot showing personal preference means. This parameter can be used to correct for those personal preferences.
In the \(not\_my\_fav\) parameter, we can see the effect of rating a non-preferred wine. The effect is small (its interval includes 0), but the standard deviations are wide, indicating that preferences could make ratings erratic.
Finally, we can plot the modeled values over our observed values:
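One way to produce such an overlay, assuming the model tracks a posterior predicted rating (here a hypothetical generated quantity `rating_hat`, not necessarily present in the actual model file) and the stanfit object is `fit`:

```r
# Sketch: overlay posterior-mean predicted ratings (triangles) on observed
# ratings (points). `rating_hat` is a hypothetical generated quantity.
library(ggplot2)
post <- rstan::extract(fit)
dat$modeled <- apply(post$rating_hat, 2, mean)
ggplot(dat, aes(x = factor(Wine), y = rating, color = category)) +
  geom_jitter(width = 0.1, alpha = 0.4) +
  geom_point(aes(y = modeled), shape = 17, size = 3) +
  facet_wrap(~category) +
  labs(x = "Wine", y = "Rating")
```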
What we can see is that in some cases, these different effects did not change the observed results; in other cases (such as red 5 and white 5), we can see that the order of tasting potentially resulted in a lower rating than would be expected otherwise.
But then again, white wine 5 was terrible…
Let’s add some names to these wines, and sort them by their cost: