A week or so ago, the grand Magus over at lamages.blogspot.com/ published a great, quick thought exercise taken from Daniel Kahneman’s book Thinking, Fast and Slow. Here are the particulars of the problem: you’re in a community with two different color vehicles; 85% are green and 15% are blue. A vehicle was involved in a hit and run accident. A witness says the car was blue. We can establish that the witness may correctly identify the color of a car 80% of the time. Given all that, what is the probability that the car is blue?

In an uncharacteristic fit of industriousness, I didn’t just read through the explanation, but tried to work it out myself. My natural inclination was to assume a table as follows:

Car is green | Car is blue | Marginal | |
---|---|---|---|

Witness is correct | 68% | 12% | 80% |

Witness is incorrect | 17% | 3% | 20% |

Marginal | 85% | 15% | 100% |

The interior probabilities can be worked out by multiplying the marginals, which is a great thing for lazy people like me. However, structuring things in this way can make the problem a bit harder to work out. This configuration doesn’t directly address our question. If we want to know the chance that the car was blue- given that the witness says that it was blue- we have to pluck out the scenarios wherein the witness will state that she saw a blue car. These are: 1) the witness is correct and the car is blue and 2) the witness is incorrect and the car is green. The double negative does my head in. Notwithstanding that, if we add those two probabilities together, we get the 29% chance that the witness says blue and we can then normalize the 12% (car is blue) to get the 41% chance that the car is actually blue.

The 2×2 table suggested by the Mage’s approach is as follows. Note that he had to work out the marginals, as explained in his post.

Car is green | Car is blue | Marginal | |
---|---|---|---|

Witness says green | 68% | 3% | 71% |

Witness says blue | 17% | 12% | 29% |

Marginal | 85% | 15% | 100% |

With this setup, the probability of a blue car is easy to isolate as the scenario now takes up a single row. Just divide the 12% by 29% to normalize the row and we arrive at the posterior probability of 41%.

It’s a subtle thing, but meaningful. The other nice thing about this approach is that it’s consistent with how we might look at a binary classification problem. It’s worth taking a quick moment to identify some of the salient attributes of the table. Positive predictive value is the ratio of true positives divided by total number of positive forecasts. This is the likelihood that the witness will be correct, when she says that a car is green. (“Positive” in this context may be regarded as a green car.) In this case, that number is 96%. This is very high. The negative predictive value, which is the likelihood the witness will be correct when she says blue is 41%.

This is what the Mage’s table looks like with the probabilities replaced by the terms we use to describe them. (Again, “positive” in this case is arbitrarily defined to be a green car.)

Car is green | Car is blue | |
---|---|---|

Witness says green | Positive Predictive Value (PPV) | False Discovery Rate (FDR) |

Witness says blue | Fale Omission Rate (FOR) | Negative Predictive Value (NPV) |

Marginal | Prevalence | 1-Prevalence |

There’s a very important point that isn’t emphasized in the example. The witness’ Accuracy is less than the Prevalence. The witness could achieve an accuracy of 85% by simply guessing “green” for every car that she sees. This is what gives us the counterintuitive result that a witness who is accurate 80% of the time has less than a 50/50 chance of having seen a blue car. How accurate would they need to be to get at least 50%?

The quantity we’re interested in is the negative predictive value divided by the sum of NPV / FOR. I don’t have a letter for this, so I’m calling it q. (I’d love to hear that this thing has a name.)

Solving for A, we get:

For the case where q=0.5, this simplifies to the Prevalence. If the witness’ accuracy is equal to the percentage of green cars, we can be 50% confident that she really saw a blue car. (Note that we can’t get to accuracy= Prevalence by always guessing a green car. If this were the case, we’d never have a situation where the witness said she saw a blue car.)

How does our table look now?

Car is green | Car is blue | Marginal | |
---|---|---|---|

Witness says green | 72.25% | 2.25% | 74.5% |

Witness says blue | 12.75% | 12.75% | 25.5% |

Marginal | 85% | 15% | 100% |

We can see that when the witness says the car is blue, the odds are even that the car is blue. And if she says green? That’s 97%.

There is a bit of contrivance at work here in that we know the population of vehicles with certainty. We also presume that the population has no other defining characteristics that could improve our accuracy. Perhaps blue cars are driven at particular hours of the day or more in certain locations? However, there is a bit of a lesson. If you’re trying to identify the rare case, you must be able to generate an accuracy that’s higher than the prevalence of the baseline case. A rare disease needs a very precise diagnostic test. Or, in my field, if you’re trying to identify the one liability claim in 1,000 that will produce a massive jury award, your predictive model must be very, very good.

#### Session info:

## R version 3.1.1 (2014-07-10) ## Platform: x86_64-pc-linux-gnu (64-bit) ## ## locale: ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 ## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C ## [9] LC_ADDRESS=C LC_TELEPHONE=C ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] knitr_1.6 RWordPress_0.2-3 ## ## loaded via a namespace (and not attached): ## [1] digest_0.6.4 evaluate_0.5.5 formatR_0.10 htmltools_0.2.4 ## [5] markdown_0.7.2 RCurl_1.95-4.1 rmarkdown_0.2.50 stringr_0.6.2 ## [9] tools_3.1.1 XML_3.98-1.1 XMLRPC_0.3-0 yaml_2.1.13

From the original:

“The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colours 80% of the time and failed 20% of the time.”

Does such a “concrete” prior actually exist in real life? Change the prior value, and the deduction changes. This is the Achilles Heel of Bayesian analysis. But, as the Bayesians (at least, some of them) admit, with a large enough sample, the Bayesian Posterior is asymptotic to the Frequentist Statistic. So, the only time Bayesian calculation makes a difference is when the Prior is fungible. And thus, why Frequentists (at least, some of them) don’t trust Bayesian analysis.