Evaluating Ben Franklin’s Alternative to Regression Models for Decision Making
Recently, Gwern pointed me to a blog post by Chris Stucchio that makes the impressive-sounding claim that “a pro/con list is 75% as good as [linear regression],” which he goes on to show based on a simulation. I was intrigued, as this seemed counterintuitive. I thought making choices would be a bit harder than that, especially when you have lots of choices, and it is, kind of. But first, let’s set up the problem and its motivation, before I show you pretty graphs of how the method performs.
Let’s posit a decision maker with a set of options, each of which has some number of characteristics that they have preferences about. How should they choose? It’s not easy to figure out exactly which option they would like the most — especially if you want to get the perfect answer! Decision theory has a panoply of tools, like Multi-Attribute Decision Theory, each with whole books written about them. But you don’t want to spend $20,000 on consultants and model building to choose what ice cream to order; those methods are complicated, and you have a relatively simple decision.
For example, someone is choosing a car. They know that they want fuel efficiency of more than 30 miles per gallon, they want at least 5 seats for their whole family to fit, they prefer a sedan to an SUV or small car, and they would like it to cost under $15,000. Specifying how much they care about each, however, is hard; do they care about price twice as much as the number of seats? Do they care about fuel efficiency more or less than speed?
Instead of asking people to specify their utility function, as many decision theory methods would require, most people just look at the options and pick the one they like most. That works OK, but given cognitive biases and sales pitches that convince them to do something they’ll regret later, a person might be better off with something a bit more structured. That’s where Chris brings in Ben Franklin’s advice.
…my Way is, to divide half a Sheet of Paper by a Line into two Columns, writing over the one Pro, and over the other Con. Then…I put down under the different Heads short Hints of the different Motives…I find at length where the Ballance lies…I come to a Determination accordingly.
Chris interprets “where the Ballance lies” as which list, Pro or Con, has more entries.
The question he asks is how much worse this fairly basic method, which uses a statistical method referred to as “Unit-Weighted Regression,” is than a more complex regression model with exact preference weights.
Where did “75% as Good” come from?
Chris set up a simulation that showed that, given two random choices and random rankings, with a high number of attributes to consider, 75% of the time the choice given by Ben Franklin’s method is the same as that given by a method that uses the (usually unknown) exact preference weights. This is helpful, since we frequently don’t have enough data to arrive at a good approximation of those weights when considering a decision. (For example, we may want to assist senior management with a decision, but we don’t want to pester them with lots of questions in order to elicit their preferences.)
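To make the setup concrete, here is a minimal Python sketch of this kind of two-option simulation. It is not Chris’s code. It assumes each feature is an independent fair coin flip (1 means the option has the feature) and that the true weights are positive draws from an exponential distribution; both are my assumptions, as are the function name agreement_rate and its parameters. Ties in the pro/con count are broken by a coin flip, and two options with identical true scores count as agreement.

```python
import random

def agreement_rate(n_features, n_trials=5000, seed=0):
    """Fraction of trials where counting pros picks the same option
    as the exact weighted score (two options per trial)."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_trials):
        # Assumed weight distribution: positive exponential draws.
        weights = [rng.expovariate(1.0) for _ in range(n_features)]
        # Assumed feature distribution: independent fair coin flips.
        a = [rng.randint(0, 1) for _ in range(n_features)]
        b = [rng.randint(0, 1) for _ in range(n_features)]
        weighted_diff = sum(w * (x - y) for w, x, y in zip(weights, a, b))
        if weighted_diff == 0:
            # Both options are equally good, so any pick is a best pick.
            agree += 1
            continue
        count_diff = sum(a) - sum(b)
        if count_diff == 0:
            # The pro/con list is tied: flip a coin.
            count_diff = rng.choice([-1, 1])
        if (count_diff > 0) == (weighted_diff > 0):
            agree += 1
    return agree / n_trials

if __name__ == "__main__":
    for n in (2, 5, 20, 100):
        print(n, round(agreement_rate(n), 3))
```

Under these assumptions, the agreement rate should come out higher with only a handful of features and settle close to 75% as the number of features grows. A different weight distribution would give a different number.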
Following the simulation, he proves that, given certain assumptions, this bound is exact. I’m not going to get into those assumptions, but I will note that they probably overstate the actual error rate in the given case; most of the time, there are not many features, and when there are, features that have very low weights wouldn’t be included, which will help the classification, as I’ll show below.
But first, there’s a different problem; he only talks about 2 options. So let’s get to my question, and then back to our car buyer.
It should be fairly intuitive that picking the best option is harder given more choices. If we picked randomly between two options, we’d get the right choice 50% of the time, without even a pro-con list. (And coin-flipping might be a good idea if you’re not sure what to do — Steven Levitt tried it, and according to the NBER working paper he wrote, it’s surprisingly effective. Despite this, most people don’t like the idea.)
But most choices have more than two options, and that makes the problem harder. First, I don’t have any fair three-sided coins. And second, our random guess now gets it right only a third of the time. But how does Ben Franklin’s method do?
First, this shows the case Chris analyzed, with only two options, compared to three:
The method does slightly worse, but it’s almost as good as long as there aren’t lots of dimensions. Intuitively, that makes sense; when there are only a couple of things you care about, one of the options probably has more of them than the other, so unless one of the features is much more important than the others, it’s unlikely that the weights make a big difference. We can check this intuition by looking at our performance with many more options.
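A rough way to try this comparison yourself is sketched below in Python. It is an illustration under my own assumptions, not the R code behind the graphs: features are independent fair coin flips, true weights are exponential draws, and the pro/con method picks the option with the most pros, breaking ties at random. A pick counts as correct if its true weighted score equals the best score available. The name pick_accuracy is hypothetical.

```python
import random

def pick_accuracy(n_options, n_features, n_trials=2000, seed=0):
    """How often the longest pro list is also a best option
    under the (normally unknown) true weights."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Assumed distributions: exponential weights, coin-flip features.
        weights = [rng.expovariate(1.0) for _ in range(n_features)]
        options = [[rng.randint(0, 1) for _ in range(n_features)]
                   for _ in range(n_options)]
        counts = [sum(opt) for opt in options]
        scores = [sum(w * x for w, x in zip(weights, opt)) for opt in options]
        # Pro/con pick: most pros, ties broken at random.
        best_count = max(counts)
        pick = rng.choice([i for i, c in enumerate(counts) if c == best_count])
        if scores[pick] == max(scores):
            hits += 1
    return hits / n_trials

if __name__ == "__main__":
    for k in (2, 3, 20):
        for n in (3, 10, 50):
            print(k, "options,", n, "features:", round(pick_accuracy(k, n), 3))
```

Random guessing would be right about 1/k of the time with k options, so that is the baseline to compare these numbers against.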
With only a few things that we care about, pro/con lists still perform incredibly well, even when there are tons of choices. In fact, with few enough features, the method performs even better. This makes sense; if there is a choice that is clearly best we can pick it, since it has everything we want. But this also exposes a limitation of how the problem was set up: we are looking at whether each item has or doesn’t have the thing we want, not at its value.
If we have a lot of cars to choose from, and we only care about the 4 things we listed (30 MPG, 5 seats, sedan, cost < $15,000), picking one that satisfies all of our preferences is easy. But that doesn’t mean we pick the best one! Given a choice between a five-seater sedan that gets 40 MPG and costs $14,000 and one that gets 32 MPG and costs $14,995, our method calls it a tie. (It’s “correct” because we assumed each feature is binary.) There are plenty of algorithmic ways to get around this that are a bit more complex, but any manual pro/con list would make this difference apparent without adding complexity.
Interestingly, however, with many choices, the method starts working much worse when there are many feature dimensions. Why? In a sense, it’s actually because we don’t have enough choices. But first, let’s talk about weak preferences, and why they make the problem seem harder than it really is.
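The tie described above is easy to see once the preferences are written as yes/no checks. This small Python sketch uses the two cars from the example; the dictionary layout and the names cars, pros, and pro_count are made up for illustration.

```python
# Hypothetical records for the two cars described in the text.
cars = {
    "car_a": {"mpg": 40, "seats": 5, "body": "sedan", "price": 14000},
    "car_b": {"mpg": 32, "seats": 5, "body": "sedan", "price": 14995},
}

def pros(car):
    """Binary pro list: one True per preference the car satisfies."""
    return [
        car["mpg"] > 30,          # more than 30 miles per gallon
        car["seats"] >= 5,        # at least 5 seats
        car["body"] == "sedan",   # sedan preferred
        car["price"] < 15000,     # costs under $15,000
    ]

def pro_count(car):
    """Length of the pro list, which is all the binary method sees."""
    return sum(pros(car))

for name, car in cars.items():
    print(name, pro_count(car))
```

Both cars satisfy all four checks, so the count cannot separate them, even though the first is cheaper and more efficient.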
If we actually have a list of 10 or 15 features, odds are good that some of them don’t really matter. In algorithm design, we need a computer to make decisions without asking us, so a binary classifier can have problems picking the best of many choices with lots of features — but people don’t have that issue.
If I were to give you a list of 10 things you might care about for a car, some of them won’t matter to you nearly as much as others. So… if we drop elements of the pro/con list that are less than 1/5 as important as the average, how does the method perform?
And this is why I suggested above that when building a Pro/Con list, we normally leave off really low importance items — and that helps a bit, usually.
When we have lots of choices, the low importance features add noise, not useful information.
Of course, we need to be careful, because it’s not that simple! Dropping features when we don’t have very many is a bad idea — we’ll miss the best choice.
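The pruning rule described above (drop anything less than 1/5 as important as the average) can be sketched in a few lines of Python. The weights and the two options here are invented for illustration, and in practice the weights are only known roughly; the function names are hypothetical.

```python
def important_features(weights, cutoff=0.2):
    """Indices of features whose weight is at least cutoff times the average."""
    average = sum(weights) / len(weights)
    return [i for i, w in enumerate(weights) if w >= cutoff * average]

def pro_count(option, keep):
    """Number of pros, counting only the features listed in keep."""
    return sum(option[i] for i in keep)

def weighted_score(option, weights):
    """The exact score we would use if we knew the weights."""
    return sum(w * x for w, x in zip(weights, option))

# Made-up example: three features that matter, two that barely do.
weights = [5.0, 4.0, 3.0, 0.2, 0.1]
option_a = [1, 1, 0, 0, 0]   # has the two most important features
option_b = [0, 1, 0, 1, 1]   # has one important and two trivial features

everything = list(range(len(weights)))
keep = important_features(weights)

print("full list:  ", pro_count(option_a, everything), pro_count(option_b, everything))
print("pruned list:", pro_count(option_a, keep), pro_count(option_b, keep))
print("true scores:", weighted_score(option_a, weights), weighted_score(option_b, weights))
```

In this example the full list favors option_b by three pros to two, while the pruned list and the true weighted scores both favor option_a.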
The Curse of Dimensionality versus Irrelevant Metrics
We can drop low importance features, but why does the method work so much worse with more features in the first place? Because, given a lot of features, there are a huge number of possible combinations. 5 binary features allow 2⁵ = 32 combinations. Anything that has all 5 features that we want (or most of them) will be the best choice, and ignoring some of them, even if they are low weight, will miss that. If we have 50 features, though, we’ll never have 2⁵⁰ options to find one that has everything we might want, so we want to pay attention to the most important features. And that’s the curse of dimensionality.
If I were really a statistician, that would be an answer. But as a decision theorist, I think that actually means that our metric is a problem. Picking bad metrics can be a disaster, as I have argued at length elsewhere. And our car buyer shows us why.
There are easily a hundred dimensions we could consider when buying a car. Looking at the engine alone, we might look at torque, horsepower, and top speed, to name a few. But most of these dimensions are irrelevant, so we would ignore them in favor of the 4 things we really care about, listed above; picking a car with the best engine torque that didn’t seat 5 would be a massive failure.
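To put rough numbers on this: if each of n features is present independently with probability 1/2 (an assumption for illustration), the chance that at least one of k options has all n of them is 1 - (1 - 2^-n)^k. The Python sketch below computes that; the function name is hypothetical.

```python
import math

def prob_some_option_has_everything(n_features, n_options):
    """P(at least one of n_options has all n_features),
    assuming features are independent fair coin flips."""
    p_all = 0.5 ** n_features
    # Equals 1 - (1 - p_all) ** n_options, computed stably for tiny p_all.
    return -math.expm1(n_options * math.log1p(-p_all))

if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        print(n, "features, 100 options:",
              prob_some_option_has_everything(n, 100))
```

With 5 features and 100 options, a “has everything” option is very likely to exist; with 50 features it essentially never does, which is why the weights start to matter.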
And in our analysis here, these dimensions are collapsed into a binary, both in our heuristic pro/con list, and in the base case we compared against! As mentioned earlier, this ignores the difference between 32 MPG and 40 MPG, or between $14,000 and $14,995 — both differences we do care about.
And that’s where I think Ben Franklin is cleverer than we gave him credit for initially. He says “I find at length where the Ballance lies…I come to a Determination accordingly.” That sounds like he’s going to list the options, think about the Pros and Cons, and then make a decision — not on the basis of which list is longer — but simply by looking at the question with the information presented clearly.
Note: Code to generate the graphs in R can be found here: https://github.com/davidmanheim/Random-Stuff/blob/master/MultiOption_Pro_Con_Graphs.R