Group decisions

  • We often find ourselves making decisions as a group:
    • A group of friends, planning an outing
    • Members of an organization, deciding on a course of action
    • Citizens in a republic, deciding who should represent them
  • Some take it as an article of faith that groups are smarter than individuals—as though more brains were an unalloyed good in any situation.
  • We know that some organizations are highly functional while others are highly dysfunctional. This is enough to show that the value of collaboration depends strongly on additional factors.

  • We are faced with this question: what are the conditions for effective group decision making?

  • It turns out that machine learning researchers have been thinking about this question for a while, and have produced fascinating answers.


Lessons from machine learning: ensemble prediction

  • When faced with a difficult prediction task, data scientists will often train many models and then combine their individual predictions into a single group prediction. The hope is that the group will be more accurate than any individual model.

  • We call this strategy ensemble prediction. Researchers have developed multiple ways to combine models into ensembles, and have identified the conditions under which an ensemble will in fact outperform its individual members.

  • We’ll describe some ensemble methods. Then we’ll apply the insights from machine learning to decision-making in human groups.

Bagging

  • Short for "bootstrap aggregating." Train each model on a bootstrap sample of the training data (a random sample of the same size, drawn with replacement), then combine the models' predictions by majority vote (for classification) or by averaging (for regression).
  • Because each model sees a different resampling of the data, the models make somewhat independent errors, and those errors tend to cancel when the predictions are combined. A minimal sketch follows.
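A minimal sketch of the idea, assuming numpy arrays and scikit-learn decision trees as the base models (any base model would do; the function name and defaults are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Train n_models trees on bootstrap samples; combine by majority vote.

    Assumes numpy arrays and integer class labels (needed for np.bincount).
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_models):
        # Bootstrap sample: draw n rows from the training set, with replacement.
        idx = rng.integers(0, n, size=n)
        model = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        all_preds.append(model.predict(X_test))
    votes = np.stack(all_preds)  # shape: (n_models, n_test)
    # Majority vote: the most common prediction for each test point.
    return np.array([np.bincount(col).argmax() for col in votes.T])
```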

Random Forests

  • Bagging applied to decision trees, with an extra source of randomness: at each split, a tree is only allowed to consider a random subset of the input variables.
  • A variant called ExtraTrees ("extremely randomized trees") goes further and also chooses the candidate split-points at random. Both are sketched below.
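Both come off the shelf in scikit-learn; the dataset and hyperparameters here are just for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# A synthetic classification problem, just to have something to fit.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    # Random Forest: bagging plus a random subset of features at each split.
    "random forest": RandomForestClassifier(n_estimators=100, max_features="sqrt"),
    # ExtraTrees: additionally randomizes the split thresholds themselves.
    "extra trees": ExtraTreesClassifier(n_estimators=100, max_features="sqrt"),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```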

Boosting

  • Unlike bagging, boosting trains its models in sequence: each new model concentrates on the examples that the previous models got wrong.
  • The ensemble's prediction is a weighted vote, where each model's weight reflects its accuracy during training. AdaBoost ("adaptive boosting") is the classic algorithm; a sketch follows.
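A compact sketch of AdaBoost with decision stumps, assuming binary labels coded as -1/+1 (the function names are mine, not from a particular library):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """AdaBoost with decision stumps; y must hold labels -1 and +1."""
    n = len(X)
    w = np.full(n, 1.0 / n)  # example weights, uniform at first
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        # More accurate stumps get a larger say in the final vote.
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        # Upweight the examples this stump got wrong, downweight the rest.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # The ensemble prediction is a weighted vote of the weak learners.
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))
```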


Implications for human group decisions

  • The quality of our group decisions depends on our mechanism for combining brain-power.
    • Bagging translates directly to a voting system for group decisions. Voting systems have been studied for a long time by public choice theorists; some systems are better than others.
      • First past the post. Used in American political elections. Known to have many flaws. Does not efficiently extract insight from individual voters. (In the political setting, it also has the undesirable consequence of producing two-party systems.)
      • Ranked choice voting. Often touted as a replacement for first past the post. But Arrow’s impossibility theorem shows that no ranked (ordinal) voting system can simultaneously satisfy a short list of desirable fairness criteria.
      • Score voting. Most resembles bagging, but differs in that it does not normalize each voter’s distribution of preferences.
        • It has many desirable properties. Because it is a cardinal voting system rather than an ordinal one, it sidesteps Arrow’s infamous theorem, and in fact it satisfies the kinds of fairness criteria that the theorem puts out of reach for ranked systems.
        • A normalized version would address the standard critique that cardinal voting rests on a notion of utility, and that utility cannot be meaningfully compared between individuals. A normalized score system, which is essentially bagging, would be framed as predicting which choice is best, not as maximizing some notion of utility. (See the sketch after this list.)
      • Bagging works because its models form their predictions independently of one another; correlated errors do not cancel. This is a mathematical argument for the idea that voters ought to form their judgments independently, rather than blindly adopting someone else’s opinion.
    • Adaptive boosting combines individual predictions via weighted vote. This suggests a voting system where different voters have different levels of credibility, dependent on their past performance.
  • The quality of our group decisions depends on the composition of our group.
    • The success of bagging in general, and Random Forests/ExtraTrees in particular, suggests that a large amount of “cognitive diversity” is a good thing in voting-based decisions. This includes
      • Randomly chosen training sets \(\Leftrightarrow\) variety of background experience
      • Randomly chosen variables \(\Leftrightarrow\) variety of interest and attention
      • Randomly chosen split-points \(\Leftrightarrow\) variety of ways that different brains process the same information
    • Bagging only works if each member of the ensemble performs better than random. We don’t get any improvement by combining bad models.
    • Bagging is primarily a method for reducing the variance of a predictor; it does not address prediction bias. Roughly speaking: as an ensemble of humans grows larger, the idiosyncratic cognitive quirks of its individuals cancel out, but the cognitive patterns that average humans share will persist. Humans tend to have high variance (e.g., their decisions might depend on whether they’ve eaten recently), so bagging still yields an advantage. Meanwhile, the cognitive diversity mentioned above will tend to decrease bias. (See the simulation after this list.)
    • Adaptive boosting can be thought of as a collection of specialists, selected in such a way that they compensate for each other’s weaknesses. Each individual may be quite weak—only slightly better than random—but they need to be selected very carefully and their votes combined with appropriate weighting.
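To make the score-voting correspondence concrete, here is a toy sketch (the ballots and the normalization scheme are illustrative assumptions, not a standard): each voter scores every option, each ballot is optionally rescaled so that every voter has the same influence, and the option with the highest total wins.

```python
import numpy as np

def score_vote(ballots, normalize=True):
    """ballots: (n_voters, n_options) array of scores; returns the winning option.

    With normalize=True each ballot is rescaled to [0, 1], so every voter
    exerts the same influence; this is the bagging-like variant described above.
    """
    ballots = np.asarray(ballots, dtype=float)
    if normalize:
        lo = ballots.min(axis=1, keepdims=True)
        hi = ballots.max(axis=1, keepdims=True)
        ballots = (ballots - lo) / np.where(hi > lo, hi - lo, 1.0)
    return int(ballots.sum(axis=0).argmax())

# Three voters scoring three options on arbitrary personal scales:
print(score_vote([[10, 2, 0], [1, 3, 2], [0, 5, 4]], normalize=False))  # 0: the loudest voter wins
print(score_vote([[10, 2, 0], [1, 3, 2], [0, 5, 4]]))                   # 1: equal-influence winner
```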
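And a toy simulation of the variance-versus-bias point: averaging over more "voters" washes out their idiosyncratic noise, but a shared bias survives no matter how large the group gets (all the numbers here are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
truth, shared_bias, noise_sd = 10.0, 1.5, 4.0

for n_voters in [1, 10, 100, 10_000]:
    # Each voter's estimate = truth + a bias everyone shares + personal noise.
    estimates = truth + shared_bias + rng.normal(0, noise_sd, size=(5000, n_voters))
    group_means = estimates.mean(axis=1)  # one group decision per simulated trial
    print(f"{n_voters:>6} voters: spread of group error {group_means.std():.3f}, "
          f"average group error {group_means.mean() - truth:+.3f}")
```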

\( \blacksquare\)


PS: It’s so funny, I checked today’s XKCD after writing this post. It turned out to be insanely relevant: https://xkcd.com/2225/