Dr. Michael J. Mazarr is Acting Director of the Strategy, Resources, and Doctrine Program in the Arroyo Center and a Senior Political Scientist at RAND.
Philip Tetlock has worked for decades on the problem of judgment in national security affairs. He became justly renowned for his book Expert Political Judgment: How Good Is It? How Can We Know? (Princeton University Press, 2006), which demonstrated, among other things, that foreign policy experts were no more accurate in their forecasts than “monkeys throwing darts.” Tetlock’s somewhat alarming finding led to a series of intriguing questions: Just how good can judgment become? Can we do better than the “experts”?
This innovative line of research laid the foundation for a new book, Superforecasting: The Art and Science of Prediction, co-authored with journalist Dan Gardner. It surveys techniques used by the most successful individuals and teams in Tetlock's Good Judgment Project (GJP), a series of forecasting tournaments in which participating analysts, many from careers far removed from national security, make predictions on key issues: Will oil prices fall below $30 a barrel within a year? Will Japan decide to place troops on a disputed island in the next 6 months? The questions deal with discrete issues and are precise, asking about a particular event or choice. They are also bounded in time, with horizons ranging from 1 month to 1 year.
Tetlock has found that some people do in fact perform far better in such contests than others: repeatedly, reliably, and even after controlling for other variables. Top GJP forecasters beat a control group by 60 percent in the first year and by 78 percent in the second. As Tetlock states, they even "outperformed professional intelligence analysts with access to classified data." One could quibble with the approach. In their narrowest interpretation, for example, these findings might not be that surprising. Confronted with precise questions dominated by a handful of known variables, forecasters who give exceptional care to facts and probabilistic guidelines such as base rates will surely do better than more casual dart throwers.
The project also risks equating forecasting with "judgment." Tetlock himself admits that "foresight is one element of good judgment, but there are others." Judgment ultimately is about what to do, not merely about what will happen, and there is no guarantee that people who excel at forecasting will be good at deciding. Someone who excels at using probabilistic methods to guess at the future price of corn might fail miserably at integrating the multiple strategic and political implications of a complex security choice.
This conflation speaks to a core assumption of the project, and to a third possible objection. Tetlock is a numbers guy, interested in quantifiable results from probabilistic analysis. That orientation is helpful up to a point. For complex, ambiguous national security decisions, however, it is not clear where that point lies. Tetlock addresses this distinction explicitly, between linear or deterministic choices and thoroughly complex ones. He refers to the analytical challenges of "the butterfly dynamics of nonlinear systems" and uses the common metaphor of clouds and clocks to distinguish mechanisms whose variables and causal relationships are known from an unfolding complex system. He downplays the difference, however, describing hard-and-fast distinctions between the two as "false dichotomies." Yet the problem of which strategy will best deal with Russia is a fundamentally cloud-like enterprise, and no forecasting-style probability exercise is likely to furnish an answer that is objectively better than the alternatives.
This is very likely one reason why senior leaders are so resistant to structured efforts to improve decisionmaking. At the end of the day, what they are doing is educated guesswork—and they know it. The most decisive factors in their choices are norms, values, political considerations, and bureaucratic constraints that cannot be assigned precise values. As a result, most such officials ascend to high office having built, usually over a long period of time, a well-honed, experience-based intuition that they trust more than any analytical method. (Tetlock recognizes this and cites research that demonstrates how in real decisionmaking settings, “these educated, accomplished people reverted to the intuitive.”)
Despite these concerns, Tetlock's research demands to be taken seriously. It is thoughtful and innovative, and it arrives amid a tsunami of evidence about the risks that cognitive bias and thoughtless heuristics pose to senior leaders. More than that, it invites the U.S. Government to get more serious about how it makes national security decisions. Among other things, Tetlock's research is one of the first large-scale empirical efforts to demonstrate the clear value of enhancing the rigor and quality of judgments. As his superforecasters show, exacting procedures do tend to improve results. The best forecasters ask well-designed, critical questions and apply careful analytical methods, and as a result they come to understand issues more thoughtfully and accurately than people who skip such methods and settle for a far less precise guess. Tetlock's efforts have also demonstrated promising ways to combine thinkers into teams that correct their own analytical errors rather than compound them.
In this sense, Tetlock's work complements the insights of scholars such as Daniel Kahneman, Paul Slovic, Robert Jervis, and many others who have been warning for decades about the risks of simplified and often biased cognitive patterns. And it is only a small leap from Tetlock's findings to the context of complex national security judgments: a choice that remains intuitive and emergent, but is informed by and takes seriously the results of rigorous analysis, will have a better batting average, even if the final judgment remains unavoidably subjective and impressionistic.
If we take this line of thinking about thinking seriously, it becomes clear that future U.S. administrations that care about the quality of their judgments no longer have any excuses. They ought to create more formal decision-analytic processes designed to maximize the rigor and accuracy of even complex choices.
This could involve, for example, an effort to build, probably on the base of a specialized unit in the National Security Council (NSC), both the habits of mind and the specific techniques and tools characteristic of superforecasting groups. Some questions or principles would be integrated into all interagency processes and policy documents, while some techniques would be applied to particular decisions, depending on their issue or character. Over time, paralleling Tetlock's emphasis on outcomes, the effort could track the accuracy of various sub-judgments, recording where they were right and where they were wrong and looking for consistent patterns.
This would be tremendously difficult to organize. Senior officials have little interest in being forced through analytical gymnastics to reach conclusions that can never be proved better than intuitive guesswork. Moreover, they will often lack the time needed to undertake anything more than a cursory process. A senior director for analytic methods at the NSC, however, could help shape the design of options papers, push groups to consistently ask the right questions, warn top decisionmakers about encroaching bias, and introduce more formalized decision techniques when time is available. The idea would not be to build an intricate, highly theoretical process, but to take elements likely to be present in any policy process—background papers, options papers, interagency dialogues, Cabinet-level meetings—and supercharge their analytical rigor.
There seems little doubt that formalizing such methods in the national security process, at least in slimmed-down versions appropriate to the pace of decisionmaking, would help avert the occasional disaster and generate insights that open new opportunities. At a minimum, now that research such as Tetlock's has made clear the potential value of formally rigorous thinking, it would seem irresponsible not to find out. JFQ