Why bother with theory when you've got big data?

25 Mar 2020

When I do a search for the term “big data,” I get 8.47 billion results in just over a half second. On the first page of my search results there is a listing of the most recent articles on big data as well as big data services offerings. But followers of Disruptive Innovation and related theories know that we take a theory-first approach to understanding the world.

First, by “theory” we mean something devilishly simple: a statement of what causes what and why (see Scientific Process). But we’ve come to see that, rather than pursuing causality, organizations are investing heavily in artificial intelligence, machine learning, and data science.

Second, while most people know the dictum that correlation is not causation, they speak and behave in a way that’s inconsistent with this. In fact, the most ardent promoters of big data claim that as we master data, we won’t need the scientific method or theory building.

This is nonsense. I’d like to illustrate why with an example from one of my former teachers, K. Codell Carter, a historian and philosopher of medicine. During the late 19th century, medical research changed from being empirically driven to being causally driven (in our terms, theory driven). One example Carter cites of the different trajectories that causal and non-causal thinking produced in curing disease is a comparison of scurvy and beriberi, both diseases caused by a deficiency of a single nutrient.

Scurvy, a severe vitamin C deficiency, was described by the famed father of medicine Hippocrates (460 BC - 370 BC), but not understood causally until 1927. Throughout the two millennia that preceded a true understanding of the cause of scurvy, many remedies were prescribed, almost all of them close to addressing the true cause of scurvy. And by the 1830s something like clinical guidelines had been developed for the prevention and treatment of the disorder. Fresh meat and citrus generally contain enough vitamin C to cure scurvy. However, the lack of a deep understanding of the mechanism behind scurvy led to confusion among physicians and researchers. For example, storing and transporting citrus juices in copper, or exposing them to air and sunlight, degraded their vitamin C content to sub-therapeutic levels, so interventions that relied on them often failed. Researchers got what we might refer to as “noisy” data from their clinical trials with citrus juices and meat. Sometimes lime juice cured scurvy and sometimes it did not. To make matters worse, the advent of faster steamships and a proliferation of ports got sailors from port to port faster, where they could consume good sources of vitamin C. So although the shipboard sources of vitamin C had been attenuated to the point of being ineffective, sailors reached port sooner, where they got better nutrition, and so felt few, if any, effects of scurvy. Without a drive to understand the causal mechanism for scurvy, these guidelines did not evolve much. Researchers were satisfied with knowing that (most of the time) lime juice cured scurvy. And so medical discoveries and powers stalled.

Beriberi, an insufficiency of vitamin B1, had a different trajectory. Although known in the 17th century, beriberi did not receive serious study until the 1880s. By then, the causal approach to understanding and curing disease had risen to some prominence. Thanks to this causal mindset and theory-driven approach, by 1884 a Japanese researcher had determined that beriberi was found only among sailors whose diet consisted almost entirely of polished rice. One can imagine empirical researchers being satisfied with knowing this and working to ensure that sailors had a more diverse diet, just as they had been satisfied knowing that meat or citrus usually cured scurvy. However, where scurvy researchers stopped at pretty good empirical work, the drive toward causal mechanisms kept scientific progress on beriberi moving forward. In the 1880s researchers conducted controlled animal studies, and shortly thereafter settled definitively that a dietary deficiency of some element of unpolished rice caused beriberi. And by 1913 researchers knew with certainty the very extract of rice bran whose lack was the cause of beriberi.

Big data is just more sophisticated empiricism, and sometimes that’s just fine. If tests on our data show a tight fit to a regression line (that is, a high r-squared), we might be able to make future predictions that are sufficient to solve our problems. However, while literally billions of dollars are being poured into big data’s statistical (i.e., non-causal) methods, it’s worth remembering that sound application of the scientific method is both more powerful in its implications for all sorts of ills and vastly cheaper and faster. At the Christensen Institute, we have seen the power of theory in many issues of management research and application. And we want to invite skeptics and boosters alike to approach their work by developing sound causal theories.
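
To make the point about r-squared concrete, here is a minimal Python sketch. The numbers are invented for illustration and are not historical data: `recovery_fresh` stands in for trials with potent lime juice, and `recovery_degraded` for trials where the juice had lost its vitamin C. The function names are also just illustrative. A line fitted to the first set has a very high r-squared, yet the same line predicts badly once the hidden cause changes.

```python
# Hypothetical, made-up numbers for illustration only; not historical data.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - SS_res / SS_tot for a given line evaluated on given data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Daily juice dose (arbitrary units) and a recovery score (arbitrary units).
doses = [1, 2, 3, 4, 5, 6]
recovery_fresh = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]     # juice with vitamin C intact
recovery_degraded = [0.3, 0.1, 0.4, 0.2, 0.3, 0.1]    # juice with vitamin C degraded

# Fit the line on the "fresh juice" trials only.
slope, intercept = fit_line(doses, recovery_fresh)

# Score that same line on both sets of trials.
r2_fresh = r_squared(doses, recovery_fresh, slope, intercept)
r2_degraded = r_squared(doses, recovery_degraded, slope, intercept)

print("r-squared on fresh-juice trials:", round(r2_fresh, 4))
print("r-squared of the same line on degraded-juice trials:", round(r2_degraded, 1))
```

The fit itself cannot tell us why the second set of trials fails. Only a causal account (the juice works through vitamin C, and storage can destroy it) explains when the regression line will hold and when it will not.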
