An Embarrassment of Data: Why Businesses Should Focus on Hypothesis Building


Collecting data doesn't create value on its own – businesses need to focus on building capabilities and honing strategy.

by CYRIL MAURY, Stripe Partners

"What a useful thing a pocket-map is!" I remarked.

"That's another thing we've learned from your Nation," said Mein Herr, "map-making. But we've carried it much further than you. What do you consider the largest map that would be useful?"

"About six inches to the mile."

"Only six inches!" exclaimed Mein Herr. "We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!"

"Have you used it much?" I enquired.

"It has never been spread out, yet," said Mein Herr: "the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well."

—Lewis Carroll, Sylvie and Bruno Concluded, Chapter XI

“Back up 20,038 photos to save 1.2GB of memory on your phone.” My iPhone recently gave me this sensible nudge – and a reminder of how many pictures I take. I know I never go back to look at them, so why do I keep taking thousands of photographs, only for them to clog the memory of my phone?

I do it because it gives me an option: the possibility to look at them in the future. But since I never actually exercise that option, isn’t it really just a case of self-deception? What value, if any, will come from this accumulation of data? This story encapsulates a question central to innovation and business strategy: is more data always a good thing?

“When you go from 10,000 training examples to 10 billion training examples, it all starts to work. Data trumps everything.”
Garry Kasparov, Deep Thinking (2017)

Google Translate started to work once it was fed millions of books in multiple languages. The first successful neural network image recognition software only worked after humans labeled millions of images, building a huge data set machines could “learn” from (the largest public data set, ImageNet, currently holds 14,197,122 labeled images). More recently, GPT-3, which uses deep learning to produce human-like text, drew on 45TB of text data. This amounted to roughly half a trillion tokens (tokens are words or parts of words, the basic unit NLP algorithms use to encode text).

It’s true that in these specific cases, more data means better predictions. Pundits often extrapolate from this that more data is necessarily a good thing. But this overlooks the fact that data gathering always comes at a cost.

I recently took part in an innovation project for a large financial services company. It already had access to troves of client data (finances, purchasing patterns, sociodemographic information, qualitative insights, etc.). Still, at the beginning of the engagement, the client's main preoccupation was how to develop services to collect more user data.

As the project progressed, there was a collective realisation that this objective was little more than a distraction: the company's immediate strategic problem was how to use the data it already had to create value for clients and develop a strategic advantage. But this required a different ability: that of making hypotheses about how to do so.

More Data Feels Like a Safe Bet

This "more data fallacy" is rather common. One notable example is loyalty programs. There’s nothing intrinsically wrong with them, but all too often they are built without clear answers to two foundational questions: How do they benefit users? And what business value is created with the data they are designed to collect? Those questions are particularly important in a context of increased focus on privacy, with users ever more cautious about giving companies unfettered access to their personal data.

This tendency towards ever more data gathering may have to do with the increasingly complex business environment. Taking immediate action is daunting: with so many changing variables, how can you be sure that you’re not making a terrible decision? In contrast, gathering more data always feels like a safe bet. Like me with my iPhone photos, collecting is reassuring: it feels like you are laying the foundation for future business value.

Using Existing Data to Hone Strategy

Having to decide between gathering more data and acting on what you already know isn't limited to business practitioners. Let us take a completely different example. The neuron doctrine, proposed in the late 1880s by Santiago Ramón y Cajal (and, astonishingly, still holding true today), is the fruit of the scientist’s unique ability to devise a revolutionary set of hypotheses based on fresh data (neurons stained with silver). The technique Cajal employed to stain the neurons was invented by another scientist, Camillo Golgi. Yet, while Golgi stained thousands of cells, he failed to use that newfound data to inform new hypotheses. He had the data but lacked the intuition to make sense of it.

Scientists are constantly wrestling with this fundamental epistemological question – do you put effort into gathering more data (or into developing new methods to gather data), or into making sense of the data you already have? Since the time of Cajal and Golgi, neuroscientists have continued to argue for both positions. In the past decade, advances in neuroimaging and computational power favored the proponents of data-gathering. This materialized in a number of initiatives focused on extensively mapping the brain, in the hope that once the map becomes precise enough, large simulations would naturally reveal the inner mechanics of the mind.

But in the last few years, a growing number of neuroscientists have been questioning this strategy, arguing that it only delays having to tackle “the hard problem”. For them, data is already plentiful and trying to gather more of it carries a huge opportunity cost: it takes funding away from research aimed at better understanding how the brain works based on the imperfect data that we already have.

As technology becomes ever more sophisticated, so does the temptation to think that, if fed enough data, some external entity – call it predictive analytics, deep learning, or artificial intelligence – will relieve humans of the burden of elaborating hypotheses, plans and strategies. Our argument at Stripe Partners, which has held so far, is that it will not.

Focus on Building the Right Capabilities

In science, but more to the point in business, the organizational structures in place tend to lead decision-makers to over-invest in collecting data at the expense of spending time and effort creating value from existing data.

We found this to be particularly true for large legacy companies, where executives live in constant (and legitimate) fear of being disrupted by emerging digital players. Yet what they often fail to realize is that, in order to become a data-driven company, effort must first go into building the right capabilities (technological and cultural) to make sense of data and create value from it.

At Stripe Partners, we believe this data collection fetishism stems from an overappreciation of explicit knowledge at the expense of tacit knowledge. To paraphrase Polanyi, people know more than they can tell. It is the sum of a person's experiences that allows them to recognise patterns in data and develop valuable hypotheses.

While essential, this ability to develop pointed hypotheses is seldom recognised, let alone actively developed, by organisations. Stripe Partners has championed the idea that "embodied knowledge", the deep and practical understanding we acquire through our direct experience of the world, is the bedrock of the intuition needed to form fruitful hypotheses. We have argued that “ethnographic research, with its commitment to understanding through immersion and engagement in social fields produces dexterous, intuitive and practical cultural knowledge” for strategic collective action in organizations.

Virtuous innovation cycle, adapted from Dubberly, Evenson, Robinson (2008). Organisations often overinvest in data gathering while neglecting to develop their hypothesis-building capabilities.

There are a number of ways to develop that "hypothesis building" muscle. We will highlight two, which are at the heart of most of our client engagements:

  • Get as much close contact with your customers or users as possible. Talking to them and physically experiencing how they use your products or services is a powerful way to develop an instinct for what would actually improve their lives.
  • Remember that within your organization lies a wealth of tacit knowledge. From the C-suite to frontline workers, each employee has developed habits, heuristics and perspectives that they use every day to make decisions. This, too, is data. Ensuring that your company fosters a culture of curiosity and multidisciplinary communication will go a long way towards helping employees develop an instinct for how the business might create value for its users.

To be more comfortable making hypotheses based on existing data, business executives and social scientists working with businesses can ask the following questions:

  • What data does your company already have—both deep tacit knowledge and structured quantitative data?
  • How might you encourage conversations between employees from different departments and with different profiles and experiences?
  • Who within the company might have developed knowledge that is not formalised?
  • How might you surface that knowledge so it can be beneficial more widely within the organization?
  • How can you create more opportunities for employees to spend time immersed with customers or users?

Investing resources to gather data is necessary. Yet to be fruitful, it must be balanced by an equal investment in cultural capabilities and inspiration. Data should be an incentive for, rather than a substitute for, strategic thinking and experiential learning.


References

Carroll, L. (2013). Sylvie and Bruno. Western Standard Publishing Company.

Dubberly, H., Evenson, S. and Robinson, R. (2008). The Analysis-Synthesis Bridge Model. Interactions 15(2), 57–61.

Guitchounts, G. (2020). An Existential Crisis in Neuroscience. Nautilus 81: Maps, January 23.

Kasparov, G. and Greengard, M. (2017). Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins. John Murray.

Polanyi, M. and Sen, A. K. (2013). The Tacit Dimension. The University of Chicago Press.

Roberts, S. and Hoy, T. (2015). Knowing That and Knowing How: Towards Embodied Strategy. EPIC2015 Proceedings.

Washington, C., Hsu, S. and Kasthuri, B. (2019). Bobby Kasthuri & Brain Mapping. Manifold Podcast, Episode #2, July 23.

Image: "Abstract" by Etienne via flickr (CC BY-NC-ND 2.0).

Cyril Maury is a seasoned strategy and innovation practitioner, whose expertise centers on helping companies develop new business models to unlock growth opportunities. Having lived in Latin America and the Middle East, he particularly enjoys untangling the operational, organisational and cultural complexities inherent in expansion in emerging markets. He is now a Director at Stripe Partners.

Stripe Partners is an EPIC2021 Sponsor. Sponsor support enables our unique annual conference program that is curated by independent committees and invites diverse, critical perspectives.

