Below the Surface of the Data Lake: An Ethnographic Case Study on the Detrimental Effect of Big Data Path Dependency at a Theme Park



When evaluating the project, it became clear that the anthropologists had spent the majority of their time chasing a mirage. There were two reasons for this. Firstly, the current CDP included very little data on individuals and no data at all on groups. Capturing data on the customer experience, and not just the effect changes might have on the park’s performance, was thus a futile endeavour from the get-go. Secondly, the project did not have the mandate to define and suggest new data sources to be included in the CDP. This meant that the team could only work with the data sources included in the current CDP.

Before the data scientists made a bid for the project, the theme park executives had written a request for proposal (RfP) detailing what sources they wanted to include in the CDP. These were a mix of data the theme park was already collecting, but would like to automate, and industry best practice. The advantage of defining what data sources are needed upfront is that it lends itself well to comparison between different bidders: What is the difference in price? What are the strengths and weaknesses of their different approaches? The disadvantage is that a decision about what data to include is, to some extent, taken before knowing what data is needed. And once a contract has been signed, it is difficult to alter course. Thus, when the theme park executives signed the contract with the bidder they found most appealing, they also made a decision that meant it would be possible to measure the park’s operational performance but not the customer experience. This was not a deliberate decision, however, as they were unaware of the path dependency (Marquis & Tilcsik: 2013) their signatures created. And none of the proposals received by the park executives had challenged the wisdom of the path dependencies created by the RfP.

Looking across other data integration projects that ReD Associates has been involved in, it is clear that there is a lack of recognition of the path dependencies created as early as the formulation of an RfP. One explanation is that the impetus for most data integration projects seems to stem from a desire to digitalize the company’s current data practice rather than build a new digitally based strategy: to automate and consolidate current data collection, rather than rethink the new possibilities that a digitalization of data could open up. The decision about what data to include thus largely becomes a common-sensical process, where companies formulate data integration RfPs based on what data they already collect, but haven’t digitalized. Then, once data collection has been automated and consolidated, many companies hope to take advantage of this new entity they have created by asking: What else might they be able to learn from the vast amount of data? This is a question that anthropologists might be hired to help answer, by coming up with new questions to ask the new big data entity. However, due to the often-ignored path dependency, the answer to that question is often: Not very much. Having lots of data doesn’t necessarily translate into having the right data. As a result, many companies end up with big data platforms, but without the ability to answer some of the questions most fundamental to them. This was the result for the theme park described in this case-study, as the CDP left the theme park without any current means of measuring whether they deliver a good customer experience. In recognizing the detrimental effect that path dependency based on early data decisions can have on companies’ big data projects, the authors of this case study suggest two ways forward.

Firstly, many companies would likely benefit from an open-ended analysis of their data needs, one that is not limited to what data is currently important for the company, before finalizing what data to include in a new data platform. This will enable companies to make an informed decision about what data sources to prioritize – i.e. a digitalization strategy. This suggested approach is illustrated in fig. 4 below:


Figure 4: To build a digitalization strategy, companies should first conduct an analysis of what they should use the data for (red arrow), rather than just digitalizing their current data sources (blue arrow), which is how most companies proceed today.

Had the ethnographic research described in this case-study been carried out prior to the establishment of the CDP, the theme park’s executives might have prioritized, for example, geo-location data over digitalizing data on the average length of stay of employees. This example illustrates how anthropologists can help companies build a digitalization strategy by identifying what is important to measure and what implications this has for data collection. Doing so prompts companies to think through their data strategy, rather than just digitalizing their current setup.
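The comparison between what a company wants to learn and what its planned data sources can capture can be made concrete with a small sketch. The Python below is purely illustrative and is not part of the project described in this case-study: the class names, the three data levels, and the example questions and sources are all assumptions chosen to mirror the situation described above, where planned sources covered operations but not groups.

```python
from dataclasses import dataclass

# Hypothetical levels at which a data source describes the park (an assumption
# for illustration; the case study does not define such a taxonomy).
LEVELS = ("operations", "individual", "group")


@dataclass(frozen=True)
class DataSource:
    name: str
    level: str  # one of LEVELS


@dataclass(frozen=True)
class Question:
    text: str
    required_level: str  # level of data needed to answer the question


def coverage_gaps(questions, sources):
    """Return the questions that no planned data source can address."""
    available = {source.level for source in sources}
    return [q for q in questions if q.required_level not in available]


# Illustrative inputs only; not the actual contents of the RfP or the CDP.
planned_sources = [
    DataSource("ticket sales", "operations"),
    DataSource("employee length of stay", "operations"),
]
questions = [
    Question("How is the park performing operationally?", "operations"),
    Question("Do groups bond during their visit?", "group"),
]

for question in coverage_gaps(questions, planned_sources):
    print(f"No planned source covers: {question.text}")
```

Run before a contract is signed, a check of this kind would flag the group-level question as unanswerable, which is the conversation the open-ended analysis is meant to trigger. It only works, however, if the questions themselves have been identified first, which is where the ethnographic research comes in.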

Secondly, there is also a lesson for anthropologists and data scientists staffed on projects with data sources clearly defined in an RfP. In hindsight, the anthropologists involved in the project described above were naïve in thinking that the CDP would be able to conjure up a measurement for whatever they found to be important for the customer experience, and the data scientists were over-confident in thinking that they could. Had the anthropologists looked below the surface of the data lake early on in the project, or had the data scientists been more critical of the possibilities offered by the current data sources, they would both have found that there was very little data likely to be relevant for measuring the customer experience. This could have prompted a conversation about how best to proceed, rather than chasing the mirage of trying to build a bonding index with the data available within the current CDP. A recommendation for future projects aimed at combining thick- and big data would thus be to start out with a critical assessment of the available data sources, understanding them as important actants (Latour: 1987, 1993, 2005) for the project. If these turn out to be likely incompatible with the hoped-for outcome, a re-scoping of the project will be necessary. Furthermore, even if the available data sources seem relevant, it would be advisable to build in the flexibility to include new ones. Had the CDP, for example, included a lot of data on individuals, an early assessment of the data sources might not have raised any red flags. However, as the anthropologists’ analysis found, what mattered most for the customer experience was the intra-group bonding. Thus, even with no red flags raised in an early assessment, the CDP might still not have been able to measure the thing that mattered most for a good customer experience if it wasn’t able to capture data on groups.
It would therefore be advisable for projects seeking to combine the strengths of both thick- and big data to avoid a strictly narrow scope, as illustrated in fig. 5 below:


Figure 5: Projects aimed at combining the strengths of thick- and big data should avoid narrow scopes, in order to make room for the explorative power of the anthropological method.

The core strength of the anthropological method is its ability to understand people on their own terms (Malinowski: 1922), unbiased by pre-conceived ideas of what might and might not matter to them. This ability is what enables anthropologists to produce data that is deep, meaningful and original. A narrow scope risks short-circuiting this ability if the anthropologists’ understanding of the human experience has to fit within a pre-defined set of metrics. Data scientists, on the other hand, often thrive in such conditions, where they can apply mathematical modelling to determine the most important factors in a defined multi-dimensional space – i.e. a dataset with a limited number of variables. However, as illustrated and argued in this case-study, the open-ended anthropological approach can have great value for data digitalization projects, by answering the question: What data should be included in a big data platform?


When companies consolidate their data in large, digital data platforms they increasingly start asking what else they might be able to do with this new asset. What other questions might it help them answer? What other truths about their company and their customers does it hold? This case-study has illustrated why many executives are likely in for a disappointment when asking questions like these after a digital data platform has been created. Having lots of data doesn’t necessarily enable companies to answer their most fundamental questions, especially considering that the impetus for many data integration projects is to make existing data practices faster and smarter – not to answer new questions. This case-study argues for a new approach, suggesting that companies start by identifying what questions are most important for them to answer before deciding on what data to collect. This will enable companies to be strategic about their digitalization and make informed prioritizations about what data to collect and when. Anthropologists are adept at identifying the fundamental questions that companies should ask at the beginning of large data integration projects. For these reasons, companies embarking on projects entailing the collection of large amounts of data would be wise to start out with an anthropological analysis.

2018 Ethnographic Praxis in Industry Conference Proceedings, pp. 631–645, ISSN 1559-8918


Bauman, Zygmunt
2000.     Liquid Modernity. Cambridge. Polity Press.

Brines, Julie & Serafini, Brian
2016.     Is divorce seasonal? UW research shows biannual spike in divorce filings. In University of Washington News, August 21st, 2016.

2015.     Roller-coaster’s ‘weird sensations’ perceived differently with age. CBS News. Accessed July 2nd, 2018.

Geertz, Clifford.
1973.     The Interpretation of Cultures. New York. Basic Books.

Kuang, Cliff.
2015.     Disney’s $1 Billion Bet on a Magical Wristband. Wired Magazine. Accessed July 3rd, 2018.

Latour, Bruno.
1987.     Science in action: how to follow scientists and engineers through society. Cambridge, Massachusetts: Harvard University Press.

Latour, Bruno.
1993.     We have never been modern. Translated by Catherine Porter. Cambridge, Massachusetts: Harvard University Press.

Latour, Bruno.
2005.     Reassembling the social: an introduction to actor-network theory. Oxford New York: Oxford University Press.

Malinowski, Bronislaw.
1922.     Argonauts of the Western Pacific: An Account of Native Enterprise and Adventure in the Archipelagos of Melanesian New Guinea. Illinois. Waveland Press, Inc.

Marquis, Christopher & Tilcsik, András.
2013.     Imprinting: Toward a Multilevel Theory. Academy of Management Annals: 193-243.

Stein, Brian & Morrison, Alan.
2014.     The enterprise data lake: Better integration and deeper analytics. Accessed September 8th 2018.
