Why is “data the new oil”?

September 06, 2017 • by Vivek Sharma

The importance of oil to the global economy can hardly be overstated. From a demand perspective, oil and its derivatives are critical to many industries: automotive, aviation, and industrial, to name a few. As a product, oil has no direct substitute that matches its scale. From a supply perspective, only a handful of countries (OPEC) produce oil beyond their own needs. Against this backdrop, the mainstream media phrase “data is the new oil” sets a very high bar for data. In this blog post, we’ll examine whether and why this phrase holds true.

Every time we use a digital device, we create information that is stored digitally as bits (binary digits). In 2016 alone, 3.4 billion people spent over 6 hours per day on the internet which, together with the Internet of Things and the industrial internet, contributed to 2.5 quintillion bytes of data created every day. 80% of all data available today was created in just the last 2 years and, per IDC, we will be creating 163 zettabytes (1 zettabyte = 1 trillion GB) of data annually by 2025. A large portion of this data is generated through user-facing use cases by a few leading technology companies like Amazon (eCommerce, cloud), Facebook (social, chat), and Google (search, browser, cloud). This market power is likely to consolidate further in the near future for three key reasons.
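To put those figures on a common scale, here is a rough back-of-the-envelope sketch in Python. It assumes decimal (SI) units and a constant daily rate over a full year, both of which are simplifications. The variable names are purely illustrative.

```python
# Decimal (SI) byte units; an assumption, since the post does not say
# whether decimal or binary prefixes are meant.
GB = 10**9
EB = 10**18   # 1 exabyte = 1 quintillion bytes
ZB = 10**21

# Figures quoted in the paragraph above.
bytes_per_day = 2.5 * EB        # 2.5 quintillion bytes created per day
forecast_zb_per_year = 163      # forecast annual volume, in zettabytes

# Check the unit conversion: how many GB fit in one zettabyte?
gb_per_zb = ZB // GB

# Annualize the daily figure, assuming the same volume every day.
zb_per_year = bytes_per_day * 365 / ZB

# How much larger is the forecast than the current annual pace?
growth_factor = forecast_zb_per_year / zb_per_year

print(f"1 ZB = {gb_per_zb:,} GB")
print(f"Current pace: about {zb_per_year:.2f} ZB per year")
print(f"Forecast: roughly {growth_factor:.0f}x the current pace")
```

Under these assumptions, the daily figure works out to a little under one zettabyte per year, so the forecast implies growth of more than two orders of magnitude.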

First, these companies are merging offline data with online data. Even today, 90% of US retail and 70% of our daily life happen offline, which creates a significant opportunity for these companies to buy offline data from third parties and “mash it up” with online data to build a 360-degree view of consumer activities. Niraj Dawar explains further in “Has Google Finally Proven That Online Ads Cause Offline Purchases?”:

“Last week [June 2017] Google announced that, in an effort to bridge the “online ad–offline purchase” gap, it will begin to connect online ad exposure to brick-and-mortar sales. The company claims it will be able to track about 70% of all credit and debit card transactions and link them to online consumer behavior. Google’s move is also a competitive response to Facebook’s partnership with Square and Marketo [allowing] Facebook to track consumer store visits and some transactions…”

Second, they are using powerful technology tools to derive real-time insights from large amounts of unstructured data. When we say the word “data,” we tend to assume that it is structured, that is, clearly defined through a format, table, or common identifier. However, the majority of the data being generated today is either semi-structured (e.g. social media posts) or unstructured (e.g. text, video, and image files). Rob Kitchin provides a great overview of the challenges of analyzing unstructured data in his book The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences:

“... unstructured data do not have a defined data model or common identifiable structure. Each individual element, such as narrative text or photo, may have a specific structure or format, but not all data within a dataset share the same structure. As such, while they can be searched and queried, they are not easily combined or computationally analyzed… [but] some estimates suggest that such data are growing at 15 times the rate of structured data...”

Storing and analyzing huge amounts of unstructured data requires sophisticated technology tools like Hadoop, artificial intelligence (AI), and machine learning (ML), an area where technology firms have a natural advantage. Furthermore, the effectiveness of those tools generally improves as they process more data, which compounds that advantage.

Third, these companies are organizing data around the individual. In the early days of the internet, there was an adage about online anonymity: “On the Internet, nobody knows you're a dog.” That saying no longer holds much truth, as more and more aspects of online and offline activity are organized around individuals, yielding powerful insights that drive personalization and microtargeting. A Federal Trade Commission study revealed the extent to which businesses are collecting detailed information on individuals:

"Data collected could include bankruptcy information, voting registration, consumer purchase data, web browsing activities, warranty registrations, and other details of consumers’ everyday interactions. While each data broker source may provide only a few data elements about a consumer’s activities, data brokers can put all of these data elements together to form a more detailed composite of the consumer’s life. For example, one of the nine data brokers has 3000 data segments for nearly every U.S. consumer…."

Remember back in the day when you needed a Thomas Guide to go anywhere? Around 15 years ago, that need shifted from static maps to dynamic, location-aware navigation technology. Nowadays, I know very few people who go anywhere without first plugging the address (even one they know by heart) into Google Maps to get a recommendation on the fastest route, one that takes into account both distance and real-time traffic conditions.

Across industries, users now expect real-time, personalized recommendations driven by computational analytics as a baseline, but only a small set of players can play this game, giving them a potentially insurmountable competitive advantage. With inelastic demand, no real product substitute, and an oligopolistic supply side, data truly is the new oil!