In an era of unprecedented uncertainty, executives are demanding new data and analytics capabilities to support changes in decision-making as they address the crisis, become more resilient, and eventually prepare for the recovery. Companies are increasingly seeking better insights by tapping into third-party data. This data can include almost anything, from Covid-19-related data and private company information to consumer behavior measurement, indexed foot traffic data, and workforce population data.
According to Chris Casey, Worldwide Head of Business Development, AWS Data Exchange, “In today’s data-driven world speed to insights is critical and in many cases a true competitive advantage for many organizations. We all want to make data-driven decision-making easier and derive value from data much faster than we have ever been able to do in the past.”
Significance of Third-Party Data
As most business and technology professionals know, the volume of data is increasing exponentially. To drive decision-making successfully, organizations need access to timely, relevant third-party data alongside their own internal datasets.
Third-party data sources can help businesses personalize their offerings, gain new revenue streams by launching new products or services, improve risk visibility and mitigation, and better anticipate shifts in demand for products and services.
At the Data Lake Summit 2020, Chris Casey delivered a keynote on the importance of access to third-party data in your data lake. He shared the significance of having quick and seamless access to discover and activate third-party data, especially when unplanned or fast-changing events unfold and require different inputs to forecast and adjust your business strategy.
Casey pointed out that in February 2020, as part of its mission, AWS Data Exchange began working with its customers to source third-party data to help academics, researchers, and the healthcare community triage Covid-19-related issues. “At AWS, we work with organizations to make Covid-19-related data available from various sources such as public records, foot traffic, business visitation patterns, and economic activity. Today we have more than 150 Covid-19-related data products on AWS Data Exchange,” he said.
How Organizations Are Leveraging Third-Party Data
Casey highlighted a few case studies showing how AWS customers achieved Covid-19-related innovations:
In one case, AWS customer Chan Zuckerberg Biohub combined Covid-19 data from the AWS data lake with its own third-party data to improve its predictive models of Covid-19 epidemiology. Similarly, the University of Texas at Austin is using free data from X-Mode, obtained via AWS Data Exchange, to develop an app that will help users avoid Covid-19 hotspots and learn of their exposure to Covid-19.
In another case, a global packaged goods company is using data from AWS Data Exchange to better estimate demand across its retail partners, and Covid-19 case data to improve its predictive models around employees and clients.
Closing Thoughts
As Covid-19 continues to upend our lives, people in every sector and country have turned to data to stay informed, share information, and respond with confidence. “At AWS, we believed the one way we could help was to provide experts with the data and tools they needed to understand, track, plan for, contain, and neutralize Covid-19,” Casey said.
The AWS data lake team established a public data lake on Covid-19 in April 2020, leveraging data from AWS Data Exchange’s data providers. The AWS Covid-19 data lake currently lets experimenters run analyses on the data quickly, without spending time extracting and wrangling it from the available sources. Organizations can use AWS or Qubole to perform trend analysis, do keyword searches, perform Q&A analysis, run machine learning models, or do custom analysis to meet their specific needs.
AWS Data Exchange makes subscribing to and providing data easy. It is one place to exchange all kinds of non-sensitive data. It natively integrates data into AWS and its partner technologies and provides easy subscription management, billing, and rapid, secure data provisioning. AWS Data Exchange gives you a single data ingestion mechanism to feed data into your data lake or cloud data warehouse running on AWS. It also allows you to leverage all the capabilities Qubole offers to use the data within your organization.
At Qubole, we have put the following considerations at the forefront of our data platform’s design to help organizations build third-party data repositories better and faster:
- The QDS platform supports full transactionality on a data lake, regardless of the cloud—AWS, GCP, or Azure.
- It provides built-in support for delete operations, enabling customers to comply with regulatory and privacy requirements for ‘Right to Erasure’ within established SLAs.
- You can write directly to cloud object stores, thus eliminating extra overhead while guaranteeing data integrity at the best performance possible.
Most importantly, we continue to provide freedom of choice of data processing engine (Apache Spark, Presto, Hive, and others), with a full implementation of ACID capabilities based on Hive transactional tables.
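To make the idea of a transactional delete for a ‘Right to Erasure’ request more concrete, here is a minimal sketch. It uses Python’s built-in sqlite3 module purely as a local stand-in for a transactional table; it is not the QDS or Hive API. The table name `visits`, its columns, the sample rows, and the helper `erase_customer` are all hypothetical and chosen only for illustration.

```python
import sqlite3


def erase_customer(conn, customer_id):
    """Delete every row for one customer inside a single transaction.

    Hypothetical helper: sqlite3 stands in for a transactional data lake table.
    """
    # Using the connection as a context manager commits on success
    # and rolls back if the block raises an exception.
    with conn:
        cur = conn.execute(
            "DELETE FROM visits WHERE customer_id = ?", (customer_id,)
        )
        return cur.rowcount


# Build a small in-memory table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (customer_id TEXT, store TEXT, visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("c-101", "austin", "2020-04-01"),
        ("c-102", "austin", "2020-04-02"),
        ("c-102", "dallas", "2020-04-03"),
        ("c-103", "houston", "2020-04-03"),
    ],
)
conn.commit()

# Erase one customer's records and count what is left.
deleted = erase_customer(conn, "c-102")
remaining = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(f"deleted={deleted} remaining={remaining}")
```

On an actual data lake the same pattern would be a DELETE statement issued against a transactional table through the chosen engine; the point of the sketch is only that the erasure either fully applies or leaves the table untouched.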