A recent article from McKinsey claimed that, on average, only 15 percent of big data projects meet expectations. For all the promise that big data holds, very few companies have been able to extract the full potential of the data they collect and store. And we collect far more data today: with the increase in connectivity and the proliferation of data-producing sources such as the Internet of Things (IoT), mobile, social media, and application and machine logs, enterprises are capturing and processing unprecedented amounts of data. IDC estimated that the total amount of data will increase by 440 percent by 2020.
Yet enterprises as a whole are leaving a lot of value on the table and face difficulties translating amassed data into actionable insights. McKinsey estimates that more than 70 percent of the potential value of all data is unrealized, and only one percent of big data captured in an unstructured format is analyzed or put to use.
In short, we have become very good at capturing data. Making that data available to users so it can inform business decisions, however, usually exposes serious problems with the economics of scaling and with serving data across all consumption points.
It’s Time for a Different Approach to Big Data
To expand from a few focused projects and transition to a truly data-driven business — where data informs every business decision — organizations need to focus on three areas:
- A cloud-first approach
- The scalability and elasticity of the cloud are well suited to the bursty nature of big data workloads. There is no upfront cost for experimentation, and you can scale capacity up or down to match demand (see the sketch after this list).
- Avoid technology and people silos
- Different tasks in the big data lifecycle (ETL, machine learning, BI) use different engines. Teams rely on multiple engines because certain tasks run better on a particular engine, or because one engine suits the data team's background better than another. Supporting these engines on a shared platform keeps each team from building its own isolated stack.
- Boost productivity and time to value with self-serve access
- As more users require access to data, it is critical that the data team not become a bottleneck. If the data team remains responsible for manually provisioning users and creating data sets, scaling becomes impossible. Consider that a typical on-premise big data infrastructure has a 1:5 to 1:10 ratio of administrators to users; at the 1:10 ratio, scaling to 500 users means hiring 50 administrators, with all the salary and management overhead that implies.
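To make the elasticity point from the first bullet concrete, here is a minimal Python sketch of demand-based capacity sizing. The thresholds, the `target_node_count` heuristic, and the `ClusterClient` class are hypothetical stand-ins for whatever resize API your cloud provider or data platform actually exposes.

```python
# Minimal sketch of demand-based scaling for a bursty workload.
# All names and thresholds here are illustrative assumptions.

def target_node_count(pending_tasks: int,
                      tasks_per_node: int = 8,
                      min_nodes: int = 2,
                      max_nodes: int = 100) -> int:
    """Size the cluster to the current backlog, within fixed bounds."""
    needed = -(-pending_tasks // tasks_per_node)  # ceiling division
    return max(min_nodes, min(max_nodes, needed))


class ClusterClient:
    """Hypothetical wrapper around a cloud provider's cluster-resize API."""
    def resize(self, cluster_id: str, node_count: int) -> None:
        print(f"Resizing {cluster_id} to {node_count} nodes")


if __name__ == "__main__":
    client = ClusterClient()
    # A burst of 400 pending tasks scales the cluster up;
    # an idle queue lets it shrink back to the minimum.
    for backlog in (400, 40, 0):
        client.resize("etl-cluster", target_node_count(backlog))
```

Run on a schedule or triggered by queue metrics, a policy like this scales a cluster up for bursts and lets it shrink back when demand subsides, so capacity (and cost) tracks actual usage.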
The cloud approach gives data teams five key capabilities for providing users with access to data:
Scalability: With the cloud, there is no practical limit on the amount of data that can be processed. Simply scale up resources to match demand. Anyone who needs to process data can do so, paying only for the compute they use.
Elasticity: Provision or de-provision resources to meet real-time demand. You can change the capacity and power of machines on the fly, leading to greater agility and flexibility.
Self-Service and Collaboration: Everything is API-driven. Users can choose the resources they need without requiring that someone else provision these for them.
Cost Efficiency: The benefits are twofold. First, costs are usage-based rather than tied to software licensing, so you pay only for what you use. Second, operational costs are much lower, because the cloud boosts the productivity of IT personnel.
Monitoring and Usage Tracking: Finally, the cloud provides monitoring tools that allow organizations to tie usage costs to business outcomes, and therefore gain visibility into their Return On Investment (ROI).
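As an illustration of the monitoring point above, the sketch below shows one way usage costs could be rolled up by project and compared against an estimated business value. The records, rates, and value figures are placeholders rather than real benchmarks, and the aggregation logic is an assumption, not a description of any particular tool.

```python
# Minimal sketch of tying usage costs back to business outcomes.
# All figures are placeholder values for illustration only.

from collections import defaultdict

# Hypothetical per-job usage records exported from a billing/monitoring tool.
usage_records = [
    {"project": "churn-model",   "compute_hours": 1200, "rate_per_hour": 2.50},
    {"project": "churn-model",   "compute_hours": 800,  "rate_per_hour": 2.50},
    {"project": "sales-reports", "compute_hours": 400,  "rate_per_hour": 2.50},
]

# Hypothetical value each project is estimated to deliver to the business.
estimated_value = {"churn-model": 25_000, "sales-reports": 4_000}

cost_by_project = defaultdict(float)
for record in usage_records:
    cost_by_project[record["project"]] += (
        record["compute_hours"] * record["rate_per_hour"]
    )

for project, cost in cost_by_project.items():
    roi = (estimated_value[project] - cost) / cost
    print(f"{project}: cost=${cost:,.2f}, estimated ROI={roi:.1f}x")
```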
Getting Started on the Cloud
Enterprises can take a few different approaches when making their move to the cloud. These can be broadly categorized as:
Lift and Shift: In this approach, the entire on-premise software stack is replicated on the cloud to take advantage of the shift from CapEx to OpEx. This approach is a great way to get started and experiment on the cloud without a very significant upfront investment. The downside is that it does not take advantage of cloud features such as separation of compute and storage, autoscaling, and other cost optimizations.
Lift and Reshape: In this approach, the organization adopts true cloud computing, which is the minimum requirement for success with big data on the cloud. As organizations mature, they can take a workload-driven approach that exploits the cloud’s elasticity, and IT moves from estimating and provisioning for ‘what-if’ scenarios to facilitating business outcomes. However, technologies and tools in the big data space evolve continuously, and supporting every user and use case becomes cumbersome as new users are onboarded.
Autonomous Cloud Data Platform: This approach builds on top of lift and reshape by adding advanced features built specifically to optimize cost and cloud computing for big data operations. Using a combination of heuristics and machine learning, big data cloud automation ensures workload continuity, strong performance, and the greatest cost savings. Automating lower-level tasks makes engineering teams less reactive and more focused on improving business outcomes.
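As an example of the kind of heuristic such automation might apply, the sketch below chooses a mix of on-demand and lower-cost spot nodes based on how often spot capacity is being reclaimed, trading cost savings against workload continuity. The percentages, the `SpotMarket` view, and the `plan_node_mix` policy are all hypothetical; real platforms rely on far richer signals.

```python
# Minimal sketch of a cost/continuity heuristic for node selection.
# The market model and thresholds are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class SpotMarket:
    """Hypothetical view of the current spot market for one instance type."""
    interruption_rate: float  # fraction of spot nodes expected to be reclaimed
    discount: float           # price discount vs. on-demand, e.g. 0.7 = 70% off


def plan_node_mix(total_nodes: int, market: SpotMarket,
                  max_spot_share: float = 0.8) -> dict:
    """Shift toward spot nodes when interruptions are rare, back off when not."""
    spot_share = max_spot_share * (1.0 - market.interruption_rate)
    spot_nodes = int(total_nodes * spot_share)
    on_demand_nodes = total_nodes - spot_nodes
    # Blended cost relative to running everything on-demand.
    blended_cost = on_demand_nodes + spot_nodes * (1.0 - market.discount)
    return {
        "on_demand": on_demand_nodes,
        "spot": spot_nodes,
        "cost_vs_all_on_demand": round(blended_cost / total_nodes, 2),
    }


if __name__ == "__main__":
    calm_market = SpotMarket(interruption_rate=0.05, discount=0.7)
    print(plan_node_mix(total_nodes=50, market=calm_market))
```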
Several companies have leveraged Qubole’s cloud-native data platform to transition to the cloud and use big data successfully to improve business outcomes.
Why Enterprises Choose Qubole
- 3x faster time to value: Qubole delivers faster innovation with self-service access to big data, enabling use cases that can be deployed in days — not weeks or months.
- Single platform: Our platform offers a shared infrastructure for all users with the ability to leverage multiple best-of-breed engines. Qubole is massively scalable on any cloud, thereby preventing vendor lock-in.
- 10x more users and data per administrator: Qubole’s self-service platform enables administrators to set policy controls and manage user/group access privileges. The platform’s automation capabilities ensure that all users and workloads are provisioned automatically.
- 50% lower TCO*: Several cost optimization features allow users to take advantage of lower-cost compute.
*Compared to other on-premise and cloud data platforms