Apache Spark remains a growing force in the realm of big data. Perhaps that shouldn’t come as a surprise considering the overall momentum behind big data analytics, but the growth in just the past few months has been nothing short of impressive. No doubt part of the reason behind that growth, besides a greater drive to take advantage of big data, is Spark’s notably improved performance compared with MapReduce. A deeper dive into the numbers reveals more about how Spark is used and who is using it.
Spark Usage in Line with Market Growth
We here at Qubole have seen this Apache Spark growth firsthand. More of the businesses that come to Qubole are asking how they can best apply Spark’s capabilities to the big data they have collected. What we have experienced isn’t out of the ordinary when compared to the market growth surrounding Spark. If anything, it goes hand in hand with Spark’s increasing popularity across the entire big data field.
“Half of all Qubole customers now use Spark as part of their analytic processing”
Spark Users Growing
Qubole’s own internal numbers show just how popular Apache Spark has become among enterprises. Let’s compare Spark usage in the Qubole Data System (QDS) in April 2016 to what it was in November 2015. In just that six-month period, Spark processing on Qubole’s platform increased by 36 percent. In other words, not only are more people using Spark, but the number of hours they’re using it is on the rise. Half of all Qubole customers are now using Spark as part of their analytic processing.
Given Spark’s versatility with numerous programming languages and frameworks, increased usage and queries seem like a natural result. Another increase can be seen in the number of unique Spark clusters that have been started by Qubole customers, which has risen by nearly 50 percent in the same six-month timeframe.
“In six months, Qubole has seen a 49% increase in unique Spark clusters started”
Other Engines are Growing
Spark isn’t the only analytics engine showing growth. We’re also seeing notable increases in usage for Presto. Looking once again at the period from November 2015 to April 2016, Qubole has seen the number of customers using Presto increase by around 300 percent, a number similar to Apache Spark growth. Internal Qubole numbers also show that the number of new Presto clusters during those six months rose from 97 to 254, an increase of roughly 162 percent. The growing popularity of both Spark and Presto indicates that analytics engines are becoming more in demand. Each engine has its own strengths depending on the job customers want it to do. There’s always a right tool for a specific job, and Qubole customers have access to a variety of analytics engines that can ensure the job is done right.
What these statistics tell us is that the growth seen in Apache Spark, along with other analytics engines, is very real. Spark is a fast analytics engine, capable of real-time processing. That connects Spark closely to several intriguing trends within technology, most notably the Internet of Things. With this in mind, industries of all types should have more than a passing interest in what Spark can do. In other words, Spark can do more than just provide a boost for tech companies. Other areas like healthcare, banking, and education also stand to gain from using Spark and its accompanying tools. As more organizations become familiar with big data and machine learning, they’ll naturally gravitate toward Apache Spark and other engines, and with that shift, we can expect the number of users and queries to increase as well.
Qubole’s big data-as-a-service offers all of the commonly used analytics engines. See how easy using Spark and Presto can be with a risk-free trial.