NEW! Spark 3.3 is now available on Qubole. Qubole’s multi-engine data lake fuses ease of use with cost savings. Now powered by Spark 3.3, it’s faster and more scalable than ever.
Apache Spark is a high-performance, distributed data processing engine that has become a widely adopted framework for machine learning, stream processing, batch processing, ETL, complex analytics, and other big data projects. Qubole has supported Apache Spark-as-a-Service since 2014 and has contributed several major projects (SparkLens) and optimizations (RubiX) back to the open-source community.
Qubole combines the biggest benefits of Spark (scalability, processing speed, and flexibility of languages) with an enterprise-ready data platform built to handle petabyte scale. With Qubole you can use your interface of choice (Notebooks, Web Console, SDK, or API) to build applications using Scala, Java, Python, or R. Qubole Spark runs some of the largest and most efficient clusters in the cloud, scaling from 10 to 1,000 nodes and back down in minutes.
Advanced cost controls result in up to a 50% reduction in costs with Qubole
Performance optimizations and smart management tools that increase Spark processing efficiency
Qubole makes Spark easier to use by automating back-end configuration and other day-to-day processes
Enterprise-grade security, JDBC/ODBC connectors to enterprise data sources, and third-party integrations
Apache Spark on Qubole vs. Open Source Apache Spark
Feature | Apache Spark on Qubole | Apache Spark
Spot Bidding | |
Graceful Spot Shutdown | |
Spot Rebalancing | |
Workload-Aware Autoscaling | |
Aggressive Downscaling with Graceful Decommissioning | |
Container Packing | |
Heterogeneous Clusters | |
Per-Second Billing | |
Advanced Multi-tenancy | |
Feature | Apache Spark on Qubole | Apache Spark
Faster Reads | |
Faster Writes | |
Compute Optimization for joins and filters | |
Fault isolation of compute resources | |
S3 Direct writes optimization | |
S3 listing optimization | |
Metadata Caching | |
RubiX (distributed caching) | |
Feature | Apache Spark on Qubole | Apache Spark
Multiple languages (PySpark, Spark SQL, Scala, etc.) | |
Multiple data sources (S3, Redshift, Snowflake) | |
Versioning | |
Scheduling | |
Dashboarding | |
Collaboration and sharing | |
Feature | Apache Spark on Qubole | Apache Spark
Profiling (SparkLens) | |
Monitoring (Ganglia, DataDog, etc.) | |
Intelligent Log Access | |
Feature | Apache Spark on Qubole | Apache Spark
Access control for notebooks, clusters, jobs, and structured data | |
Audit logs of end-user activity | |
SSO with SAML 2.0 support | |
Data encryption (at rest and in motion) | |
HIPAA, SOC 2 Type 2, and ISO 27001 compliant environments | |
Feature | Apache Spark on Qubole | Apache Spark
Connect with BI tools with authenticated ODBC/JDBC (Tableau, Looker, etc.) | |
REST API (Talend, Informatica, RStudio, Airflow, Oozie, etc.) | |
Data Source Connectors (Snowflake, Redshift, Kafka, Kinesis) | |
Feature | Apache Spark on Qubole | Apache Spark
24/7 support from our Spark experts | |
Runs multiple versions of Apache Spark | |
Free access to Qubole for 30 days to build data pipelines, bring machine learning to production, and analyze any data type from any data source.
See what our Open Data Lake Platform can do for you in 35 minutes.