developers = "awesome"
print("Developers are {}".format(developers))

Qubole Test Drive

Try the Qubole Platform today. Get hands-on experience with Spark, Presto, Hive, and more.

Open Source Tools

GitHub repository of Qubole open source project contributions and tools.

Big Data Engines

Learn the components that make up Qubole’s big data technologies.

User Forum

Qubole product forum and discussion board.

Engineer Blog

Learn new developments, best practices, use cases and more from Qubole engineers and users.

Videos

Qubole Video Channel for Case Studies, Events, Education, and News

What It's Made Of:

Cloud Infrastructure

Big Data Engines and Frameworks

Data Pipelines

API/SDK

Metadata & Caching Optimization

Resource Management

SQL on Data Lakes

ML Workflows

Tuning Workloads

Data Sources & File Formats

Cloud Infrastructure

The Qubole Data Service is built for the cloud, with services available on AWS, Azure, and Oracle Cloud.

No need to manage clusters. Get instant access to Hadoop, Hive, Spark, Presto, and more at the push of a query.

Security for the cloud. Qubole supports multiple cloud infrastructures with enterprise compliance attestations (HIPAA, PCI, SOC 2).

Big Data Engines and Frameworks

From Data Science to Engineering. Quickly visualize unstructured data, build data pipelines, or train and productionize ML algorithms.

Common user interfaces for developing on Hadoop, Spark, Hive, and Presto, giving each data team self-service access to the data lake.

Integrate with technologies from the entire Big Data ecosystem (Apache Kafka, Ranger, HBase, Arrow, H2O, Superset, and many more).

Data Pipelines

A built-in scheduler to easily build and manage production data pipelines.

Build complex end-to-end pipelines easily with Airflow and the Qubole Operator.

Integrate third-party tools and technologies, from Jenkins to Oozie, Talend, and other automation and data pipeline services.

API/SDK

Qubole offers a full set of REST application programming interfaces (APIs) to manage all platform functions, from infrastructure to user management.

Use your favorite languages (Python, Java, Ruby, and R) to build with the Qubole SDK.

Use the Qubole Commands APIs to submit queries directly and retrieve the results of Hive, Spark, and Presto commands.
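As a rough illustration of submitting a query through the Commands API, here is a minimal Python sketch that uses only the standard library. The endpoint URL (`https://api.qubole.com/api/v1.2/commands`), the `X-AUTH-TOKEN` header, the `query` and `command_type` payload fields, and the helper names `build_submit_request` and `submit` are assumptions for illustration that this page does not confirm; check them against Qubole's REST API reference before relying on them. The sketch covers only Hive and Presto query commands and does not poll for status or fetch results.

```python
import json
import urllib.request

# Assumption: base URL and API version are illustrative; verify against
# Qubole's REST API reference for your account's environment.
QDS_COMMANDS_URL = "https://api.qubole.com/api/v1.2/commands"

# Assumption: command types accepted for simple query submission in this sketch.
SUPPORTED_COMMAND_TYPES = {"HiveCommand", "PrestoCommand"}


def build_submit_request(api_token: str, query: str,
                         command_type: str = "HiveCommand",
                         url: str = QDS_COMMANDS_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a query command.

    Hypothetical helper: the header name and payload fields are assumptions.
    """
    if command_type not in SUPPORTED_COMMAND_TYPES:
        raise ValueError(f"unsupported command_type: {command_type}")
    body = json.dumps({"query": query, "command_type": command_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-AUTH-TOKEN": api_token,  # assumed auth header name
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body.

    Hypothetical helper: requires network access and a valid API token.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `build_submit_request("YOUR_API_TOKEN", "show tables;")` builds a Hive submission without touching the network; passing the result to `submit` actually sends it. Keeping request construction separate from sending makes the payload easy to inspect and test before any call reaches the service.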

Metadata & Caching Optimization

Metastore caching for quick discoverability of your data lake, with secure encryption at rest.

Shared metadata caching to reduce wasted resources and improve performance when multiple users are querying.

Engine-level caching with Rubix, an open-source technology developed by Qubole, to improve the performance of Presto and Spark workloads.

Resource Management

Automation built for the cloud. Qubole separates storage from compute to enable dynamic scaling.

Big data clusters built with workload-aware autoscaling, aggressive downscaling, and optimizations that leverage AWS Spot Instances.

Built for petabyte scale in the cloud. Save and contain costs as you scale workloads, without manual intervention or tuning.

SQL on Data Lakes

Qubole Hive Metastore allows you to easily create tables and query structured and unstructured data in seconds.

Run federated queries across multiple data sources (NoSQL databases, Data Warehouses, and more) with Qubole Presto.

Use your favorite interface with Qubole SQL engines, whether that is Analyze Workbench, Notebooks, or your favorite BI tool.

ML Workflows

Build. Use your favorite Data Science workbench and tools (RStudio, Jupyter, SageMaker, H2O, and more) to explore and develop new models.

Train. Fast, self-service access to compute enables rapid model training, making selection of the right ML model a quick, iterative process.

Deploy. Whether you run batch or real-time ML operations, Qubole scales to petabytes of data and manages production pipelines.

Tuning Workloads

Qubole improves query performance at runtime with Join Ordering and Dynamic Filtering optimizations for Spark and Presto.

Proactively tune Spark workloads with SparkLens or optimize tables and queries with recommendations from Qubole AIR.

Live statistics collection on table performance for optimizing production workloads and datasets.

Data Sources & File Formats

Query any file format (JSON, Avro, Parquet, ORC, etc.) with any engine. Qubole allows self-service access to analyze cloud storage.

Integrate with your data warehouses, RDS, or data marts to give Qubole engines read/write access.

Big data engines (Hadoop, Spark, Presto) built for faster query performance on cloud object stores.

Big Data Training

Apache Hive for Data Engineering

Apache Spark for Data Science

Presto for Data Analysts