Airflow Movie Recommendation Engine Example
Learn the basics of using Airflow with each big data engine in Qubole (Spark, Presto, and Hive) to build an ETL pipeline that structures the MovieDB dataset. From there, learn how to use Airflow with Spark to run a batch ML job that can be used to productionize the model trained on the now-clean data.
Wikipedia Trends Pipeline with Hive & Airflow
A big data app that displays the topics trending on Wikipedia. It has two main parts: a Ruby on Rails web app and the Hive data pipeline that feeds it, which is hosted in the Qubole scheduler. The demo also includes a variation that uses Apache Airflow.
Demo Query that can invoke Qubole Autoscaling
This SQL query was used in the Qubole Autoscaling white paper and can be used for internal tests against multiple engines (Spark, Presto, and Hive).
Get instant access to Notebook examples by selecting any of the tiles below. The examples vary in difficulty, from visualization to machine learning use cases, and use SQL, Python, Scala, AngularJS, and more. Download the Notebooks into Qubole Spark to run them yourself.
Free access to Qubole for 30 days to build data pipelines, bring machine learning to production, and analyze any data type from any data source.
See what our Open Data Lake Platform can do for you in 35 minutes.