While running Presto queries in Qubole Data Service (QDS), if you’ve ever opened the Query Tracker link you must have noticed that your queries run in a modified form where the INSERT OVERWRITE DIRECTORY (IOD for short) is appended to the query you provided. For example, the left box in the image below is the QDS Analyze page where a user has run a SELECT query, and the right box is the Query Tracker page where the query is seen as an INSERT OVERWRITE DIRECTORY over the original user query:
Note that there is no visible impact to the user of appending INSERT OVERWRITE DIRECTORY to the query, as QDS ensures that results written to the directory are returned to the user.
Purpose of Converting a Query into an IOD Query
This addition of IOD is done for two main purposes:
- Persisting results in a permanent store for historical access
QDS allows users to visit the queries they have run in the past, and all information including results would be available for those queries. For this purpose, the results should be persisted to a permanent store. Performing the action internally in Presto via IOD works best, as Presto can leverage the parallelism of the cluster. Let’s see how:
In Presto, the output of a query will be available in parts in multiple nodes. For instance, if the query outputs result after a GroupBy operation, the result would be available in parts in all the nodes of the cluster that ran the final grouping operation (which is usually all the nodes in the cluster). If we were to write the results to a permanent store from outside Presto, we would have to write a client that would fetch these results via master serially and then write them to the store. This process of serially fetching and writing results through a single client becomes tremendously slow for queries with larger results.
Instead, if we convert the query into IOD, then all the nodes having results can independently write those results to the permanent store in parallel. This makes the query lifecycle much faster. - Avoid query stalls due to slow clients
Execution in Presto happens in a pipeline of operators — with Table Scan Operator being the leaf, Output Operator being the root of this pipeline, and other operators like Aggregation Operator, Join Operator, etc. lying in between. Data between operators flow in a group of rows called Pages. Table Scan reads a Page of data and sends it to its parent operator, which pushes its output to its parent and so on until it reaches the Output Operator. Users stream the query output from this Output operator as and when the Pages are available. But if the client is slow to pull out the Page from Output Operator, the whole Pipeline gets stalled. Adding an IOD ensures that results are streamed to a more reliable cloud store and queries execution continues independent of the client.
Changes for an IOD Query Conversion
To achieve this query conversion, we have implemented a few changes in QDS:
- Support for IOD syntax in QDS Presto: QDS Presto supports IOD syntax. Qubole contributed “INSERT INTO TABLE” implementation to the open-source community. “INSERT OVERWRITE DIRECTORY” implementation is specific to Qubole Presto (click here for more details).
- Auto-conversion of queries into IOD: We have two versions of adding IOD in queries:
- QDS middleware passes a location to Presto while submitting queries where it would search for the results of queries. We have modified Qubole Presto to inject IOD in the AST of the query so as to write the results to that location.
- Appending IOD in the query string in QDS middleware. This is a simple regex-based solution that finds the SELECT clause and appends IOD before it. Regex-based query rewrite is error-prone, as finding the right place to add IOD is difficult. For example, adding IOD to SELECT is simple but complicated for WITH queries, and then comments of different styles also need different handling. Therefore, the safest approach is to do this from inside Presto, and we are moving towards making approach (a) the default.
- Make QDS Middleware IOD-aware: Since the results of the query which has IOD injected is actually the data written by IOD and not the output of IOD (which is how many rows IOD wrote), the QDS middleware is made aware of this injection so that when a user requests the result of a query the data written by IOD is returned.
With these changes in place, users have their results always available for their historical queries at a minimal cost. The cost of writing results to a cloud store from the nodes running in the same cloud is small enough that we have moved to convert all queries into IOD in contrast to our old behavior of not doing it for queries with a smaller LIMIT clause. We are continuously working on improving the user experience and query performance, and to that end, we are working in the areas like faster writes to S3 in Presto using multipart uploads, increasing concurrency of IOD Writes in Presto, streaming results from s3 in our *DBC drivers, etc. We will soon blog about the performance benefits from these optimizations too.