Qubole offers Hive as a Service. When a user logs in to Qubole, they see the tables and functions associated with their account and can submit a HiveQL command via the composer pane. Qubole takes care of executing the HiveQL command, spawning a Hadoop cluster if necessary, and saving the results and logs. Now, multiple users belonging to different accounts may issue commands at the same time. Under the covers, these commands, along with contextual information, go into a common queue and are picked up in FIFO order. A wrapper script that does per-account housekeeping picks up a HiveQL command, fires up a Hive JVM with the right configuration parameters, and passes the command as an argument. This Hive JVM compiles the command and runs it against the customer’s cluster. This sequence of events is shown below.
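To make the wrapper step concrete, here is a minimal sketch of launching a fresh Hive JVM per command, written in Java purely for illustration. Our wrapper is actually a script, and the hive binary path and the --hiveconf keys shown here are assumptions rather than the exact parameters we pass.

```java
import java.io.IOException;

// Hypothetical sketch only: the real wrapper is a script, and the binary
// path and configuration keys below are illustrative assumptions.
public class HiveCommandWrapper {
    public static int runHiveCommand(String hiveql,
                                     String metastoreUri,
                                     String jobTracker)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "hive",
                "--hiveconf", "hive.metastore.uris=" + metastoreUri,
                "--hiveconf", "mapred.job.tracker=" + jobTracker,
                "-e", hiveql);
        pb.inheritIO();                  // stream Hive's logs and results to the caller
        Process hiveJvm = pb.start();    // spawning this JVM is the ~7 second overhead
        return hiveJvm.waitFor();        // exit code of the Hive CLI
    }
}
```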
Bringing up a new Hive JVM for every command has a performance impact. Spawning a JVM with all the required jars and classes takes approximately 7 seconds. This overhead becomes significant for short-running commands (e.g., EXPLAIN, CREATE EXTERNAL TABLE). The latency is especially problematic when a user is composing a new query and has to wait 7 seconds just to get a syntax error back from the system. So we started looking into what it would take to eliminate this latency and improve the user experience. The answer was to have a pre-warmed server that serves requests from lightweight clients. The new flow of control looks like this:
The devil, however, is often in the details. Hive is known to have issues with multi-threading. Hive does come with a HiveServer, but we needed a server that could support different accounts, operate against different metastores, and talk to different Hadoop clusters. Hence, we started working on a multi-threaded, multi-tenant Qubole Hive Server (QHS). The first order of business was fixing thread-safety issues and any code that assumed Hive talks to a single metastore. Much of this was code hygiene: changing static variables to ThreadLocals, adding synchronization, and cleaning up or disabling internal caches. We’re in the process of contributing this work back to the community.
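As a rough illustration of the kind of change involved, here is a sketch of turning a process-wide static field into a ThreadLocal so that each command thread carries its own, per-tenant configuration. The class and field names are stand-ins, not actual Hive code.

```java
// Illustrative sketch of the code-hygiene work described above; "HiveConfLike"
// is a stand-in name, not a real Hive class.
public class SessionContext {

    // Before (single-tenant, thread-unsafe): one value shared by every command.
    // private static HiveConfLike conf;

    // After: each handler thread gets its own copy.
    private static final ThreadLocal<HiveConfLike> CONF =
            new ThreadLocal<HiveConfLike>() {
                @Override
                protected HiveConfLike initialValue() {
                    return new HiveConfLike();
                }
            };

    public static HiveConfLike get() {
        return CONF.get();
    }

    // Clear at the end of a command so a pooled thread does not leak one
    // tenant's settings into the next command it picks up.
    public static void clear() {
        CONF.remove();
    }

    // Minimal stand-in for a per-command Hive configuration object.
    static class HiveConfLike {
    }
}
```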
We used Thrift to generate the client and server stubs. The API is relatively modest. The client (in Python) submits the HiveQL command along with metastore and cluster information and gets a command_id in response. The client then polls the server and receives a completed status when the command is done. If the user wishes to cancel a running query, the client passes a cancel request to QHS with the appropriate command_id. QHS maintains per-client state in its internal data structures and uses one thread to handle each command. QHS also keeps track of currently running MapReduce jobs and associates them with command_ids in case it needs to kill them when a command is cancelled. We also implemented a graceful shutdown of QHS that waits for currently running queries to complete, which makes it easier to release new internal versions of Hive with minimal disruption to our users.
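A hypothetical Java-side view of that Thrift service is sketched below. The actual Qubole IDL is not published, so the method and field names here are assumptions that simply mirror the flow just described: submit, poll, cancel, and shut down gracefully.

```java
import java.util.Map;

// Hypothetical interface mirroring the QHS Thrift API described in the text;
// all names are illustrative assumptions.
public interface QuboleHiveServer {

    enum CommandStatus { RUNNING, COMPLETED, FAILED, CANCELED }

    /** Submit a HiveQL command along with per-account metastore and cluster
     *  information; the returned command_id is used for polling and cancel. */
    String submitCommand(String hiveql,
                         String metastoreUri,
                         String clusterAddress,
                         Map<String, String> hiveConfOverrides);

    /** Clients poll this until they see a terminal status such as COMPLETED. */
    CommandStatus getStatus(String commandId);

    /** Cancel the command and kill any MapReduce jobs QHS has associated
     *  with this command_id. */
    void cancelCommand(String commandId);

    /** Stop accepting new commands but let running queries finish, so a new
     *  internal Hive build can be rolled out with minimal disruption. */
    void shutdownGracefully();
}
```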
The net effect of QHS was that the latency of the simplest of commands came down from 7 seconds to under 0.4 seconds.
There was one issue that was a little perplexing. After running for a week or so, QHS started throwing “too many files open” exceptions. A quick lsof call confirmed that there were numerous open file handles; surprisingly, they all pointed to jar files. After some investigation, we found that URLClassLoader leaks file handles to the jars it opens (see this link for some dirty details), and these handles are never garbage collected. We ended up using the non-standard ClassLoaderUtil.releaseLoader to free up the resources. Java 7 has a nicer solution: URLClassLoader has a close method that performs the necessary cleanup.
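For reference, here is a small sketch of the Java 7 approach. URLClassLoader implements Closeable, so closing the loader releases the jar file handles that would otherwise accumulate; the surrounding class and method names are illustrative.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of the Java 7 fix: closing the URLClassLoader releases its open jar
// handles. (On Java 6 we had to fall back to the non-standard
// ClassLoaderUtil.releaseLoader instead.)
public class LoaderCleanup {
    public static void loadAndRelease(URL[] perCommandJars) throws IOException {
        URLClassLoader loader = new URLClassLoader(perCommandJars);
        try {
            // ... resolve classes from the per-command jars here ...
        } finally {
            loader.close();   // Java 7+: closes the open jar file handles
        }
    }
}
```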
If you’re interested in giving Qubole a try, please sign up to start your free trial.