Qubole Security Model for Authorization in AWS Cloud

October 17, 2017 by Shefali Aggarwal Updated April 13th, 2024

Qubole has defined a new security model to improve enterprise-level security and data governance on the cloud. The model integrates the cloud vendor’s storage authorization with Hive authorization. This improves usability for both cloud-storage administrators and Data Administrators (DBA) while eliminating errors that arise from end-user authorization problems.

This is an important milestone on the way to Qubole’s goal of building a secure, enterprise-level cloud platform. Qubole is one of the first vendors to add cloud storage-level checks at query compile time, and consequently offers the most secure platform in the cloud.

History of Database Security Model

In traditional databases (RDBMS and NoSQL), the database had complete control over the catalog, compute, and storage, so administrators and users used the database as the single source of truth for authentication and authorization.

In early versions of Apache Hive, the catalog (Hive Metastore), compute (M/R), and storage (HDFS) were separate. Seamless authorization was not possible because each system had to administer independently. Apache Hive released SQL Standard Based Hive Authorization in Hive 0.13, which integrated all the systems. This mechanism runs storage-level checks through the Metastore server and checks the GRANT tables during query compile. As a prerequisite, a user must be registered with Hive and HDFS, and both must have a single view of users and their credentials. Similarly, Apache Ranger and Apache Sentry both provide HDFS-level file permission checks when the same user is trying to access tables through SQL.

Cloud Security Model

In the case of public clouds, data is stored in the cloud, and not in HDFS. Authorization on cloud storage is managed by the cloud vendor. SQL Standard Based Hive Authorization in Hive does not work in this case because there is no unified view of users and credentials.

Cloud Data Access Control

Consider the typical representation of a Hive table in the cloud depicted in the diagram above:

A Hive table consists of files in the cloud storage, and catalog information in the Hive Metastore. Consequently, users have to be assigned Roles/Keys for cloud storage as well as USER or ROLE with the Hive database. The authorization modules of the cloud storage and the database are separate, and it is very hard to keep them synchronized. A mapping or coordination is needed to manage the pairs of roles and use them effectively.

Qubole Security Model

Qubole has integrated cloud storage authorization with Hive authorization by introducing a new security model. In QDS, users can choose between two models:

L1: User uses cloud storage permissions to control table access. Users may also combine SQL statements to define access policy and rules, but will mainly use such statements for error-proofing (e.g. in the case of a user accidentally dropping a table).

We’ll discuss the two models in detail below, using AWS S3 storage as an example.

L1: Cloud Storage Authorization(AWS)

If the organization has 2 tables:

Table	User Access	S3 location
A	User1	s3://org/datawarehouse/tables/A
B	User2	s3://org/datawarehouse/tables/B

What administrator needs to define in User1 and User2’s IAM policy:

For user1’s IAM role:

...

{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::org/datawarehouse/tables/A/*"
},

...

For user2’s IAM-role:

...

{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::org/datawarehouse/tables/B/*"
},

...

Some issues with L1:

Fine-Grained Data Access Control	Cannot support because users get access to the entire set of files.
Storage Administrator Maintenance	Need to create IAM roles per user or user group to define data directory access according to table access. In the case of any directory restructure, need to update the IAM policies.
Auditing	At file level, which could be verbose (imagine a join across multiple tables, multiple partitions).
DBA	Either do nothing or redo authorization policies via SQL statement as error-proofing.

L2: Qubole SQL Authorization

L2’s design principle is to provide Database Administrators with a unified view of database and storage permissions. No coordination with cloud storage administrators is needed.

Qubole Hive implements this design principle by assigning a separate role to the compute nodes (IAM-C). This role provides access to all the data.

Qubole Hive then executes the following checks query compile time:

GRANT tables in metastore
Storage-level checks with the user’s IAM role (IAM-A) as needed for location-related DDLs only.

The compute role and query compile-time checks provide a seamless way for database administrators to define authorization policies without needing to make any changes in cloud storage policies.

Advantages of L2 vs. L1:

Fine-Grained Data Access Control	Achievable through Hive Views.
Storage Administrator Maintenance	No need to define data directory access according to the table access in each IAM role setup. No work for any data directory restructures in the protected data warehouse bucket or folder.
Auditing	At table level through Hive logs.
DBA	Using Standard SQL to control user access.

L2: Limitations

The L2 model covers data access only through metadata policies in QDS.

Qubole recommends customers also secure their cluster access outside of Qubole (e.g. direct access to clusters or storage) to build a completely secure environment.

Use Case

A big organization has three departments: Sales, Marketing, and Finance. These need to have different privileges for access to tables stored in the protected data area.

	Tables
	Customer	Store_sales	Promotion
Sales	Read/Write	Read/Write	Read
Marketing	Read	Read	Read/Write
Finance	Read	Read	Read

Storage Configuration

3 tables are stored in an S3 data warehouse which is a protected area
Protected Data Area:
s3a://datalake/datawarehouse/customer/ (customer table)
s3a://datalake/datawarehouse/store_sales/ (store_sales table)
s3a://datalake/datawarehouse/promotion/ (promotion table)

IAM Role Creation

IAM-Cluster: has full access to s3a://datalake/datawarehouse
IAM-AccountDefault: has access only to s3a://datalake/defloc/
if the admin wants to add a temp folder for everyone: s3a://datalake/temp

Then follow Dual-IAM Role Documentation to set up the Dual-IAM role for the account

QDS Account Setup

All users can belong to the same QDS account. Here’s a user list that the administrator needs to grant permission to.

User Group / Department	User
Finance	[email protected]
Marketing	[email protected]
Sales	[email protected]

On the Account Settings page in the QDS UI, put IAM-AccountDefault role credentials on the setting page.

HiveQL DDL Privilege Setup

After Hive Authorization is enabled for the account.

Set role admin;

Use demo_database;

 

Create role sales;

Create role marketing;

Create role finance;

 

Grant select, insert on customer to role sales;

Grant select on customer to role finance;

Grant select on customer to role marketing;

 

Grant select, insert on store_sales to role sales;

Grant select on store_sales to role finance;

Grant select on store_sales to role marketing;

 

Grant select on promotion to role sales;

Grant select on promotion to role finance;

Grant select, insert on promotion to role marketing

 

GRANT role finance to USER john;

GRANT role sales to USER david;

GRANT role marketing to USER mary;

Future Work

Other engines (Spark, Presto, etc.) and applications (e.g. Zeppelin Notebook) will support the L2 Model over the next couple of releases. We’ll write blogs to elaborate on the mechanism and provide use cases for these engines to demonstrate our cross-engine security solution for data authorization.

Start Free Trial

Qubole Security Model for Authorization in AWS Cloud

History of Database Security Model

Cloud Security Model

Qubole Security Model

L1: Cloud Storage Authorization(AWS)

L2: Qubole SQL Authorization

L2: Limitations

Use Case

Storage Configuration

IAM Role Creation

QDS Account Setup

HiveQL DDL Privilege Setup

Future Work

Recent Posts

Categories

Read Big Data Analytics: Interactive Queries using Presto on Microsoft Azure Data Lake Store and Qubole

Product

Company

Helpful Links

START YOUR FREE TRIAL OF QUBOLE

Contact Form

On-Demand Qubole Demo

Google Cloud Sessions

Thank you!

UNLOCK QUBOLE FOR FREE

UNLOCK QUBOLE FOR FREE

UNLOCK QUBOLE FOR FREE

UNLOCK QUBOLE FOR FREE

UNLOCK QUBOLE FOR FREE

UNLOCK QUBOLE FOR FREE

Qubole Security Model for Authorization in AWS Cloud

History of Database Security Model

Cloud Security Model

Qubole Security Model

L1: Cloud Storage Authorization(AWS)

L2: Qubole SQL Authorization

L2: Limitations

Use Case

Storage Configuration

IAM Role Creation

QDS Account Setup

HiveQL DDL Privilege Setup

Future Work

Recent Posts

Categories

Read Big Data Analytics: Interactive Queries using Presto on Microsoft Azure Data Lake Store and Qubole

START YOUR FREE TRIAL OF QUBOLE

Contact Form

On-Demand Qubole Demo

Google Cloud Sessions

Thank you!