Notebooks and Dashboards are the most common ways for Qubole users to play with data interactively using Apache Spark and Presto. Our notebook and dashboard users wanted an easy way to present and share analysis reports with their business partners and users outside of Qubole. We heard their requests and developed a new feature that allows users to access notebooks as reports.
Qubole now provides multiple formats for users to share reports:
- PDF: View and share reports in PDF format, which allows users to copy text content in code and results.
- PNG: A PNG can be easily embedded in presentations without any restrictions on the viewport.
- HTML: HTML is the most flexible option because users outside of Qubole get limited interactivity without depending on any cloud resources. They can even switch chart types and do some basic data analysis.
Apart from supporting multiple formats, we have also addressed the following feature requests based on feedback from our customers:
- Generate reports through REST APIs
- Generate reports on a schedule
- Customize reports with or without code in the case of notebooks
- Hide unnecessary details like the progress bar and action items
- View large table data
- Generate reports even when the cluster is down
- Share reports via email
How to Use the Feature
Qubole integrates the new reporting feature into three easy-to-use interfaces for maximum flexibility. Below we will walk through these three interfaces one by one.
1. Download Notebook/Dashboard as Report Through UI
Users can download a notebook or dashboard in PDF, PNG, or HTML format and, in the case of notebooks, choose to show or hide the code. The steps are as follows:
- On the Notebooks/Dashboards page, click on the Settings icon on the top right.
- Select Download As.
- In the Download dialog box, select the required format from the drop-down list. By default, HTML is selected.
- In the case of notebooks, select the Show Code checkbox to include code in the downloaded report.
- Click the download button. The report will start downloading once it has been successfully generated.
The GIF above captures these steps. All of them can also be performed via REST APIs, which our documentation describes in detail.
2. Email Notebook/Dashboard as Report Through UI
Users can email a notebook or dashboard as an attachment in PDF, PNG, and HTML formats. When emailing a notebook, the user can choose to show or hide the notebook code in the attachment. The steps for this option are:
- On the Notebooks/Dashboards page, click on the Settings icon on the top right.
- Select Email as an attachment.
- In the Email dialog box, select the required format from the drop-down list. By default, HTML is selected.
- Enter the email address. If you want to send the attachment to multiple recipients, separate email addresses with commas.
- In the case of a notebook, select the Show Code checkbox if you want the code to appear in the attachment.
- Click the send button.
The GIF above captures these steps. All of them can also be performed via REST APIs, which our documentation describes in detail.
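The inputs collected by the Download and Email dialogs can be sketched as a small data structure. This is only an illustration of the options described above, not Qubole's REST API: the names `ReportRequest`, `parseRecipients`, and `buildReportRequest` are hypothetical, and treating Show Code as unchecked by default is an assumption.

```typescript
// Hypothetical model of the dialog inputs; none of these names come from Qubole's API.
type ReportFormat = "pdf" | "png" | "html";
type SourceKind = "notebook" | "dashboard";

interface ReportRequest {
  source: SourceKind;
  format: ReportFormat;
  showCode: boolean;
  recipients: string[];
}

// Split a comma-separated address field, trimming whitespace and dropping empty entries.
function parseRecipients(field: string): string[] {
  return field
    .split(",")
    .map((address) => address.trim())
    .filter((address) => address.length > 0);
}

function buildReportRequest(
  source: SourceKind,
  recipientField: string,
  format: ReportFormat = "html", // the dialogs select HTML by default
  showCode: boolean = false // assumption: the Show Code checkbox starts unchecked
): ReportRequest {
  const recipients = parseRecipients(recipientField);
  if (recipients.length === 0) {
    throw new Error("At least one recipient email address is required");
  }
  return {
    source,
    format,
    // Show Code applies only to notebooks; dashboards have no code option.
    showCode: source === "notebook" && showCode,
    recipients,
  };
}
```

For example, `buildReportRequest("notebook", "a@example.com, b@example.com", "pdf", true)` would describe a PDF report with code included, addressed to two recipients.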
3. Send Dashboard Report as Email Attachment at Regular Intervals
Qubole supports refreshing dashboards periodically on a schedule. Users can subscribe to receive the dashboard report as an email attachment in PDF, PNG, or HTML format each time a scheduled refresh occurs. The steps are as follows:
- On the Dashboards page, open a dashboard and click on the Settings icon in the top right.
- Select the Configure Dashboard option.
- Select the Schedule Dashboard checkbox.
- Select the Send as Email checkbox.
- Select the required format from the drop-down list. By default, HTML is selected.
- Enter the email address. If you want to send the attachment to multiple recipients, separate email addresses with commas.
- Save the dashboard configuration.
Note: This can also be configured during dashboard creation.
Technical Challenges and Design Choices
Our design decisions were guided by the need to address all of the use cases listed at the beginning of this blog post. In this section, we discuss the two main choices we made.
1. Client-Side vs. Server-Side
While it is possible to generate a PDF file on either the client side or the server side, we decided to do it on the server side. Client-side generation supports only PDF and lacks the ability to:
- Inject custom styles and scripts based on user options, for example to show or hide code or to hide information that isn’t required in the reports
- Generate reports at the scheduled time
- Generate reports through REST APIs
- Send reports by email
2. PDF/Image Convert Libraries vs. Headless Browser
Many libraries can convert HTML to PDF or PNG, and they work well for server-side rendered pages. In our case, however, the notebook and dashboard pages are rendered entirely in the client browser, asynchronously. Hence, our only option was to use a headless browser. We evaluated PhantomJS, WebKit, and Puppeteer. PhantomJS and WebKit don’t have built-in high-level APIs to generate a PDF or PNG, so we would have had to depend on external libraries built on top of them. No single such library suited all of our use cases, so we decided to use Puppeteer.
Puppeteer is a Node.js library, maintained by Google, that provides high-level APIs to control headless Chrome or Chromium over the DevTools Protocol. Puppeteer enabled us to:
- Generate PDF, PNG, and HTML formats with a single library
- Inject scripts that scroll the page, so that notebooks, which load paragraphs lazily as the user scrolls, are fully loaded before the report is captured
- Hide elements such as the progress bar and paragraph action items by injecting stylesheets
- Set the correct viewport so that our graphs and reports render better
- Use event listener APIs to make sure all of the images are loaded before the report is captured
- Inject stylesheets to show or hide the code editor based on the user's choice
- Embed external stylesheets and JavaScript into the HTML file so that a single downloadable HTML report can be generated
- Customize headers and footers in the case of a PDF report
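The capture flow described in this list can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not our production code: the CSS selectors, the viewport size, and the names `buildReportCss` and `captureReport` are hypothetical. `PageLike` is a hand-written interface modeled on a small subset of Puppeteer's `Page` methods (`setViewport`, `goto`, `evaluate`, `addStyleTag`, `pdf`, `screenshot`, `content`), so the sketch stays self-contained; passing a real Puppeteer page may need a type cast. Inlining external stylesheets and scripts for the HTML report, and PDF headers and footers, are omitted.

```typescript
// Modeled on a subset of Puppeteer's Page API; declared locally to keep the sketch self-contained.
interface PageLike {
  setViewport(viewport: { width: number; height: number }): Promise<unknown>;
  goto(url: string, options?: { waitUntil?: "load" | "networkidle0" }): Promise<unknown>;
  evaluate(script: string): Promise<unknown>;
  addStyleTag(options: { content: string }): Promise<unknown>;
  pdf(options?: { format?: "A4"; printBackground?: boolean }): Promise<unknown>;
  screenshot(options?: { fullPage?: boolean }): Promise<unknown>;
  content(): Promise<string>;
}

interface ReportOptions {
  format: "pdf" | "png" | "html";
  showCode: boolean;
}

// Runs in the browser: keep scrolling until the page height stops growing,
// so lazily loaded paragraphs are rendered before capture.
const SCROLL_SCRIPT = `(async () => {
  let previousHeight = -1;
  while (document.body.scrollHeight !== previousHeight) {
    previousHeight = document.body.scrollHeight;
    window.scrollTo(0, previousHeight);
    await new Promise((resolve) => setTimeout(resolve, 500));
  }
})()`;

// Runs in the browser: wait for every pending image to fire "load" or "error".
const WAIT_FOR_IMAGES_SCRIPT = `Promise.all(
  Array.from(document.images)
    .filter((img) => !img.complete)
    .map((img) => new Promise((resolve) => {
      img.addEventListener("load", resolve);
      img.addEventListener("error", resolve);
    }))
).then(() => undefined)`;

// Build the stylesheet that hides UI chrome. All selectors are hypothetical placeholders.
function buildReportCss(options: ReportOptions): string {
  const hidden = [".progress-bar", ".paragraph-actions"];
  if (!options.showCode) {
    hidden.push(".code-editor");
  }
  return `${hidden.join(", ")} { display: none !important; }`;
}

async function captureReport(
  page: PageLike,
  url: string,
  options: ReportOptions
): Promise<unknown> {
  await page.setViewport({ width: 1280, height: 800 }); // assumed viewport size
  await page.goto(url, { waitUntil: "networkidle0" });
  await page.evaluate(SCROLL_SCRIPT);
  await page.evaluate(WAIT_FOR_IMAGES_SCRIPT);
  await page.addStyleTag({ content: buildReportCss(options) });

  if (options.format === "pdf") {
    return page.pdf({ format: "A4", printBackground: true });
  }
  if (options.format === "png") {
    return page.screenshot({ fullPage: true });
  }
  return page.content(); // serialized HTML of the rendered page
}
```

With Puppeteer installed, the page would typically come from `puppeteer.launch()` followed by `browser.newPage()`, and the browser would be closed after `captureReport` resolves. Keeping the function typed against `PageLike` also makes it easy to exercise with a fake page in tests.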
Rollout Plan
This feature is in early beta and we are slowly rolling it out to all Qubole accounts. To get early access, send us an email at [email protected].
Start a free trial on GCP, Microsoft Azure, or AWS.