Tableau and Athena: Serverless Analytics Stack

4 Steps To Create a Serverless Analytics Stack with Tableau and Amazon Athena

Thomas Spicer · Published in Openbridge · 9 min read · Jul 13, 2018

Combining the simplicity of Tableau, a data lake on Amazon S3, and the power of AWS Athena delivers a cost-efficient, high-performance data lake architecture.

One of the key elements of the Tableau and Athena model is that you only pay for the queries you run. This is an attractive feature since there is no hardware to set up, manage, or maintain.

For reference, here is an example of the monthly savings using Athena versus a traditional analytics warehouse:

In a use case where you are running 1 Redshift ds2.xlarge node, switching to Athena would save you about $560 a month, or roughly $6,720 a year. If you are running a cluster of 2 ds2.xlarge nodes, the savings would be about $1,220 per month, or roughly $14,600 per year. Not bad!
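To make that arithmetic concrete, here is a minimal Python sketch of the comparison. The prices are illustrative assumptions (roughly the 2018 US East on-demand rates: Athena at about $5 per TB scanned, a ds2.xlarge node at about $0.85 per hour); plug in your own monthly scan volume to estimate where the break-even point sits for your workload.

    # Rough cost comparison sketch. Prices below are assumptions for illustration.
    ATHENA_PRICE_PER_TB = 5.00      # assumed USD per TB of data scanned
    REDSHIFT_NODE_PER_HOUR = 0.85   # assumed USD per ds2.xlarge node-hour
    HOURS_PER_MONTH = 730

    def monthly_athena_cost(tb_scanned_per_month: float) -> float:
        return tb_scanned_per_month * ATHENA_PRICE_PER_TB

    def monthly_redshift_cost(nodes: int) -> float:
        return nodes * REDSHIFT_NODE_PER_HOUR * HOURS_PER_MONTH

    # Example: a workload scanning ~12 TB per month vs. a 1-node cluster
    athena = monthly_athena_cost(12)
    redshift = monthly_redshift_cost(1)
    print(f"Athena:   ${athena:,.2f}/month")
    print(f"Redshift: ${redshift:,.2f}/month")
    print(f"Savings:  ${redshift - athena:,.2f}/month")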

When you combine an Amazon S3 data lake, Amazon Athena, and Tableau’s new Hyper Engine, you can create a low-cost, serverless analytics stack.

Overview: Bringing together an Amazon S3 data lake, Amazon Athena, and Tableau

The basic premise is storing your data in an optimized data lake on S3, with Amazon Athena acting as the query service for that data (A). Tableau then stores a “cache” of that data within the Hyper Engine (B) to optimize performance. This also reduces costs, since you run far fewer queries against Athena than you do against the data resident in the Hyper Engine.

AWS Data Lake Architecture — S3, Athena, and Tableau

The pattern of using a local cache of the data is not uncommon for Tableau. What is unique in this case is that you are not paying for a data warehouse to be running 24/7. With Athena, you only pay for what you use, unlike your traditional analytics warehouse, which may have a lot of idle time that you are paying for.

Your Athena costs will certainly vary with the volume of data scanned and the complexity of your queries. There may also be features that Redshift has that Athena does not.

The point here is to provide context on the potential cost savings of the model outlined above: with Athena, you only pay for what you use, especially when paired with Tableau.

This all sounds great, but how do you get started with Athena and Tableau? The rest of this post walks through the process.

Step 1: Getting Your Data Into Amazon Athena

Getting started with Athena is pretty easy. We published a “how-to” guide to get Athena up and running in 60 seconds, as well as a collection of AWS Athena articles covering what it is, how to get started, and best practices like using Apache Parquet as your data format. We will assume you have followed those guides and have your data loaded into Athena.
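If you would rather script this step than click through the console, the sketch below registers a Parquet-backed table with Athena using boto3. The database, table, columns, and bucket names are placeholders; it assumes your Parquet files already sit in S3 as described in our guides.

    import boto3

    # Sketch: register an S3 data lake table with Athena.
    # Database, table, columns, and bucket names are placeholders.
    athena = boto3.client("athena", region_name="us-east-1")

    ddl = """
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.page_views (
        event_time  timestamp,
        user_id     string,
        page        string
    )
    STORED AS PARQUET
    LOCATION 's3://your-data-lake-bucket/page_views/'
    """

    athena.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://yourbucketname/results/"},
    )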

The next step is to connect Tableau to Athena.

Step 2: Connecting Tableau To Amazon Athena

With your data loaded into your S3 data lake and Athena set up, you are ready to connect Tableau. The process is straightforward and resembles other database connections. Select the Amazon Athena data source and then connect using your credentials and host info:

  • Tableau connection for the Athena server: athena.us-east-1.amazonaws.com
  • S3 staging directory: s3://yourbucketname/results/
  • Username: your AWS access key
  • Password: your AWS secret key

If you get stuck, take a look at the Tableau docs for more detail on making a connection to Athena.
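Before pointing Tableau at Athena, it can be helpful to confirm that the same endpoint, staging directory, and keys work outside of Tableau. Below is a quick sanity check sketched with the PyAthena library; the table name is a placeholder for a table in your own Athena database.

    from pyathena import connect

    # Quick connectivity check using the same values Tableau will use.
    # The table name is a placeholder for a table in your Athena database.
    conn = connect(
        aws_access_key_id="YOUR_AWS_ACCESS_KEY",
        aws_secret_access_key="YOUR_AWS_SECRET_KEY",
        s3_staging_dir="s3://yourbucketname/results/",
        region_name="us-east-1",
    )

    cursor = conn.cursor()
    cursor.execute("SELECT COUNT(*) FROM analytics.page_views")
    print(cursor.fetchone())

If this returns a row count, Tableau should connect with the same values.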

Step 3: Optimize Tableau Visualizations And Reports With Custom Views In Athena

There may be use cases where you want to create complex queries as views. These views may JOIN multiple tables in Athena to ease certain queries in Tableau.
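As a sketch of what such a view might look like, the snippet below submits a CREATE VIEW statement to Athena with boto3. The database, table, and column names are placeholders for your own schema.

    import boto3

    # Sketch: create an Athena view that pre-joins tables for Tableau.
    # Database, table, and column names are placeholders.
    athena = boto3.client("athena", region_name="us-east-1")

    create_view = """
    CREATE OR REPLACE VIEW analytics.daily_sales AS
    SELECT o.order_date,
           c.region,
           SUM(o.amount) AS total_amount
    FROM analytics.orders o
    JOIN analytics.customers c ON o.customer_id = c.customer_id
    GROUP BY o.order_date, c.region
    """

    athena.start_query_execution(
        QueryString=create_view,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://yourbucketname/results/"},
    )

Tableau then sees analytics.daily_sales as just another table it can extract from.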

Depending on the complexity of these queries, run times can take a while to complete. While this is not ideal for live database connections, it works well in a data extract model to refresh the data resident within Tableau on a schedule.

Using views and extracts, you can minimize Athena costs because you are only running the query once (or a few times a day) and then publishing the extract to Tableau. This means you can be very efficient in leveraging the Tableau Hyper engine while minimizing your costs in Athena.

For more detail on how to set up views in Athena and leverage them in Tableau, check out our guide.

Step 4: Configure and Schedule Data Refreshes from AWS Athena to Tableau Hyper Engine

You have created an amazing Tableau dashboard, your data is in AWS Athena, and you are ready to share it with the rest of your organization by publishing.

Before you do, you will want a plan for refreshing the Athena data source(s) used by that dashboard so it always displays the most recent data available.

While we can use a live connection to Athena, we want to use extracts. This approach ensures the visualizations take advantage of the underlying Tableau Hyper engine, ensuring you maximize Tableau performance while minimizing AWS data lake and Athena costs.

The process for setting up an automated extract refresh schedule is fairly simple. It assumes that you have already created local extract(s) in Tableau Desktop. If you haven’t, you can do so by right-clicking on the data source(s) in question in the Data sidebar and selecting Extract Data.

Once that’s done, you are ready to get started with schedules. Select Publish Workbook in Tableau:

This will display the workbook publishing pop-up, which allows you to configure many settings, including which project the workbook should be published under, the desired name, and, most importantly for this walk-through, the data sources and schedules.

Click the Edit link in the Data Sources section to set those preferences.

In our example, we have left the Publish Type preference set to Embedded in Workbook and we have changed the ‘Authentication’ preference to Allow Refresh Access.

This will enable you to set up an automated extract refresh process. Once you are happy with all of the configuration details in the pop-up, click the green Publish button:

Once this process completes, you should receive a Publishing Complete success message. In the bottom right corner of this message, you will see another green button labeled Schedule a Refresh.

Click that button to set up your automated extract refresh process.

Schedule Refreshes

You will be presented with a list of schedule options. In our example, we know that all the underlying data sources are updated after midnight, and those update processes are complete within a couple of hours, so we select the Daily at 5:00 a.m. option and click the Schedule Refreshes button.

You will see that your Tableau workbook has been published, along with a green message stating Created 1 extract refresh.

Confirm and Edit Schedules

If you click on the Data Sources tab, you will see your extract(s) listed along with the connection details.

If you click on the Refresh Schedules tab, you will see the refresh schedule you created, the last time it ran, and the next time it is scheduled to run.

If you want to change the timing of the schedule, check the box next to the schedule and select Change Schedule from the Actions drop-down menu.

Then select the desired run time and click Change Schedule.

If your underlying source data is updated more than once each day and you want to add another refresh schedule, click the orange New Extract Refresh button, select the desired refresh time, and click Schedule Refreshes.

When you return to the Refresh Schedules tab, you will see that there are now two daily refresh schedules (in our example, one at 5:00 a.m. and a second at 4:00 p.m.).

Depending on your use case, an automated extract refresh schedule that retrieves the most recent data hourly, daily, weekly, or monthly will minimize the run time of the Tableau queries in your dashboard while keeping the underlying data current.
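The schedules above are managed entirely in the Tableau Server UI. If you ever need to kick off a refresh on demand, for example right after an upstream load finishes, the Tableau Server REST API can trigger one as well. Below is a rough sketch using the tableauserverclient Python package; the server URL, credentials, site, and data source name are all placeholders.

    import tableauserverclient as TSC

    # Sketch: trigger an on-demand extract refresh via the Tableau Server REST API.
    # Server URL, credentials, site, and data source name are placeholders.
    tableau_auth = TSC.TableauAuth("your-username", "your-password", site_id="your-site")
    server = TSC.Server("https://your-tableau-server.example.com", use_server_version=True)

    with server.auth.sign_in(tableau_auth):
        all_datasources, _ = server.datasources.get()
        target = next(ds for ds in all_datasources if ds.name == "Athena Extract")
        job = server.datasources.refresh(target)  # queues an extract refresh job
        print(f"Refresh job queued: {job.id}")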

That’s it! You now have a setup with Tableau using the Hyper Engine for visualizations with the underlying data resident within AWS Athena.

Do You Want Automated, Zero Administration AWS Athena Data Pipelines To Use With Tableau?

It has never been easier to get your data into AWS Athena. Our Athena service optimizes and automates the configuration, processing, and loading of data into AWS Athena, so users can quickly return query results in Tableau.

With our zero-administration data ingestion service for AWS Athena, you push data from supported data sources and our service automatically loads it into your AWS Athena database, ready for Tableau Desktop and Tableau Server.

If you want to get started with Amazon’s query service, AWS Athena, but do not have the time or expertise, this is the solution for you!

Want to discuss how to leverage a serverless analytics stack for your organization? Need a platform and team of experts to kickstart your data and analytics efforts? We can help! Getting traction by adopting new technologies, especially if your team is working in different and unfamiliar ways, can be a roadblock to success. This is especially true in a self-service-only world. If you want to discuss a proof-of-concept, pilot, project, or any other effort, the Openbridge platform and team of data experts are ready to help.

Reach out to us at hello@openbridge.com. Prefer to talk to someone? Set up a call with our team of data experts.

Visit us at www.openbridge.com to learn how we are helping other companies with their data efforts.
