An Introduction To AWS Athena Interactive Query Service

What AWS Athena is and how to get started with interactive query services

Anna
5 min read · Dec 27, 2016

In November of 2016, Amazon Web Services (AWS) introduced Amazon Athena, a new service that uses Facebook Presto, an ANSI-standard SQL query engine, to query your data lake.

A key benefit of Athena is that it is serverless, so there is no infrastructure to manage. This can offer exceptional value and performance, especially when paired with a data lake and a BI platform such as Tableau.

Amazon says that Athena is robust enough to support large-scale efforts while also being accessible to smaller companies that only need to run a few queries a month.

The adoption of Athena has been robust, and the use cases compelling. For example, organizations are creating serverless business intelligence stacks with Athena, Apache Parquet, and Tableau.

This brief introduction explains how the new service can benefit companies, both large and small.

Amazon Athena: A Game Changer?

Amazon describes Athena as a serverless interactive query service. What makes Athena so interesting is that a serverless solution may alter how many think about their data workflows.

Since Athena accesses data directly from S3, users don’t need to set up servers, frameworks, clusters, or other tools; the only prerequisite is loading the data into S3. This can be cost-saving for use cases that do not require a traditional data warehouse.

Here are a few of AWS Athena’s key offerings:

  • An intuitive point-and-click interface for database and table creation, with no need for the advanced technical training these systems typically require.
  • Rapid query results without having to worry about tuning queries or optimizing database structures.
  • Since Amazon S3 stores the data, businesses do not need to invest in physical IT infrastructure to query and store their information.
  • Amazon Athena pricing follows a pay-as-you-go model, so users pay only for the queries they actually run. This avoids getting locked into fixed rates for a level of service they don’t use.

All these features make AWS Athena stand out and easy to use for companies big and small, especially if they already use or plan to use PrestoDB.
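
To make the pay-as-you-go point concrete, here is a minimal Python sketch of how a monthly bill might be estimated from the amount of data each query scans. The article does not quote a price, so the figures used here (USD 5 per terabyte scanned, a 10 MB minimum per query, binary terabytes, no rounding) are assumptions for illustration only; check the current AWS pricing page before relying on them.

```python
# Assumed pricing inputs (not stated in the article): USD 5 per TB scanned,
# with a 10 MB minimum charge per query. Rounding rules are ignored here.
PRICE_PER_TB_USD = 5.0
MIN_BYTES_PER_QUERY = 10 * 1024**2  # 10 MB
BYTES_PER_TB = 1024**4


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the cost of a single query from the bytes it scans."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


def monthly_cost_usd(queries_per_month: int, avg_bytes_scanned: int) -> float:
    """Estimate a monthly bill for a steady query workload."""
    return queries_per_month * query_cost_usd(avg_bytes_scanned)


# Hypothetical small team: 20 queries a month, each scanning about 50 GB.
small_team = monthly_cost_usd(20, 50 * 1024**3)
print(f"Estimated monthly cost: ${small_team:.2f}")
```

Under these assumptions, a team that runs only a handful of queries pays a few dollars a month, which is the scenario the article describes for smaller companies.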

Getting Started With Amazon Athena

Sample Athena Pipeline. Source: Rahul Pathak Slideshare Presentation

To get started with AWS Athena, you will need to make sure you have data residing on S3. With your data in place, you will need to create a database and define tables whose schema and format match the files stored on S3.

Athena supports formats including CSV, ORC, JSON, and Apache Parquet. If your data is not in one of those supported formats on S3, you will need to convert it.

Do not be intimidated by the database and table creation process. It is fairly straightforward, and Amazon provides a step-by-step guide on how to create a database and tables and get started with queries. Openbridge offers an ELT and ETL service that automates all of this for you!
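
As a rough illustration of the table creation step, the sketch below builds a `CREATE EXTERNAL TABLE` statement for files that already sit in an S3 folder. The helper name `create_table_ddl`, the database, table, and column names, and the `example-bucket` location are all hypothetical; the generated statement would be pasted into the Athena query editor or submitted through a client. It is one possible shape for such a statement, not the only one.

```python
# Storage clauses for the formats the article lists. The JSON entry assumes
# the OpenX JSON SerDe; other SerDes can also be used for JSON data.
FORMAT_CLAUSES = {
    "CSV": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",
    "JSON": "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'",
    "ORC": "STORED AS ORC",
    "PARQUET": "STORED AS PARQUET",
}


def create_table_ddl(database, table, columns, location, data_format="CSV"):
    """Build a CREATE EXTERNAL TABLE statement for data already on S3.

    `columns` is a list of (name, sql_type) pairs that must match the files.
    """
    if data_format not in FORMAT_CLAUSES:
        raise ValueError(f"Unsupported format: {data_format}")
    if not (location.startswith("s3://") and location.endswith("/")):
        raise ValueError("location must be an S3 folder such as s3://bucket/path/")

    column_list = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_list}\n"
        f")\n"
        f"{FORMAT_CLAUSES[data_format]}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical example: web request logs stored as Parquet.
ddl = create_table_ddl(
    "weblogs",
    "requests",
    [("request_time", "string"), ("status", "int")],
    "s3://example-bucket/logs/",
    data_format="PARQUET",
)
print(ddl)
```

Because the table is external, the statement only registers a schema over the existing files; the data itself stays where it is on S3.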

How to Query Athena

Amazon Athena offers businesses an enterprise-level data query tool that is simple to use. Since companies don’t have to invest in infrastructure build-outs and pay only for what they use, Athena is likely to become a powerful and accessible part of an enterprise data workflow.

The result sets from queries are stored back on S3. Managers and data scientists can use another Amazon product, Amazon QuickSight, to produce visualizations and reports. It’s also possible to connect other business intelligence (BI) tools, or to query programmatically from Python, Java, or similar languages, over a JDBC connection using the Athena JDBC driver.

Once you are working with Athena, you can save frequently used queries. You can also view and download the query history with your Athena catalog manager. For more info about how Athena works, visit the AWS Documentation page.
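
As a rough illustration of programmatic access, the sketch below submits a query, waits for it to finish, and reads the rows back. It is written against the Athena client in the AWS SDK for Python (boto3), which is an assumption on our part: the article itself only mentions JDBC. The function name `run_query` and the bucket and database names in the usage comment are hypothetical. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
import time


def run_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run one query and return its rows as lists of strings.

    `client` is expected to behave like boto3.client("athena").
    `output_location` is the S3 folder where the result set is written.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    # Only the first page of results is read here; for a SELECT, the first
    # row usually holds the column headers.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   rows = run_query(
#       boto3.client("athena"),
#       "SELECT status, COUNT(*) FROM requests GROUP BY status",
#       "weblogs",
#       "s3://example-bucket/athena-results/",
#   )
```

The `output_location` argument reflects the point made above that result sets are written back to S3, where other tools can pick them up.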

AWS Athena’s Secret Sauce: Facebook Presto

Athena uses Facebook Presto as the underlying technology. This is the same open-source, ANSI-standard query engine that powers analytics workloads at Facebook and many other companies.

Presto’s developers built a SQL engine for interactive analytics. While it compares favorably with commercial warehousing systems for speed, it can also scale to handle applications the size of Facebook. For instance, more than 3,000 Facebook employees use Presto daily to run over 30,000 SQL queries against the 300 PB data store.

These are some reasons that Presto performs so well:

  • Data is processed in memory throughout the query’s execution rather than being written to disk between stages.
  • Presto relies upon flat memory structures and other efficient coding methods to streamline execution.
  • Presto was designed to minimize latency by executing queries immediately upon data discovery and without waiting for a previous query to finish.

For an overview of the platform, see the SlideShare presentation from Rahul Pathak, AWS GM for Athena and EMR.

Getting Started With Amazon Athena

It has never been easier to get your data into Amazon Athena. The Openbridge service optimizes and automates the configuration, processing, and loading of data to a data lake. This includes automatically handling data lake formation, data catalogs, and operations for you.

Our code-free, zero administration data lake service delivers cost savings and performance gains for Amazon Athena by compressing, partitioning, and converting your data to a columnar format to reduce the amount of data it needs to scan.
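
To show why partitioning reduces the amount of data scanned, here is a small Python sketch under stated assumptions. It assumes a table laid out in day-based folders with a hypothetical `dt=` partition key, and it uses made-up partition sizes; compression and columnar conversion, which reduce scanning further, are not modeled.

```python
from datetime import date, timedelta


def partition_prefix(table_root, day):
    """Return a Hive-style partition folder such as .../dt=2016-12-27/."""
    return f"{table_root.rstrip('/')}/dt={day.isoformat()}/"


def bytes_scanned(partition_sizes, start, end):
    """Sum the sizes of the partitions whose day falls within [start, end]."""
    return sum(size for day, size in partition_sizes.items() if start <= day <= end)


# Hypothetical table: 30 daily partitions of 1 GB each.
GB = 1024**3
first_day = date(2016, 12, 1)
sizes = {first_day + timedelta(days=i): 1 * GB for i in range(30)}

# Without partitioning, a query over one week still reads the whole table.
full_scan = sum(sizes.values())
# With partitioning, only the seven matching folders are read.
one_week = bytes_scanned(sizes, date(2016, 12, 1), date(2016, 12, 7))

print(partition_prefix("s3://example-bucket/requests", first_day))
print(f"Full scan: {full_scan // GB} GB, one week: {one_week // GB} GB")
```

Since Athena bills by data scanned, a query that filters on the partition key touches a fraction of the table, and the saving carries straight through to cost.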

With automated data pipelines to Amazon Athena, you don’t need to worry about configuration, software updates, failures, or scaling your infrastructure as your datasets and number of users grow.

Get Started Now With Amazon Athena

Want to discuss how to leverage Amazon Athena for your organization? Need a platform and team of experts to kickstart your data and analytic efforts? We can help! Getting traction adopting new technologies, especially if your team is working in different and unfamiliar ways, can be a roadblock for success. This is especially true in a self-service-only world. If you want to discuss a proof-of-concept, pilot, project, or any other effort, the Openbridge platform and team of data experts are ready to help.

Reach out to us at hello@openbridge.com. Prefer to talk to someone? Set up a call with our team of data experts.

Visit us at www.openbridge.com to learn how we are helping other companies with their data efforts.

Marketing strategy at Openbridge. Helping analysts automate data pipeline routes and bring data together at www.openbridge.com.