Beginners Guide For Faster, Cost-Effective SQL Queries

Thomas Spicer
Openbridge
Published in
3 min readJul 15, 2021

--

Often when people are starting off writing queries with SQL or via tools like Tableau or Power BI, they are often not very well optimized. Writing well-structured, narrow queries accomplish a few goals.

The first is allowing you to ask questions and get results as quickly as possible. Overly broad queries can take a LONG time to complete.

The second goal is to reduce costs. Systems like BigQuery, Snowflake, Amazon Redshift Spectrum, and Amazon Athena charge you for the compute resources required to execute your queries. Overly broad queries are much more expensive to run.

For example, let us assume you have a table of Amazon orders data from Seller Central. There are about two years of data present which totals 200 GB of data. Here is a simple, unoptimized query;

SELECT * FROM myorders;

Running an unoptimized query like this will take some time to complete. Not only will it be slower, but it is also more expensive. For example, Amazon Athena charges $5 per TB of data scanned. Running this query in Athena costs you about $1. Imagine you have a team of people running this query 5, 10, 20+ times a day. Over the course of a month, you would have racked up over $200+ dollars in query costs. Avoiding running queries like this all the time ensures you are not necessarily increasing the costs from your SQL query engine on Google, AWS, or Snowflake.

How can we improve on the query? Let’s say you wanted to find orders after a specific date.

SELECT * FROM myorders WHERE order_date > '2021-07-01'

If you were to run a query with the data but with a date constraint, the results will be returned faster. However, while you improve the response time of the query by using a date, you still may be charged $1. Why?

Since you are using SELECT *, you still are scanning all 200GB of data in your myorders table. SQL compute services can charge based on the amount of data scanned. If myorders has 50 columns, using * will ensure all 50 columns are included in the query which means you have scanned 200GB of data to return orders after July 1, 2021.

So how do we improve things further? Rather than query all 50 columns, you can narrow the scope of your query. How? By specifying column names, you are improving response time and lowering costs. In this case, we will narrow our query toorder_amount andorder_id:

SELECT order_amount, order_id FROM myorders;

This query will be significantly faster and cost less. In our Amazon Athena use case, this query will have scanned about 100MB of data, which is a fraction of the total 200GB for the previous queries. The cost of this query will be about 0.01. Big price between difference between $1 and $.01 simply by specifying the column names of interest, right?

To further optimize your query and cost performance, you can combine named columns with a date:

SELECT order_amount, order_id FROM myorders WHERE order_date > '2021-07-01'

While we used SQL queries as examples, creating well-composed queries is true for tools like Tableau, Power BI, Data Studio, and hundreds of others. Behind the scenes, these tools are issuing SQL queries. The more optimized your queries in your, the greater your performance will be.

There may be cases where doing SELECT * is needed. However, in cases where you can choose to narrow the scope of a query, you will reap the benefits in both time and money.

About Openbridge: Make data work for you

Unify data in trusted, private industry-leading data lake or cloud warehouses like Amazon Redshift, Amazon Redshift Spectrum, Google BigQuery, Snowflake, Azure Data Lake, Ahana, and Amazon Athena. Data is always fully owned by you.

Take control, and put your data to work with your favorite analytics tools. Explore, analyze, and visualize data to deliver faster innovation while avoiding vendor lock-in using tools like Google Data Studio, Tableau, Microsoft Power BI, Looker, Amazon QuickSight, SAP, Alteryx, dbt, Azure Data Factory, Qlik Sense, and many others.

Get a 30-day free trial so you can try Openbridge for yourself.

--

--