ETL Tools — The ELT vs. ETL Process

How to ELT or ETL data to a warehouse, data lake, or mart

Thomas Spicer · Jul 1, 2019 · 11 min read

In this article, we provide a high-level, conceptual view of ELT and ETL. We include example ETL solutions that illustrate real-time and batch data ingestion and migration. Lastly, we show how ETL and ELT can coexist in a data workflow for pulling data, transforming it, and loading it to cloud-based destinations.

  • ETL: The “ETL” acronym can describe an electrical safety certification or a data process that will “Extract, Transform, and Load” data to a target destination. Examples of the “T” include data cleansing or data quality processes that run prior to loading.
  • ELT: The abbreviation “ELT” is used to describe two pretty cool things. One is the world’s biggest eye on the sky, called the “Extremely Large Telescope.” The other is a pattern for data ingestion, loading, and processing called “Extract, Load, Transform.”

While it can seem like you are peering into the darkness when working on ETL and ELT projects, we won’t be talking about how to use a telescope to solve those challenges (or electrical safety testing).

ETL and ELT Defined

ELT (Extract, Load, Transform) is a variation of ETL (Extract, Transform, Load). With ELT, data transformations occur after data is loaded to a data lake or warehouse.

With the advancement of data lakes and cloud data warehouses like Azure Data Lake, AWS Redshift, AWS Redshift Spectrum, AWS Athena, SQL Server, and Google BigQuery, as well as engines like Presto, the use of ELT has seen increased adoption.

Whether batch processing large volumes of data or streaming raw data in real time, both reflect pipelines that benefit from post-load transformation.

What is an ETL tool?

An ETL tool, whether commercial or open source, typically does the following (see the sketch after this list):

  • represents an ELT or ETL process for extracting, aggregating, organizing, converting, routing, mapping, and storing data, regardless of the volume of data
  • offers frameworks and methods to handle diverse and complex data ecosystems, ensuring ease of access
  • provides consistency and automation for development, testing, and operations
  • increases business velocity for data processing, visualizations, reporting, or statistical models by removing barriers to accessing data
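
Here is a minimal sketch of the extract → transform → load skeleton that such a tool automates. The file, column, and table names are hypothetical, and SQLite stands in for a real warehouse:

```python
# A minimal ETL skeleton: extract raw rows, apply business logic, load the
# result. All file, column, and table names here are hypothetical.
import csv
import sqlite3

def extract(path):
    # Pull raw rows from a source system (a CSV file in this sketch).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # The "T": cleanse and normalize fields before loading.
    return [
        {"email": r["email"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
        if r.get("email")  # drop rows that fail a simple data-quality rule
    ]

def load(rows, conn):
    # Write rows to the target destination.
    conn.execute("CREATE TABLE IF NOT EXISTS orders (email TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (email, amount) VALUES (:email, :amount)", rows
    )
    conn.commit()

conn = sqlite3.connect("warehouse.db")
load(transform(extract("orders.csv")), conn)
```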

What is an ETL process?

Business is buried in data. The distance between data and those who need to use data has never been greater. When data resides hidden in systems, it is unseen and untapped for operations, insights, and decision making. These are your data silos, and ETL processes can help break them down.

A process is meant to provide a framework to efficiently and programmatically break down data silos. A process encapsulates people and technology to solve ongoing and expanding challenges for data ingestion and data migration. A process also informs the choice of tools that will work best for your data challenges.

Data lake and architecture diagram

Why we use ELT and ETL tools

Extract, Transform, and Load has traditionally been considered difficult and resource-intensive given various upstream data sources, ill-defined business logic, lack of expertise, and legacy environments. The difficulty is centered mainly on the transformation work required in advance of loading data to a target destination.

Whether it is ETL or ELT, the heavy lifting necessary for transformation work is still present in today’s enterprises. This work is not getting any easier, considering the data wrangling required across so many data sources.

The purpose of commercial or open-source ETL tools is to facilitate easy data consumption. Ultimately, easy data consumption is a principal goal of any effort. These processes reflect attempts to achieve efficiencies and remove barriers to attain a smooth data consumption outcome.

ETL vs. ELT

When people discuss ETL alternatives, they focus primarily on ELT. We previously did a deep dive on ETL, which briefly referenced ELT. In this article, we will go into more depth on both.

As we indicated previously, the transformations, or the “T,” in both ETL and ELT reflect business logic and models. Both processes share similar steps, just in a different order. For example, the data extraction and data loading steps in both will generally look alike.

While the shift only changes the order of operation, this can make all the difference for teams that need rapid access to data (even if the information is less polished).

The fact that “T” is an intermediate step is considered one of the main ETL implementation challenges. ELT puts the transformation of data after loading, whereas ETL has it in the middle before loading.

A complex transformation can take time to get defined, designed, implemented, and tested. This can be a blocker to accessing less refined but usable data.

Since potentially complex transformations no longer stand in the way, teams can get access to the data more quickly with an ELT process.

While loaded data may be less refined, the fact that it is easily accessible delivers value to those who prioritize speed over further refinement.
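
To make the order-of-operations point concrete, here is the same pipeline expressed both ways, reusing the hypothetical extract, transform, and load steps from the sketch above:

```python
# ETL: transform in the middle, so only refined data lands in the target.
load(transform(extract("orders.csv")), conn)

# ELT: load raw data first; run the "T" later, inside the destination,
# typically as SQL against the already-loaded tables, possibly by a
# different team and on a different schedule.
load(extract("orders.csv"), conn)
conn.execute("UPDATE orders SET email = lower(trim(email))")  # post-load "T"
```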

How do ETL tools work? Diagrams for ELT and ETL developers

Below are a few different ELT and ETL flow diagrams for ETL developers. The data source for these flows can reflect batch or streaming data ingestion. These examples reflect a high-level data ingestion pipeline using either ETL or ELT.

1. ETL process in data warehouse example

This example shows data extracted from a source, run through a set of data transformation tasks, and loaded into a target warehouse or data lake destination. The extraction, transformation, and loading of ETL can all be seen in this basic BigQuery example:

BigQuery ETL

This end-to-end ETL example reflects a data extraction process, data mapping, and loading to a target data warehouse. It reflects the basic concepts of the ETL procedure.

While we only show one data source, the same flow would apply to multiple sources as well. On another note, we show BigQuery in the example, but this could just as easily reflect an Amazon Athena or Redshift ETL process within an AWS pipeline. Keep in mind that the best tool for one destination may not be the best fit for systems like Redshift or Athena.
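
As a hedged illustration of the load step in such a flow, here is a sketch using the google-cloud-bigquery client library; the project, dataset, and table names are hypothetical, and the transformed rows are assumed to already exist in a pandas DataFrame:

```python
# Sketch of the "L" in a BigQuery ETL flow: rows were extracted and
# transformed upstream; here they are loaded into a target table.
# Requires: pip install google-cloud-bigquery pandas pyarrow
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Hypothetical output of the "E" and "T" steps.
transformed = pd.DataFrame({"email": ["a@example.com"], "amount": [42.0]})

# Load into a hypothetical project.dataset.table destination.
job = client.load_table_from_dataframe(transformed, "my-project.analytics.orders")
job.result()  # wait for the load job to finish
```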

2. ELT tools example

In this example, we show ELT, or the “extract, load, transform” process. It’s helpful to think of ELT tools and processes as an evolution of traditional ETL methods.

As we mentioned previously, the transformation (“T”) step moves to a post-load stage. In some regards, the “T” might not even be needed, depending on the use case. It might just be “EL,” or extraction and loading.

If transformation is needed, it is executed at a later date and time, possibly by a different team. In ELT, an often complex transformation process is abstracted out of the workflow to increase the velocity of data flow.

ELT Example
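
In BigQuery, the post-load “T” is often just SQL run inside the warehouse. Here is a minimal sketch, assuming raw data was already loaded into a hypothetical analytics.raw_events table:

```python
# Sketch of a post-load transformation in BigQuery: raw data is already
# loaded; the "T" runs inside the warehouse as SQL, on its own schedule.
from google.cloud import bigquery

client = bigquery.Client()

# Dataset and table names are hypothetical; raw_events was loaded earlier.
sql = """
CREATE OR REPLACE TABLE analytics.orders_clean AS
SELECT LOWER(TRIM(email)) AS email, CAST(amount AS FLOAT64) AS amount
FROM analytics.raw_events
WHERE email IS NOT NULL
"""
client.query(sql).result()  # the warehouse does the transformation work
```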

How would this change in a BigQuery vs. Amazon Redshift workflow? Not much.

Redshift and ELT Example

A Redshift ETL or ELT process will be similar but may vary in the tools used. There is a collection of Redshift ETL best practices, and even some open-source tools for parts of this process. However, the overall flow will be similar regardless of destination.
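
A Redshift flavor of the same pattern might use the COPY command to load raw files from S3 and then transform with SQL inside the cluster. A sketch using psycopg2, with hypothetical cluster, bucket, role, and table names:

```python
# Sketch of Redshift "EL" then "T": COPY raw files from S3 into the
# cluster, then transform in place with SQL. All names are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",
    port=5439, dbname="dev", user="etl_user", password="...",
)
with conn, conn.cursor() as cur:
    # Load raw data from a hypothetical S3 prefix (the "EL").
    cur.execute("""
        COPY raw_events FROM 's3://example-bucket/events/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
        FORMAT AS PARQUET
    """)
    # Transform after loading (the "T"), inside the warehouse.
    cur.execute("""
        CREATE TABLE orders_clean AS
        SELECT LOWER(TRIM(email)) AS email, amount
        FROM raw_events
        WHERE email IS NOT NULL
    """)
```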

3. ELT vs. ETL architecture: A hybrid model

ETL is often used in the context of a data warehouse, and our examples above have used this as a primary destination.

However, both ETL and ELT serve a broader purpose for applications, systems, and destinations like data lakes and data marts. Keep in mind this is not an ETL vs. ELT architecture battle; they can work together. Below is an example of a hybrid model:

Hybrid Model

In this example, there are two parts to the ELT and ETL architecture. The first team used an ELT process to move data from Adobe events to an AWS data lake.

The second team would then handle the migration of the data from AWS to an enterprise Oracle Cloud environment using Oracle ETL systems. This allows Team Two to run ETL testing jobs in Oracle and create new data models or applications without impacting downstream workflows.

Why AWS to Oracle? It was the most efficient and cost-effective data consumption pattern for the Oracle BI environment. Using AWS rather than a traditional ETL data migration process was roughly 100X less expensive and much faster.

When looking at data ingestion in an AWS data lake like the example above, some level of curation should be implemented. This is the case for other destinations like marts and warehouses too. Primarily, this is a function of the extraction applications you are creating.

The extraction methods in data warehouses such as the Oracle example above will vary based on Team Two's tools.
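
For instance, Team Two could extract from the lake by running Amazon Athena queries against the cataloged tables. A sketch with boto3, where the database, table, and bucket names are hypothetical:

```python
# Sketch of a batch extraction from the data lake: run an Athena query
# against a cataloged table and land the results in S3 for a downstream
# ETL process to pick up. All names are hypothetical.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT * FROM adobe_events WHERE event_date = DATE '2019-07-01'",
    QueryExecutionContext={"Database": "datalake_catalog"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/extracts/"},
)
print(response["QueryExecutionId"])  # poll this id until the query succeeds
```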

4. ETL and ELT tools in business intelligence

This example is a derivative of example number 3. It demonstrates an ELT and ETL business intelligence extension.

In our use case, we will assume the Oracle environment Team Two manages does not allow a specific group of analysts access. Those analysts can still undertake data analytics based on the ELT work Team One completed.

The example below reflects how the same workflow supports Tableau Prep and a Tableau AWS connection to a data lake and Amazon Athena.

Tableau ELT

This model’s efficiency enables the BI team to leverage the ELT model from Team One and connect Tableau to AWS Athena so they can accomplish their reporting work.
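
Tableau connects through its own Athena connector, but the same tables are reachable programmatically as well. A sketch using the PyAthena library, with a hypothetical staging bucket and table name:

```python
# Sketch of querying the same data lake tables Tableau connects to,
# via Amazon Athena. Staging bucket and table names are hypothetical.
from pyathena import connect

conn = connect(
    s3_staging_dir="s3://example-bucket/athena-results/",
    region_name="us-east-1",
)
cursor = conn.cursor()
cursor.execute("SELECT event_name, COUNT(*) FROM adobe_events GROUP BY event_name")
for row in cursor.fetchall():
    print(row)
```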

Is Tableau an ETL tool in this use case? No, here it is used purely for data analysis, though Tableau can undertake some basic transformation work. The answer changes slightly if the question is about Tableau Prep, which is an ETL tool; with Tableau Prep, you are indeed undertaking some level of ETL work. You could also replace Tableau with some form of Alteryx process.

The point here is that rather than framing this as ETL vs. ELT, you can take an approach that gives you the flexibility to have one or more tools performing transformations on the same data.

5. ELT and ETL data for Adobe clickstream events

This example is a derivative of example number 4. Like the last case, rather than using commercial offerings like Informatica ETL or Talend ETL for processing Adobe data, a data lake and Oracle process were used.

This shows business intelligence operations occurring from the Oracle EDW rather than the data lake.

They can work together.

The Oracle ETL team’s process is simplified, given the use of the optimized data lake architecture and organized data catalog. The Oracle team calls the ETL schema endpoint for each master view they want to tap for data extraction.

They undertake batch processing at a set schedule to extract the data they need. Of course, if required, the underlying information is available in the lake, but by using the catalog for data extraction, the ETL process is a snap.
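
A sketch of what “calling the catalog” might look like with boto3 and the AWS Glue Data Catalog; the database and table names are hypothetical:

```python
# Sketch of resolving a master view through the data catalog before a
# scheduled batch extraction. Database and table names are hypothetical.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Look up where the cataloged master view's data lives in the lake.
table = glue.get_table(DatabaseName="datalake_catalog", Name="orders_master")
location = table["Table"]["StorageDescriptor"]["Location"]
print(location)  # e.g., an s3:// prefix the batch extraction reads from
```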

ETL Target: S3 Data Lake, Data Catalog

In this case, the Oracle ETL developers created a workflow to support business operations and a separate Tableau data analytics workflow. This provided an ETL automation framework that leveraged Adobe, Amazon, and a data lake as a landing zone or staging area.

Summary: Building out data pipelines

If you are an entry-level ETL developer or an old pro, you know the best ETL or ELT approach is a function of available tools, skills, experience, and attention to detail. You know that ETL and ELT are not mutually exclusive. Both can be considered parts of a broader data integration strategy. Each may reflect a stage or a step in a data migration process that occurs within the enterprise.

If you are currently designing data pipelines for data warehousing or a data lake, you may want to consider ELT or a hybrid approach. The hybrid model is typical of data lake workflows.

Any specific ETL vs. ELT pros and cons will depend on your situation. For example, if you are using Google Cloud or Amazon Web Services, both offer a collection of services to create or host your cloud ETL or ELT. Each will provide a unique set of capabilities and service offerings to assist in rolling your own tools.

Getting Started

Are you looking for AWS ETL partners? A simple, easy-to-use online ETL tool? The best ETL tools are the ones that don’t require you to learn how to code or deploy anything. Openbridge offers ELT data ingestion as a service. We offer both batch data ingestion services as well as streaming API data ingestion.

In addition to our batch data processing system and streaming API, we also offer pre-built data connectors to popular services like Facebook, Amazon Seller Central, Google Ads, Salesforce, Google Analytics 360, YouTube, and many others.

If you do not want to learn ETL and are looking for a code-free, fully automated, zero-administration ELT data pipeline, we have you covered.

We have launched a code-free, zero-admin, fully automated data lake that handles database and table creation, Parquet file conversion, Snappy compression, partitioning, and more.

Get started with Amazon Redshift Spectrum or Amazon Athena for free!

Not ready to get started just yet? If you are trying to learn how to implement the ETL process or explore an ELT model, reach out to our team of data experts to discuss your needs further.

Need a platform and team of experts to kickstart your data and analytics efforts? Our ELT and ETL toolkits can help! Getting traction with new technologies, especially when your team is working in different and unfamiliar ways, can be a roadblock to success. This is especially true in a self-service-only world. If you want to discuss a proof-of-concept, pilot, project, or any other effort, the Openbridge platform and team of data experts are ready to help.

Visit us at www.openbridge.com to learn how we help other companies with code-free, fully automated ETL processes.

References

Opensource ETL Tools

Cloud ETL Tools

Traditional ETL Software

Python ETL

Python is accessible and ubiquitous in ETL and ELT. Python SDKs support everything from the extraction process, to transformations in Pandas and PyArrow, to loading a target destination with SQLAlchemy; you can find Python everywhere. Python is a jack-of-all-trades toolkit, either as a standalone solution or as a component in open-source, cloud, or commercial products.
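
A minimal end-to-end sketch in that spirit, using pandas for the “T” and SQLAlchemy for the “L”; the file name, columns, and connection string are hypothetical:

```python
# Sketch of a small Python ETL job: extract with pandas, transform in a
# DataFrame, load with SQLAlchemy. Paths and table names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

# Extract: read raw rows from a source file.
df = pd.read_csv("raw_orders.csv")

# Transform: cleanse and derive columns with ordinary pandas operations.
df["email"] = df["email"].str.strip().str.lower()
df = df.dropna(subset=["email"])

# Load: write the result to a target database table.
engine = create_engine("sqlite:///warehouse.db")
df.to_sql("orders", engine, if_exists="replace", index=False)
```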

SQL ETL?

On Quora, there is a question asking, “Is SQL an ETL tool?” Technically speaking, SQL could be considered a tool for ETL. You can undoubtedly extract, transform, and load data via SQL. It might not be the best tool, but you could not categorically say it cannot be used as one. So for a simple use case, yes, SQL can serve as a basic ETL tool, though there are likely better options (see Python above).
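
As a trivial illustration, the whole “T” and “L” can be a single SQL statement. A sketch run from Python’s built-in sqlite3, with hypothetical table names:

```python
# Sketch of SQL doing ETL work: a single INSERT ... SELECT extracts from a
# staging table, transforms in flight, and loads the target table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (email TEXT, amount TEXT);
    INSERT INTO staging VALUES ('  A@Example.com ', '42.0');
    CREATE TABLE orders (email TEXT, amount REAL);
""")
conn.execute("""
    INSERT INTO orders (email, amount)
    SELECT lower(trim(email)), CAST(amount AS REAL)
    FROM staging
    WHERE email IS NOT NULL
""")
print(conn.execute("SELECT * FROM orders").fetchall())
```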
