Extract, Transform <-> Load
What it means in general?
Nowadays BI (Business Intelligence) is wide-spread data interpreter. It helps to translate computer methods and company instruments into understandable by people form. One of the used processes during the BI, that I want to introduce to you in the article - is ETL (Extract Transform Load) and also a more new version called ELT (Extract, Load, Transform)

What means ETL in general?

This process extracts the data from different RDBMS/NoSQL/ETC source systems, then transforms the data and finally loads the data warehouse system. People are used to think that creating a data warehouse is simply a data extraction from numerous sources and loading into database there. So, that is not the same. Therefore, let's figure it out.

As we already know, the abbreviation is not "quite difficult" , but it's still almost meaningless for us.

Then, we have to go deeper.
At first, "Extraction". During this step, data is extracted from PostgreSQL, MongoDB, flat files etc. into the staging area and further to the Data Warehouse.

It takes three methods of Data Extraction: Full Extraction, Partial Extraction – without update notification, partial extraction – with update notification.

Regardless of the method used, extraction doesn't affect performance and response time of the source systems.

Next part of the abbreviation is "Transformation". Extracted data from source server is raw and not for usage in its original form. Therefore it demands to be cleansed, mapped and transformed.

In fact, this is the main part when ETL gets value and changes data such that insightful BI reports can be generated. In this step, you apply a set of functions on extracted data. Data that does not require any transformation is called as direct move or pass through data.

And the final step is "Loading". Typically, data warehouse requires to load huge volume of data in a relatively short period. However, load process should be optimized for performance.

In case of failure, recover mechanisms should be configured to restart from the point of failure without data integrity loss. Admins of Data Warehouse have to spectate, resume and cancel loads as per prevailing server performance.

Why do you need ETL?

If you want to create a data warehouse that combines data from various sources, you will need a way to get that data into the warehouse. This is where ETL comes in.

For example, we may extract marketing data from various sources like Google Analytics and Facebook Ads. Every data set is different, so each will need to be transformed to clean up the data set and prepare it to be loaded in an orderly way. Finally, the data can be loaded into the data warehouse.

These three steps – extract, transform, and load – are necessary in creating and maintaining a data warehouse.

While it is sometimes advantageous to accomplish ETL through a custom solution, most people opt for one of the many ETL tools that are offered through various software providers.

Once the data has been loaded into the data warehouse, you will be able to analyze and report on it through your tool of choice. The ETL process is required to get that point.

What is ELT and why should we care?

Let's Work Together

Made on