An Extract-Transform-Load (ETL) pipeline is a sequence of processes to extract data from multiple sources, transform it to meet specific requirements, and load the result into a target database for analysis and reporting.
An ETL pipeline could for example involve extracting customer data from a CRM system, a website database, and a social media platform. The data would then be transformed by removing duplicates, standardizing name and address formats, and merging relevant fields (first name + last name). The transformed data could finally be loaded into a PostgreSQL to create customer reports or perform accounting tasks.
ETL is a key concept to build software products because it explains how to centralize data from multiple sources into a single location, ensure data quality, scale to handle large volumes of data, save time by automating data processing, and support informed decision-making through business intelligence.
Let's imagine a use case where you have several Google Sheets with customer data, and you want to create online business reports from this data. The first step is to extract the data from the Google Sheets, then you would need to clean and format the data, and finally you'll need to generate and publish the business reports on a website.
Often, business data is scattered in different places. The extract phase consists in gathering all the data from the different sources to be loaded into a single location.
There are 3 ways to extract data with Rowy: import, webhooks and actions.
One simple way to extract data is to import it from a CSV file.
Google Sheets can export data as CSV files, which you can then import into Rowy. Rowy Imports handle the most important part of data extraction―data mapping―which you can use to map the CSV columns to the corresponding Rowy columns:
This way, you can have a working database in a few minutes. But you'll probably have a hard time doing this for hundreds of spreadsheets with dozens of tabs each. For more complex scenarios, Rowy offers 2 scalable options.
Rowy Webhooks are HTTP functions: whenever you visit the webhook's URL, the webhook is triggered:
If you combine it with external webhook events and API calls, you can entirely automate data extraction. For example, you could create a webhook that runs every time a new row is added to a Google Sheet, and then use the webhook to import the new row into Rowy. Check out our article on importing data from Google Sheets to learn more.
Of course, webhooks can also be called from other Rowy functions to create advanced features. You can also use any npm package or Firestore object within a webhook function. Action columns, for example, can trigger webhooks.
Rowy Action columns represent a semi-automated way to extract data. They are also HTTP functions you can use to run any script on demand: add a new column with the Action type, write your code in the column's settings, and click the corresponding button in the table to execute the script.
This is useful when you want to control when the data extraction process occurs. For example, you could create table rows corresponding to a data source and use the Action button to granularly control which data source to extract.
The transform phase is when you clean and format the data to meet specific quality requirements. You have extracted data and mapped it to the corresponding Rowy columns, but you'll probably need to process it to make it usable. Typically, you'll need to remove duplicates, standardize name and address formats, and merge relevant fields (first name + last name).
Rowy offers two column types to automate data transformation: formulas, and derivatives.
The Formula field type allows you to operate on the values of the other fields in your table: mathematical operations like sum and differences, but also more advanced data manipulations like triming blank spaces in text or concatenating two columns like you would do in an Excel spreadsheet.
You can think of formulas as a way to perform simple data processing on a single row. For more advanced data processing, you can use Derivative columns.
Derivative columns are another type of Rowy columns that run a Cloud function when the corresponding row is created, updated or deleted―an event-driven function for Firestore. It's especially useful to process Firestore data whenever the database changes, to derive values from other fields:
Unlike formulas, derivatives aren't limited to a single row. They can access other Firestore objects, and you can import any npm package to suit any use case you have in mind. For example, you can use derivatives to split a full name into first name and last name, look up redundant data, or geocode an address via API.
The loading phase is when you load the transformed data into a target database. With Rowy, your data is automatically loaded into a Firestore database, so you can use any of the Firestore API methods to load data into any front-end app:
const customer_refs = await getDocs(collection(db, "customers"))
You can also use Rowy as a temporary data store to load data into other target databases, like third-party services.
Because Rowy has extensive support for running custom code, you can build any sort of advanced ETL features you want. For example, you could use the Image column type to perform data visualization tasks. Say, you want to visualize the number of customers per age segment in a chart to add it in a business report. Just generate a bar chart with SVG, and then use a library to convert the SVG to a PNG image. You can then use the Image column to store the image in Rowy, and use the image URL in your report.
Data is at the heart of every business. If the data is poor, decisions will be too. That's why ETL pipelines are key to ensuring data quality. And if you can do it with minimum hassle with Rowy, it's even better!
If you have any questions or app ideas you're excited about, don't hesitate to join our Discord community and share them with us!