Fivetran, the ETL and data pipeline vendor, has released a benchmark report to compare top data warehouses. Get fast facts about the report here.
Fivetran, the ETL and data pipeline company, has released its Cloud Data Warehouse Benchmark report. In partnership with Brooklyn Data Co., Fivetran studied five major cloud data warehouse vendors and how their platforms have changed and improved since 2020.
In this report, we’ll summarize the key points of this benchmark study and highlight some of the differentiators Fivetran identified among these data warehousing competitors.
What is Fivetran?
Fivetran is a cloud-based data pipeline solution that supports many ETL and data migration projects. One of its main advantages is a set of high-speed connectors that require little maintenance and adapt easily to source system changes. Because these connectors span a wide variety of data sources, they can simplify data integration projects.
Other products and solutions from Fivetran include the following:
Fivetran can support a range of business data projects, but the company specifically highlights marketing, sales and finance analytics use cases. Fivetran integrates most seamlessly with AWS and Amazon Redshift, Microsoft Azure and Synapse, Databricks, Google Cloud and BigQuery, and the Snowflake Data Cloud.
Fast facts about Fivetran’s Cloud Data Warehouse Benchmark
This latest Fivetran benchmark offers a comparative analysis of several top players in the cloud data warehousing space. Here are some important details about the queries Fivetran ran, the vendors it assessed and the performance metrics it measured:
- Fivetran conducted a comparative analysis of speed and cost across five data warehouses.
- The main data warehouses covered in this study are Amazon Redshift, Snowflake, Google BigQuery, Databricks and Azure Synapse.
- Fivetran’s assessments in this study are based on the typical Fivetran user, with a focus on many marketing and sales data platforms. According to Fivetran, these users are usually working with complex but lower-volume data sources.
- The dataset used included 24 tables at a 1TB scale; tables include hypothetical retailer data, with the largest table having four billion rows.
- Fivetran ran 99 queries between May and October 2022 to get these results.
- Each warehouse was queried in three different configurations: the standard configuration is represented with 1X in Fivetran’s tables, 0.5X represents results with half of that compute power and 2X represents results with double that compute power.
Results of the Cloud Data Warehouse Benchmark
The Cloud Data Warehouse Benchmark generated significant data about data warehouse performance and what users might be looking for. For the sake of this report summary, we’ll focus primarily on the big takeaways related to cost, speed and year-over-year improvements.
Cost and speed
Costs across these data warehousing solutions are relatively similar, especially if you assess these tools through a cost-to-performance ratio. Speeds are also similar, as many of these tools deliver results and make data changes within a second or two of each other.
According to Fivetran’s research, this is how each of these solutions compares at the 1X level:
- BigQuery is the highest cost and second-slowest solution.
- Synapse is the second-highest cost and slowest solution.
- Redshift is the third-highest cost and second-fastest solution.
- Snowflake is the fourth-highest cost and fastest solution.
- Databricks is the lowest cost and third-fastest solution.
All of these solutions performed within a few cents and seconds of each other at the 1X level. Most of the solutions also stayed within the same range as each other at the 0.5X level, but Azure Synapse slows down significantly with 0.5X compute power.
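To read the cost and speed rankings side by side, the 1X list can be written as a small Python table. This is only an illustrative sketch: the ranks are transcribed from the list above, while the names RANKS_1X, by_speed, by_cost_cheapest_first and not_dominated are hypothetical helpers and not part of Fivetran's report.

```python
# Ordinal rankings at the 1X configuration, transcribed from the list above.
# cost_rank: 1 = highest cost; speed_rank: 1 = fastest.
RANKS_1X = {
    "BigQuery":   {"cost_rank": 1, "speed_rank": 4},
    "Synapse":    {"cost_rank": 2, "speed_rank": 5},
    "Redshift":   {"cost_rank": 3, "speed_rank": 2},
    "Snowflake":  {"cost_rank": 4, "speed_rank": 1},
    "Databricks": {"cost_rank": 5, "speed_rank": 3},
}


def by_speed(ranks):
    """Return warehouse names ordered from fastest to slowest."""
    return sorted(ranks, key=lambda name: ranks[name]["speed_rank"])


def by_cost_cheapest_first(ranks):
    """Return warehouse names ordered from lowest cost to highest cost."""
    return sorted(ranks, key=lambda name: ranks[name]["cost_rank"], reverse=True)


def not_dominated(ranks):
    """Return warehouses for which no other entry ranks both cheaper and faster."""
    result = []
    for name, r in ranks.items():
        dominated = any(
            o["cost_rank"] > r["cost_rank"] and o["speed_rank"] < r["speed_rank"]
            for other, o in ranks.items()
            if other != name
        )
        if not dominated:
            result.append(name)
    return result


if __name__ == "__main__":
    print("Fastest first: ", by_speed(RANKS_1X))
    print("Cheapest first:", by_cost_cheapest_first(RANKS_1X))
    print("Not dominated: ", not_dominated(RANKS_1X))
```

Keep in mind that these are ordinal ranks only. They hide how small the gaps are, and the report stresses that all five warehouses landed within a few cents and seconds of each other at 1X.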
Each of the four vendors reviewed for year-over-year changes has made performance improvements, specifically in processing time, between 2020 and 2022. Here’s a quick summary of these findings:
- Databricks was much slower than the other competitors in this group in 2020, but it has made more advancements than any other vendor listed here since then. It now sits in third place among this group, likely because of the rewrite of its SQL execution engine.
- Snowflake has surpassed Redshift as the fastest vendor in this comparison, but the two are still very close in their numbers.
- BigQuery is the slowest of the four competitors reviewed in this section, but it is still keeping a very close pace with all of them.
- Synapse was not reviewed in Fivetran’s performance improvement benchmark.
Which cloud data warehouse should you choose?
The main conclusion that Fivetran drew from this study is that while some of these cloud data warehousing solutions offer slightly better speeds or costs, they’re all keeping a relatively close pace with each other. In other words, there isn’t really a “bad” data warehouse option in this set.
So which cloud data warehouse should you select for your business? That all depends on the kinds and quantities of data you’re working with, the expertise of your data team and the overall investment your company is willing to make for this kind of data management solution.