AWS Snowflake’s Cloud Data Warehouse — What I Learned and Why I’m Rethinking the Data Warehouse

Posted on October 28, 2022 by

Categories: AWS


Customers frequently come to Hashmap for help getting the best possible performance from their data warehousing solutions.

Even after we recently invested a lot of time in scaling, balancing, and tuning our on-premise data warehouse infrastructure, the performance of the Tableau workbooks used to analyse consumer device hardware data wasn’t where it needed to be.

At peak periods, the system could support at most 13 concurrent users. Even simple queries took around 5 minutes, while complex queries could take up to 30 minutes. Data loads ran only once every 24 hours, yet hourly loads were needed. A further problem was that the user base was geographically distributed across the US, Europe, and Asia.

Was it time to consider changing the environment of the data warehouse?

Putting off essential data analysis for days or weeks is no longer acceptable. Most business teams expect real-time insights that keep pace with business and markets, and data scientists are often frustrated by restrictions on queries and by the inability to load, transform, and integrate structured and semi-structured data.

You Must Take Into Account the Cloud

Hold on; it’s not an easy task to investigate and thoroughly analyse all the data warehouse solutions available. How do you begin? How do you choose a solution that can outperform the current platform or other conventional data warehousing solutions in terms of performance, scalability, elasticity, flexibility, and cost?

Any healthy IT organisation should have core expertise in managing and scaling a data warehouse, and moving to a cloud data warehouse should not be a struggle. Do not hesitate to bring in an outside team that specialises in this, such as Hashmap. They can help you not only set up the warehouse, which is surprisingly simple, but also configure it in a way that anticipates future needs while minimising costs.

The interactive, two-hour Data & Cloud Migration and Modernization Workshop from Hashmap will show you and your team how to accelerate desired outcomes, lower risk, and enable modern data readiness. We’ll go over the options and make sure everyone is clear on what has to be prioritised, typical project phases, and risk mitigation techniques. Register now for our free workshop.

The cloud is a significant factor in how modern data warehousing is evolving. The cloud provides access to:

  • Nearly limitless, inexpensive storage
  • Scale-up and scale-down capabilities as needed
  • The ability to hand the complex operational duties of managing and securing the data warehouse to the cloud vendor
  • The option to pay only for the compute and storage resources actually used, when they are needed

I must admit that, having worked with a wide range of software over the years, from Hadoop to Teradata, and having been heavily involved in projects moving workloads from on-premise environments to the cloud, I was highly excited for the chance to investigate the options for architecting and deploying this specific data warehouse.

Rather than letting the characteristics of an existing DW solution constrain the evaluation of a new one, I focused on an approach that examined the options now available across the range of cloud data warehousing.

Snowflake Cloud Data Warehouse Is Chosen

The data ingestion process in the existing deployment was stable and mature. Every day, an ETL script reads the raw JSON file from the file system and loads the data into a SQL table stored in ORC format with Snappy compression.
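As a rough illustration of that daily load, the sketch below flattens one raw JSON record and builds the DDL for an ORC table with Snappy compression. It assumes a Hive-style SQL engine (the post only says "SQL table"); the table name `device_events`, the column names, and the sample record are all hypothetical and not taken from the actual deployment.

```python
import json

# Hypothetical column layout; the post does not describe the real schema.
COLUMNS = {"device_id": "STRING", "model": "STRING", "cpu_temp_c": "DOUBLE"}


def orc_table_ddl(table: str, columns: dict) -> str:
    """Build Hive-style DDL for an ORC table compressed with Snappy."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        'STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY")'
    )


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into one level, joining keys with '_'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def to_row(raw_line: str, columns: dict) -> tuple:
    """Parse one raw JSON line and order its values to match the table columns."""
    flat = flatten(json.loads(raw_line))
    # Columns missing from the record become None (NULL in the table).
    return tuple(flat.get(name) for name in columns)


# Hypothetical sample record standing in for one line of the raw JSON file.
sample = '{"device_id": "a1", "model": "X200", "cpu": {"temp_c": 61.5}}'
ddl = orc_table_ddl("device_events", COLUMNS)
row = to_row(sample, COLUMNS)
```

In a real pipeline the rows would be handed to whatever client the SQL engine provides; that part is left out here because the post does not say which engine or client was used.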

The first constraint was that the cloud data warehouse had to support the ORC file format, so that the ETL process would not have to be reworked. The second was maintaining backward compatibility with the existing Tableau workbooks.

Given these constraints, only two cloud data warehouses supported the ORC file format: Snowflake and Amazon Redshift Spectrum. Both can access ORC files as external files stored in Amazon S3. Snowflake’s ability to natively load and process ORC data files gave it the advantage over Redshift Spectrum.
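To make the ORC load path a little more concrete, here is a minimal sketch of the Snowflake statements such a load might use: a file format of type ORC, an external stage pointing at S3, a landing table with a single VARIANT column, and a COPY INTO. The stage, table, and bucket names are hypothetical, and the credentials or storage integration that a private bucket would need on the stage are omitted.

```python
from typing import List


def snowflake_orc_load_statements(stage: str, s3_url: str, table: str) -> List[str]:
    """Return SQL statements that load staged ORC files into one VARIANT column."""
    return [
        # Named file format so the stage knows the files are ORC.
        "CREATE FILE FORMAT IF NOT EXISTS orc_format TYPE = ORC",
        # External stage over the S3 location (access credentials omitted here).
        f"CREATE STAGE IF NOT EXISTS {stage} URL = '{s3_url}' "
        "FILE_FORMAT = (FORMAT_NAME = 'orc_format')",
        # Each ORC row lands as one semi-structured value in a VARIANT column.
        f"CREATE TABLE IF NOT EXISTS {table} (src VARIANT)",
        f"COPY INTO {table} FROM @{stage}",
    ]


# Hypothetical names, for illustration only.
statements = snowflake_orc_load_statements(
    "device_stage", "s3://example-bucket/orc/", "device_raw"
)
for sql in statements:
    print(sql + ";")
```

The statements are only built and printed here. Running them would require a Snowflake session, for example by passing each string to a cursor from the Snowflake Python connector, which this sketch does not attempt.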

Since Tableau can connect to many data sources and data warehouses, including Snowflake and Redshift Spectrum, the Tableau constraint was a moot point.

One benefit for shorter-term POCs is that Snowflake offers a $400 credit, valid for 30 days, that can be used for compute and storage. Snowflake currently runs on AWS, but we anticipate it will soon run on other cloud providers.

Update to this article: Snowflake has been available on Microsoft Azure since June 2018 and on GCP since June 2019.

Snowflake Architecture

Snowflake is a true SaaS offering: a cloud data warehouse built on top of the Amazon Web Services (AWS) cloud infrastructure. There is no hardware, virtual or physical, for you to select, install, configure, or manage. You don’t need to install, configure, or manage any software. Snowflake handles all ongoing management, maintenance, and tuning.

From an architectural standpoint, the Snowflake data warehouse comprises three main components.