Cloud Data Warehouse: 5 DOs and DON’Ts
In the previous blog post, we discussed why moving the data warehouse to the cloud is becoming a new normal, along with the benefits reaped.
However, careful considerations and execution are vital for making the most of your cloud data warehouse.
Moving your business data to a cloud warehouse is not a “one size fits all” thing.
The process varies from industry to industry based on compliance requirements, company size, data volume, business processes, and many other macro-level factors pertaining to business-specific needs.
But on a high level, one can follow certain guidelines for taking the right path and sticking to it down the road.
In this blog post, we’ll see 5 critical DOs and DON’Ts of cloud data warehousing.
Cloud Data Warehouse: DOs and DON’Ts
The 5 DOs and DONT’s we’ll discuss in this post are:
- Opt for the right architecture
- Create a solid data model and data flow blueprint
- Choose ELT over ETL
- Go agile
- Don’t overlook user adoption
Let’s see them in detail.
Opt for the right architecture
Most businesses deciding to migrate to a cloud data warehouse face a major roadblock – choosing the right architecture.
There are currently 2 types of architectures for cloud data warehousing.
- Cluster-based architectures
- Serverless architecture
The cluster-based architecture is a pool of shared computing resources, referred to as Nodes. Each node in the cluster consists of CPU, RAM, and storage space.
The cluster-based architecture is priced based on the number of clusters you use, and hence the cost incurred is quite easily predictable.
For Serverless architecture, the hardware allocation is taken care of by the managed service provider. The computing resource allocation happens dynamically on a shared basis.
You pay for the storage space and the queries executed at a given time. The price is not easily predictable.
Though it is a tough call to make for many businesses, you can choose a hybrid model as well if your needs demand, and choose the “pay as you go” pricing offered by your managed service provider.
Another important consideration for choosing your managed service provider is if they are covering the compliance requirements, like HIPAA, SOC-2, etc that are mandatory for your business.
For more info about cloud data warehousing, and its benefits, check out our article on 6 reasons for rapid adoption of cloud data warehouses.
Create a solid data model and data flow blueprint
If you notice the differences between a data lake and data warehouse, you can easily find the major challenge with a data warehouse.
Unlike data lakes, a data warehouse requires a strong data model even before you start developing it.
Hence, when migrating to a cloud data warehouse, it is important to do the groundwork similar to what you did when creating an on-premise data warehouse.
The data modeling and data flow blueprint must not be overlooked. Though the existing model and flow blueprint can be reused to the maximum extent, one must not forget the complexities when moving to the cloud.
The cloud architecture you choose, the money you spend, and many other processes must be taken into account for optimizing your data model and data flow.
Ask questions like the ones mentioned below to the appropriate stakeholders and collect necessary information for building the data model and the flow.
- Which processes are essential for the business to operate?
- What data should be stored or retained to achieve compliance?
- What data is required for decision-making?
- Which data is handled manually outside of your data warehouse? Is it required to be stored for automation when the business grows in size?
- How the same data is used by different functions, say marketing/sales, operations, production, etc?
- What is the production deployment method for the cloud data warehouse? How to minimize the potential operational lapses when making the switch to the production environment?
The data model must always be focused on minimizing the data stored while maximizing the value derived out of it without any compromise.
Select the right schema for your data model, and carve out a path for data to flow from one system to another.
Migrating everything as it is from your on-premise data warehouse in a hurry to the cloud (without implementing the learnings from the mistakes in the past) is of little use. Hence, the groundwork must not be overlooked or skipped.
Choose ELT over ETL
The developers often wonder if they should choose to Extract, Transform and Load or to Extract, Load, and Transform.
The sequence of the integration process was not a cause of confusion until a few years ago as the only option was ETL.
However, with many managed service providers offering the ELT option, businesses are starting to think otherwise.
In the case of ELT, the data is loaded into the data warehouse first, and transformed to the appropriate format next. The transformation also happens within the Data Warehouse, and hence, the need for a staging environment is eliminated.
However, for ETL, the staging environment is a mandatory requirement, as the data transformation happens outside of the data warehouse before loading it.
The best choice is to use both ELT and ETL in a hybrid model, based on the compliance requirements for the data stored.
Go agile
Getting it all done in one shot is not the right way when moving your business data to a cloud data warehouse.
Yes, if you have an on-premise data warehouse, you’re most likely to choose to simply migrate the data from it to the cloud. In that case, the cloud just serves as a mere backup of your on-premise data warehouse.
It’s always good to take time, find out the inefficiencies in your on-prem warehouse, and fix them while moving to the cloud.
The agile methodology works best for your cloud migration projects, especially if your data volume is too high, and dealing with skilled resource limitations.
The agile methodology also helps to quickly adapt to any changes coming by the way of the project.
Don’t overlook user adoption
The final piece of the cloud data warehousing puzzle is user adoption.
This part is often overlooked by many businesses but has the potential to make or break the project.
There’s no doubt that the Cloud data warehouse offers many advantages over a traditional on-prem data warehouse from the technology aspect. However, businesses are also made of people following a discipline in terms of processes.
Hence, it is critical to onboard the users to the cloud data warehouse, and train them how to use it.
A quick example for the effect of failing to train users – in a serverless model, an inefficiently executed query (using large space) could even cost $1,000. It is important to help users to query the data appropriately to avoid such huge spending on a single query!
Hence, equipping the users with the required literacy of your cloud data warehouse is a mandatory step rather than an optional one!