Data Warehouse Modernization: Everything to know in 2021
The concept of Data Warehousing, though decades old, still remains a tough problem to solve for most organizations.
The ever-evolving customer needs, touchpoints, regulatory requirements, along with other business dynamics have proved that organizations must maintain and upgrade the Data Warehouses regularly.
No organization can ever claim their data warehouse is ‘all-set’ or use it on a “set it and forget it” mode. The changes are inevitable, in order to keep the Data Warehouses aligned to the pace of what’s happening in the real world!
In fact, the best data warehouses are the ones that cater well to the present business needs, while remaining flexible to accommodate the changing needs in the future.
But one thing is for sure. The ‘modern’ data warehouses of today run in the cloud, mainly to avoid the pain of managing the hardware infrastructure while achieving limitless scalability.
In this blog post, we’ll quickly run you through the critical information you need to know in 2021 for Data Warehouse modernization.
Use the Free Trial Only for POC
Remember this before you click on that Free Trial button, to create your Cloud Data Warehouse!
The leading Cloud Data Warehouse providers offer a free trial for 30 days, with a limited extension. So before you sign up, you must have a clear picture of what you’re going to try out in the trial.
The free trial must be a POC rather than marking the official beginning of your Cloud Data Warehouse project.
Just take a single fact table and a few dimension tables. Play around with relationships, formulas, calculations, and queries. Give more importance to depth rather than breadth.
In an on-premise DW, you don’t have to worry a lot about the volume of the data resulting from a query. But the game is totally different on the cloud. Some platforms charge you based on the volume of the query results while some are based on the computing/storage resources utilized.
Get familiarized with these minor nuances of the CDW at the free trial stage, rather than letting them delay your progress or halt the project after you kickstart it officially.
The POC must also give you a fair idea of which model to choose for your CDW – Serverless or Cluster-based. You must also be able to identify the redundancies on your existing DW setup.
You must also consider choosing the right region for hosting your (primary and Backup) Data Warehouse, calculate the costs and come up with the budgetary estimates both for the initial setup (keep it as accurate as possible), with an approximate budget forecast for subsequent maintenance efforts required for the near future.
Simplify Applications Migration With this Sequence
When transitioning from the old Data Warehouse to the new modernized Data Warehouse, make the impact as minimal as possible for the end-users. In fact, aim for making the transition as seamless as possible that the users don’t even realize about the transition.
Most teams struggle with migration sequencing. It’s always good to use a standard approach for all applications.
If you have x number of applications currently connected to the old data warehouse, use the following sequence for migration:
- Export all the existing data from your legacy data warehouse to the modern Data Warehouse.
The legacy DW must still continue to be used for the production at this stage.
- Next, connect ALL the applications (one by one) to the modern Data Warehouse, while the connectivity to the existing legacy Data Warehouse must remain intact, in order to continue the day-to-day operations.
After connecting each application to the modern Data Warehouse, a redundant Write operation must be performed on the modern DW, for each existing Write operation on the legacy DW.
At this stage, the Data Read operations for ALL applications must happen only in the legacy Data Warehouse (so you can use the existing data security framework without disturbing the real-world operations), while the Write operations are executed in both legacy and modern Data Warehouses.
It’s always good to perform the redundant write operation without affecting the UX. Your integration code must be in the following sequence at this stage:
- First, execute ALL the Read/Write Data operations at the Legacy DW. Now your business’s production will not be affected.
- Next, perform the redundant Write operation on the Modern DW.
Once you complete the redundant Write operations for ALL applications, move to the next step of the sequence.
- Enforce your Security Policies, set up Roles/Privileges, Row Level Security, and IP/Firewall restrictions on the modern Data Warehouse. Then connect the BI & Analytics tools. Until this stage, all the Read operations happen in the legacy DW, while the Write operations are performed on both legacy and modern DW.
- Execute the Data Read operations in the new cloud data warehouse for all applications, one at a time. Now Data Read is performed on modern DS, while the Data Write operations are performed on both.
- Then make the Write operations in the modern DW primary, and make the Write operations in the legacy DW as redundant. This is done just by making changes on the placement of code in the application integration sequence for each Write operation.
- Run it this way for a few days or months or years. The duration depends on a number of factors, say the criticality of the applications, number of applications migrated. If everything works as intended, then disconnect the old data warehouse, and remove its duplicate steps on the integrate codes from all applications.
In the above sequence, all the applications must travel together. (i.e) Go to the next step, only after ALL applications complete the previous step. This will help you to minimize the time with dependencies on both old and new data warehouses at the same time. You can cut down on the overhead costs while minimizing the risk of being forced to run both Data Warehouses simultaneously for long periods of time, which defeats the purpose of implementing the modern DW.
Last but not least! You cannot foresee all the scenarios proactively. So you’ll have to deal with some exceptions along the way, like for any other project.
Exceptions don’t necessarily mean you’ve done it wrong. If the exceptions are manageable and controllable, you’ve still done it right.
As a proactive step, it’s good to set up an exception-handling mechanism for critical applications and processes. Also, don’t overlook user training. It’s essential to help your BI users to learn optimizing the queries, so you can cut down on costs.
Data Warehouse modernization might seem cumbersome, but the benefits outweigh the pain of hardware infrastructure management while enjoying blazing-fast query performance. When approached with the steps mentioned above, you can enable your organization to create a valuable asset – A modernized Data Warehouse, and help succeed in the long term!