This is a podcast episode you do not want to miss! In this episode we sit down with Stephen Brobst, CTO of Teradata. Below are just a few of the takeaways from this value-packed episode.

  • Large organizations are increasingly shifting to a distributed/inter-cloud architecture for several reasons, including data sovereignty, increased resilience, and lower costs.
  • Just because your data warehouse (DW) does not support indexes does not mean you do not need them.
  • One of the most common reasons DWs fail is that they are led by IT rather than by the business. A DW should be driven directly by business needs and the organization's most important initiatives.

The Rise of Distributed/Inter-Cloud Data Architectures

Most of us are familiar with hybrid cloud, which simply means an architecture that connects on-premises infrastructure to a private or public cloud service. This approach became popular when many organizations started shifting to the cloud while their legacy applications remained on-prem.

Another popular deployment strategy is multi-cloud, in which organizations deploy applications across multiple cloud service providers (CSPs). In this model, however, the applications running on one CSP do not communicate with those running on the others.

The next deployment strategy, which has recently been growing in popularity, is distributed/inter-cloud. It extends the multi-cloud concept by letting you share data and processing responsibility across multiple clouds, or across on-prem and cloud environments in a hybrid setup. In an inter/distributed cloud, operations, governance, and updates are controlled and managed from a central control plane, while the processing happens in the various locations/environments. This can help reduce cost, increase resilience, and meet regulatory requirements.
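To make the control-plane idea concrete, here is a minimal Python sketch under simplifying assumptions. `Site`, `ControlPlane`, `push_update`, and `dispatch` are hypothetical names invented for illustration, not any vendor's API. It shows the two halves of the pattern: updates are managed in one place and rolled out to every location, while work is routed to the location that already holds the data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical location: a CSP region or an on-prem data center."""
    name: str
    datasets: set[str]
    config_version: int = 0

    def run(self, dataset: str, query: str) -> str:
        # Processing happens locally, next to the data it reads.
        return f"{self.name} ran {query!r} on {dataset}"


@dataclass
class ControlPlane:
    """Hypothetical central control plane for operations, governance, and updates."""
    sites: list[Site] = field(default_factory=list)
    config_version: int = 0

    def push_update(self) -> None:
        # Updates are decided centrally and rolled out to every site.
        self.config_version += 1
        for site in self.sites:
            site.config_version = self.config_version

    def dispatch(self, dataset: str, query: str) -> str:
        # Send the work to the site that holds the data instead of moving the data.
        for site in self.sites:
            if dataset in site.datasets:
                return site.run(dataset, query)
        raise LookupError(f"no site holds {dataset!r}")


plane = ControlPlane([Site("csp-a-eu", {"orders"}), Site("on-prem-ca", {"accounts"})])
plane.push_update()
print(plane.dispatch("accounts", "SELECT COUNT(*) FROM accounts"))
# on-prem-ca ran 'SELECT COUNT(*) FROM accounts' on accounts
```

A real control plane would also handle security policies, monitoring, and failures; the sketch keeps only the "manage centrally, process locally" split.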

Distributed clouds are particularly important when multinational organizations need to meet data sovereignty regulations, especially data localization requirements. Many countries, such as Germany, France, Russia, China, and Canada, have data localization regulations specifying which data must remain on servers within their borders. For example, a sovereignty law might say, “this detailed data about this customer’s bank account is not allowed to leave the country.”
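One way to picture a data localization rule is as a check that runs before any data leaves its home location. The sketch below is hypothetical: the rule table, country codes, and data class names are made up for illustration and do not describe any actual country's law. When a transfer is not allowed, the processing is placed in-country instead of moving the data.

```python
# Hypothetical residency rules: (country, data class) pairs whose data must stay in-country.
# These entries are illustrative only and do not describe any real regulation.
RESIDENCY_RULES = {
    ("DE", "customer_account_detail"),
    ("CA", "customer_account_detail"),
}


def transfer_allowed(data_class: str, data_country: str, target_country: str) -> bool:
    """Return True if data of this class may move from data_country to target_country."""
    if data_country == target_country:
        return True  # staying inside the border is always fine
    return (data_country, data_class) not in RESIDENCY_RULES


def processing_location(data_class: str, data_country: str, requester_country: str) -> str:
    """Decide where a request should be processed."""
    if transfer_allowed(data_class, data_country, requester_country):
        return requester_country  # the data may be shipped to the requester
    return data_country  # otherwise run the processing in-country, next to the data


print(processing_location("customer_account_detail", "DE", "US"))  # DE
print(processing_location("web_clickstream", "DE", "US"))          # US
```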

It’s not all sunshine and rainbows with distributed cloud architectures, particularly when it comes to cost. If you are moving large amounts of data out of CSPs, you can be hit with data egress fees, which are essentially a tax on your data. Stephen believes egress fees will go away over time, pointing out that Oracle Cloud has already eliminated the fee for the first 10 TB. Organizations want freedom over their data and do not want to be locked in to vendors by large egress fees; this customer pressure will also help keep the fees down.
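A quick back-of-the-envelope calculation shows why a free allowance matters. In this sketch, the 10 TB allowance comes from the episode, while the $0.05/GB rate and the 1 TB = 1,000 GB conversion are assumptions for illustration, not any provider's published pricing.

```python
FREE_ALLOWANCE_TB = 10.0  # the free allowance mentioned in the episode
RATE_PER_GB = 0.05        # hypothetical price in $/GB beyond the allowance
GB_PER_TB = 1000          # assumes decimal units


def egress_cost(tb_out: float, free_tb: float = FREE_ALLOWANCE_TB,
                rate_per_gb: float = RATE_PER_GB) -> float:
    """Dollar cost of moving tb_out terabytes out of a cloud, after the free allowance."""
    billable_gb = max(0.0, tb_out - free_tb) * GB_PER_TB
    return round(billable_gb * rate_per_gb, 2)


print(egress_cost(50))             # 2000.0
print(egress_cost(50, free_tb=0))  # 2500.0
```

With these assumed numbers, moving 50 TB out costs about $2,000 with the 10 TB allowance versus $2,500 without it, and anything under 10 TB costs nothing.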

One of the biggest challenges organizations face when building out a multi-cloud architecture is not fully understanding the strengths and weaknesses of each CSP. There are many variables, and understanding when and how to integrate the different providers can be difficult.

Multi-Temperature Data Management

One pattern that helps in this area is multi-temperature data management, which dynamically moves infrequently accessed (cold) data to lower-cost storage based on data usage patterns.

“You don’t want your DBA to have to be worrying about this, because the answer changes all the time.”

  • Which data is best kept in solid-state or non-volatile memory?
  • Which data should be on spinning disk drives?
  • Which data should be stored in an object store?
  • Which data should be stored in block storage?

Each option has different characteristics in terms of cost, performance, and access, so the right choice depends on the use case.

And the answer today might be different from the answer tomorrow, because that data is now a day older and will not be used with the same frequency.

That’s where multi-temperature data management helps, by dynamically migrating data to the right place. And the storage hierarchy is no longer just a hierarchy within a single system; it now spans multiple cloud service providers.
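The core of multi-temperature data management is a placement rule that is re-evaluated over time. The sketch below is a simplified, hypothetical version: it uses days since last access as a stand-in for the richer usage statistics a real system would track, and the tier names, thresholds, and locations (including tiers in two different clouds) are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Hypothetical tiers, hottest first: (tier name, max days since last access, or None for "any").
# The tier names, locations, and thresholds are illustrative assumptions.
TIERS = [
    ("on_prem_ssd", 7),            # hot
    ("csp_a_block_hdd", 90),       # warm
    ("csp_b_object_store", None),  # cold
]


@dataclass
class Partition:
    name: str
    last_accessed: date
    tier: str = "on_prem_ssd"


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick the hottest tier whose age limit still covers this data."""
    age_days = (today - last_accessed).days
    for tier, max_age in TIERS:
        if max_age is None or age_days <= max_age:
            return tier
    return TIERS[-1][0]  # not reached: the last tier has no age limit


def rebalance(partitions: list[Partition], today: date) -> list[tuple[str, str, str]]:
    """Re-evaluate each partition; apply and return the (name, from, to) moves."""
    moves = []
    for p in partitions:
        target = choose_tier(p.last_accessed, today)
        if target != p.tier:
            moves.append((p.name, p.tier, target))
            p.tier = target  # a real system would copy the data before switching
    return moves


parts = [Partition("sales_2024_06_01", date(2024, 6, 1))]
print(rebalance(parts, date(2024, 6, 8)))  # []
print(rebalance(parts, date(2024, 6, 9)))  # [('sales_2024_06_01', 'on_prem_ssd', 'csp_a_block_hdd')]
```

Re-running `rebalance` each day captures the point that today's answer may differ from tomorrow's: the same partition stays on `on_prem_ssd` at 7 days old and moves to `csp_a_block_hdd` on day 8, without a DBA deciding it by hand.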

For organizations that have been really successful with data warehouses, what patterns do you see that contribute to their success?

Many factors contribute to an organization having a successful data warehouse.

According to Stephen, one of the most important is that successful data warehouses are business-led, not technology-led.

These organizations start by identifying the business problem and then build the DW to support it.

The notion of “build it and they will come” is a failure mode.

They will not come.

The key is to focus on incremental delivery that is directly tied to key business drivers.

Do not try to deliver large projects as a big bang; instead, pursue long-term incremental change, focusing on value-add activities first.

One of Stephen’s key principles when designing a modern data architecture is to make it flexible enough to ingest data in near real time when a use case calls for it. You may not need that capability today, but you will most likely need it in the future for some use cases.
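Designing for near-real-time ingestion can start with not hard-coding the batch window. The following is a minimal, hypothetical sketch (the `MicroBatchIngester` class and its `max_batch`/`max_wait_s` settings are invented for illustration): records are buffered and loaded in micro-batches, so shrinking `max_wait_s` from hours to seconds moves a pipeline toward near real time without changing its structure.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class MicroBatchIngester:
    """Buffers records and hands them to a loader in batches.

    Hypothetical design: the same code path serves a nightly-style batch load
    (large max_wait_s) and near-real-time ingestion (small max_wait_s), so moving
    between them is a configuration change rather than a redesign.
    """

    def __init__(self, load: Callable[[list[Any]], None], max_batch: int,
                 max_wait_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.load = load
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.clock = clock
        self.buffer: list[Any] = []
        self.opened_at: float | None = None

    def add(self, record: Any) -> None:
        if not self.buffer:
            self.opened_at = self.clock()  # start timing a new batch
        self.buffer.append(record)
        # Flush on size or age. For simplicity, age is only checked when a record
        # arrives; a production ingester would also flush on a timer.
        if (len(self.buffer) >= self.max_batch
                or self.clock() - self.opened_at >= self.max_wait_s):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.load(self.buffer)
            self.buffer = []
            self.opened_at = None


# Usage with a fake clock so the behavior is deterministic.
batches: list[list[str]] = []
now = [0.0]
ingester = MicroBatchIngester(batches.append, max_batch=1000, max_wait_s=1.0,
                              clock=lambda: now[0])
ingester.add("r1")
now[0] = 1.5
ingester.add("r2")  # 1.5 s after the batch opened, so it is flushed
print(batches)      # [['r1', 'r2']]
```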