Analytics in the Hybrid Cloud – An Architect’s Perspective

For our customers, especially the larger ones, the hybrid cloud is no longer just a consideration but already a reality. To understand what we mean when we say hybrid cloud, we first need to define two terms:

  • Hybrid Cloud: Uses a mix of on-premises infrastructure and one or more clouds.
  • Multi-cloud: Uses two or more clouds.

Typically, companies are using both hybrid and multi-cloud architectures.

When we talk to our customers about hybrid cloud and/or multi-cloud, we often hear the following:

  • We want to use the best-of-breed cloud for a workload.
  • We want to be cloud agnostic.
  • One cloud provider does not serve all our locations.
  • Some workloads need to stay on-premises, others can go to the cloud.

It all really boils down to one question: How do I architect my analytics for the hybrid cloud?

First, the advantages and disadvantages of hybrid cloud or multi-cloud environments need to be considered:
 

Advantages:

  • Special-purpose clouds (e.g. Salesforce, SAP, …)
  • Better chance for a local data center
  • Best-in-class cloud for a specific workload
  • Better price for a specific workload

Disadvantages:

  • Data movement between clouds / on-premises is slow and expensive
  • Different stacks and proprietary technology on each cloud
  • Different costs and billing
  • Expertise for different clouds needed

A key aspect when planning a hybrid cloud architecture is data movement between clouds and on-premises environments. While moving data into the cloud is typically free, moving data out of the cloud can be expensive. In addition, communication into and out of the cloud adds considerable latency.
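The asymmetry between inbound and outbound transfer can be made concrete with a small calculation. The following is a minimal sketch, not a pricing tool: the function name and the per-GB rates are hypothetical placeholders, and actual rates vary by provider, region, and volume.

```python
def transfer_cost(gb_in: float, gb_out: float,
                  price_in_per_gb: float = 0.0,
                  price_out_per_gb: float = 0.10) -> float:
    """Return the estimated transfer charge for one billing period.

    The default rates are illustrative assumptions only: inbound is
    modeled as free, outbound as a flat hypothetical 0.10 per GB.
    """
    if gb_in < 0 or gb_out < 0:
        raise ValueError("data volumes must be non-negative")
    return gb_in * price_in_per_gb + gb_out * price_out_per_gb


# Loading 10 TB into the cloud and returning 50 GB of results,
# versus pulling the same 10 TB back out of the cloud.
results_only = transfer_cost(gb_in=10_000, gb_out=50)
full_export = transfer_cost(gb_in=0, gb_out=10_000)
print(f"results only: {results_only:.2f}, full export: {full_export:.2f}")
```

Under these assumed rates, returning only results is cheaper than exporting the raw data by the ratio of the two outbound volumes.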

An analytic ecosystem begins with sources and ends with consumers. The analytics ecosystem itself has three tiers:

  • Receive: Raw data lands here, and some standardization and cleansing are performed.
  • Analyze: The reporting, the model training and model scoring are done in this tier.
  • Serve: Here the data products are available in a format ready for consumption.

The figure below shows the analytical ecosystem with the tiers as described. The thickness of each arrow indicates the amount of data moving between the tiers:
[Figure: the analytic ecosystem, with the Receive, Analyze, and Serve tiers between sources and consumers]

As the figure above shows, the most cost-effective movement between the clouds or between on-premises and the cloud is between the Serve tier and the consumer. The bandwidth usage between the Analyze and the Serve tier is the second lowest.
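The reasoning about where to place the boundary between environments can be sketched as picking the hop that carries the least data. This is only an illustration: the volumes below are made-up numbers that mirror the ordering described in the text, and the function names are hypothetical.

```python
# Hypothetical daily volumes (GB) on each hop of the ecosystem,
# ordered from sources to consumers. The numbers are assumptions
# chosen only to reflect the relative sizes discussed above.
HOPS = [
    ("sources", "receive", 1000.0),
    ("receive", "analyze", 800.0),
    ("analyze", "serve", 100.0),
    ("serve", "consumers", 10.0),
]


def cheapest_boundary(hops):
    """Return the hop that carries the least data.

    Placing the cloud or on-premises boundary on this hop minimizes
    the volume that has to cross it.
    """
    return min(hops, key=lambda hop: hop[2])


def rank_boundaries(hops):
    """Return the hops sorted from least to most data moved."""
    return sorted(hops, key=lambda hop: hop[2])


best = cheapest_boundary(HOPS)
print(f"cut between {best[0]} and {best[1]} ({best[2]} GB/day)")
```

With these assumed volumes, the serve-to-consumers hop ranks first and the analyze-to-serve hop second.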

Below we show two examples of ecosystem architectures that illustrate the data movement between the Serve tier and the consumer.

In the first example, sources are operational systems on-premises and in the Salesforce cloud. The tiers that receive, analyze and serve are all in the AWS cloud. Consumption happens on-premises:
[Figure: first example architecture]
Ingestion into the cloud uses a lot of bandwidth, but the cloud providers do not charge for this. The outbound bandwidth usage is minimized as only results are transferred to the consumer on-premises.
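The first example can be modeled as a placement of components plus a list of data flows, which makes the outbound volume per location easy to total. This is a sketch under stated assumptions: the component placement follows the text, but the identifiers and all volumes are hypothetical.

```python
from collections import defaultdict

# Placement of each component in the first example (from the text).
PLACEMENT = {
    "operational_systems": "on-premises",
    "salesforce": "salesforce-cloud",
    "receive": "aws",
    "analyze": "aws",
    "serve": "aws",
    "consumers": "on-premises",
}

# (from, to, GB per day). Volumes are hypothetical placeholders.
FLOWS = [
    ("operational_systems", "receive", 900.0),
    ("salesforce", "receive", 100.0),
    ("receive", "analyze", 800.0),
    ("analyze", "serve", 100.0),
    ("serve", "consumers", 10.0),
]


def outbound_by_location(placement, flows):
    """Sum the data leaving each location for a different location."""
    outbound = defaultdict(float)
    for src, dst, gb in flows:
        src_loc, dst_loc = placement[src], placement[dst]
        if src_loc != dst_loc:  # flows inside one location are ignored
            outbound[src_loc] += gb
    return dict(outbound)


outbound = outbound_by_location(PLACEMENT, FLOWS)
print(outbound)
```

In this model, the only data leaving the AWS location is the serve-to-consumers flow; the tier-to-tier flows stay inside one location and do not count.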

In the second example, the main analytics is done on-premises, as are the sources (with the exception of Salesforce) and the consumers. In addition, a regulatory and compliance system in the cloud serves use cases that require access to historical data no longer available in the main analytics system.
[Figure: second example architecture]
As in the first architecture, the bandwidth usage out of the cloud, and therefore the cost, is minimized.

To conclude, here is a list of best practices:

Minimize data transfer between clouds
  • Exporting data is costly
  • Bandwidth is limited

Minimize time-critical dependencies between clouds
  • Latency slows down communication

Use applications as-a-service in the cloud
  • Salesforce
  • Office 365
  • SAP Financials

Use leading and established general-purpose clouds
  • AWS
  • Azure
  • Google Cloud

Apply the same governance rules to all clouds and on-premises
  • Avoid shadow IT in the cloud

Use a dedicated connection to the cloud and in between clouds
  • Lower latency
  • Guaranteed bandwidth
  • More reliable

Use cloud-specific base SaaS
  • Object Storage (Amazon S3, Azure Blob)
  • Messaging

Use portable specialized services
  • Data Warehouse
  • Machine and Deep Learning

Deploy custom code as Docker images with Kubernetes
  • DevOps
  • More cloud agnostic
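One of the practices above, minimizing time-critical dependencies between clouds, lends itself to a simple automated check. The following is a minimal sketch that assumes a dependency inventory already exists; the `Dependency` type, the field names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    caller: str
    callee: str
    caller_location: str
    callee_location: str
    time_critical: bool


def risky_dependencies(deps):
    """Return dependencies that are time-critical and cross locations."""
    return [
        d for d in deps
        if d.time_critical and d.caller_location != d.callee_location
    ]


# Hypothetical inventory for illustration only.
DEPS = [
    Dependency("scoring-api", "feature-store", "aws", "aws", True),
    Dependency("scoring-api", "customer-db", "aws", "on-premises", True),
    Dependency("nightly-report", "archive", "on-premises", "azure", False),
]

flagged = risky_dependencies(DEPS)
for d in flagged:
    print(f"{d.caller} -> {d.callee} crosses "
          f"{d.caller_location} / {d.callee_location}")
```

Only the dependency that is both time-critical and spans two locations is flagged; the cross-location batch job is left alone because latency matters less for it.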


The key recommendation is that the analytic ecosystem can consist of multiple clouds and on-premises environments, but the core analytics should be done in a single cloud or on-premises. It all boils down to data centricity in the cloud context and a common tool chain for analytics.

For more information about this topic, check out the white paper, “De-Risking Hybrid, Multi-Cloud Analytics” and the webinar, “Moving Toward a Modern Analytics Architecture”.
 


Author: Tomi Schumacher

Tomi Schumacher is an Ecosystem Architect/Principal Data Engineer with over 20 years of experience in IT. Tomi has a wide range of experience: he has worked as a consultant, architect, manager, and independent contractor in both the US and Switzerland for various industries. He is involved in the whole project management life cycle and has worked on backend services, various databases, web frontends, and mobile apps. Tomi has worked in private (VMware) and public clouds.
