記事

Analytics in the Hybrid Cloud – An Architect’s Perspective

The hybrid cloud is not just a consideration, but for many of our customers, already a reality. Read more to learn best practices when considering a hybrid or multi-cloud environment.

Tomi Schumacher

2020年1月20日 3 分で読める

The hybrid cloud is not just a consideration, but for our customers, especially our larger customers, already a reality. To understand what we mean when we say hybrid cloud, we first need to define two terms:

Hybrid Cloud: Uses mix of on-premises and one or more clouds.
Multi-cloud: Uses two or more clouds.

Typically, companies are using both hybrid and multi-cloud architectures.

When we talk to our customers about hybrid cloud and or multi-clouds, we often hear the following:

We want to use the best-of-breed cloud for a workload.
We want to be cloud agnostic.
One cloud provider does not serve all our locations.
Some workloads need to stay on-premises, others can go to the cloud.

It all really boils down to one question: How do I architect my analytics for the hybrid cloud?

First, the advantages and disadvantages of hybrid cloud or multi-cloud environments need to be considered:

Advantages	Disadvantages
Special purpose clouds (e.g. Salesforce, SAP, …)	Data movement between clouds / on-premises is slow and expensive
Better chance for a local data center	Different stacks and proprietary technology on each cloud
Best-in-class cloud for specific workload	Different costs and billing
Better price for a specific workload	Expertise for different clouds needed

A key aspect when planning a hybrid cloud architecture is data movement between the clouds / on-premises. While moving data into the cloud is free, moving data out of the cloud can be expensive. In addition, communication in and out of the cloud adds considerable latency.

An analytic ecosystem begins with sources and ends with consumers. The analytics ecosystem itself has three tiers:

Receive: Raw data lands here, and some standardization and cleansing are performed.
Analyze: The reporting, the model training and model scoring are done in this tier.
Serve: Here the data products are available in a format ready for consumption.

The figure below shows the analytical ecosystem with the tiers as described. It adds arrows, and the thickness of the arrows signifies the amount of data moving between the tiers:
Screen-Shot-2020-01-22-at-9-02-13-AM-(1).png

As the figure above shows, the most cost-effective movement between the clouds or between on-premises and the cloud is between the Serve tier and the consumer. The bandwidth usage between the Analyze and the Serve tier is the second lowest.

Below we show two examples of ecosystem architectures that illustrate the data movement between the Serve tier and the consumer.

In the first example, sources are operational systems on-premises and in the Salesforce cloud. The tiers that receive, analyze and serve are all in the AWS cloud. Consumption happens on-premises:

Ingestion into the cloud uses a lot of bandwidth, but the cloud providers do not charge for this. The outbound bandwidth usage is minimized as only results are transferred to the consumer on-premises.

In the second example, the main analytics is done on-premises, as are the sources (with the exception of Salesforce) and the consumers. In addition, there is a regulatory and compliance system in the cloud to service use cases where historic data needs to be accessed which is no longer available in the main analytics system.

Like the first architecture shown, in the second architecture, the bandwidth usage is out of the cloud and therefore the cost is minimized.

To conclude, here is a list of best practices:

Best Practice
Minimize data transfer between clouds	Exporting data is costly Bandwidth is limited
Minimize time-critical dependencies between clouds	Latency slows down communication
Use applications as-a-service in the cloud	Salesforce Office 365 SAP Financials
Use leading and established general purpose clouds	AWS Azure Google Cloud
The same Governance rules apply to all clouds and on-premises	Avoid shadow IT in the cloud
Use a dedicated connection to the cloud and in between clouds	Lower latency Guaranteed bandwidth More reliable
Use cloud-specific base SaaS	Object Storage (Amazon S3, Azure Blob) Messaging
Use portable specialized services	Data Warehouse Machine and Deep Learning
Deploy custom code as Docker images with Kubernetes	Dev ops More cloud agnostic

The key recommendation is that the analytic ecosystem can consist of multiple clouds and on-premises environments, but the core analytics should be done on a single cloud or on-premises, and it all boils down to the data centricity on the cloud context and a common tool chain for analytics.

For more information about this topic, check out the white paper, “De-Risking Hybrid, Multi-Cloud Analytics” and the webinar, “Moving Toward a Modern Analytics Architecture”.

Analytics in the Hybrid Cloud – An Architect’s Perspective

Tomi Schumacher について