The Data Lake is Dead; Long Live the Data Lake!

Martin Willcox examines the failure of data lakes.

Martin Willcox

June 13, 2019 | 3 min read

You probably already know that leading analyst firms have been quoting data lake failure rates of 85% for some time now.

You may not be aware that one of those same leading analyst firms is now also forecasting that, by 2020, 30% of data lakes will be built on standard relational DBMS (database management system) technology “at equal or lower cost than Hadoop” because – and I quote – “application performance is superior” and “most data going into data lakes is relational.”

Put those two things together and you start to understand why MapR has recently gone to the wall. And why Cloudera is under so much financial stress.

With many organisations having invested tens and even hundreds of millions of dollars in data lakes that deliver little or no business value, it’s way past time for some brutal self-assessment in the technology industry.

Many data lakes have failed because they were IT-led vanity projects, with no clear linkage to business objectives and operational processes. If the strategy for your failing data lake is to lift-and-shift it lock, stock and barrel from Hadoop to an object store, then you are about to flush more millions down the pan – to say nothing of the opportunity cost associated with several more wasted years. Unfortunately, I know from personal experience that this is absolutely the plan in several large organisations that really ought to know better.

Failed data lakes often represent a toxic combination of both poor technology choices and an inadequate approach to data management and integration. If you think that data management begins and ends with ACID (Atomicity, Consistency, Isolation, Durability) compliance – as at least one of the cool kid vendors that e-mails me regularly seems to – then pick any technology platform you like, so long as you do it quickly. If you are going to fail anyway, you may as well fail fast.

Better yet, develop a data strategy that includes a layered data architecture, a minimum viable product approach to data integration (we call that “Light Integration”), and an agile, incremental approach to the more robust integration of the data that matter most. That gives you a fighting chance of optimising end-to-end business processes and delivering real business value.

Much of the complex, multi-structured data that today sits unloved and unqueried in Hadoop-based data lakes will ultimately reside in object storage. At Teradata, we recognise this – hence our focus on enabling robust access to object stores. But much of your structured and semi-structured interaction data belongs in your existing data and analytics platform, where it can be seamlessly integrated with the transaction data you already manage there. Don’t just take my word for it; ask the analysts.

Not every data lake is a data swamp – and like all technologies, the Hadoop stack has a sweet spot. But the tide of history is now running against data silos masquerading as integrated data stores just because they are co-located on the same hardware cluster. And that same tide is running against a distributed file system and lowest-common-denominator SQL engine masquerading as a fully-fledged analytic DBMS.

If you are doubling down on your investment in Hadoop, you are swimming against that tide. And if you are betting on a fashionable-but-unproven technology to get you out of a data management hole, then you aren’t learning from recent history – you are condemning yourself to repeat it. But if you are ready to move on and look forward, talk to us about the industry’s leading integrated data and analytic platform, Teradata Vantage.


Martin has over 27 years of experience in the IT industry and has twice been listed in dataIQ’s “Data 100” as one of the most influential people in data-driven business. Before joining Teradata, Martin held data leadership roles at a major UK retailer and a large conglomerate. Since joining Teradata, Martin has worked globally with over 250 organisations to help them realise increased business value from their data. He has helped organisations develop data and analytic strategies aligned with business objectives; designed and delivered complex technology benchmarks; pioneered the deployment of “big data” technologies; and led the development of Teradata’s AI/ML strategy. Originally a physicist, Martin has a postgraduate certificate in computing and continues to study statistics.
