*TLA = Three-Letter Acronym.
TLAs. The world is full of them. But as processes shift, so too does the language with which we describe them. The process of preparing data for analytical usage is transforming. Gone are the days of writing business requirements, mapping new data feeds from written specifications and handing off to a select bunch of extract, transform, load (ETL) developers.
IT departments are finally cottoning on to the fact that it is far more efficient to place tools in the hands of the subject-matter experts who understand the data domains. The role of IT changes from doer to enabler: making the right tools available in a secure and robust data ecosystem.
This federated approach reinforces the concept of business data ownership. It brings to the fore the role of data stewardship – prioritising work that directly supports new business use cases. It also aligns better with the agile working methods that many companies are now adopting.
This isn’t to say that we encourage data federation or duplication; we still want to integrate once and reuse many times through the use of data layers across an integrated data fabric.
So, why does the ETL TLA need to change? Let's examine the rationale.
From Extract, to Ingest
Data no longer comes only from a relatively small number of on-premises product processing systems, from which it was typically obtained in batches on a regular schedule.
From transactions and interactions to IoT sensors, there are now many more sources of data. As we move to service-oriented and event-based architectures, data is much more likely to be continuous and streamed through infrastructure like an enterprise service bus (ESB). We need to be able to tap into these streams and rapidly ingest, standardise and integrate the source data.
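To make the idea of tapping into a stream concrete, here is a minimal Python sketch of ingesting events one at a time and standardising them on the way in. Everything in it is an assumption made for illustration: the field names (source, customer_id, amount, ts), the target schema and the in-memory list standing in for a real stream are all hypothetical, and a production pipeline would read from whatever messaging infrastructure is actually in place.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator


def standardise(event: dict) -> dict:
    """Map one raw event onto a common (hypothetical) schema."""
    return {
        # Normalise free-text source names such as " POS " to "pos".
        "source": event["source"].strip().lower(),
        # Keep identifiers as strings so every source agrees on the type.
        "customer_id": str(event["customer_id"]),
        # Amounts may arrive as strings or integers; store them as floats.
        "amount": round(float(event["amount"]), 2),
        # Convert epoch seconds to an ISO 8601 timestamp in UTC.
        "event_time": datetime.fromtimestamp(
            event["ts"], tz=timezone.utc
        ).isoformat(),
    }


def ingest(stream: Iterable[dict]) -> Iterator[dict]:
    """Standardise events lazily, one at a time, as they arrive."""
    for event in stream:
        yield standardise(event)


# A list stands in for a continuous stream in this sketch.
raw_events = [
    {"source": " POS ", "customer_id": 42, "amount": "19.990", "ts": 0},
    {"source": "Web", "customer_id": "43", "amount": 5, "ts": 60},
]
ingested = list(ingest(raw_events))
```

Because ingest is a generator, it never needs the whole feed in memory, which is the practical difference between this style and a scheduled batch extract.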
From Transform, to Wrangle
The explosion in the use of data across diverse information products means that data is repurposed many times, once for each downstream consuming application. The adoption of the extract, load, transform (ELT) paradigm, in which data must be integrated with other data before complex transformations can occur, reinforces this point.
Data may need to be derived, aggregated and summarised, to be optimised for consumption by use case or application. Creating complex new transformations requires a greater understanding of the data domain. New data-wrangling tools can also help business users accomplish these types of tasks.
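As a rough illustration of what deriving, aggregating and summarising can look like, the Python sketch below works on a handful of already integrated records. The records, the field names and the derived unit_price measure are hypothetical, chosen only to show the shape of the task; a dedicated data-wrangling tool would offer the same steps through its own interface.

```python
from collections import defaultdict

# Hypothetical integrated transactions, ready for wrangling.
transactions = [
    {"customer_id": "42", "amount": 20.0, "quantity": 2},
    {"customer_id": "42", "amount": 5.0, "quantity": 1},
    {"customer_id": "43", "amount": 9.0, "quantity": 3},
]


def derive(row: dict) -> dict:
    """Derive a new attribute (unit price) from existing ones."""
    return {**row, "unit_price": row["amount"] / row["quantity"]}


def summarise(rows: list) -> dict:
    """Aggregate per customer: total spend and transaction count."""
    totals = defaultdict(lambda: {"total_amount": 0.0, "transactions": 0})
    for row in rows:
        customer = totals[row["customer_id"]]
        customer["total_amount"] += row["amount"]
        customer["transactions"] += 1
    return dict(totals)


derived = [derive(row) for row in transactions]
summary = summarise(derived)
```

Choosing which measures to derive, and at what grain to summarise, is exactly the kind of decision that depends on knowing the data domain.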
From Load, to Project(ion)
We no longer simply load data into tables in a traditional enterprise data warehouse. Most organisations now run an analytical ecosystem in which open source technologies supplement traditional data warehouses. The term ‘data projection’ extends the ways in which data may be consumed: as a logical view spanning multiple platforms in the analytical ecosystem, or delivered via an API to a consuming application. Think of a projection as a set of flexible layers for accessing the data.
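One way to picture a projection is as a function that assembles a logical view on demand instead of loading a new physical table. The Python sketch below assumes two hypothetical platforms, represented here by an in-memory dictionary (the warehouse) and a list (the open source store); the names warehouse_customers, lake_events, customer_projection and api_response are invented for illustration only.

```python
import json

# Hypothetical platform 1: customer profiles in the data warehouse.
warehouse_customers = {
    "42": {"name": "Acme Ltd", "segment": "SME"},
}

# Hypothetical platform 2: interaction events in an open source store.
lake_events = [
    {"customer_id": "42", "channel": "web"},
    {"customer_id": "42", "channel": "pos"},
    {"customer_id": "43", "channel": "web"},
]


def customer_projection(customer_id: str) -> dict:
    """Build a logical view spanning both platforms at read time."""
    profile = warehouse_customers[customer_id]
    channels = sorted(
        {e["channel"] for e in lake_events if e["customer_id"] == customer_id}
    )
    return {"customer_id": customer_id, **profile, "channels": channels}


def api_response(customer_id: str) -> str:
    """Deliver the same projection as JSON, as an API layer might."""
    return json.dumps(customer_projection(customer_id), sort_keys=True)


view = customer_projection("42")
payload = api_response("42")
```

Nothing is copied or reloaded here: the view and the API payload are two access layers over the same underlying data.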