概要
This article covers how a data warehouse stores data, the architecture underlying them, how they enable various business units to leverage and analyze information, and the benefits of data warehousing, including data consistency, stable data storage, and ease of data accessibility.
It also examines the differences among data warehouses, data lakes, and lakehouses, how they can complement each other, and key issues organizations should consider when designing and implementing a data warehouse.
What is data warehousing?
A data warehouse is a centralized repository designed to support business intelligence (BI), reporting, and advanced analytics. It consolidates data from many sources across an organization, including transactional databases, operational systems, spreadsheets, and external datasets.
Once integrated into the warehouse, this information becomes easier for analysts, engineers, and data scientists to query and analyze. Teams can generate dashboards, reports, and visualizations that reveal patterns in sales, operations, customer behavior, and other business activities. Because a data warehouse integrates and standardizes information from multiple systems, it enables organizations to analyze both historical and current data in one place.
In practice, this consolidation helps organizations reduce data silos and maintain consistent definitions for metrics and business data. As a result, many enterprises rely on a data warehouse as a shared analytical foundation for data-driven decision-making.
Key benefits of data warehousing
The table below summarizes the core advantages organizations gain from implementing a data warehouse.
| Benefit | Description | Business Impact |
|---|---|---|
| Data consistency | Standardized data formats and definitions across systems | Ensures teams work from accurate and reliable information |
| Nonvolatile storage | Data stored in the warehouse remains stable after loading | Supports long-term historical analysis and auditing |
| Subject-oriented structure | Data organized around business domains such as finance, sales, or operations | Makes information easier for departments to access and analyze |
| Historical analysis | Data changes tracked over time | Enables trend analysis, forecasting, and long-term business insights |
| Improved performance | Analytical workloads are separated from operational systems | Prevents reporting queries from slowing transactional applications |
Together, these characteristics allow organizations to deliver large volumes of historical and operational data quickly to the people and tools that need it.
How is a data warehouse architected?
Data warehouse architecture organizes data processing and analytics into layers that manage how information is collected, stored, and accessed. These layers ensure that data can be integrated from many sources, transformed into consistent formats, and delivered efficiently to analytics tools.
Data typically enters the warehouse through an extract, transform, and load (ETL) pipeline that cleans and prepares information for analysis. After transformation, the data is stored in a system optimized for analytical queries and reporting. Users and applications then interact with the warehouse through analytical engines and visualization tools.
The layered structure below shows how these components work together.
| Architecture Layer | Function | Typical Technologies |
|---|---|---|
| Data ingestion and storage layer | Collects and stores data from multiple internal and external sources after ETL processing | Databases, data storage systems, SSD-based storage |
| Processing and analytics layer | Executes queries, performs aggregations, and manages security and optimization | OLAP engines, query optimizers, data processing engines |
| Access and presentation layer | Allows users and applications to analyze and visualize data | BI platforms, dashboards, SQL tools, data visualization software |
This layered architecture helps organizations scale analytics workloads while maintaining performance, security, and accessibility for business users.
Data warehouse vs. data lake vs. data lakehouse
Often, organizations will implement a combination of data warehouses and data lakes to complement each other and their distinct data needs. Let’s take a look at the differences between them and how they stack up to a data lakehouse.
The primary purpose of a data warehouse is to analyze data. It can intake vast amounts of data from various sources and store it for extended periods of time. Once stored, that data is formatted so it can be readily accessed and analyzed to deliver business insights and inform better decision-making. A data warehouse is a good option for organizations looking for a structured data solution focused on business intelligence, business reporting, and data analytics.
A data lake can also store and capture data in real time from a wide range of sources, including business applications, mobile apps, internet of things (IoT) devices, and more. But the data does not need to be formatted or defined first. This is referred to as “schema on read.” As a result, data lakes can cost-effectively hold vast amounts of raw data and easily scale as an organization’s data requirements change. In addition, users can access the data in various ways without having to know special languages like SQL. Data lakes are suitable for enterprises that need a flexible, affordable data solution to drive advanced analytics and machine learning (ML) models.
A data lakehouse is a relatively new architecture that combines the cost-effectiveness and scalability of a data lake with the robust information management of a data warehouse. A lakehouse can integrate various data sources automatically without having to make them fit formats or schemas. And it can store and process data in a single platform accessible to users across the enterprise. That allows enterprises to have a “single source of truth” for their data but still leverage advanced analytics and machine learning capabilities of a data lake to maximize business value—all in one solution.
| Data warehouse | Data lake | Lakehouse | |
|---|---|---|---|
| Data type | Relational and structured | Relational and non-relational; can be structured, semi-structured, and unstructured | Structured, semi-structured, and unstructured |
| Schema | On write | On read | On read and on write |
| Data quality | Consistent data that serves a single source of truth across the organization | Raw data that has not been organized or curated | A single source of truth for curated data and storage for raw data |
| Use cases | Core/batch reporting, data visualizations, business intelligence | Machine learning, advanced analytics, forecasting, real-time analysis | Real-time analytics, machine learning, immediate business insights, forecasting, reporting |
| Primary users | Business analysts | Data scientists, data, developers, engineers | Business analysts, data scientists, data, developers, engineers |
| Scalability | Difficult and expensive to scale | Can scale easily and cost effectively | Can automatically scale on demand |
| Cost | Storage and data management costs more than with a data lake | Storage and data management costs less than with a data warehouse | Storage is cost-effective and fast |
Designing and implementing a data warehouse
Designing and implementing a data warehouse requires coordination between IT teams, data professionals, and business stakeholders. The goal is to build an environment that supports analytics needs today while remaining flexible enough to accommodate new data sources, analytical models, and growing data volumes.
A well-designed warehouse balances several technical and organizational considerations. These include identifying the business questions the warehouse will support, determining where data originates, structuring that data for efficient analysis, and selecting the systems required to manage and access it. Performance, governance, and operational processes must also be planned to ensure the system remains reliable as usage grows.
The following table summarizes the primary factors organizations evaluate when designing and implementing a data warehouse.
| Design Consideration | What It Involves | Why It Matters |
|---|---|---|
| Business requirements | Identifying analytics goals, reporting needs, and key metrics | Ensures the warehouse supports real business decisions |
| Data sources | Determining where data originates, including operational systems and external feeds | Enables accurate and comprehensive data integration |
| Data modeling | Structuring data using schemas such as star or snowflake models | Improves query performance and analytical usability |
| System architecture | Selecting database platforms, ETL pipelines, and analytics tools | Determines scalability, reliability, and integration capabilities |
| Performance optimization | Applying indexing, partitioning, and caching strategies | Maintains fast query performance as data volumes grow |
| Process integration | Aligning the warehouse with business workflows and decision processes | Ensures analytics outputs are actionable |
| Refresh and recovery | Establishing schedules for data updates, backup, and restoration | Maintains data reliability and operational resilience |
By addressing these considerations early in the design process, organizations can build a data warehouse environment that supports reliable analytics, scalable growth, and long-term data strategy.
Data warehousing solutions
Your unique business requirements should be top of mind when considering a data warehouse solution for your organization. How do you intend to use it to draw meaningful insights and improve business decision-making?
It’s also important to consider:
- Connectivity and advanced capabilities. Look for a data warehouse architecture that offers connectivity to various data sources within your organization so you can intake and store data seamlessly. That may require a solution with data modeling and mapping, quality assurance, ETL capabilities, connectivity to BI tools, and more. This is critical to ensuring you can load and analyze data quickly and efficiently.
- Performance. The quicker a data warehouse can process and export information, the faster your users can make critical business decisions and drive value. Look for tools that improve performance like parallel processing, advanced processing power, and maximum uptime that allow you to process data more rapidly and efficiently.
- Scalability. An effective data warehouse solution should also be future-ready with the ability to scale as your organization’s data needs change. That includes the ability to handle larger amounts of data over extended periods.
- Cloud vs. on-premises. An on-premises data warehouse will reside on your network, which can offer maximum control over things like latency. But it can also require much more investment in both hardware and IT support personnel. Cloud solutions don’t require a physical environment for your data warehouse, so you don’t need to independently house or maintain it. That means it’s easy to deploy, more affordable, and can scale more easily.
Why choose Teradata for your data warehouse needs?
With Teradata VantageCloud, our complete cloud analytics and data platform, you can deploy a data warehouse at an enterprise level that offers flexibility, performance, and analytics to a wide range of users.
Our next-generation cloud-native architecture can seamlessly bring disparate sources of data together to create a single source of truth for your organization. It easily manages mixed workloads. And it ensures data integrity with real-time updates.
That means greater flexibility, scalability, and security for your organization so you can access business intelligence more quickly and efficiently leverage more informed decision-making.
Data warehousing FAQ
What is an example of a data warehouse?
What is an example of a data warehouse?
An enterprise data warehouse (EDW) is one example of this data storage technology. EDWs help businesses improve customer insights, optimize supply chain management, and increase productivity. Walmart was one of the first large organizations to employ an EDW, which it uses to guide suppliers, inform point-of-sales analytics, and more deeply understand customer buying habits. Today’s EDW solutions commonly involve a combination of environments designed to address a range of enterprise needs. For example, business users may require access to curated, structured data that can easily be applied to business intelligence applications. The EDW may also involve data lakes that store unstructured data with the potential to be used in business applications but that are not assigned to any particular function.
What are the key components of a data warehouse?
What are the key components of a data warehouse?
A data warehouse generally includes:
- A database that stores and manages data
- An ETL process responsible for extracting data from source systems and transforming it into a format for it to be analyzed
- Query and analysis tools so end users can create reports, build dashboards, and conduct data analysis
- A data access layer that manages data access, security, and authentication
- Data governance, security, backup, and recovery tools
What schemas are often used in data warehouses?
What schemas are often used in data warehouses?
The two most common schemas used in data warehouses are star schemas and snowflake schemas. A star schema simplifies queries and provides robust query performance because it minimizes the effort needed to retrieve data. So, they’re commonly used in scenarios where query performance is a top priority. A snowflake schema often requires more complex query processing due to an increased number of joins needed to retrieve data, but they use less space. Therefore, they can be suitable in scenarios where storage space is a concern.