
AI Data Pipeline: A Guide to Stages and Architecture

An AI data pipeline automates ingesting, preparing, and delivering data to train and operate AI models.

An AI data pipeline is an automated system that ingests, prepares, and delivers data to train, ground, and operate artificial intelligence and machine learning models. It extends the traditional data pipeline with the additional stages, governance, and monitoring required to support models in production.

Most AI data pipelines move data through five stages: ingestion, preparation, training or RAG indexing, deployment, and monitoring. Ingestion gathers raw data from source systems. Preparation cleans and transforms it. Training or indexing uses it to build a model or populate a retrieval index. Deployment puts the model into production. Monitoring tracks both pipeline and model health and feeds signals back into the next cycle.
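As a rough illustration of how the five stages hand off to one another, the sketch below chains them as plain Python functions. Everything in it is hypothetical and chosen for brevity: the `PipelineRun` container, the function names, the "model" (just the mean of the prepared values), and the `tolerance` threshold are assumptions, not part of any specific product or framework.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class PipelineRun:
    """Hypothetical container carrying data and signals between stages."""
    raw: list = field(default_factory=list)
    prepared: list = field(default_factory=list)
    model: dict = field(default_factory=dict)
    deployed: bool = False
    signals: dict = field(default_factory=dict)


def ingest(run, source_rows):
    # Ingestion: gather raw records from a source system (here, a plain list).
    run.raw = list(source_rows)
    return run


def prepare(run):
    # Preparation: drop records with missing values and cast the rest to float.
    run.prepared = [float(r["value"]) for r in run.raw if r.get("value") is not None]
    return run


def train(run):
    # Training: the toy "model" is only the mean of the prepared values.
    run.model = {"baseline": mean(run.prepared)}
    return run


def deploy(run):
    # Deployment: mark the model as serving production traffic.
    run.deployed = True
    return run


def monitor(run, live_values, tolerance=1.0):
    # Monitoring: compare live data with the training baseline and emit
    # a retrain signal that the next cycle can act on.
    drift = abs(mean(live_values) - run.model["baseline"])
    run.signals = {"drift": drift, "retrain": drift > tolerance}
    return run


rows = [{"value": 1}, {"value": None}, {"value": 3}]
run = monitor(deploy(train(prepare(ingest(PipelineRun(), rows)))), live_values=[5.0, 5.0])
print(run.signals)
```

In this toy run the live data has moved away from the training baseline, so the monitoring stage sets `retrain` to true. That flag stands in for the feedback signal that starts the next cycle.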

A traditional data pipeline is optimized to deliver structured data to dashboards, reports, and analytics tools. An AI data pipeline is optimized to deliver data, often unstructured or multimodal, to machine learning and generative AI models. It also enforces tighter lineage tracking, supports continuous retraining, and monitors both data and model health. Most enterprises run both, with the AI pipeline extending the governance and storage provided by the traditional pipeline rather than replacing it.

AI data pipelines draw on several categories of tooling. Ingestion and orchestration tools move data from source to destination. Data preparation and feature engineering tools clean and shape data for models. Feature stores and vector stores manage the inputs used for training, inference, and retrieval-augmented generation. Observability tools track pipeline health, data drift, and model drift. Most production pipelines combine several of these categories rather than relying on a single end-to-end platform.
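To make the vector store category more concrete, here is a minimal in-memory sketch of what such a store does during retrieval: it keeps embedding vectors alongside document IDs and returns the IDs closest to a query vector by cosine similarity. The `TinyVectorStore` class and the two-dimensional vectors are illustrative assumptions. Production vector stores add indexing, persistence, and access control that this sketch omits.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store: (doc_id, vector) pairs with cosine search."""

    def __init__(self):
        self._items = []

    def add(self, doc_id, vector):
        # Store a copy so later changes to the caller's list do not leak in.
        self._items.append((doc_id, list(vector)))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        # Treat a zero-length vector as having no similarity to anything.
        return dot / norm if norm else 0.0

    def search(self, query, k=1):
        # Rank every stored vector by similarity to the query, highest first.
        ranked = sorted(
            self._items,
            key=lambda item: self._cosine(query, item[1]),
            reverse=True,
        )
        return [doc_id for doc_id, _ in ranked[:k]]


store = TinyVectorStore()
store.add("returns-policy", [1.0, 0.0])
store.add("shipping-times", [0.0, 1.0])
top_hit = store.search([0.9, 0.1], k=1)
print(top_hit)
```

In a retrieval-augmented generation flow, the returned IDs would be used to fetch the matching passages and pass them to the model as grounding context.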

A typical AI data pipeline architecture diagram shows a horizontal flow of five stages (ingestion, preparation, training or indexing, deployment, and monitoring) with a feedback arrow from monitoring back into training. Source systems feed into ingestion on the left; applications and users consume the outputs on the right; governance, lineage, and access controls run as a horizontal layer beneath all five stages.

A pipeline in machine learning is the sequence of automated steps that transforms raw training data into a deployed model. Typical steps include feature engineering, training, validation, deployment, and monitoring. In a broader AI data pipeline, the machine learning pipeline is one stage, the training or indexing step, within a longer chain that begins at ingestion and ends at production monitoring.
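The sequence of steps described above can be sketched with a toy example that covers feature engineering, training, validation, and the deployment decision (monitoring is left out for brevity). All names here are hypothetical, including the `amount` and `label` fields and the `min_accuracy` gate, and the "model" is a single decision threshold rather than a real learning algorithm.

```python
def engineer_features(rows):
    # Feature engineering: turn each raw record into a (feature, label) pair.
    return [(r["amount"] / 100.0, r["label"]) for r in rows]


def train_threshold(examples):
    # Training: use the midpoint between the two class means as the threshold.
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def validate(threshold, examples):
    # Validation: fraction of held-out examples classified correctly.
    correct = sum((x > threshold) == (y == 1) for x, y in examples)
    return correct / len(examples)


def run_ml_pipeline(train_rows, holdout_rows, min_accuracy=0.8):
    threshold = train_threshold(engineer_features(train_rows))
    accuracy = validate(threshold, engineer_features(holdout_rows))
    # Deployment gate: only promote the model if validation clears the bar.
    return {
        "threshold": threshold,
        "accuracy": accuracy,
        "deployed": accuracy >= min_accuracy,
    }


train_rows = [
    {"amount": 100, "label": 0},
    {"amount": 200, "label": 0},
    {"amount": 800, "label": 1},
    {"amount": 900, "label": 1},
]
holdout_rows = [{"amount": 300, "label": 0}, {"amount": 700, "label": 1}]
result = run_ml_pipeline(train_rows, holdout_rows)
print(result)
```

Gating deployment on a validation score is one common design choice. It keeps a model that performs poorly on held-out data from reaching production automatically.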
