From Raw Data to Analysis
The data journey from source application to business report has three stages: ingestion (getting raw data in), transformation (applying business logic), and analysis (presenting insights). Each stage involves different tools, teams, and quality checks. This guide walks through all three.
What Are the Three Stages of the Data Journey?
- Data Ingestion — Get raw data from the source application
- Data Transformation — Apply business logic to convert data into a form that supports analysis
- Data Analysis / Reporting — Present the transformed data in a consumable format for end-users
How Does Data Ingestion Work?
Data ingestion reads data from source applications and moves it to a central storage location. For example, a pipeline might ingest accounting data from NetSuite and store it in AWS S3. During ingestion, a well-built pipeline ensures:
- Raw data is available in a consistent and reliable manner
- A copy of deduplicated data is maintained alongside the raw data
- Input data passes basic quality checks: no null records, fresh data, valid primary keys
- The data schema is standardized and documented
This stage does not involve business logic. Popular data connectors include DataStori, Fivetran, and Airbyte.
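The ingestion-stage checks above can be sketched in plain Python. This is a minimal illustration, not the API of any of the connectors named here; the function names (`dedupe`, `basic_quality_checks`) and the sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def dedupe(records, key):
    """Keep the last record seen for each primary key."""
    return list({r[key]: r for r in records}.values())

def basic_quality_checks(records, key, ts_field, max_age_hours=24):
    """Return a list of failed check names (empty means all checks passed)."""
    failures = []
    if not records:
        return ["no_records"]
    if any(r.get(key) is None for r in records):
        failures.append("null_primary_key")
    keys = [r[key] for r in records]
    if len(keys) != len(set(keys)):
        failures.append("duplicate_primary_key")
    newest = max(r[ts_field] for r in records)
    if datetime.now(timezone.utc) - newest > timedelta(hours=max_age_hours):
        failures.append("stale_data")
    return failures

# Hypothetical raw extract: one duplicated primary key.
raw = [
    {"id": 1, "amount": 100.0, "updated_at": datetime.now(timezone.utc)},
    {"id": 1, "amount": 100.0, "updated_at": datetime.now(timezone.utc)},
    {"id": 2, "amount": 250.0, "updated_at": datetime.now(timezone.utc)},
]
clean = dedupe(raw, "id")                                  # deduplicated copy
failures = basic_quality_checks(clean, "id", "updated_at") # [] if healthy
```

Note that none of this touches business logic: the checks only assert that the raw data arrived intact, on time, and without duplicates.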
What Happens During Data Transformation?
Data transformation converts raw ingested data into business-ready datasets. This stage focuses on:
- Transforming input data into the structure required by the business
- Defining relationships between entities in the data warehouse
- Writing data quality checks on business KPIs (e.g., sales this month > $10,000)
- Documenting the transformed data and writing it to warehouse tables
It is essential that the definitions of business KPIs remain consistent across pipelines. For example, if you build one pipeline for sales by month and another for sales per employee per month, the definition of "sales" must be the same in both.
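One common way to enforce this consistency is to define the KPI once and have every pipeline reuse that single definition. A minimal sketch, assuming a hypothetical order schema where "sales" means completed orders net of refunds:

```python
def sales_amount(order):
    """The single shared definition of "sales": completed orders net of refunds."""
    if order["status"] != "completed":
        return 0.0
    return order["amount"] - order.get("refund", 0.0)

def sales_by_month(orders):
    """Pipeline 1: total sales per month, using the shared definition."""
    totals = {}
    for o in orders:
        totals[o["month"]] = totals.get(o["month"], 0.0) + sales_amount(o)
    return totals

def sales_by_employee_month(orders):
    """Pipeline 2: sales per employee per month, same shared definition."""
    totals = {}
    for o in orders:
        k = (o["employee"], o["month"])
        totals[k] = totals.get(k, 0.0) + sales_amount(o)
    return totals

orders = [
    {"month": "2024-01", "employee": "ana", "status": "completed", "amount": 500.0, "refund": 50.0},
    {"month": "2024-01", "employee": "ben", "status": "completed", "amount": 300.0},
    {"month": "2024-01", "employee": "ben", "status": "cancelled", "amount": 999.0},
]
by_month = sales_by_month(orders)
by_employee = sales_by_employee_month(orders)
```

Because both pipelines call `sales_amount`, the per-employee totals always roll up exactly to the monthly total; in a dbt project the same idea is typically achieved with a shared model or macro.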
Popular data transformation tools include dbt and Apache Spark.
How Is Data Presented for Analysis?
The transformed data is used for business reporting and analysis. This stage involves presenting data to different end-users in forms they can consume. Financial reports for management often differ from those required by the operations team, even though the underlying data is the same. Popular analytics tools include Power BI, Tableau, R, and Python.
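The idea of serving different views from the same underlying data can be sketched with the standard library alone; the row schema and view names below are hypothetical:

```python
from collections import Counter, defaultdict

# One shared, transformed dataset.
sales = [
    {"region": "EMEA", "team": "ops", "amount": 120.0},
    {"region": "EMEA", "team": "sales", "amount": 80.0},
    {"region": "APAC", "team": "ops", "amount": 200.0},
]

# Management view: revenue per region.
mgmt_view = defaultdict(float)
for row in sales:
    mgmt_view[row["region"]] += row["amount"]

# Operations view: transaction counts per team, from the same rows.
ops_view = Counter(row["team"] for row in sales)
```

In practice a BI tool such as Power BI or Tableau does this aggregation, but the principle is the same: one warehouse dataset, many audience-specific presentations.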
What Supporting Tools Are Needed?
| Category | Purpose | Popular Tools |
|---|---|---|
| Workflow managers | Scheduling, automation, dependency management, monitoring | Airflow, Prefect |
| Data quality | Automated checks on fetched and transformed data | Great Expectations, AccelData |
| Data discovery | Tracking and documenting data assets as they grow | Amundsen, Atlan |
| Data governance | Access controls on data assets at scale | Varies by cloud provider |
Based on the maturity and needs of your organization, you can decide which supporting tools to add to your data stack.
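The dependency management that workflow managers provide boils down to running tasks in topological order of their dependency graph. A minimal sketch using Python's standard-library `graphlib` (the task names are hypothetical; Airflow and Prefect express the same graph with their own DAG APIs):

```python
from graphlib import TopologicalSorter

# Pipeline tasks mapped to the tasks they depend on.
dag = {
    "ingest_netsuite": set(),
    "dedupe_raw": {"ingest_netsuite"},
    "quality_checks": {"dedupe_raw"},
    "transform_sales": {"quality_checks"},
    "refresh_dashboard": {"transform_sales"},
}

# A valid execution order that respects every dependency.
run_order = list(TopologicalSorter(dag).static_order())
```

A real workflow manager adds scheduling, retries, and monitoring on top of this ordering, which is why it becomes worthwhile as pipeline complexity grows.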
Frequently Asked Questions
Who is responsible for each stage of the data journey?
Data ingestion is typically handled by data engineers. Transformation is a joint effort between data engineers (who build the platform) and business analysts (who define the rules). Reporting is usually managed by BI developers and consultants.
Do I need all the supporting tools from the start?
No. Start with ingestion and transformation tools, then add workflow managers and data quality checks as your pipeline complexity grows. Data discovery and governance tools are most valuable when your team and data volume scale up.
Where does DataStori fit in the data journey?
DataStori handles the first stage — data ingestion. It automates the process of fetching data from source applications, deduplicating it, running quality checks, and delivering it to your data warehouse or lakehouse.