fix heading

Azwan b. Amit 2025-11-19 11:22:26 +08:00
parent 338a06e583
commit 386b9313ab

@ -11,7 +11,7 @@ This manual guides data engineers & data analysts (DA/DE) through using Airflow,
- [Superset](#superset)
- [Trino](#trino)
- [Object Storage](#object-storage)
- [Workflow](#workflow)
- [Data Flow Diagram](#data-flow-diagram)
- [Data Pipeline](#data-pipeline)
- [1. Data Ingestion](#1-data-ingestion)
- [2. Raw Data Storage](#2-raw-data-storage)
@ -43,7 +43,7 @@ A distributed **SQL query engine** designed for fast analytics across large data
An **S3-compatible storage provider** (e.g., MinIO) used to store and retrieve unstructured data such as files, logs, and backups. It provides a scalable, API-driven interface for applications to manage data objects, similar to AWS S3.
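
For illustration, here is a minimal sketch of talking to an S3-compatible endpoint from Python with `boto3`. The endpoint URL, credentials, and bucket name are placeholder assumptions rather than values from this platform's configuration; the object key simply mirrors the `airflow/excel` path referenced later in this manual.

```python
# Minimal sketch: using an S3-compatible object store (e.g., MinIO) via boto3.
# Endpoint, credentials, and bucket name below are placeholders -- substitute your own.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.local:9000",  # assumed MinIO endpoint
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Upload a local file as an object, then read it back through the same API.
s3.upload_file(
    "computer-parts-sales.xlsx",
    "my-bucket",  # placeholder bucket name
    "airflow/excel/computer-parts-sales.xlsx",
)
obj = s3.get_object(Bucket="my-bucket", Key="airflow/excel/computer-parts-sales.xlsx")
print(len(obj["Body"].read()), "bytes retrieved")
```
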
## Workflow
## Data Flow Diagram
```mermaid
flowchart TB
@ -179,6 +179,7 @@ This script demonstrates data ingestion & ETL phases.
- `load_excel_to_csv()` - Ingests data from the source Excel file and stores the raw data as CSV in object storage.
- `load_csv_to_trino()` - Extracts the raw data from the CSV file, transforms it, and loads it back to object storage using Trino. Processed data is stored in Iceberg table format, in the `default` schema and `computer_parts_sales` table.
- This assumes the sample data source `computer-parts-sales.xlsx` has already been uploaded to the `airflow/excel` folder in the target bucket in object storage (configured in the Data Platform service settings).
```python
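# Example DAG for the tasks described above: load_excel_to_csv() ingests the source
# Excel file into object storage as raw CSV, and load_csv_to_trino() transforms that
# CSV and loads it into the Iceberg table via Trino.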
from airflow.sdk import DAG, task