fix heading
parent 338a06e583
commit 386b9313ab
@@ -11,7 +11,7 @@ This manual guides data engineers & data analysts (DA/DE) through using Airflow,
- [Superset](#superset)
- [Trino](#trino)
- [Object Storage](#object-storage)
- [Workflow](#workflow)
- [Data Flow Diagram](#data-flow-diagram)
- [Data Pipeline](#data-pipeline)
- [1. Data Ingestion](#1-data-ingestion)
- [2. Raw Data Storage](#2-raw-data-storage)
@@ -43,7 +43,7 @@ A distributed **SQL query engine** designed for fast analytics across large data

An **S3-compatible storage provider** (e.g., MinIO) used to store and retrieve unstructured data such as files, logs, and backups. It provides a scalable, API-driven interface for applications to manage data objects, similar to AWS S3.

## Workflow
## Data Flow Diagram

```mermaid
flowchart TB
@@ -179,6 +179,7 @@ This script demonstrates data ingestion & ETL phases.

- `load_excel_to_csv()` - Ingests data from the source Excel file and stores the raw data as CSV in object storage.
- `load_csv_to_trino()` - Extracts the raw data from the CSV file, transforms it, then loads it back into object storage using Trino. Processed data is stored in Iceberg format via Trino, in the `default` schema and the `computer_parts_sales` table.
- Assumes the sample data source `computer-parts-sales.xlsx` has been uploaded to the `airflow/excel` folder of the target bucket in object storage (configured in the Data Platform service settings).

```python
from airflow.sdk import DAG, task