Data Import (ETL) Strategy

1. Executive Summary: The ETL Workflow

This pipeline transforms fragmented budget data into a unified structure for Zoho Creator.

High-Level Process

  1. Sources (csv/ root): Raw CSV exports from various systems. Immutable.
  2. Processing (etl/): Python scripts extract, normalize, and reconcile data.
    • See etl_integrity skill for script details.
  3. Validation (csv/verification/): Automated integrity checks against Business Rules.
    • See etl_integrity skill for logic.
  4. Output (csv/zoho_import/unified/): 6 Canonical CSVs ready for import.
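The extract-and-normalize step (stage 2) can be sketched as below. The paths mirror the directory layout above; `normalize_row` is purely illustrative, not the real transformation logic, which lives in the ETL scripts (see the etl_integrity skill):

```python
import csv
from pathlib import Path

RAW_DIR = Path("csv")                      # stage 1: immutable source exports
OUT_DIR = Path("csv/zoho_import/unified")  # stage 4: canonical CSVs

def normalize_row(row: dict) -> dict:
    """Hypothetical stage-2 normalization: trim whitespace from headers
    and values, and map empty strings to None. The actual rules are
    defined in the ETL scripts, not here."""
    return {k.strip(): ((v or "").strip() or None) for k, v in row.items()}

def load_source(path: Path) -> list[dict]:
    """Read one raw export without mutating it (sources are immutable)."""
    with path.open(newline="", encoding="utf-8") as fh:
        return [normalize_row(row) for row in csv.DictReader(fh)]
```

Keeping the raw exports read-only and doing all cleanup in code is what makes the pipeline safely re-runnable.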

2. Reference Skills

  • ETL Integrity Skill: The "Manual" & "The Law". Contains source-specific transformations, reconciliation logic, and business rules.

3. Quick Actions

  • Run Full Pipeline:
    ./run_full_etl.sh
  • Check Health:
    open csv/verification/reports/etl_validation_report.md
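For CI or scripted use, the health check can be automated along these lines. This is a sketch under one assumption: that the validation report flags failed checks with the token `FAIL` (adjust the marker to whatever the report actually emits):

```python
from pathlib import Path

REPORT = Path("csv/verification/reports/etl_validation_report.md")

def report_is_healthy(text: str) -> bool:
    """Return False if any check is flagged as failed.
    Assumes failures are marked with the token 'FAIL' (an assumption
    about the report format, not a documented contract)."""
    return "FAIL" not in text.upper()

if __name__ == "__main__":
    raise SystemExit(0 if report_is_healthy(REPORT.read_text()) else 1)
```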

4. Integration Logic (Import Order)

The six canonical files must be imported into Zoho Creator in this exact order. Each file references records created by the files before it, so importing out of order breaks referential integrity:

  1. Providers (01_providers.csv)
  2. Streams (02_streams.csv)
  3. Budget Buckets (03_budget_buckets.csv)
  4. Budget Versions (04_budget_versions.csv)
  5. Contracts (05_contracts.csv)
  6. Invoices (06_invoices.csv)
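The ordering matters because each file's foreign keys must already exist as parent records. A pre-import sanity check along these lines can catch orphaned references before Zoho rejects them; the column names below (`provider_id`) are assumptions about the schema, shown for one parent/child pair only:

```python
import csv
from pathlib import Path

UNIFIED = Path("csv/zoho_import/unified")

def find_orphans(child_rows, fk, parent_rows, pk):
    """Return foreign-key values in child_rows with no matching parent key."""
    parent_keys = {r[pk] for r in parent_rows if r.get(pk)}
    return sorted({r[fk] for r in child_rows
                   if r.get(fk) and r[fk] not in parent_keys})

def load(name):
    with (UNIFIED / name).open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))

def check_streams_against_providers():
    """One link in the chain: every stream's provider must exist in the
    providers file imported before it. Column names are hypothetical."""
    return find_orphans(load("02_streams.csv"), "provider_id",
                        load("01_providers.csv"), "provider_id")
```

The same `find_orphans` call would be repeated for each child/parent pair down the chain (buckets against streams, contracts against versions, and so on, per the real schema).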

> [!IMPORTANT]
> **Zero Tolerance Policy:** Never edit output CSVs manually. If validation fails, fix the code or the source data, then re-run the pipeline.
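One way to enforce this policy mechanically (a sketch; the manifest filename and workflow are assumptions, not part of the current pipeline) is to record a checksum per output file when the pipeline finishes, then refuse to import if any file has changed since:

```python
import hashlib
import json
from pathlib import Path

OUT_DIR = Path("csv/zoho_import/unified")
MANIFEST = OUT_DIR / ".manifest.json"  # hypothetical location

def sha256_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def write_manifest(out_dir=OUT_DIR, manifest=MANIFEST):
    """Run at the end of the pipeline: record one checksum per output CSV."""
    sums = {p.name: sha256_bytes(p.read_bytes())
            for p in sorted(out_dir.glob("*.csv"))}
    manifest.write_text(json.dumps(sums, indent=2))

def edited_since_pipeline(out_dir=OUT_DIR, manifest=MANIFEST):
    """Run before import: list files whose checksum no longer matches,
    i.e. files edited by hand after the last pipeline run."""
    recorded = json.loads(manifest.read_text())
    return [name for name, digest in recorded.items()
            if sha256_bytes((out_dir / name).read_bytes()) != digest]
```

A non-empty result from `edited_since_pipeline()` means the canonical CSVs were touched outside the pipeline and the run should be repeated before importing.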