DataSync

Config-Driven ETL Platform

I built a template‑driven ETL platform that replaced the idea of creating a custom pipeline for every new use case. The result is a single, reusable system that supports multiple sources and targets with minimal code changes.

You can’t keep cloning pipelines forever.

Context and Challenge

The initial proposal was to build a separate pipeline for each use case and replicate it for future ones. That approach would have been easy to start but expensive to maintain: every new source or target meant new code, more risk, and more testing. I proposed a different direction: apply DRY principles and object‑oriented design to build an abstract pipeline that can be driven entirely by templates.

The Approach

I designed a multi‑purpose pipeline built around configuration. Each workflow is represented by a template that defines:
  • source type
  • target type
  • table names
  • paths and filters
  • schedule

The pipeline itself stays constant and follows the same loop every time: read → transform → write. What it reads, how it transforms, and where it writes are all determined by the template.
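
The template-plus-loop idea can be sketched as follows. The template keys and stub functions here are illustrative, not the platform's actual schema:

```python
# A minimal sketch of a template-driven ETL loop. The template keys
# (source, target, schedule) and the stub read/transform/write steps
# are hypothetical -- the real platform's schema may differ.

template = {
    "source": {"type": "postgres", "tables": ["orders"]},
    "target": {"type": "s3", "path": "s3://bucket/orders/"},
    "schedule": "0 * * * *",
}

def read(source_cfg):
    # Stub: a real source integration would query the configured tables.
    return [{"table": t} for t in source_cfg["tables"]]

def transform(rows):
    # Stub: real transforms would be selected by the template.
    return [{**r, "processed": True} for r in rows]

def write(target_cfg, rows):
    # Stub: a real target integration would write to the configured path.
    return f'wrote {len(rows)} rows to {target_cfg["path"]}'

def run(template):
    """The loop never changes: read -> transform -> write."""
    rows = transform(read(template["source"]))
    return write(template["target"], rows)
```

Onboarding a new flow then means writing a new template dict, not a new pipeline.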
To scale safely, each template is triggered as its own Airflow job. Each business’s workflow runs independently on separate compute, so one flow doesn’t block another.
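
In production this isolation comes from Airflow spawning one DAG per template; as a framework-free stand-in, this sketch uses `concurrent.futures` to run each template's pipeline on its own worker. The template fields and `run_pipeline` body are placeholders:

```python
# Stand-in for Airflow's one-DAG-per-template pattern: each template
# gets its own worker, so one business's flow can't block another.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical templates -- field names are illustrative.
templates = [
    {"business": "acme", "source": "postgres", "target": "s3"},
    {"business": "globex", "source": "mysql", "target": "bigquery"},
]

def run_pipeline(template):
    # Placeholder for the shared read -> transform -> write loop.
    return f'{template["business"]}: {template["source"]} -> {template["target"]}'

# One worker per template, all running in parallel.
with ThreadPoolExecutor(max_workers=len(templates)) as pool:
    results = list(pool.map(run_pipeline, templates))
```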

One pipeline for many businesses, driven by templates.

What I Built

  • Template‑driven orchestration: One pipeline that adapts its behavior based on configuration.
  • Abstract design: The pipeline is built around OOP and abstract classes, so integrations are interchangeable and easy to extend.
  • Reusable integrations: Sources and targets are encapsulated behind integration layers.
  • Parallel execution: Airflow dynamically creates a DAG per template, enabling parallel processing across businesses.
  • Schema safety: Automated migrations run once per template to keep targets aligned.
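
The abstract design above can be sketched with Python's `abc` module. The class and method names here are illustrative, not the platform's real API:

```python
# Sketch of the abstract-integration idea: sources and targets share
# small base classes, so new integrations plug in without touching the
# pipeline. All class names here are hypothetical.
from abc import ABC, abstractmethod

class Source(ABC):
    @abstractmethod
    def read(self) -> list: ...

class Target(ABC):
    @abstractmethod
    def write(self, rows: list) -> int: ...

class ListSource(Source):
    """Toy source backed by an in-memory list."""
    def __init__(self, rows):
        self.rows = rows
    def read(self):
        return self.rows

class MemoryTarget(Target):
    """Toy target that accumulates rows in memory."""
    def __init__(self):
        self.stored = []
    def write(self, rows):
        self.stored.extend(rows)
        return len(rows)

def pipeline(source: Source, target: Target) -> int:
    # Works with any Source/Target pair -- integrations are interchangeable.
    return target.write(source.read())
```

Adding a new source or target means subclassing one base class; the `pipeline` function itself never changes.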

A plug‑and‑play ETL that runs in parallel.

Cut runtime from hours to about 30 minutes.

Results

  • Massive performance gain: Runtime dropped from 4–5 hours to about 30 minutes.
  • Scalable onboarding: New data flows can be added by configuration rather than code.
  • Lower maintenance: A single pipeline replaces many one‑off scripts.
  • Cleaner architecture: DRY and OOP principles created a cleaner, easier‑to‑reason‑about system.
