[Backend User Story] Revise ETL Usage #786

Connected to issue #782 

## User Story

As a **backend developer**, I want to remove ETL execution from the CI pipeline so that pull requests run faster, CI resources are used efficiently, and test workflows scale better as SafeHome grows.

---

## Goalset

**Current State**

ETL scripts are currently executed on **every push and pull request** as part of the CI pipeline, even though the source datasets the ETL pulls from have not been updated in a long time.

This setup is inefficient:

- **Slow pipelines:** ETL adds several minutes to every PR and push.
- **Wasted resources:** CI compute time and network usage are consumed even when changes only require linting, builds, or tests.
- **Low dataset churn:** the source datasets rarely change:
  - Tsunamis dataset: last updated in 2022
  - Liquefactions dataset: last updated in 2024
  - Soft Stories dataset: last updated in July 2025; it no longer appears to be updated regularly

There is no need to create a fresh Postgres instance and load ~50 MB of data into it on every CI run: the database is not persisted and exists only so the tests can run against it.
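As a rough illustration of the fixture approach, the sketch below seeds a throwaway database with a few hand-written rows instead of running the ETL. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for the CI Postgres instance. The `soft_story_buildings` table, its columns, and the status values are hypothetical placeholders, not SafeHome's real schema.

```python
import sqlite3

# Hypothetical schema and rows: a handful of hand-written records stand in for
# the ~50 MB the ETL would otherwise load. Names and values are illustrative only.
SEED_ROWS = [
    ("100 Example St", "non-compliant"),
    ("200 Sample Ave", "compliant"),
    ("300 Test Blvd", "work complete"),
]


def seed_test_db() -> sqlite3.Connection:
    """Create an in-memory database with just enough data for the tests."""
    conn = sqlite3.connect(":memory:")  # stand-in for the CI Postgres instance
    conn.execute(
        "CREATE TABLE soft_story_buildings ("
        "address TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    # Static inserts replace the full ETL run.
    conn.executemany(
        "INSERT INTO soft_story_buildings (address, status) VALUES (?, ?)",
        SEED_ROWS,
    )
    conn.commit()
    return conn


def count_non_compliant(conn: sqlite3.Connection) -> int:
    """Example query a test might exercise against the seeded data."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM soft_story_buildings WHERE status = ?",
        ("non-compliant",),
    ).fetchone()
    return count
```

Because the seed data lives next to the tests, each test knows exactly which rows exist, which tends to make assertions more reliable than running against whatever the last ETL happened to load.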

**Desired Outcome**

The goal is to **separate ETL from CI** by:

- Removing ETL execution from default PR and push workflows
- Using static inserts, seeded data, or fixtures for tests
- Allowing ETL to run intentionally and independently when data updates are actually needed
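To make "when data updates are actually needed" more concrete, a job could compare each source dataset's upstream last-updated date with the date it was last loaded, and run ETL only for datasets that changed. This is a minimal sketch assuming both dates are already available (for example, recorded by a previous ETL run); `datasets_needing_etl` and its inputs are hypothetical, not part of the existing codebase.

```python
from datetime import date
from typing import Dict, List


def datasets_needing_etl(
    source_updated: Dict[str, date], last_loaded: Dict[str, date]
) -> List[str]:
    """Return (sorted) the datasets whose upstream data is newer than our last load.

    Hypothetical helper: both mappings are keyed by dataset name, and where the
    dates come from is an assumption left to the real ETL tooling.
    """
    stale = []
    for name, updated in source_updated.items():
        loaded = last_loaded.get(name)
        # Never loaded, or changed upstream after the last load: ETL is warranted.
        if loaded is None or updated > loaded:
            stale.append(name)
    return sorted(stale)
```

Treating a never-loaded dataset as stale means a fresh environment still gets a full load the first time ETL is run intentionally.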

This change should reduce CI time, lower resource usage, and make the pipeline more maintainable as the project evolves.

---

## Acceptance Criteria

- ETL scripts are **not executed** as part of the default CI workflow for pull requests and standard pushes.
- Tests rely on **explicit inserts, seeded data, or fixtures** rather than requiring a full ETL run.
- A clear and documented mechanism exists to run ETL **separately and intentionally**, such as:
  - Manual triggers
  - Scheduled jobs
  - Dedicated CI workflows
- CI pipelines are **measurably faster** and focus on linting, builds, and tests.
- Docker and initialization logic are updated to reflect the separation between application startup and ETL execution.
- Documentation is added explaining:
  - Why ETL was removed from CI
  - How test data is provisioned
  - How and when ETL should be run going forward
  - Any implications for CI, Docker, and local development
- There are **no regressions** in test reliability or production data workflows.
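One possible shape for the Docker/initialization split is an entrypoint where application startup never depends on ETL, and ETL runs only when explicitly requested. The sketch assumes a hypothetical `--run-etl` flag and `RUN_ETL=1` environment variable; `should_run_etl`, `start`, and the `run_etl`/`start_app` callables are placeholders for whatever SafeHome's actual scripts do.

```python
import argparse
import os
from typing import Callable, Mapping, Optional, Sequence


def should_run_etl(argv: Sequence[str], env: Optional[Mapping[str, str]] = None) -> bool:
    """ETL is opt-in: it runs only with --run-etl or RUN_ETL=1 (hypothetical switches)."""
    parser = argparse.ArgumentParser(prog="safehome-init")  # hypothetical program name
    parser.add_argument(
        "--run-etl", action="store_true", help="load source datasets into the database"
    )
    # parse_known_args ignores unrelated arguments meant for the app itself.
    args, _unknown = parser.parse_known_args(list(argv))
    env = os.environ if env is None else env
    return args.run_etl or env.get("RUN_ETL") == "1"


def start(
    argv: Sequence[str],
    env: Mapping[str, str],
    run_etl: Callable[[], None],
    start_app: Callable[[], None],
) -> bool:
    """Hypothetical container entrypoint; returns whether ETL was run."""
    ran = should_run_etl(argv, env)
    if ran:
        run_etl()  # only when explicitly requested
    start_app()  # application startup never depends on ETL
    return ran
```

Under this assumption, default PR and push workflows would set neither switch, while a manually triggered or scheduled ETL workflow would set `RUN_ETL=1` (or pass `--run-etl`).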
