Schema for ScaleLesson 5.3
Archiving old data: keeping production tables lean
data archiving strategy, archive table pattern, cutoff date, incremental archive job, hard delete after archive, cold storage, pg_partman, event sourcing reference
Why Archive
Production tables with years of data become slow to query, slow to back up, and expensive to index. Archiving moves old, rarely accessed rows to a separate archive table (or cold storage), keeping the live table small and fast.
Archive Table Pattern
-- Archive table mirrors the live table
CREATE TABLE orders_archive LIKE orders;
-- Archive job: move rows older than 2 years
INSERT INTO orders_archive
SELECT * FROM orders
WHERE created_at < DATE_SUB(CURRENT_DATE, INTERVAL 2 YEAR)
AND status IN ('completed', 'cancelled');
DELETE FROM orders
WHERE created_at < DATE_SUB(CURRENT_DATE, INTERVAL 2 YEAR)
AND status IN ('completed', 'cancelled');Principles
- Archive in small batches (1000โ10000 rows) to avoid locking.
- Only archive rows in terminal states (completed, cancelled) โ never active records.
- Run the archive job during off-peak hours.
- Keep the archive table in the same database for fast joins; move to cold storage (S3, BigQuery) once you no longer need real-time queries against it.
Partitioning and archiving complement each other: partition by date, then drop old partitions โ which is an O(1) operation regardless of row count โ instead of running expensive DELETE queries.
