Database Design: From Idea to Schema

Schema for Scale/Assessment

Practice & Assessment

Test your understanding of Schema for Scale

Multiple Choice Questions

A table with 500 million rows is partitioned by YEAR(created_at). A query filters by WHERE created_at >= '2024-01-01'. What performance benefit does partitioning provide?

Why is created_at a poor choice for a sharding key in a high-write system?

What is the correct order of operations when archiving old rows to prevent data loss?

You want dashboard queries to run fast without impacting production OLTP performance. Which pattern achieves this?

What must be true about the shard key for a sharded schema to work correctly?

Coding Challenges

Design a Partitioned Events Table

Write CREATE TABLE DDL for an events table (event_id BIGINT, user_id INT, event_type VARCHAR, payload JSON, created_at TIMESTAMP) partitioned by RANGE on YEAR(created_at) for years 2022 through 2025 plus a MAXVALUE partition. Then write the SQL for: (1) an archive job that copies all 2022 events to an events_archive table and deletes them from the live table in batches of 5000; (2) a query that returns a count of events per event_type for the year 2024, demonstrating partition pruning. Time estimate: 30 minutes.

Hard

Mini Project

Scale-Ready Schema for a Multi-Tenant SaaS Platform

Design the full schema for a multi-tenant project management SaaS (think Asana or Linear). Requirements: tenants (organizations) each have users, projects, tasks, and comments; schema must be shard-friendly with tenant_id as the shard key propagated into every table's PK; tasks table must be partitioned by RANGE on YEAR(created_at); implement soft deletes on tasks and projects; add an audit table for task status changes; create a read-optimized materialized view (or equivalent) for per-tenant task completion rate. Provide all DDL, index definitions, and a written justification (as SQL comments) for each sharding and partitioning decision.

Hard