Aggregation PipelineLesson 4.1

How the MongoDB aggregation pipeline works

pipeline concept, stage execution order, $match, $group, $project, $sort, $limit, document transformation vs query

Pipeline fundamentals

The aggregation pipeline is MongoDB's framework for transforming and computing data. Documents flow through a sequence of stages - each stage receives the previous stage's output, processes it, and passes results forward. Think of it as a Unix pipe where each operator does one well-defined job.

// Revenue summary grouped by order status
const report = await db.collection('orders').aggregate([
  // Stage 1: filter - only include completed orders
  { $match: { status: 'completed' } },

  // Stage 2: group - aggregate per status group
  { $group: {
    _id: '$status',
    totalRevenue: { $sum: '$total' },
    orderCount: { $sum: 1 },
    avgOrderValue: { $avg: '$total' }
  }},

  // Stage 3: sort - highest revenue at the top
  { $sort: { totalRevenue: -1 } }
]).toArray()

console.log(report)
// [{ _id: 'completed', totalRevenue: 48200, orderCount: 312 }]

The most important pipeline performance rule

Always place $match as the first stage whenever possible. A $match at stage 1 can use a collection index and dramatically reduces the number of documents that flow through all subsequent stages. A $match placed after $group receives no index benefit - every document in the collection has already been processed before the filter applies. Use find() for simple retrieval and aggregate() for any computation, reshaping, or joining.