How the MongoDB aggregation pipeline works
pipeline concept, stage execution order, $match, $group, $project, $sort, $limit, document transformation vs query
Pipeline fundamentals
The aggregation pipeline is MongoDB's framework for transforming and computing data. Documents flow through a sequence of stages — each stage receives the previous stage's output, processes it, and passes results forward. Think of it as a Unix pipe where each operator does one well-defined job.
// Revenue summary grouped by order status
const report = await db.collection('orders').aggregate([
// Stage 1: filter — only include completed orders
{ $match: { status: 'completed' } },
// Stage 2: group — aggregate per status group
{ $group: {
_id: '$status',
totalRevenue: { $sum: '$total' },
orderCount: { $sum: 1 },
avgOrderValue: { $avg: '$total' }
}},
// Stage 3: sort — highest revenue at the top
{ $sort: { totalRevenue: -1 } }
]).toArray()
console.log(report)
// [{ _id: 'completed', totalRevenue: 48200, orderCount: 312 }]The most important pipeline performance rule
Always place $match as the first stage whenever possible. A $match at stage 1 can use a collection index and dramatically reduces the number of documents that flow through all subsequent stages. A $match placed after $group receives no index benefit — every document in the collection has already been processed before the filter applies. Use find() for simple retrieval and aggregate() for any computation, reshaping, or joining.
