MongoDB Aggregate Pipeline
Aggregation Pipeline
The aggregation pipeline is a framework for data aggregation modelled on the concept of data processing pipelines. Documents enter a multi-stage pipeline, which transforms them into aggregated results.
This is similar to using GROUP BY in SQL, where you might aggregate the average grades of all students taking a module.
The MongoDB aggregation pipeline consists of stages and each stage transforms the documents as they pass through the pipeline. A stage can generate new documents or filter out documents. A stage can also appear several times in the pipeline.
The syntax is:
db.collectionName.aggregate( [ { <stage> }, ... ] )
For instance, the pipeline could:
- project only certain fields from each document, such as selected employee details;
- group the projected details by a certain field and apply an aggregate function, such as grouping by deptno and counting the number of occurrences;
- sort the results into order;
- limit the results to a certain number, such as the first 10.
These steps are represented by the following stage operators: $project, $group, $sort and $limit.
A number of operations exist for the aggregation pipeline, details of which can be found in the MongoDB manual:
https://docs.mongodb.com/manual/reference/operator/aggregation/
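The four steps listed above can be sketched as a single pipeline. The sample documents and the countByDept helper below are hypothetical and exist only so the snippet can be run as plain JavaScript without a database. In the shell, the pipeline array would be passed to db.emp.aggregate().

```javascript
// Hypothetical sample documents standing in for the emp collection.
const empDocs = [
  { ename: "SMITH", deptno: 20 },
  { ename: "ALLEN", deptno: 30 },
  { ename: "WARD", deptno: 30 },
  { ename: "CLARK", deptno: 10 },
  { ename: "KING", deptno: 10 },
  { ename: "FORD", deptno: 20 },
  { ename: "TURNER", deptno: 30 },
];

// The pipeline as it could be passed to the shell:
//   db.emp.aggregate(pipeline)
const pipeline = [
  { $project: { _id: 0, deptno: 1 } },                // keep only deptno
  { $group: { _id: "$deptno", count: { $sum: 1 } } }, // count per department
  { $sort: { count: -1, _id: 1 } },                   // largest first, ties by deptno
  { $limit: 10 },                                     // first 10 results only
];

// Plain JavaScript emulation of the same four stages (illustration only).
function countByDept(docs, limit) {
  const projected = docs.map((d) => ({ deptno: d.deptno })); // $project
  const counts = new Map();
  for (const d of projected) {                               // $group
    counts.set(d.deptno, (counts.get(d.deptno) || 0) + 1);
  }
  return [...counts]
    .map(([deptno, count]) => ({ _id: deptno, count }))
    .sort((a, b) => b.count - a.count || a._id - b._id)      // $sort
    .slice(0, limit);                                        // $limit
}

const topDepartments = countByDept(empDocs, 10);
console.log(topDepartments);
// [ { _id: 30, count: 3 }, { _id: 10, count: 2 }, { _id: 20, count: 2 } ]
```

The order of the stages matters: each stage only sees the documents produced by the stage before it, so $limit here applies to the sorted groups rather than to the original documents.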
$group
$group will take a set of input documents, group them by a specified key and then apply an aggregate function to each group.
For example, to sum the salaries in the emp collection for each department:
db.emp.aggregate([
  { $group: { _id: "$deptno", total: { $sum: "$sal" } } }
])
This is similar to the SQL command:
SELECT deptno, SUM(sal) AS total FROM emp GROUP BY deptno;
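To make the grouping step concrete, here is a minimal sketch of what $group with $sum computes. The sample salaries and the sumSalByDept function are hypothetical. They only emulate the stage in plain JavaScript and are not how MongoDB implements it.

```javascript
// Hypothetical sample documents standing in for the emp collection.
const empSalaries = [
  { ename: "SMITH", deptno: 20, sal: 800 },
  { ename: "ALLEN", deptno: 30, sal: 1600 },
  { ename: "WARD", deptno: 30, sal: 1250 },
  { ename: "CLARK", deptno: 10, sal: 2450 },
  { ename: "KING", deptno: 10, sal: 5000 },
];

// Emulates: { $group: { _id: "$deptno", total: { $sum: "$sal" } } }
function sumSalByDept(docs) {
  const totals = new Map();
  for (const { deptno, sal } of docs) {
    // Add each salary to the running total of its department.
    totals.set(deptno, (totals.get(deptno) || 0) + sal);
  }
  // One output document per distinct deptno, keyed by _id as in $group.
  return [...totals].map(([deptno, total]) => ({ _id: deptno, total }));
}

const totals = sumSalByDept(empSalaries);
console.log(totals);
```

With this sample data, department 10 totals 7450, department 20 totals 800 and department 30 totals 2850. In MongoDB the order of the grouped documents is not guaranteed unless a $sort stage follows.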
$lookup
Other Functions
Count
The aggregation pipeline is powerful for processing data, but simple operations such as counting can also be done with collection methods.
Let's count how many employees are in department 10:
db.emp.count({deptno: 10})
You can also add count() to a find query to count the documents returned, instead of listing them:
db.dept.find({dname:"SALES"}).count()
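For comparison, the same count can be expressed as a pipeline using the $match and $count stages. The sketch below uses hypothetical sample documents and a hypothetical countInDept helper so that it runs as plain JavaScript. In recent MongoDB versions, db.emp.countDocuments({deptno: 10}) is the preferred form of the count() call, which is deprecated there.

```javascript
// Hypothetical sample documents standing in for the emp collection.
const staff = [
  { ename: "CLARK", deptno: 10 },
  { ename: "KING", deptno: 10 },
  { ename: "MILLER", deptno: 10 },
  { ename: "SMITH", deptno: 20 },
  { ename: "ALLEN", deptno: 30 },
];

// The pipeline as it could be passed to the shell:
//   db.emp.aggregate(countPipeline)
const countPipeline = [
  { $match: { deptno: 10 } }, // keep only department 10
  { $count: "employees" },    // output one document: { employees: <n> }
];

// Plain JavaScript emulation of $match followed by $count.
function countInDept(docs, deptno) {
  const n = docs.filter((d) => d.deptno === deptno).length;
  // $count emits no document at all when nothing reaches it.
  return n === 0 ? [] : [{ employees: n }];
}

console.log(countInDept(staff, 10)); // [ { employees: 3 } ]
console.log(countInDept(staff, 40)); // []
```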
Distinct
Sometimes you want to find the distinct values of a specified field (similar to DISTINCT in SQL):
db.emp.distinct("deptno")
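A similar result can be obtained in the pipeline by grouping on the field without any accumulator. The sketch below is an emulation in plain JavaScript, with hypothetical sample documents and a hypothetical distinctValues helper. Note that distinct() returns an array of values, whereas the pipeline returns one document per value.

```javascript
// Hypothetical sample documents standing in for the emp collection.
const empRows = [
  { ename: "SMITH", deptno: 20 },
  { ename: "ALLEN", deptno: 30 },
  { ename: "WARD", deptno: 30 },
  { ename: "CLARK", deptno: 10 },
  { ename: "FORD", deptno: 20 },
];

// Pipeline form, as it could be passed to the shell:
//   db.emp.aggregate(distinctPipeline)
// Each output document has the shape { _id: <deptno> }.
const distinctPipeline = [{ $group: { _id: "$deptno" } }];

// Plain JavaScript emulation of db.emp.distinct(field).
function distinctValues(docs, field) {
  // A Set keeps only the first occurrence of each value.
  return [...new Set(docs.map((d) => d[field]))];
}

const deptnos = distinctValues(empRows, "deptno");
console.log(deptnos); // [ 20, 30, 10 ]
```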
Next Step
Updating the collection