Difference between revisions of "MongoDB Aggregate Pipeline"

From mi-linux
Jump to navigationJump to search
Line 39: Line 39:
 
  { $group:  
 
  { $group:  
 
         { _id: "$deptno", total: {$sum: "$sal"} } }
 
         { _id: "$deptno", total: {$sum: "$sal"} } }
  ])
+
])
  
 
This is similar to the SQL command:
 
This is similar to the SQL command:

Revision as of 10:12, 13 November 2017

Main Page >> MongoDB >>MongoDB Workbook >> Aggregation Pipeline

Aggregation Pipeline

The aggregation pipeline is a framework for data aggregation modelled on the concept of data processing pipelines. What this means, is documents enter a multi-stage pipeline that transforms the documents into aggregated results.

This is similar to using GROUP BY in SQL, where you might aggregate the average grades of all students taking a module.

The MongoDB aggregation pipeline consists of stages and each stage transforms the documents as they pass through the pipeline. A stage can generate new documents or filter out documents. A stage can also appear several times in the pipeline.

The aggregation pipeline is implemented using the aggregate method() of a collection and the pipeline stages appear in an array. Documents pass through the stages in sequence.

The syntax is:

db.collectionName.aggregate( [ { <stage> }, ... ] )

The pipeline for instance, could:

  • project out certain details from each document, such as the employees;
  • group the projected details by a certain fields and then using an aggregate function, such as group by the deptno and then counting the number of occurrences;
  • sorting the results in order;
  • limiting the results to a certain number, such as the first 10;

These are represented by the following operators: $project,$group, $sort or $limit.

A number of operations exist for the aggregation pipeline, details of which can be found in the MongoDB manual:

https://docs.mongodb.com/manual/reference/operator/aggregation/

$group

$group will take a set of input documents, group them by a specified key and then apply an aggregate function to each group.

For example, to sum the salaries found in the emp collection:

db.emp.aggregate ( [
{ 	$group: 
       { _id: "$deptno", total: {$sum: "$sal"} } }
])

This is similar to the SQL command:

SELECT deptno, sum AS total FROM emp
GROUP BY deptno;

$lookup

$lookup can be used to provide a left outer join between two collections, such as dept and emp.

The $lookup stage does an equality match between a column from the input documents with a column from the joined collection.

In this example, the dept collection provides the input documents and the lookup will be performed in the emp collection. deptno in both documents is used to join the collections:

db.dept.aggregate([ {
     $lookup:  {
         from: "emp",
         localField: "deptno",
         foreignField: "deptno",
         as: "employees"
       }
  }
]).pretty()

This is the equivalent of the SQL statement:

SELECT * 
FROM dept 
LEFT OUTER JOIN emp
ON dept.deptno = emp.deptno;

The mgr field holds the empno of each employee's manager. To provide a list of employees the who they manage involved joining a collection to itself:

 db.emp.aggregate([ {
      $lookup:  {
          from: "emp",
          localField: "empno",
          foreignField: "mgr",
          as: "manages"
        }
   }
 ]).pretty()

An empty set [] will be returned for the manages field for any employees who are not managers.

Complex query

A more complex aggregate pipeline groups the emp collection by deptno, then returns the average salary, total annual salary and a count of employees by department:

db.emp.aggregate(
  [
     {
       $group : {
          _id : { _id : "$deptno"},
          averageSalary: { $avg: "$sal" },
          totalAnnualSalary: { $sum: { $multiply: [ "$sal", 12 ] } },
          count: { $sum: 1 }
       }
     }
  ]
)

Other Functions

Count

The power of the aggregation pipeline is to do processing on the data.

Lets count how many employees are in department 10:

db.emp.count({deptno: 10})


You can also add count() to a find query to count the records returned, instead of listing them:

db.dept.find({dname:"SALES"}).count()


Distinct

Sometimes you want to find the distinct values for a specified column (similar to distinct in SQL):

db.emp.distinct("deptno")


Exercise 3.1

Try the lookup with the collections swapped:

db.emp.aggregate([ {
      $lookup:  {
          from: "dept",
          localField: "deptno",
          foreignField: "deptno",
          as: "dept"
        }
   }
 ])

Explain the results.

Next Step

Updating the collection