Difference between revisions of "MongoDB QueryData"

From mi-linux
Jump to navigationJump to search
Line 177: Line 177:
 
     }})
 
     }})
  
The ''$ifNull'' operator is needed, since department 40 has no employees - will get an error message if left out!
+
The ''$ifNull'' operator is needed, since department 40 has no employees - you will get an error message if left out!
  
 
== Next Step ==
 
== Next Step ==
  
 
[[MongoDB_Update|Updating]] the collection
 
[[MongoDB_Update|Updating]] the collection

Revision as of 23:59, 10 November 2016

Main Page >> MongoDB >>MongoDB Workbook >> Querying Collections

Querying a collection

The find() function can be used to query the documents.

The format is:

 db.collectionName.find(optional_query_criteria)

Where the query_criteria follows a pattern:

 db.collectionName.find({fieldName: "value"})

Note:

  • the criteria is enclosed in curly brackets: {}
  • the value needs quotes if it is a string or date value
  • quotes are optional for the fieldName, so long as they do not contain spaces
  • if the fieldName refers to a nested document, the name must be in matching single or double quotes


Find all documents

For example, show all the data so far in the deptCollection:

db.deptCollection.find()

The data comes back messy. The pretty() function can be used to improve the layout::

db.deptCollection.find().pretty()


Find One document

To find just one document - department 10:

db.deptCollection.find({deptno:10}).pretty()

Finding an employee means using the array name too:

db.deptCollection.find({"employees.empno":7902}).pretty()

However, this does mean you get back all the employees in the department they were found in!

Since Version 2.2 MongoDB's new $elemMatch can be used with arrays to return only the first element matching the $elemMatch condition:

db.deptCollection.find({deptno:20},  
  { _id: 0, employees: {$elemMatch: {empno: 7902}}}).pretty()

$elemMatch limits the contents of the employees array to contain only the first element matching the $elemMatch condition.

This is akin to a SQL query:

 SELECT * FROM Emp WHERE deptno=20 AND empno = 7902

_id is a unique value automatically generated by MongoDB (like a Primary Key, except it is unique for the whole database).

Using _id:0 suppresses the ObjectId value, however to see it:

db.deptCollection.find({deptno:20},  
  { employees: {$elemMatch: {empno: 7902}}}).pretty()

More about _ids in the next section.

Find with Query Criteria

The query criteria can be as complex as that found in SQL.

To find all employees earning more than 2000 in department 10:

db.deptCollection.find({deptno:10},   
 { employees: {$elemMatch: {sal: { $gt: 2000}}}}).pretty()


Same again for department 20 and the managers:

db.deptCollection.find({deptno:20},  { employees: {$elemMatch: {sal: { $gt: 2000}, job: "MANAGER"}}}).pretty()


employees is an array, so $elemMatch only returns the first matching value. What if we try this instead:

db.deptCollection.find({ "employees.sal" : { $gt: 2000}}).pretty()

Things to note:

  • This time employees.sal must be enclosed in matching single or double quotes.
  • Comment on what the above query returns.
  • If you examine the data carefully:
    • either find() means if one element of an array is found to be true, then all the elements are returned, even if they do not satisfy the query criteria
    • or including $elemMatch means only the first element matching the query criteria is returned.
  • Is this good practice?


Find departments with no managers:

db.deptCollection.find({ "employees.job" : { $ne: "MANAGER"}}).pretty()

Aggregation Pipeline

So far find() either returns all the elements of an array, if one element matches the search criteria, or $elematch returns the first one found only. The latter is fine if there is only one to be found, but not so good if several items in the array should match the search criteria.

The aggregation pipeline is a framework for data aggregation modelled on the concept of data processing pipelines. What this means, is documents enter a multi-stage pipeline that transforms the documents into aggregated results.

This is similar to using GROUP BY in SQL, where you might aggregate the average grades of all students taking a module.

The MongoDB aggregation pipeline consists of stages and each stage transforms the documents as they pass through the pipeline. A stage can generate new documents or filter out documents. A stage can also appear several times in the pipeline.

The syntax is:

db.collectionName.aggregate( [ { <stage> }, ... ] )

The pipeline for instance, could:

  • project out certain details from each document, such as the employees;
  • group the projected details by a certain fields and then using an aggregate function, such as group by the deptno and then counting the number of occurrences;
  • sorting the results in order;
  • limiting the results to a certain number, such as the first 10;

These are represented by the following operators: $project,$group, $sort or $limit.

A number of operations exist for the aggregation pipeline, details of which can be found in the MongoDB manual:

https://docs.mongodb.com/manual/reference/operator/aggregation/


$filter

In the aggregation pipeline, array has various operators and the one we are interested in is $filter

This returns a subset of the array with only the elements that match the filter condition.

$filter has the following syntax:

{ $filter: {
	input: <array>,    /* expression for the array */
	as: <string>, 	   /* variable name for the element */   
	cond: <expression> /* filter condition */
} }

$filter is one of the stages of the pipeline and can not be used by itself. It is used with an aggregation framework operator, such as $project.


We can use the pipeline to gather elements of our employees array to get the employees matching the query criteria only, rather than one, or everyone. This example has only one stage:

db.deptCollection.aggregate([ {
     $project: {
        empSet: {
           $filter: {
              input: "$employees",
              as: "employee",
              cond: { $gte: [ "$$employee.sal", 2000 ] }
           }
        }
     }
  }
]).pretty()

Now the system should only retrieve the employees with salary > 2000.

Count

The power of the aggregation pipeline is to do processing on the data.

Lets count how many employees each department has:

db.deptCollection.aggregate({ 
 "$project": {
    "deptno": 1,
    "Count": { "$size": { "$ifNull": [ "$employees", [] ] }
     }
   }})

The $ifNull operator is needed, since department 40 has no employees - you will get an error message if left out!

Next Step

Updating the collection