MongoDB database - aggregation

Posted by figo2476 on Tue, 04 Jan 2022 04:33:20 +0100

Contents

1, Map Reduce

1. MapReduce command

2, Aggregate

1. aggregate() method

2. Example

3. Aggregate expression

4. Concept of pipeline

5. Pipeline operator examples

1, Map Reduce

MapReduce is a computing model: a large body of work (data) is decomposed and processed in pieces (map), and the partial results are then combined into the final result (reduce).

MongoDB's mapReduce is flexible and practical for large-scale data analysis.

1. MapReduce command

Basic syntax

db.collection.mapReduce( // Target collection to operate on
	<map>, // Map function (emits the key-value pairs that are passed to the reduce function)
	<reduce>, // Reduce function (combines the values emitted for each key)
	{
		out: <collection>, // Collection in which the results are stored. Results returned
		                   // inline instead must fit within the BSON document size limit (16 MB).
		query: <document>, // Filters the input documents.
		sort: <document>, // Sorts the input documents.
		limit: <number>, // Limits the number of input documents.
		finalize: <function>, // Final processing function, applied to the results of reduce before they are stored in the result set.
		scope: <document>, // Makes external variables available to map, reduce and finalize.
		jsMode: <boolean>, // In the usual mode (false), the objects emitted by map are converted to BSON
		                   // and converted back to JavaScript objects before reduce runs. With true, they
		                   // stay JavaScript objects between map and reduce. jsMode is limited by the
		                   // JavaScript heap size and to fewer than 500,000 distinct keys, so it does not
		                   // apply to larger jobs; the usual mode is used for those.
		verbose: <boolean>, // Includes detailed timing statistics.
		bypassDocumentValidation: <boolean> // Skips document validation when writing the results.
	}
)

To use mapReduce you implement two functions: the map function and the reduce function. MongoDB runs the map function on every document in the collection; map calls emit(key, value), and each key is then passed to the reduce function together with the values emitted for it.

The map function returns its key-value pairs by calling emit(key, value).
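The flow described above can be sketched in plain JavaScript without a database. The runMapReduce helper below is a hypothetical in-memory emulation, not MongoDB's implementation, and it hands emit to the map function as an argument, whereas MongoDB exposes emit as a global inside map. The sample documents and their by_user and likes fields are assumed data for illustration.

```javascript
// Hypothetical in-memory emulation of the map/reduce flow (not MongoDB's implementation).
function runMapReduce(docs, map, reduce) {
  const valuesByKey = new Map();

  // emit() records one key-value pair produced by the map function.
  const emit = (key, value) => {
    if (!valuesByKey.has(key)) valuesByKey.set(key, []);
    valuesByKey.get(key).push(value);
  };

  // Map phase: run map once per document, with the document bound to `this`.
  for (const doc of docs) map.call(doc, emit);

  // Reduce phase: combine the values collected for each key.
  const results = [];
  for (const [key, values] of valuesByKey) {
    // A key with a single value is stored as-is; MongoDB likewise skips reduce in that case.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results;
}

// Illustrative documents (assumed data).
const sampleArticles = [
  { by_user: "w3cschool.cc", likes: 100 },
  { by_user: "w3cschool.cc", likes: 10 },
  { by_user: "Neo4j", likes: 750 },
];

const likesPerUser = runMapReduce(
  sampleArticles,
  function (emit) { emit(this.by_user, this.likes); },                 // map
  function (key, values) { return values.reduce((a, b) => a + b, 0); } // reduce
);

console.log(likesPerUser);
```

In the mongo shell, a corresponding call might look like db.mycol.mapReduce(function () { emit(this.by_user, this.likes); }, function (key, values) { return Array.sum(values); }, { out: "likes_per_user" }), where likes_per_user is a hypothetical output collection name.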

 

2, Aggregate

1. aggregate() method

Aggregation in MongoDB is performed with the aggregate() method.

Syntax:

> db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

2. Example

Data in collection:

{
	_id: ObjectId(7df78ad8902c),
	title: 'MongoDB Overview',
	description: 'MongoDB is no sql database',
	by_user: 'w3cschool.cc',
	url: 'http://www.w3cschool.cc',
	tags: ['mongodb', 'database', 'NoSQL'],
	likes: 100
},
{
	_id: ObjectId(7df78ad8902d),
	title: 'NoSQL Overview',
	description: 'No sql database is very fast',
	by_user: 'w3cschool.cc',
	url: 'http://www.w3cschool.cc',
	tags: ['mongodb', 'database', 'NoSQL'],
	likes: 10
},
{
	_id: ObjectId(7df78ad8902e),
	title: 'Neo4j Overview',
	description: 'Neo4j is no sql database',
	by_user: 'Neo4j',
	url: 'http://www.neo4j.com',
	tags: ['neo4j', 'database', 'NoSQL'],
	likes: 750
}

Now let's use aggregate() to count the number of articles written by each author in the collection above:

> db.mycol.aggregate([{$group :{_id: "$by_user",num_tutorial:{$sum : 1}}}])
{
	"result" : [
		{
			"_id" : "w3cschool.cc",
			"num_tutorial" : 2
		},
		{
			"_id" : "Neo4j",
			"num_tutorial" : 1
		}
	],
	"ok" : 1
}
>

This example groups the documents by the by_user field and counts the documents that share the same by_user value.
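To make the grouping step concrete, here is a plain JavaScript sketch of what $group with $sum : 1 computes. It runs in memory on a shortened copy of the sample documents, and the countBy helper is a hypothetical name, not part of MongoDB.

```javascript
// Hypothetical helper: counts documents per distinct value of `field`,
// mirroring {$group : {_id : "$<field>", num_tutorial : {$sum : 1}}}.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + 1); // $sum : 1 adds one per document
  }
  return Array.from(counts, ([key, count]) => ({ _id: key, num_tutorial: count }));
}

// Shortened copy of the sample documents (only the fields needed here).
const mycolDocs = [
  { title: "MongoDB Overview", by_user: "w3cschool.cc" },
  { title: "NoSQL Overview", by_user: "w3cschool.cc" },
  { title: "Neo4j Overview", by_user: "Neo4j" },
];

const tutorialsPerUser = countBy(mycolDocs, "by_user");
console.log(tutorialsPerUser);
```

The sketch keeps groups in the order in which their keys first appear; the order of groups returned by $group should not be relied on unless a $sort stage follows.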

3. Aggregate expression

Expression | Description | Example
$sum | Calculates the sum. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg | Calculates the average. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min | Returns the minimum of the corresponding values across the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max | Returns the maximum of the corresponding values across the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push | Appends values to an array in the result document. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet | Adds values to an array in the result document without creating duplicates. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first | Returns the value from the first document of each group, according to the order of the source documents. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last | Returns the value from the last document of each group, according to the order of the source documents. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
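As a rough illustration of what each accumulator computes, the sketch below applies plain JavaScript equivalents to the likes and url values of one group (the two w3cschool.cc documents from the sample data). The accumulators object and its function names are assumptions made for this example; MongoDB evaluates the real operators on the server.

```javascript
// Plain JavaScript stand-ins for the $group accumulators (illustrative only).
const accumulators = {
  sum: (values) => values.reduce((a, b) => a + b, 0),
  avg: (values) => values.reduce((a, b) => a + b, 0) / values.length,
  min: (values) => Math.min(...values),
  max: (values) => Math.max(...values),
  push: (values) => [...values],              // keeps duplicates
  addToSet: (values) => [...new Set(values)], // drops duplicates
  first: (values) => values[0],
  last: (values) => values[values.length - 1],
};

// Values of one group: the two documents whose by_user is "w3cschool.cc".
const groupLikes = [100, 10];
const groupUrls = ["http://www.w3cschool.cc", "http://www.w3cschool.cc"];

console.log(accumulators.sum(groupLikes));     // 110
console.log(accumulators.avg(groupLikes));     // 55
console.log(accumulators.min(groupLikes));     // 10
console.log(accumulators.max(groupLikes));     // 100
console.log(accumulators.push(groupUrls));     // both URLs, duplicate included
console.log(accumulators.addToSet(groupUrls)); // the URL once
console.log(accumulators.first(groupLikes));   // 100
console.log(accumulators.last(groupLikes));    // 10
```

The difference between $push and $addToSet shows up in the two URL results: the first keeps the repeated URL, the second stores it only once.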

4. Concept of pipeline

Pipes are commonly used in Unix and Linux to pass the output of one command to the next command as its input.

In MongoDB's aggregation pipeline, documents are processed by one stage and the result is passed to the next stage for further processing. The same kind of stage can appear more than once in a pipeline.

Expressions take input documents and produce output. Expressions are stateless: they can only operate on the current document in the aggregation pipeline and cannot refer to other documents.
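The pipeline idea can be sketched in plain JavaScript as a list of functions, each taking an array of documents and returning a new array for the next stage. This is an in-memory analogy under assumed names (runPipeline, the stage functions and the sample scores are all hypothetical), not how the server executes a pipeline.

```javascript
// Each stage is a function from an array of documents to a new array of documents.
function runPipeline(docs, stages) {
  // The output of one stage becomes the input of the next.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Illustrative documents (assumed data).
const scoreDocs = [
  { name: "a", score: 95 },
  { name: "b", score: 60 },
  { name: "c", score: 82 },
];

const pipelineResult = runPipeline(scoreDocs, [
  (docs) => docs.filter((d) => d.score > 70),            // similar to $match
  (docs) => [...docs].sort((x, y) => y.score - x.score), // similar to $sort, descending
  (docs) => docs.map((d) => ({ name: d.name })),         // similar to $project
]);

console.log(pipelineResult);
```

The callbacks passed to filter and map see one document at a time, which mirrors the stateless nature of expressions.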

Several stages commonly used in the aggregation framework:

$project: reshapes the input documents. It can rename, add or remove fields, and create computed values and nested documents.

$match: filters the documents so that only those meeting the condition are passed on. It uses MongoDB's standard query operators.

$limit: limits the number of documents passed on by the MongoDB aggregation pipeline.

$skip: skips the specified number of documents in the aggregation pipeline and passes on the remaining documents.

$unwind: splits an array field of a document into multiple documents, each containing one value of the array.

$group: groups the documents in the collection, which can be used to compute statistics.

$sort: sorts the input documents and outputs them in order.

$geoNear: outputs documents ordered by their proximity to a geographic location.
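Of the stages listed above, $unwind is perhaps the least obvious, so here is a minimal in-memory sketch of its effect in plain JavaScript. The unwind helper is a hypothetical name, it handles only the simple case of an array-valued field, and the sample document is shortened from the data shown earlier.

```javascript
// Hypothetical helper: produces one copy of each document per element of doc[field].
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

// Shortened sample document (assumed data).
const taggedDocs = [
  { title: "MongoDB Overview", tags: ["mongodb", "database", "NoSQL"] },
];

const unwoundDocs = unwind(taggedDocs, "tags");
console.log(unwoundDocs);
```

In a real pipeline the corresponding stage would be written as { $unwind : "$tags" }.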

5. Pipeline operator examples

$project example

db.article.aggregate([
	{
		$project : {
			title : 1,
			author : 1
		}
	}
]);

The result then contains only three fields: _id, title and author. The _id field is included by default. To exclude _id, write:

db.article.aggregate([
	{
		$project : {
			_id : 0,
			title : 1,
			author : 1
		}
	}
]);
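The effect of the two $project specifications can be imitated in memory with plain JavaScript. The project helper below is a hypothetical sketch that covers only this inclusion style (fields set to 1, _id optionally set to 0), and the sample document is assumed data.

```javascript
// Hypothetical helper mimicking an inclusion-style $project specification.
function project(docs, spec) {
  return docs.map((doc) => {
    const out = {};
    // _id is kept unless the specification sets it to 0.
    if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
    for (const field of Object.keys(spec)) {
      if (field !== "_id" && spec[field] === 1 && field in doc) {
        out[field] = doc[field];
      }
    }
    return out;
  });
}

// Illustrative document (assumed data).
const articleDocs = [
  { _id: 1, title: "MongoDB Overview", author: "alice", likes: 100 },
];

const withId = project(articleDocs, { title: 1, author: 1 });
const withoutId = project(articleDocs, { _id: 0, title: 1, author: 1 });

console.log(withId);    // _id, title and author
console.log(withoutId); // title and author only
```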

$match example

db.articles.aggregate([
	{ $match : { score : { $gt : 70, $lte : 90 } } },
	{ $group : { _id : null, count : { $sum : 1 } } }
]);

Here $match selects the documents whose score is greater than 70 and less than or equal to 90, and passes the matching documents to the $group stage for processing.
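The boundary behavior of $gt and $lte is easy to check with a small in-memory sketch in plain JavaScript. The sample scores are assumed data chosen to sit on and around both boundaries.

```javascript
// Illustrative documents (assumed data).
const scoredArticles = [
  { title: "a", score: 65 },
  { title: "b", score: 70 },
  { title: "c", score: 75 },
  { title: "d", score: 90 },
  { title: "e", score: 95 },
];

// Stage 1, like { $match : { score : { $gt : 70, $lte : 90 } } }
const matchedArticles = scoredArticles.filter(
  (doc) => doc.score > 70 && doc.score <= 90
);

// Stage 2, like { $group : { _id : null, count : { $sum : 1 } } }:
// all documents fall into a single group; no input documents means no output document.
const matchCount =
  matchedArticles.length > 0
    ? [{ _id: null, count: matchedArticles.length }]
    : [];

console.log(matchCount);
```

A score of 70 is excluded because $gt is strict, while a score of 90 is included because $lte allows equality, so two documents are counted.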

$skip example

db.article.aggregate([
	{ $skip : 5 }
]);

After being processed by the $skip pipeline operator, the first five documents are "filtered out".
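In plain JavaScript terms, $skip behaves much like slicing off the front of an array, and combining it with $limit gives a simple form of paging. The sketch below runs in memory on ten assumed documents and is only an analogy.

```javascript
// Ten illustrative documents numbered 1 to 10 (assumed data).
const numberedDocs = Array.from({ length: 10 }, (_, i) => ({ n: i + 1 }));

// Like { $skip : 5 }: drop the first five documents and keep the rest.
const afterSkip = numberedDocs.slice(5);

// Like { $skip : 5 } followed by { $limit : 3 }: skip five, then keep three.
const pageDocs = numberedDocs.slice(5).slice(0, 3);

console.log(afterSkip.map((d) => d.n)); // [ 6, 7, 8, 9, 10 ]
console.log(pageDocs.map((d) => d.n));  // [ 6, 7, 8 ]
```

Which documents count as "the first five" depends on the order in which they reach the stage, so in practice a $sort stage usually comes before $skip.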