[MongoDB] aggregate embedded arrays

Posted by saidbakr on Fri, 03 Apr 2020 03:06:18 +0200

An array is embedded in the Mongo aggregate nested array document, which is to aggregate the embedded array.

  • MongoDB 3.6

Data sample

{
    "_id" : ObjectId("5aa63f7034cdbd6928f831ff"),
    "domain" : "metacafe.com",
    "cdn" : "LLNW",
    "geo_bytes" : [ 
        {
            "bytes" : 223,
            "geo" : "Malta"
        }, 
        {
            "bytes" : 446,
            "geo" : "Canada"
        }, 
        {
            "bytes" : 892,
            "geo" : "The Philippines"
        }, 
        {
            "bytes" : 223,
            "geo" : "Spain"
        }
    ],
    "time" : 1520697600
}

geo_bytes is an embedded array, geo represents region, bytes represents traffic, and time is timestamp. The demand is similar: calculate the country distribution of traffic in a week? , the query description is to sum the bytes field according to the geo field.

Query writing

db.getCollection('daily_coll').aggregate(
    {"$match": {
        "time": {"$gte": 1520697600, "$lt": 1520697601},
    }},
    {"$unwind": "$geo_bytes"},
    {"$group": {
       "_id": { 
           "geo": "$geo_bytes.geo"
       }, 
       "bytes": {"$sum": "$geo_bytes.bytes"} 
   }},
   {"$project":{
        "_id": 0,
        "geo": "$_id.geo",
        "bytes": "$bytes"
   }},
   {"$sort": {"bytes": -1}}
 )

The key is $unwind, Document address Its function is to make the embedded array refer to the current document and make the embedded structure flat.

The above sample data is queried as follows

db.getCollection('daily_coll').aggregate(
    {"$match": {
        "time": {"$gte": 1520697600, "$lt": 1520697601},
    }},
    {"$unwind": "$geo_bytes"}
 )

As a result, the data will be split into four pieces

{
    "_id" : ObjectId("5aa63f7034cdbd6928f831ff"),
    "domain" : "metacafe.com",
    "cdn" : "LLNW",
    "geo_bytes" : {
        "bytes" : 223,
        "geo" : "Malta"
    }
    "time" : 1520697600
}

{
    "_id" : ObjectId("5aa63f7034cdbd6928f831ff"),
    "domain" : "metacafe.com",
    "cdn" : "LLNW",
    "geo_bytes" : {
         "bytes" : 446,
         "geo" : "Canada"
    }
    "time" : 1520697600
}

...

Then we can use the pipeline to aggregate.

ref

Topics: MongoDB