Thursday, 18 May 2017

Data from MongoDB for MEAN app exceeds max document size

I am building a MEAN app that loads a lot of data from MongoDB in the background. So far I have only been working with a small part of the data to get to know everything, and had:

  • A Collection for "Stations"
  • A Collection for "Trips In"
  • A Collection for "Trips Out"

Then, in my routes.js, I would perform a $lookup to combine them like so:

var aggr = Station.aggregate([
    {
        "$lookup": {
            "from": "Trips_in", // Contains only 1 month
            "localField": "_id",
            "foreignField": "_id",
            "as": "trips_in"
        }
    },
    {
        "$lookup": {
            "from": "Trips_out", // Contains only 1 month
            "localField": "_id",
            "foreignField": "_id",
            "as": "trips_out"
        }
    },
    {
        "$project": {
            "StationName": 1,
            // $lookup returns an array; take its first element (index 0)
            "trips_in": { "$arrayElemAt": ["$trips_in", 0] },
            "trips_out": { "$arrayElemAt": ["$trips_out", 0] }
        }
    }
]);
aggr.options = { allowDiskUse: true };
aggr.exec(function(err, stations) {
    if (err) {
        return res.send(err); // return so that only one response is sent
    }
    res.json(stations);
});

However, I am now trying to work with the whole data set, which is massive. When I put more than one month of data into these collections and run the same lookup, I hit the 16MB document size limit.
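To make the limit concrete: each $lookup embeds the joined trips inside the station document, and MongoDB caps a single BSON document at 16 MB (16 * 1024 * 1024 bytes). Below is a rough sketch for estimating how close a combined document gets. It uses JSON length as a stand-in for BSON size, which is only an approximation, and the helper names and sample station are made up for illustration.

```javascript
// MongoDB's documented maximum BSON document size: 16 MB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: approximates a document's size by its JSON length.
// Real BSON size differs somewhat, so treat this as an estimate only.
function approxDocBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

// True if the estimated size stays under the 16 MB cap.
function fitsInOneDocument(doc) {
  return approxDocBytes(doc) < MAX_BSON_BYTES;
}

// Made-up sample: one station with its embedded trips.
const station = {
  _id: 1,
  StationName: 'Example',
  trips_in: [{ Destination: 2, Duration: [5, 6] }]
};

console.log(approxDocBytes(station), fitsInOneDocument(station));
```

Running a check like this against one month of real data would give a feel for how many months fit before the combined document crosses the cap.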

So my question is: what is a good approach here?

What I already tried: combining several months into one document, like this:

id    trips_in_January    trips_in_February
1     Destination: 2 {    Destination: 5 {
          Duration: 5,       ...
          Duration: 6     }
      }

With this, I hit the 16MB limit after just a couple of months in a single document.

What I think could be the solution:

  • Use more than a single collection? E.g. put each month into its own collection. But then the following questions arise: how can I combine them again on the client side to actually use the data? The lookup stages shown above exceed the max document size when run against all of these separate collections. And even if they did not, the first GET request would take forever.
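One way the per-month idea might be recombined without a single huge lookup is to fetch only the months a view needs and merge them in the application. This is a minimal sketch, not a tested solution: `monthlyTrips` is an in-memory stand-in for the per-month collections, `tripsForStation` and the month keys are hypothetical, and the sample values are made up. In the real app each month access would be a query against that month's collection.

```javascript
// In-memory stand-in for per-month collections: month -> station _id -> trips.
const monthlyTrips = {
  '2017-01': { 1: [{ Destination: 2, Duration: 5 }, { Destination: 2, Duration: 6 }] },
  '2017-02': { 1: [{ Destination: 5, Duration: 7 }] }
};

// Collects one station's trips for only the requested months,
// so no single result has to hold the whole data set.
function tripsForStation(stationId, months) {
  const result = {};
  for (const month of months) {
    const perStation = monthlyTrips[month] || {};
    result[month] = perStation[stationId] || [];
  }
  return result;
}

const combined = tripsForStation(1, ['2017-01', '2017-02']);
console.log(JSON.stringify(combined));
```

Requesting months on demand like this would also keep the first GET request small, since the client asks for more months only when it needs them.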


via ffritz
