Monday, 8 May 2017

Accessing external parameters and the original document in a MongoDB group aggregation

In a collection with the following general structure:

{_id: 'id1', clientId: 'cid1', clientName:'Jon', item: 'item1', dateOfPurchase: '...'},
{_id: 'id2', clientId: 'cid1', clientName:'Jon', item: 'item2', dateOfPurchase: '...'},
{_id: 'id3', clientId: 'cid2', clientName:'Doe', item: 'itemX', dateOfPurchase: '...'}
... etc

The objective is to create a grouping by clientId to calculate some simple statistics, e.g. total occurrences per clientId.

One way to achieve this using the Collection.group method of the Node.js MongoDB driver API is:

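db.collection.group(
    'clientId',          // key to group by
    {},                  // condition: no filter, include every document
    { count: 0 },        // initial value of each group's result
    function(obj, prev) {
        // called once per document; prev is the group's running result
        prev.count++;
    },
    true
);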

The output of this for the sample data above would be similar to:

{clientId: 'cid1', count: 2}
{clientId: 'cid2', count: 1}
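
To make the semantics of that call concrete, here is a minimal in-memory sketch of what the grouping computes, written in plain JavaScript so it runs without a database. The `groupBy` helper and the `docs` array are hypothetical stand-ins for the server-side behaviour and the sample data; they are not part of the driver API.

```javascript
// Hypothetical in-memory stand-in for the sample collection.
const docs = [
    { _id: 'id1', clientId: 'cid1', clientName: 'Jon', item: 'item1' },
    { _id: 'id2', clientId: 'cid1', clientName: 'Jon', item: 'item2' },
    { _id: 'id3', clientId: 'cid2', clientName: 'Doe', item: 'itemX' }
];

// Hypothetical helper mimicking the group semantics: one accumulator per key,
// seeded from a copy of `initial` and updated by `reduce` for every document.
function groupBy(documents, key, initial, reduce) {
    const groups = new Map();
    for (const doc of documents) {
        if (!groups.has(doc[key])) {
            groups.set(doc[key], Object.assign({ [key]: doc[key] }, initial));
        }
        reduce(doc, groups.get(doc[key]));
    }
    return Array.from(groups.values());
}

const result = groupBy(docs, 'clientId', { count: 0 }, function (obj, prev) {
    prev.count++;
});
console.log(result);
```

The point to note is that the reducer only ever sees the current document and the group's accumulator, which is what makes both questions below non-trivial.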

Question 1: What is the best way to pass external values to the reducer function? For example, I may want to calculate separate counts for purchases made before and after a specific date, and pass that date in as a parameter. I know that with mapReduce I can use the scope option for this purpose; I'm wondering whether there is a way to do the same with the group function. I could use the iterator object, but it feels hacky.
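
One possible workaround, sketched in plain JavaScript so it can be tried without a database: because the reducer runs on the server and cannot see local variables, the external value can ride along in the initial document, which seeds every group's accumulator. This sketch assumes `dateOfPurchase` is an ISO 8601 string (the sample data elides the real values); `cutoff`, `before`, `after`, `purchases` and the local loop are hypothetical names used only for illustration.

```javascript
// Assumption: dateOfPurchase is an ISO 8601 string, so plain string
// comparison orders dates correctly. The sample data elides the real values.
const cutoff = '2017-01-01';

// The external value is carried inside the initial accumulator document.
const initial = { before: 0, after: 0, cutoff: cutoff };

// Reducer in the shape group expects: it reads only `obj` and `prev`.
function reduce(obj, prev) {
    if (obj.dateOfPurchase < prev.cutoff) {
        prev.before++;
    } else {
        prev.after++;
    }
}

// Finalizer that strips the helper field from each result.
function finalize(prev) {
    delete prev.cutoff;
}

// Local dry run with made-up dates, standing in for the server.
const purchases = [
    { clientId: 'cid1', dateOfPurchase: '2016-11-30' },
    { clientId: 'cid1', dateOfPurchase: '2017-03-15' },
    { clientId: 'cid2', dateOfPurchase: '2017-02-01' }
];
const acc = {};
for (const p of purchases) {
    acc[p.clientId] = acc[p.clientId] || Object.assign({ clientId: p.clientId }, initial);
    reduce(p, acc[p.clientId]);
}
const counts = Object.values(acc);
counts.forEach(finalize);
console.log(counts);
```

In a real call, `initial`, `reduce` and `finalize` would go in the corresponding argument positions of Collection.group. The cost is that the cutoff is duplicated into every group until the finalizer removes it.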

Question 2: Is there a way to access the original document from inside the finalize function in order to include some extra data in the results, i.e. to project extra fields from the original documents, such as clientName?

{clientId: 'cid1', count: 2, clientName: 'Jon'}
{clientId: 'cid2', count: 1, clientName: 'Doe'}

Clarifications for Question 2: a) I could add the extra field inside the reducer function, but it feels redundant to include code that does not need to run on every iteration. b) I could use aggregation pipelines to achieve something like this, but I'm wondering whether I can do it with Collection.group here.
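
For comparison, the aggregation pipeline alternative mentioned in b) might look roughly like the sketch below. It only builds the pipeline as a plain object, so it runs without a connection. `buildPipeline` is a hypothetical helper, and the sketch assumes `dateOfPurchase` is stored as a Date that can be compared against the `cutoff` argument.

```javascript
// Hypothetical helper: builds a pipeline that counts purchases per client
// before/after `cutoff` and carries clientName along via $first.
// Assumption: dateOfPurchase is stored as a Date comparable to `cutoff`.
function buildPipeline(cutoff) {
    return [
        {
            $group: {
                _id: '$clientId',
                clientName: { $first: '$clientName' },
                count: { $sum: 1 },
                before: { $sum: { $cond: [{ $lt: ['$dateOfPurchase', cutoff] }, 1, 0] } },
                after: { $sum: { $cond: [{ $gte: ['$dateOfPurchase', cutoff] }, 1, 0] } }
            }
        }
    ];
}

const pipeline = buildPipeline(new Date('2017-01-01T00:00:00Z'));
console.log(JSON.stringify(pipeline, null, 2));
// With a connected driver, this would be passed to collection.aggregate(pipeline).
```

Here the external value is an ordinary JavaScript value embedded in the pipeline, and `$first` picks up clientName without any per-document reducer code, which sidesteps both questions rather than answering them for Collection.group.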



via gevou
