I'm trying to write streamed JSON data to MongoDB via JSONStream. A stream is needed because the data can get very large (up to tens of gigabytes), and I'd like to use MongoDB's bulk write capability to speed the process up further. To do that, I buffer the data and bulk-write every 1000 JSON objects or so.
My problem is that when I buffer the writes, not all of the data gets written; the last few thousand objects are left out. That is, if I try to write 100,000 JSON objects, only about 97,000 of them end up in the database. I have tried buffering both the MongoDB bulk write and a normal write, with similarly incorrect results.
My code:
var JSONStream = require('JSONStream');
var mongodb = require('mongodb');

// DB connect boilerplate here
var coll = database.collection('Collection');
var bulk = coll.initializeOrderedBulkOp();
var bufferSizeLimit = 1000;
var recordCount = 0;

var jsonStream = JSONStream.parse(['items', true]);

jsonStream.on('data', (data) => {
  // Queue each parsed object into the current bulk operation
  bulk.insert(data);
  recordCount++;
  // Write when queued commands reach the buffer size limit
  if (recordCount % bufferSizeLimit === 0) {
    bulk.execute((err, result) => {
      // Start a fresh bulk operation for the next batch
      bulk = coll.initializeOrderedBulkOp();
    });
  }
});

jsonStream.on('end', () => {
  // Flush remaining buffered objects to the DB, then close
  if (recordCount % bufferSizeLimit !== 0) {
    bulk.execute((err, result) => {
      database.close();
    });
  }
});
If I substitute the buffered write code with a simple MongoDB insert, the code works properly. Is there anything I'm missing here?
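For reference, the unbuffered version that does work looks roughly like this (a sketch; insertOne and the error logging are approximations of what I actually ran):

jsonStream.on('data', (data) => {
  // One round trip per object: slower, but no records go missing
  coll.insertOne(data, (err, result) => {
    if (err) console.error(err);
  });
});

jsonStream.on('end', () => {
  database.close();
});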
via iambas