Tuesday, 9 May 2017

Process 50k webpages at runtime (Node.js)

I need to download ~50k webpages, extract some data from each of them, and store it in a variable.

I wrap each request in a Promise and then pass them all to Promise.all(). I use the Request library.

Simplified code:

const request = require('request');
const urls = [url1, url2, ...];
const promises = [];

urls.forEach(url => {
    promises.push((resolve, reject) => {
        request(url, (error, response, body) => {
            if(error){ reject(error); return; }

            // do something with page

            resolve(someData);
        });
    });
});

Promise.all(promises.map(pr => new Promise(pr)))
    .then((someDataArray) => { /* process data */ });

But I get an ENFILE error, which means too many files are open in the system (on my desktop the maximum number of open files is 2048).

I know that a Promise's executor runs as soon as the Promise is created, so all the requests start at once, but I can't work out how to solve this problem.
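
To illustrate the point about Promises running on creation, here is a minimal sketch with no network calls. The function name demoEagerExecutor and the log array are hypothetical and exist only for this illustration.

```javascript
// Hypothetical demo: shows that the executor passed to `new Promise`
// runs synchronously, inside the constructor call itself.
function demoEagerExecutor() {
    const log = [];

    new Promise(resolve => {
        // This line runs before the constructor returns.
        log.push('executor ran');
        resolve();
    });

    log.push('after constructor');
    return log;
}
```

In the code above this means that mapping 50k executors through new Promise starts 50k requests in the same tick, before any of them has finished.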

Maybe there is another approach? Thanks for any response.
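
One possible direction, as a rough sketch rather than a definitive answer, is to cap how many requests are in flight at once instead of creating every Promise up front. The helper name mapWithLimit and its parameters are assumptions made for this illustration; they are not part of the Request library.

```javascript
// Hypothetical helper: calls `worker(item, index)` for every item, with at
// most `limit` calls in flight at any time. Resolves to an array of results
// in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
    const results = new Array(items.length);
    let next = 0; // index of the next unclaimed item

    // Each runner claims one item at a time until none are left.
    // JavaScript is single-threaded, so `next++` cannot race.
    async function runner() {
        while (next < items.length) {
            const i = next++;
            results[i] = await worker(items[i], i);
        }
    }

    // Start no more runners than there are items.
    const runners = [];
    for (let n = 0; n < Math.min(limit, items.length); n++) {
        runners.push(runner());
    }

    // Rejects as soon as any worker rejects, like Promise.all.
    await Promise.all(runners);
    return results;
}
```

With the code from the question, worker would be a function that takes a url and returns a new Promise wrapping the request(url, callback) call, and the final step would become mapWithLimit(urls, 100, worker).then(...). The limit of 100 is an arbitrary value chosen to stay well under the 2048 open-file cap.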



via Ted Romanus
