Thursday, 11 May 2017

Requesting HTTP HEAD of thousands of URLs via NodeJS

I need to check the availability of about 300,000 URLs on a local server via HTTP. The files are not in a local file system but in a key-value store, and the goal is to sanity-check whether every system that needs access to those files can actually reach them via HTTP.

To do this, I use HTTP HEAD requests, which return HTTP 200 for every file found and 404 for every file not found.

The problem: if I issue too many requests at once, I get rate-limited by nginx or a local proxy, and then I have no information about whether a file is actually accessible.

My method to look for the availability of files looks as follows:

...
const request = require('request-promise'); // Note: plain 'request' does not return promises; request-promise wraps it.
...
const checkEntity = entity => {
    logger.debug("HTTP HEAD ", entity);
    return request({ method: "HEAD", uri: entity.url })
        .then(result => {
            logger.debug("Successfully retrieved file: " + entity.url);
            entity.valid = result !== undefined;
        })
        .catch(err => {
            // Non-2xx responses (e.g. 404) reject the promise.
            logger.debug("Failed to retrieve file.", err);
            entity.valid = false;
        });
};

Calling this function a few times works as expected. But when I chain it through recursive promises, I quickly exceed the maximum call stack, and creating one promise per URL up front uses too much memory.

How could this be solved?
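One direction I'm considering is a fixed-size worker pool that pulls entities from a shared cursor in a plain loop, so there is no recursion and never more than N requests in flight. This is only a sketch; the `checkAll` helper and the stand-in checkEntity below are hypothetical, not my production code:

```javascript
// Hypothetical helper: run checkEntity over all entities with at most
// `concurrency` checks in flight. Each worker loops over a shared index,
// so neither the call stack nor memory grows with the number of URLs.
const checkAll = async (entities, checkEntity, concurrency = 10) => {
    let next = 0; // shared cursor; claims are synchronous, so no races in single-threaded JS
    const worker = async () => {
        while (next < entities.length) {
            await checkEntity(entities[next++]); // claim and check the next entity
        }
    };
    // Start a fixed pool of workers and wait for all of them to drain the list.
    await Promise.all(Array.from({ length: concurrency }, () => worker()));
};

// Demo with a stand-in checkEntity that simply flags every entity as valid.
const entities = Array.from({ length: 100 }, (_, i) => ({ url: "http://localhost/file" + i }));
checkAll(entities, async e => { e.valid = true; }, 10)
    .then(() => console.log(entities.every(e => e.valid))); // → true
```

Keeping `concurrency` below the nginx/proxy rate limit would, in principle, avoid the throttling while still finishing 300,000 checks in bounded memory.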



via Sascha Kaupp
