Wednesday, 31 May 2017

Htmlparser2 parse to get links and then parse those links (node.js)

I'm using htmlparser2 in Node.js to parse an HTML page. I parse one page to get the links to other pages. Then I would like to parse those linked pages (with a different parsing function than the one used on the first page) to get some other information that I need besides the links. My problem is that I do not know how to parse multiple pages. If I put the links in an array, then loop through it and call the parser for each page, it doesn't work because the requests are asynchronous. Even if I issued multiple requests in a for loop, it wouldn't parse all the links, and I'm still stuck with the problem of getting the request result out of the request callback.

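var request = require('request');

request(link, function (error, response, body) {
    if (error) {
        return console.error(error);
    }
    var obj = parsingData(body); // parsingData is my parsing function for the first page

    for (var i = 0; i < obj.length; i++) {
        var newLink = obj[i].link;

        // Each nested request completes later, after the loop has already finished.
        request(newLink, function (error, response, body) {
            if (error) {
                return console.error(error);
            }
            var pObj = parsingPasma(body); // parsingPasma is my parsing function for the linked pages
            console.log(pObj);
        });
        // how would I get pObj here, to update the obj array + wait for the request to finish?
    }
});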
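
One possible way to approach the question in the comment above is to count outstanding requests and call a final callback once the last one has finished. The following is only a minimal sketch under assumptions: fetchAll is a hypothetical helper name, fetchBody stands in for any function with the same (url, callback) shape as request, and parseDetail stands in for parsingPasma. Using forEach instead of a for loop with var also gives each callback its own item, so every result can be attached to the entry it belongs to.

```javascript
// Hypothetical helper (not part of request or htmlparser2).
// fetchBody(url, callback) is assumed to call back with (error, response, body).
// parseDetail(body) is assumed to return the parsed data for one linked page.
// done(error, items) is called exactly once: on the first error, or when all items are filled in.
function fetchAll(fetchBody, items, parseDetail, done) {
    var remaining = items.length; // number of requests that have not finished yet
    var failed = false;           // set once an error has been reported

    if (remaining === 0) {
        return done(null, items);
    }

    items.forEach(function (item) {
        fetchBody(item.link, function (error, response, body) {
            if (failed) {
                return; // an earlier request already reported an error
            }
            if (error) {
                failed = true;
                return done(error);
            }
            item.details = parseDetail(body); // update the entry this request belongs to
            remaining -= 1;
            if (remaining === 0) {
                done(null, items); // every request has finished
            }
        });
    });
}
```

Inside the first request callback this could be called as fetchAll(request, obj, parsingPasma, function (error, items) { ... }), and the updated array would then be available in that last callback rather than right after the loop.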



via squirrel
