Wednesday 17 May 2017

Browse HTML with Horseman after scraping

I am using Horseman to scrape a website so that I can build some graphs from the extracted data.
My code already gets the root element of each main section, but I don't know how to browse the elements inside each one.
What I want to do is build JSON from some of the child elements, such as:

  • company-name
  • company-stack (which contains a ul list)

This is my code so far:

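const express = require('express');
const Horseman = require('node-horseman');
const router = express.Router();

router.get('/', function(req, res, next) {
  // All the web scraping magic will happen here
  const url = "http://www.welcometothejungle.co/stacks?q=&hPP=30&idx=cms_companies_stacks_production&p=";

  const pages = [0, 1, 2, 3, 4];
  // One Horseman (PhantomJS) instance per results page; keep each chain's promise
  const scrapes = pages.map((page) => {
    const horseman = new Horseman();
    return horseman
        .open(url + page)
        .html('article')   // HTML contents of the <article> element, as a string
        .then((text) => {
            console.log(text);
        })
        .close();
  });

  // Render only after every page has been scraped and its browser closed
  Promise.all(scrapes)
    .then(() => res.render('index', { title: "Done" }))
    .catch(next);
});

module.exports = router;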

How can I browse the 'text' result variable?
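You can avoid browsing the `text` string altogether. Horseman's `.evaluate()` runs a function inside the page, where the normal DOM API (`querySelectorAll`, `querySelector`, `textContent`) is available, and resolves with the function's JSON-serializable return value. The following is a minimal sketch. It assumes that each company card is an `<article>` containing an element with class `company-name` and a `company-stack` element whose `<li>` items list the technologies. These selectors are guesses based on the question and have not been checked against the real site. The names `extractCompaniesInPage`, `scrapeCompanies` and `scrapeAll` are hypothetical. Horseman drives PhantomJS, whose engine does not support ES2015 syntax such as arrow functions, so the in-page function is written in ES5 and must not rely on closures.

```javascript
// Runs INSIDE the page via horseman.evaluate(): written in ES5 for PhantomJS.
// Assumed (hypothetical) markup: <article> > .company-name and .company-stack li
function extractCompaniesInPage() {
  var articles = document.querySelectorAll('article');
  return Array.prototype.map.call(articles, function (article) {
    var nameEl = article.querySelector('.company-name');
    var items = article.querySelectorAll('.company-stack li');
    return {
      name: nameEl ? nameEl.textContent.trim() : null,
      stack: Array.prototype.map.call(items, function (li) {
        return li.textContent.trim();
      })
    };
  });
}

// Scrapes one results page. `horseman` is a `new Horseman()` instance from
// node-horseman, passed in by the caller.
async function scrapeCompanies(horseman, pageUrl) {
  try {
    // evaluate() serializes the function, runs it in the page, and resolves
    // with its return value (plain objects, i.e. ready-made JSON)
    return await horseman.open(pageUrl).evaluate(extractCompaniesInPage);
  } finally {
    await horseman.close(); // always shut down the PhantomJS process
  }
}

// Hypothetical wiring: one Horseman instance per results page, with the results
// flattened into a single array. `Horseman` is require('node-horseman').
function scrapeAll(Horseman, baseUrl, pages) {
  return Promise.all(pages.map((page) => scrapeCompanies(new Horseman(), baseUrl + page)))
    .then((perPage) => [].concat(...perPage));
}
```

In the Express route you would call `scrapeAll(Horseman, url, pages)` and send the result with `res.json()` or `res.render()` inside `.then()`. That way the response goes out only after every page has been scraped. If you would rather keep `.html('article')`, you have to parse the returned string on the Node side with an HTML parser such as cheerio.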



via Stephane Karagulmez
