Tuesday, 30 May 2017

Scraping using Nightmare JS with Node.js

I'm trying to scrape a web page and save the results into my database. I'm using Node.js (the Sails.js framework).

This is a working example using cheerio:

const cheerio = require('cheerio');

getRequest('some-url')
    .then((data) => {
        const $ = cheerio.load(data);
        $('.title').each(function (i, element) {
            const title = $(this).text(); // Title text of the current match

            MyModel.create({ title: title }).exec((err, event) => {
                // Handle err / use the created record here
            });
        });
    });

The problem with cheerio is that it doesn't act as a browser: it only parses the HTML string it is given and never executes JavaScript, so it can't scrape pages whose content is rendered client-side.
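
To illustrate (a toy example, not from my code - the HTML string here is made up): any content a client-side script would inject later simply isn't in what cheerio sees:

const cheerio = require('cheerio');

// A typical initial response for a JavaScript-rendered page: an empty
// container plus a script tag that would fill it in a real browser.
const html = '<div id="app"></div><script src="bundle.js"></script>';
const $ = cheerio.load(html);

console.log($('#app').text()); // '' - the script never runs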

So I decided to try Nightmare JS, and it was a nightmare to do the same:

var articles = [];
Promise.resolve(nightmare
  .goto('some-url')
  .wait(0)
  .inject('js', 'assets/js/dependencies/jquery-3.2.1.min.js')
  .evaluate((articles) => {
    var article = {}; // one object, reused on every iteration
    var list = document.querySelectorAll('h3 a');
    var elementArray = [...list];
    elementArray.forEach(el => {
      article.title = el.innerText;
      articles.push(article);
      News.create({ title: article.title }).exec((err, event) => {
        // News is not defined here: this code runs in the browser page
      });
    });
    return articles;
  }, articles)
  .end())
  .then((data) => {
    console.log(data);
  });

The problems: News is not defined inside the evaluate() function - evaluate seems to accept only serializable values, and News is a model created by Sails.js on the Node side, so it doesn't exist inside the browser page.
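
The usual way around this (a minimal sketch, assuming the model really is named News and the titles sit in h3 a elements, as in the attempt above) is to return plain data from evaluate() and do the database writes back in Node, where the Sails model actually exists:

const Nightmare = require('nightmare');
const nightmare = Nightmare();

nightmare
  .goto('some-url')
  .evaluate(() => {
    // Runs inside the page: only DOM APIs exist here, no Sails, no Node modules.
    // Return plain, serializable data.
    return [...document.querySelectorAll('h3 a')].map(el => el.innerText);
  })
  .end()
  .then((titles) => {
    // Back in Node, where the Sails model is in scope.
    titles.forEach((title) => {
      News.create({ title: title }).exec((err, record) => {
        if (err) { sails.log.error(err); }
      });
    });
  })
  .catch((err) => console.error(err));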

Also, the articles array ends up populated with the same object repeated for every entry.
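
That second issue is plain JavaScript reference semantics rather than anything Nightmare-specific; a self-contained illustration:

// `article` is created once, so every push stores a reference to the same
// object, and the last title written wins.
const articles = [];
const article = {};
['first', 'second'].forEach((title) => {
  article.title = title;
  articles.push(article);
});
console.log(articles); // [ { title: 'second' }, { title: 'second' } ]

// Fix: build a fresh object on each iteration.
const fixed = ['first', 'second'].map((title) => ({ title: title }));
console.log(fixed); // [ { title: 'first' }, { title: 'second' } ]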

Is there any simpler way to scrape a web page after the DOM has rendered, using Node.js?



via TheUnreal
