I'm trying to scrape and save the results into my database. I'm using NodeJS (sails.js framework)
This is a working example using cheerio:
const cheerio = require('cheerio');

getRequest('some-url')
  .then((data) => {
    const $ = cheerio.load(data);
    // Iterate over every element matching .title and save its text
    $('.title').each(function (i, element) {
      const title = $(this).text();
      MyModel.create({title: title}).exec((err, event) => {
      });
    });
  });
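(For context, getRequest is just a small promise-based HTTP helper; a minimal sketch of it, with axios as an illustrative choice of client, would be:)

const axios = require('axios');

// Sketch of the helper: resolves with the raw HTML of the page
function getRequest(url) {
  return axios.get(url).then((response) => response.data);
}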
The problem with cheerio is that it doesn't act as a browser: it only parses the HTML it receives and never executes JavaScript, so pages that render their content client-side come back empty.
So I decided to try Nightmare.js, and it was a nightmare to do the same:
var articles = [];

Promise.resolve(nightmare
  .goto('some-url')
  .wait(0)
  .inject('js', 'assets/js/dependencies/jquery-3.2.1.min.js')
  .evaluate((articles) => {
    // This callback runs inside the page, not in Node
    var article = {};
    var list = document.querySelectorAll('h3 a');
    var elementArray = [...list];
    elementArray.forEach(el => {
      article.title = el.innerText;
      articles.push(article); // pushes the same object every time
      // fails: Sails models don't exist in the browser context
      myModel.create({title: article.title}).exec((err, event) => {
      });
    });
    return articles;
  }, articles)
  .end())
  .then((data) => {
    console.log(data);
  });
The problems: News is not defined inside the evaluate() function (News is the model created by Sails.js; it appears as myModel in the snippet above), and evaluate() seems to accept only serializable arguments, so the model can't be passed into it.
Also, the articles array is populated with the same data: every element is the single article object, which gets mutated on each iteration.
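What I think the flow should look like (a minimal, untested sketch, assuming the model is named News): return only plain, serializable data from evaluate() and do the database writes back in Node, building a fresh object per element so the entries don't all alias the same one:

const Nightmare = require('nightmare');
const nightmare = Nightmare();

nightmare
  .goto('some-url')
  .evaluate(() => {
    // Browser context: return serializable data only
    return [...document.querySelectorAll('h3 a')]
      .map(el => ({title: el.innerText})); // fresh object per element
  })
  .end()
  .then((articles) => {
    // Node context: the Sails.js model is available here
    articles.forEach((article) => {
      News.create({title: article.title}).exec((err, event) => {
        if (err) { sails.log.error(err); }
      });
    });
  });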
Is there any simpler way to scrape a webpage after DOM render using NodeJS?
via TheUnreal