Sunday, 28 May 2017

How to get the HTML of a page that loads dynamically

I'm trying to scrape a website for content. I was able to get the HTML of the main page using Node.js, but I have found that the page is generated dynamically. Resources are requested from other sources, and they haven't arrived yet when the HTML is grabbed on my end.

I've verified this with PhantomJS, which let me take a screenshot of the page as it was being grabbed. The page definitely loads, but it's completely barren.

What I'm asking is, can this be done? Is CORS the issue here, or do I need to delay the loading somehow? Any thoughts are appreciated.
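
One possible direction, sketched under assumptions: CORS is enforced by browsers on cross-origin requests made from page scripts, so a plain HTTP request from Node.js would not normally be blocked by it. The likelier cause is that the content is filled in by client-side scripts after the initial HTML arrives, in which case the page has to be rendered in a headless browser and the HTML read only once the content exists. The helper below is a hypothetical `waitFor` polling function (not part of any library); the `#content` id and the simulated `html` variable are stand-ins for whatever check the real headless browser driver would perform.

```javascript
// Hypothetical helper: call `check` repeatedly until it returns a truthy
// value, or reject once `timeoutMs` has elapsed.
function waitFor(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;

    const poll = async () => {
      try {
        const result = await check(); // check may be sync or async
        if (result) return resolve(result);
        if (Date.now() >= deadline) {
          return reject(new Error('waitFor: timed out'));
        }
        setTimeout(poll, intervalMs);
      } catch (err) {
        reject(err);
      }
    };

    poll();
  });
}

// Simulated stand-in for a page whose markup is filled in late.
// In real use, `check` would ask the headless browser whether the
// element of interest (assumed here to be #content) is present.
let html = '';
setTimeout(() => { html = '<div id="content">loaded</div>'; }, 300);

waitFor(() => (html.includes('id="content"') ? html : null), { timeoutMs: 5000 })
  .then((rendered) => console.log(rendered))
  .catch((err) => console.error(err.message));
```

Polling for a specific element tends to be more reliable than a fixed delay, since it returns as soon as the content appears and fails loudly if it never does.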



via Zach Harriott
