Sunday 28 May 2017

How can I get a page's JS scripts to run when requesting its HTML in Node.js with the request module?

I'm trying to scrape a website using Node.js and the request module. So far, I've been able to connect to the target site and retrieve the page's HTML using a query string from the site, but I'm now realizing that the site renders its data with a client-side script.

Here is the HTML that I'm getting back.

C:\Users\ZHarriott>node test
error: null
statusCode: 200
body: <!doctype html>
<html>
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  <link rel="canonical" href="https://bethesda.net">

  <title>Bethesda.net</title>
  <meta name="description" content="The official site for Bethesda, publisher of Fallout, DOOM, Dishonored, Skyrim, Wolfenstein, The Elder Scrolls, more. Your source for news, features & community.">
  <meta name="keywords" content="Fallout,Fallout 4,Fallout 3,Fallout New Vegas,DOOM,Dishonored,Dishonored 2,The Elder Scrolls,The Elder Scrolls Online,The Elder Scrolls Online Tamriel Unlimited,Wolfenstein,Wolfenstein The Old Blood,Wolfenstein The New Order,The Evil Within,The Elder Scrolls V Skyrim,Skyrim,Rage,Brink,Wet,The Elder Scrolls IV Oblivion,Oblivion,Bethesda Game Studios,ZeniMax Online Studios,id Software,Arkane Studios,MachineGames,Machine Games,Tango Gameworks,Bethesda Softworks,Todd Howard">
  <meta name="author" content="">
  <meta property="og:site_name" content="Bethesda.net">
  <meta property="twitter:site" content="@bethesda">
  <meta property="twitter:title" content="Bethesda.net">

  <link href="https://bethesda.net" hreflang="x-default" rel="alternate">
  <link href="https://bethesda.net/en/dashboard" hreflang="en" rel="alternate">
  <link href="https://bethesda.net/de/dashboard" hreflang="de" rel="alternate">
  <link href="https://bethesda.net/es/dashboard" hreflang="es" rel="alternate">
  <link href="https://bethesda.net/fr/dashboard" hreflang="fr" rel="alternate">
  <link href="https://bethesda.net/it/dashboard" hreflang="it" rel="alternate">
  <link href="https://bethesda.net/pl/dashboard" hreflang="pl" rel="alternate">

  <meta name="referrer" content="origin">
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">

  <link rel="stylesheet" type="text/css" href="/main.css">
  <script src="https://cdn02.bethesda.net/contentful@3.8.1/browser-dist/contentful.min.js"></script>
</head>
<body>
  <!--[if lte IE 9]>
    <p class="browserupgrade"><h3>You are using an <strong>outdated</strong> browser.</h3></p>
    <p class="browserupgrade">Many things may be non-functional if you continue, please upgrade your browser to improve your experience.</p>
  <![endif]-->
  <noscript>
    <p class="browserupgrade">Please enable javascript to use this site.</p>
    <META HTTP-EQUIV="Refresh" CONTENT="0;URL=nojs.html">
  </noscript>
  <app></app>
  <section id="_bnContent"></section>
  <globalfooter></globalfooter>

  <script>
  // Please do not use www
  if (window.location.hostname === 'www.bethesda.net') {
    window.location.replace('https://bethesda.net/' + window.location.hash)
  }
  try {
    // This ensures the user is using javascript, this is required for bethesda.net
    document.getElementsByTagName('html')[0].classList.remove('no-js')
  } catch (e) {
    console.log(e)
  }
  </script>
  <script src="/sites/main.js"></script>
</body>
</html>


C:\Users\ZHarriott>

It looks like the site doesn't know how to handle this request because I'm not actually using a web browser. Any thoughts on this?
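For what it's worth, one way to confirm that the response is only an unrendered shell is to check whether the page's mount elements come back empty. The request module returns the response body as-is and does not execute any scripts in it, so elements that are filled in by /sites/main.js would stay empty. The sketch below is only an illustration: findEmptyMountPoints is a hypothetical helper, not part of request, and sampleBody is a trimmed copy of the body shown above. A regular expression is used here purely to keep the example dependency-free; it is not a general HTML parser.

```javascript
// Hypothetical helper (not part of the request module): returns the tag names
// from `tagNames` that appear in `html` as elements with no content.
function findEmptyMountPoints(html, tagNames) {
  return tagNames.filter((tag) => {
    // Matches <tag ...> followed only by optional whitespace and then </tag>.
    const emptyElement = new RegExp(`<${tag}\\b[^>]*>\\s*</${tag}>`, 'i');
    return emptyElement.test(html);
  });
}

// Trimmed copy of the <body> returned in the response above.
const sampleBody = `
<body>
  <app></app>
  <section id="_bnContent"></section>
  <globalfooter></globalfooter>
  <script src="/sites/main.js"></script>
</body>`;

// Assumption: these three elements are where the site's script renders content.
const emptyMountPoints = findEmptyMountPoints(sampleBody, [
  'app',
  'section',
  'globalfooter',
]);

console.log('Empty mount points:', emptyMountPoints);
```

If all of the suspected mount elements are reported as empty, the data I'm after is not in the HTML that request receives and only appears once the page's scripts have run.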



via Zach Harriott
