Tuesday 16 May 2017

Why can't I seem to use an HTML document to construct a jsdom object?

Goal:
Extract values from specific tags on an external webpage.

Method:
Perform an HTTP request and construct a jsdom object with the response. Use jsdom's query selectors to get the values from the tags.

Problem:
When I try to read the value of any tag, for example with console.log(dom.window.document.querySelector("h4").textContent); I get an error: "Cannot read property 'textContent' of null".

I assume this means that the jsdom object is not being constructed properly because of a problem with the chunk parameter (chunk is a string taken from the response).

Discussion:
My guess is that there's a problem with quote escaping within the response chunk, but my regex attempts have not produced any results. The call dom.window.document.querySelector("h4").textContent works fine if I pass a simple string like <html><body><h4>testing</h4></body></html>.

The Important Section of the Code:

res.on('data', (chunk) => {
  console.log(typeof chunk); // string
  const dom = new JSDOM(chunk);
  console.log(dom.window.document.querySelector("h4").textContent);
});

All of the Code:

const querystring = require('querystring');
const http = require('http');
const jsdom = require("jsdom");
const { JSDOM } = jsdom;

const postData = querystring.stringify({
  'id': '1'
});

const options = {
  hostname: 'www.southernnbtruckers.ca',
  port: 80,
  path: '/search/info/6',
  method: 'POST',
  headers: {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Content-Length': Buffer.byteLength(postData)
  }
};

const req = http.request(options, (res) => {
  res.setEncoding('utf8');

  // Problem is likely to do with the HTTP response (chunk)
  res.on('data', (chunk) => {
    console.log(typeof chunk); // string
    const dom = new JSDOM(chunk);
    console.log(dom.window.document.querySelector("h4").textContent); // Cannot read property 'textContent' of null
  });
  res.on('end', () => {
    // Do stuff
  });
});

req.on('error', (e) => {
  console.error(`problem with request: ${e.message}`);
});

req.write(postData);
req.end();  
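
One thing that may be worth ruling out before blaming quote escaping: Node can emit 'data' several times for a single response, once per piece of the body. A handler that parses inside 'data' may therefore be given only part of the page, and a piece that contains no h4 would make querySelector return null. The sketch below is an assumption about the cause, not a confirmed diagnosis. It buffers the pieces and hands over the full string on 'end'. The helper name collectBody and the fake response used to exercise it are hypothetical, not part of the code above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: buffer every 'data' chunk and pass the complete
// body to onDone once the stream emits 'end'.
function collectBody(res, onDone) {
  let body = '';
  res.on('data', (chunk) => {
    // Assumes res.setEncoding('utf8') was called, so each chunk is a string.
    body += chunk;
  });
  res.on('end', () => onDone(body));
}

// Stand-in for an HTTP response whose body arrives in two pieces,
// split in the middle of the <h4> tag.
const fakeRes = new EventEmitter();
let fullBody = null;
collectBody(fakeRes, (body) => { fullBody = body; });
fakeRes.emit('data', '<html><body><h');
fakeRes.emit('data', '4>testing</h4></body></html>');
fakeRes.emit('end');

console.log(fullBody);
```

In the request callback above, this could be called as collectBody(res, (body) => { const dom = new JSDOM(body); ... }), with the querySelector call moved inside that callback so it only runs against the whole document.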

A bit messy but here's the HTML string for reference:

<html><head><base href="http://www.southernnbtruckers.ca/"><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /><meta name="generator" content="People and Groups" /><meta name="keywords" content="" /><title>TRUCKERS - Search</title><link rel="stylesheet" type="text/css" href="/core/styles/style_custom.php?org_name=truckers&language=english" /></head><body><a name="top"></a><div id="Container"><div id="Header"><table id="Lang"><tr><td valign="middle">&nbsp;&nbsp;&nbsp;<a href="/login">Login</a></td></tr></table></div><div id="MainNav"><div id="Nav"><ul><li class="LeftSelectedNav"><a href="/search">Home</a></li><li><a href="/contact_information">Contact</a></li><li style="float:right;" class="right">&nbsp;</li></ul></div></div><div id="MainContent"><div id="SideNav"><div id="Sub"><ul><li id="SubTitle">Home</li><li class="subsub"><a href="/our_mission_statemnt">Our Mission Statement</a
></li><li class="subsub"><a href="/our_executive">Our Executive</a></li><li class="currentSub"><a href="/search">Search</a></li><li id="SubSpacer"></li></ul><ul><li class="SubBlank"><h4>PO Box 342, Harvey, York Co. NB  E6K 3W9</h4><center><table class="featureImageTable"><tr><td><img src="/uploads/Website_Assets/truckers-sidetest.jpg"  alt="Side Test" title=""/></td></tr><tr><td></td></tr></table></center><br/><br/><a href="http://www.partsfortrucks.com" target="_tab"><center><table class="featureImageTable"><tr><td><img src="/uploads/Website_Assets/PartTrucks.jpg"  alt="Parts Trucks 300px" title=""/></td></tr><tr><td></td></tr></table></center></a></p></li><li id="SubSpacer"></li></ul></div></div><div id="Main"><table class="data" id="mainContentTable" cellspacing="0" cellpadding="0" width="100%"><tr><td valign="top"><h1>Truckers Search</h1>Find what you need!  This database is easy to use - if you're looking for a specific piece of equipment for hire just use the pull down menu that says "Company Name" and locate the equipment you require, then press return or the filter button.  If you're looking for a company to work in a specific county in New Brunswick - just use the pull down menu to identify the county.  
You can also click on the name of any trucker to bring up their equipment profile and contact information.<hr/><a href="/search">&lt;&lt; Back to search</a><hr><h1>Gary MacBean</h1><div class="contact">Contact: Gary MacBean</div><hr>Address1:&nbsp;150 Sunrise Estates Avenue<br>City:&nbsp;New Maryland<br>Province:&nbsp;NB<br>Postal Code:&nbsp;E3C 1G6<br>Phone:&nbsp;1 506 459-3609<br>Cell Phone:&nbsp;1 506 444-1358<br>Fax:&nbsp;1 506 459-5154<br><hr>Number Of Trucks:&nbsp;2<br>Has Dump Trailer:&nbsp;Yes<br>Has Tandem Dump Truck:&nbsp;Yes<br>Has Belly Dump:&nbsp;Yes<br>Has Asphalt Tarp Spreader:&nbsp;Yes<br><hr>Has Compensation WorkSafeNB:&nbsp;Yes<br>Has Liability Insurance:&nbsp;Yes<br>Has HST Number:&nbsp;Yes<br><hr>Works Province Wide:&nbsp;Yes<br><hr><hr/><a href="/contact_information">Comment, Questions?</a></td></tr></table><div id="Footer"><hr/><p><h2>Serving Central and Southern New Brunswick</h2><br/><br/>Powered by: <a href="http://www.peopleandgroups.com" title="www.peopleandgroups.com">People&Groups</a></p></div></div></div></div></body></html>

Thoughts? Should I use a different method for capturing data from other sites?



via BrandonFlynn-NB
