Monday, 3 April 2017

Node.js exec wget command

I’m writing a Node.js application to download entire websites using the Unix “wget” command, but I have a problem with some URLs inside the downloaded pages: .html appears at the end of the file names, e.g.

<img src="images/photo.jpeg.html"> or <script src="js/scripts.js.html">

The code I’m using is the following:

    var util = require('util'),
        exec = require('child_process').exec,
        child,
        url = 'http://www.example.com/';

    child = exec('wget --mirror -p --convert-links --html-extension -e robots=off -P /destination_folder/ ' + url,
      function (error, stdout, stderr) {
        console.log('stdout: ' + stdout);
        console.log('stderr: ' + stderr);
        if (error !== null) {
          console.log('exec error: ' + error);
        }
      });

N.B. If I run this command (wget --mirror -p --html-extension --convert-links -e robots=off -P . http://www.example.com) directly in the Unix shell, it works correctly.
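
For comparison, here is a minimal sketch of the same invocation using execFile instead of exec, so the flags are passed to wget as an argument array with no intermediate shell. The helper names buildWgetArgs and mirrorSite and the maxBuffer value are assumptions made for illustration; they are not part of the original code. This is not a confirmed fix for the .html suffix, only a way to make the Node.js call easier to compare with the shell command.

```javascript
// Sketch only: same wget flags as in the question, passed as an argument array.
const { execFile } = require('child_process');

// Hypothetical helper: builds the wget argument list for a URL and a
// destination directory (the value given to -P).
function buildWgetArgs(url, destDir) {
  return [
    '--mirror',
    '-p',
    '--convert-links',
    '--html-extension',
    '-e', 'robots=off',
    '-P', destDir,
    url,
  ];
}

// Hypothetical helper: runs wget without a shell and logs its output.
function mirrorSite(url, destDir) {
  const args = buildWgetArgs(url, destDir);
  // Log the exact arguments so they can be compared with the shell command.
  console.log('running: wget ' + args.join(' '));
  // maxBuffer is raised to an assumed 10 MB because wget --mirror can be
  // verbose on stderr, and the child is killed when the buffer overflows.
  return execFile('wget', args, { maxBuffer: 10 * 1024 * 1024 },
    function (error, stdout, stderr) {
      console.log('stdout: ' + stdout);
      console.log('stderr: ' + stderr);
      if (error !== null) {
        console.log('exec error: ' + error);
      }
    });
}
```

Calling mirrorSite('http://www.example.com/', '.') would then run wget with the same URL and destination as the shell command above, which removes the differing -P value as a variable when comparing the two results.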

I don’t understand where the problem is. Could you help me, please?

Thank you



via S Madry
