Saturday 15 April 2017

Node: Get image from PDF whose URL has weird query string

As part of #1917Live, I've made a Twitter bot that tweets 100-year-old New York Times articles about Russia.

It uses the New York Times' Article Search API to get the articles and then uses twit to tweet them.

I also try to make the tweets more engaging, as an actual newspaper would. So I parse the headlines to make them more readable, tag users who are part of #1917Live, and add a hashtag.

Now here's the part where I'm stuck. Each article comes with a URL to a PDF file showing how it looked when it was printed. Here's an example. I want to download that PDF, convert the first page into an image, and attach the image to the tweet. This is the simplified code I tried to use to get the PDF:

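var http = require('http');
var fs = require('fs');

// The PDF is identified by the "res" query parameter, not by a .pdf file name.
var url = "http://query.nytimes.com/mem/archive-free/pdf?res=9500E4DC153AE433A25756C1A9629C946696D6CF";

var file = fs.createWriteStream("file.pdf");
var request = http.get(url, function(response) {
  // Whatever body the server sends is written to file.pdf;
  // the status code and headers are not inspected here.
  response.pipe(file);
});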

But this does not work. If I were trying to download a normal PDF file, one whose URL ends in a .pdf file extension, I suspect I wouldn't be having any problems. But this URL identifies the document through a query string instead. Any help would be very much appreciated.
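One thing worth checking, and this is only a guess because the post does not show the actual error, is whether the server answers with a redirect rather than the PDF itself. Node's http.get does not follow redirects, so the snippet above would save the redirect response body instead of the document. The sketch below inspects the status code and follows the Location header, switching to https when needed. The helper names resolveRedirect and download, the redirect limit, and the callback shape are all hypothetical, not part of the original bot.

```javascript
const http = require('http');
const https = require('https');
const fs = require('fs');
const { URL } = require('url');

// Hypothetical helper: turn a Location header (absolute or relative)
// into an absolute URL, using the URL that produced it as the base.
function resolveRedirect(currentUrl, location) {
  return new URL(location, currentUrl).href;
}

// Hypothetical helper: save `url` to `dest`, following at most
// `redirectsLeft` redirects. Calls back with (err, contentType).
function download(url, dest, redirectsLeft, callback) {
  // Pick the module that matches the URL's protocol.
  const client = url.startsWith('https:') ? https : http;

  client.get(url, function (response) {
    const status = response.statusCode;

    // 3xx with a Location header: follow it instead of saving the body.
    if (status >= 300 && status < 400 && response.headers.location) {
      response.resume(); // discard the redirect body
      if (redirectsLeft <= 0) {
        return callback(new Error('Too many redirects'));
      }
      const next = resolveRedirect(url, response.headers.location);
      return download(next, dest, redirectsLeft - 1, callback);
    }

    // Anything other than 200 is treated as a failure in this sketch.
    if (status !== 200) {
      response.resume();
      return callback(new Error('Unexpected status ' + status));
    }

    const file = fs.createWriteStream(dest);
    response.pipe(file);
    file.on('finish', function () {
      // Report the content type so the caller can confirm it is a PDF.
      callback(null, response.headers['content-type']);
    });
    file.on('error', callback);
  }).on('error', callback);
}
```

Calling download(url, "file.pdf", 5, callback) with the URL from the question would then either write the file and report its content type, or pass an error describing the status code that came back. If the reported content type is not application/pdf, the server is probably returning an HTML page (for example a login or landing page), and the problem lies elsewhere.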



via hazards
