Saturday, 6 May 2017

How to import an existing PDF file in node.js

I am working on import routines for Node.js. So far I can import text nodes from a PDF using pdf2json, which works well, but it doesn't work on PDFs that are image-based and contain no text.
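To illustrate the text-versus-image distinction, here is a minimal sketch of how a parsed result could be checked for text before falling back to image conversion. It assumes the parsed object follows the Pages → Texts → R → T layout with URI-encoded strings, which is how I understand pdf2json's output; the helper names `extractPageTexts` and `looksImageOnly` and the `sample` object are hypothetical and only stand in for real parser output.

```javascript
// Hypothetical helper: collect the text of each page from a parsed PDF object.
// Assumption: pages live under `Pages`, each with `Texts`, whose runs `R`
// carry a URI-encoded string in `T`.
function extractPageTexts(pdfData) {
  // Some versions appear to nest the pages under `formImage`; accept both shapes.
  const root = (pdfData && pdfData.formImage) || pdfData || {};
  const pages = Array.isArray(root.Pages) ? root.Pages : [];
  return pages.map((page) =>
    (page.Texts || [])
      .map((text) =>
        (text.R || []).map((run) => decodeURIComponent(run.T || "")).join("")
      )
      .join(" ")
      .trim()
  );
}

// Hypothetical helper: true when no page yields any text at all,
// which suggests the PDF consists of scanned images.
function looksImageOnly(pdfData) {
  return extractPageTexts(pdfData).every((pageText) => pageText.length === 0);
}

// Made-up sample data standing in for real parser output.
const sample = {
  Pages: [
    { Texts: [{ R: [{ T: "Hello%20world" }] }, { R: [{ T: "again" }] }] },
    { Texts: [] },
  ],
};

console.log(extractPageTexts(sample));
console.log(looksImageOnly(sample));
```

A file for which `looksImageOnly` returns true would be the kind that needs an image-based route instead of text extraction.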

So I downloaded pdf2img, but there are plenty of issues with this module. The one I have now is that after running it I get a lot of 0-byte PNG files with no content, and an error message:

    /docfire/node_modules/gm/lib/command.js:228
        proc.stdin.once('error', cb);
                  ^

    TypeError: Cannot read property 'once' of undefined
        at gm._spawn (/docfire/node_modules/gm/lib/command.js:228:15)
        at /docfire/node_modules/gm/lib/command.js:140:19
        at series (/docfire/node_modules/array-series/index.js:11:36)
        at gm._preprocess (/docfire/node_modules/gm/lib/command.js:177:5)
        at gm.stream (/docfire/node_modules/gm/lib/command.js:138:10)
        at convertPdf2Img (/docfire/node_modules/pdf2img/lib/pdf2img.js:93:6)
        at /docfire/node_modules/pdf2img/lib/pdf2img.js:67:9
        at /docfire/node_modules/async/lib/async.js:246:17
        at /docfire/node_modules/async/lib/async.js:122:13
        at _each (/docfire/node_modules/async/lib/async.js:46:13)

I've tried posting an issue on the module's Git site, but quite a few people seem to be having exactly the same problem and there doesn't appear to be any activity on a fix.

What I would ideally like is a way to extract both text and images from a PDF in Node.js.



via SPlatten
