I am working on import routines for Node. So far I can import text nodes from a PDF using pdf2json. This works well, but it doesn't work on PDFs that are image-based and contain no text.
So I downloaded pdf2img. However, there are plenty of issues with this module. The one I have now is that after running it I get a lot of 0-byte PNG files with no content, and this error message:
/docfire/node_modules/gm/lib/command.js:228
proc.stdin.once('error', cb);
^
TypeError: Cannot read property 'once' of undefined
at gm._spawn (/docfire/node_modules/gm/lib/command.js:228:15)
at /docfire/node_modules/gm/lib/command.js:140:19
at series (/docfire/node_modules/array-series/index.js:11:36)
at gm._preprocess (/docfire/node_modules/gm/lib/command.js:177:5)
at gm.stream (/docfire/node_modules/gm/lib/command.js:138:10)
at convertPdf2Img (/docfire/node_modules/pdf2img/lib/pdf2img.js:93:6)
at /docfire/node_modules/pdf2img/lib/pdf2img.js:67:9
at /docfire/node_modules/async/lib/async.js:246:17
at /docfire/node_modules/async/lib/async.js:122:13
at _each (/docfire/node_modules/async/lib/async.js:46:13)
I've tried posting an issue on the module's Git site, but it looks like quite a few people are having exactly the same problem, and there doesn't seem to be any activity on a fix.
What I would ideally like is a way to extract both text and images from a PDF in Node.
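For the image side, one possible workaround is to skip the wrapper module and call an external rasteriser directly. The following is only a minimal sketch, assuming Poppler is installed and its `pdftoppm` command is on the PATH; that is an assumption, not something pdf2img or gm provides. The helper names `buildPdftoppmArgs` and `pdfToPngs` are made up for illustration.

```javascript
// Sketch only: rasterise PDF pages to PNG by calling Poppler's `pdftoppm` CLI.
// Assumption: Poppler is installed and `pdftoppm` is on PATH.
const { execFile } = require('child_process');

// Build the argument list for pdftoppm (hypothetical helper).
// -png selects PNG output, -r sets the resolution in DPI.
function buildPdftoppmArgs(pdfPath, outPrefix, dpi = 150) {
  return ['-png', '-r', String(dpi), pdfPath, outPrefix];
}

// Run pdftoppm and resolve once it has finished writing the page images.
function pdfToPngs(pdfPath, outPrefix, dpi) {
  return new Promise((resolve, reject) => {
    const args = buildPdftoppmArgs(pdfPath, outPrefix, dpi);
    execFile('pdftoppm', args, (err) => {
      if (err) {
        // ENOENT means the binary could not be found, so report that clearly
        // instead of failing later with an unrelated-looking error.
        reject(err.code === 'ENOENT'
          ? new Error('pdftoppm not found; is Poppler installed?')
          : err);
        return;
      }
      resolve();
    });
  });
}
```

Calling `pdfToPngs('scan.pdf', 'out/page')` would write one PNG per page, named from the given prefix plus a page number. Checking for `ENOENT` makes a missing external binary show up as an explicit error. Getting text out of image-only pages would still need a separate OCR step.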
via SPlatten