I am using node-crawler and would like to know how to properly extract all the text from an HTML page and get clean results. The goal is to extract all words and keywords from an HTML document.
$("body").text();
The code above also returns all the JavaScript code inside the body, which is not what I want. I would also like to get the words without tabs or extra whitespace, ideally stored in an array.
Any suggestions for libraries that can do this? Is it possible with jQuery selectors, or should I roll my own functions to parse the HTML the way I want?
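For illustration, here is a minimal sketch of what I have in mind. It assumes `$` is the cheerio-style (jQuery-like) object that node-crawler exposes for the fetched page. The helper names `textToWords` and `extractWords` are hypothetical, and splitting on whitespace is only one possible definition of a "word".

```javascript
// Hypothetical helper: collapse tabs, newlines and repeated spaces,
// then split the text into an array of words.
function textToWords(text) {
  return text
    .replace(/\s+/g, " ") // any run of whitespace becomes a single space
    .trim()
    .split(" ")
    .filter(Boolean); // drop the empty string produced by empty input
}

// Hypothetical helper: assumes `$` is a cheerio/jQuery-style function.
// Works on a clone so the original document is left untouched.
function extractWords($) {
  const body = $("body").clone();
  // Remove elements whose contents are code or styling, not readable text.
  body.find("script, style, noscript").remove();
  return textToWords(body.text());
}
```

With this sketch, `extractWords($)` would be called wherever `$("body").text()` is called now. Punctuation stays attached to the words, so a further cleanup step may be needed for keyword extraction.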
via Azarus