My application ingests a feed of social media posts, which I reduce to just their text content. So my RethinkDB database has a table of documents like this:
[{text: "This is a document."}, {text: "This is another document."}, ...]
These posts are constantly being added to the database.
I want to skip saving a new document when a similar one is already stored. In other words, I don't want to keep several posts in which people have essentially said the same thing.
For example:
{text: 'I ate ice-cream today!'} would be similar to {text: 'I ate a big bowl of ice-cream! #icecream'} but not to {text: 'I have visited the Disneyland!'}
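To make "similar" a bit more concrete, here is a rough sketch of one possible measure: Jaccard overlap between the word sets of two posts. This is only an illustration, not a RethinkDB feature. The function names, the tiny stopword list, and the 0.3 threshold are all assumptions chosen for this example.

```python
import re

# Assumed, deliberately tiny stopword list; a real one would be much larger.
STOPWORDS = {"i", "a", "an", "the", "of", "to", "is", "have", "this"}


def tokens(text):
    """Lowercase the text, drop hyphens so 'ice-cream' matches '#icecream',
    and return the set of remaining words minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower().replace("-", ""))
    return set(words) - STOPWORDS


def jaccard(a, b):
    """Jaccard similarity of two texts: |shared words| / |all words|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty token sets are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_similar(a, b, threshold=0.3):
    """True when the overlap reaches the (assumed) threshold."""
    return jaccard(a, b) >= threshold


def should_save(new_text, existing_texts, threshold=0.3):
    """Save only if no already-stored text is similar to the new one."""
    return not any(is_similar(new_text, old, threshold) for old in existing_texts)
```

With these assumptions, the two ice-cream posts share 2 of 5 distinct words (a score of 0.4) and count as similar, while the Disneyland post shares none. Note that `should_save` compares the new post against every stored text, which is exactly the kind of linear scan I would like to avoid.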
What is the most efficient way to handle this, preferably one specific to RethinkDB?
via nainy