Friday 5 May 2017

MapReduce to remove duplicate strings

I have a map function that extracts the domain name from each email address and emits it to a reduce function, which counts the number of occurrences of each domain.

[
    { email:"xyz@gmail.com"},
    { email:"abc@abc.com"},
    { email:"inder@hotmail.com"},
    { email:"Ravi@Hotmail.com"},
    { email:"xxx@GMail.com"},
]

Here is the mapReduce call:

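db.collection.mapReduce(
    // Map: emit the part of the email after '@' as the key, with a count of 1.
    function() {
        emit(this.email.substr(this.email.indexOf('@') + 1), 1);
    },
    // Reduce: sum the counts emitted for each domain.
    function(host, counts) {
        return Array.sum(counts);
    },
    { out: "hosts" }
)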

The call works, but domains that differ only in letter case come out as separate keys:

   gmail.com
   abc.com
   hotmail.com
   Hotmail.com
   GMail.com

What I want instead is:

   gmail.com
   abc.com
   hotmail.com

I don't want the same domain to be listed more than once just because some of its letters are capitalized. Any ideas on how to merge duplicates that differ only in capitalization? A relevant example would also help.
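
One possible approach is to lowercase the domain before it is emitted, so that "Hotmail.com" and "hotmail.com" share the same key. The sketch below illustrates the idea in plain JavaScript, outside MongoDB. The names mapDomain, reduceCounts and runMapReduce are hypothetical helpers invented for this illustration; runMapReduce is only a small in-memory stand-in for the grouping that mapReduce performs, not MongoDB's implementation.

```javascript
// Sample documents from the question.
const docs = [
    { email: "xyz@gmail.com" },
    { email: "abc@abc.com" },
    { email: "inder@hotmail.com" },
    { email: "Ravi@Hotmail.com" },
    { email: "xxx@GMail.com" },
];

// Map step (hypothetical helper): take the part after '@' and lowercase it,
// so keys that differ only in letter case collapse into one.
function mapDomain(doc) {
    const domain = doc.email.slice(doc.email.indexOf("@") + 1).toLowerCase();
    return [domain, 1];
}

// Reduce step (hypothetical helper): sum the counts collected for one key.
function reduceCounts(key, values) {
    return values.reduce((total, value) => total + value, 0);
}

// In-memory stand-in for mapReduce (an assumption for illustration only):
// group the mapped values by key, then reduce each group.
function runMapReduce(documents, mapFn, reduceFn) {
    const groups = new Map();
    for (const doc of documents) {
        const [key, value] = mapFn(doc);
        if (!groups.has(key)) {
            groups.set(key, []);
        }
        groups.get(key).push(value);
    }
    const result = {};
    for (const [key, values] of groups) {
        result[key] = reduceFn(key, values);
    }
    return result;
}

const counts = runMapReduce(docs, mapDomain, reduceCounts);
console.log(counts);
```

In the mongo shell, the equivalent change would presumably be to append .toLowerCase() to the key passed to emit in the map function above, leaving the reduce function unchanged.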



via Inder R Singh
