I have three CSV files, each several gigabytes in size. Streaming the data is a given. While streaming, what is the least expensive way to compare two of the files against each other? I would rather not load them into memory, as my limit is 2 GB.
I was thinking of loading one feed completely into Redis and then comparing each streamed chunk of the other file against that base. Any record with no match would be added to Redis, so the Redis database would end up storing only unique records.
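To make the idea concrete, here is a minimal sketch of it in Python. It is an illustration under stated assumptions, not a tested solution for files of this size: a plain in-memory `set` stands in for Redis, the comparison is by whole row, each row is reduced to a 16-byte BLAKE2b digest so the store never holds full rows, and the function names (`row_digest`, `load_base`, `stream_new_rows`) are made up for this example.

```python
import csv
import hashlib
import io


def row_digest(row):
    """Return a fixed-size 16-byte fingerprint for one parsed CSV row."""
    # Assumption: no field contains the unit separator character \x1f.
    joined = "\x1f".join(row).encode("utf-8")
    return hashlib.blake2b(joined, digest_size=16).digest()


def load_base(lines, seen):
    """Stream the base file once and record a digest for every row."""
    for row in csv.reader(lines):
        seen.add(row_digest(row))


def stream_new_rows(lines, seen):
    """Yield rows whose digest is not in `seen`, recording them as we go."""
    for row in csv.reader(lines):
        digest = row_digest(row)
        if digest not in seen:
            seen.add(digest)  # later duplicates of this row are skipped
            yield row


# Tiny in-memory stand-ins for the real files (hypothetical data).
base = io.StringIO("id,name\n1,alice\n2,bob\n")
other = io.StringIO("id,name\n2,bob\n3,carol\n3,carol\n")

seen = set()  # stand-in for a Redis set
load_base(base, seen)
new_rows = list(stream_new_rows(other, seen))
print(new_rows)
```

With real files, `base` and `other` would be file objects opened with `open(path, newline="")`, which `csv.reader` consumes one line at a time. A Python `set` of digests costs noticeably more than 16 bytes per entry, so for very large row counts it may not fit in 2 GB. Moving `seen` into a Redis set would take that memory out of the process; with the redis-py client, `sadd` reports how many members were newly added, which would cover the membership check and the insert in one call.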