Monday 8 May 2017

Node.js: Performance issues parsing CSV and Zip

A CSV file and a zip archive are submitted to my server, and I'm trying to determine whether the CSV is valid and whether all the images referenced in the CSV are present in the zip. I have to populate a Mongo database with all that information, but I want to do that in the background and send a response to the client as fast as possible.

So I have two readable streams, and I have tried three different approaches:

  • Unzipping the file takes 24 seconds, so unzipping, then parsing the CSV and checking each image with fs.exists is not an option.

  • Parsing the whole CSV, saving the filenames in an array, and then reading the zip with node-unzip and pipe takes 5 seconds.

  • Reading the CSV and the zip in parallel, using a shared array to determine simultaneously whether the files are present, takes 4 seconds. This is the fastest option so far.
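To make the third approach concrete, here is a minimal sketch of the shared bookkeeping, assuming each stream can call a function whenever it sees a filename. The names createCrossCheck, onCsvFilename, onZipEntry and missing are hypothetical, and the sketch uses two Sets rather than the single array mentioned above so that lookups stay constant-time. It is not the exact code behind the timings.

```javascript
// Hypothetical helper: tracks filenames reported by two independent streams,
// whichever one happens to emit a given name first.
function createCrossCheck() {
  const waiting = new Set(); // referenced in the CSV, not yet seen in the zip
  const present = new Set(); // zip entries seen so far

  return {
    // Call for every image filename parsed out of a CSV row.
    onCsvFilename(name) {
      if (!present.has(name)) waiting.add(name);
    },
    // Call for every entry path emitted by the zip reader.
    onZipEntry(name) {
      present.add(name);
      waiting.delete(name);
    },
    // Call once both streams have ended: the names left over are missing.
    missing() {
      return Array.from(waiting);
    },
  };
}
```

Each stream only touches the Sets from its own event handlers, so no locking is needed; Node runs those handlers one at a time on a single thread.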

Does anyone have an idea of how to do it faster?

PS: I'm using split2 + through2 + a regexp to check whether the CSV is valid, and it's really fast. The Node version is 7.7.2.
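For reference, the line-by-line check looks roughly like the sketch below. It swaps split2 + through2 for Node's built-in readline module so that it runs without extra dependencies, and the row pattern (a numeric id, a title, an image filename) is an assumption made only for illustration; the real CSV layout is different. The names ROW_PATTERN, isValidRow and findInvalidLines are hypothetical.

```javascript
const readline = require('readline');

// Assumed row shape: "<numeric id>,<title>,<image file>". Illustrative only.
const ROW_PATTERN = /^\d+,[^,]+,[^,]+\.(?:jpe?g|png)$/i;

function isValidRow(line) {
  return ROW_PATTERN.test(line);
}

// Reads `input` (any readable stream) line by line and calls `callback`
// with the 1-based numbers of the non-empty lines that fail the pattern.
// Stream errors are not handled here; attach an 'error' listener to `input`.
function findInvalidLines(input, callback) {
  const lines = readline.createInterface({ input: input });
  const invalid = [];
  let lineNumber = 0;

  lines.on('line', function (line) {
    lineNumber += 1;
    if (line !== '' && !isValidRow(line)) invalid.push(lineNumber);
  });
  lines.on('close', function () {
    callback(invalid);
  });
}
```

It would be called as findInvalidLines(csvStream, function (invalid) { ... }), where csvStream is the readable stream for the uploaded CSV. An empty array means every row matched the pattern.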



via Diego
