Saturday 29 April 2017

Implementation of lowWaterMark in NodeJS streams

I have a streaming HTTP resource which sends around 100 KB per second in a uniform way (i.e. the bytes are equally spaced in time). Of course, during transmission these bytes are packed into packets, but the packets are still relatively small.

I have a NodeJS app which uses "request" to retrieve this data as a Readable stream, which in turn is piped into a Transform stream, which then pushes it out into a Writable stream. As expected, data comes in, gets transformed and is then pushed out to the destination. But my CPU is at 100% load even though there is not much data to process.
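
For illustration, the pipeline looks roughly like this (the URL, the transform body and the destination are placeholders, not my actual code):

    const request = require('request');
    const { Transform } = require('stream');
    const fs = require('fs');

    // The per-chunk work here stands in for the real (heavy) transform.
    const transform = new Transform({
      transform(chunk, encoding, callback) {
        // With small chunks this callback fires far more often than needed.
        callback(null, chunk);
      }
    });

    request('http://example.com/stream')        // placeholder URL
      .pipe(transform)
      .pipe(fs.createWriteStream('out.bin'));   // placeholder destination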

I have investigated the issue and found that in the beginning I get full buffers of data (64K), and then the chunk size drops to 300-5000 bytes. The Transform is relatively heavy and inefficient on small chunks, and the profiler shows heavy use of _read(). So I concluded that I waste plenty of CPU by processing data in small chunks. In many cases that would be fine: it keeps the system responsive. But in my case I am sure that more data is forthcoming, and instead of processing it straight away as it becomes available I would like my stream to pause until a "lowWaterMark" is reached. The problem, of course, is that there is no "lowWaterMark" in NodeJS streams.
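
One way to approximate a lowWaterMark would be an accumulating pass-through Transform placed in front of the heavy one. This is only a sketch under assumptions: LOW_WATER_MARK is an invented name and 64K an arbitrary threshold, and _flush releases whatever remains at end of stream:

    const { Transform } = require('stream');

    const LOW_WATER_MARK = 64 * 1024; // assumed threshold, matching the 64K buffers

    class Accumulator extends Transform {
      constructor(options) {
        super(options);
        this.pending = [];
        this.pendingLength = 0;
      }

      _transform(chunk, encoding, callback) {
        // Collect chunks until the threshold is reached, then emit one big buffer.
        this.pending.push(chunk);
        this.pendingLength += chunk.length;
        if (this.pendingLength >= LOW_WATER_MARK) {
          this.push(Buffer.concat(this.pending, this.pendingLength));
          this.pending = [];
          this.pendingLength = 0;
        }
        callback();
      }

      _flush(callback) {
        // End of stream: release whatever is left, even if below the threshold.
        if (this.pendingLength > 0) {
          this.push(Buffer.concat(this.pending, this.pendingLength));
        }
        callback();
      }
    }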

I know that "blocking" and "NodeJS" must not be used in one sentence, but I wish _read() from HTTP would block until enough data has arrived. Streams pass an advisory value "n" when calling _read(), and I believe it is the amount of free space in the buffer. Therefore a blocking read on the HTTP socket for a particular number of bytes would be the best solution in my case, but I cannot find anything that would allow me to do this.
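
The closest non-blocking equivalent I can see is readable.read(size): it returns null until at least size bytes are in the internal buffer (except at end of stream, where whatever remains is returned), so a consumer can listen for 'readable' events and pull fixed-size chunks. A sketch, with CHUNK_SIZE and consumeInChunks as illustrative names:

    const CHUNK_SIZE = 64 * 1024; // assumed minimum chunk size

    function consumeInChunks(source, onChunk) {
      source.on('readable', () => {
        let chunk;
        // read(size) yields null until CHUNK_SIZE bytes have accumulated;
        // at end of stream it returns the remainder instead.
        while ((chunk = source.read(CHUNK_SIZE)) !== null) {
          onChunk(chunk);
        }
      });
    }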

What would be the best solution to get a Transform stream to process data in big chunks when the data arrives slowly?



via Vladimir Bashkirtsev
