Friday 5 May 2017

Express: want to do a big task after returning a response

I have written a program that extracts a 7 GB zip file. It works correctly when run on its own.

Task 1. When Express.js receives a request, it immediately returns a response.

Task 2. After the response is returned, download the zip file specified in the request from S3, unzip it, and upload the extracted files back to S3.

Task 3. If multiple requests arrive at the same time, run Task 2 for each of them one at a time, because running them concurrently uses a lot of memory.

The following code implements Task 2. It works correctly when run on its own.

    // Setup assumed by this snippet: aws-sdk, the `unzip` package, Node's
    // stream module, and the counters used for flow control.
    const AWS = require('aws-sdk')
    const stream = require('stream')
    const unzip = require('unzip')
    const s3 = new AWS.S3()

    let currentFileCount = 0   // entries seen so far
    let uploadedFileCount = 0  // entries fully uploaded
    let allFileCount = 0       // total entries, known once parsing finishes

    const unzipUpload = path => {
        return new Promise((resolve, reject) => {
            let rStream = s3.getObject({Bucket: 'bucket', Key: path})
                .createReadStream()
                    .pipe(unzip.Parse())
                    .on('entry', function (entry) {
                        if (entry.path.match(/__MACOSX/) == null) {

                            // Pause the zip stream when too many uploads are in flight.
                            if (currentFileCount - uploadedFileCount > 10) rStream.pause()

                            currentFileCount += 1
                            let fileName = entry.path
                            let up = entry.pipe(uploadFromStream(s3, fileName))

                            up.on('uploaded', () => {
                                uploadedFileCount += 1
                                console.log(currentFileCount, uploadedFileCount)

                                // Resume once the backlog has drained.
                                if (currentFileCount - uploadedFileCount <= 10) rStream.resume()

                                if (allFileCount > 0 && uploadedFileCount === allFileCount) resolve()
                            }).on('error', e => {
                                reject(e)
                            })
                        } else {
                            // Skipped entries must be drained or the parser stalls.
                            entry.autodrain()
                        }

                    }).on('error', e => {
                        console.log("unzip error")
                        reject(e)
                    }).on('finish', () => {
                        allFileCount = currentFileCount
                        if (uploadedFileCount === allFileCount) resolve()
                    })
            rStream.on('error', e => {
                console.log(e)
                reject(e)
            })
        })
    }

    function uploadFromStream(s3, fileName) {
        let pass = new stream.PassThrough()

        let params = {Bucket: "bucket", Key: "hoge/unzip/" + fileName, Body: pass}
        let request = s3.upload(params, function (err, data) {
            if (err) pass.emit('error', err)
            else pass.emit('uploaded')
        })
        request.on('httpUploadProgress', progress => {
            console.log(progress)
        })

        return pass
    }

Here is the route handler I wrote to trigger that task.

However, when several requests arrive at the same time, the extractions all start at once and memory usage spikes.

router.post('/unzip', (req, res) => {
    let path = req.body.path
    res.json({message: 'OK.'})

    // Start the heavy work only after the response has been sent.
    res.on('finish', () => {
        unzipUpload(path).then(() => {
            console.log("success")
        }).catch(e => {
            console.log("failure", e)
        })
    })
})

Is there a way to accomplish this while keeping memory usage low?
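One approach that fits Task 3 is to keep a single shared promise chain and append each request's job to it, so the extractions run strictly one at a time instead of in parallel. A minimal sketch, assuming `unzipUpload(path)` returns a promise as in the code above; the `enqueue` helper name is my own:

```javascript
// Minimal sketch of a serial job queue: one shared promise chain.
// Each queued job starts only after every earlier job has settled,
// so at most one extraction holds memory at any moment.
let jobQueue = Promise.resolve()

function enqueue(job) {
    // `job` is a function returning a promise, e.g. () => unzipUpload(path).
    jobQueue = jobQueue.then(job).catch(err => {
        // Log and swallow the error so one failed job does not block the queue.
        console.error('queued job failed:', err)
    })
    return jobQueue
}
```

In the route handler, the direct `unzipUpload(path)` call would become `enqueue(() => unzipUpload(path))`; the response still returns immediately, and concurrent requests are processed in arrival order.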



via tomoya ishizaka
