Monday 10 April 2017

Stress testing an API with a very large dataset causes queries to execute sequentially

I have an API endpoint that I am trying to stress test. It reads from a very large MongoDB collection (2 million documents), and each query takes roughly 2 seconds. The problem is that the connection to the database doesn't appear to be pooled correctly, so the queries run sequentially instead of concurrently.

I am using Mongoose to connect to the database and artillery.io for the load testing.

Here is my connection code:

const mongoose = require('mongoose');
const Promise = require('bluebird');

const connectionString = process.env.MONGO_DB || 'mongodb://localhost/mydatabase';

mongoose.Promise = Promise;

// poolSize sets how many sockets the underlying driver keeps open for
// this connection (the driver's default is 5).
mongoose.connect(connectionString, {
    server: { poolSize: 10 }
});

const db = mongoose.connection;

db.on('error', console.error.bind(console, 'connection error: '));

db.once('open', function() {
    console.log('Connected to: ' + connectionString);
});

module.exports = db;

It's a pretty bog-standard connection setup; the important part is the server: { poolSize: 10 } option.
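My understanding is that a single mongoose.connection multiplexes those pooled sockets, so dispatching several queries at once should keep up to ten of them busy. A minimal sketch of what I expect, using a hypothetical Postcode model (not my real code):

// Hypothetical model, used only to illustrate the expected pooling behaviour.
const Postcode = require('./models/postcode');

// Firing the queries without waiting for each one should let the driver
// spread them across the pool (up to poolSize operations in flight).
Promise.all([
    Postcode.find({ postcode: 'ABC 123' }).exec(),
    Postcode.find({ postcode: 'DEF 345' }).exec(),
    Postcode.find({ postcode: 'GHI 678' }).exec()
]).then(function (results) {
    console.log(results.length + ' queries completed');
});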

I am using the following script for artillery.io testing:

config:
  target: 'http://localhost:1337'
  phases:
    -
      duration: 10
      arrivalRate: 5
      name: "Warm-up"

scenarios:
  -
    name: "Search by postcodes"
    flow:
      -
        post:
          url: "/api/postcodes/gb_full/search"
          headers:
            Content-Type: 'application/json'
          json:
            postcodes:
              - "ABC 123"
              - "DEF 345"
              - "GHI 678"

This test makes 50 calls to the API over 10 seconds (an arrivalRate of 5 for a duration of 10). Now here's the problem: the API appears to execute the queries sequentially. See the test results below (latencies in milliseconds):

"latency": {
  "min": 1394.1,
  "max": 57693,
  "median": 30222.7,
  "p95": 55396.8,
  "p99": 57693
},

And the database logs are as follows:

connection accepted from 127.0.0.1:60770 #1 (1 connection now open)
...
2017-04-10T18:45:55.389+0100 ... 1329ms
2017-04-10T18:45:56.711+0100 ... 1321ms
2017-04-10T18:45:58.016+0100 ... 1304ms
2017-04-10T18:45:59.355+0100 ... 1338ms
2017-04-10T18:46:00.651+0100 ... 1295ms

It appears as though the API is only using one connection. That in itself seems right, but my understanding was that the driver would automatically put the poolSize to good use and execute these queries concurrently instead of one at a time.

What am I doing wrong here? How can I execute these database queries in parallel?
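In case the handler matters: per request, the endpoint effectively runs one query of the shape sketched below (again with a hypothetical Postcode model; the surrounding framework code is omitted and the names are placeholders, not my actual code):

// Hypothetical sketch of the per-request query; framework plumbing omitted.
const Postcode = require('./models/postcode');

function searchPostcodes(postcodes) {
    // A single find per incoming request; with 50 requests over 10 seconds
    // I expected these to run concurrently across the pooled connections.
    return Postcode.find({ postcode: { $in: postcodes } }).exec();
}

// Example usage, mirroring the body the artillery test sends:
searchPostcodes(['ABC 123', 'DEF 345', 'GHI 678'])
    .then(function (docs) {
        console.log('Found ' + docs.length + ' documents');
    });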



via Mike Eason
