Wednesday 10 May 2017

What is the best transaction isolation level in this scenario?

I have a Node.js app that runs on AWS.

There are two EC2 instances running the same application behind an elastic load balancer. If the servers load is high, more instances could start running the same Node.js app but there is always at least 2 instances.

Both instances connect to one unique instance of postgres:

enter image description here

Every few seconds the ELB receives HTTP POST requests that are processed by one of the two Node.js instances. If everything goes well, each HTTP POST ends as a new row in a table, let's call it TABLE1. Each row has a boolean field named is_processed that allows identifying if the row has been already processed.

Each Node.js instance is also running a scheduled job that is executed every few minutes. The scheduled job searches for rows in TABLE1 where is_processed is false and processes them. If the processing logic goes well:

  • Step 1: A new row is inserted or updated in TABLE2.
  • Step 2: The is_processed column in TABLE1 is updated to true.

These two steps take place within a transaction.

My problem takes place when:

  • Instance A fetches some is_processed=false rows and starts processing then within a transaction.
  • instance B fetches the same is_processed=false rows before A has committed its transaction.

As a result, some rows will be processed multiple times.

I believe this can be solved by using the right level of isolation:

enter image description here

My problem is that I'm not sure about the best isolation level. My guess is that I need REPEATABLE_READ because there are no delete operations?

My other concern is that I don't know if this locking could cause any kind of problems with the AWS automated DB backups?



via OweR ReLoaDeD

No comments:

Post a Comment