I have a Node.js app that runs on AWS.
There are two EC2 instances running the same application behind an Elastic Load Balancer. If server load is high, more instances can be started running the same Node.js app, but there are always at least two instances.
Both instances connect to a single PostgreSQL instance.
Every few seconds the ELB receives HTTP POST requests that are processed by one of the two Node.js instances. If everything goes well, each HTTP POST ends up as a new row in a table, let's call it TABLE1. Each row has a boolean field named `is_processed` that identifies whether the row has already been processed.
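
For concreteness, here is a minimal sketch of that ingest path, assuming Express and the node-postgres (`pg`) library; the `/events` route, the `payload` column, and the connection string are assumptions for illustration, not part of the original setup:

```js
const express = require('express');
const { Pool } = require('pg');

const app = express();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

app.use(express.json());

// Each incoming POST becomes one row in TABLE1, flagged as not yet processed.
app.post('/events', async (req, res) => {
  try {
    await pool.query(
      'INSERT INTO table1 (payload, is_processed) VALUES ($1, false)',
      [JSON.stringify(req.body)]
    );
    res.sendStatus(201);
  } catch (err) {
    console.error(err);
    res.sendStatus(500);
  }
});

app.listen(3000);
```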
Each Node.js instance is also running a scheduled job that is executed every few minutes. The scheduled job searches for rows in TABLE1 where `is_processed` is `false` and processes them. If the processing logic goes well:
- Step 1: A new row is inserted or updated in TABLE2.
- Step 2: The `is_processed` column in TABLE1 is updated to `true`.
These two steps take place within a transaction.
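
A sketch of that scheduled job as currently described, with no extra locking, again assuming the `pg` library; TABLE2's columns and the unique constraint behind `ON CONFLICT` are assumptions:

```js
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Runs every few minutes on every instance.
async function processPending() {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const { rows } = await client.query(
      'SELECT id, payload FROM table1 WHERE is_processed = false'
    );
    for (const row of rows) {
      // Step 1: insert or update the derived row in TABLE2
      // (assumes a unique constraint on table2.source_id).
      await client.query(
        `INSERT INTO table2 (source_id, data) VALUES ($1, $2)
         ON CONFLICT (source_id) DO UPDATE SET data = EXCLUDED.data`,
        [row.id, row.payload]
      );
      // Step 2: mark the source row as processed.
      await client.query(
        'UPDATE table1 SET is_processed = true WHERE id = $1',
        [row.id]
      );
    }
    await client.query('COMMIT'); // both steps commit or roll back together
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

setInterval(() => processPending().catch(console.error), 5 * 60 * 1000);
```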
My problem takes place when:
- Instance A fetches some `is_processed=false` rows and starts processing them within a transaction.
- Instance B fetches the same `is_processed=false` rows before A has committed its transaction.
As a result, some rows will be processed multiple times.
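
The overlap is reproducible with a small script: under PostgreSQL's default READ COMMITTED isolation, plain SELECTs take no row locks, so two concurrent transactions can read the same `is_processed = false` rows. A minimal sketch, assuming the same `pg` setup as above:

```js
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function fetchUnprocessed(label) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // A plain SELECT takes no row locks, so both transactions can read the
    // same is_processed = false rows before either one commits.
    const { rows } = await client.query(
      'SELECT id FROM table1 WHERE is_processed = false'
    );
    console.log(label, 'fetched ids:', rows.map((r) => r.id));
    await client.query('COMMIT');
  } finally {
    client.release();
  }
}

// Simulates instance A and instance B running the job at the same time:
// both log the same ids.
Promise.all([fetchUnprocessed('A'), fetchUnprocessed('B')])
  .then(() => pool.end());
```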
I believe this can be solved by using the right isolation level. My problem is that I'm not sure which isolation level is best. My guess is that I need `REPEATABLE_READ`, because there are no delete operations?
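
For reference, this is how a specific isolation level would be requested per transaction with the `pg` library; it is a sketch of the mechanics only, not a claim that REPEATABLE READ resolves the double-processing:

```js
// Same structure as the job above, but requesting a stricter isolation level.
async function processWithRepeatableRead(client) {
  await client.query('BEGIN ISOLATION LEVEL REPEATABLE READ');
  try {
    // ... Step 1 and Step 2 queries as above ...
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    // Under REPEATABLE READ, PostgreSQL aborts a transaction that updates a
    // row already modified by a concurrent, committed transaction (SQLSTATE
    // 40001), so the job would need to retry or skip on that error.
    if (err.code !== '40001') throw err;
  }
}
```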
My other concern is that I don't know whether this locking could cause any kind of problem with AWS's automated DB backups.