The question:
In Postgres 13, I have a table which gets updated frequently. However, the update query is rather complicated and uses the same values multiple times. So, using a CTE seems quite a logical thing to do.
A simplified example looks like this:
WITH my_cte AS (
    SELECT
        my_id,
        CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END AS my_addition
    FROM my_table
    WHERE my_id = $1
)
UPDATE my_table
SET my_value1 = my_table.my_value1 + my_cte.my_addition,
    my_value2 = my_table.my_value2 + my_cte.my_addition
FROM my_cte
WHERE my_table.my_id = my_cte.my_id
Now I’m wondering: what would happen if, between the SELECT in the CTE and the UPDATE, the table is updated by another query that changes my_value1, so that the calculated my_addition is outdated and wrong by the time the UPDATE happens? Can such a situation occur? Or does Postgres set an implicit lock automatically?
If Postgres does no magic here and I need to take care of it myself: Would it be sufficient to do FOR UPDATE in the SELECT of the CTE?
Sorry if I did not make myself clear here: It’s not that I want to “see” those concurrent modifications; I want to prevent them. That is, once the calculation in the SELECT is done, no other query may modify that very row until the UPDATE is done.
In real life, what I mocked here by CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END is about 20 lines long and I need it in about 5 places in the UPDATE. Since I’m a big fan of “Do not repeat yourself”, I think a CTE is the way to go. Or is there a better way to avoid copy & pasting in an UPDATE without a CTE?
The Solutions:
Method 1
Postgres uses a multiversion model (Multiversion Concurrency Control, MVCC).
In the default READ COMMITTED isolation level, each separate query effectively sees a snapshot of the database as of the instant the query begins to run. Subsequent queries (even within the same transaction) can see a different snapshot if concurrent transactions are committed in between, plus whatever has been done in the same transaction so far.
However, as far as CTEs are concerned, all sub-statements in WITH are executed concurrently with the outer statement, so they effectively see the same snapshot of the database. All of it is considered a single query for this purpose.
So, no, you don’t need an explicit lock to stay consistent.
Encapsulating the logic in a function may be convenient for a number of reasons, but that has no effect whatsoever on concurrency. Aside: a CTE with a volatile function is never inlined.
A plain SELECT does not lock the rows it reads, and Postgres allows concurrent UPDATEs. But an UPDATE locks its target rows. Concurrent transactions that try to write to the same rows have to wait until the locking transaction has finished.
If you want to forbid writes to rows (columns) that have only been selected from while your UPDATE is in progress, you may want to take locks anyway (or use a stricter isolation level). Maybe FOR UPDATE locks, or maybe a weaker lock. That depends on details and requirements that are not given in your question.
Also (though you did not ask for that), if multiple concurrent transactions may be writing to overlapping rows (more than one at a time), be sure to adhere to the same, consistent order of rows to avoid deadlocks.
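The ordering advice can be sketched as follows. This is only an illustration under assumptions: the ids 1, 2, 3 and the fixed increment of 10 are made up, and FOR NO KEY UPDATE is just one possible lock strength. The idea is that every transaction takes its row locks in ascending my_id order before writing, so two transactions cannot each hold a row the other is waiting for.

```sql
BEGIN;

-- Lock all rows this transaction is going to write, always in the same
-- (ascending my_id) order. The ids are hypothetical placeholders.
SELECT my_id
FROM   my_table
WHERE  my_id IN (1, 2, 3)
ORDER  BY my_id
FOR    NO KEY UPDATE;

-- The rows are now locked by this transaction; the write cannot deadlock
-- against another transaction that follows the same locking order.
UPDATE my_table
SET    my_value1 = my_value1 + 10   -- hypothetical change
WHERE  my_id IN (1, 2, 3);

COMMIT;
```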
Method 2
Building on what a_horse_with_no_name said:
I would put such a condition into a (SQL) function. Another alternative to locking (if you expect this to occur rarely) would be to use the serializable isolation level and re-run the UPDATE if an error occurs.
Put the addition logic into a function, and then call that function each time you want to set a new value. This will help you in two ways.
- This allows you to avoid duplicating the addition logic each time you use it.
- This makes for a very simple update statement that can get in quick, lock just a few rows, and get out.
Something like this should work.
CREATE FUNCTION fn_my_addition(my_value int)
    RETURNS int
    LANGUAGE sql
AS
$$
    -- Returns 50 for inputs above 100, otherwise 10.
    SELECT CASE WHEN my_value > 100 THEN 50 ELSE 10 END;
$$;
-- Both columns receive the addition derived from my_value1, as in the question.
UPDATE my_table
SET my_value1 = my_value1 + fn_my_addition(my_value1),
    my_value2 = my_value2 + fn_my_addition(my_value1)
WHERE my_id = $1;
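The serializable alternative mentioned at the top of this method might look roughly like the sketch below. The id 42 is a made-up placeholder, and the CASE expression is written inline so the snippet does not depend on any function. The retry itself has to happen in the client: plain SQL has no loop for this.

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

UPDATE my_table
SET    my_value1 = my_value1 + CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END,
       my_value2 = my_value2 + CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END
WHERE  my_id = 42;   -- hypothetical id

COMMIT;

-- If the UPDATE or the COMMIT fails with SQLSTATE 40001
-- (serialization_failure), the client should issue ROLLBACK and
-- re-run the whole transaction from BEGIN.
```

This trades lock waits for occasional retries, so it tends to fit best when conflicts on the same row are rare.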
Method 3
If you want to prevent concurrent statements from modifying the rows that the CTE selects before they get updated, you need to use SELECT ... FOR NO KEY UPDATE in the CTE.
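Applied to the simplified query from the question, that could look like the following sketch. It reuses the question's table and column names; the only change is the locking clause at the end of the CTE's SELECT.

```sql
WITH my_cte AS (
    SELECT my_id,
           CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END AS my_addition
    FROM   my_table
    WHERE  my_id = $1
    FOR    NO KEY UPDATE   -- lock the selected row until the transaction ends
)
UPDATE my_table
SET    my_value1 = my_table.my_value1 + my_cte.my_addition,
       my_value2 = my_table.my_value2 + my_cte.my_addition
FROM   my_cte
WHERE  my_table.my_id = my_cte.my_id;
```

FOR NO KEY UPDATE is the same lock strength that an UPDATE takes when it does not change key columns, so it blocks other writers to the row while still allowing concurrent inserts of rows that merely reference it through a foreign key.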
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.