The question:
I need to add a new BIGSERIAL column to a huge table (~3 billion records). This question is similar to what I need to do, and the accepted answer has helped me somewhat, but I'm still wondering about something. In my case, the table already has a BIGSERIAL column which is the primary key, but many rows have been deleted, so now there are gaps. (The table has subsequently been fully vacuumed.) I need to regenerate the values so that they are sequential again. Here are 5 example rows of what I want to achieve, where new_value > 1000:
+-----------+-----------+
| old_value | new_value |
+-----------+-----------+
| 1026      | 1001      |
| 1027      | 1002      |
| 1030      | 1003      |
| 1032      | 1004      |
| 1039      | 1005      |
+-----------+-----------+
I have successfully implemented the alternative approach mentioned in the referenced answer above (CREATE TABLE ... and then INSERT INTO new_table SELECT * FROM ...), but I would also like to attempt, and benchmark against, the initial suggestion. The problem, however, is that I don't know whether the new_value will be generated in the same order as the old_value, as this is a requirement.
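For context, the new-table route I implemented looks roughly like this (a simplified sketch: payload stands in for the real columns of existing_table):

CREATE TABLE new_table (
  new_value BIGSERIAL PRIMARY KEY
, payload   text   -- stand-in for the remaining columns of existing_table
);

INSERT INTO new_table (payload)
SELECT payload
FROM   existing_table
ORDER  BY old_value;   -- rows reach the BIGSERIAL default in this order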
How can I ensure that the order of the new_value column follows/tracks the order of the old_value column when the new_value column is added using a statement like this:
ALTER TABLE existing_table ADD COLUMN new_value BIGSERIAL;
A different approach
I also attempted the following (which works quite well on a small table), but it's much slower than the alternative suggestion from the referenced answer on very large tables:
ALTER TABLE existing_table ADD COLUMN new_value BIGINT NOT NULL DEFAULT 0;

-- Assign gap-free numbers in old_value order via row_number().
UPDATE existing_table AS existing
SET    new_value = generated.new_id
FROM  (
   SELECT original.old_value
        , row_number() OVER (ORDER BY original.old_value) AS new_id
   FROM   existing_table AS original
   ) AS generated
WHERE  existing.old_value = generated.old_value;
The Solutions:
Below are the methods you can try. The first solution is usually the best; read each method carefully and apply it to your case.
Method 1
First, consider the advice in the comments: do you really need to remove gaps? Typically, you don't. And gaps will creep back in. See:
- Is IDENTITY continuous in new versions of PostgreSQL?
- Serial numbers per group of rows for compound key
Next, consider the updates to my old answer that you have been working from. Among other things, they cover how to possibly avoid writing WAL for the table data in the new table; that will be the most significant factor in further improving performance.
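One way that WAL avoidance can be sketched (not taken from the original answer; new_table is an assumed name):

-- Create the target table UNLOGGED so the bulk load itself writes no WAL.
CREATE UNLOGGED TABLE new_table (LIKE existing_table INCLUDING DEFAULTS);

-- ... bulk INSERT / COPY into new_table here ...

-- PostgreSQL 9.5+. The conversion itself rewrites the table and is WAL-logged,
-- so the saving is in the load phase. Alternatively, with wal_level = minimal,
-- creating and filling the table in a single transaction can skip WAL entirely.
ALTER TABLE new_table SET LOGGED;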
How can I ensure that the order of the new_value column follows/tracks the order of the old_value column when the new_value column is added using a statement like this:
ALTER TABLE existing_table ADD COLUMN new_value BIGSERIAL;
You cannot. PostgreSQL gives no way to specify the order in which the new BIGSERIAL values are assigned during the table rewrite, so use one of the other routes.
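To get gap-free numbers that provably follow old_value, compute them explicitly instead of relying on a sequence. A minimal sketch, reusing the question's names and its example where rows above 1000 restart at 1001 (payload is a stand-in column):

CREATE TABLE new_table AS
SELECT 1000 + row_number() OVER (ORDER BY old_value) AS new_value
     , payload                -- stand-in for the remaining columns
FROM   existing_table
WHERE  old_value > 1000;      -- mirrors the example rows in the question

Note that CREATE TABLE ... AS yields a plain bigint column; attach a sequence or identity afterwards if new inserts should continue the numbering.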
I also attempted the following (which works quite well on a small table), but it's much slower than the alternative suggestion from the referenced answer on very large tables:
The UPDATE route is inherently more expensive than writing a pristine new table from scratch (adding indexes and constraints afterwards).
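Sketched out, those follow-up steps might look like this (names assumed, secondary indexes shown as a placeholder):

-- Constrain and index only after the bulk load, then swap the tables in place.
ALTER TABLE new_table ADD PRIMARY KEY (new_value);
-- CREATE INDEX ... ON new_table (...);   -- recreate secondary indexes here
ANALYZE new_table;

BEGIN;
ALTER TABLE existing_table RENAME TO existing_table_old;
ALTER TABLE new_table RENAME TO existing_table;
COMMIT;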
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.