The question:
I have the following requirement. In a nutshell, the wallets_history table represents the transactions that happen in a wallet, and the goal is to get one point every in_interval from in_from to in_to.
The points will eventually be displayed to the user as a histogram. (I simplified the query to make it easier to read; this doesn't affect the issue.)
CREATE OR REPLACE FUNCTION get_wallets_histogram(
in_wallet_id TEXT,
in_code TEXT,
in_asset_id TEXT,
in_from TIMESTAMPTZ,
in_to TIMESTAMPTZ,
in_interval INTERVAL)
RETURNS TABLE(date TIMESTAMPTZ, balance_change BIGINT) AS $$
SELECT s AS time,
change::bigint
FROM generate_series(in_from,
in_to - in_interval,
in_interval) s
LEFT JOIN LATERAL (
SELECT s, COALESCE(SUM(balance_change),0) change FROM wallets_history
WHERE s <= seen_at AND seen_at < s + in_interval
) q USING (s)
$$ LANGUAGE SQL STABLE;
Then I run this (note that the interval is 16 days, so in this simplified example the series contains only a single point and the lateral subquery runs only once):
SELECT * FROM get_wallets_histogram(
'WHALE',
'BTC',
'',
'2022-01-01'::timestamptz,
'2022-02-01'::timestamptz,
interval '16 day');
The query plan is
Nested Loop Left Join (cost=2526.62..2526662.88 rows=1000 width=16) (actual time=109.301..109.303 rows=1 loops=1)
Output: s.s, (q.change)::bigint
Inner Unique: true
-> Function Scan on pg_catalog.generate_series s (cost=0.01..10.01 rows=1000 width=8) (actual time=104.854..104.855 rows=1 loops=1)
Output: s.s
Function Call: generate_series('2022-01-01 00:00:00+00'::timestamp with time zone, ('2022-02-01 00:00:00+00'::timestamp with time zone - '16 days'::interval), '16 days'::interval)
-> Subquery Scan on q (cost=2526.62..2526.64 rows=1 width=40) (actual time=4.442..4.443 rows=1 loops=1)
Output: q.s, q.change
Filter: (s.s = q.s)
-> Aggregate (cost=2526.62..2526.63 rows=1 width=40) (actual time=4.440..4.441 rows=1 loops=1)
Output: s.s, COALESCE(sum(wallets_history.balance_change), '0'::numeric)
-> Bitmap Heap Scan on public.wallets_history (cost=27.85..2523.83 rows=1115 width=5) (actual time=0.842..2.570 rows=23040 loops=1)
Output: wallets_history.wallet_id, wallets_history.code, wallets_history.asset_id, wallets_history.tx_id, wallets_history.seen_at, wallets_history.blk_height, wallets_history.blk_idx, wallets_history.balance_change,
wallets_history.balance_total, wallets_history.nth
Recheck Cond: ((s.s <= wallets_history.seen_at) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
Heap Blocks: exact=421
-> Bitmap Index Scan on wallets_history_by_seen_at (cost=0.00..27.57 rows=1115 width=0) (actual time=0.807..0.807 rows=23040 loops=1)
Index Cond: ((wallets_history.seen_at >= s.s) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
Planning Time: 0.443 ms
JIT:
Functions: 13
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 2.408 ms, Inlining 9.035 ms, Optimization 59.083 ms, Emission 36.578 ms, Total 107.104 ms
Execution Time: 111.823 ms
If I read the query plan correctly, the first row from generate_series isn't produced until 104 ms into the query!
What I don't understand, though, is why the lateral subquery seems to have a startup time BEFORE the startup time of generate_series.
I can't find the bottleneck in this query. There is a huge gap between when the generate_series startup time begins and when the bitmap heap scan of the inner lateral query finishes.
The strangest thing is that when I execute the lateral subquery manually, outside the function, it is fast. However, as soon as I LEFT JOIN it to generate_series (even though only one timestamp is emitted), it takes more than 100 ms!
There isn’t any node that is obviously making things slow.
The Solutions:
Below are the methods you can try. The first solution is probably the best; try the others if it doesn't work. Senior developers don't just copy and paste: they read each method carefully and apply it to their own case.
Method 1
Summary of the fix
I found out that the JIT was taking most of the time in the query, so I tried to track down why JIT was being triggered at all for such a simple query.
It turns out that generate_series over timestamps confuses the planner: the planner always estimates that it returns 1000 rows.
So I created the generate_series_fixed function below, which adds a LIMIT to generate_series, and used it as a drop-in replacement.
The rest of this answer is the history of my findings.
Here is the drop-in replacement if you need it.
-- Technically this LIMIT clause is useless. However, without it the query planner
-- is unable to correctly estimate the number of rows returned by generate_series,
-- which causes JIT compilation, slowing down the query considerably.
-- See https://dba.stackexchange.com/questions/310235/why-is-my-nested-loop-taking-so-much-time/310242#310242
CREATE OR REPLACE FUNCTION generate_series_fixed(in_from TIMESTAMPTZ, in_to TIMESTAMPTZ, in_interval INTERVAL) RETURNS TABLE(s TIMESTAMPTZ) AS $$
SELECT generate_series(in_from, in_to, in_interval)
LIMIT (EXTRACT(EPOCH FROM (in_to - in_from))/EXTRACT(EPOCH FROM in_interval)) + 1; -- I am unsure about the exact formula, but overestimating by 1 row is fine...
$$ LANGUAGE SQL STABLE;
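For illustration, here is a minimal sketch (my own, not from the original answer) of the simplified function from the question with the drop-in replacement swapped in; only the series source changes, the rest of the body stays the same:
CREATE OR REPLACE FUNCTION get_wallets_histogram(
    in_wallet_id TEXT,
    in_code TEXT,
    in_asset_id TEXT,
    in_from TIMESTAMPTZ,
    in_to TIMESTAMPTZ,
    in_interval INTERVAL)
RETURNS TABLE(date TIMESTAMPTZ, balance_change BIGINT) AS $$
SELECT s AS time,
       change::bigint
-- generate_series_fixed instead of generate_series: same rows, but the LIMIT
-- inside it lets the planner estimate the row count correctly.
FROM generate_series_fixed(in_from,
                           in_to - in_interval,
                           in_interval) s
LEFT JOIN LATERAL (
    SELECT s, COALESCE(SUM(balance_change), 0) change
    FROM wallets_history
    WHERE s <= seen_at AND seen_at < s + in_interval
) q USING (s)
$$ LANGUAGE SQL STABLE;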
So it seems the culprit was… the JIT!
Using SET jit = off completely solves the issue.
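For reference, a minimal sketch of how you can compare the two plans below yourself; the SET only affects the current session:
-- Toggle JIT off for the current session only, then re-run the query:
SET jit = off;
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM get_wallets_histogram(
    'WHALE', 'BTC', '',
    '2022-01-01'::timestamptz,
    '2022-02-01'::timestamptz,
    interval '16 day');
RESET jit;  -- restore the server default (jit = on) to reproduce the slow plan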
Plan with the JIT on
WindowAgg (cost=2543486.92..2543511.92 rows=1000 width=24) (actual time=305.248..305.252 rows=1 loops=1)
Output: s.s, (q.change)::bigint, ((sum(q.change) OVER (?) + COALESCE($0, '0'::numeric)))::bigint
InitPlan 1 (returns $0)
-> Limit (cost=8414.21..8414.21 rows=1 width=23) (actual time=29.647..29.648 rows=1 loops=1)
Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
-> Sort (cost=8414.21..8662.97 rows=99505 width=23) (actual time=29.646..29.646 rows=1 loops=1)
Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
Sort Key: wallets_history_1.seen_at DESC, wallets_history_1.blk_height DESC, wallets_history_1.blk_idx DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on public.wallets_history wallets_history_1 (cost=1871.58..7916.68 rows=99505 width=23) (actual time=3.425..19.254 rows=98750 loops=1)
Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
Recheck Cond: (wallets_history_1.seen_at < '2022-01-01 00:00:00+00'::timestamp with time zone)
Filter: ((wallets_history_1.wallet_id = 'WHALE'::text) AND (wallets_history_1.code = 'BTC'::text) AND (wallets_history_1.asset_id = ''::text))
Heap Blocks: exact=1800
-> Bitmap Index Scan on wallets_history_by_seen_at (cost=0.00..1846.71 rows=99505 width=0) (actual time=3.272..3.272 rows=98750 loops=1)
Index Cond: (wallets_history_1.seen_at < '2022-01-01 00:00:00+00'::timestamp with time zone)
-> Sort (cost=2535072.71..2535075.21 rows=1000 width=40) (actual time=275.590..275.592 rows=1 loops=1)
Output: s.s, q.change
Sort Key: s.s
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=2534.99..2535022.88 rows=1000 width=40) (actual time=275.557..275.559 rows=1 loops=1)
Output: s.s, q.change
Inner Unique: true
-> Function Scan on pg_catalog.generate_series s (cost=0.01..10.01 rows=1000 width=8) (actual time=268.958..268.959 rows=1 loops=1)
Output: s.s
Function Call: generate_series('2022-01-01 00:00:00+00'::timestamp with time zone, ('2022-02-01 00:00:00+00'::timestamp with time zone - '16 days'::interval), '16 days'::interval)
-> Subquery Scan on q (cost=2534.98..2535.00 rows=1 width=40) (actual time=6.586..6.587 rows=1 loops=1)
Output: q.s, q.change
Filter: (s.s = q.s)
-> Aggregate (cost=2534.98..2534.99 rows=1 width=40) (actual time=6.583..6.584 rows=1 loops=1)
Output: s.s, COALESCE(sum(wallets_history.balance_change), '0'::numeric)
-> Bitmap Heap Scan on public.wallets_history (cost=27.85..2532.19 rows=1115 width=5) (actual time=0.901..4.737 rows=23040 loops=1)
Output: wallets_history.wallet_id, wallets_history.code, wallets_history.asset_id, wallets_history.tx_id, wallets_history.seen_at, wallets_history.blk_height, wallets_history.blk_idx, wallets_history.bala
nce_change, wallets_history.balance_total, wallets_history.nth
Recheck Cond: ((s.s <= wallets_history.seen_at) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
Filter: ((wallets_history.wallet_id = 'WHALE'::text) AND (wallets_history.code = 'BTC'::text) AND (wallets_history.asset_id = ''::text))
Heap Blocks: exact=421
-> Bitmap Index Scan on wallets_history_by_seen_at (cost=0.00..27.57 rows=1115 width=0) (actual time=0.864..0.864 rows=23040 loops=1)
Index Cond: ((wallets_history.seen_at >= s.s) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
Planning Time: 2.333 ms
JIT:
Functions: 27
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 2.050 ms, Inlining 40.980 ms, Optimization 127.174 ms, Emission 100.448 ms, Total 270.652 ms
Execution Time: 334.052 ms
(44 rows)
Plan with the JIT off
WindowAgg (cost=2543486.92..2543511.92 rows=1000 width=24) (actual time=39.259..39.264 rows=1 loops=1)
Output: s.s, (q.change)::bigint, ((sum(q.change) OVER (?) + COALESCE($0, '0'::numeric)))::bigint
InitPlan 1 (returns $0)
-> Limit (cost=8414.21..8414.21 rows=1 width=23) (actual time=32.375..32.376 rows=1 loops=1)
Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
-> Sort (cost=8414.21..8662.97 rows=99505 width=23) (actual time=32.374..32.374 rows=1 loops=1)
Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
Sort Key: wallets_history_1.seen_at DESC, wallets_history_1.blk_height DESC, wallets_history_1.blk_idx DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on public.wallets_history wallets_history_1 (cost=1871.58..7916.68 rows=99505 width=23) (actual time=3.147..21.561 rows=98750 loops=1)
Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
Recheck Cond: (wallets_history_1.seen_at < '2022-01-01 00:00:00+00'::timestamp with time zone)
Filter: ((wallets_history_1.wallet_id = 'WHALE'::text) AND (wallets_history_1.code = 'BTC'::text) AND (wallets_history_1.asset_id = ''::text))
Heap Blocks: exact=1800
-> Bitmap Index Scan on wallets_history_by_seen_at (cost=0.00..1846.71 rows=99505 width=0) (actual time=3.005..3.005 rows=98750 loops=1)
Index Cond: (wallets_history_1.seen_at < '2022-01-01 00:00:00+00'::timestamp with time zone)
-> Sort (cost=2535072.71..2535075.21 rows=1000 width=40) (actual time=6.875..6.877 rows=1 loops=1)
Output: s.s, q.change
Sort Key: s.s
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=2534.99..2535022.88 rows=1000 width=40) (actual time=6.870..6.873 rows=1 loops=1)
Output: s.s, q.change
Inner Unique: true
-> Function Scan on pg_catalog.generate_series s (cost=0.01..10.01 rows=1000 width=8) (actual time=0.023..0.023 rows=1 loops=1)
Output: s.s
Function Call: generate_series('2022-01-01 00:00:00+00'::timestamp with time zone, ('2022-02-01 00:00:00+00'::timestamp with time zone - '16 days'::interval), '16 days'::interval)
-> Subquery Scan on q (cost=2534.98..2535.00 rows=1 width=40) (actual time=6.846..6.848 rows=1 loops=1)
Output: q.s, q.change
Filter: (s.s = q.s)
-> Aggregate (cost=2534.98..2534.99 rows=1 width=40) (actual time=6.845..6.846 rows=1 loops=1)
Output: s.s, COALESCE(sum(wallets_history.balance_change), '0'::numeric)
-> Bitmap Heap Scan on public.wallets_history (cost=27.85..2532.19 rows=1115 width=5) (actual time=1.324..4.837 rows=23040 loops=1)
Output: wallets_history.wallet_id, wallets_history.code, wallets_history.asset_id, wallets_history.tx_id, wallets_history.seen_at, wallets_history.blk_height, wallets_history.blk_idx, wallets_history.bala
nce_change, wallets_history.balance_total, wallets_history.nth
Recheck Cond: ((s.s <= wallets_history.seen_at) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
Filter: ((wallets_history.wallet_id = 'WHALE'::text) AND (wallets_history.code = 'BTC'::text) AND (wallets_history.asset_id = ''::text))
Heap Blocks: exact=421
-> Bitmap Index Scan on wallets_history_by_seen_at (cost=0.00..27.57 rows=1115 width=0) (actual time=1.290..1.290 rows=23040 loops=1)
Index Cond: ((wallets_history.seen_at >= s.s) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
Planning Time: 0.451 ms
Execution Time: 39.424 ms
(40 rows)
Tested on Postgres v12.9.
This came as a surprise. Since wallets_history is a materialized view, the query didn't seem complex enough to me to require JIT.
I also tried on Postgres 14.2: same problem.
I will keep posting here if I find the root cause of why the JIT is causing problems. I am marking this question solved in the meantime.
EDIT: I learned that JIT kicks in when the estimated cost of the query is high enough (100000 by default, the jit_above_cost setting).
Looking at the query plan, the Nested Loop Left Join is costed at about 2,500,000, pushing it above the threshold.
I don't know yet why the nested loop left join is estimated to be so expensive.
It may be related to the fact that the planned row count of generate_series is badly wrong… digging.
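For context, these are the settings involved; the values in the comments are the stock defaults, and raising the threshold instead of disabling JIT outright is my own suggestion, shown here only as a sketch:
SHOW jit;                      -- on by default since PostgreSQL 12
SHOW jit_above_cost;           -- 100000: JIT is considered once the plan cost exceeds this
SHOW jit_inline_above_cost;    -- 500000: function inlining is added above this cost
SHOW jit_optimize_above_cost;  -- 500000: expensive LLVM optimization is added above this cost

-- Raising the threshold keeps JIT for genuinely expensive queries while
-- sparing cheap-but-misestimated ones like this histogram (value is arbitrary):
SET jit_above_cost = 1000000;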
Here is another question that seems related: Left join with generate_series misleads the planner.
EDIT 2:
I found a hack that works… I basically wrap generate_series in a subquery with a LIMIT clause to help the query planner.
For example, in my non-simplified query:
CREATE OR REPLACE FUNCTION get_wallets_histogram(in_wallet_id TEXT, in_code TEXT, in_asset_id TEXT, in_from TIMESTAMPTZ, in_to TIMESTAMPTZ, in_interval INTERVAL)
RETURNS TABLE(date TIMESTAMPTZ, balance_change BIGINT, balance BIGINT) AS $$
SELECT s AS time,
       change::bigint,
       (SUM (q.change) OVER (ORDER BY s) + COALESCE((SELECT balance_total FROM wallets_history WHERE seen_at < in_from AND wallet_id=in_wallet_id AND code=in_code AND asset_id=in_asset_id ORDER BY seen_at DESC, blk_height DESC, blk_idx DESC LIMIT 1), 0))::BIGINT AS balance
FROM (-- Technically this LIMIT clause is useless. However, without it the query planner
      -- is unable to correctly estimate the number of rows returned by generate_series,
      -- which causes JIT compilation, slowing down the query considerably
      SELECT * FROM generate_series(in_from,
                                    in_to - in_interval,
                                    in_interval) s
      LIMIT (EXTRACT(EPOCH FROM (in_to - in_from))/EXTRACT(EPOCH FROM in_interval))) s
LEFT JOIN LATERAL (
    SELECT s, COALESCE(SUM(balance_change),0) change FROM wallets_history
    WHERE s <= seen_at AND seen_at < s + in_interval AND wallet_id=in_wallet_id AND code=in_code AND asset_id=in_asset_id
) q USING (s)
$$ LANGUAGE SQL STABLE;
EDIT 3:
In case somebody runs into the same issue, here is a generate_series_fixed function that you can use in place of the original generate_series.
-- Technically this LIMIT clause is useless. However, without it the query planner
-- is unable to correctly estimate the number of rows returned by generate_series,
-- which causes JIT compilation, slowing down the query considerably.
-- See https://dba.stackexchange.com/questions/310235/why-is-my-nested-loop-taking-so-much-time/310242#310242
CREATE OR REPLACE FUNCTION generate_series_fixed(in_from TIMESTAMPTZ, in_to TIMESTAMPTZ, in_interval INTERVAL) RETURNS TABLE(s TIMESTAMPTZ) AS $$
SELECT generate_series(in_from, in_to, in_interval)
LIMIT (EXTRACT(EPOCH FROM (in_to - in_from))/EXTRACT(EPOCH FROM in_interval)) + 1; -- I am unsure about the exact formula, but overestimating by 1 row is fine...
$$ LANGUAGE SQL STABLE;
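As a sanity check (values taken from the question; the exact plan output will vary): the idea is that once the function is inlined, the planner can see and evaluate the LIMIT expression, so the Function Scan should no longer be estimated at 1000 rows, which should keep the total cost below jit_above_cost and make the JIT lines disappear from the plan.
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM generate_series_fixed(
    '2022-01-01'::timestamptz,
    '2022-02-01'::timestamptz,
    interval '16 day');
-- Then re-run EXPLAIN ANALYZE on get_wallets_histogram itself after switching it
-- to generate_series_fixed, and compare the timing against the JIT plans above.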
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.