The question:
I have this table
Table "public.lineitem"
Column | Type | Collation | Nullable | Default
-----------------+---------------+-----------+----------+---------
l_orderkey | integer | | |
l_partkey | integer | | |
l_suppkey | integer | | |
l_linenumber | integer | | |
l_quantity | integer | | |
l_extendedprice | numeric(12,2) | | |
l_discount | numeric(12,2) | | |
l_tax | numeric(12,2) | | |
l_returnflag | character(1) | | |
l_linestatus | character(1) | | |
l_shipdate | date | | |
l_commitdate | date | | |
l_receiptdate | date | | |
l_shipinstruct | character(25) | | |
l_shipmode | character(10) | | |
l_comment | character(44) | | |
l_partsuppkey | character(20) | | |
Indexes:
"l_shipdate_c_idx" btree (l_shipdate) CLUSTER
"l_shipmode_h_idx" hash (l_shipdate)
Foreign-key constraints:
"lineitem_l_orderkey_fkey" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey)
"lineitem_l_partkey_fkey" FOREIGN KEY (l_partkey) REFERENCES part(p_partkey)
"lineitem_l_partsuppkey_fkey" FOREIGN KEY (l_partsuppkey) REFERENCES partsupp(ps_partsuppkey)
"lineitem_l_suppkey_fkey" FOREIGN KEY (l_suppkey) REFERENCES supplier(s_suppkey)
and this query:
explain analyze select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1 - l_discount)) as sum_disc_price,
sum(l_extendedprice*(1 - l_discount)*(1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate<='31/08/1998'
GROUP by
l_returnflag,
l_linestatus
ORDER by
l_returnflag,
l_linestatus
returning this query plan:
"Finalize GroupAggregate (cost=2631562.25..2631564.19 rows=6 width=212) (actual time=28624.012..28624.466 rows=4 loops=1)"
" Group Key: l_returnflag, l_linestatus"
" -> Gather Merge (cost=2631562.25..2631563.65 rows=12 width=212) (actual time=28623.998..28624.442 rows=12 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=2630562.23..2630562.24 rows=6 width=212) (actual time=28620.633..28620.633 rows=4 loops=3)"
" Sort Key: l_returnflag, l_linestatus"
" Sort Method: quicksort Memory: 27kB"
" Worker 0: Sort Method: quicksort Memory: 27kB"
" Worker 1: Sort Method: quicksort Memory: 27kB"
" -> Partial HashAggregate (cost=2630562.03..2630562.15 rows=6 width=212) (actual time=28620.607..28620.611 rows=4 loops=3)"
" Group Key: l_returnflag, l_linestatus"
" Batches: 1 Memory Usage: 24kB"
" Worker 0: Batches: 1 Memory Usage: 24kB"
" Worker 1: Batches: 1 Memory Usage: 24kB"
" -> Parallel Seq Scan on lineitem (cost=0.00..1707452.35 rows=24616258 width=24) (actual time=0.549..19028.353 rows=19701655 loops=3)"
" Filter: (l_shipdate <= '1998-08-31'::date)"
" Rows Removed by Filter: 293696"
"Planning Time: 0.374 ms"
"Execution Time: 28624.523 ms"
- Why is the optimizer preferring the sequential scan over the lineitem table instead of using l_shipdate_c_idx? Should I drop it?
Postgres version: PostgreSQL 14.2 on x86_64-apple-darwin20.6.0, compiled by Apple clang version 12.0.0 (clang-1200.0.32.29), 64-bit
The Solutions:
Below are the methods you can try. The first solution is usually the best; try the others if it doesn’t work. Senior developers don’t just copy and paste – they read each method carefully and apply it judiciously to their own case.
Method 1
Your filter
l_shipdate<='31/08/1998'
is not very selective: the plan shows it removed only 293,696 rows, while 19,701,655 matching rows still had to be read. Fetching that many rows one by one through the index would almost certainly have been slower than the sequential scan the planner chose.
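If you want to verify the planner's choice yourself, you can discourage sequential scans for a single transaction and compare the two plans. `enable_seqscan` is a real planner setting, but treat this as a diagnostic sketch only, never a production fix:

```sql
-- Diagnostic only: make the planner avoid a seq scan if any
-- alternative exists, so we can time the index-based plan.
BEGIN;
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT l_returnflag, l_linestatus, count(*)
FROM lineitem
WHERE l_shipdate <= DATE '1998-08-31'
GROUP BY l_returnflag, l_linestatus;

ROLLBACK;  -- SET LOCAL only lasts for this transaction anyway
```

On a filter this unselective, the index plan will typically show a much higher cost and runtime, which is exactly why the planner rejected it.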
Should I drop it?
If this is the only query you run and the only filter you use, then probably. Otherwise there isn’t enough information to say. The index would likely be useful for a selective predicate, such as fetching the rows for one particular day. It might also serve you better with some additional columns in it. Impossible to say without knowing your application.
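As one example of "additional columns": if this aggregation is a common query shape, a covering index could let it run as an index-only scan that skips the heap entirely (provided the visibility map is reasonably current). This is a sketch assuming PostgreSQL 11+ `INCLUDE` syntax; the index name is made up:

```sql
-- Hypothetical covering index: the key stays small (just l_shipdate)
-- while the INCLUDE payload columns allow an index-only scan
-- for the aggregate query in the question.
CREATE INDEX lineitem_shipdate_covering_idx
    ON lineitem (l_shipdate)
    INCLUDE (l_returnflag, l_linestatus, l_quantity,
             l_extendedprice, l_discount, l_tax);
```

Whether this pays off depends on how often the query runs versus how heavy your writes are: the index is wide, so it adds real maintenance and storage cost.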
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.