The question:
I have two tables : customer_assortment
and assortment_product
. They are connected between them with an assortment_uuid
. I also have:
CREATE UNIQUE INDEX assortment_product_assortment_uuid_article_number_idx
ON public.assortment_product USING btree (assortment_uuid, article_number)
CREATE UNIQUE INDEX customer_assortment_assortment_uuid_published_to_idx
ON public.customer_assortment USING btree (assortment_uuid, published_to)
CREATE INDEX assortment_product_article_number_idx
ON public.assortment_product USING btree (article_number)
Having a list of customers and a list or articles, I want to get the entries where these 3 things match:
- assortment_uuid
- published_to (customer)
- article_number
My query looks like this:
select
ass_product.article_number as articleNumber,
cust_assortment.published_to as customerId,
array_agg(cast (cust_assortment.assortment_uuid as varchar) order by cust_assortment.id) as uuids,
array_agg(cust_assortment.assortment_name order by cust_assortment.id) as names
from customer_assortment cust_assortment
join assortment_product ass_product on cust_assortment.assortment_uuid = ass_product.assortment_uuid
# there are max 10 values here
where cust_assortment.published_to = any(values ('7000014910'), ('7000014940') ... )
# there a max 2000 values here
and ass_product.article_number = any(values ('FK5327'),('GG1114') ... )
group by ass_product.article_number, cust_assortment.published_to;
This takes tens of seconds to get.
My understanding of a query plan is somehow limited, but if it helps, here it is:
GroupAggregate (cost=92861.99..94421.60 rows=47988 width=81) (actual time=4107.602..4382.830 rows=20000 loops=1) |
Group Key: ass_product.article_number, cust_assortment.published_to |
-> Sort (cost=92861.99..92981.96 rows=47988 width=61) (actual time=4107.518..4123.623 rows=153760 loops=1) |
Sort Key: ass_product.article_number, cust_assortment.published_to |
Sort Method: quicksort Memory: 27767kB |
-> Nested Loop (cost=30.99..89130.83 rows=47988 width=61) (actual time=2.175..3632.692 rows=153760 loops=1) |
-> HashAggregate (cost=30.00..32.00 rows=200 width=32) (actual time=2.090..3.603 rows=2000 loops=1) |
Group Key: "*VALUES*_1".column1 |
-> Values Scan on "*VALUES*_1" (cost=0.00..25.00 rows=2000 width=32) (actual time=0.004..1.048 rows=2000 loops=1) |
-> Nested Loop (cost=0.99..445.39 rows=10 width=61) (actual time=0.034..1.797 rows=77 loops=2000) |
-> HashAggregate (cost=0.15..0.25 rows=10 width=32) (actual time=0.000..0.006 rows=10 loops=2000) |
Group Key: "*VALUES*".column1 |
-> Values Scan on "*VALUES*" (cost=0.00..0.12 rows=10 width=32) (actual time=0.002..0.006 rows=10 loops=1) |
-> Nested Loop (cost=0.84..44.34 rows=17 width=61) (actual time=0.031..0.176 rows=8 loops=20000) |
-> Index Scan using assortment_product_article_number_idx on assortment_product ass_product (cost=0.29..10.01 rows=4 width=22) (actual time=0.013..0.021 rows=10 loops=20000) |
Index Cond: ((article_number)::text = "*VALUES*_1".column1) |
-> Index Scan using customer_assortment_assortment_uuid_published_to_idx on customer_assortment cust_assortment (cost=0.55..8.57 rows=1 width=55) (actual time=0.014..0.015 rows=1 loops=195740)|
Index Cond: ((assortment_uuid = ass_product.assortment_uuid) AND ((published_to)::text = "*VALUES*".column1)) |
Planning time: 3.601 ms |
Execution time: 4385.493 ms |
Can anybody bring some sanity into my life please?
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
You would get better performance with an index-only scan on customer_assortment
. For that, create the index as follows:
CREATE UNIQUE INDEX customer_assortment_assortment_uuid_published_to_idx
ON public.customer_assortment (assortment_uuid, published_to)
INCLUDE (id, assortment_name);
Then, make sure the visibility map is up to date:
VACUUM public.customer_assortment;
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0