I have two tables :
assortment_product. They are connected between them with an
assortment_uuid. I also have:
CREATE UNIQUE INDEX assortment_product_assortment_uuid_article_number_idx ON public.assortment_product USING btree (assortment_uuid, article_number) CREATE UNIQUE INDEX customer_assortment_assortment_uuid_published_to_idx ON public.customer_assortment USING btree (assortment_uuid, published_to) CREATE INDEX assortment_product_article_number_idx ON public.assortment_product USING btree (article_number)
Having a list of customers and a list or articles, I want to get the entries where these 3 things match:
- assortment_uuid - published_to (customer) - article_number
My query looks like this:
select ass_product.article_number as articleNumber, cust_assortment.published_to as customerId, array_agg(cast (cust_assortment.assortment_uuid as varchar) order by cust_assortment.id) as uuids, array_agg(cust_assortment.assortment_name order by cust_assortment.id) as names from customer_assortment cust_assortment join assortment_product ass_product on cust_assortment.assortment_uuid = ass_product.assortment_uuid # there are max 10 values here where cust_assortment.published_to = any(values ('7000014910'), ('7000014940') ... ) # there a max 2000 values here and ass_product.article_number = any(values ('FK5327'),('GG1114') ... ) group by ass_product.article_number, cust_assortment.published_to;
This takes tens of seconds to get.
My understanding of a query plan is somehow limited, but if it helps, here it is:
GroupAggregate (cost=92861.99..94421.60 rows=47988 width=81) (actual time=4107.602..4382.830 rows=20000 loops=1) | Group Key: ass_product.article_number, cust_assortment.published_to | -> Sort (cost=92861.99..92981.96 rows=47988 width=61) (actual time=4107.518..4123.623 rows=153760 loops=1) | Sort Key: ass_product.article_number, cust_assortment.published_to | Sort Method: quicksort Memory: 27767kB | -> Nested Loop (cost=30.99..89130.83 rows=47988 width=61) (actual time=2.175..3632.692 rows=153760 loops=1) | -> HashAggregate (cost=30.00..32.00 rows=200 width=32) (actual time=2.090..3.603 rows=2000 loops=1) | Group Key: "*VALUES*_1".column1 | -> Values Scan on "*VALUES*_1" (cost=0.00..25.00 rows=2000 width=32) (actual time=0.004..1.048 rows=2000 loops=1) | -> Nested Loop (cost=0.99..445.39 rows=10 width=61) (actual time=0.034..1.797 rows=77 loops=2000) | -> HashAggregate (cost=0.15..0.25 rows=10 width=32) (actual time=0.000..0.006 rows=10 loops=2000) | Group Key: "*VALUES*".column1 | -> Values Scan on "*VALUES*" (cost=0.00..0.12 rows=10 width=32) (actual time=0.002..0.006 rows=10 loops=1) | -> Nested Loop (cost=0.84..44.34 rows=17 width=61) (actual time=0.031..0.176 rows=8 loops=20000) | -> Index Scan using assortment_product_article_number_idx on assortment_product ass_product (cost=0.29..10.01 rows=4 width=22) (actual time=0.013..0.021 rows=10 loops=20000) | Index Cond: ((article_number)::text = "*VALUES*_1".column1) | -> Index Scan using customer_assortment_assortment_uuid_published_to_idx on customer_assortment cust_assortment (cost=0.55..8.57 rows=1 width=55) (actual time=0.014..0.015 rows=1 loops=195740)| Index Cond: ((assortment_uuid = ass_product.assortment_uuid) AND ((published_to)::text = "*VALUES*".column1)) | Planning time: 3.601 ms | Execution time: 4385.493 ms |
Can anybody bring some sanity into my life please?
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
You would get better performance with an index-only scan on
customer_assortment. For that, create the index as follows:
CREATE UNIQUE INDEX customer_assortment_assortment_uuid_published_to_idx ON public.customer_assortment (assortment_uuid, published_to) INCLUDE (id, assortment_name);
Then, make sure the visibility map is up to date: