The question:
EXPLAIN ANALYZE SELECT * FROM "subscriptions_price"
WHERE "subscriptions_price"."product_id" = 78 AND "subscriptions_price"."active"
ORDER BY RANDOM() ASC
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=14.58..14.59 rows=6 width=127) (actual time=0.029..0.030 rows=7 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 26kB
-> Index Scan using subscriptions_price_product_id_b807373d on subscriptions_price (cost=0.28..14.50 rows=6 width=127) (actual time=0.014..0.019 rows=7 loops=1)
Index Cond: (product_id = 78)
Filter: active
Planning Time: 0.082 ms
Execution Time: 0.056 ms
I’d like to think the sorting is only be done on 6 rows and I think the first line bears this out, but the index scan doesn’t happen until after the sort and so that makes me wonder why the index scan isn’t first and the sort applied over the result set instead of the whole table.
I’m wondering if I’m just reading the query plan incorrectly or what.
My assumption was that if the ORDER BY RANDOM() was on a sufficiently small set (limited by the select) the performance impact would be minimal but I’m worried that the sort is being applied to the table before the selection and one of the tables I’m doing this on is getting kind of large so I want to make sure I’m not making a big mistake.
This is a similar EXPLAIN ANALYZE in a table with 14000 rows so I’m thinking that I’m reading the QUERY PLAN backwards and I should be reading it from the bottom up since the sort claims it only sorted 63 rows.
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=406.34..406.34 rows=1 width=1488) (actual time=3.025..3.029 rows=63 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 94kB
-> Nested Loop (cost=0.29..406.33 rows=1 width=1488) (actual time=0.054..2.926 rows=63 loops=1)
-> Seq Scan on accounts_profile (cost=0.00..398.02 rows=1 width=1429) (actual time=0.030..2.542 rows=63 loops=1)
Filter: (is_capper AND show_in_directory)
Rows Removed by Filter: 15093
-> Index Scan using auth_user_pkey on auth_user (cost=0.29..8.30 rows=1 width=51) (actual time=0.005..0.005 rows=1 loops=63)
Index Cond: (id = accounts_profile.user_id)
Planning Time: 1.062 ms
Execution Time: 3.117 ms
(11 rows)
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
You have to read an execution plan as a tree lying on its side. The more indented a plan step (node) is, the further it is from the root. It is customary to show these trees with the root on the top, so your second execution plan would correspond to this tree:
+------+
| Sort |
+------+
|
+-------------+
| Nested Loop |
+-------------+
/
+-----------------+ +------------+
| Sequential Scan | | Index Scan |
+-----------------+ +------------+
In a simplified view, lower nodes are executed first.
In reality, all nodes are executed on demand: For example, to get the first result from the nested loop join, PostgreSQL first gets the first row from the sequential scan. Then it performs the index scan to get join results. If only the first row from the nested loop join were needed, PostgreSQL might not execute the sequential scan to its end, but stop after fetching the first few rows. However, the sort node needs all rows from the nested loop join before it can even start, so in this query all nodes would be executed to the end. Sometimes, execution plan steps are executed one after the other, but often several of them are active at the same time.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0