The question:
I have multiple lookup tables and each need to be filtered by an array of values:
select
array_agg(joined_properties.name) as properties,
items.name
from items
left join (
select properties.name, item_property.item_id, item_property.property_id
from item_property
join properties on properties.id = item_property.property_id) as joined_properties on joined_properties.item_id = items.id
group by items.name;
properties | name |
---|---|
property1 | item2 |
property1,property2,property4 | item1 |
property2,property3 | item3 |
property2 | item5 |
property3,property4 | item4 |
I would like to filter each item by an array of properties, but return the unfiltered properties (just like above) in the same query.
What is the performant/correct way of doing it without needing to re-run the joins once I found matching rows? Whenever I filter inside the join
or outside it before grouping it only returns the matching rows of course:
select
array_agg(joined_properties.name) as properties,
items.name
from items
inner join (
select properties.name, item_property.item_id, item_property.property_id
from item_property
join properties on properties.id = item_property.property_id
where properties.name = ANY('{"property1","property2"}')
) as joined_properties on joined_properties.item_id = items.id
group by items.name;
properties | name |
---|---|
property1,property2 | item1 |
property1 | item2 |
property2 | item3 |
property2 | item5 |
Expected output:
properties | name |
---|---|
property1 | item2 |
property1,property2,property4 | item1 |
property2,property3 | item3 |
property2 | item5 |
Fiddle below:
Edit 1:
- Approximate hardware specs: 4-core Xeon Gold VM, 16GB RAM, SSD
- Relevant DB cardinalities:
- ~100k records in
items
, - 6
properties_i
text tables with their respective intermediary tables, - most
properties_i
tables range from 10-30 records, - largest
properties_i
will have 200 records indexed, - each item in
items
has 0-10 associated properties for anyproperty_i
- ~100k records in
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
The best solution heavily depends on data distribution and cardinalities – and available indexes.
This query should be generally good:
SELECT sub.properties, i.name AS item
FROM (
SELECT item_id, array_agg(p.name) AS properties
FROM (
SELECT DISTINCT item_id
FROM item_property
WHERE property_id = ANY (
SELECT id FROM properties
WHERE name = ANY ('{"property1","property2"}')
)
) ip0
JOIN item_property ip USING (item_id)
JOIN properties p ON p.id = ip.property_id
GROUP BY item_id
) sub
JOIN items i ON i.id = sub.item_id
ORDER BY i.name; -- my optional addition
db<>fiddle here
The innermost subquery ip0
identifies items (or rather: item_id
‘s) that have one of the filtered properties.
The next subquery sub
joins to all properties and aggregates per item – not involving table items
, yet!
The outer query replaces item_id
with actual item name to arrive at your desired result. If item names are not defined UNIQUE NOT NULL
, I would still return item_id
additionally to avoid ambiguities.
Other query styles may be faster for particular data distributions / search arguments. Thee are many ways. I added one alternative with IN
to the fiddle.
Indexes
For big tables and selective filters, matching indexes make all the difference. For your given data distribution:
-
You don’t need any indexes on
properties
as it’s tiny. -
You’ll already have a UNIQUE or PK constraint on
item_property (item_id, property_id)
. Add another index on the same two columns in reversed order, so(property_id, item_id)
in my example. Why? See:If
item_property
is vacuumed enough, you’ll even see index-only scans. -
Obviously an index on
items(id)
– that will already be covered by the PK.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0