The question:
This is a follow-up to:
Based on these sample tables:
data_providers:
id | field_map
--------------
1 | {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}
leads:
id | data_provider_id | email | data
------------------------------------
1 | 201 | [email protected] | {"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}
2 | 201 | [email protected] | {"FULL NAME": "Billy Bob"}
desired output:
data_provider_id | email | full_name | segment
----------------------------------------------
201 | [email protected] | John Doe | id1
201 | [email protected] | John Doe | id2
201 | [email protected] | John Doe | id3
201 | [email protected] | Billy Bob | NULL
I have the following query:
SELECT
leads.data_provider_id,
leads.email,
leads.data->>(p.field_map->>'full_name') AS full_name,
segment
FROM leads
LEFT OUTER JOIN data_providers p ON p.id = leads.data_provider_id
LEFT JOIN LATERAL unnest(string_to_array(leads.data->>(p.field_map->>'segments'), ',')) AS segment ON true
This query is doing 2 particular things:
-
its joining on
data_providers
table to get thefield_map
column which contains a JSONB mapping if CSV column headers. So something like{"segments": "SEGMENT ID", "full_name": "FULL NAME"}
-
Within the
data
JSONB column ofleads
, there is a key (which I discover through the field map above) that contains a comma separated string of segment_ids (it comes in a CSV and they chose to put 2 values within 1 row). I want to split it so each segment_id gets its own row (and obviously all other columns remain the same on both rows).
I have 2 goals:
-
If there is an empty string or the key doesn’t exist within the map, I want to return the row but just with NULL for the segment_id. I already got this working by changing
CROSS JOIN
toLEFT JOIN
. -
I’m trying to remove duplicates in segment ids, so if someone enters ‘id1,id1’ it should only produce 1 row. I do this because there is a unique index on that column for the materialized view.
I’m currently stuck on #2.
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
Make it a subquery and throw in DISTINCT
:
SELECT l.data_provider_id
, l.email
, l.data->>(p.field_map->>'full_name') AS full_name
, s.segment
FROM leads l
LEFT JOIN data_providers p ON p.id = l.data_provider_id
LEFT JOIN LATERAL (
SELECT DISTINCT segment
FROM unnest(string_to_array(l.data->>(p.field_map->>'segment'), ',')) AS segment
) s ON true
Your field_map
holds the key ‘segment’, not ‘segments’, btw.
You could even use this short syntax:
...
LEFT JOIN LATERAL (
SELECT DISTINCT unnest(string_to_array(l.data->>(p.field_map->>'segment'), ','))
) s(segment) ON true
(But the last one might make unsuspecting SQL purists cringe.)
Original order of array elements is not preserved. If you need that, see:
And use GROUP BY
rather than DISTINCT
and also aggregate the minimum ordinal position for each group of duplicates.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0