Unique array values for this string_to_array

The question:

This is a follow-up to:

Based on these sample tables:

data_providers:
id | field_map
--------------
1  | {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}

leads:
id | data_provider_id | email | data
------------------------------------
1  | 201              | [email protected] | {"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}
2  | 201              | [email protected] | {"FULL NAME": "Billy Bob"}

desired output:

data_provider_id | email | full_name | segment
----------------------------------------------
201              | [email protected] | John Doe  | id1
201              | [email protected] | John Doe  | id2
201              | [email protected] | John Doe  | id3
201              | [email protected] | Billy Bob | NULL

I have the following query:

SELECT
  leads.data_provider_id,
  leads.email,
  leads.data->>(p.field_map->>'full_name') AS full_name,
  segment
FROM leads
LEFT OUTER JOIN data_providers p ON p.id = leads.data_provider_id
LEFT JOIN LATERAL unnest(string_to_array(leads.data->>(p.field_map->>'segments'), ',')) AS segment ON true

This query is doing 2 particular things:

  1. its joining on data_providers table to get the field_map column which contains a JSONB mapping if CSV column headers. So something like {"segments": "SEGMENT ID", "full_name": "FULL NAME"}

  2. Within the data JSONB column of leads, there is a key (which I discover through the field map above) that contains a comma separated string of segment_ids (it comes in a CSV and they chose to put 2 values within 1 row). I want to split it so each segment_id gets its own row (and obviously all other columns remain the same on both rows).

I have 2 goals:

  1. If there is an empty string or the key doesn’t exist within the map, I want to return the row but just with NULL for the segment_id. I already got this working by changing CROSS JOIN to LEFT JOIN.

  2. I’m trying to remove duplicates in segment ids, so if someone enters ‘id1,id1’ it should only produce 1 row. I do this because there is a unique index on that column for the materialized view.

I’m currently stuck on #2.

The Solutions:

Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.

Method 1

Make it a subquery and throw in DISTINCT:

SELECT l.data_provider_id
     , l.email
     , l.data->>(p.field_map->>'full_name') AS full_name
     , s.segment
FROM   leads l
LEFT   JOIN data_providers p ON p.id = l.data_provider_id
LEFT   JOIN LATERAL (
   SELECT DISTINCT segment
   FROM   unnest(string_to_array(l.data->>(p.field_map->>'segment'), ',')) AS segment
   ) s ON true

Your field_map holds the key ‘segment’, not ‘segments’, btw.

You could even use this short syntax:

...
LEFT   JOIN LATERAL (
   SELECT DISTINCT unnest(string_to_array(l.data->>(p.field_map->>'segment'), ','))
   ) s(segment) ON true

(But the last one might make unsuspecting SQL purists cringe.)

Original order of array elements is not preserved. If you need that, see:

And use GROUP BY rather than DISTINCT and also aggregate the minimum ordinal position for each group of duplicates.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Comment