The question:
I have a column with strings containing a list of species:
+----------------------------------------+
| species                                |
+----------------------------------------+
| Dinosauria, Ornitischia, indeterminado |
| Sirenia                                |
| Dinosauria, Therophoda                 |
| Dinosauria, Therophoda, Allosaurus     |
| and so on...                           |
+----------------------------------------+
I am looking for a way, in PostgreSQL 12, to list and count all the unique names such as:
+---------------+-------+
| species       | count |
+---------------+-------+
| Dinosauria    | 3     |
| Ornitischia   | 1     |
| indeterminado | 1     |
| Sirenia       | 1     |
| Therophoda    | 2     |
| Allosaurus    | 1     |
+---------------+-------+
The Solutions:
Method 1
You can split the comma-separated list into rows using regexp_split_to_table() and then group by that value:
select s.species, count(*)
from the_table t
  cross join regexp_split_to_table(t.species, '\s*,\s*') as s(species)
group by s.species;
I am using a regex as the delimiter to strip the whitespace around each comma. The same result is possible with unnest(string_to_array(t.species, ',')), but then you need to trim() the values to remove the whitespace.
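A minimal sketch of that string_to_array() variant follows. To keep it self-contained, the sample rows from the question are supplied inline through a CTE named the_table; that CTE is an assumption for illustration and stands in for the real table.

```sql
-- Inline sample data copied from the question; replace this CTE with the real table.
with the_table(species) as (
  values
    ('Dinosauria, Ornitischia, indeterminado'),
    ('Sirenia'),
    ('Dinosauria, Therophoda'),
    ('Dinosauria, Therophoda, Allosaurus')
)
select trim(s.species) as species, count(*)
from the_table t
  -- Split on the bare comma, so each element keeps its leading space.
  cross join unnest(string_to_array(t.species, ',')) as s(species)
-- Group on the trimmed value so ' Therophoda' and 'Therophoda' count as one name.
group by trim(s.species);
```

With these four rows, Dinosauria is counted 3 times, Therophoda 2 times, and every other name once. Note that trim() with no extra arguments removes only spaces, so tabs or other whitespace around a name would still need the regex version.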
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.