The question:
Are there performance reasons (or other reasons) to prefer idx
(from intarray) over array_position
to find the position of an integer in a list of integers?
I ask because we’ve used idx
since before array_position
was added to Postgres. It would be nice to use the built-in function rather than relying on the intarray extension, if there is no performance penalty.
Very informal benchmarking suggests it’s comparable. PostgreSQL version is 13.5.
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
There are multiple more or less subtle differences. Some become obvious from looking at the description in the manual:
idx
(integer[]
,item
integer
) →integer
Returns index of the first array element matching item, or 0 if no
match.
The manual about array_position()
:
array_position
(anycompatiblearray
,anycompatible
[,integer
] ) →integer
Returns the subscript of the first occurrence of the second argument
in the array, orNULL
if it’s not present. If the third argument is
given, the search begins at that subscript. The array must be
one-dimensional. Comparisons are done usingIS NOT DISTINCT FROM
semantics, so it is possible to search forNULL
.
-
If not found,
idx()
returns 0, butarray_position()
returnsnull
. -
idx()
cannot handlenull
at all.intarray
as a whole disallows arrays withnull
values. So it also makes no sense to search for it, you always getnull
when searching fornull
.
array_position()
allowsnull
. -
Unlike
idx()
,array_position()
allows an offset in the search as 3rd parameter. So you can iterate through arrays to find one match after the other.idx()
can never find elements past the first occurrence. (intarray
is mostly optimized to deal with sorted, unique array elements.) -
OTOH,
idx()
accepts multi-dimensional arrays (treats them as flat), whilearray_position()
does not. -
Non-standard array subscripts are ignored by
idx()
(generally bypassing array dimensions and subscripts), but honored byarray_position()
. So results can differ for seemingly equivalent, simple expressions! It’s best to stick with standard array subscripts, but one must be aware of possibilities. See: -
array_position()
takes polymorphic arguments, so it works for any array type, whileidx()
works forinteger
exclusively. There are pros and cons to either.idx()
is faster with function type resolution, and will never be confused with different types, but it’s limited tointeger
:integer
or die.
All the added versatility of array_position()
comes at a price. Most importantly, array_position()
uses IS NOT DISTINCT FROM
semantics to handle null
values, and that is substantially more expensive. This leads to the possibly most important difference:
idx()
is much faster and scales much better with long arrays.
(Your “very informal benchmarking” mislead you somehow.)
db<>fiddle here – demonstrating all.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0