Are there performance reasons (or other reasons) to prefer
idx (from intarray) over
array_position to find the position of an integer in a list of integers?
I ask because we’ve used
idx since before
array_position was added to Postgres. It would be nice to use the built-in function rather than relying on the intarray extension, if there is no performance penalty.
Very informal benchmarking suggests it’s comparable. PostgreSQL version is 13.5.
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
There are multiple more or less subtle differences. Some become obvious from looking at the description in the manual:
Returns index of the first array element matching item, or 0 if no
] ) →
Returns the subscript of the first occurrence of the second argument
in the array, or
NULLif it’s not present. If the third argument is
given, the search begins at that subscript. The array must be
one-dimensional. Comparisons are done using
IS NOT DISTINCT FROM
semantics, so it is possible to search for
If not found,
idx()returns 0, but
intarrayas a whole disallows arrays with
nullvalues. So it also makes no sense to search for it, you always get
nullwhen searching for
array_position()allows an offset in the search as 3rd parameter. So you can iterate through arrays to find one match after the other.
idx()can never find elements past the first occurrence. (
intarrayis mostly optimized to deal with sorted, unique array elements.)
idx()accepts multi-dimensional arrays (treats them as flat), while
Non-standard array subscripts are ignored by
idx()(generally bypassing array dimensions and subscripts), but honored by
array_position(). So results can differ for seemingly equivalent, simple expressions! It’s best to stick with standard array subscripts, but one must be aware of possibilities. See:
array_position()takes polymorphic arguments, so it works for any array type, while
integerexclusively. There are pros and cons to either.
idx()is faster with function type resolution, and will never be confused with different types, but it’s limited to
All the added versatility of
array_position() comes at a price. Most importantly,
IS NOT DISTINCT FROM semantics to handle
null values, and that is substantially more expensive. This leads to the possibly most important difference:
idx()is much faster and scales much better with long arrays.
(Your “very informal benchmarking” mislead you somehow.)
db<>fiddle here – demonstrating all.