Any performance reasons to prefer idx() (intarray) over array_position()?

The question:

Are there performance reasons (or other reasons) to prefer idx (from intarray) over array_position to find the position of an integer in a list of integers?

I ask because we’ve used idx since before array_position was added to Postgres. It would be nice to use the built-in function rather than relying on the intarray extension, if there is no performance penalty.

Very informal benchmarking suggests it’s comparable. PostgreSQL version is 13.5.

The Solutions:

Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.

Method 1

There are multiple more or less subtle differences. Some become obvious from looking at the description in the manual:

The manual about idx():

idx ( integer[], item integer ) → integer

Returns index of the first array element matching item, or 0 if no
match.

The manual about array_position():

array_position ( anycompatiblearray, anycompatible [, integer
] ) → integer

Returns the subscript of the first occurrence of the second argument
in the array, or NULL if it’s not present. If the third argument is
given, the search begins at that subscript. The array must be
one-dimensional. Comparisons are done using IS NOT DISTINCT FROM
semantics, so it is possible to search for NULL.

  • If not found, idx() returns 0, but array_position() returns null.

  • idx() cannot handle null at all. intarray as a whole disallows arrays with null values. So it also makes no sense to search for it, you always get null when searching for null.
    array_position() allows null.

  • Unlike idx(), array_position() allows an offset in the search as 3rd parameter. So you can iterate through arrays to find one match after the other. idx() can never find elements past the first occurrence. (intarray is mostly optimized to deal with sorted, unique array elements.)

  • OTOH, idx() accepts multi-dimensional arrays (treats them as flat), while array_position() does not.

  • Non-standard array subscripts are ignored by idx() (generally bypassing array dimensions and subscripts), but honored by array_position(). So results can differ for seemingly equivalent, simple expressions! It’s best to stick with standard array subscripts, but one must be aware of possibilities. See:

  • array_position() takes polymorphic arguments, so it works for any array type, while idx() works for integer exclusively. There are pros and cons to either. idx() is faster with function type resolution, and will never be confused with different types, but it’s limited to integer: integer or die.

All the added versatility of array_position() comes at a price. Most importantly, array_position() uses IS NOT DISTINCT FROM semantics to handle null values, and that is substantially more expensive. This leads to the possibly most important difference:

  • idx() is much faster and scales much better with long arrays.
    (Your “very informal benchmarking” mislead you somehow.)

db<>fiddle here – demonstrating all.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Comment