Why is count(x.*) slower than count(*)?

The question:

explain select count(x.*) from customer x;
...
->  Partial Aggregate  (cost=27005.45..27005.46 rows=1 width=8)
  ->  Parallel Seq Scan on customer x  (cost=0.00..26412.56 rows=237156 width=994)

explain select count(*) from customer x;
...                                     

->  Partial Aggregate  (cost=27005.45..27005.46 rows=1 width=8)
  ->  Parallel Seq Scan on customer x  (cost=0.00..26412.56 rows=237156 width=0)

The COUNT(x.*) here makes the width in the explain result read unnecessary row data.

I thought they should be identical, but it seems not, why?

The Solutions:

Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.

Method 1

Logically, both are identical – because x.* always counts, even when all columns are NULL.
But Postgres has a separate implementation for count(*).

It does not bother with any expression at all and only considers the existence of live rows. That’s slightly faster, which sums up to a relevant difference over many rows.
The performance penalty for count(x.*) grows with the number of columns / width of rows, and will be rather substantial for wide rows like yours (width=994).

It’s even documented explicitly:

count ( * ) → bigint

Computes the number of input rows.


count ( "any" ) → bigint

Computes the number of input rows in which the input value is not
null.

The gist of it: whenever you don’t care whether an expression is NULL, use count(*) instead.

Related:

Some other RDBMS do not have the same fast path for count(*). OTOH, counting all rows in a table is comparatively slow in Postgres due to its MVCC model that forces checking row visibility. See:


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Comment