What is more efficient on select – objects identified only by their ID, or by their ID + customer id?

The question:

I have a collection of ~500K configurations. Each configuration belongs to a specific customer. There are up to a few hundred customers.

I want to store the configurations in a table, and it will be used in JOIN statements.
In the select query, there is always a condition on the customer id. It is always joined with the configurations table. Sometimes there is a condition on one or more of the columns in the configuration table.

I would like to know what is the better approach:

  • PK of the configurations table is customer id + configuration id. I will include the customer id condition also in the join clause.
  • PK of the configurations table is only configuration id.

I want to understand:

  • Should the presence of the customer id in the PK have a major effect on performance?
  • Are there any disadvantages to using a 2-column PK? Assume there is ALWAYS a condition on the customer, so I will never query the configurations table on configuration id only.

Thanks.

The Solutions:


Method 1

Q: Should the presence of the customer id in the PK have a major effect on performance?

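A: Doing so makes PostgreSQL create a unique B-tree index on both columns, sorted in the order you declare them in the primary key. That index can serve queries and joins that filter on both customer_id and configuration_id and, if customer_id is listed first, queries that filter on customer_id alone. This is a composite (multicolumn) index. It is a covering index only for queries that read no columns beyond the ones stored in the index. So yes, the effects on performance are usually beneficial ones.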
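
As a rough illustration of the point above, here is a minimal sketch of the two-column key and a join that includes the customer id condition. It uses SQLite from Python's standard library purely as a runnable stand-in for PostgreSQL, so the planner output will differ from what PostgreSQL's EXPLAIN shows. The `events` table and all column names other than customer_id and configuration_id are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE configurations (
        customer_id      INTEGER NOT NULL,
        configuration_id INTEGER NOT NULL,
        name             TEXT    NOT NULL,
        -- customer_id leads because every query filters on it
        PRIMARY KEY (customer_id, configuration_id)
    );
    -- Hypothetical table that references configurations
    CREATE TABLE events (
        event_id         INTEGER PRIMARY KEY,
        customer_id      INTEGER NOT NULL,
        configuration_id INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO configurations VALUES (?, ?, ?)",
    [(1, 10, "alpha"), (1, 11, "beta"), (2, 10, "gamma")],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(100, 1, 10), (101, 1, 11), (102, 2, 10)],
)

# The join condition names both key columns, so the composite index can be used.
query = """
    SELECT e.event_id, c.name
    FROM events AS e
    JOIN configurations AS c
      ON c.customer_id = e.customer_id
     AND c.configuration_id = e.configuration_id
    WHERE e.customer_id = ?
    ORDER BY e.event_id
"""
rows = conn.execute(query, (1,)).fetchall()
print(rows)  # [(100, 'alpha'), (101, 'beta')]

# Print the plan chosen by SQLite; the last field of each row is the description.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, (1,)):
    print(step[-1])
```

Note that configuration id 10 exists for both customer 1 and customer 2 here, which is only possible because uniqueness is defined on the pair of columns.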

Q: Are there any disadvantages to using a 2-column PK? Assume there is ALWAYS a condition on the customer, so I will never query the configurations table on configuration id only.

A: No, there is nothing inherently wrong with a two-column primary key. But keep in mind that the main goal of a primary key is to ensure the uniqueness of rows in the table. The index that PostgreSQL creates on the primary key is only a secondary benefit.

Therefore, if adding another column to the primary key would break the uniqueness rule that makes logical sense for your data, you shouldn’t do it. Instead, keep the original primary key and add your own secondary index on both columns.

In your case, it sounds like you’d want the data uniqueness to be dependent on customer_id as well, so you’re probably fine to just include it in the Primary Key.
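
If configuration_id must instead stay unique across all customers, the alternative mentioned above (single-column primary key plus a secondary index on both columns) might look like the following sketch. Again, SQLite via Python's standard library is only a stand-in for PostgreSQL, and the index name `idx_configurations_customer` and the `name` column are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE configurations (
        configuration_id INTEGER PRIMARY KEY,  -- unique across all customers
        customer_id      INTEGER NOT NULL,
        name             TEXT    NOT NULL
    );
    -- Secondary, non-unique index for lookups that filter on customer_id
    CREATE INDEX idx_configurations_customer
        ON configurations (customer_id, configuration_id);
""")
conn.execute("INSERT INTO configurations VALUES (10, 1, 'alpha')")

# Reusing configuration_id 10 for another customer violates the primary key.
try:
    conn.execute("INSERT INTO configurations VALUES (10, 2, 'gamma')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

The secondary index gives the same lookup path for customer-scoped queries, while the primary key keeps enforcing the stricter uniqueness rule.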


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
