Optimising a query for matching multiple values on a related table

The question:

I am trying to optimise a query for fetching all game IDs where certain units (Archer, Cavalry, etc.) were present.

I am using the players table, which contains the game_id, and the units table, which contains a player_id foreign key and a unit_type_id foreign key.

The relevant columns for the tables mentioned above are:

table: games

| id | 
|----|
| 1  |

table: players

| id | game_id |
|----|---------|
| 1  | 1       |
| 2  | 1       |
| 3  | 2       |
| 4  | 2       |

table: units

| id  | player_id    | unit_type_id |
|-----|--------------|--------------|
| 1   | 1            | 1            |
| 2   | 1            | 1            |
| 3   | 1            | 1            |
| 4   | 1            | 1            |
| 5   | 1            | 2            |
| 6   | 1            | 2            |
| 7   | 1            | 2            |
| 8   | 1            | 2            |
| 9   | 2            | 3            |
| 10  | 2            | 3            |
| 11  | 2            | 3            |
| 12  | 2            | 3            |

table: unit_types

| id | type              |
|----|-------------------|
| 1  | ARCHER            |
| 2  | BERSERKER         |
| 3  | CAVALRY           |
| 4  | CROSSBOWMAN       |

The query I’m currently using, which works, is:

SELECT
players.game_id AS gameId,
GROUP_CONCAT(DISTINCT units.unit_type_id) AS unitTypes
FROM players
INNER JOIN units
ON players.id = units.player_id
GROUP BY players.game_id
HAVING
FIND_IN_SET('1', unitTypes)
AND
FIND_IN_SET('2', unitTypes)

I programatically generate the FIND_IN_SET statements.

Importantly, I want to return only game IDs that contain all specified unit types. This is why WHERE units.unit_type_id IN (1, 2, 3) won’t work.

I have also tried:

  • HAVING SUM(CASE WHEN units.unit_type_id IN (1, 2) THEN 1 END) = 2; – didn’t work
  • HAVING SUM(units.unit_type_id IN (10, 11)) = 2; – didn’t work
  • Using multiple inner joins, one on each unit type – this was very slow, taking 3 times as long

MySQL version: 5.7.37

DB Fiddle with examples

Is there a better way of doing this to improve performance in terms of speed and resources? Any suggestions would be appreciated.

Thanks!

The Solutions:

Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.

Method 1

I hope I understood correctly the question, if not please let me know.

Try:

SELECT p.game_id AS gameId
FROM players p
INNER JOIN units u ON p.id = u.player_id
WHERE u.unit_type_id in (1,2)
GROUP BY p.game_id
HAVING COUNT(DISTINCT u.unit_type_id) =2 ;

Demo

This will return the players.game_id which both have unit_type_id (1,2). If you have three or more u.unit_type_id in you have to change HAVING COUNT(DISTINCT u.unit_type_id) =2 ; to the number of the records in the in condition

Method 2

Not sure whether this solves your problem:

SELECT players.game_id AS gameId
     , GROUP_CONCAT(DISTINCT units.unit_type_id) AS unitTypes
FROM players
JOIN units
    ON players.id = units.player_id
GROUP BY players.game_id
HAVING COUNT(DISTINCT CASE WHEN units.unit_type_id IN (11,16) 
                           THEN units.unit_type_id  
                      END) = 2;

Explanation, the CASE expression maps 11->11, 16->16 and everything else to NULL. By counting DISTINCT we get 2 for those that contain both 11, 16

Your attempt:

HAVING SUM(CASE WHEN units.unit_type_id IN (1, 2) THEN 1 END) = 2;

Won’t work since the bag {1,1,2} contains 3 elements and worse {2,2} contains 2 elements.

Since you map all values to the same constant transforming the bag to a set via DISTINCT won’t work:

{1,1,2} -> {1,1,1} -> {1}
{2,2} -> {1,1} -> {1}

You can however easily adjust the attempt to:

HAVING SUM(DISTINCT CASE WHEN units.unit_type_id IN (1, 2) 
                         THEN units.unit_type_id 
                    END) = 2;

which in all essential is the same expression as I used

Method 3

The current plan involves

  1. Do a JOIN, thereby exploding the number of rows into an intermediate table
  2. Shrink back via GROUP BY
  3. Filter down further

See if this provides the correct result and does it faster:

SELECT  p.game_id AS gameId
    FROM  
    (
        SELECT  u.player_id
            FROM  units AS u
            WHERE  u.unit_type_id in (1,2)
            GROUP BY  u.player_id
            HAVING  COUNT(DISTINCT u.unit_type_id) = 2 
    ) AS x
    JOIN  players AS p  ON p.id = x.player_id
    GROUP by  p.game_id 

With these indexes:

u:  INDEX(unit_type_id, player_id)

When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.

Method 4

select distinct(p.game_id) game_id
from players p
inner join units u on u.player_id=p.id and u.unit_type_id in ('1', '2')
where t.game_id in (1,2,3,4)

avoid misunderstanding

select game_id
from (
    select distinct(p.game_id) game_id
    from players p
    inner join units u on u.player_id=p.id and u.unitTypes in ('1', '2')
) t
where t.game_id in (1,2,3,4)

Method 5

The query filter clause:

where (players.game_id IN (1, 2, 3, 4)) 
AND units.unit_type_id = 1 
AND units.unit_type_id = 2
AND units.unit_type_id = 3  

Where is for filtering.

Grouping is for aggregation.

Select DISTINCT if you have multiple rows matching.

Yeah, no. I didn’t read the specifications carefully. Try this:

DECLARE @Games TABLE (ID INT, Description VARCHAR(20))
DECLARE @Players TABLE (ID INT, Game_ID INT, Description VARCHAR(20))
DECLARE @Unit_Types TABLE (ID INT, [Type] VARCHAR(20))
DECLARE @Units TABLE (ID INT, Player_ID INT, unit_type_id INT, Description VARCHAR(20))

INSERT INTO @Games (ID, Description) VALUES (1, 'Marbles'),(2, 'Noughts and Crosses')
INSERT INTO @Players (ID, Game_ID, Description) VALUES (1, 1, 'Player1'), (2, 2, 'Player2')
INSERT INTO @Unit_Types (ID, [Type]) VALUES (1, 'Archer'),(2,'Berserker'),(3,'Cavalry'),(4,'Crossbowman')
INSERT INTO @Units (ID, Player_ID, unit_type_id, Description) 
VALUES
( 1  , 1 , 1   ,'Player1 Archer 1'),
( 2  , 1 , 1   ,'Player1 Archer 2'),
( 3  , 1 , 1   ,'Player1 Archer 3'),
( 4  , 1 , 1   ,'Player1 Archer 4'),
( 5  , 1 , 2   ,'Player1 Berserker 5'),
( 6  , 1 , 2   ,'Player1 Berserker 6'),
( 7  , 1 , 2   ,'Player1 Berserker 7'),
( 8  , 1 , 2   ,'Player1 Berserker 8'),
( 9  , 2 , 3   ,'Player2 Cavalry 9'),
( 10 , 2 , 3   ,'Player2 Cavalry 10'),
( 11 , 2 , 3   ,'Player2 Cavalry 11'),
( 12 , 2 , 3   ,'Player2 Cavalry 12')

/* Importantly, I want to return only game IDs that contain all specified unit types. */
SELECT * 
FROM @Games G1
WHERE EXISTS(
/* Type 1 */
    SELECT *
    FROM @Games G
    JOIN @Players P
    ON P.Game_ID = G.ID
    JOIN @Units U
    ON U.Player_ID = P.ID
    JOIN @Unit_Types UT
    ON UT.ID = U.unit_type_id

    
    where (P.game_id IN (1, 2, 3, 4)) 
    AND U.unit_type_id = 1 
    AND G.ID = G1.ID
)
AND EXISTS (
/* Type 2 */
    SELECT *
    FROM @Games G
    JOIN @Players P
    ON P.Game_ID = G.ID
    JOIN @Units U
    ON U.Player_ID = P.ID
    JOIN @Unit_Types UT
    ON UT.ID = U.unit_type_id

    /* Importantly, I want to return only game IDs that contain all specified unit types. */
    where (P.game_id IN (1, 2, 3, 4)) 
    AND U.unit_type_id = 2 
    AND G.ID = G1.ID
)


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Comment