The question:
I am trying to optimise a query for fetching all game IDs where certain units (Archer, Cavalry, etc.) were present.
I am using the players
table, which contains the game_id
, and the units
table, which contains a player_id
foreign key and a unit_type_id
foreign key.
The relevant columns for the tables mentioned above are:
table: games
| id |
|----|
| 1 |
table: players
| id | game_id |
|----|---------|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
table: units
| id | player_id | unit_type_id |
|-----|--------------|--------------|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 5 | 1 | 2 |
| 6 | 1 | 2 |
| 7 | 1 | 2 |
| 8 | 1 | 2 |
| 9 | 2 | 3 |
| 10 | 2 | 3 |
| 11 | 2 | 3 |
| 12 | 2 | 3 |
table: unit_types
| id | type |
|----|-------------------|
| 1 | ARCHER |
| 2 | BERSERKER |
| 3 | CAVALRY |
| 4 | CROSSBOWMAN |
The query I’m currently using, which works, is:
SELECT
players.game_id AS gameId,
GROUP_CONCAT(DISTINCT units.unit_type_id) AS unitTypes
FROM players
INNER JOIN units
ON players.id = units.player_id
GROUP BY players.game_id
HAVING
FIND_IN_SET('1', unitTypes)
AND
FIND_IN_SET('2', unitTypes)
I programatically generate the FIND_IN_SET
statements.
Importantly, I want to return only game IDs that contain all specified unit types. This is why WHERE units.unit_type_id IN (1, 2, 3)
won’t work.
I have also tried:
HAVING SUM(CASE WHEN units.unit_type_id IN (1, 2) THEN 1 END) = 2;
– didn’t workHAVING SUM(units.unit_type_id IN (10, 11)) = 2;
– didn’t work- Using multiple inner joins, one on each unit type – this was very slow, taking 3 times as long
MySQL version: 5.7.37
Is there a better way of doing this to improve performance in terms of speed and resources? Any suggestions would be appreciated.
Thanks!
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
I hope I understood correctly the question, if not please let me know.
Try:
SELECT p.game_id AS gameId
FROM players p
INNER JOIN units u ON p.id = u.player_id
WHERE u.unit_type_id in (1,2)
GROUP BY p.game_id
HAVING COUNT(DISTINCT u.unit_type_id) =2 ;
This will return the players.game_id which both have unit_type_id (1,2). If you have three or more u.unit_type_id in
you have to change HAVING COUNT(DISTINCT u.unit_type_id) =2 ;
to the number of the records in the in condition
Method 2
Not sure whether this solves your problem:
SELECT players.game_id AS gameId
, GROUP_CONCAT(DISTINCT units.unit_type_id) AS unitTypes
FROM players
JOIN units
ON players.id = units.player_id
GROUP BY players.game_id
HAVING COUNT(DISTINCT CASE WHEN units.unit_type_id IN (11,16)
THEN units.unit_type_id
END) = 2;
Explanation, the CASE expression maps 11->11, 16->16 and everything else to NULL. By counting DISTINCT we get 2 for those that contain both 11, 16
Your attempt:
HAVING SUM(CASE WHEN units.unit_type_id IN (1, 2) THEN 1 END) = 2;
Won’t work since the bag {1,1,2} contains 3 elements and worse {2,2} contains 2 elements.
Since you map all values to the same constant transforming the bag to a set via DISTINCT won’t work:
{1,1,2} -> {1,1,1} -> {1}
{2,2} -> {1,1} -> {1}
You can however easily adjust the attempt to:
HAVING SUM(DISTINCT CASE WHEN units.unit_type_id IN (1, 2)
THEN units.unit_type_id
END) = 2;
which in all essential is the same expression as I used
Method 3
The current plan involves
- Do a
JOIN
, thereby exploding the number of rows into an intermediate table - Shrink back via GROUP BY
- Filter down further
See if this provides the correct result and does it faster:
SELECT p.game_id AS gameId
FROM
(
SELECT u.player_id
FROM units AS u
WHERE u.unit_type_id in (1,2)
GROUP BY u.player_id
HAVING COUNT(DISTINCT u.unit_type_id) = 2
) AS x
JOIN players AS p ON p.id = x.player_id
GROUP by p.game_id
With these indexes:
u: INDEX(unit_type_id, player_id)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
Method 4
select distinct(p.game_id) game_id
from players p
inner join units u on u.player_id=p.id and u.unit_type_id in ('1', '2')
where t.game_id in (1,2,3,4)
avoid misunderstanding
select game_id
from (
select distinct(p.game_id) game_id
from players p
inner join units u on u.player_id=p.id and u.unitTypes in ('1', '2')
) t
where t.game_id in (1,2,3,4)
Method 5
The query filter clause:
where (players.game_id IN (1, 2, 3, 4))
AND units.unit_type_id = 1
AND units.unit_type_id = 2
AND units.unit_type_id = 3
Where is for filtering.
Grouping is for aggregation.
Select DISTINCT if you have multiple rows matching.
Yeah, no. I didn’t read the specifications carefully. Try this:
DECLARE @Games TABLE (ID INT, Description VARCHAR(20))
DECLARE @Players TABLE (ID INT, Game_ID INT, Description VARCHAR(20))
DECLARE @Unit_Types TABLE (ID INT, [Type] VARCHAR(20))
DECLARE @Units TABLE (ID INT, Player_ID INT, unit_type_id INT, Description VARCHAR(20))
INSERT INTO @Games (ID, Description) VALUES (1, 'Marbles'),(2, 'Noughts and Crosses')
INSERT INTO @Players (ID, Game_ID, Description) VALUES (1, 1, 'Player1'), (2, 2, 'Player2')
INSERT INTO @Unit_Types (ID, [Type]) VALUES (1, 'Archer'),(2,'Berserker'),(3,'Cavalry'),(4,'Crossbowman')
INSERT INTO @Units (ID, Player_ID, unit_type_id, Description)
VALUES
( 1 , 1 , 1 ,'Player1 Archer 1'),
( 2 , 1 , 1 ,'Player1 Archer 2'),
( 3 , 1 , 1 ,'Player1 Archer 3'),
( 4 , 1 , 1 ,'Player1 Archer 4'),
( 5 , 1 , 2 ,'Player1 Berserker 5'),
( 6 , 1 , 2 ,'Player1 Berserker 6'),
( 7 , 1 , 2 ,'Player1 Berserker 7'),
( 8 , 1 , 2 ,'Player1 Berserker 8'),
( 9 , 2 , 3 ,'Player2 Cavalry 9'),
( 10 , 2 , 3 ,'Player2 Cavalry 10'),
( 11 , 2 , 3 ,'Player2 Cavalry 11'),
( 12 , 2 , 3 ,'Player2 Cavalry 12')
/* Importantly, I want to return only game IDs that contain all specified unit types. */
SELECT *
FROM @Games G1
WHERE EXISTS(
/* Type 1 */
SELECT *
FROM @Games G
JOIN @Players P
ON P.Game_ID = G.ID
JOIN @Units U
ON U.Player_ID = P.ID
JOIN @Unit_Types UT
ON UT.ID = U.unit_type_id
where (P.game_id IN (1, 2, 3, 4))
AND U.unit_type_id = 1
AND G.ID = G1.ID
)
AND EXISTS (
/* Type 2 */
SELECT *
FROM @Games G
JOIN @Players P
ON P.Game_ID = G.ID
JOIN @Units U
ON U.Player_ID = P.ID
JOIN @Unit_Types UT
ON UT.ID = U.unit_type_id
/* Importantly, I want to return only game IDs that contain all specified unit types. */
where (P.game_id IN (1, 2, 3, 4))
AND U.unit_type_id = 2
AND G.ID = G1.ID
)
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0