The question:
I have a table representing some files with their paths and another table with file sharings by user and files.
Below is Table files, simplified. Example:
- I have a directory A (path /A) containing some files.
- Directory B is inside A and its path is /A/B and it contains some files.
- Files C (path /A/B/C) and D (path /A/B/D) are inside B.
id | path |
---|---|
1 | /A |
2 | /A/B |
3 | /A/B/C |
4 | /A/B/D |
Below is table file_sharings, simplified. If a user has a file sharing with path=’/A’ and readable=true, it means that he can read every files with a path starting with ‘/A’
Now I have that in this file (it’s only an example, it can be more complex):
id | user_id | file_id | path | readable |
---|---|---|---|---|
1 | 1 | 1 | ‘/A’ | true |
2 | 1 | 2 | ‘/A/B’ | false |
3 | 1 | 4 | ‘/A/B/D’ | true |
So, it means that user 1 can see everything in the directory A, but he can’t see directory B and file C but he can see file D (the same way we can have with google drive for instance)
What I want is a query to have all files/directories that user 1 can see.
For the moment I have something like this:
SELECT *
FROM files
WHERE path LIKE ANY (SELECT fs.path || '%'
FROM file_sharings fs
WHERE fs.user_id='1'
AND fs.readable=true)
AND (path NOT LIKE ANY (SELECT fs.path || '%'
FROM file_sharings fs
WHERE fs.user_id='1'
AND fs.readable=false)
);
but it returns only:
/A
If I do:
SELECT *
FROM files
WHERE path LIKE ANY (SELECT fs.path || '%'
FROM file_sharings fs
WHERE fs.user_id='1'
AND fs.readable=true)
AND (path NOT IN (SELECT fs.path
FROM file_sharings fs
WHERE fs.user_id='1'
AND fs.readable=false)
);
… it returns:
/A
/A/B/C
/A/B/D
I don’t know how to have all files that user 1 can read except the ones he can’t read. Here I must have /A
and /A/B/D
, but in reality the tree files can be more complex but the tenet is the same that in this simple example.
It’s easy in the real life but I’m stuck to create the good request 😀
Here a dbfiddle
When I have no row in the DB, it means the user has no readable rights.
When I have a row with readable=true, it means the user has a readable right on the file and the children (the files in the repository).
When I have a row with readable=false, it means that the file and its children are specifically forbidden to the user.
I do this way because I don’t want to create billion rows when I share a root directory to another user.
file_id
is a foreign key used only to recalculate the file_sharings path when I move a file/repository into another one.
So I can have a readable right on a file whereas it is not explicitly in the DB
Here a picture of the fiddle example for user1:
With this request:
SELECT *
FROM files
WHERE path LIKE ANY (SELECT fs.path || '%'
FROM file_sharings fs
WHERE fs.user_id='1'
AND fs.readable=true)
AND (path NOT LIKE ANY (SELECT fs.path || '%'
FROM file_sharings fs
WHERE fs.user_id='1'
AND fs.readable=false)
);
It returns A1
, B2
and C3
. I should also have C2
.
With the second request:
SELECT *
FROM files
WHERE path LIKE ANY (SELECT fs.path || '%'
FROM file_sharings fs
WHERE fs.user_id='1'
AND fs.readable=true)
AND (path NOT IN (SELECT fs.path
FROM file_sharings fs
WHERE fs.user_id='1'
AND fs.readable=false)
);
It returns A1
, B2
, C3
, C2
and C1
. I should not have C1
.
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
You didn’t declare any constraints except NOT NULL
, so I assumed reasonable constraints as shown in my fiddle below.
I didn’t understand the role of file_id
, so I ignored it completely. (Seems orthogonal to the question.)
Your logic boils down to this: each file has permissions according to the longest matching path. So match every file to rows in file_sharings
with LIKE
and pick the row with the longest path. Only files with a readable
flag survive:
SELECT (f).*
FROM (
SELECT DISTINCT ON (f.id)
f, s.readable
FROM files f
JOIN file_sharings s ON f.path LIKE s.path || '%'
WHERE s.user_id = 1
ORDER BY f.id, s.path DESC -- choose the longest matching path !
) sub
WHERE readable
ORDER BY (f).path;
Produces your desired result.
db<>fiddle here
I simplified your fiddle, added the required UNIQUE
constraint, and fixed a data error to be in sync with the question.
It’s essential to select the match with the longest path in a subquery before filtering the readable
ones.
About DISTINCT ON
:
Depending on undisclosed cardinalities and data distribution, there may be ways to improve performance …
Aside
I select the whole row f
in the subquery for convenience – so that I don’t have to spell out all column names of table file
in the outer SELECT
.
Here is a more verbose form without that trick:
SELECT id, name, path
FROM (
SELECT DISTINCT ON (f.id)
f.*, s.readable
FROM files f
JOIN file_sharings s ON f.path LIKE s.path || '%'
WHERE s.user_id = 1
ORDER BY f.id, s.path DESC -- choose the longest matching path !
) sub
WHERE readable
ORDER BY path;
A subtle difference (besides short code) surfaces after changing the table definition of file
later: the first query keeps returning all columns as listed in the table definition (result changes with the changed table), while the second query keeps returning columns as explicitly listed. “Late binding” vs. “early binding”.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0