The question:
In Db2 v11.5.7 on Linux I have simple table:
create table admin.patient_data (
patientid int not null primary key,
patient_name varchar(10),
illness varchar(15),
doctor_name varchar(10)
);
insert into admin.patient_data values (1, 'Alice', 'illness A', 'DOCTOR1');
insert into admin.patient_data values (2, 'Bob', 'illness B', 'DOCTOR2');
select * from admin.patient_data;
Result:
PATIENTID PATIENT_NAME ILLNESS DOCTOR_NAME
----------- ------------ --------------- ----------
1 Alice illness A DOCTOR1
2 Bob illness B DOCTOR2
Column ILLNESS is sensitive data. I would like to allow this column to be displayed only if column DOCTOR_NAME matches Db2 logged-in user.
create mask admin.patient_data on admin.patient_data
for column illness return
case when doctor_name = USER then illness else 'Masked data' end
enable;
alter table admin.patient_data activate column access control;
Now user DOCTOR1 connects to database and checks the data with:
select * from admin.patient_data
it is returned:
PATIENTID PATIENT_NAME ILLNESS DOCTOR_NAME
----------- ------------ --------------- -----------
1 Alice illness A DOCTOR1
2 Bob Masked data DOCTOR2
In row PATIENTID=2 column ILLNESS is expected to be masked.
But doctor is educated, so he/she knows all the illnesses and now target “illness B”
db2 "select * from admin.patient_data where illness = 'illness B'"
and it gets:
PATIENTID PATIENT_NAME ILLNESS DOCTOR_NAME
----------- ------------ --------------- -----------
2 Bob Masked data DOCTOR2
Field ILLNESS is still marked as expected, but now because of where conditional DOCTOR1 knows that patient Bob has “illness B”.
I expected that end-users could filter by values they are returned with where condition. That is: “illness A” and “Marked data”.
Is there some simple solution to prevent this? I expect to have no record returned for last select statement.
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
This is Working As Designed™:
The application of enabled column masks does not interfere with the operations of other clauses within the statement such as the
WHERE
,GROUP BY
,HAVING
,SELECT DISTINCT
, andORDER BY
. The rows returned in the final result table remain the same, except that the values in the resulting rows might be masked by the column masks.
If you want to prevent access to specific rows, you’ll need to activate row access control and define the required row permissions.
Method 2
Masking is done pretty late in the process, before returning data to end-user, way after WHERE condition is executed.
If the requirement is to strictly do the masking on database level (and not on application level) like all doctors have to have access to all of the patient names, then I see no other option then redesign ER model. This kind of kills the main “marketing” idea masking to be applied on top of existing tables.
First option is to use views instead of data masking and applications access views instead of tables.
Second option is to separate sensitive and non-sensitive data in separate tables. Like having “patientid”, “patient_name”, “doctor_name” in “patient” table and separate patient-illness relationship in separate table with “patientid” and “illness” fields. Or having “patientid”, “patient_name”, “doctor_name” and new field “illnessid” in “patient” table and illnessid and illness in new illness table. Then alongside of column masking also use row access control in this two separated tables.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0