The question:
I was originally going to ask this on SO, but thought better of it, so I am trying my hand here.
I have searched this out and can’t find any definitive data. I have a MySQL table that looks like:
| id | type | value |
The value field contains a JSON string, and the column type is JSON. My JSON contains identification information that I'd like to be able to query on, i.e.:
WHERE json_extract(value, '$.contractor_id')='12345'
What are the performance ramifications of doing it this way instead of just creating a separate contractor_id column? This specific table has ~500,000 rows. The contractor_id isn't a key-able or index-able field either … So is it really six of one, half a dozen of the other? Or is there a specific reason I need to create a separate column for performance's sake?
The Solutions:
Method 1
It depends on your version of MySQL.
In MySQL 5.7, it was possible to create a virtual column based on the json_extract() expression, and an index on that virtual column. But you must search using that virtual column to use the index.
mysql> create table mytable (id serial primary key, type text, value json);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into mytable set value = '{"contractor_id": "12345"}';
Query OK, 1 row affected (0.00 sec)
mysql> alter table mytable add column contractor_id int as (json_unquote(json_extract(value, '$.contractor_id'))), add index (contractor_id);
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain select * from mytable where json_extract(value, '$.contractor_id')='12345';
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
mysql> explain select * from mytable where contractor_id='12345';
+----+-------------+---------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | contractor_id | contractor_id | 5 | const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
In MySQL 8.0 (8.0.13 and later), you can create an index on an expression (a functional index) without needing to create the virtual column first. But there are restrictions on indexing JSON expressions.
mysql> alter table mytable add index ((json_unquote(json_extract(value, '$.contractor_id'))));
ERROR 3757 (HY000): Cannot create a functional index on an expression that returns a BLOB or TEXT. Please consider using CAST.
So I have to cast it to an integer:
mysql> alter table mytable add index ((cast(json_unquote(json_extract(value, '$.contractor_id')) as signed)));
Query OK, 0 rows affected (0.01 sec)
The optimizer seems to be able to figure this out, and I can make use of the index even when the query uses only part of the indexed expression.
mysql> explain select * from mytable where json_extract(`value`,'$.contractor_id')=12345;
+----+-------------+---------+------------+------+------------------+------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+------------------+------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | functional_index | functional_index | 9 | const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+------+------------------+------------------+---------+-------+------+----------+-------+
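If you would rather not rely on the optimizer recognizing part of the expression, a more conservative option is to write the predicate with exactly the same expression used in the index definition. This is a sketch against the same mytable from the session above; I haven't included EXPLAIN output for it, so check the key column of the result yourself to confirm the functional index is chosen.

```sql
-- Predicate repeats the indexed expression verbatim:
-- CAST(JSON_UNQUOTE(JSON_EXTRACT(...)) AS SIGNED), compared with an integer literal.
-- Check that the "key" column of the EXPLAIN output names the functional index.
EXPLAIN SELECT * FROM mytable
WHERE CAST(JSON_UNQUOTE(JSON_EXTRACT(value, '$.contractor_id')) AS SIGNED) = 12345;
```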
But all these idiosyncrasies regarding JSON in an RDBMS are a bother. You end up being forced into solutions that are complex and require deep understanding of advanced features.
Why not just create contractor_id, or any other attribute you want to be indexed, as a normal column? That's far simpler.
The more I see people using JSON in MySQL, the more I feel it is one of the worst and most unnecessary features to be added to the product.
There are cases where you might genuinely need to have a “semi-structured” column, when you have to store data that has variable fields. JSON is good for this, or XML, or YAML, or protobufs, etc. But then making them support SQL operations as if they are normal columns is not a good strategy.
Use normal columns for attributes that you want to search or sort by. If you must, use JSON only as a “payload” column to store the variable data.
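A minimal sketch of that layout follows. The table name contracts and the column name payload are hypothetical, and an existing table like the one in the question would need its contractor_id values copied out of the JSON once before switching over.

```sql
-- Hypothetical table: searchable attributes are ordinary columns,
-- and only the variable, unsearched data lives in a JSON payload column.
CREATE TABLE contracts (
  id            SERIAL PRIMARY KEY,
  type          TEXT,
  contractor_id INT,            -- searched often, so an ordinary indexed column
  payload       JSON,           -- variable fields only; not used in WHERE or ORDER BY
  INDEX (contractor_id)
);

-- The search uses the ordinary index on contractor_id; no JSON functions involved.
SELECT id, type, payload
FROM contracts
WHERE contractor_id = 12345;
```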
You might also like my presentation "How to Use JSON in MySQL Wrong". I developed that presentation before MySQL 8.0, so it doesn't cover expression indexes, but other points are still true, like JSON requiring a lot more space to store the same data.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.