TLDR: Trying to optimize JSON queries in MYSQL, make use of stored generated columns
Ran into a situation where I had a table with loads and loads of records that included a meta
JSON column, ideally the fields in the JSON column were queried using JSON_EXTRACT
models.sequelize.where(
models.sequelize.fn(
'JSON_EXTRACT', models.sequelize.col('meta'), models.sequelize.literal('$.type')
),
Op.eq,
query.type
)
JSON columns are useful for flexible data models, but they have limitations; queries on JSON fields can’t make use of indexes. As the number of records increased, the limitations of not being able to index a JSON column began to become apparent, and queries started to become considerably slower.
STORED GENERATED COLUMN
A generated column in MySQL is a column where the values are computed from the values of other columns in the same row. A generated column can be either virtual or stored
- Virtual column - Data is computed every time the data is queried; data isn’t stored in the DB
- Stored generated column - Data is computed when the source columns are created/updated and stored on disk, similar to how a normal column would be stored
Virtual Column vs Stored-generated column
Virtual column |
Stored Generated column |
|
---|---|---|
Can query against |
Yes |
Yes |
Can index |
No |
Yes |
Uses a lot of space |
No |
Yes |
Columns are stored physically |
No |
Yes |
Automatic updates |
No, data is re-computed every time the column is queried |
Yes, the column is updated automatically |
Created or updated directly |
No |
No |
Migration
await queryInterface.sequelize.query(`
ALTER TABLE customer ADD COLUMN type VARCHAR(30)
AS (JSON_UNQUOTE(JSON_EXTRACT(meta, '$.type'))) STORED,
ADD INDEX type_idx (type); `);
This migration script adds a type
column to the customer
table, it then tells MySQL to compute the value by extracting it from the meta.type
and store STORED
as a generated column. ADD INDEX type_idx (type)
adds an index on the field to allow for faster queries
Benefits
- Columns are indexable
- Queries are faster than querying JSON columns
- You can query against the stored column as you’d query any other column
Cons
- Makes use of a little more space but as someone I know would say SPACE IS CHEAP
Conclusion
In conclusion, if there’s a need to query JSON columns at scale, extract the fields most frequently queried into stored generated columns to take advantage of the performance boosts indexing provides