TLDR: Trying to optimize JSON queries in MYSQL, make use of stored generated columns

Ran into a situation where I had a table with loads and loads of records that included a meta JSON column, ideally the fields in the JSON column were queried using JSON_EXTRACT

models.sequelize.where(
  models.sequelize.fn(
      'JSON_EXTRACT', models.sequelize.col('meta'), models.sequelize.literal('$.type')
  ),
  Op.eq,
  query.type
)

JSON columns are useful for flexible data models, but they have limitations; queries on JSON fields can’t make use of indexes. As the number of records increased, the limitations of not being able to index a JSON column began to become apparent, and queries started to become considerably slower.

STORED GENERATED COLUMN

A generated column in MySQL is a column where the values are computed from the values of other columns in the same row. A generated column can be either virtual or stored

Virtual Column vs Stored-generated column

Virtual column

Stored Generated column

Can query against

Yes

Yes

Can index

No

Yes

Uses a lot of space

No

Yes

Columns are stored physically

No

Yes

Automatic updates

No, data is re-computed every time the column is queried

Yes, the column is updated automatically

Created or updated directly

No

No

Migration

await queryInterface.sequelize.query(`
   ALTER TABLE customer ADD COLUMN type VARCHAR(30)
    AS (JSON_UNQUOTE(JSON_EXTRACT(meta, '$.type'))) STORED,
   ADD INDEX type_idx (type); `);

This migration script adds a type column to the customer table, it then tells MySQL to compute the value by extracting it from the meta.type and store STORED as a generated column. ADD INDEX type_idx (type) adds an index on the field to allow for faster queries

Benefits

Cons

Conclusion

In conclusion, if there’s a need to query JSON columns at scale, extract the fields most frequently queried into stored generated columns to take advantage of the performance boosts indexing provides