Introduction

Strings are all over databases, from emails and usernames to messages, logs, and data. It appears easy to use strings in MySQL at first: simply choose VARCHAR or TEXT and go on. In practice, string handling conceals some of the most imperceptible and harmful dangers. Unlike numbers or dates, string behavior heavily depends on encodings and data/index storage formats.

This article is part of a series about MySQL data types. It covers string types, how they function, what may go wrong, and how to make safe choices.


Overview of String Types

Character types:

Binary types:

MySQL has two sorts of strings: character and binary. Character types are read based on a charset and collation, but binary types only store raw bytes without any constraints.

When you use fixed-width types like CHAR(n) or BINARY(n), they always use the full length and fill in the gaps with spaces or zeros. This makes storage predictable, but it can waste capacity.

Variable-width types, such as VARCHAR(n) and VARBINARY(n), only utilize as many bytes as they need plus a little extra space for length. They work better for content with varied lengths, but they take a little longer to update when the values change in size.

The (n) for variable-length types tells you how many characters (for character types) or bytes (for binary types) can be in the type. The actual storage depends on the content; shorter values take up less space, but no value can be bigger than that. VARCHAR(50) in utf8mb4 can hold up to 50 characters, but it may need up to 200 bytes of space inside because each character can occupy up to 4 bytes.


Pitfalls and Lessons

Encodings and Collations

-- Example: trailing spaces ignored
table> CREATE TABLE t (val VARCHAR(10) UNIQUE);
table> INSERT INTO t VALUES ('abc');
table> INSERT INTO t VALUES ('abc   ');
ERROR 1062 (23000): Duplicate entry 'abc' for key 'val'

Storage and Indexing

-- Example: index too long
table> CREATE TABLE users (
  name VARCHAR(1000),
  INDEX(name)
) CHARSET=utf8mb4;
-- ERROR 1071: Specified key was too long

Text vs Binary Semantics

SELECT LENGTH('🙂'), CHAR_LENGTH('🙂');
-- LENGTH = 4, CHAR_LENGTH = 1

Search and Sorting


Application–Database Boundary

-- Safe session setup
SET NAMES utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Best Practices


Conclusion

String types may seem easy, but they can create the worst issues, including text that doesn't make sense, broken uniqueness, delayed queries, or comparisons that aren't what you intended. The problems are caused by how MySQL deals with encodings, collations, and storage restrictions.

You may avoid problems before they happen in production by using safe defaults (utf8mb4, modern collations, and the right use of VARCHAR/BINARY) and knowing what to look out for.