Varchar vs Char in MySQL : understanding trailing spaces

MySQL is not a strictly typed database. Given that and the different character types some tricky situation might arise. Especially when it comes to trailing spaces. Since this caused some head scratching hours to a colleague of mine I thought it’s worth a bit more explanation.

To understand the problem first we need to see what these two types are actually doing.

The difference between CHAR and VARCHAR

Character type is pretty simple. You specify the length and it will become a character string with that amount of characters. But how a ‘cat’ becomes a CHAR(8). Simply it gets padded with spaces. Physically it stored as ‘cat     ‘ (with 5 spaces).

Varchar as the name suggests is meant to store variable length strings. So ‘cat’ is stored as ‘3cat’ where the first byte indicates the length of the string and 2 byte if it’s larger than varchar(255).

Disclaimer

Of course it is actually stored on disk as the number representation of the given character which depends on character sets and collation but for our case now it’s irrelevant.

So what happens with trailing spaces?

Let’s have a table and insert ‘asd  ‘ (asd + 2 spaces) into every column to see what happens with them.

First let’s see what we get back. I add an extra . to the end of each to see the end of the string.

Retrieve data

Varchar(5) returns what we inserted. Varchar(3) truncated the column and both CHAR column has only 3 characters in.

It makes more sense when you think about what happens with ‘asd’, ‘asd ‘ (one space) and ‘asd  ‘ (two spaces) after being inserted into CHAR(5). They are all stored as ‘asd  ‘. When retrieving MySQL cannot tell if the spaces are part of data or the padding so it assumes the latter. Hence the length of ‘asd  ‘ after inserted becomes 3.

Conditions

What if I want to filter on them? Let’s see which one is equal with ‘asd ‘ (one space). Logiccaly neither of them should.

For every column type the condition is true. Which means that both the 3 length and the 5 length columns were matching my ‘asd ‘ (one space, 4 length) string.

In a simple select ‘asd’ matches with ‘asd ‘ and ‘asd  ‘ as well.

But the length of them don’t.

So two string which are different length are actually matching with the equal operator.

Why is this a problem?

Data mutation

The data is changed between insert and retrieve without warning and that’s not expected. If I insert ‘cat  ‘ (two spaces, length = 5) to a CHAR(5) column which is absolutely legit I will get back only three characters.

Let’s see the same thing in postgresql:

I get the same data what I inserted. The length is returning 3 though to the char column. The same with the cat example.

One can say that use varchar everywhere and it’s solved. Not quite yet.

Comparison

MySQL cannot identify that a one space padding string is not equal even with the varchar(5) column where it actually stores the length and that can lead to confusion as we saw above. It could be explained with treating the input as char and matching that to the varchar column and due to casting results in equality. But the same thing happens if the table is joined with itself on the data column (varchar(5).

MySQL happily joined the two rows where one had one space and the other had two even though it stores the difference.

The comparison in PgSQL

For the VARCHARs postgresql correctly identifies that the varchar columns are not equal with ‘asd ‘ (one space). It returns true though for the CHAR columns where the padding makes it equal.

Conclusion

To overcome the query issues use LIKE instead of = and use the proper datatype for your columns. If space is a valid and part of the data then CHAR might not be the best field type.

 

 

You might like these too

Full text search in MySQL Full text is a critical point when it comes to mysql. It used to have that feature in MyISAM but that's not really maintained anymore nor it is advise...
Corrupted InnoDB dictionary Last week one of my collagues came to me with the following problem. He was not able to create an innodb table. MySQL returned ERROR 1005: Can't creat...
Difference between DISTINCT and GROUP BY Today we had an interesting situation where the same query was executed significantly slower when it was written with GROUP BY instead of DISTINCT and...