MySQL is not a strictly typed database. Given that and the different character types some tricky situation might arise. Especially when it comes to trailing spaces. Since this caused some head scratching hours to a colleague of mine I thought it’s worth a bit more explanation.
To understand the problem first we need to see what these two types are actually doing.
The difference between CHAR and VARCHAR
Character type is pretty simple. You specify the length and it will become a character string with that amount of characters. But how a ‘cat’ becomes a CHAR(8). Simply it gets padded with spaces. Physically it stored as ‘cat ‘ (with 5 spaces).
Varchar as the name suggests is meant to store variable length strings. So ‘cat’ is stored as ‘3cat’ where the first byte indicates the length of the string and 2 byte if it’s larger than varchar(255).
Disclaimer
Of course it is actually stored on disk as the number representation of the given character which depends on character sets and collation but for our case now it’s irrelevant.
So what happens with trailing spaces?
Let’s have a table and insert ‘asd ‘ (asd + 2 spaces) into every column to see what happens with them.
1 2 3 4 5 6 7 8 9 10 |
CREATE TABLE `test_one` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `data` varchar(5) DEFAULT NULL, `data_varch_3` varchar(3) DEFAULT NULL, `data_char_5` char(5) DEFAULT NULL, `data_char_3` char(3) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1; insert into test_one (data, data_varch_3, data_char_5, data_char_3) values ('asd ', 'asd ', 'asd ', 'asd '); |
First let’s see what we get back. I add an extra . to the end of each to see the end of the string.
Retrieve data
1 2 3 4 5 6 7 8 9 10 11 |
mysql> select concat(data, '.'), concat(data_varch_3, '.'), concat(data_char_5, '.'), concat(data_char_3, '.'), length(data), length(data_varch_3), length(data_char_5), length(data_char_3) from test_one\G *************************** 1. row *************************** concat(data, '.'): asd . concat(data_varch_3, '.'): asd. concat(data_char_5, '.'): asd. concat(data_char_3, '.'): asd. length(data): 5 length(data_varch_3): 3 length(data_char_5): 3 length(data_char_3): 3 1 row in set (0.00 sec) |
Varchar(5) returns what we inserted. Varchar(3) truncated the column and both CHAR column has only 3 characters in.
It makes more sense when you think about what happens with ‘asd’, ‘asd ‘ (one space) and ‘asd ‘ (two spaces) after being inserted into CHAR(5). They are all stored as ‘asd ‘. When retrieving MySQL cannot tell if the spaces are part of data or the padding so it assumes the latter. Hence the length of ‘asd ‘ after inserted becomes 3.
Conditions
What if I want to filter on them? Let’s see which one is equal with ‘asd ‘ (one space). Logiccaly neither of them should.
1 2 3 4 5 6 7 |
mysql> select data = 'asd ', data_varch_3 = 'asd ', data_char_5 = 'asd ', data_char_3 = 'asd ' from test_one\G *************************** 1. row *************************** data = 'asd ': 1 data_varch_3 = 'asd ': 1 data_char_5 = 'asd ': 1 data_char_3 = 'asd ': 1 1 row in set (0.00 sec) |
For every column type the condition is true. Which means that both the 3 length and the 5 length columns were matching my ‘asd ‘ (one space, 4 length) string.
In a simple select ‘asd’ matches with ‘asd ‘ and ‘asd ‘ as well.
1 2 3 4 |
mysql> select 'asd ' = 'asd ','asd ' = 'asd'\G *************************** 1. row *************************** 'asd ' = 'asd ': 1 'asd ' = 'asd': 1 |
But the length of them don’t.
1 2 3 4 |
mysql> select length('asd '), length('asd ')\G *************************** 1. row *************************** length('asd '): 4 length('asd '): 5 |
So two string which are different length are actually matching with the equal operator.
Why is this a problem?
Data mutation
The data is changed between insert and retrieve without warning and that’s not expected. If I insert ‘cat ‘ (two spaces, length = 5) to a CHAR(5) column which is absolutely legit I will get back only three characters.
1 2 3 4 5 6 7 8 9 10 11 12 |
mysql> insert into test_one (data_char_5) values ('cat '); Query OK, 1 row affected (0.01 sec) mysql> show warnings; Empty set (0.00 sec) mysql> select data_char_5, concat(data_char_5, '.'), length(data_char_5) from test_one where id = 2\G *************************** 1. row *************************** data_char_5: cat concat(data_char_5, '.'): cat. length(data_char_5): 3 1 row in set (0.00 sec) |
Let’s see the same thing in postgresql:
1 2 3 4 5 6 7 8 9 10 11 |
test=# create table test_one (id serial, data varchar(5), data_varch_3 varchar(3), data_char_5 char(5), data_char_3 char(3)); CREATE TABLE Time: 70.457 ms test=# insert into test_one (data, data_varch_3, data_char_5, data_char_3) values ('asd ', 'asd ', 'asd ', 'asd '); INSERT 0 1 Time: 30.613 ms test=# select concat(data, '.'), concat(data_varch_3, '.'), concat(data_char_5, '.'), concat(data_char_3, '.'), length(data), length(data_varch_3), length(data_char_5), length(data_char_3) from test_one; concat | concat | concat | concat | length | length | length | length --------+--------+--------+--------+--------+--------+--------+-------- asd . | asd. | asd . | asd. | 5 | 3 | 3 | 3 (1 row) |
I get the same data what I inserted. The length is returning 3 though to the char column. The same with the cat example.
1 2 3 4 5 6 7 8 |
test=# insert into test_one (data_char_5) values ('cat '); INSERT 0 1 Time: 21.461 ms test=# select data_char_5, concat(data_char_5, '.'), length(data_char_5) from test_one where id = 2; data_char_5 | concat | length -------------+--------+-------- cat | cat . | 3 (1 row) |
One can say that use varchar everywhere and it’s solved. Not quite yet.
Comparison
MySQL cannot identify that a one space padding string is not equal even with the varchar(5) column where it actually stores the length and that can lead to confusion as we saw above. It could be explained with treating the input as char and matching that to the varchar column and due to casting results in equality. But the same thing happens if the table is joined with itself on the data column (varchar(5).
1 2 3 4 5 6 7 8 9 10 11 |
mysql> insert into test_one (data) values ('asd '); -- one space Query OK, 1 row affected (0.04 sec) mysql> select * from test_one t1 join test_one t2 on (t1.data = t2.data and t1.id != t2.id); +----+-------+--------------+-------------+-------------+----+-------+--------------+-------------+-------------+ | id | data | data_varch_3 | data_char_5 | data_char_3 | id | data | data_varch_3 | data_char_5 | data_char_3 | +----+-------+--------------+-------------+-------------+----+-------+--------------+-------------+-------------+ | 4 | asd | NULL | NULL | NULL | 1 | asd | asd | asd | asd | | 1 | asd | asd | asd | asd | 4 | asd | NULL | NULL | NULL | +----+-------+--------------+-------------+-------------+----+-------+--------------+-------------+-------------+ 2 rows in set (0.00 sec) |
MySQL happily joined the two rows where one had one space and the other had two even though it stores the difference.
The comparison in PgSQL
1 2 3 4 |
test=# select data = 'asd ', data_varch_3 = 'asd ', data_char_5 = 'asd ', data_char_3 = 'asd ' from test_one; ?column? | ?column? | ?column? | ?column? ----------+----------+----------+---------- f | f | t | t |
For the VARCHARs postgresql correctly identifies that the varchar columns are not equal with ‘asd ‘ (one space). It returns true though for the CHAR columns where the padding makes it equal.
Conclusion
To overcome the query issues use LIKE instead of = and use the proper datatype for your columns. If space is a valid and part of the data then CHAR might not be the best field type.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
mysql> select * from test_one where data like 'asd'; Empty set (0.00 sec) mysql> select * from test_one where data like 'asd '; +----+------+--------------+-------------+-------------+ | id | data | data_varch_3 | data_char_5 | data_char_3 | +----+------+--------------+-------------+-------------+ | 4 | asd | NULL | NULL | NULL | +----+------+--------------+-------------+-------------+ 1 row in set (0.00 sec) mysql> select * from test_one where data like 'asd '; +----+-------+--------------+-------------+-------------+ | id | data | data_varch_3 | data_char_5 | data_char_3 | +----+-------+--------------+-------------+-------------+ | 1 | asd | asd | asd | asd | +----+-------+--------------+-------------+-------------+ 1 row in set (0.00 sec) |
Recent comments