The question:
As the title says, I noticed something odd while reading other people's projects:
some tables use char instead of tinyint for columns that hold numeric codes, for example:
create table A(
id int not null auto_increment primary key,
a_seq char(9) comment 'The first one is 1, 2, 3 (national, private, foreign)',
a_type char(1) comment '0 is normal, 1 is disable',
a_status char(1) comment '0 is visible, 1 is not'
);
For a_type and a_status, both char(1) and tinyint(1) are one byte, and the comparison speed of numeric characters and numbers may be as fast, so the difference between the two is not very big.
So for a_seq, why not use int for storage? int only takes up 4 bytes, but char(9) takes up 9 bytes. If you add a UNIQUE index to a_seq, doesn't char(9) take up space and be slow?
To add, I also saw someone store the year (2020, 2021) in char(4) instead of smallint.
Can anyone explain the reasoning behind this from experience? I'm getting confused by it.
The Solutions:
Method 1
Why sometimes choose to use char instead of int
About half of the access libraries/frameworks pass parameter values as strings unconditionally. In that case, with a char column, the server compares the column and the literal as strings, without any type conversion.
the comparison speed of numeric characters and numbers may be as fast
I doubt it. If one of the values being compared is numeric and the other is a string, both are converted to floating-point (double-precision) values before the comparison. See "Type Conversion in Expression Evaluation" in the MySQL Reference Manual.
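One consequence of comparing through double precision is that distinct large digit strings can collapse to the same value. The sketch below uses Python's float (an IEEE 754 double) as a stand-in for that conversion; the helper name compare_as_double is hypothetical, and this is an analogy, not MySQL's own code path.

```python
# Python's float is an IEEE 754 double, used here as a stand-in for the
# double-precision conversion described above (an assumption for illustration).

def compare_as_double(left: str, right: str) -> bool:
    """Compare two digit strings after converting both to double precision."""
    return float(left) == float(right)

# Small values, such as a 9-digit a_seq, survive the conversion intact.
small_equal = compare_as_double("123456789", "123456789")
small_differs = compare_as_double("123456789", "123456788")

# Above 2**53 a double can no longer represent every integer, so two
# different digit strings may compare as equal.
big_collides = compare_as_double("9007199254740993", "9007199254740992")

print(small_equal, small_differs, big_collides)
```

For nine-digit values like a_seq the conversion is exact, so the cost here is the extra conversion work rather than wrong results.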
If you add a UNIQUE index to a_seq, doesn’t char(9) take up space and be slow?
More disk space? Of course. Slow? It depends.
Method 2
INT takes 4 bytes. CHAR(9) takes at least 9 bytes. If you are putting a number in CHAR(9), it can go up to 999,999,999, just under 1 billion. A signed INT can hold larger values than that, up to 2,147,483,647.
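The size and range arithmetic above can be checked with a short sketch. It uses Python's struct module as a stand-in for a 4-byte signed integer and assumes a single-byte character set for the CHAR(9) column; multi-byte character sets can need more space per character.

```python
import struct

# A signed 4-byte two's-complement integer; struct's "<i" format has the
# same width as MySQL's INT, so it serves as a stand-in here.
INT_BYTES = struct.calcsize("<i")
INT_MAX = 2**31 - 1

# A CHAR(9) column holding only digits needs one byte per digit in a
# single-byte character set (an assumption; multi-byte sets can need more).
largest_char9 = "9" * 9
CHAR9_BYTES = len(largest_char9.encode("ascii"))

print(INT_BYTES, CHAR9_BYTES)              # storage per value, in bytes
print(int(largest_char9), INT_MAX)         # largest value each can hold
print(int(largest_char9) < INT_MAX)        # every 9-digit number fits in INT
```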
Significantly, putting numbers in CHAR (or VARCHAR) makes them compare 'incorrectly' when using an inequality test:
2 < 10 -- numeric comparison
"2" > "10" -- because "2" > "1"
Which do you need?
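The same difference shows up when sorting a whole column. This is a minimal sketch in Python rather than SQL, with made-up sample values; string comparison there also proceeds character by character, so it mirrors the behaviour described above.

```python
# Hypothetical a_seq values, stored once as strings and once as integers.
as_text = ["10", "2", "33", "4"]
as_numbers = [int(value) for value in as_text]

# Numeric comparison: 2 is less than 10.
numeric_less = 2 < 10

# String comparison goes character by character, so "2" beats "1".
text_greater = "2" > "10"

# Sorting shows the same effect across a whole column.
sorted_text = sorted(as_text)
sorted_numbers = sorted(as_numbers)

print(numeric_less, text_greater)
print(sorted_text)
print(sorted_numbers)
```

Zero-padding every value to the same width (for example "000000002" and "000000010") makes the string order agree with the numeric order, which is one reason fixed-width codes are sometimes stored as text.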
There is a datatype called YEAR; why not use that?
VARCHAR stores only the characters needed; CHAR pads to the length given. Don't use CHAR unless you really need the padding.
There are similar arguments for using DATETIME instead of VARCHAR.
In general, use the appropriate datatype!
As for speed, fetching the rows involved is far more costly than something as trivial as comparing one value with another. The time difference is probably so insignificant as to be essentially impossible to measure.
The default collation in MySQL 8.0 is a complex utf8mb4 (UTF-8) collation. Even for a simple equality test, it has to examine each character fully instead of just comparing raw bytes. A simple example is "B" = "b", which requires case folding. Add accented letters or non-spacing accents and it becomes much more complex.
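To give a feel for the extra work a case- and accent-insensitive comparison involves, here is a rough Python analogy. The helpers bytes_equal and loosely_equal are hypothetical, and the folding shown is a simplification, not the algorithm MySQL's collations actually use.

```python
import unicodedata

def bytes_equal(left: str, right: str) -> bool:
    """Binary-style comparison: the UTF-8 encoded bytes must match exactly."""
    return left.encode("utf-8") == right.encode("utf-8")

def loosely_equal(left: str, right: str) -> bool:
    """Rough stand-in for a case- and accent-insensitive collation."""
    def fold(text: str) -> str:
        # Decompose characters, drop combining marks, then fold case.
        decomposed = unicodedata.normalize("NFD", text)
        stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        return stripped.casefold()
    return fold(left) == fold(right)

# "B" and "b" differ as bytes but match once case is folded.
print(bytes_equal("B", "b"), loosely_equal("B", "b"))

# A precomposed e-acute versus "e" plus a non-spacing acute accent.
print(bytes_equal("\u00e9", "e\u0301"), loosely_equal("\u00e9", "e\u0301"))

# With accents ignored, e-acute also matches a plain "e".
print(loosely_equal("\u00e9", "e"))
```

A digit-only column compared under a binary or integer type skips all of this work, although, as noted above, the saving is usually too small to measure.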
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.