(Note: when I’m talking about MySQL I usually assume InnoDB storage engine. Any other case I explicitly tell this is MyISAM or Memory etc.)
I’ve heared an interesting aproach of using Master-slave replication in MySQL.
Thesis
So the theory was that since updates by primary keys are fast and by secondary keys are slow the slave has to be queried for the primary key and then run the updates by the fetched primary keys. To make this in context and more understandable:
Original query
1 |
UPDATE table_for_test SET value_to_change = 123 WHERE cond_column_1 = 987 AND cond_column_2 > 765; |
This query get splitted to two different query. First query has to be run on the slave to fetch the primary keys:
1 |
SELECT pr_id_col FROM table_for_test WHERE cond_column_1 = 987 AND cond_column_2 > 765; |
When we have the values we can go to the master and update the necessary records.
1 |
UPDATE table_for_test SET value_to_change = 123 WHERE pr_id_col IN (1, 2, 3, 4, 5, 6, 726, 8912, 12335, 84739, ...); |
Antithesis
What’s my problem with this? First of all this creates unnessessary complexity to the application and extra load on the slave server(s). This is the typical case when the developer tries to outsmart MySQL. I saw sometimes working examples but those are the exceptions not the rule. How MySQL and InnoDB works primary key lookups are indeed lightning fast. But aren’t secondary key lookups good enough? I don’t think so. Especially with the adaptive hash buffer (will go into details in another post) if the memory is comparable with the size of the dataset or at least with total size of the data hot spots the secondary key lookups with covering index is as good as primary keys.
Synthesis
The test
So I (again) decided to create a benchmark suite specifically for testing this. It can be found on github.
Schema
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
create table t_update_test ( c_primary int unsigned not null AUTO_INCREMENT, c_secondary int unsigned not null, value_to_update varchar(32) not null, PRIMARY KEY (c_primary) ) Engine=InnoDB; create index sec_idx1 on t_update_test(c_secondary); create table query_times ( id int unsigned not null AUTO_INCREMENT, qtype tinyint unsigned not null, qcondition varchar(96) not null default '', qtime decimal(12,4), PRIMARY KEY (id), KEY (qtype) ) Engine=InnoDB; |
Testing
After loading data to the table I started 16 threads to update random rows with random data in random time. During the test occasionally I executed some huge query in the MySQL too (usually with full scan needed). Some of them on the same table, some of them on the same database but other tables and some of them on a different database. My goal was to model a usual pattern of a database server which is exposed to heavy duty use.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
while (1) { my $query = int(rand(2)); my $condition; switch ($query) { my $key = int(rand($max_rows)); case 0 {$condition = "c_primary = $key";} case 1 {$condition = "c_secondary = $key";} } my $stmt = $tdbh->prepare("update t_update_test set value_to_update = ? where $condition;"); my $start = time; $stmt->execute(md5_hex(int(rand(65536)) . localtime )); my $delta = time - $start; log_time($tdbh, $query, $condition, $delta * 10000); sleep rand(3) + 1; } |
Results
The results are standing for themselves.
- Qtype 0 is the primary key update
- Qtype 1 is the secondary key update
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
select count(*) from query_times; +----------+ | count(*) | +----------+ | 4317935 | +----------+ select qtype, avg(qtime), min(qtime), max(qtime) from query_times group by qtype; +-------+-------------+------------+-------------+ | qtype | avg(qtime) | min(qtime) | max(qtime) | +-------+-------------+------------+-------------+ | 0 | 59.45557799 | 14.2694 | 133739.8100 | | 1 | 56.37420515 | 13.5088 | 133559.5894 | +-------+-------------+------------+-------------+ |
So the average execution time for updates by primary key lookups is 59.46 milisecs while updates by secondary keys took in average 56.37 milisecs.
Final words
This was an empirical experiment. I didn’t want to solve any other issue than prove that secondary keys cost no more than primary key for updates which I managed to do.
Recent comments