MySQL Benchmark – updates by primary vs secondary keys

(Note: when I’m talking about MySQL I usually assume InnoDB storage engine. Any other case I explicitly tell this is MyISAM or Memory etc.)

I’ve heared an interesting aproach of using Master-slave replication in MySQL.

Thesis

So the theory was that since updates by primary keys are fast and by secondary keys are slow the slave has to be queried for the primary key and then run the updates by the fetched primary keys. To make this in context and more understandable:

Original query

UPDATE table_for_test SET value_to_change = 123 WHERE cond_column_1 = 987 AND cond_column_2 > 765;

1	UPDATE table_for_test SET value_to_change = 123 WHERE cond_column_1 = 987 AND cond_column_2 > 765;

This query get splitted to two different query. First query has to be run on the slave to fetch the primary keys:

SELECT pr_id_col FROM table_for_test WHERE cond_column_1 = 987 AND cond_column_2 > 765;

1	SELECT pr_id_col FROM table_for_test WHERE cond_column_1 = 987 AND cond_column_2 > 765;

When we have the values we can go to the master and update the necessary records.

UPDATE table_for_test SET value_to_change = 123 WHERE pr_id_col IN (1, 2, 3, 4, 5, 6, 726, 8912, 12335, 84739, ...);

1	UPDATE table_for_test SET value_to_change = 123 WHERE pr_id_col IN (1, 2, 3, 4, 5, 6, 726, 8912, 12335, 84739, ...);

Antithesis

What’s my problem with this? First of all this creates unnessessary complexity to the application and extra load on the slave server(s). This is the typical case when the developer tries to outsmart MySQL. I saw sometimes working examples but those are the exceptions not the rule. How MySQL and InnoDB works primary key lookups are indeed lightning fast. But aren’t secondary key lookups good enough? I don’t think so. Especially with the adaptive hash buffer (will go into details in another post) if the memory is comparable with the size of the dataset or at least with total size of the data hot spots the secondary key lookups with covering index is as good as primary keys.

Synthesis

The test

So I (again) decided to create a benchmark suite specifically for testing this. It can be found on github.

Schema

create table t_update_test (
c_primary int unsigned not null AUTO_INCREMENT,
c_secondary int unsigned not null,
value_to_update varchar(32) not null,
PRIMARY KEY (c_primary)
) Engine=InnoDB;

create index sec_idx1 on t_update_test(c_secondary);

create table query_times (
id int unsigned not null AUTO_INCREMENT,
qtype tinyint unsigned not null,
qcondition varchar(96) not null default '',
qtime decimal(12,4),
PRIMARY KEY (id),
KEY (qtype)
) Engine=InnoDB;

create table t_update_test (

c_primary int unsigned not null AUTO_INCREMENT,

c_secondary int unsigned not null,

value_to_update varchar(32) not null,

PRIMARY KEY (c_primary)

) Engine=InnoDB;

create index sec_idx1 on t_update_test(c_secondary);

create table query_times (

id int unsigned not null AUTO_INCREMENT,

qtype tinyint unsigned not null,

qcondition varchar(96) not null default '',

qtime decimal(12,4),

PRIMARY KEY (id),

KEY (qtype)

) Engine=InnoDB;

Testing

After loading data to the table I started 16 threads to update random rows with random data in random time. During the test occasionally I executed some huge query in the MySQL too (usually with full scan needed). Some of them on the same table, some of them on the same database but other tables and some of them on a different database. My goal was to model a usual pattern of a database server which is exposed to heavy duty use.

while (1) {
		my $query = int(rand(2));
		my $condition; 
		switch ($query) {
			my $key = int(rand($max_rows));
			case 0 {$condition = "c_primary = $key";}
			case 1 {$condition = "c_secondary = $key";}
		} 
		my $stmt = $tdbh->prepare("update t_update_test set value_to_update = ? where $condition;");
		my $start = time;
		$stmt->execute(md5_hex(int(rand(65536)) . localtime ));
		my $delta = time - $start;
		log_time($tdbh, $query, $condition, $delta * 10000);
		sleep rand(3) + 1;
	}

while (1) {

my $query = int(rand(2));

my $condition;

switch ($query) {

my $key = int(rand($max_rows));

case 0 {$condition = "c_primary = $key";}

case 1 {$condition = "c_secondary = $key";}

}

my $stmt = $tdbh->prepare("update t_update_test set value_to_update = ? where $condition;");

my $start = time;

$stmt->execute(md5_hex(int(rand(65536)) . localtime ));

my $delta = time - $start;

log_time($tdbh, $query, $condition, $delta * 10000);

sleep rand(3) + 1;

}

Results

The results are standing for themselves.

Qtype 0 is the primary key update
Qtype 1 is the secondary key update

select count(*) from query_times; 
+----------+ 
| count(*) | 
+----------+ 
| 4317935 | 
+----------+

select qtype, avg(qtime), min(qtime), max(qtime) from query_times group by qtype; 
+-------+-------------+------------+-------------+ 
| qtype | avg(qtime) | min(qtime) | max(qtime) |  
+-------+-------------+------------+-------------+  
| 0 | 59.45557799 | 14.2694 | 133739.8100 | 
| 1 | 56.37420515 | 13.5088 | 133559.5894 | 
+-------+-------------+------------+-------------+

select count(*) from query_times;

+----------+

| count(*) |

+----------+

| 4317935 |

+----------+

select qtype, avg(qtime), min(qtime), max(qtime) from query_times group by qtype;

+-------+-------------+------------+-------------+

+-------+-------------+------------+-------------+

| 0 | 59.45557799 | 14.2694 | 133739.8100 |

| 1 | 56.37420515 | 13.5088 | 133559.5894 |

+-------+-------------+------------+-------------+

So the average execution time for updates by primary key lookups is 59.46 milisecs while updates by secondary keys took in average 56.37 milisecs.

Final words

This was an empirical experiment. I didn’t want to solve any other issue than prove that secondary keys cost no more than primary key for updates which I managed to do.

You might like these too

How to troubleshoot MySQL replication issues? In MySQL a big portion of the problems you're facing is different replication delays so there's no surprise this is one of the most common interview q...

How to store boolean in MySQL I always hear newer and newer exotic way to store different types of data in MySQL. People are trying to solve the problem of storing complex and not ...

Corrupted InnoDB dictionary Last week one of my collagues came to me with the following problem. He was not able to create an innodb table. MySQL returned ERROR 1005: Can't creat...

MySQL Benchmark – updates by primary vs secondary keys

Thesis

Antithesis

Synthesis

The test

Schema

Testing

Results

Final words

You might like these too

You may also like...

Categories

Links

Recent comments

Tweets

MySQL Benchmark – updates by primary vs secondary keys

Thesis

Antithesis

Synthesis

The test

Schema

Testing

Results

Final words

You might like these too

You may also like...

Lighthouse before storm

PERL / Catalyst test and benchmark

Unindexed queries can be really expensive

Categories

Links

Recent comments

Tweets