MySQL Benchmark – updates by primary vs secondary keys

(Note: when I talk about MySQL I usually assume the InnoDB storage engine. In any other case I explicitly say it is MyISAM, Memory, etc.)

I’ve heard about an interesting approach to using master-slave replication in MySQL.

Thesis

So the theory was that, since updates by primary key are fast and updates by secondary key are slow, the slave has to be queried for the primary keys first, and the updates then have to be run using the fetched primary keys. To put this in context and make it more understandable:

Original query
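
As an illustration (the table and column names below are made up for the example, they are not from the original setup), imagine a users table where email carries a secondary index and the application originally updates rows by that secondary key:

    -- hypothetical example: email is covered by a secondary index, not the primary key
    UPDATE users
       SET status = 1
     WHERE email = 'foo@example.com';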

This query gets split into two different queries. The first one has to be run on the slave to fetch the primary keys (continuing with the same made-up names):
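
    -- run on the slave; say it returns the ids 42 and 43
    SELECT id
      FROM users
     WHERE email = 'foo@example.com';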

Once we have the values, we can go to the master and update the necessary records by primary key:
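
    -- run on the master, using the primary keys fetched from the slave
    UPDATE users
       SET status = 1
     WHERE id IN (42, 43);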

Antithesis

What’s my problem with this? First of all, it adds unnecessary complexity to the application and extra load on the slave server(s). This is the typical case of a developer trying to outsmart MySQL. I have occasionally seen working examples, but those are the exceptions, not the rule. Because of how MySQL and InnoDB work, primary key lookups are indeed lightning fast. But does that mean secondary key lookups aren’t good enough? I don’t think so. Especially with the adaptive hash index (I will go into details in another post), if the available memory is comparable to the size of the dataset, or at least to the total size of the hot spots in the data, secondary key lookups with a covering index are as good as primary key lookups.

Synthesis

The test

So I (again) decided to create a benchmark suite specifically for testing this. It can be found on GitHub.

Schema
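
The actual schema ships with the benchmark suite on GitHub; as a rough sketch (with illustrative names, not the real ones), all that is needed is an InnoDB table with a primary key, an indexed secondary column to look rows up by, and a column to overwrite:

    CREATE TABLE bench (
      id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
      extid   CHAR(32)     NOT NULL,                -- secondary lookup column
      payload VARCHAR(255) NOT NULL,                -- data the test threads overwrite
      PRIMARY KEY (id),
      KEY idx_extid (extid)
    ) ENGINE=InnoDB;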

Testing

After loading data into the table, I started 16 threads that updated random rows with random data at random times. Occasionally during the test I also executed some huge queries in MySQL (usually ones that required a full scan): some of them on the same table, some on other tables in the same database, and some on a different database. My goal was to model the usual workload of a database server exposed to heavy-duty use.
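
In terms of the made-up schema above, the threads were issuing statements of roughly these two shapes, with an occasional heavy query mixed in (illustrative SQL, not the exact statements from the suite):

    -- update by primary key
    UPDATE bench SET payload = 'random data' WHERE id = 123456;

    -- update by secondary key
    UPDATE bench SET payload = 'random data' WHERE extid = 'random-extid-value';

    -- occasional "huge" query that needs a full scan
    SELECT COUNT(*), AVG(LENGTH(payload)) FROM bench WHERE payload LIKE '%xyz%';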

Results

The results speak for themselves.

  • Qtype 0 is the primary key update
  • Qtype 1 is the secondary key update

So the average execution time for updates by primary key lookups is 59.46 milliseconds, while updates by secondary keys took 56.37 milliseconds on average.

Final words

This was an empirical experiment. I didn’t want to solve any issue other than proving that secondary keys cost no more than primary keys for updates, which I managed to do.
