MySQL Benchmark – updates by primary vs secondary keys

(Note: when I’m talking about MySQL I usually assume InnoDB storage engine. Any other case I explicitly tell this is MyISAM or Memory etc.)

I’ve heared an interesting aproach of using Master-slave replication in MySQL.

Thesis

So the theory was that since updates by primary keys are fast and by secondary keys are slow the slave has to be queried for the primary key and then run the updates by the fetched primary keys. To make this in context and more understandable:

Original query

This query get splitted to two different query. First query has to be run on the slave to fetch the primary keys:

When we have the values we can go to the master and update the necessary records.

Antithesis

What’s my problem with this? First of all this creates unnessessary complexity to the application and extra load on the slave server(s). This is the typical case when the developer tries to outsmart MySQL. I saw sometimes working examples but those are the exceptions not the rule. How MySQL and InnoDB works primary key lookups are indeed lightning fast. But aren’t secondary key lookups good enough? I don’t think so. Especially with the adaptive hash buffer (will go into details in another post) if the memory is comparable with the size of the dataset or at least with total size of the data hot spots the secondary key lookups with covering index is as good as primary keys.

Synthesis

The test

So I (again) decided to create a benchmark suite specifically for testing this. It can be found on github.

Schema

 Testing

After loading data to the table I started 16 threads to update random rows with random data in random time. During the test occasionally I executed some huge query in the MySQL too (usually with full scan needed). Some of them on the same table, some of them on the same database but other tables and some of them on a different database. My goal was to model a usual pattern of a database server which is exposed to heavy duty use.

 Results

The results are standing for themselves.

  • Qtype 0 is the primary key update
  • Qtype 1 is the secondary key update

So the average execution time for updates by primary key lookups  is 59.46 milisecs while updates by secondary keys took in average 56.37 milisecs.

Final words

This was an empirical experiment. I didn’t want to solve any other issue than prove that secondary keys cost no more than primary key for updates which I managed to do.

You might like these too

Difference between DISTINCT and GROUP BY Today we had an interesting situation where the same query was executed significantly slower when it was written with GROUP BY instead of DISTINCT and...
Unindexed queries can be really expensive The story happened with a webshop application running on Amazon EC2 microinstances. Actually on two instance. Amazon business model is basically simpl...
How to troubleshoot MySQL replication issues? In MySQL a big portion of the problems you're facing is different replication delays so there's no surprise this is one of the most common interview q...