Python MySQLdb vs mysql-connector query performance

There are a lot of python driver available for MySQL and two stand out the most. The one, traditionally everybody’s choice, sort of industrial standard MySQLdb. It uses a C module to link to MySQL’s client library. Oracle’s mysql-connector on the other hand is pure python so no MySQL libraries and no compilation is necessary. But what is the price for such comfort?

Benchmark

To test that I’ve setup a very simple test case using the world sample database.

Test case single PK lookup

querying random cities by primary key using random from 1 to 8000. Max id in the City table is around 4000 so roughly half of the queries will run into no results and half will produce exactly one row. I did this to see if there is any difference between the two.

I ran 10000 random queries and collected the response times which are resulted in the following graph. On the X axis you can see the primary key value and Y axis shows the response time in ms.

Query times for random PK using MySQLdb and mysql-connector

The difference is quite significant. Most of the query are taking more than two times longer using mysql-connector. It is also much more scattered.

	min	max	avg	stddev
mysql-connector	0.66	154.3	1.88	11.75
MySQLdb	0.26	1.56	0.39	0.14

Even the minimum time is ~2.5 times bigger using the mysql-connector and the max time is two order of magnitude higher.

Still quick query but larger dataset

I still use primary key lookups but now using id between [lower_bound] and [upper_bound] to produce random number of rows.

X axis it the number of rows returned and Y is the query time. Y maximised at 50 ms to focus on the interesting part.

Query times for larger dataset using MySQLdb and mysql-connector

The difference is even more obvious here. With both driver the query time increases roughly linearly but with mysql-connector the growth is much more steep and more scattered again.

	min	max	avg
mysql-connector	1	177.92	48.44
MySQLdb	0.57	20.25	5.33

Since we’re having a linear growth here standard deviation doesn’t make too much sense to look at.

Where does the difference come from?

Using cProfile we can check where each version spends the most time. Now running the queries for 100 times.

$ python -m cProfile -s cumulative mysql_driver_benchmark.py

1	$ python -m cProfile -s cumulative mysql_driver_benchmark.py

cProfile output for mysql-connector

ncalls tottime percall cumtime percall filename:lineno(function)
1      0.015 0.015 7.617 7.617 mysql_driver_benchmark.py:1(<module>)
1      0.000 0.000 6.747 6.747 mysql_driver_benchmark.py:39(main)
1      0.013 0.013 6.297 6.297 mysql_driver_benchmark.py:26(benchmark)
100    0.078 0.001 6.283 0.063 mysql_driver_benchmark.py:19(query)
100    0.169 0.002 6.040 0.060 cursor.py:822(fetchall)
160570 1.351 0.000 3.152 0.000 conversion.py:363(row_to_python)
100    0.001 0.000 2.689 0.027 connection.py:655(get_rows)
100    0.376 0.004 2.688 0.027 protocol.py:292(read_text_result)
161474 0.774 0.000 1.611 0.000 network.py:219(recv_plain)
481710 0.548 0.000 1.498 0.000 conversion.py:537(_STRING_to_python)
482211 0.290 0.000 0.818 0.000 {method 'decode' of 'bytearray' objects}

ncalls tottime percall cumtime percall filename:lineno(function)

1 0.015 0.015 7.617 7.617 mysql_driver_benchmark.py:1(<module>)

1 0.000 0.000 6.747 6.747 mysql_driver_benchmark.py:39(main)

1 0.013 0.013 6.297 6.297 mysql_driver_benchmark.py:26(benchmark)

100 0.078 0.001 6.283 0.063 mysql_driver_benchmark.py:19(query)

100 0.169 0.002 6.040 0.060 cursor.py:822(fetchall)

160570 1.351 0.000 3.152 0.000 conversion.py:363(row_to_python)

100 0.001 0.000 2.689 0.027 connection.py:655(get_rows)

100 0.376 0.004 2.688 0.027 protocol.py:292(read_text_result)

161474 0.774 0.000 1.611 0.000 network.py:219(recv_plain)

481710 0.548 0.000 1.498 0.000 conversion.py:537(_STRING_to_python)

482211 0.290 0.000 0.818 0.000 {method 'decode' of 'bytearray' objects}

cProfile output for MySQLdb

ncalls tottime percall cumtime percall filename:lineno(function)
1      0.004 0.004 0.930 0.930 mysql_driver_benchmark.py:1(<module>)
1      0.000 0.000 0.635 0.635 mysql_driver_benchmark.py:39(main)
1      0.008 0.008 0.435 0.435 mysql_driver_benchmark.py:26(benchmark)
100    0.000 0.000 0.425 0.004 mysql_driver_benchmark.py:19(query)
100    0.001 0.000 0.424 0.004 cursors.py:164(execute)
100    0.000 0.000 0.421 0.004 cursors.py:353(_query)
100    0.001 0.000 0.241 0.002 cursors.py:315(_do_query)
100    0.001 0.000 0.180 0.002 cursors.py:358(_post_get_result)
100    0.000 0.000 0.180 0.002 cursors.py:324(_fetch_row)
100    0.180 0.002 0.180 0.002 {built-in method fetch_row}
100    0.001 0.000 0.173 0.002 cursors.py:142(_do_get_result)
100    0.000 0.000 0.171 0.002 cursors.py:351(_get_result)

ncalls tottime percall cumtime percall filename:lineno(function)

1 0.004 0.004 0.930 0.930 mysql_driver_benchmark.py:1(<module>)

1 0.000 0.000 0.635 0.635 mysql_driver_benchmark.py:39(main)

1 0.008 0.008 0.435 0.435 mysql_driver_benchmark.py:26(benchmark)

100 0.000 0.000 0.425 0.004 mysql_driver_benchmark.py:19(query)

100 0.001 0.000 0.424 0.004 cursors.py:164(execute)

100 0.000 0.000 0.421 0.004 cursors.py:353(_query)

100 0.001 0.000 0.241 0.002 cursors.py:315(_do_query)

100 0.001 0.000 0.180 0.002 cursors.py:358(_post_get_result)

100 0.000 0.000 0.180 0.002 cursors.py:324(_fetch_row)

100 0.180 0.002 0.180 0.002 {built-in method fetch_row}

100 0.001 0.000 0.173 0.002 cursors.py:142(_do_get_result)

100 0.000 0.000 0.171 0.002 cursors.py:351(_get_result)

The difference is quite obvious. Now python is doing the heavy lifting parsing and interpreting the data received from the socket while MySQLdb offloads that to the C library. This makes large dataset providing queries even less efficient and the difference is growing with the number of returned rows.

Slow(er) queries

Running SELECT sleep(0.1) queries 100 times results in the following stats. Values are in milliseconds.

					confidence interval
	min	max	avg	stddev	low	high
mysql-connector	101,11	106,72	102,29	0,73	100,86	103,72
MySQLdb	100,33	104,73	101,46	0,65	100,19	102,73

There is a slight difference between the averages but as the confidence interval shows there’s no statistically significant difference so we can treat the two as equals.

Conclusion

If you’re having a lot of primary key lookups or other type of quick point queries this difference can be quite significant and probably worth the extra effort to maintain those C binding.

If you query for larger dataset having an efficient way to parse is essential and while querying couple of thousand rows frequently might not be a common task to do the question boils down to if you can afford 20-30 ms extra time in such cases.

However if your queries are not in the millisecond range you probably won’t notice a thing as the overhead gets diminished with longer query runs and becomes insignificant around and above ~100 ms. In this case tune your queries first!

You might like these too

5 things to love about python Syntax Takes a bit of time to get used to it but after a certain point it just becomes natural. Like english. No heavy-weight operators or complicate...

BitTornado library forked by me on GitHub I just forked and committed some patches to the BitTornado library on GitHub. https://github.com/charlesnagy/BitTornado Right now I only committ...

Dataimport handler for Sunburnt Solr python librar... Dataimport handler I had to trigger the dataimporter delta-import command from the code so I added support for this function. Example: Followin...

Python MySQLdb vs mysql-connector query performance

Benchmark

Test case single PK lookup

Still quick query but larger dataset

Where does the difference come from?

cProfile output for mysql-connector

cProfile output for MySQLdb

Slow(er) queries

Conclusion

You might like these too

You may also like...

Categories

Links

Recent comments

Tweets

Python MySQLdb vs mysql-connector query performance

Benchmark

Test case single PK lookup

Still quick query but larger dataset

Where does the difference come from?

cProfile output for mysql-connector

cProfile output for MySQLdb

Slow(er) queries

Conclusion

You might like these too

You may also like...

RDS apply pending modifications

Python error: AttributeError: __exit__

Lvm status and statistics collector for Diamond/Graphite

Categories

Links

Recent comments

Tweets

Python error: AttributeError: exit