Memcache anomaly

Symptoms

We had a web server running smoothly until Cacti showed extremely heavy traffic on the bond0 interface. The traffic was nearly double the load balancer’s inbound and outbound traffic. At the same time, memcached was continuously rejecting new connections.

Bond0 traffic - daily view

Bond0 traffic - Weekly view

Investigation

The memcache hit/miss ratio was almost perfect: 99% / 1%. However, cmd_set was 95% of cmd_get, which usually means poor validation in the code. But in that case the hit/miss ratio should have been much worse (we expected something around fifty-fifty), given the almost identical cmd_get and cmd_set counter values. What happened?
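The contradiction can be checked mechanically from the counters memcached reports. The following is a minimal Python sketch, not the tooling we used: the counter names match memcached's `stats` output, while the function names, the thresholds, and the sample numbers are made up for illustration.

```python
def cache_ratios(stats):
    """Return (hit_ratio, sets_per_get) from memcached-style counters."""
    gets = stats["cmd_get"]
    if gets == 0:
        return 0.0, 0.0
    return stats["get_hits"] / gets, stats["cmd_set"] / gets


def looks_contradictory(stats, hit_floor=0.9, set_floor=0.5):
    """Flag a high hit ratio combined with a high set-to-get ratio.

    A set normally follows a miss, so both being high at once is suspicious.
    The thresholds are arbitrary illustration values, not memcached defaults.
    """
    hit_ratio, sets_per_get = cache_ratios(stats)
    return hit_ratio >= hit_floor and sets_per_get >= set_floor


# Made-up counters shaped like the incident: 99% hits, sets at 95% of gets.
sample = {
    "cmd_get": 100_000,
    "get_hits": 99_000,
    "get_misses": 1_000,
    "cmd_set": 95_000,
}

hit_ratio, sets_per_get = cache_ratios(sample)
print(f"hit ratio {hit_ratio:.2f}, sets per get {sets_per_get:.2f}")
print("contradictory:", looks_contradictory(sample))
```

A check like this only says that something is off. It does not say why, which is what the traffic dump was for.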

We dumped the traffic on the bond0 interface to see what was happening.

TCPDUMP

Root cause

This could happen only if the key exists in memcache but the data doesn’t. In a scenario like this, the client’s get for the key succeeds, but the value that comes back is empty.

Example code

  1. The code asks memcache for the key (line 1) – cmd_get
    Memcache answers with a 0 byte value – hit
  2. The code has a condition to check whether the value is OK or not (line 2).
    In most languages (this situation happened with PHP) a 0 byte value is false when cast to boolean
  3. So the program generates the data again (line 3)
  4. and tries to write it back to memcache (line 4) – cmd_set
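The steps above can be reproduced in a small simulation. This is a Python sketch rather than the original PHP code: `FakeCache`, `build_value` and `fetch` are hypothetical names, and the cache is an in-memory stand-in that is assumed to behave the way we observed, keeping the key with an empty value when the payload is too big.

```python
class FakeCache:
    """Tiny in-memory stand-in for memcached that counts commands."""

    LIMIT = 1024 * 1024  # 1 MB value limit

    def __init__(self):
        self.data = {}
        self.cmd_get = 0
        self.cmd_set = 0
        self.get_hits = 0

    def get(self, key):
        self.cmd_get += 1
        if key in self.data:
            self.get_hits += 1
            return self.data[key]
        return None  # miss

    def set(self, key, value):
        self.cmd_set += 1
        # Assumption mirroring the incident: an oversized write leaves the
        # key behind with an empty value instead of storing the payload.
        self.data[key] = value if len(value) <= self.LIMIT else b""


def build_value():
    """Pretend to generate the expensive data: 2 MB, over the limit."""
    return b"x" * (2 * 1024 * 1024)


def fetch(cache, key):
    value = cache.get(key)      # step 1: cmd_get
    if not value:               # step 2: an empty value is falsy
        value = build_value()   # step 3: generate the data again
        cache.set(key, value)   # step 4: cmd_set
    return value


cache = FakeCache()
for _ in range(100):
    fetch(cache, "big_key")

print("hit ratio:", cache.get_hits / cache.cmd_get)
print("sets per get:", cache.cmd_set / cache.cmd_get)
```

Only the very first get is a miss. Every later get is a hit that returns nothing useful, so the hit ratio stays high while each request still pays for a regeneration and a failed write.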

This way it is possible to have a very good hit/miss rate even though every get triggers a set command. But why wasn’t the value there? Why was it 0 bytes? Memcached has a 1 MB limit for values. This can be changed by recompiling memcached (newer releases also accept the -I option), but it is not recommended. Memcached is meant for fetching lots of small pieces of data as fast as possible.

Because of a small typo in the configuration, a very big (> 1 MB) value was being written to memcached for a certain key. The slab was allocated and the key was created, but writing the value’s content failed due to the 1 MB limit.

Slabs in phpMCAdmin - "allocated but empty"

Solution

After the configuration was fixed, the big values were cached in files instead, and all the graphs returned to normal.
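Routing values by size can be sketched as follows. This is an illustrative Python sketch, not our production code: `store`, `load`, `SAFE_LIMIT` and the file naming scheme are hypothetical, and a plain dict stands in for the memcached client. The headroom below 1 MB is an arbitrary allowance for the key and item overhead.

```python
import hashlib
import os
import tempfile

ITEM_LIMIT = 1024 * 1024        # memcached's default 1 MB item limit
SAFE_LIMIT = ITEM_LIMIT - 1024  # arbitrary headroom for key and overhead


def _path_for(directory, key):
    """Map a cache key to a file name that is safe on any filesystem."""
    return os.path.join(directory, hashlib.sha1(key.encode()).hexdigest())


def store(cache, key, value, directory):
    """Keep small values in the cache and write oversized ones to a file."""
    if len(value) <= SAFE_LIMIT:
        cache[key] = value
        return "cache"
    with open(_path_for(directory, key), "wb") as handle:
        handle.write(value)
    return "file"


def load(cache, key, directory):
    """Look in the cache first, then fall back to the file store."""
    if key in cache:
        return cache[key]
    path = _path_for(directory, key)
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return handle.read()
    return None  # not cached anywhere


with tempfile.TemporaryDirectory() as directory:
    cache = {}
    print(store(cache, "small", b"hello", directory))
    print(store(cache, "big", b"x" * (2 * ITEM_LIMIT), directory))
    print(len(load(cache, "big", directory)))
```

The size check happens before anything is sent to the cache, so an oversized value never gets the chance to leave an empty key behind.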
