Symptoms
We had a web server running smoothly until extremely heavy traffic appeared on the bond0 interface, as shown in Cacti. The traffic was nearly double the load balancer's inbound and outbound traffic. Memcached was continuously rejecting new connections.
Investigation
The memcache hit/miss ratio was almost perfect: 99% / 1%. However, cmd_set was 95% of cmd_get, which (usually) means poor validation in the code. But with nearly identical cmd_get and cmd_set counter values, the hit/miss ratio should have been roughly fifty-fifty. What happened?
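The mismatch is easier to see when both ratios are computed side by side. The following is a minimal sketch: the counter names are the ones memcached reports in its stats output, while the function name and the sample numbers are made up to resemble the situation described here.

```php
<?php
// Sketch: derive the ratios discussed above from memcached "stats" counters.
// cacheRatios() is a hypothetical helper; the sample numbers are invented.
function cacheRatios(array $stats): array
{
    $lookups = $stats['get_hits'] + $stats['get_misses'];

    return [
        'hit_ratio'   => $lookups > 0 ? $stats['get_hits'] / $lookups : 0.0,
        'set_per_get' => $stats['cmd_get'] > 0 ? $stats['cmd_set'] / $stats['cmd_get'] : 0.0,
        // In a plain get-or-generate cache a set normally follows a miss,
        // so far more sets than misses deserves a closer look.
        'sets_per_miss' => $stats['get_misses'] > 0
            ? $stats['cmd_set'] / $stats['get_misses']
            : null,
    ];
}

$ratios = cacheRatios([
    'cmd_get'    => 100000,
    'cmd_set'    => 95000,
    'get_hits'   => 99000,
    'get_misses' => 1000,
]);

printf(
    "hit ratio: %.2f, sets per get: %.2f, sets per miss: %.0f\n",
    $ratios['hit_ratio'],
    $ratios['set_per_get'],
    $ratios['sets_per_miss']
);
```

With a healthy cache the last figure should stay close to 1. A value far above that means sets are being issued after hits, which is the contradiction this investigation started from.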
We dumped the traffic on the bond0 interface to see what was happening.
TCPDUMP
17:43:21.442340 IP webserver12.xxx.11212 > webserver10.xxx.50790: Flags [.], ack 27349, win 457, length 0
17:43:21.442357 IP webserver12.xxx.11212 > webserver10.xxx.50790: Flags [.], ack 28809, win 480, length 0
17:43:21.442368 IP webserver12.xxx.11212 > webserver10.xxx.50790: Flags [.], ack 28809, win 480, options [nop,nop,sack 1 {20617:22077}], length 0
17:43:21.442422 IP webserver10.xxx.50790 > webserver12.xxx.11212: Flags [P.], seq 28809:29526, ack 29507, win 217, length 717
17:43:21.442434 IP webserver12.xxx.11212 > webserver10.xxx.50790: Flags [.], ack 29526, win 501, length 0
17:43:21.442473 IP webserver12.xxx.11212 > webserver10.xxx.50790: Flags [P.], seq 29507:29515, ack 29526, win 501, length 8
17:43:21.443572 IP webserver10.xxx.50790 > webserver12.xxx.11212: Flags [F.], seq 29526, ack 29515, win 217, length 0
17:43:21.443644 IP webserver12.xxx.11212 > webserver10.xxx.50790: Flags [F.], seq 29515, ack 29527, win 501, length 0
Root cause
This could happen only if the key exists in memcache but the data doesn't. In a scenario like this the client gets the key back (a hit), but with an empty value.
Example code
$value = $memcache->get('key');
if (!$value) {
    $value = some_function();
    $memcache->set('key', $value);
}
return $value;
- The developer asks memcache for the key (line 1): cmd_get. Memcache answers with a 0-byte value: hit.
- The code checks whether the value is OK (line 2). In most languages (this happened with PHP) a 0-byte value is false when cast to boolean.
- So the program generates the data again (line 3)
- and (tries to) write it back to memcache (line 4): cmd_set.
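The steps above stay invisible as long as the return value of set() is ignored. Below is a minimal sketch of the same get-or-generate flow that reports failed writes. FakeMemcache, getOrGenerate() and $failedSets are hypothetical names, and the stand-in class assumes that an oversized set() fails but leaves an empty value behind, as described in this post. With a real client, the boolean returned by set() plays the same role.

```php
<?php
// Hypothetical stand-in for a memcache client so the sketch runs without a server.
// Assumption: an oversized set() fails but leaves an empty value behind.
class FakeMemcache
{
    const MAX_VALUE_BYTES = 1048576; // 1 MB item limit

    private $items = [];

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        if (strlen($value) > self::MAX_VALUE_BYTES) {
            $this->items[$key] = '';
            return false;
        }
        $this->items[$key] = $value;
        return true;
    }
}

// Same flow as the example code, but a failed set() is recorded
// instead of being silently ignored.
function getOrGenerate($memcache, $key, callable $generate, array &$failedSets)
{
    $value = $memcache->get($key);
    if ($value !== false && $value !== '') {
        return $value; // real hit with usable data
    }

    $value = $generate();
    if (!$memcache->set($key, $value)) {
        $failedSets[] = $key . ' (' . strlen($value) . ' bytes)';
    }
    return $value;
}

$failedSets = [];
$cache = new FakeMemcache();
getOrGenerate($cache, 'small', function () { return 'ok'; }, $failedSets);
getOrGenerate($cache, 'big', function () { return str_repeat('x', 2 * 1048576); }, $failedSets);
print_r($failedSets);
```

This does not stop the data from being regenerated on every request, but it turns a silent loop into something that shows up in a log.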
This way it is possible to have a very good hit/miss rate even if every get triggers a set command on memcached. But why wasn't the value there? Why was it 0 bytes? Memcached has a 1 MB limit for values. This can be changed by recompiling memcached (newer versions also accept the -I option), but it is not recommended. Memcached is meant for fetching lots of small pieces of data as fast as possible.
Because of a small typo in the configuration, a very big (> 1 MB) value was being written to memcached for a certain key. The slab was allocated and the key was created, but writing the value's content failed because of the 1 MB limit.
Solution
After the configuration was fixed, the big values were cached to files and all the graphs went back to normal.
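One way to keep such a misconfiguration from recurring is to pick the backend by size in code rather than in configuration alone. This is only a sketch of the idea, not what was deployed: chooseBackend(), the backend labels and the safety margin are assumptions, and the margin exists because the key and item overhead also count toward the limit.

```php
<?php
// Sketch: choose a cache backend by serialized size.
// 1 MB is memcached's default item limit; the margin and names are assumptions.
const MEMCACHED_ITEM_LIMIT = 1048576;

function chooseBackend($value, $margin = 1024)
{
    $bytes = strlen(serialize($value));

    return $bytes + $margin <= MEMCACHED_ITEM_LIMIT ? 'memcached' : 'file';
}

echo chooseBackend('small value'), "\n";
echo chooseBackend(str_repeat('x', 2 * MEMCACHED_ITEM_LIMIT)), "\n";
```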