Multi variant AB testing vs Multi-Armed bandit

Multi-variant: 0.5% vs 5.0% vs 5.2%

Payoff

Payoff from variation 0.5% vs 5% vs 5.2%

The results are roughly the same as before, but because AB now has two well-converting versions instead of one against the single poorly converting version, the regret drops by ~50%. Here again Epsilon greedy comes out on top with the highest reward, closely followed by UCB1 (~7.8% difference).
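To make the regret figure more tangible, here is a minimal sketch of the expected regret of an even AB split. It assumes regret means the conversions lost compared to always showing the best variation. The function name and the visitor count are illustrative and not taken from the experiment.

```python
def expected_regret_even_split(rates, visitors):
    """Expected conversions lost by splitting traffic evenly across variations.

    Assumption: regret = (best rate - average served rate) * visitors.
    """
    best_rate = max(rates)
    average_rate = sum(rates) / len(rates)
    return (best_rate - average_rate) * visitors


# Conversion rates from the experiment; 10,000 visitors is a hypothetical figure.
rates = [0.005, 0.05, 0.052]
regret = expected_regret_even_split(rates, 10_000)
print(f"expected regret: {regret:.1f} conversions")
```

A bandit lowers this number by shifting traffic away from the 0.5% variation, which an even split never does.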

Because the certainty is low, as you can see below, we cannot stop the experiment early, so there’s no room for improvement here.

Certainty

Certainty for 0.5% vs 5% vs 5.2%

Because the two best versions are very close to each other, the certainty grows very slowly. Even though AB moved in and out of the confidence interval for a while, it never actually stayed there.
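Why two close conversion rates keep the certainty low can be illustrated with a small calculation. The sketch below assumes a pooled two-proportion z-test, which may differ from how certainty was computed in the experiment. The function name and the visitor counts are hypothetical.

```python
import math


def certainty_b_beats_a(conv_a, n_a, conv_b, n_b):
    """One-sided confidence that variation B converts better than A.

    Assumption: a pooled two-proportion z-test; the original experiment
    may have computed certainty differently.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.5  # no variance in the data: nothing to tell apart
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical counts: 10,000 visitors per variation at the nominal rates.
print(certainty_b_beats_a(500, 10_000, 520, 10_000))  # 5.0% vs 5.2%
print(certainty_b_beats_a(50, 10_000, 520, 10_000))   # 0.5% vs 5.2%
```

With these counts the 5.0% vs 5.2% comparison stays well below a 95% threshold, while the 0.5% vs 5.2% comparison is essentially settled.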

Run behaviour

Variation displayed in AB model

Variation displayed in Epsilon greedy

Variation displayed in UCB1

Same as before. AB testing showed all the versions equally for the whole duration of the experiment, which directly causes its higher regret. Epsilon greedy came out as the winner because it quickly eliminated the worse-performing versions and showed predominantly the 5.2% converting version. In contrast, UCB1 eliminated only the 0.5% version and kept testing the two higher-performing variations.
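The difference in behaviour comes from how each algorithm picks the next variation to show. The following is a minimal sketch of the two textbook selection rules, not the code used for the runs above. The function names, the epsilon value and the sample counts are assumptions.

```python
import math
import random


def epsilon_greedy_choice(shows, conversions, epsilon, rng):
    """Explore a random variation with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(shows))
    rates = [c / s if s else 0.0 for c, s in zip(conversions, shows)]
    return rates.index(max(rates))


def ucb1_choice(shows, conversions):
    """Pick the variation with the highest upper confidence bound."""
    for arm, count in enumerate(shows):
        if count == 0:
            return arm  # show every variation once before comparing
    total = sum(shows)
    bounds = [
        c / s + math.sqrt(2 * math.log(total) / s)
        for c, s in zip(conversions, shows)
    ]
    return bounds.index(max(bounds))


# Hypothetical state: two well-tested variations and one rarely shown.
shows = [100, 100, 10]
conversions = [5, 5, 0]
print(epsilon_greedy_choice(shows, conversions, 0.1, random.Random(42)))
print(ucb1_choice(shows, conversions))  # favours the rarely shown third variation
```

UCB1 adds an uncertainty bonus that shrinks as a variation is shown more often, so it keeps coming back to variations it has little data on. Epsilon greedy only explores at random, and otherwise sticks with the current leader.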

Translating this into a real example: we had a new feature which quickly proved to be very effective, but we don’t know which design works best with it. The UCB1 behaviour is closest to how such a case would naturally be handled.

Epsilon greedy errors

Epsilon greedy is very prone to cases where anomalies in conversions send it in the wrong direction. Sometimes certain variations “are lucky” and convert more often than they will in the long term.

These are results from real runs. Running epsilon greedy again and again shows that the results are not deterministic and can lead to very different outcomes.
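The run-to-run variation can be reproduced with a small simulation. The sketch below is a simplified stand-in for the real experiment, and the visitor count, the epsilon value and the seeds are all assumptions. Only the random seed changes between runs.

```python
import random


def run_epsilon_greedy(rates, visitors, epsilon, seed):
    """Simulate one epsilon greedy run; return how often each variation was shown."""
    rng = random.Random(seed)
    shows = [0] * len(rates)
    conversions = [0] * len(rates)
    for _ in range(visitors):
        if rng.random() < epsilon:
            arm = rng.randrange(len(rates))  # explore
        else:
            observed = [c / s if s else 0.0 for c, s in zip(conversions, shows)]
            arm = observed.index(max(observed))  # exploit the current leader
        shows[arm] += 1
        if rng.random() < rates[arm]:
            conversions[arm] += 1
    return shows


# True conversion rates from the experiment; everything else is hypothetical.
rates = [0.005, 0.05, 0.052]
for seed in range(5):
    shows = run_epsilon_greedy(rates, visitors=5_000, epsilon=0.1, seed=seed)
    print(seed, shows, "most shown:", rates[shows.index(max(shows))])
```

Comparing the most shown variation across seeds gives a quick view of how often a run settles on something other than the 5.2% version.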

Overcompensated run

The deviation in conversion rate is quite high, so epsilon greedy fails to determine the best version, and since the second best is shown less and less, it takes longer and longer to get out of such a phase. The two best versions keep alternating.

Epsilon greedy error path 1 – Unable to determine best version

Bad change

Due to a lucky run of the 5% variation, it is chosen over the 5.2% version and Epsilon greedy keeps serving it. Because so few results are collected for the better version, Epsilon greedy never realizes the mistake.

Epsilon greedy error path 2 – Starting with the second best

Completely off

The worst case is when the better version is considered worse from the start. Such cases are “silent errors” because, even with a human looking at it, the error is not as obvious as in the previous two error scenarios.

Epsilon greedy error path 3 – Completely off
