Multi-variant: 0.5% vs 5.0% vs 5.2%
Payoff
The results are roughly the same as before, but because the AB test now has two good versions against only one poorly converting version, its regret drops by ~50%. Here again Epsilon greedy comes out on top with the highest reward, closely followed by UCB1 (~7.8% difference).
Due to the low certainty, which you can see below, we cannot stop the experiment early, so there’s no room for improvement here.
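To make the regret of an even split concrete, here is a minimal sketch. It assumes regret means the expected conversions lost per impression compared to always showing the best variant; the function name uniform_split_regret is hypothetical, and the rates are the ones from this experiment.

```python
def uniform_split_regret(rates):
    """Expected conversions lost per impression when traffic is split evenly.

    Assumption: regret = best conversion rate minus the average rate served.
    """
    best = max(rates)
    average = sum(rates) / len(rates)
    return best - average


# Conversion rates of the three variants in this experiment.
rates = [0.005, 0.05, 0.052]
print(f"expected regret per impression: {uniform_split_regret(rates):.4f}")
```

The closer the variants are to the best one, the less an even split costs, which is why adding a second good version lowers the regret of the AB test.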
Certainty
Because the two best versions are very close to each other, the certainty grows very slowly. Even though AB moved in and out of the confidence interval for a while, it never actually stayed there.
Run behaviour
Same as before. AB testing showed all the versions equally for the whole duration of the experiment, which is the direct cause of its higher regret. Epsilon greedy came out as the winner because it quickly eliminated the worse performing versions and predominantly showed the 5.2% converting version. In contrast, UCB1 eliminated only the 0.5% version and kept testing the two higher performing variations.
Translating this to a real example: we have a new feature that quickly proved to be very effective, but we don’t know which design works best with it. The UCB1 behaviour is the closest to how such a case would naturally play out.
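The difference in behaviour comes from how each algorithm picks the next variant to show. The sketch below uses the textbook forms of both rules, not necessarily the exact code used for these runs; the function names, the epsilon value of 0.1 and the shows/conversions lists are assumptions made for illustration.

```python
import math
import random


def choose_epsilon_greedy(shows, conversions, epsilon=0.1, rng=random):
    """Explore a random variant with probability epsilon, else exploit the best."""
    if rng.random() < epsilon:
        return rng.randrange(len(shows))
    # Observed conversion rate per variant; unseen variants count as 0.
    rates = [c / s if s else 0.0 for c, s in zip(conversions, shows)]
    return max(range(len(shows)), key=lambda i: rates[i])


def choose_ucb1(shows, conversions):
    """Pick the variant with the highest observed rate plus exploration bonus."""
    # Every variant has to be shown once before it can be scored.
    for i, s in enumerate(shows):
        if s == 0:
            return i
    total = sum(shows)

    def score(i):
        mean = conversions[i] / shows[i]
        # The bonus shrinks as a variant collects more impressions.
        bonus = math.sqrt(2 * math.log(total) / shows[i])
        return mean + bonus

    return max(range(len(shows)), key=score)
```

Epsilon greedy only looks at the observed rate, so it commits to whichever variant is currently ahead. UCB1 adds a bonus for variants with few impressions, so two variants with nearly identical rates keep getting tested against each other.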
Epsilon greedy errors
Epsilon greedy is very prone to cases where anomalies in conversions send it in the wrong direction. Sometimes certain variations “get lucky” and convert more often than they will in the long term.
These are results from real executions. Running Epsilon greedy again and again shows that the results are not deterministic and can lead to very different outcomes.
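One way to reproduce this kind of variation is to repeat a simulated run with different random seeds. This is only a sketch under assumptions: the number of rounds, the epsilon value and the function name run_epsilon_greedy are made up for illustration and are not the setup used for the runs described here.

```python
import random

# Conversion rates of the three variants in this experiment.
RATES = [0.005, 0.05, 0.052]


def run_epsilon_greedy(seed, rounds=20000, epsilon=0.1, rates=RATES):
    """Simulate one Epsilon greedy run; return impressions per variant and reward."""
    rng = random.Random(seed)
    shows = [0] * len(rates)
    conversions = [0] * len(rates)

    def observed_rate(i):
        return conversions[i] / shows[i] if shows[i] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(rates))  # explore
        else:
            arm = max(range(len(rates)), key=observed_rate)  # exploit
        shows[arm] += 1
        if rng.random() < rates[arm]:  # simulated visitor converts
            conversions[arm] += 1
    return shows, sum(conversions)


if __name__ == "__main__":
    for seed in range(5):
        shows, reward = run_epsilon_greedy(seed)
        print(f"seed={seed} shows={shows} reward={reward}")
```

Comparing the impressions per variant across seeds shows which variant each run settled on.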
Overcompensated run
The deviation in conversion rate is quite high, so Epsilon greedy fails to determine the best version, and since the second best is shown less and less, it takes longer and longer to get out of such a phase. The two best versions keep alternating.
Bad change
Due to a lucky run of the 5% variation, it is chosen over the 5.2% version and Epsilon greedy keeps serving it. The low number of relevant results for the better version means Epsilon greedy never realizes the mistake.
Completely off
The worst case is when the two better versions are considered worse from the start. Such cases are “silent errors” because, even with human eyes looking at it, the error is not as obvious as in the previous two error scenarios.