0.5% vs 5%
This case is a bit tricky: the confidence level climbs so high so quickly that every test could have been stopped much earlier, and there was no need to run through all 25000 visitors. The results at the bottom show what happens if we stop the experiment as soon as the confidence reaches 95%.
Because the difference between the two versions is so large, each model converges to 100% confidence very quickly. Surprisingly, UCB1 gets there even faster than AB testing, and Epsilon greedy is the slowest, but even for Epsilon greedy fewer than ~500 visitors are enough for a statistically significant result.
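To make the "confidence" figure more concrete, here is a rough sketch of one common way to compute such a number: a one-sided two-proportion z-test. The function name and the choice of test are assumptions for illustration, not necessarily what produced the results discussed here.

```python
import math


def confidence_b_beats_a(conv_a, n_a, conv_b, n_b):
    """Approximate confidence that version B converts better than version A.

    Assumption: a one-sided two-proportion z-test with a pooled standard
    error. The method behind the plots in this post may differ.
    """
    if n_a == 0 or n_b == 0:
        return 0.5  # no data for one version: no evidence either way
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.5  # no conversions (or only conversions) so far
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

With hypothetical counts of 1 conversion out of 250 shows against 13 out of 250 (close to the 0.5% and 5% rates), this already returns a value above 0.99, which is in line with how quickly a tenfold difference becomes visible.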
If we let it run through all 25000 visitors, the show frequencies look as follows.
The Epsilon greedy run plot shows very well why it yields the best payoff and the least regret compared with AB testing. When the difference is tenfold, Epsilon greedy's behaviour of spending a short time exploring and more time exploiting pays off very well, while the AB test keeps showing the inferior version, which doesn't convert users to revenue, for longer (unless it is stopped earlier).
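The explore-then-exploit behaviour described above can be sketched in a few lines. This is a minimal simulation under stated assumptions: the decay schedule `1 / (1 + decay * step)`, the `decay` value, and the function names are hypothetical and are not taken from the experiment's actual code.

```python
import random


def epsilon_greedy_choice(shows, conversions, step, rng, decay=0.01):
    """Return the index of the version to show to the next visitor."""
    # Assumed schedule: epsilon starts at 1.0 and shrinks as visitors arrive
    epsilon = 1.0 / (1.0 + decay * step)
    if rng.random() < epsilon:
        # Explore: pick a version at random, ignoring conversion rates
        return rng.randrange(len(shows))
    # Exploit: pick the version with the best observed conversion rate
    rates = [c / s if s else 0.0 for c, s in zip(conversions, shows)]
    return max(range(len(rates)), key=rates.__getitem__)


def simulate_epsilon_greedy(true_rates, visitors, seed=0, decay=0.01):
    """Simulate visitors and return (shows, conversions) per version."""
    rng = random.Random(seed)
    shows = [0] * len(true_rates)
    conversions = [0] * len(true_rates)
    for step in range(visitors):
        arm = epsilon_greedy_choice(shows, conversions, step, rng, decay)
        shows[arm] += 1
        if rng.random() < true_rates[arm]:
            conversions[arm] += 1
    return shows, conversions
```

Calling `simulate_epsilon_greedy([0.005, 0.05], 25000)` gives the show counts per version; with this schedule only a few hundred of the 25000 visitors are spent exploring, so almost all traffic ends up on whichever version looks better.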
UCB1 correctly determines the superior version and shows it more and more, with some further probing of the other version along the way.
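For comparison, here is a minimal sketch of the textbook UCB1 rule, which scores each version by its observed conversion rate plus a bonus that shrinks the more often that version has been shown. It assumes a reward of 1 per conversion and 0 otherwise; the function names are hypothetical and the experiment's own implementation may use a different bonus term.

```python
import math
import random


def ucb1_choice(shows, conversions):
    """Return the index of the version with the highest UCB1 score."""
    # Show every version once before the formula can be applied
    for arm, n in enumerate(shows):
        if n == 0:
            return arm
    total = sum(shows)
    scores = [
        conversions[arm] / shows[arm]
        + math.sqrt(2 * math.log(total) / shows[arm])  # exploration bonus
        for arm in range(len(shows))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def simulate_ucb1(true_rates, visitors, seed=0):
    """Simulate visitors and return (shows, conversions) per version."""
    rng = random.Random(seed)
    shows = [0] * len(true_rates)
    conversions = [0] * len(true_rates)
    for _ in range(visitors):
        arm = ucb1_choice(shows, conversions)
        shows[arm] += 1
        if rng.random() < true_rates[arm]:
            conversions[arm] += 1
    return shows, conversions
```

The bonus term is what produces the "further probing": a version that has been shown rarely keeps a large bonus, so it is picked again from time to time even when its observed rate is lower.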
Stopping when confidence level reached
Here Epsilon greedy became the most expensive, precisely because of its behaviour of serving versions regardless of the conversion rate, depending only on the epsilon decay parameter. In such extreme cases AB testing and UCB1 are clearly superior, with a slight edge for AB testing. However, it's important to note that the difference is almost negligible: under 5% between the worst and the best case.