Multi-variant A/B testing vs multi-armed bandits

Summary and conclusion

Reward

                     Two-version                    Multi-variant
           1.6% vs 2%   0.5% vs 5%   0.5% vs 5% vs 5.2%   3% vs 5% vs 7%
Full AB       419           668              957               1272
Epsilon       451          1135             1283               1611
UCB1          418          1047             1184               1349
Stop AB        -             -              1288               1720
Epsilon        -             -              1230               1784
UCB1           -             -              1262               1800
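The absolute numbers depend on the traffic and horizon of each run, but the shape of such a comparison can be sketched. Below is a minimal simulation harness, assuming Bernoulli conversion arms; the `full_ab` policy here is a hypothetical stand-in for classic A/B testing that simply splits traffic evenly across all variants:

```python
import random

def simulate(true_rates, choose, rounds=10000, seed=42):
    """Run a bandit policy against Bernoulli arms and return total reward."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)   # pulls per arm
    rewards = [0] * len(true_rates)  # conversions per arm
    for _ in range(rounds):
        arm = choose(counts, rewards, rng)
        reward = 1 if rng.random() < true_rates[arm] else 0
        counts[arm] += 1
        rewards[arm] += reward
    return sum(rewards)

# Hypothetical "full A/B" policy: round-robin over all arms for the whole run.
def full_ab(counts, rewards, rng):
    return sum(counts) % len(counts)
```

For example, `simulate([0.03, 0.05, 0.07], full_ab)` estimates the total conversions of a plain three-variant A/B test; swapping in a bandit policy for `choose` lets you compare strategies on your own conversion rates.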

Conclusion

There is no single silver bullet for every case. You need to work out your priorities and your most common use cases.

If you always compare only a control and a trial version, then traditional A/B testing will most likely give you the best results.

If you want to test multiple versions, then go for a multi-armed bandit model. In those cases, the regret of possibly running an inferior version just to distinguish between the two better-performing variations is too high. I would personally go for UCB1 for its higher confidence, even though it may have a higher regret than epsilon-greedy, which we saw can be very susceptible to anomalies. And anomalies will happen, especially because we are talking about real users in the real world, not random probabilities.
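For reference, the two selection rules compared above can be sketched in a few lines of Python. This is a minimal sketch: the per-arm statistics `counts` and `rewards` are assumed to be maintained by the caller between rounds.

```python
import math
import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Explore a random arm with probability epsilon, else exploit the best mean."""
    if random.random() < epsilon:
        return random.randrange(len(counts))
    means = [r / c if c > 0 else 0.0 for r, c in zip(rewards, counts)]
    return means.index(max(means))

def ucb1(counts, rewards):
    """Pick the arm with the highest upper confidence bound on its mean reward."""
    # Play every arm once before applying the UCB formula.
    for i, c in enumerate(counts):
        if c == 0:
            return i
    total = sum(counts)
    scores = [r / c + math.sqrt(2 * math.log(total) / c)
              for r, c in zip(rewards, counts)]
    return scores.index(max(scores))
```

The exploration bonus in UCB1 shrinks as an arm accumulates plays, which is why it is less easily thrown off by a lucky early streak than epsilon-greedy's fixed random exploration.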

It is also important to decide how closely you want to supervise your experiments. If you cannot keep a close eye on them, or you just want to run them unsupervised for an undetermined time, multi-armed bandits are the way to go.

In the end, it comes down to this: never simply believe other people's results. Test your own scenarios with the
