Multi variant AB testing vs Multi-Armed bandit

Summary and conclusion

Reward

                 Two-version                Multi-variant
Run   Algorithm  1.6% vs 2%   0.5% vs 5%   0.5% vs 5% vs 5.2%   3% vs 5% vs 7%
Full  AB         419          668          957                  1272
      Epsilon    451          1135         1283                 1611
      UCB1       418          1047         1184                 1349

Stop  AB         -            1288         -                    1720
      Epsilon    -            1230         -                    1784
      UCB1       -            1262         -                    1800

Conclusion

There is no silver bullet that fits every case. You need to work out your priorities and your most common use cases.

If you always compare only a control and a trial version, then traditional AB testing will most likely give you the best results.
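For the two-version case, the comparison usually ends with a significance check on the two conversion rates. The snippet below is a minimal sketch of one common way to do that, a pooled two-proportion z-test. It is an assumption that this matches what "traditional AB testing" means here; the function name `two_proportion_z_test` and the sample sizes are hypothetical and chosen only for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value of a standard normal, computed with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical sample: 10,000 visitors per version at 1.6% and 2% conversion.
    z, p = two_proportion_z_test(160, 10_000, 200, 10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With small differences such as 1.6% vs 2%, the required sample size is what makes the test slow, which is worth keeping in mind when you compare it with the bandit approaches.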

If you want to test multiple versions, then go for a multi-armed bandit model. The regret of running an inferior version just to distinguish between the two better-performing variations is too high in those cases. I personally would go for UCB1 for its higher confidence, even though it may have higher regret than Epsilon greedy, which, as we saw, can be very sensitive to anomalies. And anomalies will happen, especially because we are talking about real users in the real world, not random probabilities.
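To make the difference between the two bandit strategies concrete, here is a minimal sketch of both selection rules on simulated Bernoulli arms. It is not the code behind the numbers in this post: the function names, the `epsilon=0.1` default, the seed, and the round count are all assumptions made for illustration. Epsilon greedy explores at random with a fixed probability, while UCB1 adds a confidence bonus that shrinks as an arm collects more pulls.

```python
import math
import random


def select_epsilon_greedy(counts, rewards, rng, epsilon=0.1):
    """Explore a random arm with probability epsilon, otherwise exploit the best observed mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
    return max(range(len(means)), key=means.__getitem__)


def select_ucb1(counts, rewards, rng=None):
    """Pick the arm with the highest observed mean plus the UCB1 confidence bonus."""
    # Play every arm once before the bonus term is defined.
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm
    total = sum(counts)
    scores = [
        rewards[a] / counts[a] + math.sqrt(2 * math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def simulate(select, rates, rounds, seed=0):
    """Run one policy against arms with the given conversion rates; return (counts, rewards)."""
    rng = random.Random(seed)
    counts = [0] * len(rates)
    rewards = [0] * len(rates)
    for _ in range(rounds):
        arm = select(counts, rewards, rng)
        counts[arm] += 1
        if rng.random() < rates[arm]:
            rewards[arm] += 1
    return counts, rewards


if __name__ == "__main__":
    rates = [0.03, 0.05, 0.07]  # the 3% vs 5% vs 7% scenario
    for name, policy in [("Epsilon", select_epsilon_greedy), ("UCB1", select_ucb1)]:
        counts, rewards = simulate(policy, rates, rounds=10_000, seed=42)
        print(name, "pulls per arm:", counts, "total reward:", sum(rewards))
```

Running it with a few different seeds is a quick way to see how much a single unlucky streak can mislead the greedy rule early on.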

It is also important to decide how much you want to supervise your experiments. If you cannot keep a close eye on them, or just want to run them unsupervised for an undetermined time, multi-armed bandits are the way to go.

In the end, it comes down to this: never rely blindly on other people's results. Test your own scenarios with the

[1] http://tech.gc.com/bayesian-statistics-and-multi-armed-bandits/

[2] http://blog.yhathq.com/posts/the-beer-bandit.html

[3] https://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html
