1.6% vs 2%
As expected, epsilon greedy has the highest payoff, at the price of reaching 95% confidence later than UCB1 did.
Here AB never reaches 95% certainty, whereas UCB1 gets there the quickest. It is worth noting that AB came close to 95% at one point, though it later dropped to around 60%.
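To make the "95% confidence" figure concrete, here is a minimal sketch of one common way to compute it. It is an assumption, not necessarily the calculation behind the plots in this post: a one-sided, pooled two-proportion z-test. The function name `confidence_b_beats_a` and its parameters are hypothetical.

```python
import math

def confidence_b_beats_a(conv_a, n_a, conv_b, n_b):
    """One-sided confidence that B converts better than A.

    Assumption: a pooled two-proportion z-test; the post does not say
    which test was actually used.
    """
    if n_a == 0 or n_b == 0:
        return 0.5  # no data on one side yet, so no evidence either way
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.5  # no conversions (or only conversions) observed
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at z
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

With rates as close as 1.6% and 2%, this number moves slowly and can fall back after an unlucky run of visitors, which fits with AB getting close to 95% and then dropping again.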
Run behaviour
The following plots show the run-time behaviour of each model, that is, which version was shown to the user. Epsilon greedy quickly converged to the version with the highest payoff.
The UCB1 model sometimes alternates between the two versions, with a general preference for the 0.02 version. This is expected and natural: the two versions are close to each other, so reaching the confidence level requires much more probing.
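The selection behaviour described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the code that produced the plots: the conversion rates 0.016 and 0.02 come from this section, while `epsilon=0.1`, the round count, the seed, and all function names are hypothetical choices for illustration.

```python
import math
import random

RATES = [0.016, 0.02]  # true conversion rates of the two versions

def epsilon_greedy(counts, rewards, rng, epsilon=0.1):
    """Explore a random version with probability epsilon, else exploit the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
    return means.index(max(means))

def ucb1(counts, rewards, rng):
    """Show each version once, then pick the highest mean plus exploration bonus."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    total = sum(counts)
    scores = [r / c + math.sqrt(2 * math.log(total) / c)
              for r, c in zip(rewards, counts)]
    return scores.index(max(scores))

def simulate(policy, n_rounds=10000, seed=0):
    """Run one policy and record which version was shown in each round."""
    rng = random.Random(seed)
    counts = [0] * len(RATES)   # times each version was shown
    rewards = [0] * len(RATES)  # conversions per version
    shown = []
    for _ in range(n_rounds):
        arm = policy(counts, rewards, rng)
        converted = 1 if rng.random() < RATES[arm] else 0
        counts[arm] += 1
        rewards[arm] += converted
        shown.append(arm)
    return counts, rewards, shown
```

Calling `simulate(ucb1)` and `simulate(epsilon_greedy)` returns the per-version counts along with the `shown` sequence, which can be plotted against the round number to get a picture similar to the run plots. The UCB1 bonus term shrinks only as a version accumulates impressions, so with rates this close the less-shown version keeps getting picked again.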