1.6% vs 2%
As expected, epsilon-greedy has the highest payoff, at the price of reaching 95% confidence later than UCB1 did.
Here AB never reaches 95% certainty, whereas UCB1 gets there the quickest. It is worth noting that AB came close to 95% at one point, though it later dropped to around 60%.
The following plots show the runtime behaviour of each model, that is, which version was shown to the user at each step. Epsilon-greedy quickly converged to the version with the highest payoff.
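To make the epsilon-greedy behaviour concrete, here is a minimal sketch of the selection rule. The function names, the default `epsilon=0.1` and the simulation loop are illustrative assumptions, not the code that produced the plots.

```python
import random

def epsilon_greedy_choice(shown, converted, epsilon, rng):
    """Return the index of the version to show next.

    shown[i]     -- how many times version i has been shown so far
    converted[i] -- how many of those impressions converted
    """
    # Explore: with probability epsilon pick a version uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(shown))
    # Exploit: otherwise pick the version with the best observed rate.
    # Versions that were never shown are treated as rate 0.0 here.
    rates = [c / s if s else 0.0 for s, c in zip(shown, converted)]
    return max(range(len(rates)), key=rates.__getitem__)

def simulate_epsilon_greedy(true_rates, rounds, epsilon=0.1, seed=0):
    """Simulate `rounds` impressions against hypothetical true conversion rates."""
    rng = random.Random(seed)
    shown = [0] * len(true_rates)
    converted = [0] * len(true_rates)
    for _ in range(rounds):
        arm = epsilon_greedy_choice(shown, converted, epsilon, rng)
        shown[arm] += 1
        if rng.random() < true_rates[arm]:
            converted[arm] += 1
    return shown, converted
```

Calling `simulate_epsilon_greedy([0.016, 0.02], rounds=10000)` returns the impression and conversion counts per version. Because the rule exploits the current leader most of the time, it tends to settle on one version early, which also means the other version collects data slowly.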
The UCB1 model sometimes alternates between the two versions, with a general preference towards the 0.02 version. This is expected and natural: the two versions are close to each other, so reaching the confidence level requires much more probing.
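The alternation can be seen directly in the standard UCB1 score, which adds an exploration bonus of sqrt(2 ln n / n_i) to each version's observed rate. The sketch below assumes that textbook formula; the function names and the example counts are hypothetical and are not taken from the experiment.

```python
import math

def ucb1_scores(shown, converted):
    """Return the UCB1 score of each version.

    Score = observed conversion rate + sqrt(2 * ln(total impressions) / impressions of this version).
    A version that has never been shown gets an infinite score so it is tried first.
    """
    total = sum(shown)
    scores = []
    for s, c in zip(shown, converted):
        if s == 0:
            scores.append(math.inf)
        else:
            scores.append(c / s + math.sqrt(2 * math.log(total) / s))
    return scores

def ucb1_choice(shown, converted):
    """Return the index of the version with the highest UCB1 score."""
    scores = ucb1_scores(shown, converted)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical counts: both versions shown 1000 times, observed rates 1.6% and 2%.
print(ucb1_choice([1000, 1000], [16, 20]))
# Hypothetical counts: the 2% version has now been shown 100 more times.
print(ucb1_choice([1000, 1100], [16, 22]))
```

With equal impressions the 2% version wins, but after it has been shown somewhat more often its bonus shrinks enough for the 1.6% version to take the lead again. At these counts the bonus is roughly 0.12, far larger than the 0.004 gap between the rates, so the choice keeps flipping until both versions have been probed many more times.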