
Paperless Post: Now With Machine Learning!

Our interface designers came up with several registration form views, but didn’t know which one would perform best and wanted to compare them in the field. A/B testing with Google Website Optimizer was kicked around as a solution, but then I stumbled across an excellent article by Steve Hanov explaining how to implement a “multi-armed bandit” in about 20 lines of code (our library ended up using more than 20). I encourage you to implement something similar on your website if you have multiple UIs and doubts about which is the best.

Let’s say you design three pages. You want to make an objective decision about which page is most likely to get a visitor to take an action you like. You think all the pages are great, but sadly, there is no magic algorithm or application to which you can send pictures of your great designs and have them graded for performance. That’s where multi-armed bandit testing comes in. Wikipedia says: “The multi-armed bandit problem models an agent that simultaneously attempts to acquire new knowledge and to optimize its decisions based on existing knowledge.”

We could run an A/B test to determine the success rates of the different registration forms. However, we’d need to leave these views up on our site for a long time to gather data, and the results would probably be inconclusive. We’d also be showing a large portion of our users something that doesn’t work well. Instead, we decided to use a machine learning algorithm that exploits the results of the test while the test is still being run.

Multi-armed bandit testing works by choosing the “lever” with the highest expectation of reward. This means that the registration form with the highest probability of converting a visitor into a registered user is shown the majority of the time. However, we also stir things up a bit and show a random view 10% of the time. This prevents the test from getting trapped at a local maximum (and also guarantees that we don’t only show whichever view first converted a visitor).
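In code, that selection step is only a few lines. Here’s a minimal sketch in Python; the names (`choose_view`, the in-memory `stats` dict) are ours for illustration, and our real counters live in Redis, as sketched further down:

```python
import random

EPSILON = 0.10  # fraction of requests that get a random view

def conversion_rate(successes, trials):
    """Observed reward rate for one choice; untried choices score 0."""
    return float(successes) / trials if trials else 0.0

def choose_view(stats):
    """Pick a view to show. `stats` maps choice name -> (successes, trials).

    90% of the time, show the choice with the best observed conversion
    rate; the other 10%, show a random choice so the test keeps
    exploring and can't lock onto an early, lucky winner.
    """
    if random.random() < EPSILON:
        return random.choice(list(stats))
    return max(stats, key=lambda name: conversion_rate(*stats[name]))
```

The 10% is a knob: raise it and the test explores (and learns) faster at the cost of showing more users a possibly-worse view; lower it and the reverse.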

This strategy has two benefits. First and foremost, we lose the fewest users to a UI that isn’t optimal: after a short initial learning phase, 90% of users are seeing the best thing we’ve got. Secondly, it serves to verify that the view the system thinks is the best actually IS the best. The test is still running, and because it gathers the most data around the best choice, you get a high degree of certainty that your registration form works.

Additionally, if after running the test for a while you want to throw another choice into the mix, you can do so without ruining your results, because the data gathered for each choice are independent of the others.
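In terms of the sketch above, adding a choice is just a new entry with zeroed counters; nothing already gathered has to change:

```python
# "magic_link" is a hypothetical new view, not one of our real choices.
# The counts for the existing choices are untouched.
stats["magic_link"] = (0, 0)
```

One caveat: since an untried choice scores zero in this sketch, it will only be served by the 10% random traffic until it converts someone; some implementations instead seed new arms optimistically so they get tried sooner.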

We hope to use this more widely across the site. In addition to building a library that makes it easy to store trial and reward data in Redis, we also put the data right on our internal dashboard, so that product owners can place bets on which horse is going to win the race.
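To give a flavor of the storage side, here’s a sketch of recording trials and rewards in Redis. The key names and schema are hypothetical (one hash of trial counts and one hash of success counts per experiment), not necessarily what our library actually uses:

```python
import redis

r = redis.Redis(decode_responses=True)

def record_trial(experiment, choice):
    """Called every time a choice is displayed to a user."""
    r.hincrby("bandit:%s:trials" % experiment, choice, 1)

def record_success(experiment, choice):
    """Called when the displayed choice earns its reward
    (in our case, a user registering)."""
    r.hincrby("bandit:%s:successes" % experiment, choice, 1)
```

HINCRBY is atomic, so concurrent web processes can bump the counters without stepping on each other.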

Here are some of the results from the test running on our splash page. In this test we have three choices, “email”, “facebook”, and “email_form”, which represent three different views we display to the user. Our implementation yields two statistics for each choice. The first is “successes”, the number of times a choice has been rewarded, which here means that a user registered with that view. The second is “trials”, the total number of times that choice has been displayed to a user. “Percentage” is just successes over trials, giving us the conversion rate for each individual view.
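Reading those counters back gives the dashboard numbers directly. Continuing the hypothetical Redis layout from the sketch above:

```python
def experiment_stats(experiment):
    """Assemble the dashboard numbers for one experiment: successes,
    trials, and percentage (successes over trials) per choice."""
    # Assumes the `r` client and key layout from the previous sketch.
    trials = r.hgetall("bandit:%s:trials" % experiment)
    successes = r.hgetall("bandit:%s:successes" % experiment)
    report = {}
    for choice, shown in trials.items():
        shown = int(shown)
        won = int(successes.get(choice, 0))
        report[choice] = {
            "successes": won,
            "trials": shown,
            "percentage": 100.0 * won / shown if shown else 0.0,
        }
    return report
```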

In this case, “email” is the most successful, with a conversion rate of 2.34%. Because it’s currently the most successful view, it’s being shown 90% of the time, which accounts for its large number of trials relative to “facebook”. Looking at these results, we can also tell that at some point in recent history “email_form” was the most successful, since it amassed about 130k trials. This highlights one of the benefits of multi-armed bandit testing: because “email_form” was winning, it was subjected to increased scrutiny through more trials, and it turned out not to be the best choice after all.

You may work for a big company, but without testing like this you couldn’t possibly get your UI in front of this many eyeballs. 300k users can’t be wrong!
