I was just reading Uday Gajendar’s post on why designers don’t like A/B testing. Uday feels that, as a designer, his job is to uphold aesthetic integrity while keeping business metrics in mind. The prevalence of A/B testing, he says, has the effect of diluting a strong design into an “unsightly pastiche of uneven incrementalism.”
Though I have no experience as a designer, I can’t help but disagree with his way of looking at A/B testing. Uday and many of his peers believe that a design must remain thoroughly consistent with its creator’s vision and intent in order for it to accomplish its purpose. He asks that we place more trust in the implicit data from designers’ experience and pattern-recognition as they implement their visions.
But let’s get real. A lot of UI design, just like a lot of engineering design, involves a larger goal supported by many small, arbitrary decisions that could easily go one of several ways. Yes, there are plenty of choices where a designer’s experience should be trusted completely in rendering an artistic vision. But for every one ‘vision’ decision, I’m willing to bet there are two or three in which the designer has no strong preference, says ‘I like that better,’ and runs with one option.
Painting these small decisions as central to a design seems a bit too auteurist for me. Color me prosaic, but the same design could very easily exist in millions of slight variations, changing one or more of the arbitrary choices. Digging in and refusing to change an original design — in light of evidence that another design accomplishes its purpose better — seems like escalation of commitment.
So it makes perfect sense that designers would want to avoid A/B testing. Changing a design or having to fight for one’s decisions is a lot of work, and I can easily see how it would start feeling like red tape. Why would designers ever want to engage in a process that actively seeks out more work and questions their assumptions? It’s much easier to stay in the trenches and fight it out.
It’s worth noting that good management practice demands prioritization. Optimizing a design isn’t worth the time when more important things still need to be built, and the A/B testing process does take time. But evidence suggests that qualms about A/B testing rest on two misconceptions about the right way to apply it. Great design by data comes from deeply mapping out a solution space and rapidly testing options with a well-formed experimental design.
Generate lots of possibilities
Web and software designers like Uday often cite examples of truly innovative hardware design as evidence that a great design need not be A/B tested. Apple, for instance, did not A/B test the iPhone. But they forget that these products were extensively prototyped. The absence of A/B testing isn’t for lack of desire to test; it’s a consequence of the dynamics of hardware businesses. Apple can’t afford to A/B test a phone because its distribution involves manufacturing, sales training, and so on. So instead, they build tens if not hundreds of prototypes before arriving at the features that will ship.
So for Apple, ‘A/B testing’ happens on the regular release cycle. They have only one bullet in the chamber, and reloading takes 6-12 months, so they choose wisely from many prototypes. Web companies, by contrast, have the advantage of rapid deployment and the ability to live-test multiple options in a production environment. It’s like comparing digital photography to analog. Maybe analog photographers developed more finesse by taking ten minutes to set up a perfect shot, but digital photographers achieve better outcomes faster by taking many small variations on the same shot and picking the best one.
The key here is lots of variations. It turns out that all types of innovation benefit from simply coming up with more ideas. In Innovation Tournaments, my design professor Karl Ulrich wrote about the massive marginal returns to producing more raw opportunities: the curve he plots, of best-opportunity quality against quantity of opportunities generated, doesn’t level off until somewhere between 70 and 100 options have been created.
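The shape of that curve has a simple statistical analogue. If we model each idea’s quality as an independent random draw (a toy assumption, not Ulrich’s actual model), the expected quality of the best of n ideas climbs steeply at first and then flattens — the first few extra ideas buy a lot, the seventieth buys little:

```python
import random

def best_of(n, trials=10_000, seed=42):
    """Average quality of the best of n independent ideas,
    with each idea's quality drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n))
    return total / trials

# Diminishing marginal returns: big jump from 1 to 5 ideas,
# tiny jump from 20 to 100.
for n in (1, 5, 20, 100):
    print(n, round(best_of(n), 3))
```

Under this uniform-quality toy model the expected best of n draws is n/(n+1), so going from 1 idea to 5 gains about 0.33 while going from 20 to 100 gains under 0.04 — the same leveling-off Ulrich describes.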
That’s a massive challenge to the “trust my vision” school of design thinking. Though it takes a lot of extra cognitive effort, designers need the discipline to push well past the typical 1-5 ideas for a UI. If the cost of generating many small variations is low (and with the web and automated tools, it is), and we have a good process for evaluating their quality, then we should generate far more ideas than intuition alone would produce.
Structuring an experimental design
In his post, Uday asks: “A/B testing locks you into just two comparative options, an exclusively binary (and thus limited) way of thinking. What about C or D or Z or some other alternatives?”
After generating lots of options, we need an effective way to test them. Fortunately, as others replied in the comments, multivariate testing is possible, using different permutations and combinations of feature choices. The key is to test several orthogonal groups of options together and use multivariate analysis to see which combinations work best. Then combine the winners, and test again.
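A minimal sketch of what that looks like, with made-up option groups and made-up traffic numbers (the group names, cell counts, and the baseline choice are all illustrative, not from the post): cross two orthogonal option groups into a full set of variants, pick the best performer, and check it against the baseline with a two-proportion z-test.

```python
import itertools
import math

# Two hypothetical orthogonal option groups.
headlines = ["A", "B"]
buttons = ["green", "blue"]

# Hypothetical observed results per variant: (conversions, visitors).
results = {
    ("A", "green"): (120, 1000),
    ("A", "blue"):  (135, 1000),
    ("B", "green"): (140, 1000),
    ("B", "blue"):  (170, 1000),
}

def rate(cell):
    conversions, visitors = results[cell]
    return conversions / visitors

# Every permutation of the orthogonal groups is a testable variant.
variants = list(itertools.product(headlines, buttons))
best = max(variants, key=rate)

def z_test(a, b):
    """Two-sided p-value for a two-proportion z-test between variants a and b."""
    (c1, n1), (c2, n2) = results[a], results[b]
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

print(best, rate(best), round(z_test(best, ("A", "green")), 4))
```

In practice a real multivariate run would use a proper analysis (factorial ANOVA or a logistic model) to separate each group’s effect; the point of the sketch is only that “C or D or Z” variants are cheap to enumerate and compare once the option groups are orthogonal.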
Finally, and most importantly, a designer’s tacit knowledge should always be the final arbiter of these decisions. After exploring the solution space by generating many, many possible options, and then observing user behavior under the options tested, a designer is armed with the right data to make an informed selection.
A design as a whole can’t be right or wrong, but small arbitrary decisions between different options can be. Designers can improve their practice by learning to use the A/B/n test to rapidly improve their designs. It shouldn’t be applied by business or engineering as a litmus test against designers, but rather demanded and owned by the designers themselves. Beautiful, useful design — an inherently emotional thing — can be arrived at through the right combination of design and data.