In a nutshell
• 5 users is a proven, cost-effective baseline for qualitative testing; 20 users yields similarly confident results for quantitative testing
• The actual number of users varies by situation (e.g. number of tasks, level of task complexity, demographic expertise), and starting with a smaller number of users is recommended to find the most glaring usability problems
• Comparative usability testing has a higher baseline range (10-12) than singular usability testing (5-10), and a higher cap for achieving confident results (25 instead of 20)
• The cost/benefit ratio for running usability tests has never been better due to intuitive and reliable remote user testing platforms
Working with a baseline for your usability test
Once you’ve decided to run a usability test, feel good about the tasks, and have the right platform, an important question remains… how many users do I need for reliable data, and how much is that going to cost?
Luckily, the statistical legwork for determining how many users are enough to feel confident in your usability test results has already been done. Here is a good rule of thumb often echoed in the user experience design and research community:
20 users for quantitative studies
5 users for qualitative studies
So that offers a good baseline of how many users to consider for usability tests, but how did the community come up with that number, and what about the cost?
The figures behind the numbers
These baseline numbers have appeared in product research since the 20th century, but were notably introduced to the UX industry by Laura Faulkner in 2003 (see her table below, showing the minimum and mean percentage of problems found at each number of users) and later scrutinized by Jakob Nielsen in 2006.
By working with established data on deviations among web users, coupled with additional testing data run by the Nielsen Norman Group and then adjusted to remove outliers and margins of error, Nielsen’s findings reinforced Faulkner’s 20-user model as providing confident results.
If you want a complete understanding of Nielsen’s math, see the linked article. For top-line information, let’s continue on.
Number of users, minimum % of problems found, and the mean % of problems found. Faulkner, 2003
The reasoning behind 20 users for quantitative usability testing comes from the idea that, since about 6% of users are outliers, 19 of the 20 testers should be statistically sound. Nielsen states that using this method will yield “great accuracy” half the time and “good accuracy” the other half of the time.
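To make the 6% outlier figure concrete, you can model the outlier count in a 20-person panel with a simple binomial distribution. This is an illustrative sketch of that model, not Nielsen’s published calculation:

```python
from math import comb

def outlier_probabilities(n=20, p=0.06):
    """P(exactly k of n testers are outliers), assuming each tester
    is independently an outlier with probability p (binomial model)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

probs = outlier_probabilities()
print(f"P(0 outliers in 20)  = {probs[0]:.2f}")            # ≈ 0.29
print(f"P(<=1 outlier in 20) = {probs[0] + probs[1]:.2f}")  # ≈ 0.66
```

Under this model, roughly two-thirds of the time at most one of your 20 testers will be an outlier, which is consistent with the expectation that 19 of 20 should be statistically sound.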
This isn’t an end-all number, however, and statistics are not without their own problems. In fact, multiple studies since the NN Group’s have found meaningful results from anywhere between 8 and 25 testers.
Ranges of performance for single interface studies and comparative studies. Macefield, 2009
In the above graph, Ritch Macefield illustrates a key difference between tests designed to find problems with a single interface (a problem discovery study) and tests comparing two interfaces against one another (a comparative study). His findings suggest that there is no single ideal number for any specific test, but there is, as others have found, a reasonable range that varies with task complexity, the types of problems sought, demographic expertise, and other typical outliers.
It is also worth noting that even Nielsen cautions against dismissing outliers entirely. After all, when we talk about usability testing, we are not talking about mere blips or glitches in the data; we are talking about actual humans interacting with your design. Rather than discarding outliers, watch the videos of their test sessions to find out what caused the drop in their performance results.
Read more: Quantitative vs qualitative testing
The numbers behind the people
For reliable qualitative results, Nielsen, Faulkner, Macefield, and others agree that 5 testers are enough to feel confident, because the returns diminish sharply beyond the first 5. Nielsen’s research gives the following equation:
N(1 − (1 − L)ⁿ)
Where N represents the total number of usability problems on a given website (or app), L represents the proportion of those problems found by a single tester, and n is the number of testers.
With the first 5 testers, you can reliably identify about 85% of usability problems, though the returns diminish with each user added: a 6th tester only raises the overall result to about 90%, and it takes around 15 testers to approach 100%.
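These percentages are easy to reproduce from the equation above. The sketch below assumes Nielsen’s commonly cited per-tester discovery rate of L = 0.31, a value from his research that is not stated explicitly in this article:

```python
L = 0.31  # assumed proportion of problems a single tester finds

def fraction_found(n, L=L):
    """Fraction of total problems found by n testers: 1 - (1 - L)^n."""
    return 1 - (1 - L) ** n

for n in (1, 5, 6, 15):
    # n=5 gives ~84.4% (commonly rounded to 85%); n=15 gives ~99.6%
    print(f"{n:2d} testers -> {fraction_found(n):.1%} of problems found")
```

Plotting these values shows the sharply diminishing returns that motivate the 5-user baseline.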
The cost/value of usability testing
Although the methodologies of these studies are relatively sound (if inherently limited) and the data they provide is foundational to the industry, what is missing, and even misleading, is their brief discussion of cost and value. The Nielsen study, for example, opens:
“We can define usability in terms of quality metrics, such as learning time, efficiency of use, memorability, user errors, and subjective satisfaction. Sadly, few projects collect such metrics because doing so is expensive: it requires 4 times as many users as simple user testing.”
Nielsen is of course referring to the difficulty and cost associated with quantitative testing versus qualitative testing. However, technology in 2006 (the year of his study) is not comparable to the technology of 2019. With contemporary usability testing platforms such as TryMyUI, the cost-value ratio has never been better for the customer.
We’ve already written about the benefits of remote user testing and how much time, money, and other resources it saves compared to traditional user testing (which Nielsen is no doubt referring to); our powerful quantitative suite also leaves plenty of room for qualitative analysis.
See also: UX Crowd: Quantifying the Qualitative
So how many testers should I test with?
When considering how many testers any given usability test should have, we typically recommend 10 for a good idea of where your UX stands and 20 for near-certainty, echoing the studies referenced above. We have designed our platform and plans around those numbers, and we consistently see these baselines give customers the insightful data they need.
At the end of the day, user experience is too valuable to be overlooked, and even a test with 2 users will give anyone the immediate feedback they need to validate further tests and research.