System usability scale
from Wikipedia

Each statement is rated from 1 ("Strongly disagree") to 5 ("Strongly agree"):
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
Standard version of the system usability scale

In systems engineering, the system usability scale (SUS) is a simple, ten-item attitude Likert scale giving a global view of subjective assessments of usability. It was developed by John Brooke[1] at Digital Equipment Corporation in the UK in 1986 as a tool to be used in usability engineering of electronic office systems.

The usability of a system, as defined by the ISO standard ISO 9241 Part 11, can be measured only by taking into account the context of use of the system—i.e., who is using the system, what they are using it for, and the environment in which they are using it. Furthermore, measurements of usability have several different aspects:

  • effectiveness (can users successfully achieve their objectives)
  • efficiency (how much effort and resources are expended in achieving those objectives)
  • satisfaction (was the experience satisfactory)

Measures of effectiveness and efficiency are also context-specific. Effectiveness in using a system for controlling a continuous industrial process would generally be measured in very different terms from, say, effectiveness in using a text editor. Thus, it can be difficult, if not impossible, to answer the question "is system A more usable than system B?", because the measures of effectiveness and efficiency may be very different. However, it can be argued that, given a sufficiently high-level definition of subjective assessments of usability, comparisons can be made between systems. The final SUS score can be computed with the following formula[2], where r_i is the response (1-5) to item i:

\text{SUS} = 2.5 \left( 20 + \sum_{i \in \{1,3,5,7,9\}} r_i - \sum_{i \in \{2,4,6,8,10\}} r_i \right)

SUS has generally been seen as providing this type of high-level subjective view of usability and is thus often used in carrying out comparisons of usability between systems. Because it yields a single score on a scale of 0–100, it can be used to compare even systems that are outwardly dissimilar. This one-dimensional aspect of the SUS is both a benefit and a drawback because the questionnaire is necessarily quite general.

Recently, Lewis and Sauro[3] suggested a two-factor orthogonal structure, which practitioners may use to score the SUS on independent Usability and Learnability dimensions. An independent analysis by Borsci, Federici and Lauriola[4] confirmed the two-factor structure of the SUS, while also showing that those factors (Usability and Learnability) are correlated.

The SUS has been widely used in the evaluation of a range of systems. Bangor, Kortum and Miller[5] have used the scale extensively over a ten-year period and have produced normative data that allow SUS ratings to be positioned relative to other systems. They propose an extension to SUS to provide an adjective rating that correlates with a given score. Based on a review of hundreds of usability studies, Sauro and Lewis[6] proposed a curved grading scale for mean SUS scores.

from Grokipedia
The System Usability Scale (SUS) is a standardized, ten-item questionnaire developed to provide a quick and reliable measure of subjective perceptions of usability for various systems, products, or interfaces, yielding a single score from 0 to 100 that reflects overall user satisfaction and effectiveness. It focuses on aspects such as ease of use, learnability, and integration of functions, serving as a non-diagnostic tool for benchmarking and comparing usability across designs or iterations.

SUS was created in the mid-1980s by John Brooke while working at Digital Equipment Corporation (DEC) in the UK, specifically to evaluate the usability of the ALL-IN-1 office system within the company's Integrated Office Systems Group. Brooke released the scale into the public domain in 1986 without copyright restrictions, allowing free use and adaptation, which contributed to its widespread adoption. By 2013, it had been cited in over 1,200 publications and applied across diverse domains, including software, websites, mobile apps, and hardware, demonstrating its versatility and enduring relevance. The scale aligns with international standards like ISO 9241-11, emphasizing user satisfaction as a key component of usability alongside effectiveness and efficiency.

The SUS questionnaire alternates between positively and negatively worded statements to reduce response bias, with respondents rating their agreement on a five-point Likert scale from 1 ("Strongly Disagree") to 5 ("Strongly Agree"). The ten items are:
  1. I think that I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I think that I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.
Scoring involves adjusting responses—subtracting 1 from positive items (1, 3, 5, 7, 9) and subtracting the response from 5 for negative items (2, 4, 6, 8, 10)—summing the adjusted values, and multiplying by 2.5 to obtain the final 0-100 score, which is not a percentage but a composite index. SUS scores are interpreted relative to normative data, with an average of 68 representing the 50th percentile across thousands of evaluations; scores above 68 indicate above-average usability, while those below 50 suggest poor performance, and scores above 80 denote excellent usability.

Extensive research confirms its high reliability, with Cronbach's alpha typically exceeding 0.90 and stable results from as few as 8-12 participants, as well as strong validity through correlations with other measures like the Software Usability Measurement Inventory (r = 0.79). Recent studies, including those on mobile apps and educational technologies up to 2024, continue to validate its applicability and sensitivity across modern contexts, though limitations include potential cultural biases in wording and a lack of diagnostic detail for specific issues.

Overview

Definition

The System Usability Scale (SUS) is a standardized, 10-item questionnaire employing a Likert scale to assess users' subjective perceptions of a system's usability. It is designed as a unidimensional tool, capturing an overall impression of ease of use across diverse systems, interfaces, or products without reliance on specific technologies. SUS generates a single composite score ranging from 0 to 100, with higher values reflecting stronger perceived usability; notably, this score represents a normalized measure rather than a literal percentage. The questionnaire's brevity and simplicity enable quick administration, making it suitable for benchmarking in various contexts.

Purpose and Benefits

The System Usability Scale (SUS) was developed to provide a quick, reliable, and low-cost method for assessing subjective perceptions of system usability, enabling benchmarking and comparisons across different systems or design iterations over time. This approach allows evaluators to capture a broad snapshot of user satisfaction without the need for extensive qualitative analysis or expert intervention, making it particularly valuable in resource-constrained environments.

Key benefits of SUS include its high reliability even with small sample sizes, typically as few as 8-12 participants, which reduces the time and expense associated with large-scale user testing. Administration is straightforward and brief, often taking just 1-2 minutes to complete the 10-item questionnaire after a user session, facilitating integration into various workflows. Furthermore, SUS demonstrates broad applicability across diverse domains, including software interfaces, websites, hardware devices, and mobile applications, without requiring domain-specific adaptations.

SUS scores exhibit strong correlations with established usability metrics, such as the Software Usability Measurement Inventory (SUMI, r = 0.79) and the Website Analysis and Measurement Inventory (WAMMI, r = 0.948), positioning it as a standardized tool for global usability assessments that does not rely on expert observers. This alignment enhances its utility in providing consistent, quantifiable insights that inform design improvements efficiently.

History and Development

Origins

The System Usability Scale (SUS) was developed in 1986 by John Brooke, a usability engineer at Digital Equipment Corporation (DEC) in Reading, England, as part of the company's internal program for evaluating electronic office systems. This effort targeted products like the ALL-IN-1 integrated office system, where subjective user feedback was needed alongside objective task performance data. Brooke's primary motivation was to create a simple, low-cost attitude scale that could supplement task-based measures by capturing users' overall perceptions of system ease and effectiveness, without demanding specialist expertise from evaluators or participants. Drawing inspiration from established psychometric approaches, such as Likert scaling, he simplified the design to make it practical for non-experts in industrial settings, where evaluation sessions were often limited to 25-30 minutes.

To develop it, Brooke started with an initial pool of 50 potential statements, refining them through informal testing with about 20 colleagues in DEC's office systems group—ranging from secretaries to programmers—to select items that produced consistent, strong responses. Early pilots of the SUS were conducted by Brooke on DEC products during human factors lab sessions with UK customers, using video equipment to observe interactions, and later extended to portable evaluations at customer sites in the US and Europe. These initial applications revealed the tool's value in generating quick, actionable feedback on interface usability to support product iterations in DEC's phase review process, despite the absence of formal validation at the time.

Publication and Early Adoption

The System Usability Scale (SUS) received its first formal publication in 1996, when John Brooke contributed a chapter titled "SUS: A 'Quick and Dirty' Usability Scale" to the edited volume Usability Evaluation in Industry, published by Taylor & Francis. This non-peer-reviewed chapter documented the scale, which Brooke had originally developed a decade earlier at DEC and shared informally among colleagues since 1986 to facilitate quick usability comparisons across systems. The publication marked a pivotal step in making SUS accessible beyond internal corporate use, emphasizing its role as a simple, low-cost tool for subjective assessments.

Following its release, SUS began gaining traction within the human-computer interaction (HCI) and usability communities during the late 1990s, primarily through discussions at professional conferences such as those organized by the Usability Professionals' Association (now UXPA) and citations in emerging HCI literature. By the early 2000s, the scale had been adopted in a growing number of empirical studies, with researchers applying it to evaluate diverse interfaces and validating its psychometric properties; for instance, James R. Lewis conducted early reliability analyses, reporting a coefficient alpha of 0.85 across 77 cases, underscoring its internal consistency. This period saw SUS referenced in over 100 research works, reflecting its appeal for rapid post-task evaluations in both academic and industrial settings.

The inclusion of SUS in Usability Evaluation in Industry—a key resource for practitioners—further propelled its dissemination, positioning it as a standard instrument for benchmarking perceived usability without requiring extensive training or resources. Subsequent citations in HCI journals and conference proceedings during the early 2000s solidified its reputation, with studies demonstrating its versatility across product development phases and contributing to its widespread circulation among usability engineers.

Questionnaire Structure

Items and Response Format

The System Usability Scale (SUS) questionnaire comprises ten specific statements that probe users' subjective experiences with a system, focusing on aspects such as ease of use, learnability, and overall satisfaction. These items were originally formulated by John Brooke to provide a simple yet effective tool for gauging perceived usability across diverse systems. The ten items are worded as follows:
  1. I think that I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I think that I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.
Respondents evaluate each statement using a five-point Likert scale, where 1 indicates "Strongly Disagree" and 5 indicates "Strongly Agree." The items alternate between positively worded statements (odd-numbered: 1, 3, 5, 7, 9) and negatively worded statements (even-numbered: 2, 4, 6, 8, 10), a deliberate balanced keying that mitigates acquiescence bias by compelling users to carefully consider each item rather than providing rote answers. Negative items are reverse-coded during subsequent scoring to align with the positive direction of the scale.

Design Rationale

The System Usability Scale (SUS) was designed with 10 items to provide a brief yet reliable assessment of perceived usability, minimizing respondent fatigue while capturing its essential aspects. This structure emerged from an initial pool of 50 statements, from which the 10 were selected based on their strong intercorrelations (ranging from r = ±0.7 to ±0.9) and ability to elicit extreme responses when evaluating highly usable versus unusable systems. The choice of 10 items was influenced by the need for a "quick and dirty" tool suitable for rapid evaluations, drawing inspiration from ISO 9241-11, which defines usability through effectiveness, efficiency, and satisfaction, but simplifying the focus to subjective satisfaction as a proxy for overall usability.

To mitigate common response biases such as acquiescence or extreme responding, the SUS incorporates alternating positively and negatively worded items, requiring participants to engage thoughtfully with each statement rather than defaulting to agreement patterns. Odd-numbered items (1, 3, 5, 7, 9) probe positive attitudes toward the system, while even-numbered items (2, 4, 6, 8, 10) address frustrations or shortcomings, thereby enhancing the scale's sensitivity to nuanced user perceptions.

Although the SUS was intentionally constructed as a unidimensional measure of overall perceived usability, subsequent factor analyses have identified a two-factor structure consisting of usability (items 1, 2, 3, 5, 6, 7, 8, 9) and learnability (items 4, 10). This design provides simplicity for global assessments while maintaining high reliability, as demonstrated by Cronbach's alpha coefficients exceeding 0.90.
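Following Lewis and Sauro's decomposition, the two factors can be scored alongside the overall SUS. The sketch below is illustrative (function and variable names are not from the original sources); the 3.125 and 12.5 multipliers simply rescale the 8-item and 2-item sums to a 0-100 range:

```python
def sus_subscales(responses):
    """Split adjusted SUS contributions into Usability and Learnability.

    `responses` holds ten raw ratings (1-5) for items 1-10, in order.
    Items 4 and 10 form the Learnability factor; the remaining eight
    items form Usability. Each subscale is rescaled to 0-100.
    """
    adjusted = [
        (r - 1) if i % 2 == 0 else (5 - r)  # reverse-code even-numbered items
        for i, r in enumerate(responses)
    ]
    learn = adjusted[3] + adjusted[9]        # items 4 and 10 (0-based indices)
    usable = sum(adjusted) - learn           # the other eight items
    return {
        "overall": sum(adjusted) * 2.5,      # 0-40 sum  -> 0-100
        "usability": usable * 3.125,         # 0-32 sum  -> 0-100
        "learnability": learn * 12.5,        # 0-8 sum   -> 0-100
    }
```

For a respondent who gives the most favorable answer to every item (5 on positive items, 1 on negative items), all three values come out at 100.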

Administration and Scoring

Administration Guidelines

The System Usability Scale (SUS) is typically administered after users have interacted with the system in a structured session, allowing sufficient time to form informed opinions on its usability. Guidelines recommend conducting the assessment post-task or post-session, following at least 20-30 minutes of hands-on use to ensure participants have experienced key features without undue fatigue. Administrators should provide clear and neutral instructions to participants, such as "Please rate your experience using the system just now" or "Answer based on your overall impressions," to minimize bias and promote honest responses. The 10-item questionnaire can be delivered via paper forms in moderated lab settings, online surveys for remote or unmoderated tests, or verbally for populations with accessibility needs, such as those with low literacy or visual impairments. Ensuring anonymity—by not collecting identifying information unless necessary—further encourages candid feedback without fear of repercussions.

Best practices emphasize recruiting representative end-users rather than technical experts or internal stakeholders, as the scale measures subjective perceptions from the target audience's perspective. Aim for a minimum sample size of 5 participants to detect initial trends, though 12 or more is ideal for greater reliability in identifying patterns across responses. Avoid any leading questions, comments, or influences during or immediately before administration that could sway opinions, and verify that all items are completed so that valid scores can be computed.

Scoring Calculation

The System Usability Scale (SUS) score is computed by first adjusting the responses from each of the 10 questionnaire items to a common 0-4 scale, then summing these adjusted contributions and scaling the total to a 0-100 range. For the five odd-numbered items (1, 3, 5, 7, and 9), which are positively worded, the contribution is calculated as the user's response minus 1; responses are on a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree), so this adjustment yields values from 0 to 4. For the five even-numbered items (2, 4, 6, 8, and 10), which are negatively worded, the contribution is 5 minus the user's response, again resulting in a 0-4 range after reversal. The adjusted contributions from all 10 items are then summed, producing a total ranging from 0 to 40. This sum is multiplied by 2.5 to obtain the final SUS score on a 0-100 scale. The calculation can be expressed as:

\text{SUS Score} = \left( \sum_{i=1}^{10} \text{adjusted contribution}_i \right) \times 2.5

All items are treated equally, with no weighting applied to individual contributions. For group results, such as in studies involving multiple participants, the SUS score is calculated individually for each respondent before averaging the scores across the group to yield a mean SUS value.
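The procedure above can be sketched as a short function (names here are illustrative, not from the original sources):

```python
def sus_score(responses):
    """Compute the SUS score for one respondent.

    `responses` is a list of ten integers (1-5), ordered as items 1-10.
    Odd-numbered items are positively worded; even-numbered items are
    negatively worded and must be reverse-coded.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses):       # i is 0-based, so item 1 has i == 0
        if i % 2 == 0:
            total += r - 1                  # items 1, 3, 5, 7, 9: response - 1
        else:
            total += 5 - r                  # items 2, 4, 6, 8, 10: 5 - response
    return total * 2.5                      # scale the 0-40 sum to 0-100

def mean_sus(all_responses):
    """Score each respondent first, then average, as described above."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)

# The most favorable response pattern yields the maximum score:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```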

Interpretation

Score Ranges and Benchmarks

The System Usability Scale (SUS) scores, ranging from 0 to 100, are interpreted using adjective ratings derived from empirical mapping to user perceptions, based on data from over 2,300 participants across multiple studies. These ratings provide a qualitative framework for understanding score implications, with boundaries established through percentile rankings and mean associations from large-scale validations. Specifically, scores of 0-25 are rated as "Worst Imaginable," 25-40 as "Awful," 40-60 as "Poor," 60-70 as "OK," 70-80 as "Good," 80-90 as "Excellent," and 90-100 as "Best Imaginable." These categories reflect significant differences in perceived usability, except between "Worst Imaginable" and "Awful," which show overlapping means around 20-32.

SUS score distributions support reliable percentile-based comparisons across studies and products. A global benchmark average is approximately 68, drawn from analyses of over 5,000 responses across nearly 500 studies spanning more than 30 years of data collection. Scores above 80 indicate well-above-average usability, positioning a product in roughly the top 15% of evaluated systems, while scores below 68 fall below the norm. For cross-study comparisons, SUS percentile ranks are recommended over raw scores to account for contextual variations.

Domain-specific norms adjust these benchmarks to reflect category expectations; for example, software interfaces average around 70, while websites typically score about 65, based on aggregated data from over 1,000 web evaluations and various graphical user interfaces. These norms, derived from diverse product types including cell phones, customer premise equipment, and other systems, underscore the importance of contextual benchmarks for accurate interpretation.
Adjective Rating      SUS Score Range   Example Mean from Validation
Best Imaginable       90-100            90.9
Excellent             80-90             85.4
Good                  70-80             71.3
OK                    60-70             50.9
Poor                  40-60             46.2
Awful                 25-40             31.7
Worst Imaginable      0-25              21.5
This table summarizes the adjective ratings with ranges and validation means from a study correlating SUS scores with a 7-point adjective scale (r=0.822).
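The band lookup in the table can be expressed as a small helper (illustrative code; the convention of assigning boundary scores to the higher band is a choice made here, not mandated by the source):

```python
def adjective_rating(score):
    """Map a 0-100 SUS score to the adjective bands in the table above."""
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    bands = [
        (90, "Best Imaginable"),
        (80, "Excellent"),
        (70, "Good"),
        (60, "OK"),
        (40, "Poor"),
        (25, "Awful"),
        (0,  "Worst Imaginable"),
    ]
    # Walk from the highest band down; return the first lower bound met.
    for lower, label in bands:
        if score >= lower:
            return label

print(adjective_rating(68))  # OK  (the global benchmark average)
```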

Reliability and Validity

The System Usability Scale (SUS) exhibits strong reliability, with internal consistency coefficients (Cronbach's alpha) typically ranging from 0.85 to 0.95 across empirical studies evaluating diverse systems. Test-retest reliability is also high, with intraclass correlation coefficients around 0.90 reported over short intervals in validation research. Regarding validity, the SUS shows concurrent validity through moderate to strong correlations with objective metrics, including task completion times and error rates, as well as subjective scales like the NASA Task Load Index (correlations approximately r = 0.60). Construct validity is supported by factor analyses confirming the scale's unidimensionality, with a single primary factor accounting for the majority of variance in perceived usability. The psychometric properties of the SUS have been validated in over 1,200 peer-reviewed publications since its publication, spanning various domains and user populations. Cross-cultural adaptations further affirm its robustness, with validated translations into languages such as Chinese, French, German, and Spanish demonstrating equivalent reliability and validity through systematic back-translation and testing processes.
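The Cronbach's alpha figures cited here follow the standard formula, which can be computed from a respondents-by-items matrix of adjusted scores; a minimal sketch (names are illustrative):

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix of adjusted item scores.

    `item_scores[r][i]` is respondent r's adjusted (0-4) contribution on
    item i. alpha = k/(k-1) * (1 - sum(item variances) / variance of totals),
    using sample variance (n-1 denominator).
    """
    k = len(item_scores[0])   # number of items
    n = len(item_scores)      # number of respondents (must be >= 2)

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[i] for row in item_scores]) for i in range(k)]
    total_var = var([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data where every item moves in lockstep gives perfect consistency:
alpha = cronbach_alpha([[0] * 10, [4] * 10, [2] * 10])  # alpha == 1.0
```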

Applications

Common Use Cases

The System Usability Scale (SUS) is frequently employed in product development to conduct iterative evaluations of prototypes, allowing teams to refine designs based on user feedback throughout the development cycle. For instance, SUS facilitates A/B comparisons by quantifying perceived usability differences between design variants, enabling data-driven decisions to enhance interface efficiency and user satisfaction. Following product launches, SUS serves as a tool for gathering post-launch feedback on applications, particularly in sectors requiring high user trust, such as financial apps. Developers use SUS surveys to assess ongoing user perceptions of app usability and reliability, informing updates that address real-world pain points like transaction ease or security interface clarity.

In academic research, SUS is a staple for evaluating interface designs in controlled studies, including assessments of e-learning platforms where it measures learner interaction with virtual environments. Similarly, researchers apply SUS to medical devices, such as home health monitoring tools, to gauge end-user comprehension and operational intuitiveness in clinical settings. Recent applications as of 2024-2025 include evaluations of voice user interfaces and online transportation apps for elderly users.

Notable applications include NASA's use of SUS for hardware usability evaluations, such as intranet search systems and prototype interfaces in aerospace contexts, to ensure operational effectiveness under demanding conditions. Some organizations integrate SUS as a key performance indicator (KPI) to benchmark product usability across releases, aligning development with user-centered metrics. Additionally, SUS supports ISO-compliant evaluations, particularly in medical device development, where it verifies ergonomic compliance through standardized user assessments.
SUS also enables tracking usability improvements over successive product versions, providing a consistent metric to demonstrate progress in longitudinal studies or release cycles. By comparing scores across iterations, organizations can quantify enhancements in perceived ease of use, such as refined workflows in software updates.

Comparisons with Other Measures

The System Usability Scale (SUS) is often compared to other subjective and objective assessment tools, each serving distinct purposes in evaluating usability.

Single Ease Question (SEQ): Unlike the SUS, which employs a multi-item questionnaire to gauge overall system usability and learnability across an entire experience, the SEQ is a single-item, 7-point measure focused on perceived ease for specific tasks. This task-specific nature makes the SEQ quicker to administer post-task but less comprehensive for holistic evaluations, as it captures only immediate ease perceptions rather than broader attitudes. Studies have shown the SEQ correlates moderately with task performance metrics like completion rates (r ≈ 0.51) and errors (r ≈ -0.44), yet it provides narrower insights compared to the SUS's established benchmarks across thousands of studies.

NASA Task Load Index (NASA-TLX): The SUS assesses overall perceived usability, emphasizing effectiveness, efficiency, and satisfaction, whereas the NASA-TLX evaluates multidimensional workload, including mental demand, physical effort, temporal pressure, performance, and frustration on a 21-point scale across six subscales. This distinction positions the SUS as a streamlined tool for general UX benchmarking (typically 1-2 minutes to complete with 10 items), while the NASA-TLX requires more time due to its pairwise weighting of subscales, offering deeper diagnostic detail into cognitive and emotional burdens but lacking the SUS's extensive norms. In high-stakes domains like healthcare or aviation, the NASA-TLX complements the SUS by revealing workload facets that usability scores alone may overlook, though both tools benefit from sample sizes of 20-30 users for reliable results.

Usability Metric for User Experience (UMUX): The SUS stands out for its long-standing establishment since 1986, with over 10,000 benchmark scores enabling robust comparisons, in contrast to the UMUX, a shorter 4-item scale introduced in 2010 that was designed for conformance with the ISO 9241-11 definition of usability.
While the UMUX reduces respondent burden and shows high correlations with SUS scores (e.g., r = 0.90), it has fewer norms and less extensive validation, making the SUS preferable for contexts requiring proven reliability. The UMUX's newer status limits its adoption in large-scale benchmarking, though its brevity supports quick assessments where the SUS's 10 items might feel redundant.

Limitations and Criticisms

Key Limitations

The System Usability Scale (SUS) relies on self-reported perceptions from users, introducing subjectivity that may not fully align with objective metrics such as task completion rates or error frequencies. This subjective nature stems from its design as a questionnaire capturing overall impressions rather than behavioral data, potentially leading to discrepancies where users rate a system highly despite poor performance outcomes. For instance, in empirical evaluations, SUS scores have shown moderate correlations with success rates but failed to capture nuances in objective efficiency or effectiveness. As of 2024, additional criticisms include the potentially antiquated wording of SUS items and risks of miscoding during analysis, with studies finding errors in 11% of datasets and 13% of questionnaires.

A notable limitation is the presence of ceiling effects, where SUS scores frequently cluster at high levels (often 80 or above) for familiar or well-designed systems, thereby reducing the scale's sensitivity to detect incremental improvements. This range restriction occurs because the SUS tends to produce elevated ratings for familiar or adequately usable systems, limiting its discriminatory power in comparative assessments of refined products. Studies analyzing hundreds of SUS administrations (e.g., Bangor et al., 2008; Sauro, 2011) have shown that scores rarely fall below 40 and commonly exceed 70 for many standard interfaces, though low scores are possible in poorly designed systems. Continued validations as of 2025 confirm its reliability in contexts like gamified e-learning and parental mobile apps.

The original English-language formulation of the SUS can introduce cultural biases when applied in non-Western contexts without proper localization, as response patterns on Likert scales may vary with cultural norms influencing agreement tendencies. Studies on cross-cultural translations, such as to Malay, highlight the need for linguistic and cultural adjustments to mitigate these biases, which otherwise lead to inconsistent validity and reliability in diverse populations. Additionally, the SUS provides only a global score without item-level analysis, limiting its diagnostic power to identify specific interface problems or areas for targeted redesign. This aggregate approach, while efficient, precludes granular insights into individual questionnaire items, making it unsuitable for detailed diagnostics.

Alternatives and Improvements

Several alternatives to the System Usability Scale (SUS) exist for assessing usability, each tailored to specific aspects of user experience or satisfaction. The Net Promoter Score (NPS) focuses on user loyalty and likelihood to recommend a product, using a single question on a 0-10 scale; it correlates with SUS scores but emphasizes business-oriented outcomes over detailed usability perceptions. In contrast, the Questionnaire for User Interaction Satisfaction (QUIS) provides multi-dimensional analysis across categories such as screen layout, terminology, and overall satisfaction, enabling more granular evaluation of human-computer interfaces than SUS's unidimensional score.

Improvements to SUS have extended its applicability by incorporating additional elements for richer insights. One such enhancement involves adding an adjective rating scale—a semantic approach with anchors like "worst imaginable" to "best imaginable"—to interpret individual SUS scores more intuitively, aiding communication of results to stakeholders. Hybrid metrics combining SUS with behavioral data, such as eye tracking, offer objective validation of subjective scores; for instance, in app evaluations, eye movement metrics like fixation duration complement SUS to quantify information access and operational efficiency, improving post-intervention interaction by approximately 24%. Recent adaptations like UMUX-LITE streamline SUS for mobile contexts, using just two items to measure perceived usefulness and ease of use with high reliability (alpha = 0.86) and strong correlation with SUS (r = 0.83), making it suitable for agile, resource-constrained assessments.

Post-2020 research has introduced AI-assisted approaches to automate UX evaluation, including machine learning and large language models for user feedback analysis, which can complement metrics like SUS in large-scale assessments. Additionally, culturally localized versions address potential biases in the original English SUS by adapting wording and scales; examples include validations for Indonesian, Brazilian Portuguese, European Portuguese, Spanish (for gamified e-learning), and Swedish contexts, ensuring reliability across diverse populations through validation testing.
