Repeatability
Repeatability or test–retest reliability[1] is the closeness of the agreement between the results of successive measurements of the same measure, when carried out under the same conditions of measurement.[2] In other words, the measurements are taken by a single person or instrument on the same item, under the same conditions, and in a short period of time. A less-than-perfect test–retest reliability causes test–retest variability. Such variability can be caused by, for example, intra-individual variability and inter-observer variability. A measurement may be said to be repeatable when this variation is smaller than a predetermined acceptance criterion.
Test–retest variability has practical uses, for example in the medical monitoring of conditions. In such settings there is often a predetermined "critical difference": when a change in a monitored value is smaller than this critical difference, test–retest variability alone may account for it, rather than a change in the underlying disease or its treatment.[3]
Conditions
The following conditions need to be fulfilled in the establishment of repeatability:[2][4]
- the same experimental tools
- the same observer
- the same measuring instrument, used under the same conditions
- the same location
- repetition over a short period of time
- the same objectives.
Repeatability methods were developed by Bland and Altman (1986).[5]
If the correlation between separate administrations of the test is high (e.g. 0.7 or higher, the rule of thumb commonly applied to Cronbach's alpha as a measure of internal consistency[6]), then the test has good test–retest reliability.
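As a brief illustration of this rule of thumb, the following Python sketch computes the Pearson correlation between two administrations of the same test; the scores are invented for the example.

```python
from statistics import correlation  # Python 3.10+

# Hypothetical scores for five people on a test and its retest
# (illustrative values only).
test = [12, 15, 11, 18, 14]
retest = [13, 14, 12, 17, 15]

r = correlation(test, retest)  # Pearson's r
print(f"test-retest correlation: r = {r:.2f}")
# By the rule of thumb above, r >= 0.7 would indicate good
# test-retest reliability.
```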
The repeatability coefficient is a precision measure which represents the value below which the absolute difference between two repeated test results may be expected to lie with a probability of 95%.[citation needed]
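A minimal sketch of one common way to compute such a coefficient, following the Bland–Altman approach of taking roughly 2.77 times the within-subject standard deviation estimated from duplicate measurements; the paired values here are hypothetical.

```python
import math

# Hypothetical duplicate measurements on five items
# (illustrative values only).
pairs = [(5.1, 5.3), (4.8, 4.7), (5.0, 5.2), (4.9, 4.9), (5.2, 5.0)]

# Within-subject variance from duplicates: s_w^2 = sum(d_i^2) / (2n),
# where d_i is the difference within each pair.
n = len(pairs)
s_w = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n))

# The absolute difference between two repeated results is expected
# to lie below this value about 95% of the time.
rc = 1.96 * math.sqrt(2) * s_w  # approximately 2.77 * s_w
print(f"s_w = {s_w:.3f}, repeatability coefficient = {rc:.3f}")
```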
The standard deviation under repeatability conditions is part of precision and accuracy.[citation needed]
Attribute agreement analysis for defect databases
An attribute agreement analysis is designed to simultaneously evaluate the impact of repeatability and reproducibility on accuracy. It allows the analyst to examine the responses from multiple reviewers as they look at several scenarios multiple times. It produces statistics that evaluate the ability of the appraisers to agree with themselves (repeatability), with each other (reproducibility), and with a known master or correct value (overall accuracy) for each characteristic – over and over again.[7]
Psychological testing
Because the same test is administered twice and every test is parallel with itself, differences between scores on the test and scores on the retest should be due solely to measurement error. This sort of argument is quite probably true for many physical measurements. However, this argument is often inappropriate for psychological measurement, because it is often impossible to consider the second administration of a test a parallel measure to the first.[8]
The second administration of a psychological test might yield systematically different scores than the first administration due to the following reasons:[8]
- The attribute that is being measured may change between the first test and the retest. For example, a reading test that is administered in September to a third grade class may yield different results when retaken in June. One would expect some change in children's reading ability over that span of time, so a low test–retest correlation might reflect real changes in the attribute itself.
- The experience of taking the test itself can change a person's true score. For example, completing an anxiety inventory could serve to increase a person's level of anxiety.
- Carryover effects may occur, particularly if the interval between test and retest is short. When retested, people may remember their original answers, which can affect their answers on the second administration.
References
[edit]- ^ Types of Reliability Archived 2018-06-06 at the Wayback Machine The Research Methods Knowledge Base. Last Revised: 20 October 2006
- ^ a b JCGM 100:2008. Evaluation of measurement data – Guide to the expression of uncertainty in measurement (PDF), Joint Committee for Guides in Metrology, 2008, archived (PDF) from the original on 2009-10-01, retrieved 2018-04-11
- ^ Fraser, C. G.; Fogarty, Y. (1989). "Interpreting laboratory results". BMJ (Clinical Research Ed.). 298 (6689): 1659–1660. doi:10.1136/bmj.298.6689.1659. PMC 1836738. PMID 2503170.
- ^ Taylor, Barry N.; Kuyatt, Chris E. (1994), Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, Gaithersburg, MD, USA: National Institute of Standards and Technology, archived from the original on 2019-09-30, retrieved 2018-04-11
- ^ "Statistical methods for assessing agreement between two methods of clinical measurement". Archived from the original on 2018-07-06. Retrieved 2010-09-30.
- ^ George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference. 11.0 update (4th ed.). Boston: Allyn & Bacon.
- ^ "Attribute Agreement Analysis for Defect Databases | iSixSigma". 26 February 2010. Archived from the original on 22 March 2016. Retrieved 7 February 2013.
- ^ a b Murphy, Kevin R.; Davidshofer, Charles O. (2005). Psychological Testing: Principles and Applications (6th ed.). Upper Saddle River, N.J.: Pearson/Prentice Hall. ISBN 978-0-13-189172-2.
Repeatability

Fundamental Concepts
Definition
Repeatability is the degree to which the results of a measurement, experiment, or process remain consistent when repeated multiple times under the same conditions, including the use of the same operator, equipment, location, and methodology over a short period. In metrology, it is specifically defined as measurement precision under a set of repeatability conditions, where these conditions encompass the same measurement procedure, operator, measuring system, operating conditions, and replicate measurements on the same or similar objects. This concept emphasizes the closeness of agreement between independent test results obtained under stipulated conditions of measurement, with all potential sources of variation (such as environmental factors, time intervals, and procedural details) held constant to isolate inherent process variability. The key attributes include minimizing extraneous influences to ensure that any observed differences arise solely from random measurement errors rather than systematic changes.[1]

The notion of repeatability emerged in the 19th century as part of the broader standardization of scientific methods in metrology, driven by efforts to establish uniform measurement systems amid industrial expansion, culminating in the 1875 Metre Convention that founded the International Bureau of Weights and Measures (BIPM). It was further formalized in the 20th century by the International Organization for Standardization (ISO), notably in ISO 5725-1, first published in 1994 and revised in 2023, which provides general principles and definitions for accuracy, trueness, and precision in measurement methods.

A basic example is repeating a chemical reaction in a controlled laboratory environment, where successive trials using identical reagents, temperature, and stirring method yield the same pH values, illustrating the process's repeatability.[1] In contrast to reproducibility, which evaluates consistency under varied conditions like different operators or locations, repeatability strictly maintains identical setups to assess intrinsic reliability.

Distinction from Related Terms
Repeatability is distinguished from related concepts in measurement and scientific practice by its focus on consistency under identical conditions, whereas reproducibility involves achieving similar outcomes when conditions are varied, such as by different operators or equipment. Replicability, in contrast, refers to the ability of independent researchers to obtain comparable results through new experiments or data, often emphasizing verification beyond the original study.[6] Reliability encompasses a broader assessment of a method's overall stability and consistency across repeated uses, time periods, and varying conditions, serving as an umbrella term that includes aspects of both repeatability and reproducibility.[7]

The International Organization for Standardization (ISO) provides precise definitions in ISO 5725, where repeatability is defined as the closeness of agreement between successive measurements of the same quantity under the same conditions (known as within-run variation), and reproducibility as the closeness of agreement between measurements under changed conditions, such as different laboratories or time periods (between-run or between-laboratory variation). These distinctions highlight repeatability as a measure of precision in a controlled, unchanging environment, while reproducibility tests robustness against external variables.[8]

A common misconception arises in media and public discourse on scientific integrity, where repeatability is frequently conflated with reproducibility during discussions of crises like the replication crisis in psychology, leading to overstated concerns about basic experimental consistency when the issue often pertains to broader inter-study validation.[9] The following table summarizes these distinctions for clarity:

| Term | Conditions | Scope | Example |
|---|---|---|---|
| Repeatability | Identical (same operator, equipment, short time interval) | Within a single setup or run | Multiple temperature readings from the same laboratory thermometer under unchanged ambient conditions.[2] |
| Reproducibility | Varied (e.g., different labs, operators, or equipment) | Between setups or runs | DNA sequencing results obtained across independent laboratories using similar protocols.[2] |
| Replicability | Independent (new data, methods by other researchers) | External verification | Separate research teams confirming a statistical effect with fresh participant samples.[6] |
| Reliability | Overall (across time, conditions, and repetitions) | Broad measurement stability | A diagnostic tool providing consistent outcomes for the same patient over multiple sessions.[7] |
Measurement and Assessment
Statistical Methods
Statistical methods for evaluating repeatability focus on quantifying the variation in repeated measurements obtained under identical conditions, enabling researchers and practitioners to assess the precision of measurement processes. These techniques partition sources of variability and provide metrics to determine whether a system meets acceptable standards for reliability. Key approaches include descriptive statistics, variance component analysis, and graphical monitoring tools, often applied in quality control and experimental design.

The standard deviation of repeated measurements is a fundamental metric for repeatability, capturing the typical spread of results from successive trials of the same item or process. It is calculated as the square root of the variance of the dataset, with lower values indicating higher repeatability. Complementing this, the coefficient of variation (CV) normalizes the standard deviation relative to the mean, expressed as

$\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%,$

where $s$ is the standard deviation and $\bar{x}$ is the mean; this percentage-based measure facilitates comparisons across datasets with different units or scales, particularly in analytical and laboratory settings.

Standardized formulas further refine these assessments. According to ISO 5725-2, the repeatability limit is the value below which the absolute difference between two repeated measurements is expected to fall with 95% probability, given by
$r = 2.8 \, s_r,$

where $s_r$ is the within-laboratory (repeatability) standard deviation derived from multiple replicates; this assumes a normal distribution and is used to establish precision limits in interlaboratory studies. In manufacturing contexts, the gauge repeatability and reproducibility (GR&R) percentage evaluates measurement system adequacy as
$\%\mathrm{GR\&R} = \frac{6 \, \sigma_{\mathrm{GRR}}}{\mathrm{tolerance}} \times 100\%,$[10]

where $\sigma_{\mathrm{GRR}}$ is the standard deviation from the Gage R&R study (combining repeatability and reproducibility variation) and the tolerance is the specified process limit; values below 10% indicate an acceptable system, while values of 10–30% suggest marginal performance requiring improvement.

Analysis of variance (ANOVA) is a core technique for dissecting repeatability by partitioning total variation into components attributable to operators, parts, or equipment, using a random effects model to estimate variance contributions. This method, often implemented in crossed or nested designs, tests for significant differences and quantifies repeatability as the residual error variance. Control charts, such as Shewhart charts, monitor repeatability over time by plotting measurement means or ranges against control limits (typically $\bar{x} \pm 3\sigma$), signaling deviations when points exceed bounds or exhibit non-random patterns, thus aiding ongoing process stability assessment.

Software tools like R and Minitab facilitate these computations through built-in functions for variance analysis and metric calculation. For instance, with R's irr package, repeatability indices can be derived from a simple dataset of repeated weight measurements: ten trials yielding values 100.2, 99.8, 100.1, 100.0, 99.9, 100.3, 100.1, 99.7, 100.0, 100.2 g have a mean of 100.03 g and a sample standard deviation of about 0.19 g, giving a CV of approximately 0.19% and indicating high repeatability for a precision balance.
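As a check on this worked example, the following minimal Python sketch (standard library only, in place of the R workflow described above) reproduces the mean, standard deviation, CV, and repeatability limit; the tolerance used in the %GR&R line is a hypothetical value chosen for illustration.

```python
import statistics

# Ten repeated weight measurements (grams) from the example above.
weights = [100.2, 99.8, 100.1, 100.0, 99.9, 100.3,
           100.1, 99.7, 100.0, 100.2]

mean = statistics.fmean(weights)
s = statistics.stdev(weights)   # sample standard deviation
cv = s / mean * 100             # coefficient of variation, in percent
r = 2.8 * s                     # ISO 5725 repeatability limit

print(f"mean = {mean:.2f} g, s = {s:.2f} g, CV = {cv:.2f} %")
print(f"repeatability limit r = {r:.2f} g")

# Illustrative %GR&R against a hypothetical tolerance width of 4 g,
# treating s as sigma_GRR (i.e., assuming reproducibility variation
# is negligible for this single-operator, single-instrument sketch).
tolerance = 4.0
grr_pct = 6 * s / tolerance * 100
print(f"%GR&R = {grr_pct:.1f} %  (<10% acceptable, 10-30% marginal)")
```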
Attribute Agreement Analysis
Attribute Agreement Analysis (AAA) is a statistical method within Measurement System Analysis (MSA) designed to evaluate the consistency and accuracy of subjective classifications in categorical data, such as assigning defect types by multiple appraisers.[11] It focuses on repeatability by quantifying agreement beyond chance, helping identify sources of variation in human judgment during inspections.[12]

A key component of AAA is Cohen's kappa statistic, which measures inter-rater agreement for categorical assignments while adjusting for expected agreement by chance:

$\kappa = \frac{p_o - p_e}{1 - p_e},$

where $p_o$ is the observed proportion of agreement and $p_e$ is the expected proportion under chance.[13] AAA typically includes two main appraisal types: appraiser-versus-standard, which assesses accuracy against a reference classification, and appraiser-versus-appraiser, which evaluates reproducibility among raters.[14]

In defect databases, AAA is applied in manufacturing quality control, including Six Sigma processes, to measure repeatability in categorizing defects from visual inspections of parts, ensuring reliable data entry for root cause analysis.[15] For instance, in a binary defect classification (defective/non-defective) across 100 samples rated by three appraisers, percent agreement is calculated as the ratio of matching classifications to total ratings, while kappa values assess chance-adjusted consistency; interpretations often deem percent agreement above 80% and kappa above 0.75 as indicating acceptable repeatability.[16]

AAA was developed in the 1990s primarily for the automotive and electronics industries to standardize gage studies for attribute data, and it was formally integrated into the Automotive Industry Action Group's (AIAG) Measurement Systems Analysis Reference Manual, third edition, published in 2002.[17]
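A minimal Python sketch of the percent-agreement and kappa calculations for two appraisers on a binary defect classification; the ratings are invented for illustration, and the 80%/0.75 thresholds are the rules of thumb cited above.

```python
# Hypothetical defect calls (1 = defective, 0 = not) by two appraisers
# on the same ten parts.
rater_a = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
rater_b = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

n = len(rater_a)

# Observed proportion of agreement, p_o.
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected chance agreement, p_e, from each rater's marginal rates.
p_a1 = sum(rater_a) / n
p_b1 = sum(rater_b) / n
p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)

kappa = (p_o - p_e) / (1 - p_e)  # Cohen's kappa
print(f"percent agreement = {p_o:.0%}, kappa = {kappa:.2f}")
```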
