Item response theory
In psychometrics, item response theory (IRT, also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is a theory of testing based on the relationship between individuals' performances on a test item and the test takers' levels of performance on an overall measure of the ability that item was designed to measure. Several different statistical models are used to represent both item and test taker characteristics. Unlike simpler alternatives for creating scales and evaluating questionnaire responses, it does not assume that each item is equally difficult. This distinguishes IRT from, for instance, Likert scaling, in which "All items are assumed to be replications of each other or in other words items are considered to be parallel instruments". By contrast, item response theory treats the difficulty of each item (the item characteristic curves, or ICCs) as information to be incorporated in scaling items.
It is based on the application of related mathematical models to testing data. Because it is often regarded as superior to classical test theory, it is the preferred method for developing scales in the United States,[citation needed] especially when optimal decisions are demanded, as in so-called high-stakes tests, e.g., the Graduate Record Examination (GRE) and Graduate Management Admission Test (GMAT).
The name item response theory is due to the focus of the theory on the item, as opposed to the test-level focus of classical test theory. Thus IRT models the response of each examinee of a given ability to each item in the test. The term item is generic, covering all kinds of informative items. They might be multiple choice questions that have incorrect and correct responses, but are also commonly statements on questionnaires that allow respondents to indicate level of agreement (a rating or Likert scale), or patient symptoms scored as present/absent, or diagnostic information in complex systems.
IRT is based on the idea that the probability of a correct/keyed response to an item is a mathematical function of person and item parameters. (The expression "a mathematical function of person and item parameters" is analogous to Lewin's equation, B = f(P, E), which asserts that behavior is a function of the person in their environment.) The person parameter is usually construed as a single latent trait or dimension. Examples include general intelligence or the strength of an attitude. Parameters on which items are characterized include their difficulty (known as "location" for their location on the difficulty range); discrimination (slope or correlation), representing how steeply the rate of success of individuals varies with their ability; and a pseudoguessing parameter, characterising the (lower) asymptote at which even the least able persons will score due to guessing (for instance, 25% by pure chance on a multiple choice item with four possible responses).
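As an illustration of how the three item parameters combine, the sketch below uses the three-parameter logistic (3PL) form, one common choice of item response function. The text above does not fix a functional form, so the logistic shape, the function name `irf_3pl`, and the symbols `theta`, `a`, `b`, and `c` are assumptions made for this example only.

```python
import math


def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct/keyed response under an assumed 3PL model.

    theta: person parameter (latent trait level)
    a:     discrimination (slope of the curve at its steepest point)
    b:     difficulty, i.e. the item's location on the trait scale
    c:     pseudoguessing parameter (lower asymptote)
    """
    # Logistic curve in theta, rescaled so that it rises from c to 1.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # A four-option multiple choice item with a chance floor of 25%.
    for theta in (-3.0, 0.0, 3.0):
        print(theta, round(irf_3pl(theta, a=1.5, b=0.0, c=0.25), 3))
```

With `c = 0.25` the curve never falls below the 25% floor described above, and a person whose trait level equals the item's difficulty succeeds with probability halfway between that floor and 1. Setting `c = 0` and `a = 1` reduces the function to a one-parameter logistic curve governed by difficulty alone.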
In the same manner, IRT can be used to measure human behavior in online social networks. The views expressed by different people can be aggregated to be studied using IRT. Its use in classifying information as misinformation or true information has also been evaluated.
The concept of the item response function was around before 1950. The pioneering work of IRT as a theory occurred during the 1950s and 1960s. Three of the pioneers were the Educational Testing Service psychometrician Frederic M. Lord, the Danish mathematician Georg Rasch, and the Austrian sociologist Paul Lazarsfeld, who pursued parallel research independently. Key figures who furthered the progress of IRT include Benjamin Drake Wright and David Andrich. IRT did not become widely used until the late 1970s and 1980s, when practitioners were told of the "usefulness" and "advantages" of IRT on the one hand, and personal computers gave many researchers access to the computing power necessary for IRT on the other. Beginning in the 1990s, Margaret Wu developed two item response software programs that analyse PISA and TIMSS data: ACER ConQuest (1998) and the R package TAM (2010).
Among other things, the purpose of IRT is to provide a framework for evaluating how well assessments work, and how well individual items on assessments work. The most common application of IRT is in education, where psychometricians use it for developing and designing exams, maintaining banks of items for exams, and equating the difficulties of items for successive versions of exams (for example, to allow comparisons between results over time).
IRT models are often referred to as latent trait models. The term latent is used to emphasize that discrete item responses are taken to be observable manifestations of hypothesized traits, constructs, or attributes, not directly observed, but which must be inferred from the manifest responses. Latent trait models were developed in the field of sociology, but are virtually identical to IRT models.
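To make "inferred from the manifest responses" concrete, the following minimal sketch estimates one person's latent trait level from a vector of scored responses by maximizing the likelihood over a grid of candidate values. It assumes a logistic item response function and item parameters that are already known. The function names, the grid bounds of -4 to 4, and the example items are hypothetical, and operational IRT software uses more refined estimators than a grid search.

```python
import math


def p_correct(theta, a, b, c):
    """Assumed logistic item response function (same form as a 3PL model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of 0/1 responses, treating items as independent given theta."""
    total = 0.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of the latent trait.

    All-correct or all-incorrect response patterns have no finite maximum,
    so for them this returns a boundary of the (hypothetical) grid.
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


if __name__ == "__main__":
    # Three hypothetical items as (a, b, c): easy, medium, hard, no guessing.
    items = [(1.0, -1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    print(estimate_theta(items, [1, 1, 0]))  # two of three items correct
    print(estimate_theta(items, [1, 0, 0]))  # one of three items correct
```

The trait itself is never observed. Only the pattern of correct and incorrect responses enters the calculation, and the estimate is the trait level under which that pattern is most probable.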
