Exam
from Wikipedia

Cambodian students taking an exam in order to apply for the Don Bosco Technical School of Sihanoukville in 2008
American students in a computer fundamentals class taking an online test in 2001

An examination (exam or evaluation) or test is an educational assessment intended to measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics (e.g., beliefs).[1] A test may be administered verbally, on paper, on a computer, or in a predetermined area that requires a test taker to demonstrate or perform a set of skills.

Tests vary in style, rigor and requirements. There is no general consensus or invariable standard for test formats and difficulty. Often, the format and difficulty of the test is dependent upon the educational philosophy of the instructor, subject matter, class size, policy of the educational institution, and requirements of accreditation or governing bodies.

A test may be administered formally or informally. An example of an informal test is a reading test administered by a parent to a child. A formal test might be a final examination administered by a teacher in a classroom or an IQ test administered by a psychologist in a clinic. Formal testing often results in a grade or a test score.[2] A test score may be interpreted with regard to a norm or criterion, or occasionally both. The norm may be established independently, or by statistical analysis of a large number of participants.

A test may be developed and administered by an instructor, a clinician, a governing body, or a test provider. In some instances, the developer of the test may not be directly responsible for its administration. For example, in the United States, Educational Testing Service (ETS), a nonprofit educational testing and assessment organization, develops standardized tests such as the SAT but may not directly be involved in the administration or proctoring of these tests.

History

"The Official Career of Xu Xianqing" - on the bottom right the imperial examination examinees sit their exam, 1590, Ming dynasty

Oral and informal examinations


Informal, unofficial, and non-standardized tests and testing systems have existed throughout history. For example, tests of skill such as archery contests have existed in China since the Zhou dynasty (or, more mythologically, since Yao).[3] Oral exams were administered in various parts of the world, including ancient China and Europe. A precursor to the later Chinese imperial examinations had been in place since the Han dynasty, during which the Confucian character of the examinations was established. However, these examinations did not offer an official avenue to government appointment; the majority of appointments were filled through recommendations based on qualities such as social status, morals, and ability.

China


Standardized written examinations were first implemented in China. They were commonly known as the imperial examinations (keju).

The bureaucratic imperial examination as a concept has its origins in the year 605, during the short-lived Sui dynasty. Its successor, the Tang dynasty, implemented imperial examinations on a relatively small scale until the examination system was extensively expanded during the reign of Wu Zetian.[4] The expanded system included a military exam that tested physical ability, but it never had a significant impact on the Chinese officer corps, and military degrees were seen as inferior to their civil counterparts. The exact nature of Wu's influence on the examination system is still a matter of scholarly debate.

During the Song dynasty the emperors expanded both the examinations and the government school system, in part to counter the influence of the hereditary nobility, increasing the number of degree holders to more than five times that of the Tang. From the Song dynasty onward, the examinations played the primary role in selecting scholar-officials, who formed the literati elite of society. However, the examinations co-existed with other forms of recruitment such as direct appointments for the ruling family, nominations, quotas, clerical promotions, sale of official titles, and special procedures for eunuchs. The regular higher-level degree examination cycle was decreed in 1067 to be three years, but this triennial cycle existed only in nominal terms. In practice, both before and after this decree, the examinations were irregularly implemented for significant periods of time; the jinshi exams were not a yearly event, and the calculated statistical averages for the number of degrees conferred annually are an artifact of quantitative analysis that should be understood in this context.[5] The operations of the examination system were part of the imperial record-keeping system, and the date of receiving the jinshi degree is often a key biographical datum: sometimes the date of achieving jinshi is the only firm date known for even some of the most historically prominent persons in Chinese history.

The examinations were briefly interrupted at the beginning of the Mongol Yuan dynasty in the 13th century, but were later reinstated with regional quotas that favored the Mongols and disadvantaged Southern Chinese. During the Ming and Qing dynasties, the system contributed to the narrow and focused nature of intellectual life and enhanced the autocratic power of the emperor. The system continued with some modifications until its abolition in 1905, during the last years of the Qing dynasty. The modern examination system for selecting civil servants also evolved indirectly from the imperial one.[6]

Spread

Invigilators seated on high chairs at a provincial exam in 1888 in northern Vietnam
From the mid 19th century, universities began to institute written examinations to assess the aptitude of the pupils. This is an excerpt from the 1842 Tripos examination in Cambridge University.

Japan


Japan implemented the examination system for 200 years during the Heian period (794-1185). Like the Chinese examinations, the curriculum revolved around the Confucian canon. However, unlike in China, it was only ever applied to the minor nobility and so gradually faded away under the hereditary system during the Samurai era.[7]

Korea


The examination system was established in Korea in 958 under the reign of Gwangjong of Goryeo. Any free man (not Nobi) was able to take the examinations. By the Joseon period, high offices were closed to aristocrats who had not passed the exams. The examination system continued until 1894 when it was abolished by the Gabo Reform. As in China, the content of the examinations focused on the Confucian canon and ensured a loyal scholar bureaucrat class which upheld the throne.[8]

Vietnam


The Confucian examination system in Vietnam was established in 1075 under the Lý dynasty Emperor Lý Nhân Tông and lasted until the Nguyễn dynasty Emperor Khải Định (1919). There were only three levels of examinations in Vietnam: interprovincial, pre-court, and court.[8]

West


The imperial examination system was known to Europeans as early as 1570. It received great attention from the Jesuit Matteo Ricci (1552–1610), who viewed it and its Confucian appeal to rationalism favorably in comparison to religious reliance on "apocalypse." Knowledge of Confucianism and the examination system was disseminated broadly in Europe following the Latin translation of Ricci's journal in 1614. During the 18th century, the imperial examinations were often discussed in conjunction with Confucianism, which attracted great attention from contemporary European thinkers such as Gottfried Wilhelm Leibniz, Voltaire, Montesquieu, Baron d'Holbach, Johann Wolfgang von Goethe, and Friedrich Schiller.[9] In France and Britain, Confucian ideology was used to attack the privilege of the elite.[10] Figures such as Voltaire claimed that the Chinese had "perfected moral science", and François Quesnay advocated an economic and political system modeled after that of the Chinese. According to Ferdinand Brunetière (1849–1906), followers of Physiocracy such as Quesnay, whose theory of free trade was based on Chinese classical theory, were sinophiles bent on introducing "l'esprit chinois" to France. Brunetière also conceded that French education was in effect based on the Chinese literary examinations, which had been popularized in France by the philosophers, especially Voltaire. Eighteenth-century Western perceptions of China admired the Chinese bureaucratic system as preferable to European governments for its seeming meritocracy.[11][12] However, those who admired China, such as Christian Wolff, were sometimes persecuted. In 1721 Wolff gave a lecture at the University of Halle praising Confucianism, for which he was accused of atheism and forced to give up his position at the university.[13]

The earliest evidence of examinations in Europe dates to 1215 or 1219 in Bologna. These were chiefly oral, in the form of question and answer, disputation, determination, defense, or public lecture. The candidate gave a public lecture on two prepared passages assigned to him from the civil or canon law, and doctors then asked him questions or raised objections to his answers. Evidence of written examinations does not appear until 1702, at Trinity College, Cambridge. According to Sir Michael Sadler, Europe may have had written examinations since 1518, but he admits the "evidence is not very clear." In Prussia, medical examinations began in 1725. The Mathematical Tripos, founded in 1747, is commonly believed to be the first honor examination, but James Bass Mullinger considered it no true test, "the candidates not having really undergone any examination whatsoever," because the qualification for a degree was merely four years of residence. France adopted the examination system in 1791 as a result of the French Revolution, but it collapsed after only ten years. Germany implemented the examination system around 1800.[12]

Englishmen in the 18th century such as Eustace Budgell recommended imitating the Chinese examination system, but the first English person to recommend competitive examinations as a qualification for employment was Adam Smith in 1776. In 1838, the Congregational church missionary Walter Henry Medhurst considered the Chinese exams to be "worthy of imitating."[12] In 1806, the British established a Civil Service College near London to train the East India Company's administrators in India. This was based on the recommendations of British East India Company officials who had served in China and seen the imperial examinations. In 1829, the company introduced civil service examinations in India on a limited basis.[14] This established the principle of a qualification process for civil servants in England.[13] In 1847 and 1856, Thomas Taylor Meadows strongly recommended the adoption of the Chinese principle of competitive examinations in Great Britain in his Desultory Notes on the Government and People of China. According to Meadows, "the long duration of the Chinese empire is solely and altogether owing to the good government which consists in the advancement of men of talent and merit only."[15] Both Thomas Babington Macaulay, who was instrumental in passing the Saint Helena Act 1833, and Stafford Northcote, 1st Earl of Iddesleigh, who prepared the Northcote–Trevelyan Report that catalyzed the British civil service, were familiar with Chinese history and institutions. The Northcote–Trevelyan Report of 1854 made four principal recommendations: that recruitment should be on the basis of merit determined through standardized written examination, that candidates should have a solid general education to enable inter-departmental transfers, that recruits should be graded into a hierarchy, and that promotion should be through achievement rather than 'preferment, patronage, or purchase'.[16]

When the report was brought before Parliament in 1853, Lord Monteagle argued against the implementation of open examinations because it was a Chinese system and China was not an "enlightened country." Lord Stanley called the examinations the "Chinese Principle." The Earl of Granville did not deny this but argued in favor of the examination system, noting that the minority Manchus had been able to rule China with it for over 200 years. In 1854, Edwin Chadwick reported that some noblemen did not agree with the measures introduced because they were Chinese. The examination system was finally implemented in the British Indian Civil Service in 1855 (prior to which admission into the civil service was purely a matter of patronage) and in England in 1870. Even ten years after the competitive examination plan was passed, people still attacked it as an "adopted Chinese culture." Alexander Baillie-Cochrane, 1st Baron Lamington insisted that the English "did not know that it was necessary for them to take lessons from the Celestial Empire." In 1875, Archibald Sayce voiced concern over the prevalence of competitive examinations, which he described as "the invasion of this new Chinese culture."[12]

After Great Britain's successful implementation of systematic, open, and competitive examinations in India in the 19th century, similar systems were instituted in the United Kingdom itself and in other Western nations.[17] Like the British civil service, the development of the French and American civil services was influenced by the Chinese system. When Thomas Jenckes made a Report from the Joint Select Committee on Retrenchment in 1868, it contained a chapter on the civil service in China. In 1870, William Spear wrote a book called The Oldest and the Newest Empire: China and the United States, in which he urged the United States government to adopt the Chinese examination system. As in Britain, many of the American elites scorned the plan to implement competitive examinations, which they considered foreign, Chinese, and "un-American." As a result, the civil service reform introduced into the House of Representatives in 1868 was not passed until 1883. The Civil Service Commission tried to combat such sentiments in its report:[18]

...with no intention of commending either the religion or the imperialism of China, we could not see why the fact that the most enlightened and enduring government of the Eastern world had acquired an examination as to the merits of candidates for office, should any more deprive the American people of that advantage, if it might be an advantage, than the facts that Confucius had taught political morality, and the people of China had read books, used the compass, gunpowder, and the multiplication table, during centuries when this continent was a wilderness, should deprive our people of those conveniences.[12]

— Civil Service Commission

Modern development

Students taking a scholarship examination in a classroom in 1940

Standardized testing began to influence the method of examination in British universities from the 1850s, where oral exams had been common since the Middle Ages. In the US, the transition happened under the influence of the educational reformer Horace Mann. The shift helped standardize an expansion of the curricula into the sciences and humanities, creating a rationalized method for the evaluation of teachers and institutions and creating a basis for the streaming of students according to ability.[19]

Both World War I and World War II demonstrated the necessity of standardized testing and the benefits associated with these tests. Tests were used to determine the mental aptitude of recruits to the military. The US Army used the Stanford–Binet Intelligence Scale to test the IQ of the soldiers.[20] After the War, industry began using tests to evaluate applicants for various jobs based on performance. In 1952, the first Advanced Placement (AP) test was administered to begin closing the gap between high schools and colleges.[21]

Contemporary tests


Education


Tests are used throughout most educational systems. Tests may range from brief, informal questions chosen by the teacher to major tests that students and teachers spend months preparing for.

Some countries such as the United Kingdom and France require all their secondary school students to take a standardized test on individual subjects such as the General Certificate of Secondary Education (GCSE) (in England) and Baccalauréat respectively as a requirement for graduation.[22] These tests are used primarily to assess a student's proficiency in specific subjects such as mathematics, science, or literature. In contrast, high school students in other countries such as the United States may not be required to take a standardized test to graduate. Moreover, students in these countries usually take standardized tests only to apply for a position in a university program and are typically given the option of taking different standardized tests such as the ACT or SAT, which are used primarily to measure a student's reasoning skill.[23][24] High school students in the United States may also take Advanced Placement tests on specific subjects to fulfill university-level credit. Depending on the policies of the test maker or country, administration of standardized tests may be done in a large hall, classroom, or testing center. A proctor or invigilator may also be present during the testing period to provide instructions, to answer questions, or to prevent cheating.

Grades or test scores from standardized test may also be used by universities to determine whether a student applicant should be admitted into one of its academic or professional programs. For example, universities in the United Kingdom admit applicants into their undergraduate programs based primarily or solely on an applicant's grades on pre-university qualifications such as the GCE A-levels or Cambridge Pre-U.[25][26] In contrast, universities in the United States use an applicant's test score on the SAT or ACT as just one of their many admission criteria to determine whether an applicant should be admitted into one of its undergraduate programs. The other criteria in this case may include the applicant's grades from high school, extracurricular activities, personal statement, and letters of recommendations.[27] Once admitted, undergraduate students in the United Kingdom or United States may be required by their respective programs to take a comprehensive examination as a requirement for passing their courses or for graduating from their respective programs.

Standardized tests are sometimes used by certain countries to manage the quality of their educational institutions. For example, the No Child Left Behind Act in the United States requires individual states to develop assessments for students in certain grades. In practice, these assessments typically appear in the form of standardized tests. Test scores of students in specific grades of an educational institution are then used to determine the status of that educational institution, i.e., whether it should be allowed to continue to operate in the same way or to receive funding.

Finally, standardized tests are sometimes used to compare proficiencies of students from different institutions or countries. For example, the Organisation for Economic Co-operation and Development (OECD) uses Programme for International Student Assessment (PISA) to evaluate certain skills and knowledge of students from different participating countries.[28]

Licensing and certification


Standardized tests are sometimes used by certain governing bodies to determine whether a test taker is allowed to practice a profession, to use a specific job title, or to claim competency in a specific set of skills. For example, a test taker who intends to become a lawyer is usually required by a governing body such as a governmental bar licensing agency to pass a bar exam.

Immigration and naturalization


Standardized tests are also used in certain countries to regulate immigration. For example, intended immigrants to Australia are legally required to pass a citizenship test as part of that country's naturalization process.[29]

Language testing in naturalization process


When analyzed in the context of language testing in naturalization processes, language ideology can be approached from two distinct but closely related standpoints. One concerns the construction and deconstruction of the constitutive elements that make up a nation's identity, while the second takes a more restricted view of the specific language and the ideologies that may serve a particular purpose.[30]

Intelligence quotient


Competitions


Tests are sometimes used as a tool to select for participants that have potential to succeed in a competition such as a sporting event. For example, skaters who wish to participate in figure skating competitions in the United States must pass official U.S. Figure Skating tests just to qualify.[31]

Group memberships


Tests are sometimes used by a group to select for certain types of individuals to join the group. For example, Mensa International is a high-IQ society that requires individuals to score at the 98th percentile or higher on a standardized, supervised IQ test.[32]

Types


Assessment types include:[33][34][35]

Formative assessment
Formative assessments are informal and formal tests taken during the learning process. These assessments modify the later learning activities, to improve student achievement. They identify strengths and weaknesses and help target areas that need work. The goal of formative assessment is to monitor student learning to provide ongoing feedback that can be used by instructors to improve their teaching and by students to improve their learning.[citation needed]
Summative assessment
Summative assessments evaluate competence at the end of an instructional unit, with the goal of determining if the candidate has assimilated the knowledge or skills to the required standard. Summative assessments may cover a few days' instruction, an entire term's work in cases such as final exams, or even multiple years' study, in the case of high school exit exams, GCE Advanced Level exams, or professional licensing tests such as the United States Medical Licensing Examination.
Norm-referenced test
Norm-referenced tests compare a student's performance against a national or other "norm" group; by design, only a certain percentage of test takers can receive the best and worst scores. Norm-referencing is usually called grading on a curve when the comparison group is students in the same classroom. Norm-referenced tests report whether test takers performed better or worse than a hypothetical average student, which is determined by comparing scores against the performance results of a statistically selected group of test takers, typically of the same age or grade level, who have already taken the exam.[36]
Criterion-referenced test

Criterion-referenced tests are designed to measure student performance against a fixed set of criteria or learning standards. It is possible for all test takers to pass, just as it is possible for all test takers to fail. An individual's scores on these tests can be used to focus on improving the specific skills that were lacking.[36]

Performance-based assessments
Performance-based assessments require students to solve real-world problems or produce something with real-world application. For example, the student can demonstrate baking skills by baking a cake, and having the outcome judged for appearance, flavor, and texture.
Authentic assessment
An authentic assessment is the measurement of accomplishments within a realistic, practical context that is relevant outside of the school setting.[37] For example, an authentic assessment of arithmetic skills is figuring out how much the family's groceries will cost this week. This provides as much information about the students' addition skills as a test question that asks what the sum of various numbers is.
Standardized test
Standardized tests are all tests that are administered and scored in a consistent manner, regardless of whether it is a quick quiz created by the local teacher or a heavily researched test given to millions of people.[38] Standardized tests are often used in education, professional certification, psychology (e.g., MMPI), the military, and many other fields.
Non-standardized test
Non-standardized tests are flexible in scope and format, and variable in difficulty. For example, a teacher may go around the classroom and ask each student a different question. Some questions will inevitably be harder than others, and the teacher may be more strict with the answers from better students. A non-standardized test may be used to determine the proficiency level of students, to motivate students to study, to provide feedback to students, and to modify the curriculum to make it more appropriate for either low- or high-skill students.
High-stakes test
High-stakes tests are tests with important consequences for the individual test taker, such as getting a driver's license. A high-stakes test does not need to be a high-stress test, if the test taker is confident of passing.[citation needed]
Competitive examinations

Competitive exams are norm-referenced, high-stakes tests in which candidates are ranked according to their grades and/or percentile, and the top rankers are selected. If the examination is open for n positions, the first n candidates by rank pass and the others are rejected (see the sketch below). They are used as entrance examinations for university and college admissions, such as the Joint Entrance Examination, or for admission to secondary schools. Examples include civil service examinations, required for positions in the public sector, the U.S. Foreign Service Exam, and the United Nations Competitive Examination. Competitive examinations are considered an egalitarian way to select worthy applicants without risking influence peddling, bias or other concerns.
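
The ranking logic behind a norm-referenced, competitive selection can be illustrated with a short sketch. The candidate names, scores, number of open positions, and the percentile convention used below (percentage of candidates scoring at or below a given score) are all hypothetical illustrations, not the procedure of any particular examination body.

```python
# Illustrative sketch: norm-referenced ranking and top-n competitive selection.
# All candidate data below are hypothetical.

def percentile_rank(score, all_scores):
    """Percentage of scores that are less than or equal to `score`."""
    return 100.0 * sum(s <= score for s in all_scores) / len(all_scores)

def select_top_n(candidates, n):
    """Rank candidates by score (highest first) and admit the first n."""
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    admitted = [name for name, _ in ranked[:n]]
    rejected = [name for name, _ in ranked[n:]]
    return ranked, admitted, rejected

candidates = {"A": 71, "B": 88, "C": 64, "D": 88, "E": 95}  # hypothetical raw scores
ranked, admitted, rejected = select_top_n(candidates, n=2)

for name, score in ranked:
    pct = percentile_rank(score, list(candidates.values()))
    print(f"{name}: score={score}, percentile={pct:.0f}")
print("admitted:", admitted, "rejected:", rejected)
```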

A single test can have multiple qualities. For example, the bar exam for aspiring lawyers may be a norm-referenced, standardized, summative assessment. This means that only the test takers with higher scores will pass, that all of them took the same test under the same circumstances and were graded with the same scoring standards, and that the test is meant to determine whether the law school graduates have learned enough to practice their profession.

Assessment formats


Written tests

Indonesian students taking a written test

Written tests are tests that are administered on paper or on a computer (as an eExam). A test taker who takes a written test could respond to specific test items by writing or typing within a given space of the test or on a separate form or document.

In some tests where knowledge of many constants or technical terms is required to answer questions effectively, such as in chemistry or biology, the test developer may allow every test taker to bring a cheat sheet.

A test developer's choice of which style or format to use when developing a written test is usually arbitrary given that there is no single invariant standard for testing. Be that as it may, certain test styles and formats have become more widely used than others. Below is a list of those formats of test items that are widely used by educators and test developers to construct paper or computer-based tests. As a result, these tests may consist of only one type of test item format (e.g., multiple-choice test, essay test) or may have a combination of different test item formats (e.g., a test that has multiple-choice and essay items).

Multiple choice


In a test that has items formatted as multiple-choice questions, a candidate would be given a number of set answers for each question, and the candidate must choose which answer or group of answers is correct. There are two families of multiple-choice questions.[39] The first family is known as the True/False question and it requires a test taker to choose all answers that are appropriate. The second family is known as One-Best-Answer question and it requires a test taker to answer only one from a list of answers.

There are several reasons for using multiple-choice questions in tests. In terms of administration, multiple-choice questions usually require less time for test takers to answer, are easy to score and grade, provide greater coverage of material, allow for a wide range of difficulty, and can easily diagnose a test taker's difficulty with certain concepts.[40] As an educational tool, multiple-choice items test many levels of learning as well as a test taker's ability to integrate information, and they provide feedback to the test taker about why distractors were wrong and why correct answers were right. Nevertheless, there are difficulties associated with the use of multiple-choice questions. In administrative terms, effective multiple-choice items usually take a great deal of time to construct.[40] As an educational tool, multiple-choice items do not allow test takers to demonstrate knowledge beyond the choices provided and may even encourage guessing or approximation due to the presence of at least one correct answer. For instance, a test taker might not work out a product explicitly, but by rounding the factors could recognize that the result must be close to 48 and simply choose the option nearest to that estimate. Moreover, test takers may misinterpret these items and, in the process, perceive them to be tricky or picky. Finally, multiple-choice items do not test a test taker's attitudes towards learning because correct responses can be easily faked.

Alternative response


True/False questions present candidates with a binary choice: a statement is either true or false. This method presents problems, because a candidate guessing at random will on average score 50%, and depending on the number of questions a significant number of candidates could score 100% through guesswork alone.
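
The scale of the guessing problem follows from a simple binomial model, assuming the candidate guesses every item independently with probability 1/2: the expected score is 50%, and the chance of a perfect score by guessing alone shrinks rapidly with the number of items n.

```latex
% Guessing model for an n-item true/false test (independent guesses, p = 1/2 per item)
\[
\mathbb{E}[\text{score}] = \frac{n}{2},
\qquad
P(\text{perfect score by guessing}) = \left(\tfrac{1}{2}\right)^{n}
\]
% For example, n = 4 gives 1/16 = 6.25\%, while n = 20 gives roughly 0.0001\%.
```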

Matching type


A matching item is an item that provides a defined term and requires a test taker to match identifying characteristics to the correct term.[41] [example needed]

Completion type


A fill-in-the-blank item provides a test taker with identifying characteristics and requires the test taker to recall the correct term.[41] There are two types of fill-in-the-blank tests. The easier version provides a word bank of possible words that will fill in the blanks. For some exams all words in the word bank are used exactly once. If a teacher wanted to create a test of medium difficulty, they would provide a test with a word bank, but some words may be used more than once and others not at all. The hardest variety of such a test is a fill-in-the-blank test in which no word bank is provided at all. This generally requires a higher level of understanding and memory than a multiple-choice test. Because of this, fill-in-the-blank tests with no word bank are often feared by students.

Essay


Items such as short answer or essay typically require a test taker to write a response to fulfill the requirements of the item. In administrative terms, essay items take less time to construct.[40] As an assessment tool, essay items can test complex learning objectives as well as the processes used to answer the question. The items can also provide a more realistic and generalizable task for a test. Finally, these items make it difficult for test takers to guess the correct answers and require test takers to demonstrate their writing skills as well as correct spelling and grammar.

The difficulties with essay items are primarily administrative: for example, test takers require adequate time to be able to compose their answers.[40] When these questions are answered, the answers themselves are usually poorly written because test takers may not have time to organize and proofread their answers. In turn, it takes more time to score or grade these items. When these items are being scored or graded, the grading process itself becomes subjective as non-test related information may influence the process. Thus, considerable effort is required to minimize the subjectivity of the grading process. Finally, as an assessment tool, essay questions may potentially be unreliable in assessing the entire content of a subject matter.

Instructions to exam candidates rely on the use of command words, which direct the examinee to respond in a particular way, for example by describing or defining a concept, or comparing and contrasting two or more scenarios or events. Some command words require more insight or skill than others: for example, "analyse" and "synthesise" assess higher-level skills than "describe".[42] More demanding command words usually attract greater mark weighting in the examination. In the UK, Ofqual maintains an official list of command words explaining their meaning.[43] The Welsh government's guidance on the use of command words advises that they should be used "consistently and correctly", but notes that some subjects have their own traditions and expectations in regard to candidates' responses,[44] and Cambridge Assessment notes that in some cases subject-specific command words may be in use.[45]

Quizzes


A quiz is a brief assessment that may cover a small amount of material given in a class, such as the content of two or three lectures, an assigned reading, or an exercise summarizing the most important parts of the course. A simple quiz usually does not count for much of the grade, and instructors often use this type of test as a formative assessment to help determine whether the student is learning the material. Collectively, however, the quizzes given over a term can make up a significant part of the final course grade.[46]

Mathematical questions


Most mathematics questions, or calculation questions from subjects such as chemistry, physics, or economics, employ a style which does not fall into any of the above categories, although some papers, notably the Maths Challenge papers in the United Kingdom, employ multiple choice. Instead, most mathematics questions state a mathematical problem or exercise that requires a student to write a freehand response. Marks are given more for the steps taken than for the correct answer. If the question has multiple parts, later parts may use answers from previous sections, and marks may be granted if an earlier incorrect answer was used but the correct method was followed and the answer is correct given the incorrect input.
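
The "marks for method" and error-carried-forward conventions described above can be sketched as a simple marking routine. The mark allocation, part structure, numerical values, and tolerance below are hypothetical; real mark schemes are set per paper by the examining body.

```python
# Hypothetical sketch of method marks with "error carried forward" (ECF):
# a later part is marked against the candidate's own earlier answer, so a
# correct method applied to an earlier wrong value still earns the marks.

def mark_part(method_shown, candidate_answer, expected_answer,
              method_marks, answer_marks, tol=1e-6):
    marks = method_marks if method_shown else 0
    if abs(candidate_answer - expected_answer) <= tol:
        marks += answer_marks
    return marks

# Part (a): candidate shows working but makes an arithmetic slip (12 instead of 10).
part_a = mark_part(method_shown=True, candidate_answer=12, expected_answer=10,
                   method_marks=2, answer_marks=1)

# Part (b): asks for double the part (a) result. Under ECF the expected answer is
# double the candidate's own (incorrect) part (a) value, i.e. 24 rather than 20.
part_b = mark_part(method_shown=True, candidate_answer=24, expected_answer=2 * 12,
                   method_marks=1, answer_marks=1)

print(part_a, part_b)  # 2 and 2: full credit for part (b) despite the earlier slip
```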

Higher-level mathematical papers may include variations on true/false, where the candidate is given a statement and asked to verify its validity by direct proof or stating a counterexample.

Open-book tests


Though not as popular as the closed-book test, open-book (or open-note) tests are slowly rising in popularity. An open-book test allows the test taker to access textbooks and all of their notes while taking the test.[47] The questions asked on open-book exams are typically more thought-provoking and intellectual than questions on a closed-book exam. Rather than testing what facts test takers know, open-book exams force them to apply the facts to a broader question. The main benefit attributed to open-book tests is that they are better preparation for the real world, where one does not have to rely on memorization and has the needed resources at one's disposal.[48]

Oral tests


An oral test is a test that is answered orally (verbally). The teacher or oral test assessor will verbally ask a question to a student, who will then answer it using words.

Physical fitness tests

A Minnesota National Guardsman performs pushups during a physical fitness test.

A physical fitness test is a test designed to measure physical strength, agility, and endurance. They are commonly employed in educational institutions as part of the physical education curriculum, in medicine as part of diagnostic testing, and as eligibility requirements in fields that focus on physical ability such as military or police. Throughout the 20th century, scientific evidence emerged demonstrating the usefulness of strength training and aerobic exercise in maintaining overall health, and more agencies began to incorporate standardized fitness testing. In the United States, the President's Council on Youth Fitness was established in 1956 as a way to encourage and monitor fitness in schoolchildren.

Common tests[49][50][51] include timed running or the multi-stage fitness test (commonly known as the "beep test"), and the numbers of push-ups, sit-ups/abdominal crunches, and pull-ups that the individual can perform. More specialised tests may be used to assess the ability to perform a particular job or role. Many gyms, private organisations and event organizers have their own fitness tests, often drawing on military techniques developed by the British Army and on modern tests such as the Illinois Agility Run and the Cooper Test.[52]

Stopwatch (hand) timing was common until recent years, when it was shown to be inaccurate and inconsistent.[53] Electronic timing is now the standard, promoting accuracy and consistency and lessening bias.[citation needed]

Performance tests


A performance test is an assessment that requires an examinee to actually perform a task or activity, rather than simply answering questions about it. The purpose is to ensure greater fidelity to what is being tested.

An example is a behind-the-wheel driving test to obtain a driver's license. Rather than only answering simple multiple-choice items regarding the driving of an automobile, a student is required to actually drive one while being evaluated.

Performance tests are commonly used in workplace and professional applications, such as professional certification and licensure. When used for personnel selection, the tests might be referred to as a work sample. A licensure example would be cosmetologists being required to demonstrate a haircut or manicure on a live person. The Group–Bourdon test is one of a number of psychometric tests which trainee train drivers in the UK are required to pass.[54]

Some performance tests are simulations. For instance, the assessment to become certified as an ophthalmic technician includes two components, a multiple-choice examination and a computerized skill simulation. The examinee must demonstrate the ability to complete seven tasks commonly performed on the job, such as retinoscopy, that are simulated on a computer.

Midterms and finals


Midterm exam


A midterm exam is an exam given near the middle of an academic grading term, or near the middle of any given quarter or semester.[55] Midterm exams are a type of formative or summative assessment.[56]

Final examination


A final examination, annual exam, final interview, or simply final, is a test given to students at the end of a course of study or training. Although the term can be used in the context of physical training, it most often occurs in the academic world. Most high schools, colleges, and universities run final exams at the end of a particular academic term, typically a quarter or semester, or more traditionally at the end of a complete degree course.

Isolated purpose and common practice


The purpose of the test is to make a final review of the topics covered and assessment of each student's knowledge of the subject. A final is technically just a greater form of a "unit test". They have the same purpose; finals are simply larger. Not all courses or curricula culminate in a final exam; instructors may assign a term paper or final project in some courses. The weighting of the final exam also varies. It may be the largest—or only—factor in the student's course grade; in other cases, it may carry the same weight as a midterm exam, or the student may be exempted. Not all finals need be cumulative, however, as some simply cover the material presented since the last exam. For example, a microbiology course might only cover fungi and parasites on the final exam if this were the policy of the professor, and all other subjects presented in the course would then not be tested on the final exam.

Prior to the examination period most students in the Commonwealth have a week or so of intense revision and study known as swotvac.

In the UK, most universities hold a single set of "Finals" at the end of the entire degree course. In Australia, the exam period varies, with high schools commonly assigning one or two weeks for final exams, but the university period—sometimes called "exam week" or just "exams"—may stretch to a maximum of three weeks.

Practice varies widely in the United States; "finals" or the "finals period" at the university level constitutes two or three weeks after the end of the academic term, but sometimes exams are administered in the last week of instruction. Some institutions designate a "study week" or "reading period" between the end of instruction and the beginning of finals, during which no examinations may be administered. Students at many institutions know the week before finals as "dead week." Most final exams incorporate the reading material that has been assigned throughout the term.

Though common in French tertiary institutions, final exams are not often assigned in French high schools. However, French high school students hoping to continue their studies at university level will sit a national exam, known as the Baccalauréat.

In some countries and locales that hold standardised exams, it is customary for schools to administer mock examinations, with formats modelling the real exam. Students from different schools are often seen exchanging mock papers as a means of test preparation.

Take-home finals


A take-home final is an examination at the end of an academic term that is usually too long or complex to be completed in a single session as an in-class final. There is usually a deadline for completion, such as within one or two weeks of the end of the semester. A take-home final differs from a final paper, often involving research, extended texts and display of data.[citation needed]

Schedule


In some cases, schools will run on a modified schedule for final exams to allow students more time to do their exams. However, this is not necessarily the case for every institution. [citation needed]

Preparations


From the perspective of a test developer, there is great variability with respect to time and effort needed to prepare a test. Likewise, from the perspective of a test taker, there is also great variability with respect to the time and effort needed to obtain a desired grade or score on any given test. When a test developer constructs a test, the amount of time and effort is dependent upon the significance of the test itself, the proficiency of the test taker, the format of the test, class size, deadline of the test, and experience of the test developer.

The process of test construction has been aided in several ways. For one, many test developers were themselves students at one time, and therefore are able to modify or outright adopt questions from their previous tests. In some countries, book publishers often provide teaching packages that include test banks to university instructors who adopt their published books for their courses.[57] These test banks may contain up to four thousand sample test questions that have been peer-reviewed and time-tested. The instructor who chooses to use this testbank would only have to select a fixed number of test questions from this test bank to construct a test.
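
The selection step described above, drawing a fixed number of items from a larger bank, amounts to sampling without replacement. The bank contents, topics, and counts in the sketch below are hypothetical and do not reflect any publisher's actual test-bank format.

```python
import random

# Hypothetical test bank: each entry is (question, topic). Real publisher banks
# may contain thousands of peer-reviewed items with richer metadata.
test_bank = [
    ("Define opportunity cost.", "economics"),
    ("State Newton's second law.", "physics"),
    ("What is a covalent bond?", "chemistry"),
    ("Name the powerhouse of the cell.", "biology"),
    ("Solve 2x + 3 = 11 for x.", "mathematics"),
    ("Who wrote 'Hamlet'?", "literature"),
]

def build_test(bank, n_items, seed=None):
    """Sample n_items distinct questions from the bank (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(bank, n_items)

for i, (question, topic) in enumerate(build_test(test_bank, n_items=3, seed=42), start=1):
    print(f"Q{i} [{topic}]: {question}")
```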

As with test construction, the time needed for a test taker to prepare for a test is dependent upon the frequency of the test, the test developer, and the significance of the test. In general, nonstandardized tests that are short, frequent, and do not constitute a major portion of the test taker's overall course grade or score do not require the test taker to spend much time preparing for the test.[58] Conversely, nonstandardized tests that are long, infrequent, and do constitute a major portion of the test taker's overall course grade or score usually require the test taker to spend great amounts of time preparing for the test. To prepare for a nonstandardized test, test takers may rely upon their reference books, class or lecture notes, the Internet, and past experience. Test takers may also use various learning aids to study for tests, such as flashcards and mnemonics.[59] Test takers may even hire tutors to coach them through the process so that they may increase the probability of obtaining a desired test grade or score. In countries such as the United Kingdom, demand for private tuition has increased significantly in recent years.[60] Finally, test takers may rely upon past copies of a test from previous years or semesters to study for a future test. These past tests may be provided by a friend or a group that has copies of previous tests, by instructors and their institutions, or by the test provider (such as an examination board) itself.[61][62]

Unlike a nonstandardized test, the time needed by test takers to prepare for standardized tests is less variable and usually considerable. This is because standardized tests are usually uniform in scope, format, and difficulty and often have important consequences with respect to a test taker's future such as a test taker's eligibility to attend a specific university program or to enter a desired profession. It is not unusual for test takers to prepare for standardized tests by relying upon commercially available books that provide in-depth coverage of the standardized test or compilations of previous tests (e.g., ten year series in Singapore). In many countries, test takers even enroll in test preparation centers or cram schools that provide extensive or supplementary instructions to test takers to help them better prepare for a standardized test. In Hong Kong, it has been suggested that the tutors running such centers are celebrities in their own right.[63] This has led to private tuition being a popular career choice for new graduates in developed economies.[64][65] Finally, in some countries, instructors and their institutions have also played a significant role in preparing test takers for a standardized test.

Cheating

Invigilators may oversee a test to reduce cheating methods such as copying.

Cheating on a test is the process of using unauthorized means or methods to obtain a desired test score or grade. This may range from bringing and using notes during a closed book examination, to copying another test taker's answer or choice of answers during an individual test, to sending a paid proxy to take the test.[66]

Several common methods have been employed to combat cheating. They include the use of multiple proctors or invigilators during a testing period to monitor test takers. Test developers may construct multiple variants of the same test to be administered to different test takers at the same time, or write tests with few multiple-choice options, based on the theory that fully worked answers are difficult to imitate.[67] In some cases, instructors themselves may not administer their own tests but will leave the task to other instructors or invigilators, which may mean that the invigilators do not know the candidates, and thus some form of identification may be required. Finally, instructors or test providers may compare the answers of suspected cheaters on the test themselves to determine whether cheating did occur.
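
One of the countermeasures mentioned above, administering multiple variants of the same test, can be sketched by shuffling item order (and the order of the options within each item) per variant while keeping a variant-specific answer key. The items, shuffling scheme, and lettered key below are a hypothetical illustration, not any testing body's actual procedure.

```python
import random

# Hypothetical item pool: (stem, options, index of the correct option).
items = [
    ("Capital of France?", ["Berlin", "Paris", "Madrid", "Rome"], 1),
    ("2 + 2 = ?",          ["3", "4", "5", "22"],                 1),
    ("H2O is...",          ["salt", "hydrogen", "water", "ozone"], 2),
]

def make_variant(items, seed):
    """Shuffle item order and option order; return the variant and its answer key."""
    rng = random.Random(seed)
    shuffled_items = rng.sample(items, len(items))
    variant, key = [], []
    for stem, options, correct in shuffled_items:
        order = rng.sample(range(len(options)), len(options))
        variant.append((stem, [options[i] for i in order]))
        key.append("ABCD"[order.index(correct)])
    return variant, key

variant_a, key_a = make_variant(items, seed=1)
variant_b, key_b = make_variant(items, seed=2)
print("Variant A key:", key_a)
print("Variant B key:", key_b)
```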

from Grokipedia
An exam, abbreviated from examination, is a formal assessment intended to measure an individual's knowledge, skills, aptitude, or proficiency in a given subject or domain. Examinations originated in ancient China with the imperial keju system, a merit-based selection process for bureaucratic officials that emphasized written evaluations over hereditary privilege, influencing later educational testing worldwide. Common formats include multiple-choice questions for objective scoring, essays for analytical depth, short answers for factual recall, and computational problems for applied reasoning, allowing tailored evaluation of diverse competencies. In educational contexts, exams serve to gauge learning outcomes, reinforce retention via retrieval practice, and inform accountability, though empirical evidence highlights their limitations, such as correlations with socioeconomic factors rather than innate ability alone, prompting debates over high-stakes reliance that may prioritize test-taking over holistic skill development.

Definition and Purpose

Core Objectives

Examinations fundamentally aim to evaluate the degree to which individuals have acquired knowledge, skills, and competencies aligned with specific learning objectives. This measurement provides an objective benchmark for assessing mastery of subject matter, distinguishing superficial familiarity from genuine understanding or application. In educational settings, such evaluations support accountability by verifying that instructional efforts translate into tangible outcomes, rather than relying solely on self-reported progress. A key objective is to deliver actionable feedback that identifies strengths and deficiencies, enabling educators to refine methods and students to focus remedial efforts. This diagnostic function supports continuous improvement, as testing reveals gaps in comprehension or execution, prompting targeted interventions over generalized instruction. Empirical studies of assessment practices underscore how this feedback loop enhances learning efficacy by aligning future efforts with evidenced needs. Exams also fulfill gatekeeping roles by certifying qualifications for advancement, professional entry, or licensure, where standardized testing minimizes subjective biases in selection. In high-stakes contexts, they rank candidates based on demonstrated ability, facilitating meritocratic selection while mitigating risks associated with unverified credentials. This objective underpins systems like licensing boards, where exam results directly correlate with public safety and professional reliability.

Theoretical Underpinnings

The theoretical foundations of examinations derive from psychometric principles, which employ statistical models to measure latent human attributes such as knowledge, aptitude, or skill through observable responses under controlled conditions. These principles prioritize reliability—the consistency of scores across administrations or items—and validity—the alignment of inferences drawn from scores with intended constructs—ensuring exams serve as causal proxies for competence rather than arbitrary evaluations. Classical test theory (CTT), established in the early 20th century, posits that an observed score equals a true underlying score plus random error, assuming items contribute equally to the total and scores aggregate via simple sums or proportions. Reliability in CTT is quantified through methods like coefficient alpha, which assesses internal consistency, while validity encompasses content coverage, predictive power, and construct fidelity; its simplicity enables application with modest sample sizes (e.g., 20–50 examinees) but renders results test- and population-dependent, limiting generalizability without form-specific norms. Item response theory (IRT), formalized in the 1960s, refines measurement by modeling the nonlinear probability of correct responses via logistic functions incorporating examinee ability (θ) and item parameters: discrimination (a, slope of the response curve), difficulty (b, point of 50% success probability), and pseudo-guessing (c). This framework yields invariant ability estimates across test forms, supports vertical scaling for comparable difficulty levels, and underpins adaptive testing algorithms that select items dynamically to maximize information yield, though it demands large calibration samples (e.g., 100–1,000 per item) for parameter stability. IRT's probabilistic granularity enhances precision in high-stakes contexts like licensure exams, outperforming CTT in equating disparate administrations. Philosophically, examinations embody meritocratic ideals by standardizing conditions to isolate ability from extraneous influences, assuming a causal linkage between assessed proficiency and subsequent efficacy in roles requiring those competencies. Empirical validation stems from predictive correlations: scores exhibit moderate associations with academic outcomes, such as r ≈ 0.3–0.5 with college GPA and persistence, and extend to longer-term metrics such as educational attainment, outperforming alternatives like high school grades alone in multivariate models. Cognitive and learning sciences further inform these underpinnings by critiquing rote-focused designs and advocating assessments that probe processes such as transfer, yet standardized exams retain utility for scalable, comparable inference amid the scalability constraints of richer formats.
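
For reference, the two models discussed above can be written compactly. The notation below follows the parameter names in the text (ability θ, discrimination a, difficulty b, pseudo-guessing c); it is one standard presentation of CTT and the three-parameter logistic IRT model, not a claim about any particular exam's scoring procedure.

```latex
% Classical test theory: observed score = true score + random error,
% with reliability commonly estimated by Cronbach's coefficient alpha over k items.
\[
X = T + E, \qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{i}}{\sigma^{2}_{X}}\right)
\]

% Item response theory (3PL): probability that examinee j answers item i correctly,
% as a function of ability theta_j and item parameters a_i, b_i, c_i.
\[
P(X_{ij}=1 \mid \theta_j) = c_i + (1 - c_i)\,
\frac{1}{1 + e^{-a_i(\theta_j - b_i)}}
\]
```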

Historical Development

Ancient Origins and Oral Traditions

In ancient civilizations, the assessment of knowledge and skills predated formalized written tests, relying instead on oral traditions that emphasized memorization, recitation, and interactive questioning to verify mastery. These methods arose from the necessities of societies where literacy was limited to elites and knowledge transmission occurred primarily by word of mouth, ensuring fidelity in passing down religious, legal, and practical lore. Oral examinations served practical purposes, such as selecting capable individuals for roles in government, priesthood, or craftsmanship, by testing recall accuracy, comprehension, and rhetorical ability under scrutiny. In Vedic India, spanning approximately 1500–500 BCE, education centered on the guru-shishya parampara, where students resided with teachers to absorb scriptures such as the Vedas through repeated oral chanting and mnemonic techniques. Assessment occurred via rigorous oral interrogations by the guru, who posed questions on textual content, interpretations, and applications, often in the form of debates or recitations before assemblies to demonstrate retention and comprehension. Practical demonstrations complemented these, evaluating applied skills such as ritual performance, with success determining progression or societal roles; failure could lead to repetition or exclusion. This system prioritized depth over breadth, fostering causal understanding through verbal defense of ideas. Similarly, in ancient China during the Zhou dynasty (c. 1046–256 BCE), early bureaucratic selection involved noble recommendations followed by oral examinations conducted by rulers or ministers, probing candidates' learning, character, and administrative acumen through dialogues and policy discussions. These evolved into more structured interrogations by the Warring States period (475–221 BCE), assessing moral character and strategic thinking to counter hereditary privilege in appointments. Though precursors to the later written keju system, these oral tests emphasized real-time articulation and adaptability, reflecting a causal link between verbal prowess and effective governance. In ancient Greece, particularly from the 5th century BCE, the Socratic method exemplified oral assessment as a dialectical process of questioning to expose inconsistencies in beliefs and compel self-examination. Socrates (c. 470–399 BCE) employed elenchus in public forums, grilling interlocutors on definitions and premises to test intellectual rigor, influencing educational practices that valued oral disputation. This approach, documented in Plato's dialogues, underscored the primacy of spoken reasoning in evaluating philosophical and ethical competence, laying groundwork for later rhetorical training in academies.

Imperial Civil Service Systems

The imperial civil service examination system in China, known as keju, emerged as a merit-based mechanism for selecting bureaucratic officials, with roots in Han dynasty procedures established around 124 BCE to evaluate candidates' moral character and talents through recommendations and basic testing. Systematic implementation began under the Sui dynasty (581–618 CE), which introduced regular provincial and capital examinations focused on the Confucian classics, poetry, and essays to replace hereditary appointments and reduce aristocratic dominance. This shift was driven by the need for administrative efficiency in governing a vast empire, as evidenced by Emperor Wen of Sui's reforms emphasizing textual mastery over lineage. The system matured during the Tang dynasty (618–907 CE), expanding to include three tiers: local shengyuan (student member) exams, provincial juren (recommended man) tests, and the prestigious metropolitan jinshi (presented scholar) examination held triennially in the capital. By the Song dynasty (960–1279 CE), keju became the primary recruitment path, with over 20,000 candidates competing annually for fewer than 300 jinshi degrees, prioritizing rote memorization of the Confucian canon and policy analysis to ensure ideological alignment with Confucian governance principles. Despite theoretical openness, empirical data from Tang records show that passage rates hovered below 1% for higher levels, and access favored families able to afford prolonged education, though the system enabled limited upward mobility for non-elites compared to purely hereditary appointment. Under the Ming (1368–1644 CE) and Qing (1644–1912 CE) dynasties, the system rigidified with the "eight-legged essay" format—a structured argumentative style enforcing orthodoxy—and palace exams for final selection by the emperor, producing around 400–500 top officials per cycle amid millions of entrants. Cheating scandals, such as proxy test-taking and bribery, were recurrent, prompting measures like secluded exam halls and identity checks, yet the process sustained bureaucratic competence by filtering for diligence and classical knowledge, contributing to imperial longevity through standardized administration. The keju was abolished in 1905 during the late Qing reforms, influenced by Western models and internal failures such as the 1895 defeat in the Sino-Japanese War, which highlighted technological and military shortcomings unaddressed by the classical focus. China's model influenced tributary states: Korea adopted a parallel gwageo system from 958 CE under the Goryeo dynasty, testing Confucian texts for official selection until 1894, with a similar three-level structure but a smaller scale yielding about 30–50 passers yearly. Vietnam's examinations, initiated in 1075 CE under the Lý dynasty Emperor Lý Nhân Tông, mirrored keju in content and hierarchy, culminating in Hanoi-based finals, and persisted until 1919 under French colonial pressure, emphasizing Vietnamese adaptations of Chinese classics for bureaucratic staffing. These systems promoted merit-based selection but inherited limitations such as gender exclusion—women were barred, with rare exceptions—and overemphasis on literary skills at the expense of practical expertise, as critiqued in historical records for fostering rote learners over innovators.

Spread to Non-Asian Cultures

Knowledge of the Chinese imperial examination system reached Europe in the late 16th century through Jesuit missionaries such as Matteo Ricci, who documented its merit-based selection of officials in works like De Christiana expeditione apud Sinas (1615), highlighting its role in promoting administrative competence over hereditary privilege. European intellectuals, including Voltaire, praised the system during the Enlightenment for its emphasis on scholarly merit, contrasting it with Europe's patronage-driven bureaucracies. By the mid-19th century, amid scandals over patronage in the British Civil Service—such as appointments based on political connections rather than ability—reformers explicitly drew on the Chinese model. The Northcote-Trevelyan Report, published on November 23, 1853 (presented to Parliament in 1854), recommended open competitive examinations for civil service entry, citing the "system of examination in China" as a proven mechanism for selecting capable administrators through rigorous testing. Authored by Stafford Northcote and Charles Trevelyan, the report argued that such exams would ensure recruitment of the most talented candidates regardless of social origin, leading to the establishment of the British Civil Service Commission in 1855 and the first open competitive examinations that same year. This reform spread merit-based testing to colonial administrations, including competitive exams for the Indian Civil Service held in London from 1855 onward, which prioritized intellectual merit over aristocratic ties.

The British model influenced other Western nations. In the United States, the Pendleton Civil Service Reform Act of January 16, 1883, introduced merit-based exams for federal positions following President James A. Garfield's assassination by a disgruntled office-seeker, with advocates referencing both British and ancient Chinese precedents to argue for examinations as a bulwark against spoils systems. France, which had implemented concours d'entrée for its grandes écoles since the French Revolution (e.g., the École Polytechnique in 1794), further formalized civil service exams in the 1870s, incorporating competitive elements akin to those praised in accounts of the Chinese system. These adaptations emphasized practical subjects and general knowledge over the Confucian canon of the Chinese original, but retained the core principle of standardized testing for impartial selection, spreading to professional licensing and educational assessments across Europe and North America by the early 20th century.

Modern Standardization and Expansion

In the mid-19th century, standardized written examinations emerged in the United States as a means to assess student performance uniformly, replacing inconsistent oral evaluations. Horace Mann, secretary of the Massachusetts Board of Education, advocated for written tests in 1845 to provide objective data on teaching effectiveness in Boston's schools. By 1851, Harvard had introduced standardized entrance examinations in response to variability in preparatory schooling, marking an early shift toward merit-based academic selection. These developments reflected growing demands for accountability in expanding public education systems.

The late 19th and early 20th centuries saw the integration of psychological measurement into standardized testing. In 1905, French psychologist Alfred Binet created the Binet-Simon scale, the first practical intelligence test, designed to identify schoolchildren requiring remedial education rather than to rank innate ability. American psychologist Lewis Terman revised it into the Stanford-Binet Intelligence Scale in 1916, introducing the IQ concept as a ratio of mental age to chronological age. During World War I, the U.S. Army administered the Army Alpha (for literates) and Army Beta (for illiterates or non-English speakers) group tests to approximately 1.7 million recruits between 1917 and 1919, enabling rapid classification for military roles based on cognitive aptitude. These tests demonstrated the scalability of standardized assessment for large populations.

Standardized college entrance exams proliferated in the early 20th century to facilitate admissions amid rising university enrollments. The College Entrance Examination Board, established in 1900, conducted its first nationwide exams in 1901 across nine subjects to ensure consistent evaluation of applicants. The Scholastic Aptitude Test (SAT), developed by Carl Brigham and influenced by Army testing methods, debuted in 1926 as an aptitude measure for elite institutions, initially administered to about 8,000 students. The American College Testing Program (ACT), introduced in 1959 by E.F. Lindquist, emphasized achievement in core subjects and gained traction in Midwestern and less selective colleges as an alternative.

Post-World War II, standardized testing expanded globally alongside mass education initiatives and economic reconstruction. In the United States, the GI Bill of 1944 enabled millions of veterans to pursue higher education, necessitating broader use of exams such as the SAT for selection amid enrollment surges from under 1.5 million students in 1940 to over 2.6 million by 1950. Internationally, centralized education reforms in many countries led to widespread adoption of national high-stakes exams for secondary and tertiary sorting, with the prevalence of such systems rising from limited use in 1960 to near-universal in many countries by the 1990s. This era solidified exams as tools for meritocratic allocation in diverse contexts, though debates persist over their validity in capturing complex abilities beyond test performance.

Contemporary Applications

Educational and Academic Testing

Exams serve as primary tools for assessing student knowledge and skills in contemporary K-12 and higher education systems, enabling evaluation of learning outcomes and institutional accountability. In K-12 settings, high-stakes standardized tests, mandated by laws such as the No Child Left Behind Act of 2001 and its successor, the Every Student Succeeds Act of 2015, measure proficiency in core subjects like mathematics and reading to identify underperforming schools and inform accountability decisions. These assessments aim to drive instructional improvements, though empirical data indicate mixed impacts, including curriculum narrowing toward tested content without consistent gains in broader skills.

In higher education, final exams typically constitute 25-30% of overall course grades, testing cumulative knowledge and its application under timed conditions to simulate real-world pressures. Cumulative final exams, which cover material from the entire term, yield approximately 4.91% higher scores on subsequent assessments compared to non-cumulative formats, demonstrating enhanced retention through retrieval practice, known as the testing effect. Standardized admissions tests such as the SAT correlate with first-year college GPA at 0.37, providing incremental validity beyond high school grades in predicting academic success. Similarly, the GRE predicts graduate student outcomes such as GPA across disciplines, with meta-analyses confirming its utility despite debates over equity. While critics argue high-stakes exams induce anxiety and incentivize rote memorization, evidence from controlled experiments shows testing reinforces long-term learning more effectively than restudying alone. In professional graduate programs, standardized tests maintain predictive validity for performance, countering claims of bias with data from longitudinal studies. However, over-reliance on exams can overlook non-cognitive factors, prompting hybrid approaches incorporating portfolios or projects, though pure exam formats remain dominant for their objectivity and scalability in large-scale evaluations.
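The incremental validity mentioned above is conventionally expressed as the gain in multiple correlation when a test score is added to high school grades; a generic statement of the standard two-predictor formula (symbols only, no figures drawn from the studies cited):

$$
R_{Y\cdot 12} \;=\; \sqrt{\frac{r_{Y1}^{2} + r_{Y2}^{2} - 2\,r_{Y1}\,r_{Y2}\,r_{12}}{1 - r_{12}^{2}}},
\qquad
\Delta R \;=\; R_{Y\cdot 12} - r_{Y1},
$$

where $r_{Y1}$ is the correlation of high school GPA with first-year GPA, $r_{Y2}$ the correlation of the admissions test score, $r_{12}$ the correlation between the two predictors, and $\Delta R$ the incremental validity contributed by the test.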

Professional Licensing and Certification

Professional licensing examinations are standardized tests administered by government regulatory bodies or authorized organizations to assess whether candidates meet the minimum competency standards required to legally practice in regulated occupations. These exams evaluate the knowledge and skills essential for safe and effective performance, with the primary aim of safeguarding public health, safety, and welfare by preventing unqualified individuals from entering the profession. In contrast, professional certifications are often voluntary credentials issued by private entities to denote specialized expertise, though they may be mandated by employers or serve as prerequisites for licensure in some fields.

In the United States, occupational licensing, which typically culminates in passing a licensing exam, applies to approximately 20 percent of the workforce as of recent estimates, a figure that has grown from roughly 5 percent in the 1950s. Over 1,000 occupations across states require such licensure, ranging from high-risk fields like medicine and law to lower-risk ones such as cosmetology and hair braiding. Licensing exams vary by profession; for instance, the United States Medical Licensing Examination (USMLE) consists of three steps testing basic science, clinical knowledge, and patient management, with first-time pass rates exceeding 90 percent for Steps 1 and 2 among U.S. graduates. The bar examination for lawyers, often the Uniform Bar Exam in adopting states, assesses legal knowledge and reasoning, with pass rates typically ranging from 60 to 70 percent depending on jurisdiction and candidate background. Other prominent examples include the Principles and Practice of Engineering (PE) exam for licensed engineers, which evaluates advanced application of engineering principles and has pass rates around 60-70 percent across disciplines like civil and electrical engineering. The Certified Public Accountant (CPA) exam, required for accounting licensure, covers auditing, business environment, financial reporting, and regulation, with average pass rates of 45-60 percent per section. For nursing, the National Council Licensure Examination (NCLEX-RN) tests entry-level clinical judgment, achieving first-time pass rates of about 85-90 percent for U.S.-educated candidates. These exams often incorporate multiple-choice questions, simulations, and practical components, with scores scaled to ensure reliability across administrations.

Empirical analyses of licensing exams' efficacy reveal mixed outcomes: while they demonstrably filter for basic competence in complex fields like medicine, where errors carry high stakes, broader studies indicate limited evidence of improved quality or safety in many licensed occupations, alongside entry barriers that reduce labor mobility and elevate prices. For example, licensing correlates with higher wages for license holders—up to 15 percent premiums—but also restricts job switching across states and disproportionately affects lower-income and minority workers seeking entry. Critics argue that exam requirements, intended as competence signals, sometimes prioritize incumbent protection over consumer benefit, as evidenced by the licensing of low-risk trades without proportional quality gains. Internationally, similar systems exist, such as the European Union's mutual recognition directives for professional qualifications, which often hinge on standardized exams, though enforcement varies by member state.

Selection for Admissions and Employment

Standardized tests such as the SAT and ACT play a central role in college admissions by evaluating cognitive abilities relevant to academic success. Empirical research demonstrates that these tests predict first-year grade point average (FYGPA) with validity coefficients typically between 0.3 and 0.5, with predictive power increasing to approximately 0.4-0.6 when combined with high school GPA. They also forecast degree completion and long-term academic outcomes, maintaining consistent validity across racial and socioeconomic groups, which refutes assertions of inherent bias.

In employment selection, cognitive ability tests—often structured as timed exams assessing reasoning, problem-solving, and knowledge application—emerge as the strongest single predictor of job performance. Meta-analyses report uncorrected validity coefficients of 0.51 for general mental ability (GMA) against supervisory ratings of performance, outperforming other methods like interviews (0.38) or years of experience (0.18). This correlation holds across job levels and industries, with GMA explaining up to 25-30% of the variance in outcomes, reflecting its causal link to learning, adaptability, and the handling of task complexity.

Civil service examinations, adapted from historical merit systems, remain a cornerstone of government hiring in many nations, prioritizing exam scores to minimize patronage and ensure competence. Studies of digitized historical records from 19th- and 20th-century bureaucracies show that replacing patronage with exam-based selection improved administrative efficiency and reduced corruption, while contemporary analyses in systems like India's civil services reveal moderate correlations between exam performance and on-the-job effectiveness. In the U.S. federal government, exams such as the Professional and Administrative Career Examination (PACE) have facilitated entry-level hiring, though their scope is limited, accounting for only about 5% of hires in the late 1970s; modern equivalents continue to be validated against training success and productivity. Despite criticisms questioning the primacy of cognitive tests amid evolving job demands, replicated meta-analyses affirm their enduring utility, with validity stable even as experience accumulates, underscoring exams' role in meritocratic selection over subjective alternatives prone to bias.
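As a point of arithmetic, the share of variance a validity coefficient accounts for is its square, which is how the variance figure above follows from the correlation:

$$ r = 0.51 \;\Rightarrow\; r^{2} \approx 0.26, $$

i.e., roughly 26% of the variance in supervisory performance ratings, consistent with the 25-30% range when corrected coefficients are used.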

Specialized Uses in Intelligence, Immigration, and Competitions

Exams play a critical role in the recruitment and selection processes of intelligence agencies, where they assess candidates' cognitive abilities, analytical skills, and suitability for handling sensitive information. Agencies such as the U.S. Central Intelligence Agency (CIA) incorporate aptitude tests and structured interviews as part of initial screening to evaluate the problem-solving and reasoning capabilities essential for intelligence analysis. Similarly, the Federal Bureau of Investigation (FBI) administers a Phase I computerized test lasting approximately three hours, comprising cognitive, behavioral, and situational-judgment components to predict performance in investigative roles. These assessments prioritize merit-based selection, often supplemented by polygraph examinations to verify truthfulness, though the latter focus more on background validation than academic knowledge.

In immigration and naturalization contexts, exams ensure applicants demonstrate basic integration into the host society's language and civic framework. The United States Citizenship and Immigration Services (USCIS) requires naturalization candidates to pass an English language test and a civics examination, unless exempted by age or disability. As of October 20, 2025, the updated civics test draws from a pool of 128 questions, presenting 20 orally to applicants, who must correctly answer at least 12 to pass, emphasizing historical facts, government structure, and rights under the U.S. Constitution. This format replaced the prior version, which drew 10 questions from a pool of 100, to enhance rigor while maintaining accessibility, with pass rates historically around 90% for prepared applicants.

Competitive exams determine eligibility and rankings in academic and professional contests, selecting top performers for scholarships, olympiads, or elite opportunities based on demonstrated excellence. The National Merit Scholarship Qualifying Test (PSAT/NMSQT), taken by over 1.5 million U.S. high school students annually in October, serves as an initial qualifier for merit-based awards, with semifinalists advancing based on scores. In international academic competitions, such as the International Mathematical Olympiad, national qualifying exams filter participants through multi-stage written tests assessing advanced problem-solving under time constraints. Professional equivalents, such as DECA's career cluster events, combine multiple-choice exams with case studies to evaluate applied business knowledge in simulated scenarios. These formats emphasize objective metrics over subjective evaluations, though preparation intensity can vary, with success correlating strongly with prior academic achievement and targeted practice.

Assessment Formats

Written Tests and Variations

Written tests constitute a core assessment format in examinations, wherein candidates generate responses in textual form—traditionally on paper, though increasingly via digital interfaces—to evaluate comprehension, analytical skills, and the application of knowledge. These tests differ from oral or performance-based methods by emphasizing written articulation, which permits structured evaluation of factual recall, reasoning, and synthesis under controlled conditions. Objective formats prioritize unambiguous scoring through fixed-answer options, while subjective variants allow open-ended expression, though the latter introduce greater inter-rater variability in scoring.

Objective written tests encompass multiple-choice questions (MCQs), true/false items, matching exercises, and completion tasks, each designed for high reliability via predetermined keys that minimize subjective judgment. MCQs, for instance, present a stem with four or five options, one of which is correct, enabling coverage of broad content in limited time; their scoring consistency yields test-retest reliabilities often exceeding 0.80, surpassing subjective counterparts. True/false questions test binary factual accuracy but risk guessing inflation without penalty adjustments, while matching items pair concepts with definitions, promoting associative recall. Completion items require filling blanks with precise terms, balancing brevity and specificity. These formats excel in large-scale administration, as machine-scorable versions reduce human error, though they may underassess higher-order skills such as evaluation or creation.

Subjective written tests include short-answer and extended-response essays, which demand constructed responses to demonstrate depth. Short-answer questions elicit concise explanations, typically a few sentences long, scored via rubrics that award partial credit for logical steps; they bridge objective efficiency with interpretive demands. Essays, conversely, require comprehensive arguments or analyses, often 300-1000 words, assessed on criteria such as coherence, integration of evidence, and argument quality—yet reliabilities hover around 0.50-0.70 due to grader subjectivity, necessitating multiple evaluators or scoring anchors for consistency. Hybrid variations blend formats, such as MCQs with explanatory justifications, to combine reliability with probes of reasoning.

Additional variations adapt written tests to context: timed administrations enforce completion within 1-3 hours to mirror real-world constraints; closed-book setups test retention, while open-book or take-home formats evaluate resource utilization and synthesis. Proctored exams ensure integrity through supervision, in contrast to unproctored online submissions vulnerable to cheating, as evidenced by detection rates below 10% on some platforms without verification. These adaptations persist because of their practicality, with objective tests dominating high-stakes uses like licensing—e.g., over 90% of U.S. board exams employ MCQs—while subjective elements persist for disciplines requiring nuanced expression, such as literature or philosophy.
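The penalty adjustments for guessing mentioned above are usually implemented with the classical formula-scoring rule, stated here generically (the symbols are conventional, not tied to any particular exam):

$$ S \;=\; R - \frac{W}{k - 1}, $$

where $R$ is the number of items answered correctly, $W$ the number answered incorrectly (omitted items are ignored), and $k$ the number of options per item; under purely random guessing the expected penalty offsets the expected gain, so blind guessing yields no expected advantage.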

Oral and Performance-Based Exams

Oral examinations, also known as viva voce assessments, involve direct verbal interaction between an examiner and examinee to evaluate knowledge, reasoning, and communication skills through questioning and response. These formats originated in ancient educational practices and were predominant in European universities, such as Oxford and Cambridge, where 16th-century exams were conducted orally in Latin before public audiences. By the 19th century, a shift toward written exams occurred for greater objectivity and scalability, as advocated by reformers such as Horace Mann in 1845, who criticized annual oral recitations for their inconsistency. Despite this, oral exams persist in higher education for thesis and dissertation defenses and in fields requiring nuanced judgment, such as medicine and law, where structured formats enhance student motivation and performance outcomes. In professional licensing, oral components assess applied competencies beyond rote memorization; for instance, postgraduate medical examinations use structured vivas to test clinical reasoning, achieving high validity and reliability when standardized protocols are employed. Reliability concerns arise from examiner subjectivity and variability, though standardization and rubrics mitigate these, yielding inter-rater consistency comparable to written tests in controlled settings. Empirical studies indicate that oral assessments better predict real-world application in interactive domains but demand careful design to avoid bias from examiner fatigue or cultural differences in verbal expression.

Performance-based exams require examinees to demonstrate practical skills through tasks simulating real-world conditions, such as simulations, laboratory exercises, or physical maneuvers, rather than theoretical recall. These are integral to licensing in high-stakes professions: aviation certifications involve practical flight evaluations, while healthcare uses Objective Structured Clinical Examinations (OSCEs) with standardized patients to score procedural proficiency. Driving tests mandate observed vehicle operation, and military fitness assessments, like the U.S. Army Physical Fitness Test, measure endurance via timed runs and repetitions. Reliability in performance assessments improves with detailed rubrics and multiple raters, as evidenced by studies showing generalizable scores across tasks when error sources like rater inconsistency are minimized. They outperform knowledge-only tests in predicting on-the-job effectiveness, particularly for skill-based roles, though logistical demands—such as equipment needs and trained observers—limit scalability compared to written formats. In educational contexts, performance tasks in clinical and vocational training correlate strongly with criterion measures of functional ability, supporting their use for causal evaluation of applied learning.

Digital, Adaptive, and Emerging Formats

Computer-based testing (CBT), also known as digital examination, involves administering assessments via computers or online platforms, enabling automated grading, immediate result delivery, and scalable administration for large cohorts. This format gained prominence as standardized testing organizations adopted it for efficiency, such as in professional certifications, where it reduces logistical costs compared to paper-based alternatives. Empirical studies indicate CBT can yield comparable or superior measurement precision when designed properly, though challenges persist, including technical glitches, unequal access due to disparities in devices and connectivity, and potential mode effects in which student scores drop by up to 0.2-0.3 standard deviations in CBT versus paper formats, as observed in South Carolina's statewide transition in 2015-2019.

Computerized adaptive testing (CAT) represents an advanced digital subset, dynamically selecting question difficulty based on real-time respondent performance to optimize information gain per item, typically requiring 30-50% fewer questions than fixed-form tests for equivalent reliability. Originating from theoretical foundations in the 1940s and computationally feasible by the 1970s through item response theory, CAT has been implemented in high-stakes exams such as the Graduate Record Examination (GRE) since 1994 and in medical licensing tests, demonstrating improved measurement efficiency and reduced testing time without compromising validity. A 2024 meta-analysis of CAT effects confirmed its benefits for score precision across diverse examinee groups, though performance differentials appear for students with special educational needs, suggesting calibration adjustments for equity.

Emerging formats integrate artificial intelligence (AI) and advanced technologies to address integrity and personalization challenges in remote settings. AI-proctoring systems, employing facial recognition, gaze tracking, and behavioral analytics, have proliferated post-2020, with the global online exam proctoring market projected to reach $2.83 billion by 2031, driven by automated flagging of anomalies like multiple faces or unauthorized devices. These tools enable scalable remote assessments while minimizing human oversight, as evidenced by platforms reducing proctor dependency by over 90% in automated modes, though false positives and privacy concerns necessitate rigorous validation against empirical cheating-detection benchmarks. Experimental integrations of blockchain for tamper-proof certification and virtual reality (VR) for immersive performance simulations are under exploration in niche applications like professional training, but widespread adoption remains limited by cost and evidentiary gaps as of 2025. Overall, these innovations prioritize causal mechanisms of accurate ability estimation over traditional fixed formats, yet require ongoing psychometric scrutiny to ensure robustness across populations.
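A minimal sketch of the adaptive selection loop described above, using a Rasch (one-parameter) item response model; the item bank, stopping rule, and estimation routine here are simplifying assumptions for illustration, not the procedure of any specific operational exam:

```python
import math
import random

def p_correct(theta, b):
    """Rasch (1PL) probability that an examinee of ability theta answers
    an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def administer_cat(item_bank, true_theta, max_items=20, se_target=0.35):
    """Adaptive loop: pick the most informative remaining item (for the Rasch
    model, the one whose difficulty is closest to the current ability
    estimate), simulate a response, then re-estimate ability by
    Newton-Raphson on the log-likelihood of all responses so far."""
    remaining = list(item_bank)      # item difficulties
    responses = []                   # (difficulty, answered_correctly) pairs
    theta = 0.0                      # provisional ability estimate
    info = 0.0                       # Fisher information of the estimate
    for _ in range(max_items):
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        correct = random.random() < p_correct(true_theta, b)  # simulated examinee
        responses.append((b, correct))
        for _ in range(10):          # a few Newton-Raphson steps
            grad = sum((1.0 if c else 0.0) - p_correct(theta, d) for d, c in responses)
            info = sum(p_correct(theta, d) * (1.0 - p_correct(theta, d)) for d, c in responses)
            if info < 1e-9:
                break
            theta = max(-4.0, min(4.0, theta + grad / info))  # keep estimate bounded
        if info > 1e-9 and 1.0 / math.sqrt(info) < se_target:
            break                    # stop once the standard error is small enough
    return theta, len(responses)

if __name__ == "__main__":
    random.seed(0)
    bank = [d / 10.0 for d in range(-30, 31)]   # difficulties from -3.0 to +3.0
    estimate, items_used = administer_cat(bank, true_theta=1.2)
    print(f"estimated ability {estimate:.2f} after {items_used} items")
```

Operational systems build on the same loop but add exposure control, content balancing, and calibrated multi-parameter item models.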

Preparation and Strategies

Evidence-Based Study Techniques

Distributed practice, also known as spaced repetition, involves scheduling study sessions over increasing intervals rather than massing them in a single cramming period, leading to superior long-term retention compared to massed practice. A meta-analysis of 242 studies on learning techniques reported an effect size of d=0.70 for distributed practice, indicating robust benefits across age groups, materials, and retention intervals, particularly when spacing aligns with the desired retention interval before the exam. This technique leverages the spacing effect, whereby repeated retrieval strengthens memory traces through consolidation processes, as evidenced in experiments showing doubled retention rates after spaced reviews versus immediate repetition.

Practice testing, or active recall, requires actively retrieving information from memory through self-quizzing or low-stakes tests, outperforming passive rereading for both immediate and delayed exam performance. The same review assigned it the highest utility among the techniques examined, with d=0.74, effective across formats such as free-recall or cued questions and enhanced by immediate feedback to correct errors. Laboratory and classroom studies, such as those using flashcards or past exam questions, demonstrate that practice testing promotes deeper encoding and metacognitive monitoring, reducing overconfidence in weak areas and improving scores by 10-20% on final assessments.

Interleaved practice mixes different topics or problem types within a study session, in contrast to blocked practice of one type at a time, and fosters better discrimination among problem types and application to novel problems. Meta-analytic evidence yields a moderate effect size of d=0.53, with stronger gains in procedural skills like math or science where distinguishing categories is key, as interleaving encourages reliance on contextual cues over rote familiarity. A separate meta-analysis of interleaving confirmed benefits for category learning (Hedges' g=0.67), though effects diminish with highly similar materials and require initial guidance to avoid confusion in novices.

Less effective techniques, such as highlighting key text or summarization, show limited utility (d≈0.44-0.50), primarily aiding surface-level recall rather than comprehension or transfer, often failing without extensive training and prone to producing illusory mastery. Combining high-utility methods—such as active recall with spacing and interleaving—yields synergistic effects, as supported by cognitive models emphasizing retrieval strength and contextual variability for durable learning. Empirical caveats include greater benefits for factual and near-transfer tasks than for far transfer, with lower-achieving students showing amplified gains from structured implementation.
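As an illustration of how distributed practice can be combined with practice testing, the sketch below lays out expanding review intervals before an exam date, with each review intended as a self-quiz; the base gap and growth factor are arbitrary illustrative choices, not values prescribed by the studies cited above:

```python
from datetime import date, timedelta

def schedule_reviews(first_study, exam_day, base_gap_days=1, growth=2.0):
    """Expanding-interval schedule: gaps of roughly 1, 2, 4, 8, ... days
    after the first study session, stopping before the exam date."""
    reviews = []
    gap = base_gap_days
    next_review = first_study + timedelta(days=gap)
    while next_review < exam_day:
        reviews.append(next_review)
        gap = max(1, round(gap * growth))
        next_review += timedelta(days=gap)
    return reviews

if __name__ == "__main__":
    plan = schedule_reviews(date(2026, 3, 1), date(2026, 4, 15))
    for i, day in enumerate(plan, start=1):
        print(f"self-quiz {i}: {day.isoformat()}")
```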

Psychological and Motivational Factors

Test anxiety, characterized by cognitive and emotional distress before or during exams, impairs performance by overloading working memory and disrupting concentration. A 30-year meta-analysis of over 100 studies found a significant negative correlation between test anxiety and educational outcomes, including exam scores, with effect sizes indicating moderate interference across standardized tests and grade point averages. High anxiety levels exacerbate this through physiological arousal, such as elevated heart rate, which diverts resources from task execution, as evidenced in studies linking it to reduced processing efficiency under timed conditions.

Self-efficacy, an individual's belief in their capacity to succeed in exam-related tasks, predicts outcomes more robustly than general motivation in longitudinal analyses. Studies in introductory courses showed that self-efficacy at mid-semester explained variance in final grades beyond initial motivation levels, with reciprocal effects in which early performance boosts subsequent efficacy. Higher self-efficacy correlates with better regulation of study behaviors and resilience to setbacks, countering anxiety's effects in models integrating achievement emotions.

Procrastination, often rooted in motivational deficits such as low task value or fear of failure, is negatively associated with exam performance via delayed preparation and incomplete mastery. A meta-analysis confirmed this inverse relationship, moderated by measurement type, with procrastinators showing lower GPAs due to rushed cramming rather than spaced retrieval. Active procrastination, involving intentional delay for incubation, yields neutral or positive effects in some contexts, but passive forms predominate and are linked to heightened stress and poorer retention during high-stakes assessments.

Intrinsic motivation, driven by interest in the material rather than external rewards, sustains deeper engagement and superior exam results compared to extrinsic pressures alone. Empirical reviews highlight that expectancy-value frameworks, in which students perceive high task value and likelihood of success, forecast higher achievement scores, with autonomous motivation enhancing persistence through autonomy support. Motivational strategies, such as reframing exam goals for personal relevance, maintain effort during preparation, as demonstrated in studies where such strategies mediated sustained study time and improved scores over semesters. Conscientiousness and perfectionism interact here, sometimes fueling over-preparation but often amplifying anxiety, underscoring the need for balanced self-regulation to optimize outcomes.

Validity, Reliability, and Efficacy

Empirical Measures of Predictive Accuracy

Standardized exams, particularly those assessing cognitive abilities, demonstrate predictive validity through correlation coefficients with subsequent performance metrics, such as first-year college grade point average (FYGPA) and job proficiency. Meta-analyses consistently report moderate to strong associations, with uncorrected validity coefficients typically ranging from 0.30 to 0.50 for academic outcomes and around 0.51 for occupational performance. These measures account for factors like range restriction in applicant pools, with corrected correlations often exceeding 0.60, indicating substantial explanatory power beyond chance. In educational contexts, admissions tests such as the SAT and ACT correlate with FYGPA at approximately 0.35 to 0.40, adding incremental validity over high school GPA (HSGPA), which itself yields correlations of 0.47. A meta-analysis of SAT validity found predictive strength equivalent to HSGPA (r=0.37) for first-year success, with combined use enhancing accuracy by 15% or more. For graduate admissions, the GRE predicts graduate GPA with similar moderate correlations (r≈0.30-0.40), outperforming undergraduate GPA in some domains while complementing it in others. These patterns hold across institutions, though validity attenuates slightly for retention and degree completion, where HSGPA edges out tests due to its aggregation of sustained effort.
Predictor | Criterion | Uncorrected Validity (r)
SAT/ACT Composite | First-Year College GPA | 0.35–0.40
High School GPA | First-Year College GPA | 0.47
GRE | Graduate GPA | 0.30–0.40
General Mental Ability Tests | Job Performance | 0.51
General Mental Ability Tests | Training Success | 0.56
For occupational outcomes, cognitive ability tests—often administered as exams in selection processes—emerge as the strongest single predictor of job performance across professions, with meta-analytic evidence from over 85 years of data showing an operational validity of 0.51, rising to 0.65 when corrected for measurement error and range restriction. This surpasses other predictors such as work experience (r=0.18) or interviews (r=0.27 uncorrected), and holds stable across job complexity levels and experience durations. UK-specific meta-analyses replicate these findings, with general mental ability (GMA) validities of 0.54 for job performance and 0.63 for training success. Long-term outcomes, including earnings and career advancement, further align with early test scores, as cognitive measures forecast educational attainment and health metrics that underpin professional success. These correlations reflect causal links via the cognitive demands of learning and work, though critics in academic circles sometimes understate them amid equity concerns; the data nonetheless persist across diverse samples and controls for socioeconomic factors. Incremental gains from combining exams with other metrics underscore their role in merit-based forecasting, with no evidence of diminished validity over time despite grade inflation in non-test measures.
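The corrections for range restriction and measurement error referred to above are typically the classical adjustments; stated generically (the symbols are conventional):

$$
r_{c} \;=\; \frac{u\,r}{\sqrt{1 + r^{2}\,(u^{2} - 1)}}, \qquad u = \frac{\sigma_{\text{applicant}}}{\sigma_{\text{selected}}},
$$

where $r$ is the correlation observed in the selected (range-restricted) group and $u$ the ratio of the predictor's standard deviation in the full applicant pool to that in the selected group; a further correction for criterion unreliability divides the result by $\sqrt{r_{yy}}$, the square root of the criterion measure's reliability.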

Strengths in Objectivity and Merit Assessment

Standardized exams enhance objectivity by employing predefined scoring rubrics and formats such as multiple-choice questions, which yield reliability coefficients often exceeding 0.90, minimizing discrepancies among evaluators compared to subjective methods like essay grading. This standardization ensures that performance is measured against uniform criteria, reducing the influence of personal biases, cultural preferences, or evaluator fatigue that plague holistic assessments. Empirical analyses confirm that objective items, when properly constructed, exhibit low susceptibility to construct-irrelevant variance, providing a consistent gauge of cognitive abilities across diverse test-takers.

In merit assessment, exams facilitate the identification of individuals with the requisite knowledge and skills through controlled, proctored conditions that isolate demonstrated ability from external variables such as socioeconomic networks or subjective recommendations. Predictive validity studies demonstrate that scores on tests like the SAT correlate with first-year GPA at rates of 0.3 to 0.5, with combined models incorporating high school GPA enhancing accuracy to explain up to 25% of the variance in academic outcomes. These correlations hold across institutions, underscoring exams' utility in forecasting success in merit-based domains such as higher education and professional licensure, where competence directly predicts productivity.

Exams promote meritocratic selection by prioritizing demonstrable competence over non-cognitive factors, enabling broader access to opportunities for high performers irrespective of background, as evidenced by historical expansions in admissions following test implementation. Unlike interviews or portfolios, which can favor articulate or well-connected candidates, standardized formats level the field by focusing on verifiable outputs, with research indicating they outperform alternative metrics in equitably ranking candidates for competitive fields. This approach aligns with causal mechanisms in which tested skills contribute to real-world efficacy, as validated by longitudinal data linking exam performance to career attainment metrics.
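The reliability coefficients cited above are commonly internal-consistency estimates such as Cronbach's alpha, computed over an examinee-by-item score matrix; a minimal sketch with toy data (the responses are hypothetical, purely for illustration):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix with rows = examinees and
    columns = items (e.g., 1 = correct, 0 = incorrect)."""
    n_items = len(scores[0])

    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_vars) / total_var)

if __name__ == "__main__":
    # Hypothetical 0/1 responses for five examinees on four items.
    responses = [
        [1, 1, 1, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
    ]
    print(f"alpha = {cronbach_alpha(responses):.2f}")
```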

Limitations Compared to Alternative Methods

Exams, as summative assessments, primarily measure recall and performance under timed conditions, which can undervalue sustained problem-solving and application skills better captured by alternatives such as project-based assessments or portfolios. For instance, traditional exams often prioritize lower-order cognitive processes like memorization, limiting their ability to assess higher-order skills such as critical analysis or creativity, whereas performance-based methods demonstrate real-world application over time. Empirical comparisons in educational settings have shown that portfolio assessments correlate more strongly with competencies such as professional attitudes and continuous development, areas where exams provide minimal insight owing to their format constraints.

High-stakes exams introduce additional validity challenges through factors like test anxiety and processing speed, which do not reliably reflect underlying knowledge or ability, unlike continuous assessments that allow multiple opportunities for demonstration and feedback. Research indicates that time-limited testing reduces inclusivity and equity, as faster test-takers may outperform others despite equivalent mastery, a disparity less prevalent in untimed alternatives like extended projects. In one study replacing traditional exams with collaborative projects in university courses, student outcomes improved significantly, suggesting exams may constrain deeper engagement compared to methods fostering collaboration and creativity.

Furthermore, exams encourage cramming and extrinsic motivation, potentially hindering long-term retention and broad skill development, in contrast to formative alternatives that integrate ongoing feedback to promote intrinsic learning. High-stakes formats have been linked to curriculum narrowing, where instruction focuses on testable content at the expense of interdisciplinary or practical skills better evaluated through portfolios or authentic tasks. While exams offer efficiency in scoring, their snapshot nature yields lower predictive power for non-academic outcomes such as adaptability, where alternative methods provide richer evidence of persistent effort.

Criticisms and Controversies

Claims of Cultural and Socioeconomic Bias

Critics have argued that standardized exams disadvantage students from lower socioeconomic backgrounds because of disparities in access to quality schooling, test preparation, and resources, which correlate strongly with score outcomes. Studies show that SAT scores rise monotonically with family income, with test-takers from households in the top income quintile scoring an average of 400 points higher than those from the bottom quintile. Similarly, among university applicants, family income, parental education, and race together accounted for over 40% of the variance in SAT/ACT scores as of 2020, up from 25% in 1994, a trend attributed by proponents of bias claims to unequal preparatory opportunities rather than innate ability. These gaps persist even after adjustments, leading some researchers to contend that exams encode socioeconomic privilege by rewarding familiarity with testing formats often unavailable to low-income students.

Cultural bias claims posit that exam content embeds assumptions from dominant Western or middle-class norms, such as vocabulary or analogies drawn from specific cultural contexts, disadvantaging non-native speakers or minority students. For example, historical analyses trace standardized testing's origins to early 20th-century eugenics movements, where tests were used to justify racial hierarchies, fueling modern assertions that residual item biases persist when abstract reasoning is assessed through culturally loaded prompts. Differential item functioning (DIF) analyses in some studies have identified items on which racial or ethnic groups perform differently even at equal ability levels, suggesting potential cultural loading in standardized assessments of achievement. Proponents, including education advocacy groups, argue this contributes to persistent racial score gaps, with Black and Hispanic students averaging 150-200 points lower on the SAT than white peers, interpreted as evidence of systemic exclusion rather than preparation deficits.

However, psychometric evaluations counter that modern exams undergo rigorous debiasing processes, including DIF reviews and culture-reduced item design, rendering inherent content bias minimal compared to environmental factors such as schooling quality. Longitudinal data indicate that socioeconomic correlations with scores largely reflect pre-existing academic skill differences, as the SAT's predictive validity for GPA holds across SES strata and exceeds that of high school grades alone. Claims of bias are often critiqued for conflating outcome disparities—driven by causal chains of family investment in education—with flaws in test construction, as gaps narrow with equivalent preparation but are not eliminated entirely, which such reviews attribute to non-cultural cognitive variance. Empirical reviews emphasize that while access inequities amplify disparities, exams provide a relatively objective merit signal compared with subjective alternatives like essays, which also correlate with socioeconomic status through stylistic advantages.
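The DIF analyses mentioned above are frequently carried out with the Mantel-Haenszel procedure, which compares item performance between a reference and a focal group after matching examinees on total score; a minimal sketch with hypothetical counts (the data and the interpretation note are illustrative only):

```python
import math

def mantel_haenszel_ddif(strata):
    """Mantel-Haenszel D-DIF for a single item. `strata` is a list of 2x2
    tables, one per matched score level, each given as
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect)."""
    num = 0.0  # sum of A*D / N over strata
    den = 0.0  # sum of B*C / N over strata
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    common_odds_ratio = num / den
    return -2.35 * math.log(common_odds_ratio)  # ETS delta-scale D-DIF

if __name__ == "__main__":
    # Hypothetical counts at three matched score levels.
    tables = [
        (30, 20, 25, 25),
        (40, 10, 33, 17),
        (45, 5, 40, 10),
    ]
    print(f"MH D-DIF = {mantel_haenszel_ddif(tables):.2f}  "
          "(values near 0 indicate little DIF; larger magnitudes are flagged for review)")
```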

Impacts on Student Well-Being and Learning

High-stakes exams are associated with elevated levels of test anxiety among students, which correlates with impaired cognitive performance during assessments. Meta-analyses indicate that students experiencing higher anxiety exhibit reduced performance, with anxiety interfering with working memory and attention and leading to scores approximately 0.2 to 0.5 standard deviations lower than those of low-anxiety peers. Empirical studies report that up to 40% of students in high-pressure testing environments, such as university entrance exams, experience significant pre-exam stress manifesting as sleep disturbances, elevated stress-hormone levels, and symptoms akin to burnout. This anxiety contributes to broader declines in well-being, including increased risks of depression and diminished self-esteem, particularly when exams determine progression or admission. Longitudinal data from student cohorts show that persistent exposure to high-stakes testing exacerbates psychological distress, with students reporting higher incidences of mood instability and workload overwhelm compared to low-stakes assessment groups. However, moderate anxiety can serve as a motivator for preparation in some individuals, prompting enhanced study efforts without overwhelming cognitive resources, though this effect diminishes under extreme stakes.

Regarding learning outcomes, frequent testing promotes long-term retention through the "testing effect," whereby retrieval practice during exams strengthens memory more effectively than repeated studying alone, yielding retention gains of 10-20% in controlled experiments across subjects. Yet high-stakes formats often incentivize superficial cramming and "teaching to the test," prioritizing rote memorization over conceptual understanding, as evidenced by reviews showing minimal transfer to untested skills and effect sizes on deeper learning below 0.1 standard deviations. Such practices correlate with reduced intrinsic motivation and engagement, as students and educators focus narrowly on exam formats rather than broad application of knowledge, per analyses of curriculum narrowing in tested domains. Overall, while exams provide structured feedback that can enhance achievement in motivated learners, the high-stakes variant amplifies costs without proportionally advancing learning, as systematic reviews highlight opportunity costs such as foregone creative pedagogies. Interventions such as optional retakes have demonstrated anxiety reductions of 15-25% without unduly inflating scores, suggesting pathways to mitigate harms while preserving assessment utility.

Debates on High-Stakes Testing and Reforms

Proponents of high-stakes accountability testing, such as the regime underpinning the No Child Left Behind Act of 2001, contend that linking test outcomes to consequences like school funding or teacher evaluations enforces accountability and elevates educational standards, potentially driving short-term gains in measured skills. Empirical analyses, however, reveal mixed results; for instance, a 2012 review across multiple states found no consistent evidence of sustained improvements in student achievement from high-stakes policies, with some isolated math gains but negligible effects in reading or other subjects. Similarly, studies of Chicago's system after its 1996 reforms indicated modest test score increases but questioned their translation into broader learning outcomes, attributing rises partly to narrowing instruction toward tested content.

Critics argue that high-stakes mechanisms incentivize "teaching to the test," inflating scores without genuine skill enhancement and distorting curricula by de-emphasizing untested areas such as the arts, civics, or physical education. Longitudinal data support this, showing score inflation uncorrelated with external assessments like NAEP, suggesting superficial preparation over deep understanding. Furthermore, such testing correlates with heightened student anxiety and reduced motivation, with surveys indicating lower confidence and engagement among elementary pupils facing promotion-linked exams. Opponents also highlight inequitable impacts, whereby low-income or minority students experience amplified pressure without proportional benefits, exacerbating dropout risks in high-failure jurisdictions.

Reform efforts have sought to mitigate these issues by de-emphasizing reliance on a single test. The Every Student Succeeds Act of 2015 replaced NCLB's rigid proficiency mandates with state-designed systems incorporating multiple indicators, such as growth metrics and school quality measures, allowing flexibility in identifying underperforming schools without uniform sanctions. States like Massachusetts, following standards-based reforms in 1993, integrated high-stakes elements with portfolio assessments, yielding higher NAEP scores but prompting debates over whether gains stemmed from testing pressure or concurrent investments. Advocacy for alternatives, including performance-based assessments and opt-out provisions, has grown; by 2023, opt-out movements in states like New York reported increased parental refusals of exams such as the Regents, correlating with policy shifts toward formative evaluations intended to enhance engagement without compromising learning. These reforms prioritize causal linkages between assessment and instruction, aiming for validity over punitive stakes, though empirical validation remains ongoing amid source biases in pro-reform academic literature favoring reduced testing.

Cheating and Integrity Challenges

Common Methods and Empirical Prevalence

Common methods of exam cheating encompass both low-tech and technology-assisted techniques. The most frequently reported in-person method is copying answers from a peer's paper, often through collusion in which students allow others to view their work. Other traditional approaches include concealing unauthorized notes on body parts, clothing, or small objects like rulers; writing formulas on hands or arms; and creating distractions to facilitate copying. In proctored settings, impersonation by proxies or bribing administrators occurs less commonly but has been documented in high-stakes tests.

Technology has expanded cheating opportunities, particularly in online exams. Students frequently access external aids such as search engines, notes, or AI tools without permission; collaborate via messaging apps; or use secondary devices and virtual machines to evade proctoring software. Pre-loaded smartwatches, earpieces for receiving answers, and hacked exam platforms represent advanced electronic methods, though detection risks limit their use compared to simpler collusion.

Empirical prevalence estimates rely primarily on self-reported surveys, which may understate actual rates due to social desirability bias, though consistency across studies suggests widespread occurrence. Among university students, 50-70% admit to cheating at least once, with 43% specifically reporting exam-related cheating. The International Center for Academic Integrity estimates over 60% of undergraduates engage in some form of cheating, while high school students self-report test cheating at 64%. Self-reported online exam cheating averaged 44.7% in a systematic review of surveys, surging to 54.7% during the COVID-19 pandemic from 29.9% pre-pandemic, a rise attributed to reduced oversight. Unproctored exams saw initial cheating rates of 70%, dropping to 15% with explicit warnings and penalties. These figures vary by context, with higher rates in high-pressure or low-integrity environments, but peer-reviewed data consistently indicate that cheating affects a majority of students at some point.

Detection Technologies and Preventive Measures

Detection technologies for exam cheating primarily encompass AI-driven proctoring systems, which employ algorithms to monitor test-takers' eye movements, head orientation, and facial expressions in real time, flagging anomalies such as gaze aversion or multiple faces indicative of unauthorized assistance. These systems also integrate device detection to identify unauthorized secondary screens, phones, or virtual machines, with behavioral analytics assessing patterns such as unusual typing speeds or mouse movements that deviate from norms. A systematic review of AI-based proctoring analyzed over 50 studies, highlighting techniques such as computer vision for gaze tracking, though noting variability in accuracy due to environmental factors such as lighting or camera quality.

Biometric verification enhances identity assurance by capturing unique physiological traits, including facial geometry, fingerprints, or voice patterns, to confirm that the test-taker matches the registered individual at exam start and intermittently thereafter. For instance, facial recognition systems cross-reference live feeds against pre-submitted photos, achieving reported match rates above 99% in controlled settings, while voice biometrics analyze spectral features during oral responses to detect substitutions. Empirical evaluations, such as a 2020 field study of biometric authentication in distance exams, demonstrated reduced impostor fraud in high-stakes certifications, though false positives from masks or accents necessitated hybrid human-AI review.

The effectiveness of these technologies shows mixed empirical outcomes; a 2024 analysis of AI proctoring in undergraduate courses found no overall grade depression compared with non-proctored exams but course-specific reductions in one instance, attributing the variability to adaptive cheating tactics like screen mirroring. In graduate settings, remote proctoring correlated with statistically lower average scores (p<0.05), suggesting deterrence of dishonesty but raising questions about stress-induced performance impacts. Accuracy studies report AI flagging precision around 85-95% for overt violations, yet under 70% for subtle aids like earpieces, underscoring the need for multimodal integration.

Preventive measures emphasize structural and behavioral deterrents over reactive detection. Exam designs incorporating question banks with randomized selection and multiple versions reduce the efficacy of answer sharing, with empirical data from controlled trials showing 20-30% drops in cheating rates versus fixed formats. Assigned seating in proctored venues, as tested in a 2018 exam study involving more than 500 students, yielded a significant 15% decline in social cheating behaviors such as note-passing, per observational metrics. Non-invasive interventions, such as pre-exam integrity pledges or visual reminders of academic honesty without invasive monitoring, have demonstrated effects in randomized experiments; a 2023 study with 200 participants found a 25% reduction in self-reported cheating intentions compared with controls, linked to heightened moral awareness rather than fear. Institutional models promoting voluntary academic integrity training across programs, evaluated in a 2023 large-scale university analysis (n=10,000+ students), correlated with sustained 10-15% lower misconduct incidence over four years, outperforming isolated course modules by fostering systemic norms. These approaches prioritize causal factors such as opportunity reduction and value reinforcement, with literature reviews confirming their superiority to purely punitive measures in sustaining long-term compliance.
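The randomized question banks described above can be implemented by drawing a per-student sample from topic pools with a reproducible seed; a minimal sketch (the bank, quotas, and identifiers are hypothetical):

```python
import random

def build_exam_form(question_bank, per_topic, student_id, term_seed="2026-spring"):
    """Assemble one student's exam form by sampling questions from each topic
    pool and shuffling their order. Seeding on the student ID makes the form
    reproducible for later review while differing across students."""
    rng = random.Random(f"{term_seed}:{student_id}")
    form = []
    for topic, questions in question_bank.items():
        form.extend(rng.sample(questions, per_topic[topic]))
    rng.shuffle(form)
    return form

if __name__ == "__main__":
    bank = {
        "algebra":    [f"algebra Q{i}" for i in range(1, 11)],
        "geometry":   [f"geometry Q{i}" for i in range(1, 11)],
        "statistics": [f"statistics Q{i}" for i in range(1, 11)],
    }
    quotas = {"algebra": 3, "geometry": 3, "statistics": 2}
    for student in ("s001", "s002"):
        print(student, build_exam_form(bank, quotas, student))
```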

Societal and Cultural Impacts

Role in Meritocracy and Social Mobility

Standardized exams serve as mechanisms for meritocratic selection by evaluating cognitive abilities and acquired knowledge in a relatively objective manner, facilitating the allocation of opportunities based on demonstrated competence rather than familial connections or wealth alone. In systems relying on such assessments, high performers gain access to elite education and positions, theoretically decoupling advancement from inherited privilege. Empirical studies indicate that exam scores retain predictive power for economic outcomes even after controlling for parental income, suggesting they capture individual merit contributing to productivity and success.

Historically, China's imperial civil service examination system, operational from 605 CE until its abolition in 1905, exemplified exams' role in enhancing social mobility. By prioritizing scholarly achievement over aristocratic birth, the exams enabled individuals from lower strata to enter the bureaucracy, and successful candidates often rose to influential positions; records show that up to 20-30% of degree holders in certain eras originated from non-elite families, fostering empire-wide stability through merit-based governance. The system's emphasis on rigorous testing of the Confucian classics and administrative skills created pathways for upward movement, though success rates remained low—typically under 1% passing the highest level—and required substantial preparation accessible through education rather than wealth alone.

In contemporary contexts, exam-based admissions correlate with greater intergenerational mobility. A study of South Korea's 1974 shift from nationwide high school entrance exams to district-based quotas found that the change increased intergenerational income elasticity from 0.22 to 0.37, implying reduced mobility as local advantages perpetuated inequality; under the exam regime, high-ability students from disadvantaged areas could compete nationally, elevating their earnings potential by 10-15% relative to peers. Similarly, U.S. data reveal SAT scores to be strong predictors of adult earnings, with a one-standard-deviation increase in scores linked to 10-20% higher income in early career, independent of family background, underscoring exams' usefulness in identifying talent for high-value roles.

Despite socioeconomic gradients in exam preparation—evident in U.S. SAT data where top-decile income students score 400 points higher on average than bottom-decile peers—standardized tests outperform alternatives like high school GPA in forecasting college success and labor market performance, with normalized predictive slopes four times greater. This resilience highlights causal links between tested abilities and outcomes, as the abilities measured by exams drive productivity and economic value, thereby enabling mobility for those who excel irrespective of origin. Systems that de-emphasize exams risk entrenching ascriptive hierarchies, as seen in reduced mobility following such reforms in various jurisdictions.
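The intergenerational income elasticity cited above is the slope coefficient from a log-log regression of child income on parent income; in conventional notation:

$$ \ln y_{i}^{\text{child}} \;=\; \alpha + \beta \,\ln y_{i}^{\text{parent}} + \varepsilon_{i}, $$

where $\beta$ is the elasticity: a rise from 0.22 to 0.37, as in the South Korean study, means a child's income became more strongly tied to parental income, i.e., mobility declined.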

Influence on Educational Policy and Equity

Standardized testing has profoundly shaped educational policy by establishing frameworks that tie school funding, teacher evaluations, and interventions to student performance metrics. The No Child Left Behind Act (NCLB), enacted in 2001 and implemented from 2002, required annual standardized testing in reading and mathematics for grades 3–8 and once in high school, mandating adequate yearly progress (AYP) toward 100% proficiency by 2014, with sanctions such as corrective actions or closures for schools that fell short. This policy aimed to enforce uniform standards and close achievement gaps, resulting in measurable gains in state test scores, particularly in mathematics, with an average increase of 6.5 percentile points from 2002 to 2007 across affected grades. However, it also incentivized narrowing curricula toward tested subjects, reducing instructional time in non-tested areas such as science and social studies by up to 47% in some elementary schools. The Every Student Succeeds Act (ESSA) of 2015 replaced NCLB, preserving testing requirements but granting states greater flexibility in consequences, thereby moderating federal oversight while maintaining data-driven policy decisions.

Regarding equity, high-stakes exams have enabled policy interventions targeting underperforming subgroups, such as the disaggregated reporting under NCLB, which narrowed the black-white achievement gap in reading by about 50% between 2002 and 2009 through heightened focus on disadvantaged students. Yet empirical evidence indicates persistent socioeconomic disparities, as test scores correlate strongly with family income and parental education, with students from the highest SES backgrounds outperforming the lowest by 1–2 standard deviations on average in large-scale assessments like NAEP. High-stakes accountability has sometimes exacerbated inequities by prompting schools in low-income districts to counsel out or disenroll economically disadvantaged students to avoid failing AYP thresholds, as evidenced by a 2–3% drop in such enrollments following negative ratings in urban districts. Additionally, unequal access to preparation resources amplifies gaps, with high-SES students gaining up to 0.1–0.2 standard deviations from private tutoring while low-SES peers face barriers, contributing to widened performance differentials under pressure.

Policy responses to these equity challenges include affirmative efforts such as expanded access to free test preparation in some jurisdictions and score-optional admissions in higher education, yet causal analyses reveal that removing high-stakes tests does not proportionally benefit underrepresented groups without addressing underlying preparation deficits, as admissions shifts favor applicants with stronger extracurricular profiles often held by privileged students. Overall, while exams provide verifiable, comparable data for allocating resources toward equity gaps—such as targeted interventions yielding 5–10% score improvements in remedial programs—their use in policy risks perpetuating disparities absent complementary investments in early childhood education and family support, as SES-driven variance accounts for 40–60% of score differences in longitudinal studies.

Technological Advancements in Testing

Computerized adaptive testing (CAT), which tailors question difficulty to the test-taker's performance in real time using algorithms, emerged in the mid-20th century and gained prominence with early implementations by the Educational Testing Service in the 1960s. The National Assessment of Educational Progress conducted one of the first large-scale CAT programs in 1979. Examples include the Graduate Management Admission Test (GMAT), the SAT, and the National Council Licensure Examination (NCLEX), where CAT reduces test length while maintaining accuracy by selecting items from a calibrated item bank based on item response theory. This approach minimizes respondent burden and enhances precision, as demonstrated in medical education applications where it has improved efficiency since the 1990s.

Advances in artificial intelligence have been integrated into proctoring and grading for online exams, a shift that accelerated after 2020. AI proctoring systems employ facial recognition, eye tracking, and behavioral analysis via webcams to detect anomalies such as unauthorized gaze shifts or multiple faces, reducing reliance on human invigilators. Platforms such as those reviewed in systematic studies monitor tab changes, background noise, and environmental factors in real time, with adoption surging during remote-learning shifts. For grading, AI automates evaluation of subjective responses through natural language processing, providing instant feedback and scaling assessments for large cohorts, as seen in K-12 tools that analyze response patterns for formative purposes.

Digital platforms have replaced traditional paper-based exams with interactive, tablet- or laptop-administered assessments incorporating multimedia items and dynamic delivery. By 2024, these enabled personalized analytics and learning paths, with high-stakes exams leveraging automated scoring for objectivity. Post-2020 developments include enhanced biometric verification and AI agents for proctoring, as in systems like Alvy, which autonomously flag irregularities without constant human oversight. Such technologies address scalability in global testing, and empirical data from implementations show improved security metrics, though challenges such as false positives require ongoing validation against ground-truth cheating rates.

Post-2020 Adaptations and Policy Changes

The COVID-19 pandemic, beginning in early 2020, led to the cancellation of standardized exams in numerous jurisdictions, including all state testing in the United States for that year, as administrators grappled with school closures and health risks. In response, some regions implemented interim measures, such as California's State Board of Education unanimously approving a shorter, streamlined assessment to replace traditional standardized tests in 2021. Similarly, professional licensing exams, such as those from the National Council of Examiners for Engineering and Surveying (NCEES), shifted to reduced-capacity in-person formats with COVID-19 protocols or limited online options by October 2020. These adaptations prioritized continuity amid disruptions but raised concerns about data voids for evaluating school performance, with experts arguing against using incomplete 2020-2021 results for accountability ratings.

University admissions policies underwent significant shifts, with widespread adoption of test-optional policies for SAT and ACT scores starting in spring 2020 to accommodate testing center closures and access barriers. By 2021, over 1,800 U.S. four-year institutions had implemented such policies, a trend accelerated by the pandemic's inequities in test access and availability. From 2023 onward, however, selective institutions began reinstating mandatory testing; Dartmouth and Harvard, for instance, required scores for applicants to the Class of 2029 (entering fall 2025), following MIT's earlier reinstatement, citing evidence that test scores predict college GPA better than high school grades alone. As of fall 2025 admissions, more than 2,000 colleges remained test-optional or test-free, though proponents of reinstatement argued that optional policies obscured academic preparedness without proportionally advancing equity goals.

Technological integrations became prominent, with remote proctoring software and online platforms enabling supervised virtual exams to mitigate cheating risks and expand access. In Canada, provinces like Ontario suspended standardized tests such as those from the Education Quality and Accountability Office in 2019-2020, contributing to a broader pre-existing decline in testing frequency, though some resumed by 2022 with hybrid formats. Policy debates after 2021 emphasized retaining standardized metrics for accountability and problem identification rather than permanent elimination, amid evidence of stalled math recovery and persistent reading declines in national assessments such as NAEP through 2022.
