Code coverage
In software engineering, code coverage, also called test coverage, is a percentage measure of the degree to which the source code of a program is executed when a particular test suite is run. A program with high code coverage has more of its source code executed during testing, which suggests it has a lower chance of containing undetected software bugs compared to a program with low code coverage.[1][2] Many different metrics can be used to calculate test coverage. Some of the most basic are the percentage of program subroutines and the percentage of program statements called during execution of the test suite.
Code coverage was among the first methods invented for systematic software testing. The first published reference was by Miller and Maloney in Communications of the ACM, in 1963.[3]
Coverage criteria
To measure what percentage of code has been executed by a test suite, one or more coverage criteria are used. These are usually defined as rules or requirements that a test suite must satisfy.[4]
Basic coverage criteria
There are a number of coverage criteria, but the main ones are:[5]
- Function coverage – has each function (or subroutine) in the program been called?
- Statement coverage – has each statement in the program been executed?
- Edge coverage – has every edge in the control-flow graph been executed?
- Branch coverage – has each branch (also called the DD-path) of each control structure (such as in if and case statements) been executed? For example, given an if statement, have both the true and false branches been executed? (This is a subset of edge coverage.)
- Condition coverage – has each Boolean sub-expression evaluated both to true and false? (Also called predicate coverage.)
For example, consider the following C function:
```c
int foo (int x, int y)
{
    int z = 0;
    if ((x > 0) && (y > 0))
    {
        z = x;
    }
    return z;
}
```
Assume this function is part of a larger program and that this program was run with some test suite.
- Function coverage will be satisfied if, during this execution, the function `foo` was called at least once.
- Statement coverage for this function will be satisfied if it was called, for example, as `foo(1,1)`, because in this case every line in the function is executed, including `z = x;`.
- Branch coverage will be satisfied by tests calling `foo(1,1)` and `foo(0,1)`, because in the first case both `if` conditions are met and `z = x;` is executed, while in the second case the first condition, `(x>0)`, is not satisfied, which prevents the execution of `z = x;`.
- Condition coverage will be satisfied with tests that call `foo(1,0)`, `foo(0,1)`, and `foo(1,1)`. These are necessary because in the first case `(x>0)` evaluates to `true`, while in the second it evaluates to `false`. At the same time, the first case makes `(y>0)` `false`, the second case does not evaluate `(y>0)` at all (because of the lazy evaluation of the Boolean operator), and the third case makes it `true`.
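These claims can be checked mechanically. The sketch below reimplements `foo` from the example and verifies the behavior each test case is described as having; the helper name `check_condition_coverage_set` is ours, for illustration.

```c
#include <assert.h>

/* The foo function from the example above. */
int foo(int x, int y)
{
    int z = 0;
    if ((x > 0) && (y > 0)) {
        z = x;
    }
    return z;
}

/* Returns 1 if the condition-coverage test set from the text behaves
 * as described: foo(1,1) executes z = x, while foo(0,1) and foo(1,0)
 * leave z at its initial value of 0. */
int check_condition_coverage_set(void)
{
    return foo(1, 1) == 1   /* both conditions true: z = x runs   */
        && foo(0, 1) == 0   /* (x>0) false: z = x skipped         */
        && foo(1, 0) == 0;  /* (y>0) false: z = x skipped         */
}
```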
In programming languages that do not perform short-circuit evaluation, condition coverage does not necessarily imply branch coverage. For example, consider the following Pascal code fragment:
```pascal
if a and b then
```
Condition coverage can be satisfied by two tests:
- `a=true`, `b=false`
- `a=false`, `b=true`
However, this set of tests does not satisfy branch coverage since neither case will meet the if condition.
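This can be mimicked in C, which does short-circuit `&&` but whose bitwise `&` evaluates both operands; a minimal sketch assuming 0/1 flags, with a function name of our choosing:

```c
#include <assert.h>

/* Approximates Pascal's non-short-circuit "if a and b then" by
 * evaluating both 0/1 flags with bitwise &. Returns 1 when the
 * then-branch would be taken. */
int then_branch_taken(int a, int b)
{
    return (a & b) != 0;
}
```

The two condition-coverage tests from the text, `(1,0)` and `(0,1)`, both return 0, so the then-branch is never taken and branch coverage is not achieved.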
Fault injection may be necessary to ensure that all conditions and branches of exception-handling code have adequate coverage during testing.
Modified condition/decision coverage
A combination of function coverage and branch coverage is sometimes also called decision coverage. This criterion requires that every point of entry and exit in the program has been invoked at least once, and that every decision in the program has taken on all possible outcomes at least once. In this context, the decision is a Boolean expression comprising conditions and zero or more Boolean operators. This definition is not the same as branch coverage;[6] however, the term decision coverage is sometimes used as a synonym for it.[7]
Condition/decision coverage requires that both decision and condition coverage be satisfied. However, for safety-critical applications (such as avionics software) it is often required that modified condition/decision coverage (MC/DC) be satisfied. This criterion extends condition/decision criteria with requirements that each condition should affect the decision outcome independently.
For example, consider the following code:
```pascal
if (a or b) and c then
```
The condition/decision criteria will be satisfied by the following set of tests:
| a | b | c |
|---|---|---|
| true | true | true |
| false | false | false |
However, the above test set will not satisfy modified condition/decision coverage, since in the first test the value of 'b', and in the second test the value of 'c', would not influence the output. So, the following test set is needed to satisfy MC/DC:
| a | b | c |
|---|---|---|
| false | true | false |
| false | true | true |
| false | false | true |
| true | false | true |
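The independent effect of each condition in this test set can be checked pairwise: for each condition there is a pair of rows differing only in that condition, with opposite decision outcomes. A small C sketch, with helper names of our choosing:

```c
#include <assert.h>

/* The decision from the example: (a or b) and c, with 0/1 flags. */
int decision(int a, int b, int c)
{
    return (a || b) && c;
}

/* Each pair below differs in exactly one condition, and the decision
 * outcome flips -- the independent effect that MC/DC requires. */
int mcdc_holds(void)
{
    int a_effect = decision(0, 0, 1) != decision(1, 0, 1); /* toggle a */
    int b_effect = decision(0, 0, 1) != decision(0, 1, 1); /* toggle b */
    int c_effect = decision(0, 1, 0) != decision(0, 1, 1); /* toggle c */
    return a_effect && b_effect && c_effect;
}
```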
Multiple condition coverage
This criterion requires that all combinations of conditions inside each decision are tested. For example, the code fragment from the previous section will require eight tests:
| a | b | c |
|---|---|---|
| false | false | false |
| false | false | true |
| false | true | false |
| false | true | true |
| true | false | false |
| true | false | true |
| true | true | false |
| true | true | true |
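The eight combinations can also be enumerated programmatically; a sketch with names of our choosing:

```c
#include <assert.h>

/* The same decision, (a or b) and c, with 0/1 flags. */
int decision(int a, int b, int c)
{
    return (a || b) && c;
}

/* Walks all 2^3 = 8 combinations of (a, b, c) -- the full test set
 * demanded by multiple condition coverage -- and counts how many
 * make the decision true. */
int count_true_outcomes(void)
{
    int count = 0;
    for (int bits = 0; bits < 8; bits++) {
        int a = (bits >> 2) & 1;
        int b = (bits >> 1) & 1;
        int c = bits & 1;
        count += decision(a, b, c);
    }
    return count;
}
```

Only the three rows with c true and at least one of a, b true yield a true decision.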
Parameter value coverage
Parameter value coverage (PVC) requires that in a method taking parameters, all the common values for such parameters be considered. The idea is that all common possible values for a parameter are tested.[8] For example, common values for a string are: 1) null, 2) empty, 3) whitespace (space, tabs, newline), 4) valid string, 5) invalid string, 6) single-byte string, 7) double-byte string. It may also be appropriate to use very long strings. Failure to test each possible parameter value may result in a bug. Testing only one of these could result in 100% code coverage as each line is covered, but as only one of seven options is tested, there is only 14.2% PVC.
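As an illustration of distinguishing such parameter values, here is a hypothetical classifier covering the first three categories from the list; the enum and function are invented for this sketch:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical categories for a string parameter, following the
 * common-value list above (remaining categories fold into STR_OTHER). */
enum str_class { STR_NULL, STR_EMPTY, STR_WHITESPACE, STR_OTHER };

enum str_class classify(const char *s)
{
    if (s == NULL)
        return STR_NULL;
    if (s[0] == '\0')
        return STR_EMPTY;
    for (size_t i = 0; s[i]; i++)
        if (!isspace((unsigned char)s[i]))
            return STR_OTHER;
    return STR_WHITESPACE;
}
```

A PVC-oriented test suite would pass at least one value from each category to the method under test, rather than a single "happy path" string.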
Other coverage criteria
There are further coverage criteria, which are used less often:
- Linear Code Sequence and Jump (LCSAJ) coverage a.k.a. JJ-Path coverage – has every LCSAJ/JJ-path been executed?[9]
- Path coverage – Has every possible route through a given part of the code been executed?
- Entry/exit coverage – Has every possible call and return of the function been executed?
- Loop coverage – Has every possible loop been executed zero times, once, and more than once?
- State coverage – Has each state in a finite-state machine been reached and explored?
- Data-flow coverage – Has each variable definition and its usage been reached and explored?[10]
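Loop coverage, for example, can be satisfied with three tests against a simple loop; a minimal sketch with names of our choosing:

```c
#include <assert.h>

/* Sums the first n elements of a. Exercising n = 0, n = 1, and
 * n > 1 gives loop coverage: the loop body runs zero times, exactly
 * once, and more than once. */
int sum(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```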
Safety-critical or dependable applications are often required to demonstrate 100% of some form of test coverage. For example, the ECSS-E-ST-40C standard demands 100% statement and decision coverage for two out of four different criticality levels; for the others, target coverage values are subject to negotiation between supplier and customer.[11] However, setting specific target values, and in particular 100%, has been criticized by practitioners for various reasons (cf.[12]). Martin Fowler writes: "I would be suspicious of anything like 100% - it would smell of someone writing tests to make the coverage numbers happy, but not thinking about what they are doing".[13]
Some of the coverage criteria above are connected. For instance, path coverage implies decision, statement and entry/exit coverage. Decision coverage implies statement coverage, because every statement is part of a branch.
Full path coverage, of the type described above, is usually impractical or impossible. Any module with a succession of n decisions in it can have up to 2^n paths within it; loop constructs can result in an infinite number of paths. Many paths may also be infeasible, in that there is no input to the program under test that can cause that particular path to be executed. However, a general-purpose algorithm for identifying infeasible paths has been proven to be impossible (such an algorithm could be used to solve the halting problem).[14] Basis path testing, for instance, is a method of achieving complete branch coverage without achieving complete path coverage.[15]
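The exponential growth is easy to see in a toy function: three sequential, independent decisions already produce 2^3 = 8 distinct paths, since each decision doubles the count. The function below is invented for illustration.

```c
#include <assert.h>

/* Three independent decisions in sequence. Each (a, b, c) input
 * combination drives the control flow down a different path, so
 * there are 8 distinct paths; full branch coverage, by contrast,
 * needs only 2 tests. */
int toy(int a, int b, int c)
{
    int r = 0;
    if (a) r += 1;   /* decision 1: 2 ways through            */
    if (b) r += 2;   /* decision 2: doubles the path count    */
    if (c) r += 4;   /* decision 3: doubles it again, to 8    */
    return r;        /* result identifies which path was taken */
}
```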
Methods for practical path coverage testing instead attempt to identify classes of code paths that differ only in the number of loop executions; to achieve "basis path" coverage, the tester must cover all the path classes.
In practice
The target software is built with special options or libraries and run under a controlled environment, to map every executed function to the function points in the source code. This allows testing parts of the target software that are rarely or never accessed under normal conditions, and helps reassure that the most important conditions (function points) have been tested. The resulting output is then analyzed to see what areas of code have not been exercised and the tests are updated to include these areas as necessary. Combined with other test coverage methods, the aim is to develop a rigorous, yet manageable, set of regression tests.
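What such instrumentation does can be sketched by hand: each basic block increments a counter, and the report is the fraction of counters that are nonzero. This is a simplified illustration with names of our choosing, not how any particular tool works internally.

```c
#include <assert.h>

static int hits[3]; /* one counter per instrumented basic block */

/* A trivial function, hand-instrumented the way a coverage tool
 * would instrument it automatically. */
int abs_value(int x)
{
    hits[0]++;          /* block 0: function entry        */
    if (x < 0) {
        hits[1]++;      /* block 1: negative branch       */
        return -x;
    }
    hits[2]++;          /* block 2: non-negative branch   */
    return x;
}

/* Percentage of instrumented blocks executed so far, rounded down. */
int coverage_percent(void)
{
    int covered = 0;
    for (int i = 0; i < 3; i++)
        covered += hits[i] > 0;
    return covered * 100 / 3;
}
```

A single call with a positive argument leaves the negative branch unexecuted (66%); adding a negative-argument test raises the figure to 100%.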
In implementing test coverage policies within a software development environment, one must consider the following:
- Are there coverage requirements for end-product certification, and if so, what level of test coverage is required? The typical progression of rigor is: Statement, Branch/Decision, Modified Condition/Decision Coverage (MC/DC), LCSAJ (Linear Code Sequence and Jump)
- Will coverage be measured against tests that verify requirements levied on the system under test (DO-178B)?
- Is the object code generated directly traceable to source code statements? Certain certifications (e.g., DO-178B Level A) require coverage at the assembly level if this is not the case: "Then, additional verification should be performed on the object code to establish the correctness of such generated code sequences" (DO-178B, para. 6.4.4.2).[16]
Software authors can look at test coverage results to devise additional tests and input or configuration sets to increase the coverage over vital functions. Two common forms of test coverage are statement (or line) coverage and branch (or edge) coverage. Line coverage reports on the execution footprint of testing in terms of which lines of code were executed to complete the test. Edge coverage reports which branches or code decision points were executed to complete the test. They both report a coverage metric, measured as a percentage. The meaning of this depends on what form(s) of coverage have been used, as 67% branch coverage is more comprehensive than 67% statement coverage.
Generally, test coverage tools incur computation and logging in addition to the actual program, thereby slowing down the application, so this analysis is typically not done in production. As one might expect, there are classes of software that cannot feasibly be subjected to these coverage tests, though a degree of coverage mapping can be approximated through analysis rather than direct testing.
There are also some kinds of defects that are affected by such tools. In particular, some race conditions or similar real-time-sensitive operations can be masked when run under test environments; conversely, some of these defects may become easier to find as a result of the additional overhead of the testing code.
Most professional software developers use C1 and C2 coverage. C1 stands for statement coverage and C2 for branch or condition coverage. With a combination of C1 and C2, it is possible to cover most statements in a code base. Statement coverage would also cover function coverage with entry and exit, loop, path, state flow, control flow and data flow coverage. With these methods, it is possible to achieve nearly 100% code coverage in most software projects.[17]
Usage in industry
Test coverage is one consideration in the safety certification of avionics equipment. The guidelines by which avionics gear is certified by the Federal Aviation Administration (FAA) are documented in DO-178B[16] and DO-178C.[18]
Test coverage is also a requirement in part 6 of the automotive safety standard ISO 26262 Road Vehicles - Functional Safety.[19]
References
- ^ Brader, Larry; Hilliker, Howie; Wills, Alan (March 2, 2013). "Chapter 2 Unit Testing: Testing the Inside". Testing for Continuous Delivery with Visual Studio 2012. Microsoft. p. 30. ISBN 978-1621140184. Retrieved 16 June 2016.
- ^ Williams, Laurie; Smith, Ben; Heckman, Sarah. "Test Coverage with EclEmma". Open Seminar Software Engineering. North Carolina State University. Archived from the original on 14 March 2016. Retrieved 16 June 2016.
- ^ Joan C. Miller, Clifford J. Maloney (February 1963). "Systematic mistake analysis of digital computer programs". Communications of the ACM. 6 (2). New York, NY, USA: ACM: 58–63. doi:10.1145/366246.366248. ISSN 0001-0782.
- ^ Paul Ammann, Jeff Offutt (2013). Introduction to Software Testing. Cambridge University Press.
- ^ Glenford J. Myers (2004). The Art of Software Testing, 2nd edition. Wiley. ISBN 0-471-46912-2.
- ^ Position Paper CAST-10 (June 2002). What is a "Decision" in Application of Modified Condition/Decision Coverage (MC/DC) and Decision Coverage (DC)?
- ^ MathWorks. Types of Model Coverage.
- ^ "Unit Testing with Parameter Value Coverage (PVC)".
- ^ M. R. Woodward, M. A. Hennell, "On the relationship between two control-flow coverage criteria: all JJ-paths and MCDC", Information and Software Technology 48 (2006) pp. 433-440
- ^ Ting Su, Ke Wu, Weikai Miao, Geguang Pu, Jifeng He, Yuting Chen, and Zhendong Su. "A Survey on Data-Flow Testing". ACM Comput. Surv. 50, 1, Article 5 (March 2017), 35 pages.
- ^ ECSS-E-ST-40C: Space engineering - Software. ECSS Secretariat, ESA-ESTEC. March, 2009
- ^ C. Prause, J. Werner, K. Hornig, S. Bosecker, M. Kuhrmann (2017): Is 100% Test Coverage a Reasonable Requirement? Lessons Learned from a Space Software Project. In: PROFES 2017. Springer. Last accessed: 2017-11-17
- ^ Martin Fowler's blog: TestCoverage. Last accessed: 2017-11-17
- ^ Dorf, Richard C.: Computers, Software Engineering, and Digital Devices, Chapter 12, pg. 15. CRC Press, 2006. ISBN 0-8493-7340-9, ISBN 978-0-8493-7340-4; via Google Book Search
- ^ Y.N. Srikant; Priti Shankar (2002). The Compiler Design Handbook: Optimizations and Machine Code Generation. CRC Press. p. 249. ISBN 978-1-4200-4057-9.
- ^ a b RTCA/DO-178B, Software Considerations in Airborne Systems and Equipment Certification, Radio Technical Commission for Aeronautics, December 1, 1992
- ^ Beizer, Boris (2009). Software Testing Techniques, 2nd edition. Dreamtech Press. ISBN 978-81-7722-260-9.
- ^ RTCA/DO-178C, Software Considerations in Airborne Systems and Equipment Certification, Radio Technical Commission for Aeronautics, January, 2012.
- ^ ISO 26262-6:2011(en) Road vehicles -- Functional safety -- Part 6: Product development at the software level. International Organization for Standardization.
Fundamentals
Definition and Purpose
Code coverage is a software testing metric that quantifies the extent to which the source code of a program is executed when a particular test suite runs.[7] It is typically expressed as a percentage, calculated as the ratio of executed code elements (such as statements, branches, or functions) to the total number of such elements in the codebase.[8] A test suite refers to a collection of test cases intended to validate the software's behavior under various conditions, while an execution trace represents the specific sequence of code paths traversed during the running of those tests.

The primary purpose of code coverage is to identify untested portions of the code, thereby guiding developers to create additional tests that enhance software reliability and reduce the risk of defects in production.[9] By highlighting gaps in test execution, it supports efforts to improve overall code quality and facilitates regression testing, where changes to the codebase are verified to ensure no new issues arise in previously covered areas.[7] For instance, just as mapping all roads in a city ensures comprehensive navigation coverage rather than focusing only on major highways, code coverage encourages testing all potential paths, including edge cases, rather than just the most common ones. Unlike metrics focused on bug detection rates, which evaluate how effectively tests uncover faults, code coverage emphasizes structural thoroughness but does not guarantee fault revelation, as covered code may still contain errors if tests lack assertions or diverse inputs.[9] This metric underpins various coverage criteria, such as those assessing statements or decisions, which are explored in detail elsewhere.[7]

Historical Development
The concept of code coverage in software testing emerged during the 1970s as a means to quantify the extent to which test cases exercised program code, amid the rise of structured programming paradigms that emphasized modular and verifiable designs.[10] Early efforts focused on basic metrics like statement execution to address the growing complexity of software systems, building on foundational testing literature such as Glenford Myers' 1979 book The Art of Software Testing, which advocated for coverage measures including statement and branch coverage to improve test adequacy.[10] Tools like TCOV, initially developed for Fortran and later extended to C and C++, exemplified this era's innovations by providing source code coverage analysis and statement profiling, enabling developers to identify untested paths in scientific and engineering applications.[11]

In the 1980s and early 1990s, coverage criteria evolved to meet rigorous safety requirements in critical domains, with researchers like William E. Howden advancing theoretical foundations through work on symbolic evaluation and error-based testing methods that informed coverage adequacy.[12] A pivotal milestone came in 1992 with the publication of the DO-178B standard for airborne software certification, which introduced Modified Condition/Decision Coverage (MC/DC) as a stringent criterion for Level A software, requiring each condition in a decision to independently affect the outcome to ensure high structural thoroughness in avionics systems. This standard, rooted in earlier 1980s guidelines like DO-178A, marked a shift toward formalized, verifiable coverage in safety-critical industries, influencing global practices beyond aviation.[13] The late 1990s saw accelerated adoption of coverage tools.
Post-2000, the rise of agile methodologies further embedded code coverage in iterative development, with practices like Test-Driven Development emphasizing continuous metrics to maintain quality during rapid cycles, as seen in frameworks that integrated coverage reporting into CI/CD pipelines.[10] By the 2010s, international standards like the ISO/IEC/IEEE 29119 series formalized coverage within software testing processes, with Part 4 (2021 edition) specifying structural techniques such as statement, decision, and condition coverage as essential for deriving test cases from code artifacts. This evolution continued into the 2020s, where cloud-native environments and AI-assisted testing transformed coverage practices; for instance, generative AI tools have enabled automated test generation to achieve higher coverage in legacy systems, reducing manual effort by up to 85% in large-scale projects like those at Salesforce.[14] These advancements prioritize dynamic analysis in distributed systems, aligning coverage goals with modern DevOps while addressing scalability challenges in microservices and AI-driven codebases.[15]

Basic Measurement Concepts
Code coverage is quantified through various measurement units that assess different aspects of code execution during testing. Line coverage measures the proportion of lines of code that are executed at least once by the test suite, providing a straightforward indicator of breadth in testing. Function coverage evaluates whether all functions or methods in the codebase are invoked, helping identify unused or untested modules. Basic path coverage concepts focus on the execution of distinct execution paths through the code, though full path coverage is often impractical due to exponential growth in paths; instead, it introduces the idea of tracing control flow to ensure diverse behavioral coverage.[4][16][17]

When aggregating coverage across multiple test suites, tools compute metrics based on the union of execution traces from all tests, where an element (such as a line or function) is considered covered if executed by at least one test case. This union-based approach avoids double-counting and yields an overall percentage from 0% (no coverage) to 100% (complete coverage), reflecting the cumulative effectiveness of the entire test suite rather than individual tests.[18][3]

A fundamental formula for statement coverage, a core metric akin to line coverage, is:

statement coverage (%) = (number of executed statements / total number of executable statements) × 100

This equation, defined in international testing standards, calculates the percentage of executable statements traversed during testing. Coverage reporting typically includes visual aids such as color-coded reports, where executed code is highlighted in green, unexecuted in red, and partially covered branches in yellow, functioning like heatmaps to quickly identify coverage gaps in source files. Industry baselines often target at least 80% coverage for statement or line metrics to ensure reasonable test adequacy, though this threshold serves as a guideline rather than a guarantee of quality.[19][3]

Coverage Criteria
Statement and Decision Coverage
Statement coverage, also known as line coverage, is a fundamental white-box testing criterion that requires every executable statement in the source code to be executed at least once during testing.[16] This metric ensures that no part of the code is left untested in terms of basic execution flow, helping to identify unexercised code segments. The formula for statement coverage is the ratio of executed statements to the total number of statements, expressed as a percentage:

statement coverage (%) = (executed statements / total statements) × 100

For instance, in a simple conditional block with multiple statements, tests must cover all paths to achieve 100% coverage, such as verifying positive, negative, and zero values in an if-else chain.[16]
Decision coverage, often referred to as branch coverage, extends statement coverage by focusing on the outcomes of control flow decisions, such as conditional branches in if, while, or switch statements. It requires that each possible outcome (true or false) of every decision point be exercised at least once, ensuring that both branches of control structures are tested.[16] This criterion is particularly useful for validating the logic of branching constructs. The formula for decision coverage is:

decision coverage (%) = (exercised decision outcomes / total decision outcomes) × 100
Consider an if-else structure:
```c
if (x > 0) {
    printf("Positive");
} else {
    printf("Non-positive");
}
```
Now consider a fragment in which the if has no else branch:

```
int x = input();
if (x > 0) {
    print("Positive");
}
print("End of program");
```

A single test with a positive value of x executes every statement (100% statement coverage), but the false outcome of the
if is never taken, resulting in 50% decision coverage and potentially missing faults in the untested path.[21] In practice, achieving 100% statement coverage often correlates with at least 50% decision coverage, but higher statement levels do not guarantee equivalent decision thoroughness, underscoring the need to prioritize decision coverage for better control flow validation.
Condition and Multiple Condition Coverage
Condition coverage, also known as predicate or clause coverage, is a white-box testing criterion that requires each boolean sub-condition (or atomic condition) within a decision to evaluate to both true and false at least once during testing.[22] This ensures that individual conditions, such as A or B in an expression like (A && B), are independently exercised regardless of their combined effect on the overall decision outcome.[22] For instance, in the decision if ((x > 0) && (y < 10)), tests must include cases where x > 0 is true and false, and separately where y < 10 is true and false.[22]
Modified condition/decision coverage (MC/DC) extends condition coverage by requiring not only that each condition evaluates to true and false, but also that the outcome of the decision changes when that condition is altered while all other conditions remain fixed—a demonstration of each condition's independent influence on the decision.[23] This criterion, proposed by NASA researchers, mandates coverage of all decision points (true and false outcomes) alongside the independent effect of each condition.[23] For a decision with n independent conditions, MC/DC can often be achieved with a minimal test set of n + 1 cases, though the exact number depends on the logical structure; for example, the expression (A && B) requires three tests: one where both are true (decision true), one where A is false and B is true (decision false, showing A's effect), and one where A is true and B is false (decision false, showing B's effect).[23]
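The three-test set for (A && B) can be checked directly; in the sketch below (helper names are ours), each false case differs from the all-true case in exactly one condition and flips the outcome:

```c
#include <assert.h>

/* The two-condition decision A && B, with 0/1 flags. */
int and_decision(int a, int b)
{
    return a && b;
}

/* The MC/DC test set {(1,1), (0,1), (1,0)}: comparing (1,1) against
 * each single-condition toggle shows each condition's independent
 * effect on the decision outcome. */
int mcdc_pairs_hold(void)
{
    int a_effect = and_decision(1, 1) != and_decision(0, 1);
    int b_effect = and_decision(1, 1) != and_decision(1, 0);
    return a_effect && b_effect;
}
```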
Multiple condition coverage, also referred to as full predicate or combinatorial coverage, demands that every possible combination of truth values for all boolean sub-conditions in a decision be tested, covering all 2^n outcomes where n is the number of conditions.[22] This exhaustive approach guarantees complete exploration of the decision's logic but becomes impractical for decisions with more than a few conditions due to the exponential growth in test cases.[23] For example, the decision (A && B) || C involves three conditions (A, B, and C), necessitating eight distinct tests to cover combinations such as (true, true, true), (true, true, false), ..., and (false, false, false).[22]
These criteria refine basic decision coverage by scrutinizing the internal logic of conditions, addressing potential gaps where correlated conditions might mask faults, such as incorrect operator precedence or condition dependencies.[23] In safety-critical domains like aerospace, where software failures can have catastrophic consequences, MC/DC is mandated for the highest assurance levels (e.g., Level A in DO-178B) to provide high confidence that all decision logic is verified without unintended behaviors, balancing thoroughness against the infeasibility of full multiple condition coverage.[23] This rationale stems from the need to detect subtle errors in complex control logic, as evidenced in aviation systems where structural coverage analysis complements requirements-based testing.[23]
