Software testability
Software testability is the degree to which a software artifact (e.g. a software system, module, requirement, or design document) supports testing in a given test context. If the testability of an artifact is high, then finding faults in the system (if any) by means of testing is easier.
Formally, some systems are testable and some are not. This classification can be made by observing that, for a functionality of the system under test "S" that takes input "I" to be testable, a computable functional predicate "V" must exist such that V is true when S, given input I, produces a valid output, and false otherwise. This function "V" is known as the verification function for the system with input I.
Many software systems are untestable, or not immediately testable. For example, Google's reCAPTCHA, without any metadata about the images, is not a testable system. reCAPTCHA can, however, be tested immediately if, for each image shown, a tag is stored elsewhere. Given this metadata, one can test the system.
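As a hedged sketch of this idea, the fragment below implements a verification function that uses externally stored tags as the test oracle; the function names and the oracle contents are invented for illustration.

```python
# Minimal sketch of a verification function V for the tagged-image scenario.
# The oracle `expected_tags` (image id -> correct label) is hypothetical metadata
# stored outside the system under test.

def verify(system_under_test, image_id: str, expected_tags: dict[str, str]) -> bool:
    """Return True when the system's output for `image_id` matches the stored tag."""
    produced = system_under_test(image_id)          # output of S for input I
    return produced == expected_tags.get(image_id)  # V(S, I): valid output or not

# Example usage with a stand-in system:
if __name__ == "__main__":
    oracle = {"img-001": "traffic light", "img-002": "bicycle"}
    fake_system = lambda image_id: "traffic light"  # placeholder for S
    print(verify(fake_system, "img-001", oracle))   # True
    print(verify(fake_system, "img-002", oracle))   # False
```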
Therefore, testability is often thought of as an extrinsic property which results from the interdependency between the software to be tested and the test goals, test methods used, and test resources (i.e., the test context). Even though testability cannot be measured directly (unlike, for example, software size), it should be considered an intrinsic property of a software artifact because it is highly correlated with other key software qualities such as encapsulation, coupling, cohesion, and redundancy.
The correlation of testability to good design can be seen in the fact that code with weak cohesion, tight coupling, redundancy, and a lack of encapsulation is difficult to test.[1]
A lower degree of testability results in increased test effort. In extreme cases, a lack of testability may prevent parts of the software or of the software requirements from being tested at all.
Background
Testability, a property applying to an empirical hypothesis, involves two components. The effort and effectiveness of software tests depend on numerous factors, including:
- Properties of the software requirements
- Properties of the software itself (such as size, complexity and testability)
- Properties of the test methods used
- Properties of the development and testing processes
- Qualification and motivation of the persons involved in the test process
Testability of software components
The testability of software components (modules, classes) is determined by factors such as:
- Controllability: The degree to which it is possible to control the state of the component under test (CUT) as required for testing.
- Observability: The degree to which it is possible to observe (intermediate and final) test results.
- Isolateability: The degree to which the component under test (CUT) can be tested in isolation.
- Separation of concerns: The degree to which the component under test has a single, well-defined responsibility.
- Understandability: The degree to which the component under test is documented or self-explaining.
- Automatability: The degree to which it is possible to automate testing of the component under test.
- Heterogeneity: The degree to which the use of diverse technologies requires diverse test methods and tools to be used in parallel.
The testability of software components can be improved by:
- Test-driven development
- Design for testability (similar to design for test in the hardware domain)
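As a hedged illustration of design for testability, the sketch below shows a component whose collaborator is injected through its constructor, so a unit test can substitute a stub and exercise the component in isolation; the class and function names are invented for the example.

```python
# Illustrative design-for-testability sketch (names are invented for the example):
# the exchange-rate source is injected, so the component can be tested in isolation.

class PriceConverter:
    def __init__(self, rate_source):
        # Controllability: the collaborator is supplied from outside instead of
        # being hard-coded (e.g., a live web service), so tests can replace it.
        self.rate_source = rate_source

    def convert(self, amount: float, currency: str) -> float:
        # Observability: the result is returned directly and can be asserted on.
        return round(amount * self.rate_source.rate_for(currency), 2)


class FixedRateStub:
    """Test double standing in for a real exchange-rate service."""
    def rate_for(self, currency: str) -> float:
        return 2.0


def test_convert_uses_injected_rate():
    converter = PriceConverter(FixedRateStub())
    assert converter.convert(10.0, "EUR") == 20.0
```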
Testability of requirements
Requirements need to fulfill the following criteria in order to be testable:
- consistent
- complete
- unambiguous
- quantitative (a requirement like "fast response time" cannot be verified)
- verifiable in practice (a test is feasible not only in theory but also in practice with limited resources)
Treating the requirements as axioms, testability can be treated by asserting the existence of a function F_S (the software) such that an input I_k generates an output O_k, i.e. F_S : I → O. The ideal software therefore generates, for each input, a tuple (I_k, O_k); the set of all such input–output pairs, Σ, stands for the specification.

Now take a test input I_t, which generates an output O_t, giving the test tuple (I_t, O_t). The question is whether (I_t, O_t) ∈ Σ or (I_t, O_t) ∉ Σ. If the tuple is in the set, the test passes; otherwise the system fails the test input. It is therefore essential to determine whether or not one can construct a function that effectively acts as the set indicator function 1_Σ for the specification set Σ.

In this notation, 1_Σ is the testability function for the specification Σ. Its existence should not merely be asserted but proven rigorously. Without algebraic consistency of the specification, no such function can be found, and the specification then ceases to be termed testable.
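As a hedged sketch of this idea, the fragment below represents a finite specification Σ as a set of input–output pairs and implements the corresponding indicator function; the specification contents and function names are invented for illustration.

```python
# Sketch of a testability/indicator function 1_Σ for a finite specification Σ,
# represented as a set of (input, output) pairs. The pairs are illustrative only.

SPEC: set[tuple[int, int]] = {(1, 1), (2, 4), (3, 9)}  # Σ: expected input-output pairs

def indicator(test_tuple: tuple[int, int], spec: set[tuple[int, int]] = SPEC) -> bool:
    """1_Σ: True if the observed (input, output) pair belongs to the specification."""
    return test_tuple in spec

def run_test(system, test_input: int) -> bool:
    """Execute the system on a test input and check the resulting tuple against Σ."""
    return indicator((test_input, system(test_input)))

if __name__ == "__main__":
    square = lambda x: x * x          # a system that satisfies Σ
    broken = lambda x: x + x          # a system that violates Σ for x = 3
    print(run_test(square, 3))        # True  -> test passes
    print(run_test(broken, 3))        # False -> test fails
```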
References
- ^ Shalloway, Alan; Trott, Jim (2004). Design Patterns Explained, 2nd Ed. p. 133. ISBN 978-0321247148.
- Robert V. Binder: Testing Object-Oriented Systems: Models, Patterns, and Tools, ISBN 0-201-80938-9
- Stefan Jungmayr: Improving testability of object-oriented systems, ISBN 3-89825-781-9
- Wanderlei Souza: Abstract Testability Patterns, ISSN 1884-0760
- Boris Beizer: Software Testing Techniques
Software testability
Overview
Definition
Software testability refers to the degree to which a software system or component facilitates the testing process, enabling efficient and effective test execution while enhancing the ease of detecting and verifying defects. This characteristic emphasizes how well the software supports the application of testing criteria, whether formal or informal, to assess correctness and quality. High testability reduces the effort required for testing and increases the likelihood of identifying faults early in the development lifecycle.[5]

Key attributes of software testability include observability, controllability, decomposability, simplicity, stability, operability, isolateability, and automatability. Observability is the extent to which internal states and outputs can be monitored during testing. Controllability involves the ability to manipulate inputs and states to create specific test conditions. Decomposability refers to the modular structure that allows components to be isolated and tested independently. Simplicity pertains to minimizing design and code complexity to avoid unnecessary testing overhead. Stability ensures the software resists unintended changes that could disrupt testing activities. Operability supports smooth test execution without external interference. Isolateability allows parts of the system to be tested independently of the whole. Automatability refers to the ease with which tests can be automated. These attributes collectively determine how amenable the software is to thorough validation.[6]

Although software testability is related to other quality attributes, it is distinct from reliability and maintainability. Reliability focuses on the software's consistent performance under specified conditions over time, while maintainability addresses the ease of modification, repair, and enhancement. Testability, by contrast, specifically supports the verification process through testable design elements, as outlined in standards where it serves as a subcharacteristic of maintainability but prioritizes objective assessment via test criteria.[7]

Historical Background
The concept of software testability originated in the 1950s and 1960s amid the debugging era of software development, when testing was primarily focused on error correction and heavily influenced by hardware engineering practices, including design for testability (DFT) concepts pioneered in electronics diagnostics during the mid-1960s.[8] Early efforts distinguished program testing from debugging as early as 1957, with the formation of the first dedicated software test team in 1958 for IBM's Project Mercury, implicitly tying testability to modular code structures in procedural languages.[9]

A pivotal milestone came in 1979 with Glenford J. Myers' publication of The Art of Software Testing, which explicitly separated testing from debugging and advocated structured techniques like black-box testing, establishing foundational principles for evaluating and enhancing software testability.[9] This work shifted emphasis toward proactive design considerations that facilitate testing, influencing subsequent standards such as IEEE 829, released in 1983, which formalized documentation practices to support testable software artifacts across development stages.

During the 1980s and 1990s, the rise of object-oriented programming paradigms further evolved testability by associating it with modularity, encapsulation, and inheritance, adapting earlier metrics like Thomas McCabe's 1976 cyclomatic complexity measure to assess path coverage and test case requirements in both procedural and OO contexts. Seminal work by Jeffrey Voas in the early 1990s introduced the PIE model (propagation, infection, execution probabilities) to quantify testability in terms of fault revelation during testing.[10] Domain-specific advancements highlighted contrasts, such as module-based testing in procedural languages versus inheritance hierarchies in OO systems like Java and C++, with metrics like Chidamber and Kemerer's 1994 suite providing quantitative insights into OO testability.

From the 2000s onward, testability integrated with Agile methodologies (formalized in the 2001 Agile Manifesto) and DevOps practices, prioritizing observability and automation in continuous integration pipelines to enable rapid, iterative testing. This evolution had no single inventor but emerged collectively through industry standards and tools.[9]

Importance
Benefits
High software testability significantly reduces testing time and costs by enabling easier isolation of bugs and higher potential for automation, which accelerates release cycles and optimizes resource allocation in the software lifecycle. For instance, practices like Testability-Driven Development (TFD) have been shown to improve class testability by an average of 77.81% across open-source Java projects, thereby decreasing the effort required for testing without full program execution.[11] Since software testing can account for up to 50% of total development costs, enhancing testability directly mitigates this substantial overhead, allowing teams to allocate efforts more efficiently toward innovation and delivery.[12]

By facilitating early defect detection, high testability elevates overall software quality, fostering greater confidence in production deployments and minimizing the incidence of post-release failures. This proactive approach ensures faults are identified and resolved during development rather than in operational environments, where remediation is far more expensive and disruptive. Empirical analyses confirm that testable designs, such as those emphasizing observability and controllability, lead to more reliable systems by supporting thorough verification processes.[13]

Software testability underpins Agile and DevOps methodologies by enabling seamless integration with continuous integration/continuous deployment (CI/CD) pipelines and promoting iterative testing cycles that align with rapid development rhythms. In industrial contexts transitioning to agile practices, measuring and improving testability of requirements serves as a bridge to agility, enhancing efficiency in handling evolving architectures and ensuring automated tests run reliably in fast-paced environments.

Furthermore, testable software improves maintainability through modular structures that simplify updates and refactoring, while promoting collaboration between development and quality assurance (QA) teams via clearer code interfaces and shared testing expectations. Developers perceive testability as a key enabler for coordinated efforts, as it reduces misunderstandings in testing requirements and supports joint ownership of quality outcomes, though empirical links to reduced defects remain nuanced.

Challenges
Achieving high software testability often involves significant trade-offs with other quality attributes, such as performance and security. For instance, incorporating observability mechanisms like extensive logging to facilitate testing can impose computational overhead, potentially degrading system performance by increasing resource consumption during runtime. Similarly, these logging features may introduce security vulnerabilities if not properly managed, as inadequate logging practices can expose sensitive data or enable attacks, as evidenced by failures in security logging and monitoring. These conflicts arise because enhancing testability requires modifications that prioritize debuggability over optimization in speed or protection, necessitating careful architectural decisions to balance competing demands.[14][15][16]

In legacy systems, retrofitting testability presents unique complexities and costs, often requiring extensive refactoring without assured returns on investment. These systems, typically developed without modern testing considerations, exhibit tight coupling and undocumented behaviors, making it difficult to insert test hooks or isolate components for verification. The process can escalate development expenses due to the need for reverse engineering and incremental updates, while risking unintended disruptions to established functionality. Studies highlight that such efforts in legacy environments demand disproportionate resources compared to greenfield projects, with challenges amplified in monolithic architectures where modularity is historically absent.[17]

Assessing software testability remains inherently subjective, varying by contextual factors like architectural style and team expertise. In highly coupled monolithic systems, testability is generally lower due to difficulties in isolating faults, whereas microservices architectures offer greater modularity but introduce distributed testing complexities. This variability complicates objective evaluation, as perceptions of controllability and observability differ across projects, influenced by factors such as code complexity and environmental constraints. Surveys indicate that subjective elements, including developer judgment on test strategy alignment, play a key role in determining practical testability.[18][19]

The pursuit of testability demands substantial upfront investments in design, tools, and training, which can strain small teams with limited resources. Testing activities alone can consume 40% to 80% of total development costs, diverting funds from core feature implementation and leading to pitfalls like over-testing trivial components at the expense of critical paths. Small organizations often lack dedicated QA personnel or advanced automation frameworks, exacerbating inefficiencies and increasing the risk of incomplete coverage. Research emphasizes that without adequate resource allocation, efforts to enhance testability may yield diminishing returns, particularly in resource-constrained settings.[20][21]

Modern technologies like AI and machine learning introduce evolving challenges to testability, particularly through black-box models that limit controllability and interpretability. These models, often opaque in their decision-making processes, hinder fault isolation and validation, as internal states are not easily observable or manipulable for targeted testing. Surveys on AI hardware and software testability underscore that the black-box nature reduces the ability to verify edge cases or biases, demanding specialized techniques like surrogate modeling to approximate behaviors. This opacity not only elevates testing complexity but also raises dependability concerns in high-stakes applications.[22][23]

Aspects of Testability
Testability of Software Components
Software testability at the component level evaluates the ease of verifying individual units, such as functions, classes, or modules, through their implementation structures, emphasizing isolation from broader system interactions.[4] This approach prioritizes attributes like observability (the ability to inspect outputs and states) and controllability (the capacity to manipulate inputs and states), tailored to code-specific behaviors rather than specification clarity.[24] Unlike system-level testing, component testability targets discrete elements to facilitate unit and integration verification, reducing fault propagation risks within isolated contexts.[25]

Several types of component testability emerge based on architectural paradigms. In module-based testability, common in procedural code, emphasis lies on data flow criteria to ensure all execution paths are reachable and faults propagatable, often involving program normalization to refine semantic structures for precise analysis.[4] For instance, the Propagation, Infection, and Execution (PIE) model assesses how data flows through modules, estimating the likelihood of fault revelation by analyzing execution probabilities, state infections, and output propagations.[4] Object-oriented testability addresses unit and integration challenges posed by inheritance and polymorphism, where dynamic binding complicates path coverage but can amplify fault visibility if errors occur.[25] Domain-based testability focuses on observability in business logic components, ensuring unique inputs produce distinguishable outputs to verify domain-specific behaviors without inconsistencies.[24] Finally, UI-based testability centers on controllability of user interfaces, enabling automated input manipulation and state transitions to isolate interface faults from underlying logic.[26]

Key factors influencing component testability include structural design choices. Low coupling, measured by metrics like Coupling Between Objects (CBO), enhances isolation by minimizing dependencies, allowing easier stubbing during unit tests, while high cohesion, via Lack of Cohesion of Methods (LCOM), ensures focused responsibilities that simplify state verification.[27] Inheritance depth, quantified by Depth of Inheritance Tree (DIT), increases testing complexity in deep hierarchies, as re-testing inherited methods across levels demands more stubs and drivers, potentially raising unit test case volume by up to 20% compared to procedural equivalents.[25] Interface design plays a pivotal role, with clear APIs, assessed by the Rate of Component Observability (RCO), facilitating mocking and observability; interfaces with 17-42% readable attributes balance encapsulation and test access effectively.[27]

Practical examples illustrate these concepts in action. In object-oriented components, classes employing dependency injection improve testability by externalizing dependencies, enabling mocks to isolate units without tight coupling to external services.[25] For module-based procedural code, data flow criteria, such as all-defs and all-uses coverage, ensure exercisable paths by normalizing modules to eliminate redundant states, thus boosting controllability.[4] In UI components, built-in testing infrastructure, like set/reset methods, enhances controllability by allowing direct state manipulation, as demonstrated in medical imaging systems where such designs reduced integration test efforts.[26] These techniques underscore how component-level refinements directly amplify fault detection probabilities without altering end-to-end flows.[24]
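As a hedged illustration of built-in test infrastructure of the kind mentioned above, the sketch below adds set/reset hooks to a hypothetical stateful UI component so that a test can drive it into a known state and observe the result; all names are invented for the example.

```python
# Hypothetical stateful widget with built-in set/reset hooks for controllability.

class ZoomWidget:
    def __init__(self):
        self._zoom = 1.0

    def zoom_in(self):
        self._zoom = min(self._zoom * 2, 8.0)

    # --- built-in test infrastructure -------------------------------------
    def set_state_for_test(self, zoom: float) -> None:
        """Controllability hook: place the component in a precise initial state."""
        self._zoom = zoom

    def reset_for_test(self) -> None:
        """Reset hook: return to the default state between test cases."""
        self._zoom = 1.0

    @property
    def zoom(self) -> float:
        """Observability hook: expose the internal state for assertions."""
        return self._zoom


def test_zoom_is_capped_at_maximum():
    widget = ZoomWidget()
    widget.set_state_for_test(8.0)   # jump straight to the boundary state
    widget.zoom_in()
    assert widget.zoom == 8.0        # observable result: no overshoot
    widget.reset_for_test()
```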
Testability of Requirements

In the context of software requirements, testability refers to the extent to which requirements can be verified through objective, cost-effective processes such as testing, inspection, analysis, or demonstration, ensuring that test cases can be derived directly from them.[28] According to ISO/IEC/IEEE 29148:2018, testable requirements must be structured with clear, measurable success criteria to prove their realization, avoiding subjective or ambiguous language that hinders verification.[29] Similarly, IEEE Std 830-1998 defines verifiability as a core quality where every requirement allows checking compliance via a finite process, enabling unambiguous derivation of test cases during downstream development.[30]

Key factors influencing the testability of requirements include completeness, traceability, and verifiability. Completeness ensures requirements fully address stakeholder needs without vague terms like "user-friendly" or unresolved placeholders (e.g., "to be determined"), covering all inputs, outputs, and quality attributes.[28] Traceability requires unique identifiers and bidirectional links from high-level stakeholder needs to lower-level specifications, facilitating impact analysis and test coverage mapping via tools like a Requirements Traceability Matrix.[29] Verifiability demands quantifiable acceptance criteria, such as performance thresholds or error rates, to support atomic requirements, each stating a single condition without compounding multiple obligations.[30]

Techniques for assessing requirement testability often involve checklist-based reviews aligned with established standards. For instance, IEEE Std 830-1998 provides criteria to evaluate whether requirements are unambiguous (single interpretation), complete (no open items), and verifiable (measurable outcomes), using reviews to flag issues like subjective aesthetics or inconsistent terminology.[30] ISO/IEC/IEEE 29148:2018 extends this with checklists for clarity, feasibility, and conformity, incorporating stakeholder reviews, prototyping, and modeling to validate testability before specification finalization.[28] These methods promote early identification of gaps, aligning with SMART-like principles (Specific, Measurable, Achievable, Relevant, Testable) adapted for requirements engineering.

Representative examples illustrate the distinction between testable and non-testable requirements. A poor requirement, such as "The system should be fast," lacks measurable criteria and invites subjective interpretation, making test design impossible.[30] In contrast, a good one states: "The system shall achieve a response time of less than 2 seconds for 95% of user queries under normal load," providing quantifiable metrics for verification through performance testing.[28] Non-testable requirements contribute to project risks, including scope creep from repeated clarifications and integration failures due to unverified interfaces or assumptions.[32] In high-stakes domains like space software, such requirements exacerbate verification challenges, leading to defects that propagate during system assembly.[33] This underscores the role of testable requirements in preventing early defects and supporting efficient test design.
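A hedged sketch of how such a quantitative requirement can be turned into an automated check is shown below; `run_query` stands in for a call to the system under test, and the timing figures mirror the example requirement, all other details being assumptions.

```python
# Sketch: verifying "95% of queries respond in under 2 seconds" as an automated test.
import time

def run_query(query: str) -> str:
    """Placeholder for the system under test; replace with a real client call."""
    time.sleep(0.01)
    return "ok"

def test_response_time_requirement():
    durations = []
    for i in range(100):                      # a sample of user queries under normal load
        start = time.perf_counter()
        run_query(f"query-{i}")
        durations.append(time.perf_counter() - start)

    within_limit = sum(1 for d in durations if d < 2.0)
    assert within_limit / len(durations) >= 0.95   # the quantified acceptance criterion
```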
Measurement

Metrics
Software testability is quantified through various metrics that evaluate structural, behavioral, and design aspects of the software, enabling objective assessment of testing effort and effectiveness. These metrics are typically categorized into domain-based, object-oriented, and module-based types, each focusing on different dimensions such as input-output mapping, class design, and modular dependencies. By applying these measures, developers can identify areas where testability is compromised, such as excessive complexity or poor visibility into internal states.

A key dynamic approach to measuring testability is the PIE (Propagation, Infection, and Execution) model developed by Jeffrey Voas. This fault injection-based technique estimates the probability that a fault at a program location will be revealed during random testing. Execution probability measures the likelihood that a location is executed by random inputs. Infection probability assesses the chance that a fault at the location alters the program's state differently from the correct version. Propagation probability evaluates how likely a faulty state is to lead to an observable output error. The overall testability at a location is the product of these three probabilities, providing a score between 0 and 1, where higher values indicate better fault detection potential. This model complements static metrics by focusing on runtime behavior.[34]

Domain-based metrics assess testability from the perspective of the software's input domain and output range, drawing from finite state or functional models. Observability, which measures the ease of distinguishing internal states through outputs, is calculated as the ratio of unique outputs to the total number of states in the input domain; a value closer to 1 indicates high observability, as fewer inputs map to the same output, facilitating fault detection. This metric, known as the inverse of the domain-range ratio (DRR), highlights how information loss in output mapping reduces testability. Controllability, conversely, evaluates the ability to reach desired internal states via inputs and is often measured as the ratio of consistent inputs (those reliably producing specific states) to total input variations; higher values signify better control over execution paths for targeted testing. In behavioral models, controllability can alternatively be expressed as the proportion of controllable paths to total paths, aiding in the design of comprehensive test sequences.

Object-oriented metrics, part of the Chidamber and Kemerer (CK) suite, target class-level properties to predict testing challenges arising from inheritance, coupling, and cohesion. The Depth of Inheritance Tree (DIT) is defined as the maximum number of inheritance levels from the base class to the leaf class, with deeper trees increasing complexity due to broader behavioral variations that must be tested. The Response For a Class (RFC) counts the total methods directly or indirectly invoked by the class's methods, including those overridden in subclasses; elevated RFC values correlate with higher coupling and thus greater testing overhead.
| Metric | Formula | Description | Interpretation |
|---|---|---|---|
| Lack of Cohesion in Methods (LCOM) | LCOM = P − Q when P > Q, else 0 (P: number of method pairs sharing no attributes, Q: number sharing at least one) | Measures intra-class cohesion based on shared attribute access among methods; similarities are typically the count or Jaccard index of common fields. | High LCOM signals low cohesion, complicating unit testing by scattering responsibilities and increasing fault isolation difficulty. |
| Weighted Methods per Class (WMC) | WMC = Σ cᵢ, the sum of the complexities cᵢ of the class's methods (e.g., cyclomatic complexity V(G) = E − N + 2P, where E is the number of edges, N the nodes, and P the connected components) | Sums the complexities of all methods in a class to gauge overall class intricacy. | Higher WMC elevates testing effort, as more paths require coverage; values exceeding class-specific thresholds indicate potential refactoring needs. |
| Fan-Out (FOUT) | Number of distinct modules called by the module | Quantifies outgoing dependencies, reflecting coupling to external components. | High FOUT reduces modularity, amplifying integration testing complexity through ripple effects. |
| Lines of Code per Class (LOCC) | Total lines of code divided by number of classes | Provides a size-based proxy for testing volume. | Excessive LOCC correlates with higher defect density and prolonged testing, though it should be combined with complexity metrics for accuracy. |
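The PIE-style estimate described above can be illustrated with a small simulation. The sketch below estimates execution, infection, and propagation frequencies for one invented program location by comparing a correct computation with a mutated version over random inputs; it is a loose, toy approximation of the idea, not the published technique's exact procedure.

```python
# Toy illustration of a PIE-style estimate for one (invented) program location:
# the location computes an intermediate value; later code may mask the fault.
import random

def intermediate_correct(x: int) -> int:
    return x % 7            # location under study

def intermediate_faulty(x: int) -> int:
    return x % 5            # injected fault at the location

def finish(v: int) -> int:
    return min(v, 3)        # downstream code that can mask state differences

def pie_estimate(trials: int = 100_000) -> float:
    executed = infected = propagated = 0
    for _ in range(trials):
        x = random.randint(0, 1000)                 # random test input
        executed += 1                               # location reached on every run here
        good_state = intermediate_correct(x)
        bad_state = intermediate_faulty(x)
        if good_state != bad_state:                 # infection: data state differs
            infected += 1
            if finish(good_state) != finish(bad_state):   # propagation: visible at output
                propagated += 1
    e = executed / trials
    i = infected / executed if executed else 0.0
    p = propagated / infected if infected else 0.0
    return e * i * p                                # PIE testability score in [0, 1]

if __name__ == "__main__":
    print(f"estimated testability of the location: {pie_estimate():.3f}")
```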
Tools
Static analysis tools play a crucial role in assessing software testability by identifying code characteristics that hinder observability and controllability, such as high complexity and duplication. SonarQube, an open-source platform for continuous code quality inspection, analyzes source code to detect metrics like cyclomatic complexity, which measures the number of linearly independent paths through a program, and code duplication, where repeated blocks increase maintenance efforts and reduce test coverage efficiency. These features help developers refactor code early to improve modularity and isolation for testing. Similarly, PMD, an extensible cross-language static code analyzer, detects anti-patterns like excessive method length or unused variables that complicate unit testing by obscuring dependencies and state changes. By flagging these issues through customizable rule sets, PMD enables proactive enhancements to code structure, making it easier to inject mocks or assertions during test execution.

Testing frameworks facilitate the execution of tests that verify controllability and observability at various levels. JUnit, the standard unit testing framework for Java, supports testability through its assertion methods, such as assertEquals and assertThrows, which allow precise verification of expected behaviors, and integration with mocking libraries like Mockito to isolate dependencies for controlled testing environments. For Python applications, pytest provides a robust framework for unit testing with fixtures that set up and tear down test environments, enabling high controllability via parameterized tests and mocking capabilities through plugins like pytest-mock, thus supporting readable and maintainable test suites. At the user interface level, Selenium automates browser interactions to assess UI testability, allowing scripts to simulate user actions across multiple browsers and exercise internal state changes for end-to-end validation.

Observability tools enhance testability by capturing runtime data to monitor internal states during execution. The ELK Stack, comprising Elasticsearch for storage, Logstash for processing, and Kibana for visualization, collects and analyzes logs from applications, providing insights into component behaviors and error conditions that are difficult to observe through code inspection alone. This setup is particularly useful for integration testing, where aggregated logs reveal hidden interactions and facilitate debugging of non-deterministic issues.

Integrated testing suites offer comprehensive environments for cross-platform component testing, streamlining test creation and maintenance. TestComplete, developed by SmartBear, supports automated testing across desktop, web, and mobile platforms with features like record-and-playback for script generation and AI-powered low-code test creation, including auto-generation of test cases from user interface elements to ensure coverage of diverse scenarios.[35] Appium, an open-source framework for mobile automation, enables cross-platform testing of native, hybrid, and web apps on iOS and Android using a unified WebDriver protocol, allowing component-level tests to verify interactions without platform-specific code.

When selecting tools for testability support, compatibility with programming languages and seamless integration into continuous integration/continuous deployment (CI/CD) pipelines are essential criteria. For object-oriented languages like Java, Eclipse plugins such as the Metrics tool compute code properties like coupling and cohesion to evaluate design testability directly within the IDE. Tools like SonarQube and JUnit integrate natively with Jenkins, an open-source automation server, to automate test runs in pipelines, ensuring testability checks occur with every build for early defect detection.
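As a hedged illustration of the pytest features mentioned above, the sketch below combines a fixture, parameterized cases, and the built-in monkeypatch and tmp_path fixtures to control a dependency and isolate file output; the functions under test are invented for the example.

```python
# Illustrative pytest sketch (invented functions): a fixture provides a temporary
# workspace, parameterization covers several cases, and monkeypatch controls an
# environment-based dependency without touching the code under test.
import os
import pytest

def greeting() -> str:
    """Code under test: behaviour depends on an environment variable."""
    name = os.environ.get("APP_USER", "anonymous")
    return f"hello, {name}"

@pytest.fixture
def workspace(tmp_path):
    # Setup: create an isolated directory; pytest cleans tmp_path up afterwards.
    target = tmp_path / "reports"
    target.mkdir()
    return target

@pytest.mark.parametrize("user,expected", [("alice", "hello, alice"), ("bob", "hello, bob")])
def test_greeting_uses_configured_user(monkeypatch, user, expected):
    monkeypatch.setenv("APP_USER", user)      # controllability via the environment
    assert greeting() == expected

def test_report_written_to_workspace(workspace, monkeypatch):
    monkeypatch.delenv("APP_USER", raising=False)
    report = workspace / "out.txt"
    report.write_text(greeting())
    assert report.read_text() == "hello, anonymous"  # observable output in isolation
```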
Enhancing Testability

Design Principles
Design principles for software testability emphasize proactive incorporation of features during the architecture and design phases to facilitate effective testing without compromising system integrity. These principles focus on creating structures that allow components to be isolated, observed, and controlled efficiently, reducing the complexity of test setup and execution. By adhering to such principles, developers can ensure that test cases are more reliable and maintainable, ultimately lowering the cost of quality assurance.[36]

Modularity and loose coupling form a cornerstone of testable design, promoting the decomposition of software into independent units that can be tested in isolation. This approach relies on interfaces and the dependency inversion principle, where high-level modules depend on abstractions rather than concrete implementations, enabling the substitution of real dependencies with mocks or stubs during unit testing. For instance, defining repository interfaces allows developers to inject fake implementations for database interactions, isolating business logic from external concerns like persistence layers. Loose coupling minimizes interdependencies, making it easier to verify individual components without cascading effects across the system.[36][37]

Built-in observability integrates mechanisms for monitoring and inspecting system behavior directly into the design, such as logging and tracing, to enhance visibility during testing. The Twelve-Factor App methodology advocates treating logs as event streams output to stdout, which decouples logging from the application and allows external tools to capture and analyze outputs for verifying expected behaviors. This principle ensures that internal states and interactions are observable without invasive modifications, supporting automated tests that assert on traces or logs to confirm correctness. Health checks and metrics endpoints further aid in assessing component status, enabling rapid identification of issues in distributed environments.[38][39]

Controllability via configuration externalizes parameters and dependencies, allowing testers to manipulate inputs and states through environment variables or mocks without altering code. In line with the Twelve-Factor App's configuration factor, storing settings in the environment promotes portability and enables environment-specific overrides, such as switching to test databases or disabling external services. This design facilitates isolated testing by providing hooks like set/reset methods or injectable configurations, ensuring components can be placed in precise initial states for reproducible test scenarios. Academic work on design for testability highlights that such controllability reduces the effort needed to feed inputs and observe outputs, directly improving test automation efficiency.[40][26]

Simplicity in architecture, guided by principles like SOLID (Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion), avoids deep nesting and promotes decomposability for easier testing. The Single Responsibility Principle, for example, confines classes to one concern, allowing focused unit tests without entangled logic. These principles collectively foster architectures where changes are localized, enhancing overall test coverage and fault isolation.[37]

In microservices architectures, API contracts defined through contract-first development exemplify these principles by establishing clear interfaces upfront, ensuring alignment between services for integration testing. This approach uses tools to generate stubs from contracts, enabling parallel development and isolated verification of service interactions without full system deployment. Such designs improve testability by preventing contract drift and supporting consumer-driven tests that validate provider compliance.[41]
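A hedged sketch of the repository-interface idea described above is shown below: the service depends on an abstraction, so a test can inject an in-memory fake in place of a database-backed implementation. The interface and classes are invented for illustration.

```python
# Sketch of dependency inversion for testability (all names invented):
# the service depends on an abstract repository, so a test can inject a fake
# in place of a real database-backed implementation.
from abc import ABC, abstractmethod

class OrderRepository(ABC):
    @abstractmethod
    def find_total(self, customer_id: str) -> float: ...

class InMemoryOrderRepository(OrderRepository):
    """Fake implementation used in tests instead of a persistence layer."""
    def __init__(self, totals: dict[str, float]):
        self._totals = totals

    def find_total(self, customer_id: str) -> float:
        return self._totals.get(customer_id, 0.0)

class LoyaltyService:
    def __init__(self, orders: OrderRepository):
        self._orders = orders            # high-level module depends on the abstraction

    def is_gold_customer(self, customer_id: str) -> bool:
        return self._orders.find_total(customer_id) >= 1000.0

def test_gold_status_computed_from_order_total():
    service = LoyaltyService(InMemoryOrderRepository({"c-1": 1500.0, "c-2": 200.0}))
    assert service.is_gold_customer("c-1") is True
    assert service.is_gold_customer("c-2") is False
```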
AI-Assisted Enhancements

As of 2025, artificial intelligence (AI) and machine learning (ML) have become integral to enhancing software testability, particularly in automated refactoring and predictive design improvements. ML models can predict testability scores at the design stage based on code metrics, enabling proactive adjustments to boost fault detection probabilities by up to 19% in evaluated projects.[42] Agentic AI tools automate test case generation and maintenance, achieving higher coverage in dynamic environments like DevOps pipelines while reducing manual effort. These techniques integrate with existing principles, such as using AI to suggest modular refactors or optimize observability in distributed systems, aligning with industry reports showing 45% efficiency gains in test automation.[43][44]

Best Practices
Refactoring existing code to enhance testability involves introducing assertions to validate program states and logging mechanisms to capture runtime behavior, which can be implemented after initial design to facilitate debugging and verification without altering core functionality.[45] Assertions serve as runtime checks that enforce expected conditions, thereby improving the observability and isolability of software components during testing.[46] Logging, when structured and integrated post-design, provides traceable data flows that aid in reproducing defects and assessing system responses, enhancing overall test precision. Additionally, iteratively reducing cyclomatic complexity (measured as the number of linearly independent paths through code) through techniques like extracting methods or simplifying conditionals makes modules easier to test by minimizing the required test cases and improving coverage efficiency.[47] Studies show that such refactoring can increase design testability by up to 19% when applied to creational patterns, directly correlating with lower testing effort.[42] An analysis of over 346,000 pull requests revealed common testability refactoring patterns, such as modularizing conditional logic, which developers use to boost maintainability and test suite robustness.[48]

Test-driven development (TDD) promotes testability by requiring developers to write automated tests before implementing functionality, inherently enforcing controllability and observability from the outset of the coding process.[49] This approach results in smaller, less complex code units that are more modular and easier to isolate for testing, as evidenced by empirical studies showing TDD practitioners produce code with higher cohesion and fewer defects.[50] By iterating through red-green-refactor cycles, TDD ensures that tests drive design decisions toward testable structures, such as dependency injection, reducing integration challenges later in development.[51] Research on TDD's long-term effects confirms it fosters adaptable codebases with improved fault detection, particularly in iterative environments.[52]

Maintaining alignment between documentation and code is essential for testability, achieved through traceable specifications that map requirements to test cases and comprehensive code comments that clarify intent and boundaries.[53] A requirements traceability matrix (RTM) ensures every requirement is linked to corresponding tests, enabling verification of coverage and facilitating impact analysis during changes.[54] Code comments, when high-quality and focused on rationale rather than mechanics, support comprehension and maintenance, aiding testers in understanding edge cases and assumptions.[55] This documentation practice enhances understandability, allowing teams to design and execute tests that accurately reflect intended behavior without ambiguity.[56]

Emphasizing automation in testing practices involves prioritizing at least 80% coverage of critical paths to balance efficiency and reliability, coupled with regular reviews of test suites to ensure stability and relevance.[57] Achieving 70-80% automation coverage for repetitive and high-risk scenarios minimizes manual effort while maximizing regression detection, as recommended in industry benchmarks for sustainable test maintenance.[58] Periodic suite reviews identify flaky tests or gaps, promoting iterative improvements that keep automation effective amid evolving codebases.[59]

In large-scale systems, Google's Site Reliability Engineering (SRE) practices exemplify these approaches through heavy reliance on observability, integrating monitoring, logging, and alerting to achieve high testability and reliability.[60] By employing black-box monitoring for external behaviors and white-box metrics for internals, like the four golden signals of latency, traffic, errors, and saturation, SRE teams enable comprehensive testing that verifies user-facing performance and internal states in distributed environments.[60] This observability framework supports proactive reliability engineering, reducing downtime and testing costs in production-scale operations.[61]
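As a hedged sketch of the assertion-and-logging practice described above, the fragment below adds runtime checks and a structured log line to an invented function, and shows how a test could observe the log output via pytest's caplog fixture; the function, its rules, and the logger name are assumptions for the example.

```python
# Hedged sketch of retrofitting assertions and structured logging for testability;
# the function and its rules are invented for the example.
import logging

logger = logging.getLogger("billing")

def apply_discount(price: float, rate: float) -> float:
    # Assertions make implicit preconditions explicit and observable in tests.
    assert price >= 0, f"price must be non-negative, got {price}"
    assert 0 <= rate <= 1, f"rate must be within [0, 1], got {rate}"

    discounted = round(price * (1 - rate), 2)

    # Structured log line: an observable trace that tests or log pipelines can check.
    logger.info("discount applied", extra={"price": price, "rate": rate, "result": discounted})
    return discounted

def test_discount_is_logged_and_bounded(caplog):
    # pytest's caplog fixture captures log records for assertions on observability.
    with caplog.at_level(logging.INFO, logger="billing"):
        assert apply_discount(100.0, 0.25) == 75.0
    assert "discount applied" in caplog.text
```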
References

- https://www.nasa.gov/reference/appendix-c-how-to-write-a-good-requirement/
