Reliability Testing

Measuring Software Maturity

An objective of reliability testing is to monitor a statistical measure of software maturity over time and compare it with a desired reliability goal, which may be expressed as a Service Level Agreement (SLA). The measure may take the form of Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR) or any other form of failure intensity measurement (e.g., number of failures of a particular severity occurring per week). Such measures may be used as exit criteria (e.g., for production release).
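As an illustration of how such measures might be derived from an outage log, the following minimal sketch computes MTBF, MTTR and a weekly failure intensity for one observation window. The Outage record, the sample figures and the 200-hour goal are hypothetical. MTBF is taken here as operating time divided by the number of failures, which is one common convention; some organizations include repair time in the interval.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    start_h: float  # hours since observation began when the failure occurred
    end_h: float    # hours since observation began when service was restored

def reliability_measures(outages, observed_hours):
    """Return (mtbf, mttr, failures_per_week) for one observation window."""
    if not outages:
        raise ValueError("no failures observed; MTBF is undefined for this window")
    downtime = sum(o.end_h - o.start_h for o in outages)
    uptime = observed_hours - downtime
    count = len(outages)
    mtbf = uptime / count                         # mean operating time between failures
    mttr = downtime / count                       # mean time to restore service
    per_week = count / (observed_hours / 168.0)   # 168 hours in a week
    return mtbf, mttr, per_week

# Hypothetical four-week (672-hour) observation window with three outages.
outages = [Outage(100.0, 102.0), Outage(400.0, 401.0), Outage(650.0, 653.0)]
mtbf, mttr, per_week = reliability_measures(outages, observed_hours=672.0)

# Hypothetical exit criterion: MTBF of at least 200 hours.
meets_goal = mtbf >= 200.0
```

Recomputing the same measures for each successive window gives the trend over time that can be compared with the reliability goal.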

Tests for Fault Tolerance

Functional testing evaluates the software’s tolerance to faults in terms of handling unexpected input values (so-called negative tests). Further testing is needed to evaluate a system’s tolerance to faults which occur externally to the application under test. Such faults are typically reported by the operating system (e.g., disk full, process or service not available, file not found, memory not available). Tests of fault tolerance at the system level may be supported by specific tools.
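One way to exercise tolerance to an external fault without actually filling a disk is to inject the operating system error at the point where the application would receive it. The sketch below does this for a "disk full" condition using Python's standard unittest.mock. The function save_report and its expected behavior (returning False rather than crashing) are hypothetical stand-ins for the application under test.

```python
import errno
from unittest import mock

def save_report(path, text):
    """Hypothetical function under test: True on success, False if the disk is full."""
    try:
        with open(path, "w") as handle:
            handle.write(text)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False   # tolerated fault: report it instead of crashing
        raise              # any other fault is not handled here

def run_disk_full_check():
    """Inject a 'disk full' error where the application opens the file."""
    disk_full = OSError(errno.ENOSPC, "No space left on device")
    with mock.patch("builtins.open", side_effect=disk_full):
        return save_report("report.txt", "data")

result = run_disk_full_check()
```

The same pattern can be repeated for other faults reported by the operating system, such as a missing file, by changing the injected error. Dedicated fault injection tools serve the same purpose at the system level.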

Note that the terms “robustness” and “error tolerance” are also commonly used when discussing fault tolerance.

Recoverability Testing

Further forms of reliability testing evaluate the software system’s ability to recover from hardware or software failures in a predetermined manner which subsequently allows normal operations to be resumed. Recoverability tests include Failover and Backup and Restore tests.

Failover tests are performed where the consequences of a software failure are so negative that specific hardware and/or software measures have been implemented to ensure system operation even in the event of failure. Failover tests may be applicable, for example, where the risk of financial losses is extreme or where critical safety issues exist. Where failures may result from catastrophic events, this form of recoverability testing may also be called “disaster recovery” testing.

Typical preventive measures for hardware failures might include load balancing across several processors and clustering servers, processors or disks so that one can immediately take over from another if it should fail (redundant systems). A typical software measure might be the implementation of more than one independent instance of a software system (for example, an aircraft’s flight control system) in so-called redundant dissimilar systems. Redundant systems are typically a combination of software and hardware measures and may be called duplex, triplex or quadruplex systems, depending on the number of independent instances (two, three or four respectively). The dissimilar aspect for the software is achieved when the same software requirements are provided to two (or more) independent development teams that have no contact with each other, with the objective of having the same services provided by different software. This protects the redundant dissimilar systems in that the same defective input is less likely to produce the same failure in every instance. These measures taken to improve the recoverability of a system may directly influence its reliability as well and should also be considered when performing reliability testing.

Failover testing is designed to explicitly test systems by simulating failure modes or actually causing failures in a controlled environment. Following a failure, the failover mechanism is tested to ensure that data is not lost or corrupted and that any agreed service levels are maintained (e.g., function availability or response times).
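The idea can be sketched with a toy duplex system. The Node and Cluster classes below are hypothetical and run in a single process. The test loads data, deliberately takes the primary node down, and then checks that no data was lost and that the service still responds within an assumed limit of 0.5 seconds.

```python
import time

class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}

    def put(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def get(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]

class Cluster:
    """Toy duplex system: writes go to every live node, reads use the first live node."""
    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        live = [node for node in self.nodes if node.alive]
        if not live:
            raise ConnectionError("no node available")
        for node in live:
            node.put(key, value)

    def get(self, key):
        for node in self.nodes:
            if node.alive:
                return node.get(key)
        raise ConnectionError("no node available")

def run_failover_test(max_response_s=0.5):
    primary, standby = Node("primary"), Node("standby")
    cluster = Cluster([primary, standby])
    for i in range(100):
        cluster.put(i, f"record-{i}")

    primary.alive = False                 # inject the failure
    started = time.perf_counter()
    cluster.put(100, "record-100")        # service must continue on the standby
    recovered = {i: cluster.get(i) for i in range(101)}
    elapsed = time.perf_counter() - started

    data_intact = recovered == {i: f"record-{i}" for i in range(101)}
    return data_intact, elapsed <= max_response_s

data_intact, within_service_level = run_failover_test()
```

In a real failover test the failure would be caused in the actual environment, for example by stopping a server or disconnecting a network link, but the two checks (data integrity and agreed service levels) stay the same.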

Backup and Restore tests focus on the procedural measures set up to minimize the effects of a failure. Such tests evaluate the procedures (usually documented in a manual) for taking different forms of backup and for restoring that data if data loss or corruption should occur. Test cases are designed to ensure that critical paths through each procedure are covered. Technical reviews may be performed to “dry-run” these scenarios and validate the manuals against the actual procedures. Operational Acceptance Tests (OAT) exercise the scenarios in a production or production-like environment to validate their actual use.

Measures for Backup and Restore tests may include the following:

  • Time taken to perform different types of backup (e.g., full, incremental)
  • Time taken to restore data
  • Levels of guaranteed data backup (e.g., recovery of all data no more than 24 hours old, recovery of specific transaction data no more than one hour old)
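The first two measures listed above can be collected by timing the procedures and verifying the restored data. The following is a minimal sketch that uses a directory copy as a stand-in for a full backup. The file layout, sizes and helper names are hypothetical, and a real test would invoke the documented backup and restore procedures instead.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path

def checksum(root):
    """Hash the relative paths and contents of all files under root."""
    digest = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(root).as_posix().encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

def timed(func, *args):
    """Run func(*args) and return the elapsed time in seconds."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start

def backup_restore_drill():
    with tempfile.TemporaryDirectory() as work:
        work = Path(work)
        live, backup, restored = work / "live", work / "backup", work / "restored"
        live.mkdir()
        for i in range(20):
            (live / f"file{i}.dat").write_bytes(bytes([i]) * 1024)
        before = checksum(live)

        backup_s = timed(shutil.copytree, live, backup)       # full backup
        shutil.rmtree(live)                                   # simulate data loss
        restore_s = timed(shutil.copytree, backup, restored)  # restore

        return {
            "backup_s": backup_s,
            "restore_s": restore_s,
            "intact": checksum(restored) == before,
        }

report = backup_restore_drill()
```

The measured times can then be compared with the agreed limits, and the integrity flag confirms that the restored data matches what was backed up.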

Reliability Test Planning

In general the following aspects are of particular relevance when planning reliability tests:

  • Reliability can continue to be monitored after the software has entered production. The organization and staff responsible for operation of the software must be consulted when gathering reliability requirements for test planning purposes.
  • The Technical Test Analyst may select a reliability growth model which shows the expected levels of reliability over time. A reliability growth model can provide useful information to the Test Manager by enabling comparison of the expected and achieved reliability levels.
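As an example of comparing expected and achieved levels, the sketch below uses the Goel-Okumoto model, in which the expected cumulative number of failures found by time t is a(1 - e^(-bt)). The choice of this model is an assumption made for illustration, as are the parameter values (a = 40 total expected failures, b = 0.4 per week) and the observed counts. In practice the parameters would be fitted to project data.

```python
import math

def expected_cumulative_failures(t_weeks, a, b):
    """Goel-Okumoto mean value function: a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t_weeks))

def compare(observed, a, b):
    """Return (week, expected, observed, difference) for each week of testing."""
    rows = []
    for week, seen in enumerate(observed, start=1):
        expected = expected_cumulative_failures(week, a, b)
        rows.append((week, expected, seen, seen - expected))
    return rows

# Hypothetical cumulative failure counts at the end of weeks 1 to 5.
observed = [12, 20, 27, 31, 33]
rows = compare(observed, a=40.0, b=0.4)

for week, expected, seen, diff in rows:
    print(f"week {week}: expected {expected:5.1f}, observed {seen:3d}, difference {diff:+.1f}")
```

A persistent gap between the expected and observed curves is the kind of information that can be reported to the Test Manager.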

Reliability tests should be conducted in a production-like environment. The environment used should remain as stable as possible to enable reliability trends to be monitored over time. Because reliability tests often require use of the entire system, reliability testing is most commonly done as part of system testing. However, individual components can be subjected to reliability testing as well as integrated sets of components. Detailed architecture, design and code reviews can also be used to remove some of the risk of reliability issues occurring in the implemented system.

In order to produce test results that are statistically significant, reliability tests usually require long execution times. This may make them difficult to schedule alongside other planned tests.

Reliability Test Specification

Reliability testing may take the form of a repeated set of predetermined tests. These may be tests selected at random from a pool or test cases generated by a statistical model using random or pseudo-random methods. Tests may also be based on patterns of use which are sometimes referred to as “Operational Profiles”.
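A test sequence based on an operational profile can be generated by drawing operations at random in proportion to their expected frequency of use. In the sketch below the operation names and probabilities are hypothetical. A fixed seed makes the pseudo-random sequence repeatable, so the same set of tests can be re-run.

```python
import random

# Hypothetical operational profile: operation -> relative frequency of use.
OPERATIONAL_PROFILE = {
    "search": 0.60,
    "view_item": 0.25,
    "checkout": 0.10,
    "refund": 0.05,
}

def generate_test_sequence(profile, length, seed):
    """Draw a repeatable sequence of operations weighted by the profile."""
    rng = random.Random(seed)
    operations = list(profile)
    weights = list(profile.values())
    return rng.choices(operations, weights=weights, k=length)

sequence = generate_test_sequence(OPERATIONAL_PROFILE, length=1000, seed=42)
```

Each name in the sequence would then be mapped to a predetermined test case from the pool and executed in order.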

Certain reliability tests may specify that memory-intensive actions be executed repeatedly so that possible memory leaks can be detected.
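A repeated-execution check of this kind might look as follows. The sketch uses Python's standard tracemalloc module to measure how much memory is still allocated after an action has been repeated many times. Both actions and the deliberately leaking cache are hypothetical, and the repetition count would need tuning for a real system.

```python
import gc
import tracemalloc

_cache = []  # hypothetical defect: grows on every call and is never cleared

def leaky_action():
    _cache.append(bytearray(10_000))

def clean_action():
    buffer = bytearray(10_000)   # released when the function returns
    return len(buffer)

def memory_growth(action, repetitions=500):
    """Return the net bytes still allocated after repeating the action."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(repetitions):
            action()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before

leaky_growth = memory_growth(leaky_action)
clean_growth = memory_growth(clean_action)
```

Growth that scales with the number of repetitions, as with leaky_action here, points to a possible leak, whereas the growth for clean_action should stay close to zero.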