Computing has been one of the fastest-developing technologies of the last century. Computing systems are widely used in many areas, and they are expected to carry out complex and safety-critical missions. Their applications now span many fields and appear in a wide range of products and services, for example, air traffic control systems, nuclear power plants, aircraft, real-time military systems, telephone switching, bank auto-payment, and hospital patient monitoring systems.
The size and complexity of computing systems have grown: from a single processor to multiple distributed processors, from separate individual systems to integrated networked systems, from small-scale program execution to large-scale resource sharing, and from local-area computation to global collaboration. A computing system today may contain many processors and communication channels, and it may span a wide geographic area, even the whole world. Such systems combine software and hardware that must function together to complete various tasks. They may have multiple states, and their failures may be correlated with one another. These factors complicate system modeling and analysis. As a result, decisions about system design or resource allocation become correspondingly difficult.
There is no common approach to assessing computing systems. Reliability is a useful quantitative measure in this context, since it can be broadly interpreted as the ability of a system to perform its intended function. Intensive studies on reliability models and analytical tools are carried out to improve the chance that computing systems will perform satisfactorily in operation. As the functions performed by computing systems become more essential, there is a greater need for high reliability of these systems.
In fact, a thorough analysis of reliability is needed in order to increase the performance of computing systems and to improve the development process. Based on the models and analysis, approaches to improving system reliability can then be implemented.