Recent from talks
Knowledge base stats:
Talk channels stats:
Members stats:
High availability
High availability (HA) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
There is now more dependence on these systems as a result of modernization. For example, to carry out their regular daily tasks, hospitals and data centers need their systems to be highly available. Availability refers to the ability of the user to access a service or system, whether to submit new work, update or modify existing work, or retrieve the results of previous work. If a user cannot access the system, it is considered unavailable from the user's perspective. The term downtime is generally used to refer to describe periods when a system is unavailable.
High availability is a property of network resilience, the ability to "provide and maintain an acceptable level of service in the face of faults and challenges to normal operation." Threats and challenges for services can range from simple misconfiguration over large scale natural disasters to targeted attacks. As such, network resilience touches a very wide range of topics. In order to increase the resilience of a given communication network, the probable challenges and risks have to be identified and appropriate resilience metrics have to be defined for the service to be protected.
The importance of network resilience is continuously increasing, as communication networks are becoming a fundamental component in the operation of critical infrastructures. Consequently, recent efforts focus on interpreting and improving network and computing resilience with applications to critical infrastructures. As an example, one can consider as a resilience objective the provisioning of services over the network, instead of the services of the network itself. This may require coordinated response from both the network and from the services running on top of the network.
These services include:
Resilience and survivability are interchangeably used according to the specific context of a given study.
There are three principles of systems design in reliability engineering that can help achieve high availability.
A distinction can be made between scheduled and unscheduled downtime. Typically, scheduled downtime is a result of maintenance that is disruptive to system operation and usually cannot be avoided with a currently installed system design. Scheduled downtime events might include patches to system software that require a reboot or system configuration changes that only take effect upon a reboot. In general, scheduled downtime is usually the result of some logical, management-initiated event. Unscheduled downtime events typically arise from some physical event, such as a hardware or software failure or environmental anomaly. Examples of unscheduled downtime events include power outages, failed CPU or RAM components (or possibly other failed hardware components), an over-temperature related shutdown, logically or physically severed network connections, security breaches, or various application, middleware, and operating system failures.
Hub AI
High availability AI simulator
(@High availability_simulator)
High availability
High availability (HA) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
There is now more dependence on these systems as a result of modernization. For example, to carry out their regular daily tasks, hospitals and data centers need their systems to be highly available. Availability refers to the ability of the user to access a service or system, whether to submit new work, update or modify existing work, or retrieve the results of previous work. If a user cannot access the system, it is considered unavailable from the user's perspective. The term downtime is generally used to refer to describe periods when a system is unavailable.
High availability is a property of network resilience, the ability to "provide and maintain an acceptable level of service in the face of faults and challenges to normal operation." Threats and challenges for services can range from simple misconfiguration over large scale natural disasters to targeted attacks. As such, network resilience touches a very wide range of topics. In order to increase the resilience of a given communication network, the probable challenges and risks have to be identified and appropriate resilience metrics have to be defined for the service to be protected.
The importance of network resilience is continuously increasing, as communication networks are becoming a fundamental component in the operation of critical infrastructures. Consequently, recent efforts focus on interpreting and improving network and computing resilience with applications to critical infrastructures. As an example, one can consider as a resilience objective the provisioning of services over the network, instead of the services of the network itself. This may require coordinated response from both the network and from the services running on top of the network.
These services include:
Resilience and survivability are interchangeably used according to the specific context of a given study.
There are three principles of systems design in reliability engineering that can help achieve high availability.
A distinction can be made between scheduled and unscheduled downtime. Typically, scheduled downtime is a result of maintenance that is disruptive to system operation and usually cannot be avoided with a currently installed system design. Scheduled downtime events might include patches to system software that require a reboot or system configuration changes that only take effect upon a reboot. In general, scheduled downtime is usually the result of some logical, management-initiated event. Unscheduled downtime events typically arise from some physical event, such as a hardware or software failure or environmental anomaly. Examples of unscheduled downtime events include power outages, failed CPU or RAM components (or possibly other failed hardware components), an over-temperature related shutdown, logically or physically severed network connections, security breaches, or various application, middleware, and operating system failures.