Hyper Converged Infrastructure (HCI/aSV)

Sangfor HCI and aSV provide a unified infrastructure combining compute, storage, networking, and built-in security to simplify deployment, operations, and services.
6.11.1R1

Host Health Monitoring

Updated: 2026-01-05

Description

The HCI platform automatically identifies and displays the health of hosts. Hosts judged to be unhealthy are given lower priority when a virtual machine is powered on or when HA is performed. In scenarios such as cluster capacity expansion and host replacement, hardware status is checked to avoid frequent node downtime or suspended systems caused by hardware failures, which reduces the business risk posed by hardware problems.
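
The lower priority given to unhealthy hosts can be pictured as a sort in which those hosts fall to the end of the candidate list. The sketch below is illustrative only: the `Host` type, its fields, and `rank_hosts_for_placement` are hypothetical names and not part of the HCI platform, and using free memory as the tiebreaker is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    healthy: bool
    free_memory_gb: int  # hypothetical tiebreaker, not taken from the product


def rank_hosts_for_placement(hosts):
    """Order candidate hosts so that unhealthy ones are tried last."""
    # `not h.healthy` is False for healthy hosts, and False sorts before True.
    # Within each group, hosts with more free memory come first.
    return sorted(hosts, key=lambda h: (not h.healthy, -h.free_memory_gb))


hosts = [
    Host("node-a", healthy=False, free_memory_gb=512),
    Host("node-b", healthy=True, free_memory_gb=128),
    Host("node-c", healthy=True, free_memory_gb=256),
]
ranked = [h.name for h in rank_hosts_for_placement(hosts)]
print(ranked)  # ['node-c', 'node-b', 'node-a']
```

In this sketch, an unhealthy host remains a candidate and is simply considered after every healthy host, even if it has the most free resources.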

Precautions

  1. Only hosts that are suspended because of hardware failures can be identified.
  2. If a host is suspended because of a memory failure while it is powering on or restarting, it will not be automatically released as long as the faulty memory location is not accessed. To remove the unhealthy host manually, resolve the issue and then click Remove to remove the host from the list.

Prerequisites

None.

Steps

  1. Go to Reliability > Host Health Monitoring. Any physical host that is automatically identified as unhealthy is displayed in the Unhealthy Hosts list.

  2. Go to Health Monitoring to configure Host Health and Network Health.

Unhealthy Metrics: If Host hardware (CPU, memory, system disk or RAID card) anomaly is enabled, the following items are checked: ECC Memory, UECC Memory, Bad Sector in System Disk, System Disk Read-Only, Short System Disk Lifetime Remaining, and RAID Card Failure. You can also customize the crash frequency used to identify unhealthy nodes.
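
As a rough illustration of how these metrics might combine, the sketch below marks a host unhealthy when any enabled hardware check fails or when its crash count reaches a configurable threshold. The identifiers, the `HostReport` type, the default threshold of 3, and the OR combination of the two criteria are all assumptions made for this example. The platform's actual evaluation logic is not documented here.

```python
from dataclasses import dataclass, field

# Check names mirror the items listed above; the identifiers are hypothetical.
HARDWARE_CHECKS = (
    "ecc_memory",
    "uecc_memory",
    "system_disk_bad_sector",
    "system_disk_read_only",
    "system_disk_short_lifetime",
    "raid_card_failure",
)


@dataclass
class HostReport:
    name: str
    failed_checks: set = field(default_factory=set)
    crash_count: int = 0  # crashes seen in the evaluation window (assumed)


def is_unhealthy(report, hardware_anomaly_enabled=True, crash_threshold=3):
    """Return True if the host meets either unhealthy criterion."""
    # Assumption: reaching the crash threshold is sufficient on its own.
    if report.crash_count >= crash_threshold:
        return True
    # Hardware checks only count when the anomaly option is enabled.
    if hardware_anomaly_enabled:
        return any(check in report.failed_checks for check in HARDWARE_CHECKS)
    return False


print(is_unhealthy(HostReport("node-a", {"raid_card_failure"})))  # True
print(is_unhealthy(HostReport("node-b", crash_count=2)))          # False
```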

Check Schedule: If Health Monitoring is enabled, the check will be automatically performed at each host startup or restart. You can also customize the check interval.

Notification Method: Go to aSecurity > Security Settings > Alert Options to configure email notification, as described in Section 9.5.1 Alert Options. When an unhealthy node is detected, the platform sends an email with the check results.

Recovery Method: You can go to Reliability > HA to configure recovery methods for host hardware failures.

Fixing Method: This mechanism only migrates VMs from unhealthy nodes to healthy nodes. VMs configured with a scheduling policy are scheduled according to that policy first, and VMs without a scheduling policy are migrated only to nodes that are relatively healthy. This mechanism does not take effect if there are no healthy nodes in the cluster, and it does not apply to NFV devices.
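
The migration rules above can be sketched as a small planning function. This is a minimal illustration under stated assumptions, not the platform's implementation: `Node`, `VM`, `plan_migrations`, and the `load` field are hypothetical, "relatively healthy" is approximated as the healthy node with the lowest load, and a policy target is honored only when that node is healthy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    healthy: bool
    load: float  # hypothetical 0.0-1.0 utilization, used as a health proxy


@dataclass
class VM:
    name: str
    node: str
    is_nfv: bool = False
    policy_target: Optional[str] = None  # node picked by a scheduling policy


def plan_migrations(vms, nodes):
    """Map each VM that should move to its destination node name."""
    by_name = {n.name: n for n in nodes}
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        return {}  # no healthy nodes: the mechanism does not take effect
    plan = {}
    for vm in vms:
        # NFV devices and VMs already on healthy nodes are left alone.
        if vm.is_nfv or by_name[vm.node].healthy:
            continue
        target = by_name.get(vm.policy_target) if vm.policy_target else None
        if target is not None and target.healthy:
            plan[vm.name] = target.name  # scheduling policy takes priority
        else:
            plan[vm.name] = min(healthy, key=lambda n: n.load).name
    return plan


nodes = [Node("n1", False, 0.2), Node("n2", True, 0.7), Node("n3", True, 0.4)]
vms = [
    VM("web", "n1"),
    VM("db", "n1", policy_target="n2"),
    VM("fw", "n1", is_nfv=True),
    VM("app", "n3"),
]
plan = plan_migrations(vms, nodes)
print(plan)  # {'web': 'n3', 'db': 'n2'}
```

Here `fw` stays in place because it is an NFV device, and `app` stays because its node is already healthy.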

Auto Removal: If enabled, unhealthy nodes will be automatically removed from the Unhealthy Nodes list when they are healthy again.