Hyper Converged Infrastructure (HCI/aSV)

Sangfor HCI and aSV provide a unified infrastructure combining compute, storage, networking, and built-in security to simplify deployment, operations, and services.
6.11.1R1

Daily Maintenance

Updated: 2026-01-05

For daily maintenance, check the alert information on the HCI console. Handle emergency-level alerts immediately, and draw up a handling plan for ordinary-level alerts. For specific maintenance recommendations, refer to the following table:

Maintenance Scope

Maintenance scenario

Maintenance advice

Operations

HCI status check

Monitor the hardware resources and components of the HCI node to quickly discover system abnormalities.

Check the status of the physical servers and whether any physical host is offline.

Check whether the storage status and the disk status are normal.

Node alert monitoring

Node swap partition usage is too high.

It is recommended to expand the node memory or shut down temporarily unused virtual machines.

Node offline

It is recommended to check whether the node or network is abnormal and perform service recovery immediately.

The node system partition is abnormal

The system disk may have bad sectors or other failures. Please contact Sangfor technical support.

Node GPU usage is too high

Please shut down some virtual machines or migrate some virtual machines to other nodes.

The node cannot detect the graphics card

Please log in to IPMI to check whether the graphics card is abnormal.

The graphics card temperature of the node is too high

Please check whether the heat dissipation of the node or the temperature of the server room is abnormal.

Multiple types of graphics card hardware detected

Please remove the heterogeneous graphics card. Otherwise, GPU-supported VMs will not run on the node.

Insufficient memory on the node

Please shut down some virtual machines or migrate some virtual machines to other nodes.

Node CPU usage is too high

It is recommended to expand the capacity of the node or shut down virtual machines that are temporarily not in use.

The node CPU temperature is abnormal

It is recommended to check whether the equipment room temperature, the node fans, and the cooling equipment are normal.

Host CPU throttling

If the server raises this alert frequently, check through the BMC whether the CPU hardware status is normal.

Node memory usage is too high

It is recommended to expand the node memory or shut down temporarily unused virtual machines.

Node memory downclocking

It is recommended to log in to the node's BMC console and check the node's memory.

RAID card status is abnormal

Check whether the disk and storage status are normal. If they are not, please contact Sangfor Support promptly.

The memory module (%s) of the node (%s) is faulty.

Please troubleshoot or replace the memory module.

The SMS function is abnormal, and the connection between the SMS agent module and the sending module is abnormal.

Please check whether the SMS configuration is correct and the network connection is normal.

Memory over-provisioning alarm

Memory capacity expansion is recommended.

vCPU overcommitment alert

Node capacity expansion is recommended.

The speed of the interface is too low

It is recommended to replace the NIC or network cable.

Network alert monitoring

The node NIC is working abnormally.

If the alert occurs frequently, it is recommended to replace the host NIC.

The network packet loss rate is too high

Check whether the physical network is abnormal

The node has a persistent packet loss error

Check whether the physical network is abnormal

The VXLAN ports of node xx and node xx are blocked

Check the configuration of the VXLAN port and the configuration of the VXLAN switch.

The interface of node xx is disconnected.

Please check the interface connection status of the node.

Virtual network device (%s) not responding

Check the status of virtual network devices

If node X cannot reach the gateway, please check whether the network connection is abnormal.

It is recommended to check whether the network is normal.

Storage Alarm Monitoring

Storage IO latency is too high

It is recommended to check whether the storage network is normal.

Storage usage is too high

1. Delete virtual machines that are no longer needed.

2. Clear junk files in the recycle bin.

3. Expand storage capacity.

Storage disconnected from Node or Storage dropped.

Check storage and node network conditions in a timely manner.

Abnormal storage status

Check promptly whether the storage is faulty.

Node xx access to storage remains busy.

It is recommended to upgrade storage or migrate some virtual machines to run on other storage.

Bad disk state, remounted.

Log in to the BMC console of the server, check the hardware-related logs, and confirm the cause of the fault.

It is detected that there is data block reuse in the storage. Please contact technical support as soon as possible for assistance.

Contact Sangfor technical support for assistance

Hot spare replacement detected

Log in to the HCI web console to view the status of the replaced disk

Log in to the BMC console of the server to view hardware-related logs.

It is detected that the hard disk (node <%s>, hard disk name: %s) has been pulled out. If it is ejected by mistake, please reinsert the hard disk back to the original disk as soon as possible!

Log in to the HCI web console to view the status of the disk.

Log in to the BMC console of the server to view hardware-related logs.

Disk status is abnormal

Log in to the HCI web console to view the status of the disk.

Log in to the BMC console of the server to view hardware-related logs.

Try unplugging and reinserting the disk.

Disk bad sectors exceed the threshold.

Replace the disk as soon as possible

Disk IO error

Log in to the HCI web console to view the status of the disk.

Log in to the BMC console of the server to view hardware-related logs.

Storage private network exception

Check the storage private network connectivity.

License

License expiration reminder

It is recommended to purchase a new license in time.

The license key status is abnormal.

It is suggested to unplug and reinsert the license key. If it still does not work, contact Sangfor Support.

The licensed USB-KEY is pulled out. Please insert it. Otherwise, the system may be abnormal.

Check whether the USB-KEY is normal, and try unplugging and reinserting it.

Virtual Machine

Scheduled backup of virtual machine fails.

Check whether the backup repository is normal.

The number of connection sessions is too high. The current session connection number is %s, which exceeds the threshold %s. %s

Check whether the service session of the virtual machine is normal, and try to adjust the session threshold.

The CPU usage of the virtual machine continues to be too high.

Check whether the vmTools of the virtual machine are normal.

Try to scale up the vCPU configuration of the virtual machine.

Corrupted virtual machine image file

Check whether the virtual machine can be started normally. If it cannot be started, contact Sangfor technical support.

The virtual machine is out of memory.

Check whether the vmTools of the virtual machine are normal.

Try to expand the memory configuration of the virtual machine.

The physical egress connected to the virtual machine does not bridge the node's interface where the virtual machine is located, which will cause the virtual machine to fail to communicate with the external network.

Bridge the service interface of the node to the physical egress.

The backup image of the virtual machine is found to be corrupted when deleting the backup.

If the virtual machine still exists, please back it up immediately and contact Sangfor technical support.

VM restart CDP fails

Please go to the Administration > VM Backup and Recovery > Backup Policy page to manually enable CDP.

Failed to enable CDP on virtual machine

Navigate to the Reliability > Scheduled Backup/CDP > HCI Backup Policies page to enable CDP manually.

The virtual machine is running, but its configured CDP policy is disabled, and the data is currently in an unprotected state. Please adjust the CDP policy

Please go to the Administration > VM Backup and Recovery > Backup Policy page to manually enable CDP.

The virtual machine is not responding / virtual machine failed and has been automatically restarted and recovered.

Check the virtual machine log to find out why the virtual machine stopped responding.

Persistent high GPU utilization of virtual machines.

Check virtual machine GPU load.

Expand the virtual machine's GPU configuration.

The virtual machine has insufficient video memory (VRAM).

Check virtual machine GPU load.

Expand the virtual machine's GPU configuration.

The packet loss rate of the network port is too high.

Check the virtual machine's virtual NIC configuration.

The virtual machine's IO log backup space exceeds the alert threshold.

Please adjust the backup space of the virtual machine IO log.

The detected operating system type of the virtual machine does not match the configuration, which may lead to inaccurate report information

Check whether the OS type of the virtual machine is the same as that configured in the HCI web console.

Table 13: Daily Maintenance
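
The triage rule described at the top of this section (handle emergency-level alerts immediately, plan for ordinary-level ones) can be sketched in a few lines of Python. This is only an illustration, not part of the HCI product: the `Alert` class, the severity labels, the `triage` and `advice_for` helpers, and the severity assigned to each sample alert are all hypothetical. The advice strings are copied from the table above.

```python
from dataclasses import dataclass

# Hypothetical severity labels; the HCI console may name these differently.
EMERGENCY = "emergency"
ORDINARY = "ordinary"

NO_ADVICE = "No advice recorded; refer to the Daily Maintenance table."

# A few scenario -> advice pairs copied from the table above.
ADVICE = {
    "Node offline": (
        "It is recommended to check whether the node or network is "
        "abnormal and perform service recovery immediately."
    ),
    "Disk bad sectors exceed the threshold.": (
        "Replace the disk as soon as possible"
    ),
    "License expiration reminder": (
        "It is recommended to purchase a new license in time."
    ),
}


@dataclass
class Alert:
    scope: str     # maintenance scope, e.g. "Node alert monitoring"
    message: str   # alert text as listed in the table
    severity: str  # EMERGENCY or ORDINARY (assigned here for illustration)


def triage(alerts):
    """Split alerts into those to handle now and those to plan for."""
    handle_now = [a for a in alerts if a.severity == EMERGENCY]
    plan_later = [a for a in alerts if a.severity != EMERGENCY]
    return handle_now, plan_later


def advice_for(alert):
    """Look up the table's advice for an alert, if it was copied above."""
    return ADVICE.get(alert.message, NO_ADVICE)


if __name__ == "__main__":
    sample = [
        Alert("Node alert monitoring", "Node offline", EMERGENCY),
        Alert("License", "License expiration reminder", ORDINARY),
    ]
    now, later = triage(sample)
    for alert in now:
        print("HANDLE NOW:", alert.message, "->", advice_for(alert))
    for alert in later:
        print("PLAN:", alert.message, "->", advice_for(alert))
```

In practice the alert list would come from whatever export or notification channel is in use; the sketch takes it as plain input and makes no assumption about how alerts are retrieved from the console.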