| Maintenance Scope |
Maintenance scenario |
Maintenance advice |
Operations |
| HCI status check |
Monitor the hardware resources and components of the HCI node to quickly discover system abnormalities. |
If any physical host is offline, check the status of the physical server. Also check whether the storage status and disk status are normal. |
|
| Node alert monitoring |
Node swap partition usage is too high. |
It is recommended to expand the node memory or shut down virtual machines that are temporarily not in use. |
|
| |
Node offline |
It is recommended to check whether the node or network is abnormal and perform service recovery immediately. |
|
| |
The node system partition is abnormal |
The system disk may have bad sectors or other faults. Contact Sangfor technical support. |
|
| |
Node GPU usage is too high |
Please shut down some virtual machines or migrate some virtual machines to other nodes. |
|
| |
The node cannot detect the graphics card |
Log in to IPMI to check whether the graphics card is abnormal. |
|
| |
The graphics card temperature of the node is too high |
Please check whether the heat dissipation of the node or the temperature of the server room is abnormal. |
|
| |
Multiple types of graphics card hardware detected |
Remove the graphics card whose type differs from the others. Otherwise, GPU-supported VMs will not run on the node. |
|
| |
Insufficient memory on the node |
Please shut down some virtual machines or migrate some virtual machines to other nodes. |
|
| |
Node CPU usage is too high |
It is recommended to expand the capacity of the node or shut down virtual machines that are temporarily not in use. |
|
| |
The node CPU temperature is abnormal |
It is recommended to check whether the equipment room temperature, the node fans, and the cooling equipment are normal. |
|
| |
Host CPU throttling |
If the server raises this alert frequently, check through the BMC whether the CPU hardware status is normal. |
|
| |
Node memory usage is too high |
It is recommended to expand the node memory or shut down virtual machines that are temporarily not in use. |
|
| |
Node memory downclocking |
It is recommended to log in to the node's BMC console and check the node's memory. |
|
| |
RAID card status is abnormal |
Check whether the disk and storage status are normal. If they are not, contact Sangfor technical support promptly. |
|
| |
The memory module (%s) of the node (%s) is faulty. |
Troubleshoot or replace the memory module. |
|
| |
The SMS function is abnormal, and the connection between the SMS agent module and the sending module is abnormal. |
Please check whether the SMS configuration is correct and the network connection is normal. |
|
| |
Memory over-provisioning alarm |
Memory capacity expansion is recommended. |
|
| |
vCPU overcommitment alert |
Node capacity expansion is recommended. |
|
| |
The speed of the interface is too low |
It is recommended to replace the NIC or network cable. |
|
| Network alert monitoring |
The node NIC is working abnormally. |
If the alarm occurs frequently, it is recommended to replace the host NIC. |
|
| |
The network packet loss rate is too high |
Check whether the physical network is abnormal. |
|
| |
The node has a persistent packet loss error |
Check whether the physical network is abnormal. |
|
| |
The VXLAN ports of node xx and node xx are blocked |
Check the configuration of the VXLAN port and the configuration of the VXLAN switch. |
|
| |
The interface of node xx is disconnected. |
Please check the interface connection status of the node. |
|
| |
Virtual network device (%s) not responding |
Check the status of the virtual network device. |
|
| |
If node X cannot reach the gateway, please check whether the network connection is abnormal. |
It is recommended to check whether the network is normal. |
|
| Storage alarm monitoring |
Storage IO latency is too high |
It is recommended to check whether the storage network is normal. |
|
| |
Storage usage is too high |
1. Delete virtual machines that are no longer needed. 2. Clear junk files in the recycle bin. 3. Expand the storage capacity. |
|
| |
Storage is disconnected from the node, or storage is offline. |
Check the storage and node network status promptly. |
|
| |
Abnormal storage status |
Check promptly whether the storage is faulty. |
|
| |
Node xx access to storage remains busy. |
It is recommended to upgrade storage or migrate some virtual machines to run on other storage. |
|
| |
The disk was in a bad state and has been remounted. |
Log in to the BMC console of the server, check the hardware-related logs, and confirm the cause of the fault. |
|
| |
It is detected that there is data block reuse in the storage. Please contact technical support as soon as possible for assistance. |
Contact Sangfor technical support for assistance. |
|
| |
Hot spare replacement detected |
Log in to the HCI web console to view the status of the replaced disk. Log in to the BMC console of the server to view hardware-related logs. |
|
| |
It is detected that the hard disk (node <%s>, hard disk name: %s) has been pulled out. If it was removed by mistake, reinsert the hard disk into its original slot as soon as possible. |
Log in to the HCI web console to view the status of the disk. Log in to the BMC console of the server to view hardware-related logs. |
|
| |
Disk status is abnormal |
Log in to the HCI web console to view the status of the disk. Log in to the BMC console of the server to view hardware-related logs. Try unplugging and reinserting the disk. |
|
| |
Disk bad sectors exceed the threshold. |
Replace the disk as soon as possible. |
|
| |
Disk IO error |
Log in to the HCI web console to view the status of the disk. Log in to the BMC console of the server to view hardware-related logs. |
|
| |
Storage private network exception |
Check the connectivity of the storage private network. |
|
| License |
License expiration reminder |
It is recommended to purchase a new license in time. |
|
| |
The license key status is abnormal. |
It is recommended to unplug and reinsert the license key. If it still does not work, contact Sangfor technical support. |
|
| |
The licensed USB-KEY is pulled out. Please insert it. Otherwise, the system may be abnormal. |
Check whether the USB-KEY is normal, and try unplugging and reinserting it. |
|
| Virtual Machine |
Scheduled backup of virtual machine fails. |
Check whether the backup repository is normal. |
|
| |
The number of connection sessions is too high. The current session connection number is %s, which exceeds the threshold %s. %s |
Check whether the service session of the virtual machine is normal, and try to adjust the session threshold. |
|
| |
The CPU usage of the virtual machine continues to be too high. |
Check whether the vmTools of the virtual machine are working normally. Try scaling up the vCPU configuration of the virtual machine. |
|
| |
Corrupted virtual machine image file |
Check whether the virtual machine can be started normally. If it cannot, contact Sangfor technical support. |
|
| |
The virtual machine is out of memory. |
Check whether the vmTools of the virtual machine are working normally. Try expanding the memory configuration of the virtual machine. |
|
| |
The physical egress connected to the virtual machine does not bridge the node's interface where the virtual machine is located, which will cause the virtual machine to fail to communicate with the external network. |
Bridge the service interface of the node to the physical egress. |
|
| |
The backup image of the virtual machine is found to be corrupted when deleting the backup. |
If the virtual machine still exists, please back it up immediately and contact Sangfor technical support. |
|
| |
VM restart CDP fails |
Go to the Administration > VM Backup and Recovery > Backup Policy page to enable CDP manually. |
|
| |
Failed to enable CDP on virtual machine |
Navigate to the Reliability > Scheduled Backup/CDP > HCI Backup Policies page to enable CDP manually. |
|
| |
The virtual machine is running, but its configured CDP policy is disabled, and the data is currently in an unprotected state. Please adjust the CDP policy |
Go to the Administration > VM Backup and Recovery > Backup Policy page to enable CDP manually. |
|
| |
The virtual machine is not responding / virtual machine failed and has been automatically restarted and recovered. |
Check the virtual machine logs to find out why the virtual machine stopped responding. |
|
| |
Persistent high GPU utilization of virtual machines. |
Check the GPU load of the virtual machine. Try expanding its GPU configuration. |
|
| |
The virtual machine has insufficient video memory (VRAM). |
Check the GPU load of the virtual machine. Try expanding its GPU configuration. |
|
| |
The packet loss rate of the network port is too high. |
Check the virtual machine's virtual NIC configuration. |
|
| |
The virtual machine's IO log backup space exceeds the alert threshold. |
Please adjust the backup space of the virtual machine IO log. |
|
| |
The detected operating system type of the virtual machine does not match the configuration, which may lead to inaccurate report information |
Check whether the OS type of the virtual machine is the same as that configured in the HCI web console. |
|
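
The table above is in effect a lookup from alert text to recommended action. As a minimal sketch, a few of its rows could be encoded so that an incoming alert message is matched to its advice. The names `RUNBOOK`, `Advice`, and `lookup_advice` are hypothetical and are not part of any Sangfor API; the substring matching is an assumption about how alert messages might be compared, not a documented behavior.

```python
from typing import NamedTuple, Optional


class Advice(NamedTuple):
    scope: str   # maintenance scope column
    action: str  # maintenance advice column


# Hypothetical runbook: a handful of rows copied from the table above.
# Keys are lowercase fragments of the alert text.
RUNBOOK = {
    "node offline": Advice(
        "Node alert monitoring",
        "Check whether the node or network is abnormal and recover the service.",
    ),
    "storage usage is too high": Advice(
        "Storage alarm monitoring",
        "Delete unneeded VMs, clear the recycle bin, or expand storage capacity.",
    ),
    "disk bad sectors exceed the threshold": Advice(
        "Storage alarm monitoring",
        "Replace the disk as soon as possible.",
    ),
    "license expiration reminder": Advice(
        "License",
        "Purchase a new license in time.",
    ),
}


def lookup_advice(alert_text: str) -> Optional[Advice]:
    """Return the advice for the first runbook key contained in alert_text."""
    normalized = alert_text.lower()
    for key, advice in RUNBOOK.items():
        if key in normalized:
            return advice
    return None  # unknown alert: fall back to manual triage


if __name__ == "__main__":
    hit = lookup_advice("Disk bad sectors exceed the threshold.")
    print(hit.action if hit else "No matching entry")
```

Alerts that contain placeholders such as `%s` would need pattern matching rather than plain substrings; that is left out of this sketch.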