6.11.3

Fault Prevention and Handling

Update time: 2025-12-17

VM Restart Upon an Error

By monitoring VM heartbeats and I/O activity via vmTools, aSV can automatically detect guest OS failures such as blue screens or hangs. Upon detection, it can automatically restart the affected VM, reducing application downtime and the need for immediate administrator intervention.
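The detection rule can be sketched as follows. This is a minimal illustration under stated assumptions, not aSV's implementation: it assumes the guest agent reports a heartbeat timestamp and a cumulative I/O counter, and the timeout value and the names `GuestSample` and `is_guest_hung` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical timeout for illustration; not an aSV default.
HEARTBEAT_TIMEOUT_S = 60.0


@dataclass
class GuestSample:
    """One observation of a VM, in an assumed format a guest agent might report."""
    now: float             # when the sample was taken (seconds)
    last_heartbeat: float  # time of the most recent heartbeat from the guest
    io_ops_total: int      # cumulative I/O operations completed by the VM


def is_guest_hung(prev: GuestSample, cur: GuestSample,
                  timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """Declare a guest failed only when BOTH signals have gone quiet:
    the heartbeat is stale AND no I/O completed since the previous sample."""
    heartbeat_stale = cur.now - cur.last_heartbeat > timeout
    io_stalled = cur.io_ops_total == prev.io_ops_total
    return heartbeat_stale and io_stalled
```

Requiring both signals means a guest whose agent has stalled but whose disks are still busy is left running; a restart would be triggered only when `is_guest_hung` returns `True`.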

HA Mechanism for VMs

The High Availability feature continuously monitors host and VM health. If a host fails or a VM becomes unresponsive, the system automatically restarts the affected VMs on other healthy hosts in the cluster. This provides automated failover for business services: after a brief interruption while the VMs restart, services resume without administrator intervention, even when the underlying hardware or software fails.
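Choosing where to restart the affected VMs is a placement problem. The sketch below is one simple way to do it, assuming memory is the only constraint; the function name `plan_failover`, the MiB units, and the "largest VM first, onto the host with the most free memory" policy are illustrative assumptions, since this section does not describe aSV's placement logic.

```python
from typing import Dict, Optional


def plan_failover(vms: Dict[str, int],
                  free_mem: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign each VM from a failed host to a healthy host.

    vms:      VM name -> memory required (MiB)
    free_mem: healthy host name -> free memory (MiB)
    Returns VM name -> chosen host, or None if no healthy host can fit it.
    """
    remaining = dict(free_mem)  # work on a copy; do not mutate the caller's data
    plan: Dict[str, Optional[str]] = {}
    # Place the largest VMs first, since they are the hardest to fit.
    for vm, need in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [h for h, free in remaining.items() if free >= need]
        if not candidates:
            plan[vm] = None  # no capacity left: would need an alert in practice
            continue
        # Prefer the host with the most free memory to spread the load.
        target = max(candidates, key=lambda h: remaining[h])
        remaining[target] -= need
        plan[vm] = target
    return plan
```

A production scheduler would also weigh CPU, affinity rules, and VM priority; memory alone keeps the example small.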

Handling of Subhealthy Hosts

aSV proactively identifies hosts showing early warning signs of potential failure, such as correctable memory errors or disk S.M.A.R.T. alerts. It can then automatically live-migrate VMs off these "subhealthy" hosts before a complete failure occurs, implementing a proactive failure avoidance strategy.
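A subhealthy check can be sketched as a set of early-warning rules over host telemetry. In this sketch the threshold, the `HostHealth` fields, and the function names are hypothetical; the section does not specify which signals or limits aSV uses beyond correctable memory errors and S.M.A.R.T. alerts.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical threshold chosen for illustration only.
MAX_CORRECTABLE_ERRORS_PER_HOUR = 10


@dataclass
class HostHealth:
    name: str
    correctable_mem_errors_last_hour: int
    smart_warnings: List[str] = field(default_factory=list)  # disks with S.M.A.R.T. alerts
    vms: List[str] = field(default_factory=list)


def is_subhealthy(h: HostHealth) -> bool:
    """The host still works but shows early signs of failure."""
    return (h.correctable_mem_errors_last_hour > MAX_CORRECTABLE_ERRORS_PER_HOUR
            or bool(h.smart_warnings))


def vms_to_evacuate(hosts: List[HostHealth]) -> List[str]:
    """Collect the VMs that should be live-migrated off subhealthy hosts."""
    return [vm for h in hosts if is_subhealthy(h) for vm in h.vms]
```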

Monitoring and Handling of External Storage I/O Latency

For VMs using external storage, aSV monitors I/O response times. If latency exceeds defined thresholds, it can trigger VM live migration to a host with better storage connectivity or alert administrators, helping to maintain consistent application performance.
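A common way to avoid reacting to a single latency spike is to act only when latency stays above the threshold for a whole window of samples. The sketch below assumes one latency sample per polling interval in milliseconds; `LatencyMonitor`, the 50 ms threshold, and the window size are illustrative values, not aSV defaults.

```python
from collections import deque


class LatencyMonitor:
    """Flags a VM when storage latency stays above a threshold for a full window."""

    def __init__(self, threshold_ms: float = 50.0, window: int = 5):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # keeps only the most recent samples

    def add(self, latency_ms: float) -> bool:
        """Record a sample; return True when migration or an alert is warranted."""
        self.samples.append(latency_ms)
        window_full = len(self.samples) == self.samples.maxlen
        # Every sample in the window must exceed the threshold, so one
        # slow I/O burst does not trigger a live migration.
        return window_full and min(self.samples) > self.threshold_ms
```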

Memory Failure Isolation and Recovery

When a host experiences correctable memory errors, aSV can dynamically isolate the affected memory pages, map them out of the available memory pool, and continue operating on the remaining healthy memory. Keeping error-prone pages out of use increases system resilience against memory hardware faults.
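Page isolation can be modeled as an allocator that permanently retires pages once they report errors. This is a toy model under assumptions: the `PagePool` class, page numbering, and the "retire after two errors" policy are hypothetical and do not describe aSV internals.

```python
from collections import Counter
from typing import Set


class PagePool:
    """Toy page allocator that maps error-prone pages out of the pool."""

    def __init__(self, num_pages: int, retire_after: int = 2):
        self.free: Set[int] = set(range(num_pages))
        self.retired: Set[int] = set()
        self.errors = Counter()          # page -> correctable errors seen
        self.retire_after = retire_after  # hypothetical policy threshold

    def report_correctable_error(self, page: int) -> None:
        self.errors[page] += 1
        if self.errors[page] >= self.retire_after and page not in self.retired:
            self.retired.add(page)
            self.free.discard(page)  # map the page out of the allocatable pool

    def allocate(self) -> int:
        """Hand out the lowest-numbered healthy free page."""
        if not self.free:
            raise MemoryError("no healthy pages left")
        page = min(self.free)
        self.free.remove(page)
        return page

    def release(self, page: int) -> None:
        """Return a page; retired pages are never put back into the pool."""
        if page not in self.retired:
            self.free.add(page)
```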

RAID Card Failure and Recovery

The platform monitors the health of hardware RAID controllers. When it detects a RAID card failure or a RAID battery warning, it generates alerts so that administrators can replace the faulty component promptly, before the problem escalates into a storage subsystem failure that could impact multiple VMs.

System Disk Software RAID Group

For hosts without hardware RAID, aSV can create a software RAID configuration for the hypervisor's system disks. This provides redundancy for the management plane, ensuring the host itself remains operational even if a system disk fails.
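The redundancy idea can be illustrated with a toy mirror. The section does not state which RAID level aSV uses for system disks; RAID 1 (mirroring) is assumed here because it is a common choice for boot disks, and the `Mirror` class with its in-memory "disks" is purely hypothetical. The `rebuild` method shows how a replacement disk can be repopulated from a surviving member.

```python
from typing import List, Optional


class Mirror:
    """Toy RAID 1 mirror over in-memory 'disks' (lists of blocks)."""

    def __init__(self, num_blocks: int, copies: int = 2):
        self.disks: List[Optional[List[bytes]]] = [
            [b""] * num_blocks for _ in range(copies)
        ]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to all surviving members.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Any surviving member can serve the read.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise OSError("all mirror members have failed")

    def fail(self, index: int) -> None:
        self.disks[index] = None  # simulate a dead system disk

    def rebuild(self, index: int) -> None:
        """Copy a surviving member onto a replacement disk."""
        source = next(d for d in self.disks if d is not None)
        self.disks[index] = list(source)
```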

Secure Replacement of System Disks

When a hypervisor system disk requires replacement, a guided process rebuilds the system onto the new disk using the software RAID configuration or a recent system backup. This simplifies a critical maintenance task and minimizes host downtime.

Multi-USB Mapping Between Hosts

Physical USB devices connected to one host can be mapped to VMs running on other hosts within the cluster. This centralizes the management of hardware dongles or security keys, allowing VMs that require them to be freely migrated while maintaining access.

UPS Integration

Upon receiving a power loss signal from an Uninterruptible Power Supply, aSV can orchestrate an orderly shutdown of VMs and hosts before the battery is exhausted. This prevents data corruption and filesystem damage that could result from an abrupt power loss.
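The core of an orderly shutdown is checking that the whole sequence fits within the remaining battery runtime. This sketch assumes per-item shutdown time estimates are known and that VMs must stop before hosts; `plan_ups_shutdown`, the 60-second safety margin, and the input format are hypothetical.

```python
from typing import List, Tuple


def plan_ups_shutdown(vms: List[Tuple[str, float]],
                      hosts: List[Tuple[str, float]],
                      battery_runtime_s: float,
                      safety_margin_s: float = 60.0):
    """Build a sequential shutdown plan after a UPS on-battery signal.

    vms / hosts: (name, estimated seconds to shut down), listed in the order
    they should stop (e.g. application VMs before the databases they use).
    VMs always go before hosts, since a host cannot power off cleanly while
    its guests are still running.
    Returns (steps, fits): steps is a list of (name, start_s, end_s), and
    fits says whether everything finishes within the battery budget.
    """
    steps = []
    t = 0.0
    for name, duration in list(vms) + list(hosts):
        steps.append((name, t, t + duration))
        t += duration
    fits = t <= battery_runtime_s - safety_margin_s
    return steps, fits
```

Real deployments often stop VMs in parallel; timing them sequentially gives a conservative upper bound on how long the shutdown takes.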

Recycle Bin

Deleted VMs are initially moved to a recycle bin rather than being permanently erased. This safety net allows administrators to recover accidentally deleted VMs, protecting against operational errors.
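A recycle bin is a soft-delete store. In the sketch below, `RecycleBin` and its methods are hypothetical, and the automatic purge after a retention period is an assumption added for illustration; the section only states that deleted VMs can be recovered.

```python
import time
from typing import Dict, List, Optional


class RecycleBin:
    """Deleting a VM moves it here instead of erasing it."""

    def __init__(self, retention_s: float = 30 * 24 * 3600):  # hypothetical: 30 days
        self.retention_s = retention_s
        self.items: Dict[str, float] = {}  # VM name -> deletion timestamp

    def delete(self, vm: str, now: Optional[float] = None) -> None:
        self.items[vm] = time.time() if now is None else now

    def restore(self, vm: str) -> bool:
        """Recover an accidentally deleted VM; False if it is not in the bin."""
        return self.items.pop(vm, None) is not None

    def purge_expired(self, now: Optional[float] = None) -> List[str]:
        """Permanently erase entries older than the retention period."""
        now = time.time() if now is None else now
        expired = [vm for vm, t in self.items.items() if now - t > self.retention_s]
        for vm in expired:
            del self.items[vm]
        return expired
```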

Process Watchdog

Critical aSV management services are monitored by a watchdog process. If a key service fails or becomes unresponsive, the watchdog automatically attempts to restart it, maintaining the stability and availability of the management plane.
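A watchdog loop polls a service and restarts it when it is down. This sketch takes caller-supplied `is_alive` and `restart` hooks (hypothetical; aSV's service names and restart mechanism are not documented here). The cap on restart attempts, which stops a crash-looping service from being restarted forever, and the bounded `checks` count, which makes the example terminate, are both assumptions of this sketch.

```python
import time
from typing import Callable


def watchdog(is_alive: Callable[[], bool],
             restart: Callable[[], None],
             max_restarts: int = 3,
             interval_s: float = 5.0,
             checks: int = 10,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll a service `checks` times, restarting it when it is not alive.

    Returns the number of restarts performed.
    """
    restarts = 0
    for _ in range(checks):
        if not is_alive():
            if restarts >= max_restarts:
                break  # give up and leave escalation to an administrator
            restart()
            restarts += 1
        sleep(interval_s)
    return restarts
```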

Black Box Technology

The system continuously collects and stores detailed logs, performance metrics, and system state information. In the event of a failure, this "black box" data provides crucial forensic information for root cause analysis and problem resolution.
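Continuous recording without unbounded growth is typically done with a fixed-size ring buffer that overwrites the oldest entries. The `BlackBox` class, its capacity, and the event fields below are illustrative assumptions, not aSV's storage format.

```python
import time
from collections import deque
from typing import Any, Deque, Dict, List


class BlackBox:
    """Fixed-size recorder that keeps only the most recent events."""

    def __init__(self, capacity: int = 10_000):
        self.events: Deque[Dict[str, Any]] = deque(maxlen=capacity)

    def record(self, source: str, message: str, **metrics: Any) -> None:
        # When the buffer is full, deque(maxlen=...) drops the oldest entry.
        self.events.append({"ts": time.time(), "source": source,
                            "message": message, **metrics})

    def dump(self) -> List[Dict[str, Any]]:
        """Snapshot the buffer after a failure, oldest event first."""
        return list(self.events)
```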

Host Maintenance Mode

Before performing hardware maintenance on a host, placing it in maintenance mode triggers the automatic live migration of all its VMs to other hosts in the cluster. This enables non-disruptive hardware servicing and upgrades.
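Maintenance mode can be modeled as two steps: first exclude the host from scheduling, then evacuate its VMs. In this toy model the `Cluster` class, the `migrate` callback, and the "least-loaded host" target choice are hypothetical; the host counts as ready for servicing only when it is empty.

```python
from typing import Callable, Dict, List, Set


class Cluster:
    """Toy model of host maintenance mode."""

    def __init__(self, placement: Dict[str, List[str]]):
        self.placement = placement      # host -> VMs running on it
        self.maintenance: Set[str] = set()  # hosts excluded from scheduling

    def schedulable_hosts(self) -> List[str]:
        return [h for h in self.placement if h not in self.maintenance]

    def enter_maintenance(self, host: str,
                          migrate: Callable[[str, str, str], bool]) -> bool:
        """Evacuate `host` by live migration; True once it holds no VMs."""
        self.maintenance.add(host)  # stop new VMs from landing here first
        for vm in list(self.placement[host]):
            targets = self.schedulable_hosts()
            if not targets:
                return False
            target = min(targets, key=lambda h: len(self.placement[h]))
            if migrate(vm, host, target):  # migrate(vm, src, dst) -> success
                self.placement[host].remove(vm)
                self.placement[target].append(vm)
        return not self.placement[host]
```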

System File Backup and Restore

Critical configuration files for the aSV hypervisor are automatically backed up. If these files become corrupted, they can be quickly restored to a known good state, facilitating rapid recovery from configuration errors.
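One way to detect and repair a corrupted configuration file is to record a checksum at backup time and restore the copy when the live file no longer matches. The functions below are a hypothetical sketch; in it, any change from the recorded checksum counts as corruption, whereas a real system would re-baseline after legitimate configuration changes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup(config: Path, backup_dir: Path) -> str:
    """Copy a config file aside and return its checksum as the known-good reference."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(config, backup_dir / config.name)
    return sha256(config)


def restore_if_changed(config: Path, backup_dir: Path, good_digest: str) -> bool:
    """Restore the backup if the live file is missing or differs from the
    known-good checksum. Returns True if a restore happened."""
    if config.exists() and sha256(config) == good_digest:
        return False
    shutil.copy2(backup_dir / config.name, config)
    return True
```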

Memory Snapshot

For forensic analysis or debugging, aSV can capture a complete snapshot of a VM's memory at a specific moment. This memory dump can be analyzed to diagnose complex software issues, security incidents, or system crashes.