Use the intelligent delivery tool aDeploy to perform a platform-wide check, covering hardware health, storage status, system services, network connectivity, SPs, and licensing status.
Check result remediation: If the check identifies hardware faults, such as disks with bad sectors or malfunctioning NICs, replace the faulty hardware immediately. For other alerts, provide a remediation plan, such as performance optimization suggestions, within one month. Submit the check report to the IT administration department for archiving.
SP assessment and fixing:
After receiving a Sangfor product security announcement, use aDeploy to check whether the announced issues, such as compatibility vulnerabilities or performance bottlenecks, exist in the current SCP version.
SP installation necessity assessment: Assess whether the SP needs to be installed based on the current service load (whether core services can be suspended) and the risks of installation (whether it will introduce compatibility issues).
SP installation: Install the SP during off-peak hours, such as the last weekend of the month. Back up the configuration and service data in advance, and verify the status of SCP after the SP is installed.
Core resource assessment:
CPU: Check the peak CPU usage over the previous month. If CPU usage stays above 70%, capacity expansion is required. Assess the CPU redundancy of the cluster: in a regular cluster, services must still run as expected when one or two nodes are down; in a stretched cluster, services must still run as expected when one fault domain is down.
Memory: Check the peak memory usage over the previous month. If memory usage stays above 80%, capacity expansion is required. The memory redundancy requirement is the same as that for CPU resources.
Storage: Check the current storage usage. If storage usage exceeds 90%, capacity expansion is required. Estimate how many days the remaining capacity will last. If it will last fewer than 90 days, capacity expansion is required. If the storage IO latency exceeds 20 milliseconds, performance optimization is required, such as upgrading to all-flash datastores.
Capacity expansion plan: Provide an SCP Capacity Expansion Plan according to the assessment result. Confirm the expansion scope (nodes, disks, or external storage), the time window of the expansion, and the impact scope.
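The thresholds in the core resource assessment can be expressed as a small script. The following is a minimal sketch, not an aDeploy or SCP feature: the ClusterMetrics record and all function names are hypothetical, and the metric values are assumed to come from your own monitoring exports. Two simplifications are assumptions of this sketch: the monthly peak stands in for usage that "remains" above a threshold, and the redundancy check assumes equal-sized nodes with load spread evenly across the surviving nodes.

```python
from dataclasses import dataclass

# Thresholds taken from the checklist.
CPU_LIMIT_PCT = 70.0
MEMORY_LIMIT_PCT = 80.0
STORAGE_LIMIT_PCT = 90.0
MIN_DAYS_REMAINING = 90
MAX_IO_LATENCY_MS = 20.0


@dataclass
class ClusterMetrics:
    """Hypothetical input record; fill it from your monitoring data."""
    cpu_peak_pct: float
    mem_peak_pct: float
    storage_used_pct: float
    storage_free_gb: float
    storage_growth_gb_per_day: float
    io_latency_ms: float
    node_count: int


def days_remaining(free_gb, growth_gb_per_day):
    """Days until storage is full, assuming linear growth (an assumption)."""
    if growth_gb_per_day <= 0:
        return float("inf")
    return free_gb / growth_gb_per_day


def usage_after_node_loss(peak_pct, node_count, failed_nodes):
    """Projected usage if the same load runs on the surviving nodes.

    Assumes equal-sized nodes and even redistribution of load.
    """
    surviving = node_count - failed_nodes
    if surviving <= 0:
        return float("inf")
    return peak_pct * node_count / surviving


def assess(m, failed_nodes=1):
    """Return a list of findings; an empty list means no action is needed."""
    findings = []
    for name, peak, limit in (
        ("CPU", m.cpu_peak_pct, CPU_LIMIT_PCT),
        ("memory", m.mem_peak_pct, MEMORY_LIMIT_PCT),
    ):
        if peak > limit:
            findings.append(f"{name} usage above {limit:.0f}%: expansion required")
        if usage_after_node_loss(peak, m.node_count, failed_nodes) > 100.0:
            findings.append(
                f"{name} redundancy insufficient with {failed_nodes} node(s) down"
            )
    if m.storage_used_pct > STORAGE_LIMIT_PCT:
        findings.append("storage usage above 90%: expansion required")
    if days_remaining(m.storage_free_gb, m.storage_growth_gb_per_day) < MIN_DAYS_REMAINING:
        findings.append("storage will last fewer than 90 days: expansion required")
    if m.io_latency_ms > MAX_IO_LATENCY_MS:
        findings.append("storage IO latency above 20 ms: optimization required")
    return findings
```

For a regular cluster, calling assess with failed_nodes=1 and again with failed_nodes=2 covers the "one or two nodes down" requirement. For a stretched cluster, pass the number of nodes in the larger fault domain.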
Service backup and disaster recovery verification:
Check the status of backup and disaster recovery tasks for core services, such as databases and production systems. Make sure that none of these tasks have failed or been delayed.
Disaster recovery drills: Perform disaster recovery drills (such as VM recovery from backup) on one or two non-core services every quarter to verify the validity of backups. Record the drill results in the Disaster Recovery Drill Report.
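The backup task check can be sketched as a filter over task records. This is an illustration only: the record fields (name, status, last_success, interval) are hypothetical and do not reflect an SCP export format. It also assumes a definition that the checklist does not give: a task counts as delayed when its last successful run is older than its scheduled interval.

```python
from datetime import datetime, timedelta


def problem_tasks(tasks, now):
    """Return (name, reason) pairs for tasks that failed or are delayed.

    Each task is a dict with the hypothetical keys:
    name (str), status (str), last_success (datetime), interval (timedelta).
    """
    problems = []
    for task in tasks:
        if task["status"] == "failed":
            problems.append((task["name"], "failed"))
        elif now - task["last_success"] > task["interval"]:
            # Assumed definition of "delayed": no success within one interval.
            problems.append((task["name"], "delayed"))
    return problems
```

An empty result means that every listed task succeeded within its interval.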
Security management check:
Login limit: To prevent unauthorized access, check the list of users (IP addresses) that are allowed to log in to SCP, the password validity period (no more than six months), and the maximum number of password retry attempts (no more than 10).
Port management: Check the open ports of SCP and close unused ports, such as redundant service ports. This helps mitigate security risks.
SP server connectivity: Check the connectivity to the SP server. Make sure that SPs can be downloaded as expected.
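The security checks above can be partly scripted. The sketch below uses only the Python standard library. The function names are hypothetical, the host, port list, and allowlist are values you supply, and six months is approximated as 183 days (an assumption). A plain TCP connect test only shows that a port accepts connections from the machine that runs the script. It does not replace the port list shown by SCP itself.

```python
import socket

MAX_PASSWORD_VALIDITY_DAYS = 183  # "six months", approximated (assumption)
MAX_RETRY_ATTEMPTS = 10


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unexpected_open_ports(host, candidate_ports, allowed_ports, timeout=2.0):
    """Return the candidate ports that are open but not on the allowlist."""
    return [
        port
        for port in candidate_ports
        if port not in allowed_ports and tcp_port_open(host, port, timeout)
    ]


def login_policy_violations(password_validity_days, max_retry_attempts):
    """Compare configured login settings with the limits in the checklist."""
    violations = []
    if password_validity_days > MAX_PASSWORD_VALIDITY_DAYS:
        violations.append("password validity exceeds six months")
    if max_retry_attempts > MAX_RETRY_ATTEMPTS:
        violations.append("password retry limit exceeds 10")
    return violations
```

The same tcp_port_open helper can serve as a first reachability test for the SP server, given its address and port. A successful connection does not prove that SPs can be downloaded, so a test download is still needed.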