The infrastructure department typically has the following alerting requirements. They are for reference only, because requirements vary by customer.
Timeliness: When a key alert related to platform stability is triggered, it must be responded to and handled within the SLA required by the customer, for example, 30 minutes.
Classification and grading: Severity levels are defined based on the impact of faults. Alerts related to the stability of the virtualized platform are pushed with priority.
Accuracy: Filter out unneeded alerts to reduce noise and simplify O&M work.
Monitoring integration: Use general interfaces (APIs or SNMP) to integrate with the customer's alerting platform, such as a self-developed alerting platform, Zabbix, or Prometheus.
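The four requirements above can be illustrated with a minimal Python sketch. It is not a product API: the names Severity, Alert, SLA_WINDOWS, should_push, handling_deadline, prioritize, and to_payload are hypothetical, and every SLA window other than the 30-minute example is an assumed value. The sketch grades alerts by severity, drops low-severity noise, orders stability alerts first, and builds a generic payload that an integration layer could forward to an external alerting platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    """Hypothetical severity grades, ordered by fault impact."""
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Assumed handling windows. Only the 30-minute value comes from the text;
# the WARNING window is a placeholder for a customer-defined SLA.
SLA_WINDOWS = {
    Severity.CRITICAL: timedelta(minutes=30),
    Severity.WARNING: timedelta(hours=4),
}


@dataclass
class Alert:
    name: str
    severity: Severity
    affects_platform_stability: bool
    triggered_at: datetime


def should_push(alert: Alert, min_severity: Severity = Severity.WARNING) -> bool:
    """Accuracy: filter out alerts below the configured severity threshold."""
    return alert.severity >= min_severity


def handling_deadline(alert: Alert) -> Optional[datetime]:
    """Timeliness: latest handling time under the assumed SLA, if one applies."""
    window = SLA_WINDOWS.get(alert.severity)
    return alert.triggered_at + window if window is not None else None


def prioritize(alerts: List[Alert]) -> List[Alert]:
    """Grading: keep pushable alerts, stability alerts first, then by severity."""
    kept = [a for a in alerts if should_push(a)]
    return sorted(
        kept,
        key=lambda a: (not a.affects_platform_stability, -a.severity, a.triggered_at),
    )


def to_payload(alert: Alert) -> dict:
    """Integration: a generic JSON-serializable dict for an external platform."""
    deadline = handling_deadline(alert)
    return {
        "name": alert.name,
        "severity": alert.severity.name,
        "platform_stability": alert.affects_platform_stability,
        "triggered_at": alert.triggered_at.isoformat(),
        "handle_by": deadline.isoformat() if deadline is not None else None,
    }
```

In a real deployment, the payload would be adapted to whatever the target platform accepts (for example, a webhook body or an SNMP trap), and the severity threshold and SLA windows would come from the customer's own policy.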