When a panel indicator shows an abnormality, the problem needs to be diagnosed further. To locate it, view the detailed event log through IPMI. Routine handling methods are as follows:
- Suggested solutions for overheating
• Access IPMI and view the event log to confirm whether the CPU or the memory is overheating.
• Check the heat-generating components. Momentary overheating is usually caused by the ambient temperature and recovers automatically. If a component continues to overheat, however, the machine must be powered off for further inspection.
• If a component repeatedly overheats and then cools down by itself, contact a specialist to confirm whether it needs to be repaired.
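The first step above, picking temperature events out of the event log, can be sketched as follows. This is a minimal illustration that assumes the log has been exported as pipe-separated text, similar to what `ipmitool sel elist` prints. The sample lines, sensor names, and the helper names `parse_sel_line` and `overheating_events` are hypothetical, and real output varies by vendor.

```python
from typing import List, NamedTuple, Optional


class SelEvent(NamedTuple):
    record_id: str
    date: str
    time: str
    sensor: str
    description: str
    direction: str


def parse_sel_line(line: str) -> Optional[SelEvent]:
    """Split one pipe-separated event log line; return None if it is too short."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 6:
        return None
    return SelEvent(*fields[:6])


def overheating_events(sel_text: str) -> List[SelEvent]:
    """Keep only asserted events whose sensor field starts with 'Temperature'."""
    events = []
    for line in sel_text.splitlines():
        ev = parse_sel_line(line)
        if ev and ev.sensor.lower().startswith("temperature") and ev.direction == "Asserted":
            events.append(ev)
    return events


# Illustrative sample only; not taken from a real machine.
SAMPLE = """\
   1 | 04/27/2023 | 10:15:02 | Temperature CPU1 Temp | Upper Critical going high | Asserted
   2 | 04/27/2023 | 10:21:40 | Temperature CPU1 Temp | Upper Critical going high | Deasserted
   3 | 04/27/2023 | 11:02:13 | Fan FAN3 | Lower Critical going low | Asserted
   4 | 04/28/2023 | 09:00:00 | Temperature DIMMA1 Temp | Upper Non-critical going high | Asserted
"""

for ev in overheating_events(SAMPLE):
    print(ev.date, ev.time, ev.sensor, "-", ev.description)
```

A matching "Deasserted" record shortly after an "Asserted" one (records 1 and 2 in the sample) suggests the momentary, self-recovering case described above.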
- Suggested solutions for memory failures
• Access IPMI to view the event log to confirm the specific slot of the abnormal memory module.
• Identify the fault type, for example, the memory device is disabled or the slot cannot identify the capacity. Try reseating the memory module. If that does not solve the problem, proceed to RMA (return merchandise authorization). If the ECC error cannot be corrected, RMA the corresponding memory module directly.
• For correctable ECC alerts, the count is usually small and the alerts are not reported continuously over several days. In that case, clear the log and restart the BMC to resolve the problem. If a large number of alerts are reported continuously, repair the corresponding memory module.
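The correctable ECC rule above can be sketched as a small triage function. The text does not define how many alerts count as "large" or how many days count as "continuous", so the thresholds `max_events` and `max_days` below are assumptions to tune per site, and the choice to flag a slot when either threshold is reached is also an assumption. The function name and slot labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def triage_correctable_ecc(
    events: Iterable[Tuple[str, str]],
    max_events: int = 10,  # assumed limit on total alerts per slot
    max_days: int = 3,     # assumed limit on distinct days with alerts
) -> Dict[str, str]:
    """Map each memory slot to a suggested action.

    `events` is an iterable of (date, slot) pairs taken from the event log.
    """
    counts: Dict[str, int] = defaultdict(int)
    days: Dict[str, set] = defaultdict(set)
    for date, slot in events:
        counts[slot] += 1
        days[slot].add(date)

    actions = {}
    for slot in counts:
        persistent = counts[slot] > max_events or len(days[slot]) >= max_days
        actions[slot] = "repair" if persistent else "clear-log-and-restart-bmc"
    return actions


# Illustrative data only.
events = [("04/27/2023", "DIMMA1"), ("04/27/2023", "DIMMA1")]
events += [(d, "DIMMB2") for d in ("04/26/2023", "04/27/2023", "04/28/2023")]
print(triage_correctable_ecc(events))
```

With `ipmitool`, clearing the log and restarting the BMC are typically done with `ipmitool sel clear` and `ipmitool mc reset cold`; other management tools use different commands.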
- Suggested solutions for fan failures
• Access IPMI to view the event log and the sensor values.
• When the fan speed is higher or lower than the threshold set in the BMC, an alert is issued. Such an alert usually clears by itself and does not recur; if it does recur, do not ignore it.
• If the fan is faulty, the sensor shows an abnormal fan value. If reseating the fan does not fix it, request an RMA directly.
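The threshold comparison described above can be sketched as follows. The example assumes sensor readings in the ten-column, pipe-separated layout that `ipmitool sensor` commonly prints (name, reading, unit, status, then six thresholds), and it compares the reading only against the lower and upper critical columns. The column positions, sample values, and the helper name `fan_status` are assumptions, so check them against your own BMC's output.

```python
from typing import Optional, Tuple


def parse_float(text: str) -> Optional[float]:
    """Return a float, or None for unreadable values such as 'na'."""
    try:
        return float(text)
    except ValueError:
        return None


def fan_status(line: str) -> Optional[Tuple[str, str]]:
    """Classify one sensor line; return None if it is not an RPM sensor."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 10 or fields[2] != "RPM":
        return None
    name = fields[0]
    reading = parse_float(fields[1])
    lower_crit = parse_float(fields[5])  # assumed: lower critical column
    upper_crit = parse_float(fields[8])  # assumed: upper critical column
    if reading is None:
        return name, "no reading"
    if lower_crit is not None and reading < lower_crit:
        return name, "below threshold"
    if upper_crit is not None and reading > upper_crit:
        return name, "above threshold"
    return name, "ok"


# Illustrative sample only; not taken from a real machine.
SENSORS = """\
FAN1 | 5400.000 | RPM | ok | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
FAN2 | 400.000 | RPM | cr | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
FAN3 | na | RPM | na | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
CPU1 Temp | 45.000 | degrees C | ok | 5.000 | 5.000 | 10.000 | 90.000 | 95.000 | 100.000
"""

results = [s for s in (fan_status(l) for l in SENSORS.splitlines()) if s]
for name, state in results:
    print(name, state)
```

A fan that reports no reading, or one that stays outside its thresholds after being reseated, matches the faulty-fan case above.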
- Suggested solutions for system failures
• Access IPMI to view the event log (usually subsystem health failure).
• Gather the IPMI event log to confirm whether the system is abnormal. A restart usually resolves the abnormality.
• If restarting does not solve the issue, contact a specialist to confirm whether to request an RMA.
- Suggested solutions for hard disk failures
• Reseat the RAID card.
• Confirm whether the RAID card is recognized in the BIOS.
• Confirm whether a hard disk is detected on the RAID card.
• Request an RMA if reseating the JBOD disk does not solve the issue.