Alarm Lifecycle
The following is an example of the alarm lifecycle based on a nodeLostService event.
Lifecycle example
A new nodeLostService event is received and creates a new alarm.
[Figure: New alarm visible in outstanding alarm list]
Clicking the number displayed in the Count column displays the corresponding events and their details.
[Figure: Event list showing events related to the alarm]
The alarm clears automatically when service is restored, based on a nodeRegainedService event.
[Figure: Alarm List displaying one cleared alarm and its log message]
[Figure: Event list page displaying one service down event and one service restored event]
If the problem occurs again, the events are reduced into the existing alarm. The alarm’s count is updated to reflect the new activity.
[Figure: Alarm List displaying one alarm with a count of 2]
[Figure: Detailed event list page displaying two service down events and one service restored event, all of which are components of the same alarm]
The alarm once again clears immediately when service is restored.
[Figure: Alarm List displaying one cleared alarm with a count of 2, and its log message]
Note that the alarm’s count only increments on events with a severity of Warning or greater.
[Figure: Detailed event list page displaying two service down events and two service restored events, all of which are members of the same alarm]
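The lifecycle above can be approximated with a small in-memory model. This is a minimal sketch for illustration only, not Meridian's implementation: the `AlarmTable` class, the key format, and the severities assigned to the two events (Minor for nodeLostService, Normal for nodeRegainedService) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative severity ranking (assumed); only the relative order matters here.
SEVERITY_RANK = {"Normal": 3, "Warning": 4, "Minor": 5, "Major": 6, "Critical": 7}


@dataclass
class Alarm:
    key: str                      # hypothetical key shared by related events
    count: int = 0
    cleared: bool = False
    events: List[str] = field(default_factory=list)


class AlarmTable:
    """Toy model of reducing events into alarms (hypothetical, not a Meridian API)."""

    def __init__(self) -> None:
        self.alarms: Dict[str, Alarm] = {}

    def on_event(self, key: str, name: str, severity: str,
                 resolves: bool = False) -> Optional[Alarm]:
        alarm = self.alarms.get(key)
        if alarm is None:
            if resolves:
                return None       # no existing alarm to clear
            alarm = self.alarms[key] = Alarm(key)
        alarm.events.append(name)
        alarm.cleared = resolves  # a new problem event reopens a cleared alarm
        # The count only increments for severities of Warning or greater.
        if SEVERITY_RANK[severity] >= SEVERITY_RANK["Warning"]:
            alarm.count += 1
        return alarm


table = AlarmTable()
key = "service-down:node1:ICMP"   # made-up key format
sequence = [
    ("nodeLostService", "Minor", False),
    ("nodeRegainedService", "Normal", True),
    ("nodeLostService", "Minor", False),
    ("nodeRegainedService", "Normal", True),
]
for name, severity, resolves in sequence:
    alarm = table.on_event(key, name, severity, resolves)

print(alarm.count, alarm.cleared, len(alarm.events))  # prints: 2 True 4
```

After two outages and two recoveries, the single alarm holds four events, is cleared, and has a count of 2, which matches the final screenshots above.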
Alarm lifetime rules
Alarms are deleted from the Meridian database after a set amount of time.
This lifetime can be configured via Drools rules in the ${OPENNMS_HOME}/etc/alarmd/drools-rules.d/alarmd.drl file.
The default alarm lifetimes are:
| Alarm State | Deletion Delay |
|---|---|
| Cleared and Unacknowledged | 5 minutes |
| Cleared and Acknowledged | 1 day |
| Active and Unacknowledged | 3 days |
| All other alarms | 8 days |
Each delay is measured from the alarm's last event time, so the countdown restarts whenever a new problem event is reduced into the same alarm.
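The default lifetimes can be expressed as a small lookup to show how the deletion time follows from an alarm's state and last event time. This is a minimal sketch under stated assumptions: the function names are hypothetical, the real behavior is defined by the Drools rules in alarmd.drl, and "All other alarms" is treated here as the remaining case (active and acknowledged).

```python
from datetime import datetime, timedelta


def deletion_delay(cleared: bool, acknowledged: bool) -> timedelta:
    """Return the default deletion delay for an alarm state (hypothetical helper)."""
    if cleared and not acknowledged:
        return timedelta(minutes=5)
    if cleared and acknowledged:
        return timedelta(days=1)
    if not cleared and not acknowledged:
        return timedelta(days=3)
    return timedelta(days=8)      # all other alarms


def deletion_time(last_event_time: datetime, cleared: bool,
                  acknowledged: bool) -> datetime:
    """The delay is counted from the last event time, not from alarm creation."""
    return last_event_time + deletion_delay(cleared, acknowledged)


last_event = datetime(2024, 1, 1, 12, 0)
print(deletion_time(last_event, cleared=True, acknowledged=False))
print(deletion_time(last_event, cleared=False, acknowledged=False))
```

Because the calculation starts from the last event time, a new problem event reduced into the alarm moves `last_event_time` forward and pushes the deletion time back accordingly.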