Alarm Lifecycle

The following example walks through the lifecycle of an alarm raised by a nodeLostService event.

Lifecycle example

A new nodeLostService event is received and creates a new alarm.

Figure 1. New alarm visible in outstanding alarm list

Clicking the number in the Count column shows the events that make up the alarm, along with their details.

Figure 2. Event list showing events related to the alarm

When the service is restored, a nodeRegainedService event automatically clears the alarm.

Figure 3. Alarm cleared (the Alarm List shows one cleared alarm and its log message)
Figure 4. Service down and service restored events (the event list shows one service down event and one service restored event)

If the problem occurs again, the new nodeLostService event is reduced into the existing alarm instead of creating a new one, and the alarm’s count increases to reflect the new activity.

Figure 5. Alarm reopened with an increase in the count value (the Alarm List shows one alarm with a count of 2)
Figure 6. Event list showing events related to the alarm (two service down events and one service restored event, all components of the same alarm)

The alarm once again clears immediately when service is restored.

Figure 7. Reduced alarm cleared (the Alarm List shows one cleared alarm with a count of 2 and its log message)

Note that the alarm’s count increments only for events with a severity of Warning or higher, so the service restored events shown in Figure 8 join the alarm without raising its count.

Figure 8. Service down and restored events (two service down events and two service restored events, all members of the same alarm)
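The reduction behavior walked through above can be modeled in a few lines. This is a minimal Python sketch, not Meridian's implementation: it assumes events are matched to alarms by a single hypothetical `key` string, marks nodeRegainedService as a resolution event with a hypothetical `clears` flag, shortens the event UEIs, and assumes nodeLostService arrives with Minor severity and nodeRegainedService with Normal severity. The severity ordering is modeled on OpenNMS severities. Replaying the down, restored, down, restored sequence from the figures yields a cleared alarm with a count of 2 and four member events.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class Severity(IntEnum):
    # Ordering modeled on OpenNMS severities, lowest to highest.
    INDETERMINATE = 1
    CLEARED = 2
    NORMAL = 3
    WARNING = 4
    MINOR = 5
    MAJOR = 6
    CRITICAL = 7


@dataclass
class Event:
    uei: str                # shortened UEI, e.g. "nodeLostService"
    key: str                # hypothetical key identifying node, interface, and service
    severity: Severity
    clears: bool = False    # hypothetical flag marking resolution events


@dataclass
class Alarm:
    key: str
    severity: Severity
    count: int
    events: List[Event] = field(default_factory=list)


def process(alarms: Dict[str, Alarm], event: Event) -> Optional[Alarm]:
    """Create, reduce into, or clear an alarm (simplified, hypothetical model)."""
    alarm = alarms.get(event.key)
    if alarm is None:
        if event.clears:
            return None  # nothing to clear yet
        # First event for this key: open a new alarm with a count of 1.
        alarm = Alarm(event.key, event.severity, 1, [event])
        alarms[event.key] = alarm
        return alarm
    # Every later event with the same key is reduced into the existing alarm.
    alarm.events.append(event)
    if event.severity >= Severity.WARNING:
        alarm.count += 1  # only Warning or higher raises the count
    # A resolution event clears the alarm; otherwise it takes the new event's severity.
    alarm.severity = Severity.CLEARED if event.clears else event.severity
    return alarm


# Replay the sequence from the figures: down, restored, down, restored.
alarms: Dict[str, Alarm] = {}
key = "node1/192.0.2.10/ICMP"  # hypothetical key
for ev in [
    Event("nodeLostService", key, Severity.MINOR),
    Event("nodeRegainedService", key, Severity.NORMAL, clears=True),
    Event("nodeLostService", key, Severity.MINOR),
    Event("nodeRegainedService", key, Severity.NORMAL, clears=True),
]:
    process(alarms, ev)

alarm = alarms[key]
print(alarm.count, alarm.severity.name, len(alarm.events))  # 2 CLEARED 4
```

Keeping the count rule separate from the clearing logic means a Normal resolution event still joins the alarm's event list, as in Figure 8, while leaving the count unchanged.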

Alarm lifetime rules

Alarms are deleted from the Meridian database after a set amount of time. You can configure these lifetimes through Drools rules in the ${OPENNMS_HOME}/etc/alarmd/drools-rules.d/alarmd.drl file. The default alarm lifetimes are as follows:

Alarm state                   Deletion delay
Cleared and Unacknowledged    5 minutes
Cleared and Acknowledged      1 day
Active and Unacknowledged     3 days
All other alarms              8 days

These delays are measured from the alarm’s last event time, so the countdown restarts whenever a new problem event is reduced into the same alarm.
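The default lifetimes can be expressed as a small lookup plus a date calculation. The Python sketch below is a hypothetical model of the default delays, not the Drools rules in alarmd.drl; `deletion_delay`, `deletion_time`, and their parameters are illustrative names. It computes when an alarm becomes eligible for deletion from its state and last event time, and shows how reducing a new problem event into the alarm pushes that time back.

```python
from datetime import datetime, timedelta


def deletion_delay(cleared: bool, acknowledged: bool) -> timedelta:
    """Default deletion delay for an alarm in the given state (hypothetical helper)."""
    if cleared:
        # Cleared alarms: 1 day if acknowledged, 5 minutes if unacknowledged.
        return timedelta(days=1) if acknowledged else timedelta(minutes=5)
    if not acknowledged:
        # Active and unacknowledged.
        return timedelta(days=3)
    # All other alarms (active and acknowledged).
    return timedelta(days=8)


def deletion_time(last_event_time: datetime, cleared: bool, acknowledged: bool) -> datetime:
    """When the alarm becomes eligible for deletion, measured from its last event time."""
    return last_event_time + deletion_delay(cleared, acknowledged)


# An active, unacknowledged alarm whose last event arrived at noon on 1 January.
first = datetime(2024, 1, 1, 12, 0)
print(deletion_time(first, cleared=False, acknowledged=False))  # 2024-01-04 12:00:00

# A new problem event reduced two days later moves the last event time forward,
# so the three-day countdown starts over.
reduced = first + timedelta(days=2)
print(deletion_time(reduced, cleared=False, acknowledged=False))  # 2024-01-06 12:00:00
```

Modeling the countdown as "last event time plus a state-dependent delay" keeps the restart behavior implicit: nothing resets a timer explicitly; the deadline simply moves whenever the last event time does.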