Configure Alarms
Because alarmd instantiates alarms from events, defining alarms in Meridian involves defining an additional event XML element that indicates a problem or resolution in the network.
This is the alarm-data element.
Any event that is marked as donotpersist in the logmsg element’s dest attribute will not be processed as an alarm.
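As a minimal sketch of the pitfall in the note above, the following hypothetical event carries an alarm-data element but sends its log message to donotpersist, so no alarm would be expected from it. The UEI, label, and message text are placeholders and are not part of the shipped Meridian configuration.

```xml
<!-- Hypothetical event: UEI, label, and text are illustrative placeholders. -->
<event>
  <uei>uei.example.org/custom/heartbeat</uei>
  <event-label>Example: heartbeat received</event-label>
  <descr>A heartbeat message was received from the node.</descr>
  <!-- dest="donotpersist" means the event is not stored,
       so the alarm-data element below is never acted on. -->
  <logmsg dest="donotpersist">Heartbeat received.</logmsg>
  <severity>Normal</severity>
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%" alarm-type="3"/>
</event>
```

If an alarm is wanted for such an event, the dest value has to be one that persists the event.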
alarm-data schema definition (XSD)
<element name="alarm-data">
  <annotation>
    <documentation>This element is used for converting events into alarms.</documentation>
  </annotation>
  <complexType>
    <sequence>
      <element ref="this:update-field" minOccurs="0" maxOccurs="unbounded" />
    </sequence>
    <attribute name="reduction-key" type="string" use="required" />
    <attribute name="alarm-type" use="required">
      <simpleType>
        <restriction base="int">
          <minInclusive value="1"/>
        </restriction>
      </simpleType>
    </attribute>
    <attribute name="clear-key" type="string" use="optional" />
    <attribute name="auto-clean" type="boolean" use="optional" default="false" />
    <attribute name="x733-alarm-type" type="this:x733-alarm-type" use="optional" />
    <attribute name="x733-probable-cause" type="int" use="optional" />
  </complexType>
</element>

<element name="update-field">
  <complexType>
    <attribute name="field-name" type="string" use="required" />
    <attribute name="update-on-reduction" type="boolean" use="optional" default="true" />
    <attribute name="value-expression" type="string" use="optional" default="" />
  </complexType>
</element>

<simpleType name="x733-alarm-type">
  <restriction base="string">
    <pattern value="CommunicationsAlarm|ProcessingErrorAlarm|EnvironmentalAlarm|QualityOfServiceAlarm|EquipmentAlarm|IntegrityViolation|SecurityViolation|TimeDomainViolation|OperationalViolation|PhysicalViolation" />
  </restriction>
</simpleType>
See Anatomy of an event for more information.
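To make the schema more concrete, here is a sketch of an alarm-data element that uses the optional update-field child and an X.733 alarm type. The event itself is hypothetical (the UEI, label, and text are placeholders), and the field-name value "severity" is an assumption; check which alarm fields your Meridian version accepts before relying on it.

```xml
<!-- Hypothetical event used only to illustrate the schema. -->
<event>
  <uei>uei.example.org/custom/linkDegraded</uei>
  <event-label>Example: link degraded</event-label>
  <descr>Link quality on interface %interface% has degraded.</descr>
  <logmsg dest="logndisplay">Link degraded on %interface%.</logmsg>
  <severity>Minor</severity>
  <!-- reduction-key and alarm-type are required; the rest is optional. -->
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%"
              alarm-type="1"
              auto-clean="false"
              x733-alarm-type="CommunicationsAlarm">
    <!-- Assumed field name. update-on-reduction="false" asks alarmd not to
         overwrite this field when later events are reduced into the alarm. -->
    <update-field field-name="severity" update-on-reduction="false"/>
  </alarm-data>
</event>
```

The x733-alarm-type value must be one of the strings listed in the pattern above, and alarm-type must be an integer of at least 1.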
Attributes and elements
reduction-key
Alarmd is designed to consolidate multiple occurrences of an alarm into a single alarm.
When defining an event’s alarm-data, the critical attribute is the reduction-key.
It can contain literal strings as well as references to the event’s properties (fields and parameters).
The reduction-key uniquely identifies a problem’s signature and is used to reduce events so that only one problem is instantiated.
Most commonly, the event’s unique event identifier (UEI) is used as the least significant portion of the reduction-key, followed by other event properties (from least to most significant), and separated by a colon (:).
reduction-key
<event>
  <uei>uei.opennms.org/nodes/nodeDown</uei>
  ...
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%" alarm-type="1" auto-clean="false"/>
</event>
%dpname% refers to the "distributed poller name", which is the name of the monitoring location where the event originated.
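A reduction-key can also reference event parameters, not only event fields. The sketch below assumes the %parm[name]% token form for parameter references; the event and the fanIndex parameter are hypothetical and serve only to show how a parameter makes the key more specific.

```xml
<!-- Hypothetical event: the UEI and the fanIndex parameter are placeholders. -->
<event>
  <uei>uei.example.org/custom/fanFailure</uei>
  <event-label>Example: fan failure</event-label>
  <descr>Fan %parm[fanIndex]% has failed.</descr>
  <logmsg dest="logndisplay">Fan %parm[fanIndex]% failed.</logmsg>
  <severity>Major</severity>
  <!-- One alarm per UEI, location, node, and fan:
       repeated failures of the same fan reduce into the same alarm. -->
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%parm[fanIndex]%"
              alarm-type="1"
              auto-clean="false"/>
</event>
```

With this key, two failed fans on the same node would produce two alarms, while repeated events for one fan would only increment that alarm's count.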
Decreasing the specificity of the reduction-key is a way to aggregate events from multiple nodes into a single alarm.
There are caveats, however:
reduction-key attribute
<!-- Don't do this in production -->
<event>
  <uei>uei.opennms.org/nodes/nodeDown</uei>
  <alarm-data reduction-key="%uei%" alarm-type="1"/>
</event>
This reduction-key instantiates a single alarm for all nodes that the poller determines to be down.
The alarm includes a count that represents the number of nodes down.
However, uei.opennms.org/nodes/nodeUp is not a good "pairwise" reduction-key for resolving this alarm: a single "node up" event would clear the alarm for every down node tracked by this single-alarm configuration.
alarm-type
Alarmd is designed to automatically match resolving events with an existing alarm.
Use the alarm-type attribute to mark alarms whose problems have associated resolutions (ups and downs).
Three types of alarms exist in Meridian:
- alarm-type="1": Problem alarm
- alarm-type="2": Resolving alarm
- alarm-type="3": Notification alarm (an alarm with no resolution, such as an SNMP authentication failure)
alarm-type helps alarmd with pairwise resolution by matching resolution events to problem events.
If an alarm transitions from alarm-type="2" to alarm-type="1", its severity will be set to the value of the most recent event.
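For a notification alarm, no resolving event exists, so a clear-key is not needed. The following is a sketch of such a definition, loosely modeled on the authentication-failure case mentioned above; the UEI, label, and text are placeholders rather than the event shipped with Meridian.

```xml
<!-- Hypothetical notification event: nothing ever "resolves" it. -->
<event>
  <uei>uei.example.org/custom/authFailure</uei>
  <event-label>Example: authentication failure</event-label>
  <descr>An authentication failure was reported by the node.</descr>
  <logmsg dest="logndisplay">Authentication failure reported.</logmsg>
  <severity>Warning</severity>
  <!-- alarm-type="3": notification alarm, not paired with a resolution. -->
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%" alarm-type="3"/>
</event>
```

Repeated failures from the same node and location reduce into one alarm because they share the same reduction-key.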
clear-key attribute
The clear-key attribute is used in alarmd’s pairwise correlation feature.
When configuring a resolution alarm, you can set this attribute to match the corresponding problem alarm’s reduction-key attribute.
interfaceUp event clearing an interfaceDown alarm
<event>
  <uei>uei.opennms.org/nodes/interfaceDown</uei>
  ...
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%" (1)
              alarm-type="1"
              auto-clean="false"/>
</event>
<event>
  <uei>uei.opennms.org/nodes/interfaceUp</uei>
  ...
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%"
              alarm-type="2"
              clear-key="uei.opennms.org/nodes/interfaceDown:%dpname%:%nodeid%:%interface%" (2)
              auto-clean="false"/>
</event>
(1) The interfaceDown event sets a reduction-key that includes enough information to identify a specific interface on a specific node.
(2) The interfaceUp event has a clear-key that matches the reduction-key of an interfaceDown alarm, letting a match automatically clear the previous alarm.
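The same pattern can be applied to the nodeDown example shown earlier. This is a sketch that assumes the nodeDown event uses the reduction-key %uei%:%dpname%:%nodeid%; the clear-key below is derived from that assumption and is not copied from the shipped configuration.

```xml
<!-- Sketch of a resolving event for the nodeDown alarm. -->
<event>
  <uei>uei.opennms.org/nodes/nodeUp</uei>
  <!-- other event elements omitted -->
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%"
              alarm-type="2"
              clear-key="uei.opennms.org/nodes/nodeDown:%dpname%:%nodeid%"
              auto-clean="false"/>
</event>
```

The clear-key spells out the problem event's UEI literally, because %uei% in this event would expand to the nodeUp UEI and never match the nodeDown alarm.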
auto-clean
The auto-clean attribute instructs alarmd to retain only the most recent event that has been reduced into an alarm.
For alarms that produce many events, this reduces the number of events kept in the database.
Avoid using this feature with alarms that have pairwise correlation (type 1 and 2 alarms that match problems with resolutions). It may delete all problem events, erasing your ability to study an alarm’s history.
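A case where auto-clean may be reasonable is a high-volume notification alarm that has no resolving event. The sketch below uses a hypothetical event (the UEI, label, and text are placeholders) to show the attribute enabled on a type 3 alarm.

```xml
<!-- Hypothetical high-volume event with no resolution. -->
<event>
  <uei>uei.example.org/custom/configChanged</uei>
  <event-label>Example: configuration changed</event-label>
  <descr>The device reported a configuration change.</descr>
  <logmsg dest="logndisplay">Configuration change reported.</logmsg>
  <severity>Warning</severity>
  <!-- auto-clean="true": keep only the most recent reduced event.
       alarm-type="3" avoids the pairwise-correlation caveat above. -->
  <alarm-data reduction-key="%uei%:%dpname%:%nodeid%"
              alarm-type="3"
              auto-clean="true"/>
</event>
```

The alarm's count still reflects how often the event occurred, even though earlier events are no longer available for review.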
Instantiate new alarms for existing cleared problems
Alarmd includes a global property setting that controls the behavior of alarm reduction of currently cleared alarms.
You can modify it by editing ${OPENNMS_HOME}/etc/opennms.properties.d/alarmd.properties and inserting the following property, set to true:
###### Alarmd Properties ######
#
# Enable this property to force Alarmd to create new alarms when a problem re-occurs and the
# existing Alarm is in a "Cleared" state.
#
# Default: false
#org.opennms.alarmd.newIfClearedAlarmExists = false
org.opennms.alarmd.newIfClearedAlarmExists = true
With this property set, when a repeat incident occurs and the current state of the problem’s alarm is "Cleared", a new instance of the alarm is created instead of resetting the current alarm to its default severity and incrementing the counter.
Figure: node-down alarm and existing cleared alarm
When enabled, alarmd alters the existing alarm’s reduction-key to be unique (appended with ":ID:" and the alarm’s ID).
This prevents it from being reused for a reoccurring problem in the network.
Figure: reduction-key attribute
Re-enable legacy dual alarm state behavior
You can re-enable the legacy dual alarm behavior via a global property setting in ${OPENNMS_HOME}/etc/opennms.properties.d/alarmd.properties.
Open the file in a text editor and insert the following property definition, set to true:
###### Alarmd Properties ######
# Enable this property to have the traditional dual alarm handling of alarms state
# for alarm pairwise correlation.
# Default: false
org.opennms.alarmd.legacyAlarmState = true
Setting org.opennms.alarmd.legacyAlarmState overrides org.opennms.alarmd.newIfClearedAlarmExists.