The Architecture for Learning Enabled Correlation (ALEC) is a framework for logically grouping related faults (alarms) into higher-level objects (situations) with OpenNMS.
Nodes, their components and their relations are used to build a graph.
The layout of the graph and the different types of elements on it make up the "meta-model". In ALEC, the vertices of the graph are referred to as "inventory objects" (IOs) and the edges represent "relationships".
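As a rough illustration of the graph described above, the following minimal sketch models inventory objects as vertices and relationships as edges. The class names, field names and the example node/card/port hierarchy are hypothetical and are not taken from ALEC itself; relationships are treated as undirected purely for simplicity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InventoryObject:
    """A vertex on the graph, identified by a type and an id (hypothetical fields)."""
    type: str
    id: str


@dataclass
class InventoryGraph:
    """Adjacency map: each inventory object maps to the set of objects it is related to."""
    neighbours: dict = field(default_factory=dict)

    def add_object(self, io):
        self.neighbours.setdefault(io, set())

    def add_relationship(self, a, b):
        # Relationships are stored as undirected edges in this sketch.
        self.add_object(a)
        self.add_object(b)
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)


# Hypothetical inventory: a node, one of its cards, and a port on that card.
graph = InventoryGraph()
node = InventoryObject("node", "router-1")
card = InventoryObject("card", "router-1:slot-0")
port = InventoryObject("port", "router-1:slot-0:port-3")
graph.add_relationship(node, card)
graph.add_relationship(card, port)
```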
Alarms are enriched to help identify which component in the model they relate to. The alarms are attached to the graph when they are triggered.
In OpenNMS, we can now populate the managed object (MO) type and MO instance of an alarm when it is created. These fields are required to relate the alarm to a specific inventory object (IO) on the graph.
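The matching step can be pictured as a lookup keyed on the MO type and MO instance. The sketch below assumes that an inventory object is identified by a (type, id) pair and that alarms without a matching object are simply set aside; both are assumptions made for illustration, and the names `Alarm`, `mo_type`, `mo_instance` and `attach` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    id: int
    mo_type: str      # managed object type, e.g. "port"
    mo_instance: str  # managed object instance, e.g. "router-1:port-3"


# Inventory objects known to the graph, keyed by (type, id). Hypothetical data.
inventory = {("node", "router-1"), ("port", "router-1:port-3")}


def attach(alarms, inventory):
    """Group alarms by the inventory object they refer to.

    Returns (attached, unmatched): a dict from inventory key to its alarms,
    and a list of alarms whose MO type/instance matched nothing.
    """
    attached = defaultdict(list)
    unmatched = []
    for alarm in alarms:
        key = (alarm.mo_type, alarm.mo_instance)
        if key in inventory:
            attached[key].append(alarm)
        else:
            unmatched.append(alarm)
    return dict(attached), unmatched


alarms = [
    Alarm(1, "port", "router-1:port-3"),
    Alarm(2, "node", "router-1"),
    Alarm(3, "port", "router-9:port-0"),  # no such inventory object
]
attached, unmatched = attach(alarms, inventory)
```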
We aim to group the alarms into clusters based on their level of similarity and on whether they share the same root cause. The engine periodically groups (clusters) the alarms.
Different clustering algorithms can be used:
Deep Learning (AI with TensorFlow)
DBSCAN (unsupervised ML)
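To make the density-based option more concrete, here is a small, generic DBSCAN written from scratch. It is not ALEC's implementation. The distance function, which adds the time gap in minutes to a hop count between inventory objects, is an assumption chosen only to show how time and graph proximity could both feed into similarity; the alarm tuples, the `hops` table and the `eps`/`min_pts` values are hypothetical.

```python
def dbscan(items, eps, min_pts, dist):
    """Return one cluster label per item; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(items)

    def region(i):
        # Indices of all items within eps of item i (including i itself).
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical alarms: (id, time in seconds, inventory object id).
alarms = [
    ("a1", 0, "port-1"),
    ("a2", 30, "port-2"),
    ("a3", 45, "card-1"),
    ("a4", 4000, "port-9"),
]

# Hypothetical hop counts between inventory objects on the graph.
hops = {
    frozenset(("port-1", "port-2")): 2,
    frozenset(("port-1", "card-1")): 1,
    frozenset(("port-2", "card-1")): 1,
}


def distance(x, y):
    # Assumed metric: minutes apart plus hops apart (10 if no path is known).
    minutes = abs(x[1] - y[1]) / 60
    h = 0 if x[2] == y[2] else hops.get(frozenset((x[2], y[2])), 10)
    return minutes + h


labels = dbscan(alarms, eps=3.0, min_pts=2, dist=distance)
```

With these values the first three alarms land in one cluster and the fourth, which is far away in both time and topology, is labelled as noise.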
When we determine that a group of alarms is related, we send an event to OpenNMS. OpenNMS will then create a new situation for the set of related alarms. Situations are also managed like alarms.
We use Drools rules for managing the state of all alarms including situations:
These are used to propagate the severity:
the severity of a situation is the maximum severity of all related alarms + 1
They are used to propagate acknowledgments:
if all of the related alarms on a situation are acknowledged, then the situation is also acknowledged
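The two propagation rules above can be sketched in a few lines. This is plain Python rather than Drools and is meant only to illustrate the logic. The severity ordering follows the usual OpenNMS severity levels; capping the result at CRITICAL and treating a situation with no related alarms as unacknowledged are assumptions, and the function names are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    INDETERMINATE = 1
    CLEARED = 2
    NORMAL = 3
    WARNING = 4
    MINOR = 5
    MAJOR = 6
    CRITICAL = 7


def situation_severity(related_severities):
    """Maximum severity of the related alarms, plus one step.

    The result is capped at CRITICAL (an assumption of this sketch).
    """
    highest = max(related_severities)
    return Severity(min(highest + 1, Severity.CRITICAL))


def situation_acknowledged(related_acks):
    """True only if there is at least one related alarm and all are acknowledged."""
    return len(related_acks) > 0 and all(related_acks)


severity = situation_severity([Severity.WARNING, Severity.MINOR])
acked = situation_acknowledged([True, True, False])
```

Here `severity` evaluates to `Severity.MAJOR` (one step above MINOR) and `acked` is `False`, because one related alarm is still unacknowledged.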