The Architecture for Learning Enabled Correlation (ALEC) provides a machine learning-powered framework for logically grouping related alarms into higher-level situations.
Because correlated alarms are likely to have been generated by the same underlying issue, alarm groupings provide a starting point for root-cause analysis.
How does it work?
First, the cluster engine analyzes your provisioned network inventory and converts it into a graph. The inventory objects that it analyzes include nodes, their components, and their relations to other nodes.
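As a rough illustration of this step, the sketch below turns a small inventory into an undirected graph stored as an adjacency map. The `ObjectKey` and `InventoryObject` classes, their fields, and the `build_graph` helper are hypothetical names chosen for this example. They are not ALEC's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ObjectKey:
    """Hypothetical identifier for an inventory object (a graph vertex)."""
    type: str
    id: str


@dataclass
class InventoryObject:
    """Hypothetical inventory record: an object, its parent, and its peers."""
    key: ObjectKey
    parent: Optional[ObjectKey] = None
    peers: list = field(default_factory=list)


def build_graph(inventory):
    """Return an adjacency map: vertex -> set of neighbouring vertices."""
    graph = defaultdict(set)
    for obj in inventory:
        graph[obj.key]  # make sure the vertex exists even if it has no edges
        links = list(obj.peers)
        if obj.parent is not None:
            links.append(obj.parent)
        for other in links:
            # Edges are undirected, so record both directions.
            graph[obj.key].add(other)
            graph[other].add(obj.key)
    return dict(graph)


node_a = ObjectKey("node", "router-a")
node_b = ObjectKey("node", "router-b")
port_a1 = ObjectKey("port", "router-a:eth0")

inventory = [
    InventoryObject(node_a, peers=[node_b]),
    InventoryObject(node_b),
    InventoryObject(port_a1, parent=node_a),
]
graph = build_graph(inventory)
```

Here `router-a` ends up connected to both its port and its peer `router-b`, which is the kind of relationship the grouping step can later exploit.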
After the graph is generated, alarms are enriched with additional metadata: the managed object type and the managed object instance. This metadata is required to identify which components the alarms relate to.
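The enrichment step could look something like the following sketch. The `Alarm` fields, the `enrich` helper, the type names `"node"` and `"interface"`, and the `node:interface` instance format are all assumptions made for illustration. The values ALEC actually uses may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    """Hypothetical alarm record with the two enrichment fields left empty."""
    id: int
    node: str
    interface: Optional[str] = None
    managed_object_type: Optional[str] = None
    managed_object_instance: Optional[str] = None


def enrich(alarm: Alarm) -> Alarm:
    """Fill in the managed object type and instance for one alarm."""
    if alarm.interface is not None:
        # The alarm concerns a component of the node.
        alarm.managed_object_type = "interface"
        alarm.managed_object_instance = f"{alarm.node}:{alarm.interface}"
    else:
        # The alarm concerns the node as a whole.
        alarm.managed_object_type = "node"
        alarm.managed_object_instance = alarm.node
    return alarm


link_down = enrich(Alarm(id=1, node="router-a", interface="eth0"))
node_down = enrich(Alarm(id=2, node="router-b"))
```

With both fields populated, each alarm can be matched to exactly one vertex in the inventory graph.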
Alarms are then attached to the graph as they are triggered, and grouped into clusters based on their level of similarity and whether they share the same root cause. When ALEC determines that a cluster of alarms is related, it sends an event to OpenNMS. OpenNMS then creates a new situation for the cluster.
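To make the idea of attaching alarms to the graph more concrete, the sketch below pins each alarm to a vertex and measures how far apart two alarms are in hops. This section does not say how ALEC measures similarity, so the hop count is an assumption used only for illustration. The topology, `attach`, and `hop_distance` are hypothetical.

```python
from collections import deque

# Hypothetical topology: vertex -> neighbouring vertices.
graph = {
    "router-a": {"router-a:eth0", "router-b"},
    "router-a:eth0": {"router-a"},
    "router-b": {"router-a", "router-c"},
    "router-c": {"router-b"},
}

# Alarm ids attached to the vertex named by their managed object instance.
alarms_by_vertex = {}


def attach(alarm_id, vertex):
    """Attach an alarm to the vertex it relates to."""
    if vertex not in graph:
        raise KeyError(f"unknown inventory object: {vertex}")
    alarms_by_vertex.setdefault(vertex, []).append(alarm_id)


def hop_distance(start, goal):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, hops = queue.popleft()
        if vertex == goal:
            return hops
        for neighbour in graph[vertex]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


attach(1, "router-a:eth0")
attach(2, "router-c")
print(hop_distance("router-a:eth0", "router-c"))  # prints 3
```

Alarms on nearby vertices would then be stronger candidates for the same cluster than alarms on distant ones.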
ALEC uses Drools rules to manage the state of all alarms, including situations:
The severity of a situation is the maximum severity of all related alarms, plus one.
If all alarms related to a situation are acknowledged, the situation is also acknowledged.
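The two rules above can be expressed as a short sketch. It is written in Python rather than Drools, purely to show the logic. The `Severity` enum mirrors the OpenNMS severity ordering. The `Alarm` class, both helper functions, and the cap at `CRITICAL` are assumptions for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    INDETERMINATE = 1
    CLEARED = 2
    NORMAL = 3
    WARNING = 4
    MINOR = 5
    MAJOR = 6
    CRITICAL = 7


@dataclass
class Alarm:
    """Hypothetical alarm holding only the state the rules need."""
    severity: Severity
    acknowledged: bool = False


def situation_severity(alarms):
    """Maximum severity of the related alarms, plus one.

    Capping the result at CRITICAL is an assumption of this sketch.
    """
    highest = max(alarm.severity for alarm in alarms)
    return Severity(min(highest + 1, Severity.CRITICAL))


def situation_acknowledged(alarms):
    """A situation is acknowledged only when every related alarm is."""
    return all(alarm.acknowledged for alarm in alarms)


related = [Alarm(Severity.WARNING, acknowledged=True), Alarm(Severity.MINOR)]
print(situation_severity(related).name)   # prints MAJOR
print(situation_acknowledged(related))    # prints False
```

Acknowledging the remaining `MINOR` alarm would make `situation_acknowledged` return `True` without changing the situation's severity.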
ALEC implements two correlation engines:
Deep Learning: uses a neural network to correlate and group alarms.
Clustering: uses the DBSCAN algorithm to group alarms based on their distance from each other.
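For readers unfamiliar with DBSCAN, the following is a minimal, self-contained sketch of the algorithm applied to alarm timestamps. It is not ALEC's implementation. The distance function (time difference in seconds), the `eps` and `min_pts` values, and the sample data are assumptions picked to keep the example small.

```python
def dbscan(points, eps, min_pts, distance):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


# Hypothetical alarm timestamps, in seconds.
timestamps = [0, 5, 8, 300, 304, 900]
labels = dbscan(timestamps, eps=10, min_pts=2,
                distance=lambda a, b: abs(a - b))
print(labels)  # prints [0, 0, 0, 1, 1, -1]
```

The first three alarms form one cluster, the next two form another, and the isolated alarm at 900 seconds is treated as noise. A production distance function would likely combine more than time, but that detail is outside what this section describes.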
To learn more, see the Correlation Engines section.