Deep Learning Engine
The deep learning engine is a correlation engine that uses a TensorFlow model to build alarm clusters. It draws information from the locally persisted network inventory graph. ALEC automatically creates and maintains this graph, which is based on information from the associated datasource.
TensorFlow
To create clusters using TensorFlow, ALEC first reduces the clustering calculation to a binary classification that asks, "Are these two alarms related?" When the deep learning engine determines that two or more alarms are related, they are added to a cluster. To reduce the computational complexity of these calculations, ALEC limits the search of potential active candidates to those that are nearby.
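The reduction from clustering to pairwise classification can be sketched as follows. This is a minimal illustration, not ALEC's implementation: `is_related` stands in for the trained classifier, `is_nearby` stands in for the candidate filter, and both names are hypothetical. Pairs judged related are merged with a simple union-find structure.

```python
from itertools import combinations


def cluster_by_pairs(alarms, is_related, is_nearby):
    """Group alarms by merging every nearby pair that is judged related.

    is_related and is_nearby are hypothetical callables taking two alarms
    and returning a bool; they are placeholders for the model and the
    candidate filter described in the text.
    """
    alarms = list(alarms)
    parent = {a: a for a in alarms}

    def find(a):
        # Follow parent links to the representative, shortening the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(alarms, 2):
        if not is_nearby(a, b):
            continue  # skip the (expensive) classification for distant pairs
        if is_related(a, b):
            parent[find(a)] = find(b)

    groups = {}
    for a in alarms:
        groups.setdefault(find(a), []).append(a)
    # Only groups of two or more alarms form a cluster.
    return [g for g in groups.values() if len(g) > 1]


# Toy example: integers stand in for alarms.
clusters = cluster_by_pairs(
    [1, 2, 3, 10, 11, 20],
    is_related=lambda a, b: abs(a - b) == 1,
    is_nearby=lambda a, b: abs(a - b) <= 5,
)
print(clusters)  # [[1, 2, 3], [10, 11]]
```

Note that alarms 1 and 3 end up in the same cluster even though they are not directly related, because both are related to alarm 2.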
Ludwig
The current model definition for the deep learning engine was developed using Ludwig.
The model’s input features include the following:
- Inventory object types (categorical)
- Relations between inventory objects (binary)
- Difference in time (numerical)
- Graphical distance (numerical)
See ludwig_model.yaml on the project’s GitHub for more information on the model definition.
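To make the feature list concrete, the sketch below derives similar features for one pair of alarms. It is an assumption-laden illustration: the `Alarm` class, the adjacency-dictionary graph, the feature names, and the reading of the binary relation as "same or directly linked objects" are all hypothetical and do not reflect the columns that ALEC actually produces.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    # Hypothetical minimal alarm record, for illustration only.
    object_id: str
    object_type: str
    time: float  # seconds


def graph_distance(graph, start, goal):
    """Return the number of hops between two nodes, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def pair_features(graph, a, b):
    """Build one feature row for a pair of alarms (names are assumptions)."""
    dist = graph_distance(graph, a.object_id, b.object_id)
    return {
        "type_a": a.object_type,                           # categorical
        "type_b": b.object_type,                           # categorical
        "directly_linked": dist is not None and dist <= 1,  # binary
        "time_delta": abs(a.time - b.time),                # numerical
        "distance": dist,                                  # numerical
    }
```

For example, two ports on the same node are two hops apart in a graph where each port links only to its node, so `pair_features` would report `distance` as 2 and `directly_linked` as `False`.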
Train the engine
The deep learning engine must be trained via the Karaf shell before ALEC can use it to correlate alarms.
These instructions are included to help you get started with the engine. They are by no means all-encompassing. The training process will be improved in future ALEC releases.
Install shell commands
In the OpenNMS Karaf shell, run the following command to install the ALEC shell commands:
feature:install alec-features-deeplearning alec-features-shell
Vectorize datasets
First, take a snapshot of the current state of the datasource:
opennms-alec:datasource-snapshot /tmp/snap1
Next, open /tmp/snap1/alec.situations.xml in a text editor and configure your desired situation state.
Save your changes and return to the Karaf shell.
Run the following command to build a vectorized representation of the snapshot:
opennms-alec:tensorflow-vectorize --alarms-in /tmp/snap1/alec.alarms.xml \
--inventory-in /tmp/snap1/alec.inventory.xml \
--situations-in /tmp/snap1/alec.situations.xml \
--csv-out /tmp/snap1/alec.vector.dataset.csv
Train the model with Ludwig
Copy the ludwig_model.yaml file from the source tree and save it as model.yaml.
Run the following command to train the model using Ludwig:
ludwig train --data_csv /tmp/snap1/alec.vector.dataset.csv \
  --model_definition_file model.yaml
Export the model
Use the following script to export the trained model to a format that ALEC can use:
echo '#!/usr/bin/env python
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import ops
from tensorflow.python.saved_model import builder as saved_model_builder
from ludwig import LudwigModel

# Load the model that Ludwig wrote during training.
model_path = "results/experiment_run_0/model"
model = LudwigModel.load(model_path)

# Write a TensorFlow SavedModel to the ./export directory.
builder = tf.saved_model.Builder("export")
with tf.Session(graph=model.model.graph) as sess:
    # Restore the trained weights into the session before exporting.
    saver = tf.train.Saver()
    saver.restore(sess, model.model.weights_save_path)
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
builder.save()
model.close()' > export_model.py
chmod +x export_model.py
./export_model.py
mkdir -p /tmp/tf-export
cp -R ./export/* /tmp/tf-export/
cp results/experiment_run_0/model/model_hyperparameters.json /tmp/tf-export/
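Before loading the model into ALEC, it can help to confirm that the export directory has the expected layout. The sketch below assumes the standard TensorFlow SavedModel layout (a saved_model.pb file and a variables directory) plus the model_hyperparameters.json file copied above; `missing_export_files` is a hypothetical helper, not part of ALEC, and passing this check does not guarantee that ALEC can load the model.

```python
from pathlib import Path


def missing_export_files(export_dir):
    """Return the names of expected entries that are absent from export_dir."""
    root = Path(export_dir)
    expected = {
        "saved_model.pb": root / "saved_model.pb",
        "variables/": root / "variables",
        "model_hyperparameters.json": root / "model_hyperparameters.json",
    }
    missing = []
    for name, path in expected.items():
        # Names ending in "/" must be directories; the rest must be files.
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

Calling `missing_export_files("/tmp/tf-export")` returns an empty list when all three entries are present.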
Use the model in ALEC
First, you must verify that the trained model can be loaded into ALEC:
opennms-alec:tensorflow-load-model /tmp/tf-export
If the command results are negative, you must retrain and re-export the model.
If the command results are positive, you can then configure the deep learning engine to use the model:
config:edit org.opennms.alec.engine.deeplearning
property-set modelPath /tmp/tf-export
config:update
Verify using simulations
You can run simulations to verify that the trained model clusters alarms as expected. First, use the following command to generate situations based on the dataset snapshot from earlier:
opennms-alec:process-alarms --alarms-in /tmp/snap1/alec.alarms.xml \
--inventory-in /tmp/snap1/alec.inventory.xml \
--situations-out /tmp/snap1/alec.situations.deeplearning.trained.xml \
--engine deeplearning
Run the following command to compare the model’s results to your ideal definition in /tmp/snap1/alec.situations.xml:
opennms-alec:score-situations -s peer /tmp/snap1/alec.situations.xml /tmp/snap1/alec.situations.deeplearning.trained.xml
From here, you can repeat the previous steps to tweak the model as desired.
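When iterating on the model, it can be useful to have an intuition for what it means to compare two sets of situations. One common approach, sketched below, checks which pairs of alarms share a situation in each set and reports precision and recall over those pairs. This is an illustrative assumption only: it is not ALEC's scoring code, and nothing here states how the peer strategy is computed. Situations are represented as plain lists of hypothetical alarm IDs.

```python
from itertools import combinations


def co_clustered_pairs(situations):
    """Return every unordered pair of alarm IDs that share a situation."""
    pairs = set()
    for alarm_ids in situations:
        for a, b in combinations(sorted(set(alarm_ids)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_agreement(expected, actual):
    """Return (precision, recall) of the actual pairs against the expected ones."""
    exp = co_clustered_pairs(expected)
    act = co_clustered_pairs(actual)
    common = exp & act
    precision = len(common) / len(act) if act else 1.0
    recall = len(common) / len(exp) if exp else 1.0
    return precision, recall


# Toy example: one expected situation, split differently by the engine.
expected = [["a1", "a2", "a3"]]
actual = [["a1", "a2"], ["a3", "a4"]]
precision, recall = pairwise_agreement(expected, actual)
print(precision)  # 0.5
```

In this toy case only one of the two pairs produced by the engine was expected (precision 0.5), and only one of the three expected pairs was found (recall of one third).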