This section describes troubleshooting methods for flows and associated features.
Run the following checks to determine whether telemetryd is working as expected:
Check telemetryd availability.
Verify that routers are sending data.
Check Docker networking configuration.
Perform a Minion health check.
Perform a Horizon health check.
Verify that sink consumer graphs are populated.
Verify that SNMP is available on routers that provide Netflow.
Review Horizon and Minion logs.
The Troubleshoot Telemetryd Discourse article describes how to run these checks.
If you see no data or incorrect data, view the state and parameters of the telemetry listeners:
If you store flows in Elasticsearch, you can use Kibana to check if flow documents (raw and aggregated) are written to Elasticsearch.
Note: You must know your endpoint address and API key.
To check Elasticsearch persistence, run this command:
The query returns a list of indices.
Those that start with a period (.) are system indices; all others are regular indices.
Regular indices appear only when Elasticsearch is receiving flows.
If no regular indices are displayed, check etc/telemetryd-configuration.xml and verify that Netflow listeners are enabled.
For more information on troubleshooting Elasticsearch, refer to the official Elasticsearch documentation.
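The index check described above can also be scripted. The following is a minimal sketch, assuming that your Elasticsearch endpoint exposes the standard `_cat/indices` API and accepts an API key in the `Authorization` header. The function names are illustrative and are not part of Horizon.

```python
import json
import urllib.request


def split_indices(names):
    """Separate system indices (leading '.') from regular indices."""
    system = sorted(n for n in names if n.startswith("."))
    regular = sorted(n for n in names if not n.startswith("."))
    return system, regular


def list_indices(endpoint, api_key):
    """Fetch index names from the _cat/indices API in JSON format.

    Assumption: the cluster accepts 'Authorization: ApiKey <key>'.
    """
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/_cat/indices?format=json",
        headers={"Authorization": "ApiKey " + api_key},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return [row["index"] for row in json.load(response)]


def summarize(names):
    """Return a one-line verdict on whether flows appear to be persisted."""
    _, regular = split_indices(names)
    if not regular:
        return "No regular indices: check that Netflow listeners are enabled."
    return "Regular indices found: " + ", ".join(regular)
```

To use it, call `summarize(list_indices(endpoint, api_key))` with your own endpoint address and API key.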
If your persisted flows are not displayed in the OpenNMS plugin for Grafana, check the elasticUrl property in
If you are using aggregated flows, ensure that your aggregate.elasticIndexStrategy matches the index strategy that you configured in the streaming analytics tool.
To persist only raw flows or only aggregated flows, set either alwaysUseAggForQueries, as appropriate.
To verify flows by device, open the Nodes page from the web UI’s top menu bar.
The flows indicator symbol on the Nodes page shows flows data for each device, with SNMP details and the direction of flows.
You can replay flows from a packet capture to help debug flow processing:
List the available listeners and parsers:
opennms:telemetry-listeners
Replay the packet capture to the target parser (obtained from the previous step):
opennms:telemetry-replay-pcap <listener> <parser> <path-to-pcap-file>
This example shows the result of replaying a packet capture with Netflow 9 flows to the Netflow 9 parser:
admin@opennms()> opennms:telemetry-listeners
Name = Multi-UDP-9999
Description = UDP *:9999
Properties:
  Max Packet Size = 8096
  Port = 9999
Parsers:
  - Multi-UDP-9999.Netflow-5-Parser
  - Multi-UDP-9999.Netflow-9-Parser
  - Multi-UDP-9999.IPFIX-TCP-Parser
  - Multi-UDP-9999.SFlow-Parser

admin@opennms()> opennms:telemetry-replay-pcap Multi-UDP-9999 Multi-UDP-9999.Netflow-9-Parser /tmp/flows.pcap
Processing packets from '/tmp/flows.pcap'.
Processing packet #100.
Processing packet #200.
Processing packet #300.
Processing packet #400
Processing packet #500.
Done processing 515 packets.
admin@opennms()>
Replayed flows are ingested through the same pipeline as flows received directly from the devices.
To associate the results with a node, nodes with interfaces that match the IP addresses in the packet capture must exist.
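Before replaying a capture, it can help to know which source addresses it contains so that you can confirm matching nodes exist. The following is a rough sketch that uses only the Python standard library. It assumes a classic pcap file (not pcapng) with untagged Ethernet frames carrying IPv4; the function name is illustrative and is not part of Horizon.

```python
import ipaddress
import struct
from collections import Counter

# Classic pcap magic numbers mapped to the struct byte-order prefix.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def source_addresses(pcap_bytes):
    """Count IPv4 source addresses in a classic pcap with Ethernet frames."""
    order = PCAP_MAGICS.get(pcap_bytes[:4])
    if order is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    # The link type is the last 4-byte field of the 24-byte global header.
    (link_type,) = struct.unpack_from(order + "I", pcap_bytes, 20)
    if link_type != 1:
        raise ValueError("only Ethernet (link type 1) captures are handled")
    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_frac, captured length, original length.
        _, _, incl_len, _ = struct.unpack_from(order + "IIII", pcap_bytes, offset)
        frame = pcap_bytes[offset + 16 : offset + 16 + incl_len]
        offset += 16 + incl_len
        # EtherType 0x0800 marks IPv4; the source address is frame bytes 26..29.
        if len(frame) >= 34 and frame[12:14] == b"\x08\x00":
            counts[str(ipaddress.IPv4Address(frame[26:30]))] += 1
    return counts
```

For example, `source_addresses(open("/tmp/flows.pcap", "rb").read())` returns a counter of source addresses that you can compare against the interfaces of your nodes. VLAN-tagged frames and IPv6 packets are skipped by this sketch.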
Flow analyses use timestamps that are exposed by the underlying flow management protocol. The timestamps are set depending on the exporting router’s clock settings. If the router’s clock differs from the actual time, the skew is reflected in the received flows; this can subsequently impact further analysis and aggregation.
Horizon can correct skewed flow timestamps. To do so, it compares the exporting device’s current time with the actual time that the packet is received. If the times differ significantly, the time that the flow was received is considered more correct and all timestamps associated with the flow are updated.
To enable clock correction, configure a threshold for the maximum allowed delta, in milliseconds.
Note that setting the threshold to 0 disables the correction mechanism.
$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.features.flows.persistence.elastic
admin@opennms()> config:property-set clockSkewCorrectionThreshold 5000
admin@opennms()> config:update
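The correction logic described above can be illustrated with a short sketch. This is not Horizon's implementation. The `Flow` record and its field names are hypothetical, and the sketch assumes that "updating all timestamps" means shifting each one by the measured skew.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    # Hypothetical record; field names are illustrative only.
    exported_at_ms: int     # exporter's clock when the packet was sent
    first_switched_ms: int  # start of the flow, per the exporter's clock
    last_switched_ms: int   # end of the flow, per the exporter's clock


def correct_clock_skew(flow, received_at_ms, threshold_ms):
    """Shift flow timestamps when the exporter's clock is off by more than
    threshold_ms. A threshold of 0 disables the correction."""
    if threshold_ms == 0:
        return flow
    skew_ms = received_at_ms - flow.exported_at_ms
    if abs(skew_ms) <= threshold_ms:
        return flow  # within the allowed delta, trust the exporter
    # Assumption: every timestamp is moved by the same skew.
    return replace(
        flow,
        exported_at_ms=flow.exported_at_ms + skew_ms,
        first_switched_ms=flow.first_switched_ms + skew_ms,
        last_switched_ms=flow.last_switched_ms + skew_ms,
    )
```

With a threshold of 5000, as in the configuration above, a flow whose export time lags the receive time by one minute would have all three timestamps moved forward by 60000 ms, while a flow that is 200 ms off would be left unchanged.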