Improve Monitoring Efficiency
Monitoring services on an IP network can be resource-intensive, especially when many of the services are unavailable. When a service is offline or unreachable, the monitoring system spends most of its time waiting for retries and timeouts.
To improve efficiency, Horizon deems all services on an interface to be down if the critical service is down. Horizon uses ICMP as the critical service by default.
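The short-circuit described above can be sketched roughly as follows. This is an illustration only, not Horizon's implementation: `poll_interface` and the `poll` callable are hypothetical names, and `poll` is assumed to return `True` when a service responds.

```python
from typing import Callable, Dict, List


def poll_interface(
    services: List[str],
    poll: Callable[[str], bool],
    critical_service: str = "ICMP",
) -> Dict[str, bool]:
    """Return an up/down status per service on one interface.

    Hypothetical helper for illustration; not part of Horizon.
    """
    # Poll the critical service first. If it is down, skip every other poll
    # and report all services on the interface as down.
    if critical_service in services and not poll(critical_service):
        return {name: False for name in services}

    # Otherwise poll the remaining services one by one.
    status: Dict[str, bool] = {}
    for name in services:
        if name == critical_service:
            status[name] = True  # already confirmed up above
        else:
            status[name] = poll(name)
    return status
```

With an unreachable interface, only one poll (and one set of retries and timeouts) is spent instead of one per service.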
The following image shows how to use critical services to generate these events:
Critical services are all up on the node and just a
Critical service of one of many IP interfaces is down and interfaceDown is sent. All other services on that interface are not tested and no events are sent for them. These services are assumed to be unreachable.
All critical services on the node are down and just a nodeDown is sent. All other services on the other IP interfaces are not tested and no events are sent. These services are assumed to be unreachable.
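The choice between the interface-level and node-level events can be sketched as below. The function name, the input mapping, and the tuple return format are assumptions made for this example. Only the event names interfaceDown and nodeDown come from the text. When every critical service is up, no interface or node event is produced and the function returns an empty list.

```python
from typing import Dict, List, Optional, Tuple


def outage_events(critical_up: Dict[str, bool]) -> List[Tuple[str, Optional[str]]]:
    """Map per-interface critical-service status to outage events.

    `critical_up` maps an interface address to whether its critical service
    responded. Hypothetical helper for illustration; not part of Horizon.
    """
    down = [ip for ip, up in critical_up.items() if not up]
    if not down:
        # No interface or node outage to report.
        return []
    if len(down) == len(critical_up):
        # Every critical service on the node is down: one node-level event.
        return [("nodeDown", None)]
    # Only some interfaces are affected: one event per affected interface.
    return [("interfaceDown", ip) for ip in down]
```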
Horizon uses the critical service to correlate outages from services to a
This is a global configuration option of pollerd, defined in
The Horizon default configuration enables this behavior.
<poller-configuration threads="30" pathOutageEnabled="false" serviceUnresponsiveEnabled="false">
  <node-outage status="on" (1)
               pollAllIfNoCriticalServiceDefined="true"> (2)
    <critical-service name="ICMP" /> (3)
  </node-outage>
(1) Allow node outage correlation based on a critical service.
(2) (Optional) For nodes without a critical service, this option controls the polling behavior during a node outage. If set to
(3) Define the critical service for node outage correlation.
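To check which settings a configuration carries, the three annotated values can be read with Python's standard XML parser. This is a sketch under stated assumptions: `read_node_outage` is a hypothetical helper, the fragment has had its callout markers removed and its root element closed so that it parses, and it declares no XML namespace (a namespaced file would need qualified tag names).

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Fragment from the text, with callout markers removed and the root element
# closed so that it is well-formed. A real configuration holds more elements.
FRAGMENT = """
<poller-configuration threads="30" pathOutageEnabled="false" serviceUnresponsiveEnabled="false">
  <node-outage status="on" pollAllIfNoCriticalServiceDefined="true">
    <critical-service name="ICMP" />
  </node-outage>
</poller-configuration>
"""


def read_node_outage(xml_text: str) -> Optional[dict]:
    """Extract the node-outage settings, or None if the element is absent.

    Hypothetical helper for illustration; not part of Horizon.
    """
    root = ET.fromstring(xml_text)
    node_outage = root.find("node-outage")
    if node_outage is None:
        return None
    critical = node_outage.find("critical-service")
    return {
        # (1) node outage correlation switched on or off
        "enabled": node_outage.get("status") == "on",
        # (2) polling behavior for nodes without a critical service
        "poll_all": node_outage.get("pollAllIfNoCriticalServiceDefined") == "true",
        # (3) name of the critical service
        "critical_service": critical.get("name") if critical is not None else None,
    }


settings = read_node_outage(FRAGMENT)
```

For the fragment above, `settings` holds `enabled` and `poll_all` as `True` and `critical_service` as `"ICMP"`.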