Service Assurance

This section describes how Horizon tests the availability of a service or device and measures its latency. In Horizon, a service monitor framework fulfills these tasks. The main component, pollerd, provides the following functionality:

Tracks the status of a management resource or an application for availability calculations.
Measures response times for service quality.
Correlates node and interface outages based on a critical service.

See Step 7: Determine Service Availability in the Quick Start section for additional information on determining service availability.

The following image shows the model and representation of availability and response time:

Graphical representations of service availability and response time for services on a single node

Figure 1. Service availability and response time

This information is based on service monitors which are stored and run by pollerd. A service can have any arbitrary name, and is associated with a service monitor; for example, two services can be defined with the names "HTTP" and "HTTP-8080". Both are associated with the HTTP Service Monitor, but each uses a different TCP port configuration parameter.

The following image shows how pollerd interacts with other Horizon components, and applications or agents to be monitored:

Network architecture diagram showing relationships between pollerd and Horizon components

Figure 2. Pollerd architecture

Availability is calculated over the last 24 hours, and is displayed in Surveillance Views, SLA Categories, and Node Detail pages. Response times are displayed as IP interface resource graphs on Node Detail pages. You can click a service’s name on any Node Detail page to see the service monitor’s configuration parameters. The status of a service can be up or down.

The Service page also includes timestamps that indicate the last time the service was polled and found to to be up (last good) or down (last fail). These fields help to validate that pollerd is polling the services as expected.

When a service monitor detects an outage, pollerd sends an event that is used to create an alarm. You can also use events to generate notifications for on-call network or server administrators. Pollerd can generate the following events in Horizon:

Event Name	Description
uei.opennms.org/nodes/nodeLostService	Critical services are still up; just this service is lost.
uei.opennms.org/nodes/nodeRegainedService	Service came back up.
uei.opennms.org/nodes/interfaceDown	Critical service on an IP interface is down, or all services are down.
uei.opennms.org/nodes/interfaceUp	Critical service on that interface came back up.
uei.opennms.org/nodes/nodeDown	All critical services on all IP interfaces are down from node. The whole host is unreachable over the network.
uei.opennms.org/nodes/nodeUp	Some critical services came back online.

Event Name

Description

uei.opennms.org/nodes/nodeLostService

Critical services are still up; just this service is lost.

uei.opennms.org/nodes/nodeRegainedService

Service came back up.

uei.opennms.org/nodes/interfaceDown

Critical service on an IP interface is down, or all services are down.

uei.opennms.org/nodes/interfaceUp

Critical service on that interface came back up.

uei.opennms.org/nodes/nodeDown

All critical services on all IP interfaces are down from node. The whole host is unreachable over the network.

uei.opennms.org/nodes/nodeUp

Some critical services came back online.

The behavior to generate interfaceDown and nodeDown events is described in the critical services section.

This assumes that node-outage processing is enabled.