Time to Live (TTL) for RPCs

Remote procedure calls (RPCs) between Horizon Core and Minion include a time limit which, if exceeded, aborts the requests and discards any subsequent responses. We call this limit the time-to-live or TTL. We make an effort to use logical values where possible and provide means to configure the TTLs for circumstances where the default values are not suitable (i.e., for interacting with devices or locations with high latency).

Our modules and services have different rules for determining the TTL.

To troubleshoot TTLs, consider enabling the Open Tracing integration in Horizon Core instance and the Minion.

TTLs in pollerd and collectd

The TTL used when invoking monitors and collectors remotely is:

  1. The value from the ttl parameter associated with the service

  2. Fall-back to using the service interval as the TTL (this is the rate at which the service is scheduled)

These rules apply to all monitors and collectors, excluding the SnmpCollector.

TTLs for the SNMP collector

The TTL used when invoking the SnmpCollector remotely is:

  1. The value of the ttl associate with the SNMP agent’s configuration

  2. Fall-back to using the service interval as the TTL (this is the rate at which the service is scheduled)

TTLs for the other SNMP communication

The TTL used when invoking other types of SNMP requests remotely is:

  1. The value of the ttl associate with the SNMP agent’s configuration

  2. Fall-back to using the service interval as the TTL (this is the rate at which the service is scheduled)

These rules apply to SNMP queries like agent scans in provisiond, table scans in enlinkd, etc…​

TTLs for provisiond detectors

The TTL used when invoking detectors remotely is:

  1. The value from the ttl parameter associated with the detector

  2. Fall-back to using the global default TTL

Global TTL

For cases where we cannot derive a suitable TTL, a global value is used which defaults to 20000 (20 seconds). This value can be configured. When using the JMS-based RPC implementation, set value of the org.opennms.jms.timeout system property to the desired number of milliseconds. When using the Kafka-based RPC implementation, set the value of the org.opennms.core.ipc.rpc.kafka.ttl system property to the desired number of milliseconds.

Using metadata for TTLs

For RPCs that derive TTLs from service parameters, the metadata feature can be used to customize these values on a node/interface/service basis.

For example, the ICMP detector could be configured to use the value of the ttl associated with the node metadata, or default to 30 seconds if none is setup as follows:

<detector name="ICMP" class="org.opennms.netmgt.provision.detector.icmp.IcmpDetector">
  <parameter key="ttl" value="${requisition:ttl|300000}"/>
</detector>