Other Errors

This section describes other errors that you may encounter when Horizon does not start as expected.

XML configuration errors

XML errors occur when a file in the ${OPENNMS_HOME}/etc/ hierarchy is not a well-formed XML document, or is well-formed but defines a configuration that is invalid according to the file's schema.

Example XML error log
Mar 20 14:41:53 rhel9h.totusmel.com opennms[130123]: Starting OpenNMS:
Mar 20 14:41:53 rhel9h.totusmel.com opennms[130823]: ERROR: XML validation failed: /opt/opennms/etc/events/VMWare.events.xml
Mar 20 14:41:53 rhel9h.totusmel.com opennms[130823]:            run '/usr/bin/xmllint /opt/opennms/etc/events/VMWare.events.xml' for details
Mar 20 14:41:53 rhel9h.totusmel.com opennms[130123]: Validation failed on 1 XML files.  Exiting.
Mar 20 14:41:53 rhel9h.totusmel.com opennms[130123]: failed

If you run config-tester -a, the output shows that an error is generated immediately after the Drools Northbounder configuration is loaded:

11:05:42.570 [Main] INFO  org.opennms.core.xml.AbstractJaxbConfigDao - Loaded Config for Drools Northbounder in 6 ms
java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.opennms.bootstrap.Bootstrap$4.run(Bootstrap.java:531)
    at java.base/java.lang.Thread.run(Thread.java:829)

You can use the xmllint tool to validate the XML files and more effectively identify the source of the error:

# xmllint --schema /var/opennms/xsds/eventconf.xsd /opt/opennms/etc/events/VMWare.events.xml --noout
/opt/opennms/etc/events/VMWare.events.xml:275: parser error : Opening and ending tag mismatch: logmsg line 273 and event
   </event>
           ^
/opt/opennms/etc/events/VMWare.events.xml:276: parser error : Opening and ending tag mismatch: logmsg line 265 and events
</events>
         ^
/opt/opennms/etc/events/VMWare.events.xml:277: parser error : Premature end of data in tag event line 223
^

In this example, an unmatched tag in /opt/opennms/etc/events/VMWare.events.xml is the source of the issue:

265      <logmsg dest="logndisplay">&lt;p>
266            vpxdTrap trap received
267            vpxdTrapType=%parm[#1]%
268            vpxdHostName=%parm[#2]%
269            vpxdVMName=%parm[#3]%
270            vpxdNewStatus=%parm[#4]%
271            vpxdOldStatus=%parm[#5]%
272            vpxdObjValue=%parm[#6]%&lt;/p>
273      <logmsg>

This error can be fixed by updating line 273 to </logmsg>. If you run xmllint again, the file should validate properly:

# xmllint --schema /var/opennms/xsds/eventconf.xsd /opt/opennms/etc/events/VMWare.events.xml --noout
/opt/opennms/etc/events/VMWare.events.xml validates

Horizon should now start properly.

You can run xmllint on all files in ${OPENNMS_HOME}/etc/events using this command:

for x in "${OPENNMS_HOME}"/etc/events/*; do xmllint --schema "${OPENNMS_HOME}/share/xsds/eventconf.xsd" "$x" --noout; done

xmllint helps pinpoint XML syntax errors and, with the --schema flag, content that violates the schema. More subtle errors can still cause problems even when the output indicates that every file is valid. The config-tester command is more exhaustive, but it may not catch every error that breaks startup.
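
When the events directory holds many files, the per-file output of the loop can be hard to scan. The following is a minimal sketch of a wrapper that prints only the files that fail and a one-line summary. The function name validate_event_files is hypothetical, and it assumes event files end in .xml; the only xmllint options used are --noout and --schema, as in the commands above.

```shell
# Validate each event file against a schema; list failures and a summary.
# validate_event_files is a hypothetical helper, not part of Horizon.
validate_event_files() {
    local events_dir="$1"   # directory that holds the event definition files
    local schema="$2"       # path to eventconf.xsd
    local checked=0
    local failed=0
    local file

    for file in "$events_dir"/*.xml; do
        [ -e "$file" ] || continue   # the glob matched nothing
        checked=$((checked + 1))
        # xmllint exits non-zero for syntax errors and for schema violations.
        if ! xmllint --noout --schema "$schema" "$file" >/dev/null 2>&1; then
            echo "FAILED: $file"
            failed=$((failed + 1))
        fi
    done

    echo "checked=$checked failed=$failed"
    [ "$failed" -eq 0 ]   # return success only if every file validated
}
```

A call such as validate_event_files "${OPENNMS_HOME}/etc/events" "${OPENNMS_HOME}/share/xsds/eventconf.xsd" returns a non-zero status if any file fails. Rerun xmllint directly on each file it lists to see the detailed parser errors.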

Incoming messages not received

If your Horizon or Minion server is not processing traps, syslog, flow, or other incoming messages that you can see via tcpdump, check your firewall settings. The tcpdump utility captures traffic before system firewalls such as iptables or firewalld block unexpected data, so packets can appear in a capture and still never reach the listening service.
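
On a host that uses firewalld, one way to check a specific port is to compare it against the output of firewall-cmd --list-ports, which prints the ports opened in the default zone as a space-separated list. The sketch below assumes that output format. The function name port_is_open is hypothetical, and 162/udp (the standard SNMP trap port) is only an example; confirm the port your listener actually uses.

```shell
# Return success if PORT/PROTO appears in a space-separated port list,
# such as the output of `firewall-cmd --list-ports`.
# Port ranges (for example 1000-2000/udp) are not expanded in this sketch.
port_is_open() {
    local wanted="$1"      # for example 162/udp
    local port_list="$2"   # for example "162/udp 514/udp 8980/tcp"
    local entry

    for entry in $port_list; do
        [ "$entry" = "$wanted" ] && return 0
    done
    return 1
}

# Example use on a firewalld host (commented out so the block has no side effects):
# if ! port_is_open 162/udp "$(firewall-cmd --list-ports)"; then
#     echo "162/udp is not listed as an open port"
# fi
```

A negative result is a hint rather than proof: firewalld can also admit traffic through named services or rich rules, which this check does not inspect.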

Database connection deadlocks

If the system reaches a state where all available database connections are exhausted and it fails to recover, you can turn on deadlock detection to help identify the cause.

To do so, add the following property:

# echo 'org.opennms.core.db.deadlock.detection=true' >> /opt/opennms/etc/opennms.properties.d/db.properties
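
Because >> appends unconditionally, running the command above more than once leaves duplicate lines in the file. A minimal sketch of an append that is safe to repeat follows; the function name ensure_property is hypothetical.

```shell
# Append a line to a file only if that exact line is not already present.
# ensure_property is a hypothetical helper, not part of Horizon.
ensure_property() {
    local line="$1"
    local file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example: ensure_property 'org.opennms.core.db.deadlock.detection=true' /opt/opennms/etc/opennms.properties.d/db.properties. The helper only looks for an identical line, so it does not detect the same key set to a different value.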

Restart Horizon for the changes to become effective.

Once enabled, you should see the following in the logs:

2023-07-03 21:38:49,874 ERROR [Main] o.o.c.d.HikariCPConnectionFactory: Deadlock detection is enabled.

Possible deadlocks will be logged with a message similar to the following:

2023-07-03 21:39:52,357 ERROR [Main] o.o.c.d.HikariCPConnectionFactory: Possible database deadlock detected: Attempting to acquire connection in thread while existing transaction active.
java.lang.Exception: Possible deadlock
        at org.opennms.core.db.HikariCPConnectionFactory.getConnection(HikariCPConnectionFactory.java:109) ~[org.opennms.core.db-31.0.3-SNAPSHOT.jar:?]
        at org.opennms.netmgt.filter.JdbcFilterDao.getNodeIPAddressServiceMap(JdbcFilterDao.java:235) ~[opennms-config-31.0.3-SNAPSHOT.jar:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:302) ~[org.apache.servicemix.bundles.spring-aop-4.2.9.RELEASE_1.ONMS.1.jar:?]
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202) ~[org.apache.servicemix.bundles.spring-aop-4.2.9.RELEASE_1.ONMS.1.jar:?]
        at com.sun.proxy.$Proxy136.getNodeIPAddressServiceMap(Unknown Source) ~[?:?]
        at org.opennms.netmgt.dao.support.DefaultFilterWatcher$FilterSession.lambda$refreshNow$0(DefaultFilterWatcher.java:228) ~[opennms-dao-31.0.3-SNAPSHOT.jar:?]
        at org.opennms.netmgt.dao.hibernate.DefaultSessionUtils.withManualFlush(DefaultSessionUtils.java:81) ~[opennms-dao-31.0.3-SNAPSHOT.jar:?]

Inspect these messages and their stack traces to find code paths that try to acquire a new connection while a transaction is already active in the same thread. Some of the messages may be false positives.
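
To find which log files contain these warnings, you can search for the message text shown above. The sketch below counts matching lines per file. The function name deadlock_report is hypothetical, and the log directory you pass to it (for example ${OPENNMS_HOME}/logs) is an assumption to adjust for your installation.

```shell
# List each file under a log directory that contains deadlock warnings,
# in the form path:count. Files with no matches are omitted.
deadlock_report() {
    local log_dir="$1"
    # -r: recurse, -c: count matching lines per file, -F: fixed-string match
    grep -rcF 'Possible database deadlock detected' "$log_dir" | grep -v ':0$'
}
```

Running deadlock_report "${OPENNMS_HOME}/logs" prints nothing and returns a non-zero status when no warnings are found.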