Automatic Database Cleanup
Horizon has several independent cleanup mechanisms for events, alarms, and other data stored in the database. Each one targets a different set of records and has its own configuration file.
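| Mechanism | Controls | Configured in |
|---|---|---|
| Alarmd Drools rules | Alarms (and situations) | ${OPENNMS_HOME}/etc/alarmd/drools-rules.d/alarmd.drl |
| Vacuumd | Events, soft-deleted nodes/interfaces/services, statistics reports, key-value store entries, path-outage alarm clearing | ${OPENNMS_HOME}/etc/vacuumd-configuration.xml |
| AlarmPersister | Reduced events under an existing alarm (opt-in, per event type) | Event definitions under ${OPENNMS_HOME}/etc/events/ (auto-clean attribute on <alarm-data>) |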
Alarms (Alarmd Drools rules)
Alarms are pruned by Drools rules running inside Alarmd, not by Vacuumd. The default rules use a tiered cascade: most cleared, low-severity alarms disappear within minutes, while persistent or unattended alarms survive for at most eight days after their last event.
Rule file: ${OPENNMS_HOME}/etc/alarmd/drools-rules.d/alarmd.drl.
| Rule | Inactivity window | Match conditions |
|---|---|---|
| cleanUp | 5 minutes | Severity is at most NORMAL (CLEARED, NORMAL, INDETERMINATE), unacknowledged, no open trouble ticket. |
| fullCleanUp | 1 day | Severity is at most NORMAL, no open trouble ticket. Acknowledged or not. |
| GC | 3 days | Unacknowledged, any severity. |
| fullGC | 8 days | Any alarm at all. Final backstop. |
The "inactivity window" is the time since the alarm last received an event.
A flapping alarm whose lastUpdateTime keeps moving forward will never trip the timer; once events stop arriving, the appropriate rule fires after the window elapses.
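The tiers can be read as "the shortest window whose conditions the alarm satisfies". The sketch below models only the conditions listed in the table; it is not the Drools logic itself, the numeric severity ordering is an assumption, and anything else the real rules check (for example situation membership) is left out.

```python
from datetime import timedelta
from enum import IntEnum


class Severity(IntEnum):
    """Assumed severity ordering; lower values are less severe."""
    INDETERMINATE = 1
    CLEARED = 2
    NORMAL = 3
    WARNING = 4
    MINOR = 5
    MAJOR = 6
    CRITICAL = 7


def deletion_window(severity: Severity, acknowledged: bool, open_ticket: bool) -> timedelta:
    """Return the shortest inactivity window whose table conditions match."""
    low_severity = severity <= Severity.NORMAL
    if low_severity and not acknowledged and not open_ticket:
        return timedelta(minutes=5)
    if low_severity and not open_ticket:
        return timedelta(days=1)
    if not acknowledged:
        return timedelta(days=3)
    return timedelta(days=8)  # final backstop: any alarm at all


def is_due(idle_for: timedelta, window: timedelta) -> bool:
    """An alarm is due once no event has touched it for the whole window."""
    return idle_for >= window


# A cleared, unacknowledged alarm with no ticket goes after five idle minutes.
print(deletion_window(Severity.CLEARED, acknowledged=False, open_ticket=False))
# An acknowledged MAJOR alarm is only caught by the eight-day backstop.
print(deletion_window(Severity.MAJOR, acknowledged=True, open_ticket=False))
```

Because the window is measured from the last event, every new event on a flapping alarm restarts the count in `is_due`.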
The same file also contains:
- cosmicClear: when a resolution event arrives whose clear-key matches a problem alarm’s reduction key, the problem alarm is set to CLEARED severity. This is what drives the bulk of the cleanUp deletions five minutes later.
- unclear: if a CLEARED alarm receives a new event with severity above CLEARED, it is restored to that severity (so the cleanup timer resets).
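As a rough illustration of how these two rules interact, the sketch below tracks alarm severity in a plain dictionary keyed by reduction key. The `Event` class, the numeric severities, and the example key are hypothetical stand-ins, not Horizon types.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CLEARED = 2  # assumed numeric severity for CLEARED
NORMAL = 3   # assumed numeric severity for NORMAL
MAJOR = 6    # assumed numeric severity for MAJOR


@dataclass
class Event:
    severity: int
    reduction_key: Optional[str] = None
    clear_key: Optional[str] = None


def apply_event(alarms: Dict[str, int], event: Event) -> None:
    """Update alarm severities (reduction key -> severity) for one event."""
    # cosmicClear: a clear-key that matches a problem alarm's reduction key
    # sets that alarm to CLEARED.
    if event.clear_key in alarms:
        alarms[event.clear_key] = CLEARED
    # unclear: a CLEARED alarm that receives a more severe event is restored
    # to that event's severity.
    key = event.reduction_key
    if key in alarms and alarms[key] == CLEARED and event.severity > CLEARED:
        alarms[key] = event.severity


alarms = {"nodeDown:42": MAJOR}  # hypothetical reduction key
apply_event(alarms, Event(severity=NORMAL, clear_key="nodeDown:42"))
severity_after_clear = alarms["nodeDown:42"]
apply_event(alarms, Event(severity=MAJOR, reduction_key="nodeDown:42"))
severity_after_unclear = alarms["nodeDown:42"]
print(severity_after_clear, severity_after_unclear)
```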
Adjusting alarm cleanup
Edit the windows directly in alarmd.drl. For example, to keep cleared alarms around for an hour rather than five minutes, change the cleanUp rule’s over window:time( 5m ) to over window:time( 1h ).
To disable a rule entirely, add enabled false to its declaration:
rule "fullGC"
  enabled false
  salience 0
  when
    ...
end
Reload after editing:
${OPENNMS_HOME}/bin/send-event.pl uei.opennms.org/internal/reloadDaemonConfig --parm 'daemonName Alarmd'
Lengthening the windows can let alarms accumulate faster than they are deleted. Watch the alarms table row count after changes: unchecked growth shows up as Alarmd memory pressure long before it causes UI symptoms or crashes Horizon.
Disabled-by-default rules
alarmd.drl includes several ticketing-related rules with enabled false: createTickets, createCriticalTicket, updateTickets, closeClearedAlarmTickets, clearAlarmsForClosedTickets. These can be enabled in combination with a trouble-ticketing integration to have alarms auto-acknowledged when tickets are opened. Severity escalation is also present as a commented-out escalation rule near the top of the file.
Events, soft-deleted entities, and KV-store entries (Vacuumd)
Vacuumd is a Horizon daemon that runs SQL statements on a schedule. The default configuration uses two distinct mechanisms inside the same file:
- <statement>: These blocks run on a single shared period, default 24 hours (controlled by the period attribute on <VacuumdConfiguration>, in milliseconds).
- <automations>: Each automation runs on its own interval, optionally driven by a <trigger> query.
Configuration file: ${OPENNMS_HOME}/etc/vacuumd-configuration.xml.
What the periodic statements clean
All run every 24 hours by default.
| Target | What gets deleted |
|---|---|
| events | Rows older than 6 weeks that are not referenced by any outage or notification. Events tied to outages or notifications are kept indefinitely. |
| | Rows where |
| | Rows where |
| | Rows where |
| | Rows where |
| | Rows whose |
What the default automations do
| Automation | Interval | What it does |
|---|---|---|
| | 1 hour | |
| clearPathOutages | 30 seconds | When a node sends |
The configuration file also contains commented-out examples for monitoring asset maintenance-contract expirations.
Adjusting Vacuumd
Edit vacuumd-configuration.xml directly:
- To change how aggressively events are pruned, edit the SQL in the events <statement> block (many users tighten or loosen the '6 weeks' interval based on their needs).
- To change the global statement period, set <VacuumdConfiguration period="…"> to a different millisecond value.
- To change an automation’s frequency, set its interval attribute.
- To disable an automation without deleting it, set active="false".
Reload after editing:
${OPENNMS_HOME}/bin/send-event.pl uei.opennms.org/internal/reloadDaemonConfig --parm 'daemonName Vacuumd'
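The period attribute is given in milliseconds, and the interval values shown on this page (for example 3600000 for one hour) follow the same unit. A small helper along these lines can avoid miscounted zeros; `to_millis` is a hypothetical name, not part of Horizon.

```python
from datetime import timedelta


def to_millis(duration: timedelta) -> int:
    """Convert a duration to whole milliseconds for period/interval attributes."""
    return duration // timedelta(milliseconds=1)


print(to_millis(timedelta(hours=24)))    # 86400000, the default statement period
print(to_millis(timedelta(hours=1)))     # 3600000
print(to_millis(timedelta(seconds=30)))  # 30000
```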
Reduced-event cleanup (auto-clean in eventconf)
This is a separate, opt-in mechanism that prunes the events table only, and only for a specific class of events: events that keep reducing into the same existing alarm.
When an event arrives whose <alarm-data> definition has auto-clean="true", and that event reduces into an existing alarm rather than creating a new one, AlarmPersister deletes every prior event for that alarm:
DELETE FROM events WHERE alarmid = ? AND eventid != ?
The alarm itself is untouched. Only the previously-reduced event rows go.
This runs on each reduction, not on alarm creation. The first event that creates the alarm is kept until the next reducing event arrives, at which point the first event is deleted along with the rest.
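To see the effect of that DELETE in isolation, the sketch below replays it against an in-memory SQLite table. The two-column events table is a toy stand-in for the real schema, and `persist_reducing_event` is a hypothetical helper that only imitates the behavior described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema: only the two columns the DELETE statement touches.
conn.execute("CREATE TABLE events (eventid INTEGER PRIMARY KEY, alarmid INTEGER)")


def persist_reducing_event(conn, eventid, alarmid, auto_clean):
    """Store an event that reduces into an existing alarm, then optionally prune."""
    conn.execute("INSERT INTO events (eventid, alarmid) VALUES (?, ?)", (eventid, alarmid))
    if auto_clean:
        conn.execute(
            "DELETE FROM events WHERE alarmid = ? AND eventid != ?",
            (alarmid, eventid),
        )


# Event 1 creates alarm 100, so nothing is pruned yet.
conn.execute("INSERT INTO events (eventid, alarmid) VALUES (1, 100)")
# An unrelated alarm's event must survive the pruning below.
conn.execute("INSERT INTO events (eventid, alarmid) VALUES (10, 200)")

for eventid in (2, 3, 4):
    persist_reducing_event(conn, eventid, 100, auto_clean=True)

remaining = [row[0] for row in conn.execute("SELECT eventid FROM events ORDER BY eventid")]
print(remaining)  # [4, 10]
```

Only the latest event for alarm 100 is left, and the event belonging to alarm 200 is untouched.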
Many of the event definitions included with Horizon by default already set auto-clean="true", including authentication events, the catch-all SNMP trap UEI, and many vendor-MIB-driven trap events.
When to enable it
Auto-clean is useful for high-frequency reducing UEIs where the alarm count stays at one but the events table grows linearly with the reduction count. Common candidates:
- Interface flapping events that reduce into a single up/down alarm.
- Polling-failure events that reduce into a single service-outage alarm.
- Trap-driven events from chatty devices that all reduce to the same reduction key.
Without auto-clean, every reduction stores another events row, even though the user-visible alarm is just one entry. With auto-clean, the alarm carries only its most recent contributing event.
Where to enable it
Edit the appropriate event definition under ${OPENNMS_HOME}/etc/events/ (or via the web UI event-config editor) and set the attribute on <alarm-data>:
<event>
  <uei>uei.opennms.org/example/myFloodingEvent</uei>
  <event-label>Example: Flooding Event</event-label>
  ...
  <alarm-data reduction-key="%uei%:%nodeid%"
              alarm-type="1"
              auto-clean="true"/>
</event>
Then reload event configuration:
${OPENNMS_HOME}/bin/send-event.pl uei.opennms.org/internal/reloadDaemonConfig --parm 'daemonName Eventd'
Enabling auto-clean destroys historical data. Each time an event reduces into the alarm, the event log for that alarm collapses to a single row. If you rely on a complete event-by-event audit trail, leave auto-clean disabled.
What does not get cleaned by default
These records accumulate forever unless an operator removes them or you add cleanup logic yourself.
| Records | Notes |
|---|---|
| Outages | The |
| Notifications | The |
| Acknowledgments | Each |
| Events tied to outages or notifications | The 6-week event purge explicitly excludes these. An event referenced by a five-year-old resolved outage will still be on disk. |
Outage and notification history are kept by default. Environments with high event throughput may want to add their own cleanup.
Extending the defaults
Add new retention rules in Vacuumd. Vacuumd statements can target any table; Drools rules only operate on alarms.
Two patterns to follow:
- For a fixed cadence with no preconditions, add a <statement>. It inherits the global period.
- For higher frequency than the global period, sub-second runs, or chaining a query result into a parameterized UPDATE/DELETE, add an <automation> with its own interval and an optional <trigger> to feed it.
Example: purge resolved outages older than 90 days
Add inside <VacuumdConfiguration>. Because the file is XML, the less-than operator in the SQL must be written as &lt;:
<statement>
  DELETE FROM outages
  WHERE ifregainedservice IS NOT NULL
  AND ifregainedservice &lt; now() - interval '90 days';
</statement>
Reports that join the outages table will lose visibility into anything older than the cutoff. Pick the interval to match how far back your reporting actually looks.
Example: purge old notifications
<statement>
  DELETE FROM notifications
  WHERE pagetime IS NOT NULL
  AND pagetime &lt; now() - interval '90 days';
</statement>
Notifications reference events via the eventid column. Once they are gone, the periodic events purge will start removing the events that those notifications were keeping alive.
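The sketch below replays that interaction on an in-memory SQLite database. The table layouts are reduced to the columns the purge statement touches, the notifyid column and the sample rows are made up for the demonstration, and a bound cutoff date stands in for PostgreSQL's `now() - interval` arithmetic.

```python
import sqlite3

# Same shape as the default events purge, with the cutoff passed as a parameter.
PURGE_SQL = """
DELETE FROM events WHERE NOT EXISTS
  (SELECT svclosteventid FROM outages WHERE svclosteventid = events.eventid
   UNION
   SELECT svcregainedeventid FROM outages WHERE svcregainedeventid = events.eventid
   UNION
   SELECT eventid FROM notifications WHERE eventid = events.eventid)
AND eventtime < ?
"""


def event_ids(conn):
    """Return the ids of all events still present, in ascending order."""
    return [row[0] for row in conn.execute("SELECT eventid FROM events ORDER BY eventid")]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (eventid INTEGER PRIMARY KEY, eventtime TEXT);
CREATE TABLE outages (svclosteventid INTEGER, svcregainedeventid INTEGER);
CREATE TABLE notifications (notifyid INTEGER PRIMARY KEY, eventid INTEGER);
INSERT INTO events VALUES (1, '2023-01-01'), (2, '2023-01-01'), (3, '2024-06-01');
INSERT INTO notifications VALUES (10, 1);
""")
cutoff = "2024-01-01"

# First purge: event 2 is old and unreferenced, so it goes.
# Event 1 is old but held by the notification; event 3 is newer than the cutoff.
conn.execute(PURGE_SQL, (cutoff,))
after_first_purge = event_ids(conn)

# Once the notification is removed, the next purge takes event 1 as well.
conn.execute("DELETE FROM notifications")
conn.execute(PURGE_SQL, (cutoff,))
after_second_purge = event_ids(conn)

print(after_first_purge, after_second_purge)
```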
Example: tighten the events retention
The default statement uses 6 weeks. To compress it to 14 days, replace the interval literal:
<statement>
  DELETE FROM events WHERE NOT EXISTS
    (SELECT svclosteventid FROM outages WHERE svclosteventid = events.eventid
     UNION
     SELECT svcregainedeventid FROM outages WHERE svcregainedeventid = events.eventid
     UNION
     SELECT eventid FROM notifications WHERE eventid = events.eventid)
  AND eventtime &lt; now() - interval '14 days';
</statement>
Run the SELECT count(*) form of the statement against your database first to gauge how many rows the change will affect. Pruning years of accumulated events in one pass can hold a database lock for a long time and consume a large amount of transaction-log space.
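One way to produce the count form is to generate it from the same WHERE clause, so the dry run and the real statement cannot drift apart. The helper below is a hypothetical sketch that only builds the PostgreSQL query text; running it (for example through psql) is left to you.

```python
import re


def events_purge_count_sql(retention: str = "14 days") -> str:
    """Build the SELECT count(*) form of the events purge for a retention interval."""
    # Accept only simple literals such as '14 days' or '6 weeks'.
    if not re.fullmatch(r"[1-9]\d* (day|days|week|weeks|month|months)", retention):
        raise ValueError(f"unsupported retention interval: {retention!r}")
    return (
        "SELECT count(*) FROM events WHERE NOT EXISTS\n"
        "  (SELECT svclosteventid FROM outages WHERE svclosteventid = events.eventid\n"
        "   UNION\n"
        "   SELECT svcregainedeventid FROM outages WHERE svcregainedeventid = events.eventid\n"
        "   UNION\n"
        "   SELECT eventid FROM notifications WHERE eventid = events.eventid)\n"
        f"AND eventtime < now() - interval '{retention}';"
    )


print(events_purge_count_sql("14 days"))
```

The generated text is plain SQL for an interactive session, so the comparison operator is not XML-escaped here.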
Example: a custom automation with a trigger
The default clearPathOutages automation uses a trigger pattern (that is, a trigger query feeds rows into a parameterized action). The same basic pattern works for any "for each row matching X, do Y" cleanup task.
<automations>
  <automation name="archiveStaleAcks" interval="3600000" active="true"
              trigger-name="findStaleAcks"
              action-name="deleteStaleAcks"/>
</automations>
<triggers>
  <trigger name="findStaleAcks" operator=">=" row-count="1">
    <statement>
      SELECT id AS _ackid FROM acks
      WHERE acktime &lt; now() - interval '180 days'
    </statement>
  </trigger>
</triggers>
<actions>
  <action name="deleteStaleAcks">
    <statement>
      DELETE FROM acks WHERE id = ${_ackid}
    </statement>
  </action>
</actions>
Tokens like ${_ackid} come from the trigger’s columns, whose aliases are prefixed with an underscore. The action runs once per row returned by the trigger.
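The trigger-to-action flow can be pictured with a small simulation. This is not Vacuumd's implementation: `run_automation` is a hypothetical function, SQLite stands in for the Horizon database, and fixed dates replace the `now() - interval` arithmetic.

```python
import sqlite3
from string import Template


def run_automation(conn, trigger_sql, action_sql, min_rows=1):
    """Run the trigger query, then the action once per returned row."""
    cursor = conn.execute(trigger_sql)
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    if len(rows) < min_rows:  # mirrors operator=">=" row-count="1"
        return 0
    action = Template(action_sql)
    for row in rows:
        # Each trigger column (for example _ackid) fills the ${...} token of the same name.
        conn.execute(action.substitute(dict(zip(columns, row))))
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acks (id INTEGER PRIMARY KEY, acktime TEXT)")
conn.executemany(
    "INSERT INTO acks VALUES (?, ?)",
    [(1, "2020-01-01"), (2, "2020-02-01"), (3, "2999-01-01")],
)

executed = run_automation(
    conn,
    "SELECT id AS _ackid FROM acks WHERE acktime < '2021-01-01'",
    "DELETE FROM acks WHERE id = ${_ackid}",
)
remaining = [row[0] for row in conn.execute("SELECT id FROM acks ORDER BY id")]
print(executed, remaining)  # 2 [3]
```

Text substitution is used here only to mimic the token syntax; in your own scripts, bound parameters are the safer choice.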
Verifying that cleanup is happening
| Signal | Where to look |
|---|---|
| Vacuumd executed its statements | |
| Alarmd Drools rules fired | |
| Row counts directly | |
If a Vacuumd reload silently does nothing (no Ran update lines, but no errors either), check that Vacuumd is started: opennms:status from the Karaf shell should list it as Running. The same applies to Alarmd.
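For the row-count signal, a small script run before and after a configuration change makes trends easy to compare. The sketch below works with any Python DB-API connection; it is exercised against SQLite here, and pointing it at the Horizon PostgreSQL database with a driver of your choice is an assumption, not something this page prescribes.

```python
import sqlite3

# Table names mentioned on this page; adjust to the tables you care about.
KNOWN_TABLES = ("alarms", "events", "outages", "notifications", "acks")


def row_counts(conn, tables=KNOWN_TABLES):
    """Return {table: row count} using a DB-API connection."""
    counts = {}
    cursor = conn.cursor()
    for table in tables:
        # Table names cannot be bound as parameters, so allow only known names.
        if table not in KNOWN_TABLES:
            raise ValueError(f"unexpected table name: {table!r}")
        cursor.execute(f"SELECT count(*) FROM {table}")
        counts[table] = cursor.fetchone()[0]
    return counts


# Stand-in database so the sketch runs without a Horizon instance.
demo = sqlite3.connect(":memory:")
for table in KNOWN_TABLES:
    demo.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
demo.executemany("INSERT INTO events (id) VALUES (?)", [(1,), (2,)])

counts = row_counts(demo)
print(counts)
```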