# DBSCAN Engine

The DSCAN engine is a Cluster Engine that builds clusters using the DBSCAN algorithm.

## DBSCAN

Given the graph, we extract the alarms from all the vertices and use these as points as input to the DBSCAN algorithm.

DBSCAN requires a constant $\epsilon$ and a distance function, which we define as follows:

$d(a_{1}, a_{2}) = \alpha ( ( \beta | t( a_{1} ) - t( a_{2} ) | \frac{1}{60} + ( 1 - \beta ) dg( a_{1}, a_{2} ) )$

where:

• $a_{1}$ and $a_{2}$ are the points representing the alarms

• $\alpha \in (0, \infty)$ is a scaling constant (directly related to $\epsilon$)

• $\beta \in [0,1$] is a weighting constant

• When $\beta$ is closer to 0, more weight is given to the temporal component

• When $\beta$ is closer to 1, more weight is given to the spatital component

• $t(a_{k})$ returns the time (timestamp in seconds) of the last occurence of the given alarm

• $dg(a_{i}, a_{j})$ returns the normalized distance on the shortest path between the vertices for $a_{i}$ and $a_{k}$

• If both alarms are on the same vertex, then the distance is 0

• If there is no path between both alarms, then the distance is $\infty$

In simpler terms, we can think of the distance function as taking a weighted combination of both the distance in time and in space.

### Defaults

We set the constants with the following defaults:

• $\epsilon = 100$

• $minpts = 1$

• $\alpha = 144.47$

• $\beta = 0.55$

These were derived empirically during our testing.

### Examples

Let’s assume that we have the following graph: #### a1 and a2

Let’s start determining the distance between $a_{1}$ and $a_{2}$. We can calculate the time component with:

\begin{align} | t( a_{1} ) - t( a_{2} ) | & = | 1 - 30 | \\\\ & = 29 \end{align}

And given that $a_{1}$ and $a_{2}$ are on the same vertex, the spatial component is simply zero:

$dg( a_{1}, a_{2} ) = 0$

Placing these results in the original equation gives us:

\begin{align} d( a_{1}, a_{2} ) & = \alpha ( \beta \times \frac{29}{60} ) \\\\ & = 38.40 \end{align}

and $d(a_{1}, a_{2}) < \epsilon$, so the alarms will be clustered together.

#### a3 and a4

Now let’s determine the distance between $a_{3}$ and $a_{4}$. We can calculate the time component with:

\begin{align} | t( a_{3} ) - t( a_{4} ) | & = | 500 - 510 | \\\\ & = 10 \end{align}

To calculate the spatial distance between $a_{3}$ and $a_{4}$, we sum up the weights on the edges between the shortest path and divide this result by the default weight (=100), so:

\begin{align} dg( a_{3}, a_{4} ) & = \frac{50 + 50}{100} \\\\ & = 1 \end{align}

Placing these results in the original equation gives us:

\begin{align} d( a_{3}, a_{4} ) & = \alpha ( \beta \times \frac{10}{60} + (1 - \beta) \times 1 ) \\\\ & = \alpha ( 0.092 + 0.45 ) \\\\ & = 78.30 \end{align}

and $d(a_{3}, a_{4}) < \epsilon$, so the alarms will be clustered together.

#### a2 and a3

Now let’s determine the distance between $a_{2}$ and $a_{3}$. We can calculate the time component with:

\begin{align} | t( a_{2} ) - t( a_{3} ) | & = | 30 - 500 | \\\\ & = 470 \end{align}

The value of the spatial component is:

\begin{align} dg( a_{2}, a_{3} ) & = \frac{100}{100} \\\\ & = 1 \end{align}

Placing these results in the original equation gives us:

\begin{align} d( a_{2}, a_{3} ) & = \alpha ( \beta \times \frac{470}{60} + (1 - \beta) \times 1 ) \\\\ & = \alpha ( 7.83 + 0.45 ) \\\\ & = 1196.2116 \end{align}

and $d(a_{2}, a_{3}) > \epsilon$, so the alarms will not be clustered together.

#### Results

Given the results of the calculations above, we the DBSCAN algorithm will output the following clusters:

$\text{clusters} = \{ \{ a_{1}, a_{2} \}, \{ a_{3}, a_{4} \} \}$

## Performance

The DBSCAN algorithm performs well when there are less than 500 candidate alarms. It has a worst-case complexity of $O(n^2)$.

Note that alarms are only considered to be candidates for correlation when they have been created and/or updated in the last 2 hours (configurable). This means that the engine can still be used on systems with more than 500 active alarms, since many of these will age out over time.