Cassandra Monitoring

This section describes some of the metrics that Horizon collects from a Cassandra cluster. JMX must be enabled on the Cassandra nodes and made accessible from Horizon in order to collect these metrics (see Enabling JMX Authentication and Authorization in the Cassandra documentation).

The data collection process is bound to the agent IP interface whose service name is JMX-Cassandra. The JMXCollector retrieves MBean entities from the Cassandra node.
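As an illustration of what the JMXCollector does, the following minimal Java sketch connects to a node's JMX port and lists the Cassandra metric MBeans. The host name "cassandra-host", the default port 7199, and the absence of credentials are assumptions; adjust them for your cluster:

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CassandraJmxBrowse {
    public static void main(String[] args) throws Exception {
        // "cassandra-host" and port 7199 are placeholders for your node's JMX endpoint.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
        // If JMX authentication is enabled, pass credentials instead of null:
        // Map<String, Object> env = new HashMap<>();
        // env.put(JMXConnector.CREDENTIALS, new String[] {"user", "password"});
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // All Cassandra metrics are registered under this JMX domain.
            Set<ObjectName> names = mbsc.queryNames(
                    new ObjectName("org.apache.cassandra.metrics:*"), null);
            names.forEach(System.out::println);
        }
    }
}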

Client connections

Horizon collects the number of active client connections from org.apache.cassandra.metrics.Client (a sample JMX read follows the list):

  • connectedNativeClients: Number of clients connected over the native (CQL) protocol.

  • connectedThriftClients: Number of clients connected over the Thrift protocol.
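A gauge such as connectedNativeClients can be read directly over JMX. This is a hedged sketch: the object name and the "Value" attribute follow the usual Cassandra metrics conventions, "cassandra-host" is a placeholder, and connectedThriftClients only exists on releases that still ship Thrift:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClientConnectionMetrics {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            for (String name : new String[] {"connectedNativeClients", "connectedThriftClients"}) {
                // Client metrics are gauges; their current reading is the "Value" attribute.
                ObjectName on = new ObjectName(
                        "org.apache.cassandra.metrics:type=Client,name=" + name);
                System.out.println(name + " = " + mbsc.getAttribute(on, "Value"));
            }
        }
    }
}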

Compacted bytes

Horizon collects the following compaction manager metric from org.apache.cassandra.metrics.Compaction:

  • BytesCompacted: Number of bytes compacted since the node started.

Compaction tasks

Horizon collects the following compaction manager metrics from org.apache.cassandra.metrics.Compaction (a read sketch follows the list):

  • CompletedTasks: Estimated number of completed compaction tasks.

  • PendingTasks: Estimated number of pending compaction tasks.
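The compaction metrics mix metric types: BytesCompacted is a counter, while CompletedTasks and PendingTasks are gauges. A hedged sketch of reading them, assuming the usual "Count" and "Value" attribute conventions (verify against your Cassandra release):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CompactionMetrics {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            String prefix = "org.apache.cassandra.metrics:type=Compaction,name=";
            // BytesCompacted is a counter: its running total is the "Count" attribute.
            System.out.println("BytesCompacted = "
                    + mbsc.getAttribute(new ObjectName(prefix + "BytesCompacted"), "Count"));
            // CompletedTasks and PendingTasks are gauges: read the "Value" attribute.
            System.out.println("CompletedTasks = "
                    + mbsc.getAttribute(new ObjectName(prefix + "CompletedTasks"), "Value"));
            System.out.println("PendingTasks = "
                    + mbsc.getAttribute(new ObjectName(prefix + "PendingTasks"), "Value"));
        }
    }
}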

Storage load

Horizon collects the following storage load metric from org.apache.cassandra.metrics.Storage:

  • Load: Total disk space (in bytes) that this node uses.

Storage exceptions

Horizon collects the following storage exception metric from org.apache.cassandra.metrics.Storage:

  • Exceptions: Number of unhandled exceptions since the start of this Cassandra instance.

Dropped messages

Messages that are processed after their timeout (set per message type) are discarded. The number of dropped messages across the different message queues is a good indication of whether a cluster can handle its load.

Horizon gathers these metrics from org.apache.cassandra.metrics.DroppedMessage (an enumeration sketch follows the list):

  • Mutation (MutationStage): If a write message is processed after its timeout (write_request_timeout_in_ms), either a failure was already returned to the client, or the write met its requested consistency level and the cluster relies on hinted handoff and read repair to apply the mutation.

  • Counter_Mutation (MutationStage): Same as Mutation, but for counter writes; also governed by write_request_timeout_in_ms.

  • Read_Repair (MutationStage): Times out after write_request_timeout_in_ms.

  • Read (ReadStage): Times out after read_request_timeout_in_ms. After this point, an error is returned to the client and no further message processing is necessary.

  • Range_Slice (ReadStage): Times out after range_request_timeout_in_ms.

  • Request_Response (RequestResponseStage): Times out after request_timeout_in_ms. Indicates that the response was completed and sent back, but not before the timeout expired.
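Each message type is exposed as its own MBean, so all dropped-message counts can be enumerated with a single JMX query. A hedged sketch, assuming the usual type=DroppedMessage,scope=<type>,name=Dropped naming and a "Count" attribute on the meter:

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DroppedMessageMetrics {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // One MBean per message type (MUTATION, READ, ...); the scope key names the type.
            Set<ObjectName> dropped = mbsc.queryNames(new ObjectName(
                    "org.apache.cassandra.metrics:type=DroppedMessage,scope=*,name=Dropped"), null);
            for (ObjectName on : dropped) {
                System.out.printf("%s dropped: %s%n",
                        on.getKeyProperty("scope"), mbsc.getAttribute(on, "Count"));
            }
        }
    }
}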

Thread pools

Apache Cassandra is based on a staged event-driven architecture (SEDA). This separates different operations into stages. Each stage is loosely coupled using a messaging service, and each uses queues and thread pools to group and run its tasks.

The documentation for Cassandra thread pool monitoring originated from the Pythian Guide to Cassandra Thread Pools.

Collected thread pool metrics:

  • ActiveTasks: Tasks that are currently running.

  • CompletedTasks: Tasks that have finished.

  • CurrentlyBlockedTasks: Tasks that are blocked due to a full queue.

  • PendingTasks: Tasks that are queued to run.
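The per-pool alert rule used throughout the rest of this section, pending > 15 || blocked > 0, can be evaluated directly from these MBeans. A hedged sketch, assuming the usual type=ThreadPools,path=<group>,scope=<pool> naming, with PendingTasks read as a gauge ("Value") and CurrentlyBlockedTasks as a counter ("Count"):

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ThreadPoolAlerts {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            Set<ObjectName> pools = mbsc.queryNames(new ObjectName(
                    "org.apache.cassandra.metrics:type=ThreadPools,path=*,scope=*,name=PendingTasks"),
                    null);
            for (ObjectName pendingName : pools) {
                String pool = pendingName.getKeyProperty("scope");
                long pending = ((Number) mbsc.getAttribute(pendingName, "Value")).longValue();
                ObjectName blockedName = new ObjectName(
                        "org.apache.cassandra.metrics:type=ThreadPools,path="
                        + pendingName.getKeyProperty("path")
                        + ",scope=" + pool + ",name=CurrentlyBlockedTasks");
                long blocked = ((Number) mbsc.getAttribute(blockedName, "Count")).longValue();
                // The alert rule used in the sections below.
                if (pending > 15 || blocked > 0) {
                    System.out.printf("ALERT %s: pending=%d blocked=%d%n", pool, pending, blocked);
                }
            }
        }
    }
}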

Memtable FlushWriter

This pool sorts memtables and writes them to disk; Horizon collects its metrics from org.apache.cassandra.metrics.ThreadPools.

Most of the time, memtable issues are caused by overrunning disk capability. Sorting can cause issues as well; these are usually accompanied by a high load but a small number of actual flushes (as seen in cfstats). Substantial rows with long column names, or inserts of many large values into a CQL collection, can cause these problems.

If disk capabilities are being overrun, you should either add nodes to reduce the load, or update the node’s configuration.

Alerts: pending > 15 || blocked > 0

Memtable post-flusher

These are operations that run after flushing the memtable:

  • Discard commit log files whose data has been persisted to SSTables.

  • Flush non-cf-backed secondary indexes.

Alerts: pending > 15 || blocked > 0

Anti-entropy stage

This stage primarily focuses on repairing consistency. It handles repair messages (for example, Merkle tree transfers from validation compaction) and streaming.

Alerts: pending > 15 || blocked > 0

Gossip stage

Cassandra monitors this stage's pending tasks and logs a message when they back up.

Example log message denoting the number of pending tasks
Gossip stage has {} pending tasks; skipping status check ...

To remedy any issues, check that your NTP is working correctly and run the nodetool resetlocalschema command. If that does not work, you can try to delete the system column family folder.

Alerts: pending > 15 || blocked > 0

Migration stage

Any schema changes occur during this stage.

Alerts: pending > 15 || blocked > 0

MiscStage

During this stage, snapshots are taken and data is replicated after a node is removed.

Alerts: pending > 15 || blocked > 0

Mutation stage

During this stage, local inserts and deletions are performed. This includes the following processes:

  • Inserts and updates.

  • Schema merges.

  • Commit log replays.

  • Hints in progress.

An increase in pending tasks during this stage can be caused by disk issues, system overload, or poor configuration. If messages are backed up in this stage, you can add nodes, update your hardware and configuration settings, or update the data model and use case settings.

Alerts: pending > 15 || blocked > 0

Read stage

Local reads are performed during this stage. Data is also deserialized from the row cache.

Pending values can cause increased read latency. The number of pending values can spike because of disk problems, poor configuration, or cluster overload. In many cases (not including disk failure), you can resolve this by adding nodes or updating the configuration settings.

Alerts: pending > 15 || blocked > 0

Request response stage

When a response to a request is received, this stage performs any callbacks that were created with the original request.

Alerts: pending > 15 || blocked > 0

Read repair stage

Read repairs are performed during this stage. The chance of repairs occurring can be configured per column family with the read_repair_chance property.

Data is more likely to be backed up if you are using CL.ONE (and, to a lesser extent, other non-CL.ALL queries) for reads. It’s also more likely if you are using multiple data centers.

The process runs asynchronously outside of the query feedback loop. Note that this is not likely to be a problem, since repair does not happen on all queries, and it quickly provides good connectivity between replicas. The repair being droppable also means that it will be discarded after the time period specified in write_request_timeout_in_ms, which further mitigates issues.

If the pending queue grows, you can try to lower the read_repair_chance on column families with a high read volume.

Alerts: pending > 15 || blocked > 0

JVM metrics

Horizon also collects some key metrics from the Cassandra node's Java virtual machine:

  • java.lang:type=Memory: The memory system for the Java virtual machine. This includes heap and non-heap memory.

  • java.lang:type=GarbageCollector,name=ConcurrentMarkSweep: Metrics for the Java virtual machine’s garbage collection process.
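Both are standard JVM platform MBeans, so they can be read over the same JMX connection. A hedged sketch (the GarbageCollector MBean name depends on which collector the node's JVM actually runs; ConcurrentMarkSweep matches the name listed above):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JvmMetrics {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Heap usage is a CompositeData with "init", "used", "committed", and "max" fields.
            CompositeData heap = (CompositeData) mbsc.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            System.out.printf("heap used=%s max=%s%n", heap.get("used"), heap.get("max"));
            // Cumulative collection count and time (ms) for the CMS collector.
            ObjectName gc = new ObjectName(
                    "java.lang:type=GarbageCollector,name=ConcurrentMarkSweep");
            System.out.println("collections = " + mbsc.getAttribute(gc, "CollectionCount"));
            System.out.println("time (ms)   = " + mbsc.getAttribute(gc, "CollectionTime"));
        }
    }
}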

If you use Apache Cassandra to run Newts, you can also enable additional metrics for the Newts keyspace.