Cassandra vnodes Streaming Reliability Calculator

The Cassandra database has a setting in cassandra.yaml, num_tokens, for the number of vnodes. num_tokens is the number of partitions to use per host, and thus the number of parallel streams to use for data updates.

The default was 256 vnodes, but that lead to a high probability of a streaming failure, so “DataStax recommends using 8 vnodes (tokens)” now.

A Netflix paper agrees, saying, “the Cassandra default of 256 virtual nodes per physical host is unwise”, as well as the experienced DBAs on the Apache Cassandra Users List.

To calculate the impact of vnodes count on cluster streaming reliability:

where Pstreaming-one-failure is the independent probability of a streaming failure of one connection, possibly in the range of .0001 to .00001, during one week. (You could process your log files to get your exact failure count.)

I wrote a Javascript calculator to help visualize how vnodes increase the probability of streaming failures.

Calculate Cassandra streaming reliability using Javascript:

Expected probability of a vnode stream failing (per week):
Number of nodes in cluster:

Note that changing num_tokens after a ring bootstraps is not a casual thing. The easiest way is to replicate to a new ring or DC with different num_tokens setting, then fail over.

Examples of Streaming Errors

datastax: Streaming operations throw “java.lang.AssertionError: Memory was freed” error
SO: Can’t add a new Cassandra datacenter due to streaming errors
Cassandra Vnodes and token Ranges
Netflix: Cassandra Availability with vnodes Whitepaper

This entry was posted in Cassandra, Open Source, Tech. Bookmark the permalink.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.