The Cassandra database has a setting in cassandra.yaml, num_tokens, for the number of vnodes. num_tokens is the number of partitions to use per host, and thus the number of parallel streams to use for data updates.
The default was 256 vnodes, but that lead to a high probability of a streaming failure, so “DataStax recommends using 8 vnodes (tokens)” now.
A Netflix paper agrees, saying, “the Cassandra default of 256 virtual nodes per physical host is unwise”, as well as the experienced DBAs on the Apache Cassandra Users List.
To calculate the impact of vnodes count on cluster streaming reliability:
where Pstreaming-one-failure is the independent probability of a streaming failure of one connection, possibly in the range of .0001 to .00001, during one week. (You could process your log files to get your exact failure count.)
Note that changing num_tokens after a ring bootstraps is not a casual thing. The easiest way is to replicate to a new ring or DC with different num_tokens setting, then fail over.
Examples of Streaming Errors
datastax: Streaming operations throw “java.lang.AssertionError: Memory was freed” error
SO: Can’t add a new Cassandra datacenter due to streaming errors
Cassandra Vnodes and token Ranges
Netflix: Cassandra Availability with vnodes Whitepaper