DataStax Apache Cassandra Training in SF

I went to the DataStax training for Apache Cassandra today at the Hilton San Francisco Financial District Hotel.

Ben Coverston, Director of Operations at DataStax, did a great job presenting some very technical material.

Topics included data modeling, client programs, server configuration, replication, tuning, performance monitoring and operations.

Labs were done using the provided USB drive containing a VM (system requirements: 2 GB RAM and 12 GB disk free) of Cassandra and other tools like JConsole running on Debian Linux.

(I should have used VirtualBox, which is free (GPL) and doesn’t require registration, instead of the VMware Fusion trial, which has restrictive licensing and only works for a month.

To use VirtualBox 4, create a new VM and, instead of creating a new virtual disk, attach the existing “Riptano Training.vmdk”. You may also need to rename the parent directory from “datastax_training_dist client.vmware” to “datastax_training_dist client” for the file navigator to open it.)
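For anyone who prefers the command line to the VirtualBox GUI, here is a rough sketch of the same steps using VBoxManage. I did this through the GUI in class, so treat it as an untested equivalent: the VM name `cassandra-training` and the controller name `SATA` are placeholders of mine, and the commands are only built here, not executed.

```python
def vbox_commands(vm_name, vmdk_path, controller="SATA"):
    """Build VBoxManage calls that attach an existing disk to a new VM.

    vm_name and controller are hypothetical names chosen by the caller;
    vmdk_path points at the disk image shipped on the USB drive.
    """
    return [
        # Create and register an empty VM (no new disk is created).
        ["VBoxManage", "createvm", "--name", vm_name, "--register"],
        # Add a SATA storage controller to hang the disk on.
        ["VBoxManage", "storagectl", vm_name,
         "--name", controller, "--add", "sata"],
        # Attach the existing .vmdk as the first hard disk.
        ["VBoxManage", "storageattach", vm_name,
         "--storagectl", controller, "--port", "0", "--device", "0",
         "--type", "hdd", "--medium", vmdk_path],
    ]


if __name__ == "__main__":
    for command in vbox_commands("cassandra-training", "Riptano Training.vmdk"):
        print(command)
```

Each list can be handed to `subprocess.check_call` once you are happy with the names and the path.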

About 45 people attended, a new class record.

If you get a chance to take Ben’s class, I highly recommend it – send your whole team.

To get the most out of the class, I recommend reading about Cassandra data modeling beforehand, trying the various tools and libraries (cassandra-cli, CQL, nodetool, JConsole, JNA, JMX, OpsCenter, etc.), and hanging out on the #cassandra channel on irc.freenode.net.

The hotel was a good choice, located near the financial district and Chinatown, with reliable WiFi, conveniently organized meeting space, and a very good lunch.

I had an interesting discussion with some veteran DBAs about the popularity of NoSQL solutions on the East Coast. The most popular request was for – get this – SQL access via Hive or CQL. 🙂

Hilton San Francisco Financial District
750 Kearny Street, San Francisco, California, United States 94108

@bcoverston
Ben Coverston – The Apache Cassandra Project
sfgate.com: DataStax Training for Apache Cassandra – SF
Apache Cassandra training for $30? from @spyced of Datastax? No way!
Let’s play with Cassandra… (Part 1/3)

Some notes about 0.8 and previous releases:

– Cassandra Clients

  • cassandra-cli
  • Hector
  • CQL
  • pycassa
  • Perl and PHP clients are not as reliable because of poor Thrift bindings; they should be rewritten to use the CQL binary protocol
  • Java

– 3 common cassandra-cli commands are:

  1. show
  2. describe
  3. list
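As a rough illustration of those three commands, the sketch below builds a small cassandra-cli script from Python. The keyspace and column family names (`Keyspace1`, `Users`) are placeholders of mine, not names from the class, and the exact `describe` syntax differs between releases, so check it against your version.

```python
def build_cli_script(keyspace, column_family):
    """Return cassandra-cli statements that inspect one column family.

    Both names are hypothetical placeholders supplied by the caller.
    """
    statements = [
        "show keyspaces",                    # show: list every keyspace
        "use %s" % keyspace,                 # switch to the keyspace
        "describe keyspace %s" % keyspace,   # describe: print its schema
        "list %s" % column_family,           # list: dump the first rows
    ]
    # cassandra-cli expects each statement to end with a semicolon.
    return "".join(statement + ";\n" for statement in statements)


if __name__ == "__main__":
    print(build_cli_script("Keyspace1", "Users"))
```

Saving the output to a file (say `inspect.txt`, another made-up name) and running `cassandra-cli -h localhost -f inspect.txt` should replay the statements in one go.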

– Counters

  • If you use counters, don’t delete SSTables; decommission and bootstrap the node instead.
  • TimeoutException can cause overcount with counters.
  • No secondary indexes on counters.
  • Repair does not fix counters.
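The overcount point is easier to see with a toy model. The sketch below is not a Cassandra client: `FlakyCounterNode` is a made-up in-memory stand-in that applies an increment and then reports a timeout, which is roughly what happens when the acknowledgement is lost. A client that blindly retries ends up counting twice.

```python
class TimeoutException(Exception):
    """Stand-in for the timeout a real client would raise."""


class FlakyCounterNode:
    """Toy in-memory counter, NOT a real Cassandra client.

    The first `lost_acks` increments are applied but the acknowledgement
    is "lost", so the caller sees a timeout even though the write landed.
    """

    def __init__(self, lost_acks):
        self.value = 0
        self.lost_acks = lost_acks

    def increment(self, amount=1):
        self.value += amount            # the write is applied...
        if self.lost_acks > 0:
            self.lost_acks -= 1
            raise TimeoutException()    # ...but the client never hears back


def increment_with_retry(node, amount=1, max_attempts=3):
    """Naive retry loop: fine for idempotent writes, unsafe for counters."""
    for _ in range(max_attempts):
        try:
            node.increment(amount)
            return True
        except TimeoutException:
            continue                    # retrying re-applies the increment
    return False


if __name__ == "__main__":
    node = FlakyCounterNode(lost_acks=1)
    increment_with_retry(node)
    print(node.value)  # the client asked for +1, the counter holds 2
```

Regular column writes survive this kind of retry because writing the same value twice is harmless; an increment is not idempotent, so the retry is counted again.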

– nodetool

  • nodetool flush
  • nodetool drain

– Repair

  • Send schema changes 5 seconds apart to avoid wedging other nodes. To fix a wedge, delete old schema (leave data alone) and restart.
  • mlockall is enabled by default if JNA is found on the classpath (must be installed separately because of licensing). Also a good idea to turn swap off.
  • nodetool repair (or rsync all files then nodetool repair)
  • read at consistency level ALL (a poor man’s repair)
  • adding a new node can be more efficient than removing a node then adding it.
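The advice about spacing out schema changes can be sketched as a small helper. This is only an illustration under my own assumptions: `execute` is a hypothetical caller-supplied function that sends one statement to the cluster (through whatever client you use), and the 5-second default simply mirrors the rule of thumb above.

```python
import time


def apply_schema_changes(statements, execute, pause_seconds=5.0,
                         sleep=time.sleep):
    """Run schema statements one at a time with a pause between them.

    `execute` is a caller-supplied (hypothetical) function that sends one
    statement to the cluster. `sleep` is injectable so the pacing can be
    tested without actually waiting.
    """
    applied = 0
    for index, statement in enumerate(statements):
        if index > 0:
            sleep(pause_seconds)    # give the previous change time to spread
        execute(statement)
        applied += 1
    return applied


if __name__ == "__main__":
    # Demo with a stand-in executor (print) and no real waiting.
    demo = ["create column family A", "create column family B"]
    apply_schema_changes(demo, execute=print, sleep=lambda seconds: None)
```

Passing `sleep` in as a parameter keeps the helper easy to test; in real use you would leave it at the default and let the pauses happen.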

– Compaction

  • rows over 64 MB need 2-pass compaction
  • minor compaction can purge tombstones in releases after 0.6.6
  • avoid major compaction, but keep an eye on the SSTables-per-read histogram

– Monitoring

  • watch -d -n 5 nodetool info or cfstats or tpstats
  • JConsole and JMX
  • storageproxy latency, reads, writes
  • oadcdb storageproxy
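If you would rather collect those nodetool readings than watch them scroll by, something like the following could work. It is a sketch, not something from the class: it assumes `nodetool` is on the PATH, and the `run` and `sleep` parameters are my own additions so the loop can be exercised without a live cluster.

```python
import subprocess
import time


def nodetool_command(host, subcommand):
    """Build the argument list for one nodetool call."""
    return ["nodetool", "-h", host, subcommand]


def run_nodetool(host, subcommand):
    """Run nodetool and return its output as text (needs nodetool on PATH)."""
    return subprocess.check_output(nodetool_command(host, subcommand)).decode()


def poll(host, subcommands, rounds, interval=5,
         run=run_nodetool, sleep=time.sleep):
    """Collect the output of each subcommand once per round."""
    samples = []
    for round_number in range(rounds):
        if round_number > 0:
            sleep(interval)         # same 5-second cadence as `watch -n 5`
        for subcommand in subcommands:
            samples.append((subcommand, run(host, subcommand)))
    return samples


if __name__ == "__main__":
    # Demo with a fake runner so no cluster is required.
    fake_run = lambda host, subcommand: "output of %s on %s" % (subcommand, host)
    for name, text in poll("localhost", ["info", "tpstats"], rounds=1, run=fake_run):
        print(name, "->", text)
```

Against a real node you would drop the `run=` argument and call `poll("localhost", ["info", "cfstats", "tpstats"], rounds=12)` to gather about a minute of samples.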

The Best Cassandra Talks of 2011
Datastax Dev Blog

This entry was posted in Cassandra, Cloud, Conferences, Linux, MySQL, Open Source, Perl, Storage, Tech, Toys.