Cassandra SF 2011, San Francisco

I went to the Cassandra SF 2011 conference today, at the UCSF Mission Bay Conference Center.

It was hosted by Datastax.com, where many of the Cassandra developers work.

Here’s an overview of some of the talks I attended.

Keynote
Jonathan Ellis, DataStax CTO and Apache Project Chair of Cassandra

Talked about the latest features and use cases for Cassandra and Brisk, Hadoop powered by Cassandra.

Cassandra Use Cases at Twitter
Chris Goffinet, Twitter (ex-Yahoo! Performance Engineer)

Ext4-writeback-512k best, xfs also good
Noop best for reads p90, p99, avg, max
All schedulers same for writes because of journal.
Multi-threaded compaction all the time:
CASSANDRA-2191 Multithread across compaction buckets
CASSANDRA-2156 Compaction Throttling
Compression reduces data by 7x
memcached most efficient space-wise
99th percentile went from 200-800 ms -> 2.5 ms
Cuckoo replaces Ganglia, updates every 60s
140 MB/s writes, 70 MB/s reads
500 GB new data / hour
x3 replication factor
Slab allocator fixed size allocs of 2 MB
CASSANDRA-2252 off-heap memtables
Reduce GC time by 1000x
Don’t use TTL, drop SSTables after N days
CASSANDRA-2819 Split rpc timeout for read and write ops
mlockall recommended with JNA jar to fix JVM allocation problems
git, Hudson/Jenkins, murder (Twitter Open Source), Bittorrent to deploy to 100s of nodes in 20s
Cassie is a light-weight Cassandra Client
Based on finagle
Yahoo! Cloud Serving Benchmark (YCSB)-based test framework
Generate 1,200 reports per build, takes 48 hours to run
Understand your stack
Measure everything
Invest in your storage technology (right components, understand failure modes, etc.)
Automate
Expect everything to fail
Linux page cache did not have good read performance since not LRU and fixed 4K, thus memcached esp. with narrow rows
Need to evaluate alternative JVMs like JRockit and Azul for GC
Apache Mesos cluster manager and virtualization project.
Twitter Ops improvements – 6 people for 6 months.
Bittorrent originally used to update 1000s of apache servers, create 1 seed and have it then do neighbor. Can update 300-node Cassandra cluster in 20s. 🙂

(An attendee said speed of deployment is important more for recovering after a bad push than anything else in his case. It’s also helpful for software developers writing deployment scripts, and also when you have network architecture issues like overcommitment or latency-sensitive production traffic.)

Eric Onnen, Urban Airship

Committer on HBase, Cassandra, Zookeeper projects
Hosting for mobile services
United API for services across platforms
Common SLA is 10,000 or 120,000 messages / s
Over 160 million app installs on 80 million devices
Half of requests are device check-ins (ie. when an app foregrounds)
Now 25 million per day
Used early MongoDB, not so good
EC2 does not offer enough RAM for caching all their data
Cassandra starting in summer 2010 for android
6 EC2 himems, 1000 reads/s, 750 writes/s
30 GB node
Rolling upgrades
Column TTLs
Nice community
Good with EC2
No SPOF
Ability to alter CLs on a per-operation basis like checkins vs schema changes
Problems
Know your data model
Creating indexes later is a PITA
Wide rows are bad – IO, thrift, count
JSON better than packed binaries
Careful with thrift in the stack
Read timeout vs connection refused, GC vs failure, library assumptions and exception handling
Verify client
Ops
Ensure dynamic switch is enabled
IO
Avoid EBS except for snapshot backups or use S3
Stripe epehemerals, not EBS volumes
Use common init scripts
Avoid smaller instances – more steal
Bare metal will rock your world (compared to VMs or EC2)
Uselargepages
Look at thread dumps
Java TDA (Thread Dump Analyzer) – love it, only good thing to come out of SAP 🙂
Eclipse Memory Analyzer (MAT)
Perform major compactions daily in lull periods
Monitor JMX religiously
Cassandra exposes a lot of performance counters, use them
Still too much time looking at GC
Upgrading from 0.6 to 0.7 was rocky, had to do a read repair surprise
Need CQL or Pig to offload ad hoc reporting from 12 developers to 11 administrative staff 🙂
Sample end-user ad hoc query: How many android devices are using this version of software title “whatever”?

Introduction to Brisk
Jake Luciani, DataStax

Great talk on Brisk.

Highly Available DNS and Request Routing Using Apache Cassandra
David Strauss, Founder/CTO, Pantheon Systems

We accept HTTP requests and forward them to app servers (mostly for hosted Drupal)
Reconfiguring Varnish, nginx, and proxies not fun, disruptive
DNS has a nice model
And client tools like dig
nginx did not do round-robin
Node.js ok using DNS, HTTP proxy and Fugue modules, 99 lines of code
Why another? DNS has no ring replication, defined master
MS AD and ApacheDS too heavy
DNS needs to withstand DoS
Github cassandra-dns
Twisted doesn’t leak memory, but clients may
TxSQL is another way, and is non-blocking
Other ideas, point client resolvers at haproxy

Replacing Datacenter Oracle with global Apache Cassandra on AWS
Adrian Cockcraft, Netflix

DC capacity is 4 Billion requests per month
exponential growth to 30 billion requests per month
10 minutes to create a cluster in SG, would take 9 months to build a DC there
Full backup – daily cron snapshot
Incremental –
Continuous – commit log to EBS
Archive – S3, encrypt, region, separate account, different cloud
Guide to NoSQL, redux. Mark Atwood
Practicalcloudcomputing.com
Chaos monkey runs 9 am to 3 pm, need to opt-out
For BI take backup after midnite, but restore and chop off extra seconds after 00:00:00

Real World Capacity Planning Cassandra on Blades and Big Iron
Ed Capriolo, Media6degrees

Compacting tombstones at nite (batch time) can help with day-time traffic (real-time)
nodetool
JBOD not commonly used, but you could, like Hadoop or Riak
Puppet
Cacti graphs to acquire JMX data
Uses all SCSI/SAS drives
Likes blades, though only 1 or 2 drives per blade
Centos 5.5 ext4
Writing “High Performance Cassandra Cookbook”

Lightning Talks

Mike Weir, Mktg VP, Datastax

Compaction performance improvements C-1608

Ankit Shah, Principal Software Engineer, Verisign Authentication Group, Symantec

DB is 1.2 GB or so (small) but want HA
Can be read-intensive, some writes
Want good read performance, failover, replication
Secure gossip over IPSec or similar
Doesn’t like existing AWS database product limitations
So Cassandra is it

The Auto-Clustering Brisk AMI
Joaquin Casares, Datastax

Also RAIDs ephemeral volumes

Twitter

Largely done programming:

CASSANDRA-47 SSTable compression
CASSANDRA-674 New SSTable Format
CASSANDRA-16 Memory efficient compactions
CASSANDRA-2319 Promote row index
CASSANDRA-293 remove_key_range operation and
CASSANDRA-494 add remove_slice to the api
CASSANDRA-808 Need a way to skip corrupted data in SSTables
CASSANDRA-2398 Type specific compression

Mike Bulman, Opscenter Team, Datastax

Uses Cassandra itself for logging
Ring view graph
Add a node to ring tool

Cassandra Anti-Patterns
Matt Dennis, Datastax

Use Sun JVM u22 or newer, not OpenJDK or Blackdown
6-8 GB for java heap, not above 16 GB
Separate spindles for. data and commit log unless EBS
No EBS, use ephemeral
Don’t read before write if possible
Use bulk loader in 0.8, Java SStable program
Supercolumns need to be rewritten.

SlideShare: Cassandra Query Language (CQL)
IBM and SAP Open Source their JVM Diagnostics Tools
Migrating Netflix from Datacenter Oracle to Global Cassandra
Urban Airship raises $15 million (Nov., 2011)
Datastax Conference Videos

This entry was posted in Cloud, Conferences, Linux, MySQL, Open Source, Storage, Tech. Bookmark the permalink.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.