Cassandra Database Summit 2013 San Francisco

Cassandra Database Summit 2013 was another great conference by DataStax, held at historic Fort Mason in San Francisco. The fort’s various barracks were used as meeting rooms.

It made for a long trip from San Jose, but it was interesting to watch the America’s Cup yachts sailing near Alcatraz on Tuesday.

Executive Summary

  1. Cassandra database powers some of the largest Internet sites, including Netflix and growing parts of eBay and Barracuda Networks. Each of those sites is handling billions of requests per day with Cassandra.
  2. The support software ecosystem has continued to grow, with about a dozen exhibitor booths at the conference. Quest TOAD supports Cassandra via ODBC, and you can do “netflow”-type monitoring with Boundary.
  3. The Netflix Cloud Persistence Group recommends consistency level ONE (CL=1) with a replication factor of 3 (RF=3) for most applications.
  4. Monitoring is extremely important in Cassandra operations. Since server time (and timezone) setting is so critical, ensure that it is specifically monitored on each node.
  5. Barracuda Networks: “To manage Cassandra operations, you need to be a Java programmer.”
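
To make point 3 a bit more concrete, here is a minimal Python sketch of the usual replica-overlap rule: a read is only guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names are my own, not from any talk or driver.

```python
def quorum(rf: int) -> int:
    """Replicas needed for a QUORUM operation: a strict majority of RF."""
    return rf // 2 + 1


def overlaps(write_replicas: int, read_replicas: int, rf: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


RF = 3
# CL=ONE for reads and writes: 1 + 1 is not greater than 3,
# so a read can land on a replica that has not seen the newest write yet.
print(overlaps(1, 1, RF))                    # False
# QUORUM for reads and writes: 2 + 2 > 3, so the sets always share a replica.
print(overlaps(quorum(RF), quorum(RF), RF))  # True
```

So CL=1 with RF=3 trades guaranteed read-your-writes for lower latency, which is the trade the Netflix team says is acceptable for most applications.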

The slides and videos will be available online starting around June 26, 2013.

Tuesday Keynotes

Billy Bosworth, CEO, DataStax
Jonathan Ellis, CTO, DataStax/Apache Cassandra Chair

slideshare: Cassandra Summit 2013 Keynote

Tuesday Talks

Getting to the Right Overall Data Architecture
Vincent Dell’Anno and Thomas J. Glazier, Accenture

– gambling psychology (risk/reward analysis) applied to executive mgmt. decisions
– don’t think incremental change, think disruptive to tip risk vs. reward calculation
– “virtualization, SANs and WANs are bad news”
– Enterprise IT departments are actually more risk averse than CxOs
– Enterprise retort to new tools: “we’re not Netflix and we don’t have Google’s budget”
– can’t rip and replace, can intro new systems to provide a new set of capabilities
– all accomplishments are by the greater fool

Netflix Open Source Tools and Benchmarks for Cassandra
Adrian Cockcroft, Netflix

– another conference, another mind-blowing talk from Adrian – his slides are like seeing a crystal ball of the future
– this year, instead of building a massive cluster in 10 minutes, he reviewed the Netflix multi-region Cassandra CL=1 test
– the test ran at 9 Gbps; he had hoped for 480 Gbps “to break (partition) the entire Internet between USA west coast and east coast”, but replication is single-threaded. He used Boundary during the test to graph packet flow.
– “SSD makes compaction problems go away.”
– Denominator is a new project for portable control of DNS clouds for HA DNS
– one of the benefits of storing your data in S3 is that you can also do your data warehouse and BI analysis in AWS with no data migration! And you can use reserved instances for your DW cluster by doing the peak load processing, then reusing the same nodes off-peak for other analysis clusters. Incredible flexibility and velocity with massive cost savings!!

Tuesday Lunch

– Jason Brown is Netflix’ Cassandra committer now
– choice of 6 food trucks for lunch. I had the chicken curry with a samosa while watching the Oracle boat practising near Alcatraz Island
– attendee: “I live in SF, but even I had to take a taxi here.”


t_c2013_foodtruck1.jpg
t_c2013_foodtruck2.jpg

Librato comments
– time-series aggregation and reporting
– set compaction level to 2
– vnodes are important for saving money: you can grow the ring incrementally instead of doubling its size
– no Cassandra source code changes required yet.
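
A rough illustration of the vnodes cost point, under my own assumptions: without vnodes, keeping a ring balanced usually means doubling the node count, while with vnodes you can add only the nodes you need. The 12-node ring and 25% growth figure are hypothetical, not Librato's numbers.

```python
import math


def nodes_when_doubling(current_nodes: int, growth: float) -> int:
    """Keep doubling the ring until it covers the required growth."""
    target = current_nodes * (1 + growth)
    nodes = current_nodes
    while nodes < target:
        nodes *= 2
    return nodes


def nodes_when_incremental(current_nodes: int, growth: float) -> int:
    """Add only as many nodes as the growth needs (what vnodes make practical)."""
    return math.ceil(current_nodes * (1 + growth))


current = 12
growth = 0.25  # hypothetical: 25% more capacity needed
print(nodes_when_doubling(current, growth) - current)     # 12 extra nodes
print(nodes_when_incremental(current, growth) - current)  # 3 extra nodes
```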

Eventual Consistency != Hopeful Consistency
Embracing Optimistic Design in the Persistence Layer
Christos Kalantzis, Netflix
Manager, Cloud Persistence Team

– CL=1 works nearly all the time for most applications – know your business. If business owners can’t understand that, then maybe you’re working at the wrong company
– low consistency examples: an Amazon order (“sorry, that item is sold out”), bank checks (“sorry, overdraft”)
– less latency, better user experience: talk to mgmt. about requirements
– do a POC. We did the test that Adrian talked about this morning
– Netflix might make REST calls to 100 databases per page and architecture is AWS (moderate latency)
– CAP theorem – still valid
– post-talk discussion: affinity for immediate read after write is not necessary because the client knows what it wrote.
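
A toy model of what “eventual” means here, as a sketch rather than how Cassandra is actually implemented: three in-memory replicas (RF=3), a write acknowledged after one replica has it, and a CL=1 read that happens to land on a replica the write has not reached yet. The Replica class and the "plan" key are made up for illustration.

```python
class Replica:
    """In-memory stand-in for one replica; stores key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, timestamp, value):
        # Last-write-wins: keep whichever cell has the highest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        return self.data.get(key)


replicas = [Replica() for _ in range(3)]  # RF=3

# Every replica starts with the old value.
for replica in replicas:
    replica.write("plan", 1, "basic")

# A CL=1 write is acknowledged once a single replica has applied it.
replicas[0].write("plan", 2, "premium")

# A CL=1 read that lands on a lagging replica still sees the old value.
stale = replicas[2].read("plan")

# The write then reaches the remaining replicas and they converge.
for replica in replicas[1:]:
    replica.write("plan", 2, "premium")
fresh = replicas[2].read("plan")

print(stale, fresh)  # (1, 'basic') (2, 'premium')
```

The window between the two reads is the part the business owners need to be comfortable with.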

Lock It Up: Securing Sensitive Data
Sam Heywood, Gazzang

– 3 main security products: zncrypt, ztrustee, zescrow
– transparent AES encryption for cloud with key mgmt.
– key manager can also manage private keys for PGP, SSL certs, etc.
– can be per-process encryption
– PII, PCI-DSS, FERPA (student data law)
– file, directory, block-level encryption fast with modern Intel processors
– good presenter.

Data Modelers Save Their Careers: Surviving and Thriving with NoSQL
Joe Maguire, Analyst/Author, Dataqualitystrategies.com

– conceptual modeling vs. relational modeling
– conceptual, logical, physical
– ask users in English
– understanding the problem is of course important to solving the problem
– ask about consistency requirements, not necessary with RDBMS generally
– partitioning discussion not necessary at conceptual level except in special cases like PII
– “Mastering Data Modeling: A User-Driven Approach”, Carlis and Maguire
– users don’t care about object models, which is a software construct
– data requirements last longer than process requirements
– typically his enterprise data model is the basis for 50 later data “applets”.

When Bad Things Happen to Good Data
Understanding Anti-Entropy in Cassandra
Jason Brown, Netflix/Apache Cassandra Committer

– test major versions before upgrading in production
– 1.2 has atomic batches, but they behave badly with 5,000-record batches
– run repair (anti-entropy) on every node at least every 10 days, the default gc_grace_seconds, or deleted rows can resurrect
– you also need nodetool cleanup after compaction or the keys will be sent over and over.
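
The 10-day figure matches Cassandra's default gc_grace_seconds of 864000 seconds. Tombstones can be purged once that window has passed, so a node that has gone longer than that without an anti-entropy repair may have missed a delete and can bring the row back. Below is a small sketch of the check an ops script might run; the function name and dates are hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default: 10 days


def repair_overdue(last_repair: datetime, now: datetime,
                   gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True when the node has gone longer than gc_grace_seconds without repair."""
    return now - last_repair > timedelta(seconds=gc_grace_seconds)


now = datetime(2013, 6, 12)
print(repair_overdue(datetime(2013, 6, 5), now))  # False: repaired 7 days ago
print(repair_overdue(datetime(2013, 6, 1), now))  # True: repaired 11 days ago
```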

Wednesday Talks

Hindsight is 20/20. MySQL to Cassandra
Michael Kjellman, Barracuda Networks/perlcassa maintainer

– tried to scale MySQL master with hardware (256 GB RAM, 16 disks, 16 cores) – nope
– moved spam filtering system from MySQL to Cassandra, lower latency now, 100x more rules
– but painful rewrite, educating business on getting away from flat-file mentality. team of 4.
– carefully define data model up front
– consider migration plan – sync data, epoch times
– don’t use Cassandra as a queue – we use Kafka
– use Cassandra stress tool to estimate performance, multiply by replication factor
– distributed systems move complexity to operations – automation is super important
– CCM can make a local cluster
– jconsole, jolokia (JMX via HTTP)
– 2 years in production, 2 product lines, another one in beta
– CQL is cool. Business logic is all standardized in CQL.
– for search, insert into Kafka => ElasticSearch; triggers probably won’t be performant
– RF=3 for important data, RF=2 for less important blobs to save space, 24 nodes across 2 DCs
– pin column family to a SSD using symlink, separate disk for commit log, separate disk for data directory
– keeping data per node under 600 GB makes repair and compaction operations manageable
– Operations folks should be Java programmers to understand Cassandra well
– good to have internal advocates for other databases to keep new database evaluations honest.
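
Two of the sizing notes above can be turned into back-of-the-envelope arithmetic. This is my reading of the “multiply by replication factor” and “under 600 GB per node” points, not Barracuda's actual planning tool, and the workload numbers are invented.

```python
import math


def cluster_write_load(client_writes_per_sec: int, replication_factor: int) -> int:
    """Each client write is applied on RF replicas, so the cluster does RF times the work."""
    return client_writes_per_sec * replication_factor


def min_nodes_for_data(raw_data_gb: float, replication_factor: int,
                       max_gb_per_node: float = 600) -> int:
    """Smallest node count that keeps replicated data under the per-node limit."""
    return math.ceil(raw_data_gb * replication_factor / max_gb_per_node)


# Hypothetical workload: 10,000 client writes/sec and 4 TB of raw data at RF=3.
print(cluster_write_load(10_000, 3))  # 30000 replica writes/sec across the cluster
print(min_nodes_for_data(4_000, 3))   # 20 nodes to stay under 600 GB each
```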

Webinar questions:

– Azure has Cassandra available, and DataStax has a .NET driver
– Ops has to be ready for Cassandra or outages can be as bad or worse

I talked with Colin Charles, MariaDB Community Evangelist. He suggested that I email my MySQL SHUTDOWN patch to the MariaDB mailing list, and that they were looking for bug patches for the memcached plugin to be more reliable. And consider adding my blog to PlanetMySQL.

The Next Top Data Model
Patrick McFadin, DataStax

– Patrick is a DBA who has helped many DataStax users with data modelling questions
– went over several data model “recipes” for Web 2.0-type features
– Cassandra is good at writes, so it’s fine to write multiple tables for the same data (denormalization++)
– expiration feature is very handy for session tables, real-time analytics, etc.
– counters can be handy for some things
– post-talk discussion: although you could do a game scoreboard in Cassandra with 1 second updates, Redis would be better because in-memory.
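
For the expiration point, CQL lets you attach a time-to-live to an insert with USING TTL. Here is a small sketch that only builds the statement text; the sessions table and its columns are hypothetical, and actually running it would need a driver and a cluster.

```python
def session_insert_cql(ttl_seconds: int) -> str:
    """Build a CQL INSERT for a hypothetical sessions table that expires on its own."""
    if ttl_seconds <= 0:
        raise ValueError("TTL must be a positive number of seconds")
    return (
        "INSERT INTO sessions (session_id, user_id) "
        "VALUES (?, ?) "
        f"USING TTL {ttl_seconds}"
    )


# A 30-minute session row: Cassandra drops it after 1800 seconds, no cleanup job needed.
print(session_insert_cql(30 * 60))
```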


t_c2013_patrick_mcfadin.jpg
Patrick “Rockstar” McFadin

Lunch

Same food trucks came back, different sequence today though. Had BBQ beef mini-sandwich.

I spent some time talking to three of The Next Great Data Developer winners from last night about Silicon Valley, technology and business. They are all recent CS grads who have been offered internships at DataStax. They were from Montreal, New York/Irvine (school workflow) and London/Bulgaria (messaging via Cassandra over Tor).

Splunk and Cassandra = New value to business
Eddie Satterly, Splunk

In Case of Emergency Break Glass
Aaron Morton, Apache Cassandra Committer

– Aaron is a Cassandra consultant in New Zealand
– deep talk on troubleshooting Cassandra operational problems, from a consultant’s perspective
– important to really understand nodetool output for various options
– recommends “Java Performance” by Charlie Hunt from 2012
– one of the most interesting problems he encountered was when a node had a future time setting
– in practice, keep the JVM heap to about 8 GB, since larger heaps lead to long garbage-collection pauses. To use more memory, consider using the off-heap row cache.
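
Going back to the future-clock problem: Cassandra resolves conflicting writes by timestamp, with the latest one winning, so a write stamped by a clock that runs ahead keeps shadowing later, correct writes until real time catches up. The toy sketch below shows the effect and a simple skew check of the kind worth monitoring on each node; the node names, the one-hour skew and the tolerance are all made up.

```python
def resolve(cells):
    """Last-write-wins: return the value carrying the highest timestamp."""
    return max(cells, key=lambda cell: cell[0])[1]


def skewed_nodes(node_clocks, reference, tolerance=1.0):
    """Names of nodes whose clock differs from the reference by more than tolerance seconds."""
    return sorted(name for name, clock in node_clocks.items()
                  if abs(clock - reference) > tolerance)


SKEW = 3600        # hypothetical: one node's clock is an hour ahead
real_time = 1_000  # seconds, arbitrary starting point

write_from_skewed_node = (real_time + SKEW, "old address")
later_write_from_good_node = (real_time + 60, "new address")

# The later, correct write loses because its timestamp is lower.
print(resolve([write_from_skewed_node, later_write_from_good_node]))  # old address

clocks = {"node1": 1_000.0, "node2": 1_000.2, "node3": 1_000.0 + SKEW}
print(skewed_nodes(clocks, reference=1_000.0))  # ['node3']
```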


t_c2013_aaron_morton.jpg
Aaron Morton

Cassandra at eBay Scale
Feng Qu and Anurag Jambhekar

– Feng has been a DBA since Oracle 5
– billions of reads and writes per day on 10+ Cassandra rings
– some nodes run on Violin SSD
– typically 96 GB RAM or 128 GB RAM per node
– row cache is only used when less than 100,000 rows, thus seldom
– highly recommend DataStax support contract, use DSE plus integrated NOC monitoring plus email alerts to DBA team
– company is thinking about making their private datacenters more elastic.

Cassandra Internals, Aaron Morton

Vendor Exhibits

There were about a dozen vendor booths, plus vendor speakers like Librato and Splunk.

SanDisk

– wide range of NAND products
– new PCIe product Kilimanjaro in 2 months
– own NAND, Firmware, controller – major player
– could be cheap PCIe
– Schooner sales team working at SanDisk now(?)

Acunu

– analytics on top of Cassandra
– major verticals: 1. Financial 2. Hi-tech/web 3. Telco POC
– used to have own Cassandra distro, not now.

Dell/Quest BI Tools

– Dell bought Quest a year ago or so
– TOAD for Cloud Databases has supported Cassandra since 2010 via ODBC, but not sure about anything Cassandra-specific yet
– toadworld.com

Boundary

– Network Traffic Analytics – SaaS-only for now
– lightweight daemon on each Linux node captures PCAP data and transmits it to the SaaS cloud for analysis; source code is available, and it works on bare metal or VMs
– can overlay with Splunk or other event sources using API (like overlay repair and compaction log events)
– Adrian Cockcroft from Netflix used this in his large multi-region test, as explained in his Tuesday talk.

Gazzang

– cloud security and encryption products.

Fort Mason – DataStax Event Details
#cassandra
@DataStax

Cassandra Summit 2012
DataStax Apache Cassandra Training in SF (2011)


t_c2013_alcatraz.jpg
t_c2013_gg1.jpg
t_c2013_gg_bell.jpg
t_c2013_pier3.jpg
t_c2013_pier4.jpg
t_c2013_pier5.jpg
Camera: Nikon D300/70-200mm f/2.8VR

Transit info: I took the Caltrain for $9.00 from San Jose to SF, then a $20.00 taxi to Fort Mason. There’s no BART stop, but if you have 45 to 60 minutes, the SFMTA 30X-Marina Express bus for $2.00 would get you within 200 yards, where you can walk or transfer to the 28 Inbound to Fort Mason bus.

One Chinese tourist on the Muni bus had a Nikon D800 … with a Nikkor manual focus 28mm AIS lens! He had one of the latest bodies and a lens they stopped making 30 years ago!
