At the Bay Area Chef User Group tonight in Mountain View, Phil Dibowitz from Facebook Operations reprised a talk he gave at ChefConf 2013, “Scaling Chef at Facebook.”
His team of four people writes and maintains the Chef systems for hundreds of thousands of servers, divided into several clusters of 10,000 or so nodes.
Facebook previously used CFEngine v2, but has migrated to Enterprise Chef (private), with Community Chef (public) on at least one cluster.
The advantage of private Chef is getting features faster, especially those related to large clusters.
He does not like CFEngine because it is file-based, whereas Chef is template-based.
Phil likes idempotent behavior. Idempotent originally meant “not changed in value following multiplication by itself,” but in this case means “acting as if used only once, even if used multiple times.” Thus a file only gets downloaded or updated once per client, not over and over.
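To make the idea concrete, here is a minimal sketch of an idempotent "ensure this file has this content" operation. It is plain Python rather than Chef's Ruby DSL, and the function name `ensure_file` and the file name `motd` are made up for illustration; it only shows the general pattern of checking current state before acting.

```python
import os
import tempfile


def ensure_file(path: str, content: str) -> bool:
    """Write content to path only if needed; return True if a change was made."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == content:
                return False  # already in the desired state, so do nothing
    except FileNotFoundError:
        pass  # file is missing, fall through and create it
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "motd")  # hypothetical file name
    first = ensure_file(target, "hello\n")   # creates the file
    second = ensure_file(target, "hello\n")  # no-op: content already matches
    print(first, second)
```

Running the operation a second time changes nothing, which is the property that makes it safe to run on every node, over and over.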
(A second principle in managing networked systems like HPC is near locality of reference. Blekko is a big proponent of this for performance reasons.)
And in any large cluster, the secret to eventual consistency of all nodes is to have clients pull changes, not have the deploy server push them.
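A toy simulation helps show why pulling leads to eventual consistency. This is only a sketch in plain Python, not how Chef is implemented; `ConfigServer`, `Node`, and the node names are hypothetical stand-ins. The point is that a node that was down during a change simply catches up on its next pull.

```python
from dataclasses import dataclass


@dataclass
class ConfigServer:
    """Holds the latest published config version (hypothetical stand-in)."""
    version: int = 0

    def publish(self) -> None:
        self.version += 1


@dataclass
class Node:
    name: str
    applied: int = 0
    online: bool = True

    def pull(self, server: ConfigServer) -> bool:
        """Fetch and apply the latest version; return True if anything changed."""
        if not self.online or self.applied == server.version:
            return False
        self.applied = server.version
        return True


server = ConfigServer()
nodes = [Node("web1"), Node("web2", online=False)]

server.publish()  # version 1 goes out while web2 is down
for n in nodes:
    n.pull(server)
after_first_round = [n.applied for n in nodes]

nodes[1].online = True  # web2 comes back; nobody has to remember to re-push
for n in nodes:
    n.pull(server)
after_second_round = [n.applied for n in nodes]

print(after_first_round, after_second_round)
```

With a push model, the server would have to track which nodes missed the update and retry them; with pull, each node is responsible for its own convergence.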
He used sysctl.conf as an example of how Chef makes it easy for each group or engineer to customize their servers.
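One way to picture this kind of customization is as layered settings: a base set of values, with each team overriding only the keys it cares about, rendered into a single file. The sketch below is plain Python, not Chef attributes or templates, and the team names and values are invented for illustration (the sysctl keys themselves are real kernel parameters).

```python
def render_sysctl(*layers: dict) -> str:
    """Merge setting layers (later layers win) and render sysctl.conf text."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return "".join(f"{key} = {value}\n" for key, value in sorted(merged.items()))


# Hypothetical base settings and per-team overrides.
base = {"vm.swappiness": 60, "net.core.somaxconn": 128}
web_team = {"net.core.somaxconn": 1024}
db_team = {"vm.swappiness": 1}

web_conf = render_sysctl(base, web_team)
db_conf = render_sysctl(base, db_team)
print(web_conf)
print(db_conf)
```

Each team touches only its own override layer, and the rendered file is always generated from the merged result rather than edited by hand.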
Also, he talked a little about how Chef can help configure a production test node as a test canary that automatically reverts to live after an hour of non-testing.
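The talk did not go into implementation details that I captured, so the following is only a guess at the general shape of such a timeout, written in plain Python. The names `NodeState`, `mark_canary`, and `converge` are hypothetical; the one-hour limit comes from the talk.

```python
from dataclasses import dataclass

CANARY_TTL_SECONDS = 60 * 60  # one hour without testing, per the talk


@dataclass
class NodeState:
    mode: str = "live"
    last_test_activity: float = 0.0  # timestamp in seconds


def mark_canary(state: NodeState, now: float) -> None:
    """Put the node into canary mode and start the inactivity clock."""
    state.mode = "canary"
    state.last_test_activity = now


def converge(state: NodeState, now: float) -> str:
    """Run on every config pass: revert a stale canary to live."""
    if state.mode == "canary" and now - state.last_test_activity >= CANARY_TTL_SECONDS:
        state.mode = "live"
    return state.mode


state = NodeState()
mark_canary(state, now=0.0)
mode_at_30_min = converge(state, now=30 * 60)
mode_at_61_min = converge(state, now=61 * 60)
print(mode_at_30_min, mode_at_61_min)
```

Because the check runs on every pass, a forgotten canary fixes itself without anyone having to remember to clean it up.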
Facebook is big on IPv6, as a significant percentage of mobile and Comcast traffic is using it.
Also, with the large number of devices they have, IPv6 helps when RFC1918 space happens to run out internally.
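For a sense of scale, the three RFC 1918 private blocks can be counted with Python's standard `ipaddress` module. This is just back-of-the-envelope arithmetic, not anything from the talk, and `2001:db8::/64` is the IPv6 documentation prefix, used here only as an example subnet.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_BLOCKS = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

sizes = {block: ipaddress.ip_network(block).num_addresses for block in RFC1918_BLOCKS}
total_private_v4 = sum(sizes.values())

# A single IPv6 /64 subnet (documentation prefix, for illustration only).
one_v6_subnet = ipaddress.ip_network("2001:db8::/64").num_addresses

print(total_private_v4)
print(one_v6_subnet)
```

Roughly 17.9 million private IPv4 addresses sounds like a lot, but once the space is carved into per-cluster and per-rack subnets it can presumably get tight, whereas a single IPv6 /64 already holds 2^64 addresses.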
Phil has previously worked in Operations at TicketMaster and Google, and runs a popular Metallica
Thanks to Ooyala for the nice meeting space. 800 W. El Camino Real is a fairly historic building. Searchme.com was a Sequoia-funded visual search engine also based here.
Ooyala is a video CDN. They use Ruby, Go, Scala and Chef, and they are hiring.
ChefConf 2013: Scaling systems configuration at Facebook – Phil Dibowitz
keywords: opscode, chef