GitLab Validating Ceph in Production For Me

[Figure: “Spikes are Outages.” OSD = Ceph Object Storage Daemon]

  • It would be easy for me to criticize GitLab for running a distributed file system, especially Ceph, in production on AWS. I just wouldn’t roll that way.
  • And it would be easy for me to say “I told you so” again about AWS latency being a performance killer. It’s physics.

    After all, when you yoke a bunch of water buffalo together, your team is only as fast as the slowest buffalo.
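The buffalo analogy maps directly onto synchronous replication: in a replicated Ceph pool, the primary OSD acknowledges a client write only after the replica OSDs have acknowledged it, so the client waits roughly as long as the slowest replica. Here is a minimal simulation of that effect. The latency numbers (a 1% chance per OSD of a 200 ms stall, otherwise 1 to 3 ms) are made-up assumptions for illustration, not measurements from GitLab or anyone else.

```python
import random

SPIKE_PROB = 0.01         # assumed chance that one OSD stalls on a given write
SPIKE_MS = 200.0          # assumed stall duration in milliseconds
NORMAL_MS = (1.0, 3.0)    # assumed normal per-OSD latency range in milliseconds


def osd_latency_ms(rng: random.Random) -> float:
    """Latency of one OSD for one write, drawn from the assumed distribution."""
    if rng.random() < SPIKE_PROB:
        return SPIKE_MS
    return rng.uniform(*NORMAL_MS)


def replicated_write_ms(rng: random.Random, replicas: int) -> float:
    """A synchronous replicated write is acked only when every replica has acked,
    so the slowest OSD (the slowest buffalo) sets the pace."""
    return max(osd_latency_ms(rng) for _ in range(replicas))


def simulated_spike_rate(replicas: int, trials: int = 100_000, seed: int = 42) -> float:
    """Fraction of simulated writes that hit at least one stall."""
    rng = random.Random(seed)
    slow = sum(replicated_write_ms(rng, replicas) >= SPIKE_MS for _ in range(trials))
    return slow / trials


def expected_spike_rate(replicas: int) -> float:
    """Closed form: P(at least one of n independent OSDs stalls) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - SPIKE_PROB) ** replicas


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(f"{n:2d} replicas: simulated {simulated_spike_rate(n):.2%}, "
              f"expected {expected_spike_rate(n):.2%}")
```

Under these assumptions, a 1% per-OSD stall rate turns into about 3% of writes at three replicas (1 − 0.99³ ≈ 0.0297), because each write has three chances to land on a slow buffalo.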

    But I find it fascinating and convenient that they’re doing all that distributed file system testing for me. Thanks, guys! 🙂

    On the plus side, supporting a distributed file system is almost possible on homogeneous hardware …

    Here’s some free consulting from somebody who works in data centers of x,000 to xx,000 servers:

    1. buy hardware compatible with Ceph
    2. use 10 Gbps switch ports
    3. use cluster-dedicated switches
    4. hire somebody already doing it now
    5. don’t goof up your health checks. Include all healthy servers, not just the healthiest one
    6. or instead of using Ceph or Gluster, do it right. Implement Backblaze’s object store design. Invert the problem from “the network and the OSDs have to always work” to something tractable like “my HTTP API has to work most of the time”. And use a combination of an Arista Clos network design and HAProxy as the mesh router to avoid network hotspots and SPOFs. Non-blocking and “Propah!” with multi-terabits per second sustained throughput! Now we’re talking! 😎
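Items 5 and 6 fit together: a health check that keeps every passing server in rotation, plus a client that treats a single HTTP failure as routine and simply retries on another healthy server. Below is a minimal sketch of both ideas. The `Server` record, `healthiest_only`, `healthy_pool`, `put_with_retries`, and the caller-supplied `send` function (standing in for the real HTTP request) are all hypothetical names for illustration; this is not Backblaze’s or HAProxy’s actual logic.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Server:
    name: str
    healthy: bool       # result of the last health check (hypothetical)
    latency_ms: float   # latency measured by that health check


def healthiest_only(servers: Sequence[Server]) -> List[Server]:
    """The goof: keep only the single fastest healthy server, creating a hotspot."""
    up = [s for s in servers if s.healthy]
    return [min(up, key=lambda s: s.latency_ms)] if up else []


def healthy_pool(servers: Sequence[Server]) -> List[Server]:
    """The fix: keep every server that passed its health check in rotation."""
    return [s for s in servers if s.healthy]


def put_with_retries(pool: Sequence[Server],
                     send: Callable[[Server], T],
                     attempts: int = 3,
                     rng: Optional[random.Random] = None) -> T:
    """Try up to `attempts` distinct healthy servers in random order.

    One failed server costs a retry, not an outage: the API only has to
    work "most of the time".
    """
    rng = rng or random.Random()
    candidates = list(pool)
    rng.shuffle(candidates)  # spread load across the whole pool
    last_error: Optional[Exception] = None
    for server in candidates[:attempts]:
        try:
            return send(server)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and move on to the next server
    raise RuntimeError("no healthy server accepted the request") from last_error
```

Spreading requests across the whole healthy pool avoids piling everything onto one “best” server, and the retry loop turns a dead server into a slightly slower request rather than a failed one.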

    “There is a threshold of performance on the cloud and if you need more, you will have to pay a lot more, be punished with latencies, or leave the cloud.” (from How We Knew It Was Time to Leave the Cloud)
    HN discussion (with Cloud Apologists)
    Proposed server purchase for HN

  • This entry was posted in Cassandra, Cloud, Open Source, Storage, Tech, Toys.
