He talked about their years-long odyssey to monitor their infrastructure, which is now over 5,000 AWS servers (originally Ruby on Rails and Mongo) plus serverless (AWS Lambda).
Like most startups, they don’t have a DBA team because features. So they rely on monitoring during outages to tell them when to run EXPLAIN. 🙂
EXPLAIN is your friend!
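To make that concrete, here is a minimal sketch of what running EXPLAIN buys you. It uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` purely as a stand-in, since the talk did not show a specific query; the `users` table, the `idx_users_email` index, and the `query_plan` helper are all made up for illustration.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

# Hypothetical table, held in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

sql = "SELECT * FROM users WHERE email = ?"
params = ("user42@example.com",)

# Without an index on email, the planner has to scan the whole table.
plan_before = query_plan(conn, sql, params)

# After adding an index, the same query can search the index instead.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, sql, params)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, and other databases have their own EXPLAIN output, but the pattern is the same: a full scan before the index, an index search after.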
They have used or evaluated several products over the years:
- New Relic – likely too expensive
- Kibana – based on logs, so detailed, but poor aggregation and alerting. Can only afford to store 7 days. Dashboard with 100 ms/200 ms and 500s graphs. They use AWS-managed services as much as possible to reduce workload.
- Grafana/Prometheus – too DIY, poor alerting UI
- Datadog – good aggregation and alerting, easiest for new engineers, long retention, but poor detail and granularity. Special VPC routing for eng. security.
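The "100 ms/200 ms and 500s" dashboard was not described in detail. One plausible reading is counts of requests slower than 100 ms and 200 ms, plus HTTP 5xx responses. Under that assumption, the aggregation over parsed log records might look roughly like this; the `duration_ms` and `status` field names and the `summarize` helper are hypothetical.

```python
from collections import Counter

def summarize(records):
    """Count slow requests and server errors in parsed log records."""
    counts = Counter()
    for rec in records:
        counts["total"] += 1
        if rec["duration_ms"] > 100:
            counts["over_100ms"] += 1
        if rec["duration_ms"] > 200:
            counts["over_200ms"] += 1
        if 500 <= rec["status"] < 600:
            counts["5xx"] += 1
    return counts

# Made-up sample records; real ones would come from the log pipeline.
sample = [
    {"duration_ms": 35, "status": 200},
    {"duration_ms": 120, "status": 200},
    {"duration_ms": 250, "status": 502},
    {"duration_ms": 90, "status": 500},
]

summary = summarize(sample)
print(dict(summary))
```

This is the kind of aggregation that is easy in a metrics tool and awkward in a log-based one, which matches the trade-offs in the list above.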
Kibana vs. Datadog – Complementary Features
They now use both Kibana and Datadog, which are literally complementary (see slides), but would like to combine the best of both into one tool. Maybe someday! 🙂
For serverless (AWS Lambda), they use either AWS X-Ray with Datadog, or CloudWatch.
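The AWS link at the end of this post is about patching Python libraries to instrument downstream calls. As a rough illustration of that general idea only, and not the X-Ray SDK's actual API, here is a standard-library sketch that wraps a function so every call is timed and recorded. The `downstream` namespace, `fetch_price`, `instrument`, and `recorded` are all hypothetical names.

```python
import functools
import time
import types

# Hypothetical stand-in for a client library that makes downstream calls.
downstream = types.SimpleNamespace(
    fetch_price=lambda symbol: {"symbol": symbol, "price": 100.0}
)

recorded = []  # a real tracer would ship these to a backend instead

def instrument(namespace, name):
    """Replace namespace.<name> with a wrapper that records call timing."""
    original = getattr(namespace, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(*args, **kwargs)
        except Exception as exc:
            error = type(exc).__name__
            raise
        finally:
            # Runs on both success and failure, so every call is recorded.
            recorded.append({
                "name": name,
                "seconds": time.perf_counter() - start,
                "error": error,
            })

    setattr(namespace, name, wrapper)

instrument(downstream, "fetch_price")
result = downstream.fetch_price("BTC")
print(result, recorded)
```

The calling code does not change at all, which is the appeal of patching: instrumentation is added in one place rather than at every call site.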
As a finance company, Coinbase does spend effort on compliance, though it’s a cloud world now.
Coming soon: AWS Database Week in SF from June 4-6.
AWS: Patching Python Libraries to Instrument Downstream Calls
bloomberg.com: Coinbase Says Chief Operating Officer Has Left Crypto Exchange
Firefox zero-day was used in attack against Coinbase employees, not its users