talk-what-i-wish-i-had-known-before-scaling-uber

https://news.ycombinator.com/item?id=12597232

talk-what-i-wish-i-had-known-before-scaling-uber#stability-of-service-leave-alone At 5:38: a surprising benefit of microservices is that you don't have to touch them. You can leave many of them alone after deploying them. That sounds good, but the flip side, which he talks about at 39:30, is that when someone needs to make a cross-cutting change, they may be forced to update six-month-old code on a service that was never changed to keep up with the rest of the services (since it was working fine). talk-what-i-wish-i-had-known-before-scaling-uber#stability-of-service-leave-alone

Teams "own their own uptime"

talk-what-i-wish-i-had-known-before-scaling-uber#trade-complexity-for-politics At 9:01: you might trade complexity for politics. Basically, you build a new service rather than deal with telling other people how bad the old code is. talk-what-i-wish-i-had-known-before-scaling-uber#trade-complexity-for-politics

talk-what-i-wish-i-had-known-before-scaling-uber#keep-your-biases At 9:30: with multi-language microservices, you get to keep your biases. If you like a specific language, you can use it and still interface with people working in other languages. talk-what-i-wish-i-had-known-before-scaling-uber#keep-your-biases

talk-what-i-wish-i-had-known-before-scaling-uber#fragment-culture Having microservices in multiple languages fragments the culture. People say "oh, I'm a Go programmer" or "oh, I'm a Java programmer." talk-what-i-wish-i-had-known-before-scaling-uber#fragment-culture

talk-what-i-wish-i-had-known-before-scaling-uber#json-a-mess-at-scale At 13:30, he starts talking about HTTP between services and some of its issues. He specifically calls out JSON: because it is untyped, it can become a big mess at scale. talk-what-i-wish-i-had-known-before-scaling-uber#json-a-mess-at-scale
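To make the "no types" point concrete, here is a minimal Python sketch (the `TripUpdate` message and its fields are hypothetical, not from the talk). Plain JSON decoding happily accepts a fare sent as a string, while decoding into a declared type rejects it at the service boundary. In practice this checking usually comes from a schema or IDL (for example Thrift or Protocol Buffers) rather than hand-written code like this.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TripUpdate:
    # Hypothetical message schema, for illustration only.
    trip_id: str
    fare_cents: int


def decode_untyped(payload: str) -> dict:
    # Plain JSON: any shape is accepted, so type errors surface somewhere downstream.
    return json.loads(payload)


def decode_typed(payload: str) -> TripUpdate:
    # Check every declared field's presence and type before building the object.
    raw = json.loads(payload)
    for f in fields(TripUpdate):
        if f.name not in raw:
            raise ValueError(f"missing field {f.name!r}")
        value = raw[f.name]
        if not isinstance(value, f.type):
            raise TypeError(
                f"{f.name!r} should be {f.type.__name__}, got {type(value).__name__}"
            )
    return TripUpdate(**{f.name: raw[f.name] for f in fields(TripUpdate)})
```

With `'{"trip_id": "t1", "fare_cents": "1250"}'`, `decode_untyped` returns the string `"1250"` without complaint, and `decode_typed` raises `TypeError` immediately.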

talk-what-i-wish-i-had-known-before-scaling-uber#if-you-own-it-make-it-a-function-call At 16:03, he wraps up that topic by saying that if you own both sides of the interaction, you should treat it as a function call, not as a server talking to, basically, a web browser. talk-what-i-wish-i-had-known-before-scaling-uber#if-you-own-it-make-it-a-function-call

At 19:30: is your automation good enough that other teams can deploy to your service, or do they have to wait on you?

talk-what-i-wish-i-had-known-before-scaling-uber#same-dashboard At 22:00: every service should have the same dashboard, and it should be created automatically. talk-what-i-wish-i-had-known-before-scaling-uber#same-dashboard
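One way to read "created automatically" is that every dashboard is generated from a single template keyed by service name. A minimal Python sketch; the panel titles, the metric naming scheme, and `standard_dashboard` itself are all hypothetical.

```python
from typing import Dict, List

# Hypothetical panels every service gets; metric names are illustrative only.
STANDARD_PANELS = [
    ("Request rate", "{service}.requests.count"),
    ("Error rate", "{service}.errors.count"),
    ("p99 latency", "{service}.latency.p99"),
]


def standard_dashboard(service: str) -> Dict[str, object]:
    """Build the same dashboard layout for any service from one shared template."""
    panels: List[Dict[str, str]] = [
        {"title": title, "query": query.format(service=service)}
        for title, query in STANDARD_PANELS
    ]
    return {"title": f"{service} overview", "panels": panels}
```

Because every dashboard comes from the same template, someone on call for an unfamiliar service finds the same panels in the same places, and a new service gets its dashboard without anyone building one by hand.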

talk-what-i-wish-i-had-known-before-scaling-uber#distributed-tracing At 26:40: use distributed tracing to figure out issues and to see how a request fans out.

At 31:20: tracing requires cross-language context propagation. talk-what-i-wish-i-had-known-before-scaling-uber#distributed-tracing
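Here is a minimal Python sketch of context propagation, assuming a made-up header format (`x-trace-id`, `x-parent-span-id`). Real systems agree on a shared format, such as W3C Trace Context's `traceparent` header or a tracer's own, which is what makes propagation work across languages. Inside one process the current span lives in a `contextvars.ContextVar`. Across a call it travels as plain string headers that the callee, in whatever language, turns back into a parent.

```python
import contextvars
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str


# Finished spans; a real tracer would ship these to a tracing backend instead.
FINISHED: List[Span] = []
# The span active in the current thread or asyncio task.
_current_span = contextvars.ContextVar("current_span", default=None)

# Hypothetical wire format: plain string headers any language can read and write.
TRACE_HEADER = "x-trace-id"
SPAN_HEADER = "x-parent-span-id"


@contextmanager
def start_span(name: str, incoming: Optional[Dict[str, str]] = None) -> Iterator[Span]:
    """Open a span that is a child of the local current span or of a remote caller's."""
    parent = _current_span.get()
    if parent is not None:
        trace_id, parent_id = parent.trace_id, parent.span_id
    elif incoming and TRACE_HEADER in incoming:
        trace_id, parent_id = incoming[TRACE_HEADER], incoming.get(SPAN_HEADER)
    else:
        trace_id, parent_id = uuid.uuid4().hex, None  # no parent: start a new trace
    span = Span(trace_id, uuid.uuid4().hex[:16], parent_id, name)
    token = _current_span.set(span)
    try:
        yield span
    finally:
        _current_span.reset(token)
        FINISHED.append(span)


def outgoing_headers() -> Dict[str, str]:
    """Headers to attach to an outbound call so the callee can continue the trace."""
    span = _current_span.get()
    if span is None:
        return {}
    return {TRACE_HEADER: span.trace_id, SPAN_HEADER: span.span_id}
```

In a real deployment, `start_span("eta.compute", incoming=headers)` would run in a different process, possibly written in another language. The only thing it shares with the caller is the header format.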

At 33:10: they started putting backpressure on logging, so log messages are dropped if something starts logging too much.
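A common way to implement this kind of backpressure is a token bucket. The Python sketch below uses one as a `logging.Filter` that drops records over a rate limit and counts the drops. The talk doesn't say how Uber did it; the token bucket, the parameters, and the class name are assumptions.

```python
import logging
import threading
import time
from typing import Callable


class RateLimitFilter(logging.Filter):
    """Token bucket: pass at most `rate` records/second, with bursts up to `burst`.

    Records over the limit are dropped (and counted) rather than slowing the service.
    """

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic) -> None:
        super().__init__()
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()
        self.dropped = 0
        self._lock = threading.Lock()  # handlers may be called from many threads

    def filter(self, record: logging.LogRecord) -> bool:
        with self._lock:
            now = self.clock()
            # Refill for the time elapsed since the last record, capped at the burst size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            self.dropped += 1
            return False
```

Attach it to a handler, e.g. `handler.addFilter(RateLimitFilter(rate=100, burst=500))` (illustrative numbers). A filter on a logger only sees records logged directly on that logger, not ones propagated from child loggers, which is why the handler is the better place for it. Exporting `dropped` as a metric shows when a service is being throttled.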

talk-what-i-wish-i-had-known-before-scaling-uber#accounting-in-logs At 30: some kind of accounting for the logs, i.e. keeping track of how much each service logs. talk-what-i-wish-i-had-known-before-scaling-uber#accounting-in-logs
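Log accounting can be sketched as a pass-through `logging.Filter` that tallies records and bytes per logger. This sketch treats the logger name as the "service", which is an assumption, and the class name is hypothetical.

```python
import logging
from collections import Counter


class LogAccounting(logging.Filter):
    """Count records and message bytes per logger name; never drops anything."""

    def __init__(self) -> None:
        super().__init__()
        self.records: Counter = Counter()
        self.bytes: Counter = Counter()

    def filter(self, record: logging.LogRecord) -> bool:
        self.records[record.name] += 1
        # getMessage() applies the %-style args, so this is the size actually written.
        self.bytes[record.name] += len(record.getMessage().encode("utf-8"))
        return True
```

Attached to a handler and exported periodically as metrics, these counters would let each team see how much it logs before any limit kicks in.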

talk-what-i-wish-i-had-known-before-scaling-uber#structured-logging At 34:50: zap, Uber's open-source structured logging library for Go. talk-what-i-wish-i-had-known-before-scaling-uber#structured-logging
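zap itself is a Go library, so this is not its API. It is a minimal Python standard-library sketch of the same idea: each log line is a single machine-parseable object with typed key/value fields instead of free-form text. Passing fields via `extra={"fields": {...}}` is a convention assumed here.

```python
import json
import logging


class JSONFormatter(logging.Formatter):
    """Render each record as one JSON object with structured key/value fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields arrive as record.fields via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)
```

Usage: `logger.warning("slow request", extra={"fields": {"latency_ms": 812}})`. The number stays a number in the output, so log tooling can filter on `latency_ms > 500` without parsing text.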

At 35:20: there's no way to create a test environment that's the same as production, and no way to simulate production's load.

talk-what-i-wish-i-had-known-before-scaling-uber#load-testing-in-production At 36:05: load testing in production during slow periods requires context propagation. The request must tell the system that it is a test request so that, for example, it does not increment the counters. talk-what-i-wish-i-had-known-before-scaling-uber#load-testing-in-production
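Here is a minimal Python sketch of how a test flag might be carried. The header name `x-test-request`, the `COUNTERS` dict, and both functions are hypothetical. The request runs the same code path either way, so the load is real. Only business side effects such as counters are skipped, and the flag is forwarded so downstream services behave the same way.

```python
import contextvars
from typing import Dict

# Hypothetical header marking synthetic load-test traffic; every service must forward it.
TEST_HEADER = "x-test-request"
_is_test = contextvars.ContextVar("is_test", default=False)

# Business counters that synthetic traffic must not inflate (illustrative).
COUNTERS: Dict[str, int] = {"trips_completed": 0}


def downstream_headers() -> Dict[str, str]:
    """Forward the test flag so downstream services also skip side effects."""
    return {TEST_HEADER: "1"} if _is_test.get() else {}


def handle_request(headers: Dict[str, str]) -> Dict[str, str]:
    """Serve one request; returns the headers this service would send downstream."""
    token = _is_test.set(headers.get(TEST_HEADER) == "1")
    try:
        # ... the real work runs here for both kinds of traffic ...
        if not _is_test.get():
            COUNTERS["trips_completed"] += 1
        return downstream_headers()
    finally:
        _is_test.reset(token)
```

The weak point is the forwarding. If any service in the chain drops the header, everything after it treats test traffic as real. That is why the note ties load testing to context propagation.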

talk-what-i-wish-i-had-known-before-scaling-uber#design-systems-with-test-load-in-mind At 36:50: since many of their bugs show up when they are near peak traffic, they like to use test traffic to keep themselves near peak load. They wish they had designed this in as a fundamental part of the system. talk-what-i-wish-i-had-known-before-scaling-uber#design-systems-with-test-load-in-mind
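The core arithmetic of "use test traffic to stay near peak" is topping up real traffic to a target. This hypothetical helper works in requests per second and includes a safety cap. Both the cap and the function are assumptions, not something described in the talk.

```python
def synthetic_rate(organic_rps: float, target_rps: float, max_synthetic_rps: float) -> float:
    """Synthetic test traffic needed to hold total load near `target_rps`.

    Never negative (real traffic above target gets no extra load), and capped so a
    misconfigured target cannot flood the system.
    """
    needed = target_rps - organic_rps
    return max(0.0, min(needed, max_synthetic_rps))
```

For example, with a 10,000 rps target and 6,000 rps of real traffic at night, the generator would add 4,000 rps. At daytime peak it would add nothing.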

talk-what-i-wish-i-had-known-before-scaling-uber#failure-testing-a-prerequisite-part-of-design At 37:45: failure testing should be built in from the start. Nobody wants to bolt it on after the fact. talk-what-i-wish-i-had-known-before-scaling-uber#failure-testing-a-prerequisite-part-of-design

At 39:30: the problem with microservices that have been deployed and left unchanged for a long time, because they're working, is that occasionally someone needs to make a cross-cutting change. By then the microservice is far behind, and that increases the migration cost.

talk-what-i-wish-i-had-known-before-scaling-uber#migration-mandates At 40:15: mandates to migrate are bad. Instead, the new systems should be so much better that people want to move to them. talk-what-i-wish-i-had-known-before-scaling-uber#migration-mandates

talk-what-i-wish-i-had-known-before-scaling-uber#build-buy-tradeoff At 41:15: the build/buy trade-off. talk-what-i-wish-i-had-known-before-scaling-uber#build-buy-tradeoff

talk-what-i-wish-i-had-known-before-scaling-uber#breaking-up-allows-people-to-play-politics At 42:50: breaking services up allows people to play politics. talk-what-i-wish-i-had-known-before-scaling-uber#breaking-up-allows-people-to-play-politics

At 44:40: trade-offs are being made, and sometimes things would just drift in a direction without him thinking explicitly about which trade-offs were being made.

talk-what-i-wish-i-had-known-before-scaling-uber#failure-testing-find-coupling At 45:50, use failure testing to identify unintended service coupling. talk-what-i-wish-i-had-known-before-scaling-uber#failure-testing-find-coupling
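Here is a minimal in-process sketch of using failure injection to find coupling. The sketch takes each dependency down in turn and flags any dependency whose failure breaks an endpoint that isn't supposed to need it. All the names are hypothetical, and it assumes failures surface as one exception type. Real failure testing would inject faults at the network or infrastructure level.

```python
from typing import Callable, Dict, List, Set

Deps = Dict[str, Callable[[], str]]


class DependencyDown(Exception):
    """Raised by an injected fault in place of a real dependency call."""


def find_unexpected_coupling(endpoint: Callable[[Deps], str], deps: Deps, expected: Set[str]) -> List[str]:
    """Take each dependency down in turn; report those that break the endpoint
    even though it is not supposed to need them."""
    surprising: List[str] = []
    for name in deps:
        def down(dep: str = name) -> str:
            raise DependencyDown(dep)

        try:
            endpoint({**deps, name: down})
        except DependencyDown:
            if name not in expected:
                surprising.append(name)
    return surprising
```

Suppose an ETA endpoint quietly hard-depends on a promotions service. Taking promotions down breaks ETAs, and the check reports `["promotions"]`. Nobody would have listed that dependency in a design document.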
