Design and Deployment Practices for Large-Scale Distributed Systems

Paper: http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf

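This paper looks at large-scale distributed systems from an operations-friendly perspective and lays out principles and best practices for designing and deploying them.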

Overall Design Principles

We have long believed that 80% of operations issues originate in design and development, so this section on overall service design is the largest and most important.

When systems fail, there is a natural tendency to look first to operations since that is where the problem actually took place. Most operations issues, however, either have their genesis in design and development or are best solved there.

The basic operations-friendly principles that have the greatest impact on overall service design are:

  1. Keep things simple and robust
  2. Design for failure