Part 1. SRE implementation. Context versus control in SRE -- Interviewing site reliability engineers -- So, you want to build an SRE team? -- Using incident metrics to improve SRE at scale -- Working with third parties shouldn't suck -- How to apply SRE principles without dedicated SRE teams -- SRE without SRE: the Spotify case study -- Introducing SRE in large enterprises -- From SysAdmin to SRE in 8,963 words -- Clearing the way for SRE in the enterprise -- SRE patterns loved by DevOps people everywhere -- DevOps and SRE: voices from the community -- Production engineering at Facebook -- Part 2. Near Edge SRE. In the beginning, there was chaos -- The intersection of reliability and privacy -- Database reliability engineering -- Engineering for data durability -- Introduction to machine learning for SRE -- Part 3. SRE best practices and technologies. Do docs better: integrating documentation into the engineering workflow -- Active teaching and learning -- The art and science of the service-level objective -- SRE as a success culture -- SRE antipatterns -- Immutable infrastructure and SRE -- Scriptable load balancers -- The service mesh: wrangler of your microservices? -- Part 4. The human side of SRE. Psychological safety in SRE -- SRE cognitive work -- Beyond burnout -- Against on-call: a polemic -- Elegy for complex systems -- Intersections between operations and social activism -- Conclusion.
This resource is supported by the Institute of Museum and Library Services under the provisions of the Library Services and Technology Act, as administered by the State Library of Iowa.