DevOps Interview Questions
Question

Tell me about the worst-run/best-run outage you’ve been a part of. What made it bad/well-run?

Answer

Here is my attempt at answering the DevOps interview question:

The worst-run outage I was involved with occurred when a critical production database server crashed due to a hardware failure. The outage was poorly handled for several reasons:

  1. There was no clear incident commander to coordinate the response. Multiple people tried to lead the effort, which caused confusion and a lack of direction[1].

  2. We did not have an up-to-date runbook with steps to recover the database from backups. The DBA team had to piece together the recovery process on the fly, wasting valuable time[1].

  3. Communication with stakeholders was lacking. Customers were not informed of the outage, and their SLA expectations were not reset. This led to angry calls from customers who were unaware of the situation[1].

  4. Postmortem action items were not tracked or followed up on, so the lessons learned were never implemented to prevent similar outages in the future[3].

In contrast, the best-run outage I experienced was when a major cloud provider had a region-wide outage that impacted our application. The response was well-executed:

  1. An incident commander was quickly established to drive the response and keep everyone fo...


