DevOps Interview Question
Question

Tell me about the worst-run/best-run outage you’ve been a part of. What made it bad/well-run?

Answer

Here is my attempt at answering the DevOps interview question:

The worst-run outage I was involved with occurred when a critical production database server crashed due to a hardware failure. The outage was poorly handled for several reasons:

  1. There was no clear incident commander to coordinate the response. Several people tried to lead the effort at once, which caused confusion and a lack of direction[1].

  2. We did not have an up-to-date runbook with steps to recover the database from backups. The DBA team had to piece together the recovery process on the fly, wasting valuable time[1].

  3. Communication to stakeholders was lacking. Customers were not told about the outage, and SLA expectations were not reset, so we received angry calls from customers who had no idea what was happening[1].

  4. Postmortem actions were not tracked or followed up on. Lessons learned were not implemented to prevent similar outages in the future[3].
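
The follow-up gap in point 4 is the easiest one to illustrate. Below is a minimal Python sketch of tracking postmortem action items, assuming each item has an owner and a due date. The `ActionItem` class, the `overdue_items` helper, and the sample data are all hypothetical and are not taken from the incident itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    """One follow-up task from a postmortem (hypothetical structure)."""
    description: str
    owner: str
    due: date
    done: bool = False


def overdue_items(items, today):
    """Return open action items whose due date has already passed."""
    return [item for item in items if not item.done and item.due < today]


# Illustrative data only; owners and dates are made up.
items = [
    ActionItem("Write database restore runbook", "dba-team", date(2024, 1, 15)),
    ActionItem("Set up incident commander rotation", "sre-team", date(2024, 2, 1), done=True),
    ActionItem("Draft customer status-page template", "support", date(2024, 3, 1)),
]

for item in overdue_items(items, today=date(2024, 2, 10)):
    print(f"OVERDUE: {item.description} (owner: {item.owner}, due {item.due})")
```

Reviewing a list like this on a regular schedule is one simple way to keep lessons learned from being dropped. A real team would more likely track these items in its existing ticketing system.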

In contrast, the best-run outage I experienced was when a major cloud provider had a region-wide outage that impacted our application. The response was well-executed:

  1. An incident commander was quickly established to drive the response and keep everyone fo...