DevOps Interview Questions
Question

Tell me about the worst-run/best-run outage you’ve been a part of. What made it bad/well-run?

Answer

Here is my attempt at answering the DevOps interview question:

The worst-run outage I was involved with occurred when a critical production database server crashed due to a hardware failure. The outage was poorly handled for several reasons:

  1. There was no clear incident commander to coordinate the response. Several people tried to lead the effort at once, causing confusion and a lack of direction[1].

  2. We did not have an up-to-date runbook with steps to recover the database from backups. The DBA team had to piece together the recovery process on the fly, wasting valuable time[1].

  3. Communication to stakeholders was lacking. Customers were not told about the outage, and SLA expectations were not reset, which led to angry calls from customers who were unaware of the situation[1].

  4. Postmortem actions were not tracked or followed up on. Lessons learned were not implemented to prevent similar outages in the future[3].
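
The four gaps above (no single commander, no stakeholder updates, no tracked follow-ups) can be illustrated with a small incident record. This is only a minimal sketch, not a description of any real tool: the `Incident` and `ActionItem` classes, their fields, and the sample names and dates are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ActionItem:
    """One postmortem follow-up task (hypothetical structure)."""
    description: str
    owner: str
    due: date
    done: bool = False


@dataclass
class Incident:
    """Minimal incident record (hypothetical structure)."""
    title: str
    commander: str  # exactly one person coordinates the response
    updates: list[str] = field(default_factory=list)
    actions: list[ActionItem] = field(default_factory=list)

    def post_update(self, message: str) -> None:
        # Record a stakeholder-facing status update.
        self.updates.append(message)

    def overdue_actions(self, today: date) -> list[ActionItem]:
        # Follow-ups that are still open after their due date.
        return [a for a in self.actions if not a.done and a.due < today]


# Illustrative usage with made-up names and dates.
incident = Incident(title="Primary database down", commander="alice")
incident.post_update("Investigating database outage; next update in 30 minutes.")
incident.actions.append(
    ActionItem("Write and test a restore runbook", "dba-team", date(2024, 1, 15))
)
overdue = incident.overdue_actions(today=date(2024, 2, 1))
print([a.description for a in overdue])
```

Keeping the commander as a single required field and giving every action item an owner and a due date makes the failure modes from the list visible: a missing commander cannot be recorded at all, and forgotten follow-ups show up as overdue.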

In contrast, the best-run outage I experienced was when a major cloud provider had a region-wide outage that impacted our application. The response was well-executed:

  1. An incident commander was quickly established to drive the response and keep everyone fo...
middle

Suggested interview questions

middle

Classify Cloud Platforms by category

expert

How would you introduce Continuous Delivery in a large, successful company where the change from Waterfall to Continuous Delivery would not be trivial because of the size and complexity of the business?

junior

What is Continuous Monitoring?
