In the dynamic world of modern technology, maintaining a seamless user experience is paramount. Anything that can go wrong will, at some point, go wrong. It’s during these crucial moments that an on-call rotation proves to be the backbone of a well-functioning tech team. At Spreaker, we recognize the significance of this responsibility and have carefully crafted an on-call policy that ensures swift incident resolution and promotes the well-being of our engineers.
The Foundation: Policy Overview
Unified Teams, Unifying Mission
Our on-call teams stand as the vanguard of our service reliability. They are not merely teams; they are the guardians of the digital realm, ensuring that our apps and services remain uninterrupted.
Rotation Rationale
We use a weekly rotation: each on-call shift starts on Friday at 5 pm CEST. For the duration of the shift, one engineer from each team is designated as that team’s on-call engineer. Their responsibility is clear: to promptly address incidents related to their team’s domain. During regular working hours, the engineer handles alerts of all severity levels; outside these hours, they focus on critical alerts.
To fortify this system, a secondary team is on standby to provide support for unacknowledged incidents, thereby ensuring that every situation receives the attention it merits.
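As a rough illustration of the rotation described above, the following Python sketch finds the start of the current shift and picks an engineer from a roster. It is not our actual scheduling tool: the names `shift_start`, `on_call_engineer`, `should_handle`, the roster, and the `anchor` date are all hypothetical, and a fixed UTC+2 offset is assumed in place of a real time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+2 offset stands in for CEST; a real scheduler
# would use a proper time zone so the winter switch to CET is handled.
CEST = timezone(timedelta(hours=2), "CEST")
FRIDAY = 4  # datetime.weekday(): Monday is 0


def shift_start(now: datetime) -> datetime:
    """Return the most recent Friday 17:00 CEST at or before `now`."""
    local = now.astimezone(CEST)
    days_back = (local.weekday() - FRIDAY) % 7
    start = (local - timedelta(days=days_back)).replace(
        hour=17, minute=0, second=0, microsecond=0
    )
    if start > local:  # it is Friday, but before the 17:00 handover
        start -= timedelta(days=7)
    return start


def on_call_engineer(now: datetime, roster: list, anchor: datetime) -> str:
    """Pick the engineer for the shift containing `now`.

    `anchor` is the (hypothetical) start of a shift held by roster[0].
    """
    weeks = (shift_start(now) - anchor) // timedelta(days=7)
    return roster[weeks % len(roster)]


def should_handle(severity: str, in_working_hours: bool) -> bool:
    """All severities during working hours, critical alerts only otherwise."""
    return in_working_hours or severity == "critical"
```

What counts as regular working hours is left to the caller here, since the exact hours depend on the team.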
Navigating Challenges: Escalation Strategy
In the realm of on-call, time is of the essence. Our escalation strategy is designed to minimize delays and make sure every incident reaches someone who can resolve it.
- Primary Notification: The primary on-call engineer is immediately notified of an incident.
- Backup Notification: Should the primary engineer fail to acknowledge the alert within 30 minutes, the backup on-call team is engaged.
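The two escalation steps above boil down to a single timeout check. The sketch below is only an illustration: `escalation_target` and `ACK_TIMEOUT` are hypothetical names, and treating the exact 30-minute mark as already escalated is an assumption.

```python
from datetime import timedelta

# The 30-minute acknowledgement window from the policy above.
ACK_TIMEOUT = timedelta(minutes=30)


def escalation_target(unacknowledged_for: timedelta) -> str:
    """Return who should be handling an alert that is still unacknowledged."""
    # Assumption: the backup team is engaged at exactly 30 minutes, not after.
    return "backup" if unacknowledged_for >= ACK_TIMEOUT else "primary"
```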
Staying Connected: Notification Protocol
In a hyper-connected world, communication is key. Our notification protocol stands as a testament to our commitment to timely updates.
For New and Assigned Alerts:
- Immediate notification via Slack.
- Immediate notification via Push notification.
- SMS notification after 1 minute if unacknowledged.
- Repeated phone calls at the 3-, 10-, 15-, 20-, and 25-minute marks if still unacknowledged.
At Schedule Start/End:
- Immediate notification via Slack.
- Immediate notification via Push notification.
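The notification steps for new and assigned alerts can be written down as a small timeline. The Python sketch below is a simplified model, not the configuration of our alerting tool: `NEW_ALERT_STEPS` and `notifications_sent` are hypothetical names, and it assumes that a step scheduled for the exact minute of acknowledgement is not sent.

```python
from typing import List, Optional, Tuple

# (delay in minutes, channel) for new and assigned alerts, as listed above.
NEW_ALERT_STEPS: List[Tuple[int, str]] = [
    (0, "slack"),
    (0, "push"),
    (1, "sms"),
    (3, "phone"),
    (10, "phone"),
    (15, "phone"),
    (20, "phone"),
    (25, "phone"),
]


def notifications_sent(ack_after_minutes: Optional[float]) -> List[Tuple[int, str]]:
    """Return the steps that fire before the alert is acknowledged.

    Pass None for an alert that is never acknowledged.
    """
    return [
        (delay, channel)
        for delay, channel in NEW_ALERT_STEPS
        # Immediate steps always fire; later ones only while unacknowledged.
        if delay == 0 or ack_after_minutes is None or delay < ack_after_minutes
    ]
```

For example, an alert acknowledged two minutes in would have triggered the Slack message, the push notification, and the SMS, but no phone call.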
Recovery and Recognition: A Nurturing Approach
We understand that the on-call responsibility can be demanding, even in the absence of active incidents. As advocates for our engineers’ well-being, we’ve implemented a system that not only resolves incidents but also rewards dedication.
At Spreaker, we believe in acknowledging the efforts of our engineers, whether or not incidents occurred during their on-call week. Upon completing an on-call shift, engineers are entitled to a mandatory, fully paid day off on the immediately following Monday. This day off is separate from the yearly allowance and serves as a rejuvenating pause after the week’s responsibilities. Importantly, this extra day cannot be rescheduled and does not accumulate.
This additional day off carries immense value beyond relaxation. It symbolizes a pause for reflection, a moment to appreciate the impact of proactive architectural decisions. It’s a tangible reward for choosing foresight over firefighting, and it encourages our engineers to continually enhance the robustness of our systems.
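For anyone wiring this rule into a calendar or HR tool, the recovery day is simply the first Monday after the shift ends. A minimal sketch, with `recovery_day` as a hypothetical helper name:

```python
from datetime import date, timedelta


def recovery_day(shift_end: date) -> date:
    """Return the first Monday strictly after the day the shift ends."""
    # weekday(): Monday is 0. "or 7" skips a full week if the shift ends on a Monday.
    days_ahead = (7 - shift_end.weekday()) % 7 or 7
    return shift_end + timedelta(days=days_ahead)
```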
In Closing
Our on-call policy stands as a testament to our commitment to reliability and the welfare of our engineers. By fostering a culture of responsibility, recognition, and communication, we ensure that our digital landscape remains resilient, and our team members are empowered to navigate challenges with vigor. At Spreaker, we don’t just resolve incidents; we empower our engineers to embrace their roles and continue striving for excellence.