On-call Playbook
Common incidents and how to respond.
"Calls aren't appearing for company X"
- Check the background job dashboard: is the Five9 sync running and succeeding?
- Check whether the company's Five9 campaign name in the portal exactly matches the campaign name configured in Five9.
- Check Five9 directly — are calls actually landing there?
- If sync is failing systemwide, check Sentry for errors and escalate to engineering.
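The exact-match check above is easy to get wrong by eye, because trailing spaces and case differences are hard to see. Below is a minimal Python sketch of that comparison. The function name and the sample campaign names are hypothetical, and it assumes you have copied both names out by hand; it does not call the portal or Five9.

```python
def diagnose_campaign_name(portal_name: str, five9_name: str) -> str:
    """Explain why two campaign names do or do not match exactly."""
    if portal_name == five9_name:
        return "exact match"
    # Leading or trailing spaces are the most common invisible difference.
    if portal_name.strip() == five9_name.strip():
        return "differs only by leading/trailing whitespace"
    # casefold() is a stricter lower() for case-insensitive comparison.
    if portal_name.strip().casefold() == five9_name.strip().casefold():
        return "differs by letter case"
    return "different names"


# Hypothetical example: a trailing space pasted into the portal field.
print(diagnose_campaign_name("Acme Inbound ", "Acme Inbound"))
```

Anything other than "exact match" means the name in the portal should be corrected before looking further.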
"Recording audio won't play"
- Check the S3 audio move job in the background job dashboard.
- Wait 15 minutes (one cycle) and try again.
- If still missing, the audio may not have been delivered by Five9. Check their SFTP server (engineering has access).
"An invoice is wrong"
- Open the invoice — check the line items.
- Cross-reference with the period's call data in the Calls view.
- If the call count looks right but the rate is wrong, it's a Stripe configuration issue. Fix it in Stripe.
- If the call count is wrong, dig into Calls and look for missing or duplicate records.
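The triage above can be sketched in a few lines of Python. This is an illustration only: the function names, the integer-cents rate, and the idea of exporting call IDs into a list are all assumptions, not features of the portal.

```python
from collections import Counter


def duplicate_call_ids(call_ids):
    """Return call IDs that appear more than once, in first-seen order."""
    counts = Counter(call_ids)
    return [call_id for call_id, n in counts.items() if n > 1]


def triage_invoice(invoiced_calls, actual_calls, invoiced_rate_cents, expected_rate_cents):
    """Mirror the playbook: check the call count first, then the rate."""
    if invoiced_calls != actual_calls:
        return "call count mismatch: look for missing or duplicate records in Calls"
    if invoiced_rate_cents != expected_rate_cents:
        return "rate mismatch: Stripe configuration issue, fix in Stripe"
    return "count and rate both match: look elsewhere"


# Hypothetical data: the invoice billed 12 calls, the Calls view shows 10.
print(triage_invoice(12, 10, 150, 150))
print(duplicate_call_ids(["c1", "c2", "c1", "c3"]))
```

Rates are kept as integer cents so that the equality check is not affected by floating-point rounding.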
"A spike alert seems wrong"
- Open the call-spike detail.
- Look at the actual call list during the window.
- If the calls are clearly noise (one source flooding), document and acknowledge.
- If the detector is consistently wrong for a particular company, escalate to engineering — the model may need tuning for that company's pattern.
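One way to make "one source flooding" concrete is to measure what share of the calls in the window came from the single busiest source. The sketch below does that in Python. The function name, the 80% threshold, and the idea of a per-call source value (for example, a caller number) are assumptions for illustration; pick a threshold that suits your own traffic.

```python
from collections import Counter


def dominant_source(sources, threshold=0.8):
    """Return (source, share) if one source makes up at least `threshold`
    of the calls in the window, otherwise None."""
    if not sources:
        return None
    source, count = Counter(sources).most_common(1)[0]
    share = count / len(sources)
    return (source, share) if share >= threshold else None


# Hypothetical window: nine calls from one number, one from another.
print(dominant_source(["+15550001"] * 9 + ["+15550002"]))
```

A non-None result supports the "document and acknowledge" path; a None result suggests the spike is spread across many sources and deserves a closer look.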
"A user can't log in"
- Check whether their account is active (not discarded).
- Check whether their email is confirmed.
- Check whether they're stuck at a 2FA prompt they can't pass. If so, verify their identity first, then disable 2FA (see Security).
- Resend invitation if their original was never accepted.
- If still broken, escalate to engineering.
"Stripe webhook events look stuck"
Engineering territory — escalate. Note the time and the affected entity in your handoff.
"All recordings dashboard widgets are blank"
- Check whether the Five9 sync job is running (Sidekiq dashboard).
- Check whether Five9 itself is up.
- Wait 15 minutes for the next cycle.
- If still blank for multiple companies, escalate.
"An integrator's API calls are returning 401"
- Check whether their OAuth token has expired (Doorkeeper tracks issuance and expiration).
- Have them re-run the OAuth authorization flow to get a fresh token.
- If the token is valid but calls still return 401, check whether the user behind the token is still active.
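The expiry check in the first step comes down to simple date arithmetic. The Python sketch below assumes the token record exposes an issue time, a lifetime in seconds, and an optional revocation time, which is how Doorkeeper's default access-token schema (`created_at`, `expires_in`, `revoked_at`) is laid out as far as we know; verify the field names against your own schema. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def token_status(created_at, expires_in, revoked_at=None, now=None):
    """Classify a token as 'revoked', 'expired', or 'valid'.

    expires_in is a lifetime in seconds; None means the token never expires.
    """
    now = now or datetime.now(timezone.utc)
    if revoked_at is not None and revoked_at <= now:
        return "revoked"
    if expires_in is not None and created_at + timedelta(seconds=expires_in) <= now:
        return "expired"
    return "valid"


# Hypothetical token issued at noon UTC with a two-hour lifetime.
issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(token_status(issued, 7200, now=issued + timedelta(hours=3)))
```

A result of "valid" points to the last step in the list: the token is fine, so check the user behind it.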
"The portal is slow"
- Check Sentry for performance issues.
- Check the Sidekiq dashboard — is one queue backed up?
- Check whether the Hetzner box is under load (engineering has direct access).
- Escalate if not resolved in 15 minutes.
"A customer says they were charged twice"
- Open Stripe directly — are there really two successful charges?
- If yes, refund the duplicate in Stripe.
- The portal will reflect the refund on the next sync.
- Document the cause — usually a webhook retry or a duplicate Stripe customer.
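When a customer has many charges, a quick filter helps confirm whether two of them are really duplicates. The Python sketch below works on a list of charge records you have already exported; it does not call Stripe. The record fields (`id`, `customer`, `amount`, `status`, `created`) follow the shape of Stripe charge objects, but the function name, the sample data, and the 10-minute window are assumptions for illustration.

```python
from collections import defaultdict


def find_duplicate_charges(charges, window_seconds=600):
    """Return (earlier_id, later_id) pairs of successful charges with the
    same customer and amount, created within window_seconds of each other."""
    groups = defaultdict(list)
    for charge in charges:
        if charge["status"] == "succeeded":
            groups[(charge["customer"], charge["amount"])].append(charge)

    pairs = []
    for group in groups.values():
        group.sort(key=lambda c: c["created"])
        # Compare each charge with the next one in time order.
        for earlier, later in zip(group, group[1:]):
            if later["created"] - earlier["created"] <= window_seconds:
                pairs.append((earlier["id"], later["id"]))
    return pairs


# Hypothetical export: two successful charges one minute apart.
charges = [
    {"id": "ch_1", "customer": "cus_A", "amount": 5000, "status": "succeeded", "created": 1000},
    {"id": "ch_2", "customer": "cus_A", "amount": 5000, "status": "succeeded", "created": 1060},
    {"id": "ch_3", "customer": "cus_A", "amount": 5000, "status": "failed", "created": 1030},
]
print(find_duplicate_charges(charges))
```

Because the grouping key includes the customer ID, this will not catch the case where the second charge landed on a duplicate Stripe customer record; for that case, compare charges across both customer records by amount and time instead.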
Closing notes
Admin access is powerful. With it come a few habits worth keeping:
- Document significant actions. When you impersonate, deactivate users, or void invoices, leave a note somewhere your team can find it.
- Prefer fixing root causes over workarounds. A one-time manual fix is fine; if you find yourself applying the same fix three times, escalate to engineering.
- Watch the audit log when something feels off. It's the easiest way to tell whether a problem is human (someone changed something) or systemic (something broke on its own).
- Keep 2FA on. Your account has the keys to the platform.
Next
- System Operations — the tools you use during incidents
- Security — access controls and 2FA