Billing Support × Billing Engineering

Escalation Process Review

Aligning on a move from case-by-case escalations to incident-based management · Daniel Anselmo (Global Lead, Billing Customer Incident Support) & Stijn Zweegers (Senior Billing Engineering Manager)

Why we met

With new leadership over the FINTECH team, we opened a joint review of the billing escalation process — moving from a rough, individual understanding to a concrete plan for reducing unnecessary escalations and improving alignment between Customer Incident Support (CSUP) and Billing Engineering.

The core problem — repetitive escalations don't scale

~800 cases linked to a single incident (R2 / Zero Trust)
100s of duplicate escalations for one known issue
Human-dependent: escalation decisions vary by individual skill

Today, cases are escalated individually even when the workaround or root cause is already known. That consumes capacity better spent on root-cause fixes and innovation. An efficient earlier model (Epic Jira + tables of contents for bulk fixes) was discontinued for unknown reasons.

Current vs. target operating model

flowchart LR
    subgraph CUR["CURRENT — case-by-case"]
      direction TB
      A1["Customer cases"] --> A2["Individual escalation<br/>per case"]
      A2 --> A3["Billing Eng repeats<br/>an already-known fix"]
      A3 --> A4["Hundreds of duplicate<br/>escalations per issue"]
    end
    subgraph TGT["TARGET — incident-based"]
      direction TB
      B1["Customer cases"] --> B2["Confirm issue once"]
      B2 --> B3["Incident record +<br/>public status page"]
      B3 --> B4["RCA ticket: root cause,<br/>fix ETA, data cleanup, severity"]
      B4 --> B5["Customers informed →<br/>fewer new cases"]
    end
    CUR -.->|"shift focus"| TGT
    classDef cur fill:#fce8e6,stroke:#ea4335
    classDef tgt fill:#e6f4ea,stroke:#34a853
    class A1,A2,A3,A4 cur
    class B1,B2,B3,B4,B5 tgt

Once an issue is confirmed, shift from individual escalations to an incident with public status updates — keeping customers informed and preventing further case creation. CSUP effort refocuses on investigation, status, and root-cause identification.
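To make the shift concrete, here is a minimal sketch of how a new case might be routed once an incident exists. Everything in it is an assumption for illustration: the `Incident` record, the idea of matching cases by an issue `signature`, and the `route_case` helper are hypothetical and do not describe any existing CSUP tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    key: str
    signature: str  # assumed identifier for the confirmed issue
    linked_cases: List[str] = field(default_factory=list)


def route_case(case_id: str, signature: str, open_incidents: List[Incident]) -> str:
    """Link a case to a matching open incident instead of escalating it.

    Returns "linked:<incident key>" when the issue is already known,
    otherwise "escalate" so the issue can be confirmed once.
    """
    for incident in open_incidents:
        if incident.signature == signature:
            incident.linked_cases.append(case_id)
            return "linked:" + incident.key
    return "escalate"
```

In this sketch only the first case for an unknown issue reaches Billing Engineering; later cases with the same signature attach to the incident and can be answered from its status updates.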

What we agreed

Define incident criteria

A formal policy distinguishing a bug from an incident: cosmetic/visual issues do not warrant an incident; anything blocking subscriptions or payments does. Reserving incidents for blocking issues keeps incident volume sustainable.
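The two agreed rules could be expressed as a small classification function. This is a sketch under stated assumptions: the `Issue` fields and the `classify` helper are hypothetical, and the "needs_triage" outcome for issues that are neither cosmetic nor blocking is our placeholder, since the policy for that middle ground has not been defined yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical issue description; field names are illustrative only."""
    summary: str
    blocks_subscriptions: bool = False
    blocks_payments: bool = False
    cosmetic_only: bool = False


def classify(issue: Issue) -> str:
    """Apply the agreed criteria: blocking issues are incidents, cosmetic ones are bugs."""
    if issue.blocks_subscriptions or issue.blocks_payments:
        return "incident"
    if issue.cosmetic_only:
        return "bug"
    # Assumption: anything in between goes to a human for a decision.
    return "needs_triage"
```

Checking the blocking condition first means an issue flagged as both cosmetic and payment-blocking is still treated as an incident.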

Prioritize beyond escalation counts

Every root-cause bug gets a dedicated RCA ticket (code issue, fix timeline, data cleanup, severity). Prioritization weights:

Customer size · Hybrid / enterprise · Customer age · Total impact

So critical enterprise clients (e.g. Notion) are addressed first. Escalation count alone has proven an incomplete metric.
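One possible way to combine these factors is a weighted score per RCA ticket. The sketch below is illustrative only: the weight values, the 0-to-1 normalization of each factor, and the `RcaTicket` and `priority_score` names are all assumptions, since the review named the factors but did not set numbers.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights: the factors come from the review, the values do not.
WEIGHTS = {
    "customer_size": 0.3,
    "enterprise": 0.3,
    "customer_age": 0.1,
    "total_impact": 0.3,
}


@dataclass(frozen=True)
class RcaTicket:
    key: str
    customer_size: float   # assumed normalized to 0..1
    enterprise: bool       # hybrid / enterprise customer affected
    customer_age: float    # assumed normalized to 0..1
    total_impact: float    # assumed normalized to 0..1
    escalation_count: int = 0  # recorded, but deliberately not part of the score


def priority_score(ticket: RcaTicket) -> float:
    """Weighted sum of the four prioritization factors."""
    return (
        WEIGHTS["customer_size"] * ticket.customer_size
        + WEIGHTS["enterprise"] * (1.0 if ticket.enterprise else 0.0)
        + WEIGHTS["customer_age"] * ticket.customer_age
        + WEIGHTS["total_impact"] * ticket.total_impact
    )


def rank(tickets: List[RcaTicket]) -> List[RcaTicket]:
    """Order tickets from highest to lowest priority score."""
    return sorted(tickets, key=priority_score, reverse=True)
```

With these assumed weights, a ticket affecting a large enterprise customer can outrank one with far more escalations, which is the behavior the agreement is aiming for.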

Controlled tooling for CSUP

A safe, controlled UI for the team to action accounts (e.g. clearing billing flags) without exposing high-risk commands. Enables autonomous resolution, reduces engineering load, and speeds resolution times.
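As a rough illustration of what "controlled" could mean behind such a UI, the sketch below exposes only an allowlist of actions and records who ran each one. The action name `clear_billing_flag`, the account dictionary shape, and the audit log format are hypothetical; they stand in for whatever Billing Engineering actually decides to expose.

```python
from typing import Callable, Dict, List

Account = Dict[str, object]


def clear_billing_flag(account: Account) -> Account:
    """Return a copy of the account with its (hypothetical) billing flag cleared."""
    updated = dict(account)
    updated["billing_flag"] = False
    return updated


# Only actions listed here can be run; anything else is rejected.
ALLOWED_ACTIONS: Dict[str, Callable[[Account], Account]] = {
    "clear_billing_flag": clear_billing_flag,
}


def run_action(name: str, account: Account, actor: str,
               audit_log: List[Dict[str, object]]) -> Account:
    """Run an allowlisted action and append an audit entry for it."""
    if name not in ALLOWED_ACTIONS:
        raise PermissionError("action not available to CSUP: " + name)
    result = ALLOWED_ACTIONS[name](account)
    audit_log.append({"actor": actor, "action": name, "account_id": account.get("id")})
    return result
```

The design choice here is that high-risk commands are simply absent from the allowlist rather than guarded by warnings, so the tool cannot run them at all.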

Close tooling & training gaps

CSUP currently lacks visibility into subscription lifecycle, Ninja Panel, and Stripe, forcing the team to track down individuals for answers. Documentation, training, and system access are needed to operate as a true frontline engineering team.

Engineering initiatives in progress

Next steps & ownership

The outcome we're targeting

Fewer duplicate escalations, faster resolution, customers proactively informed, and engineering freed to fix root causes — with CSUP operating as an empowered frontline team rather than a relay for repetitive escalations.