Why SaaS Billing Systems Still Get Failed Payments Wrong

When you start looking closely at how many SaaS billing systems behave, a certain pattern shows up.

Many systems reduce failed payments to a single internal state: payment_failed.

Once that happens, the rest of the flow is usually predetermined. The system retries a few times, usually within a fixed grace period, and eventually suspends access.

That might seem reasonable at first, but it hides a crucial problem.

The Flat Retry Pattern

A decline for insufficient_funds, an expired_card, or even a fraud-related decline can all trigger the same retry sequence, with no regard for the failure reason.

All the nuance coming from the payment processor gets flattened into one state.

Everything becomes “payment failed.”

This is what I call the Flat Retry Pattern.

Why This Happens

This pattern doesn't usually happen because teams don't understand payment failures.

It happens because billing systems tend to grow incrementally. The first version usually records a failure as a single state. As the system grows, more pieces get added around that core behavior: retries, grace periods, suspension rules, notification emails. But the original state, payment_failed, often stays unchanged.

Once that state becomes the anchor for the rest of the billing logic, everything downstream ends up following the same recovery path, even when the underlying failure signals are very different.

Payment Failures Carry Different Signals

Payment failures are not all the same and they shouldn't all trigger the same recovery behavior.

Some failures recover naturally if the system retries later. Others require the customer to take action before the payment can succeed again. In some cases, retries should be limited; in others, they should be disabled entirely.

Payment processors return signals that distinguish these situations. A decline caused by insufficient_funds behaves very differently from an expired_card, a fraud-related decline or a temporary processing failure.
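One way to keep those signals from collapsing into a single state is to classify the decline code before choosing a recovery path. The sketch below is illustrative only: the category names, the policy numbers and the exact code strings for fraud and processing failures are assumptions, so check your processor's documentation for the values it actually returns.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    RETRY_LATER = "retry_later"                      # may recover on its own
    NEEDS_CUSTOMER_ACTION = "needs_customer_action"  # customer must act first
    DO_NOT_RETRY = "do_not_retry"                    # retries should be disabled
    UNKNOWN = "unknown"                              # unmapped code, handle cautiously


@dataclass(frozen=True)
class RecoveryPolicy:
    max_retries: int       # timed retries allowed for this kind of failure
    notify_customer: bool  # whether to email the customer right away


# Assumed mapping; "fraudulent" and "processing_error" are example strings.
CODE_TO_KIND = {
    "insufficient_funds": FailureKind.RETRY_LATER,
    "processing_error": FailureKind.RETRY_LATER,
    "expired_card": FailureKind.NEEDS_CUSTOMER_ACTION,
    "fraudulent": FailureKind.DO_NOT_RETRY,
}

# Hypothetical numbers, chosen only to show that the policies differ.
POLICY_BY_KIND = {
    FailureKind.RETRY_LATER: RecoveryPolicy(max_retries=4, notify_customer=False),
    FailureKind.NEEDS_CUSTOMER_ACTION: RecoveryPolicy(max_retries=0, notify_customer=True),
    FailureKind.DO_NOT_RETRY: RecoveryPolicy(max_retries=0, notify_customer=False),
    FailureKind.UNKNOWN: RecoveryPolicy(max_retries=2, notify_customer=True),
}


def classify(decline_code: str) -> FailureKind:
    """Map a processor decline code to a failure kind, defaulting to UNKNOWN."""
    return CODE_TO_KIND.get(decline_code, FailureKind.UNKNOWN)


def policy_for(decline_code: str) -> RecoveryPolicy:
    """Pick the recovery policy for a decline code."""
    return POLICY_BY_KIND[classify(decline_code)]
```

The point is not the specific numbers but that payment_failed stops being the only thing downstream logic can see.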

Timing Only Solves Part of the Problem

Retry timing helps in some cases. Tools like Smart Retries from Stripe can improve recovery when a later retry has a better chance of succeeding.

However, recovery strategies shouldn't always be purely time-driven. Sometimes, they need to be event-driven. Expired cards are a good example. Retries will keep failing until one of two things happens:

the customer updates their payment method
the card network automatically updates the card details through account updater services
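A minimal sketch of that event-driven idea, assuming a hypothetical FailedInvoice record and two handlers that your billing code would call: one when a payment fails, one when the card on file changes for either of the reasons above. The action names returned here are placeholders, not any processor's API.

```python
from dataclasses import dataclass


@dataclass
class FailedInvoice:
    invoice_id: str
    decline_code: str
    awaiting_card_update: bool = False  # True while timed retries are paused


def on_payment_failed(invoice: FailedInvoice) -> str:
    """Decide the next step when a payment attempt fails."""
    if invoice.decline_code == "expired_card":
        # Retrying on a timer cannot succeed, so wait for an event instead.
        invoice.awaiting_card_update = True
        return "wait_for_card_update"
    return "schedule_retry"


def on_card_updated(invoice: FailedInvoice) -> str:
    """Called when the customer or an account updater changes the card."""
    if invoice.awaiting_card_update:
        invoice.awaiting_card_update = False
        return "retry_now"
    return "no_action"
```

With this shape, an expired card produces no wasted attempts, and the retry fires as soon as there is a reason to expect a different result.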

Where Recovery Actually Happens

Failed payments aren't handled only by the payment processor. The payment processor determines whether the transaction succeeds or fails. But the SaaS product still decides:

when retries happen
how long access continues
when the customer is notified
when product access changes

Customer Payment Attempt

↓

Payment Processor (Stripe / Adyen / Braintree)

↓

Billing System Logic (Retry policy • Grace periods • Recovery strategy)

↓

Product Entitlements (Access control • Feature restrictions • Suspension)

The billing system logic layer is where most recovery logic actually lives, and it's also where the flat retry pattern shows up.
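To make those four decisions concrete, here is a rough sketch of a recovery plan expressed as data, with a function that reports what the billing layer would do on a given day after the first failed attempt. RecoveryPlan, its field names and the example day counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RecoveryPlan:
    retry_days: Tuple[int, ...]   # when retries happen
    notify_days: Tuple[int, ...]  # when the customer is notified
    restrict_day: int             # when product access changes
    grace_days: int               # how long access continues


def actions_for_day(plan: RecoveryPlan, day: int) -> List[str]:
    """List the actions due `day` days after the first failed attempt."""
    if day >= plan.grace_days:
        return ["suspend"]
    actions = []
    if day in plan.retry_days:
        actions.append("retry")
    if day in plan.notify_days:
        actions.append("notify")
    if day == plan.restrict_day:
        actions.append("restrict_features")
    return actions


# Illustrative values only.
example_plan = RecoveryPlan(
    retry_days=(3, 7), notify_days=(0, 7), restrict_day=10, grace_days=14
)
```

Keeping the plan as data makes it possible to hold a different plan per failure type instead of one hard-coded schedule.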

What This Actually Costs

A customer with temporary insufficient funds might successfully pay a few days later, but a flat retry schedule combined with a fixed grace period can still lead to suspension.

Recoverable customers get churned unnecessarily, which leads to lost revenue. It also distorts retention metrics, making them harder to interpret accurately.
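The insufficient-funds scenario can be worked through with a small simulation. The schedules, grace periods and the day the funds arrive are made-up numbers, and the model is deliberately crude: a retry succeeds only if it runs on or after the day funds are available and no later than the end of the grace period.

```python
from typing import Sequence, Tuple


def simulate(
    retry_days: Sequence[int], grace_days: int, funds_available_day: int
) -> Tuple[str, int]:
    """Return ("recovered", day) or ("suspended", grace_days)."""
    for day in sorted(retry_days):
        if day > grace_days:
            break  # retries after the grace period never run
        if day >= funds_available_day:
            return ("recovered", day)
    return ("suspended", grace_days)


# Flat schedule: every retry happens before the funds arrive on day 10.
flat = simulate(retry_days=[1, 3, 5], grace_days=7, funds_available_day=10)

# Schedule tuned for insufficient funds: later retries, longer grace period.
tuned = simulate(retry_days=[3, 7, 10, 14], grace_days=14, funds_available_day=10)

print(flat)   # ('suspended', 7)
print(tuned)  # ('recovered', 10)
```

Same customer, same failure, same funds arriving on the same day. Only the recovery strategy differs.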

A Better Diagnostic Question

Instead of reacting to every failed payment the same way, the system should ask:

What kind of failure is this?

That’s the difference between generic recovery logic and a billing system that responds to failures with the right strategy.

Once the failure type is clear, the recovery strategy becomes much easier to design. In many cases, choosing the right strategy is what keeps a long-term customer who had no intention of cancelling their subscription.

A Quick Sanity Check

When a payment fails, does your system actually care why it failed?

Does it treat insufficient_funds, expired_card and fraud-related declines differently? Or do they all go through the same retry schedule and grace period?

This is one of the first things I check when I audit SaaS billing systems.

If your system behaves like this, you're probably losing money

I run a focused 48-hour diagnostic to identify where your billing system is breaking and what to fix first.

Request Revenue Leak Diagnostic

About the Author

I help SaaS teams find and fix revenue leaks in billing and entitlement flows, especially where payment state and product access drift apart, so they can reduce involuntary churn and stabilize recurring revenue.
