Technical Debt: When to Pay, When to Defer

The $2 Million Question I Had to Answer

Last year, my CTO asked me a question that made my palms sweat: "Should we spend 3 months paying down technical debt, or ship these 5 new features our biggest client is asking for?"

I froze. Because honestly? I had no good answer. I'd read all the blog posts about "treating tech debt like financial debt" and "paying down the principal," but none of that helped me decide whether to tell our sales team we'd miss their deadline.

That conversation forced me to develop a framework I actually use now. Not perfect, but way better than my previous strategy of "panic and hope for the best."

What Technical Debt Actually Costs You

Here's what nobody tells you when you're starting out: tech debt isn't bad because it's messy code. It's bad because of what it prevents.

Real example from last month: Our team wanted to add a simple feature—let users export their data to CSV. Should've taken 2 days. Took 3 weeks. Why? Because our data layer was so coupled to our old MySQL schema that touching anything meant rewriting 15 other files.

The real costs I track now:

  • Velocity tax: How much slower are we shipping features? (We measured: 2.5x slower on average)
  • Talent drain: Are good engineers leaving because the codebase frustrates them? (Lost 2 senior devs in 2023 who cited "too much legacy code")
  • Bug inflation: Are we creating 3 bugs for every 1 we fix? (Our ratio hit 3.2:1 before we intervened)
  • Opportunity cost: What features are we NOT building because we're maintaining old code?

That last one hurts the most. In 2024, we passed on a partnership that could've brought in $500K because we knew our system couldn't handle their integration requirements.
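The velocity tax and the bug ratio are plain ratios, so they're cheap to compute once someone records the inputs. Here's a minimal Python sketch, assuming a 5-day working week. The function names and the bug counts in the example are illustrative, not our actual tooling:

```python
WORKING_DAYS_PER_WEEK = 5  # assumption, used to convert weeks to days


def velocity_tax(actual_days: float, expected_days: float) -> float:
    """Return how many times longer the work took than expected."""
    if expected_days <= 0:
        raise ValueError("expected_days must be positive")
    return actual_days / expected_days


def bug_ratio(bugs_created: int, bugs_fixed: int) -> float:
    """Return bugs introduced per bug fixed over the same period."""
    if bugs_fixed <= 0:
        raise ValueError("bugs_fixed must be positive")
    return bugs_created / bugs_fixed


# The CSV export: expected 2 days, took 3 weeks.
csv_export_tax = velocity_tax(3 * WORKING_DAYS_PER_WEEK, 2)  # 7.5

# Hypothetical counts that would produce a 3.2:1 ratio.
quarter_ratio = bug_ratio(bugs_created=32, bugs_fixed=10)  # 3.2
```

A single feature can be far above the team average, which is why I track the average across a quarter rather than reacting to one bad ticket.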

My "Pain Score" System (Stolen and Improved)

I stole this idea from a conference talk and made it actually useful. Every quarter, we score our technical debt on these factors:

1. Blast Radius (1-10)

How many systems/features does this affect?

  • Our authentication system: 10/10 (touches everything)
  • That old admin dashboard only 2 people use: 2/10

2. Change Frequency (1-10)

How often do we have to work in this code?

  • Payment processing: 9/10 (weekly changes)
  • Initial onboarding flow: 3/10 (rarely touch it)

3. Developer Pain (1-10)

Honest developer survey: "On a scale of 1-10, how much do you hate working on this?"

  • Our event processing system: 10/10 ("I have nightmares about this")
  • Email templates: 4/10 ("Not fun, but manageable")

4. Business Risk (1-10)

What happens if this breaks?

  • Payment processing: 10/10 (we lose money immediately)
  • Analytics dashboard: 5/10 (annoying but not critical)

The formula: (Blast Radius × Change Frequency) + Developer Pain + Business Risk

With every factor capped at 10, the maximum score is (10 × 10) + 10 + 10 = 120. Our payment processing code scored 109/120. We paid that down first. The old admin dashboard? 21/120. We deleted it instead.
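If you'd rather keep the scores in code than in a spreadsheet, the formula fits in a few lines. This is a sketch, not our real tooling: the `DebtItem` class is a name I made up for illustration, and any factor value not quoted in the examples above is an assumption, flagged in the comments.

```python
from dataclasses import dataclass, fields

MAX_SCORE = 10 * 10 + 10 + 10  # 120 when every factor is 10


@dataclass(frozen=True)
class DebtItem:
    name: str
    blast_radius: int
    change_frequency: int
    developer_pain: int
    business_risk: int

    def __post_init__(self) -> None:
        # Every factor is scored 1-10; reject anything else early.
        for f in fields(self):
            if f.name == "name":
                continue
            value = getattr(self, f.name)
            if not 1 <= value <= 10:
                raise ValueError(f"{f.name} must be 1-10, got {value}")

    @property
    def pain_score(self) -> int:
        # (Blast Radius x Change Frequency) + Developer Pain + Business Risk
        return (
            self.blast_radius * self.change_frequency
            + self.developer_pain
            + self.business_risk
        )


items = [
    # Developer pain 4 comes from the survey example; the rest is assumed.
    DebtItem("Email templates", 3, 2, 4, 2),
    # Change frequency 9 and business risk 10 come from the examples;
    # blast radius 10 and developer pain 9 are assumed.
    DebtItem("Payment processing", 10, 9, 9, 10),
]

ranked = sorted(items, key=lambda item: item.pain_score, reverse=True)
for item in ranked:
    print(f"{item.name}: {item.pain_score}/{MAX_SCORE}")
```

Because the first two factors are multiplied, they account for up to 100 of the 120 points. That's deliberate: code that touches everything and changes every week should outrank code that is merely unpleasant.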

When to Pay (and When to Defer)

This is the part that took me 5 years to learn. Here's my decision tree now:

Pay Immediately If:

  • It's blocking current work: If your team is spending 50% of their time working around the debt, stop and fix it. We once spent 2 weeks refactoring our API client middleware because every new endpoint took a full day to add. After the refactor? New endpoints took 20 minutes.
  • It's causing production incidents: Our caching layer was so fragile it caused 3 outages in one month. We stopped feature work and rebuilt it. Zero cache-related incidents since.
  • You're losing talent over it: When your best engineer says "I can't work in this codebase anymore," you have a retention problem disguised as a technical problem.

Defer (But Track) If:

  • It's in low-traffic code: We have an old reporting system that 3 people use monthly. Ugly code? Yes. Worth 3 weeks to rewrite? No.
  • The feature might get deprecated: We almost refactored our SMS notification system right before deciding to move everything to push notifications. Glad we waited.
  • You don't understand the domain well enough yet: Early in a new codebase, I thought our billing code was a mess. Turns out it was complex because billing IS complex. Rewriting would've been a disaster.

Delete If:

  • Nobody uses it anymore: Seriously, just delete more code. Last year we deleted 23,000 lines of code for features nobody used anymore. Our build got 40% faster. Our test suite ran in half the time. Deleting code is the best refactoring.
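The three buckets above can be written down as a small triage function. This is a rough sketch: the `DebtSignals` fields are names I invented to mirror the bullets, and two things are my assumptions rather than rules from the lists: the order of the checks (delete first, then pay, then defer) and the fallback of sending everything else to the pain score.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DELETE = "delete"
    PAY_NOW = "pay now"
    DEFER = "defer and track"
    SCORE = "run it through the pain score"  # assumed fallback


@dataclass
class DebtSignals:
    unused: bool = False
    blocks_current_work: bool = False
    causes_incidents: bool = False
    losing_talent: bool = False
    low_traffic: bool = False
    may_be_deprecated: bool = False
    domain_unclear: bool = False


def triage(s: DebtSignals) -> Action:
    # Assumed precedence: dead code is deleted before anything else.
    if s.unused:
        return Action.DELETE
    if s.blocks_current_work or s.causes_incidents or s.losing_talent:
        return Action.PAY_NOW
    if s.low_traffic or s.may_be_deprecated or s.domain_unclear:
        return Action.DEFER
    return Action.SCORE


fragile_cache = triage(DebtSignals(causes_incidents=True))
old_reporting = triage(DebtSignals(low_traffic=True))
```

Checking the pay-now signals before the defer signals means a low-traffic system that keeps causing outages still gets fixed, which matches how we've handled it in practice.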

The "20% Time" Approach That Actually Works

You've probably heard "spend 20% of your sprint on tech debt." I tried that. It failed miserably. Here's why:

  • Nobody actually does it when feature pressure is high
  • 20% isn't enough for big refactors, but it's too much for small ones
  • Engineers pick easy, low-impact debt instead of important debt

What works for us now:

The "Boy Scout Plus" rule: Leave code better than you found it, BUT track meaningful improvements in our tech debt backlog. Small improvements count: adding tests, extracting a function, documenting a weird behavior.

Quarterly "Debt Sprints": Once per quarter, we spend one full week (not 20% of many weeks) on the highest-scoring debt from our pain score system. Full team focus. No features. Just improvement.

Embed debt work in features: When planning a feature that touches messy code, we pad the estimate by 30% to refactor the area we're working in. This is when most meaningful refactoring happens.

How to Talk About Tech Debt With Non-Engineers

This was my hardest lesson. Engineers want to talk about "coupling" and "abstraction layers." Executives want to know "why is this taking so long?"

What DOESN'T work:
"We need to refactor the service layer because the dependency injection is too tightly coupled to the ORM."

What DOES work:
"Right now, adding a new feature takes 3 weeks. If we spend 2 weeks cleaning up this system, future features will take 4 days. The cleanup pays for itself with the very next feature, and we save about 2 weeks on every feature after that."
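Before you walk into that meeting, check the arithmetic, because someone in finance will. Here's a small sketch of the break-even math for the pitch above, assuming a 5-day working week (the function name is mine):

```python
import math

WORKING_DAYS_PER_WEEK = 5  # assumption


def break_even_features(
    cleanup_days: float, days_before: float, days_after: float
) -> int:
    """Return how many features must ship before the cleanup pays for itself."""
    saved_per_feature = days_before - days_after
    if saved_per_feature <= 0:
        raise ValueError("the cleanup must make features faster")
    return math.ceil(cleanup_days / saved_per_feature)


cleanup = 2 * WORKING_DAYS_PER_WEEK  # 2 weeks of cleanup = 10 days
before = 3 * WORKING_DAYS_PER_WEEK   # 3 weeks per feature today = 15 days
after = 4                            # 4 days per feature afterwards

saved_per_feature = before - after   # 11 working days, a bit over 2 weeks
features_to_break_even = break_even_features(cleanup, before, after)  # 1
```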

Translate tech debt into:

  • Time saved (we can ship faster)
  • Risk reduced (fewer production incidents)
  • Cost avoided (less time debugging, less overtime)
  • Opportunities enabled (we CAN build that integration now)

I once got approval for a 6-week refactor by showing that it would prevent $80K in projected overtime costs. CFO approved it in 10 minutes.

What You Can Do This Week

If you're drowning in tech debt right now, here's your action plan:

  1. Score your top 10 debt items using the pain score system above. Takes 1 hour in a meeting.
  2. Pick the highest-scoring item and estimate how long it would take to fix.
  3. Calculate the ROI: the time this fix will save over a fixed horizon (pick one and use it for every item), divided by the time it takes to fix.
  4. If ROI is 3x or more, schedule it now. Otherwise, defer and track it.
  5. Start tracking velocity impact with a simple question each sprint: "What took longer than expected due to tech debt?"
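Steps 3 and 4 come down to one division and one comparison. A hedged sketch: the horizon of 12 sprints, the example numbers, and the function names are all assumptions for illustration. The days saved per sprint would come from the answers to the question in step 5.

```python
def roi(fix_days: float, days_saved_per_sprint: float, horizon_sprints: int) -> float:
    """Return time saved over the horizon divided by time spent on the fix."""
    if fix_days <= 0:
        raise ValueError("fix_days must be positive")
    return days_saved_per_sprint * horizon_sprints / fix_days


def decide(roi_value: float, threshold: float = 3.0) -> str:
    # Treating exactly 3x as "schedule" is an assumption.
    return "schedule now" if roi_value >= threshold else "defer and track"


HORIZON_SPRINTS = 12  # assumed horizon; use the same one for every item

# Hypothetical item A: 10 days to fix, saves 3 days per sprint.
item_a = roi(fix_days=10, days_saved_per_sprint=3, horizon_sprints=HORIZON_SPRINTS)

# Hypothetical item B: 15 days to fix, saves 1 day per sprint.
item_b = roi(fix_days=15, days_saved_per_sprint=1, horizon_sprints=HORIZON_SPRINTS)

print(f"A: {item_a:.1f}x -> {decide(item_a)}")
print(f"B: {item_b:.1f}x -> {decide(item_b)}")
```

The horizon matters more than it looks: stretch it far enough and every refactor clears 3x, so fix it once and don't renegotiate it per item.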

The Brutal Truth

You will never pay down all your tech debt. Never. I've been in codebases that are 2 years old and 20 years old. They all have debt.

The goal isn't zero debt. The goal is managed, strategic debt that doesn't prevent you from shipping.

Some of our highest-pain code is 4 years old and still running in production. Why? Because it's stable, we understand it, and rewriting it would take 3 months we don't have. We've documented it, tested it, and built monitoring around it. That's good enough.

The difference between teams that thrive and teams that collapse isn't the amount of tech debt. It's whether they have a system for deciding what to fix and what to live with.

Now go score your debt. You'll be surprised what you find.