The 10-Minute Infrastructure Scaling Health Check

Use this checklist to quickly assess whether your infrastructure is ready to scale from £1M to £10M ARR.

Each “No” answer marks a likely scaling bottleneck, one that could bite you within the next 6 months.

1. Database Performance ✓

□ Can your database handle 10x current traffic?

Quick test:

  • Check current read/write IOPS against database limits
  • If you’re above 60% capacity → you need to scale soon
  • Review the slow query log: is anything taking >100ms under load?
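A minimal sketch of the 60% rule, assuming IOPS grows roughly in proportion to traffic (often only approximately true, since caching and query mix change with scale). The function names and figures are hypothetical; real values come from your provider's monitoring.

```python
# Minimal sketch: compare observed IOPS with the instance's provisioned limit.
# Assumes IOPS scales linearly with traffic; names and numbers are hypothetical.

SCALE_SOON_THRESHOLD = 0.60  # the checklist's 60% rule of thumb


def iops_utilization(read_iops: float, write_iops: float, provisioned_iops: float) -> float:
    """Fraction of the provisioned IOPS limit currently in use."""
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    return (read_iops + write_iops) / provisioned_iops


def needs_scaling(read_iops: float, write_iops: float, provisioned_iops: float,
                  growth: float = 1.0) -> bool:
    """True if utilization, multiplied by expected traffic growth, exceeds 60%."""
    return iops_utilization(read_iops, write_iops, provisioned_iops) * growth > SCALE_SOON_THRESHOLD


print(iops_utilization(1800, 600, 3000))           # 0.8
print(needs_scaling(1800, 600, 3000))              # True: already above 60%
print(needs_scaling(150, 50, 3000, growth=10))     # True: ~6.7% today, ~67% at 10x
```

The `growth` parameter is what turns "are we fine today?" into the checklist's real question, "are we fine at 10x?".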

Red flags:

  • No read replicas configured
  • No connection pooling (seeing “too many connections” errors)
  • Running on default database instance from 2 years ago
  • No query performance monitoring

□ Are critical queries indexed properly?

Run this on PostgreSQL:

-- Largest tables, with how often each is read by sequential vs index scan
SELECT schemaname, relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS size,
       seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;

Among your largest tables, a high seq_scan count next to a low (or empty) idx_scan count suggests frequently queried columns are missing an index. These counters are cumulative since the last statistics reset.
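PostgreSQL's pg_stat_user_tables view exposes per-table seq_scan, idx_scan and n_live_tup counters. A minimal sketch of turning rows from that view into a shortlist, assuming you have already fetched them; the row-count and ratio thresholds are hypothetical and worth tuning.

```python
# Minimal sketch: flag large tables whose reads are mostly sequential scans.
# Fields mirror pg_stat_user_tables columns; thresholds are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableStats:
    relname: str
    n_live_tup: int            # approximate live row count
    seq_scan: int              # sequential scans started on the table
    idx_scan: Optional[int]    # index scans; may be NULL (None), treated as 0


def likely_missing_index(t: TableStats, min_rows: int = 100_000,
                         max_idx_ratio: float = 0.5) -> bool:
    """Large table where index scans make up less than max_idx_ratio of all scans."""
    idx = t.idx_scan or 0
    total = t.seq_scan + idx
    if t.n_live_tup < min_rows or total == 0:
        return False  # small tables are often cheaper to scan sequentially anyway
    return idx / total < max_idx_ratio


stats = [
    TableStats("orders", 5_000_000, seq_scan=12_000, idx_scan=300),
    TableStats("users", 200_000, seq_scan=40, idx_scan=90_000),
    TableStats("settings", 50, seq_scan=10_000, idx_scan=None),
]
for t in stats:
    if likely_missing_index(t):
        print(f"{t.relname}: {t.seq_scan} seq scans vs {t.idx_scan} index scans")
```

Only `orders` is reported here. Confirm a suspect with EXPLAIN on the actual slow queries before adding an index, since every index also slows writes.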

□ Do you have automated backups with tested restores?

Not just “backups enabled” – when did you last restore from backup to verify it works?
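A restore test has three steps: take a backup, restore it somewhere fresh, then check the restored data. A minimal sketch using SQLite's online backup API from Python's standard library as a stand-in; with PostgreSQL the same shape would restore a dump or snapshot into a scratch instance. Comparing row counts is a deliberately simple check, assumed here for brevity.

```python
# Minimal sketch of a restore test, with SQLite standing in for your real database.
import sqlite3


def restore_test(source: sqlite3.Connection, table: str) -> bool:
    """Back up `source`, restore into a fresh database, and compare row counts."""
    backup = sqlite3.connect(":memory:")     # stands in for backup storage
    source.backup(backup)                    # 1. take the backup
    restored = sqlite3.connect(":memory:")   # 2. fresh restore target
    backup.backup(restored)                  #    restore from the backup copy
    query = f"SELECT COUNT(*) FROM {table}"  # table name is trusted in this sketch
    expected = source.execute(query).fetchone()[0]
    actual = restored.execute(query).fetchone()[0]  # 3. verify the restored data
    backup.close()
    restored.close()
    return expected == actual


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
db.executemany("INSERT INTO customers (email) VALUES (?)",
               [("a@example.com",), ("b@example.com",)])
db.commit()
print(restore_test(db, "customers"))  # True
```

A fuller test would also run a few real application queries against the restored copy and record how long the restore took.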

2. Application Architecture ✓

□ Can you scale horizontally by adding more servers?

Warning signs you can’t:

  • In-memory session storage (not Redis/database)
  • File uploads stored on local disk (not S3/object storage)
  • Background jobs running on same server as web app
  • Hard-coded server IPs anywhere in config
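The last warning sign is easy to check mechanically. A minimal sketch that scans config text for IPv4 literals; skipping loopback and 0.0.0.0 is an assumption (they are usually bind addresses, not references to a specific server), and the sample config is hypothetical.

```python
# Minimal sketch: find hard-coded IPv4 addresses in config text.
import ipaddress
import re

IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_ips(config_text: str) -> list:
    found = []
    for match in IPV4_CANDIDATE.findall(config_text):
        try:
            addr = ipaddress.IPv4Address(match)
        except ipaddress.AddressValueError:
            continue  # looks like an IP but isn't valid, e.g. 999.1.1.1
        if not (addr.is_loopback or addr.is_unspecified):
            found.append(match)
    return found


config = """
DATABASE_HOST=10.0.3.17
REDIS_URL=redis://cache.internal:6379
BIND_ADDRESS=0.0.0.0
HEALTHCHECK=http://127.0.0.1:8080/health
"""
print(find_hardcoded_ips(config))  # ['10.0.3.17']
```

Anything it reports should normally become a DNS name or service-discovery lookup, so a replacement server can take over without a config change.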

□ Are static assets served via CDN?

Check: View source on your app. Are images/CSS/JS served from your domain or a CDN?

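If they come straight from your app servers → those servers are wasting resources serving static files. A CDN can also sit in front of your own domain, so check the response headers (many CDNs add a cache-status header) before concluding.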
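The "view source" check can be scripted. A minimal sketch using only the standard library: it collects image, script and stylesheet URLs from a page's HTML and returns the ones served from the page's own host. The URLs are hypothetical, and a hostname match is only a first pass, not proof that no CDN is involved.

```python
# Minimal sketch: list asset URLs served from the page's own host.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collects img/script src and stylesheet href URLs, resolved against the page URL."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(urljoin(self.page_url, attrs["src"]))
        elif (tag == "link" and "stylesheet" in (attrs.get("rel") or "").split()
              and attrs.get("href")):
            self.assets.append(urljoin(self.page_url, attrs["href"]))


def assets_served_by_origin(html: str, page_url: str) -> list:
    collector = AssetCollector(page_url)
    collector.feed(html)
    origin = urlparse(page_url).hostname
    return [u for u in collector.assets if urlparse(u).hostname == origin]


page = (
    '<img src="/logo.png">'
    '<link rel="stylesheet" href="https://cdn.example.net/app.css">'
    '<script src="https://app.example.com/bundle.js"></script>'
)
print(assets_served_by_origin(page, "https://app.example.com/dashboard"))
```

Relative URLs such as `/logo.png` resolve to the page's own host, which is usually the clearest sign that assets bypass a CDN.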

□ Is there a queue for background jobs?

Red flags:

  • Email sending blocks HTTP requests
  • Report generation happens in web requests
  • No job queue system (Sidekiq, Celery, BullMQ, SQS)
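The fix for the first two red flags is the same pattern: the request handler enqueues the work and responds immediately, and a separate worker does the slow part. A minimal sketch with an in-process `queue.Queue` and a thread standing in for a real job system; unlike Sidekiq, Celery, BullMQ or SQS, jobs here are lost if the process dies. The handler and email function are hypothetical.

```python
# Minimal sketch: move slow work out of the request path.
import queue
import threading
import time

jobs = queue.Queue()
sent = []


def send_welcome_email(address: str) -> None:
    time.sleep(0.2)          # pretend the email provider is slow
    sent.append(address)


def worker() -> None:
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()


def handle_signup(address: str) -> dict:
    """Request handler: enqueue the email and respond without waiting for it."""
    jobs.put((send_welcome_email, (address,)))
    return {"status": 201}


threading.Thread(target=worker, daemon=True).start()

start = time.perf_counter()
response = handle_signup("new.user@example.com")
elapsed = time.perf_counter() - start  # time the user waited; excludes the email send
jobs.join()                            # only so the demo finishes; a web app wouldn't wait
print(response, elapsed < 0.2, sent)
```

In production the queue must be durable and shared, so web servers and workers can run on separate machines and scale independently.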

3. Deployment Pipeline ✓

□ Can you deploy without downtime?

Test: Run a deployment during business hours. Do users see errors or connection drops?

Blue-green or rolling deployments are essential at scale.
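To make the "do users see errors?" test measurable, poll a health endpoint throughout the deployment and count failures. A minimal sketch; `HEALTH_URL` is a hypothetical endpoint, and any non-2xx response, timeout or connection error counts as a failure.

```python
# Minimal sketch: count failed health checks while a deploy runs.
import http.client
import time
import urllib.request
from typing import Callable

HEALTH_URL = "https://app.example.com/health"  # hypothetical endpoint


def http_probe(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """True for a 2xx response; False for HTTP errors, timeouts and dropped connections."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):  # URLError/HTTPError are OSError subclasses
        return False


def watch_deploy(probe: Callable[[], bool], checks: int, interval: float = 1.0) -> int:
    """Run `probe` `checks` times, `interval` seconds apart; return the failure count."""
    failures = 0
    for _ in range(checks):
        if not probe():
            failures += 1
        time.sleep(interval)
    return failures
```

Start `watch_deploy(http_probe, checks=300)` just before a deploy to cover roughly five minutes. Any non-zero count means some users probably saw errors. Probing from outside your network exercises the same path real users take.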

□ Can you rollback a bad deployment in under 5 minutes?

Do you have:

  • Automated rollback command/button?
  • Process to identify and revert bad deploys quickly?
  • Documented rollback playbook?
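Blue-green deployment makes a sub-5-minute rollback realistic because the previous release stays running, so rolling back means switching traffic, not rebuilding. A minimal sketch of that idea; the `Router` class is hypothetical, and in practice the switch is a load balancer target group, a DNS weight or a Kubernetes Service selector.

```python
# Minimal sketch: rollback as a traffic switch between already-deployed releases.
from dataclasses import dataclass, field


@dataclass
class Router:
    live: str                                    # release currently receiving traffic
    history: list = field(default_factory=list)  # previous releases, still deployed

    def deploy(self, release: str) -> None:
        self.history.append(self.live)
        self.live = release

    def rollback(self) -> str:
        if not self.history:
            raise RuntimeError("no previous release to roll back to")
        self.live = self.history.pop()
        return self.live


router = Router(live="v41")
router.deploy("v42")   # v42 turns out to be bad
router.rollback()      # one switch, no rebuild
print(router.live)     # v41
```

Database migrations are the usual catch: the rollback only stays this simple if v41 still works against the schema v42 left behind.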

□ Does deployment take under 15 minutes?

If deploys take 30+ minutes, engineers batch changes instead of shipping continuously.

This slows innovation and increases risk per deployment.

4. Monitoring & Observability ✓

□ Do you have alerts for critical failures?

Minimum required alerts:

  • Server/container health checks failing
  • Database connection pool exhausted
  • Error rate above threshold (5xx responses)
  • Response time P95 above SLA
  • SSL certificate expiring soon
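Two of these alerts are simple calculations over a window of requests. A minimal sketch, assuming you can get status codes and latencies for the last few minutes; the 1% error-rate threshold and 800 ms SLA are hypothetical values. P95 uses the nearest-rank method.

```python
# Minimal sketch: evaluate the 5xx-rate and P95-latency alerts for one window.
import math


def p95(latencies_ms: list) -> float:
    """95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def evaluate_alerts(statuses: list, latencies_ms: list,
                    max_error_rate: float = 0.01, sla_p95_ms: float = 800.0) -> list:
    """Return messages for the alerts that would fire in this window."""
    fired = []
    error_rate = sum(500 <= s < 600 for s in statuses) / len(statuses)
    if error_rate > max_error_rate:
        fired.append(f"5xx rate {error_rate:.1%} above {max_error_rate:.0%}")
    observed = p95(latencies_ms)
    if observed > sla_p95_ms:
        fired.append(f"P95 {observed:.0f} ms above SLA {sla_p95_ms:.0f} ms")
    return fired


statuses = [200] * 97 + [503] * 3
latencies = [120.0] * 90 + [1500.0] * 10
for message in evaluate_alerts(statuses, latencies):
    print(message)
```

Note that the average latency here is only 258 ms; the slowest 10% of requests are visible only through a percentile.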

□ Can you trace a slow request from user → database?

Do you have:

  • Application Performance Monitoring (APM)?
  • Distributed tracing across microservices?
  • Ability to see full request lifecycle?
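The core idea behind tracing is that every step of a request records timings under one shared trace ID, so a slow request can be followed down to the database call. A minimal hand-rolled sketch of that idea using `contextvars`; real systems use OpenTelemetry or an APM agent and pass the ID between services in a header such as W3C `traceparent`. The function names are hypothetical.

```python
# Minimal sketch: record timed spans that share a per-request trace ID.
import contextvars
import functools
import time
import uuid

trace_id = contextvars.ContextVar("trace_id", default="-")
spans = []  # (trace_id, span name, duration in ms)


def span(name: str):
    """Decorator that records how long the wrapped call took, tagged with the trace ID."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                spans.append((trace_id.get(), name, (time.perf_counter() - start) * 1000))
        return inner
    return wrap


@span("db.query")
def load_orders(user_id: int) -> list:
    time.sleep(0.05)  # pretend this query is the slow part
    return []


@span("http.request")
def handle_request(user_id: int) -> list:
    trace_id.set(uuid.uuid4().hex)  # a web framework would give each request its own context
    return load_orders(user_id)


handle_request(42)
for tid, name, ms in spans:
    print(tid[:8], name, f"{ms:.0f} ms")
```

Both spans carry the same ID, and the database span accounts for most of the request's time. An APM tool shows this as a waterfall.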

□ Are you monitoring business metrics, not just technical metrics?

Track:

  • Sign-ups per hour
  • Failed payment attempts
  • Feature usage rates
  • Customer-facing transaction success rates

If a deployment breaks sign-ups but servers are “healthy,” technical monitoring alone won’t catch it.
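A business-metric alert compares the current value with what is normal for that time. A minimal sketch, assuming you keep hourly sign-up counts; the baseline (median of the same hour in previous weeks) and the 50% drop threshold are hypothetical choices.

```python
# Minimal sketch: alert when this hour's sign-ups fall far below the usual level.
from statistics import median


def signup_drop_alert(current_hour: int, same_hour_previous_weeks: list,
                      min_fraction: float = 0.5) -> bool:
    """True if sign-ups are below min_fraction of the historical median for this hour."""
    if not same_hour_previous_weeks:
        return False  # no baseline yet
    baseline = median(same_hour_previous_weeks)
    return current_hour < min_fraction * baseline


# Same hour on the last four Tuesdays: 40, 38, 45, 42 → median 41
print(signup_drop_alert(3, [40, 38, 45, 42]))   # True
print(signup_drop_alert(37, [40, 38, 45, 42]))  # False
```

Using the same hour on previous weeks avoids false alarms from normal overnight and weekend dips.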

5. Security & Compliance ✓

□ Are secrets managed properly (not in code)?

Check:

  • No API keys or passwords in your Git repositories (including commit history)
  • Using secret management (AWS Secrets Manager, Vault, etc.)
  • Environment variables properly injected at runtime
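A minimal sketch of a pre-commit style secret check. The two patterns (AWS access key IDs and PEM private key headers) are only a small sample. Dedicated scanners such as gitleaks or trufflehog ship far larger rule sets and also scan Git history.

```python
# Minimal sketch: flag lines that look like committed secrets.
import re

SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list:
    """Return (line number, pattern name) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


sample = 'region = "eu-west-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_secrets(sample))  # [(2, 'AWS access key ID')]
```

Any real hit means the key must be rotated, not just deleted. Once a secret is pushed, assume it has been copied.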

□ Is production access restricted and audited?

Required:

  • Production SSH/database access requires MFA
  • Audit log of who accessed what and when
  • Principle of least privilege (developers don’t have production DB passwords)
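An audit log must record who did what and when before the action runs. A minimal sketch of that shape as a Python decorator; the `actor` argument and in-memory `AUDIT_LOG` are hypothetical. A real log belongs in append-only storage the actor cannot edit, such as AWS CloudTrail for AWS API calls.

```python
# Minimal sketch: record who performed which production action, and when.
import functools
from datetime import datetime, timezone

AUDIT_LOG = []


def audited(action: str):
    def wrap(func):
        @functools.wraps(func)
        def inner(actor: str, *args, **kwargs):
            AUDIT_LOG.append({
                "actor": actor,
                "action": action,
                "args": args,
                "at": datetime.now(timezone.utc).isoformat(),
            })  # written before the action runs, so failed attempts are logged too
            return func(actor, *args, **kwargs)
        return inner
    return wrap


@audited("db.read_customer")
def read_customer(actor: str, customer_id: int) -> dict:
    return {"id": customer_id}  # stand-in for a real production query


read_customer("alice@example.com", 1234)
print(AUDIT_LOG[0]["actor"], AUDIT_LOG[0]["action"], AUDIT_LOG[0]["args"])
```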

□ Are dependencies kept up to date?

Run: npm audit (Node.js) or pip-audit (Python)

If you see critical vulnerabilities → attackers can too.

6. Infrastructure as Code ✓

□ Can you recreate your entire infrastructure from code?

Test: If AWS eu-west-1 went down tomorrow, how long would it take to rebuild in us-east-1?

If the answer is “no idea” or “weeks” → you need IaC.

□ Is your infrastructure version controlled?

Check:

  • All infrastructure defined in Terraform/CloudFormation/Pulumi?
  • Changes go through Git and code review?
  • Can roll back infrastructure changes like code?

□ Do you have separate environments (dev/staging/prod)?

Red flags:

  • Testing in production because staging doesn’t exist
  • Staging shares database with production
  • Can’t test infrastructure changes safely
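The "staging shares database with production" red flag can be caught automatically at deploy time. A minimal sketch that compares database host, port and name across environments, ignoring credentials; the environment names and connection URLs are hypothetical.

```python
# Minimal sketch: report environments that point at the same database.
from urllib.parse import urlparse


def db_target(url: str) -> tuple:
    """(host, port, database name) from a connection URL; credentials are ignored."""
    parsed = urlparse(url)
    return (parsed.hostname, parsed.port, parsed.path.lstrip("/"))


def check_isolated(env_urls: dict) -> list:
    """Return a message for every pair of environments sharing a database."""
    problems = []
    names = sorted(env_urls)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if db_target(env_urls[a]) == db_target(env_urls[b]):
                problems.append(f"{a} and {b} share a database")  # never print the URLs
    return problems


envs = {
    "production": "postgresql://app:secret@db-prod.internal:5432/app",
    "staging": "postgresql://stage:other@db-prod.internal:5432/app",
    "dev": "postgresql://dev:dev@localhost:5432/app_dev",
}
print(check_isolated(envs))  # ['production and staging share a database']
```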

7. Cost Management ✓

□ Do you know where your cloud spend goes?

Can you answer these in 60 seconds:

  • What’s your biggest AWS cost category? (Compute, storage, data transfer?)
  • Which service/product line costs most to run?
  • What’s your cost per customer/transaction?
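Once spend is grouped by category (for example from AWS Cost Explorer, grouped by service or cost allocation tag), all three questions take a few lines. A minimal sketch; the figures below are made up for illustration.

```python
# Minimal sketch: biggest cost category, total and cost per customer.
# All figures are illustrative, not benchmarks.

monthly_costs = {            # £ per month by category
    "compute": 18_000,
    "database": 9_000,
    "storage": 2_500,
    "data transfer": 3_500,
}
active_customers = 1_100

biggest = max(monthly_costs, key=monthly_costs.get)
total = sum(monthly_costs.values())
cost_per_customer = total / active_customers

print(biggest)                      # compute
print(total)                        # 33000
print(round(cost_per_customer, 2))  # 30.0
```

Tracking cost per customer over time is the useful part. If it rises as you grow, infrastructure is scaling worse than revenue.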

□ Have you optimized cloud costs in the last 6 months?

Quick wins often available:

  • Reserved instances / Savings Plans for predictable workloads
  • Rightsizing over-provisioned resources
  • Deleting orphaned volumes/snapshots
  • Shutting down non-production environments overnight
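The overnight-shutdown saving depends only on how many of the week's 168 hours an environment stays up. A worked example, assuming purely hourly-billed compute (storage and reserved capacity keep costing money while instances are stopped):

```python
# Worked example: share of always-on cost saved by running only in working hours.
# Assumes purely hourly billing; storage and reservations are not included.
HOURS_PER_WEEK = 24 * 7  # 168


def scheduled_saving(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of always-on cost saved by a running schedule."""
    running = hours_per_day * days_per_week
    return 1 - running / HOURS_PER_WEEK


# 12 hours a day, weekdays only: 60 of 168 hours running
print(f"{scheduled_saving(12, 5):.0%}")  # 64%
```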

□ Do you have cost alerts configured?

Set up:

  • Alert when monthly spend exceeds budget by 20%
  • Alert on unusual spend spikes
  • Per-service budgets for high-cost items
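AWS Budgets and AWS Cost Anomaly Detection can send these alerts natively. The logic behind them is simple. A minimal sketch: the 20% over-budget rule comes from the list above, while the spike rule (latest day above twice the trailing 7-day average) is a hypothetical choice to tune against your own spend pattern.

```python
# Minimal sketch: over-budget and spend-spike checks.


def over_budget(month_to_date: float, budget: float, tolerance: float = 0.20) -> bool:
    """True once spend exceeds the budget by more than `tolerance`."""
    return month_to_date > budget * (1 + tolerance)


def spend_spike(daily_spend: list, factor: float = 2.0, window: int = 7) -> bool:
    """True if the latest day exceeds `factor` x the average of the previous `window` days."""
    if len(daily_spend) <= window:
        return False  # not enough history yet
    history = daily_spend[-window - 1:-1]
    return daily_spend[-1] > factor * (sum(history) / window)


print(over_budget(month_to_date=12_500, budget=10_000))        # True
print(spend_spike([300, 310, 295, 305, 290, 300, 300, 900]))   # True
```

The spike check matters more in practice. A runaway job can burn a month's budget in days, long before a monthly threshold fires.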

Scoring Your Infrastructure

Give yourself one checkmark for each □ question you can answer “Yes” to (21 in total).

18-21 checkmarks: You’re in great shape. Minor optimizations only.

14-17 checkmarks: Some gaps. Prioritize the missing ones before scaling aggressively.

10-13 checkmarks: Significant scaling risks. Expect major issues within the next 6 months.

Below 10: Critical infrastructure debt. Scaling will be painful and expensive.
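The scoring bands above can be expressed as a small function, assuming one checkmark per □ question answered “Yes” (21 questions across the seven sections):

```python
# The scoring bands as a function: one checkmark per □ question answered "Yes".


def score_band(checkmarks: int) -> str:
    if not 0 <= checkmarks <= 21:
        raise ValueError("there are 21 questions, so the score must be 0-21")
    if checkmarks >= 18:
        return "great shape"
    if checkmarks >= 14:
        return "some gaps"
    if checkmarks >= 10:
        return "significant scaling risks"
    return "critical infrastructure debt"


print(score_band(15))  # some gaps
```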

What To Do If You Scored Low

Don’t panic. Most teams at £1-3M ARR have 8-12 checkmarks.

The key is fixing this before you hit scaling problems, not after.

Our Infrastructure Audit (£3,000, 2 weeks)

We complete this checklist in detail, then deliver:

  • Scored assessment of each area
  • Prioritized roadmap of fixes (critical → nice-to-have)
  • Specific implementation recommendations
  • Cost-benefit analysis for each improvement
  • We implement the top 5 quick wins for you

Platform Acceleration Programme (£23,000, 10 weeks)

For teams ready to fix everything:

  • Week 1-2: Complete infrastructure audit
  • Week 3-8: Implement all critical improvements
  • Week 9-10: Team training and documentation

Guaranteed outcomes:

  • 30-40% cost reduction
  • 50%+ faster deployments
  • Production-ready security posture
  • Infrastructure ready for 10x growth

Book a Free Discovery Call

We’ll go through this checklist together and identify your biggest risks.