The 10-Minute Infrastructure Scaling Health Check
Use this checklist to quickly assess whether your infrastructure is ready to scale from £1M to £10M ARR.
Each “No” answer is a scaling bottleneck that is likely to bite you within the next 6 months.
1. Database Performance ✓
□ Can your database handle 10x current traffic?
Quick test:
- Check current read/write IOPS against database limits
- If you’re above 60% capacity → you need to scale soon
- Review slow query log – anything taking >100ms under load?
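The quick test above can be sketched as a small calculation. This is a minimal illustration, not a monitoring tool: the function names and the sample figures are hypothetical, and the IOPS numbers are assumed to come from your cloud provider's metrics console.

```python
def capacity_utilization(current_iops: float, iops_limit: float) -> float:
    """Return utilization as a fraction of the provisioned IOPS limit."""
    if iops_limit <= 0:
        raise ValueError("iops_limit must be positive")
    return current_iops / iops_limit


def needs_scaling_soon(current_iops: float, iops_limit: float,
                       threshold: float = 0.60) -> bool:
    """True when utilization is above the checklist's 60% threshold."""
    return capacity_utilization(current_iops, iops_limit) > threshold


def slow_queries(durations_ms: dict, limit_ms: float = 100.0) -> list:
    """Return the names of queries slower than the limit, slowest first."""
    slow = [(ms, name) for name, ms in durations_ms.items() if ms > limit_ms]
    return [name for ms, name in sorted(slow, reverse=True)]
```

For example, `needs_scaling_soon(2100, 3000)` reports 70% utilization as over the threshold, while 1,500 of 3,000 IOPS does not.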
Red flags:
- No read replicas configured
- No connection pooling (seeing “too many connections” errors)
- Still running on the default database instance you picked 2 years ago
- No query performance monitoring
□ Are critical queries indexed properly?
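Run this on PostgreSQL to list your 10 largest tables:
-- format('%I.%I', ...) quotes identifiers so mixed-case table names resolve correctly
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 10;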
Check if your largest tables have indexes on frequently queried columns.
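One way to narrow down which large tables may be missing an index is to compare sequential scans with index scans. The sketch below assumes you fetch rows from PostgreSQL's `pg_stat_user_tables` view with the query shown; the function name and the thresholds (`min_rows`, `min_seq_ratio`) are illustrative assumptions, not recommended values.

```python
# Query against PostgreSQL's pg_stat_user_tables statistics view.
SCAN_STATS_SQL = """
SELECT relname, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
"""


def likely_missing_indexes(rows, min_rows=10_000, min_seq_ratio=0.5):
    """Flag large tables that are read mostly via sequential scans.

    rows: iterable of (relname, seq_scan, idx_scan, n_live_tup) tuples,
    in the column order returned by SCAN_STATS_SQL.
    """
    flagged = []
    for relname, seq_scan, idx_scan, n_live_tup in rows:
        seq_scan = seq_scan or 0
        idx_scan = idx_scan or 0  # may be NULL when a table has no indexes
        total = seq_scan + idx_scan
        if total == 0 or n_live_tup < min_rows:
            continue  # unused or small tables are not worth flagging
        if seq_scan / total >= min_seq_ratio:
            flagged.append(relname)
    return flagged
```

A flagged table is only a candidate: some tables are legitimately scanned in full, so confirm with `EXPLAIN` on the actual slow queries before adding an index.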
□ Do you have automated backups with tested restores?
Not just “backups enabled” – when did you last restore from backup to verify it works?
2. Application Architecture ✓
□ Can you scale horizontally by adding more servers?
Warning signs you can’t:
- In-memory session storage (not Redis/database)
- File uploads stored on local disk (not S3/object storage)
- Background jobs running on same server as web app
- Hard-coded server IPs anywhere in config
□ Are static assets served via CDN?
Check: View source on your app. Are images/CSS/JS served from your domain or a CDN?
If from your domain → your servers are wasting resources serving static files.
□ Is there a queue for background jobs?
Red flags:
- Email sending blocks HTTP requests
- Report generation happens in web requests
- No job queue system (Sidekiq, Celery, BullMQ, SQS)
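To show the shape of the fix, here is a minimal sketch using only Python's standard library. It is an in-process stand-in for a real queue system such as Celery or SQS, not a replacement for one: `handle_signup`, `send_email`, and the `sent` list are hypothetical names, and a production queue would also need persistence and retries.

```python
import queue
import threading

jobs = queue.Queue()   # pending background jobs
sent = []              # records "delivered" emails for this demo


def send_email(to, subject):
    """Placeholder for a slow SMTP or email-API call."""
    sent.append((to, subject))


def worker():
    """Drain the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            send_email(*job)
        finally:
            jobs.task_done()


def handle_signup(email):
    """Web handler: enqueue the email and return without waiting for it."""
    jobs.put((email, "Welcome!"))
    return "202 Accepted"


def run_demo():
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    status = handle_signup("user@example.com")
    jobs.put(None)     # tell the worker to stop
    jobs.join()        # wait until every queued job is processed
    thread.join(timeout=5)
    return status
```

The point of the design is that the request handler only pays the cost of `jobs.put`, so a slow email provider no longer slows down HTTP responses.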
3. Deployment Pipeline ✓
□ Can you deploy without downtime?
Test: Run a deployment during business hours. Do users see errors or connection drops?
Blue-green or rolling deployments are essential at scale.
□ Can you roll back a bad deployment in under 5 minutes?
Do you have:
- Automated rollback command/button?
- Process to identify and revert bad deploys quickly?
- Documented rollback playbook?
□ Does deployment take under 15 minutes?
If deploys take 30+ minutes, engineers batch changes instead of shipping continuously.
Larger batches slow delivery and raise the risk of each deployment.
4. Monitoring & Observability ✓
□ Do you have alerts for critical failures?
Minimum required alerts:
- Server/container health checks failing
- Database connection pool exhausted
- Error rate above threshold (5xx responses)
- Response time P95 above SLA
- SSL certificate expiring soon
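Two of these alerts, error rate and P95 response time, reduce to simple arithmetic over recent requests. The sketch below uses the nearest-rank method for the percentile; the function names and the default thresholds (a 500 ms SLA and a 1% error rate) are assumptions for illustration, so substitute your own SLA.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def error_rate(status_codes):
    """Fraction of responses with a 5xx status code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def firing_alerts(samples_ms, status_codes,
                  p95_sla_ms=500.0, max_error_rate=0.01):
    """Return the names of alerts whose thresholds are exceeded."""
    alerts = []
    if p95(samples_ms) > p95_sla_ms:
        alerts.append("p95_latency")
    if error_rate(status_codes) > max_error_rate:
        alerts.append("error_rate")
    return alerts
```

In practice your monitoring stack computes these for you; the sketch is only meant to make the thresholds concrete.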
□ Can you trace a slow request from user → database?
Do you have:
- Application Performance Monitoring (APM)?
- Distributed tracing across microservices?
- Ability to see full request lifecycle?
□ Are you monitoring business metrics, not just technical metrics?
Track:
- Sign-ups per hour
- Failed payment attempts
- Feature usage rates
- Customer-facing transaction success rates
If a deployment breaks sign-ups but servers are “healthy,” technical monitoring alone won’t catch it.
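A business-metric alert can be as simple as comparing the current hour with a recent baseline. This is a rough sketch under stated assumptions: the function name is hypothetical, the baseline is a plain average of recent hourly sign-up counts, and the 50% cut-off is an arbitrary example that ignores daily and weekly seasonality.

```python
from statistics import mean


def signups_look_broken(recent_hourly, current_hour, min_ratio=0.5):
    """True when this hour's sign-ups fall below min_ratio of the baseline.

    recent_hourly: sign-up counts for comparable recent hours.
    current_hour: sign-up count for the hour being checked.
    """
    if not recent_hourly:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(recent_hourly)
    if baseline == 0:
        return False  # a zero baseline cannot show a drop
    return current_hour < baseline * min_ratio
```

With a baseline of 40, 50, and 60 sign-ups per hour, an hour with zero sign-ups trips the check even though every server health check may still be green.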
5. Security & Compliance ✓
□ Are secrets managed properly (not in code)?
Check:
- No API keys or passwords in GitHub
- Using secret management (AWS Secrets Manager, Vault, etc.)
- Environment variables properly injected at runtime
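On the application side, runtime injection usually means reading secrets from the environment and failing fast when one is missing. The sketch below assumes a secret manager or your deploy tooling has already populated the environment; `require_secret`, `MissingSecretError`, and the variable name `DATABASE_PASSWORD` are hypothetical.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is not configured."""


def require_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the secret called `name`, or raise if it is unset or empty.

    env defaults to os.environ; passing a dict makes the function testable.
    """
    source = os.environ if env is None else env
    value = source.get(name)
    if not value:
        # Name the missing variable, but never log secret values.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value
```

Calling `require_secret("DATABASE_PASSWORD")` once at startup turns a missing secret into an immediate, obvious failure instead of a confusing error on the first database query.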
□ Is production access restricted and audited?
Required:
- Production SSH/database access requires MFA
- Audit log of who accessed what and when
- Principle of least privilege (developers don’t have production DB passwords)
□ Are dependencies kept up to date?
Run: npm audit (Node.js) or pip-audit (Python)
If you see critical vulnerabilities → attackers can too.
6. Infrastructure as Code ✓
□ Can you recreate your entire infrastructure from code?
Test: If AWS eu-west-1 went down tomorrow, how long would it take to rebuild in us-east-1?
If the answer is “no idea” or “weeks” → you need IaC.
□ Is your infrastructure version controlled?
Check:
- All infrastructure defined in Terraform/CloudFormation/Pulumi?
- Changes go through Git and code review?
- Can you roll back infrastructure changes like code?
□ Do you have separate environments (dev/staging/prod)?
Red flags:
- Testing in production because staging doesn’t exist
- Staging shares database with production
- Can’t test infrastructure changes safely
7. Cost Management ✓
□ Do you know where your cloud spend goes?
Can you answer these in 60 seconds:
- What’s your biggest AWS cost category? (Compute, storage, data transfer?)
- Which service/product line costs most to run?
- What’s your cost per customer/transaction?
□ Have you optimized cloud costs in the last 6 months?
Quick wins often available:
- Reserved instances / Savings Plans for predictable workloads
- Rightsizing over-provisioned resources
- Deleting orphaned volumes/snapshots
- Shutting down non-production environments overnight
□ Do you have cost alerts configured?
Set up:
- Alert when monthly spend exceeds budget by 20%
- Alert on unusual spend spikes
- Per-service budgets for high-cost items
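The first two alerts come down to simple threshold checks. Most cloud providers offer built-in budget alerts, so treat this as a sketch of the logic rather than something to deploy: the function names are hypothetical, and the spike rule (today's spend above twice the recent daily average) is an assumed definition of "unusual".

```python
from statistics import mean


def budget_alert(month_spend: float, budget: float,
                 overrun: float = 0.20) -> bool:
    """True when monthly spend exceeds the budget by more than `overrun`."""
    return month_spend > budget * (1 + overrun)


def spend_spike(recent_daily: list, today: float,
                factor: float = 2.0) -> bool:
    """True when today's spend is more than `factor` times the recent average."""
    if not recent_daily:
        return False  # no history to compare against
    return today > mean(recent_daily) * factor
```

For a £1,000 monthly budget, `budget_alert(1250, 1000)` fires because £1,250 is more than 20% over, while £1,150 does not.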
Scoring Your Infrastructure
18-21 checkmarks: You’re in great shape. Minor optimizations only.
14-17 checkmarks: Some gaps. Prioritize the missing ones before scaling aggressively.
10-13 checkmarks: Significant scaling risks. You’ll hit major issues in the next 6 months.
Below 10: Critical infrastructure debt. Scaling will be painful and expensive.
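If you track your score over time, the bands above map directly onto a lookup. This sketch assumes 21 checks in total (7 areas with 3 checks each); the function name and the short band labels are illustrative.

```python
TOTAL_CHECKS = 21  # 7 areas x 3 checks each


def scaling_band(checkmarks: int) -> str:
    """Map a checkmark count to the scoring band used in this checklist."""
    if not 0 <= checkmarks <= TOTAL_CHECKS:
        raise ValueError(f"checkmarks must be between 0 and {TOTAL_CHECKS}")
    if checkmarks >= 18:
        return "great shape"
    if checkmarks >= 14:
        return "some gaps"
    if checkmarks >= 10:
        return "significant scaling risks"
    return "critical infrastructure debt"
```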
What To Do If You Scored Low
Don’t panic. In our experience, most teams at £1-3M ARR have 8-12 checkmarks.
The key is fixing this before you hit scaling problems, not after.
Our Infrastructure Audit (£3,000, 2 weeks)
We complete this checklist in detail, then deliver:
- Scored assessment of each area
- Prioritized roadmap of fixes (critical → nice-to-have)
- Specific implementation recommendations
- Cost-benefit analysis for each improvement
- We implement the top 5 quick wins for you
Platform Acceleration Programme (£23,000, 10 weeks)
For teams ready to fix everything:
- Weeks 1-2: Complete infrastructure audit
- Weeks 3-8: Implement all critical improvements
- Weeks 9-10: Team training and documentation
Guaranteed outcomes:
- 30-40% cost reduction
- 50%+ faster deployments
- Production-ready security posture
- Infrastructure ready for 10x growth
We’ll go through this checklist together and identify your biggest risks.