Lessons from Building Distributed Systems at TCS
Real-world insights from designing scalable backend systems handling millions of requests daily at Tata Consultancy Services.
Since joining TCS in June 2025, I’ve been working on backend systems that handle millions of requests daily. Here are the hard-won lessons.
1. Event-Driven > Request-Response
Switching from synchronous REST calls to Apache Kafka for inter-service communication reduced our P99 latency by 60%.
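```python
# Before: synchronous REST call; the caller blocks until service B responds
response = service_b.process(data)

# After: event-driven; send() is asynchronous and returns immediately
# (assumes a kafka-python KafkaProducer configured with a JSON value_serializer)
producer.send('events', {'type': 'process', 'data': data})
```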
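The snippet above shows only the producer side. The consuming service might look roughly like the sketch below. It assumes kafka-python on the consumer side too, and `run_consumer` and the handler registry are hypothetical names. The function accepts any iterable of records that carry a `.value` dict, so it can be exercised without a broker.

```python
from typing import Any, Callable, Dict, Iterable

Handler = Callable[[Any], None]


def run_consumer(records: Iterable[Any], handlers: Dict[str, Handler]) -> int:
    """Dispatch each event to the handler registered for its 'type'.

    `records` is anything yielding objects with a `.value` dict, e.g. a
    kafka-python KafkaConsumer created with a JSON value_deserializer.
    Returns how many events were handled; unknown event types are skipped.
    """
    handled = 0
    for record in records:
        event = record.value
        handler = handlers.get(event.get("type"))
        if handler is None:
            continue  # unknown type: skip (a real service might dead-letter it)
        handler(event["data"])
        handled += 1
    return handled


# Wiring it to Kafka (assumed kafka-python; broker address, group id and
# handle_process are placeholders):
#   consumer = KafkaConsumer("events", bootstrap_servers="localhost:9092",
#                            group_id="service-b",
#                            value_deserializer=lambda b: json.loads(b))
#   run_consumer(consumer, {"process": handle_process})
```

Because the producer no longer waits for this loop, service B's processing time drops out of the caller's request path. That decoupling is where the latency win of the event-driven approach comes from.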
2. Redis is Not Just Caching
We use Redis for:
- Rate limiting (sliding window counters)
- Session management
- Real-time leaderboards
- Distributed locks (Redlock algorithm)
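As an illustration of the first item, here is a minimal sliding-window-counter rate limiter. It assumes a redis-py style client and uses only `get`, `incr` and `expire`. The function name, key scheme and parameters are hypothetical. The limiter estimates the request count in the sliding window by weighting the previous fixed window's count by how much of that window still overlaps. The read-then-increment is not atomic, so a production version would typically move this logic into a Lua script or a MULTI/EXEC transaction.

```python
import time
from typing import Optional


def allow_request(client, user_id: str, limit: int, window_s: int,
                  now: Optional[float] = None) -> bool:
    """Sliding-window-counter rate limiter (illustrative sketch).

    `client` is assumed to expose redis-py style get/incr/expire.
    estimated = prev_count * (share of previous window still in range) + curr_count
    """
    now = time.time() if now is None else now
    window = int(now // window_s)               # index of the current fixed window
    elapsed_frac = (now % window_s) / window_s  # how far into the current window we are
    curr_key = f"ratelimit:{user_id}:{window}"  # hypothetical key scheme
    prev_key = f"ratelimit:{user_id}:{window - 1}"

    prev_count = int(client.get(prev_key) or 0)  # redis-py returns bytes or None
    curr_count = int(client.get(curr_key) or 0)
    estimated = prev_count * (1 - elapsed_frac) + curr_count
    if estimated >= limit:
        return False  # over the limit: reject without counting this request

    client.incr(curr_key)
    # Keep the counter for two windows so it can serve as the "previous" window.
    client.expire(curr_key, 2 * window_s)
    return True
```

With redis-py this could be called as allow_request(redis.Redis(), "user-42", limit=100, window_s=60). Each user then costs two small counters, rather than one sorted-set entry per request as in a sliding-window log.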
3. Circuit Breakers Save Lives
When a downstream service starts failing, a circuit breaker fails fast instead of letting requests pile up on it, which prevents cascading failures:
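```javascript
// Assumes the Node.js opossum library, whose option names match these.
const CircuitBreaker = require('opossum');

const breaker = new CircuitBreaker(apiCall, {  // apiCall: an async function
  timeout: 3000,                 // calls slower than 3 s count as failures
  errorThresholdPercentage: 50,  // open the circuit once 50% of calls fail
  resetTimeout: 30000,           // after 30 s, allow a trial call (half-open)
});
```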
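The options above drive a small closed/open/half-open state machine. The sketch below is a simplified Python re-implementation written for illustration, not opossum's actual code. `error_threshold_pct` and `reset_timeout_s` play the roles of errorThresholdPercentage and resetTimeout. The failure rate is measured over the last `window` calls rather than over a time window, `min_calls` is an assumed guard against tripping on tiny samples, and the per-call timeout is left out. The sketch is single-threaded; under concurrency, the half-open state would need to admit only one trial call.

```python
import time
from collections import deque
from typing import Any, Callable, Deque


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    """Simplified closed -> open -> half-open breaker (illustrative sketch)."""

    def __init__(
        self,
        func: Callable[..., Any],
        error_threshold_pct: float = 50.0,  # like errorThresholdPercentage
        reset_timeout_s: float = 30.0,      # like resetTimeout, but in seconds
        window: int = 20,                   # assumed: judge the last N calls
        min_calls: int = 5,                 # assumed: don't trip on tiny samples
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.func = func
        self.error_threshold_pct = error_threshold_pct
        self.reset_timeout_s = reset_timeout_s
        self.min_calls = min_calls
        self.clock = clock
        self.results: Deque[bool] = deque(maxlen=window)  # True = success
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half_open"  # reset timeout elapsed: allow a trial call
        try:
            result = self.func(*args, **kwargs)
        except Exception:
            self._record(success=False)
            raise
        self._record(success=True)
        return result

    def _record(self, success: bool) -> None:
        if self.state == "half_open":
            # The trial call decides: success closes the circuit, failure re-opens it.
            if success:
                self.state = "closed"
                self.results.clear()
            else:
                self._trip()
            return
        self.results.append(success)
        failures = self.results.count(False)
        if (len(self.results) >= self.min_calls
                and 100.0 * failures / len(self.results) >= self.error_threshold_pct):
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.results.clear()  # start with a fresh window once the circuit closes again
```

Passing the clock in as a parameter keeps the reset timeout testable without real sleeps.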
4. Observability is Non-Negotiable
You can’t fix what you can’t see. We use:
- Prometheus + Grafana for metrics
- Jaeger for distributed tracing
- ELK stack for centralized logging
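On the logging side, one common pattern is to emit one JSON object per line. A shipper such as Logstash or Filebeat can then index fields without regex parsing, and a trace ID field lets a log line in Kibana be matched to its Jaeger trace. The sketch below uses only the standard library. The `JsonFormatter` and `make_logger` names and the field layout are hypothetical, and the trace ID is assumed to be passed explicitly via `extra` rather than pulled from a tracing SDK.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Supplied via extra={"trace_id": ...}; None when no trace is active.
            "trace_id": getattr(record, "trace_id", None),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str, stream=None) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace existing handlers so output isn't duplicated
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as log.info("order %s placed", 42, extra={"trace_id": "abc123"}) then produces one JSON line with message "order 42 placed" and the trace ID as a separate, searchable field.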
5. Design for Failure
Every service assumes its dependencies will fail. Graceful degradation > complete outage.
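One common way to degrade gracefully is to serve the last known good value, or a safe default, when a dependency is unavailable. Below is a minimal sketch with a hypothetical `DegradingClient` wrapper. It assumes that dependency failures surface as ConnectionError or TimeoutError. The in-memory cache is unbounded here; a real service would bound it or keep it in Redis.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class DegradingClient(Generic[K, V]):
    """Wraps a dependency call and degrades instead of failing outright."""

    def __init__(self, fetch: Callable[[K], V], default: V) -> None:
        self.fetch = fetch
        self.default = default
        self.last_good: Dict[K, V] = {}  # last successful response per key

    def get(self, key: K) -> V:
        try:
            value = self.fetch(key)
        except (ConnectionError, TimeoutError):
            # Dependency unavailable: serve stale data if we have it, else the default.
            return self.last_good.get(key, self.default)
        self.last_good[key] = value
        return value
```

Paired with a circuit breaker, the fallback path also covers calls rejected while the circuit is open, as long as the breaker's rejection error is added to the caught exceptions.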