Sahil Kumar Sahu
Role
Senior Site Reliability Engineer
Companies
Bosch · Zeta · Oracle
2021 – Present
I build systems that don't wake you up at 3am. Production-grade Kubernetes, GitOps and observability at scale.

LOCATION
Bengaluru, Karnataka
STATUS
Open to Work
About
Reliability is easy to claim.
Hard to hold under load.
Who I am
Senior Site Reliability Engineer with 5+ years designing cloud-native platforms at Oracle, Zeta and Robert Bosch. I specialise in Kubernetes, GitOps and observability — building systems that are highly available, secure and operationally quiet so engineering teams can ship without fear. The goal is always the same: make reliability invisible.
5+ Years SRE
$10.5K Monthly Savings
40% MTTR Reduced
100+ Microservices
3,000+ Alerts Optimised
50+ Nodes Migrated
Quick facts
| Current Focus | Kubernetes Platform Engineering · GitOps · Cloud Security |
| Location | Bengaluru, Karnataka, India |
| Open To | Senior SRE · Staff DevOps · Platform Engineering |
| Education | BTech CSE · CET Bhubaneswar · CGPA 8.86 |
| Publication | IEEE 2018 · TCA Task Scheduling Algorithm |
| Mentorship | topmate.io/sahilsahu246 |
Experience & Education
The trajectory.
2017
BTech Computer Science Engineering
CET Bhubaneswar
CGPA 8.86 · IEEE publication 2018 · Active IEEE student chapter member.
- Graduated with CGPA 8.86, consistently in the top percentile of the batch
- Published the TCA Task Scheduling Algorithm paper at IEEE 2018 (peer-reviewed international conference proceedings)
- Proposed TCA (Task-Cluster Algorithm), benchmarked against FCFS and Round Robin, with improved throughput and resource utilisation
- Active IEEE student chapter member: organised 75+ technical events and workshops
- Final-year capstone: developed and benchmarked novel scheduling heuristics for distributed systems
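For readers unfamiliar with the two baselines named above, here is a minimal Python sketch of how average waiting time is typically computed for FCFS and Round Robin. It does not reproduce TCA, which this page does not describe. The function names, burst times and quantum are illustrative assumptions, and every task is assumed to arrive at time 0.

```python
from collections import deque


def fcfs_avg_wait(bursts):
    """Average waiting time when tasks run to completion in arrival order."""
    elapsed = 0
    total_wait = 0
    for burst in bursts:
        total_wait += elapsed  # a task waits for everything queued before it
        elapsed += burst
    return total_wait / len(bursts)


def round_robin_avg_wait(bursts, quantum):
    """Average waiting time under Round Robin with a fixed time quantum."""
    remaining = list(bursts)
    queue = deque(range(len(bursts)))
    finish = [0] * len(bursts)
    clock = 0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)  # not finished: back of the queue
        else:
            finish[i] = clock
    # With arrival at time 0, waiting time = completion time - burst time.
    return sum(f - b for f, b in zip(finish, bursts)) / len(bursts)


if __name__ == "__main__":
    sample = [24, 3, 3]  # hypothetical burst times
    print(fcfs_avg_wait(sample))
    print(round_robin_avg_wait(sample, quantum=4))
```

A new heuristic would be scored against the same workload using the same metric, which is what makes this kind of comparison meaningful.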
Associate Software Engineer
Robert Bosch
Owned infra for 100+ apps including 20+ critical services ensuring high availability.
- Spearheaded migration of infrastructure to CI/CD across build and production environments
- Built Ansible automation for monthly release patch fixes across applications
- Owned end-to-end infrastructure for 100+ apps, 20+ of them critical with HA guarantees
- Configured error-rate tracking and distributed tracing (APM) in Datadog and New Relic
2021
2022
Site Reliability Engineer 1
Zeta
Saved $10,500/month in cloud costs. Reduced p95 latency 3 s → 1 s.
- Orchestrated cloud cost-saving measures: $10,500+ monthly savings
- Led observability for 100+ microservices, defining SLIs, SLOs and error budgets
- Removed 3,000+ redundant Prometheus alert rules, significantly reducing alert noise
- Reduced p95 latency from 3 s to 1 s for Sodexo transactions via private endpoint migration
- Created Jenkins seed jobs for deployment pipelines and payment-service token rotation
- Migrated centralised logging from ELK to OpenSearch for lower cost and better scalability
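To make the SLO and error-budget item above concrete, here is a small Python sketch of the usual arithmetic. An availability SLO implies an allowed number of failed requests, and the remaining budget is the share of that allowance not yet consumed. The 99.9% target and the request counts are hypothetical examples, not figures from Zeta, and the function name is an assumption.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, remaining_fraction) for a request-based SLO.

    slo_target is a fraction such as 0.999. remaining_fraction goes negative
    once the budget is overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed = (1 - slo_target) * total_requests
    remaining = 1 - failed_requests / allowed
    return allowed, remaining


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO.
    allowed, remaining = error_budget(0.999, 1_000_000, 250)
    print(f"allowed failures: {allowed:.0f}, budget remaining: {remaining:.0%}")
```

A negative remaining fraction is the usual signal to slow releases and prioritise reliability work until the budget recovers.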
Site Reliability Developer 2
Oracle
Raised security score 64%→85%. Reduced recurring P1 incidents by 40%.
- Migrated 50+ on-premise CPDI nodes to OKE, OCI Compute and OCI Block Volume
- Improved OCI IAM, Cloud Guard and Security Zones, raising the security score from 64% to 85%
- Configured OCI Monitoring, Alarms and Notifications via email, Slack and PagerDuty
- Built Ansible automation to self-heal Kubernetes Operations Framework, cutting toil by 20%
- Led blameless postmortems and RCA for P1 incidents, cutting recurring issues by 40%
2024
Impact
Proof, not promises.
Every number delivered, not projected.
Work
What the arc produced.
DevOps CI/CD Pipeline on AWS
Personal · 2024
Approach
Built from scratch on a custom VPC. Jenkins master on EC2 with GitHub webhook triggers. ALB handles routing, ASG manages scaling. SSM Run Command replaces SSH for zero-touch deployments.
DevSecOps Pipeline · TravellerHub
Personal · 2024
Approach
Jenkins CI master+worker on EC2. SonarQube for code quality, OWASP for dependency vulnerabilities, Trivy for container scanning. ArgoCD GitOps for deployment. Prometheus and Grafana for observability.
pylance.in — AI Tools Platform
Self-Built · 2025 – Present
Approach
Next.js App Router with TypeScript. Claude API for AI generation. Mammoth and pdfjs for file parsing. Vercel for deployment. Razorpay for payment integration on premium tools.
Stack
What I run in production.
Battle-tested under load.
Not just imported.
Container and Orchestration
Kubernetes (EKS · OKE)
Docker
Helm
ArgoCD
GitOps
Cloud Platforms
AWS (EC2 · S3 · ALB)
Oracle Cloud (OCI)
Terraform
Ansible
CI/CD and GitOps
Jenkins
GitHub Actions
GitLab CI
Observability
Prometheus
Grafana
Datadog
Languages
Python
Go
Bash
Linux
Databases and Tooling
PostgreSQL
MySQL
Redis
MongoDB
Nginx
Contact
Hard problems welcome.
Book a 1:1 for resume reviews, SRE mentorship or DevOps guidance — or just to talk shop about Kubernetes, GitOps and keeping production quiet at 3am. I reply to every message.