Pavan Madduri pmady

Hi there, I'm Pavan 👋

About Me

Senior Cloud Platform Engineer at W.W. Grainger, Inc. and CNCF Golden Kubestronaut. Deep expertise in cloud-native GPU/AI infrastructure, Kubernetes ecosystems, and platform engineering. Building open-source tools for GPU workload autoscaling, observability, and topology-aware incident response.

📊 GitHub Stats

Stats updated on 2026-04-19 01:15 UTC

🏆 Certifications

Kubestronaut - One of the elite professionals who have achieved all five Kubernetes certifications from the CNCF:

KCNA (Kubernetes and Cloud Native Associate)
CKA (Certified Kubernetes Administrator)
CKAD (Certified Kubernetes Application Developer)
CKS (Certified Kubernetes Security Specialist)
KCSA (Kubernetes and Cloud Native Security Associate)

🌱 Open Source Contributions

Actively contributing to CNCF (Cloud Native Computing Foundation) and ASWF (Academy Software Foundation) projects:

CNCF (Cloud Native Computing Foundation)

Project	Description	Contributions
Dragonfly	P2P-based file distribution and image acceleration	client#1665 - Add Hugging Face backend support with hf:// protocol, client#1673 - Add ModelScope backend support with modelscope:// protocol, d7y.io#386 - Add hf:// protocol documentation, d7y.io#398 - Add P2P-accelerated AI model downloads blog post, helm-charts#455 - Add injector support to helm chart, helm-charts#480 - Replace deprecated bitnamilegacy/mysql with bitnami/mysql
Kubernetes	Production-Grade Container Orchestration	#53891 - Document deployment.kubernetes.io/* annotations, #53892 - Add kubectl apply view-last-applied documentation
TiKV	Distributed transactional key-value database	#19225 - Add AGENTS.md for AI agent guidance
Volcano	Cloud-native batch scheduling for AI/HPC	#5095 - GPU NUMA topology awareness in scheduler, apis#229 - Add GPUInfo type to NumatopoSpec CRD, resource-exporter#12 - GPU NUMA topology discovery via sysfs
KEDA	Kubernetes Event-driven Autoscaling	keda-docs#1658 - Removing metricName from the kedadocs, #7538 - GPU/AI inference scaler architectural analysis
Metal³	Bare metal host provisioning for Kubernetes	#624 - Fix redirect links in tryit.md
OpenTelemetry	Observability framework	#8632 - Add .NET troubleshooting page
kpt	Kubernetes-native packaging and resource management	#4278 - Fix kpt fn doc command for KRM functions expecting input

ASWF (Academy Software Foundation)

Project	Description	Contributions
OpenColorIO	Color management library	#2229 - Add release signing workflow, #2230 - Add Dependabot configuration, #2243 - Add Vulkan unit test framework
OpenCue	Cloud rendering management system	#2134 - Add scheduled subscription recalculation task
OpenImageIO	Image processing library	#4976 - Fix IBA::compare_Yee() channel access
RAWtoACES	RAW to ACES image conversion	#222 - Add build developer documentation
xSTUDIO	Playback and review application	#186 - Fix broken build guide links

Total: 26 PRs across 15 projects in the CNCF and ASWF!

Personal Projects

Project	Description	Contributions
keda-gpu-scaler	KEDA External gRPC Scaler for GPU/AI workloads	Native NVML metrics, DaemonSet deployment, pre-built scaling profiles (vLLM, Triton, training), Helm chart, scale-to-zero
otel-gpu-receiver	OpenTelemetry Collector receiver for GPU metrics	NVIDIA GPU metrics via NVML, OpenTelemetry-native, Prometheus exporter, multi-GPU support
kube-topology-agent	K8s topology discovery & automated root-cause analysis	Knowledge graph of cluster resources, AlertManager webhook integration, GPU workload classification, blast-radius analysis
Golden Kubestronaut Learning	Kubernetes certification study guides and resources	#23 - Dark mode persistence, #24 - PDF generation workflow

Research Publications

Peer-reviewed research on Cloud-Native, Kubernetes, AI/ML Operations, and Platform Engineering:

AI & Agentic Systems

#	Paper	Links
1	AI Security: Preemptive Cybersecurity — Using AI Agents for Proactive Threat Hunting in Cloud-Native Environments	Scholar · ResearchGate
2	Agentic AI Introduction: Model Context Protocol (MCP) — Bridging Large Language Models and Real-Time Kubernetes Observability	Scholar · ResearchGate
3	Scale & LLM-Ops: Architecting LLM-as-a-Service — Infrastructure Requirements for High-Concurrency Agentic Workloads	Scholar · ResearchGate

SRE & Self-Healing Infrastructure

#	Paper	Links
4	Agentic SRE Teams: Human-Agent Collaboration — A New Operational Model for Autonomous Incident Response	Scholar · ResearchGate
5	Autonomous Remediation and Agentic SRE Teams: Reinforcement Learning for Self-Healing Infrastructure and Human-Agent Collaboration in Incident Response	Scholar · ResearchGate
6	From PagerDuty to 'Agentic Ops': The Rise of Self-Healing Kubernetes	Scholar · ResearchGate

Platform Engineering & GitOps

#	Paper	Links
7	Platform Engineering Foundations: The Internal Developer Platform (IDP) — A Qualitative Study on Reducing Cognitive Load for Java Developers	Scholar · ResearchGate
8	GitOps & Stability: Formal Verification of ArgoCD Manifests — Preventing Deployment Drift in Mission-Critical Platforms	Scholar · ResearchGate
9	Beyond Basic Sync: Why ArgoCD v3 is the Backbone of Modern Platform Engineering	Scholar · ResearchGate

Kubernetes & Cloud Infrastructure

#	Paper	Links
10	The Efficiency Era: How Kubernetes v1.35 Finally Solves the "Restart" Headache	Scholar · ResearchGate
11	FinOps & Resource Efficiency: Predictive Autoscaling Using Time-Series Analysis to Reduce Cloud Waste in EKS Clusters	Scholar · ResearchGate
12	Zero-Trust Infrastructure: Automated Identity Governance in Kubernetes — A Framework for Zero-Trust Microservices	Scholar · ResearchGate
13	Multi-Cluster Orchestration: Performance Benchmarking of Cross-Cluster Service Meshes in High-Traffic Retail Environments	Scholar · ResearchGate

📋 How to Cite My Work (BibTeX)

@article{madduri2026ai_security,
  author  = {Madduri, Pavan},
  title   = {AI Security: Preemptive Cybersecurity — Using AI Agents for Proactive Threat Hunting in Cloud-Native Environments},
  journal = {ACTA SCIENTIAE},
  year    = {2026},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2025agentic_mcp,
  author  = {Madduri, Pavan},
  title   = {Agentic AI Introduction: Model Context Protocol (MCP) — Bridging Large Language Models and Real-Time Kubernetes Observability},
  journal = {Power System Protection and Control},
  year    = {2025},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2026llm_ops,
  author  = {Madduri, Pavan},
  title   = {Scale and LLM-Ops: Architecting LLM-as-a-Service — Infrastructure Requirements for High-Concurrency Agentic Workloads},
  journal = {Power System Protection and Control},
  year    = {2026},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2026agentic_sre,
  author  = {Madduri, Pavan},
  title   = {Agentic SRE Teams: Human-Agent Collaboration — A New Operational Model for Autonomous Incident Response},
  journal = {Power System Protection and Control},
  year    = {2026},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2025platform_engineering,
  author  = {Madduri, Pavan},
  title   = {Platform Engineering Foundations: The Internal Developer Platform (IDP) — A Qualitative Study on Reducing Cognitive Load for Java Developers},
  journal = {ACTA SCIENTIAE},
  year    = {2025},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2024gitops_verification,
  author  = {Madduri, Pavan},
  title   = {GitOps and Stability: Formal Verification of ArgoCD Manifests — Preventing Deployment Drift in Mission-Critical Platforms},
  journal = {Power System Protection and Control},
  year    = {2024},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2024finops_autoscaling,
  author  = {Madduri, Pavan},
  title   = {FinOps and Resource Efficiency: Predictive Autoscaling Using Time-Series Analysis to Reduce Cloud Waste in EKS Clusters},
  journal = {ACTA SCIENTIAE},
  year    = {2024},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2023zero_trust,
  author  = {Madduri, Pavan},
  title   = {Zero-Trust Infrastructure: Automated Identity Governance in Kubernetes — A Framework for Zero-Trust Microservices},
  journal = {Power System Protection and Control},
  year    = {2023},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2023service_mesh,
  author  = {Madduri, Pavan},
  title   = {Multi-Cluster Orchestration: Performance Benchmarking of Cross-Cluster Service Meshes in High-Traffic Retail Environments},
  journal = {ACTA SCIENTIAE},
  year    = {2023},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

🚀 Featured Projects - Looking for Contributors!

I'm actively developing these open source projects and welcome contributors of all skill levels!

🎮 KEDA GPU Scaler

KEDA External gRPC Scaler for GPU/AI workloads

🎮 Native NVML - Direct GPU metrics via go-nvml
🚀 Scaling Profiles - vLLM, Triton, training presets
DaemonSet - Per-node GPU metric collection
🔄 Scale-to-Zero - GPU-aware idle detection

Tech Stack: Go, gRPC, NVIDIA NVML, Kubernetes, Helm

Referenced in KEDA #7538
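
As a rough illustration of what "GPU-aware idle detection" and scale-to-zero can mean, here is a minimal Python sketch. It is not the project's code (the scaler itself is written in Go). The function names, the 5% idle threshold, and the 70% target utilization are all assumptions made up for this example. It relies on the usual KEDA split of responsibilities: the scaler's activity check decides between zero and one replica, and the standard HPA proportional rule, ceil(current * metric / target), handles scaling beyond one.

```python
import math
from typing import Sequence


def is_active(util_samples: Sequence[float], idle_threshold: float = 5.0) -> bool:
    """True if any recent GPU utilization sample (percent) is above the idle threshold."""
    return any(sample > idle_threshold for sample in util_samples)


def desired_replicas(current: int, avg_util: float, target_util: float,
                     max_replicas: int = 10) -> int:
    """HPA-style proportional rule: ceil(current * metric / target), clamped to [1, max]."""
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    raw = math.ceil(current * avg_util / target_util)
    return max(1, min(max_replicas, raw))


def plan(current: int, util_samples: Sequence[float], target_util: float = 70.0,
         idle_threshold: float = 5.0, max_replicas: int = 10) -> int:
    """Hypothetical planner: hold on missing data, go to zero when idle, else scale."""
    if not util_samples:
        # No metrics collected: keep the current replica count rather than guess.
        return current
    if not is_active(util_samples, idle_threshold):
        # Every sample is at or below the idle threshold: scale to zero.
        return 0
    avg = sum(util_samples) / len(util_samples)
    # Treat a scaled-to-zero workload as one replica when it wakes up.
    return desired_replicas(max(current, 1), avg, target_util, max_replicas)


if __name__ == "__main__":
    print(plan(2, [1.0, 2.0, 0.5]))     # idle GPUs
    print(plan(2, [90.0, 95.0, 85.0]))  # busy GPUs
```

Holding the replica count when no samples arrive is a deliberate choice in this sketch: a missing metric is treated as "unknown", not "idle", so a broken collector cannot trigger a scale-down.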

OpenTelemetry GPU Receiver

OpenTelemetry Collector receiver for GPU metrics

🔋 NVIDIA NVML - GPU utilization, memory, temperature
OTel Native - Standard OTLP export pipeline
Multi-GPU - All devices on the node
📈 Prometheus - Built-in Prometheus exporter

Tech Stack: Go, OpenTelemetry Collector SDK, NVML

🧠 Kube Topology Agent

K8s knowledge graph & automated root-cause analysis

🗺️ Knowledge Graph - Real-time resource topology
Root-Cause Traversal - Graph-based incident investigation
🎮 GPU Aware - Training/inference/batch classification
🔔 AlertManager - Webhook integration for auto-investigation

Tech Stack: Go, Kubernetes API, Gorilla Mux, Helm
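
To give a feel for graph-based root-cause traversal and blast-radius analysis, here is a small Python sketch over a toy dependency graph. It is only an illustration under stated assumptions, not the agent's implementation (which is in Go). The resource names, the `DEPENDS_ON` map, and the helper functions are hypothetical. The idea is that walking "depends on" edges from an alerting resource yields root-cause candidates, and walking the inverted edges from a failed resource yields everything that could be affected.

```python
from collections import deque
from typing import Dict, List

# Hypothetical topology: each key depends on the resources in its list.
DEPENDS_ON: Dict[str, List[str]] = {
    "ingress/api": ["service/inference"],
    "service/inference": ["deployment/inference"],
    "deployment/inference": ["pod/inference-0", "pod/inference-1"],
    "pod/inference-0": ["node/gpu-a"],
    "pod/inference-1": ["node/gpu-b"],
}


def invert(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip every edge so the result maps a resource to its direct dependents."""
    inverted: Dict[str, List[str]] = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


def reachable(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first traversal; returns nodes reachable from start, nearest first."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def root_cause_candidates(alerting: str) -> List[str]:
    """Everything the alerting resource depends on, directly or transitively."""
    return reachable(DEPENDS_ON, alerting)


def blast_radius(failed: str) -> List[str]:
    """Everything that depends on the failed resource, directly or transitively."""
    return reachable(invert(DEPENDS_ON), failed)


if __name__ == "__main__":
    print(root_cause_candidates("ingress/api"))
    print(blast_radius("node/gpu-a"))
```

Breadth-first order puts the nearest resources first, which in this toy model means the most direct suspects (or the most directly affected workloads) are listed before distant ones.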

🤝 How to Contribute

Pick a project that interests you
Check issues labeled "good first issue" or "help wanted"
Fork & Clone the repository
Submit a PR - I review all PRs promptly!

All contributions welcome:

💻 Code contributions
📖 Documentation improvements
🐛 Bug reports
💡 Feature suggestions
⭐ Star the repos!

More projects: KubeAI Autoscaler · Ingress2Gateway · LLMOps

☁️ Cloud Platforms

AWS - Primary cloud platform for production workloads
Azure - Previous experience with enterprise deployments

🔧 Technologies & Tools

Container Orchestration & GitOps

Kubernetes - Production cluster management, multi-tenancy, and workload orchestration
ArgoCD - GitOps-driven continuous delivery and application lifecycle management
Docker - Container image building and runtime management
Crossplane - Kubernetes-native infrastructure provisioning and composition

Observability

Prometheus & Grafana - Metrics collection, alerting, and dashboard visualization
Splunk - Enterprise log aggregation and security analytics
Datadog - Full-stack monitoring and application performance management
OpenTelemetry - Vendor-neutral distributed tracing and telemetry collection

Policy Management

Kyverno - Kubernetes-native policy engine for security and compliance
OPA (Open Policy Agent) - Unified policy enforcement across the stack

CI/CD

GitHub Actions - Cloud-native workflow automation and CI/CD pipelines
Jenkins - Enterprise CI/CD automation and pipeline orchestration
Flux - GitOps toolkit for Kubernetes continuous delivery
UrbanCode Deploy - Enterprise application release automation

Big Data

PrestoDB & Trino - High-performance distributed SQL query engines for analytics
Apache Superset - Modern data exploration and business intelligence platform
Alluxio - Unified data orchestration for compute and storage
Jupyter Notebooks - Interactive data science and machine learning workflows

🤖 Interests

Deeply interested in the convergence of AI/ML and Kubernetes - enabling organizations to run machine learning workloads at scale on cloud-native infrastructure. Exploring MLOps practices, GPU scheduling, and AI platform engineering.

📝 Blog

Sharing insights on DevOps best practices, Kubernetes deep-dives, and cloud-native architecture:

👉 pavanmadduri.wordpress.com

💬 Let's Connect

Always open to connecting with fellow engineers and enthusiasts in the cloud-native and AI/ML space!

💬 Collaborate - Open an issue or discussion on any of my repositories
🤝 Partner - Interested in contributing to CNCF or ASWF projects together
👋 Network - Happy to exchange ideas and share experiences

Let's build something great together! 🚀
