pmady/README.md

Hi there, I'm Pavan 👋

LinkedIn Blog HackerNoon Medium Dev.to Google Scholar ResearchGate Profile Views

About Me

Senior Cloud Platform Engineer at W.W. Grainger, Inc. and CNCF Golden Kubestronaut. Deep expertise in cloud-native GPU/AI infrastructure, Kubernetes ecosystems, and platform engineering. Building open-source tools for GPU workload autoscaling, observability, and topology-aware incident response.

📊 GitHub Stats

Stats updated on 2026-04-19 01:15 UTC

🏆 Certifications

Kubestronaut

Kubestronaut - One of the elite professionals who have achieved all five Kubernetes certifications from the CNCF:

  • KCNA (Kubernetes and Cloud Native Associate)
  • CKA (Certified Kubernetes Administrator)
  • CKAD (Certified Kubernetes Application Developer)
  • CKS (Certified Kubernetes Security Specialist)
  • KCSA (Kubernetes and Cloud Native Security Associate)

🌱 Open Source Contributions

CNCF Contributor

Actively contributing to CNCF (Cloud Native Computing Foundation) and ASWF (Academy Software Foundation) projects:

CNCF (Cloud Native Computing Foundation)

| Project | Description | Contributions |
| --- | --- | --- |
| Dragonfly | P2P-based file distribution and image acceleration | client#1665 - Add Hugging Face backend support with hf:// protocol, client#1673 - Add ModelScope backend support with modelscope:// protocol, d7y.io#386 - Add hf:// protocol documentation, d7y.io#398 - Add P2P-accelerated AI model downloads blog post, helm-charts#455 - Add injector support to helm chart, helm-charts#480 - Replace deprecated bitnamilegacy/mysql with bitnami/mysql |
| Kubernetes | Production-Grade Container Orchestration | #53891 - Document deployment.kubernetes.io/* annotations, #53892 - Add kubectl apply view-last-applied documentation |
| TiKV | Distributed transactional key-value database | #19225 - Add AGENTS.md for AI agent guidance |
| Volcano | Cloud-native batch scheduling for AI/HPC | #5095 - GPU NUMA topology awareness in scheduler, apis#229 - Add GPUInfo type to NumatopoSpec CRD, resource-exporter#12 - GPU NUMA topology discovery via sysfs |
| KEDA | Kubernetes Event-driven Autoscaling | keda-docs#1658 - Removing metricName from the kedadocs, #7538 - GPU/AI inference scaler architectural analysis |
| Metal³ | Bare metal host provisioning for Kubernetes | #624 - Fix redirect links in tryit.md |
| OpenTelemetry | Observability framework | #8632 - Add .NET troubleshooting page |
| kpt | Kubernetes-native packaging and resource management | #4278 - Fix kpt fn doc command for KRM functions expecting input |

ASWF (Academy Software Foundation)

| Project | Description | Contributions |
| --- | --- | --- |
| OpenColorIO | Color management library | #2229 - Add release signing workflow, #2230 - Add Dependabot configuration, #2243 - Add Vulkan unit test framework |
| OpenCue | Cloud rendering management system | #2134 - Add scheduled subscription recalculation task |
| OpenImageIO | Image processing library | #4976 - Fix IBA::compare_Yee() channel access |
| RAWtoACES | RAW to ACES image conversion | #222 - Add build developer documentation |
| xSTUDIO | Playback and review application | #186 - Fix broken build guide links |

Total: 26 PRs across 15 projects in CNCF and ASWF foundations!

Personal Projects

| Project | Description | Contributions |
| --- | --- | --- |
| keda-gpu-scaler | KEDA External gRPC Scaler for GPU/AI workloads | Native NVML metrics, DaemonSet deployment, pre-built scaling profiles (vLLM, Triton, training), Helm chart, scale-to-zero |
| otel-gpu-receiver | OpenTelemetry Collector receiver for GPU metrics | NVIDIA GPU metrics via NVML, OpenTelemetry-native, Prometheus exporter, multi-GPU support |
| kube-topology-agent | K8s topology discovery & automated root-cause analysis | Knowledge graph of cluster resources, AlertManager webhook integration, GPU workload classification, blast-radius analysis |
| Golden Kubestronaut Learning | Kubernetes certification study guides and resources | #23 - Dark mode persistence, #24 - PDF generation workflow |

Research Publications

Google Scholar ResearchGate

Peer-reviewed research on Cloud-Native, Kubernetes, AI/ML Operations, and Platform Engineering:

AI & Agentic Systems

| # | Paper | Links |
| --- | --- | --- |
| 1 | AI Security: Preemptive Cybersecurity — Using AI Agents for Proactive Threat Hunting in Cloud-Native Environments | Scholar · ResearchGate |
| 2 | Agentic AI Introduction: Model Context Protocol (MCP) — Bridging Large Language Models and Real-Time Kubernetes Observability | Scholar · ResearchGate |
| 3 | Scale & LLM-Ops: Architecting LLM-as-a-Service — Infrastructure Requirements for High-Concurrency Agentic Workloads | Scholar · ResearchGate |

SRE & Self-Healing Infrastructure

| # | Paper | Links |
| --- | --- | --- |
| 4 | Agentic SRE Teams: Human-Agent Collaboration — A New Operational Model for Autonomous Incident Response | Scholar · ResearchGate |
| 5 | Autonomous Remediation and Agentic SRE Teams: Reinforcement Learning for Self-Healing Infrastructure and Human-Agent Collaboration in Incident Response | Scholar · ResearchGate |
| 6 | From PagerDuty to 'Agentic Ops': The Rise of Self-Healing Kubernetes | Scholar · ResearchGate |

Platform Engineering & GitOps

| # | Paper | Links |
| --- | --- | --- |
| 7 | Platform Engineering Foundations: The Internal Developer Platform (IDP) — A Qualitative Study on Reducing Cognitive Load for Java Developers | Scholar · ResearchGate |
| 8 | GitOps & Stability: Formal Verification of ArgoCD Manifests — Preventing Deployment Drift in Mission-Critical Platforms | Scholar · ResearchGate |
| 9 | Beyond Basic Sync: Why ArgoCD v3 is the Backbone of Modern Platform Engineering | Scholar · ResearchGate |

Kubernetes & Cloud Infrastructure

| # | Paper | Links |
| --- | --- | --- |
| 10 | The Efficiency Era: How Kubernetes v1.35 Finally Solves the "Restart" Headache | Scholar · ResearchGate |
| 11 | FinOps & Resource Efficiency: Predictive Autoscaling Using Time-Series Analysis to Reduce Cloud Waste in EKS Clusters | Scholar · ResearchGate |
| 12 | Zero-Trust Infrastructure: Automated Identity Governance in Kubernetes — A Framework for Zero-Trust Microservices | Scholar · ResearchGate |
| 13 | Multi-Cluster Orchestration: Performance Benchmarking of Cross-Cluster Service Meshes in High-Traffic Retail Environments | Scholar · ResearchGate |

📋 How to Cite My Work (BibTeX)

@article{madduri2026ai_security,
  author  = {Madduri, Pavan},
  title   = {AI Security: Preemptive Cybersecurity — Using AI Agents for Proactive Threat Hunting in Cloud-Native Environments},
  journal = {ACTA SCIENTIAE},
  year    = {2026},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2025agentic_mcp,
  author  = {Madduri, Pavan},
  title   = {Agentic AI Introduction: Model Context Protocol (MCP) — Bridging Large Language Models and Real-Time Kubernetes Observability},
  journal = {Power System Protection and Control},
  year    = {2025},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2026llm_ops,
  author  = {Madduri, Pavan},
  title   = {Scale and LLM-Ops: Architecting LLM-as-a-Service — Infrastructure Requirements for High-Concurrency Agentic Workloads},
  journal = {Power System Protection and Control},
  year    = {2026},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2026agentic_sre,
  author  = {Madduri, Pavan},
  title   = {Agentic SRE Teams: Human-Agent Collaboration — A New Operational Model for Autonomous Incident Response},
  journal = {Power System Protection and Control},
  year    = {2026},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2025platform_engineering,
  author  = {Madduri, Pavan},
  title   = {Platform Engineering Foundations: The Internal Developer Platform (IDP) — A Qualitative Study on Reducing Cognitive Load for Java Developers},
  journal = {ACTA SCIENTIAE},
  year    = {2025},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2024gitops_verification,
  author  = {Madduri, Pavan},
  title   = {GitOps and Stability: Formal Verification of ArgoCD Manifests — Preventing Deployment Drift in Mission-Critical Platforms},
  journal = {Power System Protection and Control},
  year    = {2024},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2024finops_autoscaling,
  author  = {Madduri, Pavan},
  title   = {FinOps and Resource Efficiency: Predictive Autoscaling Using Time-Series Analysis to Reduce Cloud Waste in EKS Clusters},
  journal = {ACTA SCIENTIAE},
  year    = {2024},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2023zero_trust,
  author  = {Madduri, Pavan},
  title   = {Zero-Trust Infrastructure: Automated Identity Governance in Kubernetes — A Framework for Zero-Trust Microservices},
  journal = {Power System Protection and Control},
  year    = {2023},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

@article{madduri2023service_mesh,
  author  = {Madduri, Pavan},
  title   = {Multi-Cluster Orchestration: Performance Benchmarking of Cross-Cluster Service Meshes in High-Traffic Retail Environments},
  journal = {ACTA SCIENTIAE},
  year    = {2023},
  note    = {Available at: https://scholar.google.com/citations?user=au0O-8oAAAAJ}
}

🚀 Featured Projects - Looking for Contributors!

I'm actively developing these open source projects and welcome contributors of all skill levels!

keda-gpu-scaler

KEDA External gRPC Scaler for GPU/AI workloads

  • 🎮 Native NVML - Direct GPU metrics via go-nvml
  • 🚀 Scaling Profiles - vLLM, Triton, training presets
  • DaemonSet - Per-node GPU metric collection
  • 🔄 Scale-to-Zero - GPU-aware idle detection

Tech Stack: Go, gRPC, NVIDIA NVML, Kubernetes, Helm

Referenced in KEDA #7538
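
As a rough illustration of what GPU-aware idle detection for scale-to-zero might look like, here is a minimal Go sketch. It is not taken from keda-gpu-scaler: the `gpuSample` type, the `isActive` function, and the threshold values are hypothetical, and real readings would come from NVML rather than hard-coded samples.

```go
package main

import "fmt"

// gpuSample is a hypothetical per-GPU reading. The field names are
// illustrative and are not the project's actual types.
type gpuSample struct {
	UtilizationPct float64 // GPU compute utilization, 0-100
	MemoryUsedPct  float64 // GPU memory in use, 0-100
}

// isActive reports whether any GPU exceeds either idle threshold.
// If it returns false, an autoscaler could treat the workload as idle
// and allow it to scale to zero.
func isActive(samples []gpuSample, utilThreshold, memThreshold float64) bool {
	for _, s := range samples {
		if s.UtilizationPct > utilThreshold || s.MemoryUsedPct > memThreshold {
			return true
		}
	}
	return false
}

func main() {
	// Assumed thresholds: 5% utilization, 10% memory.
	idle := []gpuSample{{UtilizationPct: 1.0, MemoryUsedPct: 2.0}, {UtilizationPct: 0.0, MemoryUsedPct: 3.5}}
	busy := []gpuSample{{UtilizationPct: 1.0, MemoryUsedPct: 2.0}, {UtilizationPct: 87.0, MemoryUsedPct: 64.0}}

	fmt.Println(isActive(idle, 5, 10)) // false
	fmt.Println(isActive(busy, 5, 10)) // true
}
```

Checking memory as well as utilization is one possible design choice: a model that is loaded but momentarily not serving requests would still count as active, which avoids unloading it between bursts.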

otel-gpu-receiver

OpenTelemetry Collector receiver for GPU metrics

  • 🔋 NVIDIA NVML - GPU utilization, memory, temperature
  • OTel Native - Standard OTLP export pipeline
  • Multi-GPU - All devices on the node
  • 📈 Prometheus - Built-in Prometheus exporter

Tech Stack: Go, OpenTelemetry Collector SDK, NVML

kube-topology-agent

K8s knowledge graph & automated root-cause analysis

  • 🗺️ Knowledge Graph - Real-time resource topology
  • Root-Cause Traversal - Graph-based incident investigation
  • 🎮 GPU Aware - Training/inference/batch classification
  • 🔔 AlertManager - Webhook integration for auto-investigation

Tech Stack: Go, Kubernetes API, Gorilla Mux, Helm
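
To make the idea of graph-based blast-radius analysis concrete, the following Go sketch runs a breadth-first search over a small dependency graph. It is an assumption-laden illustration, not kube-topology-agent's implementation: the `graph` type, the `blastRadius` function, and the resource names are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// graph maps a resource to the resources that depend on it.
// This adjacency-list shape is an assumption made for the sketch.
type graph map[string][]string

// blastRadius returns every resource reachable from start, found by
// breadth-first search and sorted for stable output. The start node
// itself is not included in the result.
func blastRadius(g graph, start string) []string {
	visited := map[string]bool{start: true}
	queue := []string{start}
	affected := []string{}

	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		for _, dep := range g[node] {
			if !visited[dep] {
				visited[dep] = true
				affected = append(affected, dep)
				queue = append(queue, dep)
			}
		}
	}
	sort.Strings(affected)
	return affected
}

func main() {
	// Hypothetical topology: a GPU node hosts two pods, one of which
	// backs a service that is exposed through an ingress.
	g := graph{
		"node/gpu-node-1":   {"pod/trainer-0", "pod/inference-0"},
		"pod/inference-0":   {"service/inference"},
		"service/inference": {"ingress/api"},
	}
	fmt.Println(blastRadius(g, "node/gpu-node-1"))
	// prints [ingress/api pod/inference-0 pod/trainer-0 service/inference]
}
```

The visited set keeps the traversal from looping if the graph contains cycles, which can happen when resources reference each other.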

🤝 How to Contribute

  1. Pick a project that interests you
  2. Check Issues labeled good first issue or help wanted
  3. Fork & Clone the repository
  4. Submit a PR - I review all PRs promptly!

All contributions welcome:

  • 💻 Code contributions
  • 📖 Documentation improvements
  • 🐛 Bug reports
  • 💡 Feature suggestions
  • ⭐ Star the repos!

More projects: KubeAI Autoscaler · Ingress2Gateway · LLMOps

☁️ Cloud Platforms

  • AWS - Primary cloud platform for production workloads
  • Azure - Previous experience with enterprise deployments

🔧 Technologies & Tools

Container Orchestration & GitOps

  • Kubernetes - Production cluster management, multi-tenancy, and workload orchestration
  • ArgoCD - GitOps-driven continuous delivery and application lifecycle management
  • Docker - Container image building and runtime management
  • Crossplane - Kubernetes-native infrastructure provisioning and composition

Observability

  • Prometheus & Grafana - Metrics collection, alerting, and dashboard visualization
  • Splunk - Enterprise log aggregation and security analytics
  • Datadog - Full-stack monitoring and application performance management
  • OpenTelemetry - Vendor-neutral distributed tracing and telemetry collection

Policy Management

  • Kyverno - Kubernetes-native policy engine for security and compliance
  • OPA (Open Policy Agent) - Unified policy enforcement across the stack

CI/CD

  • GitHub Actions - Cloud-native workflow automation and CI/CD pipelines
  • Jenkins - Enterprise CI/CD automation and pipeline orchestration
  • Flux - GitOps toolkit for Kubernetes continuous delivery
  • UrbanCode Deploy - Enterprise application release automation

Big Data

  • PrestoDB & Trino - High-performance distributed SQL query engines for analytics
  • Apache Superset - Modern data exploration and business intelligence platform
  • Alluxio - Unified data orchestration for compute and storage
  • Jupyter Notebooks - Interactive data science and machine learning workflows

🤖 Interests

Deeply interested in the convergence of AI/ML and Kubernetes - enabling organizations to run machine learning workloads at scale on cloud-native infrastructure. Exploring MLOps practices, GPU scheduling, and AI platform engineering.

📝 Blog

Sharing insights on DevOps best practices, Kubernetes deep-dives, and cloud-native architecture:

👉 pavanmadduri.wordpress.com

💬 Let's Connect

Always open to connecting with fellow engineers and enthusiasts in the cloud-native and AI/ML space!

  • 💬 Collaborate - Open an issue or discussion on any of my repositories
  • 🤝 Partner - Interested in contributing to CNCF or ASWF projects together
  • 👋 Network - Happy to exchange ideas and share experiences

Let's build something great together! 🚀

Pinned

  1. llmops (Public)

    🚀 The Ultimate Curated List of LLMOps Tools, Frameworks, and Resources - A comprehensive collection of the best tools for Large Language Model Operations

    Shell · 8 stars · 4 forks

  2. pmady (Public)

    8 stars · 1 fork

  3. golden-kubestronaut-learning (Public)

    A comprehensive learning resource for achieving Kubestronaut and Golden Kubestronaut status through CNCF certifications

    Markdown · 12 stars · 7 forks

  4. kubeai-autoscaler (Public)

    Go · 10 stars · 5 forks

  5. ingress2gateway (Public)

    Convert Kubernetes Ingress objects to Gateway API resources - Web GUI and REST API

    Python · 7 stars · 3 forks

  6. keda-gpu-scaler (Public)

    KEDA External gRPC Scaler for GPU workloads: native NVML metrics via DaemonSet, no Prometheus required

    Go · 15 stars · 10 forks