
Detect issues • Surface root causes • Trigger alerts instantly

InfraGuardian continuously monitors Kubernetes pods and cluster events in real time and sends alerts instantly.

Real-time monitoring for pods & nodes
Secure deployment inside cluster
Actionable insights + alerting

Features of InfraGuardian

Everything you need for Kubernetes monitoring, diagnostics, and optimization

Pod Failure Detection

Detect critical pod issues like BackOff/CrashLoop patterns, OOMKilled, and FailedScheduling using Kubernetes events.
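For a quick manual look at the same signals, events can be filtered by reason with kubectl. The sketch below is not Infra Guardian's own detection code; `list_failure_events` is a hypothetical helper name, and it assumes kubectl is already pointed at the target cluster.

```bash
# Hypothetical helper: list events across all namespaces for one event reason.
# $1: event reason, for example BackOff or FailedScheduling
list_failure_events() {
  local reason="$1"
  kubectl get events --all-namespaces \
    --field-selector "reason=${reason}" \
    --sort-by=.lastTimestamp
}
```

Usage: `list_failure_events BackOff` or `list_failure_events FailedScheduling`.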

Prometheus Restart Metrics

Analyze pod restart counts from Prometheus to measure stability and spot repeated failures.
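Restart counts of this kind are typically exposed by kube-state-metrics (bundled with kube-prometheus-stack) as `kube_pod_container_status_restarts_total`. The sketch below shows one way to query them over the Prometheus HTTP API. It assumes Prometheus is reachable at `http://localhost:9090` (for example through a port-forward); the helper names and the exact query are illustrative, since the query Infra Guardian itself runs is not shown here.

```bash
# Build a PromQL query that sums container restarts per pod in one namespace.
# $1: namespace
restart_query() {
  printf 'sum by (pod) (kube_pod_container_status_restarts_total{namespace="%s"})' "$1"
}

# Run the query against the Prometheus HTTP API and print the JSON response.
# PROM_URL is an assumed variable; it defaults to a local port-forward.
fetch_restarts() {
  local prom_url="${PROM_URL:-http://localhost:9090}"
  curl -sG "${prom_url}/api/v1/query" \
    --data-urlencode "query=$(restart_query "$1")"
}
```

Usage: `fetch_restarts default` prints the raw JSON result for the `default` namespace.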

Event + Metrics Correlation

Correlate Kubernetes events with Prometheus restart metrics to calculate severity (MEDIUM/HIGH).

Severity Intelligence

Automatically classifies issues based on restart frequency to highlight the most critical pods first.
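A restart-based classification like this can be sketched in a few lines. The threshold of 5 restarts is an assumption chosen for illustration, as the real cut-off is not documented here; `classify_severity` is a hypothetical name.

```bash
# Assumed threshold: pods at or above this restart count are rated HIGH.
HIGH_RESTART_THRESHOLD="${HIGH_RESTART_THRESHOLD:-5}"

# $1: restart count reported by Prometheus for the pod
classify_severity() {
  local restarts="$1"
  if [ "$restarts" -ge "$HIGH_RESTART_THRESHOLD" ]; then
    echo "HIGH"
  else
    echo "MEDIUM"
  fi
}
```

With this assumed threshold, a pod with 12 restarts is rated HIGH, while pods with 3 or 0 restarts are rated MEDIUM.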

Smart Recommendations

A YAML-based recommendation engine suggests actionable fixes such as checking logs, rolling back a deployment, or tuning resources.

Secure In-Cluster Access

Runs inside Kubernetes using ServiceAccount + RBAC for controlled access to pods, events, and nodes.
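To confirm that the ServiceAccount actually has the read access described above, `kubectl auth can-i` can impersonate it. This is a sketch under assumptions: the namespace and ServiceAccount name depend on how the chart was installed, and `check_rbac` is a hypothetical helper.

```bash
# Check whether a ServiceAccount may list pods, events, and nodes.
# $1: namespace of the ServiceAccount, $2: ServiceAccount name (both assumed)
check_rbac() {
  local sa="system:serviceaccount:${1}:${2}"
  local resource
  for resource in pods events nodes; do
    printf '%s: ' "$resource"
    # Prints "yes" or "no" for each resource.
    kubectl auth can-i list "$resource" --as="$sa" --all-namespaces
  done
}
```

Usage: `check_rbac default infra-guardian`, substituting the namespace and ServiceAccount name used by your release.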

Intelligent Pod Diagnostics

Correlates Kubernetes events with Prometheus restart counts, then sends Slack alerts with actionable recommendations.

Preview Notice

This dashboard shows static sample data for demonstration. Infra Guardian currently provides real insights via APIs and Slack alerts. Live dashboard integration will be added in future versions.

Infra Guardian Insights

Static Preview

Insights Generated: 24
Problem Pods Found: 5
Restart Signals: Prometheus
Slack Alerts: Enabled

Issues Detected (from Kubernetes Events)

default/backoff-test · Event: BackOff · Severity: HIGH · 12 restarts

default/database-backup · Event: OOMKilled · Severity: MEDIUM · 3 restarts

default/worker-scheduler · Event: FailedScheduling · Severity: MEDIUM · 0 restarts

Severity is calculated by correlating Kubernetes events with Prometheus restart metrics.

Recommended Actions (from YAML Rules)

  • Check container logs for BackOff pods (kubectl logs)
  • Verify the startup command/config and roll back the recent deployment if needed
  • Increase memory limits or fix memory leak for OOMKilled workloads

Infra Guardian uses rule-based recommendations (rules/pod.rules.yaml) to provide guided fixes.
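The idea of a rule lookup can be illustrated with a simple mapping from event reason to suggested action. This is only a sketch: the actual schema of rules/pod.rules.yaml is not shown here, `recommend_action` is a hypothetical name, and the texts are taken from the sample recommendations above.

```bash
# Map an event reason to a suggested action (illustrative, not the real rule engine).
# $1: event reason, for example BackOff or OOMKilled
recommend_action() {
  case "$1" in
    BackOff)
      echo "Check container logs (kubectl logs); verify startup command/config and roll back the recent deployment if needed" ;;
    OOMKilled)
      echo "Increase memory limits or fix the memory leak" ;;
    *)
      echo "No rule defined for this event" ;;
  esac
}
```

Usage: `recommend_action OOMKilled` prints the memory-related suggestion; unknown reasons fall through to the default branch.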

⚠️ Dashboard values shown above are sample data for UI preview. Infra Guardian currently delivers real monitoring via APIs + Slack alerts.

Deploy in 4 Simple Steps

Install Infra Guardian with Helm, validate Prometheus, and enable Slack alerts in minutes.

Build Docker Image

Build the Infra Guardian image locally

1
bash
# From repo root (InfraGuardian/)
docker build -t sagarbawanthade/infra-guardian:0.1.0 ./infra-guardian-core

Create kind Cluster & Load Image

kind nodes cannot see images in the local Docker daemon, so load the image into the cluster explicitly

2
bash
kind create cluster --name infra-guardian
kind load docker-image sagarbawanthade/infra-guardian:0.1.0 --name infra-guardian

Install Infra Guardian (with Prometheus)

Helm installs Infra Guardian together with its kube-prometheus-stack dependency

3
bash
# From repo root (InfraGuardian/)
helm dependency update ./infra-guardian
helm install infra-guardian ./infra-guardian

Configure Slack Webhook

Create a secret so Infra Guardian can send Slack alerts, then restart the deployment so it picks up the secret

4
bash
kubectl create secret generic infra-guardian-secrets \
  --from-literal=SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"

kubectl rollout restart deploy/infra-guardian
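
Before relying on in-cluster alerts, it can help to confirm that the webhook itself accepts messages. The sketch below posts a test message with curl. It assumes `SLACK_WEBHOOK_URL` is exported in your shell; the message layout and both helper names are hypothetical and are not the format Infra Guardian uses.

```bash
# Build a minimal Slack payload. Inputs are assumed to contain no quotes
# or backslashes, since no JSON escaping is done here.
# $1: pod (namespace/name), $2: event reason, $3: severity
slack_payload() {
  printf '{"text":"[%s] %s: %s"}' "$3" "$1" "$2"
}

# Post the payload to the incoming webhook in SLACK_WEBHOOK_URL.
send_test_alert() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    --data "$(slack_payload "$@")" "$SLACK_WEBHOOK_URL"
}
```

Usage: `send_test_alert default/backoff-test BackOff HIGH` should produce a message in the channel tied to the webhook.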

Next Steps

  • Check Infra Guardian health: kubectl port-forward deploy/infra-guardian 3000:3000
  • Test APIs: curl http://localhost:3000/insights
  • View Prometheus: kubectl port-forward svc/monitoring-prometheus 9090:9090
  • Check logs: kubectl logs deploy/infra-guardian
  • Trigger alert: kubectl run backoff-test --image=busybox --restart=Always -- sh -c "exit 1"
Kubernetes Native

Ready to Monitor Your Kubernetes Cluster?

Deploy Infra Guardian with Helm to detect pod failures, correlate events with Prometheus restart metrics, and receive actionable Slack alerts with recommendations.

< 5 min

Helm Installation

5 APIs

Health + Insights Endpoints

Slack Alerts

With Cooldown Protection
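
Cooldown protection generally means suppressing repeat alerts for the same pod within a time window. The sketch below shows that idea with timestamp files. The 300-second window, the directory, and the `should_alert` name are all assumptions for illustration; how Infra Guardian implements its cooldown is not described here.

```bash
# Assumed settings: window length and where timestamps are kept.
COOLDOWN_SECONDS="${COOLDOWN_SECONDS:-300}"
COOLDOWN_DIR="${COOLDOWN_DIR:-/tmp/infra-guardian-cooldown}"

# Returns 0 (send the alert) if no alert was recorded for this key within
# the window, and records the current time. Returns 1 (suppress) otherwise.
# $1: alert key, for example "default/backoff-test"
should_alert() {
  local key now last stamp_file
  key="$(printf '%s' "$1" | tr '/' '_')"
  stamp_file="${COOLDOWN_DIR}/${key}"
  now="$(date +%s)"
  mkdir -p "$COOLDOWN_DIR"
  if [ -f "$stamp_file" ]; then
    last="$(cat "$stamp_file")"
    if [ $((now - last)) -lt "$COOLDOWN_SECONDS" ]; then
      return 1
    fi
  fi
  printf '%s\n' "$now" > "$stamp_file"
  return 0
}
```

Usage: `should_alert "default/backoff-test" && echo "send alert"` sends on the first call and stays quiet for repeats inside the window.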

Infra Guardian is currently focused on pod-event diagnostics and restart metrics correlation. More dashboard and metric coverage will be added in upcoming versions.