Detect issues • Find root causes • Trigger alerts instantly

InfraGuardian continuously monitors Kubernetes pods and cluster events in real time and sends alerts instantly.

Real-time monitoring for pods & nodes

Secure deployment inside cluster

Actionable insights + alerting


Features of InfraGuardian

Everything you need for Kubernetes monitoring, diagnostics, and optimization

Pod Failure Detection

Detect critical pod issues like BackOff/CrashLoop patterns, OOMKilled, and FailedScheduling using Kubernetes events.

Prometheus Restart Metrics

Analyze pod restart counts from Prometheus to measure stability and spot repeated failures.
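
As a rough illustration of how restart counts might be read from Prometheus, the sketch below builds an instant-query URL for the Prometheus HTTP API and sums restarts per pod from a response. It assumes the kube-state-metrics series kube_pod_container_status_restarts_total is available (kube-prometheus-stack normally ships it). The base URL, function names, and sample response are illustrative and not taken from Infra Guardian's source.

```python
from urllib.parse import urlencode

# Assumption: restart counts come from this kube-state-metrics series.
RESTART_QUERY = "sum by (namespace, pod) (kube_pod_container_status_restarts_total)"


def build_query_url(base_url: str, query: str = RESTART_QUERY) -> str:
    """Return an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def restarts_by_pod(response: dict) -> dict:
    """Map 'namespace/pod' to its restart count from a vector result."""
    counts = {}
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        key = f"{labels['namespace']}/{labels['pod']}"
        # Prometheus encodes each sample as [timestamp, "value-as-string"].
        counts[key] = int(float(sample["value"][1]))
    return counts


# Hypothetical response, shaped like a Prometheus instant-query result.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"namespace": "default", "pod": "backoff-test"},
             "value": [1700000000, "12"]},
        ],
    },
}

url = build_query_url("http://localhost:9090")
counts = restarts_by_pod(sample_response)
```

In a live setup the URL would be fetched over HTTP (for example through the port-forward on 9090) and the decoded JSON passed to restarts_by_pod.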

Event + Metrics Correlation

Correlate Kubernetes events with Prometheus restart metrics to calculate severity (MEDIUM/HIGH).

Severity Intelligence

Automatically classifies issues based on restart frequency to highlight the most critical pods first.
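
A minimal sketch of restart-based severity, assuming a single cutoff between MEDIUM and HIGH. The threshold value, the function names, and the restart count for default/database-backup are hypothetical; Infra Guardian's actual rule is not documented on this page.

```python
# Hypothetical cutoff; the real value used by Infra Guardian is not documented here.
HIGH_RESTART_THRESHOLD = 5


def classify(restarts: int) -> str:
    """Return HIGH for frequently restarting pods, MEDIUM otherwise."""
    return "HIGH" if restarts >= HIGH_RESTART_THRESHOLD else "MEDIUM"


def rank(issues: list) -> list:
    """Attach a severity to each issue and list the most-restarted pods first."""
    ranked = [dict(issue, severity=classify(issue["restarts"])) for issue in issues]
    return sorted(ranked, key=lambda issue: issue["restarts"], reverse=True)


issues = [
    # The restart count below is made up for illustration.
    {"pod": "default/database-backup", "event": "OOMKilled", "restarts": 2},
    {"pod": "default/backoff-test", "event": "BackOff", "restarts": 12},
]
ranked = rank(issues)
```

Sorting by restart count keeps the noisiest pod at the top of the list, which matches the "most critical pods first" behaviour described above.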

Smart Recommendations

A YAML-based recommendation engine provides actionable fixes such as checking logs, rolling back a deployment, or tuning resources.

Secure In-Cluster Access

Runs inside Kubernetes using ServiceAccount + RBAC for controlled access to pods, events, and nodes.

Intelligent Pod Diagnostics

Correlates Kubernetes events with Prometheus restart counts, then sends Slack alerts with actionable recommendations.

Preview Notice

This dashboard shows static sample data for demonstration. Infra Guardian currently provides real insights via APIs and Slack alerts. Live dashboard integration will be added in future versions.

Infra Guardian Insights

Static Preview

Insights Generated

Problem Pods Found

Restart Signals: Prometheus

Slack Alerts: Enabled

Issues Detected (from Kubernetes Events)

default/backoff-test · Event: BackOff · Severity: HIGH · 12x restarts

default/database-backup · Event: OOMKilled · Severity: MEDIUM

default/worker-scheduler · Event: FailedScheduling · Severity: MEDIUM

Severity is calculated by correlating Kubernetes events with Prometheus restart metrics.

Recommended Actions (from YAML Rules)

Check container logs for BackOff pods (kubectl logs)
Verify startup command/config and rollback recent deployment if needed
Increase memory limits or fix memory leak for OOMKilled workloads

Infra Guardian uses rule-based recommendations (rules/pod.rules.yaml) to provide guided fixes.
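
To make the rule lookup concrete, here is a small sketch that assumes rules/pod.rules.yaml parses into a mapping from event reason to a list of actions. The schema, the fallback action, and the function name are assumptions; only the recommendation wording is adapted from the list above.

```python
# Assumption: rules/pod.rules.yaml parses into a mapping like this one.
# The real schema is not shown on this page.
RULES = {
    "BackOff": [
        "Check container logs (kubectl logs)",
        "Verify startup command/config and roll back the recent deployment if needed",
    ],
    "OOMKilled": [
        "Increase memory limits or fix the memory leak",
    ],
}

# Hypothetical fallback for events that have no matching rule.
DEFAULT_ACTIONS = ["Inspect the pod with kubectl describe pod"]


def recommend(event_reason: str) -> list:
    """Return the guided fixes for an event reason, or the fallback."""
    return RULES.get(event_reason, DEFAULT_ACTIONS)


backoff_actions = recommend("BackOff")
unknown_actions = recommend("FailedScheduling")
```

Keeping the rules as data rather than code means new event reasons can be covered by editing the YAML file alone.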

⚠️ Dashboard values shown above are sample data for UI preview. Infra Guardian currently delivers real monitoring via APIs + Slack alerts.

Deploy in 4 Simple Steps

Install Infra Guardian with Helm, validate Prometheus, and enable Slack alerts in minutes.

Create kind Cluster & Load Image

kind nodes cannot see images in your local Docker daemon, so the image must be loaded into the cluster manually.

bash

kind create cluster --name infra-guardian
docker pull sagarbawanthade/infra-guardian:0.1.0
kind load docker-image sagarbawanthade/infra-guardian:0.1.0 --name infra-guardian

Install Infra Guardian (with Prometheus)

Clone the project from GitHub. Helm installs Infra Guardian together with its kube-prometheus-stack dependency.

bash

# From repo root (InfraGuardian/)
helm dependency update ./infra-guardian
helm install infra-guardian ./infra-guardian

Configure Slack Webhook

Create a secret so Infra Guardian can send Slack alerts, then restart the deployment to pick it up.

bash

kubectl create secret generic infra-guardian-secrets \
  --from-literal=SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"

kubectl rollout restart deploy/infra-guardian
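
For reference, a Slack incoming webhook accepts a JSON body with a text field sent by HTTP POST. The sketch below shows one way an alert could be built and sent; the message format and function names are assumptions, not Infra Guardian's actual alert layout.

```python
import json
import urllib.request


def build_alert(pod: str, event: str, severity: str, restarts: int) -> bytes:
    """Encode an alert as the JSON body a Slack incoming webhook expects."""
    text = f"[{severity}] {pod}: {event} ({restarts} restarts)"
    return json.dumps({"text": text}).encode("utf-8")


def send_alert(webhook_url: str, payload: bytes) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_alert("default/backoff-test", "BackOff", "HIGH", 12)
# send_alert(webhook_url, payload) would post it; it is not called in this sketch.
```

The webhook URL would come from the SLACK_WEBHOOK_URL value stored in the infra-guardian-secrets secret.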

Next Steps

Check Infra Guardian health: kubectl port-forward deploy/infra-guardian 3000:3000
Test APIs: curl http://localhost:3000/insights
View Prometheus: kubectl port-forward svc/monitoring-prometheus 9090:9090
Check logs: kubectl logs deploy/infra-guardian
Trigger alert: kubectl run backoff-test --image=busybox --restart=Always -- sh -c "exit 1"

Kubernetes Native

Ready to Monitor Your Kubernetes Cluster?

Deploy Infra Guardian with Helm to detect pod failures, correlate events with Prometheus restart metrics, and receive actionable Slack alerts with recommendations.


< 5 min: Helm installation

5 APIs: health and insights endpoints

Slack alerts: with cooldown protection
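
Cooldown protection can be pictured as remembering when each alert was last sent and suppressing repeats inside a time window. This is a minimal sketch under that assumption; the class name, the alert key format, and the 300-second window are hypothetical.

```python
import time


class AlertCooldown:
    """Suppress repeat alerts for the same key within a cooldown window."""

    def __init__(self, cooldown_seconds: float, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._last_sent = {}

    def should_send(self, key: str) -> bool:
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down, skip this alert
        self._last_sent[key] = now
        return True


# Drive the cooldown with a fake clock instead of waiting in real time.
fake_now = [0.0]
cooldown = AlertCooldown(cooldown_seconds=300, clock=lambda: fake_now[0])

first = cooldown.should_send("default/backoff-test:BackOff")
fake_now[0] = 60.0
second = cooldown.should_send("default/backoff-test:BackOff")
fake_now[0] = 400.0
third = cooldown.should_send("default/backoff-test:BackOff")
```

Keying the cooldown on pod plus event reason lets a crash-looping pod alert once per window while a different failure on the same pod still gets through.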

Infra Guardian is currently focused on pod-event diagnostics and restart metrics correlation. More dashboard and metric coverage will be added in upcoming versions.