Detect issues • Find root causes • Trigger alerts instantly
InfraGuardian continuously monitors Kubernetes pods and cluster events in real time and sends alerts instantly.
Features of InfraGuardian
Everything you need for Kubernetes monitoring, diagnostics, and optimization
Pod Failure Detection
Detect critical pod issues such as BackOff/CrashLoopBackOff patterns, OOMKilled, and FailedScheduling from Kubernetes events.
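As a rough illustration of event-based detection, the sketch below filters the JSON from `kubectl get events -A -o json` for the three failure reasons named above. It is not Infra Guardian's code. The function name `detect_pod_failures` and the script name `detect_failures.py` are hypothetical. Depending on the cluster, OOMKilled may appear only as a container termination reason in the pod status and not as an event reason, so a real detector may also need to read pod status.

```python
import json
import sys

# Event reasons treated as pod failures (taken from the feature description).
FAILURE_REASONS = {"BackOff", "OOMKilled", "FailedScheduling"}


def detect_pod_failures(events_json: str) -> list[dict]:
    """Return one finding per (namespace, pod, reason) from `kubectl get events -o json` output."""
    items = json.loads(events_json).get("items", [])
    findings = {}
    for event in items:
        obj = event.get("involvedObject", {})
        reason = event.get("reason", "")
        # Only pod-level events with a failure reason are of interest.
        if obj.get("kind") != "Pod" or reason not in FAILURE_REASONS:
            continue
        key = (obj.get("namespace", "default"), obj.get("name", ""), reason)
        # Keep the highest repeat count reported for this pod/reason pair.
        count = event.get("count") or 1
        findings[key] = max(findings.get(key, 0), count)
    return [
        {"pod": f"{ns}/{name}", "reason": reason, "count": count}
        for (ns, name, reason), count in sorted(findings.items())
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get events -A -o json > events.json && python3 detect_failures.py events.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        for finding in detect_pod_failures(fh.read()):
            print(f"{finding['pod']}: {finding['reason']} (seen {finding['count']}x)")
```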
Prometheus Restart Metrics
Analyze pod restart counts from Prometheus to measure stability and spot repeated failures.
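One way to read restart counts is an instant query against the Prometheus HTTP API (`/api/v1/query`). The sketch below assumes the `kube_pod_container_status_restarts_total` metric from kube-state-metrics, which kube-prometheus-stack installs. Infra Guardian's actual query is not documented here. The default URL assumes Prometheus is port-forwarded to `localhost:9090`.

```python
import json
import urllib.parse
import urllib.request

# Assumed query: total container restarts, summed per pod.
RESTART_QUERY = "sum by (namespace, pod) (kube_pod_container_status_restarts_total)"


def query_url(prometheus_url: str, query: str = RESTART_QUERY) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_restarts(response_body: str) -> dict[str, int]:
    """Map 'namespace/pod' to its restart count from an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    restarts = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        # Each sample value is a [timestamp, "value-as-string"] pair.
        restarts[f"{labels['namespace']}/{labels['pod']}"] = int(float(sample["value"][1]))
    return restarts


def fetch_restarts(prometheus_url: str = "http://localhost:9090") -> dict[str, int]:
    """Run the restart query and return the parsed per-pod counts."""
    with urllib.request.urlopen(query_url(prometheus_url), timeout=10) as resp:
        return parse_restarts(resp.read().decode("utf-8"))
```

With the Prometheus port-forward running, `print(fetch_restarts())` prints the current per-pod totals. The counter is cumulative per container, so a pod that crashed long ago still carries its old restarts.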
Event + Metrics Correlation
Correlate Kubernetes events with Prometheus restart metrics to calculate severity (MEDIUM/HIGH).
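The correlation step amounts to a join on the `namespace/pod` key: each pod with a failure event is looked up in the restart counts, and the count decides the severity. The sketch below is hypothetical. It assumes one event reason per pod and invents a threshold of 5 restarts for HIGH, because the real rule is not stated. With that threshold, it reproduces the sample dashboard: 12 restarts gives HIGH, while 3 or 0 restarts gives MEDIUM.

```python
from dataclasses import dataclass

# Hypothetical threshold; Infra Guardian's real cut-off is not documented here.
HIGH_RESTART_THRESHOLD = 5


@dataclass
class Insight:
    pod: str        # "namespace/name"
    reason: str     # Kubernetes event reason, e.g. "BackOff"
    restarts: int   # restart count from Prometheus (0 if the pod has no series)
    severity: str   # "MEDIUM" or "HIGH"


def correlate(events: dict[str, str], restarts: dict[str, int]) -> list[Insight]:
    """Join event reasons with restart counts on the 'namespace/pod' key."""
    insights = []
    for pod, reason in events.items():
        count = restarts.get(pod, 0)
        # Every detected event is at least MEDIUM; frequent restarts escalate to HIGH.
        severity = "HIGH" if count >= HIGH_RESTART_THRESHOLD else "MEDIUM"
        insights.append(Insight(pod, reason, count, severity))
    return insights
```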
Severity Intelligence
Automatically classifies issues based on restart frequency to highlight the most critical pods first.
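A minimal way to show the most critical pods first is to sort by severity rank and then by restart count. This sketch assumes insights are plain dicts with `pod`, `severity`, and `restarts` keys, which is a hypothetical shape. For a frequency-based variant, the counts could come from a windowed PromQL query such as `increase(kube_pod_container_status_restarts_total[1h])` in place of the raw total.

```python
# Lower rank sorts first; unknown severities sort after all known ones.
SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1}


def prioritize(insights: list[dict]) -> list[dict]:
    """Order insights by severity, then restart count (descending), then pod name."""
    return sorted(
        insights,
        key=lambda i: (SEVERITY_RANK.get(i["severity"], len(SEVERITY_RANK)), -i["restarts"], i["pod"]),
    )
```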
Smart Recommendations
A YAML-based recommendation engine suggests actionable fixes such as checking logs, rolling back, or tuning resources.
Secure In-Cluster Access
Runs inside Kubernetes using ServiceAccount + RBAC for controlled access to pods, events, and nodes.
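For reference, read-only access to pods, events, and nodes can be granted with a ClusterRole and a ClusterRoleBinding. A ClusterRole is needed because nodes are cluster-scoped. The sketch below generates such manifests as JSON for `kubectl apply`. The ServiceAccount name `infra-guardian`, the `default` namespace, and the `-reader` suffix are assumptions. The Helm chart may already create equivalent objects under different names.

```python
import json

# Hypothetical names; the chart's real ServiceAccount and namespace may differ.
SERVICE_ACCOUNT = "infra-guardian"
NAMESPACE = "default"


def rbac_manifests(sa: str = SERVICE_ACCOUNT, namespace: str = NAMESPACE) -> dict:
    """Return a v1 List holding a read-only ClusterRole and its binding to the ServiceAccount."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": f"{sa}-reader"},
        "rules": [
            # Core API group ("") covers pods, events, and nodes; read-only verbs only.
            {"apiGroups": [""], "resources": ["pods", "events", "nodes"], "verbs": ["get", "list", "watch"]},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{sa}-reader"},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": f"{sa}-reader"},
        "subjects": [{"kind": "ServiceAccount", "name": sa, "namespace": namespace}],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # Usage: python3 rbac.py | kubectl apply -f -
    print(json.dumps(rbac_manifests(), indent=2))
```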
Intelligent Pod Diagnostics
Correlates Kubernetes events with Prometheus restart metrics, then sends Slack alerts with actionable recommendations.
Preview Notice
This dashboard shows static sample data for demonstration. Infra Guardian currently provides real insights via APIs and Slack alerts. Live dashboard integration will be added in future versions.
Infra Guardian Insights
Insights Generated: 24
Problem Pods Found: 5
Restart Signals: Prometheus
Slack Alerts: Enabled
Issues Detected (from Kubernetes Events)
default/backoff-test · Event: BackOff · Severity: HIGH · 12 restarts
default/database-backup · Event: OOMKilled · Severity: MEDIUM · 3 restarts
default/worker-scheduler · Event: FailedScheduling · Severity: MEDIUM · 0 restarts
Severity is calculated by correlating Kubernetes events with Prometheus restart metrics.
Recommended Actions (from YAML Rules)
- Check container logs for BackOff pods (kubectl logs)
- Verify startup command/config and rollback recent deployment if needed
- Increase memory limits or fix memory leaks in OOMKilled workloads
Infra Guardian uses rule-based recommendations (rules/pod.rules.yaml) to provide guided fixes.
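A rule-based lookup of this kind can be as simple as mapping each event reason to a list of actions, with a generic fallback. The table below is a hypothetical stand-in for what `rules/pod.rules.yaml` might contain. The real file's schema is not shown here, and in practice the file could be loaded with PyYAML's `yaml.safe_load`. The BackOff and OOMKilled actions mirror the recommendations above. The FailedScheduling entry is an illustrative addition.

```python
# Hypothetical rule table: event reason -> suggested actions.
# Mirrors the idea of rules/pod.rules.yaml; the real schema may differ.
RULES = {
    "BackOff": [
        "Check container logs (kubectl logs <pod> --previous)",
        "Verify startup command/config and roll back the recent deployment if needed",
    ],
    "OOMKilled": [
        "Increase memory limits or fix memory leaks",
    ],
    "FailedScheduling": [
        "Run kubectl describe pod <pod> to see why no node fits (resources, node selectors, taints)",
    ],
}
DEFAULT_ACTION = "Inspect the pod with kubectl describe pod <pod>"


def recommend(reason: str, rules: dict[str, list[str]] = RULES) -> list[str]:
    """Return the actions for an event reason, or a generic fallback when no rule matches."""
    return rules.get(reason, [DEFAULT_ACTION])
```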
⚠️ Dashboard values shown above are sample data for UI preview. Infra Guardian currently delivers real monitoring via APIs + Slack alerts.
Deploy in 4 Simple Steps
Install Infra Guardian with Helm, validate Prometheus, and enable Slack alerts in minutes.
Build Docker Image
Build the Infra Guardian image locally.
docker build -t sagarbawanthade/infra-guardian:0.1.0 ./infra-guardian-core
Create kind Cluster & Load Image
kind nodes can't see images in your local Docker daemon, so load the image into the cluster manually.
kind create cluster --name infra-guardian
kind load docker-image sagarbawanthade/infra-guardian:0.1.0 --name infra-guardian
Install Infra Guardian (with Prometheus)
Helm installs Infra Guardian along with its kube-prometheus-stack dependency.
# From repo root (InfraGuardian/)
helm dependency update ./infra-guardian
helm install infra-guardian ./infra-guardian
Configure Slack Webhook
Create a secret so Infra Guardian can send Slack alerts, then restart the deployment so it loads the new secret.
kubectl create secret generic infra-guardian-secrets \
  --from-literal=SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"
kubectl rollout restart deploy/infra-guardian
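Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field. The sketch below shows that call with the standard library. It assumes the `SLACK_WEBHOOK_URL` secret key reaches the process as an environment variable of the same name, which is how the key name reads but is not confirmed by the chart. The message format and function names are hypothetical.

```python
import json
import os
import urllib.request


def build_slack_message(pod: str, reason: str, severity: str, restarts: int) -> dict:
    """Build a minimal Slack incoming-webhook payload (plain `text` field)."""
    return {"text": f"[{severity}] {pod}: {reason} ({restarts} restarts)"}


def send_slack_alert(payload: dict, webhook_url: str = "") -> int:
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    # Fall back to the environment variable assumed to come from the secret.
    url = webhook_url or os.environ["SLACK_WEBHOOK_URL"]
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, `send_slack_alert(build_slack_message("default/backoff-test", "BackOff", "HIGH", 12))` would post `[HIGH] default/backoff-test: BackOff (12 restarts)` to the configured channel.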
Next Steps
- Check Infra Guardian health:
  kubectl port-forward deploy/infra-guardian 3000:3000
- Test APIs (in a second terminal, while the port-forward runs):
  curl http://localhost:3000/insights
- View Prometheus:
  kubectl port-forward svc/monitoring-prometheus 9090:9090
- Check logs:
  kubectl logs deploy/infra-guardian
- Trigger alert:
  kubectl run backoff-test --image=busybox --restart=Always -- sh -c "exit 1"
Ready to Monitor Your Kubernetes Cluster?
Deploy Infra Guardian with Helm to detect pod failures, correlate events with Prometheus restart metrics, and receive actionable Slack alerts with recommendations.
< 5 min · Helm installation
5 APIs · Health + insights endpoints
Slack alerts · With cooldown protection
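Cooldown protection usually means remembering when an alert was last sent for a given key and suppressing repeats until a window has passed. The sketch below is one such design, not Infra Guardian's implementation. The 300-second window and the `pod:reason` key format are assumptions. The clock is injectable so the behaviour can be tested without waiting. State is held in memory, so it resets whenever the pod restarts.

```python
import time
from typing import Callable


class AlertCooldown:
    """Suppress repeat alerts for the same key until `cooldown_seconds` have passed."""

    def __init__(self, cooldown_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._last_sent: dict[str, float] = {}

    def should_send(self, key: str) -> bool:
        """Return True (and record the send time) if `key` is outside its cooldown window."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_sent[key] = now
        return True
```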
Infra Guardian is currently focused on pod-event diagnostics and restart metrics correlation. More dashboard and metric coverage will be added in upcoming versions.