Self-hosted AI agent for production incident management
Quick Start • Full Documentation • Features • Contributing • MIT License
Runs locally on your laptop as Docker containers. No login, no cloud sync, no telemetry. Paste errors, logs, alerts, or screenshots — the agent triages immediately, runs commands against your infrastructure, and walks you through root cause and fix.
Built by Doctor Droid.
- Run commands, not just suggestions — executes kubectl, aws, az, gcloud, gh, docker, terraform directly on your host
- Skills system — drop markdown files to teach the agent about your stack, runbooks, architecture
- Persistent memory — remembers your infrastructure, past incidents, and learned patterns
- Infrastructure sync — discovers Docker, Kubernetes (all contexts), AWS, Azure, GCP, GitHub, network ports
- MCP integrations — connect Datadog, Sentry, Grafana, PagerDuty, and 15+ other services
- Learner worker — periodically analyzes past conversations to extract reusable knowledge
- Feedback loop — thumbs up/down on responses calibrates the agent over time
- Multi-provider — OpenAI, Anthropic Claude, Azure AI Foundry (OpenAI & Kimi models)
- Conversation history — all conversations persisted with full tool call replay
- Dark & light mode — terminal-aesthetic web UI at localhost:7433
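As a rough illustration of the skills bullet above, the helper below drops a skeleton markdown file into a skills directory. The function name `new_skill`, the target directory, and the section headings are all hypothetical; the README only says that skills are markdown files, so treat the layout as one possible convention rather than a required schema.

```shell
# Create a skeleton markdown skill file and print its path.
# Assumption: skills are free-form markdown; these headings are only a suggestion.
new_skill() {
  local dir="$1"
  local name="$2"
  local file="$dir/$name.md"

  mkdir -p "$dir"

  # Never clobber a runbook that already exists.
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi

  cat > "$file" <<EOF
# $name

## When to use

## Steps

## Verification
EOF
  echo "$file"
}
```

For example, `new_skill skills restart-payments` would create `skills/restart-payments.md`, which can then be filled in by hand.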
git clone git@github.com:DrDroidLab/OpenDebug.git
cd OpenDebug/droid-agent
cp .env.example .env # Add your AI provider API key
cp config/mcp.example.json config/mcp.json # Enable MCP integrations (optional)
docker compose up -d --build
open http://localhost:7433
See the full setup guide for detailed configuration.
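The containers may need a few seconds after `docker compose up -d --build` returns before the UI answers. A minimal sketch of a readiness check, assuming the UI responds to a plain HTTP GET on port 7433; the function name `wait_for_ui` and the retry defaults are illustrative, not part of the project:

```shell
# Poll the web UI until it responds, or give up after a fixed number of attempts.
# Arguments (all optional): URL, number of attempts, delay in seconds.
wait_for_ui() {
  local url="${1:-http://localhost:7433}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "UI is up at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  echo "UI did not respond at $url after $attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_ui && open http://localhost:7433`. Note that `open` is the macOS command; on most Linux desktops `xdg-open` plays the same role.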
┌─────────────────────────────────────────────────┐
│  Docker Compose (your laptop)                   │
│                                                 │
│  ┌──────────────┐  ┌────────┐  ┌────────────┐   │
│  │ Droid Agent  │  │ Redis  │  │ PostgreSQL │   │
│  │ Web UI :7433 │  │ cache  │  │ persistent │   │
│  │ Agent loop   │  │        │  │ storage    │   │
│  │ Learner      │  │        │  │            │   │
│  │ MCP client   │  │        │  │            │   │
│  └──────────────┘  └────────┘  └────────────┘   │
│         │                                       │
│  Mounted: skills/ memory/ config/ ~/.kube/ etc. │
└─────────────────────────────────────────────────┘
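Since the diagram lists `skills/`, `memory/`, and `config/` as mounted paths, a quick preflight check before starting the stack can catch a missing directory early. This is only a sketch: it assumes those three directories sit directly under the directory you pass in (for example `droid-agent`), and the function name `check_mounts` is made up for illustration.

```shell
# Report which of the expected host directories exist under a base directory.
# Returns 0 if all are present, 1 if any is missing.
# Assumption: skills/, memory/ and config/ live directly under the base directory.
check_mounts() {
  local base="${1:-.}"
  local missing=0
  local dir

  for dir in skills memory config; do
    if [ -d "$base/$dir" ]; then
      echo "ok       $base/$dir"
    else
      echo "missing  $base/$dir"
      missing=1
    fi
  done

  return "$missing"
}
```

Running `check_mounts .` from the project directory prints one line per path and exits non-zero if any directory is absent.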
- Paste a stack trace → agent identifies the failing service and checks its pods
- Upload a Grafana screenshot → "what caused this spike?"
- "Check if my prod pods are healthy" → agent runs kubectl across all clusters
- "Why is the API slow?" → agent checks logs, metrics, DB connections, recent deploys
- "Write a runbook for restarting payments" → agent creates a skill file
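To give a feel for what "runs kubectl across all clusters" could involve, the sketch below lists pods in every context of the current kubeconfig. It is not taken from the agent's code; the function name `pods_all_contexts` is hypothetical, and the agent may issue different commands.

```shell
# List pods in all namespaces for every context in the current kubeconfig.
pods_all_contexts() {
  local ctx

  # `kubectl config get-contexts -o name` prints one context name per line.
  kubectl config get-contexts -o name | while IFS= read -r ctx; do
    echo "== context: $ctx =="
    # Redirect stdin so the inner command cannot consume the list of contexts.
    kubectl --context "$ctx" get pods --all-namespaces < /dev/null
  done
}
```

Because each call targets one context explicitly with `--context`, the loop never changes the active context in your kubeconfig.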
Full documentation lives in droid-agent/README.md:
- Configuration & Providers
- Web UI Guide
- CLI Usage
- Skills
- Memory
- MCP Server Integrations
- API Reference
- Troubleshooting
We welcome contributions. See CONTRIBUTING.md for guidelines.
