Ainz-OS is narrowly focused: coordinate multiple agents, route tasks to the right agents, track what's happening.
Tasks are matched to agents based on required capabilities. Only agents with the right skills get the work. No manual routing needed.
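A minimal sketch of capability matching, assuming capabilities are plain string tags and an agent qualifies only when it covers every tag the task requires. The `Agent` class and `eligible_agents` function are hypothetical names for illustration, not the Ainz-OS API.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical agent record; field names are illustrative only.
    name: str
    capabilities: set = field(default_factory=set)


def eligible_agents(required: set, agents: list) -> list:
    """Return the agents whose capabilities cover every required tag."""
    # `required <= a.capabilities` is a subset test.
    return [a for a in agents if required <= a.capabilities]


agents = [
    Agent("ocr-1", {"ocr", "pdf"}),
    Agent("nlp-1", {"summarize"}),
    Agent("multi-1", {"ocr", "pdf", "summarize"}),
]

# Only an agent with both skills qualifies for this task.
matches = eligible_agents({"ocr", "summarize"}, agents)
print([a.name for a in matches])
```

Using a subset test keeps the rule strict: an agent with some of the required skills is never offered the task.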
Each agent declares max concurrent tasks. We enforce it strictly. If an agent says capacity=5, it will never have 6 concurrent tasks. Period.
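One way to picture strict capacity enforcement is a counter that refuses to go past the declared limit. This is a single-threaded sketch; `AgentSlot`, `try_acquire`, and `release` are hypothetical names, and a concurrent scheduler would need a lock around the check and increment.

```python
class AgentSlot:
    """Hypothetical capacity tracker for one agent."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity  # max concurrent tasks declared by the agent
        self.active = 0           # tasks currently running on the agent

    def try_acquire(self) -> bool:
        """Claim a slot if one is free; never exceed the declared capacity."""
        if self.active >= self.capacity:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        """Free a slot when a task finishes."""
        if self.active > 0:
            self.active -= 1


slot = AgentSlot("worker-1", capacity=5)
# Six assignment attempts against capacity=5: the sixth is refused.
results = [slot.try_acquire() for _ in range(6)]
print(results, slot.active)
```

The refused task stays in the queue until a slot is released.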
High-priority tasks jump the queue. User requests beat batch jobs. Make it explicit in your design.
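A sketch of queue jumping with a binary heap, assuming numeric priorities where a lower number is served first. The priority constants and the `PriorityTaskQueue` class are hypothetical, chosen only to illustrate the idea.

```python
import heapq
import itertools

# Hypothetical priority levels: lower number is served first.
USER_REQUEST = 0
BATCH_JOB = 10


class PriorityTaskQueue:
    def __init__(self) -> None:
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._counter = itertools.count()

    def push(self, task_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), task_id))

    def pop(self) -> str:
        _priority, _seq, task_id = heapq.heappop(self._heap)
        return task_id

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityTaskQueue()
queue.push("nightly-report", BATCH_JOB)
queue.push("reindex", BATCH_JOB)
queue.push("user-search", USER_REQUEST)  # arrives last, served first

order = [queue.pop() for _ in range(len(queue))]
print(order)
```

Naming the priority levels as constants is one way to make the policy explicit in your design.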
When an agent fails, Ainz-OS records the failure, marks the agent as FAILED, and moves the task back to the queue for retry. You control recovery.
Every agent has a state: IDLE, BUSY, or FAILED. State transitions are tracked and logged.
Failed tasks automatically go back to the queue. No work is lost.
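The failure path described above can be sketched as a small state machine. This is an illustration under assumptions, not the Ainz-OS implementation: the `Coordinator` class and its methods are hypothetical, each agent holds one task for brevity, and the failed task is placed at the back of the queue.

```python
from collections import deque
from enum import Enum


class AgentState(Enum):
    IDLE = "IDLE"
    BUSY = "BUSY"
    FAILED = "FAILED"


class Coordinator:
    """Hypothetical coordinator; one task per agent for brevity."""

    def __init__(self) -> None:
        self.queue = deque()    # pending task ids
        self.states = {}        # agent -> AgentState
        self.assignments = {}   # agent -> task id in flight
        self.failures = []      # (agent, task, reason) records
        self.transitions = []   # (agent, old_state, new_state) log

    def _set_state(self, agent, new_state) -> None:
        old_state = self.states.get(agent)
        self.states[agent] = new_state
        self.transitions.append((agent, old_state, new_state))

    def register(self, agent) -> None:
        self._set_state(agent, AgentState.IDLE)

    def assign(self, agent):
        task = self.queue.popleft()
        self.assignments[agent] = task
        self._set_state(agent, AgentState.BUSY)
        return task

    def on_failure(self, agent, reason) -> None:
        task = self.assignments.pop(agent)
        self.failures.append((agent, task, reason))  # record failure
        self._set_state(agent, AgentState.FAILED)    # mark agent FAILED
        self.queue.append(task)                      # requeue for retry


coord = Coordinator()
coord.register("agent-a")
coord.queue.extend(["task-1", "task-2"])
coord.assign("agent-a")                 # agent-a takes task-1
coord.on_failure("agent-a", "timeout")  # task-1 returns to the queue
print(list(coord.queue), coord.states["agent-a"].value)
```

Because the task is requeued in the same step that marks the agent FAILED, there is no window in which the task belongs to nobody.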
Health checks every 5 seconds. Know what's happening right now.
Tracked metrics: utilization, completed tasks, failed tasks, heartbeat age, and active task count.
Metrics are collected in history (up to 1000 snapshots). See trends over time.
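A bounded history like this can be sketched with `collections.deque` and `maxlen`, which drops the oldest snapshot once the limit is reached. The `Snapshot` fields mirror the metrics listed above, but the names, the `record` helper, and the definition of utilization as active tasks divided by capacity are assumptions made for illustration.

```python
import time
from collections import deque
from dataclasses import dataclass

MAX_SNAPSHOTS = 1000  # history bound stated in the text


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical field names for the metrics listed above.
    timestamp: float
    utilization: float      # assumed: active tasks / declared capacity
    completed_tasks: int
    failed_tasks: int
    heartbeat_age_s: float  # seconds since the last heartbeat
    active_tasks: int


history = deque(maxlen=MAX_SNAPSHOTS)  # oldest entries fall off automatically


def record(active, capacity, completed, failed, last_heartbeat, now=None):
    """Append one snapshot to the bounded history and return it."""
    now = time.time() if now is None else now
    snap = Snapshot(
        timestamp=now,
        utilization=active / capacity if capacity else 0.0,
        completed_tasks=completed,
        failed_tasks=failed,
        heartbeat_age_s=now - last_heartbeat,
        active_tasks=active,
    )
    history.append(snap)
    return snap


# Record 1200 snapshots with synthetic values; only the newest 1000 survive.
for i in range(1200):
    record(active=i % 6, capacity=5, completed=i, failed=0,
           last_heartbeat=float(i) - 2.0, now=float(i))

print(len(history), history[0].timestamp, history[-1].timestamp)
```

A fixed `maxlen` keeps memory use for history constant no matter how long the process runs.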
Scheduling latency per decision. Good enough for most workloads.
Per agent throughput. Your agent code will be the bottleneck, not the scheduler.
Memory overhead scales linearly with agent count. 100 agents = ~400MB.
Task queues persist state; Ainz-OS runs in-memory.
Load balancers manage network traffic; Ainz-OS manages agent work.
Workflow engines have complex DAG scheduling; Ainz-OS does simple task-to-agent routing.
Autoscaling policies are your job. Ainz-OS tells you what's congested.