Orchestration

Capability-Based Routing

Tasks are matched to agents based on required capabilities. Only agents with the right skills get the work. No manual routing needed.
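
A minimal sketch of how this kind of matching could work, assuming capabilities are plain string tags and a task is eligible for any agent whose tags cover all of its requirements. The names `Agent`, `Task`, and `eligible_agents` are hypothetical and are not taken from the Ainz-OS API.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical agent record: a name plus the capability tags it declares.
    name: str
    capabilities: set[str] = field(default_factory=set)


@dataclass
class Task:
    # Hypothetical task record: an id plus the capability tags it requires.
    task_id: str
    required: set[str] = field(default_factory=set)


def eligible_agents(task: Task, agents: list[Agent]) -> list[Agent]:
    """Return the agents whose capabilities cover everything the task requires."""
    return [a for a in agents if task.required <= a.capabilities]


agents = [
    Agent("validator-1", {"validate"}),
    Agent("transformer-1", {"transform", "validate"}),
    Agent("reporter-1", {"report"}),
]
task = Task("t-1", {"transform"})
matches = [a.name for a in eligible_agents(task, agents)]
```

Here only `transformer-1` declares the `transform` capability, so it is the only candidate for `t-1`.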

Hard Capacity Limits

Each agent declares max concurrent tasks. We enforce it strictly. If an agent says capacity=5, it will never have 6 concurrent tasks. Period.
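
The hard limit can be pictured as a check made before every assignment. The sketch below is an assumption about one way to enforce it, not the actual Ainz-OS implementation; `AgentSlot`, `try_assign`, and `complete` are hypothetical names.

```python
class AgentSlot:
    """Tracks active tasks for one agent and refuses work beyond its capacity."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.active: set[str] = set()

    def try_assign(self, task_id: str) -> bool:
        # Reject the assignment instead of exceeding the declared capacity.
        if len(self.active) >= self.capacity:
            return False
        self.active.add(task_id)
        return True

    def complete(self, task_id: str) -> None:
        # Finishing a task frees one slot.
        self.active.discard(task_id)


slot = AgentSlot("validator-1", capacity=5)
results = [slot.try_assign(f"t-{i}") for i in range(6)]
```

With `capacity=5`, the first five assignments succeed and the sixth is refused until a running task completes.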

Priority Scheduling

High-priority tasks jump the queue. User requests beat batch jobs. Make it explicit in your design.
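
One common way to let high-priority tasks jump the queue is a binary heap keyed on priority, with an insertion counter to keep first-in, first-out order among equal priorities. The sketch below uses Python's standard `heapq`; the priority constants and the `PriorityTaskQueue` class are illustrative assumptions, not Ainz-OS identifiers.

```python
import heapq
import itertools

# Assumed convention for this sketch: a lower number means higher priority.
USER_REQUEST = 0
BATCH_JOB = 10


class PriorityTaskQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker that preserves FIFO order

    def push(self, task_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), task_id))

    def pop(self) -> str:
        _priority, _seq, task_id = heapq.heappop(self._heap)
        return task_id

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityTaskQueue()
queue.push("batch-1", BATCH_JOB)
queue.push("user-1", USER_REQUEST)
queue.push("batch-2", BATCH_JOB)
queue.push("user-2", USER_REQUEST)
order = [queue.pop() for _ in range(len(queue))]
```

Both user requests are popped before either batch job, and tasks of equal priority keep their arrival order.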

Reliability

Failure Handling

When an agent fails, Ainz-OS records the failure, marks the agent as FAILED, and moves its task back to the queue for retry. You control recovery.

State Tracking

Every agent has a state: IDLE, BUSY, or FAILED. State transitions are tracked and logged.

Task Retry

Failed tasks automatically go back to the queue, so an agent failure does not drop work.
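
The failure path and state tracking described in this section can be sketched together as follows. This is an illustration under assumptions: the three states come from the text, but `TrackedAgent`, its methods, and the use of a `deque` as the task queue are hypothetical.

```python
from collections import deque
from enum import Enum


class AgentState(Enum):
    IDLE = "IDLE"
    BUSY = "BUSY"
    FAILED = "FAILED"


class TrackedAgent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = AgentState.IDLE
        self.failures = 0
        self.current_task = None
        self.transitions = []  # log of (old_state, new_state) pairs

    def _set_state(self, new_state: AgentState) -> None:
        # Every transition is appended to the log before it takes effect.
        self.transitions.append((self.state, new_state))
        self.state = new_state

    def start(self, task_id: str) -> None:
        self.current_task = task_id
        self._set_state(AgentState.BUSY)

    def fail(self, queue: deque) -> None:
        # Record the failure, mark the agent FAILED, and requeue its task.
        self.failures += 1
        self._set_state(AgentState.FAILED)
        if self.current_task is not None:
            queue.append(self.current_task)
            self.current_task = None


pending = deque(["t-2"])
agent = TrackedAgent("transformer-1")
agent.start("t-1")
agent.fail(pending)
```

After the failure, `t-1` is back in `pending` behind `t-2`, and the agent stays in FAILED until your own recovery logic decides what to do with it.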

Observability

Real-time Monitoring

Health checks every 5 seconds. Know what's happening right now.

Per-agent Metrics

Utilization, completed tasks, failed tasks, heartbeat age, active task count.
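
As a rough illustration of these metrics, the sketch below assumes utilization means active tasks divided by declared capacity and heartbeat age means seconds since the last heartbeat. Both definitions, and the `AgentMetrics` class itself, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentMetrics:
    capacity: int
    active_tasks: int
    completed_tasks: int
    failed_tasks: int
    last_heartbeat: float  # timestamp in seconds

    def utilization(self) -> float:
        # Assumed definition: share of the declared capacity currently in use.
        return self.active_tasks / self.capacity if self.capacity else 0.0

    def heartbeat_age(self, now: float) -> float:
        # Seconds elapsed since the agent last reported in.
        return now - self.last_heartbeat


metrics = AgentMetrics(
    capacity=5,
    active_tasks=3,
    completed_tasks=120,
    failed_tasks=2,
    last_heartbeat=100.0,
)
utilization = metrics.utilization()
age = metrics.heartbeat_age(now=107.5)
```

An agent running 3 of 5 possible tasks reports 60% utilization; a heartbeat age larger than the 5-second check interval suggests the agent has missed at least one check.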

Historical Snapshots

Metrics are recorded as snapshots, with up to 1000 kept in history. See trends over time.
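
A bounded history like this is often kept in a ring buffer that discards the oldest entry once the limit is reached. The sketch below uses Python's `collections.deque` with `maxlen`; whether Ainz-OS evicts the oldest snapshots this way is an assumption, and `record_snapshot` is a hypothetical helper.

```python
from collections import deque

MAX_SNAPSHOTS = 1000  # limit taken from the text above

history = deque(maxlen=MAX_SNAPSHOTS)


def record_snapshot(history: deque, tick: int, utilization: float) -> None:
    # Appending to a full deque silently drops the oldest snapshot.
    history.append({"tick": tick, "utilization": utilization})


for tick in range(1200):
    record_snapshot(history, tick, utilization=0.5)
```

After 1200 snapshots, only the most recent 1000 remain, so memory use stays bounded no matter how long the scheduler runs.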

Performance

~40ms Scheduling

Scheduling latency per decision. Good enough for most workloads.

~900 tasks/min

Per-agent throughput. Your agent code will be the bottleneck, not the scheduler.

~3-5MB per agent

Memory overhead. Scales linearly. 100 agents = roughly 300-500MB (~400MB at the midpoint).

What Ainz-OS Is Not

Not a task queue

Task queues persist state; Ainz-OS runs in-memory.

Not a load balancer

Load balancers manage network traffic; Ainz-OS manages agent work.

Not a workflow engine

Workflow engines have complex DAG scheduling; Ainz-OS does simple task-to-agent routing.

Not autoscaling

Autoscaling policies are your job. Ainz-OS tells you what's congested.

Real Production Results

Before Ainz-OS

Validation agents: 95%
Transformation: 40%
Report agent: 10%
Latency: 8s

After Ainz-OS

Validation agents: 65%
Transformation: 70%
Report agent: 60%
Latency: 2.1s