Building an Async Agent Queue for Long-Running AI Worker Tasks

A practical architecture for routing long-running AI agent work through durable queues, shared artifacts, worker roles, and an agentq CLI.

Most agent systems start with a simple pattern: one agent asks another agent to do something, waits for the answer, then continues.

That works for small tasks.

It breaks down when the work takes minutes, needs large context, writes artifacts, runs tests, or has to outlive the original chat request.

I ran into that problem while coordinating multiple self-contained AI workers. Some tasks were implementation-heavy. Others were architecture- or review-heavy. A synchronous request/response call was too brittle: it could time out, lose useful context, block the coordinator, or make it hard to recover after a failure.

The solution was to treat long-running agent work less like a chat message and more like infrastructure: a durable async task queue.

This post describes a generic version of that setup: a coordinator agent, specialized worker agents, RabbitMQ, shared artifacts, and a small CLI called agentq.


The Problem

Direct synchronous agent calls are fine when the task is:

  • Short
  • Stateless
  • Easy to retry
  • Small enough to fit comfortably in one prompt
  • Not expected to produce large files, diffs, logs, or test output

But real agent work often looks different:

  • “Review this architecture and identify risks.”
  • “Implement this scoped change and write tests.”
  • “Run a long market/sentiment analysis and save the report.”
  • “Compare multiple approaches, then produce a final recommendation.”

Those tasks can take several minutes. They may need big context files. They may fail halfway through. They may produce result artifacts that are too large for a chat response.

If the coordinator waits synchronously, the whole system becomes fragile.

So the key design goal became:

The coordinator should be able to submit durable work, stay responsive, and later validate the result.


The Pattern

The system has three generic agent roles:

  • Coordinator Agent — breaks down work, submits tasks, monitors progress, validates final results.
  • Implementation Worker — handles scoped coding/implementation tasks, tests, patches, and technical verification.
  • Review Worker — reviews architecture, identifies risks, challenges weak assumptions, and writes critique.

The important part is that the worker agents are not just API endpoints. They are self-contained agents with their own tools, runtime, and execution environment.

The coordinator does not need to keep an HTTP request open while they work. It submits a task into a queue and comes back later for the result.


Architecture

User
  |
  v
Coordinator Agent
  |
  | agentq submit
  v
Task Exchange / Message Broker
  |
  +--> implementation task queue --> Implementation Worker
  |
  +--> review task queue ---------> Review Worker
  |
  v
Result Queue
  |
  v
Coordinator validates result and responds

Shared artifact storage:
  /shared/tasks/<task-id>.md
  /shared/results/<task-id>.md
  /shared/artifacts/<task-id>/...

Dead-letter queue:
  failed or poison messages

And in Mermaid form:

flowchart TD
    U[User] --> A[Coordinator Agent]
    A -->|agentq submit| X[Task Exchange]
    X --> Q1[Implementation Queue]
    X --> Q2[Review Queue]
    Q1 --> W1[Implementation Worker]
    Q2 --> W2[Review Worker]
    W1 --> S[(Shared Artifacts)]
    W2 --> S
    W1 --> R[Result Queue]
    W2 --> R
    R --> A
    X --> D[Dead-letter Queue]

The broker carries small JSON envelopes. Large context and outputs live in shared artifact storage.

That separation matters. Message queues are good at routing work. They are not ideal places to store huge prompts, logs, patches, reports, screenshots, or test output.


Message Shape

A task message is intentionally small. Conceptually it looks like this:

{
  "task_id": "task-123",
  "from": "coordinator",
  "to": "implementation-worker",
  "type": "implementation",
  "priority": "normal",
  "ttl_seconds": 3600,
  "reply_to": "results.coordinator",
  "context_ref": "file:///shared/tasks/task-123.md",
  "instructions": "Implement one scoped change and run tests.",
  "expected_output": {
    "format": "markdown",
    "artifacts": ["summary", "patch", "test-output", "risks"]
  }
}

The worker reads the task, does the work, writes a result artifact, then publishes a small completion message:

{
  "task_id": "task-123",
  "from": "implementation-worker",
  "to": "coordinator",
  "status": "completed",
  "result_ref": "file:///shared/results/task-123.md",
  "summary": "Implemented and tested the scoped change."
}

The queue is for coordination. The filesystem or blob store is for evidence.
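
The task envelope above can be modelled as a small typed record so that malformed messages are rejected before a worker starts. This is a minimal Python sketch, not agentq's actual implementation: the `TaskEnvelope` class, the `parse_task` helper, and the choice of which fields are required are all assumptions made for illustration. Only the JSON field names come from the example.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Assumption: these routing fields are treated as mandatory in this sketch.
REQUIRED_TASK_KEYS = ("task_id", "from", "to", "type", "reply_to")


@dataclass(frozen=True)
class TaskEnvelope:
    task_id: str
    sender: str       # "from" in the JSON ("from" is a Python keyword)
    recipient: str    # "to" in the JSON
    type: str
    reply_to: str
    context_ref: Optional[str] = None
    instructions: Optional[str] = None
    priority: str = "normal"
    ttl_seconds: Optional[int] = None
    expected_output: dict = field(default_factory=dict)


def parse_task(raw: str) -> TaskEnvelope:
    """Parse a JSON task message and reject envelopes that cannot be routed."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_TASK_KEYS if key not in data]
    if missing:
        raise ValueError(f"task envelope missing keys: {missing}")
    # A task must carry its work either by reference or inline.
    if not data.get("context_ref") and not data.get("instructions"):
        raise ValueError("task needs context_ref or instructions")
    return TaskEnvelope(
        task_id=data["task_id"],
        sender=data["from"],
        recipient=data["to"],
        type=data["type"],
        reply_to=data["reply_to"],
        context_ref=data.get("context_ref"),
        instructions=data.get("instructions"),
        priority=data.get("priority", "normal"),
        ttl_seconds=data.get("ttl_seconds"),
        expected_output=data.get("expected_output", {}),
    )
```

Rejecting a bad envelope at parse time is cheaper than discovering halfway through a long task that there is nowhere to send the result.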


The agentq CLI

The queue is operated through a small CLI called agentq.

Initialize the queue topology:

agentq init

Submit implementation work:

agentq submit \
  --to implementation-worker \
  --type implementation \
  --file /shared/tasks/task-123.md

Submit review work:

agentq submit \
  --to review-worker \
  --type reasoning \
  --instructions "Review this architecture and identify risks."

Check a task:

agentq status task-123
agentq result task-123

Complete a task from the worker side:

agentq complete task-123 \
  --from implementation-worker \
  --result-file /shared/results/task-123.md \
  --summary "Implemented and tested"

Fail a task explicitly:

agentq fail task-123 \
  --from implementation-worker \
  --reason "Could not reproduce the issue"

Inspect dead letters:

agentq dlq

In practice, the most reliable pattern is:

  1. Write detailed task context to a file.
  2. Submit that file with agentq submit --file.
  3. Let a worker process the task independently.
  4. Read the result artifact directly.
  5. Validate before responding to the user.
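
Steps 1 and 2 of that pattern could be wrapped on the coordinator side roughly as follows. This is a hedged sketch: `prepare_submission` is a hypothetical helper, and it only builds the command line using the `agentq submit` flags shown earlier rather than running anything. Whether agentq derives the task ID from the file name is not something this sketch assumes.

```python
from pathlib import Path


def prepare_submission(task_dir: Path, task_id: str, recipient: str,
                       task_type: str, context_markdown: str) -> list[str]:
    """Write the task context to a file and return the agentq command to run."""
    task_dir.mkdir(parents=True, exist_ok=True)
    task_file = task_dir / f"{task_id}.md"
    task_file.write_text(context_markdown, encoding="utf-8")
    # Flags mirror the `agentq submit --file` example; nothing is executed here.
    return ["agentq", "submit", "--to", recipient, "--type", task_type,
            "--file", str(task_file)]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Passing an argument list instead of a shell string avoids the quoting problems that long inline prompts tend to cause.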


Why Use Files for Context and Results?

Large agent tasks produce artifacts:

  • Long prompts
  • Source diffs
  • Test output
  • Logs
  • Research notes
  • Architecture reviews
  • Markdown reports
  • Screenshots or generated files

Trying to cram all of that into queue messages creates operational pain.

A better split is:

  • Queue message: who should do what, where the context lives, where to reply.
  • Artifact storage: the full task instructions, result reports, patches, logs, and evidence.

This makes debugging much easier. If something looks wrong, the coordinator can inspect the result artifact directly instead of trusting a summary message.


Delivery Semantics

This design gives at-least-once delivery, not exactly-once.

A safe worker flow is:

  1. Receive task.
  2. Read context artifact.
  3. Do work.
  4. Write result artifact.
  5. Publish completion metadata.
  6. Acknowledge the queue message.

If the worker crashes after writing the artifact but before acknowledging the message, the task may be delivered again. That is normal for this kind of system.

Therefore, workers should be idempotent where possible:

  • Use stable task IDs.
  • Check whether a result already exists.
  • Avoid irreversible side effects unless explicitly approved.
  • Make deployment or production actions separate from analysis tasks.

For AI agents, this matters. A duplicated review is harmless. A duplicated deployment, trade, email, or payment is not.
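
The "check whether a result already exists" rule can be sketched like this in Python. The `process_once` function and its parameters are hypothetical names for illustration. It assumes results are plain files named after the task ID, and it does not protect against two workers processing the same task at the same moment.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def process_once(task_id: str, results_dir: Path,
                 do_work: Callable[[], str]) -> tuple[Path, bool]:
    """Run do_work at most once per task_id; return (result_path, ran_work)."""
    results_dir.mkdir(parents=True, exist_ok=True)
    result_path = results_dir / f"{task_id}.md"

    # Redelivered message: a result already exists, so skip the work.
    if result_path.exists():
        return result_path, False

    report = do_work()

    # Write to a temp file in the same directory, then rename it into place,
    # so a reader never sees a half-written result.
    fd, tmp_name = tempfile.mkstemp(dir=results_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(report)
        os.replace(tmp_name, result_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return result_path, True
```

In both cases the caller would still publish the completion message and acknowledge the queue message afterwards, so a redelivery ends with the same visible outcome as the first attempt.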


What Works Well

The main benefits are practical rather than theoretical.

1. The coordinator stays responsive

The coordinator can submit long-running work and continue helping the user. It does not need to keep a synchronous call open for minutes.

2. Workers can specialize

One worker can be optimized for implementation. Another can be optimized for review and critique. The coordinator owns final integration.

This avoids the “one giant agent does everything” trap.

3. Large artifacts become first-class

Reports, patches, logs, and test output are saved as files. The final answer can cite evidence instead of relying on a transient chat response.

4. Failures become inspectable

A dead-letter queue gives failed messages somewhere to go. Result files give humans something to inspect.

5. The model is simple

The mental model is easy:

  • Submit task.
  • Worker processes task.
  • Worker writes result.
  • Coordinator validates result.

There is not much magic in the middle.


Tradeoffs and Sharp Edges

This architecture is useful, but it is not free.

1. More moving parts

A queue, worker processes, shared storage, result messages, and a dispatcher are more complex than a direct function call.

For small tasks, synchronous calls are still simpler.

2. Shared storage can become a coupling point

A shared filesystem is convenient, but it can become a single point of failure.

For a single-machine or tightly controlled cluster, that may be acceptable. For broader distributed systems, a blob store such as S3 or MinIO is usually a better artifact layer.

3. Status can lie

Queue status and result artifacts can drift if the worker writes one but fails to update the other.

That is why the coordinator should verify the actual result artifact, not just trust completed metadata.
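
A coordinator-side check along those lines might look like the sketch below. `verify_completion` is a hypothetical helper, and it assumes `result_ref` is a local `file://` URI as in the completion message example. It only confirms that the artifact exists and is non-empty. Judging the content is still the coordinator's job.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def verify_completion(message: dict, expected_task_id: str) -> list[str]:
    """Return a list of problems; an empty list means the artifact looks usable."""
    problems = []
    if message.get("task_id") != expected_task_id:
        problems.append("task_id mismatch")
    if message.get("status") != "completed":
        problems.append(f"status is {message.get('status')!r}")

    ref = urlparse(message.get("result_ref", ""))
    if ref.scheme != "file":
        problems.append("result_ref is not a file:// URI")
        return problems

    # The metadata claims a result; confirm it is really on disk.
    path = Path(unquote(ref.path))
    if not path.is_file():
        problems.append("result artifact does not exist")
    elif path.stat().st_size == 0:
        problems.append("result artifact is empty")
    return problems
```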

4. Cancellation is hard

Once a long-running agent starts working, cancelling cleanly is not trivial. You need explicit cancellation state, worker polling, or cooperative cancellation support.

Without it, the coordinator may have to let stale work finish and discard the result.
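
One simple form of cooperative cancellation is a marker file that the worker checks between steps. The sketch below is an assumption about how that could work, not a feature agentq is stated to have: the cancel directory, `request_cancel`, `run_steps`, and `TaskCancelled` are all hypothetical names.

```python
from pathlib import Path
from typing import Callable, Iterable


class TaskCancelled(Exception):
    """Raised when a cancel marker is found between steps."""


def request_cancel(cancel_dir: Path, task_id: str) -> None:
    """Coordinator side: drop a marker file that the worker will notice."""
    cancel_dir.mkdir(parents=True, exist_ok=True)
    (cancel_dir / task_id).touch()


def run_steps(task_id: str, cancel_dir: Path,
              steps: Iterable[Callable[[], None]]) -> int:
    """Worker side: run steps in order, checking for cancellation before each."""
    completed = 0
    for step in steps:
        if (cancel_dir / task_id).exists():
            raise TaskCancelled(f"{task_id} cancelled after {completed} step(s)")
        step()
        completed += 1
    return completed
```

Cancellation only takes effect at step boundaries, so a single long model call still runs to completion. That is the usual cost of the cooperative approach.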

5. Priority and TTL must be enforced, not just declared

It is easy to put priority or ttl_seconds in a JSON envelope. That does not mean the broker or worker actually enforces them.

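If priority and expiry matter, they need to be enforced at the queue/publish layer (in RabbitMQ, a queue declared with x-max-priority plus the priority and expiration message properties) or actively checked by workers.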
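
For the worker-side check, expiry can be tested before any work starts. The sketch below assumes the envelope also carries a `submitted_at` timestamp in epoch seconds. That field is hypothetical and does not appear in the message example, which only declares `ttl_seconds`.

```python
import time
from typing import Optional


def is_expired(envelope: dict, now: Optional[float] = None) -> bool:
    """Return True if ttl_seconds has elapsed since the task was submitted."""
    ttl = envelope.get("ttl_seconds")
    submitted_at = envelope.get("submitted_at")  # hypothetical field, epoch seconds
    if ttl is None or submitted_at is None:
        return False  # no expiry information: treat the task as still valid
    current = time.time() if now is None else now
    return current - submitted_at > ttl
```

A worker that finds an expired task would typically fail it explicitly, for example with `agentq fail`, rather than silently dropping it.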

6. At-least-once means duplicates are possible

This is not a flaw; it is the normal tradeoff of durable queues.

But it means side-effecting tasks need guardrails. Idempotency is not optional once the workers can touch real systems.

7. Observability matters

agentq status is useful, but production systems eventually need metrics:

  • Queue depth
  • Oldest queued task age
  • Processing latency
  • Failure rate
  • Retry count
  • Worker heartbeat
  • Cost per task

Without those, debugging turns into manual polling.
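
A few of those numbers can be derived from task records alone. The following is a rough sketch under stated assumptions: the record keys (`status`, `submitted_at`, `finished_at`) and the status values are hypothetical, and a real deployment would more likely read queue depth from the broker itself.

```python
from collections import Counter


def summarize(tasks: list[dict], now: float) -> dict:
    """Compute basic queue health numbers from task records.

    Assumed record shape: "status" is one of "queued", "running",
    "completed", "failed"; timestamps are epoch seconds.
    """
    counts = Counter(task["status"] for task in tasks)
    queued_ages = [now - t["submitted_at"] for t in tasks
                   if t["status"] == "queued"]
    # Submission-to-finish time, so it includes time spent waiting in the queue.
    durations = [t["finished_at"] - t["submitted_at"] for t in tasks
                 if t["status"] == "completed"]
    finished = counts["completed"] + counts["failed"]
    return {
        "queue_depth": counts["queued"],
        "oldest_queued_age": max(queued_ages, default=0.0),
        "mean_completion_seconds": (sum(durations) / len(durations)
                                    if durations else None),
        "failure_rate": counts["failed"] / finished if finished else None,
    }
```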


Security Considerations

Async agent systems should be treated as infrastructure, not just prompts.

Some important boundaries:

  • Do not put secrets in queue messages.
  • Avoid publishing internal hostnames, tokens, or private paths in public artifacts.
  • Restrict who can write task files.
  • Restrict who can publish to task queues.
  • Treat shared artifacts as untrusted input unless provenance is clear.
  • Separate analysis/review tasks from tasks that can perform irreversible actions.
  • Require explicit approval before workers touch production systems or money.

The queue is powerful because it lets agents act independently. That is also the risk.
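
Treating artifact references as untrusted input can start with a containment check before a worker opens anything. This is a minimal sketch, assuming references are `file://` URIs under a single shared root. `resolve_ref` is a hypothetical helper name.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_ref(ref: str, shared_root: Path) -> Path:
    """Turn a file:// reference into a path, refusing anything outside shared_root."""
    parsed = urlparse(ref)
    if parsed.scheme != "file" or parsed.netloc not in ("", "localhost"):
        raise ValueError(f"unsupported reference: {ref!r}")
    root = shared_root.resolve()
    # resolve() collapses ".." and follows symlinks before the containment check.
    candidate = Path(unquote(parsed.path)).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"reference escapes shared root: {ref!r}")
    return candidate
```

This blocks a crafted `context_ref` such as one containing `../` from pointing a worker at files outside the artifact store. It does not replace file permissions or restrictions on who can publish tasks.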


Lessons Learned

A few practical lessons stood out.

Use file-backed task context

Long inline shell strings and giant prompt payloads are fragile. Writing a task file and submitting it by reference is cleaner and easier to debug.

Keep tasks scoped

Workers perform better when the task has one clear objective.

Bad:

“Analyze the whole system, fix the bugs, write the docs, and deploy it.”

Better:

“Inspect the queue CLI and verify the command examples for the article.”

Separate implementation from review

Implementation and critique are different modes. A worker that is good at patching files is not necessarily the best worker to challenge architectural assumptions.

Use different roles.

The coordinator must validate

Worker output is not truth. It is a report.

The coordinator should inspect artifacts, check commands, verify claims, and only then present a final answer.

Do not overbuild too early

For a small trusted agent cluster, RabbitMQ plus shared artifacts is enough to get durable async workflows working.

You can add cancellation, metrics, priority enforcement, object storage, and stronger auth later as the system proves it needs them.


When I Would Use This Pattern

This pattern is a good fit when:

  • Tasks take longer than a normal request timeout.
  • Workers need to run tools, tests, or research independently.
  • Result artifacts matter.
  • The coordinator should remain responsive.
  • Work can be retried safely or made idempotent.
  • Agent roles are meaningfully different.

I would not use it for every agent call. If the task is small, synchronous is simpler.

But once agent work starts looking like real background jobs, it should be treated like real background jobs.


Conclusion

The biggest shift is mental.

Long-running agent delegation should not be modeled as “send a chat message and wait.”

It should be modeled as:

  • Create a task.
  • Route it durably.
  • Let the right worker handle it.
  • Save evidence.
  • Validate the result.

A small queue-backed tool like agentq turns multi-agent coordination from a fragile conversation into a workflow.

That does not eliminate the hard parts. You still need idempotency, observability, security boundaries, and honest failure handling.

But it gives the system a backbone.

And once agents have a backbone, they become much easier to trust with work that takes longer than a single turn.


Tags: #AI #Agents #RabbitMQ #Automation #Nostr #AgenticAI #Infrastructure

