Agent Loop
The agent loop is the core processing cycle: a message arrives, the agent assembles context, calls the LLM, executes tools, and produces a response. This page walks through each step based on the actual implementation.
Entry points
A turn can be triggered by:
- A chat message via a channel (Feishu, Discord, HTTP)
- A heartbeat timer firing
- A scheduled task from the scheduler
Step-by-step flow
1. Receive the trigger message
The Think.think() method receives:
msg— the incoming message that triggered this turntool_msgs— results from tool calls in a previous iterationmode—"text"or"voice"
2. Build the system prompt
The system prompt is assembled from 5 ordered layers (see System Prompt):
- Fixed skeleton — agent role, persona, skills list, knowledge context, conversation summary
- Runtime facts — agent ID, channel info, current time, workspace path, turn variables
- Workspace context — contents of AGENTS.md, SOUL.md, IDENTITY.md if present
- Heartbeat context — injected only for heartbeat-triggered turns
- Ephemeral system prompt — per-turn temporary instructions (used for sub-agents)
3. Fetch conversation history
Recent messages are retrieved from the sensor memory store. Up to context_top_k (default 12) most recent messages for the current conversation are loaded. Each session key stores a rolling cap of 100 raw messages.
4. Build the message list
The LLM message list is constructed in order:
- System prompt messages (from step 2)
- Historical conversation messages (from step 3), prefixed with
[timestamp] speaker: content - The current trigger message
- Any tool result messages from a previous iteration
Messages are deduplicated by ID.
5. Sanitize messages
Message sanitization ensures OpenAI-compatible tool call ordering:
- Orphan tool messages (no preceding assistant tool_call) are removed
- Incomplete tool call blocks (missing tool responses) are dropped
- Tool responses are ensured to be immediately after their assistant call
6. Call the LLM
The assembled messages are sent to the LLM via the configured provider. The agent can either:
- Stream the response — yield deltas as they arrive
- Return the complete response at once
The LLM can respond with:
- Text — a direct reply
- Tool calls — requests to execute tools (read, exec, web_fetch, etc.)
7. Handle the result
- If the LLM returns text: the response is sent back through the channel
- If the LLM returns tool calls: each tool is executed, results are collected, and the loop restarts from step 2 with the tool results as
tool_msgs
Voice mode (mode="voice") uses a separate VoiceThink class that manages realtime WebSocket connections with the LLM for streaming audio, VAD, and session reconnection.
Tool execution
Available tools depend on the mode:
| Mode | Available tools |
|---|---|
| Text | read, write, exec, web_fetch, api_request, delegate_task, manage_schedule, process |
| Voice | skip_voice_reply (plus a subset of text tools) |
Tool results are collected and fed back into the loop. The LLM can make multiple rounds of tool calls before producing a final text response.
Context budget
The LLM context window is finite. MushroomAgent manages it through:
- Message history: limited to
context_top_krecent messages (configurable) - Workspace files: each file capped at 4000 chars, total at 12000 chars
- Tool output: individual tools apply their own truncation limits
- Token counting:
max_completion_tokensis calculated based on available context window
If the combined prompt exceeds the model's context window, the LLM call will fail. Adjust memory.context_top_k or keep workspace files concise to avoid this.
Sub-agents
The delegate_task tool spawns child agents for isolated tasks. Sub-agents:
- Receive an ephemeral system prompt with the task instructions
- Skip workspace context files (
skip_context_files=True) - Run in quiet mode (
quiet_mode=True) - Output a plain-text result back to the parent agent