MushroomAgent
One mind, orchestrating everything. Thinking belongs to AI — sensing belongs to the world.
We break the physical boundaries of hardware with distributed architecture, orchestrating a cross-spatial "digital body" for AI.
Hear · Speak — real-time conversational context:
microphones and speakers form a responsive acoustic system for fluid agent interaction.
See · Remember — environmental semantic understanding:
cameras capture physical moments; AI interprets the environment in real time, giving intelligence the depth of "memory."
Act · Do — cross-device action orchestration:
hardware interfaces become virtual hands, turning cloud-based thought into precise physical action.
Install an agent first, then attach nodes when you need separate device I/O.
Official scripts for Linux and macOS: install.sh for the agent, install_node.sh for a node, plus an uninstall script.
After starting the agent, the browser chat is at /i/chat.
What is MushroomAgent?
MushroomAgent is a distributed agent runtime with two roles:
- Agent: the decision side. It receives text, voice, video, device events, and other context, calls the model, decides what should happen next, and dispatches actions. The agent can run by itself on one machine with mushroom-agent start, so a separate node is optional.
- Node: the device I/O side. A node collects information such as microphone audio, camera video, text, and local device events, forwards it to the agent, then executes actions returned by the agent. Those actions can be voice output, UI work, robot movement, or other device-specific behavior.
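The split between the two roles can be pictured with a small in-process sketch. This is not MushroomAgent's API: the `Observation`, `Action`, `Agent`, `Node`, and `round_trip` names are hypothetical, and the model call is replaced by a stub that echoes text back as speech so that one full round trip is visible.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """Something a node collected (hypothetical shape)."""
    node_id: str
    kind: str      # e.g. "text", "audio", "video", "device_event"
    payload: str


@dataclass
class Action:
    """Something the agent wants a node to do (hypothetical shape)."""
    node_id: str
    kind: str      # e.g. "speak", "ui", "move"
    payload: str


class Agent:
    """Decision side: turns observations into actions."""

    def decide(self, obs: Observation) -> list[Action]:
        # A real agent would call a model here; this stub only echoes text
        # back as a "speak" action addressed to the originating node.
        if obs.kind == "text":
            return [Action(obs.node_id, "speak", f"You said: {obs.payload}")]
        return []


@dataclass
class Node:
    """Device I/O side: forwards input, executes returned actions."""
    node_id: str
    executed: list[Action] = field(default_factory=list)

    def collect(self, kind: str, payload: str) -> Observation:
        return Observation(self.node_id, kind, payload)

    def execute(self, action: Action) -> None:
        # Stand-in for voice output, UI work, robot movement, and so on.
        self.executed.append(action)


def round_trip(agent: Agent, node: Node, kind: str, payload: str) -> list[Action]:
    """Collect on the node, decide on the agent, execute on the node."""
    obs = node.collect(kind, payload)
    actions = agent.decide(obs)
    for action in actions:
        node.execute(action)
    return actions
```

In standalone mode the same machine would play both roles; with attached nodes, `collect` and `execute` would run on the node device while `decide` stays on the agent host.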
MushroomAgent is for developers building systems that span platforms — chat bots, voice assistants, IoT controllers, and hardware-accelerated agents — without running separate instances per surface.
How it works
channel → communication → sensor → agent → think → skill
Input enters through channels: Feishu, Discord, HTTP, or WebSocket. The sensor layer processes text, voice, and files into structured perception. The agent assembles context from conversation history, workspace files, tool results, and loaded Skills, then hands it to the think engine. The LLM decides what to do: reply, execute a tool, or dispatch a device action. Results flow back along the same path to the originating channel.
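As a rough illustration of that flow, the sketch below chains a sensor step, a context-assembly step, and a think step. Every name here (`Perception`, `Decision`, `sense`, `assemble_context`, `think`, `handle`) is hypothetical, and the think step is a hard-coded rule standing in for the LLM call, so this shows only the shape of the pipeline, not the runtime's actual behavior.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Perception:
    channel: str   # where the input came from, e.g. "http"
    modality: str  # "text", "voice", or "file"
    content: str


@dataclass
class Decision:
    kind: str      # "reply", "tool", or "device_action"
    detail: str


def sense(channel: str, modality: str, raw: str) -> Perception:
    """Sensor layer: normalise raw input into structured perception."""
    return Perception(channel, modality, raw.strip())


def assemble_context(p: Perception, history: list[str], skills: list[str]) -> str:
    """Agent: combine loaded Skills, history, and the new perception."""
    lines = [f"skill: {s}" for s in skills]
    lines += [f"history: {h}" for h in history]
    lines.append(f"{p.channel}/{p.modality}: {p.content}")
    return "\n".join(lines)


def think(context: str) -> Decision:
    """Think engine stand-in: a simple rule replaces the LLM call."""
    last = context.splitlines()[-1].lower()
    if "turn on" in last:
        return Decision("device_action", "light.on")
    if "list files" in last:
        return Decision("tool", "shell: ls")
    return Decision("reply", "ok")


def handle(channel: str, modality: str, raw: str,
           history: list[str], skills: list[str]) -> Decision:
    """Run one input through sensor -> agent -> think."""
    perception = sense(channel, modality, raw)
    return think(assemble_context(perception, history, skills))
```

Calling `handle("http", "text", "Turn on the lamp", [], [])` yields a decision of kind `device_action`, while ordinary chat text falls through to a `reply`.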
| Mode | Think location | Input/output location | Command |
|---|---|---|---|
| Standalone agent | This device | This device | mushroom-agent start |
| Agent + nodes | Agent host | Each attached node | mushroom-agent start on the agent host; mushroom-agent node attach on each node |
Key capabilities
Agent decides; nodes collect input and execute device output.
Feishu, Discord, HTTP, WebSocket — one agent serves them all.
Real-time voice with VAD, TTS, and streaming LLM. Speak to your agent naturally.
Shell exec, file I/O, web fetch, API calls, task delegation, and scheduling.
Domain knowledge and procedures loaded on demand. Write your own or install from the Skills Hub.
AGENTS.md for rules, SOUL.md for persona, IDENTITY.md for self-concept — customize the agent.
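One way to picture how the three workspace files could feed the agent's context is a small loader that concatenates whichever files exist. The file names come from the list above; the section labels, their ordering, and the `build_system_prompt` helper are assumptions made for illustration, not the runtime's documented behavior.

```python
from pathlib import Path

# File names are from the docs; labels and ordering are assumptions.
WORKSPACE_FILES = [
    ("AGENTS.md", "Rules"),
    ("SOUL.md", "Persona"),
    ("IDENTITY.md", "Self-concept"),
]


def build_system_prompt(workspace: Path) -> str:
    """Concatenate whichever workspace files exist into one prompt string."""
    sections = []
    for filename, label in WORKSPACE_FILES:
        path = workspace / filename
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            if text:  # skip empty files so they add no blank section
                sections.append(f"## {label}\n{text}")
    return "\n\n".join(sections)
```

Missing or empty files are skipped, so a workspace with only SOUL.md still produces a usable prompt.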
Quick start
Start with agent mode. The agent can run alone and is enough for local chat or local runtime use. Add node mode later when another device should collect voice/video/input and execute actions for that agent.
See Getting Started for the install command and the node-mode path.
Explore the docs
Think loop, system prompt, workspace, and the core pipeline.
All built-in tools, creating Skills, and the Skills Hub.
Feishu, Discord — configure platform channels.
LLM providers, embedding, and OpenAI-compatible services.
init, serve, node, admin — all commands and options.
Every config.yaml section and field, with types and defaults.
Troubleshooting and common issues across platforms.