MushroomAgent

Distributed Embodied Intelligence

One mind, orchestrating everything. Thinking belongs to AI — sensing belongs to the world.

We break the physical boundaries of hardware with distributed architecture, orchestrating a cross-spatial "digital body" for AI.

Hear · Speak — real-time conversational context:
microphones and speakers form a responsive acoustic system for fluid agent interaction.

See · Remember — environmental semantic understanding:
cameras capture physical moments; AI interprets the environment in real time, giving intelligence the depth of "memory."

Act · Do — cross-device action orchestration:
hardware interfaces become virtual hands, turning cloud-based thought into precise physical action.

Get Started

Install an agent first, then attach nodes when you need separate device I/O.

Install Guide

Official Linux/macOS scripts: install.sh for agent, install_node.sh for node, plus uninstall.

Open the chat

After starting the agent, the browser chat is at /i/chat.

What is MushroomAgent?

MushroomAgent is a distributed agent runtime with two roles:

Agent — the decision side. It receives text, voice, video, device events, and other context, calls the model, decides what should happen next, and dispatches actions. The agent can run by itself on one machine with mushroom-agent start, so a separate node is optional.
Node — the device I/O side. A node collects information such as microphone audio, camera video, text, and local device events, forwards it to the agent, then executes actions returned by the agent. Those actions can be voice output, UI work, robot movement, or other device-specific behavior.
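
The division of labor between the two roles can be sketched in a few lines of Python. This is a minimal illustration, not MushroomAgent's actual API: the `Event`, `Action`, `Agent`, and `Node` names are hypothetical, and the rule-based `decide` method stands in for the model call a real agent would make.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Something a node observed (hypothetical shape)."""
    node_id: str
    kind: str      # e.g. "text", "audio", "video", "device"
    payload: str


@dataclass
class Action:
    """Something the agent wants a node to do (hypothetical shape)."""
    node_id: str
    kind: str      # e.g. "speak", "ui", "move"
    payload: str


class Agent:
    """Decision side: receives events, decides, dispatches actions."""

    def decide(self, event: Event) -> Action:
        # Stand-in for the model call; a real agent would ask an LLM here.
        if event.kind == "text":
            return Action(event.node_id, "speak", f"You said: {event.payload}")
        return Action(event.node_id, "noop", "")


@dataclass
class Node:
    """Device I/O side: forwards input, executes returned actions."""
    node_id: str
    agent: Agent
    executed: list = field(default_factory=list)

    def observe(self, kind: str, payload: str) -> Action:
        # Forward the observation to the agent, then run what comes back.
        action = self.agent.decide(Event(self.node_id, kind, payload))
        self.execute(action)
        return action

    def execute(self, action: Action) -> None:
        # A real node would drive a speaker, UI, or robot here.
        self.executed.append(action)


agent = Agent()
kitchen = Node("kitchen", agent)
result = kitchen.observe("text", "hello")
```

In this sketch the node never decides anything itself: it only forwards observations and records the actions it was told to run, which mirrors the "agent decides, node executes" split described above.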

MushroomAgent is for developers building systems that span platforms — chat bots, voice assistants, IoT controllers, and hardware-accelerated agents — without running separate instances per surface.

How it works

channel → communication → sensor → agent → think → skill

Input enters through channels: Feishu, Discord, HTTP, or WebSocket. The sensor layer processes text, voice, and files into structured perception. The agent assembles context from conversation history, workspace files, tool results, and loaded Skills, then hands it to the think engine. The LLM decides what to do: reply, execute a tool, or dispatch a device action. Results flow back the same way.
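
The flow described above can be sketched as a chain of small functions. Everything here is an assumption made for illustration: the function names, the `Perception` type, and the keyword-based `think` stub standing in for the LLM are hypothetical and do not reflect MushroomAgent's internals.

```python
from dataclasses import dataclass


@dataclass
class Perception:
    """Structured output of the (hypothetical) sensor layer."""
    channel: str
    text: str


def sense(channel: str, raw: str) -> Perception:
    # Sensor layer: normalize raw channel input into structured perception.
    return Perception(channel=channel, text=raw.strip())


def assemble_context(perception: Perception, history: list[str]) -> str:
    # Agent: combine conversation history with the new perception.
    return "\n".join(history + [f"[{perception.channel}] {perception.text}"])


def think(context: str) -> dict:
    # Think engine stub: a real one would call an LLM with `context`.
    last_line = context.splitlines()[-1].lower()
    if "light" in last_line:
        return {"type": "device_action", "target": "light", "command": "toggle"}
    return {"type": "reply", "text": "OK"}


def handle(channel: str, raw: str, history: list[str]) -> dict:
    # One pass through the pipeline: sense, assemble context, think.
    perception = sense(channel, raw)
    decision = think(assemble_context(perception, history))
    history.append(perception.text)
    return decision


history: list[str] = []
first = handle("http", "  hello  ", history)
second = handle("discord", "turn on the light", history)
```

Here `first` is a plain reply and `second` is a device action, showing how the same pipeline can serve different channels and produce different kinds of decisions.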

| Mode | Think location | Input/output location | Command |
| --- | --- | --- | --- |
| Standalone agent | This device | This device | `mushroom-agent start` |
| Agent + nodes | Agent host | Each attached node | `mushroom-agent start` on agent, `mushroom-agent node attach` on nodes |

Key capabilities

Agent / node roles

Agent decides; nodes collect input and execute device output.

Multi-channel

Feishu, Discord, HTTP, WebSocket — one agent serves them all.

Voice mode

Real-time voice with VAD, TTS, and a streaming LLM. Speak to your agent naturally.

Built-in tools

Shell exec, file I/O, web fetch, API calls, task delegation, and scheduling.

Skills

Domain knowledge and procedures loaded on demand. Write your own or install from the Skills Hub.

Workspace context

AGENTS.md for rules, SOUL.md for persona, IDENTITY.md for self-concept — customize the agent.
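
As an illustration of how such files might be combined, the sketch below reads whichever of the three files exist in a workspace directory and joins them into one prompt preamble. The `load_workspace_context` function, the section labels, and the file ordering are assumptions; the text above only names the files and their roles.

```python
import tempfile
from pathlib import Path

# File names and roles come from the text above; the labels are illustrative.
WORKSPACE_FILES = {
    "AGENTS.md": "Rules",
    "SOUL.md": "Persona",
    "IDENTITY.md": "Self-concept",
}


def load_workspace_context(workspace: Path) -> str:
    """Concatenate the workspace files that exist, skipping missing ones."""
    sections = []
    for name, label in WORKSPACE_FILES.items():
        path = workspace / name
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {label} ({name})\n{body}")
    return "\n\n".join(sections)


# Demo with a throwaway workspace that has two of the three files.
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "AGENTS.md").write_text("Be concise.\n", encoding="utf-8")
    (ws / "SOUL.md").write_text("Friendly and calm.\n", encoding="utf-8")
    context = load_workspace_context(ws)  # IDENTITY.md is absent and skipped
```

Skipping missing files keeps the sketch usable with a partially customized workspace; how the real runtime treats absent files is not stated in the source.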

Quick start

Start with agent mode. The agent can run alone and is enough for local chat or local runtime use. Add node mode later when another device should collect voice/video/input and execute actions for that agent.

See Getting Started for the install command and the node-mode path.