Configuration Reference

MushroomAgent's primary agent runtime config lives in ~/.mushroom_agent/config.yaml. This page documents every section and field.

Configuration is loaded in this order:

1. MUSHROOM_ROOT/config/config.yaml (project root)
2. ~/.mushroom_agent/config.yaml (local overrides)

Fields set in the local file override fields with the same name in the project-root file.
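
The load order above can be pictured as a merge in which the local file wins. The sketch below is illustrative only: merge_config is a hypothetical helper, the two dictionaries stand in for the parsed YAML files, and the recursive per-field merge is an assumption, since this page does not say whether nested sections are merged field by field or replaced whole.

```python
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new mapping where values from `override` win over `base`.

    Nested mappings are merged key by key (an assumption for this sketch);
    any other value type is replaced outright.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for the parsed MUSHROOM_ROOT/config/config.yaml (project root).
project = {"llm": {"model": "gpt-5.2", "temperature": 0.0, "timeout": 600}}
# Stand-in for the parsed ~/.mushroom_agent/config.yaml (local overrides).
local = {"llm": {"api_key": "sk-...", "timeout": 120}}

effective = merge_config(project, local)
```

In this sketch, effective["llm"] keeps model and temperature from the project file, while timeout and api_key come from the local file.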

mushroom-agent init also creates two sidecar files next to config.yaml:

File	Purpose
`hardware.yaml`	Local device inputs and outputs for `mushroom-agent start` and `mushroom-agent node attach`
`remote.yaml`	Remote node attachment settings for `mushroom-agent node attach`

The shortest path to a local chat requires only one edit: set llm.api_key in config.yaml.

`hardware.yaml`

hardware.yaml configures local device inputs and outputs for mushroom-agent start and mushroom-agent node attach.

For camera input, hardware.video_input.width, hardware.video_input.height, and hardware.video_input.jpeg_quality define the image size and JPEG compression used by RTC voice sessions and the node video preview. Camera frame sizing and compression are owned by hardware.yaml, not config.yaml.

Camera proximity greeting is configured under hardware.video_input.proximity:

hardware:
  video_input:
    enabled: true
    devices:
      - device_index: 0
        camera_id: "camera_0"
      - device_index: 1
        camera_id: "camera_1"
    proximity:
      enabled: true
      activate_cameras: ["camera_0"]
      sample_interval_ms: 300
      near_confidence: 0.6
      near_consecutive: 2
      far_consecutive: 3
      near_face_ratio: 0.04
      near_exit_ratio_margin: 0.02
      backend: "onnxrt"

Field	Type	Default	Description
`enabled`	bool	`false`	Enable the best-effort proximity observer
`activate_cameras`	string[]	`[]`	Optional list of `video_input.devices[*].camera_id` values used by proximity detection
`sample_interval_ms`	int	`300`	Minimum interval between proximity samples
`near_confidence`	float	`0.6`	Minimum detector confidence required before a face can be treated as near
`near_consecutive`	int	`2`	Number of consecutive near samples required before emitting `user_approached_camera`
`far_consecutive`	int	`3`	Number of consecutive far samples required before re-arming another approach event
`near_face_ratio`	float	`0.04`	Largest-face area ratio required to enter the near state
`near_exit_ratio_margin`	float	`0.02`	Amount subtracted from `near_face_ratio` before an already-near user is treated as far
`backend`	string	`"onnxrt"`	Face inference backend: `"onnxrt"` or `"tensorrt"`

When proximity is enabled, the node samples video frames before normal upload and emits user_approached_camera only after a debounced approach transition. If activate_cameras is empty, proximity uses the same stitched camera frame as the normal video stream. If it contains one or more camera IDs, proximity crops those camera segments from the stitched frame, re-stitches only the selected cameras in video_input.devices order, and runs detection on that temporary sample frame. The normal input_video.append path is unchanged and continues to upload all configured video input cameras.

near_confidence and near_face_ratio must both pass before a sample is treated as near. near_consecutive and far_consecutive control how quickly the runtime enters and exits the near state. Once the user is near, the runtime uses near_exit_ratio_margin to derive the lower exit threshold so slight movement around the boundary does not re-arm greetings.

Face detector and embedder models are resolved from ~/.mushroom_agent/models/ or downloaded from the default model repositories when available.

If identity recognition is enabled and the transition-time face embedding matches a registered identity with display_name, the greeting system event includes that display name; raw images and raw identity IDs are not included.
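
The debounce and exit-threshold rules described above can be sketched as a small state machine. This is a minimal illustration, not the runtime's actual code: the ProximityDebouncer class and its update method are hypothetical, the field names and defaults are taken from the proximity table, and the sketch assumes that a sample from an already-near user must still pass near_confidence.

```python
from dataclasses import dataclass


@dataclass
class ProximityDebouncer:
    # Defaults mirror the documented proximity defaults.
    near_confidence: float = 0.6
    near_consecutive: int = 2
    far_consecutive: int = 3
    near_face_ratio: float = 0.04
    near_exit_ratio_margin: float = 0.02
    is_near: bool = False
    near_streak: int = 0
    far_streak: int = 0

    def update(self, confidence: float, face_ratio: float) -> bool:
        """Feed one sample; return True when an approach event should fire."""
        # An already-near user is judged against the lower exit threshold.
        threshold = self.near_face_ratio
        if self.is_near:
            threshold -= self.near_exit_ratio_margin
        sample_near = confidence >= self.near_confidence and face_ratio >= threshold

        if sample_near:
            self.near_streak += 1
            self.far_streak = 0
        else:
            self.far_streak += 1
            self.near_streak = 0

        if not self.is_near and self.near_streak >= self.near_consecutive:
            self.is_near = True
            return True  # debounced far-to-near transition
        if self.is_near and self.far_streak >= self.far_consecutive:
            self.is_near = False  # re-armed for the next approach
        return False


detector = ProximityDebouncer()
# (confidence, largest-face area ratio) pairs, one per sample interval.
samples = [
    (0.9, 0.05), (0.9, 0.05),               # two near samples: event fires
    (0.9, 0.03),                            # above the exit threshold: still near
    (0.9, 0.01), (0.9, 0.01), (0.9, 0.01),  # three far samples: re-armed
    (0.9, 0.05), (0.9, 0.05),               # second approach: event fires again
]
events = [detector.update(conf, ratio) for conf, ratio in samples]
```

With the default values, the third sample (ratio 0.03) would not count as near for a far user, but it keeps an already-near user in the near state because it is above 0.04 - 0.02 = 0.02. Only the second and the last samples return True.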

`agent`

Agent identity and runtime settings.

agent:
  id: "local-agent"
  name: "LocalAgent"

Field	Type	Default	Description
`id`	string	—	Unique agent identifier
`name`	string	—	Display name for the agent
`ephemeral_system_prompt`	string	`""`	Per-turn temporary instructions (used by sub-agents)
`skip_context_files`	bool	`false`	Skip workspace context files (AGENTS.md, SOUL.md, IDENTITY.md)
`quiet_mode`	bool	`false`	Suppress non-essential output (used by sub-agents)

`llm`

LLM provider configuration.

llm:
  api_type: "openai"
  api_key: "sk-..."
  base_url: "https://api.openai.com/v1"
  model: "gpt-5.2"
  temperature: 0.0
  max_completion_tokens: 4096
  stream: false
  timeout: 600
  proxy: ""

Field	Type	Default	Description
`api_type`	string	—	Provider type: `openai`, `deepseek`, `anthropic`, `gemini`, etc.
`api_key`	string	—	API key (also picked up from `LLM_API_KEY` env var)
`base_url`	string	—	API endpoint URL
`api_version`	string	—	API version (Azure)
`model`	string	—	Model name
`temperature`	float	`0.0`	Sampling temperature
`top_p`	float	`1.0`	Nucleus sampling
`top_k`	int	`0`	Top-k sampling
`max_completion_tokens`	int	`4096`	Maximum output tokens
`stream`	bool	`false`	Enable streaming responses
`timeout`	int	`600`	Request timeout in seconds
`proxy`	string	—	Proxy URL (overrides global proxy)

llm.vision only controls native vision enablement, provider detail mode, and the maximum number of images attached to one model turn. It does not expose camera sizing, JPEG quality, image byte limits, or fetch timeout settings.

For the full list of supported providers, see LLM Providers.

`tools`

Tool enablement and configuration.

tools:
  read:
    enabled: true
  exec:
    enabled: true
    allowlist:
      - curl
      - python
      - python3
      - ls
      - rg
      - cat
      - pwd
    default_timeout: 120
    max_output_chars: 80000

Field	Type	Default	Description
`tools.manage_schedule.enabled`	bool	`true`	Enable schedule management tool
`tools.skip_voice_reply.enabled`	bool	`true`	Enable voice skip-reply tool
`tools.emit_ui_command.enabled`	bool	`false`	Enable UI command emit tool
`tools.memory_search.enabled`	bool	`true`	Enable long-term memory search tool
`tools.update_identity_profile.enabled`	bool	`false`	Enable identity profile update tool
`tools.read.enabled`	bool	`true`	Enable file read tool
`tools.exec.enabled`	bool	`true`	Enable shell exec tool
`tools.exec.allowlist`	list	`[]`	Commands allowed in `exec`
`tools.exec.default_timeout`	int	`1800`	Default timeout for exec commands (seconds)
`tools.exec.max_output_chars`	int	`200000`	Max output chars captured
`tools.write.enabled`	bool	`false`	Enable file write tool
`tools.web_fetch.enabled`	bool	`false`	Enable web fetch tool
`tools.web_fetch.timeout`	int	`20`	Web fetch request timeout (seconds)
`tools.web_fetch.max_bytes`	int	`200000`	Max bytes to fetch
`tools.api_request.enabled`	bool	`false`	Enable API request tool
`tools.web_search.enabled`	bool	`false`	Enable web search tool
`tools.process.enabled`	bool	`false`	Enable long-running process management
`tools.delegate_task.enabled`	bool	`false`	Enable task delegation (sub-agents)
`tools.delegate_task.max_depth`	int	`2`	Max sub-agent nesting depth
`tools.delegate_task.max_iterations`	int	`20`	Max iterations per sub-agent

In the local starter created by mushroom-agent init, read.cwd, exec.cwd, and write.cwd are intentionally omitted. These tools fall back to workspace.path automatically.
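
The working-directory fallback can be sketched as below. The resolve_tool_cwd helper is hypothetical and only illustrates the documented behavior that a tool without its own cwd uses workspace.path; how the runtime resolves the value internally is not described on this page.

```python
from typing import Any, Dict


def resolve_tool_cwd(config: Dict[str, Any], tool: str) -> str:
    """Return tools.<tool>.cwd when set, otherwise workspace.path."""
    tool_cfg = config.get("tools", {}).get(tool) or {}
    return tool_cfg.get("cwd") or config["workspace"]["path"]


# Shape of the local starter: no cwd keys under read, exec, or write.
starter = {
    "workspace": {"path": "/absolute/path/to/workspace"},
    "tools": {"read": {"enabled": True}, "exec": {"enabled": True}},
}
exec_cwd = resolve_tool_cwd(starter, "exec")
```

Here exec_cwd resolves to the workspace path because the starter leaves exec.cwd unset.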

Available tools: read, write, exec, web_fetch, web_search, api_request, process, delegate_task, manage_schedule, skip_voice_reply, emit_ui_command, memory_search, update_identity_profile.

`channels`

Platform channel configuration.

channels:
  feishu:
    enabled: false
    app_id: ""
    app_secret: ""
    verification_token: ""
    encrypt_key: ""
  discord:
    enabled: false
    bot_token: ""
  livekit:
    enabled: false
    url: ""
    api_key: ""
    api_secret: ""

Field	Type	Default	Description
`feishu.enabled`	bool	`false`	Enable Feishu bot
`feishu.app_id`	string	—	Feishu app ID
`feishu.app_secret`	string	—	Feishu app secret
`feishu.verification_token`	string	—	Feishu verification token
`feishu.encrypt_key`	string	—	Feishu encrypt key
`discord.enabled`	bool	`false`	Enable Discord bot
`discord.bot_token`	string	—	Discord bot token
`livekit.enabled`	bool	`false`	Enable LiveKit voice
`livekit.url`	string	—	LiveKit server URL
`livekit.api_key`	string	—	LiveKit API key
`livekit.api_secret`	string	—	LiveKit API secret

`rtc`

Realtime voice mode switch and VAD settings.

rtc:
  enabled: false
  voice_mode: "realtime"
  vad_mode: "vad"
  allow_interruptions: true

Field	Type	Default	Description
`enabled`	bool	`false`	Enable voice-mode preflight and runtime voice configuration
`voice_mode`	string	`"realtime"`	Voice runtime mode
`vad_mode`	string	`"vad"`	Voice activity detection mode
`allow_interruptions`	bool	`true`	Default voice interruption behavior for realtime sessions when clients do not provide `session.create.config.allow_interruptions`

When rtc.enabled is false, mushroom-agent serve --ui and normal websocket chat do not require tts.tts_token or realtime_llm.api_key.

`tts`

Text-to-speech configuration.

tts:
  provider: "minimax"
  group_id: ""
  tts_url: "wss://api.minimax.io"
  tts_token: "YOUR_TTS_TOKEN"
  voice_id: "Chinese (Mandarin)_Warm_Bestie"

Field	Type	Default	Description
`provider`	string	`"minimax"`	TTS provider
`group_id`	string	—	Provider group ID
`tts_url`	string	—	TTS WebSocket URL
`tts_token`	string	—	TTS auth token
`voice_id`	string	—	Voice identifier

`realtime_llm`

Realtime LLM for voice mode.

realtime_llm:
  model: "gpt-realtime"
  api_key: "YOUR_REALTIME_API_KEY"

Field	Type	Default	Description
`model`	string	`"gpt-realtime"`	Realtime model name
`api_key`	string	—	API key for realtime (falls back to `llm.api_key`)
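
The key fallbacks described in the llm and realtime_llm tables can be sketched as follows. Both helper functions are hypothetical, and the precedence of llm.api_key over the LLM_API_KEY environment variable is an assumption; this page only states that both sources are read.

```python
import os
from typing import Any, Mapping, Optional


def resolve_llm_api_key(
    config: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Prefer llm.api_key; otherwise read LLM_API_KEY (order is assumed)."""
    return config.get("llm", {}).get("api_key") or env.get("LLM_API_KEY")


def resolve_realtime_api_key(
    config: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Prefer realtime_llm.api_key; otherwise fall back to the llm key."""
    own_key = config.get("realtime_llm", {}).get("api_key")
    return own_key or resolve_llm_api_key(config, env)


# realtime_llm has no api_key of its own, so the llm key is reused.
config = {"llm": {"api_key": "sk-..."}, "realtime_llm": {"model": "gpt-realtime"}}
realtime_key = resolve_realtime_api_key(config, env={})
```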

`skills`

Skills configuration.

skills:
  enabled: true
  roots: []
  include_defaults: true
  explicit_only: false
  max_active: 3
  active: []
  allowlist: []
  blocklist: []
  inject_mode: "hybrid"
  summary_max_chars: 280
  content_max_chars: 2000
  refresh_interval: 300

Field	Type	Default	Description
`enabled`	bool	`true`	Enable skills system
`roots`	list	`[]`	Additional skill directories beyond `workspace.path/skills`
`include_defaults`	bool	`true`	Include workspace skills
`explicit_only`	bool	`false`	Only load explicitly requested skills
`max_active`	int	`3`	Max concurrently active skills
`active`	list	`[]`	Always-active skill names
`allowlist`	list	`[]`	Only allow these skills
`blocklist`	list	`[]`	Block these skills
`inject_mode`	string	`"hybrid"`	`"summary"`, `"full"`, or `"hybrid"`
`summary_max_chars`	int	`280`	Max chars in summary
`content_max_chars`	int	`2000`	Max chars when loading full content
`refresh_interval`	int	`300`	Auto-refresh interval (seconds)

`workspace`

Workspace directory settings.

workspace:
  path: "/absolute/path/to/workspace"

Field	Type	Default	Description
`path`	string	—	Workspace directory path
`use_uid`	bool	—	Use UID-based path

The local starter created by mushroom-agent init treats workspace.path as the only required path anchor. Logs, workspace skills/, memory storage, and tool working directories are derived internally from this directory.

`memory`

Memory and knowledge retrieval.

memory:
  context_top_k: 12
  chat_mp_top_k: 1
  knowledge_top_k: 2
  similar_threshold: 0.01
  knowledge_similar_threshold: 0.1

Field	Type	Default	Description
`context_top_k`	int	`12`	Recent context items to retrieve
`chat_mp_top_k`	int	`1`	Chat memory items to retrieve
`knowledge_top_k`	int	`2`	Knowledge items to retrieve
`similar_threshold`	float	`0.01`	Similarity threshold for context
`knowledge_similar_threshold`	float	`0.1`	Similarity threshold for knowledge

`embedding`

Embedding model configuration.

embedding:
  api_type: "openai"
  api_key: ""
  base_url: "https://api.openai.com/v1"
  model: "text-embedding-3-small"
  dimensions: 1536

Field	Type	Default	Description
`api_type`	string	—	`openai` or `dashscope`
`api_key`	string	—	API key
`base_url`	string	—	API endpoint URL
`model`	string	—	Embedding model name
`dimensions`	int	`1536`	Embedding vector dimensions
`embed_batch_size`	int	—	Batch size for embedding

`scheduler`

Scheduled task settings.

scheduler:
  execution_timeout_seconds: 300

Field	Type	Default	Description
`execution_timeout_seconds`	float	`300.0`	Max execution time for scheduled tasks (seconds)
`one_shot_retry_max_attempts`	int	`3`	Max retry attempts for one-shot tasks
`error_auto_disable_threshold`	int	`3`	Consecutive errors before auto-disabling a task

`heartbeat`

Periodic heartbeat for proactive agent checks.

heartbeat:
  enabled: true
  every_seconds: 300
  run_on_startup: false
  active_hours: []
  prompt: ""

Field	Type	Default	Description
`enabled`	bool	`true`	Enable periodic heartbeats
`every_seconds`	int	`300`	Interval between heartbeats (seconds)
`run_on_startup`	bool	`false`	Run a heartbeat immediately on startup
`wake_coalesce_ms`	int	`250`	Coalesce window for wake events (ms)
`wake_retry_ms`	int	`1000`	Retry interval for failed wake events (ms)
`ack_max_chars`	int	`300`	Max characters for heartbeat acknowledgment
`conversation_ttl_seconds`	int	`0`	Conversation inactivity TTL (0 = disabled)
`timezone`	string	`""`	Timezone for active hours (e.g., `"Asia/Shanghai"`)
`active_hours`	list	`[]`	Restrict heartbeats to these hours (e.g., `["09:00-18:00"]`)
`prompt`	string	—	Custom heartbeat prompt (defaults to built-in)
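
One way to read active_hours and timezone together is sketched below. The in_active_hours function is hypothetical, and several details are assumptions rather than documented behavior: an empty list means no restriction, each entry has the "HH:MM-HH:MM" shape shown in the table, and the end of a window is exclusive.

```python
from datetime import datetime, time
from typing import List
from zoneinfo import ZoneInfo


def in_active_hours(now: datetime, active_hours: List[str], timezone: str = "") -> bool:
    """Return True if `now` falls inside any "HH:MM-HH:MM" window."""
    if not active_hours:
        return True  # assumption: an empty list means no restriction
    if timezone:
        # Convert into the configured zone; naive datetimes are treated
        # as system local time by astimezone().
        now = now.astimezone(ZoneInfo(timezone))
    current = now.time()
    for window in active_hours:
        start_text, end_text = window.split("-")
        start, end = time.fromisoformat(start_text), time.fromisoformat(end_text)
        if start <= current < end:  # windows crossing midnight are not handled
            return True
    return False


office_hours = ["09:00-18:00"]
during = in_active_hours(datetime(2026, 5, 19, 10, 30), office_hours)
after = in_active_hours(datetime(2026, 5, 19, 20, 0), office_hours)
```

In this sketch, during is True and after is False. Passing a timezone such as "Asia/Shanghai" together with a timezone-aware datetime makes the comparison use wall-clock time in that zone.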

`server.accesskeys`

Accesskeys used to authenticate remote nodes with the agent server. Use mushroom-agent accesskey to manage this list. The plaintext accesskey is shown only once, when it is created; the config stores a hash and a short preview. Accesskeys do not expire automatically; to revoke one, disable or delete it.

server:
  accesskeys:
    - id: "akid_example"
      name: "Kitchen Pi"
      accesskey_hash: "<sha256>"
      accesskey_preview: "ak_abcd...wxyz"
      enabled: true
      created_at: "2026-05-19T00:00:00Z"
      intended_node_id: "pi-kitchen"
      description: ""
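
The relationship between a plaintext accesskey and the stored accesskey_hash and accesskey_preview fields might look like the sketch below. Both functions are hypothetical. The sketch assumes an unsalted SHA-256 hex digest of the UTF-8 plaintext and a preview built from the first seven and last four characters; the example entry only indicates a SHA-256 value and a preview of that shape, so the real encoding may differ.

```python
import hashlib
import hmac
from typing import Any, Dict


def make_accesskey_fields(plaintext: str) -> Dict[str, str]:
    """Derive the stored hash and the display preview from a plaintext key."""
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    preview = f"{plaintext[:7]}...{plaintext[-4:]}"
    return {"accesskey_hash": digest, "accesskey_preview": preview}


def accesskey_matches(plaintext: str, entry: Dict[str, Any]) -> bool:
    """Check a presented key against one server.accesskeys entry."""
    if not entry.get("enabled", False):
        return False  # disabled keys never authenticate
    candidate = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, entry["accesskey_hash"])


# Made-up plaintext used only for this illustration.
entry = {
    "id": "akid_example",
    "enabled": True,
    **make_accesskey_fields("ak_abcd0123456789wxyz"),
}
```

Because only the hash is stored, a lost plaintext key cannot be recovered from the config under these assumptions; a new key has to be created instead.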
