Voice Interaction

Models and Repository Links

LLM Model: Qwen2.5-1.5B
ASR Model: SenseVoice
TTS and KWS Models: Sherpa-onnx
Audio Stream Processing: https://github.com/D-Robotics/magicbox_audio_io
Large Language Model: https://github.com/D-Robotics/magicbox_qwen_llm
Lighting Control Library: https://github.com/D-Robotics/magicbox_lighting_control

Audio Stream Processing Function Description

Note: Currently, only Chinese is supported for voice interaction.

For instructions on using the voice interaction feature, refer to Quick Start Voice Interaction.
In addition to the default continuous conversation mode, a continuous wake-up mode is supported, in which each wake-up starts exactly one conversation ("one wake-up, one conversation"):
- Say "Hello, Digua" to wake the device, then ask a single question.
- Or say "Hello, Digua" followed directly by your question.
When the wake word "Hello, Digua" is detected, the light flashes. To enable or disable continuous wake-up mode, change the continuous_wake_mode setting in /userdata/magicbox/app/ros_ws/src/magicbox/magicbox_audio_io/launch/audio_io.launch.py.
- To set it at launch time instead, pass it as a launch argument: ros2 launch audio_io audio_io.launch.py continuous_wake_mode:=true
This node is designed to run together with qwen_llm, so by default it waits for qwen_llm to start. To run it on its own, change the wait_for_llm setting in /userdata/magicbox/app/ros_ws/src/magicbox/magicbox_audio_io/launch/audio_io.launch.py.
- To set it at launch time instead, pass it as a launch argument: ros2 launch audio_io audio_io.launch.py wait_for_llm:=false
The microphone is reactivated (and the light stays on) only after TTS playback has finished and the end message has been received. The end message confirms that the large language model has finished generating its output.
Because the audio processing node takes a long time to initialize, it starts automatically at boot. The button only starts and stops the large language model node, and each node waits for the other to start before it begins working.
For more parameter details and compilation instructions, refer to the README file in the code repository.
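
The two audio_io settings above can be combined in a single launch. The following is a minimal sketch, not part of magicbox_audio_io: the helper name start_audio_io and the DRY_RUN switch are hypothetical, and the fallback values it uses when an argument is omitted (continuous conversation mode, waiting for qwen_llm) are assumptions based on the behavior described above. Only the continuous_wake_mode and wait_for_llm launch arguments and the setup commands come from this section.

```shell
#!/usr/bin/env bash
# Sketch only: start_audio_io and DRY_RUN are hypothetical helpers, not part of
# magicbox_audio_io. The two launch arguments are the ones documented above.
#
# Usage: start_audio_io [continuous_wake_mode] [wait_for_llm]
#   Each argument must be "true" or "false".
start_audio_io() {
  local wake_mode="${1:-false}"   # assumed fallback: continuous conversation mode
  local wait_llm="${2:-true}"     # assumed fallback: wait for qwen_llm
  local v

  # Reject anything other than the two boolean strings used in this section.
  for v in "$wake_mode" "$wait_llm"; do
    if [ "$v" != "true" ] && [ "$v" != "false" ]; then
      echo "expected true or false, got: $v" >&2
      return 2
    fi
  done

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the launch command instead of running it.
    echo "ros2 launch audio_io audio_io.launch.py continuous_wake_mode:=${wake_mode} wait_for_llm:=${wait_llm}"
    return 0
  fi

  # Run in a subshell so the cd and the sourced environment stay local to the launch.
  (
    cd /userdata/magicbox || exit 1
    source /opt/tros/humble/setup.bash
    source app/ros_ws/install/setup.bash
    ros2 launch audio_io audio_io.launch.py \
      "continuous_wake_mode:=${wake_mode}" "wait_for_llm:=${wait_llm}"
  )
}
```

For example, start_audio_io true false runs audio_io on its own in continuous wake-up mode, and prefixing the call with DRY_RUN=1 only prints the resulting ros2 launch command. Passing both arguments explicitly means the result does not depend on the defaults declared in audio_io.launch.py.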

Large Language Model Function Description

The default model path is /dev/shm/qwen2.5-1.5b-instruct-q5_k_m.gguf. Because /dev/shm is held in memory, the model file must be copied there before startup; this avoids a long initialization time when the model is started with the button. To change the model path, modify the llm_model_path parameter in /userdata/magicbox/app/ros_ws/src/magicbox/magicbox_qwen_llm/launch/qwen_llm.launch.py.
By default, this package runs together with audio_io, so it blocks until the audio_io node has started. To run it on its own, change the wait_for_audio setting in /userdata/magicbox/app/ros_ws/src/magicbox/magicbox_qwen_llm/launch/qwen_llm.launch.py.
- To set it at launch time instead, pass it as a launch argument: ros2 launch qwen_llm qwen_llm.launch.py wait_for_audio:=false
For more parameter details and compilation instructions, refer to the README file in the code repository.
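
On typical Linux systems /dev/shm is a RAM-backed tmpfs whose contents do not survive a reboot, so the model file has to be copied there again after each boot before qwen_llm starts. A minimal sketch of that staging step follows. stage_model is a hypothetical helper, and MODEL_SRC defaults to a made-up path under /userdata/models that you must point at wherever the .gguf file is actually stored; the file name matches the default llm_model_path given above.

```shell
#!/usr/bin/env bash
# Sketch: copy the GGUF model into RAM-backed /dev/shm before starting qwen_llm.
# MODEL_SRC is a hypothetical persistent location; point it at wherever the
# .gguf file is actually stored on your board.
MODEL_NAME="qwen2.5-1.5b-instruct-q5_k_m.gguf"
MODEL_SRC="${MODEL_SRC:-/userdata/models/${MODEL_NAME}}"  # hypothetical path
SHM_DIR="${SHM_DIR:-/dev/shm}"

stage_model() {
  local dst="${SHM_DIR}/${MODEL_NAME}"

  # /dev/shm is emptied on reboot, so this normally only skips within one boot.
  if [ -f "$dst" ]; then
    echo "already staged: $dst"
    return 0
  fi
  if [ ! -f "$MODEL_SRC" ]; then
    echo "model not found: $MODEL_SRC" >&2
    return 1
  fi

  # Copy under a temporary name, then rename, so an interrupted copy is never
  # mistaken for a complete model on the next run.
  if ! cp "$MODEL_SRC" "${dst}.tmp"; then
    rm -f "${dst}.tmp"
    return 1
  fi
  mv "${dst}.tmp" "$dst" || return 1
  echo "staged: $dst"
}
```

Run stage_model before starting qwen_llm, whether with the button or with ros2 launch qwen_llm qwen_llm.launch.py wait_for_audio:=false. If you changed llm_model_path, edit MODEL_NAME and SHM_DIR to match.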

Functional Architecture Diagram

Startup Steps

Use the following commands to start audio stream processing.

```
# Start audio stream processing
cd /userdata/magicbox
source /opt/tros/humble/setup.bash
source app/ros_ws/install/setup.bash
ros2 launch audio_io audio_io.launch.py
```

Open a new terminal and use the following commands to start the large language model.

Warning: Before starting, set the model path llm_model_path in /userdata/magicbox/app/ros_ws/src/magicbox/magicbox_qwen_llm/launch/qwen_llm.launch.py.
```
# Start the large language model
cd /userdata/magicbox
source /opt/tros/humble/setup.bash
source app/ros_ws/install/setup.bash
ros2 launch qwen_llm qwen_llm.launch.py 
```
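
Both sets of startup commands fail if the /opt/tros/humble setup script, the built workspace, or the model file is missing. The following minimal sketch checks for those files before launching. preflight_check is a hypothetical helper, and its default paths are the ones used in this section; the model path assumes llm_model_path was left at its default.

```shell
#!/usr/bin/env bash
# Sketch: verify the files the startup commands rely on before launching.
# preflight_check is a hypothetical helper; defaults are the paths from this section.
#
# Usage: preflight_check [magicbox_root] [model_path] [tros_setup]
preflight_check() {
  local root="${1:-/userdata/magicbox}"
  local model="${2:-/dev/shm/qwen2.5-1.5b-instruct-q5_k_m.gguf}"
  local tros_setup="${3:-/opt/tros/humble/setup.bash}"
  local missing=0
  local f

  for f in "$tros_setup" "$root/app/ros_ws/install/setup.bash" "$model"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=$((missing + 1))
    fi
  done

  # Status is the number of missing files; 0 means everything is in place.
  return "$missing"
}
```

Run preflight_check in the same terminal before the cd, source, and ros2 launch commands. Each missing file is reported on stderr, and app/ros_ws/install/setup.bash only exists after the workspace has been built as described in the repository README.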
