
Frontiers | Adaptive multi-mode locomotion for bipedal wheel-legged robots via sparse mixture-of-experts deep reinforcement learning
BRIEF RESEARCH REPORT article
Front. Robot. AI
Sec. Robot Learning and Evolution
- Beijing Institute of Technology, Beijing, China
The bipedal wheel-legged robot combines the high energy efficiency of wheeled movement with the terrain adaptability of legged locomotion. However, achieving a smooth transition between these two heterogeneous motion modes within a unified control framework remains challenging. This study proposes a reinforcement learning control framework that integrates the Mixture of Experts (MoE) architecture. The approach follows a “divide and conquer” strategy: a dynamic gating network with a Top-K sparse activation mechanism automatically allocates different motion modes to specific expert subnetworks, effectively decoupling conflicting gradients. Simulation results demonstrate that, compared to the single-network PPO method, the MoE-enhanced algorithm exhibits significant improvements in training stability and rewards. The learned policy achieved smooth rolling on flat surfaces and transitioned to dynamic leg-lifting gaits when confronted with obstacles. Across the test terrains, it showed a markedly higher success rate than the single-network PPO method.
Keywords: bipedal wheel-legged robot, reinforcement learning, mixture of experts, gradient conflict, curriculum learning
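The gating idea summarized in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the class name `SparseMoEPolicy`, the helper `top_k_gate`, the number of experts, the value of K, the observation and action sizes, and the use of linear experts are all assumptions made for illustration. A real policy would presumably use multilayer expert networks trained with PPO. The sketch only shows how a gate can score every expert, keep the K highest-scoring ones, and blend their outputs, so that unselected experts contribute nothing to the action (and, in a trained network, would receive no gradient from that sample).

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k):
    """Keep the k largest gate logits and renormalise them with a softmax.

    Returns {expert_index: weight}; experts outside the top k get no entry.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


def linear(weights, bias, x):
    """Dense layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class SparseMoEPolicy:
    """Toy actor head (hypothetical): a linear gate routes each observation
    to k linear experts. Sizes and initialisation are illustrative only."""

    def __init__(self, obs_dim, act_dim, num_experts=4, k=2, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.k = k
        self.act_dim = act_dim
        self.gate_w = rand_matrix(num_experts, obs_dim)
        self.gate_b = [0.0] * num_experts
        self.experts = [(rand_matrix(act_dim, obs_dim), [0.0] * act_dim)
                        for _ in range(num_experts)]

    def act(self, obs):
        """Return (action, routing) for one observation vector."""
        gate_logits = linear(self.gate_w, self.gate_b, obs)
        routing = top_k_gate(gate_logits, self.k)
        action = [0.0] * self.act_dim
        # Only the selected experts are evaluated and blended.
        for idx, weight in routing.items():
            w, b = self.experts[idx]
            out = linear(w, b, obs)
            action = [a + weight * o for a, o in zip(action, out)]
        return action, routing


policy = SparseMoEPolicy(obs_dim=6, act_dim=3)
action, routing = policy.act([0.1, -0.2, 0.3, 0.0, 0.5, -0.4])
```

Here `routing` maps the two selected expert indices to weights that sum to one. Inspecting it across observations is one simple way to check whether different terrains end up assigned to different experts.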
Received: 15 Jan 2026; Accepted: 11 Feb 2026.
Copyright: © 2026 He, Zhao, Duan, Wang and Lei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Zeang Zhao
Shengyu Duan
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
