Abstract
This paper presents a novel Deep Q-Learning (DQL) framework for multi-robot navigation that addresses the critical problem of target congestion in swarm robotics systems. The framework employs a Centralized Training with Decentralized Execution (CTDE) paradigm, in which a single Deep Q-Network (DQN) agent learns a global coordination policy during offline training, and individual robots execute the learned policies during deployment using only local Radio-Frequency Identification (RFID) based observations. Scalability here refers to decentralized execution; training is conducted under fixed swarm configurations. Unlike traditional vision-based obstacle-avoidance methods, we propose an RFID-based information-processing module combined with Robot Constraint Rules (RCR) that provide explicit heuristic guidance for flow regulation, while the DQN learns to optimize their application. The Probabilistic Finite State Machine (PFSM) and RCR modules define structured safety constraints, and coordination strategies are learned within this constrained action space. The system is modeled as a Discrete-time Markov Decision Process (DTMDP), which is integrated with the PFSM to capture both reactive behaviors and learned coordination strategies. The PFSM framework formalizes human-traffic-inspired behaviors (such as following and lane-changing) that emerge within the constrained decision-making framework through reward-driven learning rather than hard-coded rules. The framework demonstrates empirically stable convergence under structured constraints. Simulation results show that our method reduces the average robot workload by up to 66% and improves system efficiency by 51% compared to baseline reactive collision-avoidance methods without congestion control, while maintaining target occupancy density below 50% and ensuring deadlock-free operation even with 100 robots.
Introduction
In recent years, swarm robotics has emerged as a transformative paradigm for accomplishing complex tasks through the coordinated operation of large numbers of simple, low-cost robots [1], [2], [3]. Unlike traditional single-robot systems that rely on sophisticated sensing, computation, and actuation capabilities concentrated in individual units, swarm robotics distributes functionality across many agents, offering significant advantages in robustness (graceful degradation under failures), flexibility (adaptation to diverse tasks), and scalability (performance improvement with additional robots) [4]. These characteristics make swarm robotics particularly attractive for applications in warehouse automation [5], search and rescue operations [6], environmental monitoring [7], and precision agriculture, where tasks can be naturally decomposed and parallelized across multiple agents.
However, as swarm size grows, target congestion, where multiple robots simultaneously converge on shared goal locations, becomes a critical bottleneck that can degrade throughput by up to 70% and induce system-wide deadlock [8], [9], [10]. This problem is distinct from general collision avoidance: it is spatially concentrated at predictable locations, temporally dynamic with stochastic arrival patterns, and exhibits non-linear degradation with sharp phase transitions beyond critical density thresholds. Existing approaches (centralized planning, reactive avoidance, spatial partitioning, and learning-based methods) each address only a subset of these characteristics, leaving a significant gap in scalable, proactive congestion prevention.
This paper bridges that gap by presenting a Deep Q-Learning (DQL) framework that integrates Radio-Frequency Identification (RFID) based infrastructure with adaptive reinforcement learning to regulate robot flow into congested regions before congestion occurs. Our main contributions are:
1. A hybrid Centralized Training and Decentralized Execution (CTDE)–RFID architecture enabling scalable deployment without centralized control;
2. A Probabilistic Finite State Machine (PFSM)-constrained Discrete-time Markov Decision Process (DTMDP) formulation combining interpretable finite-state behaviors with learned coordination;
3. Flow-regulated, congestion-aware reward shaping through Robot Constraint Rules (RCR) that enforce safety while permitting adaptive optimization;
4. Comprehensive empirical validation against Multi-Agent Reinforcement Learning (MARL) baselines (QMIX, Counterfactual Multi-Agent Policy Gradient (COMA), and Independent Deep Q-Network (I-DQN)), demonstrating 87.6% efficiency at 100 robots with deadlock-free operation.
The resulting framework is theoretically grounded and empirically validated, rather than provably optimal.
Our proposed framework is inspired by human traffic management systems, such as traffic lights, flow metering, and lane control, as well as transportation theory, including queuing models and flow dynamics. These concepts inform our method for regulating robot entry into congested regions while maintaining smooth bidirectional flows. Specifically, we address key limitations in existing approaches as follows. (i) Flow regulation is implemented through an RFID-based entry and exit control module, which manages robot admission based on real-time density measurements rather than relying on post-hoc collision avoidance. (ii) Predictive coordination is achieved using a DQN that learns anticipatory policies, such as proactively slowing before arrival when target occupancy approaches capacity, by optimizing cumulative discounted rewards over a planning horizon. (iii) Entry and exit balance is maintained through zone geometry design, which features separated entry and exit sectors and distributed exit angles. (iv) Scalability is ensured by the CTDE architecture: training utilizes the global state, while execution is fully decentralized with constant per-robot computation and linear communication complexity. (v) Automatic parameter tuning replaces manual threshold selection, as the DQN jointly optimizes admission thresholds, detour triggers, and speed-adjustment parameters through reward maximization. (vi) Theoretical grounding is established through formal DTMDP convergence analysis under tabular assumptions (Theorem 1, Remark 1), with explicit acknowledgment of the gap to practical DQN and supporting empirical convergence evidence.
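The entry/exit regulation described in (i) can be pictured as a simple density gate at the zone boundary. The sketch below is illustrative only: the `ZoneState` name and the fixed threshold are our assumptions, whereas in the actual framework the admission threshold is learned by the DQN rather than hand-set.

```python
from dataclasses import dataclass


@dataclass
class ZoneState:
    """Occupancy snapshot, as might be reported by an RFID entry/exit module."""
    robots_inside: int
    capacity: int


def admit(zone: ZoneState, threshold: float) -> bool:
    """Admit a robot at the entry sector only while target-area density
    stays strictly below the admission threshold; otherwise hold it at the gate."""
    density = zone.robots_inside / zone.capacity
    return density < threshold
```

Consistent with the paper's reported operating point (target occupancy kept below 50%), a threshold of 0.5 would hold a robot arriving at a half-full zone until an exit frees capacity.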
The central insight of this framework is the modeling of target congestion as a DTMDP, in which system states represent robot distributions across behavioral modes (waiting, navigating, following, detouring, agitated) and target-area density. Concurrently, actions regulate admission policies and behavioral triggers via an RFID-based coordination module. By employing DQN to learn high-quality policies, the system adaptively balances competing objectives: maximizing throughput, minimizing waiting times, reducing energy consumption, and preventing deadlocks. Fig. 1 presents an overview of the complete system architecture.
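As a rough illustration of this state representation, the distribution of robots over the five behavioral modes, plus the target-area density, can be packed into a fixed-length vector. The encoding below is our sketch, not the paper's exact feature design.

```python
from collections import Counter

# The five behavioral modes named in the DTMDP formulation above.
MODES = ("waiting", "navigating", "following", "detouring", "agitated")


def encode_state(robot_modes: list[str], zone_density: float) -> list[float]:
    """DTMDP state sketch: the fraction of robots in each behavioral mode,
    concatenated with the current target-area density."""
    counts = Counter(robot_modes)
    n = max(len(robot_modes), 1)
    return [counts[m] / n for m in MODES] + [zone_density]
```

A fixed-length vector like this is what makes the representation independent of swarm size, matching the CTDE goal of constant per-robot computation.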
The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 introduces the DQL framework for multi-robot anti-congestion navigation. Section 4 presents a comprehensive simulation-based evaluation. Section 5 discusses the theoretical contributions and practical implications. Finally, Section 6 concludes the paper and outlines directions for future research.
Related work
Multi-robot congestion control occupies a central position among several interrelated research domains. This section first formally characterizes the congestion problem and differentiates it from traditional collision avoidance. It then reviews four principal paradigms addressing this challenge: centralized navigation planning, reactive collision avoidance, spatial partitioning strategies, and learning-based adaptive methods. For each paradigm, methodological strengths and inherent limitations are examined.
Proposed deep Q-learning framework for multi-robot anti-congestion navigation
This section presents our DQL framework that integrates RFID-based coordination with adaptive reinforcement learning to solve target congestion in large-scale robot swarms. The framework consists of three integrated components: (1) a rigorous DTMDP formulation that captures robot behaviors and congestion dynamics, (2) an RFID-based information process module for real-time state estimation and command broadcasting, and (3) DQN learning for high-quality policy discovery with RCRs that translate
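One way to picture how the RCRs interact with learning is action masking: the rules mark which actions are currently safe, and the epsilon-greedy DQN policy selects only among those. This is a minimal sketch under that assumption, not the paper's implementation.

```python
import random


def choose_action(q_values: list[float], allowed: list[bool], epsilon: float) -> int:
    """Epsilon-greedy selection restricted to RCR-permitted actions:
    the rules define the safe subset, and the DQN optimizes within it."""
    legal = [i for i, ok in enumerate(allowed) if ok]
    if random.random() < epsilon:
        return random.choice(legal)  # explore, but only among safe actions
    return max(legal, key=lambda i: q_values[i])  # exploit the best safe action
```

Because exploration itself is confined to the safe subset, the constraint holds during training as well as deployment, which is how structured rules and learned coordination can coexist.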
Simulation evaluation
To validate the proposed DQN-based anti-congestion navigation framework, we conducted comprehensive simulations.
DQN training analysis
Fig. 5 illustrates the complete training dynamics of our DQN-based framework over 1000 episodes. The four subplots demonstrate successful convergence across multiple metrics.
The training progression reveals four distinct learning phases. Subplot (a) shows episode rewards converging from highly negative values (exploration phase, episodes 0-200) to consistently positive rewards (stable policy phase, episodes 800-1000). The 50 episode moving average smooths short-term variance and clearly
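The 50-episode moving average mentioned above is a standard smoothing of the reward curve; a minimal version (where the window simply shrinks over the first few episodes) might look like:

```python
def moving_average(rewards: list[float], window: int = 50) -> list[float]:
    """Trailing moving average of episode rewards; early entries average
    over however many episodes are available so far."""
    out = []
    for i in range(len(rewards)):
        chunk = rewards[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```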
Conclusion
This paper presented a novel Deep Q-Learning based framework for multi-robot anti-congestion navigation that explicitly targets flow regulation at shared destinations rather than reactive collision avoidance. By integrating learning-based decision making with structured, rule-constrained coordination, the proposed approach addresses the dual challenges of scalability and safety in large-scale robot swarms.
The primary contributions of this work are fourfold. First, we introduced a CTDE
CRediT authorship contribution statement
Ning Wang: Writing – original draft, Methodology, Formal analysis, Data curation, Conceptualization. Hamid R. Parsaei: Writing – review & editing, Supervision, Conceptualization. Yali Ren: Writing – review & editing, Validation.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (49)
- K. Hornik et al., "Multilayer feedforward networks are universal approximators," Neural Netw. (1989).
- S. Mayya et al., "Voluntary retreat for decentralized interference reduction in robot swarms."
- P.G.F. Dias et al., "Swarm robotics: a perspective on the latest reviewed concepts and applications," Sensors (2021).
- M. Schranz et al., "Swarm robotic behaviors and current applications," Front. Robot. AI (2020).
- M. Dorigo et al., "Swarm robotics: past, present, and future," Proc. IEEE (2021).
- K. Azadeh et al., "Robotized and automated warehouse systems: review and recent developments," Transp. Sci. (2019).
- J.P. Queralta et al., "Collaborative mapping with IoE-based heterogeneous vehicles for enhanced situational awareness," IEEE Sens. J. (2020).
- E. Hernandez et al., "Aerial robotic systems for environmental monitoring," Sensors (2021).
- Z. Wang et al., "Adaptive control barrier functions," IEEE Trans. Automat. Control (2020).
- D. Zhou et al., "Fast, on-line collision avoidance for dynamic vehicles using buffered Voronoi cells," IEEE Robot. Autom. Lett. (2021).
- "The impact of diversity on optimal control policies for heterogeneous robot swarms," IEEE Trans. Robot. (2021).
- "Path planning for robotic mobile fulfillment systems," IEEE Trans. Autom. Sci. Eng. (2019).
- "Avoiding target congestion on the navigation of robotic swarms," Auton. Robot. (2017).
- "Planning paths for package delivery in heterogeneous multirobot teams," IEEE Trans. Autom. Sci. Eng. (2020).
- "Multi-agent pathfinding: definitions, variants, and benchmarks."
- "Congestion-aware policy synthesis for multirobot systems," IEEE Trans. Robot. (2021).
- "ALAN: adaptive learning for multi-agent navigation," Auton. Robot. (2018).
- "Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning," in: Proc.
- "PRIMAL: pathfinding via reinforcement and imitation multi-agent learning," IEEE Robot. Autom. Lett. (2019).
- "Collision avoidance for aerial vehicles in multi-agent scenarios," Auton. Robots (2015).
- "Scalable cooperative transport of cable-suspended loads with UAVs using distributed trajectory optimization," IEEE Robot. Autom. Lett. (2020).
- "Disjoint splitting for multi-agent path finding with conflict-based search," Proc. Int. Conf. Automat. Plan. (2019).
- "Robust and efficient quadrotor trajectory generation for fast autonomous flight," IEEE Robot. Autom. Lett. (2020).
- "Reactive collision avoidance of multiple moving agents by cooperation and conflict propagation."

Hamid R. Parsaei, Ph.D., P.E. is a Professor in the Wm. Michael Barnes ’64 Department of Industrial and Systems Engineering and Director of Accreditation and Assessment of the College of Engineering at Texas A&M University. He is an internationally recognized leader in engineering education, manufacturing automation, economic and financial decision-making, leadership, and sustainability, with more than four decades of academic experience. Dr. Parsaei served as Associate Dean of Academic Affairs at Texas A&M University-Qatar and as Chair of the Department of Industrial Engineering at the University of Houston. His co-authored book, Reconfigurable Manufacturing Enterprises for Industry 4.0 (with Ibrahim H. Garbie; CRC Press), received the IISE Joint Publishers Book of the Year Award in 2022. His most recent book, Sustainability 4.0: Objectives, Challenges, and Expectations (with Ibrahim H. Garbie, CRC Press), was released in November 2025. Dr. Parsaei is a Life Fellow of the Institute of Industrial and Systems Engineers (IISE), the Society of Manufacturing Engineers (SME), the American Society for Engineering Education (ASEE), and the Industrial Engineering and Operations Management Society International (IEOM). He has been a registered Professional Engineer in Texas for over 36 years.

Ning Wang received his B.S. degree in Information Management Systems from China Agricultural University (CAU), Beijing, China, in 2002. He received the M.S. degree in Software Science from Hong Kong Polytechnic University (HKPU) in 2008 and the M.S. degree in Computer Science from Texas Southern University (TSU) in 2014. He received his Ph.D. degree in Electrical Engineering from the University of Houston (UH) in 2017. He is a Senior Research Scientist at Baylor Genetics Lab. He received the COSET Distinguished Graduate Student Award from Texas Southern University in 2014. He has published over 27 peer-reviewed journal papers, 1 book, and 14 conference papers. His research interests include embedded AI, reinforcement learning and deep reinforcement learning, the Internet of Things (IoT), and edge computing. He is a member of IEEE.

Yali Ren received her master’s degree in computer science from the Georgia Institute of Technology in 2018. She earned her Ph.D. in Computer Science from National University in 2024. She was the first recipient of the Georgia Tech SnapTravel STEM Fellowship Award in Computer Science. She is currently an Instructional Associate at the School of Computer Science, College of Computing, Georgia Institute of Technology. Her research interests include data science, machine learning, artificial intelligence, and cybersecurity. She is a member of the Institute of Electrical and Electronics Engineers (IEEE).
© 2026 Published by Elsevier B.V.
