Item: Learning-based control of multi-agent systems (ΕΛΜΕΠΑ, School of Engineering (ΣΜΗΧ), Department of Electrical and Computer Engineering, 2025-03-12)
Authors: Gkoutzounis, Dimitrios; Papageorgiou, Dimitrios; Baumann, Dominik

In this thesis, we develop a set of algorithms capable of learning and controlling the unknown system dynamics of Multi-Agent Systems (MAS). Our key contribution is learning control while ensuring the safety of the MAS. In contrast to traditional Reinforcement Learning (RL) and neural-network techniques, our algorithms are sample-efficient and assume no prior model knowledge. This enables them to operate directly on the hardware, mitigating modeling and sim-to-real transfer concerns. We employ Bayesian optimization (BO) with Gaussian processes (GPs) as the surrogate model of the unknown objective. We formulate our task as a cooperative Markov Decision Process (MDP) and quantify the performance of each experiment as a scalar reward. In our setting, the agents are heterogeneous and autonomous; we therefore need to learn a separate policy for each agent. This calls for a distributed framework that scales to larger MAS while reducing the dependency on communication. Our first approach uses local rewards to identify which actions contribute to the global reward, and GPs to predict the optimal parameters for each agent. Our second approach constrains the optimization with respect to the critical aspect of safety and uses only a single communication instance to update the agent policies. The proposed approaches are evaluated in simulated and hardware experiments. Our results show that both algorithms learn the unknown objective in very few iterations, performing competitively with prior techniques.
In addition, compared to the current state-of-the-art algorithm, our approach achieves higher predicted rewards in the same number of iterations. The thesis concludes with useful insights toward safe multi-agent reinforcement learning (MARL).
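The core loop described in the abstract, Bayesian optimization with a GP surrogate over controller parameters and a scalar reward per experiment, can be sketched as follows. This is a minimal single-agent illustration, not the thesis implementation: the quadratic `reward` function, the RBF kernel hyperparameters, and the UCB acquisition rule are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    # GP posterior mean and variance at x_test, given noisy observations.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

def reward(theta):
    # Hypothetical scalar reward of one experiment with parameter theta;
    # stands in for running the controller on the (unknown) system.
    return -(theta - 0.6) ** 2

def bayes_opt(n_iter=15, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)     # candidate parameters
    x = rng.uniform(0.0, 1.0, size=2)     # two initial experiments
    y = reward(x)
    for _ in range(n_iter):
        mean, var = gp_posterior(x, y, grid)
        ucb = mean + beta * np.sqrt(var)  # upper-confidence-bound acquisition
        theta_next = grid[np.argmax(ucb)] # most promising next experiment
        x = np.append(x, theta_next)
        y = np.append(y, reward(theta_next))
    return x[np.argmax(y)]                # best parameter found so far

best = bayes_opt()
```

Because each iteration picks the next experiment from the full GP posterior rather than a gradient step, very few evaluations are needed, which is what makes the approach viable directly on hardware. A distributed multi-agent variant would maintain one such surrogate per agent, as the thesis describes.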