Multiagent reinforcement learning


Reinforcement learning in multiagent domains is very challenging: either each agent must include all other agents in its state space, which leads to the curse of dimensionality, or each agent treats the other agents as part of the environment, which then becomes dynamic and evolving from that agent's point of view. In our research we try to transform the difficulty of having multiple learning agents into an opportunity. In other words, we exploit the presence of multiple agents to increase the speed and quality of learning. More specifically, by using each other's knowledge and expertise in learning (what we call cooperation in learning), the agents reduce the number of learning trials, which is crucial in real-world learning applications. In addition, we are interested in learning individual skills and cooperation protocols in multi-robot systems. Our research themes in multiagent reinforcement learning include:


• Knowledge evaluation and combination

Cooperation in learning (also called cooperative learning) can be realized in a multiagent system if agents are capable of learning both from their own experience and from other agents' knowledge and expertise. These extra resources translate into higher efficiency and faster learning compared with individual learning. In the real world, however, implementing cooperative learning is not straightforward, in part because agents must evaluate each other's knowledge and may differ in their areas of expertise.

Three crucial questions are addressed in this research:

  • How can the overall knowledge of an agent be evaluated?

  • How can the area of expertise of an agent be extracted?

  • How can agents improve their performance in cooperative learning by knowing their areas of expertise?
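As an illustration of how evaluated knowledge might be combined, the sketch below blends teammates' Q-values for a state, weighted by a per-state expertise score. The `combine_q_values` function and the expertise tables are hypothetical constructions for illustration, not the evaluation method developed in this research.

```python
import numpy as np

def combine_q_values(q_tables, expertise, state):
    """Blend teammates' Q-values for `state`, weighted by each agent's
    (hypothetical) expertise score for that state."""
    w = np.array([e[state] for e in expertise], dtype=float)
    total = w.sum()
    if total <= 0:
        # Nobody claims expertise here: fall back to a plain average.
        w = np.full(len(q_tables), 1.0 / len(q_tables))
    else:
        w = w / total
    return sum(wi * q[state] for wi, q in zip(w, q_tables))

# Two agents, 3 states x 2 actions; agent A holds all expertise in state 1.
q_a = np.array([[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]])
q_b = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])
expertise_a = {0: 0.5, 1: 1.0, 2: 0.5}
expertise_b = {0: 0.5, 1: 0.0, 2: 0.5}
combined = combine_q_values([q_a, q_b], [expertise_a, expertise_b], state=1)
# combined equals q_a[1], since agent A has all the expertise in state 1
```

The key design point is that the weights vary per state, so an agent dominates the combination only inside its own area of expertise.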


Credit assignment in multiagent systems

Realizing multiagent reinforcement learning requires solving some basic problems, among which multiagent credit assignment is one of the most important and challenging. The reason is that, in many practical cases, the environmental critic is not intelligent enough to assess the role of each individual agent, and only a single reinforcement signal (also called the global reinforcement) is available for evaluating the team as a whole. Multiagent credit assignment cannot be solved in the general case by a single technique. Therefore, in this research, we study possible methods for distributing the global reinforcement among the members of a Q-learning team.
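A minimal sketch of this setting, assuming independent tabular Q-learners and a hypothetical `shares` rule for splitting the single global reward (a uniform split is shown below; the actual distribution methods studied in this research are not specified in this text):

```python
import numpy as np

def distribute_and_update(q_tables, states, actions, next_states,
                          global_r, shares, alpha=0.1, gamma=0.9):
    """One Q-learning step per agent: the single global reward is split
    according to `shares` (a hypothetical credit-assignment rule), and
    each agent updates its own Q-table with its share."""
    for i, q in enumerate(q_tables):
        r_i = global_r * shares[i]          # this agent's portion of the credit
        s, a, ns = states[i], actions[i], next_states[i]
        q[s, a] += alpha * (r_i + gamma * q[ns].max() - q[s, a])

# Two-agent team (2 states x 2 actions each), uniform split of reward 1.0
qs = [np.zeros((2, 2)), np.zeros((2, 2))]
distribute_and_update(qs, states=[0, 1], actions=[1, 0],
                      next_states=[1, 0], global_r=1.0, shares=[0.5, 0.5])
# each visited entry moves toward its 0.5 share: 0.1 * 0.5 = 0.05
```

The open question the research addresses is precisely how to choose `shares`; a uniform split ignores individual contributions, which is the shortcoming that motivates smarter distribution schemes.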

Robotic mind development based on multiagent paradigm

Designing an intelligent situated agent is, in general, a difficult task, as the designer must see the problem from the viewpoint of the agent, considering all of its sensors and actuators. To facilitate this process, we have focused on devising a general mathematical methodology to automate the design of the agent. In this research, we develop several methods for the automatic design of hierarchical behavior-based systems using the multiagent model, reinforcement learning, co-evolution, and a memetic algorithm. The design process is divided into two sub-problems: structure (organization) learning and behavior learning and evolution. In structure learning, the goal is to find a suitable organization of predefined behaviors in a hierarchical architecture, while in behavior learning, the aim is to find an optimal mapping from each behavior's state space to its action space. Solving these two tasks simultaneously leads to a general framework for learning in hierarchical behavior-based systems.
Our memetic algorithm is based on a combination of evolutionary methods and reinforcement learning, in which co-evolutionary methods are used for behavior learning and the culture of previous generations is used as the initial bias for structure learning.
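As a rough sketch of the structure-learning half only, the toy loop below evolves an ordering of predefined behaviors by selection and swap mutation, optionally seeding the initial population with "culture" (structures kept from earlier generations). All names and the fitness function are illustrative assumptions, not the actual memetic algorithm of this research.

```python
import random

def evolve_structure(fitness, n_behaviors, pop_size=20, generations=50,
                     culture=None, seed=0):
    """Evolve an ordering (structure) of behaviors. `culture` optionally
    seeds the population with structures from previous generations,
    acting as the initial bias described in the text."""
    rng = random.Random(seed)

    def random_structure():
        s = list(range(n_behaviors))
        rng.shuffle(s)
        return s

    seeds = list(culture or [])
    pop = seeds + [random_structure() for _ in range(pop_size - len(seeds))]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)     # rank by fitness
        parents = pop[:pop_size // 2]           # elitist selection
        children = []
        for p in parents:
            c = p[:]                            # mutate: swap two behaviors
            i, j = rng.sample(range(n_behaviors), 2)
            c[i], c[j] = c[j], c[i]
            children.append(c)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: prefer a fixed target ordering of 5 behaviors.
target = list(range(5))
best = evolve_structure(lambda s: -sum(a != b for a, b in zip(s, target)), 5)
```

In the full framework, `fitness` would come from evaluating the agent in its environment, and the behaviors themselves would be learned co-evolutionarily rather than fixed.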

