Paul D. Scott
Department of Computer Science, University of Essex, Colchester, CO4 3SQ, United Kingdom.
This volume presents a selection of the papers presented at two satellite meetings: the ECAI-96 workshop on "Learning in Distributed Artificial Intelligence Systems" and the ICMAS-96 workshop on "Learning, Interaction, and Organization in Multiagent Environments". My primary research interest is in machine learning rather than multi-agent systems, and what I wanted to find out when I read this book was what new challenges multi-agent systems presented to my own field. Is it essentially an application area? That is, can the intelligent application of established machine learning procedures solve the problems of adaptation that arise in multi-agent systems? Alternatively, do multi-agent systems present fundamentally new types of learning problems whose solution will require the development of radically new machine learning procedures?
Singh and Huhns's paper "Challenges for Machine Learning in Cooperative Information Systems" comes closest to addressing these questions directly. It begins promisingly by pointing out that, in contrast to the problems traditionally addressed by machine learning, multi-agent systems require that agents learn about entities (other agents) that have intentions and beliefs. This is certainly true, and it hints at enormous difficulties if traditional machine learning techniques are applied to such systems. Intentionality is at the root of some of the deepest schisms in the social sciences, between those who seek to apply methodologies originally developed for the non-intentional systems studied by physical scientists and those whose methods owe more to the traditional humanities. From a machine learning perspective, however, the rest of Singh and Huhns's paper was disappointing. It identifies numerous adaptation problems, within the context of agents operating on distributed databases, but has little of substance to say about whether or how they could be addressed using established machine learning techniques.
The rest of the papers in the book are grouped under three headings. The first section comprises seven papers that address the problems involved in agents learning to cooperate or compete. Many of the authors adopt a reinforcement learning approach. For traditional machine learning, the core problem of reinforcement is credit assignment. If a sequence of actions culminates in a reward, which of the actions should be reinforced? The Q-learning algorithm is an established machine learning solution that addresses the credit assignment problem by applying discount factors to delayed rewards. The obvious way of extending Q-learning to multi-agent systems is simply to treat other agents as part of the environment. Unfortunately, if you do this in a naïve way, you immediately encounter the fundamental weakness of Q-learning: it rapidly becomes intractable as the state space to be explored increases in size. Ono and Fukumoto present a solution to this by decomposing the state space in such a way that each component includes only one other agent, applying Q-learning independently to each. This is an interesting idea, but it is not clear how far it can be extended into more complex situations. In such cases, more powerful methods, involving generalisation to partition the state space, seem likely to become increasingly necessary. It is therefore particularly interesting that Ono and Fukuta's paper describes just such a system to address the state space size problem presented by continuous domains.
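The credit assignment mechanism mentioned above can be made concrete with a minimal sketch of tabular Q-learning on a toy chain world, where a single reward at the far end is propagated back along the action sequence by the discount factor. The environment and parameter values here are my own illustrative assumptions, not taken from any of the papers reviewed.

```python
import random

def q_learning_chain(n_states=5, episodes=2000, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a chain: actions 0 = left, 1 = right.

    The only reward (1.0) arrives on entering the terminal state at the
    right end; the discounted backup spreads credit to earlier actions.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # Epsilon-greedy action choice; ties broken at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Discounted backup: this line is the credit assignment step.
            target = r if s2 == n_states - 1 else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning_chain()
```

After learning, "move right" dominates in every non-terminal state, and the value of doing so falls off geometrically (by roughly a factor of gamma) with distance from the reward, which is exactly how the discount factor distributes credit for the delayed reward.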
The system of cooperative robots described in Versino and Gambardella's interesting paper has no need of Q-learning, since all rewards are effectively immediate: the element to be rewarded is an action sequence generator rather than an isolated action. They have a slightly different perspective on the credit assignment problem. In traditional machine learning the problem is "Which action should be rewarded?"; the natural extension to multi-agent systems is "Which agent should be rewarded for which action?" By contrast, Versino and Gambardella are fundamentally concerned with the question "What should the reward be?" Essentially they argue that one cannot produce team behaviour by rewarding individual success: the reward must be based on overall team performance. I was particularly intrigued by their problem domain, since each robot's behaviour is determined by only two adaptive parameters. Many other learning procedures could thus be applied to the same task.
Schmidhuber and Zhao extend reinforcement learning to meta-learning by including the basic learning procedures, which they term 'policy modification processes', in the set of actions that is subject to adaptation. In essence their method utilises a stack to permit backtracking over changes in the learning function. Changes are only preserved if they lead to an overall increase in average reward. This is an interesting idea whose utility is not confined to learning in multi-agent systems.
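The stack-based backtracking idea can be sketched in a greatly simplified form: each policy modification is pushed onto a stack together with the cumulative reward and time at which it was made, and a modification is undone whenever the average reward earned since it fails to beat the average earned before it. This is my own much-reduced illustration of the scheme, with invented names, not Schmidhuber and Zhao's actual algorithm.

```python
class BacktrackingLearner:
    """Keeps policy changes only if they raise the average reward."""

    def __init__(self, policy):
        self.policy = dict(policy)
        self.stack = []      # entries: (saved_policy, reward_then, time_then)
        self.reward = 0.0    # cumulative reward so far
        self.time = 0        # number of steps so far

    def modify(self, key, value):
        # Save a checkpoint before applying the change, so it can be undone.
        self.stack.append((dict(self.policy), self.reward, self.time))
        self.policy[key] = value

    def step(self, r):
        self.reward += r
        self.time += 1
        # Backtrack over any change whose reward rate since it was made
        # is no better than the rate that prevailed before it.
        while self.stack:
            saved, r0, t0 = self.stack[-1]
            rate_since = (self.reward - r0) / (self.time - t0)
            rate_before = r0 / t0 if t0 else 0.0
            if rate_since <= rate_before:
                self.policy = saved
                self.stack.pop()
            else:
                break

learner = BacktrackingLearner({"a": 0})
learner.modify("a", 1)
learner.step(1.0)   # reward improves: the change to a=1 survives
learner.modify("a", 2)
learner.step(0.0)   # reward rate drops: the change to a=2 is undone
```

The stack makes the "only preserve changes that increase average reward" criterion cheap to enforce: undoing a change is a single pop, and nested changes are unwound in reverse order.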
Davidsson describes a system in which the agents have two components: a reactive element that acts upon the world and an anticipatory unit which contains a world model and acts upon the reactor. The anticipator acts as a critic, modifying the reactor if the next action would lead to an undesirable state. This basic architecture is potentially very powerful and, again, its utility is not confined to multi-agent learning. It is disappointing that it is only applied to simple situations in which the selection of the appropriate modification to avoid an undesired state is, as the author says, obvious.
Bazzan takes a game theoretic approach in her paper on the evolution of coordination. Its roots lie in Maynard Smith's application of game theory to biological evolution, and in particular the notion of an evolutionarily stable strategy. (It was a popularisation of this work that made Richard Dawkins famous and ultimately led to the current enthusiasm for evolutionary psychology.) The author introduces the EVO algorithm, which enables a set of agents to converge on a similar equilibrium, and demonstrates its efficacy by applying it to the coordination of sequential sets of traffic lights.
Learning plays a relatively minor role in the system for searching 3D spaces described by Ye and Tsotsos. This takes the form of direct communication between agents of a limited amount of information about the search results. This paper has much more to say to those interested in distributed search strategies than to machine learning researchers.
With the slight exception of the article by Ye and Tsotsos, all the papers in the first section of the book are devoted to systems that learn to perform well in a situation containing other agents. They do not attempt to learn directly either about or from other agents. The next section comprises five papers that each involve some aspect of the problems involved in building systems that do.
Nadella and Sen have chosen to work in the increasingly popular domain of simulated robot soccer. Their paper identifies a number of places in which learning can play an essential role in enabling agents to cooperate with members of their own team in competing successfully against their opponents. They then describe three very simple learning mechanisms for improving passing, estimating how likely an opponent is to fend off a tackle, and determining the maximum effective range for shots at goal. These procedures are deliberately simple, and hence computationally cheap, because the authors use them with small amounts of data during the course of an actual game.
Dragoni and Giorgini are concerned with the problems involved in learning from other agents: in particular those that arise when several agents must pool evidence to reach conclusions. This means that each agent, in making a decision, must integrate evidence from other sources with its own observations. This in turn requires that it should be able to estimate the reliability of each such source. Dragoni and Giorgini's approach is eclectic: they use Bayesian methods to assess source reliability but rely on aspects of Dempster-Shafer Theory for integrating evidence.
Two of the papers in this section are concerned with the application of learning to contract net processes for task allocation. In such systems, agents send in bids for a task to be allocated by a managing agent. Terabe, Washio, Katai and Sawaragi contrast the contract net method with a procedure in which the manager uses its own knowledge of the other agents to select the most appropriate agent. The advantage of the latter is a big reduction in communication costs. Terabe et al.'s system uses the contract net process only if the managing agent lacks the necessary knowledge. As the system runs, the manager acquires more experience of each agent's capabilities. A simple exponential lag learning mechanism is used to provide increasingly reliable estimates of the time each agent is likely to take to perform each type of task. Ultimately the vast majority of tasks are allocated using this knowledge rather than through contract net processing.

Lenzmann and Wachsmuth describe a more complex environment in which the set of agents are used to provide a user interface that adapts to match user preferences. In this case the competing agents provide alternative functionalities. Once again, the heart of the selection procedure is a contract net. User feedback provides information that is used to build an implicit rather than explicit user model.
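The kind of exponential lag estimator that Terabe et al.'s manager relies on can be sketched as an exponentially weighted moving average of each agent's observed completion time per task type, with contract-net bidding as the fallback when the manager lacks knowledge. The class, method and parameter names here are my own illustrative assumptions.

```python
class CompletionTimeEstimator:
    """Exponentially weighted moving-average estimates of task times."""

    def __init__(self, rate=0.3):
        self.rate = rate       # weight given to the newest observation
        self.estimates = {}    # (agent, task_type) -> estimated time

    def observe(self, agent, task_type, elapsed):
        key = (agent, task_type)
        old = self.estimates.get(key)
        # First observation initialises the estimate; later observations
        # are blended in with weight `rate`, so old evidence decays
        # exponentially.
        self.estimates[key] = (elapsed if old is None
                               else old + self.rate * (elapsed - old))

    def best_agent(self, agents, task_type):
        # Return the agent with the lowest estimated time, or None if no
        # candidate is known for this task type (the manager would then
        # fall back to contract-net bidding).
        known = [(self.estimates[(a, task_type)], a)
                 for a in agents if (a, task_type) in self.estimates]
        return min(known)[1] if known else None

est = CompletionTimeEstimator(rate=0.3)
est.observe("A", "sort", 10.0)
est.observe("B", "sort", 4.0)
est.best_agent(["A", "B"], "sort")   # "B": the faster known agent
est.best_agent(["A"], "move")        # None: fall back to bidding
```

As observations accumulate, `best_agent` succeeds for more and more task types, so the share of allocations needing a broadcast of bids shrinks, which is precisely the communication saving the paper reports.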
Plaza, Arcos and Martin demonstrate how case-based reasoning and learning can be extended to systems of cooperating agents. They compare two alternative modes of cooperation: 'distributed', in which an agent sends a problem to another for solution, and 'collective', in which the agent sends both a problem and a method of solution. The authors discuss how these may be implemented using the Plural Noos representation language and applied to the selection of protein purification techniques. I was disappointed that no results were presented in the paper.
The final section of the book comprises four papers grouped under the heading "Learning, Communication and Understanding". Davies and Edwards address the problem that arises when several agents need to share the information they have each individually acquired through inductive learning. Since each agent will have made inductive inferences from different experiences, their conclusions may be mutually inconsistent. How may they be reconciled? The authors' proposed solution is an ingenious application of Mitchell's classic version space formulation of learning tasks. They suggest that in addition to sharing inductive conclusions, agents should share the sets of maximally specific and maximally general hypotheses consistent with the training data. These bounds can then be used to produce hypotheses consistent with the experiences of all the agents. Unfortunately, as the authors are aware, this approach shares all the disadvantages of Mitchell's version space algorithm: high space and time complexity and sensitivity to noise.
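The bound-sharing idea can be illustrated with the simplest possible version space, where a hypothesis is an interval [lo, hi] over one numeric feature: the specific bound S is the tightest interval covering an agent's positive examples, the general bound G is the widest interval excluding its negatives, and agents reconcile by taking the hull of their S bounds and the intersection of their G bounds. This representation and these function names are my own illustrative choices, not Davies and Edwards's; it also assumes each agent's own data is noise-free (no negative inside its S).

```python
def bounds(positives, negatives):
    """Return (S, G) for a 1-D interval version space.

    S is the tightest closed interval covering the positives; G is the
    widest open interval containing S but excluding every negative.
    """
    s = (min(positives), max(positives))
    left = max([n for n in negatives if n < s[0]], default=float("-inf"))
    right = min([n for n in negatives if n > s[1]], default=float("inf"))
    return s, (left, right)

def combine(bounds_list):
    """Merge several agents' (S, G) pairs.

    The combined S is the hull of the individual S bounds (it must cover
    everyone's positives); the combined G is the intersection of the G
    bounds (it must exclude everyone's negatives). The agents' data is
    mutually consistent only if the combined G still strictly contains
    the combined S.
    """
    s = (min(b[0][0] for b in bounds_list), max(b[0][1] for b in bounds_list))
    g = (max(b[1][0] for b in bounds_list), min(b[1][1] for b in bounds_list))
    consistent = g[0] < s[0] and s[1] < g[1]
    return s, g, consistent

# Two agents with different experiences of the same target concept:
b1 = bounds(positives=[3, 4], negatives=[0, 9])   # S=(3,4), G=(0,9)
b2 = bounds(positives=[5, 6], negatives=[1, 8])   # S=(5,6), G=(1,8)
combine([b1, b2])   # combined S=(3,6), G=(1,8), consistent
```

The merged pair delimits every hypothesis consistent with all agents' experiences, exactly as sharing raw conclusions could not. The cost is that in realistic hypothesis languages the G set is not a single interval but a potentially huge frontier, which is the complexity problem the review notes.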
Ohko, Hiraki and Anzai present another learning system in the context of a contract net protocol. Like Terabe et al. (see above), their concern is to reduce the heavy communication load that arises if tasks must be broadcast for bids to all agents, and again like Terabe et al. they achieve this by enabling the task manager to learn to select the right agent for the task. However, where Terabe et al.'s system simply accumulated moving averages of the time taken to perform types of task, Ohko et al.'s system uses case-based reasoning to identify the most suitable agent.
I found the paper by Friedrich, Kaiser, Rogalla and Dillmann disappointing. They embark on a discussion of why communication is necessary in multi-agent systems, why it is difficult and what role learning can play. But, just at the point where a reader might reasonably expect all this to be brought together, in a proposed design if not an implementation, the paper ends. Subsequent work by this group might be more rewarding reading.
In the final paper of the book, Lacey, Nakata and Lee argue the case for explicit consideration of the epistemological foundations upon which agents are based. They begin with a review of the major schools of thought on both truth and justification, arguing that the choice made between these can have a major effect on the architecture an agent requires to acquire and process knowledge. An agent based on the correspondence theory of truth would modify its beliefs to maximise the accuracy of its predictions about the external world, while one founded on the coherence theory would maximise internal consistency. Inevitably the subsequent implementation involving agents of both types is very simple. It would have been interesting to see these arguments applied to the analysis of existing systems, particularly if examples could be found where correspondence is subordinate to coherence.
So what about the questions with which I embarked on reading this collection of papers? Can any conclusions about the relationship of machine learning and multi-agent systems be drawn? First, I was struck by the number of systems described which had achieved major improvements using extremely simple learning procedures. This suggests that there is a lot of scope for doing interesting things with learning in multi-agent systems without requiring that machine learning researchers come up with anything new. Secondly, I was struck by the absence of systems that built explicit models of other agents which could be used to predict their behavioural choices. This is, in general terms, a very difficult problem. The theoretical limits have long been known, since the problem is analogous to that of inducing the grammar of a language from a set of strings. So perhaps researchers are wise to be wary of it. On the other hand, there is enormous potential for heuristic methods that could build approximate but useful models.
© Copyright Journal of Artificial Societies and Social Simulation, 2000