Brian Sallans, Alexander Pfister, Alexandros Karatzoglou and Georg Dorffner (2003)
Simulation and Validation of an Integrated Markets Model
Journal of Artificial Societies and Social Simulation
vol. 6, no. 4
To cite articles published in the Journal of Artificial Societies and Social Simulation, please reference the above information and include paragraph numbers if necessary
The behavior of boundedly rational agents in two interacting markets is investigated. A discrete-time model of coupled financial and consumer markets is described. The integrated model consists of heterogeneous consumers, financial traders, and production firms. The production firms operate in the consumer market, and offer their shares to be traded on the financial market. The model is validated by comparing its output to known empirical properties of real markets. In order to better explore the influence of model parameters on behavior, a novel Markov chain Monte Carlo method is introduced. This method allows for the efficient exploration of large parameter spaces, in order to find which parameter regimes lead to reproduction of empirical phenomena. It is shown that the integrated markets model can reproduce a number of empirical "stylized facts'', including learning-by-doing effects, fundamental price effects, low autocorrelations, volatility clustering, high kurtosis, and volatility-volume correlations.
Agent-based Economics; Artificial Consumer Market; Artificial Stock Market; Bounded Rationality; Reinforcement Learning
The study of economic phenomena involves not just the domain
of economics, but also dynamical systems theory, game theory,
the theory of adaptive learning systems, psychology and many
others. Beginning with the seminal work of Herbert Simon (Simon, 1982), there
has been a realization that classical economic theory, based on rational
equilibria, is limited. In reality, economic agents are bounded both
in their knowledge and in their computational abilities. Recent work in
simulation-based computational economics has sought to implement
boundedly rational economic actors as learning agents, and to study the
implications for the resulting economic systems. See (Tesfatsion, 2002) for
a review of agent-based computational economics.
Our goal is to study a discrete-time agent-based economic model
which incorporates three types of boundedly rational agents: Production
firms, Consumers, and Financial traders. These three agents operate
in two coupled markets: a consumer market and a financial
equities market. In the consumer market, production firms
offer goods for sale, and customers purchase the good.
The financial equities market consists of stock traders who can buy
and sell shares in the production firms. The two markets are coupled
through the production firms, which try to increase shareholder value.
They might do this by increasing profits, or by taking actions which
directly boost their stock price. Each firm explicitly implements
a boundedly-rational agent which learns from experience, and
has limited knowledge and computational power.
Models of consumers (Baier and Mazanec, 1999), financial traders
(Steiglitz et al., 1995,Arthur et al., 1997b,LeBaron et al., 1999,Gaunersdorfer, 2000), and production firms
(Natter et al., 2001) have been studied previously. Usually, the
focus is on a single type of actor (firm, consumer or trader).
The other actors are typically modeled as exogenous inputs, or
simple random processes. We focus here on the integration of the
two markets including explicit models of all three actors. Specifically,
we build on the work of (Steiglitz et al., 1995), (Arthur et al., 1997a),
(Brock and Hommes, 1998), (Gaunersdorfer, 2000), and (Dangl et al., 2001) in financial market
modeling; (Baier and Mazanec, 1999) in consumer modeling; and (Natter et al., 2001)
in production firm modeling. Our approach is to simplify and
integrate these previous models, while still retaining their
empirical behavior. In addition, the integrated model should address
new phenomena that cannot be investigated in separate models.
Our ultimate goal is to investigate the mutual influence of the two
markets. In particular, because the firms learn based on feedback from
the financial market, we can examine the influence of the financial
market on production firm behavior. Given rational expectations of firms
and financial traders, one might expect that it does not matter whether a
firm bases its actions on its own estimate of future performance, or on its
stock price (the shareholders' estimate of future performance). This would
be the case if firms and traders were fully rational, since both
would have the same estimate of the value of a firm and its actions.
However, when both are only boundedly rational, their estimates might
disagree. In this case the financial market could have a positive
or negative influence on firm performance. The type and degree of influence
will depend on how firms and stock traders estimate future performance, and
how managers of the firm are compensated.
Before we can use the model to investigate inter-market effects,
we have to satisfy ourselves that it behaves in a reasonable way.
We validate our computational model by comparing its output to known
"stylized facts'' in consumer and financial markets. Stylized
facts are robust empirical effects that have been identified in a number
of examples of real markets. Successful reproduction of empirical
phenomena suggests that the dynamical properties of the model are similar
to those of the real markets that it tries to emulate. We can then
use the model to better understand the underlying dynamics and causes
behind the observed effects. For example, by looking at what model
parameter settings encourage particular behavior, we can get some
insight into the underlying mechanism which causes it.
This article introduces a novel validation technique based on
Markov chain Monte Carlo (MCMC) sampling. By using the technique, we
can investigate how model parameters influence model behavior, even for
large parameter spaces. We can explicitly investigate how different
model parameters are correlated, and under what conditions the model
reproduces empirical "stylized facts''. This new validation and
exploration technique is widely applicable to agent-based simulation
models, and is an important contribution of this paper.
The goal of this article is to introduce the integrated markets
model, and present validation results suggesting that it is a good
combined model of the two markets. After describing the model in
detail, we introduce the new model exploration and validation technique
based on MCMC sampling. Finally, we describe a number of stylized facts,
and show simulation results from the integrated markets model. Using
MCMC exploration, we show that the dynamics of competition in the consumer
market are an important part of the overall dynamics in the financial market.
Similarly, the dynamics of the financial market have an impact on the learning
abilities of firms in the consumer market.
The model consists of two markets: a consumer market and a financial equities
market. The consumer market simulates the manufacture of a product
by production firms, and the purchase of the product by consumers.
The financial market simulates trading of shares. The shares are traded
by financial traders. The two markets are coupled: The financial
traders buy and sell shares in the production firms, and the managers of
firms may be concerned with their share price. The traders can use the
performance of a firm in the consumer market in order to make trading
decisions. Similarly, the production firms can potentially use positioning
in product space and pricing to influence the decisions of financial
traders (see figure 1).
Figure 1. The Integrated Markets Model. Consumers purchase products, and financial traders trade shares. Production firms link the consumer and financial markets by selling products to consumers and offering their shares in the financial market.
The simulator runs in discrete time steps. Simulation steps consist
of the following operations:
- Consumers make purchase decisions.
- Firms receive an income based on their sales and their position in product space.
- Financial traders make buy/hold/sell decisions. Share prices are set and the market is cleared.
- At regular intervals, production firms update their products or pricing policies based on performance in previous iterations.

We describe the details of the markets, and how they interact,
in the following sections.
The Consumer Market
- The consumer market consists of firms which manufacture products, and consumers
who purchase them. The model is meant to simulate production and purchase of
non-durable goods, which the consumers will re-purchase at regular intervals.
The product space is represented as a two-dimensional simplex, with product
features represented as real numbers in the range [0,1]. Each firm
manufactures a single product, represented by a point in this
two-dimensional space. Consumers have fixed preferences about what kind of
product they would like to purchase. Consumer preferences are also represented
in the two-dimensional product feature space. There is no distinction between
product features and consumer perceptions of those
features (see figure 2).
Figure 2. The two-dimensional product space. Consumers have fixed product preferences (denoted by "*''). Firms can position their products (denoted by ".'') in the feature space.
- The production firms are adaptive learning agents. They adapt to consumer
preferences and changing market conditions via a reinforcement learning
algorithm (Sutton and Barto, 1998). At regular intervals, the firms
examine market conditions and their own performance over the preceding
iterations, and then modify their product or pricing.
A boundedly rational agent can be subject to several kinds of limitations.
We focus here on limits on knowledge, and representational and computational
power. How these limitations are implemented is detailed below.
The firms do not have complete information about the environment in
which they operate. In particular, they do not have direct access to consumer
preferences. They must infer what the consumers want by observing what
they purchase. Purchase information is summarized by performing "k-means''
clustering on consumer purchases. The number of cluster centers is fixed at
the start of the simulation. The current information about the environment
consists of the positions of the cluster centers in feature space, along with some
additional information. The information is encoded in a bit-vector
of "features''. The features are summarized in Table 1.
Table 1. Features Available to Production Firms.

- 1 if assets increased in the previous iteration, 0 otherwise
- 1 if share price increased in the previous iteration, 0 otherwise
- 1 if product price is greater than the mean price of competitors, 0 otherwise
- Cluster Center 1: a bit-vector that encodes the position of cluster center 1
- ...
- Cluster Center N: a bit-vector that encodes the position of cluster center N
The assets are equal to the initial endowment plus the accumulated
profits to date. The cluster centers are encoded as binary vectors.
Each cluster center can be described as a pair of numbers, and two
corresponding binary vectors are generated by "binning'': each axis is
divided into bins (for all of our experiments, 10 bins per axis were used).
The bit representing the bin occupied by each coordinate is set to 1,
and all other bits are 0 (see figure 3).

Figure 3. Computing the bit-vector representation of a cluster center.
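The binning encoding can be sketched as follows, assuming 10 bins per axis as in the experiments (the function names are ours):

```python
def bin_index(value, n_bins=10):
    """Map a coordinate in [0, 1] to its bin index."""
    # Clamp so that a coordinate of exactly 1.0 falls into the last bin.
    return min(int(value * n_bins), n_bins - 1)

def encode_center(center, n_bins=10):
    """Encode a 2-D cluster center as two concatenated one-hot bit vectors,
    one per axis, as described in the text."""
    bits = []
    for coord in center:
        vec = [0] * n_bins
        vec[bin_index(coord, n_bins)] = 1
        bits.extend(vec)
    return bits
```

With two axes and 10 bins each, the resulting state feature for one cluster center is a 20-bit vector with exactly two bits set.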
This information gives a summary of the environment at the
current time step. Firms make decisions based on the current "state'',
which is a finite history of bit vectors. In other words, firms
make decisions based on a limited memory of fixed length. This limited
history window represents an additional explicit limit on the firm's knowledge.
In each iteration the firms can take one of several actions.
The actions are summarized in Table 2.
Table 2. Actions Available to Production Firms.

- Random: take a random action from one of the actions below, drawn from a uniform distribution
- Do Nothing: take no action in this iteration
- Increase Price: increase the product price by 1
- Decrease Price: decrease the product price by 1
- Move -Y: move product in the negative Y direction
- Move +Y: move product in the positive Y direction
- Move -X: move product in the negative X direction
- Move +X: move product in the positive X direction
- Move Towards Center 1: move the product features towards cluster center 1
- ...
- Move Towards Center N: move the product features towards cluster center N
The "Do Nothing'' and "Increase/Decrease price'' actions
are self-explanatory. The "random'' action is designed to
allow the firm to explicitly try "risky'' behavior. The
"Move product'' actions move the features of the product produced by
the firm a small distance in a direction along the chosen axis or towards or
away from the chosen cluster center. For example, if the action selected by
firm $i$ is "Move Towards Center j'', then the product is modified as

$x_{i,k} \leftarrow x_{i,k} + \delta\,(c_{j,k} - x_{i,k})$

where $k$ enumerates product features, $x_{i,k}$ and $c_{j,k}$ are product
feature $k$ of firm $i$ and feature $k$ of cluster center $j$ respectively,
and $\delta$ is a small fixed constant.
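The "Move Towards Center'' update can be sketched as (the step size `delta` is an illustrative value):

```python
def move_towards_center(product, center, delta=0.05):
    """Shift each product feature a small step of size delta towards the
    corresponding coordinate of the chosen cluster center."""
    return [x + delta * (c - x) for x, c in zip(product, center)]
```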
- A firm's manager seeks to modify its behavior so as to maximize an external
reward signal. The reward signal takes the form of a fixed reward, a variable
amount based on the firm's profitability, and a variable amount due to change
in the value of the firm's stock. The reward received by firm $i$ at time $t$
is given by:

$r_{i,t} = c + w_{\pi}\,\pi_{i,t} + w_{s}\,(s_{i,t} - s_{i,t-1})$

where $c$ is the fixed reward, $\pi_{i,t}$ denotes the profits of firm $i$
at time $t$, and $s_{i,t}$ denotes the share price of firm $i$ at time $t$.
For all of our experiments, $c$ was set to zero for simplicity.
The weights $w_{\pi}$ and $w_{s}$ sum to unity.
They are fixed at the beginning of the simulation and held constant throughout.
They trade off the relative importance of profits and stock price in a firm's
decision-making process. The constant reward signal can be interpreted
as a fixed salary paid to the manager of the firm. The profit-based reward can be
interpreted as a performance-based bonus given to the manager of the firm, and
the stock-based reward as a stock grant or stock option.1
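A minimal sketch of the reward signal, assuming an additive combination of the three components described above (the argument names are ours):

```python
def manager_reward(fixed_salary, profit, price_change, w_profit, w_stock):
    """Manager's reward: fixed salary plus a weighted mix of current profit
    and the change in the firm's share price. The two weights trade off
    profit against stock price and sum to one."""
    assert abs(w_profit + w_stock - 1.0) < 1e-9, "weights must sum to unity"
    return fixed_salary + w_profit * profit + w_stock * price_change
```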
- Given the state history of the simulator for the previous time steps,
a production firm makes strategic decisions based on its utility function.
Utility functions (analogous to "cost-to-go'' functions in the control theory
literature, and value functions in reinforcement learning) are a basic
component of economics, reinforcement learning and optimal control
theory (Bertsekas and Tsitsiklis, 1996, Sutton and Barto, 1998). Given the "reward signal'' or payoff at
each time step, the learning agent attempts to act so as to maximize the total
expected discounted reward, called the expected discounted return, received over the
course of the task:2

$R = E_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\right]$

where $E_{\pi}$ denotes taking expectations with respect to the distribution
induced by $\pi$, the policy of the firm. The policy is a mapping
from states to distributions over actions: its domain is the set of
possible state histories, and its range is the set of probability
distributions over the actions available to a firm. Note that the discount factor $\gamma$
encodes how "impatient'' the firm is to receive reward. It dictates how much
future rewards are devalued by the agent. If desired, the discount factor can
be set to the rate of inflation or the interest rate in economic simulations,
such that the loss of interest on deferred earnings is taken into account by the
firm's manager. In our simulations, the discount factor was found using
the Markov chain Monte Carlo validation technique (see section 5).
Given the above definitions, the action-value function
(or Q-function (Watkins, 1989, Watkins and Dayan, 1992)) is defined as the
expected discounted return conditioned on the current state and action:

$Q^{\pi}(s, a) = E_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0} = s,\, a_{0} = a \,\right]$

where $s$ and $a$ denote the current state information
and action respectively (see tables 1
and 2). The action-value function tells the firm how much
total discounted reward it should expect to receive, starting now, if it
executes action $a$ in the current state $s$, and then follows policy
$\pi$. In other words, it is the firm's expected discounted utility (under
policy $\pi$) conditioned on the current state and the next action. Note
that this is not a myopic measure of reward. This utility function takes
into account all future (discounted) rewards.
The coding scheme used for world states makes the overall state space
quite large. However, in practice, the number of world states observed
during a typical simulation is not very large. We can therefore represent
the action-value function as a table indexed by unique state histories of
fixed length and by actions.
- Reinforcement learning provides a way to estimate the action-value
function from experience. Based on observations of states, actions and
rewards, the learner can build up an estimate of the long term consequences
of its actions.
By definition, the action-value function at time $t$ can be related to
the action-value function at time $t+1$:

$Q^{\pi}(s_t, a_t) = E_{\pi}\!\left[\, \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\right]
= E_{\pi}\!\left[\, r_t + \gamma \sum_{k=0}^{\infty} \gamma^{k} r_{t+1+k} \,\right]
= E_{\pi}\!\left[\, r_t + \gamma\, Q^{\pi}(s_{t+1}, a_{t+1}) \,\right]$

The first line is just the definition of the action-value function. The
second line simply unrolls the infinite series one step, and the
third line explicitly replaces the second term by the expected action-value
function, again by the definition of expectations under policy $\pi$. The last
line is called the Bellman equation (Bellman, 1957).
It is a set of self-consistency equations (one for each state-action pair)
relating the utility at time $t$ to the utility at time $t+1$.
One way to compute the action-value function is to solve the Bellman
equations. We use a reinforcement learning technique called
SARSA (Rummery and Niranjan, 1994,Sutton, 1996). SARSA can be viewed as using a Monte Carlo
estimate of the expectations in Eq.(7) in order to iteratively solve
the Bellman equations. At time $t$, the estimate of the action-value function
is updated by:

$Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left[\, r_t + \gamma\, Q(s_{t+1}, a_{t+1}) \,\right]$

where $\alpha$ is a small learning rate. For all of our experiments,
$\alpha$ was set to 0.1. In words, the utility function
is updated as a linear mixture of two parts. The first part is the
previous estimate. The second part is a Monte Carlo estimate of future
discounted return. It includes a sample of the reward at time $t$ (instead
of the expected value of the reward), and the utility for a sampled state and
action at time $t+1$ (instead of the expected utility based on the policy and
state transition probability). Finally, the current estimate of the
action-value function is used, in place of the true action-value function.
Intuitively this learning rule minimizes the squared error between the action-value
function and a bootstrap estimate based on the current reward and the future discounted
return, as estimated by the action-value function. Theoretically, this technique
has been closely linked to stochastic dynamic programming and Monte Carlo
approximation techniques (Bertsekas and Tsitsiklis, 1996,Sutton and Barto, 1998).
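The SARSA update rule can be sketched as a tabular implementation with hashable state/action keys (the default parameter values are illustrative; the paper fixes the learning rate at 0.1 and finds the discount factor by MCMC):

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One SARSA step: move Q(s, a) towards the bootstrap target
    r + gamma * Q(s', a'), where a' is the action actually sampled from
    the current policy in state s'."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]
```

A `defaultdict` makes unseen state-action pairs start at zero, matching the table-based representation described above.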
After each action-value function update, a new policy is constructed
from the updated action-value function estimate:

$\pi(a \mid s) \propto \exp\!\left( Q(s, a) / T \right)$

This policy selects actions under a Boltzmann distribution, with
better actions selected more frequently. The "temperature'' $T$
can be reduced over the course of the simulation. At the beginning
of the simulation, a large temperature makes the policy quite "flat''
and encourages exploration. Later, at a low temperature, optimal actions
are most often selected.
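The Boltzmann action-selection rule can be sketched as (function and variable names are ours):

```python
import math

def boltzmann_policy(q_values, temperature):
    """Probability of each action under a Boltzmann (softmax) distribution
    over action values. A high temperature gives a nearly flat policy
    (exploration); a low temperature concentrates on the best actions."""
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(q_values.values())
    exps = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}
```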
In theory, using the SARSA algorithm, the action-value
function estimate will converge to the optimal action-value
function. However, convergence relies on the learner operating in a
stationary stochastic Markov environment (Sutton and Barto, 1998). When
there is more than one adaptive firm in the environment, the stationarity
assumption is violated. Nevertheless, it has been shown that reinforcement
learning can be used to approximately solve competitive games
(Sandholm and Crites, 1995,Tesauro, 1999,Mundhe and Sen, 2000). As noted previously, the Markov
assumption is also violated, since the state
vector of the firm does not include all information necessary for solving
the task. For example, explicit consumer preferences and exact product
positions are not known by the firm. Limited-memory reinforcement
learning algorithms have been used previously to approximately solve
partially observable problems (Jaakkola et al., 1995). There exist
learning algorithms which explicitly take into account non-stationarity
and partial observability (Littman, 1994,Wellman and Hu, 1998,Hu and Wellman, 2003). While the
firms could use one of these algorithms, it would complicate matters,
perhaps needlessly. It is possible that changing the learning
algorithm would influence the results of our simulations. However,
all such algorithms have common features: they start with
little knowledge of the environment; they require initial
exploration followed by exploitation; and they attempt to improve
their performance over the course of the task. We choose to use
a simple algorithm initially, and leave investigation of the
game-theoretic algorithms for future work.
- Consumers are defined by their product preference. Each consumer agent
is initialized with a random preference in product feature space. During
each iteration of the simulation, a consumer must make a product purchase
decision. For each available product, the consumer computes a measure
of "dissatisfaction'' with the product. Dissatisfaction is a function
of product price and the distance between the product and the consumer's
preferred product. Consumer $i$'s dissatisfaction with product $j$ combines
the price of product $j$ with a term measuring the mismatch in features;
a tradeoff parameter, held fixed in all of our experiments, balances the
importance of product features against price. The feature-mismatch term is the
weighted distance in feature space between the ideal product of customer $i$
and product $j$:
Here bold-faced letters denote the feature-vector representations of products and
preferences. The diagonal matrix
is common to all consumers and models
the relative importance of features in the feature space.
The denominator in Eq.(11) normalizes the distance given the
axis weightings . In all of our simulations, the matrix was set
to the identity matrix, and the denominator was therefore unity,
resulting in a Euclidean distance metric.
Every consumer is also initialized with a different "ceiling'' dissatisfaction,
held fixed over the simulation. If all product dissatisfactions are above its
ceiling, a consumer will simply make no purchase in that iteration. Given
dissatisfaction ratings for all products, and given the set of products with
dissatisfactions below the ceiling in iteration $t$, consumer $i$ selects
from this set the product with the lowest dissatisfaction.
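The purchase decision can be sketched as follows; the additive form of the dissatisfaction measure and the tradeoff value `gamma` are our assumptions (the text specifies only that dissatisfaction mixes price and feature distance):

```python
import math

def dissatisfaction(pref, product, price, gamma=0.5):
    """Illustrative dissatisfaction: a gamma-weighted mix of the Euclidean
    feature distance (identity weight matrix, as in the simulations)
    and the product price. The additive form is our assumption."""
    return gamma * math.dist(pref, product) + (1.0 - gamma) * price

def choose_product(pref, products, prices, ceiling, gamma=0.5):
    """Return the index of the least-dissatisfying product whose
    dissatisfaction is below the consumer's ceiling, else None."""
    scores = [dissatisfaction(pref, x, p, gamma) for x, p in zip(products, prices)]
    best = min(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] <= ceiling else None
```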
The Financial Market
One important paradigm of modern finance is the efficient market hypothesis.
It is assumed that the current market price contains all available information;
in particular, past prices cannot help in forecasting future price changes. This is
in contrast to the empirical trading behavior of many investors. Chartists or
technical traders believe that prices may be predicted by extrapolation of
trends, technical trading rules or other patterns generated by past prices.
Another paradigm is the rational expectation equilibrium (REE), introduced
by (Muth, 1961). Agents are expected to be fully informed and to know all
equations of the economic model. Perfectly rational agents maximize their utility
function and are able to solve complicated optimization problems. This seems to
be unrealistically demanding for real-world agents, and therefore bounded rationality
models have been proposed. In our model bounded rationality enters via the agent's
formation of expectations over future prices and variances. Investors only use
publicly available information, i.e. past stock prices and dividends, and they
do not make systematic mistakes.
Within this section we present a standard capital market model (see e.g.
(Arthur et al., 1997a, Brock and Hommes, 1998, Dangl et al., 2001)). Myopic investors maximize their next period's
utility subject to a budget restriction. At time $t$ agents invest their wealth
in a risky asset with price $p_t$ and in bonds, which
are assumed to be risk free. Each agent trades only the stock of a single firm; within
this section we therefore drop the firm index.
There are $N$ stocks paying a
dividend $d_t$. It is assumed that firms pay out all of their profits if positive,
so each stock receives a proportion $1/N$ of the profits.
The risk-free asset is perfectly elastically supplied and
earns the risk-free and constant interest rate $r$. Investors are
allowed to change their portfolio in every time step. The wealth of
investor $i$ at time $t+1$ is given by

$W_{i,t+1} = (1 + r)\, W_{i,t} + \left( p_{t+1} + d_{t+1} - (1 + r)\, p_{t} \right) n_{i,t}$

where $W_{i,t}$ is the wealth at time $t$ and $n_{i,t}$ the number of shares
of the risky asset held at time $t$.
As in (Brock and Hommes, 1998), (Levy and Levy, 1996), (Chiarella and He, 2001), and (Chiarella and He, 2002),
the demand functions of the following models are derived from a Walrasian scenario.
This means that each agent is viewed as a price taker
(see (Brock and Hommes, 1997) and (Grossman, 1989)).
Let an investor with wealth $W_{i,t}$ maximize his/her utility of the form

$U(W) = -\exp(-a_i\, W)$

with $a_i$ as constant absolute risk aversion. The notation is summarized below:

- $p_t$: price per share of the risky asset at time $t$
- $d_t$: dividend at time $t$
- $r$: risk-free rate
- $N$: total number of shares of the risky asset
- $I$: total number of investors
- $n_{i,t}$: number of shares investor $i$ holds at time $t$
- $W_{i,t}$: wealth of investor $i$ at time $t$
- $a_i$: risk aversion of investor $i$

Denote by $F_t$ the information set available at time $t$.3 Let $E_{i,t}$
and $V_{i,t}$ be the conditional expectation and conditional variance
of investor $i$ at time $t$ based on $F_t$. Then the demand for the risky asset is

$n_{i,t} = \frac{E_{i,t}[p_{t+1} + d_{t+1}] - (1 + r)\, p_t}{a_i\, V_{i,t}[p_{t+1} + d_{t+1}]}$

Let $N$ be the total number of shares; then the market clearing price $p_t$
is implicitly given by the equilibrium equation

$\sum_{i=1}^{I} n_{i,t}(p_t) = N$
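The resulting mean-variance demand, standard for CARA investors in this family of models, can be sketched as (variable names are ours):

```python
def risky_demand(expected_payoff, payoff_variance, price, r, risk_aversion):
    """Mean-variance demand for the risky asset under CARA utility:
    (E[p' + d'] - (1 + r) * p) / (a * Var[p' + d']) shares."""
    return (expected_payoff - (1.0 + r) * price) / (risk_aversion * payoff_variance)
```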
It is well known that expectations play a key role in modeling dynamic
phenomena in economics. Heterogeneous expectations are introduced in the model as follows.
- As in many other heterogeneous agent models we assume that two kinds of
investors exist: fundamentalists and chartists. Additionally, investors have
different time horizons, which are modeled via the length of time $L$ that agents
look back into the past. In our simulations $L$ is distributed between 1 and 250.
The proportions of fundamentalists and chartists
in the market are denoted $n_f$ and $n_c$, where $n_f + n_c = 1$.
In our model we want to focus on the formation of expectations about prices
and not on the formation of expectations about variances. Therefore we assume
homogeneous and time-independent expectations about the variance: both types
of investors, fundamentalists and chartists, set the conditional variance
$V_{i,t}[p_{t+1} + d_{t+1}]$ to a constant. The expectation of the next
period's payoff is then split into a price expectation and a dividend
expectation. Investors first form their expectations about the next period's
dividend: agents take the average of the last $L$ dividends as an estimator.
Fundamentalists determine their price expectations according to a model based
on fundamental information, which in our model consists of past dividends. The
dividends are based on the earnings of the firms in the consumer market.
Fundamentalists calculate a fair price and expect that the current price
will gradually move towards it at a rate $\theta$. A fundamentalist
assumes that the fair price $p^{*}_{t}$ is a linear function
of past dividends, i.e.

$p^{*}_{t} = \alpha + \kappa\, \bar{d}_{t}$

where $\bar{d}_{t}$ is the dividend estimate and $1/\kappa$ is the fair dividend yield. In our simulation we set $\alpha = 0$ and $\kappa = 50$, which corresponds to
a fair dividend yield per period of 2%.4 Note that the fair dividend yield is
just an assumption of the fundamentalists and does not take the
specific risk of the stock into account. Therefore the assumed fair price
of the stock does not necessarily match the true fair value of the stock.
This leads to the following price expectation:

$E_{i,t}[p_{t+1}] = p_t + \theta\, (p^{*}_{t} - p_t)$
Chartists use the history of stock prices in order to form their
expectations. They assume that the next period's price change will equal
the average price change during the last $L$ periods, scaled by a constant
factor. Note that at time $t$ the current price $p_t$ is not yet included in
the information set, therefore the investor has to form his/her expectation
on the basis of past prices.
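Both expectation rules can be sketched as follows (a minimal illustration: the rate `theta` and scale `g` are illustrative values, and the indexing of the information set is simplified):

```python
def fundamentalist_expectation(price, dividends, k=50.0, theta=0.2):
    """Fair price = k * (average recent dividend), where 1/k is the fair
    dividend yield; the price is expected to move towards the fair price
    at rate theta (theta here is an illustrative value)."""
    fair_price = k * (sum(dividends) / len(dividends))
    return price + theta * (fair_price - price)

def chartist_expectation(prices, g=1.0):
    """Expected next price: last observed price plus the average past
    price change, scaled by a factor g (illustrative value)."""
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])]
    return prices[-1] + g * (sum(changes) / len(changes))
```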
- The market uses a sealed-bid auction, where the clearance mechanism chooses the
price at which trading volume is maximized. The first step is to construct
supply and demand curves based on the transaction requests. Example
supply and demand curves are shown in figure 4.
Figure 4. Supply and demand curves. Supply is marked with "O'' and increases with price. Demand, marked with "*'', decreases with price. The market price (vertical line) is set to a price which maximizes the volume traded.
Note that there may be a range of prices that would maximize volume. We
select the maximum price in this range. If there are buy orders but no
sellers then the share price is set to the maximum bid. If there are only
sell orders then the price is set to the minimum ask. If there are
no orders in a time period, then the price remains unchanged.
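The clearance rules above can be sketched as follows (representing each order as a limit price and quantity is our assumption):

```python
def clearing_price(bids, asks, last_price):
    """Sealed-bid call auction: choose the price that maximizes traded
    volume. bids and asks are lists of (limit_price, quantity) tuples.
    Ties resolve to the highest volume-maximizing price; the edge cases
    follow the rules described in the text."""
    if bids and not asks:
        return max(p for p, _ in bids)   # buyers only: maximum bid
    if asks and not bids:
        return min(p for p, _ in asks)   # sellers only: minimum ask
    if not bids and not asks:
        return last_price                # no orders: price unchanged
    def volume(p):
        demand = sum(q for bp, q in bids if bp >= p)  # buyers accepting p
        supply = sum(q for ap, q in asks if ap <= p)  # sellers accepting p
        return min(demand, supply)
    candidates = sorted({p for p, _ in bids} | {p for p, _ in asks})
    best = max(volume(p) for p in candidates)
    return max(p for p in candidates if volume(p) == best)
```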
Each trader specializes in a single firm, and only buys or sells shares in this
firm. Each trader is initialized with a fixed supply of shares in its firm.
- Let us take a look at the timing of the events within the financial model. The
first step is the formation of expectations: based on past prices and dividends,
an investor forms his/her expectation about the distribution of the next
period's price and dividend. Plugging these expectations
into the demand function (Equation 16), the agent determines his/her demand,
which is submitted to the stock market via limit buy orders and limit sell
orders.6 After the
orders of all agents are submitted, the stock market calculates this period's
equilibrium price, i.e. the price where supply equals demand. At the end
of the period the current dividend is announced and becomes public information.
- One goal of constructing agent-based economic models is to gain
some insight into the mechanisms that cause observed market
behaviors. Agent-based economic models offer a kind of economic
laboratory, in which parameters can be changed, and the results observed.
Useful models will reproduce known market behaviors for reasonable
parameter settings. Knowing the behavior of the model in different
parameter regimes is therefore important both for validating that a
model is reasonable, and using the model to understand economic
phenomena. However, in complicated models with many parameters, it may
be difficult to discover relationships between model parameters, and find
regions in parameter space where the model has interesting behavior.
We will validate our model by confirming that it can indeed
reproduce empirically observed market behaviors, or "stylized facts''.
In this section we propose a novel algorithm for exploring the
relationship between model parameters and stylized facts.
The algorithm is based on Markov chain Monte Carlo (MCMC)
sampling. We describe a number of empirical phenomena that have
been observed in consumer and financial markets, and give corresponding
simulation results. We show that a number of stylized facts within the
two markets can be reproduced by our model under reasonable parameter
settings. We further show that the behavior of each of the markets is
dependent on the dynamics of the other market. In other words, the
integrated model is not simply two separate models joined together.
The behavior of each market is intimately tied to the parameters and
dynamics of the other market. We explore the mechanisms behind some
stylized facts by examining correlations between model parameters.
Although it has been our intention to keep the model simple,
the firms' learning algorithm and the traders' decision rules have a number of
tuning parameters. Parameter values must be selected
before a simulation can be run. These parameters have been
introduced in earlier sections describing each of the agents in the
model. Using preliminary simulations, some of the parameters were
found to have a large influence on the outcome of the simulation, and
others were found to be relatively unimportant. All parameters
are summarized for convenience in table 3. The
"reference'' column indicates where in the text the parameter was
introduced, and where more details can be found. The "value'' column
indicates the value used for simulations (see section 5.3).
The values of parameters in the first group (above the double line) were
found using the Markov chain simulation technique described in the
next section. Those in the second group were found to be relatively
unimportant. These values were set based on initial trial simulations,
and held fixed for all simulation runs.
Table 3: Parameters for the Integrated Markets Simulator
- strength of profitability reinforcement
- strength of stock price reinforcement
- number of cluster centers (section 3.1.1)
- product update rate
- reinforcement learning discount factor
- history window length for firms (section 3.1.1)
- proportion of fundamentalists (section 4)
- proportion of chartists (section 4)
- fundamentalist price update rate
- chartist price update rate
- number of bins (figure 3)
- product update frequency (section 2)
- base salary
- reinforcement learning rate
- reinforcement learning temperature
- consumer feature/price tradeoff
- maximum dissatisfaction for consumer
- inverse fair dividend yield
We would like to understand the effect of parameters on model behavior.
We could "grid'' the space of parameters, and then run a large number
of repetitions of the simulator, one for each grid point. However,
this approach would very quickly become infeasible with more than a
few parameters. Ten parameters, each taking on one of ten values,
would require 10^10 runs to cover the grid. Many of these
parameter combinations will not be of particular interest.
Instead we would like a way to focus computational power on areas of
parameter space that are "interesting''. We will define as interesting
areas where a stylized fact is well-reproduced. To this end, we will
adapt Markov chain Monte Carlo sampling to do a "directed'' random walk
through parameter space.
Consider the problem of evaluating the expected value of some
multivariate function with respect to a probability distribution or density.
In some cases (such as linear functions and Gaussian distributions) expectations
can be computed analytically. In many cases this is not possible. Monte Carlo
algorithms allow for the approximate evaluation of expectations in more difficult
circumstances. In the following, bold face will denote a vector and subscripts will
denote elements of a vector or set. Given a set of N samples
{x^(1), ..., x^(N)} drawn from a distribution P(x),
we can approximate the expected value of a function f by the sample mean:
E_P[f] ≈ (1/N) Σ_{n=1}^{N} f(x^(n)).     (24)
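As an illustration of this approximation (a sketch, not part of the model; the distribution and function here are arbitrary choices), consider estimating E[x²] under a standard Gaussian, whose true value is 1:

```python
import random

# Monte Carlo estimate of E_P[f(x)] as a sample mean:
# (1/N) * sum_n f(x^(n)), with P a standard Gaussian and f(x) = x^2.
random.seed(0)
N = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(N)]
estimate = sum(x * x for x in samples) / N  # close to the true value, 1.0
```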
Before this approximation can be employed, we need a set of samples
{x^(1), ..., x^(N)}. In many cases we do not
have a closed-form distribution from which samples can be drawn. The
Metropolis algorithm (Metropolis et al., 1953) is a method for drawing a set of samples from a
target distribution P(x). Further, we need not have access to P(x) itself,
but only need an unnormalized energy function E(x), with P(x) ∝ exp(−E(x)).
Given an initial point x^(1), the t-th step of the Metropolis algorithm
operates as follows:
- Select a dimension i. Select a proposed sample x' from a
proposal distribution Q_i(x' | x^(t)). The proposal
distribution can be a function of the previous point, and leaves all
of the elements of x^(t) unchanged except for the i-th.
- If E(x') ≤ E(x^(t)), or otherwise with probability exp(E(x^(t)) − E(x')),
set x^(t+1) = x'. This is called accepting the proposed sample.
- Otherwise set x^(t+1) = x^(t) (rejecting the proposed sample).
Note that when a proposal is rejected, the old point is added to
the sample in its place. In the algorithm as described above, the
proposal distributions should be symmetric. That is,
Q_i(x' | x) = Q_i(x | x').
In the limit, the sequence of samples will converge to a unique stationary
distribution with marginal distribution P(x). Thus the set of
samples can be used for the approximation in Eq.(24). In practice,
the speed of convergence of the chain to the stationary distribution will depend
on the dimensionality of x, the energy function of interest and the
proposal distribution. Assessing convergence can be problematic. If a non-convergent
set of samples is used, then the estimate will be biased. The algorithm can also be
extended to include non-symmetric proposal distributions.
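A minimal single-component Metropolis sampler along these lines can be sketched as follows (an illustrative implementation with a toy quadratic energy, not the simulator's code):

```python
import math
import random

def metropolis(energy, x0, proposal_width=0.5, n_samples=5000, seed=1):
    """Single-component Metropolis sampler (an illustrative sketch).

    At each step one coordinate is perturbed by a symmetric Gaussian
    proposal; the move is accepted with probability
    min(1, exp(E(x) - E(x'))), so low-energy regions are visited often.
    """
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    samples = []
    for _ in range(n_samples):
        i = rng.randrange(len(x))            # pick one dimension
        proposal = list(x)
        proposal[i] += rng.gauss(0.0, proposal_width)
        e_new = energy(proposal)
        if e_new <= e or rng.random() < math.exp(e - e_new):
            x, e = proposal, e_new           # accept the proposal
        samples.append(list(x))              # on rejection, keep old point
    return samples

# Toy energy: a quadratic bowl, so samples concentrate near the origin.
samples = metropolis(lambda x: sum(v * v for v in x), x0=[3.0, -3.0])
```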
Markov Chain Model Exploration
In our application, we do not want to evaluate expectations of a
function. Instead, we want to find settings for model parameters
that reproduce stylized facts. The Metropolis sampler has the following
property: Samples are more likely to be drawn from low-energy areas.
Given a stylized fact, we can define an energy function such that
low energy corresponds to good reproduction of the fact. Then, we
implement a Metropolis sampler using this energy function. In
the limit, parameter samples are drawn according to the normalized
probability distribution defined by the energy function. In practice,
we will not generate Markov chains which are sufficiently long to
reach the equilibrium distribution. But even without theoretical
guarantees on the distribution of sampled parameters, the sampler can
find good model parameter settings, and reveal interesting correlations
between model parameters. The Metropolis sampler acts as a "directed''
random walk through parameter space, avoiding high energy areas.
We have constructed energy functions for several stylized facts including:
learning-by-doing in the consumer market, low autocorrelations in stock
returns, high kurtosis in marginal returns, and volatility clustering.
The sampler operated over the parameters in the first group of
Table 3. We used symmetric Gaussian proposal distributions
over real-valued parameters, and uniform distributions over discrete
parameters. It was assumed that the energy function took on a value of
+∞ wherever parameters fell outside of their valid range, ensuring that such
values would be rejected by the sampler. One thousand samples were drawn using
the Metropolis sampler. While this is too short to allow for convergence, we
can still examine the sample set to identify regions where stylized facts are
well-reproduced, and look for significant correlations between parameters.
As it turns out, two of the four Markov chain experiments were
uninteresting. These were the runs trying to achieve high kurtosis
in the stock market returns, and getting high autocorrelations in the
absolute stock returns. The simulated stock market had both of these
features for almost all parameter values, and there were no interesting
correlations or relationships between parameters for these energy
functions. The only parameters of interest were the proportion of
fundamentalists and chartists. If the number of chartists fell below
20%, the returns looked Gaussian. This suggests that high kurtosis
and volatility clustering are very robust features of the artificial
stock market, and are driven by the interaction between fundamentalists
and chartists.
In the sections below, we show the results for two energy
functions: The "learning-by-doing'' effect, and
low-autocorrelations in the stock market returns.
Learning by Doing
- The "learning-by-doing'' effect (Argote, 1999) encapsulates the idea that
firms gain knowledge and optimize their behavior over the course of
performing a task. Empirically, costs go down, and efficiency and
profits go up as a function of the number of units of a particular
product produced. Our model explicitly includes learning by doing
in the production firm. As the firm produces its product, it learns
what sells in the marketplace and at what price. This results in
an increase in profits over time. Note that this is very different
from models which include a "learning'' component in populations
of agents, implemented as an evolutionary algorithm. Our individual
firms learn over the course of the task.
We investigated which parameter settings influence the learning-by-doing
effect using our adapted Metropolis algorithm. The energy function
was the negative profits,
E = −c Σ_j profit_j,
where j indexes the firms, and c is simply a scaling factor designed
to bring the energies into a reasonable range.
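In code, such an energy function might look like the following sketch; `simulate_profits` is a hypothetical stand-in for a full run of the integrated markets simulator, and the scale value is arbitrary:

```python
# Hedged sketch of the "learning-by-doing" energy: the negative, scaled
# sum of firm profits, so that higher profits mean lower energy.
def simulate_profits(params):
    # Placeholder: a real run would return per-firm profits for the
    # given parameter vector.
    return [10.0, 12.5]

def profit_energy(params, scale=0.01):
    # E = -scale * sum_j profit_j; the scale factor only keeps
    # energies in a reasonable numeric range.
    return -scale * sum(simulate_profits(params))
```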
We found that the learning effect was quite robust to parameter settings.
In general, firms learned to perform well in the marketplace for almost
all parameter settings (see figure 5).
The "Learning by Doing'' effect was robust across almost
all parameter settings. The bar graph shows the per-time-step profits
of the firms sampled by the Metropolis algorithm. The vertical line
shows mean profits achieved by a randomly-behaving firm. The learning
firms do better than a randomly acting firm for nearly all parameter settings.
There was a significant negative correlation between the proportion
of fundamentalist traders in the simulation, and the adaptation rate
of the fundamentalists (see figure 6).
Note what this implies: Two parameters of the stock market
are correlated when trying to maximize a quantity from the consumer
market (profits). This suggests that the feedback mechanism from the
stock market to the production firms (via stock price) is having an
influence on the behavior of the firms in the consumer market,
and that some intermediate behavior of the financial market is optimal
from the point of view of firm learning. This may be because of an
exploration/exploitation tradeoff: A certain amount of noise or
uncertainty in the financial market could help the firms avoid shallow
local minima and prompt them to find products and prices that are
clearly winning in the consumer market. Too much noise can inhibit
learning. The proportion and adaptation rate of the fundamentalists
influences the volatility of the financial market, and therefore
the noise in a firm's reward signal.
Negative correlation between the adaptation rate of fundamentalist
traders and the proportion of fundamentalist traders. In order for firms
to maximize their profits, there is an optimal influence of
fundamentalists on the stock market. When more fundamentalists are
in the market, their per-time-step influence is decreased. The plot
shows the density of samples at various parameter values for the
Markov chain Monte Carlo simulation. The plot is a smoothed
normalized density plot, based on the frequencies of sampled
parameter values for the best 100 samples according to the "profits''
MCMC energy function. The colors represent density, with blue
being low density (dark blue here is a density of approximately 0.04)
and red being high density (dark red is approximately 1.1 in this
figure).
Low Predictability and Volatility Clustering
- A fundamental feature of financial markets is that they are not
easily predictable. The efficient market hypothesis claims that
new information is immediately factored into prices, so that
the price at any given time reflects all prior knowledge. Under
this assumption, it is in principle impossible to predict market
movements. In practice, it has been found that many financial return
series have insignificant autocorrelations. Unlike most artificial stock
markets, our model does not include any extrinsic noise (such as randomized
trading strategies (Gaunersdorfer, 2000; Raberto et al., 2001), or a randomized dividend
process (Arthur et al., 1997b)). Interestingly, the autocorrelations are
nevertheless very low. This is due to a combination of heterogeneous
trading strategies in the market, and the difficulty in predicting
profits in the consumer market.
Unlike price movements, price volatility is highly autocorrelated.
Empirically, market volatility is known to come in "clusters''. That is,
periods of high or low volatility tend to follow one another. This is the
basis of conditional heteroskedasticity models, which predict volatility in
the next time step based on volatility in previous time steps.
In our model technical traders will tend to adjust their market position
during large price movements. This will in turn cause greater price movements.
Similarly, when the price is near the fundamental price, fundamentalists are
satisfied and hold their stock. This in turn stabilizes prices, and causes the
chartists to hold their stock as well.
We investigated which parameter settings lead to low autocorrelations
in the returns of the artificial stock market. The energy function used
was the squared error between the actual autocorrelations in the returns
and an idealized set of autocorrelations:
E = Σ_l (ρ(l) − ρ*(l))²,
where ρ(l) denotes the autocorrelation at lag l, and ρ*(l) is
the idealized autocorrelation. The ideal profile had a slight
negative autocorrelation at the first lag, and zero autocorrelation
at all higher lags.
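A sketch of this energy function follows; the lag-1 target of −0.1 and the five-lag window are illustrative assumptions, not the paper's exact values:

```python
import random

def autocorrelation(series, lag):
    # Sample autocorrelation of a series at a given lag.
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var

def autocorr_energy(returns, ideal=(-0.1, 0.0, 0.0, 0.0, 0.0)):
    # Squared error between measured and idealized autocorrelations:
    # E = sum_l (rho(l) - rho*(l))^2, lags 1..len(ideal).
    return sum((autocorrelation(returns, lag + 1) - target) ** 2
               for lag, target in enumerate(ideal))

# White noise has near-zero autocorrelations, so its energy is small
# but nonzero (it misses the slightly negative lag-1 target).
rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
energy = autocorr_energy(white_noise)
```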
After sampling with this energy function, we found significant correlations
between some sampled production firm parameters. This is particularly
interesting, because it indicates that the statistical properties of the
stock returns are substantially affected by the dynamics in the consumer
market. Specifically, there is a significant negative correlation between
the firm's "history depth'' parameter and the weighting
placed by the firm on profits (at the 95% confidence level).
That is, in order to get low autocorrelations in the stock returns, it is
best to have either a short history depth, or to place the most weight on
improving the stock price (at the expense of profits) (see figure 7).
Negative correlation between history depth and importance of profits.
"History depth'' is the number of past states available to the firm
when making decisions. The parameter is the proportion of firm reward
that comes from profits. In order to get low autocorrelations in the
stock price returns, either history depth should be high and profit
importance low, or profit importance should be high and history depth
low. The plot is a smoothed normalized density plot, based on the
frequencies of sampled parameter values for the best 100 samples
according to the "low-autocorrelations'' MCMC energy function.
The colors represent density, with blue being low density (dark blue
here is a density of approximately 5×10^-4) and red being high density
(dark red in this figure is approximately 0.17).
This is likely related to how hard it is for the firms to learn
to do well in the market. Recall, the sampler is trying to find
parameter values for which the stock returns have low autocorrelations.
That is, the sampler prefers stock prices that are unpredictable. If
the firms do very well or very poorly, then their fundamental price is
predictable, and the stock returns have higher autocorrelation.
There is a regime in which firms have variable performance. The
amount of information available to firms (the history length)
and the kind of information available (profits or stock price)
appear to trade-off in determining firm performance.
We identified a set of parameter settings for which all of the stylized
facts were well reproduced (see figure 8 and column "Value'' in Table
3). We did this by intersecting the histograms of
parameter values from the MCMC simulation runs, and finding common parameter
settings. Since nearly all parameter settings gave good kurtosis and
volatility clustering behavior, these have been omitted from the figure for
clarity. After identifying a set of parameter settings for which all of the
stylized facts were well reproduced we ran 20 repetitions of the simulation at
these ideal parameter settings. The simulation consisted of two competing firms,
50 stock traders, and 200 consumers.
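The histogram-intersection step can be sketched as follows (bin counts, ranges, and the two sample sets are illustrative; the paper additionally fits Gaussians to the intersection):

```python
# Sketch of the histogram-intersection step: for each parameter, bin
# the best samples from each MCMC run, multiply (intersect) the
# normalized histograms, and take the mean of the result as a
# candidate "ideal" parameter value.
def normalized_hist(values, bins, lo, hi):
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def intersect_mean(samples_a, samples_b, bins=20, lo=0.0, hi=1.0):
    ha = normalized_hist(samples_a, bins, lo, hi)
    hb = normalized_hist(samples_b, bins, lo, hi)
    inter = [a * b for a, b in zip(ha, hb)]     # pointwise intersection
    z = sum(inter)
    width = (hi - lo) / bins
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return sum(c * w for c, w in zip(centers, inter)) / z

# Two overlapping sample sets: one spread over [0.3, 0.6], the other
# over [0.4, 0.7]; their intersection is centered near 0.5.
ideal = intersect_mean([0.3 + 0.3 * i / 99 for i in range(100)],
                       [0.4 + 0.3 * i / 99 for i in range(100)])
```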
Histograms of parameter values from Markov chain Monte Carlo
sampling. The plot for each parameter shows three histograms:
blue for the "learning-by-doing'' energy function (section 5.3.1),
red for the low-autocorrelations energy function (section 5.3.2), and
green for the intersection of the other two. Each histogram includes
the top 30% of samples from the MCMC sampler, ranked by the negative energy.
The curve shows a Gaussian fit to this intersection. The "ideal'' parameters
were taken to be the means of these best-fit Gaussians.
The "ideal'' parameter values are reasonable. There are no
parameters which must take on extreme or unlikely values in
order to get good simulation behavior. We discuss each of
the parameter histograms below:
- Firm learning rate
The firm learning rate has the least area of overlap between
MCMC runs. In order to get good learning, the learning rate should be
low. In order to get low autocorrelations, the learning rate should
be high. We chose an intermediate learning rate which seemed
to work well enough.
- Number of cluster centers
Again, there is tension between firm "learning-by-doing''
and making the stock market process more random. The firms
learned best when they did not have to deal with too much
information (fewer clusters). Again, an intermediate
value was used.
- Profit strength
Interestingly, both energy functions were quite
insensitive to whether firms learned based on their
profits or their stock price, although both had peaks
for intermediate values of this parameter.
Again, an intermediate value was chosen.
- Firm discounting
Firms learned best with either a low or high discounting
value. That is, they should focus either on the near term
or the long term, but not the intermediate term. The
stock market, in contrast, had lower autocorrelations when
firms took the long view.
- History depth
Firms learned best with shorter histories, and the
stock market autocorrelations were better when the firms
used longer histories. This is the same effect as seen
with the number of cluster centers: Firms learn better
when their information is somewhat compressed. The
autocorrelations in the financial market seem to be
lower when the firms have too much information, and
therefore do not learn as well.
- Proportion of fundamentalists
The market had the best autocorrelation structure for
intermediate numbers of fundamentalists. The firms seemed
to have a strong preference for a particular value
(around 0.6). This is interesting, because it
indicates that the stock market is not just "noise'' in the
firms' learning process. Certain market dynamics make it
easier for the firms to learn. This might be because of
an "exploration-exploitation'' trade-off inherent in firm
learning. The firms must discover what strategies work well,
and then exploit them. Intermediate noise in the financial
market could allow them to escape local behavioral optima, and
find better strategies.
- Chartist adaptation rate
One might expect that firms learn better, and the
market is less random, when the adaptation rate
of the chartists is lower. In fact, the opposite
seems to be the case. Again, it could be that certain
levels of noise in the market are optimal for firm learning.
- Fundamentalist adaptation rate
Similarly to above, these values seem counterintuitive:
Firms learn better when the fundamentalists adapt more slowly,
and the market appears more random when the fundamentalists
adapt more quickly.
The following sections show stylized facts reproduced by
simulation runs at the ideal parameter settings.
Figure 9 shows simulated profits as a function
of time, across the 20 simulation runs at the parameter
settings specified in Table 3. Median profits
increase as a function of time, indicating that firms learn to identify
good product positions and prices. The increase is significant
at the 5% level, as tested with a Wilcoxon signed rank test.
"Learning by Doing'' in the consumer market. The plot
shows median profits as a function of time, across 20 simulation runs.
The longer a firm spends in the market, the higher its profits.
Figure 10 shows autocorrelations of returns and absolute returns
for the artificial market. The autocorrelations were computed for the last 2000
periods of each run, and averaged over 20 runs. For these plots, p_t is the
stock price at time t, and the return at time t is defined as
r_t = log p_t − log p_{t−1}. There are small negative
autocorrelations in the first few lags, followed by zero autocorrelations.
The kurtosis of the market returns was quite high. The error
bars show 95% confidence bounds.
Autocorrelations of log returns and absolute log returns in the
artificial stock market.
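The quantities behind these plots can be computed with a few helper functions (a sketch with hypothetical names, not the simulator's code):

```python
import math
import random

def log_returns(prices):
    # r_t = log(p_t) - log(p_{t-1})
    return [math.log(p1) - math.log(p0)
            for p0, p1 in zip(prices, prices[1:])]

def kurtosis(xs):
    # Fourth standardized moment: approximately 3 for Gaussian data,
    # well above 3 for fat-tailed return series.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var ** 2

# Baseline check: Gaussian noise should score near 3.
rng = random.Random(7)
gaussian_kurtosis = kurtosis([rng.gauss(0.0, 1.0) for _ in range(5000)])
```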
- In financial markets, it is generally assumed that share price
oscillates around a "fundamental'' fair value, or fundamental price.
This price can be related to dividends, cash flow or profits
made by the firm. Empirically, it has been shown that
models of the fundamental price can account for some of the variance
in share price (Shiller, 1981; Kim and Koveos, 1994). Computational models of stock markets
have typically assumed either a static fundamental price, or a simple time-varying
price such as a first-order autoregressive process (Arthur et al., 1997b; Gaunersdorfer, 2000).
Because our model includes a consumer market, our fundamentalist traders
construct a fundamental price based on the actual past profits of the firm.
Figure 11 shows a simulated stock price and the
associated fundamental price, as calculated by the fundamentalist
traders, from a sample run. The simulation used the parameter settings
from table 3. The fundamental price was generated
by using Eq.(22) but assuming an adaptation rate of 1.0.
The fundamental price was then rescaled and translated to compensate for
the actual adaptation rate of the fundamentalists.
Fundamental price (thick line) and stock price (thin line)
from a section of a single run of the integrated markets model.
The fundamental price has been translated and rescaled to compensate
for the adaptation rate of the fundamentalist traders.
This sequence shows several aspects of the artificial stock market. First,
the stock price roughly reflects the underlying fundamental price. The price
differential is due to the number of stocks held by the traders initially
(in our case 120). Second, the stock price oscillates at a higher frequency
than the underlying fundamental price. Despite this, fundamental price
information is incorporated slowly, because the adaptation rate is less
than 1.0. Large stock price changes lag behind similar changes in the
fundamental price. Third, large changes in fundamental price lead to high
volatility in the stock price. Fourth, the stock price tends to over- or
under-shoot and then oscillate after a large change.
For this run, the proportion of fundamentalists was quite high.
It is interesting that, under our model, decreasing the proportion of
fundamentalists tends to also decrease the kurtosis of the
returns. In a market with only 20% fundamentalists, the
returns look Gaussian. If the proportion of fundamentalists drops below
10%, the stock price collapses. The heterogeneity of the market
traders is necessary to maintain market liquidity and trading volume.
If the fundamental price remains static over a long period of
time, then the share price tends to decay in a deterministic
way to the fundamental price. The variation in fundamental price
due to the dynamics in the consumer market is an integral
part of the stock returns in our model.
- There is a known positive correlation between volatility and trading
volume in financial markets (Karpoff, 1987). That is, periods of high
volatility are also those of high trading volume.
Our integrated model exhibits the same behavior. There
is a correlation between volatility and trading volume.
High volume and high volatility are interrelated, and each
can significantly predict the other, although
the effect of high volatility on trading volume is longer lasting.
Figure 12 shows average cross-correlations and 95% confidence
intervals for stocks from the 20 runs of the simulator, with parameters
set as in Table 3.
Cross correlation between trading volume and absolute returns.
The figure was generated by averaging 45-day periods of volume
and absolute returns for 40 stocks (20 runs, 2 firms per run).
Cross correlations were measured for each stock. The plot shows
mean cross correlations and 95% confidence intervals. The plot
shows that volatility and trading volume are interrelated, with
each being a significant predictor of the other, although the
effect of volatility on trading volume is longer-lasting.
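The volume-volatility relationship can be checked with a standard lagged cross-correlation (a sketch, not the paper's code):

```python
# Cross-correlation between two series at a given lead/lag, as used to
# compare trading volume against absolute returns.
def cross_correlation(xs, ys, lag):
    # Correlation between xs[t] and ys[t + lag]; a negative lag
    # correlates ys[t] with xs[t - lag].
    if lag < 0:
        xs, ys, lag = ys, xs, -lag
    xs2, ys2 = xs[:len(xs) - lag], ys[lag:]
    n = len(xs2)
    mx = sum(xs2) / n
    my = sum(ys2) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs2, ys2)) / n
    sx = (sum((x - mx) ** 2 for x in xs2) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys2) / n) ** 0.5
    return cov / (sx * sy)
```

For example, a series that simply lags another by one step has a cross-correlation of 1 at lag 1.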
- We have described an integrated model consisting of three agent types:
production firms, consumers and financial traders. The agents
operate in two coupled markets: a consumer market and a financial market.
The model builds on previous work by simplifying and integrating previous
models of consumers, firms and traders. We have found that for a particular
reasonable setting of the parameters, a large number of stylized facts can be
reproduced simultaneously in the two markets. We have also indicated in which
parameter regimes the model does not perform well with respect to different
stylized facts. We have shown that it is possible to incorporate a profit
signal from a competitive consumer market endogenous to the model itself.
This endogenous profit signal provides some of the low-frequency
and large-scale variability seen in the financial market model.
We have introduced a new model validation technique based on Markov chain
Monte Carlo sampling, and used the new technique to investigate under which
model parameter regimes the model exhibits realistic behaviors. We have
shown that this technique can highlight interesting correlations between
model parameters and offer insights into the mechanisms underlying the
behavior of the model. We feel that this technique has wide applicability
to other agent-based models, and is an important contribution of this work.
We have demonstrated that the combined model is more than just the sum
of its parts. The behavior of each of the markets is substantially
influenced by the dynamics of the other market. In particular, firm
performance in the consumer market is significantly affected by how the
firm estimates future performance. Firms operate best given a mixture
of performance-based and stock-based pay. Similarly, the statistical
properties of the stock market are best for intermediate values of firm
parameters. We are currently using the integrated model to investigate
inter-market stylized facts that are beyond the reach of individual models.
These include managerial compensation schemes; product hype in the financial
market; consumer social networks and their effect on the financial market;
and brand-recognition-based trading strategies.
This work was funded by the Austrian Science Fund (FWF) under
grant SFB#010: "Adaptive Information Systems and Modeling in
Economics and Management Science''. The Austrian Research Institute
for Artificial Intelligence is supported by the Austrian Federal
Ministry of Education, Science and Culture.
The stock-based
reward changes linearly with stock return. This would be consistent with a limited
stock grant, or with a call option where the current price of the underlying stock
is significantly above the strike price of the option.
We will drop the firm index in this section for
clarity. The same reinforcement learning algorithm is used for each firm, with
the same parameter settings. Each firm learns its own value function from
its own experience.
Note that at time t, the current price
and dividend are not included in the information set.
The risk-free
interest rate was set to 1.5%. Because holding the stock is riskier,
the fair dividend yield should be above the risk-free rate.
Note that for chartists
for computing an average price change.
A limit order is an instruction stating the maximum price
the buyer is willing to pay when buying shares (a limit buy order), or the
minimum the seller will accept when selling (a limit sell order).
The density plots were generated using the kernel density estimator for
Matlab provided by C.C. Beardah at http://science.ntu.ac.uk/msor/ccb/densest.html
(Beardah and Baxter, 1996).
Significance was measured in the following way:
First, the sequence of parameter values was subsampled such that
autocorrelations were insignificant. Given this independent
sample, the correlations between parameters could be measured,
and significance levels found.
ARGOTE, L. (1999) Organizational Learning: Creating, Retaining
and Transferring Knowledge. Kluwer Academic Publishers.
ARTHUR, W. B., Holland, J., LeBaron, B., Palmer, R., and Tayler, P.
(1997a) The Economy as an Evolving Complex System II, chapter
Asset pricing under endogenous expectations in an artificial stock market,
pages 15-44. Addison-Wesley, Reading, MA.
ARTHUR, W. B., Holland, J. H., LeBaron, B., Palmer, R., and Tayler, P.
(1997b) Asset pricing under endogenous expectations in an artificial
stock market. In ARTHUR, W. B., Durlauf, S. N., and Lane, D. A.,
editors, The Economy as an Evolving Complex System II, pages 15-44.
Addison-Wesley, Reading, MA.
BAIER, T. and Mazanec, J. (1999) The SIMSEG project: A simulation
environment for market segmentation and positioning strategies.
Technical report, SFB Adaptive Information Systems and Modelling in
Economics and Management Science.
BEARDAH, C. and Baxter, M. (1996) Matlab routines for kernel density
estimation and the graphical presentation of archaeological data. In
KAMMERMANS, H. and Fennema, K., editors, Interfacing the Past: Computer
Applications and Quantitative Methods in Archaeology 1995, Analecta
Prehistorica Leidensia 28(1), Leiden.
BELLMAN, R. E. (1957) Dynamic Programming. Princeton
University Press, Princeton, NJ.
BERTSEKAS, D. P. and Tsitsiklis, J. N. (1996) Neuro-Dynamic
Programming. Athena Scientific, Belmont, MA.
BROCK, W. and Hommes, C. (1997) A rational route to
randomness. Econometrica, 65:1059-1095.
BROCK, W. and Hommes, C. (1998) Heterogeneous beliefs and routes to
chaos in a simple asset pricing model. Journal of Economic
Dynamics and Control, 22:1235-1274.
CHIARELLA, C. and He, X. (2001) Asset pricing and wealth dynamics
under heterogeneous expectations. Quantitative Finance.
CHIARELLA, C. and He, X. (2002) Heterogeneous beliefs, risk and
learning in a simple asset pricing model. Computational Economics.
DANGL, T., Dockner, E., Gaunersdorfer, A., Pfister, A., Soegner, A., and
Strobl, G. (2001) Adaptive Erwartungsbildung und
Finanzmarktdynamik. Zeitschrift für betriebswirtschaftliche Forschung.
GAUNERSDORFER, A. (2000) Adaptive beliefs and the volatility of asset
prices. Technical report, SFB Adaptive Information Systems and
Modelling in Economics and Management Science.
GROSSMAN, S. (1989) The Informational Role of Prices.
MIT Press, Cambridge, MA.
HU, J. and Wellman, M. (2003) Nash Q-learning for general-sum
stochastic games. Journal of Machine Learning Research.
JAAKKOLA, T. S., Singh, S. P., and Jordan, M. I. (1995) Reinforcement
learning algorithm for partially observable Markov decision
problems. In TESAURO, G., Touretzky, D. S., and Leen, T. K., editors,
Advances in Neural Information Processing Systems, volume 7, pages
345-352. The MIT Press, Cambridge.
KARPOFF, J. (1987) The relationship between price changes and trading
volume: A survey. Journal of Financial and Quantitative Analysis.
KIM, M. and Koveos, P. (1994) Cross-country analysis of the
price-earnings ratio. Journal of Multinational Financial Management.
LEBARON, B., Arthur, W. B., and Palmer, R. (1999) Time series
properties of an artificial stock market. Journal of Economic
Dynamics and Control, 23(9-10):1487-1516.
LEVY, M. and Levy, H. (1996) The danger of assuming homogeneous
expectations. Financial Analysts Journal, 52(3):65-70.
LITTMAN, M. L. (1994) Markov games as a framework for multi-agent
reinforcement learning. In COHEN, W. W. and Hirsh, H., editors,
Proceedings of the Eleventh International Conference on Machine Learning,
pages 157-163, San Francisco, CA. Morgan Kaufmann.
METROPOLIS, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and
Teller, E. (1953) Equation of state calculations by fast computing
machines. Journal of Chemical Physics, 21:1087-1092.
MUNDHE, M. and Sen, S. (2000) Evaluating concurrent reinforcement
learners. In Proceedings of the Fourth International Conference
on Multiagent Systems, pages 421-422, Los Alamitos, CA. IEEE Press.
MUTH, J. (1961) Rational expectations and the theory of price
movements. Econometrica, 29:315-335.
NATTER, M., Mild, A., Feurstein, M., Dorffner, G., and Taudes, A.
(2001) The effect of incentive schemes and organizational
arrangements on the new product development process. Management
Science, to appear.
RABERTO, M., Cincotti, S., Focardi, S., and Marchesi, M. (2001)
Agent-based simulation of a financial market. Physica A.
RUMMERY, G. A. and Niranjan, M. (1994) On-line Q-learning using
connectionist systems. Technical Report CUED/F-INFENG/TR 166,
Engineering Department, Cambridge University.
SANDHOLM, T. and Crites, R. (1995) Multiagent reinforcement learning
in the iterated prisoner's dilemma. Biosystems Special Issue on
the Prisoner's Dilemma, 37:147-166.
SHILLER, R. (1981) Do stock prices move too much to be justified by
subsequent changes in dividends? The American Economic Review.
SIMON, H. A. (1982) Models of Bounded Rationality, Vol 2:
Behavioral Economics and Business Organization. The MIT Press,
Cambridge, MA.
STEIGLITZ, K., Honig, M., and Cohen, L. (1995) A computational market
model based on individual action. In CLEARWATER, S., editor,
Market-Based Control: A Paradigm for Distributed Resource Allocation.
World Scientific, Hong Kong.
SUTTON, R. S. (1996) Generalization in reinforcement learning:
Successful examples using sparse coarse coding. In TOURETZKY, D. S.,
Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural
Information Processing Systems, volume 8, pages 1038-1044. The MIT Press,
Cambridge.
SUTTON, R. S. and Barto, A. G. (1998) Reinforcement Learning: An
Introduction. The MIT Press, Cambridge, MA.
TESAURO, G. (1999) Pricing in agent economies using neural networks
and multi-agent Q-learning. In IJCAI-99.
TESFATSION, L. (2002) Agent-based computational economics: Growing
economies from the bottom up. Artificial Life, 8(1):55-82.
WATKINS, C. J. C. H. (1989) Learning from Delayed
Rewards. Ph.D. thesis, Cambridge University, Cambridge, UK.
WATKINS, C. J. C. H. and Dayan, P. (1992) Q-learning. Machine Learning, 8:279-292.
WELLMAN, M. and Hu, J. (1998) Conjectural equilibrium in multiagent
learning. Machine Learning, 33:179-200.