The Use of Surrogate Models to Analyse Agent-Based Models



Introduction
Agent-based models (ABMs) are used for a wide range of applications (Macal ) and with various aims (Edmonds et al. ). The most common aim (Heath et al. ) is 'explanation', i.e., using the ABM to assess whether modelled interactions provide possible explanations for an observed phenomenon. When using ABMs for explanation, sensitivity analysis is needed for two reasons. First, explanations may be tested by using sensitivity analysis to vary relevant parameters. For example, Brown & Robinson ( ) tested the hypothesis that heterogeneous agent preferences in an ABM of residential development can explain observed patterns of sprawl. In this study, sensitivity analysis revealed that increasing heterogeneity does indeed increase the amount of sprawl, confirming their hypothesis. Second, sensitivity analysis is used to assess the robustness of an explanation (Grimm & Berger ). If the phenomenon to be explained depends on unrelated parameters, the explanation might need revision, or alternative explanations may be involved. Thus, we need to test the robustness of the explanation to parameter changes. For example, in the Schelling model social segregation emerges under relatively small intolerances for neighbours of a different type (Schelling ). Sensitivity analysis has shown robustness of this emergence to changes in a number of assumptions and parameters, increasing the relevance of the model as explanation (Gilbert ; Grimm & Berger ). ABMs are not often used to make quantitative predictions (Heath et al. ), but when they are, sensitivity analysis is needed to reveal the uncertainty of model predictions, and to assess which parameters contribute most to this uncertainty (Ligmann-Zielinska et al. ). Note that this kind of assessment considers only uncertainty of predictions due to uncertain parameters, which is one out of multiple sources of uncertainty (Walker et al. ; Kwakkel et al. ).
The main strength of ABMs is that they allow for modelling a wide range of features, such as heterogeneous agents, local interactions, and various kinds of agent adaptation and learning. These features can lead to complex behavioural structures, enabling for instance, tipping points, adaptation of populations, and resilience. While this makes ABMs ideally suited for modelling complex adaptive systems, it also makes it challenging to perform sensitivity analysis. Extensive exploration of ABM parameter space is usually not practical due to the large number of parameters, high computational costs per model run, and strongly nonlinear relations between parameters and model outcomes. As a result, it is especially difficult to show robustness of model results, or uncertainties of model predictions. Thus, it is an important challenge to analyse ABMs by efficiently exploring different parameter settings and extracting relevant information from the simulation results (Macal ; Filatova et al. ; Crooks et al. ).
Sensitivity analysis is an essential tool to analyse ABMs. One-factor-at-a-time sensitivity analysis is a good tool for assessing effects of individual parameters. Furthermore, it can reveal the existence of tipping points in parameter space, thus helping to distinguish qualitatively different types of model behaviour. In fields like ecology, systems biology, economics, or physics, distinguishing these behaviours is considered to be an important aim, and specialised tools for bifurcation analysis are available (Kuznetsov ). For example, in ecology bifurcation analysis may be used to assess whether or not populations will go extinct, or show cyclic behaviour. For ABMs bifurcation analysis tools are not applicable and the main tool for model analysis is sensitivity analysis (Ten Broeke et al. b). But, while one-factor-at-a-time sensitivity analysis is to some extent helpful for revealing different types of model behaviour, it ignores interaction effects (Ligmann-Zielinska et al. ). Variance-based global methods (Sobol ) can assess interaction effects over an ABM's entire parameter space and rank influential parameters based on their contribution to model output variance. However, these methods have two important limitations. First, the ABM outputs are aggregated without regarding the behaviour that underlies these outcomes. Thus, variance-based sensitivity analysis methods give little insight into what types of behaviour the model produces, which parameters determine the type of behaviour, or to what extent behaviour is robust (Ten Broeke et al. a; Magliocca et al. ). Thus, these methods do not really help to 'open the black box' (Lorscheid et al. ) of ABMs. Second, the computational costs of gaining accurate variance-based sensitivity estimates are often prohibitive. These might be reasons why most ABM studies do not include a systematic sensitivity analysis (Thiele et al. ).
In this paper, we present a method for sensitivity analysis that addresses the above limitations by using surrogate models. Surrogate models can provide a way to explore ABM parameter space while keeping computational costs manageable (Dosi et al. ; Edali & Yücel ; Most & Will ). The idea is to scan parameter space, and to fit a surrogate model to the resulting simulation data. Preliminary versions of the surrogate model are used to assess where in parameter space additional simulation data should be gathered to improve the surrogate model's performance. Once the performance is sufficient, the surrogate model is used to generate additional simulation data. In this paper, we will use support vector machines (SVMs) and support vector regression (SVR) as surrogate models. Both are machine learning techniques that are frequently used in Big Data applications (Bennett & Campbell ). Other classification or regression methods may be considered instead of SVM or SVR. Our main reasons for using SVM and SVR are that they are relatively simple, and known to be very flexible for fitting different kinds of relationship between input and output. SVM and SVR surrogate models allow for the estimation of large amounts of simulation data at much lower computational costs than the ABM. We use this data to compute sensitivity indices that would be too costly without a surrogate model. Preliminary versions of the surrogate model are also used to predict where in parameter space additional samples are needed, thus helping to further control the computational costs.

Surrogate models have been previously used as a tool for sensitivity analysis. For example, Edali & Yücel ( ) fitted an SVR model to a simple traffic ABM. Van Strien et al. ( ) have fitted an SVM to identify stable and unstable equilibria in an ABM of land-use change in a Swiss mountain region. They used the SVM to generate bifurcation diagrams that give insight into the resilience of the case-study. These results confirm that SVR and SVM can be suitable surrogate models for ABM. Other types of machine learning surrogate models have also been applied to ABM. For example, Lamperti et al. ( ) have demonstrated the use of decision trees for calibration of ABMs, and Barde & Van Der Hoog ( ) have proposed the use of Kriging metamodels for empirical validation of ABMs. Several studies have used a Bayesian approach to demonstrate the use of Gaussian processes for calibration (Ciampaglia ; O'Hagan ) and sensitivity analysis (Parry et al. ; O'Hagan ) of ABMs. The novelty of our approach lies in the combination of qualitative and quantitative analysis.
Our method consists of two steps. In the first step, we classify ABM simulation data (e.g., using SVM) into distinct behavioural regimes for which the ABM behaviour is different according to a pre-defined criterion. This criterion might distinguish regions separated by a tipping point, or can represent any quality of interest to the model user.
For example, in this paper behavioural regimes will correspond to whether or not a population goes extinct, i.e., similar to a bifurcation diagram. We then compute sensitivity indices to show how robust this behaviour is to parameter changes. In the second step, we use a regression method (e.g., SVR) to assess quantitative differences within a regime, and estimate which parameters are influential. In this way, our method yields additional insight over methods of global sensitivity analysis, which aggregate across parameter space as a whole without considering qualitative differences in underlying model behaviour. While such an approach does show which parameters are influential overall, it often remains unclear in what way they are influential (Ten Broeke et al. a). Our approach improves upon this, by distinguishing between whether a parameter is influential in driving the model towards a certain behavioural regime (e.g., extinction), or whether it is influential on the quantitative behaviour within the regime. Furthermore, using a surrogate model that divides parameter space into regions to accurately predict ABM output also helps to decrease the computational costs of global sensitivity analysis.

We apply the method proposed in this paper to analyse three case studies. The first case study is a differential equation model of predator-prey interaction. ABMs similar to this differential equation model have been discussed in scientific literature (Colon et al. ). Here we have chosen the differential equation model because of its simplicity, and because it has already been thoroughly analysed (Van Voorn et al. ). Both ABM versions and differential equation versions of the model contain tipping points beyond which the population goes extinct. Furthermore, it has been previously shown how these tipping points affect global sensitivity analysis results (Ten Broeke et al. a).
The second case study is an ABM in which agents compete in a spatial environment for a common-pool resource (Ten Broeke et al. b). The third case study is an ABM of a Philippine tuna fishery system (Libre et al. ). Like the differential equation model, both ABMs have tipping points where the system collapses due to over-exploitation. The resource-consumer model has a relatively high level of abstraction, whereas the fishery model is a more concrete case study. For all three case studies, we demonstrate how SVM can be used to locate tipping points, which separate different behavioural regimes. Next, we use SVR to further investigate model behaviour within the region of parameter space for which the system does not collapse. In the Methods Section we describe the machine learning and sensitivity analysis techniques used in this paper. The Model Description Section gives a summarised description of the ABM test cases. The Results Section describes the results of applying SVM and SVR to analyse these test cases. The Discussion Section discusses the insights gained from applying the approach, and describes the strengths and limitations of the proposed method versus current methods of ABM analysis.

Methods
Our approach consists of two main parts. In the first part we classify the ABM's output into different behavioural modes, and determine which parameters drive the ABM towards these behavioural modes. We assume here that the behavioural modes are defined in advance based on the interests of the modellers. Thus, the modellers need to already have some knowledge of what kind of behaviour can be produced by the model. This knowledge can for instance be obtained by performing an OFAT analysis (Ten Broeke et al. b).
For the examples in this paper, the criterion is whether or not the system goes to extinction, but any other criterion could be used instead. ABM output is compared directly to this criterion to determine to which behavioural mode model runs belong. An alternative approach would be to use an automated technique, such as a clustering procedure (Edali & Yücel ). This approach would help the modeller to explore what kind of behaviour the ABM generates without any pre-defined criterion, and would make the method applicable without having knowledge of the behaviour produced by the model. In the second part, we zoom in on a specific behavioural mode and we determine which parameters strongly affect the output. In this step, we apply a surrogate model to predict continuous ABM output. The approach is summarised in Figure and discussed in more detail in the following paragraphs. R code for the analysis of both example models is available as an electronic supplement (https://github.com/GuustenBroeke/ABM-surrogates). All of the simulation data used during the analysis and the NetLogo files of the ABMs are also available there. Both ABMs are also available on OpenABM (https://www.comses.net/codebases/5374/releases/1.1.0/ for the resource-consumer model, and https://www.comses.net/codebases/a5823aa6-e45c-4d2e-b930-0bf9546cd514/releases/1.0.0/ for the fishery model).

Sampling ABM parameter space
To train a surrogate model, we need a training set of sampled ABM parameter settings and corresponding output.
To each parameter we assign a probability density function that represents the range over which we want to explore its effects (see e.g., Ligmann-Zielinska et al. ( )). We draw a sample from these density functions using a replicated Latin hypercube design (McKay et al. ) to achieve balanced coverage of parameter space.
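The sampling design above can be sketched in a few lines. The sketch below is in Python rather than the paper's R, the function names are hypothetical, and uniform marginals are assumed for simplicity: each parameter's range is split into equal strata, each stratum is used exactly once, and each design point is reused for several stochastic replicates.

```python
import random

def latin_hypercube(n_points, bounds, seed=0):
    """Latin hypercube sample with uniform marginals: each parameter's
    range is split into n_points equal strata, each used exactly once."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        strata = list(range(n_points))
        rng.shuffle(strata)  # random pairing of strata across parameters
        columns.append([lo + (hi - lo) * (s + rng.random()) / n_points
                        for s in strata])
    # transpose: one row (tuple of parameter values) per sample point
    return list(zip(*columns))

def replicated_lhs(n_points, bounds, replicates, seed=0):
    """Replicated design: each parameter setting is paired with several
    replicate indices, i.e., repeated stochastic ABM runs."""
    points = latin_hypercube(n_points, bounds, seed)
    return [(p, r) for p in points for r in range(replicates)]
```

Non-uniform density functions, as allowed by the paper, could be obtained by pushing the stratified uniform values through each parameter's inverse CDF.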
For the case studies in this paper, we will use an initial training sample of 1000 points. If the initial training sample does not yield sufficient performance of the surrogate model, we add additional sample points to the training sample using the adaptive sampling procedure described by Xu et al. ( ). The main idea is to draw additional sample points in the vicinity of sample points in which the surrogate model shows large prediction errors. In this way, the sampling density is increased in regions where the ABM dynamics show high activity. More precisely, for each iteration of the procedure, a large ( times the size of the training sample) pool of sample points is generated. Each of these sample points is assigned to the closest training sample point based on Euclidean distance in parameter space. We then select a number of training sample points for which the prediction errors are largest (we use a batch size of ). For each of these training sample points, we select a single sample point from the pool that was assigned to it. This sample point is added to the training set. The process continues either until the surrogate model fits sufficiently well (an F-score exceeding . for classification, or a coefficient of prognosis exceeding . for regression), or until a maximum number of model runs has been performed. This maximum number of runs will usually depend on the computational costs of the ABM. For our three case studies, the details are provided in Sections . , . and . respectively. In addition to the training sample we also draw a test sample, using replicated Latin hypercube sampling. This test sample remains the same throughout the analysis, and allows us to test the performance of the surrogate model on an independent sample. For each of the case studies we use a test sample size of model runs.

Figure : Flowchart for sensitivity analysis using surrogate models.
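One iteration of the adaptive refinement idea described above can be sketched as follows. This is an illustrative approximation of the Xu et al. procedure, not the paper's implementation: the function names, the uniform candidate pool, and the default pool and batch sizes are all assumptions.

```python
import random

def euclid(a, b):
    """Euclidean distance between two parameter settings."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def adaptive_step(train, errors, bounds, pool_factor=10, batch=5, seed=0):
    """One adaptive-sampling iteration: generate a candidate pool, assign
    each pool point to its nearest training point, then return one new
    point near each of the `batch` training points with the largest
    surrogate prediction errors."""
    rng = random.Random(seed)
    pool = [tuple(lo + (hi - lo) * rng.random() for lo, hi in bounds)
            for _ in range(pool_factor * len(train))]
    assigned = {i: [] for i in range(len(train))}
    for q in pool:
        nearest = min(range(len(train)), key=lambda j: euclid(q, train[j]))
        assigned[nearest].append(q)
    # training points with the largest prediction errors first
    worst = sorted(range(len(train)), key=lambda i: -errors[i])[:batch]
    return [assigned[i][0] for i in worst if assigned[i]]
```

In a full loop, the returned points would be simulated with the ABM, added to the training set, and the surrogate refitted until the stopping criterion is met.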

Support vector machines
We use SVM surrogate models to classify ABM output into behavioural modes. Here we summarise the main ideas. A more thorough description of SVMs can be found in Appendix A and Ng ( ). For implementation we have used the e1071 package in R (Meyer et al. ). While we consider classification into two behavioural modes, the method can be extended to multiple modes by training a separate SVM for each mode. We write x for the training sample of ABM parameters, and y for the corresponding behavioural mode. The index i refers to specific sample points x_i and their corresponding behavioural mode y_i. The criterion for classification is

$$h(x) = \mathrm{sign}\left( b + \sum_{i=1}^{m} a_i y_i K(x_i, x) \right).$$

Here, we write x for a parameter setting to be classified, h(x) for the corresponding SVM prediction, and m for the number of training sample points. Note that the point x to be classified differs from the training set. The SVM prediction is based on the similarity between both, which is measured by the kernel function K(x_i, x). A weight a_i is assigned to each training sample point. Training consists of optimising the values of a_i and b such that the training sample points are classified correctly. Choosing a good sample size m is crucial for training the surrogate. As discussed in Section . , we will start with m = 1000, but will increase the sample size over a number of iterations if needed.
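A minimal sketch of this decision rule, assuming a Gaussian kernel and pretending the weights a_i and offset b have already been obtained from training (a fitted SVM, e.g., from the R e1071 package, would supply them in practice):

```python
import math

def gaussian_kernel(xi, x, sigma=1.0):
    """K(xi, x) = exp(-||xi - x||^2 / (2 sigma^2)): 1 for identical
    points, decaying towards 0 with distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def svm_predict(x, train_x, train_y, a, b, sigma=1.0):
    """Decision rule h(x) = sign(b + sum_i a_i y_i K(x_i, x)).
    The weights a and offset b are plain inputs here, not fitted."""
    s = b + sum(ai * yi * gaussian_kernel(xi, x, sigma)
                for ai, yi, xi in zip(a, train_y, train_x))
    return 1 if s >= 0 else -1
```

Because the kernel weights nearby training points most heavily, a query point is classified like the training points it resembles.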
As kernel function, any symmetric positive semi-definite function may be used (Ng ). The Gaussian kernel is the most common choice because it has been shown to perform well for many applications (Hsu et al. ). For this paper we have compared the performance of the Gaussian kernel against several other commonly used kernel functions, and we indeed find that the Gaussian kernel yields the best performance (Appendix C). Throughout this paper we will therefore use the Gaussian kernel,

$$K(x_i, x) = \exp\left( -\frac{\lVert x_i - x \rVert^2}{2\sigma^2} \right).$$

The value of K(x_i, x) ranges between 0 and 1. If K(x_i, x) = 1, both samples are identical, and K(x_i, x) asymptotically decreases to zero as the distance between the samples increases. Thus, the SVM tends to give the same prediction for nearby samples.

Support vector regression
We use SVR surrogate models to predict real-valued output data for some behavioural mode. This allows us to zoom in on a region of parameter space identified using SVM, and to investigate which parameters most affect model output within this region. The prediction is given as

$$f(x) = b + \sum_{i=1}^{n} (a_i - a_i^{*}) K(x_i, x).$$

Similar to the SVM case, a_i, a*_i, and b are optimised using the training data. Details of this procedure are given in Appendix A, and by Smola & Schölkopf ( ). As for the SVM case, we will use an initial training sample of size n = 1000, but will increase this sample size when needed. The kernel function K(x_i, x) needs to satisfy the same constraints for SVR as it does for SVM. We have tested several common alternatives (Appendix C), and we found that the Gaussian kernel (Equation ) yields the best overall performance. We will therefore use the Gaussian kernel for SVR as well.
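The regression prediction can be sketched in the same style as the classifier; again the two weight vectors and the offset are assumed to come from training rather than being fitted here.

```python
import math

def gauss_k(xi, x, sigma=1.0):
    """Gaussian kernel between two parameter settings."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def svr_predict(x, train_x, a, a_star, b, sigma=1.0):
    """SVR prediction f(x) = b + sum_i (a_i - a*_i) K(x_i, x).
    The weights a, a_star and offset b are assumed pre-optimised."""
    return b + sum((ai - asi) * gauss_k(xi, x, sigma)
                   for ai, asi, xi in zip(a, a_star, train_x))
```

In contrast to the SVM, the output is a real number (e.g., a population density) rather than a class label.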

Cross-validation and testing
Both the SVM and the SVR can be tuned by choosing the values of a few 'meta-parameters', which yields great flexibility for fitting a wide range of functions (Appendix A). However, these values should be chosen such that overfitting or underfitting is avoided. To this end we use fivefold cross-validation. The training set is split randomly into five equal parts. Four parts are recombined and used as training set, for a given set of meta-parameter values. The remaining part serves to assess the performance of the surrogate model. This is repeated five times, so that each of the five parts serves once to assess the performance. The meta-parameter values that on average yield the best performance are selected. The main idea behind this procedure is to avoid overfitting by using independent samples for choosing meta-parameter values and training the surrogate model.
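The cross-validation loop can be sketched generically; `fit` and `score` below are hypothetical callables standing in for training a surrogate with a given set of meta-parameter values and measuring its performance on held-out data.

```python
import random

def kfold_scores(xs, ys, fit, score, k=5, seed=0):
    """k-fold cross-validation: split the data into k parts, train on
    k-1 parts and score on the held-out part, once per part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held = set(folds[i])
        tr = [j for j in idx if j not in held]
        model = fit([xs[j] for j in tr], [ys[j] for j in tr])
        scores.append(score(model,
                            [xs[j] for j in folds[i]],
                            [ys[j] for j in folds[i]]))
    return scores
```

Meta-parameter selection then amounts to repeating this for each candidate setting and keeping the one with the best average score.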

Sensitivity measures
We use sensitivity indices for two aims. The first aim is to indicate which parameters drive the model towards a behavioural mode. The second aim is to zoom in on one behavioural mode, and indicate which parameters strongly affect model output within this mode. A priori, we cannot know whether the same parameters are influential for both aims. Potentially, some parameters might influence the location of the border between behavioural modes, whereas others influence the quantitative results within a behavioural mode. Therefore, we use different indices to measure the contribution of each parameter for both aims.
When determining which parameters drive the model towards a behavioural mode, the model outcome is a binary variable y(x) that shows whether or not the behavioural mode is found for parameter settings x. To indicate the sensitivity of a single parameter p, we consider the conditional probability of a behavioural mode (e.g., y(x) = 1) after fixing that parameter, P(y(x) = 1 | x_p). If the probability strongly depends on the value of x_p, this indicates that the parameter influences which behavioural mode the model will display. If, in contrast, the parameter has little effect, then the conditional probability will vary little. As sensitivity index, we use the total range over which the conditional probability varies,

$$S_p = \max_{x_p} P(y(x) = 1 \mid x_p) - \min_{x_p} P(y(x) = 1 \mid x_p).$$

S_p ranges between 0 and 1, with larger values corresponding to influential parameters. We use the SVM surrogate model to compute S_p. To compute P(y = 1 | x_p), we fix the parameter p, and sample from the remaining parameters. The probability is computed as the probability of a random point from this sample corresponding to behavioural mode y = 1. The maximum and minimum are then computed over a number of different values of the parameter p, covering its full range.
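A sketch of how S_p might be estimated from a cheap surrogate classifier, assuming uniform parameter distributions; the grid of levels and conditional sample size are illustrative choices, not the paper's.

```python
import random

def range_sensitivity(classifier, p, bounds, n_levels=20, n_cond=500, seed=0):
    """S_p = max over x_p of P(y = 1 | x_p) minus the min: fix parameter p
    at a grid of levels, sample the other parameters uniformly, and record
    the fraction of points the classifier labels y = 1."""
    rng = random.Random(seed)
    probs = []
    lo_p, hi_p = bounds[p]
    for lev in range(n_levels):
        xp = lo_p + (hi_p - lo_p) * lev / (n_levels - 1)
        hits = 0
        for _ in range(n_cond):
            x = [lo + (hi - lo) * rng.random() for lo, hi in bounds]
            x[p] = xp  # condition on the fixed parameter value
            hits += classifier(x)
        probs.append(hits / n_cond)
    return max(probs) - min(probs)
```

An influential parameter sweeps the conditional probability from near 0 to near 1 across its range, while an irrelevant one leaves it roughly flat.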
The second type of sensitivity index assesses the quantitative e ect of a parameter within a behavioural mode.
A choice for this could be the standard variance-based sensitivity index as proposed by Sobol ( ); Saltelli et al. ( ). This index assigns portions of variance in the output to different parameters. If fixing a parameter causes a considerable decrease in the variance of model output, the sensitivity of the model to this parameter is apparently large. If fixing the parameter does not decrease the variance, then the parameter apparently has no effect on the model output. This index is suitable when the variance is a good measure of the variation of model output, in which case the index also can be used to quantify interaction effects. However, for some models the variance is not a good measure for output variation, for example if the output distribution is bimodal, or contains strong outliers. This has been demonstrated for two of our case study models (Ten Broeke et al. b,a). In Ten Broeke et al. ( a) it was shown that our first case study displays distinct modes of behaviour, which are separated by nonlinear boundaries in parameter space and are also dependent on initial conditions. As a result, the output is far from normally distributed, limiting the usefulness of variance-based sensitivity indices.
Several alternative indices are available that do not depend on the variance of the output distribution. Borgonovo ( ) introduced moment-free sensitivity indices that take into account the entire output distribution. Essentially, these indices are based on a comparison between the output distribution, and the conditional output distribution after fixing one parameter. This comparison is made for different values of the fixed parameter. If the fixed parameter is influential, both distributions will differ strongly on average. Entropy-based indices (Krzykacz-Hausmann ) are conceptually similar, but use an entropy-based measure to compare both distributions. Other possible alternatives include the DELSA method (Rakovec et al. ), risk-based measures (Tsanakas & Millossovich ), and Shapley effects (Molnar ). We believe any of these indices can be suitable for our method, although the reliance of DELSA on local derivatives is problematic for stochastic ABMs. A comparison between indices would be a useful topic of further study. For the present paper, we choose to use entropy-based indices (Krzykacz-Hausmann ) for their embedding in information theory.
The entropy H(Y) of output variable Y measures its unpredictability. Uncertain parameter values lead to output that is to some extent unpredictable. This unpredictability will decrease when parameter values are fixed. This decrease will be larger for parameters to which the model is strongly sensitive. Based on this idea, the entropy-based sensitivity is defined as

$$e_p = \frac{I(x_p, Y)}{H(Y)} = 1 - \frac{H(Y \mid x_p)}{H(Y)}.$$

Here H(Y) is the entropy of Y, and I(x_p, Y) is the 'mutual information' between parameter x_p and the output Y. If all information about Y is contained in x_p, then I(x_p, Y) = H(Y), so that e_p = 1. In this case, the model output would be exactly predictable upon learning the value of x_p. If, in contrast, learning the value of x_p does not yield any information on Y, then I(x_p, Y) = 0. Using the expression H(Y) = I(x_p, Y) + H(Y | x_p) yields the rightmost expression for e_p.
Here H(Y | x_p) is the conditional entropy, i.e., the entropy in Y that remains upon learning the value of x_p. Note that the remaining uncertainty of Y can depend on the specific value taken by x_p. Thus, the computation of H(Y | x_p) involves aggregation over the probability distribution of x_p. For the computation of H(Y) and H(Y | x_p) we used the R package by Hausser & Strimmer ( ). For further details regarding the computation of entropy-based sensitivity indices, we refer also to Krzykacz-Hausmann ( ).
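A minimal sketch of the index e_p = 1 − H(Y|x_p)/H(Y), estimated from binned samples. The paper uses the R entropy package by Hausser & Strimmer; the naive plug-in estimator below is a simplified stand-in, and the binning of parameter and output is an assumption.

```python
import math
from collections import Counter

def entropy(labels):
    """Plug-in entropy estimate (in nats) of a list of bin labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def entropy_index(xp_bins, y_bins):
    """e_p = 1 - H(Y | x_p) / H(Y): the fraction of output entropy
    removed by learning which bin of x_p a sample falls in."""
    hy = entropy(y_bins)
    if hy == 0:
        return 0.0
    # H(Y | x_p): entropy of Y within each x_p bin, weighted by bin size
    by_bin = {}
    for xb, yb in zip(xp_bins, y_bins):
        by_bin.setdefault(xb, []).append(yb)
    h_cond = sum(len(ys) / len(y_bins) * entropy(ys)
                 for ys in by_bin.values())
    return 1.0 - h_cond / hy
```

If Y is fully determined by the x_p bin the index is 1; if the two are independent it is (up to estimation noise) 0.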

The index e_p does not yield any information on interactions between x_p and other parameters. Such interactions can be relevant, because the effect of x_p may depend on the values of the other parameters. To assess parameter interactions, we propose the following index,

$$e^{*}_p = \frac{\mathbb{E}\left[ H(Y \mid x_{\sim p}) \right]}{H(Y)},$$

where H(Y | x_∼p) represents the entropy of Y after learning the values of all parameters except for x_p. Note that this entropy depends on the values at which those parameters are fixed, while its computation involves aggregation over the unknown parameter x_p. The mean in the equation is therefore computed over the probability distribution of the remaining parameters x_∼p. While e_p expresses the decrease in uncertainty about Y upon learning x_p, e*_p expresses the uncertainty about Y that is expected to remain as long as x_p remains unknown. The former excludes interactions with other parameters, while the latter includes all such interactions. Thus, e*_p should normally be equal to or larger than e_p. If, for example, both are zero, it implies that x_p has no effect on the model output. But if e_p = 0 and e*_p is positive, it means that x_p has no effect on the output by itself, but it does have an effect if changed in combination with other parameters. As for the computation of H(Y | x_p), for the computation of H(Y | x_∼p) we used the R package by Hausser & Strimmer ( ). To compute the expectation value we repeated the computation of H(Y | x_∼p) for randomly sampled values of x_∼p and computed the mean over these values.

Computing the above indices over a subset of parameter space that corresponds to a behavioural mode may cause dependencies between the model parameters. As a result, the effects of different parameters can become difficult to disentangle. For example, a parameter might be indicated as influential because it has a direct effect on the model output, but it may also be influential because it limits the range of a different parameter, which in turn affects the model output. We therefore check the correlations between parameters. If parameters are strongly correlated, there may be indirect effects like in the above example. To further account for such effects, we will compare the entropy-based indices to a sensitivity index based on the mean gradient of the output with respect to a parameter,

$$g_p = \mathbb{E}\left[ \frac{Y(x_{p,\max}) - Y(x_{p,\min})}{x_{p,\max} - x_{p,\min}} \right],$$

where x_{p,min} and x_{p,max} denote the minimum and maximum value of x_p, after fixing the other parameters. This index isolates the effect of individual parameters. Thus, if a parameter were to be strongly influential according to the entropy-based indices, but not according to the gradient-based index, then this indicates an indirect effect.
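The mean-gradient idea can be sketched as follows. This is an illustrative reading of the description above (a finite-difference slope between the ends of parameter p's range, averaged over random settings of the other parameters), not the paper's exact estimator.

```python
import random

def mean_gradient_index(f, p, bounds, n=200, seed=0):
    """For random settings of the other parameters, evaluate the
    (surrogate) output f at the low and high end of parameter p and
    average the resulting finite-difference slopes."""
    rng = random.Random(seed)
    lo_p, hi_p = bounds[p]
    total = 0.0
    for _ in range(n):
        x = [lo + (hi - lo) * rng.random() for lo, hi in bounds]
        x_lo = list(x); x_lo[p] = lo_p
        x_hi = list(x); x_hi[p] = hi_p
        total += (f(x_hi) - f(x_lo)) / (hi_p - lo_p)
    return total / n
```

Because only x_p is varied in each slope evaluation, the index reflects the parameter's direct effect, without indirect effects through correlated parameters.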
The computation of the above indices, especially the entropy-based indices e_p and e*_p (Krzykacz-Hausmann ), requires a very large number of model runs. We therefore use the SVR surrogate model to estimate the ABM output, to keep the computational costs within acceptable bounds.

Model Description
We study the following three test cases.

Bazykin-Berezovskaya model
The Bazykin-Berezovskaya model describes the interaction between a prey population with an Allee effect and a predator population. The populations are described by an ordinary differential equation model,

$$\frac{dY_1}{dt} = Y_1 (Y_1 - \zeta)(\kappa - Y_1) - Y_1 Y_2, \qquad \frac{dY_2}{dt} = \gamma Y_1 Y_2 - h Y_2,$$

where Y_1 represents the prey population density, Y_2 the predator population density, ζ the Allee threshold, κ the prey carrying capacity, γ the conversion factor from prey to predator, and h the predator mortality rate. The model contains a transcritical bifurcation in parameter space. On one side of this bifurcation the population always goes extinct, while on the other side it either goes extinct or converges to a steady state, depending on the initial conditions. In addition, it features a Hopf bifurcation beyond which the population oscillates, rather than going to a steady state. For a more detailed analysis of these bifurcations and a standard global sensitivity analysis of the model, we refer to Ten Broeke et al. ( a). In that paper, it was shown that the presence of bifurcations in parameter space affects the global sensitivity analysis outcomes, and can lead to misleading results.
In this paper we show how our approach based on surrogate models can be used to deal with these tipping points in a sensitivity analysis. We consider whether or not the populations survive as distinct behavioural modes. In the first part of the analysis we will investigate under what conditions extinction occurs, and which parameters are involved. In the second part, we will focus on the region of parameter space where the populations survive.
As main output we consider the population density Y_2, averaged between time-steps t = 1000 and t = 5000 to decrease the effect of oscillations on the output. We use the same parameter ranges as in Ten Broeke et al. ( a) (see Appendix D). This means that to demonstrate the method we vary the parameters ζ and h and the initial condition Y_2(0), while keeping other parameters fixed.
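For illustration, the model can be integrated with a simple Euler scheme. The right-hand sides below assume the standard Bazykin-Berezovskaya form, with a cubic Allee term for the prey and linear predator mortality; this matches the parameter roles listed above but is a reconstruction, not necessarily the paper's exact equations, and the step size is an arbitrary choice.

```python
def bazykin_berezovskaya(y1, y2, zeta, kappa, gamma, h, dt=0.01, steps=20000):
    """Euler integration of the assumed system
    dY1/dt = Y1 (Y1 - zeta)(kappa - Y1) - Y1 Y2,
    dY2/dt = gamma Y1 Y2 - h Y2.
    Returns the final population densities."""
    for _ in range(steps):
        dy1 = y1 * (y1 - zeta) * (kappa - y1) - y1 * y2
        dy2 = gamma * y1 * y2 - h * y2
        y1 = max(y1 + dt * dy1, 0.0)  # densities cannot go negative
        y2 = max(y2 + dt * dy2, 0.0)
    return y1, y2
```

With the predator absent, this sketch reproduces the Allee effect: prey starting below the threshold ζ goes extinct, while prey starting above it approaches the carrying capacity κ.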

Resource consumer ABM
The resource consumer ABM was previously published as a test-case for sensitivity analysis methodologies for ABMs (Ten Broeke et al. b). The model is relatively simple, while still possessing essential characteristics that complicate sensitivity analysis for ABMs. The model consists of a square grid of sites on which resource grows and diffuses. Agents can move between sites and harvest from sites. Through these actions, agents compete for the resource, which they need to survive, to act, and to procreate.
The original publication (Ten Broeke et al. b) includes a one-factor-at-a-time sensitivity analysis, regression-based global sensitivity analysis, and model-free global sensitivity analysis. It was concluded that the global methods yielded little insight into model behaviour because tipping points where the population goes extinct are ignored by these methods (Ten Broeke et al. a). In the following we show that more insight is gained by considering survival and extinction as distinct behavioural modes in our procedure. In the first part of the analysis we will investigate under what conditions extinction occurs, and which parameters are involved. In the second part, we will focus on the region of parameter space where the population survives. As main output we consider the number of agents, averaged between time-steps t = 500 and t = 1000. This averaging was done to allow for the model to converge to steady state, and to average out stochastic fluctuations. The considered parameter ranges are given in Appendix D. Both the considered output and parameter ranges are the same as those used in the previous analysis (Ten Broeke et al. b).
Fishery ABM

The fishery ABM was published with the aim to 'evaluate the effects of social factors and bounded rationality on macro-level outcomes of fisheries' (Libre et al. ). The spatial component of the model consists of a square grid of patches, on which fish grows and diffuses. Fishing companies, which are modelled as agents, decide yearly whether to buy or sell vessels. Fishing vessels, also modelled as agents, decide daily whether and where to fish. The behaviour of the companies and vessels is based on data from interviews with Filipino fishers. Default parameter values are based on the same interview data, as well as reports from the Philippine Bureau of Fisheries and Aquatic Resources and the Western and Central Pacific Fisheries Commission.
Previously, a one-factor-at-a-time sensitivity analysis was performed in which parameters were varied individually over a wide range (Libre et al. ). No global sensitivity analysis was performed. For the global sensitivity analysis in the present paper, we vary all parameters over the same ranges as used for the one-factor-at-a-time sensitivity analysis (Table ). We use uniform distributions, except for some parameters whose sum must equal one. For those parameters, we used a Dirichlet distribution to vary the individual parameters while keeping their sum constant. As main outputs we consider the total yearly profit of fishing companies and the total biomass of fish in the system. As in the original publication, we simulate a time span of years per model run.
To remove stochastic fluctuations, the output of each run is averaged over the second half of the simulation, by which time the model has converged.
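The Dirichlet construction mentioned above can be sketched in a few lines. This is an illustrative sketch rather than the authors' code; the parameter name and range are hypothetical, and NumPy's default generator is assumed.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000

# Unconstrained parameters: independent uniform draws over their ranges
# (the range below is a hypothetical placeholder, not taken from the paper).
fuel_cost = rng.uniform(0.5, 1.5, size=n_samples)

# Weight parameters that must sum to one: one Dirichlet draw per sample.
# alpha = ones gives a uniform distribution over the simplex.
n_weights = 4
weights = rng.dirichlet(alpha=np.ones(n_weights), size=n_samples)

# Every row of `weights` sums to one, so the constraint always holds.
assert np.allclose(weights.sum(axis=1), 1.0)
```

Sampling the weights jointly in this way avoids the bias introduced by drawing weights independently and renormalising.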

Bazykin-Berezovskaya model
Performance of method

Both the test set and the training set consist of sample points drawn using replicated Latin hypercube sampling. Since the model is deterministic, no replicates were used. Bifurcation analysis was used to determine analytically for which sample points the population goes extinct. This analytical outcome was used to train an SVM surrogate model to predict whether extinction occurs for sample points in the test set. We assess the performance of the SVM by its F1-score (Sasaki ) (Appendix B). A score of 1 would correspond to all test samples being classified correctly. Based on the training set of sample points, the surrogate model achieves an F1-score of . . Thus, even without adaptive sampling the surrogate model performs well.
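The classification step can be illustrated with a small sketch. This is not the authors' implementation; it uses scikit-learn's `SVC` on a synthetic stand-in for the extinction outcome, and the parameter ranges, threshold, and hyperparameters are all hypothetical.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the model: "survival" (label 1) occurs when a
# nonlinear combination of two parameters exceeds a threshold.
def survives(X):
    return (X[:, 0] - 0.5 * X[:, 1] ** 2 > 0.2).astype(int)

X_train = rng.uniform(0.0, 1.0, size=(400, 2))
X_test = rng.uniform(0.0, 1.0, size=(200, 2))

# Gaussian-kernel SVM, as in the paper; C and gamma would normally be tuned.
svm = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train, survives(X_train))

# F1-score on the held-out test set; a score of 1 means every test
# sample is classified correctly.
score = f1_score(survives(X_test), svm.predict(X_test))
print(round(score, 2))
```

For a smooth, well-sampled boundary like this one, the score is close to 1, mirroring the good surrogate performance reported above.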
For the sample points with a non-extinct population we train an SVR surrogate model to predict the population density Y. Plotting the SVR predictions against the test set output reveals a good match (Figure ). We measure the performance of the SVR by the coefficient of prognosis (Most & Will ) (Appendix B) between the model output and the SVR prediction. A coefficient of prognosis of 1 means that both models generate exactly the same output, while a value of 0 means that there is no correlation between the outputs. Based on the training set of sample points, we obtain a coefficient of prognosis of . . Thus, the SVR accurately predicts the population density, and no sequential sampling is needed.
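The regression step and the coefficient of prognosis can be sketched in the same spirit. Again a hedged illustration with scikit-learn on a synthetic stand-in output, not the actual model; the test function and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Smooth synthetic stand-in for the population density in the surviving regime.
def density(X):
    return np.exp(-X[:, 0]) * (1.0 + np.sin(3.0 * X[:, 1]))

X_train = rng.uniform(0.0, 1.0, size=(300, 2))
X_test = rng.uniform(0.0, 1.0, size=(200, 2))

svr = SVR(kernel="rbf", C=100.0, epsilon=0.01).fit(X_train, density(X_train))
pred = svr.predict(X_test)

# Coefficient of prognosis: squared linear correlation between model output
# and surrogate prediction on an independent test set (1 = perfect, 0 = none).
cop = np.corrcoef(density(X_test), pred)[0, 1] ** 2
print(round(cop, 3))
```

Evaluating the correlation on a held-out test set, rather than on the training data, is what distinguishes the coefficient of prognosis from an ordinary in-sample fit measure.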

Classification into behavioural modes
In total, out of sample points in the test set maintain a positive population. The SVM predicts for which parameter settings the population goes extinct. Figure shows the decision boundary as a function of ζ and h for y 0 = 0.1. The population survives when h is sufficiently large and ζ is sufficiently small. Based on the values of the sensitivity indices S p (Table ), ζ is the most important parameter for the survival of the population, but h and, to a lesser extent, Y 2 (0) also influence survival.

Analysis of behavioural mode with surviving population
The SVR surrogate model predicts the population size for parameter settings where the population survives.
The results show that when considering only the region of parameter space with a positive population, h is clearly the most influential parameter. This is true for both the first-order indices and the total-order indices.
In contrast, when considering the entire parameter space, h and ζ are approximately equally influential. This reveals that the influence of ζ on the population size is largely explained by its influence on the tipping point where the population goes extinct. The variance-based index S p shows ζ to be more influential than h, seemingly contradicting the values of e p and e p,full . It has been shown previously that variance-based indices overestimate the contribution of ζ compared to other indices; for a detailed discussion, we refer to Ten Broeke et al. ( a). In short, increasing ζ leads to increased output in all parts of parameter space, whereas increasing h increases the output in some parts but decreases it in others. Variance-based indices tend to be larger for parameters that have a consistent effect throughout parameter space. Thus, which parameter is the most influential depends on the definition of 'influential' that is used.
Note that the values of g p , which separate out the effects of parameter dependencies and are reported in Appendix E, confirm that h is the most influential parameter. The initial condition Y 2 (0) can drive the population to extinction, but has no effect on the size of surviving populations, as indicated by the values of g p . This is consistent with the existence of a global bifurcation in this model (Van Voorn et al. ), which acts as a separator between two domains of attraction; depending on the initial condition, the population hence persists or goes extinct. Table : Sensitivity indices of the Bazykin-Berezovskaya model. Entropy-based indices are reported both over the full range of all parameters (e p,full and e * p,full ) and over the range for which a positive population is maintained (e p and e * p ). There are differences between the sensitivity indices calculated for the behavioural mode of interest, for the full parameter space, and for the classification.

Performance of method
The test set for the resource-consumer ABM consists of sample points. For each point, we performed replicate runs to assess the effect of stochasticity. This sample was drawn using a replicated Latin hypercube design of cubes with sample points each. For a number of samples the population goes extinct. Stochasticity has little effect on the occurrence of extinction, since . % of replicate pairs show the same outcome. An SVM surrogate model was trained to predict where in parameter space extinction occurs. We assess the performance of the SVM by its F1-score (Sasaki ) (Appendix B). A score of 1 would correspond to all test samples being classified correctly. For a replicated Latin hypercube training sample of size , a score of . is achieved.
Of the variance in population size, . % is explained by stochasticity, and thus cannot be captured by the SVR surrogate model. We measure the performance of the SVR by the coefficient of prognosis (Most & Will ) (Appendix B) between the ABM output and the SVR predictions, both for replicated Latin hypercube sampling and for adaptive sampling. A coefficient of prognosis of 1 means that both models generate exactly the same output, while a value of 0 means that there is no correlation between the outputs. To compare adaptive sampling to random sampling, we start with a training set of sample points, and expand this set using either sampling method. The results show that adaptive sampling leads to a faster increase in the coefficient of prognosis (Figure ). Note that for both methods the number of samples used for training is smaller than the number of drawn samples, because some of the drawn sample points result in extinction and are therefore not used for training the SVR. For adaptive sampling, however, fewer points need to be discarded, since the surrogate model helps to avoid sample points outside the region of interest. If only samples with a non-extinct population were included, the difference between both sampling methods would be smaller. For a training sample size of runs, we obtain a coefficient of prognosis of . . The SVR predictions for the test set are plotted against the true values in Figure . While there are some deviations, especially at the extremes, the predictions generally match the ABM output well. Figure : Surrogate model performance as a function of the number of drawn samples. For adaptive sampling (red squares), the coefficient increases faster than for replicated Latin hypercube sampling (black circles). For both sampling methods the coefficient of prognosis was computed with respect to the same fixed test sample, which was drawn using replicated Latin hypercube sampling (Section . ).
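A replicated Latin hypercube design of the kind used here can be sketched with SciPy's quasi-Monte Carlo utilities. This is an illustrative sketch, not the sampling code used in the paper; the cube count, sizes, and parameter ranges are made up.

```python
import numpy as np
from scipy.stats import qmc

def replicated_lhs(n_cubes, n_per_cube, lower, upper, seed=0):
    """Stack several independent Latin hypercubes, so that each replicate
    individually stratifies every parameter range."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    cubes = []
    for i in range(n_cubes):
        sampler = qmc.LatinHypercube(d=len(lower), seed=seed + i)
        # Scale the unit-cube sample to the parameter ranges.
        cubes.append(qmc.scale(sampler.random(n_per_cube), lower, upper))
    return np.vstack(cubes)

# Hypothetical two-parameter example: 10 cubes of 50 points each.
X = replicated_lhs(n_cubes=10, n_per_cube=50, lower=[0.0, 1.0], upper=[1.0, 5.0])
print(X.shape)  # → (500, 2)
```

Using independent cubes rather than one large hypercube makes it possible to estimate the effect of sampling variability, which is the role of the replicates mentioned above.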

Classification into behavioural modes
Out of sample points in the test set, maintain a positive population. The SVM defines a decision boundary in parameter space beyond which the population is predicted to go extinct. This decision boundary may be visualised for any pair of ABM parameters, while keeping all other parameters fixed, similar to a bifurcation diagram (Figure ). Visualisation of the decision boundary is not possible when varying all parameters simultaneously, but a parallel coordinates plot (Inselberg et al. ) could be used to visualise which parameter combinations lead to different behavioural modes. Instead, we report the sensitivity indices S p (Table ). The indices show that the conversion efficiency, maximum resource harvest size, and energy cost of harvesting strongly affect the occurrence of extinction. All of these are bio-physical parameters involved directly in the process of harvesting resource.

Analysis of behavioural mode with surviving population
The entropy-based sensitivity indices have been estimated based on additional sample points for which the output was obtained using the surrogate model. These indices show which parameters strongly affect the size of the population if it survives. Considering only samples with surviving populations introduces correlations between the input parameters, since certain parameter combinations are more likely to lead to extinction. As discussed in the methods section, these correlations can make it difficult to disentangle the effects of different parameters in the entropy-based indices. For our model, the energy cost of harvesting is positively correlated with both the conversion efficiency and the maximum resource harvest size. Other parameter pairs are not as strongly correlated; these pairs all have correlation coefficients with an absolute value below . (Appendix E). For the conversion efficiency, the correlation seems to lead to a low value of e p . The value of g p shows that this parameter is influential not only on the classification, but also on the population size. The sensitivity indices show that the resource growth rate clearly has the strongest effect on population size. Thus, while this parameter has a relatively small effect on the survival of the population, it has a major effect on the size of the population if it survives. The maximum harvest size influences both the survival of the population and the population size. Increasing this parameter increases the chances of survival of the population, but decreases the population size. That is, if agents can harvest more during a time step, this generally leads to a smaller population, but it decreases the chances of extinction. Table : Sensitivity indices of the resource-consumer ABM. Entropy-based indices are reported both over the full range of all parameters (e p,full and e * p,full ) and over the range for which a positive population is maintained (e p and e * p ). To improve legibility, values of . or smaller have been omitted, values above . have been printed in bold, and values above . have been underlined. All indices were computed using a total training sample size of for the surrogate models, obtained through adaptive sampling. The differences between e p and e * p indicate that there are large interaction effects for some parameters. This translates into a nonlinear boundary between the regions of extinction and persistence, as depicted in Figure .
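The way conditioning on a behavioural mode induces parameter correlations can be demonstrated with a toy example. The survival criterion below is entirely hypothetical, chosen only to make the effect visible; the parameter names echo, but do not reproduce, those of the resource-consumer model.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
cost = rng.uniform(size=n)        # stand-in for "energy cost of harvesting"
efficiency = rng.uniform(size=n)  # stand-in for "conversion efficiency"

# The inputs are sampled independently, so they are (nearly) uncorrelated.
corr_full = np.corrcoef(cost, efficiency)[0, 1]

# Hypothetical survival criterion: the population survives only when the
# efficiency exceeds the cost. Conditioning on survival then induces a
# positive correlation between the two parameters.
alive = efficiency > cost
corr_alive = np.corrcoef(cost[alive], efficiency[alive])[0, 1]

print(abs(corr_full) < 0.05)  # → True
print(corr_alive > 0.3)       # → True
```

This is precisely why the entropy-based indices computed over the surviving mode must be interpreted with the correlation matrices of Appendix E in hand.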
Comparison between the indices S p , e p and e p,full reveals that the values of e p and e p,full generally show a similar pattern. The same is true for the total-order indices e * p and e * p,full . Thus, for the number of agents as output, the results of the analysis are similar when applying the method over the full parameter space versus only the region that supports a population. The main contribution of our approach is that it also distinguishes which parameters determine whether the population will go extinct. For example, the agent mortality coefficient strongly influences whether the population goes extinct, but not the population size. In contrast, the resource growth rate is the most influential parameter on the population size, but has little effect on the occurrence of extinction.

Fishery ABM
SVR was used to predict both the fish stock and the combined yearly profit of fishing companies. We use the same sample points for the test set as for the SVM. For both outputs, stochasticity explains less than % of the variance. Based on the same training samples that were used for the SVM, the SVR performance is very good for both outputs, as measured by a coefficient of prognosis of . for fish stock and . for total profit. The performance is visualised by plotting the SVR predictions against the ABM output (Figures and ).

Classification into behavioural modes
We use the SVM to examine which parameters determine whether or not the fishery still exists after years.
In total, of sample points in the test set maintain a positive number of fishing vessels. The estimated sensitivity indices (Table ) reveal that no single parameter is as influential as some of the parameters in the other case studies. We could not find a good visualisation of the classification, since no single parameter has a strong enough effect for a clear visualisation. The indices show that the weight of available sites in P(entry) (and to some extent the other weights involved in P(entry)), the number of potential entrants, and the maximum amount of fish are relatively important parameters. Except for the maximum amount of fish, all of these parameters relate to new fishing companies starting up. Thus, controlling the influx of new companies is important for sustaining the fishery. This influx ensures that when companies go bankrupt, new companies are available to replace them.

Stock and profit
For the model runs where the fishery still exists after years, there are no strong correlations between parameters in the training or test set, which enables us to easily identify the effects of each individual parameter. For fish stock, the maximum amount of fish is clearly the most influential parameter. The value e p = 0.48 indicates that a substantial part of the uncertainty in model predictions is attributed solely to this parameter. Furthermore, the value of e * p = 0.80 indicates that the parameter is also involved in interaction effects. A number of other model parameters are also involved in interactions. Both economic parameters, such as the base skipjack price and the fuel cost per liter, and ecological parameters, such as the fish growth rate, are relevant. For a number of parameters, e * p = 0. This indicates that those parameters do not contribute to output uncertainty, even when considering interaction effects.
Note that parameters that strongly affect the classification (as measured by the indices S p ) may not be influential on predictions of stock or profit, or vice versa. For example, while parameters related to the entry of new fishing companies are among the most important parameters for the existence of the fishery after years, these parameters have little effect on stock or profit if the fishery persists. In contrast, the skipjack price strongly affects the profit of the fishery if it continues to exist, but has little effect on whether or not it does continue to exist. The only parameter that is influential on all the considered outputs is the maximum amount of fish in the system. Differences between e p and e p,full are minor. Thus, generally the same parameters are influential for both the full parameter space and the region where extinction does not occur. But our method yields additional information over standard global sensitivity analysis, since it also shows which parameters affect the continued existence of the fishery, as measured by the indices S p . Table : Sensitivity indices of the fishery ABM. Entropy-based indices are reported both over the full range of all parameters (e p,full and e * p,full ) and over the range for which a positive number of vessels is maintained (e p and e * p ). To improve legibility, values of . or smaller have been omitted, values above . have been printed in bold, and values above . have been underlined. All indices were computed using a training sample size of for the surrogate models.

Conclusions and Discussion
In this paper, we introduced a sensitivity analysis approach that uses surrogate models to distinguish between different behavioural modes. We demonstrated the approach using three case studies. For all three case studies, the SVM and SVR surrogate models yield good fits with the test data. Furthermore, our method yields insight into the behaviour of the models. For the Bazykin-Berezovskaya model, the method shows that different parameters affect the survival of the population and the population density. This case also shows that performing sensitivity analysis over the region of parameter space with a surviving population can yield different outcomes from performing a standard global sensitivity analysis over the entire parameter space. For the resource-consumer ABM, a variance-based global sensitivity analysis was previously performed. Our procedure improves upon this analysis by distinguishing between runs with qualitatively different behaviour, which cannot be achieved by standard variance-based sensitivity analysis. Using our procedure, we can distinguish between parameters that impact the qualitative behaviour of the ABM and parameters that influence the quantitative outcomes, such as the size of the population. Thus, our method gives additional information, because it shows that different parameters are important for the quantitative and qualitative outcomes. For example, the parameters for the conversion efficiency, mortality coefficient, and maximum harvest in the resource-consumer model strongly influence the occurrence of extinction, but are relatively less influential on the quantitative outcomes (Table ). The same holds true in the fishery model for the four parameters related to the entry of fishing companies (Table ). This information could not have been obtained using standard variance-based sensitivity analysis, because that approach does not distinguish between different qualitative behaviours.
Thus, our approach is especially valuable if one aims to assess, in addition to the quantitative model outcomes, transitions between different types of behaviour and which parameters influence these transitions. We therefore believe that our approach presents one step towards addressing the need for methodologies for the analysis of ABMs (Macal ; Filatova et al. ; Crooks et al. ).
In our examples, the behavioural modes are separated by tipping points beyond which the system goes extinct, but any criterion of interest to the model user could be substituted. In some cases, such as the second and third case studies, the parameters that influence the quantitative behaviour within a behavioural mode may be the same as those that are influential in a standard variance-based sensitivity analysis. When comparing these indices, it is relevant to consider the sample size corresponding to the behavioural mode of interest relative to the total sample size. Other cases, such as the first case study, display clear differences between the sensitivity indices. Such differences indicate that the parameter sensitivities vary depending on the region of parameter space. Note that in particular the first case study has strong non-linearities, which may explain such differences in the sensitivity indices.
We have used entropy-based sensitivity indices to express parameter importance. This includes the index e p introduced by Krzykacz-Hausmann ( ) and the index e * p , which is based on the index e p but was not previously used in this form. The more commonly used variance-based indices are not applicable in the case of parameter dependencies, because the normalisation breaks down. Entropy-based indices provide a good alternative. Furthermore, they have the added advantage that they do not depend only on the second moment of the distribution, but on its entire shape. In addition, we have used gradient-based indices, which can separate the direct effect of a parameter on the output from the indirect effect of limiting the range of other parameters when a subregion of parameter space is considered. Alternative sensitivity indices or visualisation methods could still be explored to express parameter importance. Our approach in this paper is mostly aimed at global analyses in which all parameters are varied simultaneously. Compared to local methods, global methods have much higher computational costs, which makes the training of a surrogate model worthwhile. An overview of global sensitivity analysis methods is given by Iooss & Lemaître ( ). Since ABMs are often strongly non-linear, indices based on Pearson correlation coefficients or standardised regression coefficients are not suitable. The indices proposed by Borgonovo ( ) could be suitable, and have a similar interpretation to the entropy-based indices in terms of first- and total-order effects. Other alternatives include risk-based measures (Tsanakas & Millossovich ) and the DELSA method (Rakovec et al. ). Scatterplots can be useful, but are often difficult to interpret if the number of model parameters is large. Parallel coordinate plots can be a valuable tool for visualising the classification outcomes, or, for example, the % largest simulation outcomes.
Likewise, methods to increase the 'interpretability' of machine learning models (Molnar ) could also be applied. Methods like permutation feature importance and Shapley values look especially promising, and could likely be used to generate indices similar to first- and total-order indices. To evaluate the advantages and disadvantages of these different methods in more detail, a direct comparison would be needed.
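As a rough sketch of how an entropy-based first-order index of this kind might be estimated, the snippet below computes a binned mutual information between one parameter and the output, normalised by the output entropy. This is a generic plug-in estimator for illustration, not the exact estimator used in the paper, and the test function is synthetic.

```python
import numpy as np

def discrete_entropy(p):
    # Shannon entropy of a discrete distribution, in nats.
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def entropy_index(x, y, bins=20):
    # Binned estimate of I(X_p; Y) / H(Y): the fraction of output entropy
    # removed by knowing parameter p (a first-order effect).
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint = joint / joint.sum()
    h_x = discrete_entropy(joint.sum(axis=1))
    h_y = discrete_entropy(joint.sum(axis=0))
    h_xy = discrete_entropy(joint.ravel())
    return (h_x + h_y - h_xy) / h_y

rng = np.random.default_rng(2)
x1 = rng.uniform(size=20_000)   # influential parameter
x2 = rng.uniform(size=20_000)   # inert parameter
y = np.sin(2.0 * np.pi * x1) + 0.1 * rng.normal(size=20_000)

print(entropy_index(x1, y) > entropy_index(x2, y))  # → True
```

Because the index compares distributions rather than variances, it remains well defined under the parameter dependencies discussed above, which is the motivation for preferring entropy-based over variance-based indices here.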

Previously, we compared several standard methods of sensitivity analysis based on application to the same resource-consumer model discussed in this paper (Ten Broeke et al. b). One of the main conclusions was that one-factor-at-a-time sensitivity analysis, in which parameters are varied individually across a large range, is a good starting point for sensitivity analysis of ABMs. Compared to the method discussed here, one-factor-at-a-time sensitivity analysis is computationally cheaper. Furthermore, the method shows causal effects of changes in individual parameters on the model output, and can reveal tipping points. The disadvantage is that interaction effects are not taken into account. The approach in this paper does assess interaction effects by performing a more extensive exploration of parameter space. Note, however, that the global effect of a parameter can differ strongly from its local effect around a specific point in parameter space. For example, the sensitivity indices of the fishery model show that the diffusivity of fish has only a minor effect on the continued existence of the fishery. Nevertheless, in some parts of parameter space changes in this parameter will cross a tipping point leading to disappearance of the fishery, as was noted in the original publication (Libre et al. ). Thus, the method presented here is not a substitute for local one-factor-at-a-time sensitivity analysis. Both methods give different kinds of information, and can be used to complement each other.
The present methodology requires some prior information about the model behaviour to make a classification and start the analysis. Performing an OFAT analysis is one way to obtain information about, e.g., tipping points that function as boundaries between different behavioural modes (Ten Broeke et al. b). An alternative approach is to use a clustering procedure, as was done by Edali & Yücel ( ). This would make the method applicable without previous knowledge about model behaviour. Which of these approaches is the most suitable depends on the aims of the researcher. The approach in this paper is suitable if one is interested in specific behavioural modes, while a clustering procedure would be suitable if one wants to explore what behavioural modes the model can produce.

The present paper does not yet cover the temporal dynamics of the method. Computing sensitivity indices as a function of time, e.g., by using a sliding window with a mean value across a small time interval, can give insight into the dynamics of ABMs (Ligmann-Zielinska & Sun ), and this is a possible extension of the method. Machine learning techniques in particular may be helpful for gaining insight into these dynamics, because they can allow us to look in more detail at simulated sample paths (Nelson ). While the present paper does not yet address this, by separating parameter space into regions it does present one step towards using more detailed analysis methods to gain additional insight into ABM behaviour. Furthermore, using surrogate models ensures that this additional insight over standard methods is gained at relatively low computational costs.

Appendix A: Support Vector Machines and Support Vector Regression
Classification

Support vector machines (SVMs) are a machine learning tool used to classify output data. While the method can be extended to deal with a larger number of categories, here we describe it for two categories, marked as −1 and 1. The classification is based on the following decision criterion. Here, h(x) is the SVM classification, and the vector x contains any number of features to be used for classification. The vector w assigns a weight to each feature, and b is a constant.
To determine optimal values for w, we need a set of training samples for which both the features and the classification are known. The weights are set such that the SVM correctly predicts the classification of as many training samples as possible. This is achieved by solving the following optimisation problem, with constraints for i = 1, ..., n.
Here x i denotes the features of sample i, and y i the corresponding classification. It can be shown that the minimisation of the weights w ensures that the margin between the decision boundary (Equation ) and the training samples is maximised. The constraints ensure correct classification according to Equation .
Since correct classification of all samples may be impossible, slack variables ξ i are introduced. If ξ i > 0, sample i can be misclassified while still meeting the minimisation constraints. However, this misclassification is penalised in the minimisation term. The regularisation parameter C weighs the penalty for slack variables against the minimisation of w. Large values of C strongly penalise misclassification, and may lead to overfitting. Small values of C avoid overfitting, but may lead to decreased classification accuracy.
For many applications, the linear decision criterion of Equation is not suitable. An advantage of SVMs is that the linear criterion can be replaced by a number of alternative criteria. This replacement is based on the dual form of the mathematical model specified by Equations and . This dual form is written as (Ng ), where a i are Lagrange multipliers. Solving the dual problem yields the same solution as solving the primal problem (Equation ). Using the dual form, it can be shown that the decision criterion depends on the inner product x i , x , rather than on x itself. Therefore, this inner product can be replaced by a Kernel function K(x i , x) = φ(x i ) T φ(x), where φ(x) represents a feature mapping of x. A necessary and sufficient condition for this replacement is that K(x i , x) is symmetric positive semi-definite. Furthermore, the Kernel function should measure the similarity between x and x i . In this paper we use a Gaussian Kernel function. The value of K(x i , x) ranges between 0 and 1. If K(x i , x) = 1, both samples are identical, and K(x i , x) asymptotically decreases to zero as the distance between the samples increases.
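The Gaussian Kernel described above can be written in a couple of lines. The bandwidth parameter gamma below is an assumed placeholder, not a value from the paper.

```python
import numpy as np

def gaussian_kernel(xi, x, gamma=1.0):
    # K(xi, x) = exp(-gamma * ||xi - x||^2): equal to 1 for identical
    # samples, decaying asymptotically to 0 as the distance grows.
    xi, x = np.asarray(xi, dtype=float), np.asarray(x, dtype=float)
    return float(np.exp(-gamma * np.sum((xi - x) ** 2)))

print(gaussian_kernel([1.0, 2.0], [1.0, 2.0]))  # identical samples → 1.0
print(gaussian_kernel([0.0, 0.0], [3.0, 4.0]) < 1e-6)  # distant samples → True
```

This kernel is symmetric positive semi-definite, so it satisfies the condition for replacing the inner product stated above.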

Regression
While an SVM is used to classify output data, support vector regression (SVR) is used to predict real-valued output data. This prediction is given as a weighted sum of Kernel evaluations, where the Kernel is used in the same way as for the SVM. To each sample x i , a weight w i is assigned. These weights are optimised when the prediction is fitted to the training data by solving the following optimisation problem. The constraints ensure that if the difference between the prediction and the actual output y i of the training data is smaller than ε, the sample does not contribute any cost to the objective function. If the difference is larger than ε, then ξ i or ξ * i will be positive and will contribute to the objective function. Similar to the SVM, we can write the dual form of the mathematical model specified by Equation and choose a Kernel function.
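The ε-insensitive cost described above can be sketched directly; ε = 0.1 is an arbitrary illustrative choice.

```python
import numpy as np

def eps_insensitive_cost(y_true, y_pred, eps=0.1):
    # Residuals inside the eps-tube contribute zero cost; outside the tube
    # the cost grows linearly with the excess deviation.
    return np.maximum(np.abs(np.asarray(y_true) - np.asarray(y_pred)) - eps, 0.0)

print(eps_insensitive_cost(1.0, 1.05))                   # inside the tube → 0.0
print(round(float(eps_insensitive_cost(1.0, 1.3)), 2))   # outside the tube → 0.2
```

The flat region of this cost is what allows many training samples to drop out of the SVR solution entirely, leaving only the support vectors with non-zero weights.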

Appendix B: Performance Measures
We assess the performance of the SVM surrogate models by the F1-score (Sasaki ). The score is defined as the harmonic mean of the precision and recall, where the precision is the fraction of predicted positives that are true positives, and the recall is the fraction of actual positives that are correctly predicted, with f n denoting the number of false negatives. By considering both precision and recall, the F1-score yields a balanced performance measure, even when the numbers of positives and negatives differ strongly.
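The F1-score computation can be sketched from the confusion counts, where tp, fp, and fn denote the numbers of true positives, false positives, and false negatives.

```python
def f1_from_counts(tp, fp, fn):
    # Harmonic mean of precision (tp / (tp + fp)) and recall (tp / (tp + fn)).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

# With precision = recall = 0.8, the harmonic mean is also 0.8.
print(round(f1_from_counts(tp=8, fp=2, fn=2), 3))  # → 0.8
```

Because the harmonic mean is dominated by the smaller of the two terms, a classifier cannot achieve a high F1-score by doing well on only precision or only recall.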
The performance of the SVR surrogate model is assessed by the coefficient of prognosis (Most & Will ). It is defined based on the correlation between the ABM output Y(x) and the corresponding SVR prediction f(x), both measured over test data that is independent from the training data used to train the SVR. A value of 1 means that the ABM output and SVR predictions are exactly equal. A value of 0 means that there is no correlation between them.

Table : For each parameter in the fishery model, the units and range used in the sensitivity analysis are listed here. We have used uniform distributions for each parameter, except for the different weights. For the weights, we used Dirichlet distributions to ensure that their sum equals one. The parameters include the weights of the capital to cost ratio, buying ratio, available sites, and entrepreneurship in P(buy); the weights of catch and earning vessels in P(sell); the weights of the buying ratio, catch information, and available sites in P(entry); the starting capital (USD); the investment costs of a vessel (USD); the annual fixed cost per vessel (USD year −1 ); the fuel cost per liter (USD l −1 ); the base skipjack price (USD); the price to catch coefficient (USD ton −1 ); and the diffusivity of fish.

Appendix E: Parameter Correlations for Behavioural Mode of Interest
Considering only a specific behavioural mode, rather than the whole parameter space, introduces dependencies between parameters. These dependencies can influence the computed sensitivity indices, because one parameter can have an indirect effect on the output through influencing the range of other parameters. Here, we provide the correlation matrices of the model parameters for the Bazykin-Berezovskaya model and the resource-consumer ABM. The most important results are discussed in the main paper. The values here are provided so that readers can verify these results themselves. The matrices have been computed on the test samples of the corresponding case studies. For the fishery model, we do not give the full matrix, due to the large number of parameters. For this model, all of the correlations are below . , except for the correlation between the target ratio of catch to capacity and the weight of available sites in P(entry). The correlation between these two parameters equals . .

Table : Correlation matrix for the behavioural mode of interest of the resource-consumer ABM. Indices with an absolute value above . have been printed in bold.