Agent-based Social Simulation of the Covid-19 Pandemic: A Systematic Review

When planning interventions to limit the spread of Covid-19, the current state of knowledge about the disease and specific characteristics of the population need to be considered. Simulations can facilitate policy making as they take prevailing circumstances into account. Moreover, they allow for the investigation of the potential effects of different interventions using an artificial population. Agent-based Social Simulation (ABSS) is argued to be particularly useful as it can capture the behavior of and interactions between individuals. We performed a systematic literature review and identified 126 articles that describe ABSS of Covid-19 transmission processes. Our review showed that ABSS is widely used for investigating the spread of Covid-19. Existing models are very heterogeneous with respect to their purpose, the number of simulated individuals, and the modeled geographical region, as well as how they model transmission dynamics, disease states, human behavior, and interventions. To this end, a discrepancy can be identified between the needs of policy makers and what is implemented by the simulation models. This also includes how thoroughly the models consider and represent the real world, e.g. in terms of factors that affect the transmission probability or how humans make decisions. Shortcomings were also identified in the transparency of the presented models, e.g. in terms of documentation or availability, as well as in their validation, which might limit their suitability for supporting decision-making processes. We discuss how these issues can be mitigated to further establish ABSS as a powerful tool for crisis management.


Introduction
Since the Covid-19 disease was first identified in December 2019, it has spread almost around the entire world and become a global pandemic causing approximately cases and deaths within the first month (Dong et al. ). To contain the spread of the responsible SARS-CoV-2 virus, different strategies were adopted worldwide, which then had to be revised in accordance with new insights. Moreover, interventions that were found to be successful in some regions were less expedient in others (Chu et al. ; Hale et al. ). This can be due to, for instance, socio-demographic characteristics of the population, cultural differences in the way of life or individual behavior, and country-specific variations in infrastructure and medical care capacities (Pullano et al. ).
Since the outbreak of the pandemic, many simulation studies have been conducted to investigate different aspects of the disease spread, the resulting hospital occupancy, or economic effects (Currie et al. ; Nicola et al. ). Many of these studies were based on traditional mathematical macromodels that are not capable of simulating individual behavior (Shinde et al. ). In contrast, the microscopic simulation of individual .
In order to better understand how ABSS can be used in pandemics, we performed a systematic review to identify agent-based models of Covid-19 transmission, which can be used to investigate the introduction, management, or removal of different interventions. We provide an analysis of these models including a comparison of the interventions that can be investigated, inputs to the transmission model, characteristics of individuals, and modeled disease states.
The research questions that we aim to answer with this study include the following:
• For what purpose has ABSS of Covid-19 transmission been used during the pandemic?
• How is the behavior of individuals modeled and how do they make decisions?
• What population size and geographical area are simulated?
• How are transmission processes between individuals modeled?
• How are disease states and the progress of the disease modeled?
• Which interventions are investigated and how do researchers assess the feasibility of these interventions?
• How is the trustworthiness of the models and the generated results ensured, e.g. for the use by decision and policy makers?
Articles were included in the review based on the following eligibility criteria:
• Published in English;
• Published in a journal or proceedings, or available as preprint in a recognized archive (i.e. PubMed, arXiv, medRxiv);
• Uses an ABSS model that allows for investigating the spread of Covid-19, i.e. a micro-level model where the identity and status of each individual can be tracked throughout the simulation;
• Article describes the simulation model and the transmission process;
• For each model and each team of authors, only the latest version of the preprint or, if existing, the peer-reviewed article is included;
• Uploaded or published earlier than October , (articles published after this date are considered if a preprint was published before).
Information regarding the presented simulation models was extracted from the articles using a list of characteristics that was developed and extended based on a first review of all identified articles (see Appendix I). The assessment was conducted based on the articles and, if available, on supplemental materials such as technical descriptions that were published with the article. As most models were not available for download, the source code was not reviewed for further insights into the models' functionalities. Whenever the assessment of a particular feature was ambiguous, the article was discussed among all authors. The detailed results of the analysis are shown in Appendix II and a list of all reviewed articles can be found in Appendix III.
Restricting the review to articles published in English, limiting the study's time period to October , and including non-peer-reviewed articles may have introduced bias and affected the results. However, even though preprints do not necessarily fulfill the same requirements as a scientific article, they provide a complementary perspective on existing modeling approaches.

Classification of the Articles
To answer the research questions, attributes were extracted and analyzed for the different models. These include both simple attributes, where the existence of a particular functionality is mostly assessed as either true or false, and more complex attributes where a textual description is provided on how and to what extent the model implements or corresponds to the attribute. All classifications that are presented in this study refer
Table : Attributes that were used to assess and classify the models presented in the surveyed articles. There are two different types of attributes: those that consist of binary classifications, i.e. whether or not a particular feature is part of the model, and those that are more complex and consist of nominal categories or individual textual descriptions of a feature. In this table, the non-binary attributes are underlined.
For each article, the names of the authors, whether or not the article has been peer-reviewed, and the date when it was first published are provided. For articles that were uploaded to open-access archives, the upload date of the latest version is provided; however, this cannot be later than December , , when this study was conducted.
We distinguish between different purposes of the models. This can be the investigation of the spread of the virus as well as of the effects of non-pharmaceutical interventions (NPI) or of pharmaceutical interventions (PI) over time. Moreover, we distinguish between the introduction, adaptive (dynamic) management, and removal of these interventions. There is a total of different NPIs and two PIs that were simulated, e.g. lockdowns, face masks, or vaccinations.
In terms of the input data that are used for initializing the model, we distinguish between three different types of data: socio-demographic (census) data, mobility data on movement patterns, and spatial GIS data of an area. Accordingly, we also analyze the output data provided by the models, e.g. in terms of the reported performance measures such as number of infected, hospitalized, or deceased individuals, but also more comprehensive output data such as infection chains or economic effects.
The transmission model, which defines how transmissions occur between individuals, can include different attributes and factors when determining the contagion probability. In our study, we identified a total of eight common factors that might positively or negatively affect the probability of an individual being infected when meeting an infected individual, e.g. the progress of the disease, the distance between the individuals, or the exposure time. In addition, this study also investigates what personal attributes are used to describe properties of individuals. In total, there are ten different personal traits that were used in the models, e.g. age, health status, or the wearing of protective equipment such as face masks.
Compartment models are often used to model individual disease progress and disease states (Brauer ). Each phase of the disease describes the condition of the individual, e.g. the need of medical care or the occurrence of symptoms, but also its capability of infecting others. This study assesses nine different disease states ranging from susceptible to deceased with different infection states.
Finally, we summarize other relevant model characteristics that cannot be assigned to the previously described categories. This includes more technical properties of the models, e.g. the number of simulated individuals, the
Figure : Analysis of the purpose of the models in terms of the interventions that can be simulated. Visualization of the percentage share of models that can be used for investigating different applications of non-pharmaceutical and pharmaceutical interventions. Models can support one or multiple interventions. In the review, no models were identified that support the adaptive management or removal of pharmaceutical interventions.
The general purpose of all models is to estimate the spread of Covid-19 over time. Their specific purpose can be distinguished in two ways: by the modeled interventions, i.e. no interventions, non-pharmaceutical interventions (NPI), or pharmaceutical interventions (PI), and by their application, i.e. introduction, adaptive management, or removal of interventions. The vast majority of the models ( , . %) were used for simulating the effects of one or multiple NPIs, whereas eight models ( . %) support the simulation of PIs. Out of these, two models ( . %) consider PIs only, whereas the other models also include NPIs. Five models ( . %) do not explicitly simulate any interventions, just the spreading of the virus. As illustrated in Figure , models ( . %) of the models that simulate NPIs can be used for analyzing the introduction of NPIs; eight models ( . %) for the adaptive management, i.e. dynamic introduction and removal based on certain criteria; and models ( . %) were used for the simulation of exit strategies and removal of NPIs. In total, of the analyzed models ( . %) have multiple purposes and can simulate different combinations of interventions and applications. As most of the investigated models were developed in the beginning or early phase of the pandemic, the focus on the introduction of NPIs is not surprising. During this time, it became clear that the rapid spread of the virus was a challenge to hospital capacity and, thus, had to be contained by means of interventions. However, the small number of models for simulating the effects of vaccinations on the spread of the virus is surprising. Even though WHO stated already in February that a vaccine might be available within months, we could not identify any agent-based models that can be used for simulating different vaccination strategies.
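The adaptive management described above, i.e. the dynamic introduction and removal of an intervention based on certain criteria, can be illustrated with a minimal Python sketch. The class name `AdaptivePolicy`, the use of daily case counts as the criterion, and both threshold values are assumptions made for illustration; none of them is taken from a reviewed model.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePolicy:
    """Switches one intervention on and off based on observed daily cases.

    Both thresholds are hypothetical. Choosing a lower 'off' threshold than
    'on' threshold (hysteresis) avoids rapid toggling around a single value.
    """
    on_threshold: int    # introduce the intervention at or above this value
    off_threshold: int   # remove the intervention at or below this value
    active: bool = False

    def update(self, daily_cases: int) -> bool:
        """Update the policy state for one day and return whether it is active."""
        if not self.active and daily_cases >= self.on_threshold:
            self.active = True
        elif self.active and daily_cases <= self.off_threshold:
            self.active = False
        return self.active


policy = AdaptivePolicy(on_threshold=100, off_threshold=20)
history = [policy.update(cases) for cases in [10, 120, 60, 30, 15, 40]]
print(history)
```

In a full simulation, `update` would be called once per time step with a value reported by the model itself, and the returned flag would enable or disable the corresponding intervention logic.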

There are two different PIs whose effects were investigated by the analyzed models: preventive vaccinations and acute treatments. In contrast, there is a variety of NPIs that have been simulated. As shown in Figure , most articles ( , . %) analyze the effects of quarantining and isolation of (potentially) infected individuals, followed by articles ( . %) that analyze social distancing. For both these NPIs, about % of the articles indicate that a clear positive effect is achieved with respect to limiting the transmission. This also includes, for instance, voluntary home quarantine once an individual experiences symptoms or when a certain number of personal contacts have been infected. Testing ( . %) and tracing ( . %) are often analyzed together, such that tracing of contacts results in them being tested and quarantined if necessary. More than half of the articles that implement either of these two NPIs ( articles, . %) include both of them ( of , . %). Testing, usually in combination with quarantining, seems to be the NPI with the clearest positive effect, according to the reviewed papers.
Figure : Percentage share of models that support the simulation of non-pharmaceutical and pharmaceutical interventions. The color represents the authors' conclusions regarding the effectiveness of the intervention for containing the spread of the virus based on experimental results: positive effect (green), unclear or negligible effect (orange), negative effect (red). Models that allow for the analysis of a particular intervention without presenting any experiments are marked in grey. This classification can be used as an indicator to identify those interventions whose benefits are controversial, i.e. lockdown and closure of schools.
The closure of different types of facilities, such as schools, universities, workplaces, leisure, and shopping, is another well-studied NPI. In total, articles ( . %) analyze closure of at least one type of facility and articles ( . %) the closure of multiple types of facilities. The closure of schools, including preschools, is most common ( . %), followed by all workplaces ( . %), universities ( . %), and offices ( . %). A distinction is made between workplaces in general and offices where employees can work from home. As a compromise between remote learning and opening schools for all children, some countries have discussed shift operation. Here, the class is divided into groups that alternate between remote learning and being at school. Through this, keeping social distance is facilitated, the number of potential contacts is reduced, and the consequences of a potential infection are lower. In our study, we found that most interventions are simulated in an "all or nothing" manner, i.e. schools or workplaces are either entirely closed or opened for all. Hybrid forms or shift operations, which seem most promising, are usually not modeled. This is probably due to the differentiated and more sophisticated modeling that is required. It is also interesting to note that some interventions that have been frequently applied in reality, such as curfews and limiting the size of public and private gatherings, were studied in very few articles. A reason for this could be that the modeling of individual behavior needs to be more detailed and the time step resolution finer than in most of the analyzed simulation models.
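To make the difference between an "all or nothing" closure and the shift operation described above more concrete, the following Python sketch assigns students to alternating groups. The function name `attends_in_person`, the grouping by student ID, and the daily rotation are hypothetical simplifications and are not taken from any of the reviewed models.

```python
def attends_in_person(student_id: int, school_day: int, n_groups: int = 2) -> bool:
    """Return True if the student's group is at school on the given school day.

    Students are split into n_groups by ID (an arbitrary, hypothetical rule).
    Groups rotate daily, so each group attends every n_groups-th school day
    and learns remotely on the remaining days.
    """
    return student_id % n_groups == school_day % n_groups


students = range(10)
present_day0 = [s for s in students if attends_in_person(s, school_day=0)]
present_day1 = [s for s in students if attends_in_person(s, school_day=1)]
print(present_day0, present_day1)
```

With two groups, only half of the class is present on any school day, which is the mechanism by which the number of potential contacts is reduced. A model with daily or finer time steps could call such a rule when deciding which agents visit the school location.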
Apart from the simulation model itself, the quality and credibility of the results also strongly depend on the quality of input data that are used to configure and adapt the model to the circumstances or the environment that is to be simulated (Bonabeau ). Viral transmission between individuals depends on socio-demographic attributes such as age and household size as well as on movement behavior. It can be assumed that simulations making use of real data on the behavior and characteristics of individuals are more successful in generating credible results. In total, of the analyzed models ( . %) apply real-world census data for generating an artificial population such that the socio-demographic features of the modeled individuals correspond to those of the population of the simulated region or country. Overall, models ( . %) make use of real-world mobility data, e.g. cellphone data, for generating movement profiles between different locations, such as home, workplace, and leisure activities. To adequately model neighborhoods and distances, models ( . %) use GIS data for generating a realistic model of the environment. It should be noted, however, that simulations not using real input data may still be useful for getting a general understanding of the effects of different interventions under different circumstances.
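As an illustration of how an artificial population can be generated from aggregated census data, the following Python sketch samples an age group for each agent from given population shares. The shares in `AGE_GROUP_SHARES`, the function name `generate_population`, and the agent representation as dictionaries are assumptions for illustration only; real models would use the actual census figures of the simulated region and typically more attributes.

```python
import random

# Hypothetical aggregated shares per age group (they must sum to 1).
AGE_GROUP_SHARES = {"0-17": 0.20, "18-64": 0.60, "65+": 0.20}


def generate_population(size, shares, seed=0):
    """Create `size` agents whose age groups follow the given shares."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    groups = list(shares)
    weights = [shares[g] for g in groups]
    sampled = rng.choices(groups, weights=weights, k=size)
    return [{"id": i, "age_group": g} for i, g in enumerate(sampled)]


population = generate_population(10_000, AGE_GROUP_SHARES, seed=42)
share_adults = sum(a["age_group"] == "18-64" for a in population) / len(population)
print(round(share_adults, 2))
```

Because the agents are sampled rather than copied from individual records, the approach works with aggregated data, which is relevant when individual-level data are unavailable for privacy reasons.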
For modeling disease states, most of the analyzed models adapt variations of the SIR compartment model (Kermack & McKendrick ). As shown in Figure , the progress of the disease is described by a number of discrete states, i.e. susceptible individuals that can be infected, different incubation and severity states after an infection occurred, and potential outcomes after an infection such as recovery or death. Altogether, ( . %) of the analyzed models include a susceptible state whereas the remaining simulations only focus on already infected individuals. In total, models ( . %) consider an incubation period after being exposed to the virus in which individuals carry the virus but neither infect others nor show symptoms. When becoming infectious, models ( . %) distinguish between being symptomatic or asymptomatic, which can either be consecutive or exclusive. In addition to the classical SIR model, models ( . %) define states for severely ill and models ( . %) for critically ill individuals that require hospital or ICU treatment. A state for deceased individuals is considered by models ( . %).
Figure : Disease states of the compartment models used for representing the progress of the infection. The most common states include those of the traditional SIR (susceptible, infected, recovered) compartment model. However, a variety of extensions of the classical SIR model could be identified with additional states that allow for a more detailed representation of the disease progress.
Due to the identified differences in the applied disease models, the purpose and application area of the models differ. When investigating the effects of different interventions, one might also be interested in the impact on the healthcare system. However, the majority of the models do not consider individuals becoming severely ill such that they require hospital or ICU treatment. Moreover, most models do not include the risk of reinfections.
In the early phase of the pandemic, it was assumed that antibodies would prevent a second infection, yet new studies show that there is a risk of reinfection, which might be due to mutations of the virus, a mild first infection, or the disappearance of antibodies.
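The extended compartment models discussed above can be thought of as a per-agent state machine. The following Python sketch shows one possible layout with exclusive asymptomatic and symptomatic states. The state names, the transition structure, and all daily probabilities in `TRANSITIONS` are hypothetical placeholders and do not represent the parameters of any reviewed model; reinfection is deliberately left out, as in most of the reviewed models.

```python
import random
from enum import Enum, auto


class State(Enum):
    SUSCEPTIBLE = auto()
    EXPOSED = auto()       # incubation: carries the virus, not yet infectious
    ASYMPTOMATIC = auto()
    SYMPTOMATIC = auto()
    SEVERE = auto()        # requires hospital treatment
    CRITICAL = auto()      # requires ICU treatment
    RECOVERED = auto()
    DECEASED = auto()


# Hypothetical per-day transition probabilities. Any remaining probability
# mass means that the agent stays in its current state for another day.
TRANSITIONS = {
    State.EXPOSED: [(State.ASYMPTOMATIC, 0.10), (State.SYMPTOMATIC, 0.10)],
    State.ASYMPTOMATIC: [(State.RECOVERED, 0.10)],
    State.SYMPTOMATIC: [(State.SEVERE, 0.02), (State.RECOVERED, 0.10)],
    State.SEVERE: [(State.CRITICAL, 0.05), (State.RECOVERED, 0.10)],
    State.CRITICAL: [(State.DECEASED, 0.05), (State.RECOVERED, 0.05)],
}


def step(state, rng):
    """Advance one agent by one day and return its new state.

    SUSCEPTIBLE agents only change state through the transmission model,
    and RECOVERED and DECEASED are treated as final states here.
    """
    draw = rng.random()
    cumulative = 0.0
    for target, probability in TRANSITIONS.get(state, []):
        cumulative += probability
        if draw < cumulative:
            return target
    return state


def final_state(start, days, seed=0):
    """Simulate the disease course of a single agent for a number of days."""
    rng = random.Random(seed)
    state = start
    for _ in range(days):
        state = step(state, rng)
    return state


print(final_state(State.EXPOSED, days=365, seed=1))
```

Dropping the `SEVERE` and `CRITICAL` states from such a table reduces the model to a simpler SEIR-like variant, which is why models without these states cannot report the impact on hospital or ICU capacity.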
Traditionally, state transition in SIR models is implemented by means of transition rates and probabilities of disease transmission. In agent-based simulations, personal attributes or circumstances can be used to calculate individual transition probabilities. Figure provides an overview of attributes used in the transmission models of the investigated simulations. In models ( . %), the likelihood of infecting others upon contact varies depending on the specific disease state of the infecting individual, e.g. whether an individual is asymptomatic or symptomatic, or on the current location where the contact takes place. The time since the infected individual was itself infected and the age or age group of either the infecting or infected individual each affect the transmission probability in models ( . %), respectively. Other common factors that affect the likelihood of transmission are the distance between the individuals ( . %), the density of people at a location ( . %), or contact time ( . %) once a contact occurs. It should be noted, however, that when investigating the effect of social distancing, some models only consider encounters as contacts within a certain distance. Here, the actual distance between the individuals does not necessarily affect the transmission probability. To simplify and combine different factors that might affect the infection probability at specific locations, e.g. at workplaces or during outside activities, models ( . %) include location-specific transition probabilities. In models ( . %), uniform transmission probabilities are applied such that the likelihood of infecting others is always the same, regardless of individual factors such as the location, health condition, or contact duration, or no detailed description of the transmission model is provided. The effect that different mutations of the virus might have on the transmission process has not been investigated or discussed in any of the articles.
The stage of infection and the time since infection obviously have a major impact on the likelihood of infecting other individuals.
However, other widely discussed factors, such as the distance between individuals, which is also used as an indicator for tracking contacts using smartphone apps, are only considered by a smaller number of models. This is because most models lack a fine-grained representation of the actual position of individuals. Instead, most models make use of contact networks or gathering points to simulate interactions between individuals. This limits the models' suitability to simulate some types of interventions, e.g. the introduction of tracing apps.
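How several of the factors named above (disease state of the infecting individual, distance, contact time, and protective equipment) could be combined into a single per-contact probability is sketched below in Python. The functional form, the function name `transmission_probability`, and every parameter value are assumptions chosen for illustration; the reviewed models differ considerably in which factors they include and how they combine them.

```python
import math

# Hypothetical relative infectiousness by disease state of the infector.
INFECTIOUSNESS = {"asymptomatic": 0.5, "symptomatic": 1.0}


def transmission_probability(infector_state, distance_m, contact_minutes,
                             mask_worn=False, base_rate_per_minute=0.002,
                             distance_scale_m=2.0, mask_factor=0.5):
    """Probability that one contact leads to an infection (illustrative only).

    The infection rate per minute is scaled by the infector's disease state,
    decays exponentially with distance, and is reduced if a mask is worn.
    The rate is then converted into a probability for the contact duration.
    """
    rate = base_rate_per_minute * INFECTIOUSNESS.get(infector_state, 0.0)
    rate *= math.exp(-distance_m / distance_scale_m)
    if mask_worn:
        rate *= mask_factor
    return 1.0 - math.exp(-rate * contact_minutes)


p_close = transmission_probability("symptomatic", distance_m=1.0, contact_minutes=15)
p_far = transmission_probability("symptomatic", distance_m=3.0, contact_minutes=15)
print(p_close > p_far)
```

A uniform transmission model corresponds to replacing this function with a constant. A model based on contact networks or gathering points, without agent positions, cannot supply `distance_m`, which illustrates why such models are less suited for studying distance-based tracing apps.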

Figure : Analysis of the factors that are included in the transmission models to determine the probability of infecting other individuals after being in contact. Visualization of the percentage share of models that consider the respective factor when determining individual transmission probabilities. Overall, . % of the models either do not describe the transmission mechanisms or make use of a uniform probability that is equal for all individuals and contacts.
In addition to the described individual disease states, many models add further traits to individuals to make the population more heterogeneous and similar to an actual population. As shown in Figure , common attributes include age or age group ( . %), assigned household ( . %), workplace ( . %), and the individual's current location ( . %). Some models define specific networks of other individuals that the agent can or will have contact with ( . %). This contact network is sometimes stratified into household contacts, workplace contacts, or random encounters. Despite . % of the articles claiming to simulate face masks or other personal protection, only models ( . %) include an attribute whether or not individuals are wearing protection. In the remaining models, the simulation of the effect of protective equipment consists of changing global transmission probability parameters or assuming that the entire population is wearing protective equipment. In models ( . %), no description of personal attributes is provided.
Figure : Analysis of the attributes and traits that are used to describe individuals. In all, . % of the models do not describe any attributes or claim that all individuals are identical except for their disease state.
For interventions to be successful, it is essential that individuals comply with them. In most simulations, it is assumed that all individuals will comply with any given intervention without exceptions. In reality, however, individuals tend to violate restrictions, such as limitations of gatherings, lockdowns, or the requirement to wear a face mask. To simulate obedience and disobedience to norms, more advanced models are required that consider personality traits as well as needs of individuals. However, our results show that personality traits are not among the attributes that are used to describe and characterize individuals.
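A very simple way to relax the assumption of full compliance, short of modeling personality traits or needs, is to give each agent an individual probability of following a rule. The Python sketch below is a hypothetical illustration: the function names, the Beta distribution, and the concentration value of 10 are assumptions and are not drawn from the reviewed articles.

```python
import random


def assign_compliance(n_agents, mean_compliance, seed=0):
    """Give each agent its own probability of following an intervention.

    Values are drawn from a Beta distribution whose mean equals
    `mean_compliance` (which must lie strictly between 0 and 1).
    The concentration of 10 is an arbitrary, hypothetical choice.
    """
    rng = random.Random(seed)
    alpha = mean_compliance * 10
    beta = (1 - mean_compliance) * 10
    return [rng.betavariate(alpha, beta) for _ in range(n_agents)]


def complies(probability, rng):
    """Decide for one occasion whether an agent follows the rule."""
    return rng.random() < probability


compliance = assign_compliance(5000, mean_compliance=0.8, seed=1)
rng = random.Random(2)
followers = sum(complies(p, rng) for p in compliance)
print(followers / len(compliance))
```

Setting every probability to 1.0 reproduces the full-compliance assumption that most of the reviewed simulations make, so the two variants can be compared within the same model.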
According to Squazzoni et al. ( ), the modeling of realistic human social behavior is crucial for simulating the dynamics of the Covid-19 pandemic. One major challenge is the modeling of decision-making processes and the actions that result from an individual's perception of a situation, as well as the individual's attributes. Approaches and architectures that can be used for modeling agent behavior differ greatly and range from homogeneous reactive behavior patterns, which are more rule-based, to sophisticated deliberation processes that are based on individual needs or the perceived utility of possible actions (Russell & Norvig ). As shown in Figure , there are differences in how the behavior of the individuals is modeled. This includes how sophisticated the underlying decision-making is and whether or not a uniform behavior model is used for all individuals. Of the articles studied, both Kai et al. ( ) and Pollmann et al. ( ) present two different versions of their models with different agent behavior, which have also been included in this analysis.
Figure : Classification of models according to how agent behavior and actions are implemented. Most articles present models with random behavior. This might be fully random, with an equal likelihood of infecting any individual within the population, or random within given social networks, spatial networks, or both. Some models implement more dynamic decision-making, e.g. by means of individual schedules, need models, or utility functions. A distinct classification cannot be given for some articles, e.g. due to a combination of approaches or lack of description.

In total, models ( . %) consist of very simplistic behavior models where individuals randomly infect other individuals in the population. In these models, there are neither social networks nor spatial networks. Hence, there is an equal probability of infecting any individual within the population as there is no representation of locations, contexts, or interpersonal relationships. Overall, models ( . %) make use of either social networks ( . %), spatial networks ( . %), or both ( . %) to model the individual behavior. Here, agents can only infect other individuals either when they meet due to a social relation (e.g. household members or coworkers) or when they visit the same location at the same time (e.g. gatherings at home, at work, or at a shop). Most of the solely spatial models are random walk models, where individuals randomly wander in a specific area and transmission might occur once their distance falls below a certain threshold value (e.g. meters). In all, models ( . %) make use of more explicit behavioral patterns that are mostly derived from empirical data or pre-defined (static) schedules for each individual, for instance, based on its age, personal status, or employment status. Only models ( . %) include more advanced dynamic or adaptive behavior models. Here, the agents perceive their current situation and assess potential actions to identify the one that is most rewarding or suitable.
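The difference between fully random behavior and behavior restricted to a social network can be sketched in a few lines of Python. The function names, the integer agent IDs, and the fixed household size of three are hypothetical choices for illustration only.

```python
import random


def random_mixing_contacts(agent, population, k, rng):
    """Fully random behavior: every other agent is equally likely to be met."""
    others = [a for a in population if a != agent]
    return rng.sample(others, k)


def household_contacts(agent, household_of, members_of):
    """Social network: contacts are limited to the agent's own household."""
    return [a for a in members_of[household_of[agent]] if a != agent]


# Hypothetical population of 12 agents in households of three.
population = list(range(12))
household_of = {a: a // 3 for a in population}
members_of = {}
for a, h in household_of.items():
    members_of.setdefault(h, []).append(a)

rng = random.Random(0)
print(random_mixing_contacts(4, population, k=3, rng=rng))
print(household_contacts(4, household_of, members_of))
```

In the random variant, closing a facility or quarantining a household has nothing to act on, whereas the network variant allows contacts of a specific type to be removed. A spatial model would add a third function that returns the agents currently present at the same location.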
The range of interventions that can be analyzed using the different models is strongly limited by design. A high degree of randomness in contacts between individuals facilitates the simulation of large populations. However, in the real world, the probability of people meeting or interacting depends greatly on their location, their routines, and their contact networks. Simple random walk models might be sufficient to get a first impression of the potential dynamics of the pandemic. However, they limit the possibilities of simulating interventions such as quarantining or closure of certain facilities. Models consisting only of social networks are well-suited for simulating transmission processes in households or at workplaces. However, they cannot represent situations in which two individuals are at the same location at the same time and are thus able to infect each other. Likewise, limiting the model to the representation of spatial networks does not allow for the representation of fixed households or work colleagues, which are relevant transmission hotspots. For an in-depth analysis of how different interventions affect the behavior of individuals and thus the spread of the virus, more sophisticated behavioral models are required that adequately represent daily routines and activities. However, this is only provided by a few models.
In articles ( . %), a specific country or region is simulated. As shown in Figure , the most studied countries are the United States of America ( articles), the United Kingdom ( ), China ( ), and Italy ( ). There might be a bias towards English-speaking countries due to the limitation of the study to articles that are written in English.
Other than geographical locations, settings studied included hospitals (
Figure : Countries that are simulated by the studies presented in the articles. When only a particular city has been simulated, the corresponding country is highlighted in this map. Some models simulate multiple countries or cities. The simulations focus on the United States of America, the United Kingdom, China, and Italy. Countries that are difficult to see on the map include Kuwait and Singapore.
Figure : Analysis of the number of agents that are simulated. Visualization of the percentage share of models that use a certain number of agents. In total, . % of the models do not provide information on the number of agents that can be simulated.
In Figure , an overview of the number of simulated agents is provided. Altogether, articles ( . %) state how many agents were simulated and the number ranges from six to . From those models that provide information on the population size, seven models ( . %) consist of fewer than agents and models ( . %) of more than . The median of simulated agents is . The considerable difference in population size results from several of the aforementioned aspects. Usually, a trade-off needs to be made between the level of detail of the modeled individuals, as well as the complexity of their behavior, the time step size, and the number of simulated individuals. A simulation of individuals with a sophisticated needs model that uses rather narrow time steps, e.g. hour ticks, needs to be limited in the number of individuals with respect to the time and resources required to execute the model. Finding a suitable balance between level of detail and number of individuals strongly depends on the interventions that are studied.
In addition to the described attributes, some other observations were made during the analysis of the included articles. For instance, there are significant differences in the extent and rigor of the models' documentation, such that some articles had to be excluded from this review due to a lack of information on the model's fundamental functionalities. This concerns both peer-reviewed and non-peer-reviewed articles, and includes the assumptions that were made regarding the behavior of individuals or Covid-19 transmission dynamics. Standardized protocols that describe the structure of the model, e.g. the ODD protocol (Grimm et al. ), are rarely used and only . % of the models are available for download. Though some studies reused existing models, the majority of articles developed new models.
In a literature study by Heath et al. ( ), it was shown that % of the investigated articles presenting agent-based models were lacking a thorough validation. Thus, with respect to the trustworthiness of the models and the generated results, we also investigated whether and to what extent the simulations were validated. As shown in Figure , the majority of the articles do not elaborate on how the presented model or the generated results were validated. The remaining articles ( . %) primarily make use of real-world data for assessing the validity of their models by, for instance, comparing the transmission dynamics and the course of the pandemic as observed in the simulation to data from reality. This includes data from reports or surveys, e.g. on the number of infections, hospitalization, or death rates, as well as other epidemic parameters, such as the doubling period or reproduction number (R ). When analyzing phenomena that have not been broadly studied, the availability of real-world data might be limited; thus, it is challenging to validate the models. Hence, four models ( . %) compare the behavior of their model against other models. The suitability of this approach for validating simulation models has been, for instance, described by Axtell et al. ( ) and Sargent ( ). Finally, three models ( . %) apply systematic testing and experimentation for assessing the quality of their results and two models ( . %) involve domain experts, e.g. epidemiologists, for validating their models and results.
Figure : Approaches that were used to validate the models presented in the articles (recoverable chart labels: validation of soundness by experts, 2%; not described or not conducted, 75%). The majority of the articles ( %) do not discuss the validation of the presented model.

However, as stated by Beisbart ( ), even results of models that do not comply with the highest validation standards can still provide valuable insights for decision makers. This is especially relevant in crisis situations, where other approaches for generating insights are limited. Here, simulations can provide different potential explanations for specific phenomena or observations, which experts can use to infer the actual explanation.

Challenges and opportunities
To support decision making, the results generated by simulation experiments need to be reliable. In ABSS, the design of the simulated population may be a threat to the reliability of the results. In cases where the characteristics or behavior of the artificial population do not adequately represent those of the real population, the extent to which conclusions about mechanisms in the real-world system can be drawn from the simulation results might be limited. Our review showed that articles ( . %) mention the inclusion of any real-world input datasets, e.g. on individuals, mobility behavior, or geodata. For reasons of privacy, for instance, such data are often not available on the individual level, and artificial populations need to be generated based on aggregated data instead. Still, to adequately represent the real-world environment and to be able to draw conclusions regarding the real world, socio-demographic and behavioral data for the simulated population are required for the model development. However, the sole use of real-world data is not sufficient to ensure the transferability and applicability of the simulation results, as these depend strongly on the quality, source, relevance, and extent of the used data.
In addition to data on the simulated phenomenon, expert knowledge is valuable in order to verify model assumptions and the generated results. In this review, some simulation models were developed solely by computer scientists, engineers, or experts from other technical disciplines. To calibrate the model to observations from the real world, indicators like the basic reproduction number are used. Cross-disciplinary author collectives and the incorporation of medical experts for verifying the plausibility of assumptions made regarding the transmission process seem to be promising steps towards the development of credible models. However, a certain degree of uncertainty remains as some of the underlying mechanisms of the pandemic are still unknown and data are not available.
In contrast to humans, artificial agents are usually modeled to behave rationally. When simulating the introduction of interventions, results might be misleading due to such rational obedience to restrictions. In the real world, it can be observed that humans disregard and violate restrictive interventions such as curfews or the obligation to wear masks. Such deviations from the intended behavior can either be modeled by means of probability distributions or through the explicit representation of personality traits. Either way, data are required concerning under what circumstances and to what extent individuals tend to disobey restrictions.
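As a minimal sketch of the first option, modeling deviations by means of probability distributions, each agent below draws an individual compliance propensity from a Beta distribution and then decides independently on each day whether to obey a mask mandate. The choice of a Beta distribution, its parameters, and the function name are assumptions made for illustration; in practice they would have to be informed by data on actual compliance.

```python
import random


def simulate_mask_wearing(n_agents, alpha, beta, n_days, seed=0):
    """Return the share of agent-days on which a mask mandate is obeyed.

    Each agent draws a personal compliance probability from a
    Beta(alpha, beta) distribution (an assumption of this sketch) and then
    decides independently on every day whether to comply.
    """
    rng = random.Random(seed)
    propensities = [rng.betavariate(alpha, beta) for _ in range(n_agents)]
    obeyed = sum(
        1
        for p in propensities
        for _ in range(n_days)
        if rng.random() < p
    )
    return obeyed / (n_agents * n_days)


# Beta(8, 2) has mean 0.8: most agents comply most of the time, a few rarely do.
share = simulate_mask_wearing(n_agents=2000, alpha=8, beta=2, n_days=30)
print(f"share of agent-days with compliance: {share:.2f}")
```

A fully obedient population corresponds to a share of 1.0, so the gap between that value and the simulated share indicates how much an intervention's effect might be overestimated when perfect compliance is assumed.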
When developing an ABSS model, a trade-off has to be made between complexity of the individuals and the number of individuals that are simulated. Incorporating multiple personal and environmental variables into the individual decision-making process requires additional computation time. This can be counteracted by reducing the number of simulated individuals or by increasing the size of the time steps. In this study, the analyzed models consist of six to agents. More sophisticated models of human behavior were mainly found in the models that consist of a smaller number of individuals. The decision regarding how many agents to include depends on the purpose of the simulation. When simulating a specific city to identify a suitable intervention, one might want to simulate individuals living together in households, commuting to workplaces, and leisure activities combined with a sophisticated transmission model. In contrast, when analyzing the effect of travel restrictions for a country, individual commuting routines might not be of high relevance. Still, the effect they have on the overall movement of individuals needs to be modeled correctly to generate realistic and applicable results. With respect to the scalability of the model and the efficiency of computation, such models might choose to implement contacts between individuals based on a network approach. Moreover, different models can be combined so that results from more detailed simulation at city level can serve as inputs for country-wide or transnational simulations.
With respect to improving the transparency, access, and rigor of simulation models that are used to understand the pandemic and to assist policy and decision makers, Squazzoni et al. ( ) formulate three major challenges that need to be addressed: the Covid-19 prediction challenge, the Covid-19 modeling human behavior challenge, and the Covid-19 data calibration and validation challenge.
In their prediction challenge, the authors outline difficulties that might arise when trying to predict the behavior of complex systems. It is the modelers' goal to minimize limitations of their models by scientifically grounding their assumptions and by calibrating their models with the most accurate data. However, data are often not available, and it is difficult to validate the assumptions made. Our study showed that nearly all investigated models aim at predicting how different interventions might affect the transmission dynamics. This includes the simulation of a certain time period, in which one or multiple interventions can be activated at a specific point in time or when a certain threshold of an epidemic parameter, e.g. a certain number or ratio of infected individuals, is exceeded. It further showed that the simulated time period, i.e. the horizon of prediction, differs greatly between the models. It ranges from a few days up to two years. Most models, however, predict over a time period between three months and one year.
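The threshold-based activation of an intervention described above can be sketched with a deliberately small, well-mixed agent-based SIR model. Once the share of infectious agents exceeds a threshold, the number of daily contacts is reduced for the remainder of the simulated period. All parameter values, the well-mixed contact assumption, and the function name are illustrative assumptions and do not reproduce any of the reviewed models.

```python
import random


def run_simulation(n_agents=1000, n_days=120, contacts_per_day=8,
                   p_transmit=0.05, recovery_days=7, initial_infected=10,
                   threshold=0.05, contact_reduction=0.75, seed=1):
    """Well-mixed agent-based SIR sketch with a threshold-triggered intervention.

    Returns the activation day (or None if the threshold is never exceeded)
    and the number of infectious agents at the start of each day.
    """
    rng = random.Random(seed)
    state = ["S"] * n_agents       # "S" susceptible, "I" infectious, "R" recovered
    days_left = [0] * n_agents     # remaining infectious days per agent
    for i in range(initial_infected):
        state[i] = "I"
        days_left[i] = recovery_days

    activation_day = None
    history = []
    for day in range(n_days):
        infectious = [i for i in range(n_agents) if state[i] == "I"]
        history.append(len(infectious))

        # Activate the intervention once the infectious share exceeds the threshold.
        if activation_day is None and len(infectious) / n_agents > threshold:
            activation_day = day
        contacts = contacts_per_day
        if activation_day is not None:
            contacts = round(contacts_per_day * (1 - contact_reduction))

        # Each infectious agent meets randomly chosen agents and may infect them.
        newly_infected = set()
        for _ in infectious:
            for _ in range(contacts):
                other = rng.randrange(n_agents)
                if state[other] == "S" and rng.random() < p_transmit:
                    newly_infected.add(other)

        # Progress the disease, then apply the new infections.
        for i in infectious:
            days_left[i] -= 1
            if days_left[i] == 0:
                state[i] = "R"
        for j in newly_infected:
            state[j] = "I"
            days_left[j] = recovery_days
    return activation_day, history


day_on, with_intervention = run_simulation(threshold=0.05)
_, without_intervention = run_simulation(threshold=1.1)  # threshold can never be exceeded
print("intervention activated on day", day_on)
print("peak infectious with / without intervention:",
      max(with_intervention), "/", max(without_intervention))
```

Because both runs share the same seed, they are identical up to the activation day, which makes the difference in peak size attributable to the intervention alone within this toy setting.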
The modeling human behavior challenge addresses the modeling of complex social dynamics, which is of particular relevance when simulating infection dynamics of the Covid-19 pandemic, as well as the effects of interventions. This is because such dynamics emerge from the individual behavior of humans. However, it is challenging to build cognitive models of human decision making as actions might not always be rational and the heterogeneity of individuals needs to be considered. Building sound models requires not only socio-demographic data on the particular attributes, e.g. age, gender, or household, but also cross-disciplinary expertise. Squazzoni et al. ( ) argue that models that cannot be used to examine social dynamics are lacking a crucial aspect for analyzing and predicting Covid-19 transmissions.
Most models that were investigated as part of this study include rather simplistic models of human behavior. In these models, the actions taken by individuals and the transmission of the virus between the individuals take place randomly. Socio-demographic data are often used to generate a population that corresponds to the real-world population with respect to specific features, e.g. age distribution. However, this information is then primarily used for the global organization of the model, e.g. the composition of households, the occupation of individuals, or as factors in the transmission model for determining the individual probability of infection. Moreover, interactions in most models are predetermined as a result of social or spatial network graphs. Only a few models include more sophisticated models of human behavior, where schedules, utility, or need models are used for proactively determining suitable actions based on a given situation and personal features. Social dynamics, as described by Squazzoni et al. ( ), are in fact part of many models. However, they are mostly not the result of the individual assessment of a given situation; rather, they are the consequence of a predefined social interaction network. Especially with respect to the simulation of whether or not individuals comply with interventions, more sophisticated decision-making models are required. In particular, compliance with different measures depends on personal characteristics and circumstances, all of which need to be taken into account.
Finally, the data calibration and validation challenge especially arises when using simulations for ex ante analyses with the aim of predicting future developments. This is a result of a lack of data and possibilities for gathering it, e.g. as experiments are either impossible or unethical. To this end, the authors suggest the retrospective validation of models. However, this might not always be feasible when simulation results are required for making timely decisions. Squazzoni et al. ( ) also discuss the quality of available data on the Covid-19 pandemic as metrics, e.g. the number of cases, might be defined and collected differently among different countries. This affects both calibration and validation and it might lead to biased estimations.
Nearly half of the investigated articles outline how their models have been calibrated. This includes determining feasible probabilities for transmission processes as well as for the progress of the disease, e.g. the likelihood of being infected, the transmission rate, the probability of developing a severe course of the disease that requires hospital admission, or the likelihood of dying as a result of the disease. Moreover, calibration is used to adjust social interactions between individuals that potentially result in infections and the transmission of the virus. This is achieved by, for instance, calibrating the overall contact probability and even the specific weights of edges in network graphs that represent individual transmission or contact probabilities. Approaches that are used for calibrating models include optimization techniques such as Bayesian optimization, Sequential Monte Carlo, maximum likelihood estimation, minimization of the sum of squared residuals, and experimental designs such as Latin Hypercube Sampling. However, the large number of potentially interdependent model parameters makes the calibration challenging.
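Two of the listed approaches, Latin Hypercube Sampling and the minimization of the sum of squared residuals, can be combined in a minimal sketch. Candidate parameter sets are drawn with a hand-written Latin hypercube, each candidate is run through a simple deterministic SIR model that stands in for an ABSS, and the candidate with the smallest sum of squared residuals against a reference curve is selected. The SIR stand-in, the parameter bounds, the synthetic reference data, and all function names are assumptions made for illustration.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples points. Each dimension is split into n_samples equal
    strata, and every stratum is used exactly once per dimension."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        column = [low + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return list(zip(*columns))


def sir_infectious(beta, gamma, days, i0=0.001):
    """Deterministic discrete-time SIR model; returns the infectious share per day."""
    s, i = 1.0 - i0, i0
    series = []
    for _ in range(days):
        series.append(i)
        new_infections = beta * s * i
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
    return series


def sum_squared_residuals(simulated, observed):
    return sum((a - b) ** 2 for a, b in zip(simulated, observed))


def calibrate(observed, bounds, n_samples=400, seed=0):
    """Return the sampled (beta, gamma) pair with the smallest SSR."""
    rng = random.Random(seed)
    candidates = latin_hypercube(n_samples, bounds, rng)
    return min(
        candidates,
        key=lambda p: sum_squared_residuals(
            sir_infectious(p[0], p[1], len(observed)), observed),
    )


# Synthetic "observed" curve generated from known parameters (not real data).
target = sir_infectious(beta=0.30, gamma=0.10, days=60)
search_bounds = [(0.1, 0.6), (0.05, 0.3)]
beta_hat, gamma_hat = calibrate(target, search_bounds)
print(f"selected beta = {beta_hat:.3f}, gamma = {gamma_hat:.3f}")
```

The sketch only picks the best of the sampled candidates; with many interdependent parameters, as noted above, a pure sampling design quickly becomes expensive, and the sampled points would typically serve as starting points for a subsequent optimization.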
In accordance with the study by Heath et al. ( ), we also found that the majority of articles (75%) are lacking a description of how the model was validated. Due to the pandemic situation and as most models aim at the ex ante analysis of the effects of different interventions, the availability of data is strongly limited. In addition to the use of early available data, whose quality might be questionable, another approach is the comparison of different models among each other, e.g. in accordance with Axtell et al. ( ) and Sargent ( ). However, as the validity of these models is also uncertain, the results of these comparisons need to be assessed carefully.

Limitations
As the Covid-19 pandemic is still ongoing, the work on novel models and simulations continues, thus resulting in new publications. This literature review analyzes and discusses the state of publications as of October , , approximately nine months after the outbreak of the first Covid-19 cases. Due to the high number of preprints that are included in this study, it can be assumed that some of the presented models are relatively quick responses to the pandemic that were developed without comprehensive funding. However, researchers remain responsible for the validity and completeness of their models. It can also be assumed that the number of quality-assured publications will increase in the future as a result of extensive peer review processes. In addition, these future articles might present an updated or extended version of the models.

Implications for practice and future research
To simulate transmission processes, the underlying mechanisms that lead to the spread of the disease between individuals need to be modeled. Due to the novelty of Covid-19, the understanding of the infection mechanisms is still limited. Thus modelers are required to make assumptions on how certain mechanisms work and how different parts of the model are interconnected. At a later point, these assumptions might prove to be imprecise or even incorrect, and thus need to be revised. In cases where they have been hard-coded into the model, future users are not able to adjust them, which severely limits the applicability of the model. Instead, modelers and developers should aim at mapping assumptions to model parameters so that they can be easily modified. This is also relevant with respect to the interpretation of the generated results, as it allows for the analysis and comparison of different assumptions.
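One possible way of mapping assumptions to model parameters is sketched below. The assumptions of a toy transmission model are collected in a single immutable parameter object instead of being hard-coded in the transmission function, so that a revised assumption only requires a new parameter set. The class, the functional form of the transmission probability, and all default values are hypothetical placeholders and not empirical estimates.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TransmissionAssumptions:
    """Explicit, adjustable assumptions. All defaults are placeholders."""
    base_probability: float = 0.05    # per-contact probability at zero distance
    mask_reduction: float = 0.5       # relative risk reduction when a mask is worn
    distance_halving_m: float = 1.0   # distance (in meters) at which the risk halves


def infection_probability(assumptions, distance_m, masked):
    """Toy transmission model whose behavior is fully driven by `assumptions`."""
    p = assumptions.base_probability * 0.5 ** (distance_m / assumptions.distance_halving_m)
    if masked:
        p *= 1.0 - assumptions.mask_reduction
    return min(max(p, 0.0), 1.0)


baseline = TransmissionAssumptions()
# A later insight suggests that masks are more effective: only the parameter changes.
revised = replace(baseline, mask_reduction=0.7)

for label, assumptions in (("baseline", baseline), ("revised", revised)):
    p = infection_probability(assumptions, distance_m=1.0, masked=True)
    print(f"{label}: infection probability at 1 m with mask = {p:.4f}")
```

Running the same scenario under both parameter sets also supports the comparison of different assumptions mentioned above, since any difference in the results can be traced back to the changed parameter.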
Instead of explicitly programming all parts of the simulation model or manually setting parameters, these may also be induced from observed data using machine learning techniques. This includes, for instance, information on where infections occur, personal factors that influence the susceptibility, and the likelihood of transmission when having contact with an infected individual. However, data on these mechanisms are still being collected and only limited information is available.
To promote the use of simulation results in policy making, it is not only the model building but also the model description that needs to be improved. In fact, some papers were excluded from the study because of an incomplete or shallow description of the model. To be able to adequately interpret simulation results and to use them as evidence in decision-making processes, the assumptions made by the model, as well as the underlying mechanisms, need to be transparently communicated. This includes not only the possibility to download the model but also a comprehensive description of its entities and their interactions. This can, for instance, be achieved by using the ODD protocol (Grimm et al. ). In addition to the thorough documentation of a model's mechanisms, validation of the model, e.g. by including epidemiological experts in the project team, seems necessary to ensure the correctness and plausibility of model assumptions.
When comparing interventions that are supported by the simulation models with those interventions that are discussed in reality, a discrepancy can be observed. Many simulations model lockdown scenarios; however, complete lockdowns are often considered a last resort. Instead, curfews and limitations of public and private gatherings are common interventions, but only a few models analyze these interventions. This might be due to the increased complexity of the simulation models needed, for instance, in terms of the size of the simulated time steps or the used human behavior models. Simulating the effects of some NPIs requires a more realistic and individual representation of human decision making and norm obedience as well as a more fine-grained time step size. However, most of the investigated models consist of rather simplistic behavior models and simulate on a daily basis. This does not allow for the analysis of more advanced NPIs such as curfews or preventive quarantine.
The great variety of models identified in this study, as well as the small number of reused models, also suggests that collaborations and model combinations might allow for the development of more complex models. Combinations of different models were not identified in this literature review.

Conclusions
In this systematic review on ABSS of Covid-19 transmission, we identified and analyzed 126 articles that propose relevant simulation models. The models can be used to investigate different aspects of the ongoing Covid-19 pandemic, including transmission dynamics and interventions for containing the spread of the virus. However, these models differ in the interventions that can be simulated, in the extent of the transmission model, the attributes of the individuals, and the states used for representing the progress of the disease.
Over % of the analyzed articles present simulations of at least one NPI and investigate how this affects the dynamics of the pandemic in terms of infected individuals, mortality, or demand on the healthcare system. In addition to the parameters that can be used to adapt the model, more than half of the models include some real-world input datasets of individuals, mobility behavior, or geodata, which allows for adapting the model to the real-world conditions of the area that is simulated. By this means, local circumstances can be modeled and different scenarios of interventions can be investigated prior to their actual introduction. The results generated by the simulations can provide new information on how transmission dynamics are affected under different circumstances. Thus they provide policy makers with additional valuable insights to be used in their decision-making process.
For the conducted literature review on ABSS models for the Covid-19 pandemic, we can summarize the following key messages:
• ABSS offers a more powerful tool to investigate Covid-19 transmission processes between individuals and the effects of interventions compared to traditional macro-level models, e.g. system dynamics or mathematical models. Nevertheless, although we identified published ABSS models for Covid-19 in this review, the full potential of ABSS is yet to be realized.
• A discrepancy can be identified between the needs of policy makers and what the simulation models implement. There is, for instance, a need to consider different configurations of curfews and restrictions of gatherings. The analyzed simulations, however, mainly implement tougher general interventions such as lockdowns and quarantines, which are easier to model. Moreover, the effects of pharmaceutical interventions are rarely studied.
• The effects of the studied interventions are often estimated to be positive, with the exception of lockdowns and closing of schools and universities, where the effects are unclear or negative in % or more of the studies.
• Few models include factors that are generally assumed to affect the transmission probability, such as the distance between the individuals ( . %), density of individuals at the location ( . %), wearing protection ( . %), and contact time ( . %).
• Although age is the most commonly modeled attribute of individuals, more than half of the analyzed models do not consider this attribute in the transmission model, despite it being widely regarded as a critical factor.
• Regarding the modeling of disease states, more than % of the models do not distinguish between being symptomatic or asymptomatic. Moreover, less than % of the models have a state for being critically ill and requiring hospital or ICU treatment.
• Most models ( . %) include simplistic models of human behavior, where decisions and actions are mainly random with predefined spatial or social networks. Some models, however, include more sophisticated models of human decision making that make use of individual schedules, needs, or utility functions.
• As the trustworthiness of the ABSS models is a key factor for policy making, the transparency needs to be improved. This can, for instance, be achieved by providing more detailed information about the model's underlying mechanisms and assumptions made using a standardized protocol (e.g. ODD), and by making the model available for download.
• The validation of simulation models whose purpose is prediction is challenging. This is because the required data are usually not available. In 75% of the investigated articles, there is no information provided on whether or how the model has been validated. This is a drawback with respect to the trustworthiness of the generated results. However, even without sufficient data, validation can consist of systematic testing of the model, comparing it against other models, or expert assessment. Nevertheless, only a few models pursue such approaches.
Our analysis showed that simulation can be used to identify potential effects or side-effects of different interventions against the spread of Covid-19 in an artificial population. However, the results presented by the articles mostly allow for understanding the dynamics of the pandemic and for identifying factors that potentially affect these dynamics. Thus the generated results need to be interpreted thoroughly, and their capability of predicting future developments is limited.
This study gives an overview of models and their features but does not assess their trustworthiness. Its purpose is to identify shortcomings in existing models with respect to future research endeavors and to support decision makers in identifying appropriate models for a given scenario. We conclude that ABSS is a powerful tool to investigate Covid-19 transmission processes and to analyze potential interventions. However, the full potential of ABSS is yet to be realized.