Arguments as Drivers of Issue Polarisation in Debates Among Artificial Agents

Can arguments and their properties influence the development of issue polarisation in debates among artificial agents? This paper presents an agent-based model of debates with logical constraints based on the theory of dialectical structures. Simulations on this model reveal that the exchange of arguments can drive polarisation even without social influence, and that the use of different argumentation strategies can influence the obtained levels of polarisation.


Introduction
Two recent agent-based models of polarisation (Mäs & Flache; Banisch & Olbrich) rely on the exchange of arguments as a driver of polarisation. These studies underpin the hypothesis that arguments and their properties could play a role in polarisation dynamics, alongside diverse other candidate causes, such as lacking exposure to other views (Mutz). These models also limit the functions of arguments to providing reasons in favour of or against an issue. Although these are clearly central functions of argumentation, not all arguments can be reduced to these roles: for example, arguments can shape debates by showing that the issues under discussion can be mutually accommodated, they can enlarge or reduce the scope of issues, etc. The effects on polarisation of these and other argumentative features are not investigated, and an argument exchange mechanism that resolves the inner workings of arguments seems necessary for this task.

Figure shows a comparison of models with respect to their conception of sentences under discussion in a debate. The models by Mäs & Flache and Banisch & Olbrich have a split ontology (with arguments and issues), and two relations that represent pro- and con-reasons, respectively. The TDS model presented here has a uniform ontology consisting of arguments only, which are further specified as consisting of premises and conclusions. In the TDS model, the defeat and support relations described below replace the two relations from the first two models. The figure also shows a graph representation of propositional relations in the model by Friedkin et al.
Friedkin et al.'s model is a general model of opinion dynamics and not a specific model of polarisation. It is interesting when studying TDS models, though, because both encode logical constraints in opinion dynamics: Friedkin et al. use a matrix C with elements c ij ∈ [0, 1] showing the logical constraint of sentence i on sentence j, while the model presented here uses Boolean formulas to store logical constraints. In Friedkin et al.'s model, the effects from the logical constraints in the matrix C compete with influences from a social network in forming agents' belief systems. The matrix C is stochastic, which requires that the row sums equal 1. In the graph representation, this translates to all constraints on a proposition adding up to 1 in weight, including reflexive constraints.
Another example (not pictured in Figure) of combining the effects of logical constraints and social networks is the model by Butler et al. In fact, this model also uses argumentation frameworks, but in comparison to the present model, theirs is built on argumentation frameworks in the original sense (Dung), which means that arguments can only stand in defeating relations toward other arguments and are not resolved further in terms of premises and conclusions. A further difference is that agents are directly affected by the opinions of others in their model, and that this impact depends on the antecedent opinion difference.

Figure caption: (A) and (B) show the models by Mäs & Flache and Banisch & Olbrich, respectively. Those two distinguish issues i j and arguments a k. Dashed edges show arguments supporting a negative stance toward the issue (contra arguments), and solid edges support a positive stance (pro arguments). In (B), more than one issue is considered in a debate, and not every argument needs to contribute to every issue. (C) illustrates Friedkin et al.'s model, in which nodes represent propositions and edges show the weight of doxastic implication: in this example, an agent's belief in p 1 implies belief in p 2 to a certain weight, and an agent's disbelief in p 1 would imply disbelief to the same weight, but its belief in p 1 is not constrained by belief in another proposition. An example for debates in the TDS model from this paper is in (D), where nodes are arguments consisting of premises and conclusions, and edges belong to either the support or the defeat relation.
A second important difference between agent-based debate models is how they have agents behave toward each other and how agents update their belief systems. Mäs & Flache's and Banisch & Olbrich's are social influence models (SIM; see Flache et al. for a typology). In SIM, the opinions that agents move to in an updating event are aggregates of the distances between the updating agent and the agents that influence it. For example, in assimilative SIM, the opinion of an agent at time t + 1 is given as the agent's position at t plus the sum of weighted distances to all other agents in the agent's network, normalised by an influence parameter. In contrast to this updating mechanism, agents in the model presented below are only indirectly influenced by the belief systems of other agents, through the arguments those agents introduce. Agents decide how to change their opinion based on maximal opinion continuity, and do not rely on particular neighbours for their updating.
The updating process in models of source reliability (Merdes et al.) is reminiscent of that in SIM, and some of these aim to model polarisation in debates (Pallavicini et al.; Olsson). An important difference to SIM is their usage of conditional probabilities in the updating process: agents are influenced by their communication partner according to a probability conditional on the agent's trust in its partner (Bayesian updating). Note that trust need not be identical to similarity in SIM (agents can trust others even if they are far removed). But the ultimate driver of polarisation in these Bayesian models cannot be seen to be argumentative, since trust in other agents is an epistemic property of the agents, not one of the produced arguments.
The model presented below differs not only in how agents' belief systems are influenced by others, but also in which agents can exert this influence. Mäs & Flache's and Banisch & Olbrich's models only yield polarised outcomes when they rely on homophily to determine which agents are partnered up in an argument exchange event (see Figure for the effects of homophily in their models). In the context of these models, homophily basically means that agents are more likely to communicate the more alike they are. While this influence is a well-established phenomenon in human communication and an interesting factor in its own right (McPherson et al.), it may limit the insights to be gathered about arguments as drivers of polarisation. Homophily, after all, is not a property of arguments, but of the agents. The model presented here is different in that respect: it exhibits rising polarisation even though all agents communicate in a continuous debate forum, with all of them having equal probability of selecting each other as communication partners.

Details on the concept of polarisation
Issue polarisation is the target phenomenon of the model below, and although it is intuitively easy to understand, it is also a theoretical concept that has recently seen substantial conceptual and empirical contributions. The distinction between issue polarisation and (dis-)agreement is also interesting, since both are properties of a population's opinion distribution. The next two sections briefly review the differences between (1) social and issue polarisation and (2) measuring issue polarisation and measuring agreement.
There are at least two types of polarisation
There are a number of ways to understand polarisation (Mason; Iyengar et al.), and a model that concentrates on the arguments in a debate is particularly designed to track polarisation in reasons and opinions, which is known as issue or belief polarisation. Such a model has a harder time tracking other aspects of polarisation, particularly social polarisation (also called behavioural polarisation, with the important sub-type of affective polarisation). Put broadly, social polarisation is a state of affairs in which members of a population cease to view other members as enrichment, partners in collaborative projects, etc., and begin to see them primarily as a likely source of harm (either toward themselves or toward others one cares about), a threat to the well-being of society, etc.
For the US public, the Pew Research Center extensively studies effects of both types of polarisation. The United States is occasionally cited as an example of a polarised society, but this picture needs specification. Summarising research on her own country, Mason finds that effects of social polarisation are more pronounced than those of issue polarisation, leading her to conclude that "the outcome is a nation that may agree on many things, but is bitterly divided nonetheless".
For the interpretation of polarised outcomes in agent-based models, this constitutes an important caveat: it is important to distinguish issue from social polarisation in these models, because polarisation in terms of on-topic opinions can develop independently of how disputants regard other parties in the discussion. Results obtained on one type should not be applied to understanding polarisation in general, which is particularly true for the model presented below.
Polarisation as a concept gets even trickier since its two sub-types differ in scope. Issue polarisation is not restricted to debates in politics, but can be expected in other deliberative populations as well, such as scientific communities or the courtroom. It is a property of particular debates (on particular issues with particular agents), whereas social polarisation is a property of populations as a whole and has so far been reported in the general public and in politically engaged groups; its role in specialised communities is unclear and possibly lower. An important question then arises about the correlation between these two kinds of polarisation. While Mason is clear that social polarisation does not seem to correlate with mean issue polarisation across a range of issues, it may still turn out that social polarisation does correlate with specific issues polarising.

Measuring polarisation is different from measuring (dis)agreement
If issue polarisation is characterised by divergence in on-topic views, a good question to ask is how this fits into the overall study of debates and deliberation. In particular, how does studying polarisation differ from studying disagreement?
Betz measures disagreement in a population of agents as the averaged normalised distance between pairs of agents' belief systems, and agreement as the inverse of this value. Let δ be a distance measure between two agents, such as the Hamming distance of their belief systems, n the number of sentences under discussion and A the population of agents. Then the population-wide mean agreement (PWMA) is given as:

PWMA = 2 / (|A| · (|A| − 1)) · Σ {x,y} ⊆ A (1 − δ(x, y)/n)

In PWMA, the inverse of normalised differences (1 − δ(x, y)/n) is averaged over the pairs of agents in the population, but no further aggregation of the atomic δ-values takes place, which means that differences contribute uniformly to the measure. This is different from polarisation measures, which aggregate the same normalised difference values: they either track how these measurements spread around the mean difference, or aggregate the differences based on group membership of the individual agents. Measuring polarisation thus goes beyond reporting an absence of agreement in a population and further characterises such disagreement in terms of variation and clustering.
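As a minimal sketch (not the paper's implementation), PWMA can be computed over positions encoded as bit strings of equal length, with δ as the Hamming distance:

```python
from itertools import combinations

def hamming(x, y):
    """Number of sentences mapped to different truth values."""
    return sum(a != b for a, b in zip(x, y))

def pwma(positions, n):
    """Population-wide mean agreement: mean of 1 - delta/n over agent pairs."""
    pairs = list(combinations(positions, 2))
    return sum(1 - hamming(x, y) / n for x, y in pairs) / len(pairs)

# Two agents agree fully, one is maximally opposed to both: PWMA = 1/3.
print(pwma(["1100", "1100", "0011"], 4))
```

The function name `pwma` is a hypothetical helper; the point is only that agreement is a plain average over pairs, with no variance or clustering term.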

The Model
This section presents the model according to Grimm et al.'s ODD protocol.

Overview
Purpose
This model is designed to study the effects of argument introductions and argumentation strategies on issue polarisation in debates among artificial agents.

State variables and scales
Debates in this model are simulated as the logical conjunction of arguments. Arguments are logical implication relations between a set of premises and a conclusion. Both premises and conclusions are drawn from a sentence pool, which consists of n atomic sentence variables and their negations, meaning it has 2n elements. Every argument introduced to the debate must meet these criteria:
• Satisfiability: A debate must remain satisfiable at all times. That is, the conjunction of arguments must be satisfiable.
• Premise uniqueness: Every argument must have a unique set of premises, i.e. any set of premises can be used in at most one argument of the debate. This restriction does not hold for conclusions, i.e. there can be multiple arguments with the same conclusion.
• Prohibition of conflicts and redundancy: If a sentence is used as a premise, neither it nor its negation is used as the conclusion or another premise of the same argument.
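These admissibility criteria can be illustrated with a minimal Python sketch (my own encoding, not the paper's code): sentences are signed integers (+i for p i, −i for its negation), an argument is a pair of a premise set and a conclusion, and satisfiability is checked by brute force over all 2^n assignments, which is feasible only for small sentence pools:

```python
from itertools import product

def satisfies(assignment, argument):
    """An implication (premises -> conclusion) holds under a truth assignment.
    Sentences are signed ints: +i for p_i, -i for its negation."""
    prems, concl = argument
    true = lambda lit: assignment[abs(lit)] == (lit > 0)
    return not all(true(p) for p in prems) or true(concl)

def debate_satisfiable(arguments, n):
    """Brute-force check that the conjunction of arguments has a model."""
    for bits in product([True, False], repeat=n):
        assignment = dict(enumerate(bits, start=1))
        if all(satisfies(assignment, arg) for arg in arguments):
            return True
    return False

def admissible(argument, arguments, n):
    """Check the three criteria for introducing `argument` into `arguments`."""
    prems, concl = argument
    no_conflict = concl not in prems and -concl not in prems and \
        len({abs(p) for p in prems}) == len(prems)      # no p_i together with ¬p_i
    unique = all(prems != a[0] for a in arguments)      # premise-set uniqueness
    return no_conflict and unique and debate_satisfiable(arguments + [argument], n)

# (p1 ∧ p2) -> ¬p3 is admissible in an empty debate over three variables.
print(admissible((frozenset({1, 2}), -3), [], 3))  # True
```

The helper names (`admissible`, `debate_satisfiable`) are hypothetical; a real implementation would use a SAT solver rather than enumeration.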
A debate is represented by a Boolean expression of the conjunctive form:

τ = (p 1,1 ∧ ... ∧ p 1,k → c 1) ∧ (p 2,1 ∧ ... ∧ p 2,k → c 2) ∧ ...,

where each conjunct is one argument, with premises p i,j and conclusion c i.
The arguments and the debate as a whole use sentence variables, and the model thus abstracts from the actual propositional content of premises and conclusions. This is sufficient for the purpose of investigating the general role of argumentation in polarisation dynamics, but building debates on sentence variables cannot elucidate the role of particular propositional contents in premises (such as the difference between normative and descriptive claims) or of actual argumentation schemes. Natural language argumentation technologies seem necessary for this task, and it will be interesting to see how emerging approaches to natural language processing of argumentation (Hunter et al.; Betz) can be employed in future research.
Agents in this model are simulated as having a belief system represented by positions in terms of TDS, and I will use both terms interchangeably in this paper. Positions are mappings from the atomic sentence variables to the truth values True and False. An agent's belief system is fully specified by these truth-value attributions. In this model, agents assign a truth value to every sentence in the sentence pool (they never suspend judgement), but are confined to satisfying interpretations of the debate, which means that every agent must hold a position that is an interpretation of the Boolean formula that describes the debate. This minimal picture of rationality implies that agents assign identical truth values to equivalent sentences but different truth values to contradictory sentences, and follow their inferential obligations: if an agent assigns True to all premises in an argument, it also assigns True to the conclusion.
For a simulation with n sentence variables in the sentence pool, an agent's position can be represented as:

A = {p 1 → v 1, p 2 → v 2, ..., p n → v n}, with v i ∈ {True, False}
Agents that assign True to an atomic sentence variable are said to "accept" it; otherwise they "reject" it. Besides such a belief system, agents are associated with one of five argumentation strategies (described as part of the argument introduction sub-process below). In every model run, all agents share the same argumentation strategy. Distances between the positions of any two agents are measured by means of the Hamming distance, interpreted as the number of sentences that are mapped to different truth values.
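A position and the Hamming distance between positions can be sketched as follows (a hypothetical encoding, with positions as mappings from sentence-variable indices to truth values):

```python
def hamming_distance(pos_a, pos_b):
    """Number of atomic sentences the two positions map to different truth values."""
    return sum(pos_a[s] != pos_b[s] for s in pos_a)

a1 = {1: True, 2: True, 3: False}    # accepts p1 and p2, rejects p3
a2 = {1: False, 2: False, 3: True}   # the exact opposite position
print(hamming_distance(a1, a2))  # 3
```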
A debate stage is described by the debate at that time (the current state of the conjunction of the arguments) together with the agents' current positions. There are a number of high-level properties that can be obtained from these lower-level properties. The first object of interest here is the argument graph. An argument graph is a two-coloured directed graph that takes the arguments in the debate as nodes and defeat and support relations as edges. A pair of arguments (a, b) satisfies the support relation if the conclusion of a is equivalent to one of the premises in b, and the pair fulfils the defeat relation if the conclusion of a is equivalent to the negation of one of the premises in b. This means that the relations between arguments are automatically obtained from the arguments. Argument graphs are not necessarily complete, and are non-circular more often than circular. The argument graph of a debate stage i is referred to as τ i.
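The automatic derivation of support and defeat edges can be sketched in a few lines (sentences as signed integers, +i for p i and −i for ¬p i; `argument_graph` is a hypothetical name):

```python
def argument_graph(arguments):
    """Two-coloured directed graph: (a, b) is a support edge if a's conclusion
    is one of b's premises, and a defeat edge if it is the negation of one."""
    support, defeat = set(), set()
    for i, (_, concl_a) in enumerate(arguments):
        for j, (prems_b, _) in enumerate(arguments):
            if i == j:
                continue
            if concl_a in prems_b:
                support.add((i, j))
            if -concl_a in prems_b:
                defeat.add((i, j))
    return support, defeat

# Argument 0 concludes p3, which is a premise of argument 1 (support)
# and appears negated among the premises of argument 2 (defeat).
args = [(frozenset({1, 2}), 3), (frozenset({3, 4}), -5), (frozenset({-3, 6}), 7)]
print(argument_graph(args))
```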
A second group of higher-order properties of a given debate stage concerns its space of complete and coherent positions, represented as Γ τ (SCCP; Betz). In logical terms, the SCCP is the set of all satisfying interpretations of the Boolean formula that represents the debate at a given stage (see Figure for an example). It should be noted that the SCCP is very different from, and usually contains many more elements than, the collection of positions actually held by the simulated agents. Actually held positions have to be in the SCCP, but multiple agents can hold the same position from the SCCP, and the actually maintained positions in the simulation can be quite spread out in the SCCP. In terms of the model, the SCCP is the set of positions that the actual agents are allowed to move to should their positions be rendered incoherent by the introduction of an argument. A position is incoherent relative to a debate stage if its assertions are jointly unsatisfiable with the arguments at that debate stage. Other than that, the model allows agents to move freely in the SCCP. In particular, it does not prescribe them to favour positions with maximal quantitative argument support, or to move toward positions held by other agents. This seems realistic considering that it can be rational to adopt a position even if there is just a single argument in its favour, namely when the single argument is especially convincing. Argument evaluation and a measure of argumentative strength are not part of this model, however.

Figure caption: In the argument display, premises and conclusions are separated by a horizontal bar, and the defeat relation is expressed with a dashed arrow. In the SCCP, nodes have a label that shows the bit string representation of the position they resemble. In this string, sentences are ordered alphabetically (p 1 will show in the first bit, p 5 in the last), and bits are 0 if the position assigns False to the proposition, or 1 if the position accepts it. Positions are connected by an edge if they differ in exactly one truth-value attribution (i.e., if their Hamming distance equals 1).
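For small sentence pools, the SCCP can be computed by brute-force enumeration (a sketch with hypothetical names; sentences are signed integers, arguments are premise-set/conclusion pairs):

```python
from itertools import product

def sccp(arguments, n):
    """All complete, coherent positions: satisfying assignments of the debate."""
    holds = lambda asg, lit: asg[abs(lit)] == (lit > 0)
    implies = lambda asg, prems, concl: \
        not all(holds(asg, p) for p in prems) or holds(asg, concl)
    space = []
    for bits in product([True, False], repeat=n):
        asg = dict(enumerate(bits, start=1))
        if all(implies(asg, p, c) for p, c in arguments):
            space.append(asg)
    return space

# One argument (p1 ∧ p2) -> ¬p3 rules out exactly one of the 8 positions
# over three sentence variables.
print(len(sccp([(frozenset({1, 2}), -3)], 3)))  # 7
```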
The size of the SCCP, |Γ τ |, is used in calculating a debate stage's density, a fundamental measure of progress in debate simulations (Betz). Roughly speaking, density encodes how many positions have been rendered incoherent so far in the debate, and how much freedom the agents have in choosing the next position to move to if they have to. Importantly, not every argument introduction raises the debate's density in the same way, and debates can take up to twice as many argument introductions to reach the same density. A debate stage's density is always in the [0, 1] interval, and defined as (n − log 2 (|Γ τ |))/n, where n is the number of atomic sentence variables (Betz). Figure in Appendix B shows how density evolves over simulation time depending on the different argumentation strategies.
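The density formula is easy to check numerically at its boundary cases (`density` is a hypothetical helper name):

```python
from math import log2

def density(sccp_size, n):
    """Debate density: (n - log2 |SCCP|) / n, always in [0, 1]."""
    return (n - log2(sccp_size)) / n

print(density(2 ** 20, 20))  # 0.0: empty debate, all 2^n positions coherent
print(density(1, 20))        # 1.0: only a single coherent position remains
```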
The most important parameters of the model and initial settings for the main experiment are described in Table. Almost all of them influence the simulation's computation time. The number of premises per argument is initially chosen as the range of 2 to 3, but a robustness analysis was run on a different number of premises per argument. This parameter can be either a number or a range. If it is a range, the number of premises in any introduced argument is chosen randomly. The number of atomic sentence variables and the termination density are taken as best practices from Betz. The number of agents is chosen somewhat arbitrarily, but robustness analyses are provided for smaller populations and sentence pools (which actually confirm and exceed the results). Varying the size of the sentence pool exponentially influences the run time of experiments. Given the current software implementation and the available computational resources for conducting the experiments reported below, a size of 20 atomic sentence variables, resulting in a sentence pool of 40 sentences, proved workable.

Processes and scheduling
The simulation proceeds by two kinds of events: argument introduction and a subsequent position updating, and simulation time is understood as the number of introduced arguments. At every step, both events are called until limiting conditions are met. The model terminates either when a density greater than or equal to the maximum density parameter is reached, or if an argument introduction fails due to a lack of premises or conclusions meeting the requirements imposed by the strategy.
For each argument introduction, two agents are randomly drawn from the population. The first agent is understood to be the source, the second one is called the target. The source then acts according to its associated argumentation strategy. A single model run assigns the same strategy to all agents, which can be one of the four basic ones (Betz):
• Attack: From the sentence pool, the source picks premises that it accepts and a conclusion that the target rejects to build a valid argument. The source also ensures that the conclusion does not contradict its position.
• Fortify: From the sentence pool, the source position selects both premises and a conclusion that it accepts to construct a valid argument.
• Convert: From the sentence pool, the source position selects premises that the target accepts and a conclusion that the source accepts to build a valid argument.
• Undercut: From the sentence pool, the source position picks premises that the target accepts and a conclusion that the target does not accept to construct a valid argument. The source also ensures that the conclusion does not contradict its position. .
Agents can also have a fifth strategy, which picks a strategy at random for each argument introduction:
• Any: The source randomly chooses one of the four basic argumentation strategies to introduce a valid argument.
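The constraints that the four basic strategies place on premise and conclusion choice can be sketched as follows (a simplified, hypothetical implementation: positions are sets of accepted signed literals, and the validity check against the existing debate is omitted):

```python
import random

# Which side supplies the premises, and which constraint the conclusion must
# meet, for each of the four basic strategies (a simplified reading):
STRATEGIES = {
    "attack":   ("source", "target_rejects"),
    "fortify":  ("source", "source_accepts"),
    "convert":  ("target", "source_accepts"),
    "undercut": ("target", "target_rejects"),
}

def pick_sentences(strategy, source, target, n_premises=2, rng=random):
    """Draw premises and a conclusion meeting the strategy's constraints.
    Positions are sets of accepted literals (+i for p_i, -i for its negation)."""
    prem_owner, concl_rule = STRATEGIES[strategy]
    prem_pool = list(source if prem_owner == "source" else target)
    premises = frozenset(rng.sample(prem_pool, n_premises))
    if concl_rule == "source_accepts":
        candidates = set(source)
    else:
        # conclusions the target rejects that do not contradict the source
        candidates = {-lit for lit in target} & set(source)
    candidates = [c for c in candidates if c not in premises and -c not in premises]
    return premises, rng.choice(candidates)

# Example draw, seeded for reproducibility
rng = random.Random(0)
print(pick_sentences("convert", {1, 2, 3, -4}, {-1, -2, 3, 4}, 2, rng))
```

Note that the sketch assumes a suitable conclusion always exists; the model itself retries or terminates when the strategy's constraints cannot be met.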
When this argumentation process is completed, all agents in the population check whether their positions are rendered incoherent by the new argument. From the logical point of view, an interpretation can satisfy one Boolean formula, but not the updated conjunction of the Boolean formula and an added formula (i.e., the previous debate extended by a newly introduced argument). When this happens, a position is rendered incoherent.
In the model, all agents with incoherent positions following an argument introduction immediately update their position. Figure in Appendix B shows how many agents, on average across the model runs of the main experiment, update their position following argument introduction. As can be seen, the undercut strategy has the agents update considerably more often than the other strategies. As the SCCP shrinks and density rises accordingly, there is comparatively high pressure on the agents to update their positions. After all, a shrinking SCCP implies that a decreasing number of positions are acceptable to the agents.
The position update strategy that all agents share points them to their respective closest coherent position from the SCCP. "Closest" is understood to mean lowest Hamming distance. When there are several coherent positions with minimal Hamming distance, one of them is chosen randomly. Distances to the positions held by other simulated agents do not influence the updating process.
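The update rule can be sketched as follows (`update_position` is a hypothetical helper; positions as mappings from sentence indices to truth values):

```python
import random

def update_position(position, sccp, rng=random):
    """Move an incoherent position to the closest coherent one (minimal
    Hamming distance); ties are broken uniformly at random."""
    dist = lambda q: sum(position[s] != q[s] for s in position)
    best = min(dist(q) for q in sccp)
    return rng.choice([q for q in sccp if dist(q) == best])

pos   = {1: True, 2: True, 3: True}
space = [{1: True, 2: True, 3: False}, {1: False, 2: False, 3: False}]
print(update_position(pos, space))  # the first candidate: distance 1 vs 3
```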

Design concepts

After this technical review of key objects and processes, let me provide a more conceptual reflection on the model's properties.
• Emergence: Agents' starting positions are randomly assigned at the beginning of each model run. However, agents autonomously select their subsequent updated positions based on coherence criteria. Levels of polarisation in this sense emerge from the model. This is also true for the relations between propositions: whether an agent can simultaneously accept two items from the sentence pool depends on the autonomously introduced arguments.
• Adaptation: Agents update their position if a newly introduced argument renders their position incoherent. They do so by moving to the closest neighbour among the remaining coherent positions in the SCCP, selecting at random among next neighbours if more than one has minimal Hamming distance to their previous position. In this way, only the logical relations matter for an agent's adaptation.
Although more than one agent can hold a coherent position, agents select the closest position among all coherent positions, not just those that are currently held by other agents in the simulation. If updating is required, they choose one of the coherently adoptable positions without regard for what others believe, even their closest neighbours.
• Fitness: Agents have only two goals. The first is upholding a coherent position, which they maintain by the adaptation process just described. Their second goal is to introduce arguments according to their assigned argument strategy if the model determines that it is their turn. Agents always fulfil both goals: since there is no distance limit in selecting a new position, updating always succeeds for every agent. Also, the simulation stops when one agent is unable to introduce an argument according to its argumentation strategy. Polarisation is thus not introduced due to agents' inability to accomplish their goals.
• Sensing: Agents that are selected for argument introduction know their own position and that of the other agent in the turn. They are aware of the complete sentence and premise pools.
After argument introduction, all agents recognise whether they need to update their position or not. Those that do are aware of all of their options, i.e. they know all the remaining coherent positions in the SCCP and their own position's distance to them.
• Interaction: Agents interact through the argument introduction process described above. Agents are indirectly influenced by the actions of other agents when their position is rendered incoherent and they are forced to update.
The interaction of agents is affected only by random processes. For example, agents do not preferentially introduce attack arguments against agents with a high Hamming distance to their position. They are also ignorant of the relations their introduced argument will have to existing arguments.
Agents impact the opinion dynamics of others by introducing logical constraints to the debate. For a minimal example, consider two agents with positions a 1 = {p 1 → True, p 2 → True, p 3 → False} and a 2 = {p 1 → False, p 2 → False, p 3 → True}, and suppose that a 1 introduces the valid argument (p 1 ∧ p 2 ) =⇒ ¬p 3, thus reflecting its truth-value attributions. The argument stands against a 2's belief in p 3, but is a 2 forced to update its system of belief because of the argument? No. a 2 does not accept p 1 and p 2, and so need not be moved by an argument that relies on their truth. If a 2 accepted both p 1 and p 2, then giving up p 3 would result in the shift to the closest coherent position.

This simple example illustrates how agents have only indirect control over the beliefs of others. Through their argument introductions, agents shape the space of complete and coherent positions. But what the other agents make of their options is a different issue.
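The minimal example can be checked mechanically (a sketch with a hypothetical `coherent` helper; sentences as signed integers, +i for p i and −i for ¬p i):

```python
def coherent(position, arguments):
    """A complete position satisfies every argument (implication) in the debate."""
    holds = lambda lit: position[abs(lit)] == (lit > 0)
    return all(not all(holds(p) for p in prems) or holds(c)
               for prems, c in arguments)

debate = [(frozenset({1, 2}), -3)]            # (p1 ∧ p2) -> ¬p3, introduced by a1
a1 = {1: True, 2: True, 3: False}
a2 = {1: False, 2: False, 3: True}
print(coherent(a1, debate), coherent(a2, debate))  # True True: a2 need not update
a2_variant = {1: True, 2: True, 3: True}
print(coherent(a2_variant, debate))                # False: now an update is forced
```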
• Collectives: Agents are not grouped into collectives. The population is regarded as a uniform whole.
• Observation: Among other things, the model tracks every agent and its position at every debate stage, as well as the density of that stage. These are the two fundamental variables for calculating polarisation measures. The current implementation of the model logs more information about the model run, including position updating at each stage and all of the arguments introduced at any given debate stage.
At the start of every simulation, the sentence pool is generated. In the main simulation presented below, the sentence pool is generated from 20 atomic sentence variables, p 1, p 2, ..., p 20, and their negations, ¬p 1, ¬p 2, ..., ¬p 20. The sentence pool thus consists of 40 sentences. From this pool, a premise pool is constructed. The premise pool consists of all combinations of sentences that can be used in an argument. Given that the argument length is set to 2 or 3 premises and given the condition that an atomic sentence variable should appear only once in the premises of each argument, the number of possible combinations of premises is (40 · 38)/2 + (40 · 38 · 36)/6 = 760 + 9120 = 9880.
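The premise pool count can be verified with a few lines of Python, using the combinatorial reading above (pairs of sentences minus the contradictory ones, plus triples over distinct variables in either polarity):

```python
from math import comb

n_vars = 20
pool = 2 * n_vars  # 40 sentences: each variable and its negation
# 2-premise sets: all pairs minus the 20 contradictory pairs {p_i, ¬p_i};
# 3-premise sets: choose 3 distinct variables, each in one of 2 polarities.
two = comb(pool, 2) - n_vars
three = comb(n_vars, 3) * 2 ** 3
print(two, three, two + three)  # 760 9120 9880
```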
Agents select their initial position by randomly assigning truth values to every atomic sentence variable (though see the Robustness Analysis Section, which varies this initial setting). In the simulations below, these truth values are either True or False; simulating positions with probabilistic assignments is left for future research. The debate contains no arguments at the beginning of the simulation.
What kind of debates are simulated with the initial values for the main experiment? Given the population size and the 40 sentences under discussion, these artificial debates could model political deliberation in parliament, scientific deliberation at a conference (think of a panel and its audience), or the participants at a citizen deliberation event. However, the fact that there is a continuous debate forum and no side conversations take place is a simplification over the real-world originals, while the restriction to arguments with 2 and 3 premises is due to computational limitations.

Input
Input to the model is limited to the settings in the model initialisation. Environment variables such as the premise pool and the space of coherent and complete positions change only based on the agents' behaviour, as described in the two sub-modules.

Sub-modules
There are two noteworthy sub-processes that shape the evolution of debates in the model. Both have elements of random choice, which is why the model should be evaluated in simulation experiments with many runs.
• Argument introduction: At every debate stage, two agents from the population are drawn at random. Depending on the argumentation strategy, the first agent (the source) then draws premises from the premise pool that meet the criteria imposed by the strategy. For example, in the convert strategy it will draw a random set of premises that is accepted by the target agent. The fortify strategy is a special case in this regard, since it does not require inspecting the target's belief system.
Next, the source agent looks for a conclusion from the sentence pool that (a) is not equivalent or contradictory to one of the premises and (b) meets the criteria that the argumentation strategy imposes on conclusion choice. For example, in the convert strategy, this conclusion must be one that is currently accepted by the source agent. The search continues until a valid argument is found, i.e. one that is jointly satisfiable in conjunction with the arguments already present in the debate. When the introduction succeeds in this manner, the set of selected premises is removed from the premise pool, which means that this set of premises is unavailable for subsequent argument introductions for the rest of the model run.
If no valid argument can be found for this particular pair of agents, the process is repeated at most A/2 times by drawing another pair of agents from the population, where A is the size of the population.
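The retry logic can be sketched as follows (hypothetical names; `try_build` stands in for the strategy-specific argument construction and returns None on failure):

```python
import random

def introduce_argument(agents, try_build, rng=random):
    """Draw (source, target) pairs at random until an argument is found,
    attempting at most len(agents) // 2 pairs; try_build(source, target)
    returns an argument or None if the strategy's constraints cannot be met."""
    for _ in range(len(agents) // 2):
        source, target = rng.sample(agents, 2)
        arg = try_build(source, target)
        if arg is not None:
            return arg
    return None  # the simulation terminates when no pair yields a valid argument
```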
It should be noted that argument introduction almost always changes the extension of the space of coherent and complete positions, although argument introductions can differ significantly in their impact. Argument introductions can render previous positions incoherent, and are the driver behind agents updating their positions in the course of the debate.
• Position updating: Newly introduced arguments can render existing positions incoherent, and they regularly do. After each argument introduction, all agents in the debate check whether their position is still valid given the new debate stage, and all agents that now hold incoherent positions update them.
For all agents, the update strategy in this model is always the move to the closest coherent position. To find the closest coherent position, every agent with an incoherent position compares its position to all coherent positions in the SCCP and moves to the one with minimal Hamming distance. If there are multiple positions at minimum distance, one is chosen at random.
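The position-updating step can be sketched as follows. The function name, and the representation of positions as Boolean tuples and of the SCCP as a list, are illustrative assumptions, not the paper's actual implementation:

```python
import random

def update_position(position, sccp):
    """Move an incoherent position to the closest coherent one.

    `position` is a tuple of truth values over the sentence pool;
    `sccp` is the space of coherent and complete positions at the
    current debate stage (here simply a list of such tuples).
    """
    if position in sccp:
        return position  # still coherent, no update needed

    def hamming(p, q):
        return sum(a != b for a, b in zip(p, q))

    min_d = min(hamming(position, c) for c in sccp)
    closest = [c for c in sccp if hamming(position, c) == min_d]
    return random.choice(closest)  # tie-break at random
```

Ties are broken at random, mirroring the random choice described above.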

Experimental design
Since argument introduction and position updating include elements of random choice, the model must be studied in simulation experiments with many iterations. In this section, I present the results of seven experimental settings. The main experiment has a population of agents and atomic sentence variables, resulting in a sentence pool of sentences. In a robustness analysis, I also study the model in conditions of (1) agents and sentence variables, (2) agents and sentence variables, (3) agents and sentence variables (resulting in a sentence pool of sentences), (4) agents and sentence variables, but an argument length of - premises instead of -, (5) with initial positions in perfect bi-polarisation and (6) with a clustering on a subset of key issues. This robustness analysis not only confirms the results from the main experiment, but shows that polarisation effects can be amplified by contraction of the population and, in particular, of the sentence pool.

There are a total of , experiments in each setting, , for each argumentation strategy. Apart from varying population size, sentence pool, and, in one case, length of arguments, all experiments have the same set-up: all five argumentation strategies are compared in each experiment, and each experiment runs until either a density of ≥ 0.8 is reached or an argument introduction fails, whichever occurs first. In the main experiment, all debates end due to the density condition. Termination there occurs on average after turns for convert model runs, in undercut, in fortify, attack models take on average, and models with the any strategy . All in all, the data for the main experiment consist of , debate stages. I evaluate the model runs by applying three polarisation measures, all of which are adapted from the definitions in Bramson et al. ( ): dispersion, group divergence, and group consensus. All measures return values in the [0, 1] range.
The observed values for issue polarisation are lower in this model than in other studies, and only a low percentage of simulations end in clear-cut bi-polarisation (which happens frequently in the social influence and Bayesian models discussed above). But it is important to keep in mind that this model only accounts for the influence of arguments and argumentation strategies on polarisation, and drivers such as homophily as well as other properties of agents are ignored.

Dispersion, understood as standard deviation
Dispersion tracks an intuitive idea of how agents and their belief systems can polarise by measuring how agents deviate from a population-wide mean. If agents spread out evenly or cluster around one pole, dispersion will be low, but clustering around an increasing number of poles will lead to increased dispersion.
When agents' belief systems are understood in terms of positions toward a debate stage, it is usually impossible to define a population-wide mean. This is because there will often be several, but distant, positions that likewise maximise centrality measures, and graphs on positions often have more than one graph centre.
A way to avoid this is to replace the mean position with the mean distance between all pairs of positions, and then inspect the dispersion of distances around that mean. But one must be careful to select a polarisation measure that is an aggregation of distances: merely interpreting the average distance between pairs of agents as dispersion would lead to a concept of polarisation that is too close to a concept of agreement. Following Bramson et al. ( , ), I use the standard deviation of pairwise distances as a measure of dispersion (Definition ).
Definition . Dispersion, understood as the standard deviation of the pairwise distances between agents' belief systems. Let $\delta$ be the Hamming distance and $A_\tau$ the set of agents at debate stage $\tau$, represented by their positions. $\binom{A_\tau}{2}$ denotes the set of pairs of agents in the population. With $N = |A_\tau|$, dispersion is defined as:

$$\text{dispersion}(A_\tau) = \Bigg( \binom{N}{2}^{-1} \sum_{\{x,y\} \in \binom{A_\tau}{2}} \big( \delta(x,y) - \bar{\delta} \big)^2 \Bigg)^{1/2}, \quad \text{where } \bar{\delta} = \binom{N}{2}^{-1} \sum_{\{x,y\} \in \binom{A_\tau}{2}} \delta(x,y).$$

Note: This is an instance of the common SD measure from statistics. $(\ldots)^{1/2}$ is used here instead of $\sqrt{\ldots}$ for cleaner display.
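A minimal sketch of this measure, under the assumption that positions are tuples over the sentence pool and that distances are normalised by the pool size so the measure stays in [0, 1]:

```python
from itertools import combinations
from statistics import pstdev

def hamming(p, q):
    """Hamming distance between two positions, normalised to [0, 1]."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def dispersion(positions):
    """Population standard deviation of pairwise Hamming distances."""
    dists = [hamming(x, y) for x, y in combinations(positions, 2)]
    return pstdev(dists)  # population SD, matching the 1/C(N,2) form
```

`pstdev` divides by the number of pairs, matching the $\binom{N}{2}^{-1}$ factor in the definition rather than the sample SD's $n-1$.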
Figure shows the development of dispersion depending on argumentation strategy, plotted against density in the main experiment. It allows for a very general inspection of polarisation, and shows that the introduction of arguments generally increases polarisation. The argumentation strategies differ in their contribution to polarisation, which is comparatively high in the attack and comparatively low in the convert strategy. Agents with the any strategy show a slightly higher rate of polarisation at lower density, but end up less polarised than the attack and fortify model runs. This indicates that the different effects of the strategies balance each other out when triggered in alternation.
When simulations reach densities of around . , the model runs are about to terminate after each agent has introduced - arguments on average, and the effects of argumentation will be most visible in this period. At densities of around . , the mean values for attack simulation runs are higher than in other strategies, particularly compared to convert and undercut (see Figure in Appendix C). The latter more often reach lower dispersion values than the other strategies, and some of their simulation runs have dispersion values comparable to their initial values, going against the general tendency.

Group-based measures
Dispersion measures pairs of agents uniformly and is ignorant as to whether they are members of different communities, or groups. Group-based measures (as defined by Bramson et al. ( , -)) rely on the community structure, or clustering, of the population and treat distances between neighbours (members of the same group) differently from distances between strangers (members of different groups). Groups can be determined either endogenously or exogenously (Bramson et al. , -). An endogenous definition works on the structure of the population alone, as in community structuring algorithms. Exogenous definitions require pre-defined criteria to partition the population into groups.

While the model could be adapted to structure its communities exogenously (for example, by adding a central thesis or background beliefs to the debate and forming groups based on whether agents accept them), in its current form the model recommends an endogenous approach. For this, I have utilised two state-of-the-art clustering algorithms: Leiden (Traag et al. ), a modularity maximisation algorithm, in the implementation from python-igraph version . . (Csárdi & Nepusz ), and affinity propagation (Frey & Dueck ) from scikit-learn version . . (Pedregosa et al. ). The results below are mostly reported following the Leiden clusterings, while affinity propagation was used to compare and validate the Leiden clusterings.
The input to each clustering algorithm is the distance matrix of the population of agents, where the values are the Hamming distances between the agents' positions, normalised by the number of sentences (i.e., HD(x, y)/40 for agents x, y in the main experiment). For Leiden and affinity propagation, these distances were transformed by exp(−4x), and resulting values below . were filtered out. This reduces the number of edges in the graph, which improves the clustering for Leiden. Leiden seems to expect sparsely connected graphs (as common in social networks), and the transformed and filtered distance matrices lead to a success rate consistently above %. Transformation and filtering also improve the convergence rate for affinity propagation, which is then around -%. Both algorithms output non-overlapping clusters and are deterministic, i.e. they output the same clustering for the same input every time they are run.
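The transformation step might be sketched as follows. The cut-off value of 0.01 is a placeholder, since the exact filter threshold is not restated here, and the function name is illustrative:

```python
import numpy as np

def to_similarity_graph(dist_matrix, threshold=0.01):
    """Turn a normalised Hamming-distance matrix into edge weights.

    Distances are mapped through exp(-4x), as described in the text;
    weights below `threshold` are dropped to sparsify the graph for
    Leiden. Returns a weight matrix with zeros for filtered entries
    and for the diagonal (no self-loops).
    """
    weights = np.exp(-4 * np.asarray(dist_matrix, dtype=float))
    weights[weights < threshold] = 0.0  # sparsify: drop weak edges
    np.fill_diagonal(weights, 0.0)      # no self-loops
    return weights
```

The resulting matrix can then be read as a weighted graph, e.g. via `igraph.Graph.Weighted_Adjacency`, before running the Leiden algorithm.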

Since there are no previous reports of applying Leiden and affinity propagation to agent-based debate models built on TDS, it is important to ensure that these algorithms return reliable results. One way to measure the quality of clusterings is the adjusted Rand index (ARI, Hubert & Arabie ). The ARI compares two clusterings by counting how many agents are clustered into the same group in both clusterings, and how many are clustered into different groups. For the present purpose, I apply the ARI to count how many pairs of agents that are clustered into the same community in one debate stage are also members of the same community in the following debate stage, thus measuring how far an argument introduction changes the clustering. A low ARI indicates that many agents have been clustered differently compared to the previous debate stage, while a higher ARI shows that more agents are in the same cluster as before, implying a lower mobility of agents and less force of arguments to influence the composition of groups. The goal, then, is a somewhat high, but not too high, mean ARI value that confirms the intuitively plausible expectation that the majority of agents remain in their group in most debate stages. The model should also allow for some fluctuation in the ARI, because some argument introductions have little if any effect on the debate, while others convince many agents to change their views. In the evaluation of the model, the ARI between pairs of adjacent debate stages took a median value of about . , depending on the argumentation strategy (see Figure in Appendix B). The observed values indicate that clusterings based on the model are stable enough to simulate intuitively plausible opinion dynamics, and are thus reliable.
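Using scikit-learn's ARI implementation, the stage-to-stage comparison could be sketched as follows; the function name and the list-of-label-sequences layout are illustrative:

```python
from sklearn.metrics import adjusted_rand_score

def stage_to_stage_ari(clusterings):
    """ARI between each pair of adjacent debate-stage clusterings.

    `clusterings` is a list of cluster-label sequences, one per debate
    stage, aligned by agent index. Values near 1 mean the group
    composition barely changed after an argument introduction; low
    values mean the groups were substantially reshuffled.
    """
    return [adjusted_rand_score(a, b)
            for a, b in zip(clusterings, clusterings[1:])]
```

Note that the ARI is invariant under relabelling, so a clustering that merely renames its groups between stages still scores 1.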

Group divergence
An interesting question to ask about a population's community structure is how far apart its groups are, or what their degree of divergence is. In Bramson et al.'s understanding, this measure compares the group opinion means ( , ). As before, since the concept of a mean is hard to apply if belief systems are modelled as positions toward debate stages, I use the averaged distance between all position pairs instead of a single mean value for the group.
For divergence, this translates into measuring how, for each agent, the distances to its neighbours deviate from the distances to its strangers. This gives a measure of how distant the groups are. See Definition for my formulation.
Definition . Group divergence, based on Bramson et al. ( , -). As before, let $A_\tau$ be the population of agents at debate stage $\tau$, represented by their positions, and let $\delta$ be the Hamming distance. For a position $x_i$, $G(x_i)$ is the set of positions of the same group (neighbours), while $\lnot G(x_i)$ is the set of out-group positions (strangers), as determined by the algorithm. With $N = |A_\tau|$:

$$\text{divergence}(A_\tau) = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\sum_{x_k \in \lnot G(x_i)} \delta(x_i, x_k)}{|\lnot G(x_i)|} - \frac{\sum_{x_j \in G(x_i)} \delta(x_i, x_j)}{|G(x_i)|} \right|$$

Note that $| \cdot |$ denotes either the cardinality of a set or the absolute value of a distance, depending on its argument.
Note: The egocentric "me" in the measure runs on index i. Its neighbours run on index j, and its strangers on k.
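A sketch of this measure, with the added assumption (not stated in the definition) that agents in singleton groups, or stages with a single group, contribute nothing to the sum:

```python
def hamming(p, q):
    """Hamming distance between two positions, normalised to [0, 1]."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def group_divergence(positions, labels):
    """Mean absolute gap, per agent, between the average distance to
    strangers and the average distance to neighbours.

    `labels[i]` gives agent i's group as assigned by the clustering.
    """
    n = len(positions)
    total = 0.0
    for i in range(n):
        neigh = [hamming(positions[i], positions[j])
                 for j in range(n) if j != i and labels[j] == labels[i]]
        strang = [hamming(positions[i], positions[k])
                  for k in range(n) if labels[k] != labels[i]]
        if neigh and strang:  # skip singleton groups / single-group stages
            total += abs(sum(strang) / len(strang) - sum(neigh) / len(neigh))
    return total / n
```

On a perfectly bi-polarised population, in-group distances are 0 and out-group distances are 1, so the measure returns its maximum of 1.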
Before going into the analysis of averaged values from large numbers of simulations, let me present the clustering analysis and resulting divergences in single runs of the model. Figure looks at two single runs, one attack and one convert, and shows how the populations of agents therein move into states of low and moderate polarisation. These could be interpreted as typical evolutions for the attack and convert strategies. While all strategies have a non-zero chance of ending in low or moderately polarised states, low polarisation is much more likely in convert, and to a lesser extent in undercut debates, while moderate polarisation is most likely in attack, and somewhat more likely in fortify debates. So the evolution shown in Figure for attack could have materialised with a different strategy, but it is more likely for an attack debate to behave in this way.
Probably the most interesting feature of the attack series in Figure is the tri-polarisation in the last debate stage, especially when contrasted with the convergence in the last debate stage of the convert series. There are two other differences. First, notice how divergence is steadily increasing in the attack run, but develops non-monotonically in the convert run. This seems to show how the convert strategy is able to recover from increasing polarisation. Secondly, while both runs start with a high number of groups ( ), the attack strategy very quickly reduces to only groups, while the convert strategy is able to maintain its diversity until τ_25, and is able to uphold groups until at least τ_75. This ability to maintain a higher diversity could be interpreted as contributing to the lower values observed in convert simulations.
From the main experiment, the overall results for divergence depending on the two clustering algorithms are shown in Figure . As in the results for dispersion, these show how the introduction of arguments contributes to polarisation in general. More particularly, the attack strategy shows the highest polarisation values, while undercut and convert runs more frequently end in less polarised states. Values for the any strategy again lie between those of the four basic strategies, confirming the observation from the dispersion measurement. All together, this confirms the overall pattern from the dispersion values, although with considerably higher levels of polarisation.
Figure takes a more concentrated look at the divergence data by showing the divergence distribution for the simulation runs as they reach a density of around . . The panes compare the main experiment with two robustness analyses, and they show a noteworthy difference among argumentation strategies: convert and undercut model runs reach low levels of divergence much more often, and they have smoother distributions, whereas fortify and particularly attack model runs are single-peaked with a considerably lower chance of ending in low polarisation, an effect that remains stable in the robustness analyses (to be further discussed in the dedicated section below). For example, the figure shows that the convert strategy has a much higher proportion of model runs with a divergence of less than . at density . than the attack strategy, and that the biggest proportion of convert runs (about %) in the main experiment has a divergence of around . , while the attack strategy peaks at around . with a proportion of more than %.
In quantitative terms (see Figure in Appendix C), differences in group divergence are more pronounced than in dispersion, but the tendency is the same. In divergence analysed with the Leiden algorithm, % of simulation runs with the convert strategy have a group divergence of less than . , which is a very low increase, if any at all, from the start of the debate. For the attack strategy, only . % of simulation runs reach this low level of polarisation. But about % of simulation runs with the attack strategy show moderate polarisation of at least . , while only about % of convert debates do so. The fortify and undercut strategies are somewhere in between, with fortify debates showing more tendency towards medium polarisation and undercut debates showing at least some chance of lower polarisation. The divergence mean for all data points at a density of around . is . for convert, but . for attack (see Table ).

Table provides a possible explanation for the lower mean polarisation in convert and undercut: it might be due to their increased ability to reach states of very low polarisation more often (the lowest % of both strategies have a mean of . , compared to . in attack). However, the strategies seem to differ less in their chance to reach higher values of polarisation, as the highest % of simulation runs show lower variation.
Qualifying the result that arguments drive polarisation on their own, two argumentation strategies, convert and undercut, have a higher chance to end in states of low polarisation, while the other two, fortify and attack, tend to drive moderate levels of polarisation. These tendencies are noteworthy because they run parallel to another distinction: fortify and attack are very much egocentric argumentation strategies insofar as they select premises from the source position. Convert and undercut are allocentric strategies by the same standard: the source devises an argument with premises that the target accepts. So it seems that, in agent-based debate models, egocentric premise selection can be a driver of moderate polarisation, which is most pronounced in the attack strategy, while allocentric premise selection has a higher chance of inducing states of lower polarisation, which is most pronounced in the convert strategy.

Group consensus
When a population of agents is clustered into groups, one can not only ask how much the groups differ, but also how high the agreement is within each individual group. Are they a tightly knit bunch or a more diverse group, in which disagreement may not be uncommon at all? Group consensus aims to measure this.
Definition . Group consensus, based on Bramson et al. ( , -). Let $\delta$ be the Hamming distance and $G$ the clustering of the population at a debate stage, with individual clusters $g$. The expression $\binom{g}{2}$ is understood to denote the set of agent pairs in $g$. The debate's consensus is then given as:

$$\text{consensus}(A_\tau) = \frac{1}{|G|} \sum_{g \in G} \left( 1 - \binom{|g|}{2}^{-1} \sum_{\{x,y\} \in \binom{g}{2}} \delta(x,y) \right)$$
Group consensus measures how distant the members of groups are from each other on average. The measure can capture situations in which the distance within groups changes over time: contracting groups could be associated with lowering compatibility to outside influences, while rising distance between the group members could indicate that the groups are accustomed to diversity of opinion and thus more open to outside influence. A rise in group divergence and a simultaneous rise in consensus captures an important part of the intuitive understanding of polarisation.
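A sketch of group consensus as the average over groups of one minus the mean within-group pairwise distance; treating singleton groups as fully agreeing is an added assumption:

```python
from itertools import combinations

def hamming(p, q):
    """Hamming distance between two positions, normalised to [0, 1]."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def group_consensus(positions, labels):
    """Average over groups of (1 - mean within-group pairwise distance).

    A value of 1 means every group is internally unanimous; lower
    values indicate internal diversity of opinion.
    """
    per_group = []
    for g in set(labels):
        members = [p for p, l in zip(positions, labels) if l == g]
        pairs = list(combinations(members, 2))
        if pairs:
            mean_d = sum(hamming(x, y) for x, y in pairs) / len(pairs)
        else:
            mean_d = 0.0  # singleton group: trivially in full agreement
        per_group.append(1.0 - mean_d)
    return sum(per_group) / len(per_group)
```

On this reading, rising consensus directly reflects shrinking within-group distances, which is how the measure is interpreted in the text.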

Figure shows how group consensus develops amid the introduction of arguments in the main experiment.
It is evident that group consensus correlates with density (Pearson's r > 0.9, p < 0.001 for all strategies), and that the variation between different strategies has only a minor effect. Rising group consensus indicates that variance within groups diminishes, but it does not automatically indicate that the groups move toward a more extreme stance than initially held by their members, and so this development does not quite confirm the law of group polarisation (Myers ; Sunstein ).
Figure : Development of group consensus in debates clustered with the Leiden algorithm, depending on argumentation strategy. Shaded areas show standard deviation.
It seems that the introduction of arguments, virtually irrespective of the employed argumentation strategy, can bring groups closer together. When evaluated together with the results from group divergence, there is a difference in the kind of group that is, on average, produced by the argumentation strategies: while convert and undercut arguments lead to groups that are both in internal agreement and diverge less often from other groups, attack and fortify arguments have a more realistic chance to drive internally agreeing groups further apart, thus generating polarisation.

Robustness Analysis
Six experiments complement the main experiment, which has a population of agents, atomic sentence variables, and an argument length of - premises. The first four complementary experiments show that polarisation effects remain at least stable under variation of the initial settings concerning population size, extension of the sentence pool, and length of arguments. Table from the previous section shows the mean values for group divergence at a density of . for the main experiment and two of these robustness analyses.
In a fifth robustness analysis, agents are not initialised with randomly assigned positions, but start off clustered into two groups in perfect bi-polarisation. This setting is designed to study the model's behaviour concerning de-polarisation rather than polarisation. In the sixth and final analysis, the Leiden clustering is not obtained by taking into account agents' complete positions, but only their stances on four propositions. This analysis accommodates the fact that many real-world debates have a subset of sentences under discussion that are regarded as the debate's key issues.
Polarisation is relatively low at the beginning of a model run. This raises the question of how the model behaves when the population starts highly polarised. In this robustness analysis, perfect bi-polarisation is induced by splitting the population of agents in half and assigning the same position to each agent in each half. All agents in the first group start by assigning True to the first half of sentences, but False to the other half ({{p_0, p_1, ..., p_9} → True, {p_10, p_11, ..., p_19} → False}), and the agents in the second group hold the exact inverse at the start ({{p_0, p_1, ..., p_9} → False, {p_10, p_11, ..., p_19} → True}). This creates an initial perfect bi-polarisation in terms of group divergence (see Figure ).
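The bi-polarised initialisation described here can be sketched as follows; the function name and tuple representation are illustrative:

```python
def bipolarised_population(n_agents, n_sentences):
    """Initialise a perfectly bi-polarised population.

    Half of the agents accept the first half of the sentence pool and
    reject the second half; the other agents hold the exact inverse.
    """
    half = n_sentences // 2
    group_a = tuple([True] * half + [False] * (n_sentences - half))
    group_b = tuple(not v for v in group_a)
    first = n_agents // 2
    return [group_a] * first + [group_b] * (n_agents - first)
```

Since in-group positions are identical and out-group positions differ on every sentence, group divergence on this population starts at its maximum value.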

There is a striking difference between the argumentation strategies as they respond to initially bi-polarised debates. While the strategies that select premises allocentrically (convert and undercut) show significant effects of de-polarisation, the egocentric strategies (attack and fortify) prove unable to recover from a state of bi-polarisation. Populations that use only these strategies remain in a state of bi-polarisation throughout the debate, while the convert and undercut strategies quickly lead to significantly lower polarisation levels. When allocentrism and egocentrism in premise choice are mixed in the any strategy, the outcome is mixed as well: de-polarisation occurs, but at a lower rate than in the purely allocentric strategies.
Figure : Group divergence following Leiden clusterings for an experiment with agents and atomic sentence variables in which the agents start in perfect bi-polarisation. The graphs for the attack and fortify strategies are exactly the same and overlap in this plot.

Clustering on a subset of propositions
The clusterings in the evaluation above take into account agents' complete positions, which assumes that all sentences under discussion are equally relevant in determining the groups. Yet debates often evolve around a set of key issues. For these, it may be more realistic to cluster agents into groups depending only on their stance toward these key issues. Figure shows the results of such a clustering on a subset of the sentence pool consisting of four propositions. The debate stages from the main experiment are used for this analysis, but instead of using agents' complete positions for the clustering, it asks how the population would have been clustered if only these four propositions had been taken into account. The results confirm the main findings, but there is significantly more volatility in the data.

Summary of results
In this paper, I studied an agent-based debate model based on the theory of dialectical structures. It models a population in which agents update their perspective due to logical constraints, but not based on social factors such as similarity or trust. A simulation experiment of model runs revealed that arguments can generally be a driver of issue polarisation, and that argumentation strategies affect it differently. This result was confirmed in a robustness analysis. In dispersion and group divergence, two state-of-the-art measures for issue polarisation, argumentation strategies that behave egocentrically in the selection of premises (attack and fortify) were associated with significantly higher levels of polarisation compared to strategies that select premises allocentrically (convert and undercut). All argumentation strategies increased issue polarisation similarly when observed in a third measure, group consensus. Besides the general influence of arguments on polarisation, the picture that emerged here was that the attack and fortify strategies simultaneously lead to groups being more alike internally and more distant from other groups, while convert and undercut produced groups that, despite rising internal consensus, did not move apart from other groups as much.
The argumentation strategies also significantly differed in their ability to recover from bi-polarisation: when agents used the allocentric strategies or a mixed strategy ("any"), they were able to de-polarise debates with initial bi-polarisation. However, the egocentric strategies failed to recover from perfect bi-polarisation and did not show any ability to de-polarise.
The model shows that polarisation is possible among artificial agents by means of rational processes. Introducing arguments and responding to those of others in a rational manner influences the polarisation dynamics in this model: an initially unpolarised population can move to medium levels of polarisation, and populations that differ in their argumentation strategy also differ in their ability to de-polarise an initial setting of perfect bi-polarisation. Argumentative, rational behaviour is the sole driver inspected in this model; it is left for future research to inspect polarisation dynamics as argumentation interacts with other factors.

Limitations
The model presented here is intended to understand issue polarisation in a specific kind of artificial agent. The agents are modelled to have bounded rationality and always follow the same argumentation and updating strategies, without making any errors in applying them. The results from simulations on this model should not be directly applied in interpretation of human behaviour and/or states of social polarisation. Rather, this model elucidates properties of argumentative features irrespective of other variables, of which there are quite a few.
On a minor note, Polberg & Hunter ( ) stress the importance of modelling (a) bipolar argumentation, allowing for both support and defeat relations in agent-based debate models, but also of modelling (b) probabilistic belief systems. The model presented here fulfils their requirement (a) but falls short of fulfilling (b), mainly due to computational restrictions. An extension of the model to probabilistic belief systems is left for future research.
As mentioned above, the simulation results fall short of producing high and very high polarisation values. This is in contrast to some social influence models (Mäs & Flache ; Banisch & Olbrich ), which often end in states of perfect bi-polarisation. Yet this inability to produce perfect bi-polarisation should be seen as a virtue rather than a vice. If argumentation alone were to explain high and very high degrees of issue polarisation among artificial agents, there would be no room to accommodate other factors in extended models. The factors not considered in this model include homophily, limited agent memory, and bias in the selection of communication partners relative to the argumentation strategy. Extensions of this model could consider whether there should be some bias in selecting a target position given some of the argumentation strategies: for example, what changes if agents only attack out-group targets?
Figure : Development of density over simulation time (in terms of debate stages) depending on argumentation strategy and averaged over all model runs in the main experiment. Here, attack and fortify model runs can take considerably more simulation time to reach the termination density of . .

Appendix C: Raw polarisation values
The plots in this appendix display raw polarisation values. They are interpreted as heat maps in tabular alignment, where the x-axis shows polarisation intervals and the cells contain the proportion of model runs in the respective argumentation strategy that lie in this interval as they reach a density of at least . .  Figure : Polarisation distribution measured as group divergence based on a clustering from the Leiden algorithm for simulation runs as they reach a density of . (data for control experiment with agents and atomic sentence variables).