A geospatial bounded confidence model including mega-influencers with an application to Covid-19 vaccine hesitancy

We introduce a geospatial bounded confidence model with mega-influencers, inspired by Hegselmann and Krause. The inclusion of geography gives rise to large-scale geospatial patterns evolving out of random initial data; that is, spatial clusters of like-minded agents emerge regardless of initialization. Mega-influencers and stochasticity amplify this effect, and soften local consensus. As an application, we consider national views on Covid-19 vaccines. For a certain set of parameters, our model yields results comparable to real survey results on vaccine hesitancy from late 2020.


Introduction
Opinions drive human behavior [36], and opinion formation is a complex multi-scale process involving characteristics of the individual, local interactions among individuals, social media, mass media, etc. Opinion dynamics have been modeled using approaches inspired by physics [38]. For surveys of the literature on opinion dynamics, see for instance [27], [31], and [28].
Opinions are formed in part by people talking to their families, friends, colleagues, etc. This is the sort of mechanism that the bounded confidence model of [21] aims to capture. It is just one of several opinion dynamics models that have appeared in the literature; see for instance [5,18,20,37] for others. However, our work here starts with the Hegselmann-Krause model.
Hegselmann and Krause [22] augmented the model to include the impact of "radicals" on opinion formation. By their definition, a "radical" is an individual (or a group of individuals) holding an opinion that is extreme (at one end of the opinion spectrum) and unchanging. [29] proposed another model including radicals. Our Opinion Dynamics Network (ODyN) model includes "radicals" as well. We call them mega-influencers, thinking of mass media, prominent politicians, etc., and assuming that a mega-influencer is heard by a large fraction of the population.
To a model of opinion space dynamics with mega-influencers in the style of earlier work such as [22] and [29], we add the new feature of geospatial dynamics. We assume that individuals who are further apart from each other in two-dimensional space are less likely to influence each other's opinions; this is reminiscent of the geometric inhomogeneous random graphs of [7]. The addition of a notion of spatial proximity turns out to have a very interesting effect: large-scale geospatial patterns evolve out of random initial data. That is, spatial clusters of like-minded agents (think "blue states" and "red states") emerge, regardless of initialization.
Closeness in two-dimensional space can be thought of as a stand-in for different notions of closeness; for instance, close family members might be considered "nearby" even when they live on a different continent. However, despite the seemingly geography-less nature of the online world, studies have shown [26] that geographic distance remains a key component in the formation and maintenance of social networks. For an extensive review on spatial networks, see [3]; the networks that we propose here are similar to the hidden variable model for spatial networks presented in Section 3 of [3], but further include bounded confidence. Finally, we introduce the assumption that different people have different levels of influence, as in a Chung-Lu random graph [9,10,11]. Combining all of these factors, we can compute the probability that two agents speak during a given time step and update their beliefs accordingly, similar to the random interactions of Weisbuch, Deffuant, et al. [39]. As a result, opinion clusters no longer become perfectly tight with time, but remain blurred.
Date: 13 October 2022.
Social media does not appear explicitly in our model. Social media interactions can be akin to conversations among friends, family, neighbors, and colleagues, in other words the sort of interactions modeled by the original Hegselmann-Krause model. However, social media users can also be mega-influencers; think of a Twitter account with millions of followers.
We construct a random graph reflecting all the features and assumptions discussed above. Vertices represent individuals, and directed edges indicate who influences whom. (When individual v's opinion influences that of individual u, this does not necessarily imply that u also influences v.) In the simulations presented here, the spatial domain is a triangle, and the spatial locations of individuals are independent of each other and uniformly distributed. The code in the ODyN library (GitHub link removed for anonymity) also allows simulations on unions of triangles, where the number of individuals in each triangle is chosen to be random, Poisson-distributed, with an expected value proportional to the area of the triangle, possibly with different constants of proportionality for different triangles. In short, the spatial locations in the ODyN library are a Poisson point process with a possibly space-dependent rate.
As an example, we consider opinions about Covid-19 vaccination, a topic of urgent current interest for which data are plentiful. By April 19, 2021, vaccination was approved for everyone in the US age 16 and older. Despite the fact that the vaccine was free to all residents of the US, many factors impeded widespread vaccination. There was a lack of availability and access to vaccines in rural areas [30], and vaccine hesitancy was impacted by fear of rare but severe vaccine side-effects [4], social and economic factors [35], as well as targeted misinformation campaigns and politicization of issues surrounding the vaccine [32]. Others have worked on bounded confidence models and the spread of misinformation; see for instance [15]. Such models differ from ours in that they require the presence of a "ground truth," and therefore there is a concept of mis- and dis-information. In that and similar works the focus is on how best to organize communities of interacting agents in the presence of a fixed goal informed by the ground truth, and how to achieve this goal most efficiently [13], [14], [16], [33], [40]. However, when the question is whether or not to accept a Covid vaccine, there is no objective, unquestionable "ground truth."
Since the onset of the Covid-19 pandemic, the extent of hesitancy regarding the vaccine has been tracked at both the national and county levels. For example, the Centers for Disease Control's data portal includes a dataset that provides county-level estimates for vaccine hesitancy based on data gathered in the U.S. Census Bureau's Household Pulse Survey [8]. Carnegie Mellon University's Delphi Group Covid-19 Trends and Impact Survey is available through a public API [1] and estimates the extent of vaccine hesitancy at the county level, including changes from week to week. At the national level, [34] use survey data to track changes in vaccine hesitancy over time, and notably also include the proportion of people who change their beliefs, becoming either more or less hesitant over time, as shown in the far left of Fig 1. We show how our model can be parameterized to arrive at the empirical results presented in [34], and discuss what this might mean in terms of the mechanism of belief proliferation. Moreover, we demonstrate the formation of spatial clusters as seen in the far right of Fig 1. Other noteworthy work applying opinion dynamics models to mimic empirical results can be found for example in [19], [6].
The paper is structured as follows. We begin by introducing the model, explaining its parameters, dynamics, and statistics. Next we describe a set of simulations that were carried out to perform model analysis. Finally, we present the results of these simulations, along with the application to Covid-19 vaccine hesitancy. All code and data relevant to this paper are distributed in the ODyN library available at (GitHub link removed for anonymity).

Model
2.1. Directed graph encoding who influences whom. Let $N$ be a positive integer, and consider $N$ individuals. We use letters such as $u$ and $v$ (for "vertex"), $1 \le u, v \le N$, to label individuals. We will construct a random directed graph in which the individuals are the vertices, with an arrow (a directed edge) from individual $v$ to individual $u$ indicating that $v$ influences the opinion of $u$. We write $p_{uv}$ for the probability of an arrow from $v$ to $u$.
We do not assume symmetry: $p_{vu}$ need not be equal to $p_{uv}$.

2.2. Spatial locations.
This part of our model is inspired by [7], although several of the details are different here. We assign to individual $v$ a random spatial location $X_v$ in a polygonal domain $D$ in the plane. In the code available through ODyN, $D$ is assumed to be a union of triangles, and the number of individuals per triangle is taken to be random with Poisson distribution, with a rate that can be different for different triangles. The locations of individuals within each triangle are then assumed to be independent and random with uniform distribution (see Fig 2 for an example). We use triangles because they are a flexible way of approximating more complicated shapes, and it is straightforward to generate uniformly distributed random points in a triangle.
In the simulations presented here, we simply take $D$ to be a single triangle, fix $N$, and let the locations $X_1, X_2, \ldots, X_N$ of the individuals be independent, uniformly distributed points in $D$. We assume that $p_{uv}$ is a decreasing function of the Euclidean distance $\|X_u - X_v\|$.
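The two placement schemes described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the ODyN implementation: the function names, the square-root (barycentric) sampling trick, the Knuth-style Poisson sampler, and the `density` parameter (expected individuals per unit area) are all assumptions made for this example.

```python
import math
import random

def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c (each an (x, y) pair)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def sample_in_triangle(a, b, c, rng):
    """Uniform random point in triangle abc via the square-root trick."""
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    # Barycentric weights (1 - s, s(1 - r2), s r2) are uniform on the simplex.
    wa, wb, wc = 1.0 - s, s * (1.0 - r2), s * r2
    return (wa * a[0] + wb * b[0] + wc * c[0],
            wa * a[1] + wb * b[1] + wc * c[1])

def sample_poisson(mean, rng):
    """Poisson variate by Knuth's multiplication method (small means only)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def populate(triangles, density, rng):
    """Poisson number of points per triangle, with mean density * area."""
    points = []
    for a, b, c in triangles:
        count = sample_poisson(density * triangle_area(a, b, c), rng)
        points.extend(sample_in_triangle(a, b, c, rng) for _ in range(count))
    return points

rng = random.Random(0)
unit = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
# Fixed-N placement in a single triangle, as in the simulations of this paper.
fixed_n = [sample_in_triangle(*unit, rng) for _ in range(1000)]
# Poisson placement on a union of two triangles (hypothetical density of 40).
poisson_pts = populate([unit, ((1.0, 0.0), (1.0, 1.0), (0.0, 1.0))], 40.0, rng)
```

A space-dependent rate would correspond to passing a different density for each triangle rather than a single shared value.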
2.3. Influence weights. Following [9], we assign a random influence weight $W_v > 0$ to each $v$. This weight determines how likely others are to listen to $v$, not how much weight they assign to $v$'s opinion; the probability $p_{uv}$ is an increasing function of $W_v$.
We assume $W_v$ to be a heavy-tailed random variable that is always greater than 1. Specifically, we assume that for any $x > 1$,
\[ P(W_v > x) = \frac{1}{x^{\gamma}}, \tag{1} \]
with some exponent $\gamma > 0$. (The parameter $\beta$ of [9] is $\gamma + 1$.) To generate a random number $W_v$ with the complementary distribution function (1), draw a uniformly distributed random number $U \in (0, 1)$, then set
\[ W_v = U^{-1/\gamma}. \]
We will choose a value of $\gamma$ that makes the mean of the distribution of the $W_v$ finite: $\gamma > 1$. Given this constraint, however, we will choose $\gamma$ to make the variance of the distribution infinite, so that outlying values of $W_v$ become fairly common. The variance is infinite for $1 < \gamma \le 2$, and since within this range we don't expect the precise value of $\gamma$ to have a qualitative impact on our results, we choose $\gamma = 1.5$.
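As a quick check of the sampling recipe, the sketch below draws weights by inverse transform, assuming the tail $P(W_v > x) = x^{-\gamma}$ for $x > 1$, and compares the empirical tail with the model value at one point. The function name `sample_weight` and the sample size are choices made for this illustration, not part of the ODyN library.

```python
import random

def sample_weight(gamma, rng):
    """Inverse-transform sample with P(W > x) = x**(-gamma) for x > 1."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids raising 0 to a negative power
    return u ** (-1.0 / gamma)

rng = random.Random(1)
gamma = 1.5
weights = [sample_weight(gamma, rng) for _ in range(100_000)]

# Empirical tail fraction at x = 4; the model value is 4**(-1.5) = 0.125.
tail = sum(w > 4.0 for w in weights) / len(weights)
```

Because the variance is infinite for this value of $\gamma$, the sample mean converges slowly, so a tail fraction is a more stable diagnostic than the sample mean or variance.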
2.4. Opinion scores. Each individual $v$ carries a time-dependent opinion score $H_v$ between −1 and 1 in our model, reflecting their view on Covid-19 vaccines, ranging from $H_v = -1$ (strong willingness) to $H_v = 1$ (strong hesitancy). Following [21] we assume that
\[ p_{uv} = 0 \quad \text{if } |H_u - H_v| \ge b, \]
where $b > 0$ is a threshold. That is, we assume that $v$ cannot have any impact on $u$'s opinion if $u$ and $v$ have starkly different views. Throughout this manuscript, we fix $b = 1.5$. Under this choice of $b$, the classic Hegselmann-Krause model will converge to tight consensus. However, as we will demonstrate, the ODyN model exhibits other emergent phenomena.
2.5. Overall formula for the connection probabilities. The connection probability $p_{uv}$ is given by Eq (2), in which $\mathbb{1}$ denotes the indicator function. The parameter $\lambda > 0$ is a reference length; we take it to be the diameter of the spatial domain. The parameters $\alpha > 0$ and $\delta > 0$ determine the importance of influence weight and spatial proximity, respectively.
2.6. Initialization of opinion scores. The influence weights $W_u$ and spatial locations $X_u$ are independent random quantities, chosen as outlined above. The opinion scores $H_u$ change with time; see the discussion of Hegselmann-Krause dynamics in paragraph 2.7. We assign a random initial opinion score to each individual, drawn from a Gaussian with a fixed standard deviation (0.5 in the simulations presented here) and mean either −1 (with probability $p_{-1}$) or +1 (with probability $p_1 = 1 - p_{-1}$). These assignments are made independently of each other, and independently of the $X_u$ and $W_u$. The $p_k$, $k = -1, 1$, are chosen to reflect publicly available data.
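A minimal sketch of this initialization, under stated assumptions: the function name `initial_opinions` is hypothetical, the values 0.69 and 0.5 are borrowed from the vaccine application later in the paper, and no clipping to the interval from −1 to 1 is applied because the text does not say whether the library clips.

```python
import random

def initial_opinions(n, p_minus, sigma, rng):
    """Draw n scores from a two-component Gaussian mixture centred at -1 and +1."""
    scores = []
    for _ in range(n):
        centre = -1.0 if rng.random() < p_minus else 1.0
        scores.append(rng.gauss(centre, sigma))
    return scores

rng = random.Random(2)
opinions = initial_opinions(10_000, 0.69, 0.5, rng)

# Share of agents starting with a negative score. It is slightly below 0.69
# because the two Gaussian components overlap around zero.
share_negative = sum(h < 0 for h in opinions) / len(opinions)
```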
2.7. Hegselmann-Krause dynamics. Denote the opinion scores after $t$ time steps by $H_u(t)$. (We take $t$ to be a non-negative integer here.) Then $H_u(t)$ is the average of those $H_v(t-1)$ for which either $v = u$, or there is an arrow pointing from $v$ to $u$ at time $t - 1$.
In words, $u$ averages their own opinion with the opinions of those whom $u$ is influenced by. This is the Hegselmann-Krause model [21]. Since the probabilities $p_{uv}$ depend on $H_u - H_v$, they, too, are time-dependent. The connections in the random graph are re-drawn after each time step, reflecting the fact that people don't necessarily speak and interact with the same people every day.
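The averaging rule can be illustrated with a short synchronous update. In this sketch the graph is taken as given for a single step (in the model it is redrawn before every step); the name `hk_step` and the list-of-sets representation are assumptions of the example, not the ODyN data structures.

```python
def hk_step(beliefs, influencers):
    """One synchronous update: each agent averages itself and its influencers.

    influencers[u] is the collection of agents v with an arrow v -> u.
    """
    updated = []
    for u, own in enumerate(beliefs):
        pool = [own] + [beliefs[v] for v in influencers[u] if v != u]
        updated.append(sum(pool) / len(pool))
    return updated

beliefs = [-1.0, -0.5, 0.9]
# Agent 0 listens to agent 1; agent 1 listens to agents 0 and 2; agent 2 to nobody.
influencers = [{1}, {0, 2}, set()]
after = hk_step(beliefs, influencers)
# Agent 0 moves to -0.75, agent 1 to about -0.2, and agent 2 stays at 0.9.
```

All new values are computed from the old list before any is written back, which is what makes the update synchronous.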
2.8. In-degree and clustering coefficient. The in-degree of an individual $u$ is the number of individuals $v$ who influence $u$, that is, the number of $v$ for which there is an arrow from $v$ to $u$. We will keep track of the average in-degree. As the graph is time-dependent, so is the average in-degree. Since every outgoing arrow for one vertex is an incoming arrow for another vertex, the average in-degree equals the average out-degree. Though a person might interact with a larger number of individuals through their online social networks, or a smaller number of individuals through in-person interactions, surveys have shown that people report feeling genuinely close to between 5 and 10 individuals in their social circle, broadly construed [17]. The clustering coefficient was chosen to be consistent with values for average clustering coefficients on directed graphs using random walks on social networks [23].
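The equality of average in-degree and average out-degree can be checked directly on a small example. The helper names and the list-of-sets graph representation below are assumptions made for illustration.

```python
def average_in_degree(influencers):
    """influencers[u] is the set of agents v with an arrow v -> u."""
    return sum(len(sources) for sources in influencers) / len(influencers)

def average_out_degree(influencers):
    """Count each arrow at its source vertex instead of its target."""
    out_counts = [0] * len(influencers)
    for sources in influencers:
        for v in sources:
            out_counts[v] += 1
    return sum(out_counts) / len(out_counts)

# Four agents and six arrows in total.
influencers = [{1, 2}, {2}, set(), {0, 1, 2}]
mean_in = average_in_degree(influencers)
mean_out = average_out_degree(influencers)
```

Both averages equal the number of arrows divided by the number of vertices, here 6 / 4 = 1.5.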
2.9. Mega-influencers. We add to the model two mega-influencers, one with opinion score −1, referred to as the left mega-influencer, and the other with opinion score 1, the right mega-influencer. One might think of these as modeling mass media outlets, outspoken governors, etc. To parallel similar work by Hegselmann and Krause on radicals and charismatic leaders [22], the mega-influencers hold static beliefs throughout. The impact of the mega-influencers is modeled as follows. To each individual $u$, we assign two random numbers $L_u$ and $R_u$, with $L_u = 1$ with probability $p_L$, 0 otherwise, and $R_u = 1$ with probability $p_R$, 0 otherwise, where $p_L$ and $p_R$ are further model parameters, with $0 \le p_L, p_R \le 1$. If $L_u = 1$, then $u$ is susceptible to the left mega-influencer. In that case, while $H_u - (-1) < \epsilon$, where $\epsilon > 0$ is another model parameter, the opinion score of the left mega-influencer, namely −1, will be added to the opinions over which $u$ averages in each step of the Hegselmann-Krause dynamics. Similarly, if $R_u = 1$, then $u$ is susceptible to the right mega-influencer. In that case, while $1 - H_u < \epsilon$, the opinion score of the right mega-influencer, namely 1, will be added to the opinions over which $u$ averages in each step. The parameters $b$ and $\epsilon$ play similar roles, for local interactions and for mega-influencers, respectively. In the code, they need not be the same, but in the simulations presented here, they were the same. Note that our model assumes that to $u$, mega-influencers do not carry more weight than friends or neighbors. The very considerable effect of mega-influencers that we will demonstrate in the computational results is all the more surprising.
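The following sketch shows one way to read this rule: a susceptible agent within the threshold of a mega-influencer simply gains one extra opinion (−1 or 1) in its averaging pool. The function names, the symbol `eps` for the mega-influencer threshold, and the list-based representation are assumptions of this illustration rather than the ODyN code.

```python
import random

def draw_susceptibility(n, p, rng):
    """Independent indicators: each agent is susceptible with probability p."""
    return [rng.random() < p for _ in range(n)]

def hk_step_with_mega(beliefs, influencers, left, right, eps):
    """Synchronous averaging step with static mega-influencers at -1 and +1."""
    updated = []
    for u, own in enumerate(beliefs):
        pool = [own] + [beliefs[v] for v in influencers[u] if v != u]
        if left[u] and own - (-1.0) < eps:
            pool.append(-1.0)  # left mega-influencer counts as one more voice
        if right[u] and 1.0 - own < eps:
            pool.append(1.0)   # right mega-influencer counts as one more voice
        updated.append(sum(pool) / len(pool))
    return updated

beliefs = [0.2, 0.8, -0.9]
no_neighbors = [set(), set(), set()]
left = [True, True, False]
right = [False, False, True]
after = hk_step_with_mega(beliefs, no_neighbors, left, right, eps=1.5)
# Agent 0 is within 1.5 of -1 and moves to -0.4. Agent 1 is susceptible to the
# left but too far away (distance 1.8), and agent 2 is too far from +1.

rng = random.Random(3)
left_random = draw_susceptibility(1000, 0.35, rng)
```

Each mega-influencer enters the pool with the same weight as any single neighbor, matching the remark above that mega-influencers do not carry more weight than friends or neighbors.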

2.10. Parameterization and model creation.
The parameters in our model are $n$, $\lambda$, $\gamma$, $\delta$, $\alpha$, $b$, $\epsilon$, $p_L$, and $p_R$. Using the ODyN library, the OpinionNetworkModel class can be initialized with these parameters as arguments. This model can be populated with individuals bearing both weight and belief scores as described in the previous sections using populate_model(). The belief propagation simulator is loaded as a separate class, NetworkSimulator, and network simulations can be carried out on the model with run_simulation(). Further documentation and demonstrations of this workflow can be found on the project GitHub page (link removed for anonymity), and pseudocode for these procedures is given in Algorithms 1 and 2 below.

Experimental Methodology
3.1. Model Initialization. The model described in 2.1 through 2.10 is generated by Algorithms 1 and 2 below. To populate the network, we generate uniformly distributed random points in a triangle $T$, and as our initial belief distributions, we take symmetric beliefs centered at -1 and 1 with standard deviation 0.5. We run several experiments varying the reach of mega-influencers from the left and right. Each of the experiments has $n = 1000$ agents/vertices and parameters $\lambda$ = 1/10 of the diameter of $T$, $\delta = 8$, $\alpha = 2$, $b = 1.5$, and $\epsilon = 1.5$. These parameters were explicitly chosen to achieve clustering coefficients and in-degrees that are realistic for real-life community interactions, namely, a consistent clustering coefficient of approximately 0.3 and an average in-degree around 5. As noted earlier, when generating the weights $W_u$, we used $\gamma = 1.5$, which yields a heavy-tailed distribution that has finite mean but infinite variance. Our parameters were chosen to mimic a real-life network in a way that is quantitatively supported by the social science research mentioned earlier. For computational feasibility we restrict our attention to networks with only 1000 nodes, bearing in mind that such networks may be susceptible to edge effects.
With Algorithm 1 we assign attributes to individual agents such as weight, spatial distance, position in opinion space, and connection to mega-influencers, which we then use to compute the network graph. Using Algorithm 2 we synchronously update all opinions. After each round of opinion updates, the network graph is recomputed holding weight and spatial distance parameters constant, in the manner of adaptive networks [25]. In the present model, this reflects the fact that somebody who influences my opinion today may not influence it tomorrow, for instance because I may happen not to talk to them tomorrow.

3.2. Stopping criterion.
For each initialization, we carry out 25 experiments, using Algorithm 2, varying the left and right mega-influence (i.e., varying $p_L$ and $p_R$) to the same extent. A stopping criterion is determined as follows. At each time step, a 5-time-step rolling average of the belief change is calculated for each individual. The community-wide mean of the absolute change in belief is then computed. When this value drops below 0.01, the simulation is stopped. We note that this allows individuals to have small oscillations in opinion while the overall community opinion stabilizes. For brevity, in Algorithm 2 we indicate this with a Boolean, stopping criterion satisfied. This threshold is typically reached in 20 or fewer time steps.
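One possible reading of this criterion is sketched below: for each agent, average the signed belief changes over the last five steps, take the absolute value, and average over the community. Whether the library averages signed or absolute changes is not stated in the text, so this interpretation, the function name, and the `history` layout are all assumptions.

```python
def stopping_criterion_satisfied(history, window=5, tol=0.01):
    """history[t][u] is the belief of agent u after t steps."""
    if len(history) < window + 1:
        return False  # not enough steps yet to form a full rolling window
    recent = history[-(window + 1):]
    n = len(recent[0])
    total = 0.0
    for u in range(n):
        changes = [recent[s + 1][u] - recent[s][u] for s in range(window)]
        total += abs(sum(changes) / window)  # rolling mean change of agent u
    return total / n < tol

settled = [[0.1, -0.2] for _ in range(6)]
drifting = [[0.1 * t, 0.0] for t in range(6)]
too_short = [[0.1, -0.2] for _ in range(3)]
```

Under this reading, an agent that oscillates by a small amount around a fixed value contributes almost nothing, since its signed changes cancel within the window.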

Model Analysis.
For each set of parameters, we ran 25 randomly seeded simulations. In Fig 5 we show how the variance of beliefs when the stopping criterion is met (see 3.2) is distributed for each set of 25 experiments. From this plot we can clearly see that a loose consensus is reached in every scenario, but always with a significantly higher variance than what is seen in a classical Hegselmann-Krause model. In the absence of mega-influencers, the standard deviation of beliefs at the final time is typically around 0.13. Given 50% mega-influencer reach it is most often near 0.15, and for 100% mega-influencer reach it is near 0.18.

Algorithm 1 (fragment)
    for i ≤ n do
        for j ≤ n with i ≠ j do
            x ← sampled from U(0, 1)
            if x < p_{u_i u_j} computed using Eq (2) then
    L ← set of agents with belief within ε of the left influencer.
    R ← set of agents with belief within ε of the right influencer.
    return Agent, Weight, Belief, Neighbor, MegaInfluencer
end procedure

As further evidence of this behavior, we show simulation plots for one distinct iteration in Fig 6. The simulation here has reached a stopping point after 16 time steps. We observe that even in the absence of mega-influencers, agents who are situated geographically far from other agents with similar beliefs can still end up stuck in their initial beliefs and therefore become holdouts.
In Fig 7 we take a more geospatial view of the same model, and look at the beliefs as situated in space. For simplicity we color agents white if their beliefs are less than 0 and yellow if their beliefs are greater than or equal to 0. Although the relative magnitudes in

Algorithm 2 (fragment)
    procedure run_simulation(n, ∆, γ, (p_0, p_1, p_2), λ, α, b, ε, p_L, p_R)
        if stopping criterion satisfied then

4.2. Application to Vaccine Hesitancy. [34] present a cohort study of changes in vaccine hesitancy from a baseline survey taken between August 9 and December 8, 2020 and a follow-up survey taken March 2 to April 21, 2021. At baseline, 69% of respondents indicated that they were "likely" or "very likely" to receive the vaccine and were accordingly classified as willing, while the remaining 31% indicated that they were "very unlikely," "unlikely," or "unsure" about receiving the vaccine and were therefore classified as hesitant. At the time of the follow-up survey, 47% of respondents had been vaccinated, 38% were willing to receive the vaccine but had not yet done so, and 15% were hesitant. Notably, there were individuals from both initial cohorts that fell among the vaccinated, willing, and hesitant cohorts at follow-up; that is, people changed their minds to become both more willing and less willing over time. The complete data from this study can be found in a table in [34].
Using the ODyN model, we are able to recreate these results. We seed the model with 1000 agents and initial beliefs centered at -1 and 1 with probabilities 0.69 and 0.31, respectively, and standard deviation 0.5. Fixing model parameters α = 2, b = 1.5, δ = 8, we perform a grid search across different choices for left and right mega-influencer reach. The choice of parameters that best reproduced the real survey results was a left influencer reach of 35% and a right influencer reach of 75%. These results are shown in Fig. 1. For other values of $p_L$ and $p_R$, results are shown in Fig 10 of the Appendix. The plots shown in Fig. 1 and 10 are for individual simulations, but in Fig. 8 we demonstrate how our model compares with [34] across multiple simulations.
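To compare a simulation with the survey cohorts, one needs the share of each initial cohort that ends up willing or hesitant. The sketch below computes these shares using the two-way split from the Fig 8 caption (Willing if belief < 0, Hesitant otherwise). The function name and the toy belief values are hypothetical, and the survey's separate "vaccinated" category is not modeled here.

```python
def transition_shares(initial, final):
    """Share of each initial cohort that ends as willing or hesitant."""
    def label(h):
        return "willing" if h < 0 else "hesitant"

    counts = {}
    for h0, h1 in zip(initial, final):
        row = counts.setdefault(label(h0), {"willing": 0, "hesitant": 0})
        row[label(h1)] += 1
    return {
        start: {end: k / sum(row.values()) for end, k in row.items()}
        for start, row in counts.items()
    }

# Toy beliefs for five agents before and after a simulation (made-up numbers).
initial = [-1.2, -0.8, -0.3, 0.4, 1.1]
final = [-0.9, -0.6, 0.1, -0.2, 0.7]
shares = transition_shares(initial, final)
# Two of the three initially willing agents stay willing; one of the two
# initially hesitant agents becomes willing.
```

Repeating this over many seeded runs would give the spread of "Willing to Willing" and similar shares that Fig 8 summarizes as intervals.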

Conclusion
The most interesting outcome of our model is the geospatial clustering. In particular, even an initially randomly mixed population evolves into patches of geospatially consistent beliefs. This suggests that even if there were no "blue states" and "red states," they would eventually emerge for mathematical reasons. Even individuals who are initially beyond the bounded confidence thresholds of their neighbors will eventually have their views softened.
Our model also shows the spread of opinions observed with the introduction of mega-influencers. Unlike in previous models including radicals with static beliefs, such as in [22], the combination of stochasticity and mega-influencers in our model has the effect that opinions eventually stabilize, but never tightly around a single or double consensus. Mega-influencers result in a more diffuse set of individual beliefs.
We demonstrate how a certain set of model parameters recreates real-life data related to changing opinions around vaccine hesitancy. As in real life, our model shows agents changing their minds to become both more accepting and more hesitant. Of note is the fact that a substantial influence from the right is needed to prevent public opinion from almost entirely shifting towards vaccine acceptance. This is shown in Fig. 10, where the presence of any reach from the left is enough to overpower all but the most broadly reaching right influencers.
Our geospatial distance can be interpreted as geographic, or as some other notion of distance. Physical proximity is an important component of political and ideological opinion formation. On the other hand, social media makes spatial proximity less important, but might tend to make people interact more selectively with the like-minded as a consequence of both social and algorithmic behavioral drivers [12], although this is a point of discussion [2]. One could attempt to model this effect by changing parameters in our model, making spatial proximity less important, and making like-mindedness more important, in determining the probabilities $p_{uv}$. It might also be worthwhile to consider the role of bots in networks and opinion formation online, as in [24].
Our results also suggest future work on the dependence on the parameters $b$ and $\epsilon$. We intend to work on scalable sampling algorithms for combining triangles into other, more complex geometries. We plan to attempt to understand how time steps in our model map onto real time. Another feature to be added to the model in the future would be the effects of central interventions such as government or workplace vaccine mandates in the case of Covid-19 vaccinations. The connection between beliefs and the spread of misinformation could also be incorporated into our model.

Figure 1. The figure on the far left shows results from a longitudinal survey on Covid-19 vaccine hesitancy [34]. The figure in the center shows a similar result from the ODyN model. The figure on the far right shows the geospatial clustering present in the ODyN model.

Figure 2. In the triangle above, 1000 agents are placed in an equilateral triangle. Agents are denoted by circles, where the radius of a circle is a function of the weight of the individual agent and weights are assigned at random with power law exponent 1.5. Agents are randomly assigned beliefs from a 1-dimensional Gaussian mixture model with centers at -1 and 1, both with standard deviation 0.5. The inset square gives a zoomed-in view of one section of the triangle. Directed edges are then assigned with a probability given by Eq (2), although in this image the edges are shown as directionless because of resolution constraints.

Figure 3. For a model with 1000 agents with γ = 1.5 and symmetric initial beliefs centered at -1 and 1 we allow the importance of weight, α, and the importance of distance, δ, to vary between 1 and 10.

Figure 4. Each of the networks above is initialized with 100 agents holding symmetric beliefs around -1 and 1. The classic Hegselmann-Krause model (left-most panel) has a very high mean in-degree (denoted md) and clustering coefficient (denoted cc). Holding all other model parameters fixed, the inclusion of weight (second from left) and distance (second from right) into the full ODyN model (right-most panel) decreases the overall level of connectivity in both the mean in-degree and clustering coefficient. In models for which it is relevant, agents with greater weight are denoted by correspondingly larger dots.

Algorithm 1 (fragment)
    for i in randomly sampled subset of L with size p_L · |L| do
    for i in randomly sampled subset of R with size p_R · |R| do

Algorithm 2 (fragment)
    Belief[j] ← average of beliefs in S.

Figure 5. For each setting of mega-influence reach (i.e., $p_R$ and $p_L$ equal to 0.0, 0.5, or 1.0) we ran 25 experiments with 1000 agents, α = 2, b = 1.5, δ = 8, and γ = 1.5. Here we show the distribution of the standard deviation of beliefs at the time of model convergence.

Figure 6. Sample simulation results for each setting of mega-influence reach (i.e., $p_R$ and $p_L$ equal to 0.0, 0.5, or 1.0) with 1000 agents, α = 2, b = 1.5, δ = 8, and γ = 1.5. We see a wide spread of final beliefs as well as holdout individuals.

Figure 8. The 95% confidence intervals from [34] are shown as cross-hatched bars, and the 95% confidence intervals from 30 simulations carried out with the ODyN model are shown as solid bars. Bars are further color-coded to denote whether the cohort began as Willing (i.e., Belief < 0) or Hesitant (i.e., Belief ≥ 0). For example, in the case of "Willing to Willing" our model suggests that 69% to 87% of the initially willing remained willing, whereas data from [34] suggest that 86% to 100% of the initially willing remained willing.
and right influencer reach of 75%. These results are shown in Fig. 1. For other values of $p_L$ and $p_R$, results are shown in Fig 10 of the Appendix. The plots shown in Fig. 1 and 10 are for individual simulations, but in Fig.

Appendix A: Additional Supporting Figures

Figure 9. For a model with 1000 agents and symmetric initial beliefs centered at -1 and 1 we allow the importance of weight, α, and the importance of distance, δ, to vary between 1 and 10 with γ = 1.1 (top) and γ = 2.0 (bottom).

Figure 10. The model is seeded with 1000 agents, initial beliefs centered at -1 and 1 with probabilities 0.69 and 0.31, respectively, α = 2, b = 1.5, δ = 8. The alluvial plot shows the overall cohort changes in belief from model initialization to model convergence.