Review of Sascha Hokamp, Laszlo Gulyas, Matthew Koehler, Sanith Wijesinghe: Agent-Based Modeling of Tax Evasion

Agent-Based Modeling of Tax Evasion

Sascha Hokamp, Laszlo Gulyas, Matthew Koehler, Sanith Wijesinghe
Wiley: UK, 2018
ISBN

Reviewed by Philip Truscott
Singapore University of Technology and Design

There are many famous examples of computer models designed to simulate the effects of tax policy. In the USA an elaborate model advises Washington DC legislators. It makes projections both in the short term and on the impact of social security changes decades into the future (CBO, 2008). In the UK, a private research body, the Institute for Fiscal Studies, maintains a model (Dilnot, Stark, & Davis, 1987) that is widely quoted in the media as a tool to analyse tax and benefit changes. The European Union has even funded the development of a multi-country model, Euromod (Bargain, 2006), that simulates tax-policy changes using a standard software format across its 28 member states.

Agent-Based Modeling of Tax Evasion by Hokamp, Gulyas, Koehler and Wijesinghe describes computer simulations that share some characteristics with these models. Some of the models described use microdata about individual taxpayers to simulate the effect of policy changes. Like America’s Congressional Budget Office Long Term Model, they simulate the effect of changes over multiple years.

Where the models differ is in the simulation complexity of individual decision-making. Agent-based models (ABMs) “consist of discrete autonomous agents that behave according to prescribed decision rules. Interaction of agents at the micro-level often leads to complex emergent phenomena at the macro-level” (p. 11).

Four of the models included in the book are described in detail below.

• Other models include a model that uses an approach from the natural sciences: Götz Seibold’s From Spins to Agents: An Econophysics Approach to Tax Evasion (pp. 289-314).

• Sascha Hokamp and Andrés M. Cuervo Díaz describe a model sensitive to lapse of time, social norms, age heterogeneity, subjective audit probability, public goods provision, and Pareto-optimality (pp. 256-284).

• Matthew Koehler, Shaun Michel, David Slater, Christine Harvey, Amanda Andrei and Kevin Comer explore The Effects of Network Structures in Massive Agent-based Models of Tax Evasion (pp. 225-251).

• Nigar Hashimzade and Gareth Myles describe a model based on a hypothetical group of taxpayers that represent “three occupations that matches UK data”. The purpose of their model is to simulate compliance behaviour and alternative audit strategies (pp. 91-121).

Some of the policy questions addressed by the various models include the following:

• Which enforcement policy is more effective, auditing a higher proportion of taxpayers or increasing the fines for evasion?

• To what extent will taxpayers learn from their own history of being audited?

• What is the impact of social networks? How far will tax evasion by one person in a circle of acquaintances change the behaviour of others they are in contact with?

• In what circumstances will employers and employees collude to avoid paying taxes?

• How far do improvements in government services encourage greater tax-payer compliance?

TAXSIM

This last question is addressed by TAXSIM, a model developed in Hungary and described by Laszlo Gulyas, Tamás Mahr and Istvan Toth. TAXSIM explicitly addresses some tax evasion questions avoided by the models that have emerged from developed English-speaking countries, which tend to assume that employers rarely conspire with workers to evade taxes. By contrast, one of TAXSIM’s main questions is the extent to which large numbers of employees will be treated as independent contractors with no employer tax deduction or wage reporting. The authors identify 23 different categories of worker, defined by the combination of compensation types they receive: reported wages, unreported wages, payments in kind, fringe benefits and cash for ad hoc jobs (which makes them unreported independent contractors).

TAXSIM does not use taxpayer survey data (or any other type of empirical data) for its inputs. As the authors declare, it is a method of automating a long sequence of “thought experiments.”

One policy question addressed by the authors was the tax compliance effect of varying levels of government service quality. Each run of the model included six thousand time steps and the authors averaged results across ten independent runs. Their simulations suggest that the number of labour contracts in the shadow economy tended to fall as service quality rose. “In the case of low quality of services employment contracts are mostly illegal. In the mid-quality range, contracts become mixed, while with high quality services legal contracts tend to dominate the market” (p. 189).

Another policy question explored was the effectiveness of an adaptive audit strategy. This tactic targets companies that are in the same network as firms previously identified as tax evaders (rather than using random selection to choose firms for audit). The authors concluded that “the larger the ratio of adaptive target selection … the less hidden and the more legal contracts are observed” (p. 192).

As TAXSIM is not based on empirical data on tax payments and incomes, it cannot be validated by comparing the simulated tax payments with the actual tax payments of real people. Critics of this approach might argue that it is excessively abstract and that the policy results are merely the effect of the parameter values chosen by the model user. On the other hand, given the large number of time steps and policy parameters, it is fair to say that working out the combined effect of the different policy parameters requires automation. The authors are open about the abstract nature of the model and they explore issues of employer complicity that other models ignore. The most notable issue is employee-employer collusion to define regular workers as independent contractors. Employees in the developing world are often tempted into a Devil’s bargain to evade social security contributions in the short term without weighing the future impact of having no pension entitlement in old age.

SIMULFIS

SIMULFIS, a model designed to simulate the Spanish context, is described by Miguel Quesada, Toni Llacer, Jose Noguera and Eduardo Tejada. As with TAXSIM, the agents’ income and tax payment data are not based on survey data, but many of the model’s parameters have been set using empirical data to reflect the context of contemporary Spain. The income tax thresholds and tax rates, the proportion of self-employed people, the income threshold for social benefits and the withdrawal rate for benefits are all taken from official regulations.

The model user can select the existing Spanish audit rate of 3%, or select default options of 25%, 50% or 75%. The evasion fine can be modelled as 1.5 times the tax liability or as one of three higher multiples: 3.0, 4.5 and 6.0.

For each agent SIMULFIS assumes a four-stage decision-making process as follows:

1. The Opportunity Filter is a deterministic function that selects the maximum proportion of income that can be concealed based on employment type (employees can conceal less, the self-employed more). The resulting proportion is fed into the next filter.

2. The Normative Filter changes the proportion of income concealed based on the individual’s opinion about two aspects of the tax system. One is the level of support for the concept of progressivity in the tax system. The second aspect is called ‘Tax Balance’ and is defined as follows. If a taxpayer is a net contributor while most of his/her neighbours are net recipients, the taxpayer is defined as “unsatisfied” otherwise he/she is “satisfied”. The results of this second stage are fed into the next filter.

3. The Rational Choice Filter changes the proportion of income that is concealed based on the perceived probability of being audited. This probability is updated for each time-period based on the individual’s own history of being audited and the audit rate in the taxpayer’s neighbourhood. The results of this filter are fed into the next filter.

4. The Social Influence Filter is a parameter for all taxpayers that represents the prosocial influence (in favour of paying taxes) exerted by the agent’s neighbourhood. The value is set in the range 0 to 1. At a value of 0 the neighbourhood has no social influence and the proportion of income concealed is the same as at the end of the rational choice filter. At the maximum value of 1 the maximum income is declared, and the effect of the rational choice filter is cancelled.
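The four filters can be pictured as a chain of functions, each receiving the proportion of income concealed from the previous stage. The Python sketch below is only an illustration of that ordering and is not the SIMULFIS code, which is written in NetLogo. The opportunity ceilings, the scaling factors in the normative and rational choice filters, the linear form of the social influence filter, and all names are assumptions made for this example. Only the sequence of filters and the behaviour of the social influence parameter at 0 and 1 follow the description above.

```python
from dataclasses import dataclass


@dataclass
class Taxpayer:
    self_employed: bool
    supports_progressivity: bool
    satisfied_with_tax_balance: bool
    perceived_audit_probability: float  # between 0 and 1


def opportunity_filter(agent: Taxpayer) -> float:
    """Maximum share of income that can be concealed (assumed ceilings)."""
    return 0.8 if agent.self_employed else 0.2


def normative_filter(agent: Taxpayer, concealed: float) -> float:
    """Scale concealment down for each norm the agent endorses (assumed factor)."""
    if agent.supports_progressivity:
        concealed *= 0.75
    if agent.satisfied_with_tax_balance:
        concealed *= 0.75
    return concealed


def rational_choice_filter(agent: Taxpayer, concealed: float) -> float:
    """Conceal less as the perceived audit probability rises (assumed linear)."""
    return concealed * (1.0 - agent.perceived_audit_probability)


def social_influence_filter(concealed: float, influence: float) -> float:
    """influence = 0 leaves the value unchanged; influence = 1 gives full declaration."""
    if not 0.0 <= influence <= 1.0:
        raise ValueError("influence must lie in [0, 1]")
    return concealed * (1.0 - influence)


def concealed_share(agent: Taxpayer, influence: float) -> float:
    """Run the four filters in order and return the final concealed share."""
    share = opportunity_filter(agent)
    share = normative_filter(agent, share)
    share = rational_choice_filter(agent, share)
    return social_influence_filter(share, influence)
```

Under these assumed numbers, a self-employed agent who endorses neither norm and perceives a 50% audit probability conceals about 40% of income when the social influence parameter is 0, and nothing when it is 1.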

The authors conclude that taxpayers are motivated by a complex interaction of prosocial influences in favour of tax-paying and individualistic temptations to evade: “... strict rational agents would produce much less compliance than is usually estimated, except with unrealistically high deterrence levels” (p. 45). The authors assert that the number and effectiveness of audits are more important than the size of evasion penalties.

The SIMULFIS authors state that they can distribute versions of their tax evasion model on demand. SIMULFIS has been implemented using the NetLogo modeling environment (download address: ccl.northwestern.edu/netlogo/), which is intended to facilitate the simulation of thousands of agents evolving over time. Sample models available to download include “Wolf Sheep Predation”, “Rebellion” and “Ethnocentrism”. These freely available agent-based models and the software environment probably mean that the SIMULFIS chapter is the most useful one for teaching purposes and for those wishing to develop new agent-based models from scratch.

Tax Shelters and Audit Priorities

In the majority of the book’s agent-based models the agent is assumed to be an individual tax-payer or a corporation. A radically different type of agent is described in the chapter entitled Modeling the Co-evolution of Tax-Shelters and Audit Priorities by Jacob Rosen, Geoffrey Warner, Erik Hemberg, H. Sanith Wijesinghe and Una-May O’Reilly. In this chapter the agent that evolves over time is not a tax-payer but a tax evasion strategy (or tax shelter).

The authors point out that in their area of research (complex real estate transactions) it is difficult to decide between the terms ‘tax evasion’ (illegal) and ‘tax avoidance’ (legal). Some of the schemes mentioned were not explicitly ruled out by legislation but were later ruled to be evasion during lawsuits. This review will use the term ‘evasion.’

The tax evasion strategy the authors use for their main illustration involves schemes to evade capital gains tax. Suppose Mr. Jones buys a house for $100,000 (the basis price) and sells it for $150,000; he should then pay capital gains tax on the $50,000 profit. However, he may reduce the tax burden by setting up bogus companies as intermediaries. The taxable gain may also be changed by selling the house for a series of annuity payments rather than a single payment. Schemes to inflate the original basis price can reduce the taxable gain as well. The tax evaders use a combination of these methods, described by the acronym IBOB for ‘Installment Sale Bogus Optional Basis Transaction’.
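The arithmetic behind the Mr. Jones example can be written out in a few lines of Python. This is a simplified sketch: the function name is hypothetical, capital losses are ignored, and the inflated basis of $140,000 is an invented figure used only to show how a higher basis shrinks the taxable gain.

```python
def taxable_gain(sale_price: float, basis: float) -> float:
    """Capital gain subject to tax: sale price minus basis (losses ignored here)."""
    return max(sale_price - basis, 0.0)


# Figures from the example: bought for 100,000 and sold for 150,000.
honest_gain = taxable_gain(150_000, 100_000)

# Hypothetical scheme: the basis is inflated to 140,000 (an invented figure).
sheltered_gain = taxable_gain(150_000, 140_000)

print(honest_gain, sheltered_gain)
```

With the reported basis inflated in this way, the gain exposed to tax falls from $50,000 to $10,000 even though the real transaction is unchanged.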

Genetic algorithms require the creation of a virtual environment which can mimic a Darwinian survival-of-the-fittest process over time. In Darwinian evolution the popular conception is that genetic variations of an animal will improve its survival rate. To use the jargon of genetic programming an evolutionary experiment needs a ‘fitness function’ to measure some numeric quantity at each stage that simulates this survival rate.

A typical genetic evolution run involves a set of ‘training data’ to simulate the phenomenon being researched.

To illustrate how this is used, consider the following simplified version of the authors’ tax shelter research. In this case, data would typically have these elements:

a) A variable corresponding to the Fitness Measure – Imagine a data file on real estate companies which includes a column of numbers that describe Tax_Reduction in dollars

b) A set of descriptive variables about each company showing the presence or absence of various transaction features: payments as annuities, single link payments (payments through an intermediary company), double link payments (payments through two intermediary companies), adjustments to basis price (the original purchase price of a real estate asset).

A given ‘run’ of a computational evolution process would generate a starting tax reduction strategy as a genome, then calculate a value for Tax_Reduction. In the second generation the tax reduction strategy would be varied and Tax_Reduction would be calculated again. In symbolic regression, the variable Tax_Reduction would be taken to be the dependent variable in a regression model. The successive generations of the run would produce values for the variable Tax_Reduction that gradually increase over time. The versions of the tax strategy genome that produce better fitting data (that is, have higher fitness scores) are ‘rewarded’ in the evolution process and so are more likely to produce offspring in later generations.

In the case of the current tax shelter research the authors did not have the benefit of such training data, so the fitness measure is the calculated value of Tax_Reduction for an abstract tax-payer. Instead of training data they use a scoring system that imputes fitness values to various tax payment strategies. Therefore, instead of rewarding genomes that have better fitting prediction values, new genomes can be created that increase the calculated value of Tax_Reduction.
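The simplified process described above can be sketched as a small genetic algorithm in Python. This is an illustration under stated assumptions, not the authors’ implementation: the genome is reduced to four on/off transaction features, the score sheet values are invented, and the selection, crossover and mutation settings are arbitrary choices. All names are hypothetical.

```python
import random

FEATURES = ["annuity", "single_link", "double_link", "basis_adjustment"]

# Hypothetical score sheet: imputed tax reduction (in dollars) per feature.
SCORE_SHEET = {"annuity": 5_000, "single_link": 8_000,
               "double_link": 12_000, "basis_adjustment": 20_000}


def tax_reduction(genome):
    """Fitness: sum of the imputed reductions for the features switched on."""
    return sum(SCORE_SHEET[f] for f, on in zip(FEATURES, genome) if on)


def mutate(genome, rng, rate=0.25):
    """Flip each feature on or off with probability `rate`."""
    return tuple((not g) if rng.random() < rate else g for g in genome)


def crossover(a, b, rng):
    """Single-point crossover between two parent genomes."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def evolve(generations=30, pop_size=20, seed=0):
    """Return the fittest genome found and the best fitness in each generation."""
    rng = random.Random(seed)
    population = [tuple(rng.random() < 0.5 for _ in FEATURES)
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=tax_reduction, reverse=True)
        best_per_generation.append(tax_reduction(population[0]))
        parents = population[: pop_size // 2]  # the fitter half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=tax_reduction), best_per_generation
```

Because the fitter half of each generation is carried over unchanged, the best Tax_Reduction value never falls from one generation to the next. With a score sheet as simple as this one the best genome is obvious in advance; the evolutionary search becomes useful only when the fitness measure involves trade-offs between features.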

The illustration above is a crude simplification of the authors’ research. The fitness measure used is not a simple quantification of the tax reduction; it is a combination of two values that represent the “largest reduction of taxable income for the smallest risk” (p. 298). Let us call this variable Weighted_Tax_Gain.

The authors not only use genetic algorithms to simulate tax shelters, they also perform a “co-evolution” of tax audit strategies that generate ways to catch the tax evaders.

As with the evolution of the tax shelters, the process of evolving audit strategies requires a fitness measure. This is taken to be the inverse of Weighted_Tax_Gain. The tax audit strategies can evolve to increase the risk that the evader will be caught. While the tax authority cannot directly change the reduction in taxable income part of Weighted_Tax_Gain it can change the ‘smallest risk’ element.
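The opposed fitness measures can be illustrated with a short Python sketch. The functional form is an assumption made for this review: the tax reduction is simply discounted by an audit risk between 0 and 1, and ‘inverse’ is read as the negative of the shelter’s fitness. The authors’ actual scoring is more elaborate, and the function names are hypothetical.

```python
def weighted_tax_gain(reduction: float, audit_risk: float) -> float:
    """Shelter fitness: reduction discounted by the chance of detection (assumed form)."""
    return reduction * (1.0 - audit_risk)


def audit_fitness(reduction: float, audit_risk: float) -> float:
    """Auditor fitness: taken here as the negative of the shelter's fitness."""
    return -weighted_tax_gain(reduction, audit_risk)


# The auditor cannot alter the 50,000 reduction, only the risk attached to it.
low_risk = weighted_tax_gain(50_000, 0.1)
high_risk = weighted_tax_gain(50_000, 0.6)
print(low_risk, high_risk)
```

In this toy version, raising the audit risk from 0.1 to 0.6 lowers the shelter’s fitness and raises the auditor’s by the same amount, which is the sense in which the two populations evolve against each other.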

The authors suggest that their technique could be used in the real world by a tax authority. Subject matter experts working for a given government would brainstorm possible tax evasion strategies, encode them into a genome, and then use an evolutionary process to decide what form of the strategy represents the biggest potential tax loss to the government (that is, the biggest gain to the taxpayer for the lowest risk of being audited).

This pair of simulations is an example of a classic predator-prey scenario where the prey are the tax evaders and the predators are the tax auditors. The authors show scenarios in which the tax audit strategies evolve over time to reduce tax evasion but never eliminate it. The implication is that the dance between auditors and tax evaders is eternal.

“… we augment the audit score sheets to assign the lowest audit point a value of 0, so that there will be at least one scheme that is not detectable by the auditor. Our hypothesis is that once the population of audit score sheets begins to converge, a tax scheme will evolve that utilizes the type of behavior that is currently not detectable by the majority of audit score sheets” (p. 308).

To paraphrase Benjamin Franklin, nothing is certain in the world except “death and tax evasion.”

Individual Reporting Compliance Model

When it comes to the inclusion of real-world data, Bloomquist’s Individual Reporting Compliance Model (IRCM) (p. 199) presents a sharp contrast to the models described above. The USA is one of the few countries in the world which publishes an official estimate of its “tax gap” (this is defined as the amount of money in uncollected taxes as a proportion of the total tax due). Moreover, the government releases a public domain database containing a representative sample of anonymized taxpayers: the Statistics of Income (SOI) public use file (PUF). The public use file is modified to preserve taxpayer anonymity.

Bloomquist, a senior economist at the US Internal Revenue Service, had the benefit of his increased level of government access to create an enriched version of the Public Use File. The author started with a database of 85,000 actual taxpayers with 180 data items about each. Two crucial items not available on the public use file were the person’s misreporting behaviour and the person’s tax preparer (tax accountant). This last item makes it possible to model tax evasion or compliance behaviour that is spread from one accountant to another. This real data was then converted to artificial data by substituting, for each real taxpayer, the closest matching taxpayer from the public use file.

The IRCM starts with a sophisticated understanding of how far different types of income are under-reported. The author comments that “non-compliance is most prevalent where the opportunities for under-reporting are greatest” (p. 206) and reports the “net misreporting percentage” (NMP) for four different types of income. Where the worker’s tax is both reported and deducted by the employer, the scope for misreporting is minimal (estimated at only 1% in cash terms). Where the income is subject to third party reporting but not withholding the NMP is much higher. Pensions, social security income, interest and dividend income have an 8% NMP. Capital Gain income has an NMP of 11%. Where there is little or no 3rd party income reporting (e.g. rents, royalties and small business income) the NMP is 56%.
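To see why the mix of income types matters, the four NMP figures quoted above can be combined into an overall rate for a given income mix. The Python sketch below is a back-of-envelope illustration, not part of the IRCM. It assumes the NMP can be treated as the share of true income in each category that goes unreported, and the income mixes are invented numbers.

```python
# Net misreporting percentages quoted in the review, as fractions.
NMP = {
    "wages_with_withholding": 0.01,
    "pensions_interest_dividends": 0.08,
    "capital_gains": 0.11,
    "little_or_no_third_party_reporting": 0.56,
}


def overall_nmp(true_income: dict) -> float:
    """Income-weighted average misreporting rate across categories."""
    total = sum(true_income.values())
    misreported = sum(NMP[category] * amount
                      for category, amount in true_income.items())
    return misreported / total


# Hypothetical income mixes (arbitrary units summing to 1,000).
baseline = {"wages_with_withholding": 700, "pensions_interest_dividends": 150,
            "capital_gains": 50, "little_or_no_third_party_reporting": 100}
shifted = dict(baseline, wages_with_withholding=650,
               little_or_no_third_party_reporting=150)

print(overall_nmp(baseline), overall_nmp(shifted))
```

In this invented example, moving 5% of total income from withheld wages to the least visible category raises the overall misreporting rate from roughly 8% to roughly 11%.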

The IRCM allows users to simulate changes in tax-payer behaviour by playing what-if games with the deduction-at-source and 3rd party income reporting of different types of income. IRCM includes an elaborate set of drop-down menus and check boxes to model changes to the US tax system.

Sometimes the model uses a direct approach to changes in taxpayer behaviour. It can also use an indirect approach that copies a taxpayer’s reporting behaviour from the closest person in that taxpayer’s reference group. A reference group might consist of coworkers, neighbours, or other people who use the same accountant.
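The indirect approach can be pictured as a nearest-neighbour lookup within the reference group. The Python sketch below is a guess at the general idea rather than the IRCM’s actual rule: it assumes that ‘closest’ means closest in income, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Filer:
    name: str
    income: float
    reported_share: float  # share of true income that is reported


def imitate_closest(filer: Filer, reference_group: list) -> float:
    """Return the reported share of the group member closest in income."""
    if not reference_group:
        return filer.reported_share  # nobody to imitate, behaviour unchanged
    closest = min(reference_group,
                  key=lambda other: abs(other.income - filer.income))
    return closest.reported_share


# Invented reference group sharing the same accountant.
group = [Filer("A", 48_000, 0.9), Filer("B", 80_000, 0.6)]
me = Filer("C", 50_000, 1.0)
print(imitate_closest(me, group))
```

Here filer C adopts the reporting behaviour of filer A, whose income is nearest to C’s own.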

The validation of the model includes a detailed comparison of the IRCM’s estimated NMPs for different classes of income compared to the same percentages estimated by the IRS for the relevant tax gap study. To its credit the IRCM estimated a total amount of tax collected that was within 1% of the official IRS figure. By contrast it over-estimated the NMP for sole-proprietor income (63% vs. 57%).

The high values for misreporting by sole proprietors suggest a threat to the tax base from the USA’s growing “Gig” economy. The rise of Uber, Lyft and Airbnb suggests a decline in the proportion of American workers whose income is reported by their employers on the all-important W2 form. The IRCM was used to simulate changes to tax under-reporting under different assumptions about the proportion of Americans who are sole proprietors. The model reported results for three different levels of increase in Gig workers: +5%, +10% and +15%. After 25 time-steps, the +15% level would increase the Net Misreporting Percentage by roughly 6% compared to the 16% under the baseline assumption, where the proportion of Gig workers does not change. Such a change would have serious consequences for the USA’s deficit. Bloomquist estimates that “each additional percentage point of voluntary compliance brings in approximately $30 billion in tax receipts” (p. 215). The author compares the IRCM’s predictions to the official IRS estimates for 18 major income categories and tax statistics (including total tax yield, deductions and exemptions).

It is difficult to see how any such prediction could even be possible without a model like the IRCM. While most of the models contained in the same volume freely admit that they are elaborate thought experiments, the level of checking against empirical data suggests that the IRCM is more of a proof-of-concept than a demonstration-of-concept.

Readers interested in further information about TAXSIM should contact Istvan Toth (email: toth.istvanjanos@krtk.mta.hu).

To request further information about SIMULFIS, the authors suggest two points of contact: Miguel Quesada (miguel.quesada@uab.cat) or Jose Noguera (jose.noguera@uab.cat).

The suggested contact point for the research on tax shelters and audit priorities is Una-May O’Reilly (email: unamay@csail.mit.edu).

References

BARGAIN, O. (Ed.). (2006). Micro-Simulation in Action, Volume 25: Policy Analysis in Europe using EUROMOD (1st ed.). Amsterdam: Emerald Group Publishing Limited.

CBO. (2008). CBO’s long-term model: an overview. Washington, D.C.: Congressional Budget Office, Congress of the U.S. Retrieved from http://purl.access.gpo.gov/GPO/LPS114131

DILNOT, A., Stark, G., & Davis, E. (1987). The IFS tax and benefit model. London: Institute for Fiscal Studies. Retrieved from http://www.worldcat.org/title/ifs-tax-and-benefit-model/oclc/59709203&referer=brief_results
