Introduction

The current policy landscape has brought about a number of proposed reforms to health care delivery in the United States (U.S.). To help guide decision-making, policymakers often turn to mathematical modeling to explore the potential outcomes of such reforms. Dynamic microsimulation is a powerful tool by which the predicted effects of a policy can be evaluated at multiple levels (e.g. individuals, households, and employers, as well as at the aggregate level) (Krijkamp et al. 2018; Rutter et al. 2011).

We built the Longitudinal Health, Income and Employment Model (LHIEM) to assess the impact of the 10Plan, an alternative health care financing proposal, on individuals, families and the federal government (Kapinos et al. 2021). The proposed policy involves a self-pay system designed to minimize the burden of health care costs by allowing individuals to borrow from the U.S. government to cover health care costs and repay them at a means-tested annual rate. The analysis thus required careful tracking of medical expenditures, medical debt, loan payments, and income over time. These requirements drove the design of LHIEM, a nationally representative dynamic microsimulation that evolves individual- and family-level health expenditures and key individual-level attributes, including labor force participation and income, health insurance status, family structure, and health status, over a multi-year period.

LHIEM builds on the strong foundation of dynamic microsimulation models in the health policy field, addressing a critical gap left by existing tools. In the United States, five major health policy simulation models are widely used for federal and state-level decision making (Abraham 2012): the Health Insurance Simulation Model (HISIM, and more recently, HISIM2) developed by the Congressional Budget Office; the Gruber Microsimulation Model (GMSIM), developed by Dr. Jonathan Gruber at the Massachusetts Institute of Technology (MIT); the Comprehensive Assessment of Reform Efforts (COMPARE) model, developed by the RAND Corporation; the Health Benefits Simulation Model (HBSM), developed by the Lewin Group; and the Health Insurance Policy Simulation Model (HIPSM), developed by the Urban Institute. While these partial equilibrium models are powerful tools, they are not designed to handle path-dependency, making them less suitable for evaluating scenarios where an individual’s healthcare payments depend on their history of expenses and payments. To analyze a policy such as the 10Plan, a model must explicitly capture the dynamic interplay between health spending, income, employment, family structure, and health status. Existing U.S.-based dynamic microsimulations, such as the Future Adult Model (FAM), are focused on different attributes (the progression of chronic health conditions, in the case of the FAM) rather than healthcare expenditures (Goldman et al. 2016).

Dynamic microsimulation models with some of the necessary features have been developed internationally — for example, EUROMOD, which examines the impact of taxes on household incomes in Europe; Health Equity and its Economic Determinants (HEED), a European microsimulation for assessing the health impacts of income; and the Population Health Model (POHEM), which simulates the impact of government policies on health outcomes in Canada. However, the specific complexities of the U.S. healthcare system make these models challenging to adapt for analyzing U.S.-based policies. Nevertheless, these international models serve as important sources of inspiration for LHIEM.

LHIEM is thus distinctive as a dynamic model of healthcare spending and related attributes, offering a simpler and more customizable framework than many existing microsimulations. This simplicity makes it an accessible tool for other researchers to tailor to their specific research questions, with potential applications across health policy and related fields. The purpose of this paper is to describe the core technical components of LHIEM, rather than focusing on the specifics of the 10Plan policy proposal, to enable broader use of the model in exploring other research questions. LHIEM was developed in R and all code is available on GitHub (https://github.com/apropp/LHIEM/).

This paper begins with a description of the data and methods we used to develop LHIEM. We continue with details on model validation and calibration, selected results from the analysis of the 10Plan (Kapinos et al. 2021), and uncertainty analysis. We conclude with a discussion of LHIEM’s contributions, areas for future work, and how LHIEM fits in the context of microsimulation modeling for policy analysis.

Methods

Model overview

Purpose

LHIEM facilitates the exploration of policy questions related to health care expenditures by estimating the evolution of medical spending and its key predictors over time for individuals and families. This discrete time microsimulation produces path-dependent life trajectories for each individual in our U.S.-representative model population by advancing the initial population from year to year as a Markov chain. Key individual-level attributes, including age, income, health status, health insurance status and source, and yearly medical spending are updated each year, based on the attributes of the previous year and transition rates derived from publicly available data. Individual records are also sampled each year for childbirth (if female), negative health shocks (such as an acute injury), the onset of a chronic disease, and death. These attributes and events contribute to determining an individual’s health care expenditures for the year.

State variables and scales

The model population consists of individuals organized into family units and is representative of the U.S. population at both the individual and family level. The state variables that describe individuals are listed in Table 1. Static state variables are set at model initialization or birth and do not change for the duration of an individual’s lifetime. Dynamic state variables may change at each one-year time step. Importantly, we assume that individuals do not interact with or affect one another. The only exception is income, which we aggregate to the level of the family unit.

Table 1: Individual state variables

| Variable group | Variable | Variable type | Units / Levels | Static v. Dynamic | Description |
|---|---|---|---|---|---|
| Identification | PID | String | | Static | Simulation individual identifier |
| | FID | String | | Static | Simulation family identifier |
| | TID | String | | Dynamic | Simulation tax group identifier |
| | WT | Numeric | | Static | Individual weight |
| | WTH | Numeric | | Static | Household weight |
| Individual characteristics | Sex | Factor | "Male", "Female" | Static | Sex of individual |
| | Age | Numeric | Years | Dynamic | Age of individual |
| | Race | Factor | "Hispanic", "White", "Black", "Other" | Static | Race of individual |
| | Survive | Boolean | | Dynamic | Whether individual is alive in each year |
| Economic characteristics | FamIncome | Numeric | Nominal USD ($) | Dynamic | Total family income |
| | WageProp | Numeric | Proportion | Static | Proportion of family income due to individual's wages |
| | InsCat¹ | Factor | "Uninsured", "Medicaid", "Other Public", "NonGroup Private", "Other Private" | Dynamic | Health insurance status |
| | Deduct | Numeric | Nominal USD ($) | Dynamic | Health insurance deductible |
| Health characteristics | MedSpend | Numeric | Nominal USD ($) | Dynamic | Medical spending this year |
| | Visits | Numeric | Number of visits | Dynamic | Number of medical visits or interactions this year |
| | Preg | Factor | "Pregnant", "Not Pregnant" | Dynamic | Pregnancy status |
| | HS | Factor | "Good", "Bad" | Dynamic | Health status |
| | Morbidity | Factor | "None", "Chronic", "Acute" | Dynamic | Morbidity status |

¹ Note that Medicare is not included as a factor because this study was focused on the population under 65 years of age. Studies with different target populations would include this group.

Process overview and scheduling

To properly characterize the evolution of medical spending, LHIEM evolves the dynamic attributes that we found to be the most significant predictors of medical spending. Specifically, these include:

  • Health status and morbidity,
  • Family income,
  • Mortality and end-of-life spending,
  • Demographics, including pregnancy and population-level attributes.

LHIEM is implemented as a set of sequentially executed processes, which we call “submodels”. Each of the attributes listed above, as well as medical spending itself, is evolved by a separate submodel. These are described in detail beginning in paragraph 2.12. Algorithm 1 outlines the processes that comprise LHIEM in the order they are executed.
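The sequential execution of submodels can be illustrated with a minimal R sketch. This is not the LHIEM code: the function names (`age_one_year`, `update_survival`, `run_simulation`), the flat death probability, and the toy population are all assumptions made for illustration. Only the column names `PID`, `Age`, and `Survive` follow Table 1.

```r
set.seed(1)

# Hypothetical submodels: each takes the population data frame and returns it updated.
age_one_year <- function(pop) {
  pop$Age[pop$Survive] <- pop$Age[pop$Survive] + 1
  pop
}
update_survival <- function(pop, p_death = 0.01) {
  # Placeholder: a flat death probability applied to living individuals only
  dies <- pop$Survive & (runif(nrow(pop)) < p_death)
  pop$Survive <- pop$Survive & !dies
  pop
}

# Advance the population year by year, running the submodels in a fixed order.
run_simulation <- function(pop, n_years, submodels) {
  history <- vector("list", n_years)
  for (year in seq_len(n_years)) {
    for (step in submodels) {
      pop <- step(pop)  # each step sees only the current state of the population
    }
    history[[year]] <- pop
  }
  history
}

pop0 <- data.frame(
  PID     = sprintf("P%03d", 1:100),
  Age     = sample(0:64, 100, replace = TRUE),
  Survive = TRUE
)
history <- run_simulation(pop0, n_years = 5,
                          submodels = list(age_one_year, update_survival))
```

Keeping each submodel as a function from population to population is one way to obtain the modularity described in the text, since a submodel can be swapped or reordered by editing the `submodels` list.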

Model details

Initialization of model population

An essential component of any microsimulation such as LHIEM is the model population. The LHIEM model population is constructed from census records and other individual-level sample data and is designed to mirror the real population’s distribution over certain attributes that are relevant for the desired analysis.

We constructed LHIEM’s initial model population primarily from the 2019 Annual Social and Economic (ASEC) supplement to the Current Population Survey (CPS) (U.S. Census Bureau and U.S. Bureau of Labor Statistics 2019). The CPS is a nationally representative survey containing most of the variables required for the model, including demographics (e.g. age, race, and sex), income (e.g. wages, salary, and other income sources), family structure (e.g. number of children and adults in each household), and health (e.g. self-reported health status and insurance source). We augmented this CPS-based model population with specific attributes from other datasets.

In particular, we used the 2015-2016 Medical Expenditure Panel Survey (MEPS) Panel 20 Longitudinal Data File (Agency for Healthcare Research and Quality 2018) to inform health care spending dynamics, one of the most important elements of LHIEM. The MEPS data contain information about individuals’ medical expenditures over the two-year period 2015-2016, as well as many of the other variables found in the CPS. We used the 2016 MEPS Person Round Plan (PRPL) file, drawn from the 2016 MEPS Household Component, to determine coverage type (single or family), plan metal level,¹ and annual deductible level. We assigned each record the average deductible by coverage type and plan metal level, and for records without any plan or coverage-level data we assigned the overall average deductible (eHealth 2019a, 2019b). Given the well-documented differences between National Health Expenditures Accounts (NHEA) estimates and MEPS estimates of health care expenditures, we adjusted all expenditure amounts to match the NHEA using a factor of 1.27, following the work of Bernard et al. (2012).

We mapped the medical spending distribution and average deductible by age group and insurance status from the MEPS to the CPS-based initial population. We selected age group and insurance status as the matching variables² because classification and regression tree (CART) analysis revealed these to be the strongest predictors of medical spending (see paragraph 2.16). The age groups considered were: 18 and under, 19-34, 35-49, and 50-64.³ We harmonized insurance status categories between the MEPS and CPS, arriving at the five groups given in Table 1. The harmonization rules for insurance status and other variables are included in Appendix Table 8.
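One way to picture this mapping is a within-cell random draw: each CPS record receives the spending value of a randomly chosen MEPS record from the same age group and insurance cell. The R sketch below assumes that approach, which may differ from the procedure actually used in LHIEM. The data frames, the helper `draw_spend`, and all spending values are invented for illustration; only the 1.27 adjustment factor comes from the text.

```r
set.seed(42)
nhea_factor <- 1.27  # MEPS-to-NHEA adjustment factor quoted in the text

# Toy stand-ins for the MEPS donor file and the CPS-based population
meps <- data.frame(
  AgeGroup = rep(c("0-18", "19-34"), each = 4),
  InsCat   = rep(c("Uninsured", "Medicaid"), times = 4),
  MedSpend = c(0, 120, 300, 950, 0, 40, 2200, 610)
)
cps <- data.frame(
  PID      = paste0("P", 1:6),
  AgeGroup = c("0-18", "0-18", "19-34", "19-34", "0-18", "19-34"),
  InsCat   = c("Uninsured", "Medicaid", "Uninsured", "Medicaid", "Medicaid", "Uninsured")
)

# Scale MEPS spending to NHEA levels, then pool donors by matching cell
meps$MedSpend <- meps$MedSpend * nhea_factor
donors <- split(meps$MedSpend, list(meps$AgeGroup, meps$InsCat), sep = "|")

# Assumed matching rule: draw one donor value at random from the same cell
draw_spend <- function(age_group, ins_cat) {
  pool <- donors[[paste(age_group, ins_cat, sep = "|")]]
  pool[sample.int(length(pool), 1)]  # index draw avoids sample()'s length-one pitfall
}
cps$MedSpend <- unname(mapply(draw_spend, cps$AgeGroup, cps$InsCat))
```

Drawing whole donor values, rather than assigning cell means, preserves the skew and the share of zero spenders within each cell, which is what the percentile comparison in Table 2 checks.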

To ensure that the mapping from the MEPS to the CPS resulted in realistic levels of variation in medical expenditures, we compared summary statistics of the initial model population’s expenditures with those reported in the literature (Mitchell 2019). As shown in Table 2, comparison across percentiles of the spending distribution demonstrates a high level of agreement between the MEPS target population and the initial model population.

Table 2: Comparison of Medical Expenditures for Target and Model Populations (average individual annual expenditure by percentile of spending distribution)

| | Overall | Bottom 50% | Top 50% | Top 30% | Top 10% | Top 5% | Top 1% |
|---|---|---|---|---|---|---|---|
| Target Population ($) | 5,006 | 276 | 9,735 | 15,057 | 33,053 | 50,077 | 110,003 |
| Model Population ($) | 4,748 | 271 | 9,224 | 14,379 | 33,267 | 52,524 | 131,250 |

We also validated the other attributes of our initial model population against external statistics to ensure general representativeness. We confirmed that each 5-year age group represented between 7% and 10% of the under-65 population (National Center for Health Statistics 2016; U.S. Census Bureau 2016a), and that our model population had a representative distribution of insurance status and sex across age groups (Tables 3 and 4, with target population data drawn from U.S. Census Bureau 2016a).

Table 3: Comparison of Number of Uninsured Individuals by Age Group for Target and Model Populations

| Age Group | <19 | 19-34 | 35-49 | 50-64 |
|---|---|---|---|---|
| Target Population (millions) | 4.53 | 10.38 | 7.78 | 5.78 |
| Model Population (millions) | 4.12 | 9.71 | 7.38 | 5.26 |

Table 4: Comparison of Percent Female Individuals by Age Group for Target and Model Populations

| Age Group | <19 | 19-34 | 35-49 | 50-64 |
|---|---|---|---|---|
| Target Population (%) | 49.0 | 49.9 | 50.8 | 51.6 |
| Model Population (%) | 48.9 | 49.7 | 50.7 | 52.0 |

Input data

As discussed above, we used the CPS and MEPS as the primary datasets for the initial model population. However, the development of each submodel required additional data sources to ensure accurate evolution of the LHIEM model population over time, specifically with respect to health status, income, mortality, and demographic shifts. These additional data sources include:

  • Panel Study of Income Dynamics (PSID) data from 2005 to 2017 to inform changes in the income distribution;
  • U.S. Bureau of Labor Statistics CPI Inflation Calculator (U.S. Census Bureau and U.S. Bureau of Labor Statistics 2019), used to convert all dollar values to 2019 real dollars;
  • Centers for Disease Control and Prevention (CDC) data (including 2016 National Center for Health Statistics data, the 2017 National Vital Statistics Report, and 2019 American Cancer Society data) to inform the updates to health and health-related characteristics, including fertility, health status, and mortality;
  • U.S. Census Bureau population-level projections from 2016-2017 to inform demographic changes, specifically birth, death, and migration by year.

The specific details of how each supplemental data source was used can be found in the discussion of the relevant submodel in the following sections.

Submodels

In this section, we describe each of the dynamic submodels in detail. Each submodel is designed to evolve particular attributes (specifically medical expenditures, health status, income, insurance status, mortality, and childbirth) from year to year. While some parameters and transition rates are assigned at the cohort level (e.g. by age, race, or sex), all transitions occur at the individual level.³

In our framework, each submodel functions independently, with the exception of medical expenditures. This particular submodel is directly dependent on the outcomes of all other submodels and, in turn, influences the mortality submodel. Although there are real-world interactions between attributes such as health and economic outcomes or health and fertility, these interactions are not typically the most significant predictors of the attributes’ evolution over time. In designing LHIEM, we aimed to avoid unnecessary complexity, ensuring that any additional model components were justified by a significant improvement in the accuracy of aggregate outcomes. The resulting independence of the submodels provides the overall model with a high degree of modularity and flexibility.

Medical expenditures

We constructed a submodel to predict annual individual-level medical spending in each year based on medical spending in the previous year and demographic and medically relevant characteristics. The distribution of individual annual medical expenditures is known to be semi-continuous and highly skewed; many individuals incur no health care expenditures while some incur very high health care expenditures. A number of methods can be used to account for this (see, for example, Mihaylova et al. 2011). We employed a widely-used two-part model: part one predicts the likelihood of incurring nonzero medical expenditures; part two, applicable for cases with positive outcomes in part one, estimates the actual amount of these expenditures (Belotti et al. 2015; Duan et al. 1983; Olsen & Schafer 2001).

In part one, we estimated the likelihood of incurring nonzero expenditures using a logistic regression based on age group, sex, health status, insurance category, race, income, pregnancy status, nonzero spending in the previous year, and total amount of spending in the previous year.

In part two, we projected medical expenditures from year to year conditional on a prediction of nonzero expenditures in part one. We first used a classification and regression tree (CART) to identify partitions of the population exhibiting meaningfully distinct medical spending patterns. Here, a “partition” refers to a group of records sharing a certain combination of predictor variables. We then defined a unique spending regression for each partition identified by the CART, using a generalized linear model with an identity link function fit to the natural log-transform of health care expenditures. This type of approach (using machine learning to partition the dataset, then applying classical approximation schemes such as polynomial approximation or regression to the resulting partitions) has been proposed as an effective method for solving high-dimensional regression problems in general contexts (Fan et al. 2023).

CART analysis revealed that the regression coefficients meaningfully differed between age groups and levels of medical spending in the previous year. Insurance category, sex, income, race, and health status were also generally important in the regression. The final regressions ultimately predicted the next year’s health expenditures using spending in the previous year, age group, insurance category, sex, income, race, and health status as predictor variables.
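The two-part structure can be sketched in R on synthetic data. The sketch below is illustrative only: the variable names (`prev_spend`, `prev_any`, `poor_hlth`), the data-generating coefficients, and the helper `simulate_spend` are assumptions, the covariate list is much shorter than the one described above, and a single regression stands in for the per-partition regressions identified by the CART.

```r
set.seed(7)
n <- 2000

# Synthetic training data (all values are invented for illustration)
train <- data.frame(
  prev_spend = rexp(n, rate = 1 / 3000) * rbinom(n, 1, 0.8),
  poor_hlth  = rbinom(n, 1, 0.2)
)
train$prev_any  <- as.integer(train$prev_spend > 0)
p_any           <- plogis(-1 + 2 * train$prev_any + 1 * train$poor_hlth)
train$any_spend <- rbinom(n, 1, p_any)
train$spend     <- train$any_spend *
  exp(5 + 0.4 * log1p(train$prev_spend) + 0.5 * train$poor_hlth + rnorm(n, sd = 1))

# Part one: logistic regression for the probability of nonzero spending
part1 <- glm(any_spend ~ prev_any + log1p(prev_spend) + poor_hlth,
             family = binomial(), data = train)

# Part two: Gaussian GLM with identity link on log spending, positive cases only
part2 <- glm(log(spend) ~ log1p(prev_spend) + poor_hlth,
             family = gaussian(link = "identity"),
             data = train[train$spend > 0, ])

# Simulate next-year spending: draw the zero/nonzero outcome, then the amount
simulate_spend <- function(newdata) {
  p   <- predict(part1, newdata, type = "response")
  mu  <- predict(part2, newdata)             # predicted mean of log spending
  sig <- sqrt(summary(part2)$dispersion)     # residual standard deviation on the log scale
  any <- rbinom(nrow(newdata), 1, p)
  any * exp(rnorm(nrow(newdata), mean = mu, sd = sig))
}
next_year <- simulate_spend(train)
```

Drawing the amount from the fitted log-scale distribution, rather than using the point prediction, is one design choice that keeps the simulated spending distribution skewed. Whether LHIEM draws amounts this way is not stated in the text.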

We excluded pregnant women from the training sample because the costs associated with pregnancy are not easily determined from the MEPS Longitudinal File. We instead modeled the costs associated with maternity health care separately. We also excluded records with incomplete data from our analysis to help address the issue of missing data due to death. However, this method may be problematic for applications specifically concerned with old age or chronic disease (Hass et al. 2016).

To capture the wide variation in maternity and childbirth costs, for example in the event of Cesarean section delivery or complicated birth, we generated these costs stochastically using a truncated log-normal distribution fit to reproduce statistics from Hsia et al. (2014) (median $5,123, lower bound $835, upper bound $26,850) and to statistically reproduce the high level of variability in the costs across different insurance statuses and states. We modeled medical spending in the first year of life as a Poisson distribution, Pois(\(\lambda\)), with rate parameter:

\[\lambda = 1000\,\omega^{2.5} \qquad (1)\]

Here, \(\omega\) represents health status on a scale from 1 (good) to 5 (poor). We selected this form for rate parameter \(\lambda\) in order to roughly reproduce statistics from the National Conference of State Legislatures memo on the costs of prenatal care (National Conference of State Legislatures 2013). While we associated maternity and childbirth costs with the mother’s record, we associated medical costs in the first year of life with the infant record.
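Both stochastic cost components can be sketched in a few lines of R. In the sketch below, the truncation bounds and the median are the values quoted from Hsia et al. (2014), but the log-scale standard deviation `sdlog = 0.8` is an assumed value rather than a fitted parameter, and the function names are hypothetical. Note also that truncation shifts the median of the truncated distribution slightly away from `exp(meanlog)`, so a careful fit would adjust `meanlog` accordingly.

```r
set.seed(11)

# Truncated log-normal draws via inverse-CDF sampling
rtrunc_lnorm <- function(n, meanlog, sdlog, lower, upper) {
  p_lo <- plnorm(lower, meanlog, sdlog)
  p_hi <- plnorm(upper, meanlog, sdlog)
  qlnorm(runif(n, p_lo, p_hi), meanlog, sdlog)
}

# Maternity and childbirth cost, attached to the mother's record
maternity_cost <- rtrunc_lnorm(1000, meanlog = log(5123), sdlog = 0.8,
                               lower = 835, upper = 26850)

# First-year-of-life spending, attached to the infant record:
# Poisson draw with the rate from Equation (1), lambda = 1000 * omega^2.5
infant_spend <- function(omega) rpois(length(omega), lambda = 1000 * omega^2.5)

omega  <- c(1, 3, 5)          # health status from 1 (good) to 5 (poor)
lambda <- 1000 * omega^2.5    # 1000, about 15588, about 55902
draws  <- infant_spend(omega)
```

Because a Poisson variable has variance equal to its mean, the relative spread of these draws is small at rates this large (a standard deviation of about 32 around a mean of 1,000 for \(\omega = 1\)), so most of the variation in first-year spending comes from health status itself.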

Medical expenditures thus evolve over time as the underlying predictor variables (health status, income and insurance status, age) evolve over time. The remainder of this section describes how LHIEM captures the dynamics of these predictor variables.

Health status

We assumed that individuals could be in a state of good health or poor health, with those in poor health suffering from either an acute condition (from which recovery is possible) or a chronic condition (from which recovery is not possible). To estimate the hazard rates for acute and chronic health conditions, we consulted the CDC and National Vital Statistics for cause of death data and obtained the proportions of deaths attributable to common health conditions (American Cancer Society 2019; Crimmins et al. 2008; Heron 2019; Murphy et al. 2018). We assumed that these rates also reflect the proportions of incidence rates by age and gender (we elaborate on the implications of these assumptions in Section 4). We used these hazard rates to represent the net transition from good to poor health. We then implemented a recovery rate from acute conditions that decreases with age, using a half-life that varies from 1 year at birth to 30 years at age 50.
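A half-life \(h\) (in years) corresponds to an annual recovery probability of \(1 - 0.5^{1/h}\). The R sketch below applies that conversion. The text gives only the endpoints of the half-life schedule, so the linear interpolation between ages 0 and 50 and the constant value after age 50 are assumptions, as are the function names.

```r
# Half-life of an acute condition by age: 1 year at birth, 30 years at age 50.
# Linear interpolation and a flat value beyond age 50 are assumed here.
half_life <- function(age) {
  approx(x = c(0, 50), y = c(1, 30), xout = age, rule = 2)$y
}

# Annual probability of recovering from an acute condition
recovery_prob <- function(age) 1 - 0.5^(1 / half_life(age))

# Apply recovery to individuals currently in the acute state
set.seed(2)
age       <- c(3, 20, 45, 60)
recovered <- rbinom(length(age), size = 1, prob = recovery_prob(age)) == 1

recovery_prob(c(0, 25, 50))  # 0.5 at birth, falling to roughly 0.023 at age 50
```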

Importantly, we do not explicitly model any correlation between the health statuses of individuals in the same household. While this correlation may be significant for both acute and chronic effects, this simplification does not affect LHIEM’s ability to capture the aggregate dynamics of medical expenditures for individuals, families, and the general population over time.

Income and insurance status

Because we were interested in longitudinal changes in an individual’s income and insurance status, it was important to ensure that the income trajectories were realistic and that the relationship between insurance status and income was accurately preserved over time. We constructed individual income trajectories using the income mobility estimates from the Panel Study of Income Dynamics (PSID) for 2006-2018. We inflated income to 2018 dollars using the Consumer Price Index Research Series Using Current Methods (CPI-U-RS).

We analyzed the annual change in real income across the income distribution to obtain probabilities for transitions across income and labor force categories by age. In particular, we updated incomes based on income quintile, age group (18-24, 25-34, 35-49, and 50-64), and sex, then sampled from the distribution of annual income changes for the relevant CPS records. Based on some of the more extreme annual changes in income found in the PSID, we assumed that a decrease in income of 90% or more between years (which occurred in 1-3% of cases for most age and income groups) indicated job loss. We then randomly assigned these individuals an income below the 15th percentile of the income distribution in the next year. To model transitions from unemployment to employment, we randomly assigned individuals below the 15th percentile an income sampled from the transitions of those who moved above the 15th percentile in the PSID. For each projection year, we applied the appropriate period-specific CPI for historical years and estimated CPI values for future years to ensure that all income numbers are expressed in nominal dollars for the respective years.
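The job-loss rule can be sketched in R as follows. This is a simplified illustration under stated assumptions: the pool of proportional changes is invented, a single pool stands in for the separate pools by income quintile, age group, and sex, the re-employment transition is omitted, and the function `step_income` and its arguments are hypothetical names.

```r
set.seed(3)
income <- c(12000, 35000, 58000, 90000, 150000, 8000, 47000, 72000)

# Hypothetical pool of observed proportional year-over-year income changes
change_pool <- c(-0.95, -0.20, -0.05, 0.00, 0.02, 0.03, 0.05, 0.10, 0.25)

step_income <- function(income, change_pool, loss_cutoff = -0.90, low_pct = 0.15) {
  change   <- sample(change_pool, length(income), replace = TRUE)
  job_loss <- change <= loss_cutoff          # a drop of 90% or more is treated as job loss
  new_inc  <- income * (1 + change)

  # Job losers receive an income drawn from the bottom of the current distribution
  low_pool <- income[income <= quantile(income, low_pct)]
  new_inc[job_loss] <- low_pool[sample.int(length(low_pool), sum(job_loss), replace = TRUE)]

  list(income = new_inc, job_loss = job_loss)
}

out <- step_income(income, change_pool)
```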

For efficiency, we aggregated individual income trajectories into family-level trajectories prior to running the model, but individual-level trajectories could also be used. We assumed that all individuals retired at age 65, and adjusted family income according to the proportion of family income that was previously attributable to the retiring family member’s wages or salary. Note that this assumption could produce an underestimation of family income in many cases, especially considering the trend towards later retirement and the tendency for unearned income to increase as a proportion of total income as individuals age.

Insurance status was initialized from the CPS. We assumed that it remained unchanged except in the event of an income shock of 10% or more in either direction. We did not make any assumptions about changes in the insurance status distribution over time, and simply sampled from the original distribution of insurance status by federal poverty level (FPL) in the event of such an income shock. This is a limitation of our approach, as there may be many scenarios in which an individual or family changes insurance status without a significant change in income. However, this affects only a small share of the total population (Berchick et al. 2019) and thus does not greatly impact the results.
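The resampling rule can be illustrated with a short R sketch. The FPL bands, the shares in `ins_dist`, and the function `update_insurance` are assumptions introduced for illustration and are not estimates from the CPS. Only the five insurance categories and the 10% threshold come from the text.

```r
set.seed(5)
ins_levels <- c("Uninsured", "Medicaid", "Other Public",
                "NonGroup Private", "Other Private")

# Hypothetical insurance shares by FPL band (each row sums to 1)
ins_dist <- rbind(
  "<138%"    = c(0.20, 0.45, 0.10, 0.05, 0.20),
  "138-400%" = c(0.12, 0.10, 0.05, 0.10, 0.63),
  ">400%"    = c(0.04, 0.01, 0.03, 0.07, 0.85)
)
colnames(ins_dist) <- ins_levels

# Keep insurance status unless income moved by 10% or more in either direction;
# otherwise redraw it from the distribution for the individual's FPL band
update_insurance <- function(ins, fpl_band, income_change, threshold = 0.10) {
  shocked <- abs(income_change) >= threshold
  ins[shocked] <- vapply(fpl_band[shocked], function(b) {
    sample(ins_levels, 1, prob = ins_dist[b, ])
  }, character(1))
  ins
}

ins     <- c("Other Private", "Uninsured", "Medicaid", "Other Private")
band    <- c(">400%", "<138%", "<138%", "138-400%")
chg     <- c(0.02, -0.30, 0.09, 0.15)   # proportional income change
new_ins <- update_insurance(ins, band, chg)
```

In this example only the second and fourth records are redrawn, because their income changes exceed the threshold.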

Mortality and end-of-life expenditures

A substantial portion of lifetime medical expenditures occurs in the last few months of life, particularly for those with chronic health conditions (French et al. 2017), so the treatment of mortality and end-of-life spending is an important aspect of the model. Because the MEPS does not cover all end-of-life spending, we obtained age- and gender-specific average probabilities of death from the 2017 Human Mortality Database U.S. life tables, assuming a smooth transition to 2018 using UN life table projections (Human Mortality Database 2017; United Nations 2017). Drawing on the findings of Einav et al. (2018) on the relationship between predicted mortality and end-of-life spending, we were able to introduce individual-level variability in these probabilities and obtain an indirect dependence on health status. We then used these predicted mortalities to determine which individuals would be removed from the population each year, to match the total mortality rate to the United States Census Bureau 2017 National Population Projections Tables (U.S. Census Bureau 2017). From there, we used the findings of Einav et al. (2018) on end-of-life health care spending to inflate medical expenditures for those predicted to die.

Demographic changes

It is important for a model such as LHIEM to capture changes to the overall population size and composition over time. In addition to simulating death (as described in the previous section), LHIEM also simulates birth and migration to account for such changes.

Birth, in particular, is critical to model because it contributes to both medical spending and demographic changes. We model birth as a Bernoulli random variable, where each female record has some probability of giving birth each year. We assigned these probabilities according to CDC fertility statistics based on age and race (Matthews & Hamilton 2019). New infant records introduced to the population were assigned the same demographic characteristics as the mother, with gender randomly assigned with equal probability and health status randomly assigned based on MEPS statistics.
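The Bernoulli birth step can be sketched in R as below. The probabilities in `birth_prob` are placeholder values, not the CDC fertility statistics, and they vary by age only, whereas the text conditions on age and race. The toy population and the infant identifier format are also assumptions, and the MEPS-based assignment of infant health status is omitted.

```r
set.seed(9)

# Placeholder annual birth probabilities by age (illustrative values only)
birth_prob <- function(age) {
  ifelse(age < 15 | age > 44, 0, ifelse(age < 30, 0.08, 0.05))
}

women <- data.frame(
  PID  = sprintf("P%03d", 1:500),
  FID  = sprintf("F%03d", 1:500),
  Age  = sample(10:60, 500, replace = TRUE),
  Race = sample(c("Hispanic", "White", "Black", "Other"), 500, replace = TRUE)
)

# One Bernoulli draw per female record per year
gives_birth <- rbinom(nrow(women), size = 1, prob = birth_prob(women$Age)) == 1
mothers     <- women[gives_birth, ]

# New infant records inherit family and race from the mother; sex is a fair coin flip
infants <- data.frame(
  PID  = sprintf("%s-C", mothers$PID),
  FID  = mothers$FID,
  Age  = rep(0, nrow(mothers)),
  Sex  = sample(c("Male", "Female"), nrow(mothers), replace = TRUE),
  Race = mothers$Race
)
```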

To capture the effects of immigration, we added families to the model population according to the appropriate projections from the United States Census Bureau 2017 National Population Projections Tables (U.S. Census Bureau 2017). We randomly sampled the appropriate number of families from those labeled as immigrant families in the CPS dataset.

Results

This section begins with an analysis of LHIEM’s performance, focusing on validation and calibration. We then present a case study to illustrate the application of LHIEM for policy analysis, along with example results derived from this case study. We conclude with results from our uncertainty analysis.

Model validation

To validate LHIEM’s accuracy in capturing the dynamics of medical spending and its predictors, we conducted multiple validation exercises using historical data.

For individual-level validation of medical expenditures, we adopted an 85/15 data split for training and testing the CART and regression models, a standard practice in predictive modeling to balance model training and validation needs. This split provided a robust training dataset, with 11,448 observations in part one and 7,734 in part two, while ensuring a sufficient test set (15%) to assess model performance and generalizability.

In part one, we determined that the best performing model for forecasting non-zero medical expenses in a given year incorporated the following factors as covariates: age group, sex, health status, insurance status, pregnancy status, race, family income, a binary indicator of non-zero health expenditures in the previous year, and total health expenditures in the previous year. In cross-validation exercises, this model outperformed alternative models using subsets of the list of covariates identified.

In part two, we performed the CART analysis using the following factors as covariates: age group, sex, health status, insurance status, race, family income, and total health expenditures in the previous year. We used the R library rpart to construct, optimize, and validate candidate CART models. This library provides tools for users to manage model complexity by setting a complexity parameter. This parameter defines the minimum required improvement in model fit that justifies further branching of the tree. A common strategy involves initially growing a regression tree beyond the desired complexity and then pruning it to retain only the most significant branches. The "one-standard-error" rule, a widely adopted practice, guides this pruning process (Therneau & Atkinson 1997). According to this rule, we select the simplest tree whose cross-validation error is within one standard error of the minimum error observed.

The plotcp() function in rpart aids in this process by plotting the cross-validation error against the complexity parameter, highlighting the one-standard-error threshold with a dashed line. The results of this validation process for the model of nonzero medical expenditures are depicted in Figure 1, which shows cross-validation error as a function of the number of splits (top \(x\)-axis) and corresponding complexity parameter (bottom \(x\)-axis). According to the one-standard-error rule, the optimal complexity parameter is 0.0019, leading to a tree with 12 splits and 13 terminal nodes. However, the figure also indicates potential overfitting with larger trees, as gains in cross-validation error taper off quickly beyond the initial splits. This observation steered us to a conservative approach, opting to prune the tree down to six splits. This equates to a complexity parameter of 0.006, striking a balance between model complexity and predictive performance.
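The selection logic can be made concrete with a small base-R sketch that applies the one-standard-error rule to a complexity table. The table below is hypothetical: its column names mirror those of the `cptable` component of an rpart fit, and its values were chosen so that the rule lands on the complexity parameter and split count reported above, but they are not the values behind Figure 1.

```r
# Hypothetical complexity table (one row per candidate tree size)
cptable <- cbind(
  CP     = c(0.3000, 0.0500, 0.0200, 0.0100, 0.0060, 0.0030, 0.0019, 0.0010),
  nsplit = c(0, 1, 2, 4, 6, 9, 12, 16),
  xerror = c(1.000, 0.710, 0.665, 0.645, 0.632, 0.627, 0.618, 0.606),
  xstd   = c(0.020, 0.017, 0.016, 0.016, 0.016, 0.016, 0.016, 0.016)
)

# One-standard-error rule: the smallest tree whose cross-validation error
# is no more than one standard error above the minimum
best      <- which.min(cptable[, "xerror"])
threshold <- cptable[best, "xerror"] + cptable[best, "xstd"]
one_se    <- min(which(cptable[, "xerror"] <= threshold))
cptable[one_se, c("CP", "nsplit")]

# A more conservative choice, as in the text: stop at six splits
conservative_cp <- cptable[cptable[, "nsplit"] == 6, "CP"]
```

With a fitted rpart object, the same table is available as the object's `cptable` component, and the chosen value can be passed to rpart's `prune()` function through its `cp` argument.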

The CART resulting from this analysis is depicted in Figure 2. In this diagram, \(X_t\) indicates the natural log-transform of medical expenditures in year \(t\). The top number in each box provides the CART estimate for \(X_{t+1}\) (the outcome variable, the natural log-transform of medical expenditures in year \(t+1\)), and the bottom number provides the percentage of the training sample that falls into each partition. The six splits correspond to seven leaves (or partitions), each of a suitable sample size to estimate a regression. For each leaf in the resulting regression tree, we then built a separate regression using the same covariates and a generalized linear model assuming normally distributed errors.

Validation of aggregate model results against NHEA data revealed a higher-than-expected medical inflation rate of 6.9% per year. We therefore added a calibration step to control the rate of inflation. We first normalized the year \(t+1\) expenditures by dividing by the weighted inflation rate, then multiplied by the desired inflation rate. We pegged the inflation of medical spending to the projected growth in Medicare per capita spending, rather than to Medical CPI, and thus assumed a growth rate of 5.1% (Cubanski et al. 2019). Medical expenditure estimates for each year of the simulation were then recalibrated to this inflation rate. This calibration step reflects the tendency for dynamic microsimulations to require alignment to aggregate benchmarks (Harding 2007).
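The calibration step amounts to rescaling each year's predicted expenditures so that weighted average growth equals the target rate. The R sketch below shows one reading of that step. The function name `calibrate_growth`, the toy spending vectors, and the weights are assumptions, and the sketch ignores births, deaths, and migration, which change the population between years.

```r
# Rescale year t+1 spending so that weighted mean growth matches the target rate
calibrate_growth <- function(spend_prev, spend_next, wt, target_rate = 0.051) {
  # Realized growth factor implied by the uncalibrated predictions
  realized <- weighted.mean(spend_next, wt) / weighted.mean(spend_prev, wt)
  # Divide out realized growth, then apply the desired growth
  spend_next / realized * (1 + target_rate)
}

spend_prev <- c(0, 500, 2000, 12000)      # toy year t spending
spend_next <- c(100, 450, 2600, 12500)    # toy uncalibrated year t+1 predictions
wt         <- c(1500, 2200, 1800, 900)    # toy person weights

calibrated <- calibrate_growth(spend_prev, spend_next, wt)
weighted.mean(calibrated, wt) / weighted.mean(spend_prev, wt)  # equals 1.051 up to rounding
```

Because every record is multiplied by the same factor, the rescaling changes the level of spending but leaves each individual's share of total spending unchanged.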

To validate the income submodel, we analyzed both the individual income trajectories and the overall income distribution generated by the model. We first ensured face validity, which involved subjectively assessing whether the model’s income trajectories and distributions appeared realistic and plausible when compared to known economic patterns. Subsequently, for a more objective assessment, we verified that the distribution of changes in income from one year to the next aligned statistically with the 2006-2018 PSID data. Additionally, we compared the income distributions our model produced for years 1, 5, 10, and 15 against the corresponding distributions in the PSID dataset, checking for significant deviations. Our model consistently aligned with each validation criterion considered, affirming its robustness and reliability in realistically simulating income trajectories and distributions. Figure 3 shows how the income distribution of the LHIEM model population compares to the income distribution according to data from the CPS-ASEC (U.S. Census Bureau 2016b).

Validation of the health status submodel was crucial, as transitions between good and poor health states (whether driven by acute or chronic conditions) are key predictors of medical expenditures. To ensure that the health status distributions by age and gender were accurately represented over time, we benchmarked our incidence rates for acute and chronic conditions against MEPS data. Note that MEPS health status is self-reported and may not fully capture the true prevalence of conditions, unlike the CDC incidence data that inform the dynamics of our health status submodel. Indeed, initial comparisons revealed that our model overestimated ‘good’ health status among individuals under 25 years of age. To address this discrepancy, we adjusted the incidence rates of acute conditions for this age group to better align with MEPS data. Figure 4 shows the proportion of individuals in good health as a function of age for the uncalibrated model, the calibrated model, and the MEPS data used for comparison. Further analysis showed that this adjustment materially affected projections of individual and household health care expenditures: omitting it changed median expenditures by roughly 3% to 8% relative to baseline, as shown in the results for Model 4 in Table 6. Calibration of this kind is therefore important for both the accuracy and the applicability of our findings.

Case study: An alternative health care financing approach

We developed LHIEM to analyze the 10Plan, a proposed alternative health care financing approach targeting individuals who are uninsured or purchase private coverage in the nongroup health insurance market (Kapinos et al. 2021). This policy introduces a self-pay system designed to reduce the financial burden of health care costs for eligible individuals and families.

For this analysis, LHIEM provided dynamic estimates of health care expenditures at the individual, family, and population levels, along with estimated federal costs of implementation. The model incorporated predicted changes in health care utilization and prices over a 15-year time horizon. To simulate the proposed policy, we developed an additional module for tracking loans and payments associated with the proposed financing system. Further information on this extension can be found in Kapinos et al. (2021).

Importantly, we designed this policy module in a way that integrates seamlessly with LHIEM’s core model mechanics but easily allows for modification or replacement to explore other policy scenarios. This modular approach also allows us to distinguish between uncertainty arising from the core model mechanics and uncertainty related to the policy-specific elements.
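
One way to picture this separation between core mechanics and policy logic is a small plug-in interface, sketched below. Everything here is hypothetical: the `Person` fields, the `PolicyModule` protocol, and especially the flat `repayment_share`, which is a placeholder and not the means-tested schedule of the proposed policy (see Kapinos et al. 2021 for the actual design). The sketch only shows how a status quo run and a policy run could share the same yearly update.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Person:
    income: float
    medical_spending: float
    loan_balance: float = 0.0


class PolicyModule(Protocol):
    def apply(self, person: Person) -> None: ...


class StatusQuo:
    """Policy module turned off: no loans are tracked."""

    def apply(self, person: Person) -> None:
        pass


class SelfPayLoan:
    """Hypothetical loan tracker: borrow spending, repay a share of income."""

    def __init__(self, repayment_share: float) -> None:
        self.repayment_share = repayment_share  # placeholder, not the real schedule

    def apply(self, person: Person) -> None:
        person.loan_balance += person.medical_spending
        payment = min(person.loan_balance, self.repayment_share * person.income)
        person.loan_balance -= payment


def step_year(population: list[Person], policy: PolicyModule) -> None:
    """Apply the chosen policy module after the core yearly updates (omitted)."""
    for person in population:
        policy.apply(person)


people = [Person(income=50_000.0, medical_spending=4_000.0)]
step_year(people, SelfPayLoan(repayment_share=0.05))
```

Swapping `SelfPayLoan` for `StatusQuo` (or any other object with an `apply` method) changes the policy scenario without touching the core update loop.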

We next present a sample of results from this analysis, focusing on the key outputs relevant to assessing the proposed policy’s impacts.

Results

Under the study’s baseline assumptions, our analysis suggested that the policy would cover approximately 46 million individuals (those who are currently uninsured or purchase private coverage in the nongroup health insurance market) and reduce total health care expenditures by $33 billion over 15 years. On average, this translated to a decrease in out-of-pocket costs of $1,343 per year for covered individuals, though out-of-pocket costs would actually increase on average for those who are currently uninsured due to an expected increase in utilization. Importantly, the results varied dramatically depending on the price levels that families and individuals covered by the policy would face, but were less sensitive to the model parameters and other policy details tested (Kapinos et al. 2021).

Figure 5 shows the estimated mean annual health care expenditures among individuals who would be eligible for the self-pay system, both under the status quo (SQ) and under the proposed self-pay system with baseline assumptions. The differences in total expenditures between the status quo and proposed policy scenarios can be attributed to assumptions about changes in utilization and health care prices under the self-pay system. Notably, individuals who are currently uninsured would face an increase in their out-of-pocket (OOP) costs, while individuals who are currently covered by a non-group plan would benefit from a substantial decrease in their out-of-pocket costs. On average, individuals eligible for the self-pay system would pay less out-of-pocket under the proposed policy than under the status quo.

Uncertainty analysis

Understanding the inherent uncertainties in a complex model like LHIEM is essential for ensuring its reliability and applicability. To address this, we conducted a comprehensive uncertainty analysis to examine the effects of both parameter uncertainty (stemming from incomplete information, imperfect data, and estimation error) and model uncertainty (arising from simplifications and assumptions in model formulation).

Table 5 outlines the core model variations tested in this analysis, each representing a modification to a specific parameter or assumption in the model mechanics. Table 6 summarizes the results for these variations, highlighting the percentage change from the baseline policy scenario. This comparison allows us to evaluate the impact of individual parameters or modeling assumptions on outcomes at the individual and household levels. Additionally, Table 7 provides detailed summary statistics at the household level, offering a more nuanced view of the variation and uncertainty within each model.

Among the seven model variants tested, Model 4 (which omits the calibration step, described above in the validation of the health status submodel, that aligns the health status distribution to MEPS data) yielded particularly revealing insights. This modification resulted in the greatest deviations from baseline results, highlighting a potential disparity between self-reported health status and actual medical conditions according to CDC statistics. Since LHIEM’s medical expenditure predictions rely on self-reported health status, achieving alignment with the MEPS distribution was critical. This result illustrates how uncertainty analysis can identify the factors that most influence model performance and outcome accuracy.
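
The percentage changes reported in Table 6 appear to follow the usual relative-difference formula, \(100 \times (\text{variant} - \text{baseline}) / \text{baseline}\). The sketch below applies that formula to the published Model 4 medians from Table 6. Because the published inputs are rounded to whole dollars, the results agree with the reported percentages only to within about 0.01 percentage points. The function name is illustrative.

```python
def pct_change(variant, baseline):
    """Percentage change of a variant outcome relative to the baseline."""
    return 100.0 * (variant - baseline) / baseline


# Median expenditures ($) from Table 6, in the order:
# individual year 5, individual year 15, household year 5, household year 15.
baseline = (2399, 3246, 3301, 4621)  # Model 2: baseline policy
model_4 = (2328, 2995, 3034, 4296)   # Model 4: no health status calibration

changes = [round(pct_change(v, b), 2) for v, b in zip(model_4, baseline)]
# Table 6 reports -2.96%, -7.73%, -8.09%, and -7.04% for these four cells.
```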

In addition to the seven core model variations in Table 5, our full analysis included over 20 additional model variants focused on specific policy details and their effects on the healthcare landscape. These variants explored factors such as medical price levels, healthcare demand, loan repayment rates, and loan interest rates. While these policy-specific variations are not covered in this section, their results are discussed in detail in Kapinos et al. (2021).

Table 5: Detailed Description of LHIEM Model Variants

| Model variant | Module affected | Detailed description |
| --- | --- | --- |
| 1. Status quo | Policy | This model runs LHIEM with the policy module turned off. This represents the predictions produced by LHIEM without assuming any policy intervention. |
| 2. Baseline policy | Policy | This model runs LHIEM with the policy module turned on. This represents the predictions produced by LHIEM under the baseline policy intervention examined. |
| 3. Baseline policy with lower rate of medical inflation | Medical expenditures | The inflation rate for health care is separate from the overall inflation rate for the economy. This model tested the sensitivity of LHIEM long-term projections to the rate of medical inflation assumed. |
| 4. Baseline policy without calibrating distribution of health status to MEPS | Health status | As described earlier, we calibrated LHIEM’s distribution of health status to the MEPS self-reported health status distribution. By removing the calibration step, this model assesses the sensitivity of LHIEM to calibration of health status. |
| 5. Baseline policy with end-of-life consumption inflated only for those with chronic conditions | Mortality and end-of-life expenditures | While significant medical expenditures are often incurred in the last year of life, the costs incurred by an individual suffering from a serious chronic condition (for example, cancer) will likely far exceed the costs incurred by the victim of a car accident, for example. This model assesses the sensitivity of LHIEM to inflating the end-of-life healthcare costs for only those individuals suffering from chronic conditions, rather than for all individuals. |
| 6. Baseline policy with deflation of mortality scaling factor | Mortality and end-of-life expenditures | Our method for estimating the relationship between mortality and health care spending is based on the work of Einav et al. (2018). To assess LHIEM’s sensitivity to the characterization of this relationship, we tested a model that scales the estimated function relating mortality to health care spending. |
| 7. Baseline policy with insurance category corrections | Income and insurance status | Since the CPS data on health insurance coverage is self-reported, LHIEM’s initial assignment of health insurance status may contain some inconsistencies. To test the sensitivity of LHIEM to potential misreporting of insurance status, we performed the following steps: 1) reassign anyone on Medicaid and at or above 400% FPL to have Group Private insurance; 2) reassign anyone on Medicaid and between 200-400% FPL to have Nongroup Private insurance; 3) reassign individuals at or below 100% FPL with Nongroup Private insurance to be Uninsured. |

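
The three reassignment rules for Model 7 can be written as a small function. This is a sketch of the rules as stated in Table 5, with two assumptions: the category labels are written as plain strings, and the 200-400% FPL band is treated as including 200% and excluding 400% (the description assigns 400% and above to Group Private but does not say on which side of the band 200% falls). The function name is illustrative.

```python
def correct_insurance(category, fpl_pct):
    """Apply the Model 7 insurance category corrections to one individual.

    category: reported insurance category (string label, assumed spelling).
    fpl_pct:  family income as a percentage of the federal poverty level.
    """
    if category == "Medicaid" and fpl_pct >= 400:
        return "Group Private"
    if category == "Medicaid" and 200 <= fpl_pct < 400:  # boundary handling assumed
        return "Nongroup Private"
    if category == "Nongroup Private" and fpl_pct <= 100:
        return "Uninsured"
    return category  # all other combinations are left unchanged


examples = [
    correct_insurance("Medicaid", 450),          # rule 1
    correct_insurance("Medicaid", 250),          # rule 2
    correct_insurance("Nongroup Private", 80),   # rule 3
    correct_insurance("Medicaid", 120),          # unchanged
]
```

The rules are applied to the reported category only, so an individual reassigned by one rule is not re-evaluated by a later one.
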
Table 6: Sample of simulation results: mean and median household and individual expenditures in year 5 and year 15 of the simulation, and corresponding percentage change from baseline for uncertainty analysis runs (Models 3-7). Ind. = individual; HH = household; Y5 and Y15 = year 5 and year 15. Percentage changes are relative to the baseline policy median.

| Model variant | Mean ($), Ind., Y5 | Mean ($), Ind., Y15 | Mean ($), HH, Y5 | Mean ($), HH, Y15 | Median ($), Ind., Y5 | Median ($), Ind., Y15 | Median ($), HH, Y5 | Median ($), HH, Y15 | % change, Ind., Y5 | % change, Ind., Y15 | % change, HH, Y5 | % change, HH, Y15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Status quo | 4,414 | 7,274 | 5,554 | 8,960 | 2,279 | 3,421 | 3,162 | 4,689 | | | | |
| 2. Baseline policy | 4,328 | 7,313 | 5,459 | 9,029 | 2,399 | 3,246 | 3,301 | 4,621 | | | | |
| 3. Baseline policy with lower rate of medical inflation | 4,301 | 7,306 | 5,418 | 8,961 | 2,388 | 3,261 | 3,178 | 4,651 | -0.45% | 0.46% | -3.72% | 0.65% |
| 4. Baseline policy without calibrating distribution of health status to MEPS | 4,325 | 7,276 | 5,445 | 8,965 | 2,328 | 2,995 | 3,034 | 4,296 | -2.96% | -7.73% | -8.09% | -7.04% |
| 5. Baseline policy with end-of-life consumption inflated only for those with chronic conditions | 4,343 | 7,266 | 5,500 | 8,945 | 2,398 | 3,219 | 3,305 | 4,584 | -0.06% | -0.83% | 0.13% | -0.80% |
| 6. Baseline policy with deflation of mortality scaling factor | 4,336 | 7,352 | 5,461 | 9,029 | 2,393 | 3,251 | 3,260 | 4,698 | -0.27% | 0.14% | -1.24% | 1.67% |
| 7. Baseline policy with insurance category corrections | 4,264 | 7,333 | 5,364 | 9,059 | 2,384 | 3,215 | 3,114 | 4,703 | -0.65% | -0.97% | -5.65% | 1.79% |

Table 7: Sample of simulation results: summary statistics for household-level medical expenditures in year 5 and year 15 of the simulation.

| Model variant | Year 5 mean ($) | Year 5 25th pct ($) | Year 5 50th pct ($) | Year 5 75th pct ($) | Year 15 mean ($) | Year 15 25th pct ($) | Year 15 50th pct ($) | Year 15 75th pct ($) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Status quo | 5,554 | 1,298 | 3,162 | 7,254 | 8,960 | 1,649 | 4,689 | 11,146 |
| 2. Baseline policy | 5,459 | 1,685 | 3,301 | 6,411 | 9,029 | 2,050 | 4,621 | 13,313 |
| 3. Baseline policy with lower rate of medical inflation | 5,418 | 1,707 | 3,178 | 6,378 | 8,961 | 2,026 | 4,651 | 12,941 |
| 4. Baseline policy without calibrating distribution of health status to MEPS | 5,445 | 1,657 | 3,034 | 6,316 | 8,965 | 1,979 | 4,296 | 13,274 |
| 5. Baseline policy with end-of-life consumption inflated only for those with chronic conditions | 5,500 | 1,721 | 3,305 | 6,397 | 8,945 | 1,985 | 4,584 | 13,003 |
| 6. Baseline policy with deflation of mortality scaling factor | 5,461 | 1,720 | 3,260 | 6,408 | 9,029 | 2,053 | 4,698 | 13,216 |
| 7. Baseline policy with insurance category corrections | 5,364 | 1,688 | 3,114 | 6,300 | 9,059 | 2,035 | 4,703 | 13,407 |

Discussion

Investigating policies related to accumulated outcomes — such as retirement savings, wealth, or bio-accumulative health processes — requires models that account for path-dependent dynamics. In this paper we have introduced LHIEM, a dynamic microsimulation model capable of simulating medical expenditures, health status, income, mortality, and demographic attributes over time. By capturing these interrelated dynamics, LHIEM enables the analysis of path-dependent policies that cannot be adequately assessed using models that assume partial equilibrium. This expands the scope of policies that can be studied, providing valuable insights for policymaking across a variety of domains beyond health.

LHIEM has already been applied to evaluate the 10Plan (Kapinos et al. 2021), an alternative healthcare financing approach. Specifically, the model was used to examine how total health care spending, individual- and family-level out-of-pocket health spending, and federal spending would change under a self-pay system aimed at reducing the financial burden for families without health insurance or those purchasing private coverage in the nongroup market.

Our work builds on prior studies, publicly available datasets, and empirical evidence to create a modular and flexible dynamic microsimulation framework. A key strength of LHIEM is its reliance on publicly available datasets and statistics that are regularly maintained and updated: the Current Population Survey (CPS); the Medical Expenditure Panel Survey (MEPS); the Panel Study of Income Dynamics (PSID); Centers for Disease Control and Prevention (CDC) data; and United States Census Bureau population-level projections. This open data foundation improves the reproducibility of our work and facilitates the use of LHIEM for future policy analysis.

Additionally, each of LHIEM’s submodels is built using the most relevant data sources to accurately model the evolution of its respective dynamic variables. This modular design also supports targeted enhancements, enabling researchers to refine specific components to achieve greater accuracy or precision for particular variables of interest. Together, these features make LHIEM a robust and adaptable tool for studying complex path-dependent policy interventions.

Areas for future work

As with any simulation, we made several assumptions and simplifications, in some cases to improve model tractability and, in other cases, to avoid incorporating unnecessary sources of uncertainty and keep the model intuitive and straightforward. We have noted these throughout the paper. However, depending on the intended application, it may be desirable to modify particular aspects of the model to improve its fidelity for particular outcomes of interest. Here, we offer a discussion of potential areas for future work, and we hope that other researchers will expand or improve upon these where useful.

First, we note that LHIEM is a multi-year dynamic simulation constructed based on only two years of medical expenditure data. While it would have been preferable to develop a model of spending patterns with more than two years of data, the MEPS is not designed to link data across more than two years, and the authors are unaware of a publicly available dataset of comparable quality containing the required information for more than two years. The literature suggests that the relationship between medical spending in year \(t\) and year \(t + 2\) is much weaker than that between spending in year \(t\) and year \(t + 1\), so additional data of this nature would be unlikely to dramatically improve results (see, for example, Eichner et al. 1997). However, it may be worth exploring whether additional longitudinal data can improve predictive accuracy, particularly if richer datasets become available.

Another rich area for future work would involve refining our characterization of health status and its interaction with LHIEM’s other submodels. First and foremost, the health transition rates would benefit from more detailed data directly capturing the hazard rates of various acute and chronic conditions, beyond the cause of death data we currently use. The calibration step aligning the LHIEM health status distributions to MEPS data currently corrects biases introduced by our approach, but higher-fidelity data could eliminate the need for this adjustment. In addition, a more direct dependence of end-of-life spending on health status could be appropriate. Currently, LHIEM admits an indirect dependence due to the assumed dependence of mortality rate on health spending and the dependence of health spending on health status. However, for a given mortality rate, it is possible that those suffering from chronic diseases may incur higher end-of-life costs than those with acute conditions. Furthermore, our scaling of the link between mortality and health spending found by Einav et al. (2018) and our application of this method to all age groups could introduce bias, as the relationship they found was actually based on the Medicare population.

Additional extensions could address our assumption that pregnancy rates are independent of health status and health insurance status. In reality, women in poor health may not become pregnant at the same rate as their counterparts in good health. It is also well known that Medicaid pays for about 50% of births, though this varies by state. Thus, we might expect some relationship between the likelihood of pregnancy and both health status and insurance status. Studies focusing specifically on maternal health, for example, may warrant a more detailed characterization of these relationships. These extensions would benefit from more detailed data on the relationships between medical spending, health status, pregnancy, and birth rates.

Finally, future studies should consider updating the data informing LHIEM’s submodel dynamics. In particular, the income submodel is based on PSID data covering portions of the 2001-2007 business cycle, the 2008 recession, and the subsequent business cycle up to 2016. This period saw slower income growth and lower labor force participation in the U.S. relative to other periods. Thus, LHIEM projections would benefit from recalibration if macroeconomic conditions are expected to be better or worse than they were during this timeframe. For example, in the early months of the COVID-19 pandemic, we adapted this module to incorporate updated projections of unemployment, ESI responsiveness, and wage growth obtained from the Urban Institute (Garrett & Gangopadhyaya 2020). We recommend similar adaptations for future policy analyses.

Conclusions

LHIEM is a novel dynamic longitudinal microsimulation developed to evaluate a proposed health care policy initiative. An important contribution of this work is the demonstration of the value of a flexible, modular approach; each submodel provides the necessary level of fidelity for a key component of the model and can easily be adapted based on the needs of the research question. In describing the model and data architecture in such detail, we aim to encourage further work in this area and facilitate this class of model development for future policy analysis.

LHIEM addresses a critical gap in health policy modeling by explicitly capturing path-dependency in healthcare expenditures. Unlike many existing models that treat each year independently, LHIEM explicitly tracks individual histories, making it particularly well-suited for evaluating policies where cumulative effects matter, such as healthcare financing reforms. LHIEM evolves key attributes like health status, income, employment, and medical expenditures over time, allowing for realistic life-course simulations. Its modular structure ensures flexibility, enabling researchers to adapt it to a wide range of policy questions, update components as better data becomes available, and extend its functionality as needed. LHIEM has already been validated against external datasets and applied to analyze the 10Plan, demonstrating its practical value in real-world policy evaluation.

While LHIEM represents a significant advancement, several areas warrant future development. Expanding the medical expenditure model beyond two years of data could enhance long-term projections, and updating transition probabilities would ensure alignment with changing economic and demographic trends. Improving end-of-life spending estimates and incorporating richer interactions between submodels — for example, between health status and fertility decisions — would further refine the accuracy of the model.

Given its flexibility and broad applicability, LHIEM is well-positioned for future research on healthcare financing, medical debt, and other policy interventions that require detailed, longitudinal tracking of individual and household outcomes. As policy continues to evolve, a robust, adaptable modeling framework like LHIEM is essential for assessing long-term impacts and informing evidence-based decision-making. By enabling more nuanced, history-aware analyses, LHIEM provides policymakers and researchers with a powerful tool to explore complex hypothetical scenarios and craft sustainable, data-driven solutions to pressing policy challenges.

Acknowledgements

The authors are employees of the RAND Corporation (with the exception of Adrienne M. Propp, who was an employee of the RAND Corporation at the time of writing), which received an unrestricted grant from Mark Cuban for this research. We thank Katherine Carman, Federico Girosi, Christine Eibner, David Adamson, Jayme Fuglesten, Paul Koegel, and Jodi Liu (RAND Corporation), Jonathan Gruber (Massachusetts Institute of Technology), and James Capretta (American Enterprise Institute) for their helpful guidance and reviews of this work.

Notes

  1. In the Health Insurance Marketplace, plan metal level is a ranking system that determines how an individual and his/her health care plan split health care costs.↩︎
  2. That is, the variables we used to identify "similar" individuals in the two datasets, allowing us to augment one dataset with attributes from the other.↩︎
  3. We excluded individuals over 65 in our study because these individuals are covered by Medicare and are thus not impacted by health care policy proposals such as the 10Plan.↩︎
  4. In other words, while many individuals may be assigned the same transition rate for a given attribute, the occurrence of that transition is determined independently for each individual.↩︎

Appendix A: Variable Harmonization

We used multiple datasets in the construction of LHIEM. In some cases, different datasets had different codifications of variables of interest. In these cases, it was necessary to harmonize across datasets. We include the harmonization rules we used for LHIEM in Table 8 below.

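Table 8: Variable harmonization across datasets

| Model variable | Data source | Data source variable name | Raw value | Harmonized value |
| --- | --- | --- | --- | --- |
| Race | CPS | RACE | 100, 802, 803, 804 | White |
| Race | CPS | RACE | 200, 801, 805, 806, 807, 810, 811, 814, 816 | Black |
| Race | CPS | HISPAN | All values except 0, 901, 902 | Hispanic |
| Race | MEPS | RACETHX | 1 | Hispanic |
| Race | MEPS | RACETHX | 2 | White |
| Race | MEPS | RACETHX | 3 | Black |
| Race | MEPS | RACETHX | 4, 5 | Other |
| InsCat | CPS | ANYCOVLY | 1 | Uninsured |
| InsCat | CPS | GRPCOVLY | 2 | Other Private |
| InsCat | CPS | MRKSCOVLY | 2 | NonGroup |
| InsCat | CPS | MRKCOVLY | 2 | NonGroup |
| InsCat | CPS | NMCOVLY | 2 | NonGroup |
| InsCat | CPS | PUBPART | 2 | Other Public |
| InsCat | CPS | TRCCOVLY | 2 | Other Public |
| InsCat | CPS | CHAMPVALY | 2 | Other Public |
| InsCat | CPS | INHCOVLY | 2 | Other Public |
| InsCat | CPS | PUBCOVLY | 2 | Other Public |
| InsCat | CPS | HICHAMP | 2 | Other Public |
| InsCat | CPS | HIMCARE | 2 | Other Public |
| InsCat | CPS | HIMCAID | 2 | Medicaid |
| InsCat | MEPS | INSURCY1 | 3 | Uninsured |
| InsCat | MEPS | INSURCY1 | 1 | Other Private |
| InsCat | MEPS | PRSTXY1 | 1 | NonGroup |
| InsCat | MEPS | PRX*1 | 1 | NonGroup |
| InsCat | MEPS | PNG*1 | 1 | NonGroup |
| InsCat | MEPS | INSURCY1 | 2, 4, 5, 6, 7, 8 | Other Public |
| InsCat | MEPS | MCDEVY1 | 1 | Medicaid |
| InsCat | MEPS | MCD*Y1 | 1 | Medicaid |
| HealthStatus | MEPS | RTHLTH3 | -9, -8, -7, -1 | N/A |
| HealthStatus | MEPS | RTHLTH3 | 1, 2, 3 | Good |
| HealthStatus | MEPS | RTHLTH3 | 4, 5 | Bad |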
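
In code, harmonization rules of this kind reduce to lookup tables. The sketch below encodes two of the MEPS mappings from Table 8 (`RACETHX` and `RTHLTH3`), reading each raw value without its own harmonized label as sharing the label of the row above it. The dictionary layout, the function name, and the choice to raise an error on unlisted codes are illustrative assumptions.

```python
# MEPS RACETHX raw value -> harmonized Race (from Table 8).
MEPS_RACE = {1: "Hispanic", 2: "White", 3: "Black", 4: "Other", 5: "Other"}

# MEPS RTHLTH3 raw value -> harmonized HealthStatus (from Table 8).
MEPS_HEALTH = {
    -9: "N/A", -8: "N/A", -7: "N/A", -1: "N/A",
    1: "Good", 2: "Good", 3: "Good",
    4: "Bad", 5: "Bad",
}


def harmonize_meps(record):
    """Map raw MEPS codes in `record` to harmonized model variables.

    Raises KeyError for a code that Table 8 does not list, so unexpected
    values are surfaced rather than silently dropped.
    """
    return {
        "Race": MEPS_RACE[record["RACETHX"]],
        "HealthStatus": MEPS_HEALTH[record["RTHLTH3"]],
    }


row = harmonize_meps({"RACETHX": 2, "RTHLTH3": 4})
```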

Appendix B: Participation and spending for case study policy over time

In the following table, we show the number of participants in the proposed case study policy at the end of years 1, 5, 10, and 15 by gender, race, and age group, and provide mean family income. These projections demonstrate stability in the demographics of the model population affected by the policy over the projected time frame, absent external forces such as behavioral shifts.

Table 9: Plan Participation and Spending at the End of Years 1, 5, 10, and 15

| | Year 1 | Year 5 | Year 10 | Year 15 |
| --- | --- | --- | --- | --- |
| Number of Individuals (millions) | 43.5 | 43.6 | 45.7 | 46.8 |
| % Female | 50% | 51% | 50% | 50% |
| % Male | 50% | 49% | 50% | 50% |
| % White | 53% | 52% | 52% | 50% |
| % Black | 16% | 16% | 15% | 16% |
| % Hispanic | 24% | 24% | 25% | 26% |
| % < 19 | 29% | 30% | 30% | 29% |
| % 19 to 34 | 26% | 25% | 25% | 24% |
| % 35 to 49 | 22% | 23% | 22% | 23% |
| % 50 to 64 | 23% | 22% | 21% | 23% |
| Mean Family Income ($ thousands) | 86.8 | 90.5 | 89.7 | 88.0 |

References

ABRAHAM, J. (2012). Predicting the effects of the Affordable Care Act: A comparative analysis of health policy microsimulation models. State Health Reform Assistance Network, policy brief.

AGENCY for Healthcare Research and Quality. (2018). Medical Expenditure Panel Survey (MEPS) Panel 20 Longitudinal Data File. Available at: https://meps.ahrq.gov/data_stats/download_data_files.jsp

AMERICAN Cancer Society. (2019). Cancer facts and figures 2019. Atlanta, GA. Available at: https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2019.html

BELOTTI, F., Deb, P., Manning, W. G., & Norton, E. C. (2015). Twopm: Two-Part models. The Stata Journal, 15(1), 3–20. [doi:10.1177/1536867x1501500102]

BERCHICK, E. R., Barnett, J. C., & Upton, R. D. (2019). Health insurance coverage in the United States: 2018. Current Population Reports, P60-267(RV).

BERNARD, D., Cowan, C., Selden, T., Cai, L., Catlin, A., & Heffler, S. (2012). Reconciling medical expenditure estimates from the MEPS and NHEA, 2007. Medicare & Medicaid Research Review, 2. [doi:10.5600/mmrr.002.04.a09]

CRIMMINS, E., Hayward, M., Ueda, H., Saito, Y., & Kim, J. K. (2008). Life with and without heart disease among women and men over 50. Journal of Women & Aging, 20, 5–19. [doi:10.1300/j074v20n01_02]

CUBANSKI, J., Neuman, T., & Freed, M. (2019). The facts on Medicare spending and financing. The Henry J. Kaiser Family Foundation.

DUAN, N., Manning, W. G., Morris, C. N., & Newhouse, J. P. (1983). A comparison of alternative models for the demand for medical care. Journal of Business & Economic Statistics, 1(2), 115–126. [doi:10.1080/07350015.1983.10509330]

EHEALTH. (2019a). Health insurance index report for the 2019 open enrollment period. Available at: https://news.ehealthinsurance.com/_ir/68/20196/Health_Insurance_Index_Report_ACA_2019.pdf

EHEALTH. (2019b). How much does individual health insurance cost? Available at: https://www.ehealthinsurance.com/resources/guide/individual-health-insurance-cost

EICHNER, M. J., McClellan, M. B., & Wise, D. A. (1997). Health expenditure persistence and the feasibility of medical savings accounts. Tax Policy and the Economy, 11, 91–128. [doi:10.1086/tpe.11.20061846]

EINAV, L., Finkelstein, A., Mullainathan, S., & Obermeyer, Z. (2018). Predictive modeling of U.S. health care spending in late life. Science, 360, 1462–1465. [doi:10.1126/science.aar5045]

FAN, T., Trask, N., D’Elia, M., & Darve, E. (2023). Probabilistic partition of unity networks for high-dimensional regression problems. International Journal for Numerical Methods in Engineering, 124(10), 2215–2236. [doi:10.1002/nme.7207]

FRENCH, E., McCauley, J., Aragon, M., Bakx, P., Chalkley, M., Chen, S., Christensen, B., Chuang, H., Côté-Sergent, A., De Nardi, M., Fan, E., Échevin, D., Geoffard, P.-Y., Gastaldi-Ménager, C., Gortz, M., Ibuka, Y., Jones, J., Kallestrup-Lamb, M., Karlsson, M., & Kelly, E. (2017). End-of-life medical spending in last twelve months of life is lower than previously reported. Health Affairs, 36, 1211–1217. [doi:10.1377/hlthaff.2017.0174]

GARRETT, B., & Gangopadhyaya, A. (2020). How the COVID-19 recession could affect health insurance coverage. SSRN pre-print. [doi:10.2139/ssrn.3598558]

GOLDMAN, D. P., Leaf, D. E., & Tysinger, B. (2016). The future Americans model: Technical documentation. Available at: https://cehd.uchicago.edu/wp-content/uploads/2019/12/fam_techdoc.pdf

HARDING, A. (2007). Challenges and opportunities of dynamic microsimulation modelling. Proceedings of 1st General Conference of the International Microsimulation Association.

HASS, Z., Levine, M., Sands, L., Ting, J., & Xu, H. (2016). The modeling of medical expenditure data from a longitudinal survey using the generalized method of moments (GMM) approach. Statistics in Medicine, 35. [doi:10.1002/sim.6878]

HERON, M. (2019). Deaths: Leading causes for 2017. National vital statistics reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System.

HSIA, R., Akosa Antwi, Y., & Weber, E. (2014). Analysis of variation in charges and prices paid for vaginal and caesarean section births: A cross-sectional study. BMJ Open, 4, e004017. [doi:10.1136/bmjopen-2013-004017]

HUMAN Mortality Database. (2017). Human Mortality Database. www.mortality.org

KAPINOS, K., Price, C., Anderson, D., Propp, A., Vardavas, R., & Whaley, C. (2021). Analysis of the 10Plan: A self-pay system designed to minimize the burden of health care costs. RAND Corporation, RR-4270-MC.

KRIJKAMP, E. M., Alarid-Escudero, F., Enns, E. A., Jalal, H. J., Hunink, M. M., & Pechlivanoglou, P. (2018). Microsimulation modeling for health decision sciences using R: A tutorial. Medical Decision Making, 38(3), 400–422. [doi:10.1177/0272989x18754513]

MATTHEWS, T. J., & Hamilton, B. (2019). Total fertility rates by state and race and Hispanic origin: United States, 2017. National vital statistics reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System.

MIHAYLOVA, B., Briggs, A., O’Hagan, A., & Thompson, S. (2011). Review of statistical methods for analysing healthcare resources and costs. Health Economics, 20, 897–916. [doi:10.1002/hec.1653]

MITCHELL, E. (2019). Concentration of health expenditures and selected characteristics of persons with high expenses, U.S. civilian noninstitutionalized population, 2016. Brief #521. Agency for Healthcare Research and Quality, Rockville, MD.

MURPHY, S., Xu, J., Kochanek, K., & Arias, E. (2018). Mortality in the United States, 2017. Key findings: Data from the National Vital Statistics System. NCHS Data Brief.

NATIONAL Center for Health Statistics. (2016). Health, United States, 2016. Population, by sex and five-year age groups: United States, 1975 and 2015. Available at: https://www.cdc.gov/nchs/hus/data-finder.htm

NATIONAL Conference of State Legislatures. (2013). Prenatal care. https://www.ncsl.org/research/health/prenatal-care-postcard.aspx. Accessed August 2020.

OLSEN, M. K., & Schafer, J. L. (2001). A two-Part random-Effects model for semicontinuous longitudinal data. Journal of the American Statistical Association, 96(454), 730–745. [doi:10.1198/016214501753168389]

RUTTER, C. M., Zaslavsky, A. M., & Feuer, E. J. (2011). Dynamic microsimulation models for health outcomes: A review. Medical Decision Making: An International Journal of the Society for Medical Decision Making, 31(1), 10–18. [doi:10.1177/0272989x10369005]

THERNEAU, T., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines. Mayo Clinic. [doi:10.32614/cran.package.rpart]

UNITED Nations. (2017). Model life tables. Available at: https://www.un.org/development/desa/pd/data/model-life-tables

U.S. Census Bureau. (2016a). Age and sex composition in the United States: 2016. Available at: https://www.census.gov/data/tables/2016/demo/age-and-sex/2016-age-sex-composition.html

U.S. Census Bureau. (2016b). FINC-01. Selected characteristics of families by total money income. Available at: https://www.census.gov/data/tables/time-series/demo/income-poverty/cps-finc/finc-01.2016.html#list-tab-650840761

U.S. Census Bureau. (2017). 2017 national population projections tables: Main series. Available at: https://www.census.gov/data/tables/2017/demo/popproj/2017-summary-tables.html

U.S. Census Bureau & U.S. Bureau of Labor Statistics. (2017). Current Population Survey, 2019 Annual Social and Economic (ASEC) Supplement. Available at: https://www.census.gov/data/datasets/time-series/demo/cps/cps-asec.html