Thomas Sauerbier (2002)
UMDBS - A New Tool for Dynamic Microsimulation
Journal of Artificial Societies and Social Simulation
vol. 5, no. 2
Microsimulation is a powerful method for analysis and forecasting, especially in the fields of economics and social science. One of the main reasons for its relatively rare usage is that until now there has been no standard software available. The Universal Micro DataBase System, UMDBS, is a new tool that runs on any Windows PC. It is suited for all tasks involved in running a microsimulation, from the import of external data, through the development of the simulation model, to the analysis of the results. It includes MISTRAL, an integrated modelling language that allows implementing the simulation models as well as analysing the micro data. After a short introduction to microsimulation, this article first presents the UMDBS and its main functions. Then an overview of the new modelling language MISTRAL is given, including its features, structure, and implementation. Finally, information is given about how to get UMDBS for free.
Micro Data; Microsimulation; MISTRAL; Monte Carlo Simulation; Simulation Languages; Simulation Systems; UMDBS
- Microsimulation is a method which allows predicting the development of populations on the basis of empirical micro data. In contrast to widely used models at the macro level, micro models give information about distributions of demographic or socio-economic attributes.
- Although this method has been known for about 45 years (Orcutt 1957), it is not used to an extent that matches its potential. The main reason may be that, in contrast to macro models, no tools for general use have been available so far. For each project, costly programming work has been necessary, which normally exceeds the capability or knowledge of the researchers, who are often economists or social scientists.
- This article presents the simulation system UMDBS, which allows users who are not computer specialists to examine empirical micro data, run microsimulations, and analyse the results.
- The starting point for a microsimulation is a microdatabase that represents a specific population. The objects of that database correspond to individual decision making units (e.g. persons, households, or enterprises) that are described by a set of relevant attributes (e.g. age, sex, income). The following explanations are based on the modelling of households and persons. For other types of micro objects the principles are the same.
- In the microsimulation, a microdatabase that is representative for period t is transferred to period t+1. In the case of dynamic microsimulation, each micro object is aged individually. That means, for example, that a person can get married or divorced, have a baby, or die. The result of that process is a microdatabase that should be representative of the underlying population in the future period t+1, together with a set of statistics for the events (e.g. marriage, death) that have occurred in the simulated period.
Figure 1. Principle of Microsimulation
- The simulation is done at the level of the single micro objects. The aggregation to the level of the underlying population is only done when the results are analysed. Then for example 200 micro objects that are created in microsimulation because of a birth process will be scaled up to about 700000 births for Germany in one year.
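- The principle described above can be sketched in a few lines of Python. This is only an illustration, not UMDBS or MISTRAL code: the class `Person`, the functions `death_probability`, `birth_probability`, and `simulate_period`, and all probability values are assumptions made for the example. The weight of 3500 per micro object is likewise assumed; it is chosen so that 200 simulated births scale up to 200 × 3500 = 700000.

```python
import random
from dataclasses import dataclass

@dataclass
class Person:
    age: int
    sex: str           # "female" or "male"
    weight: float      # number of real persons this micro object represents
    alive: bool = True

def death_probability(person):
    # Invented values; a real model would use empirical tables.
    return 0.001 if person.age < 60 else 0.03

def birth_probability(person):
    # Invented values; only women aged 20 to 40 can give birth in this sketch.
    return 0.08 if person.sex == "female" and 20 <= person.age <= 40 else 0.0

def simulate_period(population, rng):
    """Transfer the micro objects from period t to t+1 and count weighted events."""
    events = {"births": 0.0, "deaths": 0.0}
    newborns = []
    for p in population:
        if rng.random() < death_probability(p):
            p.alive = False
            events["deaths"] += p.weight
            continue
        if rng.random() < birth_probability(p):
            child_sex = rng.choice(["female", "male"])
            newborns.append(Person(age=0, sex=child_sex, weight=p.weight))
            events["births"] += p.weight
        p.age += 1     # each surviving micro object is aged individually
    survivors = [p for p in population if p.alive]
    return survivors + newborns, events

rng = random.Random(42)
population = [
    Person(age=i % 90, sex="female" if i % 2 else "male", weight=3500.0)
    for i in range(1000)
]
next_population, events = simulate_period(population, rng)
print(events)          # weighted event counts, i.e. already scaled to the population
```

- The aggregation happens only in the `events` dictionary and in later analyses; the simulation loop itself works on single micro objects.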
Introduction to UMDBS
- The Universal Micro DataBase System UMDBS is a simulation system that can be used for all the basic tasks necessary for a dynamic microsimulation. The main applications are socio-economic investigations so that the potential users are especially empirically oriented economists and social scientists.
- The UMDBS was developed at the Darmstadt University of Technology by Thomas Sauerbier. It is implemented in the object-oriented programming language Smalltalk and consists of about 45000 lines of code. It runs on any Windows PC and has a graphical user interface (figure 2):
Figure 2. User Interface of UMDBS
- The user interface is suited to the typical user by showing him/her the main tasks in the form of an interactive flow diagram. So the user can work in a well-known environment that shows each necessary step. By means of different colours the system shows the user which step has to be taken next or at what position an error has occurred.
- For simulating existing models with UMDBS only basic computer experience and no programming knowledge is necessary. In order to develop complex microsimulation models some programming experience is needed. However, the integrated modelling language MISTRAL is suited especially for the field of microsimulation and makes it easier to implement microsimulation models compared with normal programming languages.
Functionality of UMDBS
- The functionality of the UMDBS includes all tasks that are needed in a microsimulation:
- importing or generating the microdatabase
- analysing the micro data
- running the simulation
- analysing the simulation results
These tasks are described in detail below.
Importing or Generating a Microdatabase
- The central element in UMDBS is a microdatabase that normally contains a representative sample of a population at a specific time. This database consists of a great number of micro objects which are instances of several classes.
- Each class describes the characteristics of one kind of object, e.g. the definition of the names and types of their attributes. Typical classes in a microsimulation are person, family, household, and enterprise. The individual micro objects of one class (e.g. any single person) all have the same attributes but normally different values for those attributes.
- The microdatabase in UMDBS contains on the one hand the definition of the classes and the self-defined data types (e.g. sex with the values female and male) and on the other hand the data record for each micro object.
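- As a rough illustration of this structure, the following Python sketch separates type and class definitions from the data records. It does not show the internal format of UMDBS; the names `Sex`, `Person`, `Household`, and `MicroDatabase` as well as the attribute names are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Sex(Enum):
    """A self-defined data type with two values."""
    FEMALE = "female"
    MALE = "male"

@dataclass(eq=False)
class Household:
    key: int
    members: list = field(default_factory=list)

@dataclass(eq=False)
class Person:
    key: int
    sex: Sex
    age: int
    household: Optional[Household] = None          # pointer to another micro object
    partner_married: Optional["Person"] = None

@dataclass
class MicroDatabase:
    types: dict      # definitions: type name -> type
    objects: dict    # data records: class name -> list of micro objects

db = MicroDatabase(types={"Sex": Sex}, objects={"Person": [], "Household": []})

home = Household(key=1)
anna = Person(key=1, sex=Sex.FEMALE, age=30, household=home)
home.members.append(anna)
db.objects["Household"].append(home)
db.objects["Person"].append(anna)
```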
- Normally the micro data used in UMDBS come from external sources. For the population of Germany this may for example be the Mikrozensus, the Einkommens- und Verbrauchsstichprobe, or the German Socio-economic Panel.
- These data are normally available in the form of one or more ASCII files. The meta data with the format of the records and the coding of the attributes are sometimes also available in files but often are only given in a printed or a non-structured form.
- The UMDBS is able to import micro data from any ASCII file. In order to support that flexibility, the user has to write an import program in the integrated modelling language MISTRAL. This is necessary because normally there are several complicated tasks to be done to transform the external data into the internal microdatabase:
- The classes have to be defined, including the names and types of the attributes and the definition of the user defined types. This meta information is not normally included in the data files and often it has to be adapted to the specific requirements of the model. In most cases only a subset of the given data is imported.
- The data have to be extracted from the ASCII files. Sometimes this includes a matching of data spread over several files.
- Often the coding of the attributes has to be changed. For example, data recording many nationalities can be reduced to native and foreign.
- The values for some attributes may be calculated depending on other attributes or other objects (e.g. the age can be derived from the birth year).
- Before running a microsimulation the missing values (often found in empirical data) have to be substituted with plausible values. The values of other attributes and external distributions can be used for this.
- Wrong or inconsistent values have to be corrected. For example, if two married persons show a different year for their marriage, (at least) one of those values has to be changed.
- Sometimes synthetic attributes have to be created. For example, if the duration of marriage is not given, it may be derived from the ages of both partners and existing children using additional external distributions or probabilities.
- The complexity of the tasks described above shows that the import of data cannot be done merely by defining tables; it is necessary to define algorithms. So the subset of the modelling language MISTRAL used for that task contains nearly the full range of functions of a normal programming language like PASCAL.
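- Some of the import tasks listed above (extracting fields from an ASCII record, recoding an attribute, deriving age from the birth year, and substituting a missing value) might look as follows in Python. This is a sketch under stated assumptions, not a MISTRAL import program: the record layout `LAYOUT`, the reference year `SURVEY_YEAR`, the code `NATIVE_CODE`, and the simple donor-based imputation are all invented for the example.

```python
import random

# Assumed fixed-width layout: key (4), birth year (4), nationality (3), income (6).
LAYOUT = {"key": (0, 4), "birth_year": (4, 8), "nationality": (8, 11), "income": (11, 17)}
SURVEY_YEAR = 2000      # assumed reference year of the sample
NATIVE_CODE = "DEU"     # assumed nationality code that counts as "native"

def parse_record(line):
    """Cut one ASCII record into its raw string fields."""
    return {name: line[start:end].strip() for name, (start, end) in LAYOUT.items()}

def import_person(line, rng, income_donors):
    raw = parse_record(line)
    person = {
        "key": int(raw["key"]),
        "age": SURVEY_YEAR - int(raw["birth_year"]),     # derived attribute
        "nationality": "native" if raw["nationality"] == NATIVE_CODE else "foreign",
    }
    if raw["income"]:
        person["income"] = int(raw["income"])
    else:
        # Missing value: substitute a plausible one drawn from observed incomes.
        person["income"] = rng.choice(income_donors)
    return person

lines = [
    "00011960DEU 32000",
    "00021975FRA      ",    # income is missing in this record
    "00031988TUR 18000",
]
rng = random.Random(1)
observed = [int(r["income"]) for r in map(parse_record, lines) if r["income"]]
persons = [import_person(line, rng, observed) for line in lines]
```

- A real import would usually condition the substituted value on other attributes or on external distributions, as described above.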
- After the import a microdatabase is available in an internal format that can be used for analysis and simulation. For further use the whole database - including definitions and data records - can be stored in one single ASCII file.
- Sometimes synthetic micro data are needed for testing or theoretical investigations. They can be created in the same manner as importing external data. It is also possible to run disclosure-avoidance algorithms to make the micro data anonymous.
Analysis of Micro data
- Because there are many powerful statistical programs available, the static analysis of micro data is not one of the main functions of UMDBS. So the statistical functions of the system are at a basic level. On the other hand, it offers some very efficient and convenient methods for an interactive analysis of the microdatabase which are similar to the usage of the database query language SQL. It is also possible to run complex MISTRAL programs for analysing and testing microdatabases automatically.
- The most important UMDBS analysis tools and their functionality are described below.
Figure 3. Micro Object Monitor
- The Micro Object Monitor allows the interactive analysis of single micro objects. An interesting feature is the possibility of displaying corresponding objects (e.g. the partner or mother of a person) in a second sub window. This tool is especially useful for deeper investigations after automatic test programs have found an error.
Figure 4. Micro Monitor
- With the Micro Monitor the user can formulate interactive queries to the current microdatabase. The results are shown immediately as graphs, tables, or single values. For these queries, the Micro Query Language MQL (a subset of MISTRAL) is used. Here is a concrete example (shown also in the upper left sub window in figure 4):
getclass (Person) |
select ((self.Sex = female) and (self.Partner_married <> nil)) |
distribution (self.Age - self.Partner_married.Age)
- In the first line a set of all objects of the class Person is generated. In the next line the objects are selected that are both female and have a pointer to their marriage partner. For that set the distribution of the differences in age between each woman and her husband is calculated. The result is displayed as a histogram in the sub window below and as a table at the upper right.
- A general MQL statement consists of a first instruction that takes all objects of a specific class from the microdatabase. This set is transferred from one statement to the next by the pipe symbol (well known from UNIX or DOS). The following statements select objects that match the defined condition or follow pointers to corresponding objects (e.g. from the persons to their households). The last statement calculates the result. This may be a one- or two-dimensional distribution or a single value that can be the minimum, maximum, average, or sum of one attribute.
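- For readers more familiar with general-purpose languages, the pipeline of the first MQL example (take a class, select, compute a distribution) can be imitated in Python. This is only an analogy and says nothing about how MQL is executed; the class `Person`, the helper `marry`, and the sample data are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Person:
    sex: str
    age: int
    partner_married: Optional["Person"] = None

def marry(a, b):
    """Set the mutual pointers between two marriage partners."""
    a.partner_married = b
    b.partner_married = a

anna, bernd = Person("female", 30), Person("male", 33)
clara, dirk = Person("female", 45), Person("male", 43)
eva = Person("female", 25)          # unmarried, so the selection drops her
marry(anna, bernd)
marry(clara, dirk)
persons = [anna, bernd, clara, dirk, eva]      # plays the role of getclass (Person)

# select (...): keep married women only
married_women = (
    p for p in persons
    if p.sex == "female" and p.partner_married is not None
)
# distribution (...): count how often each age difference occurs
age_difference = Counter(p.age - p.partner_married.age for p in married_women)
print(age_difference)
```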
- MQL also allows complicated queries. Below is a short example that gives the distribution of the number of persons living in households with at least one person aged 65 or older. Sets in MQL contain each object only once, so in the example there is no double-counting of households that include more than one person aged 65 or more:
getclass (Person) |
select (self.Age >= 65) |
collect (self.Household) |
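- Following the prose description of this query (the households of persons aged 65 or more, each counted once, then the distribution of household size), a rough Python analogue might look like this. It is not a translation of the MQL statements; the data and all names are invented for the illustration.

```python
from collections import Counter

# person key -> (age, household key); invented sample data
persons = {
    1: (70, "A"), 2: (68, "A"),                 # two elderly persons in household A
    3: (40, "B"),
    4: (66, "C"), 5: (30, "C"), 6: (5, "C"),
}

# A set holds each household once, even if it has several members aged 65 or more.
households_with_elderly = {hh for age, hh in persons.values() if age >= 65}

# Number of persons per household, counted over the whole database.
household_size = Counter(hh for _, hh in persons.values())

# Distribution of household size over the selected households.
size_distribution = Counter(household_size[hh] for hh in households_with_elderly)
print(size_distribution)
```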
- All these capabilities are also available in MISTRAL programs which can automatically analyse microdatabases. The queries are written in such programs and the results go to a report or another interactive analysing tool for viewing graphs or tables.
- Another very important application for the analysis capabilities of UMDBS is the testing of imported data. For example no negative age is possible and the husband of a woman must have the same year of marriage as she does. Such rules can be implemented in a special MISTRAL program and each violation will cause a message in the report file with the key number of the object and the kind of error.
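- The two rules mentioned above can be written as a small checking routine. The following Python sketch only illustrates the idea of such a test program; the function `check_database`, the dictionary-based records, and the wording of the messages are assumptions and not part of UMDBS.

```python
def check_database(persons):
    """Return one report entry (object key, kind of error) per rule violation."""
    report = []
    for key, person in persons.items():
        # Rule 1: no negative age is possible.
        if person["age"] < 0:
            report.append((key, "negative age"))
        # Rule 2: both marriage partners must show the same year of marriage.
        partner_key = person["partner"]
        if partner_key is not None:
            partner = persons[partner_key]
            if partner["marriage_year"] != person["marriage_year"]:
                report.append((key, "marriage year differs from partner"))
    return report

persons = {
    1: {"age": 34, "partner": 2, "marriage_year": 1995},
    2: {"age": 36, "partner": 1, "marriage_year": 1996},
    3: {"age": -1, "partner": None, "marriage_year": None},
}
report = check_database(persons)
for key, error in report:
    print(f"object {key}: {error}")
```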
Simulation of Micro Data and Analyses of the Results
- In UMDBS there are several workflows available for microsimulation that can be used very easily with an interactive flow diagram. Figure 5 shows one of the possible flows:
Figure 5. Simulation Workflow
- On the left there is the microdatabase that has normally been imported from external data previously.
- In the middle of the flow there is the essential simulation model with the algorithms for the transformation of the microdatabase. For the simulation of persons it includes methods for birth, marriage, death, and so on.
- Microsimulation is a stochastic process. So probabilities have to be defined for the events, e.g. the probability for a person of a specific sex and age to die in this year. Depending on the number of attributes and their resolution, the probability tables can contain more than one thousand values for one kind of event and one period. For a long-term simulation, several tables are needed for each probability because these values normally change over time. So it is recommended to store these parameters separately from the algorithms in the workflow module above.
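- How a probability table that is kept apart from the algorithm can drive a random event is sketched below in Python. The table `DEATH_PROB`, its invented values, the two age groups, and the function names are assumptions made for the illustration; MISTRAL parameter files have their own syntax, which is not shown here.

```python
import random

# Parameters, kept apart from the algorithm. Keys are (start year of the table,
# sex, age group); all probabilities are invented.
DEATH_PROB = {
    (2000, "female", "0-64"): 0.002, (2000, "female", "65+"): 0.040,
    (2000, "male", "0-64"): 0.003, (2000, "male", "65+"): 0.055,
    (2010, "female", "0-64"): 0.002, (2010, "female", "65+"): 0.035,
    (2010, "male", "0-64"): 0.003, (2010, "male", "65+"): 0.050,
}

def table_period(year):
    """Pick the latest table that starts in or before the simulated year.

    Raises ValueError if the year lies before the first table.
    """
    starts = {start for start, _, _ in DEATH_PROB}
    return max(start for start in starts if start <= year)

def death_probability(year, sex, age):
    group = "65+" if age >= 65 else "0-64"
    return DEATH_PROB[(table_period(year), sex, group)]

def dies(year, sex, age, rng):
    """Monte Carlo decision: the event occurs if a uniform draw falls below p."""
    return rng.random() < death_probability(year, sex, age)

rng = random.Random(0)
trials = 100_000
frequency = sum(dies(2000, "male", 80, rng) for _ in range(trials)) / trials
print(frequency)    # close to the table value for men aged 65 or more
```

- Because the algorithm only looks values up, a table can be exchanged for another period or scenario without touching the simulation code.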
- If there are exogenous time series data needed in the simulation, they can be stored in the Exogenous Time Series module. In figure 5 this is displayed in grey because no time series are used in the current model.
- These three parts of the model are MISTRAL programs. For (probability) parameters and time series a very restricted, non-algorithmic subset of MISTRAL is used. The programs can be edited with a text editor that is part of UMDBS, and they are stored as ASCII source code. Before running the simulation, the source codes are compiled with the included MISTRAL compiler and automatically checked for consistency. Afterwards the programs are interpreted by a virtual machine.
- On the right of the simulation model in the workflow (figure 5) there are four databases that will store the simulation results:
- A report for results (single lines of text or full tables) and - if necessary - debugging information.
- The new microdatabase that represents a state in a future period. It can be analysed with the interactive tools described above.
- A time series database for aggregated results with one single value for each simulation period (e.g. number of births or sum of income of all persons).
- A database with one- or two-dimensional distributions for each simulated period (e.g. the distribution of age or income).
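- The last two result stores (one value per period and one distribution per period) can be pictured with the following Python sketch. It is an assumption-laden illustration and not the storage format of UMDBS: the function `record_period`, the dictionaries, the ten-year age groups, and the sample weights are all invented.

```python
from collections import Counter

def record_period(year, persons, time_series, distributions, weighted=True):
    """Store one aggregate value and one distribution for a simulated period."""
    def weight(person):
        return person["weight"] if weighted else 1

    # Time series: a single value per period, here the sum of income.
    total = sum(p["income"] * weight(p) for p in persons)
    time_series.setdefault("sum_income", {})[year] = total

    # Distribution: one table per period, here age in ten-year groups.
    ages = Counter()
    for p in persons:
        ages[p["age"] // 10 * 10] += weight(p)
    distributions.setdefault("age", {})[year] = ages

persons = [
    {"age": 34, "income": 30000, "weight": 1000},
    {"age": 38, "income": 20000, "weight": 2000},
    {"age": 71, "income": 15000, "weight": 1500},
]
time_series, distributions = {}, {}
record_period(2001, persons, time_series, distributions)

sample_series, sample_distributions = {}, {}
record_period(2001, persons, sample_series, sample_distributions, weighted=False)
```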
- On the right side of the flow diagram there are two additional modules which can contain an analysis or a test program. In figure 5 only the test program is active. It is used to check the microdatabase after each simulation period with an extensive set of rules. If a violation is found, it is written in the report and optionally the simulation will be stopped.
- Such a test program is a very good method to check the formal correctness of the simulation model. It can be used to find hidden program bugs that are caused by a complex interaction of parts of the simulation model which may become visible only after a long simulation time.
Microsimulation Language MISTRAL
A core element of the UMDBS is the integrated MIcro Simulation, TRansformation, and Analysis Language MISTRAL. With its different sub languages it is suited for all the tasks inside UMDBS:
- import of external micro data
- (interactive) analysis of microdatabases
- definition of test programs
- definition of simulation models
- definition of (probability) parameters
- definition of exogenous time series
- It is an advantage for the user that he or she needs only one language for all these purposes. So the learning is easier and existing parts of a MISTRAL program (especially analysis or testing statements) can often be used in several applications.
Elements of MISTRAL
- The syntax and the structure of MISTRAL are based on the widely known programming language PASCAL. So many users can use MISTRAL without lengthy training. Beginners also can use the available literature about PASCAL.
- MISTRAL includes most of the original core of PASCAL. Some elements have been added which are particularly useful for microsimulation:
- In MISTRAL there are objects, classes, and methods but no inheritance. So it is an object-based language.
- Pointers to objects do not need the special syntax used in PASCAL (^), but are similar to the form used in SMALLTALK.
- A central data structure of MISTRAL is the set of objects. It allows some very easy but powerful operations, e.g. the transformation and analysis of the set as well as running loops with all elements of the set.
- There are many functions available for stochastic simulation, e.g. random number generation and branches depending on probabilities.
- A special feature of MISTRAL is stochastic control structures. It is possible to run loops over all objects of a set in a random order. Furthermore sequences of statements can be run in a random order. Both are often necessary in microsimulation to imitate simultaneity of processes and events and avoid systematic biases.
- The access to other sources of data inside UMDBS (microdatabase, parameter, time series) is very easy. In addition, the consistency of all parts of the model is checked automatically before running the program.
- There are special data types for the simulation results: distributions of values and sets of events. Furthermore there are powerful commands to work with these types and write the results into a report. As most microdatabases use weighting factors, all analyses are available with these factors as well as without them.
- For the import of external micro data straightforward methods for parallel access to up to 15 ASCII files are available as well as several string operations.
- UMDBS forces a clear separation of different parts of the model (e.g. algorithms, parameters, and time series). In addition, it is possible to split each of these parts into an unlimited number of hierarchically organised files to make team work easier and to share parts of a model.
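- The stochastic control structures mentioned in the list above can be approximated in a general-purpose language. The Python sketch below shows the idea only; the function names `for_each_in_random_order` and `run_in_random_order` are hypothetical and do not correspond to MISTRAL keywords.

```python
import random

def for_each_in_random_order(objects, action, rng):
    """Run action on every object of the set exactly once, in random order."""
    order = list(objects)
    rng.shuffle(order)
    for obj in order:
        action(obj)

def run_in_random_order(steps, rng):
    """Run every statement block exactly once, in random order."""
    for step in rng.sample(steps, k=len(steps)):
        step()

rng = random.Random(7)
log = []
for_each_in_random_order(["p1", "p2", "p3"], log.append, rng)
run_in_random_order(
    [
        lambda: log.append("births"),
        lambda: log.append("deaths"),
        lambda: log.append("marriages"),
    ],
    rng,
)
print(log)
```

- Randomising the order means that no micro object and no event type is systematically processed first, which is the bias the text refers to.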
- Some of the elements and characteristics of PASCAL are not implemented because they are not needed in the field of microsimulation. Examples are input operations (e.g. READLN), access to binary files, complex data types (e.g. pointers and records beyond objects), and recursive definitions and calls.
Structure of MISTRAL
- It was mentioned above that MISTRAL is used in all the parts of UMDBS. Since for interactive queries only read access to the database is necessary, MQL has been defined as a subset of MISTRAL. On the other hand, all the elements of MQL are available also in simulation or test programs.
- There are definitions that are common to two or more parts of the entire model. For example, the definition of the classes and the types must be compatible in the microdatabase and the simulation model. The same applies to user defined types in the simulation model and the parameters. So these parts of MISTRAL are the common basis for the sub languages even if they are otherwise very different because of their specific purposes.
- The general structure of MISTRAL is shown in figure 6:
Figure 6. General Structure of MISTRAL
- Each level of MISTRAL includes all capabilities of the lower levels and adds some new elements. So the levels of Simulation and Generation have a maximum of functionality whereas the levels below Transformation do not include any statement to modify the microdatabase.
- The compiler and the runtime system of MISTRAL were implemented in the object-oriented programming language SMALLTALK. All parts (e.g. scanner, parser) are individual objects. The separate levels of MISTRAL were implemented with several classes of parsers within an inheritance structure.
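- The idea of language levels implemented as parser classes within an inheritance structure can be illustrated as follows. This Python sketch is not the SMALLTALK implementation: the three class names are assumptions loosely based on the levels named above, and apart from the statements seen in the MQL examples all keywords are invented.

```python
class QueryParser:
    """Lowest level in this sketch: read-only statements, as in MQL."""

    def statements(self):
        return {"getclass", "select", "collect", "distribution"}

    def accepts(self, keyword):
        return keyword in self.statements()

class TransformationParser(QueryParser):
    """Adds statements that modify the microdatabase (invented keywords)."""

    def statements(self):
        return super().statements() | {"assign", "create", "delete"}

class SimulationParser(TransformationParser):
    """Adds stochastic statements on top of all lower levels (invented keywords)."""

    def statements(self):
        return super().statements() | {"random_loop", "random_sequence"}

for parser in (QueryParser(), TransformationParser(), SimulationParser()):
    print(type(parser).__name__, parser.accepts("select"), parser.accepts("assign"))
```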
- The front end of the compiler was implemented normally. As the whole system had to be implemented in SMALLTALK, no compiler tools (e.g. LEX or YACC) were used.
- The back end of the compiler differs from the normal structure because it generates neither binary code nor byte code for a normal virtual machine. Instead the output of the compiler is a data structure similar to an enhanced parse tree. This structure can be interpreted by the runtime system, similar to a virtual machine. The advantage of this solution is a significant reduction of programming time and a machine-independent implementation. Furthermore, changes and enhancements to the language can be made very easily.
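- A runtime system that interprets a tree structure directly, instead of executing generated code, can be sketched in a few lines of Python. The node classes `Num`, `Var`, `BinOp`, and `Assign` and the functions `evaluate` and `execute` are assumptions for this illustration; the actual data structure used by the MISTRAL compiler is not described in detail here.

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: float

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assign:
    name: str
    expr: object

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def evaluate(node, env):
    """Interpret an expression node directly; no byte code is generated."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        return OPS[node.op](evaluate(node.left, env), evaluate(node.right, env))
    raise TypeError(f"unknown node: {node!r}")

def execute(program, env):
    """Run a list of assignment statements against the variable environment."""
    for statement in program:
        env[statement.name] = evaluate(statement.expr, env)
    return env

# Tree for the statement "age := 2000 - birth_year"
program = [Assign("age", BinOp("-", Num(2000), Var("birth_year")))]
env = execute(program, {"birth_year": 1960})
print(env["age"])
```

- Adding a new language element in such a design means adding a node class and one branch in the interpreter, which is in line with the ease of enhancement noted above.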
Availability of UMDBS
- UMDBS runs on any PC under Windows (95 or later). No additional software is necessary (e.g. database programs or statistical software). However, the system offers bi-directional interfaces to EXCEL and statistical programs.
- The user of UMDBS can switch between German and English. The entire user interface is modified, including all system messages (e.g. compiler errors).
- The documentation consists of a user manual for UMDBS and a language reference for MISTRAL. In addition, an online tutorial (including example files) is available that gives a practical introduction to all important functions of the system. The documentation is so far only available in German. An English version is planned for the future if there is demand.
- UMDBS consists of one EXE file and about 100 DLLs, all of which are stored in one directory. No entries are made in the Windows Registry and no files are installed in central Windows directories.
- UMDBS was developed as part of academic research and is available free of charge for use in academic research or teaching. The system (including documentation and example files) can be downloaded at the URL
- For access a password is necessary and this can be obtained from the author (email: Thomas.Sauerbier@suk.fh-friedberg.de). Further information about the licence conditions can be found at the URL above.
- With UMDBS a system for dynamic microsimulation is available free of charge for use in academic research and teaching. It runs on Windows PCs without any additional software.
- The clear separation of the simulation system and the models allows fast and easy working with several models. The capability of splitting the model into separated ASCII files improves co-operation in a team as well as the transmission of models and data between different teams.
- It is to be hoped that the availability of a tool like UMDBS will increase the usage of microsimulation to the extent that is appropriate for its potential. This method can now also be used by researchers who so far have not had the capacity or knowledge to implement their own microsimulators.
HEIKE, H.-D. and SAUERBIER, Th. (1997): MISTRAL - a new object-based micro simulation language. In Bandilla, W. and Faulbaum, F. [Eds.]: SoftStat'97 - Advances in Statistical Software 6, Lucius & Lucius, Stuttgart, pp. 403-410
ORCUTT, G. (1957): A New Type of Socio-Economic System. In The Review of Economics and Statistics, No. 2, pp. 116-123
SAUERBIER, Th. (2001): UMDBS - Universal Micro DataBase System Version 3.0; Benutzerhandbuch; Fachbereich 1; Fachgebiet Statistik und Ökonometrie; Darmstadt University of Technology
SAUERBIER, Th. (2001): MISTRAL - Micro Simulation, Transformation, and Analysis Language Version 3.0; Sprachdefinition und Handbuch; Fachbereich 1; Fachgebiet Statistik und Ökonometrie; Darmstadt University of Technology
Copyright Journal of Artificial Societies and Social Simulation,