estudos internacionais • Belo Horizonte, ISSN 2317-773X, v.6 n.3 (2018), p.82 - 97
Models, explanation, and the
pitfalls of empirical testing
Modelos, explicação e as armadilhas dos testes empíricos
DOI: 10.5752/P.2317-773X.2018v6.n3.p82
Enzo Lenine
Submitted in march 25, 2018
Accepted in may 22, 2018
Formal models constitute an essential part of contemporary political
science and International Relations. Their recent history is tightly tied to the
developments of Rational Choice Theory, which is considered to be the only
deductive theory in the social sciences. This unique character, especially its
manifestation through mathematical symbolisms, has caused profound schisms
and criticisms in the discipline. Formal models have constantly been accused of
being built on unrealistic assumptions of human behaviour and social structure,
rendering as a result either trivial predictions or no empirical prediction at
all. Nevertheless, these criticisms frequently ignore essential elements of the
concept of explanation and how it is applicable to formal modelling. In this
paper, I provide an approach to mathematical modelling that considers the
challenges of designing and performing empirical tests of predictions generated
by formal models. Rather than disqualifying or falsifying models, empirical
tests are paramount to the tailoring of more grounded explanations of political
phenomena and should be seen as a tool to enhance modelling. In this sense,
I scrutinise two examples of formal modelling in IR, and derive lessons for the
empirical testing of models in the discipline.
Keywords: formal models, rational choice theory, empirical testing, explanation
Os modelos formais constituem uma parte essencial da ciência política
contemporânea e das Relações Internacionais. Sua história recente está
fortemente ligada aos desenvolvimentos da teoria da escolha racional, que é
considerada a única teoria dedutiva nas ciências sociais. Este caráter único,
especialmente sua manifestação por meio de simbolismos matemáticos,
causou profundas divisões e críticas na disciplina. Os modelos formais têm
sido constantemente acusados de serem construídos com base em suposições
irrealistas do comportamento humano e da estrutura social, resultando em
previsões triviais ou nenhuma previsão empírica. No entanto, essas críticas
frequentemente ignoram elementos essenciais do conceito de explicação e
como o mesmo é aplicável à modelagem formal. Neste artigo, forneço uma
abordagem à modelagem matemática que considera os desaos de conceber e
executar testes empíricos de previsões geradas por modelos formais. Em vez de
desqualicar ou falsicar modelos, os testes empíricos são fundamentais para a
adaptação de explicações mais fundamentadas dos fenômenos políticos e devem
1. Enzo Lenine Nunes Batista Oliveira
Lima is Professor of International
Relations at the University of Interna-
tional Integration of the Afro-Brazilian
Lusophony (UNILAB-Malês), Brazil.
His research interests are mostly con-
nected to methodology, formal models
and rational choice theory, and hierar-
chies of knowledge. He has recently
published a bibliometric analysis in the
International Political Science Review
as part of his work on the hierarchies of
knowledge in the discipline.
ORCID: 0000-0001-5280-4252
Lima, Enzo Lenine Models, explanaon, and the pialls of empirical tesng
ser vistos como uma ferramenta para aprimorar a modelagem. Nesse sentido,
analiso dois exemplos de modelagem formal em RI e extraio lições para o teste
empírico de modelos na disciplina.
Palavras-chave: modelos formais, teoria da escolha racional,
teste empírico, explicação
Mathematics has aided a variety of sciences since the dawn of
times. Ancient civilisations relied on mathematical concepts and models
to describe the world around them. Models in particular constitute the
essence of modern physics and chemistry, but they are also important
tools in disciplines such as biology and social sciences. Political science
and International Relations (henceforth, IR) have beneted extensively
from models. Models of political phenomena have been designed mostly
under the framework of rational choice theory (henceforth RCT), which
seems plausible for RCT is the only deductive theory in the social sci-
ences. Mathematical models are intrinsically dependant on deduction to
connect assumptions, enhance logical arguments, and generate predic-
tions. It is only natural that a deductive theory would produce such kind
of models.
The rst eorts in modelling political phenomena may be traced
back to Borda’s and Condocert’s paradoxes, or, more recently, to spatial
models developed in the rst half of the 20th century by Harold Hotell-
ing (1929), followed by Duncan Black (1958) and Anthony Downs (1957).
Nevertheless, the use of models as methodological tools is usually attrib-
uted to Kenneth Arrow’s impossibility theorem, which relied on math-
ematical assumptions to advance arguments about preference aggrega-
tion. Game theory also became popular in political science and IR around
the same time, and together with spatial models, they account for the
bulk of mathematical modelling in the discipline.
Since the initial developments, mathematical models have caused
profound disputes within the discipline. Donald Green and Ian Shap-
iros (1994) classical critique – Pathologies of Rational Choice Theory – sum-
marises a great deal of the arguments against RC models, which tended
to be echoed by many political scientists despite the responses given by
RC theorists. In their piece, Green and Shapiro are profoundly concerned
about the great importance and visibility given to RCT and formal mod-
els, mentioning the increasing share of RC articles being published in the
American Political Science Review. Despite acknowledging the potential of
RCT, they believe the theoretical enterprise failed in its mission of provid-
ing explanations and predictions of concrete political phenomena, such as
voter turnout, legislative behaviour, electoral competition, and collective
action. Their focus is eminently on the empirical power of RCT, which
they deem limited.
The debate over the prospects of empirically testing RC models
has echoed in the discipline, and has been constantly used as an argu-
ment against formal modelling. If models cannot generate predictions
estudos internacionais • Belo Horizonte, ISSN 2317-773X, v.6 n.3 (2018), p.82 - 97
that are true to the real world, why should we bother designing such
models in the rst place? Wouldnt political science and IR do much bet-
ter with statistical tests and/or qualitative analyses? The rst question
is an issue of contention not only in political science and IR, but also in
philosophy. Models represent the real world, but to what extent their as-
sumptions should be true to the world is a matter that causes profound
disagreements. The second results from the historical and institutional
development of the discipline. Quantitative and qualitative methods have
granted political science and IR their scientic status, and still nowadays
they are the main terms under which most political scientists think about
methodology and, more importantly, about explanation.
However pertinent these questions may be, they tell only part of
the story of formal modelling. Models come in dierent avours, and
distinguishing between them is paramount to understand their raison
d’être, the predictions they generate, and the prospects of testing. Not
all models are empirically testable via statistics or qualitative methods.
Some models are fashioned to advance concepts, or to explain regulari-
ties observed in a specic set of phenomena, while others generate pre-
dictions that may be tested via statistical models. The main issue here is
that each type of model serves specic purposes, and empirically testable
models are just one type of models. Even when the prospects of empirical
testing are possible, methodological issues regarding mathematical struc-
tural compatibility and measurement might be of extreme importance
to dene whether an empirical test is de facto “testing” the predictions of
a given model. Throughout the remainder of this paper, I shall address
the challenges of formal modelling and empirical testing. The paper is
divided into three sections: the next section discusses philosophical as-
pects of models as representational devices, and the kinds of explanation
they produce. Then, I proceed to discuss the modus operandi of empirical
testing in political science and IR
, raising questions about measurement
and structural representation. I discuss these challenges in detail in the
third section, where I address potential sources of problems in empirical
testing of formal models, and how political scientists should cope with
them in their research.
Models and Explanation
Models and modelling have been discussed in philosophy and so-
cial sciences, although each discipline focuses on dierent aspects of the
same issue. Philosophers are usually concerned about the representa-
tional capabilities of models, i.e. how models represent the real world via
mathematical assumptions. The nature of models is described in dier-
ent forms: models as “autonomous agents” that “function as instruments of
investigation” (MORRISON; MORGAN, 1999, p.10); models as “abstract
objects constructed in conformity with appropriate general principles
and specic conditions” (GIERE, 2004, p. 747); models as “experiments in
thought about what would happen in a real experiment” (CARTWRIGHT,
2010, p. 19). Despite the semantic nuances of each denition, they share
the common understanding that models are designed to represent at least
2. A first version of this article has
been debated at the 2017 Annual
Conference of the Australian Society
for Quantitative Political Science held in
Wellington, New Zealand in
December 2017.
Lima, Enzo Lenine Models, explanaon, and the pialls of empirical tesng
some aspects of the world, for total representation is unattainable (and,
perhaps, undesirable).
In the social sciences, the representational character of models has
caused profound disputes. One of the main questions raised by sociolo-
gists, economists and political scientists, is to what extent models repre-
sent social phenomena, and how they produce explanations about the
real world. If the primary goal of modelling consists in explaining the
world via mathematical assumptions, then it is only reasonable to try
to assess how modellers use maths to represent real-world phenomena.
The literature usually focuses on two fronts to question the explanatory
power displayed by models: either researchers question how realistic a
models assumptions are; or they question a models predictions by con-
fronting them with empirical data.
The case of assumptions is paramount to the understanding of
modelling, for it raises questions about the process of designing a model.
Many assumptions usually entailed in models are known to be false: per-
fect information, transitivity, Homo economicus, just to name a few. For
those who adjudicate the value of models based solely on their assump-
tions, falsehoods cannot be ignored to assess a models explanatory pow-
er. According to them (CARTWRIGHT, 2010; REISS, 2013), unrealistic
assumptions render the model unrealistic, even if it generates predictions
via a logical process. Taking this stance, however, seems too radical, for
it ignores that models are not supposed to truly represent every single as-
pect of reality. It is the modeller’s job to eschew falsehoods when deriving
explanations. As Hausman (2013, p. 252) states:
What one needs to inspect is not the model but the application of a model in a
particular explanation. Such applications typically do not make use of all the as-
sumptions within the model and so obviously do not rely on those assumptions
that they do not make use of.
It is also important to note that models are designed for particular
target systems, providing explanations based on the logical implications
of the initial assumptions. In this sense, a model is “partially isomorphic
to the real world” to the extent that
some assumptions that dene the model match some of the assumptions met
in the real world, the target system. What we need in terms of truth values to
make this happen is a claim indicating precisely which aspects of the model
identify which aspects of the target system” (ROL, 2013, p. 246).
Therefore, the existence of falsehoods should not automatically
doom a model as unrealistic, unsuccessful or false (MÄKI, 2013). As Day
(1990, p. 286) asserts, models consist of “a structuring of the situation (ac-
tual or hypothetical) so that a theory can be applied.
This structuring
establishes the links between assumptions, and hence the explanatory
mechanisms of a given set of phenomena.
The essence of modelling lies on producing explanations about
mechanisms operating in real-world phenomena. To be sure, the scien-
tic endeavour consists in predicting events, and models are tailored to
generate such predictions, and hence explanations. The nature of expla-
nations produced by models falls into the domain of what Dowding (2016,
p. 2-50) denes as type and token explanation. Type explanations concern
3. Day (1990) is mostly concerned about
the links between model design and
theories. In his view, “[c]onstructing a
model (i.e., a structuring of a situation
for theory application) can be intimately
related to the level of representation
of the associated theory” (DAY, 1990,
p. 290). In dealing with the no-slip
boundary condition in fluid mechanics,
he illustrates not only the usefulness,
but also the tension caused by rival
theories, and how models can be used
to solve for this tension.
estudos internacionais • Belo Horizonte, ISSN 2317-773X, v.6 n.3 (2018), p.82 - 97
general phenomena, such as the tragedy of commons, veto power, para-
doxes of social choice, etc. Token explanations are provided in terms of
cases and data, meaning that they are case-specic. Models are closely
related to type explanations, for they articulate assumptions that unravel
the mechanisms operating in the macro-level. Take, for example, the case
of Navier-Stokes equations, a set of non-linear dierential equations that
model uid behaviour: they can be used (and simplied) to predict a va-
riety of uid behaviour, such as water ow in a pipe, the air currents on
an airplane wing, the weather, blood ow etc. The mathematical model
comprises the mechanisms operating in uid ow (viscosity, pressure,
stress, external forces), and solving for the equations generate predictions
about how the uid will behave given initial and boundary conditions.
This distinction between type and token explanation does not pre-
vent critics of modelling from attacking models based on their lack of
empirical evidence. It is only natural that the term prediction generates
expectations about confronting a models conclusions/hypotheses with
empirical data. Nevertheless, this interpretation is misleading, for it only
tells one part of the story about models. As mentioned previously, models
come in dierent avours, and some types are not suited for empirical
testing. I hereby identify three classes of models: conceptual, quasi-con-
ceptual, and extrapolative.
Conceptual models aim to enhance certain conceptual and theo-
retical arguments, resorting to the logical language of mathematics. Set
theory and game theory are the most common mathematical tools in this
class of models. This is the case of Arrow’s theorem, which is a set of logi-
cal deductions based on set theory. The Shapley-Shubik index and Thomas
Schelling’s model of segregation are also examples of conceptual models
that resort to similar tools. In terms of generating explanation, conceptual
models oer predictions by unravelling, through mathematical expression,
the mechanisms underlying political phenomena. They are not testable in
the sense that one could t data into them (which is precisely the case of
the aforementioned examples). An analogy could be drawn with the third
law of thermodynamics, which states that, at absolute zero, the entropy of
a perfect crystal is equal to zero. However, as cooling to absolute zero is un-
attainable, the third law remains as a conceptual model about what would
happen if we could reach such a temperature.
Quasi-conceptual models are designed to explain regularities and
patterns observed in data, but which lack an explanatory mechanism.
In physics and mathematics, this is comparable to conservation laws: an
explanatory model for why some physical quantities were conserved was
absent until Emmy Noether published her theorems in 1915 and 1918
(BAYER, 1999). Thanks to her model, a pervasive regularity – conserva-
tion of momentum and energy – was fully explained. Similarly, political
scientists might be interested in designing mathematical models capable
of binding data together via an explanatory mechanism. Anna Bassis
(2013) model of endogenous government formation and Torun Dewan
and Arthur Spirling’s (2011) model of collective decisions in Westmin-
ster systems are examples of the quasi-conceptual class. It is worth not-
ing that quasi-conceptual models might resort to additional instruments
Lima, Enzo Lenine Models, explanaon, and the pialls of empirical tesng
to advance their claims, such as visual representations and/or historical
evidence, as in Giannetti; Sened (2004). Nonetheless, the purpose of such
quasi-conceptual models is still to oer predictions of certain phenomena
via a mathematical construction that captures the regularities in data.
They are, therefore, accommodationist.
Finally, extrapolative models comprise all models that are suit-
able for an empirical test. Nevertheless, tests might dier in respect to
how they are performed. The standard approach is to test the proposi-
tions and theorems of a model by the means of a statistical model. This
outcome-oriented test validates explanations based upon the results ob-
tained through statistics. Peter Partell and Glenn Palmer’s (1999) test of
James Fearon’s (1999) audience costs model falls into this category, as
well as many others. I call this approach the data-t extrapolative mod-
el. Yet there is still the possibility of deriving the statistical test directly
from the formal model, attempting to represent the particularities of the
mathematics into the statistics. Curtis Signorino (1999, 2003) oers the
best example of this approach: his strategic interaction game derives a
statistical test that accounts for the uncertainties entailed in the formal
model. Cliord Carrubba et al. (2007) follow a similar strategy, but in-
stead of designing a stochastic model, they derive a comparative statics
model of decision-making. I call this approach a mathematical-statistical
(or math-stats) model, whose main characteristic consists in bridging the
mathematical model and the statistical test. They generate explanation
by representing the mechanisms entailed in the models assumptions in a
statistical test. Table 1 sums up these classes.
Extrapolative models are intimately related to the criticisms found
in the literature of political science. Donald Green and Ian Shapiro’s fa-
mous critique scrutinised a set of popular models in the discipline, using
empirical success as the main criterion for evaluating models’ explana-
tory power (GREEN; SHAPIRO, 1994). The responses were immediate,
with many political scientists criticising their biased approach to model
testing (FIORINA, 1995; LOHMANN, 1995; COX, 1999, 2004). Neverthe-
less, the empirical criticism has echoed in the discipline, and some have
come to defend that models should be rather seem as fables, whose con-
clusions and predictions are better understood as the moral lessons in Ae-
sopian stories (CARTWRIGHT, 2010; RUBINSTEIN, 2012; JOHNSON,
2017). Nonetheless, empirical testing still remains an issue for modelling,
at least for one type of models (namely, extrapolative). I shall deal with
the prospects of empirical testing and the diculties they entail in the
next section, highlighting mathematical aspects of this debate and their
implications to modelling in political science and IR.
Empirical tests and Extrapolative models
Traditionally, political scientists of the quantitative tradition have
resorted to statistics to test potential predictions, and hence explanations,
of political phenomena. Quantitative scholars are naturally familiar with
numerical expressions that “produce a more precise description of con-
cepts and relationships than ordinary language” (LUPIA; ALTER, 2014,
estudos internacionais • Belo Horizonte, ISSN 2317-773X, v.6 n.3 (2018), p.82 - 97
p. 54). Explanation in the quantitative tradition comes in the form of infer-
ential propositions, tested via statistical models for causal relationships.
Table 1- Typology of models
Type Description Explanation Examples
These models advance
concepts and predictions
via mathematical
expressions derived from
set theory and game
theory. They are not
empirically testable.
The logical-
expressions generate
unravelling the
mechanisms implied
by the model’s
• Kenneth Arrow’s (1953)
impossibility theorem
• Shapley-Shubik (1988)
• Third law of
The model explains
an observed empirical
regularity by resorting to
mathematical deductions.
Data come rst and the
model explains their
Mathematics binds
together the patterns
in data, building
an explanatory
mechanism via the
assumptions in the
• Anna Bassi’s (2013) model
of endogenous government
• Torun Dewan and Arthur
Spirling’s (2011) model
of collective decisions in
Westminster systems
• Emmy Noether’s theorem
Mathematical model
and statistical test are
not structurally linked
via mathematical
expressions. Hypotheses
are formulated based on
the model’s propositions
and theorems, and then
subject to an appropriate
statistical test.
Explanation results
from the test of
hypotheses on the level
of outcomes.
• Peter Partell and Glenn
Palmer’s (1999) test of the
audience costs model
• Craig Volden and Cliorf
Carrubba (2004)
Statistical tests are
derived directly from the
mathematical model.
In this case, the test
represents the details of
the model. There is a
structural, mathematical
link between the formal
model and the statistical
Explanation results
from the test of
hypotheses on the level
of mechanisms.
• Stephen Ansolabehere et
al. (2005) measurement of
voting weights
• Cliord Carruba et al.
(2007) comparative statics
model of strategic
• Curtis Signorino’s (1999,
2003) strategic interaction
• LIGO experiment on
gravitational waves
Source: Authors research, 2018
Most courses on quantitative methodology present the standard re-
search cycle, which resembles the classical scientic method of the natural
sciences. The researcher departs from a given phenomenon in the real world
which demands some sort of explanation. She then proceeds to formulate
hypotheses that shall be tested for their explanatory power. Variables and
relationships between them are assigned to each hypothesis the researcher
is willing to test. Results are expressed in probabilistic terms and intervals
of signicance, and might accept or reject the initial hypothesis (or certain
hypotheses). This cycle has become so pervasive in the discipline that most
Lima, Enzo Lenine Models, explanaon, and the pialls of empirical tesng
quantitative articles follow a standardised structure, where they present
the hypotheses, dataset, statistical test to be performed, and results.
For some time, and more specically under the inuence of King
et al. (1994), quantitative scholars have believed that the logic of inference
of their tradition was the only one capable of pinning down causation
(DOWDING, 2016, p. 162). Evidently, that caused profound disagree-
ments with other scholars, namely qualitative/interpretive researchers
and experimentalists. Despite recent developments of more sophisticated
statistical tools to analyse political phenomena, and the dialogue with
experimental political science, there are still questions about the type of
explanation generated by statistical tests. To be sure, quantitative meth-
ods are the only ones capable of identifying empirical generalizations,
but this is not an automatic result from any research design. Moreover,
identifying patterns in data is just one step in the long process of under-
standing the explanatory mechanisms underlying a given phenomenon.
Given the challenge posed by the essence of explanation, it would
seem only natural that modellers and quantitative scholars would work
closely together in a cooperative fashion. This is not always the case, and
much to the contrary, modellers and quantitative researchers often see each
other with some level of suspicion. R. Luce (apud TAAGEPERA, 2008, p. 5),
speaking from modellers’ perspective, summarises this tension as follows:
Model builders nd inferential statistics of remarkably limited value. In part,
this is because the statistics for most models have not been worked out; to do so
is usually hard work, and by the time it might be completed, interest in the mo-
del is likely to have vanished. A second reason is that often model builders are
trying to select between models or classes of models, and they much prefer to
ascertain where they dier maximally and to exploit this experimentally. This
is not easy to do, but when done it is usually far more convincing than a fancy
statistical test.
Luce raises questions about the prospects of empirically testing mod-
els that have been tackled only recently. Most of the literature in the quan-
titative tradition rests on the premise that hypotheses can be deduced from
real-world phenomena and tested via statistical models. There is an under-
standing that a well-performed test suces to at least identify potential
causal relations between variables. This might hold as long as the research-
er can prove that the test is statistically suitable to deal with her research
design. However, when it comes to formal models, problems might occur
in the process of translating a models assumptions into a statistical test. In
order to make this argument clearer, I will briey present two approaches
to testing formal models that demonstrate the translation challenge.
James Fearons model of audience costs
In 1994, James Fearon published in APSR a formal model of inter-
national crisis, where domestic aspects inuence the nal outcome of cri-
sis bargaining. In his own words (FEARON 1994, p. 577):
I characterize crises as political contests with two dening features. First, at each
moment a state can choose to attack, back down, or escalate the crisis further.
Second, if a state backs down, its leaders suer audience costs that increase as the
crisis escalates. These costs arise from the action of domestic audiences concerned
with whether the leadership is successful or unsuccessful at foreign policy.
estudos internacionais • Belo Horizonte, ISSN 2317-773X, v.6 n.3 (2018), p.82 - 97
Without going deep into details, Fearon’s model rests on a
game-theoretical approach, where audience costs are modelled as a
linear increasing function, and the crisis unfolds as a series of stages
prior to the commencement of war. Once the time horizon is reached,
war is waged.
The literature that followed Fearons game focused on testing a
variety of hypotheses derived from the outcomes of the original model.
Joe Eyerman and Robert A. Hart (1996), for example, designed a Poisson
test based on the SHERFACS phase-disaggregated conict management
dataset, and used measures of democracy to assess audience costs. Peter
Partell ;Glenn Palmer (1999) followed a similar approach, but opted for
logit and maximum likelihood tests, and used institutional constraints as
a proxy to measure audience costs. In both works, the assumption of the
existence of audience costs is taken for granted, and no attempt is made to
measure them directly. The rst study where audience costs were experi-
mentally measured was conducted by Michael Tomz (2007), which was
later praised by Erik Gartzke; Yonatan Lupu (2012) as a valuable empirical
proof of the validity of Fearons model.
The puzzling issue about the literature on audience costs – and
which echoes in other elds – is that data-t extrapolative models are test-
ed solely for their outcomes. Mathematically speaking, there is no struc-
tural correspondence between the test and the model. In the aforemen-
tioned cases, authors assumed that audience costs existed, and attempted
to check whether Fearon’s conclusions would hold when confronted with
empirical data. Nevertheless, by focusing in the outcome, they ignored a
more basic question about the underlying assumption in the model, i.e.,
the very existence of audience costs. Luckily, Tomz’s research ended up
proving that they do exist, but his results are rather silent about the math-
ematical behaviour of the audience costs function. In the original model,
it is assumed to be linear for non-stated reasons (probably, for simplicity,
but Fearon says nothing about that). Counterfactually, one could raise
the question that the function is non-linear, which could potentially lead
to dierent conclusions. Therefore, this casts doubts whether the model
was eectively tested, for an essential feature in its construction was as-
sumed to be true instead of being subject to scrutiny.
This pattern is observed in other elds where extrapolative mod-
els are subject to data-tting. Stephen Ansolabehere et al. (2005), writing
about coalition theory, criticises the use of number of seats as a proxy to
measure voting weights, which are a constitutive element of a variety of
coalition models. At a rst glance, it would sound simply as a technicality
conned to the realms of variable measurement. However, it has pro-
found implications to the prospects of testing a model in terms of what
is being represented in the test. If the test fails to translate the constitu-
tive parts of the model, how can one be sure about its validity? From a
mathematical perspective, one cannot be sure about the results of a test
that do not truly reproduce a models assumptions and propositions. Evi-
dently, and as pointed by Luce (2008), translating a model is no easy task,
but some political scientists have already started to work on math-stats
extrapolative models, as I will present next.
Lima, Enzo Lenine Models, explanaon, and the pialls of empirical tesng
Curtis Signorino’s strategic interaction game
In 1999, Curtis Signorino published in APSR the rst of a series
of articles on a substantively dierent approach to extrapolative models.
Drawing on the work of Bruce Bueno de Mesquita and David Lalman
(1992), Signorino developed a mathematical-statistical model which was
directly derived from the strategic game of international conict. He
made his proposal even clearer in a paper published in Political Analysis in
2003, where he derives three models based on distinct sources of error in
the analysis of strategic games.
The quintessence of Signorinos proposal consists in translating mod-
els in ways that respect their structure, especially if they contain nonlin-
earities and sources of error. Paying attention to the structure of the game
requires designing mathematical bridges between the assumptions/propo-
sitions and the statistical test before feeding the model with data. Signorino
does so by specifying utility functions with regressors and error terms, and
often by testing them via Monte Carlo simulations.
Cliord Carrubba et al.
(2007a, 2007b) acknowledge the need for building the structural connection
between model and test, but disagree with Signorino about his stochastic
approach, favouring instead models that rely on simpler maths and com-
parative statics. Nonetheless, what we can learn from this disagreement is
that dierent statistical models can be implemented to test a formal model,
as long as they are properly derived from the latter.
The maths-stats class of models oers an extra advantage in setting
the boundaries for a models application, and hence testing. Boundaries
dene the range of applicability of a given model based on its assump-
tions. For example, in structural engineering the widely used innitesi-
mal stress-strain tensor oers simpler expressions to tackle problems of
small displacements, where the eects caused by a given load do not in-
teract (allowing, consequently, for the application of the linearity prin-
ciple of superposition). Larger displacements require other approaches,
such as the Lagrangian stress-strain tensor. The latter encompasses a
larger set of cases, including those ones solvable by the innitesimal ten-
sor. Yet if one is working within the domain of small displacements, then
the innitesimal tensor is a natural and practical choice. Nevertheless,
it would not generate right predictions if applied in contexts that do not
respect its boundaries. In political science and IR, parallels can be drawn
in the same lines. Failing to represent a model accordingly to its structure
may result in inaccurate predictions. In his response to Carrubba et al.
(2007a) and Signorino (2007) presents an argument that follows this line
and summarises his approach to math-stats extrapolative models:
Although deterministic models may under certain conditions approximate the
relationships in models with uncertainty, in many other situations the predic-
tions will be very dierent. If one’s theoretical model includes uncertainty (e.g.,
private information or agent error), then the equilibrium conditions should be
derived based on the assumed uncertainty. That was actually one of the points
of Signorino (2003). If one wants to conduct comparative statics analysis, one
should then do so based on the equilibrium conditions for the theoretical model
with uncertainty. Similarly, derivation of an estimator, observable implications,
or insights for model specication should be based on the equilibrium condi-
tions of the model with uncertainty. (SIGNORINO, 2004 , p. 494)
4. Monte Carlo methods consist of
computational algorithms based on
randomness used to solve mathematical
problems where repeated iterations are
necessary. Randomness is introduced
artificially and is typically used for:
sampling, estimation, and optimisation
(KROESE et al. 2014). Monte Carlo
simulations allow for “exploring and
understanding the behaviour of random
systems and data” by carrying out
“random experiments on a computer and
[observing] the outcomes of these experi-
ments” (KROESE et al. 2014, p. 387).
estudos internacionais • Belo Horizonte, ISSN 2317-773X, v.6 n.3 (2018), p.82 - 97
Summing up, Signorino’s works address the structural part of mod-
elling and testing. By raising the problem of misspecication (SIGNORI-
NO; YILMAZ, 2003) and the issue of nonlinearities (SIGNORINO, 1999,
2003), he and his colleagues call for a dierent methodological approach
to extrapolative models. Building the bridge between the assumptions
and propositions entailed in the model, and the equations in the statisti-
cal test adds mathematical consistency to the analysis.
Evidently, the literature comprises models that fall into the two
classes of extrapolative models. Despite the acknowledging of the need
for deeper reection about how statistical tests translate models assump-
tions, many researchers still nd it hard to bridge both models. This is
why data-t models are still popular and a recurring tool for empirical
testing of formal models. This is not necessarily undesirable, as long as
researchers keep in mind the explanatory limits imposed by their tests.
Setting the boundaries of explanation is a legitimate issue, and it might
prevent political scientists from reaching conclusions that are not as com-
prehensive as they would want them to look like.
The pitfalls of empirical testing
The examples of the previous section draw attention to two sets of
pitfalls of empirical testing, namely: 1. the challenge of measuring vari-
ables and assumptions of formal models; and 2. the structural correspon-
dence between mathematical model and statistical test. The rst is a ma-
jor concern for empirical works of any kind, but it plays an important role
in assessing the explanatory power of extrapolative models. The second,
however, is specic to the math-stats class of models, more importantly,
to the issue of structural translation.
The literature on audience costs illustrates one of the most press-
ing challenges in quantitative research and formal modelling: that of
measurement. Often political scientists have to measure unobservable
variables (such as values and attitudes) in their models, which requires
a great deal of methodological eort to capture valuable and useful in-
formation. Measuring is complicated in all sciences, and, perhaps, one of
the toughest tasks a scientist might perform in her daily routine. This is
so, because one has to oer concrete evidence about the variable being
measured. Furthermore, in order for a set of measurements to be valid,
coherent theoretical models that connect empirical evidence with prop-
erties of a given phenomenon must be precisely dened. In structural en-
gineering, for example, displacement is the measure which is conceptu-
ally connected to the stress-strain models of mechanical behaviour: once
one measures displacement, all calculations can be performed to solve for
stresses, strains, and mechanical properties.
Measuring demands caution and, at the same time, creativity when
testing formal models. The recent observation of gravitational waves il-
lustrates the importance of both skills. Albert Einsteins theory of rela-
Lima, Enzo Lenine Models, explanaon, and the pialls of empirical tesng
tivity predicted gravitational waves, but for a long time, scientists were
unable to nd any evidence of their existence. It was only in 2016 that
the phenomenon was observed for the rst time by the Laser Interferom-
eter Gravitational-Wave Observatory (LIGO) and Virgo. In order to measure
gravitational waves, scientists departed from the assumption that they
should behave as waves, and as such they would be expected to distort
the fabric of space-time. Building upon the concept of classical interferom-
eters, LIGO was constructed as a larger version of such devices to detect
specically that distortion. Scientists were cautious enough to guarantee
that their measurements were rmly grounded in the model, which was
essential to the success of their endeavour.
Political scientists need similar skills to ensure that their measure-
ments t models’ assumptions and outcomes. As we have seen, the classi-
cal approach to measurement follows the dicta of statistical tests. This is
not a problem per se, but formal models tend to be silent about how tests
should be performed and, more importantly, how variables should be
measured. Correct measurements – in terms of what is being measured
– are paramount to assessing the robustness and results of a given test,
especially when dealing with math-stats models. One cannot guarantee
whether a test corroborates or not a models predictions if measurements
do not correspond to the assumptions. Nor a test can be said to be appro-
priate if data are pushed into the model without taking into consideration
measurement reliability.
Quantitative political scientists have been developing a variety of
tools to ensure measurement validity and reliability (JACKMAN, 2008).
Attention has also been drawn to the eects of errors and bias, not to
mention to uncertainty (SIGNORINO, 2003). Formal models, and name-
ly math-stats models, may benet from the incorporation of such tools
and concerns into their structure. In terms of data-t models, building a
compelling case is of uttermost importance to guarantee that all essential
parts of the statistical test are properly connected to models assumptions
and outcomes.
Nonlinearities and structural translation
Despite the temptation of trusting linearity, real-world phe-
nomena are pervaded with nonlinear effects. Our brains are wired to
think in terms of linear relationships, preferring models where vari-
ables behave in a more-or-less linear fashion to those where variables
take nonlinear paths. However, nature and society display a variety
of phenomena that do not follow the tenets of linearity. Turbulence,
fracture propagation, combustion, conflict escalation are just a few
examples of nonlinearities.
When modelling, political scientists sometimes have to represent
nonlinearities. This is the case of uncertainty, which was the main is-
sue in Signorinos works. Failing to incorporate nonlinearities in a model
might aect its explanatory power, especially if it is suciently complex
(for example, when it entails subgames, institutional constraints, signals).
However, the challenge becomes more prominent when nonlinear mod-
estudos internacionais • Belo Horizonte, ISSN 2317-773X, v.6 n.3 (2018), p.82 - 97
els are subject to statistical tests. In this case, structure matters, and fail-
ing to adequately represent a models structure may generate incorrect
outcomes or, in a less dramatic scenario, outcomes that are conned to
certain boundaries where linearity applies.
Representing nonlinearities in math-stats models in particular
poses the challenge of mathematical tractability. Equations may become
too complex to be solved in due time, for the more complex they are,
the more computational power is required to perform calculations. Po-
litical scientists have to be aware about this issue whilst translating their
models into statistical tests. Unfortunately, there is no denitive recom-
mendation, besides the fact that one should take into account some fac-
tors such as: 1. analytical tractability of the equations; 2. computational
power to solve for them; 3. boundaries between which the solutions are
applicable. In the case of game-theoretical models that are constructed in
sequential moves, Signorinos works may serve as a point of departure to
implement a math-stats model. For larger sequential games, where the
game tree leads to a great number of nodes and outcomes, breaking them
into smaller parts and solving for them separately may oer partial re-
sults whilst eschewing the divergence caused in the process of solving for
nonlinear equations.
Another approach with which political scientists are less familiar
is suggested by Rein Taagepera (2008, chapter 4). When discussing the
functional forms quantitatively predictive logical models can assume, he
recommends political scientists to extrapolate from the classical linear re-
gression equation and look for functionals based on boundary conditions
and logical considerations. In mathematical terms, it means expressing
the problem in terms of dierential equations. Solving for dierential
equations results in functional forms that respect boundary and initial
conditions, and the resulting functional is often nonlinear. Nevertheless,
basic mathematics does not suce to deal with such equations, meaning
that specic training would be necessary to model political phenomena
in these terms. The consequences, however, would be profoundly posi-
tive to the understanding of politics, for political scientists would rst
think about each situation in a phenomenon-oriented fashion instead of
automatically tting whatever they observe in the real world into a sta-
tistical test.
Summing up, measurement and nonlinearities represent part of
the challenges faced by political scientists when testing formal models.
They pose diculties to claims over a models explanatory capabilities,
for poorly designed tests do not shed light on the crucial issue of whether
a model is capable of predicting real world phenomena. A well-designed
test has to reect to some extent the assumptions entailed in the model
in order to have a say about its explanatory power. As Dowding (2016, p.
173) states:
One must always ask when reading formal models how robust the conclusions
are with regard to the assumptions. The more robust the conclusions, the less
important specic assumptions; if a result rests upon a key assumption, how far
that assumption is descriptively accurate (either motivationally for an agent or
structurally for the system) will determine how useful the model is.
Lima, Enzo Lenine Models, explanaon, and the pialls of empirical tesng
Final remarks
Throughout this paper
, I have raised questions about the prospects
of empirically testing formal models. Despite the widespread temptation
to subject models to statistical scrutiny, only a certain class of models
– extrapolative – are suitable for this kind of test. Conceptual and quasi-
conceptual models, which I have not explored in detail here, present dis-
tinct features that are not t to statistical tests.
When performing tests of formal models, researchers must be at-
tentive to measurement issues. Models often express their propositions
in terms of variables that have not been previously measured or whose
data may not be directly available. Much on the contrary, modellers are
rather silent about how their models can be implemented via a statistical
test. Creativity and caution are necessary to ensure a test matches with a
models assumptions and propositions.
Ideally, researchers would attempt to represent the mathematics of
a formal model into a viable statistical test. This is no easy task, for the
translation of a models structure might entail nonlinearities and com-
plicated behaviour that would render the test intractable. Nonetheless,
adequate translation is paramount to shed light on models’ explanatory
power. It is also the key to test rival models against evidence, and assess
which is descriptively more accurate.
Numerics and computational simulations are useful tools to deal
with nonlinearities and to test math-stats extrapolative models. Political
scientists should take advantage of them to assess the tractability of their
models and solve for them whenever analytical solutions are not avail-
able. Furthermore, these tools allow for the implementation of nonlinear
elements, such as imperfect information and heuristics, and for iterative
solutions. They might be part of the creative solutions researchers need
devise to oer accurate explanations about political phenomena. This is
a challenging process, though an essential one in order to unravel the
mechanisms operating in the real world.
ANSOLABEHERE, S. et al. Ting. Voting Weights and Formateur Advantages in the Formation
of Coalition Governments. American Journal of Political Science, vol. 49, n. 3, 2005, p. 550-563.
ARROW, K. Social Choice and Individual Values. New Haven: Yale University Press, 1953.
BASSI, A. A Model of Endogenous Government Formation. American Journal of Political
Science, vol. 5, n.4, 2013, p. 777-793.
BAYER, N. The Heritage of Emmy Noether in Algebra, Geometry, and Physics. Israel Mathe-
matical Conference Proceeeding, vol 12. 1999. Disponível em: <
articles/noether.asg/noether.html>. Acesso em 4 mar. 2018.
BLACK, D. The Theory of Committees and Elections. Cambridge: Cambridge University
Press, 1958.
BUENO DE MESQUITA, B.; LALMAN, D. War and Reason: Domestic and International Impe-
ratives. New Haven: Yale University Press, 1992.
CARRUBBA, C. J.; YUEN, A.; ZORN, C. In Defense of Comparative Statics: Specifying Empi-
rical Tests of Models of Strategic Interaction. Political Analysis, vol. 15, n. 4, 2007a, p. 465-482.
5. I thank Dr. Shawn Treier (Australian
National University) and Shirley Duarte
(Federal University of Sergipe) for their
valuable comments on the first version
of this article.
estudos internacionais • Belo Horizonte, ISSN 2317-773X, v.6 n.3 (2018), p.82 - 97
CARRUBBA, C. J.; YUEN, A.; ZORN, C. Reply to Signorino. Political Analysis, vol. 15, n. 4, ,
2007b, p. 502-504.
CARTWRIGHT, N. Models: Parables v Fables. In: FRIGG, R.; HUNTER, M. Beyond Mimesis
and Convention: Representation in Art and Science. Amsterdam: Springer Netherlands, 2010
COX, G. The Empirical Content of Rational Choice Theory: A Reply to Green and Shapiro.
Journal of Theoretical Politics, vol. 11, n. 2, 1999, p. 147-169.
COX, G. Lies, Damned Lies, and Rational Choice Analyses. In: SHAPIRO, I; SMITH; R. M.;
MASOUD, T. E. Problems and Methods in the Study of Politics. Cambridge: Cambridge Uni-
versity Press, 2004.
DAY, M. A. The No-Slip Condition in Fluid Dynamics. Erkenntnis, vol. 33, n. 3, 1990, p. 285-296.
DOWDING, K. The Philosophy and Methods of Political Science. London: Palgrave Macmil-
lan, 2016.
DOWNS, A. An Economic Theory of Democracy. New York: Harper, 1957.
EYERMAN, J.; HART, R. A. Jr. An Empirical Test of the Audience Cost Proposition. Journal of
Conict Resolution, vol. 40, n. 4, p. 597-616, 1996.
FEARON, J. D. Domestic Political Audiences and the Escalation of International Disputes. Ame-
rican Political Science Review, vol. 88, n. 3, 1994, p. 577-592.
FIORINA, M. Rational Choice, Empirical Contributions, and the Scientic Enterprise. Critical
Review: A Journal of Politics and Society, vol 9, n. 1-2, 1995 , p. 85-94.
GARTZKE, E.; LUPU, Y. Still Looking for Audience Costs. Security Studies, vol. 21, n. 3,
2012, p. 391-397.
GIANNETTI, D.; SENED, I. Party Competition and Coalition Formation. Journal of Theoreti-
cal Politics, vol. 16, n. 4, 2004, p. 483-515.
GIERE, R. How models are used to represent reality. Philosophy of Science, vol. 71, n. 5, 2004,
GREEN, D.; SHAPIRO, I. Pathologies of Rational Choice Theory. New Haven: Yale University
Press, 1994.
HAUSMAN, D. M. Testing’ Game Theory. Journal of Economic Methodology, vol. 12, n.2,
2005, p. 211-223.
HOTELLING, H. Stability in Competition. Economic Journal, vol. 39, n. 153, 1929 , p. 41-57.
Oxford Handbook of Political Methodology. Oxford: Oxford University Press, 2008.
JOHNSON, J. Models-As-Fables: An Alternative to the Standard Rationale for Using Formal
Models in Political Science. Annual Meetings of the Midwest Political Science Association,
Roundtable: New Directions in Formal Theory, mar. 2017.
KING, G.; KEOHANE, R. O.; VERBA, S. Designing Social Enquiry. Princeton: Princeton Uni-
versity Press, 1994.
KROESE, D. P. et al. Why the Monte Carlo Method is so Important Today. Computational Sta-
tistics, vol. 6, n. 6,, 2014, p. 386-392.
LIMA, Enzo Lenine Nunes Batista Oliveira.Mathematics in political science: an explanation-
-oriented typology of rational choice models. 2018, Tese (Doutorado ) - Programa de Pós-
-Graduação em Ciência Política da Universidade Federal do Rio Grande do Sul, Porto Alegre.
LOHMANN, S. 1995. The Poverty of Green and Shapiro. Critical Review: A Journal of Politics
and Society, vol. 9, n. 1-2, 1995 , p. 127-154.
LUPIA, A.; ALTER, G. Data Access and Research Transparency in the Quantitvative Tradition.
Political Science & Politics, vol. 47, n. 1, 2014, p. 54-59.
MÄKI, U. On a Paradox of Truth, or how not to obscure the issue whether explanatory models
can be true. Journal of Economic Methodology, vol. 20, n. 3, 2013, p. 268-279.
MORRISON, M.; MORGAN, M. Models as Mediating Instruments. In: MORRISON, M.; MOR-
GAN, M. Models as Mediators: Perspectives on Natural And Social Science. Cambridge:
Cambridge University Press, 1999.
PARTELL, P. J.; PALMER, G. Audience Costs and Interstate Crises: An Empirical Assessment
of Fearon’s Model of Dispute Outcomes. International Studies Quarterly, vol. 43, n. 2, 1999,
p. 389-405.
Lima, Enzo Lenine Models, explanaon, and the pialls of empirical tesng
REISS, J. The Explanation Paradox Redux. Journal of Economic Methodology, vol. 20, n. 3,
2013, p.280-292.
ROL, M. Reply to Julian Reiss. Journal of Economic Methodology, vol. 20, n. 3, 2013, p. 244-249.
RUBINSTEIN, A. Economic Fables. Cambridge: Open Book Publishers, 2012.
SHAPLEY, L. S.; SHUBIK, M. 1988. A Method for Evaluating the Distribution of Power in a
Committee System. In: ROTH, A. E. The Shapley Value: Essays in Honor of Lloyd S. Shapley.
Cambridge: Cambridge University Press 1988.
SIGNORINO, C. S. Strategic Interaction and the Statistical Analysis of International Conict.
American Political Science Review, vol. 93, n. 2, 1999, p. 279-297.
SIGNORINO, C. S. 2003. Structure and Uncertainty in Discrete Choice Models. Political Analy-
sis, vol. 11, n. 4, 2003, p. 316-344.
SIGNORINO, C. S.; YILMAZ, K. Strategic Misspecication in Regression Models. American
Journal of Political Science, vol. 47, n. 3, 2003, p. 551-566.
SIGNORINO, C. S. On Formal Theory and Statistical Methods: A Response to Carrubba, Yuen
and Zorn. Political Analysis, vol. 15, n. 4, 2007, p. 483-501.
SUGDEN, R. Explanations in Search of Observations. Biology and Philosophy, vol. 26, n. 5,
2011, p. 717-736.
TOMZ, M. Domestic Audience Costs in International Relations: An Experimental Approach.
International Organization, vol. 61, n. 4, 2007, p. 821-840.
VOLDEN, C.; CARRUBBA, C. J. The Formation of Oversized Coalitions in Parliamentary De-
mocracies. American Journal of Political Science, vol. 48, n. 3, 2004, p. 521-537.