Week 1 Summaries

Gary King, Robert O. Keohane, and Sidney Verba, Designing Social Inquiry: Scientific Inference in Qualitative Research.  (chaps. 1-3, 6)   

    The authors’ principal purpose in Designing Social Inquiry is to show that the logic of inference typically associated with quantitative methods also applies to qualitative methods. Understood in this vein, the book proposes a methodological synthesis rather than a tyranny of quantitative methods over qualitative ones. The book adopts the language of mathematics to give greater precision and clarity to notions implicit in qualitative approaches that otherwise might seem fuzzy or muddled.

    Chapter 1 introduces the purpose of the book and discusses the basic components of a research design: the research question, theories, data, and tests of the theories using data.
-  Research questions should deal with matters that are “important” in the real world. The answers to the questions should contribute to some recognized body of scholarly literature. (In other words, they ought to address an empirical problem as well as a theoretical approach).
-  Theories must be fashioned with care to ensure that they are not tautological or unfalsifiable, have observable implications, and are as concrete as possible, lending themselves to clear operationalizations. If a theory claims that x causes y, it is “falsifiable” only if some possible observation, such as x occurring together with not y, could contradict it.
-   In collecting data, one must be completely familiar with the process that generated the data, collect data related to as many of the observable implications as possible, endeavor to maximize validity (by how one operationalizes the variables), ensure reliability (by using consistent processes that are repeatable), and ensure that the data collection and analysis processes can be replicated by others.
-  In using data, a researcher must be aware of the potential sources of bias and should “wring” as much as possible out of the data by disaggregating or looking for temporal “breaks” that delineate separate cases. Moreover, a persuasive test of a theory must explore the bounds of the theory’s applicability by investigating the outcomes for cases that do not conform to the proposed set of causal conditions.

    Chapter 2 contrasts contextual “interpretation” with “inference,” but argues that the same standards of inference apply when testing hypotheses no matter how the hypotheses were divined. The chapter then develops a generalized formal model of a research design that is useful for qualitative research. The model draws analogies using the notions of “expected value,” “mean,” “variance,” “bias,” and “data efficiency” as applied in statistical inference.

    Chapter 3 takes up the topics of causality and causal inference. In devising a test of a theory, one should choose empirical cases that satisfy the criteria of unit homogeneity or conditional independence—or both, if possible. Unit homogeneity implies that for all cases in which the explanatory variables take on certain values, the expected value of the dependent variable is the same. Conditional independence asserts that observations are chosen such that the values taken by the explanatory variables are independent of the values of the dependent variable. Conditional independence eliminates the problems of endogeneity (in which the explanatory variables are caused, at least in part, by the dependent variable), selection bias (choosing cases that explicitly or implicitly favor certain values of the dependent variable), and omitted variable bias (the distorting influence of unmeasured variables correlated with both the explanatory and dependent variables).

    Randomly selecting cases satisfies conditional independence. However, random selection introduces three difficulties in small-n research designs. It is difficult to apply in situations where the universe of cases is unclear. Also, if there is a small number of observations, random selection risks missing important cases. Furthermore, random selection can introduce biases in a small-n research design that can be avoided by carefully selecting cases in which the explanatory variables appear to be uncorrelated with the dependent variable.
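
    The risk that random selection misses important cases in small-n designs is easy to quantify. As an illustration (the numbers below are hypothetical, not from the book): if 5 cases in a universe of 50 are theoretically crucial, a simple random sample of 5 cases misses all of them more often than not.

```python
from math import comb

# Hypothetical universe: 50 cases, of which 5 are theoretically crucial.
# Probability that a simple random sample of 5 cases contains none of them
# (hypergeometric: choose all 5 from the 45 non-crucial cases).
N, crucial, n = 50, 5, 5
p_miss_all = comb(N - crucial, n) / comb(N, n)
print(round(p_miss_all, 3))  # -> 0.577: more likely than not to miss every crucial case
```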

    Chapter 6 addresses strategies for using case studies (“observations”) to best advantage for testing a theory. A single case study can serve as a “critical” test of a theory if it corresponds to a “least likely” test of a hypothesis (or a “most likely” test in the case of a plausibility probe). However, this approach has limited usefulness if there might be more than one causal effect, if we are concerned about inherent measurement error (which is almost always true), and if there is a possibility that the causal effect embodies contingency or a probabilistic quality. The number of data points (“observations” or “cases”) necessary to test a theory is a function of the variance in the causal variable, variability in the outcomes, the uncertainty associated with the causal inference (the desired “confidence interval”), and the degree of collinearity between the causal variable and the control variables. If more observations are needed, it may be possible to “squeeze” more out of the data on hand—by looking at subunits within the case or looking at the case across time—recognizing that a single case “parsed” into more cases may not yield observations that are independent.
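
    The four factors listed above have a standard statistical analogue (my gloss, not spelled out in the chapter): the variance of an estimated causal effect in a linear model,

```latex
\operatorname{Var}(\hat\beta_j) \;\approx\; \frac{\sigma^2}{n \,\operatorname{Var}(x_j)\,\bigl(1 - R_j^2\bigr)}
```

where sigma^2 captures variability in outcomes, Var(x_j) the variance in the causal variable, n the number of observations, and R_j^2 the collinearity of the causal variable with the controls. Holding the desired level of confidence fixed, less variance in the causal variable or more collinearity demands more observations.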

Sprinz, Detlef F. and Yael N. Wolinsky (eds., under review).  Cases, Numbers, Models: International Relations Research Methods.  Chapter 2.  

There are three methods of case analysis: process tracing, congruence testing, and counterfactual analysis.  1) Process tracing applies theories under investigation to each causal step (intervening variables) between the hypothesized cause and the observed effect.  It allows testing a hypothesis derived from a case against different evidence in the same case.  2) Congruence testing involves comparing the predicted and the observed values of the dependent variable.  It’s inferior to process tracing because of the n=1 problem. 3) When we need to test a hypothesis “if and only if x then y,” counterfactual analysis will test a logically equivalent hypothesis “ if not x then not y.”
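
The logical move in (3) can be checked mechanically. A minimal sketch (my own illustration, not the chapter's): “if not x then not y” is the contrapositive of “if y then x,” which is the necessity (“only if”) half of “if and only if x, then y.”

```python
# Truth-table check that "if not x then not y" is logically equivalent to
# "if y then x", the necessity half of the biconditional "x iff y".
def implies(a, b):
    return (not a) or b

for x in (False, True):
    for y in (False, True):
        assert implies(not x, not y) == implies(y, x)
print("equivalent on all truth assignments")
```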

Two important single-case research designs are identified.  1) Eckstein proposed studying most-likely, least-likely, and crucial cases (the latter are perfectly most- or least-likely) for testing a theory.  “A most likely case is one that is almost certain to fit a theory if the theory is true for any cases at all” (37).  The theory is undermined if it does not hold true in this case.  The definition of a least-likely case is analogous.  If a theory holds in this case, it is strongly supported.  Incorporating the existence of competing theories into Eckstein’s design, strong support for a theory is found when a case supports that theory’s predictions, and not the others’.  2) Studying deviant or “outlier” cases is useful in identifying new hypotheses and new or omitted variables. 

Comparative methods
1) Mill’s method of agreement / least similar case comparison involves selecting cases (observations) in which all but one of the independent variables have different values, and the dependent variable has the same value.  This yields the conclusion that the common independent variable is causally related to the dependent variable.  Mill’s method of difference / most similar case design involves selecting cases in which all but one of the independent variables have the same values, and the dependent variable has different values.  This yields the conclusion that the differing independent variable is causally related to the dependent variable.  These designs cannot be used in the presence of equifinality – a condition where “the same outcome can arise through different pathways or combinations of variables.”  In general, the requirements that must be satisfied in order to use these methods are unrealistic, and, therefore, these methods are rarely used.
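
Mill's method of agreement can be sketched mechanically. In this toy example (the cases and variables are invented for illustration), every case shares the outcome, and the method flags the one independent variable whose value is constant across all of them:

```python
# Toy data: three cases that all share the same outcome (say, revolution
# occurred). Method of agreement: flag independent variables that take the
# same value in every case; these are the candidate causes of the outcome.
cases = [
    {"regime": "monarchy", "economy": "agrarian",   "state_breakdown": True},
    {"regime": "republic", "economy": "industrial", "state_breakdown": True},
    {"regime": "empire",   "economy": "mixed",      "state_breakdown": True},
]

def method_of_agreement(cases):
    first = cases[0]
    return {k: v for k, v in first.items()
            if all(c[k] == v for c in cases)}

print(method_of_agreement(cases))  # -> {'state_breakdown': True}
```

Equifinality breaks exactly this inference: if state breakdown produces revolution in one case and, say, elite defection produces it in another, no single variable need be constant across the cases, which is the condition the chapter flags.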

2) Structured, focused comparison, developed by A. George, requires i) defining the research objective (formulating hypotheses, etc.), ii) specifying control, key causal, and dependent variables, iii) selecting cases, iv) establishing how to measure variance in the dependent and independent variables, and v) specifying the method for selecting observations (single values of variables).  George argued that case studies are useful in developing “typological theories” (41), which make less restrictive assumptions than Mill’s and incorporate equifinality. 

Comparative advantages and some tradeoffs
The most important comparative advantage of case study methods is in identifying new hypotheses.  The other advantages include studying causal mechanisms via process tracing, developing historical explanations, identifying new and omitted variables, attaining high levels of construct validity, and accommodating complex causal relations, such as equifinality, interaction effects, and path dependency.  The latter advantage carries a tradeoff, since it implies the loss of parsimony in selecting the number of variables and the loss of generality of findings.  Statistical methods face the opposite tradeoff.

Construct validity is the “ability to [operationalize and] measure in a case the indicators that best represent the theoretical concept we intend to measure” (42).  Again, there is a tradeoff between achieving high levels of construct validity, where case studies are superior to statistical analysis, and external validity, or the ability to generalize findings to a wide number of cases, where statistical methods are superior. 

One of the problems with case studies is the danger of selection bias – a case selection process that results in “inferences that suffer from systematic error” (47-48).  Selection bias usually results from selecting on the dependent variable (selecting cases from a non-randomly limited sample of values of the dependent variable).  However, some argue that selecting on the dependent variable is useful when trying to test or limit the choice of the independent variables and when identifying the causal paths leading to a selected value of the dependent variable. 
Another source of selection bias is “confirmation bias: selecting only those cases whose independent and dependent variables vary as the favored hypothesis suggests and ignoring cases that appear to contradict the theory” (48, underlined in the original).  While selecting on the dependent variable typically understates the strength of the causal relationship, confirmation bias can either understate or overstate it. 

The indeterminacy problem arises when a case could be successfully explained by several competing hypotheses.  This is different from the “degrees of freedom” problem, where the number of independent variables exceeds the number of observations.  Because the author defines a case as “an instance of a class of events of interest to the investigator” (28), he argues that cases usually include a potentially large number of observations on dependent and independent variables, so that the degrees of freedom problem is not endemic to case studies. 

Another danger of using case studies – a potential lack of independence of cases – need not be a problem if this lack of independence is recognized (e.g. through process tracing) and adjusted for.  Another limitation of case studies is the difficulty of measuring the magnitude and uncertainty of causal inference. 

Chp. 6:  B.F. Braumoeller and A.E. Sartori, "Empirical-Quantitative Approaches to the Study of International Relations."

Statistical method “permits the researcher to draw inferences about reality based on the data at hand and the laws of probability” (139).  It is especially useful for evaluating and testing theories. 


Ability to aggregate information from large numbers of cases is the major advantage of the statistical method and can be useful for theory development.  Statistical analysis not only uncovers puzzles but, unlike a case study, can check whether a puzzle represents a systematic pattern.  Thus, the method permits generalizations.  Statistics requires both high standards of inference (explicit assumptions) and standards of evidence (explicit criteria for measurement). 

Statistical method allows drawing causal inferences and estimating uncertainty of those inferences – the probability that the association is due to chance.  Finally, the method is extremely useful for testing rival hypotheses against each other.

Error of specification is a failure of statistical tests to “relate meaningfully to the causal mechanisms implied by the theories that they purport to evaluate” (143).  Three such errors are identified. 

1) Focus on correlations with little attention to theory.  This error is illustrated by the development of the democratic peace theory.  There, theory development lagged behind studies built on statistical associations.  Development of new theories uncovered the possibility that preceding studies based their analysis on the wrong causal variables.

2) Analysis based on imprecise or shallow theories.  An imprecise theory allows for “a wide range of relationships between independent and dependent variables” (144).  Such theories may be unfalsifiable.   According to Lake and Powell, Waltzian neorealism is one such theory.  It predicts that when, in a multipolar system, an alliance is challenged, a member of the alliance will either free-ride or join with others in meeting the challenge (144).  Since these responses are exhaustive and mutually exclusive, falsification is impossible.  

A shallow theory has few testable implications.  For instance, a one-shot Prisoner’s Dilemma (PD) game has been used to hypothesize a relationship between nuclear weapons and the likelihood of war.  However, if confronted with rival theories that predict the same relationship, the PD fails to provide additional testable implications, which would differ from those of rival theories.  Therefore, imprecise and shallow theories require theoretical development before statistical models can be applied.

3) Inattention to functional form - imposing a statistical model on a theory, instead of using a model to test the theory.  A statistical model should reflect the underlying theory and the causal processes that generated the data.  A combination of formal theory at the development stage and statistical methods for testing is, therefore, recommended. 
Errors of inference refer to fallacious reasoning as to “the extent that tests of a given theory reveal information about reality” (150). 

1) One way to make this error is to focus on statistical significance to the detriment of substantive significance.  In large-n studies, the smallest degree of association that provides weak support for a theory will prove to be statistically significant.  Another problem is that rejecting, or failing to reject, the null hypothesis on the basis of arbitrary significance levels is wrong, but widely practiced.  Instead, the certainty of one’s results should be represented by a probability measure of observing those results due to chance.  Finally, data mining, or running a model until significant results appear, significantly compromises reliability.  If one runs a model enough times, the probability of some results appearing significant, when the relationship is actually spurious, can be quite high. 
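
The data-mining point can be made concrete. Assuming independent tests at the conventional 0.05 level (a simplifying assumption, not the chapter's own calculation), the chance of at least one spuriously "significant" result grows quickly with the number of specifications tried:

```python
# Probability of at least one spurious "significant" result when running k
# independent tests at the 0.05 level on data with no true relationship.
alpha = 0.05
for k in (1, 5, 20):
    p_false_positive = 1 - (1 - alpha) ** k
    print(k, round(p_false_positive, 3))  # rises from 0.05 (k=1) to ~0.64 (k=20)
```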

A “sin of omission” occurs when researchers “accept or reject a theory based upon an assessment of how likely certain variables are to have non-zero effects” (i.e. looking at the coefficients and standard errors) (152).  Instead, according to Lakatos, a theory should be evaluated on its performance against rival theories.  Or, as the Bayesian view holds, a theory should be evaluated based on its cumulative results over time. 

A “sin of commission” occurs when too many independent variables are included in the analysis (“’garbage can’ models” (153)) and presents a serious threat to inference.  “Moreover, if the variables that the competing theory suggests are correlated in the sample with the variables of primary interest, then including these ‘control’ variables can lead to incorrect conclusions about the primary theory being tested” (153).  (The summarizer will be grateful to anyone who can reconcile that last quote with Gary King’s assertion that multicollinearity is not a problem, unless correlation = 1).
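
One common way to reconcile the two claims (my gloss, not the chapter's): with correlated regressors, OLS estimates remain unbiased, but their variance is inflated by the variance inflation factor,

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{n \operatorname{Var}(x_j)} \cdot \mathrm{VIF}_j
```

so King can hold that multicollinearity short of perfect correlation is not a bias problem, while Braumoeller and Sartori can hold that including correlated controls makes conclusions unreliable in practice, since standard errors blow up and, in small samples, estimates swing wildly.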

Chp. 10:  Duncan Snidal, “Formal Models of International Politics"

A model is “a simplified picture of a part of the real world,” which takes into account the most important considerations for the theory under investigation (242).  Formal models vary a great deal: verbal, physical, mathematical, computer models, etc.  Though different models have the same basic logical structure, each has its own advantages and disadvantages.  For example, computer models are difficult to set up and explain, but are great for manipulating assumptions of the model and for handling complex problems.  Mathematical models do not incorporate much detail, but have the advantages of generality and precision of representation. 

Model construction is a powerful way to develop a theory.  When constructing a model, one should start with the simplest of specifications and then add complexity as needed. 

“The greatest advantage of models emerges when their deductive power moves us beyond descriptions to inferences from assumptions” (249).  Formal models are very good at achieving internal validity.  They help avoid logical mistakes, but are often criticized for producing intuitive results.  Therefore, a model is especially valuable when its conclusions are “surprising” (252).  The deductions of models can be surprising when they predict unobservable outcomes (e.g. if Saddam Hussein gets nukes, he’ll use them), or when an observed outcome depends on an unobserved cause (e.g. nuclear peace depends on the credibility of mutually assured destruction), or when only one of many potential outcomes is observed (existence of multiple equilibria).     

Though models achieve high levels of internal validity, external validity is a problem if the model is not properly tested.  The empirical content of models is based on “stylized facts,” or empirical generalizations (254).  To test a model, one must assess its applicability to the empirical problem it attempts to address.  Ascertaining “face validity” (whether the facts clearly contradict the theory or not) is a start, which should generally be followed by statistical testing.  This way of testing is inconclusive if a model is indeterminate – comes up with too many predictions.  The author notes, however, that such a model is not necessarily useless – it might be “illuminating indeterminacy that is a fundamental feature of the world” (255).  Case studies, which focus on complex causal relationships and interaction effects, can be useful when testing such models.  Besides testing predictions, one should also test the assumptions to see if the results are robust to reasonable variations in specifications. 

Progression of Formal Models

Richardson (1960) produced the first formal model in international relations.  Please refer to pages 258-261 for a concise and informative summary of the model.  Here it is in brief.  This is a rational choice decision theory model of two states conditioned by three motivations: grievances between states, fear of the other state, and fatigue resulting from costs of acquiring weapons.  A state’s behavior (i.e. the rate of weapons acquisition) can then be represented mathematically and graphically as a function of its grievances, its armament levels (of which fatigue is a function), and the other state’s armament level (of which fear is a function).  The two equations (one for each state) can then be solved to arrive at an equilibrium level of military spending for each state.  Comparative statics can then be derived.  For example, the optimal level of spending by a state is increasing in its grievances, decreasing in the cost of maintaining current armaments, and increasing in either state’s fear (this is not bad grammar, this is rational-choice grammar).  The model also implies that states will behave differently when not at equilibrium as parameters vary.  Specifically, when the fatigue factors are relatively larger than the fear factors, the states will always converge at equilibrium – the equilibrium is stable (see p. 278).  If the inequality is reversed, the equilibrium becomes unstable, and the model comes up with bizarre predictions.  If both states’ levels of spending are slightly above equilibrium, spending will spiral up to infinity; and if both sides spend slightly less, spending on both sides will decrease to 0 (see p. 279).  One can algebraically solve for conditions under which the equilibrium is stable, but that does not explain away the predictions for unstable equilibrium conditions, and these predictions fly in the face of empirical observations. 
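
The stability condition can be illustrated numerically. In the standard two-state Richardson system, dx/dt = k*y - a*x + g and dy/dt = l*x - b*y + h (fear of the rival's arms, own fatigue, grievance), the equilibrium is stable when the fatigue terms dominate (a*b > k*l) and unstable when fear dominates. A sketch with invented parameter values:

```python
def simulate(k, l, a, b, g, h, x, y, steps=1000, dt=0.01):
    # Euler integration of the Richardson arms-race equations:
    #   dx/dt = k*y - a*x + g    dy/dt = l*x - b*y + h
    for _ in range(steps):
        x, y = x + dt * (k * y - a * x + g), y + dt * (l * x - b * y + h)
    return x, y

# Fatigue dominates fear (a*b = 4 > k*l = 1): starting above the
# equilibrium at (1, 1), spending converges back -- a stable equilibrium.
x, y = simulate(k=1, l=1, a=2, b=2, g=1, h=1, x=1.5, y=1.5, steps=3000)
print(round(x, 3), round(y, 3))  # -> 1.0 1.0

# Fear dominates fatigue (k*l = 4 > a*b = 1): starting just above the
# equilibrium at (1, 1), spending spirals upward without bound.  (The
# negative "grievance" just places the equilibrium at positive spending.)
x, y = simulate(k=2, l=2, a=1, b=1, g=-1, h=-1, x=1.1, y=1.1, steps=1000)
print(x > 100)  # -> True
```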

            Game theory addressed this problem by modeling interactions strategically (i.e. each player’s actions are conditioned by the other’s).   Using the example above, spending 0 will no longer be an equilibrium, since one state has an incentive to increase its spending a little to take advantage of the other. 

Please refer to pp. 263 and 280 for a description of the one-shot two-player Prisoner’s Dilemma.  It has been widely used in studies of cooperation, despite its obvious shortcomings.  First, it predicts mutual defection as the only equilibrium, while we do observe cooperation empirically.  Second, it treats states as unitary actors and ignores effects of domestic politics on foreign policy.  Third, the model ignores existing international institutional environment.  (Though I would assign the latter shortcomings to the underlying theory (i.e. realism), and not to the model, unless the model is not built on the assumptions of realism). 

The first shortcoming is addressed by incorporating repeated games into the model.  The Folk Theorem predicts cooperation in an infinitely repeated PD game.  Repeated games also provide an answer to why cooperation on security issues is harder to achieve than on economic issues.  This is because the discount factor on future payoffs is lower for security issues (i.e. being taken advantage of now and getting a “sucker’s payoff” is especially not worth potential future cooperation when it comes to security).  A more unpleasant implication of the Folk Theorem is that there are infinitely many possible cooperative equilibria, and the model cannot predict which one will occur.  Non-rational choice (psychological, cultural) theories rely on concepts like “focal points” to predict the outcome (266).  The Folk Theorem also highlights a substantive problem with the PD in that it focuses too much on cooperation and not enough on coordination between states on the possible choice of equilibria. 
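
The discount-factor argument can be made concrete with the standard grim-trigger condition (my illustration, with conventional PD payoffs T > R > P > S rather than any values from the chapter): cooperation is sustainable in the infinitely repeated PD exactly when the discount factor d satisfies d >= (T - R)/(T - P).

```python
# Grim trigger in the infinitely repeated Prisoner's Dilemma with payoffs
# T (temptation) > R (reward) > P (punishment) > S (sucker).  Cooperating
# forever yields R/(1-d); defecting once yields T now and P forever after,
# T + d*P/(1-d).  Cooperation is sustainable iff d >= (T-R)/(T-P).
def min_discount_factor(T, R, P):
    return (T - R) / (T - P)

print(min_discount_factor(T=5, R=3, P=1))  # -> 0.5: states with d >= 0.5 can cooperate
```

On this reading, the chapter's claim that security cooperation is harder amounts to saying that effective discount factors on security issues fall below the threshold more often than on economic issues.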

            Extensive form games offer more detail than normal form games discussed above.  Presenting a model in extensive form allows the use of backwards induction or other techniques to find subgame perfect equilibria, which do not depend on incredible threats or commitments by states.  For example, on page 283, the normal form game has two Nash equilibria marked by asterisks.  By looking at the extensive form game on the same page, we see that the “Cooperate with threat – Cooperate” equilibrium relies on a commitment by C to cooperate after R cooperates.  However, this commitment is incredible, since, when C gets to choose, she will want to get a higher payoff and will not cooperate.  Since R knows this (by common knowledge assumption), he will not cooperate in the first place.  Thus, the only predicted (subgame-perfect) equilibrium of this game is mutual non-cooperation.     
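
Backwards induction itself is mechanical. A minimal sketch with invented payoffs (the actual game on p. 283 is not reproduced here): R moves first; if R cooperates, C prefers to defect, so C's promise to cooperate is incredible and R's best reply is not to cooperate.

```python
# A game tree: internal nodes name the player to move and list (action,
# subtree) pairs; leaves hold (R_payoff, C_payoff).  Backwards induction
# picks, at each node, the action maximizing the mover's payoff in the
# induced subgame, yielding the subgame-perfect equilibrium path.
def solve(node):
    if "payoffs" in node:
        return [], node["payoffs"]
    idx = 0 if node["player"] == "R" else 1
    best = None
    for action, child in node["moves"]:
        path, payoffs = solve(child)
        if best is None or payoffs[idx] > best[1][idx]:
            best = ([action] + path, payoffs)
    return best

# Invented payoffs: if R cooperates, C can defect for 3 (vs 2 for keeping
# its promise), leaving R with 0; R's outside option gives both sides 1.
game = {"player": "R", "moves": [
    ("cooperate", {"player": "C", "moves": [
        ("cooperate", {"payoffs": (2, 2)}),
        ("defect",    {"payoffs": (0, 3)}),
    ]}),
    ("not cooperate", {"payoffs": (1, 1)}),
]}

path, payoffs = solve(game)
print(path, payoffs)  # -> ['not cooperate'] (1, 1): C's threat to cooperate is incredible
```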

            Some models experience difficulties describing reality because of simplifying assumptions.  Nevertheless, formal theory allows changing and relaxing such assumptions.  For example, to explain why war occurs, an assumption of complete information must be relaxed and uncertainty introduced.  Similarly, an assumption that states are unitary actors could be relaxed to introduce domestic actors.  Finally, complexity theory allows for change of preferences.