MATH60230: Lecture 11

IV, Bootstrap, and Randomization Inference

Vincent Grégoire

vincent.gregoire@hec.ca

HEC Montréal

Saad Ali Khan

saad-ali.khan@hec.ca

HEC Montréal

Outline

Collaboration: Git and GitHub

Git and GitHub

IV

Instrumental Variables
Potential Errors
IV Example

Bootstrap and Randomization Inference

Motivation
Bootstrap Inference
Randomization Inference
Example in Python

git

A free and open source distributed version control system.
You can think of git as adding “track changes” to your code, but better.
Lets you keep track of multiple working versions of your code.
Lets multiple people collaborate on code.

GitHub

A hosted solution for git, plus more.
Provides a nice interface to git features
Provides a nice set of features for open-source collaboration
And a lot more

How to learn more about GitHub?

Using GitHub for academic research

Introduction to IV

y = x\beta + \tilde{X}\beta_{\tilde{X}} + \varepsilon \quad \text{where} \quad E[\varepsilon|x] \neq 0 \\ \Rightarrow \widehat{\beta}_{OLS} \overset{p}{\rightarrow} \beta + \frac{Cov[x,\varepsilon]}{Var[x]} \ne \beta \quad \text{when} \quad T \rightarrow \infty

In other words, if the regressor x is not orthogonal to \varepsilon, \widehat{\beta}_{OLS} is biased and inconsistent (note that here we care about \beta, the coefficient on x alone, not the vector of all coefficients).

Assume that we can find some other (instrumental) variable that satisfies

x = \gamma z + \tilde{X}\gamma_{\tilde{X}} + \mu \quad \text{where} \quad Cov[z,\varepsilon] = 0, \gamma \ne 0

We can isolate variation in the determination of x through z that is unrelated to the main relationship we are studying (the effect of x on y).
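
The setup above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: there are no additional covariates \tilde{X}, and the parameter values, the seed, and the unobserved confounder `u` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta, gamma = 1.0, 0.5            # illustrative values, not from the lecture

z = rng.normal(size=n)            # instrument
u = rng.normal(size=n)            # unobserved confounder (hypothetical)
eps = u + rng.normal(size=n)      # structural error, contains the confounder
x = gamma * z + u + rng.normal(size=n)   # x depends on z and on the confounder
y = beta * x + eps


def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]


ols_slope = cov(x, y) / np.var(x, ddof=1)
predicted_limit = beta + cov(x, eps) / np.var(x, ddof=1)

print(f"Cov[x, eps] = {cov(x, eps):.3f}  (non-zero: x is endogenous)")
print(f"Cov[z, eps] = {cov(z, eps):.3f}  (close to zero: z is excluded)")
print(f"OLS slope                 = {ols_slope:.3f}")
print(f"beta + Cov[x, eps]/Var[x] = {predicted_limit:.3f}")
```

The OLS slope matches \beta + Cov[x,\varepsilon]/Var[x] rather than \beta, while z remains essentially uncorrelated with the error.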

Introduction to IV

We need two assumptions for a valid IV
- Relevance — There must be correlation between z and x conditional on all other variables in a system i.e. \gamma \ne 0
- Exclusion — The variable can be excluded from the main equation of interest.

Two important parts to this assumption:
- The only relationship between z and y is through the first stage relationship
- Conditional on covariates (\tilde{X}), the instrument is as good as randomly assigned

IV Variation

How does this work?

Typically done as two-stage least squares (2SLS)
- Regress x on z (including all other variables in the main equation)
- Compute the fitted values \hat{x} using the estimated coefficients
- Estimate the second stage model with \hat{x} in place of x

Can also be done as a GMM system

\widehat{\beta}_{2SLS} \overset{p}{\rightarrow} \frac{Cov[z,y]}{Cov[z,x]} = \beta + \frac{Cov[z,\varepsilon]}{Cov[z,x]} \quad \text{when} \quad T \rightarrow \infty

Under the exclusion restriction Cov[z,\varepsilon] = 0, so the second term vanishes and the IV estimator is consistent
The estimator is biased in finite samples, but more on this later
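
The two-stage procedure can be sketched by hand with plain least squares. This is a minimal illustration on simulated data, not the estimation routine used later in the course. The data-generating process, the control `w` (standing in for \tilde{X}), and the helper `ols` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
beta = 1.0                        # illustrative true effect

z = rng.normal(size=n)            # instrument
w = rng.normal(size=n)            # exogenous control (plays the role of X-tilde)
u = rng.normal(size=n)            # unobserved confounder
x = 0.5 * z + 0.3 * w + u + rng.normal(size=n)
y = beta * x + 0.7 * w + u + rng.normal(size=n)


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# Stage 1: regress x on the instrument and all exogenous controls.
Z1 = np.column_stack([const, z, w])
x_hat = Z1 @ ols(Z1, x)

# Stage 2: replace x with its fitted value, keeping the same controls.
X2 = np.column_stack([const, x_hat, w])
beta_2sls = ols(X2, y)[1]

# Plain OLS on the endogenous x, for comparison.
beta_ols = ols(np.column_stack([const, x, w]), y)[1]

print(f"OLS  estimate: {beta_ols:.3f}")
print(f"2SLS estimate: {beta_2sls:.3f}  (true beta = {beta})")
```

The point estimate from the manual second stage is the 2SLS estimate, but the standard errors it would report are not valid, which is one reason to use a dedicated IV routine in practice.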

Where do good instruments come from?

IV originally developed as a technique to estimate systems of equations (e.g. supply and demand for oranges)
- Using rainfall to instrument supply in order to isolate perturbations in quantity and price along a demand curve
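
A stylized version of the supply and demand example can be simulated to show what the rainfall instrument buys. The linear market below is hypothetical: the functional forms and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical linear market (all parameter values are illustrative):
#   demand: q = 10 - 1.0 * p + u_d
#   supply: q =  2 + 1.0 * p + 1.0 * rain + u_s
b_demand, d_supply, g_rain = 1.0, 1.0, 1.0
rain = rng.normal(size=n)         # shifts supply only
u_d = rng.normal(size=n)          # demand shock
u_s = rng.normal(size=n)          # supply shock

# Equilibrium price and quantity solve demand = supply.
p = (10 - 2 - g_rain * rain + u_d - u_s) / (b_demand + d_supply)
q = 10 - b_demand * p + u_d


def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]


ols_slope = cov(p, q) / cov(p, p)         # mixes supply and demand
iv_slope = cov(rain, q) / cov(rain, p)    # uses only rain-driven price changes

print(f"True demand slope: {-b_demand:.2f}")
print(f"OLS slope of q on p: {ols_slope:.3f}")
print(f"IV slope using rain: {iv_slope:.3f}")
```

Because rainfall moves the supply curve but not the demand curve, the price changes it induces trace out the demand curve, and the IV ratio recovers the demand slope.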

Good instruments have a credible economic link for relevance, and a logical reason for exclusion

Relevance can be tested since it is a partial correlation
Exclusion cannot be tested, so it must be argued on the basis of logical reasoning

Common (good) instruments include physical events, institutional changes, etc.

Common bad instruments include lagged variables and group averages excluding an individual member

More on the Exclusion Restriction

The exclusion assumption cannot be tested
- We never observe the true errors of a model, so we cannot test whether they are correlated with our instrument
- Moreover, the estimated residuals will always be orthogonal to all covariates in a regression, so we cannot “test” whether a potentially endogenous variable is correlated with the error of a regression
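
The second point can be checked numerically. In the sketch below (simulated data, illustrative parameter values), x is correlated with the true error by construction, yet the OLS residuals are orthogonal to x, so the residuals carry no information about the endogeneity.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
u = rng.normal(size=n)                # unobserved confounder
x = u + rng.normal(size=n)            # endogenous regressor
eps = u + rng.normal(size=n)          # true error, correlated with x
y = 1.0 * x + eps

X = np.column_stack([np.ones(n), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ coef                  # estimated residuals

corr_true = np.corrcoef(x, eps)[0, 1]
corr_resid = np.corrcoef(x, resid)[0, 1]

print(f"Corr[x, true error]    = {corr_true:.3f}")
print(f"Corr[x, OLS residuals] = {corr_resid:.2e}")
```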

Researchers need to come up with supporting evidence that the exclusion restriction might hold

Placebo tests can be helpful

Maybe there is a region or time period when we think an effect shouldn’t be present
Are there other outcomes where confounding stories would have implications that can be tested?

More on IV

IV is consistent, but biased in finite samples towards the OLS estimate
- Basically because the first stage is estimated (with noise) there is bias unless the sample is really large

Since (asymptotically) \widehat{\beta}_{2SLS} = \beta + \frac{Cov[z,\varepsilon]}{Cov[z,x]}, any finite-sample correlation between z and \varepsilon is divided by the strength of the instrument, so it is really important to have a strong instrument

Adding more weak instruments makes the problem worse
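
A small Monte Carlo can illustrate how extra weak instruments pull 2SLS towards OLS. This is a sketch under assumed settings: the sample size, number of replications, coefficients, and the choice to report medians are all illustrative, and the variables are mean zero so no constant is included.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, beta = 200, 500, 1.0         # illustrative settings


def one_draw(n_instruments):
    """Return (OLS, 2SLS) slope estimates for one simulated sample."""
    Z = rng.normal(size=(n, n_instruments))
    v = rng.normal(size=n)
    eps = 0.8 * v + 0.6 * rng.normal(size=n)   # Cov[v, eps] = 0.8
    x = 0.3 * Z[:, 0] + v             # only the first instrument is relevant
    y = beta * x + eps
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first stage
    b_ols = (x @ y) / (x @ x)
    b_2sls = (x_hat @ y) / (x_hat @ x)
    return b_ols, b_2sls


def median_estimates(n_instruments):
    """Median OLS and 2SLS estimates across replications."""
    draws = np.array([one_draw(n_instruments) for _ in range(reps)])
    return np.median(draws, axis=0)


ols_1, iv_1 = median_estimates(1)
ols_20, iv_20 = median_estimates(20)

print(f"True beta: {beta}")
print(f"1 instrument:   OLS {ols_1:.3f}, 2SLS {iv_1:.3f}")
print(f"20 instruments: OLS {ols_20:.3f}, 2SLS {iv_20:.3f}")
```

With 19 irrelevant instruments added, the first stage overfits x, so part of the endogenous variation leaks into \hat{x} and the 2SLS estimate moves away from the truth in the direction of OLS.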

Several papers suggest having a first-stage F-statistic on the instrument of greater than 10 or so.
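
For a single instrument, the first-stage F-statistic is the squared t-statistic on the instrument. The helper below, `first_stage_f`, is a hypothetical function written for this illustration. It assumes homoskedastic errors, and the simulated data and coefficients are made up.

```python
import numpy as np


def first_stage_f(x, z, controls):
    """F-statistic for H0: the coefficient on z is zero in the first stage.

    With a single instrument this equals the squared t-statistic.
    Homoskedastic standard errors are assumed.
    """
    n = len(x)
    X = np.column_stack([np.ones(n), z, controls])
    coef = np.linalg.lstsq(X, x, rcond=None)[0]
    resid = x - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] ** 2 / cov[1, 1]


rng = np.random.default_rng(5)
n = 500
w = rng.normal(size=n)                # exogenous control
z = rng.normal(size=n)                # instrument
noise = rng.normal(size=n)

x_strong = 0.5 * z + 0.3 * w + noise  # instrument moves x a lot
x_weak = 0.05 * z + 0.3 * w + noise   # instrument barely moves x

f_strong = first_stage_f(x_strong, z, w)
f_weak = first_stage_f(x_weak, z, w)

print(f"First-stage F, strong instrument: {f_strong:.1f}")
print(f"First-stage F, weak instrument:   {f_weak:.1f}")
```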

Best practices are evolving

Inference with weak (but valid) instruments

Imposing filters (like an F-statistic cutoff) can induce distortions in which specifications/magnitudes are reported
- Weak instruments are only a problem when there is a violation of the exclusion restriction, so filters will rule out good instruments that have low power but would otherwise identify useful causal magnitudes
- The distortion is similar to the one created by p-value cutoffs

Distribution of F-Statistics in AER publications 2014–2018

Over-Identification

Consider only one endogenous variable in an equation. If there is more than one instrument for the variable, the equation is over-identified. In this case there is a “test” of whether one instrument provides different estimates from another
- BUT they could all be bad instruments…
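
One common over-identification statistic is the Sargan statistic: n times the R^2 from regressing the 2SLS residuals on the instruments. The sketch below is an illustration on simulated data with two instruments and one endogenous variable. The function `tsls_sargan`, the parameter values, and the three scenarios are assumptions made for the example, and homoskedasticity is assumed.

```python
import math
import numpy as np


def tsls_sargan(y, x, Z):
    """2SLS slope and Sargan statistic for one endogenous x and instruments Z.

    All variables are assumed to have mean zero, so no constant is included.
    """
    n = len(y)
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    b = (x_hat @ y) / (x_hat @ x)
    resid = y - b * x                           # residuals use the actual x
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = (fitted @ fitted) / (resid @ resid)    # uncentred R^2
    return b, n * r2


def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


rng = np.random.default_rng(6)
n = 5_000
z1, z2, u = rng.normal(size=(3, n))
Z = np.column_stack([z1, z2])
x = 0.5 * z1 + 0.5 * z2 + u + rng.normal(size=n)
eps = u + rng.normal(size=n)

scenarios = {
    "both valid": eps,
    "z2 invalid": eps + 0.3 * z2,
    "both invalid": eps + 0.3 * z1 + 0.3 * z2,
}
results = {}
for name, error in scenarios.items():
    b, j = tsls_sargan(1.0 * x + error, x, Z)   # true beta = 1
    results[name] = (b, j)
    print(f"{name:>12}: beta = {b:.3f}, J = {j:.2f}, p = {chi2_1_pvalue(j):.3f}")
```

When only one instrument is invalid the two instruments disagree and the statistic is large. When both are invalid in the same way they agree with each other, so the test has nothing to detect even though the estimate is far from the true value of 1.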

IV example — Snow and Leverage

Giroud et al. (2012) (GMSW) study whether reducing debt overhang increases firm performance
- Important question, but tough to find exogenous variation in debt forgiveness

The authors look at “unexpected” changes in snow on Austrian ski resorts

Look within the set of firms that had a debt restructuring to try and identify those that were strategic defaulters — i.e. those firms that defaulted despite having “favorable” circumstances

Economic story — those firms that had renegotiated and had unexpected good snow likely underinvested or were lazy, whereas those that had bad snow were more likely liquidity defaulters

Possible if lenders cannot ex-ante credibly commit to ex-post inefficient liquidation
Obvious alternative story is that managers who default despite good snow are bad managers. Authors attempt to address this.

GMSW IV story

Note that their story is not about the amount or level of snow, but the unexpected snow

Economic setting argues relevance

Exclusion restriction ((1) that conditional on covariates the variation is random and (2) that strategic debt relief is the only channel at work) relies on a few points
1. Primary analysis is on restructuring firms — so this looks within the set of restructuring firms and isolates variation in how the debt was restructured, so alternate stories have to explain variation within these firms
2. Authors control for the amount of snow (and this loads in the expected direction), so alternative explanations cannot be about the amount of snow

The authors start with OLS regressions and find that an increase in leverage is correlated with an increase in ROA
Next, they instrument the change in leverage with abnormally high or low snowfall in recent years
Finally, they show the second stage results

IV estimation

The first-stage coefficient is negative, consistent with the economic justification of the instrument
The correlation is strong: the F-statistic for the test that the first-stage coefficient is zero is 10.3, which matters for ensuring that the relevance condition is met

OLS

IV Second Stage

IV results flip sign compared to OLS results, suggesting that IV approach was important
The authors interpret their findings as evidence that restructuring caused by strategic defaults leads to better ROA because managers/shareholders are better incentivized

More on the Exclusion Restriction and bias

We said before that exclusion assumption cannot be tested
- We never observe the true errors of a model, so we cannot test whether they are correlated with our instrument
- Moreover, the estimated residuals will always be orthogonal to all covariates in a regression, so we cannot “test” whether a potentially endogenous variable is correlated with the error of a regression

Researchers need to come up with supporting evidence that the exclusion restriction might hold

Since GMSW’s sample is small, finite-sample bias could be a problem. However, they find an IV estimate with the opposite sign from OLS and their instrument is strong, so this is less of a concern (recall that small departures from exogeneity are a problem with small samples and weak instruments)

Last points about general IV

The first stage should be linear in order to ensure consistent second stage estimates
- Binary endogenous variables should NOT be estimated via probit/logit
All variables in the second stage must be included in the first stage, otherwise estimates are inconsistent
Statistical inference in the second stage must be based on residuals computed with the actual x, not the fitted \hat{x} (the linearmodels package does this automatically)
IV can also be used to correct for measurement error if it is a problem and you have a plausible instrument
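
The point about second-stage inference can be made concrete. In this sketch (simulated data, illustrative parameters, homoskedastic errors assumed), the naive residuals from a hand-rolled second stage use \hat{x}, while the correct residuals plug the actual x into the estimated equation.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
z, u = rng.normal(size=(2, n))
x = 0.5 * z + u + rng.normal(size=n)
eps = u + rng.normal(size=n)              # Var[eps] = 2 by construction
y = 1.0 * x + eps

# First stage and fitted values.
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Second stage on the fitted values.
X_hat = np.column_stack([np.ones(n), x_hat])
a, b = np.linalg.lstsq(X_hat, y, rcond=None)[0]

resid_naive = y - a - b * x_hat           # what a manual second stage reports
resid_correct = y - a - b * x             # residuals evaluated at the actual x

var_naive = resid_naive @ resid_naive / (n - 2)
var_correct = resid_correct @ resid_correct / (n - 2)

xtx_inv = np.linalg.inv(X_hat.T @ X_hat)
se_naive = np.sqrt(var_naive * xtx_inv[1, 1])
se_correct = np.sqrt(var_correct * xtx_inv[1, 1])

print(f"2SLS estimate: {b:.3f}")
print(f"Residual variance, naive:   {var_naive:.3f}")
print(f"Residual variance, correct: {var_correct:.3f}  (true value is 2)")
print(f"Standard error, naive:   {se_naive:.4f}")
print(f"Standard error, correct: {se_correct:.4f}")
```

In this particular design the naive standard error is too large, but the direction of the error depends on the data-generating process, so the naive number should not be trusted either way.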

Bootstrap and Randomization Inference

Motivation

In many empirical settings, standard asymptotic inference may be unreliable:

Small samples — asymptotic approximations may not hold
Non-standard distributions — test statistics may not be normally distributed
Complex dependence structures — standard errors may be difficult to compute analytically

Simulation-based methods offer an alternative by constructing the distribution of a test statistic empirically rather than relying on theoretical approximations.

See Rosenbaum (2010) and Imbens and Rubin (2015) for detailed discussions.

Bootstrap Inference

The bootstrap (Efron 1979) estimates the sampling distribution of a statistic by resampling with replacement from the observed data.

Procedure:

Compute the statistic of interest \hat{\tau} from the original sample
Draw B bootstrap samples (same size, with replacement)
Compute \tilde{\tau}_b for each bootstrap sample b = 1, \ldots, B
Use the distribution of \{\tilde{\tau}_1, \ldots, \tilde{\tau}_B\} for inference

Confidence intervals: Use the quantiles of the bootstrap distribution, e.g., the 2.5th and 97.5th percentiles for a 95% CI.

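p-values: Impose the null by recentring the bootstrap distribution at \hat{\tau}, then report the fraction of recentred statistics with |\tilde{\tau}_b - \hat{\tau}| at least as large as |\hat{\tau}|.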

Bootstrap — Intuition

The key idea: the empirical distribution \hat{F}(x) of the observed data approximates the true population distribution F(x).

Drawing with replacement from the sample is equivalent to drawing from \hat{F}(x)
Each bootstrap sample reflects the variability we would expect if we could resample from the population
The bootstrap distribution of \tilde{\tau} approximates the sampling distribution of \hat{\tau}

Works well when:

The sample is representative of the population
The statistic is smooth (e.g., means, regression coefficients)

Can fail when the sample is very small or the statistic depends on extreme values.
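
The contrast between a smooth statistic and one that depends on extreme values can be seen in a quick experiment. The sketch below uses simulated uniform data with an arbitrary sample size and seed, and bootstraps both the sample mean and the sample maximum.

```python
import numpy as np

rng = np.random.default_rng(8)
n, B = 100, 5_000
sample = rng.uniform(0, 1, size=n)

# Draw all bootstrap samples at once: each row is one resample of size n.
idx = rng.integers(0, n, size=(B, n))
boot = sample[idx]

boot_mean = boot.mean(axis=1)
boot_max = boot.max(axis=1)

# Smooth statistic: bootstrap spread matches the usual standard error.
print(f"Bootstrap SD of the mean: {boot_mean.std():.4f}")
print(f"Sample SD / sqrt(n):      {sample.std() / np.sqrt(n):.4f}")

# Extreme-value statistic: the bootstrap keeps returning the same number.
share_at_sample_max = np.mean(boot_max == sample.max())
print(f"Share of bootstrap maxima equal to the sample max: {share_at_sample_max:.3f}")
```

A resample contains the largest observation with probability 1 - (1 - 1/n)^n, roughly 63% for large n, so the bootstrap distribution of the maximum is a large point mass at the observed maximum rather than an approximation to its sampling distribution.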

Randomization Inference

Randomization inference (also called permutation tests or random shuffles) tests whether an observed relationship is statistically significant by breaking the association between variables.

Procedure (e.g., testing whether Y is related to X):

Compute the statistic \hat{\tau} from the original data \{Y_t, X_t\}
Randomly shuffle \{Y_t\} (without replacement) to create \{\tilde{Y}_t\}
- \tilde{Y}_t is now independent of X_t
Compute \tilde{\tau} from \{\tilde{Y}_t, X_t\}
Repeat N times to get \{\tilde{\tau}_1, \ldots, \tilde{\tau}_N\}

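Decision rule (5% level): Reject the null of no association if \hat{\tau} < q_{2.5\%}(\tilde{\tau}) or \hat{\tau} > q_{97.5\%}(\tilde{\tau}).

p-value: A two-sided p-value is the fraction of shuffled statistics with |\tilde{\tau}_i| \geq |\hat{\tau}|.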

Bootstrap vs. Randomization Inference

Aspect            Bootstrap                            Randomization
Resampling        With replacement                     Without replacement (shuffle)
What it tests     Sampling uncertainty                 Whether the association could arise by chance
Null hypothesis   Varies (e.g., \beta = 0)             No association between variables
Preserves         Joint distribution of (Y, X) pairs   Marginal distributions, sample size
Breaks            Nothing (resamples pairs)            The association between Y and X

Both methods are non-parametric: they make no parametric assumptions about the distribution of the data.

Using both provides complementary evidence: the bootstrap reflects sampling uncertainty, while randomization inference tests whether the observed association is distinguishable from chance.

Example: Bootstrap and Randomization in OLS

Consider a small sample (n = 30) where we estimate Y = \alpha + \beta X + \varepsilon.

import numpy as np
import statsmodels.api as sm

np.random.seed(42)
n = 30
X = np.random.normal(0, 1, n)
epsilon = np.random.normal(0, 2, n)
Y = 1.0 + 0.8 * X + epsilon  # True beta = 0.8

X_const = sm.add_constant(X)
ols_result = sm.OLS(Y, X_const).fit()
beta_hat = ols_result.params[1]
print(f"OLS estimate: β̂ = {beta_hat:.4f}")
print(f"Asymptotic p-value (H₀: β=0): {ols_result.pvalues[1]:.4f}")

OLS estimate: β̂ = 1.0045
Asymptotic p-value (H₀: β=0): 0.0154

Example: Bootstrap Confidence Interval

B = 10_000
boot_betas = np.empty(B)

for b in range(B):
    idx = np.random.choice(n, size=n, replace=True)
    X_b, Y_b = X_const[idx], Y[idx]
    boot_betas[b] = sm.OLS(Y_b, X_b).fit().params[1]

ci_lower, ci_upper = np.percentile(boot_betas, [2.5, 97.5])
boot_p = np.mean(np.abs(boot_betas - beta_hat) >= np.abs(beta_hat))
print(f"Bootstrap 95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f"Bootstrap p-value (H₀: β=0): {boot_p:.4f}")

Bootstrap 95% CI: [0.2242, 1.7431]
Bootstrap p-value (H₀: β=0): 0.0132

Example: Randomization Inference

N_shuffles = 10_000
shuffle_betas = np.empty(N_shuffles)

for i in range(N_shuffles):
    Y_shuffled = np.random.permutation(Y)
    shuffle_betas[i] = sm.OLS(Y_shuffled, X_const).fit().params[1]

# Two-sided p-value: absolute values already count both tails, so no doubling
rand_p = np.mean(np.abs(shuffle_betas) >= np.abs(beta_hat))
print(f"Randomization p-value (H₀: β=0): {rand_p:.4f}")

Randomization p-value (H₀: β=0): 0.0149

Example: Comparing the Distributions
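
One way to compare the two simulated distributions is through their centre and spread. The sketch below is self-contained and regenerates data from the same model with a different random generator and seed, so its numbers will not match the output shown earlier. The helper `slope` and the number of draws are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
n, draws = 30, 5_000
X = rng.normal(0, 1, n)
Y = 1.0 + 0.8 * X + rng.normal(0, 2, n)   # same model as the example, new draws


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)


beta_hat = slope(X, Y)

boot = np.empty(draws)
shuffled = np.empty(draws)
for i in range(draws):
    idx = rng.integers(0, n, size=n)
    boot[i] = slope(X[idx], Y[idx])               # resample (X, Y) pairs
    shuffled[i] = slope(X, rng.permutation(Y))    # break the X-Y link

print(f"OLS estimate: {beta_hat:.3f}")
print(f"Bootstrap:     median {np.median(boot):.3f}, SD {boot.std():.3f}")
print(f"Randomization: median {np.median(shuffled):.3f}, SD {shuffled.std():.3f}")
```

The bootstrap distribution is centred near \hat{\beta} and describes sampling uncertainty around the estimate, whereas the randomization distribution is centred near zero and describes what the estimate would look like if there were no association.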

References

Efron, Bradley. 1979. “Bootstrap Methods: Another Look at the Jackknife.” The Annals of Statistics 7 (1): 1–26.

Giroud, Xavier, Holger M Mueller, Alex Stomper, and Arne Westerkamp. 2012. “Snow and Leverage.” The Review of Financial Studies 25 (3): 680–710.

Imbens, Guido W, and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.

Rosenbaum, Paul R. 2010. Design of Observational Studies. Vol. 10. Springer.