Then we run one regression on our stacked data and get one series of residuals \{\hat{u}_{1}, \ldots, \hat{u}_{N \cdot T}\}
RSS_R \equiv \sum_{t=1}^{N \cdot T} \hat{u}_{t}^2
We can then calculate the usual F-statistic F = \frac{RSS_R - RSS_U}{RSS_U} \cdot \frac{N \cdot (T - k)}{R} \sim F(R,N \cdot (T - k)) \; \text{under} \; H_0, where k = 1 + the number of variables in stuff.
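As a quick arithmetic check, the statistic can be computed directly from the two residual sums of squares. This is only a sketch: the function name `pooling_f_stat` and every number passed to it (the RSS values, N, T, k and R) are made up for illustration and do not come from the lecture's data.

```python
def pooling_f_stat(rss_r, rss_u, n_firms, n_periods, k, n_restrictions):
    """F = (RSS_R - RSS_U) / RSS_U * N*(T - k) / R, as in the formula above."""
    df_denominator = n_firms * (n_periods - k)  # N * (T - k)
    if df_denominator <= 0:
        raise ValueError("Need T > k so that the denominator d.o.f. are positive.")
    return (rss_r - rss_u) / rss_u * df_denominator / n_restrictions


# Hypothetical inputs: N = 10 firms, T = 12 periods, k = 2, R = 18 restrictions.
f_stat = pooling_f_stat(rss_r=120.0, rss_u=100.0,
                        n_firms=10, n_periods=12, k=2, n_restrictions=18)
print(f"F = {f_stat:.4f}")
```

Under H_0 this value would be compared with the critical value of an F(R, N \cdot (T - k)) distribution.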
Omitted Variables Bias
A Short Digression
Recall our review of OLS, where we mentioned an alternative way to estimate \widehat{\beta}_{i}:
Regress y on all the other variables X_{j}\neq X_{i} and save the residuals \widehat{\varepsilon}_{y}.
Regress X_{i} on all the other variables X_{j}\neq X_{i} and save the residuals \widehat{\varepsilon}_{x}.
\widehat{\beta}_{i} is the coefficient of the regression of \widehat{\varepsilon}_{y} on \widehat{\varepsilon}_{x}.
This shows that \widehat{\beta}_{i} captures the effect of X_{i} on y that cannot be accounted for by any other variables X_{j}.
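The three steps above can be checked numerically. The following is a minimal sketch on simulated data (the variable names, coefficients and sample size are arbitrary assumptions): the coefficient from the residual-on-residual regression should match the one from the full regression up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data: x2 is correlated with x1, so "partialling out" matters.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
const = np.ones(n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)


def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Full regression of y on (constant, x1, x2); the coefficient on x1 is index 1.
beta_full = ols(np.column_stack([const, x1, x2]), y)

# Step 1: regress y on all the other variables and keep the residuals.
X_other = np.column_stack([const, x2])
resid_y = y - X_other @ ols(X_other, y)

# Step 2: regress x1 on all the other variables and keep the residuals.
resid_x = x1 - X_other @ ols(X_other, x1)

# Step 3: regress the y-residuals on the x1-residuals.
beta_partial = ols(resid_x[:, None], resid_y)[0]

print(beta_full[1], beta_partial)
```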
Problem
What if we forgot to include an important variable X_{j}?
Then we might have Omitted Variables Bias!
Suppose that the true model is y_{t}=\beta_{X}\cdot X_{t}+\beta_{Z}\cdot Z_{t}+e_{t}
but we omit Z_{t} and instead use OLS to estimate y_{t}=\beta_{X}\cdot X_{t}+e_{t}
Will \widehat{\beta}_{X} be biased?
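A small simulation can illustrate the answer: in general yes, whenever \beta_{Z} \neq 0 and X_{t} is correlated with Z_{t}. The sketch below uses made-up parameter values. It also checks the in-sample identity that the short-regression coefficient equals the long-regression coefficient on X plus the long-regression coefficient on Z times the slope from regressing Z on X.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
beta_x, beta_z = 1.0, 2.0  # hypothetical true coefficients

# Z is correlated with X by construction.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = beta_x * x + beta_z * z + rng.normal(size=n)


def ols(X, y):
    """OLS coefficients via least squares (no constant, as in the model above)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


b_long = ols(np.column_stack([x, z]), y)  # true model: both X and Z included
b_short = ols(x[:, None], y)[0]           # Z omitted
delta = ols(x[:, None], z)[0]             # slope from regressing Z on X

# Exact in-sample decomposition of the short-regression coefficient.
implied_short = b_long[0] + b_long[1] * delta

print(f"long: {b_long[0]:.3f}  short: {b_short:.3f}  implied: {implied_short:.3f}")
```

If Z were uncorrelated with X, `delta` would be close to zero and the two estimates of \beta_{X} would roughly agree.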
We might think that many things other than CF influence I:
Firms in capital-intensive industries (e.g. power generation, aerospace manufacturing) might invest more than those in other industries (e.g. fast-food franchises).
Firms might invest much less than usual in recessions (low demand) or more than usual when interest rates are low.
If we don’t add additional variables to control for these effects, our estimates of \beta may be biased.
Is CF_{it} correlated with the missing variable(s)?
That’s why we had stuff included in the above panel regressions.
Fixed Effects
Problem
What if we don’t have data on the missing variables?
Solution
In many (not all) cases, we can add a set of dummy variables.
We call this Fixed Effects.
Firm (or Entity) Fixed Effects
Case 1
The missing variables are constant over t but not i.
We can add N dummy variables, one for each value of i.
They will capture the overall effect of all omitted variables that are constant over time.
Example
If I_{it} depends on CF_{it} and on the firm’s industry, these dummy variables will capture the latter effect.
Time Fixed Effects
Case 2
The missing variables are constant over i but not t.
We can add T dummy variables, one for each time period t.
We call this Time Fixed Effects.
They will capture the overall effect of all omitted variables that are the same for all i.
Example
If I_{it} depends on CF_{it} and on GDP growth or the output gap, these dummy variables will capture those effects.
Firm and Time Fixed Effects
Case 3
Some missing variables are constant over i, others over t.
We can add N+T dummy variables, one for each time period t and one for each value of i.
We have more than enough observations to estimate both! (N \cdot T \gg N + T)
Fixed Effects
Some points to note:
Including all N or T dummies means that we can’t also include a constant! If we include both sets, we must also drop one dummy, because the N firm dummies and the T time dummies each sum to one.
Including Time Fixed Effects means that we can’t include any variables that only vary through time (e.g. T-bill rates, returns on the market portfolio, etc.).
We could include a smaller number of dummies if we want (one per zip code? SIC code? year?).
Because we’re interested in \beta, we don’t usually report the (many!) coefficients on the dummies.
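The collinearity points above can be seen by checking the rank of the regressor matrix. This is a minimal sketch on a tiny balanced panel (the sizes and the `rate` values are made up): whenever the rank is below the number of columns, OLS cannot separately identify all the coefficients.

```python
import numpy as np

n_firms, n_periods = 4, 3
firm = np.repeat(np.arange(n_firms), n_periods)  # 0,0,0,1,1,1,...
period = np.tile(np.arange(n_periods), n_firms)  # 0,1,2,0,1,2,...

# One dummy column per firm and one per time period.
firm_dummies = (firm[:, None] == np.arange(n_firms)).astype(float)
time_dummies = (period[:, None] == np.arange(n_periods)).astype(float)
const = np.ones((n_firms * n_periods, 1))

# A variable that only varies through time (hypothetical interest rates).
rate = np.array([0.010, 0.020, 0.015])[period][:, None]

rank_firm = np.linalg.matrix_rank(firm_dummies)
rank_firm_const = np.linalg.matrix_rank(np.hstack([const, firm_dummies]))
rank_two_way = np.linalg.matrix_rank(np.hstack([firm_dummies, time_dummies]))
rank_time_rate = np.linalg.matrix_rank(np.hstack([time_dummies, rate]))

print("firm dummies:        ", rank_firm, "of", n_firms, "columns")
print("constant + firm:     ", rank_firm_const, "of", n_firms + 1, "columns")
print("firm + time dummies: ", rank_two_way, "of", n_firms + n_periods, "columns")
print("time dummies + rate: ", rank_time_rate, "of", n_periods + 1, "columns")
```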
The Fixed Effects model looks something like I_{it}=\alpha_{i}+\beta\cdot CF_{it}+e_{it}
This is equivalent to using OLS to estimate I_{it}=\left( \sum_{j=1}^N\alpha_{j}\cdot D_{j}\right) +\beta\cdot CF_{it}+e_{it}
where D_{j}=1 for firm j and =0 otherwise.
Fact
That’s not the way Fixed Effects estimates are usually programmed, however.
When N is large, our X'X matrix is large.
Inverting large matrices is relatively slow.
To avoid this, we replace each variable with the deviation from its average value over time: \tilde{y}_{it} \equiv y_{it} - T^{-1} \cdot \left( \sum_{s=1}^T y_{is} \right)
OLS on \tilde{I}_{it}= \beta\cdot \tilde{CF}_{it}+e_{it} will give the same \hat{\beta}, \hat{e}_{it}, etc. as the dummy-variable regression above, but much faster!
That’s one reason why we like to use code optimized for Panel Data.
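The equivalence between the dummy-variable regression and the demeaned regression can be verified on simulated data. The sketch below uses plain NumPy rather than a dedicated panel package, and all names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, n_periods = 50, 8
firm = np.repeat(np.arange(n_firms), n_periods)
n_obs = n_firms * n_periods

# Simulated panel: the firm effect alpha_i drives both CF and I.
alpha = rng.normal(size=n_firms)
cf = alpha[firm] + rng.normal(size=n_obs)
inv = alpha[firm] + 0.5 * cf + rng.normal(size=n_obs)

# Approach 1: OLS with N firm dummies and no constant.
D = (firm[:, None] == np.arange(n_firms)).astype(float)
X_dummy = np.column_stack([D, cf])
coef_dummy = np.linalg.lstsq(X_dummy, inv, rcond=None)[0]
beta_dummy = coef_dummy[-1]              # coefficient on CF
resid_dummy = inv - X_dummy @ coef_dummy


# Approach 2: subtract each firm's time average, then run OLS on one regressor.
def demean_by_firm(v):
    firm_means = np.bincount(firm, weights=v) / np.bincount(firm)
    return v - firm_means[firm]


cf_tilde = demean_by_firm(cf)
inv_tilde = demean_by_firm(inv)
beta_within = (cf_tilde @ inv_tilde) / (cf_tilde @ cf_tilde)
resid_within = inv_tilde - beta_within * cf_tilde

print(beta_dummy, beta_within)
```

The second approach never builds the N dummy columns, which is where the speed gain comes from when N is large.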
Estimating Panel Data Models
Python package for estimating linear models in finance and economics:
But what happens if errors are correlated within each firm i or time period t?
Clustered Standard Errors
Suppose that the data have an unobserved firm effect that is fixed:
X_{i,t} = \alpha_i + \nu_{i,t}.
The residuals can be specified as
\varepsilon_{i,t} = \gamma_i + \eta_{i,t}.
Both the independent variable and the residual are correlated across observations of the same firm, but are independent across firms:
\begin{aligned}
corr(X_{i,t}, X_{j,s}) &= 1 \text{ for } i=j \text{ and } t=s\\
&= \rho_{X} = \sigma^2_{\alpha} / \sigma^2_X \text{ for } i=j \text{ and all } t\neq s\\
&= 0 \text{ for all } i\neq j
\end{aligned}
\begin{aligned}
corr(\varepsilon_{i,t}, \varepsilon_{j,s}) &= 1 \text{ for } i=j \text{ and } t=s\\
&= \rho_{\varepsilon} = \sigma^2_{\gamma} / \sigma^2_{\varepsilon} \text{ for } i=j \text{ and all } t\neq s\\
&= 0 \text{ for all } i\neq j.
\end{aligned}
Since the autocorrelations can be positive or negative, it is possible for the OLS standard error to under- or overestimate the true standard error.
The correlation of the residuals within a cluster is the problem that clustered standard errors are designed to correct.
This correlation can be of any form; no parametric structure is assumed. However, the squared sum of X_{i,t}\varepsilon_{i,t} is assumed to have the same distribution across the clusters.
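To make the idea concrete, here is a minimal sketch of standard errors clustered by firm, written in plain NumPy on simulated data with a firm effect in both X and the residual. The parameter values are made up, and the estimator shown is the basic sandwich form without any finite-sample adjustment, which many software implementations add.

```python
import numpy as np

rng = np.random.default_rng(3)
n_firms, n_periods = 200, 10
firm = np.repeat(np.arange(n_firms), n_periods)
n_obs = n_firms * n_periods

# Firm effects in both the regressor and the residual (rho_X = rho_eps = 0.5).
x = rng.normal(size=n_firms)[firm] + rng.normal(size=n_obs)
eps = rng.normal(size=n_firms)[firm] + rng.normal(size=n_obs)
y = 1.0 * x + eps

X = np.column_stack([np.ones(n_obs), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat

# Conventional OLS standard errors (assume i.i.d. errors).
sigma2 = u_hat @ u_hat / (n_obs - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# Clustered by firm: sum X_it * u_it within each firm, then take outer products.
scores = X * u_hat[:, None]
cluster_sums = np.zeros((n_firms, X.shape[1]))
np.add.at(cluster_sums, firm, scores)    # row g holds the sum over firm g
meat = cluster_sums.T @ cluster_sums     # sum over firms of s_g s_g'
V_cluster = XtX_inv @ meat @ XtX_inv
se_cluster = np.sqrt(np.diag(V_cluster))

print("OLS SE on x:      ", se_ols[1])
print("Clustered SE on x:", se_cluster[1])
```

With positive within-firm correlation in both X and the residual, the clustered standard error on the slope should come out larger than the conventional one in this simulation.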
Petersen, Mitchell A. 2008. “Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches.” The Review of Financial Studies 22 (1): 435–80.