18  Tables for Research

18.1 Introduction

Tables are fundamental to empirical research in finance. Whether you’re presenting summary statistics of your data, reporting regression results, or comparing model performance across specifications, clear and well-formatted tables communicate your findings effectively to readers. In this chapter, you’ll learn how to create publication-quality tables in Python and export them to formats suitable for academic papers.

The objectives of this chapter are to:

  1. Create descriptive statistics tables that summarize key features of your data
  2. Format regression output tables with proper presentation of coefficients, standard errors, and model statistics
  3. Export tables to LaTeX format for inclusion in academic papers
  4. Apply best practices for table design that meet journal formatting requirements

By the end of this chapter, you’ll be able to produce tables that look professional, convey information clearly, and can be easily integrated into your research papers. The emphasis is on practical workflows that save time while ensuring your tables meet the standards expected in top finance journals.

18.2 Prerequisites

This chapter assumes you are familiar with pandas DataFrames (covered in Introduction to DataFrames) and have basic knowledge of regression analysis. You should also have pandas, NumPy, SciPy, and statsmodels installed in your Python environment. We'll introduce additional packages for table formatting as we go.

18.3 Descriptive Statistics Tables

Descriptive statistics tables typically appear early in empirical papers, often labeled “Summary Statistics” or “Descriptive Statistics.” They provide readers with a quick overview of the key variables in your analysis: their central tendency, dispersion, and sometimes their distribution properties.

18.3.1 Basic Summary Statistics with pandas

The simplest way to generate summary statistics is using pandas’ built-in describe() method. Let’s start with a sample dataset of stock returns:

import pandas as pd
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Create sample data: monthly returns for 5 stocks over 60 months
stocks = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'META']
n_months = 60

data = {
    'Date': pd.date_range('2019-01-01', periods=n_months, freq='MS'),
}

for stock in stocks:
    # Generate returns with different means and volatilities
    mean_return = np.random.uniform(-0.01, 0.02)
    volatility = np.random.uniform(0.05, 0.10)
    data[stock] = np.random.normal(mean_return, volatility, n_months)

df = pd.DataFrame(data)
df = df.set_index('Date')

# View first few rows
df.head()
                 AAPL      GOOGL       MSFT       AMZN       META
Date
2019-01-01   0.064409   0.053167  -0.067056   0.036091   0.105055
2019-02-01   0.149786   0.089193  -0.034479  -0.016558   0.020009
2019-03-01  -0.021602  -0.005442   0.016658  -0.091181  -0.084127
2019-04-01  -0.021601   0.065823  -0.029655  -0.155592   0.052071
2019-05-01   0.155266   0.023291  -0.110068  -0.052587   0.127354

Now generate basic summary statistics:

# Basic descriptive statistics
summary = df.describe()
summary
            AAPL      GOOGL       MSFT       AMZN       META
count  60.000000  60.000000  60.000000  60.000000  60.000000
mean   -0.015512   0.002233   0.012238  -0.005251  -0.003366
std     0.088307   0.063891   0.071728   0.100086   0.065310
min    -0.189902  -0.174253  -0.114431  -0.204792  -0.207925
25%    -0.065834  -0.029886  -0.051764  -0.078866  -0.043179
50%    -0.021601   0.001494   0.024828   0.001821   0.000423
75%     0.035533   0.037292   0.057643   0.052005   0.041835
max     0.181899   0.162542   0.217884   0.361928   0.128034

The describe() method provides count, mean, standard deviation, minimum, quartiles, and maximum. For financial data, we often want to customize this to show statistics more relevant to our analysis.
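Before writing a custom function, note that describe() has a percentiles argument. Transposing its result puts variables in rows and statistics in columns, which is the orientation most finance papers use. Below is a minimal sketch on simulated returns. The tickers and distribution parameters are illustrative only.

```python
import numpy as np
import pandas as pd

# Simulated monthly returns for two illustrative tickers
rng = np.random.default_rng(0)
returns = pd.DataFrame(
    rng.normal(0.01, 0.05, size=(60, 2)),
    columns=['AAPL', 'MSFT'],
)

# Report the 1st and 99th percentiles alongside the median,
# then transpose so that each variable becomes a row
summary_t = returns.describe(percentiles=[0.01, 0.5, 0.99]).T
print(summary_t.round(4))
```

The resulting columns are count, mean, std, min, 1%, 50%, 99%, and max. Statistics that describe() does not offer, such as skewness, still require a custom function.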

18.3.2 Customizing Summary Statistics

Let’s create a more informative summary statistics table tailored for financial returns:

def create_summary_stats(df):
    """
    Create customized summary statistics for financial returns.

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame containing return data

    Returns
    -------
    pd.DataFrame
        Summary statistics table
    """
    stats = pd.DataFrame({
        'Mean': df.mean(),
        'Std Dev': df.std(),
        'Min': df.min(),
        'Max': df.max(),
        'Skewness': df.skew(),
        'Kurtosis': df.kurtosis(),
        'Obs': df.count()
    })

    return stats

summary_stats = create_summary_stats(df)
summary_stats
            Mean   Std Dev        Min       Max   Skewness   Kurtosis  Obs
AAPL   -0.015512  0.088307  -0.189902  0.181899   0.071449  -0.410886   60
GOOGL   0.002233  0.063891  -0.174253  0.162542  -0.181071   0.596248   60
MSFT    0.012238  0.071728  -0.114431  0.217884   0.377622  -0.055411   60
AMZN   -0.005251  0.100086  -0.204792  0.361928   0.711126   2.126233   60
META   -0.003366  0.065310  -0.207925  0.128034  -0.315057   0.511462   60

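This gives us a better picture of our return distributions, including skewness and kurtosis, which matter for understanding tail risk. Note that pandas' kurtosis() reports excess kurtosis: a normal distribution has a value of 0, not 3. The AMZN value of 2.13 therefore suggests fatter tails than a normal distribution.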
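To check which convention a reported statistic follows, compare it with scipy.stats. Its skew() and kurtosis() functions expose the bias correction and the Fisher (excess) definition as explicit arguments. Below is a sketch on simulated normal returns. The sample size and parameters are arbitrary choices.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated returns drawn from a normal distribution
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.05, size=10_000)

# pandas: bias-corrected skewness and bias-corrected *excess* kurtosis
pandas_skew = pd.Series(x).skew()
pandas_kurt = pd.Series(x).kurtosis()

# scipy equivalents: bias=False applies the same small-sample correction,
# fisher=True subtracts 3 so that a normal distribution scores about 0
scipy_skew = stats.skew(x, bias=False)
scipy_kurt = stats.kurtosis(x, fisher=True, bias=False)

# Adding 3 converts excess kurtosis to the Pearson convention (normal: about 3)
pearson_kurt = pandas_kurt + 3
print(round(pandas_kurt, 3), round(pearson_kurt, 3))
```

Whichever convention you use, state it in the table notes so readers can interpret the numbers.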

18.3.3 Formatting Numbers for Presentation

Raw statistics often have too many decimal places for presentation. Let’s format the table appropriately:

def format_summary_stats(stats_df):
    """
    Format summary statistics for presentation.

    Parameters
    ----------
    stats_df : pd.DataFrame
        Summary statistics DataFrame

    Returns
    -------
    pd.DataFrame
        Formatted summary statistics
    """
    # Create a copy to avoid modifying original
    formatted = stats_df.copy()

    # Format each column appropriately
    for col in ['Mean', 'Std Dev', 'Skewness', 'Kurtosis']:
        formatted[col] = formatted[col].apply(lambda x: f"{x:.4f}")

    for col in ['Min', 'Max']:
        formatted[col] = formatted[col].apply(lambda x: f"{x:.3f}")

    # Keep Obs as integer
    formatted['Obs'] = formatted['Obs'].astype(int)

    return formatted

formatted_stats = format_summary_stats(summary_stats)
formatted_stats
          Mean  Std Dev     Min    Max  Skewness  Kurtosis  Obs
AAPL   -0.0155   0.0883  -0.190  0.182    0.0714   -0.4109   60
GOOGL   0.0022   0.0639  -0.174  0.163   -0.1811    0.5962   60
MSFT    0.0122   0.0717  -0.114  0.218    0.3776   -0.0554   60
AMZN   -0.0053   0.1001  -0.205  0.362    0.7111    2.1262   60
META   -0.0034   0.0653  -0.208  0.128   -0.3151    0.5115   60
Tip: Significant Figures in Finance Research

Different statistics require different precision levels:

  • Returns and prices: 3-4 decimal places typically suffice
  • Statistical tests (t-stats, p-values): 2-3 decimal places
  • R-squared values: 3-4 decimal places
  • Sample sizes: No decimal places (integers)

When in doubt, check recent papers in your target journal to see their conventions.
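One way to apply these guidelines consistently is to keep a single mapping from column name to format string. You can then adjust precision in one place instead of inside several loops. Below is a minimal sketch. The table values and the col_formats name are hypothetical.

```python
import pandas as pd

# Hypothetical summary table; the numbers are illustrative only
table = pd.DataFrame(
    {'Mean': [0.012345, -0.004321],
     'Std Dev': [0.0712, 0.0655],
     'Obs': [60, 60]},
    index=['AAPL', 'MSFT'],
)

# One format string per column, following the precision guidelines above
col_formats = {'Mean': '{:.4f}', 'Std Dev': '{:.4f}', 'Obs': '{:d}'}

# Convert each listed column to formatted strings; other columns are untouched
formatted = table.copy()
for col, fmt in col_formats.items():
    formatted[col] = table[col].map(fmt.format)

print(formatted)
```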

18.3.4 Panel Data Summary Statistics

When working with panel data (multiple entities observed over time), you often want to present both cross-sectional and time-series dimensions:

# Create panel data: returns for 10 firms over 60 months
np.random.seed(123)
n_firms = 10
n_periods = 60

panel_data = []
for firm_id in range(1, n_firms + 1):
    for t in range(n_periods):
        panel_data.append({
            'firm_id': firm_id,
            'period': t,
            'returns': np.random.normal(0.01, 0.05),
            'market_cap': np.random.lognormal(10, 2),
            'book_to_market': np.random.gamma(2, 0.5)
        })

panel_df = pd.DataFrame(panel_data)

# Summary statistics by firm (cross-sectional)
cross_section = panel_df.groupby('firm_id')[['returns', 'market_cap', 'book_to_market']].mean()
print("Cross-sectional averages (by firm):")
print(cross_section.describe())
Cross-sectional averages (by firm):
         returns     market_cap  book_to_market
count  10.000000      10.000000       10.000000
mean    0.010209  176798.631089        0.986518
std     0.004949  122379.986024        0.070979
min     0.003166   59440.133610        0.884198
25%     0.005920  102011.132256        0.937919
50%     0.011089  147587.267562        0.990285
75%     0.014329  192955.334105        1.030200
max     0.016038  482650.132081        1.098476
# Time-series statistics
print("\nTime-series statistics:")
time_series = panel_df.groupby('period')[['returns', 'market_cap', 'book_to_market']].mean()
print(time_series.describe())

Time-series statistics:
         returns    market_cap  book_to_market
count  60.000000  6.000000e+01       60.000000
mean    0.010209  1.767986e+05        0.986518
std     0.013868  3.264904e+05        0.198994
min    -0.027006  1.449108e+04        0.510657
25%     0.001668  5.120925e+04        0.863925
50%     0.012041  8.341678e+04        0.966076
75%     0.018662  1.699882e+05        1.090577
max     0.034992  2.339858e+06        1.571215
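Many asset-pricing papers report a third variant: time-series averages of cross-sectional statistics. Each statistic is computed across firms within a period, and those values are then averaged over time, so every period receives equal weight. Below is a sketch on a freshly simulated panel. Its layout mirrors panel_df, but the data are regenerated so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Simulated panel with the same layout as panel_df: 10 firms x 60 periods
rng = np.random.default_rng(123)
panel = pd.DataFrame({
    'firm_id': np.repeat(np.arange(1, 11), 60),
    'period': np.tile(np.arange(60), 10),
    'returns': rng.normal(0.01, 0.05, 600),
})

# Step 1: cross-sectional statistics across firms within each period
cs_stats = panel.groupby('period')['returns'].agg(['mean', 'std', 'median', 'count'])

# Step 2: average each statistic over the 60 periods
ts_avg = cs_stats.mean()
print(ts_avg.round(4))
```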

18.3.5 Comparing Groups in Summary Statistics

Research papers often compare statistics across different groups (e.g., treated vs. control, large vs. small firms):

# Add a grouping variable
panel_df['size_group'] = pd.cut(
    panel_df['market_cap'],
    bins=2,
    labels=['Small', 'Large']
)

# Summary statistics by group
# observed=False keeps the current pandas behavior and silences its FutureWarning
grouped_stats = panel_df.groupby('size_group', observed=False)[['returns', 'book_to_market']].agg([
    ('Mean', 'mean'),
    ('Std Dev', 'std'),
    ('N', 'count')
])

# Flatten column names
grouped_stats.columns = ['_'.join(col).strip() for col in grouped_stats.columns.values]
grouped_stats
            returns_Mean  returns_Std Dev  returns_N  book_to_market_Mean  book_to_market_Std Dev  book_to_market_N
size_group
Small           0.010225         0.050226        599             0.987235                0.669736               599
Large           0.000245              NaN          1             0.557118                     NaN                 1

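This type of table is common in papers examining differences between portfolios or firm characteristics. Note the group sizes in this output, however. pd.cut(..., bins=2) splits the range of market_cap into two intervals of equal width. Because market capitalization is highly right-skewed, 599 observations fall into "Small" and only one into "Large", which is why that row's standard deviation is NaN. For balanced groups, split at the median with pd.qcut(panel_df['market_cap'], q=2, labels=['Small', 'Large']) instead.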
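The sketch below compares equal-width bins (pd.cut) with quantile bins (pd.qcut, which splits at the median when q=2). It uses simulated, lognormally distributed market capitalizations, drawn with the same parameters as panel_df but regenerated so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Simulated, highly right-skewed market caps
rng = np.random.default_rng(123)
mcap = pd.Series(rng.lognormal(10, 2, 600), name='market_cap')

# Equal-width bins: a few very large values stretch the range,
# so almost every observation lands in the first bin
equal_width = pd.cut(mcap, bins=2, labels=['Small', 'Large'])

# Quantile bins: split at the median, so each group gets half the sample
median_split = pd.qcut(mcap, q=2, labels=['Small', 'Large'])

print(equal_width.value_counts())
print(median_split.value_counts())
```

For size or value portfolios, quantile-based cut-offs such as terciles (q=3) or quintiles (q=5) follow the same pattern.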

18.3.6 Adding Statistical Tests to Summary Tables

When comparing groups, you often want to test whether differences are statistically significant:

from scipy import stats as scipy_stats

def add_difference_test(df, group_col, value_cols):
    """
    Add t-test for difference in means between two groups.

    Parameters
    ----------
    df : pd.DataFrame
        Panel data with grouping variable
    group_col : str
        Name of grouping column (must have exactly 2 groups)
    value_cols : list
        Variables to test

    Returns
    -------
    pd.DataFrame
        Table with group means and difference tests
    """
    groups = df[group_col].unique()
    if len(groups) != 2:
        raise ValueError("Grouping variable must have exactly 2 groups")

    results = []

    for col in value_cols:
        group1_data = df[df[group_col] == groups[0]][col].dropna()
        group2_data = df[df[group_col] == groups[1]][col].dropna()

        mean1 = group1_data.mean()
        mean2 = group2_data.mean()

        # Two-sample t-test
        t_stat, p_value = scipy_stats.ttest_ind(group1_data, group2_data)

        results.append({
            'Variable': col,
            f'{groups[0]} Mean': f'{mean1:.4f}',
            f'{groups[1]} Mean': f'{mean2:.4f}',
            'Difference': f'{mean1 - mean2:.4f}',
            't-stat': f'{t_stat:.3f}',
            'p-value': f'{p_value:.3f}'
        })

    return pd.DataFrame(results)

# Test differences between small and large firms
diff_test = add_difference_test(panel_df, 'size_group', ['returns', 'book_to_market'])
diff_test
         Variable  Small Mean  Large Mean  Difference  t-stat  p-value
0         returns      0.0102      0.0002      0.0100   0.199    0.843
1  book_to_market      0.9872      0.5571      0.4301   0.642    0.521
Note: Choosing the Right Statistical Test

The t-test above uses scipy's default, Student's two-sample t-test, which assumes normally distributed data with equal variances in both groups. For financial data:

  • Use Welch's t-test (scipy.stats.ttest_ind with equal_var=False) if variances differ
  • Consider the Mann-Whitney U test (scipy.stats.mannwhitneyu) for non-normal distributions
  • Use a paired t-test (scipy.stats.ttest_rel) for matched samples (e.g., before/after comparisons)
  • Bootstrap standard errors for robust inference with small samples
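The scipy calls listed in the note can be compared side by side on simulated data. Below is a sketch. The group sizes, volatilities, and the before/after setup are illustrative assumptions, not estimates from the chapter's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Two simulated groups with different volatilities and sizes
small = rng.normal(0.012, 0.08, 300)
large = rng.normal(0.008, 0.04, 100)

# Student's t-test (scipy's default, equal_var=True) vs. Welch's t-test
t_student, p_student = stats.ttest_ind(small, large)
t_welch, p_welch = stats.ttest_ind(small, large, equal_var=False)

# Mann-Whitney U test: rank-based, no normality assumption
u_stat, p_mw = stats.mannwhitneyu(small, large, alternative='two-sided')

# Paired t-test for matched samples, e.g. the same firms before and after an event
before = rng.normal(0.01, 0.05, 100)
after = before + rng.normal(0.002, 0.01, 100)
t_paired, p_paired = stats.ttest_rel(after, before)

for name, p in [('Student', p_student), ('Welch', p_welch),
                ('Mann-Whitney', p_mw), ('Paired', p_paired)]:
    print(f"{name:>12}: p = {p:.3f}")
```

The Student and Welch statistics differ here because the groups have unequal sizes and variances. With equal group sizes, the two t-statistics coincide and only the degrees of freedom (and so the p-values) differ.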

18.4 Regression Output Tables

Regression results are the heart of most empirical papers in finance. A well-formatted regression table presents coefficients, standard errors (or t-statistics), and model fit statistics in a clear, compact format that allows readers to quickly assess your results.

18.4.1 Basic Regression with statsmodels

Let’s start by estimating a simple regression and examining the output:

import statsmodels.api as sm

# Prepare data for regression
# Dependent variable: firm returns
# Independent variables: market cap (log), book-to-market ratio
panel_df['log_market_cap'] = np.log(panel_df['market_cap'])

y = panel_df['returns']
X = panel_df[['log_market_cap', 'book_to_market']]
X = sm.add_constant(X)  # Add intercept

# Estimate OLS regression
model = sm.OLS(y, X)
results = model.fit()

# View default output
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                returns   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.003
Method:                 Least Squares   F-statistic:                    0.2171
Date:                Tue, 23 Dec 2025   Prob (F-statistic):              0.805
Time:                        13:57:01   Log-Likelihood:                 944.57
No. Observations:                 600   AIC:                            -1883.
Df Residuals:                     597   BIC:                            -1870.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const              0.0041      0.011      0.360      0.719      -0.018       0.026
log_market_cap     0.0007      0.001      0.629      0.530      -0.001       0.003
book_to_market    -0.0005      0.003     -0.163      0.871      -0.007       0.006
==============================================================================
Omnibus:                        0.097   Durbin-Watson:                   2.072
Prob(Omnibus):                  0.953   Jarque-Bera (JB):                0.094
Skew:                          -0.030   Prob(JB):                        0.954
Kurtosis:                       2.986   Cond. No.                         57.5
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The default summary from statsmodels is informative but not publication-ready. Let’s extract and format the key information.

18.4.2 Creating a Basic Regression Table

def create_regression_table(results):
    """
    Create a formatted regression table from statsmodels results.

    Parameters
    ----------
    results : statsmodels RegressionResults
        Fitted regression model

    Returns
    -------
    pd.DataFrame
        Formatted regression table
    """
    # Extract coefficients and standard errors
    coefs = results.params
    std_errs = results.bse
    t_stats = results.tvalues
    p_values = results.pvalues

    # Create table
    table_data = []
    for var in coefs.index:
        # Format coefficient with standard error in parentheses
        coef_str = f"{coefs[var]:.4f}"
        se_str = f"({std_errs[var]:.4f})"

        # Add significance stars
        if p_values[var] < 0.01:
            coef_str += "***"
        elif p_values[var] < 0.05:
            coef_str += "**"
        elif p_values[var] < 0.10:
            coef_str += "*"

        table_data.append({
            'Variable': var,
            'Coefficient': coef_str,
            'Std Error': se_str
        })

    table = pd.DataFrame(table_data)

    # Add model statistics
    stats_rows = pd.DataFrame([
        {'Variable': 'R-squared', 'Coefficient': f"{results.rsquared:.4f}", 'Std Error': ''},
        {'Variable': 'Observations', 'Coefficient': str(int(results.nobs)), 'Std Error': ''}
    ])

    table = pd.concat([table, stats_rows], ignore_index=True)

    return table

reg_table = create_regression_table(results)
reg_table
         Variable  Coefficient  Std Error
0           const       0.0041   (0.0113)
1  log_market_cap       0.0007   (0.0010)
2  book_to_market      -0.0005   (0.0031)
3       R-squared       0.0007
4    Observations          600
Tip: Significance Stars

The convention for significance stars in finance research:

  • *** : p < 0.01 (1% level)
  • ** : p < 0.05 (5% level)
  • * : p < 0.10 (10% level)

Always include a note below your table explaining this convention. Some journals discourage stars and prefer reporting exact p-values or confidence intervals instead.
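The same star logic appears in every table function in this chapter. A small helper keeps the convention in one place, so changing thresholds later means editing a single function. Below is a sketch. The function names are illustrative, not part of any library.

```python
def significance_stars(p_value, thresholds=(0.01, 0.05, 0.10)):
    """Return '***', '**', '*' or '' using the 1%/5%/10% convention."""
    # Check the strictest threshold first; strict '<' matches the tables above
    for n_stars, cutoff in zip((3, 2, 1), thresholds):
        if p_value < cutoff:
            return '*' * n_stars
    return ''


def format_coef(coef, p_value, decimals=4):
    """Format a coefficient with its significance stars, e.g. '0.2500**'."""
    return f"{coef:.{decimals}f}{significance_stars(p_value)}"


print(format_coef(0.25, 0.03))
```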

18.4.3 Comparing Multiple Model Specifications

Research papers typically present several model specifications side-by-side. Let’s create a multi-column regression table:

# Estimate multiple specifications
# Model 1: Just market cap
X1 = sm.add_constant(panel_df[['log_market_cap']])
model1 = sm.OLS(panel_df['returns'], X1).fit()

# Model 2: Market cap + book-to-market
X2 = sm.add_constant(panel_df[['log_market_cap', 'book_to_market']])
model2 = sm.OLS(panel_df['returns'], X2).fit()

# Model 3: Add firm fixed effects (simple dummy approach for illustration)
panel_df['firm_dummies'] = pd.Categorical(panel_df['firm_id'])
X3 = sm.add_constant(
    pd.concat([
        panel_df[['log_market_cap', 'book_to_market']],
        # prefix='firm' names the dummy columns firm_2, ..., firm_10 so that
        # the table function below can recognize and hide them
        pd.get_dummies(panel_df['firm_dummies'], prefix='firm', drop_first=True, dtype=float)
    ], axis=1)
)
model3 = sm.OLS(panel_df['returns'], X3).fit()

def create_multi_model_table(models, model_names=None):
    """
    Create a regression table comparing multiple models.

    Parameters
    ----------
    models : list
        List of fitted statsmodels regression results
    model_names : list, optional
        Names for each model column

    Returns
    -------
    pd.DataFrame
        Multi-column regression table
    """
    if model_names is None:
        model_names = [f"Model {i+1}" for i in range(len(models))]

    # Collect variables in order of first appearance across models
    # (a set would scramble the order), skipping firm dummies for a cleaner display
    all_vars = []
    for model in models:
        for var in model.params.index:
            if var not in all_vars and not str(var).startswith('firm_'):
                all_vars.append(var)

    # Build table
    table_dict = {'Variable': []}
    for name in model_names:
        table_dict[name] = []

    for var in all_vars:
        # Add coefficient row
        table_dict['Variable'].append(var)

        for i, model in enumerate(models):
            if var in model.params.index:
                coef = model.params[var]
                p_val = model.pvalues[var]

                # Format with stars
                coef_str = f"{coef:.4f}"
                if p_val < 0.01:
                    coef_str += "***"
                elif p_val < 0.05:
                    coef_str += "**"
                elif p_val < 0.10:
                    coef_str += "*"

                table_dict[model_names[i]].append(coef_str)
            else:
                table_dict[model_names[i]].append("")

        # Add standard error row
        table_dict['Variable'].append('')
        for i, model in enumerate(models):
            if var in model.params.index:
                se = model.bse[var]
                table_dict[model_names[i]].append(f"({se:.4f})")
            else:
                table_dict[model_names[i]].append("")

    # Add model statistics
    stats = [
        ('R-squared', lambda m: f"{m.rsquared:.4f}"),
        ('Adj. R-squared', lambda m: f"{m.rsquared_adj:.4f}"),
        ('Observations', lambda m: f"{int(m.nobs)}")
    ]

    for stat_name, stat_func in stats:
        table_dict['Variable'].append(stat_name)
        for i, model in enumerate(models):
            table_dict[model_names[i]].append(stat_func(model))

    return pd.DataFrame(table_dict)

multi_reg_table = create_multi_model_table([model1, model2, model3])
multi_reg_table
         Variable   Model 1   Model 2   Model 3
0           const    0.0035    0.0041    0.0030
1                  (0.0107)  (0.0113)  (0.0131)
2  log_market_cap    0.0007    0.0007    0.0007
3                  (0.0010)  (0.0010)  (0.0011)
4  book_to_market             -0.0005   -0.0006
5                            (0.0031)  (0.0031)
6       R-squared    0.0007    0.0007    0.0096
7  Adj. R-squared   -0.0010   -0.0026   -0.0089
8    Observations       600       600       600

18.4.4 Using stargazer for Professional Tables

While our custom functions work well, the stargazer package (Python port of the R package) provides a more automated solution for creating publication-quality regression tables:

from stargazer.stargazer import Stargazer

# Create stargazer table
stargazer = Stargazer([model1, model2])

# In a Jupyter notebook, ending a cell with `stargazer` renders an HTML table.
# print(stargazer) would only show the object's repr, so render LaTeX explicitly:
print(stargazer.render_latex())

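The stargazer output follows academic conventions closely. Coefficients from each model are aligned in columns, standard errors appear in parentheses beneath them, significance stars are added, and fit statistics such as the number of observations and R-squared are reported at the bottom.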

Note: Alternatives to stargazer

While stargazer is popular, you might also consider:

  • statsmodels.iolib.summary2: Built into statsmodels; its summary_col() function places several models side by side
  • linearmodels: Excellent for panel data regressions with sophisticated standard errors
  • regtabletotext: Lightweight package focused on clean text and LaTeX output
  • Custom solutions: Building your own functions gives maximum control over formatting

18.4.5 Handling Clustered Standard Errors

In finance research, especially with panel data, you often need to cluster standard errors by firm or time:

# Estimate model with clustered standard errors
# Using firm clusters
from statsmodels.regression.linear_model import OLS

# Prepare data
y = panel_df['returns']
X = sm.add_constant(panel_df[['log_market_cap', 'book_to_market']])

# Fit with clustered standard errors
model_clustered = OLS(y, X).fit(
    cov_type='cluster',
    cov_kwds={'groups': panel_df['firm_id']}
)

print("Standard OLS standard errors:")
print(results.bse)
print("\nClustered standard errors (by firm):")
print(model_clustered.bse)
Standard OLS standard errors:
const             0.011311
log_market_cap    0.001047
book_to_market    0.003072
dtype: float64

Clustered standard errors (by firm):
const             0.008306
log_market_cap    0.000976
book_to_market    0.002056
dtype: float64
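Warning: Clustering and Standard Errors

Clustering usually increases standard errors when residuals are positively correlated within clusters, because the regression then stops treating correlated observations as independent pieces of information. It does not always do so. In the example above, returns are simulated independently and there are only 10 firms, so the clustered standard errors are actually smaller than the OLS ones. Key considerations: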

  • Cluster at the level of treatment assignment: If treatment varies by firm, cluster by firm
  • Few clusters problem: With fewer than 20-30 clusters, consider wild bootstrap
  • Two-way clustering: For panel data, you may need to cluster by both firm and time
  • Report clustering in table notes: Always document your clustering approach

Use the linearmodels package for more sophisticated clustering options.

18.4.6 Formatting Model Fit Statistics

Different journals have different conventions for which fit statistics to report:

def format_model_stats(results, include_stats=None):
    """
    Format model fit statistics for regression tables.

    Parameters
    ----------
    results : statsmodels RegressionResults
        Fitted model
    include_stats : list, optional
        Which statistics to include. Options: 'rsq', 'adj_rsq',
        'fstat', 'nobs', 'aic', 'bic'

    Returns
    -------
    pd.DataFrame
        Formatted statistics
    """
    if include_stats is None:
        include_stats = ['rsq', 'adj_rsq', 'nobs']

    stats_dict = {
        'rsq': ('R-squared', results.rsquared),
        'adj_rsq': ('Adj. R-squared', results.rsquared_adj),
        'fstat': ('F-statistic', results.fvalue),
        'nobs': ('Observations', results.nobs),
    }

    # Add AIC/BIC if available
    if hasattr(results, 'aic'):
        stats_dict['aic'] = ('AIC', results.aic)
    if hasattr(results, 'bic'):
        stats_dict['bic'] = ('BIC', results.bic)

    formatted = []
    for key in include_stats:
        if key in stats_dict:
            name, value = stats_dict[key]
            if key == 'nobs':
                formatted.append({'Statistic': name, 'Value': f'{int(value)}'})
            elif key == 'fstat':
                formatted.append({'Statistic': name, 'Value': f'{value:.2f}'})
            else:
                formatted.append({'Statistic': name, 'Value': f'{value:.4f}'})

    return pd.DataFrame(formatted)

model_stats = format_model_stats(results, include_stats=['rsq', 'adj_rsq', 'fstat', 'nobs'])
model_stats
        Statistic    Value
0       R-squared   0.0007
1  Adj. R-squared  -0.0026
2     F-statistic     0.22
3    Observations      600

18.5 Introduction to LaTeX Tables

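Much of the academic finance literature is written in LaTeX, and many journals accept or expect LaTeX manuscripts. Converting your Python tables directly to LaTeX is therefore a crucial skill. It also avoids retyping numbers by hand, along with the transcription errors that come with it.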

18.5.1 Understanding LaTeX Table Structure

A basic LaTeX table has this structure:

\begin{table}[htbp]
  \centering
  \caption{Summary Statistics}
  \label{tab:summary}
  \begin{threeparttable}
    \begin{tabular}{lcccc}
      \toprule
      Variable & Mean & Std Dev & Min & Max \\
      \midrule
      Returns & 0.0105 & 0.0543 & -0.1234 & 0.1876 \\
      Market Cap & 12.45 & 2.34 & 8.12 & 18.99 \\
      \bottomrule
    \end{tabular}
    \begin{tablenotes}
      \small
      \item Note: All values are in decimals.
    \end{tablenotes}
  \end{threeparttable}
\end{table}

Key components:

  • \begin{table}...\end{table}: Table environment
  • \caption{}: Table title
  • \label{}: Reference label for citing in text
  • \begin{tabular}...\end{tabular}: The actual table
  • {lcccc}: Column alignments (l=left, c=center, r=right)
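  • \toprule, \midrule, \bottomrule: Professional-looking horizontal lines from the booktabs package
  • \begin{threeparttable}...\end{threeparttable}: Wrapper from the threeparttable package that keeps the tabular and its notes together at the same width
  • \begin{tablenotes}...\end{tablenotes}: Notes below the table; this environment must appear inside threeparttable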

18.5.2 Converting pandas DataFrames to LaTeX

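pandas provides built-in LaTeX export through DataFrame.to_latex(). Since pandas 2.0, this method relies on the optional jinja2 package, so install it if you get an ImportError: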

# Basic LaTeX export
latex_output = formatted_stats.to_latex(
    index=True,
    caption='Summary Statistics for Monthly Stock Returns',
    label='tab:summary_stats',
    position='htbp'
)

print(latex_output)
\begin{table}[htbp]
\caption{Summary Statistics for Monthly Stock Returns}
\label{tab:summary_stats}
\begin{tabular}{lllllllr}
\toprule
 & Mean & Std Dev & Min & Max & Skewness & Kurtosis & Obs \\
\midrule
AAPL & -0.0155 & 0.0883 & -0.190 & 0.182 & 0.0714 & -0.4109 & 60 \\
GOOGL & 0.0022 & 0.0639 & -0.174 & 0.163 & -0.1811 & 0.5962 & 60 \\
MSFT & 0.0122 & 0.0717 & -0.114 & 0.218 & 0.3776 & -0.0554 & 60 \\
AMZN & -0.0053 & 0.1001 & -0.205 & 0.362 & 0.7111 & 2.1262 & 60 \\
META & -0.0034 & 0.0653 & -0.208 & 0.128 & -0.3151 & 0.5115 & 60 \\
\bottomrule
\end{tabular}
\end{table}
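Printing LaTeX and pasting it into the manuscript works once, but it breaks the link between code and paper. A common alternative writes each table to its own .tex file and pulls it into the paper with \input{tables/summary_stats.tex}. Rerunning the analysis then updates the paper. Below is a minimal sketch. The helper name and the tables/ directory are assumptions, not a pandas feature.

```python
import tempfile
from pathlib import Path


def save_latex_table(latex, filename, directory='tables'):
    """Write a LaTeX table string to directory/filename and return the path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path = out_dir / filename
    path.write_text(latex, encoding='utf-8')
    return path


# Demo in a temporary directory so nothing is left behind. In a project you
# would call, e.g., save_latex_table(latex_output, 'summary_stats.tex').
with tempfile.TemporaryDirectory() as tmp:
    demo_path = save_latex_table('\\begin{tabular}{lc}\n\\end{tabular}\n',
                                 'demo.tex', directory=Path(tmp) / 'tables')
    demo_text = demo_path.read_text(encoding='utf-8')
```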

18.5.3 Customizing LaTeX Output

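Notice that the default column specification, lllllllr, left-aligns every column that format_summary_stats converted to strings; only the integer Obs column is right-aligned. For publication-quality tables, you'll want more control over the output, such as column alignment and table notes: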

def df_to_latex_table(df, caption, label, notes=None, index=True):
    """
    Convert DataFrame to well-formatted LaTeX table.

    Parameters
    ----------
    df : pd.DataFrame
        Data to convert
    caption : str
        Table caption
    label : str
        LaTeX label for referencing
    notes : str, optional
        Table notes to appear below table
    index : bool
        Whether to include DataFrame index

    Returns
    -------
    str
        LaTeX table code
    """
    # Start with pandas LaTeX conversion
    latex = df.to_latex(
        index=index,
        escape=False,  # Allow LaTeX commands in data
        column_format='l' + 'c' * len(df.columns),  # Left-align first col, center rest
        position='htbp',
        caption=caption,
        label=label
    )

    # pandas already emits the booktabs rules (\toprule, \midrule,
    # \bottomrule), so no replacement is needed.

    # Add table notes if provided. The tablenotes environment comes from the
    # threeparttable package and must sit inside a threeparttable environment,
    # so wrap the tabular and the notes together.
    if notes:
        tabular_start = latex.find('\\begin{tabular}')
        tabular_end = latex.find('\\end{tabular}')
        if tabular_start != -1 and tabular_end != -1:
            tabular_end += len('\\end{tabular}')
            notes_latex = (
                "\n\\begin{tablenotes}\n\\small\n"
                f"\\item {notes}\n"
                "\\end{tablenotes}\n\\end{threeparttable}"
            )
            latex = (latex[:tabular_start] + "\\begin{threeparttable}\n"
                     + latex[tabular_start:tabular_end] + notes_latex
                     + latex[tabular_end:])

    return latex

# Create a nice LaTeX table
latex_table = df_to_latex_table(
    formatted_stats,
    caption='Summary Statistics for Monthly Stock Returns, 2019-2023',
    label='tab:summary',
    notes='This table presents summary statistics for monthly returns of five major technology stocks. '
          'Returns are expressed as decimals. Sample period is January 2019 to December 2023.',
    index=True
)

print(latex_table)
\begin{table}[htbp]
\caption{Summary Statistics for Monthly Stock Returns, 2019-2023}
\label{tab:summary}
\begin{threeparttable}
\begin{tabular}{lccccccc}
\toprule
 & Mean & Std Dev & Min & Max & Skewness & Kurtosis & Obs \\
\midrule
AAPL & -0.0155 & 0.0883 & -0.190 & 0.182 & 0.0714 & -0.4109 & 60 \\
GOOGL & 0.0022 & 0.0639 & -0.174 & 0.163 & -0.1811 & 0.5962 & 60 \\
MSFT & 0.0122 & 0.0717 & -0.114 & 0.218 & 0.3776 & -0.0554 & 60 \\
AMZN & -0.0053 & 0.1001 & -0.205 & 0.362 & 0.7111 & 2.1262 & 60 \\
META & -0.0034 & 0.0653 & -0.208 & 0.128 & -0.3151 & 0.5115 & 60 \\
\bottomrule
\end{tabular}
\begin{tablenotes}
\small
\item This table presents summary statistics for monthly returns of five major technology stocks. Returns are expressed as decimals. Sample period is January 2019 to December 2023.
\end{tablenotes}
\end{threeparttable}
\end{table}
Tip: LaTeX Table Best Practices

Follow these conventions for professional tables:

  1. Use booktabs package: Provides \toprule, \midrule, \bottomrule for clean horizontal lines
  2. Avoid vertical lines: Modern table design omits vertical separators
  3. Align numbers properly: Center or right-align numeric columns, left-align text
  4. Add informative notes: Explain variable definitions, sample restrictions, significance levels
  5. Use consistent precision: Don’t mix 2 and 4 decimal places randomly
  6. Include label: Always add \label{} so you can reference tables in text with \ref{}

In your LaTeX preamble, include:

\usepackage{booktabs}
\usepackage{threeparttable}
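A frequent cause of LaTeX compilation errors is an unescaped special character in a row or column label. Examples include the underscore in book_to_market and the % sign in a variable description. pandas' to_latex() escapes cell contents when called with escape=True, but strings you assemble by hand need the same treatment. Below is a minimal helper (the names LATEX_SPECIAL and latex_escape are hypothetical). It translates one character at a time, so backslashes that it inserts are never escaped a second time.

```python
# Characters with special meaning in LaTeX and their escaped forms
LATEX_SPECIAL = {
    '\\': r'\textbackslash{}',
    '&': r'\&', '%': r'\%', '$': r'\$', '#': r'\#', '_': r'\_',
    '{': r'\{', '}': r'\}',
    '~': r'\textasciitilde{}', '^': r'\textasciicircum{}',
}


def latex_escape(text):
    """Escape LaTeX special characters in a plain-text label."""
    # Character-by-character translation avoids re-escaping inserted text
    return ''.join(LATEX_SPECIAL.get(ch, ch) for ch in str(text))


print(latex_escape('book_to_market'))
print(latex_escape('R&D / Sales (%)'))
```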

18.5.4 Creating Regression Tables in LaTeX

For regression tables, we want a specific format showing multiple models side-by-side:

def regression_to_latex(models, model_names, caption, label,
                        dep_var_name=None, notes=None):
    """
    Create publication-quality LaTeX regression table.

    Requires the booktabs and threeparttable LaTeX packages.

    Parameters
    ----------
    models : list
        List of fitted statsmodels results
    model_names : list
        Names for each model column
    caption : str
        Table caption
    label : str
        LaTeX label
    dep_var_name : str, optional
        Name of dependent variable; if given, it is shown in a header
        row spanning all model columns
    notes : str, optional
        Table notes

    Returns
    -------
    str
        LaTeX table code
    """
    # Collect all variables across models
    all_vars = set()
    for model in models:
        all_vars.update(model.params.index)
    all_vars = sorted(all_vars)

    # Start building LaTeX
    n_models = len(models)
    col_spec = 'l' + 'c' * n_models

    # tablenotes is designed to be used inside threeparttable, which also
    # keeps the notes as wide as the tabular
    latex = f"""\\begin{{table}}[htbp]
\\centering
\\begin{{threeparttable}}
\\caption{{{caption}}}
\\label{{{label}}}
\\begin{{tabular}}{{{col_spec}}}
\\toprule
"""

    # Header rows: optional dependent variable spanning all models,
    # then model numbers and model names
    if dep_var_name:
        latex += (f" & \\multicolumn{{{n_models}}}{{c}}"
                  f"{{Dependent variable: {dep_var_name}}} \\\\\n")
        latex += f"\\cmidrule(lr){{2-{n_models + 1}}}\n"
    latex += " & " + " & ".join([f"({i+1})" for i in range(n_models)]) + " \\\\\n"
    latex += " & " + " & ".join(model_names) + " \\\\\n"
    latex += "\\midrule\n"

    # Variable rows: coefficient with stars, standard error on the next row
    for var in all_vars:
        var_display = var.replace('_', '\\_')  # Escape underscores for LaTeX
        latex += var_display

        # Coefficients
        for model in models:
            if var in model.params:
                coef = model.params[var]
                pval = model.pvalues[var]
                stars = ""
                if pval < 0.01:
                    stars = "^{***}"
                elif pval < 0.05:
                    stars = "^{**}"
                elif pval < 0.10:
                    stars = "^{*}"
                latex += f" & ${coef:.4f}{stars}$"
            else:
                latex += " & "
        latex += " \\\\\n"

        # Standard errors
        latex += " "
        for model in models:
            if var in model.params:
                se = model.bse[var]
                latex += f" & $({se:.4f})$"
            else:
                latex += " & "
        latex += " \\\\\n"

    # Model statistics
    latex += "\\midrule\n"

    # R-squared
    latex += "R-squared"
    for model in models:
        latex += f" & ${model.rsquared:.4f}$"
    latex += " \\\\\n"

    # Observations
    latex += "Observations"
    for model in models:
        latex += f" & ${int(model.nobs)}$"
    latex += " \\\\\n"

    latex += "\\bottomrule\n"
    latex += "\\end{tabular}\n"

    # Add notes
    if notes:
        latex += f"""\\begin{{tablenotes}}
\\small
\\item \\textit{{Notes:}} {notes}
\\end{{tablenotes}}
"""

    latex += "\\end{threeparttable}\n"
    latex += "\\end{table}\n"

    return latex

# Generate LaTeX regression table
reg_latex = regression_to_latex(
    models=[model1, model2],
    model_names=['Base Model', 'Full Model'],
    caption='Determinants of Stock Returns',
    label='tab:regressions',
    dep_var_name='Returns',
    notes='This table presents OLS regression results. The dependent variable is monthly stock returns. '
          'Standard errors are shown in parentheses. '
          '*, **, and *** denote significance at the 10\\%, 5\\%, and 1\\% levels, respectively.'
)

print(reg_latex)
\begin{table}[htbp]
\centering
\begin{threeparttable}
\caption{Determinants of Stock Returns}
\label{tab:regressions}
\begin{tabular}{lcc}
\toprule
 & \multicolumn{2}{c}{Dependent variable: Returns} \\
\cmidrule(lr){2-3}
 & (1) & (2) \\
 & Base Model & Full Model \\
\midrule
book\_to\_market &  & $-0.0005$ \\
  &  & $(0.0031)$ \\
const & $0.0035$ & $0.0041$ \\
  & $(0.0107)$ & $(0.0113)$ \\
log\_market\_cap & $0.0007$ & $0.0007$ \\
  & $(0.0010)$ & $(0.0010)$ \\
\midrule
R-squared & $0.0007$ & $0.0007$ \\
Observations & $600$ & $600$ \\
\bottomrule
\end{tabular}
\begin{tablenotes}
\small
\item \textit{Notes:} This table presents OLS regression results. The dependent variable is monthly stock returns. Standard errors are shown in parentheses. *, **, and *** denote significance at the 10\%, 5\%, and 1\% levels, respectively.
\end{tablenotes}
\end{threeparttable}
\end{table}
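The star logic inside regression_to_latex can also be pulled into small helpers so that every table in a paper uses the same thresholds and the same precision. This is a sketch: `significance_stars`, `format_coef_cell`, and its `decimals` parameter are hypothetical names, and the 10%/5%/1% cutoffs simply follow the convention used above:

```python
# Hypothetical helpers mirroring the star convention in regression_to_latex.
def significance_stars(pval: float) -> str:
    """Return LaTeX superscript stars for the 10%, 5%, and 1% levels."""
    if pval < 0.01:
        return "^{***}"
    if pval < 0.05:
        return "^{**}"
    if pval < 0.10:
        return "^{*}"
    return ""


def format_coef_cell(coef: float, se: float, pval: float, decimals: int = 4):
    """Return (coefficient cell, standard-error cell) as LaTeX math strings."""
    coef_cell = f"${coef:.{decimals}f}{significance_stars(pval)}$"
    se_cell = f"$({se:.{decimals}f})$"
    return coef_cell, se_cell


# Usage with illustrative numbers (not taken from the models above)
print(format_coef_cell(0.0123, 0.0041, 0.003))           # ('$0.0123^{***}$', '$(0.0041)$')
print(format_coef_cell(1.5, 0.25, 0.5, decimals=2))      # ('$1.50$', '$(0.25)$')
```

Centralizing the formatting in one place makes it harder to end up with a paper in which different tables use different decimal places.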

18.5.5 Saving Tables to Files

In practice, you’ll want to save your LaTeX tables to files for inclusion in your paper:

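from pathlib import Path


def save_latex_table(latex_string, filename):
    """
    Save LaTeX table to file, creating the parent directory if needed.

    Parameters
    ----------
    latex_string : str
        LaTeX table code
    filename : str or Path
        Output filename (should end in .tex)
    """
    path = Path(filename)
    # Create e.g. tables/ if it does not exist yet; writing would otherwise fail
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(latex_string, encoding='utf-8')
    print(f"Table saved to {path}")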

# Example (not actually saving to avoid creating files in this tutorial)
# save_latex_table(reg_latex, 'tables/regression_results.tex')

In your LaTeX document, you would then include the table with:

\input{tables/regression_results.tex}
Note: Workflow for Managing Tables

A recommended workflow for research papers:

  1. Generate all tables in Python: Keep table generation code in well-organized scripts
  2. Export to .tex files: Save each table to a separate file in a tables/ directory
  3. Include in main document: Use \input{} to include tables in your paper
  4. Automate updates: When data or specifications change, re-run Python scripts to update all tables

This separation keeps your main LaTeX document clean and makes it easy to update tables as your research evolves.
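Step 4 of this workflow can be scripted so that one command regenerates every table. The sketch below assumes each table has already been rendered to a LaTeX string (for example by regression_to_latex); `export_tables` and the file names in the usage line are hypothetical:

```python
from pathlib import Path


# Hypothetical helper for regenerating all tables in one run.
def export_tables(tables: dict, outdir="tables") -> list:
    """Write each LaTeX string in `tables` to <outdir>/<name>.tex.

    Returns the list of written paths so a build script can log them.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)  # create tables/ on the first run
    written = []
    for name, latex in tables.items():
        path = out / f"{name}.tex"
        path.write_text(latex, encoding="utf-8")  # overwrite stale versions
        written.append(path)
    return written


# Usage (commented out to avoid creating files; stats_latex is a placeholder):
# export_tables({'summary_stats': stats_latex, 'regression_results': reg_latex})
```

Because every table is rewritten on each run, a change in the data or a specification propagates to the paper the next time it is compiled.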

18.5.6 Using stargazer for LaTeX Output

To return to stargazer, the package can also generate LaTeX output directly:

# Create stargazer table with LaTeX output
stargazer_latex = Stargazer([model1, model2])

# Configure output
stargazer_latex.title('Determinants of Stock Returns')
stargazer_latex.show_model_numbers(True)
stargazer_latex.custom_columns(['Base Model', 'Full Model'], [1, 1])

# Get LaTeX code
print(stargazer_latex.render_latex())
\begin{table}[!htbp] \centering
  \caption{Determinants of Stock Returns}
\begin{tabular}{@{\extracolsep{5pt}}lcc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
& \multicolumn{2}{c}{\textit{Dependent variable: returns}} \
\cr \cline{2-3}
\\[-1.8ex] & \multicolumn{1}{c}{Base Model} & \multicolumn{1}{c}{Full Model}  \\
\\[-1.8ex] & (1) & (2) \\
\hline \\[-1.8ex]
 book_to_market & & -0.001$^{}$ \\
& & (0.003) \\
 const & 0.003$^{}$ & 0.004$^{}$ \\
& (0.011) & (0.011) \\
 log_market_cap & 0.001$^{}$ & 0.001$^{}$ \\
& (0.001) & (0.001) \\
\hline \\[-1.8ex]
 Observations & 600 & 600 \\
 $R^2$ & 0.001 & 0.001 \\
 Adjusted $R^2$ & -0.001 & -0.003 \\
 Residual Std. Error & 0.050 (df=598) & 0.050 (df=597) \\
 F Statistic & 0.408$^{}$ (df=1; 598) & 0.217$^{}$ (df=2; 597) \\
\hline
\hline \\[-1.8ex]
\textit{Note:} & \multicolumn{2}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\
\end{tabular}
\end{table}

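stargazer handles many formatting details automatically, including significance stars, degrees of freedom, and fit statistics such as the F statistic, and it is particularly good at managing multiple model specifications. Review its output before using it, though: it draws rules with \hline rather than booktabs, and it prints covariate names verbatim, so book_to_market above keeps its raw underscores, which LaTeX rejects outside math mode. Rename such covariates (stargazer's rename_covariates method does this) before exporting.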
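Because stargazer prints covariate names as they are, raw column names such as book_to_market need escaping before LaTeX will compile the table. A small escaping function makes this systematic. The sketch below uses a hypothetical name, `latex_escape`, and covers LaTeX's ten special characters; the resulting dictionary could be passed to stargazer's rename_covariates, and the same function could replace the underscore-only escaping in regression_to_latex:

```python
# Hypothetical helper: maps each LaTeX special character to a text-mode-safe form.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape LaTeX special characters, one character at a time.

    Working character by character avoids re-escaping the backslashes
    that earlier replacements introduce.
    """
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


# Build a rename map for the covariates in the tables above
names = ["book_to_market", "const", "log_market_cap"]
rename_map = {name: latex_escape(name) for name in names}
print(rename_map)
# With stargazer: stargazer_latex.rename_covariates(rename_map)
```

Escaping only fixes compilation; descriptive labels (for example "Book-to-market" instead of book_to_market) are still preferable in a finished paper.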

18.6 Advanced Topics and Best Practices

18.6.1 Panel Data Tables with Fixed Effects

When presenting panel regression results, clearly indicate which fixed effects are included:

# Example of documenting fixed effects in table
def add_fixed_effects_rows(models, fe_dict):
    """
    Add fixed effects indicator rows to regression table.

    Parameters
    ----------
    models : list
        List of model results
    fe_dict : dict
        Dictionary mapping model index to dict of FE types
        Example: {0: {'Firm FE': False, 'Year FE': False},
                  1: {'Firm FE': True, 'Year FE': False}}

    Returns
    -------
    pd.DataFrame
        Fixed effects indicator table
    """
    fe_types = list(fe_dict[0].keys())

    data = []
    for fe_type in fe_types:
        row = {'Fixed Effect': fe_type}
        for i in range(len(models)):
            row[f'Model {i+1}'] = 'Yes' if fe_dict[i][fe_type] else 'No'
        data.append(row)

    return pd.DataFrame(data)

# Example fixed effects table
fe_indicators = add_fixed_effects_rows(
    models=[model1, model2, model3],
    fe_dict={
        0: {'Firm FE': False, 'Year FE': False},
        1: {'Firm FE': False, 'Year FE': False},
        2: {'Firm FE': True, 'Year FE': False}
    }
)

fe_indicators
   Fixed Effect  Model 1  Model 2  Model 3
0       Firm FE       No       No      Yes
1       Year FE       No       No       No
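The DataFrame above is a separate table. To place the indicators inside a LaTeX regression table instead, one option is to render them as rows and splice them in just before \bottomrule. This sketch reuses the fe_dict layout from add_fixed_effects_rows; `fe_rows_latex` and `add_fe_rows_to_table` are hypothetical helpers, and the splice assumes the table contains a single \bottomrule (as regression_to_latex output does):

```python
# Hypothetical helpers: render fixed-effect indicators as LaTeX table rows.
def fe_rows_latex(fe_dict: dict) -> str:
    """Render one 'Yes'/'No' row per fixed-effect type, preceded by a rule.

    fe_dict maps model index (0, 1, ...) to {'Firm FE': bool, ...}, the
    same layout add_fixed_effects_rows expects.
    """
    n_models = len(fe_dict)
    fe_types = list(fe_dict[0].keys())
    lines = ["\\midrule"]
    for fe_type in fe_types:
        cells = ["Yes" if fe_dict[i][fe_type] else "No" for i in range(n_models)]
        lines.append(fe_type + " & " + " & ".join(cells) + " \\\\")
    return "\n".join(lines) + "\n"


def add_fe_rows_to_table(table_tex: str, fe_dict: dict) -> str:
    """Insert the fixed-effect rows immediately before the first bottom rule."""
    return table_tex.replace("\\bottomrule", fe_rows_latex(fe_dict) + "\\bottomrule", 1)


fe = {0: {'Firm FE': False, 'Year FE': False},
      1: {'Firm FE': True, 'Year FE': True}}
print(fe_rows_latex(fe))
# e.g. add_fe_rows_to_table(reg_latex, fe)
```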

18.6.2 Combining Summary Statistics and Tests

For treatment effect studies, you often want to combine summary statistics with balance tests:

import pandas as pd
from scipy import stats as scipy_stats


def create_balance_table(df, treatment_col, variables, control_name='Control',
                         treatment_name='Treatment'):
    """
    Create a balance table showing pre-treatment differences.

    Parameters
    ----------
    df : pd.DataFrame
        Dataset with treatment indicator
    treatment_col : str
        Name of treatment column (binary)
    variables : list
        Variables to compare
    control_name : str
        Label for control group
    treatment_name : str
        Label for treatment group

    Returns
    -------
    pd.DataFrame
        Balance table with means and difference tests. Differences and
        t-statistics are both computed as treatment minus control.
    """
    results = []

    for var in variables:
        control_data = df.loc[df[treatment_col] == 0, var].dropna()
        treatment_data = df.loc[df[treatment_col] == 1, var].dropna()

        control_mean = control_data.mean()
        treatment_mean = treatment_data.mean()

        # Pass the treatment group first so the t-statistic has the same sign
        # as the Difference column (scipy's default equal-variance t-test)
        t_stat, p_val = scipy_stats.ttest_ind(treatment_data, control_data)

        results.append({
            'Variable': var,
            f'{control_name} Mean': f'{control_mean:.3f}',
            f'{treatment_name} Mean': f'{treatment_mean:.3f}',
            'Difference': f'{treatment_mean - control_mean:.3f}',
            't-statistic': f'{t_stat:.3f}',
            'p-value': f'{p_val:.3f}'
        })

    return pd.DataFrame(results)

# Example (creating artificial treatment variable)
panel_df['treatment'] = (panel_df['firm_id'] <= 5).astype(int)

balance_table = create_balance_table(
    panel_df,
    'treatment',
    ['returns', 'log_market_cap', 'book_to_market'],
    control_name='Control Firms',
    treatment_name='Treatment Firms'
)

balance_table
         Variable  Control Firms Mean  Treatment Firms Mean  Difference  t-statistic  p-value
0         returns               0.013                 0.008      -0.005       -1.178    0.239
1  log_market_cap              10.054                10.078       0.024        0.150    0.880
2  book_to_market               0.986                 0.987       0.002        0.029    0.977
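The balance test above uses scipy's default equal-variance (Student) t-test. When treated and control groups differ substantially in size or dispersion, Welch's unequal-variance test is a common alternative; in scipy it is `ttest_ind(..., equal_var=False)`. Its statistic is t = (mean_T - mean_C) / sqrt(s_T^2/n_T + s_C^2/n_C), simple enough to sketch with the standard library, which also makes the sign convention explicit. `welch_t` is a hypothetical helper that returns only the statistic:

```python
import math
import statistics


# Hypothetical helper: Welch t-statistic, treatment minus control.
def welch_t(treatment, control):
    """Welch t-statistic for mean(treatment) - mean(control).

    Uses sample variances (n - 1 denominator). Only the statistic is
    returned; a p-value also needs the Welch-Satterthwaite degrees of
    freedom and a t distribution (e.g. scipy.stats.t).
    """
    n_t, n_c = len(treatment), len(control)
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treatment) / n_t
                   + statistics.variance(control) / n_c)
    return diff / se


print(welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
# With the example above: welch_t(treatment_data.tolist(), control_data.tolist())
```

Whichever test you use, state it in the table notes so readers know how the t-statistics were computed.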

18.6.3 Journal-Specific Formatting

Different journals have different table formatting requirements. Here are some common variations:

# Example: Compact format vs. spacious format
def format_for_journal(df, journal_style='compact'):
    """
    Format table according to journal requirements.

    Parameters
    ----------
    df : pd.DataFrame
        Table to format
    journal_style : str
        'compact' for journals with page limits, 'spacious' for others

    Returns
    -------
    str
        Formatted LaTeX code
    """
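    if journal_style == 'compact':
        # Prepend \small (smaller font) and top-align the tabular with [t].
        # \small stays in effect until the enclosing group ends, so \input
        # this output inside a table environment.
        latex = df.to_latex(index=True, escape=False)
        latex = "\\small\n" + latex
        latex = latex.replace("\\begin{tabular}",
                              "\\begin{tabular}[t]")
    else:
        # Default pandas output with no size adjustments
        latex = df.to_latex(index=True, escape=False)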

    return latex

# Compact version for journals with strict page limits
compact_table = format_for_journal(formatted_stats, journal_style='compact')
print("Compact format:")
print(compact_table[:300])  # Show first 300 characters
Compact format:
\small
\begin{tabular}[t]{lllllllr}
\toprule
 & Mean & Std Dev & Min & Max & Skewness & Kurtosis & Obs \\
\midrule
AAPL & -0.0155 & 0.0883 & -0.190 & 0.182 & 0.0714 & -0.4109 & 60 \\
GOOGL & 0.0022 & 0.0639 & -0.174 & 0.163 & -0.1811 & 0.5962 & 60 \\
MSFT & 0.0122 & 0.0717 & -0.114 & 0.218 & 0.3776 
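A bare \small placed before the tabular changes the font for everything that follows it in the same group, and it does not tighten the spacing between columns or rows. A safer compact variant wraps the adjustments in a group and also reduces the padding. This is a sketch operating on any LaTeX tabular string (such as the output of df.to_latex); `compact_tabular` and its defaults (4pt column padding, 0.9 row stretch) are illustrative assumptions, not journal requirements:

```python
# Hypothetical helper: scoped font and spacing adjustments for a tabular.
def compact_tabular(tabular_tex: str, tabcolsep: str = "4pt",
                    arraystretch: float = 0.9) -> str:
    """Wrap a tabular in a group with a smaller font and tighter spacing.

    The begingroup/endgroup pair keeps \\small, \\tabcolsep and
    \\arraystretch from leaking into the surrounding document.
    """
    lines = [
        "\\begingroup",
        "\\small",
        f"\\setlength{{\\tabcolsep}}{{{tabcolsep}}}",         # horizontal padding
        f"\\renewcommand{{\\arraystretch}}{{{arraystretch}}}",  # row height factor
        tabular_tex.rstrip("\n"),
        "\\endgroup",
    ]
    return "\n".join(lines) + "\n"


# Usage: compact_tabular(formatted_stats.to_latex(index=True))
print(compact_tabular("\\begin{tabular}{lc}\n\\toprule\nA & 1 \\\\\n\\bottomrule\n\\end{tabular}\n"))
```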
Tip: Check Journal Guidelines

Before finalizing your tables:

  1. Review recent papers in your target journal to see their table style
  2. Check submission guidelines for specific requirements on:
    • Font sizes
    • Table placement (in-text vs. end of document)
    • File formats accepted (.tex, .pdf, etc.)
    • Maximum table width
  3. Note any requirements about:
    • Significance levels to report
    • Decimal places for different statistics
    • Required model diagnostics
    • Footnote formatting

18.7 Summary and Best Practices

18.7.1 Key Takeaways

  1. Summary statistics tables should present central tendency, dispersion, and relevant distributional properties of your data
  2. Regression tables must clearly show coefficients, standard errors (or t-statistics), significance levels, and model fit statistics
  3. LaTeX export is essential for academic publishing; pandas provides basic functionality, but custom functions or stargazer offer more control
  4. Formatting consistency across all tables in a paper is crucial for professionalism
  5. Documentation in table notes should explain variable definitions, sample restrictions, and statistical methods

18.7.2 Checklist for Publication-Quality Tables

Before submitting tables in a paper, verify:

18.7.3 Common Pitfalls to Avoid

  1. Too many decimal places: Don’t report spurious precision
  2. Missing standard errors: Always report uncertainty measures
  3. Unclear variable names: Use descriptive labels, not raw column names
  4. No table notes: Readers need context about your methods and data
  5. Inconsistent formatting: All tables in a paper should follow the same style
  6. Missing model statistics: R-squared and sample size are almost always required
  7. Overly wide tables: Consider splitting or transposing tables that don’t fit on a page

18.8 Additional Resources

For further learning about creating research tables:

18.8.1 Example Papers with Excellent Tables

Study tables in these highly-cited finance papers:

  • Fama and French (1993) - “Common Risk Factors in the Returns on Stocks and Bonds”
  • Amihud (2002) - “Illiquidity and Stock Returns”
  • Petersen (2009) - “Estimating Standard Errors in Finance Panel Data Sets”

These papers exemplify clear table presentation that effectively communicates empirical findings.