Financial data rarely arrives in the exact format you need for analysis. Stock returns might come as one row per stock-date combination, but you need a matrix with dates as rows and stocks as columns. Or you might receive a wide table of quarterly earnings across columns that needs to be converted into a tidy long format for regression analysis. These transformations—collectively known as “reshaping”—are fundamental operations in empirical finance.
This chapter covers the essential reshaping operations you’ll use repeatedly: converting between long and wide formats, pivoting and melting data, and stacking and unstacking hierarchical indices. We’ll work through practical examples using both pandas and Polars, showing how each library handles these transformations.
15.1 Long vs. Wide Formats
The distinction between long and wide formats is fundamental to data analysis. Understanding when to use each format, and how to convert between them, will save you countless hours of frustration.
15.1.1 What Are Long and Wide Formats?
Wide format stores each entity as a row and each time period (or category) as a separate column. This is how many people naturally think about panel data—it looks like a spreadsheet:
Long format (also called “tidy” or “narrow” format) stores each observation as a separate row, with columns indicating the entity, time period, and value:
Calculating cross-sectional statistics (comparing values across entities at a point in time)
Creating visualizations that show multiple series over time
Working with time series models that expect matrices
Use long format when:
Running regressions or statistical models (most modeling libraries expect long data)
Creating grouped or faceted visualizations
Filtering or aggregating by category
Following “tidy data” principles for data analysis
In empirical finance, you’ll often start with data in one format and need to convert it for your specific analysis. For example, stock return data from CRSP arrives in long format (one row per stock-date), but calculating a covariance matrix requires wide format (dates as rows, stocks as columns).
15.1.3 A Simple Example
Let’s create a small dataset of daily stock returns in long format:
import pandas as pdimport polars as plfrom datetime import datetime, timedelta# Create sample return data in long formatdates = pd.date_range('2024-01-01', periods=5, freq='D')tickers = ['AAPL', 'MSFT', 'GOOGL']# pandas versiondata_long_pd = pd.DataFrame([ {'date': date, 'ticker': ticker, 'return': round(0.01* (hash(str(date) + ticker) %100-50) /100, 4)}for date in datesfor ticker in tickers])data_long_pd.head(10)
date
ticker
return
0
2024-01-01
AAPL
0.0047
1
2024-01-01
MSFT
0.0023
2
2024-01-01
GOOGL
-0.0038
3
2024-01-02
AAPL
0.0047
4
2024-01-02
MSFT
0.0035
5
2024-01-02
GOOGL
-0.0009
6
2024-01-03
AAPL
-0.0027
7
2024-01-03
MSFT
-0.0035
8
2024-01-03
GOOGL
-0.0026
9
2024-01-04
AAPL
-0.0017
# Polars versiondata_long_pl = pl.DataFrame([ {'date': date, 'ticker': ticker, 'return': round(0.01* (hash(str(date) + ticker) %100-50) /100, 4)}for date in datesfor ticker in tickers])data_long_pl.head(10)
shape: (10, 3)
date
ticker
return
datetime[μs]
str
f64
2024-01-01 00:00:00
"AAPL"
0.0047
2024-01-01 00:00:00
"MSFT"
0.0023
2024-01-01 00:00:00
"GOOGL"
-0.0038
2024-01-02 00:00:00
"AAPL"
0.0047
2024-01-02 00:00:00
"MSFT"
0.0035
2024-01-02 00:00:00
"GOOGL"
-0.0009
2024-01-03 00:00:00
"AAPL"
-0.0027
2024-01-03 00:00:00
"MSFT"
-0.0035
2024-01-03 00:00:00
"GOOGL"
-0.0026
2024-01-04 00:00:00
"AAPL"
-0.0017
This long format is ideal for most statistical operations. Each row is an observation: a specific stock on a specific date. You can easily filter to specific stocks, calculate summary statistics by ticker, or run panel regressions.
15.2 Pivot, Melt, Stack, Unstack
Python’s data libraries provide several operations for reshaping data. While they might seem redundant at first, each serves a specific purpose and understanding all of them makes you more fluent in data manipulation.
15.2.1 Pivot: Long to Wide
The pivot operation transforms long format data into wide format. You specify which column becomes the index (rows), which becomes the columns, and which contains the values to fill the resulting matrix.
pandas syntax:
# Convert long to wide: dates as rows, tickers as columnsdata_wide_pd = data_long_pd.pivot( index='date', columns='ticker', values='return')data_wide_pd
1
Column that becomes the row index
2
Column whose unique values become new columns
3
Column containing the values to fill the matrix
ticker
AAPL
GOOGL
MSFT
date
2024-01-01
0.0047
-0.0038
0.0023
2024-01-02
0.0047
-0.0009
0.0035
2024-01-03
-0.0027
-0.0026
-0.0035
2024-01-04
-0.0017
0.0007
-0.0033
2024-01-05
-0.0013
0.0006
-0.0012
Notice that pivot creates a DataFrame where:
Each unique date becomes a row
Each unique ticker becomes a column
Return values fill the cells
Polars syntax:
# Polars uses pivot differently - need to specify aggregationdata_wide_pl = data_long_pl.pivot( index='date', columns='ticker', values='return')data_wide_pl
/var/folders/jr/cn9h86ld68qb5rtvs9gsb1vr0000gn/T/ipykernel_60679/105412148.py:2: DeprecationWarning: the argument `columns` for `DataFrame.pivot` is deprecated. It was renamed to `on` in version 1.0.0.
data_wide_pl = data_long_pl.pivot(
shape: (5, 4)
date
AAPL
MSFT
GOOGL
datetime[μs]
f64
f64
f64
2024-01-01 00:00:00
0.0047
0.0023
-0.0038
2024-01-02 00:00:00
0.0047
0.0035
-0.0009
2024-01-03 00:00:00
-0.0027
-0.0035
-0.0026
2024-01-04 00:00:00
-0.0017
-0.0033
0.0007
2024-01-05 00:00:00
-0.0013
-0.0012
0.0006
Polars’ pivot is similar but requires you to think about aggregation from the start. If multiple rows could map to the same cell (same date and ticker), you must specify how to combine them (sum, mean, first, etc.).
15.2.2 Handling Duplicate Entries
One common issue with pivoting is duplicate entries. What happens if your data has multiple return observations for the same stock on the same date?
# Create data with duplicatesdata_with_dups_pd = pd.DataFrame([ {'date': '2024-01-01', 'ticker': 'AAPL', 'return': 0.01}, {'date': '2024-01-01', 'ticker': 'AAPL', 'return': 0.02}, # duplicate! {'date': '2024-01-01', 'ticker': 'MSFT', 'return': 0.03},])# pandas pivot will fail with duplicatestry: data_with_dups_pd.pivot(index='date', columns='ticker', values='return')exceptValueErroras e:print(f"Error: {e}")
Error: Index contains duplicate entries, cannot reshape
When pandas encounters duplicates, it raises an error. You have two options:
Use pivot_table with an aggregation function
Clean your data first to remove duplicates
# Option 1: Use pivot_table with aggregationwide_with_agg_pd = data_with_dups_pd.pivot_table( index='date', columns='ticker', values='return', aggfunc='mean'# or 'sum', 'first', 'last', etc.)wide_with_agg_pd
ticker
AAPL
MSFT
date
2024-01-01
0.015
0.03
Polars requires you to specify aggregation from the start, so duplicates are handled automatically:
/var/folders/jr/cn9h86ld68qb5rtvs9gsb1vr0000gn/T/ipykernel_60679/556176386.py:8: DeprecationWarning: the argument `columns` for `DataFrame.pivot` is deprecated. It was renamed to `on` in version 1.0.0.
wide_with_agg_pl = data_with_dups_pl.pivot(
shape: (1, 3)
date
AAPL
MSFT
str
f64
f64
"2024-01-01"
0.015
0.03
15.2.3 Melt/Unpivot: Wide to Long
The melt operation (called unpivot in Polars) is the inverse of pivot—it transforms wide format data back into long format. You specify which columns to keep as identifiers and which to “melt” into a single column.
pandas syntax:
# Reset index to make date a regular columndata_wide_pd_reset = data_wide_pd.reset_index()# Melt back to long formatdata_melted_pd = data_wide_pd_reset.melt( id_vars=['date'], value_vars=['AAPL', 'MSFT', 'GOOGL'], var_name='ticker', value_name='return')data_melted_pd.head(10)
1
Columns to keep as identifiers
2
Columns to melt (optional—melts all others if not specified)
3
Name for the new column containing original column names
Columns to keep as identifiers (called index in Polars)
2
Columns to unpivot (called on in Polars)
shape: (10, 3)
date
ticker
return
datetime[μs]
str
f64
2024-01-01 00:00:00
"AAPL"
0.0047
2024-01-02 00:00:00
"AAPL"
0.0047
2024-01-03 00:00:00
"AAPL"
-0.0027
2024-01-04 00:00:00
"AAPL"
-0.0017
2024-01-05 00:00:00
"AAPL"
-0.0013
2024-01-01 00:00:00
"MSFT"
0.0023
2024-01-02 00:00:00
"MSFT"
0.0035
2024-01-03 00:00:00
"MSFT"
-0.0035
2024-01-04 00:00:00
"MSFT"
-0.0033
2024-01-05 00:00:00
"MSFT"
-0.0012
15.2.4 Stack and Unstack: Index-Based Reshaping
While pivot and melt work with columns, stack and unstack work with index levels. They’re particularly useful when you have MultiIndex DataFrames (hierarchical indices with multiple levels).
Stack: Moves a column level into the row index (wide to long)
Unstack: Moves a row index level into the columns (long to wide)
# Create a MultiIndex DataFrame# Set both date and ticker as indexdata_multi_pd = data_long_pd.set_index(['date', 'ticker'])data_multi_pd.head()
return
date
ticker
2024-01-01
AAPL
0.0047
MSFT
0.0023
GOOGL
-0.0038
2024-01-02
AAPL
0.0047
MSFT
0.0035
# Unstack: move ticker from index to columns (long to wide)data_unstacked_pd = data_multi_pd.unstack(level='ticker')data_unstacked_pd
return
ticker
AAPL
GOOGL
MSFT
date
2024-01-01
0.0047
-0.0038
0.0023
2024-01-02
0.0047
-0.0009
0.0035
2024-01-03
-0.0027
-0.0026
-0.0035
2024-01-04
-0.0017
0.0007
-0.0033
2024-01-05
-0.0013
0.0006
-0.0012
# Stack: move ticker from columns back to index (wide to long)data_stacked_pd = data_unstacked_pd.stack(level='ticker')data_stacked_pd.head()
/var/folders/jr/cn9h86ld68qb5rtvs9gsb1vr0000gn/T/ipykernel_60679/4172318071.py:2: FutureWarning: The previous implementation of stack is deprecated and will be removed in a future version of pandas. See the What's New notes for pandas 2.1.0 for details. Specify future_stack=True to adopt the new implementation and silence this warning.
data_stacked_pd = data_unstacked_pd.stack(level='ticker')
return
date
ticker
2024-01-01
AAPL
0.0047
GOOGL
-0.0038
MSFT
0.0023
2024-01-02
AAPL
0.0047
GOOGL
-0.0009
The key difference between pivot/melt and stack/unstack:
pivot/melt: Work with regular columns, create new structure from scratch
stack/unstack: Work with existing MultiIndex structure, move levels between index and columns
Polars doesn’t have direct equivalents to stack/unstack because it doesn’t use MultiIndex. Instead, you would use combinations of pivot, melt, group_by, and join to achieve the same results.
15.3 Summary
Reshaping data is a fundamental skill in empirical finance. The key operations are:
Long vs. Wide: Choose the format appropriate for your analysis
Long for modeling and tidy operations
Wide for matrix operations and cross-sectional analysis
Pivot/Melt: Convert between long and wide formats
pivot: long → wide (create matrix from observations)
melt/unpivot: wide → long (create observations from matrix)
Handle duplicates carefully with aggregation
Stack/Unstack: Work with MultiIndex structures
unstack: move index level to columns
stack: move column level to index
Primarily useful in pandas (Polars doesn’t use MultiIndex)
The specific syntax differs between pandas and Polars, but the concepts are universal. pandas uses pivot, pivot_table, melt, stack, and unstack. Polars uses pivot and unpivot, relying more on grouped operations for reshaping tasks.
Practice these operations until they become second nature. You’ll use them constantly in empirical finance, and fluency with reshaping will make your analyses both faster and more reliable.