17 Data Visualization

Effective visualization is essential in empirical finance. A well-crafted plot can reveal patterns in returns, expose outliers in trading data, or communicate regression results more clearly than pages of tables. This chapter introduces the core tools for creating publication-quality figures in Python: matplotlib for fine-grained control, pandas for quick plots straight from DataFrames, and seaborn for statistical graphics.

We’ll focus on practical skills: understanding matplotlib’s architecture, using seaborn for exploratory analysis, and creating the kinds of plots you’ll need for academic papers—time series of stock prices and returns, distribution plots, and regression diagnostics. We’ll also cover how to export figures that meet journal standards for resolution and formatting.

17.1 Principles of Effective Data Visualization

Before diving into the technical tools, it’s worth considering what makes a visualization effective. The purpose of data visualization is to communicate information clearly and honestly. A good figure should reveal patterns in your data, support the conclusions you draw, and help your audience understand complex relationships quickly.

When creating visualizations for academic papers, keep these principles in mind:

Clarity over decoration: Resist the temptation to add visual flourishes that don’t convey information. Simple, clean figures are easier to interpret and reproduce.
Appropriate precision: Show uncertainty when it exists. Error bars, confidence intervals, and shaded regions can communicate the reliability of your estimates.
Honest scales: Start axes at zero when comparing magnitudes, use consistent scales across panels, and avoid truncating axes in ways that exaggerate differences.
Accessibility: Choose color palettes that remain distinguishable for colorblind readers, and ensure your figures are interpretable in grayscale for readers who print papers.

The tools we’ll cover in this chapter—matplotlib, pandas, and seaborn—give you the technical ability to create any visualization you need. The principles above will help you decide what to create and how to make it effective.

17.1.1 Recommended additional readings

Tufte (2001) established foundational principles for statistical graphics that remain influential today. His concept of data-ink ratio—the proportion of a graphic’s ink devoted to non-redundant display of data—encourages removing unnecessary visual elements (what he calls “chartjunk”) that distract from the data itself. While modern digital displays have made some of these concerns less pressing, the underlying principle holds: every visual element should serve a purpose.

Knaflic (2015) offers a more practical, business-oriented perspective on data visualization. Her framework emphasizes understanding your audience, choosing the right visual for your message, and eliminating clutter. She advocates for thoughtful use of pre-attentive attributes—visual properties like color, size, and position that our brains process almost instantly—to guide attention to the most important parts of your figure. For instance, using a single accent color to highlight a key data series while keeping everything else in muted gray immediately draws the viewer’s eye to what matters most.
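
The accent-color idea can be sketched in a few lines of matplotlib. This is only an illustration under stated assumptions: the five return paths are simulated, and the asset names, the highlighted series, and the specific colors are arbitrary choices rather than anything prescribed by Knaflic.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(250)

# Hypothetical cumulative log-return paths for five assets
paths = {f'Asset {i}': np.cumsum(rng.normal(0.0005, 0.01, t.size))
         for i in range(1, 6)}
highlight = 'Asset 3'  # the one series the reader should notice

fig, ax = plt.subplots(figsize=(7, 4))
for name, path in paths.items():
    is_key = name == highlight
    ax.plot(t, path,
            color='crimson' if is_key else 'lightgray',  # accent vs. muted
            linewidth=2.5 if is_key else 1.0,
            zorder=3 if is_key else 2,                   # draw the accent on top
            label=name if is_key else None)              # only the accent gets a legend entry

ax.set_xlabel('Trading day')
ax.set_ylabel('Cumulative log return')
legend = ax.legend(frameon=False)
```

Calling plt.show() afterwards displays the figure. Line width and drawing order reinforce the color contrast, so the highlighted series still stands out when the figure is printed in grayscale.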

Wilke (2019) provides comprehensive coverage of visualization types and when to use them. His book is particularly useful for understanding which plot types work best for different kinds of data: histograms and density plots for distributions, scatter plots for relationships, line plots for trends over time, and so on. He also addresses practical concerns like choosing color palettes that work for colorblind readers and in grayscale printing.

Rougier (2021) is a comprehensive guide to scientific visualization with Python and matplotlib. The book covers everything from basic plotting to advanced techniques for creating publication-quality figures, with a focus on the needs of researchers and scientists. It is freely available online at https://github.com/rougier/scientific-visualization-book.

17.2 Matplotlib Fundamentals

Matplotlib is the foundation of Python’s visualization ecosystem. While higher-level libraries like seaborn provide convenient interfaces, understanding matplotlib’s core concepts gives you the flexibility to create any visualization you need. For more details, see the Matplotlib User Guide.

17.2.1 The Figure and Axes Architecture

Matplotlib organizes plots into a hierarchy:

The Figure is the top-level container—the entire window or page where your plot lives
Each figure contains one or more Axes objects—these are the individual plots (despite the name, an Axes is a plot, not just the x and y axes)
Each Axes has x and y axis objects, titles, labels, and the actual plotted data

This separation gives you precise control over layout and appearance.

import matplotlib.pyplot as plt
import numpy as np

# Create a figure and a single axes
fig, ax = plt.subplots(figsize=(8, 5))

# Generate sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Plot on the axes
ax.plot(x, y, linewidth=2, color='steelblue')
ax.set_xlabel('Time')
ax.set_ylabel('Returns')
ax.set_title('Basic Time Series Plot')
ax.grid(True, alpha=0.3)

plt.show()

The pattern fig, ax = plt.subplots() is the standard way to start a plot. It creates both the figure and axes explicitly, giving you handles to customize each component.
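
To make the hierarchy concrete, the short sketch below inspects the objects returned by plt.subplots(). It only reads attributes and draws nothing; the printed values in the comments are what these attributes should report for a single-axes figure.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))

# The Figure keeps a list of the Axes it contains
print(fig.axes == [ax])      # True

# Each Axes knows which Figure it belongs to
print(ax.figure is fig)      # True

# Each Axes owns separate x-axis and y-axis objects
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)  # XAxis YAxis

# The figure size is stored in inches
width, height = fig.get_size_inches()
print(width, height)         # 8.0 5.0

plt.close(fig)
```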

Explicit vs. Implicit Interface

Matplotlib has two interfaces: the pyplot interface (e.g., plt.plot()) and the object-oriented interface (e.g., ax.plot()). The pyplot interface is convenient for quick plots but maintains hidden state. For publication-quality work, prefer the object-oriented interface—it’s more explicit and gives you finer control.
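
The hidden state is easiest to see with two panels. In the sketch below, plt.plot() never names its target; it draws on whatever pyplot considers the current axes, which in this setup is the most recently created panel. The variable names are illustrative.

```python
import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Implicit interface: pyplot draws on the "current" axes, which here is
# the most recently created one (the right panel)
plt.plot([0, 1, 2], [0, 1, 4])
drawn_on_right = plt.gca() is ax_right

# Explicit interface: the target axes is named in the code itself
ax_left.plot([0, 1, 2], [4, 1, 0])

n_left, n_right = len(ax_left.lines), len(ax_right.lines)
print(drawn_on_right, n_left, n_right)

plt.close(fig)
```

With the explicit call there is no question about which panel receives the line, which is why the object-oriented style scales better to multi-panel figures.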

17.2.2 Multiple Subplots

Financial analysis often requires comparing multiple series. You can create multiple axes in a grid:

# Create a 2x2 grid of plots
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Generate sample financial data
np.random.seed(42)
returns = np.random.normal(0.001, 0.02, 252)
prices = 100 * np.exp(np.cumsum(returns))
volume = np.random.lognormal(15, 0.5, 252)

# Plot on each subplot
axes[0, 0].plot(prices)
axes[0, 0].set_title('Price')
axes[0, 0].set_ylabel('Price ($)')

axes[0, 1].plot(returns)
axes[0, 1].set_title('Returns')
axes[0, 1].set_ylabel('Return')

axes[1, 0].hist(returns, bins=30, edgecolor='black', alpha=0.7)
axes[1, 0].set_title('Return Distribution')
axes[1, 0].set_xlabel('Return')

axes[1, 1].bar(range(len(volume[:20])), volume[:20])
axes[1, 1].set_title('Volume (First 20 Days)')
axes[1, 1].set_xlabel('Day')

plt.tight_layout()
plt.show()

Always call tight_layout()

The tight_layout() function automatically adjusts spacing to prevent labels from overlapping or being cut off. It’s good practice to call it before plt.show() or plt.savefig().
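
When working with the object-oriented interface, the same adjustment is available as a method on the figure. The sketch below calls fig.tight_layout() and then saves; it writes to an in-memory buffer as a stand-in for a real file path so that no file is created.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 1, figsize=(6, 5))
for ax, label in zip(axes, ['Price ($)', 'Return']):
    ax.plot(np.arange(10), np.arange(10))
    ax.set_xlabel('Day')
    ax.set_ylabel(label)
    ax.set_title(label)

# Figure-level equivalent of plt.tight_layout(); call it after all labels
# and titles are set, and before saving
fig.tight_layout()

# Save to an in-memory buffer (a stand-in for a path such as 'panels.png')
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=100)
plt.close(fig)
```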

17.2.3 Common Plot Types

Matplotlib provides methods for all standard plot types. Here we demonstrate the most common ones.

Line plots are the workhorse of time-series visualization:

np.random.seed(123)
x = np.linspace(0, 4, 50)
y1 = np.exp(-x) * np.cos(2 * np.pi * x)
y2 = np.exp(-x)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y1, 'b-', label='Oscillating')
ax.plot(x, y2, 'r--', label='Decay')
ax.legend()
ax.set_title('Line Plot')
plt.show()

Scatter plots show relationships between two variables:

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y1 + np.random.normal(0, 0.1, len(x)), alpha=0.6, s=50)
ax.set_title('Scatter Plot')
plt.show()

Bar plots compare discrete categories:

fig, ax = plt.subplots(figsize=(6, 4))
categories = ['A', 'B', 'C', 'D']
values = [23, 45, 56, 78]
ax.bar(categories, values, color='steelblue')
ax.set_title('Bar Plot')
plt.show()

Histograms display the distribution of a single variable:

fig, ax = plt.subplots(figsize=(6, 4))
data = np.random.normal(0, 1, 1000)
ax.hist(data, bins=30, edgecolor='black', alpha=0.7)
ax.set_title('Histogram')
plt.show()

Filled area plots emphasize magnitude under a curve:

fig, ax = plt.subplots(figsize=(6, 4))
ax.fill_between(x, 0, y1, alpha=0.3, color='green')
ax.plot(x, y1, 'g-', linewidth=2)
ax.set_title('Filled Area')
plt.show()

Box plots summarize distributions and highlight outliers:

fig, ax = plt.subplots(figsize=(6, 4))
data_groups = [np.random.normal(0, std, 100) for std in range(1, 5)]
ax.boxplot(data_groups)
ax.set_title('Box Plot')
ax.set_xticklabels(['Low', 'Med', 'High', 'V.High'])
plt.show()

17.2.4 Customizing Appearance

Professional figures require attention to fonts, colors, line styles, and legends:

fig, ax = plt.subplots(figsize=(9, 5))

# Generate multiple series
t = np.linspace(0, 2, 100)
series = {
    'Asset A': np.sin(2 * np.pi * t) + 0.1 * np.random.randn(100),
    'Asset B': np.cos(2 * np.pi * t) + 0.1 * np.random.randn(100),
    'Asset C': 0.5 * np.sin(4 * np.pi * t) + 0.1 * np.random.randn(100),
}

# Plot with custom styles
colors = ['#1f77b4', '#ff7f0e', '#2ca02c']
linestyles = ['-', '--', '-.']

for (name, data), color, style in zip(series.items(), colors, linestyles):
    ax.plot(t, data, label=name, color=color, linestyle=style,
            linewidth=2, alpha=0.8)

# Customize axes
ax.set_xlabel('Time (years)', fontsize=11)
ax.set_ylabel('Cumulative Return', fontsize=11)
ax.set_title('Multi-Asset Performance', fontsize=13, fontweight='bold')

# Grid and legend
ax.grid(True, linestyle=':', alpha=0.6)
ax.legend(loc='upper right', framealpha=0.9, fontsize=10)

# Spine customization (remove top and right borders)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.show()

17.3 Plotting with Pandas

For most data analysis work, the pandas plot() method is the most convenient way to create visualizations. It provides a simple interface that works directly on DataFrames and Series, automatically handling index values, column labels, and legends. Under the hood, pandas uses matplotlib, so everything you’ve learned about customizing figures still applies.

17.3.1 Basic plotting with Series and DataFrames

When you call plot() on a pandas object, the index becomes the x-axis and the values become the y-axis:

import pandas as pd

# Create a simple time series
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=100, freq='D')
returns = pd.Series(np.random.normal(0.001, 0.02, 100), index=dates)
prices = (1 + returns).cumprod() * 100

# Basic plot - index is automatically used as x-axis
prices.plot(figsize=(10, 4), title='Stock Price')
plt.ylabel('Price ($)')
plt.show()

With a DataFrame, each column becomes a separate line:

# Create multi-asset DataFrame
np.random.seed(123)
returns_df = pd.DataFrame({
    'Tech': np.random.normal(0.0005, 0.020, 100),
    'Finance': np.random.normal(0.0003, 0.015, 100),
    'Energy': np.random.normal(0.0002, 0.025, 100)
}, index=dates)

prices_df = (1 + returns_df).cumprod() * 100

# All columns plotted automatically with legend
prices_df.plot(figsize=(10, 4), title='Multi-Asset Prices')
plt.ylabel('Price ($)')
plt.show()

17.3.2 Integrating with matplotlib

The pandas plot() method returns a matplotlib Axes object, allowing you to combine the convenience of pandas with full matplotlib customization. You can also pass an existing Axes to plot on:

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Plot on specific axes using the ax parameter
prices_df['Tech'].plot(ax=axes[0], color='steelblue', linewidth=2)
axes[0].set_title('Tech Sector')
axes[0].set_ylabel('Price ($)')

# Use the returned Axes for further customization
ax = prices_df.plot(ax=axes[1], alpha=0.8)
ax.set_title('All Sectors')
ax.set_ylabel('Price ($)')
ax.axhline(100, color='gray', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()

17.3.3 Plot types

The pandas plot() method supports many plot types via the kind parameter:

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Line plot (default)
prices_df.plot(ax=axes[0, 0], title='Line Plot')

# Area plot
prices_df.plot(kind='area', ax=axes[0, 1], alpha=0.4, title='Area Plot')

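# Bar plot - useful for comparing values at specific points
# Compound daily simple returns within each month
# ('ME' requires pandas >= 2.2; use 'M' on older versions)
monthly_returns = (1 + returns_df).resample('ME').prod() - 1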
monthly_returns.iloc[-6:].plot(kind='bar', ax=axes[1, 0], title='Monthly Returns')
axes[1, 0].tick_params(axis='x', rotation=45)

# Histogram of returns
returns_df.plot(kind='hist', ax=axes[1, 1], bins=30, alpha=0.7, title='Return Distribution')

plt.tight_layout()
plt.show()

17.3.4 Why use pandas plotting?

For working with financial data in DataFrames, pandas plotting offers several advantages:

DateTime index handling: When your index is a DatetimeIndex, pandas automatically formats dates on the x-axis and handles gaps appropriately.
Column awareness: Each column becomes a separate series with automatic legend entries based on column names.
Less boilerplate: Common operations like creating legends and setting labels require less code.
Direct integration: You can chain plotting with other pandas operations like resample(), groupby(), and rolling().

When you need more control than pandas provides, you can always drop down to the matplotlib API by getting the Axes object and modifying it directly.
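
As a sketch of that workflow, the example below chains a rolling mean straight into plot() and then uses the matplotlib Axes for the details pandas does not expose. The return series is simulated, and the 20-day window is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range('2023-01-02', periods=250, freq='B')
returns = pd.Series(rng.normal(0.0005, 0.015, len(dates)),
                    index=dates, name='Return')

fig, ax = plt.subplots(figsize=(10, 4))

# Chain a rolling transformation directly into plot()
rolling_mean = returns.rolling(window=20).mean()
rolling_mean.plot(ax=ax, label='20-day rolling mean', color='steelblue')

# Drop down to matplotlib for reference lines, labels, and the legend
ax.axhline(0, color='gray', linestyle='--', linewidth=0.8)
ax.set_ylabel('Mean daily return')
ax.legend()
```

The first 19 values of the rolling mean are missing because the window is not yet full, so the plotted line starts slightly after the first date.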

17.4 Seaborn for Exploratory Analysis

Seaborn is a statistical visualization library built on top of matplotlib. It provides a higher-level interface for creating attractive and informative graphics, with particular strengths in visualizing distributions and relationships in data. Seaborn integrates closely with pandas DataFrames, making it natural to visualize data directly from your analysis workflow. For a comprehensive introduction to seaborn’s capabilities, see An introduction to seaborn.

17.4.1 Distribution Plots

Understanding the distribution of returns or other financial variables is fundamental:

import seaborn as sns
import pandas as pd

# Set seaborn style
sns.set_style("whitegrid")

# Generate sample data
np.random.seed(42)
n = 500
data = pd.DataFrame({
    'returns': np.concatenate([
        np.random.normal(0.001, 0.015, n),
        np.random.normal(0.001, 0.025, n)
    ]),
    'market': np.concatenate([
        ['Bull'] * n,
        ['Bear'] * n
    ])
})

# Create distribution plots
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Histogram with KDE
sns.histplot(data=data, x='returns', kde=True, ax=axes[0])
axes[0].set_title('Histogram with Kernel Density')
axes[0].set_xlabel('Daily Returns')

# Compare distributions across groups
sns.histplot(data=data, x='returns', hue='market', kde=True,
             alpha=0.5, ax=axes[1])
axes[1].set_title('Returns by Market Regime')
axes[1].set_xlabel('Daily Returns')

# Violin plot for detailed distribution comparison
sns.violinplot(data=data, x='market', y='returns', ax=axes[2])
axes[2].set_title('Distribution Comparison')
axes[2].set_ylabel('Daily Returns')

plt.tight_layout()
plt.show()

Seaborn automatically handles many details: choosing appropriate bin sizes, computing kernel density estimates, and creating legends for grouped data.
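
To see what a kernel density estimate computes, here is a minimal NumPy sketch of a one-dimensional Gaussian KDE. It is an illustration, not seaborn's implementation: the function name is hypothetical, and the bandwidth uses Scott's rule of thumb as an assumption, so the curve may differ slightly from the one seaborn draws.

```python
import numpy as np

def gaussian_kde_1d(sample, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `sample` on `grid`."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data (an assumption here)
        bandwidth = sample.std(ddof=1) * sample.size ** (-1 / 5)
    # Standardized distance between every grid point and every observation
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale by the bandwidth
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(42)
returns = rng.normal(0.001, 0.015, 500)
grid = np.linspace(-0.08, 0.08, 401)
density = gaussian_kde_1d(returns, grid)

# A density should integrate to roughly one over a wide enough grid
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

Each observation contributes a small bell curve, and the estimate is their average. A smaller bandwidth gives a bumpier curve and a larger one gives a smoother curve.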

17.4.2 Relationship Plots

Exploring relationships between variables is central to finance research:

# Generate correlated data
np.random.seed(123)
n = 200
market_returns = np.random.normal(0.001, 0.02, n)
stock_returns = 0.7 * market_returns + np.random.normal(0, 0.015, n)
size = np.random.choice(['Small', 'Large'], n)

data = pd.DataFrame({
    'Market': market_returns,
    'Stock': stock_returns,
    'Size': size
})

# Scatter plot with regression line
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Basic regression plot
sns.regplot(data=data, x='Market', y='Stock', ax=axes[0],
            scatter_kws={'alpha': 0.5}, line_kws={'color': 'red'})
axes[0].set_title('Stock vs. Market Returns')
axes[0].set_xlabel('Market Return')
axes[0].set_ylabel('Stock Return')

# Grouped by category
sns.scatterplot(data=data, x='Market', y='Stock', hue='Size',
                style='Size', s=80, alpha=0.7, ax=axes[1])
axes[1].set_title('Returns by Firm Size')
axes[1].set_xlabel('Market Return')
axes[1].set_ylabel('Stock Return')

plt.tight_layout()
plt.show()

The regplot function automatically fits and displays a regression line with confidence interval—perfect for visualizing CAPM-style relationships.
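
regplot draws the fitted line but does not return the estimated coefficients, so when you need the numbers (for example, to report a beta), a separate fit is required. The sketch below uses np.polyfit on simulated data with a true slope of 0.7; in applied work you would more likely use a regression package that also reports standard errors.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 200
market = rng.normal(0.001, 0.02, n)
stock = 0.7 * market + rng.normal(0, 0.015, n)  # true beta is 0.7 by construction

# Degree-1 polynomial fit: returns (slope, intercept)
beta, alpha = np.polyfit(market, stock, deg=1)

# Cross-check the slope with beta = Cov(market, stock) / Var(market)
beta_cov = np.cov(market, stock, ddof=1)[0, 1] / np.var(market, ddof=1)

print(f'beta = {beta:.3f}, alpha = {alpha:.5f}')
```

The two slope estimates agree up to floating-point error because ordinary least squares with one regressor reduces to the covariance ratio.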

17.4.3 Pair Plots and Correlation

When exploring multiple variables simultaneously, pair plots provide a comprehensive overview:

# Generate multi-asset data
np.random.seed(42)
n = 300
cov = [[0.0004, 0.0002, 0.0001],
       [0.0002, 0.0006, 0.00015],
       [0.0001, 0.00015, 0.0005]]

returns = np.random.multivariate_normal([0.001, 0.0008, 0.0012], cov, n)
data = pd.DataFrame(returns, columns=['Tech', 'Finance', 'Energy'])
data['Period'] = np.random.choice(['Q1', 'Q2'], n)

# Create pair plot
pairplot = sns.pairplot(data, hue='Period', diag_kind='kde',
                         plot_kws={'alpha': 0.6, 's': 30},
                         diag_kws={'alpha': 0.7})
# .figure replaces the deprecated .fig attribute in recent seaborn versions
pairplot.figure.suptitle('Multi-Asset Return Relationships', y=1.02)
plt.show()

Pair plots show all pairwise relationships in a single figure—the diagonal shows marginal distributions, while off-diagonal panels show scatter plots.

17.4.4 Heatmaps for Correlation Matrices

Correlation matrices are ubiquitous in portfolio analysis:

# Generate larger dataset
np.random.seed(42)
n = 250
assets = ['Tech', 'Finance', 'Energy', 'Healthcare', 'Consumer']
cov = np.array([
    [0.0400, 0.0120, 0.0080, 0.0100, 0.0150],
    [0.0120, 0.0300, 0.0100, 0.0080, 0.0090],
    [0.0080, 0.0100, 0.0350, 0.0070, 0.0060],
    [0.0100, 0.0080, 0.0070, 0.0250, 0.0110],
    [0.0150, 0.0090, 0.0060, 0.0110, 0.0280]
])

returns = np.random.multivariate_normal(np.zeros(5), cov, n)
df = pd.DataFrame(returns, columns=assets)

# Compute correlation matrix
corr = df.corr()

# Create heatmap
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt='.3f', cmap='coolwarm',
            center=0, square=True, linewidths=1,
            cbar_kws={'label': 'Correlation'},
            ax=ax)
ax.set_title('Asset Return Correlations', fontsize=13, pad=15)
plt.tight_layout()
plt.show()

The annot=True parameter prints each correlation coefficient in its cell, which is useful when the heatmap doubles as a correlation table in a paper.
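
Because the sample above is drawn from a known covariance matrix, the population correlations that the heatmap estimates can be computed directly. The sketch below converts that covariance matrix to a correlation matrix using corr_ij = cov_ij / (sd_i * sd_j), which gives a benchmark for judging sampling noise in the plotted values.

```python
import numpy as np

assets = ['Tech', 'Finance', 'Energy', 'Healthcare', 'Consumer']
cov = np.array([
    [0.0400, 0.0120, 0.0080, 0.0100, 0.0150],
    [0.0120, 0.0300, 0.0100, 0.0080, 0.0090],
    [0.0080, 0.0100, 0.0350, 0.0070, 0.0060],
    [0.0100, 0.0080, 0.0070, 0.0250, 0.0110],
    [0.0150, 0.0090, 0.0060, 0.0110, 0.0280]
])

# Standard deviations are the square roots of the diagonal variances
std = np.sqrt(np.diag(cov))

# Divide each covariance by the product of the two standard deviations
corr_true = cov / np.outer(std, std)

print(np.round(corr_true, 3))
```

For example, the Tech and Finance entry is 0.012 / sqrt(0.04 × 0.03) ≈ 0.346. With 250 observations, the sample correlations in the heatmap should land near these values but will not match them exactly.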

17.5 Figure Design for Academic Papers

Academic journals have specific requirements for figures. Following them will not guarantee acceptance, but it helps you avoid revision requests that concern only figure quality. Beyond the technical requirements, well-designed figures communicate your results more effectively and make your paper more readable. Taking the time to polish your figures is one of the highest-return investments you can make when preparing a manuscript.

17.5.1 Figure sizing and resolution

For professional-looking figures, use vector formats (PDF, SVG) whenever possible since they scale perfectly at any size. When vector formats aren’t suitable—for example, with complex scatter plots containing thousands of points—use raster formats (PNG) with a minimum resolution of 300 DPI. Most journals specify figure widths: typically 3.5 inches for single-column figures and 7 inches for figures spanning two columns. Set your figure size in inches when creating the plot, so fonts and line weights appear at their intended size in the final document.

# Set publication-ready figure size
fig, ax = plt.subplots(figsize=(7, 4))  # 7 inches wide for double-column

# Create a clean, professional plot
x = np.linspace(0, 10, 100)
y = np.sin(x) * np.exp(-x/5)

ax.plot(x, y, 'k-', linewidth=1.5)
ax.fill_between(x, 0, y, alpha=0.2, color='gray')

ax.set_xlabel('Time (years)', fontsize=11)
ax.set_ylabel('Cumulative Abnormal Return', fontsize=11)
ax.set_title('Event Study: Post-Announcement Performance',
             fontsize=12, fontweight='normal')

# Clean spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Subtle grid
ax.grid(True, linestyle=':', alpha=0.4, linewidth=0.5)

plt.tight_layout()
plt.show()
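
The link between figure size in inches, DPI, and pixel dimensions is simple arithmetic: pixels = inches × DPI. The sketch below checks this for a single-column figure by saving a PNG to an in-memory buffer and reading the dimensions from the file header. The 2.5-inch height is an assumption chosen for illustration.

```python
import io
import struct

import matplotlib.pyplot as plt

width_in, height_in, dpi = 3.5, 2.5, 300  # single-column width, assumed height

fig, ax = plt.subplots(figsize=(width_in, height_in))
ax.plot([0, 1, 2], [0, 1, 0], 'k-', linewidth=1)

buffer = io.BytesIO()
# No bbox_inches='tight' here, so the saved size matches figsize exactly
fig.savefig(buffer, format='png', dpi=dpi)
plt.close(fig)

# The PNG header stores width and height as big-endian integers at bytes 16-24
width_px, height_px = struct.unpack('>II', buffer.getvalue()[16:24])
print(width_px, height_px)  # 1050 750
```

A journal that asks for a 3.5-inch figure at 300 DPI is therefore asking for an image 1050 pixels wide. Increasing figsize instead of dpi would also add pixels, but it would shrink fonts and line weights relative to the plot.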

17.6 Time-Series Plots

Time series are fundamental to finance. When your data has a DatetimeIndex, pandas handles most of the date formatting automatically, making it the preferred approach for time-series visualization.

17.6.1 Plotting stock prices

With a DatetimeIndex, plotting stock prices is straightforward using pandas’ plot() method:

import pandas as pd

# Generate realistic stock price data
np.random.seed(42)
dates = pd.date_range(start='2020-01-01', end='2023-12-31', freq='B')
n = len(dates)
daily_returns = np.random.normal(0.0003, 0.015, n)
prices = 100 * np.exp(np.cumsum(daily_returns))

# Create DataFrame with DatetimeIndex
df = pd.DataFrame({'Price': prices}, index=dates)
df.index.name = 'Date'

# Simple plot using pandas - dates are handled automatically
df['Price'].plot(figsize=(10, 5), title='Stock Price: 2020-2023')
plt.ylabel('Price ($)')
plt.show()

17.6.2 Returns over time

When visualizing financial performance over time, you can plot either prices or cumulative returns. Both convey similar information, but cumulative returns are normalized to a common starting value (typically 0% for returns or 1 for a wealth index), which makes it easier to compare assets with different price levels.

Cumulative returns show the total growth of an investment over time:

# Calculate cumulative returns (starting from 0)
df['Return'] = df['Price'].pct_change()
df['Cumulative_Return'] = (1 + df['Return']).cumprod() - 1

# Plot cumulative returns
ax = df['Cumulative_Return'].plot(figsize=(10, 5), title='Cumulative Returns: 2020-2023')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.axhline(0, color='gray', linestyle='--', alpha=0.5)
plt.ylabel('Cumulative Return')
plt.show()
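
A small worked example shows why compounding simple returns recovers the price path. With the four illustrative prices below, the cumulative return computed from daily returns matches the one computed directly from prices as P_t / P_0 - 1.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9])

# Daily simple returns; the first value is NaN because there is no prior price
returns = prices.pct_change()

# Compound the returns: (1 + r_1)(1 + r_2)...(1 + r_t) - 1
cum_from_returns = (1 + returns).cumprod() - 1

# Same quantity computed directly from prices
cum_from_prices = prices / prices.iloc[0] - 1

print(cum_from_returns.round(3).tolist())  # [nan, 0.1, -0.01, 0.089]
print(cum_from_prices.round(3).tolist())   # [0.0, 0.1, -0.01, 0.089]
```

The only difference is the first observation, which is missing in the return-based version. Filling it with zero (for example with fillna(0)) makes the plotted line start at 0%.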

Daily returns show the day-to-day variation in returns. This type of plot is typically not very informative on its own—individual daily returns appear as noise—but it can be useful when you want to highlight the volatility of returns or identify periods of unusual activity:

# Plot daily returns
ax = df['Return'].plot(figsize=(10, 4), title='Daily Returns', alpha=0.7, linewidth=0.8)
ax.axhline(0, color='black', linestyle='--', linewidth=0.8)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.1%}'))
plt.ylabel('Daily Return')
plt.show()

17.6.3 Rolling statistics

Visualizing rolling windows helps identify time-varying patterns like changes in volatility regimes:

# Calculate rolling statistics
window = 60  # 60 trading days (approximately 3 months)
df['Rolling_Std'] = df['Return'].rolling(window=window).std()

# Annualize the rolling volatility
df['Annualized_Vol'] = df['Rolling_Std'] * np.sqrt(252)

# Plot rolling volatility
ax = df['Annualized_Vol'].plot(figsize=(10, 5), title='Rolling Volatility (60-day window)',
                                color='darkred', linewidth=1.5)
ax.fill_between(df.index, 0, df['Annualized_Vol'], alpha=0.3, color='darkred')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
plt.ylabel('Annualized Volatility')
plt.show()
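
Two details of the calculation above are worth checking numerically: a rolling window produces missing values until it is full, and annualization multiplies the daily standard deviation by the square root of 252 (assuming 252 trading days per year and returns that are uncorrelated across days). The sketch below uses simulated returns with a daily volatility of 1.5%.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
daily = pd.Series(rng.normal(0.0003, 0.015, 500))

window = 60
rolling_std = daily.rolling(window=window).std()

# The first window - 1 values are NaN because the window is not yet full
n_missing = int(rolling_std.isna().sum())
print(n_missing)  # 59

# Annualize: 0.015 * sqrt(252) is roughly 0.238, i.e. about 24% per year
annualized = rolling_std * np.sqrt(252)
print(round(0.015 * np.sqrt(252), 3))  # 0.238
print(round(annualized.mean(), 3))
```

The average of the rolling estimates should sit near 24%, while individual windows fluctuate around it purely from sampling noise, even though the simulated volatility is constant. This is worth keeping in mind before reading regime changes into a rolling volatility plot.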

17.6.4 Multiple series comparison

Comparing multiple assets or portfolios is common in finance. With pandas, plotting multiple columns is automatic:

# Generate multiple asset returns
np.random.seed(123)
dates = pd.date_range(start='2020-01-01', end='2023-12-31', freq='B')
n = len(dates)

returns_df = pd.DataFrame({
    'Tech': np.random.normal(0.0005, 0.020, n),
    'Finance': np.random.normal(0.0003, 0.015, n),
    'Energy': np.random.normal(0.0002, 0.025, n)
}, index=dates)

# Calculate cumulative returns (wealth indices starting at 1)
cumulative = (1 + returns_df).cumprod()

# Plot all columns with automatic legend
ax = cumulative.plot(figsize=(10, 6), title='Multi-Asset Performance Comparison', linewidth=2)
ax.axhline(1, color='gray', linestyle='--', alpha=0.5)
plt.ylabel('Cumulative Return (Index)')
plt.show()

17.7 Styles and Color Choices

Consistent styling across all figures in a paper creates a professional, polished appearance. Matplotlib provides several mechanisms for controlling the visual style of your plots, from built-in style sheets to fine-grained control over individual parameters. Equally important is choosing colors that work for all readers, including those with color vision deficiencies, and that reproduce well in both color and grayscale.

17.7.1 Using style sheets

Matplotlib’s style system lets you apply consistent formatting across all figures with a single command. The library includes many built-in styles, and you can create custom style files to match specific journal requirements or personal preferences. Using style sheets ensures that fonts, line weights, grid styles, and other visual elements remain consistent without manually setting each parameter.

# Available styles
print("Available styles:", plt.style.available[:10])  # Show first 10

# Use a clean, publication-ready style
plt.style.use('seaborn-v0_8-paper')

fig, ax = plt.subplots(figsize=(7, 4))

# Scatter with fitted regression line
np.random.seed(42)
x = np.random.normal(0, 1, 50)
y = 2*x + np.random.normal(0, 0.5, 50)
slope, intercept = np.polyfit(x, y, 1)  # least-squares fit
x_line = np.sort(x)
ax.scatter(x, y, alpha=0.6, s=50)
ax.plot(x_line, slope * x_line + intercept, 'r--', alpha=0.8, label='Fitted line')
ax.set_xlabel('Factor')
ax.set_ylabel('Return')
ax.set_title('Factor Exposure')
ax.legend()

plt.tight_layout()
plt.show()

# Reset to default
plt.style.use('default')

Available styles: ['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot']

For consistency across a paper, consider creating a custom style file or setting rcParams at the start of your analysis script. Seaborn also provides its own style settings through sns.set_style() and sns.set_context(), which can be useful when combining seaborn and matplotlib plots.
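
One way to set rcParams is to collect them in a dictionary and apply them with plt.rc_context(), which limits the changes to a with-block. The values below are a hypothetical house style for a single-column figure, not the requirements of any particular journal.

```python
import matplotlib.pyplot as plt

# Hypothetical house style; adjust the values to your target journal
paper_style = {
    'font.size': 9,
    'axes.titlesize': 10,
    'axes.labelsize': 9,
    'lines.linewidth': 1.5,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'savefig.dpi': 300,
}

default_font_size = plt.rcParams['font.size']

# The settings apply only inside the with-block
with plt.rc_context(paper_style):
    fig, ax = plt.subplots(figsize=(3.5, 2.5))
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_xlabel('Factor')
    ax.set_ylabel('Return')
    font_size_inside = plt.rcParams['font.size']
    top_spine_visible = ax.spines['top'].get_visible()

# Outside the block, the previous settings are restored
font_size_after = plt.rcParams['font.size']
print(font_size_inside, font_size_after, top_spine_visible)

plt.close(fig)
```

To apply the same settings for a whole script instead, pass the dictionary to plt.rcParams.update() once at the top.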

17.7.2 Color schemes for accessibility

Effective color choices enhance clarity and ensure your figures are accessible to all readers. About 8% of men and 0.5% of women have some form of color vision deficiency, most commonly affecting the ability to distinguish red from green. Choosing colorblind-friendly palettes from the start means you won’t need to redesign figures later. Both matplotlib and seaborn offer built-in color palettes designed with accessibility in mind.

# Colorblind-friendly palette
colors_cb = ['#0173B2', '#DE8F05', '#029E73', '#CC78BC', '#CA9161']

fig, ax = plt.subplots(figsize=(8, 4))

# Multiple lines with accessible colors
t = np.linspace(0, 2, 100)
for i, color in enumerate(colors_cb[:3]):
    y = np.sin(2 * np.pi * (i+1) * t) * np.exp(-t/2)
    ax.plot(t, y, color=color, linewidth=2, label=f'Portfolio {i+1}')
ax.set_xlabel('Time')
ax.set_ylabel('Cumulative Return')
ax.set_title('Colorblind-Friendly Palette')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Seaborn exposes named palettes through sns.color_palette(). The "colorblind" palette is the one designed for readers with color vision deficiencies; "deep" and "muted" are softer variants of the default color cycle and are not guaranteed to be distinguishable for those readers. For continuous data, prefer perceptually uniform colormaps like "viridis", "plasma", or "cividis" over rainbow colormaps like "jet", which can create misleading visual gradients and reproduce poorly in grayscale.
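
The grayscale problem with rainbow colormaps can be checked numerically. The sketch below samples a colormap, converts each color to an approximate brightness using the Rec. 709 luma weights, and tests whether brightness moves in one direction only. The helper names are hypothetical, and the weights are a rough proxy for perceived lightness rather than a full color-science calculation.

```python
import matplotlib.pyplot as plt
import numpy as np

def approximate_luminance(cmap_name, n=16):
    """Approximate grayscale brightness along a colormap (Rec. 709 weights)."""
    rgba = plt.get_cmap(cmap_name)(np.linspace(0, 1, n))
    return rgba[:, :3] @ np.array([0.2126, 0.7152, 0.0722])

def is_monotonic(values):
    """True if the values never change direction."""
    steps = np.diff(values)
    return bool(np.all(steps >= 0) or np.all(steps <= 0))

for name in ['viridis', 'jet']:
    print(name, is_monotonic(approximate_luminance(name)))
```

"viridis" gets steadily brighter from one end to the other, so larger values stay distinguishable after grayscale conversion. "jet" brightens toward the middle and darkens again at the red end, so two very different values can print as the same shade of gray.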

17.8 Exporting Publication-Quality Figures

The final step is saving your figures in formats suitable for journals and presentations.

17.8.1 Save Formats and Settings

Different output formats serve different purposes:

# Create a publication-ready figure
fig, ax = plt.subplots(figsize=(7, 4))

x = np.linspace(0, 10, 100)
y1 = np.sin(x) * np.exp(-x/5)
y2 = np.cos(x) * np.exp(-x/5)

ax.plot(x, y1, 'b-', linewidth=2, label='Strategy A')
ax.plot(x, y2, 'r--', linewidth=2, label='Strategy B')
ax.fill_between(x, y1, y2, alpha=0.2, color='gray')

ax.set_xlabel('Time (years)', fontsize=11)
ax.set_ylabel('Return', fontsize=11)
ax.set_title('Strategy Performance Comparison', fontsize=12)
ax.legend(loc='upper right', framealpha=0.9)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()

# Save in multiple formats (commented out to avoid file creation)
# High-resolution PNG for Word documents
# plt.savefig('figure.png', dpi=300, bbox_inches='tight',
#             facecolor='white', edgecolor='none')

# Vector PDF for LaTeX documents (preferred for publications)
# plt.savefig('figure.pdf', format='pdf', bbox_inches='tight')

# SVG for web or further editing
# plt.savefig('figure.svg', format='svg', bbox_inches='tight')

# Transparent background version
# plt.savefig('figure_transparent.png', dpi=300, bbox_inches='tight',
#             transparent=True)

plt.show()

Best Practices for Saving Figures

Use bbox_inches='tight': Automatically crops whitespace around the figure (the saved dimensions may then differ slightly from figsize)
Set dpi=300: Ensures high quality for print (journals typically require 300-600 DPI)
Specify facecolor='white': Prevents transparent backgrounds in PNG files unless explicitly desired
Save as PDF for LaTeX: Vector formats scale perfectly and meet most journal requirements
Keep originals: Save a Python script or Jupyter notebook with the code to regenerate figures

17.8.2 Format-specific considerations

The main formats involve the following trade-offs:

PDF (vector): Best for publications. Vector formats scale infinitely without quality loss and produce small file sizes for simple plots. PDF is required or preferred by most journals.
PNG (raster): Good for presentations and Word documents. PNG offers wide compatibility but has fixed resolution, so you should set dpi=300 or higher for print quality. File sizes can be larger for complex plots.
SVG (vector): Best for web and further editing. SVG files can be edited in vector graphics software like Inkscape or Adobe Illustrator, and they render well in modern browsers. However, SVG is not always accepted by journals.
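
For dense scatter plots there is a middle ground between the two families: keep the file vector-based but rasterize only the heavy layer. The sketch below passes rasterized=True to scatter() and saves a PDF to an in-memory buffer (a stand-in for a real path). The data are simulated, and the point count is arbitrary.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20_000)
y = 0.5 * x + rng.normal(size=20_000)

fig, ax = plt.subplots(figsize=(3.5, 3))

# rasterized=True embeds the points as an image inside the vector file;
# axes, ticks, and text remain vector objects
ax.scatter(x, y, s=2, alpha=0.3, rasterized=True)
ax.set_xlabel('Market return')
ax.set_ylabel('Stock return')
is_rasterized = ax.collections[0].get_rasterized()

buffer = io.BytesIO()  # stand-in for a path such as 'scatter.pdf'
# For a vector format, dpi sets the resolution of the rasterized layer
fig.savefig(buffer, format='pdf', dpi=300)
plt.close(fig)
```

Labels and axes stay sharp at any zoom level, while the file avoids storing tens of thousands of individual markers.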

17.8.3 Batch Saving Multiple Figures

When creating many figures for a paper, automate the export process:

# Example: Create and save multiple figures programmatically
def save_figure(fig, filename, formats=('pdf', 'png')):
    """
    Save figure in multiple formats with consistent settings.

    Parameters
    ----------
    fig : matplotlib.figure.Figure
        The figure to save
    filename : str
        Base filename (without extension)
    formats : sequence of str
        Formats to save ('pdf', 'png', 'svg'); the tuple default avoids
        the shared-mutable-default pitfall of a list
    """
    for fmt in formats:
        if fmt == 'png':
            # High-resolution PNG
            fig.savefig(f'{filename}.{fmt}', dpi=300,
                       bbox_inches='tight', facecolor='white')
        else:
            # Vector formats
            fig.savefig(f'{filename}.{fmt}', format=fmt,
                       bbox_inches='tight')
    print(f"Saved {filename} in formats: {', '.join(formats)}")

# Example usage (commented to avoid file creation)
# fig1, ax1 = plt.subplots(figsize=(7, 4))
# ax1.plot([1, 2, 3], [1, 4, 9])
# ax1.set_title('Figure 1: Results')
# save_figure(fig1, 'paper_figure_1', formats=['pdf', 'png'])
# plt.close(fig1)

Figure Numbering and Organization

For paper submissions:

Name figures descriptively: Use names like returns_distribution.pdf rather than fig1.pdf. Avoid putting figure numbers in filenames—they cause friction when you reorder content.
Keep separate directories: Store figures in a figures/ or output/ directory
Version control: Commit your figure generation code, not the generated figures themselves. Output files can always be regenerated, and committing them clutters your repository.
Document parameters: Note DPI, dimensions, and any special requirements in your code
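
These conventions are easy to encode in a small helper. The sketch below builds a descriptive path inside a figures directory and creates the directory if it does not exist. The function name figure_path and its parameters are hypothetical, and a temporary directory is used so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

def figure_path(name, ext='pdf', root='figures'):
    """Return root/name.ext, creating the directory if needed (hypothetical helper)."""
    directory = Path(root)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f'{name}.{ext}'

# In a project, root would simply be 'figures' or 'output'
with tempfile.TemporaryDirectory() as tmp:
    fig, ax = plt.subplots(figsize=(7, 4))  # 7 inches wide: double-column
    ax.hist([0.01, -0.02, 0.005, 0.015, -0.01], bins=5)
    ax.set_title('Return Distribution')

    target = figure_path('returns_distribution', ext='pdf',
                         root=Path(tmp) / 'figures')
    fig.savefig(target, bbox_inches='tight')
    plt.close(fig)

    saved = target.exists()
    saved_name = target.name

print(saved, saved_name)
```

Adding the figures directory to .gitignore keeps generated output out of version control while the generating code stays tracked.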