4 Code Quality and Documentation

Writing code is like writing prose for a dual audience: computers that execute it and humans who read, maintain, and extend it. While any code that runs correctly serves its immediate purpose, the real measure of quality lies in how easily others (including your future self) can understand, trust, and build upon your work.

In empirical research, where reproducibility and transparency are paramount, code quality takes on additional importance. A subtle bug in your data processing pipeline can invalidate months of work. Poorly documented functions can make it impossible for reviewers to verify your methodology. Code that works but cannot be understood becomes a liability rather than an asset.

This chapter covers practical tools and techniques for writing clear, reproducible, and reliable code. We will explore how to organize your code for readability, document it effectively, catch errors before they cause problems, and maintain consistent style across your projects. These practices are not about perfectionism; they are about making your research more reliable, your collaboration more effective, and your future work easier.

4.1 Code Organization and Readability

The foundation of code quality is organization. Well-organized code reveals its structure and intent at a glance, making it easier to navigate, debug, and modify. This section covers the principles that make code readable and the practical techniques for achieving them.

4.1.1 The Principle of Least Surprise

Good code should behave the way readers expect. This means following established conventions, using descriptive names, and structuring your logic in clear, predictable ways. When you need to deviate from conventions, document why.

Consider two approaches to calculating portfolio returns:

# Unclear: cryptic names and unexpected structure
def calc(d, w):
    r = []
    for i in range(len(d)):
        r.append(sum([d[i][j] * w[j] for j in range(len(w))]))
    return r

# Clear: descriptive names and explicit structure
def calculate_portfolio_returns(asset_returns, weights):
    """Calculate portfolio returns given asset returns and weights."""
    portfolio_returns = []
    for period_returns in asset_returns:
        period_portfolio_return = sum(
            asset_return * weight
            for asset_return, weight in zip(period_returns, weights)
        )
        portfolio_returns.append(period_portfolio_return)
    return portfolio_returns

The second version is longer, but its intent is immediately clear. The function name describes what it does, parameter names indicate what inputs are expected, and the logic is explicit rather than compressed.
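
As a quick check that renaming and restructuring did not change behavior, the sketch below runs both versions on a small input. The data are made up for illustration and chosen so that the arithmetic is exact in floating point.

```python
def calc(d, w):
    r = []
    for i in range(len(d)):
        r.append(sum([d[i][j] * w[j] for j in range(len(w))]))
    return r


def calculate_portfolio_returns(asset_returns, weights):
    """Calculate portfolio returns given asset returns and weights."""
    portfolio_returns = []
    for period_returns in asset_returns:
        period_portfolio_return = sum(
            asset_return * weight
            for asset_return, weight in zip(period_returns, weights)
        )
        portfolio_returns.append(period_portfolio_return)
    return portfolio_returns


# Hypothetical data: two periods, two assets, equal weights
asset_returns = [[0.5, 0.25], [0.0, 1.0]]
weights = [0.5, 0.5]

unclear_result = calc(asset_returns, weights)
clear_result = calculate_portfolio_returns(asset_returns, weights)
print(clear_result)  # [0.375, 0.5]
```

Both functions return the same list, so the readable version can replace the cryptic one without affecting downstream results.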

4.1.2 Project Directory Structure

A well-organized directory structure makes projects easier to navigate and understand. For empirical research projects, we recommend the following layout:

my-project/
├── conf/
│   └── config.yaml
├── data/
│   ├── raw/
│   ├── clean/
│   └── results/
├── notebooks/
├── notes/
├── paper/
├── slides/
├── src/my_project/
│   ├── __init__.py
│   ├── pipeline.py
│   └── utils/
├── tests/
├── .gitignore
├── pyproject.toml
└── README.md

conf/: Configuration files for your analysis pipeline
data/: All data files, organized by processing stage
data/raw/: Original unprocessed data (never modify these files)
data/clean/: Processed and cleaned datasets
data/results/: Analysis outputs like regression results
notebooks/: Jupyter notebooks for exploration and prototyping
notes/: Research notes and documentation
paper/: Paper manuscript (Quarto or LaTeX)
slides/: Presentation slides
src/my_project/: Main Python package with your reusable code
src/my_project/pipeline.py: Main analysis pipeline script
src/my_project/utils/: Utility functions and helpers
tests/: Unit tests for your code
.gitignore: Files to exclude from version control
pyproject.toml: Project dependencies and metadata
README.md: Project overview and setup instructions

This structure separates concerns clearly: raw data stays pristine, processed data is reproducible, and code is organized into reusable modules. The src/ directory pattern keeps your package importable while maintaining a clean project root.
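
One way to make this layout usable from code is to build every path from a single project root instead of scattering string literals through the analysis. The sketch below is only an illustration: the helper name project_paths and the dictionary keys are hypothetical, and the directory names are taken from the tree above.

```python
from pathlib import Path


def project_paths(root):
    """Return key locations of the project layout, relative to root.

    Hypothetical helper for illustration; the keys are arbitrary labels.
    """
    root = Path(root)
    data = root / "data"
    return {
        "config": root / "conf" / "config.yaml",
        "raw": data / "raw",          # read-only inputs
        "clean": data / "clean",      # outputs of the cleaning step
        "results": data / "results",  # regression tables, figures, etc.
    }


paths = project_paths("my-project")
print(paths["raw"])
```

Centralizing paths this way means a renamed directory needs to be changed in one place only.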

When working with version control, we usually want to keep data and results in a different location. We discuss this in Chapter 8.

4.1.3 Function Design

Functions should do one thing well. A function that does multiple unrelated tasks is harder to test, harder to reuse, and harder to understand. Consider this example:

# Poor: one function doing too much
def analyze_data(filepath):
    # Read data
    with open(filepath) as f:
        lines = f.readlines()

    # Parse and clean data
    values = []
    for line in lines:
        parts = line.strip().split(',')
        if len(parts) >= 2 and parts[1]:
            values.append(float(parts[1]))

    # Calculate statistics
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    std = (sum(squared_diffs) / len(values)) ** 0.5

    # Save results
    with open('stats.txt', 'w') as f:
        f.write(f'Mean: {mean}\nStd: {std}')

    return values

# Better: separate concerns into focused functions
def read_data_file(filepath):
    """Read lines from a data file."""
    with open(filepath) as f:
        return f.readlines()

def parse_values(lines, column=1):
    """Extract numeric values from CSV lines."""
    values = []
    for line in lines:
        parts = line.strip().split(',')
        if len(parts) > column and parts[column]:
            values.append(float(parts[column]))
    return values

def calculate_mean(values):
    """Calculate the arithmetic mean of a list of numbers."""
    if not values:
        raise ValueError("Cannot calculate mean of empty list")
    return sum(values) / len(values)

def calculate_std(values):
    """Calculate the population standard deviation of a list of numbers."""
    if len(values) < 2:
        raise ValueError("Need at least 2 values for standard deviation")
    mean = calculate_mean(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return (sum(squared_diffs) / len(values)) ** 0.5

def save_statistics(stats, output_path):
    """Save statistics dictionary to a text file."""
    with open(output_path, 'w') as f:
        for key, value in stats.items():
            f.write(f'{key}: {value}\n')

The refactored version is more verbose, but each function is now:

Testable: You can verify each step independently
Reusable: Functions can be used in other analyses
Readable: Each function has a clear, single purpose
Maintainable: Changes to one step don’t affect others
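
To illustrate the "testable" point, the sketch below repeats two of the focused functions and checks them on a few in-memory lines, with no file on disk. The sample lines are made up for illustration.

```python
def parse_values(lines, column=1):
    """Extract numeric values from CSV lines."""
    values = []
    for line in lines:
        parts = line.strip().split(',')
        if len(parts) > column and parts[column]:
            values.append(float(parts[column]))
    return values


def calculate_mean(values):
    """Calculate the arithmetic mean of a list of numbers."""
    if not values:
        raise ValueError("Cannot calculate mean of empty list")
    return sum(values) / len(values)


# Hypothetical input: one valid row, one empty field, one valid row,
# and one row that has no second column at all
lines = ["a,1.0\n", "b,\n", "c,3.0\n", "malformed\n"]

values = parse_values(lines)
mean = calculate_mean(values)
print(values, mean)  # [1.0, 3.0] 2.0
```

With the original analyze_data function, the same check would require writing a temporary file and then reading stats.txt back in.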

4.1.4 Managing Complexity with Abstraction

As your analysis grows more sophisticated, you will build up layers of abstraction. Lower-level functions handle details; higher-level functions orchestrate workflow:

# Low-level: handle specific calculations
def calculate_mean(values):
    """Calculate arithmetic mean."""
    return sum(values) / len(values)

def calculate_variance(values):
    """Calculate population variance."""
    mean = calculate_mean(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / len(values)

def calculate_covariance(x_values, y_values):
    """Calculate covariance between two lists of values."""
    if len(x_values) != len(y_values):
        raise ValueError("Lists must have same length")
    x_mean = calculate_mean(x_values)
    y_mean = calculate_mean(y_values)
    products = [(x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values)]
    return sum(products) / len(x_values)

# Mid-level: combine calculations
def calculate_descriptive_stats(values):
    """Calculate common descriptive statistics."""
    mean = calculate_mean(values)
    variance = calculate_variance(values)
    std = variance ** 0.5
    return {'mean': mean, 'variance': variance, 'std': std}

# High-level: orchestrate entire analysis
def analyze_dataset(filepath, column=1):
    """
    Perform comprehensive analysis of a data file.

    Reads data, calculates descriptive statistics, and
    returns a complete summary.
    """
    lines = read_data_file(filepath)
    values = parse_values(lines, column)
    stats = calculate_descriptive_stats(values)
    stats['n'] = len(values)
    stats['min'] = min(values)
    stats['max'] = max(values)
    return stats

4.1.5 Code Layout and Readability

Python’s readability comes partly from its use of whitespace. Use it deliberately:

# Cramped and hard to parse
def process_records(data,filters=None,transform=True):
    if filters is not None:data=[x for x in data if all(f(x) for f in filters)]
    if transform:data=[{'id':x['id'],'value':x['amount']*100} for x in data]
    return data

# Readable with proper spacing
def process_records(data, filters=None, transform=True):
    """Process records with optional filtering and transformation."""
    if filters is not None:
        data = [x for x in data if all(f(x) for f in filters)]

    if transform:
        data = [
            {'id': x['id'], 'value': x['amount'] * 100}
            for x in data
        ]

    return data

Guidelines for spacing:

Use blank lines to separate logical sections
Add spaces around operators (=, +, ==, etc.), but not around = in keyword arguments or default values
Avoid spaces immediately inside parentheses or brackets
Group related items visually

We discuss how this can be automated in Section 4.4.

4.2 Docstrings and Documentation Standards

Documentation bridges the gap between what your code does and what users (including future you) need to know to use it correctly. In Python, this documentation primarily takes the form of docstrings—string literals that appear as the first statement in a module, class, or function.

4.2.1 Why Docstrings Matter

Unlike comments, which explain how code works, docstrings explain what code does and how to use it. They serve multiple purposes:

IDE integration: Modern editors display docstrings as tooltips and in autocomplete
Generated documentation: Tools like Sphinx can extract docstrings to create HTML documentation
Interactive help: The help() function displays docstrings in the Python REPL
Code review: Reviewers can understand intent without reading implementation
AI assistance: AI coding assistants use docstrings to understand your code and provide better suggestions

Consider the difference:

# Without docstring
def clip_values(values, lower, upper):
    return [max(lower, min(upper, x)) for x in values]

# With docstring
def clip_values(values, lower, upper):
    """
    Limit values to fall within specified bounds.

    Parameters
    ----------
    values : list of float
        Data values to clip
    lower : float
        Lower bound (values below this are set to lower)
    upper : float
        Upper bound (values above this are set to upper)

    Returns
    -------
    list of float
        Clipped values with the same length as input

    Examples
    --------
    >>> clip_values([1, 5, 10, 15], lower=3, upper=12)
    [3, 5, 10, 12]

    Notes
    -----
    This function is useful for reducing the impact of outliers while
    retaining more information than simple outlier removal.
    """
    return [max(lower, min(upper, x)) for x in values]

When you call help(clip_values) or hover over the function in VS Code, you will see the formatted docstring with all the information needed to use the function correctly.
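
Tools such as help() and editor tooltips read the docstring from the function object itself. As a minimal sketch, the standard-library inspect module can retrieve the same text programmatically; the docstring here is shortened for brevity.

```python
import inspect


def clip_values(values, lower, upper):
    """
    Limit values to fall within specified bounds.

    Parameters
    ----------
    values : list of float
        Data values to clip
    """
    return [max(lower, min(upper, x)) for x in values]


# inspect.getdoc returns the docstring with indentation and
# leading/trailing blank lines cleaned up
clean_docstring = inspect.getdoc(clip_values)
summary_line = clean_docstring.splitlines()[0]
print(summary_line)  # Limit values to fall within specified bounds.
```

Because many tools show only the first line, it is worth making the one-line summary meaningful on its own.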

There are multiple standard docstring styles, but the most relevant for research projects are the NumPy and Google styles. Both provide clear structure for documenting parameters, return values, and examples. AI assistants can help generate well-formatted docstrings, but you should always review the descriptions to ensure they accurately reflect what your code does.

4.2.2 NumPy Documentation Style

The NumPy documentation style is the standard in the scientific Python community. It is more verbose than some alternatives, but its structured format makes it ideal for technical and scientific code.

The basic structure includes:

One-line summary: Brief description of what the function does
Extended description (optional): Additional context and details
Parameters: Each parameter with its type and description
Returns: What the function returns and its type
Examples (optional): Usage examples with expected output
Notes (optional): Additional information, warnings, or references

Here’s a complete example:

def calculate_rolling_mean(values, window, min_periods=None):
    """
    Calculate rolling mean over a sliding window.

    Computes the arithmetic mean over a rolling window of specified size.
    The function handles edge cases at the beginning of the series.

    Parameters
    ----------
    values : list of float
        Sequence of numeric values
    window : int
        Number of values to include in each rolling window
    min_periods : int, optional
        Minimum number of observations required to calculate mean.
        If None, defaults to window size. Windows with fewer observations
        will return None.

    Returns
    -------
    list of float or None
        Rolling mean values. Returns None for positions where there
        are fewer than min_periods observations.

    Raises
    ------
    ValueError
        If window is less than 1
    ValueError
        If min_periods is greater than window

    Examples
    --------
    >>> values = [1.0, 2.0, 3.0, 4.0, 5.0]
    >>> calculate_rolling_mean(values, window=3)
    [None, None, 2.0, 3.0, 4.0]
    >>> calculate_rolling_mean(values, window=3, min_periods=1)
    [1.0, 1.5, 2.0, 3.0, 4.0]

    Notes
    -----
    The rolling mean at position i is calculated as the average of values
    from position (i - window + 1) to i, inclusive.

    See Also
    --------
    calculate_mean : Calculate mean of entire sequence
    calculate_rolling_std : Calculate rolling standard deviation
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    if min_periods is None:
        min_periods = window

    if min_periods > window:
        raise ValueError("min_periods cannot exceed window")

    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        window_values = values[start:i + 1]
        if len(window_values) >= min_periods:
            result.append(sum(window_values) / len(window_values))
        else:
            result.append(None)

    return result
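
Because the Examples section uses interactive-prompt syntax (>>>), the standard-library doctest module can execute the examples and compare the results with the documented output. The sketch below applies this to the shorter clip_values function from earlier; the wrapper run_docstring_examples is a hypothetical helper written for this illustration, not part of doctest.

```python
import doctest


def clip_values(values, lower, upper):
    """
    Limit values to fall within specified bounds.

    Examples
    --------
    >>> clip_values([1, 5, 10, 15], lower=3, upper=12)
    [3, 5, 10, 12]
    """
    return [max(lower, min(upper, x)) for x in values]


def run_docstring_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder(recurse=False)
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples access to the function under its own name
    tests = finder.find(
        func, name=func.__name__, module=False, globs={func.__name__: func}
    )
    for test in tests:
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = run_docstring_examples(clip_values)
print(f"{attempted} example(s) run, {failed} failed")
```

For a whole file, running python -m doctest your_module.py from the command line does the same job and keeps documented examples from drifting out of date.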

4.2.3 Google Documentation Style

Google style is an alternative that is more compact while still providing structure. It uses indented sections rather than underlined headers:

def normalize_values(values, method='zscore'):
    """Normalize a list of values using the specified method.

    Transforms values to have comparable scales, which is useful for
    combining variables measured in different units.

    Args:
        values (list of float): Numeric values to normalize
        method (str): Normalization method. Use 'zscore' for zero mean
            and unit variance, 'minmax' for scaling to [0, 1] range.

    Returns:
        list of float: Normalized values

    Raises:
        ValueError: If values has zero standard deviation (for zscore)
        ValueError: If all values are equal (for minmax)
        ValueError: If method is not recognized

    Example:
        >>> data = [10, 20, 30, 40, 50]
        >>> normalize_values(data, method='minmax')
        [0.0, 0.25, 0.5, 0.75, 1.0]
    """
    if method == 'zscore':
        mean = sum(values) / len(values)
        squared_diffs = [(x - mean) ** 2 for x in values]
        std = (sum(squared_diffs) / len(values)) ** 0.5
        if std == 0:
            raise ValueError("Cannot zscore normalize: zero standard deviation")
        return [(x - mean) / std for x in values]

    elif method == 'minmax':
        min_val, max_val = min(values), max(values)
        if min_val == max_val:
            raise ValueError("Cannot minmax normalize: all values are equal")
        return [(x - min_val) / (max_val - min_val) for x in values]

    else:
        raise ValueError(f"Unknown method: {method}")

For this course, we recommend using NumPy style for longer, more complex functions and Google style for simpler utilities. The key is to be consistent within a project. The docstring standards also define conventions for module-level and class-level docstrings, which follow similar patterns.

Documentation and Research Transparency

In empirical research, good documentation is not just helpful—it is essential for reproducibility. When writing up your analysis, you should be able to point reviewers to specific, well-documented functions that implement your methodology. This makes peer review more effective and helps establish trust in your results.

4.3 Type Hints and Static Typing

Python is a dynamically typed language, meaning you do not need to declare variable types. However, Python 3.5+ supports optional type hints that document expected types without changing runtime behavior. Type hints improve code quality by:

Making function interfaces explicit and self-documenting
Enabling static analysis tools to catch type errors before runtime
Improving IDE autocomplete and error detection
Serving as machine-checked documentation
Providing additional context to AI coding assistants

4.3.1 Basic Type Hints

Type hints specify the expected type of variables, parameters, and return values:

def calculate_return(initial_price: float, final_price: float) -> float:
    """Calculate simple return between two prices."""
    return (final_price - initial_price) / initial_price


def read_config(filepath: str) -> dict[str, str]:
    """Read configuration from a file."""
    config = {}
    with open(filepath) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            key, value = line.split('=', 1)  # split on the first '=' only
            config[key] = value
    return config

The syntax parameter: type indicates the expected type, and -> type indicates the return type.

4.3.2 Common Type Hints

Here are the type hints you will use most frequently:

def calculate_weighted_sum(
    values: list[float],
    weights: list[float]
) -> float:
    """Calculate weighted sum of values."""
    return sum(v * w for v, w in zip(values, weights))


def process_records(
    records: list[dict[str, str]],
    key: str = 'id'
) -> dict[str, dict[str, str]]:
    """Index records by a key field."""
    return {record[key]: record for record in records}


def calculate_statistics(
    values: list[float]
) -> tuple[float, float, float]:
    """Calculate mean, min, and max."""
    mean = sum(values) / len(values)
    return mean, min(values), max(values)

Legacy Type Hint Syntax

When type hints were first introduced in Python 3.5, you had to import special types from the typing module like List, Dict, Tuple, and Union. Since Python 3.9, you can use the built-in types directly: list[str] instead of List[str] and dict[str, int] instead of Dict[str, int]. Since Python 3.10, you can also write int | float instead of Union[int, float]. You will still see the old style in many code examples and libraries, but for new code, prefer the modern syntax.

4.3.3 Optional and Union Types

Use | None when a parameter might be None:

def calculate_mean(values: list[float], default: float | None = None) -> float:
    """
    Calculate mean of values, with optional default for empty lists.

    Parameters
    ----------
    values : list of float
        Values to average
    default : float, optional
        Value to return if list is empty. If None, raises ValueError.
    """
    if not values:
        if default is None:
            raise ValueError("Cannot calculate mean of empty list")
        return default
    return sum(values) / len(values)

Use | (pipe) when a parameter can be one of several types:

def format_value(value: int | float | str) -> str:
    """Format a value as a string with appropriate formatting."""
    if isinstance(value, float):
        return f"{value:.2f}"
    return str(value)

4.3.4 Type Checking with ty

Type hints become even more valuable when combined with static type checkers. We recommend ty, a fast type checker from Astral (the same company behind uv and ruff). Run it with:

uvx ty check your_script.py

The type checker will detect type inconsistencies:

def calculate_return(initial_price: float, final_price: float) -> float:
    return (final_price - initial_price) / initial_price

# This will trigger a type error
result = calculate_return("100", "110")  # Error: expected float, got str

Type hints do not affect runtime behavior; Python will not enforce types unless you use a type checker. This means type hints are documentation and analysis tools, not runtime constraints. You can gradually add type hints to your codebase without breaking existing code.
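
A small sketch makes the point visible. The function double is a hypothetical helper: it is annotated for integers, yet Python accepts a string without complaint and returns a result that is almost certainly not what was intended.

```python
def double(x: int) -> int:
    """Return twice the input (hypothetical helper for illustration)."""
    return x * 2


# The hints are stored on the function as plain metadata
hints = double.__annotations__

# Nothing checks them when the function is called: a str slips
# through, and * repeats it instead of doubling a number
result = double("ab")
print(result)  # abab
```

A static type checker would flag the call double("ab") before the code runs, which is exactly the class of silent error that is hard to spot in a long data pipeline.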

In practice, VS Code can highlight type errors the same way Word highlights typos, which helps catch bugs early.

For research code, focus type hints where they add the most value: public function interfaces that others will use, complex data transformations where types clarify expected structures, and critical calculations where you want to make assumptions explicit. You do not need to type hint every variable in every function—use judgment about where type information improves clarity.

4.4 Code Style and Linting with ruff

Consistent code style makes collaboration easier and reduces cognitive load when reading code. Rather than debating style choices, the Python community has converged on automated tools that enforce consistent formatting. This section introduces ruff, the modern standard for Python code quality. Ruff is made by Astral, the same company behind uv and ty.

4.4.1 Why Automated Formatting Matters

Manual formatting is time-consuming and leads to inconsistency. Different developers have different preferences for spacing, line breaks, and indentation. These differences create noise in version control, make code reviews harder, and waste mental energy on decisions that don’t affect functionality.

Automated formatters solve this by making formatting decisions for you. While you might not agree with every choice, the consistency and time savings far outweigh any aesthetic preferences. Additionally, when you get used to a specific style, it increases readability—your eyes learn to scan consistently formatted code more quickly.

4.4.2 ruff: The All-in-One Linter

Ruff is an extremely fast Python linter and code formatter written in Rust. It replaces multiple tools (flake8, isort, pyupgrade, and more) with a single, consistent interface. Ruff can:

Check for common errors and bugs
Enforce code style guidelines
Sort and organize imports
Suggest modernizations and improvements
Automatically fix many issues

The easiest way to use ruff is through the VS Code extension. Install the “Ruff” extension from the VS Code marketplace, and VS Code can be configured to automatically format and fix your code each time you save. This makes code quality effortless—just write your code and save.

You can also run ruff from the command line. Check your code with:

uvx ruff check your_script.py

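Ruff will identify issues. Suppose example.py contains:

import json
import os  # Unused import

def load_data(filepath):
    with open(filepath) as f:
        data = json.load(f)
    n_records = len(data)  # Unused local variable
    return data

Running uvx ruff check example.py reports both problems. The exact layout depends on the ruff version, but in the concise format the output looks like this:

example.py:2:8: F401 [*] `os` imported but unused
example.py:7:5: F841 Local variable `n_records` is assigned to but never used
Found 2 errors.
[*] 1 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).

Note that F841 applies only to variables assigned inside a function. An unused name at module level is not reported, because other modules might import it.

Auto-fix issues:

uvx ruff check --fix example.py

Ruff will automatically remove the unused import. The fix for the unused variable is classified as unsafe, so ruff applies it only when you also pass --unsafe-fixes.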

4.4.3 Configuring ruff

Configure ruff using a pyproject.toml file in your project root:

[tool.ruff]
# Set maximum line length (default is 88)
line-length = 88

# Target Python version
target-version = "py311"

[tool.ruff.lint]
# Enable specific rule sets
select = [
    "E",    # pycodestyle errors
    "F",    # Pyflakes
    "I",    # isort (import sorting)
    "B",    # flake8-bugbear (common bugs)
    "SIM",  # flake8-simplify
    "UP",   # pyupgrade (modernize syntax)
]

# Disable specific rules if needed
ignore = [
    "E501",  # Line too long (handled by formatter)
]

# Allow auto-fixing for these rule types
fixable = ["ALL"]

[tool.ruff.lint.per-file-ignores]
# Allow unused imports in __init__.py files
"__init__.py" = ["F401"]

# Relaxed rules for test files
"tests/**/*.py" = ["S101"]  # Allow assert statements

This configuration enables helpful checks while avoiding overly strict rules that might interfere with research workflows.

4.4.4 Common ruff Rules

Some particularly useful ruff rules:

Import Organization (I)

This rule sorts imports alphabetically and groups them into three blocks: standard library imports, third-party imports, and imports from the current project. Removing unused imports is handled by a separate rule, F401, which belongs to the Pyflakes set (F).

Before ruff:

import json
import pandas as pd
import sys
import os
from myproject.utils import helper
import csv

After ruff with isort rules:

import csv
import json
import os
import sys

import pandas as pd

from myproject.utils import helper

Bug Detection (B)

Ruff catches bugs such as the mutable default argument bug. When you use a mutable object like a list as a default argument, Python creates that object once when the function is defined—not each time the function is called. This means all calls share the same list, causing unexpected behavior where the list grows across calls.

Before (buggy):

def collect_values(value, results=[]):  # B006: Mutable default argument
    results.append(value)
    return results

After (fixed):

def collect_values(value, results=None):
    if results is None:
        results = []
    results.append(value)
    return results
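
The difference is easy to observe by calling each version twice. In this sketch the buggy function is renamed collect_values_buggy so that both can live in one file.

```python
def collect_values_buggy(value, results=[]):
    # The default list is created once, when the function is defined
    results.append(value)
    return results


def collect_values(value, results=None):
    # A fresh list is created on every call that omits results
    if results is None:
        results = []
    results.append(value)
    return results


first = collect_values_buggy(1)
second = collect_values_buggy(2)
print(second)           # [1, 2]
print(first is second)  # True: both names refer to the shared default list

print(collect_values(1), collect_values(2))  # [1] [2]
```

In a research pipeline this kind of shared state can leak observations from one run into the next, which is why it is worth letting the linter catch it.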

Code Simplification (SIM)

Before:

if condition:
    return True
else:
    return False

After:

return condition  # or bool(condition) if condition is not already a bool

4.4.5 Formatting Code with ruff

In addition to linting, ruff includes a powerful code formatter. Ruff’s formatter is opinionated—it makes formatting decisions for you, eliminating debates about style. The philosophy is simple: let the tool handle formatting so you can focus on the code itself.

Format your code:

uvx ruff format your_script.py

Before formatting:

def calculate_mean(values,skip_none=True):
    if skip_none:values=[v for v in values if v is not None]
    return sum(values)/len(values)

After formatting:

def calculate_mean(values, skip_none=True):
    if skip_none:
        values = [v for v in values if v is not None]
    return sum(values) / len(values)

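You can use both linting and formatting together. Run the linter first, because its fixes (such as import sorting) can leave code that still needs formatting:

uvx ruff check --fix your_script.py && uvx ruff format your_script.py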

This gives you a single, fast tool to automatically improve your code quality.

Code Quality in Jupyter Notebooks

Ruff can also lint and format Jupyter notebooks:

uvx ruff format analysis.ipynb
uvx ruff check analysis.ipynb

To relax rules for exploratory code while maintaining standards for final analysis code, add a "*.ipynb" pattern to per-file-ignores or use inline # noqa comments.

4.5 Summary and Practical Guidelines

Code quality is not about perfectionism—it is about making your research more reliable, your collaboration more effective, and your future work easier. The practices covered in this chapter form the foundation of professional Python development:

Organization and readability: Structure code to reveal intent clearly
Documentation: Write docstrings that explain what code does and how to use it
Type hints: Make data types explicit to catch errors early
Automated formatting: Use ruff to maintain consistent style

For research projects, we recommend:

Start simple: Begin with basic ruff configuration, add rules gradually
Format early: Run ruff format regularly, not just before commits
Fix what matters: Use ruff check --fix to auto-fix safe issues
Team consistency: Ensure all collaborators use the same tools and configuration

These practices require some initial investment, but they pay dividends throughout your research career. Code that is well-organized, well-documented, and consistently formatted is easier to debug, easier to extend, and easier to share with collaborators and reviewers.

As you develop your empirical finance projects, make these practices habitual. Configure your tools once, integrate them into your workflow, and let automation handle the details. This frees you to focus on what matters: designing sound research, implementing correct methodology, and drawing valid conclusions from your data.

The practices in this chapter work best alongside testing, which we will cover in the next chapter. While code quality ensures your code is readable and well-documented, testing ensures it is correct. Together, they form the foundation of reliable research software.