5 Error Handling and Testing

In empirical research, the quality and reliability of your code directly impacts the quality of your results. A small bug in your data processing pipeline or statistical calculation can invalidate months of work. Yet many researchers write code without systematic error handling or testing. The result? Papers retracted due to coding errors, results that can’t be replicated, and countless hours spent debugging problems that could have been caught early.

This chapter introduces two complementary practices that will make your research code more robust: error handling and testing. Error handling is about writing code that fails gracefully and provides useful information when something goes wrong. Testing is about systematically verifying that your code does what you think it does. Together, these practices form the foundation of reliable, reproducible research.

Think of error handling as defensive driving for your code. You anticipate what might go wrong and plan for it. Testing, on the other hand, is like having a checklist before takeoff. You verify that everything works as expected before committing to your results. Both practices require a small upfront investment that pays enormous dividends in time saved and confidence gained.

5.1 Exceptions and Error Handling

When something goes wrong in a Python program, the interpreter raises an exception. An exception is Python’s way of signaling that an error has occurred. If you’ve written any Python code, you’ve likely encountered exceptions: TypeError, ValueError, KeyError, FileNotFoundError, and so on. By default, an unhandled exception stops your program and prints a traceback showing where the error occurred.

While this default behavior is useful during development, it’s often not what you want in production code or long-running research scripts. What if a file is missing, but you can use a default dataset instead? What if one stock in your analysis has corrupted data, but you want to continue processing the others? What if a network request fails, but you can retry it? This is where error handling comes in.

5.1.1 Understanding Exceptions

Let’s start with a simple example. Suppose you’re calculating portfolio returns and need to divide returns by portfolio values:

def calculate_return_percentage(return_dollars, portfolio_value):
    return (return_dollars / portfolio_value) * 100

# This works fine
print(calculate_return_percentage(1000, 50000))  # 2.0

# But this crashes
print(calculate_return_percentage(1000, 0))  # ZeroDivisionError!

When you try to divide by zero, Python raises a ZeroDivisionError. The program stops, and you see a traceback. While this is informative, it’s not helpful if you’re processing thousands of portfolios and one happens to have zero value.

5.1.2 The try-except Block

Python’s try-except block allows you to handle exceptions gracefully. The basic structure is:

try:
    result = risky_operation()
except SomeException:
    result = default_value

The try block holds the code that might raise an exception. The except line names the exception type(s) to catch. The body of the except block handles the error and provides a fallback.

Let’s apply this to our portfolio example:

def calculate_return_percentage(return_dollars, portfolio_value):
    try:
        return (return_dollars / portfolio_value) * 100
    except ZeroDivisionError:
        # Portfolio has zero value, return None or a special value
        return None

# Now this doesn't crash
print(calculate_return_percentage(1000, 50000))  # 2.0
print(calculate_return_percentage(1000, 0))      # None

This is better, but we’ve lost information. We know the calculation failed, but we don’t know which portfolio or why. In research code, you almost always want to preserve this information:

def calculate_return_percentage(return_dollars, portfolio_value):
    try:
        return (return_dollars / portfolio_value) * 100
    except ZeroDivisionError:
        print(f"Warning: portfolio value is zero; cannot compute return for return_dollars={return_dollars}")
        return None

While print() statements are commonly used to log errors and warnings, there are better ways to handle logging in production code. We introduce proper logging techniques in Chapter 6.

When to Catch Exceptions

A common mistake is catching exceptions too broadly or too often. Don’t catch exceptions just because you can. Catch them when you have a specific, sensible way to handle the error. If you can’t do anything useful with the exception, it’s often better to let it propagate and fail fast rather than hiding the problem.

5.1.3 Catching Multiple Exceptions

Often, several different things can go wrong, and you want to handle them differently:

def load_stock_data(filename):
    try:
        with open(filename, 'r') as f:
            lines = f.readlines()
        if len(lines) == 0:
            raise ValueError("File is empty")
        # Parse header and validate columns
        header = lines[0].strip().split(',')
        required_columns = ['date', 'close', 'volume']
        missing = set(required_columns) - set(header)
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        return lines
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found")
        return None
    except ValueError as e:
        print(f"Error: Invalid data format - {e}")
        return None

You can also catch multiple exceptions in a single except block if you want to handle them the same way:

def load_data(filename):
    try:
        with open(filename, 'r') as f:
            return f.read()
    except (FileNotFoundError, PermissionError) as e:
        print(f"Error loading '{filename}': {e}")
        return None

5.1.4 The else and finally Clauses

The try-except block can include two additional clauses: else and finally.

The else clause runs if no exception was raised:

def process_file(filename):
    try:
        with open(filename, 'r') as f:
            lines = f.readlines()
    except FileNotFoundError:
        print(f"File not found: {filename}")
        return None
    else:
        # This runs only if no exception occurred
        print(f"Successfully loaded {len(lines)} lines")
        return lines

The finally clause always runs, whether an exception occurred or not. This is useful for cleanup:

def analyze_large_dataset(filename):
    file_handle = None
    try:
        file_handle = open(filename, 'r')
        data = process(file_handle)
        return data
    except Exception as e:
        print(f"Error processing file: {e}")
        return None
    finally:
        if file_handle:
            file_handle.close()

The except block handles errors. The finally block always runs, ensuring the file is closed even if an error occurs or the function returns early.
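To see how the four clauses interact, the following minimal sketch records which clause runs for a valid and an invalid input. The function name parse_price and the events list are illustrative assumptions, not part of the chapter’s code.

```python
def parse_price(text):
    """Return (value, events), where events records which clauses ran."""
    events = []
    try:
        events.append("try")
        value = float(text)          # raises ValueError for non-numeric text
    except ValueError:
        events.append("except")      # runs only when the conversion failed
        value = None
    else:
        events.append("else")        # runs only when no exception was raised
    finally:
        events.append("finally")     # runs in both cases
    return value, events


print(parse_price("101.5"))  # (101.5, ['try', 'else', 'finally'])
print(parse_price("n/a"))    # (None, ['try', 'except', 'finally'])
```

Note that else and except are mutually exclusive, while finally appears in both traces.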

Context Managers vs. finally

For file handling and similar resources, Python’s context managers (the with statement) are usually cleaner than finally:

def analyze_large_dataset(filename):
    try:
        with open(filename, 'r') as file_handle:
            data = process(file_handle)
            return data
    except Exception as e:
        print(f"Error processing file: {e}")
        return None

The context manager automatically closes the file, even if an exception occurs.

5.1.5 Raising Exceptions

Sometimes you need to signal an error in your own code. Use the raise statement:

def calculate_sharpe_ratio(returns, risk_free_rate):
    """Calculate Sharpe ratio.

    Parameters
    ----------
    returns : array-like
        Series of returns
    risk_free_rate : float
        Risk-free rate

    Raises
    ------
    ValueError
        If returns is empty or risk_free_rate is negative
    """
    if len(returns) == 0:
        raise ValueError("Returns array cannot be empty")

    if risk_free_rate < 0:
        raise ValueError("Risk-free rate cannot be negative")

    excess_returns = returns - risk_free_rate
    return excess_returns.mean() / excess_returns.std()

This is much better than returning a special value like -999 or None and hoping the caller checks for it. An exception forces the caller to explicitly handle the error.
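The difference is easiest to see when a sentinel value flows into a later calculation. The sketch below uses two hypothetical helpers (growth_rate_sentinel and growth_rate_strict) and made-up numbers; it is meant only to illustrate the point, not to prescribe an implementation.

```python
def growth_rate_sentinel(old, new):
    """Return -999 when the growth rate is undefined (the fragile approach)."""
    if old == 0:
        return -999
    return (new - old) / old


def growth_rate_strict(old, new):
    """Raise ValueError when the growth rate is undefined."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old


pairs = [(100, 110), (0, 50), (200, 210)]  # made-up (old, new) observations

# The sentinel flows straight into the average and distorts it.
sentinel_rates = [growth_rate_sentinel(old, new) for old, new in pairs]
print(sum(sentinel_rates) / len(sentinel_rates))  # a large negative number

# The strict version stops at the bad observation instead.
try:
    strict_rates = [growth_rate_strict(old, new) for old, new in pairs]
except ValueError as e:
    print(f"Stopped: {e}")
```

With the sentinel, nothing signals that the average is meaningless. With the exception, the problem surfaces at the observation that caused it.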

5.1.6 Creating Custom Exceptions

For complex projects, you might want to define your own exception types. In Python, all exceptions are classes that inherit from the built-in Exception class (see Chapter 3 for more on classes and inheritance). This makes it easier to catch specific errors:

class DataQualityError(Exception):
    """Raised when data fails quality checks."""
    pass

class InsufficientDataError(Exception):
    """Raised when there's not enough data for analysis."""
    pass

def calculate_rolling_beta(stock_returns, market_returns, window=60):
    """Calculate rolling beta with data quality checks."""
    if len(stock_returns) < window:
        raise InsufficientDataError(
            f"Need at least {window} observations, got {len(stock_returns)}"
        )

    # Check for too many missing values
    missing_pct = stock_returns.isna().sum() / len(stock_returns)
    if missing_pct > 0.1:
        raise DataQualityError(
            f"Too many missing values: {missing_pct:.1%}"
        )

    # Calculate beta...

Now calling code can handle different errors appropriately:

try:
    beta = calculate_rolling_beta(stock_returns, market_returns)
except InsufficientDataError as e:
    print(f"Skipping stock: {e}")
    beta = None
except DataQualityError as e:
    print(f"Data quality issue: {e}")
    beta = None
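When several custom exceptions should be handled the same way, one option is to give them a shared base class. The sketch below assumes a hypothetical base class ResearchDataError and an illustrative validator check_sample; neither appears in the code above.

```python
class ResearchDataError(Exception):
    """Hypothetical common base class for this project's data errors."""


class DataQualityError(ResearchDataError):
    """Raised when data fails quality checks."""


class InsufficientDataError(ResearchDataError):
    """Raised when there's not enough data for analysis."""


def check_sample(n_obs, missing_share, min_obs=60, max_missing=0.1):
    """Illustrative validator that raises one of the two specific errors."""
    if n_obs < min_obs:
        raise InsufficientDataError(
            f"Need at least {min_obs} observations, got {n_obs}"
        )
    if missing_share > max_missing:
        raise DataQualityError(f"Too many missing values: {missing_share:.1%}")


def describe_sample(n_obs, missing_share):
    """Return 'ok' or a short description of why the sample was skipped."""
    try:
        check_sample(n_obs, missing_share)
    except ResearchDataError as e:
        # One handler covers both subclasses; the class name keeps the detail.
        return f"skipped ({type(e).__name__}): {e}"
    return "ok"


print(describe_sample(120, 0.02))  # ok
print(describe_sample(30, 0.02))   # skipped (InsufficientDataError): ...
print(describe_sample(120, 0.25))  # skipped (DataQualityError): ...
```

Callers that need different behavior per error can still catch the subclasses individually.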

Don’t Swallow Exceptions

A common antipattern is the bare except: clause that catches everything:

# BAD: This hides all errors, including bugs in your code
try:
    result = complex_calculation()
except:
    result = None

This will catch not just the errors you expect, but also bugs in your code, keyboard interrupts, and system errors. Always catch specific exceptions, or at least use except Exception: which won’t catch system-exiting exceptions.
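The reason except Exception: is safer lies in Python’s exception hierarchy, which you can inspect directly. The helper names below (run_guarded, divide_by_zero, request_exit) are made up for this sketch.

```python
# KeyboardInterrupt and SystemExit derive from BaseException, not Exception,
# so `except Exception:` lets them pass through.
print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(SystemExit, Exception))             # False
print(issubclass(KeyboardInterrupt, BaseException))  # True
print(issubclass(ZeroDivisionError, Exception))      # True


def run_guarded(action):
    """Call action() and report any ordinary exception it raises."""
    try:
        action()
    except Exception as e:
        return f"handled {type(e).__name__}"
    return "no error"


def divide_by_zero():
    return 1 / 0


def request_exit():
    raise SystemExit(1)


print(run_guarded(divide_by_zero))  # handled ZeroDivisionError

try:
    run_guarded(request_exit)
except SystemExit:
    print("SystemExit passed through the handler")
```

A bare except: would have intercepted the SystemExit as well, which is rarely what you want.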

5.1.7 Error Handling in Data Pipelines

In empirical research, you often process many items (stocks, firms, countries) where some might fail. Here’s a pattern for handling this gracefully:

def process_stock(ticker, start_date, end_date):
    """Process a single stock, raising exceptions on failure."""
    # This function doesn't handle exceptions - it lets them propagate
    data = download_data(ticker, start_date, end_date)
    returns = calculate_returns(data)
    return calculate_statistics(returns)

def process_all_stocks(tickers, start_date, end_date):
    """Process multiple stocks, collecting both successes and failures."""
    results = {}
    errors = {}

    for ticker in tickers:
        try:
            results[ticker] = process_stock(ticker, start_date, end_date)
        except Exception as e:
            # Log the error but continue processing
            errors[ticker] = str(e)
            print(f"Error processing {ticker}: {e}")

    print(f"\nProcessed {len(results)} stocks successfully")
    print(f"Failed to process {len(errors)} stocks")

    return results, errors

# Usage
tickers = ['AAPL', 'MSFT', 'INVALID_TICKER', 'GOOGL']
results, errors = process_all_stocks(tickers, '2020-01-01', '2023-12-31')

This pattern separates the logic (in process_stock) from the error handling (in process_all_stocks). The individual function can be tested in isolation, while the batch function handles partial failures gracefully.
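Because download_data, calculate_returns, and calculate_statistics are not defined here, the pattern cannot be run as written. The following self-contained sketch replaces them with a made-up price dictionary (FAKE_PRICES) and a simplified process_stock, and additionally stores the exception type with each message. All names and numbers are assumptions for illustration.

```python
FAKE_PRICES = {  # made-up prices standing in for a real data source
    "AAA": [100.0, 102.0, 101.0],
    "BBB": [50.0],
    "CCC": [20.0, 0.0, 21.0],
}


def process_stock(ticker):
    """Return the mean simple return; let any exception propagate."""
    prices = FAKE_PRICES[ticker]  # KeyError for unknown tickers
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    # ZeroDivisionError if any earlier price is zero
    returns = [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]
    return sum(returns) / len(returns)


def process_all_stocks(tickers):
    """Process every ticker, collecting successes and failures separately."""
    results, errors = {}, {}
    for ticker in tickers:
        try:
            results[ticker] = process_stock(ticker)
        except Exception as e:
            # Keep the exception type: it often says more than the message.
            errors[ticker] = f"{type(e).__name__}: {e}"
    return results, errors


results, errors = process_all_stocks(["AAA", "BBB", "CCC", "ZZZ"])
print(sorted(results))  # ['AAA']
for ticker, message in errors.items():
    print(ticker, "->", message)
```

Three different failure modes (too little data, a zero price, an unknown ticker) end up in errors, while the one valid ticker is still processed.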

5.2 Unit Testing with pytest

Error handling helps your code fail gracefully when things go wrong. Testing helps ensure things don’t go wrong in the first place. A test is simply code that verifies other code works correctly. You write a test that calls your function with known inputs and checks that it produces the expected output.

5.2.1 Why Test?

You might think: “I’ll just run my code and check the results. Why write separate tests?” Here’s why testing matters:

Confidence in changes: When you modify code, tests verify you didn’t break anything.
Documentation: Tests show how your code is meant to be used.
Better design: Code that’s easy to test is usually better designed.
Catch bugs early: Tests find problems before they affect your results.
Reproducibility: Tests verify your code produces consistent results.
Validation of AI-generated code: Tests provide an additional layer of verification when using AI coding assistants.

In research, there’s an additional benefit: tests help you understand your methods. Writing tests forces you to think clearly about what your code should do, edge cases, and assumptions. This deeper understanding often reveals problems in your research design.

AI Coding Assistants and Testing

AI coding assistants are particularly good at writing tests, making it easier to build a comprehensive test suite. However, always review AI-generated tests carefully. AI may miss important edge cases or make incorrect assumptions about expected behavior. Use AI-generated tests as a starting point, then add your own tests for edge cases and domain-specific scenarios that the AI might overlook.

5.2.2 Getting Started with pytest

pytest is Python’s most popular testing framework. It’s simple to use but powerful enough for complex projects. The easiest way to run pytest is using uvx, which runs the tool without requiring explicit installation:

uvx pytest test_math.py

If you’re working on a project and want pytest available as a development dependency, add it to your project’s dev group:

uv add --dev pytest

Then run it with:

uv run pytest

A pytest test is just a function whose name starts with test_. Here’s the simplest possible test:

def test_simple():
    assert 1 + 1 == 2

Save this in a file called test_math.py and run:

uvx pytest test_math.py

You’ll see output indicating the test passed. The assert statement is the heart of testing. If the expression after assert is True, the test passes. If it’s False, the test fails.

5.2.3 Testing a Real Function

Let’s test a function that calculates simple returns:

# finance_utils.py
def calculate_simple_returns(prices):
    """Calculate simple returns from a price series.

    Parameters
    ----------
    prices : list
        Series of prices

    Returns
    -------
    list
        Simple returns (length is len(prices) - 1)
    """
    returns = []
    for i in range(1, len(prices)):
        ret = (prices[i] - prices[i-1]) / prices[i-1]
        returns.append(ret)
    return returns

Now write tests:

# test_finance_utils.py
from finance_utils import calculate_simple_returns

def test_simple_returns_basic():
    """Test simple returns calculation with known values."""
    prices = [100, 110, 121]
    returns = calculate_simple_returns(prices)

    # Expected: (110-100)/100 = 0.10, (121-110)/110 = 0.10
    assert abs(returns[0] - 0.10) < 1e-10
    assert abs(returns[1] - 0.10) < 1e-10

def test_simple_returns_length():
    """Test that output length is correct."""
    prices = [100, 110, 121, 133.1]
    returns = calculate_simple_returns(prices)
    assert len(returns) == len(prices) - 1

def test_simple_returns_constant_prices():
    """Test with constant prices (zero returns)."""
    prices = [100, 100, 100]
    returns = calculate_simple_returns(prices)
    assert returns == [0, 0]

Run the tests:

uvx pytest test_finance_utils.py

Each test function checks a different aspect of the behavior. This is much more thorough than running the function once and eyeballing the output.

Test One Thing Per Test

Each test should verify one specific behavior. This makes failures easier to diagnose. When a test fails, you want to immediately know what’s wrong, not spend time figuring out which of five assertions in the test failed.

5.2.4 Understanding Test Output

When a test fails, pytest provides detailed information:

def test_log_returns_incorrect():
    """This test will fail to demonstrate pytest output."""
    prices = [100, 110]
    returns = calculate_log_returns(prices)
    assert returns[0] == 0.1  # This is wrong - log(1.1) ≈ 0.0953

Running this produces:

test_finance_utils.py::test_log_returns_incorrect FAILED

================================== FAILURES ===================================
________________________ test_log_returns_incorrect __________________________

    def test_log_returns_incorrect():
        prices = [100, 110]
        returns = calculate_log_returns(prices)
>       assert returns[0] == 0.1
E       assert 0.09531017980432493 == 0.1

test_finance_utils.py:8: AssertionError

The output shows exactly which assertion failed and what the actual value was. This makes debugging straightforward.
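The failing test calls calculate_log_returns, which this chapter does not define. A minimal sketch of what such a function might look like, together with a corrected test, is shown here; the implementation is an assumption, not the author’s code.

```python
import math


def calculate_log_returns(prices):
    """Log returns ln(P_t / P_{t-1}); a hypothetical helper for illustration."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def test_log_returns_basic():
    """Compare against log(1.1) with a tolerance instead of exact equality."""
    returns = calculate_log_returns([100, 110])
    assert abs(returns[0] - math.log(1.1)) < 1e-12
    assert abs(returns[0] - 0.0953) < 1e-4
```

The corrected test states the expected value in terms of the formula and allows for rounding, so it passes where the exact comparison with 0.1 failed.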

5.2.5 Testing with Fixtures

Often, you need the same data for multiple tests. pytest fixtures let you set up reusable test data:

import pytest

@pytest.fixture
def sample_prices():
    """Create sample price data for testing."""
    return [100, 105, 103, 108, 112, 110, 115]

def test_returns_are_correct(sample_prices):
    """Test returns calculation using fixture."""
    returns = calculate_simple_returns(sample_prices)
    # First return: (105-100)/100 = 0.05
    assert abs(returns[0] - 0.05) < 1e-10

def test_data_has_correct_length(sample_prices):
    """Test using the same fixture."""
    assert len(sample_prices) == 7

The @pytest.fixture decorator marks a function as a fixture. When you include the fixture name as a test function parameter, pytest automatically calls the fixture and passes its return value to your test.

Fixtures can also handle setup and teardown:

import pytest
import tempfile
import os

@pytest.fixture
def temp_data_file():
    """Create a temporary file with test data."""
    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.csv') as f:
        f.write('date,close\n')
        f.write('2020-01-01,100\n')
        f.write('2020-01-02,110\n')
        temp_path = f.name

    yield temp_path

    os.unlink(temp_path)

def test_load_data_from_file(temp_data_file):
    """Test loading data from a CSV file."""
    with open(temp_data_file) as f:
        lines = f.readlines()
    assert len(lines) == 3  # Header + 2 data rows

The part of the fixture before yield is the setup: it creates a temporary file with test data. The yield statement hands the file path to the test. The line after yield is the teardown: it deletes the file once the test completes. Listing temp_data_file as a test parameter tells pytest to inject the yielded value.

The yield statement separates setup from teardown. Everything before yield runs before the test, and everything after runs after the test (even if the test fails).
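For temporary files specifically, pytest also ships a built-in tmp_path fixture that provides a fresh pathlib.Path directory for each test, so you do not have to write the teardown yourself. The sketch below is one possible way to use it; write_price_csv is a hypothetical helper.

```python
def write_price_csv(path):
    """Write a tiny CSV with a header and two rows (illustrative data)."""
    path.write_text("date,close\n2020-01-01,100\n2020-01-02,110\n")
    return path


def test_load_data_with_tmp_path(tmp_path):
    """tmp_path is a pathlib.Path to a per-test directory created by pytest."""
    csv_file = write_price_csv(tmp_path / "prices.csv")
    lines = csv_file.read_text().splitlines()
    assert len(lines) == 3  # Header + 2 data rows
```

A hand-written yield fixture remains the right tool when the teardown involves more than deleting files, such as closing a database connection.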

5.2.6 Parametrized Tests

When you want to test the same function with multiple inputs, use parametrization:

import pytest

@pytest.mark.parametrize("prices,expected_length", [
    ([100, 110], 1),
    ([100, 110, 121], 2),
    ([100, 110, 121, 133], 3),
    ([100], 0),  # Edge case: single price
])
def test_returns_length_parametrized(prices, expected_length):
    """Test that returns have correct length for various inputs."""
    returns = calculate_simple_returns(prices)
    assert len(returns) == expected_length

This creates four separate tests, one for each parameter set. This is cleaner than writing four separate test functions and makes the pattern clear.

You can parametrize multiple arguments:

@pytest.mark.parametrize("initial_price,final_price,expected_return", [
    (100, 110, 0.10),   # 10% price increase
    (100, 90, -0.10),   # 10% price decrease
    (100, 100, 0),      # No change
    (50, 100, 1.0),     # 100% increase
])
def test_simple_return_calculation(initial_price, final_price, expected_return):
    """Test simple return calculation with various price changes."""
    returns = calculate_simple_returns([initial_price, final_price])
    assert abs(returns[0] - expected_return) < 1e-10

5.2.7 Testing for Exceptions

Sometimes you want to verify that your code raises an exception in certain situations:

def calculate_mean_return(returns):
    """Calculate mean return."""
    if len(returns) == 0:
        raise ValueError("Returns list cannot be empty")

    return sum(returns) / len(returns)

def test_mean_return_empty_raises():
    """Test that empty returns raise ValueError."""
    with pytest.raises(ValueError, match="cannot be empty"):
        calculate_mean_return([])

def test_mean_return_with_data():
    """Test normal mean return calculation."""
    returns = [0.01, 0.02, -0.01, 0.03]
    mean = calculate_mean_return(returns)
    assert abs(mean - 0.0125) < 1e-10

The pytest.raises() context manager asserts that the code block raises the specified exception. The match parameter checks that the exception message matches a pattern (using regular expressions).
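Because match is interpreted as a regular expression, messages containing characters such as parentheses need escaping. The sketch below shows two ways to check a message; it uses a variant of calculate_mean_return with a slightly longer, made-up error message.

```python
import re

import pytest


def calculate_mean_return(returns):
    """Variant of the function above with a longer (hypothetical) message."""
    if len(returns) == 0:
        raise ValueError("Returns list cannot be empty (got 0 observations)")
    return sum(returns) / len(returns)


def test_message_via_excinfo():
    """Capture the exception and inspect it with ordinary assertions."""
    with pytest.raises(ValueError) as excinfo:
        calculate_mean_return([])
    assert "cannot be empty" in str(excinfo.value)


def test_message_with_escaped_pattern():
    """Parentheses are regex metacharacters, so escape the literal text."""
    with pytest.raises(ValueError, match=re.escape("(got 0 observations)")):
        calculate_mean_return([])
```

The excinfo form is convenient when you want to check several properties of the exception, not only its message.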

Testing with NumPy and pandas

Testing functions that work with NumPy arrays or pandas DataFrames sometimes requires special handling, such as using np.allclose() for floating-point array comparisons or pd.testing.assert_frame_equal() for DataFrame comparisons.
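As a rough illustration of both helpers, the sketch below tests a small array-based returns function and a DataFrame computed with pct_change(). The function simple_returns and the sample data are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def simple_returns(prices):
    """Simple returns from a price sequence, computed with NumPy."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


def test_returns_array():
    """np.allclose compares element-wise within a tolerance."""
    result = simple_returns([100, 110, 121])
    assert np.allclose(result, [0.10, 0.10])


def test_returns_frame():
    """assert_frame_equal checks values, dtypes, column names, and index."""
    prices = pd.DataFrame({"close": [100.0, 110.0, 121.0]})
    result = prices.pct_change().dropna().reset_index(drop=True)
    expected = pd.DataFrame({"close": [0.10, 0.10]})
    # Floats are compared with a tolerance by default (check_exact=False)
    pd.testing.assert_frame_equal(result, expected)
```

Unlike a plain ==, which returns an element-wise result for arrays and DataFrames, these helpers reduce the comparison to a single pass or fail outcome.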

5.2.8 Organizing Tests

As your project grows, organize tests to mirror your code structure:

my_research_project/
├── finance_utils/
│   ├── __init__.py
│   ├── returns.py
│   ├── risk.py
│   └── portfolio.py
└── tests/
    ├── __init__.py
    ├── test_returns.py
    ├── test_risk.py
    └── test_portfolio.py

Run all tests with:

uvx pytest tests/

Or if pytest is installed in your project:

uv run pytest tests/

Run specific tests:

uvx pytest tests/test_returns.py
uvx pytest tests/test_returns.py::test_simple_returns_basic

Configuration with pytest.ini

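Create a pytest.ini file in your project root to configure pytest. In pytest.ini, the section header is [pytest]; the [tool.pytest.ini_options] header applies only when the settings live in pyproject.toml, where values must follow TOML syntax:

[pytest]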
testpaths = tests
python_files = test_*.py
python_functions = test_*
addopts = -v --strict-markers

This specifies where to find tests and how to run them.

5.3 Test-Driven Development Concepts

Video

The following video by ArjanCodes covers similar topics to this section.

Test-Driven Development (TDD) is a development approach where you write tests before writing the code they test. This might seem backwards, but it has significant benefits, especially in research.

5.3.1 The TDD Cycle

TDD follows a simple cycle:

Red: Write tests that fail (because the code doesn’t exist yet), including edge cases
Green: Write just enough code to make all the tests pass
Refactor: Improve the code while keeping tests passing

Let’s walk through an example. Suppose you need to calculate the maximum drawdown of a price series.

Step 1: Write failing tests, including edge cases

# test_risk.py
from risk import calculate_max_drawdown

def test_max_drawdown_simple():
    """Test max drawdown with simple price series."""
    prices = [100, 110, 105, 115, 90, 95]
    # Peak is 115, trough is 90, drawdown is (90-115)/115 ≈ -0.217
    assert abs(calculate_max_drawdown(prices) - (-0.217)) < 0.001

def test_max_drawdown_no_drawdown():
    """Test with monotonically increasing prices (no drawdown)."""
    prices = [100, 110, 120, 130]
    assert calculate_max_drawdown(prices) == 0

def test_max_drawdown_single_price():
    """Test with single price."""
    prices = [100]
    assert calculate_max_drawdown(prices) == 0

Run these tests. They fail immediately: pytest cannot even import calculate_max_drawdown, because it doesn’t exist yet.

Step 2: Write minimal code to pass

# risk.py
def calculate_max_drawdown(prices):
    """Calculate maximum drawdown from a price series."""
    if len(prices) <= 1:
        return 0

    max_drawdown = 0
    peak = prices[0]

    for price in prices:
        if price > peak:
            peak = price
        drawdown = (price - peak) / peak
        if drawdown < max_drawdown:
            max_drawdown = drawdown

    return max_drawdown

Run the tests again. They should all pass.

Step 3: Refactor if needed

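The code passes all three tests, including the single-price edge case. It does not yet guard against a peak price of zero, which would raise a ZeroDivisionError; if that can occur in your data, add a test for it first. Otherwise we can move on, or improve the function (for example, to make it more efficient) while re-running the tests after each change.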
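As one illustration of the refactor step, the loop can be rewritten with a running maximum from itertools.accumulate. This is only a sketch of a possible refactoring; what matters is that the same three tests still pass afterwards.

```python
from itertools import accumulate


def calculate_max_drawdown(prices):
    """Calculate maximum drawdown from a price series (refactored sketch)."""
    if len(prices) <= 1:
        return 0

    # running_peaks[i] is the highest price seen up to and including index i
    running_peaks = accumulate(prices, max)
    return min((price - peak) / peak for price, peak in zip(prices, running_peaks))


def test_max_drawdown_simple():
    prices = [100, 110, 105, 115, 90, 95]
    assert abs(calculate_max_drawdown(prices) - (-0.217)) < 0.001


def test_max_drawdown_no_drawdown():
    assert calculate_max_drawdown([100, 110, 120, 130]) == 0


def test_max_drawdown_single_price():
    assert calculate_max_drawdown([100]) == 0
```

Whether this version is clearer than the explicit loop is a matter of taste; the tests let you make that choice without worrying about changing behavior.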

5.3.2 Benefits of TDD for Research

TDD might feel slow at first, but it pays off:

Clarifies thinking: Writing the test first forces you to specify exactly what you want the function to do.
Prevents scope creep: You implement only what’s needed to pass tests.
Documents intent: Tests show how the function should behave.
Enables refactoring: You can improve code with confidence because tests verify behavior doesn’t change.

In research, TDD is particularly valuable when implementing statistical methods or financial calculations. Write tests based on the formulas in the paper, then implement the method. The tests verify you’ve implemented the method correctly.

TDD for Complex Calculations

When implementing a complex statistical method from a paper:

Create tests using examples from the paper (if provided)
Create tests using results from R or Stata implementations
Create tests using simple cases you can verify by hand
Then implement your Python version

This approach catches mistakes early and gives you confidence in your implementation.
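The steps above can be sketched with a deliberately small example: a hand-written sample variance tested against a case worked out by hand and against an independent implementation. Here Python’s statistics.variance stands in for the R or Stata reference; the function sample_variance and the numbers are illustrative.

```python
import statistics


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("Need at least two observations")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def test_variance_by_hand():
    """Mean is 0.02, squared deviations are 0.0001 each, divided by 2 - 1."""
    assert abs(sample_variance([0.01, 0.03]) - 0.0002) < 1e-12


def test_variance_matches_reference():
    """Cross-check against an independent implementation."""
    returns = [0.01, -0.02, 0.015, 0.03, -0.005]
    assert abs(sample_variance(returns) - statistics.variance(returns)) < 1e-12
```

In a TDD workflow, the two tests would be written first and would fail until sample_variance is implemented.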

5.3.3 When Not to Use TDD

TDD isn’t always the right approach:

Exploratory analysis: When you don’t know what you’re looking for, write code first, then add tests
Prototypes: If you’re just trying something to see if it works, TDD adds overhead
Simple scripts: For one-off analyses, informal testing might be enough

But for any code you’ll reuse or that’s critical to your results, testing (whether test-first or test-after) is essential.

5.4 Testing Floating-Point Calculations

Financial calculations often involve floating-point arithmetic, which has quirks:

import pytest

def test_floating_point_comparison():
    """Demonstrate floating-point comparison issues."""
    # 0.1 + 0.2 evaluates to 0.30000000000000004, not exactly 0.3
    result = 0.1 + 0.2
    # Don't do this:
    # assert result == 0.3  # Fails!

    # Do this instead:
    assert abs(result - 0.3) < 1e-10
    # Or use pytest's approximate comparison:
    assert result == pytest.approx(0.3)

Always use tolerance-based comparisons for floating-point numbers. The pytest.approx() function is particularly nice because it chooses sensible default tolerances:

def test_returns_calculation():
    """Test returns calculation with approximate comparison."""
    prices = [100, 105, 110.25]
    returns = calculate_simple_returns(prices)
    expected = [0.05, 0.05]

    assert returns == pytest.approx(expected, rel=1e-6)
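Outside of pytest, the standard library offers math.isclose for the same purpose. The short sketch below also shows a common pitfall: a purely relative tolerance cannot succeed when the expected value is exactly zero, so an absolute tolerance is needed there. The tolerance of 1e-9 is an arbitrary choice for illustration.

```python
import math

result = 0.1 + 0.2
print(result == 0.3)              # False
print(math.isclose(result, 0.3))  # True

# A relative tolerance alone breaks down when comparing against zero
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```

This matters in practice for quantities such as excess returns or residuals, whose expected value in a test is often zero.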

5.5 Best Practices Summary

Let’s consolidate what we’ve learned into actionable practices:

Error Handling:

Catch specific exceptions, not broad ones
Provide informative error messages
Don’t hide errors unless you can handle them meaningfully
Use custom exceptions for domain-specific errors
Validate inputs early and explicitly

Testing:

Write tests for any code you’ll reuse
Test edge cases, not just happy paths
One behavior per test (closely related assertions can share a test)
Use fixtures for reusable test data
Use parametrization to test multiple scenarios
Run tests frequently during development

General:

Make functions testable (pure functions with clear inputs/outputs)
Validate assumptions with assertions
Document expected behavior
Use type hints to catch errors early
Review your own code before considering it done

Testing in Research Workflows

In empirical research, you often have a mix of:

Library code: Functions you’ll reuse across projects (test thoroughly)
Analysis scripts: One-off analyses (test key calculations)
Exploratory code: Trying things out (informal testing is fine)

Focus your testing effort on library code and anything that affects your paper’s results. A bug in a chart’s formatting is annoying; a bug in your returns calculation invalidates your research.

5.6 Conclusion

Error handling and testing might feel like overhead when you start a project, but they’re investments that pay enormous dividends. Code that handles errors gracefully is more robust and maintainable. Code with tests is easier to modify, debug, and trust.

In empirical research, where your code directly impacts your results and conclusions, this isn’t just about software engineering best practices—it’s about research integrity. A well-tested analysis pipeline gives you confidence in your results. Good error handling helps you identify data quality issues and edge cases. Together, they make your research more reproducible and reliable.

Start small. Add error handling to functions that interact with external data. Write tests for your key calculations. As these practices become habits, you’ll find yourself writing better code, spending less time debugging, and having more confidence in your results.