3 Object-Oriented Programming Basics

Object-Oriented Programming (OOP) is a programming paradigm that organizes code around “objects” rather than functions and logic. While Python fully supports OOP, you don’t need to use it for everything you do. In fact, for many data analysis tasks, a procedural or functional approach is simpler and more appropriate.

That said, understanding the basics of OOP is valuable for several reasons. First, many of the libraries you’ll use in empirical finance are built using OOP principles, so understanding these concepts will help you use them more effectively. Second, there are certain situations in research code where OOP can make your code cleaner, more organized, and easier to maintain. Finally, OOP provides a way to model real-world entities and relationships in your code, which can be particularly useful when working with financial concepts like trades and limit order books.

In this chapter, we’ll cover the fundamentals of OOP in Python, focusing on practical applications relevant to empirical finance. We’ll start with the basic concepts of classes and objects, then discuss when OOP is genuinely useful in research code, and finally introduce Python’s data classes, which provide a streamlined way to work with structured data.

A Pragmatic Approach

This chapter takes a pragmatic approach to OOP. We won’t cover every feature or delve into advanced design patterns. Instead, we’ll focus on the subset of OOP that’s most useful for research code in finance. If you find yourself writing highly object-oriented code with deep inheritance hierarchies, you are most likely overengineering your research scripts.

3.1 Classes and Objects

At its core, object-oriented programming is about creating custom data types that bundle together related data and the functions that operate on that data. Let’s break down the key concepts.

3.1.1 What is a Class?

A class is essentially a blueprint or template for creating objects. It defines what data an object will hold (attributes) and what operations can be performed on that data (methods). Think of a class as a cookie cutter and objects as the cookies made from that cutter.

Let’s start with a simple example. Suppose you’re working on a project that involves tracking individual trades. Each trade has certain properties: a ticker symbol, a quantity, a price, and whether it’s a buy or sell. You could represent each trade as a dictionary:

trade1 = {
    "ticker": "AAPL",
    "quantity": 100,
    "price": 150.50,
    "side": "buy"
}

trade2 = {
    "ticker": "MSFT",
    "quantity": 50,
    "price": 280.25,
    "side": "sell"
}

This works, but it has some limitations. There’s no guarantee that every trade dictionary has the same keys. You might accidentally misspell a key, or forget to include one. And if you want to calculate the total value of a trade, you need to write that logic separately.
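To make these limitations concrete, here is a minimal sketch. The misspelled key "quantitiy" and the helper trade_value are hypothetical and exist only for illustration.

```python
# A dictionary accepts any key, so a typo goes unnoticed when it is created.
trade = {
    "ticker": "AAPL",
    "quantitiy": 100,  # misspelled on purpose (hypothetical typo)
    "price": 150.50,
    "side": "buy",
}


def trade_value(trade: dict) -> float:
    """Hypothetical helper: the value logic lives apart from the data."""
    return trade["quantity"] * trade["price"]


# The mistake only surfaces later, when the correctly spelled key is looked up.
try:
    trade_value(trade)
except KeyError as err:
    error_message = f"Missing key: {err}"
    print(error_message)
```

The error appears far from the line that caused it, which is what makes this kind of bug tedious to track down in longer scripts.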

A class provides a better solution. Here’s how we might define a Trade class:

class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def value(self) -> float:
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    def __repr__(self) -> str:
        """Return a string representation of the trade."""
        return f"Trade({self.ticker}, {self.quantity}, ${self.price}, {self.side})"

1: We define the class with class Trade:. By convention, class names use CamelCase. The __init__ method is a special method called a constructor. It runs automatically when you create a new object from the class. The self parameter refers to the instance being created.
2: Inside __init__, we set attributes on the object using self.attribute_name. These become the object’s data.
3: The value method is a regular method that calculates the trade’s total value. Like all methods, it takes self as its first parameter.
4: The __repr__ method is another special method that defines how the object should be displayed. Methods that start and end with double underscores are called “dunder” (double underscore) methods or magic methods.

Now we can create trades as objects:

trade1 = Trade("AAPL", 100, 150.50, "buy")
trade2 = Trade("MSFT", 50, 280.25, "sell")

print(trade1)
print(f"Trade value: ${trade1.value():.2f}")

Trade(AAPL, 100, $150.5, buy)
Trade value: $15050.00

This is cleaner and more robust. Every Trade object is guaranteed to have the required attributes, and the logic for calculating value is bundled with the data.
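As a quick check of that guarantee, the sketch below reuses the constructor from above and omits the side argument on purpose. The exact wording of the error message varies between Python versions, so it is not reproduced here.

```python
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side


# Leaving out a required argument fails immediately, at creation time.
try:
    Trade("AAPL", 100, 150.50)  # side is missing on purpose
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")
```

Unlike the dictionary version, an incomplete trade can never be created, so later code does not need to check for missing fields.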

3.1.2 Attributes and Methods

Let’s clarify some terminology:

Attributes are variables that belong to an object. In our example, ticker, quantity, price, and side are attributes.
Methods are functions that belong to a class. They operate on the object’s data. In our example, value() is a method.
Instance refers to a specific object created from a class. trade1 and trade2 are instances of the Trade class.

You access attributes and call methods using dot notation:

print(trade1.ticker)      # Accessing an attribute
print(trade1.value())     # Calling a method

AAPL
15050.0

3.1.3 Adding More Functionality

Let’s expand our Trade class to include more useful functionality. Suppose we want to compare trades and calculate profit and loss:

class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def value(self) -> float:
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    def pnl(self, current_price: float) -> float:
        """Calculate profit/loss relative to a current price."""
        if self.side == "buy":
            return self.quantity * (current_price - self.price)
        else:  # sell
            return self.quantity * (self.price - current_price)

    def __repr__(self) -> str:
        return f"Trade({self.ticker}, {self.quantity}, ${self.price:.2f}, {self.side})"

    def __eq__(self, other: object) -> bool:
        """Check if two trades are equal."""
        if not isinstance(other, Trade):
            return NotImplemented
        return (self.ticker == other.ticker and
                self.quantity == other.quantity and
                self.price == other.price and
                self.side == other.side)

1: The pnl method calculates the profit or loss based on a current market price. The logic differs for buys (profit when price goes up) and sells (profit when price goes down).
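2: The __eq__ method defines what it means for two Trade objects to be equal. Here, two trades are equal if all their attributes match. Note that a class that defines __eq__ without also defining __hash__ becomes unhashable, so these Trade objects cannot be used as dictionary keys or set members.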

The __eq__ method is one of several special comparison methods Python supports. Others include __ne__ (not equal, !=), __lt__ (less than, <), __le__ (less than or equal, <=), __gt__ (greater than, >), and __ge__ (greater than or equal, >=). For a complete list of these “rich comparison” methods and other special methods, see the Python documentation on basic customization.
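As an illustration of the ordering methods, the sketch below orders trades by their total value. This ordering rule is an assumption made for the example, not part of the Trade class above, and equality is redefined here to compare values so that it stays consistent with the ordering. The functools.total_ordering decorator from the standard library derives the remaining comparison methods from __eq__ and __lt__.

```python
from functools import total_ordering


@total_ordering
class Trade:
    """Trimmed-down Trade, ordered by total value (an illustrative choice)."""

    def __init__(self, ticker: str, quantity: int, price: float):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price

    def value(self) -> float:
        return self.quantity * self.price

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Trade):
            return NotImplemented
        return self.value() == other.value()

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Trade):
            return NotImplemented
        return self.value() < other.value()


small = Trade("AAPL", 10, 150.50)
large = Trade("MSFT", 50, 280.25)

print(small < large)   # uses __lt__ directly
print(small >= large)  # __ge__ is derived by total_ordering
print([t.ticker for t in sorted([large, small])])  # sorted() relies on __lt__
```

Once __lt__ is defined, built-ins such as sorted(), min(), and max() work on Trade objects without a key function.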

Now we can do more with our trades:

trade = Trade("AAPL", 100, 150.50, "buy")
print(f"Trade value: ${trade.value():.2f}")

# Calculate P&L at a current price
current_price = 160.00
pnl = trade.pnl(current_price)
print(f"P&L at ${current_price:.2f}: ${pnl:.2f}")

# Test equality
trade2 = Trade("AAPL", 100, 150.50, "buy")
trade3 = Trade("MSFT", 50, 280.25, "sell")
print(f"trade == trade2: {trade == trade2}")  # Same attributes
print(f"trade == trade3: {trade == trade3}")  # Different attributes

Trade value: $15050.00
P&L at $160.00: $950.00
trade == trade2: True
trade == trade3: False

3.1.4 A More Complex Example: Portfolio Class

Let’s build a more sophisticated example: a Portfolio class that manages a collection of trades. This demonstrates how objects can contain other objects:

class Portfolio:
    def __init__(self, name):
        self.name = name
        self.trades = []

    def add_trade(self, trade):
        """Add a trade to the portfolio."""
        self.trades.append(trade)

    def total_value(self):
        """Calculate the gross traded value of all trades (buys and sells both add)."""
        return sum(trade.value() for trade in self.trades)

    def positions(self):
        """Calculate net position for each ticker."""
        positions = {}
        for trade in self.trades:
            if trade.ticker not in positions:
                positions[trade.ticker] = 0

            if trade.side == "buy":
                positions[trade.ticker] += trade.quantity
            else:  # sell
                positions[trade.ticker] -= trade.quantity

        return positions

    def __repr__(self):
        return f"Portfolio('{self.name}', {len(self.trades)} trades)"

    def summary(self):
        """Print a summary of the portfolio."""
        print(f"Portfolio: {self.name}")
        print(f"Total trades: {len(self.trades)}")
        print(f"Total value: ${self.total_value():.2f}")
        print("\nPositions:")
        for ticker, quantity in self.positions().items():
            print(f"  {ticker}: {quantity} shares")

Now we can use our Portfolio class:

# Create a portfolio
portfolio = Portfolio("My Research Portfolio")

# Add some trades
portfolio.add_trade(Trade("AAPL", 100, 150.50, "buy"))
portfolio.add_trade(Trade("AAPL", 50, 155.00, "buy"))
portfolio.add_trade(Trade("MSFT", 75, 280.25, "buy"))
portfolio.add_trade(Trade("AAPL", 25, 152.00, "sell"))

# View summary
portfolio.summary()

Portfolio: My Research Portfolio
Total trades: 4
Total value: $47618.75

Positions:
  AAPL: 125 shares
  MSFT: 75 shares

This example shows how OOP allows you to build up layers of abstraction. A Portfolio is a collection of Trade objects, and both have methods that make sense for their level of abstraction.

When to Use Classes vs. Functions

Don’t create a class just to group functions together. If your class only has one or two methods and no meaningful state (attributes), it should probably just be a function. Classes are most useful when you need to maintain state across multiple operations.

3.1.5 String Representations: __repr__ vs. __str__

Python provides two different methods for converting objects to strings: __repr__ and __str__. Understanding the difference between them helps you write more useful classes.

__repr__ is meant to produce an unambiguous representation of the object, primarily for developers and debugging. Ideally, it should look like a valid Python expression that could recreate the object.
__str__ is meant to produce a readable, user-friendly string. It’s what gets displayed when you use print() on an object.

If you only implement one, implement __repr__. Python will use it as a fallback for __str__ if __str__ isn’t defined. Here’s an example showing both:

class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def __repr__(self) -> str:
        """Unambiguous representation for developers."""
        return f"Trade({self.ticker!r}, {self.quantity}, {self.price}, {self.side!r})"

    def __str__(self) -> str:
        """User-friendly representation."""
        action = "Buy" if self.side == "buy" else "Sell"
        return f"{action} {self.quantity} shares of {self.ticker} @ ${self.price:.2f}"

trade = Trade("AAPL", 100, 150.50, "buy")

# __str__ is used by print()
print(trade)

# __repr__ is used in the REPL and for debugging
print(repr(trade))

Buy 100 shares of AAPL @ $150.50
Trade('AAPL', 100, 150.5, 'buy')
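The fallback rule mentioned earlier in this section can be checked with a class that defines only __repr__. The Quote class and its fields are hypothetical and used only for this sketch.

```python
class Quote:
    """Hypothetical class that defines __repr__ but not __str__."""

    def __init__(self, ticker: str, bid: float, ask: float):
        self.ticker = ticker
        self.bid = bid
        self.ask = ask

    def __repr__(self) -> str:
        return f"Quote({self.ticker!r}, {self.bid}, {self.ask})"


quote = Quote("AAPL", 150.4, 150.6)

print(str(quote))    # no __str__ defined, so Python falls back to __repr__
print(f"{quote}")    # f-strings use str() by default
print(f"{quote!r}")  # !r requests repr() explicitly
print([quote])       # containers display their items with repr()
```

All four lines show the same Quote(...) text, the last one wrapped in list brackets. Containers always use repr() for their items, which is another reason to implement __repr__ first.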

3.1.6 Rich Display in Jupyter and Quarto

Jupyter notebooks (and Quarto, a publishing system for creating documents from notebooks and other sources) support special methods for rich display. These methods allow your objects to render as HTML, Markdown, or LaTeX instead of plain text:

_repr_html_() returns HTML that will be rendered in the notebook
_repr_markdown_() returns Markdown text
_repr_latex_() returns LaTeX for mathematical notation

Here’s a simple example:

class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def __repr__(self) -> str:
        return f"Trade({self.ticker!r}, {self.quantity}, {self.price}, {self.side!r})"

    def _repr_html_(self) -> str:
        """Rich HTML display for Jupyter/Quarto."""
        color = "green" if self.side == "buy" else "red"
        return f"""
        <div style="border: 1px solid #ccc; padding: 10px; border-radius: 5px; width: fit-content;">
            <strong>{self.ticker}</strong><br>
            <span style="color: {color};">{self.side.upper()}</span>
            {self.quantity} shares @ ${self.price:.2f}
        </div>
        """

    def _repr_latex_(self) -> str:
        """Rich LaTeX display for PDF output."""
        action = "Buy" if self.side == "buy" else "Sell"
        return (
            rf"\textbf{{{self.ticker}}}: "
            rf"{action} {self.quantity} shares @ \${self.price:.2f}"
        )

trade = Trade("AAPL", 100, 150.50, "buy")
trade  # In Jupyter/Quarto, this displays as formatted HTML or LaTeX

AAPL
BUY 100 shares @ $150.50

Many of the data analysis libraries you’ll use later in this course, such as pandas, use these methods to display data frames as nicely formatted tables.

3.1.7 Class Variables vs. Instance Variables

So far, we’ve been working with instance variables—attributes that are unique to each object. Python also supports class variables, which are shared by all instances of a class:

class Trade:
    # Class variable
    commission_rate = 0.001  # 0.1% commission

    def __init__(self, ticker, quantity, price, side):
        # Instance variables
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def value(self):
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    def net_value(self):
        """Calculate value after commission."""
        gross_value = self.value()
        commission = gross_value * Trade.commission_rate
        return gross_value - commission

    def __repr__(self):
        return f"Trade({self.ticker}, {self.quantity}, ${self.price:.2f}, {self.side})"

# All trades share the same commission rate
trade1 = Trade("AAPL", 100, 150.50, "buy")
trade2 = Trade("MSFT", 50, 280.25, "sell")

print(f"Trade 1 net value: ${trade1.net_value():.2f}")
print(f"Trade 2 net value: ${trade2.net_value():.2f}")

# Changing the class variable affects all instances
Trade.commission_rate = 0.002
print(f"Trade 1 net value (new rate): ${trade1.net_value():.2f}")

Trade 1 net value: $15034.95
Trade 2 net value: $13998.49
Trade 1 net value (new rate): $15019.90

Class variables are useful for values that should be consistent across all instances, like constants, default settings, or shared configuration.
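One subtlety is worth a short sketch: assigning to a class variable through an instance does not change the shared value. It creates a new instance attribute that shadows the class variable for that one object. The trimmed-down Trade below keeps only the parts needed to show this.

```python
class Trade:
    commission_rate = 0.001  # class variable shared by all instances

    def __init__(self, ticker: str):
        self.ticker = ticker


trade1 = Trade("AAPL")
trade2 = Trade("MSFT")

# Assignment through an instance creates an instance attribute
# that shadows the class variable for this object only.
trade1.commission_rate = 0.005

print(trade1.commission_rate)  # 0.005 (instance attribute)
print(trade2.commission_rate)  # 0.001 (still the class variable)

# Changing the class variable reaches trade2 but no longer trade1.
Trade.commission_rate = 0.002
print(trade1.commission_rate)  # 0.005
print(trade2.commission_rate)  # 0.002
```

Note that net_value() in the example above reads Trade.commission_rate directly, so it would ignore such an instance-level override. Reading self.commission_rate instead would pick it up.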

3.1.8 Property Decorators

Sometimes you want to compute a value on the fly rather than storing it as an attribute. Python’s @property decorator makes this look like a simple attribute access:

class Trade:
    def __init__(self, ticker, quantity, price, side):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    @property
    def value(self):
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    @property
    def is_buy(self):
        """Check if this is a buy trade."""
        return self.side == "buy"

    def __repr__(self):
        return f"Trade({self.ticker}, {self.quantity}, ${self.price:.2f}, {self.side})"

trade = Trade("AAPL", 100, 150.50, "buy")

# No parentheses needed - looks like an attribute
print(f"Trade value: ${trade.value:.2f}")
print(f"Is buy: {trade.is_buy}")

Trade value: $15050.00
Is buy: True

The advantage of using @property is that it allows you to start with a simple attribute and later change it to a computed value without changing how the class is used. It also makes the code more readable when the value is conceptually an attribute rather than an action.
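The same idea works in the other direction: a plain attribute can later gain validation by adding a setter, again without changing how the class is used. The sketch below assumes a rule that prices must be positive, which is an illustrative choice rather than something the earlier examples enforce.

```python
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price  # goes through the setter below

    @property
    def price(self) -> float:
        return self._price

    @price.setter
    def price(self, new_price: float) -> None:
        # Assumed rule for this sketch: prices must be strictly positive.
        if new_price <= 0:
            raise ValueError("price must be positive")
        self._price = new_price


trade = Trade("AAPL", 100, 150.50)
trade.price = 151.00  # ordinary assignment syntax, validated behind the scenes
print(trade.price)

try:
    trade.price = -1.0
except ValueError as err:
    print(f"Rejected: {err}")
```

The value is stored under the conventional private name _price, while callers keep reading and writing trade.price.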

3.2 When OOP is Useful in Research Code

Now that you understand the basics of classes and objects, an important question remains: when should you actually use OOP in your research code?

The truth is, many data analysis tasks in empirical finance don’t require OOP. A straightforward script that loads data, performs some analysis, and generates output can be perfectly fine without defining any classes. In fact, overusing OOP can make simple tasks more complicated than they need to be.

However, there are several scenarios where OOP becomes genuinely useful in research code.

3.2.1 Scenario 1: Managing Complex State

If you find yourself passing around many related variables to multiple functions, a class might be appropriate. Consider a simulation configuration that needs to be loaded, validated, and used across multiple functions:

Without OOP:

import json
from pathlib import Path

def load_simulation_config(filepath):
    with open(filepath) as f:
        data = json.load(f)
    return {
        "name": data["name"],
        "n_simulations": data["n_simulations"],
        "initial_value": data["initial_value"],
        "drift": data["drift"],
        "volatility": data["volatility"],
        "results": None
    }

def validate_config(config):
    if config["n_simulations"] <= 0:
        raise ValueError("n_simulations must be positive")
    if config["initial_value"] <= 0:
        raise ValueError("initial_value must be positive")
    if config["volatility"] < 0:
        raise ValueError("volatility must be non-negative")
    return config

def run_simulation(config):
    if config["n_simulations"] <= 0:
        raise ValueError("Invalid config")
    # Simulation logic would go here...
    config["results"] = {"mean": 105.2, "std": 12.3}
    return config

# Usage - must remember the correct sequence of function calls
config = load_simulation_config("sim_config.json")
config = validate_config(config)
config = run_simulation(config)
print(config["results"])

With OOP:

import json
from pathlib import Path
import pprint

class SimulationConfig:
    def __init__(
        self,
        name: str,
        n_simulations: int,
        initial_value: float,
        drift: float,
        volatility: float,
    ):
        self.name = name
        self.n_simulations = n_simulations
        self.initial_value = initial_value
        self.drift = drift
        self.volatility = volatility
        self.results: dict | None = None

        # Validation happens automatically on creation
        self._validate()

    def _validate(self) -> None:
        """Validate the configuration parameters."""
        if self.n_simulations <= 0:
            raise ValueError("n_simulations must be positive")
        if self.initial_value <= 0:
            raise ValueError("initial_value must be positive")
        if self.volatility < 0:
            raise ValueError("volatility must be non-negative")

    @classmethod
    def from_json(cls, filepath: str | Path) -> "SimulationConfig":
        """Create a SimulationConfig from a JSON file."""
        with open(filepath) as f:
            data = json.load(f)
        return cls(
            name=data["name"],
            n_simulations=data["n_simulations"],
            initial_value=data["initial_value"],
            drift=data["drift"],
            volatility=data["volatility"],
        )

    def run(self) -> "SimulationConfig":
        """Run the simulation."""
        # Simulation logic would go here...
        self.results = {"mean": 105.2, "std": 12.3}
        return self

    def __repr__(self) -> str:
        status = "completed" if self.results is not None else "not run"
        return f"SimulationConfig({self.name!r}, {self.n_simulations} sims, {status})"

1: The pprint (pretty print) module from Python’s standard library formats complex data structures like dictionaries and lists in a more readable way, with proper indentation and line breaks. This is especially useful when displaying nested structures or long lists.
2: The @classmethod decorator creates a class method—a method that receives the class itself (conventionally named cls) as its first argument instead of an instance. Class methods are often used as alternative constructors, like from_json() here. In contrast, a @staticmethod doesn’t receive any implicit first argument and behaves like a regular function that happens to live inside a class.
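The difference between the two decorators can be sketched with a trimmed-down stand-in for SimulationConfig. The names from_dict, is_valid_count, and QuickConfig are hypothetical and exist only for this illustration.

```python
class SimulationConfig:
    """Trimmed-down stand-in with only two fields."""

    def __init__(self, name: str, n_simulations: int):
        self.name = name
        self.n_simulations = n_simulations

    @classmethod
    def from_dict(cls, data: dict) -> "SimulationConfig":
        """Alternative constructor: receives the class itself as cls."""
        return cls(name=data["name"], n_simulations=data["n_simulations"])

    @staticmethod
    def is_valid_count(n: int) -> bool:
        """Plain helper: receives neither an instance nor the class."""
        return n > 0


class QuickConfig(SimulationConfig):
    """Hypothetical subclass, used to show that cls follows the caller."""


config = QuickConfig.from_dict({"name": "demo", "n_simulations": 10})
print(type(config).__name__)                # QuickConfig, not SimulationConfig
print(SimulationConfig.is_valid_count(0))   # False
```

Because from_dict builds the object through cls rather than a hard-coded class name, subclasses inherit a working alternative constructor for free.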

Now we can use the class:

# Load from file using the class method
config = SimulationConfig.from_json("sim_config.json")

# Or create directly
config = SimulationConfig(
    name="Test Simulation",
    n_simulations=1000,
    initial_value=100.0,
    drift=0.05,
    volatility=0.2,
)

# Run and display results
config.run()
pprint.pprint(config.results)  # Pretty print the results dictionary

{'mean': 105.2, 'std': 12.3}

The OOP version is cleaner because the state is bundled together, and you don’t need to pass around a configuration dictionary. The simulation object maintains its own state, making the code more organized and less error-prone.

Beyond bundling state, the class also gives your data an explicit, checkable structure. With a dictionary, nothing prevents you from accessing a misspelled key like config["n_simulaitons"], and you’ll only discover the typo at runtime. With a class, your editor (like VS Code) can immediately flag config.n_simulaitons as an error because it knows exactly which attributes SimulationConfig has. This kind of immediate feedback makes development faster and catches bugs before you even run the code.

3.2.2 Scenario 2: Multiple Related Variants

If you need to implement several variants of a similar concept, OOP with inheritance can reduce code duplication. For example, different return calculation methods:

import math

class Returns:
    """Base class for return calculations."""

    def __init__(self, prices: list[float]):
        self.prices = prices

    def calculate(self) -> list[float]:
        raise NotImplementedError("Subclasses must implement calculate()")

class SimpleReturns(Returns):
    """Calculate simple returns: (P_t / P_{t-1}) - 1"""

    def calculate(self) -> list[float]:
        return [
            (self.prices[i] / self.prices[i - 1]) - 1
            for i in range(1, len(self.prices))
        ]

class LogReturns(Returns):
    """Calculate log returns: log(P_t / P_{t-1})"""

    def calculate(self) -> list[float]:
        return [
            math.log(self.prices[i] / self.prices[i - 1])
            for i in range(1, len(self.prices))
        ]

class ExcessReturns(Returns):
    """Calculate excess returns over risk-free rate."""

    def __init__(self, prices: list[float], risk_free_rate: float):
        super().__init__(prices)
        self.risk_free_rate = risk_free_rate

    def calculate(self) -> list[float]:
        simple_returns = [
            (self.prices[i] / self.prices[i - 1]) - 1
            for i in range(1, len(self.prices))
        ]
        return [r - self.risk_free_rate for r in simple_returns]

# Usage
prices = [100.0, 102.0, 101.0, 105.0, 108.0]

simple = SimpleReturns(prices)
print("Simple returns:", [f"{r:.4f}" for r in simple.calculate()])

log_ret = LogReturns(prices)
print("Log returns:", [f"{r:.4f}" for r in log_ret.calculate()])

excess = ExcessReturns(prices, risk_free_rate=0.001)
print("Excess returns:", [f"{r:.4f}" for r in excess.calculate()])

Simple returns: ['0.0200', '-0.0098', '0.0396', '0.0286']
Log returns: ['0.0198', '-0.0099', '0.0388', '0.0282']
Excess returns: ['0.0190', '-0.0108', '0.0386', '0.0276']

This pattern is useful when you want to ensure different variants share a common interface or when you want to write code that works with any of the variants.
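A minimal sketch of "code that works with any of the variants" follows. The classes are condensed restatements of the ones above, and mean_return is a hypothetical helper added for this example.

```python
import math


class Returns:
    def __init__(self, prices: list[float]):
        self.prices = prices

    def calculate(self) -> list[float]:
        raise NotImplementedError("Subclasses must implement calculate()")


class SimpleReturns(Returns):
    def calculate(self) -> list[float]:
        return [p1 / p0 - 1 for p0, p1 in zip(self.prices, self.prices[1:])]


class LogReturns(Returns):
    def calculate(self) -> list[float]:
        return [math.log(p1 / p0) for p0, p1 in zip(self.prices, self.prices[1:])]


def mean_return(returns: Returns) -> float:
    """Hypothetical helper: works with any Returns subclass."""
    values = returns.calculate()
    return sum(values) / len(values)


prices = [100.0, 102.0, 101.0, 105.0, 108.0]
for variant in (SimpleReturns(prices), LogReturns(prices)):
    print(f"{type(variant).__name__}: {mean_return(variant):.4f}")
```

mean_return never checks which subclass it received. It relies only on the shared calculate() interface, so a new variant works with it unchanged.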

Don’t Overuse Inheritance

Inheritance can create tight coupling between classes and make code harder to understand. Often, composition (having one class use another as an attribute) is a better choice. Only use inheritance when you have a genuine “is-a” relationship and need to substitute one type for another.

For cases where you want classes to share a common interface without inheritance, Python 3.8+ offers Protocols (from the typing module). A Protocol defines what methods and attributes a class should have, without requiring the class to explicitly inherit from anything. This is sometimes called “structural subtyping” or “duck typing with type hints.”
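As a rough sketch of that idea, the example below defines a Protocol and a class that satisfies it without inheriting from anything. The names ReturnCalculator and best_return are hypothetical. The conformance check is performed by a static type checker, not by Python at runtime.

```python
from typing import Protocol


class ReturnCalculator(Protocol):
    """Hypothetical interface: anything with a matching calculate() method."""

    def calculate(self) -> list[float]: ...


class SimpleReturns:
    """No base class: it matches the protocol by its shape alone."""

    def __init__(self, prices: list[float]):
        self.prices = prices

    def calculate(self) -> list[float]:
        return [p1 / p0 - 1 for p0, p1 in zip(self.prices, self.prices[1:])]


def best_return(calculator: ReturnCalculator) -> float:
    """A type checker accepts any object with a compatible calculate()."""
    return max(calculator.calculate())


print(f"{best_return(SimpleReturns([100.0, 102.0, 101.0, 105.0])):.4f}")
```

Compared with the inheritance version, SimpleReturns here has no dependency on a base class, which keeps the coupling between the pieces low.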

3.2.3 Scenario 3: Encapsulating Complex Data Structures

When working with complex data structures that need validation or computed properties, classes provide a clean way to manage this complexity:

class EventStudyWindow:
    """Represents an event study window with validation."""

    def __init__(self, event_date, estimation_start, estimation_end,
                 event_start, event_end):
        self.event_date = event_date
        self.estimation_start = estimation_start
        self.estimation_end = estimation_end
        self.event_start = event_start
        self.event_end = event_end

        # Validate the window
        self._validate()

    def _validate(self):
        """Validate that the window makes sense."""
        if self.estimation_end >= self.event_date:
            raise ValueError("Estimation window must end before event date")

        if self.event_start > self.event_date:
            raise ValueError("Event window start must be at or before event date")

        if self.event_end < self.event_date:
            raise ValueError("Event window end must be at or after event date")

    @property
    def estimation_length(self):
        """Length of the estimation window in days."""
        return (self.estimation_end - self.estimation_start).days

    @property
    def event_length(self):
        """Length of the event window in days."""
        return (self.event_end - self.event_start).days

    def __repr__(self):
        return (f"EventStudyWindow(event={self.event_date}, "
                f"estimation={self.estimation_length} days, "
                f"event_window={self.event_length} days)")

# Usage
from datetime import date, timedelta

event_date = date(2024, 6, 15)
window = EventStudyWindow(
    event_date=event_date,
    estimation_start=event_date - timedelta(days=260),
    estimation_end=event_date - timedelta(days=10),
    event_start=event_date - timedelta(days=1),
    event_end=event_date + timedelta(days=1)
)

print(window)
print(f"Estimation period: {window.estimation_length} days")
print(f"Event window: {window.event_length} days")

EventStudyWindow(event=2024-06-15, estimation=250 days, event_window=2 days)
Estimation period: 250 days
Event window: 2 days

The class encapsulates both the data and the logic for validation and computation, making it easier to work with event study windows correctly.

3.2.4 Scenario 4: Building Reusable Components

If you’re building functionality that will be reused across multiple projects, classes provide a clean interface:

import statistics
import random

class RollingWindow:
    """Calculate rolling window statistics."""

    def __init__(self, data: list[float], window_size: int):
        self.data = data
        self.window_size = window_size

        if len(self.data) < window_size:
            raise ValueError("Data must be at least as long as the window size")

    def mean(self) -> list[float]:
        """Calculate rolling mean."""
        return [
            statistics.mean(self.data[i : i + self.window_size])
            for i in range(len(self))
        ]

    def std(self) -> list[float]:
        """Calculate rolling standard deviation."""
        return [
            statistics.stdev(self.data[i : i + self.window_size])
            for i in range(len(self))
        ]

    def sharpe(self, risk_free_rate: float = 0) -> list[float]:
        """Calculate rolling Sharpe ratio."""
        means = self.mean()
        stds = self.std()
        return [(m - risk_free_rate) / s for m, s in zip(means, stds)]

    def __len__(self) -> int:
        return len(self.data) - self.window_size + 1

    def __repr__(self) -> str:
        return f"RollingWindow(data_length={len(self.data)}, window={self.window_size})"

# Generate some sample returns
random.seed(42)
returns = [random.gauss(0.001, 0.02) for _ in range(100)]
rolling = RollingWindow(returns, window_size=10)

print(f"Rolling windows: {len(rolling)}")
print(f"Mean rolling mean: {statistics.mean(rolling.mean()):.4f}")
print(f"Mean rolling Sharpe: {statistics.mean(rolling.sharpe()):.4f}")

Rolling windows: 91
Mean rolling mean: 0.0019
Mean rolling Sharpe: 0.1290

3.2.5 When to Avoid OOP

Just as important as knowing when to use OOP is knowing when not to use it. Avoid OOP when:

You’re doing one-off analysis: If you’re exploring data or doing a quick calculation, a simple script is fine.
Your code is primarily a sequence of transformations: Data pipelines that transform data step-by-step are often clearer as functions rather than classes.
You’re wrapping a single function: Don’t create a class with only one method. Just use a function.
It makes the code more complex: If OOP is making your code harder to understand, you’re probably not in a situation where it helps.

Remember: the goal is clarity and maintainability, not using OOP for its own sake.

3.3 Data Classes

Python 3.7 introduced data classes, which provide a streamlined way to create classes that are primarily used to store data. They automatically generate common methods like __init__, __repr__, and __eq__, reducing boilerplate code significantly.

3.3.1 Basic Data Classes

Let’s revisit our Trade class, but this time using a data class:

from dataclasses import dataclass

@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str

    @property
    def value(self) -> float:
        """Calculate the total value of the trade."""
        return self.quantity * self.price

# Create trades
trade1 = Trade("AAPL", 100, 150.50, "buy")
trade2 = Trade("AAPL", 100, 150.50, "buy")
trade3 = Trade("MSFT", 50, 280.25, "sell")

print(trade1)
print(f"Value: ${trade1.value:.2f}")
print(f"trade1 == trade2: {trade1 == trade2}")
print(f"trade1 == trade3: {trade1 == trade3}")

Trade(ticker='AAPL', quantity=100, price=150.5, side='buy')
Value: $15050.00
trade1 == trade2: True
trade1 == trade3: False

With just the @dataclass decorator and type annotations, we get:

An __init__ method that accepts all the attributes
A __repr__ method that shows a useful string representation
An __eq__ method that compares instances by their attributes

This is much less code than writing these methods manually, and it’s less error-prone.

3.3.2 Default Values

Data classes make it easy to specify default values:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str = "buy"  # default value
    commission: float = 0.0
    notes: Optional[str] = None

    def value(self):
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    def net_value(self):
        """Calculate value after commission."""
        return self.value() - self.commission

# Use defaults
trade1 = Trade("AAPL", 100, 150.50)
print(trade1)

# Override defaults
trade2 = Trade("MSFT", 50, 280.25, side="sell", commission=14.00)
print(trade2)
print(f"Net value: ${trade2.net_value():.2f}")

Trade(ticker='AAPL', quantity=100, price=150.5, side='buy', commission=0.0, notes=None)
Trade(ticker='MSFT', quantity=50, price=280.25, side='sell', commission=14.0, notes=None)
Net value: $13998.50
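One caveat applies to mutable defaults such as lists: data classes refuse a bare [] as a default value and expect field(default_factory=...) instead. The sketch below uses a hypothetical Portfolio data class with a hypothetical tags field.

```python
from dataclasses import dataclass, field


@dataclass
class Portfolio:
    name: str
    # tags: list[str] = []  # rejected by dataclasses with a ValueError
    tags: list[str] = field(default_factory=list)  # fresh list per instance


first = Portfolio("Momentum")
second = Portfolio("Value")
first.tags.append("equities")

print(first)
print(second)  # unaffected: each instance has its own list
```

The factory is called once per new object, which avoids the classic bug where every instance silently shares one list.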

3.3.3 Immutable Data Classes

You can make a data class immutable by setting frozen=True. This means that once created, the attributes cannot be changed:

from dataclasses import dataclass

@dataclass(frozen=True)
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str

    def value(self):
        return self.quantity * self.price

trade = Trade("AAPL", 100, 150.50, "buy")
print(trade)

# This would raise an error:
# trade.price = 160.00  # FrozenInstanceError

Trade(ticker='AAPL', quantity=100, price=150.5, side='buy')

Immutable data classes are useful when you want to ensure that data doesn’t change unexpectedly, or when you need to use instances as dictionary keys or in sets.
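The following sketch exercises those properties: the failed assignment, the use of dataclasses.replace to derive a modified copy, and the use of a frozen instance as a dictionary key. The timestamps are made-up illustration values.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str


trade = Trade("AAPL", 100, 150.50, "buy")

try:
    trade.price = 160.00
except FrozenInstanceError:
    print("Cannot modify a frozen trade")

# replace() builds a new object with selected fields changed.
repriced = replace(trade, price=160.00)
print(repriced)

# Frozen data classes are hashable, so they work as dictionary keys.
fill_times = {trade: "09:30:00", repriced: "09:31:00"}  # hypothetical timestamps
print(fill_times[Trade("AAPL", 100, 150.50, "buy")])
```

The last lookup succeeds with a freshly created Trade because equality and hashing are both based on field values, not on object identity.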

3.3.4 Data Classes vs. Regular Classes

When should you use a data class instead of a regular class?

Use data classes when:

Your class is primarily for storing data
You want automatic generation of common methods
You want type hints for all attributes
You need value-based equality (comparing by content, not identity)

Use regular classes when:

You need more control over initialization
The class has complex behavior with little data
You need inheritance from non-dataclass parents
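Before switching to a regular class for initialization logic alone, it may be worth checking whether __post_init__ is enough. Data classes call this method at the end of the generated __init__. The sketch below assumes a validation rule for side and a stored value field, both chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str
    value: float = field(init=False)  # computed, not passed by the caller

    def __post_init__(self) -> None:
        # Assumed rule for this sketch: only two sides are allowed.
        if self.side not in {"buy", "sell"}:
            raise ValueError(f"side must be 'buy' or 'sell', got {self.side!r}")
        self.value = self.quantity * self.price


trade = Trade("AAPL", 100, 150.50, "buy")
print(trade)

try:
    Trade("AAPL", 100, 150.50, "hold")
except ValueError as err:
    print(f"Rejected: {err}")
```

If the initialization logic grows beyond simple checks and derived fields, a regular class with a hand-written __init__ is usually the clearer option.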

3.3.5 Data Classes and Type Checking

Data classes work particularly well with static type checkers. The type annotations are not just documentation—they can be validated by tools like ty (a fast type checker from Astral, the creators of uv and ruff) or directly in VS Code.

from dataclasses import dataclass

@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str

# These work fine
trade1 = Trade("AAPL", 100, 150.50, "buy")
trade2 = Trade(ticker="MSFT", quantity=50, price=280.25, side="sell")

# Runtime Python won't stop these, but type checkers will flag them:
# trade3 = Trade(ticker="AAPL", quantity="100", price=150.50, side="buy")  # wrong type
# trade4 = Trade("AAPL", 100, 150.50)  # missing argument

The real advantage is that VS Code (with the Python or Pylance extension) can highlight these errors as you type, before you even save the file. This immediate feedback helps catch bugs early and makes development faster.

Pydantic for Data Validation

If you need runtime data validation (not just static type checking), consider Pydantic. It’s a third-party library that offers functionality similar to dataclasses but validates data types at runtime, converts values to the correct types when possible, and provides detailed error messages when validation fails. Pydantic is particularly useful when working with external data sources like JSON files or API responses.