3 Object-Oriented Programming Basics
Object-Oriented Programming (OOP) is a programming paradigm that organizes code around “objects” rather than functions and logic. While Python fully supports OOP, you don’t need to use it for everything you do. In fact, for many data analysis tasks, a procedural or functional approach is simpler and more appropriate.
That said, understanding the basics of OOP is valuable for several reasons. First, many of the libraries you’ll use in empirical finance are built using OOP principles, so understanding these concepts will help you use them more effectively. Second, there are certain situations in research code where OOP can make your code cleaner, more organized, and easier to maintain. Finally, OOP provides a way to model real-world entities and relationships in your code, which can be particularly useful when working with financial concepts like trades and limit order books.
In this chapter, we’ll cover the fundamentals of OOP in Python, focusing on practical applications relevant to empirical finance. We’ll start with the basic concepts of classes and objects, then discuss when OOP is genuinely useful in research code, and finally introduce Python’s data classes, which provide a streamlined way to work with structured data.
This chapter takes a pragmatic approach to OOP. We won’t cover every feature or delve into advanced design patterns. Instead, we’ll focus on the subset of OOP that’s most useful for research code in finance. If you find yourself writing highly object-oriented code with deep inheritance hierarchies, you are most likely overengineering your research scripts.
3.1 Classes and Objects
At its core, object-oriented programming is about creating custom data types that bundle together related data and the functions that operate on that data. Let’s break down the key concepts.
3.1.1 What is a Class?
A class is essentially a blueprint or template for creating objects. It defines what data an object will hold (attributes) and what operations can be performed on that data (methods). Think of a class as a cookie cutter and objects as the cookies made from that cutter.
Let’s start with a simple example. Suppose you’re working on a project that involves tracking individual trades. Each trade has certain properties: a ticker symbol, a quantity, a price, and whether it’s a buy or sell. You could represent each trade as a dictionary:
trade1 = {
    "ticker": "AAPL",
    "quantity": 100,
    "price": 150.50,
    "side": "buy"
}
trade2 = {
    "ticker": "MSFT",
    "quantity": 50,
    "price": 280.25,
    "side": "sell"
}
This works, but it has some limitations. There’s no guarantee that every trade dictionary has the same keys. You might accidentally misspell a key, or forget to include one. And if you want to calculate the total value of a trade, you need to write that logic separately.
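Here is a small sketch of the misspelled-key problem. The typo and the trade_value helper are illustrative only.

```python
# A plain dictionary accepts any key, so a typo silently adds a new entry
trade = {"ticker": "AAPL", "quantity": 100, "price": 150.50, "side": "buy"}
trade["quantiy"] = 200  # typo: meant to update "quantity"

print(trade["quantity"])  # still the old value
print(len(trade))         # five keys instead of four

# Logic such as computing the trade value has to live in a separate function
def trade_value(t: dict) -> float:
    return t["quantity"] * t["price"]

print(trade_value(trade))
```

No error is raised at any point. The mistake only surfaces later, as a wrong number.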
A class provides a better solution. Here’s how we might define a Trade class:
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def value(self) -> float:
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    def __repr__(self) -> str:
        """Return a string representation of the trade."""
        return f"Trade({self.ticker}, {self.quantity}, ${self.price}, {self.side})"
A few notes on this code:
- We define the class with class Trade:. By convention, class names use CamelCase. The __init__ method is a special method, often called a constructor (strictly speaking, an initializer). It runs automatically when you create a new object from the class. The self parameter refers to the instance being created.
- Inside __init__, we set attributes on the object using self.attribute_name. These become the object’s data.
- The value method is a regular method that calculates the trade’s total value. Like all methods, it takes self as its first parameter.
- The __repr__ method is another special method that defines how the object should be displayed. Methods that start and end with double underscores are called “dunder” (double underscore) methods or magic methods.
Now we can create trades as objects:
trade1 = Trade("AAPL", 100, 150.50, "buy")
trade2 = Trade("MSFT", 50, 280.25, "sell")
print(trade1)
print(f"Trade value: ${trade1.value():.2f}")
Trade(AAPL, 100, $150.5, buy)
Trade value: $15050.00
This is cleaner and more robust. Every Trade object is guaranteed to have the required attributes, and the logic for calculating value is bundled with the data.
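The guarantee comes from __init__. Python will not create a Trade unless every required argument is supplied. The sketch below shows this. Note that it checks only that the arguments are present, not that they have the right types. Trade("AAPL", "100", 150.50, "buy") would still be accepted.

```python
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

# Forgetting an argument fails immediately, at the point of creation
try:
    incomplete = Trade("AAPL", 100, 150.50)  # no side given
    created = True
except TypeError as err:
    created = False
    print(f"TypeError: {err}")
```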
3.1.2 Attributes and Methods
Let’s clarify some terminology:
- Attributes are variables that belong to an object. In our example, ticker, quantity, price, and side are attributes.
- Methods are functions that belong to a class. They operate on the object’s data. In our example, value() is a method.
- Instance refers to a specific object created from a class. trade1 and trade2 are instances of the Trade class.
You access attributes and call methods using dot notation:
print(trade1.ticker)   # Accessing an attribute
print(trade1.value())  # Calling a method
AAPL
15050.0
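Calling a method through an instance passes that instance in as the self argument. The sketch below uses a trimmed-down Trade with only value(). It shows that trade.value() and Trade.value(trade) are equivalent.

```python
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def value(self) -> float:
        return self.quantity * self.price

trade = Trade("AAPL", 100, 150.50, "buy")

# Calling the method on the instance...
print(trade.value())
# ...is the same as calling it on the class and passing the instance as self
print(Trade.value(trade))
```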
3.1.3 Adding More Functionality
Let’s expand our Trade class to include more useful functionality. Suppose we want to compare trades and calculate profit and loss:
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def value(self) -> float:
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    def pnl(self, current_price: float) -> float:
        """Calculate profit/loss relative to a current price."""
        if self.side == "buy":
            return self.quantity * (current_price - self.price)
        else:  # sell
            return self.quantity * (self.price - current_price)

    def __repr__(self) -> str:
        return f"Trade({self.ticker}, {self.quantity}, ${self.price:.2f}, {self.side})"

    def __eq__(self, other: object) -> bool:
        """Check if two trades are equal."""
        if not isinstance(other, Trade):
            return NotImplemented
        return (self.ticker == other.ticker and
                self.quantity == other.quantity and
                self.price == other.price and
                self.side == other.side)
A few notes on this code:
- The pnl method calculates the profit or loss based on a current market price. The logic differs for buys (profit when the price goes up) and sells (profit when the price goes down).
- The __eq__ method defines what it means for two Trade objects to be equal. Here, two trades are equal if all their attributes match. Returning NotImplemented for non-Trade objects tells Python to try the other operand’s comparison instead; if that also fails, == falls back to identity and returns False. Also note that defining __eq__ makes instances unhashable unless you define __hash__ as well.
The __eq__ method is one of several special comparison methods Python supports. Others include __ne__ (not equal, !=), __lt__ (less than, <), __le__ (less than or equal, <=), __gt__ (greater than, >), and __ge__ (greater than or equal, >=). For a complete list of these “rich comparison” methods and other special methods, see the Python documentation on basic customization.
Now we can do more with our trades:
trade = Trade("AAPL", 100, 150.50, "buy")
print(f"Trade value: ${trade.value():.2f}")

# Calculate P&L at a current price
current_price = 160.00
pnl = trade.pnl(current_price)
print(f"P&L at ${current_price:.2f}: ${pnl:.2f}")

# Test equality
trade2 = Trade("AAPL", 100, 150.50, "buy")
trade3 = Trade("MSFT", 50, 280.25, "sell")
print(f"trade == trade2: {trade == trade2}")  # Same attributes
print(f"trade == trade3: {trade == trade3}")  # Different attributes
Trade value: $15050.00
P&L at $160.00: $950.00
trade == trade2: True
trade == trade3: False
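The other rich comparison methods follow the same pattern as __eq__. This sketch assumes we want trades ordered by total value, which is an illustrative choice the original class does not make. Implementing __lt__ alone is enough for sorted(), because sorting compares elements with <.

```python
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def value(self) -> float:
        return self.quantity * self.price

    def __lt__(self, other: object) -> bool:
        """Order trades by total value (illustrative choice)."""
        if not isinstance(other, Trade):
            return NotImplemented
        return self.value() < other.value()

    def __repr__(self) -> str:
        return f"Trade({self.ticker}, {self.quantity}, ${self.price:.2f}, {self.side})"

trades = [
    Trade("AAPL", 100, 150.50, "buy"),   # value 15050.00
    Trade("MSFT", 50, 280.25, "sell"),   # value 14012.50
    Trade("AAPL", 10, 155.00, "buy"),    # value 1550.00
]

# sorted() only needs < between elements, which __lt__ provides
for t in sorted(trades):
    print(t, t.value())
```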
3.1.4 A More Complex Example: Portfolio Class
Let’s build a more sophisticated example: a Portfolio class that manages a collection of trades. This demonstrates how objects can contain other objects:
class Portfolio:
    def __init__(self, name):
        self.name = name
        self.trades = []

    def add_trade(self, trade):
        """Add a trade to the portfolio."""
        self.trades.append(trade)

    def total_value(self):
        """Sum the gross value of all trades (buys and sells alike)."""
        return sum(trade.value() for trade in self.trades)

    def positions(self):
        """Calculate net position for each ticker."""
        positions = {}
        for trade in self.trades:
            if trade.ticker not in positions:
                positions[trade.ticker] = 0
            if trade.side == "buy":
                positions[trade.ticker] += trade.quantity
            else:  # sell
                positions[trade.ticker] -= trade.quantity
        return positions

    def __repr__(self):
        return f"Portfolio('{self.name}', {len(self.trades)} trades)"

    def summary(self):
        """Print a summary of the portfolio."""
        print(f"Portfolio: {self.name}")
        print(f"Total trades: {len(self.trades)}")
        print(f"Total value: ${self.total_value():.2f}")
        print("\nPositions:")
        for ticker, quantity in self.positions().items():
            print(f" {ticker}: {quantity} shares")
Now we can use our Portfolio class:
# Create a portfolio
portfolio = Portfolio("My Research Portfolio")

# Add some trades
portfolio.add_trade(Trade("AAPL", 100, 150.50, "buy"))
portfolio.add_trade(Trade("AAPL", 50, 155.00, "buy"))
portfolio.add_trade(Trade("MSFT", 75, 280.25, "buy"))
portfolio.add_trade(Trade("AAPL", 25, 152.00, "sell"))

# View summary
portfolio.summary()
Portfolio: My Research Portfolio
Total trades: 4
Total value: $47618.75

Positions:
 AAPL: 125 shares
 MSFT: 75 shares
This example shows how OOP allows you to build up layers of abstraction. A Portfolio is a collection of Trade objects, and both have methods that make sense for their level of abstraction.
Don’t create a class just to group functions together. If your class only has one or two methods and no meaningful state (attributes), it should probably just be a function. Classes are most useful when you need to maintain state across multiple operations.
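For example, take a stateless class with a single method, such as the hypothetical ValueCalculator below. It adds an extra step (instantiating it) and offers nothing a plain function doesn't already provide.

```python
# A class with no meaningful state and a single method...
class ValueCalculator:
    def compute(self, quantity: int, price: float) -> float:
        return quantity * price

# ...is just a roundabout way of writing a function
def trade_value(quantity: int, price: float) -> float:
    return quantity * price

print(ValueCalculator().compute(100, 150.50))
print(trade_value(100, 150.50))
```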
3.1.5 String Representations: __repr__ vs. __str__
Python provides two different methods for converting objects to strings: __repr__ and __str__. Understanding the difference between them helps you write more useful classes.
- __repr__ is meant to produce an unambiguous representation of the object, primarily for developers and debugging. Ideally, it should look like a valid Python expression that could recreate the object.
- __str__ is meant to produce a readable, user-friendly string. It’s what gets displayed when you use print() on an object.
If you only implement one, implement __repr__. Python will use it as a fallback for __str__ if __str__ isn’t defined. Here’s an example showing both:
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def __repr__(self) -> str:
        """Unambiguous representation for developers."""
        return f"Trade({self.ticker!r}, {self.quantity}, {self.price}, {self.side!r})"

    def __str__(self) -> str:
        """User-friendly representation."""
        action = "Buy" if self.side == "buy" else "Sell"
        return f"{action} {self.quantity} shares of {self.ticker} @ ${self.price:.2f}"

trade = Trade("AAPL", 100, 150.50, "buy")

# __str__ is used by print()
print(trade)

# __repr__ is used in the REPL and for debugging
print(repr(trade))
Buy 100 shares of AAPL @ $150.50
Trade('AAPL', 100, 150.5, 'buy')
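You can check the fallback to __repr__ with a trimmed-down class that defines only __repr__. The two-attribute Trade here is simplified for illustration. Without a __str__, both str() and print() use __repr__.

```python
class Trade:
    def __init__(self, ticker: str, quantity: int):
        self.ticker = ticker
        self.quantity = quantity

    def __repr__(self) -> str:
        return f"Trade({self.ticker!r}, {self.quantity})"

trade = Trade("AAPL", 100)

# No __str__ is defined, so str() and print() fall back to __repr__
print(str(trade))
print(repr(trade))
```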
3.1.6 Rich Display in Jupyter and Quarto
Jupyter notebooks (and Quarto, a publishing system for creating documents from notebooks and other sources) support special methods for rich display. These methods allow your objects to render as HTML, Markdown, or LaTeX instead of plain text:
- _repr_html_() returns HTML that will be rendered in the notebook
- _repr_markdown_() returns Markdown text
- _repr_latex_() returns LaTeX for mathematical notation
Here’s a simple example:
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def __repr__(self) -> str:
        return f"Trade({self.ticker!r}, {self.quantity}, {self.price}, {self.side!r})"

    def _repr_html_(self) -> str:
        """Rich HTML display for Jupyter/Quarto."""
        color = "green" if self.side == "buy" else "red"
        return f"""
        <div style="border: 1px solid #ccc; padding: 10px; border-radius: 5px; width: fit-content;">
            <strong>{self.ticker}</strong><br>
            <span style="color: {color};">{self.side.upper()}</span>
            {self.quantity} shares @ ${self.price:.2f}
        </div>
        """

    def _repr_latex_(self) -> str:
        """Rich LaTeX display for PDF output."""
        action = "Buy" if self.side == "buy" else "Sell"
        return (
            rf"\textbf{{{self.ticker}}}: "
            rf"{action} {self.quantity} shares @ \${self.price:.2f}"
        )

trade = Trade("AAPL", 100, 150.50, "buy")
trade  # In Jupyter/Quarto, this displays as formatted HTML or LaTeX
BUY 100 shares @ $150.50
Many of the data analysis libraries you’ll use later in this course, such as pandas, use these methods to display data frames as nicely formatted tables.
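The list above also mentions _repr_markdown_(), which the example does not use. Here is a minimal sketch that assumes the same four-attribute Trade and a Markdown format chosen for illustration. Outside a notebook, you can call the hook directly to see the string the front end would render.

```python
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def __repr__(self) -> str:
        return f"Trade({self.ticker!r}, {self.quantity}, {self.price}, {self.side!r})"

    def _repr_markdown_(self) -> str:
        """Markdown display for Jupyter/Quarto (illustrative format)."""
        return f"**{self.ticker}**: {self.side} {self.quantity} shares @ ${self.price:.2f}"

trade = Trade("AAPL", 100, 150.50, "buy")

# Inspect the raw Markdown that a notebook front end would render
print(trade._repr_markdown_())
```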
3.1.7 Class Variables vs. Instance Variables
So far, we’ve been working with instance variables—attributes that are unique to each object. Python also supports class variables, which are shared by all instances of a class:
class Trade:
    # Class variable
    commission_rate = 0.001  # 0.1% commission

    def __init__(self, ticker, quantity, price, side):
        # Instance variables
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    def value(self):
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    def net_value(self):
        """Calculate value after commission."""
        gross_value = self.value()
        commission = gross_value * Trade.commission_rate
        return gross_value - commission

    def __repr__(self):
        return f"Trade({self.ticker}, {self.quantity}, ${self.price:.2f}, {self.side})"

# All trades share the same commission rate
trade1 = Trade("AAPL", 100, 150.50, "buy")
trade2 = Trade("MSFT", 50, 280.25, "sell")
print(f"Trade 1 net value: ${trade1.net_value():.2f}")
print(f"Trade 2 net value: ${trade2.net_value():.2f}")

# Changing the class variable affects all instances
Trade.commission_rate = 0.002
print(f"Trade 1 net value (new rate): ${trade1.net_value():.2f}")
Trade 1 net value: $15034.95
Trade 2 net value: $13998.49
Trade 1 net value (new rate): $15019.90
Class variables are useful for values that should be consistent across all instances, like constants, default settings, or shared configuration.
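Assigning to a class variable through an instance does not change the shared value. Instead, it creates a new instance attribute that shadows the class variable. The net_value() method above reads Trade.commission_rate explicitly, so it would ignore such a per-instance override. The trimmed-down sketch below reads self.commission_rate instead, which checks the instance first and then the class. Its commission() method is hypothetical.

```python
class Trade:
    commission_rate = 0.001  # class variable shared by all instances

    def __init__(self, ticker: str, quantity: int, price: float):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price

    def commission(self) -> float:
        # self.commission_rate looks on the instance first, then on the class
        return self.quantity * self.price * self.commission_rate

trade1 = Trade("AAPL", 100, 150.50)
trade2 = Trade("MSFT", 50, 280.25)

# Assigning through an instance creates a new *instance* attribute
trade1.commission_rate = 0.005

print(trade1.commission_rate)  # instance attribute shadows the class variable
print(trade2.commission_rate)  # still reads the class variable
print(Trade.commission_rate)   # the class variable itself is unchanged
```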
3.1.8 Property Decorators
Sometimes you want to compute a value on-the-fly rather than storing it as an attribute. Python’s @property decorator makes this look like a simple attribute access:
class Trade:
    def __init__(self, ticker, quantity, price, side):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

    @property
    def value(self):
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    @property
    def is_buy(self):
        """Check if this is a buy trade."""
        return self.side == "buy"

    def __repr__(self):
        return f"Trade({self.ticker}, {self.quantity}, ${self.price:.2f}, {self.side})"

trade = Trade("AAPL", 100, 150.50, "buy")

# No parentheses needed - looks like an attribute
print(f"Trade value: ${trade.value:.2f}")
print(f"Is buy: {trade.is_buy}")
Trade value: $15050.00
Is buy: True
The advantage of using @property is that it allows you to start with a simple attribute and later change it to a computed value without changing how the class is used. It also makes the code more readable when the value is conceptually an attribute rather than an action.
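The sketch below shows how an attribute can become a property without changing callers. Code still reads and assigns price with ordinary attribute syntax, but a setter now validates each new value. The positivity rule is an illustrative assumption. A property without a setter, like value here, is read-only, and assigning to it raises AttributeError.

```python
class Trade:
    def __init__(self, ticker: str, quantity: int, price: float):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price  # goes through the price setter below

    @property
    def price(self) -> float:
        return self._price

    @price.setter
    def price(self, new_price: float) -> None:
        if new_price <= 0:
            raise ValueError("price must be positive")
        self._price = new_price

    @property
    def value(self) -> float:
        return self.quantity * self.price

trade = Trade("AAPL", 100, 150.50)
trade.price = 155.00  # plain attribute syntax, but now validated
print(trade.value)

try:
    trade.value = 0.0  # no setter defined, so value is read-only
except AttributeError as err:
    print(f"AttributeError: {err}")
```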
3.2 When OOP is Useful in Research Code
Now that you understand the basics of classes and objects, an important question remains: when should you actually use OOP in your research code?
The truth is, many data analysis tasks in empirical finance don’t require OOP. A straightforward script that loads data, performs some analysis, and generates output can be perfectly fine without defining any classes. In fact, overusing OOP can make simple tasks more complicated than they need to be.
However, there are several scenarios where OOP becomes genuinely useful in research code.
3.2.1 Scenario 1: Managing Complex State
If you find yourself passing around many related variables to multiple functions, a class might be appropriate. Consider a simulation configuration that needs to be loaded, validated, and used across multiple functions:
Without OOP:
import json

def load_simulation_config(filepath):
    with open(filepath) as f:
        data = json.load(f)
    return {
        "name": data["name"],
        "n_simulations": data["n_simulations"],
        "initial_value": data["initial_value"],
        "drift": data["drift"],
        "volatility": data["volatility"],
        "results": None
    }

def validate_config(config):
    if config["n_simulations"] <= 0:
        raise ValueError("n_simulations must be positive")
    if config["initial_value"] <= 0:
        raise ValueError("initial_value must be positive")
    if config["volatility"] < 0:
        raise ValueError("volatility must be non-negative")
    return config

def run_simulation(config):
    # Nothing guarantees validate_config() was called, so we check again
    if config["n_simulations"] <= 0:
        raise ValueError("Invalid config")
    # Simulation logic would go here...
    config["results"] = {"mean": 105.2, "std": 12.3}
    return config

# Usage - must remember the correct sequence of function calls
config = load_simulation_config("sim_config.json")
config = validate_config(config)
config = run_simulation(config)
print(config["results"])
With OOP:
import json
from pathlib import Path
import pprint

class SimulationConfig:
    def __init__(
        self,
        name: str,
        n_simulations: int,
        initial_value: float,
        drift: float,
        volatility: float,
    ):
        self.name = name
        self.n_simulations = n_simulations
        self.initial_value = initial_value
        self.drift = drift
        self.volatility = volatility
        self.results: dict | None = None

        # Validation happens automatically on creation
        self._validate()

    def _validate(self) -> None:
        """Validate the configuration parameters."""
        if self.n_simulations <= 0:
            raise ValueError("n_simulations must be positive")
        if self.initial_value <= 0:
            raise ValueError("initial_value must be positive")
        if self.volatility < 0:
            raise ValueError("volatility must be non-negative")

    @classmethod
    def from_json(cls, filepath: str | Path) -> "SimulationConfig":
        """Create a SimulationConfig from a JSON file."""
        with open(filepath) as f:
            data = json.load(f)
        return cls(
            name=data["name"],
            n_simulations=data["n_simulations"],
            initial_value=data["initial_value"],
            drift=data["drift"],
            volatility=data["volatility"],
        )

    def run(self) -> "SimulationConfig":
        """Run the simulation."""
        # Simulation logic would go here...
        self.results = {"mean": 105.2, "std": 12.3}
        return self

    def __repr__(self) -> str:
        status = "completed" if self.results else "not run"
        return f"SimulationConfig({self.name!r}, {self.n_simulations} sims, {status})"
A few notes on this code:
- The pprint (pretty print) module from Python’s standard library formats complex data structures like dictionaries and lists in a more readable way, with proper indentation and line breaks. This is especially useful when displaying nested structures or long lists.
- The @classmethod decorator creates a class method: a method that receives the class itself (conventionally named cls) as its first argument instead of an instance. Class methods are often used as alternative constructors, like from_json() here. In contrast, a @staticmethod doesn’t receive any implicit first argument and behaves like a regular function that happens to live inside a class.
Now we can use the class:
# Load from file using the class method
config = SimulationConfig.from_json("sim_config.json")

# Or create directly
config = SimulationConfig(
    name="Test Simulation",
    n_simulations=1000,
    initial_value=100.0,
    drift=0.05,
    volatility=0.2,
)

# Run and display results
config.run()
pprint.pprint(config.results)  # Pretty print the results dictionary
{'mean': 105.2, 'std': 12.3}
The OOP version is cleaner because the state is bundled together, and you don’t need to pass around a configuration dictionary. The simulation object maintains its own state, making the code more organized and less error-prone.
Beyond reducing errors, the class also makes your code more clearly defined. With a dictionary, nothing prevents you from accessing a misspelled key like config["n_simulaitons"]—you’ll only discover the typo at runtime. With a class, your editor (like VS Code) can immediately flag config.n_simulaitons as an error because it knows exactly which attributes SimulationConfig has. This kind of immediate feedback makes development faster and catches bugs before you even run the code.
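The difference between @classmethod and @staticmethod is easiest to see in a small, self-contained sketch. SimulationSettings, from_dict(), and annualize_volatility() are hypothetical names. The square-root-of-time scaling with 252 trading days is an illustrative convention, not part of the original example.

```python
class SimulationSettings:  # hypothetical, trimmed-down class
    def __init__(self, n_simulations: int, volatility: float):
        self.n_simulations = n_simulations
        self.volatility = volatility

    @classmethod
    def from_dict(cls, data: dict) -> "SimulationSettings":
        # cls is the class itself, so this acts as an alternative constructor
        return cls(data["n_simulations"], data["volatility"])

    @staticmethod
    def annualize_volatility(daily_vol: float, periods: int = 252) -> float:
        # No self or cls: an ordinary function stored in the class namespace
        return daily_vol * periods ** 0.5

settings = SimulationSettings.from_dict({"n_simulations": 1000, "volatility": 0.2})
print(settings.n_simulations, settings.volatility)
print(SimulationSettings.annualize_volatility(0.01))
```

The class method needs cls to build an instance. The static method needs neither the class nor an instance, so it could just as well be a module-level function. Putting it in the class only groups related code.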
3.2.3 Scenario 3: Encapsulating Complex Data Structures
When working with complex data structures that need validation or computed properties, classes provide a clean way to manage this complexity:
class EventStudyWindow:
    """Represents an event study window with validation."""

    def __init__(self, event_date, estimation_start, estimation_end,
                 event_start, event_end):
        self.event_date = event_date
        self.estimation_start = estimation_start
        self.estimation_end = estimation_end
        self.event_start = event_start
        self.event_end = event_end

        # Validate the window
        self._validate()

    def _validate(self):
        """Validate that the window makes sense."""
        if self.estimation_end >= self.event_date:
            raise ValueError("Estimation window must end before event date")
        if self.event_start > self.event_date:
            raise ValueError("Event window start must be at or before event date")
        if self.event_end < self.event_date:
            raise ValueError("Event window end must be at or after event date")

    @property
    def estimation_length(self):
        """Length of the estimation window in calendar days."""
        return (self.estimation_end - self.estimation_start).days

    @property
    def event_length(self):
        """Length of the event window in calendar days."""
        return (self.event_end - self.event_start).days

    def __repr__(self):
        return (f"EventStudyWindow(event={self.event_date}, "
                f"estimation={self.estimation_length} days, "
                f"event_window={self.event_length} days)")

# Usage
from datetime import date, timedelta

event_date = date(2024, 6, 15)
window = EventStudyWindow(
    event_date=event_date,
    estimation_start=event_date - timedelta(days=260),
    estimation_end=event_date - timedelta(days=10),
    event_start=event_date - timedelta(days=1),
    event_end=event_date + timedelta(days=1)
)
print(window)
print(f"Estimation period: {window.estimation_length} days")
print(f"Event window: {window.event_length} days")
EventStudyWindow(event=2024-06-15, estimation=250 days, event_window=2 days)
Estimation period: 250 days
Event window: 2 days
The class encapsulates both the data and the logic for validation and computation, making it easier to work with event study windows correctly.
3.2.4 Scenario 4: Building Reusable Components
If you’re building functionality that will be reused across multiple projects, classes provide a clean interface:
import statistics
import random

class RollingWindow:
    """Calculate rolling window statistics."""

    def __init__(self, data: list[float], window_size: int):
        self.data = data
        self.window_size = window_size
        if len(self.data) < window_size:
            raise ValueError("Data must be at least as long as the window size")

    def mean(self) -> list[float]:
        """Calculate rolling mean."""
        return [
            statistics.mean(self.data[i : i + self.window_size])
            for i in range(len(self))
        ]

    def std(self) -> list[float]:
        """Calculate rolling standard deviation."""
        return [
            statistics.stdev(self.data[i : i + self.window_size])
            for i in range(len(self))
        ]

    def sharpe(self, risk_free_rate: float = 0) -> list[float]:
        """Calculate rolling (per-period, non-annualized) Sharpe ratio."""
        means = self.mean()
        stds = self.std()
        return [(m - risk_free_rate) / s for m, s in zip(means, stds)]

    def __len__(self) -> int:
        """Number of complete windows."""
        return len(self.data) - self.window_size + 1

    def __repr__(self) -> str:
        return f"RollingWindow(data_length={len(self.data)}, window={self.window_size})"

# Generate some sample returns
random.seed(42)
returns = [random.gauss(0.001, 0.02) for _ in range(100)]

rolling = RollingWindow(returns, window_size=10)
print(f"Rolling windows: {len(rolling)}")
print(f"Mean rolling mean: {statistics.mean(rolling.mean()):.4f}")
print(f"Mean rolling Sharpe: {statistics.mean(rolling.sharpe()):.4f}")
Rolling windows: 91
Mean rolling mean: 0.0019
Mean rolling Sharpe: 0.1290
3.2.5 When to Avoid OOP
Just as important as knowing when to use OOP is knowing when not to use it. Avoid OOP when:
You’re doing one-off analysis: If you’re exploring data or doing a quick calculation, a simple script is fine.
Your code is primarily a sequence of transformations: Data pipelines that transform data step-by-step are often clearer as functions rather than classes.
You’re wrapping a single function: Don’t create a class with only one method. Just use a function.
It makes the code more complex: If OOP is making your code harder to understand, you’re probably not in a situation where it helps.
Remember: the goal is clarity and maintainability, not using OOP for its own sake.
3.3 Data Classes
Python 3.7 introduced data classes, which provide a streamlined way to create classes that are primarily used to store data. They automatically generate common methods like __init__, __repr__, and __eq__, reducing boilerplate code significantly.
3.3.1 Basic Data Classes
Let’s revisit our Trade class, but this time using a data class:
from dataclasses import dataclass

@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str

    @property
    def value(self) -> float:
        """Calculate the total value of the trade."""
        return self.quantity * self.price

# Create trades
trade1 = Trade("AAPL", 100, 150.50, "buy")
trade2 = Trade("AAPL", 100, 150.50, "buy")
trade3 = Trade("MSFT", 50, 280.25, "sell")
print(trade1)
print(f"Value: ${trade1.value:.2f}")
print(f"trade1 == trade2: {trade1 == trade2}")
print(f"trade1 == trade3: {trade1 == trade3}")
Trade(ticker='AAPL', quantity=100, price=150.5, side='buy')
Value: $15050.00
trade1 == trade2: True
trade1 == trade3: False
With just the @dataclass decorator and type annotations, we get:
- An __init__ method that accepts all the attributes
- A __repr__ method that shows a useful string representation
- An __eq__ method that compares instances by their attributes
This is much less code than writing these methods manually, and it’s less error-prone.
3.3.2 Default Values
Data classes make it easy to specify default values:
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str = "buy"  # default value
    commission: float = 0.0
    notes: Optional[str] = None

    def value(self):
        """Calculate the total value of the trade."""
        return self.quantity * self.price

    def net_value(self):
        """Calculate value after commission."""
        return self.value() - self.commission

# Use defaults
trade1 = Trade("AAPL", 100, 150.50)
print(trade1)

# Override defaults
trade2 = Trade("MSFT", 50, 280.25, side="sell", commission=14.00)
print(trade2)
print(f"Net value: ${trade2.net_value():.2f}")
Trade(ticker='AAPL', quantity=100, price=150.5, side='buy', commission=0.0, notes=None)
Trade(ticker='MSFT', quantity=50, price=280.25, side='sell', commission=14.0, notes=None)
Net value: $13998.50
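One caveat about default values: a mutable default such as a list cannot be written as a plain default. Every instance would share the same list, so @dataclass rejects list, dict, and set defaults with a ValueError. The dataclasses.field(default_factory=...) helper builds a fresh object for each instance instead. The sketch below uses a hypothetical data-class version of the earlier Portfolio.

```python
from dataclasses import dataclass, field

@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float

@dataclass
class Portfolio:
    name: str
    # Writing `trades: list[Trade] = []` would raise ValueError here;
    # default_factory calls list() to create a new list per instance
    trades: list[Trade] = field(default_factory=list)

p1 = Portfolio("Research")
p2 = Portfolio("Backtest")
p1.trades.append(Trade("AAPL", 100, 150.50))

print(len(p1.trades), len(p2.trades))  # p2 is unaffected by p1's append
```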
3.3.3 Immutable Data Classes
You can make a data class immutable by setting frozen=True. This means that once created, the attributes cannot be changed:
from dataclasses import dataclass

@dataclass(frozen=True)
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str

    def value(self):
        return self.quantity * self.price

trade = Trade("AAPL", 100, 150.50, "buy")
print(trade)

# This would raise an error:
# trade.price = 160.00  # FrozenInstanceError
Trade(ticker='AAPL', quantity=100, price=150.5, side='buy')
Immutable data classes are useful when you want to ensure that data doesn’t change unexpectedly, or when you need to use instances as dictionary keys or in sets.
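Here is a short sketch of both uses. With frozen=True and the default eq=True, @dataclass also generates a __hash__ method from the fields. Equal trades therefore collapse to one entry in a set, and one can be used to look up a dictionary entry stored under the other. Assigning to a field raises dataclasses.FrozenInstanceError.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str

t1 = Trade("AAPL", 100, 150.50, "buy")
t2 = Trade("AAPL", 100, 150.50, "buy")  # same content as t1
t3 = Trade("MSFT", 50, 280.25, "sell")

# Equal, hashable trades collapse to a single set element
unique_trades = {t1, t2, t3}
print(len(unique_trades))

# Frozen instances can serve as dictionary keys
status = {t1: "filled", t3: "pending"}
print(status[t2])  # t2 == t1, so it finds t1's entry

try:
    t1.price = 160.00
except FrozenInstanceError as err:
    print(f"FrozenInstanceError: {err}")
```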
3.3.4 Data Classes vs. Regular Classes
When should you use a data class instead of a regular class?
Use data classes when:
- Your class is primarily for storing data
- You want automatic generation of common methods
- You want type hints for all attributes
- You need value-based equality (comparing by content, not identity)
Use regular classes when:
- You need more control over initialization
- The class has complex behavior with little data
- You need to inherit from a regular (non-dataclass) parent whose __init__ must run, since the generated __init__ does not call super().__init__()
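Value-based equality is easiest to understand side by side. In this sketch, PlainTrade is a hypothetical regular class without an __eq__ method, so == falls back to identity. The data class compares field values instead.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str

class PlainTrade:
    def __init__(self, ticker: str, quantity: int, price: float, side: str):
        self.ticker = ticker
        self.quantity = quantity
        self.price = price
        self.side = side

a = Trade("AAPL", 100, 150.50, "buy")
b = Trade("AAPL", 100, 150.50, "buy")
print(a == b)  # the generated __eq__ compares field values
print(a is b)  # but they are still two distinct objects

p = PlainTrade("AAPL", 100, 150.50, "buy")
q = PlainTrade("AAPL", 100, 150.50, "buy")
print(p == q)  # no __eq__, so == compares identity
```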
3.3.5 Data Classes and Type Checking
Data classes work particularly well with static type checkers. The type annotations are not just documentation—they can be validated by tools like ty (a fast type checker from Astral, the creators of uv and ruff) or directly in VS Code.
from dataclasses import dataclass

@dataclass
class Trade:
    ticker: str
    quantity: int
    price: float
    side: str

# These work fine
trade1 = Trade("AAPL", 100, 150.50, "buy")
trade2 = Trade(ticker="MSFT", quantity=50, price=280.25, side="sell")

# A type checker flags both lines below before you run anything. At runtime,
# the wrong type is silently accepted, and the missing argument only raises
# a TypeError once the line actually executes:
# trade3 = Trade(ticker="AAPL", quantity="100", price=150.50, side="buy")  # wrong type
# trade4 = Trade("AAPL", 100, 150.50)  # missing argument
The real advantage is that VS Code (with the Python or Pylance extension, depending on its type checking mode setting) can highlight these errors as you type, before you even save the file. This immediate feedback helps catch bugs early and makes development faster.
If you need runtime data validation (not just static type checking), consider Pydantic. It’s a third-party library that offers functionality similar to dataclasses but validates data types at runtime, converts values to the correct types when possible, and provides detailed error messages when validation fails. Pydantic is particularly useful when working with external data sources like JSON files or API responses.