Working with Data
This part covers the essential tools and techniques for working with data in Python. We introduce three libraries (pandas, Polars, and DuckDB) and demonstrate how to use them for common data manipulation tasks in empirical finance.
The chapters in this part progressively build your data manipulation skills:
- Introduction to DataFrames: A high-level overview of pandas, Polars, and DuckDB
- Data Input and File Formats: Loading data from CSV, Parquet, and Excel files
- Data Cleaning: Handling duplicates, missing data, and validation
- Data Structuring and Aggregation: Grouping, aggregation, and working with keys
- Reshaping Data: Converting between long and wide formats
- Joins and Merges: Combining datasets correctly
Most concepts are demonstrated with examples in pandas, Polars, and DuckDB, allowing you to choose the right tool for your needs and understand how to translate between them.