Empirical Finance

Modern Research with Python

Author

Vincent Grégoire

Affiliation

HEC Montréal

Published

January 2026

This book is aimed at students in the MSc in Finance program at HEC Montréal taking the course MATH 60230 Empirical Finance. I make it publicly available for anyone interested in learning Python for finance, but some examples and explanations are tailored to the HEC Montréal context.

About This Book

The main objective of this book is to offer a thorough and accessible practical guide to empirical finance research using Python. To achieve this goal, I cover statistical and econometric techniques used in empirical finance, with a focus on practical applications using Python. I believe that learning by doing is the most effective way to master new skills. Thus, I present real-world scenarios and datasets, enabling you to see the power and efficacy of these techniques in action.

No prior programming experience is required. If you are already familiar with Python, I still recommend that you at least skim Part I, because it introduces not only Python but also a modern workflow for data analysis with Python.

My goal is that by the end of this book, you will have an advanced understanding of the main econometric techniques used in empirical finance and a solid grounding in modern Python programming for data analysis. I look forward to guiding you through this exciting journey into the world of empirical finance, statistics, econometrics, and Python programming.

Tip

This book is also available as a PDF file.

Note: Work in Progress

This book is a work in progress. I am constantly adding new content and refining existing content. If you have any suggestions or feedback, please reach out at vincent.gregoire@hec.ca.

Structure

This book is organized into seven parts plus appendices. Each part builds on the previous one, guiding you toward a robust understanding of empirical finance using Python.

Part I: Python Fundamentals, Environment, and Best Practices

The first part of the book is devoted to familiarizing you with Python and setting up the necessary coding environment. We begin with instructions on installing Python and understanding its basic syntax. We then introduce various tools that are part of the programmer’s toolbox, including the terminal, VS Code, Git, and GitHub. You will learn about managing Python environments with uv, writing clean and well-documented code, testing your code, and object-oriented programming concepts.

Part II: Working with Data

In the second part, we focus on data manipulation and analysis. You will learn how to load data from various sources and formats, work with DataFrames using pandas and Polars, clean and transform data, structure datasets for analysis, merge multiple data sources, and reshape data between wide and long formats.
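As a small taste of the reshaping covered in this part, here is a minimal sketch that moves a table between wide and long formats with pandas. It assumes pandas is installed; the tickers `AAA` and `BBB` and their prices are made up purely for illustration.

```python
import pandas as pd

# Hypothetical wide-format prices: one row per date, one column per ticker.
wide = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-02", "2025-01-03"]),
        "AAA": [100.0, 101.0],
        "BBB": [50.0, 49.5],
    }
)

# Wide -> long: one row per (date, ticker) pair.
long = wide.melt(id_vars="date", var_name="ticker", value_name="price")

# Long -> wide: dates become the index again, tickers become columns.
wide_again = long.pivot(index="date", columns="ticker", values="price")

print(long)
print(wide_again)
```

The long format is usually the more convenient one for merging and grouping, while the wide format is often easier to read; the chapters in this part discuss when to use each.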

Part III: Visualization and Research Output

The third part covers how to communicate your findings effectively. We explore data visualization with matplotlib and seaborn, creating publication-quality tables for regression results and summary statistics, and using Quarto to create reproducible research documents that combine code, text, and results.

Part IV: Statistical Foundations

In the fourth part, we dive into the statistical foundations needed for empirical finance. We cover numerical computing with NumPy, probability distributions and random number generation, and descriptive statistics for financial data.
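To give a flavor of these topics, the sketch below simulates one year of daily returns with NumPy and computes a few descriptive statistics. It assumes NumPy is installed; the normal distribution, its parameters, the seed, and the 252-trading-day convention are illustrative assumptions, not estimates from real data.

```python
import numpy as np

# Seeded generator so the simulation is reproducible.
rng = np.random.default_rng(seed=42)

# Simulated daily returns (made-up mean and volatility, assumed normal).
trading_days = 252
returns = rng.normal(loc=0.0005, scale=0.01, size=trading_days)

# Basic descriptive statistics.
mean_daily = returns.mean()
vol_daily = returns.std(ddof=1)  # sample standard deviation
vol_annual = vol_daily * np.sqrt(trading_days)  # square-root-of-time scaling

print(f"Mean daily return:     {mean_daily:.5f}")
print(f"Daily volatility:      {vol_daily:.5f}")
print(f"Annualized volatility: {vol_annual:.5f}")
```

Fixing the seed makes the random draws repeatable from run to run, which matters for the reproducibility practices discussed later in the book.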

Part V: Regression Methods

The fifth part focuses on econometric techniques central to empirical finance research. We cover linear regression with statsmodels, panel data methods for working with firm-time observations, and instrumental variables estimation for addressing endogeneity.

Part VI: Machine Learning and Artificial Intelligence in Research

The sixth part introduces modern AI and machine learning tools relevant to finance research. We discuss how to use AI assistants effectively for coding and research, machine learning techniques for prediction and classification, and natural language processing methods for analyzing textual data.

Part VII: Reproducibility and Replication

The final part addresses the critical importance of reproducibility in research. We cover best practices for organizing research projects, managing dependencies, and ensuring that your results can be replicated by others.

Learning Approach

The learning approach adopted in this book is designed to be practical and closely linked with the real-world challenges encountered in empirical finance. My philosophy is grounded in the belief that the best way to learn is by doing, especially when it comes to mastering complex concepts like econometrics and programming.

YouTube Video Tutorials

Throughout the book, I provide links to YouTube videos that offer alternative explanations of the concepts covered in the chapters. These videos are not meant to replace the book, but rather to provide additional perspectives and clarifications. Some of these videos are created by me and are available on my YouTube channel, Vincent Codes Finance.

Tech Stack

This book is built around a specific tech stack, chosen for the wide adoption, robustness, versatility, and mutual compatibility of its tools. Alternative tools exist and may be equally capable, but the book takes an opinionated approach and sticks to this stack for clarity and consistency. The concepts and techniques carry over to other tools; the examples and code, however, use the following:

  • Python 3.14 (released in October 2025)
  • uv for managing Python versions and environments
  • Visual Studio Code for writing code
  • Git and GitHub for version control and collaboration
  • Claude, ChatGPT, and Microsoft Copilot as general-purpose AI assistants
  • Claude Code, OpenAI Codex, and GitHub Copilot for coding assistance
  • Quarto for writing technical content

Use of AI

This book was written with substantial assistance from AI tools, primarily ChatGPT and Claude Code. AI was used for all aspects of the book’s creation, including idea generation, creating outlines, drafting content, proofreading, and generating code examples. This reflects the modern reality of software development and technical writing, where AI assistants have become valuable collaborators.

However, all content has been reviewed and edited by a human, and I take full responsibility for the accuracy and quality of the material presented, including any errors or omissions.

About the Author

I’m Vincent Grégoire, CFA, a Professor of Finance at HEC Montréal and the Canada Research Chair in Finance and Technology. I teach empirical finance with a strong emphasis on Python-based data analysis. I earned a Ph.D. in Finance from the University of British Columbia, along with degrees in Computer Engineering and Financial Engineering from Université Laval, and previously served as Chief Data Scientist at Berkindale Analytics, a fintech startup.

My work focuses on how information is produced, processed, and priced in financial markets. I study market structure through the lens of big data, machine learning, algorithmic trading, and cybersecurity, with an emphasis on methods that actually scale outside toy examples.

Acknowledgments

I am grateful to Charles Martineau and Saad Ali Khan for their feedback and suggestions on the book.