
Peter Pfaffelhuber

My group works on applications of probability theory in the life sciences. Recently, we have started projects on the formalization of probability theory using the Lean interactive theorem prover.

I am one of three professors in the probability group at the Mathematical Institute of the University of Freiburg.

Currently, I am a member of the directorate of the Freiburg Center for Data Analysis, Modeling and AI (FDMAI).

Office hours:

Please write an email if you want to contact me.

Address:

Abteilung für Mathematische Stochastik
Albert-Ludwigs University of Freiburg
Ernst-Zermelo-Straße 1

Room 233
D - 79104 Freiburg

Tel: +49-761-203-5667
E-Mail: p.p@stochastik.uni-freiburg.de


Markov Processes in the Life Sciences

Schematic of transposable element insertions in a genome
Transposable elements

Transposable elements (TEs) are mobile genetic elements that can move within a genome, leaving copies of themselves at new loci. In humans they account for roughly 54% of the genome. We model the abundance of TEs in population samples (currently in termite species) using multi-scale Markov processes that couple replication, excision and drift.

with Samuel Adeosun

Cartoon of Muller's ratchet
Muller's ratchet

Muller's ratchet describes the irreversible accumulation of slightly deleterious mutations in asexual populations: every time the fittest class is lost by genetic drift the ratchet clicks one notch further. We study the speed of the ratchet via martingale problems for the underlying measure-valued process and quantify how it depends on the underlying mutation model.

with Carola Heinzel

State graph of a chemical reaction network with rate constants
Chemical reaction networks

Chemical reaction networks are modelled as Markov jump processes for systems in which some species are abundant and others are rare. Introducing a scaling parameter \(N\), we study the dynamics as \(N \to \infty\) and automate the derivation of laws of large numbers and central limit theorems on different time scales.

with Sebastian Stroppel


Probability Theory for Machine Learning

Prior-Data Fitted Network: context dataset and query feed into a transformer that outputs a predictive distribution

Prior-Data Fitted Networks (PFNs) are transformer-based foundation models that are trained on synthetic datasets sampled from a Bayesian prior. Given a context dataset \(\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n\) and a query point \(x^\star\), a trained PFN approximates the posterior predictive distribution \(p(y \mid x^\star, \mathcal{D})\) in a single forward pass — no per-task gradient descent required at inference time. We investigate the statistical properties of these implicit Bayesian posteriors and the conditions under which they are well calibrated.

Project C01 within CRC 1597 SmallData, together with Jana Naue


Formalizing Probability

Lean Prover logo

Lean is both a functional programming language and an interactive theorem prover. Its mathematical library, mathlib, builds machine-checked mathematics from the ground up. We contribute to extending the library with results from probability theory, currently focusing on stochastic processes, Brownian motion, and measure-theoretic foundations.

My research is published in both mathematical and biological journals. Only peer-reviewed papers are listed.

Here is the list from MathSciNet
Here is the list from Google Scholar
Here is my ORCID profile.

Note: If you are looking for a thesis within the Master of Education programme, please look at the Bachelor's theses.

Bachelor thesis: The variance of the playing time in Gambler's Ruin (January 2026)

Two players play against each other, each staking one euro per round; player \(A\) wins a round with probability \(p\) and player \(B\) with probability \(1-p\). Starting with amounts \(x\) and \(N-x\), the game runs for a random time \(T\) until one player is ruined. For the fair game, \(p = 1/2\), the classical result is \(E[T] = x(N-x)\); the aim of the thesis is to compute the variance \(V[T]\). This can be done via generating functions, or somewhat more directly.

  • Jiří Anděl and Šárka Hudecová. Variance of the game duration in the gambler's ruin problem. Statistics and Probability Letters 82(9), 1750–1754, 2012.
  • E. Bach. Moments in the duration of play. Statistics and Probability Letters 36, 1–7, 1997.
  • William Feller. An Introduction to Probability Theory and its Applications (3rd edition). New York: Wiley, 1968.
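
For orientation, the first two moments of \(T\) can be computed numerically by first-step analysis. The following is a minimal sketch, not the method of the references above: it assumes \(0 < p < 1\) and \(N \ge 2\), uses exact rational arithmetic, and the function names `solve_interior` and `duration_moments` are hypothetical. With \(q = 1-p\), \(m_1(x) = E_x[T]\) and \(m_2(x) = E_x[T^2]\), conditioning on the first round gives \(m_1(x) - p\,m_1(x+1) - q\,m_1(x-1) = 1\) and \(m_2(x) - p\,m_2(x+1) - q\,m_2(x-1) = 2 m_1(x) - 1\), with zero boundary values at \(0\) and \(N\).

```python
from fractions import Fraction


def solve_interior(p, rhs):
    """Solve u(x) - p*u(x+1) - q*u(x-1) = rhs[x-1] for x = 1..N-1,
    with u(0) = u(N) = 0, by tridiagonal (Thomas) elimination."""
    q = 1 - p
    n = len(rhs)
    c = [Fraction(0)] * n  # modified super-diagonal
    d = [Fraction(0)] * n  # modified right-hand side
    c[0], d[0] = -p, rhs[0]
    for i in range(1, n):
        denom = 1 + q * c[i - 1]
        c[i] = -p / denom
        d[i] = (rhs[i] + q * d[i - 1]) / denom
    u = [Fraction(0)] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def duration_moments(N, p):
    """Return (mean, var): lists indexed by the starting capital x = 0..N."""
    p = Fraction(p)
    m1 = solve_interior(p, [Fraction(1)] * (N - 1))
    # Right-hand side of the second-moment equation: 2*m1(x) - 1.
    m2 = solve_interior(p, [2 * m - 1 for m in m1])
    zero = [Fraction(0)]
    mean = zero + m1 + zero
    var = zero + [s - m * m for s, m in zip(m2, m1)] + zero
    return mean, var


mean, var = duration_moments(10, Fraction(1, 2))
print([int(m) for m in mean])  # compare with x * (N - x) in the fair game
print(var[1:-1])
```

Such a table of exact values for small \(N\) is useful as a check for any closed formula derived in the thesis.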

Download as PDF


Master thesis: The greedy strategy in the all-heads coin game (May 2026)

In the all-heads coin game a player repeatedly tosses a pool of coins, each landing heads with probability \(p\). After every round, all coins showing heads are set aside and the remaining coins (tails) are re-tossed. A round in which no coin shows heads is lost; the player wins as soon as the pool is empty. The greedy strategy — set aside every head — yields the success probability \(b_{n,p}\) when starting with \(n\) coins.

Recently, I assisted an AI in finding a formula for \(b_{n,p}\). The goal of the thesis is to check all arguments and to produce a proper write-up. See paper2.pdf.
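
The rules above translate into a simple recursion that can serve as a numerical check for any candidate formula. The sketch below is an assumption-laden illustration and is not taken from paper2.pdf: it reads "a round in which no coin shows heads is lost" as ending the game in a loss, so that with \(q = 1-p\) one gets \(b_{0,p} = 1\) and \(b_{n,p} = \sum_{k=1}^{n} \binom{n}{k} p^k q^{n-k}\, b_{n-k,p}\). The function name `greedy_success_probabilities` is hypothetical.

```python
from fractions import Fraction
from math import comb


def greedy_success_probabilities(n_max, p):
    """Return [b_0, ..., b_{n_max}] for the greedy strategy.

    Assumed rule: a round with zero heads loses the game; otherwise the
    k >= 1 heads are set aside and the game continues with n - k coins.
    """
    p = Fraction(p)
    q = 1 - p
    b = [Fraction(1)]  # empty pool: the player has already won
    for n in range(1, n_max + 1):
        b.append(
            sum(comb(n, k) * p**k * q ** (n - k) * b[n - k] for k in range(1, n + 1))
        )
    return b


for n, value in enumerate(greedy_success_probabilities(6, Fraction(1, 3))):
    print(n, value, float(value))
```

Because the arithmetic is exact, the output can be compared term by term with a closed-form expression for small \(n\).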

Download as PDF


Bachelor and Master thesis: Lean formalization (ongoing)

We are running several formalization projects for the Lean Interactive Theorem Prover. Here is a list of topics you can contribute to:

  • Poisson convergence
  • Large deviations
  • Specific probability distributions
  • Markov chains
  • Random graphs
  • Generating random numbers
  • Statistics

Download as PDF


Master thesis: Prior-Data Fitted Networks and Martingale Posteriors (June 2026)

Summary of arXiv:2505.11325: Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular datasets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. The paper proposes a principled, efficient, and tuning-free sampling procedure to construct Bayesian posteriors for such estimates based on martingale posteriors, and proves its convergence. Several simulated and real-world data examples showcase the efficiency and calibration of the method in inference applications.

The thesis consists of placing martingale posterior distributions in the framework of theoretical statistics and implementing them.
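
To illustrate the idea of a martingale posterior, here is a minimal sketch under strong simplifying assumptions. It does not use a PFN and is not the procedure of arXiv:2505.11325: the one-step-ahead predictive is simply the empirical distribution of all values seen so far (a Pólya urn), a choice that is known to recover the Bayesian bootstrap in the limit. Future observations are imputed one at a time from the current predictive, and the statistic of interest, here the mean, is evaluated on the completed sample. The function name and its parameters are hypothetical.

```python
import random


def martingale_posterior_means(data, n_future=500, n_draws=200, seed=0):
    """Draw approximate posterior samples of the mean by forward sampling.

    Assumption: the predictive for the next observation is the empirical
    distribution of the observed and already imputed values (Polya urn).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        sample = list(data)
        for _ in range(n_future):
            # Impute the next observation from the current predictive.
            sample.append(rng.choice(sample))
        draws.append(sum(sample) / len(sample))
    return draws


observed = [1.0, 2.0, 4.0, 7.0]
draws = sorted(martingale_posterior_means(observed))
print("approximate central 90% interval for the mean:",
      draws[len(draws) // 20], draws[-len(draws) // 20])
```

Under this predictive the running sample mean is a martingale along the imputed sequence, which is the property that gives the method its name; in the thesis setting the urn step would be replaced by a draw from the PFN's predictive distribution.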

Download as PDF