Peter Pfaffelhuber
My group works on applications of probability theory in the life sciences. Recently, we started projects on the formalization of probability theory using the Lean interactive theorem prover.
I am one of three professors at the probability group within the mathematical institute of the University of Freiburg.
Currently, I am a member of the directorate of the Freiburg Center for Data Analysis, Modeling and AI (FDMAI).
Office hours:
Please write an email if you want to contact me.
Address:
Abteilung für Mathematische Stochastik
Albert-Ludwigs University of Freiburg
Ernst-Zermelo-Straße 1
Room 233
D - 79104 Freiburg
Tel: +49-761-203-5667
E-Mail: p.p@stochastik.uni-freiburg.de
Markov Processes in the Life Sciences
Transposable elements
Transposable elements (TEs) are mobile genetic elements that can relocate within a genome, leaving copies of themselves at new loci. In humans they account for roughly 54% of the genome. We model the abundance of TEs in population samples (currently in termite species) using multi-scale Markov processes that couple replication, excision and drift.
with Samuel Adeosun
Muller's ratchet
Muller's ratchet describes the irreversible accumulation of slightly deleterious mutations in asexual populations: every time the fittest class is lost by genetic drift the ratchet clicks one notch further. We study the speed of the ratchet via martingale problems for the underlying measure-valued process and quantify how it depends on the underlying mutation model.
with Carola Heinzel
Chemical reaction networks
Chemical reaction networks are Markov jump processes on systems where some species are abundant and some are rare. Introducing a scaling parameter \(N\), we study the dynamics as \(N \to \infty\) and automate the derivation of laws of large numbers and central limit theorems on different time scales.
with Sebastian Stroppel
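To illustrate the kind of law of large numbers meant here, the following is a minimal sketch for the simplest single-species case; it is not the multi-scale framework of the project. It assumes a hypothetical birth-death network in which molecules are produced at rate \(N\lambda\) and each molecule degrades at rate \(\mu\). The function names and parameter values are chosen for illustration only. For this network the scaled process \(X^N(t)/N\) started at 0 converges, as \(N \to \infty\), to the solution \(x(t) = (\lambda/\mu)(1 - e^{-\mu t})\) of \(x' = \lambda - \mu x\).

```python
import math
import random


def simulate_birth_death(N, lam, mu, t_end, seed=0):
    """Gillespie simulation of the (assumed) network: production at rate N*lam,
    degradation at rate mu per molecule. Returns X(t_end) / N, starting from X(0) = 0."""
    rng = random.Random(seed)
    t, x = 0.0, 0
    while True:
        birth = N * lam          # total production rate
        death = mu * x           # total degradation rate
        total = birth + death
        t += rng.expovariate(total)   # exponential waiting time to the next jump
        if t > t_end:
            return x / N
        # choose the reaction with probability proportional to its rate
        if rng.random() * total < birth:
            x += 1
        else:
            x -= 1


def ode_limit(lam, mu, t):
    """Solution of x' = lam - mu * x with x(0) = 0."""
    return lam / mu * (1.0 - math.exp(-mu * t))


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, simulate_birth_death(N, 1.0, 1.0, 5.0, seed=1), ode_limit(1.0, 1.0, 5.0))
```

Running the script for increasing \(N\) shows the simulated concentration settling near the deterministic limit, with fluctuations of order \(N^{-1/2}\).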
Probability Theory for Machine Learning
Prior-Data Fitted Networks (PFNs) are transformer-based foundation models that are trained on synthetic datasets sampled from a Bayesian prior. Given a context dataset \(\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n\) and a query point \(x^\star\), a trained PFN approximates the posterior predictive distribution \(p(y \mid x^\star, \mathcal{D})\) in a single forward pass — no per-task gradient descent required at inference time. We investigate the statistical properties of these implicit Bayesian posteriors and the conditions under which they are well calibrated.
Project C01 within CRC 1597 SmallData, together with Jana Naue
Formalizing Probability
Lean is both a functional programming language and an interactive theorem prover. Its mathematical library Mathlib builds machine-checked mathematics from the ground up. We contribute results from probability theory to the library, currently focusing on stochastic processes, Brownian motion, and measure-theoretic foundations.
My research is published in both mathematical and biological journals. Only peer-reviewed papers are listed.
- Markov processes forced on a subspace by a large drift, with applications to population genetics. submitted. 2026.
- Optimal strategies in the all-heads coin game. submitted. 2026.
- Formalization of Brownian motion in Lean. submitted. 2025.
- Enhancing Intra-Continental Biogeographical Ancestry Prediction Through a Machine Learning Marker Selection Method. submitted. 2025.
- Revealing the range of equally likely estimates in the admixture model. 15(9), G3. 2025.
- Advancing Biogeographical Ancestry Predictions Through Machine Learning. 79(103290), Forensic Science International: Genetics. 2025.
- Duality and the well-posedness of a martingale problem. 159, 59--73, Theoretical Population Biology. 2024.
- Probabilistic genetic identification of wild boar hybridization to support control of invasive wild pigs (Sus scrofa). 15(2), e4774, Ecosphere. 2024.
- The martingale problem method revisited. 28, 1--46, Electronic Journal of Probability. 2023.
- A unified framework for limit results in chemical reaction networks on multiple time-scales. 28, 1--33, Electronic Journal of Probability. 2023.
- A diploid population model for copy number variation of genetic elements. 28, 1--15, Electronic Journal of Probability. 2023.
- Neural networks for self-adjusting mutation rate estimation when the recombination rate is unknown. 18(8), e1010407, PLOS Computational Biology. 2022.
- A central limit theorem concerning uncertainty in estimates of individual admixture. 148, 28--39, Theoretical Population Biology. 2022.
- Mean-field limits for non-linear Hawkes processes with excitation and inhibition. 153, 57--78, Stochastic Processes and their Applications. 2022.
- Inference of recent admixture using genotype data. 56, 102593, Forensic Science International: Genetics. 2022.
- The partial duplication random graph with edge deletion. 18, 325--347, ALEA. 2021.
- Europe’s Roma people are vulnerable to poor practice in genetics. 599(7885), 368--371, Nature. 2021.
- The range of once-reinforced random walk in one dimension. 58(1), 164--175, Random Structures & Algorithms. 2021.
- Pgainsim: A Method to Assess the Mode of Inheritance for Quantitative Trait Loci in Genome-Wide Association Studies. 85(2), 91--92, . 2021.
- Modifiers of mutation rate in selectively fluctuating environments. 130(11), 6843--6862, Stochastic Processes and their Applications. 2020.
- Markov branching processes with disasters: extinction, survival and duality to p-jump processes. 130(4), 2488--2518, Stochastic Processes and their Applications. 2020.
- Genealogical distances under low levels of selection. 131, 2--11, Theoretical Population Biology. 2020.
- Inference of historical population-size changes with allele-frequency data. 10(1), 211--223, G3: Genes, Genomes, Genetics. 2020.
- How to choose sets of ancestry informative markers: A supervised feature selection approach. 46, 102259, Forensic Science International: Genetics. 2020.
- Interdisziplinäre Überlegungen zu Erweiterten DNA-Analysen. 24(1), 119--154, Jahrbuch für Wissenschaft und Ethik. 2019.
- High-complexity regions in mammalian genomes are enriched for developmental genes. 35(11), 1813--1819, Bioinformatics. 2019.
- The independent loss model with ordered insertions for the evolution of CRISPR spacers. 119, 72--82, Theoretical Population Biology. 2018.
- The fixation probability and time for a doubly beneficial mutant. 128(12), 4018--4050, Stochastic Processes and their Applications. 2018.
- Forensic DNA phenotyping legislation cannot be based on “Ideal FDP”—A response to Caliebe, Krawczak and Kayser (2017). 34, e13--e14, Forensic Science International: Genetics. 2018.
- Limits of noise for autoregulated gene expression. 77(4), 1153--1191, Journal of mathematical biology. 2018.
- Fixation probabilities and hitting times for low levels of frequency-dependent selection. 124, 61--69, Theoretical Population Biology. 2018.
- A spatial model for selection and cooperation. 54(2), 522--539, Journal of Applied Probability. 2017.
- The fixation time of a strongly beneficial allele in a structured population. 21(61), 1--42, Electron. J. Probab. 2016.
- Large-scale behavior of the partial duplication random graph. 13, 687--710, ALEA. Latin American Journal of Probability & Mathematical Statistics. 2016.
- In silico modeling of the dynamics of low density lipoprotein composition via a single plasma sample [S]. 57(5), 882--893, Journal of lipid research. 2016.
- A mixing tree-valued process arising under neutral evolution with recombination. 20, 1--22, Electronic Journal of Probability. 2015.
- andi: Fast and accurate estimation of evolutionary distances between closely related genomes. 31(8), 1169--1175, Bioinformatics. 2015.
- Stochastic gene expression with delay. 364, 355--363, Journal of theoretical biology. 2015.
- The stationary distribution of a Markov jump process glued together from two state spaces at two vertices. 31(4), 525--553, Stochastic Models. 2015.
- Scaling limits of spatial compartment models for chemical reaction networks. 25(6), 3162--3208, The Annals of Applied Probability. 2015.
- How spatial heterogeneity shapes multiscale biochemical reaction network dynamics. 12(104), 20141106, Journal of the Royal Society Interface. 2015.
- Classification of phenotypic subpopulations in isogenic bacterial cultures by triple promoter probing at single cell level. 198, 3--14, Journal of Biotechnology. 2015.
- The infinitely many genes model with horizontal gene transfer. 19(115), 1--27, Electron. J. Probab. 2014.
- Correction: The Yule Approximation for the Site Frequency Spectrum after a Selective Sweep. 9(1), PLoS One. 2014.
- Some limit results for Markov chains indexed by trees. 19(77), 1--11, Elect. Comm. in Probab. 2014.
- Some large deviations in Kingman's coalescent. 20(7), 1--14, Elec. Comm. Probab. 2014.
- Genome-wide linkage-disequilibrium profiles from single individuals. 198(1), 269--281, Genetics. 2014.
- The Yule approximation for the site frequency spectrum after a selective sweep. 8(12), e81738, PLoS One. 2013.
- Path-properties of the tree-valued Fleming–Viot process. 18(84), 1--47, Electron. J. Probab. 2013.
- A Brownian ratchet for protein translocation including dissociation of ratcheting sites. 66(3), 505--534, Journal of mathematical biology. 2013.
- Tree-valued resampling dynamics martingale problems and applications. 155(3-4), 789--838, Probability Theory and Related Fields. 2013.
- An alignment-free test for recombination. 29(24), 3121--3127, Bioinformatics. 2013.
- Modeling quorum sensing in Sinorhizobium meliloti. 2, 59--74, International Journal of Biomathematics and Biostatistics. 2013.
- Competing islands limit the rate of adaptation in structured populations. 90, 1--11, Theoretical population biology. 2013.
- The ancestral selection graph under strong directional selection. 87, 25--33, Theoretical population biology. 2013.
- The infinitely many genes model for the distributed genome of bacteria. 4(4), 443--456, Genome biology and evolution. 2012.
- Compact metric measure spaces and Λ-coalescents coming down from infinity. 9, 269--278, ALEA. 2012.
- Tree-valued Fleming-Viot dynamics with mutation and selection. 22(6), 2560--2615, Annals of applied probability. 2012.
- Alignment-free population genomics: an efficient estimator of sequence diversity. 2(8), 883--889, G3: Genes, Genomes, Genetics. 2012.
- Muller's ratchet with compensatory mutations. 22(5), 2108--2132, Annals of applied probability. 2012.
- Marked metric measure spaces. 16, 174--188, Elect. Comm. in Probab. 2011.
- Selective sweeps for recessive alleles and for other modes of dominance. 63(3), 399--431, Journal of mathematical biology. 2011.
- Alignment-free estimation of nucleotide diversity. 27(4), 449--455, Bioinformatics. 2011.
- In silico modelling of human lipoprotein metabolism. 12(1), 22--23, . 2011.
- The tree length of an evolving coalescent. 151, 529--557, Probability theory and related fields. 2011.
- Sensitivity analysis of one parameter semigroups exemplified by the Wright–Fisher diffusion. 3(2), 109--128, Intern. J. Funct. Anal., Oper. Th. Appl. 2011.
- Estimating parameters of speciation models based on refined summaries of the joint site-frequency spectrum. 6(5), e18155, PLoS One. 2011.
- The diversity of a distributed genome in bacterial populations. 1567--1606, The Annals of Applied Probability. 2010.
- The Aldous–Shields model revisited (with application to cellular ageing). 15, 475--488, Elec. Comm. Probab. 2010.
- Asymptotics of a Brownian ratchet for protein translocation. 120(6), 901--925, Stochastic processes and their applications. 2010.
- Experimentelle und theoretische Biologie: getrennte Welten? 40(1), 12--12, Biologie in unserer Zeit. 2010.
- mlRho–a program for estimating the population mutation and recombination rates from shotgun-sequenced diploid genomes. 19, 277--284, Molecular ecology. 2010.
- How often does the ratchet click? Facts, heuristics, asymptotics. 353, 365--390, Trends in stochastic analysis. 2009.
- Convergence in distribution of random metric measure spaces (Λ-coalescent measure trees). 145(1-2), 285--322, Probability Theory and Related Fields. 2009.
- Estimating mutation distances from unaligned genomes. 16(10), 1487--1500, Journal of Computational Biology. 2009.
- The impact of sampling schemes on the site frequency spectrum in nonequilibrium subdivided populations. 182(1), 205--216, Genetics. 2009.
- The pattern of genetic hitchhiking under recurrent mutation. 13(68), 2069--2106, Elec. J. Probab. 2008.
- Linkage disequilibrium under genetic hitchhiking in finite populations. 179(1), 527--537, Genetics. 2008.
- Approximating genealogies for partially linked neutral loci under a selective sweep. 55(3), 299--330, Journal of mathematical biology. 2007.
- An approximate sampling formula under genetic hitchhiking. 16(2), Annals of Applied Probability. 2006.
- The Finite System Scheme for State-dependent interacting multitype Branching Systems. 2, 1--66, ALEA. 2006.
- Approximate genealogies under genetic hitchhiking. 174(4), 1995--2008, Genetics. 2006.
- The process of most recent common ancestors in an evolving coalescent. 116(12), 1836--1859, Stochastic Processes and their Applications. 2006.
Here is the list from MathSciNet.
Here is the list from Google Scholar.
Here is my ORCID profile.
Bachelor thesis: The variance of the playing time in Gambler's Ruin (January 2026)
Two players play against each other. Each player stakes one euro per round; player \(A\) wins a round with probability \(p\), player \(B\) with probability \(1-p\). Starting with amounts \(x\) and \(N-x\), the game runs for a random time \(T\) until one player is ruined. For the fair game, \(p = 1/2\), the classical result is \(E[T] = x(N-x)\); the aim of the thesis is to compute \(V[T]\). This can be done via generating functions, or somewhat more directly.
- Jiri Andel and Sarka Sudecova. Variance of the game duration in the gambler's ruin problem. Statistics and Probability Letters 82(9), 1750–1754, 2012.
- E. Bach. Moments in the duration of play. Statistics and Probability Letters 36, 1–7, 1997.
- William Feller. An Introduction to Probability Theory and its Applications (3rd edition). New York: Wiley, 1968.
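As a numerical cross-check for any closed formula, the first two moments of \(T\) can be obtained from first-step analysis. The sketch below is one possible approach and not the method of the references above. Writing \(m_1(x) = E_x[T]\) and \(m_2(x) = E_x[T^2]\), conditioning on the first round gives \(m_1(x) = 1 + p\,m_1(x+1) + (1-p)\,m_1(x-1)\) and \(m_2(x) = 2 m_1(x) - 1 + p\,m_2(x+1) + (1-p)\,m_2(x-1)\), with boundary values 0 at \(x = 0\) and \(x = N\). The function names are hypothetical, and the code assumes \(0 < p < 1\) and \(N \ge 2\).

```python
from fractions import Fraction


def solve_interior(N, p, rhs):
    """Solve u(x) - p*u(x+1) - (1-p)*u(x-1) = rhs[x-1] for x = 1..N-1
    with u(0) = u(N) = 0, using exact tridiagonal (Thomas) elimination."""
    q = 1 - p
    n = N - 1                      # number of interior states
    c = [Fraction(0)] * n          # eliminated super-diagonal
    d = [Fraction(0)] * n          # eliminated right-hand side
    c[0] = -p
    d[0] = Fraction(rhs[0])
    for i in range(1, n):
        denom = 1 + q * c[i - 1]
        c[i] = -p / denom
        d[i] = (rhs[i] + q * d[i - 1]) / denom
    u = [Fraction(0)] * (N + 1)    # u[0] and u[N] stay 0
    u[N - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i + 1] = d[i] - c[i] * u[i + 2]
    return u


def duration_moments(N, p):
    """Return (mean, variance) of the game duration for every start x = 0..N."""
    p = Fraction(p)
    m1 = solve_interior(N, p, [Fraction(1)] * (N - 1))
    m2 = solve_interior(N, p, [2 * m1[x] - 1 for x in range(1, N)])
    var = [m2[x] - m1[x] ** 2 for x in range(N + 1)]
    return m1, var


if __name__ == "__main__":
    mean, var = duration_moments(10, Fraction(1, 2))
    print([str(v) for v in mean])
    print([str(v) for v in var])
```

Because the arithmetic uses `Fraction`, the output is exact and can be compared directly with a conjectured formula for \(V[T]\).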
Master thesis: The greedy strategy in the all-heads coin game (May 2026)
In the all-heads coin game a player repeatedly tosses a pool of coins, each landing heads with probability \(p\). After every round, all coins showing heads are set aside and the remaining coins (tails) are re-tossed. A round in which no coin shows heads is lost; the player wins as soon as the pool is empty. The greedy strategy — set aside every head — yields the success probability \(b_{n,p}\) when starting with \(n\) coins.
Recently, I assisted an AI system in finding a formula for \(b_{n,p}\). The goal of the thesis is to check all arguments and to produce a proper write-up. See paper2.pdf.
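For checking a candidate formula, the values \(b_{n,p}\) can be computed exactly from the recursion implied by the greedy rule. The sketch below is not the formula from paper2.pdf. It assumes that the game is lost as soon as a round shows no heads, and that \(b_{0,p} = 1\), so that \(b_{n,p} = \sum_{k=1}^{n} \binom{n}{k} p^k (1-p)^{n-k} b_{n-k,p}\). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def greedy_success_probabilities(n_max, p):
    """Return [b_0, b_1, ..., b_{n_max}] for the greedy strategy,
    assuming the game is lost in any round without heads."""
    p = Fraction(p)
    q = 1 - p
    b = [Fraction(1)]  # an empty pool means the game is already won
    for n in range(1, n_max + 1):
        # k heads (k >= 1) are set aside; the game continues with n - k coins
        b.append(sum(comb(n, k) * p**k * q**(n - k) * b[n - k]
                     for k in range(1, n + 1)))
    return b


if __name__ == "__main__":
    for p in (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)):
        print(p, [str(v) for v in greedy_success_probabilities(6, p)])
```

Under these assumptions, the recursion gives \(b_{n,1/2} = 1/2\) for every \(n \ge 1\), which follows by induction and serves as a quick sanity check.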
Bachelor and Master thesis: Lean formalization (ongoing)
We are running several formalization projects for the Lean Interactive Theorem Prover. Here is a list of topics you can contribute to:
- Poisson convergence
- Large deviations
- Specific probability distributions
- Markov chains
- Random graphs
- Generating random numbers
- Statistics
Master thesis: Prior-Data Fitted Networks and Martingale Posteriors (June 2026)
Summary of arXiv:2505.11325: Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular datasets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. The paper proposes a principled, efficient, and tuning-free sampling procedure to construct Bayesian posteriors for such estimates based on martingale posteriors, and proves its convergence. Several simulated and real-world data examples showcase the efficiency and calibration of the method in inference applications.
The thesis consists of placing martingale posterior distributions in the framework of theoretical statistics and implementing them.
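To make the idea of a martingale posterior concrete, here is a minimal sketch of predictive resampling for the mean. It does not use a PFN and is not the procedure of arXiv:2505.11325. As a stand-in, it assumes the simplest one-step-ahead predictive, the empirical distribution of all values seen so far (a Pólya urn), for which the resulting posterior is known to approximate the Bayesian bootstrap. The function names and the default values for `horizon` and `n_draws` are illustrative choices.

```python
import random


def predictive_resample_mean(data, horizon, rng):
    """One martingale-posterior draw of the mean: extend the sample by
    `horizon` imputed future observations, each drawn from the current
    empirical predictive (Polya urn), and return the mean of the long sequence."""
    urn = list(data)
    total = sum(urn)
    for _ in range(horizon):
        y = rng.choice(urn)   # sample the next observation from the predictive
        urn.append(y)         # update the predictive with the imputed value
        total += y
    return total / len(urn)


def martingale_posterior_mean(data, n_draws=200, horizon=500, seed=0):
    """Repeat predictive resampling to obtain posterior draws of the mean."""
    rng = random.Random(seed)
    return [predictive_resample_mean(data, horizon, rng) for _ in range(n_draws)]


if __name__ == "__main__":
    draws = sorted(martingale_posterior_mean([1.2, 0.7, 3.1, 2.4, 1.9]))
    print("approximate 90% credible interval:", draws[10], draws[189])
```

In the PFN setting, the empirical predictive in `predictive_resample_mean` would be replaced by the network's predictive distribution; the outer loop stays the same.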
- Interactive Theorem Proving using Lean
- Mathematische Statistik
- Measure Theory, Probability Theory, Stochastic Processes, and Stochastic Analysis
- Probabilistic aspects of machine learning
- Python for Data Analysis
- Stochastische Modelle in der Biologie
- Stochastik
- Stochastik II (für BSc)
- Vorkurs Mathematik