Projects

Here are some descriptions of ongoing research projects.

In all our work, we use probability theory as the basis for understanding phenomena in the life sciences. Our fields of application are population genetics, forensic genetics and neuroscience.

We consider \(K\) populations and DNA sequence data at \(M\) different positions (i.e., we use \(M\) different markers) for \(N\) individuals. Based on these data, we estimate the ancestry of individuals with the Admixture Model. Specifically, we assume that the ancestry of each individual is a mixture of the \(K\) non-admixed populations, such as those from Europe or Africa.
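
As a rough illustration of the model, and not necessarily the exact formulation used in this project, one common version assumes biallelic markers and diploid individuals. The symbols \(q_k\) (the fraction of an individual's ancestry from population \(k\)), \(p_{k,m}\) (the frequency of a reference allele at marker \(m\) in population \(k\)) and \(x_m \in \{0,1,2\}\) (the number of reference alleles the individual carries at marker \(m\)) are notation introduced for this sketch:

\[ x_m \sim \mathrm{Binomial}\Big(2,\ \sum_{k=1}^{K} q_k\, p_{k,m}\Big), \qquad q_k \ge 0, \quad \sum_{k=1}^{K} q_k = 1, \]

with the markers \(m = 1, \dots, M\) treated as independent.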

We mainly focus on the following topics concerning the ancestry of individuals:

  • Theoretical Properties of the Estimator: The ancestry is often estimated with a Maximum-Likelihood Estimator. We prove some theoretical properties of this estimator, e.g. consistency and central limit results.
  • Marker Selection: The question arises which markers are the best ones for the estimation of the ancestry. We use central limit results to answer this question.
  • Classification: In forensic genetics, researchers often classify individuals into populations. The output is the probability that an individual comes from a specific population. Together with Frank Hutter and Lennart Purucker, we combine machine learning and probability theory to improve the state-of-the-art method for classifying individuals into populations.
  • Statistical Test: The parents of an individual may come from different populations. In that case, it does not make sense to classify this individual into one single population. We provide statistical tests to evaluate, for example, whether the classification makes sense. Part of this work is joint with Sabine Lutz-Bonengel.
This project is part of the CRC Small Data.
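
To make the maximum-likelihood estimation of ancestry mentioned in the list above more concrete, here is a minimal sketch in Python. It rests on assumptions that are not stated on this page: markers are biallelic and independent, individuals are diploid, and the allele frequencies in the \(K\) populations are known. The function names `simulate_genotypes` and `estimate_ancestry` are hypothetical, and the EM iteration shown is one standard way to maximize this likelihood, not necessarily the method used in the project.

```python
import numpy as np


def simulate_genotypes(q, p, rng):
    """Draw allele counts in {0, 1, 2} for one individual.

    q: (K,) ancestry proportions, p: (K, M) reference-allele frequencies.
    Assumes independent biallelic markers and a diploid individual.
    """
    f = q @ p  # mixture allele frequency at each marker
    return rng.binomial(2, f)


def estimate_ancestry(x, p, n_iter=500, tol=1e-10):
    """EM iteration for the ancestry proportions with known frequencies.

    x: (M,) allele counts in {0, 1, 2}, p: (K, M) allele frequencies.
    Returns a (K,) vector of non-negative proportions summing to one.
    """
    p = np.clip(p, 1e-9, 1 - 1e-9)  # avoid division by zero below
    K, M = p.shape
    q = np.full(K, 1.0 / K)  # start from equal proportions
    for _ in range(n_iter):
        f = q @ p  # probability of the reference allele
        # E-step: expected share of each allele copy coming from population k
        ref_share = q[:, None] * p / f
        alt_share = q[:, None] * (1 - p) / (1 - f)
        # M-step: average the shares over all 2 * M allele copies
        q_new = (x * ref_share + (2 - x) * alt_share).sum(axis=1) / (2 * M)
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p = rng.uniform(0.05, 0.95, size=(2, 2000))  # made-up frequencies
    x = simulate_genotypes(np.array([0.3, 0.7]), p, rng)
    print(estimate_ancestry(x, p))
```

On simulated data the estimate should move closer to the true proportions as the number of markers \(M\) grows, which is the kind of behavior that consistency results describe.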

Transposable elements (TEs) are mobile genetic elements that often duplicate in the process of moving. In our ongoing research project, we are interested in modeling the evolution of TEs in Macrotermes species. By utilizing long-read, high-fidelity (HiFi) sequencing, we are able to annotate the TEs present in the Macrotermes genome and study their abundance using bioinformatic tools.

Importantly, by employing a de novo annotation approach, we can also discover and characterize novel TE families that are not represented in existing TE libraries. This is a valuable endeavor, as the movement and duplication of TEs can have significant genotypic and phenotypic consequences, such as gene disruption and the generation of new genetic elements.

CRNs

The characterization and convergence of stochastic processes can be studied via so-called martingale problems. We are developing new convergence results using this technique, in particular for Chemical Reaction Networks (CRNs).
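
For orientation, the standard textbook form of a martingale problem for a reaction network can be written as follows. The notation is introduced only for this sketch and is not taken from the project itself: \(X_t\) is the vector of species counts, \(\zeta_r\) is the change in counts caused by reaction \(r\), and \(\lambda_r(x)\) is the rate of reaction \(r\) in state \(x\). The generator acts on suitable test functions \(f\) by

\[ (Af)(x) = \sum_{r=1}^{R} \lambda_r(x)\,\big(f(x + \zeta_r) - f(x)\big), \]

and \(X\) is said to solve the martingale problem for \(A\) if

\[ f(X_t) - f(X_0) - \int_0^t (Af)(X_s)\, ds \]

is a martingale for every such \(f\). Convergence of a sequence of processes can then be approached by showing convergence of the corresponding generators, under additional conditions such as tightness.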

In recent years, it has become possible for mathematicians to formalize their research results using the Lean theorem prover. We are working to expand the stochastics section of the corresponding library.