We consider $K$ populations and data on the DNA sequence at $M$
different positions (i.e., we use $M$ different markers) for $N$
individuals. From these data, we estimate the ancestry of each
individual with the Admixture Model. Specifically, we assume
that the ancestry of each individual is a mixture of $K$
non-admixed populations, such as those from Europe or Africa.
We mainly focus on the following topics concerning the ancestry
of individuals:
-
Theoretical Properties of the Estimator: The ancestry is often
estimated with a Maximum-Likelihood Estimator. We prove some
theoretical properties of this estimator, e.g. consistency and
central limit results.
-
Marker Selection: The question arises which markers are the
best ones for the estimation of the ancestry. We use central
limit results to answer this question.
-
Classification: In forensic genetics, researchers often
classify individuals into populations. The output is the
probability that an individual comes from a specific
population. Together with Frank Hutter and Lennart Purucker,
we combine machine learning and probability theory to improve
the state-of-the-art method for classifying individuals into
populations.
-
Statistical Test: The parents of an individual may come from
different populations. In that case, it does not make sense to
classify the individual into a single population. We provide
statistical tests, e.g. to evaluate whether the classification
makes sense. Part of this work is joint with Sabine
Lutz-Bonengel.
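To make the maximum-likelihood estimation of ancestry more concrete, the following is a minimal sketch and not necessarily the method used in this work. It assumes biallelic markers, independent loci, genotypes coded as allele counts in {0, 1, 2}, and allele frequencies in each of the $K$ populations that are already known. Under these assumptions, the ancestry proportions of one individual can be estimated with a standard EM iteration. The function name `estimate_ancestry` and all simulated values are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_ancestry(x, p, n_iter=1000, tol=1e-10):
    """EM sketch for the ancestry proportions q of a single individual.

    x : (M,) array of allele counts in {0, 1, 2} at M markers.
    p : (K, M) array of allele frequencies, strictly between 0 and 1,
        assumed known for each of the K populations.
    Returns a (K,) array q with non-negative entries summing to 1.
    """
    K, M = p.shape
    q = np.full(K, 1.0 / K)                # start from equal proportions
    for _ in range(n_iter):
        theta = q @ p                      # mixture allele frequency per marker
        # Posterior weight that an allele copy stems from population k,
        # given that the copy is the counted allele (w1) or the other one (w0).
        w1 = q[:, None] * p / theta
        w0 = q[:, None] * (1.0 - p) / (1.0 - theta)
        q_new = (w1 * x + w0 * (2 - x)).sum(axis=1) / (2 * M)
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q


# Simulated example (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
K, M = 3, 5000
p = rng.uniform(0.05, 0.95, size=(K, M))   # hypothetical allele frequencies
q_true = np.array([0.6, 0.3, 0.1])         # hypothetical true ancestry
x = rng.binomial(2, q_true @ p)            # genotypes under the Admixture Model
q_hat = estimate_ancestry(x, p)
print(np.round(q_hat, 3))
```

In this sketch, markers whose allele frequencies differ strongly between populations contribute most to the estimate, which is the intuition behind the marker selection question above.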
Together with Angelika Rohde and Sabine Lutz-Bonengel, I am a
principal investigator of a project on forensic genetics in the
Collaborative Research Centre (Sonderforschungsbereich)
Small Data.