Genetic variations in an individual's DNA, such as Single Nucleotide Variants (SNVs) or indels (insertions and deletions), can play important roles in many diseases. Genotyping variants, that is, determining their zygosity status, is therefore an important step in analyzing such diseases. The general approach to genotyping from sequencing data is to analyze the reads covering the corresponding regions of the genome. Current genotyping approaches rely on short sequencing reads from second-generation sequencing devices for this task.
In my talk, I present novel algorithms that use long sequencing reads from third-generation sequencing platforms to genotype SNVs and indels. These platforms come with the drawback of high sequencing error rates. On the positive side, a single long read can span several neighboring variants and thus links them, but this information has so far not been exploited for genotyping. We provide a way to achieve this by considering bipartitions of all given sequencing reads, corresponding to the two haplotypes. We formalize this computational problem in terms of a Hidden Markov Model and compute posterior genotype probabilities using the Forward-Backward algorithm. Genotype predictions can then be made by picking the likeliest genotype at each site. Our experiments confirm that this approach makes it possible to leverage the power of long sequencing reads for genotyping, which current genotypers are not able to do.
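
To illustrate the last two steps (posterior probabilities via Forward-Backward, then picking the likeliest genotype per site), here is a minimal sketch on a deliberately simplified toy model. It is not the model from the talk: the hidden states here are the three genotypes themselves (0 = 0/0, 1 = 0/1, 2 = 1/1) rather than read bipartitions, and the names `forward_backward` and `call_genotypes` as well as all probabilities are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def forward_backward(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emis: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Posterior state probabilities per site for a generic HMM.

    init[s]     prior probability of state s at the first site
    trans[s][t] probability of moving from state s to state t
    emis[i][s]  likelihood of the data at site i given state s
    (Toy assumption: states are genotypes, not read bipartitions.)
    """
    n, k = len(emis), len(init)

    # Forward pass, rescaled at every site to avoid numerical underflow.
    fwd: List[List[float]] = []
    cur = [init[s] * emis[0][s] for s in range(k)]
    for i in range(n):
        if i > 0:
            prev = fwd[-1]
            cur = [
                emis[i][t] * sum(prev[s] * trans[s][t] for s in range(k))
                for t in range(k)
            ]
        z = sum(cur)
        fwd.append([c / z for c in cur])

    # Backward pass, also rescaled; the last site starts at all ones.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        raw = [
            sum(trans[s][t] * emis[i + 1][t] * bwd[i + 1][t] for t in range(k))
            for s in range(k)
        ]
        z = sum(raw)
        bwd[i] = [r / z for r in raw]

    # Posterior at each site is proportional to forward * backward.
    post = []
    for i in range(n):
        raw = [fwd[i][s] * bwd[i][s] for s in range(k)]
        z = sum(raw)
        post.append([r / z for r in raw])
    return post


def call_genotypes(post: Sequence[Sequence[float]]) -> List[int]:
    """Pick the state with the highest posterior probability at each site."""
    return [max(range(len(row)), key=row.__getitem__) for row in post]


if __name__ == "__main__":
    # Hypothetical numbers: "sticky" transitions let neighboring sites
    # inform each other; the middle site alone is ambiguous.
    init = [1 / 3, 1 / 3, 1 / 3]
    trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    emis = [[0.1, 0.8, 0.1], [0.4, 0.4, 0.2], [0.1, 0.8, 0.1]]
    posteriors = forward_backward(init, trans, emis)
    print(call_genotypes(posteriors))
```

In this toy setup the transition matrix is what carries information between neighboring sites; in the approach described above, that role is played by long reads spanning several variants.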