In this thesis, we model the evolution of the Influenza genome with a Bayesian Network whose nodes correspond to immune system components and to positions in the Influenza genome sequence. Since patient data that records both the infecting virus strain and immune system information is not available, we generate the most probable patient data using several approaches. One approach samples immune system information from a database of regional human leukocyte antigen (HLA) allele distributions and combines these samples with virus sequences from Influenza databases that carry regional information. In addition, we combine haplotype and binding affinity information with the HLA information to obtain more realistic patient data. After creating patient data for each region, we test HLA to amino acid residue associations and amino acid residue to amino acid residue associations with the likelihood ratio test: we fit logistic regression models for all possible associations and estimate the false discovery rates and q-values from the test results.
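The association test described above can be illustrated with a minimal sketch; it is not the implementation used in the thesis. It assumes a single binary predictor (patient carries a given HLA allele or not) and a binary outcome (a given residue is present at a position or not). Under that assumption, the likelihood ratio test of the logistic regression model against the intercept-only model reduces to the G-statistic of the 2x2 contingency table, with one degree of freedom. The function names and the counts are hypothetical, and the Benjamini-Hochberg adjustment is used here as a simple, conservative stand-in for q-value estimation.

```python
import math


def lrt_binary(n11, n10, n01, n00):
    """Likelihood ratio test for one binary predictor and a binary outcome.

    n11: carriers with the residue      n10: carriers without it
    n01: non-carriers with the residue  n00: non-carriers without it

    For a single binary predictor, the fitted logistic model reproduces the
    observed proportions, so the LRT statistic equals the G-statistic.
    """
    observed = [[n11, n10], [n01, n00]]
    total = n11 + n10 + n01 + n00
    row = [n11 + n10, n01 + n00]
    col = [n11 + n01, n10 + n00]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = row[i] * col[j] / total
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for three candidate HLA-to-residue associations.
tables = [(30, 10, 10, 30), (12, 8, 9, 11), (10, 10, 10, 10)]
pvals = [lrt_binary(*t)[1] for t in tables]
for t, p, q in zip(tables, pvals, bh_adjust(pvals)):
    print(t, "p = %.3g" % p, "adjusted p = %.3g" % q)
```

With additional covariates or non-binary predictors the closed form no longer applies, and the two nested logistic models would have to be fitted numerically before comparing their log-likelihoods.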
The final Bayesian Network of Influenza virus evolution represents the significant HLA and amino acid associations as directed arcs pointing to the amino acid residues. Using this model, we show how these associations influence the evolution of the Influenza virus, and we identify direct and indirect immune system pressure on amino acid residues. We evaluate these associations on a small test set of epitopes and show that, besides discovering new associations, we can rediscover some already known associations from HLAs to amino acid residues. With this model we can predict the susceptibility of regions to an outbreak with respect to the T-cell response and draw implications for vaccine design.