ECCB 2002 Poster

Title: IMGT/PhyloGene : an online software package for phylogenetic analysis of IG and TR genes
P31
Elemento, Olivier; Lefranc, Marie-Paule

olivier@ligm.igh.cnrs.fr
IMGT, the international ImMunoGeneTics database

We have developed IMGT/PhyloGene, a Web-based application for evolutionary analysis of immunoglobulin (IG) and T cell receptor (TR) variable genes. IMGT/PhyloGene is integrated within IMGT, the international ImMunoGeneTics database [5], which is an integrated information system specialising in immunoglobulins, T cell receptors and major histocompatibility complex (MHC) molecules of human and other vertebrates. The conserved structure of the IG and TR variable (V) domain led the IMGT researchers to introduce a unique numbering system for V-REGIONs and V-DOMAINs, i.e. a set of rules that can be used to position conserved amino acids within a protein sequence [4].
The benefit of this numbering is that it gives every researcher in the immunogenetics field a common basis for comparing and sharing sequence data. It enables IG and TR gene sequences from different species, groups and subgroups to be aligned without resorting to computationally heavy multiple alignment procedures such as CLUSTALW [11]. These standardized alignments can then be used to perform various phylogenetic analyses. In the IMGT context, the main aim of phylogenetic analysis is, given a set of IG and TR sequences, to reconstruct their gene tree, thus providing a visual comparison of sequences from multiple species, groups and subgroups. Because the data are standardized, trees reconstructed using different methods or from different sets of sequences can be compared. The free availability of the data on the IMGT Web site also means that these analyses can be reproduced whenever needed. IMGT/PhyloGene is one of the first tools to use the IMGT standardized data and aims at automating the phylogenetic analysis procedure using Web forms and online visualisation tools. It is meant to provide fast yet accurate tree reconstructions and does not require deep knowledge of phylogenetic analysis.
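
To illustrate why a shared numbering removes the need for a multiple alignment step, here is a minimal Python sketch. It assumes that each sequence has already been laid out on the same numbered positions and that unoccupied positions are written as dots. The function name `compare_numbered` and the toy sequences are hypothetical; they are not IMGT data or IMGT/PhyloGene code.

```python
def compare_numbered(seq_a, seq_b, gap="."):
    """Compare two sequences laid out on the same numbering.

    Returns (differences, compared_positions). Positions that are
    unoccupied (gap character) in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must span the same numbered positions")
    differences = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # position not occupied in at least one sequence
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


# Toy sequences for illustration only (assumption: dots mark unoccupied positions).
toy_a = "CAGGTG...CAGCTG"
toy_b = "CAGGTCACACAGTTG"
result = compare_numbered(toy_a, toy_b)
```

Because both sequences share the same coordinate system, the comparison is a single linear pass per pair, with no alignment search.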

IMGT/PhyloGene currently consists of six complementary tools:
- a selection tool, which allows the user to select interactively and progressively the IG and TR sequences to analyse. The sequences currently available for selection are the human and mouse IGHV, IGKV, IGLV, TRAV, TRBV, TRDV and TRGV gene nucleotide sequences. These sequences are identical to those available in the human and mouse IMGT reference directory, except that only the first allele (allele *01) of each gene can be selected.
At this point, the user's own sequences in FASTA format can also be added to the selection, provided they strictly follow the IMGT unique numbering for V-REGION.
- a tool for estimating synonymous and non-synonymous substitution rates, which computes the estimated rates between the selected sequences according to the Nei and Gojobori method [2].
- an evolutionary distance calculation tool, which computes a distance matrix from the selected sequences. At the moment, only the Kimura two-parameter model of substitution [3] is implemented in IMGT/PhyloGene. As in most phylogenetic analyses of IG and TR genes (e.g. [7]), the default strategy is to remove the CDR-IMGT from the phylogenetic analysis, since their variable lengths do not allow a meaningful alignment. However, the user may choose not to remove the CDR-IMGT regions.
- a tree building tool, which uses the popular Neighbor-Joining (NJ) algorithm [8] to build a phylogenetic tree from the previously calculated distance matrix. The benefits of NJ are its speed and its proven topological accuracy.
- a tree drawing tool, which provides a graphical representation of a rooted phylogenetic tree in the form of a horizontal phenogram. The phylogenetic tree created by the NJ algorithm can be rooted through the interface using two methods: the midpoint rooting method, which places the root at the middle of the path linking the two most distant sequences in the valued tree; and the outgroup method, which introduces an outgroup among the sequences to analyse, i.e. one or several sequences which are known to have diverged or duplicated prior to every other divergence or duplication. For example, a tree of IG genes can be rooted with a TR gene, and vice versa. A set of IG and TR genes can be rooted with a V-LIKE sequence, e.g. a CD4, CD8A or CD8B sequence [6].
- a tree alignment tool, which displays, at the tips of a phylogenetic tree, the aligned subsequences corresponding to V-REGION related labels. Once a tree has been constructed, this tool enables the user to view, along with the gene names, the sequences corresponding to the CDR1-IMGT, FR1-IMGT, CDR2-IMGT, FR2-IMGT, CDR3-IMGT, FR3-IMGT and V-RS (recombination signal) when available. This feature is particularly useful for the CDR-IMGT, since it may give clues about the evolution of these crucial antigen binding regions. Moreover, a most parsimonious reconstruction procedure [10] makes it possible to recover information about the ancestral lengths of these regions.
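
The distance calculation step can be sketched in Python as follows. This is a minimal illustration of the Kimura two-parameter formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. It is not the IMGT/PhyloGene implementation (which uses external C programs); the function name `k2p_distance` and the `excluded` parameter (0-based columns to skip, for instance the CDR-IMGT columns) are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b, excluded=frozenset()):
    """Kimura two-parameter distance between two pre-aligned sequences.

    `excluded` is a hypothetical set of 0-based column indices to skip.
    Columns holding anything other than A, C, G or T are also skipped.
    """
    sites = transitions = transversions = 0
    for idx, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if idx in excluded or a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable positions")
    p = transitions / sites
    q = transversions / sites
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy example: a single transition (A<->G) over ten sites.
one_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Applying the function to every pair of selected sequences yields the symmetric distance matrix that the tree building step takes as input.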
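
The tree building step can likewise be sketched in Python. The code below is a compact, unoptimised version of the Neighbor-Joining procedure [8] that returns an unrooted tree as a Newick string. It is only an illustration under simple assumptions (a full symmetric matrix, at least three sequences); the function name `neighbor_joining` and the five-taxon toy matrix are hypothetical and do not come from IMGT/PhyloGene.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted NJ tree and return it as a Newick string.

    `names` is a list of labels; `matrix` is a symmetric list of lists
    of pairwise distances in the same order as `names`.
    """
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    d = {(a, b): matrix[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names)}
    active = list(names)
    while len(active) > 3:
        n = len(active)
        total = {a: sum(d[a, b] for b in active) for a in active}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            ((a, b) for x, a in enumerate(active) for b in active[x + 1:]),
            key=lambda p: (n - 2) * d[p] - total[p[0]] - total[p[1]],
        )
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * d[i, j] + (total[i] - total[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        # Distances from the new node to every remaining node.
        for k in active:
            if k not in (i, j):
                d[new, k] = d[k, new] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        d[new, new] = 0.0
        active = [k for k in active if k not in (i, j)] + [new]
    # Three nodes left: join them at a central trifurcation.
    x, y, z = active
    lx = 0.5 * (d[x, y] + d[x, z] - d[y, z])
    ly = 0.5 * (d[x, y] + d[y, z] - d[x, z])
    lz = 0.5 * (d[x, z] + d[y, z] - d[x, y])
    return f"({x}:{lx:.4f},{y}:{ly:.4f},{z}:{lz:.4f});"


# Toy additive distance matrix for five hypothetical taxa.
toy_names = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(toy_names, toy_matrix)
```

The resulting tree is unrooted, which is why a separate rooting step (midpoint or outgroup) is needed before drawing. On non-additive data NJ can return negative branch lengths, which this sketch does not correct.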

IMGT/PhyloGene is integrated into IMGT and thus benefits from frequent updates. The ability to analyze sequences that are not in the IMGT/PhyloGene database makes it possible to conduct visual identity searches, i.e. to assign new genes or alleles to a subgroup, or to compare new genes from other species with related human or mouse genes, groups or subgroups.
However, it is important to note that IMGT/PhyloGene is not meant to replace traditional phylogeny programs such as those found in the PHYLIP [1] or PAUP [9] packages, especially when statistical reliability analyses (such as the bootstrap procedure) need to be performed.
On the implementation side, IMGT/PhyloGene is a set of Perl programs that access sequences cached in a MySQL database (to improve performance) and interface with several fast external programs written in C for distance calculation, tree construction and tree manipulation. IMGT/PhyloGene is freely available at http://imgt.cines.fr.
[1] J. Felsenstein. PHYLIP - phylogeny inference package. Cladistics, 5:164-166, 1989.
[2] M. Nei and T. Gojobori. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution, 3:418-426, 1986.
[3] M. Kimura. A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16:111-120, 1980.
[4] M-P. Lefranc. The IMGT unique numbering for immunoglobulins, T cell receptors and Ig-like domains. The Immunologist, 7:132-136, 1999.
[5] M-P. Lefranc. IMGT, the international ImMunoGeneTics database. Nucleic Acids Research, 29:207-209, 2001.
[6] M-P. Lefranc, C. Pommie, M. Ruiz, V. Giudicelli, E. Foulquier, L. Truong, V. Thévenin-Contet, and G. Lefranc. IMGT unique numbering for immunoglobulins and T cell receptor variable domains and Ig superfamily V-like domain. Dev Comp Immun, 2002. In press.
[7] M. Nei, X. Gu, and T. Sitnikova. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. USA, 94:7799-7806, 1997.
[8] N. Saitou and M. Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4:406-425, 1987.
[9] D.L. Swofford. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts, 1999.
[10] D.L. Swofford, G.J. Olsen, P.J. Waddell, and D.M. Hillis. Molecular Systematics, chapter Phylogenetic Inference, pages 407-514. Sinauer Associates, Sunderland, Massachusetts, 1996.
[11] J.D. Thompson, D.G. Higgins, and T.J. Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22(22):4673-4680, 1994.