ECCB 2002 Poster

Title: IMGT/3Dstructure-DB for immunoglobulin, T cell receptor and MHC structural data
P72
Kaas, Quentin; Lefranc, Marie-Paule

kaas@ligm.igh.cnrs.fr
IMGT, the international ImMunoGeneTics database

Introduction
IMGT/3Dstructure-DB is a three-dimensional immunological structure database. It is part of IMGT [1], the international ImMunoGeneTics database®, a high-quality integrated information system specialising in immunoglobulins (IG) [2], T cell receptors (TR) [3] and MHC molecules of human and other vertebrates, which consists of databases, web resources and tools. IMGT/3Dstructure-DB data are described according to the IMGT Scientific Chart rules based on IMGT-ONTOLOGY [4]. For IG, TR and MHC with known 3D structures [2,3], IMGT/3Dstructure-DB provides IMGT gene and allele identification, domain delimitations, amino acid positions according to the IMGT unique numbering [5,6], and renumbered coordinate flat files. IMGT/3Dstructure-DB is available on-line at http://imgt.cines.fr.

IMGT/3Dstructure-DB Query page
IMGT/3Dstructure-DB is queried through a user-friendly CGI interface. The user can search (1) by PDB code, protein name or fragment type, (2) by selecting a group, a subgroup, a gene or a chain type, and a species, or (3) by typing a complete 'Structural query' sentence. A structural query is a list of mnemonics (described in the IMGT/3Dstructure-DB Documentation) combined with parentheses and Boolean operators. For example, mnemonics describe the lengths of the complementarity determining regions (CDR-IMGT) of a variable domain (V-DOMAIN), the phi angle at a given position, or the distance between the alpha carbons of two positions. Two displays are available for the results: 'Protein overview' and 'Sequence details'.
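
As an illustration of how a query sentence built from mnemonics, parentheses and Boolean operators might be evaluated, here is a minimal Python sketch (the database programs themselves are written in Perl). The mnemonic names, the record layout and the AND/OR/NOT keywords are all hypothetical; the real mnemonics are those listed in the IMGT/3Dstructure-DB Documentation.

```python
import re

# Hypothetical mnemonics (not the real IMGT ones): each maps to a test
# applied to one domain record.
PREDICATES = {
    "IS_V_DOMAIN": lambda d: d["domain_type"] == "V-DOMAIN",
    "CDR1_LEN_8": lambda d: d["cdr_lengths"][0] == 8,
    "CDR3_LEN_GT_10": lambda d: d["cdr_lengths"][2] > 10,
}

TOKEN = re.compile(r"\(|\)|[A-Za-z0-9_]+")


def evaluate(query, domain):
    """Evaluate a Boolean query sentence against one domain record.

    Precedence, lowest to highest: OR, AND, NOT, parentheses.
    """
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok not in PREDICATES:
            raise ValueError("unknown mnemonic: " + tok)
        return PREDICATES[tok](domain)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

With a record such as `{"domain_type": "V-DOMAIN", "cdr_lengths": (8, 8, 13)}`, the sentence `IS_V_DOMAIN AND (CDR1_LEN_8 OR CDR3_LEN_GT_10)` selects the domain, and a query is answered by keeping every record for which `evaluate` returns true.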

IMGT/3Dstructure-DB Results pages
In the 'Protein overview' results table, IMGT/3Dstructure-DB data are displayed with the PDB code, IMGT protein names, fragment type, species, potential ligands and experimental method. Each entry is detailed in an IMGT/3Dstructure-DB card, accessible by clicking in the first column. The IMGT/3Dstructure-DB card comprises:
- a protein summary table (IMGT protein name, receptor and fragment type, species and chain names).
- technical and bibliographical data.
- a link to the contact analysis results. Atoms are considered to be in contact when no water molecule can fit between them.
- a link to the IMGT/3Dstructure-DB file renumbered according to the IMGT unique numbering (the file can be displayed on-line or downloaded).
- a detailed description of the individual chains: amino acid sequence with domain and region delimitations, characterization of each domain (domain type, IMGT gene and allele names, sequence with IMGT gaps, 2D graphical representation or 'Collier de Perles' [7]). For IG and TR V-DOMAINs, CDR-IMGT lengths and 'Collier de Perles' on two layers with hydrogen bonds are also provided.
In the 'Sequence details' results page, amino acid sequences of the selected domain are displayed with a link to the IMGT/3Dstructure-DB cards.
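
The contact criterion used in the card ("no water molecule can fit between them") can be read as a distance test. The Python sketch below is one possible interpretation, not the database's own code: it assumes that two atoms are in contact when the gap between their van der Waals surfaces is smaller than the diameter of a water molecule. The radii (Bondi values) and the 2.8 angstrom water diameter are assumptions that do not come from the poster.

```python
import math

# Assumed values, not taken from the poster: Bondi van der Waals radii
# in angstroms, and a water diameter of twice the usual 1.4 A probe radius.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
WATER_DIAMETER = 2.8


def in_contact(atom_a, atom_b):
    """Return True when no water molecule fits between two atoms.

    Each atom is a pair (element symbol, (x, y, z)) with coordinates
    in angstroms.
    """
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    gap = math.dist(xyz_a, xyz_b) - VDW_RADIUS[elem_a] - VDW_RADIUS[elem_b]
    return gap < WATER_DIAMETER


def domain_contacts(domain_a, domain_b):
    """List every (atom_a, atom_b) pair in contact between two domains."""
    return [(a, b) for a in domain_a for b in domain_b if in_contact(a, b)]
```

Under these assumptions, a carbon and an oxygen whose centres are 4 angstroms apart are in contact, whereas at 7 angstroms they are not.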

Statistics
The IMGT/3Dstructure-DB database manages 596 coordinate files, which correspond to 354 different proteins (283 IG, 22 TR and 49 MHC). IG structures include 64 Homo sapiens, 177 Mus musculus, 8 Camelus dromedarius, 7 Rattus rattus, 1 Cricetinae gen. sp. and 26 engineered proteins. TR structures include 7 Homo sapiens and 12 Mus musculus proteins. MHC structures include 27 Mus musculus and 22 Homo sapiens proteins. Two hundred and five different V genes and alleles were identified in V-DOMAINs: 183 IG (96 IGHV, 15 IGLV, 72 IGKV) and 22 TR (11 TRAV, 8 TRBV, 2 TRDV and 1 TRGV).
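
The figures above can be cross-checked with a short tally. The Python sketch below uses only the numbers quoted in this section; the variable names are illustrative.

```python
# Protein counts per receptor type, as reported in the text.
reported = {"IG": 283, "TR": 22, "MHC": 49}

# Breakdown itemized in the text for each receptor type.
itemized_counts = {
    "IG": {
        "Homo sapiens": 64,
        "Mus musculus": 177,
        "Camelus dromedarius": 8,
        "Rattus rattus": 7,
        "Cricetinae gen. sp.": 1,
        "engineered": 26,
    },
    "TR": {"Homo sapiens": 7, "Mus musculus": 12},
    "MHC": {"Mus musculus": 27, "Homo sapiens": 22},
}

# V genes and alleles identified in V-DOMAINs.
v_genes = {
    "IG": {"IGHV": 96, "IGLV": 15, "IGKV": 72},
    "TR": {"TRAV": 11, "TRBV": 8, "TRDV": 2, "TRGV": 1},
}

total_proteins = sum(reported.values())
itemized = {kind: sum(counts.values()) for kind, counts in itemized_counts.items()}
not_itemized = {kind: reported[kind] - itemized[kind] for kind in reported}
v_totals = {kind: sum(counts.values()) for kind, counts in v_genes.items()}
total_v = sum(v_totals.values())
```

The totals of 354 proteins and 205 V genes and alleles are consistent with their parts, and the IG and MHC breakdowns add up to the reported 283 and 49. The TR breakdown accounts for 19 of the 22 TR proteins; the text does not itemize the remaining 3.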

IMGT/3Dstructure-DB implementation
Database administration
IMGT/3Dstructure-DB programs are written in Perl [8]. Coordinate files are retrieved from the PDB once a week and selected by searching the file text for keywords. The program IMGT3DAlleleAlign was implemented for the analysis of the amino acid sequences: IMGT gene and allele identification and region delimitation (V-REGION, J-REGION, etc.) are obtained by running the sequences sequentially with the FASTA program [9] against the IMGT reference directory sets [2,3]. This program delimits the V-DOMAIN by combining the V-J-REGION or V-D-J-REGION, depending on the chain type [2,3]. The IMGT amino acid numbering is assigned by comparing the domain sequences with the IMGT reference sequences. A separate program was implemented to determine the distances between the alpha carbons of a domain (used in 'Structural query') and the contacts between atoms of different domains (used in 'Contact between domains'). Chain partners are identified by both sequence and contact analysis.
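
The alpha carbon distance step can be sketched as follows. This is an illustrative Python example, not the Perl program described above: it reads ATOM records in the standard fixed-column PDB format and measures the distance between the alpha carbons of two residues of a chain. The function names are hypothetical.

```python
import math


def read_ca_atoms(pdb_lines, chain_id):
    """Return {residue label: (x, y, z)} for the alpha carbons of one chain.

    The residue label is the residue number followed by the insertion
    code, if any, as written in columns 23-27 of the ATOM record.
    """
    atoms = {}
    for line in pdb_lines:
        # Only ATOM records: a calcium ion (also named CA) is a HETATM.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        label = line[22:27].strip()
        if label not in atoms:  # keep the first alternate location only
            atoms[label] = (
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
    return atoms


def ca_distance(atoms, pos_a, pos_b):
    """Distance in angstroms between the alpha carbons of two positions."""
    return math.dist(atoms[pos_a], atoms[pos_b])
```

Applied to a coordinate file renumbered according to the IMGT unique numbering, the residue labels would be IMGT positions, so a call such as `ca_distance(atoms, "23", "104")` would return the distance between the alpha carbons at those two positions.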

Database organization
The database server is MySQL [10]. The database tables are organized in four groups: (1) administrative data (bibliographical data, experimental methods, etc.), (2) protein description (quaternary structure, chain composition, regions with IMGT gene and allele names, domain type), (3) amino acid IMGT positions (amino acid type and structural properties), and (4) contacts between domains or between domains and ligands (at the domain, amino acid and atom levels).
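
To make the four table groups concrete, the Python sketch below builds a much-reduced stand-in schema. It uses the sqlite3 module from the Python standard library in place of MySQL so that it runs without a server, and every table name, column name and query is hypothetical; the actual IMGT/3Dstructure-DB schema is not described here.

```python
import sqlite3

# One hypothetical table per group; the real schema is certainly richer.
SCHEMA = """
CREATE TABLE entry (            -- group 1: administrative data
    pdb_code TEXT PRIMARY KEY,
    method   TEXT,
    citation TEXT
);
CREATE TABLE chain_domain (     -- group 2: protein description
    domain_id   INTEGER PRIMARY KEY,
    pdb_code    TEXT REFERENCES entry(pdb_code),
    chain_name  TEXT,
    domain_type TEXT,
    gene_allele TEXT
);
CREATE TABLE position (         -- group 3: amino acid IMGT positions
    domain_id  INTEGER REFERENCES chain_domain(domain_id),
    imgt_pos   TEXT,
    amino_acid TEXT,
    phi        REAL,
    PRIMARY KEY (domain_id, imgt_pos)
);
CREATE TABLE contact (          -- group 4: contacts (ligands omitted)
    domain_a INTEGER REFERENCES chain_domain(domain_id),
    pos_a    TEXT,
    atom_a   TEXT,
    domain_b INTEGER REFERENCES chain_domain(domain_id),
    pos_b    TEXT,
    atom_b   TEXT
);
"""


def open_sketch_db():
    """Create the sketch schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def domains_with_gene(conn, gene_prefix):
    """Return (pdb_code, chain_name, gene_allele) for matching domains."""
    rows = conn.execute(
        "SELECT pdb_code, chain_name, gene_allele FROM chain_domain "
        "WHERE gene_allele LIKE ? ORDER BY pdb_code, chain_name",
        (gene_prefix + "%",),
    )
    return rows.fetchall()
```

Keeping gene and allele names at the domain level, as assumed here, would let a search by group, subgroup or gene be answered from the protein description tables alone, with positions and contacts joined in only for structural queries.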

Conclusion
IMGT/3Dstructure-DB integrates data from sequence and structural sources. This database provides, for the first time, the identification of IMGT genes and alleles expressed in the IG, TR and MHC with known 3D structures. This information is of high value since the IMGT gene names for IG and TR [2,3] were approved in 1999 by HGNC, the HUGO Gene Nomenclature Committee [11], and entered in LocusLink (NCBI), GDB and GeneCards. Moreover, IMGT/3Dstructure-DB also provides, for the first time, an identical numbering for positions in the 1D, 2D and 3D structures of antigen receptors, whatever the receptor type (IG or TR), the chain type (heavy, kappa or lambda for IG; alpha, beta, gamma or delta for TR) and the domain (V-DOMAIN or C-DOMAIN). These standardizations will greatly help large-scale sequence-structure studies and, more generally, protein engineering.

Acknowledgements
IMGT is funded by the European Union's 5th Framework Programme (5th PCRDT, QLG2-2000-01287) and CNRS.

References
[1] Lefranc, M.-P., IMGT, the international ImMunoGeneTics database. Nucleic Acids Research. 29:207-209 (2001)
[2] Lefranc, M.-P., Lefranc, G., The immunoglobulin FactsBook. Academic Press, London UK, 458 pages (2001)
[3] Lefranc, M.-P., Lefranc, G., The T cell receptor FactsBook. Academic Press, London UK, 398 pages (2001)
[4] Giudicelli, V., Lefranc, M.-P., Ontology for immunogenetics: IMGT-ONTOLOGY. Bioinformatics. 15:1047-54 (1999)
[5] Lefranc, M.-P., Unique database numbering system for immunogenetics analysis. Immunology Today. 18:509 (1997)
[6] Lefranc, M.-P., The IMGT unique numbering for immunoglobulins, T cell receptors and Ig-like domains. The Immunologist. 7:132-136 (1999)
[7] Ruiz, M., Lefranc, M.-P., IMGT gene identification and Colliers de Perles of human immunoglobulins with known 3D structures. Immunogenetics 53:857-883 (2002)
[8] http://www.perl.com
[9] Pearson, W.R., Rapid and Sensitive Sequence Comparison with FASTP and FASTA. Methods in Enzymology. 183:63-98 (1990)
[10] http://www.mysql.com
[11] Wain, H.M., et al., Guidelines for human gene nomenclature. Genomics. 79:464-470 (2002)