Max-Planck-Institut für Informatik

Spotlight: Released Monday, 15 October 2001

Computational Biology at the Beginning of the Post-genomic Era

Thomas Lengauer

Note: This article is republished on the web with the kind permission of Springer-Verlag.

It appeared in Lecture Notes in Computer Science, Volume 2000: "Informatics: 10 Years Back - 10 Years Ahead", Reinhard Wilhelm (Ed.), Springer, Berlin 2000, p. 341-355, ISBN 3-540-41635-8.

1. Introduction

Computational biology and bioinformatics are terms for an interdisciplinary field joining information technology and biology that has skyrocketed in recent years. The field is located at the interface between the two scientific and technological disciplines that can be argued to drive a significant if not the dominating part of contemporary scientific innovation. In the English language, computational biology refers mostly to the scientific part of the field, whereas bioinformatics addresses more the infrastructure part. In other languages (e.g. German) bioinformatics covers both aspects of the field.

The goal of this field is to provide computer-based methods for coping with and interpreting the genomic data that are being uncovered in large volumes by the diverse genome sequencing projects and by other new experimental technologies in molecular biology. The field presents one of the grand challenges of our times. It has a large basic research aspect, since we cannot claim to be close to understanding biological systems at the level of the organism or even of the cell. At the same time, the field is faced with a strong demand for immediate solutions, because the genomic data that are being uncovered encode many biological insights whose deciphering can be the basis for dramatic scientific and economic success. At the end of the pre-genomic era, which was characterized by the effort to sequence the human genome, we are entering the post-genomic era, which concentrates on harvesting the fruits hidden in the genomic text. In contrast to the pre-genomic era, which lasted less than 15 years from the announcement of the quest to sequence the human genome to its completion, the post-genomic era can be expected to last much longer, probably extending over several generations.

2. Grand Challenge Problems in Computational Biology

At the basis of the scientific challenge in computational biology lie fundamental problems that are independent of applications and very hard to solve. In this section we mention a number of these problems, listed in order of increasing "distance" from the genetic sequence. We start with problems that deal directly with the sequence and then step progressively towards analyzing issues of phenotype.

Grand Challenge 1: Finding Genes in Genomic Sequences

The genomic blueprints of proteins are provided by genes. Via the genetic code, the gene determines the exact amino acid sequence of the protein chain. The machinery of the cell transcribes genes into messenger RNA, which is then translated into the appropriate protein chain. In higher organisms, only a minor part of the genome codes for proteins. For man, this fraction lies in the low percentage range (3 to 6 percent).

The gene is much more than the coding sequence along the genome. There is associated infrastructure that helps the transcription machinery attach to the DNA in order to read off the coding information (see Figure 1). Furthermore, there are sequence regions that help regulate the transcription, including mechanisms for blocking it. In eucaryotes, genes also have a complex internal structure. They are not contiguous: parts of the gene, the so-called introns, are eliminated before the translation into the protein. In Figure 1, only the pre-mRNA part is transcribed from the DNA sequence, and only the coding exons, depicted by the gray boxes, finally make it into protein. Gene identification is tantamount to elucidating all aspects of the gene structure. The especially taxing part is the location and interpretation of functional signals in the upstream (left-hand) regulatory region of the gene. This is where proteins bind to the DNA in order to regulate gene transcription [1]. Gene identification algorithms use typical sequence analysis methods, including string comparison and alignment, for instance based on dynamic programming, and statistical methods such as hidden Markov models [2].
Figure 1: Structure of a eucaryotic gene (adapted from [2])
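The dynamic-programming string comparison mentioned above can be illustrated with a minimal sketch of global alignment scoring in the style of Needleman and Wunsch. This is not the method of any particular gene finder; the function name and the scoring values for match, mismatch and gap are assumptions chosen for illustration only.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of strings a and b.

    The scoring values are illustrative assumptions, not taken
    from any gene-finding program.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the dynamic-programming table are kept, so memory grows with the length of one sequence while running time grows with the product of both lengths.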

Grand Challenge 2: Protein Folding and Protein Structure Prediction

The famed protein folding problem asks how the amino-acid sequence of a protein adopts its native three-dimensional structure under natural conditions (e.g. in aqueous solution, at neutral pH and room temperature). People develop molecular dynamics methods to solve this very difficult problem [3]. IBM's highly advertised Blue Gene project aims at building a supercomputer that is supposed to provide the resources for solving this problem. Very recent developments raise hope that the problem may not be as difficult as previously thought [4].

Protein folding has to be distinguished from protein structure prediction. In the latter problem, we are not interested in the folding process but just in the final structure attained. Stated as such, the problem is also called the ab initio protein structure prediction problem, because no additional information is accessible to aid in the task. People prefer Monte Carlo methods to solve this problem [5], because the folding process takes too much time to be simulated effectively by today's molecular dynamics methods. So far only short helical protein sequences can be folded successfully.

There is a substantially easier version of the protein structure prediction problem that is much further along its way to solution. This version is based on the observation that the many millions of protein sequences that life has come up with so far fold into a remarkably small set of basic protein structures, only a few thousand. Assume that we know one representative protein for each structure. Then it seems reasonable to ask the question as follows: Given a protein sequence A, does protein A fold into a structure of type 1, of type 2, and so on? After a few thousand queries, we are done. Interestingly enough, each query can be answered with increasing reliability on a PC in a few minutes or so. This method is covered by the terms protein threading - for finding the structural backbone of the protein - and homology-based modeling - for filling in the atomic details. It is the workhorse of today's protein structure predictions [6,7]. At GMD, we have contributed effective software to this field [8,9]. The method is limited only by the fact that it cannot invent a protein structure that has never been seen before. And, as of now, we know only an estimated 20 to 30% of the several thousand presumed protein structures. For more details on protein structure prediction, see [10,11].

Figure 2: Homology-based Protein Structure Prediction
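The idea of answering one query per known structure type can be sketched as a loop that scores the sequence against every template and keeps the best. The following is a deliberately crude toy, not real threading: it assumes templates of the same length as the sequence, described only as buried ("B") or exposed ("E") positions, and it rewards hydrophobic residues in buried positions. The residue set, the template names and the scoring rule are all assumptions made for this illustration.

```python
# Toy classification of residues; an assumption for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def compatibility(sequence, environment):
    """+1 for each position where residue type and environment agree
    (hydrophobic in buried, polar in exposed), -1 otherwise."""
    if len(sequence) != len(environment):
        raise ValueError("toy model requires equal lengths (no gaps)")
    return sum(
        1 if (residue in HYDROPHOBIC) == (env == "B") else -1
        for residue, env in zip(sequence, environment)
    )


def best_fold(sequence, templates):
    """Query every template once and return the best-scoring name
    together with all scores."""
    scores = {
        name: compatibility(sequence, env)
        for name, env in templates.items()
        if len(env) == len(sequence)
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical templates: one representative per structure type.
    templates = {"fold_1": "BBEEBB", "fold_2": "EEBBEE"}
    print(best_fold("LVKDIF", templates))
```

Real threading methods additionally align the sequence to the template with gaps and use far richer scoring functions; the sketch only shows the outer loop over a library of known structures.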

Now that, in principle, we can access all genes in a genome, we can also start a genome-wide structure prediction effort. This is the object of several Structural Genomics projects world-wide [12]. All of them combine experimental methods for resolving new protein structures (by X-ray or NMR techniques) with homology-based modeling to find related protein structures. However, several classes of protein structures - such as proteins that are located inside the cell membrane - are extremely hard to resolve.

In the presence of Structural Genomics efforts, it will be interesting to see whether ab initio protein structure prediction will play a significant role in elucidating protein structure space.

Grand Challenge 3: Estimating the Free Energy of Biomolecules and their Complexes

Part of why the protein structure prediction problem is so hard is that estimating the free energy of a protein conformation accurately is impossible, so far. However, molecules adopt the conformation of lowest free energy. Thus Nature solves a complex minimization problem in a very high-dimensional space. Each atom has three coordinates, thus the dimension of the space is three times the number of atoms, which is in the many thousands for a protein. Covalent bonds reduce this number somewhat, but not enough to make a difference computationally. In addition, up to millions of surrounding water molecules contribute to the energy balance. The scalar energy function that forms a landscape above this space has innumerably many local minima at approximately the same low energy. We are looking for the global minimum among them. This multiminima problem is at the heart of the difficulty of protein folding [13,14].
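The multiminima problem can be made concrete with a toy example. The sketch below uses an arbitrary one-dimensional polynomial as a stand-in for an energy landscape (an assumption; real landscapes have three coordinates per atom) and shows that plain gradient descent ends in different minima depending on the starting point, only one of which is the global minimum.

```python
def energy(x):
    """Toy one-dimensional 'energy landscape' with two minima."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Derivative of the toy energy function."""
    return 4 * x**3 - 6 * x + 1


def descend(x, step=0.01, iterations=5000):
    """Plain gradient descent: always walks downhill, so it stops in
    whichever minimum lies below the starting point."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def conformational_dimension(n_atoms):
    """Three Cartesian coordinates per atom."""
    return 3 * n_atoms


if __name__ == "__main__":
    for start in (-2.0, 2.0):
        x_min = descend(start)
        print(f"start {start:+.1f} -> minimum at {x_min:+.4f}, energy {energy(x_min):+.4f}")
    print(conformational_dimension(5000))
```

With only two minima the global one is easy to find by trying both sides; with innumerably many minima in thousands of dimensions, such exhaustive restarts are no longer feasible.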

In this paragraph, we address an even more fundamental issue, however. Even computing a single point on the free energy landscape is impossible. The reason is that free energy is a thermodynamic average involving enthalpic (force) contributions as well as entropic (disorder) contributions. Computing free energies would actually necessitate performing statistics on large molecular ensembles. This exceeds any envisionable computing power. However, the picture may not be as bleak, if we are just interested in energy differences [15,16]. This problem is important not only for protein folding but also for molecular docking: One criterion for distinguishing good drugs from bad drugs is that good drugs bind tightly to their target protein, resulting in a low free energy of the molecular complex as compared to the dissociated molecules (see Section 3 below). Ranking drug molecules accurately according to this measure is a critical step in a computer-based approach to drug design [17,18].
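Why free energy requires statistics over an ensemble, and not just a single energy value, can be seen in a small numerical sketch. It uses the standard relation F = -kT ln Z, with Z the sum of Boltzmann factors exp(-E/kT) over the states of a discrete toy ensemble. The energies, the number of states and the value of kT are assumptions chosen for illustration.

```python
import math


def free_energy(energies, kT):
    """Free energy F = -kT * ln(sum_i exp(-E_i / kT)) of a discrete ensemble.

    The minimum energy is factored out before exponentiating to keep the
    sum numerically stable.
    """
    e_min = min(energies)
    z_shifted = sum(math.exp(-(e - e_min) / kT) for e in energies)
    return e_min - kT * math.log(z_shifted)


if __name__ == "__main__":
    kT = 0.6  # assumed thermal energy, in the same (arbitrary) units as the energies
    narrow_well = [-1.0]       # one low-energy state
    broad_well = [-0.5] * 100  # many states of slightly higher energy
    print(free_energy(narrow_well, kT))
    print(free_energy(broad_well, kT))
```

In this toy ensemble the broad well has the lower free energy although each of its states has a higher energy than the single state of the narrow well: the entropic contribution of having many accessible states outweighs the enthalpic difference.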

Grand Challenge 4: Simulating a Cell

In the last paragraph we have stepped up from discussing molecular structures to discussing molecular interaction. Inside a cell, life’s processes are driven by complex networks of such interactions involving many thousands of different proteins and metabolites. Even understanding only the connectivity of such a network, e.g. the topology of signal transduction pathways, on a cell-wide scale is beyond us today. Ultimately we want to understand its kinetics, i.e. what effects up-regulating or down-regulating an enzyme would have on the equilibrium reached by the cell. Mathematical methods for analyzing these interdependencies touch the difficult field of dynamical systems [19] and are currently restricted to very small networks. In addition, we are lacking important experimental data that are needed as input to the respective algorithms. The goal of simulating a cell has been formulated, and a few groups are working on it [20], but we are very far from a solution.
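The question of how up-regulating an enzyme shifts the state reached by the cell can be illustrated on the smallest possible "network". The sketch below assumes a single substrate that is supplied at a constant rate and consumed by one enzyme with Michaelis-Menten kinetics; the model, the parameter names and their values are assumptions for illustration and are not taken from [19] or [20].

```python
def simulate_substrate(v_max, k_in=1.0, k_m=1.0, s0=0.0, dt=0.01, t_end=50.0):
    """Euler integration of dS/dt = k_in - v_max * S / (k_m + S).

    v_max stands for the amount of enzyme; raising it models up-regulation.
    """
    s = s0
    for _ in range(int(round(t_end / dt))):
        s += dt * (k_in - v_max * s / (k_m + s))
    return s


def steady_state(v_max, k_in=1.0, k_m=1.0):
    """Analytic steady state of the same model (requires v_max > k_in)."""
    if v_max <= k_in:
        raise ValueError("no steady state: supply exceeds enzyme capacity")
    return k_in * k_m / (v_max - k_in)


if __name__ == "__main__":
    for v_max in (2.0, 4.0):
        print(v_max, simulate_substrate(v_max), steady_state(v_max))
```

For this one-variable model the steady state can be written down directly; for networks with thousands of interacting species, neither the rate laws nor the parameters are known, which is the difficulty the text describes.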

3. Computational Biology in Applied Biology and Medicine

Besides the more "timeless" scientific Grand Challenge problems, there is a significant part of computational biology that is driven by new experimental data provided through the dramatic progress in molecular biology techniques. The past few years have provided so-called expression data on the basis of ESTs (expressed sequence tags) and DNA microarrays (DNA chips). These data go far beyond just uncovering genomic sequences. Essentially, we obtain a cell-wide census of certain molecular species.

Figure 4: Schema of a model for simulating a (very simple) bacterial cell (taken from [20])

There are two levels on which we can envision such a census. The first is covered by the term genomics. Here we tally all messenger RNA (mRNA) in the cell. mRNA is the result of the initial gene transcription and the intermediate on the way to the synthesized protein. Presumably lots of mRNA of a certain protein inside a cell also means that the cell is producing lots of this protein. At least this is the hypothesis on which mRNA expression studies are based.

We can expect thousands of different expressed genes inside a complex eucaryotic cell. Using DNA microarrays, for instance, for which there are several different technologies, we can obtain a differential profile of the total mRNA population inside a cell in two specific cell states. This means that, for each gene, we learn how much more or less mRNA there is in cell state 1 than in cell state 2. It is most interesting to compare different cell states of the same tissue, e.g. normal temperature/heat shock, healthy/sick, neutral pH/acidic etc. With a series of differential expression experiments one can even follow trajectories of expression levels, e.g. as a disease develops.
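A differential expression profile of the kind described above is commonly summarized as one log-ratio per gene. The sketch below assumes that expression levels for the two cell states are already available as plain numbers per gene; the gene names, the values and the pseudocount are hypothetical and serve only to show the computation.

```python
import math


def log_ratios(state_1, state_2, pseudocount=1.0):
    """Per-gene log2 ratio of expression in state 1 versus state 2.

    A small pseudocount (an assumption of this sketch) avoids division
    by zero for genes that are not detected in one of the states.
    """
    return {
        gene: math.log2((state_1[gene] + pseudocount) / (state_2[gene] + pseudocount))
        for gene in state_1
        if gene in state_2
    }


if __name__ == "__main__":
    # Hypothetical measurements for three genes in two cell states.
    heat_shock = {"gene_a": 3.0, "gene_b": 0.0, "gene_c": 7.0}
    normal = {"gene_a": 1.0, "gene_b": 3.0, "gene_c": 7.0}
    for gene, ratio in sorted(log_ratios(heat_shock, normal).items()):
        print(gene, ratio)
```

Positive values mark genes with more mRNA in state 1, negative values genes with less, and values near zero genes whose expression is unchanged between the two states.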

Expression data are a rich source of fundamental biological insight. Harvesting the signals buried in these data is burdened by three major complications:


It is the problems pertaining to point 3 above that motivate performing the census not on the mRNA level (genomics) but on the level of the synthesized and matured proteins (proteomics). The resulting experimental technologies are more complicated and not as highly developed as those of genomics [23]. But with mounting progress on the experimental front, proteomics can be expected to dominate a significant part of computational biology within a few years.

The rapidity with which the experimental procedures develop and the demand for quick answers from mining the resulting data put computational biology under great pressure. We need appropriate statistical tools to correlate homogeneous and inhomogeneous data. In addition to expression data, which are at the center of many analyses, there are other sources of knowledge that it would be foolish not to tap:


For efficient mining of gene expression data we eventually have to combine all of this information in the quest to come up with new biological insight. The purposes of this endeavor are manifold. Here we will concentrate on a pharmaceutical application.

Molecular Therapy of Diseases

The development of a new drug as a cure for a disease is performed in two basic steps. The first is the identification of a key molecule, usually a protein, the so-called target protein, whose biochemical function is causative of or at least intimately involved in the disease. The second step is the search for or development of a drug that moderates - often blocks - the function of the target protein.

Figure 5 shows the three-dimensional shape of the protein dihydrofolate reductase (DHFR) which catalyzes a reaction that is important in the cell division cycle. DHFR has a prominent binding pocket in its center that is specifically designed to bind to the substrate molecule dihydrofolate and induce a small modification of this molecule. This activity of DHFR can be blocked by administering the drug molecule methotrexate (MTX) (Figure 6). MTX binds tightly to DHFR and prevents the protein from exercising its catalytic function. MTX is a commonly administered drug in cancer treatment, where our goal is to break the (uncontrolled) cell division cycle. This example shows both the benefits and the problems of current drug design. Using MTX, we can in fact break the cell division cycle and stop tumor growth. However, DHFR is actually the wrong target molecule. It is expressed not only inside the tumor but in all dividing cells, thus a treatment with MTX not only affects the tumor but all dividing cells in the body. This leads to severe side effects such as losing one's hair and intestinal lining. What we need is a more appropriate target protein - one that is specifically expressed inside the tumor and whose inhibition does not cause side effects in other tissues.

Figure 5: The protein dihydrofolate reductase (DHFR)

There are presumably at least several thousand suitable drug targets among the perhaps 50,000 different proteins in our body. Fewer than 500 proteins are targeted by all drugs on the market today. This shows the potential for innovation in this field. It is only through the new expression measurements that we can attempt to search globally for suitable drug targets. The whole experimental and computer-based machinery described above can be employed for this purpose. The pharmaceutical industry is currently placing large bets on this approach - and this again drives much of the bioinformatics research in this area. Recent findings support the hope that this approach is very promising [29]. At GMD we are currently developing software for target protein finding in the context of a couple of concrete human diseases.

Figure 6: The inhibitor methotrexate (MTX) bound to dihydrofolate reductase

Searching For New Drugs

Once we have identified the target protein we have to search for a drug that binds tightly to that protein. This search also has been systematized greatly with the advent of very efficient methods for synthesizing new compounds (combinatorial chemistry) and testing their binding properties to the protein target (high-throughput screening). Combinatorial libraries provide a carefully selected set of molecular building blocks - usually dozens or hundreds - together with a small set of chemical reactions that link the modules. In this way, a combinatorial library can theoretically provide a diversity of up to billions of molecules from a small set of reactants. Up to millions of these molecules can be synthesized daily in a robotized process and submitted to chemical test in a high-throughput screening procedure. In our context, the objective of the test is to find out which compounds bind tightly to a given target protein.
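The arithmetic behind the size of a combinatorial library can be sketched as follows. The example assumes the simplest possible scheme, in which each molecule is formed by choosing one building block for each of a fixed number of positions; the block names and counts are hypothetical.

```python
import itertools
import math


def library_size(blocks_per_position):
    """Number of distinct products: one building block chosen per position."""
    return math.prod(len(blocks) for blocks in blocks_per_position)


def enumerate_library(blocks_per_position):
    """Explicit enumeration; only feasible for very small libraries."""
    return ["-".join(combo) for combo in itertools.product(*blocks_per_position)]


if __name__ == "__main__":
    # Hypothetical building blocks for a two-position scheme.
    small = [["A1", "A2"], ["B1", "B2", "B3"]]
    print(library_size(small), enumerate_library(small))
    # Three positions with 1000 candidate blocks each give a billion molecules.
    print(library_size([range(1000)] * 3))
```

The size grows as the product of the block counts, which is why a library can be described compactly by its reactants while the set of molecules it spans is far too large to enumerate or synthesize exhaustively.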

Here we have a similar situation as in the search for target proteins. We have to inspect compounds among a very large set of molecular candidates, in order to select those that we want to inspect further. Again, computer help is necessary for preselection of molecular candidates and interpretation of experimental data.

In the computer, finding out whether a drug binds tightly to the target protein can best be done if the protein structure is available. If the spatial shape of the site of the protein to which the drug is supposed to bind is known, then we can apply docking methods to select suitable lead compounds which have the potential of being refined to drugs. The speed of a docking method determines whether the method can be employed for screening compound databases in the search for drug leads. At GMD, we developed the docking method FlexX that takes a minute per instance and can be used to screen up to thousands of compounds on a PC or a hundred thousand compounds on a suitable parallel computer [30]. Docking methods that take the better part of an hour cannot suitably be employed for such large-scale screening purposes.

In order to screen really large drug databases with several hundred thousand compounds or more we need docking methods that can handle single protein/drug pairs within seconds or less [31]. The high conformational flexibility of small molecules as well as the subtle structural changes in the protein binding pocket upon docking (induced fit) are major complications in docking. Furthermore, docking necessitates careful analysis of the binding energy (see Grand Challenge 3 above).
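The relation between docking time per compound and feasible database size is simple arithmetic, sketched below. The function name and the processor count in the example are assumptions; the sketch also assumes that compounds are docked independently, so that the work divides evenly among processors.

```python
def screening_days(n_compounds, seconds_per_compound, n_processors=1):
    """Wall-clock days needed to dock every compound once, assuming the
    work is perfectly divisible among the processors."""
    total_seconds = n_compounds * seconds_per_compound / n_processors
    return total_seconds / 86400.0


if __name__ == "__main__":
    # One minute per compound on a single processor: 1440 compounds per day.
    print(screening_days(1440, 60))
    # A hundred thousand compounds at one minute each, single processor.
    print(screening_days(100_000, 60))
    # The same database on a hypothetical 64-processor machine.
    print(screening_days(100_000, 60, n_processors=64))
    # A method taking 45 minutes per compound, single processor.
    print(screening_days(100_000, 45 * 60))
```

At one minute per compound a single processor handles about 1440 compounds per day, which matches the order of magnitude given in the text; databases of several hundred thousand compounds therefore call for either massive parallelism or docking times of seconds.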

Perspectives of Computational Biology in Medical and Pharmaceutical Applications

With the advent of expression measurements, computational biology has gained a major push towards application. We can expect this push to drive much of the field for years to come. The high demand for innovation in medicinal chemistry and molecular medicine will generate new problems for computational biology in short succession. These problems will be tied to emerging experimental methods. Two major directions will be:


4. Methodical Demands on Computational Biology

Of course, nature is much too complex to be modeled in any sufficiently accurate detail. And we have little time to spend on each molecular candidate. Thus we mostly do not even attempt to model things in great physical detail; instead we use techniques from statistics and machine learning to infer "signals" in the data and separate them from "noise". Just as people interpret the facial expressions of their dialog partners not by a thorough physiological analysis that reasons backwards from the shape of the muscles to the neurological state of the brain, but learn from (usually vast amounts of) data how to tell whether somebody is happy or sad, attentive or bored, so do computational biology models query hopefully large sets of data to infer the signals. Here signal is a very general notion that can mean just about anything of biological interest - from a sequence alignment exhibiting the evolutionary relationship of the two proteins involved, through a predicted 2D or 3D structure of a protein, to the structure of a complex of two molecules binding to each other. On the sequence level, examples of interesting signals are the splice sites in complex eucaryotic genes, the location and makeup of regulatory regions, or the design of signal peptides giving away the final location of the protein in the cell.

Methods that are used to learn from biological data have classically included neural nets and genetic algorithms. Hidden Markov models [33,34] are a very popular method of generating models for biological signals of all kinds. Recently, support vector machines have been applied very successfully to solving classification problems in computational biology [35,36].
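How a hidden Markov model extracts a signal from a sequence can be sketched with the Viterbi algorithm, which finds the most probable sequence of hidden states for a given observation. The two-state model below (GC-rich versus AT-rich regions) and all of its probabilities are assumptions invented for this illustration; they are not taken from [33,34].

```python
import math

# Hypothetical two-state model; all probabilities are illustrative assumptions.
STATES = ("GC_rich", "AT_rich")
START_P = {"GC_rich": 0.5, "AT_rich": 0.5}
TRANS_P = {
    "GC_rich": {"GC_rich": 0.9, "AT_rich": 0.1},
    "AT_rich": {"GC_rich": 0.1, "AT_rich": 0.9},
}
EMIT_P = {
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a non-empty observation sequence.

    Works in log space to avoid numerical underflow on long sequences.
    """
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position.
            best_prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(trans_p[best_prev][s])
                + math.log(emit_p[s][obs])
            )
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("GGCCAATT", STATES, START_P, TRANS_P, EMIT_P))
```

In practice the probabilities are not chosen by hand as here but estimated from annotated training sequences, which is what makes the approach a learning method.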

As the methods of analysis are inexact, so are the results. The analyses yield predictions that cannot, in general, be trusted. This is quite different from the usual situation in theoretical computer science, where you are either required to compute the optimum solution or, at least, the optimum means something, and so does the distance of the computed solution from the optimum in case you do not hit it. Not so here. Cost functions in computational biology usually miss the goal. Notions such as evolutionary distance or free energy are much too complex to be reflected adequately by easy-to-compute cost functions. Thus, computational biology is dominated by the search for suitable cost functions. Those cost functions can be trained, just as the models in toto. At GMD, we have developed a training procedure based on linear programming to improve the predictive power of our protein structure prediction methods [37], and employed support vector machines to find initiation sites for the translation of genes into proteins [36]. Another possibility is to leave the mystical parameters in the cost function variable and study the effect of changing them on the outcome. A method for doing this in the area of sequence alignment is presented in [38].

Whether a method or a cost function is good or bad cannot be proved but has to be validated against biologically interpreted data that are taken as a gold standard for purposes of the validation. Several respective data sets have evolved in different bioinformatics domains. Examples are the SCOP [39] and CATH [40] structural protein classifications for validating methods for protein structure prediction and analysis. These sets are not only taken to validate the different methods but also to compare them community-wide.

Validating methods and cost functions on known biological data has a serious drawback. One cannot rule out that the method uses knowledge of the intended outcome, either on purpose or inadvertently. Therefore, the ultimate test of any computational biology method is a blind prediction, one that convincingly makes a prediction without previous knowledge of the outcome. Staging a blind prediction experiment involves a certifying authority that vouches for the fact that the knowledge to be predicted was not known to the predictor. The biennial CASP (Critical Assessment of Structure Prediction Methods [10]) experiment series that was started in 1994 performs this task for protein structure prediction methods. The CASP team provides a world-wide clearing house for protein sequences whose structures are in the process of being resolved, e.g. by crystallographers. The group that resolves the structure communicates the protein sequence to the CASP team, which puts it on the web for prediction. Sufficiently long before the crystallographers resolve the structure, the prediction contest closes on that sequence. After the structure is resolved, it is compared with the predictions. CASP has been a tremendous help in gaining acknowledgement for the scientific discipline of protein structure prediction.

5. Summary

Computational biology is a highly significant and very demanding branch of applied computer science. This article could only touch upon a few research topics in this complex field. Computational biology is a young field. The biological systems under study are not very well understood yet. Models are rough, data are voluminous but often noisy. This limits the accuracy of computational biology predictions. However, the analyses improve quickly, due to improvements on the algorithmic and statistical side and to the availability of more and better data. Nevertheless, computational biology can be expected to be a major challenge for some time to come.

The pharmaceutical industry was the first branch of the economy to strongly engage in the new technology combining high-throughput experimentation with bioinformatics analysis. Medicine is following closely. Medical applications step beyond trying to find new drugs on the basis of genomic data. The aim here is to develop more effective diagnostic techniques and to optimize therapies. The first steps to engage computational biology in this quest have already been taken.

While driven by the biological and medical demand, computational biology will also exert a strong impact on information technology. Since, due to their complexity, we are not able to simulate biological processes on the basis of first principles, we resort to statistical learning and data mining techniques, methods that are at the heart of modern information technology. The mysterious encoding that Nature has afforded for biological signals as well as the enormous data volume present large challenges and continue to have a large impact on the processes of information technology themselves.

One important point that we want to stress in the end is this. The impact of computational biology research critically depends on an accurate understanding of the biological process under investigation. It is essential to ask the right questions, and often modeling takes priority over optimization. Therefore, we need people that understand and love both computer science and biology to bring the field forward. Fortunately, it seems that a growing number of people discover their interest in both disciplines that make up computational biology.

6. Acknowledgements

I am grateful to Joannis Apostolakis and Joachim Selbig for helpful remarks on the draft of this manuscript.

7. References
1 T. Werner, Analyzing Regulatory Regions in Genomes, in Bioinformatics - From Genomes to Drugs (T. Lengauer, ed.), Wiley-VCH, Heidelberg, to appear.
2 V. Solovyev, Structure, Properties and Computer Identification of Eucaryotic Genes, in Bioinformatics - From Genomes to Drugs (T. Lengauer, ed.), Wiley-VCH, Heidelberg, to appear.
3 S. He, H. A. Scheraga, Brownian Dynamics Simulations of Protein Folding. J. Chem. Phys. 108 (1998) 287-300.
4 D. Baker, A Surprising Simplicity to Protein Folding, Nature 405 (2000) 39-42.
5 J. Kostrowicki, H. A. Scheraga, Application of the Diffusion Equation Method for Global Optimization of Oligopeptides, J. Phys. Chem. 96 (1992) 7442-7449.
6 R. L. Dunbrack, Jr., Homology Modeling in Biology and Medicine, in Bioinformatics - From Genomes to Drugs (T. Lengauer, ed.), Wiley-VCH, Heidelberg, to appear.
7 R. M. Zimmer, Protein Structure Prediction and Applications in Structural Genomics, Protein Function Assignment and Drug Target Finding, in Bioinformatics - From Genomes to Drugs (T. Lengauer, ed.), Wiley-VCH, Heidelberg, to appear.
8 R. M. Zimmer, R. Thiele, Fast Protein Fold Recognition and Accurate Sequence-Structure Alignment, Proceedings of German Conference on Bioinformatics (GCB'96), R. Hofestädt, T. Lengauer, M. Löffler, D. Schomburg, eds., Springer Lecture Notes in Computer Science No. 1278 (1997) 137-148.
9 R. Thiele, R. M. Zimmer, T. Lengauer, Protein Threading by Recursive Dynamic Programming. J. Mol. Biol. 290, 3 (1999) 757-779
10 Proteins: Structure, Function and Genetics, Suppl: Third Meeting on the Critical Assessment of Techniques for Protein Structure Prediction (1999). http://PredictionCenter.llnl.gov/casp3/Casp3.html
11 T. Lengauer, R. Zimmer, Structure Prediction Methods for Drug Design, Briefings in Bioinformatics 1,3 (2000)
12 S. Anderson, Structural Genomics: Keystone for a Human Proteome Project, Nat. Struct. Biol. 6, 1 (1999) 11-12.
13 I. Andricioaei, J. E. Straub, Finding the Needle in the Haystack: Algorithms for Conformational Optimization, Computers in Physics 10, 5 (1996) 449.
14 L. Piela, J. Kostrowicki, H. A. Scheraga, The Multiple-Minima Problem in the Conformational Analysis of Molecules. Deformation of the Potential Energy Hypersurface by the Diffusion Equation Method, J. Phys. Chem. 93 (1989) 3339-3346.
15 P. Kollman, Free Energy Calculations: Applications to Chemical and Biochemical Phenomena, Chemical Reviews 93 (1993) 2395-2417.
16 M. K. Gilson et al., The Statistical-Thermodynamic Basis for Computation of Binding Affinities: A Critical Review, Biophysical Journal 72 (1997) 1047-1069.
17 J. D. Hirst, Predicting ligand binding energies, Current Opinion in Drug Discovery and Development 1 (1998) 28-33.
18 M. Rarey, M. Stahl, G. Klebe, Screening of Drug Databases, in Bioinformatics - From Genomes to Drugs (T. Lengauer, ed.), Wiley-VCH, Heidelberg, to appear.
19 E. O. Voit, Computational Analysis of Biochemical Systems, Cambridge University Press (2000)
20 M. Tomita et al., E-CELL: Software Environment for Whole-Cell Simulation, Bioinformatics 15, 1 (1999) 72-84.
21 A. Zien, R. Küffner, R. Zimmer, T. Lengauer, Analysis of Gene Expression Data With Pathway Scores, Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB2000), AAAI Press (2000) 407-417.
22 S. Fuhrman, S. Liang, X. Wen, R. Somogyi, Target Finding in Genomes and Proteomes, in Bioinformatics - From Genomes to Drugs (T. Lengauer, ed.), Wiley-VCH, Heidelberg, to appear.
23 P.-A. Binz et al., Proteome Analysis, in Bioinformatics - From Genomes to Drugs (T. Lengauer, ed.), Wiley-VCH, Heidelberg, to appear.
24 P. Bork, E. V. Koonin, Predicting Function from Protein Sequences: Where are the Bottlenecks? Nature Genet. 18 (1998) 313-318.
25 M. A. Huynen, Y. Diaz-Lazcoz and P. Bork, Differential Genome Display, Trends in Genetics 13 (1997) 389-390.
26 E. M. Marcotte et al., Detecting Protein Function and Protein-Protein Interactions from Genome Sequences, Science 285, 5428 (1999) 751-753.
27 http://www.ncbi.nlm.nih.gov/PubMed/
28 H. Shatkay et al., Genes, Themes and Microarrays, Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB2000), AAAI Press (2000) 317-328.
29 E. A. Clark et al., Genomic Analysis of Metastasis Reveals an Essential Role for RhoC, Nature 406 (2000) 532-535.
30 B. Kramer, G. Metz, M. Rarey, T. Lengauer, Ligand Docking and Screening with FlexX, Medicinal Chemistry Research 9, 7/8 (1999) 463-478.
31 M. Rarey, J. S. Dixon, Feature Trees: A New Molecular Similarity Measure Based on Tree Matching, J Comput Aided Mol Des. 12, 5 (1998) 471-490.
32 M. J. Rieder, D. A. Nickerson, Analysis of Sequence Variations, in Bioinformatics - From Genomes to Drugs (T. Lengauer, ed.), Wiley-VCH, Heidelberg, to appear.
33 A. Krogh, M. Brown, I. S. Mian, K. Sjölander, D. Haussler, Hidden Markov Models in Computational Biology: Application to Protein Modeling, J. Mol. Biol. 235 (1994) 1501-1531.
34 S. R. Eddy, Profile Hidden Markov Models, Bioinformatics 14,9 (1998) 755-763.
35 T. Jaakkola, M. Diekhans, D. Haussler, Using the Fisher Kernel Method to Detect Remote Protein Homologies, Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB'99), AAAI Press (1999) 149-158.
36 A. Zien et al., Engineering Support Vector Machines Kernels that Recognize Translation Initiation Sites, Bioinformatics (2000) to appear.
37 A. Zien, R. Zimmer, T. Lengauer, A Simple Iterative Approach to Parameter Optimization, Proceedings of the Fourth Annual Conference on Research in Computational Molecular Biology (RECOMB'00), ACM Press (2000) 318-327.
38 R. Zimmer, T. Lengauer, Fast and Numerically Stable Parametric Alignment of Biosequences. Proceedings of the First Annual Conference on Research in Computational Molecular Biology (RECOMB'97) (1997) 344-353.
39 http://scop.mrc-lmb.cam.ac.uk/scop/
40 http://www.biochem.ucl.ac.uk/bsm/cath/



URL for this page: http://domino.mpi-inf.mpg.de/internet/news.nsf/Spotlight/20011015