ECCB 2002 Poster

Title: Information Management of Zebrafish Full-length cDNA project
Yuyu, Kuang; Oh, Tania; Mathavan, S.; Le Ber, Pierre Louis; Dowts, Heidi; Kolatkar, Prasanna R.
Genome Institute of Singapore

We are curating the full-length zebrafish cDNA sequences that our sequencing group is producing en masse.

In the current phase, we first process the raw data by removing vector sequences, which has produced more than 20,000 cleaned sequences. These are then analyzed to pick out the full-length clones for annotation.
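As an illustration of the vector-removal step, the following is a minimal sketch only. It assumes that the flanking vector sequences are known and that exact matching of a short anchor is enough; the function name `trim_vector`, its parameters, and the example sequences are hypothetical and do not describe the tools used in the project.

```python
def trim_vector(read, vector_5p, vector_3p, min_overlap=8):
    """Strip flanking vector sequence from a read by exact anchor matching.

    Hypothetical helper: `vector_5p` is the vector sequence expected upstream
    of the insert and `vector_3p` the vector sequence expected downstream.
    """
    read = read.upper()

    # The last `min_overlap` bases of the 5' vector mark where the insert begins.
    anchor_5p = vector_5p.upper()[-min_overlap:]
    pos = read.find(anchor_5p)
    if pos != -1:
        read = read[pos + len(anchor_5p):]

    # The first `min_overlap` bases of the 3' vector mark where the insert ends.
    anchor_3p = vector_3p.upper()[:min_overlap]
    pos = read.find(anchor_3p)
    if pos != -1:
        read = read[:pos]

    return read


# Made-up sequences, for illustration only.
VECTOR_5P = "GGCCGCTCTAGAACTAGTGGATCC"
VECTOR_3P = "GAATTCCTGCAGCCCGGGGGATCC"
INSERT = "ATGGCTAGCTAGGCTTAA"

cleaned = trim_vector(VECTOR_5P + INSERT + VECTOR_3P, VECTOR_5P, VECTOR_3P)
```

A production pipeline would normally tolerate sequencing errors by using alignment-based screening rather than exact matching.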

To predict full-length clones, we first cluster and assemble the sequences to obtain a consensus sequence for each cluster.
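The clustering idea can be sketched as follows. This is an assumption-laden toy, not the project's method: it groups sequences that share a minimum number of k-mers (single linkage via union-find), and the names `kmers`, `cluster_sequences`, `k`, and `min_shared` are hypothetical. Assembly of each cluster into a consensus would be handled by a dedicated assembler and is not shown.

```python
from itertools import combinations


def kmers(seq, k=8):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs, k=8, min_shared=3):
    """Group sequences sharing at least `min_shared` k-mers (single linkage).

    Returns a sorted list of clusters, each a list of indices into `seqs`.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s.upper(), k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if len(profiles[i] & profiles[j]) >= min_shared:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up reads: the first two overlap, the third is unrelated.
read_a = "ATGGCGTACGTTAGCCGATAGGCT"
read_b = read_a[6:] + "ACACACACAC"
read_c = "CCCCCCCCCCTTTTTTTTTTGGGG"

groups = cluster_sequences([read_a, read_b, read_c])
```

The all-against-all comparison is quadratic in the number of sequences, so a collection of tens of thousands of reads would need an indexed approach instead.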

We then analyze each consensus sequence and predict whether the clone contains the complete 5' UTR. We also predict from its predicted protein whether the sequence is novel or known.
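One common heuristic for judging 5' completeness, shown here purely as an assumption and not as the criterion used in this project, is to look for an in-frame stop codon upstream of the start codon of the longest open reading frame. The function names `longest_orf` and `has_upstream_stop` and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. Returns None if no
    complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def has_upstream_stop(seq, orf_start):
    """True if a stop codon lies upstream of `orf_start` in the same frame."""
    seq = seq.upper()
    for i in range(orf_start - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


# Made-up sequences: the first has an in-frame TAG in its 5' UTR, the second does not.
with_stop = "GCCTAGCCG" + "ATGGCTGCTTGA" + "CC"
without_stop = "GCCGCCGCC" + "ATGGCTGCTTGA"

orf = longest_orf(with_stop)
likely_complete = has_upstream_stop(with_stop, orf[0])
```

An upstream in-frame stop only suggests that the annotated start codon is the true one; it does not prove that the whole 5' UTR was captured.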

The project has so far identified more than 2000 full-length cDNA sequences, and we expect to have more than 200,000 within the next 18 months. A robust, automated pipeline for processing the data emanating from the LIMS is therefore critical to the project.

These full-length cDNAs will allow a detailed look at alternative splicing and at related questions involving differential expression in various tissues and in human diseases such as liver cancer.