Our EST clusters build on the NCBI Pinus taeda UniGene (Build #11). Each UniGene cluster is intended to contain the mRNA transcripts of a single genomic locus. However, because of the clustering algorithm UniGene uses, a cluster may also contain transcripts from paralogous genes or from other members of the same gene family. Conversely, transcript isoforms derived from the same gene may be split across different UniGene clusters.

    We took a distinctive approach to EST clustering:

    (1) retrieve the EST membership lists of all UniGene clusters from NCBI;

    (2) use our clean EST sequences, which are annotated with the cDNA termini that delimit transcript ends;

    (3) run CAP3 separately on the clean ESTs of each UniGene cluster to assemble consensus (contig) sequences.

    Consequently, a single UniGene cluster may yield zero contigs (when it contains no clean EST sequence), one contig, or multiple contigs. Our EST-Contig-Browser visualizes these contigs so that biologists can explore transcript ends, SNPs, and other features, and it supports searching, sorting, and filtering.
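The zero, one, or multiple outcome per cluster can be tallied with a short Python sketch. It assumes the hypothetical file layout `<outdir>/<cluster_id>.fasta.cap.contigs`, that is, CAP3 run on one FASTA file per cluster. A cluster with no output file (no clean EST, so CAP3 was never run) counts as 0. CAP3 reports unassembled reads in a separate `.cap.singlets` file. How such singlets are counted is not specified above, so this sketch counts only records in `.cap.contigs`.

```python
"""Sketch: tally how many CAP3 contigs each UniGene cluster produced."""
from collections import Counter
from pathlib import Path


def count_fasta_records(text):
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def contigs_per_cluster(cluster_ids, outdir):
    """Return {cluster_id: number of CAP3 contigs}.

    Assumes the hypothetical layout <outdir>/<cluster_id>.fasta.cap.contigs;
    a missing file (CAP3 never run for that cluster) counts as 0.
    """
    outdir = Path(outdir)
    counts = {}
    for cid in cluster_ids:
        path = outdir / f"{cid}.fasta.cap.contigs"
        counts[cid] = count_fasta_records(path.read_text()) if path.exists() else 0
    return counts


def outcome_summary(counts):
    """Bin clusters into the '0', '1', and 'multiple' contig outcomes."""
    return Counter(
        "0" if n == 0 else "1" if n == 1 else "multiple" for n in counts.values()
    )
```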
