This database contains 309,970 raw DNA sequencing trace files of Pinus taeda Expressed Sequence Tags (ESTs). The underlying cDNA libraries were constructed using oligo-dT primers and represent various sampling strategies with regard to genotype, organ, developmental stage, growth condition and/or environmental stressor. All traces have been reprocessed with our novel bioinformatics pipeline, WebTraceMiner, which focuses on detecting in-silico verified cDNA termini (ends). In 53% of the ESTs, the cDNA terminus or termini matched the structure expected from the cDNA library construction and were accompanied by a high-quality, clean region of >=75 nt. Compared with our final sequences, many of the corresponding ESTs in GenBank dbEST contain untrimmed terminal sequence, which can adversely affect many downstream EST applications. Conversely, many public ESTs are over-trimmed with respect to their identified terminal structures, which results in a loss of directional, positional and structural information about the cDNA termini and therefore about the mRNA 3'/5' ends. Taking advantage of NCBI Pinus taeda UniGene (Build #6), we have clustered our clean EST data. Any given UniGene cluster may yield one or several contigs. Using our EST-Contig Browser, biologists can view the nucleotide alignment of individual ESTs, which visualizes insertions, deletions and mismatches and highlights the cDNA termini that mark the transcript 3'/5' ends.
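
The kind of check described above (an oligo-dT-derived terminus plus a clean region of >=75 nt) can be illustrated with a minimal Python sketch. This is not the WebTraceMiner algorithm, which is not specified here: the function names, the 10-nt minimum homopolymer length and the use of a plain regular expression are assumptions made for illustration only. Only the 75-nt threshold comes from the description.

```python
import re

MIN_CLEAN_LEN = 75  # clean-region threshold quoted in the description
MIN_TAIL_LEN = 10   # hypothetical minimum poly(A)/poly(T) run; not from the source


def find_terminus(seq, min_tail=MIN_TAIL_LEN):
    """Return (clean_region, terminus_kind) for a toy EST read.

    terminus_kind is "polyA_end" if the read ends in a poly(A) run,
    "polyT_start" if it begins with a poly(T) run (a poly(A) tail read
    from the opposite strand), and None if neither is found.
    """
    seq = seq.upper()
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    if tail:
        return seq[:tail.start()], "polyA_end"
    head = re.match(r"T{%d,}" % min_tail, seq)
    if head:
        return seq[head.end():], "polyT_start"
    return seq, None


def passes_filter(seq):
    """True if a terminus was detected and the clean region is long enough."""
    clean, kind = find_terminus(seq)
    return kind is not None and len(clean) >= MIN_CLEAN_LEN
```

A real pipeline would also have to work from base-call quality values and recognise vector and adaptor sequence, which this sketch ignores. Keeping the terminus kind alongside the trimmed sequence reflects the point made above: discarding it loses the directional information about the mRNA 3'/5' end.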
