So far, EST terminus information has been largely ignored in the major public-domain EST databases.  This is mainly because most sequence-processing packages and pipelines focus on "cleaning" or "trimming" spurious sequences from ESTs, including vector fragments, insert-flanking restriction endonuclease recognition sites, adapter sequences and/or polyA/T tails.  However, our recent research showed that conventional bioinformatics pipelines, which do not inspect cDNA terminus structures, can mishandle some raw EST trace files.  As a result, they can produce unclean or under-trimmed EST sequences, which may have cascading and deleterious effects on the many downstream EST applications that use these sequences.  On the other hand, many sequences have been over-trimmed with regard to their terminus structures, which loses the directional and positional information of the mRNA 3'/5' ends.

    Within the C. reinhardtii research community, many ESTs have not yet been submitted to GenBank but have been used to create public EST assemblies and for genome annotation.  Fortunately, we were able to obtain these EST sequences for comparison.  Our comparative analyses found that many of these community EST sequences are incorrectly trimmed.  Using the following web interfaces, you can find out how many community sequences are over-trimmed or under-trimmed.