All 1,219,747 Oryza sativa EST sequences were downloaded from NCBI dbEST (release of 03-26-2008). In addition, we collected 95,601 raw EST sequencing trace files, which shared 68,937 sequence reads with the downloaded GenBank data. The core component of WebTraceMiner, a trace-mining tool that we developed, was used to detect cDNA termini in the raw EST sequencing traces. cDNA termini are defined as a set of diagnostic sequence features (including poly(A)/(T) tails) that mark the ends of cDNA inserts, and therefore the 3' or 5' ends of mRNA transcripts. The same tool was also used to detect poly(A)/(T) tails in the GenBank data. In silico authenticated poly(A)/(T) tails in ESTs were defined as oligo(A)/(T) tracts with a minimum length of 10 nt, allowing for a 2-nt error (i.e., mismatch, insertion, or deletion). In addition, the first eight and the last eight nucleotides of each tract must all be adenines (or thymines if the tract is read from the opposite direction). Using GMAP, a stand-alone program for aligning cDNA sequences to a genome and generating gene structures, we mapped all ESTs to the rice genome and confirmed that the identified poly(A)/(T) tails were not derived from the genome sequence, thus eliminating internal priming artifacts.
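
The tail criteria described above can be illustrated with a minimal Python sketch. This is not the WebTraceMiner implementation; the function names and parameters are hypothetical, and as a simplifying assumption every non-A (or non-T) base inside the tract is counted as one error, which covers mismatches and insertions but does not model deletions explicitly.

```python
from typing import Optional


def is_authenticated_tail(tract: str, base: str = "A", min_len: int = 10,
                          max_errors: int = 2, flank: int = 8) -> bool:
    """Check a candidate tract against the criteria stated in the text.

    Assumption: each base that differs from `base` counts as one error.
    """
    tract = tract.upper()
    if len(tract) < min_len:
        return False
    # The first and last `flank` nucleotides must all be the tail base.
    if tract[:flank] != base * flank or tract[-flank:] != base * flank:
        return False
    errors = sum(1 for c in tract if c != base)
    return errors <= max_errors


def find_3prime_tail(seq: str, base: str = "A", min_len: int = 10) -> Optional[int]:
    """Return the 0-based start of the longest qualifying suffix, or None."""
    seq = seq.upper()
    # Scanning from the left means the first hit is the longest suffix.
    for start in range(len(seq) - min_len + 1):
        if is_authenticated_tail(seq[start:], base=base, min_len=min_len):
            return start
    return None
```

For a poly(T) tract at the 5' end of a read, the same check could be applied with `base="T"` to a prefix of the read instead of a suffix.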

      We adopted the RAP Rice Genome Annotation (Build 4.0), which contains 29,389 gene loci. Based on the representative mRNA of each locus, we then categorized the in silico authenticated poly(A)/(T) sites into 5'-UTR, 3'-UTR, CDS, and intron regions. In this way, we identified more than 80,000 distinct poly(A) sites within genic regions of the current rice genome annotation. Our data also revealed many poly(A) sites located in intergenic regions of the current rice genome annotation.
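
The categorization step can be sketched as follows. This is an illustrative example only, assuming that each representative mRNA is given as a list of exon intervals plus the genomic bounds of its coding region; the function name, the coordinate convention (1-based, inclusive), and the input layout are assumptions, not part of the RAP annotation format.

```python
from typing import List, Tuple


def classify_site(pos: int, exons: List[Tuple[int, int]],
                  cds_start: int, cds_end: int, strand: str = "+") -> str:
    """Assign a poly(A) site to a region of one representative mRNA.

    exons: (start, end) genomic intervals, inclusive.
    cds_start/cds_end: genomic bounds of the coding region (cds_start <= cds_end).
    """
    gene_start = min(s for s, _ in exons)
    gene_end = max(e for _, e in exons)
    if pos < gene_start or pos > gene_end:
        return "intergenic"
    # Inside the locus but outside every exon means the site is intronic.
    if not any(s <= pos <= e for s, e in exons):
        return "Intron"
    if cds_start <= pos <= cds_end:
        return "CDS"
    # Exonic but non-coding: which UTR depends on the transcript strand.
    upstream = pos < cds_start
    if strand == "-":
        upstream = not upstream
    return "5'-UTR" if upstream else "3'-UTR"
```

In this sketch a site in an intron that lies within a UTR is still reported as "Intron", and a site outside the locus is only "intergenic" with respect to that one locus; a full pipeline would need to check every annotated locus before calling a site intergenic.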

      Please click "Continue" to retrieve the poly(A) site list.