Overview:


 How to use DSAP:

(i) Input file

A single Solexa sequencing run produces two kinds of data:

  1. Raw data (a FASTQ file containing an identifier, the sequence read and a quality value for each base). FASTQ files are usually gigabytes in size, which makes them unsuitable for sending over the web.
  2. Sequence tags (a tab-delimited file that holds only each unique sequence read (tag) and its corresponding number of copies).

A Biopieces script is available to transform the FASTQ file into unique sequence tags.

Biopieces command: ( read_fastq -i INPUT.fastq | uniq_seq -c | sort_records -r -k SEQ_COUNTn | write_tab -k SEQ_COUNT,SEQ -xo OUTPUT.tag )

DSAP accepts a sequence tag file under 300 Mb as input. (The file must follow the format shown in Figure 1.)
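For readers without Biopieces installed, the same collapsing step can be sketched in Python. This is only an illustration, not part of DSAP: the function names are hypothetical, the FASTQ input is assumed to use plain four-line records, and the column order (count, then sequence) is inferred from `-k SEQ_COUNT,SEQ` in the command above, so it should be checked against Figure 1 before uploading.

```python
from collections import Counter
import io


def fastq_to_tags(fastq_lines):
    """Collapse FASTQ reads into (sequence, copy number) pairs, most abundant first.

    Assumes four-line FASTQ records (header, sequence, '+', qualities).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            seq = line.strip().upper()
            if seq:
                counts[seq] += 1
    # Sort by descending copy number, then by sequence for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def write_tag_file(tags, handle):
    """Write one 'count<TAB>sequence' line per unique tag (assumed column order)."""
    for seq, count in tags:
        handle.write(f"{count}\t{seq}\n")


# Tiny made-up example: three reads, two of them identical.
example_fastq = "\n".join([
    "@read1", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
    "@read2", "TAGCTTATCAGACTGATGTTGA", "+", "I" * 22,
    "@read3", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
])

tags = fastq_to_tags(example_fastq.splitlines())
buffer = io.StringIO()
write_tag_file(tags, buffer)
print(buffer.getvalue(), end="")
```

For a real run, pass an open FASTQ file handle to `fastq_to_tags` and an open output file to `write_tag_file`; reading line by line keeps memory use proportional to the number of unique tags rather than the number of reads.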

Figure 1: Input file format

 

(ii) Parameters

  • Choose species: The user can upload a sequence tag file under 300 Mb and then choose from among 115 species, or use the default of all species if the organism is not listed (as shown in Figure 2).
  • Do not consider adaptor sequences: Tick this checkbox if your sequence tag file has already had its adaptors removed. Otherwise DSAP treats the file as adaptor-containing, discards tags without a reliable 3' adaptor and skips tags shorter than 16 nt after removal of the 3' adaptor. Solexa uses standard 3' and 5' adaptor sequences in its small RNA library preparation kit, so the user does not need to upload the adaptor sequences.
  • Do not consider poly-A, T, C, G: The user can choose whether to remove continuous poly-A/T/C/G reads in the cleanup step, since many poly-A, T, C, G or N sequences come from sequencing errors.
  • Use test dataset: We provide a sequence tag file with 329,334 tags as a test dataset.
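The effect of the adaptor and poly-A/T/C/G options can be illustrated with a minimal Python sketch. It is not DSAP's implementation. Only the 16 nt cutoff comes from the text above; the adaptor string is a made-up placeholder (not the real Solexa adaptor), and the definitions of a "reliable" adaptor (`min_match`) and of a poly-N read (`max_run`) are assumptions chosen for illustration.

```python
from itertools import groupby

MIN_TAG_LENGTH = 16  # from the text: tags < 16 nt after adaptor removal are skipped


def trim_3prime_adaptor(tag, adaptor, min_match=6):
    """Return the insert preceding the 3' adaptor, or None if no adaptor is found.

    Assumption: an adaptor is 'reliable' if the full adaptor occurs in the tag,
    or if the tag ends with an adaptor prefix of at least min_match (>= 1) bases.
    """
    pos = tag.find(adaptor)
    if pos != -1:
        return tag[:pos]
    # Try progressively shorter adaptor prefixes at the 3' end of the tag.
    for k in range(min(len(adaptor) - 1, len(tag)), min_match - 1, -1):
        if tag.endswith(adaptor[:k]):
            return tag[:-k]
    return None


def longest_run(tag):
    """Length of the longest stretch of one repeated base."""
    return max((len(list(group)) for _, group in groupby(tag)), default=0)


def clean_tags(tags, adaptor, adaptor_removed=False, filter_poly=True,
               min_match=6, max_run=10):
    """Apply the cleanup options to (sequence, count) pairs."""
    kept = []
    for seq, count in tags:
        if not adaptor_removed:
            seq = trim_3prime_adaptor(seq, adaptor, min_match)
            if seq is None:
                continue  # no reliable 3' adaptor
            if len(seq) < MIN_TAG_LENGTH:
                continue  # too short after adaptor removal
        if filter_poly and longest_run(seq) >= max_run:
            continue  # assumed poly-A/T/C/G/N criterion
        kept.append((seq, count))
    return kept


PLACEHOLDER_ADAPTOR = "ACGTACGTACGT"  # hypothetical, for demonstration only
raw_tags = [
    ("TGAGGTAGTAGGTTGTATAGTT" + PLACEHOLDER_ADAPTOR, 5),  # kept after trimming
    ("TAGCTTATCAGACTGATGTTGA", 3),                        # no adaptor: discarded
    ("TGAGGTAGTAG" + PLACEHOLDER_ADAPTOR, 2),             # 11 nt insert: skipped
    ("A" * 20 + PLACEHOLDER_ADAPTOR, 7),                  # poly-A: removed
]
cleaned = clean_tags(raw_tags, PLACEHOLDER_ADAPTOR)
print(cleaned)
```

Passing `adaptor_removed=True` mirrors ticking the "Do not consider adaptor sequences" checkbox: the tags are taken as they are and only the optional poly-N filter is applied.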

Figure 2: DSAP input page

 


 Description of Output:

  • The output page is composed of several blocks that represent the analysis workflow of DSAP:

    (i) job status

    (ii) cleanup

    (iii) clustering

    (iv) non-coding RNA matching

    (v) known microRNA matching

    (vi) summary of job

    (vii) comparative microRNAs analysis

 

(i) job status

After a successful upload, the web server returns a page that uses a timestamp as its identifier (job ID). Job status can be monitored through a real-time meter graph that shows the exact run time of each step. Users can also bookmark this page for future reference.

(ii) cleanup

A bar chart dynamically records the number of sequence tags during the cleanup process. The block also provides a link to detailed information about the length distribution of the attached adaptors.

Cleaned sequence tags (in FASTA format) are available through the download link.

 

(iii) clustering

This block shows the clustering state of the cleaned sequence tags and provides each unique sequence cluster, together with its member information, in a tab-delimited file.

(iv) non-coding RNA matching, (v) known microRNA matching

The fourth and fifth blocks summarize the results of matching the unique tag clusters against Rfam and miRBase, respectively. Each matched RNA family and its expression level are summarized in a multi-colour clickable bar chart that links to external databases such as miRBase for further details. All results can be downloaded from the website as a tab-delimited text file. Representative sequence tags that could not be identified in the known microRNA matching step can also be downloaded for the identification of putative novel miRNAs.

Note: The alignment of unique sequence clusters with the corresponding miRNA hairpin is optimized for the observation of isomiRs.

(vi) summary of job

(vii) comparative microRNAs analysis

DSAP can display microRNA expression levels from different jobs as a log2-transformed color matrix. A cross-species comparative function is also provided to show the distribution of the identified miRNAs across the species deposited in miRBase.
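As a rough illustration of what a log2-transformed matrix across jobs contains, the following Python sketch builds one from per-job miRNA counts. The job IDs and counts are made up, and the pseudocount of 1 (used so that miRNAs absent from a job map to 0 instead of an undefined logarithm) is an assumption; the text does not state how DSAP normalizes counts or handles zeros.

```python
import math


def log2_matrix(job_counts, pseudocount=1.0):
    """Build a miRNA x job matrix of log2(count + pseudocount).

    job_counts maps a job ID to a dict of miRNA name -> read count.
    Returns (mirna_names, job_ids, matrix), with one row per miRNA.
    """
    job_ids = list(job_counts)
    mirnas = sorted({name for counts in job_counts.values() for name in counts})
    matrix = [
        [math.log2(job_counts[job].get(name, 0) + pseudocount) for job in job_ids]
        for name in mirnas
    ]
    return mirnas, job_ids, matrix


# Hypothetical job IDs and counts, for demonstration only.
jobs = {
    "job_A": {"hsa-let-7a": 1023, "hsa-miR-21": 3},
    "job_B": {"hsa-let-7a": 255},
}
mirnas, job_ids, matrix = log2_matrix(jobs)
for name, row in zip(mirnas, matrix):
    print(name, row)
```

Each row can then be mapped to a color scale, so that a difference of one unit between two jobs corresponds to a twofold difference in (pseudocount-adjusted) read count.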

Demonstration output

 


 

 

miRNAomics Page:

1. Paste your own miRNA expression profiles

DSAP accepts users' own miRNA profiles from different experimental methods, such as stem-loop real-time PCR, microarray or SOLiD sequencing, but users should take care to avoid file format problems.

Demonstration output:

2. Fill in the job IDs provided by DSAP (max. 5)

Demonstration output: