SeqTU team: Questions and Answers

Q: What is a TU?

A: A transcription unit (TU) is a basic functional unit consisting of genes that are arranged consecutively in a genomic region, share the same orientation, and are co-expressed through transcriptional co-regulation.

Q: Why do we need to predict TUs?

A: Although operons can be reliably predicted from genomic sequence data alone, their practical utility has been somewhat limited, particularly in functional studies of bacterial cells. It has been widely observed that genes in the same operon are not always co-regulated; that is, the expressed portion of an operon is determined dynamically by specific conditions. This raises the need for accurate identification of the TUs revealed by a given set of transcriptomic data. Such TUs are condition-dependent and may overlap with each other in the genome.

Q: What can the SeqTU server do?

A: SeqTU is a web server that identifies TUs in bacteria based on the organism’s RNA-seq data. For a given set of RNA-seq data, the algorithm predicts that a set of consecutive genes forms a TU if the estimated RNA levels are stable and continuous across the whole genomic region spanned by these genes. SeqTU uses the BWA software to map short RNA reads to the underlying genome, and the mapped reads are used to estimate the expression level at each base pair.
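
The idea of "stable and continuous" RNA levels can be illustrated with a small Python sketch. This is not the SeqTU algorithm itself, which relies on a trained, organism-specific predictor; it is a simplified heuristic under stated assumptions. The function names, the 0-based half-open coordinates, and the `min_depth` and `max_fold` thresholds are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end); an assumption of this sketch


def per_base_coverage(reads: List[Interval], genome_length: int) -> List[int]:
    """Count how many mapped reads cover each base pair."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1   # a read begins covering at `start`
        diff[end] -= 1     # and stops covering at `end`
    coverage = []
    depth = 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def mean(values: List[int]) -> float:
    return sum(values) / len(values) if values else 0.0


def same_tu(coverage: List[int], gene_a: Interval, gene_b: Interval,
            min_depth: int = 1, max_fold: float = 2.0) -> bool:
    """Illustrative heuristic (not SeqTU's trained predictor).

    gene_a lies upstream of gene_b on the same strand. The pair is called
    co-transcribed when the intergenic gap is covered at every base
    (continuity) and the mean expression levels of the two genes differ
    by at most `max_fold` (stability).
    """
    gap = coverage[gene_a[1]:gene_b[0]]
    if any(depth < min_depth for depth in gap):
        return False
    level_a = mean(coverage[gene_a[0]:gene_a[1]])
    level_b = mean(coverage[gene_b[0]:gene_b[1]])
    if level_a == 0 or level_b == 0:
        return False
    return max(level_a, level_b) / min(level_a, level_b) <= max_fold
```

For example, with reads spanning two adjacent genes and the gap between them, `same_tu` returns `True`; if the reads cover each gene but leave the gap uncovered, it returns `False`.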

Q: What is the unique feature of the SeqTU algorithm?

A: A unique property of the program is that it includes a trainer that builds an organism-specific TU predictor from the provided RNA-seq data, using a machine learning approach. In addition, the program uses NCBI’s Sequence Read Archive (SRA) database as its default data source, so a user who wants to identify TUs in a dataset held in SRA only needs to specify the identifier of the relevant dataset rather than supply the data to the program.

Q: How can we upload an RNA-seq dataset?

A: Because RNA-seq datasets are large, users cannot upload their own data to the SeqTU server. Instead, the server fetches the dataset from the SRA database: search for and select the ID of the dataset through the search box in the middle of the Home page, and the server will automatically download that RNA-seq dataset.

Q: How can we find the correct reference genome?

A: We use an in-house program to provide a table listing all possible reference genomes retrieved from the NCBI BioSample database. If the desired reference genome is not listed in the table, the user can instead enter the NC number of the reference genome.

Q: How does SeqTU process the short-read mapping?

A: Once the user selects or provides the reference genome, the SeqTU server maps the RNA-seq reads to the reference genome using the BWA program on a high-performance computer with 128 cores and 512 GB of memory. An additional step uses BLAST to handle the RNA reads that remain unmapped. SeqTU keeps a record of RNA-seq datasets that have already been mapped to a specific genome. When the same dataset is requested again for mapping to the same reference genome, the server retrieves the previous mapping results to save computing time.
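
The reuse of earlier mapping results described above amounts to a cache keyed by the dataset and the reference genome. The Python sketch below shows one way such a lookup could work; it is an assumption for illustration, not the server's actual implementation. The function `cached_mapping`, the file-naming scheme, and the `run_mapper` callback (which stands in for the real BWA and BLAST steps) are all hypothetical.

```python
from pathlib import Path
from typing import Callable

# Hypothetical callback: (dataset_id, genome_id, output_path) -> None.
# In a real pipeline it would run the read mapper and write its output.
Mapper = Callable[[str, str, Path], None]


def cached_mapping(dataset_id: str, genome_id: str,
                   cache_dir: Path, run_mapper: Mapper) -> Path:
    """Return the mapping result for (dataset, genome), mapping only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One result file per (dataset, reference genome) pair; the naming is an assumption.
    result = cache_dir / f"{dataset_id}__{genome_id}.sam"
    if not result.exists():
        run_mapper(dataset_id, genome_id, result)  # expensive step, done once
    return result
```

A second request with the same dataset and genome identifiers returns the stored file without calling `run_mapper` again, while a different reference genome triggers a new mapping run.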

Q: How can we access the final prediction results?

A: The final prediction results are shown in the bottom table on the results page. Each row lists a TU number together with its start position, end position, strand, gene symbol, and a download link to the computed expression levels.