The Maìstas Server Help Page
Maìstas is a fully automatic pipeline for building and assessing three-dimensional models of alternative splicing isoforms. Whenever possible, the server builds comparative structural models for all the splicing isoforms of a submitted gene or set of genes. The models are then assessed for their suitability to exist in the monomeric state. A warning in the model assessment therefore does not rule out the structure: a multimeric state may still stabilize it.
In addition, the exonic coordinates of each splicing isoform are mapped onto the final models. This mapping can be visualized through a Jmol applet.
The server then automatically stores the models and the corresponding analysis in a relational database, which shortens the time needed for future modelling requests on the same gene products. However, it may be advisable to re-model the genes if major updates of the genome or structure databases have taken place in the meantime.
When a query is correctly uploaded, the job is launched on our cluster. The Maìstas pipeline includes:
- A search of the BioMart and Ensembl databases for all the splicing isoforms of the family.
- Template searching and target/template sequence alignment using HHsearch 1.1.5.
- Protein model building (one model for each isoform sequence) using MODELLER 9v8.
- Automatic storage of the produced models in a relational database.
- Plausibility assessment of protein models in terms of their suitability to exist in the monomeric state.
The target sequence, the template(s) and the alignment obtained by HHsearch are automatically analysed. The models are inspected to detect possible gaps in the coordinate set (for example because of the absence of electron density in X-ray structures). If these regions are present at the N- or C-terminus of the protein, they are trimmed, otherwise a warning is issued. A warning is also issued if the alignment includes insertions larger than fifty residues that might correspond to an inserted domain, or deletions larger than twenty residues.
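The checks described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the server's own code: the function names are hypothetical, the alignment is assumed to be two equal-length strings with "-" marking gaps, an insertion is assumed to mean target residues aligned to gaps in the template, and the coordinate set is assumed to be a list of booleans that are True where coordinates exist. Only the two thresholds (fifty and twenty residues) come from the text.

```python
import re

INSERTION_LIMIT = 50  # from the text: insertions larger than fifty residues
DELETION_LIMIT = 20   # from the text: deletions larger than twenty residues


def gap_runs(aligned_seq):
    """Return (start_index, length) for every run of '-' in an aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def alignment_warnings(target, template):
    """Collect warnings for long insertions or deletions in an alignment.

    Assumption: an insertion is a run of gaps in the template (extra target
    residues); a deletion is a run of gaps in the target.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    warnings = []
    for start, length in gap_runs(template):
        if length > INSERTION_LIMIT:
            warnings.append(f"insertion of {length} residues at column {start + 1}")
    for start, length in gap_runs(target):
        if length > DELETION_LIMIT:
            warnings.append(f"deletion of {length} residues at column {start + 1}")
    return warnings


def trim_terminal_gaps(resolved):
    """Trim residues without coordinates at both termini.

    Returns (first_index, last_index, has_internal_gap), or None when no
    residue has coordinates. An internal gap would call for a warning.
    """
    if True not in resolved:
        return None
    first = resolved.index(True)
    last = len(resolved) - 1 - resolved[::-1].index(True)
    return first, last, not all(resolved[first:last + 1])
```

For instance, `alignment_warnings` flags a 60-residue run of template gaps as a possible inserted domain, while `trim_terminal_gaps` drops unresolved residues at either end and reports whether any gap remains inside the trimmed range.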
Maìstas takes as input a list of gene (or protein) identification codes. The input to Maìstas can be one or more of the following codes:
|Ensembl gene ID(s), Ensembl protein ID(s), Ensembl transcript ID(s)||Identifiers provided by the Ensembl database.
||Identifiers provided by the EMBL database.
||Identifiers provided by the EntrezGene resource.
|HGNC automatic gene name, HGNC curated gene name||Identifiers provided by the HUGO Gene Nomenclature Committee (http://www.genenames.org/).
||Identifiers provided by the UniProtKB database.
|VEGA transcript ID(s)||Identifiers provided by the Vertebrate Genome Annotation (VEGA) database.
|HAVANA transcript ID(s)||Identifiers provided by the Vertebrate Genome Annotation (VEGA) database.
|Special FASTA format||User-supplied sequences. See below.
The submitted codes are then used to identify the corresponding gene codes in the BioMart database, so that all the splicing variants belonging to the same family as the gene of interest are retrieved. The input codes are derived from the ensembl_mart_58 database (ftp.ensembl.org). For the complete list of identification codes, refer to the select menu in the Maìstas interface.
Protein sequences in special FASTA format (see below) can also be pasted into the input window of the server main page.
The special FASTA format of your query sequence(s) MUST contain, in the first line, a ">" (greater-than) symbol followed by the ID code (name) of the sequence. The ID must be in the following format: GENE_ISOFORM, where GENE and ISOFORM are alphanumeric strings. No spaces are allowed.
Example of special FASTA format.
In the following example gene1 has two splicing isoforms, named iso1 and iso2.
The input for gene1 must be as follows:
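As a minimal sketch of the header rule, the Python below holds an input for gene1 with isoforms iso1 and iso2 and checks that every header has the GENE_ISOFORM form. The residues are made up for illustration only, and the names `HEADER_RE`, `EXAMPLE` and `parse_special_fasta` are hypothetical; the server's own validation may differ.

```python
import re

# Header rule from the text: ">" followed by GENE_ISOFORM,
# both parts alphanumeric, no spaces anywhere.
HEADER_RE = re.compile(r">([A-Za-z0-9]+)_([A-Za-z0-9]+)")

# Hypothetical input: the sequences are invented, only the headers matter.
EXAMPLE = """>gene1_iso1
MKTAYIAKQRQISFVKSHFSRQ
>gene1_iso2
MKTAYIAKQRSHFSRQ
"""


def parse_special_fasta(text):
    """Return {gene: {isoform: sequence}}; raise ValueError on a bad header."""
    records = {}
    gene = isoform = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            match = HEADER_RE.fullmatch(line)
            if match is None:
                raise ValueError(
                    f"line {line_no}: header must be >GENE_ISOFORM, got {line!r}"
                )
            gene, isoform = match.groups()
            records.setdefault(gene, {})[isoform] = ""
        elif gene is None:
            raise ValueError(f"line {line_no}: sequence data before any header")
        else:
            # Sequence lines may be wrapped; join them per isoform.
            records[gene][isoform] += line.strip()
    return records
```

Calling `parse_special_fasta(EXAMPLE)` groups both isoforms under gene1, while a header such as `>gene1 iso1` (with a space) is rejected.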
You can retrieve your results:
- using the web link to your results that Maìstas provides on the last submission page.
- from the [retrieve page], by entering the Job-ID (shown on the last submission page) or the e-mail address (if you supplied the optional e-mail).
- directly by e-mail if you provide a valid e-mail address (optional). In this case the server sends you an e-mail immediately after query submission, either confirming that the job has been launched or giving a self-explanatory error message if the query is wrong. Upon job completion, an e-mail containing the URL for downloading the results is sent. The response time depends strongly on the server load.
The server output includes the three-dimensional coordinates in PDB format for each modelled peptide and a table describing the results of the structural analysis. See the output example.
The Maìstas RESULT DETAILS section consists of the following columns:
|gene ID:||gene identification code.
|isoform ID:||isoform identification code.
|isoform length:||number of modelled residues (or solved residues when the isoform structure is known).
|first AA, last AA:||the first and the last modelled (or solved) amino acids.
|template ID:||PDB accession code of the template protein used in the modelling or the PDB code of the known isoform structure.
|isoform/template % seq. id.:||percentage of sequence identity between splicing isoform and template sequence.
|fraction of isoform modelled:||percentage of modelled sequence.
|summary:||Summary of the evaluation step: Plausible means that the isoform 3D model might correspond to a complete or plausible structure; Unlikely means that the model might not correspond to a complete or plausible structure; No template means that the isoform model cannot be built by homology because no template is available in the PDB; Not assessed means that some tools failed and the assessment or modelling procedure could not be executed.
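The two percentage columns above can be illustrated with a short Python sketch. The exact definitions used by the server are not stated here, so the following are assumptions: sequence identity is taken over alignment columns where neither sequence has a gap, and the modelled fraction is the span from the first to the last modelled residue divided by the full isoform length. The function and parameter names are hypothetical.

```python
def percent_identity(target, template):
    """Percent of gap-free alignment columns with identical residues.

    Assumption: columns where either sequence has '-' are ignored.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    identical = sum(a == b for a, b in aligned)
    return 100.0 * identical / len(aligned)


def percent_modelled(first_aa, last_aa, full_length):
    """Percent of the full isoform covered by residues first_aa..last_aa.

    Positions are 1-based and inclusive; full_length is the length of the
    complete isoform sequence, not of the model.
    """
    if full_length <= 0 or not 1 <= first_aa <= last_aa <= full_length:
        raise ValueError("residue range must lie within the isoform")
    return 100.0 * (last_aa - first_aa + 1) / full_length
```

With these definitions, a model spanning residues 11 to 60 of a 100-residue isoform covers half of the sequence.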
The e-mail address is optional. Supplying one is useful if you submit many proteins for analysis and want to be notified when your results are available. Bear in mind that if you enter an incorrect e-mail address, the server has no way of contacting you.
- Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960. doi:10.1093/bioinformatics/bti125.
- Sali A. and Blundell T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815.
- Hubbard S. and Thornton J. (1993) NACCESS. Department of Biochemistry and Molecular Biology, University College, London.
- Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W. and Lipman D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
- Pattabiraman N., Ward K.B. and Fleming P.J. (1995) Occluded Molecular Surface: Analysis of Protein Packing. Journal of Molecular Recognition 8, 334-344. (This is the original description of the OS method.)
- Fleming P.J. and Richards F.M. (2000) Protein Packing: Dependence on Protein Size, Secondary Structure and Amino Acid Composition. J. Mol. Biol. 299, 487-498. (This is the most complete description of occluded surface packing and includes packing results for a dataset of 152 proteins.)
- Vorobjev Y.N. and Hermans J. (1997) SIMS: Computation of a Smooth Invariant Molecular Surface. Biophysical Journal 73, 722-732. (SIMS is used to calculate the dot surface for OS.)
- Voss N.R. and Gerstein M. (2005) Calculation of standard atomic volumes for RNA and comparison with proteins: RNA is packed more tightly. J. Mol. Biol. 346, 477-492.
- Durinck S., Moreau Y., Kasprzyk A., Davis S., De Moor B., Brazma A. and Huber W. (2005) BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21(16), 3439-3440.
- Tsai J., Taylor R., Chothia C. and Gerstein M. (1999) The Packing Density in Proteins: Standard Radii and Volumes. J. Mol. Biol. 290, 253-266.
For further details contact [floris]AT[crs4]DOT[it].