Services, alignments and database searches
Central sites
- EBI (England): Sequence similarity searches (FASTA, BLITZ, PROSITE, BLAST, MAXHOM-PredictProtein).
- BCM (USA): General protein sequence and pattern searches. Programs include fast methods (BLAST, FASTA, PROSITE) and full dynamic programming methods (BLITZ, MPSEARCH).
- BioSCAN: The BioSCAN server allows protein and DNA sequences to be searched, retrieved and compared.
- NCSA Biology Workbench: The NCSA Biology Workbench provides a point-and-click interface for rapid access to biological databases and analysis tools.
- BCM search launcher: The BCM (Baylor College of Medicine) search launcher is an ongoing project to organise the molecular biology search and analysis services available on the WWW by providing a single point of entry for related searches.
|
Fast db searches
- BLAST: BLAST performs fast database searches combined with rigorous statistics for judging the significance of matches. Five BLAST programs (blastp, blastn, blastx, tblastn, tblastx) cover all combinations of protein and nucleotide query and database sequences.
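The "rigorous statistics" behind BLAST rest on Karlin-Altschul theory: for a query of length m and a database of total length n, the expected number of chance hits scoring at least S is E = K·m·n·exp(-λS), and the bit score S' = (λS - ln K)/ln 2 gives the equivalent form E = m·n·2^(-S'). The sketch below only evaluates these two formulas. The λ and K values and the lengths are illustrative assumptions; real BLAST derives λ and K from the scoring system (from precomputed tables for gapped alignments) and also corrects m and n for edge effects, which is omitted here.

```python
import math

def evalue(raw_score, m, n, lam, k):
    """Expected number of chance hits scoring >= raw_score (Karlin-Altschul formula)."""
    return k * m * n * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, k):
    """Raw score rescaled into bits, which no longer depends on lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Illustrative values only: these lambda/K are assumptions, not read from BLAST.
LAM, K = 0.267, 0.041
m, n = 250, 50_000_000  # hypothetical query length and total database length

for s in (40, 60, 80):
    bits = bit_score(s, LAM, K)
    # m * n * 2**(-bits) reproduces the raw-score E-value
    print(f"S={s}  bits={bits:.1f}  E={evalue(s, m, n, LAM, K):.2e}  "
          f"E_from_bits={m * n * 2 ** -bits:.2e}")
```

Because E grows linearly with n, the same alignment score looks less significant when searched against a larger database.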
|
Full dynamic programming
- BIOACCELERATOR: Sequence database searches using a fast parallel computer at the Weizmann Institute.
- ClustalW: ClustalW is a progressive (tree guided) multiple alignment program.
- SSEARCH: Sequence comparison using a full dynamic programming algorithm.
- ToPLign: ToPLign implements standard pairwise and multiple alignment methods with flexible parameter handling. Alignments can be analysed through several visualisations, and their stability can be explored.
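SSEARCH compares sequences with the Smith-Waterman local alignment algorithm, which fills a full dynamic programming matrix instead of taking heuristic shortcuts. The core recurrence can be sketched as follows. This is a simplified version that assumes plain match/mismatch scores and a linear gap penalty (the parameter values are arbitrary); production tools use substitution matrices such as BLOSUM and affine gap costs.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b, computed by full dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best local score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            h[i][j] = max(0, diag, up, left)  # 0 lets a local alignment start afresh
            best = max(best, h[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCA"))
```

Every cell is filled, so one comparison costs time proportional to len(a) × len(b); that cost is one reason database-wide full dynamic programming searches were offloaded to parallel hardware such as the BIOACCELERATOR.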
|
Profile-based analysis
- ClustalW: ClustalW is a progressive (tree guided) multiple alignment program.
- MatchBox: Alignment refinement program based on merging boxes of highly similar residues in all columns of a given multiple alignment.
|
Hidden Markov models
- SAM: The Sequence Alignment and Modeling system (SAM) is a collection of flexible software tools for creating, refining, and using linear hidden Markov models for biological sequence analysis.
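At the heart of HMM-based sequence analysis is the forward algorithm, which gives the probability of a sequence under the model by summing over all hidden state paths. The sketch below is a plain two-state HMM with made-up parameters and hypothetical state names, not SAM's linear profile (match/insert/delete) architecture. It works with raw probabilities, so long sequences would underflow; real implementations work in log space or rescale at each step.

```python
def forward(seq, states, start_p, trans_p, emit_p):
    """Return P(seq | model), summing over every hidden state path (seq must be non-empty)."""
    # alpha[s]: probability of emitting the symbols seen so far and ending in state s
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state model over a two-letter alphabet (parameters made up).
STATES = ("S1", "S2")
START = {"S1": 0.6, "S2": 0.4}
TRANS = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
EMIT = {"S1": {"h": 0.8, "p": 0.2}, "S2": {"h": 0.3, "p": 0.7}}

print(forward("hhpph", STATES, START, TRANS, EMIT))
```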
|
Analyse and display alignments
- ToPLign: ToPLign implements standard pairwise and multiple alignment methods with flexible parameter handling. Alignments can be analysed through several visualisations, and their stability can be explored.
- BOX: Pretty-printing and shading of multiple-alignment files.
|
Find motifs
- PFSCAN: This server uses the PFSCAN program to search a single sequence against all profile entries in the current release of PROSITE.
- PatScan: The PatScan pattern matcher searches protein or DNA sequence archives for instances of a pattern. You supply the pattern and indicate which protein or DNA sequences you wish to scan.
- PROSITE: Dictionary of protein sites and patterns. PROSITE is a method for determining the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to.
- Pmotif: Searches for a given protein motif in a requested protein or nucleotide database.
- REPRO: A service for the recognition of protein sequence repeats.
- FPAT: Regular expression searches of sequence databases (Washington University).
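PROSITE patterns such as N-{P}-[ST]-{P} (the N-glycosylation site) use a compact syntax: x matches any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) repeats the preceding element, and < or > anchor the pattern to the N- or C-terminus. The hypothetical helper prosite_to_regex below translates this basic syntax into a Python regular expression, which can then drive the kind of regular expression search FPAT offers. It is a sketch only: it does not handle the variant in which > appears inside square brackets.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern (basic syntax only) into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split an optional repeat count "(n)" or "(n,m)" off the residue spec.
        core, lo, hi = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        # single letters and [ABC] classes are already valid regex syntax
        if lo:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)

# N-glycosylation site pattern (PROSITE PS00001)
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                                # N[^P][ST][^P]
print(re.search(regex, "MKNGSAL").group())  # NGSA
```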
|
Composition bias
- SAPS: Statistical Analysis of Protein Sequences. Alignments and analyses of composition bias.
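A first check for composition bias is to compare each residue's count with what a background distribution predicts. The sketch below uses a binomial z-score and, as explicit assumptions, a uniform 1/20 background and an arbitrary cutoff of 3. It is not the statistic SAPS computes; realistic use would substitute residue frequencies derived from a sequence database.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_zscores(seq: str, background=None) -> dict:
    """Binomial z-score of each residue count against a background frequency (seq non-empty)."""
    seq = seq.upper()
    n = len(seq)
    # Assumption: uniform background unless real frequencies are supplied.
    bg = background or {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(seq)
    scores = {}
    for aa in AMINO_ACIDS:
        p = bg[aa]
        scores[aa] = (counts.get(aa, 0) - n * p) / math.sqrt(n * p * (1 - p))
    return scores

def biased_residues(seq: str, threshold: float = 3.0) -> list:
    """Residues whose z-score exceeds the (illustrative) threshold, sorted alphabetically."""
    return sorted(aa for aa, z in composition_zscores(seq).items() if z > threshold)

print(biased_residues("QQQQQQQQQQACDEFGHIKL"))  # ['Q']
```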
|
Other services
- Proteome: Comparative analysis of protein coding sequences of completed genomes.
- InterPro: Integrated resource of protein domains and functional sites.
- Sequence Alerting System: Each day, the Sequence Alerting System searches several databases for new entries homologous to "your" sequence and informs you by email when it detects a new relative.
- PSORT: Prediction of protein sorting signals and localisation sites in amino acid sequences.
|
Services, protein structure prediction
Collection of tools
- Documented collection of prediction services: Overview and links to services for predicting secondary structure, solvent accessibility, homology modelling and threading (MRC, Cambridge, England).
- List of prediction services: A list of useful protein prediction servers (Univ. Stockholm, Sweden).
- PredictProtein: Multiple sequence alignment (MAXHOM); prediction of secondary structure (PHDsec), solvent accessibility (PHDacc), transmembrane helices (PHDhtm), transmembrane topology (PHDtopology); and threading (PHDthreader).
|
Secondary structure
- PHDsec: Multiple alignment-based neural network system.
Accuracy: > 72% (+/-10%, one standard deviation), higher for more reliably predicted residues. Evaluated by cross-validation on 720 unique proteins; comparisons to other methods based on identical sets.
- NSSP: Multiple alignment-based nearest-neighbour method.
Accuracy: > 71%. Evaluated on > 200 unique proteins.
- SOPM: Multiple alignment-based method combining various other prediction programs.
Accuracy: > 70%. Evaluated on 100 unique proteins.
- DSC: Multiple alignment-based program using statistics.
Accuracy: 70%. Evaluated on standard set of 126 unique proteins (comparisons to other methods based on identical sets).
- SSPRED: Multiple alignment-based program using statistics.
Accuracy: > 70%. Evaluated on 70 unique proteins (no comparison with other methods on identical sets).
- MultiPredict: Multiple alignment-based method using physicochemical information from a set of aligned sequences and statistical secondary structure decision constants.
Accuracy: > 65%. Evaluated on 13 unique proteins.
- PSA: The PSA server analyzes amino acid sequences to predict secondary structures and folding classes.
- NNPREDICT: Single-sequence based neural network prediction.
Accuracy: > 65%. Evaluated on pairwise similar proteins.
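Accuracy figures like those above are usually per-residue three-state scores (Q3): the percentage of residues whose predicted state (helix H, strand E, other L) matches the observed one, with the ± value being one standard deviation across proteins. Assuming that convention, the two quantities can be computed as below; the function names are hypothetical.

```python
from statistics import mean, stdev

def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or L) matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def summarise(pairs):
    """Mean and one standard deviation of per-protein Q3 over (predicted, observed) pairs."""
    scores = [q3(pred, obs) for pred, obs in pairs]
    return mean(scores), stdev(scores)  # stdev needs at least two proteins

print(q3("HHHEELLL", "HHHEEELL"))  # 87.5
```

Averaging per protein (as here) and pooling all residues give slightly different numbers, which is one more reason why comparisons between methods are only meaningful on identical test sets.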
|
Secondary structure from CD spectra
- K2d: Algorithm for the estimation of the percentages of protein secondary structure from UV circular dichroism spectra using a Kohonen neural network with a 2-dimensional output layer. You can either use k2d via a web server or get the program and run it on your machine.
|
Solvent accessibility
- PHDacc: Multiple alignment-based neural network system.
Accuracy: > 75% (+/-10%, one standard deviation), higher for more reliably predicted residues. Evaluated by cross-validation on 720 unique proteins; comparisons to other methods based on identical sets.
|
Transmembrane helices and signal peptides
- PHDhtm: Multiple alignment-based neural network system predicting the locations of transmembrane helices.
Accuracy: > 95% (+/-10%, one standard deviation), higher for more reliably predicted residues. Evaluated by cross-validation on 132 proteins; comparisons to other methods based on identical sets.
- TMAP: Single sequence-based statistical prediction of the locations of transmembrane helices.
Accuracy: > 95%. Evaluated on 28 proteins WITHOUT cross-validation.
- PHDtopology: Refinement of PHDhtm by dynamic programming and prediction of topology (orientation of the N-terminus with respect to the membrane).
Accuracy: for > 85% of proteins, all helices and the topology are predicted correctly. Evaluated by cross-validation on 132 proteins; comparisons to other methods based on identical sets.
- TMpred: Single sequence-based prediction of location and topology for helical transmembrane proteins using statistics and similarity matrices.
- DAS: Single sequence-based prediction of location for helical transmembrane proteins.
- TopPred2: Single sequence-based prediction of topology for helical transmembrane proteins.
- SignalP: Neural network prediction of the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive and Gram-negative prokaryotes, and eukaryotes.
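A simple way to see what single-sequence transmembrane predictors look for is the classic hydropathy plot: average a hydrophobicity scale over a window about as long as a membrane-spanning helix and flag stretches above a cutoff. The sketch below uses the Kyte-Doolittle scale with their suggested window of 19 residues and cutoff of 1.6. It illustrates the idea only and is not the algorithm of TMAP, TMpred, DAS or TopPred2, each of which uses its own scales, statistics or refinement; the function names are hypothetical.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle hydropathy of each window; list index = window start."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge windows above threshold into (start, end) spans, 0-based with end exclusive."""
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], i + window)  # overlaps previous span: extend it
            else:
                segments.append((i, i + window))
    return segments

print(candidate_tm_segments("D" * 10 + "L" * 20 + "D" * 10))  # [(5, 35)]
```

The reported span is wider than the 20-residue hydrophobic core because windows that straddle its edges still average above the cutoff, which is one reason the dedicated tools add refinement steps.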
|
Coiled-coils
- COILS: Single sequence-based prediction for coiled-coil regions using statistical patterns of coiled-coil proteins in the database.
|
O-glycosylation sites
- NetOGlyc: Neural network prediction of mucin-type O-glycosylation sites in mammalian proteins.
|
Contact prediction
|
Homology modelling
- SWISS-MODEL: An automated knowledge-based protein modelling server with "First Approach" and "Optimise" modes (Peitsch M.C. Protein modelling by e-mail. Bio/Technology 13:658-660 (1995)).
|
Threading
- TOPITS: Prediction-based threading that detects the fold type and aligns a protein of unknown structure to one of known structure at low levels of sequence identity (< 25%).
Accuracy: < 30%, i.e., less than 30% of the predicted first hits are true remote homologues. Evaluated by cross-validation on 89 unique protein structures.
- T3P2: Prediction-based threading that detects the fold type and aligns a protein of unknown structure to one of known structure at low levels of sequence identity (< 25%).
- PSCANN: Threading method combining sequence and structure profiles. Accuracy: recognises similar folds more often than simple sequence alignment does.
|