PHDacc:
prediction of solvent accessibility
(cite)
PHDhtm:
prediction of transmembrane helices and their topology
(cite)
reported only if hit found
NOTE: by default, the threshold for what is considered to be a membrane helix is rather restrictive. This has two consequences:
almost no false positives (proteins predicted to contain membrane helices that actually do NOT contain any),
some membrane proteins may be missed
If you want to check with higher sensitivity whether or not your protein is likely to contain membrane helices, please make use of the advanced prediction option:
'transmembrane helices (PHDhtm)'
Methods available upon request
TOPITS:
prediction-based threading (detection of remote homologies)
(cite)
EvalSec:
evaluation of secondary structure prediction accuracy
(cite)
Default submission form
Your email
Example: rost@cubic.bioc.columbia.edu
Your entire (and entirely correct) email address.
Note: if the address contains a typo, we shall not be able to return the results.
Password
Example: (i.e. leave field empty!)
Using PredictProtein is free for academic users. Only companies have to fill in their password here!
Return results in HTML
example The email you get will have the entire results attached in one HTML formatted file (which you can load into any WWW browser). Alignments and ProDom results are displayed using the program MView developed by Nigel Brown (MRC, Mill Hill, London; http://mathbio.nimr.mrc.ac.uk/~nbrown/).
NOTE: the option 'HTML for printouts' produces a file in a format that you can print out directly (the default is viewable by WWW browsers, but not printable, since there are too many characters per line!).
Store results here, return no mail output
We shall not return the results by mail. Instead, the results for your requests will be stored on our machines for 3 days, and you will receive a mail that simply tells you how you can ftp the results from here. The reason for including this option is that some requests may result in very large output files, and those may be difficult for your local mail system to handle (in particular when you request HTML formatted output).
Please use only one-letter code amino acids. In particular, avoid numbers or '*', or '.'.
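A minimal sketch of how you could check a sequence locally before submitting it. It assumes that only the 20 standard one-letter amino acid codes are acceptable; the server may well tolerate further letters (such as X), so treat the alphabet and the function name find_invalid_residues as illustrative assumptions, not as the check PredictProtein itself applies.

```python
from typing import List, Tuple

# Assumption: only the 20 standard amino acids in one-letter code are valid.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, character) for every invalid character.

    Whitespace and line breaks are removed first, so positions refer to
    the sequence without them. Lower-case letters are accepted.
    """
    cleaned = "".join(sequence.split())
    return [
        (position, char)
        for position, char in enumerate(cleaned.upper(), start=1)
        if char not in STANDARD_AMINO_ACIDS
    ]


if __name__ == "__main__":
    # Digits, '*' and '.' are reported together with their positions.
    print(find_invalid_residues("MKT1A*."))
```

An empty list means the sequence contains nothing but standard one-letter codes.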
SUBMIT or CLEAR
Click on the button SUBMIT to request a prediction.
Click on the button CLEAR to clear all data you filled in (e.g. to restart, or to send a new request).
Default: all programs will be run. Some results may be omitted in the final mail we return, if we decide that the respective signal was not above a certain threshold (this holds in particular for the prediction of coiled coil and membrane regions).
TOPITS: prediction-based threading.
We run your protein against a representative part of the PDB database (i.e. the database of proteins with known 3D structure) to find proteins similar to your sequence, that cannot be identified as similar from the sequence alone.
PHDsec only: will return only the prediction of secondary structure (and the alignments, if requested).
PHDacc only: will return only the prediction of solvent accessibility (and the alignments, if requested).
PHDhtm only: will return only the prediction of helical transmembrane regions, as well as of the topology of helical membrane proteins.
Note: PHDhtm is also run by default, but weaker hits are reported only if you choose this option!
Evalsec: evaluates the accuracy of a secondary structure prediction for which you provide the observed secondary structure.
PROF: by default 1D structure predictions are still produced by PHD. Now, the more accurate program PROF is also available.
PROFsec only: will return only the PROFsec prediction of secondary structure (and the alignments, if requested).
PROFacc only: will return only the PROFacc prediction of solvent accessibility (and the alignments, if requested).
Default: SWISS-PROT protein sequence database (version 37.0, 78597 proteins)
PDB: Protein Data Bank of protein structures (version 99-04, 14400 protein chains)
TrEMBL: Translations of all coding sequences in the EMBL Nucleotide Sequence Database (version 99-04, 374386 proteins)
SWISS-Prot + PDB + TrEMBL: combination of all the previous (total of 467383 sequences)
Note: due to the considerable CPU-time spent on searching through the combined database, we have to impose a length limit (currently < 1000 residues). Thus, if you want a search through the big database, please chop your sequence into the units that are likely to form domains (see e.g. ProDom).
If your sequence is longer than 1000 residues, the search will be restricted to the SWISS-PROT database.
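The length rule above can be sketched as a small function. This is only an illustration: the database labels and the name effective_database are hypothetical, and since the text leaves the case of exactly 1000 residues open, the sketch assumes that it counts as too long.

```python
# From the text: the combined search requires fewer than 1000 residues.
MAX_RESIDUES_COMBINED = 1000

# Hypothetical labels for the database choices on the form.
SWISSPROT = "SWISS-PROT"
COMBINED = "SWISS-PROT + PDB + TrEMBL"


def effective_database(requested: str, sequence: str) -> str:
    """Return the database that would actually be searched.

    Assumption: a sequence of exactly 1000 residues is treated as too long
    for the combined database and falls back to SWISS-PROT.
    """
    if requested == COMBINED and len(sequence) >= MAX_RESIDUES_COMBINED:
        return SWISSPROT
    return requested


if __name__ == "__main__":
    print(effective_database(COMBINED, "A" * 1200))
```

A sequence that falls back in this way can still be searched against the combined database if it is first split into likely domains, as recommended above.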
Run iterated PSI-BLAST on SWISS-PROT + TrEMBL + PDB
Limited CPU-time prevents us from automatically running an iterated PSI-BLAST for all submissions. However, you can request this option. If you do, the PSI-BLAST alignment will be used for predictions. (Note: you will also have to select 'Return BLAST output', since the PSI-BLAST results will NOT be returned by default!)
Return BLAST output from SWISS-PROT search
example The (more or less unfiltered) raw output from the BLAST search against the SWISS-PROT database is additionally returned (note: by default we return only the final result from the dynamic programming search with the program MaxHom).
PHD msf
example Returns the PHD predictions additionally in an MSF format (appended to the alignment).
PHD rdb
example Returns the PHD predictions additionally in RDB format (as read and written by local versions of the programs PHD and TOPITS).
PHD col
example Returns the secondary structure and accessibility predictions additionally in a column format. (Note: this format can be used as input for a request of prediction-based threading.)
PHD casp2
example Returns the PHD predictions additionally in the format used for the second protein structure prediction contest in Asilomar, 1996 (CASP2).
PROF msf
example Returns the PROF predictions additionally in an MSF format (appended to the alignment).
PROF rdb
example Returns the PROF predictions additionally in RDB format (as read and written by local versions of the programs PROF and TOPITS).
PROF col
example Returns the secondary structure and accessibility predictions additionally in a column format. (Note: this format can be used as input for a request of prediction-based threading.)
PROF casp
example Returns the PROF predictions additionally in the format used for the fourth protein structure prediction contest in Asilomar, 2000 (CASP4).
TOPITS hssp
example Returns the threading output additionally in HSSP format.
TOPITS strip
example Returns the threading output additionally in STRIP format (which displays predicted and observed secondary structure underneath one another).
TOPITS own
example Returns the threading output additionally in the format used by TOPITS.
HTML formatted results
example The email you get will have the entire results attached in one HTML formatted file (which you can load into any WWW browser). Alignments and ProDom results are displayed using the program MView developed by Nigel Brown (MRC, Mill Hill, London; http://mathbio.nimr.mrc.ac.uk/~nbrown/).
HTML for printouts
example The email you get will have all the output attached in one HTML formatted file (to display with any WWW browser) that has fewer characters per line than the normal HTML output (see "return HTML"), so that you can print the output.
HTML with PHD graphs
example The email you get will have the entire results attached in one HTML formatted file (which you can load into any WWW browser).
Note: the HTML files resulting from the PHD predictions may be large. To keep your mail from becoming too big, you may therefore use the option of leaving the result on our machines, and simply ftp it to your local machine (see option "return no mail").
HTML with PHD graphs for printouts
example The email you get will have all the output attached in one HTML formatted file (to display with any WWW browser) that has fewer characters per line than the normal HTML output (see "return HTML detail"), so that you can print the output. (For further information see the option "return html".)
Note: the HTML files resulting from the PHD predictions may be large. To keep your mail from becoming too big, you may therefore use the option of leaving the result on our machines, and simply ftp it to your local machine (see option "return no mail").
Concise output
example Returns a concise summary of results (e.g., no tables for prediction accuracy).
do NOT align:
if you check this box, then we shall not align your list of FASTA or PIR formatted sequences
Note: this option is only effective for those two cases (see sequence formats below).
Note 2: PHD predictions are more accurate if based on alignments. Thus, please DO NOT use this option unless you have a strong reason for it!
return full sequences:
By default, all alignments will be returned in a form which has NO insertions in the protein sequence you submitted. Thus, where a sequence aligns best to your protein only after some of its residues (typically loop regions) are cut out, the two residues flanking the cut appear in lower case, indicating that others between them were deleted (e.g. 'AACpsHW' could indicate a deletion between P and S from an original sequence that may have been 'AACPEQGGSHW'). If you want to keep all the inserted residues, you have to select the box 'return full sequences'.
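The lower-case convention can be illustrated with a short sketch. It assumes a pairwise alignment given as two equal-length strings in which '-' marks a gap; the function name collapse_insertions and the query sequence in the usage example are made up for illustration, and this is not the code the server runs.

```python
def collapse_insertions(query_aln: str, hit_aln: str, gap: str = "-") -> str:
    """Remove hit residues that face a gap in the query.

    The residues flanking each removed stretch are written in lower case,
    mirroring the default display described above.
    """
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned strings must have the same length")

    kept = []
    after_cut = False  # True once residues have been dropped at this point
    for query_char, hit_char in zip(query_aln, hit_aln):
        if query_char == gap:
            if hit_char != gap:
                # First dropped residue of a stretch: mark the left flank.
                if not after_cut and kept:
                    kept[-1] = kept[-1].lower()
                after_cut = True
            continue
        if after_cut:
            hit_char = hit_char.lower()  # mark the right flank
            after_cut = False
        kept.append(hit_char)
    return "".join(kept)


if __name__ == "__main__":
    # Hypothetical query with a 4-residue gap; the hit is the example
    # sequence 'AACPEQGGSHW' from the text.
    print(collapse_insertions("MKTP----SLV", "AACPEQGGSHW"))  # AACpsHW
```

Selecting 'return full sequences' corresponds to skipping this step and keeping the hit as aligned.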
do NOT filter returned alignment
By default we filter the alignment returned to you such that only the more likely homologues are included. If you switch this filter off, be aware that most proteins aligned to your sequence will be similar to it NEITHER in terms of structure, NOR in terms of function!
do NOT filter alignment used for PHD
If the divergence found in your family is not 'well' spread, prediction accuracy may drop. In particular, too many highly similar sequences may be problematic in the absence of further diverged family members. This problem came up only in the post-genome era, i.e. since the number of sequences started exploding. To correct for this problem we run a crude filter on the alignment by default.
do NOT return PSI-BLAST:
if you check this box, then we shall not use the PSI-BLAST alignment as input for the prediction
Note: at the moment iterated PSI-BLAST searches are not run; however, we are considering changing this as soon as we have more CPU resources available.
Default= single sequence:
single protein sequence in one-letter amino acid code
Multiple-sequence alignment in SAF-format:
example Your alignment (in the simple alignment format SAF).
Note: I strongly recommend this as THE option of choice for non-experts (rather than the MSF format).
Multiple-sequence alignment in MSF-format:
example Your alignment (in the multiple sequence format MSF).
Note: I strongly recommend that non-experts use the SAF format instead (see above).
List of sequences or alignment in FASTA-format:
example
Swissprot identifier (SWISSID):
Example: paho_chick
Allows you to submit a single sequence through its SWISS-PROT identifier, i.e. you simply provide the respective SWISS-PROT identifier, and we shall align exactly that protein (provided it is available in our current SWISS-PROT database!).
Prediction of secondary structure and solvent accessibility in COLUMN-format:
example
Known and predicted secondary structure in COLUMN-format:
example Submitting secondary structure for evaluation of prediction accuracy
NOTE: only for running Evalsec, i.e. NOT for getting predictions!
Batch or interactive?
PP has the option of providing the results interactively, i.e. you keep waiting, and eventually the results will pop up on the screen.
However, the typical processing time is more than 5 minutes, and to avoid overloading the network connection, we actually switch to the BATCH mode after that time!
Note: typically the interactive mode is useful:
if the current PP job queue is empty (see the WAIT icon below to check)
AND if your request contains an alignment already (i.e. you want only a PHD prediction)
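The two conditions above amount to a simple rule of thumb, sketched here. The function name suggest_mode and its arguments are hypothetical; the 5-minute switch to batch mode is taken from the text and is enforced by the server, not by this sketch.

```python
# From the text: the server switches to BATCH mode after about 5 minutes.
INTERACTIVE_LIMIT_MINUTES = 5


def suggest_mode(queue_is_empty: bool, has_alignment: bool) -> str:
    """Suggest 'interactive' only if both conditions from the text hold."""
    if queue_is_empty and has_alignment:
        return "interactive"
    return "batch"


if __name__ == "__main__":
    # Empty queue, but no alignment supplied: batch is the safer choice.
    print(suggest_mode(queue_is_empty=True, has_alignment=False))
```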
Expert submission form
well, you don't need help, you are an expert, anyway....