
PP Help 07: Examples for input formats

Note: these examples of input formats matter mainly if you submit requests by email, rather than by filling in one of the forms on the WWW (default, advanced, or expert form).

EXAMPLES for input formats (required for email submissions)

Example for input and output (important for email submission)

Submitting a single sequence

INPUT is: your protein sequence,
OUTPUT is: alignment + prediction

INPUT
You send the following file:

joe@amino.churn.edu
# incredulase from paracoccus dementiae, translated from cDNA
KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKD
WWKVEVNDRQGFVPAAYVKKLD

Notes:

The '#' is a control character for PredictProtein.
The line starting with the hash ('#') is crucial: the parser interprets everything after that line as the protein sequence. On the same line, following the hash, put a one-line description of the protein.
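
The layout of such a request can also be assembled programmatically. The following is a minimal sketch in Python, assuming the request is plain text with the email address on the first line, the '#' description line second, and the sequence after it. The function name build_single_sequence_request and the default line width of 60 residues are illustrative choices, not part of PredictProtein.

```python
def build_single_sequence_request(email, description, sequence, width=60):
    """Return the text of a single-sequence request (hypothetical helper)."""
    # The description has to fit on one line, because everything after
    # the '#' line is read as sequence.
    if "\n" in description:
        raise ValueError("description must be a single line")
    # Drop blanks and line breaks so the sequence can be re-wrapped.
    residues = "".join(sequence.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence may contain letters only")
    lines = [email, "# " + description]
    lines.extend(residues[i:i + width] for i in range(0, len(residues), width))
    return "\n".join(lines) + "\n"


request = build_single_sequence_request(
    "joe@amino.churn.edu",
    "incredulase from paracoccus dementiae, translated from cDNA",
    "KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKD WWKVEVNDRQGFVPAAYVKKLD",
)
print(request)
```

The letters-only check is a conservative assumption; it is not a statement about which characters the server itself accepts.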

OUTPUT (detailed example)
If your sequence has at least one non-trivial homologue in the database of protein sequences, you receive a multiple sequence alignment and the annotated prediction in the following form:

Block with multiple sequence alignment.
Block with explanations about the prediction method.
Block with prediction (example for secondary structure prediction follows).

    .........1.........2.........3.........4.........5.........6
AA  KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
PHD   EEEEEE                EEEEEE     EEEEEE    EEEE  EEE   
Rel 854777641334566643102441577762566642443213663122112234155
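
To post-process such a block, the three rows can be read column by column. The sketch below assumes the layout printed above: a label column of four characters (AA, PHD, Rel) followed by one character per residue. Mapping a blank in the PHD row to 'L' (the "rest" state used elsewhere on this page) is an assumption made here for convenience, and parse_prediction_block is a hypothetical name.

```python
def parse_prediction_block(block, label_width=4):
    """Return (residue, state, reliability) tuples from AA/PHD/Rel rows."""
    rows = {}
    for line in block.splitlines():
        label = line[:label_width].strip()
        if label in {"AA", "PHD", "Rel"}:
            rows[label] = line[label_width:]
    aa = rows["AA"].rstrip()
    rel = rows["Rel"].rstrip()
    if len(rel) != len(aa):
        raise ValueError("AA and Rel rows differ in length")
    # Trailing blanks of the PHD row may have been stripped; pad them back.
    phd = rows["PHD"].ljust(len(aa))[:len(aa)]
    return [
        (residue, "L" if state == " " else state, int(digit))
        for residue, state, digit in zip(aa, phd, rel)
    ]


example = "    ....\nAA  KELV\nPHD   EE\nRel 8547\n"
for record in parse_prediction_block(example):
    print(record)
```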

Submitting a set of unaligned sequences (in FASTA format)

INPUT is: a list with your sequences,
OUTPUT is: alignment + prediction

INPUT
You send the following file:

joe@amino.churn.edu
# FASTA list incredulase from paracoccus dementiae, translated from cDNA
> Andr_Mouse
RQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFT
> Prgr_Rabit
QLLSVVKWSKSLPGFRNLHIDDQITLIQYSWMSLMVFGLRSYK
:

Notes:

The string "# FASTA list" is crucial, as the parser interprets anything after this line as a list of sequences in FASTA format (i.e. the actual FASTA format starts in the line after the '#').
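
A minimal Python sketch of writing such a request from a list of named sequences follows. It assumes the layout shown in the example (email line, '# FASTA list' line with a description, then one '> name' line and one sequence line per protein); build_fasta_list_request is a hypothetical helper name.

```python
def build_fasta_list_request(email, description, sequences):
    """Return a '# FASTA list' request for (name, sequence) pairs."""
    lines = [email, "# FASTA list " + description]
    for name, sequence in sequences:
        if not name.strip():
            raise ValueError("every sequence needs a name")
        lines.append("> " + name.strip())
        # Write each sequence on one line, without blanks.
        lines.append("".join(sequence.split()))
    return "\n".join(lines) + "\n"


print(build_fasta_list_request(
    "joe@amino.churn.edu",
    "incredulase from paracoccus dementiae, translated from cDNA",
    [
        ("Andr_Mouse", "RQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFT"),
        ("Prgr_Rabit", "QLLSVVKWSKSLPGFRNLHIDDQITLIQYSWMSLMVFGLRSYK"),
    ],
))
```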

OUTPUT (example)

Block with ProDom domain assignment (if found).
Block with ProSite motif (if found).
Block with predictions of coiled-coil regions (if found).
Block with explanations about the prediction method.
Block with prediction.

Submitting a set of unaligned sequences (in PIR format)

INPUT is: a list with your sequences,
OUTPUT is: alignment + prediction

INPUT
You send the following file:

joe@amino.churn.edu
# PIR list incredulase from paracoccus dementiae, translated from cDNA
>P1;
Andr_Mouse
RQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFT
>P1;
Prgr_Rabit
QLLSVVKWSKSLPGFRNLHIDDQITLIQYSWMSLMVFGLRSYK
:

Notes:

The string "# PIR list" is crucial, as the parser interprets anything after this line as a list of sequences in PIR format (i.e. the actual PIR format starts in the line after the '#').
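
If your sequences are already in FASTA format, they can be rewritten into the PIR layout shown above. The sketch below is an illustration only: it reproduces the three-line pattern of the example ('>P1;', name, sequence) and makes no claim about other PIR variants; both function names are hypothetical.

```python
def parse_fasta_records(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def to_pir_list(records):
    """Render records in the '>P1;' / name / sequence layout."""
    lines = []
    for name, sequence in records:
        lines.extend([">P1;", name, sequence])
    return "\n".join(lines) + "\n"


fasta = "> Andr_Mouse\nRQLVHVVKWAKALPGFRNLH\n> Prgr_Rabit\nQLLSVVKWSKSLPGFRNLH\n"
print(to_pir_list(parse_fasta_records(fasta)))
```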

OUTPUT (example)

Submitting your alignment (in SAF format)

Note: I do strongly recommend this as THE option of choice for non-experts (rather than the MSF format).

INPUT is: your alignment (in the simple alignment format SAF),
OUTPUT is: prediction

INPUT
You send the following file:

joe@amino.churn.edu
# SAF incredulase from paracoccus dementiae, translated from cDNA
Andr_Human RQLVHVVKWA KALPGFRNLH VDDQMAVIQY SWMGLMVFAM GWRSFT
Prgr_Rabit .QLLSVVKWS KSLPGFRNLH IDDQITLIQY SWMSLMVFGL GWRSYK

Notes:

Your name and email address are required. The string "# SAF" is crucial, as the parser interprets anything after this line as an alignment in SAF format.
The '#' is a control for PredictProtein. The actual SAF-format begins after that line!
Names should contain up to 14 characters and no blanks.
Please use the same names for the same protein in all rows.
To mark insertions, please use a point '.'.
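
The naming rules above are easy to check before sending. The following Python sketch writes SAF-style rows from a mapping of names to aligned sequences, grouping residues in blocks of ten as in the example. The function name format_saf, the limit of 50 residues per row, and the blank line between row groups are assumptions of this sketch.

```python
def format_saf(alignment, per_line=50, block=10):
    """Format {name: aligned_sequence} as SAF-style rows (sketch)."""
    for name in alignment:
        # Rules from the notes: at most 14 characters, no blanks.
        if not name or len(name) > 14 or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid name: {name!r}")
    longest = max(len(seq) for seq in alignment.values())
    out = []
    for start in range(0, longest, per_line):
        for name, seq in alignment.items():
            piece = seq[start:start + per_line]
            if not piece:
                continue
            groups = [piece[i:i + block] for i in range(0, len(piece), block)]
            # The same name is repeated in every group of rows.
            out.append(name + " " + " ".join(groups))
        out.append("")
    return "\n".join(out).rstrip("\n") + "\n"


print(format_saf({
    "Andr_Human": "RQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFT",
    "Prgr_Rabit": ".QLLSVVKWSKSLPGFRNLHIDDQITLIQYSWMSLMVFGLGWRSYK",
}))
```

The printed rows go after the "# SAF" line of the request.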

OUTPUT (example)

Submitting your alignment (in MSF format)

Note: for non-experts, I strongly recommend using the SAF format instead (see above).

INPUT is: your alignment (in the multiple sequence format MSF),
OUTPUT is: prediction

INPUT
You send the following file:

joe@amino.churn.edu
# MSF incredulase from paracoccus dementiae, translated from cDNA
MSF of: x.hssp from: 1 to: 176
x.msf MSF: 176 Type: P 11-Oct-93 21:17:4 Check: 5859 ..
Name: Andr_Human Len: 176 Check: 750 Weight: 1.00
Name: Prgr_Rabit Len: 176 Check: 3980 Weight: 1.00
//
Andr_Human RQLVHVVKWA KALPGFRNLH VDDQMAVIQY SWMGLMVFAM GWRSFT
Prgr_Rabit .QLLSVVKWS KSLPGFRNLH IDDQITLIQY SWMSLMVFGL GWRSYK

Notes:

Your name and email address are required. The string "# MSF" is crucial, as the parser interprets anything after this line as an alignment in MSF format.
The '#' is a control for PredictProtein. The actual MSF-format begins after that line!
Names should contain up to 14 characters and no blanks.
Please use the same names for the same protein in all rows.
All sequences must have the same length.
To mark insertions, please use a point '.'.
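
The requirement that all sequences have the same length can be verified before submission. The sketch below reads the aligned rows that follow the '//' separator and reports differing lengths. It assumes each row starts with the sequence name followed by blank-separated blocks, as in the example; it does not compute or verify the MSF checksums, and both function names are hypothetical.

```python
def read_msf_rows(msf_text):
    """Collect aligned rows after the '//' separator, keyed by name."""
    rows = {}
    in_body = False
    for line in msf_text.splitlines():
        if line.strip() == "//":
            in_body = True
            continue
        if not in_body or not line.strip():
            continue
        name, *blocks = line.split()
        # Rows with the same name are concatenated in order of appearance.
        rows[name] = rows.get(name, "") + "".join(blocks)
    return rows


def unequal_lengths(rows):
    """Return {name: length} if the lengths differ, else an empty dict."""
    lengths = {name: len(seq) for name, seq in rows.items()}
    return lengths if len(set(lengths.values())) > 1 else {}


msf = (
    "//\n"
    "Andr_Human RQLVHVVKWA KALPGFRNLH\n"
    "Prgr_Rabit .QLLSVVKWS KSLPGFRNL\n"
)
print(unequal_lengths(read_msf_rows(msf)))
```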

OUTPUT (example)

Submitting a set of aligned sequences (in FASTA format)

INPUT is: a list with your aligned sequences,
OUTPUT is: prediction

INPUT
You send the following file:

joe@amino.churn.edu
do NOT align
# FASTA list incredulase from paracoccus dementiae, translated from cDNA
> Andr_Mouse
RQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFT
> Prgr_Rabit
QLLSVVKWSKSLPGFRNLHIDDQITLIQYSWMSLMVFGLRSYK
:

Notes:

The strings '# FASTA list' and 'do NOT align' are both crucial. The first is needed because the parser interprets everything after the line with the hash ('#') as a list of sequences in FASTA format (i.e. the actual FASTA format starts on the line after the '#'). The second ('do NOT align') is needed because otherwise your sequences will be re-aligned.
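
The order of the lines matters here: email address, then keyword lines such as 'do NOT align', then the '#' line, then the data. A minimal Python sketch of that ordering follows. The position of the keyword line is taken from the example above; whether other positions are accepted is not stated on this page, and assemble_request is a hypothetical helper.

```python
def assemble_request(email, format_line, payload, keywords=()):
    """Join the parts of a request in the order used in the examples."""
    if not format_line.startswith("#"):
        raise ValueError("the format line must start with '#'")
    # Keyword lines (e.g. 'do NOT align') go between the email address
    # and the '#' line; everything after the '#' line is data.
    lines = [email, *keywords, format_line, payload.rstrip("\n")]
    return "\n".join(lines) + "\n"


print(assemble_request(
    "joe@amino.churn.edu",
    "# FASTA list incredulase from paracoccus dementiae, translated from cDNA",
    "> Andr_Mouse\nRQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFT\n",
    keywords=["do NOT align"],
))
```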

OUTPUT (example)

Block with ProDom domain assignment (if found).

Submitting a set of aligned sequences (in PIR format)

INPUT is: a list with your aligned sequences,
OUTPUT is: prediction

INPUT
You send the following file:

joe@amino.churn.edu
do NOT align
# PIR list incredulase from paracoccus dementiae, translated from cDNA
>P1;
Andr_Mouse
RQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFT
>P1;
Prgr_Rabit
QLLSVVKWSKSLPGFRNLHIDDQITLIQYSWMSLMVFGLRSYK
:

Notes:

The strings '# PIR list' and 'do NOT align' are both crucial. The first is needed because the parser interprets everything after the line with the hash ('#') as a list of sequences in PIR format (i.e. the actual PIR format starts on the line after the '#'). The second ('do NOT align') is needed because otherwise your sequences will be re-aligned.

OUTPUT (example)

Block with ProDom domain assignment (if found).

Submitting 1D structure prediction for fold recognition (in COLUMN format)

INPUT is: a prediction of secondary structure and accessibility,
OUTPUT is: an alignment of remote homologues

INPUT
You send the following file:

joe@amino.churn.edu
prediction-based threading
# COLUMN format
AA PSEC PACC RI_SEC RI_ACC
E L 11 9 6
F E 7 9 0
: : : : :
V H 61 3 0
L H 113 3 0
R H 39 1 1
: : : : :
P L 17 9 4
A L 89 9 2
: : : : :

Delimiters of columns: allowed are spaces, commas, and tabs.
Compulsory information:
(1) sequence (AA) in one-letter code;
(2) predicted secondary structure (PSEC) in one of the states H=helix, E=strand, or L=rest;
(3) solvent accessibility (PACC) in square Angstrom (note: for prediction-based threading, accessibility is converted to relative accessibility in two states: buried (<15%) or exposed (≥15%)).
Optional:
(1) reliability, or strength, of the secondary structure prediction (RI_SEC), scaled from 0 (low) to 9 (high);
(2) reliability, or strength, of the relative accessibility prediction (RI_ACC), scaled from 0 (low) to 9 (high).
Notes:

The string '# COLUMN format' is crucial, as the parser interprets anything after this line as a prediction.
To receive PHD prediction in this format use the output option 'return COLUMN format'.
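
Because spaces, commas, and tabs are all allowed as column delimiters, a reader for this format has to split on any of them. The Python sketch below parses the rows after the '# COLUMN format' line into dictionaries keyed by the header names. It is an illustration of the layout described above, not the server's parser; parse_column_format and split_fields are hypothetical names.

```python
import re

NUMERIC_COLUMNS = ("PACC", "RI_SEC", "RI_ACC")


def split_fields(line):
    """Split a row on spaces, commas, or tabs, dropping empty fields."""
    return [field for field in re.split(r"[,\s]+", line.strip()) if field]


def parse_column_format(text):
    """Return one dict per residue from a COLUMN-format request."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    starts = [i for i, ln in enumerate(lines) if ln.startswith("# COLUMN format")]
    if not starts:
        raise ValueError("missing '# COLUMN format' line")
    header = split_fields(lines[starts[0] + 1])
    rows = []
    for ln in lines[starts[0] + 2:]:
        fields = split_fields(ln)
        if len(fields) != len(header):
            raise ValueError(f"wrong number of columns: {ln!r}")
        row = dict(zip(header, fields))
        if row.get("PSEC") not in {"H", "E", "L"}:
            raise ValueError(f"PSEC must be H, E, or L: {ln!r}")
        for key in NUMERIC_COLUMNS:
            if key in row:
                row[key] = int(row[key])
        rows.append(row)
    return rows


request = (
    "joe@amino.churn.edu\n"
    "prediction-based threading\n"
    "# COLUMN format\n"
    "AA PSEC PACC RI_SEC RI_ACC\n"
    "E L 11 9 6\n"
    "F,E,7,9,0\n"
    "V\tH\t61\t3\t0\n"
)
for row in parse_column_format(request):
    print(row)
```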

OUTPUT (example)

Submitting secondary structure for evaluation of prediction accuracy (in COLUMN format)

INPUT is: a prediction and observation of secondary structure,
OUTPUT is: an evaluation of prediction accuracy

INPUT
You send the following file:

joe@amino.churn.edu
evaluate prediction accuracy
# COLUMN format
NAME AA PSEC OSEC
first M L L
first Q L L
first T L H
first S H H
first S H H
first I H H
: : : :
second G L L
second V L L
second K E L
second S L H
second I L H
: : : :

Delimiters of columns: allowed are spaces, commas, and tabs.
Compulsory information:
(1) sequence (AA) in one-letter code;
(2) predicted secondary structure (PSEC) in one of the states H=helix, E=strand, or L=rest;
(3) observed secondary structure (OSEC) in one of the states H=helix, E=strand, or L=rest (e.g. from DSSP assignment);
(4) if more than one protein is used, simply append all requested proteins (in that case, make sure that the first column (NAME) gives each protein a unique name).
Optional:
(1) name of protein (compulsory for more than one protein).
Note: Your email address is required. The string "# COLUMN format" is crucial, as the parser interprets anything after this line as a prediction.
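
To get a feeling for what such an evaluation measures, the fraction of residues whose predicted state equals the observed state can be computed per protein. The Python sketch below does this for the rows of the example. This is one common per-residue score, used here as an illustration; the scores actually returned by the server are defined in its own output and may differ. The function name per_residue_accuracy is hypothetical.

```python
from collections import defaultdict


def per_residue_accuracy(rows):
    """Return {name: fraction of residues with PSEC equal to OSEC}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for name, _aa, psec, osec in rows:
        total[name] += 1
        correct[name] += int(psec == osec)
    return {name: correct[name] / total[name] for name in total}


# Rows (NAME, AA, PSEC, OSEC) taken from the example above.
rows = [
    ("first", "M", "L", "L"), ("first", "Q", "L", "L"), ("first", "T", "L", "H"),
    ("first", "S", "H", "H"), ("first", "S", "H", "H"), ("first", "I", "H", "H"),
    ("second", "G", "L", "L"), ("second", "V", "L", "L"), ("second", "K", "E", "L"),
    ("second", "S", "L", "H"), ("second", "I", "L", "H"),
]
for name, score in per_residue_accuracy(rows).items():
    print(f"{name}: {score:.2f}")
```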

OUTPUT (example)

Block with definition of scores for prediction accuracy.
Tables with per-residue and per-segment prediction accuracy.

Submitting a single sequence through its SWISSPROT identifier

INPUT is: your protein sequence given as a valid SWISSPROT identifier,
OUTPUT is: alignment + prediction

INPUT
You send the following file:

joe@amino.churn.edu
# SWISSid paho_chick

Notes:

The string "# SWISSid" is crucial, as the parser interprets anything after this line as a SWISSPROT identifier.
Valid identifiers have the form 'name_species'; for the example above:

name=paho
species=chick

Only identifiers of the latest SWISSPROT release are accepted.
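
The 'name_species' form can be checked before the request is sent. The Python sketch below splits an identifier at the underscore and writes the two-line request. The regular expression (letters and digits on both sides of a single underscore) is an assumption of this sketch; it cannot tell whether the identifier exists in the latest SWISSPROT release. Both function names are hypothetical.

```python
import re

# Assumed shape: letters/digits, one underscore, letters/digits.
ID_PATTERN = re.compile(r"^([A-Za-z0-9]+)_([A-Za-z0-9]+)$")


def split_swiss_id(identifier):
    """Return (name, species) for an identifier like 'paho_chick'."""
    match = ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not of the form name_species: {identifier!r}")
    return match.group(1), match.group(2)


def build_swissid_request(email, identifier):
    """Return the text of a '# SWISSid' request."""
    split_swiss_id(identifier)  # raises ValueError on a malformed identifier
    return f"{email}\n# SWISSid {identifier.strip()}\n"


print(split_swiss_id("paho_chick"))
print(build_swissid_request("joe@amino.churn.edu", "paho_chick"))
```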
