Accuracy of PHDhtm
PHDhtm helical trans-membrane region prediction
****************************************************************************
* *
* *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* Prediction of helical transmembrane segments by PHDhtm: *
* a Profile fed neural network system from HeiDelberg *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* Authors: Burkhard Rost & Chris Sander *
* EMBL, Heidelberg, FRG *
* Meyerhofstrasse 1, 69 117 Heidelberg *
* Internet: Predict-Help@EMBL-Heidelberg.DE *
* *
* All rights reserved. *
* *
* *
****************************************************************************
* *
* *
* About the network method *
* ~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* To be cited in publications using PHDhtm output: *
* B Rost, R Casadio, P Fariselli & C Sander: Prediction of helical *
* transmembrane segments at 95% accuracy. Prot. Science, 1995, 4, *
* 521-533. *
* *
* The PredictProtein mail server is described in: *
* B Rost: PHD: predicting one-dimensional protein structure by *
* profile-based neural networks. Meth. in Enzym., 1996, 266, 525-539. *
* *
* The network for prediction of secondary structure is described in *
* detail in: *
* B Rost & C Sander: Prediction of protein secondary structure at *
* better than 70% accuracy. J. Mol. Biol., 1993, 232, 584-599. *
* B Rost & C Sander: Combining evolutionary information and neural *
* networks to predict protein secondary structure. Proteins, 1994, *
* 19, 55-77. *
* *
* *
* *
****************************************************************************
* *
* *
* Estimated Accuracy of Prediction *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* A cross validation test on 69 helical trans-membrane proteins (in total*
* about 30,000 residues) with less than 25% pairwise sequence identity *
* gave the following results: *
* *
* ++================++-----------------------------------------+ *
* || Qtotal = 94.7% || ("overall two state accuracy") | *
* ++================++-----------------------------------------+ *
* *
* +----------------------------+-----------------------------+ *
* | Qhelix (% of observed)=92% | Qhelix (% of predicted)=83% | *
* | Qloop (% of observed)=96% | Qloop (% of predicted)=97% | *
* +----------------------------+-----------------------------+ *
* *
*..........................................................................*
* *
* These percentages are defined by: *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* number of correctly predicted residues *
* Qtotal = --------------------------------------- (*100) *
* number of all residues *
* *
* no of res correctly predicted to be in helix *
* Qhelix (% of obs) = -------------------------------------------- (*100) *
* no of all res observed to be in helix *
* *
* *
* no of res correctly predicted to be in helix *
* Qhelix (% of pred)= -------------------------------------------- (*100) *
* no of all residues predicted to be in helix *
* *
*..........................................................................*
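The definitions above can be checked against the counts in the accuracy matrix given in this document. The following is a minimal sketch, not part of PHDhtm itself; the function name `two_state_scores` and its argument names are hypothetical. The loop measures are assumed to be defined analogously to the helix measures.

```python
def two_state_scores(hh, hl, lh, ll):
    """Per-residue two-state accuracy measures.

    hh: observed helix, predicted helix
    hl: observed helix, predicted loop
    lh: observed loop,  predicted helix
    ll: observed loop,  predicted loop
    """
    total = hh + hl + lh + ll
    return {
        "Qtotal": 100.0 * (hh + ll) / total,
        "Qhelix_obs": 100.0 * hh / (hh + hl),   # % of observed helix residues
        "Qhelix_pred": 100.0 * hh / (hh + lh),  # % of predicted helix residues
        # ASSUMPTION: loop measures mirror the helix definitions.
        "Qloop_obs": 100.0 * ll / (ll + lh),
        "Qloop_pred": 100.0 * ll / (ll + hl),
    }


# Counts taken from the accuracy matrix in this document.
scores = two_state_scores(hh=5214, hl=492, lh=1050, ll=22423)
for name, value in scores.items():
    print(f"{name:12s} {value:5.1f}")
```

With the matrix counts this reproduces the quoted Qtotal; the per-state values come out close to, though not in every case identical to, the rounded figures in the summary table.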
* *
* Further measures of performance *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* Matthews correlation coefficient: *
* *
* +---------------------------------------------+ *
* | Chelix = 0.84, Cloop = 0.84 | *
* +---------------------------------------------+ *
*..........................................................................*
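For reference, the Matthews correlation coefficient can be computed from the same four counts of the accuracy matrix. The sketch below uses the standard textbook formula; the function name `matthews` is hypothetical, and it is an assumption that the server's values were obtained from exactly these pooled counts.

```python
import math


def matthews(tp, tn, fp, fn):
    """Standard Matthews correlation coefficient for one state."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator


# Helix as the "positive" state; counts from the accuracy matrix.
c_helix = matthews(tp=5214, tn=22423, fp=1050, fn=492)
# Loop as the "positive" state: the roles of the counts are swapped.
c_loop = matthews(tp=22423, tn=5214, fp=492, fn=1050)
print(f"Chelix = {c_helix:.2f}, Cloop = {c_loop:.2f}")
```

In a two-state problem the formula is symmetric under swapping the states, which is why Chelix and Cloop coincide.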
* *
* Average length of predicted transmembrane helices: *
* *
* +------------+----------+ *
* | predicted | observed | *
* +-----------+------------+----------+ *
* | Lhelix = | 24.6 | 22.2 | *
* +-----------+------------+----------+ *
*..........................................................................*
* *
* The accuracy matrix in detail: *
* *
* +---------------------------------+ *
* | number of residues with H, L | *
* +---------+------+-------+--------+ *
* | |net H | net L |sum obs | *
* +---------+------+-------+--------+ *
* | obs H | 5214 | 492 | 5706 | *
* | obs L | 1050 | 22423 | 23473 | *
* +---------+------+-------+--------+ *
* | sum Net | 6264 | 22915 | 29179 | *
* +---------+------+-------+--------+ *
* *
* Note: This table is to be read in the following manner: *
* 5214 of the 6264 residues predicted to be in a helical *
* trans-membrane region were observed to be in the lipid bilayer; *
* 1050, however, were observed on either side of the membrane, *
* i.e. in loop (or non-membrane) regions. The term "observed" *
* refers to the DSSP assignment of secondary structure calculated *
* from the 3D coordinates of experimentally determined structures *
* (Dictionary of Secondary Structure of Proteins: Kabsch & Sander *
* (1983) Biopolymers, 22, 2577-2637) where these were available. *
* For all other proteins, the assignment of trans-membrane *
* segments has been taken from the SWISS-PROT data bank (Bairoch, *
* A.; Boeckmann, B.: The SWISS-PROT protein sequence data bank. *
* Nucl. Acids Res. 20: 2019-2022, 1992). *
* *
*..........................................................................*
* *
* Overlap between predicted and observed segments: *
* *
* +-----------------+---------------+----------------+ *
* | segment overlap | % of observed | % of predicted | *
* | Sov helix | 95.6% | 95.5% | *
* | Sov loop | 83.6% | 97.2% | *
* +-----------------+---------------+----------------+ *
* | Sov total | 86.0% | 96.8% | *
* +-----------------+---------------+----------------+ *
* *
* Definition of Sov in: Rost et al., JMB, 1994, 235, 13-26. *
* *
* As helical trans-membrane segments are longer than globular *
* helices, correctly predicted segments can easily be identified. *
* PHDhtm misses 5 out of 258 observed segments, predicts 6 where *
* none is observed, and in 3 cases the predicted helical segment *
* overlaps two observed regions. Thus, in total more than 95% of *
* all segments are correctly predicted. *
* *
*..........................................................................*
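The segment counts quoted above can be turned into percentages in more than one way; the text does not spell out the bookkeeping. The sketch below shows one possible reading, in which each predicted segment that spans two observed regions counts both of those observed segments as not correctly predicted. All variable names are hypothetical.

```python
observed_segments = 258
missed = 5               # observed segments with no overlapping prediction
false_positives = 6      # predicted segments with no observed counterpart
merged_predictions = 3   # predicted segments each overlapping two observed ones

# ASSUMPTION: a merged prediction spoils both observed segments it spans.
correct = observed_segments - missed - 2 * merged_predictions
predicted_segments = correct + merged_predictions + false_positives

pct_of_observed = 100.0 * correct / observed_segments
pct_of_predicted = 100.0 * correct / predicted_segments
print(f"{correct} correct: {pct_of_observed:.1f}% of observed, "
      f"{pct_of_predicted:.1f}% of predicted")
```

Under this reading both percentages stay above 95%, which is consistent with the statement in the text.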
* *
* Entropy of prediction (information measure): *
* *
* +-----------------+ *
* | I = 0.64 | *
* +-----------------+ *
* *
* (For comparison: homology modelling of globular proteins in three *
* states: I=0.62.) *
* *
* *
****************************************************************************
* *
* *
* Position-specific reliability index *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* The network predicts two states, helical trans-membrane region and *
* rest, using two output units. The prediction is assigned by choosing *
* the unit with the larger output ("winner takes all"). However, the *
* real-valued outputs of the two units contain additional information: *
* for example, the difference between the two output units can be used *
* to derive a "reliability index". This index is given for each residue *
* along with the prediction. The index is scaled to have values between *
* 0 (lowest reliability) and 9 (highest). *
* The accuracies (Qtot) to be expected for residues with an index at or *
* above a particular value are given below, as well as the fraction of *
* such residues (%res): *
* *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | index| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | *
* | %res |100.0| 98.8| 97.3| 95.9| 94.1| 92.3| 89.9| 86.2| 75.0| 66.8| *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | | | | | | | | | | | | *
* | Qtot | 94.7| 95.2| 95.6| 96.2| 96.7| 97.2| 97.7| 98.4| 99.4| 99.8| *
* | | | | | | | | | | | | *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | H%obs| 91.8| 92.9| 93.8| 94.4| 95.0| 95.7| 96.2| 96.8| 95.5| 78.7| *
* | L%obs| 95.3| 95.7| 96.1| 96.6| 97.0| 97.5| 98.1| 98.8| 99.7|100.0| *
* | | | | | | | | | | | | *
* | H%prd| 82.7| 83.8| 85.0| 86.7| 88.1| 89.7| 91.4| 93.8| 96.3| 97.1| *
* | L%prd| 97.9| 98.3| 98.5| 98.7| 98.8| 99.0| 99.2| 99.4| 99.7| 99.9| *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* *
* The above table gives cumulative results. For example, 92.3% of all *
* residues have a reliability index of at least 5. The overall *
* two-state accuracy for this subset is 97.2%. Within this subset, *
* 95.7% of the observed helical trans-membrane residues are correctly *
* predicted, and 89.7% of all residues predicted to be in a helical *
* trans-membrane segment are correct. *
* *
* *
* *
****************************************************************************
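As an illustration of the reliability index and of the cumulative reading of the table, here is a minimal sketch. The exact scaling used by PHDhtm is not stated in this document: the rule below (integer part of ten times the difference of the two outputs, capped at 9) is an assumption, as are the function names and the toy records, which are made up and are not PHDhtm results.

```python
def reliability_index(out_helix, out_loop):
    # ASSUMPTION: integer part of 10 * |difference|, capped at 9.
    return min(9, int(10 * abs(out_helix - out_loop)))


def predict_state(out_helix, out_loop):
    # "Winner takes all": the larger output unit decides the state.
    # Ties go to loop here, which is an arbitrary choice for this sketch.
    state = "H" if out_helix > out_loop else "L"
    return state, reliability_index(out_helix, out_loop)


def cumulative_accuracy(records, threshold):
    """Return (%res, Qtot) for residues with index >= threshold.

    records: list of (predicted_state, observed_state, index) triples.
    """
    subset = [(pred, obs) for pred, obs, ri in records if ri >= threshold]
    if not subset:
        return 0.0, None
    pct_res = 100.0 * len(subset) / len(records)
    q_tot = 100.0 * sum(pred == obs for pred, obs in subset) / len(subset)
    return pct_res, q_tot


# Made-up example data, for illustration only.
toy_records = [("H", "H", 9), ("L", "L", 7), ("H", "L", 2), ("L", "L", 0)]
print(predict_state(0.875, 0.125))
print(cumulative_accuracy(toy_records, 5))
```

Raising the threshold trades coverage (%res) for accuracy (Qtot), which is the pattern shown in the table.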