SYM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SYM Abbreviations for accuracy of secondary structure prediction
SYM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SYM
SYM For an explanation of the scores, please see:
SYM    per-residue accuracy: Rost & Sander, JMB, 1993, 232, 584-599
SYM    per-segment accuracy: Rost et al., JMB, 1994, 235, 13-26
SYM
SYM H, E, L:    helix (H), extended strand (E), all others (L = loop)
SYM
SYM obs, prd:   observed, predicted
SYM
SYM ~~~~~~~~~~~~~~~~~~
SYM Per-residue scores
SYM ~~~~~~~~~~~~~~~~~~
SYM
SYM A(i,j):     number of residues observed in secondary structure state i
SYM             and predicted in secondary structure state j, where i and j
SYM             can each be one of the following: helix (H), strand (E),
SYM             or other (L)
SYM
SYM             number of residues correctly predicted in state i
SYM Q(i)obs  =  ------------------------------------------------- * 100
SYM             number of residues observed in state i
SYM
SYM             number of residues correctly predicted in state i
SYM Q(i)prd  =  ------------------------------------------------- * 100
SYM             number of residues predicted in state i
SYM
SYM Q3          overall three-state per-residue accuracy (three states: H,E,L)
SYM             defined by:
SYM             number of residues correctly predicted
SYM          =  --------------------------------------- * 100
SYM             number of all residues
SYM
SYM BAD         percentage of residues predicted in helix and observed in
SYM             strand, or predicted in strand and observed in helix
SYM
SYM OVER        percentage of residues predicted in helix or strand, and
SYM             observed in loop
SYM
SYM UNDER       percentage of residues predicted in loop, and observed in
SYM             helix or strand
SYM
SYM Iobs        information entropy contained in matrix A(i,j), defined by:
SYM
SYM             SUM                      SUM
SYM             SUM a(i) * ln a(i)   -   SUM A(i,j) * ln A(i,j)
SYM             SUM                      SUM
SYM              i                        ij
SYM          =  ------------------------------------------------
SYM
SYM                          SUM
SYM             N * ln N  -  SUM b(i) * ln b(i)
SYM                          SUM
SYM                           i
SYM
SYM             where N is the number of residues, a(i) the number of
SYM             residues predicted to be in secondary structure i, b(i) the
SYM             number of residues observed to be in i, and A(i,j) the
SYM             number of residues observed to be in i and predicted to be
SYM             in j.
SYM
SYM Iprd        information entropy weighted by the predicted numbers, i.e.,
SYM             the same as Iobs with b(i) and a(i) exchanged.
SYM
SYM COR(i)      Matthews correlation coefficient for structure i
SYM
SYM <Q3>/prot
SYM             overall three-state accuracy averaged over proteins (as
SYM             opposed to residues).
SYM
SYM Dcontent(i)
SYM             difference between observed and predicted content of
SYM             secondary structure type i (percentage)
SYM
SYM
SYM ~~~~~~~~~~~~~~~~~~
SYM Per-segment scores
SYM ~~~~~~~~~~~~~~~~~~
SYM
SYM avL(i)obs   average length for the structure type i as observed,
SYM             e.g., average length of an observed helix
SYM
SYM avL(i)prd   average length for the structure type i as predicted,
SYM             e.g., average length of a predicted helix
SYM
SYM SOV(i)obs
SYM SOV(i)prd
SYM             fractional overlap (in percentage) between segments
SYM             predicted and observed in structure type i, defined by:
SYM
SYM                      SUM  1     MINOV(S1;S2) + DELTA
SYM             SOV(i) = SUM  -  *  -------------------- * LEN(S1)
SYM                      SUM  N        MAXOV(S1;S2)
SYM                       S
SYM
SYM             where N is the total number of residues, S1 and S2 are the
SYM             observed and predicted secondary structure segments (in
SYM             state i), and LEN(S1) is the number of residues in segment
SYM             S1. The sum (SUM) is taken over all segment pairs
SYM             S={S1,S2}. The actual overlap between the two segments is
SYM             MINOV, i.e., the number of residues for which both
SYM             segments have, e.g., an H (helix) in common; MAXOV is the
SYM             total extent of both segments, i.e., the number of residues
SYM             for which either of the two has, say, the assigned state H.
SYM             The accepted variation DELTA assures a ratio of 1.0 when
SYM             there are only minor deviations at segment ends; it is
SYM             chosen to be smaller than MINOV and smaller than half the
SYM             length of segment S1.
SYM             The ratio MINOV/MAXOV is constrained to a maximum value of
SYM             1.0, i.e., the allowance cannot lead to a "more than
SYM             perfect" value of fractional overlap for any segment
SYM             comparison.
SYM    obs      The addition of 'obs' (SOV(i)obs) indicates that the length
SYM             of the observed segments was used for weighting (likelihood
SYM             that an observed segment is correctly predicted), i.e., S1
SYM             is the observed segment.
SYM    prd      In contrast, 'prd' labels the weighting by the length of
SYM             the predicted segments (likelihood that a predicted segment
SYM             is correct).
SYM
SYM SOV3        fractional segment overlap for all three states H, E, L
SYM
#
# *****************************
# Prediction accuracy for FIRST
# *****************************
#
# A(i,j): number of residues observed in state i, predicted in j:
#
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS |  prd H  |  prd E  |  prd L  | obs Sum |
DAT +---------+---------+---------+---------+---------+
DAT | obs H   |       8 |       0 |       2 |      10 |
DAT | obs E   |       0 |       0 |       0 |       0 |
DAT | obs L   |       0 |       0 |       5 |       5 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum |       8 |       0 |       7 |      15 |
DAT +---------+---------+---------+---------+---------+
#
# Per-residue and Per-segment scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT |       Per-residue scores        | |          Per-segment scores           |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES  |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i = H   |    80 |   100 |  0.76 | |   100.0 |   100.0 |     5.0 |     4.0 |
DAT | i = E   |     0 |     0 |  0.00 | |     0.0 |     0.0 |     0.0 |     0.0 |
DAT | i = L   |   100 |    71 |  0.76 | |   100.0 |   100.0 |     2.5 |     3.5 |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
#
# Overall scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT |   Overall per-residue scores    | |      Overall per-segment scores       |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER  |    0.0 | UNDER |   13.3 | |                                       |
DAT | I obs |   0.48 | I prd |   0.48 | |                                       |
DAT | Q3    |   86.7 | BAD   |    0.0 | | SOV3obs |   100.0 | SOV3prd |   100.0 |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
#
# ******************************
# Prediction accuracy for SECOND
# ******************************
#
# A(i,j): number of residues observed in state i, predicted in j:
#
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS |  prd H  |  prd E  |  prd L  | obs Sum |
DAT +---------+---------+---------+---------+---------+
DAT | obs H   |       0 |       2 |       2 |       4 |
DAT | obs E   |       0 |       0 |       0 |       0 |
DAT | obs L   |       0 |       1 |       2 |       3 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum |       0 |       3 |       4 |       7 |
DAT +---------+---------+---------+---------+---------+
#
# Per-residue and Per-segment scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT |       Per-residue scores        | |          Per-segment scores           |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES  |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i = H   |     0 |     0 |  0.00 | |     0.0 |     0.0 |     4.0 |     0.0 |
DAT | i = E   |     0 |     0 |  0.00 | |     0.0 |     0.0 |     0.0 |     3.0 |
DAT | i = L   |    66 |    50 |  0.17 | |   100.0 |    50.0 |     3.0 |     2.0 |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
#
# Overall scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT |   Overall per-residue scores    | |      Overall per-segment scores       |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER  |   14.3 | UNDER |   28.6 | |                                       |
DAT | I obs |  -0.38 | I prd |  -0.38 | |                                       |
DAT | Q3    |   28.6 | BAD   |   28.6 | | SOV3obs |    42.9 | SOV3prd |    28.6 |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
#
# *************************************************
# Prediction accuracy for Average over all residues
# *************************************************
#
# A(i,j): number of residues observed in state i, predicted in j:
#
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS |  prd H  |  prd E  |  prd L  | obs Sum |
DAT +---------+---------+---------+---------+---------+
DAT | obs H   |       8 |       2 |       4 |      14 |
DAT | obs E   |       0 |       0 |       0 |       0 |
DAT | obs L   |       0 |       1 |       7 |       8 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum |       8 |       3 |      11 |      22 |
DAT +---------+---------+---------+---------+---------+
#
# Per-residue and Per-segment scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT |       Per-residue scores        | |          Per-segment scores           |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES  |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i = H   |    57 |   100 |  0.57 | |    71.4 |   100.0 |     4.7 |     4.0 |
DAT | i = E   |     0 |     0 |  0.00 | |     0.0 |     0.0 |     0.0 |     3.0 |
DAT | i = L   |    87 |    63 |  0.57 | |   100.0 |    81.8 |     2.7 |     2.8 |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
#
# Overall scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT |   Overall per-residue scores    | |      Overall per-segment scores       |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER  |    4.5 | UNDER |   18.2 | |                                       |
DAT | I obs |  42.00 | I prd |  42.00 | |                                       |
DAT | Q3    |   68.2 | BAD   |    9.1 | | SOV3obs |    81.8 | SOV3prd |    77.3 |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
#
# Per-residue accuracy averaged over all 2 proteins:
#
+---------------------+---------------------------------+
| <Q3>/prot  =  57.62 | one standard deviation =  57.62 |
+---------------------+---------------------------------+
#
# Accuracy of predicting secondary structural content:
#
DAT +---------------------+---------------------------------+
DAT | Dcontent H =  35.24 | one standard deviation =  35.24 |
DAT | Dcontent E =  21.43 | one standard deviation =  21.43 |
DAT +---------------------+---------------------------------+
#
# Accuracy of predicting secondary structural class:
#
# Sorting into structure class according to
#    Zhang, C.-T. and Chou, K.-C., Prot. Sci. 1:401-408, 1992:
#    all-H: percentage of H >= 45% , percentage of E <   5%
#    all-E: percentage of H <   5% , percentage of E >= 45%
#    mix  : percentage of H >= 30% , percentage of E >= 20%
#
DAT +-------+-------+-------+-------+-------+-------+
DAT |       |  sum  |  sum  |  sum  |   Q   |   Q   |
DAT | class |  obs  |  prd  |correct|  %obs |  %prd |
DAT +-------+-------+-------+-------+-------+-------+
DAT | all-H |     2 |     1 |     1 |  50.0 | 100.0 |
DAT | all-E |     0 |     0 |     0 |   0.0 |   0.0 |
DAT | mix   |     0 |     0 |     0 |   0.0 |   0.0 |
DAT | other |     0 |     1 |     0 |   0.0 |   0.0 |
DAT +-------+-------+-------+-------+-------+-------+
DAT | SUM   |     4 |     4 |     2 |  50.0 |  50.0 |
DAT +-------+-------+-------+-------+-------+-------+
END
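#
# Illustration (not part of the program output): a minimal Python sketch of
# how the per-residue scores defined in the abbreviations section could be
# recomputed from a matrix A(i,j). The names per_residue_scores, FIRST and
# SECOND are hypothetical and chosen only for this example; the matrices are
# copied from the NUMBERS tables for FIRST and SECOND. COR(i) is computed
# with the standard Matthews formula and set to 0 when its denominator
# vanishes, which is an assumption about how the program handles empty
# states. UNDER is taken as "observed in helix or strand, predicted in loop".
#

```python
import math

STATES = ("H", "E", "L")


def per_residue_scores(A):
    """A[i][j] = number of residues observed in state i, predicted in state j."""
    n = sum(A[i][j] for i in STATES for j in STATES)
    obs = {i: sum(A[i][j] for j in STATES) for i in STATES}  # row sums
    prd = {j: sum(A[i][j] for i in STATES) for j in STATES}  # column sums

    def pct(num, den):
        # Percentages are reported as 0 when the denominator is empty.
        return 100.0 * num / den if den else 0.0

    scores = {
        "Q3": pct(sum(A[s][s] for s in STATES), n),
        "BAD": pct(A["H"]["E"] + A["E"]["H"], n),    # helix/strand confusion
        "OVER": pct(A["L"]["H"] + A["L"]["E"], n),   # loop predicted as H or E
        "UNDER": pct(A["H"]["L"] + A["E"]["L"], n),  # H or E predicted as loop
    }
    for s in STATES:
        scores[f"Q({s})obs"] = pct(A[s][s], obs[s])
        scores[f"Q({s})prd"] = pct(A[s][s], prd[s])
        # One-vs-rest counts for the Matthews correlation coefficient.
        tp = A[s][s]
        fn = obs[s] - tp
        fp = prd[s] - tp
        tn = n - tp - fn - fp
        den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        scores[f"COR({s})"] = (tp * tn - fp * fn) / den if den else 0.0
    return scores


# Matrices copied from the NUMBERS tables (rows: observed, columns: predicted).
FIRST = {
    "H": {"H": 8, "E": 0, "L": 2},
    "E": {"H": 0, "E": 0, "L": 0},
    "L": {"H": 0, "E": 0, "L": 5},
}
SECOND = {
    "H": {"H": 0, "E": 2, "L": 2},
    "E": {"H": 0, "E": 0, "L": 0},
    "L": {"H": 0, "E": 1, "L": 2},
}

first = per_residue_scores(FIRST)
second = per_residue_scores(SECOND)

# Q3 averaged over proteins rather than over residues.
q3_per_protein = (first["Q3"] + second["Q3"]) / 2

for name, result in (("FIRST", first), ("SECOND", second)):
    print(name, {key: round(value, 2) for key, value in result.items()})
print("<Q3>/prot", round(q3_per_protein, 2))
```

# Rounded to the precision of the tables, the sketch gives the Q3, BAD, OVER,
# UNDER and COR(i) values listed for FIRST and SECOND, and the per-protein
# mean of Q3. The tabulated Q(i)obs and Q(i)prd values appear to be truncated
# to integers rather than rounded (e.g. 71 for 5/7), so the sketch does not
# try to imitate that formatting.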