Example of output produced by EvalSec

(evaluation of secondary structure prediction accuracy)

OUTPUT

SYM  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~            
SYM  Abbreviations for accuracy of secondary structure prediction            
SYM  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~            
SYM                                                                          
SYM          For an explanation of the scores, please see:                   
SYM          per-residue accuracy: Rost & Sander, JMB, 1993, 232, 584-599    
SYM          per-segment accuracy: Rost et al., JMB, 1994, 235, 13-26        
SYM                                                                          
SYM  H, E, L:   helix (H), extended strand (E), all others (L = loop)        
SYM                                                                          
SYM  obs, prd:  observed, predicted                                          
SYM                                                                          
SYM  ~~~~~~~~~~~~~~~~~~                                                      
SYM  Per-residue scores                                                      
SYM  ~~~~~~~~~~~~~~~~~~                                                      
SYM                                                                          
SYM  A(i,j): number of residues observed in secondary structure state i and  
SYM          predicted in secondary structure state j, where i and j can each
SYM          be one of the following: helix (H), strand (E), or other (L)
SYM                                                                          
SYM            number of residues correctly predicted in state i             
SYM  Q(i)obs = ------------------------------------------------- * 100       
SYM            number of residues observed in state i                        
SYM                                                                          
SYM            number of residues correctly predicted in state i             
SYM  Q(i)prd = ------------------------------------------------- * 100       
SYM            number of residues predicted in state i                       
SYM                                                                          
SYM  Q3      overall three-state per-residue accuracy (three states: H,E,L)  
SYM          defined by:                                                     
SYM            number of residues correctly predicted                        
SYM          = --------------------------------------- * 100                 
SYM            number of all residues                                        
SYM                                                                          
SYM  BAD     percentage of residues predicted in helix, observed in strand   
SYM          or predicted in strand and observed in helix                    
SYM                                                                          
SYM  OVER    percentage of residues predicted in helix or strand, and ob-    
SYM          served in loop                                                  
SYM                                                                          
SYM  UNDER   percentage of residues observed in helix or strand, and pre-
SYM          dicted in loop
SYM                                                                          
SYM  Iobs    information entropy contained in matrix A(i,j), defined by:     
SYM                                                                          
SYM              SUM                    SUM                                  
SYM              SUM   a(i)*ln a(i)  -  SUM  A(i,j) * ln A(i,j)              
SYM              SUM                    SUM                                  
SYM               i                      ij                                  
SYM          =  ________________________________________________             
SYM                                                                          
SYM                                 SUM                                      
SYM                    N * ln N  -  SUM  b(i) * ln b(i)                      
SYM                                 SUM                                      
SYM                                  i                                       
SYM                                                                          
SYM          where N is the number of residues, a(i) the number of residues  
SYM          predicted to be in secondary structure i; b(i) the number of re-
SYM          sidues observed to be in i; and A(i,j) the number of residues
SYM          observed to be in i and predicted to be in j.
SYM                                                                          
SYM  Iprd    information entropy but weighted by the predicted numbers, i.e.,
SYM          same as Iobs by exchanging b(i) <-> a(i).                       
SYM                                                                          
SYM  COR(i)  Matthews correlation coefficient for structure i
SYM                                                                          
SYM  <Q3>/prot
SYM          overall three-state accuracy averaged over proteins (as opposed
SYM          to residues).
SYM                                                                          
SYM  Dcontent(i)                                                             
SYM          difference between observed and predicted content of secondary  
SYM          structure type i (percentage)                                   
SYM                                                                          
SYM                                                                          
SYM  ~~~~~~~~~~~~~~~~~~                                                      
SYM  Per-segment scores                                                      
SYM  ~~~~~~~~~~~~~~~~~~                                                      
SYM                                                                          
SYM  avL(i)obs   average length for the structure type i as observed         
SYM          e.g., average length of an observed helix                       
SYM                                                                          
SYM  avL(i)prd   average length for the structure type i as predicted        
SYM          e.g., average length of a predicted helix                       
SYM                                                                          
SYM  SOV(i)obs                                                               
SYM  SOV(i)prd                                                               
SYM          fractional overlap (in percentage) between segments predicted
SYM          and observed in structure type i, defined by:
SYM                                                                          
SYM            SUM   1     MINOV(S1;S2) + DELTA                              
SYM  SOV(i)  = SUM   -  *  --------------------  *  LEN(S1)                  
SYM            SUM   N         MAXOV(S1;S2)                                  
SYM             S                                                            
SYM                                                                          
SYM          where N is the total number of residues, S1 and S2 are the ob-  
SYM          served and predicted secondary structure segments (in state i), 
SYM          and LEN(S1) is the number of residues in segment S1.
SYM          The sum (SUM) is taken over all segment pairs S = {S1,S2}.
SYM          The actual overlap between the two segments is MINOV, i.e., the
SYM          number of residues for which both segments have, e.g., an H (he-
SYM          lix) in common; MAXOV is the total extent of both segments, i.e.,
SYM          the number of residues for which either of the two has, say,
SYM          the assigned state H.  The accepted variation DELTA assures a   
SYM          ratio of 1.0 when there are only minor deviations at segment    
SYM          ends; it is chosen to be smaller than MINOV and smaller than    
SYM          half the length of segment S1.  The ratio MINOV/MAXOV is con-   
SYM          strained to a maximum value of 1.0, i.e., the allowance cannot  
SYM          lead to a "more than perfect" value of fractional overlap for   
SYM  obs     any segment comparison.  The addition of 'obs' (SOV(i)obs)      
SYM          indicates that the length of the observed segments was used for 
SYM          weighting (likelihood that an observed segment is correctly     
SYM  prd     predicted), i.e., S1 is the observed segment. In contrast, 'prd'
SYM          labels the weighting by the length of the predicted segments
SYM          (likelihood that a predicted segment is correct).               
SYM                                                                          
SYM  SOV3    fractional segment overlap for all three states H, E, L         
SYM                                                                          
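# 
# The per-segment quantities above can be illustrated with a short Python
# sketch. It is not EvalSec code: the function names and the two 15-residue
# strings are hypothetical, chosen so that the average lengths agree with
# the avL values printed for FIRST below. Only avL(i) and the MINOV/MAXOV
# pair are covered; DELTA and the SOV normalisation are left out.
# 

```python
from itertools import groupby


def segments(states, kind):
    """Return (start, end) pairs, end exclusive, for each run of `kind`."""
    out, pos = [], 0
    for state, run in groupby(states):
        length = len(list(run))
        if state == kind:
            out.append((pos, pos + length))
        pos += length
    return out


def average_length(states, kind):
    """avL: mean segment length for one state, 0.0 if the state is absent."""
    segs = segments(states, kind)
    return sum(end - start for start, end in segs) / len(segs) if segs else 0.0


def minov_maxov(s1, s2):
    """MINOV (shared residues) and MAXOV (total extent) of two segments.

    MAXOV is only meaningful when the two segments actually overlap.
    """
    minov = min(s1[1], s2[1]) - max(s1[0], s2[0])
    maxov = max(s1[1], s2[1]) - min(s1[0], s2[0])
    return max(minov, 0), maxov


# Hypothetical observed and predicted strings (not EvalSec input).
OBS = "HHHHHLLHHHHHLLL"
PRD = "HHHHLLLHHHHLLLL"

print(average_length(OBS, "H"), average_length(PRD, "H"))
print(minov_maxov(segments(OBS, "H")[0], segments(PRD, "H")[0]))
```

# With these strings the helix averages are 5.0 (observed) and 4.0
# (predicted), and the first helix pair has MINOV = 4 and MAXOV = 5.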
# 
# *****************************
# Prediction accuracy for FIRST                                   
# *****************************
# 
# A(i,j): number of residues observed in state i, predicted in j:
# 
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS |  prd  H |  prd  E |  prd  L | obs Sum | 
DAT +---------+---------+---------+---------+---------+
DAT |  obs  H |       8 |       0 |       2 |      10 |
DAT |  obs  E |       0 |       0 |       0 |       0 |
DAT |  obs  L |       0 |       0 |       5 |       5 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum |       8 |       0 |       7 |      15 |
DAT +---------+---------+---------+---------+---------+
# 
# Per-residue and Per-segment scores:
# 
DAT +---------------------------------+ +---------------------------------------+
DAT |        Per-residue scores       | |           Per-segment scores          |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES  |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i =  H  |    80 |   100 |  0.76 | |   100.0 |   100.0 |     5.0 |     4.0 | 
DAT | i =  E  |     0 |     0 |  0.00 | |     0.0 |     0.0 |     0.0 |     0.0 | 
DAT | i =  L  |   100 |    71 |  0.76 | |   100.0 |   100.0 |     2.5 |     3.5 | 
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
# 
# Overall scores:
# 
DAT +---------------------------------+ +---------------------------------------+
DAT |   Overall per-residue scores    | |       Overall per-segment scores      |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER  |   0.0  | UNDER |  13.3  | |                                       |
DAT | I obs |   0.48 | I prd |   0.48 | |                                       |
DAT |  Q3   |  86.7  |  BAD  |   0.0  | | SOV3obs |  100.0  | SOV3prd |  100.0  |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
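# 
# The per-residue scores printed for FIRST can be recomputed from its
# A(i,j) table. The Python sketch below is an illustration, not EvalSec
# code; all names are hypothetical. It takes UNDER as "observed in helix or
# strand, predicted in loop", which is the reading that reproduces the 13.3
# shown above, and it sets a score to 0 whenever its denominator is zero
# (an assumption that matches the E row).
# 

```python
import math

STATES = ("H", "E", "L")

# A[obs][prd]: counts copied from the FIRST table.
A_FIRST = {
    "H": {"H": 8, "E": 0, "L": 2},
    "E": {"H": 0, "E": 0, "L": 0},
    "L": {"H": 0, "E": 0, "L": 5},
}


def pct(num, den):
    """Percentage, defined as 0.0 when the denominator is zero."""
    return 100.0 * num / den if den else 0.0


def per_residue_scores(A):
    n = sum(A[i][j] for i in STATES for j in STATES)
    obs = {i: sum(A[i][j] for j in STATES) for i in STATES}  # row sums
    prd = {j: sum(A[i][j] for i in STATES) for j in STATES}  # column sums
    scores = {
        "Q3": pct(sum(A[i][i] for i in STATES), n),
        "BAD": pct(A["H"]["E"] + A["E"]["H"], n),
        "OVER": pct(A["L"]["H"] + A["L"]["E"], n),
        "UNDER": pct(A["H"]["L"] + A["E"]["L"], n),
    }
    for i in STATES:
        tp = A[i][i]
        fn = obs[i] - tp
        fp = prd[i] - tp
        tn = n - tp - fn - fp
        den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        scores[f"Q({i})obs"] = pct(tp, obs[i])
        scores[f"Q({i})prd"] = pct(tp, prd[i])
        # Matthews correlation coefficient for state i.
        scores[f"COR({i})"] = (tp * tn - fp * fn) / den if den else 0.0
    return scores


for name, value in per_residue_scores(A_FIRST).items():
    print(f"{name:8s} {value:6.2f}")
```

# Rounded to the precision of the tables, this gives Q3 = 86.7, UNDER = 13.3,
# Q(H)obs = 80, Q(H)prd = 100 and COR(H) = COR(L) = 0.76.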
# 
# ******************************
# Prediction accuracy for SECOND                                  
# ******************************
# 
# A(i,j): number of residues observed in state i, predicted in j:
# 
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS |  prd  H |  prd  E |  prd  L | obs Sum | 
DAT +---------+---------+---------+---------+---------+
DAT |  obs  H |       0 |       2 |       2 |       4 |
DAT |  obs  E |       0 |       0 |       0 |       0 |
DAT |  obs  L |       0 |       1 |       2 |       3 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum |       0 |       3 |       4 |       7 |
DAT +---------+---------+---------+---------+---------+
# 
# Per-residue and Per-segment scores:
# 
DAT +---------------------------------+ +---------------------------------------+
DAT |        Per-residue scores       | |           Per-segment scores          |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES  |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i =  H  |     0 |     0 |  0.00 | |     0.0 |     0.0 |     4.0 |     0.0 | 
DAT | i =  E  |     0 |     0 |  0.00 | |     0.0 |     0.0 |     0.0 |     3.0 | 
DAT | i =  L  |    66 |    50 |  0.17 | |   100.0 |    50.0 |     3.0 |     2.0 | 
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
# 
# Overall scores:
# 
DAT +---------------------------------+ +---------------------------------------+
DAT |   Overall per-residue scores    | |       Overall per-segment scores      |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER  |  14.3  | UNDER |  28.6  | |                                       |
DAT | I obs |  -0.38 | I prd |  -0.38 | |                                       |
DAT |  Q3   |  28.6  |  BAD  |  28.6  | | SOV3obs |   42.9  | SOV3prd |   28.6  |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
# 
# *************************************************
# Prediction accuracy for Average over all residues               
# *************************************************
# 
# A(i,j): number of residues observed in state i, predicted in j:
# 
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS |  prd  H |  prd  E |  prd  L | obs Sum | 
DAT +---------+---------+---------+---------+---------+
DAT |  obs  H |       8 |       2 |       4 |      14 |
DAT |  obs  E |       0 |       0 |       0 |       0 |
DAT |  obs  L |       0 |       1 |       7 |       8 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum |       8 |       3 |      11 |      22 |
DAT +---------+---------+---------+---------+---------+
# 
# Per-residue and Per-segment scores:
# 
DAT +---------------------------------+ +---------------------------------------+
DAT |        Per-residue scores       | |           Per-segment scores          |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES  |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i =  H  |    57 |   100 |  0.57 | |    71.4 |   100.0 |     4.7 |     4.0 | 
DAT | i =  E  |     0 |     0 |  0.00 | |     0.0 |     0.0 |     0.0 |     3.0 | 
DAT | i =  L  |    87 |    63 |  0.57 | |   100.0 |    81.8 |     2.7 |     2.8 | 
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
# 
# Overall scores:
# 
DAT +---------------------------------+ +---------------------------------------+
DAT |   Overall per-residue scores    | |       Overall per-segment scores      |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER  |   4.5  | UNDER |  18.2  | |                                       |
DAT | I obs |  42.00 | I prd |  42.00 | |                                       |
DAT |  Q3   |  68.2  |  BAD  |   9.1  | | SOV3obs |   81.8  | SOV3prd |   77.3  |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
# 
# Per-residue accuracy averaged over all     2 proteins:
# 
DAT +---------------------+---------------------------------+
DAT | <Q3>/prot  =  57.62 | one standard deviation =  57.62 |
DAT +---------------------+---------------------------------+
# 
# Accuracy of predicting secondary structural content:
# 
DAT +---------------------+---------------------------------+
DAT | Dcontent H =  35.24 | one standard deviation =  35.24 |
DAT | Dcontent E =  21.43 | one standard deviation =  21.43 |
DAT +---------------------+---------------------------------+
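# 
# The per-protein averages can be checked against the two example
# proteins. The Python sketch below assumes that the per-protein Q3 is the
# plain mean of the two Q3 values and that Dcontent(i) is the mean absolute
# difference between observed and predicted content of state i; both
# assumptions reproduce the printed 57.62, 35.24 and 21.43. The names are
# hypothetical, and the "one standard deviation" column is not reproduced.
# 

```python
# Residue counts per state, copied from the FIRST and SECOND tables above.
PROTEINS = {
    "FIRST": {
        "obs": {"H": 10, "E": 0, "L": 5},
        "prd": {"H": 8, "E": 0, "L": 7},
        "correct": 13,  # diagonal of A(i,j): 8 + 0 + 5
    },
    "SECOND": {
        "obs": {"H": 4, "E": 0, "L": 3},
        "prd": {"H": 0, "E": 3, "L": 4},
        "correct": 2,  # diagonal of A(i,j): 0 + 0 + 2
    },
}


def content(counts, state):
    """Percentage of residues in `state`."""
    return 100.0 * counts[state] / sum(counts.values())


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def q3_per_protein(proteins):
    """Q3 computed per protein, then averaged over proteins."""
    return mean(
        100.0 * p["correct"] / sum(p["obs"].values()) for p in proteins.values()
    )


def dcontent(proteins, state):
    """Mean absolute difference of observed and predicted content."""
    return mean(
        abs(content(p["obs"], state) - content(p["prd"], state))
        for p in proteins.values()
    )


print(round(q3_per_protein(PROTEINS), 2))
print(round(dcontent(PROTEINS, "H"), 2), round(dcontent(PROTEINS, "E"), 2))
```

# Note the difference from the per-residue average: pooling all 22 residues
# gives Q3 = 68.2, whereas averaging the two proteins gives 57.62.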
# 
# Accuracy of predicting secondary structural class:
# 
#        Sorting into structure class according to 
#        Zhang, C.-T. and Chou, K.-C., Prot. Sci. 1:401-408, 1992:
#           all-H: percentage of H >= 45% , percentage of E <  5%
#           all-E: percentage of H <   5% , percentage of E >=45%
#           mix  : percentage of H >= 30% , percentage of E >=20%
# 
DAT +-------+-------+-------+-------+-------+-------+
DAT |       |  sum  |  sum  |  sum  |   Q   |   Q   |
DAT | class |  obs  |  prd  |correct| %obs  | %prd  |
DAT +-------+-------+-------+-------+-------+-------+
DAT | all-H |    2  |    1  |    1  |  50.0 | 100.0 | 
DAT | all-E |    0  |    0  |    0  |   0.0 |   0.0 | 
DAT | mix   |    0  |    0  |    0  |   0.0 |   0.0 | 
DAT | other |    0  |    1  |    0  |   0.0 |   0.0 | 
DAT +-------+-------+-------+-------+-------+-------+
DAT | SUM   |    4  |    4  |    2  |  50.0 |  50.0 | 
DAT +-------+-------+-------+-------+-------+-------+
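# 
# The class thresholds quoted above translate directly into a small
# decision function. This Python sketch is illustrative only (the function
# name is hypothetical); it assumes that anything matching none of the
# three rules falls into "other". The three rules are mutually exclusive,
# so the order of the tests does not matter.
# 

```python
def structure_class(pct_h, pct_e):
    """Assign a structural class from helix and strand content (percent)."""
    if pct_h >= 45 and pct_e < 5:
        return "all-H"
    if pct_h < 5 and pct_e >= 45:
        return "all-E"
    if pct_h >= 30 and pct_e >= 20:
        return "mix"
    return "other"


# Content taken from the FIRST and SECOND tables above.
print(structure_class(100 * 10 / 15, 0.0))   # FIRST, observed
print(structure_class(100 * 8 / 15, 0.0))    # FIRST, predicted
print(structure_class(100 * 4 / 7, 0.0))     # SECOND, observed
print(structure_class(0.0, 100 * 3 / 7))     # SECOND, predicted
```

# Both observed proteins and the FIRST prediction come out as all-H, while
# the SECOND prediction (no helix, about 43% strand) falls into "other",
# in line with the per-class rows of the table.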
END