NOTE: this will be extended to a more general threading format, to be defined by the people in the field (hopefully with the same keywords!)
Bold face: keywords "prediction-based threading" and "return topits"
Effect: alignments with possible remote homologues (<25% sequence identity) are returned additionally in TOPITS format
Bold face: keyword "prediction-based threading"
Effect: alignments with possible remote homologues (<25% sequence identity) are returned
NOTE: only the output specific for this option is given!

Threading results in TOPITS format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________________________________________________________
# TOPITS (Threading One-D Predictions Into Three-D Structures)
# --------------------------------------------------------------------------------
# FORMAT begin
# FORMAT general: - lines starting with hashes contain comments or PARAMETERS
# FORMAT general: - columns are delimited by tabs
# FORMAT general: - the data are given in BLOCKS, each introduced by a line
# FORMAT general: beginning with a hash and a keyword
# FORMAT parameters: '# PARA:tab keyword =tab value tab (further-information)'
# FORMAT notation: '# NOTATION:tab keyword tab explanation'
# FORMAT info: '# INFO:tab text'
# FORMAT blocks 0: '# BLOCK keyword'
# FORMAT blocks 1: column names (tab delimited)
# FORMAT blocks n>1: column data (tab delimited)
# FORMAT file end: '//' is the end of a complete file
# FORMAT end
# --------------------------------------------------------------------------------
# PARA begin
# PARA TOPITS HEADER: PARAMETERS
# PARA: len1 = 24
# PARA: nali = 38
# PARA: listName = /home/phd/ut/topits/mat/x2.list
# PARA: sortMode = ZSCORE
# PARA: weight1 = NO
# PARA: weight2 = NO
# PARA: smin = -1.00
# PARA: smax = 2.00
# PARA: gapOpen = 2
# PARA: gapElon = 0.2
# PARA: indel1 = YES
# PARA: indel2 = NO
# PARA: threshold = ALL
# PARA: str:seq = 50 (i.e. str= 50%, seq= 50%)
# PARA end
# --------------------------------------------------------------------------------
# NOTATION begin
# NOTATION TOPITS HEADER: ABBREVIATIONS PARAMETERS
# NOTATION: len1 : length of search sequence, i.e., your protein
# NOTATION: nali : number of alignments in file
# NOTATION: listName : fold library used for threading
# NOTATION: sortMode : mode of ranking the hits
# NOTATION: weight1 : YES if guide sequence weighted by residue conservation
# NOTATION: weight2 : YES if aligned sequence weighted by residue conservation
# NOTATION: smin : minimal value of alignment metric
# NOTATION: smax : maximal value of alignment metric
# NOTATION: gapOpen : gap open penalty
# NOTATION: gapElon : gap elongation penalty
# NOTATION: indel1 : YES if insertions in sec str regions allowed for guide seq
# NOTATION: indel2 : YES if insertions in sec str regions allowed for aligned seq
# NOTATION: threshold : hits above this threshold included (ALL means no threshold)
# NOTATION: str:seq : weight structure:sequence
# NOTATION TOPITS HEADER: ABBREVIATIONS SUMMARY
# NOTATION: id2 : PDB identifier of aligned structure (1pdbC -> C = chain id)
# NOTATION: pide : percentage of pairwise sequence identity
# NOTATION: lali : length of alignment
# NOTATION: ngap : number of insertions
# NOTATION: lgap : number of residues inserted
# NOTATION: len2 : length of aligned protein structure
# NOTATION: Eali : alignment score
# NOTATION: Zali : alignment zcore; note: hits with z>3 more reliable
# NOTATION: strh : secondary str identity between guide and aligned protein
# NOTATION: ifir : position of first residue of search sequence
# NOTATION: ilas : position of last residue of search sequence
# NOTATION: jfir : pos of first res of remote homologue (e.g. DSSP number)
# NOTATION: jlas : pos of last res of remote homologue (e.g. DSSP number)
# NOTATION: name : name of aligned protein structure
# NOTATION end
# --------------------------------------------------------------------------------
# INFO begin
# INFO TOPITS HEADER: ACCURACY
# INFO: Tested on 80 proteins, TOPITS found the correct remote homologue in about
# INFO: 30%of the cases. Detection accuracy was higher for higher z-scores:
# INFO: ZALI>0 => 1st hit correct in 33% of cases
# INFO: ZALI>3 => 1st hit correct in 50% of cases
# INFO: ZALI>3.5 => 1st hit correct in 60% of cases
# INFO end
# --------------------------------------------------------------------------------
# BLOCK TOPITS HEADER: SUMMARY
rank id2   pide lali ngap lgap len2  Eali  Zali strh ifir ilas jfir jlas  name
1    1ytbA   21   24    0    0  180 18.37  1.46   75    1   24   38   61  1ytb_A TATA-BOX BINDING PROTEIN (YTBP) COMPLEXED WITH DNA
2    2yhx    21   24    1    1  457 17.37  1.28   83    1   24   60   84  2yhx YEAST HEXOKINASE B (E.C.2.7.1.1) COMPLEX WITH
3    1xnb    17   23    1    1  185 17.28  1.26   96    1   23   22   45  1xnb XYLANASE (ENDO-1,4-BETA-XYLANASE) (E.C.3.2.1.8)
4    1ysc    17   24    1    2  421 16.67  1.15   79    1   24  373  398  1ysc SERINE CARBOXYPEPTIDASE (CPY, CPD-Y, OR PROTEINASE C)
5    1whi    13   23    1    1  122 16.10  1.05   91    1   23    4   27  1whi MOL_ID: 1;
6    1xxaA   13   24    0    0  220 16.02  1.03   83    1   24   37   60  1xxa_A MOL_ID: 1;
7    1wkt    26   23    0    0   87 15.95  1.02   70    1   23   24   46  1wkt MOL_ID: 1;
8    1xsoA   30   23    2    4  301 15.95  1.02   87    2   24  152  178  1xso_A CU, ZN SUPEROXIDE DISMUTASE (E.C.1.15.1.1)
9    1xsoA   30   23    2    4  301 15.95  1.02   87    2   24    1   27  1xso_A CU, ZN SUPEROXIDE DISMUTASE (E.C.1.15.1.1)
10   1yal    21   24    1    4  218 15.92  1.01   75    1   24  145  172  1yal MOL_ID: 1;
11   1whtB   20   20    0    0  410 15.83  1.00  100    4   23   24   43  1wht_B SERINE CARBOXYPEPTIDASE II (E.C.3.4.16.1) COMPLEXED WITH
12   1wit    17   23    0    0   93 15.62  0.96   87    1   23    8   30  1wit MOL_ID: 1;
13   1xxaA   13   24    0    0  220 15.18  0.88   75    1   24  110  133  1xxa_A MOL_ID: 1;
14   1xxaA   13   24    0    0  220 15.18  0.88   75    1   24  183  206  1xxa_A MOL_ID: 1;
15   1xyzA   21   19    1    4  320 14.13  0.69   74    1   23  231  249  1xyz_A MOL_ID: 1;
16   1whtB   21   24    1   21  410 14.05  0.67   88    1   24  320  364  1wht_B SERINE CARBOXYPEPTIDASE II (E.C.3.4.16.1) COMPLEXED WITH
17   1xaa    32   19    1    3  345 13.58  0.59   77    2   23  257  275  1xaa MOL_ID: 1;
18   1xis    35   23    1    4  386 12.10  0.32   52    1   23  237  263  1xis XYLOSE ISOMERASE (E.C.5.3.1.5) COMPLEX WITH MN*CL2
19   1ycc    17   23    0    0  108  9.80 -0.10   48    1   23   29   51  1ycc CYTOCHROME C (ISOZYME 1) (REDUCED)
20   9wgaA   18   22    1    2  343  9.65 -0.13   58    1   24  247  268  9wga_A WHEAT GERM AGGLUTININ (ISOLECTIN 2)
21   9wgaA   18   22    1    2  343  8.28 -0.37   54    1   24   75   96  9wga_A WHEAT GERM AGGLUTININ (ISOLECTIN 2)
22   1wdcC   22   23    0    0  360  8.28 -0.37   26    1   23  231  253  1wdc_C MOL_ID: 1;
23   1wdcB   22   23    0    0  360  8.28 -0.37   26    1   23  231  253  1wdc_B MOL_ID: 1;
24   1wdcA   22   23    0    0  360  8.28 -0.37   26    1   23  231  253  1wdc_A MOL_ID: 1;
25   1wdcA   17   24    2   16  360  6.67 -0.67   58    1   24  154  193  1wdc_A MOL_ID: 1;
26   1wdcC   17   24    2   16  360  6.67 -0.67   58    1   24  154  193  1wdc_C MOL_ID: 1;
27   1wdcB   17   24    2   16  360  6.67 -0.67   58    1   24  154  193  1wdc_B MOL_ID: 1;
28   1zfd    16   19    1    4   32  6.47 -0.70   48    1   23    1   19  1zfd MOL_ID: 1;
29   1yrnA   33    6    0    0  128  4.25 -1.11   33   18   23   12   17  1yrn_A MOL_ID: 1;
30   1yrnB   33    6    0    0  128  4.25 -1.11   33   18   23   12   17  1yrn_B MOL_ID: 1;
31   1yrnA   20    5    0    0  128  4.03 -1.15   60    1    5  124  128  1yrn_A MOL_ID: 1;
32   1yrnB   20    5    0    0  128  4.03 -1.15   60    1    5  124  128  1yrn_B MOL_ID: 1;
33   3wrp    25    4    0    0  101  3.72 -1.20   75   20   23   24   27  3wrp $TRP APOREPRESSOR
34   1wdcA   33    3    0    0  360  3.10 -1.32  100    1    3   61   63  1wdc_A MOL_ID: 1;
35   1wdcB   33    3    0    0  360  3.10 -1.32  100    1    3   61   63  1wdc_B MOL_ID: 1;
36   1wdcC   33    3    0    0  360  3.10 -1.32  100    1    3   61   63  1wdc_C MOL_ID: 1;
37   1wfbA    0    3    0    0   37  2.52 -1.42   67   21   23   35   37  1wfb_A ANTIFREEZE PROTEIN ISOFORM HPLC6 (-180 DEGREES C)
38   1xxaA    0    1    0    0  220  0.70 -1.75  100    1    1  220  220  1xxa_A MOL_ID: 1;
//
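
The FORMAT header above describes a simple line-oriented layout, so a result file can be read with a few lines of code. The following is a minimal Python sketch, not part of TOPITS itself: the function names `parse_topits` and `reliable_hits` are hypothetical, and the sketch assumes that the real file is tab-delimited as the FORMAT lines state and that it contains a single BLOCK. The sample embeds two rows copied from the listing above, with tabs written out explicitly.

```python
# Column names as given in the first line of the SUMMARY block.
COLUMNS = ["rank", "id2", "pide", "lali", "ngap", "lgap", "len2", "Eali",
           "Zali", "strh", "ifir", "ilas", "jfir", "jlas", "name"]

# Small sample built from the listing above; tabs are assumed as delimiters.
SAMPLE = "\n".join([
    "# PARA:\tlen1 =\t24",
    "# PARA:\tstr:seq =\t50\t(i.e. str= 50%, seq= 50%)",
    "# BLOCK TOPITS HEADER: SUMMARY",
    "\t".join(COLUMNS),
    "\t".join(["1", "1ytbA", "21", "24", "0", "0", "180", "18.37", "1.46",
               "75", "1", "24", "38", "61",
               "1ytb_A TATA-BOX BINDING PROTEIN (YTBP) COMPLEXED WITH DNA"]),
    "\t".join(["19", "1ycc", "17", "23", "0", "0", "108", "9.80", "-0.10",
               "48", "1", "23", "29", "51",
               "1ycc CYTOCHROME C (ISOZYME 1) (REDUCED)"]),
    "//",
])


def parse_topits(text):
    """Return (params, hits) from TOPITS output text (hypothetical helper)."""
    params, hits = {}, []
    columns = None
    in_block = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):          # '//' ends a complete file
            break
        if line.startswith("#"):           # comment or parameter line
            body = line[1:].strip()
            if body.startswith("PARA:"):
                key, _, value = body[len("PARA:"):].partition("=")
                # Keep only the value, dropping any '(further-information)'.
                params[key.strip()] = value.strip().split("\t")[0]
            elif body.startswith("BLOCK"):
                in_block, columns = True, None
            continue
        if not in_block:                   # e.g. the rule line before the header
            continue
        fields = [f.strip() for f in line.split("\t")]
        if columns is None:                # first block line: column names
            columns = fields
        else:                              # later block lines: column data
            hits.append(dict(zip(columns, fields)))
    return params, hits


def reliable_hits(hits, zmin=3.0):
    """Keep hits whose Zali exceeds zmin (the header notes z>3 as more reliable)."""
    return [h for h in hits if float(h["Zali"]) > zmin]


params, hits = parse_topits(SAMPLE)
```

All values are kept as strings so that nothing is lost in parsing; convert fields such as `Zali` or `pide` to numbers only where they are needed, as `reliable_hits` does.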