Changes with 2.0u4 (February, 1996)

Added '-L' option, which provides a longer description of the library
sequence.

Fixed a bug in the -m 10 parseable output.

Support is now provided for version 8.0 GCG libraries, both protein
and DNA. Use library type 6.

Changes with 2.0x4 (January, 1996)

The major change in 2.0x4 is the ability to get a parseable
output from FASTA/TFASTA/SSEARCH. This can be done using output
option -m 10. With -m 10, the initial histogram and list of best
scores are unchanged, but the alignments are now in a parseable form:

>>>mgstm1.aa, 217 aa vs s library
; pg_name: FASTA
; pg_ver: version 2.0x4 Jan., 1996
; pg_matrix: BLOSUM50
; pg_gap-pen: -12 -2
; pg_ktup: 1
; pg_optcut: 30
; pg_cgap: 42
>>GTB1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.5.1.18
; fa_initn: 1490
; fa_init1: 1490
; fa_opt: 1490
; fa_z-score: 1916.0
; fa_expect: 0
; sw_score: 1490
; sw_ident: 1.000
; sw_overlap: 217
>GT8.7 ..
; sq_len: 217
; sq_type: p
; al_start: 1
; al_stop: 217
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVE
NQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKRPWFAGD
KVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKSS
RYIATPIFSKMAHWSNK
>GTB1_MOUSE ..
; sq_len: 217
; sq_type: p
; al_start: 1
; al_stop: 217
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVE
NQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKRPWFAGD
KVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKSS
RYIATPIFSKMAHWSNK
>>GT28_SCHJA GLUTATHIONE S-TRANSFERASE 28 KD (EC 2.5.1.18
; fa_initn: 190
; fa_init1: 97
; fa_opt: 169
; fa_z-score: 217.9
; fa_expect: 1.1e-05
; sw_score: 169
; sw_ident: 0.277
; sw_overlap: 228
>GT8.7 ..
; sq_len: 217
; sq_type: p
; al_start: 4
; al_stop: 180
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPY--LID--GSHK-ITQSNAILRYLARKHHLDGETEEERIR
ADIVENQVMDTRMQLIMLCYNPDFEKQK--PEFLK-TIPEKMKLYSEFLG
KRP--WFAGDKVTYVDFLAYDILDQYRMFEPKCLDA-FPNLRDFLARFEG
LKKISAYMKSSRYIATPIFSKMAHWSNK
>GT28_SCHJA ..
; sq_len: 206
; sq_type: p
; al_start: 3
; al_stop: 180
; al_display_start: 1
-VKLIYFNGRGRAEPIRMILVAAGVEFEDERIEFQDWP----------KI
KPTIPGGRLPIVKITDKRGDVKTMSESLAIARFIARKHNMMGDTDDEYYI
IEKMIGQVEDVESEYHKTLIKPPEEKEKISKEILNGKVPILLQAICETLK
ESTGNLTVGDKVTLADVVLIASIDHITDLDKEFLTGKYPEIHKHRKHLLA
TSPKLAKYLSERHATAF
>>GT2_DROME GLUTATHIONE S-TRANSFERASE 2 (EC 2.5.1.18).
; fa_initn: 124
; fa_init1: 124
; fa_opt: 164
; fa_z-score: 210.1
; fa_expect: 2.9e-05
; sw_score: 164
; sw_ident: 0.248
; sw_overlap: 251
>GT8.7 ..
; sq_len: 217
; sq_type: p
; al_start: 4
; al_stop: 198
; al_display_start: 1
---------------------------PMILGYWNVRGLTHPIRMLLEYT
DSSYDEKRYTMGDAPDFDRSQWLNEKFKLGLDFPNLPYL-IDGSHKITQS
NAILRYLARKHHLDGETEEERIRADIVENQVMDTRMQLIMLCYNPDFEKQ
KPEFLKTIPEKMKLYSEFLGKR-----PWFAGDKVTYVDFLAYDILDQYR
-MFEPKCLDAFPNLRDFLARFEGLKKISAYMKSSRYIATPIFSKMAHWSN
K
>GT2_DROME ..
; sq_len: 247
; sq_type: p
; al_start: 52
; al_stop: 240
; al_display_start: 22
PPAEGAEGAVEGGEAAPPAEPAEPIKHSYTLFYFNVKALPSPC------A
TCSDGNQEYE--DVAHPRRVPALKPTMPMG----QMPVLEVDGK-RVHQS
ISMARFLAKTVGLCGATPWEDLQIDIVVDTINDFRLKIAVVSYEPEDEIK
EKKLVTLNAEVIPFYLEKLEQTVKDNDGHLALGKLTWADVYFAGITDYMN
YMVKRDLLEPYPAVRGVVDAVNALEPIKAWIEKRPVTEV


Note that the parseable output starts with ">>>" and that each
alignment record starts with ">>" while each aligned sequence record
starts with ">".

All parameters produced by the fasta package will be of the form:

    ; xx_yyyyy

In this version, we have xx:

    pg - program parameters (name, version, matrix)
    fa - fasta scores, expect values, etc.
    sw - Smith-Waterman scores, expect values.
    sq - sequence length, type
    al - alignment start, stop, display_start

Other FASTA distributors may choose to add additional fields. If they
do, they should use a tag with more than two characters, e.g.:

    ebi_access:
or
    gcg_?????

The FASTA tags will be limited to two characters followed by a "_".

All of the output parameters correspond to values that are presented
in other FASTA output formats, with the exception of the "al_"
parameters.

al_start    gives the location of the alignment start in the
            original sequence

al_stop     gives the location of the end of the alignment in the
            original sequence

al_display_start
    gives the location of the first displayed amino acid residue
    in the original sequence. The -m 10 alignments are the same
    as those produced in the other modes. In particular,
    FASTA/SSEARCH provide some context for the alignment; if the
    "-a" option is not used, FASTA/SSEARCH will try to provide
    about 30 residues on either side of the actual local
    alignment, if the alignment is in the middle of one or the other
    sequence. If the beginning of the query sequence aligns with
    the 10th residue of the library sequence, then the query
    sequence will be padded with nine leading "-" to produce the
    alignment. The leading '-' are a formatting convenience only;
    they are not considered in the numbering system for
    al_display_start, al_start, or al_stop.

    Thus:

    >GT8.7 ..
    ; sq_len: 217
    ; sq_type: p
    ; al_start: 3
    ; al_stop: 180
    ; al_display_start: 1
    ---PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLN
    EKFKLGLDFPNLPYLIDGSHKITQSNAILRYLARKHH---LDGETEEERI
    RADIVENQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKR
    PWFAGDKVTYVDFLAYDILDQYRMFEPKCLDA------FPNLRDFLARFE
    GLKKISAYMKSSRYIATPIFSKMAHWSNK
    >ARP2_TOBAC ..
    ; sq_len: 223
    ; sq_type: p
    ; al_start: 6
    ; al_stop: 181
    ; al_display_start: 1
    MAEVKLLGFW-YSPFSHRVEWALKIKGVKYE---YIEEDRD--NKSSLLL
    QSNPV---YKKVPVLIHNGKPIVESMIILEYIDETFEGPSILPKDPYDRA
    LARFWAKFLDDKVAAVVNTFFRKGEEQEKGK--EEVYEMLKVLDNELKDK
    KFFAGDKFGFADIAANLVGFWLGVFEEGYGDVLVKSEKFPNFSKWRDEYI
    NCSQVNESLPPRDELLAFFRARFQAVVASRSAPK

    This says that to align the two sequences, the first 'P' of GT8.7
    must line up with the first 'V' (residue 4) in ARP2_TOBAC, but that
    the actual best local alignment starts with the first 'I' in
    GT8.7 (residue 3) and the first 'L' in ARP2_TOBAC (residue 6).
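The record structure described above (">>>" for the run, ">>" for each
alignment, ">" for each aligned sequence, and "; xx_yyyyy: value"
parameter lines) can be read with a short script. What follows is a
minimal sketch in Python and is not part of the FASTA distribution. The
function names parse_m10 and residue_at_column, the dictionary layout,
and the abbreviated SAMPLE text are assumptions made for illustration;
in particular, the ">>ARP2_TOBAC" title line in SAMPLE is a placeholder,
since the example above does not show that line. The second function
applies the numbering rule for al_display_start: '-' characters, leading
or internal, are not counted as residues.

```python
# Sketch of a reader for "-m 10" style output, assuming the layout shown
# in this document. Names and data layout here are illustrative only.

SAMPLE = """
>>>mgstm1.aa, 217 aa vs s library
; pg_name: FASTA
; pg_gap-pen: -12 -2
>>ARP2_TOBAC
>GT8.7 ..
; sq_len: 217
; sq_type: p
; al_start: 3
; al_stop: 180
; al_display_start: 1
---PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLN
>ARP2_TOBAC ..
; sq_len: 223
; sq_type: p
; al_start: 6
; al_stop: 181
; al_display_start: 1
MAEVKLLGFW-YSPFSHRVEWALKIKGVKYE---YIEEDRD--NKSSLLL
"""


def parse_m10(text):
    """Split parseable output into run parameters, hits, and sequences."""
    result = {"header": None, "params": {}, "hits": []}
    target = None    # dict that receives the next "; key: value" line
    seq = None       # sequence record currently collecting residue lines
    started = False  # becomes True once the ">>>" line has been seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">>>"):
            started = True
            result["header"] = line[3:].strip()
            target, seq = result["params"], None
        elif not started:
            continue  # histogram and best-score list come before ">>>"
        elif line.startswith(">>"):
            hit = {"title": line[2:].strip(), "params": {}, "sequences": []}
            result["hits"].append(hit)
            target, seq = hit["params"], None
        elif line.startswith(">"):
            if not result["hits"]:
                continue  # a ">" record with no enclosing ">>" is ignored
            name = (line[1:].split() or [""])[0]
            seq = {"name": name, "params": {}, "residues": ""}
            result["hits"][-1]["sequences"].append(seq)
            target = seq["params"]
        elif line.startswith(";"):
            key, _, value = line[1:].partition(":")
            target[key.strip()] = value.strip()
        elif seq is not None:
            seq["residues"] += line  # alignment text, including '-'
    return result


def residue_at_column(seq, column):
    """Residue number in the original sequence at a 0-based display column.

    Returns None when the column holds a '-' (padding or gap).
    """
    residues = seq["residues"]
    if residues[column] == "-":
        return None
    start = int(seq["params"]["al_display_start"])
    return start + sum(1 for c in residues[:column] if c != "-")


if __name__ == "__main__":
    parsed = parse_m10(SAMPLE)
    query, library = parsed["hits"][0]["sequences"]
    # Column 5 holds the first 'I' of GT8.7 and the first 'L' of ARP2_TOBAC.
    print(residue_at_column(query, 5), residue_at_column(library, 5))
```

With the SAMPLE text, display column 5 (counting from 0) maps to residue
3 of GT8.7 and residue 6 of ARP2_TOBAC, which are the al_start values
reported for the two sequences. Parameter values are kept as strings, so
a caller converts al_start, sq_len, and similar fields as needed.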