CROSS REFERENCE TO RELATED APPLICATIONS This application is a divisional application of U.S. application Ser. No. 13/824,317 filed Dec. 18, 2013, now issued as U.S. Pat. No. 9,096,871; which is a 35 USC §371 National Stage application of International Application No. PCT/US2011/055181 filed Oct. 6, 2011, now expired; which claims the benefit under 35 USC §119(e) to U.S. Application Ser. No. 61/390,392 filed Oct. 6, 2010, now expired. The disclosure of each of the prior applications is considered part of and is incorporated by reference in the disclosure of this application.
BACKGROUND OF THE INVENTION Cellulose is an unbranched polymer of glucose linked by β(1→4)-glycosidic bonds. Cellulose chains can interact with each other via hydrogen bonding to form a crystalline solid of high mechanical strength and chemical stability. The cellulose chains are depolymerized into glucose and short oligosaccharides before organisms, such as the fermenting microbes used in ethanol production, can use them as metabolic fuel. Cellulase enzymes catalyze the hydrolysis of the cellulose (hydrolysis of β-1,4-D-glucan linkages) in the biomass into products such as glucose, cellobiose, and other cellooligosaccharides. Cellulase is a generic term denoting a multienzyme mixture comprising exo-acting cellobiohydrolases (CBHs), endoglucanases (EGs) and β-glucosidases (BGs) that can be produced by a number of plants and microorganisms. Enzymes in the cellulase of Trichoderma reesei include CBH I (more generally, Cel7A), CBH II (Cel6A), EG1 (Cel7B), EG2 (Cel5), EG3 (Cel12), EG4 (Cel61A), EG5 (Cel45A), EG6 (Cel74A), Cip1, Cip2, β-glucosidases (including, e.g., Cel3A), acetyl xylan esterase, β-mannanase, and swollenin.
Cellulase enzymes work synergistically to hydrolyze cellulose to glucose. CBH I and CBH II act on opposing ends of cellulose chains (Barr et al., 1996, Biochemistry 35:586-92), while the endoglucanases act at internal locations in the cellulose. The primary product of these enzymes is cellobiose, which is further hydrolyzed to glucose by one or more β-glucosidases.
The cellobiohydrolases are subject to inhibition by their direct product, cellobiose, which slows saccharification reactions as product accumulates. There is a need for new and improved cellobiohydrolases with improved productivity that maintain their reaction rates during the course of a saccharification reaction, for use in the conversion of cellulose into fermentable sugars and for related fields of cellulosic material processing such as pulp and paper, textiles and animal feeds.
SUMMARY OF THE INVENTION The present disclosure relates to variant CBH I polypeptides. Most naturally occurring CBH I polypeptides have arginines at positions corresponding to R268 and R411 of T. reesei CBH I (SEQ ID NO:2). The variant CBH I polypeptides of the present disclosure include a substitution at either or both positions resulting in a reduction or decrease in product (e.g., cellobiose) inhibition. Such variants are sometimes referred to herein as “product tolerant.”
The variant CBH I polypeptides of the disclosure minimally contain at least a CBH I catalytic domain, comprising (a) a substitution at the amino acid position corresponding to R268 of T. reesei CBH I (“R268 substitution”); (b) a substitution at the amino acid position corresponding to R411 of T. reesei CBH I (“R411 substitution”); or (c) both an R268 substitution and an R411 substitution. The amino acid positions of exemplary CBH I polypeptides into which R268 and/or R411 substitutions can be introduced are shown in Table 1, and the amino acid positions corresponding to R268 and/or R411 in these exemplary CBH I polypeptides are shown in Table 2.
R268 and/or R411 substituents can include lysines and/or alanines. Accordingly, the present disclosure provides a variant CBH I polypeptide comprising a CBH I catalytic domain with one of the following amino acid substitutions or pairs of R268 and/or R411 substitutions: (a) R268K and R411K; (b) R268K and R411A; (c) R268A and R411K; (d) R268A and R411A; (e) R268A; (f) R268K; (g) R411A; and (h) R411K. In some embodiments, however, the amino acid sequence of the variant CBH I polypeptide does not comprise or consist of SEQ ID NO:299, SEQ ID NO:300, SEQ ID NO:301, or SEQ ID NO:302.
The variant CBH I polypeptides of the disclosure typically include a catalytic domain (“CD”) comprising an amino acid sequence having at least 50% sequence identity to a CD of a reference CBH I exemplified in Table 1. The CD portions of the CBH I polypeptides exemplified in Table 1 are delineated in Table 3. The variant CBH I polypeptides can have a cellulose binding domain (“CBD”) sequence in addition to the CD sequence. The CBD can be N- or C-terminal to the CD, and the CBD and CD are optionally connected via a linker sequence.
The variant CBH I polypeptides can be mature polypeptides or they may further comprise a signal sequence.
Additional embodiments of the variant CBH I polypeptides are provided in Section 0.
The variant CBH I polypeptides of the disclosure typically exhibit reduced product inhibition by cellobiose. In certain embodiments, the IC50 of cellobiose towards a variant CBH I polypeptide of the disclosure is at least 1.2-fold, at least 1.5-fold, or at least 2-fold the IC50 of cellobiose towards a reference CBH I lacking the R268 substitution and/or R411 substitution present in the variant. Additional embodiments of the product inhibition characteristics of the variant CBH I polypeptides are provided in Section 0.
The variant CBH I polypeptides of the disclosure typically retain some cellobiohydrolase activity. In certain embodiments, a variant CBH I polypeptide retains at least 50% the CBH I activity of a reference CBH I lacking the R268 substitution and/or R411 substitution present in the variant. Additional embodiments of cellobiohydrolase activity of the variant CBH I polypeptides are provided in Section 0.
The present disclosure further provides compositions (including cellulase compositions, e.g., whole cellulase compositions, and fermentation broths) comprising variant CBH I polypeptides. Additional embodiments of compositions comprising variant CBH I polypeptides are provided in Section 0. The variant CBH I polypeptides and compositions comprising them can be used, inter alia, in processes for saccharifying biomass. Additional details of saccharification reactions, and additional applications of the variant CBH I polypeptides, are provided in Section 0.
The present disclosure further provides nucleic acids (e.g., vectors) comprising nucleotide sequences encoding variant CBH I polypeptides as described herein, and recombinant cells engineered to express the variant CBH I polypeptides. The recombinant cell can be a prokaryotic (e.g., bacterial) or eukaryotic (e.g., yeast or filamentous fungal) cell. Further provided are methods of producing and optionally recovering the variant CBH I polypeptides. Additional embodiments of the recombinant expression system suitable for expression and production of the variant CBH I polypeptides are provided in Section 0.
BRIEF DESCRIPTION OF THE DRAWINGS AND TABLES FIGS. 1A-1B: Cellobiose dose-response curves using a 4-MUL assay for a wild-type CBH I (BD29555; FIG. 1A) and a R268K/R411K variant CBH I (BD29555 with the substitutions R273K/R422K; FIG. 1B).
FIGS. 2A-2B: The effect of cellobiose accumulation on the activity of wild-type CBH I and a R268K/R411K variant CBH I, based on percent conversion of glucan after 72 hours in the bagasse assay. FIG. 2A shows relative activity in the presence (+) and absence (−) of β-glucosidase (BG), where relative activity is normalized to wild type activity with BG (WT+=1). FIG. 2B shows tolerance to cellobiose as a function of the ratio of activity in the absence vs. presence of β-glucosidase (activity ratio=Activity −BG/Activity +BG).
FIG. 3: Cellobiose dose-response curves using a PASC assay for a R268K/R411K variant CBH I polypeptide as compared to two wild-type CBH I polypeptides.
FIG. 4: The effect of cellobiose accumulation on the activity of a wild-type CBH I and a R268K/R411K variant CBH I based on percent conversion of glucan after 72 hours in the bagasse assay in the presence (+) and absence (−) of β-glucosidase (BG). Activity is normalized to wild type activity with BG (WT+=1).
FIG. 5: Characterization of cellobiose product tolerance of variant CBH I polypeptides, based on percent conversion of glucan after 72 hours in the absence and presence of β-glucosidase (BG) in the bagasse assay; tolerance is evaluated as a function of the ratio of activity in the absence vs. presence of β-glucosidase.
TABLE 1: Amino acid sequences of exemplary “reference” CBH I polypeptides that can be modified at positions corresponding to R268 and/or R411 in T. reesei CBH I (SEQ ID NO:2). The database accession numbers are indicated in the second column. Unless indicated otherwise, the accession numbers refer to the Genbank database. “#” indicates that the CBH I has no signal peptide; “&” indicates that the sequence is from the PDB database and represents the catalytic domain only without signal sequence; “*” indicates a nonpublic database. These amino acid sequences are mostly wild type, with the exception of some sequences from the PDB database which contain mutations to facilitate protein crystallization.
TABLE 2: Amino acid positions in the exemplary reference CBH I polypeptides that correspond to R268 and R411 in T. reesei CBH I. Database descriptors are as for Table 1.
TABLE 3: Approximate amino acid positions of CBH I polypeptide domains. Abbreviations used: SS is signal sequence; CD is catalytic domain; and CBD is cellulose binding domain. Database descriptors are as for Table 1.
TABLE 4: Table 4 shows a segment within the catalytic domain of each exemplary reference CBH I polypeptide containing the active site loop (shown in bold, underlined text) and the catalytic residues (glutamates in most CBH I polypeptides) (shown in bold, double underlined text). Database descriptors are as for Table 1.
TABLE 5: MUL and bagasse assay results for variants of BD29555. ND means not determined. ±% Activity (+/−cellobiose)=[(Activity with cellobiose)/(Activity without cellobiose)]*100. ¥ % Activity (−/+BG)=[(Activity without BG)/(Activity with BG)]*100.
TABLE 6: MUL and bagasse assay results for variants of T. reesei CBH I. ND means not determined. ±% Activity (+/−cellobiose)=[(Activity with cellobiose)/(Activity without cellobiose)]*100. ¥ % Activity (−/+BG)=[(Activity without BG)/(Activity with BG)]*100.
TABLE 7: Informal sequence listing. SEQ ID NO:1-149 correspond to the exemplary reference CBH I polypeptides. SEQ ID NO:299 corresponds to mature T. reesei CBH I (amino acids 26-529 of SEQ ID NO:2) with an R268A substitution. SEQ ID NO:300 corresponds to mature T. reesei CBH I (amino acids 26-529 of SEQ ID NO:2) with an R411A substitution. SEQ ID NO:301 corresponds to full length BD29555 with both an R268K substitution and an R411K substitution. SEQ ID NO:302 corresponds to mature BD29555 with both an R268K substitution and an R411K substitution.
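The percent-activity values defined for Tables 5 and 6, and the activity ratio defined for FIG. 2B, are simple quotients. The following is a minimal Python sketch of that arithmetic. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not data from the disclosure.

```python
def percent_activity(numerator_activity: float, denominator_activity: float) -> float:
    """Return (numerator / denominator) * 100.

    Used for both table footnotes:
      % Activity (+/- cellobiose) = (activity with cellobiose / activity without cellobiose) * 100
      % Activity (-/+ BG)         = (activity without BG / activity with BG) * 100
    """
    if denominator_activity <= 0:
        raise ValueError("denominator activity must be positive")
    return numerator_activity / denominator_activity * 100.0


def activity_ratio(activity_without_bg: float, activity_with_bg: float) -> float:
    """Activity ratio of FIG. 2B: Activity -BG / Activity +BG."""
    if activity_with_bg <= 0:
        raise ValueError("activity with BG must be positive")
    return activity_without_bg / activity_with_bg


# Hypothetical activities in arbitrary units (not measured values).
with_cellobiose, without_cellobiose = 45.0, 60.0
print(percent_activity(with_cellobiose, without_cellobiose))
print(activity_ratio(0.5, 1.0))
```

A higher ratio of activity without β-glucosidase to activity with β-glucosidase indicates that the enzyme is less affected by accumulating cellobiose.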
DETAILED DESCRIPTION OF THE INVENTION The present disclosure relates to variant CBH I polypeptides. Most naturally occurring CBH I polypeptides have arginines at positions corresponding to R268 and R411 of T. reesei CBH I (SEQ ID NO:2). The variant CBH I polypeptides of the present disclosure include a substitution at either or both positions resulting in a reduction of product (e.g., cellobiose) inhibition. The following subsections describe in greater detail the variant CBH I polypeptides and exemplary methods of their production, exemplary cellulase compositions comprising them, and some industrial applications of the polypeptides and cellulase compositions.
Variant CBH I Polypeptides The present disclosure provides variant CBH I polypeptides comprising at least one amino acid substitution that results in reduced product inhibition. “Variant” means a polypeptide which differs in sequence from a reference polypeptide by substitution of one or more amino acids at one or a number of different sites in the amino acid sequence. Exemplary reference CBH I polypeptides are shown in Table 1.
The variant CBH I polypeptides of the disclosure have (a) a substitution at the amino acid position corresponding to R268 of T. reesei CBH I (SEQ ID NO:2) (an “R268 substitution”); (b) a substitution at the amino acid position corresponding to R411 of T. reesei CBH I (an “R411 substitution”); or (c) both an R268 substitution and an R411 substitution, as compared to a reference CBH I polypeptide. It is noted that the R268 and R411 numbering is made by reference to the full length T. reesei CBH I, which includes a signal sequence that is generally absent from the mature enzyme. The corresponding numbering in the mature T. reesei CBH I (see, e.g., SEQ ID NO:4) is R251 and R394, respectively.
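The relationship between full-length and mature numbering can be sketched in Python as follows. The 17-residue offset is an assumption inferred from the two position pairs given above (268 to 251 and 411 to 394); the function and constant names are hypothetical.

```python
# Assumed offset between full-length and mature T. reesei CBH I numbering,
# inferred from R268 -> R251 and R411 -> R394 (268 - 251 = 411 - 394 = 17).
SIGNAL_SEQUENCE_LENGTH = 17


def full_length_to_mature(position: int, signal_length: int = SIGNAL_SEQUENCE_LENGTH) -> int:
    """Convert a 1-based full-length position to 1-based mature numbering."""
    if position <= signal_length:
        raise ValueError("position lies within the signal sequence")
    return position - signal_length


def mature_to_full_length(position: int, signal_length: int = SIGNAL_SEQUENCE_LENGTH) -> int:
    """Convert a 1-based mature position back to full-length numbering."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return position + signal_length


for full_length_position in (268, 411):
    print(full_length_position, "->", full_length_to_mature(full_length_position))
```

This fixed offset applies only to a polypeptide whose signal sequence has the assumed length; positions in other CBH I polypeptides are identified by sequence alignment rather than by a fixed offset.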
Accordingly, the present disclosure provides variant CBH I polypeptides in which at least one of the amino acid positions corresponding to R268 and R411 of T. reesei CBH I, and optionally both the amino acid positions corresponding to R268 and R411 of T. reesei CBH I, is not an arginine.
The amino acid positions in the reference polypeptides of Table 1 that correspond to R268 and R411 in T. reesei CBH I are shown in Table 2. Amino acid positions in other CBH I polypeptides that correspond to R268 and R411 can be identified through alignment of their sequences with T. reesei CBH I using a sequence comparison algorithm. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, 1981, Adv. Appl. Math. 2:482-89; by the homology alignment algorithm of Needleman & Wunsch, 1970, J. Mol. Biol. 48:443-53; by the search for similarity method of Pearson & Lipman, 1988, Proc. Nat'l Acad. Sci. USA 85:2444-48, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection.
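By way of illustration only, the identification of a corresponding position through alignment can be sketched in Python. The sketch uses a bare-bones global alignment in the manner of Needleman & Wunsch with an assumed match/mismatch/gap scoring scheme; it is not the Wisconsin Genetics Software Package and uses no substitution matrix. The sequences are toy strings, not CBH I sequences, and all function names are hypothetical.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return the two gapped strings."""
    n, m = len(ref), len(query)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_ref, aligned_query = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == query[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_ref.append(ref[i - 1])
            aligned_query.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_query.append("-")
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_query.append(query[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_query))


def corresponding_position(ref, query, ref_position):
    """Map a 1-based reference position to the 1-based query position.

    Returns None when the reference residue is aligned to a gap.
    """
    aligned_ref, aligned_query = global_align(ref, query)
    ref_index = query_index = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_index += 1
        if query_char != "-":
            query_index += 1
        if ref_char != "-" and ref_index == ref_position:
            return query_index if query_char != "-" else None
    raise ValueError("ref_position is outside the reference sequence")


# Toy example: the query lacks the first two residues of the reference.
print(corresponding_position("MKTAYRCDE", "TAYRCDE", 6))
```

In practice an established alignment tool with a protein substitution matrix would be expected to give more reliable correspondences for divergent sequences than this simplified scoring.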
The R268 and/or R411 substitutions are preferably selected from (a) R268K and R411K; (b) R268K and R411A; (c) R268A and R411K; (d) R268A and R411A; (e) R268A; (f) R268K; (g) R411A; and (h) R411K.
CBH I polypeptides belong to the glycosyl hydrolase family 7 (“GH7”). The glycosyl hydrolases of this family include endoglucanases and cellobiohydrolases (exoglucanases). The cellobiohydrolases act processively from the reducing ends of cellulose chains to generate cellobiose. Cellulases of bacterial and fungal origin characteristically have a small cellulose-binding domain (“CBD”) connected to either the N or the C terminus of the catalytic domain (“CD”) via a linker peptide (see Suurnäkki et al., 2000, Cellulose 7:189-209). The CD contains the active site, whereas the CBD interacts with cellulose by binding the enzyme to it (van Tilbeurgh et al., 1986, FEBS Lett. 204(2):223-227; Tomme et al., 1988, Eur. J. Biochem. 170:575-581). The three-dimensional structure of the catalytic domain of T. reesei CBH I has been solved (Divne et al., 1994, Science 265:524-528). The CD consists of two β-sheets that pack face-to-face to form a β-sandwich. Most of the remaining amino acids in the CD are loops connecting the β-sheets. Some loops are elongated and bend around the active site, forming a cellulose-binding tunnel of approximately 50 Å. In contrast, endoglucanases have an open substrate binding cleft/groove rather than a tunnel. Typically, the catalytic residues are glutamic acids corresponding to E229 and E234 of T. reesei CBH I.
The loops characteristic of the active sites (“the active site loops”) of reference CBH I polypeptides, which are absent from GH7 family endoglucanases, as well as catalytic glutamate residues of the reference CBH I polypeptides, are shown in Table 4. The variant CBH I polypeptides of the disclosure preferably retain the catalytic glutamate residues or may include a glutamine instead at the position corresponding to E234, as for SEQ ID NO:4. In some embodiments, the variant CBH I polypeptides contain no substitutions or only conservative substitutions in the active site loops relative to the reference CBH I polypeptides from which the variants are derived.
Many CBH I polypeptides do not have a CBD, and most studies concerning the activity of cellulase domains on different substrates have been carried out with only the catalytic domains of CBH I polypeptides. Because CDs with cellobiohydrolase activity can be generated by limited proteolysis of mature CBH I by papain (see, e.g., Chen et al., 1993, Biochem. Mol. Biol. Int. 30(5):901-10), they are often referred to as “core” domains. Accordingly, a variant CBH I can include only the CD “core” of CBH I. Exemplary reference CDs comprise amino acid sequences corresponding to positions 26 to 455 of SEQ ID NO:1, positions 18 to 444 of SEQ ID NO:2, positions 26 to 455 of SEQ ID NO:3, positions 1 to 427 of SEQ ID NO:4, positions 24 to 457 of SEQ ID NO:5, positions 18 to 448 of SEQ ID NO:6, positions 27 to 460 of SEQ ID NO:7, positions 27 to 460 of SEQ ID NO:8, positions 20 to 449 of SEQ ID NO:9, positions 1 to 424 of SEQ ID NO:10, positions 18 to 447 of SEQ ID NO:11, positions 18 to 434 of SEQ ID NO:12, positions 18 to 445 of SEQ ID NO:13, positions 19 to 454 of SEQ ID NO:14, positions 19 to 443 of SEQ ID NO:15, positions 2 to 426 of SEQ ID NO:16, positions 23 to 446 of SEQ ID NO:17, positions 19 to 449 of SEQ ID NO:18, positions 23 to 446 of SEQ ID NO:19, positions 19 to 449 of SEQ ID NO:20, positions 2 to 416 of SEQ ID NO:21, positions 19 to 454 of SEQ ID NO:22, positions 19 to 447 of SEQ ID NO:23, positions 19 to 447 of SEQ ID NO:24, positions 20 to 443 of SEQ ID NO:25, positions 18 to 447 of SEQ ID NO:26, positions 19 to 442 of SEQ ID NO:27, positions 18 to 451 of SEQ ID NO:28, positions 23 to 446 of SEQ ID NO:29, positions 18 to 444 of SEQ ID NO:30, positions 18 to 451 of SEQ ID NO:31, positions 18 to 447 of SEQ ID NO:32, positions 19 to 449 of SEQ ID NO:33, positions 18 to 447 of SEQ ID NO:34, positions 26 to 459 of SEQ ID NO:35, positions 19 to 450 of SEQ ID NO:36, positions 19 to 453 of SEQ ID NO:37, positions 18 to 448 of SEQ ID NO:38, positions 19 to 443 of SEQ ID 
NO:39, positions 19 to 442 of SEQ ID NO:40, positions 18 to 444 of SEQ ID NO:41, positions 24 to 457 of SEQ ID NO:42, positions 18 to 449 of SEQ ID NO:43, positions 19 to 453 of SEQ ID NO:44, positions 26 to 456 of SEQ ID NO:45, positions 19 to 451 of SEQ ID NO:46, positions 18 to 443 of SEQ ID NO:47, positions 18 to 448 of SEQ ID NO:48, positions 19 to 451 of SEQ ID NO:49, positions 18 to 444 of SEQ ID NO:50, positions 2 to 419 of SEQ ID NO:51, positions 27 to 461 of SEQ ID NO:52, positions 21 to 445 of SEQ ID NO:53, positions 19 to 449 of SEQ ID NO:54, positions 19 to 448 of SEQ ID NO:55, positions 18 to 443 of SEQ ID NO:56, positions 20 to 443 of SEQ ID NO:57, positions 18 to 448 of SEQ ID NO:58, positions 18 to 447 of SEQ ID NO:59, positions 26 to 455 of SEQ ID NO:60, positions 19 to 449 of SEQ ID NO:61, positions 19 to 449 of SEQ ID NO:62, positions 26 to 460 of SEQ ID NO:63, positions 18 to 448 of SEQ ID NO:64, positions 19 to 451 of SEQ ID NO:65, positions 19 to 447 of SEQ ID NO:66, positions 1 to 424 of SEQ ID NO:67, positions 19 to 448 of SEQ ID NO:68, positions 19 to 443 of SEQ ID NO:69, positions 23 to 447 of SEQ ID NO:70, positions 17 to 448 of SEQ ID NO:71, positions 19 to 449 of SEQ ID NO:72, positions 18 to 444 of SEQ ID NO:73, positions 23 to 458 of SEQ ID NO:74, positions 20 to 452 of SEQ ID NO:75, positions 18 to 435 of SEQ ID NO:76, positions 18 to 446 of SEQ ID NO:77, positions 22 to 457 of SEQ ID NO:78, positions 18 to 448 of SEQ ID NO:79, positions 1 to 431 of SEQ ID NO:80, positions 19 to 453 of SEQ ID NO:81, positions 21 to 440 of SEQ ID NO:82, positions 19 to 442 of SEQ ID NO:83, positions 18 to 448 of SEQ ID NO:84, positions 17 to 446 of SEQ ID NO:85, positions 18 to 447 of SEQ ID NO:86, positions 18 to 443 of SEQ ID NO:87, positions 23 to 448 of SEQ ID NO:88, positions 18 to 451 of SEQ ID NO:89, positions 21 to 447 of SEQ ID NO:90, positions 18 to 444 of SEQ ID NO:91, positions 19 to 442 of SEQ ID NO:92, positions 20 to 436 of SEQ ID 
NO:93, positions 18 to 450 of SEQ ID NO:94, positions 22 to 453 of SEQ ID NO:95, positions 16 to 472 of SEQ ID NO:96, positions 21 to 445 of SEQ ID NO:97, positions 19 to 447 of SEQ ID NO:98, positions 19 to 450 of SEQ ID NO:99, positions 19 to 451 of SEQ ID NO:100, positions 18 to 448 of SEQ ID NO:101, positions 19 to 442 of SEQ ID NO:102, positions 20 to 457 of SEQ ID NO:103, positions 19 to 454 of SEQ ID NO:104, positions 18 to 440 of SEQ ID NO:105, positions 18 to 439 of SEQ ID NO:106, positions 27 to 460 of SEQ ID NO:107, positions 23 to 446 of SEQ ID NO:108, positions 17 to 446 of SEQ ID NO:109, positions 21 to 447 of SEQ ID NO:110, positions 19 to 447 of SEQ ID NO:111, positions 18 to 449 of SEQ ID NO:112, positions 22 to 457 of SEQ ID NO:113, positions 18 to 445 of SEQ ID NO:114, positions 18 to 448 of SEQ ID NO:115, positions 18 to 448 of SEQ ID NO:116, positions 23 to 435 of SEQ ID NO:117, positions 21 to 442 of SEQ ID NO:118, positions 23 to 435 of SEQ ID NO:119, positions 20 to 445 of SEQ ID NO:120, positions 21 to 443 of SEQ ID NO:121, positions 20 to 445 of SEQ ID NO:122, positions 23 to 443 of SEQ ID NO:123, positions 20 to 445 of SEQ ID NO:124, positions 21 to 435 of SEQ ID NO:125, positions 20 to 437 of SEQ ID NO:126, positions 21 to 442 of SEQ ID NO:127, positions 23 to 434 of SEQ ID NO:128, positions 20 to 444 of SEQ ID NO:129, positions 21 to 435 of SEQ ID NO:130, positions 20 to 445 of SEQ ID NO:131, positions 21 to 446 of SEQ ID NO:132, positions 21 to 435 of SEQ ID NO:133, positions 22 to 448 of SEQ ID NO:134, positions 23 to 433 of SEQ ID NO:135, positions 23 to 434 of SEQ ID NO:136, positions 23 to 435 of SEQ ID NO:137, positions 23 to 435 of SEQ ID NO:138, positions 20 to 445 of SEQ ID NO:139, positions 20 to 437 of SEQ ID NO:140, positions 21 to 435 of SEQ ID NO:141, positions 20 to 437 of SEQ ID NO:142, positions 21 to 435 of SEQ ID NO:143, positions 26 to 435 of SEQ ID NO:144, positions 23 to 435 of SEQ ID NO:145, positions 24 to 443 of 
SEQ ID NO:146, positions 20 to 445 of SEQ ID NO:147, positions 21 to 441 of SEQ ID NO:148, and positions 20 to 437 of SEQ ID NO:149.
The CBDs are particularly involved in the hydrolysis of crystalline cellulose. It has been shown that the ability of cellobiohydrolases to degrade crystalline cellulose decreases when the CBD is absent (Linder and Teeri, 1997, Journal of Biotechnol. 57:15-28). The variant CBH I polypeptides of the disclosure can further include a CBD. Exemplary CBDs comprise amino acid sequences corresponding to positions 494 to 529 of SEQ ID NO:1, positions 480 to 514 of SEQ ID NO:2, positions 494 to 529 of SEQ ID NO:3, positions 491 to 526 of SEQ ID NO:5, positions 477 to 512 of SEQ ID NO:6, positions 497 to 532 of SEQ ID NO:7, positions 504 to 539 of SEQ ID NO:8, positions 486 to 521 of SEQ ID NO:13, positions 556 to 596 of SEQ ID NO:15, positions 490 to 525 of SEQ ID NO:18, positions 495 to 530 of SEQ ID NO:20, positions 471 to 506 of SEQ ID NO:23, positions 481 to 516 of SEQ ID NO:27, positions 480 to 514 of SEQ ID NO:30, positions 495 to 529 of SEQ ID NO:35, positions 493 to 528 of SEQ ID NO:36, positions 477 to 512 of SEQ ID NO:38, positions 547 to 586 of SEQ ID NO:39, positions 475 to 510 of SEQ ID NO:40, positions 479 to 513 of SEQ ID NO:41, positions 506 to 541 of SEQ ID NO:42, positions 481 to 516 of SEQ ID NO:43, positions 503 to 537 of SEQ ID NO:45, positions 488 to 523 of SEQ ID NO:46, positions 476 to 511 of SEQ ID NO:48, positions 488 to 523 of SEQ ID NO:49, positions 479 to 513 of SEQ ID NO:50, positions 500 to 535 of SEQ ID NO:52, positions 493 to 528 of SEQ ID NO:55, positions 479 to 514 of SEQ ID NO:58, positions 494 to 529 of SEQ ID NO:60, positions 490 to 525 of SEQ ID NO:61, positions 497 to 532 of SEQ ID NO:62, positions 475 to 510 of SEQ ID NO:64, positions 477 to 512 of SEQ ID NO:65, positions 486 to 521 of SEQ ID NO:66, positions 470 to 505 of SEQ ID NO:67, positions 491 to 526 of SEQ ID NO:68, positions 476 to 511 of SEQ ID NO:69, positions 480 to 514 of SEQ ID NO:73, positions 506 to 540 of SEQ ID NO:74, positions 471 to 504 of SEQ ID NO:76, positions 
501 to 536 of SEQ ID NO:78, positions 473 to 508 of SEQ ID NO:79, positions 481 to 516 of SEQ ID NO:83, positions 488 to 523 of SEQ ID NO:86, positions 475 to 510 of SEQ ID NO:92, positions 468 to 504 of SEQ ID NO:93, positions 501 to 536 of SEQ ID NO:96, positions 482 to 517 of SEQ ID NO:98, positions 481 to 516 of SEQ ID NO:99, positions 488 to 523 of SEQ ID NO:100, positions 472 to 507 of SEQ ID NO:101, positions 481 to 516 of SEQ ID NO:102, positions 471 to 505 of SEQ ID NO:105, positions 481 to 516 of SEQ ID NO:106, positions 495 to 530 of SEQ ID NO:107, positions 488 to 523 of SEQ ID NO:111, positions 478 to 513 of SEQ ID NO:112, positions 501 to 536 of SEQ ID NO:113, positions 491 to 526 of SEQ ID NO:115, and positions 503 to 538 of SEQ ID NO:116.
The CD and CBD are often connected via a linker. Exemplary linker sequences correspond to positions 456 to 493 of SEQ ID NO:1, positions 445 to 479 of SEQ ID NO:2, positions 456 to 493 of SEQ ID NO:3, positions 458 to 490 of SEQ ID NO:5, positions 449 to 476 of SEQ ID NO:6, positions 461 to 496 of SEQ ID NO:7, positions 461 to 503 of SEQ ID NO:8, positions 446 to 485 of SEQ ID NO:13, positions 444 to 555 of SEQ ID NO:15, positions 450 to 489 of SEQ ID NO:18, positions 450 to 494 of SEQ ID NO:20, positions 448 to 470 of SEQ ID NO:23, positions 443 to 480 of SEQ ID NO:27, positions 445 to 479 of SEQ ID NO:30, positions 460 to 494 of SEQ ID NO:35, positions 451 to 492 of SEQ ID NO:36, positions 449 to 476 of SEQ ID NO:38, positions 444 to 546 of SEQ ID NO:39, positions 443 to 474 of SEQ ID NO:40, positions 445 to 478 of SEQ ID NO:41, positions 458 to 505 of SEQ ID NO:42, positions 450 to 480 of SEQ ID NO:43, positions 457 to 502 of SEQ ID NO:45, positions 452 to 487 of SEQ ID NO:46, positions 449 to 475 of SEQ ID NO:48, positions 452 to 487 of SEQ ID NO:49, positions 445 to 478 of SEQ ID NO:50, positions 462 to 499 of SEQ ID NO:52, positions 449 to 492 of SEQ ID NO:55, positions 449 to 478 of SEQ ID NO:58, positions 456 to 493 of SEQ ID NO:60, positions 450 to 489 of SEQ ID NO:61, positions 450 to 496 of SEQ ID NO:62, positions 449 to 474 of SEQ ID NO:64, positions 452 to 476 of SEQ ID NO:65, positions 448 to 485 of SEQ ID NO:66, positions 425 to 469 of SEQ ID NO:67, positions 449 to 490 of SEQ ID NO:68, positions 444 to 475 of SEQ ID NO:69, positions 445 to 479 of SEQ ID NO:73, positions 459 to 505 of SEQ ID NO:74, positions 436 to 470 of SEQ ID NO:76, positions 458 to 500 of SEQ ID NO:78, positions 449 to 472 of SEQ ID NO:79, positions 443 to 480 of SEQ ID NO:83, positions 448 to 487 of SEQ ID NO:86, positions 443 to 474 of SEQ ID NO:92, positions 437 to 467 of SEQ ID NO:93, positions 473 to 500 of SEQ ID NO:96, positions 448 to 481 of SEQ ID NO:98, positions 451 to 
480 of SEQ ID NO:99, positions 452 to 487 of SEQ ID NO:100, positions 449 to 471 of SEQ ID NO:101, positions 443 to 480 of SEQ ID NO:102, positions 441 to 470 of SEQ ID NO:105, positions 440 to 480 of SEQ ID NO:106, positions 461 to 494 of SEQ ID NO:107, positions 448 to 487 of SEQ ID NO:111, positions 450 to 478 of SEQ ID NO:112, positions 458 to 500 of SEQ ID NO:113, positions 449 to 490 of SEQ ID NO:115, and positions 449 to 502 of SEQ ID NO:116.
Because CBH I polypeptides are modular, the CBDs, CDs and linkers of different CBH I polypeptides, such as the exemplary CBH I polypeptides of Table 1, can be used interchangeably. However, in a preferred embodiment, the CBDs, CDs and linkers of a variant CBH I of the disclosure originate from the same polypeptide.
The variant CBH I polypeptides of the disclosure preferably have at least a two-fold reduction of product inhibition, such that cellobiose has an IC50 towards the variant CBH I that is at least 2-fold the IC50 of the corresponding reference CBH I, e.g., CBH I lacking the R268 substitution and/or R411 substitution. More preferably the IC50 of cellobiose towards the variant CBH I is at least 3-fold, at least 5-fold, at least 8-fold, at least 10-fold, at least 12-fold or at least 15-fold the IC50 of the corresponding reference CBH I. In specific embodiments the IC50 of cellobiose towards the variant CBH I ranges from 2-fold to 15-fold, from 2-fold to 10-fold, from 3-fold to 10-fold, from 5-fold to 12-fold, from 4-fold to 12-fold, from 5-fold to 10-fold, from 2-fold to 8-fold, or from 8-fold to 20-fold the IC50 of the corresponding reference CBH I. The IC50 can be determined in a phosphoric acid swollen cellulose (“PASC”) assay (Du et al., 2010, Applied Biochemistry and Biotechnology 161:313-317) or a methylumbelliferyl lactoside (“MUL”) assay (van Tilbeurgh and Claeyssens, 1985, FEBS Letts. 187(2):283-288), as exemplified in the Examples below.
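The fold comparison described above can be illustrated with a short Python sketch. The IC50 values below are hypothetical placeholders in arbitrary concentration units, not measurements from the Examples, and the function names are assumptions made for illustration.

```python
def ic50_fold_change(variant_ic50: float, reference_ic50: float) -> float:
    """Return the cellobiose IC50 of the variant as a multiple of the reference IC50."""
    if variant_ic50 <= 0 or reference_ic50 <= 0:
        raise ValueError("IC50 values must be positive")
    return variant_ic50 / reference_ic50


def meets_fold_threshold(variant_ic50: float, reference_ic50: float,
                         minimum_fold: float = 2.0) -> bool:
    """True if the variant IC50 is at least `minimum_fold` times the reference IC50."""
    return ic50_fold_change(variant_ic50, reference_ic50) >= minimum_fold


def within_fold_range(variant_ic50: float, reference_ic50: float,
                      low: float, high: float) -> bool:
    """True if the fold change lies in the inclusive range [low, high]."""
    return low <= ic50_fold_change(variant_ic50, reference_ic50) <= high


# Hypothetical IC50 values (same units for both enzymes).
variant, reference = 10.0, 2.0
print(ic50_fold_change(variant, reference))
print(meets_fold_threshold(variant, reference, minimum_fold=2.0))
print(within_fold_range(variant, reference, low=2.0, high=15.0))
```

Both IC50 values should come from the same assay format (for example, both from a PASC assay or both from a MUL assay) for the ratio to be meaningful.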
The variant CBH I polypeptides of the disclosure preferably have a cellobiohydrolase activity that is at least 30% the cellobiohydrolase activity of the corresponding reference CBH I, e.g., CBH I lacking the R268 substitution and/or R411 substitution. More preferably, the cellobiohydrolase activity of the variant CBH I is at least 40%, at least 50%, at least 60% or at least 70% the cellobiohydrolase activity of the corresponding reference CBH I. In specific embodiments the cellobiohydrolase activity of the variant CBH I ranges from 30% to 80%, from 40% to 70%, from 30% to 60%, from 50% to 80% or from 60% to 80% of the cellobiohydrolase activity of the corresponding reference CBH I. Assays for cellobiohydrolase activity are described, for example, in Becker et al., 2001, Biochem. J. 356:19-30 and Mitsuishi et al., 1990, FEBS Letts. 275:135-138, each of which is expressly incorporated by reference herein. The ability of CBH I to hydrolyze isolated soluble and insoluble substrates can also be measured using assays described in Srisodsuk et al., 1997, J. Biotech. 57:49-57 and Nidetzky and Claeyssens, 1994, Biotech. Bioeng. 44:961-966. Substrates useful for assaying cellobiohydrolase activity include crystalline cellulose, filter paper, phosphoric acid swollen cellulose, cellooligosaccharides, methylumbelliferyl lactoside, methylumbelliferyl cellobioside, orthonitrophenyl lactoside, paranitrophenyl lactoside, orthonitrophenyl cellobioside, and paranitrophenyl cellobioside. Cellobiohydrolase activity can be measured in an assay utilizing PASC as the substrate and a calcofluor white detection method (Du et al., 2010, Applied Biochemistry and Biotechnology 161:313-317). PASC can be prepared as described by Walseth, 1952, TAPPI 35:228-235 and Wood, 1971, Biochem. J. 121:353-362.
Other than said R268 and/or R411 substitution, the variant CBH I polypeptides of the disclosure preferably:
- comprise an amino acid sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to a CD of a reference CBH I exemplified in Table 1 (i.e., a CD comprising an amino acid sequence corresponding to positions 26 to 455 of SEQ ID NO:1, positions 18 to 444 of SEQ ID NO:2, positions 26 to 455 of SEQ ID NO:3, positions 1 to 427 of SEQ ID NO:4, positions 24 to 457 of SEQ ID NO:5, positions 18 to 448 of SEQ ID NO:6, positions 27 to 460 of SEQ ID NO:7, positions 27 to 460 of SEQ ID NO:8, positions 20 to 449 of SEQ ID NO:9, positions 1 to 424 of SEQ ID NO:10, positions 18 to 447 of SEQ ID NO:11, positions 18 to 434 of SEQ ID NO:12, positions 18 to 445 of SEQ ID NO:13, positions 19 to 454 of SEQ ID NO:14, positions 19 to 443 of SEQ ID NO:15, positions 2 to 426 of SEQ ID NO:16, positions 23 to 446 of SEQ ID NO:17, positions 19 to 449 of SEQ ID NO:18, positions 23 to 446 of SEQ ID NO:19, positions 19 to 449 of SEQ ID NO:20, positions 2 to 416 of SEQ ID NO:21, positions 19 to 454 of SEQ ID NO:22, positions 19 to 447 of SEQ ID NO:23, positions 19 to 447 of SEQ ID NO:24, positions 20 to 443 of SEQ ID NO:25, positions 18 to 447 of SEQ ID NO:26, positions 19 to 442 of SEQ ID NO:27, positions 18 to 451 of SEQ ID NO:28, positions 23 to 446 of SEQ ID NO:29, positions 18 to 444 of SEQ ID NO:30, positions 18 to 451 of SEQ ID NO:31, positions 18 to 447 of SEQ ID NO:32, positions 19 to 449 of SEQ ID NO:33, positions 18 to 447 of SEQ ID NO:34, positions 26 to 459 of SEQ ID NO:35, positions 19 to 450 of SEQ ID NO:36, positions 19 to 453 of SEQ ID NO:37, positions 18 to 448 of SEQ ID NO:38, positions 19 to 443 of SEQ ID NO:39, positions 19 to 442 of SEQ ID NO:40, positions 18 to 444 of SEQ ID NO:41, positions 24 to 457 of SEQ ID 
NO:42, positions 18 to 449 of SEQ ID NO:43, positions 19 to 453 of SEQ ID NO:44, positions 26 to 456 of SEQ ID NO:45, positions 19 to 451 of SEQ ID NO:46, positions 18 to 443 of SEQ ID NO:47, positions 18 to 448 of SEQ ID NO:48, positions 19 to 451 of SEQ ID NO:49, positions 18 to 444 of SEQ ID NO:50, positions 2 to 419 of SEQ ID NO:51, positions 27 to 461 of SEQ ID NO:52, positions 21 to 445 of SEQ ID NO:53, positions 19 to 449 of SEQ ID NO:54, positions 19 to 448 of SEQ ID NO:55, positions 18 to 443 of SEQ ID NO:56, positions 20 to 443 of SEQ ID NO:57, positions 18 to 448 of SEQ ID NO:58, positions 18 to 447 of SEQ ID NO:59, positions 26 to 455 of SEQ ID NO:60, positions 19 to 449 of SEQ ID NO:61, positions 19 to 449 of SEQ ID NO:62, positions 26 to 460 of SEQ ID NO:63, positions 18 to 448 of SEQ ID NO:64, positions 19 to 451 of SEQ ID NO:65, positions 19 to 447 of SEQ ID NO:66, positions 1 to 424 of SEQ ID NO:67, positions 19 to 448 of SEQ ID NO:68, positions 19 to 443 of SEQ ID NO:69, positions 23 to 447 of SEQ ID NO:70, positions 17 to 448 of SEQ ID NO:71, positions 19 to 449 of SEQ ID NO:72, positions 18 to 444 of SEQ ID NO:73, positions 23 to 458 of SEQ ID NO:74, positions 20 to 452 of SEQ ID NO:75, positions 18 to 435 of SEQ ID NO:76, positions 18 to 446 of SEQ ID NO:77, positions 22 to 457 of SEQ ID NO:78, positions 18 to 448 of SEQ ID NO:79, positions 1 to 431 of SEQ ID NO:80, positions 19 to 453 of SEQ ID NO:81, positions 21 to 440 of SEQ ID NO:82, positions 19 to 442 of SEQ ID NO:83, positions 18 to 448 of SEQ ID NO:84, positions 17 to 446 of SEQ ID NO:85, positions 18 to 447 of SEQ ID NO:86, positions 18 to 443 of SEQ ID NO:87, positions 23 to 448 of SEQ ID NO:88, positions 18 to 451 of SEQ ID NO:89, positions 21 to 447 of SEQ ID NO:90, positions 18 to 444 of SEQ ID NO:91, positions 19 to 442 of SEQ ID NO:92, positions 20 to 436 of SEQ ID NO:93, positions 18 to 450 of SEQ ID NO:94, positions 22 to 453 of SEQ ID NO:95, positions 16 to 472 of SEQ ID 
NO:96, positions 21 to 445 of SEQ ID NO:97, positions 19 to 447 of SEQ ID NO:98, positions 19 to 450 of SEQ ID NO:99, positions 19 to 451 of SEQ ID NO:100, positions 18 to 448 of SEQ ID NO:101, positions 19 to 442 of SEQ ID NO:102, positions 20 to 457 of SEQ ID NO:103, positions 19 to 454 of SEQ ID NO:104, positions 18 to 440 of SEQ ID NO:105, positions 18 to 439 of SEQ ID NO:106, positions 27 to 460 of SEQ ID NO:107, positions 23 to 446 of SEQ ID NO:108, positions 17 to 446 of SEQ ID NO:109, positions 21 to 447 of SEQ ID NO:110, positions 19 to 447 of SEQ ID NO:111, positions 18 to 449 of SEQ ID NO:112, positions 22 to 457 of SEQ ID NO:113, positions 18 to 445 of SEQ ID NO:114, positions 18 to 448 of SEQ ID NO:115, positions 18 to 448 of SEQ ID NO:116, positions 23 to 435 of SEQ ID NO:117, positions 21 to 442 of SEQ ID NO:118, positions 23 to 435 of SEQ ID NO:119, positions 20 to 445 of SEQ ID NO:120, positions 21 to 443 of SEQ ID NO:121, positions 20 to 445 of SEQ ID NO:122, positions 23 to 443 of SEQ ID NO:123, positions 20 to 445 of SEQ ID NO:124, positions 21 to 435 of SEQ ID NO:125, positions 20 to 437 of SEQ ID NO:126, positions 21 to 442 of SEQ ID NO:127, positions 23 to 434 of SEQ ID NO:128, positions 20 to 444 of SEQ ID NO:129, positions 21 to 435 of SEQ ID NO:130, positions 20 to 445 of SEQ ID NO:131, positions 21 to 446 of SEQ ID NO:132, positions 21 to 435 of SEQ ID NO:133, positions 22 to 448 of SEQ ID NO:134, positions 23 to 433 of SEQ ID NO:135, positions 23 to 434 of SEQ ID NO:136, positions 23 to 435 of SEQ ID NO:137, positions 23 to 435 of SEQ ID NO:138, positions 20 to 445 of SEQ ID NO:139, positions 20 to 437 of SEQ ID NO:140, positions 21 to 435 of SEQ ID NO:141, positions 20 to 437 of SEQ ID NO:142, positions 21 to 435 of SEQ ID NO:143, positions 26 to 435 of SEQ ID NO:144, positions 23 to 435 of SEQ ID NO:145, positions 24 to 443 of SEQ ID NO:146, positions 20 to 445 of SEQ ID NO:147, positions 21 to 441 of SEQ ID NO:148, and positions 20 to 
437 of SEQ ID NO:149, preferably the CD corresponding to positions 26-455 of SEQ ID NO:1 or 18-444 of SEQ ID NO:2); and/or
- comprise an amino acid sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to a mature polypeptide of a reference CBH I exemplified in Table 1 (i.e., a mature protein comprising an amino acid sequence corresponding to positions 26 to 529 of SEQ ID NO:1, positions 18 to 514 of SEQ ID NO:2, positions 26 to 529 of SEQ ID NO:3, positions 1 to 427 of SEQ ID NO:4, positions 24 to 526 of SEQ ID NO:5, positions 18 to 512 of SEQ ID NO:6, positions 27 to 532 of SEQ ID NO:7, positions 27 to 539 of SEQ ID NO:8, positions 20 to 449 of SEQ ID NO:9, positions 1 to 424 of SEQ ID NO:10, positions 18 to 447 of SEQ ID NO:11, positions 18 to 434 of SEQ ID NO:12, positions 18 to 521 of SEQ ID NO:13, positions 19 to 454 of SEQ ID NO:14, positions 19 to 596 of SEQ ID NO:15, positions 2 to 426 of SEQ ID NO:16, positions 23 to 446 of SEQ ID NO:17, positions 19 to 525 of SEQ ID NO:18, positions 23 to 446 of SEQ ID NO:19, positions 19 to 530 of SEQ ID NO:20, positions 2 to 416 of SEQ ID NO:21, positions 19 to 454 of SEQ ID NO:22, positions 19 to 506 of SEQ ID NO:23, positions 19 to 447 of SEQ ID NO:24, positions 20 to 443 of SEQ ID NO:25, positions 18 to 447 of SEQ ID NO:26, positions 19 to 516 of SEQ ID NO:27, positions 18 to 451 of SEQ ID NO:28, positions 23 to 446 of SEQ ID NO:29, positions 18 to 514 of SEQ ID NO:30, positions 18 to 451 of SEQ ID NO:31, positions 18 to 447 of SEQ ID NO:32, positions 19 to 449 of SEQ ID NO:33, positions 18 to 447 of SEQ ID NO:34, positions 26 to 529 of SEQ ID NO:35, positions 19 to 528 of SEQ ID NO:36, positions 19 to 453 of SEQ ID NO:37, positions 18 to 512 of SEQ ID NO:38, positions 19 to 586 of SEQ ID NO:39, positions 19 to 510 of SEQ ID NO:40, positions 18 to 513 of SEQ ID NO:41, 
positions 24 to 541 of SEQ ID NO:42, positions 18 to 516 of SEQ ID NO:43, positions 19 to 453 of SEQ ID NO:44, positions 26 to 537 of SEQ ID NO:45, positions 19 to 523 of SEQ ID NO:46, positions 18 to 443 of SEQ ID NO:47, positions 18 to 511 of SEQ ID NO:48, positions 19 to 523 of SEQ ID NO:49, positions 18 to 513 of SEQ ID NO:50, positions 2 to 419 of SEQ ID NO:51, positions 27 to 535 of SEQ ID NO:52, positions 21 to 445 of SEQ ID NO:53, positions 19 to 449 of SEQ ID NO:54, positions 19 to 528 of SEQ ID NO:55, positions 18 to 443 of SEQ ID NO:56, positions 20 to 443 of SEQ ID NO:57, positions 18 to 514 of SEQ ID NO:58, positions 18 to 447 of SEQ ID NO:59, positions 26 to 529 of SEQ ID NO:60, positions 19 to 525 of SEQ ID NO:61, positions 19 to 532 of SEQ ID NO:62, positions 26 to 460 of SEQ ID NO:63, positions 18 to 510 of SEQ ID NO:64, positions 19 to 512 of SEQ ID NO:65, positions 19 to 521 of SEQ ID NO:66, positions 1 to 505 of SEQ ID NO:67, positions 19 to 526 of SEQ ID NO:68, positions 19 to 511 of SEQ ID NO:69, positions 23 to 447 of SEQ ID NO:70, positions 17 to 448 of SEQ ID NO:71, positions 19 to 449 of SEQ ID NO:72, positions 18 to 514 of SEQ ID NO:73, positions 23 to 540 of SEQ ID NO:74, positions 20 to 452 of SEQ ID NO:75, positions 18 to 504 of SEQ ID NO:76, positions 18 to 446 of SEQ ID NO:77, positions 22 to 536 of SEQ ID NO:78, positions 18 to 508 of SEQ ID NO:79, positions 1 to 431 of SEQ ID NO:80, positions 19 to 453 of SEQ ID NO:81, positions 21 to 440 of SEQ ID NO:82, positions 19 to 516 of SEQ ID NO:83, positions 18 to 448 of SEQ ID NO:84, positions 17 to 446 of SEQ ID NO:85, positions 18 to 523 of SEQ ID NO:86, positions 18 to 443 of SEQ ID NO:87, positions 23 to 448 of SEQ ID NO:88, positions 18 to 451 of SEQ ID NO:89, positions 21 to 447 of SEQ ID NO:90, positions 18 to 444 of SEQ ID NO:91, positions 19 to 510 of SEQ ID NO:92, positions 20 to 504 of SEQ ID NO:93, positions 18 to 450 of SEQ ID NO:94, positions 22 to 453 of SEQ ID NO:95, 
positions 16 to 536 of SEQ ID NO:96, positions 21 to 445 of SEQ ID NO:97, positions 19 to 517 of SEQ ID NO:98, positions 19 to 516 of SEQ ID NO:99, positions 19 to 523 of SEQ ID NO:100, positions 18 to 507 of SEQ ID NO:101, positions 19 to 516 of SEQ ID NO:102, positions 20 to 457 of SEQ ID NO:103, positions 19 to 454 of SEQ ID NO:104, positions 18 to 505 of SEQ ID NO:105, positions 18 to 516 of SEQ ID NO:106, positions 27 to 530 of SEQ ID NO:107, positions 23 to 446 of SEQ ID NO:108, positions 17 to 446 of SEQ ID NO:109, positions 21 to 447 of SEQ ID NO:110, positions 19 to 523 of SEQ ID NO:111, positions 18 to 513 of SEQ ID NO:112, positions 22 to 536 of SEQ ID NO:113, positions 18 to 445 of SEQ ID NO:114, positions 18 to 526 of SEQ ID NO:115, positions 18 to 538 of SEQ ID NO:116, positions 23 to 435 of SEQ ID NO:117, positions 21 to 442 of SEQ ID NO:118, positions 23 to 435 of SEQ ID NO:119, positions 20 to 445 of SEQ ID NO:120, positions 21 to 443 of SEQ ID NO:121, positions 20 to 445 of SEQ ID NO:122, positions 23 to 443 of SEQ ID NO:123, positions 20 to 445 of SEQ ID NO:124, positions 21 to 435 of SEQ ID NO:125, positions 20 to 437 of SEQ ID NO:126, positions 21 to 442 of SEQ ID NO:127, positions 23 to 434 of SEQ ID NO:128, positions 20 to 444 of SEQ ID NO:129, positions 21 to 435 of SEQ ID NO:130, positions 20 to 445 of SEQ ID NO:131, positions 21 to 446 of SEQ ID NO:132, positions 21 to 435 of SEQ ID NO:133, positions 22 to 448 of SEQ ID NO:134, positions 23 to 433 of SEQ ID NO:135, positions 23 to 434 of SEQ ID NO:136, positions 23 to 435 of SEQ ID NO:137, positions 23 to 435 of SEQ ID NO:138, positions 20 to 445 of SEQ ID NO:139, positions 20 to 437 of SEQ ID NO:140, positions 21 to 435 of SEQ ID NO:141, positions 20 to 437 of SEQ ID NO:142, positions 21 to 435 of SEQ ID NO:143, positions 26 to 435 of SEQ ID NO:144, positions 23 to 435 of SEQ ID NO:145, positions 24 to 443 of SEQ ID NO:146, positions 20 to 445 of SEQ ID NO:147, positions 21 to 441 of SEQ
ID NO:148, and positions 20 to 437 of SEQ ID NO:149, preferably the mature polypeptide corresponding to positions 26-529 of SEQ ID NO:1 or 18-514 of SEQ ID NO:2).
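As an illustration of how an identity threshold of this kind might be checked, the following minimal sketch computes percent identity over a pairwise alignment that has already been produced by some alignment tool. It assumes that gaps are written as "-" and that identity is the number of identical residue columns divided by the total number of alignment columns; other denominators are also in common use. The function names and the toy sequences in the usage note are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: identity = identical residue columns / all alignment columns,
    so a column containing a gap ("-") counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if the alignment reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACDE", "ACDF")` gives 75.0 under this convention, so `meets_threshold("ACDE", "ACDF", 70.0)` is true while a 90% threshold is not met.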
An example of an algorithm that is suitable for determining sequence similarity is the BLAST algorithm, which is described in Altschul et al., 1990, J. Mol. Biol. 215:403-410. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. These initial neighborhood word hits act as starting points to find longer HSPs containing them. The word hits are expanded in both directions along each of the two sequences being compared for as far as the cumulative alignment score can be increased. Extension of the word hits is stopped when: the cumulative alignment score falls off by the quantity X from a maximum achieved value; the cumulative score goes to zero or below; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLAST program uses as defaults a word length (W) of 11, the BLOSUM62 scoring matrix (see Henikoff & Henikoff, 1992, Proc. Nat'l. Acad. Sci. USA 89:10915-10919), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
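The extension step described above can be sketched as follows. This is a simplified, ungapped illustration and not the NCBI implementation: it takes a word hit that has already been found, extends it in both directions, and applies the three stopping rules (drop of X from the maximum score, cumulative score at or below zero, end of either sequence). The names `extend_hit` and `HSP`, and the match and mismatch scores of 5 and -4 used as defaults, are assumptions chosen for illustration.

```python
from typing import NamedTuple


class HSP(NamedTuple):
    q_start: int  # 0-based start in the query (inclusive)
    q_end: int    # 0-based end in the query (exclusive)
    s_start: int  # 0-based start in the subject (inclusive)
    score: int    # cumulative score of the extended segment


def _pair_score(a: str, b: str, match: int, mismatch: int) -> int:
    return match if a == b else mismatch


def _extend(query, subject, i, j, step, start_score, x_drop, match, mismatch):
    """Extend in one direction; return (score gain, number of positions kept)."""
    score = best = start_score
    kept = n = 0
    # Stopping rule 3: the loop ends when either sequence is exhausted.
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += _pair_score(query[i], subject[j], match, mismatch)
        n += 1
        if score > best:
            best, kept = score, n
        # Stopping rules 1 and 2: X-drop from the maximum, or score <= 0.
        if score <= 0 or best - score >= x_drop:
            break
        i += step
        j += step
    return best - start_score, kept


def extend_hit(query, subject, q_pos, s_pos, word_len, x_drop,
               match=5, mismatch=-4) -> HSP:
    """Extend a word hit of length word_len starting at (q_pos, s_pos)."""
    seed = sum(
        _pair_score(query[q_pos + k], subject[s_pos + k], match, mismatch)
        for k in range(word_len)
    )
    right_gain, right_len = _extend(query, subject, q_pos + word_len,
                                    s_pos + word_len, 1, seed, x_drop,
                                    match, mismatch)
    left_gain, left_len = _extend(query, subject, q_pos - 1, s_pos - 1, -1,
                                  seed, x_drop, match, mismatch)
    return HSP(q_pos - left_len, q_pos + word_len + right_len,
               s_pos - left_len, seed + left_gain + right_gain)
```

With the toy inputs `extend_hit("TTACGTACGTAA", "GGACGTACGTCC", 4, 4, 4, 8)`, the four-letter seed grows to the eight-letter segment `ACGTACGT` (query positions 2 to 10, end exclusive) with a score of 40; both extensions stop once the running score has fallen 8 below its maximum. A larger X lets the extension tolerate longer runs of mismatches at the cost of more computation, which is the sensitivity and speed trade-off noted in the text.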
Most CBH I polypeptides are secreted and are therefore expressed with a signal sequence that is cleaved upon secretion of the polypeptide from the cell. Accordingly, in certain aspects, the variant CBH I polypeptides of the disclosure further include a signal sequence. Exemplary signal sequences comprise amino acid sequences corresponding to positions 1 to 25 of SEQ ID NO:1, positions 1 to 17 of SEQ ID NO:2, positions 1 to 25 of SEQ ID NO:3, positions 1 to 23 of SEQ ID NO:5, positions 1 to 17 of SEQ ID NO:6, positions 1 to 26 of SEQ ID NO:7, positions 1 to 27 of SEQ ID NO:8, positions 1 to 19 of SEQ ID NO:9, positions 1 to 17 of SEQ ID NO:11, positions 1 to 17 of SEQ ID NO:12, positions 1 to 17 of SEQ ID NO:13, positions 1 to 18 of SEQ ID NO:14, positions 1 to 18 of SEQ ID NO:15, positions 1 to 22 of SEQ ID NO:17, positions 1 to 18 of SEQ ID NO:18, positions 1 to 22 of SEQ ID NO:19, positions 1 to 18 of SEQ ID NO:20, positions 1 to 18 of SEQ ID NO:22, positions 1 to 18 of SEQ ID NO:23, positions 1 to 18 of SEQ ID NO:24, positions 1 to 19 of SEQ ID NO:25, positions 1 to 17 of SEQ ID NO:26, positions 1 to 18 of SEQ ID NO:27, positions 1 to 17 of SEQ ID NO:28, positions 1 to 22 of SEQ ID NO:29, positions 1 to 18 of SEQ ID NO:30, positions 1 to 17 of SEQ ID NO:31, positions 1 to 17 of SEQ ID NO:32, positions 1 to 18 of SEQ ID NO:33, positions 1 to 17 of SEQ ID NO:34, positions 1 to 25 of SEQ ID NO:35, positions 1 to 18 of SEQ ID NO:36, positions 1 to 18 of SEQ ID NO:37, positions 1 to 17 of SEQ ID NO:38, positions 1 to 18 of SEQ ID NO:39, positions 1 to 18 of SEQ ID NO:40, positions 1 to 17 of SEQ ID NO:41, positions 1 to 23 of SEQ ID NO:42, positions 1 to 17 of SEQ ID NO:43, positions 1 to 18 of SEQ ID NO:44, positions 1 to 25 of SEQ ID NO:45, positions 1 to 18 of SEQ ID NO:46, positions 1 to 17 of SEQ ID NO:47, positions 1 to 17 of SEQ ID NO:48, positions 1 to 18 of SEQ ID NO:49, positions 1 to 17 of SEQ ID NO:50, positions 1 to 26 of SEQ ID NO:52, positions 1 to 20 
of SEQ ID NO:53, positions 1 to 18 of SEQ ID NO:54, positions 1 to 18 of SEQ ID NO:55, positions 1 to 17 of SEQ ID NO:56, positions 1 to 19 of SEQ ID NO:57, positions 1 to 17 of SEQ ID NO:58, positions 1 to 17 of SEQ ID NO:59, positions 1 to 25 of SEQ ID NO:60, positions 1 to 18 of SEQ ID NO:61, positions 1 to 18 of SEQ ID NO:62, positions 1 to 25 of SEQ ID NO:63, positions 1 to 17 of SEQ ID NO:64, positions 1 to 18 of SEQ ID NO:65, positions 1 to 18 of SEQ ID NO:66, positions 1 to 18 of SEQ ID NO:68, positions 1 to 18 of SEQ ID NO:69, positions 1 to 23 of SEQ ID NO:70, positions 1 to 17 of SEQ ID NO:71, positions 1 to 18 of SEQ ID NO:72, positions 1 to 17 of SEQ ID NO:73, positions 1 to 22 of SEQ ID NO:74, positions 1 to 19 of SEQ ID NO:75, positions 1 to 17 of SEQ ID NO:76, positions 1 to 17 of SEQ ID NO:77, positions 1 to 21 of SEQ ID NO:78, positions 1 to 18 of SEQ ID NO:79, positions 1 to 18 of SEQ ID NO:81, positions 1 to 20 of SEQ ID NO:82, positions 1 to 18 of SEQ ID NO:83, positions 1 to 17 of SEQ ID NO:84, positions 1 to 16 of SEQ ID NO:85, positions 1 to 17 of SEQ ID NO:86, positions 1 to 17 of SEQ ID NO:87, positions 1 to 22 of SEQ ID NO:88, positions 1 to 17 of SEQ ID NO:89, positions 1 to 20 of SEQ ID NO:90, positions 1 to 17 of SEQ ID NO:91, positions 1 to 18 of SEQ ID NO:92, positions 1 to 19 of SEQ ID NO:93, positions 1 to 17 of SEQ ID NO:94, positions 1 to 21 of SEQ ID NO:95, positions 1 to 15 of SEQ ID NO:96, positions 1 to 20 of SEQ ID NO:97, positions 1 to 18 of SEQ ID NO:98, positions 1 to 18 of SEQ ID NO:99, positions 1 to 18 of SEQ ID NO:100, positions 1 to 17 of SEQ ID NO:101, positions 1 to 18 of SEQ ID NO:102, positions 1 to 19 of SEQ ID NO:103, positions 1 to 18 of SEQ ID NO:104, positions 1 to 17 of SEQ ID NO:105, positions 1 to 17 of SEQ ID NO:106, positions 1 to 26 of SEQ ID NO:107, positions 1 to 22 of SEQ ID NO:108, positions 1 to 16 of SEQ ID NO:109, positions 1 to 20 of SEQ ID NO:110, positions 1 to 18 of SEQ ID NO:111, positions 
1 to 17 of SEQ ID NO:112, positions 1 to 21 of SEQ ID NO:113, positions 1 to 17 of SEQ ID NO:114, positions 1 to 17 of SEQ ID NO:115, positions 1 to 18 of SEQ ID NO:116, positions 1 to 22 of SEQ ID NO:117, positions 1 to 20 of SEQ ID NO:118, positions 1 to 22 of SEQ ID NO:119, positions 1 to 19 of SEQ ID NO:120, positions 1 to 20 of SEQ ID NO:121, positions 1 to 19 of SEQ ID NO:122, positions 1 to 22 of SEQ ID NO:123, positions 1 to 19 of SEQ ID NO:124, positions 1 to 20 of SEQ ID NO:125, positions 1 to 19 of SEQ ID NO:126, positions 1 to 21 of SEQ ID NO:127, positions 1 to 22 of SEQ ID NO:128, positions 1 to 19 of SEQ ID NO:129, positions 1 to 20 of SEQ ID NO:130, positions 1 to 19 of SEQ ID NO:131, positions 1 to 20 of SEQ ID NO:132, positions 1 to 20 of SEQ ID NO:133, positions 1 to 21 of SEQ ID NO:134, positions 1 to 22 of SEQ ID NO:135, positions 1 to 22 of SEQ ID NO:136, positions 1 to 22 of SEQ ID NO:137, positions 1 to 22 of SEQ ID NO:138, positions 1 to 19 of SEQ ID NO:139, positions 1 to 19 of SEQ ID NO:140, positions 1 to 20 of SEQ ID NO:141, positions 1 to 19 of SEQ ID NO:142, positions 1 to 20 of SEQ ID NO:143, positions 1 to 25 of SEQ ID NO:144, positions 1 to 22 of SEQ ID NO:145, positions 1 to 23 of SEQ ID NO:146, positions 1 to 19 of SEQ ID NO:147, positions 1 to 20 of SEQ ID NO:148, and positions 1 to 19 of SEQ ID NO:149.
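Because the signal sequence, catalytic domain (CD) and mature polypeptide are all specified by residue positions, the regions of a precursor can be recovered by simple slicing. The sketch below uses the 1-based, inclusive coordinates quoted in this section for SEQ ID NO:1 and SEQ ID NO:2. The helper names are hypothetical, and the sequence in the usage note is a placeholder string rather than an actual entry from the sequence listing.

```python
from typing import Dict, Tuple

# 1-based inclusive coordinates quoted in the text for two reference CBH I
# precursors. Only these two entries are included in this sketch.
REGIONS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "SEQ ID NO:1": {"signal": (1, 25), "cd": (26, 455), "mature": (26, 529)},
    "SEQ ID NO:2": {"signal": (1, 17), "cd": (18, 444), "mature": (18, 514)},
}


def region(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of seq."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"positions {start}-{end} fall outside the sequence")
    return seq[start - 1:end]


def split_precursor(seq_id: str, seq: str) -> Dict[str, str]:
    """Split a precursor into its signal sequence, CD and mature polypeptide."""
    return {name: region(seq, s, e) for name, (s, e) in REGIONS[seq_id].items()}
```

As a usage example, a 529-residue placeholder split with `split_precursor("SEQ ID NO:1", seq)` yields a 25-residue signal sequence, a 430-residue CD and a 504-residue mature polypeptide, and the signal sequence followed by the mature polypeptide reconstitutes the precursor.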
Recombinant Expression of Variant CBH I Polypeptides Cell Culture Systems The disclosure also provides recombinant cells engineered to express variant CBH I polypeptides. Suitably, the variant CBH I polypeptide is encoded by a nucleic acid operably linked to a promoter.
Where recombinant expression in a filamentous fungal host is desired, the promoter can be a filamentous fungal promoter. The nucleic acids can be, for example, under the control of heterologous promoters. The variant CBH I polypeptides can also be expressed under the control of constitutive or inducible promoters. Examples of promoters that can be used include, but are not limited to, a cellulase promoter, a xylanase promoter, and the 1818 promoter (previously identified as a highly expressed protein by EST mapping of Trichoderma). For example, the promoter can suitably be a cellobiohydrolase, endoglucanase, or β-glucosidase promoter. A particularly suitable promoter can be, for example, a T. reesei cellobiohydrolase, endoglucanase, or β-glucosidase promoter. Non-limiting examples of promoters include a cbh1, cbh2, egl1, egl2, egl3, egl4, egl5, pki1, gpd1, xyn1, or xyn2 promoter.
Suitable host cells include cells of any microorganism (e.g., cells of a bacterium, a protist, an alga, a fungus (e.g., a yeast or filamentous fungus), or other microbe), and are preferably cells of a bacterium, a yeast, or a filamentous fungus.
Suitable host cells of the bacterial genera include, but are not limited to, cells of Escherichia, Bacillus, Lactobacillus, Pseudomonas, and Streptomyces. Suitable cells of bacterial species include, but are not limited to, cells of Escherichia coli, Bacillus subtilis, Bacillus licheniformis, Lactobacillus brevis, Pseudomonas aeruginosa, and Streptomyces lividans.
Suitable host cells of yeast genera include, but are not limited to, cells of Saccharomyces, Schizosaccharomyces, Candida, Hansenula, Pichia, Kluyveromyces, and Phaffia. Suitable cells of yeast species include, but are not limited to, cells of Saccharomyces cerevisiae, Schizosaccharomyces pombe, Candida albicans, Hansenula polymorpha, Pichia pastoris, P. canadensis, Kluyveromyces marxianus, and Phaffia rhodozyma.
Suitable host cells of filamentous fungi include all filamentous forms of the subdivision Eumycotina. Suitable cells of filamentous fungal genera include, but are not limited to, cells of Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Corynascus, Chaetomium, Cryptococcus, Filobasidium, Fusarium, Gibberella, Humicola, Hypocrea, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Scytalidium, Schizophyllum, Sporotrichum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, and Trichoderma. More preferably, the recombinant cell is a Trichoderma sp. (e.g., Trichoderma reesei), Penicillium sp., Humicola sp. (e.g., Humicola insolens), Aspergillus sp. (e.g., Aspergillus niger), Chrysosporium sp., Fusarium sp., or Hypocrea sp. Suitable cells can also include cells of various anamorph and teleomorph forms of these filamentous fungal genera.
Suitable cells of filamentous fungal species include, but are not limited to, cells of Aspergillus awamori, Aspergillus fumigatus, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Chrysosporium lucknowense, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Coprinus cinereus, Coriolus hirsutus, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Neurospora intermedia, Penicillium purpurogenum, Penicillium canescens, Penicillium solitum, Penicillium funiculosum, Phanerochaete chrysosporium, Phlebia radiata, Pleurotus eryngii, Talaromyces flavus, Thielavia terrestris, Trametes villosa, Trametes versicolor, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, and Trichoderma viride.
The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants, or amplifying the nucleic acid sequence encoding the variant CBH I polypeptide. Culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to those skilled in the art. As noted, many references are available for the culture and production of many cells, including cells of bacterial and fungal origin. Cell culture media in general are set forth in Atlas and Parks (eds.), 1993, The Handbook of Microbiological Media, CRC Press, Boca Raton, Fla., which is incorporated herein by reference. For recombinant expression in filamentous fungal cells, the cells are cultured in a standard medium containing physiological salts and nutrients, such as described in Pourquie et al., 1988, Biochemistry and Genetics of Cellulose Degradation, eds. Aubert, et al., Academic Press, pp. 71-86; and Ilmen et al., 1997, Appl. Environ. Microbiol. 63:1298-1306. Culture conditions are also standard, e.g., cultures are incubated at 28° C. in shaker cultures or fermenters until desired levels of variant CBH I expression are achieved. Preferred culture conditions for a given filamentous fungus may be found in the scientific literature and/or from the source of the fungi such as the American Type Culture Collection (ATCC). After fungal growth has been established, the cells are exposed to conditions effective to cause or permit the expression of a variant CBH I.
In cases where a variant CBH I coding sequence is under the control of an inducible promoter, the inducing agent, e.g., a sugar, metal salt or antibiotic, is added to the medium at a concentration effective to induce variant CBH I expression.
In one embodiment, the recombinant cell is an Aspergillus niger, which is a useful strain for obtaining overexpressed polypeptide. For example, A. niger var. awamori dgr246 is known to produce elevated amounts of secreted cellulases (Goedegebuur et al., 2002, Curr. Genet. 41:89-98). Other strains of Aspergillus niger var. awamori such as GCDAP3, GCDAP4 and GAP3-4 are known (Ward et al., 1993, Appl. Microbiol. Biotechnol. 39:738-743).
In another embodiment, the recombinant cell is a Trichoderma reesei, which is a useful strain for obtaining overexpressed polypeptide. For example, RL-P37, described by Sheir-Neiss et al., 1984, Appl. Microbiol. Biotechnol. 20:46-53, is known to secrete elevated amounts of cellulase enzymes. Functional equivalents of RL-P37 include Trichoderma reesei strain RUT-C30 (ATCC No. 56765) and strain QM9414 (ATCC No. 26921). It is contemplated that these strains would also be useful in overexpressing variant CBH I polypeptides.
Cells expressing the variant CBH I polypeptides of the disclosure can be grown under batch, fed-batch or continuous fermentation conditions. Classical batch fermentation is a closed system, wherein the composition of the medium is set at the beginning of the fermentation and is not subject to artificial alterations during the fermentation. A variation of the batch system is a fed-batch fermentation in which the substrate is added in increments as the fermentation progresses. Fed-batch systems are useful when catabolite repression is likely to inhibit the metabolism of the cells and where it is desirable to have limited amounts of substrate in the medium. Batch and fed-batch fermentations are common and well known in the art. Continuous fermentation is an open system where a defined fermentation medium is added continuously to a bioreactor and an equal amount of conditioned medium is removed simultaneously for processing. Continuous fermentation generally maintains the cultures at a constant high density where cells are primarily in log phase growth. Continuous fermentation systems strive to maintain steady state growth conditions. Methods for modulating nutrients and growth factors for continuous fermentation processes as well as techniques for maximizing the rate of product formation are well known in the art of industrial microbiology.
Recombinant Expression in Plants The disclosure provides transgenic plants and seeds that recombinantly express a variant CBH I polypeptide. The disclosure also provides plant products, e.g., oils, seeds, leaves, extracts and the like, comprising a variant CBH I polypeptide.
The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a monocot). The disclosure also provides methods of making and using these transgenic plants and seeds. The transgenic plant or plant cell expressing a variant CBH I can be constructed in accordance with any method known in the art. See, for example, U.S. Pat. No. 6,309,872. T. reesei CBH I has been successfully expressed in transgenic tobacco (Nicotiana tabacum) and potato (Solanum tuberosum). See Hooker et al., 2000, in Glycosyl Hydrolases for Biomass Conversion, ACS Symposium Series, Vol. 769, Chapter 4, pp. 55-90.
In a particular aspect, the present disclosure provides for the expression of CBH I variants in transgenic plants or plant organs and methods for the production thereof. DNA expression constructs are provided for the transformation of plants with a nucleic acid encoding the variant CBH I polypeptide, preferably under the control of regulatory sequences which are capable of directing expression of the variant CBH I polypeptide. These regulatory sequences include sequences capable of directing transcription in plants, either constitutively, or in stage and/or tissue specific manners.
The expression of variant CBH I polypeptides in plants can be achieved by a variety of means. Specifically, for example, technologies are available for transforming a large number of plant species, including dicotyledonous species (e.g., tobacco, potato, tomato, Petunia, Brassica) and monocot species. Additionally, for example, strategies for the expression of foreign genes in plants are available. Additionally still, regulatory sequences from plant genes have been identified that are serviceable for the construction of chimeric genes that can be functionally expressed in plants and in plant cells (e.g., Klee, 1987, Ann. Rev. of Plant Phys. 38:467-486; Clark et al., 1990, Virology 179(2):640-7; Smith et al., 1990, Mol. Gen. Genet. 224(3):477-81).
The introduction of nucleic acids into plants can be achieved using several technologies including transformation with Agrobacterium tumefaciens or Agrobacterium rhizogenes. Non-limiting examples of plant tissues that can be transformed include protoplasts, microspores or pollen, and explants such as leaves, stems, roots, hypocotyls, and cotyls. Furthermore, DNA encoding a variant CBH I can be introduced directly into protoplasts and plant cells or tissues by microinjection, electroporation, particle bombardment, and direct DNA uptake.
Variant CBH I polypeptides can be produced in plants by a variety of expression systems. For instance, the use of a constitutive promoter such as the 35S promoter of Cauliflower Mosaic Virus (Guilley et al., 1982, Cell 30:763-73) is serviceable for the accumulation of the expressed protein in virtually all organs of the transgenic plant. Alternatively, promoters that are tissue-specific and/or stage-specific can be used (Higgins, 1984, Annu. Rev. Plant Physiol. 35:191-221; Shotwell and Larkins, 1989, In: The Biochemistry of Plants Vol. 15 (Academic Press, San Diego: Stumpf and Conn, eds.), p. 297) to permit expression of variant CBH I polypeptides in a target tissue and/or during a desired stage of development.
Compositions of Variant CBH I Polypeptides In general, a variant CBH I polypeptide produced in cell culture is secreted into the medium and may be purified or isolated, e.g., by removing unwanted components from the cell culture medium. However, in some cases, a variant CBH I polypeptide may be produced in a cellular form necessitating recovery from a cell lysate. In such cases the variant CBH I polypeptide is purified from the cells in which it was produced using techniques routinely employed by those of skill in the art. Examples include, but are not limited to, affinity chromatography (Van Tilbeurgh et al., 1984, FEBS Lett. 169(2):215-218), ion-exchange chromatographic methods (Goyal et al., 1991, Bioresource Technology, 36:37-50; Fliess et al., 1983, Eur. J. Appl. Microbiol. Biotechnol. 17:314-318; Bhikhabhai et al., 1984, J. Appl. Biochem. 6:336-345; Ellouz et al., 1987, Journal of Chromatography, 396:307-317), including ion-exchange using materials with high resolution power (Medve et al., 1998, J. Chromatography A, 808:153-165), hydrophobic interaction chromatography (Tomaz and Queiroz, 1999, J. Chromatography A, 865:123-128), and two-phase partitioning (Brumbauer et al., 1999, Bioseparation 7:287-295).
The variant CBH I polypeptides of the disclosure are suitably used in cellulase compositions. Cellulases are known in the art as enzymes that hydrolyze cellulose (beta-1,4-glucan or beta D-glucosidic linkages) resulting in the formation of glucose, cellobiose, cellooligosaccharides, and the like. Cellulase enzymes have been traditionally divided into three major classes: endoglucanases (“EG”), exoglucanases or cellobiohydrolases (EC 3.2.1.91) (“CBH”) and beta-glucosidases (EC 3.2.1.21) (“BG”) (Knowles et al., 1987, TIBTECH 5:255-261; Schulein, 1988, Methods in Enzymology 160(25):234-243).
Certain fungi produce complete cellulase systems which include exo-cellobiohydrolases or CBH-type cellulases, endoglucanases or EG-type cellulases and β-glucosidases or BG-type cellulases (Schulein, 1988, Methods in Enzymology 160(25):234-243). Such cellulase compositions are referred to herein as “whole” cellulases. However, sometimes these systems lack CBH-type cellulases, and bacterial cellulases also typically include few or no CBH-type cellulases. In addition, it has been shown that the EG components and CBH components synergistically interact to more efficiently degrade cellulose. See, e.g., Wood, 1985, Biochemical Society Transactions 13(2):407-410.
The cellulase compositions of the disclosure typically include, in addition to a variant CBH I polypeptide, one or more cellobiohydrolases, endoglucanases and/or β-glucosidases. In their crudest form, cellulase compositions contain the microorganism culture that produced the enzyme components. “Cellulase compositions” also refers to a crude fermentation product of the microorganisms. A crude fermentation product is preferably a fermentation broth that has been separated from the microorganism cells and/or cellular debris (e.g., by centrifugation and/or filtration). In some cases, the enzymes in the broth can be optionally diluted, concentrated, partially purified or purified and/or dried. The variant CBH I polypeptide can be co-expressed with one or more of the other components of the cellulase composition or it can be expressed separately, optionally purified and combined with a composition comprising one or more of the other cellulase components.
When employed in cellulase compositions, the variant CBH I is generally present in an amount sufficient to allow release of soluble sugars from the biomass. The amount of variant CBH I enzyme added depends upon the type of biomass to be saccharified and can be readily determined by the skilled artisan. In certain embodiments, the weight percent of variant CBH I polypeptide is suitably at least 1, at least 5, at least 10, or at least 20 weight percent of the total polypeptides in a cellulase composition. Exemplary cellulase compositions include a variant CBH I of the disclosure in an amount ranging from about 1 to about 20 weight percent, from about 1 to about 25 weight percent, from about 5 to about 20 weight percent, from about 5 to about 25 weight percent, from about 5 to about 30 weight percent, from about 5 to about 35 weight percent, from about 5 to about 40 weight percent, from about 5 to about 45 weight percent, from about 5 to about 50 weight percent, from about 10 to about 20 weight percent, from about 10 to about 25 weight percent, from about 10 to about 30 weight percent, from about 10 to about 35 weight percent, from about 10 to about 40 weight percent, from about 10 to about 45 weight percent, from about 10 to about 50 weight percent, from about 15 to about 20 weight percent, from about 15 to about 25 weight percent, from about 15 to about 30 weight percent, from about 15 to about 35 weight percent, from about 15 to about 40 weight percent, from about 15 to about 45 weight percent, or from about 15 to about 50 weight percent of the total polypeptides in the composition.
Utility of Variant CBH I Polypeptides It can be appreciated that the variant CBH I polypeptides of the disclosure and compositions comprising the variant CBH I polypeptides find utility in a wide variety of applications, for example in detergent compositions that exhibit enhanced cleaning ability, function as a softening agent and/or improve the feel of cotton fabrics (e.g., “stone washing” or “biopolishing”), or in cellulase compositions for degrading wood pulp into sugars (e.g., for bio-ethanol production). Other applications include the treatment of mechanical pulp (Pere et al., 1996, Tappi Pulping Conference, pp. 693-696 (Nashville, Tenn., Oct. 27-31, 1996)), use as a feed additive (see, e.g., WO 91/04673) and grain wet milling.
Saccharification Reactions Ethanol can be produced via saccharification and fermentation processes from cellulosic biomass such as trees, herbaceous plants, municipal solid waste and agricultural and forestry residues. However, the ratio of individual cellulase enzymes within a naturally occurring cellulase mixture produced by a microbe may not be the most efficient for rapid conversion of cellulose in biomass to glucose. It is known that endoglucanases act to produce new cellulose chain ends which themselves are substrates for the action of cellobiohydrolases and thereby improve the efficiency of hydrolysis of the entire cellulase system. The use of optimized cellobiohydrolase activity may greatly enhance the production of ethanol.
Cellulase compositions comprising one or more of the variant CBH I polypeptides of the disclosure can be used in saccharification reactions to produce simple sugars for fermentation. Accordingly, the present disclosure provides methods for saccharification comprising contacting biomass with a cellulase composition comprising a variant CBH I polypeptide of the disclosure and, optionally, subjecting the resulting sugars to fermentation by a microorganism.
The term “biomass,” as used herein, refers to any composition comprising cellulose (optionally also hemicellulose and/or lignin). As used herein, biomass includes, without limitation, seeds, grains, tubers, plant waste or byproducts of food processing or industrial processing (e.g., stalks), corn (including, e.g., cobs, stover, and the like), grasses (including, e.g., Indian grass, such as Sorghastrum nutans; or, switchgrass, e.g., Panicum species, such as Panicum virgatum), wood (including, e.g., wood chips, processing waste), paper, pulp, and recycled paper (including, e.g., newspaper, printer paper, and the like). Other biomass materials include, without limitation, potatoes, soybean (e.g., rapeseed), barley, rye, oats, wheat, beets, and sugar cane bagasse.
The saccharified biomass (e.g., lignocellulosic material processed by enzymes of the disclosure) can be made into a number of bio-based products, via processes such as, e.g., microbial fermentation and/or chemical synthesis. As used herein, “microbial fermentation” refers to a process of growing and harvesting fermenting microorganisms under suitable conditions. The fermenting microorganism can be any microorganism suitable for use in a desired fermentation process for the production of bio-based products. Suitable fermenting microorganisms include, without limitation, filamentous fungi, yeast, and bacteria. The saccharified biomass can, for example, be made into a fuel (e.g., a biofuel such as a bioethanol, biobutanol, biomethanol, a biopropanol, a biodiesel, a jet fuel, or the like) via fermentation and/or chemical synthesis. The saccharified biomass can, for example, also be made into a commodity chemical (e.g., ascorbic acid, isoprene, 1,3-propanediol), lipids, amino acids, polypeptides, and enzymes, via fermentation and/or chemical synthesis.
Thus, in certain aspects, the variant CBH I polypeptides of the disclosure find utility in the generation of ethanol from biomass in either separate or simultaneous saccharification and fermentation processes. Separate saccharification and fermentation is a process whereby cellulose present in biomass is saccharified into simple sugars (e.g., glucose) and the simple sugars subsequently fermented by microorganisms (e.g., yeast) into ethanol. Simultaneous saccharification and fermentation is a process whereby cellulose present in biomass is saccharified into simple sugars (e.g., glucose) and, at the same time and in the same reactor, microorganisms (e.g., yeast) ferment the simple sugars into ethanol.
Prior to saccharification, biomass is preferably subjected to one or more pretreatment step(s) in order to render cellulose material more accessible or susceptible to enzymes and thus more amenable to hydrolysis by the variant CBH I polypeptides of the disclosure.
In an exemplary embodiment, the pretreatment entails subjecting biomass material to a catalyst comprising a dilute solution of a strong acid and a metal salt in a reactor. The biomass material can, e.g., be a raw material or a dried material. This pretreatment can lower the activation energy, or the temperature, of cellulose hydrolysis, ultimately allowing higher yields of fermentable sugars. See, e.g., U.S. Pat. Nos. 6,660,506; 6,423,145.
Another exemplary pretreatment method entails hydrolyzing biomass by subjecting the biomass material to a first hydrolysis step in an aqueous medium at a temperature and a pressure chosen to effectuate primarily depolymerization of hemicellulose without achieving significant depolymerization of cellulose into glucose. This step yields a slurry in which the liquid aqueous phase contains dissolved monosaccharides resulting from depolymerization of hemicellulose, and a solid phase containing cellulose and lignin. The slurry is then subject to a second hydrolysis step under conditions that allow a major portion of the cellulose to be depolymerized, yielding a liquid aqueous phase containing dissolved/soluble depolymerization products of cellulose. See, e.g., U.S. Pat. No. 5,536,325.
A further exemplary method involves processing a biomass material by one or more stages of dilute acid hydrolysis using about 0.4% to about 2% of a strong acid, followed by treating the unreacted solid lignocellulosic component of the acid hydrolyzed material with alkaline delignification. See, e.g., U.S. Pat. No. 6,409,841. Another exemplary pretreatment method comprises prehydrolyzing biomass (e.g., lignocellulosic materials) in a prehydrolysis reactor; adding an acidic liquid to the solid lignocellulosic material to make a mixture; heating the mixture to reaction temperature; maintaining reaction temperature for a period of time sufficient to fractionate the lignocellulosic material into a solubilized portion containing at least about 20% of the lignin from the lignocellulosic material, and a solid fraction containing cellulose; separating the solubilized portion from the solid fraction, and removing the solubilized portion while at or near reaction temperature; and recovering the solubilized portion. The cellulose in the solid fraction is rendered more amenable to enzymatic digestion. See, e.g., U.S. Pat. No. 5,705,369. Further pretreatment methods can involve the use of hydrogen peroxide (H2O2). See Gould, 1984, Biotech. and Bioengr. 26:46-52.
Pretreatment can also comprise contacting a biomass material with stoichiometric amounts of sodium hydroxide and ammonium hydroxide at a very low concentration. See Teixeira et al., 1999, Appl. Biochem. and Biotech. 77-79:19-34. Pretreatment can also comprise contacting a lignocellulose with a chemical (e.g., a base, such as sodium carbonate or potassium hydroxide) at a pH of about 9 to about 14 at moderate temperature and pressure. See PCT Publication WO2004/081185.
Ammonia pretreatment can also be used. Such a pretreatment method comprises subjecting a biomass material to low ammonia concentration under conditions of high solids. See, e.g., U.S. Patent Publication No. 20070031918 and PCT publication WO 06/110901.
Detergent Compositions Comprising Variant CBH I Proteins The present disclosure also provides detergent compositions comprising a variant CBH I polypeptide of the disclosure. In addition to the variant CBH I polypeptide, the detergent compositions may employ one or more of a surfactant, including anionic, non-ionic and ampholytic surfactants; a hydrolase; a bleaching agent; a bluing agent; a caking inhibitor; a solubilizer; and a cationic surfactant. All of these components are known in the detergent art.
The variant CBH I polypeptide is preferably provided as part of a cellulase composition. The cellulase composition can be employed from about 0.00005 weight percent to about 5 weight percent or from about 0.0002 weight percent to about 2 weight percent of the total detergent composition. The cellulase composition can be in the form of a liquid diluent, granule, emulsion, gel, paste, and the like. Such forms are known to the skilled artisan. When a solid detergent composition is employed, the cellulase composition is preferably formulated as granules.
Examples Materials and Methods Preparation of CBH I Polypeptides for Biochemical Characterization Protein expression was carried out in an Aspergillus niger host strain that had been transformed using PEG-mediated transformation with expression constructs for CBH I that included the hygromycin resistance gene as a selectable marker, in which the full length CBH I sequences (signal sequence, catalytic domain, linker and cellulose binding domain) were under the control of the glyceraldehyde-3-phosphate dehydrogenase (gpd) promoter. Transformants were selected on the regeneration medium based on resistance to hygromycin. The selected transformants were cultured in Aspergillus salts medium (pH 6.2, supplemented with the antibiotics penicillin, streptomycin, and hygromycin, and 80 g/L glycerol, 20 g/L soytone, 10 mM uridine, 20 g/L MES) in baffled shake flasks at 30° C., 170 rpm. After five days of incubation, the total secreted protein supernatant was recovered, and then subjected to hollow fiber filtration to concentrate and exchange the sample into acetate buffer (50 mM NaAc, pH 5). CBH I protein represented over 90% of the total protein in these samples. Protein purity was analyzed by SDS-PAGE. Protein concentration was determined by gel densitometry and/or HPLC analysis. All CBH I protein concentrations were normalized before assay and concentrated to 1-2.5 mg/ml.
CBH I Activity Assays 4-Methylumbelliferyl Lactoside (4-MUL) Assay:
This assay measures the activity of CBH I on the fluorogenic substrate 4-MUL (also known as MUL). Assays were run in a Costar 96-well black bottom plate, where reactions were initiated by the addition of 4-MUL to enzyme in buffer (2 mM 4-MUL in 200 mM MES pH 6). Enzymatic rates were monitored by fluorescent readouts over five minutes on a SPECTRAMAX™ plate reader (ex/em 365/450 nm). Data in the linear range were used to calculate initial rates (Vo).
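As a minimal sketch of how an initial rate (Vo) might be derived from the linear portion of a fluorescence time course, the following Python fits a least-squares line to the early readings. The function name, the time points, the fluorescence values, and the choice of how many points count as "linear" are all hypothetical illustrations; they are not data or procedures taken from the disclosure.

```python
def initial_rate(times_s, fluorescence, n_linear):
    """Least-squares slope (fluorescence units per second) over the
    first n_linear readings, which the user judges to be linear."""
    if n_linear < 2:
        raise ValueError("need at least two points to fit a slope")
    t = times_s[:n_linear]
    f = fluorescence[:n_linear]
    t_mean = sum(t) / n_linear
    f_mean = sum(f) / n_linear
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (fi - f_mean) for ti, fi in zip(t, f))
    return sxy / sxx


# Hypothetical read-out: linear for the first four reads, then flattening.
times = [0, 30, 60, 90, 120, 150]
rfu = [100.0, 160.0, 220.0, 280.0, 320.0, 340.0]
v0 = initial_rate(times, rfu, n_linear=4)
```

Restricting the fit to the early points keeps substrate depletion and product inhibition from flattening the estimated slope.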
Phosphoric Acid Swollen Cellulose (PASC) Assay:
This assay measures the activity of CBH I using PASC as the substrate. During the assay, the concentration of PASC is monitored by a fluorescent signal derived from calcofluor binding to PASC (ex/em 365/440 nm). The assay is initiated by mixing enzyme (15 μl) and reaction buffer (85 μl of 0.2% PASC, 200 mM MES, pH 6), and then incubating at 35° C. while shaking at 225 RPM. After 2 hours, one reaction volume of calcofluor stop solution (100 μg/ml in 500 mM glycine pH 10) is added and fluorescence read-outs obtained (ex/em 365/440 nm).
Bagasse Assay:
This assay measures the activity of CBH I on bagasse, a lignocellulosic substrate. Reactions were run in 10 ml vials with 5% dilute acid pretreated bagasse (250 mg solids per 5 ml reaction). Each reaction contained 4 mg CBH I enzyme/g solids, 200 mM MES pH 6, kanamycin, and chloramphenicol. Reactions were incubated at 35° C. in hybridization incubators (Robbins Scientific), rotating at 20 RPM. Time points were taken by transferring a sample of homogenous slurry (150 μl) into a 96-well deep well plate and quenching the reaction with stop buffer (450 μl of 500 mM sodium carbonate, pH 10). Time point measurements were taken every 24 hours for 72 hours.
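The loading arithmetic implied by these conditions can be sketched as follows. This is an illustrative calculation only: the helper names are hypothetical, "5%" is assumed to mean grams of solids per 100 ml of reaction, and the default stock concentration of 1 mg/ml is simply the lower end of the 1-2.5 mg/ml range given for the normalized enzyme preparations.

```python
def bagasse_reaction_setup(volume_ml=5.0, solids_mg=250.0,
                           enzyme_mg_per_g=4.0, stock_mg_per_ml=1.0):
    """Return solids loading and enzyme dose for one bagasse reaction."""
    solids_g = solids_mg / 1000.0
    enzyme_mg = enzyme_mg_per_g * solids_g
    return {
        # Assumed weight/volume percent: g solids per 100 ml reaction.
        "solids_percent_w_v": 100.0 * solids_g / volume_ml,
        "enzyme_mg": enzyme_mg,
        # Volume of enzyme stock needed to deliver that dose.
        "enzyme_stock_ml": enzyme_mg / stock_mg_per_ml,
    }


def quench_dilution_factor(sample_ul=150.0, stop_ul=450.0):
    """Fold dilution of a slurry sample after adding stop buffer."""
    return (sample_ul + stop_ul) / sample_ul


setup = bagasse_reaction_setup()
dilution = quench_dilution_factor()
```

With the stated values, 250 mg of solids in 5 ml corresponds to a 5% loading, the enzyme dose is 1 mg per reaction, and each quenched time point is diluted fourfold, which has to be taken into account when converting measured sugar concentrations back to reaction concentrations.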
Cellobiose Tolerance Assays (or Cellobiose Inhibition Assays):
Tolerance to cellobiose (or inhibition caused by cellobiose) was tested in two ways in the CBH I assays. A direct-dose tolerance method can be applied to all of the CBH I assays (i.e., 4-MUL, PASC, and/or bagasse assays), and entails the exogenous addition of a known amount of cellobiose into assay mixtures. A different indirect method entails the addition of an excess amount of β-glucosidase (BG) to PASC and bagasse assays (typically, 1 mg β-glucosidase/g solids loaded). BG will enzymatically hydrolyze the cellobiose generated during these assays; therefore, CBH I activity in the presence of BG can be taken as a measure of activity in the absence of cellobiose. Furthermore, when activity in the presence and absence of BG are similar, this indicates tolerance to cellobiose. Notably, in cases where BG activity is undesired, but may be present in crude CBH I enzyme preparations, the BG inhibitor gluconolactone can be added into CBH I assays to prevent cellobiose breakdown.
Library Screening Assays The wild type CBH I polypeptide BD29555 was mutagenized to identify variants with improved product tolerance. A small (60-member) library of BD29555 variants was designed to identify variant CBH I polypeptides with reduced product inhibition. This product-release-site library was designed based on residues directly interacting with the cellobiose product in an attempt to identify variants with weakened interactions with cellobiose, from which the product would be released more readily than from the wild type enzyme. The 60-member evolution library contained wild-type residues and mutations at positions R273, W405, and R422 of BD29555 (SEQ ID NO:1), and included the following substitutions: R273 (WT), R273Q, R273K, R273A, W405 (WT), W405Q, W405H, R422 (WT), R422Q, R422K, R422L, and R422E (4 variants at position 273×3 variants at position 405×5 variants at position 422 equals 60 variants in total). All members of the library were screened using the 4-MUL assay in the presence and absence of 250 mg/L cellobiose and using gluconolactone to inhibit any BG activity. The R273A, R273Q, and R273K/R422K variants showed enhanced product tolerance. The R273K/R422K variant showed the greatest activity among the variants and cellobiose tolerance at 250 mg/L. Due to low expression, the R273K variant was not tested for product inhibition.
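The combinatorial design of the library (4 × 3 × 5 = 60 members) can be enumerated with a short Python sketch. The variable and function names are hypothetical, and the wild-type residue at position 273 is taken to be arginine, consistent with the listed R273Q, R273K, and R273A substitutions.

```python
from itertools import product

# Residues allowed at each position; the first entry is the wild-type residue.
POS_273 = ["R", "Q", "K", "A"]
POS_405 = ["W", "Q", "H"]
POS_422 = ["R", "Q", "K", "L", "E"]
WILD_TYPE = {273: "R", 405: "W", 422: "R"}


def variant_name(residues):
    """Name a variant by its substitutions, e.g. 'R273K/R422K'; 'WT' if none."""
    subs = [f"{WILD_TYPE[pos]}{pos}{aa}"
            for pos, aa in residues.items() if aa != WILD_TYPE[pos]]
    return "/".join(subs) if subs else "WT"


library = [variant_name({273: a, 405: b, 422: c})
           for a, b, c in product(POS_273, POS_405, POS_422)]
```

The enumeration includes the unmodified wild type as one of the 60 members, alongside single, double, and triple substitution variants.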
Characterization of Product Tolerant Variants of BD29555 The R273K/R422K substitutions were characterized in both a wild type BD29555 background and also in combination with the substitutions Y274Q, D281K, Y410H, P411G, which were identified in a screen of an expanded product release site evolution library.
The wild type, the R273K/R422K variant and the R273K/Y274Q/D281K/Y410H/P411G/R422K variant were tested for activity on 4-MUL in the presence and absence of 250 mg/L cellobiose, and the R273K/R422K variant was also tested in the bagasse assay in the presence and absence of BG. The results are summarized in Table 5.
The results from these activity assays were converted into the percentage of activity remaining with and without cellobiose present, where values close to 100% indicated cellobiose tolerance. The percent of activity remaining in the MUL assay in the presence of cellobiose versus in the absence of cellobiose shows that the R273K/R422K variant was the most tolerant, followed by the R273K/Y274Q/D281K/Y410H/P411G/R422K variant, and then wild-type, at 95%, 78%, and 25% activity, respectively.
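The conversion to percentage of activity remaining can be sketched in Python as below. The rate values are placeholders in arbitrary units, chosen only so that the ratios reproduce the percentages quoted in the text; they are not measured rates from the disclosure, and the function name is hypothetical.

```python
def percent_activity_remaining(rate_with_cellobiose, rate_without_cellobiose):
    """Activity in the presence of cellobiose as a percentage of the
    activity in its absence; values near 100 indicate tolerance."""
    if rate_without_cellobiose <= 0:
        raise ValueError("uninhibited rate must be positive")
    return 100.0 * rate_with_cellobiose / rate_without_cellobiose


# Placeholder (with, without) rates in arbitrary units, not measured data.
rates = {
    "wild type": (2.5, 10.0),
    "R273K/R422K": (9.5, 10.0),
    "R273K/Y274Q/D281K/Y410H/P411G/R422K": (7.8, 10.0),
}
remaining = {name: percent_activity_remaining(w, wo)
             for name, (w, wo) in rates.items()}
# Most tolerant first.
ranking = sorted(remaining, key=remaining.get, reverse=True)
```

The same ratio applies to the bagasse assay if the rate without BG (cellobiose accumulating) is divided by the rate with BG (cellobiose removed).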
Cellobiose dose response curves of the wild-type and R273K/R422K variant of BD29555 were obtained during the 4-MUL assay. Enzyme rates (Vo) were measured in the presence of different concentrations of cellobiose (200 mM MES pH 6, 25° C.). Rates were measured in quadruplicate. The results are shown in FIG. 1A-1B. FIG. 1A shows that wild type BD29555 is inhibited by cellobiose, with a half maximal inhibitory concentration (IC50 value) of 60 mg/L. FIG. 1B shows that the R273K/R422K variant is tolerant to cellobiose up to 250 mg/L.
The bagasse assay results shown in Table 5, which lists the percentage of activity remaining in the absence vs. presence of BG, also demonstrate that the percentage activity of the wild type BD29555 is lower than the percentage activity of the R273K/R422K variant, indicating that the R273K/R422K variant is less sensitive to the presence of cellobiose than the wild type. FIG. 2A-2B shows bar graph data for the bagasse assay of BD29555 vs. the R273K/R422K variant. In FIG. 2A, bars represent relative activity, which has been normalized to wild type activity in the absence of cellobiose (WT+BG=uninhibited activity=1). In FIG. 2B, bars indicate tolerance to cellobiose, as represented by the ratio of activity in the presence of cellobiose (−BG) to that of activity in the absence of cellobiose (+BG); ratios close to 1 indicate greater tolerance to cellobiose. These data again demonstrate that the R273K/R422K variant of BD29555 is more tolerant to cellobiose than the wild type BD29555.
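To illustrate what an IC50 of 60 mg/L implies, the sketch below assumes a simple hyperbolic dose-response, v/v0 = 1/(1 + [I]/IC50). This model is an assumption made here for illustration; the disclosure does not state which curve was fitted to the data in FIG. 1A, and the function name is hypothetical.

```python
def fraction_activity(inhibitor_mg_per_l, ic50_mg_per_l):
    """Fraction of uninhibited rate remaining under an assumed
    hyperbolic model: v/v0 = 1 / (1 + [I]/IC50)."""
    if ic50_mg_per_l <= 0 or inhibitor_mg_per_l < 0:
        raise ValueError("IC50 must be positive and dose non-negative")
    return 1.0 / (1.0 + inhibitor_mg_per_l / ic50_mg_per_l)


# Wild type IC50 from the text (60 mg/L), evaluated at selected doses.
at_ic50 = fraction_activity(60.0, 60.0)      # half of uninhibited rate
at_250 = fraction_activity(250.0, 60.0)      # about 0.19 under this model
```

Under this assumed model the wild type would retain roughly a fifth of its activity at 250 mg/L cellobiose, whereas a tolerant variant would correspond to an IC50 well above the tested concentration range.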
The wild type and R273K/R422K variant were also characterized in the PASC assay. Results are shown in FIG. 3. The activities of both wild type BD29555 (SEQ ID NO:1) and wild type T. reesei CBH I (SEQ ID NO:2) were inhibited by cellobiose concentrations starting around 1 g/L (with IC50 values of 2.2 and 3 g/L, respectively), whereas the R273K/R422K variant showed little inhibition in the presence of 10 g/L cellobiose.
Characterization of Product Tolerant Variants of T. reesei CBH I
Cellobiose product tolerant substitutions were introduced into T. reesei CBH I (SEQ ID NO:2). A panel of variants with single and double alanine and lysine substitutions at R268 and R411 was expressed and analyzed. The variants were tested for activity on 4-MUL in the presence and absence of 250 mg/L cellobiose and also in the bagasse assay in the absence and presence of BG. The results from these assays were converted into the percentage activity remaining in the presence and absence of cellobiose and BG, respectively. Values are summarized in Table 6.
The 4-MUL assay results shown in Table 6 demonstrate that the activity of the wild type T. reesei CBH I was reduced to 23% in the presence of cellobiose, whereas the double mutants at R268 and R411 retained more than 90% of their activity under the same conditions.
The bagasse assay results shown in Table 6 demonstrate that the activity of the wild type T. reesei CBH I is more significantly impacted by the presence of BG than is the activity of the single or double substitution variants, indicating that the variants are less sensitive to the accumulation of cellobiose than the wild type. FIGS. 4 and 5 show bar graph data for the bagasse assay of wild type T. reesei CBH I vs. the variants. In FIG. 4, bars represent relative activity, normalized to wild type activity in the absence of cellobiose (WT+BG=1). In FIG. 5, bars represent tolerance to cellobiose, as represented by the ratio of activity in the presence of accumulating cellobiose (−BG) to that of activity in the absence of cellobiose (+BG); ratios close to 1 indicate greater tolerance to cellobiose.
Specific Embodiments and Incorporation by Reference All publications, patents, patent applications and other documents cited in this application are hereby incorporated by reference in their entireties for all purposes to the same extent as if each individual publication, patent, patent application or other document were individually indicated to be incorporated by reference for all purposes.
While various specific embodiments have been illustrated and described, it will be appreciated that various changes can be made without departing from the spirit and scope of the invention(s).
TABLE 1
Sequence Identifier (SEQ ID NO:) | Database Accession Number | Species of Origin | Amino Acid Sequence
BD29555* Unknown MSALNSFNMY KSALILGSLL ATAGAQQIGT YTAETHPSLS WSTCKSGGSC TTNSGAITLD ANWRWVHGVN
TSTNCYTGNT WNTAICDTDA SCAQDCALDG ADYSGTYGIT TSGNSLRLNF VTGSNVGSRT YLMADNTHYQ
IFDLLNQEFT FTVDVSHLPC GLNGALYFVT MDADGGVSKY PNNKAGAQYG VGYCDSQCPR DLKFIAGQAN
VEGWTPSSNN ANTGLGNHGA CCAELDIWEA NSISEALTPH PCDTPGLSVC TTDACGGTYS SDRYAGTCDP
DGCDFNPYRL GVTDFYGSGK TVDTTKPITV VTQFVTDDGT STGTLSEIRR YYVQNGVVIP QPSSKISGVS
GNVINSDFCD AEISTFGETA SFSKHGGLAK MGAGMEAGMV LVMSLWDDYS VNMLWLDSTY PTNATGTPGA
ARGSCPTTSG DPKTVESQSG SSYVTFSDIR VGPFNSTFSG GSSTGGSSTT TASGTTTTKA SSTSTSSTST
GTGVAAHWGQ CGGQGWTGPT TCASGTTCTV VNPYYSQCL
340514556 Trichoderma reesei MYRKLAVISA FLATARAQSA CTLQSETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG
NTWSSTLCPD NETCAKNCCL DGAAYASTYG VTTSGNSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL
GNEFSFDVDV SQLPCGLNGA LYFVSMDADG GVSKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE
PSSNNANTGI GGHGSCCSEM DIWEANSISE ALTPHPCTTV GQEICEGDGC GGTYSDNRYG GTCDPDGCDW
NPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY YVQNGVTFQQ PNAELGSYSG NELNDDYCTA
EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT NETSSTPGAV RGSCSTSSGV
PAQVESQSPN AKVTFSNIKF GPIGSTGNPS GGNPPGGNPP GTTTTRRPAT TTGSSPGPTQ SHYGQCGGIG
YSGPTVCASG TTCQVLNPYY SQCL
51243029 Penicillium occitanis MSALNSFNMY KSALILGSLL ATAGAQQIGT YTAETHPSLS WSTCKSGGSC TTNSGAITLD ANWRWVHGVN
TSTNCYTGNT WNSAICDTDA SCAQDCALDG ADYSGTYGIT TSGNSLRLNF VTGSNVGSRT YLMADNTHYQ
IFDLLNQEFT FTVDVSHLPC GLNGALYFVT MDADGGVSKY PNNKAGAQYG VGYCDSQCPR DLKFIAGQAN
VEGWTPSANN ANTGIGNHGA CCAELDIWEA NSISEALTPH PCDTPGLSVC TTDACGGTYS SDRYAGTCDP
DGCDFNPYRL GVTDFYGSGK TVDTTKPFTV VTQFVTNDGT STGSLSEIRR YYVQNGVVIP QPSSKISGIS
GNVINSDYCA AEISTFGGTA SFNKHGGLTN MAAGMEAGMV LVMSLWDDYA VNMLWLDSTY PTNATGTPGA
ARGTCATTSG DPKTVESQSG SSYVTFSDIR VGPFNSTFSG GSSTGGSTTT TASRTTTTSA SSTSTSSTST
GTGVAGHWGQ CGGQGWTGPT TCVSGTTCTV VNPYYSQCL
7cel (PDB) & Trichoderma reesei ESACTLQSET HPPLTWQKCS SGGTCTQQTG SVVIDANWRW THATNSSTNC YDGNTWSSTL CPDNETCAKN
CCLDGAAYAS TYGVTTSGNS LSIDFVTQSA QKNVGARLYL MASDTTYQEF TLLGNEFSFD VDVSQLPCGL
NGALYFVSMD ADGGVSKYPT NTAGAKYGTG YCDSQCPRDL KFINGQANVE GWEPSSNNAN TGIGGHGSCC
SEMDIWQANS ISEALTPHPC TTVGQEICEG DGCGGTYSDN RYGGTCDPDG CDWNPYRLGN TSFYGPGSSF
TLDTTKKLTV VTQFETSGAI NRYYVQNGVT FQQPNAELGS YSGNELNDDY CTAEEAEFGG SSFSDKGGLT
QFKKATSGGM VLVMSLWDDY YANMLWLDST YPTNETSSTP GAVRGSCSTS SGVPAQVESQ SPNAKVTFSN
IKFGPIGSTG NPSG
67516425 Aspergillus nidulans MASSFQLYKA LLFFSSLLSA VQAQKVGTQQ AEVHPGLTWQ TCTSSGSCTT VNGEVTIDAN WRWLHTVNGY
FGSC A4 TNCYTGNEWD TSICTSNEVC AEQCAVDGAN YASTYGITTS GSSLRLNFVT QSQQKNIGSR VYLMDDEDTY
TMFYLLNKEF TFDVDVSELP CGLNGAVYFV SMDADGGKSR YATNEAGAKY GTGYCDSQCP RDLKFINGVA
NVEGWESSDT NPNGGVGNHG SCCAEMDIWE ANSISTAFTP HPCDTPGQTL CTGDSCGGTY SNDRYGGTCD
PDGCDFNSYR QGNKTFYGPG LTVDTNSPVT VVTQFLTDDN TDTGTLSEIK RFYVQNGVVI PNSESTYPAN
PGNSITTEFC ESQKELFGDV DVFSAHGGMA GMGAALEQGM VLVLSLWDDN YSNMLWLDSN YPTDADPTQP
GIARGTCPTD SGVPSEVEAQ YPNAYVVYSN IKFGPIGSTF GNGGGSGPTT TVTTSTATST TSSATSTATG
QAQHWEQCGG NGWTGPTVCA SPWACTVVNS WYSQCL
46107376 Gibberella zeae PH-1 MYRAIATASA LIAAVRAQQV CSLTQESKPS LNWSKCTSSG CSNVKGSVTI DANWRWTHQV SGSTNCYTGN
KWDTSVCTSG KVCAEKCCLD GADYASTYGI TSSGDQLSLS FVTKGPYSTN IGSRTYLMED ENTYQMFQLL
GNEFTFDVDV SNIGCGLNGA LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWQ
PSDSDVNGGI GNLGTCCPEM DIWEANSIST AYTPHPCTKL TQHSCTGDSC GGTYSNDRYG GTCDADGCDF
NSYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS EITRLYVQNG KVIANSESKI AGVPGNSLTA
DFCTKQKKVF NDPDDFTKKG AWSGMSDALE APMVLVMSLW HDHHSNMLWL DSTYPTDSTK LGSQRGSCST
SSGVPADLEK NVPNSKVAFS NIKFGPIGST YKSDGTTPTN PTNPSEPSNT ANPNPGTVDQ WGQCGGSNYS
GPTACKSGFT CKKINDFYSQ CQ
70992391 Aspergillus fumigatus MLASTFSYRM YKTALILAAL LGSGQAQQVG TSQAEVHPSM TWQSCTAGGS CTTNNGKVVI DANWRWVHKV
Af293 GDYTNCYTGN TWDTTICPDD ATCASNCALE GANYESTYGV TASGNSLRLN FVTTSQQKNI GSRLYMMKDD
STYEMFKLLN QEFTFDVDVS NLPCGLNGAL YFVAMDADGG MSKYPTNKAG AKYGTGYCDS QCPRDLKFIN
GQANVEGWQP SSNDANAGTG NHGSCCAEMD IWEANSISTA FTPHPCDTPG QVMCTGDACG GTYSSDRYGG
TCDPDGCDFN SFRQGNKTFY GPGMTVDTKS KFTVVTQFIT DDGTSSGTLK EIKRFYVQNG KVIPNSESTW
TGVSGNSITT EYCTAQKSLF QDQNVFEKHG GLEGMGAALA QGMVLVMSLW DDHSANMLWL DSNYPTTASS
TTPGVARGTC DISSGVPADV EANHPDAYVV YSNIKVGPIG STFNSGGSNP GGGTTTTTTT QPTTTTTTAG
NPGGTGVAQH YGQCGGIGWT GPTTCASPYT CQKLNDYYSQ CL
121699984 Aspergillus clavatus MLPSTISYRI YKNALFFAAL FGAVQAQKVG TSKAEVHPSM AWQTCAADGT CTTKNGKVVI DANWRWVHDV
NRRL 1 KGYTNCYTGN TWNAELCPDN ESCAENCALE GADYAATYGA TTSGNALSLK FVTQSQQKNI GSRLYMMKDD
NTYETFKLLN QEFTFDVDVS NLPCGLNGAL YFVSMDADGG LSRYTGNEAG AKYGTGYCDS QCPRDLKFIN
GLANVEGWTP SSSDANAGNG GHGSCCAEMD IWEANSISTA YTPHPCDTPG QAMCNGDSCG GTYSSDRYGG
TCDPDGCDFN SYRQGNKSFY GPGMTVDTKK KMTVVTQFLT NDGTATGTLS EIKRFYVQDG KVIANSESTW
PNLGGNSLTN DFCKAQKTVF GDMDTFSKHG GMEGMGAALA EGMVLVMSLW DDHNSNMLWL DSNSPTTGTS
TTPGVARGSC DISSGDPKDL EANHPDASVV YSNIKVGPIG STFNSGGSNP GGSTTTTKPA TSTTTTKATT
TATTNTTGPT GTGVAQPWAQ CGGIGYSGPT QCAAPYTCTK QNDYYSQCL
1906845 Claviceps purpurea MHPSLQTILL SALFTTAHAQ QACSSKPETH PPLSWSRCSR SGCRSVQGAV TVDANWLWTT VDGSQNCYTG
NRWDTSICSS EKTCSESCCI DGADYAGTYG VTTTGDALSL KFVQQGPYSK NVGSRLYLMK DESRYEMFTL
LGNEFTFDVD VSKLGCGLNG ALYFVSMDED GGMKRFPMNK AGAKFGTGYC DSQCPRDVKF INGMANSKDW
IPSKSDANAG IGSLGACCRE MDIWEANNIA SAFTPHPCKN SAYHSCTGDG CGGTYSKNRY SGDCDPDGCD
FNSYRLGNTT FYGPGPKFTI DTTRKISVVT QFLKGRDGSL REIKRFYVQN GKVIPNSVSR VRGVPGNSIT
QGFCNAQKKM FGAHESFNAK GGMKGMSAAV SKPMVLVMSL WDDHNSNMLW LDSTYPTNSR QRGSKRGSCP
ASSGRPTDVE SSAPDSTVVF SNIKFGPIGS TFSRGK
1gpi (PDB) & Phanerochaete ESACTLQSET HPPLTWQKCS SGGTCTQQTG SVVIDANWRW THATNSSTNC YDGNTWSSTL CPDNETCAKN
chrysosporium CCLDGAAYAS TYGVTTSGNS LSIDFVTQSA QKNVGARLYL MASDTTYQEF TLLGNEFSFD VDVSQLPCGL
NGALYFVSMD ADGGVSKYPT NTAGAKYGTG YCDSQCPRDL KFINGQANVE GWEPSSNNAN TGIGGHGSCC
SEMDIWQANS ISEALTPHPC TTVGQEICEG DGCGGTYSDN RYGGTCDPDG CDWNPYRLGN TSFYGPGSSF
TLDTTKKLTV VTQFETSGAI NRYYVQNGVT FQQPNAELGS YSGNELNDDY CTAEEAEFGG SSFSDKGGLT
QFKKATSGGM VLVMSLWDDY YANMLWLDST YPTNETSSTP GAVRGSCSTS SGVPAQVESQ SPNAKVTFSN
IKFGPIGSTG NPSG
119468034 Neosartorya fischeri MHQRALLFSA LAVAANAQQV GTQKPETHPP LTWQKCTAAG SCSQQSGSVV IDANWRWLHS TKDTTNCYTG
NRRL 181 NTWNTELCPD NESCAQNCAV DGADYAGTYG VTTSGSELKL SFVTGANVGS RLYLMQDDET YQHFNLLNNE
FTFDVDVSNL PCGLNGALYF VAMDADGGMS KYPSNKAGAK YGTGYCDSQC PRDLKFINGM ANVEGWKPSS
NDKNAGVGGH GSCCPEMDIW EANSISTAVT PHPCDDVSQT MCSGDACGGT YSATRYAGTC DPDGCDFNPF
RMGNESFYGP GKIVDTKSEM TVVTQFITAD GTDTGALSEI KRLYVQNGKV IANSVSNVAD VSGNSISSDF
CTAQKKAFGD EDIFAKHGGL SGMGKALSEM VLIMSIWDDH HSSMMWLDST YPTDADPSKP GVARGTCEHG
AGDPEKVESQ HPDASVTFSN IKFGPIGSTY KA
7804883 Leptosphaeria MYRSLIFATS LLSLAKGQLV GNLYCKGSCT AKNGKVVIDA NWRWLHVKGG YTNCYTGNEW NATACPDNKS
maculans CATNCAIDGA DYRRLRHYCE RQLLGTEVHH QGLYSTNIGS RTYLMQDDST YQLFKFTGSQ EFTFDVDLSN
LPCGLNGALY FVSMDADGGL KKYPTNKAGA KYGTGYCDAQ CPRDLKFING EGNVEGWQPS KNDQNAGVGG
HGSCCAEMDI WEANSVSTAV TPHSCSTIEQ SRCDGDGCGG TYSADRYAGV CDPDGCDFNS YRMGVKDFYG
KGKTVDTSKK FTVVTQFIGS GDAMEIKRFY VQNGKTIPQP DSTIPGVTGN SITTFFCDAQ KKAFGDKYTF
KDKGGMANMP STCNGMVLVM SLWDDHYSNM LWLDSTYPTD KNPDTDAGSG RGECAITSGV PADVESQHPD
ASVIYSNIKF GPINTTFG
85108032 Neurospora crassa MLAKFAALAA LVASANAQAV CSLTAETHPS LNWSKCTSSG CTNVAGSITV DANWRWTHIT SGSTNCYSGN
N150 (OR74A) EWDTSLCSTN TDCATKCCVD GAEYSSTYGI QTSGNSLSLQ FVTKGSYSTN IGSRTYLMNG ADAYQGFELL
GNEFTFDVDV SGTGCGLNGA LYFVSMDLDG GKAKYTNNKA GAKYGTGYCD AQCPRDLKYI NGIANVEGWT
PSTNDANAGI GDHGTCCSEM DIWEANKVST AFTPHPCTTI EQHMCEGDSC GGTYSDDRYG GTCDADGCDF
NSYRMGNTTF YGEGKTVDTS SKFTVVTQFI KDSAGDLAEI KRFYVQNGKV IENSQSNVDG VSGNSITQSF
CNAQKTAFGD IDDFNKKGGL KQMGKALAKP MVLVMSIWDD HAANMLWLDS TYPVEGGPGA YRGECPTTSG
VPAEVEANAP NSKVIFSNIK FGPIGSTFSG GSSGTPPSNP SSSVKPVTST AKPSSTSTAS NPSGTGAAHW
AQCGGIGFSG PTTCQSPYTC QKINDYYSQC V
169859458 Coprinopsis cinerea MFKKVALTAL CFLAVAQAQQ VGREVAENHP RLPWQRCTRN GGCQTVSNGQ VVLDANWRWL HVTDGYTNCY
okayama TGNSWNSTVC SDPTTCAQRC ALEGANYQQT YGITTNGDAL TIKFLTRSQQ TNVGARVYLM ENENRYQMFN
LLNKEFTFDV DVSKVPCGIN GALYFIQMDA DGGMSKQPNN RAGAKYGTGY CDSQCPRDIK FIDGVANSAD
WTPSETDPNA GRGRYGICCA EMDIWEANSI SNAYTPHPCR TQNDGGYQRC EGRDCNQPRY EGLCDPDGCD
YNPFRMGNKD FYGPGKTVDT NRKMTVVTQF ITHDNTDTGT LVDIRRLYVQ DGRVIANPPT NFPGLMPAHD
SITEQFCTDQ KNLFGDYSSF ARDGGLAHMG RSLAKGHVLA LSIWNDHGAH MLWLDSNYPT DADPNKPGIA
RGTCPTTGGT PRETEQNHPD AQVIFSNIKF GDIGSTFSGY
154292161 Botryotinia fuckeliana MYSAAVLATF SFLLGAGAQQ VGTSTAETHP ALTVQKCAAG GTCTDESDSI VLDANWRWLH STSGSTNCYT
B05-10 GNTWDTTLCP DAATCTTNCA LDGADYEGTY GITTSGDSLK LSFVTGSNVG SRTYLMDSET TYKEFALLGN
EFTFTVDVSK LPCGLNGALY FVPMDADGGM SKYPTNKAGA KYGTGYCDAQ CPQDMKFVNG TANVEGWVPD
SNSANSGTGN IGSCCSEFDV WEANSMSQAL TPHVCTVDSQ TACTGDDCAS NTGVCDGDGC DFNPYRMGNT
TFYGSGMTID TSKPFSVVTQ FITDDGTETG TLTEIKRFYV QDDVVYEQPS SDISGVSGNS ITDDFCAAQK
TAFGDTDYFT QNGGMAAMGK KMADGMVLVL SIWDDYNVNM LWLDSDYPTT KDASTPGVSR GSCATDSGVP
ATVEAASGSA YVTFSSIKYG PIGSTFNAPA DSSSSVSASS SPAPIASSSS SASIAPVSSV VAAIVSSSAQ
AISSAAPVVS SSAQAISSAA PVVSSVVSSA APVATSSTKS KCSKVSSTLK TSVAAPATSA TSAAVVATSS
AASSTGSVPL YGNCTGGKTC SEGTCVVQND YYSQCVASS
169615761 # Phaeosphaeria MTWQRCTGTG GSSCTNVNGE IVIDANWRWI HATGGYTNCF DGNEWNKTAC PSNAACTKNC AIEGSDYRGT
nodorum SN15 YGITTSGNSL TLKFITKGQY STNVGSRTYL MKDTNNYEMF NLIGNEFTFD VDLSQLPCGL NGALYFVSMP
EKGQGTPGAK YGTGKLSQCS VHISKTLTDA CARDLKFVGG EANADGWQAS TSDPNAGVGK KGACCAEMDV
WEANSMSTAL TPHSCQPEGY AVCEESNCGG TYSLDRYAGT CDANGCDFNP YRVGNKDFYG KGKTVDTSKK
MTVVTQFLGT GSDLTELKRF YVQDGKVISN PEPTIPGMTG NSITQKWCDT QKEVFKEEVY PFNQWGGMAS
MGKGMAQGMV LVMSLWDDHY SNMLWLDSTY PTDRDPESPG AARGECAITS GAPAEVEANN PDASVMFSNI
KFGPIGSTFQ QPA
4883502 Humicola grisea MQIKSYIQYL AAALPLLSSV AAQQAGTITA ENHPRMTWKR CSGPGNCQTV QGEVVIDANW RWLHNNGQNC
YEGNKWTSQC SSATDCAQRC ALDGANYQST YGASTSGDSL TLKFVTKHEY GTNIGSRFYL MANQNKYQMF
TLMNNEFAFD VDLSKVECGI NSALYFVAME EDGGMASYPS NRAGAKYGTG YCDAQCARDL KFIGGKANIE
GWRPSTNDPN AGVGPMGACC AEIDVWESNA YAYAFTPHAC GSKNRYHICE TNNCGGTYSD DRFAGYCDAN
GCDYNPYRMG NKDFYGKGKT VDTNRKFTVV SRFERNRLSQ FFVQDGRKIE VPPPTWPGLP NSADITPELC
DAQFRVFDDR NRFAETGGFD ALNEALTIPM VLVMSIWDDH HSNMLWLDSS YPPEKAGLPG GDRGPCPTTS
GVPAEVEAQY PNAQVVWSNI RFGPIGSTVN V
950686 Humicola grisea MRTAKFATLA ALVASAAAQQ ACSLTTERHP SLSWKKCTAG GQCQTVQASI TLDSNWRWTH QVSGSTNCYT
GNKWDTSICT DAKSCAQNCC VDGADYTSTY GITTNGDSLS LKFVTKGQYS TNVGSRTYLM DGEDKYQTFE
LLGNEFTFDV DVSNIGCGLN GALYFVSMDA DGGLSRYPGN KAGAKYGTGY CDAQCPRDIK FINGEANIEG
WTGSTNDPNA GAGRYGTCCS EMDIWEANNM ATAFTPHPCT IIGQSRCEGD SCGGTYSNER YAGVCDPDGC
DFNSYRQGNK TFYGKGMTVD TTKKITVVTQ FLKDANGDLG EIKRFYVQDG KIIPNSESTI PGVEGNSITQ
DWCDRQKVAF GDIDDFNRKG GMKQMGKALA GPMVLVMSIW DDHASNMLWL DSTFPVDAAG KPGAERGACP
TTSGVPAEVE AEAPNSNVVF SNIRFGPIGS TVAGLPGAGN GGNNGGNPPP PTTTTSSAPA TTTTASAGPK
AGRWQQCGGI GFTGPTQCEE PYTCTKLNDW YSQCL
124491660 Chaetomium MQIKQYLQYL AAALPLVNMA AAQRAGTQQT ETHPRLSWKR CSSGGNCQTV NAEIVIDANW RWLHDSNYQN
thermophilum CYDGNRWTSA CSSATDCAQK CYLEGANYGS TYGVSTSGDA LTLKFVTKHE YGTNIGSRVY LMNGSDKYQM
FTLMNNEFAF DVDLSKVECG LNSALYFVAM EEDGGMRSYS SNKAGAKYGT GYCDAQCARD LKFVGGKANI
EGWRPSTNDA NAGVGPYGAC CAEIDVWESN AYAFAFTPHG CLNNNYHVCE TSNCGGTYSE DRFGGLCDAN
GCDYNPYRMG NKDFYGKGKT VDTSRKFTVV TRFEENKLTQ FFIQDGRKID IPPPTWPGLP NSSAITPELC
TNLSKVFDDR DRYEETGGFR TINEALRIPM VLVMSIWDGH YANMLWLDSV YPPEKAGQPG AERGPCAPTS
GVPAEVEAQF PNAQVIWSNI RFGPIGSTYQ V
58045187 Chaetomium MMYKKFAALA ALVAGAAAQQ ACSLTTETHP RLTWKRCTSG GNCSTVNGAV TIDANWRWTH TVSGSTNCYT
thermophilum GNEWDTSICS DGKSCAQTCC VDGADYSSTY GITTSGDSLN LKFVTKHQHG TNVGSRVYLM ENDTKYQMFE
LLGNEFTFDV DVSNLGCGLN GALYFVSMDA DGGMSKYSGN KAGAKYGTGY CDAQCPRDLK FINGEANIEN
WTPSTNDANA GFGRYGSCCS EMDIWDANNM ATAFTPHPCT IIGQSRCEGN SCGGTYSSER YAGVCDPDGC
DFNAYRQGDK TFYGKGMTVD TTKKMTVVTQ FHKNSAGVLS EIKRFYVQDG KIIANAESKI PGNPGNSITQ
EWCDAQKVAF GDIDDFNRKG GMAQMSKALE GPMVLVMSVW DDHYANMLWL DSTYPIDKAG TPGAERGACP
TTSGVPAEIE AQVPNSNVIF SNIRFGPIGS TVPGLDGSTP SNPTATVAPP TSTTTSVRSS TTQISTPTSQ
PGGCTTQKWG QCGGIGYTGC TNCVAGTTCT ELNPWYSQCL
169601100 # Phaeosphaeria MYRNFLYAAS LLSVARSQLV GTQTTETHPG MTWQSCTAKG SCTTCSDNKA CASNCAVDGA DYKGTYGITA
nodorum SN15 SGNSLQLKFI TKGSYSTNIG SRTYLMASDT AYQMFKFDGN KEFTFDVDLS GLPCGFNGAL YFVSMDEDGG
LKKYSGNKAG AKYGTGYCDA QCPRDLKFIN GEGNVEGWKP SDNDANAGVG GHGSCCAEMD IWEANSISTA
VTPHACSTIE QTRCDGDGCG GTYSADRYAG VCDPDGCDFN AYRMGVKNFY GKGMTVDTSK KFTVVTQFIG
TGDAMEIKRF YVQGGKTIEQ PASTIPGVEG NSITTKFCDQ QKQVFGDRYT YKEKGGTANM AKALAQGMVL
VMSLWDDHYS NMLWLDSTYP TDKNPDTDLG SGRGSCDVKS GAPADVESKS PDATVIYSNI KFGPLNSTY
169870197 Coprinopsis cinerea MLGKIAIASL SFLAIAKGQQ VGREVAENHP RLPWQRCTRN GGCQTVSNGQ VVLDANWRWL HVTDGYTNCY
okayama TGNSWNSSVC SDGTTCAQRC ALEGANYQQT YGITTSGNSL TMKFLTRSQG TNVGGRVYLM ENENRYQMFN
LLNKEFTFDV DVSKVPCGIN GALYFIQMDA DGGMSSQPNN RAGAKYGTGY CDSQCPRDIK FIDGVANSVG
WEPSETDSNA GRGRYGICCA EMDIWEANSI SNAYTPHPCR TQNDGGYQRC EGRDCNQPRY EGLCDPDGCD
YNPFRMGNKD FYGPGKTIDT NRKMTVVTQF ITHDNTDTGT LVDIRRLYVQ DGRVIANPPT NFPGLMPAHD
SITEQFCTDQ KNLFGDYSSF ARDGGLAHMG RSLAKGHVLA LSIWNDHGAH MLWLDSNYPT DADPNKPGIA
RGTCPTTGGT PRETEQNHPD AQVIFSNIKF GDIGSTFSGY
3913806 Agaricus bisporus MFPRSILLAL SLTAVALGQQ VGTNMAENHP SLTWQRCTSS GCQNVNGKVT LDANWRWTHR INDFTNCYTG
NEWDTSICPD GVTCAENCAL DGADYAGTYG VTSSGTALTL KFVTESQQKN IGSRLYLMAD DSNYEIFNLL
NKEFTFDVDV SKLPCGLNGA LYFSEMAADG GMSSTNTAGA KYGTGYCDSQ CPRDIKFIDG EANSEGWEGS
PNDVNAGTGN FGACCGEMDI WEANSISSAY TPHPCREPGL QRCEGNTCSV NDRYATECDP DGCDFNSFRM
GDKSFYGPGM TVDTNQPITV VTQFITDNGS DNGNLQEIRR IYVQNGQVIQ NSNVNIPGID SGNSISAEFC
DQAKEAFGDE RSFQDRGGLS GMGSALDRGM VLVLSIWDDH AVNMLWLDSD YPLDASPSQP GISRGTCSRD
SGKPEDVEAN AGGVQVVYSN IKFGDINSTF NNNGGGGGNP SPTTTRPNSP AQTMWGQCGG QGWTGPTACQ
SPSTCHVIND FYSQCF
169611094 Phaeosphaeria MYRNLALASL SLFGAARAQQ AGTVTTETHP SLSWKTCTGT GGTSCTTKAG KITLDANWRW THVTTGYTNC
nodorum SN15 YDGNSWNTTA CPDGATCTKN CAVDGADYSG TYGITTSSNS LSIKFVTKGS NSANIGSRTY LMESDTKYQM
FNLIGQEFTF DVDVSKLPCG LNGALYFVEM AADGGIGKGN NKAGAKYGTG YCDSQCPHDI KFINGKANVE
GWNPSDADPN AGSGKIGACC PEMDIWEANS ISTAYTPHPC KGTGLQECTD DVSCGDGSNR YSGLCDKDGC
DFNSYRMGVK DFYGPGATLD TTKKMTVVTQ FLGSGSTLSE IKRFYVQNGK VFKNSDSAIE GVTGNSITES
FCAAQKTAFG DTNSFKTLGG LNEMGASLAR GHVLVMSLWD DHAVNMLWLD STYPTNSTKL GAQRGTCAID
SGKPEDVEKN HPDATVVFSD IKFGPIGSTF QQPS
3131 Phanerochaete MVDIQIATFL LLGVVGVAAQ QVGTYIPENH PLLATQSCTA SGGCTTSSSK IVLDANRRWI HSTLGTTSCL
chrysosporium TANGWDPTLC PDGITCANYC ALDGVSYSST YGITTSGSAL RLQFVTGTNI GSRVFLMADD THYRTFQLLN
QELAFDVDVS KLPCGLNGAL YFVAMDADGG KSKYPGNRAG AKYGTGYCDS QCPRDVQFIN GQANVQGWNA
TSATTGTGSY GSCCTELDIW EANSNAAALT PHTCTNNAQT RCSGSNCTSN TGFCDADGCD FNSFRLGNTT
FLGAGMSVDT TKTFTVVTQF ITSDNTSTGN LTEIRRFYVQ NGNVIPNSVV NVTGIGAVNS ITDPFCSQQK
KAFIETNYFA QHGGLAQLGQ ALRTGMVLAF SISDDPANHM LWLDSNFPPS ANPAVPGVAR GMCSITSGNP
ADVGILNPSP YVSFLNIKFG SIGTTFRPA
70991503 Aspergillus fumigatus MHQRALLFSA LAVAANAQQV GTQTPETHPP LTWQKCTAAG SCSQQSGSVV IDANWRWLHS TKDTTNCYTG
Af293 NTWNTELCPD NESCAQNCAL DGADYAGTYG VTTSGSELKL SFVTGANVGS RLYLMQDDET YQHFNLLNHE
FTFDVDVSNL PCGLNGALYF VAMDADGGMS KYPSNKAGAK YGTGYCDSQC PRDLKFINGM ANVEGWEPSS
SDKNAGVGGH GSCCPEMDIW EANSISTAVT PHPCDDVSQT MCSGDACGGT YSESRYAGTC DPDGCDFNPF
RMGNESFYGP GKIVDTKSKM TVVTQFITAD GTDSGALSEI KRLYVQNGKV IANSVSNVAG VSGNSITSDF
CTAQKKAFGD EDIFAKHGGL SGMGKALSEM VLIMSIWDDH HSSMMWLDST YPTDADPSKP GVARGTCEHG
AGDPENVESQ HPDASVTFSN IKFGPIGSTY EG
294196 Phanerochaete MFRTATLLAF TMAAMVFGQQ VGTNTAENHR TLTSQKCTKS GGCSNLNTKI VLDANWRWLH STSGYTNCYT
chrysosporium GNQWDATLCP DGKTCAANCA LDGADYTGTY GITASGSSLK LQFVTGSNVG SRVYLMADDT HYQMFQLLNQ
EFTFDVDMSN LPCGLNGALY LSAMDADGGM AKYPTNKAGA KYGTGYCDSQ CPRDIKFING EANVEGWNAT
SANAGTGNYG TCCTEMDIWE ANNDAAAYTP HPCTTNAQTR CSGSDCTRDT GLCDADGCDF NSFRMGDQTF
LGKGLTVDTS KPFTVVTQFI TNDGTSAGTL TEIRRLYVQN GKVIQNSSVK IPGIDPVNSI TDNFCSQQKT
AFGDTNYFAQ HGGLKQVGEA LRTGMVLALS IWDDYAANML WLDSNYPTNK DPSTPGVARG TCATTSGVPA
QIEAQSPNAY VVFSNIKFGD LNTTYTGTVS SSSVSSSHSS TSTSSSHSSS STPPTQPTGV TVPQWGQCGG
IGYTGSTTCA SPYTCHVLNP YYSQCY
18997123 Thermoascus MYQRALLFSF FLAAARAHEA GTVTAENHPS LTWQQCSSGG SCTTQNGKVV IDANWRWVHT TSGYTNCYTG
aurantiacus NTWDTSICPD DVTCAQNCAL DGADYSGTYG VTTSGNALRL NFVTQSSGKN IGSRLYLLQD DTTYQIFKLL
GQEFTFDVDV SNLPCGLNGA LYFVAMDADG NLSKYPGNKA GAKYGTGYCD SQCPRDLKFI NGQANVEGWQ
PSANDPNAGV GNHGSSCAEM DVWEANSIST AVTPHPCDTP GQTMCQGDDC GGTYSSTRYA GTCDPDGCDF
NPYQPGNHSF YGPGKIVDTS SKFTVVTQFI TDDGTPSGTL TEIKRFYVQN GKVIPQSEST ISGVTGNSIT
TEYCTAQKAA FGDNTGFFTH GGLQKISQAL AQGMVLVMSL WDDHAANMLW LDSTYPTDAD PDTPGVARGT
CPTTSGVPAD VESQNPNSYV IYSNIKVGPI NSTFTAN
4204214 Humicola grisea var MQIKSYIQYL AAALPLLSSV AAQQAGTITA ENHPRMTWKR CSGPGNCQTV QGEVVIDANW RWLHNNGQNC
thermoidea YEGNKWTSQC SSATDCAQRC ALDGANYQST YGASTSGDSL TLKFVTKHEY GTNIGSRFYL MANQNKYQMF
TLMNNEFAFD VDLSKVECGI NSALYFVAME EDGGMASYPS NRAGAKYGTG YCDAQCARDL KFIGGKANIE
GWRPSTNDPN AGVGPMGACC AEIDVWESNA YAYAFTPHAC GSKNRYHICE TNNCGGTYSD DRFAGYCDAN
GCDYNPYRMG NKDFYGKGKT VDTNRKFTVV SRFERNRLSQ FFVQDGRKIE VPPPTWPGLP NSADITPELC
DAQFRVFDDR NRFAETGGFD ALNEALTIPM VLVMSIWDDH HSNMLWLDSS YPPEKAGLPG GDRGPCPTTS
GVPAEVEAQY PDAQVVWSNI RFGPIGSTVN V
34582632 Trichoderma viride MYRKLAVISA FLATARAQSA CTLQSETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG
(also known as NTWSSTLCPD NETCAKNCCL DGAAYASTYG VTTSGNSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL
Hypocrea rufa) GNEFSFDVDV SQLPCGLNGA LYFVSMDADG GVSKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE
PSSNNANTGI GGHGSCCSEM DIWEANSISE ALTPHPCTTV GQEICEGDGC GGTYSDNRYG GTCDPDGCDW
DPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY YVQNGVTFQQ PNAELGSYSG NGLNDDYCTA
EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT NETSSTPGAV RGSCSTSSGV
PAQVESQSPN AKVTFSNIKF GPIGSTGDPS GGNPPGGNPP GTTTTRRPAT TTGSSPGPTQ SHYGQCGGIG
YSGPTVCASG TTCQVLNPYY SQCL
156712284 Thermoascus MYQRALLFSF FLAAARAQQA GTVTAENHPS LTWQQCSSGG SCTTQNGKVV IDANWRWVHT TSGYTNCYTG
aurantiacus NTWDTSICPD DVTCAQNCAL DGADYSGTYG VTTSGNALRL NFVTQSSGKN IGSRLYLLQD DTTYQIFKLL
GQEFTFDVDV SNLPCGLNGA LYFVAMDADG GLSKYPGNKA GAKYGTGYCD SQCPRDLKFI NGQANVEGWQ
PSANDPNAGV GNHGSCCAEM DVWEANSIST AVTPHPCDTP GQTMCQGDDC GGTYSSTRYA GTCDPDGCDF
NPYRQGNHSF YGPGQIVDTS SKFTVVTQFI TDDGTPSGTL TEIKRFYVQN GKVIPQSEST ISGVTGNSIT
TEYCTAQKAA FGDNTGFFTH GGLQKISQAL AQGMVLVMSL WDDHAANMLW LDSTYPTDAD PDTPGVARGT
CPTTSGVPAD VESQYPNSYV IYSNIKVGPI NSTFTAN
39977899 Magnaporthe grisea MIRKITTLAA LVGVVRGQAA CSLTAETHPS LTWQKCSSGG SCTNVAGSVT IDANWRWTHT TSGYTNCYTG
(oryzae) 70-15 NKWDTSICST NADCASKCCV DGANYQQTYG ASTSGNALSL QYVTQSSGKN VGSRLYLLES ENKYQMFNLL
GNEFTFDVDA SKLGCGLNGA VYFVSMDADG GQSKYSGNKA GAKYGTGYCD SQCPRDLKYI NGAANVEGWQ
PSSGDANSGV GNMGSCCAEM DIWEANSIST AYTPHPCSNN AQHSCKGDDC GGTYSSVRYA GDCDPDGCDF
NSYRQGNRTF YGPGSNFNVD SSKKVTVVTQ FISSGGQLTD IKRFYVQNGK VIPNSQSTIT GVTGNSVTQD
YCDKQKTAFG DQNVFNQRGG LRQMGDALAK GMVLVMSVWD DHHSQMLWLD STYPTTSTAP GAARGSCSTS
SGKPSDVQSQ TPGATVVYSN IKFGPIGSTF KSS
20986705 Talaromyces emersonii MLRRALLLSS SAILAVKAQQ AGTATAENHP PLTWQECTAP GSCTTQNGAV VLDANWRWVH DVNGYTNCYT
GNTWDPTYCP DDETCAQNCA LDGADYEGTY GVTSSGSSLK LNFVTGSNVG SRLYLLQDDS TYQIFKLLNR
EFSFDVDVSN LPCGLNGALY FVAMDADGGV SKYPNNKAGA KYGTGYCDSQ CPRDLKFIDG EANVEGWQPS
SNNANTGIGD HGSCCAEMDV WEANSISNAV TPHPCDTPGQ TMCSGDDCGG TYSNDRYAGT CDPDGCDFNP
YRMGNTSFYG PGKIIDTTKP FTVVTQFLTD DGTDTGTLSE IKRFYIQNSN VIPQPNSDIS GVTGNSITTE
FCTAQKQAFG DTDDFSQHGG LAKMGAAMQQ GMVLVMSLWD DYAAQMLWLD SDYPTDADPT TPGIARGTCP
TDSGVPSDVE SQSPNSYVTY SNIKFGPINS TFTAS
22138843 Aspergillus oryzae MHQRALLFSA FWTAVQAQQA GTLTAETHPS LTWQKCAAGG TCTEQKGSVV LDSNWRWLHS VDGSTNCYTG
NTWDATLCPD NESCASNCAL DGADYEGTYG VTTSGDALTL QFVTGANIGS RLYLMADDDE SYQTFNLLNN
EFTFDVDASK LPCGLNGAVY FVSMDADGGV AKYSTNKAGA KYGTGYCDSQ CPRDLKFING QVRKGWEPSD
SDKNAGVGGH GSCCPQMDIW EANSISTAYT PHPCDDTAQT MCEGDTCGGT YSSERYAGTC DPDGCDFNAY
RMGNESFYGP SKLVDSSSPV TVVTQFITAD GTDSGALSEI KRFYVQGGKV IANAASNVDG VTGNSITADF
CTAQKKAFGD DDIFAQHGGL QGMGNALSSM VLTLSIWDDH HSSMMWLDSS YPEDADATAP GVARGTCEPH
AGDPEKVESQ SGSATVTYSN IKYGPIGSTF DAPA
55775695 Penicillium MASTLSFKIY KNALLLAAFL GAAQAQQVGT STAEVHPSLT WQKCTAGGSC TSQSGKVVID SNWRWVHNTG
chrysogenum GYTNCYTGND WDRTLCPDDV TCATNCALDG ADYKGTYGVT ASGSSLRLNF VTQASQKNIG SRLYLMADDS
KYEMFQLLNQ EFTFDVDVSN LPCGLNGALY FVAMDEDGGM ARYPTNKAGA KYGTGYCDAQ CPRDLKFING
QANVEGWEPS SSDVNGGTGN YGSCCAEMDI WEANSISTAF TPHPCDDPAQ TRCTGDSCGG TYSSDRYGGT
CDPDGCDFNP YRMGNQSFYG PSKIVDTESP FTVVTQFITN DGTSTGTLSE IKRFYVQNGK VIPQSVSTIS
AVTGNSITDS FCSAQKTAFK DTDVFAKHGG MAGMGAGLAE GMVLVMSLWD DHAANMLWLD STYPTSASST
TPGAARGSCD ISSGEPSDVE ANHSNAYVVY SNIKVGPLGS TFGSTDSGSG TTTTKVTTTT ATKTTTTTGP
STTGAAHYAQ CGGQNWTGPT TCASPYTCQR QGDYYSQCL
171676762 Podospora anserina MVSAKFAALA ALVASASAQQ VCSLTPESHP PLTWQRCSAG GSCTNVAGSV TLDSNWRWTH TLQGSTNCYS
GNEWDTSICT TGTKCAQNCC VEGAEYAATY GITTSGNQLN LKFVTEGKYS TNVGSRTYLM ENATKYQGFN
LLGNEFTFDV DVSNIGCGLN GALYFVSMDL DGGLAKYSGN KAGAKYGTGY CDAQCPRDIK FINGEANIEG
WNPSTNDVNA GAGRYGTCCS EMDIWEANNM ATAYTPHSCT ILDQSRCEGE SCGGTYSSDR YGGVCDPDGC
DFNSYRMGNK EFYGKGKTVD TTKKMTVVTQ FLKNAAGELS EIKRFYVQNG VVIPNSVSSI PGVPNQNSIT
QDWCDAQKIA FGDPDDNTAK GGLRQMGLAL DKPMVLVMSI WNDHAAHMLW LDSTYPVDAA GRPGAERGAC
PTTSGVPSEV EAEAPNSNVA FSNIKFGPIG STFNSGSTNP NPISSSTATT PTSTRVSSTS TAAQTPTSAP
GGTVPRWGQC GGQGYTGPTQ CVAPYTCVVS NQWYSQCL
146350520 Pleurotus sp Florida MFPYIALVSF SFLSVVLAQQ VGTLTAETHP QLTVQQCTRG GSCTTQQRSV VLDGNWRWLH STSGSNNCYT
GNTWDTSLCP DAATCSRNCA LDGADYSGTY GITSSGNALT LKFVTHGPYS TNIGSRVYLL ADDSHYQMFN
LKNKEFTFDV DVSQLPCGLN GALYFSQMDA DGGTGRFPNN KAGAKYGTGY CDSQCPHDIK FINGEANVQG
WQPSPNDSNA GKGQYGSCCA EMDIWEANSM ASAYTPHPCT VTTPTRCQGN DCGDGDNRYG GVCDKDGCDF
NSFRMGDKNF LGPGKTVNTN SKFTVVTQFL TSDNTTSGTL SEIRRLYVQN GRVIQNSKVN IPGMASTLDS
ITESFCSTQK TVFGDTNSFA SKGGLRAMGN AFDKGMVLVL SIWDDHEAKM LWLDSNYPLD KSASAPGVAR
GTCATTSGEP KDVESQSPNA QVIFSNIKYG DIGSTYSN
37732123 Gibberella zeae MYRAIATASA LIAAVRAQQV CSLTQESKPS LNWSKCTSSG CSNVKGSVTI DANWRWTHQV SGSTNCYTGN
KWDTSVCTSG KVCAERCCLD GADYASTYGI TSSGDQLSLS FVTKGPYSTN IGSRTYLMED ENTYQMFQLL
GNEFTFDVDV SNIGCGLNGA LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWQ
PSDSDVNGGI GNLGTCCPEM DIWEANSIST AYTPHPCTKL TQHSCTGDSC GGTYSNDRYG GTCDADGCDF
NSYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS EITRLYVQNG KVIANSESKI AGVPGNSLTA
DFCTKQKKVF NDPDDFTKKG AWSGMSDALE APMVLVMSLW HDHHSNMLWL DSTYPTDSTK LGSQRGSCST
SSGVPADLEK NVPNSKVAFS NIKFGPIGST YKSDGTTPTN PTNPSEPSNT ANPNPGTVDQ WGQCGGSNYS
GPTACKSGFT CKKINDFYSQ CQ
156055188 Sclerotinia MYSAAVLATF SFLLGAGAQQ VGTLKTESHP PLTIQKCAAG GTCTDEADSV VLDANWRWLH STSGSTNCYT
sclerotiorum 1980 GNTWDTTLCP DAATCTANCA FDGADYEGTY GITSSGDSLK LSFVTGSNVG SRTYLMDSET TYKEFALLGN
EFTFTVDVSK LPCGLNGALY FVPMDADGGM SKYPTNKAGA KYGTGYCDAQ CPQDMKFVSG GANNEGWVPD
SNSANSGTGN IGSCCSEFDV WEANSMSQAL TPHTCTVDGQ TACTGDDCAG NTGVCDADGC DFNPYRMGNT
TFYGSGKTID TTKPFSVVTQ FITDDGTETG TLTEIKRFYV QDDVVYEQPN SDISGVSGNS ITDDFCTAQK
TAFGDTDYFS QKGGMAAMGK KMADGMVLVL SIWDDYNVNM LWLDSDYPTT KDASTPGVSR GSCATTSGVP
ATVEAASGSA YVTFSSIKYG PIGSTFKAPA DSSSPVVASS SPAAVAAVVS TSSAQAVPSH PAVSSSQAAV
STPEAVSSAP EVPASSSAAQ SVAPTSTKPK CSKVSQSSTL ATSVAAPATT ATSAAVAATS AASSSGSVPL
YGNCTGGKTC SEGTCVVQNP WYSQCVASS
453224 Phanerochaete MFRAAALLAF TCLAMVSGQQ AGTNTAENHP QLQSQQCTTS GGCKPLSTKV VLDSNWRWVH STSGYTNCYT
chrysosporium GNEWDTSLCP DGKTCAANCA LDGADYSGTY GITSTGTALT LKFVTGSNVG SRVYLMADDT HYQLLKLLNQ
EFTFDVDMSN LPCGLNGALY LSAMDADGGM SKYPGNKAGA KYGTGYCDSQ CPKDIKFING EANVGNWTET
GSNTGTGSYG TCCSEMDIWE ANNDAAAFTP HPCTTTGQTR CSGDDCARNT GLCDGDGCDF NSFRMGDKTF
LGKGMTVDTS KPFTVVTQFL TNDNTSTGTL SEIRRIYIQN GKVIQNSVAN IPGVDPVNSI TDNFCAQQKT
AFGDTNWFAQ KGGLKQMGEA LGNGMVLALS IWDDHAANML WLDSDYPTDK DPSAPGVARG TCATTSGVPS
DVESQVPNSQ VVFSNIKFGD IGSTFSGTSS PNPPGGSTTS SPVTTSPTPP PTGPTVPQWG QCGGIGYSGS
TTCASPYTCH VLNPYYSQCY
50402144 Trichoderma reesei MYRKLAVISA FLATARAQSA CTLQSETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG
NTWSSTLCPD NETCAKNCCL DGAAYASTYG VTTSGNSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL
GNEFSFDVDV SQLPCGLNGA LYFVSMDADG GVSKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE
PSSNNANTGI GGHGSCCSEM DIWEANSISE ALTPHPCTTV GQEICEGDGC GGTYSDNRYG GTCDPDGCDW
NPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY YVQNGVTFQQ PNAELGSYSG NELNDDYCTA
EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT NETSSTPGAV RGSCSTSSGV
PAQVESQSPN AKVTFSNIKF GPIGSTGNPS GGNPPGGNRG TTTTRRPATT TGSSPGPTQS HYGQCGGIGY
SGPTVCASGT TCQVLNPYYS QCL
115397177 Aspergillus terreus MPSTYDIYKK LLLLASFLSA SQAQQVGTSK AEVHPSLTWQ TCTSGGSCTT VNGKVVVDAN WRWVHNVDGY
NIH2624 NNCYTGNTWD TTLCPDDETC ASNCALEGAD YSGTYGVTTS GNSLRLNFVT QASQKNIGSR LYLMEDDSTY
KMFKLLNQEF TFDVDVSNLP CGLNGAVYFV SMDADGGMAK YPANKAGAKY GTGYCDSQCP RDLKFINGMA
NVEGWEPSAN DANAGTGNHG SCCAEMDIWE ANSISTAYTP HPCDTPGQVM CTGDSCGGTY SSDRYGGTCD
PDGCDFNSYR QGNKTFYGPG MTVDTKSKIT VVTQFLTNDG TASGTLSEIK RFYVQNGKVI PNSESTWSGV
SGNSITTAYC NAQKTLFGDT DVFTKHGGME GMGAALAEGM VLVLSLWDDH NSNMLWLDSN YPTDKPSTTP
GVARGSCDIS SGDPKDVEAN DANAYVVYSN IKVGPIGSTF SGSTGGGSSS STTATSKTTT TSATKTTTTT
TKTTTTTSAS STSTGGAQHW AQCGGIGWTG PTTCVAPYTC QKQNDYYSQC L
154312003 Botryotinia fuckeliana MISKVLAFTS LLAAARAQQA GTLTTETHPP LSVSQCTASG CTTSAQSIVV DANWRWLHST TGSTNCYTGN
B05-10 TWDKTLCPDG ATCAANCALD GADYSGVYGI TTSGNSIKLN FVTKGANTNV GSRTYLMAAG STTQYQMLKL
LNQEFTFDVD VSNLPCGLNG ALYFAAMDAD GGLSRFPTNK AGAKYGTGYC DAQCPQDIKF INGVANSVGW
TPSSNDVNAG AGQYGSCCSE MDIWEANKIS AAYTPHPCSV DTQTRCTGTD CGIGARYSSL CDADGCDFNS
YRQGNTSFYG AGLTVNTNKV FTVVTQFITN DGTASGTLKE IRRFYVQNGV VIPNSQSTIA GVPGNSITDS
FCAAQKTAFG DTNEFATKGG LATMSKALAK GMVLVMSIWD DHTANMLWLD APYPATKSPS APGVTRGSCS
ATSGNPVDVE ANSPGSSVTF SNIKWGPINS TYTGSGAAPS VPGTTTVSSA PASTATSGAG GVAKYAQCGG
SGYSGATACV SGSTCVALNP YYSQCQ
49333365 Volvariella volvacea MFPAATLFAF SLFAAVYGQQ VGTQLAETHP RLTWQKCTRS GGCQTQSNGA IVLDANWRWV HNVGGYTNCY
TGNTWNTSLC PDGATCAKNC ALDGANYQST YGITTSGNAL TLKFVTQSEQ KNIGSRVYLL ESDTKYQLFN
PLNQEFTFDV DVSQLPCGLN GAVYFSAMDA DGGMSKFPNN AAGAKYGTGY CDSQCPRDIK FINGEANVQG
WQPSPNDTNA GTGNYGACCN EMDVWEANSI STAYTPHPCT QQGLVRCSGT ACGGGSNRYG SICDPDGCDF
NSFRMGDKSF YGPGLTVNTQ QKFTVVTQFL TNNNSSSGTL REIRRLYVQN GRVIQNSKVN IPGMPSTMDS
VTTEFCNAQK TAFNDTFSFQ QKGGMANMSE ALRRGMVLVL SIWDDHAANM LWLDSNYPTD RPASQPGVAR
GTCPTSSGKP SDVENSTANS QVIYSNIKFG DIGSTYSA
729650 Penicillium MKGSISYQIY KGALLLSALL NSVSAQQVGT LTAETHPALT WSKCTAGXCS QVSGSVVIDA NWPXVHSTSG
janthinellum STNCYTGNTW DATLCPDDVT CAANCAVDGA RRQHLRVTTS GNSLRINFVT TASQKNIGSR LYLLENDTTY
QKFNLLNQEF TFDVDVSNLP CGLNGALYFV DMDADGGMAK YPTNKAGAKY GTGYCDSQCP RDLKFINGQA
NVDGWTPSKN DVNSGIGNHG SCCAEMDIWE ANSISNAVTP HPCDTPSQTM CTGQRCGGTY STDRYGGTCD
PDGCDFNPYR MGVTNFYGPG ETIDTKSPFT VVTQFLTNDG TSTGTLSEIK RFYVQGGKVI GNPQSTIVGV
SGNSITDSWC NAQKSAFGDT NEFSKHGGMA GMGAGLADGM VLVMSLWDDH ASDMLWLDST YPTNATSTTP
GAKRGTCDIS RRPNTVESTY PNAYVIYSNI KTGPLNSTFT GGTTSSSSTT TTTSKSTSTS SSSKTTTTVT
TTTTSSGSSG TGARDWAQCG GNGWTGPTTC VSPYTCTKQN DWYSQCL
146424871 Pleurotus sp Florida MFRTAALTAF TLAAVVLGQQ VGTLTAENHP ALSIQQCTAS GCTTQQKSVV LDSNWRWTHS LPVHTNCYTG
NAWDASLCPD PTTCATNCAI DGADYSGTYG ITTSGNALTL RFVTNGPYSK NIGSRVYLLD DADHYKMFDL
KNQEFTFDVD MSGLPCGLNG ALYFSEMPAD GGKAAHTSNK AGAKYGTGYC DAQCPHDIKW INGEANILDW
SASATDANAG NGRYGACCAE MDIWEANSEA TAYTPHVCRD EGLYRCSGTE CGDGDNRYGG VCDKDGCDFN
SYRMGDKNFL GRGKTIDTTK KITVVTQFIT DDNTSSGNLV EIRRVYVQDG VTYQNSFSTF PSLSQYNSIS
DDFCVAQKTL FGDNQYYNTH GGTEKMGDAM ANGMVLIMSL WSDHAAHMLW LDSDYPLDKS PSEPGVSRGA
CATTTGDPDD VVANHPNASV TFSNIKYGPI GSTYGGSTPP VSSGNTSAPP VTSTTSSGPT TPTGPTGTVP
KWGQCGGNGY SGPTTCVAGS TCTYSNDWYS QCL
67538012 Aspergillus nidulans MYQRALLFSA LLSVSRAQQA GTAQEEVHPS LTWQRCEASG SCTEVAGSVV LDSNWRWTHS VDGYTNCYTG
FGSC A4 NEWDATLCPD NESCAQNCAV DGADYEATYG ITSNGDSLTL KFVTGSNVGS RVYLMEDDET YQMFDLLNNE
FTFDVDVSNL PCGLNGALYF TSMDADGGLS KYEGNTAGAK YGTGYCDSQC PRDIKFINGL GNVEGWEPSD
SDANAGVGGM GTCCPEMDIW EANSISTAYT PHPCDSVEQT MCEGDSCGGT YSDDRYGGTC DPDGCDFNSY
RMGNTSFYGP GAIIDTSSKF TVVTQFIADG GSLSEIKRFY VQNGEVIPNS ESNISGVEGN SITSEFCTAQ
KTAFGDEDIF AQHGGLSAMG DAASAMVLIL SIWDDHHSSM MWLDSSYPTD ADPSQPGVAR GTCEQGAGDP
DVVESEHADA SVTFSNIKFG PIGSTF
62006162 Fusarium poae MYRAIATASA LIAAVRAQQV CSLTTETKPA LTWSKCTSSG CSNVQGSVTI DANWRWTHQV SGSTNCHTGN
KWDTSVCTSG KVCAEKCCVD GADYASTYGI TSSGNQLSLS FVTKGSYGTN IGSRTYLMED ENTYQMFQLL
GNEFTFDVDV SNIGCGLNGA LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWE
PSKSDVNGGI GNLGTCCPEM DIWEANSIST AYTPHPCTKL TQHACTGDSC GGTYSNDRYG GTCDADGCDF
NAYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS EITRLYVQNG KVIANSESKI AGNPGSSLTS
DFCTTQKKVF GDIDDFAKKG AWNGMSDALE APMVLVMSLW HDHHSNMLWL DSTYPTDSTA LGSQRGSCST
SSGVPADLEK NVPNSKVAFS NIKFGPIGST YNKEGTQPQP TNPTNPNPTN PTNPGTVDQW GQCGGTNYSG
PTACKSPFTC KKINDFYSQC Q
146424873 Pleurotus sp Florida MFRTAALTAF TLAAVVLGQQ VGTLAAENHP ALSIQQCTAS GCTTQQKSVV LDSNWRWTHS TAGATNCYTG
NAWDSSLCPN PTTCATNCAI DGADYSGTYG ITTSGNSLTL RFVTNGQYSE NIGSRVYLLD DADHYKLFNL
KNQEFTFDVD MSGLPCGLNG ALYFSEMAAD GGKAAHTGNN AGAKYGTGYC DAQCPHDIKW INGEANILDW
SGSATDPNAG NGRYGACCAE MDIWEANSEA TAYTPHVCRD EGLYRCSGTE CGDGDNRYGG VCDKDGCDFN
SYRMGDKNFL GRGKTIDTTK KITVVTQFIT DDNTPTGNLV EIRRVYVQDG VTYQNSFSTF PSLSQYNSIS
DDFCVAQKTL FGDNQYYNTH GGTEKMGDSL ANGMVLIMSL WSDHAAHMLW LDSDYPLDKS PSEPGVSRGA
CATTTGDPDD VVANHPNASV TFSNIKYGPI GSTYGGSTPP VSSGNTSVPP VTSTTSSGPT TPTGPTGTVP
KWGQCGGIGY SGPTSCVAGS TCTYSNEWYS QCL
295937 Trichoderma viride MYQKLALISA FLATARAQSA CTLQAETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG
NTWSSTLCPD NETCAKNCCL DGAAYASTYG VTTSADSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL
GNEFSFDVDV SQLPCGLNGA LYFVSMDADG GVTKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE
PSSNNANTGI GGHGSCCSEM DIWEANSISE ALTPHPCTTV GQEICEGDSC GGTYSGDRYG GTCDPDGCDW
NPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY YVQNGVTFQQ PNAELGDYSG NSLDDDYCAA
EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT DETSSTPGAV RGSSSTSSGV
PAQLESNSPN AKVVYSNIKF GPIGSTGNPS GGNPPGGNPP GTTTPRPATS TGSSPGPTQT HYGQCGGIGY
IGPTVCASGS TCQVLNPYYS QCL
6179889 # Alternaria alternata MTWQSCTAKG SCTNKNGKIV IDANWRWLHK KEGYDNCYTG NEWDATACPD NKACAANCAV DGADYSGTYG
ITAGSNSLKL KFITKGSYST NIGSRTYLMK DDTTYEMFKF TGNQEFTFDV DVSNLPCGFN GALYFVSMDA
DGGLKKYSTN KAGAKYGTGY CDAQCPRDLK FINGEGNVEG WKPSSNDANA GVGGHGSCCA EMDIWEANSV
STAVTPHSCS TIEQSRCDGD GCGGTYSADR YAGVCDPDGC DFNSYRMGVK DFYGKGKTVD TSKKFTVVTQ
FIGTGDAMEI KRFYVQNGKT IAQPASAVPG VEGNSITTKF CDQQKAVFGD TYTFKDKGGM ANMAKALANG
MVLVMSLWDD HYSNMLWLDS TYPTDKNPDT DLGTGRGECE TSSGVPADVE SQHADATVVY SNIKFGPLNS
TFG
119483864 Neosartorya fischeri MASAISFQVY RSALILSAFL PSITQAQQIG TYTTETHPSM TWETCTSGGS CATNQGSVVM DANWRWVHQV
NRRL 181 GSTTNCYTGN TWDTSICDTD ETCATECAVD GADYESTYGV TTSGSQIRLN FVTQNSNGAN VGSRLYMMAD
NTHYQMFKLL NQEFTFDVDV SNLPCGLNGA LYFVTMDEDG GVSKYPNNKA GAQYGVGYCD SQCPRDLKFI
QGQANVEGWT PSSNNENTGL GNYGSCCAEL DIWESNSISQ ALTPHPCDTA TNTMCTGDAC GGTYSSDRYA
GTCDPDGCDF NPYRMGNTTF YGPGKTIDTN SPFTVVTQFI TDDGTDTGTL SEIRRYYVQN GVTYAQPDSD
ISGITGNAIN ADYCTAENTV FDGPGTFAKH GGFSAMSEAM STGMVLVMSL WDDYYADMLW LDSTYPTNAS
SSTPGAVRGS CSTDSGVPAT IESESPDSYV TYSNIKVGPI GSTFSSGSGS GSSGSGSSGS ASTSTTSTKT
TAATSTSTAV AQHYSQCGGQ DWTGPTTCVS PYTCQVQNAY YSQCL
85083281 Neurospora crassa MKAYFEYLVA ALPLLGLATA QQVGKQTTET HPKLSWKKCT GKANCNTVNA EVVIDSNWRW LHDSSGKNCY
OR74A DGNKWTSACS SATDCASKCQ LDGANYGTTY GASTSGDALT LKFVTKHEYG TNIGSRFYLM NGASKYQMFT
LMNNEFAFDV DLSTVECGLN AALYFVAMEE DGGMASYSSN KAGAKYGTGY CDAQCARDLK FVGGKANIEG
WTPSTNDANA GVGPYGGCCA EIDVWESNAH SFAFTPHACK TNKYHVCERD NCGGTYSEDR FAGLCDANGC
DYNPYRMGNT DFYGKGKTVD TSKKFTVVSR FEENKLTQFF VQNGQKIEIP GPKWDGIPSD NANITPEFCS
AQFQAFGDRD RFAEVGGFAQ LNSALRMPMV LVMSIWDDHY ANMLWLDSVY PPEKEGQPGA ARGDCPQSSG
VPAEVESQYA NSKVVYSNIR FGPVGSTVNV
3913803 Cryphonectria MFSKFALTGS LLAGAVNAQG VGTQQTETHP QMTWQSCTSP SSCTTNQGEV VIDSNWRWVH DKDGYVNCYT
parasitica GNTWNTTLCP DDKTCAANCV LDGADYSSTY GITTSGNALS LQFVTQSSGK NIGSRTYLME SSTKYHLFDL
IGNEFAFDVD LSKLPCGLNG ALYFVTMDAD GGMAKYSTNT AGAEYGTGYC DSQCPRDLKF INGQGNVEGW
TPSTNDANAG VGGLGSCCSE MDVWEANSMD MAYTPHPCET AAQHSCNADE CGGTYSSSRY AGDCDPDGCD
WNPFRMGNKD FYGSGDTVDT SQKFTVVTQF HGSGSSLTEI SQYYIQGGTK IQQPNSTWPT LTGYNSITDD
FCKAQKVEFN DTDVFSEKGG LAQMGAGMAD GMVLVMSLWD DHYANMLWLD STYPVDADAS SPGKQRGTCA
TTSGVPADVE SSDASATVIY SNIKFGPIGA TY
60729633 Corticium rolfsii MFPAAALLSF TLLAVASAQQ IGTNTAEVHP SLTVSQCTTS GGCTSSTQSI VLDANWRWLH STSGYTNCYT
GNQWNSDLCP DPDTCATNCA LDGASYESTY GISTDGNAVT LNFVTQGSQT NVGSRVYLLS DDTHYQTFSL
LNKEFSFDVD ASNIGCGING AVYFVQMDAD GGLSKYSSNK AGAQYGTGYC DSQCPQDIKF INGEANLLDW
NATSANSGTG SYGSCCPEMD IWEANKYAAA YTPHPCSVSG QTRCTGTSCG AGSERYDGYC DKDGCDFNSW
RMGNETFLGP GMTIDTNKKF TIVTQFITDD NTANGTLSEI RRLYVQGGTV IQNSVANQPN IPKVNSITDS
FCTAQKTEFG DQDYFGTIGG LSQMGKAMSD MVLVMSIWDD YDAEMLWLDS NYPTSGSAST PGISRGPCSA
TSGLPATVES QQASASVTYS NIKWGDIGST YSGSGSSGSS SSSSSSAASA STSTHTSAAA TATSSAAAAT
GSPVPAYGQC GGQSYTGSTT CASPYVCKVS NAYYSQCLPA
39971383 Magnaporthe grisea MKRALCASLS LLAAAVAQQV GTNEPEVHPK MTWKKCSSGG SCSTVNGEVV IDGNWRWIHN IGGYENCYSG
70-15 NKWTSVCSTN ADCATKCAME GAKYQETYGV STSGDALTLK FVQQNSSGKN VGSRMYLMNG ANKYQMFTLK
NNEFAFDVDL SSVECGMNSA LYFVPMKEDG GMSTEPNNKA GAKYGTGYCD AQCARDLKFI GGKGNIEGWQ
PSSTDSSAGI GAQGACCAEI DIWESNKNAF AFTPHPCENN EYHVCTEPNC GGTYADDRYG GGCDANGCDY
NPYRMGNPDF YGPGKTIDTN RKFTVISRFE NNRNYQILMQ DGVAHRIPGP KFDGLEGETG ELNEQFCTDQ
FTVFDERNRF NEVGGWSKLN AAYEIPMVLV MSIWSDHFAN MLWLDSTYPP EKAGQPGSAR GPCPADGGDP
NGVVNQYPNA KVIWSNVRFG PIGSTYQVD
39973029 Magnaporthe grisea MQLTKAGVFL GALMGGAAAQ QVGTQTAENH PKMTWKKCTG KASCTTVNGE VVIDANWRWL HDASSKNCYD
70-15 GNRWTDSCRT ASDCAAKCSL EGADYAKTYG ASTSGDALSL KFVTRHDYGT NIGSRFYLMN GASKYQMFSL
LGNEFAFDVD LSTIECGLNS ALYFVAMEED GGMKSYSSNK AGAKYGTGYC DAQCARDLKF VGGKANIEGW
KPSSNDANAG VGPYGACCAE IDVWESNAHA FAFTPHPCTD NKYHVCQDSN CGGTYSDDRF AGKCDANGCD
INPYRLGNTD FYGKGKTVDT SKKFTVVTRF ERDALTQFFV QNNKRIDMPS PALEGLPATG AITAEYCTNV
FNVFGDRNRF DEVGGWSQLQ QALSLPMVLV MSIWDDHYSN MLWLDSVYPP DKEGSPGAAR GDCPQDSGVP
SEVESQIPGA TVVWSNIRFG PVGSTVNV
1170141 Fusarium oxysporum MYRIVATASA LIAAARAQQV CSLNTETKPA LTWSKCTSSG CSDVKGSVVI DANWRWTHQT SGSTNCYTGN
KWDTSICTDG KTCAEKCCLD GADYSGTYGI TSSGNQLSLG FVTNGPYSKN IGSRTYLMEN ENTYQMFQLL
GNEFTFDVDV SGIGCGLNGA PHFVSMDEDG GKAKYSGNKA GAKYGTGYCD AQCPRDVKFI NGVANSEGWK
PSDSDVNAGV GNLGTCCPEM DIWEANSIST AFTPHPCTKL TQHSCTGDSC GGTYSSDRYG GTCDADGCDF
NAYRQGNKTF YGPGSNFNID TTKKMTVVTQ FHKGSNGRLS EITRLYVQNG KVIANSESKI AGNPGSSLTS
DFCSKQKSVF GDIDDFSKKG GWNGMSDALS APMVLVMSLW HDHHSNMLWL DSTYPTDSTK VGSQRGSCAT
TSGKPSDLER DVPNSKVSFS NIKFGPIGST YKSDGTTPNP PASSSTTGSS TPTNPPAGSV DQWGQCGGQN
YSGPTTCKSP FTCKKINDFY SQCQ
121710012 Aspergillus clavatus MYQRALLFSA LATAVSAQQV GTQKAEVHPA LTWQKCTAAG SCTDQKGSVV IDANWRWLHS TEDTTNCYTG
NRRL 1 NEWNAELCPD NEACAKNCAL DGADYSGTYG VTADGSSLKL NFVTSANVGS RLYLMEDDET YQMFNLLNNE
FTFDVDVSNL PCGLNGALYF VSMDADGGLS KYPGNKAGAK YGTGYCDSQC PRDLKFINGE ANVEGWKPSD
NDKNAGVGGY GSCCPEMDIW EANSISTAYT PHPCDGMEQT RCDGNDCGGT YSSTRYAGTC DPDGCDFNSF
RMGNESFYGP GGLVDTKSPI TVVTQFVTAG GTDSGALKEI RRVYVQGGKV IGNSASNVAG VEGDSITSDF
CTAQKKAFGD EDIFSKHGGL EGMGKALNKM ALIVSIWDDH ASSMMWLDST YPVDADASTP GVARGTCEHG
LGDPETVESQ HPDASVTFSN IKFGPIGSTY KSV
17902580 Penicillium MSALNSFNMY KSALILGSLL ATAGAQQIGT YTAETHPSLS WSTCKSGGSC TTNSGAITLD ANWRWVHGVN
funiculosum TSTNCYTGNT WNTAICDTDA SCAQDCALDG ADYSGTYGIT TSGNSLRLNF VTGSNVGSRT YLMADNTHYQ
IFDLLNQEFT FTVDVSNLPC GLNGALYFVT MDADGGVSKY PNNKAGAQYG VGYCDSQCPR DLKFIAGQAN
VEGWTPSTNN SNTGIGNHGS CCAELDIWEA NSISEALTPH PCDTPGLTVC TADDCGGTYS SNRYAGTCDP
DGCDFNPYRL GVTDFYGSGK TVDTTKPFTV VTQFVTDDGT SSGSLSEIRR YYVQNGVVIP QPSSKISGIS
GNVINSDFCA AELSAFGETA SFTNHGGLKN MGSALEAGMV LVMSLWDDYS VNMLWLDSTY PANETGTPGA
ARGSCPTTSG NPKTVESQSG SSYVVFSDIK VGPFNSTFSG GTSTGGSTTT TASGTTSTKA STTSTSSTST
GTGVAAHWGQ CGGQGWTGPT TCASGTTCTV VNPYYSQCL
1346226 Humicola grisea var MRTAKFATLA ALVASAAAQQ ACSLTTERHP SLSWNKCTAG GQCQTVQASI TLDSNWRWTH QVSGSTNCYT
thermoidea GNKWDTSICT DAKSCAQNCC VDGADYTSTY GITTNGDSLS LKFVTKGQHS TNVGSRTYLM DGEDKYQTFE
LLGNEFTFDV DVSNIGCGLN GALYFVSMDA DGGLSRYPGN KAGAKYGTGY CDAQCPRDIK FINGEANIEG
WTGSTNDPNA GAGRYGTCCS EMDIWEANNM ATAFTPHPCT IIGQSRCEGD SCGGTYSNER YAGVCDPDGC
DFNSYRQGNK TFYGKGMTVD TTKKITVVTQ FLKDANGDLG EIKRFYVQDG KIIPNSESTI PGVEGNSITQ
DWCDRQKVAF GDIDDFNRKG GMKQMGKALA GPMVLVMSIW DDHASNMLWL DSTFPVDAAG KPGAERGACP
TTSGVPAEVE AEAPNSNVVF SNIRFGPIGS TVAGLPGAGN GGNNGGNPPP PTTTTSSAPA TTTTASAGPK
AGRWQQCGGI GFTGPTQCEE PYICTKLNDW YSQCL
156712282 Chaetomium MMYKKFAALA ALVAGASAQQ ACSLTAENHP SLTWKRCTSG GSCSTVNGAV TIDANWRWTH TVSGSTNCYT
thermophilum GNQWDTSLCT DGKSCAQTCC VDGADYSSTY GITTSGDSLN LKFVTKHQYG TNVGSRVYLM ENDTKYQMFE
LLGNEFTFDV DVSNLGCGLN GALYFVSMDA DGGMSKYSGN KAGAKYGTGY CDAQCPRDLK FINGEANVGN
WTPSTNDANA GFGRYGSCCS EMDVWEANNM ATAFTPHPCT TVGQSRCEAD TCGGTYSSDR YAGVCDPDGC
DFNAYRQGDK TFYGKGMTVD TNKKMTVVTQ FHKNSAGVLS EIKRFYVQDG KIIANAESKI PGNPGNSITQ
EYCDAQKVAF SNTDDFNRKG GMAQMSKALA GPMVLVMSVW DDHYANMLWL DSTYPIDQAG APGAERGACP
TTSGVPAEIE AQVPNSNVIF SNIRFGPIGS TVPGLDGSNP GNPTTTVVPP ASTSTSRPTS STSSPVSTPT
GQPGGCTTQK WGQCGGIGYT GCTNCVAGTT CTQLNPWYSQ CL
169768818 Aspergillus oryzae MASLSLSKIC RNALILSSVL STAQGQQVGT YQTETHPSMT WQTCGNGGSC STNQGSVVLD ANWRWVHQTG
RIB40 SSSNCYTGNK WDTSYCSTND ACAQKCALDG ADYSNTYGIT TSGSEVRLNF VTSNSNGKNV GSRVYMMADD
THYEVYKLLN QEFTFDVDVS KLPCGLNGAL YFVVMDADGG VSKYPNNKAG AKYGTGYCDS QCPRDLKFIQ
GQANVEGWVS STNNANTGTG NHGSCCAELD IWESNSISQA LTPHPCDTPT NTLCTGDACG GTYSSDRYSG
TCDPDGCDFN PYRVGNTTFY GPGKTIDTNK PITVVTQFIT DDGTSSGTLS EIKRFYVQDG VTYPQPSADV
SGLSGNTINS EYCTAENTLF EGSGSFAKHG GLAGMGEAMS TGMVLVMSLW DDYYANMLWL DSNYPTNEST
SKPGVARGTC STSSGVPSEV EASNPSAYVA YSNIKVGPIG STFKS
46241270 Gibberella pulicaris MYRAIATASA LIAAVRAQQV CSLTPETKPA LSWSKCTSSG CSNVQGSVTI DANWRWTHQL SGSTNCYTGN
KWDTSICTSG KVCAEKCCID GAEYASTYGI TSSGNQLSLS FVTKGAYGTN IGSRTYLMED ENTYQMFQLL
GNEFTFDVDV SNIGCGLNGA LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWQ
PSKSDVNAGI GNMGTCCPEM DIWEANSIST AYTPHPCTKL TQHSCTGDSC GGTYSNDRYG GTCDADGCDF
NAYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS EITRLYVQNG KVIANSESKI AGVPGSSLTP
EFCTAQKKVF GDTDDFAKKG AWSGMSDALE APMVLVMSLW HDHHSNMLWL DSTYPTDSTK LGAQRGSCST
SSGVPADLEK NVPNSKVAFS NIKFGPIGST YKEGVPEPTN PTNPTNPTNP TNPGTVDQWA QCGGTNYSGP
TACKSPFTCK KINDFYSQCQ
49333363 Volvariella volvacea MFPKSSLLVL SFLATAYAQQ VGTQTAEVHP SLNWARCTSS GCTNVAGSVT LDANWRWLHT TSGYTNCYTG
NSWNTTLCPD GATCAQNCAL DGANYQSTCG ITTSGNALTL KFVTQGEQKN IGSRVYLMAS ESRYEMFGLL
NKEFTFDVDV SNLPCGLNGA LYFSSMDADG GMAKNPGNKA GAKYGTGYCD SQCPRDIKFI NGEANVAGWN
GSPNDTNAGT GNWGACCNEM DIWEANSISA AYTPHPCTVQ GLSRCSGTAC GTNDRYGTVC DPDGCDFNSY
RMGDKTYYGP GGTGVDTRSK FTVVTQFLTN NNSSSGTLSE IRRLYVQNGR VVQNSKVNIP GMSNTLDSIT
TGFCDSQKTA FGDTRSFQNK GGMSAMGQAL GAGMVLVLSV WDDHAANMLW LDSNYPVDAD PSKPGIARGT
CSTTSGKPTD VEQSAANSSV TFSNIKFGDI GTTYTGGSVT TTPGNPGTTT STAPGAVQTK WGQCGGQGWT
GPTRCESGST CTVVNQWYSQ CI
46395332 Irpex lacteus MFRKAALLAF SFLAIAHGQQ VGTNQAENHP SLPSQHCTAS GCTTSSTSVV LDANWRWVHT TTGYTNCYTG
QTWDASICPD GVTCAKACAL DGADYSGTYG ITTSGNALTL QFVKGTNVGS RVYLLQDASN YQLFKLINQE
FTFDVDMSNL PCGLNGAVYL SQMDQDGGVS RFPTNTAGAK YGTGYCDSQC PRDIKFINGE ANVAGWTGSS
SDPNSGTGNY GTCCSEMDIW EANSVAAAYT PHPCSVNQQT RCTGADCGQD ANRYKGVCDP DGCDFNSFRM
GDQTFLGKGL TVDTSRKFTI VTQFISDDGT SSGNLAEIRR FYVQDGKVIP NSKVNIAGCD AVNSITDKFC
TQQKTAFGDT NRFADQGGLK QMGAALKSGM VLALSLWDDH AANMLWLDSD YPTTADASKP GVARGTCPNT
SGVPKDVESQ SGSATVTYSN IKWGDLNSTF SGTASNPTGP SSSPSGPSSS SSSTAGSQPT QPSSGSVAQW
GQCGGIGYSG ATGCVSPYTC HVVNPYYSQC Y
50844407 # Chaetomium TETHPRLTWK RCTSGGNCST VNGAVTIDAN WRWTHTVSGS TNCYTGNEWD TSICSDGKSC AQTCCVDGAD
thermophilum var YSSTYGITTS GDSLNLKFVT KHQHGTNVGS RVYLMENDTK YQMFELLGNE FTFDVDVSNL GCGLNGALYF
thermophilum VSMDADGGMS KYSGNKAGAK YGTGYCDAQC PRDLKFINGE ANIENWTPST NDANAGFGRY GSCCSEMDIW
EANNMATAFT PHPCTIIGQS RCEGNSCGGT YSSERYAGVC DPDGCDFNAY RQGDKTFYGK GMTVDTTKKM
TVVTQFHKNS AGVLSEIKRF YVQDGKIIAN AESKIPGNPG NSITQEWCDA QKVAFGDIDD FNRKGGMAQM
SKALEGPMVL VMSVWDDHYA NMLWLDSTYP IDKAGTPGAE RGACPTTSGV PAEIEAQVPN SNVIFSNIRF
GPIGSTVPGL DGSTPSNPTA TVAPPTSTTT SVRSSTTQIS TPTSQPGGCT TQKWGQCGGI GYTGCTNCVA
GTTCTELNPW YSQCL
4586347 Irpex lacteus MFHKAVLVAF SLVTIVHGQQ AGTQTAENHP QLSSQKCTAG GSCTSASTSV VLDSNWRWVH TTSGYTNCYT
GNTWDASICS DPVSCAQNCA LDGADYAGTY GITTSGDALT LKFVTGSNVG SRVYLMEDET NYQMFKLMNQ
EFTFDVDVSN LPCGLNGAVY FVQMDQDGGT SKFPNNKAGA KFGTGYCDSQ CPQDIKFING EANIVDWTAS
AGDANSGTGS FGTCCQEMDI WEANSISAAY TPHPCTVTEQ TRCSGSDCGQ GSDRFNGICD PDGCDFNSFR
MGNTEFYGKG LTVDTSQKFT IVTQFISDDG TADGNLAEIR RFYVQNGKVI PNSVVQITGI DPVNSITEDF
CTQQKTVFGD TNNFAAKGGL KQMGEAVKNG MVLALSLWDD YAAQMLWLDS DYPTTADPSQ PGVARGTCPT
TSGVPSQVEG QEGSSSVIYS NIKFGDLNST FTGTLTNPSS PAGPPVTSSP SEPSQSTQPS QPAQPTQPAG
TAAQWAQCGG MGFTGPTVCA SPFTCHVLNP YYSQCY
3980202 Phanerochaete MFRAAALLAF TCLAMVSGQQ AGTNTAENHP QLQSQQCTTS GGCKPLSTKV VLDSNWRWVH STSGYTNCYT
chrysosporium GNEWNTSLCP DGKTCAANCA LDGADYSGTY GITSTGTALT LKFVTGSNVG SRVYLMADDT HYQLLKLLNQ
EFTFDVDMSN LPCGLNGALY LSAMDADGGM SKYPGNKAGA KYGTGYCDSQ CPKDIKFING EANVGNWTET
GSNTGTGSYG TCCSEMDIWE ANNDAAAFTP HPCTTTGQTR CSGDDCARNT GLCDHGDGCD FNSFRMGDKT
FLGKGMTVDT SKPFTDVTQF LTNDNTSTGT LSEIRRIYIQ NGKVIQNSVA NIPGVDPVNS ITDNFCAQQK
TAFGDTNWFA QKGGLKQMGE ALGNGMVLAL SIWDDHAANM LWLDSDYPTD KDPSAPGVAR GTCATTSGVP
SDVESQVPNS QVVFSNIKFG DIGSTFSGTS SPNPPGGSTT SSPVTTSPTP PPTGPTVPQW GQCGGIGYSG
STTCASPYTC HVLNPYYSQC Y
27125837 Melanocarpus MMMKQYLQYL AAALPLVGLA AGQRAGNETP ENHPPLTWQR CTAPGNCQTV NAEVVIDANW RWLHDDNMQN
albomyces CYDGNQWTNA CSTATDCAEK CMIEGAGDYL GTYGASTSGD ALTLKFVTKH EYGTNVGSRF YLMNGPDKYQ
MFNLMGNELA FDVDLSTVEC GINSALYFVA MEEDGGMASY PSNQAGARYG TGYCDAQCAR DLKFVGGKAN
IEGWKSSTSD PNAGVGPYGS CCAEIDVWES NAYAFAFTPH ACTTNEYHVC ETTNCGGTYS EDRFAGKCDA
NGCDYNPYRM GNPDFYGKGK TLDTSRKFTV VSRFEENKLS QYFIQDGRKI EIPPPTWEGM PNSSEITPEL
CSTMFDVFND RNRFEEVGGF EQLNNALRVP MVLVMSIWDD HYANMLWLDS IYPPEKEGQP GAARGDCPTD
SGVPAEVEAQ FPDAQVVWSN IRFGPIGSTY DF
171696102 Podospora anserina MYRSATFLTF ASLVLGQQVG TYTAERHPSM PIQVCTAPGQ CTRESTEVVL DANWRWTHIT NGYTNCYTGN
EWNATACPDG ATCAKNCAVD GADYSGTYGI TTPSSGALRL QFVKKNDNGQ NVGSRVYLMA SSDKYKLFNL
LNKEFTFDVD VSKLPCGLNG AVYFSEMLED GGLKSFSGNK AGAKYGTGYC DSQCPQDIKF INGEANVEGW
GGADGNSGTG KYGICCAEMD IWEANSDATA YTPHVCSVNE QTRCEGVDCG AGSDRYNSIC DKDGCDFNSY
RLGNREFYGP GKTVDTTRPF TIVTQFVTDD GTDSGNLKSI HRYYVQDGNV IPNSVTEVAG VDQTNFISEG
FCEQQKSAFG DNNYFGQLGG MRAMGESLKK MVLVLSIWDD HAVNMNWLDS IFPNDADPEQ PGVARGRCDP
ADGVPATIEA AHPDAYVIYS NIKFGAINST FTAN
3913802 Cochliobolus MYRTLAFASL SLYGAARAQQ VGTSTAENHP KLTWQTCTGT GGTNCSNKSG SVVLDSNWRW AHNVGGYTNC
carbonum YTGNSWSTQY CPDGDSCTKN CAIDGADYSG TYGITTSNNA LSLKFVTKGS FSSNIGSRTY LMETDTKYQM
FNLINKEFTF DVDVSKLPCG LNGALYFVEM AADGGIGKGN NKAGAKYGTG YCDSQCPHDI KFINGKANVE
GWNPSDADPN GGAGKIGACC PEMDIWEANS ISTAYTPHPC RGVGLQECSD AASCGDGSNR YDGQCDKDGC
DFNSYRMGVK DFYGPGATLD TTKKMTVITQ FLGSGSSLSE IKRFYVQNGK VYKNSQSAVA GVTGNSITES
FCTAQKKAFG DTSSFAALGG LNEMGASLAR GHVLIMSLWG DHAVNMLWLD STYPTDADPS KPGAARGTCP
TTSGKPEDVE KNSPDATVVF SNIKFGPIGS TFAQPA
50403723 Trichoderma viride MYQKLALISA FLATARAQSA CTLQAETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG
NTWSSTLCPD NETCAKNCCL DGAAYASTYG VTTSADSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL
GNEFSFDVDV SQLPCGLNGA LYFVSMDADG GVSKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE
PSSNNANTGI GGHGSCCSEM DIWEANSISE ALTPHPCTTV GQEICDGDSC GGTYSGDRYG GTCDPDGCDW
NPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY YVQNGVTFQQ PNAELGDYSG NSLDDDYCAA
EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT NETSSTPGAV RGSCSTSSGV
PAQLESNSPN AKVVYSNIKF GPIGSTGNSS GGNPPGGNPP GTTTTRRPAT STGSSPGPTQ THYGQCGGIG
YSGPTVCASG STCQVLNPYY SQCL
3913798 Aspergillus aculeatus MVDSFSIYKT ALLLSMLATS NAQQVGTYTA ETHPSLTWQT CSGSGSCTTT SGSVVIDANW RWVHEVGGYT
NCYSGNTWDS SICSTDTTCA SECALEGATY ESTYGVTTSG SSLRLNFVTT ASQKNIGSRL YLLADDSTYE
TFKLFNREFT FDVDVSNLPC GLNGALYFVS MDADGGVSRF PTNKAGAKYG TGYCDSQCPR DLKFIDGQAN
IEGWEPSSTD VNAGTGNHGS CCPEMDIWEA NSISSAFTAH PCDSVQQTMC TGDTCGGTYS DTTDRYSGTC
DPDGCDFNPY RFGNTNFYGP GKTVDNSKPF TVVTQFITHD GTDTGTLTEI RRLYVQNGVV IGNGPSTYTA
ASGNSITESF CKAEKTLFGD TNVFETHGGL SAMGDALGDG MVLVLSLWDD HAADMLWLDS DYPTTSCASS
PGVARGTCPT TTGNATYVEA NYPNSYVTYS NIKFGTLNST YSGTSSGGSS SSSTTLTTKA STSTTSSKTT
TTTSKTSTTS SSSTNVAQLY GQCGGQGWTG PTTCASGTCT KQNDYYSQCL
66828465 Dictyostelium MYRILKSFIL LSLVNMSLSQ KIGKLTPEVH PPMTFQKCSE GGSCETIQGE VVVDANWRWV HSAQGQNCYT
discoideum GNTWNPTICP DDETCAENCY LDGANYESVY GVTTSEDSVR LNFVTQSQGK NIGSRLFLMS NESNYQLFHV
LGQEFTFDVD VSNLDCGLNG ALYLVSMDSD GGSARFPTNE AGAKYGTGYC DAQCPRDLKF ISGSANVDGW
IPSTNNPNTG YGNLGSCCAE MDLWEANNMA TAVTPHPCDT SSQSVCKSDS CGGAASSNRY GGICDPDGCD
YNPYRMGNTS FFGPNKMIDT NSVITVVTQF ITDDGSSDGK LTSIKRLYVQ DGNVISQSVS TIDGVEGNEV
NEEFCTNQKK VFGDEDSFTK HGGLAKMGEA LKDGMVLVLS LWDDYQANML WLDSSYPTTS SPTDPGVARG
SCPTTSGVPS KVEQNYPNAY VVYSNIKVGP IDSTYKK
156060391 Sclerotinia MISRVLAISS LLAAARAQQI GTNTAEVHPA LTSIVIDANW RWLHTTSGYT NCYTGNSWDA TLCPDAVTCA
sclerotiorum 1980 ANCALDGADY SGTYGITTSG NSLKLNFVTK GANTNVGSRT YLMAAGSKTQ YQLLKLLGQE FTFDVDVSNL
PCGLNGALYF AEMDADGGVS RFPTNKAGAQ YGTGYCDAQC PQDIKFINGQ ANSVGWTPSS NDVNTGTGQY
GSCCSEMDIW EANKISAAYT PHPCSVDGQT RCTGTDCGIG ARYSSLCDAD GCDFNSYRMG DTGFYGAGLT
VDTSKVFTVV TQFITNDGTT SGTLSEIRRF YVQNGKVIPN SQSKVTGVSG NSITDSFCAA QKTAFGDTNE
FATKGGLATM SKALAKGMVL VMSIWDDHSA NMLWLDAPYP ASKSPSAAGV SRGSCSASSG VPADVEANSP
GASVTYSNIK WGPINSTYSA GTGSNTGSGS GSTTTLVSSV PSSTPTSTTG VPKYGQCGGS GYTGPTNCIG
STCVSMGQYY SQCQ
116181754 Chaetomium globosum MYRQVATALS FASLVLGQQV GTLTAETHPS LPIEVCTAPG SCTKEDTTVV LDANWRWTHV TDGYTNCYTG
CBS 148-51 NAWNETACPD GKTCAANCAI DGAEYEKTYG ITTPEEGALR LNFVTESNVG SRVYLMAGED KYRLFNLLNK
EFTMDVDVSN LPCGLNGAVY FSEMDEDGGM SRFEGNKAGA KYGTGYCDSQ CPRDIKFING EANSEGWGGE
DGNSGTGKYG TCCAEMDIWE ANLDATAYTP HPCKVTEQTR CEDDTECGAG DARYEGLCDR DGCDFNSFRL
GNKEFYGPEK TVDTSKPFTL VTQFVTADGT DTGALQSIRR FYVQDGTVIP NSETVVEGVD PTNEITDDFC
AQQKTAFGDN NHFKTIGGLP AMGKSLEKMV LVLSIWDDHA VYMNWLDSNY PTDADPTKPG VARGRCDPEA
GVPETVEAAH PDAYVIYSNI KIGALNSTFA AA
145230535 Aspergillus niger MSSFQVYRAA LLLSILATAN AQQVGTYTTE THPSLTWQTC TSDGSCTTND GEVVIDANWR WVHSTSSATN
CYTGNEWDTS ICTDDVTCAA NCALDGATYE ATYGVTTSGS ELRLNFVTQG SSKNIGSRLY LMSDDSNYEL
FKLLGQEFTF DVDVSNLPCG LNGALYFVAM DADGGTSEYS GNKAGAKYGT GYCDSQCPRD LKFINGEANC
DGWEPSSNNV NTGVGDHGSC CAEMDVWEAN SISNAFTAHP CDSVSQTMCD GDSCGGTYSA SGDRYSGTCD
PDGCDYNPYR LGNTDFYGPG LTVDTNSPFT VVTQFITDDG TSSGTLTEIK RLYVQNGEVI ANGASTYSSV
NGSSITSAFC ESEKTLFGDE NVFDKHGGLE GMGEAMAKGM VLVLSLWDDY AADMLWLDSD YPVNSSASTP
GVARGTCSTD SGVPATVEAE SPNAYVTYSN IKFGPIGSTY SSGSSSGSGS SSSSSSTTTK ATSTTLKTTS
TTSSGSSSTS AAQAYGQCGG QGWTGPTTCV SGYTCTYENA YYSQCL
46241266 Nectria haematococca MYRAIATASA LLATARAQQV CTLNTENKPA LTWAKCTSSG CSNVRGSVVV DANWRWAHST SSSTNCYTGN
mpVI TWDKTLCPDG KTCADKCCLD GADYSGTYGV TSSGNQLNLK FVTVGPYSTN VGSRLYLMED ENNYQMFDLL
GNEFTFDVDV NNIGCGLNGA LYFVSMDKDG GKSRFSTNKA GAKYGTGYCD AQCPRDVKFI NGVANSDEWK
PSDSDKNAGV GKYGTCCPEM DIWEANKIST AYTPHPCKSL TQQSCEGDAC GGTYSATRYA GTCDPDGCDF
NPYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FIKGSDGKLS EIKRLYVQNG KVIGNPQSEI ANNPGSSVTD
SFCKAQKVAF NDPDDFNKKG GWSGMSDALA KPMVLVMSLW HDHYANMLWL DSTYPKGSKT PGSARGSCPE
DSGDPDTLEK EVPNSGVSFS NIKFGPIGST YTGTGGSNPD PEEPEEPEEP VGTVPQYGQC GGINYSGPTA
CVSPYKCNKI NDFYSQCQ
1q9h (PDB) # Talaromyces emersonii EQAGTATAEN HPPLTWQECT APGSCTTQNG AVVLDANWRW VHDVNGYTNC YTGNTWDPTY CPDDETCAQN
CALDGADYEG TYGVTSSGSS LKLNFVTGSN VGSRLYLLQD DSTYQIFKLL NREFSFDVDV SNLPCGLNGA
LYFVAMDADG GVSKYPNNKA GAKYGTGYCD SQCPRDLKFI DGEANVEGWQ PSSNNANTGI GDHGSCCAEM
DVWEANSISN AVTPHPCDTP GQTMCSGDDC GGTYSNDRYA GTCDPDGCDF NPYRMGNTSF YGPGKIIDTT
KPFTVVTQFL TDDGTDTGTL SEIKRFYIQN SNVIPQPNSD ISGVTGNSIT TEFCTAQKQA FGDTDDFSQH
GGLAKMGAAM QQGMVLVMSL WDDYAAQMLW LDSDYPTDAD PTTPGIARGT CPTDSGVPSD VESQSPNSYV
TYSNIKFGPI NSTFTAS
157362170 Polyporus arcularius MFPTLALVSL SFLAIAYGQQ VGTLTAETHP KLSVSQCTAG GSCTTVQRSV VLDSNWRWLH DVGGSTNCYT
GNTWDDSLCP DPTTCAANCA LDGADYSGTY GITTSGNALS LKFVTQGPYS TNIGSRVYLL SEDDSTYEMF
NLKNQEFTFD VDMSALPCGL NGALYFVEMD KDGGSGRFPT NKAGSKYGTG YCDTQCPHDI KFINGEANVL
DWAGSSNDPN AGTGHYGTCC NEMDIWEANS MGAAVTPHVC TVQGQTRCEG TDCGDGDERY DGICDKDGCD
FNSWRMGDQT FLGPGKTVDT SSKFTVVTQF ITADNTTSGD LSEIRRLYVQ NGKVIANSKT QIAGMDAYDS
ITDDFCNAQK TTFGDTNTFE QMGGLATMGD AFETGMVLVM SIWDDHEAKM LWLDSDYPTD ADASAPGVSR
GPCPTTSGDP TDVESQSPGA TVIFSNIKTG PIGSTFTS
7804885 Leptosphaeria MLSASKAAAI LAFCAHTASA WVVGDQQTET HPKLNWQRCT GKGRSSCTNV NGEVVIDANW RWLAHRSGYT
maculans NCYTGSEWNQ SACPNNEACT KNCAIEGSDY AGTYGITTSG NQMNIKFITK RPYSTNIGAR TYLMKDEQNY
EMFQLIGNEF TFDVDLSQRC GMNGALYFVS MPQKGQGAPG AKYGTGYCDA QCARDLKFVR GSANAEGWTK
SASDPNSGVG KKGACCAQMD VWEANSAATA LTPHSCQPAG YSVCEDTNCG GTYSEDRYAG TCDANGCDFN
PFRVGVKDFY GKGKTVDTTK KMTVVTQFVG SGNQLSEIKR FYVQDGKVIA NPEPTIPGME WCNTQKKVFQ
EEAYPFNEFG GMASMSEGMS QGMVLVMSLW DDHYANMLWL DSNWPREADP AKPGVARRDC PTSGGKPSEV
EAANPNAQVM FSNIKFGPIG STFAHAA
121852 Phanerochaete MFRTATLLAF TMAAMVFGQQ VGTNTARSHP ALTSQKCTKS GGCSNLNTKI VLDANWRWLH STSGYTNCYT
chrysosporium GNQWDATLCP DGKTCAANCA LDGADYTGTY GITASGSSLK LQFVTGSNVG SRVYLMADDT HYQMFQLLNQ
EFTFDVDMSN LPCGLNGALY LSAMDADGGM AKYPTNKAGA KYGTGYCDSQ CPRDIKFING EANVEGWNAT
SANAGTGNYG TCCTEMDIWE ANNDAAAYTP HPCTTNAQTR CSGSDCTRDT GLCDADGCDF NSFRMGDQTF
LGKGLTVDTS KPFTVVTQFI TNDGTSAGTL TEIRRLYVQN GKVIQNSSVK IPGIDPVNSI TDNFCSQQKT
AFGDTNYFAQ HGGLKQVGEA LRTGMVLALS IWDDYAANML WLDSNYPTNK DPSTPGVARG TCATTSGVPA
QIEAQSPNAY VVFSNIKFGD LNTTYTGTVS SSSVSSSHSS TSTSSSHSSS STPPTQPTGV TVPQWGQCGG
IGYTGSTTCA SPYTCHVLNP YYSQCY
126013214 Penicillium decumbens MYQRALLFSA LMAGVSAQQV GTQKPETHPP LAWKECTSSG CTSKDGSVVI DANWRWVHSV DGYKNCYTGN
EWDSTLCPDD ATCATNCAVD GADYAGTYGA TTEGDSLSIN FVTGSNIGSR FYLMEDENKY QMFKLLNKEF
TFDVDVSTLP CGLNGALYFV SMDADGGMSK YETNKAGAKY GTGYCDSQCP RDLKFINGKG NVEGWKPSAN
DKNAGVGPHG SCCAEMDIWE ANSISTALTP HPCDTNGQTI CEGDSCGGTY STTRYAGTCD PDGCDFNPFR
MGNESFYGPG KMVDTKSKMT VVTQFITSDG TDTGSLKEIK RVYVQNGKVI ANSASDVSGI TGNSITSDFC
TAQKKTFGDE DVFNKHGGLS GMGDALGEGM VLVMSLWDDH NSNMLWLDGE KYPTDAAASK AGVSRGTCST
DSGKPSTVES ESGSAKVVFS NIKVGSIGST FSA
156048578 Sclerotinia MTSKIALASL FAAAYGQQIG TYTTETHPSL TWQSCTAKGS CTTQSGSIVL DGNWRWTHST TSSTNCYTGN
sclerotiorum 1980 TWDATLCPDD ATCAQNCALD GADYSGTYGI TTSGDSLRLN FVTQTANKNV GSRVYLLADN THYKTFNLLN
QEFTFDVDVS NLPCGLNGAV YFANLPADGG ISSTNKAGAQ YGTGYCDSQC PRDGKFINGK ANVDGWVPSS
NNPNTGVGNY GSCCAEMDIW EANSISTAVT PHSCDTVTQT VCTGDNCGGT YSTTRYAGTC DPDGCDFNPY
RQGNESFYGP GKTVDTNSVF TIVTQFLTTD GTSSGTLNEI KRFYVQNGKV IPNSESTISG VTGNSITTPF
CTAQKTAFGD PTSFSDHGGL ASMSAAFEAG MVLVLSLWDD YYANMLWLDS TYPTTKTGAG GPRGTCSTSS
GVPASVEASS PNAYVVYSNI KVGAINSTFG
156712278 Acremonium MYTKFAALAA LVATVRGQAA CSLTAETHPS LQWQKCTAPG SCTTVSGQVT IDANWRWLHQ TNSSTNCYTG
thermophilum NEWDTSICSS DTDCATKCCL DGADYTGTYG VTASGNSLNL KFVTQGPYSK NIGSRMYLME SESKYQGFTL
LGQEFTFDVD VSNLGCGLNG ALYFVSMDLD GGVSKYTTNK AGAKYGTGYC DSQCPRDLKF INGQANIDGW
QPSSNDANAG LGNHGSCCSE MDIWEANKVS AAYTPHPCTT IGQTMCTGDD CGGTYSSDRY AGICDPDGCD
FNSYRMGDTS FYGPGKTVDT GSKFTVVTQF LTGSDGNLSE IKRFYVQNGK VIPNSESKIA GVSGNSITTD
FCTAQKTAFG DTNVFEERGG LAQMGKALAE PMVLVLSVWD DHAVNMLWLD STYPTDSTKP GAARGDCPIT
SGVPADVESQ APNSNVIYSN IRFGPINSTY TGTPSGGNPP GGGTTTTTTT TTSKPSGPTT TTNPSGPQQT
HWGQCGGQGW TGPTVCQSPY TCKYSNDWYS QCL
21449327 Aspergillus nidulans MYQRALLFSA LLSVSRAQQA GTAQEEVHPS LTWQRCEASG SCTEVAGSVV LDSNWRWTHS VDGYTNCYTG
(also known as NEWDATLCPD NESCAQNCAV DGADYEATYG ITSNGDSLTL KFVTGSNVGS RVYLMEDDET YQMFDLLNNE
Emericella nidulans) FTFDVDVSNF PCGLNGALYF TSMDADGGLS KYEGNTAGAK YGTGYCDSQC PRDIKFINGL GNVEGWEPSD
SDANAGVGGM GTCCPEMDIW EANSISTAYT PHPCDSVEQT MCEGDSCGGT YSDDRYGGTC DPDGCDFNSY
RMGNTRFYGP GAIIDTSSKF TVVTQFIADG GSLSEIKRFY VQNGEVIPNS ESNISGVEGN SITSEFCTAQ
KTAFGDEDIF AQHGGLSAMG DAASAMVLIL SIWDDHHSSM MWLDSSYPTD ADPSQPGVAR GTCEQGAGDP
DVVESEHADA SVTFSNIKFG PIGSTF
171683762 Podospora anserina (S MMMKQYLQYL AAGSLMTGLV AGQGVGTQQT ETHPRITWKR CTGKANCTTV QAEVVIDSNW RWIHTSGGTN
mat+) CYDGNAWNTA ACSTATDCAS KCLMEGAGNY QQTYGASTSG DSLTLKFVTK HEYGTNVGSR FYLMNGASKY
QMFTLMNNEF TFDVDLSTVE CGLNSALYFV AMEEDGGMRS YPTNKAGAKY GTGYCDAQCA RDLKFVGGKA
NIEGWRESSN DENAGVGPYG GCCAEIDVWE SNAHAYAFTP HACENNNYHV CERDTCGGTY SEDRFAGGCD
ANGCDYNPYR MGNPDFYGKG KTVDTTKKFT VVTRFQDDNL EQFFVQNGQK ILAPAPTFDG IPASPNLTPE
FCSTQFDVFT DRNRFREVGD FPQLNAALRI PMVLVMSIWA DHYANMLWLD SVYPPEKEGE PGAARGPCAQ
DSGVPSEVKA NYPNAKVVWS NIRFGPIGST VNV
56718412 Thermoascus MYQRALLFSF FLAAARAQQA GTVTAENHPS LTWQQCSSGG SCTTQNGKVV IDANWRWVHT TSGYTNCYTG
aurantiacus var NTWDTSICPD DVTCAQNCAL DGADYSGTYG VTTSGNALRL NFVTQSSGKN IGSRLYLLQD DTTYQIFKLL
levisporus GQEFTFDVDV SNLPCGLNGA LYFVAMDADG GLSKYPGNKA GAKYGTGYCD SQCPRDLKFI NGQANVEGWQ
PSANDPNAGV GNHGSCCAEM DVWEANSIST AVTPHPCDTP GQTMCQGDDC GGTYSSTRYA GTCDPDGCDF
NPYRQGNHSF YGPGKIVDTS SKFTVVTQFI TDDGTPSGTL TEIKRFYVQN GKVIPQSEST ISGVTGNSIT
TEYCTAQKAA FGDNTGFFTH GGLQKISQAL AQGMVLVMSL WDDHAANMLW LDSTYPTDAD PDTPGVARGT
CPTTSGVPAD VESQNPNSYV IYSNIKVGPI NSTFTAN
15824273 Pseudotrichonympha MFAIVLLGLT RSLGTGTNQA ENHPSLSWQN CRSGGSCTQT SGSVVLDSNW RWTHDSSLTN CYDGNEWSSS
grassii LCPDPKTCSD NCLIDGADYS GTYGITSSGN SLKLVFVTNG PYSTNIGSRV YLLKDESHYQ IFDLKNKEFT
FTVDDSNLDC GLNGALYFVS MDEDGGTSRF SSNKAGAKYG TGYCDAQCPH DIKFINGEAN VENWKPQTND
ENAGNGRYGA CCTEMDIWEA NKYATAYTPH ICTVNGEYRC DGSECGDTDS GNRYGGVCDK DGCDFNSYRM
GNTSFWGPGL IIDTGKPVTV VTQFVTKDGT DNGQLSEIRR KYVQGGKVIE NTVVNIAGMS SGNSITDDFC
NEQKSAFGDT NDFEKKGGLS GLGKAFDYGM VLVLSLWDDH QVNMLWLDSI YPTDQPASQP GVKRGPCATS
SGAPSDVESQ HPDSSVTFSD IRFGPIDSTY
115390801 Aspergillus terreus MHQRALLFSA LVGAVRAQQA GTLTEEVHPP LTWQKCTADG SCTEQSGSVV IDSNWRWLHS TNGSTNCYTG
NIH2624 NTWDESLCPD NEACAANCAL DGADYESTYG ITTSGDALTL TFVTGENVGS RVYLMAEDDE SYQTFDLVGN
EFTFDVDVSN LPCGLNGALY FTSMDADGGV SKYPANKAGA KYGTGYCDSQ CPRDLKFING MANVEGWTPS
DNDKNAGVGG HGSCCPELDI WEANSISSAF TPHPCDDLGQ TMCSGDDCGG TYSETRYAGT CDPDGCDFNA
YRMGNTSYYG PDKIVDTNSV MTVVTQFIGD GGSLSEIKRL YVQNGKVIAN AQSNVDGVTG NSITSDFCTA
QKTAFGDQDI FSKHGGLSGM GDAMSAMVLI LSIWDDHNSS MMWLDSTYPE DADASEPGVA RGTCEHGVGD
PETVESQHPG ATVTFSKIKF GPIGSTYSSN STA
453223 Phanerochaete MFRAAALLAF TCLAMVSGQQ AGTNTAENHP QLQSQQCTTS GGCKPLSTKV VLDSNWRWVH STSGYTNCYT
chrysosporium GNEWDTSLCP DGKTCAANCA LDGADYSGTY GITSTGTALT LKFVTGSNVG SRVYLMADDT HYQLLKLLNQ
EFTFDVDMSN LPCGLNGALY LSAMDADGGM SKYPGNKAGA KYGTGYCDSQ CPKDIKFING EANVGNWTET
GSNTGTGSYG TCCSEMDIWE ANNDAAAFTP HPCTTTGQTR CSGDDCARNT GLCDGDGCDF NSFRMGDKTF
LGKGMTVDTS KPFTVVTQFL TNDNTSTGTL SEIRRIYIQN GKVIQNSVAN IPGVDPVNSI TDNFCAQQKT
AFGDTNWFAQ KGGLKQMGEA LGNGMVLALS IWDDHAANML WLDSDYPTDK DPSAPGVARG TCATTSGVPS
DVESQVPNSQ VVFSNIKFGD IGSTFSGTSS PNPPGGSTTS SPVTTSPTPP PTGPTVPQWG QCGGIGYSGS
TTCASPYTCH VLNPCESILS LQRSSNADQY LQTTRSATKR RLDTALQPRK
3132 Phanerochaete MRTALALILA LAAFSAVSAQ QAGTITAETH PTLTIQQCTQ SGGCAPLTTK VVLDVNWRWI HSTTGYTNCY
chrysosporium SGNTWDAILC PDPVTCAANC ALDGADYTGT FGILPSGTSV TLRPVDGLGL RLFLLADDSH YQMFQLLNKE
FTFDVEMPNM RCGSSGAIHL TAMDADGGLA KYPGNQAGAK YGTGFCSAQC PKGVKFINGQ ANVEGWLGTT
ATTGTGFFGS CCTDIALWEA NDNSASFAPH PCTTNSQTRC SGSDCTADSG LCDADGCNFN SFRMGNTTFF
GAGMSVDTTK LFTVVTQFIT SDNTSMGALV EIHRLYIQNG QVIQNSVVNI PGINPATSIT DDLCAQENAA
FGGTSSFAQH GGLAQVGEAL RSGMVLALSI VNSAADTLWL DSNYPADADP SAPGVARGTC PQDSASIPEA
PTPSVVFSNI KLGDIGTTFG AGSALFSGRS PPGPVPGSAP ASSATATAPP FGSQCGGLGY AGPTGVCPSP
YTCQALNIYY SQCI
16304152 Thermoascus MYQRALLFSF FLAAARAHEA GTVTAENHPS LTWQQCSSGG SCTTQNGKVV IDANWRWVHT TSGYTNCYTG
aurantiacus NTWDTSICPD DVTCAQNCAL DGADYSGTYG VTTSGNALRL NFVTQSSGKN IGSRLYLLQD DTTYQIFKLL
GQEFTFDVDV SNLPCGLNGA LYFVAMDADG NLSKYPGNKA GAKYGTGYCD SQCPRDLKFI NGQANVEGWQ
PSANDPNAGV GNHGSSCAEM DVWEANSIST AVTPHPCDTP GQTMCQGDDC GGTYSSTRYA GTCDTDGCDF
NPYQPGNHSF YGPGKIVDTS SKFTVVTQFI TDDGTPSGTL TEIKRFYVQN GKVIPQSEST ISGVTGNSIT
TEYCTAQKAA FDNTGFFTHG GLQKISQALA QGMVLVMSLW DDHAANMLWL DSTYPTDADP DTPGVARGTC
PTTSGVPADV ESQNPNSYVI YSNIKVGPIN STFTAN
156712280 Acremonium MHKRAATLSA LVVAAAGFAR GQGVGTQQTE THPKLTFQKC SAAGSCTTQN GEVVIDANWR WVHDKNGYTN
thermophilum CYTGNEWNTT ICADAASCAS NCVVDGADYQ GTYGASTSGN ALTLKFVTKG SYATNIGSRM YLMASPTKYA
MFTLLGHEFA FDVDLSKLPC GLNGAVYFVS MDEDGGTSKY PSNKAGAKYG TGYCDSQCPR DLKFIDGKAN
SASWQPSSND QNAGVGGMGS CCAEMDIWEA NSVSAAYTPH PCQNYQQHSC SGDDCGGTYS ATRFAGDCDP
DGCDWNAYRM GVHDFYGNGK TVDTGKKFSI VTQFKGSGST LTEIKQFYVQ DGRKIENPNA TWPGLEPFNS
ITPDFCKAQK QVFGDPDRFN DMGGFTNMAK ALANPMVLVL SLWDDHYSNM LWLDSTYPTD ADPSAPGKGR
GTCDTSSGVP SDVESKNGDA TVIYSNIKFG PLDSTYTAS
5231154 Volvariella volvacea MRASLLAFSL NSAAGQQAGT LQTKNHPSLT SQKCRQGGCP QVNTTIVLDA NWRWTHSTSG STNCYTGNTW
QATLCPDGKT CAANCALDGA DYTGTYGVTT SGNSLTLQFV TQSNVGARLG YLMADDTTYQ MFNLLNQEFW
FDVDMSNLPC GLNGALYFSA MARTAAWMPM VVCASTPLIS TRRSTARLLR LPVPPRSRYG RGICDSQCPR
DIKFINGEAN VQGWQPSPND TNAGTGNYGA CCNKMDVWEA NSISTAYTPH PCTQRGLVRC SGTACGGGSN
RYGSICDHDG LGFQNLFGMG RTRVRARVGR VKQFNRSSRV VEPISWTKQT TLHLGNLPWK SADCNVQNGR
VIQNSKVNIP GMPSTMDSVT TEFCNAQKTA FNDTFSFQQK GGMANMSEAL RRGMVLVLSI WDDHAANMLW
LDSITSAAAC RSTPSEVHAT PLRESQIRSS HSRQTRYVTF TNIKFGPFNS TGTTYTTGSV PTTSTSTGTT
GSSTPPQPTG VTVPQGQCGG IGYTGPTTCA SPTTCHVLNP YYSQCY
116200349 Chaetomium globosum MKQYLQYLAA ALPLMSLVSA QGVGTSTSET HPKITWKKCS SGGSCSTVNA EVVIDANWRW LHNADSKNCY
CBS 148-51 DGNEWTDACT SSDDCTSKCV LEGAEYGKTY GASTSGDSLS LKFLTKHEYG TNIGSRFYLM NGASKYQMFT
LMNNEFAFDV DLSTVECGLN SALYFVAMEE DGGMASYSTN KAGAKYGTGY CDAQCARDLK FVGGKANYDG
WTPSSNDANA GVGALGGCCA EIDVWESNAH AFAFTPHACE NNNYHVCEDT TCGGTYSEDR FAGDCDANGC
DYNPYRVGNT DFYGKGMTVD TSKKFTVVSQ FQENKLTQFF VQNGKKIEIP GPKHEGLPTE SSDITPELCS
AMPEVFGDRD RFAEVGGFDA LNKALAVPMV LVMSIWDDHY ANMLWLDSSY PPEKAGTPGG DRGPCAQDSG
VPSEVESQYP DATVVWSNIR FGPIGSTVQV
4586343 Irpex lacteus MFPKASLIAL SFIAAVYGQQ VGTQMAEVHP KLPSQLCTKS GCTNQNTAVV LDANWRWLHT TSGYTNCYTG
NSWDATLCPD ATTCAQNCAV DGADYSGTYG ITTSGNALTL KFKTGTNVGS RVYLMQTDTA YQMFQLLNQE
FTFDVDMSNL PCGLNGALYL SQMDQDGGLS KFPTNKAGAK YGTGYCDSQC PHDIKFINGM ANVAGWAGSA
SDPNAGSGTL GTCCSEMDIW EANNDAAAFT PHPCSVDGQT QCSGTQCGDD DERYSGLCDK DGCDFNSFRM
GDKSFLGKGM TVDTSRKFTV VTQFVTTDGT TNGDLHEIRR LYVQDGKVIQ NSVVSIPGID AVDSITDNFC
AQQKSVFGDT NYFATLGGLK KMGAALKSGM VLAMSVWDDH AASMQWLDSN YPADGDATKP GVARGTCSAD
SGLPTNVESQ SASASVTFSN IKWGDINTTF TGTGSTSPSS PAGPVSSSTS VASQPTQPAQ GTVAQWGQCG
GTGFTGPTVC ASPFTCHVVN PYYSQCY
15321718 Lentinula edodes MFRTAALLSF AYLAVVYGQQ AGTSTAETHP PLTWEQCTSG GSCTTQSSSV VLDSNWRWTH VVGGYTNCYT
GNEWNTTVCP DGTTCAANCA LDGADYEGTY GISTSGNALT LKFVTASAQT NVGSRVYLMA PGSETEYQMF
NPLNQEFTFD VDVSALPCGL NGALYFSEMD ADGGLSEYPT NKAGAKYGTG YCDSQCPRDI KFIEGKANVE
GWTPSSTSPN AGTGGTGICC NEMDIWEANS ISEALTPHPC TAQGGTACTG DSCSSPNSTA GICDQAGCDF
NSFRMGDTSF YGPGLTVDTT SKITVVTQFI TSDNTTTGDL TAIRRIYVQN GQVIQNSMSN IAGVTPTNEI
TTDFCDQQKT AFGDTNTFSE KGGLTGMGAA FSRGMVLVLS IWDDDAAEML WLDSTYPVGK TGPGAARGTC
ATTSGQPDQV ETQSPNAQVV FSNIKFGAIG STFSSTGTGT GTGTGTGTGT GTTTSSAPAA TQTKYGQCGG
QGWTGATVCA SGSTCTSSGP YYSQCL
146424875 Pleurotus sp Florida MFRTAALTAF TFAAVVLGQQ VGTLTTENHP ALSIQQCTAT GCTTQQKSVV LDSNWRWTHS TAGATNCYTG
NAWDPALCPD PATCATNCAI DGADYSGTYG ITTSGNALTL RFVTNGQYSQ NIGSRVYLLD DADHYKLFDL
KNQEFTFDVD MSGLPCGLNG ALYFSEMAAD GGKAAHAGNN AGAKYGTGYC DAQCPHDIKW INGEANVLDW
SASATDDNAG NGRYGACCAE MDIWEANSEA TAYTPHVCRD EGLYRCSGTE CGDGNNRYGG VCDKDGCDFN
SYRMGDKNFL GRGKTIDTTK KVTVVTQFIT DNNTPTGNLV EIRRVYVQNG VVYQNSFSTF PSLSQYNSIS
DEFCVAQKTL FGDNQYYNTH GGTTKMGDAF DNGMVLIMSL WSDHAAHMLW LDSDYPLDKS PSEPGVSRGA
CPTSSGDPDD VVANHPNASV TFSNIKYGPI GSTFGGSTPP VSSGGSSVPP VTSTTSSGTT TPTGPTGTVP
KWGQCGGIGY SGPTACVAGS TCTYSNDWYS QCL
62006158 Fusarium venenatum MYRAIATASA LIAAVRAQQV CSLTPETKPA LSWSKCTSSG CSNVQGSVTI DANWRWTHQL SGSTNCYTGN
KWDTSICTSG KVCAEKCCID GAEYASTYGI TSSGNQLSLS FVTKGTYGTN IGSRTYLMED ENTYQMFQLL
GNEFTFDVDV SNIGCGLNGA LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWQ
PSKSDVNGGI GNLGTCCPEM DIWEANSIST AHTPHPCTKL TQHSCTGDSC GGTYSEDRYG GTCDADGCDF
NAYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS EITRLYVQNG KVIANSESKI AGVPGSSLTP
EFCTAQKKVF GDIDDFEKKG AWGGMSDALE APMVLVMSLW HDHHSNMLWL DSTYPTDSTK LGAQRGSCST
SSGVPADLEK NVPNSKVAFS NIKFGPIGST YKEGQPEPTN PTNPNPTTPG GTVDQWGQCG GTNYSGPTAC
KSPFTCKKIN DFYSQCQ
296027 Phanerochaete MFRTATLLAF TMAAMVFGQQ VGTNTAENHR TLTSQKCTKS GGCSNLNTKI VLDANWRWLH STSGYTNCYT
chrysosporium GNQWDATLCP DGKTCAANCA LDGADYTGTY GITASGSSLK LQFVTGSNVG SRVYLMADDT HYQMFQLLNQ
EFTFDVDMSN LPCGLNGALY LSAMDADGGM AKYPTNKAGA KYGTGYCDSQ CPRDIKFING EANVEGWNAT
SANAGTGNYG TCCTEMDIWE ANNDAAAYTP HPCTTNAQTR CSGSDCTRDT GLCDADGCDF NSFRMGDQTF
LGKGLTVDTS KPFTVVTQFI TNDGTSAGTL TEIRRLYVQN GKVIQNSSVK IPGIDLVNSI TDNFCSQQKT
AFGDTNYFAQ HGGLKQVGEA LRTGMVLALS IWDDYAANML WLDSNYPTNK DPSTPGVARG TCATTSGVPA
QIEAQSPNAY VVFSNIKFGD LNTTYTGTVS SSSVSSSHSS TSTSSSHSSS STPPTQPTGV TVPQWGQCGG
IGYTGSTTCA SPYTCHVLNP YYSQCY
154449709 Fusicoccum sp MYQTSLLASL SFLLATSQAQ QVGTQTAETH PKLTTQKCTT AGGCTDQSTS IVLDANWRWL HTVDGYTNCY
BCC4124 TGQEWDTSIC TDGKTCAEKC ALDGADYEST YGISTSGNAL TMNFVTKSSQ TNIGGRVYLL AADSDDTYEL
FKLKNQEFTF DVDVSNLPCG LNGALYFSEM DSDGGLSKYT TNKAGAKYGT GYCDTQCPHD IKFINGEANV
QNWTASSTDK NAGTGHYGSC CNEMDIWEAN SQATAFTPHV CEAKVEGQYR CEGTECGDGD NRYGGVCDKD
GCDFNSYRMG NETFYGSNGS TIDTTKKFTV VTQFITADNT ATGALTEIRR KYVQNDVVIE NSYADYETLS
KFNSITDDFC AAQKTLSGDT NDFKTKGGIA RMGESFERGM VLVMSVWDDH AANALWLDSS YPTDADASKP
GVKRGPCSTS SGVPSDVEAN DADSSVIYSN IRYGDIGSTF NKTA
169859460 Coprinopsis cinerea MFSKVALTAL CFLAVAQAQQ VGREVAENHP RLPWQRCTRN GGCQTVSNGQ VVLDANWRWL HVTDGYTNCY
okayama TGNAWNSSVC SDGATCAQRC ALEGANYQQT YGITTSGDAL TIKFLTRSEQ TNIGARVYLM ENEDRYQMFN
LLNKEFTFDV DVSKVPCGIN GALYFIQMDA DGGLSSQPNN RAGAKYGTGY CDSQCPRDIK FINGEANSVG
WEPSETDPNA GKGQYGICCA EMDIWEANSI SNAYTPHPCQ TVNDGGYQRC QGRDCNQPRY EGLCDPDGCD
YNPFRMGNKD FYGPGKTVDT NRKMTVVTQF ITHDNTDTGT LVDIRRLYVQ DGRVIANPPT NFPGLMPAHD
SITQEFCDDA KRAFEDNDSF GRNGGLAHMG RSLAKGHVLA LSIWNDHTAH MLWLDSNYPT DADPNKPGIA
RGTCPTTGGS PRDTEQNHPD AQVIFSNIKF GDIGSTFSGN
50400675 Trichoderma MYRKLAVISA FLAAARAQQV CTQQAETHPP LTWQKCTASG CTPQQGSVVL DANWRWTHDT KSTTNCYDGN
harzianum (anamorph TWSSTLCPDD ATCAKNCCLD GANYSGTYGV TTSGDALTLQ FVTASNVGSR LYLMANDSTY QEFTLSGNEF
of Hypocrea lixii) SFDVDVSQLP CGLNGALYFV SMDADGGQSK YPGNAAGAKY GTGYCDSQCP RDLKFINGQA NVEGWEPSSN
NANTGVGGHG SCCSEMDIWE ANSISEALTP HPCETVGQTM CSGDSCGGTY SNDRYGGTCD PDGCDWNPYR
LGNTSFYGPG SSFALDTTKK LTVVTQFATD GSISRYYVQN GVKFQQPNAQ VGSYSGNTIN TDYCAAEQTA
FGGTSFTDKG GLAQINKAFQ GGMVLVMSLW DDYAVNMLWL DSTYPTNATA STPGAKRGSC STSSGVPAQV
EAQSPNSKVI YSNIRFGPIG STGGNTGSNP PGTSTTRAPP SSTGSSPTAT QTHYGQCGGT GWTGPTRCAS
GYTCQVLNPF YSQCL
729649 Neurospora crassa MRASLLAFSL AAAVAGGQQA GTLTAKRHPS LTWQKCTRGG CPTLNTTMVL DANWRWTHAT SGSTKCYTGN
(OR74A) KWQATLCPDG KSCAANCALD GADYTGTYGI TGSGWSLTLQ FVTDNVGARA YLMADDTQYQ MLELLNQELW
FDVDMSNIPC GLNGALYLSA MDADGGMRKY PTNKAGAKYA TGYCDAQCPR DLKYINGIAN VEGWTPSTND
ANGIGDHGSC CSEMDIWEAN KVSTAFTPHP CTTIEQHMCE GDSCGGTYSD DRYGVLCDAD GCDFNSYRMG
NTTFYGEGKT VDTSSKFTVV TQFIKDSAGD LAEIKAFYVQ NGKVIENSQS NVDGVSGNSI TQSFCKSQKT
AFGDIDDFNK KGGLKQMGKA LAQAMVLVMS IWDDHAANML WLDSTYPVPK VPGAYRGSGP TTSGVPAEVD
ANAPNSKVAF SNIKFGHLGI SPFSGGSSGT PPSNPSSSAS PTSSTAKPSS TSTASNPSGT GAAHWAQCGG
IGFSGPTTCP EPYTCAKDHD IYSQCV
119472134 Neosartorya fischeri MLASTFSYRM YKTALILAAL LGSGQAQQVG TSQAEVHPSM TWQSCTAGGS CTTNNGKVVI DANWRWVHKV
NRRL 181 GDYTNCYTGN TWDKTLCPDD ATCASNCALE GANYQSTYGA TTSGDSLRLN FVTTSQQKNI GSRLYMMKDD
TTYEMFKLLN QEFTFDVDVS NLPCGLNGAL YFVAMDADGG MSKYPTNKAG AKYGTGYCDS QCPRDLKFIN
GQANVEGWQP SSNDANAGTG NHGSCCAEMD IWEANSISTA FTPHPCDTPG QVMCTGDACG GTYSSDRYGG
TCDPDGCDFN SFRQGNKTFY GPGMTVDTKS KFTVVTQFIT DDGTASGTLK EIKRFYVQNG KVIPNSESTW
SGVGGNSITN DYCTAQKSLF KDQNVFAKHG GMEGMGAALA QGMVLVMSLW DDHAANMLWL DSNYPTTASS
STPGVARGTC DISSGVPADV EANHPDASVV YSNIKVGPIG STFNSGGSNP GGGTTTTAKP TTTTTTAGSP
GGTGVAQHYG QCGGNGWQGP TTCASPYTCQ KLNDFYSQCL
117935080 Chaetomium MQIKQYLQYL AAALPLVNMA AAQRAGTQQT ETHPRLSWKR CSSGGNCQTV NAEIVIDANW
thermophilum RWLHDSNYQN CYDGNRWTSA CSSATDCAQK CYLEGANYGS TYGVSTSGDA LTLKFVTKHE
YGTNIGSRVY LMNGSDKYQM FTLMNNEFAF DVDLSKVECG LNSALYFVAM EEDGGMRSYS
SNKAGAKYGT GYCDAQCARD LKFVGGKANI EGWRPSTNDA NAGVGPYGAC CAEIDVWESN
AYAFAFTPHG CLNNNYHVCE TSNCGGTYSE DRFGGLCDAN GCDYNPYRMG NKDFYGKGKT
VDTSRKFTVV TRFEENKLTQ FFIQDGRKID IPPPTWPGLP NSSAITPELC TNLSKVFDDR DRYEETGGFR
TINEALRIPM VLVMSIWDGH YASMLWLDSV YPPEKAGQPG AERGPCAPTS GVPAEVEAQF
PNAQVIWSNI RFGPIGSTYQ V
154300584 Botryotinia fuckeliana MTSRIALVSL FAAVYGQQVG TYQTETHPSL TWQSCTAKGS CTTNTGSIVL DGNWRWTHGV
B05-10 GTSTNCYTGN TWDATLCPDD ATCAQNCALE GADYSGTYGI TTSGNSLRLN FVTQSANKNI
GSRVYLMADT THYKTFNLLN QEFTFDVDVS NLPCGLNGAV YFANLPADGG ISSTNTAGAE
YGTGYCDSQC PRDMKFIKGQ ANVDGWVPSS NNANTGVGNH GSCCAEMDIW EANSISTAVT
PHSCDTVTQT VCTGDDCGGT YSSSRYAGTC DPDGCDFNSY RMGDETFYGP GKTVDTNSVF
TVVTQFLTTD GTASGTLNEI KRFYVQDGKV IPNSYSTISG VSGNSITTPF CDAQKTAFGD PTSFSDHGGL
ASMSAAFEAG MVLVLSLWDD YYANMLWLDS TYPVGKTSAG GPRGTCDTSS GVPASVEASS
PNAYVVYSNI KVGAINSTYG
15824271 Pseudotrichonympha MFVFVLLWLT QSLGTGTNQA ENHPSLSWQN CRSGGSCTQT SGSVVLDSNW RWTHDSSLTN
grassii CYDGNEWSSS LCPDPKTCSD NCLIDGADYS GTYGITSSGN SLKLVFVTNG PYSTNIGSRV
YLLKDESHYQ IFDLKNKEFT FTVDDSNLDC GLNGALYFVS MDEDGGTSRF SSNKAGAKYG
TGYCDAQCPH DIKFINGEAN VENWKPQTND ENAGNGRYGA CCTEMDIWEA NKYATAYTPH
ICTVNGEYRC DGSECGDTDS GNRYGGVCDK DGCDFNSYRM GNTSFWGPGL IIDTGKPVTV
VTQFVTKDGT DNGQLSEIRR KYVQGGKVIE NTVVNIAGMS SGNSITDDFC NEQKSAFGDT
NDFEKKGGLS GLGKAFDYGM VLVLSLWDDH QVNMLWLDSI YPTDQPASQP GVKRGPCATS
SGAPSDVESQ HPDSSVTFSD IRFGPIDSTY
4586345 Irpex lacteus MFRKAALLAF SFLAIAHGQQ VGTNQAENHP SLPSQKCTAS GCTTSSTSVV LDANWRWVHT
TTGYTNCYTG QTWDASICPD GVTCAKACAL DGADYSGTYG ITTSGNALTL QFVKGTNVGS
RVYLLQDASN YQMFQLINQE FTFDVDMSNL PCGLNGAVYL SQMDQDGGVS RFPTNTAGAK
YGTGYCDSQC PRDIKFINGE ANVEGWTGSS TDSNSGTGNY GTCCSEMDIW EANSVAAAYT
PHPCSVNQQT RCTGADCGQG DDRYDGVCDP DGCDFNSFRM GDQTFLGKGL TVDTSRKFTI
VTQFISDDGT TSGNLAEIRR FYVQDGNVIP NSKVSIAGID AVNSITDDFC TQQKTAFGDT
NRFAAQGGLK QMGAALKSGM VLALSLWDDH AANMLWLDSD YPTTADASNP GVARGTCPTT
SGFPRDVESQ SGSATVTYSN IKWGDLNSTF TGTLTTPSGS SSPSSPASTS GSSTSASSSA SVPTQSGTVA
QWAQCGGIGY SGATTCVSPY TCHVVNAYYS QCY
46241268 Gibberella avenacea MYRAIATASA LIAAARAQQV CTLTTETKPA LTWSKCTSSG CTDVKGSVGI DANWRWTHQT
SSSTNCYTGN KWDTSVCTSG ETCAQKCCLD GADYAGTYGI TSSGNQLSLG FVTKGSFSTN
IGSRTYLMEN ENTYQMFQLL GNEFTFDVDV SNIGCGLNGA LYFVSMDADG GKARYPANKA
GAKYGTGYCD AQCPRDVKFI NGKANSDGWK PSDSDINAGI GNMGTCCPEM DIWEANSIST
AFTPHPCTKL TQHACTGDSC GGTYSNDRYG GTCDADGCDF NSYRQGNKTF YGRGSDFNVD
TTKKVTVVTQ FKKGSNGRLS EITRLYVQNG KVIANSESKI PGNSGSSLTA DFCSKQKSVF
GDIDDFSKKG GWSGMSDALE SPPMVLVMSL WHDHHSNMLW LDSTYPTDST KLGAQRGSCA
TTSGVPSDLE RDVPNSKVSF SNIKFGPIGS TYSSGTTNPP PSSTDTSTTP TNPPTGGTVG QYGQCGGQTY
TGPKDCKSPY TCKKINDFYS QCQ
6164684 Aspergillus niger MSSFQIYRAA LLLSILATAN AQQVGTYTTE THPSLTWQTC TSDGSCTTND GEVVIDANWR
WVHSTSSATN CYTGNEWDTS ICTDDVTCAA NCALDGATYE ATYGVTTSGS ELRLNFVTQG
SSKNIGSRLY LMSDDSNYEL FKLLGQEFTF DVDVSNLPCG LNGALYFVAM DADGGTSEYS
GNKAGAKYGT GYCDSQCPRD LKFINGEANC DGWEPSSNNV NTGVGDHGSC CAEMDVWEAN
SISNAFTAHP CDSVSQTMCD GDSCGGTYSA SGDRYSGTCD PDGCDYNPYR LGNTDFYGPG
LTVDTNSPFT VVTQFITDDG TSSGTLTEIK RLYVQNGEVI ANGASTYSSV NGSSITSAFC ESEKTLFGDE
NVFDKHGGLE GMGEAMAKGM VLVLSLWDDY AADMLWLDSD YPVNSSASTP GVARGTCSTD
SGVPATVEAE SPNAYVTYSN IKFGPIGSTY SSGSSSGSGS SSSSSSTTTK ATSTTLKTTS TTSSGSSSTS
AAQAYGQCGG QGWTGPTTCV SGYTCTYENA YYSQCL
6164682 Aspergillus niger MHQRALLFSA LLTAVRAQQA GTLTEEVHPS LTWQKCTSEG SCTEQSGSVV IDSNWRWTHS
VNDSTNCYTG NTWDATLCPD DETCAANCAL DGADYESTYG VTTDGDSLTL KFVTGSNVGS
RLYLMDTSDE GYQTFNLLDA EFTFDVDVSN LPCGLNGALY FTAMDADGGV SKYPANKAGA
KYGTGYCDSQ CPRDLKFIDG QANVDGWEPS SNNDNTGIGN HGSCCPEMDI WEANKISTAL
TPHPCDSSEQ TMCEGNDCGG TYSDDRYGGT CDPDGCDFNP YRMGNDSFYG PGKTIDTGSK
MTVVTQFITD GSGSLSEIKR YYVQNGNVIA NADSNISGVT GNSITTDFCT AQKKAFGDED
IFAEHNGLAG ISDAMSSMVL ILSLWDDYYA SMEWLDSDYP ENATATDPGV ARGTCDSESG
VPATVEGAHP DSSVTFSNIK FGPINSTFSA SA
33733371 Chrysosporium MYAKFATLAA LVAGAAAQNA CTLTAENHPS LTWSKCTSGG SCTSVQGSIT IDANWRWTHR TDSATNCYEG
lucknowense NKWDTSYCSD GPSCASKCCI DGADYSSTYG ITTSGNSLNL KFVTKGQYST NIGSRTYLME SDTKYQMFQL
U.S. Pat. No. 6,573,086-10 LGNEFTFDVD VSNLGCGLNG ALYFVSMDAD GGMSKYSGNK AGAKYGTGYC DSQCPRDLKF INGEANVENW
QSSTNDANAG TGKYGSCCSE MDVWEANNMA AAFTPHPCXV IGQSRCEGDS CGGTYSTDRY AGICDPDGCD
FNSYRQGNKT FYGKGMTVDT TKKITVVTQF LKNSAGELSE IKRFYVQNGK VIPNSESTIP GVEGNSITQD
WCDRQKAAFG DVTDXQDKGG MVQMGKALAG PMVLVMSIWD DHAVNMLWLD STWPIDGAGK PGAERGACPT
TSGVPAEVEA EAPNSNVIFS NIRFGPIGST VSGLPDGGSG NPNPPVSSST PVPSSSTTSS GSSGPTGGTG
VAKHYEQCGG IGFTGPTQCE SPYTCTKLND WYSQCL
29160311 Thielavia australiensis MYAKFATLAA LVAGASAQAV CSLTAETHPS LTWQKCTAPG SCTNVAGSIT IDANWRWTHQ TSSATNCYSG
SKWDSSICTT GTDCASKCCI DGAEYSSTYG ITTSGNALNL KFVTKGQYST NIGSRTYLME SDTKYQMFKL
LGNEFTFDVD VSNLGCGLNG ALYFVSMDAD GGMSKYSGNK AGAKYGTGYC DAQCPRDLKF INGEANVEGW
ESSTNDANAG SGKYGSCCTE MDVWEANNMA TAFTPHPCTT IGQTRCEGDT CGGTYSSDRY AGVCDPDGCD
FNSYRQGNKT FYGKGMTVDT TKKITVVTQF LKNSAGELSE IKRFYAQDGK VIPNSESTIA GIPGNSITKA
YCDAQKTVFQ NTDDFTAKGG LVQMGKALAG DMVLVMSVWD DHAVNMLWLD STYPTDQVGV AGAERGACPT
TSGVPSDVEA NAPNSNVIFS NIRFGPIGST VQGLPSSGGT SSSSSAAPQS TSTKASTTTS AVRTTSTATT
KTTSSAPAQG TNTAKHWQQC GGNGWTGPTV CESPYKCTKQ NDWYSQCL
146197087 uncultured symbiotic MLTLVYFLLS LVVSLEIGTQ QSEDHPKLTW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSKDLCP
protist of SSDTCSQKCY IEGADYSGTY GIQSSGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYESFK LKNKEFTFTV
Reticulitermes DDSKLNCGLN GALYFVAMDA DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS
speratus GNGKLGTCCS EMDIWEGNMK SQAYTVHACT KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ
SFYGEGKTVD TKQPVTVVTQ FIGDPLTEIR RLYVQGGKTI NNSKTSNLAD TYDSITDKFC DATKEASGDT
NDFKAKGAMS GFSTNLNNGQ VLVMSLWDDH TANMLWLDST YPTDSSDSTA QRGPCPTSSG VPKDVESQHG
DATVVFSDIK FGAINSTFKY N
146197237 uncultured symbiotic MLAAALFTFA CSVGVGTKTP ENHPKLNWQN CASKGSCSQV SGEVTMDSNW RWTHDGNGKN CYDGNTWISS
protist of Neotermes LCPDDKTCSD KCVLDGAEYQ ATYGIQSNGT ALTLKFVTHG SYSTNIGSRL YLLKDKSTYY VFKLNNKEFT
koshunensis FSVDVSKLPC GLNGALYFVE MDADGGKAKY AGAKPGAEYG LGYCDAQCPS DLKFINGEAN SEGWKPQSGD
KNAGNGKYGS CCSEMDVWES NSQATALTPH VCKTTGQQRC SGKSECGGQD GQDRFAGLCD EDGCDFNNWR
MGDKTFFGPG LIVDTKSPFV VVTQFYGSPV TEIRRKYVQN GKVIENSKSN IPGIDATAAI SDHFCEQQKK
AFGDTNDFKN KGGFAKLGQV FDRGMVLVLS LWDDHQVAML WLDSTYPTNK DKSQPGVDRG PCPTSSGKPD
DVESASADAT VVYGNIKFGA LDSTY
146197067 uncultured symbiotic MLTLVYFLLS LVVSLEIGTQ QSEDHPKLTW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSKDLCP
protist of SSNTCSQKCY IEGADYSGTY GIQSSGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYESFK LKNKEFTFTV
Reticulitermes DDSKLNCGLN GALYFVAMDA DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS
speratus GNGKLGTCCS EMDIWEGNMK SQAYTVHACT KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ
SFYGEGKTVD TKQPVTVVTQ FIGDPLTEIR RLYVQGGKTI NNSKTSNLAD TYDSITDKFC DATKEASGDT
NDFKAKGAMS GFSTNLNNGQ VLVMSLWDDH TANMLWLDST YPTDSTKTGA SRGPCAVSSG VPKDVESQYG
DATVIYSDIK FGAINSTFKW N
146197407 uncultured symbiotic MILALLSLAK SLGIATNQAE THPKLTWTRY QSKGSGQTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC
protist of Cryptocercus PDPTTCSNNC DLDGADYPGT YGISTSGNSL KLGFVTHGSY STNIGSRVYL LRDSKNYEMF KLKNKEFTFT
punctulatus VDDSKLPCGL NGALYFVAMD EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN
SGNGRYGACC TEMDIWEANS MATAYTPHVC TVTGLRRCEG TECGDTDANQ RYNGICDKDG CDFNSYRLGD
KTFFGVGKTV DSSKPVTVVT QFVTSNGQDS GTLSEIRRKY VQGGKVIENS KVNIAGITAG NSVTDTFCNE
QKKAFGDNND FEKKGGLGAL SKQLDAGMVL VLSLWDDHSV NMLWLDSTYP TNAAAGALGT ERGACATSSG
APSDVESQSP DATVTFSDIK FGPIDSTY
146197157 uncultured symbiotic MLVIALILRG LSVGTGTQQS ETHPSLSWQQ TSKGGSGQSV SGSVVLDSNW RWTHTTDGTT NCYDGNEWSS
protist of DLCPDASTCS SNCVLEGADY SGTYGITGSG SSLKLGFVTK GSYSTNIGSR VYLLGDESHY KLFKLENNEF
Hodotermopsis TFTVDDSNLE CGLNGALYFV AMDEDGGASK YSGAKPGAKY GMGYCDAQCP HDMKFINGDA NVEGWKPSDN
sjoestedti DENAGTGKWG ACCTEMDIWE ANKYATAYTP HICTKNGEYR CEGTDCGDTK DNNRYGGVCD KDGCDFNSWR
MGNQSFWGPG LIIDTGKPVT VVTQFLADGG SLSEIRRKYV QGGKVIENTV TKISGMDEFD SITDEFCNQQ
KKAFRDTNDF EKKGGLKGLG TAVDAGVVLV LSLWDDHDVN MLWLDSIYPT DSGSKAGADR GPCATSSGVP
KDVESNYASA SVTFSDIKFG PIDSTY
146197403 uncultured symbiotic MLLALFAFGK SLGIATNQAE NHPKLTWTRY QSKGSGQTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC
protist of Cryptocercus PDPTTCSNNC DLDGADYPGT YGISSSGNSL KLGFVTHGSY STNIGSRVYL LRDSKNYEMF KLKNKEFTFT
punctulatus VDDSKLPCGL NGALYFVAMD EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN
SGNGRYGACC TEMDIWEANS MATAYTPHVC TVTGIRRCEG TECGDTDANQ RYNGICDKDG CDFNSYRLGD
KSFFGVGKTV DSSKPVTVVT QFVTSNGQDS GTLSEIRRKY VQGGKVIENS KVNIAGMAAG NSITDTFCNE
QKKAFGDNND FEKKGGLGAL SKQLDSGMVL VLSLWDDHSV NMLWLDSTYP TNAAAGALGT ERGACATSSG
APSDVESQSP DATVTFSDIK FGPIDSTY
146197081 uncultured symbiotic MLASVVYLVS LVVSLEIGTQ QSEEHPKLTW QNGSSSVSGS IVLDSNWRWL HDSGTTNCYD GNLWSDDLCP
protist of NADTCSSKCY IEGADYSGTY GITSSGSKVT LKFVTKGSYS TNIGSRIYLL KDENTYETFK LKNKEFTFTV
Reticulitermes DDSKLDCGLN GALYFVAMDA DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS
speratus GDGKLGTCCS EMDIWEGNAK SQAYTVHACS KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ
SFYGEGKTVD TKSPVTVVTQ FIGDPLTEIR RVYVQGGKTI NNSKTSNLAD TYDSITDKFC DATKDATGDT
NDFKAKGAMA GFSTNLNTAQ VLVSVHCGMI IQPICCGLIR RIQRIQQKQV QAVDRVLCRR VFQRMLKASM
VMLQSRTRTL SLELSTRPLV GISPAGRLFF F
146197413 uncultured symbiotic MILALLVLGK SLGIATNQAE THPKLTWTRY QSKGSGSTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC
protist of Cryptocercus PDPTTCSNNC DLDGADYPGT YGISTSGNSL KLGFVTHGSY STNIGSRVYL LKDTKSYEMF KLKNKEFTFT
punctulatus VDDSKLPCGL NGALYFVAMD EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN
SGNGRYGACC TEMDIWEANS MATAYTPHVC TVTGLRRCEG TECGDTDNDQ RYNGICDKDG CDFNSYRLGD
KSFFGVGKTV DSSKPVTVVT QFVTSNGQDS GTLSEIRRKY VQGGKVIENS KVNVAGITAG NSVTDTFCNE
QKKAFGDNND FEKKGGLGAL SKQLDAGMVL VLSLWDDHSV NMLWLDSTYP TNAAAGALGT ERGACATSSG
KPSDVESQSP DATVTFSDIK FGPIDSTY
146197309 uncultured symbiotic MLCIGLISFV YSLGVGTNTA ETHPKLTWKN GGQTVNGEVT VDSNWRWTHT KGSTKNCYDG NLWSKDLCPD
protist of Mastotermes AATCGKNCVL EGADYSGTYG VTSSGNALTL KFVTHGSYST NVGSRLYLLK DEKTYQMFNL NGKEFTFTVD
darwiniensis VSNLPCGLNG ALYHVNMDED GGTKRYPDNE AGAKYGTGYC DAQCPTDLKF INGIPNSDGW KPQSNDKNSG
NGKYGSCCSE MDIWEANSIC SAVTPHVCDN LQQTRCQGTA CGENGGGSRF GSSCDPDGCD FNSWRMGNKT
FYGPGLIVDT KSKFTVVTQF VGNPVTEIKR KYVQNGKVIE NSYSNIEGMD KFNSVSDKFC TAQKKAFGDT
DSFTKHGGFK QLGSALAKGM VLVLSLWDDH TVNMLWLDSV YPTNSKKAGS DRGPCPTTSG VPADVESKSA
DANVIYSDIR FGAIDSTYK
146197227 uncultured symbiotic MLGALVALAS CIGVGTNTPE KHPDLKWTNG GSSVSGSIVV DSNWRWTHIK GETKNCYDGN LWSDKYCPDA
protist of Neotermes ATCGKNCVLE GADYSGTYGV TTSGDAATLK FVTHGQYSTN VGSRLYLLKD EKTYQMFNLV GKEFTFTVDV
koshunensis SNLPCGLNGA LYFVQMDSDG GMAKYPDNQA GAKYGTGYCD AQCPTDLKFI NGIPNSDGWK PQKNDKNSGN
GKYGSCCSEM DIWEANSMAT AYTPHVCDKL EQTRCSGSAC GQNGGGDRFS SSCDPDGCDF NSWRMGNKTF
WGPGLIVDTK KPVQVVTQFV GSGGSVTEIK RKYVQGGKVI DNSMTNIAAM SKQYNSVSDE FCQAQKKAFG
DNDSFTKHGG FRQLGATLSK GHVLVLSLWD DHDVNMLWLD SVYPTNSNKP GADRGPCKTS SGVPSDVESQ
NADSTVKYSD IRFGAIDSTY SK
146197253 uncultured symbiotic MLAAALFTFA CSVGVGTKTT ETHPKLNWQQ CACKGSCSQV SGEVTMDSNW RWTHDGNGKN CYDGNTWISS
protist of Neotermes LCPDDKTCSD KCVLDGAEYQ ATYGIQSNGT ALTPKFVTHG SYSTNIGSRL YLLKDKSTYY VFQLNNKEFT
koshunensis FSVDVSKLPC GLNGALYFVE MDADGGKSKY AGAKPGAEYG LGYCDAQCPS DLKFINGEAN SEGWKPQSGD
KNAGNGKYGS CCSEMDVWES NSMATALTPH VCKTTGQTRC SGKSECGGQD GQDRFAGNCD EDGCDFNNWR
MGDKTFFGPG LTVDTKSPFV VVTQFYGSPV TEIRRKYVQN GKVIENAKSN IPGIDATNAI SDTFCEQQKK
AFGDTNDFKN KGGFTKLGSV FSRGMVLVLS LWDDHQVAML WLDSTYPTNK DKSVPGVDRG PCPTSSGKPD
DVESASGDAT VVYGNIKFGA LDSTY
146197099 uncultured symbiotic MFGFLLSLFA LQFALEIGTQ TSESHPSITW ELNGARQSGQ IVIDSNWRWL HDSGTTNCYD GNTWSSDLCP
protist of DPEKCSQNCY LEGADYSGTY GISASGSQLT LGFVTKGSYS TNIGSRVYLL KDENTYPMFK LKNKEFTFTV
Reticulitermes DVSNLPCGLN GALYFVAMPS DGGKAKYPLA KPGAKYGMGY CDAQCPHDMK FINGEANVLD WKPQSNDENA
speratus GTGRYGTCCT EMDIWEANSQ ATAYTVHACS KNARCEGTEC GDDSASQRYN GICDKDGCDF NSWRWGNKTF
FGPGLTVDSS KPVTVVTQFI GDPLTEIRRI WVQGGKVIQN SFTNVSGITS VDSITNTFCD ESKVATGDTN
DFKAKGGMSG FSKALDTEVV LVLSLWDDHT ANMLWLDSTY PTDSTAIGAS RGPCATSSGD PKDVESASAN
ASVKFSDIKF GALDSTY
146197409 uncultured symbiotic MLASLLPLSN SLGTASNQAE THPKLTWTQY TGKGAGQTVN GEIVLDSNWR WTHKDGTNCY DGNTWSSSLC
protist of Cryptocercus PDPTTCSNNC NLDGADYPGT YGITTSGNQL KLGFVTHGSY STNIGSRVYL LRDSKNYQMF KLKNKEFTFT
punctulatus VDDSKLPCGL NGAVYFVAMD EDGGTAKHSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN
SGNGRWGARC TEMDIWEANS RATAYTPHIC TKTGLYRCEG TECGDSDTNR YGGVCDKDGC DFNSYRMGDK
SFFGQGKTVD SSKPVTVVTQ FITDNNQDSG KLTEIRRKYV QGGKVIDNSK VNIAGITAGN PITDTFCDEA
KKAFGDNNDF EKKGGLSALG TQLEAGFVLV LSLWDDHSVN MLWLDSTYPT NASPGALGVE RGDCAITSGV
PADVESQSAD ASVTFSDIKF GPIDSTY
146197315 uncultured symbiotic MLCIGLISFV YSLGVGTNTA ETHPKLTWKN GGQTVNGEVT VDSNWRWTHT KGSTKNCYDG NLWSKDLCPD
protist of Mastotermes AATCGKNCVL EGADYSGTYG VTSSGNALTL KFVTHGSYST NVGSRLYLLK DEKTYQMFNL NGKEFTFTVD
darwiniensis VSNLPCGLSG ALYHVNMDED GGTKRYPDNE AGAKYGTGYC DAQCPTDLKF INGIPNSDGW KPQSNDKNSG
NGKYGSCCSE MDIWEANSIC SAVTPHVCDN LQQTRCQGAA CGENGGGSRF GSSCDPDGCD FNSWGMGNKT
FYGPGLIVDT KSKFTVVTQF VGNPVTEIKR KYVQNGKVIE NSYSNIEGMD KFNSVSDKFC TAQKKAFGDT
DSFTKHGGFK QLGSALAKGM VLVLSLWDDH TVNMLWLDSV YPTNSKKAGS DRGPCPTTSG VPADVESKSA
DANVIYSDIR FGAIDSTYK
146197411 uncultured symbiotic MILALLVLGK SLGIATNQAE THPKLTWTRY QSKGSGSTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC
protist of Cryptocercus PDPTTCSNNC DLDGADYPGT YGISTSGNSL KLGFVTHGSY STNIGSRVYL LRDSKNYEMF KLKNKEFTFT
punctulatus VDDSKLPCGL NGALYFVAMD EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN
SGNGRYGACC TEMDIWEANS MATAYTPHVC TVTGLRRCEG TECGDTDNDQ RYNGICDKDG CDFNSYRLGD
KSFFGVGKTV DSSKPVTVVT QFVTSNGQDS GILSETRRKY VQGGKVIENS KVNVAGITAG NSVTDTFCNE
QKKAFGDNND FEKKGGLGAL SKQLDAGMVL VLSLWDDHSV NMLWLDSTYP TNAAAGALGT ERGACATSSG
KPSDVESQSP DATVTFSDIK FGPIDSTY
146197161 uncultured symbiotic MIGIVLIQTV FGIGVGTQQS ESHPSLSWQQ CSKGGSCTSV SGSIVLDSNW RWTHIPDGTT NCYDGNEWSS
protist of DLCPDPTTCS NNCVLEGADY SGTYGISTSG SSAKLGFVTK GSYSTNIGSR VYLLGDESHY KIFDLKNKEF
Hodotermopsis TFTVDDSNLE CGLNGALYFV AMDEDGGASR FTLAKPGAKY GTGYCDAQCP HDIKFINGEA NVQDWKPSDN
sjoestedti DDNAGTGHYG ACCTEMDIWE ANKYATAYTP HICTENGEYR CEGKSCGDSS DDRYGGVCDK DGCDFNSWRL
GNQSFWGPGL IIDTGKPVTV VTQFVTKDGT DSGALSEIRR KYVQGGKTIE NTVVKISGID EVDSITDEFC
NQQKQAFGDT NDFEKKGGLS GLGKAFDYGV VLVLSLWDDH DVNMLWLDSV YPTNPAGKAG ADRGPCATSS
GDPKEVEDKY ASASVTFSDI KFGPIDSTY
146197323 uncultured symbiotic MLVFGIVSFV YSIGVGTNTA ETHPKLTWKN GGSTTNGEVT VDSNWRWTHT KGSTKNCYDG NLWSKDLCPD
protist of Mastotermes AATCGKNCVL EGADYSGTYG VTSSGDALTL KFVTHGSYST NVGSRLYLLK DEKTYQMFNL NGKEFTFTVD
darwiniensis VSQLPCGLNG ALYFVCMDQD GGMSRYPDNQ AGAKYGTGYC DAQCPTDLKF INGLPNSDGW KPQSNDKNSG
NGKYGSCCSE MDIWEANSLA TAVTPHVCDQ VGQTRCEGRA CGENGGGDRF GSICDPDGCD FNSWRMGNKT
FWGPGLIIDT KKPVTVVTQF IGSPVTEIKR EYVQGGKVIE NSYTNIEGMD KFNSISDKFC TAQKKAFGDN
DSFTKHGGFS KLGQSFTKGQ VLVLSLWDDH TVNMLWLDSV YPTNSKKLGS DRGPCPTSSG VPADVESKNA
DSSVKYSDIR FGSIDSTYK
146197077 uncultured symbiotic MLSFVFLLGF GVSLEIGTQQ SENHPTLSWQ QCTSSGSCTS QSGSIVLDSN WRWVHDSGTT NCYDGNEWSS
protist of DLCPDPETCS KNCYLDGADY SGTYGITSNG SSLKLGFVTE GSYSTNIGSR VYLKKDTNTY QIFKLKNHEF
Reticulitermes TFTVDVSNLP CGLNGALYFV EMEADGGKGK YPLAKPGAQY GMGYCDAQCP HDMKFINGNA NVLDWKPQET
speratus DENSGNGRYG TCCTEMDIWE ANSQATAYTP HICTKDGQYQ CEGTECGDSD ANQRYNGVCD KDGCDFNSYR
LGNKTFFGPG LIVDSKKPVT VVTQFITSNG QDSGDLTEIR RIYVQGGKTI QNSFTNIAGL TSVDSITEAF
CDESKDLFGD TNDFKAKGGF TAMGKSLDTG VVLVLSLWDD HSVNMLWLDS TYPTDAAAGA LGTQRGPCAT
SSGAPSDVES QSPDASVTFS DIKFGPLDST Y
146197089 uncultured symbiotic MLTLVVYLLS LVVSLEIGTQ QSESHPALTW QREGSSASGS IVLDSNWRWV HDSGTTNCYD GNEWSTDLCP
protist of SSDTCTQKCY IEGADYSGTY GITTSGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYETFK LKNKEFTFTV
Reticulitermes DDSKLDCGLN GALYFVAMDA DGGKQKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVED WKPQDNDENS
speratus GNGKLGTCCS EMDIWEGNAK SQAYTVHACT KSGQYECTGT DCGDSDSRYQ GTCDKDGCDY ASYRWGDHSF
YGEGKTVDTK QPITVVTQFI GDPLTEIRRL YIQGGKVINN SKTQNLASVY DSITDAFCDA TKAASGDTND
FKAKGAMAGF SKNLDTPQVL VLSLWDDHTA NMLWLDSTYP TDSRDATAER GPCATSSGVP KDVESNQADA
SVVFSDIKFG AINSTYSYN
146197091 uncultured symbiotic MFGFLLSLFA LQFALEIGTQ TSESHPSITW ELNGARQSGQ IVIDSNWRWL HDSGTTNCYD GNTWSSDLCP
protist of DPEKCSQNCY LEGADYSGTY GISASGSQLT LGFVTKGSYS TNIGSRVYLL KDENTYQMFK LKNKEFTFTV
Reticulitermes DVSNLPCGLN GALYFVAMPS DGGKAKYPLA KPGAKYGMGY CDAQCPHDMK FINGEANVLD WKPQSNDENA
speratus GTGRYGTCCT EMDIWEANSQ ATAYTVHACS KNARCEGTEC GDDSASQRYN GICDKDGCDF NSWRWGNKTF
FGPGLTVDSS KPVTVVTQFI GDPLTEIRRI WVQGGKVIQN SFTNVSGITS VDSITNTFCD ESKVATGDTN
DFKAKGGMSG FSKALDTEVV LVLSLWDDHT ANMLWLDSTY PSNSTAIGAT RGPCATSSGD PKNVESASAN
ASVKFSDIKF GAFDSTY
146197097 uncultured symbiotic MLALVYFLLS LVVSLEIGTQ QSEDHPKLTW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSTDLCP
protist of SSDTCTSKCY IEGADYSGTY GITSSGSKVT LKFVTKGSYS TNIGSRIYLL KDENTYETFK LKNKEFTFTV
Reticulitermes DDSQLNCGLN GALYFVAMDA DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS
speratus GNGKLGTCCS EMDIWEGNAK SQAYTVHACT KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ
SFYGEGKTVD TKQPVTVVTQ FIGDPLTEIR RLYVQGGKTI NNSKTSNLAD TYDSITDKFC DATKEASGDT
NDFKAKGAMS GFSTNLNTAQ VLVLSLWDDH TANMLWLDST YPTDSTKTGA SRGPCAVTSG VPKDVESQYG
SAQVVYSDIK FGAINSTY
146197095 uncultured symbiotic MLALVYFLLS FVVSLEIGTQ QSEDHPKLTW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSTDLCG
protist of SSDTCSSKCY IEGADYSGTY GISASGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYETFK LKGKEFTFTV
Reticulitermes DDSKLDCGLN GALYFVAMDA DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS
speratus GNGKLGTCCS EMDIWEGNAK SQAYTVHACT KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ
SFYGEGKTID TKQPVTVVTQ FIGDPLTEIR RVYVQGGKVI NNSKTSNLAN VYDSITDKFC DDTKDATGDT
NDFKAKGAMS GFSTNLNTAQ VLVMSLWDDH TANMLWLDST YPTDSTKTGA SRGPCAVLSG VPKNVESQHG
DATVIYSDIK FGAINSTFSY N
146197401 uncultured symbiotic MFLALFVLGK SLGIATNQAE NHPKLTWTRY QSKGSGQTVN GEVVLDSNWR WTHHSGTNCY DGNTWSTSLC
protist of Cryptocercus PDPQTCSSNC DLDGADYPGT YGISSSGNSL KLGFVTHGSY STNIGSRVYL LRDSKNYEMF KLKNKEFTFT
punctulatus VDDSKLPCGL NGALYFVAME EDGGVAKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN
SGNGRYGACC IEMDIWEANS MATAYTPHVC TVTGIHRCEG TECGDTDANQ RYNGICDKDG CDFNSYRMGD
KSFFGVGKTV DSSKPVTVVT QFVTSNGQDG GTLSEIKRKY VQGGKVIENS KVNIAGITAV NSITDTFCNE
QKKAFGDNND FEKKGGLGAL SKQLDLGMVL VLSLWDDHSV NMLWLDSTYP TDAAAGALGT ERGACATSSG
KPSDVESQSP DASVTFSDIK FGPIDSTY
146197225 uncultured symbiotic MLLCLLSIAN SLGVGTNTAE NHPKLSWKNG GSSVSGSVTV DANWRWTHIK GETKNCYDGN LWSDKYCPDA
protist of Neotermes ATCGKNCVIE GADYQGTYGV SSSGDGLTLT FVTHGQYSTN VGSRLYLMKD EKTYQMFNLN GKEFTFTVDV
koshunensis SNLPCGLNGA LYFVQMDSDG GMAKYPDNQA GAKYGTGYCD AQCPTDLKFI NGIPNSDGWK PQKNDKNSGN
GKYGSCCSEM DIWEANSQAT AYTPHVCDKL EQTRCSGSSC GHTGGGERFS SSCDPDGCDF NSWRMGNKTF
WGPGLIVDTK KPVQVVTQFV GSGNSCTEIK RKYVQGGKVI DNSMSNIAGM SKQYNSVSDD FCQAQKKAFG
DNDSFTKHGG FRQLGATLGK GHVLVLSLWD DHDVNMLWLD SVYPTNSNKP GSDRGPCKTS SGIPADVESQ
AASSSVKYSD IRFGAIDSTY K
146197317 uncultured symbiotic MLCIGLISFV YSLGVGTNTA ETHPKLTWKN GGQTVNGEVT VDSNWRWTHT KGSTKNCYDG NLWSKDLCPD
protist of Mastotermes AATCGKNCVL EGADYSGTYG VTSSGNALTL KFVTHGSYST NVGSRLYLMK DEKTYQMFNL NGKEFTFTVD
darwiniensis VSNLPCGLNG ALYHVNMDED GGTKRYPDNE AGAKYGTGYC DAQCPTDLKF INGIPNSDGW KPQSNDKNSG
NGKYGSCCSE MDIWEANSIC SAVTPHVCDT LQQTRCQGTA CGENGGGSRF GSSCDPDGCD FNSWRMGNKT
FYGPGLIVDT KSKFTVVTQF VGSPVTEIKR KYVQNGKVIE NSFSNIEGMD KFNSISDKFC TAQKKAFGDT
DSFTKHGGFK QLGSALAKGM VLVLSLWDDH TVNMLWLDSV YPTNSKKAGS DRGPCPTTSG VPADVESKSA
NANVIYSDIR FGAIDSTYK
146197251 uncultured symbiotic MLLCLLGIAS SLDAGTNTAE NHPQLSWKNG GSSVSGSVTV DANWRWTHIK GETKNCYDGN LWSDKYCPDA
protist of Neotermes ATCGQNCVIE GADYQGTYGV SASGNALTLT FVTHGQYSTN VGSRLYLLKD EKTYQIFNLI GKEFTFTVDV
koshunensis SNLPCGLNGA LYFVQMDADG GTAKYSDNKA GAKYGTGYCD AQCPTDLKFI NGIPNSDGWK PQKNDKNSGN
GRYGSCCSEM DVWEANSLAT AYTPHVCDKL EQVRCDGRAC GQNGGGDRFS SSCDPDGCDF NSWRLGNKTF
WGPGLIVDTK QPVQVVTQWV GSGTSVTEIK RKYVQGGKVI DNSFTKLDSL TKQYNSVSDE FCVAQKKAFG
DNDSFTKHGG FRQLGATLAK GHVLVLSLWD DHDVNMLWLD SVYPTNSNKP GADRGPCKTS SGVPADVESQ
AASSSVKYSD IRFGAIDSTY K
146197319 uncultured symbiotic MLGIGFVCIV YSLGVGTNTA ENHPKLTWKN SGSTTNGEVT VDSNWRWTHT KGTTKNCYDG NLWSKDLCPD
protist of Mastotermes AATCGKNCVL EGADYSGTYG VTSSGDALTL KFVTHGSYST NVGSRLYLLK DEKTYQIFNL NGKEFTFTVD
darwiniensis VSNLPCGLNG ALYFVNMDAD GGTGRYPDNQ AGAKYGTGYC DAQCPTDLKF INGIPNSDGW KPQSNDKNSG
NGKYGSCCSE MDIWEANSLA TAVTPHVCDQ VGQTRCEGRA CGENGGGDRF GSSCDPDGCD FNSWRLGNKT
FWGPGLIVDT KKPVTVVTQF VGSPVTEIKR KYVQGGKVIE NSYTNIEGLD KFNSISDKFC TAQKKAFGDN
DSFIKHGGFR QLGQSFTKGQ VLVLSLWDDH TVNMLWLDSV YPTNSKKPGA DRGPCPTSSG VPADVESKNA
GSSVKYSDIR FGSIDSTYK
146197071 uncultured symbiotic MATLVGILVS LFALEVALEI GTQTSESHPS LSWELNGQRQ TGSIVIDSNW RWLHDSGTTN CYDGNEWSSD
protist of LCPDPEKCSQ NCYLEGADYS GTYGISSSGN SLQLGFVTKG SYSTNIGSRV YLLKDENTYA TFKLKNKEFT
Reticulitermes FTADVSNLPC GLNGALYFVA MPADGGKSKY PLAKPGAKYG MGYCDAQCPH DMKFINGEAN ILDWKPSSND
speratus ENAGAGRYGT CCTEMDIWEA NSQATAYTVH ACSKNARCEG TECGDDDGRY NGICDKDGCD FNSWRWGNKT
FFGPNLIVDS SKPVTVVTQF IGDPLTEIRR IYVQGGKVIQ NSFTNISGVA SVDSITDAFC NENKVATGDT
NDFKAKGGMS GFSKALDTEV VLVLSLWDDH TANMLWLDST YPTDSSALGA SRGPCAITSG EPKDVESASA
NASVKFSDIK FGAIDSTY
146197075 uncultured symbiotic MLTLVYFLLS LVVSLEIGTQ QSESHPQLSW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSTDLCP
protist of SSDTCTSKCY IEGADYSGTY GITSSGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYETFK LKNKEFTFTV
Reticulitermes DDSKLDCGLN GALYFVAMDA DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS
speratus GNGKLGTCCS EMDIWEGNAK SQAYTVHACT KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ
SFYGEGKTVD TKQPLTVVTQ FVGDPLTEIR RVYVQGGKTI NNSKTSNLAD TYDSITDKFC DATKEASGDT
NDFKAKGAMS GFSTNLNTAQ VLVMSLWDDH TANMLWLDST YPTDSTKTGA SRGPCAVSSG VPKDVESQHG
DATVIYSDIK FGAINSTFKW N
146197159 uncultured symbiotic MLSLVSIFLV GLGFSLGVGT QQSESHPSLS WQNCSAKGSC QSVSGSIVLD SNWRWLHDSG TTNCYDGNEW
protist of STDLCPDAST CDKNCYIEGA DYSGTYGITS SGAQLKLGFV TKGSYSTNIG SRVYLLRDES HYQLFKLKNH
Hodotermopsis EFTFTVDDSQ LPCGLNGALY FVEMAEDGGA KPGAQYGMGY CDAQCPHDMK FITGEANVKD WKPQETDENA
sjoestedti GNGHYGACCT EMDIWEANSQ ATAYTPHICS KTGIYRCEGT ECGDNDANQR YNGVCDKDGC DFNSYRLGNK
TFWGPGLTVD SNKAMIVVTQ FTTSNNQDSG ELSEIRRIYV QGGKTIQNSD TNVQGITTTN KITQAFCDET
KVTFGDTNDF KAKGGFSGLS KSLESGAVLV LSLWDDHSVN MLWLDSTYPT DSAGKPGADR GPCAITSGDP
KDVESQSPNA SVTFSDIKFG PIDSTY
146197405 uncultured symbiotic MILALLVLGK SLGIATNQAE THPKLTWTRY QSKGSGSTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC
protist of Cryptocercus PDPTTCSNNC DLDGADYPGT YGISTSGNSL KLGFVTHGSY STNIGSRVYL LKDTKSYEMF KLKNKEFTFT
punctulatus VDDSKLPCGL NGALYFVAMD EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN
SGNGRYGACC TEMDIWEANS MATAYTPHVC TVTGLRRCEG TECGDTDNDQ RYNGICDKDG CDFNSYRLGD
KSFFGVGKTV DSSKPVTVVT QFVTSNGQDS GTLSEIRRKY VQGGKVIENS KVNVAGITAG NSVTDTFCNE
QKKAFGDNND FEKKGGFGAL SKQLVAGMVL VLSLWDDHSV NMLWLDSTYP TNAAAGALGT ERGACATSSG
KPSDVESQSP DATVTFSDIK FGPIDSTY
146197327 uncultured symbiotic MLCVGLFGLV YSIGVGTNTQ ETHPKLSWKQ CSSGGSCTTQ QGSVVIDSNW RWTHSTKDLT NCYDGNLWDS
protist of Mastotermes TLCPDGTTCS KNCVLEGADY SGTYGITSSG DSLTLKFVTH GSYSTNVGSR LYLLKDDNNY QIFNLAGKEF
darwiniensis TFTVDVSNLP CGLNGALYFV EMDQDGGKGK HKENEAGAKY GTGYCDAQCP TDLKFIDGIA NSDGWKPQDN
DENSGNGKYG SCCSEMDIWE ANSLATAYTP HVCDTKGQKR CQGTACGENG GGDRFGSECD PDGCDFNSWR
QGNKSFWGPG LIIDTKKSVQ VVTQFIGSGS SVTEIRRKYV QNGKVIENSY STISGTEKYN SISDDYCNAQ
KKAFGDTNSF ENHGGFKRFS QHIQDMVLVL SLWDDHTVNM LWLDSVYPTN SNKPGADRGP CETSSGVPAD
VESKSASASV KYSDIRFGPI DSTYK
146197261 uncultured symbiotic MLLCLWSIAY SLGVGTNTAE NHPKLSWKNG GSSVSGSVTV DANWRWTHIK GETKNCYDGN LWSDKYCPDA
protist of Neotermes ATCGKNCVIE GADYQGTYGV SASGDGLTLT FVTHGQYSTN VGSRLYLMKD EKTYQIFNLN GKEFTFTVDV
koshunensis SNLPCGLNGA LYFVQMDSDG GMAKYPDNQA GAKYGTGYCD AQCPTDLKFI NGIPNSDGWK PQKNDKNSGN
GKYGSCCSEM DIWEANSQAT AYTPHVCDKL EQTRCSGSAC GHTGGGERFS SSCDPDGCDF NSWRMGNKTF
WGPGLIVDTK KPVQVVTQFV GSGNSCTEIK RKYVQGGKVI DNSMSNIAGM TKQYNSVSDD FCQAQKKAFG
DNDSFTKHGG FRQLGATLGK GHVLVLSLWD DHDVNMLWLD SVYPTNSNKP GSDRGPCKTS SGIPADVESQ
AASSSVKYSD IRFGAIDSTY K
TABLE 2
Sequence Identifier (SEQ ID NO:); Database Accession Number; Species of Origin; Position Corresponding to Position 268; Position Corresponding to Position 411
BD29555* Unknown 273 422
340514556 Trichoderma reesei 268 411
51243029 Penicillium occitanis 273 422
7cel (PDB) & Trichoderma reesei 251 394
67516425 Aspergillus nidulans FGSC A4 274 424
46107376 Gibberella zeae PH-1 268 415
70992391 Aspergillus fumigatus Af293 277 427
121699984 Aspergillus clavatus NRRL 1 277 427
1906845 Claviceps purpurea 269 416
1gpi (PDB) & Phanerochaete chrysosporium 240 391
119468034 Neosartorya fischeri NRRL 181 265 414
7804883 Leptosphaeria maculans 256 401
85108032 Neurospora crassa N150 268 412
169859458 Coprinopsis cinerea okayama 270 421
154292161 Botryotinia fuckeliana B05-10 — 410
169615761 # Phaeosphaeria nodorum SN15 246 393
4883502 Humicola grisea 272 413
950686 Humicola grisea 270 416
124491660 Chaetomium thermophilum 272 413
58045187 Chaetomium thermophilum 270 416
169601100 # Phaeosphaeria nodorum SN15 237 383
169870197 Coprinopsis cinerea okayama 269 421
3913806 Agaricus bisporus 263 414
169611094 Phaeosphaeria nodorum SN15 270 414
3131 Phanerochaete chrysosporium — 410
70991503 Aspergillus fumigatus Af293 265 414
294196 Phanerochaete chrysosporium 258 409
18997123 Thermoascus aurantiacus 268 418
4204214 Humicola grisea var thermoidea 272 413
34582632 Trichoderma viride (also known as 268 411
Hypocrea rufa)
156712284 Thermoascus aurantiacus 268 418
39977899 Magnaporthe grisea (oryzae) 70-15 268 414
20986705 Talaromyces emersonii 266 416
22138843 Aspergillus oryzae 265 414
55775695 Penicillium chrysogenum 276 426
171676762 Podospora anserina 270 417
146350520 Pleurotus sp Florida 268 420
37732123 Gibberella zeae 268 415
156055188 Sclerotinia sclerotiorum 1980 — 410
453224 Phanerochaete chrysosporium 258 409
50402144 Trichoderma reesei 268 411
115397177 Aspergillus terreus NIH2624 274 424
154312003 Botryotinia fuckeliana B05-10 266 416
49333365 Volvariella volvacea 268 420
729650 Penicillium janthinellum 274 424
146424871 Pleurotus sp Florida 267 418
67538012 Aspergillus nidulans FGSC A4 265 410
62006162 Fusarium poae 268 415
146424873 Pleurotus sp Florida 267 418
295937 Trichoderma viride 268 411
6179889 # Alternaria alternata 240 386
119483864 Neosartorya fischeri NRRL 181 278 428
85083281 Neurospora crassa OR74A 270 412
3913803 Cryphonectria parasitica 269 416
60729633 Corticium rolfsii 265 415
39971383 Magnaporthe grisea 70-15 268 410
39973029 Magnaporthe grisea 70-15 269 410
1170141 Fusarium oxysporum 268 415
121710012 Aspergillus clavatus NRRL 1 265 414
17902580 Penicillium funiculosum 273 422
1346226 Humicola grisea var thermoidea 270 416
156712282 Chaetomium thermophilum 270 416
169768818 Aspergillus oryzae RIB40 277 427
46241270 Gibberella pulicaris 268 415
49333363 Volvariella volvacea 265 418
46395332 Irpex lacteus 263 414
50844407 # Chaetomium thermophilum var 245 391
thermophilum
4586347 Irpex lacteus 264 415
3980202 Phanerochaete chrysosporium 258 410
27125837 Melanocarpus albomyces 273 414
171696102 Podospora anserina 265 415
3913802 Cochliobolus carbonum 270 416
50403723 Trichoderma viride 268 411
3913798 Aspergillus aculeatus 275 425
66828465 Dictyostelium discoideum 269 419
156060391 Sclerotinia sclerotiorum 1980 252 402
116181754 Chaetomium globosum CBS 148-51 263 413
145230535 Aspergillus niger 274 424
46241266 Nectria haematococca mpVI 268 415
1q9h (PDB) # Talaromyces emersonii 248 398
157362170 Polyporus arcularius 269 420
7804885 Leptosphaeria maculans 267 407
121852 Phanerochaete chrysosporium 258 409
126013214 Penicillium decumbens 264 415
156048578 Sclerotinia sclerotiorum 1980 265 413
156712278 Acremonium thermophilum 269 414
21449327 Aspergillus nidulans 265 410
171683762 Podospora anserina 274 415
56718412 Thermoascus aurantiacus var 268 418
levisporus
15824273 Pseudotrichonympha grassii 263 414
115390801 Aspergillus terreus NIH2624 266 411
453223 Phanerochaete chrysosporium 258 409
3132 Phanerochaete chrysosporium — 407
16304152 Thermoascus aurantiacus 268 417
156712280 Acremonium thermophilum 273 420
5231154 Volvariella volvacea 281 438
116200349 Chaetomium globosum CBS 148-51 270 412
4586343 Irpex lacteus 263 414
15321718 Lentinula edodes — 417
146424875 Pleurotus sp Florida 267 418
62006158 Fusarium venenatum 268 415
296027 Phanerochaete chrysosporium 258 409
154449709 Fusicoccum sp BCC4124 272 424
169859460 Coprinopsis cinerea okayama 269 421
50400675 Trichoderma harzianum 264 407
729649 Neurospora crassa 262 406
119472134 Neosartorya fischeri NRRL 181 277 427
117935080 Chaetomium thermophilum 272 413
154300584 Botryotinia fuckeliana B05-10 265 413
15824271 Pseudotrichonympha grassii 263 414
4586345 Irpex lacteus 263 414
46241268 Gibberella avenacea 268 416
6164684 Aspergillus niger 274 424
6164682 Aspergillus niger 266 412
33733371 Chrysosporium lucknowense 269 415
US6573086-10
29160311 Thielavia australiensis 269 415
146197087 uncultured symbiotic protist of 260 402
Reticulitermes speratus
146197237 uncultured symbiotic protist of 264 409
Neotermes koshunensis
146197067 uncultured symbiotic protist of 260 402
Reticulitermes speratus
146197407 uncultured symbiotic protist of 261 412
Cryptocercus punctulatus
146197157 uncultured symbiotic protist of 264 410
Hodotermopsis sjoestedti
146197403 uncultured symbiotic protist of 261 412
Cryptocercus punctulatus
146197081 uncultured symbiotic protist of 260 410
Reticulitermes speratus
146197413 uncultured symbiotic protist of 261 412
Cryptocercus punctulatus
146197309 uncultured symbiotic protist of 259 402
Mastotermes darwiniensis
146197227 uncultured symbiotic protist of 258 404
Neotermes koshunensis
146197253 uncultured symbiotic protist of 264 409
Neotermes koshunensis
146197099 uncultured symbiotic protist of 258 401
Reticulitermes speratus
146197409 uncultured symbiotic protist of 260 411
Cryptocercus punctulatus
146197315 uncultured symbiotic protist of 259 402
Mastotermes darwiniensis
146197411 uncultured symbiotic protist of 261 412
Cryptocercus punctulatus
146197161 uncultured symbiotic protist of 263 413
Hodotermopsis sjoestedti
146197323 uncultured symbiotic protist of 259 402
Mastotermes darwiniensis
146197077 uncultured symbiotic protist of 264 415
Reticulitermes speratus
146197089 uncultured symbiotic protist of 258 400
Reticulitermes speratus
146197091 uncultured symbiotic protist of 258 401
Reticulitermes speratus
146197097 uncultured symbiotic protist of 260 402
Reticulitermes speratus
146197095 uncultured symbiotic protist of 260 402
Reticulitermes speratus
146197401 uncultured symbiotic protist of 261 412
Cryptocercus punctulatus
146197225 uncultured symbiotic protist of 258 404
Neotermes koshunensis
146197317 uncultured symbiotic protist of 259 402
Mastotermes darwiniensis
146197251 uncultured symbiotic protist of 258 404
Neotermes koshunensis
146197319 uncultured symbiotic protist of 259 402
Mastotermes darwiniensis
146197071 uncultured symbiotic protist of 259 402
Reticulitermes speratus
146197075 uncultured symbiotic protist of 260 402
Reticulitermes speratus
146197159 uncultured symbiotic protist of 260 410
Hodotermopsis sjoestedti
146197405 uncultured symbiotic protist of 261 412
Cryptocercus punctulatus
146197327 uncultured symbiotic protist of 264 408
Mastotermes darwiniensis
146197261 uncultured symbiotic protist of 258 404
Neotermes koshunensis
TABLE 3
SEQ ID NO:; Database Accession Number; Species of Origin; Signal Sequence (SS) Start and End Position; Catalytic Domain (CD) Start and End Position; Linker Start and End Position; Cellulose Binding Domain (CBD) Start and End
BD29555* Unknown 1-25 26-455 456-493 494-529
340514556 Trichoderma reesei 1-17 18-444 445-479 480-514
51243029 Penicillium occitanis 1-25 26-455 456-493 494-529
7cel (PDB) & Trichoderma reesei N/A 1-427 N/A N/A
67516425 Aspergillus nidulans 1-23 24-457 458-490 491-526
FGSC A4
46107376 Gibberella zeae PH-1 1-17 18-448 449-476 477-512
70992391 Aspergillus fumigatus 1-26 27-460 461-496 497-532
Af293
121699984 Aspergillus clavatus 1-27 27-460 461-503 504-539
NRRL 1
1906845 Claviceps purpurea 1-19 20-449 N/A N/A
1gpi (PDB) & Phanerochaete N/A 1-424 N/A N/A
chrysosporium
119468034 Neosartorya fischeri 1-17 18-447 N/A N/A
NRRL 181
7804883 Leptosphaeria 1-17 18-434 N/A N/A
maculans
85108032 Neurospora crassa 1-17 18-445 446-485 486-521
N150
169859458 Coprinopsis cinerea 1-18 19-454 N/A N/A
okayama
154292161 Botryotinia fuckeliana 1-18 19-443 444-555 556-596
B05-10
169615761 # Phaeosphaeria 1 2-426 N/A N/A
nodorum SN15
4883502 Humicola grisea 1-22 23-446 N/A N/A
950686 Humicola grisea 1-18 19-449 450-489 490-525
124491660 Chaetomium 1-22 23-446 N/A N/A
thermophilum
58045187 Chaetomium 1-18 19-449 450-494 495-530
thermophilum
169601100 # Phaeosphaeria 1 2-416 N/A N/A
nodorum SN15
169870197 Coprinopsis cinerea 1-18 19-454 N/A N/A
okayama
3913806 Agaricus bisporus 1-18 19-447 448-470 471-506
169611094 Phaeosphaeria 1-18 19-447 N/A N/A
nodorum SN15
3131 Phanerochaete 1-19 20-443 N/A N/A
chrysosporium
70991503 Aspergillus fumigatus 1-17 18-447 N/A N/A
Af293
294196 Phanerochaete 1-18 19-442 443-480 481-516
chrysosporium
18997123 Thermoascus 1-17 18-451 N/A N/A
aurantiacus
4204214 Humicola grisea var 1-22 23-446 N/A N/A
thermoidea
34582632 Trichoderma viride 1-18 18-444 445-479 480-514
(also known as
Hypocrea rufa)
156712284 Thermoascus 1-17 18-451 N/A N/A
aurantiacus
39977899 Magnaporthe grisea 1-17 18-447 N/A N/A
(oryzae) 70-15
20986705 Talaromyces emersonii 1-18 19-449 N/A N/A
22138843 Aspergillus oryzae 1-17 18-447 N/A N/A
55775695 Penicillium 1-25 26-459 460-494 495-529
chrysogenum
171676762 Podospora anserina 1-18 19-450 451-492 493-528
146350520 Pleurotus sp Florida 1-18 19-453 N/A N/A
37732123 Gibberella zeae 1-17 18-448 449-476 477-512
156055188 Sclerotinia 1-18 19-443 444-546 547-586
sclerotiorum 1980
453224 Phanerochaete 1-18 19-442 443-474 475-510
chrysosporium
50402144 Trichoderma reesei 1-17 18-444 445-478 479-513
115397177 Aspergillus terreus 1-23 24-457 458-505 506-541
NIH2624
154312003 Botryotinia fuckeliana 1-17 18-449 450-480 481-516
B05-10
49333365 Volvariella volvacea 1-18 19-453 N/A N/A
729650 Penicillium 1-25 26-456 457-502 503-537
janthinellum
146424871 Pleurotus sp Florida 1-18 19-451 452-487 488-523
67538012 Aspergillus nidulans 1-17 18-443 N/A N/A
FGSC A4
62006162 Fusarium poae 1-17 18-448 449-475 476-511
146424873 Pleurotus sp Florida 1-18 19-451 452-487 488-523
295937 Trichoderma viride 1-17 18-444 445-478 479-513
6179889 # Alternaria alternata 1 2-419 N/A N/A
119483864 Neosartorya fischeri 1-26 27-461 462-499 500-535
NRRL 181
85083281 Neurospora crassa 1-20 21-445 N/A N/A
OR74A
3913803 Cryphonectria 1-18 19-449 N/A N/A
parasitica
60729633 Corticium rolfsii 1-18 19-448 449-492 493-528
39971383 Magnaporthe grisea 1-17 18-443 N/A N/A
70-15
39973029 Magnaporthe grisea 1-19 20-443 N/A N/A
70-15
1170141 Fusarium oxysporum 1-17 18-448 449-478 479-514
121710012 Aspergillus clavatus 1-17 18-447 N/A N/A
NRRL 1
17902580 Penicillium 1-25 26-455 456-493 494-529
funiculosum
1346226 Humicola grisea var 1-18 19-449 450-489 490-525
thermoidea
156712282 Chaetomium 1-18 19-449 450-496 497-532
thermophilum
169768818 Aspergillus oryzae 1-25 26-460 N/A N/A
RIB40
46241270 Gibberella pulicaris 1-17 18-448 449-474 475-510
49333363 Volvariella volvacea 1-18 19-451 452-476 477-512
46395332 Irpex lacteus 1-18 19-447 448-485 486-521
50844407 # Chaetomium N/A 1-424 425-469 470-505
thermophilum var
thermophilum
4586347 Irpex lacteus 1-18 19-448 449-490 491-526
3980202 Phanerochaete 1-18 19-443 444-475 476-511
chrysosporium
27125837 Melanocarpus 1-23 23-447 N/A N/A
albomyces
171696102 Podospora anserina 1-17 17-448 N/A N/A
3913802 Cochliobolus 1-18 19-449 N/A N/A
carbonum
50403723 Trichoderma viride 1-17 18-444 445-479 480-514
3913798 Aspergillus aculeatus 1-22 23-458 459-505 506-540
66828465 Dictyostelium 1-19 20-452 N/A N/A
discoideum
156060391 Sclerotinia 1-17 18-435 436-470 471-504
sclerotiorum 1980
116181754 Chaetomium globosum 1-17 18-446 N/A N/A
CBS 148-51
145230535 Aspergillus niger 1-21 22-457 458-500 501-536
46241266 Nectria haematococca 1-18 18-448 449-472 473-508
mpVI
1q9h (PDB) # Talaromyces emersonii N/A 1-431 N/A N/A
157362170 Polyporus arcularius 1-18 19-453 N/A N/A
7804885 Leptosphaeria 1-20 21-440 N/A N/A
maculans
121852 Phanerochaete 1-18 19-442 443-480 481-516
chrysosporium
126013214 Penicillium decumbens 1-17 18-448 N/A N/A
156048578 Sclerotinia 1-16 17-446 N/A N/A
sclerotiorum 1980
156712278 Acremonium 1-17 18-447 448-487 488-523
thermophilum
21449327 Aspergillus nidulans 1-17 18-443 N/A N/A
171683762 Podospora anserina 1-22 23-448 N/A N/A
56718412 Thermoascus 1-17 18-451 N/A N/A
aurantiacus var
levisporus
15824273 Pseudotrichonympha 1-20 21-447 N/A N/A
grassii
115390801 Aspergillus terreus 1-17 18-444 N/A N/A
NIH2624
453223 Phanerochaete 1-18 19-442 443-474 475-510
chrysosporium
3132 Phanerochaete 1-19 20-436 437-467 468-504
chrysosporium
16304152 Thermoascus 1-17 18-450 N/A N/A
aurantiacus
156712280 Acremonium 1-21 22-453 N/A N/A
thermophilum
5231154 Volvariella volvacea 1-15 16-472 473-500 501-536
116200349 Chaetomium globosum 1-20 21-445 N/A N/A
CBS 148-51
4586343 Irpex lacteus 1-18 19-447 448-481 482-517
15321718 Lentinula edodes 1-18 19-450 451-480 481-516
146424875 Pleurotus sp Florida 1-18 19-451 452-487 488-523
62006158 Fusarium venenatum 1-17 18-448 449-471 472-507
296027 Phanerochaete 1-18 19-442 443-480 481-516
chrysosporium
154449709 Fusicoccum sp 1-19 20-457 N/A N/A
BCC4124
169859460 Coprinopsis cinerea 1-18 19-454 N/A N/A
okayama
50400675 Trichoderma 1-17 18-440 441-470 471-505
harzianum
729649 Neurospora crassa 1-17 18-439 440-480 481-516
119472134 Neosartorya fischeri 1-26 27-460 461-494 495-530
NRRL 181
117935080 Chaetomium 1-22 23-446 N/A N/A
thermophilum
154300584 Botryotinia fuckeliana 1-16 17-446 N/A N/A
B05-10
15824271 Pseudotrichonympha 1-20 21-447 N/A N/A
grassii
4586345 Irpex lacteus 1-18 19-447 448-487 488-523
46241268 Gibberella avenacea 1-17 18-449 450-478 478-513
6164684 Aspergillus niger 1-21 22-457 458-500 501-536
6164682 Aspergillus niger 1-17 18-445 N/A N/A
33733371 Chrysosporium 1-17 18-448 449-490 491-526
lucknowense
US6573086-10
29160311 Thielavia australiensis 1-18 18-448 449-502 503-538
146197087 uncultured symbiotic 1-22 23-435 N/A N/A
protist of
Reticulitermes speratus
146197237 uncultured symbiotic 1-20 21-442 N/A N/A
protist of Neotermes
koshunensis
146197067 uncultured symbiotic 1-22 23-435 N/A N/A
protist of
Reticulitermes speratus
146197407 uncultured symbiotic 1-19 20-445 N/A N/A
protist of Cryptocercus
punctulatus
146197157 uncultured symbiotic 1-20 21-443 N/A N/A
protist of
Hodotermopsis
sjoestedti
146197403 uncultured symbiotic 1-19 20-445 N/A N/A
protist of Cryptocercus
punctulatus
146197081 uncultured symbiotic 1-22 23-443 N/A N/A
protist of
Reticulitermes speratus
146197413 uncultured symbiotic 1-19 20-445 N/A N/A
protist of Cryptocercus
punctulatus
146197309 uncultured symbiotic 1-20 21-435 N/A N/A
protist of Mastotermes
darwiniensis
146197227 uncultured symbiotic 1-19 20-437 N/A N/A
protist of Neotermes
koshunensis
146197253 uncultured symbiotic 1-21 21-442 N/A N/A
protist of Neotermes
koshunensis
146197099 uncultured symbiotic 1-22 23-434 N/A N/A
protist of
Reticulitermes speratus
146197409 uncultured symbiotic 1-19 20-444 N/A N/A
protist of Cryptocercus
punctulatus
146197315 uncultured symbiotic 1-20 21-435 N/A N/A
protist of Mastotermes
darwiniensis
146197411 uncultured symbiotic 1-19 20-445 N/A N/A
protist of Cryptocercus
punctulatus
146197161 uncultured symbiotic 1-20 21-446 N/A N/A
protist of
Hodotermopsis
sjoestedti
146197323 uncultured symbiotic 1-20 21-435 N/A N/A
protist of Mastotermes
darwiniensis
146197077 uncultured symbiotic 1-21 22-448 N/A N/A
protist of
Reticulitermes speratus
146197089 uncultured symbiotic 1-22 23-433 N/A N/A
protist of
Reticulitermes speratus
146197091 uncultured symbiotic 1-22 23-434 N/A N/A
protist of
Reticulitermes speratus
146197097 uncultured symbiotic 1-22 23-435 N/A N/A
protist of
Reticulitermes speratus
146197095 uncultured symbiotic 1-22 23-435 N/A N/A
protist of
Reticulitermes speratus
146197401 uncultured symbiotic 1-19 20-445 N/A N/A
protist of Cryptocercus
punctulatus
146197225 uncultured symbiotic 1-19 20-437 N/A N/A
protist of Neotermes
koshunensis
146197317 uncultured symbiotic 1-20 21-435 N/A N/A
protist of Mastotermes
darwiniensis
146197251 uncultured symbiotic 1-19 20-437 N/A N/A
protist of Neotermes
koshunensis
146197319 uncultured symbiotic 1-20 21-435 N/A N/A
protist of Mastotermes
darwiniensis
146197071 uncultured symbiotic 1-25 26-435 N/A N/A
protist of
Reticulitermes speratus
146197075 uncultured symbiotic 1-22 23-435 N/A N/A
protist of
Reticulitermes speratus
146197159 uncultured symbiotic 1-23 24-443 N/A N/A
protist of
Hodotermopsis
sjoestedti
146197405 uncultured symbiotic 1-19 20-445 N/A N/A
protist of Cryptocercus
punctulatus
146197327 uncultured symbiotic 1-20 21-441 N/A N/A
protist of Mastotermes
darwiniensis
146197261 uncultured symbiotic 1-19 20-437 N/A N/A
protist of Neotermes
koshunensis
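By way of illustration only, the inclusive start and end positions in Table 3 can be converted to domain lengths, and a position numbered on the full-length precursor can be renumbered relative to the mature polypeptide by subtracting the length of the signal sequence. The following Python sketch uses the Trichoderma reesei row (accession 340514556) of Table 3 and the reference positions 268 and 411 of Table 2. The function names are hypothetical and are not part of the disclosure, and the renumbering assumes that the mature numbering begins immediately after the signal sequence.

```python
def parse_range(text):
    """Parse an inclusive 'start-end' range such as '18-444' into two ints."""
    start, end = (int(part) for part in text.split("-"))
    return start, end


def domain_lengths(row):
    """Return the residue count of each domain from its inclusive range."""
    lengths = {}
    for name, text in row.items():
        start, end = parse_range(text)
        lengths[name] = end - start + 1
    return lengths


def to_mature_numbering(position, signal_end):
    """Renumber a precursor position so that counting starts after the signal sequence.

    Assumption: the mature polypeptide begins at residue signal_end + 1.
    """
    return position - signal_end


# Ranges copied from the Trichoderma reesei (accession 340514556) row of Table 3.
T_REESEI = {"SS": "1-17", "CD": "18-444", "Linker": "445-479", "CBD": "480-514"}

lengths = domain_lengths(T_REESEI)
signal_end = parse_range(T_REESEI["SS"])[1]

print(lengths)  # {'SS': 17, 'CD': 427, 'Linker': 35, 'CBD': 35}
print(to_mature_numbering(268, signal_end), to_mature_numbering(411, signal_end))  # 251 394
```

Under these assumptions the catalytic domain spans 427 residues, which agrees with the 1-427 range listed for the 7cel (PDB) entry in Table 3, and the renumbered positions 251 and 394 agree with the values listed for 7cel in Table 2.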
TABLE 4
Sequence Identifier (SEQ ID NO:); Database Accession Number; Species of Origin; Amino Acid Sequence of Fragment of Catalytic Domain Including Loop and Catalytic Residue; Amino Acid Positions of Fragment in Sequence Identifier; Amino Acid Positions of Active Site Loop in Sequence Identifier; Position of Catalytic Residues in Sequence Identifier
BD29555* Unknown NVEGWTPSSNNANTGLGNHGACCAELDIWEANS 210-242 214-226 234, 239
340514556 Trichoderma reesei NVEGWTPSANNANTGIGNHGACCAELDIWEANS 205-237 209-221 229, 234
51243029 Penicillium occitanis NVEGWEPSSNNANTGIGGHGSCCSEMDIWEANS 210-242 214-226 234, 239
7cel (PDB) & Trichoderma reesei NVEGWEPSSNNANTGIGGHGSCCSEMDIWQANS 188-220 192-204 212, 217
67516425 Aspergillus nidulans NVEGWESSDTNPNGGVGNHGSCCAEMDIWEANS 211-243 215-227 235, 240
FGSC A4
46107376 Gibberella zeae PH-1 NSDGWQPSDSDVNGGIGNLGTCCPEMDIWEANS 205-237 209-221 229, 234
70992391 Aspergillus fumigatus NVEGWQPSSNDANAGTGNHGSCCAEMDIWEANS 214-246 218-230 238, 243
Af293
121699984 Aspergillus clavatus NVEGWTPSSSDANAGNGGHGSCCAEMDIWEANS 214-246 218-230 238, 243
NRRL 1
1906845 Claviceps purpurea NSKDWIPSKSDANAGIGSLGACCREMDIWEANN 206-238 210-222 230, 235
1gpi (PDB) & Phanerochaete NVGNWTETG--SNTGTGSYGTCCSEMDIWEANN 185-215 189-199 207, 212
chrysosporium
119468034 Neosartorya fischeri NVEGWKPSSNDKNAGVGGHGSCCPEMDIWEANS 202-234 206-218 226, 231
NRRL 181
7804883 Leptosphaeria NVEGWQPSKNDQNAGVGGHGSCCAEMDIWEANS 193-225 197-209 217, 222
maculans
85108032 Neurospora crassa NVEGWTPSTNDANAGIGDHGTCCSEMDIWEANK 205-237 209-221 229, 234
N150 (OR74A)
169859458 Coprinopsis cinerea NSADWTPSETDPNAGRGRYGICCAEMDIWEANS 207-239 211-223 231, 236
okayama
154292161 Botryotinia NVEGWVPDSNSANSGTGNIGSCCSEFDVWEANS 203-235 207-219 227, 232
fuckeliana B05-10
169615761 # Phaeosphaeria NADGWQASTSDPNAGVGKKGACCAEMDVWEANS 183-215 187-199 207, 212
nodorum SN15
4883502 Humicola grisea NIEGWRPSTNDPNAGVGPMGACCAEIDVWESNA 208-240 212-224 232, 237
950686 Humicola grisea NIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANN 207-239 211-223 231, 236
124491660 Chaetomium NIEGWRPSTNDANAGVGPYGACCAEIDVWESNA 209-241 213-225 233, 238
thermophilum
58045187 Chaetomium NIENWTPSTNDANAGFGRYGSCCSEMDIWEANN 207-239 211-223 231, 236
thermophilum
169601100 # Phaeosphaeria NVEGWKPSDNDANAGVGGHGSCCAEMDIWEANS 174-206 178-190 198, 203
nodorum SN15
169870197 Coprinopsis cinerea NSVGWEPSETDSNAGRGRYGICCAEMDIWEANS 207-239 211-223 231, 236
okayama
3913806 Agaricus bisporus NSEGWEGSPNDVNAGTGNFGACCGEMDIWEANS 203-235 207-219 227, 232
169611094 Phaeosphaeria NVEGWNPSDADPNAGSGKIGACCPEMDIWEANS 208-240 212-224 232, 237
nodorum SN15
3131 Phanerochaete NVQGWNATS--ATTGTGSYGSCCTELDIWEANS 204-234 208-218 226, 231
chrysosporium
70991503 Aspergillus fumigatus NVEGWEPSSSDKNAGVGGHGSCCPEMDIWEANS 202-234 206-218 226, 231
Af293
294196 Phanerochaete NVEGWNATS--ANAGTGNYGTCCTEMDIWEANN 203-233 207-217 225, 230
chrysosporium
18997123 Thermoascus NVEGWQPSANDPNAGVGNHGSSCAEMDVWEANS 205-237 209-221 229, 234
aurantiacus
4204214 Humicola grisea var NIEGWRPSTNDPNAGVGPMGACCAEIDVWESNA 208-240 212-224 232, 237
thermoidea
34582632 Trichoderma viride NVEGWEPSSNNANTGIGGHGSCCSEMDIWEANS 205-237 209-221 229, 234
(also known as
Hypocrea rufa)
156712284 Thermoascus NVEGWQPSANDPNAGVGNHGSCCAEMDVWEANS 205-237 209-221 229, 234
aurantiacus
39977899 Magnaporthe grisea NVEGWQPSSGDANSGVGNMGSCCAEMDIWEANS 205-237 209-221 229, 234
(oryzae) 70-15
20986705 Talaromyces NVEGWQPSSNNANTGIGDHGSCCAEMDVWEANS 203-235 207-219 227, 232
emersonii
22138843 Aspergillus oryzae R-KGWEPSDSDKNAGVGGHGSCCPQMDIWEANS 203-234 206-218 226, 231
55775695 Penicillium NVEGWEPSSSDVNGGTGNYGSCCAEMDIWEANS 213-245 217-229 237, 242
chrysogenum
171676762 Podospora anserina NIEGWNPSTNDVNAGAGRYGTCCSEMDIWEANN 207-239 211-223 231, 236
146350520 Pleurotus sp Florida NVQGWQPSPNDSNAGKGQYGSCCAEMDIWEANS 207-239 211-223 231, 236
37732123 Gibberella zeae NSDGWQPSDSDVNGGIGNLGTCCPEMDIWEANS 205-237 209-221 229, 234
156055188 Sclerotinia NNEGWVPDSNSANSGTGNIGSCCSEFDVWEANS 203-235 207-219 227, 232
sclerotiorum 1980
453224 Phanerochaete NVGNWTETG--SNTGTGSYGTCCSEMDIWEANN 203-233 207-217 225, 230
chrysosporium
50402144 Trichoderma reesei NVEGWEPSSNNANTGIGGHGSCCSEMDIWEANS 205-237 209-221 229, 234
115397177 Aspergillus terreus NVEGWEPSANDANAGTGNHGSCCAEMDIWEANS 211-243 215-227 235, 240
NIH2624
154312003 Botryotinia NSVGWTPSSNDVNAGAGQYGSCCSEMDIWEANK 206-238 210-222 230, 235
fuckeliana B05-10
49333365 Volvariella volvacea NVQGWQPSPNDTNAGTGNYGACCNEMDVWEANS 207-239 211-223 231, 236
729650 Penicillium NVDGWTPSKNDVNSGIGNHGSCCAEMDIWEANS 211-243 215-227 235, 240
janthinellum
146424871 Pleurotus sp Florida NILDWSASATDANAGNGRYGACCAEMDIWEANS 206-238 210-222 230, 235
67538012 Aspergillus nidulans NVEGWEPSDSDANAGVGGMGTCCPEMDIWEANS 202-234 206-218 226, 231
FGSC A4
62006162 Fusarium poae NSDGWEPSKSDVNGGIGNLGTCCPEMDIWEANS 205-237 209-221 229, 234
146424873 Pleurotus sp Florida NILDWSGSATDPNAGNGRYGACCAEMDIWEANS 206-238 210-222 230, 235
295937 Trichoderma viride NVEGWEPSSNNANTGIGGHGSCCSEMDIWEANS 205-237 209-221 229, 234
6179889 # Alternaria alternata NVEGWKPSSNDANAGVGGHGSCCAEMDIWEANS 177-209 181-193 201, 206
119483864 Neosartorya fischeri NVEGWTPSSNNENTGLGNYGSCCAELDIWESNS 215-247 219-231 239, 244
NRRL 181
85083281 Neurospora crassa NIEGWTPSTNDANAGVGPYGGCCAEIDVWESNA 207-239 211-223 231, 236
OR74A
3913803 Cryphonectria NVEGWTPSTNDANAGVGGLGSCCSEMDVWEANS 206-238 210-222 230, 235
parasitica
60729633 Corticium rolfsii NLLDWNATS--ANSGTGSYGSCCPEMDIWEANK 206-236 210-220 228, 233
39971383 Magnaporthe grisea NIEGWQPSSTDSSAGIGAQGACCAEIDIWESNK 205-237 209-221 229, 234
70-15
39973029 Magnaporthe grisea NIEGWKPSSNDANAGVGPYGACCAEIDVWESNA 206-238 210-222 230, 235
70-15
1170141 Fusarium oxysporum NSEGWKPSDSDVNAGVGNLGTCCPEMDIWEANS 205-237 209-221 229, 234
121710012 Aspergillus clavatus NVEGWKPSDNDKNAGVGGYGSCCPEMDIWEANS 202-234 206-218 226, 231
NRRL 1
17902580 Penicillium NVEGWTPSTNNSNTGIGNHGSCCAELDIWEANS 210-242 214-226 234, 239
funiculosum
1346226 Humicola grisea var NIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANN 207-239 211-223 231, 236
thermoidea
156712282 Chaetomium NVGNWTPSTNDANAGFGRYGSCCSEMDVWEANN 207-239 211-223 231, 236
thermophilum
169768818 Aspergillus oryzae NVEGWVSSTNNANTGTGNHGSCCAELDIWESNS 214-246 218-230 238, 243
RIB40
46241270 Gibberella pulicaris NSDGWQPSKSDVNAGIGNMGTCCPEMDIWEANS 205-237 209-221 229, 234
49333363 Volvariella volvacea NVAGWNGSPNDTNAGTGNWGACCNEMDIWEANS 205-237 209-221 229, 234
46395332 Irpex lacteus NVAGWTGSSSDPNSGTGNYGTCCSEMDIWEANS 202-234 206-218 226, 231
50844407 # Chaetomium NIENWTPSTNDANAGFGRYGSCCSEMDIWEANN 182-214 186-198 206, 211
thermophilum var
thermophilum
4586347 Irpex lacteus NIVDWTASAGDANSGTGSFGTCCQEMDIWEANS 203-235 207-219 227, 232
3980202 Phanerochaete NVGNWTETG--SNTGTGSYGTCCSEMDIWEANN 203-233 207-217 225, 230
chrysosporium
27125837 Melanocarpus NIEGWKSSTSDPNAGVGPYGSCCAEIDVWESNA 210-242 214-226 234, 239
albomyces
171696102 Podospora anserina NVEGWGGAD--GNSGTGKYGICCAEMDIWEANS 206-236 210-220 228, 233
3913802 Cochliobolus NVEGWNPSDADPNGGAGKIGACCPEMDIWEANS 208-240 212-224 232, 237
carbonum
50403723 Trichoderma viride NVEGWEPSSNNANTGIGGHGSCCSEMDIWEANS 205-237 209-221 229, 234
3913798 Aspergillus aculeatus NIEGWEPSSTDVNAGTGNHGSCCPEMDIWEANS 210-242 214-226 234, 239
66828465 Dictyostelium NVDGWIPSTNNPNTGYGNLGSCCAEMDLWEANN 206-238 210-222 230, 235
discoideum
156060391 Sclerotinia NSVGWTPSSNDVNTGTGQYGSCCSEMDIWEANK 192-224 196-208 216, 221
sclerotiorum 1980
116181754 Chaetomium NSEGWGGED--GNSGTGKYGTCCAEMDIWEANL 203-233 207-217 225, 230
globosum CBS 148-51
145230535 Aspergillus niger NCDGWEPSSNNVNTGVGDHGSCCAEMDVWEANS 209-241 213-225 233, 238
46241266 Nectria NSDEWKPSDSDKNAGVGKYGTCCPEMDIWEANK 205-237 209-221 229, 234
haematococca mpVI
1q9h (PDB) # Talaromyces NVEGWQPSSNNANTGIGDHGSCCAEMDVWEANS 185-217 189-201 209, 214
emersonii
157362170 Polyporus arcularius NVLDWAGSSNDPNAGTGHYGTCCNEMDIWEANS 208-240 212-224 232, 237
7804885 Leptosphaeria NAEGWTKSASDPNSGVGKKGACCAQMDVWEANS 204-236 208-220 228, 233
maculans
121852 Phanerochaete NVEGWNATS--ANAGTGNYGTCCTEMDIWEANN 203-233 207-217 225, 230
chrysosporium
126013214 Penicillium NVEGWKPSANDKNAGVGPHGSCCAEMDIWEANS 201-233 205-217 225, 230
decumbens
156048578 Sclerotinia NVDGWVPSSNNPNTGVGNYGSCCAEMDIWEANS 202-234 206-218 226, 231
sclerotiorum 1980
156712278 Acremonium NIDGWQPSSNDANAGLGNHGSCCSEMDIWEANK 206-238 210-222 230, 235
thermophilum
21449327 Aspergillus nidulans NVEGWEPSDSDANAGVGGMGTCCPEMDIWEANS 202-234 206-218 226, 231
(also known as
Emericella nidulans)
171683762 Podospora anserina NIEGWRESSNDENAGVGPYGGCCAEIDVWESNA 211-243 215-227 235, 240
(S mat+)
56718412 Thermoascus NVEGWQPSANDPNAGVGNHGSCCAEMDVWEANS 205-237 209-221 229, 234
aurantiacus var
levisporus
15824273 Pseudotrichonympha NVENWKPQTNDENAGNGRYGACCTEMDIWEANK 200-232 204-216 224, 229
grassii
115390801 Aspergillus terreus NVEGWTPSDNDKNAGVGGHGSCCPELDIWEANS 203-235 207-219 227, 232
NIH2624
453223 Phanerochaete NVGNWTETG--SNTGTGSYGTCCSEMDIWEANN 203-233 207-217 225, 230
chrysosporium
3132 Phanerochaete NVEGWLGTT--ATTGTGFFGSCCTDIALWEAND 202-232 206-216 224, 229
chrysosporium
16304152 Thermoascus NVEGWQPSANDPNAGVGNHGSSCAEMDVWEANS 205-237 209-221 229, 234
aurantiacus
156712280 Acremonium NSASWQPSSNDQNAGVGGMGSCCAEMDIWEANS 210-242 214-226 234, 239
thermophilum
5231154 Volvariella volvacea NVQGWQPSPNDTNAGTGNYGACCNKMDVWEANS 220-252 224-236 244, 249
116200349 Chaetomium NYDGWTPSSNDANAGVGALGGCCAEIDVWESNA 207-239 211-223 231, 236
globosum CBS 148-51
4586343 Irpex lacteus NVAGWAGSASDPNAGSGTLGTCCSEMDIWEANN 202-234 206-218 226, 231
15321718 Lentinula edodes NVEGWTPSSTSPNAGTGGTGICCNEMDIWEANS 208-240 212-224 232, 237
146424875 Pleurotus sp Florida NVLDWSASATDDNAGNGRYGACCAEMDIWEANS 206-238 210-222 230, 235
62006158 Fusarium venenatum NSDGWQPSKSDVNGGIGNLGTCCPEMDIWEANS 205-237 209-221 229, 234
296027 Phanerochaete NVEGWNATS--ANAGTGNYGTCCTEMDIWEANN 203-233 207-217 225, 230
chrysosporium
154449709 Fusicoccum sp NVQNWTASSTDKNAGTGHYGSCCNEMDIWEANS 209-241 213-225 233, 238
BCC4124
169859460 Coprinopsis cinerea NSVGWEPSETDPNAGKGQYGICCAEMDIWEANS 207-239 211-223 231, 236
okayama
50400675 Trichoderma NVEGWEPSSNNANTGVGGHGSCCSEMDIWEANS 201-233 205-217 225, 230
harzianum
(anamorph of
Hypocrea lixii)
729649 Neurospora crassa NVEGWTPSTNDAN-GIGDHGSCCSEMDIWEANK 200-231 204-215 223, 228
(OR74A)
119472134 Neosartorya fischeri NVEGWQPSSNDANAGTGNHGSCCAEMDIWEANS 214-246 218-230 238, 243
NRRL 181
117935080 Chaetomium NIEGWRPSTNDANAGVGPYGACCAEIDVWESNA 209-241 213-225 233, 238
thermophilum
154300584 Botryotinia NVDGWVPSSNNANTGVGNHGSCCAEMDIWEANS 202-234 206-218 226, 231
fuckeliana B05-10
15824271 Pseudotrichonympha NVENWKPQTNDENAGNGRYGACCTEMDIWEANK 200-232 204-216 224, 229
grassii
4586345 Irpex lacteus NVEGWTGSSTDSNSGTGNYGTCCSEMDIWEANS 202-234 206-218 226, 231
46241268 Gibberella avenacea NSDGWKPSDSDINAGIGNMGTCCPEMDIWEANS 205-237 209-221 229, 234
6164684 Aspergillus niger NCDGWEPSSNNVNTGVGDHGSCCAEMDVWEANS 209-241 213-225 233, 238
6164682 Aspergillus niger NVDGWEPSSNNDNTGIGNHGSCCPEMDIWEANK 203-235 207-219 227, 232
33733371 Chrysosporium NVENWQSSTNDANAGTGKYGSCCSEMDVWEANN 206-238 210-222 230, 235
lucknowense
U.S. Pat. No. 6,573,086-10
29160311 Thielavia NVEGWESSTNDANAGSGKYGSCCTEMDVWEANN 206-238 210-222 230, 235
australiensis
146197087 uncultured symbiotic NVDDWKPQDNDENSGNGKLGTCCSEMDIWEGNM 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197237 uncultured symbiotic NSEGWKPQSGDKNAGNGKYGSCCSEMDVWESNS 200-232 204-216 224, 229
protist of Neotermes
koshunensis
146197067 uncultured symbiotic NVDDWKPQDNDENSGNGKLGTCCSEMDIWEGNM 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197407 uncultured symbiotic NVLDWKPQSNDENSGNGRYGACCTEMDIWEANS 198-230 202-214 222, 227
protist of
Cryptocercus
punctulatus
146197157 uncultured symbiotic NVEGWKPSDNDENAGTGKWGACCTEMDIWEANK 201-233 205-217 225, 230
protist of
Hodotermopsis
sjoestedti
146197403 uncultured symbiotic NVLDWKPQSNDENSGNGRYGACCTEMDIWEANS 198-230 202-214 222, 227
protist of
Cryptocercus
punctulatus
146197081 uncultured symbiotic NVDDWKPQDNDENSGDGKLGTCCSEMDIWEGNA 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197413 uncultured symbiotic NVLDWKPQSNDENSGNGRYGACCTEMDIWEANS 198-230 202-214 222, 227
protist of
Cryptocercus
punctulatus
146197309 uncultured symbiotic NSDGWKPQSNDKNSGNGKYGSCCSEMDIWEANS 196-228 200-212 220, 225
protist of
Mastotermes
darwiniensis
146197227 uncultured symbiotic NSDGWKPQKNDKNSGNGKYGSCCSEMDIWEANS 195-227 199-211 219, 224
protist of Neotermes
koshunensis
146197253 uncultured symbiotic NSEGWKPQSGDKNAGNGKYGSCCSEMDVWESNS 200-232 204-216 224, 229
protist of Neotermes
koshunensis
146197099 uncultured symbiotic NVLDWKPQSNDENAGTGRYGTCCTEMDIWEANS 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197409 uncultured symbiotic NVLDWKPQSNDENSGNGRWGARCTEMDIWEANS 198-230 202-214 222, 227
protist of
Cryptocercus
punctulatus
146197315 uncultured symbiotic NSDGWKPQSNDKNSGNGKYGSCCSEMDIWEANS 196-228 200-212 220, 225
protist of
Mastotermes
darwiniensis
146197411 uncultured symbiotic NVLDWKPQSNDENSGNGRYGACCTEMDIWEANS 198-230 202-214 222, 227
protist of
Cryptocercus
punctulatus
146197161 uncultured symbiotic NVQDWKPSDNDDNAGTGHYGACCTEMDIWEANK 201-233 205-217 225, 230
protist of
Hodotermopsis
sjoestedti
146197323 uncultured symbiotic NSDGWKPQSNDKNSGNGKYGSCCSEMDIWEANS 196-228 200-212 220, 225
protist of
Mastotermes
darwiniensis
146197077 uncultured symbiotic NVLDWKPQETDENSGNGRYGTCCTEMDIWEANS 201-233 205-217 225, 230
protist of
Reticulitermes
speratus
146197089 uncultured symbiotic NVEDWKPQDNDENSGNGKLGTCCSEMDIWEGNA 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197091 uncultured symbiotic NVLDWKPQSNDENAGTGRYGTCCTEMDIWEANS 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197097 uncultured symbiotic NVDDWKPQDNDENSGNGKLGTCCSEMDIWEGNA 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197095 uncultured symbiotic NVDDWKPQDNDENSGNGKLGTCCSEMDIWEGNA 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197401 uncultured symbiotic NVLDWKPQSNDENSGNGRYGACCIEMDIWEANS 198-230 202-214 222, 227
protist of
Cryptocercus
punctulatus
146197225 uncultured symbiotic NSDGWKPQKNDKNSGNGKYGSCCSEMDIWEANS 195-227 199-211 219, 224
protist of Neotermes
koshunensis
146197317 uncultured symbiotic NSDGWKPQSNDKNSGNGKYGSCCSEMDIWEANS 196-228 200-212 220, 225
protist of
Mastotermes
darwiniensis
146197251 uncultured symbiotic NSDGWKPQKNDKNSGNGRYGSCCSEMDVWEANS 195-227 199-211 219, 224
protist of Neotermes
koshunensis
146197319 uncultured symbiotic NSDGWKPQSNDKNSGNGKYGSCCSEMDIWEANS 196-228 200-212 220, 225
protist of
Mastotermes
darwiniensis
146197071 uncultured symbiotic NILDWKPSSNDENAGAGRYGTCCTEMDIWEANS 200-232 204-216 224, 229
protist of
Reticulitermes
speratus
146197075 uncultured symbiotic NVDDWKPQDNDENSGNGKLGTCCSEMDIWEGNA 197-229 201-213 221, 226
protist of
Reticulitermes
speratus
146197159 uncultured symbiotic NVKDWKPQETDENAGNGHYGACCTEMDIWEANS 197-229 201-213 221, 226
protist of
Hodotermopsis
sjoestedti
146197405 uncultured symbiotic NVLDWKPQSNDENSGNGRYGACCTEMDIWEANS 198-230 202-214 222, 227
protist of
Cryptocercus
punctulatus
146197327 uncultured symbiotic NSDGWKPQDNDENSGNGKYGSCCSEMDIWEANS 201-233 205-217 225, 230
protist of
Mastotermes
darwiniensis
146197261 uncultured symbiotic NSDGWKPQKNDKNSGNGKYGSCCSEMDIWEANS 195-227 199-211 219, 224
protist of Neotermes
koshunensis
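The rows above can be checked for internal consistency. The following is a minimal sketch, assuming that the first range column of each row gives the residue span covered by the aligned motif and that "-" in a motif marks an alignment gap; neither assumption is stated in this portion of the table, and the helper name motif_span_matches is hypothetical.

```python
def motif_span_matches(motif: str, span: str) -> bool:
    """Return True if the ungapped motif length equals the length of the span.

    motif: aligned motif as printed in the table, with '-' for gaps (assumed).
    span:  inclusive residue range written as 'start-end', e.g. '205-237'.
    """
    start, end = (int(part) for part in span.split("-"))
    ungapped = motif.replace("-", "")
    return len(ungapped) == end - start + 1


# Two rows copied from the table above:
# Trichoderma reesei (no gaps) and Phanerochaete chrysosporium (two gaps).
print(motif_span_matches("NVEGWEPSSNNANTGIGGHGSCCSEMDIWEANS", "205-237"))
print(motif_span_matches("NVQGWNATS--ATTGTGSYGSCCTELDIWEANS", "204-234"))
```

A row for which the function returns False would suggest a transcription error in either the motif or the range, under the assumptions stated above.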
TABLE 5
Substitution(s) | Tolerance to 250 mg/L Cellobiose: % Activity in 4-MUL Assay (+/−Cellobiose)± | Tolerance to Cellobiose Accumulation: % Activity in Bagasse Assay (−/+BG)¥
None | 25% | 60%
R273K/R422K | 95% | 84%
R273K/Y274Q/D281K/Y410H/P411G/R422K | 78% | ND
TABLE 6
Substitution(s) | Tolerance to 250 mg/L Cellobiose: % Activity in 4-MUL Assay (+/−Cellobiose)± | Tolerance to Cellobiose Accumulation: % Activity in Bagasse Assay (−/+BG)¥
None | 23% | 74%
R268K/R411K | 92% | 94%
R268A/R411A | 92% | 95%
R268A/R411K | 97% | 94%
R268K/R411A | 97% | 102%
R268K | ND | 92%
R268A | ND | 86%
R411K | ND | 89%
R411A | ND | 94%
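The substitution labels used in Tables 5 and 6 can be read mechanically. The following is a minimal sketch, assuming the conventional notation in which each token gives the original one-letter residue, the position, and the replacement residue (so that R268K denotes arginine at position 268 replaced by lysine), with "/" separating the substitutions of a combined variant. The function name parse_substitutions is hypothetical.

```python
import re
from typing import List, Tuple

# One substitution token: original residue, position, replacement residue.
_TOKEN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitutions(label: str) -> List[Tuple[str, int, str]]:
    """Split a label such as 'R268K/R411A' into (original, position, replacement) tuples.

    The label 'None' (unsubstituted enzyme) yields an empty list.
    """
    if label.strip() == "None":
        return []
    parsed = []
    for token in label.split("/"):
        match = _TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution token: {token!r}")
        original, position, replacement = match.groups()
        parsed.append((original, int(position), replacement))
    return parsed


print(parse_substitutions("R268K/R411A"))
print(parse_substitutions("R273K/Y274Q/D281K/Y410H/P411G/R422K"))
```

Parsing the labels this way makes it straightforward to group rows by position, for example to compare the single substitutions at 268 and 411 with the corresponding double variants.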
TABLE 7
SEQ ID NO. Amino Acid Sequence
MSALNSFNMY KSALILGSLL ATAGAQQIGT YTAETHPSLS WSTCKSGGSC TTNSGAITLD ANWRWVHGVN TSTNCYTGNT
WNTAICDTDA SCAQDCALDG ADYSGTYGIT TSGNSLRLNF VTGSNVGSRT YLMADNTHYQ IFDLLNQEFT FTVDVSHLPC
GLNGALYFVT MDADGGVSKY PNNKAGAQYG VGYCDSQCPR DLKFIAGQAN VEGWTPSSNN ANTGLGNHGA CCAELDIWEA
NSISEALTPH PCDTPGLSVC TTDACGGTYS SDRYAGTCDP DGCDFNPYRL GVTDFYGSGK TVDTTKPITV VTQFVTDDGT
STGTLSEIRR YYVQNGVVIP QPSSKISGVS GNVINSDFCD AEISTFGETA SFSKHGGLAK MGAGMEAGMV LVMSLWDDYS
VNMLWLDSTY PTNATGTPGA ARGSCPTTSG DPKTVESQSG SSYVTFSDIR VGPFNSTFSG GSSTGGSSTT TASGTTTTKA
SSTSTSSTST GTGVAAHWGQ CGGQGWTGPT TCASGTTCTV VNPYYSQCL
MYRKLAVISA FLATARAQSA CTLQSETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG NTWSSTLCPD
NETCAKNCCL DGAAYASTYG VTTSGNSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL GNEFSFDVDV SQLPCGLNGA
LYFVSMDADG GVSKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE PSSNNANTGI GGHGSCCSEM DIWEANSISE
ALTPHPCTTV GQEICEGDGC GGTYSDNRYG GTCDPDGCDW NPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY
YVQNGVTFQQ PNAELGSYSG NELNDDYCTA EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT
NETSSTPGAV RGSCSTSSGV PAQVESQSPN AKVTFSNIKF GPIGSTGNPS GGNPPGGNPP GTTTTRRPAT TTGSSPGPTQ
SHYGQCGGIG YSGPTVCASG TTCQVLNPYY SQCL
MSALNSFNMY KSALILGSLL ATAGAQQIGT YTAETHPSLS WSTCKSGGSC TTNSGAITLD ANWRWVHGVN TSTNCYTGNT
WNSAICDTDA SCAQDCALDG ADYSGTYGIT TSGNSLRLNF VTGSNVGSRT YLMADNTHYQ IFDLLNQEFT FTVDVSHLPC
GLNGALYFVT MDADGGVSKY PNNKAGAQYG VGYCDSQCPR DLKFIAGQAN VEGWTPSANN ANTGIGNHGA CCAELDIWEA
NSISEALTPH PCDTPGLSVC TTDACGGTYS SDRYAGTCDP DGCDFNPYRL GVTDFYGSGK TVDTTKPFTV VTQFVTNDGT
STGSLSEIRR YYVQNGVVIP QPSSKISGIS GNVINSDYCA AEISTFGGTA SFNKHGGLTN MAAGMEAGMV LVMSLWDDYA
VNMLWLDSTY PTNATGTPGA ARGTCATTSG DPKTVESQSG SSYVTFSDIR VGPFNSTFSG GSSTGGSTTT TASRTTTTSA
SSTSTSSTST GTGVAGHWGQ CGGQGWTGPT TCVSGTTCTV VNPYYSQCL
ESACTLQSET HPPLTWQKCS SGGTCTQQTG SVVIDANWRW THATNSSTNC YDGNTWSSTL CPDNETCAKN CCLDGAAYAS
TYGVTTSGNS LSIDFVTQSA QKNVGARLYL MASDTTYQEF TLLGNEFSFD VDVSQLPCGL NGALYFVSMD ADGGVSKYPT
NTAGAKYGTG YCDSQCPRDL KFINGQANVE GWEPSSNNAN TGIGGHGSCC SEMDIWQANS ISEALTPHPC TTVGQEICEG
DGCGGTYSDN RYGGTCDPDG CDWNPYRLGN TSFYGPGSSF TLDTTKKLTV VTQFETSGAI NRYYVQNGVT FQQPNAELGS
YSGNELNDDY CTAEEAEFGG SSFSDKGGLT QFKKATSGGM VLVMSLWDDY YANMLWLDST YPTNETSSTP GAVRGSCSTS
SGVPAQVESQ SPNAKVTFSN IKFGPIGSTG NPSG
MASSFQLYKA LLFFSSLLSA VQAQKVGTQQ AEVHPGLTWQ TCTSSGSCTT VNGEVTIDAN WRWLHTVNGY TNCYTGNEWD
TSICTSNEVC AEQCAVDGAN YASTYGITTS GSSLRLNFVT QSQQKNIGSR VYLMDDEDTY TMFYLLNKEF TFDVDVSELP
CGLNGAVYFV SMDADGGKSR YATNEAGAKY GTGYCDSQCP RDLKFINGVA NVEGWESSDT NPNGGVGNHG SCCAEMDIWE
ANSISTAFTP HPCDTPGQTL CTGDSCGGTY SNDRYGGTCD PDGCDFNSYR QGNKTFYGPG LTVDTNSPVT VVTQFLTDDN
TDTGTLSEIK RFYVQNGVVI PNSESTYPAN PGNSITTEFC ESQKELFGDV DVFSAHGGMA GMGAALEQGM VLVLSLWDDN
YSNMLWLDSN YPTDADPTQP GIARGTCPTD SGVPSEVEAQ YPNAYVVYSN IKFGPIGSTF GNGGGSGPTT TVTTSTATST
TSSATSTATG QAQHWEQCGG NGWTGPTVCA SPWACTVVNS WYSQCL
MYRAIATASA LIAAVRAQQV CSLTQESKPS LNWSKCTSSG CSNVKGSVTI DANWRWTHQV SGSTNCYTGN KWDTSVCTSG
KVCAEKCCLD GADYASTYGI TSSGDQLSLS FVTKGPYSTN IGSRTYLMED ENTYQMFQLL GNEFTFDVDV SNIGCGLNGA
LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWQ PSDSDVNGGI GNLGTCCPEM DIWEANSIST
AYTPHPCTKL TQHSCTGDSC GGTYSNDRYG GTCDADGCDF NSYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS
EITRLYVQNG KVIANSESKI AGVPGNSLTA DFCTKQKKVF NDPDDFTKKG AWSGMSDALE APMVLVMSLW HDHHSNMLWL
DSTYPTDSTK LGSQRGSCST SSGVPADLEK NVPNSKVAFS NIKFGPIGST YKSDGTTPTN PTNPSEPSNT ANPNPGTVDQ
WGQCGGSNYS GPTACKSGFT CKKINDFYSQ CQ
MLASTFSYRM YKTALILAAL LGSGQAQQVG TSQAEVHPSM TWQSCTAGGS CTTNNGKVVI DANWRWVHKV GDYTNCYTGN
TWDTTICPDD ATCASNCALE GANYESTYGV TASGNSLRLN FVTTSQQKNI GSRLYMMKDD STYEMFKLLN QEFTFDVDVS
NLPCGLNGAL YFVAMDADGG MSKYPTNKAG AKYGTGYCDS QCPRDLKFIN GQANVEGWQP SSNDANAGTG NHGSCCAEMD
IWEANSISTA FTPHPCDTPG QVMCTGDACG GTYSSDRYGG TCDPDGCDFN SFRQGNKTFY GPGMTVDTKS KFTVVTQFIT
DDGTSSGTLK EIKRFYVQNG KVIPNSESTW TGVSGNSITT EYCTAQKSLF QDQNVFEKHG GLEGMGAALA QGMVLVMSLW
DDHSANMLWL DSNYPTTASS TTPGVARGTC DISSGVPADV EANHPDAYVV YSNIKVGPIG STFNSGGSNP GGGTTTTTTT
QPTTTTTTAG NPGGTGVAQH YGQCGGIGWT GPTTCASPYT CQKLNDYYSQ CL
MLPSTISYRI YKNALFFAAL FGAVQAQKVG TSKAEVHPSM AWQTCAADGT CTTKNGKVVI DANWRWVHDV KGYTNCYTGN
TWNAELCPDN ESCAENCALE GADYAATYGA TTSGNALSLK FVTQSQQKNI GSRLYMMKDD NTYETFKLLN QEFTFDVDVS
NLPCGLNGAL YFVSMDADGG LSRYTGNEAG AKYGTGYCDS QCPRDLKFIN GLANVEGWTP SSSDANAGNG GHGSCCAEMD
IWEANSISTA YTPHPCDTPG QAMCNGDSCG GTYSSDRYGG TCDPDGCDFN SYRQGNKSFY GPGMTVDTKK KMTVVTQFLT
NDGTATGTLS EIKRFYVQDG KVIANSESTW PNLGGNSLTN DFCKAQKTVF GDMDTFSKHG GMEGMGAALA EGMVLVMSLW
DDHNSNMLWL DSNSPTTGTS TTPGVARGSC DISSGDPKDL EANHPDASVV YSNIKVGPIG STFNSGGSNP GGSTTTTKPA
TSTTTTKATT TATTNTTGPT GTGVAQPWAQ CGGIGYSGPT QCAAPYTCTK QNDYYSQCL
MHPSLQTILL SALFTTAHAQ QACSSKPETH PPLSWSRCSR SGCRSVQGAV TVDANWLWTT VDGSQNCYTG NRWDTSICSS
EKTCSESCCI DGADYAGTYG VTTTGDALSL KFVQQGPYSK NVGSRLYLMK DESRYEMFTL LGNEFTFDVD VSKLGCGLNG
ALYFVSMDED GGMKRFPMNK AGAKFGTGYC DSQCPRDVKF INGMANSKDW IPSKSDANAG IGSLGACCRE MDIWEANNIA
SAFTPHPCKN SAYHSCTGDG CGGTYSKNRY SGDCDPDGCD FNSYRLGNTT FYGPGPKFTI DTTRKISVVT QFLKGRDGSL
REIKRFYVQN GKVIPNSVSR VRGVPGNSIT QGFCNAQKKM FGAHESFNAK GGMKGMSAAV SKPMVLVMSL WDDHNSNMLW
LDSTYPTNSR QRGSKRGSCP ASSGRPTDVE SSAPDSTVVF SNIKFGPIGS TFSRGK
ESACTLQSET HPPLTWQKCS SGGTCTQQTG SVVIDANWRW THATNSSTNC YDGNTWSSTL CPDNETCAKN CCLDGAAYAS
TYGVTTSGNS LSIDFVTQSA QKNVGARLYL MASDTTYQEF TLLGNEFSFD VDVSQLPCGL NGALYFVSMD ADGGVSKYPT
NTAGAKYGTG YCDSQCPRDL KFINGQANVE GWEPSSNNAN TGIGGHGSCC SEMDIWQANS ISEALTPHPC TTVGQEICEG
DGCGGTYSDN RYGGTCDPDG CDWNPYRLGN TSFYGPGSSF TLDTTKKLTV VTQFETSGAI NRYYVQNGVT FQQPNAELGS
YSGNELNDDY CTAEEAEFGG SSFSDKGGLT QFKKATSGGM VLVMSLWDDY YANMLWLDST YPTNETSSTP GAVRGSCSTS
SGVPAQVESQ SPNAKVTFSN IKFGPIGSTG NPSG
MHQRALLFSA LAVAANAQQV GTQKPETHPP LTWQKCTAAG SCSQQSGSVV IDANWRWLHS TKDTTNCYTG NTWNTELCPD
NESCAQNCAV DGADYAGTYG VTTSGSELKL SFVTGANVGS RLYLMQDDET YQHFNLLNNE FTFDVDVSNL PCGLNGALYF
VAMDADGGMS KYPSNKAGAK YGTGYCDSQC PRDLKFINGM ANVEGWKPSS NDKNAGVGGH GSCCPEMDIW EANSISTAVT
PHPCDDVSQT MCSGDACGGT YSATRYAGTC DPDGCDFNPF RMGNESFYGP GKIVDTKSEM TVVTQFITAD GTDTGALSEI
KRLYVQNGKV IANSVSNVAD VSGNSISSDF CTAQKKAFGD EDIFAKHGGL SGMGKALSEM VLIMSIWDDH HSSMMWLDST
YPTDADPSKP GVARGTCEHG AGDPEKVESQ HPDASVTFSN IKFGPIGSTY KA
MYRSLIFATS LLSLAKGQLV GNLYCKGSCT AKNGKVVIDA NWRWLHVKGG YTNCYTGNEW NATACPDNKS CATNCAIDGA
DYRRLRHYCE RQLLGTEVHH QGLYSTNIGS RTYLMQDDST YQLFKFTGSQ EFTFDVDLSN LPCGLNGALY FVSMDADGGL
KKYPTNKAGA KYGTGYCDAQ CPRDLKFING EGNVEGWQPS KNDQNAGVGG HGSCCAEMDI WEANSVSTAV TPHSCSTIEQ
SRCDGDGCGG TYSADRYAGV CDPDGCDFNS YRMGVKDFYG KGKTVDTSKK FTVVTQFIGS GDAMEIKRFY VQNGKTIPQP
DSTIPGVTGN SITTFFCDAQ KKAFGDKYTF KDKGGMANMP STCNGMVLVM SLWDDHYSNM LWLDSTYPTD KNPDTDAGSG
RGECAITSGV PADVESQHPD ASVIYSNIKF GPINTTFG
MLAKFAALAA LVASANAQAV CSLTAETHPS LNWSKCTSSG CTNVAGSITV DANWRWTHIT SGSTNCYSGN EWDTSLCSTN
TDCATKCCVD GAEYSSTYGI QTSGNSLSLQ FVTKGSYSTN IGSRTYLMNG ADAYQGFELL GNEFTFDVDV SGTGCGLNGA
LYFVSMDLDG GKAKYTNNKA GAKYGTGYCD AQCPRDLKYI NGIANVEGWT PSTNDANAGI GDHGTCCSEM DIWEANKVST
AFTPHPCTTI EQHMCEGDSC GGTYSDDRYG GTCDADGCDF NSYRMGNTTF YGEGKTVDTS SKFTVVTQFI KDSAGDLAEI
KRFYVQNGKV IENSQSNVDG VSGNSITQSF CNAQKTAFGD IDDFNKKGGL KQMGKALAKP MVLVMSIWDD HAANMLWLDS
TYPVEGGPGA YRGECPTTSG VPAEVEANAP NSKVIFSNIK FGPIGSTFSG GSSGTPPSNP SSSVKPVTST AKPSSTSTAS
NPSGTGAAHW AQCGGIGFSG PTTCQSPYTC QKINDYYSQC V
MFKKVALTAL CFLAVAQAQQ VGREVAENHP RLPWQRCTRN GGCQTVSNGQ VVLDANWRWL HVTDGYTNCY TGNSWNSTVC
SDPTTCAQRC ALEGANYQQT YGITTNGDAL TIKFLTRSQQ TNVGARVYLM ENENRYQMFN LLNKEFTFDV DVSKVPCGIN
GALYFIQMDA DGGMSKQPNN RAGAKYGTGY CDSQCPRDIK FIDGVANSAD WTPSETDPNA GRGRYGICCA EMDIWEANSI
SNAYTPHPCR TQNDGGYQRC EGRDCNQPRY EGLCDPDGCD YNPFRMGNKD FYGPGKTVDT NRKMTVVTQF ITHDNTDTGT
LVDIRRLYVQ DGRVIANPPT NFPGLMPAHD SITEQFCTDQ KNLFGDYSSF ARDGGLAHMG RSLAKGHVLA LSIWNDHGAH
MLWLDSNYPT DADPNKPGIA RGTCPTTGGT PRETEQNHPD AQVIFSNIKF GDIGSTFSGY
MYSAAVLATF SFLLGAGAQQ VGTSTAETHP ALTVQKCAAG GTCTDESDSI VLDANWRWLH STSGSTNCYT GNTWDTTLCP
DAATCTTNCA LDGADYEGTY GITTSGDSLK LSFVTGSNVG SRTYLMDSET TYKEFALLGN EFTFTVDVSK LPCGLNGALY
FVPMDADGGM SKYPTNKAGA KYGTGYCDAQ CPQDMKFVNG TANVEGWVPD SNSANSGTGN IGSCCSEFDV WEANSMSQAL
TPHVCTVDSQ TACTGDDCAS NTGVCDGDGC DFNPYRMGNT TFYGSGMTID TSKPFSVVTQ FITDDGTETG TLTEIKRFYV
QDDVVYEQPS SDISGVSGNS ITDDFCAAQK TAFGDTDYFT QNGGMAAMGK KMADGMVLVL SIWDDYNVNM LWLDSDYPTT
KDASTPGVSR GSCATDSGVP ATVEAASGSA YVTFSSIKYG PIGSTFNAPA DSSSSVSASS SPAPIASSSS SASIAPVSSV
VAAIVSSSAQ AISSAAPVVS SSAQAISSAA PVVSSVVSSA APVATSSTKS KCSKVSSTLK TSVAAPATSA TSAAVVATSS
AASSTGSVPL YGNCTGGKTC SEGTCVVQND YYSQCVASS
MTWQRCTGTG GSSCTNVNGE IVIDANWRWI HATGGYTNCF DGNEWNKTAC PSNAACTKNC AIEGSDYRGT YGITTSGNSL
TLKFITKGQY STNVGSRTYL MKDTNNYEMF NLIGNEFTFD VDLSQLPCGL NGALYFVSMP EKGQGTPGAK YGTGKLSQCS
VHISKTLTDA CARDLKFVGG EANADGWQAS TSDPNAGVGK KGACCAEMDV WEANSMSTAL TPHSCQPEGY AVCEESNCGG
TYSLDRYAGT CDANGCDFNP YRVGNKDFYG KGKTVDTSKK MTVVTQFLGT GSDLTELKRF YVQDGKVISN PEPTIPGMTG
NSITQKWCDT QKEVFKEEVY PFNQWGGMAS MGKGMAQGMV LVMSLWDDHY SNMLWLDSTY PTDRDPESPG AARGECAITS
GAPAEVEANN PDASVMFSNI KFGPIGSTFQ QPA
MQIKSYIQYL AAALPLLSSV AAQQAGTITA ENHPRMTWKR CSGPGNCQTV QGEVVIDANW RWLHNNGQNC YEGNKWTSQC
SSATDCAQRC ALDGANYQST YGASTSGDSL TLKFVTKHEY GTNIGSRFYL MANQNKYQMF TLMNNEFAFD VDLSKVECGI
NSALYFVAME EDGGMASYPS NRAGAKYGTG YCDAQCARDL KFIGGKANIE GWRPSTNDPN AGVGPMGACC AEIDVWESNA
YAYAFTPHAC GSKNRYHICE TNNCGGTYSD DRFAGYCDAN GCDYNPYRMG NKDFYGKGKT VDTNRKFTVV SRFERNRLSQ
FFVQDGRKIE VPPPTWPGLP NSADITPELC DAQFRVFDDR NRFAETGGFD ALNEALTIPM VLVMSIWDDH HSNMLWLDSS
YPPEKAGLPG GDRGPCPTTS GVPAEVEAQY PNAQVVWSNI RFGPIGSTVN V
MRTAKFATLA ALVASAAAQQ ACSLTTERHP SLSWKKCTAG GQCQTVQASI TLDSNWRWTH QVSGSTNCYT GNKWDTSICT
DAKSCAQNCC VDGADYTSTY GITTNGDSLS LKFVTKGQYS TNVGSRTYLM DGEDKYQTFE LLGNEFTFDV DVSNIGCGLN
GALYFVSMDA DGGLSRYPGN KAGAKYGTGY CDAQCPRDIK FINGEANIEG WTGSTNDPNA GAGRYGTCCS EMDIWEANNM
ATAFTPHPCT IIGQSRCEGD SCGGTYSNER YAGVCDPDGC DFNSYRQGNK TFYGKGMTVD TTKKITVVTQ FLKDANGDLG
EIKRFYVQDG KIIPNSESTI PGVEGNSITQ DWCDRQKVAF GDIDDFNRKG GMKQMGKALA GPMVLVMSIW DDHASNMLWL
DSTFPVDAAG KPGAERGACP TTSGVPAEVE AEAPNSNVVF SNIRFGPIGS TVAGLPGAGN GGNNGGNPPP PTTTTSSAPA
TTTTASAGPK AGRWQQCGGI GFTGPTQCEE PYTCTKLNDW YSQCL
MQIKQYLQYL AAALPLVNMA AAQRAGTQQT ETHPRLSWKR CSSGGNCQTV NAEIVIDANW RWLHDSNYQN CYDGNRWTSA
CSSATDCAQK CYLEGANYGS TYGVSTSGDA LTLKFVTKHE YGTNIGSRVY LMNGSDKYQM FTLMNNEFAF DVDLSKVECG
LNSALYFVAM EEDGGMRSYS SNKAGAKYGT GYCDAQCARD LKFVGGKANI EGWRPSTNDA NAGVGPYGAC CAEIDVWESN
AYAFAFTPHG CLNNNYHVCE TSNCGGTYSE DRFGGLCDAN GCDYNPYRMG NKDFYGKGKT VDTSRKFTVV TRFEENKLTQ
FFIQDGRKID IPPPTWPGLP NSSAITPELC TNLSKVFDDR DRYEETGGFR TINEALRIPM VLVMSIWDGH YANMLWLDSV
YPPEKAGQPG AERGPCAPTS GVPAEVEAQF PNAQVIWSNI RFGPIGSTYQ V
MMYKKFAALA ALVAGAAAQQ ACSLTTETHP RLTWKRCTSG GNCSTVNGAV TIDANWRWTH TVSGSTNCYT GNEWDTSICS
DGKSCAQTCC VDGADYSSTY GITTSGDSLN LKFVTKHQHG TNVGSRVYLM ENDTKYQMFE LLGNEFTFDV DVSNLGCGLN
GALYFVSMDA DGGMSKYSGN KAGAKYGTGY CDAQCPRDLK FINGEANIEN WTPSTNDANA GFGRYGSCCS EMDIWDANNM
ATAFTPHPCT IIGQSRCEGN SCGGTYSSER YAGVCDPDGC DFNAYRQGDK TFYGKGMTVD TTKKMTVVTQ FHKNSAGVLS
EIKRFYVQDG KIIANAESKI PGNPGNSITQ EWCDAQKVAF GDIDDFNRKG GMAQMSKALE GPMVLVMSVW DDHYANMLWL
DSTYPIDKAG TPGAERGACP TTSGVPAEIE AQVPNSNVIF SNIRFGPIGS TVPGLDGSTP SNPTATVAPP TSTTTSVRSS
TTQISTPTSQ PGGCTTQKWG QCGGIGYTGC TNCVAGTTCT ELNPWYSQCL
MYRNFLYAAS LLSVARSQLV GTQTTETHPG MTWQSCTAKG SCTTCSDNKA CASNCAVDGA DYKGTYGITA SGNSLQLKFI
TKGSYSTNIG SRTYLMASDT AYQMFKFDGN KEFTFDVDLS GLPCGFNGAL YFVSMDEDGG LKKYSGNKAG AKYGTGYCDA
QCPRDLKFIN GEGNVEGWKP SDNDANAGVG GHGSCCAEMD IWEANSISTA VTPHACSTIE QTRCDGDGCG GTYSADRYAG
VCDPDGCDFN AYRMGVKNFY GKGMTVDTSK KFTVVTQFIG TGDAMEIKRF YVQGGKTIEQ PASTIPGVEG NSITTKFCDQ
QKQVFGDRYT YKEKGGTANM AKALAQGMVL VMSLWDDHYS NMLWLDSTYP TDKNPDTDLG SGRGSCDVKS GAPADVESKS
PDATVIYSNI KFGPLNSTY
MLGKIAIASL SFLAIAKGQQ VGREVAENHP RLPWQRCTRN GGCQTVSNGQ VVLDANWRWL HVTDGYTNCY TGNSWNSSVC
SDGTTCAQRC ALEGANYQQT YGITTSGNSL TMKFLTRSQG TNVGGRVYLM ENENRYQMFN LLNKEFTFDV DVSKVPCGIN
GALYFIQMDA DGGMSSQPNN RAGAKYGTGY CDSQCPRDIK FIDGVANSVG WEPSETDSNA GRGRYGICCA EMDIWEANSI
SNAYTPHPCR TQNDGGYQRC EGRDCNQPRY EGLCDPDGCD YNPFRMGNKD FYGPGKTIDT NRKMTVVTQF ITHDNTDTGT
LVDIRRLYVQ DGRVIANPPT NFPGLMPAHD SITEQFCTDQ KNLFGDYSSF ARDGGLAHMG RSLAKGHVLA LSIWNDHGAH
MLWLDSNYPT DADPNKPGIA RGTCPTTGGT PRETEQNHPD AQVIFSNIKF GDIGSTFSGY
MFPRSILLAL SLTAVALGQQ VGTNMAENHP SLTWQRCTSS GCQNVNGKVT LDANWRWTHR INDFTNCYTG NEWDTSICPD
GVTCAENCAL DGADYAGTYG VTSSGTALTL KFVTESQQKN IGSRLYLMAD DSNYEIFNLL NKEFTFDVDV SKLPCGLNGA
LYFSEMAADG GMSSTNTAGA KYGTGYCDSQ CPRDIKFIDG EANSEGWEGS PNDVNAGTGN FGACCGEMDI WEANSISSAY
TPHPCREPGL QRCEGNTCSV NDRYATECDP DGCDFNSFRM GDKSFYGPGM TVDTNQPITV VTQFITDNGS DNGNLQEIRR
IYVQNGQVIQ NSNVNIPGID SGNSISAEFC DQAKEAFGDE RSFQDRGGLS GMGSALDRGM VLVLSIWDDH AVNMLWLDSD
YPLDASPSQP GISRGTCSRD SGKPEDVEAN AGGVQVVYSN IKFGDINSTF NNNGGGGGNP SPTTTRPNSP AQTMWGQCGG
QGWTGPTACQ SPSTCHVIND FYSQCF
MYRNLALASL SLFGAARAQQ AGTVTTETHP SLSWKTCTGT GGTSCTTKAG KITLDANWRW THVTTGYTNC YDGNSWNTTA
CPDGATCTKN CAVDGADYSG TYGITTSSNS LSIKFVTKGS NSANIGSRTY LMESDTKYQM FNLIGQEFTF DVDVSKLPCG
LNGALYFVEM AADGGIGKGN NKAGAKYGTG YCDSQCPHDI KFINGKANVE GWNPSDADPN AGSGKIGACC PEMDIWEANS
ISTAYTPHPC KGTGLQECTD DVSCGDGSNR YSGLCDKDGC DFNSYRMGVK DFYGPGATLD TTKKMTVVTQ FLGSGSTLSE
IKRFYVQNGK VFKNSDSAIE GVTGNSITES FCAAQKTAFG DTNSFKTLGG LNEMGASLAR GHVLVMSLWD DHAVNMLWLD
STYPTNSTKL GAQRGTCAID SGKPEDVEKN HPDATVVFSD IKFGPIGSTF QQPS
MVDIQIATFL LLGVVGVAAQ QVGTYIPENH PLLATQSCTA SGGCTTSSSK IVLDANRRWI HSTLGTTSCL TANGWDPTLC
PDGITCANYC ALDGVSYSST YGITTSGSAL RLQFVTGTNI GSRVFLMADD THYRTFQLLN QELAFDVDVS KLPCGLNGAL
YFVAMDADGG KSKYPGNRAG AKYGTGYCDS QCPRDVQFIN GQANVQGWNA TSATTGTGSY GSCCTELDIW EANSNAAALT
PHTCTNNAQT RCSGSNCTSN TGFCDADGCD FNSFRLGNTT FLGAGMSVDT TKTFTVVTQF ITSDNTSTGN LTEIRRFYVQ
NGNVIPNSVV NVTGIGAVNS ITDPFCSQQK KAFIETNYFA QHGGLAQLGQ ALRTGMVLAF SISDDPANHM LWLDSNFPPS
ANPAVPGVAR GMCSITSGNP ADVGILNPSP YVSFLNIKFG SIGTTFRPA
MHQRALLFSA LAVAANAQQV GTQTPETHPP LTWQKCTAAG SCSQQSGSVV IDANWRWLHS TKDTTNCYTG NTWNTELCPD
NESCAQNCAL DGADYAGTYG VTTSGSELKL SFVTGANVGS RLYLMQDDET YQHFNLLNHE FTFDVDVSNL PCGLNGALYF
VAMDADGGMS KYPSNKAGAK YGTGYCDSQC PRDLKFINGM ANVEGWEPSS SDKNAGVGGH GSCCPEMDIW EANSISTAVT
PHPCDDVSQT MCSGDACGGT YSESRYAGTC DPDGCDFNPF RMGNESFYGP GKIVDTKSKM TVVTQFITAD GTDSGALSEI
KRLYVQNGKV IANSVSNVAG VSGNSITSDF CTAQKKAFGD EDIFAKHGGL SGMGKALSEM VLIMSIWDDH HSSMMWLDST
YPTDADPSKP GVARGTCEHG AGDPENVESQ HPDASVTFSN IKFGPIGSTY EG
MFRTATLLAF TMAAMVFGQQ VGTNTAENHR TLTSQKCTKS GGCSNLNTKI VLDANWRWLH STSGYTNCYT GNQWDATLCP
DGKTCAANCA LDGADYTGTY GITASGSSLK LQFVTGSNVG SRVYLMADDT HYQMFQLLNQ EFTFDVDMSN LPCGLNGALY
LSAMDADGGM AKYPTNKAGA KYGTGYCDSQ CPRDIKFING EANVEGWNAT SANAGTGNYG TCCTEMDIWE ANNDAAAYTP
HPCTTNAQTR CSGSDCTRDT GLCDADGCDF NSFRMGDQTF LGKGLTVDTS KPFTVVTQFI TNDGTSAGTL TEIRRLYVQN
GKVIQNSSVK IPGIDPVNSI TDNFCSQQKT AFGDTNYFAQ HGGLKQVGEA LRTGMVLALS IWDDYAANML WLDSNYPTNK
DPSTPGVARG TCATTSGVPA QIEAQSPNAY VVFSNIKFGD LNTTYTGTVS SSSVSSSHSS TSTSSSHSSS STPPTQPTGV
TVPQWGQCGG IGYTGSTTCA SPYTCHVLNP YYSQCY
MYQRALLFSF FLAAARAHEA GTVTAENHPS LTWQQCSSGG SCTTQNGKVV IDANWRWVHT TSGYTNCYTG NTWDTSICPD
DVTCAQNCAL DGADYSGTYG VTTSGNALRL NFVTQSSGKN IGSRLYLLQD DTTYQIFKLL GQEFTFDVDV SNLPCGLNGA
LYFVAMDADG NLSKYPGNKA GAKYGTGYCD SQCPRDLKFI NGQANVEGWQ PSANDPNAGV GNHGSSCAEM DVWEANSIST
AVTPHPCDTP GQTMCQGDDC GGTYSSTRYA GTCDPDGCDF NPYQPGNHSF YGPGKIVDTS SKFTVVTQFI TDDGTPSGTL
TEIKRFYVQN GKVIPQSEST ISGVTGNSIT TEYCTAQKAA FGDNTGFFTH GGLQKISQAL AQGMVLVMSL WDDHAANMLW
LDSTYPTDAD PDTPGVARGT CPTTSGVPAD VESQNPNSYV IYSNIKVGPI NSTFTAN
MQIKSYIQYL AAALPLLSSV AAQQAGTITA ENHPRMTWKR CSGPGNCQTV QGEVVIDANW RWLHNNGQNC YEGNKWTSQC
SSATDCAQRC ALDGANYQST YGASTSGDSL TLKFVTKHEY GTNIGSRFYL MANQNKYQMF TLMNNEFAFD VDLSKVECGI
NSALYFVAME EDGGMASYPS NRAGAKYGTG YCDAQCARDL KFIGGKANIE GWRPSTNDPN AGVGPMGACC AEIDVWESNA
YAYAFTPHAC GSKNRYHICE TNNCGGTYSD DRFAGYCDAN GCDYNPYRMG NKDFYGKGKT VDTNRKFTVV SRFERNRLSQ
FFVQDGRKIE VPPPTWPGLP NSADITPELC DAQFRVFDDR NRFAETGGFD ALNEALTIPM VLVMSIWDDH HSNMLWLDSS
YPPEKAGLPG GDRGPCPTTS GVPAEVEAQY PDAQVVWSNI RFGPIGSTVN V
MYRKLAVISA FLATARAQSA CTLQSETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG NTWSSTLCPD
NETCAKNCCL DGAAYASTYG VTTSGNSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL GNEFSFDVDV SQLPCGLNGA
LYFVSMDADG GVSKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE PSSNNANTGI GGHGSCCSEM DIWEANSISE
ALTPHPCTTV GQEICEGDGC GGTYSDNRYG GTCDPDGCDW DPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY
YVQNGVTFQQ PNAELGSYSG NGLNDDYCTA EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT
NETSSTPGAV RGSCSTSSGV PAQVESQSPN AKVTFSNIKF GPIGSTGDPS GGNPPGGNPP GTTTTRRPAT TTGSSPGPTQ
SHYGQCGGIG YSGPTVCASG TTCQVLNPYY SQCL
MYQRALLFSF FLAAARAQQA GTVTAENHPS LTWQQCSSGG SCTTQNGKVV IDANWRWVHT TSGYTNCYTG NTWDTSICPD
DVTCAQNCAL DGADYSGTYG VTTSGNALRL NFVTQSSGKN IGSRLYLLQD DTTYQIFKLL GQEFTFDVDV SNLPCGLNGA
LYFVAMDADG GLSKYPGNKA GAKYGTGYCD SQCPRDLKFI NGQANVEGWQ PSANDPNAGV GNHGSCCAEM DVWEANSIST
AVTPHPCDTP GQTMCQGDDC GGTYSSTRYA GTCDPDGCDF NPYRQGNHSF YGPGQIVDTS SKFTVVTQFI TDDGTPSGTL
TEIKRFYVQN GKVIPQSEST ISGVTGNSIT TEYCTAQKAA FGDNTGFFTH GGLQKISQAL AQGMVLVMSL WDDHAANMLW
LDSTYPTDAD PDTPGVARGT CPTTSGVPAD VESQYPNSYV IYSNIKVGPI NSTFTAN
MIRKITTLAA LVGVVRGQAA CSLTAETHPS LTWQKCSSGG SCTNVAGSVT IDANWRWTHT TSGYTNCYTG NKWDTSICST
NADCASKCCV DGANYQQTYG ASTSGNALSL QYVTQSSGKN VGSRLYLLES ENKYQMFNLL GNEFTFDVDA SKLGCGLNGA
VYFVSMDADG GQSKYSGNKA GAKYGTGYCD SQCPRDLKYI NGAANVEGWQ PSSGDANSGV GNMGSCCAEM DIWEANSIST
AYTPHPCSNN AQHSCKGDDC GGTYSSVRYA GDCDPDGCDF NSYRQGNRTF YGPGSNFNVD SSKKVTVVTQ FISSGGQLTD
IKRFYVQNGK VIPNSQSTIT GVTGNSVTQD YCDKQKTAFG DQNVFNQRGG LRQMGDALAK GMVLVMSVWD DHHSQMLWLD
STYPTTSTAP GAARGSCSTS SGKPSDVQSQ TPGATVVYSN IKFGPIGSTF KSS
MLRRALLLSS SAILAVKAQQ AGTATAENHP PLTWQECTAP GSCTTQNGAV VLDANWRWVH DVNGYTNCYT GNTWDPTYCP
DDETCAQNCA LDGADYEGTY GVTSSGSSLK LNFVTGSNVG SRLYLLQDDS TYQIFKLLNR EFSFDVDVSN LPCGLNGALY
FVAMDADGGV SKYPNNKAGA KYGTGYCDSQ CPRDLKFIDG EANVEGWQPS SNNANTGIGD HGSCCAEMDV WEANSISNAV
TPHPCDTPGQ TMCSGDDCGG TYSNDRYAGT CDPDGCDFNP YRMGNTSFYG PGKIIDTTKP FTVVTQFLTD DGTDTGTLSE
IKRFYIQNSN VIPQPNSDIS GVTGNSITTE FCTAQKQAFG DTDDFSQHGG LAKMGAAMQQ GMVLVMSLWD DYAAQMLWLD
SDYPTDADPT TPGIARGTCP TDSGVPSDVE SQSPNSYVTY SNIKFGPINS TFTAS
MHQRALLFSA FWTAVQAQQA GTLTAETHPS LTWQKCAAGG TCTEQKGSVV LDSNWRWLHS VDGSTNCYTG NTWDATLCPD
NESCASNCAL DGADYEGTYG VTTSGDALTL QFVTGANIGS RLYLMADDDE SYQTFNLLNN EFTFDVDASK LPCGLNGAVY
FVSMDADGGV AKYSTNKAGA KYGTGYCDSQ CPRDLKFING QVRKGWEPSD SDKNAGVGGH GSCCPQMDIW EANSISTAYT
PHPCDDTAQT MCEGDTCGGT YSSERYAGTC DPDGCDFNAY RMGNESFYGP SKLVDSSSPV TVVTQFITAD GTDSGALSEI
KRFYVQGGKV IANAASNVDG VTGNSITADF CTAQKKAFGD DDIFAQHGGL QGMGNALSSM VLTLSIWDDH HSSMMWLDSS
YPEDADATAP GVARGTCEPH AGDPEKVESQ SGSATVTYSN IKYGPIGSTF DAPA
MASTLSFKIY KNALLLAAFL GAAQAQQVGT STAEVHPSLT WQKCTAGGSC TSQSGKVVID SNWRWVHNTG GYTNCYTGND
WDRTLCPDDV TCATNCALDG ADYKGTYGVT ASGSSLRLNF VTQASQKNIG SRLYLMADDS KYEMFQLLNQ EFTFDVDVSN
LPCGLNGALY FVAMDEDGGM ARYPTNKAGA KYGTGYCDAQ CPRDLKFING QANVEGWEPS SSDVNGGTGN YGSCCAEMDI
WEANSISTAF TPHPCDDPAQ TRCTGDSCGG TYSSDRYGGT CDPDGCDFNP YRMGNQSFYG PSKIVDTESP FTVVTQFITN
DGTSTGTLSE IKRFYVQNGK VIPQSVSTIS AVTGNSITDS FCSAQKTAFK DTDVFAKHGG MAGMGAGLAE GMVLVMSLWD
DHAANMLWLD STYPTSASST TPGAARGSCD ISSGEPSDVE ANHSNAYVVY SNIKVGPLGS TFGSTDSGSG TTTTKVTTTT
ATKTTTTTGP STTGAAHYAQ CGGQNWTGPT TCASPYTCQR QGDYYSQCL
MVSAKFAALA ALVASASAQQ VCSLTPESHP PLTWQRCSAG GSCTNVAGSV TLDSNWRWTH TLQGSTNCYS GNEWDTSICT
TGTKCAQNCC VEGAEYAATY GITTSGNQLN LKFVTEGKYS TNVGSRTYLM ENATKYQGFN LLGNEFTFDV DVSNIGCGLN
GALYFVSMDL DGGLAKYSGN KAGAKYGTGY CDAQCPRDIK FINGEANIEG WNPSTNDVNA GAGRYGTCCS EMDIWEANNM
ATAYTPHSCT ILDQSRCEGE SCGGTYSSDR YGGVCDPDGC DFNSYRMGNK EFYGKGKTVD TTKKMTVVTQ FLKNAAGELS
EIKRFYVQNG VVIPNSVSSI PGVPNQNSIT QDWCDAQKIA FGDPDDNTAK GGLRQMGLAL DKPMVLVMSI WNDHAAHMLW
LDSTYPVDAA GRPGAERGAC PTTSGVPSEV EAEAPNSNVA FSNIKFGPIG STFNSGSTNP NPISSSTATT PTSTRVSSTS
TAAQTPTSAP GGTVPRWGQC GGQGYTGPTQ CVAPYTCVVS NQWYSQCL
MFPYIALVSF SFLSVVLAQQ VGTLTAETHP QLTVQQCTRG GSCTTQQRSV VLDGNWRWLH STSGSNNCYT GNTWDTSLCP
DAATCSRNCA LDGADYSGTY GITSSGNALT LKFVTHGPYS TNIGSRVYLL ADDSHYQMFN LKNKEFTFDV DVSQLPCGLN
GALYFSQMDA DGGTGRFPNN KAGAKYGTGY CDSQCPHDIK FINGEANVQG WQPSPNDSNA GKGQYGSCCA EMDIWEANSM
ASAYTPHPCT VTTPTRCQGN DCGDGDNRYG GVCDKDGCDF NSFRMGDKNF LGPGKTVNTN SKFTVVTQFL TSDNTTSGTL
SEIRRLYVQN GRVIQNSKVN IPGMASTLDS ITESFCSTQK TVFGDTNSFA SKGGLRAMGN AFDKGMVLVL SIWDDHEAKM
LWLDSNYPLD KSASAPGVAR GTCATTSGEP KDVESQSPNA QVIFSNIKYG DIGSTYSN
MYRAIATASA LIAAVRAQQV CSLTQESKPS LNWSKCTSSG CSNVKGSVTI DANWRWTHQV SGSTNCYTGN KWDTSVCTSG
KVCAERCCLD GADYASTYGI TSSGDQLSLS FVTKGPYSTN IGSRTYLMED ENTYQMFQLL GNEFTFDVDV SNIGCGLNGA
LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWQ PSDSDVNGGI GNLGTCCPEM DIWEANSIST
AYTPHPCTKL TQHSCTGDSC GGTYSNDRYG GTCDADGCDF NSYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS
EITRLYVQNG KVIANSESKI AGVPGNSLTA DFCTKQKKVF NDPDDFTKKG AWSGMSDALE APMVLVMSLW HDHHSNMLWL
DSTYPTDSTK LGSQRGSCST SSGVPADLEK NVPNSKVAFS NIKFGPIGST YKSDGTTPTN PTNPSEPSNT ANPNPGTVDQ
WGQCGGSNYS GPTACKSGFT CKKINDFYSQ CQ
MYSAAVLATF SFLLGAGAQQ VGTLKTESHP PLTIQKCAAG GTCTDEADSV VLDANWRWLH STSGSTNCYT GNTWDTTLCP
DAATCTANCA FDGADYEGTY GITSSGDSLK LSFVTGSNVG SRTYLMDSET TYKEFALLGN EFTFTVDVSK LPCGLNGALY
FVPMDADGGM SKYPTNKAGA KYGTGYCDAQ CPQDMKFVSG GANNEGWVPD SNSANSGTGN IGSCCSEFDV WEANSMSQAL
TPHTCTVDGQ TACTGDDCAG NTGVCDADGC DFNPYRMGNT TFYGSGKTID TTKPFSVVTQ FITDDGTETG TLTEIKRFYV
QDDVVYEQPN SDISGVSGNS ITDDFCTAQK TAFGDTDYFS QKGGMAAMGK KMADGMVLVL SIWDDYNVNM LWLDSDYPTT
KDASTPGVSR GSCATTSGVP ATVEAASGSA YVTFSSIKYG PIGSTFKAPA DSSSPVVASS SPAAVAAVVS TSSAQAVPSH
PAVSSSQAAV STPEAVSSAP EVPASSSAAQ SVAPTSTKPK CSKVSQSSTL ATSVAAPATT ATSAAVAATS AASSSGSVPL
YGNCTGGKTC SEGTCVVQNP WYSQCVASS
MFRAAALLAF TCLAMVSGQQ AGTNTAENHP QLQSQQCTTS GGCKPLSTKV VLDSNWRWVH STSGYTNCYT GNEWDTSLCP
DGKTCAANCA LDGADYSGTY GITSTGTALT LKFVTGSNVG SRVYLMADDT HYQLLKLLNQ EFTFDVDMSN LPCGLNGALY
LSAMDADGGM SKYPGNKAGA KYGTGYCDSQ CPKDIKFING EANVGNWTET GSNTGTGSYG TCCSEMDIWE ANNDAAAFTP
HPCTTTGQTR CSGDDCARNT GLCDGDGCDF NSFRMGDKTF LGKGMTVDTS KPFTVVTQFL TNDNTSTGTL SEIRRIYIQN
GKVIQNSVAN IPGVDPVNSI TDNFCAQQKT AFGDTNWFAQ KGGLKQMGEA LGNGMVLALS IWDDHAANML WLDSDYPTDK
DPSAPGVARG TCATTSGVPS DVESQVPNSQ VVFSNIKFGD IGSTFSGTSS PNPPGGSTTS SPVTTSPTPP PTGPTVPQWG
QCGGIGYSGS TTCASPYTCH VLNPYYSQCY
MYRKLAVISA FLATARAQSA CTLQSETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG NTWSSTLCPD
NETCAKNCCL DGAAYASTYG VTTSGNSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL GNEFSFDVDV SQLPCGLNGA
LYFVSMDADG GVSKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE PSSNNANTGI GGHGSCCSEM DIWEANSISE
ALTPHPCTTV GQEICEGDGC GGTYSDNRYG GTCDPDGCDW NPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY
YVQNGVTFQQ PNAELGSYSG NELNDDYCTA EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT
NETSSTPGAV RGSCSTSSGV PAQVESQSPN AKVTFSNIKF GPIGSTGNPS GGNPPGGNRG TTTTRRPATT TGSSPGPTQS
HYGQCGGIGY SGPTVCASGT TCQVLNPYYS QCL
MPSTYDIYKK LLLLASFLSA SQAQQVGTSK AEVHPSLTWQ TCTSGGSCTT VNGKVVVDAN WRWVHNVDGY NNCYTGNTWD
TTLCPDDETC ASNCALEGAD YSGTYGVTTS GNSLRLNFVT QASQKNIGSR LYLMEDDSTY KMFKLLNQEF TFDVDVSNLP
CGLNGAVYFV SMDADGGMAK YPANKAGAKY GTGYCDSQCP RDLKFINGMA NVEGWEPSAN DANAGTGNHG SCCAEMDIWE
ANSISTAYTP HPCDTPGQVM CTGDSCGGTY SSDRYGGTCD PDGCDFNSYR QGNKTFYGPG MTVDTKSKIT VVTQFLTNDG
TASGTLSEIK RFYVQNGKVI PNSESTWSGV SGNSITTAYC NAQKTLFGDT DVFTKHGGME GMGAALAEGM VLVLSLWDDH
NSNMLWLDSN YPTDKPSTTP GVARGSCDIS SGDPKDVEAN DANAYVVYSN IKVGPIGSTF SGSTGGGSSS STTATSKTTT
TSATKTTTTT TKTTTTTSAS STSTGGAQHW AQCGGIGWTG PTTCVAPYTC QKQNDYYSQC L
MISKVLAFTS LLAAARAQQA GTLTTETHPP LSVSQCTASG CTTSAQSIVV DANWRWLHST TGSTNCYTGN TWDKTLCPDG
ATCAANCALD GADYSGVYGI TTSGNSIKLN FVTKGANTNV GSRTYLMAAG STTQYQMLKL LNQEFTFDVD VSNLPCGLNG
ALYFAAMDAD GGLSRFPTNK AGAKYGTGYC DAQCPQDIKF INGVANSVGW TPSSNDVNAG AGQYGSCCSE MDIWEANKIS
AAYTPHPCSV DTQTRCTGTD CGIGARYSSL CDADGCDFNS YRQGNTSFYG AGLTVNTNKV FTVVTQFITN DGTASGTLKE
IRRFYVQNGV VIPNSQSTIA GVPGNSITDS FCAAQKTAFG DTNEFATKGG LATMSKALAK GMVLVMSIWD DHTANMLWLD
APYPATKSPS APGVTRGSCS ATSGNPVDVE ANSPGSSVTF SNIKWGPINS TYTGSGAAPS VPGTTTVSSA PASTATSGAG
GVAKYAQCGG SGYSGATACV SGSTCVALNP YYSQCQ
MFPAATLFAF SLFAAVYGQQ VGTQLAETHP RLTWQKCTRS GGCQTQSNGA IVLDANWRWV HNVGGYTNCY TGNTWNTSLC
PDGATCAKNC ALDGANYQST YGITTSGNAL TLKFVTQSEQ KNIGSRVYLL ESDTKYQLFN PLNQEFTFDV DVSQLPCGLN
GAVYFSAMDA DGGMSKFPNN AAGAKYGTGY CDSQCPRDIK FINGEANVQG WQPSPNDTNA GTGNYGACCN EMDVWEANSI
STAYTPHPCT QQGLVRCSGT ACGGGSNRYG SICDPDGCDF NSFRMGDKSF YGPGLTVNTQ QKFTVVTQFL TNNNSSSGTL
REIRRLYVQN GRVIQNSKVN IPGMPSTMDS VTTEFCNAQK TAFNDTFSFQ QKGGMANMSE ALRRGMVLVL SIWDDHAANM
LWLDSNYPTD RPASQPGVAR GTCPTSSGKP SDVENSTANS QVIYSNIKFG DIGSTYSA
MKGSISYQIY KGALLLSALL NSVSAQQVGT LTAETHPALT WSKCTAGXCS QVSGSVVIDA NWPXVHSTSG STNCYTGNTW
DATLCPDDVT CAANCAVDGA RRQHLRVTTS GNSLRINFVT TASQKNIGSR LYLLENDTTY QKFNLLNQEF TFDVDVSNLP
CGLNGALYFV DMDADGGMAK YPTNKAGAKY GTGYCDSQCP RDLKFINGQA NVDGWTPSKN DVNSGIGNHG SCCAEMDIWE
ANSISNAVTP HPCDTPSQTM CTGQRCGGTY STDRYGGTCD PDGCDFNPYR MGVTNFYGPG ETIDTKSPFT VVTQFLTNDG
TSTGTLSEIK RFYVQGGKVI GNPQSTIVGV SGNSITDSWC NAQKSAFGDT NEFSKHGGMA GMGAGLADGM VLVMSLWDDH
ASDMLWLDST YPTNATSTTP GAKRGTCDIS RRPNTVESTY PNAYVIYSNI KTGPLNSTFT GGTTSSSSTT TTTSKSTSTS
SSSKTTTTVT TTTTSSGSSG TGARDWAQCG GNGWTGPTTC VSPYTCTKQN DWYSQCL
MFRTAALTAF TLAAVVLGQQ VGTLTAENHP ALSIQQCTAS GCTTQQKSVV LDSNWRWTHS LPVHTNCYTG NAWDASLCPD
PTTCATNCAI DGADYSGTYG ITTSGNALTL RFVTNGPYSK NIGSRVYLLD DADHYKMFDL KNQEFTFDVD MSGLPCGLNG
ALYFSEMPAD GGKAAHTSNK AGAKYGTGYC DAQCPHDIKW INGEANILDW SASATDANAG NGRYGACCAE MDIWEANSEA
TAYTPHVCRD EGLYRCSGTE CGDGDNRYGG VCDKDGCDFN SYRMGDKNFL GRGKTIDTTK KITVVTQFIT DDNTSSGNLV
EIRRVYVQDG VTYQNSFSTF PSLSQYNSIS DDFCVAQKTL FGDNQYYNTH GGTEKMGDAM ANGMVLIMSL WSDHAAHMLW
LDSDYPLDKS PSEPGVSRGA CATTTGDPDD VVANHPNASV TFSNIKYGPI GSTYGGSTPP VSSGNTSAPP VTSTTSSGPT
TPTGPTGTVP KWGQCGGNGY SGPTTCVAGS TCTYSNDWYS QCL
MYQRALLFSA LLSVSRAQQA GTAQEEVHPS LTWQRCEASG SCTEVAGSVV LDSNWRWTHS VDGYTNCYTG NEWDATLCPD
NESCAQNCAV DGADYEATYG ITSNGDSLTL KFVTGSNVGS RVYLMEDDET YQMFDLLNNE FTFDVDVSNL PCGLNGALYF
TSMDADGGLS KYEGNTAGAK YGTGYCDSQC PRDIKFINGL GNVEGWEPSD SDANAGVGGM GTCCPEMDIW EANSISTAYT
PHPCDSVEQT MCEGDSCGGT YSDDRYGGTC DPDGCDFNSY RMGNTSFYGP GAIIDTSSKF TVVTQFIADG GSLSEIKRFY
VQNGEVIPNS ESNISGVEGN SITSEFCTAQ KTAFGDEDIF AQHGGLSAMG DAASAMVLIL SIWDDHHSSM MWLDSSYPTD
ADPSQPGVAR GTCEQGAGDP DVVESEHADA SVTFSNIKFG PIGSTF
MYRAIATASA LIAAVRAQQV CSLTTETKPA LTWSKCTSSG CSNVQGSVTI DANWRWTHQV SGSTNCHTGN KWDTSVCTSG
KVCAEKCCVD GADYASTYGI TSSGNQLSLS FVTKGSYGTN IGSRTYLMED ENTYQMFQLL GNEFTFDVDV SNIGCGLNGA
LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWE PSKSDVNGGI GNLGTCCPEM DIWEANSIST
AYTPHPCTKL TQHACTGDSC GGTYSNDRYG GTCDADGCDF NAYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS
EITRLYVQNG KVIANSESKI AGNPGSSLTS DFCTTQKKVF GDIDDFAKKG AWNGMSDALE APMVLVMSLW HDHHSNMLWL
DSTYPTDSTA LGSQRGSCST SSGVPADLEK NVPNSKVAFS NIKFGPIGST YNKEGTQPQP TNPTNPNPTN PTNPGTVDQW
GQCGGTNYSG PTACKSPFTC KKINDFYSQC Q
MFRTAALTAF TLAAVVLGQQ VGTLAAENHP ALSIQQCTAS GCTTQQKSVV LDSNWRWTHS TAGATNCYTG NAWDSSLCPN
PTTCATNCAI DGADYSGTYG ITTSGNSLTL RFVTNGQYSE NIGSRVYLLD DADHYKLFNL KNQEFTFDVD MSGLPCGLNG
ALYFSEMAAD GGKAAHTGNN AGAKYGTGYC DAQCPHDIKW INGEANILDW SGSATDPNAG NGRYGACCAE MDIWEANSEA
TAYTPHVCRD EGLYRCSGTE CGDGDNRYGG VCDKDGCDFN SYRMGDKNFL GRGKTIDTTK KITVVTQFIT DDNTPTGNLV
EIRRVYVQDG VTYQNSFSTF PSLSQYNSIS DDFCVAQKTL FGDNQYYNTH GGTEKMGDSL ANGMVLIMSL WSDHAAHMLW
LDSDYPLDKS PSEPGVSRGA CATTTGDPDD VVANHPNASV TFSNIKYGPI GSTYGGSTPP VSSGNTSVPP VTSTTSSGPT
TPTGPTGTVP KWGQCGGIGY SGPTSCVAGS TCTYSNEWYS QCL
MYQKLALISA FLATARAQSA CTLQAETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG NTWSSTLCPD
NETCAKNCCL DGAAYASTYG VTTSADSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL GNEFSFDVDV SQLPCGLNGA
LYFVSMDADG GVTKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE PSSNNANTGI GGHGSCCSEM DIWEANSISE
ALTPHPCTTV GQEICEGDSC GGTYSGDRYG GTCDPDGCDW NPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY
YVQNGVTFQQ PNAELGDYSG NSLDDDYCAA EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT
DETSSTPGAV RGSSSTSSGV PAQLESNSPN AKVVYSNIKF GPIGSTGNPS GGNPPGGNPP GTTTPRPATS TGSSPGPTQT
HYGQCGGIGY IGPTVCASGS TCQVLNPYYS QCL
MTWQSCTAKG SCTNKNGKIV IDANWRWLHK KEGYDNCYTG NEWDATACPD NKACAANCAV DGADYSGTYG ITAGSNSLKL
KFITKGSYST NIGSRTYLMK DDTTYEMFKF TGNQEFTFDV DVSNLPCGFN GALYFVSMDA DGGLKKYSTN KAGAKYGTGY
CDAQCPRDLK FINGEGNVEG WKPSSNDANA GVGGHGSCCA EMDIWEANSV STAVTPHSCS TIEQSRCDGD GCGGTYSADR
YAGVCDPDGC DFNSYRMGVK DFYGKGKTVD TSKKFTVVTQ FIGTGDAMEI KRFYVQNGKT IAQPASAVPG VEGNSITTKF
CDQQKAVFGD TYTFKDKGGM ANMAKALANG MVLVMSLWDD HYSNMLWLDS TYPTDKNPDT DLGTGRGECE TSSGVPADVE
SQHADATVVY SNIKFGPLNS TFG
MASAISFQVY RSALILSAFL PSITQAQQIG TYTTETHPSM TWETCTSGGS CATNQGSVVM DANWRWVHQV GSTTNCYTGN
TWDTSICDTD ETCATECAVD GADYESTYGV TTSGSQIRLN FVTQNSNGAN VGSRLYMMAD NTHYQMFKLL NQEFTFDVDV
SNLPCGLNGA LYFVTMDEDG GVSKYPNNKA GAQYGVGYCD SQCPRDLKFI QGQANVEGWT PSSNNENTGL GNYGSCCAEL
DIWESNSISQ ALTPHPCDTA TNTMCTGDAC GGTYSSDRYA GTCDPDGCDF NPYRMGNTTF YGPGKTIDTN SPFTVVTQFI
TDDGTDTGTL SEIRRYYVQN GVTYAQPDSD ISGITGNAIN ADYCTAENTV FDGPGTFAKH GGFSAMSEAM STGMVLVMSL
WDDYYADMLW LDSTYPTNAS SSTPGAVRGS CSTDSGVPAT IESESPDSYV TYSNIKVGPI GSTFSSGSGS GSSGSGSSGS
ASTSTTSTKT TAATSTSTAV AQHYSQCGGQ DWTGPTTCVS PYTCQVQNAY YSQCL
MKAYFEYLVA ALPLLGLATA QQVGKQTTET HPKLSWKKCT GKANCNTVNA EVVIDSNWRW LHDSSGKNCY DGNKWTSACS
SATDCASKCQ LDGANYGTTY GASTSGDALT LKFVTKHEYG TNIGSRFYLM NGASKYQMFT LMNNEFAFDV DLSTVECGLN
AALYFVAMEE DGGMASYSSN KAGAKYGTGY CDAQCARDLK FVGGKANIEG WTPSTNDANA GVGPYGGCCA EIDVWESNAH
SFAFTPHACK TNKYHVCERD NCGGTYSEDR FAGLCDANGC DYNPYRMGNT DFYGKGKTVD TSKKFTVVSR FEENKLTQFF
VQNGQKIEIP GPKWDGIPSD NANITPEFCS AQFQAFGDRD RFAEVGGFAQ LNSALRMPMV LVMSIWDDHY ANMLWLDSVY
PPEKEGQPGA ARGDCPQSSG VPAEVESQYA NSKVVYSNIR FGPVGSTVNV
MFSKFALTGS LLAGAVNAQG VGTQQTETHP QMTWQSCTSP SSCTTNQGEV VIDSNWRWVH DKDGYVNCYT GNTWNTTLCP
DDKTCAANCV LDGADYSSTY GITTSGNALS LQFVTQSSGK NIGSRTYLME SSTKYHLFDL IGNEFAFDVD LSKLPCGLNG
ALYFVTMDAD GGMAKYSTNT AGAEYGTGYC DSQCPRDLKF INGQGNVEGW TPSTNDANAG VGGLGSCCSE MDVWEANSMD
MAYTPHPCET AAQHSCNADE CGGTYSSSRY AGDCDPDGCD WNPFRMGNKD FYGSGDTVDT SQKFTVVTQF HGSGSSLTEI
SQYYIQGGTK IQQPNSTWPT LTGYNSITDD FCKAQKVEFN DTDVFSEKGG LAQMGAGMAD GMVLVMSLWD DHYANMLWLD
STYPVDADAS SPGKQRGTCA TTSGVPADVE SSDASATVIY SNIKFGPIGA TY
MFPAAALLSF TLLAVASAQQ IGTNTAEVHP SLTVSQCTTS GGCTSSTQSI VLDANWRWLH STSGYTNCYT GNQWNSDLCP
DPDTCATNCA LDGASYESTY GISTDGNAVT LNFVTQGSQT NVGSRVYLLS DDTHYQTFSL LNKEFSFDVD ASNIGCGING
AVYFVQMDAD GGLSKYSSNK AGAQYGTGYC DSQCPQDIKF INGEANLLDW NATSANSGTG SYGSCCPEMD IWEANKYAAA
YTPHPCSVSG QTRCTGTSCG AGSERYDGYC DKDGCDFNSW RMGNETFLGP GMTIDTNKKF TIVTQFITDD NTANGTLSEI
RRLYVQGGTV IQNSVANQPN IPKVNSITDS FCTAQKTEFG DQDYFGTIGG LSQMGKAMSD MVLVMSIWDD YDAEMLWLDS
NYPTSGSAST PGISRGPCSA TSGLPATVES QQASASVTYS NIKWGDIGST YSGSGSSGSS SSSSSSAASA STSTHTSAAA
TATSSAAAAT GSPVPAYGQC GGQSYTGSTT CASPYVCKVS NAYYSQCLPA
MKRALCASLS LLAAAVAQQV GTNEPEVHPK MTWKKCSSGG SCSTVNGEVV IDGNWRWIHN IGGYENCYSG NKWTSVCSTN
ADCATKCAME GAKYQETYGV STSGDALTLK FVQQNSSGKN VGSRMYLMNG ANKYQMFTLK NNEFAFDVDL SSVECGMNSA
LYFVPMKEDG GMSTEPNNKA GAKYGTGYCD AQCARDLKFI GGKGNIEGWQ PSSTDSSAGI GAQGACCAEI DIWESNKNAF
AFTPHPCENN EYHVCTEPNC GGTYADDRYG GGCDANGCDY NPYRMGNPDF YGPGKTIDTN RKFTVISRFE NNRNYQILMQ
DGVAHRIPGP KFDGLEGETG ELNEQFCTDQ FTVFDERNRF NEVGGWSKLN AAYEIPMVLV MSIWSDHFAN MLWLDSTYPP
EKAGQPGSAR GPCPADGGDP NGVVNQYPNA KVIWSNVRFG PIGSTYQVD
MQLTKAGVFL GALMGGAAAQ QVGTQTAENH PKMTWKKCTG KASCTTVNGE VVIDANWRWL HDASSKNCYD GNRWTDSCRT
ASDCAAKCSL EGADYAKTYG ASTSGDALSL KFVTRHDYGT NIGSRFYLMN GASKYQMFSL LGNEFAFDVD LSTIECGLNS
ALYFVAMEED GGMKSYSSNK AGAKYGTGYC DAQCARDLKF VGGKANIEGW KPSSNDANAG VGPYGACCAE IDVWESNAHA
FAFTPHPCTD NKYHVCQDSN CGGTYSDDRF AGKCDANGCD INPYRLGNTD FYGKGKTVDT SKKFTVVTRF ERDALTQFFV
QNNKRIDMPS PALEGLPATG AITAEYCTNV FNVFGDRNRF DEVGGWSQLQ QALSLPMVLV MSIWDDHYSN MLWLDSVYPP
DKEGSPGAAR GDCPQDSGVP SEVESQIPGA TVVWSNIRFG PVGSTVNV
MYRIVATASA LIAAARAQQV CSLNTETKPA LTWSKCTSSG CSDVKGSVVI DANWRWTHQT SGSTNCYTGN KWDTSICTDG
KTCAEKCCLD GADYSGTYGI TSSGNQLSLG FVTNGPYSKN IGSRTYLMEN ENTYQMFQLL GNEFTFDVDV SGIGCGLNGA
PHFVSMDEDG GKAKYSGNKA GAKYGTGYCD AQCPRDVKFI NGVANSEGWK PSDSDVNAGV GNLGTCCPEM DIWEANSIST
AFTPHPCTKL TQHSCTGDSC GGTYSSDRYG GTCDADGCDF NAYRQGNKTF YGPGSNFNID TTKKMTVVTQ FHKGSNGRLS
EITRLYVQNG KVIANSESKI AGNPGSSLTS DFCSKQKSVF GDIDDFSKKG GWNGMSDALS APMVLVMSLW HDHHSNMLWL
DSTYPTDSTK VGSQRGSCAT TSGKPSDLER DVPNSKVSFS NIKFGPIGST YKSDGTTPNP PASSSTTGSS TPTNPPAGSV
DQWGQCGGQN YSGPTTCKSP FTCKKINDFY SQCQ
MYQRALLFSA LATAVSAQQV GTQKAEVHPA LTWQKCTAAG SCTDQKGSVV IDANWRWLHS TEDTTNCYTG NEWNAELCPD
NEACAKNCAL DGADYSGTYG VTADGSSLKL NFVTSANVGS RLYLMEDDET YQMFNLLNNE FTFDVDVSNL PCGLNGALYF
VSMDADGGLS KYPGNKAGAK YGTGYCDSQC PRDLKFINGE ANVEGWKPSD NDKNAGVGGY GSCCPEMDIW EANSISTAYT
PHPCDGMEQT RCDGNDCGGT YSSTRYAGTC DPDGCDFNSF RMGNESFYGP GGLVDTKSPI TVVTQFVTAG GTDSGALKEI
RRVYVQGGKV IGNSASNVAG VEGDSITSDF CTAQKKAFGD EDIFSKHGGL EGMGKALNKM ALIVSIWDDH ASSMMWLDST
YPVDADASTP GVARGTCEHG LGDPETVESQ HPDASVTFSN IKFGPIGSTY KSV
MSALNSFNMY KSALILGSLL ATAGAQQIGT YTAETHPSLS WSTCKSGGSC TTNSGAITLD ANWRWVHGVN TSTNCYTGNT
WNTAICDTDA SCAQDCALDG ADYSGTYGIT TSGNSLRLNF VTGSNVGSRT YLMADNTHYQ IFDLLNQEFT FTVDVSNLPC
GLNGALYFVT MDADGGVSKY PNNKAGAQYG VGYCDSQCPR DLKFIAGQAN VEGWTPSTNN SNTGIGNHGS CCAELDIWEA
NSISEALTPH PCDTPGLTVC TADDCGGTYS SNRYAGTCDP DGCDFNPYRL GVTDFYGSGK TVDTTKPFTV VTQFVTDDGT
SSGSLSEIRR YYVQNGVVIP QPSSKISGIS GNVINSDFCA AELSAFGETA SFTNHGGLKN MGSALEAGMV LVMSLWDDYS
VNMLWLDSTY PANETGTPGA ARGSCPTTSG NPKTVESQSG SSYVVFSDIK VGPFNSTFSG GTSTGGSTTT TASGTTSTKA
STTSTSSTST GTGVAAHWGQ CGGQGWTGPT TCASGTTCTV VNPYYSQCL
MRTAKFATLA ALVASAAAQQ ACSLTTERHP SLSWNKCTAG GQCQTVQASI TLDSNWRWTH QVSGSTNCYT GNKWDTSICT
DAKSCAQNCC VDGADYTSTY GITTNGDSLS LKFVTKGQHS TNVGSRTYLM DGEDKYQTFE LLGNEFTFDV DVSNIGCGLN
GALYFVSMDA DGGLSRYPGN KAGAKYGTGY CDAQCPRDIK FINGEANIEG WTGSTNDPNA GAGRYGTCCS EMDIWEANNM
ATAFTPHPCT IIGQSRCEGD SCGGTYSNER YAGVCDPDGC DFNSYRQGNK TFYGKGMTVD TTKKITVVTQ FLKDANGDLG
EIKRFYVQDG KIIPNSESTI PGVEGNSITQ DWCDRQKVAF GDIDDFNRKG GMKQMGKALA GPMVLVMSIW DDHASNMLWL
DSTFPVDAAG KPGAERGACP TTSGVPAEVE AEAPNSNVVF SNIRFGPIGS TVAGLPGAGN GGNNGGNPPP PTTTTSSAPA
TTTTASAGPK AGRWQQCGGI GFTGPTQCEE PYICTKLNDW YSQCL
MMYKKFAALA ALVAGASAQQ ACSLTAENHP SLTWKRCTSG GSCSTVNGAV TIDANWRWTH TVSGSTNCYT GNQWDTSLCT
DGKSCAQTCC VDGADYSSTY GITTSGDSLN LKFVTKHQYG TNVGSRVYLM ENDTKYQMFE LLGNEFTFDV DVSNLGCGLN
GALYFVSMDA DGGMSKYSGN KAGAKYGTGY CDAQCPRDLK FINGEANVGN WTPSTNDANA GFGRYGSCCS EMDVWEANNM
ATAFTPHPCT TVGQSRCEAD TCGGTYSSDR YAGVCDPDGC DFNAYRQGDK TFYGKGMTVD TNKKMTVVTQ FHKNSAGVLS
EIKRFYVQDG KIIANAESKI PGNPGNSITQ EYCDAQKVAF SNTDDFNRKG GMAQMSKALA GPMVLVMSVW DDHYANMLWL
DSTYPIDQAG APGAERGACP TTSGVPAEIE AQVPNSNVIF SNIRFGPIGS TVPGLDGSNP GNPTTTVVPP ASTSTSRPTS
STSSPVSTPT GQPGGCTTQK WGQCGGIGYT GCTNCVAGTT CTQLNPWYSQ CL
MASLSLSKIC RNALILSSVL STAQGQQVGT YQTETHPSMT WQTCGNGGSC STNQGSVVLD ANWRWVHQTG SSSNCYTGNK
WDTSYCSTND ACAQKCALDG ADYSNTYGIT TSGSEVRLNF VTSNSNGKNV GSRVYMMADD THYEVYKLLN QEFTFDVDVS
KLPCGLNGAL YFVVMDADGG VSKYPNNKAG AKYGTGYCDS QCPRDLKFIQ GQANVEGWVS STNNANTGTG NHGSCCAELD
IWESNSISQA LTPHPCDTPT NTLCTGDACG GTYSSDRYSG TCDPDGCDFN PYRVGNTTFY GPGKTIDTNK PITVVTQFIT
DDGTSSGTLS EIKRFYVQDG VTYPQPSADV SGLSGNTINS EYCTAENTLF EGSGSFAKHG GLAGMGEAMS TGMVLVMSLW
DDYYANMLWL DSNYPTNEST SKPGVARGTC STSSGVPSEV EASNPSAYVA YSNIKVGPIG STFKS
MYRAIATASA LIAAVRAQQV CSLTPETKPA LSWSKCTSSG CSNVQGSVTI DANWRWTHQL SGSTNCYTGN KWDTSICTSG
KVCAEKCCID GAEYASTYGI TSSGNQLSLS FVTKGAYGTN IGSRTYLMED ENTYQMFQLL GNEFTFDVDV SNIGCGLNGA
LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWQ PSKSDVNAGI GNMGTCCPEM DIWEANSIST
AYTPHPCTKL TQHSCTGDSC GGTYSNDRYG GTCDADGCDF NAYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS
EITRLYVQNG KVIANSESKI AGVPGSSLTP EFCTAQKKVF GDTDDFAKKG AWSGMSDALE APMVLVMSLW HDHHSNMLWL
DSTYPTDSTK LGAQRGSCST SSGVPADLEK NVPNSKVAFS NIKFGPIGST YKEGVPEPTN PTNPTNPTNP TNPGTVDQWA
QCGGTNYSGP TACKSPFTCK KINDFYSQCQ
MFPKSSLLVL SFLATAYAQQ VGTQTAEVHP SLNWARCTSS GCTNVAGSVT LDANWRWLHT TSGYTNCYTG NSWNTTLCPD
GATCAQNCAL DGANYQSTCG ITTSGNALTL KFVTQGEQKN IGSRVYLMAS ESRYEMFGLL NKEFTFDVDV SNLPCGLNGA
LYFSSMDADG GMAKNPGNKA GAKYGTGYCD SQCPRDIKFI NGEANVAGWN GSPNDTNAGT GNWGACCNEM DIWEANSISA
AYTPHPCTVQ GLSRCSGTAC GTNDRYGTVC DPDGCDFNSY RMGDKTYYGP GGTGVDTRSK FTVVTQFLTN NNSSSGTLSE
IRRLYVQNGR VVQNSKVNIP GMSNTLDSIT TGFCDSQKTA FGDTRSFQNK GGMSAMGQAL GAGMVLVLSV WDDHAANMLW
LDSNYPVDAD PSKPGIARGT CSTTSGKPTD VEQSAANSSV TFSNIKFGDI GTTYTGGSVT TTPGNPGTTT STAPGAVQTK
WGQCGGQGWT GPTRCESGST CTVVNQWYSQ CI
MFRKAALLAF SFLAIAHGQQ VGTNQAENHP SLPSQHCTAS GCTTSSTSVV LDANWRWVHT TTGYTNCYTG QTWDASICPD
GVTCAKACAL DGADYSGTYG ITTSGNALTL QFVKGTNVGS RVYLLQDASN YQLFKLINQE FTFDVDMSNL PCGLNGAVYL
SQMDQDGGVS RFPTNTAGAK YGTGYCDSQC PRDIKFINGE ANVAGWTGSS SDPNSGTGNY GTCCSEMDIW EANSVAAAYT
PHPCSVNQQT RCTGADCGQD ANRYKGVCDP DGCDFNSFRM GDQTFLGKGL TVDTSRKFTI VTQFISDDGT SSGNLAEIRR
FYVQDGKVIP NSKVNIAGCD AVNSITDKFC TQQKTAFGDT NRFADQGGLK QMGAALKSGM VLALSLWDDH AANMLWLDSD
YPTTADASKP GVARGTCPNT SGVPKDVESQ SGSATVTYSN IKWGDLNSTF SGTASNPTGP SSSPSGPSSS SSSTAGSQPT
QPSSGSVAQW GQCGGIGYSG ATGCVSPYTC HVVNPYYSQC Y
TETHPRLTWK RCTSGGNCST VNGAVTIDAN WRWTHTVSGS TNCYTGNEWD TSICSDGKSC AQTCCVDGAD YSSTYGITTS
GDSLNLKFVT KHQHGTNVGS RVYLMENDTK YQMFELLGNE FTFDVDVSNL GCGLNGALYF VSMDADGGMS KYSGNKAGAK
YGTGYCDAQC PRDLKFINGE ANIENWTPST NDANAGFGRY GSCCSEMDIW EANNMATAFT PHPCTIIGQS RCEGNSCGGT
YSSERYAGVC DPDGCDFNAY RQGDKTFYGK GMTVDTTKKM TVVTQFHKNS AGVLSEIKRF YVQDGKIIAN AESKIPGNPG
NSITQEWCDA QKVAFGDIDD FNRKGGMAQM SKALEGPMVL VMSVWDDHYA NMLWLDSTYP IDKAGTPGAE RGACPTTSGV
PAEIEAQVPN SNVIFSNIRF GPIGSTVPGL DGSTPSNPTA TVAPPTSTTT SVRSSTTQIS TPTSQPGGCT TQKWGQCGGI
GYTGCTNCVA GTTCTELNPW YSQCL
MFHKAVLVAF SLVTIVHGQQ AGTQTAENHP QLSSQKCTAG GSCTSASTSV VLDSNWRWVH TTSGYTNCYT GNTWDASICS
DPVSCAQNCA LDGADYAGTY GITTSGDALT LKFVTGSNVG SRVYLMEDET NYQMFKLMNQ EFTFDVDVSN LPCGLNGAVY
FVQMDQDGGT SKFPNNKAGA KFGTGYCDSQ CPQDIKFING EANIVDWTAS AGDANSGTGS FGTCCQEMDI WEANSISAAY
TPHPCTVTEQ TRCSGSDCGQ GSDRFNGICD PDGCDFNSFR MGNTEFYGKG LTVDTSQKFT IVTQFISDDG TADGNLAEIR
RFYVQNGKVI PNSVVQITGI DPVNSITEDF CTQQKTVFGD TNNFAAKGGL KQMGEAVKNG MVLALSLWDD YAAQMLWLDS
DYPTTADPSQ PGVARGTCPT TSGVPSQVEG QEGSSSVIYS NIKFGDLNST FTGTLTNPSS PAGPPVTSSP SEPSQSTQPS
QPAQPTQPAG TAAQWAQCGG MGFTGPTVCA SPFTCHVLNP YYSQCY
MFRAAALLAF TCLAMVSGQQ AGTNTAENHP QLQSQQCTTS GGCKPLSTKV VLDSNWRWVH STSGYTNCYT GNEWNTSLCP
DGKTCAANCA LDGADYSGTY GITSTGTALT LKFVTGSNVG SRVYLMADDT HYQLLKLLNQ EFTFDVDMSN LPCGLNGALY
LSAMDADGGM SKYPGNKAGA KYGTGYCDSQ CPKDIKFING EANVGNWTET GSNTGTGSYG TCCSEMDIWE ANNDAAAFTP
HPCTTTGQTR CSGDDCARNT GLCDHGDGCD FNSFRMGDKT FLGKGMTVDT SKPFTDVTQF LTNDNTSTGT LSEIRRIYIQ
NGKVIQNSVA NIPGVDPVNS ITDNFCAQQK TAFGDTNWFA QKGGLKQMGE ALGNGMVLAL SIWDDHAANM LWLDSDYPTD
KDPSAPGVAR GTCATTSGVP SDVESQVPNS QVVFSNIKFG DIGSTFSGTS SPNPPGGSTT SSPVTTSPTP PPTGPTVPQW
GQCGGIGYSG STTCASPYTC HVLNPYYSQC Y
MMMKQYLQYL AAALPLVGLA AGQRAGNETP ENHPPLTWQR CTAPGNCQTV NAEVVIDANW RWLHDDNMQN CYDGNQWTNA
CSTATDCAEK CMIEGAGDYL GTYGASTSGD ALTLKFVTKH EYGTNVGSRF YLMNGPDKYQ MFNLMGNELA FDVDLSTVEC
GINSALYFVA MEEDGGMASY PSNQAGARYG TGYCDAQCAR DLKFVGGKAN IEGWKSSTSD PNAGVGPYGS CCAEIDVWES
NAYAFAFTPH ACTTNEYHVC ETTNCGGTYS EDRFAGKCDA NGCDYNPYRM GNPDFYGKGK TLDTSRKFTV VSRFEENKLS
QYFIQDGRKI EIPPPTWEGM PNSSEITPEL CSTMFDVFND RNRFEEVGGF EQLNNALRVP MVLVMSIWDD HYANMLWLDS
IYPPEKEGQP GAARGDCPTD SGVPAEVEAQ FPDAQVVWSN IRFGPIGSTY DF
MYRSATFLTF ASLVLGQQVG TYTAERHPSM PIQVCTAPGQ CTRESTEVVL DANWRWTHIT NGYTNCYTGN EWNATACPDG
ATCAKNCAVD GADYSGTYGI TTPSSGALRL QFVKKNDNGQ NVGSRVYLMA SSDKYKLFNL LNKEFTFDVD VSKLPCGLNG
AVYFSEMLED GGLKSFSGNK AGAKYGTGYC DSQCPQDIKF INGEANVEGW GGADGNSGTG KYGICCAEMD IWEANSDATA
YTPHVCSVNE QTRCEGVDCG AGSDRYNSIC DKDGCDFNSY RLGNREFYGP GKTVDTTRPF TIVTQFVTDD GTDSGNLKSI
HRYYVQDGNV IPNSVTEVAG VDQTNFISEG FCEQQKSAFG DNNYFGQLGG MRAMGESLKK MVLVLSIWDD HAVNMNWLDS
IFPNDADPEQ PGVARGRCDP ADGVPATIEA AHPDAYVIYS NIKFGAINST FTAN
MYRTLAFASL SLYGAARAQQ VGTSTAENHP KLTWQTCTGT GGTNCSNKSG SVVLDSNWRW AHNVGGYTNC YTGNSWSTQY
CPDGDSCTKN CAIDGADYSG TYGITTSNNA LSLKFVTKGS FSSNIGSRTY LMETDTKYQM FNLINKEFTF DVDVSKLPCG
LNGALYFVEM AADGGIGKGN NKAGAKYGTG YCDSQCPHDI KFINGKANVE GWNPSDADPN GGAGKIGACC PEMDIWEANS
ISTAYTPHPC RGVGLQECSD AASCGDGSNR YDGQCDKDGC DFNSYRMGVK DFYGPGATLD TTKKMTVITQ FLGSGSSLSE
IKRFYVQNGK VYKNSQSAVA GVTGNSITES FCTAQKKAFG DTSSFAALGG LNEMGASLAR GHVLIMSLWG DHAVNMLWLD
STYPTDADPS KPGAARGTCP TTSGKPEDVE KNSPDATVVF SNIKFGPIGS TFAQPA
MYQKLALISA FLATARAQSA CTLQAETHPP LTWQKCSSGG TCTQQTGSVV IDANWRWTHA TNSSTNCYDG NTWSSTLCPD
NETCAKNCCL DGAAYASTYG VTTSADSLSI GFVTQSAQKN VGARLYLMAS DTTYQEFTLL GNEFSFDVDV SQLPCGLNGA
LYFVSMDADG GVSKYPTNTA GAKYGTGYCD SQCPRDLKFI NGQANVEGWE PSSNNANTGI GGHGSCCSEM DIWEANSISE
ALTPHPCTTV GQEICDGDSC GGTYSGDRYG GTCDPDGCDW NPYRLGNTSF YGPGSSFTLD TTKKLTVVTQ FETSGAINRY
YVQNGVTFQQ PNAELGDYSG NSLDDDYCAA EEAEFGGSSF SDKGGLTQFK KATSGGMVLV MSLWDDYYAN MLWLDSTYPT
NETSSTPGAV RGSCSTSSGV PAQLESNSPN AKVVYSNIKF GPIGSTGNSS GGNPPGGNPP GTTTTRRPAT STGSSPGPTQ
THYGQCGGIG YSGPTVCASG STCQVLNPYY SQCL
MVDSFSIYKT ALLLSMLATS NAQQVGTYTA ETHPSLTWQT CSGSGSCTTT SGSVVIDANW RWVHEVGGYT NCYSGNTWDS
SICSTDTTCA SECALEGATY ESTYGVTTSG SSLRLNFVTT ASQKNIGSRL YLLADDSTYE TFKLFNREFT FDVDVSNLPC
GLNGALYFVS MDADGGVSRF PTNKAGAKYG TGYCDSQCPR DLKFIDGQAN IEGWEPSSTD VNAGTGNHGS CCPEMDIWEA
NSISSAFTAH PCDSVQQTMC TGDTCGGTYS DTTDRYSGTC DPDGCDFNPY RFGNTNFYGP GKTVDNSKPF TVVTQFITHD
GTDTGTLTEI RRLYVQNGVV IGNGPSTYTA ASGNSITESF CKAEKTLFGD TNVFETHGGL SAMGDALGDG MVLVLSLWDD
HAADMLWLDS DYPTTSCASS PGVARGTCPT TTGNATYVEA NYPNSYVTYS NIKFGTLNST YSGTSSGGSS SSSTTLTTKA
STSTTSSKTT TTTSKTSTTS SSSTNVAQLY GQCGGQGWTG PTTCASGTCT KQNDYYSQCL
MYRILKSFIL LSLVNMSLSQ KIGKLTPEVH PPMTFQKCSE GGSCETIQGE VVVDANWRWV HSAQGQNCYT GNTWNPTICP
DDETCAENCY LDGANYESVY GVTTSEDSVR LNFVTQSQGK NIGSRLFLMS NESNYQLFHV LGQEFTFDVD VSNLDCGLNG
ALYLVSMDSD GGSARFPTNE AGAKYGTGYC DAQCPRDLKF ISGSANVDGW IPSTNNPNTG YGNLGSCCAE MDLWEANNMA
TAVTPHPCDT SSQSVCKSDS CGGAASSNRY GGICDPDGCD YNPYRMGNTS FFGPNKMIDT NSVITVVTQF ITDDGSSDGK
LTSIKRLYVQ DGNVISQSVS TIDGVEGNEV NEEFCTNQKK VFGDEDSFTK HGGLAKMGEA LKDGMVLVLS LWDDYQANML
WLDSSYPTTS SPTDPGVARG SCPTTSGVPS KVEQNYPNAY VVYSNIKVGP IDSTYKK
MISRVLAISS LLAAARAQQI GTNTAEVHPA LTSIVIDANW RWLHTTSGYT NCYTGNSWDA TLCPDAVTCA ANCALDGADY
SGTYGITTSG NSLKLNFVTK GANTNVGSRT YLMAAGSKTQ YQLLKLLGQE FTFDVDVSNL PCGLNGALYF AEMDADGGVS
RFPTNKAGAQ YGTGYCDAQC PQDIKFINGQ ANSVGWTPSS NDVNTGTGQY GSCCSEMDIW EANKISAAYT PHPCSVDGQT
RCTGTDCGIG ARYSSLCDAD GCDFNSYRMG DTGFYGAGLT VDTSKVFTVV TQFITNDGTT SGTLSEIRRF YVQNGKVIPN
SQSKVTGVSG NSITDSFCAA QKTAFGDTNE FATKGGLATM SKALAKGMVL VMSIWDDHSA NMLWLDAPYP ASKSPSAAGV
SRGSCSASSG VPADVEANSP GASVTYSNIK WGPINSTYSA GTGSNTGSGS GSTTTLVSSV PSSTPTSTTG VPKYGQCGGS
GYTGPTNCIG STCVSMGQYY SQCQ
MYRQVATALS FASLVLGQQV GTLTAETHPS LPIEVCTAPG SCTKEDTTVV LDANWRWTHV TDGYTNCYTG NAWNETACPD
GKTCAANCAI DGAEYEKTYG ITTPEEGALR LNFVTESNVG SRVYLMAGED KYRLFNLLNK EFTMDVDVSN LPCGLNGAVY
FSEMDEDGGM SRFEGNKAGA KYGTGYCDSQ CPRDIKFING EANSEGWGGE DGNSGTGKYG TCCAEMDIWE ANLDATAYTP
HPCKVTEQTR CEDDTECGAG DARYEGLCDR DGCDFNSFRL GNKEFYGPEK TVDTSKPFTL VTQFVTADGT DTGALQSIRR
FYVQDGTVIP NSETVVEGVD PTNEITDDFC AQQKTAFGDN NHFKTIGGLP AMGKSLEKMV LVLSIWDDHA VYMNWLDSNY
PTDADPTKPG VARGRCDPEA GVPETVEAAH PDAYVIYSNI KIGALNSTFA AA
MSSFQVYRAA LLLSILATAN AQQVGTYTTE THPSLTWQTC TSDGSCTTND GEVVIDANWR WVHSTSSATN CYTGNEWDTS
ICTDDVTCAA NCALDGATYE ATYGVTTSGS ELRLNFVTQG SSKNIGSRLY LMSDDSNYEL FKLLGQEFTF DVDVSNLPCG
LNGALYFVAM DADGGTSEYS GNKAGAKYGT GYCDSQCPRD LKFINGEANC DGWEPSSNNV NTGVGDHGSC CAEMDVWEAN
SISNAFTAHP CDSVSQTMCD GDSCGGTYSA SGDRYSGTCD PDGCDYNPYR LGNTDFYGPG LTVDTNSPFT VVTQFITDDG
TSSGTLTEIK RLYVQNGEVI ANGASTYSSV NGSSITSAFC ESEKTLFGDE NVFDKHGGLE GMGEAMAKGM VLVLSLWDDY
AADMLWLDSD YPVNSSASTP GVARGTCSTD SGVPATVEAE SPNAYVTYSN IKFGPIGSTY SSGSSSGSGS SSSSSSTTTK
ATSTTLKTTS TTSSGSSSTS AAQAYGQCGG QGWTGPTTCV SGYTCTYENA YYSQCL
MYRAIATASA LLATARAQQV CTLNTENKPA LTWAKCTSSG CSNVRGSVVV DANWRWAHST SSSTNCYTGN TWDKTLCPDG
KTCADKCCLD GADYSGTYGV TSSGNQLNLK FVTVGPYSTN VGSRLYLMED ENNYQMFDLL GNEFTFDVDV NNIGCGLNGA
LYFVSMDKDG GKSRFSTNKA GAKYGTGYCD AQCPRDVKFI NGVANSDEWK PSDSDKNAGV GKYGTCCPEM DIWEANKIST
AYTPHPCKSL TQQSCEGDAC GGTYSATRYA GTCDPDGCDF NPYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FIKGSDGKLS
EIKRLYVQNG KVIGNPQSEI ANNPGSSVTD SFCKAQKVAF NDPDDFNKKG GWSGMSDALA KPMVLVMSLW HDHYANMLWL
DSTYPKGSKT PGSARGSCPE DSGDPDTLEK EVPNSGVSFS NIKFGPIGST YTGTGGSNPD PEEPEEPEEP VGTVPQYGQC
GGINYSGPTA CVSPYKCNKI NDFYSQCQ
EQAGTATAEN HPPLTWQECT APGSCTTQNG AVVLDANWRW VHDVNGYTNC YTGNTWDPTY CPDDETCAQN CALDGADYEG
TYGVTSSGSS LKLNFVTGSN VGSRLYLLQD DSTYQIFKLL NREFSFDVDV SNLPCGLNGA LYFVAMDADG GVSKYPNNKA
GAKYGTGYCD SQCPRDLKFI DGEANVEGWQ PSSNNANTGI GDHGSCCAEM DVWEANSISN AVTPHPCDTP GQTMCSGDDC
GGTYSNDRYA GTCDPDGCDF NPYRMGNTSF YGPGKIIDTT KPFTVVTQFL TDDGTDTGTL SEIKRFYIQN SNVIPQPNSD
ISGVTGNSIT TEFCTAQKQA FGDTDDFSQH GGLAKMGAAM QQGMVLVMSL WDDYAAQMLW LDSDYPTDAD PTTPGIARGT
CPTDSGVPSD VESQSPNSYV TYSNIKFGPI NSTFTAS
MFPTLALVSL SFLAIAYGQQ VGTLTAETHP KLSVSQCTAG GSCTTVQRSV VLDSNWRWLH DVGGSTNCYT GNTWDDSLCP
DPTTCAANCA LDGADYSGTY GITTSGNALS LKFVTQGPYS TNIGSRVYLL SEDDSTYEMF NLKNQEFTFD VDMSALPCGL
NGALYFVEMD KDGGSGRFPT NKAGSKYGTG YCDTQCPHDI KFINGEANVL DWAGSSNDPN AGTGHYGTCC NEMDIWEANS
MGAAVTPHVC TVQGQTRCEG TDCGDGDERY DGICDKDGCD FNSWRMGDQT FLGPGKTVDT SSKFTVVTQF ITADNTTSGD
LSEIRRLYVQ NGKVIANSKT QIAGMDAYDS ITDDFCNAQK TTFGDTNTFE QMGGLATMGD AFETGMVLVM SIWDDHEAKM
LWLDSDYPTD ADASAPGVSR GPCPTTSGDP TDVESQSPGA TVIFSNIKTG PIGSTFTS
MLSASKAAAI LAFCAHTASA WVVGDQQTET HPKLNWQRCT GKGRSSCTNV NGEVVIDANW RWLAHRSGYT NCYTGSEWNQ
SACPNNEACT KNCAIEGSDY AGTYGITTSG NQMNIKFITK RPYSTNIGAR TYLMKDEQNY EMFQLIGNEF TFDVDLSQRC
GMNGALYFVS MPQKGQGAPG AKYGTGYCDA QCARDLKFVR GSANAEGWTK SASDPNSGVG KKGACCAQMD VWEANSAATA
LTPHSCQPAG YSVCEDTNCG GTYSEDRYAG TCDANGCDFN PFRVGVKDFY GKGKTVDTTK KMTVVTQFVG SGNQLSEIKR
FYVQDGKVIA NPEPTIPGME WCNTQKKVFQ EEAYPFNEFG GMASMSEGMS QGMVLVMSLW DDHYANMLWL DSNWPREADP
AKPGVARRDC PTSGGKPSEV EAANPNAQVM FSNIKFGPIG STFAHAA
MFRTATLLAF TMAAMVFGQQ VGTNTARSHP ALTSQKCTKS GGCSNLNTKI VLDANWRWLH STSGYTNCYT GNQWDATLCP
DGKTCAANCA LDGADYTGTY GITASGSSLK LQFVTGSNVG SRVYLMADDT HYQMFQLLNQ EFTFDVDMSN LPCGLNGALY
LSAMDADGGM AKYPTNKAGA KYGTGYCDSQ CPRDIKFING EANVEGWNAT SANAGTGNYG TCCTEMDIWE ANNDAAAYTP
HPCTTNAQTR CSGSDCTRDT GLCDADGCDF NSFRMGDQTF LGKGLTVDTS KPFTVVTQFI TNDGTSAGTL TEIRRLYVQN
GKVIQNSSVK IPGIDPVNSI TDNFCSQQKT AFGDTNYFAQ HGGLKQVGEA LRTGMVLALS IWDDYAANML WLDSNYPTNK
DPSTPGVARG TCATTSGVPA QIEAQSPNAY VVFSNIKFGD LNTTYTGTVS SSSVSSSHSS TSTSSSHSSS STPPTQPTGV
TVPQWGQCGG IGYTGSTTCA SPYTCHVLNP YYSQCY
MYQRALLFSA LMAGVSAQQV GTQKPETHPP LAWKECTSSG CTSKDGSVVI DANWRWVHSV DGYKNCYTGN EWDSTLCPDD
ATCATNCAVD GADYAGTYGA TTEGDSLSIN FVTGSNIGSR FYLMEDENKY QMFKLLNKEF TFDVDVSTLP CGLNGALYFV
SMDADGGMSK YETNKAGAKY GTGYCDSQCP RDLKFINGKG NVEGWKPSAN DKNAGVGPHG SCCAEMDIWE ANSISTALTP
HPCDTNGQTI CEGDSCGGTY STTRYAGTCD PDGCDFNPFR MGNESFYGPG KMVDTKSKMT VVTQFITSDG TDTGSLKEIK
RVYVQNGKVI ANSASDVSGI TGNSITSDFC TAQKKTFGDE DVFNKHGGLS GMGDALGEGM VLVMSLWDDH NSNMLWLDGE
KYPTDAAASK AGVSRGTCST DSGKPSTVES ESGSAKVVFS NIKVGSIGST FSA
MTSKIALASL FAAAYGQQIG TYTTETHPSL TWQSCTAKGS CTTQSGSIVL DGNWRWTHST TSSTNCYTGN TWDATLCPDD
ATCAQNCALD GADYSGTYGI TTSGDSLRLN FVTQTANKNV GSRVYLLADN THYKTFNLLN QEFTFDVDVS NLPCGLNGAV
YFANLPADGG ISSTNKAGAQ YGTGYCDSQC PRDGKFINGK ANVDGWVPSS NNPNTGVGNY GSCCAEMDIW EANSISTAVT
PHSCDTVTQT VCTGDNCGGT YSTTRYAGTC DPDGCDFNPY RQGNESFYGP GKTVDTNSVF TIVTQFLTTD GTSSGTLNEI
KRFYVQNGKV IPNSESTISG VTGNSITTPF CTAQKTAFGD PTSFSDHGGL ASMSAAFEAG MVLVLSLWDD YYANMLWLDS
TYPTTKTGAG GPRGTCSTSS GVPASVEASS PNAYVVYSNI KVGAINSTFG
MYTKFAALAA LVATVRGQAA CSLTAETHPS LQWQKCTAPG SCTTVSGQVT IDANWRWLHQ TNSSTNCYTG NEWDTSICSS
DTDCATKCCL DGADYTGTYG VTASGNSLNL KFVTQGPYSK NIGSRMYLME SESKYQGFTL LGQEFTFDVD VSNLGCGLNG
ALYFVSMDLD GGVSKYTTNK AGAKYGTGYC DSQCPRDLKF INGQANIDGW QPSSNDANAG LGNHGSCCSE MDIWEANKVS
AAYTPHPCTT IGQTMCTGDD CGGTYSSDRY AGICDPDGCD FNSYRMGDTS FYGPGKTVDT GSKFTVVTQF LTGSDGNLSE
IKRFYVQNGK VIPNSESKIA GVSGNSITTD FCTAQKTAFG DTNVFEERGG LAQMGKALAE PMVLVLSVWD DHAVNMLWLD
STYPTDSTKP GAARGDCPIT SGVPADVESQ APNSNVIYSN IRFGPINSTY TGTPSGGNPP GGGTTTTTTT TTSKPSGPTT
TTNPSGPQQT HWGQCGGQGW TGPTVCQSPY TCKYSNDWYS QCL
MYQRALLFSA LLSVSRAQQA GTAQEEVHPS LTWQRCEASG SCTEVAGSVV LDSNWRWTHS VDGYTNCYTG NEWDATLCPD
NESCAQNCAV DGADYEATYG ITSNGDSLTL KFVTGSNVGS RVYLMEDDET YQMFDLLNNE FTFDVDVSNF PCGLNGALYF
TSMDADGGLS KYEGNTAGAK YGTGYCDSQC PRDIKFINGL GNVEGWEPSD SDANAGVGGM GTCCPEMDIW EANSISTAYT
PHPCDSVEQT MCEGDSCGGT YSDDRYGGTC DPDGCDFNSY RMGNTRFYGP GAIIDTSSKF TVVTQFIADG GSLSEIKRFY
VQNGEVIPNS ESNISGVEGN SITSEFCTAQ KTAFGDEDIF AQHGGLSAMG DAASAMVLIL SIWDDHHSSM MWLDSSYPTD
ADPSQPGVAR GTCEQGAGDP DVVESEHADA SVTFSNIKFG PIGSTF
MMMKQYLQYL AAGSLMTGLV AGQGVGTQQT ETHPRITWKR CTGKANCTTV QAEVVIDSNW RWIHTSGGTN CYDGNAWNTA
ACSTATDCAS KCLMEGAGNY QQTYGASTSG DSLTLKFVTK HEYGTNVGSR FYLMNGASKY QMFTLMNNEF TFDVDLSTVE
CGLNSALYFV AMEEDGGMRS YPTNKAGAKY GTGYCDAQCA RDLKFVGGKA NIEGWRESSN DENAGVGPYG GCCAEIDVWE
SNAHAYAFTP HACENNNYHV CERDTCGGTY SEDRFAGGCD ANGCDYNPYR MGNPDFYGKG KTVDTTKKFT VVTRFQDDNL
EQFFVQNGQK ILAPAPTFDG IPASPNLTPE FCSTQFDVFT DRNRFREVGD FPQLNAALRI PMVLVMSIWA DHYANMLWLD
SVYPPEKEGE PGAARGPCAQ DSGVPSEVKA NYPNAKVVWS NIRFGPIGST VNV
MYQRALLFSF FLAAARAQQA GTVTAENHPS LTWQQCSSGG SCTTQNGKVV IDANWRWVHT TSGYTNCYTG NTWDTSICPD
DVTCAQNCAL DGADYSGTYG VTTSGNALRL NFVTQSSGKN IGSRLYLLQD DTTYQIFKLL GQEFTFDVDV SNLPCGLNGA
LYFVAMDADG GLSKYPGNKA GAKYGTGYCD SQCPRDLKFI NGQANVEGWQ PSANDPNAGV GNHGSCCAEM DVWEANSIST
AVTPHPCDTP GQTMCQGDDC GGTYSSTRYA GTCDPDGCDF NPYRQGNHSF YGPGKIVDTS SKFTVVTQFI TDDGTPSGTL
TEIKRFYVQN GKVIPQSEST ISGVTGNSIT TEYCTAQKAA FGDNTGFFTH GGLQKISQAL AQGMVLVMSL WDDHAANMLW
LDSTYPTDAD PDTPGVARGT CPTTSGVPAD VESQNPNSYV IYSNIKVGPI NSTFTAN
MFAIVLLGLT RSLGTGTNQA ENHPSLSWQN CRSGGSCTQT SGSVVLDSNW RWTHDSSLTN CYDGNEWSSS LCPDPKTCSD
NCLIDGADYS GTYGITSSGN SLKLVFVTNG PYSTNIGSRV YLLKDESHYQ IFDLKNKEFT FTVDDSNLDC GLNGALYFVS
MDEDGGTSRF SSNKAGAKYG TGYCDAQCPH DIKFINGEAN VENWKPQTND ENAGNGRYGA CCTEMDIWEA NKYATAYTPH
ICTVNGEYRC DGSECGDTDS GNRYGGVCDK DGCDFNSYRM GNTSFWGPGL IIDTGKPVTV VTQFVTKDGT DNGQLSEIRR
KYVQGGKVIE NTVVNIAGMS SGNSITDDFC NEQKSAFGDT NDFEKKGGLS GLGKAFDYGM VLVLSLWDDH QVNMLWLDSI
YPTDQPASQP GVKRGPCATS SGAPSDVESQ HPDSSVTFSD IRFGPIDSTY
MHQRALLFSA LVGAVRAQQA GTLTEEVHPP LTWQKCTADG SCTEQSGSVV IDSNWRWLHS TNGSTNCYTG NTWDESLCPD
NEACAANCAL DGADYESTYG ITTSGDALTL TFVTGENVGS RVYLMAEDDE SYQTFDLVGN EFTFDVDVSN LPCGLNGALY
FTSMDADGGV SKYPANKAGA KYGTGYCDSQ CPRDLKFING MANVEGWTPS DNDKNAGVGG HGSCCPELDI WEANSISSAF
TPHPCDDLGQ TMCSGDDCGG TYSETRYAGT CDPDGCDFNA YRMGNTSYYG PDKIVDTNSV MTVVTQFIGD GGSLSEIKRL
YVQNGKVIAN AQSNVDGVTG NSITSDFCTA QKTAFGDQDI FSKHGGLSGM GDAMSAMVLI LSIWDDHNSS MMWLDSTYPE
DADASEPGVA RGTCEHGVGD PETVESQHPG ATVTFSKIKF GPIGSTYSSN STA
MFRAAALLAF TCLAMVSGQQ AGTNTAENHP QLQSQQCTTS GGCKPLSTKV VLDSNWRWVH STSGYTNCYT GNEWDTSLCP
DGKTCAANCA LDGADYSGTY GITSTGTALT LKFVTGSNVG SRVYLMADDT HYQLLKLLNQ EFTFDVDMSN LPCGLNGALY
LSAMDADGGM SKYPGNKAGA KYGTGYCDSQ CPKDIKFING EANVGNWTET GSNTGTGSYG TCCSEMDIWE ANNDAAAFTP
HPCTTTGQTR CSGDDCARNT GLCDGDGCDF NSFRMGDKTF LGKGMTVDTS KPFTVVTQFL TNDNTSTGTL SEIRRIYIQN
GKVIQNSVAN IPGVDPVNSI TDNFCAQQKT AFGDTNWFAQ KGGLKQMGEA LGNGMVLALS IWDDHAANML WLDSDYPTDK
DPSAPGVARG TCATTSGVPS DVESQVPNSQ VVFSNIKFGD IGSTFSGTSS PNPPGGSTTS SPVTTSPTPP PTGPTVPQWG
QCGGIGYSGS TTCASPYTCH VLNPCESILS LQRSSNADQY LQTTRSATKR RLDTALQPRK
MRTALALILA LAAFSAVSAQ QAGTITAETH PTLTIQQCTQ SGGCAPLTTK VVLDVNWRWI HSTTGYTNCY SGNTWDAILC
PDPVTCAANC ALDGADYTGT FGILPSGTSV TLRPVDGLGL RLFLLADDSH YQMFQLLNKE FTFDVEMPNM RCGSSGAIHL
TAMDADGGLA KYPGNQAGAK YGTGFCSAQC PKGVKFINGQ ANVEGWLGTT ATTGTGFFGS CCTDIALWEA NDNSASFAPH
PCTTNSQTRC SGSDCTADSG LCDADGCNFN SFRMGNTTFF GAGMSVDTTK LFTVVTQFIT SDNTSMGALV EIHRLYIQNG
QVIQNSVVNI PGINPATSIT DDLCAQENAA FGGTSSFAQH GGLAQVGEAL RSGMVLALSI VNSAADTLWL DSNYPADADP
SAPGVARGTC PQDSASIPEA PTPSVVFSNI KLGDIGTTFG AGSALFSGRS PPGPVPGSAP ASSATATAPP FGSQCGGLGY
AGPTGVCPSP YTCQALNIYY SQCI
MYQRALLFSF FLAAARAHEA GTVTAENHPS LTWQQCSSGG SCTTQNGKVV IDANWRWVHT TSGYTNCYTG NTWDTSICPD
DVTCAQNCAL DGADYSGTYG VTTSGNALRL NFVTQSSGKN IGSRLYLLQD DTTYQIFKLL GQEFTFDVDV SNLPCGLNGA
LYFVAMDADG NLSKYPGNKA GAKYGTGYCD SQCPRDLKFI NGQANVEGWQ PSANDPNAGV GNHGSSCAEM DVWEANSIST
AVTPHPCDTP GQTMCQGDDC GGTYSSTRYA GTCDTDGCDF NPYQPGNHSF YGPGKIVDTS SKFTVVTQFI TDDGTPSGTL
TEIKRFYVQN GKVIPQSEST ISGVTGNSIT TEYCTAQKAA FDNTGFFTHG GLQKISQALA QGMVLVMSLW DDHAANMLWL
DSTYPTDADP DTPGVARGTC PTTSGVPADV ESQNPNSYVI YSNIKVGPIN STFTAN
MHKRAATLSA LVVAAAGFAR GQGVGTQQTE THPKLTFQKC SAAGSCTTQN GEVVIDANWR WVHDKNGYTN CYTGNEWNTT
ICADAASCAS NCVVDGADYQ GTYGASTSGN ALTLKFVTKG SYATNIGSRM YLMASPTKYA MFTLLGHEFA FDVDLSKLPC
GLNGAVYFVS MDEDGGTSKY PSNKAGAKYG TGYCDSQCPR DLKFIDGKAN SASWQPSSND QNAGVGGMGS CCAEMDIWEA
NSVSAAYTPH PCQNYQQHSC SGDDCGGTYS ATRFAGDCDP DGCDWNAYRM GVHDFYGNGK TVDTGKKFSI VTQFKGSGST
LTEIKQFYVQ DGRKIENPNA TWPGLEPFNS ITPDFCKAQK QVFGDPDRFN DMGGFTNMAK ALANPMVLVL SLWDDHYSNM
LWLDSTYPTD ADPSAPGKGR GTCDTSSGVP SDVESKNGDA TVIYSNIKFG PLDSTYTAS
MRASLLAFSL NSAAGQQAGT LQTKNHPSLT SQKCRQGGCP QVNTTIVLDA NWRWTHSTSG STNCYTGNTW QATLCPDGKT
CAANCALDGA DYTGTYGVTT SGNSLTLQFV TQSNVGARLG YLMADDTTYQ MFNLLNQEFW FDVDMSNLPC GLNGALYFSA
MARTAAWMPM VVCASTPLIS TRRSTARLLR LPVPPRSRYG RGICDSQCPR DIKFINGEAN VQGWQPSPND TNAGTGNYGA
CCNKMDVWEA NSISTAYTPH PCTQRGLVRC SGTACGGGSN RYGSICDHDG LGFQNLFGMG RTRVRARVGR VKQFNRSSRV
VEPISWTKQT TLHLGNLPWK SADCNVQNGR VIQNSKVNIP GMPSTMDSVT TEFCNAQKTA FNDTFSFQQK GGMANMSEAL
RRGMVLVLSI WDDHAANMLW LDSITSAAAC RSTPSEVHAT PLRESQIRSS HSRQTRYVTF TNIKFGPFNS TGTTYTTGSV
PTTSTSTGTT GSSTPPQPTG VTVPQGQCGG IGYTGPTTCA SPTTCHVLNP YYSQCY
MKQYLQYLAA ALPLMSLVSA QGVGTSTSET HPKITWKKCS SGGSCSTVNA EVVIDANWRW LHNADSKNCY DGNEWTDACT
SSDDCTSKCV LEGAEYGKTY GASTSGDSLS LKFLTKHEYG TNIGSRFYLM NGASKYQMFT LMNNEFAFDV DLSTVECGLN
SALYFVAMEE DGGMASYSTN KAGAKYGTGY CDAQCARDLK FVGGKANYDG WTPSSNDANA GVGALGGCCA EIDVWESNAH
AFAFTPHACE NNNYHVCEDT TCGGTYSEDR FAGDCDANGC DYNPYRVGNT DFYGKGMTVD TSKKFTVVSQ FQENKLTQFF
VQNGKKIEIP GPKHEGLPTE SSDITPELCS AMPEVFGDRD RFAEVGGFDA LNKALAVPMV LVMSIWDDHY ANMLWLDSSY
PPEKAGTPGG DRGPCAQDSG VPSEVESQYP DATVVWSNIR FGPIGSTVQV
MFPKASLIAL SFIAAVYGQQ VGTQMAEVHP KLPSQLCTKS GCTNQNTAVV LDANWRWLHT TSGYTNCYTG NSWDATLCPD
ATTCAQNCAV DGADYSGTYG ITTSGNALTL KFKTGTNVGS RVYLMQTDTA YQMFQLLNQE FTFDVDMSNL PCGLNGALYL
SQMDQDGGLS KFPTNKAGAK YGTGYCDSQC PHDIKFINGM ANVAGWAGSA SDPNAGSGTL GTCCSEMDIW EANNDAAAFT
PHPCSVDGQT QCSGTQCGDD DERYSGLCDK DGCDFNSFRM GDKSFLGKGM TVDTSRKFTV VTQFVTTDGT TNGDLHEIRR
LYVQDGKVIQ NSVVSIPGID AVDSITDNFC AQQKSVFGDT NYFATLGGLK KMGAALKSGM VLAMSVWDDH AASMQWLDSN
YPADGDATKP GVARGTCSAD SGLPTNVESQ SASASVTFSN IKWGDINTTF TGTGSTSPSS PAGPVSSSTS VASQPTQPAQ
GTVAQWGQCG GTGFTGPTVC ASPFTCHVVN PYYSQCY
MFRTAALLSF AYLAVVYGQQ AGTSTAETHP PLTWEQCTSG GSCTTQSSSV VLDSNWRWTH VVGGYTNCYT GNEWNTTVCP
DGTTCAANCA LDGADYEGTY GISTSGNALT LKFVTASAQT NVGSRVYLMA PGSETEYQMF NPLNQEFTFD VDVSALPCGL
NGALYFSEMD ADGGLSEYPT NKAGAKYGTG YCDSQCPRDI KFIEGKANVE GWTPSSTSPN AGTGGTGICC NEMDIWEANS
ISEALTPHPC TAQGGTACTG DSCSSPNSTA GICDQAGCDF NSFRMGDTSF YGPGLTVDTT SKITVVTQFI TSDNTTTGDL
TAIRRIYVQN GQVIQNSMSN IAGVTPTNEI TTDFCDQQKT AFGDTNTFSE KGGLTGMGAA FSRGMVLVLS IWDDDAAEML
WLDSTYPVGK TGPGAARGTC ATTSGQPDQV ETQSPNAQVV FSNIKFGAIG STFSSTGTGT GTGTGTGTGT GTTTSSAPAA
TQTKYGQCGG QGWTGATVCA SGSTCTSSGP YYSQCL
MFRTAALTAF TFAAVVLGQQ VGTLTTENHP ALSIQQCTAT GCTTQQKSVV LDSNWRWTHS TAGATNCYTG NAWDPALCPD
PATCATNCAI DGADYSGTYG ITTSGNALTL RFVTNGQYSQ NIGSRVYLLD DADHYKLFDL KNQEFTFDVD MSGLPCGLNG
ALYFSEMAAD GGKAAHAGNN AGAKYGTGYC DAQCPHDIKW INGEANVLDW SASATDDNAG NGRYGACCAE MDIWEANSEA
TAYTPHVCRD EGLYRCSGTE CGDGNNRYGG VCDKDGCDFN SYRMGDKNFL GRGKTIDTTK KVTVVTQFIT DNNTPTGNLV
EIRRVYVQNG VVYQNSFSTF PSLSQYNSIS DEFCVAQKTL FGDNQYYNTH GGTTKMGDAF DNGMVLIMSL WSDHAAHMLW
LDSDYPLDKS PSEPGVSRGA CPTSSGDPDD VVANHPNASV TFSNIKYGPI GSTFGGSTPP VSSGGSSVPP VTSTTSSGTT
TPTGPTGTVP KWGQCGGIGY SGPTACVAGS TCTYSNDWYS QCL
MYRAIATASA LIAAVRAQQV CSLTPETKPA LSWSKCTSSG CSNVQGSVTI DANWRWTHQL SGSTNCYTGN KWDTSICTSG
KVCAEKCCID GAEYASTYGI TSSGNQLSLS FVTKGTYGTN IGSRTYLMED ENTYQMFQLL GNEFTFDVDV SNIGCGLNGA
LYFVSMDADG GKAKYPGNKA GAKYGTGYCD AQCPRDVKFI NGQANSDGWQ PSKSDVNGGI GNLGTCCPEM DIWEANSIST
AHTPHPCTKL TQHSCTGDSC GGTYSEDRYG GTCDADGCDF NAYRQGNKTF YGPGSGFNVD TTKKVTVVTQ FHKGSNGRLS
EITRLYVQNG KVIANSESKI AGVPGSSLTP EFCTAQKKVF GDIDDFEKKG AWGGMSDALE APMVLVMSLW HDHHSNMLWL
DSTYPTDSTK LGAQRGSCST SSGVPADLEK NVPNSKVAFS NIKFGPIGST YKEGQPEPTN PTNPNPTTPG GTVDQWGQCG
GTNYSGPTAC KSPFTCKKIN DFYSQCQ
MFRTATLLAF TMAAMVFGQQ VGTNTAENHR TLTSQKCTKS GGCSNLNTKI VLDANWRWLH STSGYTNCYT GNQWDATLCP
DGKTCAANCA LDGADYTGTY GITASGSSLK LQFVTGSNVG SRVYLMADDT HYQMFQLLNQ EFTFDVDMSN LPCGLNGALY
LSAMDADGGM AKYPTNKAGA KYGTGYCDSQ CPRDIKFING EANVEGWNAT SANAGTGNYG TCCTEMDIWE ANNDAAAYTP
HPCTTNAQTR CSGSDCTRDT GLCDADGCDF NSFRMGDQTF LGKGLTVDTS KPFTVVTQFI TNDGTSAGTL TEIRRLYVQN
GKVIQNSSVK IPGIDLVNSI TDNFCSQQKT AFGDTNYFAQ HGGLKQVGEA LRTGMVLALS IWDDYAANML WLDSNYPTNK
DPSTPGVARG TCATTSGVPA QIEAQSPNAY VVFSNIKFGD LNTTYTGTVS SSSVSSSHSS TSTSSSHSSS STPPTQPTGV
TVPQWGQCGG IGYTGSTTCA SPYTCHVLNP YYSQCY
MYQTSLLASL SFLLATSQAQ QVGTQTAETH PKLTTQKCTT AGGCTDQSTS IVLDANWRWL HTVDGYTNCY TGQEWDTSIC
TDGKTCAEKC ALDGADYEST YGISTSGNAL TMNFVTKSSQ TNIGGRVYLL AADSDDTYEL FKLKNQEFTF DVDVSNLPCG
LNGALYFSEM DSDGGLSKYT TNKAGAKYGT GYCDTQCPHD IKFINGEANV QNWTASSTDK NAGTGHYGSC CNEMDIWEAN
SQATAFTPHV CEAKVEGQYR CEGTECGDGD NRYGGVCDKD GCDFNSYRMG NETFYGSNGS TIDTTKKFTV VTQFITADNT
ATGALTEIRR KYVQNDVVIE NSYADYETLS KFNSITDDFC AAQKTLSGDT NDFKTKGGIA RMGESFERGM VLVMSVWDDH
AANALWLDSS YPTDADASKP GVKRGPCSTS SGVPSDVEAN DADSSVIYSN IRYGDIGSTF NKTA
MFSKVALTAL CFLAVAQAQQ VGREVAENHP RLPWQRCTRN GGCQTVSNGQ VVLDANWRWL HVTDGYTNCY TGNAWNSSVC
SDGATCAQRC ALEGANYQQT YGITTSGDAL TIKFLTRSEQ TNIGARVYLM ENEDRYQMFN LLNKEFTFDV DVSKVPCGIN
GALYFIQMDA DGGLSSQPNN RAGAKYGTGY CDSQCPRDIK FINGEANSVG WEPSETDPNA GKGQYGICCA EMDIWEANSI
SNAYTPHPCQ TVNDGGYQRC QGRDCNQPRY EGLCDPDGCD YNPFRMGNKD FYGPGKTVDT NRKMTVVTQF ITHDNTDTGT
LVDIRRLYVQ DGRVIANPPT NFPGLMPAHD SITQEFCDDA KRAFEDNDSF GRNGGLAHMG RSLAKGHVLA LSIWNDHTAH
MLWLDSNYPT DADPNKPGIA RGTCPTTGGS PRDTEQNHPD AQVIFSNIKF GDIGSTFSGN
MYRKLAVISA FLAAARAQQV CTQQAETHPP LTWQKCTASG CTPQQGSVVL DANWRWTHDT KSTTNCYDGN TWSSTLCPDD
ATCAKNCCLD GANYSGTYGV TTSGDALTLQ FVTASNVGSR LYLMANDSTY QEFTLSGNEF SFDVDVSQLP CGLNGALYFV
SMDADGGQSK YPGNAAGAKY GTGYCDSQCP RDLKFINGQA NVEGWEPSSN NANTGVGGHG SCCSEMDIWE ANSISEALTP
HPCETVGQTM CSGDSCGGTY SNDRYGGTCD PDGCDWNPYR LGNTSFYGPG SSFALDTTKK LTVVTQFATD GSISRYYVQN
GVKFQQPNAQ VGSYSGNTIN TDYCAAEQTA FGGTSFTDKG GLAQINKAFQ GGMVLVMSLW DDYAVNMLWL DSTYPTNATA
STPGAKRGSC STSSGVPAQV EAQSPNSKVI YSNIRFGPIG STGGNTGSNP PGTSTTRAPP SSTGSSPTAT QTHYGQCGGT
GWTGPTRCAS GYTCQVLNPF YSQCL
MRASLLAFSL AAAVAGGQQA GTLTAKRHPS LTWQKCTRGG CPTLNTTMVL DANWRWTHAT SGSTKCYTGN KWQATLCPDG
KSCAANCALD GADYTGTYGI TGSGWSLTLQ FVTDNVGARA YLMADDTQYQ MLELLNQELW FDVDMSNIPC GLNGALYLSA
MDADGGMRKY PTNKAGAKYA TGYCDAQCPR DLKYINGIAN VEGWTPSTND ANGIGDHGSC CSEMDIWEAN KVSTAFTPHP
CTTIEQHMCE GDSCGGTYSD DRYGVLCDAD GCDFNSYRMG NTTFYGEGKT VDTSSKFTVV TQFIKDSAGD LAEIKAFYVQ
NGKVIENSQS NVDGVSGNSI TQSFCKSQKT AFGDIDDFNK KGGLKQMGKA LAQAMVLVMS IWDDHAANML WLDSTYPVPK
VPGAYRGSGP TTSGVPAEVD ANAPNSKVAF SNIKFGHLGI SPFSGGSSGT PPSNPSSSAS PTSSTAKPSS TSTASNPSGT
GAAHWAQCGG IGFSGPTTCP EPYTCAKDHD IYSQCV
MLASTFSYRM YKTALILAAL LGSGQAQQVG TSQAEVHPSM TWQSCTAGGS CTTNNGKVVI DANWRWVHKV GDYTNCYTGN
TWDKTLCPDD ATCASNCALE GANYQSTYGA TTSGDSLRLN FVTTSQQKNI GSRLYMMKDD TTYEMFKLLN QEFTFDVDVS
NLPCGLNGAL YFVAMDADGG MSKYPTNKAG AKYGTGYCDS QCPRDLKFIN GQANVEGWQP SSNDANAGTG NHGSCCAEMD
IWEANSISTA FTPHPCDTPG QVMCTGDACG GTYSSDRYGG TCDPDGCDFN SFRQGNKTFY GPGMTVDTKS KFTVVTQFIT
DDGTASGTLK EIKRFYVQNG KVIPNSESTW SGVGGNSITN DYCTAQKSLF KDQNVFAKHG GMEGMGAALA QGMVLVMSLW
DDHAANMLWL DSNYPTTASS STPGVARGTC DISSGVPADV EANHPDASVV YSNIKVGPIG STFNSGGSNP GGGTTTTAKP
TTTTTTAGSP GGTGVAQHYG QCGGNGWQGP TTCASPYTCQ KLNDFYSQCL
MQIKQYLQYL AAALPLVNMA AAQRAGTQQT ETHPRLSWKR CSSGGNCQTV NAEIVIDANW RWLHDSNYQN CYDGNRWTSA
CSSATDCAQK CYLEGANYGS TYGVSTSGDA LTLKFVTKHE YGTNIGSRVY LMNGSDKYQM FTLMNNEFAF DVDLSKVECG
LNSALYFVAM EEDGGMRSYS SNKAGAKYGT GYCDAQCARD LKFVGGKANI EGWRPSTNDA NAGVGPYGAC CAEIDVWESN
AYAFAFTPHG CLNNNYHVCE TSNCGGTYSE DRFGGLCDAN GCDYNPYRMG NKDFYGKGKT VDTSRKFTVV TRFEENKLTQ
FFIQDGRKID IPPPTWPGLP NSSAITPELC TNLSKVFDDR DRYEETGGFR TINEALRIPM VLVMSIWDGH YASMLWLDSV
YPPEKAGQPG AERGPCAPTS GVPAEVEAQF PNAQVIWSNI RFGPIGSTYQ V
MTSRIALVSL FAAVYGQQVG TYQTETHPSL TWQSCTAKGS CTTNTGSIVL DGNWRWTHGV GTSTNCYTGN TWDATLCPDD
ATCAQNCALE GADYSGTYGI TTSGNSLRLN FVTQSANKNI GSRVYLMADT THYKTFNLLN QEFTFDVDVS NLPCGLNGAV
YFANLPADGG ISSTNTAGAE YGTGYCDSQC PRDMKFIKGQ ANVDGWVPSS NNANTGVGNH GSCCAEMDIW EANSISTAVT
PHSCDTVTQT VCTGDDCGGT YSSSRYAGTC DPDGCDFNSY RMGDETFYGP GKTVDTNSVF TVVTQFLTTD GTASGTLNEI
KRFYVQDGKV IPNSYSTISG VSGNSITTPF CDAQKTAFGD PTSFSDHGGL ASMSAAFEAG MVLVLSLWDD YYANMLWLDS
TYPVGKTSAG GPRGTCDTSS GVPASVEASS PNAYVVYSNI KVGAINSTYG
MFVFVLLWLT QSLGTGTNQA ENHPSLSWQN CRSGGSCTQT SGSVVLDSNW RWTHDSSLTN CYDGNEWSSS LCPDPKTCSD
NCLIDGADYS GTYGITSSGN SLKLVFVTNG PYSTNIGSRV YLLKDESHYQ IFDLKNKEFT FTVDDSNLDC GLNGALYFVS
MDEDGGTSRF SSNKAGAKYG TGYCDAQCPH DIKFINGEAN VENWKPQTND ENAGNGRYGA CCTEMDIWEA NKYATAYTPH
ICTVNGEYRC DGSECGDTDS GNRYGGVCDK DGCDFNSYRM GNTSFWGPGL IIDTGKPVTV VTQFVTKDGT DNGQLSEIRR
KYVQGGKVIE NTVVNIAGMS SGNSITDDFC NEQKSAFGDT NDFEKKGGLS GLGKAFDYGM VLVLSLWDDH QVNMLWLDSI
YPTDQPASQP GVKRGPCATS SGAPSDVESQ HPDSSVTFSD IRFGPIDSTY
MFRKAALLAF SFLAIAHGQQ VGTNQAENHP SLPSQKCTAS GCTTSSTSVV LDANWRWVHT TTGYTNCYTG QTWDASICPD
GVTCAKACAL DGADYSGTYG ITTSGNALTL QFVKGTNVGS RVYLLQDASN YQMFQLINQE FTFDVDMSNL PCGLNGAVYL
SQMDQDGGVS RFPTNTAGAK YGTGYCDSQC PRDIKFINGE ANVEGWTGSS TDSNSGTGNY GTCCSEMDIW EANSVAAAYT
PHPCSVNQQT RCTGADCGQG DDRYDGVCDP DGCDFNSFRM GDQTFLGKGL TVDTSRKFTI VTQFISDDGT TSGNLAEIRR
FYVQDGNVIP NSKVSIAGID AVNSITDDFC TQQKTAFGDT NRFAAQGGLK QMGAALKSGM VLALSLWDDH AANMLWLDSD
YPTTADASNP GVARGTCPTT SGFPRDVESQ SGSATVTYSN IKWGDLNSTF TGTLTTPSGS SSPSSPASTS GSSTSASSSA
SVPTQSGTVA QWAQCGGIGY SGATTCVSPY TCHVVNAYYS QCY
MYRAIATASA LIAAARAQQV CTLTTETKPA LTWSKCTSSG CTDVKGSVGI DANWRWTHQT SSSTNCYTGN KWDTSVCTSG
ETCAQKCCLD GADYAGTYGI TSSGNQLSLG FVTKGSFSTN IGSRTYLMEN ENTYQMFQLL GNEFTFDVDV SNIGCGLNGA
LYFVSMDADG GKARYPANKA GAKYGTGYCD AQCPRDVKFI NGKANSDGWK PSDSDINAGI GNMGTCCPEM DIWEANSIST
AFTPHPCTKL TQHACTGDSC GGTYSNDRYG GTCDADGCDF NSYRQGNKTF YGRGSDFNVD TTKKVTVVTQ FKKGSNGRLS
EITRLYVQNG KVIANSESKI PGNSGSSLTA DFCSKQKSVF GDIDDFSKKG GWSGMSDALE SPPMVLVMSL WHDHHSNMLW
LDSTYPTDST KLGAQRGSCA TTSGVPSDLE RDVPNSKVSF SNIKFGPIGS TYSSGTTNPP PSSTDTSTTP TNPPTGGTVG
QYGQCGGQTY TGPKDCKSPY TCKKINDFYS QCQ
MSSFQIYRAA LLLSILATAN AQQVGTYTTE THPSLTWQTC TSDGSCTTND GEVVIDANWR WVHSTSSATN CYTGNEWDTS
ICTDDVTCAA NCALDGATYE ATYGVTTSGS ELRLNFVTQG SSKNIGSRLY LMSDDSNYEL FKLLGQEFTF DVDVSNLPCG
LNGALYFVAM DADGGTSEYS GNKAGAKYGT GYCDSQCPRD LKFINGEANC DGWEPSSNNV NTGVGDHGSC CAEMDVWEAN
SISNAFTAHP CDSVSQTMCD GDSCGGTYSA SGDRYSGTCD PDGCDYNPYR LGNTDFYGPG LTVDTNSPFT VVTQFITDDG
TSSGTLTEIK RLYVQNGEVI ANGASTYSSV NGSSITSAFC ESEKTLFGDE NVFDKHGGLE GMGEAMAKGM VLVLSLWDDY
AADMLWLDSD YPVNSSASTP GVARGTCSTD SGVPATVEAE SPNAYVTYSN IKFGPIGSTY SSGSSSGSGS SSSSSSTTTK
ATSTTLKTTS TTSSGSSSTS AAQAYGQCGG QGWTGPTTCV SGYTCTYENA YYSQCL
MHQRALLFSA LLTAVRAQQA GTLTEEVHPS LTWQKCTSEG SCTEQSGSVV IDSNWRWTHS VNDSTNCYTG NTWDATLCPD
DETCAANCAL DGADYESTYG VTTDGDSLTL KFVTGSNVGS RLYLMDTSDE GYQTFNLLDA EFTFDVDVSN LPCGLNGALY
FTAMDADGGV SKYPANKAGA KYGTGYCDSQ CPRDLKFIDG QANVDGWEPS SNNDNTGIGN HGSCCPEMDI WEANKISTAL
TPHPCDSSEQ TMCEGNDCGG TYSDDRYGGT CDPDGCDFNP YRMGNDSFYG PGKTIDTGSK MTVVTQFITD GSGSLSEIKR
YYVQNGNVIA NADSNISGVT GNSITTDFCT AQKKAFGDED IFAEHNGLAG ISDAMSSMVL ILSLWDDYYA SMEWLDSDYP
ENATATDPGV ARGTCDSESG VPATVEGAHP DSSVTFSNIK FGPINSTFSA SA
MYAKFATLAA LVAGAAAQNA CTLTAENHPS LTWSKCTSGG SCTSVQGSIT IDANWRWTHR TDSATNCYEG NKWDTSYCSD
GPSCASKCCI DGADYSSTYG ITTSGNSLNL KFVTKGQYST NIGSRTYLME SDTKYQMFQL LGNEFTFDVD VSNLGCGLNG
ALYFVSMDAD GGMSKYSGNK AGAKYGTGYC DSQCPRDLKF INGEANVENW QSSTNDANAG TGKYGSCCSE MDVWEANNMA
AAFTPHPCXV IGQSRCEGDS CGGTYSTDRY AGICDPDGCD FNSYRQGNKT FYGKGMTVDT TKKITVVTQF LKNSAGELSE
IKRFYVQNGK VIPNSESTIP GVEGNSITQD WCDRQKAAFG DVTDXQDKGG MVQMGKALAG PMVLVMSIWD DHAVNMLWLD
STWPIDGAGK PGAERGACPT TSGVPAEVEA EAPNSNVIFS NIRFGPIGST VSGLPDGGSG NPNPPVSSST PVPSSSTTSS
GSSGPTGGTG VAKHYEQCGG IGFTGPTQCE SPYTCTKLND WYSQCL
MYAKFATLAA LVAGASAQAV CSLTAETHPS LTWQKCTAPG SCTNVAGSIT IDANWRWTHQ TSSATNCYSG SKWDSSICTT
GTDCASKCCI DGAEYSSTYG ITTSGNALNL KFVTKGQYST NIGSRTYLME SDTKYQMFKL LGNEFTFDVD VSNLGCGLNG
ALYFVSMDAD GGMSKYSGNK AGAKYGTGYC DAQCPRDLKF INGEANVEGW ESSTNDANAG SGKYGSCCTE MDVWEANNMA
TAFTPHPCTT IGQTRCEGDT CGGTYSSDRY AGVCDPDGCD FNSYRQGNKT FYGKGMTVDT TKKITVVTQF LKNSAGELSE
IKRFYAQDGK VIPNSESTIA GIPGNSITKA YCDAQKTVFQ NTDDFTAKGG LVQMGKALAG DMVLVMSVWD DHAVNMLWLD
STYPTDQVGV AGAERGACPT TSGVPSDVEA NAPNSNVIFS NIRFGPIGST VQGLPSSGGT SSSSSAAPQS TSTKASTTTS
AVRTTSTATT KTTSSAPAQG TNTAKHWQQC GGNGWTGPTV CESPYKCTKQ NDWYSQCL
MLTLVYFLLS LVVSLEIGTQ QSEDHPKLTW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSKDLCP SSDTCSQKCY
IEGADYSGTY GIQSSGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYESFK LKNKEFTFTV DDSKLNCGLN GALYFVAMDA
DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS GNGKLGTCCS EMDIWEGNMK SQAYTVHACT
KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ SFYGEGKTVD TKQPVTVVTQ FIGDPLTEIR RLYVQGGKTI
NNSKTSNLAD TYDSITDKFC DATKEASGDT NDFKAKGAMS GFSTNLNNGQ VLVMSLWDDH TANMLWLDST YPTDSSDSTA
QRGPCPTSSG VPKDVESQHG DATVVFSDIK FGAINSTFKY N
MLAAALFTFA CSVGVGTKTP ENHPKLNWQN CASKGSCSQV SGEVTMDSNW RWTHDGNGKN CYDGNTWISS LCPDDKTCSD
KCVLDGAEYQ ATYGIQSNGT ALTLKFVTHG SYSTNIGSRL YLLKDKSTYY VFKLNNKEFT FSVDVSKLPC GLNGALYFVE
MDADGGKAKY AGAKPGAEYG LGYCDAQCPS DLKFINGEAN SEGWKPQSGD KNAGNGKYGS CCSEMDVWES NSQATALTPH
VCKTTGQQRC SGKSECGGQD GQDRFAGLCD EDGCDFNNWR MGDKTFFGPG LIVDTKSPFV VVTQFYGSPV TEIRRKYVQN
GKVIENSKSN IPGIDATAAI SDHFCEQQKK AFGDTNDFKN KGGFAKLGQV FDRGMVLVLS LWDDHQVAML WLDSTYPTNK
DKSQPGVDRG PCPTSSGKPD DVESASADAT VVYGNIKFGA LDSTY
MLTLVYFLLS LVVSLEIGTQ QSEDHPKLTW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSKDLCP SSNTCSQKCY
IEGADYSGTY GIQSSGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYESFK LKNKEFTFTV DDSKLNCGLN GALYFVAMDA
DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS GNGKLGTCCS EMDIWEGNMK SQAYTVHACT
KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ SFYGEGKTVD TKQPVTVVTQ FIGDPLTEIR RLYVQGGKTI
NNSKTSNLAD TYDSITDKFC DATKEASGDT NDFKAKGAMS GFSTNLNNGQ VLVMSLWDDH TANMLWLDST YPTDSTKTGA
SRGPCAVSSG VPKDVESQYG DATVIYSDIK FGAINSTFKW N
MILALLSLAK SLGIATNQAE THPKLTWTRY QSKGSGQTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC PDPTTCSNNC
DLDGADYPGT YGISTSGNSL KLGFVTHGSY STNIGSRVYL LRDSKNYEMF KLKNKEFTFT VDDSKLPCGL NGALYFVAMD
EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN SGNGRYGACC TEMDIWEANS MATAYTPHVC
TVTGLRRCEG TECGDTDANQ RYNGICDKDG CDFNSYRLGD KTFFGVGKTV DSSKPVTVVT QFVTSNGQDS GTLSEIRRKY
VQGGKVIENS KVNIAGITAG NSVTDTFCNE QKKAFGDNND FEKKGGLGAL SKQLDAGMVL VLSLWDDHSV NMLWLDSTYP
TNAAAGALGT ERGACATSSG APSDVESQSP DATVTFSDIK FGPIDSTY
MLVIALILRG LSVGTGTQQS ETHPSLSWQQ TSKGGSGQSV SGSVVLDSNW RWTHTTDGTT NCYDGNEWSS DLCPDASTCS
SNCVLEGADY SGTYGITGSG SSLKLGFVTK GSYSTNIGSR VYLLGDESHY KLFKLENNEF TFTVDDSNLE CGLNGALYFV
AMDEDGGASK YSGAKPGAKY GMGYCDAQCP HDMKFINGDA NVEGWKPSDN DENAGTGKWG ACCTEMDIWE ANKYATAYTP
HICTKNGEYR CEGTDCGDTK DNNRYGGVCD KDGCDFNSWR MGNQSFWGPG LIIDTGKPVT VVTQFLADGG SLSEIRRKYV
QGGKVIENTV TKISGMDEFD SITDEFCNQQ KKAFRDTNDF EKKGGLKGLG TAVDAGVVLV LSLWDDHDVN MLWLDSIYPT
DSGSKAGADR GPCATSSGVP KDVESNYASA SVTFSDIKFG PIDSTY
MLLALFAFGK SLGIATNQAE NHPKLTWTRY QSKGSGQTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC PDPTTCSNNC
DLDGADYPGT YGISSSGNSL KLGFVTHGSY STNIGSRVYL LRDSKNYEMF KLKNKEFTFT VDDSKLPCGL NGALYFVAMD
EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN SGNGRYGACC TEMDIWEANS MATAYTPHVC
TVTGIRRCEG TECGDTDANQ RYNGICDKDG CDFNSYRLGD KSFFGVGKTV DSSKPVTVVT QFVTSNGQDS GTLSEIRRKY
VQGGKVIENS KVNIAGMAAG NSITDTFCNE QKKAFGDNND FEKKGGLGAL SKQLDSGMVL VLSLWDDHSV NMLWLDSTYP
TNAAAGALGT ERGACATSSG APSDVESQSP DATVTFSDIK FGPIDSTY
MLASVVYLVS LVVSLEIGTQ QSEEHPKLTW QNGSSSVSGS IVLDSNWRWL HDSGTTNCYD GNLWSDDLCP NADTCSSKCY
IEGADYSGTY GITSSGSKVT LKFVTKGSYS TNIGSRIYLL KDENTYETFK LKNKEFTFTV DDSKLDCGLN GALYFVAMDA
DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS GDGKLGTCCS EMDIWEGNAK SQAYTVHACS
KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ SFYGEGKTVD TKSPVTVVTQ FIGDPLTEIR RVYVQGGKTI
NNSKTSNLAD TYDSITDKFC DATKDATGDT NDFKAKGAMA GFSTNLNTAQ VLVSVHCGMI IQPICCGLIR RIQRIQQKQV
QAVDRVLCRR VFQRMLKASM VMLQSRTRTL SLELSTRPLV GISPAGRLFF F
MILALLVLGK SLGIATNQAE THPKLTWTRY QSKGSGSTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC PDPTTCSNNC
DLDGADYPGT YGISTSGNSL KLGFVTHGSY STNIGSRVYL LKDTKSYEMF KLKNKEFTFT VDDSKLPCGL NGALYFVAMD
EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN SGNGRYGACC TEMDIWEANS MATAYTPHVC
TVTGLRRCEG TECGDTDNDQ RYNGICDKDG CDFNSYRLGD KSFFGVGKTV DSSKPVTVVT QFVTSNGQDS GTLSEIRRKY
VQGGKVIENS KVNVAGITAG NSVTDTFCNE QKKAFGDNND FEKKGGLGAL SKQLDAGMVL VLSLWDDHSV NMLWLDSTYP
TNAAAGALGT ERGACATSSG KPSDVESQSP DATVTFSDIK FGPIDSTY
MLCIGLISFV YSLGVGTNTA ETHPKLTWKN GGQTVNGEVT VDSNWRWTHT KGSTKNCYDG NLWSKDLCPD AATCGKNCVL
EGADYSGTYG VTSSGNALTL KFVTHGSYST NVGSRLYLLK DEKTYQMFNL NGKEFTFTVD VSNLPCGLNG ALYHVNMDED
GGTKRYPDNE AGAKYGTGYC DAQCPTDLKF INGIPNSDGW KPQSNDKNSG NGKYGSCCSE MDIWEANSIC SAVTPHVCDN
LQQTRCQGTA CGENGGGSRF GSSCDPDGCD FNSWRMGNKT FYGPGLIVDT KSKFTVVTQF VGNPVTEIKR KYVQNGKVIE
NSYSNIEGMD KFNSVSDKFC TAQKKAFGDT DSFTKHGGFK QLGSALAKGM VLVLSLWDDH TVNMLWLDSV YPTNSKKAGS
DRGPCPTTSG VPADVESKSA DANVIYSDIR FGAIDSTYK
MLGALVALAS CIGVGTNTPE KHPDLKWTNG GSSVSGSIVV DSNWRWTHIK GETKNCYDGN LWSDKYCPDA ATCGKNCVLE
GADYSGTYGV TTSGDAATLK FVTHGQYSTN VGSRLYLLKD EKTYQMFNLV GKEFTFTVDV SNLPCGLNGA LYFVQMDSDG
GMAKYPDNQA GAKYGTGYCD AQCPTDLKFI NGIPNSDGWK PQKNDKNSGN GKYGSCCSEM DIWEANSMAT AYTPHVCDKL
EQTRCSGSAC GQNGGGDRFS SSCDPDGCDF NSWRMGNKTF WGPGLIVDTK KPVQVVTQFV GSGGSVTEIK RKYVQGGKVI
DNSMTNIAAM SKQYNSVSDE FCQAQKKAFG DNDSFTKHGG FRQLGATLSK GHVLVLSLWD DHDVNMLWLD SVYPTNSNKP
GADRGPCKTS SGVPSDVESQ NADSTVKYSD IRFGAIDSTY SK
MLAAALFTFA CSVGVGTKTT ETHPKLNWQQ CACKGSCSQV SGEVTMDSNW RWTHDGNGKN CYDGNTWISS LCPDDKTCSD
KCVLDGAEYQ ATYGIQSNGT ALTPKFVTHG SYSTNIGSRL YLLKDKSTYY VFQLNNKEFT FSVDVSKLPC GLNGALYFVE
MDADGGKSKY AGAKPGAEYG LGYCDAQCPS DLKFINGEAN SEGWKPQSGD KNAGNGKYGS CCSEMDVWES NSMATALTPH
VCKTTGQTRC SGKSECGGQD GQDRFAGNCD EDGCDFNNWR MGDKTFFGPG LTVDTKSPFV VVTQFYGSPV TEIRRKYVQN
GKVIENAKSN IPGIDATNAI SDTFCEQQKK AFGDTNDFKN KGGFTKLGSV FSRGMVLVLS LWDDHQVAML WLDSTYPTNK
DKSVPGVDRG PCPTSSGKPD DVESASGDAT VVYGNIKFGA LDSTY
MFGFLLSLFA LQFALEIGTQ TSESHPSITW ELNGARQSGQ IVIDSNWRWL HDSGTTNCYD GNTWSSDLCP DPEKCSQNCY
LEGADYSGTY GISASGSQLT LGFVTKGSYS TNIGSRVYLL KDENTYPMFK LKNKEFTFTV DVSNLPCGLN GALYFVAMPS
DGGKAKYPLA KPGAKYGMGY CDAQCPHDMK FINGEANVLD WKPQSNDENA GTGRYGTCCT EMDIWEANSQ ATAYTVHACS
KNARCEGTEC GDDSASQRYN GICDKDGCDF NSWRWGNKTF FGPGLTVDSS KPVTVVTQFI GDPLTEIRRI WVQGGKVIQN
SFTNVSGITS VDSITNTFCD ESKVATGDTN DFKAKGGMSG FSKALDTEVV LVLSLWDDHT ANMLWLDSTY PTDSTAIGAS
RGPCATSSGD PKDVESASAN ASVKFSDIKF GALDSTY
MLASLLPLSN SLGTASNQAE THPKLTWTQY TGKGAGQTVN GEIVLDSNWR WTHKDGTNCY DGNTWSSSLC PDPTTCSNNC
NLDGADYPGT YGITTSGNQL KLGFVTHGSY STNIGSRVYL LRDSKNYQMF KLKNKEFTFT VDDSKLPCGL NGAVYFVAMD
EDGGTAKHSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN SGNGRWGARC TEMDIWEANS RATAYTPHIC
TKTGLYRCEG TECGDSDTNR YGGVCDKDGC DFNSYRMGDK SFFGQGKTVD SSKPVTVVTQ FITDNNQDSG KLTEIRRKYV
QGGKVIDNSK VNIAGITAGN PITDTFCDEA KKAFGDNNDF EKKGGLSALG TQLEAGFVLV LSLWDDHSVN MLWLDSTYPT
NASPGALGVE RGDCAITSGV PADVESQSAD ASVTFSDIKF GPIDSTY
MLCIGLISFV YSLGVGTNTA ETHPKLTWKN GGQTVNGEVT VDSNWRWTHT KGSTKNCYDG NLWSKDLCPD AATCGKNCVL
EGADYSGTYG VTSSGNALTL KFVTHGSYST NVGSRLYLLK DEKTYQMFNL NGKEFTFTVD VSNLPCGLSG ALYHVNMDED
GGTKRYPDNE AGAKYGTGYC DAQCPTDLKF INGIPNSDGW KPQSNDKNSG NGKYGSCCSE MDIWEANSIC SAVTPHVCDN
LQQTRCQGAA CGENGGGSRF GSSCDPDGCD FNSWGMGNKT FYGPGLIVDT KSKFTVVTQF VGNPVTEIKR KYVQNGKVIE
NSYSNIEGMD KFNSVSDKFC TAQKKAFGDT DSFTKHGGFK QLGSALAKGM VLVLSLWDDH TVNMLWLDSV YPTNSKKAGS
DRGPCPTTSG VPADVESKSA DANVIYSDIR FGAIDSTYK
MILALLVLGK SLGIATNQAE THPKLTWTRY QSKGSGSTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC PDPTTCSNNC
DLDGADYPGT YGISTSGNSL KLGFVTHGSY STNIGSRVYL LRDSKNYEMF KLKNKEFTFT VDDSKLPCGL NGALYFVAMD
EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN SGNGRYGACC TEMDIWEANS MATAYTPHVC
TVTGLRRCEG TECGDTDNDQ RYNGICDKDG CDFNSYRLGD KSFFGVGKTV DSSKPVTVVT QFVTSNGQDS GILSETRRKY
VQGGKVIENS KVNVAGITAG NSVTDTFCNE QKKAFGDNND FEKKGGLGAL SKQLDAGMVL VLSLWDDHSV NMLWLDSTYP
TNAAAGALGT ERGACATSSG KPSDVESQSP DATVTFSDIK FGPIDSTY
MIGIVLIQTV FGIGVGTQQS ESHPSLSWQQ CSKGGSCTSV SGSIVLDSNW RWTHIPDGTT NCYDGNEWSS DLCPDPTTCS
NNCVLEGADY SGTYGISTSG SSAKLGFVTK GSYSTNIGSR VYLLGDESHY KIFDLKNKEF TFTVDDSNLE CGLNGALYFV
AMDEDGGASR FTLAKPGAKY GTGYCDAQCP HDIKFINGEA NVQDWKPSDN DDNAGTGHYG ACCTEMDIWE ANKYATAYTP
HICTENGEYR CEGKSCGDSS DDRYGGVCDK DGCDFNSWRL GNQSFWGPGL IIDTGKPVTV VTQFVTKDGT DSGALSEIRR
KYVQGGKTIE NTVVKISGID EVDSITDEFC NQQKQAFGDT NDFEKKGGLS GLGKAFDYGV VLVLSLWDDH DVNMLWLDSV
YPTNPAGKAG ADRGPCATSS GDPKEVEDKY ASASVTFSDI KFGPIDSTY
MLVFGIVSFV YSIGVGTNTA ETHPKLTWKN GGSTTNGEVT VDSNWRWTHT KGSTKNCYDG NLWSKDLCPD AATCGKNCVL
EGADYSGTYG VTSSGDALTL KFVTHGSYST NVGSRLYLLK DEKTYQMFNL NGKEFTFTVD VSQLPCGLNG ALYFVCMDQD
GGMSRYPDNQ AGAKYGTGYC DAQCPTDLKF INGLPNSDGW KPQSNDKNSG NGKYGSCCSE MDIWEANSLA TAVTPHVCDQ
VGQTRCEGRA CGENGGGDRF GSICDPDGCD FNSWRMGNKT FWGPGLIIDT KKPVTVVTQF IGSPVTEIKR EYVQGGKVIE
NSYTNIEGMD KFNSISDKFC TAQKKAFGDN DSFTKHGGFS KLGQSFTKGQ VLVLSLWDDH TVNMLWLDSV YPTNSKKLGS
DRGPCPTSSG VPADVESKNA DSSVKYSDIR FGSIDSTYK
MLSFVFLLGF GVSLEIGTQQ SENHPTLSWQ QCTSSGSCTS QSGSIVLDSN WRWVHDSGTT NCYDGNEWSS DLCPDPETCS
KNCYLDGADY SGTYGITSNG SSLKLGFVTE GSYSTNIGSR VYLKKDTNTY QIFKLKNHEF TFTVDVSNLP CGLNGALYFV
EMEADGGKGK YPLAKPGAQY GMGYCDAQCP HDMKFINGNA NVLDWKPQET DENSGNGRYG TCCTEMDIWE ANSQATAYTP
HICTKDGQYQ CEGTECGDSD ANQRYNGVCD KDGCDFNSYR LGNKTFFGPG LIVDSKKPVT VVTQFITSNG QDSGDLTEIR
RIYVQGGKTI QNSFTNIAGL TSVDSITEAF CDESKDLFGD TNDFKAKGGF TAMGKSLDTG VVLVLSLWDD HSVNMLWLDS
TYPTDAAAGA LGTQRGPCAT SSGAPSDVES QSPDASVTFS DIKFGPLDST Y
MLTLVVYLLS LVVSLEIGTQ QSESHPALTW QREGSSASGS IVLDSNWRWV HDSGTTNCYD GNEWSTDLCP SSDTCTQKCY
IEGADYSGTY GITTSGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYETFK LKNKEFTFTV DDSKLDCGLN GALYFVAMDA
DGGKQKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVED WKPQDNDENS GNGKLGTCCS EMDIWEGNAK SQAYTVHACT
KSGQYECTGT DCGDSDSRYQ GTCDKDGCDY ASYRWGDHSF YGEGKTVDTK QPITVVTQFI GDPLTEIRRL YIQGGKVINN
SKTQNLASVY DSITDAFCDA TKAASGDTND FKAKGAMAGF SKNLDTPQVL VLSLWDDHTA NMLWLDSTYP TDSRDATAER
GPCATSSGVP KDVESNQADA SVVFSDIKFG AINSTYSYN
MFGFLLSLFA LQFALEIGTQ TSESHPSITW ELNGARQSGQ IVIDSNWRWL HDSGTTNCYD GNTWSSDLCP DPEKCSQNCY
LEGADYSGTY GISASGSQLT LGFVTKGSYS TNIGSRVYLL KDENTYQMFK LKNKEFTFTV DVSNLPCGLN GALYFVAMPS
DGGKAKYPLA KPGAKYGMGY CDAQCPHDMK FINGEANVLD WKPQSNDENA GTGRYGTCCT EMDIWEANSQ ATAYTVHACS
KNARCEGTEC GDDSASQRYN GICDKDGCDF NSWRWGNKTF FGPGLTVDSS KPVTVVTQFI GDPLTEIRRI WVQGGKVIQN
SFTNVSGITS VDSITNTFCD ESKVATGDTN DFKAKGGMSG FSKALDTEVV LVLSLWDDHT ANMLWLDSTY PSNSTAIGAT
RGPCATSSGD PKNVESASAN ASVKFSDIKF GAFDSTY
MLALVYFLLS LVVSLEIGTQ QSEDHPKLTW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSTDLCP SSDTCTSKCY
IEGADYSGTY GITSSGSKVT LKFVTKGSYS TNIGSRIYLL KDENTYETFK LKNKEFTFTV DDSQLNCGLN GALYFVAMDA
DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS GNGKLGTCCS EMDIWEGNAK SQAYTVHACT
KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ SFYGEGKTVD TKQPVTVVTQ FIGDPLTEIR RLYVQGGKTI
NNSKTSNLAD TYDSITDKFC DATKEASGDT NDFKAKGAMS GFSTNLNTAQ VLVLSLWDDH TANMLWLDST YPTDSTKTGA
SRGPCAVTSG VPKDVESQYG SAQVVYSDIK FGAINSTY
MLALVYFLLS FVVSLEIGTQ QSEDHPKLTW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSTDLCG SSDTCSSKCY
IEGADYSGTY GISASGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYETFK LKGKEFTFTV DDSKLDCGLN GALYFVAMDA
DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS GNGKLGTCCS EMDIWEGNAK SQAYTVHACT
KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ SFYGEGKTID TKQPVTVVTQ FIGDPLTEIR RVYVQGGKVI
NNSKTSNLAN VYDSITDKFC DDTKDATGDT NDFKAKGAMS GFSTNLNTAQ VLVMSLWDDH TANMLWLDST YPTDSTKTGA
SRGPCAVLSG VPKNVESQHG DATVIYSDIK FGAINSTFSY N
MFLALFVLGK SLGIATNQAE NHPKLTWTRY QSKGSGQTVN GEVVLDSNWR WTHHSGTNCY DGNTWSTSLC PDPQTCSSNC
DLDGADYPGT YGISSSGNSL KLGFVTHGSY STNIGSRVYL LRDSKNYEMF KLKNKEFTFT VDDSKLPCGL NGALYFVAME
EDGGVAKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN SGNGRYGACC IEMDIWEANS MATAYTPHVC
TVTGIHRCEG TECGDTDANQ RYNGICDKDG CDFNSYRMGD KSFFGVGKTV DSSKPVTVVT QFVTSNGQDG GTLSEIKRKY
VQGGKVIENS KVNIAGITAV NSITDTFCNE QKKAFGDNND FEKKGGLGAL SKQLDLGMVL VLSLWDDHSV NMLWLDSTYP
TDAAAGALGT ERGACATSSG KPSDVESQSP DASVTFSDIK FGPIDSTY
MLLCLLSIAN SLGVGTNTAE NHPKLSWKNG GSSVSGSVTV DANWRWTHIK GETKNCYDGN LWSDKYCPDA ATCGKNCVIE
GADYQGTYGV SSSGDGLTLT FVTHGQYSTN VGSRLYLMKD EKTYQMFNLN GKEFTFTVDV SNLPCGLNGA LYFVQMDSDG
GMAKYPDNQA GAKYGTGYCD AQCPTDLKFI NGIPNSDGWK PQKNDKNSGN GKYGSCCSEM DIWEANSQAT AYTPHVCDKL
EQTRCSGSSC GHTGGGERFS SSCDPDGCDF NSWRMGNKTF WGPGLIVDTK KPVQVVTQFV GSGNSCTEIK RKYVQGGKVI
DNSMSNIAGM SKQYNSVSDD FCQAQKKAFG DNDSFTKHGG FRQLGATLGK GHVLVLSLWD DHDVNMLWLD SVYPTNSNKP
GSDRGPCKTS SGIPADVESQ AASSSVKYSD IRFGAIDSTY K
MLCIGLISFV YSLGVGTNTA ETHPKLTWKN GGQTVNGEVT VDSNWRWTHT KGSTKNCYDG NLWSKDLCPD AATCGKNCVL
EGADYSGTYG VTSSGNALTL KFVTHGSYST NVGSRLYLMK DEKTYQMFNL NGKEFTFTVD VSNLPCGLNG ALYHVNMDED
GGTKRYPDNE AGAKYGTGYC DAQCPTDLKF INGIPNSDGW KPQSNDKNSG NGKYGSCCSE MDIWEANSIC SAVTPHVCDT
LQQTRCQGTA CGENGGGSRF GSSCDPDGCD FNSWRMGNKT FYGPGLIVDT KSKFTVVTQF VGSPVTEIKR KYVQNGKVIE
NSFSNIEGMD KFNSISDKFC TAQKKAFGDT DSFTKHGGFK QLGSALAKGM VLVLSLWDDH TVNMLWLDSV YPTNSKKAGS
DRGPCPTTSG VPADVESKSA NANVIYSDIR FGAIDSTYK
MLLCLLGIAS SLDAGTNTAE NHPQLSWKNG GSSVSGSVTV DANWRWTHIK GETKNCYDGN LWSDKYCPDA ATCGQNCVIE
GADYQGTYGV SASGNALTLT FVTHGQYSTN VGSRLYLLKD EKTYQIFNLI GKEFTFTVDV SNLPCGLNGA LYFVQMDADG
GTAKYSDNKA GAKYGTGYCD AQCPTDLKFI NGIPNSDGWK PQKNDKNSGN GRYGSCCSEM DVWEANSLAT AYTPHVCDKL
EQVRCDGRAC GQNGGGDRFS SSCDPDGCDF NSWRLGNKTF WGPGLIVDTK QPVQVVTQWV GSGTSVTEIK RKYVQGGKVI
DNSFTKLDSL TKQYNSVSDE FCVAQKKAFG DNDSFTKHGG FRQLGATLAK GHVLVLSLWD DHDVNMLWLD SVYPTNSNKP
GADRGPCKTS SGVPADVESQ AASSSVKYSD IRFGAIDSTY K
MLGIGFVCIV YSLGVGTNTA ENHPKLTWKN SGSTTNGEVT VDSNWRWTHT KGTTKNCYDG NLWSKDLCPD AATCGKNCVL
EGADYSGTYG VTSSGDALTL KFVTHGSYST NVGSRLYLLK DEKTYQIFNL NGKEFTFTVD VSNLPCGLNG ALYFVNMDAD
GGTGRYPDNQ AGAKYGTGYC DAQCPTDLKF INGIPNSDGW KPQSNDKNSG NGKYGSCCSE MDIWEANSLA TAVTPHVCDQ
VGQTRCEGRA CGENGGGDRF GSSCDPDGCD FNSWRLGNKT FWGPGLIVDT KKPVTVVTQF VGSPVTEIKR KYVQGGKVIE
NSYTNIEGLD KFNSISDKFC TAQKKAFGDN DSFIKHGGFR QLGQSFTKGQ VLVLSLWDDH TVNMLWLDSV YPTNSKKPG
DRGPCPTSSG VPADVESKNA GSSVKYSDIR FGSIDSTYK
MATLVGILVS LFALEVALEI GTQTSESHPS LSWELNGQRQ TGSIVIDSNW RWLHDSGTTN CYDGNEWSSD LCPDPEKCSQ
NCYLEGADYS GTYGISSSGN SLQLGFVTKG SYSTNIGSRV YLLKDENTYA TFKLKNKEFT FTADVSNLPC GLNGALYFVA
MPADGGKSKY PLAKPGAKYG MGYCDAQCPH DMKFINGEAN ILDWKPSSND ENAGAGRYGT CCTEMDIWEA NSQATAYTVH
ACSKNARCEG TECGDDDGRY NGICDKDGCD FNSWRWGNKT FFGPNLIVDS SKPVTVVTQF IGDPLTEIRR IYVQGGKVIQ
NSFTNISGVA SVDSITDAFC NENKVATGDT NDFKAKGGMS GFSKALDTEV VLVLSLWDDH TANMLWLDST YPTDSSALGA
SRGPCAITSG EPKDVESASA NASVKFSDIK FGAIDSTY
MLTLVYFLLS LVVSLEIGTQ QSESHPQLSW QNGSSSVSGS IVLDSNWRWV HDSGTTNCYD GNLWSTDLCP SSDTCTSKCY
IEGADYSGTY GITSSGSKLT LKFVTKGSYS TNIGSRVYLL KDENTYETFK LKNKEFTFTV DDSKLDCGLN GALYFVAMDA
DGGKAKYSSF KPGAKYGMGY CDAQCPHDMK FISGKANVDD WKPQDNDENS GNGKLGTCCS EMDIWEGNAK SQAYTVHACT
KSGQYECTGQ QCGDTDSGDR FKGTCDKDGC DYASWRWGDQ SFYGEGKTVD TKQPLTVVTQ FVGDPLTEIR RVYVQGGKTI
NNSKTSNLAD TYDSITDKFC DATKEASGDT NDFKAKGAMS GFSTNLNTAQ VLVMSLWDDH TANMLWLDST YPTDSTKTGA
SRGPCAVSSG VPKDVESQHG DATVIYSDIK FGAINSTFKW N
MLSLVSIFLV GLGFSLGVGT QQSESHPSLS WQNCSAKGSC QSVSGSIVLD SNWRWLHDSG TTNCYDGNEW STDLCPDAST
CDKNCYIEGA DYSGTYGITS SGAQLKLGFV TKGSYSTNIG SRVYLLRDES HYQLFKLKNH EFTFTVDDSQ LPCGLNGALY
FVEMAEDGGA KPGAQYGMGY CDAQCPHDMK FITGEANVKD WKPQETDENA GNGHYGACCT EMDIWEANSQ ATAYTPHICS
KTGIYRCEGT ECGDNDANQR YNGVCDKDGC DFNSYRLGNK TFWGPGLTVD SNKAMIVVTQ FTTSNNQDSG ELSEIRRIYV
QGGKTIQNSD TNVQGITTTN KITQAFCDET KVTFGDTNDF KAKGGFSGLS KSLESGAVLV LSLWDDHSVN MLWLDSTYPT
DSAGKPGADR GPCAITSGDP KDVESQSPNA SVTFSDIKFG PIDSTY
MILALLVLGK SLGIATNQAE THPKLTWTRY QSKGSGSTVN GEIVLDSNWR WTHHSGTNCY DGNTWSTSLC PDPTTCSNNC
DLDGADYPGT YGISTSGNSL KLGFVTHGSY STNIGSRVYL LKDTKSYEMF KLKNKEFTFT VDDSKLPCGL NGALYFVAMD
EDGGVSKNSI NKAGAQYGTG YCDAQCPHDM KFINGEANVL DWKPQSNDEN SGNGRYGACC TEMDIWEANS MATAYTPHVC
TVTGLRRCEG TECGDTDNDQ RYNGICDKDG CDFNSYRLGD KSFFGVGKTV DSSKPVTVVT QFVTSNGQDS GTLSEIRRKY
VQGGKVIENS KVNVAGITAG NSVTDTFCNE QKKAFGDNND FEKKGGFGAL SKQLVAGMVL VLSLWDDHSV NMLWLDSTYP
TNAAAGALGT ERGACATSSG KPSDVESQSP DATVTFSDIK FGPIDSTY
MLCVGLFGLV YSIGVGTNTQ ETHPKLSWKQ CSSGGSCTTQ QGSVVIDSNW RWTHSTKDLT NCYDGNLWDS TLCPDGTTCS
KNCVLEGADY SGTYGITSSG DSLTLKFVTH GSYSTNVGSR LYLLKDDNNY QIFNLAGKEF TFTVDVSNLP CGLNGALYFV
EMDQDGGKGK HKENEAGAKY GTGYCDAQCP TDLKFIDGIA NSDGWKPQDN DENSGNGKYG SCCSEMDIWE ANSLATAYTP
HVCDTKGQKR CQGTACGENG GGDRFGSECD PDGCDFNSWR QGNKSFWGPG LIIDTKKSVQ VVTQFIGSGS SVTEIRRKYV
QNGKVIENSY STISGTEKYN SISDDYCNAQ KKAFGDTNSF ENHGGFKRFS QHIQDMVLVL SLWDDHTVNM LWLDSVYPTN
SNKPGADRGP CETSSGVPAD VESKSASASV KYSDIRFGPI DSTYK
MLLCLWSIAY SLGVGTNTAE NHPKLSWKNG GSSVSGSVTV DANWRWTHIK GETKNCYDGN LWSDKYCPDA ATCGKNCVIE
GADYQGTYGV SASGDGLTLT FVTHGQYSTN VGSRLYLMKD EKTYQIFNLN GKEFTFTVDV SNLPCGLNGA LYFVQMDSDG
GMAKYPDNQA GAKYGTGYCD AQCPTDLKFI NGIPNSDGWK PQKNDKNSGN GKYGSCCSEM DIWEANSQAT AYTPHVCDKL
EQTRCSGSAC GHTGGGERFS SSCDPDGCDF NSWRMGNKTF WGPGLIVDTK KPVQVVTQFV GSGNSCTEIK RKYVQGGKVI
DNSMSNIAGM TKQYNSVSDD FCQAQKKAFG DNDSFTKHGG FRQLGATLGK GHVLVLSLWD DHDVNMLWLD SVYPTNSNKP
GSDRGPCKTS SGIPADVESQ AASSSVKYSD IRFGAIDSTY K
SEQ ID NO: 299 QSACTLQSET HPPLTWQKCS SGGTCTQQTG SVVIDANWRW THATNSSTNC YDGNTWSSTL CPDNETCAKN CCLDGAAYAS
TYGVTTSGNS LSIGFVTQSA QKNVGARLYL MASDTTYQEF TLLGNEFSFD VDVSQLPCGL NGALYFVSMD ADGGVSKYPT
NTAGAKYGTG YCDSQCPRDL KFINGQANVE GWEPSSNNAN TGIGGHGSCC SEMDIWEANS ISEALTPHPC TTVGQEICEG
DGCGGTYSDN AYGGTCDPDG CDWNPYRLGN TSFYGPGSSF TLDTTKKLTV VTQFETSGAI NRYYVQNGVT FQQPNAELGS
YSGNELNDDY CTAEEAEFGG SSFSDKGGLT QFKKATSGGM VLVMSLWDDY YANMLWLDST YPTNETSSTP GAVRGSCSTS
SGVPAQVESQ SPNAKVTFSN IKFGPIGSTG NPSGGNPPGG NPPGTTTTRR PATTTGSSPG PTQSHYGQCG GIGYSGPTVC
ASGTTCQVLN PYYSQCL
SEQ ID NO: 300 QSACTLQSET HPPLTWQKCS SGGTCTQQTG SVVIDANWRW THATNSSTNC YDGNTWSSTL CPDNETCAKN CCLDGAAYAS
TYGVTTSGNS LSIGFVTQSA QKNVGARLYL MASDTTYQEF TLLGNEFSFD VDVSQLPCGL NGALYFVSMD ADGGVSKYPT
NTAGAKYGTG YCDSQCPRDL KFINGQANVE GWEPSSNNAN TGIGGHGSCC SEMDIWEANS ISEALTPHPC TTVGQEICEG
DGCGGTYSDN RYGGTCDPDG CDWNPYRLGN TSFYGPGSSF TLDTTKKLTV VTQFETSGAI NRYYVQNGVT FQQPNAELGS
YSGNELNDDY CTAEEAEFGG SSFSDKGGLT QFKKATSGGM VLVMSLWDDY YANMLWLDST YPTNETSSTP GAVAGSCSTS
SGVPAQVESQ SPNAKVTFSN IKFGPIGSTG NPSGGNPPGG NPPGTTTTRR PATTTGSSPG PTQSHYGQCG GIGYSGPTVC
ASGTTCQVLN PYYSQCL
SEQ ID NO: 301 MSALNSFNMY KSALILGSLL ATAGAQQIGT YTAETHPSLS WSTCKSGGSC TTNSGAITLD ANWRWVHGVN TSTNCYTGNT
WNTAICDTDA SCAQDCALDG ADYSGTYGIT TSGNSLRLNF VTGSNVGSRT YLMADNTHYQ IFDLLNQEFT FTVDVSHLPC
GLNGALYFVT MDADGGVSKY PNNKAGAQYG VGYCDSQCPR DLKFIAGQAN VEGWTPSSNN ANTGLGNHGA CCAELDIWEA
NSISEALTPH PCDTPGLSVC TTDACGGTYS SDKYAGTCDP DGCDFNPYRL GVTDFYGSGK TVDTTKPITV VTQFVTDDGT
STGTLSEIRR YYVQNGVVIP QPSSKISGVS GNVINSDFCD AEISTFGETA SFSKHGGLAK MGAGMEAGMV LVMSLWDDYS
VNMLWLDSTY PTNATGTPGA AKGSCPTTSG DPKTVESQSG SSYVTFSDIR VGPFNSTFSG GSSTGGSSTT TASGTTTTKA
SSTSTSSTST GTGVAAHWGQ CGGQGWTGPT TCASGTTCTV VNPYYSQCL
SEQ ID NO: 302 QQIGTYTAET HPSLSWSTCK SGGSCTTNSG AITLDANWRW VHGVNTSTNC YTGNTWNTAI CDTDASCAQD CALDGADYSG
TYGITTSGNS LRLNFVTGSN VGSRTYLMAD NTHYQIFDLL NQEFTFTVDV SHLPCGLNGA LYFVTMDADG GVSKYPNNKA
GAQYGVGYCD SQCPRDLKFI AGQANVEGWT PSSNNANTGL GNHGACCAEL DIWEANSISE ALTPHPCDTP GLSVCTTDAC
GGTYSSDKYA GTCDPDGCDF NPYRLGVTDF YGSGKTVDTT KPITVVTQFV TDDGTSTGTL SEIRRYYVQN GVVIPQPSSK
ISGVSGNVIN SDFCDAEIST FGETASFSKH GGLAKMGAGM EAGMVLVMSL WDDYSVNMLW LDSTYPTNAT GTPGAAKGSC
PTTSGDPKTV ESQSGSSYVT FSDIRVGPFN STFSGGSSTG GSSTTTASGT TTTKASSTST SSTSTGTGVA AHWGQCGGQG
WTGPTTCASG TTCTVVNPYY SQCL