Phenol-induced proteins of Thauera aromatica
This invention pertains to genes coding for phenol-induced proteins. Five phenol-induced proteins were isolated from Thauera aromatica. Three dominant phenol-induced proteins, called F1, F2, and F3, respectively, were purified and sequenced to obtain the enzyme(s) that catalyze the 14CO2:4-hydroxybenzoate isotope exchange reaction and the carboxylation of phenylphosphate. The N-terminal amino acid sequences of these proteins, as well as the N-termini of the phenol-induced proteins F4 and F5, were also determined.
[0001] This invention is in the field of molecular biology. More specifically, this invention pertains to nucleic acid fragments encoding phenol-induced proteins of the denitrifying bacterium Thauera aromatica.
BACKGROUND OF THE INVENTION
[0002] Phenolic compounds are basic chemicals of high interest to the chemical and pharmaceutical industries. Phenolic compounds are important plant constituents and phenol is formed from a variety of natural and synthetic substrates by the activity of microorganisms. The aerobic metabolism of phenol has been studied extensively; in all aerobic metabolic pathways oxygenases initiate the degradation of phenol by hydroxylation to catechol. Catechol can be oxygenolytically cleaved by dioxygenases, either by ortho- or meta-cleavage.
[0003] Anaerobic metabolism of phenol, aniline, o-cresol (2-methylphenol), hydroquinone (1,4-dihydroxybenzene), catechol (1,2-dihydroxybenzene), naphthalene and phenanthrene (Zhang et al., Appl. Environ. Microbiol. 63:4759-4764 (1997)) by denitrifying and sulfate-reducing bacteria involves carboxylation of the aromatic ring ortho or para to the hydroxy or amino substituent. Products are 4-hydroxybenzoate, 4-aminobenzoate, 4-hydroxy-3-methylbenzoate, gentisate (2,5-dihydroxybenzoate), and protocatechuate (3,4-dihydroxybenzoate) (Heider et al., Eur. J. Biochem. 243:577-596 (1997)). Consortia of fermenting bacteria convert phenol to benzoate and decarboxylate 4-hydroxybenzoate to phenol (Winter et al., Appl. Microbiol. Biotechnol. 25:384-391 (1987); He et al., Eur. J. Biochem. 229:77-82 (1995); He et al., J. Bacteriol. 178:3539-3543 (1996); Van Schie et al., Appl. Environ. Microbiol. 64:2432-2438 (1998)). They also catalyze an isotope exchange between D2O and the proton at C4 of the aromatic ring of 4-hydroxybenzoate. Phenol carboxylation to 4-hydroxybenzoate in the denitrifying bacterium Thauera aromatica is the best studied of these carboxylation reactions and is a paradigm for this new type of carboxylation reaction (Tschech et al., Arch. Microbiol. 148:213-217 (1987); Lack et al., Eur. J. Biochem. 197:473-479 (1991); Lack et al., J. Bacteriol. 174:3629-3636 (1992); Lack et al., Arch. Microbiol. 161:132-139 (1994)).
[0004] Because no isolated gene or corresponding coding sequence has been available, there remains a need for a convenient way to produce various intermediates in phenol metabolism with a transformed microorganism.
SUMMARY OF THE INVENTION
[0005] Five phenol-induced proteins from Thauera aromatica have been isolated. Three dominant phenol-induced proteins called F1, F2, and F3 were purified and sequenced in an attempt to purify the enzyme(s) that catalyze the 14CO2:4-hydroxybenzoate isotope exchange reaction and the carboxylation of phenylphosphate. The N-terminal amino acid sequences of these proteins as well as the N-terminus of the phenol-induced proteins F4 and F5 were determined. Internal sequences of F2 were obtained by trypsin digest. All of these sequences have application in industrial processes that involve the use of phenol or its intermediates. The instant invention provides a means to manipulate phenol metabolism and to produce various phenol intermediates in recombinant microorganisms. The approach is based on the observation that anoxic growth with phenol and nitrate induces novel proteins that are lacking in cells grown with 4-hydroxybenzoate and nitrate.
BRIEF DESCRIPTION OF THE SEQUENCE DESCRIPTIONS
[0006] The following 44 sequence descriptions and sequence listings attached hereto comply with the rules governing nucleotide and/or amino acid sequence disclosures in patent applications as set forth in 37 C.F.R. §1.821-1.825 (“Requirements for Patent Applications Containing Nucleotide Sequences and/or Amino Acid Sequence Disclosures - the Sequence Rules”) and are consistent with World Intellectual Property Organization (WIPO) Standard ST.25 (1998) and the sequence listing requirements of the EPO and PCT (Rules 5.2 and 49.5(a-bis), and Section 208 and Annex C of the Administrative Instructions). The Sequence Descriptions contain the one letter code for nucleotide sequence characters and the three letter codes for amino acids as defined in conformity with the IUPAC-IUBMB standards described in Nucleic Acids Research 13:3021-3030 (1985) and in the Biochemical Journal 219(2):345-373 (1984), which are herein incorporated by reference. The symbols and format used for nucleotide and amino acid sequence data comply with the rules set forth in 37 C.F.R. §1.822. The present invention utilizes Wisconsin Package Version 9.0 software from Genetics Computer Group (GCG), Madison, Wis.
[0007] SEQ ID NO:1 is the deduced amino acid sequence of protein F1 and is coded by orf6.
[0008] SEQ ID NO:2 is the nucleotide sequence of orf6 that codes for protein F1.
[0009] SEQ ID NO:3 is the deduced amino acid sequence of protein F2 and is coded by orf4.
[0010] SEQ ID NO:4 is the nucleotide sequence of orf4 that codes for protein F2.
[0011] SEQ ID NO:5 is the deduced amino acid sequence of protein F3 and is coded by orf1.
[0012] SEQ ID NO:6 is the nucleotide sequence of orf1 that codes for protein F3.
[0013] SEQ ID NO:7 is the deduced amino acid sequence of protein F4 and is coded by orf5.
[0014] SEQ ID NO:8 is the nucleotide sequence of orf5 that codes for protein F4.
[0015] SEQ ID NO:9 is the deduced amino acid sequence of protein F5 and is coded by orf8.
[0016] SEQ ID NO:10 is the nucleotide sequence of orf8 that codes for protein F5.
[0017] SEQ ID NO:11 is the deduced amino acid sequence of orf2.
[0018] SEQ ID NO:12 is the nucleotide sequence of orf2 that codes for an unknown protein.
[0019] SEQ ID NO:13 is the deduced amino acid sequence of orf3.
[0020] SEQ ID NO:14 is the nucleotide sequence of orf3 that codes for an unknown protein.
[0021] SEQ ID NO:15 is the deduced amino acid sequence of orf7.
[0022] SEQ ID NO:16 is the nucleotide sequence of orf7 that codes for an unknown protein.
[0023] SEQ ID NO:17 is the deduced amino acid sequence of orf9.
[0024] SEQ ID NO:18 is the nucleotide sequence of orf9 that codes for an unknown protein.
[0025] SEQ ID NO:19 is the deduced amino acid sequence of orf10.
[0026] SEQ ID NO:20 is the nucleotide sequence of orf10 that codes for an unknown protein.
[0027] SEQ ID NO:21 is the deduced amino acid sequence of orf-1.
[0028] SEQ ID NO:22 is the nucleotide sequence of orf-1 that codes for an unknown protein.
[0029] SEQ ID NO:23 is the nucleotide sequence containing two gene clusters that are involved in phenol metabolism.
[0030] SEQ ID NO:24 is the N-terminal amino acid sequence of F1 (experimentally determined).
[0031] SEQ ID NO:25 is the N-terminal amino acid sequence of F1 (deduced from the genes).
[0032] SEQ ID NO:26 is the N-terminal amino acid sequence of F2 (experimentally determined).
[0033] SEQ ID NO:27 is the N-terminal amino acid sequence of F2 (deduced from the genes).
[0034] SEQ ID NO:28 is the N-terminal amino acid sequence of F3 (experimentally determined).
[0035] SEQ ID NO:29 is the N-terminal amino acid sequence of F3 (deduced from the genes).
[0036] SEQ ID NO:30 is the amino acid sequence of an internal fragment of F2 that was obtained by trypsin-digest.
[0037] SEQ ID NO:31 is the amino acid sequence of an internal fragment of F2 that was obtained by trypsin-digest.
[0038] SEQ ID NO:32 is the primer of F2-forward (N-terminus).
[0039] SEQ ID NO:33 is the primer of F2T6-reverse.
[0040] SEQ ID NO:34 is the primer of F2T43-reverse.
[0041] SEQ ID NO:35 is the primer T7.
[0042] SEQ ID NO:36 is the primer T3.
[0043] SEQ ID NO:37 is the primer designated breib31.
[0044] SEQ ID NO:38 is the primer designated breib07r3.
[0045] SEQ ID NO:39 is the primer of λ15-forward.
[0046] SEQ ID NO:40 is the primer of λ15-reverse.
[0047] SEQ ID NO:41 is the N-terminal amino acid sequence of F4 (experimentally determined).
[0048] SEQ ID NO:42 is the N-terminal amino acid sequence of F4 (deduced from the genes).
[0049] SEQ ID NO:43 is the N-terminal amino acid sequence of F5 (experimentally determined).
[0050] SEQ ID NO:44 is the N-terminal amino acid sequence of F5 (deduced from the genes).
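For convenience, the correspondence between the five phenol-induced proteins, their open reading frames, and the first ten sequence identifiers listed above can be collected in a small lookup table. The following is only an illustrative sketch; the names SEQ_IDS and seq_ids_for are hypothetical helpers and are not part of the Sequence Listing.

```python
# Lookup table built only from the sequence descriptions above.
# SEQ_IDS and seq_ids_for are hypothetical names used for illustration.
SEQ_IDS = {
    # protein: (orf, amino acid SEQ ID NO, nucleotide SEQ ID NO)
    "F1": ("orf6", 1, 2),
    "F2": ("orf4", 3, 4),
    "F3": ("orf1", 5, 6),
    "F4": ("orf5", 7, 8),
    "F5": ("orf8", 9, 10),
}


def seq_ids_for(protein):
    """Return the ORF name and SEQ ID NOs recorded for one protein."""
    orf, amino_acid, nucleotide = SEQ_IDS[protein]
    return {"orf": orf, "amino_acid": amino_acid, "nucleotide": nucleotide}


if __name__ == "__main__":
    for name in sorted(SEQ_IDS):
        print(name, seq_ids_for(name))
```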
BRIEF DESCRIPTION OF THE DRAWINGS
[0051] FIG. 1 shows phenol metabolism in Thauera aromatica. The enzymes active in this pathway are phenylphosphate synthase (E1); phenylphosphate carboxylase (Mn2+, K+) (E2); 4-hydroxybenzoate-CoA ligase (E3); 4-hydroxybenzoyl-CoA reductase (Mo, FAD, Fe/S) (E4); and benzoyl-CoA reductase (Fe/S, FAD) (E5).
[0052] FIG. 2 shows SDS-PAGE (12.5%) with fractions after chromatography of the soluble fraction of K172 (grown anaerobically on phenol) on DEAE sepharose fast flow. See Example 4.
[0053] FIG. 3 shows clone 8 (pKSBam2.7). See Example 8.
[0054] FIG. 4 shows clone 9 (pKSEco5.25). See Example 8.
[0055] FIG. 5 shows clone 19 (pKSBam4). See Example 8.
[0056] FIG. 6 shows clone 2 (pKSBam9).
[0057] FIG. 7 shows clone 7 (pKSPst3.7). See Example 8.
[0058] FIG. 8 shows phagemid-vector—clone 1 (pBK-CMV).
[0059] FIG. 9 shows the expression of F1-F5 in E. coli. See Example 9.
[0060] FIG. 10 shows the two-dimensional gel electrophoresis of the 100,000 × g supernatant of Thauera aromatica grown anaerobically on 4-hydroxybenzoate (A) and phenol (B), respectively. Phenol-induced proteins are indicated by triangles.
[0061] FIG. 11 shows the organization of the genes possibly involved in anaerobic phenol metabolism of Thauera aromatica and their homologies to known proteins.
[0062] FIG. 12 shows the map of the orientation of the clones in the whole sequence of 14272 bp.
[0063] FIG. 13 shows the organization of the genes, with restriction sites, involved in phenol metabolism of Thauera aromatica.
DETAILED DESCRIPTION OF THE INVENTION
[0064] Applicants have succeeded in identifying the genes coding for phenol-induced proteins. Five phenol-induced proteins from Thauera aromatica have been isolated. Three dominant phenol-induced proteins called F1, F2, and F3 were purified and sequenced to obtain the enzyme(s) that catalyze the 14CO2:4-hydroxybenzoate isotope exchange reaction and the carboxylation of phenylphosphate. The N-terminal amino acid sequences of these proteins as well as the N-terminus of the phenol-induced proteins F4 and F5 were determined. Internal sequences of F2 were obtained by trypsin digest. All of these sequences have utility in industrial processes. The instant invention provides a means to manipulate phenol metabolism and specifically the carboxylation of phenyl phosphate. Transformation of host cells with at least one copy of the identified genes under the control of appropriate promoters will provide the ability to produce various intermediates in phenol metabolism. The approach is based on the observation that anoxic growth with phenol and nitrate induces novel proteins that are lacking in cells grown with 4-hydroxybenzoate and nitrate.
[0065] The following definitions are provided for the full understanding of terms and abbreviations used in this specification.
[0066] The abbreviations in the specification correspond to units of measure, techniques, properties, or compounds as follows: “sec” means second(s), “min” means minute(s), “h” means hour(s), “d” means day(s), “μL” means microliter(s), “mL” means milliliter(s), “L” means liter(s), “mM” means millimolar, “M” means molar, “mmol” means millimole(s), “Ampr” means ampicillin resistance, “Amps” means ampicillin sensitivity, “kb” means kilobase, “kd” means kilodaltons, “nm” means nanometers, and “wt” means weight. “ORF” means open reading frame, “PCR” means polymerase chain reaction, “HPLC” means high performance liquid chromatography, “ca” means approximately, “dcw” means dry cell weight, “O.D.” means optical density at the designated wavelength, and “IU” means International Units.
[0067] “Polymerase chain reaction” is abbreviated PCR.
[0068] “Open reading frame” is abbreviated ORF.
[0069] “Sample channels ratio” is abbreviated SCR.
[0070] “High performance liquid chromatography” is abbreviated HPLC.
[0071] The term “F1” refers to the protein encoded by orf6.
[0072] The term “F2” refers to the protein encoded by orf4.
[0073] The term “F3” refers to the protein encoded by orf1.
[0074] The term “F4” refers to the protein encoded by orf5.
[0075] The term “F5” refers to the protein encoded by orf8.
[0076] The term “E1” refers to the phenol-phosphorylating enzyme, also called phenol kinase or phenylphosphate synthase. The terms phenol-phosphorylating enzyme and phenol kinase are used interchangeably by those skilled in the art.
[0077] The term “E2” refers to phenylphosphate carboxylase.
[0078] The terms “isolated nucleic acid fragment” or “isolated nucleic acid molecule” refer to a polymer of mononucleotides (RNA or DNA) that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. An isolated nucleic acid fragment or an isolated nucleic acid molecule in the form of a polymer of mononucleotides may be comprised of one or more segments of cDNA, genomic DNA, or synthetic DNA.
[0079] The terms “host cell” and “host microorganism” refer to a cell capable of receiving foreign or heterologous genes and expressing those genes to produce an active gene product. The term “suitable host cells” encompasses microorganisms such as bacteria and fungi, and also includes plant cells.
[0080] The term “fragment” refers to a DNA or amino acid sequence comprising a subsequence of the nucleic acid sequence or protein of the instant invention. However, an active fragment of the instant invention comprises a sufficient portion of the protein to maintain activity.
[0081] The term “gene cluster” refers to genes organized in a single expression unit or in close proximity to each other on the chromosome.
[0082] The term “substantially similar” refers to nucleic acid fragments wherein changes in one or more nucleotide bases result in substitution of one or more amino acids, but do not affect the functional properties of the protein encoded by the DNA sequence. “Substantially similar” also refers to nucleic acid fragments wherein changes in one or more nucleotide bases do not affect the ability of the nucleic acid fragment to mediate alteration of gene expression by antisense or co-suppression technology. “Substantially similar” also refers to modifications of the nucleic acid fragments of the instant invention such as deletion or insertion of one or more nucleotide bases that do not substantially affect the functional properties of the resulting transcript vis-à-vis the ability to mediate alteration of gene expression by antisense or co-suppression technology or alteration of the functional properties of the resulting protein molecule. It is therefore understood that the invention encompasses more than the specific exemplary sequences.
[0083] For example, it is well known in the art that alterations in a gene which result in the production of a chemically equivalent amino acid at a given site, and yet do not affect the functional properties of the encoded protein, are common. For example, a codon for the amino acid alanine, a hydrophobic amino acid, may be substituted by a codon encoding another less hydrophobic residue (such as glycine) or a more hydrophobic residue (such as valine, leucine, or isoleucine). Similarly, changes which result in substitution of one negatively charged residue for another (such as aspartic acid for glutamic acid) or one positively charged residue for another (such as lysine for arginine) can also be expected to produce a functionally equivalent product. Nucleotide changes which result in alteration of the N-terminal and C-terminal portions of the protein molecule would also not be expected to alter the activity of the protein. Each of the proposed modifications is well within the routine skill in the art, as is determining what biological activity of the encoded products is retained. Moreover, the skilled artisan recognizes that substantially similar sequences encompassed by this invention are also defined by their ability to hybridize, under stringent conditions (0.1× SSC, 0.1% SDS, 65° C. and washed with 2× SSC, 0.1% SDS followed by 0.1× SSC, 0.1% SDS), with the sequences exemplified herein. Preferred substantially similar nucleic acid fragments of the instant invention are those nucleic acid fragments whose DNA sequences are at least 80% identical to the DNA sequence of the nucleic acid fragments reported herein. More preferred nucleic acid fragments are at least 90% identical to the DNA sequence of the nucleic acid fragments reported herein. Most preferred are nucleic acid fragments that are at least 95% identical to the DNA sequence of the nucleic acid fragments reported herein.
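By way of illustration only, the substitution examples named in the preceding paragraph can be written as a small table and used to flag which differences between two equal-length protein sequences fall within those examples. This is a minimal sketch under stated assumptions: the table EXAMPLE_SUBSTITUTIONS holds only the residues mentioned in the text (one-letter codes), it is not an authoritative substitution matrix, and the function names are hypothetical. Whether activity is actually retained must still be determined experimentally.

```python
# Substitution examples named in the text, in one-letter amino acid codes.
# Illustrative assumption only: real substitution matrices are far richer.
EXAMPLE_SUBSTITUTIONS = {
    "A": {"G", "V", "L", "I"},  # alanine -> glycine, valine, leucine, isoleucine
    "D": {"E"},                 # aspartic acid <-> glutamic acid
    "E": {"D"},
    "K": {"R"},                 # lysine <-> arginine
    "R": {"K"},
}


def is_example_substitution(original, replacement):
    """True if the change is identical or is one of the examples in the text."""
    if original == replacement:
        return True
    return replacement in EXAMPLE_SUBSTITUTIONS.get(original, set())


def differing_positions(protein_a, protein_b):
    """List (1-based position, original, replacement) where two sequences differ."""
    if len(protein_a) != len(protein_b):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(protein_a, protein_b))
        if a != b
    ]


if __name__ == "__main__":
    # Hypothetical short peptides, not taken from the Sequence Listing.
    for pos, a, b in differing_positions("MAKD", "MVKE"):
        print(pos, a, b, is_example_substitution(a, b))
```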
[0084] A nucleic acid molecule is “hybridizable” to another nucleic acid molecule, such as a cDNA, genomic DNA, or RNA, when a single stranded form of the nucleic acid molecule can anneal to the other nucleic acid molecule under the appropriate conditions of temperature and solution ionic strength. Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein (entirely incorporated herein by reference). The conditions of temperature and ionic strength determine the “stringency” of the hybridization. For preliminary screening for homologous nucleic acids, low stringency hybridization conditions, corresponding to a Tm of 55°, can be used, e.g., 5× SSC, 0.1% SDS, 0.25% milk, and no formamide; or 30% formamide, 5× SSC, 0.5% SDS. Moderate stringency hybridization conditions correspond to a higher Tm, e.g., 40-45% formamide, with 5× or 6× SSC. Hybridization requires that the two nucleic acids contain complementary sequences, although depending on the stringency of the hybridization, mismatches between bases are possible. The appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of Tm for hybrids of nucleic acids having those sequences. The relative stability (corresponding to higher Tm) of nucleic acid hybridizations decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for calculating Tm have been derived (see Sambrook et al., supra, 9.50-9.51). 
For hybridizations with shorter nucleic acids, i.e., oligonucleotides, the position of mismatches becomes more important, and the length of the oligonucleotide determines its specificity (see Sambrook et al., supra, 11.7-11.8). In one embodiment, the length for a hybridizable nucleic acid is at least about 10 nucleotides. Preferably, a minimum length for a hybridizable nucleic acid is at least about 15 nucleotides; more preferably at least about 20 nucleotides; and most preferably the length is at least 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as length of the probe.
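The dependence of Tm on salt concentration, base composition, formamide, and hybrid length described above can be illustrated numerically. The sketch below assumes one widely cited empirical approximation for DNA:DNA hybrids longer than about 100 nucleotides, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) − 0.63(% formamide) − 600/L; it is not confirmed here that this is the exact form given at the cited pages of Sambrook et al., and the function names are hypothetical. The result is an estimate only and does not replace empirical optimization of wash conditions.

```python
import math


def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm(seq, na_molar, formamide_percent=0.0):
    """Estimate Tm (deg C) for a long DNA:DNA hybrid.

    Assumed empirical formula (an approximation, see lead-in):
    81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.63*(%formamide) - 600/length
    """
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent(seq)
        - 0.63 * formamide_percent
        - 600.0 / len(seq)
    )


if __name__ == "__main__":
    # Hypothetical 200-nucleotide probe with 50% G+C content.
    probe = "ATGC" * 50
    print(estimate_tm(probe, na_molar=1.0))
    print(estimate_tm(probe, na_molar=1.0, formamide_percent=30.0))
```

Under this approximation, lowering the salt concentration or adding formamide lowers the estimated Tm, which is consistent with the use of low-salt washes for high stringency.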
[0085] A “substantial portion” refers to an amino acid or nucleotide sequence which comprises enough of the amino acid sequence of a polypeptide or the nucleotide sequence of a gene to afford putative identification of that polypeptide or gene, either by manual evaluation of the sequence by one skilled in the art, or by computer-automated sequence comparison and identification using algorithms such as BLAST (Basic Local Alignment Search Tool; Altschul et al., J. Mol. Biol. 215:403-410 (1990); see also www.ncbi.nlm.nih.gov/BLAST/). In general, a sequence of ten or more contiguous amino acids or thirty or more nucleotides is necessary in order to putatively identify a polypeptide or nucleic acid sequence as homologous to a known protein or gene. Moreover, with respect to nucleotide sequences, gene-specific oligonucleotide probes comprising 20-30 contiguous nucleotides may be used in sequence-dependent methods of gene identification (e.g., Southern hybridization) and isolation (e.g., in situ hybridization of bacterial colonies or bacteriophage plaques). In addition, short oligonucleotides (generally 12 bases or longer) may be used as amplification primers in PCR in order to obtain a particular nucleic acid fragment comprising the primers. Accordingly, a “substantial portion” of a nucleotide sequence comprises enough of the sequence to afford specific identification and/or isolation of a nucleic acid fragment comprising the sequence. The instant specification teaches partial or complete amino acid and nucleotide sequences encoding one or more particular Thauera aromatica proteins. The skilled artisan, having the benefit of the sequences as reported herein, may now use all or a substantial portion of the disclosed sequences for purposes known to those skilled in the art. Accordingly, the instant invention comprises the complete sequences as reported in the accompanying Sequence Listing, as well as substantial portions of those sequences as defined above.
[0086] For example, it is well known in the art that antisense suppression and co-suppression of gene expression may be accomplished using nucleic acid fragments representing less than the entire coding region of a gene, and by nucleic acid fragments that do not share 100% identity with the gene to be suppressed. Moreover, alterations in a gene that result in the production of a chemically equivalent amino acid at a given site, but do not affect the functional properties of the encoded protein, are well known in the art. Thus, a codon for the amino acid alanine, a hydrophobic amino acid, may be substituted by a codon encoding another less hydrophobic residue, such as glycine, or a more hydrophobic residue, such as valine, leucine, or isoleucine. Similarly, changes which result in substitution of one negatively charged residue for another, such as aspartic acid for glutamic acid, or one positively charged residue for another, such as lysine for arginine, can also be expected to produce a functionally equivalent product. Nucleotide changes which result in alteration of the N-terminal and C-terminal portions of the protein molecule would also not be expected to alter the activity of the protein. Each of the proposed modifications is well within the routine skill in the art, as is determination of retention of biological activity of the encoded products. Moreover, the skilled artisan recognizes that substantially similar sequences encompassed by this invention are also defined by their ability to hybridize, under stringent conditions (0.1× SSC, 0.1% SDS, 65° C.) or moderately stringent conditions, with the sequences exemplified herein. Preferred substantially similar nucleic acid fragments of the instant invention are those nucleic acid fragments whose DNA sequences are 80% identical to the DNA sequence of the nucleic acid fragments reported herein.
More preferred nucleic acid fragments are 90% identical to the DNA sequence of the nucleic acid fragments reported herein. Most preferred are nucleic acid fragments that are 95% identical to the DNA sequence of the nucleic acid fragments reported herein.
[0087] The term “complementary” is used to describe the relationship between nucleotide bases that are capable of hybridizing to one another. For example, with respect to DNA, adenine is complementary to thymine and cytosine is complementary to guanine. Accordingly, the instant invention also includes isolated nucleic acid fragments that are complementary to the complete sequences as reported in the accompanying Sequence Listing as well as those substantially similar nucleic acid sequences.
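The base-pairing rule stated above (A with T, C with G) is sufficient to compute the complementary strand of any DNA sequence. The following minimal sketch uses hypothetical function names and short made-up sequences rather than sequences from the Sequence Listing; it handles only the four unambiguous bases.

```python
# Translation table implementing the A<->T and C<->G pairing rule.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def are_complementary(strand_a, strand_b):
    """True if strand_b is the exact reverse complement of strand_a."""
    return reverse_complement(strand_a).upper() == strand_b.upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(are_complementary("AAGC", "GCTT"))
```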
[0088] The term “percent identity” is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, “identity” also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. “Identity” and “similarity” can be readily calculated by known methods, including but not limited to those described in: Computational Molecular Biology (Lesk, A. M., ed.) Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects (Smith, D. W., ed.) Academic Press, New York (1993); Computer Analysis of Sequence Data Part I (Griffin, A. M., and Griffin, H. G., eds.) Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology (von Heijne, G., ed.) Academic Press (1987); and Sequence Analysis Primer (Gribskov, M. and Devereux, J., eds.) Stockton Press, New York (1991). Preferred methods to determine identity are designed to give the largest match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Preferred computer program methods to determine identity and similarity between two sequences include, but are not limited to, the GCG Pileup program found in the GCG program package, using the Needleman and Wunsch algorithm with their standard default values of gap creation penalty=12 and gap extension penalty=4 (Devereux et al., Nucleic Acids Res. 12:387-395 (1984)), BLASTP, BLASTN, and FASTA (Pearson et al., Proc. Natl. Acad. Sci. USA 85:2444-2448 (1988)). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul et al., Natl. Cent. Biotechnol. Inf., Natl. Library Med. (NCBI NLM) NIH, Bethesda, Md. 20894; Altschul et al., J. Mol. Biol.
215:403-410 (1990); Altschul et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs”, Nucleic Acids Res. 25:3389-3402 (1997)). The method to determine percent identity preferred in the instant invention is by the method of DNASTAR protein alignment protocol using the Jotun-Hein algorithm (Hein et al., Methods Enzymol. 183:626-645 (1990)). Default parameters used for the Jotun-Hein method for alignments are: for multiple alignments, gap penalty=11, gap length penalty=3; for pairwise alignments ktuple=2. As an illustration, for a polynucleotide having a nucleotide sequence with at least 95% “identity” to a reference nucleotide sequence, it is intended that the nucleotide sequence of the polynucleotide is identical to the reference sequence except that the polynucleotide sequence may include up to five point mutations per each 100 nucleotides of the reference nucleotide sequence. In other words, to obtain a polynucleotide having a nucleotide sequence at least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence. These mutations of the reference sequence may occur at the 5′ or 3′ terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence. Analogously, for a polypeptide having an amino acid sequence having at least 95% identity to a reference amino acid sequence, it is intended that the amino acid sequence of the polypeptide is identical to the reference sequence except that the polypeptide sequence may include up to five amino acid alterations per each 100 amino acids of the reference amino acid. 
In other words, to obtain a polypeptide having an amino acid sequence at least 95% identical to a reference amino acid sequence, up to 5% of the amino acid residues in the reference sequence may be deleted or substituted with another amino acid, or a number of amino acids up to 5% of the total amino acid residues in the reference sequence may be inserted into the reference sequence. These alterations of the reference sequence may occur at the amino or carboxy terminal positions of the reference amino acid sequence or anywhere between those terminal positions, interspersed either individually among residues in the reference sequence or in one or more contiguous groups within the reference sequence.
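The arithmetic behind the 95% illustration above can be sketched as follows. This is a simplified example under stated assumptions: the two sequences are taken to be already aligned (equal length, with "-" marking gaps), identity is computed over the alignment length, and the function names are hypothetical. Alignment programs such as those cited above may use different denominators and gap treatments, so their reported values can differ from this sketch.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def max_alterations(reference_length, min_identity_percent):
    """Largest whole number of altered positions still meeting the threshold.

    Uses integer arithmetic, so min_identity_percent is expected to be an int.
    """
    return reference_length * (100 - min_identity_percent) // 100


if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from the Sequence Listing.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    # Up to five alterations per 100 positions at 95% identity.
    print(max_alterations(100, 95))
```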
[0089] The term “percent homology” refers to the extent of amino acid sequence identity between polypeptides. When a first amino acid sequence is identical to a second amino acid sequence, then the first and second amino acid sequences exhibit 100% homology. The homology between any two polypeptides is a direct function of the total number of matching amino acids at a given position in either sequence, e.g., if half of the total number of amino acids in either of the two sequences are the same then the two sequences are said to exhibit 50% homology.
[0090] “Codon degeneracy” refers to divergence in the genetic code permitting variation of the nucleotide sequence without affecting the amino acid sequence of an encoded polypeptide. Accordingly, the instant invention relates to any nucleic acid fragment that encodes all or a substantial portion of the amino acid sequence encoding the instant Thauera aromatica proteins as set forth in SEQ ID NO:1, SEQ ID NO:3 and SEQ ID NO:5. The skilled artisan is well aware of the “codon-bias” exhibited by a specific host cell to use nucleotide codons to specify a given amino acid. Therefore, when synthesizing a gene for improved expression in a host cell, it is desirable to design the gene such that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.
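Codon degeneracy can be illustrated by translating two different nucleotide sequences that encode the same peptide. The sketch below assumes the standard genetic code and uses hypothetical helper names and made-up six-nucleotide sequences; it is not drawn from the Sequence Listing.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(amino_acid):
    """All codons in the standard code that specify the given amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    # Two different sequences, one encoded peptide (alanine-aspartate).
    print(translate("GCTGAT"), translate("GCCGAC"))
    print(synonymous_codons("L"))
```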
[0091] “Synthetic genes” can be assembled from oligonucleotide building blocks that are chemically synthesized using procedures known to those skilled in the art. These building blocks are ligated and annealed to form gene segments that are then enzymatically assembled to construct the entire gene. “Chemically synthesized”, as related to a sequence of DNA, means that the component nucleotides were assembled in vitro. Manual chemical synthesis of DNA may be accomplished using well established procedures, or automated chemical synthesis can be performed using one of a number of commercially available machines. Accordingly, the genes can be tailored for optimal gene expression based on optimization of nucleotide sequence to reflect the codon bias of the host cell. The skilled artisan appreciates the likelihood of successful gene expression if codon usage is biased towards those codons favored by the host. Determining preferred codons can be based on a survey of genes derived from the host cell where sequence information is available.
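The survey of host genes mentioned above can be sketched as a simple codon count. The example below assumes the standard genetic code, in-frame coding sequences, and hypothetical function names; the input shown is a made-up sequence, and a real survey would use many coding sequences from the intended host. Ties are resolved in favor of the codon encountered first.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts(coding_sequences):
    """Count every in-frame codon across a collection of coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def preferred_codons(coding_sequences):
    """For each amino acid observed, return its most frequently used codon."""
    counts = codon_counts(coding_sequences)
    best = {}
    for codon, n in counts.items():
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue  # skip stop codons and codons with ambiguous bases
        if aa not in best or n > counts[best[aa]]:
            best[aa] = codon
    return best


if __name__ == "__main__":
    # Hypothetical host coding sequence: Met-Ala-Ala-Ala-stop.
    print(preferred_codons(["ATGGCCGCCGCTTAA"]))
```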
[0092] “Gene” refers to a nucleic acid fragment that expresses a specific protein, including regulatory sequences preceding (5′ non-coding sequences) and following (3′ non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences. “Chimeric gene” refers to any gene, not a native gene, comprising regulatory and coding sequences that are not found together in nature. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. “Endogenous gene” refers to a native gene in its natural location in the genome of an organism. A “foreign” gene refers to a gene not normally found in the host organism, but which is introduced into the host organism by gene transfer. Foreign genes can comprise native genes inserted into a non-native organism, or chimeric genes. A “transgene” is a gene that has been introduced into the genome by a transformation procedure.
[0093] “Coding sequence” refers to a DNA sequence that codes for a specific amino acid sequence. “Regulatory sequences” refer to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters, translation leader sequences, introns, and polyadenylation recognition sequences.
[0094] “Promoter” refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3′ to a promoter sequence. The promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. An “enhancer” is a DNA sequence which can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. Promoters which cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. New promoters of various types useful in plant cells are constantly being discovered; numerous examples may be found in the compilation by Okamuro and Goldberg, (Biochemistry of Plants 15:1-82 (1989)). It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.
[0095] The “translation leader sequence” refers to a DNA sequence located between the promoter sequence of a gene and the coding sequence. The translation leader sequence is present in the fully processed mRNA upstream of the translation start sequence. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. Examples of translation leader sequences have been described (Turner et al., Mol. Biotech. 3:225 (1995)).
[0096] The “3′ non-coding sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3′ end of the mRNA precursor. The use of different 3′ non-coding sequences is exemplified by Ingelbrecht et al., Plant Cell 1:671-680 (1989).
[0097] “RNA transcript” refers to the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript. Alternatively, it may be an RNA sequence derived from posttranscriptional processing of the primary transcript, in which case it is referred to as the mature RNA. “Messenger RNA” (mRNA) refers to the RNA that is without introns and that can be translated into protein by the cell. “cDNA” refers to a double-stranded DNA that is complementary to and derived from mRNA. “Sense” RNA refers to an RNA transcript that includes the mRNA and so can be translated into protein by the cell. “Antisense RNA” refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA and that blocks the expression of a target gene (U.S. Pat. No. 5,107,065). The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5′ non-coding sequence, 3′ non-coding sequence, introns, or the coding sequence. “Functional RNA” refers to antisense RNA, ribozyme RNA, or other RNA that is not translated, yet has an effect on cellular processes.
[0098] The term “operably-linked” refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is affected by the other. For example, a promoter is operably-linked with a coding sequence when it affects the expression of that coding sequence (i.e., that the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably-linked to regulatory sequences in sense or antisense orientation.
[0099] The term “expression” refers to the transcription and stable accumulation of sense (mRNA) or antisense RNA derived from the nucleic acid fragment of the invention. Expression may also refer to translation of mRNA into a polypeptide. “Antisense inhibition” refers to the production of antisense RNA transcripts capable of suppressing the expression of the target protein. “Overexpression” refers to the production of a gene product in transgenic organisms that exceeds levels of production in normal or non-transformed organisms. “Co-suppression” refers to the production of sense RNA transcripts capable of suppressing the expression of identical or substantially similar foreign or endogenous genes (U.S. Pat. No. 5,231,020).
[0100] “Altered levels” refers to the production of gene product(s) in organisms in amounts or proportions that differ from that of normal or non-transformed organisms.
[0101] “Transformation” refers to the transfer of a nucleic acid fragment into the genome of a host organism, resulting in genetically stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as “transgenic” organisms. Examples of methods of plant transformation include Agrobacterium-mediated transformation (De Blaere et al., Meth. Enzymol. 143:277 (1987)) and particle-accelerated or “gene gun” transformation technology (Klein et al., Nature, London 327:70-73 (1987); U.S. Pat. No. 4,945,050).
[0102] The terms “plasmid”, “vector” and “cassette” refer to an extra chromosomal element often carrying genes which are not part of the central metabolism of the cell, and usually in the form of circular double-stranded DNA molecules. Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear or circular, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3′ untranslated sequence into a cell. “Transformation cassette” refers to a specific vector containing a foreign gene and having elements in addition to the foreign gene that facilitate transformation of a particular host cell. “Expression cassette” refers to a specific vector containing a foreign gene and having elements in addition to the foreign gene that allow for enhanced expression of that gene in a foreign host.
[0103] Novel phenol-induced proteins, F1, F2, and F3, have been isolated. Comparison of their random cDNA sequences to the GenBank database using the BLAST algorithms, well known to those skilled in the art, revealed that F3 (orf1) and orf2 are proteins homologous to phosphoenolpyruvate synthase (PEP) of E. coli and are likely to represent the phenol phosphorylating enzyme E1 (FIG. 1). The nucleotide sequences of the F1, F2, and F3 genomic DNA are provided in SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6, and their deduced amino acid sequences are provided in SEQ ID NO:1, SEQ ID NO:3, and SEQ ID NO:5, respectively. F1, F2, and F3 genes from other bacteria can now be identified by comparison of random cDNA sequences to the F1, F2, and F3 sequences provided herein.
[0104] The nucleic acid fragments of the instant invention may be used to isolate cDNAs and genes encoding homologous F1, F2, and F3 phenol-induced proteins from the same or other plant or fungal species. Isolating homologous genes using sequence-dependent protocols is well known in the art. Examples of sequence-dependent protocols include, but are not limited to, methods of nucleic acid hybridization and methods of DNA and RNA amplification as exemplified by various uses of nucleic acid amplification technologies (e.g., polymerase chain reaction (PCR) or ligase chain reaction).
[0105] For example, other F1, F2, and F3 genes, either as cDNAs or genomic DNAs, could be isolated directly by using all or a portion of the instant nucleic acid fragments as DNA hybridization probes to screen libraries from any desired bacteria using methodology well known to those skilled in the art. Specific oligonucleotide probes based upon the instant F1, F2, and F3 sequences can be designed and synthesized by methods known in the art (Sambrook, supra). Moreover, entire sequences can be used directly to synthesize DNA probes by methods known to the skilled artisan such as random primers, DNA labeling, nick translation, or end-labeling techniques, or RNA probes using available in vitro transcription systems. In addition, specific primers can be designed and used to amplify a part of or full-length of the instant sequences. The resulting amplification products can be labeled directly during amplification reactions or labeled after amplification reactions, and used as probes to isolate full length cDNA or genomic fragments under conditions of appropriate stringency.
[0106] In addition, two short segments of the instant ORF's may be used in PCR protocols to amplify longer nucleic acid fragments encoding homologous F1, F2, F3, F4, and F5 genes from DNA or RNA. The polymerase chain reaction may also be performed on a library of cloned nucleic acid fragments wherein the sequence of one primer is derived from the instant nucleic acid fragments, and the sequence of the other primer takes advantage of the presence of the polyadenylic acid tracts to the 3′ end of the mRNA precursor encoding bacterial F1, F2, F3, F4, and F5. Alternatively, the second primer sequence may be based upon sequences derived from the cloning vector. For example, the skilled artisan can follow the RACE protocol (Frohman et al., Proc. Natl. Acad. Sci., USA 85:8998 (1988)) to generate cDNAs by using PCR to amplify copies of the region between a single point in the transcript and the 3′ or 5′ end. Primers oriented in the 3′ and 5′ directions can be designed from the instant sequences. Using commercially available 3′ RACE or 5′ RACE systems (BRL), specific 3′ or 5′ cDNA fragments can be isolated (Ohara et al., Proc. Natl. Acad. Sci., USA 86:5673 (1989); Loh et al., Science 243:217 (1989)). Products generated by the 3′ and 5′ RACE procedures can be combined to generate full-length cDNAs (Frohman et al., Techniques 1:165 (1989)).
[0107] Availability of the instant nucleotide and deduced amino acid sequences facilitates immunological screening of cDNA expression libraries. Synthetic peptides representing portions of the instant amino acid sequences may be synthesized. These peptides can be used to immunize animals to produce polyclonal or monoclonal antibodies with specificity for peptides or proteins comprising the amino acid sequences. These antibodies can then be used to screen cDNA expression libraries to isolate full-length cDNA clones of interest (Lerner et al., Adv. Immunol. 36:1 (1984); Sambrook, supra).
[0108] The enzymes and gene products of the instant ORF's may be produced in heterologous host cells, particularly in the cells of microbial hosts, and can be used to prepare antibodies to the resulting proteins by methods well known to those skilled in the art. The antibodies are useful for detecting the proteins in situ in cells or in vitro in cell extracts. Preferred heterologous host cells for production of the instant enzymes are microbial hosts and include those selected from the following: Comamonas sp., Corynebacterium sp., Brevibacterium sp., Rhodococcus sp., Azotobacter sp., Citrobacter sp., Enterobacter sp., Clostridium sp., Klebsiella sp., Salmonella sp., Lactobacillus sp., Aspergillus sp., Saccharomyces sp., Zygosaccharomyces sp., Pichia sp., Kluyveromyces sp., Candida sp., Hansenula sp., Dunaliella sp., Debaryomyces sp., Mucor sp., Torulopsis sp., Methylobacteria sp., Bacillus sp., Escherichia sp., Pseudomonas sp., Rhizobium sp., and Streptomyces sp. Microbial expression systems and expression vectors containing regulatory sequences that direct high level expression of foreign proteins are well known to those skilled in the art. Any of these could be used to construct chimeric genes for production of any of the gene products of the instant ORF's. These chimeric genes could then be introduced into appropriate microorganisms via transformation to provide high level expression of the enzymes.
[0109] Additionally, chimeric genes will be effective in altering the properties of the host bacteria. It is expected, for example, that introduction of chimeric genes encoding one or more of the ORF's 1-10 under the control of the appropriate promoters, into a host cell comprising at least one copy of these genes will demonstrate the ability to produce various intermediates in phenol metabolism. For example, the appropriately regulated ORF 1 and ORF 2 would be expected to express an enzyme capable of phosphorylating phenol (phenylphosphate synthase; FIG. 1). Similarly, ORF 4, ORF 6, ORF 7 and ORF 8 would be expected to express an enzyme capable of carboxylating phenylphosphate to afford 4-hydroxybenzoate (phenylphosphate carboxylase; FIG. 1). Finally, expression of SEQ ID NO:23 in a single recombinant organism will be expected to effect the conversion of phenol to 4-hydroxybenzoate in a transformed host (FIG. 1).
[0110] Vectors or cassettes useful for the transformation of suitable host cells are well known in the art. Typically the vector or cassette contains sequences directing transcription and translation of the relevant gene, a selectable marker, and sequences allowing autonomous replication or chromosomal integration. Suitable vectors comprise a region 5′ of the gene which harbors transcriptional initiation controls and a region 3′ of the DNA fragment which controls transcriptional termination. It is most preferred when both control regions are derived from genes homologous to the transformed host cell, although it is to be understood that such control regions need not be derived from the genes native to the specific species chosen as a production host.
[0111] Initiation control regions or promoters, which are useful to drive expression of the instant ORF's in the desired host cell, are numerous and familiar to those skilled in the art. A promoter capable of driving these genes is suitable for the present invention including but not limited to CYC1, HIS3, GAL1, GAL10, ADH1, PGK, PHO5, GAPDH, ADC1, TRP1, URA3, LEU2, ENO, TPI (useful for expression in Saccharomyces); AOX1 (useful for expression in Pichia); and lac, trp, λPL, λPR, T7, tac, and trc (useful for expression in Escherichia coli). Useful strong promoters may also be used from Corynebacterium, Comamonas, Pseudomonas, and Rhodococcus.
[0112] Termination control regions may also be derived from various genes native to the preferred hosts. A termination site may be unnecessary; however, it is most preferred if included.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0113] In the denitrifying bacterium Thauera aromatica phenol carboxylation proceeds in two steps and involves formation of phenylphosphate as the first intermediate (Equation 1). Cells grown with phenol were simultaneously adapted to growth with 4-hydroxybenzoate, whereas, vice-versa, 4-hydroxybenzoate-grown cells did not metabolize phenol. Induction of the capacity to metabolize phenol required several hours.
[0114] An enzyme activity catalyzing an isotope exchange of the phenyl moiety of phenylphosphate with free 14C-phenol was identified in extracts of phenol-grown cells (Equation 2), and was lacking in 4-hydroxybenzoate-grown cells. Free 32P-phosphate did not exchange with phenylphosphate. This suggests a phosphorylated enzyme E1 (Equations 3 and 4) which becomes phosphorylated in an essentially irreversible step (Equation 5). The phosphorylated enzyme transforms phenol to phenylphosphate in a reversible reaction (Equation 6). The whole reaction is understood as the sum of Equation 5 and Equation 6. The phosphoryl donor X~P is unknown so far. The enzyme E1 is termed phenol kinase.
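The two-step mechanism proposed for E1 can be written out as a reaction scheme. The following is a sketch inferred only from the wording of this paragraph, not a reproduction of the original equations; the notation E1-P for the phosphorylated enzyme and X for the dephosphorylated donor are assumptions made for illustration.

```latex
\begin{align*}
\mathrm{E_1} + \mathrm{X{\sim}P} &\longrightarrow \mathrm{E_1{-}P} + \mathrm{X}
  && \text{(Equation 5, essentially irreversible)}\\
\mathrm{E_1{-}P} + \text{phenol} &\rightleftharpoons \mathrm{E_1} + \text{phenylphosphate}
  && \text{(Equation 6, reversible)}\\
\text{phenol} + \mathrm{X{\sim}P} &\longrightarrow \text{phenylphosphate} + \mathrm{X}
  && \text{(sum of Equations 5 and 6)}
\end{align*}
```

Written this way, the reversible second step accounts for the observed exchange of free 14C-phenol into phenylphosphate, while the irreversible first step is consistent with the absence of 32P-phosphate exchange.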
[0115] Phenylphosphate is the substrate of a second enzyme E2, phenylphosphate carboxylase. It requires K+ and Mn2+ and catalyzes the carboxylation of phenylphosphate to 4-hydroxybenzoate (Equation 7). An enzyme activity catalyzing an isotope exchange between the carboxyl of 4-hydroxybenzoate and free 14CO2 (Equation 8) was present in phenol-grown cells. Free 14C-phenol did not exchange. This suggests an enzyme E2-phenolate intermediate (Equations 9 and 10) which is formed in a presumably exergonic reaction (Equation 11) followed by the reversible carboxylation (Equation 12). The actual substrate is CO2 rather than bicarbonate, and the carboxylating enzyme was not inhibited by avidin; both results suggest that biotin is not involved in carboxylation. The enzyme E2 is termed phenylphosphate carboxylase.
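The mechanism proposed for E2 can be sketched in the same way. The scheme below is inferred from the wording of this paragraph; the release of inorganic phosphate (Pi) in the first step is an assumption made for mass balance, and proton balance is omitted.

```latex
\begin{align*}
\mathrm{E_2} + \text{phenylphosphate} &\longrightarrow \mathrm{E_2{-}phenolate} + \mathrm{P_i}
  && \text{(Equation 11, presumably exergonic)}\\
\mathrm{E_2{-}phenolate} + \mathrm{CO_2} &\rightleftharpoons \mathrm{E_2} + \text{4-hydroxybenzoate}
  && \text{(Equation 12, reversible)}\\
\text{phenylphosphate} + \mathrm{CO_2} &\longrightarrow \text{4-hydroxybenzoate} + \mathrm{P_i}
  && \text{(Equation 7, net reaction)}
\end{align*}
```

The reversible second step is what would allow free 14CO2 to exchange into the carboxyl group of 4-hydroxybenzoate, whereas free 14C-phenol, which is not an intermediate, would not exchange.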
EXAMPLES
[0116] The present invention is further defined in the following Examples, in which all parts and percentages are by weight and degrees are Celsius, unless otherwise stated. It should be understood that these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various usages and conditions.
[0117] Standard recombinant DNA and molecular cloning techniques used here are well known in the art and are described by Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1989 (hereinafter “Sambrook”); by T. J. Silhavy, M. L. Berman, and L. W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1984); and by Ausubel et al., Current Protocols in Molecular Biology, pub. by Greene Publishing Assoc. and Wiley-Interscience (1987).
[0118] Manipulations of genetic sequences were accomplished using the suite of programs available from the Genetics Computer Group Inc. (Wisconsin Package Version 9.0, Genetics Computer Group (GCG), Madison, Wis.) and PC/Gene©: the nucleic acid and protein sequence analysis software system, A. Bairoch, University of Geneva, Switzerland, Intelligenetics™ Inc. Serial Number IGI2626/Version 6.70; programs used were as follows: REFORM—sequence file conversion program, Version 4.3, February 1991; RESTRI—restriction site analysis; NMANIP—simple nucleic acid sequence manipulations (inverse and complement the sequence); HAIRPIN—search for hairpin loops in a nucleotide sequence; default parameters: minimum stem size: 5, lower range of number of unpaired bases: 3, upper range of number of unpaired bases: 20, allowed basepairs: G-C, A-T (A-U).
Example 1 Strains and Culture Conditions
[0119] In the denitrifying bacterium Thauera aromatica phenol carboxylation proceeds in two steps and involves formation of phenylphosphate as the first intermediate (FIG. 1). Cells grown with phenol were simultaneously adapted to growth with 4-hydroxybenzoate, whereas, vice-versa, 4-hydroxybenzoate-grown cells did not metabolize phenol. Induction of the capacity to metabolize phenol required several hours. The enzyme system not only acts on 4-hydroxybenzoate/phenol (100%), but also on protocatechuate/catechol (30%), o-cresol (30%), 2-chlorophenol (75%) and 2,6-dichlorophenol (30%). The enzyme specifically catalyzes a para-carboxylation, and anaerobic growth of the organism on phenolic compounds and nitrate requires CO2.
[0120] Both the phosphorylating and the carboxylating enzymes (E1 and E2, respectively) are strictly regulated. All activities were present only after anoxic growth of cells on phenol, and were lacking after growth on 4-hydroxybenzoate. Further metabolism of 4-hydroxybenzoate proceeds via benzoyl-CoA in two steps, as shown in FIG. 1.
[0121] Thauera aromatica (K 172) was cultured anaerobically at 30° C. in a mineral salt medium (1.08 g/L KH2PO4, 5.6 g/L K2HPO4, 0.54 g/L NH4Cl) supplemented with 0.1 mM CaCl2, 0.8 mM MgSO4, 1 mL/L vitamin solution (cyanocobalamin 100 mg/L, pyridoxamine·2HCl 300 mg/L, Ca-D(+)-pantothenate 100 mg/L, thiamine dichloride 200 mg/L, nicotinate 200 mg/L, 4-aminobenzoate 80 mg/L, D(+)-biotin 20 mg/L) and 1 mL/L of a solution of trace elements (25% HCl 10 mL/L, FeCl2·4H2O 1.5 g/L, ZnCl2 70 mg/L, MnCl2·4H2O 100 mg/L, CoCl2·6H2O 100 mg/L, CuCl2·2H2O 2 mg/L, NiCl2·6H2O 24 mg/L, Na2MoO4·2H2O 36 mg/L, H3BO3 6 mg/L). 0.5 mM phenol and 10 mM NaHCO3 were added as sole source of carbon and energy, as well as 2 mM NaNO3 as the terminal electron acceptor. Note: All media, supplements and substrates were strictly anaerobic.
[0122] Escherichia coli strains XL1-Blue [(F′, proAB, lacIqZΔM15, Tn10, tetR), gyrA96, hsdR17, recA1, relA1, thi-1, Δ(lac), Lambda-], K38 [hfrC, ompF267, phoA4, pit-10, relA1] and P2392 [hsdR514, supE44, supF58, lacY1, galK2, galT22, metB1, trpR55, mcrA, P2 lysogen] were cultured in Luria-Bertani medium at 37° C. (Sambrook). Antibiotics were added to E. coli cultures to the following final concentrations: kanamycin 50 µg/mL, ampicillin 50 µg/mL and tetracycline 20 µg/mL.
Example 2 4-Hydroxybenzoate:14CO2-Isotope Exchange
[0123] The assay conditions were as follows: 20 mM imidazole/HCl (pH 6.5), 20 mM KCl, 0.5 mM MnCl2, 2 mM 4-hydroxybenzoate, 50 µmol CO2 (50 µL 1 M NaHCO3 per 1 mL assay), 25 µL soluble fraction (see Example 4) per 1 mL assay. The reaction was started by addition of 10 µL 14C-Na2CO3 (7 kBq; specific radioactivity 80 nCi/mmol). After 5 min incubation at 30° C. the reaction was stopped by the addition of 30 µL 3 M perchloric acid per 250 µL sample. The precipitated proteins were centrifuged down and the supernatant was acidified with 150 µL 10 M formic acid. The mixture was incubated under steady flow of CO2 (10 mL/min) to remove all the 14CO2 which was not fixed in the reaction. After 15 min, 150 µL 1 M KHCO3 was added and the mixture was incubated for another 15 min under steady flow of CO2 (10 mL/min). The amount of non-volatile labeled product formed (4-hydroxybenzoate:14CO2) was analyzed by liquid scintillation counting.
[0124] Measurement of the 4-hydroxybenzoate:14CO2-isotope exchange in the soluble fraction of cells grown on phenol and 4-hydroxybenzoate, respectively, was performed in the assay described below:

Component                       Volume
50 mM MnCl2                     10 µL
2 M KCl                         10 µL
1 M NaHCO3                      50 µL
0.2 M 4-hydroxybenzoate         10 µL
20 mM imidazole/HCl, pH 6.5     895 µL
soluble fraction                25 µL
14C-Na2CO3                      10 µL (≅3923 Bq)
[0125] Following incubation for 4 min at 30° C., 3.0 mL of scintillation cocktail was added to a 200 µL sample treated as described above, and the amount of 14C was counted in a liquid scintillation counter for 5 min. The output of the scintillation counter was:

sample                           cpmA   cpmB   scr**   dpmA   dpmB   % A*   % B*
Phenol-grown cells                276   1659   0.168      0   1900    .00   87.32
4-Hydroxybenzoate-grown cells       6     20   0.318      0     25    .00   79.44
No cell extract (control)           5     11   0.386      0     15    .00   75.97

*A and B stand for the two windows in which the counting takes place and are preset for 14C. The results are reliable when % B is about 75% or higher.
**scr stands for Sample Channels Ratio method and it relates to the efficiency and reliability of the measurements (a scr value of about 0.1-0.25 is optimal).
[0126] Calculation of the activity (nmol min−1 mg−1): total incorporation of 14CO2 would result in a value of 235380 dpm (disintegrations per minute, 60×3923 Bq) per 50 µmol NaHCO3 in a 1 mL assay. 1900 dpm (see table, dpmB) corresponds to 32 Bq, which means 382 nmol per 4 min per 200 µL sample. A 200 µL sample contains about 5 µL soluble fraction. The protein concentration of the soluble fraction of phenol-grown cells is about 62 mg/mL. Therefore, a 200 µL sample corresponds to 310 µg of soluble-fraction protein. The specific activity was determined to be 308 nmol/min/mg protein.
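The arithmetic of this calculation can be retraced with a short script. This is a minimal sketch, not the procedure actually used; the function names are hypothetical. The last step takes the stated value of 382 nmol as given, since a direct proportion from 1900 dpm (also shown) gives a slightly higher figure of roughly 404 nmol.

```python
def bq_to_dpm(becquerel):
    """Convert becquerel (disintegrations per second) to dpm."""
    return becquerel * 60


def incorporated_nmol(sample_dpm, total_dpm, total_co2_umol):
    """Scale the CO2 pool by the fraction of label found in the product."""
    return sample_dpm / total_dpm * total_co2_umol * 1000  # umol -> nmol


def specific_activity(product_nmol, minutes, protein_mg):
    """Specific activity in nmol per minute per mg protein."""
    return product_nmol / minutes / protein_mg


if __name__ == "__main__":
    total_dpm = bq_to_dpm(3923)   # label added per 1 mL assay
    protein_mg = 62 * 0.005       # 62 mg/mL x 5 uL soluble fraction

    print("total label (dpm):", total_dpm)
    print("direct proportion (nmol):",
          round(incorporated_nmol(1900, total_dpm, 50)))
    # Stated intermediate value of 382 nmol, 4 min incubation.
    print("specific activity (nmol/min/mg):",
          round(specific_activity(382, 4, protein_mg)))
```

With the stated inputs the final line reproduces the reported specific activity of 308 nmol/min/mg protein.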
Example 3 Carboxylation of Phenylphosphate
[0127] Phenylphosphate is the substrate of the second enzyme E2, phenylphosphate carboxylase. It requires K+ and Mn2+ and catalyzes the carboxylation of phenylphosphate to 4-hydroxybenzoate. The assay conditions were as follows: 20 mM imidazole/HCl (pH 6.5), 20 mM KCl, 0.5 mM MnCl2, 2 mM phenylphosphate, 25 µmol CO2 (25 µL 1 M NaHCO3 per 1 mL assay), 25 µL soluble fraction (see Example 4) per 1 mL assay. The reaction was started by addition of 205 µL 14C-Na2CO3 (14 kBq; specific radioactivity 250 nCi/mmol). After 5 min incubation at 30° C. the reaction was stopped by the addition of 30 µL 3 M perchloric acid per 250 µL sample. The precipitated proteins were centrifuged down and the supernatant was acidified with 150 µL 10 M formic acid. The mixture was incubated under steady flow of CO2 (10 mL/min) to remove all the 14CO2 which was not fixed in the reaction. After 15 min, 150 µL of 1.0 M KHCO3 was added and the mixture was incubated for another 15 min under steady flow of CO2 (10 mL/min). The amount of non-volatile labeled product formed was analyzed by liquid scintillation counting.
[0128] See the description in Example 2, with the difference that 0.2 M phenylphosphate was used instead of 4-hydroxybenzoate and 25 µL of 1 M NaHCO3 instead of 50 µL. The output of the scintillation counter was:

sample              cpmA   cpmB   scr**   dpmA   dpmB   % A*   % B*
phenol                21    114   0.199      0    134    .00   85.65
4-hydroxybenzoate      7     19   0.360      0     24    .00   77.28
no extract             5     11   0.386      0     15    .00   75.97

*A and B stand for the two windows in which the counting takes place and are preset for 14C. The results are reliable when % B is about 75% or higher.
**scr stands for Sample Channels Ratio method and it relates to the efficiency and reliability of the measurements (a scr value of about 0.1-0.25 is optimal).
[0129] The carboxylase activity was calculated as described in Example 2, taking into account the fact that 3923 Bq (235380 dpm) ≅ 25 µmol incorporated 14CO2 per 1 mL assay. The specific activity was determined to be 10 nmol/min/mg.
Example 4 Partial Purification and Amino Acid Sequencing of Three Dominant Phenol-Induced Proteins F1, F2, and F3
[0130] Thauera aromatica (K 172) was cultured anaerobically at 30° C. with 0.5 mM phenol and 10 mM NaHCO3 as sole source of carbon and energy, as well as 2 mM NaNO3 as the terminal electron acceptor. The bacterial cells were harvested and 20 g of the bacterial cells were resuspended in 20 mL of 20 mM imidazole/HCl (pH 6.5), 10% glycerol, 0.5 mM dithionite and traces of DNase I, disrupted (French Press, 137.6 MPa) and ultracentrifuged (100 000× g). The supernatant with the soluble protein fraction contained all the 4-hydroxybenzoate:14CO2-exchange activity (383 nmol min−1 mg−1) and phenylphosphate carboxylase activity (10 nmol min−1 mg−1). The supernatant was loaded on a DEAE Sepharose fast flow chromatography column (Amersham Pharmacia Biotech, Uppsala, Sweden). FIG. 2 shows the results of SDS-PAGE (12.5%) with fractions after chromatography of the soluble fraction of K172 (grown anaerobically on phenol). A total amount of 20 µg protein was loaded per lane. Lane 1: K172 grown on 4-hydroxybenzoate/NO3− (100 000× g supernatant); Lane 2: K172 grown on phenol/NO3− (100 000× g supernatant). These lanes show that three dominant phenol-induced proteins, F1, F2, and F3, were separated. F1, F2, and F3 were identified by molecular weight: F1≈60 kDa, F2≈58 kDa, F3≈67 kDa. Lane 3: pooled fractions containing F1; Lane 4: pooled fractions containing F2; Lanes 5-7: fractions 17-19; Lanes 8-10: fractions 53-55; Lane 11: proteins that did not bind to DEAE; and Lane 12: fraction 84 containing F3.
[0131] The fractions containing F1 after chromatography on DEAE Sepharose were pooled and loaded on a MonoQ chromatography column (Amersham Pharmacia Biotech, Uppsala, Sweden). Then the fractions containing F1 were pooled and blotted to an Immobilon-PSQ transfer membrane (Millipore, Bedford, Mass.). After staining of the PVDF membrane with Coomassie Blue, the F1 band was excised and sequenced using an Applied Biosystems 473A sequencer (Table 1).
[0132] The fractions containing F2 were subjected to peptide and N-terminal sequencing. For peptide sequencing, the fractions containing F2 after chromatography on DEAE Sepharose were pooled and loaded on a Blue Sepharose chromatography column (Amersham Pharmacia Biotech, Uppsala, Sweden). Then the fractions containing F2 were pooled and digested with modified trypsin (Promega, Mannheim, Germany). The trypsin digest was done according to the following procedure: 500 µg protein in 200 µL of 20 mM Tris/HCl, pH 7.5, was adjusted to pH 8 with 3 µL of triethylamine. 10 µg trypsin in 10 µL H2O (Promega sequencing grade modified, catalog #V5111) was added. The digest was carried out at 37° C. for 4 h. The reaction was stopped by heating for 5 min to 100° C. After centrifugation, 5 µL, 70 µL and 100 µL, respectively, were applied to the HPLC. The peptides generated were separated on a reverse phase C-18 Superpac-Sephasil high performance liquid chromatography column (Amersham Pharmacia Biotech, Uppsala, Sweden). Fractions containing well resolved peptides were sequenced (Table 2).
[0133] For N-terminal sequencing, the pooled fractions containing F2 after chromatography on DEAE Sepharose were loaded on a MonoQ chromatography column (Amersham Pharmacia Biotech, Uppsala, Sweden). Then the fractions containing F2 were pooled and blotted to an Immobilon-PSQ transfer membrane (Millipore, Bedford, Mass.). After staining of the PVDF membrane with Coomassie Blue, the F2 band was excised and sequenced using an Applied Biosystems 473A sequencer (Table 1).
[0134] After chromatography on DEAE Sepharose the pooled fractions containing F3 were loaded on a MonoQ chromatography column (Amersham Pharmacia Biotech, Uppsala, Sweden). The fractions containing F3 were pooled and blotted to an Immobilon-PSQ transfer membrane (Millipore, Bedford, Mass.). After staining of the PVDF membrane with Coomassie Blue, the F3 band was excised and sequenced using an Applied Biosystems 473A sequencer (Table 1).

TABLE 1
N-Terminal Amino Acid Sequences

F1
  Sequenced (Applied Biosystems 473A Sequencer)*: gKISA PKNNR EFIEA sVKSG DAVRI RQEVD WDNEA GAIVr PA (SEQ ID NO: 24)
  Deduced from the gene: MGKIIS APKNN REFIE ACVKS GDAVR I (SEQ ID NO: 25)
F2
  Sequenced (Applied Biosystems 473A Sequencer)*: MDLRY FINQX AEAHE LKRIT TEVDW NLEIS HVsKL XXe (SEQ ID NO: 26)
  Deduced from the gene: MDLRY FINQC ABAHE LKRIT TEVDW NLEIS HVSKL TEE (SEQ ID NO: 27)
F3
  Sequenced (Applied Biosystems 473A Sequencer)*: MKFPV PHDIQ AKTIP GTEGw ERMYP XXXAF VXd (SEQ ID NO: 28)
  Deduced from the gene: MKFPV PHDIQ AKTIP GTEGW ERMYP YHYQF VTD (SEQ ID NO: 29)

*The lower cases stand for amino acids that could not be clearly identified during sequencing.
[0135]
TABLE 2
Internal Fragments of F2 by Trypsin Digest: Amino Acid Sequence

.FHEGG gg.
.MQMLD DK. (SEQ ID NO: 30)
.QVADA VIASN TGSYg M.
.FWSVV DER.
.IXTEV DWNLE ISXV.
.TATLW TELEQ MR.
.YIGTM VSVVL YDPET GR.
.GQQAE FLMAX XXXXP VXAGA EIVLE XGI. (SEQ ID NO: 31)
.GQQAE FLM..
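Trypsin cleaves on the carboxyl side of lysine (K) and arginine (R), generally not when the next residue is proline, which is why several peptides in Table 2 end in K or R. A minimal in silico sketch of that rule follows, applied to the deduced F2 N-terminus copied as printed in Table 1 (SEQ ID NO: 27); the function name `tryptic_digest` is hypothetical and the sketch ignores missed cleavages.

```python
def tryptic_digest(protein):
    """Split a protein after each K or R unless the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


if __name__ == "__main__":
    # Deduced N-terminus of F2, spaces removed, letters as printed in Table 1.
    f2_n_terminus = "MDLRYFINQCABAHELKRITTEVDWNLEISHVSKLTEE"
    for peptide in tryptic_digest(f2_n_terminus):
        print(peptide)
```

One of the predicted peptides, ITTEVDWNLEISHVSK, overlaps the sequenced internal fragment .IXTEV DWNLE ISXV. of Table 2, which is consistent with that fragment deriving from the N-terminal region of F2.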
Example 5 Preparation of DNA Probe for Screening a λEMBL3 Gene Library of Thauera aromatica
[0136] On the basis of the N-terminal amino acid sequences of F1, F2, and F3 and of the internal fragments of F2 (Example 4), degenerate oligonucleotides were designed. The oligonucleotides F2-forward (N-terminus) (SEQ ID NO:32; ATG-GATC-CTGC-CGCG-TAC-TTC-ATC), F2T6-reverse (SEQ ID NO:33; TT-GATC-GATC-GCAG-CAT-CTG-CAT) and F2T43-reverse (SEQ ID NO:34; CAT-CGAG-GAA-TCTC-GCGC-CTG-CTG) (both internal fragments) were used as primers in a polymerase chain reaction (PCR) with genomic DNA of Thauera aromatica as target. PCR conditions were as follows: 100 ng target, 200 nM each primer, 200 µM each of dATP, dCTP, dTTP, dGTP, 50 mM KCl, 1.5 mM MgCl2, 10 mM Tris/HCl (pH 9.0), 1 unit Taq-DNA-Polymerase (Amersham Pharmacia Biotech, Uppsala, Sweden). PCR parameters were as follows: 95° C. 30 sec, 40° C. 1 min, 72° C. 2.5 min, 30 cycles. The PCR products were subjected to ethidium bromide agarose gel electrophoresis followed by excision and purification.
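For planning purposes, the cycling parameters given above can be written down as data and the total hold time computed. This is a minimal sketch under the stated parameters only; it ignores ramp times between temperatures and any initial denaturation or final extension, none of which are specified here, and the variable names are hypothetical.

```python
# (step label, temperature in deg C, hold time in seconds), as stated above.
CYCLE_STEPS = [
    ("denaturation", 95, 30),
    ("annealing", 40, 60),
    ("extension", 72, 150),
]
NUMBER_OF_CYCLES = 30


def total_hold_minutes(steps, cycles):
    """Sum of programmed hold times over all cycles, in minutes."""
    seconds_per_cycle = sum(seconds for _, _, seconds in steps)
    return cycles * seconds_per_cycle / 60


if __name__ == "__main__":
    for label, temperature, seconds in CYCLE_STEPS:
        print(f"{label}: {temperature} C for {seconds} s")
    print("total hold time (min):",
          total_hold_minutes(CYCLE_STEPS, NUMBER_OF_CYCLES))
```

The low annealing temperature of 40° C. is typical of reactions run with degenerate primers, where some mismatch must be tolerated.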
[0137] The purified PCR product (F2-forward/F2T43-reverse) of approximately 750 bp was sequenced and confirmed to encode the N-terminus of F2. The PCR product was labeled with [32P]-dCTP and used as a probe for screening a λEMBL3 gene library of Thauera aromatica. One positive phage of about 11 kb was detected, prepared and restricted with BamHI, EcoRI and PstI. The digests were subjected to ethidium bromide agarose gel electrophoresis followed by excision and purification of the restriction fragments. The purified fragments were ligated into the pBluescript KS(+) vector [Apr, lacZ, f1, ori] restricted with BamHI, EcoRI and PstI, respectively. The ligation mix was used to transform competent E. coli XL1-Blue, which was plated onto LB plates supplemented with IPTG, X-Gal and 50 μg/mL ampicillin. Plasmid DNA was prepared from several white colonies (clones 8, 9, and 19; FIGS. 3, 4, and 5, respectively) and sequenced by the dideoxy termination protocol using the T7 and T3 primers (SEQ ID NO:35; 3′ CGGGATATCACTCAGCATAATG 5′ and SEQ ID NO:36; 5′ AATTAACCCTCACTAAAGGG 3′, respectively). Nucleotide sequence analysis confirmed that the amino acid sequences deduced from the genes corresponded to the N termini of F1, F2, and F3.
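Note that the T7 primer above is written 3′ to 5′, whereas the T3 primer is written 5′ to 3′. The following is a small sketch for putting both in the same orientation; the helper names are illustrative assumptions and are not taken from the source. Reversing a 3′ to 5′ listing gives the same strand in the conventional 5′ to 3′ order, while a reverse complement gives the opposite strand.

```python
# Watson-Crick complement table for unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_five_prime_first(seq_3_to_5: str) -> str:
    """Rewrite a sequence listed 3'->5' in 5'->3' order (same strand)."""
    return seq_3_to_5[::-1]


def reverse_complement(seq_5_to_3: str) -> str:
    """Return the opposite strand, also written 5'->3'."""
    return seq_5_to_3.translate(_COMPLEMENT)[::-1]


# T7 primer as listed in the text (3'->5') and T3 primer (5'->3')
T7_AS_LISTED = "CGGGATATCACTCAGCATAATG"
T3_PRIMER = "AATTAACCCTCACTAAAGGG"

print("T7 primer, 5'->3':", to_five_prime_first(T7_AS_LISTED))
print("T3 primer, opposite strand 5'->3':", reverse_complement(T3_PRIMER))
```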
Example 6 Screening of the λEMBL3 Gene Library of Thauera aromatica for DNA Sequences 5′ of the Known Sequences
[0138] The oligonucleotide designated breib31 (SEQ ID NO:37; 5′ GACAACTTCGTCGTCAA 3′) and the oligonucleotide designated breib07r3 (SEQ ID NO:38; 5′ GTGGATATTGGCTTCGGAAA 3′) were used as primers in a PCR with genomic DNA of Thauera aromatica as target. PCR conditions were as described in Example 5. The PCR product was subjected to ethidium bromide agarose gel electrophoresis followed by excision and purification. The purified PCR product of approximately 500 bp was labeled with [32P]-dCTP and used as a probe for screening a λEMBL3 gene library of Thauera aromatica. Two positive phages were detected. The phage DNA was prepared and restricted with BamHI, EcoRI and PstI. The digests were subjected to ethidium bromide agarose gel electrophoresis followed by excision and purification of the restriction fragments. The purified fragments were ligated into the pBluescript KS(+) vector [Apr, lacZ, f1, ori] restricted with BamHI, EcoRI and PstI, respectively. The ligation mix was used to transform competent E. coli XL1-Blue, which was plated onto LB plates supplemented with IPTG, X-Gal and 50 μg/mL ampicillin. Plasmid DNA was prepared from several white colonies (clone 2 with a 9 kb BamHI insert and clone 7 with a 3.7 kb PstI insert, as described in FIGS. 6 and 7) and sequenced by the dideoxy termination protocol using the T3 primer (SEQ ID NO:36). DNA sequences upstream of the known sequences were revealed by DNA analysis (FIG. 12).
Example 7 Screening of the λZAP Express Gene Library of Thauera aromatica for DNA Sequences 3′ of the Known Sequences
[0139] The oligonucleotide designated λ15-forward (SEQ ID NO:39; 5′ TCGCCGGCGACGACGCCG 3′) and the oligonucleotide designated λ15-reverse (SEQ ID NO:40; 5′ CCGCGCGCTGCGCCGCCG 3′) were used as primers in a PCR with genomic DNA of Thauera aromatica as target. PCR conditions were as follows: 100 ng target, 200 nM each primer, 200 μM each of dATP, dCTP, dTTP, dGTP, (NH4)2SO4, KCl, 4.5 mM MgCl2, 10 mM Tris/HCl (pH 8.7), 1× Q solution, 1 unit Taq DNA polymerase (Qiagen, Hilden, Germany). PCR parameters were as follows: 95° C. for 30 sec, 45° C. for 1 min, 72° C. for 2.5 min, 30 cycles. The PCR product was subjected to ethidium bromide agarose gel electrophoresis followed by excision and purification. The purified PCR product of approximately 600 bp was labeled with [32P]-dCTP and used as a probe for screening a λZAP Express gene library (Stratagene, Heidelberg, Germany) of Thauera aromatica. One positive clone was detected. The phagemid was prepared according to the manufacturer's protocol and restricted with SalI/EcoRI. After ethidium bromide agarose gel electrophoresis of the digest, the DNA insert was estimated to be 9 kb in size (clone 1; FIG. 8). The restricted DNA was blotted and hybridized with the [32P]-labeled probe described above. A fragment of approximately 1 kb was detected. DNA sequences downstream of the known sequences were revealed by DNA analysis (FIG. 12).
Example 8 DNA Sequencing of the Genes Coding for Putative Proteins Involved in Phenol Metabolism
[0140] A 3.7-kb PstI fragment, a 2.7-kb BamHI fragment, a 4.0-kb BamHI fragment, a 5.25-kb EcoRI fragment and a 9-kb BamHI fragment were each ligated into the pBluescript KS(+) [Apr, lacZ, f1, ori] vector restricted with the corresponding enzyme (PstI, BamHI or EcoRI) (FIGS. 7, 3, 5, 4, and 6, respectively). The plasmids were transformed into competent E. coli XL1-Blue. Plasmid DNA purified by the alkaline lysis method was sequenced by the dideoxy termination protocol using the T7 and T3 primers (SEQ ID NO:35 and SEQ ID NO:36, respectively) and then by primer walking. About 14 kb (SEQ ID NO:23) were sequenced, which contained two gene clusters that appear to be involved in phenol metabolism.
[0141] The nucleotide sequences of F1, F2, and F3 are provided in SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6, respectively, and their deduced amino acid sequences are provided in SEQ ID NO:1, SEQ ID NO:3, and SEQ ID NO:5, respectively. Nucleotide and amino acid sequences were analyzed using the PC/gene software package (Genofit). Homologous sequences were identified by BLAST (Basic Local Alignment Search Tool; Altschul et al., J. Mol. Biol. 215:403-410 (1990)) searches using the TBLASTN algorithm provided by the National Center for Biotechnology Information (Table 4 and FIG. 13).
[0142] F3 shows homology to phosphoenolpyruvate (PEP) synthase. The reaction catalyzed by this enzyme is shown in FIG. 11. First, PEP synthase is phosphorylated by ATP, with AMP and Pi as the products. In a second step, the phosphorylated enzyme transfers the β-phosphoryl group of ATP to pyruvate. This reaction may be similar to the proposed reaction mechanism of the phenol kinase, whereby phenol ultimately becomes phosphorylated.
[0143] F1, F2, and F5 show good homology to ubiD, a gene which codes for 3-octaprenyl-4-hydroxybenzoate decarboxylase. This enzyme is involved in the biosynthesis of ubiquinone. The reaction catalyzed is shown in FIG. 11. This reaction is analogous to the reverse reaction of the postulated carboxylation of phenol.
Example 9 Expression of F1-F5 Proteins in E. coli
[0144] A 3.7-kb PstI fragment contains: orf1 (SEQ ID NO:6), which codes for F3 protein (SEQ ID NO:5), and orf2 (SEQ ID NO:12), which codes for an unknown protein (SEQ ID NO:11). A 2.7-kb BamHI fragment contains: orf3 (SEQ ID NO:14), which codes for an unknown protein (SEQ ID NO:13), and orf4 (SEQ ID NO:4), which codes for F2 protein (SEQ ID NO:3). A 4.0-kb BamHI fragment contains: orf5 (SEQ ID NO:8), which codes for F4 protein (SEQ ID NO:7), orf6 (SEQ ID NO:2), which codes for F1 protein (SEQ ID NO:1), and orf7 (SEQ ID NO:16), which codes for an unknown protein (SEQ ID NO:15). A 5.25-kb EcoRI fragment contains: orf7 (SEQ ID NO:16), which codes for an unknown protein (SEQ ID NO:15), orf8 (SEQ ID NO:10), which codes for F5 protein (SEQ ID NO:9), orf9 (SEQ ID NO:18), which codes for an unknown protein (SEQ ID NO:17), and orf10 (SEQ ID NO:20), which codes for an unknown protein (SEQ ID NO:19). Each restriction fragment was ligated into pBluescript SK.
[0145] For expression of the genes, the recombinant plasmids were transformed into E. coli K38 containing the plasmid pGP1-2 [kanr, cI857 T7Gen1(RNA Polymerase)] (Tabor and Richardson, 1985). Cells were grown in 1 mL Luria-Bertani medium plus ampicillin and kanamycin at 30° C. to an absorbance of 0.5 at 600 nm, washed in Werkman minimal medium (Fraenkel and Neidhardt, 1961) and resuspended in 5 mL Werkman minimal medium containing 0.01% (mass/volume) amino acids except cysteine and methionine. After incubation for 1-2 h at 30° C. the temperature was shifted to 42° C. to induce expression of T7 polymerase. After 15 min, E. coli RNA synthesis was stopped by addition of 200 μg rifampicin/mL. The cells were incubated for 10 min at 42° C. and for a further 20 min at 30° C. to ensure degradation of E. coli mRNA. Aliquots of 1 mL of the induced culture were subsequently pulse-labeled with 10 μCi [35S]methionine (Amersham) for 5 min at 30° C. Cells were centrifuged, resuspended in 120 μL sample buffer and lysed by 5 min incubation at 95° C. Labeled proteins were separated by sodium dodecyl sulfate gel electrophoresis and localized by autoradiography. FIG. 9 shows the experimentally determined molecular masses of the proteins (expression of F1-F5 in E. coli; T7 experiment). 25 μL were loaded on each lane. Lanes 1, 4, 7: marker proteins; Lane 2: proteins (F3 and unknown) coded by the 3.7 kb PstI fragment containing orf1 and orf2, respectively; Lane 3: proteins (unknown and F2) coded by the 2.7 kb BamHI fragment containing orf3 and orf4, respectively; Lane 5: proteins (F5 and 3 unknowns) coded by the 5.25 kb EcoRI fragment containing orf8, orf7, orf9 and orf10, respectively; and Lane 6: proteins (F1, F4 and unknown) coded by the 4.0 kb BamHI fragment containing orf6, orf5 and orf7, respectively. The predicted molecular masses agreed reasonably well with the experimentally determined molecular masses of FIG. 9.
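A rough predicted mass for comparison with an SDS gel can be estimated from the length of each ORF. The following is a minimal sketch under a stated assumption: it uses the common rule of thumb of about 110 Da per amino acid residue, which is not a value given in this document, so the results are only approximate. The ORF lengths are those listed in Table 4.

```python
# Rule-of-thumb average residue mass; an assumption, not a value from the source
AVERAGE_RESIDUE_MASS_DA = 110.0

# ORF lengths in amino acids as listed in Table 4
ORF_LENGTHS_AA = {"F1": 485, "F2": 472, "F3": 612, "F4": 169, "F5": 194}


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


for name, length in sorted(ORF_LENGTHS_AA.items()):
    print(f"{name}: {length} aa, roughly {approx_mass_kda(length):.1f} kDa")
```

An exact predicted mass would be calculated from the full deduced amino acid sequences (SEQ ID NO:1, 3, 5, 7, and 9) rather than from the length alone.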
Example 10 Extraction and N-terminal Sequencing of Phenol-induced Proteins F4 and F5 Using Two Dimensional Gel Electrophoresis
[0146] 120 μg of the soluble fraction of cells that were grown on phenol/nitrate and of cells grown on 4-hydroxybenzoate, respectively, were lysed in 10 μL lysis buffer (9.5 M urea, 2% (w/v) CHAPS, 0.8% (w/v) ampholytes pH 3-10 (40% (w/v); Bio-Rad), 1% (w/v) DTT, traces of bromophenol blue) and applied to a rehydrated Immobiline Dry Strip (linear pH gradient 3-10; Pharmacia) according to the manufacturer's protocol (rehydration buffer: 8 M urea, 0.5% (w/v) CHAPS, 15 mM DTT, 0.2% (w/v) ampholytes pH 3-10 (40% (w/v); Bio-Rad)). The horizontal isoelectric focusing was run overnight (15 h, 1400 V). After the first dimension the Immobiline Dry Strips were equilibrated twice for 15 min in equilibration buffer (0.05 M Tris/HCl pH 8.8, 6 M urea, 30% (w/v) glycerol, 2% (w/v) SDS, traces of bromophenol blue and 10 mg/mL DTT or 48 mg/mL iodoacetamide, respectively). The second dimension was a vertical SDS polyacrylamide gel electrophoresis (11.5% polyacrylamide), which revealed the phenol-induced proteins (FIG. 10). The proteins were blotted to a PVDF membrane and stained with Coomassie Blue. The phenol-induced proteins F4 and F5 were excised and N-terminally sequenced using an Applied Biosystems 473A sequencer (Table 3). Analysis of the amino acid sequences and comparison with the translated nucleotide sequence confirmed the genes encoding F4 and F5. Furthermore, the predicted molecular masses agreed reasonably well with the experimentally determined masses.

TABLE 3
N-Terminal Amino Acid Sequences

Protein | N-Terminal Amino Acid Sequence (Applied Biosystems 473A Sequencer) | N-Terminal Amino Acid Sequence Deduced from the Genes
F4 | MEQAK NIKLV (SEQ ID NO: 42) | MEQAK NIKLV (SEQ ID NO: 41)
F5 | MRIVV GMXGA (SEQ ID NO: 44) | MRIVV GMSGA (SEQ ID NO: 43)
Example 11 Identification of Genes Coding for Phenol-induced Proteins
[0147] About 14 kb of the λEMBL3 gene library were sequenced (SEQ ID NO:23). The nucleotide sequence was analyzed with the ORF Finder (Open Reading Frame Finder) (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) to find the open reading frames (ORFs). Eleven ORFs were detected (orf1-10 and orf-1) as shown in FIG. 11.
[0148] Analysis of the sequence revealed 10 ORFs that were transcribed in the same direction. The first six ORFs were separated by less than 65 bp and totaled 7210 bp. This cluster of putative genes was followed by a 658 bp non-coding region containing putative secondary structures.
[0149] Another cluster of putative genes followed, which also showed intergenic regions of less than 40 bp. Downstream of orf10, 470 bp were sequenced; however, this region appeared not to code for proteins. Upstream of orf1 and transcribed in the opposite direction, another putative gene was found, which was separated by 428 bp from orf1.
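The spacing between neighboring ORFs can be checked directly from the coordinate ranges listed in Table 5. The following is a minimal sketch, assuming that the ranges are inclusive start and end positions and that the rows of Table 5 correspond to orf1 through orf10 in order (an inference from Table 4 and Example 9, not stated in the table itself). Gaps computed this way need not match the figures quoted in the text exactly, so the table coordinates are best treated as approximate.

```python
# (label, start, end) transcribed from Table 5; the orf labels are inferred
# from row order and are an assumption of this sketch
ORFS = [
    ("orf1", 2864, 4703),
    ("orf2", 4707, 5841),
    ("orf3", 5853, 6525),
    ("orf4", 6587, 8006),
    ("orf5", 8070, 8580),
    ("orf6", 8589, 10074),
    ("orf7", 10773, 11805),
    ("orf8", 11819, 12404),
    ("orf9", 12414, 12846),
    ("orf10", 12884, 13433),
]


def intergenic_gaps(orfs):
    """Return (upstream, downstream, gap_bp) for each adjacent ORF pair.

    Coordinates are assumed inclusive, so the gap is the number of bases
    strictly between the end of one ORF and the start of the next.
    """
    return [(a[0], b[0], b[1] - a[2] - 1) for a, b in zip(orfs, orfs[1:])]


gaps = intergenic_gaps(ORFS)
for upstream, downstream, gap in gaps:
    print(f"{upstream} -> {downstream}: {gap} bp")
```

With these coordinates, the gaps within each cluster are short, and the single long gap falls between orf6 and orf7, which is where the text places the non-coding region separating the two clusters.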
[0150] The nucleotide sequence of an ORF is automatically translated into an amino acid sequence by the ORF Finder. Comparison of the deduced amino acid sequences of orf1-10 and orf-1 (see FIG. 11) with the experimentally determined N-terminal amino acid sequences of the phenol-induced proteins and with the internal sequences revealed that the following ORFs coded for known proteins: orf1 (SEQ ID NO:6) for F3, orf4 (SEQ ID NO:4) for F2, orf5 (SEQ ID NO:8) for F4, orf6 (SEQ ID NO:2) for F1 and orf8 (SEQ ID NO:10) for F5. The predicted molecular masses agreed reasonably well with the experimentally determined masses (FIG. 10).
[0151] The deduced amino acid sequences of the ORFs were analyzed by BLAST search (Basic Local Alignment Search Tool; Altschul et al., J. Mol. Biol. 215:403-410 (1990)) using the BLASTP 2.0.8 algorithm (http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-newblast) provided by the National Center for Biotechnology Information and by BLAST+BEAUTY searches using the NCBI BLAST Server (http://dot.imgen.bcm.tmc.edu:9331/seq-search/Options/beauty_pp.html) (Tables 4 and 5). Table 4 contains homologous hits and Table 5 contains hits with the highest homology.
[0152] orf1 (SEQ ID NO:6) and orf2 (SEQ ID NO:12) are likely to encode the phenol-phosphorylating enzyme E1. This conclusion is deduced from the high similarity of the genes with the domains of PEP synthase of E. coli. PEP synthase catalyzes a similar phosphorylation reaction (FIGS. 1 and 11).
[0153] orf4 (SEQ ID NO:4), orf6 (SEQ ID NO:2), orf7 (SEQ ID NO:16) and orf8 (SEQ ID NO:10) are likely to represent the carboxylating enzyme E2. This conclusion is deduced from the high similarity of the genes with two enzymes of E. coli that catalyze the decarboxylation of a 4-hydroxybenzoate isoprene derivative to the corresponding phenolic product (ubiD and ubiX). This reaction is formally the reverse of the phenol carboxylation reaction (FIGS. 1 and 11).
[0154] The functions of the proteins encoded by orf3 (SEQ ID NO:14), orf5 (SEQ ID NO:8), orf9 (SEQ ID NO:18) and orf10 (SEQ ID NO:20) are unknown, and these proteins have low homology to other known sequences.

TABLE 4

ORF | Similarity Identified | SEQ ID Nucleotide | SEQ ID Amino Acid | % Identity (a) | % Similarity (b) | E-value (c) | Size
−1 | gnl|PID|d1010531 (D63814) pheR [Pseudomonas putida] 563 aa | 22 | 21 | 47.2 | 72.3 | 1e-20 | 582 aa
1 | gi|147146 (M69116) PEP synthase [E. coli] 793 aa | 6 | 5 | 16.7 | 39.3 | 4e-10 | 612 aa
2 | gi|147146 (M69116) PEP synthase [E. coli] 793 aa | 12 | 11 | 21.8 | 34.5 | 1e-63 | 233 aa
3 | gi|2621183 (AE000803) inosine-5′-monophosphate dehydrogenase [Methanobacterium thermoautotrophicum] 484 aa | 14 | 13 | 14.5 | 30.2 | 1e-8 | 223 aa
4 | gi|549586|sp|P26615| yigC [E. coli] 497 aa | 4 | 3 | 30.8 | 58.95 | 5e-47 | 472 aa
5 | gi|2851406|sp|P45396| yrbI [E. coli] 188 aa | 8 | 7 | 38.8 | 63.8 | 2e-25 | 169 aa
6 | gi|549586|sp|P26615| yigC [E. coli] 497 aa | 2 | 1 | 29.4 | 57.1 | 1e-31 | 485 aa
7 | gi|549586|sp|P26615| yigC [E. coli] 497 aa | 16 | 15 | 24.7 | 47.5 | 7e-25 | 357 aa
8 | gi|2507150|sp|P09550| ubiX [E. coli] 189 aa | 10 | 9 | 60.3 | 86.8 | 5e-56 | 194 aa
9 | gi|2622617 (AE000910) conserved protein [Methanobacterium thermo.] 122 aa | 18 | 17 | 40 | 64.8 | 8e-13 | 143 aa
10 | gi|2129134|pir|D64443| mutator protein mutT [Methanococcus jann.] 169 aa | 20 | 19 | 36.1 | 62.7 | 2e-9 | 182 aa

(a) % Identity is defined as the percentage of amino acids that are identical between the two proteins.
(b) % Similarity is defined as the percentage of amino acids that are identical or conserved between the two proteins.
(c) Expect value. The Expect value estimates the statistical significance of the match, specifying the number of matches, with a given score, that are expected in a search of a database of this size absolutely by chance.
aa: amino acids
Citation: BCM Search Launcher - Pairwise Sequence Alignment ALIGN - optimal global alignment with no short-cuts (EERIE) - (http://dot.imgen.bcm.tmc.edu:9331/seq-search/alignment.html)
[0155]

TABLE 5

Name | Gene | Dir | Range | Amino Acid Size | Top Hit
PheR | Transcriptional regulator | ← | 688-2479 | 582 | gi|3445531 (AF026065) positive phenol-degradative gene regulator
F3 | PEP Synthase | → | 2864-4703 | 612 | sp|O29548|PPSA_ARCFU PROBABLE PHOSPHOENOLPYRUVATE SYNTHASE
(none) | PEP Synthase | → | 4707-5841 | 374 | sp|P46893|PPSA_STAMA PROBABLE PHOSPHOENOLPYRUVATE SYNTHASE (PYRUVATE, WATER DIKINASE) (PEP SYNTHASE)
(none) | inosine-5′-monophosphate dehydrogenase | → | 5853-6525 | 223 | gi|2621183 (AE000803) inosine-5′-monophosphate dehydrogenase [Methanobacterium thermoautotrophicum]
F2 | hypothetical protein (oxidoreductase) | → | 6587-8006 | 472 | gi|2650432 (AE001091) conserved hypothetical protein [Archaeoglobus fulgidus]
F4 | YRBI_ECOLI HYPOTHETICAL | → | 8070-8580 | 169 | sp|P45396|YRBI_ECOLI HYPOTHETICAL 20.0 KD PROTEIN IN MURA-RPON INTERGENIC REGION
F1 | probable membrane protein | → | 8589-10074 | 485 | pir||S62018 probable membrane protein YDR539w - yeast [Saccharomyces cerevisiae]
(none) | Conserved Hypothetical (oxidoreductase?) | → | 10773-11805 | 357 | gi|2622505 (AE000902) conserved protein [Methanobacterium thermoautotrophicum]
F5 | Decarboxylase | → | 11819-12404 | 194 | sp|P09550|UBIX_ECOLI 3-OCTAPRENYL-4-HYDROXYBENZOATE CARBOXY-LYASE (POLYPRENYL P-HYDROXYBENZOATE DECARBOXYLASE)
(none) | conserved protein | → | 12414-12846 | 143 | gi|2622617 (AE000910) conserved protein [Methanobacterium thermoautotrophicum]
(none) | mutator MutT protein | → | 12884-13433 | 182 | gi|2622420 (AE000895) mutator MutT protein [Methanobacterium thermoautotrophicum]
Claims
1. A polypeptide encoded by DNA selected from the group consisting of:
- (a) DNA having the nucleotide sequence shown in SEQ ID NO:2, SEQ ID NO:4, or SEQ ID NO:6;
- (b) a degenerate nucleotide sequence of the DNA of (a); and
- (c) DNA that hybridizes with the complement of the nucleotide sequence of (a) or an analog thereof under hybridization conditions of 6× SSC (1 M NaCl), 40 to 45% formamide, 1% SDS at 37° C., and a wash in 0.5× to 1× SSC at 55 to 60° C.,
- wherein the polypeptide is further characterized by phosphorylase activity on phenol substrates.
2. The polypeptide of claim 1 having the amino acid sequence of SEQ ID NO:1, SEQ ID NO:3, or SEQ ID NO:5.
3. An isolated nucleic acid fragment encoding the polypeptide of claim 1, the nucleic acid fragment selected from the group consisting of:
- (a) an isolated nucleic acid fragment encoding all or a substantial portion of the amino acid sequence of SEQ ID NO:2; SEQ ID NO:4; or SEQ ID NO:6;
- (b) an isolated nucleic acid fragment that is substantially similar to an isolated nucleic acid fragment encoding all or a substantial portion of the amino acid sequence SEQ ID NO:2; SEQ ID NO:4; or SEQ ID NO:6;
- (c) an isolated nucleic acid molecule that hybridizes with the nucleic acid fragment of (a) under hybridization conditions of 6× SSC (1 M NaCl), 40 to 45% formamide, 1% SDS at 37° C., and a wash in 0.5× to 1× SSC at 55 to 60° C.; and
- (d) an isolated nucleic acid fragment that is complementary to (a), (b), or (c),
- wherein the isolated nucleic acid is further characterized by phosphorylase activity on phenol substrates.
4. The DNA fragment of claim 3, wherein the DNA fragment is isolated from Thauera aromatica.
5. An expression cassette comprising the DNA fragment of claim 3 operably linked to suitable signal sequences for the expression of the DNA fragment in a host microorganism.
6. An expression vector comprising the expression cassette of claim 5 and regulatory sequences ensuring the stable maintenance of said expression vector.
7. A microorganism stably transformed with the DNA fragment of claim 3.
8. A transformed microorganism comprising the expression vector of claim 6.
9. A transformed microorganism comprising the expression cassette of claim 5, wherein the signal sequences of the expression cassette are a ribosome binding site and a promoter sequence located upstream of the DNA fragment.
10. The transformed microorganism of claim 9 wherein the promoter is at least one of CYC1, HIS3, GAL1, GAL10, ADH1, PGK, PHO5, GAPDH, ADC1, TRP1, URA3, LEU2, ENO, TPI, AOX1, lac, trp, λPL, λPR, T7, tac, and trc or at least one strong promoter of Corynebacterium, Comamonas, Rhodococcus or Pseudomonas.
11. The transformed microorganism of claim 9, wherein the ribosome binding site is selected from the group consisting of ribosome binding sites from the genomes of E. coli, P. pastoris, Comamonas, Pseudomonas, Rhodococcus, and Corynebacterium.
12. The transformed microorganism of claim 11, wherein the host microorganism is selected from the group consisting of Comamonas sp., Corynebacterium sp., Brevibacterium sp., Rhodococcus sp., Azotobacter sp., Citrobacter sp., Enterobacter sp., Clostridium sp., Klebsiella sp., Salmonella sp., Lactobacillus sp., Aspergillus sp., Saccharomyces sp., Zygosaccharomyces sp., Pichia sp., Kluyveromyces sp., Candida sp., Hansenula sp., Dunaliella sp., Debaryomyces sp., Mucor sp., Torulopsis sp., Methylobacteria sp., Bacillus sp., Escherichia sp., Pseudomonas sp., Rhizobium sp., and Streptomyces sp.
13. An isolated and purified DNA fragment having a nucleotide sequence SEQ ID NO:2, SEQ ID NO:4, or SEQ ID NO:6.
14. An isolated and purified 14.27 kb DNA fragment as shown in FIG. 11.
15. A microorganism stably transformed with chimeric genes having at least one copy of one or more of nucleotide sequences selected from the group consisting of SEQ ID NOs:6, 12, 14, 4, 8, 2, 16, 10, 18, and 20.
16. A microorganism stably transformed with a chimeric gene having at least one copy of the nucleic acid sequence of SEQ ID NO:23.
Type: Application
Filed: May 30, 2001
Publication Date: Apr 11, 2002
Inventors: Georg Fuchs (Freiburg), Sabine Breinig (Freiburg im Breisgau)
Application Number: 09870162
International Classification: C12N009/22; C07H021/04; C12N005/06; C12P021/02;