Gene Expression or Activity Enhancing Elements

The present invention relates to transgenic nucleic acids, expression cassettes, vectors, plant cells, plant organs and plants. The invention also relates to methods for increasing expression or activity of a target gene, particularly in a plant cell or plant organ, and also to uses of recombinant nucleic acids and expression cassettes to increase expression or activity of a target gene or for manufacturing a vector, plant cell, plant organ or plant. In addition, the invention relates to enhancers for achieving increased expression or activity of a target gene, particularly in a plant cell or plant organ, when operably linked to a promoter functional in such plant cell, plant organ or plant. The invention is described herein with reference to the technical field of production of polyunsaturated fatty acids (PUFAs), without being limited to this technical field. For the production of desired molecules in plant cells, e.g. PUFAs, it is frequently required to express a target gene heterologous to the plant cell, or to overexpress a target gene naturally found in said plant cell.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/902,715, which is a National Stage application of International Application No. PCT/IB2014/062816 filed Jul. 3, 2014, which claims priority under 35 U.S.C. § 119 to European Patent Application No. 13175398.0, filed Jul. 5, 2013; all of the aforementioned applications are hereby incorporated herein by reference in their entirety.


REFERENCE TO SEQUENCE LISTING SUBMITTED VIA EFS-WEB

This application was filed electronically via EFS-Web and includes an electronically submitted sequence listing in .txt format. The .txt file contains a sequence listing entitled “75088A_Seqlisting.TXT” created on Jul. 13, 2020, and is 203,784 bytes in size. The sequence listing contained in this .txt file is part of the specification and is hereby incorporated by reference herein in its entirety.

WO 02/102970 discloses two Conlinin genes (Conlinin 1 and 2) and their respective promoter regions obtained from flax, which can be utilized to improve seed traits, modify the fatty acid composition of seed oil and the amino acid composition of seed storage protein, and produce bioactive compounds in plant seeds. The document also mentions methods based on using these promoters to direct seed-specific expression of a gene of interest, which for example might be involved in lipid biosynthesis, e.g. acyl carrier protein, saturases, desaturases and elongases.

WO 01/16340 discloses methods allowing the seed-specific expression of heterologous genes in flax and other plants. Of particular interest were promoters associated with fatty acid metabolism, such as acyl carrier protein, saturases, desaturases, elongases and the like.

Promoters function to initiate transcription of DNA into mRNA. Generally, transcribed mRNA comprises a translated region, also called a gene sequence, and upstream thereof an untranslated region. This untranslated region is generally believed not to have any profound influence on the translation of the gene sequence or the stability of the mRNA. Thus, the region between a promoter TATA box and a start codon is normally treated as being unimportant. For example, WO 01/16340 discloses a putative conlinin promoter, but the only GUS expression construct disclosed in that document (reproduced herein as SEQ ID NO. 311) is truncated at the 3′ end of the putative promoter sequence.

Finding enhancer genetic elements that can improve the expression or activity of a target gene in a cell of interest is an ongoing demand in the development of improved agronomic traits. In particular, the demand for oilseed crops producing seed oil with a modified fatty acid composition makes the identification of further enhancing elements necessary; preferably, promoters are needed that further improve the expression of genes of fatty acid biosynthesis.

It has now been unexpectedly found that certain nucleic acids can improve expression or activity of a target or reporter gene when the gene is operably linked to a promoter.

It is to be understood that this invention is not limited to the particular methodology, protocols, cell lines, plant species or genera, constructs, and reagents described as such. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which will be limited only by the appended claims.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an,” and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “a vector” is a reference to one or more vectors and includes equivalents thereof known to those skilled in the art, and so forth. The term “about” is used herein to mean approximately, roughly, around, or in the region of. When the term “about” is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” is used herein to modify a numerical value above and below the stated value by a variance of 20 percent, preferably 10 percent up or down (higher or lower). As used herein, the word “or” means any one member of a particular list and also includes any combination of members of that list.

To overcome, reduce or mitigate the aforementioned disadvantages and/or to further the aforementioned goals and/or improve the aforementioned advantages, the invention provides a recombinant nucleic acid comprising a target gene and an untranslated region adjacent to the target gene, wherein the untranslated region comprises an enhancer of at least 18 consecutive nucleotides, wherein at least 14 nucleotides are adenosine or cytidine.

Describing the invention from another perspective, the invention provides a recombinant nucleic acid comprising a plant promoter and an untranslated region adjacent to the promoter, wherein the untranslated region comprises an enhancer of at least 18 consecutive nucleotides, wherein at least 14 nucleotides are adenosine or cytidine.

The invention also provides an enhancer comprising

    • a) a CCAAT-Box comprising SEQ ID NO. 100, preferably comprising a sequence having at least 90% identity to SEQ ID NO. 101 and comprising SEQ ID NO. 100, and more preferably SEQ ID NO. 101, and/or
    • b) a Dof1/MNB1a binding site comprising SEQ ID NO. 102, preferably comprising a sequence having at least 90% identity to SEQ ID NO. 103 and comprising SEQ ID NO. 102, and more preferably SEQ ID NO. 103.

Further according to the invention is provided an expression cassette comprising a recombinant nucleic acid according to the invention, and, if not already comprised in the recombinant nucleic acid, a plant promoter, wherein the promoter comprises

    • a TATA-box, preferably comprising SEQ ID NO. 108, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 107 and comprising SEQ ID NO. 108, and more preferably comprising SEQ ID NO. 107, and
    • a CPRF factor binding site, preferably comprising SEQ ID NO. 114, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 113 and comprising SEQ ID NO. 114, and more preferably comprising SEQ ID NO. 113, and
    • a TCP class I transcription factor binding site, preferably comprising SEQ ID NO. 116, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 115 and comprising SEQ ID NO. 116, and more preferably comprising SEQ ID NO. 115, and
    • a bZIP protein G-Box binding factor 1 binding site, preferably comprising SEQ ID NO. 118, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 117 and comprising SEQ ID NO. 118, and more preferably comprising SEQ ID NO. 117.

The invention also provides a vector comprising the expression cassette of the present invention.

Further, the invention provides plants, plant organs or plant cells comprising a recombinant nucleic acid according to the present invention or an expression cassette according to the present invention.

The invention also teaches a method of increasing expression or activity of a target gene, comprising the steps of

    • i) providing, upstream of the target gene, an untranslated region and a plant promoter to obtain an expression cassette according to any of claims 9 to 11, and
    • ii) introducing the expression cassette into a plant cell to allow expression of the target gene.

According to the invention, an enhancer as described herein according to the invention or an expression cassette according to the invention can be used for

    • increasing expression or activity of a target gene,
    • producing a vector according to claim 15, or for
    • producing a plant, plant organ or plant cell according to claim 16.

The invention is hereinafter described in more detail. Unless specifically stated otherwise, the definitions of the chapter “definitions” apply throughout all of this text.

DETAILED DESCRIPTION OF THE INVENTION

One way to look at the invention is to understand that according to the present invention a recombinant nucleic acid is provided, said nucleic acid comprising a target gene and an untranslated region adjacent to the target gene. According to the invention, the untranslated region comprises an enhancer of at least 18 consecutive nucleotides, wherein at least 14 nucleotides are adenosine or cytidine.
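The core criterion (a stretch of at least 18 consecutive nucleotides of which at least 14 are adenosine or cytidine) lends itself to a simple computational screen. The following is a minimal sketch, not part of the disclosed method: the function name `find_ac_rich_windows` is hypothetical, positions are 0-based, and the input is assumed to be a plain DNA string.

```python
def find_ac_rich_windows(seq: str, window: int = 18, min_ac: int = 14) -> list[int]:
    """Return the 0-based start positions of every stretch of `window`
    consecutive nucleotides in `seq` containing at least `min_ac` A or C."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    # 1 where the base is adenosine or cytidine, 0 otherwise.
    is_ac = [1 if base in "AC" else 0 for base in seq]
    count = sum(is_ac[:window])
    hits = [0] if count >= min_ac else []
    # Slide the window one base at a time, updating the count incrementally.
    for start in range(1, len(seq) - window + 1):
        count += is_ac[start + window - 1] - is_ac[start - 1]
        if count >= min_ac:
            hits.append(start)
    return hits


# An 18-nt A run flanked by G runs: windows starting at 6..14 qualify.
example = "G" * 10 + "A" * 18 + "G" * 10
print(find_ac_rich_windows(example))  # [6, 7, 8, 9, 10, 11, 12, 13, 14]
```

Because the window slides one base at a time, overlapping qualifying stretches are all reported; merging them into maximal regions would be a separate step.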

The inventors have found that a particular section of a nucleic acid (preferred embodiments of which are described hereinafter) functions as an enhancer in host cells, particularly plant cells; that is, activity of a target or reporter gene product can be increased by operably linking the target or reporter gene with the enhancer. The enhancer and the target gene are thus functionally linked, i.e. the enhancer influences or modifies transcription or translation of the target gene. An increase of activity by the enhancer of the present invention can be achieved by an increase in production of mRNA by the host cell, or by an increased rate of translation of the mRNA, e.g. by improving the binding of ribosomes to the mRNA, or by protection of the mRNA against degradation. The invention, however, is not limited by any of these mechanisms.

The enhancer is preferably heterologous to the target gene and/or to a promoter driving expression of the target gene. Thus, the following sequences are not part of the present invention: SEQ ID NO. 311 to 323, which are comprised in particular in WO0116340, WO2002102970 and WO2009130291.

The target gene can be any gene whose activity in a plant is desired to be increased. An increase is determined by comparison with the activity of the target gene when expressed in the same type of cell, e.g. seed cells, root cells and so on, without being functionally linked to the enhancer. Examples of useful target genes are fatty acid desaturase and fatty acid elongase genes, particularly d12d15Des(Ac_GA) (cf. WO 2007042510), d12Des(Ce_GA) (cf. US 2003172398), d12Des(Co_GA2) (cf. WO 200185968), d12Des(Fg) (cf. WO 2007133425), d12Des(Ps_GA) (cf. WO 2006100241), d12Des(Tp_GA) (cf. WO 2006069710), d6Des(Ol_febit) (cf. WO 2008040787), d6Des(Ol_febit)2 (cf. WO 2008040787), d6Des(Ot_febit) (cf. WO 2008040787), d6Des(Ot_GA) (cf. WO 2005083093), d6Des(Ot_GA2) (cf. WO 2005083093), d6Des(Pir) (cf. WO 2002026946), d6Des(Pir_GAI) (cf. WO 2002026946), d6Des(Plu) (cf. WO 2007051577), d6Elo(Pp_GA) (cf. WO 2001059128), d6Elo(Pp_GA2) (cf. WO 2001059128), d6Elo(Pp_GA3) (cf. WO 2001059128), d6Elo(Tp_GA) (cf. WO 2005012316) and d6Elo(Tp_GA2) (cf. WO 2005012316).

The enhancer is comprised in or forms an untranslated region adjacent to the target gene. For the present invention, the untranslated region is considered adjacent to the target gene if no translated region other than a region belonging to the target gene is located between the target gene and the untranslated region comprising or consisting of the enhancer. Thus, for example, in cases where the target gene comprises several exons, the untranslated region is considered to be located adjacent to the target gene when the untranslated region is located upstream of the first exon such that no translated region is located between the first exon and the untranslated region comprising or consisting of the enhancer. For the purposes of the present invention, exons are counted in 5′ to 3′ direction, so that the first exon is the one comprising the start codon of the target gene.
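The adjacency condition can be expressed as an interval check. This is a minimal sketch under assumed conventions that the text does not specify: features are hypothetical 0-based, half-open `[start, end)` intervals on the coding strand of one construct, and `translated` lists every translated region in the construct.

```python
from typing import List, Tuple

# Hypothetical representation: 0-based, half-open [start, end) intervals on the
# coding strand of a single construct.
Interval = Tuple[int, int]


def utr_is_adjacent(utr: Interval, first_exon: Interval,
                    translated: List[Interval]) -> bool:
    """True if no translated region lies between the untranslated region and
    the first exon (the exon carrying the start codon) of the target gene."""
    gap_start, gap_end = utr[1], first_exon[0]
    if gap_start > gap_end:
        raise ValueError("the untranslated region must end upstream of the first exon")
    # A translated region disqualifies adjacency only if it overlaps the gap.
    return all(end <= gap_start or start >= gap_end for start, end in translated)


exons = [(80, 300), (400, 650)]  # hypothetical target gene exons (translated)
print(utr_is_adjacent((0, 60), exons[0], exons))               # True
print(utr_is_adjacent((0, 60), exons[0], exons + [(62, 75)]))  # False
```

Untranslated material in the gap (for example further functional units) does not affect the result, consistent with the definition above.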

It is to be noted that the untranslated region may comprise, in addition to the enhancer of the present invention, further functional units including transcription or translation enhancing sequences. Regardless of whether the untranslated region comprises such further functional units the untranslated region is, for the purposes of the present invention, located adjacent to the target gene under the aforementioned conditions, that is, no translated region is located between the untranslated region and the target gene. It is preferred but not required that the enhancer of the present invention as such is adjacent to the translation start codon of the target gene.

The untranslated region may or may not be transcribed in a cell. In particular, the untranslated region may comprise translation enhancing sequences or mRNA stability enhancing sequences which are transcribed but not translated.

The untranslated region and the enhancer are preferably located upstream of the target gene. If the nucleobases of the target gene were numbered starting from 1 for the 5′-most nucleobase of the first translated codon (normally “A” of the codon “ATG” of a DNA sequence corresponding to “AUG” of the target gene mRNA) and incrementing the number in 3′ direction, then nucleobases of the enhancer would be designated by negative numbers, as the untranslated region is preferably located in 5′ direction of the target gene.
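The numbering convention can be made concrete with a small conversion function. The text does not say whether a position 0 exists; this sketch assumes the common convention that the base immediately 5′ of the A of the start codon is −1. The function name and the toy sequence are hypothetical.

```python
def position_relative_to_start(index: int, atg_index: int) -> int:
    """Map a 0-based string index to the numbering used in the text: the A of
    the ATG start codon is +1, bases 3' of it are +2, +3, ... and bases 5' of
    it are negative. Assumption: there is no position 0, so the base
    immediately 5' of the A is -1."""
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset


construct = "GGCCATGGCT"      # hypothetical toy sequence
atg = construct.find("ATG")   # 0-based index of the start codon's A (here 4)
# The "CC" directly in front of ATG occupies positions -2 and -1.
print([position_relative_to_start(i, atg) for i in (2, 3, 4, 5)])  # [-2, -1, 1, 2]
```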

It is particularly preferred that the untranslated region comprises, in 5′ to 3′ direction, the enhancer of the present invention and one or more further functional units, particularly one or more NEENAs or RENAs as described for example in WO2013038294.

At the junction of the untranslated region and the target gene preferably a Kozak sequence is located. Preferably a Kozak sequence comprises the nucleotide sequence “ATGG”, wherein the “ATG” is the start codon of the target gene. Kozak sequences facilitate the translation of the target gene. The skilled person can adapt the exact nucleotide sequence of the Kozak sequence according to the cell he would like to use, and also according to the expression needs of the target gene. For example, the skilled person could create all 256 variants of the sequence “NNATGNN”, where “N” designates any nucleobase “A”, “C”, “G” or “T”, and clone these variants in the cell he intends to use. By analyzing the activity of the target gene he will find the Kozak sequence optimal for his needs. The number of variants can be significantly reduced if the second amino acid is important for the functioning of the target gene, because in such cases at least the first nucleotide after (that is, in 3′ direction of) the “ATG” start codon is limited to one or two alternatives. A preferred Kozak sequence is “CCATGG”, as this sequence is also recognized by the restriction enzymes NcoI or Bsp19I, thus facilitating cloning of the target gene adjacent to the untranslated region. For the purposes of the present invention the leading “CC” nucleobases are considered to belong to the untranslated region.
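The variant count follows from four free positions with four bases each (4^4 = 256); fixing the first base after ATG to one or two alternatives reduces this to 64 or 128. A minimal sketch of the enumeration and of the NcoI site check (function names are hypothetical):

```python
from itertools import product

BASES = "ACGT"


def kozak_variants(plus4_choices: str = BASES) -> list[str]:
    """Enumerate the NNATGNN variants described in the text. `plus4_choices`
    restricts the base directly 3' of ATG, e.g. when the second amino acid
    must be preserved."""
    # n1, n2: the two bases 5' of ATG; n3, n4: the two bases 3' of ATG.
    return [f"{n1}{n2}ATG{n3}{n4}"
            for n1, n2, n3, n4 in product(BASES, BASES, plus4_choices, BASES)]


def has_ncoi_site(seq: str) -> bool:
    """NcoI (and, per the text, Bsp19I) recognises CCATGG."""
    return "CCATGG" in seq.upper()


all_variants = kozak_variants()
print(len(all_variants))                            # 256
print(len(kozak_variants("G")))                     # 64
print(sum(has_ncoi_site(v) for v in all_variants))  # 4
```

Only the four variants of the form CCATGGN carry the cloning-friendly NcoI site, which is why "CCATGG" is singled out as the preferred Kozak sequence.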

The enhancer of the present invention comprises or consists of at least 18 consecutive nucleotides. The enhancer is thus not interrupted by any other element, be it a functional element or a non-functional element. As is described below, the enhancer and also the untranslated region can be substantially longer than 18 nucleotides, and preferably the enhancer consists of 57 or 58 consecutive nucleotides as described below.

The enhancer of the present invention has several beneficial features. For one, the enhancer sequence is short compared to other expression inducing sequences like NEENAs as described for example in WO2011023537, WO 2011023539, WO2011023800 or WO2013/005152. The enhancer thus can be incorporated with ease also in such constructs which are under severe size limitation, e.g. due to the number and/or size of genes to be incorporated in the respective construct.

Also, the enhancer has been shown to be active for a large number of different target genes, particularly for desaturase and elongase genes of highly disparate sequence. Thus, the present invention provides a universally applicable enhancer for use in plants.

The enhancer of the present invention is functional not only in Arabidopsis but also in other plants, particularly crop plants as described below, more particularly in plant cells of the Brassicaceae family, even more particularly in plant cells of the genus Brassica and most particularly in cells of Brassica napus, Brassica oleracea, Brassica carinata, Brassica nigra, Brassica juncea and Brassica rapa.

Another advantage of the enhancer of the present invention is that it is useful for increasing expression or activity of a target gene expressed under the control of a seed-specific promoter. Particularly the enhancer of the present invention can be combined with a Conlinin-type promoter to achieve seed-specific expression as described in WO2002102970. However, the enhancer of the present invention can also be combined with other promoters to increase expression or activity of a target gene.

The untranslated region or enhancer of the present invention preferably comprises any nucleotide sequence according to SEQ ID NO. 84, 85, 86, 87, 88 or 89. It is particularly preferred that the untranslated region or enhancer comprises two copies of one or more of the aforementioned sequences. The nucleotide sequences according to SEQ ID NO. 84, 85, 86, 87, 88 or 89 can be present in the untranslated region or enhancer in an overlapping form. For example, a nucleotide sequence of two cytidines followed by five adenosine nucleobases would simultaneously embody the sequences according to SEQ ID NO. 85, 86 and 87.
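Overlapping occurrences of several motifs can be located with a zero-width lookahead regular expression. The sketch below is illustrative only: the motifs are hypothetical placeholders, not the actual sequences of SEQ ID NO. 84 to 89, and the function name is assumed.

```python
import re


def find_overlapping(seq: str, motifs: dict[str, str]) -> dict[str, list[int]]:
    """Return the 0-based start positions of every (possibly overlapping)
    occurrence of each motif in `seq`."""
    seq = seq.upper()
    # A zero-width lookahead lets consecutive matches overlap.
    return {name: [m.start() for m in re.finditer(f"(?={re.escape(motif.upper())})", seq)]
            for name, motif in motifs.items()}


# Placeholder motifs only; they are NOT the sequences of SEQ ID NO. 84 to 89.
toy_motifs = {"motif_1": "CCA", "motif_2": "CAA", "motif_3": "AAAA"}
print(find_overlapping("CCAAAAA", toy_motifs))
# {'motif_1': [0], 'motif_2': [1], 'motif_3': [2, 3]}
```

A plain `str.find` loop that advances past each match would miss overlaps, which is exactly the case the text describes.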

It is particularly preferred that the enhancer comprises the nucleotide sequence according to SEQ ID NO. 84. This sequence comprises the core motif SEQ ID NO. 100 of the plant CCAAT-box found in plant promoters. Thus, the presence of the sequence according to SEQ ID NO. 84 is particularly suitable for achieving the effects of transcription factor binding to this sequence. It is particularly preferred that the enhancer or untranslated region comprises two copies of SEQ ID NO. 84 separated by approximately five turns of the DNA helix, that is, a DNA sequence comprising the enhancer or untranslated region preferably comprises two instances of SEQ ID NO. 84 separated by 52, 53, 54, 55, 56, 57 or 58 nucleotides, counting from the first nucleotide of the first (i.e. 5′-most) instance of SEQ ID NO. 84 up to and including the nucleotide immediately 5′ of the second instance of SEQ ID NO. 84; preferably they are separated by 54, 55 or 56 nucleotides and most preferably by 55 nucleotides. For example, in the nucleotide sequence according to SEQ ID NO. 143 the instances of SEQ ID NO. 84 of the untranslated region are separated by 55 nucleotides.
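With the counting rule above, the separation equals the difference between the 0-based start positions of the two instances (so it includes the length of the first instance itself). A minimal sketch; the motif is a hypothetical stand-in, not the real SEQ ID NO. 84, and about 10.5 bp per helical turn of B-DNA is used only to illustrate the "five turns" remark.

```python
from itertools import combinations


def instance_spacings(seq: str, motif: str) -> list[tuple[int, int, int]]:
    """For every pair of motif instances, return (start_1, start_2, spacing),
    where spacing counts from the first nucleotide of the 5' instance up to and
    including the nucleotide immediately 5' of the 3' instance, which equals
    start_2 - start_1 (0-based starts)."""
    seq, motif = seq.upper(), motif.upper()
    starts = [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]
    return [(a, b, b - a) for a, b in combinations(starts, 2)]


def in_preferred_range(spacing: int, lo: int = 52, hi: int = 58) -> bool:
    return lo <= spacing <= hi


motif = "CCAACC"  # hypothetical stand-in; not the actual sequence of SEQ ID NO. 84
toy = motif + "G" * 49 + motif
for a, b, spacing in instance_spacings(toy, motif):
    # ~10.5 bp per helical turn of B-DNA, so 55 nt is roughly five turns
    print(a, b, spacing, in_preferred_range(spacing), round(spacing / 10.5, 2))
# 0 55 55 True 5.24
```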

It is also particularly preferred that the enhancer is functionally linked to a promoter such that the enhancer can be transcribed in a cell. This aspect of the invention is described in greater detail below. In such cases it is particularly preferred that the enhancer comprises at least one copy or instance of SEQ ID NO. 84.

According to the present invention, the enhancer preferably comprises or consists of 18 consecutive nucleotides, of which at least 15, preferably at least 17 nucleotides are adenosine or cytidine, and most preferably at most 1 nucleotide is neither adenosine nor cytidine. Preferred embodiments of such enhancers are described in SEQ ID NO. 20 to SEQ ID NO. 45. Of these, the sequences according to SEQ ID NO. 20, 22, 25, 28, 34, 35 and 36 are preferred as they comprise an instance of SEQ ID NO. 84. Particularly preferred is the sequence according to SEQ ID NO. 25, as this sequence comprises all of SEQ ID NO. 84, 100 and 101 and thus closely resembles a plant CCAAT box.

It is also preferred that the enhancer comprises or consists of 21 consecutive nucleotides, of which at least 15, preferably at least 16 nucleotides are adenosine or cytidine, and most preferably at most 2 nucleotides are neither adenosine nor cytidine. A correspondingly preferred sequence is given by SEQ ID NO. 95 and by the last 21 nucleotides of any of SEQ ID NO. 46, 96, 161-170, 221-230, 276-285 and 301-310.

It is also preferred that the enhancer comprises or consists of 22 consecutive nucleotides, of which at least 16, preferably at least 17 nucleotides are adenosine or cytidine, and most preferably at most 2 nucleotides are neither adenosine nor cytidine. A preferred instance of such sequence is given by the last 22 nucleotides of any of SEQ ID NO. 46, 96, 161-170, 221-230, 276-285 and 301-310.

It is also preferred that the enhancer comprises or consists of 24 consecutive nucleotides, of which at least 18, preferably at least 19 nucleotides are adenosine or cytidine, and most preferably at most 3 nucleotides are neither adenosine nor cytidine. A preferred instance of such sequence is given by the last 24 nucleotides of any of SEQ ID NO. 46, 96, 161-170, 221-230, 276-285 and 301-310.

It is also preferred that the enhancer comprises or consists of 36 consecutive nucleotides, of which at least 27, preferably at least 28 nucleotides are adenosine or cytidine, and most preferably at most 6 nucleotides are neither adenosine nor cytidine. A preferred instance of such sequence is given by SEQ ID NO. 96 and by the last 36 nucleotides of any of SEQ ID NO. 46, 161-170, 221-230, 276-285 and 301-310.

It is also preferred that the enhancer comprises or consists of 57 consecutive nucleotides, of which at least 42, preferably at least 45 nucleotides are adenosine or cytidine, and most preferably at most 8 nucleotides are neither adenosine nor cytidine. Preferred examples of such enhancer are given by any of SEQ ID NO. 46 to 83, or the last 57 nucleotides of any of SEQ ID NO. 46, 96, 161-170, 221-230, 276-285 and 301-310.

It is also preferred that the enhancer comprises or consists of 83 consecutive nucleotides, of which at least 62, preferably at least 65 nucleotides are adenosine or cytidine, and most preferably at most 8 nucleotides are neither adenosine nor cytidine. A preferred instance of such sequence is given by the last 83 nucleotides of any of SEQ ID NO. 161-170, 221-230, 276-285 and 301-310. Further preferred instances of such sequence are given by the last (i.e. counting from the 3′ end) 83 nucleotides of a combination of SEQ ID NO. 140 and any of SEQ ID NO. 46 to 83, wherein the sequence of SEQ ID NO. 140 is fused immediately to the 5′ end of any of SEQ ID NO. 46 to 83.
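The length-dependent thresholds of the preceding paragraphs can be collected in one table. This is a minimal sketch, assuming a stretch of exactly one of the listed lengths is classified on its own; it does not cover the broader base definition (at least 14 of 18) or the SEQ ID NO. 137 exception, and the function name is hypothetical. At every listed length the "most preferred" condition already implies the two weaker ones, so checking it first is safe.

```python
from typing import Optional

# Thresholds as listed in the text for each enhancer length:
# length -> (minimum A/C, preferred minimum A/C, most-preferred maximum of other bases)
AC_TIERS = {
    18: (15, 17, 1),
    21: (15, 16, 2),
    22: (16, 17, 2),
    24: (18, 19, 3),
    36: (27, 28, 6),
    57: (42, 45, 8),
    83: (62, 65, 8),
}


def classify_enhancer(seq: str) -> Optional[str]:
    """Return the highest tier met by a stretch of one of the listed lengths,
    or None if even the minimum A/C count is not reached."""
    seq = seq.upper()
    if len(seq) not in AC_TIERS:
        raise ValueError(f"no thresholds listed for length {len(seq)}")
    minimum, preferred, max_other = AC_TIERS[len(seq)]
    ac = sum(base in "AC" for base in seq)
    other = len(seq) - ac
    if other <= max_other:
        return "most preferred"
    if ac >= preferred:
        return "preferred"
    if ac >= minimum:
        return "meets minimum"
    return None


print(classify_enhancer("A" * 17 + "G"))       # most preferred
print(classify_enhancer("A" * 16 + "GG"))      # meets minimum
print(classify_enhancer("A" * 45 + "G" * 12))  # preferred
```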

The sequences of SEQ ID NO. 161-170, 221-230, 276-285 and 301-310 are, for each group, sorted in descending order of preference. For example SEQ ID NO. 161 is more preferred than SEQ ID NO. 162, and SEQ ID NO. 221 is more preferred than SEQ ID NO. 230. The groups, however, are sorted in ascending order of preference, such that for example SEQ ID NO. 221 is more preferred than SEQ ID NO. 161 or SEQ ID NO. 170.
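The two ordering rules combine into a single sort key: the group rank decides first (later groups are more preferred), and within a group the lower SEQ ID number wins. A minimal sketch with hypothetical function names:

```python
# Groups listed from least to most preferred, as stated in the text.
GROUPS = [(161, 170), (221, 230), (276, 285), (301, 310)]


def preference_key(seq_id: int) -> tuple[int, int]:
    for rank, (first, last) in enumerate(GROUPS):
        if first <= seq_id <= last:
            # Later groups rank higher; inside a group, lower numbers rank higher.
            return (-rank, seq_id - first)
    raise ValueError(f"SEQ ID NO. {seq_id} is not in any listed group")


def sort_by_preference(seq_ids: list[int]) -> list[int]:
    """Most preferred first."""
    return sorted(seq_ids, key=preference_key)


print(sort_by_preference([161, 162, 221, 230, 285, 301]))
# [301, 285, 221, 230, 161, 162]
```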

Among the sequences disclosed in the present application, the sequences of SEQ ID NO. 161-170, 221-230, 276-285 and 301-310 are of particular interest because they were checked not to affect major known or predicted cis-regulatory elements of the sequence according to SEQ ID NO. 1. The cis-regulatory elements that were checked comprise those mentioned below in greater detail, i.e. TATA-box, CPRF factor binding site, TCP class I transcription factor binding site, bZIP protein G-Box binding factor 1 binding site, Ry motif, prolamin box, Cis-element as in GAPDH promoters conferring light inducibility, SBF-1 binding site and Sunflower homeodomain leucine-zipper protein Hahb-4 binding site. This approach has been demonstrated to provide functional variants of the seed-specific p-PvARC5, p-VfSBP and p-BnNapin promoters in a GUS reporter gene assay and is described in more detail in WO2012077020, which is incorporated herein by reference.

According to the present invention it is thus preferred if the enhancer comprises any sequence according to

    • a) any of SEQ ID NO. 20 to SEQ ID NO. 45, or any sequence according to
    • b) the last 18, 21, 22, 24, 36 or 57 nucleotides of any of SEQ ID NO.46-83, 161-170, 221-230, 276-285 or 301-310, or
    • c) any of SEQ ID NO. 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103 or 137, or
    • d) a sequence according to b) or c) with 1 additional base inserted therein.
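Option d) admits every sequence obtained by inserting one extra base into a sequence of b) or c). Enumerating them with a set removes duplicates (inserting a base next to an identical base gives the same result), so a sequence of length n over A, C, G, T yields (n + 1) · 3 + 1 distinct variants, e.g. 58 for an 18-mer. A minimal sketch with a hypothetical function name:

```python
def single_insertion_variants(seq: str, alphabet: str = "ACGT") -> set[str]:
    """All distinct sequences obtained by inserting exactly one base anywhere
    in `seq` (before the first base, between two bases, or after the last)."""
    seq = seq.upper()
    return {seq[:i] + base + seq[i:] for i in range(len(seq) + 1) for base in alphabet}


print(len(single_insertion_variants("AC")))      # 10
print(len(single_insertion_variants("A" * 18)))  # 58
```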

It is to be understood that an enhancer that consists of or comprises SEQ ID NO. 137 is preferably considered an enhancer of the present invention regardless of its number of A, C, G and T nucleotides.

Another way to look at the invention is to understand that according to the present invention a recombinant nucleic acid is provided, said nucleic acid comprising a plant promoter and an untranslated region adjacent to the promoter, wherein the untranslated region comprises an enhancer of the present invention. As described above, the enhancer consists of or comprises at least 18 consecutive nucleotides, wherein at least 14 nucleotides are adenosine or cytidine. Preferred enhancers and untranslated regions are described particularly above in greater detail. As described above, the enhancer is preferably heterologous to the promoter.

According to the present invention the untranslated region or enhancer is adjacent to a promoter as long as no translated region is present between the 3′-most TATA box of the promoter and the 5′ end of the untranslated region or enhancer. Preferably, the enhancer is located immediately contiguous to the promoter 3′ end, or is preferably separated from the promoter 3′ end by at most 56 nucleotides, even more preferably by at most 39 nucleotides and even more preferably by at most 17 nucleotides. A preferred spacer sequence of 17 nucleotides length is given in SEQ ID NO. 140.
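The spacing preferences form nested limits (0, then at most 17, 39 and 56 nucleotides). A minimal sketch that reports the tightest limit met; it assumes hypothetical 0-based, half-open coordinates in which `promoter_end` is one past the last promoter base, and the function name is illustrative.

```python
def spacer_tier(promoter_end: int, enhancer_start: int) -> str:
    """Classify the distance between the promoter 3' end and the enhancer 5'
    end (0-based, half-open coordinates: promoter_end is one past the last
    promoter base) against the preferences stated in the text."""
    gap = enhancer_start - promoter_end
    if gap < 0:
        raise ValueError("enhancer must start at or after the promoter 3' end")
    if gap == 0:
        return "immediately contiguous"
    for limit in (17, 39, 56):  # tightest limit first
        if gap <= limit:
            return f"separated by at most {limit} nt"
    return "outside the preferred spacings"


print(spacer_tier(500, 517))  # separated by at most 17 nt (e.g. a 17-nt spacer)
print(spacer_tier(500, 540))  # separated by at most 56 nt
```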

The promoter of the present invention preferably is a minimal promoter, and thus preferably consists only of the minimum length and nucleotide sequence required to achieve expression of a target gene functionally linked to said promoter and enhancer or untranslated region. This way the advantages described above due to the short length of the enhancer of the present invention are preserved. Particularly, a combination of a minimal promoter and the enhancer of the present invention makes it possible to provide an expression cassette as described below that is not much longer than the target gene. Thus, the combination of a minimal promoter and the enhancer of the present invention allows long nucleotide sequences to be cloned even into size-restricted vectors and introduced with size-restricted transformation means. This is particularly important when trying to establish, in plants, new metabolic pathways which require the introduction of multiple genes. Such pathways are for example described in WO2005083093, WO2007017419, WO2007042510 and WO2007096387.

The promoter can also be minimal in the sense that it consists only of the minimum length and nucleotide sequence required to function as a promoter under specific circumstances, e.g. driving expression of a gene functionally linked to said promoter only in specific plant tissues, developmental stages or under specific environmental conditions like heat stress or attempted pathogen infection. The promoter can, according to the present invention, also be longer or comprise more transcription influencing elements (e.g. transcription factor binding sites) than a minimal promoter. Suitable promoters are described e.g. in WO2002102970, WO2009077478, WO2010000708 and WO2012077020, the contents of which are incorporated herein by reference.

A preferred promoter comprises a TATA-box, preferably comprising SEQ ID NO. 108, more preferably comprising a sequence having at least 89% identity to SEQ ID NO. 107 and comprising SEQ ID NO. 108, and more preferably comprising SEQ ID NO. 107. Such a TATA box facilitates onset of transcription particularly in plant cells. As the TATA box at least comprises the core motif SEQ ID NO. 108 of plant TATA boxes, at least a minimal activity of the promoter in plants can be achieved. If the promoter does not comprise the exact sequence SEQ ID NO. 107, then the promoter preferably comprises at least a sequence similar thereto. Such similar sequence contains the exact sequence SEQ ID NO. 108 and has a minimum of 89% identity to SEQ ID NO. 107, and thus preferably differs by at most two nucleotides from the sequence of SEQ ID NO. 107, and even more preferably differs by at most one nucleotide from the sequence of SEQ ID NO. 107.
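Each element definition in this and the following paragraphs combines two tests: the exact core motif must be present, and the surrounding sequence must reach an identity threshold to the reference element. How many mismatches a percentage allows depends on the element length, which this excerpt does not state; for example, 89% permits two mismatches in a 19-nt element but only one in an 18-nt element. The sketch below assumes identity is computed position by position over equal-length, ungapped sequences (the text does not specify an alignment method); the reference and core sequences are hypothetical placeholders, not SEQ ID NO. 107 or 108.

```python
def max_mismatches(length: int, min_identity_pct: int) -> int:
    """Largest number of mismatches still meeting the identity threshold
    (integer arithmetic avoids floating-point edge cases)."""
    m = 0
    while m < length and (length - m - 1) * 100 >= min_identity_pct * length:
        m += 1
    return m


def matches_element(candidate: str, reference: str, core: str,
                    min_identity_pct: int) -> bool:
    """True if the candidate contains the exact core motif and reaches the
    identity threshold to the reference element (ungapped comparison)."""
    candidate, reference = candidate.upper(), reference.upper()
    if core.upper() not in candidate or len(candidate) != len(reference):
        return False
    matches = sum(x == y for x, y in zip(candidate, reference))
    return matches * 100 >= min_identity_pct * len(reference)


reference = "GCTATAAATAGC"  # hypothetical element; core "TATAAAT" is also hypothetical
print(matches_element("GGTATAAATAGC", reference, "TATAAAT", 90))  # True: 1 mismatch, core intact
print(matches_element("GCTATTAATAGC", reference, "TATAAAT", 90))  # False: core disrupted
print(max_mismatches(19, 89), max_mismatches(18, 89))             # 2 1
```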

A preferred promoter comprises a CPRF factor binding site, preferably comprising SEQ ID NO. 114, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 113 and comprising SEQ ID NO. 114, and more preferably comprising SEQ ID NO. 113. Where such promoter does not comprise the exact sequence SEQ ID NO. 113, it comprises a sequence differing from SEQ ID NO. 113 by at most one nucleotide and contains in this sequence the exact sequence SEQ ID NO. 114.

A preferred promoter comprises a TCP class I transcription factor binding site, preferably comprising SEQ ID NO. 116, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 115 and comprising SEQ ID NO. 116, and more preferably comprising SEQ ID NO. 115. Where such promoter does not comprise the exact sequence SEQ ID NO. 115, it comprises a sequence differing from SEQ ID NO. 115 by at most one nucleotide and contains in this sequence the exact sequence SEQ ID NO. 116.

A preferred promoter comprises a bZIP protein G-Box binding factor 1 binding site, preferably comprising SEQ ID NO. 118, more preferably comprising a sequence having at least 85% identity to SEQ ID NO. 117 and comprising SEQ ID NO. 118, and more preferably comprising SEQ ID NO. 117. If the promoter does not comprise the exact sequence SEQ ID NO. 117, then the promoter preferably comprises at least a sequence similar thereto. Such similar sequence contains the exact sequence SEQ ID NO. 118 and has a minimum of 85% identity to SEQ ID NO. 117 and thus preferably differs by at most three nucleotides from the sequence of SEQ ID NO. 117, more preferably differs by at most two nucleotides from the sequence of SEQ ID NO. 117 and even more preferably differs by at most one nucleotide from the sequence of SEQ ID NO. 117.

A preferred promoter comprises a Ry motif, preferably comprising SEQ ID NO. 110, more preferably comprising a sequence having at least 88% identity to SEQ ID NO. 109 and comprising SEQ ID NO. 110, and more preferably comprising SEQ ID NO. 109. Where such a promoter does not comprise the exact sequence SEQ ID NO. 109, it comprises a sequence that contains the exact sequence SEQ ID NO. 110, has a minimum of 88% identity to SEQ ID NO. 109 and thus preferably differs by at most three nucleotides from the sequence of SEQ ID NO. 109, more preferably by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 109.

A preferred promoter comprises a prolamin box, preferably comprising SEQ ID NO. 112, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 111 and comprising SEQ ID NO. 112, and more preferably comprising SEQ ID NO. 111. Where such a promoter does not comprise the exact sequence SEQ ID NO. 111, it comprises a sequence that contains the exact sequence SEQ ID NO. 112, has a minimum of 90% identity to SEQ ID NO. 111 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 111.

A preferred promoter comprises a cis-element as found in GAPDH promoters conferring light inducibility, preferably comprising SEQ ID NO. 120, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 119 and comprising SEQ ID NO. 120, and more preferably comprising SEQ ID NO. 119. Where such a promoter does not comprise the exact sequence SEQ ID NO. 119, it comprises a sequence that contains the exact sequence SEQ ID NO. 120, has a minimum of 90% identity to SEQ ID NO. 119 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 119.

A preferred promoter comprises an SBF-1 binding site, preferably comprising SEQ ID NO. 122, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 121 and comprising SEQ ID NO. 122, and more preferably comprising SEQ ID NO. 121. Where such a promoter does not comprise the exact sequence SEQ ID NO. 121, it comprises a sequence that contains the exact sequence SEQ ID NO. 122, has a minimum of 90% identity to SEQ ID NO. 121 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 121.

A preferred promoter comprises a sunflower homeodomain leucine-zipper protein Hahb-4 binding site, preferably comprising SEQ ID NO. 124, more preferably comprising a sequence having at least 80% identity to SEQ ID NO. 123 and comprising SEQ ID NO. 124, and more preferably comprising SEQ ID NO. 123. Where such a promoter does not comprise the exact sequence SEQ ID NO. 123, it comprises a sequence that contains the exact sequence SEQ ID NO. 124, has a minimum of 80% identity to SEQ ID NO. 123 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 123.

A preferred promoter comprises a binding site for the transcriptional repressor BELLRINGER, preferably comprising SEQ ID NO. 126, more preferably comprising a sequence having at least 80% identity to SEQ ID NO. 125 and comprising SEQ ID NO. 126, and more preferably comprising SEQ ID NO. 125. Where such a promoter does not comprise the exact sequence SEQ ID NO. 125, it comprises a sequence that contains the exact sequence SEQ ID NO. 126, has a minimum of 80% identity to SEQ ID NO. 125 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 125.

A preferred promoter comprises a binding site for the floral homeotic protein APETALA1, preferably comprising SEQ ID NO. 128, more preferably comprising a sequence having at least 85% identity to SEQ ID NO. 127 and comprising SEQ ID NO. 128, and more preferably comprising SEQ ID NO. 127. Where such a promoter does not comprise the exact sequence SEQ ID NO. 127, it comprises a sequence that contains the exact sequence SEQ ID NO. 128, has a minimum of 85% identity to SEQ ID NO. 127 and thus preferably differs by at most three nucleotides from the sequence of SEQ ID NO. 127, more preferably by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 127.

A preferred promoter comprises a binding site for inducer of CBF expression 1, also called AtMYC2 (rd22BP1), preferably comprising SEQ ID NO. 130, more preferably comprising a sequence having at least 85% identity to SEQ ID NO. 129 and comprising SEQ ID NO. 130, and more preferably comprising SEQ ID NO. 129. Where such a promoter does not comprise the exact sequence SEQ ID NO. 129, it comprises a sequence that contains the exact sequence SEQ ID NO. 130, has a minimum of 85% identity to SEQ ID NO. 129 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 129.

A preferred promoter comprises a binding site for bZIP factors DPBF-1 and/or 2, preferably comprising SEQ ID NO. 132, more preferably comprising a sequence having at least 81% identity to SEQ ID NO. 131 and comprising SEQ ID NO. 132, and more preferably comprising SEQ ID NO. 131. Where such a promoter does not comprise the exact sequence SEQ ID NO. 131, it comprises a sequence that contains the exact sequence SEQ ID NO. 132, has a minimum of 81% identity to SEQ ID NO. 131 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 131.

A preferred promoter comprises a binding site for Class I GATA factors, preferably comprising SEQ ID NO. 134, more preferably comprising a sequence having at least 88% identity to SEQ ID NO. 133 and comprising SEQ ID NO. 134, and more preferably comprising SEQ ID NO. 133. Where such a promoter does not comprise the exact sequence SEQ ID NO. 133, it comprises a sequence that contains the exact sequence SEQ ID NO. 134, has a minimum of 88% identity to SEQ ID NO. 133 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 133.

A preferred promoter comprises a binding site for the Dof2 single zinc finger transcription factor, preferably comprising SEQ ID NO. 136, more preferably comprising a sequence having at least 88% identity to SEQ ID NO. 135 and comprising SEQ ID NO. 136, and more preferably comprising SEQ ID NO. 135. Where such a promoter does not comprise the exact sequence SEQ ID NO. 135, it comprises a sequence that contains the exact sequence SEQ ID NO. 136, has a minimum of 88% identity to SEQ ID NO. 135 and thus preferably differs by at most two nucleotides and even more preferably by at most one nucleotide from the sequence of SEQ ID NO. 135.

A preferred promoter comprises a combination of two or more of the aforementioned transcription factor binding sites or cis-active elements. Preferably, the promoter comprises a TATA-box, a CPRF binding site, a TCP class I transcription factor binding site and a bZIP protein G-Box binding factor 1 binding site, each as defined above. Particularly preferred is a promoter comprising the sequence SEQ ID NO. 138 and/or SEQ ID NO. 139, or a sequence that is at least 70% identical, preferably at least 80% identical and more preferably at least 90% identical to any of these sequences, and even more preferably differs from any of these sequences by at most 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 nucleotide(s), in increasing order of preference. Where such a promoter comprises a sequence that is at least 70% identical to SEQ ID NO. 138, the promoter preferably comprises at least one binding site for each of the transcription factors CPRF, TCP class I transcription factor and bZIP protein G-Box binding factor 1 as defined above, and preferably comprises each of the sequences according to SEQ ID NO. 114, 118 and 120. Where such a promoter comprises a sequence that is at least 70% identical to SEQ ID NO. 139, the promoter preferably comprises at least one binding site for each of the transcription factors BELLRINGER, APETALA1, CBF expression inducer 1, DPBF-1 and 2, Class I GATA factors and Dof2 as defined above, and preferably comprises each of the sequences according to SEQ ID NO. 126, 128, 130, 132, 134 and 136.

The functions of the transcription factors referred to herein are known to the skilled person. By providing the corresponding transcription factor binding sites, e.g. as defined above, the skilled person achieves the benefits inherent in the action of these transcription factors. In particular, the skilled person can combine two or more, and preferably all, of the aforementioned transcription factor binding sites.

With respect to the present invention, the difference between nucleic acid sequences is calculated as the minimum number of substitutions, insertions or deletions required to transform one sequence into the other. Thus, for example, the sequences “ACGT” and “ATGT” differ by one nucleotide and have 75% sequence identity relative to the first sequence, and the sequences “AACCGGTT” and “AACTGTT” differ by two nucleotides, i.e. one deletion and one substitution, and have 75% sequence identity relative to the first sequence (six of its eight nucleotides being unchanged). For the purposes of the present invention, sequences are given in the form of DNA sequences, the corresponding RNA sequences being considered identical, such that a substitution of “T” by “U” and vice versa is disregarded.
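The difference measure defined above is the Levenshtein (edit) distance. The sketch below computes it with the standard dynamic-programming recurrence and derives identity relative to the first sequence as (length − differences) / length, a formula inferred here from the “ACGT”/“ATGT” example rather than stated explicitly in the text. The function names are hypothetical, and T/U swaps are disregarded by normalising both inputs to DNA letters.

```python
def normalise(seq: str) -> str:
    """Compare as DNA: upper-case, and treat RNA 'U' as 'T' (T/U swaps are disregarded)."""
    return seq.upper().replace("U", "T")


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions or deletions turning a into b
    (Levenshtein distance, computed row by row)."""
    a, b = normalise(a), normalise(b)
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, base_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, base_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete base_a
                current[j - 1] + 1,                    # insert base_b
                previous[j - 1] + (base_a != base_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def percent_identity(a: str, b: str) -> float:
    """Identity relative to the first sequence: (len(a) - differences) / len(a) * 100,
    floored at 0 so that very dissimilar sequences do not yield negative values."""
    length = len(normalise(a))
    return max(0.0, 100.0 * (length - edit_distance(a, b)) / length)


print(edit_distance("ACGT", "ATGT"), percent_identity("ACGT", "ATGT"))  # 1 75.0
```

A row-by-row table keeps memory proportional to the shorter dimension, which is ample for promoter-sized sequences.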

The promoter preferably has a length of at least or exactly 98 nucleotides, even more preferably at least or exactly 142 nucleotides, even more preferably at least or exactly 160 nucleotides, even more preferably at least or exactly 197 nucleotides, even more preferably at least or exactly 235 nucleotides and even more preferably at least or exactly 1063 nucleotides. A promoter having a length of not more than 98 nucleotides is particularly suitable for cloning of target genes under severe size limitation. Promoters of not more than 142, 160, 197 or 235 nucleotides are also still useful for cloning of target genes under severe size limitation, although those of not more than 160, 197 and 235 nucleotides are less preferred for this purpose due to their larger size. Suitable promoters are preferably selected from those given in any of SEQ ID NO. 141, 144, 147, 150, 153, 156 and 159, and also in any of SEQ ID NO. 171-220, 231-275 and 286-300. Suitable promoters are also preferably selected from those having at least 70%, more preferably at least 80% and more preferably at least 90% sequence identity to any of SEQ ID NO. 141, 144, 147, 150, 153, 156, 159, 171-220, 231-275 and 286-300, and preferably comprise two or more transcription factor binding sites as described above.

Preferred nucleic acid sequences comprising a combination of a promoter and an enhancer of the present invention are selected from those of SEQ ID NO. 143, 146, 149, 152, 155, 158 and 1, and also from those of SEQ ID NO. 161-170, 221-230, 276-285 and 301-310. The order of preference for the sequences of SEQ ID NO. 161-170, 221-230, 276-285 and 301-310 and the reasons therefor are given above.

The invention also provides an expression cassette comprising or consisting of a recombinant nucleic acid as described above. Where such recombinant nucleic acid does not already comprise a promoter, the expression cassette additionally comprises a promoter, preferably a plant promoter as described above. Thus, an expression cassette according to the present invention comprises, in 5′ to 3′ direction, a promoter, an untranslated region being or comprising the enhancer of the present invention, a target gene and optionally a terminator or other elements. The expression cassette of the present invention preferably comprises a promoter as defined above and an untranslated region or enhancer as described above. This way, the advantages attributed supra to the promoter and enhancer can be achieved using the expression cassette of the present invention. The expression cassette allows easy transfer of a target gene into an organism, preferably a cell and more preferably a plant cell.
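The 5′ to 3′ element order of the expression cassette can be modelled as a simple data structure. This is a hypothetical illustration: the class, its field names and the toy sequences are assumptions made for this sketch, and real cassettes would of course carry the actual promoter, enhancer, target gene and terminator sequences. `layout` reports where each element lies in the assembled sequence, skipping the optional terminator when it is absent.

```python
from dataclasses import dataclass


@dataclass
class ExpressionCassette:
    """Hypothetical model of the 5'->3' element order: promoter, untranslated
    region being or comprising the enhancer, target gene, optional terminator."""
    promoter: str
    utr_enhancer: str
    target_gene: str
    terminator: str = ""  # optional element; empty string means "absent"

    ORDER = ("promoter", "utr_enhancer", "target_gene", "terminator")  # class constant, not a field

    def layout(self) -> list[tuple[str, int, int]]:
        """(element name, start, end) as 0-based, half-open coordinates in 5'->3' order."""
        coords, position = [], 0
        for name in self.ORDER:
            length = len(getattr(self, name))
            if length:  # skip absent optional elements
                coords.append((name, position, position + length))
                position += length
        return coords

    def assemble(self) -> str:
        """Concatenate the elements 5'->3' into one upper-case DNA string."""
        return "".join(getattr(self, name).upper() for name in self.ORDER)


cassette = ExpressionCassette(promoter="AAAA", utr_enhancer="CCC",
                              target_gene="ATGTAA", terminator="GG")
print(cassette.assemble())  # AAAACCCATGTAAGG
print(cassette.layout())
```

Keeping the order in one tuple makes it hard to place the enhancer-containing untranslated region anywhere other than between promoter and target gene.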

Thus, the expression cassette of the present invention preferably comprises a promoter which in turn comprises

    • a TATA-box, preferably comprising SEQ ID NO. 108, more preferably comprising a sequence having at least 89% identity to SEQ ID NO. 107 and comprising SEQ ID NO. 108, and more preferably comprising SEQ ID NO. 107, and
    • a CPRF factor binding site, preferably comprising SEQ ID NO. 114, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 113 and comprising SEQ ID NO. 114, and more preferably comprising SEQ ID NO. 113, and
    • a TCP class I transcription factor binding site, preferably comprising SEQ ID NO. 116, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 115 and comprising SEQ ID NO. 116, and more preferably comprising SEQ ID NO. 115, and
    • a bZIP protein G-Box binding factor 1 binding site, preferably comprising SEQ ID NO. 118, more preferably comprising a sequence having at least 85% identity to SEQ ID NO. 117 and comprising SEQ ID NO. 118, and more preferably comprising SEQ ID NO. 117,
      and preferably also comprises at least one, preferably at least two, more preferably at least three and most preferably all of the following elements:
    • a Ry motif, preferably comprising SEQ ID NO. 110, more preferably comprising a sequence having at least 88% identity to SEQ ID NO. 109 and comprising SEQ ID NO. 110, and more preferably comprising SEQ ID NO. 109,
    • a prolamin box, preferably comprising SEQ ID NO. 112, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 111 and comprising SEQ ID NO. 112, and more preferably comprising SEQ ID NO. 111,
    • a Cis-element as in GAPDH promoters conferring light inducibility, preferably comprising SEQ ID NO. 120, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 119 and comprising SEQ ID NO. 120, and more preferably comprising SEQ ID NO. 119,
    • a SBF-1 binding site, preferably comprising SEQ ID NO. 122, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 121 and comprising SEQ ID NO. 122, and more preferably comprising SEQ ID NO. 121, and
    • a Sunflower homeodomain leucine-zipper protein Hahb-4 binding site, preferably comprising SEQ ID NO. 124, more preferably comprising a sequence having at least 90% identity to SEQ ID NO. 123 and comprising SEQ ID NO. 124, and more preferably comprising SEQ ID NO. 123.
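The preference ladder in this list (four elements that should be present, plus at least one, two, three or all five further elements) can be expressed as a simple check. This is a sketch only: the element labels are informal names chosen for illustration, and the set of detected elements is assumed to come from a prior motif search that is not shown.

```python
# Informal labels for the elements listed above (not official identifiers).
REQUIRED = frozenset({"TATA-box", "CPRF", "TCP class I", "GBF1"})
OPTIONAL = frozenset({"Ry motif", "prolamin box", "GAPDH light element", "SBF-1", "Hahb-4"})


def assess_promoter(found: set[str]) -> tuple[bool, int]:
    """Return (all four listed core elements present?, number of further elements present)."""
    return REQUIRED <= found, len(OPTIONAL & found)


def preference(found: set[str]) -> str:
    """Translate the assessment into the preference wording used in the list."""
    has_required, n_optional = assess_promoter(found)
    if not has_required:
        return "missing required element(s)"
    if n_optional == len(OPTIONAL):
        return "all optional elements (most preferred)"
    return f"{n_optional} of {len(OPTIONAL)} optional elements"


print(preference({"TATA-box", "CPRF", "TCP class I", "GBF1", "SBF-1"}))  # 1 of 5 optional elements
```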

Also preferably, the promoter comprises or consists of

    • a) a nucleic acid according to any of SEQ ID NO. 141, 142, 144, 145, 147, 148, 150, 151, 153, 154, 156, 157, 159 to 310, or
    • b) a nucleic acid having at least 70% sequence identity to any of the nucleic acid sequences according to a).

The advantages conferred with such promoters are described above.

Most preferred is a promoter-enhancer combination comprising or consisting of the sequence according to SEQ ID NO. 1 or of a sequence having at least 70% sequence identity to the sequence of SEQ ID NO. 1. Such a sequence is found in flax (Linum usitatissimum) and allows seed-specific and highly active expression of more or less any target gene expressible in plant seeds. Interestingly, the advantages conferred by the combination of a promoter, particularly the promoter found in SEQ ID NO. 1, and the enhancer of the present invention had not been noticed despite attempts in the prior art to analyze the characteristics of the promoter. For example, in WO0116340 a construct was created comprising the promoter found in SEQ ID NO. 1, but with the enhancer region deleted. Thus, only a sequence as given in SEQ ID NO. 311 was fused in this document to a GUS reporter gene. However, it has now been found that by including the enhancer of the present invention, a substantial increase of reporter gene activity can be achieved.

The expression cassette of the present invention is preferably comprised in a vector. The vector of the present invention thus allows a cell, preferably a plant cell, to be transformed with a long target gene or a combination of multiple genes while achieving high expression or activity of the target gene functionally linked to the enhancer of the present invention.

Correspondingly, the invention provides a plant, plant organ or plant cell comprising an expression cassette according to the present invention or a recombinant nucleic acid according to the present invention. The recombinant nucleic acid should, of course, also comprise a promoter so as to allow expression of the target gene, because the enhancer of the present invention cannot increase the expression or activity of a target gene that is not expressed at all for lack of a promoter. The plant, plant organ or plant cell makes use of the advantages conferred by the enhancer, recombinant nucleic acid or expression cassette of the present invention, such that expression or activity of the target gene is increased compared to a plant, plant organ or plant cell comprising the same promoter and target gene combination without the enhancer of the present invention.

Accordingly, the invention also provides a method of increasing expression or activity of a target gene, comprising the steps of

    • i) providing, upstream of the target gene, an untranslated region and a plant promoter to obtain an expression cassette according to the present invention, and
    • ii) introducing the expression cassette into a plant cell.

The enhancer is, in accordance with the indications given above, preferably heterologous to the promoter and/or to the target gene. The expression cassette is introduced into the plant cell to allow for expression of the target gene in the plant cell or in plant cells derived from the exact plant cell that was subjected to introduction of the expression cassette. Thus, the above method of the invention encompasses the introduction of the expression cassette into a first plant cell and the growth of further cells from the first cell, wherein the further cells can form, for example, a full plant or a plant organ, preferably a seed. Depending on the promoter of the expression cassette, the target gene is expressed in one or more of the further cells, or during a selected stage of growth, for example during seed formation, or under selected environmental conditions, for example heat or drought stress or pathogen infection.

Also as described above the enhancer or expression cassette of the present invention is used for

    • increasing expression or activity of a target gene,
    • producing a vector according to the present invention, and/or for
    • producing a plant, plant organ or plant cell according to the present invention.

The advantages conferred by the above uses have been described supra in detail.

Unless indicated otherwise, the following definitions apply for the current invention:

The term “nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof (“polynucleotides”) in either single- or double-stranded form, composed of monomers (nucleotides) containing a sugar, phosphate and a base, which is either a purine or pyrimidine. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides, which have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g. degenerate codon substitutions) and complementary sequences as well as the sequence explicitly indicated.

A “codon” is a nucleotide sequence of three nucleotides which encodes a specific amino acid.

One nucleotide sequence can be “complementary” to another sequence, meaning that the base at each position pairs with the complementary base (i.e. A with T, C with G) at the corresponding position of the other sequence, read in the reverse order. If one strand of a double-stranded DNA is considered the “sense” strand, then the other strand, considered the “antisense” strand, has the sequence complementary to the sense strand. The distinction reflects that the “sense” sequence codes for the protein, whereas the complementary “antisense” sequence does not itself code for it.
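The relationship between a strand and its complementary strand, read in reverse order, can be made concrete with a small helper. This is a hypothetical function written for illustration: it returns DNA letters, pairs an RNA “U” with “A” just like “T”, and leaves any other character (for example “N”) unchanged.

```python
# Map each base to its Watson-Crick partner; an RNA 'U' is paired with 'A' like 'T'.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order (output uses DNA letters)."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))      # GCAT
print(reverse_complement("GATTACA"))   # TGTAATC
```

Applying the function twice returns the original DNA sequence, which mirrors the fact that the antisense strand of the antisense strand is the sense strand.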

A “nucleic acid fragment” is a fragment of a given nucleic acid molecule.

“Genetic elements” are nucleic acid fragments constituting discrete building blocks such as genes, introns, promoters, etc.

In higher plants, deoxyribonucleic acid (DNA) is the genetic material while ribonucleic acid (RNA) is involved in the transfer of information contained within DNA into proteins. The term “nucleotide sequence” refers to a polymer of DNA or RNA which can be single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases capable of incorporation into DNA or RNA polymers.

The terms “nucleic acid” or “nucleic acid sequence” or “polynucleotide sequence” are used interchangeably.

The “degeneracy code” reflects the redundancy of the genetic code, which is nevertheless unambiguous. For example, although codons GAA and GAG both specify glutamic acid (redundancy), neither of them specifies any other amino acid (no ambiguity). Degeneracy results because there are more codons than amino acids to be encoded. Degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer 1991; Ohtsuka 1985; Rossolini 1994).
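The effect of a mixed-base third position can be checked by expanding a degenerate codon into all concrete codons and translating them. The sketch below uses the standard IUPAC nucleotide codes and only a small excerpt of the standard genetic code (enough for the codons shown), so looking up codons outside that excerpt raises a `KeyError`. Deoxyinosine residues are not modelled, and the helper names are hypothetical.

```python
from itertools import product

# IUPAC mixed-base codes (standard nomenclature).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Excerpt of the standard genetic code, limited to the codons used below.
CODON_TABLE = {
    "GAA": "E", "GAG": "E",                          # glutamic acid
    "GAT": "D", "GAC": "D",                          # aspartic acid
    "CTA": "L", "CTC": "L", "CTG": "L", "CTT": "L",  # leucine
}


def expand(codon: str) -> list[str]:
    """All concrete codons represented by a codon that may contain mixed-base codes."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in codon.upper()))]


def encoded_amino_acids(codon: str) -> set[str]:
    """One-letter amino acids encoded by the expanded codons."""
    return {CODON_TABLE[c] for c in expand(codon)}


print(expand("GAR"), encoded_amino_acids("GAR"))  # ['GAA', 'GAG'] {'E'}
print(encoded_amino_acids("CTN"))                 # {'L'}
```

A substitution such as GAR keeps the encoded amino acid unchanged (redundancy), while GAN would mix glutamic and aspartic acid, so the choice of mixed base at the third position matters.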

The term “gene” is used broadly to refer to any segment of nucleic acid associated with a biological function. Thus, genes include coding sequences and/or the regulatory sequences required for their expression. For example, gene refers to a nucleic acid fragment that expresses mRNA or functional RNA, or encodes a specific protein, and which includes regulatory sequences. Genes also include non-expressed DNA segments that, for example, form recognition sequences for other proteins. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.

The terms “genome” or “genomic DNA” refer to the heritable genetic information of a host organism. Said genomic DNA comprises the DNA of the nucleus (also referred to as chromosomal DNA) but also the DNA of the plastids (e.g. chloroplasts) and other cellular organelles (e.g. mitochondria). Preferably, the terms genome or genomic DNA refer to the chromosomal DNA of the nucleus.

The term “chromosomal DNA” or “chromosomal DNA-sequence” is to be understood as the genomic DNA of the cellular nucleus, independent of the cell cycle status. Chromosomal DNA might therefore be organized in chromosomes or chromatids, and it might be condensed or uncoiled. An insertion into the chromosomal DNA can be demonstrated and analyzed by various methods known in the art, e.g. polymerase chain reaction (PCR) analysis, Southern blot analysis, fluorescence in situ hybridization (FISH) and in situ PCR.

“Coding sequence” refers to a DNA or RNA molecule that codes for a specific amino acid sequence and excludes the “non-coding sequences”. It may constitute an “uninterrupted coding sequence”, i.e. lacking an intron, such as in a cDNA, or it may include one or more introns bounded by appropriate splice junctions. An “intron” is a sequence of RNA which is contained in the primary transcript but which is removed through cleavage and re-ligation of the RNA within the cell to create the mature mRNA that can be translated into a protein.
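The outcome of intron removal can be sketched as joining the exon segments that remain once known intron intervals are cut out. This only models the result, not the splicing machinery that recognizes the splice junctions; the function name, the assumption that intron coordinates are already known, and the toy transcript are all hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Join the exons left after removing introns given as 0-based, half-open
    (start, end) intervals; intervals must not overlap."""
    exons, position = [], 0
    for start, end in sorted(introns):
        if start < position or end < start:
            raise ValueError(f"invalid or overlapping intron ({start}, {end})")
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # continue after the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Hypothetical transcript: exon "AUG", intron "GUAAAG", exon "GCUUAA".
print(splice("AUGGUAAAGGCUUAA", [(3, 9)]))  # AUGGCUUAA
```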

A “regulatory sequence” refers to nucleotide molecules influencing the transcription, RNA processing or stability, or translation of the associated (or functionally linked) nucleotide molecules to be transcribed. The transcription regulating nucleotide molecule may have various localizations with respect to the nucleotide molecules to be transcribed. The transcription regulating nucleotide molecule may be located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of the molecule to be transcribed (e.g., a coding sequence). The transcription regulating nucleotide molecule may be selected from the group comprising enhancers, promoters, translation leader sequences, introns, 5′-untranslated sequences, 3′-untranslated sequences, and polyadenylation signal sequences. They include natural and synthetic molecules as well as molecules which may be a combination of synthetic and natural molecules. The term “transcription regulating nucleotide molecule” is not limited to promoters. However, preferably a transcription regulating nucleotide molecule of the invention comprises at least one promoter molecule (e.g., a molecule localized upstream of the transcription start of a gene and capable of inducing transcription of the downstream molecules). In one preferred embodiment, the transcription regulating nucleotide molecule of the invention comprises the promoter molecule of the corresponding gene and, optionally and preferably, the native 5′-untranslated region of said gene. Furthermore, the 3′-untranslated region and/or the polyadenylation region of said gene may also be employed. As used herein, the term “cis-element” or “promoter motif” refers to a cis-acting transcriptional regulatory element that confers an aspect of the overall control of gene expression. A cis-element may function to bind transcription factors, i.e. trans-acting protein factors that regulate transcription.
Some cis-elements bind more than one transcription factor, and transcription factors may interact with different affinities with more than one cis-element.

A “functional RNA” refers to an antisense RNA, microRNA, siRNA, ribozyme, or other RNA that is not translated.

“Transcription” takes place when RNA polymerase makes an RNA copy, e.g. mRNA, of a DNA template. “mRNA” conveys genetic information from DNA to the ribosome, where it specifies the amino acid sequence of the protein products of gene expression. Non-eukaryotic mRNA is, in essence, mature upon transcription and normally requires no processing. Eukaryotic pre-mRNA requires “processing”, meaning that the pre-mRNA is modified post-transcriptionally. Processing includes, e.g., 5′ cap addition, splicing and polyadenylation.

The term “RNA transcript” refers to the product resulting from RNA polymerase catalyzed transcription of a DNA molecule. When the RNA transcript is a perfect complementary copy of the DNA molecule, it is referred to as the primary transcript; an RNA molecule derived from posttranscriptional processing of the primary transcript is referred to as the mature RNA.

“Messenger RNA” (mRNA) refers to the RNA that is without introns and that can be translated into protein by the cell.

“cDNA” refers to a single- or a double-stranded DNA that is complementary to and derived from mRNA.

The terms “open reading frame” and “ORF” refer to the amino acid sequence encoded between translation initiation and termination codons of a coding sequence. “Translation” proceeds in four phases: initiation, elongation, translocation and termination. The terms “initiation codon” and “termination codon” refer to a unit of three adjacent nucleotides (“codon”) in a coding sequence that specifies initiation and chain termination, respectively, of protein synthesis (mRNA translation). Initiation involves the small subunit of the ribosome binding to the 5′ end of the mRNA with the help of initiation factors (IF). The start codon is the first codon of an mRNA transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and a modified Met (fMet) in prokaryotes. The most common start codon is AUG. Termination of the polypeptide happens when the A site of the ribosome faces a stop codon (UAA, UAG, or UGA).
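Locating an open reading frame between an initiation and a termination codon can be sketched as a scan over the three reading frames. This hypothetical helper makes simplifying assumptions: it searches only the given strand, accepts only ATG/AUG as start codon (the most common one, as noted above), starts each ORF at the first ATG following the previous in-frame stop, and ignores start codons without a downstream in-frame stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str) -> list[tuple[int, int]]:
    """(start, end) of ORFs on the given strand, 0-based and half-open, from an ATG
    to the first in-frame stop codon (stop codon included)."""
    seq = seq.upper().replace("U", "T")  # accept RNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                    # first ATG after the previous stop
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))  # include the stop codon
                start = None
    return sorted(orfs)


print(find_orfs("CCATGGCTTAAGG"))  # [(2, 11)], i.e. ATG GCT TAA
```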

“5′ non-coding sequence” or “5′-untranslated sequence” or “-region” refers to a sequence of a nucleotide molecule located 5′ (upstream) to the coding sequence. It is present in the fully processed mRNA upstream of the initiation codon and may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.

“3′ non-coding sequence” or “3′-untranslated sequence” or “-region” refers to a sequence of a nucleotide molecule located 3′ (downstream) to a coding sequence and includes polyadenylation signal sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3′ end of the mRNA precursor. The use of different 3′ non-coding sequences is exemplified by Ingelbrecht et al., 1989.

“Promoter” refers to a nucleotide molecule, usually upstream (5′) of its coding sequence, which controls the expression of the coding sequence by providing the recognition site for RNA polymerase and other factors required for proper transcription. “Promoter” includes a minimal promoter, i.e. a short DNA sequence composed of a TATA box and other sequences that serve to specify the site of transcription initiation, to which regulatory elements are added for control of expression. “Promoter” also refers to a nucleotide molecule that includes a minimal promoter plus regulatory elements and that is capable of controlling the expression of a coding sequence or functional RNA. This type of promoter molecule consists of proximal and more distal upstream elements, the latter elements often being referred to as enhancers. Accordingly, such an “enhancer” is a DNA molecule which can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. It is capable of operating in both orientations (normal or flipped), and is capable of functioning even when moved either upstream or downstream from the promoter. Both enhancers and other upstream promoter elements bind sequence-specific DNA-binding proteins that mediate their effects. Promoters may be derived in their entirety from a native gene, be composed of different elements derived from different promoters found in nature, or even consist of synthetic DNA segments. A promoter may also contain DNA sequences that are involved in the binding of protein factors which control the effectiveness of transcription initiation in response to physiological or developmental conditions.
A person skilled in the art is aware of methods for converting a unidirectional promoter into a bidirectional promoter and of methods for using the complement or reverse complement of a promoter sequence to create a promoter having the same promoter specificity as the original sequence. Such methods are described, for constitutive as well as inducible promoters, for example by Xie et al. (2001) “Bidirectionalization of polar promoters in plants”, Nature Biotechnology 19: 677-679. The authors describe that it is sufficient to add a minimal promoter to the 5′ end of any given promoter to obtain a promoter controlling expression in both directions with the same promoter specificity. The promoters of the present invention desirably contain cis-elements that can confer or modulate gene expression, also called transcription factor binding sites. Cis-elements can be identified by a number of techniques, including deletion analysis, i.e., deleting one or more nucleotides from the 5′ end or internal to a promoter; DNA binding protein analysis using DNase I footprinting, methylation interference, electrophoretic mobility-shift assays, in vivo genomic footprinting by ligation-mediated PCR, and other conventional assays; or DNA sequence similarity analysis with known cis-element motifs by conventional DNA sequence comparison methods. The fine structure of a cis-element can be further studied by mutagenesis (or substitution) of one or more nucleotides or by other conventional methods. Cis-elements can be obtained by chemical synthesis or by isolation from promoters that include such elements, and they can be synthesized with additional flanking nucleotides that contain useful restriction enzyme sites to facilitate subsequent manipulation.

The “initiation site” is the position surrounding the first nucleotide that is part of the transcribed sequence, which is also defined as position +1. With respect to this site all other sequences of the gene and its controlling regions are numbered. Downstream sequences (i.e., further protein encoding sequences in the 3′ direction) are denominated positive, while upstream sequences (mostly of the controlling regions in the 5′ direction) are denominated negative.
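Converting between ordinary 0-based string indices and the +1/-1 numbering described above is a frequent source of off-by-one errors. The sketch below assumes the common convention, not spelled out in the text, that there is no position 0 (the base immediately upstream of +1 is -1); the function names are hypothetical.

```python
def to_relative(index: int, tss_index: int) -> int:
    """Map a 0-based sequence index to a position relative to the initiation site.

    tss_index is the 0-based index of the first transcribed nucleotide (+1).
    Downstream positions are positive, upstream positions negative, and
    position 0 is skipped.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def to_index(position: int, tss_index: int) -> int:
    """Inverse of to_relative: map a relative position back to a 0-based index."""
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return tss_index + position - 1 if position > 0 else tss_index + position


# With the initiation site at index 100, index 99 is -1 and index 100 is +1.
print(to_relative(99, 100), to_relative(100, 100))  # -1 1
```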

Promoter elements, particularly a TATA element, that are inactive or that have greatly reduced promoter activity in the absence of upstream activation are referred to as “minimal or core promoters.” In the presence of a suitable transcription factor, the minimal promoter functions to permit transcription. A “minimal or core promoter” thus consists only of the basal elements needed for transcription initiation, e.g., a TATA box and/or an initiator.

“Constitutive promoter” refers to a promoter that is able to express the open reading frame (ORF) that it controls in all or nearly all of the plant tissues during all or nearly all developmental stages of the plant. Each of the transcription-activating elements does not exhibit an absolute tissue specificity, but mediates transcriptional activation in most plant parts at a level of at least 1% of the level reached in the part of the plant in which transcription is most active.
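The 1% criterion above can be checked against a table of per-tissue expression levels. Because the definition says "most plant parts" rather than "all", this sketch reports which tissues fall below 1% of the peak instead of returning a strict pass/fail. The function name and the readings are hypothetical, and all levels are assumed to be measured with the same method.

```python
def tissues_below_threshold(levels: dict[str, float], min_fraction: float = 0.01) -> list[str]:
    """Return tissues whose expression is below min_fraction of the peak tissue.

    levels maps tissue name -> expression level, all measured the same way.
    """
    if not levels:
        raise ValueError("no expression data supplied")
    peak = max(levels.values())
    cutoff = min_fraction * peak
    return sorted(tissue for tissue, value in levels.items() if value < cutoff)


# Hypothetical reporter readings: the peak is 300, so the 1% cutoff is 3.0.
readings = {"leaf": 120.0, "root": 45.0, "seed": 300.0, "flower": 2.5}
print(tissues_below_threshold(readings))  # ['flower']
```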

“Regulated promoter” refers to promoters that direct gene expression not constitutively, but in a temporally- and/or spatially-regulated manner, and includes both tissue-specific and inducible promoters. It includes natural and synthetic molecules as well as molecules which may be a combination of synthetic and natural molecules. Different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. New promoters of various types useful in plant cells are constantly being discovered; numerous examples may be found in the compilation by Okamuro et al. (1989). Typical regulated promoters useful in plants include but are not limited to safener-inducible promoters, promoters derived from the tetracycline-inducible system, promoters derived from salicylate-inducible systems, promoters derived from alcohol-inducible systems, promoters derived from glucocorticoid-inducible systems, promoters derived from pathogen-inducible systems, and promoters derived from ecdysone-inducible systems.

“Tissue-specific promoter” refers to regulated promoters that are not active in all plant cells but only in one or more cell types in specific organs (such as leaves or seeds), specific tissues (such as epidermis, green tissue, embryo or cotyledon), or specific cell types (such as leaf parenchyma or seed storage cells). These also include promoters that are temporally regulated, such as in early or late embryogenesis, during leaf expansion, during fruit ripening, in developing seeds or fruit, in fully differentiated leaves, or at the onset of senescence.

“Tissue-specific transcription” in the context of this invention means the transcription of a nucleic acid molecule by a transcription regulating nucleic acid molecule in such a way that transcription of said nucleic acid molecule in said tissue contributes more than 90%, preferably more than 95%, more preferably more than 99% of the entire quantity of the RNA transcribed from said nucleic acid molecule in the entire plant during any of its developmental stages. The transcription regulating nucleotide molecules specifically disclosed herein are considered to be tissue-specific transcription regulating nucleotide molecules.

“Tissue-preferential transcription” in the context of this invention means the transcription of a nucleic acid molecule by a transcription regulating nucleic acid molecule in such a way that transcription of said nucleic acid sequence in said tissue contributes more than 50%, preferably more than 70%, more preferably more than 80% of the entire quantity of the RNA transcribed from said nucleic acid sequence in the entire plant during any of its developmental stages.
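Both definitions reduce to the share of the plant's total transcript that is found in one tissue. A minimal sketch using only the broadest thresholds (more than 90% and more than 50%); the function names and transcript quantities are hypothetical. Transcription meeting the tissue-specific threshold also meets the tissue-preferential one, so the stricter label is returned.

```python
def tissue_share(rna_by_tissue: dict[str, float], tissue: str) -> float:
    """Fraction of the plant's total transcript from this molecule found in `tissue`."""
    total = sum(rna_by_tissue.values())
    if total <= 0:
        raise ValueError("total RNA quantity must be positive")
    return rna_by_tissue[tissue] / total


def classify_transcription(rna_by_tissue: dict[str, float], tissue: str) -> str:
    """Apply the broadest thresholds of the two definitions (>90% and >50%)."""
    share = tissue_share(rna_by_tissue, tissue)
    if share > 0.90:
        return "tissue-specific"
    if share > 0.50:
        return "tissue-preferential"
    return "neither"


# Hypothetical transcript quantities summed over all developmental stages.
print(classify_transcription({"seed": 92.0, "leaf": 5.0, "root": 3.0}, "seed"))  # tissue-specific
print(classify_transcription({"seed": 60.0, "leaf": 30.0, "root": 10.0}, "seed"))  # tissue-preferential
```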

“Inducible promoter” refers to those regulated promoters that can be turned on in one or more cell types or that cause increased expression upon an external stimulus, such as a chemical, light, hormone, stress, or a pathogen.

A terminator, or transcription terminator, is a section of genetic sequence that marks the end of a gene or operon in genomic DNA during transcription.

The term “translation leader sequence” refers to that DNA sequence portion of a gene between the promoter and coding sequence that is transcribed into RNA and is present in the fully processed mRNA upstream (5′) of the translation start codon. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.

As part of gene expression, “translation” is the process through which cellular ribosomes manufacture proteins. In translation, messenger RNA (mRNA) produced by transcription is decoded by the ribosome to produce a specific amino acid chain, or polypeptide, that will later fold into an active protein. In bacteria, translation occurs in the cell's cytoplasm, where the large and small subunits of the ribosome bind to the mRNA. In eukaryotes, translation also takes place in the cytoplasm; proteins destined for the secretory pathway are synthesized by ribosomes bound to the endoplasmic reticulum and are translocated across its membrane as they are made, a process called vectorial synthesis. The ribosome facilitates decoding by promoting the binding of transfer RNAs (tRNAs) whose anticodons are complementary to the codons of the mRNA.
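Codon-by-codon decoding can be illustrated with the standard genetic code. This is a deliberately simplified sketch: it starts at the first AUG, ignores initiation context, reads codons until the first stop codon, and lets a lookup table stand in for the tRNA adaptors. The 64-letter string is NCBI translation table 1 (the standard code) with codons ordered TTT, TTC, TTA, TTG, TCT, and so on; `translate` is a hypothetical helper name, and bases other than A, C, G, T/U are not handled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    seq = mrna.upper().replace("U", "T")  # table is written in DNA letters
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUGGUAAGG"))  # MAW (AUG-GCC-UGG, then stop UAA)
```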

The Kozak sequence on an mRNA molecule is recognized by the ribosome as the translational start site, from which a protein is coded by that mRNA molecule. The ribosome requires this sequence, or a variation of it, to initiate translation. The sequence is identified by the notation (gcc)gccRccAUGG, which summarizes data analysed by Kozak from a wide variety of sources (699 in all; Kozak M (October 1987). “An analysis of 5′-noncoding sequences from 699 vertebrate messenger RNAs”. Nucleic Acids Res. 15 (20): 8125-8148.) as follows: a lower case letter denotes the most common base at a position where the base can nevertheless vary; upper case letters indicate highly conserved bases, i.e. the ‘AUGG’ sequence is constant or rarely changes; ‘R’ indicates that a purine (adenine or guanine) is always observed at this position (with adenine being claimed by Kozak to be more frequent); and the sequence in brackets ((gcc)) is of uncertain significance. Preferably, the Kozak consensus sequence is that of Arabidopsis thaliana, AAA-AUG-GC.
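In the consensus gccRccAUGG, counting the A of AUG as +1, the purine R sits at -3 and the conserved G at +4. A minimal sketch that inspects those two positions around a given AUG and checks the Arabidopsis AAA-AUG-GC context. The "strong/adequate/weak" labels follow a convention common in the literature on Kozak context and are not defined in this document; the function names are hypothetical.

```python
def kozak_context(mrna: str, atg_index: int) -> str:
    """Classify the context of the AUG starting at atg_index (0-based).

    Position +1 is the A of AUG, so -3 is atg_index - 3 and +4 is the base
    immediately after the AUG. "strong": R at -3 and G at +4; "adequate":
    only one of the two; "weak": neither.
    """
    seq = mrna.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no AUG at the given index")
    purine_at_minus3 = atg_index >= 3 and seq[atg_index - 3] in "AG"
    g_at_plus4 = atg_index + 3 < len(seq) and seq[atg_index + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


def matches_arabidopsis_context(mrna: str, atg_index: int) -> bool:
    """True if the AUG is embedded in the AAA-AUG-GC context."""
    seq = mrna.upper().replace("U", "T")
    return atg_index >= 3 and seq[atg_index - 3:atg_index + 5] == "AAAATGGC"


print(kozak_context("GCCACCAUGG", 6))  # strong
```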

A transfer RNA (tRNA) is an adaptor molecule composed of RNA that serves as the physical link between the nucleotide sequence of nucleic acids (DNA and RNA) and the amino acid sequence of proteins. It does this by carrying an amino acid to the protein synthetic machinery of a cell (i.e. the ribosome) as directed by a codon in the mRNA.

“Expression” refers to the transcription and/or translation of an endogenous gene, ORF or portion thereof, or a transgene in plants. For example, in the case of antisense constructs, expression may refer to the transcription of the antisense DNA only. In addition, expression refers to the transcription and stable accumulation of sense (mRNA) or functional RNA. Expression may also refer to the production of protein.

The “expression pattern” of a promoter (with or without enhancer) is the pattern of expression levels, which shows where in the plant and in what developmental stage transcription is initiated by said promoter. Expression patterns of a set of promoters are said to be complementary when the expression pattern of one promoter shows little overlap with the expression pattern of the other promoter. The level of expression of a promoter can be determined by measuring the “steady state” concentration of a standard transcribed reporter mRNA. This measurement is indirect, since the concentration of the reporter mRNA depends not only on its synthesis rate, but also on the rate at which the mRNA is degraded. Therefore, the steady state level reflects both the synthesis rate and the degradation rate; for first-order decay it equals the synthesis rate divided by the degradation rate constant. The rate of degradation can, however, be considered to proceed at a fixed rate when the transcribed molecules are identical, and thus this value can serve as a measure of synthesis rates. When promoters are compared in this way, techniques available to those skilled in the art are S1 nuclease hybridization analysis, northern blots and competitive RT-PCR. This list of techniques in no way represents all available techniques, but rather describes commonly used procedures to analyze transcription activity and expression levels of mRNA. The analysis of transcription start points in practically all promoters has revealed that there is usually no single base at which transcription starts, but rather a more or less clustered set of initiation sites, each of which accounts for some of the start points of the mRNA. Since this distribution varies from promoter to promoter, the sequences of the reporter mRNA in each of the populations would differ from each other. Since different mRNA species differ in their susceptibility to degradation, no single degradation rate can be expected for different reporter mRNAs.
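The dependence of the steady-state level on both rates can be made concrete with a simple kinetic model, dM/dt = k_syn - k_deg·M, whose steady state is M = k_syn / k_deg. The model (constant synthesis, first-order decay) and the rate values are illustrative assumptions. The sketch integrates the equation numerically and shows that two promoters with identical synthesis rates but differently stable transcripts reach different steady-state levels.

```python
def simulate_mrna(k_syn: float, k_deg: float, t_end: float,
                  dt: float = 0.01, m0: float = 0.0) -> float:
    """Integrate dM/dt = k_syn - k_deg * M with the forward Euler method."""
    m = m0
    for _ in range(int(round(t_end / dt))):
        m += (k_syn - k_deg * m) * dt
    return m


# Same synthesis rate, different mRNA stability (arbitrary units).
stable = simulate_mrna(k_syn=10.0, k_deg=0.5, t_end=100.0)
unstable = simulate_mrna(k_syn=10.0, k_deg=1.0, t_end=100.0)
print(round(stable, 3), round(unstable, 3))  # 20.0 10.0
```

Halving the degradation rate doubles the steady-state level even though transcription is unchanged, which is why steady-state mRNA is only an indirect measure of promoter activity.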
It has been shown for various eukaryotic promoter molecules that the sequence surrounding the initiation site (“initiator”) plays an important role in determining the level of RNA expression directed by that specific promoter. This also includes part of the transcribed sequences. The direct fusion of promoter to reporter molecules would therefore lead to suboptimal levels of transcription. A commonly used procedure to analyze expression patterns and levels is the determination of the “steady state” level of protein accumulation in a cell. Commonly used candidates for the reporter gene, known to those skilled in the art, are beta-glucuronidase (GUS), chloramphenicol acetyl transferase (CAT) and proteins with fluorescent properties, such as green fluorescent protein (GFP) from Aequorea victoria. In principle, however, many more proteins are suitable for this purpose, provided the protein does not interfere with essential plant functions. A number of tools are suited for quantification and determination of localization. Detection systems can readily be created or are available which are based on, e.g., immunochemical, enzymatic, fluorescent detection and quantification. Protein levels can be determined in plant tissue extracts or in intact tissue using in situ analysis of protein expression. Generally, individual transformed lines with one chimeric promoter reporter construct will vary in their levels of expression of the reporter gene. It is also frequently observed that such transformants do not express any detectable product (RNA or protein). The variability in expression is commonly ascribed to ‘position effects’, although the molecular mechanisms underlying this inactivity are usually not clear.

Preferably, the level of expression of a promoter of the current invention is analyzed on the basis of the target gene activity (conversion efficiency) as calculated by the sum of target gene products (in the examples below: ARA and EPA) divided by the total of target gene substrates and products (in the examples below: 20:3n-6, 20:4n-3, ARA and EPA).
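The conversion efficiency defined above is the ratio (ARA + EPA) / (20:3n-6 + 20:4n-3 + ARA + EPA). A minimal sketch with a hypothetical function name; the fatty acid values (e.g., percent of total fatty acids) are invented purely to show the arithmetic and are not measured data.

```python
def conversion_efficiency(substrates: dict[str, float], products: dict[str, float]) -> float:
    """Sum of products divided by the sum of substrates and products."""
    total_products = sum(products.values())
    total = total_products + sum(substrates.values())
    if total <= 0:
        raise ValueError("no substrate or product detected")
    return total_products / total


# Invented example values, not measured data.
substrates = {"20:3n-6": 2.0, "20:4n-3": 1.0}
products = {"ARA": 10.0, "EPA": 7.0}
print(conversion_efficiency(substrates, products))  # 0.85
```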

“Constitutive expression” refers to expression using a constitutive or regulated promoter. “Conditional” and “regulated expression” refer to expression controlled by a regulated promoter.

“Specific expression” is the expression of gene products which is limited to one or a few tissues (spatial limitation) and/or to one or a few developmental stages (temporal limitation), e.g., of a plant. It is acknowledged that true specificity hardly exists: promoters appear to be preferentially switched on in some tissues, while other tissues show no or only little activity. This phenomenon is known as leaky expression. In this invention, however, specific expression means preferential expression in one or a few plant tissues.

The terms “polypeptide”, “peptide”, “oligopeptide”, “gene product”, “expression product” and “protein” are used interchangeably herein to refer to a polymer or oligomer of consecutive amino acid residues. As used herein, the term “amino acid sequence” or a “polypeptide sequence” refers to a list of abbreviations, letters, characters or words representing amino acid residues. Amino acids may be referred to herein by either their commonly known three-letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. The abbreviations used herein are conventional one-letter codes for the amino acids: A, alanine; B, asparagine or aspartic acid; C, cysteine; D, aspartic acid; E, glutamic acid (glutamate); F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; Y, tyrosine; Z, glutamine or glutamic acid (see L. Stryer, Biochemistry, 1988, W. H. Freeman and Company, New York). The letter “x” as used herein within an amino acid sequence can stand for any amino acid residue.
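The one-letter codes listed above form a simple lookup table that can be used to validate a sequence before further processing. A minimal sketch with hypothetical function names: it is case-sensitive (upper-case residue letters plus lower-case "x", as in the text), so codes outside this list, such as U (selenocysteine) or O (pyrrolysine), are rejected by design.

```python
# One-letter codes exactly as listed above, plus "x" for any residue.
ONE_LETTER = {
    "A": "alanine", "B": "asparagine or aspartic acid", "C": "cysteine",
    "D": "aspartic acid", "E": "glutamic acid", "F": "phenylalanine",
    "G": "glycine", "H": "histidine", "I": "isoleucine", "K": "lysine",
    "L": "leucine", "M": "methionine", "N": "asparagine", "P": "proline",
    "Q": "glutamine", "R": "arginine", "S": "serine", "T": "threonine",
    "V": "valine", "W": "tryptophan", "Y": "tyrosine",
    "Z": "glutamine or glutamic acid", "x": "any amino acid residue",
}


def invalid_residues(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based position, character) for every symbol not in ONE_LETTER."""
    return [(i, c) for i, c in enumerate(sequence) if c not in ONE_LETTER]


def describe(sequence: str) -> list[str]:
    """Expand a validated one-letter sequence into residue names."""
    bad = invalid_residues(sequence)
    if bad:
        raise ValueError(f"unrecognized symbols: {bad}")
    return [ONE_LETTER[c] for c in sequence]


print(describe("MxZ"))  # ['methionine', 'any amino acid residue', 'glutamine or glutamic acid']
```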

The term “wild-type”, “natural” or “natural origin” means with respect to an organism, polypeptide, or nucleic acid sequence that said organism is naturally occurring or available in at least one naturally occurring organism which is not changed, mutated, or otherwise manipulated by man.

“Recombinant DNA molecule” is a combination of DNA sequences that are joined together using recombinant DNA technology and procedures used to join together DNA sequences as described, for example, in Sambrook et al., 1989.

“Genetic modification” is the result of recombinant DNA modification, meaning an organism is recombinantly modified resulting in modified characteristics compared to the wild-type organism, which has not been genetically modified.

A “transgene” refers to a gene that has been introduced into the genome by transformation and is stably maintained. Transgenes may include, for example, genes that are either heterologous or homologous to the genes of a particular plant to be transformed. Additionally, transgenes may comprise native genes inserted into a non-native organism, or chimeric genes. The term “endogenous gene” refers to a native gene in its natural location in the genome of an organism. A “foreign” gene refers to a gene not normally found in the host organism but that is introduced by gene transfer.

The terms “heterologous DNA molecule”, or “heterologous nucleic acid,” as used herein, each refer to a molecule that originates from a source foreign to the particular host cell or, if from the same source, is modified from its original form. Thus, a heterologous gene in a host cell includes a gene that is endogenous to the particular host cell but has been modified through, for example, the use of DNA shuffling. The terms also include non-naturally occurring multiple copies of a naturally occurring DNA molecule. Thus, the terms refer to a DNA segment that is foreign or heterologous to the cell, or homologous to the cell but in a position within the host cell nucleic acid in which the element is not ordinarily found. Exogenous DNA segments are expressed to yield exogenous polypeptides. A “homologous DNA molecule” is a DNA molecule that is naturally associated with a host cell into which it is introduced.

The heterologous nucleotide molecule to be expressed in, e.g., a plant tissue, plant organ, plant, seed or plant cell is preferably operably linked to one or more introns having expression enhancing effects, NEENAs (WO 2011023537, WO 2011023539), 5′- and/or 3′-untranslated regions, transcription termination and/or polyadenylation signals. 3′-untranslated regions are suitable to stabilize mRNA expression and structure. This can result in prolonged presence of the mRNA and thus enhanced expression levels. Termination and polyadenylation signals are suitable to stabilize mRNA expression (e.g., by stabilization of the RNA transcript and thereby the RNA level), to ensure constant mRNA transcript length and to prevent read-through transcription. Especially in multigene expression constructs this is an important feature. Furthermore, correct termination of transcription is linked to re-initiation of transcription from the regulatory 5′ nucleotide sequence, resulting in enhanced expression levels. The above-mentioned signals can be any signal functional in plants and can, for example, be isolated from plant genes, plant virus genes or other plant pathogens. However, in a preferred embodiment the 3′-untranslated regions, transcription termination and polyadenylation signals are from the genes employed as the source for the promoters of this invention.

“Target gene” refers to a gene on the replicon that expresses the desired target coding sequence, functional RNA, or protein. The target gene is not essential for replicon replication. Additionally, target genes may comprise native non-viral genes inserted into a non-native organism, or chimeric genes, and will be under the control of suitable regulatory sequences. Thus, the regulatory sequences in the target gene may come from any source, including the virus. Target genes may include coding sequences that are either heterologous or homologous to the genes of a particular plant to be transformed. However, target genes do not include native viral genes. Typical target genes include, but are not limited to, genes encoding a structural protein, a seed storage protein, a protein that conveys herbicide resistance, and a protein that conveys insect resistance. Proteins encoded by target genes are known as “foreign proteins”. The expression of a target gene in a plant will typically produce an altered plant trait.

A “reporter gene” is a special type of target gene. Reporter genes are often attached to regulatory sequences because the characteristics they confer on organisms expressing them are easily identified and measured, or because they are selectable markers. Reporter genes are often used as an indication of whether a certain gene has been taken up by or expressed in the cell or organism. A “marker gene” encodes a selectable trait to be screened for.

The term “chimeric gene” refers to any gene that contains

    • DNA sequences, including regulatory and coding sequences, that are not functionally linked together in nature, or
    • sequences encoding parts of proteins not naturally adjoined, or
    • parts of promoters that are not naturally adjoined.

Accordingly, a chimeric gene may comprise regulatory molecules and coding sequences that are derived from different sources, or comprise regulatory molecules and coding sequences derived from the same source but arranged in a manner different from that found in nature.

“Chimeric transacting replication gene” refers either to a replication gene in which the coding sequence of a replication protein is under the control of a regulated plant promoter other than that in the native viral replication gene, or to a modified native viral replication gene, for example, one in which a site-specific sequence(s) is inserted in the 5′ transcribed but untranslated region. Such chimeric genes also include insertion of the known sites of replication protein binding between the promoter and the transcription start site that attenuate transcription of the viral replication protein gene.

“Replication gene” refers to a gene encoding a viral replication protein. In addition to the ORF of the replication protein, the replication gene may also contain other overlapping or non-overlapping ORF(s), as are found in viral sequences in nature. While not essential for replication, these additional ORFs may enhance replication and/or viral DNA accumulation. Examples of such additional ORFs are AC3 and AL3 in ACMV and TGMV geminiviruses, respectively.

An “oligonucleotide” corresponding to a nucleotide sequence of the invention, e.g., for use in probing or amplification reactions, may be about 30 or fewer nucleotides in length (e.g., 9, 12, 15, 18, 20, 21, 22, 23, or 24, or any number between 9 and 30). Generally, specific primers are upwards of 14 nucleotides in length. For optimum specificity and cost effectiveness, primers of 16 to 24 nucleotides in length may be preferred. Those skilled in the art are well versed in the design of primers for use in processes such as PCR. If required, probing can be done with entire restriction fragments of the gene disclosed herein, which may be hundreds or even thousands of nucleotides in length.
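A first-pass check of a candidate primer against the preferred 16 to 24 nucleotide length can be combined with two quick descriptors. This sketch uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough approximation for short oligonucleotides; it does not check self-complementarity, primer dimers or specificity. Function names and the example primer are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def check_primer(seq: str, min_len: int = 16, max_len: int = 24) -> dict:
    """Summarize length, GC content and approximate Tm for a candidate primer."""
    return {
        "length": len(seq),
        "length_ok": min_len <= len(seq) <= max_len,
        "gc": round(gc_content(seq), 2),
        "tm": wallace_tm(seq),
    }


print(check_primer("GACCTGAAGTTCATCTGC"))
# {'length': 18, 'length_ok': True, 'gc': 0.5, 'tm': 54}
```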

An “isolated” or “purified” DNA molecule or an “isolated” or “purified” polypeptide is a DNA molecule or polypeptide that, by the hand of man, exists apart from its native environment and is therefore not a product of nature. An isolated DNA molecule or polypeptide may exist in a purified form or may exist in a non-native environment such as, for example, a transgenic host cell. For example, an “isolated” or “purified” nucleic acid molecule or protein, or biologically active portion thereof, is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. Preferably, an “isolated” nucleic acid is free of sequences (preferably protein encoding sequences) that naturally flank the nucleic acid (i.e., sequences located at the 5′ and 3′ ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleotide sequences that naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived.

A protein that is substantially free of cellular material includes preparations of protein or polypeptide having less than about 30%, 20%, 10%, or 5% (by dry weight) of contaminating protein. When the protein of the invention, or biologically active portion thereof, is recombinantly produced, preferably culture medium represents less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors or non-protein-of-interest chemicals. The nucleotide sequences of the invention include both the naturally occurring sequences as well as mutant (variant) forms. Such variants will continue to possess the desired activity, i.e., either promoter activity or the activity of the product encoded by the open reading frame of the non-variant nucleotide sequence.

“Expression cassette” as used herein means a DNA sequence capable of directing expression of a particular nucleotide sequence in an appropriate host cell, comprising a promoter operably linked to a nucleotide sequence of interest, which is—optionally—operably linked to termination signals and/or other regulatory elements. An expression cassette may also comprise sequences required for proper translation of the nucleotide sequence. The coding region usually codes for a protein of interest but may also code for a functional RNA of interest, for example antisense RNA or a non-translated RNA, in the sense or antisense direction. The expression cassette comprising the nucleotide sequence of interest may be chimeric, meaning that at least one of its components is heterologous with respect to at least one of its other components. The expression cassette may also be one, which is naturally occurring but has been obtained in a recombinant form useful for heterologous expression. An expression cassette may be assembled entirely extracellularly (e.g., by recombinant cloning techniques). However, an expression cassette may also be assembled using in part endogenous components. For example, an expression cassette may be obtained by placing (or inserting) a promoter sequence upstream of an endogenous sequence, which thereby becomes functionally linked and controlled by said promoter sequences. Likewise, a nucleic acid sequence to be expressed may be placed (or inserted) downstream of an endogenous promoter sequence thereby forming an expression cassette. The expression of the nucleotide sequence in the expression cassette may be under the control of a constitutive promoter or of an inducible promoter, which initiates transcription only when the host cell is exposed to some particular external stimulus. In the case of a multicellular organism, the promoter can also be specific to a particular tissue or organ or stage of development. 
In a preferred embodiment, such expression cassettes will comprise the transcriptional initiation region of the invention linked to a nucleotide sequence of interest. Such an expression cassette is preferably provided with a plurality of restriction sites for insertion of the gene of interest to be under the transcriptional regulation of the regulatory regions. The expression cassette may additionally contain selectable marker genes. The cassette will include, in the 5′-3′ direction of transcription, a transcriptional and translational initiation region, a DNA sequence of interest, and a transcriptional and translational termination region functional in plants. The termination region may be native with the transcriptional initiation region, may be native with the DNA sequence of interest, or may be derived from another source. Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions and others described below (see also Guerineau 1991; Proudfoot 1991; Sanfacon 1991; Mogen 1990; Munroe 1990; Ballas 1989; Joshi 1987).

“Vector” is defined to include, inter alia, any plasmid, cosmid, phage or Agrobacterium binary vector in double- or single-stranded linear or circular form, which may or may not be self-transmissible or mobilizable, and which can transform a prokaryotic or eukaryotic host either by integration into the cellular genome or by existing extrachromosomally (e.g., as an autonomously replicating plasmid with an origin of replication).

Specifically included are shuttle vectors by which is meant a DNA vehicle capable, naturally or by design, of replication in two different host organisms, which may be selected from actinomycetes and related species, bacteria and eukaryotic (e.g. higher plant, mammalian, yeast or fungal cells).

Preferably the nucleic acid in the vector is under the control of, and operably linked to, an appropriate promoter or other regulatory elements for transcription in a host cell such as a microbial, e.g. bacterial, or plant cell. The vector may be a bi-functional expression vector which functions in multiple hosts. In the case of genomic DNA, this may contain its own promoter or other regulatory elements and in the case of cDNA this may be under the control of an appropriate promoter or other regulatory elements for expression in the host cell.

“Operably-linked” or “functionally linked” refers preferably to the association of nucleic acid molecules on a single nucleic acid fragment so that the function of one is affected by the other. For example, a regulatory DNA molecule is said to be “operably linked to” or “associated with” a DNA molecule that codes for an RNA or a polypeptide if the two molecules are situated such that the regulatory DNA molecule affects expression of the coding DNA molecule (i.e., that the coding sequence or functional RNA is under the transcriptional control of the promoter). Coding sequences can be operably-linked to regulatory molecules in sense or antisense orientation.

“Cloning vectors” typically contain one or a small number of restriction endonuclease recognition sites at which foreign DNA sequences can be inserted in a determinable fashion without loss of essential biological function of the vector, as well as a marker gene that is suitable for use in the identification and selection of cells transformed with the cloning vector. Marker genes typically include genes that provide tetracycline resistance, hygromycin resistance or ampicillin resistance.
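Whether a restriction site can serve as a unique insertion point can be checked by counting its occurrences on both strands of the vector or cassette sequence. A minimal sketch: the three recognition sequences (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT) are standard, but the function names and example sequence are hypothetical, and factors such as methylation sensitivity are ignored.

```python
# Recognition sequences of three commonly used type II restriction enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

_COMP = str.maketrans("ACGT", "TGCA")


def count_site(seq: str, site: str) -> int:
    """Count (overlapping) occurrences of a recognition site on both strands."""
    seq, site = seq.upper(), site.upper()
    rc_site = site.translate(_COMP)[::-1]
    patterns = {site, rc_site}  # a palindromic site is counted only once
    return sum(
        1
        for pattern in patterns
        for i in range(len(seq) - len(pattern) + 1)
        if seq[i:i + len(pattern)] == pattern
    )


def unique_sites(seq: str, sites: dict[str, str] = SITES) -> list[str]:
    """Names of enzymes whose site occurs exactly once in the sequence."""
    return sorted(name for name, site in sites.items() if count_site(seq, site) == 1)


print(unique_sites("TTGAATTCAAGGATCCAAGGATCCTT"))  # ['EcoRI']
```

Searching for the reverse complement as well matters only for non-palindromic sites; the three example sites read the same on both strands.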

The term “transformation” refers to the transfer of a nucleic acid fragment into the genome of a host cell, resulting in genetically stable inheritance. Host cells containing the transformed nucleic acid fragments are referred to as “transgenic cells”, and organisms comprising transgenic cells are referred to as “transgenic organisms”. Examples of methods of transformation of plants and plant cells include Agrobacterium-mediated transformation (De Blaere 1987) and particle bombardment technology (U.S. Pat. No. 4,945,050). Whole plants may be regenerated from transgenic cells by methods well known to the skilled artisan (see, for example, Fromm 1990).

“Transformed”, “transgenic”, and “recombinant” refer to a host organism such as a bacterium or a plant into which a heterologous nucleic acid molecule has been introduced. The nucleic acid molecule can be stably integrated into the genome. Methods of PCR are generally known in the art and are disclosed (Sambrook 1989; Innis 1995; Gelfand 1995; Innis & Gelfand 1999). Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially mismatched primers, and the like. For example, “transformed”, “transformant”, and “transgenic” plants or calli have been through the transformation process and contain a foreign gene integrated into their chromosome. The term “untransformed” refers to normal plants that have not been through the transformation process.

“Transiently transformed” refers to cells in which transgenes and foreign DNA have been introduced (for example, by such methods as Agrobacterium-mediated transformation or biolistic bombardment), but not selected for stable maintenance. “Stably transformed” refers to cells that have been selected and regenerated on a selection medium following transformation.

“Genetically stable” and “heritable” refer to chromosomally-integrated genetic elements that are stably maintained in the plant and stably inherited by progeny through successive generations.

“Chromosomally-integrated” refers to the integration of a foreign gene or DNA construct into the host DNA by covalent bonds. Where genes are not “chromosomally integrated”, they may be “transiently expressed.” Transient expression of a gene refers to the expression of a gene that is not integrated into the host chromosome but functions independently, either as part of an autonomously replicating plasmid or expression cassette, for example, or as part of another biological system such as a virus. “Transient expression” refers to expression in cells in which a virus or a transgene is introduced by viral infection or by such methods as Agrobacterium-mediated transformation, electroporation, or biolistic bombardment, but not selected for its stable maintenance.

“Overexpression” refers to the level of expression in transgenic cells or organisms that exceeds levels of expression in normal or untransformed (non-transgenic) cells or organisms.

“Signal peptide” refers to the amino terminal extension of a polypeptide, which is translated in conjunction with the polypeptide forming a precursor peptide and which is required for its entrance into the secretory pathway. The term “signal sequence” refers to a nucleotide sequence that encodes the signal peptide. The term “transit peptide” as used herein refers to part of an expressed polypeptide (preferably the amino terminal extension of a polypeptide), which is translated in conjunction with the polypeptide forming a precursor peptide and which is required for its entrance into a cell organelle (such as the plastids (e.g., chloroplasts) or mitochondria). The term “transit sequence” refers to a nucleotide sequence that encodes the transit peptide.

The activity of a transcription regulating nucleotide molecule is considered equivalent if transcription is initiated in the same tissues as by the reference molecule. Such an expression profile is preferably demonstrated using reporter genes operably linked to said transcription regulating nucleotide sequence. Preferred reporter genes (Schenborn 1999) in this context are green fluorescent protein (GFP) (Chuff 1996; Leffel 1997), chloramphenicol acetyltransferase, luciferase (Millar 1992), β-glucuronidase or β-galactosidase. Especially preferred is β-glucuronidase (Jefferson 1987).

Besides this, the transcription regulating activity of a functionally equivalent homolog or fragment of the transcription regulating nucleotide molecule may differ from the activity of its parent sequence, especially with respect to expression level. The expression level may be higher or lower than that of the parent sequence; both deviations may be advantageous, depending on the nucleic acid sequence of interest to be expressed. Preferred are functionally equivalent sequences whose expression level does not deviate from that of the parent sequence by more than 50%, preferably 25%, more preferably 10% (preferably as judged by either mRNA expression or protein (e.g., reporter gene) expression). Further preferred are equivalent sequences which demonstrate increased expression in comparison to the parent sequence, preferably an increase by at least 50%, more preferably by at least 100%, most preferably by at least 500%.
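As a rough illustration of the preference tiers above, the following Python sketch maps the measured expression level of a variant onto those tiers. It assumes both levels are already on a common quantitative scale (e.g., normalized reporter activity); the function names and tier labels are hypothetical, and the order in which overlapping tiers are checked (increases first) is an arbitrary choice.

```python
def percent_change(parent_level: float, variant_level: float) -> float:
    """Relative change of a variant's expression level versus its parent, in percent."""
    if parent_level <= 0:
        raise ValueError("parent expression level must be positive")
    return (variant_level - parent_level) / parent_level * 100.0


def equivalence_tier(parent_level: float, variant_level: float) -> str:
    """Assign a measured variant to the (hypothetical) preference tiers described above."""
    change = percent_change(parent_level, variant_level)
    # Increased-expression tiers are checked first, strongest increase first.
    for threshold in (500.0, 100.0, 50.0):
        if change >= threshold:
            return f"increased by at least {threshold:.0f}%"
    # Otherwise report the tightest deviation band the variant falls within.
    for band in (10.0, 25.0, 50.0):
        if abs(change) <= band:
            return f"within {band:.0f}% of parent"
    return "outside preferred ranges"
```

For example, a variant measured at 108 units against a parent at 100 units falls in the tightest band ("within 10% of parent"), whereas one measured at 650 units is reported as an increase of at least 500%.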

What is meant by “substantially the same activity” or “the same activity” when used in reference to a polynucleotide fragment or a homolog is that the fragment or homolog has at least 90% or more, e.g., 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, up to at least 99% of the expression regulating activity of the full length polynucleotide.

“Significant increase” is an increase that is larger than the margin of error inherent in the measurement technique, preferably an increase by about 2-fold or greater.

The word “plant” refers to any plant, particularly to agronomically useful plants (e.g., seed plants), and “plant cell” is a structural and physiological unit of the plant, which comprises a cell wall but may also refer to a protoplast. The plant cell may be in the form of an isolated single cell or a cultured cell, or part of a higher-order unit such as, for example, a plant tissue, or a plant organ differentiated into a structure that is present at any stage of a plant's development. Such structures include one or more plant organs including, but not limited to, fruit, shoot, stem, leaf, flower petal, etc. Preferably, the term “plant” includes whole plants, shoot vegetative organs/structures (e.g., leaves, stems and tubers), roots, flowers and floral organs/structures (e.g., bracts, sepals, petals, stamens, carpels, anthers and ovules), seeds (including embryo, endosperm, and seed coat) and fruits (the mature ovary), plant tissues (e.g., vascular tissue, ground tissue, and the like) and cells (e.g., guard cells, egg cells, trichomes and the like), and progeny of same. The class of plants that can be used in the method of the invention is generally as broad as the class of higher and lower plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), gymnosperms, ferns, and multicellular algae. It includes plants of a variety of ploidy levels, including aneuploid, polyploid, diploid, haploid and hemizygous. Included within the scope of the invention are all genera and species of higher and lower plants of the plant kingdom. Furthermore included are the mature plants, seed, shoots and seedlings, and parts, propagation material (for example seeds and fruit) and cultures, for example cell cultures, derived therefrom.
Preferred are plants and plant materials of the following plant families: Amaranthaceae, Brassicaceae, Caryophyllaceae, Chenopodiaceae, Compositae, Cucurbitaceae, Labiatae, Leguminosae, Papilionoideae, Liliaceae, Linaceae, Malvaceae, Rosaceae, Saxifragaceae, Scrophulariaceae, Solanaceae, Tetragoniaceae. Annual, perennial, monocotyledonous and dicotyledonous plants are preferred host organisms for the generation of transgenic plants. The use of the recombination system or method according to the invention is furthermore advantageous in all ornamental plants, forestry, fruit, or ornamental trees, flowers, cut flowers, shrubs or turf. Said plant may include, but shall not be limited to, bryophytes such as, for example, Hepaticae (hepaticas) and Musci (mosses); pteridophytes such as ferns, horsetail and clubmosses; gymnosperms such as conifers, cycads, ginkgo and Gnetaeae; algae such as Chlorophyceae, Phaeophyceae, Rhodophyceae, Myxophyceae, Xanthophyceae, Bacillariophyceae (diatoms) and Euglenophyceae. Plants for the purposes of the invention may comprise the families of the Rosaceae such as rose, Ericaceae such as rhododendrons and azaleas, Euphorbiaceae such as poinsettias and croton, Caryophyllaceae such as pinks, Solanaceae such as petunias, Gesneriaceae such as African violet, Balsaminaceae such as touch-me-not, Orchidaceae such as orchids, Iridaceae such as gladioli, iris, freesia and crocus, Compositae such as marigold, Geraniaceae such as geraniums, Liliaceae such as Dracaena, Moraceae such as ficus, Araceae such as philodendron and many others. The transgenic plants according to the invention are furthermore selected in particular from among dicotyledonous crop plants such as, for example, from the families of the Leguminosae such as pea, alfalfa and soybean; the family of the Umbelliferae, particularly the genus Daucus (very particularly the species carota (carrot)) and Apium (very particularly the species graveolens var. dulce (celery)) and many others; the family of the Solanaceae, particularly the genus Lycopersicon, very particularly the species esculentum (tomato) and the genus Solanum, very particularly the species tuberosum (potato) and melongena (aubergine), tobacco and many others; and the genus Capsicum, very particularly the species annuum (pepper) and many others; the family of the Leguminosae, particularly the genus Glycine, very particularly the species max (soybean) and many others; and the family of the Cruciferae, particularly the genus Brassica, very particularly the species napus (oilseed rape), campestris (beet), oleracea cv Tastie (cabbage), oleracea cv Snowball Y (cauliflower) and oleracea cv Emperor (broccoli); and the genus Arabidopsis, very particularly the species thaliana and many others; the family of the Compositae, particularly the genus Lactuca, very particularly the species sativa (lettuce) and many others. The transgenic plants according to the invention may be selected among monocotyledonous crop plants, such as, for example, cereals such as wheat, barley, sorghum and millet, rye, triticale, maize, rice or oats, and sugarcane. Further preferred are trees such as apple, pear, quince, plum, cherry, peach, nectarine, apricot, papaya, mango, and other woody species including coniferous and deciduous trees such as poplar, pine, sequoia, cedar, oak, etc. Especially preferred are Arabidopsis thaliana, Nicotiana tabacum, oilseed rape, soybean, corn (maize), wheat, Linum usitatissimum (linseed and flax), Camelina sativa, Brassica juncea, potato and Tagetes. Brassica napus is used synonymously with rapeseed and canola herein.

“Plant tissue” includes differentiated and undifferentiated tissues or plants, including but not limited to roots, stems, shoots, leaves, pollen, seeds, tumor tissue and various forms of cells and culture such as single cells, protoplasts, embryos, and callus tissue. The plant tissue may be in plants or in organ, tissue or cell culture.

“Mature seed” is a seed that has fully developed and has successfully completed all stages of its development. Such a seed can germinate into a seedling if provided with the necessary physical conditions. Harvested seeds are usually mature seeds.

The term “altered plant trait” means any phenotypic or genotypic change in a transgenic plant relative to the wild-type or non-transgenic plant host.

A “transgenic plant” is a plant having one or more plant cells that contain an expression vector or recombinant expression construct.

“Primary transformant” and “T0 generation” refer to transgenic plants that are of the same genetic generation as the tissue which was initially transformed (i.e., not having gone through meiosis and fertilization since transformation).

“Secondary transformants” and the “T1, T2, T3, etc. generations” refer to transgenic plants derived from primary transformants through one or more meiotic and fertilization cycles. They may be derived by self-fertilization of primary or secondary transformants or crosses of primary or secondary transformants with other transformed or untransformed plants.

The term “variant” or “homolog” with respect to a sequence (e.g., a polypeptide or nucleic acid sequence such as, for example, a transcription regulating nucleotide molecule of the invention) is intended to mean substantially similar sequences. For nucleotide sequences comprising an open reading frame, variants include those sequences that, because of the degeneracy of the genetic code, encode the identical amino acid sequence of the native protein. Naturally occurring allelic variants such as these can be identified with the use of well-known molecular biology techniques, for example polymerase chain reaction (PCR) and hybridization techniques. Variant nucleotide sequences also include synthetically derived nucleotide sequences, such as those generated, for example, by site-directed mutagenesis, which for open reading frames encode the native protein, as well as those that encode a polypeptide having amino acid substitutions relative to the native protein. Generally, nucleotide sequence variants of the invention will have at least 40%, 50%, 60%, to 70%, e.g., preferably 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, to 79%, generally at least 80%, e.g., 81%-84%, at least 85%, e.g., 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, to 98% and 99% nucleotide sequence identity to the native (wild type or endogenous) nucleotide sequence.

Sequence comparisons may be carried out using a Smith-Waterman sequence alignment algorithm (see e.g., Waterman (1995)). The localS program, version 1.16, is preferably used with the following parameters: match: 1, mismatch penalty: 0.33, open-gap penalty: 2, extended-gap penalty: 2.
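A Smith-Waterman local alignment score can be sketched in Python as follows, using the localS parameter values above as defaults. This is not the localS program itself: it assumes an affine gap model (Gotoh's formulation) in which a gap of length k costs open + (k - 1) × extend, which is one plausible reading of the open-gap and extended-gap penalties, and it returns only the best local score, not the alignment.

```python
def smith_waterman_score(a: str, b: str, match: float = 1.0, mismatch: float = 0.33,
                         gap_open: float = 2.0, gap_extend: float = 2.0) -> float:
    """Best local alignment score of a and b (Smith-Waterman with affine gaps).

    Assumption: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0.0] * cols for _ in range(rows)]      # best score of alignments ending at (i, j)
    e = [[neg_inf] * cols for _ in range(rows)]  # ending with a gap in a (only b advances)
    f = [[neg_inf] * cols for _ in range(rows)]  # ending with a gap in b (only a advances)
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            e[i][j] = max(h[i][j - 1] - gap_open, e[i][j - 1] - gap_extend)
            f[i][j] = max(h[i - 1][j] - gap_open, f[i - 1][j] - gap_extend)
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else -mismatch)
            # Local alignment: scores are floored at zero so an alignment can restart anywhere.
            h[i][j] = max(0.0, diag, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best
```

With these defaults, aligning ACGTACGT against ACGTTACGT scores 6: eight matches minus one gap-opening penalty of 2.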

The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: (a) “reference sequence”, (b) “comparison window”, (c) “sequence identity”, (d) “percentage of sequence identity”, and (e) “substantial identity”.

As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence, for example a segment of a full-length cDNA or gene sequence or of an isolated nucleic acid sequence capable of regulating expression in plants; preferably, the complete cDNA or gene sequence or isolated nucleic acid sequence capable of regulating expression in plants is the reference sequence.

As used herein, “comparison window” makes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. In a preferred embodiment the comparison window defining the homology of sequence consists of the entire query sequence. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence a gap penalty is typically introduced and is subtracted from the number of matches.

Methods of alignment of sequences for comparison are well known in the art. Thus, the determination of percent identity between any two sequences can be accomplished using a mathematical algorithm. Preferred, non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller, 1988; the local homology algorithm of Smith et al. 1981; the homology alignment algorithm of Needleman and Wunsch 1970; the search-for-similarity-method of Pearson and Lipman 1988; the algorithm of Karlin and Altschul, 1990, modified as in Karlin and Altschul, 1993.

Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described (Higgins 1988, 1989; Corpet 1988; Huang 1992; Pearson 1994). The ALIGN program is based on the algorithm of Myers and Miller, supra. The BLAST programs of Altschul et al., 1990, are based on the algorithm of Karlin and Altschul, supra. Multiple alignments (i.e., of more than 2 sequences) are preferably performed using the Clustal W algorithm (Thompson 1994; e.g., in the software VectorNTI™, version 9; Invitrogen Inc.) with the scoring matrix BLOSUM62MT2 with the default settings (gap opening penalty 15/19, gap extension penalty 6.66/0.05; gap separation penalty range 8; % identity for alignment delay 40; using residue specific gaps and hydrophilic residue gaps).

Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul 1990). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, when the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or when the end of either sequence is reached.
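The word-hit extension step described above can be illustrated with a simplified, ungapped Python sketch. It uses the nucleotide reward M and penalty N named in the text, defaulting to the BLASTN values M=5 and N=-4 given later in this section; the X-drop value of 20 and all function names are assumptions for illustration, and actual BLAST implementations differ in detail (for example, in seed selection and gapped extension).

```python
from typing import Iterable, Tuple


def _x_drop(pairs: Iterable[Tuple[str, str]], score: int, m: int, n: int, x: int) -> Tuple[int, int]:
    """Extend along aligned residue pairs; return (best score, steps taken to reach it)."""
    best, best_steps = score, 0
    for steps, (q, s) in enumerate(pairs, start=1):
        score += m if q == s else n
        if score > best:
            best, best_steps = score, steps
        # Stop when the score falls more than X below its maximum or drops to zero or below.
        elif best - score > x or score <= 0:
            break
    return best, best_steps


def extend_word_hit(query: str, subject: str, q: int, s: int, w: int,
                    m: int = 5, n: int = -4, x: int = 20) -> Tuple[int, int, int]:
    """Extend a length-w word hit at query[q:q+w] / subject[s:s+w] in both directions.

    Returns (HSP score, query start, query end exclusive).
    """
    seed = sum(m if query[q + k] == subject[s + k] else n for k in range(w))
    # Extend to the right of the word; zip stops at the end of either sequence.
    score, right = _x_drop(zip(query[q + w:], subject[s + w:]), seed, m, n, x)
    # Then extend to the left from the best right-hand score, walking backwards.
    score, left = _x_drop(zip(reversed(query[:q]), reversed(subject[:s])), score, m, n, x)
    return score, q - left, q + w + right
```

Extending the word TAC in two identical 10-nucleotide sequences reaches both ends and returns a score of 50 (10 matches × 5) for query positions 0 to 10.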

In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which indicates the probability with which a match between two nucleotide or amino acid sequences would occur by chance. For example, a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.
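For a single high-scoring segment pair, the chance-match probability can be sketched with the Karlin-Altschul formulas E = K·m·n·e^(−λS) and P = 1 − e^(−E), where S is the raw score, m and n are the sequence (or database) lengths, and λ and K are statistical parameters that depend on the scoring system. This Python sketch is a simplification: it does not reproduce BLAST's sum statistics for the smallest sum probability P(N) over several HSPs, and the λ and K values used below are placeholders, not parameters of any particular BLAST scoring system.

```python
import math


def expect_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance HSPs scoring at least `score` (Karlin-Altschul)."""
    return k * m * n * math.exp(-lam * score)


def chance_probability(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Probability of at least one chance HSP scoring at least `score` (Poisson): 1 - e^(-E)."""
    return -math.expm1(-expect_value(score, m, n, lam, k))


def is_similar(probability: float, threshold: float = 0.1) -> bool:
    """Apply one of the thresholds named in the text (0.1, preferably 0.01 or 0.001)."""
    return probability < threshold


# Placeholder statistical parameters, for illustration only.
LAMBDA, K = 0.3, 0.1
p = chance_probability(100, m=500, n=1_000_000, lam=LAMBDA, k=K)
```

Using `math.expm1` keeps the result accurate when E is very small, where 1 − e^(−E) would otherwise lose precision.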

To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. 1997. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al., supra. When utilizing BLAST, Gapped BLAST or PSI-BLAST, the default parameters of the respective programs (e.g. BLASTN for nucleotide sequences, BLASTX for proteins) can be used. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, 1989). See http://www.ncbi.nlm.nih.gov. Alignment may also be performed manually by inspection.

For purposes of the present invention, comparison of nucleotide sequences for determination of percent sequence identity to the promoter sequences disclosed herein is preferably made using the BlastN program (version 1.4.7 or later) with its default parameters or any equivalent program. By “equivalent program” is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by the preferred program.

As used herein, “sequence identity” or “identity” in the context of two nucleic acid or polypeptide sequences makes reference to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have “sequence similarity” or “similarity.” Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif.).
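The partial-credit scoring described above can be sketched in Python: identical residues score 1, conservative substitutions score a value between 0 and 1, and all other pairs score 0. The grouping of amino acids into conservative classes and the partial score of 0.5 are hypothetical choices made for illustration; this is not the PC/GENE scoring scheme.

```python
# Hypothetical conservative-substitution classes (grouped by side-chain chemistry).
CONSERVATIVE_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"), set("NQ"), set("AGST")]


def pair_score(x: str, y: str, partial: float = 0.5) -> float:
    """1 for identity, `partial` for a conservative substitution, 0 otherwise."""
    if x == y:
        return 1.0
    if any(x in group and y in group for group in CONSERVATIVE_GROUPS):
        return partial
    return 0.0


def percent_similarity(aligned_a: str, aligned_b: str, partial: float = 0.5, gap: str = "-") -> float:
    """Similarity over an existing pairwise alignment; gap columns score zero."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = sum(pair_score(x, y, partial)
                for x, y in zip(aligned_a, aligned_b)
                if gap not in (x, y))
    return 100.0 * total / len(aligned_a)
```

On the alignment ILKD/VLRE, strict identity is 25%, whereas this scheme reports 62.5% similarity, because I/V, K/R and D/E each receive partial credit.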

As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, preferably the complete query or reference sequence as defined by SEQ ID NO: x, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
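The percentage calculation just described can be written as a short Python function. It assumes the two sequences have already been optimally aligned (for example, by one of the programs mentioned above) and are supplied as equal-length strings with '-' marking gaps; gap columns count toward the comparison window but never as matches. The function name and gap character are illustrative choices.

```python
def percent_identity(aligned_query: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent identity over the comparison window of a pairwise alignment.

    matched positions / positions in the window * 100, where the window is the
    full length of the supplied alignment (gap columns included).
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("comparison window is empty")
    matches = sum(1 for q, r in zip(aligned_query, aligned_reference) if q == r and q != gap)
    return matches / len(aligned_query) * 100.0
```

For ACGT-ACGT aligned to ACGTTACGT, 8 of the 9 window positions match, giving about 88.9% identity.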

The term “substantial identity” of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 90%, 91%, 92%, 93%, or 94%, and most preferably at least 95%, 96%, 97%, 98%, or 99% sequence identity, compared to a reference sequence using one of the alignment programs described using standard parameters. One of skill in the art will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning, and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 90%, 95%, and most preferably at least 98%.

Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other under stringent conditions (see below). Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. However, stringent conditions encompass temperatures in the range of about 1° C. to about 20° C. below the Tm, depending upon the desired degree of stringency as otherwise qualified herein. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. One indication that two nucleic acid sequences are substantially identical is when the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.

The term “substantial identity” in the context of a polypeptide indicates that a peptide comprises a sequence with at least 90%, 91%, 92%, 93%, or 94%, or even more preferably, 95%, 96%, 97%, 98% or 99%, sequence identity to the reference sequence over a specified comparison window. Preferably, optimal alignment is conducted using the homology alignment algorithm of Needleman and Wunsch (1970). An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution.

For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters. The reference sequence of the invention is defined by SEQ ID NO: x.

An indication that two nucleic acid sequences are substantially identical is that the two molecules hybridize to each other under stringent conditions. The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture of (e.g., total cellular) DNA or RNA. “Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target nucleic acid sequence.

“Stringent hybridization conditions” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as Southern and Northern hybridization are sequence dependent, and are different under different environmental parameters. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Specificity is typically a function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl, 1984:


Tm=81.5° C.+16.6(log10 M)+0.41 (% GC)−0.61 (% form)−500/L

where M is the molarity of monovalent cations, % GC is the percentage of guanine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. Tm is reduced by about 1° C. for each 1% of mismatching; thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with >90% identity are sought, the Tm can be decreased 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10° C. lower than the Tm; moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the Tm; low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the Tm. Using the equation, hybridization and wash compositions, and desired Tm, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a Tm of less than 45° C. (aqueous solution) or 32° C. (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen, 1993. Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the Tm for the specific sequence at a defined ionic strength and pH.
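The Meinkoth and Wahl approximation and the rules of thumb above (about 1° C. lower Tm per 1% mismatch; stringent conditions about 5° C. below Tm) can be combined in a small Python helper. The function names are hypothetical, and, as stated above, the approximation is intended for DNA-DNA hybrids.

```python
import math


def tm_dna_dna(na_molar: float, percent_gc: float, percent_formamide: float, length_bp: int) -> float:
    """Tm in degrees C after Meinkoth and Wahl (1984) for DNA-DNA hybrids."""
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc
            - 0.61 * percent_formamide - 500.0 / length_bp)


def tm_for_identity(tm_perfect: float, percent_identity: float) -> float:
    """Lower Tm by about 1 degree C per 1% mismatch."""
    return tm_perfect - (100.0 - percent_identity)


def stringent_temperature(tm: float, offset: float = 5.0) -> float:
    """Hybridization/wash temperature about `offset` degrees C below Tm."""
    return tm - offset


# 500 bp hybrid, 50% GC, 1 M Na+, 50% formamide.
tm = tm_dna_dna(na_molar=1.0, percent_gc=50, percent_formamide=50, length_bp=500)
tm_90 = tm_for_identity(tm, 90.0)   # target sequences with about 90% identity
wash = stringent_temperature(tm_90)
```

For this example the formula gives a Tm of about 70.5° C.; allowing for 10% mismatch and a 5° C. stringency offset suggests a temperature of about 55.5° C.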

An example of highly stringent wash conditions is 0.15 M NaCl at 72° C. for about 15 minutes. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes (see, Sambrook, infra, for a description of SSC buffer). Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1×SSC at 45° C. for 15 minutes. An example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4 to 6×SSC at 40° C. for 15 minutes. For short probes (e.g., about 10 to 50 nucleotides), stringent conditions typically involve Na ion concentrations (or other salts) of less than about 1.5 M, more preferably about 0.01 to 1.0 M, at pH 7.0 to 8.3, and a temperature of at least about 30° C.; for long probes (e.g., >50 nucleotides), the temperature is typically at least about 60° C. Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. In general, a signal-to-noise ratio of 2× (or higher) relative to that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.

Very stringent conditions are selected to be equal to the Tm for a particular probe. An example of stringent conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or Northern blot is hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C.
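Because the Tm equation given earlier in this section takes the molarity M of monovalent cations, it can help to convert an n×SSC concentration into a sodium-ion molarity. Using the composition stated above (20×SSC = 3.0 M NaCl / 0.3 M trisodium citrate), 1×SSC contains 0.15 M NaCl and 0.015 M trisodium citrate. The Python sketch below assumes that all sodium ions, including the three per citrate, contribute to M; some protocols count only the NaCl, so this is a simplifying assumption, and the names are hypothetical.

```python
NACL_IN_20X = 3.0      # mol/L NaCl in 20x SSC
CITRATE_IN_20X = 0.3   # mol/L trisodium citrate in 20x SSC
NA_PER_CITRATE = 3     # trisodium citrate carries three Na+ ions


def ssc_sodium_molar(x_ssc: float) -> float:
    """Approximate Na+ molarity of an x-fold SSC solution (all Na+ counted)."""
    scale = x_ssc / 20.0
    return scale * (NACL_IN_20X + NA_PER_CITRATE * CITRATE_IN_20X)
```

Under this assumption, 1×SSC corresponds to about 0.195 M Na+ and 0.1×SSC to about 0.0195 M Na+.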

The following are examples of sets of hybridization/wash conditions that may be used to clone orthologous nucleotide sequences that are substantially identical to reference nucleotide sequences of the present invention: a nucleotide sequence preferably hybridizes to the reference nucleotide sequence in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 2×SSC, 0.1% SDS at 50° C.; more desirably in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 1×SSC, 0.1% SDS at 50° C.; more desirably still in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 0.5×SSC, 0.1% SDS at 50° C.; preferably in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 0.1×SSC, 0.1% SDS at 50° C.; more preferably in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 0.1×SSC, 0.1% SDS at 65° C.

The term “fatty acid” refers to long chain aliphatic acids (alkanoic acids) of varying chain lengths, from about C12 to C22 (although both longer and shorter chain-length acids are known). The predominant chain lengths are between C16 and C22. Additional details concerning the differentiation between “saturated fatty acids” versus “unsaturated fatty acids”, “monounsaturated fatty acids” versus “polyunsaturated fatty acids” (or “PUFAs”), and “omega-6 fatty acids” (ω-6 or n-6) versus “omega-3 fatty acids” (ω-3 or n-3) are provided in WO 2004/101757.

Fatty acids are described herein by a simple notation system of “X:Y”, wherein the number before the colon indicates the number of carbon atoms in the fatty acid and the number after the colon is the number of double bonds that are present. The numbers following the fatty acid designation indicate the positions of the double bonds counted from the carboxyl end of the fatty acid, with the “c” affix for the cis configuration of the double bond [e.g., palmitic acid (16:0), stearic acid (18:0), oleic acid (18:1, 9c), petroselinic acid (18:1, 6c), LA (18:2, 9c,12c), GLA (18:3, 6c,9c,12c) and ALA (18:3, 9c,12c,15c)]. Unless otherwise specified, 18:1, 18:2 and 18:3 refer to oleic, LA and linolenic fatty acids. If not specifically written otherwise, double bonds are assumed to be of the cis configuration. For instance, the double bonds in 18:2 (9, 12) would be assumed to be in the cis configuration.

Nomenclature of polyunsaturated fatty acids (PUFAs):

Common name (abbreviation): chemical name; shorthand notation
linoleic acid (LA): cis-9,12-octadecadienoic acid; 18:2 ω-6
gamma-linolenic acid (GLA): cis-6,9,12-octadecatrienoic acid; 18:3 ω-6
alpha-linolenic acid (ALA): cis-9,12,15-octadecatrienoic acid; 18:3 ω-3
stearidonic acid (STA): cis-6,9,12,15-octadecatetraenoic acid; 18:4 ω-3
eicosadienoic acid (EDA): cis-11,14-eicosadienoic acid; 20:2 ω-6
dihomo-gamma-linolenic acid (DGLA): cis-8,11,14-eicosatrienoic acid; 20:3 ω-6
eicosatrienoic acid (ETrA): cis-11,14,17-eicosatrienoic acid; 20:3 ω-3
arachidonic acid (AA): cis-5,8,11,14-eicosatetraenoic acid; 20:4 ω-6
eicosatetraenoic acid (ETA): cis-8,11,14,17-eicosatetraenoic acid; 20:4 ω-3
eicosapentaenoic acid (EPA): cis-5,8,11,14,17-eicosapentaenoic acid; 20:5 ω-3
docosapentaenoic acid (DPA): cis-7,10,13,16,19-docosapentaenoic acid; 22:5 ω-3
docosahexaenoic acid (DHA): cis-4,7,10,13,16,19-docosahexaenoic acid; 22:6 ω-3
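The shorthand notation for fatty acids (carbon count, number of double bonds and cis double-bond positions counted from the carboxyl end) is regular enough to parse mechanically. In the following Python sketch, the ω class is derived as the number of carbons minus the position of the last double bond, which reproduces the ω-3/ω-6 assignments in the list above; the input format and function names are assumptions based on the examples given in the text.

```python
import re
from typing import List, Optional, Tuple


def parse_fatty_acid(notation: str) -> Tuple[int, int, List[int]]:
    """Parse shorthand such as '18:3, 9c,12c,15c' into (carbons, double bonds, positions)."""
    head, _, tail = notation.partition(",")
    carbons, double_bonds = (int(part) for part in head.split(":"))
    positions = [int(p) for p in re.findall(r"(\d+)c", tail)]
    if len(positions) != double_bonds:
        raise ValueError(f"expected {double_bonds} double-bond positions in {notation!r}")
    return carbons, double_bonds, positions


def omega_class(carbons: int, positions: List[int]) -> Optional[int]:
    """Position of the last double bond counted from the methyl (omega) end; None if saturated."""
    return carbons - max(positions) if positions else None
```

For example, ALA (18:3, 9c,12c,15c) gives 18 − 15 = 3 (ω-3), and DHA, with its last double bond at position 19 of 22 carbons, is likewise ω-3.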

The term “fat” refers to a lipid substance that is solid at 25° C. and usually saturated.

The term “oil” refers to a lipid substance that is liquid at 25° C. and usually polyunsaturated. PUFAs are found in the oils of some algae, oleaginous yeasts and filamentous fungi. “Microbial oils” or “single cell oils” are those oils naturally produced by microorganisms during their lifespan. Such oils can contain long chain PUFAs.

The term “PUFA biosynthetic pathway” refers to a metabolic process that converts oleic acid to LA, EDA, GLA, DGLA, ARA, ALA, STA, ETrA, ETA, EPA, DPA and DHA. This process is well described in the literature (e.g., see WO 2005/003322). Simply put, this process involves elongation of the carbon chain through the addition of carbon atoms and desaturation of the molecule through the addition of double bonds, via a series of specific desaturation and elongation enzymes (i.e., “PUFA biosynthetic pathway enzymes”) present in the endoplasmic reticulum membrane. More specifically, “PUFA biosynthetic pathway enzymes” refer to any of the following enzymes (and genes which encode said enzymes) associated with the biosynthesis of a PUFA, including: a delta-4 desaturase, a delta-5 desaturase, a delta-6 desaturase, a delta-12 desaturase, a delta-15 desaturase, a delta-17 desaturase, a delta-9 desaturase, a delta-8 desaturase, a C14/16 elongase, a C16/18 elongase, a C18/20 elongase and/or a C20/22 elongase.

A “desaturase” is a polypeptide that can desaturate one or more fatty acids to produce a mono- or polyunsaturated fatty acid or precursor of interest. Of particular interest herein are delta-8 desaturases that desaturate a fatty acid between the 8th and 9th carbon atoms numbered from the carboxyl-terminal end of the molecule and that can, for example, catalyze the conversion of EDA to DGLA and/or ETrA to ETA. Other useful fatty acid desaturases include, for example:

    • a. delta-5 desaturases that catalyze the conversion of DGLA to ARA and/or ETA to EPA;
    • b. delta-6 desaturases that catalyze the conversion of LA to GLA and/or ALA to STA;
    • c. delta-4 desaturases that catalyze the conversion of DPA to DHA;
    • d. delta-12 desaturases that catalyze the conversion of oleic acid to LA;
    • e. delta-15 desaturases that catalyze the conversion of LA to ALA and/or GLA to STA;
    • f. delta-17 desaturases that catalyze the conversion of ARA to EPA and/or DGLA to ETA; and
    • g. delta-9 desaturases that catalyze the conversion of palmitate to palmitoleic acid (16:1) and/or stearate to oleic acid (18:1).
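Each desaturase in the list above can be read as one structural rule: a delta-x desaturase inserts a cis double bond between carbon x and carbon x+1, counted from the carboxyl end. A minimal Python sketch, assuming the structures from the nomenclature table (plus palmitic and stearic acid as 16:0 and 18:0); `FattyAcid`, `STRUCTURES`, `REACTIONS` and `desaturate` are hypothetical names. The sketch checks positional chemistry only and says nothing about the substrate preference of any real enzyme.

```python
from typing import Dict, NamedTuple, Tuple


class FattyAcid(NamedTuple):
    carbons: int
    positions: Tuple[int, ...]  # cis double bonds, delta numbering (from the carboxyl end)


STRUCTURES: Dict[str, FattyAcid] = {
    "palmitic acid": FattyAcid(16, ()),
    "palmitoleic acid": FattyAcid(16, (9,)),
    "stearic acid": FattyAcid(18, ()),
    "oleic acid": FattyAcid(18, (9,)),
    "LA": FattyAcid(18, (9, 12)),
    "GLA": FattyAcid(18, (6, 9, 12)),
    "ALA": FattyAcid(18, (9, 12, 15)),
    "STA": FattyAcid(18, (6, 9, 12, 15)),
    "DGLA": FattyAcid(20, (8, 11, 14)),
    "ETA": FattyAcid(20, (8, 11, 14, 17)),
    "ARA": FattyAcid(20, (5, 8, 11, 14)),
    "EPA": FattyAcid(20, (5, 8, 11, 14, 17)),
    "DPA": FattyAcid(22, (7, 10, 13, 16, 19)),
    "DHA": FattyAcid(22, (4, 7, 10, 13, 16, 19)),
}

# (delta position, substrate, product) for items a. to g. above.
REACTIONS = [
    (5, "DGLA", "ARA"), (5, "ETA", "EPA"),
    (6, "LA", "GLA"), (6, "ALA", "STA"),
    (4, "DPA", "DHA"),
    (12, "oleic acid", "LA"),
    (15, "LA", "ALA"), (15, "GLA", "STA"),
    (17, "ARA", "EPA"), (17, "DGLA", "ETA"),
    (9, "palmitic acid", "palmitoleic acid"), (9, "stearic acid", "oleic acid"),
]


def desaturate(fa: FattyAcid, delta: int) -> FattyAcid:
    """Insert a cis double bond between carbon `delta` and carbon `delta + 1`."""
    if delta in fa.positions or not 1 <= delta < fa.carbons:
        raise ValueError(f"cannot introduce a delta-{delta} double bond into {fa}")
    return FattyAcid(fa.carbons, tuple(sorted(fa.positions + (delta,))))


# Every listed conversion is reproduced by the single positional rule.
for delta, substrate, product in REACTIONS:
    assert desaturate(STRUCTURES[substrate], delta) == STRUCTURES[product], (delta, substrate)
```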

The term “elongase system” refers to a suite of four enzymes that are responsible for elongation of a fatty acid carbon chain to produce a fatty acid that is two carbons longer than the fatty acid substrate that the elongase system acts upon. More specifically, the process of elongation occurs in association with fatty acid synthase, whereby CoA is the acyl carrier (Lassner et al., The Plant Cell 8:281-292 (1996)). In the first step, which has been found to be both substrate-specific and rate-limiting, malonyl-CoA is condensed with a long-chain acyl-CoA to yield CO2 and a beta-ketoacyl-CoA (where the acyl moiety has been elongated by two carbon atoms). Subsequent reactions include reduction to beta-hydroxyacyl-CoA, dehydration to an enoyl-CoA and a second reduction to yield the elongated acyl-CoA. Examples of reactions catalyzed by elongase systems are the conversion of GLA to DGLA, STA to ETA and EPA to DPA. For the purposes herein, an enzyme catalyzing the first condensation reaction (i.e., the condensation of malonyl-CoA with a long-chain acyl-CoA to form beta-ketoacyl-CoA) will be referred to generically as an “elongase”. In general, the substrate selectivity of elongases is somewhat broad but segregated by both chain length and degree of unsaturation. Accordingly, elongases can have different specificities. For example, a C16/18 elongase will utilize a C16 substrate (e.g., palmitate), a C18/20 elongase will utilize a C18 substrate (e.g., GLA, STA) and a C20/22 elongase will utilize a C20 substrate (e.g., EPA). In like manner, a delta-9 elongase is able to catalyze the conversion of LA and ALA to EDA and ETrA, respectively (see WO 2002/077213). It is important to note that some elongases have broad specificity, and thus a single enzyme may be capable of catalyzing several elongase reactions (e.g., thereby acting as both a C16/18 elongase and a C18/20 elongase).
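Because the two new carbons are added at the carboxyl end, and delta positions are counted from that end, one elongation cycle shifts every existing double-bond position up by two while leaving the ω (n-x) class unchanged; this is why GLA (6, 9, 12) becomes DGLA (8, 11, 14). A minimal Python sketch under that reading, using plain tuples and hypothetical helper names; it models structure only, not the four-step chemistry or the chain-length specificity of individual elongases.

```python
from typing import Tuple

FattyAcid = Tuple[int, Tuple[int, ...]]  # (carbons, cis double-bond delta positions)


def elongate(fa: FattyAcid) -> FattyAcid:
    """One elongase-system cycle: two carbons are added at the carboxyl end,
    so every existing delta position moves up by two."""
    carbons, positions = fa
    return carbons + 2, tuple(p + 2 for p in positions)


def desaturate(fa: FattyAcid, delta: int) -> FattyAcid:
    """Insert a cis double bond between carbon `delta` and carbon `delta + 1`."""
    carbons, positions = fa
    if delta in positions or not 1 <= delta < carbons:
        raise ValueError(f"cannot add a delta-{delta} double bond to {fa}")
    return carbons, tuple(sorted(positions + (delta,)))


def omega(fa: FattyAcid) -> int:
    carbons, positions = fa
    return carbons - max(positions)


GLA = (18, (6, 9, 12))
assert elongate(GLA) == (20, (8, 11, 14))                               # GLA -> DGLA
assert elongate((18, (6, 9, 12, 15))) == (20, (8, 11, 14, 17))          # STA -> ETA
assert elongate((20, (5, 8, 11, 14, 17))) == (22, (7, 10, 13, 16, 19))  # EPA -> DPA
assert elongate((18, (9, 12))) == (20, (11, 14))                        # LA -> EDA (delta-9 elongase)
assert omega(elongate(GLA)) == omega(GLA)                               # n-x class is preserved

# Oleic acid -> LA -> GLA -> DGLA -> ARA (delta-12, delta-6, elongation, delta-5)
ara = desaturate(elongate(desaturate(desaturate((18, (9,)), 12), 6)), 5)
print(ara)
```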

The following figures and examples describe the invention in further detail. The figures and examples are not meant to limit the scope of the invention or of the claims in any way.

FIG. 1 depicts the general pathways for polyunsaturated fatty acid synthesis up to arachidonic acid and eicosapentaenoic acid.

FIG. 2 depicts the general cloning strategy applied in the examples.

FIG. 3 depicts an alignment of sequences according to the present invention and a prior art sequence (SEQ-001-plus_A (SEQ ID NO: 327); SEQ-001-Kozak_ATG (SEQ ID NO: 328); SEQ-159-promoter (SEQ ID NO: 159); SEQ-020-minimal_enhancer (SEQ ID NO: 20); SEQ-046-enhancer (SEQ ID NO: 46); SEQ-137-enhancer_TFB (SEQ ID NO: 137); SEQ-138-promoter_TFB1 (SEQ ID NO: 138); SEQ-139-promoter_TFB2 (SEQ ID NO: 139); SEQ-140-spacer (SEQ ID NO: 140); SEQ-141-promoter_98bp (SEQ ID NO: 141); SEQ-147-promoter_160bp (SEQ ID NO: 147); SEQ-156-promoter_240bp (SEQ ID NO: 156); SEQ-003-1039bp+2 (SEQ ID NO: 3); SEQ-326-p1039_2UTR (SEQ ID NO: 326); SEQ-002-1039bp+38 (SEQ ID NO: 2); SEQ-324_p1039_38UTR (SEQ ID NO: 324); SEQ-325-p1039_38_differing_part (SEQ ID NO: 325); and WO0116340 GUS data (SEQ ID NO: 311)).

FIG. 4 depicts the sequences referred to in the present application.

EXAMPLES

With regard to the present invention, the terms “binary vector”, “T-DNA containing plasmid” and “T-plasmid” are used interchangeably. An overview of binary vectors and their usage is given by Hellens et al., Trends in Plant Science (2000) 5: 446-451.

Example 1 General Cloning Methods

Cloning methods, e.g. use of restriction endonucleases to cut double-stranded DNA at specific sites, agarose gel electrophoresis, purification of DNA fragments, transfer of nucleic acids onto nitrocellulose and nylon membranes, joining of DNA fragments, transformation of E. coli cells and culture of bacteria, were performed as described in Sambrook et al. (1989) (Cold Spring Harbor Laboratory Press: ISBN 0-87965-309-6). Polymerase chain reaction was performed using Phusion™ High-Fidelity DNA Polymerase (NEB, Frankfurt, Germany) according to the manufacturer's instructions. In general, primers used in PCR were designed such that at least 20 nucleotides at the 3′ end of the primer anneal perfectly with the template to be amplified. Restriction sites were added by attaching the corresponding nucleotides of the recognition sites to the 5′ end of the primer. Fusion PCR, as described for example by K. Heckman and L. R. Pease, Nature Protocols (2007) 2, 924-932, was used as an alternative method to join two fragments of interest, e.g. a promoter to a gene or a gene to a terminator.
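The primer rule above (at least 20 perfectly annealing nucleotides at the 3′ end, restriction site attached at the 5′ end) can be sketched in Python. Assumptions not taken from the source: a 20-nt annealing region, a two-base “AA” extension in front of the site (a common practice to help enzymes cut near the end of a PCR product), and the standard recognition sequences of Nco I, Asc I and Pac I. The template is a made-up 60-nt sequence and `design_primers` is a hypothetical helper; real primer design would also check melting temperature, secondary structure and reading frame (for example, it does not overlap an Nco I site with the start codon).

```python
from typing import Tuple

# Standard recognition sequences (5'->3') of the enzymes used to clone the genes.
SITES = {"NcoI": "CCATGG", "AscI": "GGCGCGCC", "PacI": "TTAATTAA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(template: str, fwd_site: str, rev_site: str,
                   anneal_len: int = 20, clamp: str = "AA") -> Tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'.

    The 3' part of each primer anneals perfectly to `anneal_len` nucleotides at one
    end of the template; the restriction site (plus an assumed clamp) forms the 5' part.
    """
    template = template.upper()
    if len(template) < 2 * anneal_len:
        raise ValueError("template too short for non-overlapping primers")
    forward = clamp + SITES[fwd_site] + template[:anneal_len]
    reverse = clamp + SITES[rev_site] + reverse_complement(template[-anneal_len:])
    return forward, reverse


# Hypothetical 60-nt open reading frame, for illustration only.
TEMPLATE = "ATGGCTTCTAAGGAAGTTCTTGCTGAGTACGGCTTTGCTCCAGTTGATCAGTTGAAGTAA"
fwd, rev = design_primers(TEMPLATE, "AscI", "PacI")
print(fwd)
print(rev)
```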

Example 2 Assembly of Genes Required for EPA and DHA Synthesis Within Binary Vectors

The general cloning strategy is depicted in FIG. 2.

Following the modular cloning scheme depicted in FIG. 2, genes were either synthesized by GeneArt (Regensburg) or PCR-amplified from cDNA using Phusion™ High-Fidelity DNA Polymerase (NEB, Frankfurt, Germany) according to the manufacturer's instructions. In both cases, an Nco I and/or Asc I restriction site was introduced at the 5′ terminus and a Pac I restriction site at the 3′ terminus (FIG. 2A) to enable cloning of these genes between functional elements such as promoters and terminators, such that the genes are functionally linked to both the respective promoter and terminator (see below in this example).

Promoter-terminator modules were created by complete synthesis by GeneArt (Regensburg) or by joining the corresponding expression elements using fusion PCR as described in example 1 and cloning the PCR-product into the TOPO-vector pCR2.1 (Invitrogen) according to the manufacturer's instructions (FIG. 2B). While joining terminator sequences to promoter sequences, recognition sequences for the restriction endonucleases Xma I, Sbf I, Fse I, Kas I, Fso I, Not I were added to either side of the modules in FIG. 2B, and the recognition sites for the restriction endonucleases Nco I, Asc I and Pac I were introduced between promoter and terminator (see FIG. 2B).

To obtain the final expression modules, PCR-amplified genes were cloned between promoter and terminator, or between intron and terminator, via Nco I and/or Pac I restriction sites (FIG. 2C).
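Since genes are cloned via Nco I/Asc I and Pac I and the modules are then combined via the multiple cloning site, an internal occurrence of any of these recognition sequences inside a gene or module would be cut during cloning. Screening sequences beforehand is an assumption about good practice rather than a step stated here; the sketch below uses the standard recognition sequences and a hypothetical `find_sites` helper. Fso I is omitted because its recognition sequence is not given in this description. All listed sites are palindromic, so scanning one strand suffices.

```python
import re
from typing import Dict, List

# Standard recognition sequences (5'->3'); all of them are palindromic.
SITES: Dict[str, str] = {
    "NcoI": "CCATGG", "AscI": "GGCGCGCC", "PacI": "TTAATTAA",
    "XmaI": "CCCGGG", "SbfI": "CCTGCAGG", "FseI": "GGCCGGCC",
    "KasI": "GGCGCC", "NotI": "GCGGCCGC",
}


def find_sites(seq: str, sites: Dict[str, str] = SITES) -> Dict[str, List[int]]:
    """Return the 0-based start positions of every (possibly overlapping) site."""
    seq = seq.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, motif in sites.items():
        # A zero-width lookahead lets overlapping occurrences be reported as well.
        starts = [m.start() for m in re.finditer(f"(?={motif})", seq)]
        if starts:
            hits[enzyme] = starts
    return hits


# Hypothetical coding sequence containing an internal Not I site.
gene = "ATGAAAGCGGCCGCTTTTAA"
print(find_sites(gene))
```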

Employing the custom multiple cloning site (MCS) containing the recognition sequences for the restriction endonucleases Xma I, Sbf I, Fse I, Kas I, Fso I and Not I, up to three expression modules were combined as desired to yield expression cassettes harbored by one of the pENTR/A, pENTR/B or pENTR/C constructs (FIG. 2D).

Finally, the Multisite Gateway™ System (Invitrogen) was used to combine the three expression cassettes harbored by pENTR/A, pENTR/B and pENTR/C (FIG. 2E) to obtain the final binary T-plasmids for plant transformation. Besides features for maintenance of the binary plasmid in E. coli and agrobacteria, the binary T-plasmid contains an acetohydroxyacid synthase (AHAS) gene to allow selection of transgenic plants.

To demonstrate the effectiveness of the enhancer of the invention, particularly of SEQ ID NO. 20 and SEQ ID NO. 46, three different promoter-enhancer combinations (SEQ ID NOS. 1-3) based on the Conlinin-1 promoter as described in WO02102970 (FIG. 8) were prepared as described above.

Nucleic acid sequence A comprises a conlinin-1 promoter of SEQ ID NO. 159, a delta-5 desaturase target gene coding for the amino acid sequence of SEQ ID NO. 11 and, between the promoter and the target gene, an untranslated region of the sequence of SEQ ID NO. 140 fused with the enhancer of SEQ ID NO. 46. SEQ ID NO. 1 thus comprises the promoter and the UTR up to the start codon.

Nucleic acid sequence B comprises the conlinin-1 promoter of SEQ ID NO. 159, the delta-5 desaturase target gene coding for the polypeptide of SEQ ID NO. 11 and, between the promoter and the target gene, an untranslated region according to SEQ ID NO. 324. This sequence lacks the last 24 nucleotides of the enhancer of the invention according to SEQ ID NO. 46 and completely lacks the enhancer sequence SEQ ID NO. 20. Instead, the last 24 nucleotides of SEQ ID NO. 46 have been replaced by the 38 nucleotides of SEQ ID NO. 325. Even though sequences A and B are of similar length, in sequence B the enhancer of the invention is replaced by a sequence with a significantly different number of G and T nucleotides. SEQ ID NO. 2 thus comprises the promoter and the UTR up to the start codon.

Nucleic acid sequence C comprises the conlinin-1 promoter of SEQ ID NO. 159, the delta-5 desaturase target gene coding for the polypeptide of SEQ ID NO. 11 and, between the promoter and the target gene, an untranslated region according to SEQ ID NO. 326. In this sequence, the last 24 nucleotides of the enhancer of the invention according to SEQ ID NO. 46 are replaced by the sequence “CC”. SEQ ID NO. 3 thus comprises the promoter and the UTR up to the start codon.

The delta-5 desaturase target gene converts the fatty acid 20:3n-6 to 20:4n-6 (arachidonic acid, ARA) and 20:4n-3 to 20:5n-3 (eicosapentaenoic acid, EPA). The reaction scheme is given in FIG. 1. In order to provide the substrates 20:3n-6 and 20:4n-3 for the delta-5 desaturase reporter gene in Brassica napus seeds, the constructs comprised, in addition to sequence A, B or C, respectively, further desaturase and elongase genes driven by other seed-specific promoters in various combinations. This ensures that the activity and expression of the target gene do not depend on any particular interaction with the desaturase and elongase genes or enzymes needed to provide the substrates of the target gene. Among the desaturase and elongase genes used were d12d15Des(Ac_GA) (cf. WO 2007042510), d12Des(Ce_GA) (cf. US 2003172398), d12Des(Co_GA2) (cf. WO 200185968), d12Des(Fg) (cf. WO 2007133425), d12Des(Ps_GA) (cf. WO 2006100241), d12Des(Tp_GA) (cf. WO 2006069710), d6Des(Ol_febit) (cf. WO 2008040787), d6Des(Ol_febit)2 (cf. WO 2008040787), d6Des(Ot_febit) (cf. WO 2008040787), d6Des(Ot_GA) (cf. WO 2005083093), d6Des(Ot_GA2) (cf. WO 2005083093), d6Des(Pir) (cf. WO 2002026946), d6Des(Pir_GAI) (cf. WO 2002026946), d6Des(Plu) (cf. WO 2007051577), d6Elo(Pp_GA) (cf. WO 2001059128), d6Elo(Pp_GA2) (cf. WO 2001059128), d6Elo(Pp_GA3) (cf. WO 2001059128), d6Elo(Tp_GA) (cf. WO 2005012316) and d6Elo(Tp_GA2) (cf. WO 2005012316).

Activity of the delta-5 desaturase was analyzed by measuring fatty acid concentrations in seeds as described in example 4 and calculating the conversion efficiency as the sum of the desaturated products (ARA and EPA) divided by the total of desaturase substrates and products (20:3n-6, 20:4n-3, ARA and EPA). The constructs frequently comprised genes for omega-3 desaturases. The presence of omega-3 desaturase genes was not motivated by the invention; the respective genes were present to answer questions unrelated to the present invention. Omega-3 desaturases only shift the ratio of the substrates to each other, and the ratio of the products to each other; their mode of action is depicted in FIG. 1. Thus, the presence of omega-3 desaturases does not influence the analysis of conversion efficiency, nor does it perceptibly influence the activity of the target gene or any other fatty acid desaturase activity.
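The conversion efficiency defined above is CE = 100 × (ARA + EPA) / (20:3n-6 + 20:4n-3 + ARA + EPA). A minimal Python sketch; the function and parameter names are hypothetical, and the inputs may be in any consistent unit (the tables below do not state the unit). The second part illustrates the argument about omega-3 desaturases: moving material from 20:3n-6 to 20:4n-3 and from ARA to EPA leaves both sums, and hence the efficiency, unchanged.

```python
import math


def conversion_efficiency(c20_3n6: float, c20_4n3: float, ara: float, epa: float) -> float:
    """Delta-5 desaturase conversion efficiency in percent:
    products (ARA + EPA) / (substrates 20:3n-6 + 20:4n-3 + products)."""
    products = ara + epa
    total = c20_3n6 + c20_4n3 + products
    if total <= 0:
        raise ValueError("no delta-5 desaturase substrates or products measured")
    return 100.0 * products / total


# Table 2 reports only the summed substrates and products, which is all the formula needs.
print(round(conversion_efficiency(6.9, 0.0, 6.9, 0.0)))  # LJB950 -> 50
print(round(conversion_efficiency(3.6, 0.0, 8.7, 0.0)))  # LJB997 -> 71

# Hypothetical amounts before and after an omega-3 desaturase shifts n-6 to n-3 species.
before = conversion_efficiency(4.0, 1.0, 3.0, 2.0)
after = conversion_efficiency(2.0, 3.0, 1.0, 4.0)
assert math.isclose(before, after)
```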

An alignment of the sequences found in the constructs of the present invention is shown in FIG. 3. An overview of the genetic elements employed in the constructs is given in Table 1. The delta-5 desaturase gene sequences SEQ ID NO. 10 and SEQ ID NO. 12 code for the identical polypeptide sequence. The activity of the desaturase therefore does not depend on which of the two gene sequences is used, and the sequences can be exchanged arbitrarily without altering the outcome of the comparison experiments.

TABLE 1 Overview of genetic elements. “p- . . . ”: promoter; “d5Des”: delta-5 desaturase; “o3Des”: omega-3 desaturase

Genetic element | SEQ ID NO. (DNA) | SEQ ID NO. (protein)
p-(1064 bp) | 1 | -
p-(1039 bp + 38) | 2 | -
p-(1039 bp + 2) | 3 | -
p-BnNapin | 4 | -
p-LuPXR | 5 | -
p-PvArc | 6 | -
p-VfSBP | 7 | -
p-BnFAE1 | 8 | -
p-VfUSP | 9 | -
d5Des(Tc_GA) | 10 | 11
d5Des(Tc_GA2) | 12 | 11
o3Des(Cp_GA) | 13 | 14
o3Des(Cp_GA2)_V282L | 15 | 16
o3Des(Pi_GA2) | 17 | 18
o3Des(Pi_GA) | 19 | 18

Example 3 General Procedure for Production of Transgenic Plants

In general, the transgenic rapeseed plants were generated by a modified protocol according to Moloney et al. (1992, Plant Cell Reports 8:238-242). For the generation of rapeseed plants, the binary vectors described in example 2 were transformed into Agrobacterium tumefaciens C58C1:pGV2260 (Deblaere et al. 1984, Nucl. Acids Res. 13: 4777-4788).

Overnight cultures of agrobacteria harbouring the binary vectors described in example 2 were grown in Murashige-Skoog medium (Murashige and Skoog 1962, Physiol. Plant. 15, 473) supplemented with 3% saccharose (3MS-Medium). Hypocotyls of sterile rapeseed plants were incubated in a petri dish for 5-10 minutes in a 1:50 dilution of the overnight agrobacterial culture. This was followed by a three-day co-incubation in darkness at 25° C. on 3MS-Medium with 0.8% Bacto agar. After three days, the cultures were transferred to MS-Medium containing 500 mg/l Claforan (cefotaxime sodium), 100 nM imazethapyr, 20 µM benzylaminopurine (BAP) and 1.6 g/l glucose, where they were cultivated for 7 days at 25° C. under 16 hours light/8 hours darkness conditions. Growing sprouts, indicating the presence of the T-DNA harboring the AHAS selectable marker, were transferred to MS-Medium containing 2% saccharose, 250 mg/l Claforan and 0.8% Bacto agar. Rooting could be stimulated by adding a growth hormone, for example indolebutyric acid.
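The supplement concentrations above mix percentage, mass and molar units. A small Python sketch for converting them to milligrams per batch; assumptions: “%” means % w/v (1% = 10 g/l), and the molar masses of BAP (C12H11N5, about 225.25 g/mol) and imazethapyr (C15H19N3O3, about 289.33 g/mol) are standard values not given in this description. `Component` and `mg_per_litre` are hypothetical names, MS basal salts are not modelled, and the batch volume is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    amount: float
    unit: str                           # one of "%", "g/l", "mg/l", "uM", "nM"
    molar_mass: Optional[float] = None  # g/mol, required for molar units


def mg_per_litre(c: Component) -> float:
    if c.unit == "%":  # assumed % w/v: 1 % = 10 g/l = 10,000 mg/l
        return c.amount * 10_000.0
    if c.unit == "g/l":
        return c.amount * 1_000.0
    if c.unit == "mg/l":
        return float(c.amount)
    if c.unit in ("uM", "nM"):
        if c.molar_mass is None:
            raise ValueError(f"{c.name}: molar mass needed for {c.unit}")
        mol_per_l = c.amount * (1e-6 if c.unit == "uM" else 1e-9)
        return mol_per_l * c.molar_mass * 1_000.0  # (mol/l * g/mol) = g/l, then g -> mg
    raise ValueError(f"{c.name}: unknown unit {c.unit!r}")


# Supplements of the selection medium described above.
SELECTION_MEDIUM = [
    Component("Claforan (cefotaxime sodium)", 500, "mg/l"),
    Component("imazethapyr", 100, "nM", molar_mass=289.33),
    Component("benzylaminopurine (BAP)", 20, "uM", molar_mass=225.25),
    Component("glucose", 1.6, "g/l"),
]

volume_l = 0.5  # hypothetical batch size
for component in SELECTION_MEDIUM:
    print(f"{component.name}: {mg_per_litre(component) * volume_l:.4g} mg per {volume_l} l")
```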

Regenerated sprouts were obtained on 2MS-Medium with imazethapyr and Claforan and were transferred to the greenhouse for further development. After flowering, the mature seeds were harvested and analysed for expression of the genes listed in example 2 via lipid analysis as described in example 4.

Example 4 Lipid Extraction and Lipid Analysis of Plant Oils

Total lipids were extracted from fresh or freeze-dried homogenized plant material (seed or cotyledons) by liquid/liquid extraction using tert-butyl methyl ether.

The fatty acid composition of the extracted lipids was subsequently determined by means of gas chromatography with flame-ionization detection or mass-selective detection after derivatization of the extracted lipids with trimethylsulfonium hydroxide.

Gas chromatographic separation of the fatty acid methyl esters so generated was performed on a suitable capillary column with (50% cyanopropylphenyl)-dimethylpolysiloxane as the stationary phase.

Identification and quantification of the separated chromatographic signals was accomplished by comparing the respective retention times and signal intensities with chromatograms of standard solutions of known composition and content of fatty acid methyl esters.
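The identification and quantification step can be sketched as nearest-retention-time matching followed by a response factor taken from the standard. A minimal Python sketch under assumptions not stated in the source: single-point external calibration, a linear detector response, a fixed 0.05-min matching window, and made-up retention times and areas. Identity confirmation by mass-selective detection is not modelled, and `StandardPeak`, `quantify` and `percent_of_total` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StandardPeak:
    name: str             # fatty acid methyl ester in the standard solution
    retention_min: float  # retention time measured for the standard
    area: float           # signal area of the standard peak
    amount_ug: float      # known amount of this ester in the standard injection


def quantify(sample_peaks: List[Tuple[float, float]], standards: List[StandardPeak],
             tolerance_min: float = 0.05) -> Dict[str, float]:
    """Assign sample peaks (retention time, area) to the nearest standard and convert
    areas to amounts with a single-point response factor (amount per area unit)."""
    amounts: Dict[str, float] = {}
    for rt, area in sample_peaks:
        nearest = min(standards, key=lambda s: abs(s.retention_min - rt))
        if abs(nearest.retention_min - rt) > tolerance_min:
            continue  # no standard elutes close enough: the peak stays unidentified
        response = nearest.amount_ug / nearest.area
        amounts[nearest.name] = amounts.get(nearest.name, 0.0) + area * response
    return amounts


def percent_of_total(amounts: Dict[str, float]) -> Dict[str, float]:
    total = sum(amounts.values())
    return {name: 100.0 * value / total for name, value in amounts.items()}


# Hypothetical standards and sample chromatogram, for illustration only.
STANDARDS = [StandardPeak("16:0", 10.00, 1000.0, 10.0), StandardPeak("18:1", 12.00, 2000.0, 10.0)]
SAMPLE = [(10.01, 500.0), (12.02, 1000.0), (15.00, 300.0)]
amounts = quantify(SAMPLE, STANDARDS)
print(amounts, percent_of_total(amounts))
```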

To generate transgenic plants containing the genetic elements described in example 2 for production of ARA and EPA in seeds, canola (Brassica napus) was transformed as described in example 3. Selected plants containing the genetic elements described in example 2 were grown until development of mature seeds (day/night cycle: 16 h at 200 µE and 21° C., 8 h in darkness at 19° C.). Fatty acids from harvested seeds were extracted and analyzed using gas chromatography.

Example 5 Comparison of Construct Containing the Promoter According to the Invention with Construct Containing Promoter Not According to the Invention

Two constructs (LJB950, comprising SEQ ID NO. 2 without any omega-3 desaturase, and LJB997, comprising SEQ ID NO. 1 without any omega-3 desaturase) were evaluated; they were identical in all genetic elements and in their arrangement in the construct, apart from the promoter-untranslated region combination driving reporter gene expression. Table 2 shows the results for the three independent transgenic plants (events) obtained for each of the two constructs.

TABLE 2 Comparison of conversion efficiency resulting from the use of two different versions of the Conlinin promoter

Construct | 20:3n-6 + 20:4n-3 | ARA + EPA | Conversion efficiency (%) | n (# of events) | N (# of constructs)
B) LJB950 (Conlinin 1039 bp + 38) | 6.9 | 6.9 | 50 | 3 | 1
A) LJB997 (Conlinin 1064 bp) | 3.6 | 8.7 | 71 | 3 | 1

Surprisingly, use of SEQ ID NO: 1 resulted in a significantly higher conversion efficiency than use of SEQ ID NO: 2 (71% versus 50%).

Example 6 Comparison of Constructs Containing the Promoter According to the Invention with Constructs Containing Promoters Not According to the Invention

A total of 69 constructs were evaluated which all express the delta-5-desaturase protein as shown in SEQ ID NO: 11 as a reporter gene. The reporter gene was functionally linked to SEQ ID NO: 1, SEQ ID NO: 2 or SEQ ID NO: 3. The differences between these three promoter versions are depicted in FIG. 3. In order to provide the substrates 20:3n-6 and 20:4n-3 for the delta-5-desaturase reporter gene in Brassica napus seeds, the constructs further contained desaturase and elongase genes driven by other seed specific promoters in various combinations as described in example 2 and table 1.

Table 3 shows that constructs using SEQ ID NO: 1 consistently display a significantly higher conversion efficiency than constructs using SEQ ID NO: 2 or SEQ ID NO: 3. This was particularly unexpected, as WO 0116340 taught the use of a promoter similar to SEQ ID NO: 3 with a reporter gene.

TABLE 3 Comparison of conversion efficiency resulting from the use of different constructs; delta-5 desaturase according to SEQ ID NO. 11 was the target gene for all constructs

Constructs | 20:3n-6 + 20:4n-3 | ARA + EPA | Conversion efficiency (%) | n (# of events) | N (# of constructs)
comprising SEQ ID NO. 2 and an omega-3 desaturase | 2 | 2.6 | 57 | 425 | 16
comprising SEQ ID NO. 2 without an omega-3 desaturase | 4.1 | 3.3 | 45 | 363 | 10
comprising SEQ ID NO. 3 and an omega-3 desaturase | 1.8 | 2 | 53 | 448 | 16
comprising SEQ ID NO. 1 and an omega-3 desaturase | 1.3 | 3.1 | 70 | 143 | 7
comprising SEQ ID NO. 1 without an omega-3 desaturase | 1.8 | 4 | 69 | 797 | 20

Claims

1. A recombinant nucleic acid, comprising a target gene and an untranslated region adjacent to the target gene, wherein the untranslated region comprises an enhancer of at least 18 consecutive nucleotides, wherein at least 14 nucleotides are adenosine or cytidine.
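The enhancer criterion of claim 1 (a stretch of at least 18 consecutive nucleotides, at least 14 of which are adenosine or cytidine) can be screened with a sliding window. A minimal Python sketch, assuming the criterion is read on the DNA sequence as A or C bases; `enhancer_windows` is a hypothetical name, the untranslated region is a made-up example, and the further features of the claims are not modelled.

```python
from typing import List


def enhancer_windows(seq: str, window: int = 18, min_ac: int = 14) -> List[int]:
    """0-based start positions of every `window`-long stretch containing at least
    `min_ac` adenosine (A) or cytidine (C) nucleotides."""
    flags = [1 if base in "AC" else 0 for base in seq.upper()]
    if len(flags) < window:
        return []
    count = sum(flags[:window])
    starts = [0] if count >= min_ac else []
    for start in range(1, len(flags) - window + 1):
        # Slide by one base: add the base entering on the right, drop the one leaving on the left.
        count += flags[start + window - 1] - flags[start - 1]
        if count >= min_ac:
            starts.append(start)
    return starts


# Hypothetical untranslated region: an 18-nt A/C-rich stretch flanked by G/T bases.
utr = "GGGGG" + "ACACACACACACACACAC" + "GTGT"
print(enhancer_windows(utr))
```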

2. A recombinant nucleic acid of claim 1, wherein the enhancer comprises any sequence according to SEQ ID NOS. 84, 85, 86, 87, 88 or 89.

3. A recombinant nucleic acid of claim 1, wherein the enhancer comprises or consists of

i) 18 consecutive nucleotides, of which at least 15 nucleotides are adenosine or cytidine, or
ii) 21 consecutive nucleotides, of which at least 15 nucleotides are adenosine or cytidine, or
iii) 22 consecutive nucleotides, of which at least 16 nucleotides are adenosine or cytidine, or
iv) 24 consecutive nucleotides, of which at least 18 nucleotides are adenosine or cytidine, or
v) 36 consecutive nucleotides, of which at least 27 nucleotides are adenosine or cytidine, or
vi) 57 consecutive nucleotides, of which at least 42 nucleotides are adenosine or cytidine, or
vii) 83 consecutive nucleotides, of which at least 62 nucleotides are adenosine or cytidine.
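Each of alternatives i) to vii) fixes a window length and a minimum A/C count. A minimal Python sketch that lists which alternatives a given stretch satisfies; it assumes the same A-or-C reading on the DNA strand, uses hypothetical names (`TIERS`, `max_ac_in_window`, `satisfied_tiers`) and a made-up 24-nt candidate, and does not model the “comprises or consists of” language.

```python
from typing import List, Optional, Tuple

# (window length, minimum number of A or C) for alternatives i) to vii) above.
TIERS: List[Tuple[int, int]] = [(18, 15), (21, 15), (22, 16), (24, 18), (36, 27), (57, 42), (83, 62)]


def max_ac_in_window(seq: str, window: int) -> Optional[int]:
    """Largest A/C count over all `window`-long stretches, or None if seq is shorter."""
    flags = [1 if base in "AC" else 0 for base in seq.upper()]
    if len(flags) < window:
        return None
    count = best = sum(flags[:window])
    for end in range(window, len(flags)):
        count += flags[end] - flags[end - window]  # slide the window one base to the right
        best = max(best, count)
    return best


def satisfied_tiers(seq: str) -> List[Tuple[int, int]]:
    result = []
    for window, min_ac in TIERS:
        best = max_ac_in_window(seq, window)
        if best is not None and best >= min_ac:
            result.append((window, min_ac))
    return result


candidate = "ACACACACACACACACAC" + "GTGTGT"  # hypothetical 24-nt stretch with 18 A/C
print(satisfied_tiers(candidate))
```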

4. A recombinant nucleic acid of claim 1, wherein the enhancer comprises any sequence according to

a) any of SEQ ID NOS. 20 to 45, or any sequence according to
b) the last 18, 21, 22, 24, 36 or 57 nucleotides of any of SEQ ID NOS. 46-83, 161-170, 221-230, 276-285 or 301-310, or
c) any of SEQ ID NOS. 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103 or 137, or
d) a sequence according to b) or c) with 1 additional base inserted therein.

5. A recombinant nucleic acid, comprising a plant promoter and an untranslated region adjacent to the promoter, wherein the untranslated region comprises an enhancer of at least 18 consecutive nucleotides, wherein at least 14 nucleotides are adenosine or cytidine.

6. A recombinant nucleic acid according to claim 5, wherein the enhancer comprises or consists of

i) 18 consecutive nucleotides, of which at least 15 nucleotides are adenosine or cytidine, or
ii) 21 consecutive nucleotides, of which at least 15 nucleotides are adenosine or cytidine, or
iii) 22 consecutive nucleotides, of which at least 16 nucleotides are adenosine or cytidine, or
iv) 24 consecutive nucleotides, of which at least 18 nucleotides are adenosine or cytidine, or
v) 36 consecutive nucleotides, of which at least 27 nucleotides are adenosine or cytidine, or
vi) 57 consecutive nucleotides, of which at least 42 nucleotides are adenosine or cytidine, or
vii) 83 consecutive nucleotides, of which at least 62 nucleotides are adenosine or cytidine.

7. A recombinant nucleic acid of claim 5, wherein the enhancer comprises any sequence according to

a) any of SEQ ID NOS. 20 to 45, or any sequence according to
b) the last 18, 21, 22, 24, 36 or 57 nucleotides of any of SEQ ID NOS. 46 to 83, or
c) any of SEQ ID NOS. 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103 or 137, or
d) a sequence according to b) or c) with 1 additional base inserted therein.

8. A recombinant nucleic acid of claim 1, wherein the enhancer comprises

a) a CCAAT-Box comprising SEQ ID NO. 100, and/or
b) a Dof1/MNB1a binding site comprising SEQ ID NO. 102.

9. A recombinant nucleic acid of claim 1, wherein the target gene is a fatty acid desaturase or elongase gene.

10. An expression cassette, comprising a recombinant nucleic acid of claim 1 and a plant promoter,

wherein the promoter comprises a TATA-box, and a CPRF factor binding site, and a TCP class I transcription factor binding site, and a bZIP protein G-Box binding factor 1 binding site.

11. An expression cassette of claim 10, wherein the promoter further comprises one or more of the following sequences:

a Ry motif,
a prolamin box,
a cis-element as in GAPDH promoters conferring light inducibility,
a SBF-1 binding site,
a Sunflower homeodomain leucine-zipper protein Hahb-4 binding site.

12. An expression cassette of claim 10, wherein the promoter comprises or consists of

a) a nucleic acid according to any of SEQ ID NOS. 141, 142, 144, 145, 147, 148, 150, 151, 153, 154, 156, 157, 159 to 310, or
b) a nucleic acid having at least 70% sequence identity to any of the nucleic acid sequences according to a).

13. An expression cassette of claim 10, comprising or consisting of the sequence according to SEQ ID NO. 1 or of a sequence having at least 70% sequence identity to the sequence of SEQ ID NO. 1.
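The claims do not state how “sequence identity” is computed, and different tools and parameters give different values. As an illustration only, the sketch below assumes a Needleman-Wunsch global alignment with hypothetical scores (match +1, mismatch -1, linear gap -2) and defines identity as identical aligned positions divided by alignment length including gaps; it is not the method of any particular program, and its O(n·m) memory use suits promoter-length sequences only.

```python
from typing import List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty; returns gapped strings."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by alignment length (gaps included)."""
    aligned_a, aligned_b = global_align(a.upper(), b.upper())
    if not aligned_a:
        raise ValueError("cannot compare two empty sequences")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def meets_identity_threshold(a: str, b: str, threshold: float = 70.0) -> bool:
    return percent_identity(a, b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```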

14. An expression cassette of claim 10, wherein the target gene does not consist of a sequence according to SEQ ID NO. 311.

15. A vector comprising an expression cassette of claim 10.

16. A plant, plant organ or plant cell comprising

a recombinant nucleic acid of claim 1, operably linked to a promoter.

17. Method of increasing expression or activity of a target gene, comprising the steps of

i) providing, upstream of the target gene, an untranslated region and a plant promoter to obtain an expression cassette of claim 9, and
ii) introducing the expression cassette into a plant cell.

18. A plant, plant organ or plant cell comprising

an expression cassette comprising a recombinant nucleic acid comprising a target gene and an untranslated region adjacent to the target gene, wherein the untranslated region comprises an enhancer of at least 18 consecutive nucleotides, wherein at least 14 nucleotides are adenosine or cytidine
and a plant promoter, wherein the promoter comprises a TATA-box, and a CPRF factor binding site, and a TCP class I transcription factor binding site, and a bZIP protein G-Box binding factor 1 binding site.
Patent History
Publication number: 20200362358
Type: Application
Filed: Aug 4, 2020
Publication Date: Nov 19, 2020
Inventors: Toralf Senger (Durham, NC), Joerg Bauer (Research Triangle Park, NC)
Application Number: 16/984,378
Classifications
International Classification: C12N 15/82 (20060101); C12N 15/113 (20060101);