Alteration of Plant Embryo/Endosperm Size During Seed Development

Info

Publication number: 20090217412
Type: Application
Filed: Mar 21, 2006
Publication Date: Aug 27, 2009
Inventors: Hajime Sakai (Newark, DE), Nobuhiro Nagasawa (Newark, DE)
Application Number: 11/885,914

Abstract

Isolated nucleic acid fragments and recombinant constructs comprising such fragments useful for altering embryo/endosperm size during seed development are disclosed along with a method of controlling embryo/endosperm size during development in plants using such recombinant constructs.

Description

This application claims the benefit of U.S. Provisional Application No. 60/664,512, filed 23 Mar. 2005, the entire content of which is hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention is in the field of plant breeding and genetics and, in particular, relates to recombinant constructs useful for altering embryo/endosperm size during seed development.

BACKGROUND OF THE INVENTION

Elucidation of how the size of a developing embryo is genetically regulated is important because the final volume of endosperm as a storage organ of starch and proteins is affected by embryo size in cereal crops. Researchers have found that genes involved in embryo size contribute to the regulation of endosperm development. Investigation of these genes is important for agriculture because cereal endosperms are the staple diet in many countries.

Rice mutants having a normally differentiated shoot and radicle and either a reduced or an enlarged embryo when compared to wild type rice were identified in the early 1990s in plants obtained from the methyl-nitrosourea mutagenized Taichung 65 cultivar. Mutant plants displaying an enlarged embryo were designated giant embryo (ge) mutants while plants displaying a smaller embryo were designated reduced embryo (re) mutants (Kitano et al. 1993, Plant J. 3:607-610; Hong et al. 1995, Dev. Genet. 16:298-310).

The phenotypes of the three reduced embryo mutants were designated re1, re2, and re3 even though the gene(s) responsible for these phenotypes have not been characterized. A mutation in a different locus is responsible for each mutant phenotype. Phenotypic analysis of ge and re mutant plants led to the theory that embryo size may be determined by the interaction between embryo-specific genes and endosperm-specific genes regulating endosperm development (Hong et al. (1996) Development 122:2051-2058).

The reduced embryo size phenotype of re2 mutant plants is associated with the enlargement of the endosperm size without altering the overall seed size. This phenotype is potentially useful for improving cereal quality by increasing the amount of endosperm tissue, which is rich in starch and other nutrients. Moreover, the reduction of embryo size in seed has a potential benefit for some milling processes, where embryonic tissues are considered as waste, such as in the production of ethanol.

SUMMARY OF THE INVENTION

In a first embodiment, the invention concerns an isolated polynucleotide comprising:

- (a) a nucleic acid sequence encoding a polypeptide involved in altering embryo/endosperm size during seed development, said polypeptide having at least 80% amino acid sequence identity, based on the Clustal V method of alignment, when compared to an amino acid sequence selected from the group consisting of SEQ ID NOs:37, 39, 41, 43, 45, 47, 49, 51, and 53; or
- (b) a nucleic acid sequence set forth in SEQ ID NO:25 wherein said sequence comprises at least one of the following modifications:
  - (i) nucleotide 278 is a T residue instead of a C;
  - (ii) nucleotide 110 is a T residue instead of a G; or
  - (iii) nucleotide 75 is deleted; or
- (c) a nucleic acid sequence set forth in SEQ ID NO:34 wherein
  - (i) nucleotides 4473 through 4829 correspond to a first exon, and
  - (ii) nucleotides 5661 through 6110 correspond to a second exon, and
  - further wherein the nucleotides of (c)(i) and/or (c)(ii) encode a polypeptide involved in altering embryo/endosperm size during seed development; or
- (d) a nucleic acid sequence set forth in SEQ ID NO:34 or 72; or
- (e) the full complement of (a), (b), (c), (d), or SEQ ID NO:34; or
- (f) all or part of a non-coding or coding region of the isolated polynucleotide comprising sequences of (a), (b) or SEQ ID NO:34 for use in co-suppression or antisense suppression of endogenous nucleic acid sequences encoding polypeptides involved in altering embryo/endosperm size during seed development.

In a second embodiment, the invention concerns a recombinant DNA construct comprising the isolated polynucleotide of the invention operably linked to at least one regulatory sequence.

In a third embodiment, the invention concerns a plant comprising in its genome the recombinant DNA construct of the invention as well as any seeds obtained from such a plant and oil obtained from such seeds. Also of interest are transformed plant tissue or plant cells comprising the recombinant DNA construct of the invention.

In a fourth embodiment, the invention concerns a method of altering embryo/endosperm size during seed development in a plant comprising:

- (a) transforming plant cells or plant tissue with the recombinant DNA construct of the invention;
- (b) regenerating transgenic plants from the transformed plant cells or plant tissue of (a);
- (c) screening the transgenic plants of (b) for seeds having an altered embryo/endosperm size based on a comparison of embryo/endosperm size of seeds obtained from non-transformed plants.

In a fifth embodiment, the invention concerns a method of mapping genetic variations related to controlling embryo/endosperm size and/or altering oil phenotype in plants comprising:

- (a) crossing two plant varieties; and
- (b) evaluating genetic variations with respect to
  - (i) a nucleic acid sequence selected from the group consisting of SEQ ID NOs:25, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, and 72; or
  - (ii) a nucleic acid sequence encoding a polypeptide selected from the group consisting of SEQ ID NOs:26, 29, 31, 33, 37, 39, 41, 43, 45, 47, 49, 51, and 53;
    in progeny plants resulting from the cross of step (a) wherein the evaluation is made using a method selected from the group consisting of RFLP (restriction fragment length polymorphism) analysis, SNP (single nucleotide polymorphism) analysis, and PCR-based analysis.

In a sixth embodiment, the invention concerns a method of molecular breeding to control embryo/endosperm size and/or alter oil phenotype in plants comprising:

- (a) crossing two plant varieties; and
- (b) evaluating genetic variations with respect to
  - (i) a nucleic acid sequence selected from the group consisting of SEQ ID NOs:25, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, and 72; or
  - (ii) a nucleic acid sequence encoding a polypeptide selected from the group consisting of SEQ ID NOs:26, 29, 31, 33, 37, 39, 41, 43, 45, 47, 49, 51, and 53;
    in progeny plants resulting from the cross of step (a) wherein the evaluation is made using a method selected from the group consisting of RFLP analysis, SNP analysis, and PCR-based analysis.

BRIEF DESCRIPTION OF THE FIGURES AND SEQUENCE LISTINGS

The invention can be more fully understood from the following detailed description and the accompanying drawings and Sequence Listing that form a part of this application.

FIG. 1A-D shows an alignment of the nucleotide sequences obtained for wild type RE2 (SEQ ID NO:25), and mutants re2-1 (SEQ ID NO:28), re2-2 (SEQ ID NO:30), and re2-3 (SEQ ID NO:32). Changes in the nucleotide sequence are indicated by a star below the alignment and by a box around the nucleotides at that position. Numbers at the left of the alignment indicate the nucleotide position.

FIG. 2 shows an alignment of the amino acid sequences obtained for polypeptides from wild type RE2 protein (SEQ ID NO:26), and re2-1 mutant protein (SEQ ID NO:29), and re2-2 mutant protein (SEQ ID NO:31). Changes in the amino acid sequence are indicated by a star below the alignment and by a box around the amino acids at that position. As seen in FIG. 2, mutant allele re2-1 had an isoleucine at amino acid 93 instead of the highly conserved threonine; mutant allele re2-2 had a phenylalanine instead of the conserved cysteine at amino acid 37. The deletion of a nucleotide at position 75 in mutant allele re2-3 gene produced a frame shift that results in a 127 amino acid polypeptide for the re2-3 mutant protein (set forth in SEQ ID NO:33) that is quite different than the one encoded by wild type RE2 gene or mutant genes re2-1 or re2-2. Numbers at the left of the alignment indicate the amino acid position.
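The correspondence between a nucleotide position in the open reading frame and the affected amino acid can be checked with simple codon arithmetic. The following minimal Python sketch is illustrative only and is not part of the disclosed sequences: the function names are hypothetical, and the codon TGC is an assumed example of a cysteine codon, since the actual codon of SEQ ID NO:25 is not reproduced here. It shows that nucleotide 110 falls at the second position of codon 37, and that a G to T change at that position converts a cysteine codon into a phenylalanine codon, as described for re2-2.

```python
def codon_position(nt_pos):
    """Map a 1-based ORF nucleotide position to (codon number, position in codon)."""
    codon_number = (nt_pos - 1) // 3 + 1
    pos_in_codon = (nt_pos - 1) % 3 + 1
    return codon_number, pos_in_codon


# Only the standard-code codons needed for this illustration.
CODON_TABLE = {"TGT": "Cys", "TGC": "Cys", "TTT": "Phe", "TTC": "Phe"}


def substitute(codon, pos_in_codon, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[:pos_in_codon - 1] + new_base + codon[pos_in_codon:]


codon_number, pos_in_codon = codon_position(110)
wild_type_codon = "TGC"  # assumption: one of the two cysteine codons
mutant_codon = substitute(wild_type_codon, pos_in_codon, "T")

print(codon_number, pos_in_codon)  # codon 37, second position
print(CODON_TABLE[wild_type_codon], "->", CODON_TABLE[mutant_codon])  # Cys -> Phe
```

Either cysteine codon (TGT or TGC) gives a phenylalanine codon after this substitution, so the conclusion does not depend on the assumed codon.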

FIG. 3A-C depicts the Clustal V alignment obtained for the amino acid sequences from the rice wild type RE2 protein (SEQ ID NO:26), the O. sativa protein having NCBI General Identifier No. 18652509 (SEQ ID NO:27), the A. thaliana LOB domain 18 protein having NCBI General Identifier No. 17227164 (SEQ ID NO:54), and the amino acid sequences of the polypeptides encoded by corn clones cef1f.pk001.f4:fis (SEQ ID NO:37), cpf1c.pk006.d18a:fis (SEQ ID NO:39), cpi1c.pk005.a12:fis (SEQ ID NO:41), and cr1n.pk0028.h3a:fis (SEQ ID NO:43), Euphorbia lagascae clone eel1c.pk003.b10:fis (SEQ ID NO:45), columbine clone eav1c.pk003.c9 (SEQ ID NO:47), guar clone lds3c.pk011.j11:fis (SEQ ID NO:49), soybean clone sdr1f.pk005.d21.f:fis (SEQ ID NO:51), and wheat clone wdr1f.pk002.l10:fis (SEQ ID NO:53). The program uses dashes to maximize the alignment. An asterisk (*) below the alignment indicates amino acids conserved among all the sequences. The conserved C-block, GAS-block, and leucine zipper motifs are shown boxed. Numbers at the left of the alignment indicate the amino acid position.

The following sequence descriptions and sequence listings attached hereto comply with the rules governing nucleotide and/or amino acid sequence disclosures in patent applications as set forth in 37 C.F.R. §1.821-1.825.

SEQ ID NO:1 is the nucleotide sequence of oligonucleotide primer C10 6-3 used to amplify CAPS marker C10 7.7 to identify the re2 locus.

SEQ ID NO:2 is the nucleotide sequence of oligonucleotide primer C10 6-4 used to amplify CAPS marker C10 7.7 to identify the re2 locus.

SEQ ID NO:3 is the nucleotide sequence of oligonucleotide primer C10 15.9-1 used to amplify CAPS marker C10 15.9 to identify the re2 locus.

SEQ ID NO:4 is the nucleotide sequence of oligonucleotide primer C10 15.9-2 used to amplify CAPS marker C10 15.9 to identify the re2 locus.

SEQ ID NO:5 is the nucleotide sequence of oligonucleotide primer C10-7.7 2 HPYIVF used to amplify CAPS marker C10 7.7 Hpy.

SEQ ID NO:6 is the nucleotide sequence of oligonucleotide primer C10-7.7 2 HPYIVR used to amplify CAPS marker C10 7.7 Hpy.

SEQ ID NO:7 is the nucleotide sequence of oligonucleotide primer 11.5 HpyV used to amplify CAPS marker C10 11.5.

SEQ ID NO:8 is the nucleotide sequence of oligonucleotide primer C10 11.5-9 used to amplify CAPS marker C10 11.5.

SEQ ID NO:9 is the nucleotide sequence of oligonucleotide primer C10 11-5 used to amplify CAPS marker C10 11.0.

SEQ ID NO:10 is the nucleotide sequence of oligonucleotide primer 11 HinfR used to amplify CAPS marker C10 11.0.

SEQ ID NO:11 is the nucleotide sequence of oligonucleotide primer 9.6 DraIF used to amplify CAPS marker C10 9.6.

SEQ ID NO:12 is the nucleotide sequence of oligonucleotide primer 9.6 DraIR used to amplify CAPS marker C10 9.6.

SEQ ID NO:13 is the nucleotide sequence of the oligonucleotide primer E08 93KF used to amplify CAPS marker E08 93K.

SEQ ID NO:14 is the nucleotide sequence of the oligonucleotide primer E08 93KR used to amplify CAPS marker E08 93K.

SEQ ID NO:15 is the nucleotide sequence of the oligonucleotide primer E08 46KF used to amplify CAPS marker E08 46K.

SEQ ID NO:16 is the nucleotide sequence of the oligonucleotide primer E08 46KR used to amplify CAPS marker E08 46K.

SEQ ID NO:17 is the nucleotide sequence of the oligonucleotide primer K08 21KF used to amplify CAPS marker K08 21K.

SEQ ID NO:18 is the nucleotide sequence of the oligonucleotide primer K08 21KR used to amplify CAPS marker K08 21K.

SEQ ID NO:19 is the nucleotide sequence of the oligonucleotide primer K08 46KF used to amplify SNP-based marker K08 46K.

SEQ ID NO:20 is the nucleotide sequence of the oligonucleotide primer K08 46KR used to amplify SNP-based marker K08 46K.

SEQ ID NO:21 is the nucleotide sequence of the oligonucleotide primer LOB-82F used to amplify the first exon (exon 1) of RE2 wild type gene or re2 mutant gene from genomic DNA.

SEQ ID NO:22 is the nucleotide sequence of the oligonucleotide primer LOB R1 used to amplify the first exon (exon 1) of RE2 wild type gene or re2 mutant gene from genomic DNA.

SEQ ID NO:23 is the nucleotide sequence of the oligonucleotide primer LOB F2 used to amplify the second exon (exon 2) of RE2 wild type gene or re2 mutant gene from genomic DNA.

SEQ ID NO:24 is the nucleotide sequence of the oligonucleotide primer LOB R2 used to amplify the second exon (exon 2) of RE2 wild type gene or re2 mutant gene from genomic DNA.

SEQ ID NO:25 is the nucleotide sequence of the wild-type rice RE2 gene open reading frame (ORF) identified in the instant application.

SEQ ID NO:26 is the amino acid sequence of the wild-type rice RE2 protein derived from translating nucleotides 1 through 807 of SEQ ID NO:25.

SEQ ID NO:27 is the amino acid sequence of the rice protein of unknown function found in the NCBI database as Version AAL77143.1 having NCBI General Identifier No. 18652509.

SEQ ID NO:28 is the nucleotide sequence obtained for mutant allele re2-1 gene.

SEQ ID NO:29 is the amino acid sequence of a re2-1 mutant allele protein obtained by translating nucleotides 1 through 807 of SEQ ID NO:28.

SEQ ID NO:30 is the nucleotide sequence obtained for mutant allele re2-2 gene.

SEQ ID NO:31 is the amino acid sequence of a re2-2 mutant allele protein obtained by translating nucleotides 1 through 807 of SEQ ID NO:30.

SEQ ID NO:32 is the nucleotide sequence obtained for mutant allele re2-3 gene.

SEQ ID NO:33 is the amino acid sequence of a re2-3 mutant allele protein obtained by translating nucleotides 1 through 378 of SEQ ID NO:32.

SEQ ID NO:34 is the nucleotide sequence of the approximately 9 Kb BamH I fragment from RE2G4 which comprises the RE2 wild type gene coding region. Nucleotides 1 through 4472 are 5′ of the ATG initiation codon, nucleotides 4473 through 4829 correspond to the first exon, nucleotides 4830 through 5660 correspond to an intron, and nucleotides 5661 through 6110 correspond to the second exon. Nucleotides 6111 through 6113 form a termination codon.
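The coordinates given for SEQ ID NO:34 can be cross-checked against the 807-nucleotide open reading frame of SEQ ID NO:25. The following Python sketch is offered only as an illustration: the dictionary keys and function names are hypothetical, and the genomic sequence itself is not reproduced. It confirms that the annotated features are contiguous and that the two exons together span 807 nucleotides, that is, 269 codons.

```python
# 1-based, inclusive coordinates as quoted for SEQ ID NO:34.
FEATURES = {
    "upstream": (1, 4472),
    "exon1": (4473, 4829),
    "intron": (4830, 5660),
    "exon2": (5661, 6110),
    "stop": (6111, 6113),
}


def span_length(span):
    """Length of a 1-based inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def splice(genomic, spans):
    """Join 1-based inclusive spans of a genomic sequence string."""
    return "".join(genomic[start - 1:end] for start, end in spans)


def is_contiguous(spans):
    """True if each span starts immediately after the previous one ends."""
    return all(b[0] == a[1] + 1 for a, b in zip(spans, spans[1:]))


coding_length = span_length(FEATURES["exon1"]) + span_length(FEATURES["exon2"])
print(span_length(FEATURES["exon1"]), span_length(FEATURES["exon2"]))  # 357 450
print(coding_length, coding_length // 3)  # 807 269
print(is_contiguous(list(FEATURES.values())))  # True
```

Given the full genomic string, `splice(genomic, [FEATURES["exon1"], FEATURES["exon2"]])` would return the joined exons for comparison with SEQ ID NO:25.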

SEQ ID NO:35 is the nucleotide sequence of vector pML18 used to subclone the approximately 9 Kb BamH I fragment from RE2G4 comprising the rice RE2 wild type gene coding region.

SEQ ID NO:36 is the nucleotide sequence comprising the entire cDNA insert in clone cef1f.pk001.f4:fis encoding a putative corn RE2 protein homolog.

SEQ ID NO:37 is the deduced amino acid sequence of a putative corn RE2 protein homolog derived from nucleotides 76 through 851 of SEQ ID NO:36.

SEQ ID NO:38 is the nucleotide sequence comprising the entire cDNA insert in clone cpf1c.pk006.d18a:fis encoding a putative corn RE2 protein homolog.

SEQ ID NO:39 is the deduced amino acid sequence of a putative corn RE2 protein homolog derived from nucleotides 151 through 804 of SEQ ID NO:38.

SEQ ID NO:40 is the nucleotide sequence comprising the entire cDNA insert in clone cpi1c.pk005.a12:fis encoding a putative corn RE2 protein homolog.

SEQ ID NO:41 is the deduced amino acid sequence of a putative corn RE2 protein homolog derived from nucleotides 81 through 854 of SEQ ID NO:40.

SEQ ID NO:42 is the nucleotide sequence comprising the entire cDNA insert in clone cr1n.pk0028.h3a:fis encoding a putative corn RE2 protein homolog.

SEQ ID NO:43 is the deduced amino acid sequence of a putative corn RE2 protein homolog derived from nucleotides 158 through 658 of SEQ ID NO:42.

SEQ ID NO:44 is the nucleotide sequence comprising the entire cDNA insert in clone eel1c.pk003.b10:fis encoding a putative Euphorbia RE2 protein homolog.

SEQ ID NO:45 is the deduced amino acid sequence of a putative Euphorbia RE2 protein homolog derived from nucleotides 71 through 823 of SEQ ID NO:44.

SEQ ID NO:46 is the nucleotide sequence comprising a portion of the cDNA insert in clone eav1c.pk003.c9 encoding a fragment of a putative columbine RE2 protein homolog.

SEQ ID NO:47 is the deduced amino acid sequence of a fragment of a putative columbine RE2 protein homolog derived from nucleotides 2 through 382 of SEQ ID NO:46.

SEQ ID NO:48 is the nucleotide sequence comprising the entire cDNA insert in clone lds3c.pk011.j11:fis encoding a putative guar RE2 protein homolog.

SEQ ID NO:49 is the deduced amino acid sequence of a putative guar RE2 protein homolog derived from nucleotides 146 through 898 of SEQ ID NO:48.

SEQ ID NO:50 is the nucleotide sequence comprising the entire cDNA insert in clone sdr1f.pk005.d21.f:fis encoding putative soybean RE2 protein homolog.

SEQ ID NO:51 is the deduced amino acid sequence of a putative soybean RE2 protein homolog derived from nucleotides 971 through 1609 of SEQ ID NO:50.

SEQ ID NO:52 is the nucleotide sequence comprising the entire cDNA insert in clone wdr1f.pk002.l10:fis encoding a putative wheat RE2 protein homolog.

SEQ ID NO:53 is the deduced amino acid sequence of a putative wheat RE2 protein homolog derived from nucleotides 80 through 640 of SEQ ID NO:52.

SEQ ID NO:54 is the amino acid sequence of the Arabidopsis thaliana LOB domain 18 protein having NCBI General Identifier No. 17227164.

SEQ ID NO:55 is the consensus amino acid sequence included in the C block of RE2 protein homologs.

SEQ ID NO:56 is the amino acid sequence of the motif at the N-terminus of the 49 amino acid GAS block of RE2 protein homologs.

SEQ ID NO:57 is the amino acid sequence of the motif at the C-terminus of the 49 amino acid GAS block of RE2 protein homologs.

SEQ ID NO:58 is the amino acid sequence of the Leucine-zipper motif of RE2 protein homologs.

SEQ ID NO:59 is the nucleotide sequence of oligonucleotide primer Cpi Bbsl F used to amplify genomic Zea mays RE2 gene.

SEQ ID NO:60 is the nucleotide sequence of oligonucleotide primer Cpi Bbsl R used to amplify genomic Zea mays RE2 gene.

SEQ ID NO:61 is the nucleotide sequence of the genomic fragment encoding a maize RE2 protein homolog obtained by amplifying a maize genomic library with primers Cpi Bbsl F and Cpi Bbsl R. Nucleotides 79 through 429 correspond to the first exon, nucleotides 430 through 1363 correspond to an intron, and nucleotides 1364 through 1783 correspond to the second exon.

SEQ ID NO:62 is the nucleotide sequence of oligonucleotide primer RE2 pro Bst 2F used for amplifying a portion of the 5′ region of the OsRE2 gene.

SEQ ID NO:63 is the nucleotide sequence of oligonucleotide primer RE2 PRO R Bbsl used for amplifying a portion of the 5′ region of the OsRE2 gene.

SEQ ID NO:64 is the nucleotide sequence of plasmid RE2Pro comprising a portion of the OsRE2 gene promoter region.

SEQ ID NO:65 is the nucleotide sequence of oligonucleotide primer RE2 TERM Xbal R used for amplifying a 780 bp fragment of the 3′ terminator region from the OsRE2 gene.

SEQ ID NO:66 is the nucleotide sequence of oligonucleotide primer RE2 TERM EcoBspml used for amplifying a 780 bp fragment of the 3′ terminator region from the OsRE2 gene.

SEQ ID NO:67 is the nucleotide sequence of plasmid RE2TERGEM comprising a portion of the OsRE2 gene terminator region.

SEQ ID NO:68 is the nucleotide sequence of an oligonucleotide primer that may be used to identify RE2 homologs from other plant species.

SEQ ID NO:69 is the nucleotide sequence of an oligonucleotide primer that may be used to identify RE2 homologs from other species.

SEQ ID NO:70 is the nucleotide sequence of an oligonucleotide primer that may be used to identify RE2 homologs from other plant species.

SEQ ID NO:71 is the nucleotide sequence of the “RE2 second exon probe” used to screen for cDNAs encoding RE2 proteins.

SEQ ID NO:72 is the nucleotide sequence of clone RE2 cDNA C1, the longest cDNA clone identified encoding an RE2 protein.

The Sequence Listing contains the one letter code for nucleotide sequence characters and the three letter codes for amino acids as defined in conformity with the IUPAC-IUBMB standards described in Nucleic Acids Res. 13:3021-3030 (1985) and in the Biochemical J. 219 (No. 2):345-373 (1984) which are herein incorporated by reference. The symbols and format used for nucleotide and amino acid sequence data comply with the rules set forth in 37 C.F.R. §1.822.

DETAILED DESCRIPTION OF THE INVENTION

The disclosures of all references, patents, and patent applications cited herein are hereby incorporated by reference.

The terms “isolated nucleic acid fragment” and “isolated polynucleotide” are used interchangeably herein. These terms refer to a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. An isolated nucleic acid fragment in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA. Nucleotides (usually found in their 5′-monophosphate form) are referred to by their single letter designation as follows: “A” for adenylate or deoxyadenylate (for RNA or DNA, respectively), “C” for cytidylate or deoxycytidylate, “G” for guanylate or deoxyguanylate, “U” for uridylate, “T” for deoxythymidylate, “R” for purines (A or G), “Y” for pyrimidines (C or T), “K” for G or T, “H” for A or C or T, “I” for inosine, and “N” for any nucleotide.
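The ambiguity codes listed above can be applied, for example, when enumerating the concrete sequences represented by a degenerate oligonucleotide. The following Python sketch is illustrative only: the table covers just the DNA codes named in the preceding paragraph (inosine and uridylate are omitted), and the function name and example sequence are hypothetical.

```python
from itertools import product

# DNA single-letter codes from the paragraph above ("I" and "U" are omitted).
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purines
    "Y": "CT",    # pyrimidines
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def expand(seq):
    """List every unambiguous DNA sequence matched by a degenerate sequence."""
    choices = [AMBIGUITY[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```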

It has been reported that the Lateral Organ Boundary (LOB) gene in Arabidopsis has a potential role in lateral organ development. See Shuai et al., (2002), Plant Phys. 129, 747-761. Shuai et al. found LOB gene expression at the base of lateral organs in the shoots and roots of Arabidopsis. In fact, 23 members of the LOB domain family (LBD) of genes were found to exhibit expression patterns in the root tissues of Arabidopsis.

The LOB domain 18 protein is considered as being in the class I group of the Lateral Organ Boundaries (LOB) domain protein plant-specific gene family. The Class I LOB domain proteins contain a C-block, a GAS-block, and a leucine zipper motif (Shuai, B. et al., 2002, Plant Phys. 129:747-761). Thus, it is expected that an Oryza sativa RE2 protein and its homologs would also contain a C-block, a GAS-block, and a leucine zipper motif. The consensus sequences of these motifs were identified using a Clustal V alignment and are indicated in FIG. 3.

FIG. 3A-C depicts the Clustal V alignment obtained for the amino acid sequences from the rice wild type RE2 protein (SEQ ID NO:26), the O. sativa protein having NCBI General Identifier No. 18652509 (SEQ ID NO:27), the A. thaliana LOB domain 18 protein having NCBI General Identifier No. 17227164 (SEQ ID NO:54), and the amino acid sequences of the polypeptides encoded by corn clones cef1f.pk001.f4:fis (SEQ ID NO:37), cpf1c.pk006.d18a:fis (SEQ ID NO:39), cpi1c.pk005.a12:fis (SEQ ID NO:41), and cr1n.pk0028.h3a:fis (SEQ ID NO:43), Euphorbia lagascae clone eel1c.pk003.b10:fis (SEQ ID NO:45), columbine clone eav1c.pk003.c9 (SEQ ID NO:47), guar clone lds3c.pk011.j11:fis (SEQ ID NO:49), soybean clone sdr1f.pk005.d21.f:fis (SEQ ID NO:51), and wheat clone wdr1f.pk002.l10:fis (SEQ ID NO:53). The program uses dashes to maximize the alignment. An asterisk (*) below the alignment indicates amino acids conserved among all the sequences. The conserved C-block, GAS-block, and leucine zipper motifs are shown boxed.

It has been found in the present invention that a single mutation of a rice gene encoding a member of a class I LOB domain protein family can lead to alteration of embryo/endosperm size during seed development.

The gene associated with the reduced embryo phenotype is named Reduced Embryo2 (RE2). Silencing or inhibition of this gene leads to a reduction of embryonic tissue, thus, resulting in a smaller embryo size and a concomitantly larger endosperm size. Reduction of embryo size will result in seeds having a reduced amount of components such as oils. On the other hand, overexpression of this gene might lead to an increase of embryonic tissue, thus, resulting in a larger embryo size and a concomitantly smaller endosperm size.

The italicized and uppercase term “RE2” as used herein refers to a genetic locus capable of expressing a Reduced Embryo 2 protein. The italicized and lowercase term “re2” as used herein refers to a mutated form of RE2. Italics are not used when referring to a protein or polypeptide encoded by the genetic locus. Thus, the uppercase term “RE2” as used herein refers to the wild type protein, and the lowercase “re2” as used herein refers to a mutant protein. As was noted above, the rice RE2 isolated polynucleotide was identified in the instant application using high fidelity mapping of DNA obtained from reduced embryo 2 (re2) mutant plants. These mutant plants produce grain that has a small embryo phenotype.

The terms “Oryza sativa RE2”, “OsRE2”, and “rice RE2” are used interchangeably herein. These terms refer to a polynucleotide isolated from wild-type rice and whose sequence is set forth in the instant application. The rice RE2 isolated polynucleotide is the polynucleotide that, when mutated, is responsible for a reduced embryo 2, or re2, phenotype as exemplified by Hong et al. (1996, Development 122:2051-2058). Mutant rice displaying the re2 phenotype has a reduced embryo size and an enlarged endosperm size.

The terms “subfragment that is functionally equivalent” and “functionally equivalent subfragment” are used interchangeably herein. These terms refer to a portion or subsequence of an isolated nucleic acid fragment in which the ability to alter gene expression or produce a certain phenotype is retained whether or not the fragment or subfragment encodes an active enzyme. For example, the fragment or subfragment can be used in the design of recombinant DNA constructs to produce the desired phenotype in a transformed plant. Recombinant DNA constructs can be designed for use in co-suppression or antisense by linking a nucleic acid fragment or subfragment thereof, whether or not it encodes an active enzyme, in the appropriate orientation relative to a plant promoter sequence.

The terms “homology”, “homologous”, “substantially similar” and “corresponding substantially” are used interchangeably herein. They refer to nucleic acid fragments wherein changes in one or more nucleotide bases do not affect the ability of the nucleic acid fragment to mediate gene expression or produce a certain phenotype. These terms also refer to modifications of the nucleic acid fragments of the instant invention such as deletion or insertion of one or more nucleotides that do not substantially alter the functional properties of the resulting nucleic acid fragment relative to the initial, unmodified fragment. It is therefore understood, as those skilled in the art will appreciate, that the invention encompasses more than the specific exemplary sequences.

A “homolog” can be a second gene in the same plant type or in a different plant type that has a polynucleotide sequence that is functionally identical to a sequence in the first gene. It is believed that, in general, homologs share a common evolutionary past.

The term “RE2 homolog” refers to an isolated polynucleotide encoding a class I LOB domain polypeptide obtained from a plant species, other than rice, that functions in a manner similar to that of the rice RE2 isolated polynucleotide and that, when mutated, exhibits a reduced embryo phenotype. The corn, Euphorbia lagascae, columbine, guar, soybean, and wheat isolated polynucleotides disclosed herein appear to encode such polypeptides; namely, these polypeptides are members of a class I LOB domain protein family, have a C-like motif, a GAS-like motif, and a leucine zipper-like motif, and are useful for altering embryo/endosperm size during seed development.

A search of GenBank and Du Pont proprietary databases using the rice RE2 gene sequence or the RE2 polypeptide sequence uncovered a number of isolated polynucleotides from plants that appeared to be homologous. RE2 homologs appear to encompass those polynucleotides isolated from plants, other than rice, that appear to encode a polypeptide that shares sequence and/or functional similarity with the polypeptide encoded by the rice RE2 isolated polynucleotide. It is believed that such polynucleotides would comprise a subset of the polynucleotides encoding polypeptides of the class I LOB domain family, and that alteration in the expression of such a polypeptide may affect embryo/endosperm size.

“Sequence identity” or “identity” in the context of nucleic acid or polypeptide sequences refers to the nucleic acid bases or amino acid residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window.

Thus, “Percentage of sequence identity” refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Useful examples of percent sequence identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%, or any integer percentage from 55% to 100%. These identities can be determined using any of the programs described herein.
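The calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the Clustal V implementation: it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and that every aligned column counts toward the total number of positions in the comparison window. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Matched positions are columns where both sequences carry the same
    residue; gap columns count toward the window length but never match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


# Four of six aligned columns match (M, K, T, L).
print(round(percent_identity("MKT-LV", "MKTALI"), 1))  # 66.7
```

Other conventions for the denominator exist (for example, excluding gap columns), so values from different programs are not always directly comparable.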

Sequence alignments and percent identity or similarity calculations may be determined using a variety of comparison methods designed to detect homologous sequences including, but not limited to, the Megalign program of the LASARGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Multiple alignment of the sequences is performed using the Clustal V method of alignment (Higgins, D. G. and Sharp, P. M. (1989) Comput. Appl. Biosci. 5:151-153; Higgins, D. G. et al. (1992) Comput. Appl. Biosci. 8:189-191) with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for pairwise alignments and calculation of percent identity of protein sequences using the Clustal method are KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5. For nucleic acids these parameters are KTUPLE=2, GAP PENALTY=5, WINDOW=4 and DIAGONALS SAVED=4.

It is well understood by one skilled in the art that many levels of sequence identity are useful in identifying polypeptides from other plant species, wherein such polypeptides have the same or similar function or activity. Useful examples of percent identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%, or any integer percentage from 55% to 100%. Indeed, any integer amino acid identity from 50% to 100% may be useful in describing the present invention. Also of interest is any full or partial complement of this isolated nucleotide fragment.

It is believed that another way to identify genes that are homologous to the rice RE2 gene is to screen by hybridization. It is possible to hybridize cDNA at 60° C. with a probe derived from the rice RE2 gene and wash at medium stringency conditions (5×SSPE, 0.5% SDS at 65° C. followed by 1×SSPE, 0.5×SDS at 65° C.). For general hybridization protocols, see Ausubel et al. 1993, “Current Protocols in Molecular Biology” John Wiley & Sons, USA, or Sambrook et al. 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press. An appropriate probe with a unique sequence can be extracted, for example, from part of the exon 1 of the RE2 gene. Exon 1 of the RE2 gene has regions of sequence identity between the corn and rice RE2 nucleotide sequences. Oligonucleotide primers useful in hybridization screenings may have the sequences disclosed in SEQ ID NO: 68, SEQ ID NO:69, or SEQ ID NO:70, for example. The oligonucleotide primers having the sequences set forth in SEQ ID NO: 68, SEQ ID NO:69, or SEQ ID NO:70 have the sequences set forth as follows:

SEQ ID NO:68 5′-GCATCTTCGCGCCCTACTTCGACTCGG-3′

SEQ ID NO:69 5′-GCACAAGGTGTTCGGCGCCAGCAACGTGTCCAAGC-3′

SEQ ID NO:70 5′-CCGCGACCCCGTCTACGGCTGCGTCGCCCACCTC-3′
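For orientation, the length and GC content of the three oligonucleotides listed above can be tabulated with a short script. This is only an illustrative sketch; the helper name `gc_fraction` is hypothetical, and no particular melting temperature model or hybridization condition is implied by it.

```python
# Primer sequences as given for SEQ ID NOs:68-70, written 5' to 3'.
PRIMERS = {
    "SEQ ID NO:68": "GCATCTTCGCGCCCTACTTCGACTCGG",
    "SEQ ID NO:69": "GCACAAGGTGTTCGGCGCCAGCAACGTGTCCAAGC",
    "SEQ ID NO:70": "CCGCGACCCCGTCTACGGCTGCGTCGCCCACCTC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (hypothetical helper)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: length={len(seq)} nt, GC={gc_fraction(seq):.1%}")
```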

Genomic DNA or cDNA clones giving significant signals may be isolated and their chromosomal origin analyzed using CAPS markers or SNP-based markers similar to those described in the present Application. DNA fragments containing the region homologous to the rice RE2 gene may be further subcloned and sequenced. Polypeptides encoded by these polynucleotides should have the C-Block, GAS Block N-end and C-end, and Leu Zipper consensus sequences described in Example 6 and as set forth in SEQ ID NOs:55 through 58.

“Gene” refers to a nucleic acid fragment that expresses a specific protein, including regulatory sequences preceding (5′ non-coding sequences) and following (3′ non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences. “Recombinant DNA construct” refers to a combination of nucleic acid fragments that are not normally found together in nature. Accordingly, a recombinant DNA construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that normally found in nature. A “foreign” gene refers to a gene not normally found in the host organism, but that is introduced into the host organism by gene transfer. Foreign genes can comprise native genes inserted into a non-native organism, or recombinant DNA constructs. A “transgene” is a gene that has been introduced into the genome by a transformation procedure.

“Coding sequence” refers to a DNA sequence that codes for a specific amino acid sequence. “Regulatory sequences” refer to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.

“Promoter” refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. The promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. Accordingly, an “enhancer” is a DNA sequence which can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. Promoter sequences can also be located within the transcribed portions of genes, and/or downstream of the transcribed sequences. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of an isolated nucleic acid fragment in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. Promoters which cause an isolated nucleic acid fragment to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. New promoters of various types useful in plant cells are constantly being discovered; numerous examples may be found in the compilation by Okamuro and Goldberg, (1989) Biochemistry of Plants 15:1-82. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity.

Specific examples of promoters that may be useful in expressing the nucleic acid fragments of the invention include, but are not limited to, the oleosin promoter (PCT Publication WO99/65479, published Dec. 12, 1999), the maize 27 kD zein promoter (Ueda et al (1994) Mol. Cell. Biol. 14:4350-4359), the ubiquitin promoter (Christensen et al (1992) Plant Mol. Biol. 18:675-680), the SAM synthetase promoter (PCT Publication WO00/37662, published Jun. 29, 2000), the CaMV 35S promoter (Odell et al (1985) Nature 313:810-812), and the promoter described in PCT Publication WO02/099063 published Dec. 12, 2002.

An “intron” is an intervening sequence in a gene that does not encode a portion of the protein sequence. Thus, such sequences are transcribed into RNA but are then excised and are not translated. The term is also used for the excised RNA sequences. An “exon” is a portion of the sequence of a gene that is transcribed and is found in the mature messenger RNA derived from the gene, but is not necessarily a part of the sequence that encodes the final gene product.

The “translation leader sequence” refers to a DNA sequence located between the promoter sequence of a gene and the coding sequence. The translation leader sequence is present in the fully processed mRNA upstream of the translation start sequence. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. Examples of translation leader sequences have been described (Turner, R. and Foster, G. D. (1995) Molecular Biotechnology 3:225).

The “3′ non-coding sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3′ end of the mRNA precursor. The use of different 3′ non-coding sequences is exemplified by Ingelbrecht et al. (1989) Plant Cell 1:671-680.

“RNA transcript” refers to the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript; when it is an RNA sequence derived from post-transcriptional processing of the primary transcript, it is referred to as the mature RNA. “Messenger RNA (mRNA)” refers to the RNA that is without introns and that can be translated into protein by the cell. “cDNA” refers to a DNA that is complementary to and synthesized from an mRNA template using the enzyme reverse transcriptase. The cDNA can be single-stranded or converted into the double-stranded form using the Klenow fragment of DNA polymerase I. “Sense” RNA refers to an RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro. “Antisense RNA” refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA and that blocks the expression of a target isolated nucleic acid fragment (U.S. Pat. No. 5,107,065). The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5′ non-coding sequence, 3′ non-coding sequence, introns, or the coding sequence. “Functional RNA” refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes. The terms “complement” and “reverse complement” are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.
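The relationship between a coding strand, its mRNA, and the corresponding antisense RNA (the reverse complement of the message) can be sketched in a few lines of Python. The function names and the toy sequence are hypothetical and serve only to illustrate the definitions above.

```python
# Base-pairing table for RNA (A pairs with U, G pairs with C).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def transcribe(coding_strand: str) -> str:
    """Return the mRNA-like sequence for a DNA coding (sense) strand."""
    return coding_strand.upper().replace("T", "U")


def antisense_rna(mrna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return mrna.upper().translate(RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    sense_dna = "ATGGCC"            # toy sequence, not from any SEQ ID NO
    mrna = transcribe(sense_dna)    # "AUGGCC"
    print(mrna, antisense_rna(mrna))
```

Taking the reverse complement twice returns the original message, which is a convenient sanity check when preparing antisense constructs in silico.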

The term “endogenous RNA” refers to any RNA which is encoded by any nucleic acid sequence present in the genome of the host prior to transformation with the recombinant construct of the present invention, whether naturally-occurring or non-naturally occurring, i.e., introduced by recombinant means, mutagenesis, etc.

The term “non-naturally occurring” means artificial, not consistent with what is normally found in nature.

The term “operably linked” refers to an association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is regulated by the other. For example, a promoter is operably linked with a coding sequence when it is capable of regulating the expression of that coding sequence (i.e., that the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in a sense or antisense orientation. In another example, the complementary RNA regions of the invention can be operably linked, either directly or indirectly, 5′ to the target mRNA, or 3′ to the target mRNA, or within the target mRNA, or a first complementary region is 5′ and its complement is 3′ to the target mRNA.

Cosuppression technology constitutes the subject matter of U.S. Pat. No. 5,231,020, which issued to Jorgensen et al. on Jul. 27, 1993. The phenomenon observed by Napoli et al. in petunia was referred to as “cosuppression” since expression of both the endogenous gene and the introduced transgene were suppressed (for reviews see Vaucheret et al., Plant J. 16:651-659 (1998); and Gura, Nature 404:804-808 (2000)).

Co-suppression constructs in plants previously have been designed by focusing on overexpression of a nucleic acid sequence having homology to an endogenous mRNA, in the sense orientation, which results in the reduction of all RNA having homology to the overexpressed sequence (see Vaucheret et al. (1998) Plant J 16:651-659; and Gura (2000) Nature 404:804-808). The overall efficiency of this phenomenon is low, and the extent of the RNA reduction is widely variable. Recent work has described the use of “hairpin” structures that incorporate all, or part, of an mRNA encoding sequence in a complementary orientation that results in a potential “stem-loop” structure for the expressed RNA (PCT Publication WO 99/53050 published on Oct. 21, 1999). This increases the frequency of co-suppression in the recovered transgenic plants. Another variation describes the use of plant viral sequences to direct the suppression, or “silencing”, of proximal mRNA encoding sequences (PCT Publication WO 98/36083 published on Aug. 20, 1998). Neither of these co-suppressing phenomena has been elucidated mechanistically, although recent genetic evidence has begun to unravel this complex situation (Elmayan et al. (1998) Plant Cell 10:1747-1757).

In addition to cosuppression, antisense technology has also been used to block the function of specific genes in cells. Antisense RNA is complementary to the normally expressed RNA, and presumably inhibits gene expression by interacting with the normal RNA strand. The mechanisms by which the expression of a specific gene is inhibited by either antisense or sense RNA are on their way to being understood. However, the frequencies of obtaining the desired phenotype in a transgenic plant may vary with the design of the construct, the gene, the strength and specificity of its promoter, the method of transformation and the complexity of transgene insertion events (Baulcombe, Curr. Biol. 12(3):R82-84 (2002); Tang et al., Genes Dev. 17(1):49-63 (2003); Yu et al., Plant Cell. Rep. 22(3):167-174 (2003)). Cosuppression and antisense inhibition are also referred to as “gene silencing”, “post-transcriptional gene silencing” (PTGS), RNA interference or RNAi. See for example U.S. Pat. No. 6,506,559.

MicroRNAs (miRNAs) are small regulatory RNAs that control gene expression. miRNAs bind to regions of target RNAs and inhibit their translation and, thus, interfere with production of the polypeptide encoded by the target RNA. miRNAs can be designed to be complementary to any region of the target sequence RNA including the 3′ untranslated region, coding region, etc. miRNAs are derived from highly structured RNA precursors that are processed by the action of a ribonuclease III termed DICER. While the exact mechanism of action of miRNAs is unknown, it appears that they function to regulate expression of the target gene. See, e.g., U.S. Patent Publication No. 2004/0268441 A1 which was published on Dec. 30, 2004.

The term “expression”, as used herein, refers to the production of a functional end-product, be it mRNA or translation of mRNA into a polypeptide. “Antisense inhibition” refers to the production of antisense RNA transcripts capable of suppressing the expression of the target protein. “Co-suppression” refers to the production of sense RNA transcripts capable of suppressing the expression of identical or substantially similar foreign or endogenous genes (U.S. Pat. No. 5,231,020).

“Overexpression” refers to the production of a functional end-product in transgenic organisms that exceeds levels of production when compared to expression of that functional end-product in a normal, wild type or non-transformed organism.

“Stable transformation” refers to the transfer of a nucleic acid fragment into a genome of a host organism, including both nuclear and organellar genomes, resulting in genetically stable inheritance. In contrast, “transient transformation” refers to the transfer of a nucleic acid fragment into the nucleus, or DNA-containing organelle, of a host organism resulting in gene expression without integration or stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as “transgenic” organisms. The preferred method of cell transformation of rice, corn and other monocots is particle-accelerated or “gene gun” transformation technology (Klein et al. (1987) Nature (London) 327:70-73; U.S. Pat. No. 4,945,050), or an Agrobacterium-mediated method (Ishida Y. et al. (1996) Nature Biotech. 14:745-750). The term “transformation” as used herein refers to both stable transformation and transient transformation.

Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described more fully in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter “Sambrook”).

The term “recombinant” refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the manipulation of isolated segments of nucleic acids by genetic engineering techniques.

“PCR” or “Polymerase Chain Reaction” is a technique for the synthesis of large quantities of specific DNA segments and consists of a series of repetitive cycles (Perkin Elmer Cetus Instruments, Norwalk, Conn.). Typically, the double stranded DNA is heat denatured, the two primers complementary to the 3′ boundaries of the target segment are annealed at low temperature and then extended at an intermediate temperature. One set of these three consecutive steps is referred to as a cycle.

Polymerase chain reaction (“PCR”) is a powerful technique used to amplify DNA millions of fold, by repeated replication of a template, in a short period of time. (Mullis et al. (1986) Cold Spring Harbor Symp. Quant. Biol. 51:263-273; Erlich et al, European Patent Application 50,424; European Patent Application 84,796; European Patent Application 258,017, European Patent Application 237,362; Mullis, European Patent Application 201,184, Mullis et al U.S. Pat. No. 4,683,202; Erlich, U.S. Pat. No. 4,582,788; and Saiki et al, U.S. Pat. No. 4,683,194). The process utilizes sets of specific in vitro synthesized oligonucleotides to prime DNA synthesis. The design of the primers is dependent upon the sequences of DNA that are to be analyzed. The technique is carried out through many cycles (usually 20-50) of melting the template at high temperature, allowing the primers to anneal to complementary sequences within the template and then replicating the template with DNA polymerase.
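The statement that PCR amplifies DNA "millions of fold" follows from simple arithmetic: if every template were copied in every cycle, the amount of target would double each cycle, giving 2^n fold amplification after n cycles. The sketch below works this out, with a per-cycle efficiency parameter added as an assumption for illustration; real reactions plateau and do not follow this idealized model in later cycles.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold amplification after a number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling). This is a simplified model.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (20, 30):
        print(f"{n} cycles, perfect doubling: {fold_amplification(n):,.0f}-fold")
    print(f"30 cycles at 90% efficiency: {fold_amplification(30, 0.9):,.0f}-fold")
```

Under perfect doubling, 20 cycles already give 2^20 = 1,048,576-fold amplification, which is consistent with the lower end of the 20-50 cycle range mentioned above.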

The products of PCR reactions are analyzed by separation in agarose gels followed by ethidium bromide staining and visualization with UV transillumination. Alternatively, radioactive dNTPs can be added to the PCR in order to incorporate label into the products. In this case the products of PCR are visualized by exposure of the gel to x-ray film. The added advantage of radiolabeling PCR products is that the levels of individual amplification products can be quantitated.

The terms “recombinant construct”, “expression construct” and “recombinant expression construct” are used interchangeably herein. These terms refer to a functional unit of genetic material that can be inserted into the genome of a cell using standard methodology well known to one skilled in the art. Such a construct may be used by itself or in conjunction with a vector. If a vector is used then the choice of vector is dependent upon the method that will be used to transform host plants as is well known to those skilled in the art. For example, a plasmid vector can be used. The skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells comprising any of the isolated nucleic acid fragments of the invention. The skilled artisan will also recognize that different independent transformation events will result in different levels and patterns of expression (Jones et al. (1985) EMBO J. 4:2411-2418; De Almeida et al. (1989) Mol. Gen. Genetics 218:78-86), and thus that multiple events must be screened in order to obtain lines displaying the desired expression level and pattern. Such screening may be accomplished by Southern analysis of DNA, Northern analysis of mRNA expression, Western analysis of protein expression, or phenotypic analysis.

“Motifs” or “subsequences” refer to relatively short conserved regions of nucleic acids or amino acids that comprise part of a longer sequence. For example, it is expected that such conserved subsequences, such as those exemplified in SEQ ID NOs:49, 50, 52, and 52, would be important for function and could be used to identify new homologues of class I LOB domain proteins involved in controlling embryo/endosperm size in plants. It is expected that some or all of the elements may be found in an RE2 homolog. Also, it is expected that one or two of the conserved amino acids in any given motif may differ in a true RE2 homolog.

Thus, in one aspect, this invention concerns an isolated polynucleotide comprising:

- (a) a nucleic acid sequence encoding a polypeptide involved in altering embryo/endosperm size during seed development, said polypeptide having at least 80% amino acid sequence identity, based on the Clustal V method of alignment, when compared to an amino acid sequence selected from the group consisting of SEQ ID NOs:37, 39, 41, 43, 45, 47, 49, 51, and 53; or
- (b) a nucleic acid sequence set forth in SEQ ID NO:25 wherein said sequence comprises at least one of the following modifications:
  - (i) nucleotide 271 is a T residue instead of a C;
  - (ii) nucleotide 110 is a T residue instead of a G; or
  - (iii) nucleotide 75 is deleted; or
- (c) a nucleic acid sequence set forth in SEQ ID NO:34 wherein
  - (i) nucleotides 4473 through 4829 correspond to a first exon, and
  - (ii) nucleotides 5661 through 6110 correspond to a second exon, and
  - further wherein the nucleotides of (c) (i) and/or (c)(ii) encode a polypeptide involved in altering embryo/endosperm size during seed development,
- (d) the nucleic acid sequence set forth in SEQ ID NO:34 or 72; or
- (e) the full complement of (a), (b), (c), (d), or SEQ ID NO:34; or
- (f) all or part of a non-coding or coding region of the isolated polynucleotide comprising sequences of (a), (b) or SEQ ID NO:34 for use in co-suppression or antisense suppression of endogenous nucleic acid sequences encoding polypeptides involved in altering embryo/endosperm size during seed development.
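The modifications recited in (b) are single-base substitutions and a single-base deletion at 1-based nucleotide positions. A minimal sketch of how such edits could be applied to a sequence string follows. SEQ ID NO:25 is not reproduced here, so the sketch uses a short placeholder sequence, and the helper names are hypothetical.

```python
def substitute(seq: str, position: int, expected: str, replacement: str) -> str:
    """Replace the base at a 1-based position, checking the reference base first."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside sequence")
    idx = position - 1
    if seq[idx].upper() != expected.upper():
        raise ValueError(
            f"expected {expected} at position {position}, found {seq[idx]}"
        )
    return seq[:idx] + replacement + seq[idx + 1:]


def delete(seq: str, position: int) -> str:
    """Remove the base at a 1-based position."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside sequence")
    idx = position - 1
    return seq[:idx] + seq[idx + 1:]


if __name__ == "__main__":
    placeholder = "ACGTACGT"  # stand-in only, not SEQ ID NO:25
    print(substitute(placeholder, 2, "C", "T"))  # C to T substitution
    print(delete(placeholder, 3))                # single-base deletion
```

Checking the expected reference base before substituting guards against off-by-one errors when positions are quoted from a sequence listing.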

Also of interest are recombinant DNA constructs comprising an isolated polynucleotide comprising any of the nucleotide sequences described herein operably linked in a sense or anti-sense orientation to at least one regulatory sequence. Such constructs can then be used to transform plants, plant tissue, or plant cells. Transformation methods are well known to those skilled in the art and are described above. Any plant, dicot or monocot can be transformed with such recombinant DNA constructs.

Examples of monocots include, but are not limited to, corn, wheat, rice, sorghum, millet, barley, palm, lily, Alstroemeria, rye, and oat.

Examples of dicots include, but are not limited to, soybean, rape, sunflower, canola, grape, guayule, columbine, cotton, tobacco, peas, beans, flax, safflower, and alfalfa.

Plant tissue includes differentiated and undifferentiated tissues or plants, including but not limited to, roots, stems, shoots, leaves, pollen, seeds, tumor tissue, and various forms of cells and culture such as single cells, protoplasm, embryos, and callus tissue. The plant tissue may be in planta or in organ, tissue or cell culture.

The term “plant organ” refers to plant tissue or a group of tissues that constitute a morphologically and functionally distinct part of a plant. The term “genome” refers to the following: 1. The entire complement of genetic material (genes and non-coding sequences) present in each cell of an organism, or virus or organelle. 2. A complete set of chromosomes inherited as a (haploid) unit from one parent. The term “stably integrated” refers to the transfer of a nucleic acid fragment into the genome of a host organism or cell resulting in genetically stable inheritance.

Also within the scope of this invention are seeds obtained from such transformed plants and oil obtained from these seeds.

In another aspect, this invention concerns a method of altering embryo/endosperm size during seed development in a plant comprising:

- (a) transforming plant cells or plant tissue with the recombinant DNA construct of the invention;
- (b) regenerating transgenic plants from the transformed plant cells or plant tissue of (a);
- (c) screening the transgenic plants of (b) for seeds having an altered embryo/endosperm size based on a comparison of embryo/endosperm size of seeds obtained from non-transformed plants.

The regeneration, development, and cultivation of plants from single plant protoplast transformants or from various transformed explants is well known in the art (Weissbach and Weissbach, In: Methods for Plant Molecular Biology, (Eds.), Academic Press, Inc. San Diego, Calif., (1988)). This regeneration and growth process typically includes the steps of selection of transformed cells, culturing those individualized cells through the usual stages of embryonic development through the rooted plantlet stage. Transgenic embryos and seeds are similarly regenerated. The resulting transgenic rooted shoots are thereafter planted in an appropriate plant growth medium such as soil. Preferably, the regenerated plants are self-pollinated to provide homozygous transgenic plants. Otherwise, pollen obtained from the regenerated plants is crossed to seed-grown plants of agronomically important lines. Conversely, pollen from plants of these important lines is used to pollinate regenerated plants. A transgenic plant of the present invention containing a desired polypeptide is cultivated using methods well known to one skilled in the art.

There are a variety of methods for the regeneration of plants from plant tissue. The particular method of regeneration will depend on the starting plant tissue and the particular plant species to be regenerated.

Methods for transforming dicots, primarily using Agrobacterium tumefaciens, and obtaining transgenic plants have been published for cotton (U.S. Pat. No. 5,004,863, U.S. Pat. No. 5,159,135, U.S. Pat. No. 5,518,908); soybean (U.S. Pat. No. 5,569,834, U.S. Pat. No. 5,416,011, McCabe et al. (1988) Bio/Technology 6:923, Christou et al. (1988) Plant Physiol. 87:671-674); Brassica (U.S. Pat. No. 5,463,174); peanut (Cheng et al. (1996) Plant Cell Rep. 15:653-657, McKently et al. (1995) Plant Cell Rep. 14:699-703); papaya; and pea (Grant et al. (1995) Plant Cell Rep. 15:254-258).

Transformation of monocotyledons using electroporation, particle bombardment, and Agrobacterium has also been reported. Transformation and plant regeneration have been achieved in asparagus (Bytebier et al., Proc. Natl. Acad. Sci. (USA) (1987) 84:5354); barley (Wan and Lemaux (1994) Plant Physiol. 104:37); Zea mays (Rhodes et al. (1988) Science 240:204, Gordon-Kamm et al. (1990) Plant Cell 2:603-618, Fromm et al. (1990) Bio/Technology 8:833; Koziel et al. (1993) Bio/Technology 11:194, Armstrong et al. (1995) Crop Science 35:550-557); oat (Somers et al. (1992) Bio/Technology 10:1589); orchard grass (Horn et al. (1988) Plant Cell Rep. 7:469); rice (Toriyama et al. (1986) Theor. Appl. Genet. 205:34; Park et al. (1996) Plant Mol. Biol. 32:1135-1148; Abedinia et al. (1997) Aust. J. Plant Physiol. 24:133-141; Zhang and Wu (1988) Theor. Appl. Genet. 76:835; Zhang et al. (1988) Plant Cell Rep. 7:379; Battraw and Hall (1992) Plant Sci. 86:191-202; Christou et al. (1991) Bio/Technology 9:957); rye (De la Pena et al. (1987) Nature 325:274); sugarcane (Bower and Birch (1992) Plant J. 2:409); tall fescue (Wang et al. (1992) Bio/Technology 10:691), and wheat (Vasil et al. (1992) Bio/Technology 10:667; U.S. Pat. No. 5,631,152).

“Plant” includes reference to whole plants, plant organs, plant tissues, seeds and plant cells and progeny of same. Plant cells include, without limitation, cells from seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores.

“Progeny” comprises any subsequent generation of a plant.

“Transgenic plant” includes reference to a plant which comprises within its genome a heterologous polynucleotide. Preferably, the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations. The heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct.

Assays for gene expression based on the transient expression of cloned nucleic acid constructs have been developed by introducing the nucleic acid molecules into plant cells by polyethylene glycol treatment, electroporation, or particle bombardment (Marcotte et al., Nature 335:454-457 (1988); Marcotte et al., Plant Cell 1:523-532 (1989); McCarty et al., Cell 66:895-905 (1991); Hattori et al., Genes Dev. 6:609-18 (1992); Goff et al., EMBO J. 9:2517-2522 (1990)).

Transient expression systems may be used to functionally dissect isolated nucleic acid fragment constructs (see generally, Maliga et al., Methods in Plant Molecular Biology, Cold Spring Harbor Press (1995)). It is understood that any of the nucleic acid molecules of the present invention can be introduced into a plant cell in a permanent or transient manner in combination with other genetic elements such as vectors, promoters, enhancers etc.

In addition to the procedures discussed above, the standard resource materials that describe specific conditions and procedures for the construction, manipulation and isolation of macromolecules (e.g., DNA molecules, plasmids, etc.), the generation of recombinant organisms, and the screening and isolation of clones are well known (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (1989); Maliga et al., Methods in Plant Molecular Biology, Cold Spring Harbor Press (1995); Birren et al., Genome Analysis: Detecting Genes, 1, Cold Spring Harbor, N.Y. (1998); Birren et al., Genome Analysis: Analyzing DNA, 2, Cold Spring Harbor, N.Y. (1998); Plant Molecular Biology: A Laboratory Manual, eds. Clark, Springer, New York (1997)).

In another aspect, this invention concerns a method of mapping genetic variations related to controlling embryo/endosperm size during seed development and/or altering oil phenotypes in plants comprising: (a) crossing two plant varieties; and (b) evaluating genetic variations with respect to a nucleic acid sequence selected from the group consisting of SEQ ID NOs:25, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, and 72; or a nucleic acid sequence encoding a polypeptide selected from the group consisting of SEQ ID NOs:26, 29, 31, 33, 37, 39, 41, 43, 45, 47, 49, 51, and 53; in progeny plants resulting from the cross of step (a) wherein the evaluation is made using a method selected from the group consisting of RFLP analysis, SNP analysis, and PCR-based analysis.

The terms “mapping genetic variation” or “mapping genetic variability” are used interchangeably and define the process of identifying changes in DNA sequence, whether from natural or induced causes, within a genetic region that differentiates between different plant lines, cultivars, varieties, families, or species. The genetic variability at a particular locus (gene) due to even minor base changes can alter the pattern of restriction enzyme digestion fragments that can be generated. Pathogenic alterations to the genotype can be due to deletions or insertions within the gene being analyzed or even single nucleotide substitutions that can create or delete a restriction enzyme recognition site. Restriction fragment length polymorphism (RFLP) analysis takes advantage of this and utilizes Southern blotting with a probe corresponding to the isolated nucleic acid fragment of interest.

Thus, if a polymorphism (i.e., a commonly occurring variation in a gene or segment of DNA; also, the existence of several forms of a gene (alleles) in the same species) creates or destroys a restriction endonuclease cleavage site, or if it results in the loss or insertion of DNA (e.g., a variable nucleotide tandem repeat (VNTR) polymorphism), it will alter the size or profile of the DNA fragments that are generated by digestion with that restriction endonuclease. As such, individuals that possess a variant sequence can be distinguished from those having the original sequence by restriction fragment analysis. Polymorphisms that can be identified in this manner are termed “restriction fragment length polymorphisms” (“RFLPs”). RFLPs have been widely used in human and plant genetic analyses (Glassberg, UK Patent Application 2135774; Skolnick et al, Cytogen. Cell Genet 32:58-67 (1982); Botstein et al, Am. J. Hum. Genet. 32:314-331 (1980); Fischer et al, PCT Application WO 90/13668; Uhlen, PCT Application WO 90/11369).
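The effect described above, in which a single base change destroys a restriction site and thereby changes the fragment pattern, can be illustrated with a small in silico digest. The sketch below assumes a linear molecule and uses the EcoRI recognition sequence (GAATTC, cut after the first G) purely as an example; the toy sequences and function name are hypothetical.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting a linear sequence at every occurrence of site.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    seq = seq.upper()
    site = site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces that arise if a cut falls at either end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    original = "AAAAGAATTCTTTT"  # contains one GAATTC site
    variant = "AAAAGAGTTCTTTT"   # single base change removes the site
    print("original:", digest(original, "GAATTC", 1))
    print("variant: ", digest(variant, "GAATTC", 1))
```

In a Southern blot the two alleles would be distinguished by the different fragment sizes detected with a probe spanning the region, which is the principle behind RFLP analysis.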

A central attribute of “single nucleotide polymorphisms” or “SNPs” is that the site of the polymorphism is at a single nucleotide. SNPs have certain reported advantages over RFLPs or VNTRs. First, SNPs are more stable than other classes of polymorphisms. Their spontaneous mutation rate is approximately 10⁻⁹ (Kornberg, DNA Replication, W.H. Freeman & Co., San Francisco, 1980), approximately 1,000 times less frequent than VNTRs (U.S. Pat. No. 5,679,524). Second, SNPs occur at greater frequency, and with greater uniformity than RFLPs and VNTRs. As SNPs result from sequence variation, new polymorphisms can be identified by random sequencing of genomic or cDNA molecules. SNPs can also result from deletions, point mutations and insertions. Any single base alteration, whatever the cause, can be a SNP. The greater frequency of SNPs means that they can be more readily identified than the other classes of polymorphisms.

SNPs can be characterized using any of a variety of methods. Such methods include the direct or indirect sequencing of the site, the use of restriction enzymes where the respective alleles of the site create or destroy a restriction site, the use of allele-specific hybridization probes, and the use of antibodies that are specific for the proteins encoded by the different alleles of the polymorphism or by other biochemical interpretation. SNPs can be sequenced by a number of methods. Two basic methods may be used for DNA sequencing, the chain termination method of Sanger et al, Proc. Natl. Acad. Sci. (U.S.A.) 74:5463-5467 (1977), and the chemical degradation method of Maxam and Gilbert, Proc. Natl. Acad. Sci. (U.S.A.) 74: 560-564 (1977).
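Once the site has been sequenced in two lines, identifying candidate single-base differences reduces to a position-by-position comparison. The following sketch assumes two sequences of equal length covering the same region, with no insertions or deletions; the function name and toy data are hypothetical.

```python
from typing import List, Tuple


def find_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: the inputs are already aligned and contain no gaps.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    differences = []
    for position, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base != alt_base:
            differences.append((position, ref_base, alt_base))
    return differences


if __name__ == "__main__":
    print(find_snps("ACGTAC", "ACCTAC"))
```

Sequences that differ by insertions or deletions would first need to be aligned, since a naive comparison would report every downstream position as a mismatch.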

Furthermore, single point mutations can be detected by modified PCR techniques such as the ligase chain reaction (“LCR”) and PCR-single strand conformational polymorphisms (“PCR-SSCP”) analysis. The PCR technique can also be used to identify the level of expression of genes in extremely small samples of material, e.g., tissues or cells from a body. The technique is termed reverse transcription-PCR (“RT-PCR”).

In another embodiment, this invention concerns a method of molecular breeding to obtain altered embryo/endosperm size during seed development and/or altered oil phenotypes in plants comprising: (a) crossing two plant varieties; and (b) evaluating genetic variations with respect to: (i) a nucleic acid sequence selected from the group consisting of SEQ ID NOs:25, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, and 72; or (ii) a nucleic acid sequence encoding a polypeptide selected from the group consisting of SEQ ID NOs:26, 29, 31, 33, 37, 39, 41, 43, 45, 47, 49, 51, and 53; in progeny plants resulting from the cross of step (a), wherein the evaluation is made using a method selected from the group consisting of RFLP analysis, SNP analysis, and PCR-based analysis.

The term “molecular breeding” defines the process of tracking molecular markers during the breeding process. It is common for the molecular markers to be linked to phenotypic traits that are desirable. By following the segregation of the molecular marker or genetic trait, instead of scoring for a phenotype, the breeding process can be accelerated by growing fewer plants and eliminating assaying or visual inspection for phenotypic variation. The molecular markers useful in this process include, but are not limited to, any marker useful in identifying mappable genetic variations previously mentioned, as well as any closely linked genes that display synteny across plant species. The term “synteny” refers to the conservation of gene placement/order on chromosomes between different organisms. This means that two or more genetic loci, that may or may not be closely linked, are found on the same chromosome among different species. Another term for synteny is “genome colinearity”.

EXAMPLES

The present invention is further defined in the following Examples, in which parts and percentages are by weight and degrees are Celsius, unless otherwise stated. It should be understood that these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various usages and conditions. Thus, various modifications of the invention in addition to those set forth and described herein will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims.

The disclosure of each reference set forth herein is incorporated herein by reference in its entirety.

Example 1 Mapping of the Oryza sativa RE2 Locus to a Single Chromosome

Identification of the chromosome comprising the Oryza sativa RE2 locus was performed using Cleaved Amplified Polymorphic Sequence markers (CAPS markers). The Oryza sativa RE2 locus comprises the polynucleotide that, when mutated, is responsible for a reduced embryo 2, or re2, mutant phenotype as exemplified by Hong et al. (1996, Development 122:2051-2058). Mutant rice grains displaying the re2 phenotype show a reduced embryo size and an increased endosperm size. CAPS markers covering the entire rice genome were developed and, as set forth below, were used to identify the portion of the chromosome comprising the Oryza sativa RE2 locus.

Development of CAPS Markers

Mapping of the RE2 locus to a single chromosome required first developing CAPS markers covering the entire rice genome. CAPS markers were developed as follows.

Oligonucleotide primer sets were designed based on rice genomic sequence information available in the NCBI database. Information relating to the position of the sequences in the rice chromosomes was retrieved from the web sites of the Rice Genome Research Program (RGP), Tsukuba, Japan, or the Clemson University Genomics Institute, Clemson, S.C. The oligonucleotide primer sets were used to amplify portions of genomic DNA prepared from Indica (cv. Kasalath), Japonica (cv. Taichung 65), and Japonica (cv. Kinmaze) rice. The amplified fragments were digested with restriction endonucleases and polymorphisms identified between the three wild type rice cultivars as follows.

Genomic DNA was prepared from leaves of the three rice cultivars as follows. A 3 g piece from the leaf blade was ground using a mortar and pestle and suspended in 8 mL DNA extraction buffer (0.1 M ethylenediaminetetraacetic acid [EDTA], 1% N-lauroylsarcosine, 100 μg/mL proteinase K). The suspended sample was incubated at 50° C. for 1 hour, and debris removed by centrifuging at 3,400 rpm for 15 minutes using a RT-7 Plus centrifuge (Sorvall®) and transferring the supernatant to a fresh tube. The DNA was precipitated by adding 2 volumes of 100% ethanol and separated by centrifuging at 10,000 rpm for 15 minutes at 4° C. using an RC-5B centrifuge (Sorvall®). The DNA pellet was resuspended in 8 mL TE (10 mM tris, 1 mM EDTA) and reprecipitated with 16 mL 100% ethanol. After separation of the DNA pellet by centrifugation, it was resuspended in 3.7 mL TE, 50 μL 10 mg/mL ethidium bromide were added, the volume was brought up to 4 mL with TE, and 4.4 g CsCl were added. The solution was transferred to an OptiSeal™ tube (Beckman) and centrifuged for 16 hours at 52,000 rpm at 25° C. using an NVT65.2 rotor in an L8-M centrifuge (Beckman). After centrifugation the DNA band was visualized using a UV lamp and 500 μL removed using an 18-gauge needle in a 1 mL syringe. The DNA band was transferred to a 1.5 mL tube and the ethidium bromide removed by adding 500 μL isopropanol saturated with 20×SSPE buffer, centrifuging at 14,000 rpm for 30 seconds using a 5415C centrifuge (Eppendorf), and discarding the isopropanol phase. Removal of the ethidium bromide was accomplished by repeating addition of isopropanol and centrifugation 6 times. The DNA was then precipitated by adding 100 μL TE and 500 μL 100% ethanol and separated by centrifuging at 14,000 rpm for 15 minutes. The recovered DNA pellet was resuspended in 400 μL TE and 40 μL 3 M NaOAc. The DNA was precipitated one more time with the addition of 1 mL 100% ethanol, separated by centrifuging at 14,000 rpm for 15 minutes, rinsed with 500 μL 70% ethanol, dried, and resuspended in water to a concentration of 10 ng/μL. The genomic DNA was amplified using the oligonucleotide primer sets designed above using the following PCR conditions:

Amplifications were performed in 30 μL reactions containing 1 μL DNA prepared above (at 10 ng/μL concentration), 2 μL of 2.5 mM dNTPs, 2 μL 25 mM MgCl₂, 10 pmole of each primer, 0.3 μL AmpliTaq Gold (Perkin Elmer, Wellesley, Mass.), and 3 μL 10×PCR buffer. Amplification of DNA was performed by heating the reactions at 95° C. for 10 minutes followed by 40 cycles of 94° C. for 30 seconds, 56° C. for 30 seconds, and 72° C. for 30 seconds. Termination of the amplification reactions was accomplished by heating the reactions at 72° C. for 5 minutes.
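For orientation, the final concentrations implied by the stated reaction composition, and the programmed hold time of the stated cycling profile, can be worked out as in the following Python sketch. The arithmetic uses only the volumes and stock concentrations given above; the function names are hypothetical, ramp times between temperatures are ignored, and whether “2.5 mM dNTPs” refers to each nucleotide or to the total is not stated and is left open.

```python
REACTION_VOLUME_UL = 30.0


def final_concentration(stock_concentration, volume_added_ul, total_ul=REACTION_VOLUME_UL):
    """Dilution: final = stock * (volume added / total volume)."""
    return stock_concentration * volume_added_ul / total_ul


def program_minutes(initial_min, cycles, step_seconds, final_min):
    """Sum of programmed hold times only (no ramping between steps)."""
    return initial_min + cycles * sum(step_seconds) / 60.0 + final_min


dntp_mM = final_concentration(2.5, 2.0)     # 2 uL of 2.5 mM stock in 30 uL
mgcl2_mM = final_concentration(25.0, 2.0)   # 2 uL of 25 mM stock in 30 uL
primer_uM = 10.0 / REACTION_VOLUME_UL       # 10 pmol in 30 uL; 1 pmol/uL equals 1 uM
template_ng = 1.0 * 10.0                    # 1 uL at 10 ng/uL

hold_time_min = program_minutes(10, 40, (30, 30, 30), 5)

print(f"dNTPs  {dntp_mM:.3f} mM")
print(f"MgCl2  {mgcl2_mM:.3f} mM")
print(f"primer {primer_uM:.3f} uM each")
print(f"template {template_ng:.0f} ng, program {hold_time_min:.0f} min")
```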

Amplified DNA fragments were then digested with restriction endonucleases having 4 or 5 base recognition sites. Restriction endonuclease digestions were performed in 15 μL digestion reactions containing 2 μL of amplified DNA, 1.5 μL 10× reaction buffer, and 0.5 μL restriction enzyme. The digestion reactions were incubated for 1 hour at either 37° C. or at 60° C. depending on the restriction endonuclease being utilized. Digested DNA products were loaded on a 2.5% agarose gel and separated by electrophoresis to analyze polymorphisms. Comparison of the CAPS markers developed for Japonica and Indica rice allowed the development of 26 CAPS markers for wild type rice.
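The scoring of a CAPS marker from a gel can be pictured with the following Python sketch, offered only as an illustration. The band sizes assigned to the Japonica and Indica alleles, the size tolerance, and the function name are all hypothetical; the sketch assumes that the two parental patterns share no band, which need not hold for every marker.

```python
def call_caps_genotype(observed_bands, japonica_bands, indica_bands, tolerance_bp=10):
    """Classify one plant from the band sizes (bp) read off the gel."""

    def pattern_present(expected_bands):
        # Every expected band must match some observed band within tolerance.
        return all(
            any(abs(band - size) <= tolerance_bp for band in observed_bands)
            for size in expected_bands
        )

    has_japonica = pattern_present(japonica_bands)
    has_indica = pattern_present(indica_bands)
    if has_japonica and has_indica:
        return "heterozygous"
    if has_japonica:
        return "homozygous Japonica"
    if has_indica:
        return "homozygous Indica"
    return "unscored"


# Hypothetical marker: the site is absent in Japonica and present in Indica.
JAPONICA_PATTERN = [500]
INDICA_PATTERN = [320, 180]

print(call_caps_genotype([498], JAPONICA_PATTERN, INDICA_PATTERN))
print(call_caps_genotype([500, 321, 179], JAPONICA_PATTERN, INDICA_PATTERN))
```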

Mapping of the Oryza sativa RE2 Locus to a Single Chromosome

Linkage between CAPS markers obtained for wild type rice and those obtained for re2 mutant plants was then analyzed. CAPS markers were prepared with genomic DNA from F3 Japonica rice plants whose F2 seed showed re2 phenotype and were compared to the CAPS markers prepared above. Two markers on chromosome 10 (markers C10 7.7 and C10 15.9) showed co-segregation with the re2-1 phenotype and were identified as follows.

Plants displaying an re2 mutant phenotype were obtained by crossing a Japonica cv. Taichung 65 mutant plant showing the re2-1 mutant phenotype with a plant of the Indica cultivar Kasalath and scoring the embryo phenotype of F2 mature seeds using a dissecting microscope. Twenty eight (28) seeds showing re2 mutant phenotype were sterilized and sown in soil. Genomic DNA was extracted from the leaves of these 28 F3 re2 mutant plants as follows. Leaf samples, weighing 300 mg, were ground to powder in liquid nitrogen using a mortar and pestle. Each sample was then suspended in 750 μL extraction buffer containing 1.5 M NaCl, 0.2 M EDTA, 1 M tris and 3% CTAB (cetyltrimethylammonium bromide) and vortexed. Proteins were removed by adding 50 μL chloroform to the samples, shaking for 20 minutes, centrifuging briefly in a microfuge, and decanting the supernatant, containing the DNA, into a new tube. Genomic DNA was precipitated by adding 300 μL isopropanol, mixing by quick vortexing, and allowing the aqueous phase to precipitate. The pellet, containing the DNA, was recovered in H₂O and used in amplification reactions as follows.

Marker C10 7.7 was amplified using oligonucleotide primers C10 6-3 and C10 6-4. Oligonucleotide primers C10 6-3 and C10 6-4 were developed as described above, have the nucleotide sequences set forth in SEQ ID NO:1 and SEQ ID NO:2, respectively, and have the sequences set forth as follows:

SEQ ID NO:1: 5′-TAGCAGCTGGGAAGAACAACATG-3′
SEQ ID NO:2: 5′-CGTGCACCACGTAACGTTAAGC-3′
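Basic properties of the two primers set forth in SEQ ID NO:1 and SEQ ID NO:2 can be tabulated as in the following Python sketch. The primer sequences are taken from the text; the helper names are hypothetical, and the melting temperature uses the simple Wallace rule of thumb, 2(A+T) + 4(G+C), which is an assumption introduced here for illustration and is known to be only a rough estimate for primers of this length. It is not stated to be the method by which the primers or the annealing temperature were chosen.

```python
PRIMERS = {
    "SEQ ID NO:1": "TAGCAGCTGGGAAGAACAACATG",
    "SEQ ID NO:2": "CGTGCACCACGTAACGTTAAGC",
}


def gc_count(primer):
    """Number of G and C bases in the primer."""
    return sum(primer.count(base) for base in "GC")


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C: 2(A+T) + 4(G+C)."""
    gc = gc_count(primer)
    at = len(primer) - gc
    return 2 * at + 4 * gc


for name, sequence in PRIMERS.items():
    gc_fraction = gc_count(sequence) / len(sequence)
    print(f"{name}: {len(sequence)} nt, GC {gc_fraction:.0%}, "
          f"Wallace estimate {wallace_tm(sequence)} C")
```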

Polymorphism was observed on CAPS marker C10 7.7 when the amplified DNA was digested with the restriction endonuclease Dde I, loaded on a 2.5% agarose gel, and separated by electrophoresis. Comparison of C10 7.7 CAPS markers allowed the identification of 4 recombination breakpoints between DNA prepared from wild type plants and that obtained from re2 mutant plants.

Marker C10 15.9 was amplified using oligonucleotide primers C10 15.9-1 and C10 15.9-2. Oligonucleotide primers C10 15.9-1 and C10 15.9-2 were developed as described above, have the nucleotide sequences set forth in SEQ ID NO:3 and SEQ ID NO:4, respectively, and have the sequences set forth as follows:

SEQ ID NO:3: 5′-CAGGGTTGTGTAAGGATCGTTG-3′
SEQ ID NO:4: 5′-GATCATCGTGTAGTACCAGGAC-3′

Polymorphism was observed on CAPS marker C10 15.9 when the amplified DNA was digested with the restriction endonuclease Msp I. This digestion produced additional bands in the Indica (Kasalath) background. Comparison of marker C10 15.9 prepared from DNA obtained from wild type plants with marker C10 15.9 prepared from DNA obtained from re2 mutant plants allowed the identification of 4 recombination breakpoints different from the ones identified with CAPS marker C10 7.7.

As explained above, comparison of CAPS markers prepared from DNA obtained from wild type rice and that obtained from F3 rice plants whose F2 seed showed re2 phenotype allowed the identification of 4 recombination breakpoints in CAPS marker C10 7.7 and 4 different recombination breakpoints in CAPS marker C10 15.9. These results indicate that the RE2 locus which contains the polynucleotide that when mutated is responsible for a re2 mutant phenotype maps to a region on chromosome 10 flanked by markers C10 7.7 and C10 15.9.
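The reasoning behind this conclusion can be sketched as follows, under stated assumptions. Plants grown from re2 seed are expected to carry the Japonica (mutant parent) allele around the RE2 locus, so an Indica allele at a linked marker signals a recombination between that marker and the locus. If two markers lay on the same side of the locus, recombinants at the nearer marker would generally also be recombinant at the farther one; recombinant plants that differ between the two markers are instead consistent with the markers lying on opposite sides. In the Python sketch below, the plant identifiers and genotype codes are invented, and double crossovers are ignored.

```python
def recombinant_plants(marker_scores):
    """Plants carrying at least one Indica allele at this marker."""
    return {plant for plant, genotype in marker_scores.items() if genotype != "J/J"}


def markers_flank_locus(scores_a, scores_b):
    """True when both markers show recombinants and no plant is shared."""
    rec_a = recombinant_plants(scores_a)
    rec_b = recombinant_plants(scores_b)
    return bool(rec_a) and bool(rec_b) and rec_a.isdisjoint(rec_b)


# Hypothetical scores for 28 plants: 4 recombinants at each marker,
# in different plants ("J/J" Japonica homozygote, "J/I" heterozygote).
plants = range(1, 29)
c10_7_7 = {p: ("J/I" if p in {3, 7, 12, 20} else "J/J") for p in plants}
c10_15_9 = {p: ("J/I" if p in {5, 9, 15, 26} else "J/J") for p in plants}

print(sorted(recombinant_plants(c10_7_7)))
print(sorted(recombinant_plants(c10_15_9)))
print(markers_flank_locus(c10_7_7, c10_15_9))
```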

Example 2 Map-Based Cloning of the Oryza sativa RE2 Gene

In Example 1 the RE2 locus, comprising the RE2 gene, was mapped to a region on chromosome 10 flanked by markers C10 7.7 and C10 15.9. This Example describes cloning of the RE2 gene from F2 recombinant plants, produced by crossing a re2-1 mutant plant (Japonica cv. Taichung 65) with an Indica cultivar, Kasalath, using CAPS markers as follows.

F2 seeds obtained from self-fertilized F1 plants were screened for the re2 mutant phenotype to obtain populations for cloning the RE2 gene. Seeds (308) displaying an re2 mutant phenotype were germinated on MS medium containing 0.3% gelrite and incubated in a growth chamber for 3 weeks with a 16 hour light/8 hour dark cycle. When the plants on the plates were at third leaf stage, 5-10 mm of the tip of the leaf was removed and used for DNA amplification. Direct PCR amplification reactions were carried out as described in Klimyuk et al. (1993, Plant J. 3:493-494) with a modification of extending the sample boiling time to 4 minutes after the neutralization step. Briefly, the leaf tissue was collected in a sterile vial containing 40 μL of 0.25 M NaOH and incubated 30 seconds in a boiling water bath. Samples were neutralized by adding 40 μL 0.25 M HCl and 20 μL 0.5 M Tris-HCl, pH 8.0 containing 0.25% (v/v) Nonidet P-40 and boiling for an additional 4 minutes. Tissue samples were used immediately for amplification or stored at 4° C. until needed. Each 30 μL amplification reaction contained 10 pmole of each primer, 2 μL of 2.5 mM dNTPs, 2 μL of 25 mM MgCl₂, 1 μL leaf extract, 0.3 μL AmpliTaq Gold (Perkin Elmer), and 3 μL PCR buffer. The thermal cycler was set to 95° C. for 10 minutes, followed by 40 cycles of 94° C. for 4 minutes, 50° C. for 30 seconds, and 72° C. for 30 seconds followed by heating at 72° C. for 5 minutes.

DNA obtained from 44 of these 308 F2 recombinant plants contained breakpoints between CAPS markers C10 7.7 and C10 15.9. These breakpoints were identified using CAPS markers C10 7.7 Hpy, C10 11.5, C10 11.0, C10 9.6, E08 93K, and E08 46K, which were developed as follows.

Marker C10 7.7 Hpy was amplified using oligonucleotide primers C10-7.7 2 HPYIVF and C10-7.7 2 HPYIVR. Oligonucleotide primers C10-7.7 2 HPYIVF and C10-7.7 2 HPYIVR were developed as described in Example 1, have the nucleotide sequences set forth in SEQ ID NO:5 and SEQ ID NO:6, respectively, and have the sequences set forth as follows:

SEQ ID NO:5: 5′-ATTGTCTCGTGTGACAGCGC-3′
SEQ ID NO:6: 5′-CCGCAATTAATATTCCGAGC-3′

Polymorphism was observed on the C10 7.7 Hpy CAPS marker when the amplified DNA was digested with the restriction endonuclease HpyCH4 IV.

Marker C10 11.5 was amplified using oligonucleotide primers 11.5 HpyV and C10 11.5-9. Oligonucleotide primers 11.5 HpyV and C10 11.5-9 were developed as described in Example 1, have the nucleotide sequences set forth in SEQ ID NO:7 and SEQ ID NO:8, respectively, and have the sequences set forth as follows:

SEQ ID NO:7: 5′-AAAGTGTGGTAGGTGTCATCCAGTTG-3′
SEQ ID NO:8: 5′-GCCACATGATCATCCACTACCAATG-3′

Polymorphism was observed on the C10 11.5 CAPS marker when the amplified DNA was digested with the restriction endonuclease HpyCH4 V.

Marker C10 11.0 was amplified using oligonucleotide primers C10 11-5 and 11 HinfR. Oligonucleotide primers C10 11-5 and 11 HinfR were developed as described in Example 1, have the nucleotide sequences set forth in SEQ ID NO:9 and SEQ ID NO:10, respectively, and have the sequences set forth as follows:

SEQ ID NO:9: 5′-CTTTTTCCGACCCACATGAAGGT-3′
SEQ ID NO:10: 5′-TACAAACGCTCCTAAACCACCATGT-3′

Polymorphism was observed on the C10 11.0 CAPS marker when the amplified DNA was digested with the restriction endonuclease Hinf I.

Marker C10 9.6 was amplified using oligonucleotide primers 9.6 DraIF and 9.6 DraIR. Oligonucleotide primers 9.6 DraIF and 9.6 DraIR were developed as described in Example 1, have the nucleotide sequences set forth in SEQ ID NO:11 and SEQ ID NO:12, respectively, and have the sequences set forth as follows:

SEQ ID NO:11: 5′-TTTGGGTGCATTAAAGTGGACCA-3′
SEQ ID NO:12: 5′-GGGGTAATTCGGATGACCATG-3′

Polymorphism was observed on the C10 9.6 CAPS marker when the amplified DNA was digested with the restriction endonuclease Dra I.

Marker E08 93K was amplified using oligonucleotide primers E08 93KF and E08 93KR. Oligonucleotide primers E08 93KF and E08 93KR were developed as described in Example 1, have the nucleotide sequences set forth in SEQ ID NO:13 and SEQ ID NO:14, respectively, and have the sequences set forth as follows:

SEQ ID NO:13: 5′-CTCATAGCCGCCTAGCCTCATAG-3′
SEQ ID NO:14: 5′-GAAGCAGAGAAACTCCAACCTGG-3′

Polymorphism was observed on the E08 93K CAPS marker when the amplified DNA was digested with the restriction endonuclease HpyCH4 V.

Marker E08 46K was amplified using oligonucleotide primers E08 46KF and E08 46KR. Oligonucleotide primers E08 46KF and E08 46KR were developed as described in Example 1, have the nucleotide sequences set forth in SEQ ID NO:15 and SEQ ID NO:16, respectively, and have the sequences set forth as follows:

SEQ ID NO:15: 5′-GTTCATAGGTGCCAAATTTGGGTG-3′
SEQ ID NO:16: 5′-CACAAGTAACCCAATGCCCAAAC-3′

Polymorphism was observed on the E08 46K CAPS marker when the amplified DNA was digested with the restriction endonuclease Rsa I.

Analysis of recombination breakpoints identified 6 recombination breakpoints between DNA obtained from re2 mutant plants and CAPS marker E08 93K and 3 recombination breakpoints between DNA obtained from re2 mutant plants and CAPS marker C10 9.6. Information relating to the position of the sequences of the CAPS markers in the rice chromosomes was retrieved from the web sites of the Rice Genome Research Program (RGP), Tsukuba, Japan, or the Clemson University Genomics Institute, Clemson, S.C. This information revealed that the sequences for CAPS markers E08 93K and C10 9.6 were derived from two overlapping BAC clones, OSJNBa0050E08 and OSJNBb0042K08, that cover 190 Kb on rice chromosome 10. At least 10 genes are found in this region.

An additional CAPS marker, K08 21K, and a single nucleotide polymorphism-based (SNP-based) marker were generated that were derived from BAC OSJNBb0042K08, and mapped 25 Kb apart.

Marker K08 21K was amplified using oligonucleotide primers K08 21KF and K08 21KR. Oligonucleotide primers K08 21KF and K08 21KR were developed as described in Example 1, have the nucleotide sequences set forth in SEQ ID NO:17 and SEQ ID NO:18, respectively, and have the sequences set forth as follows:

SEQ ID NO:17: 5′-GTTCACCCATTAGTGATGCCTGG-3′
SEQ ID NO:18: 5′-GTTCACTCGATAAGAGCAATCGAAC-3′

Polymorphism was observed on the K08 21K CAPS marker when the amplified DNA was digested with the restriction endonuclease Taq I.

SNP-based marker K08 46K was amplified using primers K08 46KF and K08 46KR. Oligonucleotide primers K08 46KF and K08 46KR were developed as described in Example 1, have the nucleotide sequences set forth in SEQ ID NO:19 and SEQ ID NO:20, respectively, and have the sequences set forth as follows:

SEQ ID NO:19: 5′-GTTATGTTGCACACCTCCAGTAGTTAC-3′
SEQ ID NO:20: 5′-GTCAAGCCTGCTGTTACCCTTTAAG-3′

Amplified DNA products were purified using a Qiagen PCR purification kit (Qiagen, Valencia, Calif.) and 100 ng of each purified DNA was used for direct sequencing. Of 9 recombination breakpoints analyzed, 3 were found in marker K08 21K and 1 was found in marker K08 46K, confining the RE2 gene to a 25 Kb region between these two markers.

This 25 Kb region contains DNA corresponding to two putative genes. One is gene OSJNBb0042K08.8 that is predicted to encode a myosin-like protein and is found in the NCBI database as Version AAL77142.1 having NCBI General Identifier No. 18652508. The other one is gene OSJNBb0042K08.9 that is predicted to encode a protein of unknown function and is found in the NCBI database as Version AAL77143.1 having NCBI General Identifier No. 18652509.

The regions corresponding to these two genes were sequenced in genomic DNA obtained from mutant alleles re2-1, re2-2 and re2-3 to identify the RE2 gene. Amplification of exon 1 was performed using oligonucleotide primers LOB-82F and LOB R1 and amplification of exon 2 was performed using oligonucleotide primers LOB F2 and LOB R2. Oligonucleotide primers LOB-82F, LOB R1, LOB F2, and LOB R2 have the nucleotide sequences set forth in SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, and SEQ ID NO:24, respectively, and have the sequences set forth as follows:

SEQ ID NO:21: 5′-GTCAAGCCTGCTGTTACCCTTTAAG-3′
SEQ ID NO:22: 5′-CCACCATGACGAACATCTAAATG-3′
SEQ ID NO:23: 5′-GTATAGCTCCCAACCATTTCTCCTC-3′
SEQ ID NO:24: 5′-CCAACATCACCATCATCGTCTTC-3′

Amplification reactions were carried out using the same conditions that were used for CAPS marker amplifications in Example 1 except that 20 ng of DNA was used per reaction and the annealing temperature was 55° C. Amplified DNA products were cloned into pGEM-T Easy Vector (Promega, Madison, Wis.) and, for each amplification reaction, plasmid DNA was prepared from at least 4 independent colonies using a Qiagen miniprep kit (Qiagen, Valencia, Calif.). Plasmids were sequenced using the M13 forward and reverse sequencing primers.

No mutation was found in the portion of DNA corresponding to the gene encoding the myosin-like protein, but mutations were found in the region encoding the unknown protein. This means that the RE2 gene has the sequence found in NCBI having locus tag OSJNBb0042K08.9 that is predicted to encode a protein of unknown function and is found in the NCBI database as Version AAL77143.1 having NCBI General Identifier No. 18652509.

The nucleotide sequence of the Oryza sativa RE2 gene is set forth in SEQ ID NO:25 and the amino acid sequence deduced from translating nucleotides 1 through 807 of SEQ ID NO:25 is set forth in SEQ ID NO:26. Nucleotides 808-810 of SEQ ID NO:25 correspond to a stop codon. The nucleotide sequence set forth in SEQ ID NO:25 is the same as the one found in the NCBI database having locus tag OSJNBb0042K08.9 that is predicted to encode a protein of unknown function. The amino acid sequence set forth in SEQ ID NO:26 is the same as the one for the protein of unknown function found in the NCBI database as Version AAL77143.1 having NCBI General Identifier No. 18652509 that is set forth here in SEQ ID NO:27.

Identification of Mutations Responsible for an re2 Phenotype

Mutations in the RE2 gene responsible for the re2 phenotype were determined by comparing the nucleotide sequences obtained for DNA from wild-type rice with the nucleotide sequences obtained for DNA from rice exhibiting an re2 phenotype. Three re2 mutant alleles were identified and labeled re2-1, re2-2, and re2-3. The nucleotide sequence obtained for mutant allele re2-1 is set forth in SEQ ID NO:28 and the amino acid sequence obtained by translating nucleotides 1 through 807 of SEQ ID NO:28 is set forth in SEQ ID NO:29. Nucleotides 808 through 810 of SEQ ID NO:28 correspond to a stop codon. The nucleotide sequence obtained for mutant allele re2-2 is set forth in SEQ ID NO:30 and the amino acid sequence obtained by translating nucleotides 1 through 807 of SEQ ID NO:30 is set forth in SEQ ID NO:31. Nucleotides 808 through 810 of SEQ ID NO:30 correspond to a stop codon. The nucleotide sequence obtained for mutant allele re2-3 is set forth in SEQ ID NO:32 and the amino acid sequence obtained by translating nucleotides 1-378 of SEQ ID NO:32 is set forth in SEQ ID NO:33. Nucleotides 379 through 381 of SEQ ID NO:32 correspond to a stop codon.

FIG. 1A-C shows an alignment of the nucleotide sequences obtained for the coding regions of wild type RE2 (SEQ ID NO:25), and mutants re2-1 (SEQ ID NO:28), re2-2 (SEQ ID NO:30), and re2-3 (SEQ ID NO:32). Changes in the nucleotide sequence are indicated by a star below the alignment and by a box around the nucleotides at that position. As seen in FIG. 1, mutant allele re2-1 had a T residue at nucleotide 279, mutant allele re2-2 had a T residue at nucleotide 110, and mutant allele re2-3 had the C at nucleotide 75 deleted. These nucleotide changes result in changes in the amino acid sequence.

FIG. 2 shows an alignment of the amino acid sequences obtained for wild type RE2 protein (SEQ ID NO:26), and mutant proteins re2-1 (SEQ ID NO:29), and re2-2 (SEQ ID NO:31). Amino acids that change between the wild type and mutant are indicated by a box around the amino acids that are different at that position. As seen in FIG. 2, mutant allele re2-1 protein had an isoleucine at amino acid 93 instead of the highly-conserved threonine, and mutant allele re2-2 protein had a phenylalanine instead of the conserved cysteine at amino acid 37. The deletion of a nucleotide at position 75 in mutant allele re2-3 gene produced a frame shift that results in a 127 amino acid polypeptide (set forth in SEQ ID NO:33) that shares identity with the first 25 amino acids of wild type RE2 protein but whose remaining 102 amino acids share little or no homology with wild type RE2 protein or mutant proteins re2-1 or re2-2.
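The relationship between a nucleotide position in a coding sequence and the affected amino acid, and the different consequences of a substitution and a single-base deletion, can be sketched in Python as follows. The codon table is the standard genetic code; the 18-nucleotide coding sequence is a toy example invented for illustration and is not any part of SEQ ID NO:25, and the function names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order ("*" marks a stop codon).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_number(nucleotide_position):
    """1-based codon containing a 1-based coding-sequence position."""
    return (nucleotide_position - 1) // 3 + 1


def translate(coding_sequence):
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return "".join(CODON_TABLE[coding_sequence[i:i + 3]] for i in range(0, usable, 3))


# Positions named in the text map to amino acids 37, 93 and 25.
print(codon_number(110), codon_number(279), codon_number(75))

toy_wild_type = "ATGGCCTGCACCGGATAA"                      # M A C T G stop
toy_substitution = toy_wild_type[:7] + "T" + toy_wild_type[8:]   # G to T at position 8
toy_deletion = toy_wild_type[:5] + toy_wild_type[6:]             # delete position 6

print(translate(toy_wild_type))
print(translate(toy_substitution))   # one residue changes (Cys to Phe)
print(translate(toy_deletion))       # reading frame shifts downstream
```

In the toy sequence the substitution changes a single residue, while the deletion alters every residue downstream of the deleted base, mirroring the contrast between the missense alleles and the frame shift allele described above.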

Example 3 Confirmation of the Function of the Oryza sativa RE2 Gene

Functional confirmation of the identity of the Oryza sativa RE2 gene identified in Example 2 was performed using genetic complementation. Rice callus cells derived from wild type and re2 mutant plants were transformed with a genomic DNA fragment comprising the RE2 gene. Restoration of the embryo size of the re2 mutant cells transformed with the genomic DNA fragment comprising the RE2 gene confirmed that the Oryza sativa RE2 gene identified in Example 2 is the sole target of mutations giving rise to the re2 phenotype. Cloning of the genomic fragment comprising the wild type RE2 gene and transformation into rice cells were performed as follows.

A genomic DNA fragment containing wild type Oryza sativa RE2 gene was obtained from a lambda rice genomic DNA library (Stratagene) as follows. The genomic library was screened using a DNA probe obtained using primers LOB F2 (SEQ ID NO:23) and LOB R2 (SEQ ID NO:24) that, as indicated in Example 2, above, may be used to amplify exon 2 of the RE2 gene. Of 8 clones identified, one clone, named RE2G4, contained a 15 Kb insert comprising an approximately 9 Kb fragment flanked by two BamH I sites and comprising the RE2 gene. One of the BamH I sites was located 4472 bp upstream of the ATG initiation codon in the RE2 gene and the other one was located 3089 bp downstream of the termination codon of the RE2 gene. Nucleotides 4473 through 4829 correspond to a first exon, nucleotides 4830 through 5660 correspond to an intron, and nucleotides 5661 through 6110 correspond to the second exon. Nucleotides 6111 through 6113 form a termination codon. The nucleotide sequence of this approximately 9 Kb BamH I fragment is set forth in SEQ ID NO:34.
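As a consistency check on the coordinates just given, the exon lengths can be summed and compared with the 807-nucleotide coding region recited for SEQ ID NO:25 in Example 2. The following Python sketch uses only the coordinates stated above; the variable and function names are hypothetical.

```python
# 1-based, inclusive coordinates within the approximately 9 Kb BamH I fragment.
EXON_1 = (4473, 4829)
INTRON = (4830, 5660)
EXON_2 = (5661, 6110)
STOP_CODON = (6111, 6113)


def span_length(span):
    """Length of an inclusive coordinate span."""
    start, end = span
    return end - start + 1


coding_length = span_length(EXON_1) + span_length(EXON_2)
codons = coding_length // 3

print(f"exon 1: {span_length(EXON_1)} nt")
print(f"intron: {span_length(INTRON)} nt")
print(f"exon 2: {span_length(EXON_2)} nt")
print(f"coding region without stop codon: {coding_length} nt = {codons} codons")
```

The two exons together give 807 nucleotides, which agrees with the coding region of nucleotides 1 through 807 described for SEQ ID NO:25.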

The approximately 9 Kb BamH I fragment comprising the RE2 coding region (set forth in SEQ ID NO:34) was removed from clone RE2G4 by digestion with BamH I and was subcloned into the BamH I site of the pML18 transformation vector to produce vector OsRE2pML18. Transformation vector pML18 is derived from the commercially available vector pGEM9z (obtained from Gibco-BRL which is owned by Invitrogen, Carlsbad, Calif.) and was modified by adding a cassette to express the bacterial hygromycin phosphotransferase gene. The bacterial hygromycin phosphotransferase gene confers resistance to the antibiotic used as selectable marker for rice transformation. A Sal I fragment, containing a cassette comprising the cauliflower mosaic virus 35S promoter, driving expression of the bacterial hygromycin phosphotransferase gene, followed by nucleotides 848 to 1550 of the 3′ end of the nopaline synthase gene, was inserted at the Sal I site of vector pGEM9z to produce pML18. The nucleotide sequence of pML18 is set forth in SEQ ID NO:35.

Vector OsRE2pML18 was introduced into callus derived from wild type rice plants and from re2 mutant plants using a Biolistic PDS-1000/He gun (BioRAD Laboratories, Hercules, Calif.) and the particle bombardment technique (Klein et al. (1987) Nature (London) 327:70-73) as follows.

Embryogenic callus cultures derived from the scutellum of germinating rice seeds were used as source material for transformation experiments. This material was generated by germinating sterile rice seeds on N6-2,4D media (N6 salts, N6 vitamins, 2.0 mg/L 2,4-D, 100 mg/L myo-inositol, 300 mg/L casamino acids, and 2.7 g/L proline) in the dark at 27-28° C. Embryogenic callus proliferating from the scutellum of the embryos was then transferred to fresh N6-2,4D media. Callus cultures were maintained by routine sub-culture at two-week intervals and used for transformation within 4 weeks of initiation. The regeneration, development, and cultivation of plants from single plant protoplast transformants or from various transformed explants is well known in the art (Weissbach and Weissbach (Eds.), In: Methods for Plant Molecular Biology, Academic Press, Inc., San Diego, Calif. (1988)).

Callus was prepared for transformation by arranging 0.5-1.0 mm callus pieces approximately 1 mm apart in a circular area of about 4 cm in diameter in the center of a circle of Whatman #541 paper placed on CM media and incubating in the dark at 27-28° C. for 3-5 days. Vector OsRE2pML18 was introduced into wild type callus cells and re2 mutant rice callus cells using a Biolistic PDS-1000/He gun (BioRAD Laboratories, Hercules, Calif.).

Transformation of mutant callus with vector OsRE2pML18 produced 16 transgenic plants of which 7 transgenic plants produced seed. T2 seed from 6 plants showed a wild type to re2 mutant phenotype segregating at a 3:1 ratio. Restoration of wild type phenotype in re2 mutant plants by vector OsRE2pML18 indicates that the 9,203 bp rice genomic DNA fragment present in vector OsRE2pML18 was capable of complementing an re2 mutation. This confirms that the Oryza sativa RE2 gene has the sequence found in NCBI having locus tag OSJNBb0042K08.9 that is predicted to encode a protein of unknown function found in the NCBI database as Version AAL77143.1 having NCBI General Identifier No. 18652509. These results also indicate that the 9,203 bp rice genomic DNA fragment in vector OsRE2pML18 used in these transformations and set forth in SEQ ID NO:34 contains the complete set of regulatory elements required for proper complementation of an re2 mutant phenotype and involved in altering embryo/endosperm size during seed development.
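Whether an observed segregation is consistent with a 3:1 ratio is commonly assessed with a chi-square goodness-of-fit test, which can be sketched in Python as follows. The seed counts used here are hypothetical, since no counts are reported above, and the critical value 3.841 is the standard 5% threshold for one degree of freedom; the function name is an assumption.

```python
CHI_SQUARE_CRITICAL_DF1_P05 = 3.841  # 5% critical value, 1 degree of freedom


def chi_square_3_to_1(wild_type_count, mutant_count):
    """Chi-square statistic for a 3:1 wild type to mutant expectation."""
    total = wild_type_count + mutant_count
    expected = (0.75 * total, 0.25 * total)
    observed = (wild_type_count, mutant_count)
    return sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))


# Hypothetical T2 seed counts from one transgenic plant.
statistic = chi_square_3_to_1(72, 28)
consistent_with_3_to_1 = statistic < CHI_SQUARE_CRITICAL_DF1_P05

print(f"chi-square = {statistic:.2f}, consistent with 3:1: {consistent_with_3_to_1}")
```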

Example 4 Composition of cDNA Libraries: Isolation and Sequencing of cDNA Clones Encoding Polypeptides Involved in Altering Embryo/Endosperm Size During Seed Development

The sequence of the Oryza sativa RE2 gene was identified in Example 2 and its function was confirmed in Example 3 as being involved in altering embryo/endosperm size during seed development. Identification of genes from other crops involved in altering embryo/endosperm size during seed development is set forth in Examples 4 and 5. cDNAs encoding polypeptides homologous to rice RE2 protein were identified by electronically screening the Du Pont proprietary database using BLAST analysis (Basic Local Alignment Search Tool; Altschul et al. (1990) J. Mol. Biol. 215:403-410). Clones derived from cDNA libraries representing mRNAs from various corn (Zea mays), Euphorbia lagascae, columbine (Aquilegia vulgaris), guar (Cyamopsis tetragonoloba), rice (Oryza sativa), soybean (Glycine max), and wheat (Triticum aestivum) tissues were identified as encoding homologs to the rice RE2 protein. The libraries were prepared as described below. The characteristics of the libraries are described in Table 1.

TABLE 1 Libraries from Corn, Euphorbia lagascae, Columbine, Guar, Rice, Soybean, and Wheat

Library  Tissue                                                                             Clone
cef1f    Corn entire fertilized ear 3 to 12 days after pollination                          cef1f.pk001.f4:fis
cpf1c    Corn pooled BMS treated with chemicals related to protein synthesis¹               cpf1c.pk006.d18a:fis
cpi1c    Corn pooled BMS treated with chemicals related to biochemical compound synthesis²  cpi1c.pk005.a12:fis
cr1n     Corn root from 7 day old seedlings³                                                cr1n.pk0028.h3a:fis
eel1c    Euphorbia lagascae developing seeds                                                eel1c.pk003.b10:fis
eav1c    Columbine developing seeds                                                         eav1c.pk003.c9
lds3c    Guar seeds harvested 32 days after flowering                                       lds3c.pk011.j11:fis
sdr1f    Soybean 10 day old root                                                            sdr1f.pk005.d21.f:fis
wdr1f    Wheat entire developing root                                                       wdr1f.pk002.l10:fis

¹Chemicals used included chloramphenicol, cycloheximide, aurintricarboxylic acid.
²Chemicals used included sorbitol, ergosterol, taxifolin, methotrexate, D-mannose, D-galactose, alpha-amino adipic acid, ancymidol.
³This library was normalized essentially as described in U.S. Pat. No. 5,482,845.

cDNA libraries representing mRNAs from the tissues described in Table 1 were prepared in Uni-ZAP™ XR vectors according to the manufacturer's protocol (Stratagene Cloning Systems, La Jolla, Calif.). Conversion of the Uni-ZAP™ XR libraries into plasmid libraries was accomplished according to the protocol provided by Stratagene. Upon conversion, cDNA inserts were contained in the plasmid vector pBluescript. cDNA inserts from randomly picked bacterial colonies containing recombinant pBluescript plasmids were amplified via polymerase chain reaction using primers specific for vector sequences flanking the inserted cDNA sequences or plasmid DNA was prepared from cultured bacterial cells. Amplified insert DNAs or plasmid DNAs were sequenced in dye-primer sequencing reactions to generate partial cDNA sequences (expressed sequence tags or “ESTs”; see Adams, M. D. et al., (1991) Science 252:1651). The resulting ESTs were analyzed using a Perkin Elmer Model 377 fluorescent sequencer.

Full-insert sequence (FIS) data was generated utilizing a modified transposition protocol. Clones identified for FIS were recovered from archived glycerol stocks as single colonies, and plasmid DNAs were isolated via alkaline lysis. Isolated DNA templates were reacted with vector primed M13 forward and reverse oligonucleotides in a PCR-based sequencing reaction and loaded onto automated sequencers. Confirmation of clone identification was performed by sequence alignment to the original EST sequence from which the FIS request was made.

Confirmed templates were transposed via the Primer Island transposition kit (PE Applied Biosystems, Foster City, Calif.) which is based upon the Saccharomyces cerevisiae Ty1 transposable element (Devine and Boeke (1994) Nucleic Acids Res. 22:3765-3772). The in vitro transposition system places unique binding sites randomly throughout a population of large DNA molecules. The transposed DNA was then used to transform DH10B electro-competent cells (Gibco BRL/Life Technologies, Rockville, Md.) via electroporation. The transposable element contains an additional selectable marker (named DHFR; Fling and Richards (1983) Nucleic Acids Res. 11:5147-5158), allowing for dual selection on agar plates of only those subclones containing the integrated transposon. Multiple subclones were randomly selected from each transposition reaction, plasmid DNAs were prepared via alkaline lysis, and templates were sequenced (ABI Prism dye-terminator ReadyReaction mix) outward from the transposition event site, utilizing unique primers specific to the binding sites within the transposon.

Sequence data was collected (ABI Prism Collections) and assembled using Phred and Phrap (Ewing et al. (1998) Genome Res. 8:175-185; Ewing and Green (1998) Genome Res. 8:186-194). Phred re-reads the ABI sequence data, re-calls the bases, assigns quality values, and writes the base calls and quality values into editable output files. Phrap is a sequence assembly program that uses the quality values assigned by Phred to increase the accuracy of the assembled sequence contigs. Assemblies are viewed using the Consed sequence editor (Gordon et al. (1998) Genome Res. 8:195-202).
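The quality values that Phred assigns are related to the estimated base-call error probability P by Q = -10·log10(P). The following is a minimal sketch of that conversion; the function names are illustrative and are not part of Phred or Phrap.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality value Q to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred quality value."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability implied by a few commonly quoted quality values.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4g}")
```

Under this relationship a base called at Q20 has an estimated 1 in 100 chance of being wrong, and a base called at Q30 an estimated 1 in 1000 chance.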

Example 5 Identification and Characterization of cDNA Clones Encoding Putative Homologs of the Oryza sativa RE2 Protein

Clones containing cDNA inserts encoding polypeptides homologous to rice RE2 protein were identified by conducting BLAST (Basic Local Alignment Search Tool; Altschul et al. (1993) J. Mol. Biol. 215:403-410) searches for similarity to sequences contained in the Du Pont proprietary database. The sequences identified were also compared, using BLAST, to the Genbank database.

A BLASTX search was performed to identify cDNAs encoding proteins similar to those encoded by the RE2 gene. BLASTX compares the translation, in all six reading frames, of the nucleotide query sequence to a protein database. As mentioned in Example 2, the Oryza sativa RE2 gene has the sequence found in the NCBI database having locus tag OSJNBb0042K08.9 that is predicted to encode a protein of unknown function found in the NCBI database as Version AAL77143.1 having NCBI General Identifier No. 18652509. Thus, the polypeptides encoded by the cDNAs identified in the BLASTX search are similar to the protein of unknown function found in the NCBI database as Version AAL77143.1 having NCBI General Identifier No. 18652509.
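The six-frame translation that BLASTX applies to a nucleotide query can be illustrated with the short sketch below. It is not the BLASTX implementation; it assumes the standard genetic code, and the function names and frame labels ("+1" to "-3") are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Translate a nucleotide sequence in three forward and three reverse frames."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Coding portion of primer SEQ ID NO:59, starting at the initiator ATG.
    for frame, protein in six_frame_translations("ATGAGCGCTGGCGGCGGCAGCAG").items():
        print(frame, protein)
```

Each of the six conceptual translations is then compared against the protein database, so that a cDNA insert cloned in either orientation or sequenced out of frame can still be matched to a similar protein.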

The BLASTX search using the nucleotide sequences from the clones listed in Table 1 revealed that the polypeptides encoded by these cDNAs had similarity to the Oryza sativa protein having NCBI General Identifier No. 18652509 and the Arabidopsis thaliana LOB domain 18 protein having NCBI General Identifier No. 17227164. Set forth in Table 2 are the BLASTX results for individual ESTs (“EST”), or for the sequences of the entire cDNA inserts comprising the indicated cDNA clones (“FIS”):

TABLE 2
BLAST Results for Sequences Encoding Polypeptides Homologous to O. sativa RE2 Protein and A. thaliana LOB Domain 18 Protein

Clone | SEQ ID NO: (aa) | Status | BLAST pLog Score to 18652509 | BLAST pLog Score to 17227164
rice RE2 | 26 | FIS | >180.00 | 73.70
cef1f.pk001.f4:fis | 37 | FIS | 58.00 | 58.30
cpf1c.pk006.d18a:fis | 39 | FIS | 36.00 | 38.22
cpi1c.pk005.a12:fis | 41 | FIS | 60.70 | 58.15
cr1n.pk0028.h3a:fis | 43 | FIS | 34.22 | 34.40
eel1c.pk003.b10:fis | 45 | FIS | 59.05 | 69.40
eav1c.pk003.c9 | 47 | EST | 47.00 | 51.70
lds3c.pk011.j11:fis | 49 | FIS | 53.10 | 56.40
sdr1f.pk005.d21.f:fis | 51 | FIS | 39.15 | 41.10
wdr1f.pk002.l10:fis | 53 | FIS | 36.30 | 33.70
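The pLog scores in Table 2 are not defined in this section. The sketch below assumes the common convention that a pLog value is the negative base-10 logarithm of the BLAST P-value, so that a larger pLog indicates a more significant match. The function names are illustrative, and the three rows are copied from Table 2.

```python
import math


def plog_to_pvalue(plog: float) -> float:
    """Assumed convention: P = 10 ** (-pLog)."""
    return 10.0 ** (-plog)


def pvalue_to_plog(p_value: float) -> float:
    """Assumed convention: pLog = -log10(P)."""
    return -math.log10(p_value)


# (clone, pLog to GI 18652509, pLog to GI 17227164), taken from Table 2.
TABLE_2_ROWS = [
    ("cef1f.pk001.f4:fis", 58.00, 58.30),
    ("cpi1c.pk005.a12:fis", 60.70, 58.15),
    ("eel1c.pk003.b10:fis", 59.05, 69.40),
]


def stronger_hit(row) -> str:
    """Name the database entry with the higher pLog (more significant) score."""
    _, plog_rice, plog_arabidopsis = row
    return "18652509" if plog_rice > plog_arabidopsis else "17227164"


if __name__ == "__main__":
    for row in TABLE_2_ROWS:
        clone, plog_rice, _ = row
        print(clone, f"P(rice) = {plog_to_pvalue(plog_rice):.3g}", "best hit:", stronger_hit(row))
```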

The data set forth in Table 3 presents the percent identity, calculated using the Clustal V method of alignment, of the amino acid sequences set forth in SEQ ID NOs:26, 37, 39, 41, 43, 45, 47, 49, 51, and 53, with the Oryza sativa protein having NCBI General Identifier No. 18652509 (set forth in SEQ ID NO:27), and the Arabidopsis thaliana LOB domain 18 protein (NCBI General Identifier No. 17227164; set forth in SEQ ID NO:54).

TABLE 3
Percent Identity of Amino Acid Sequences Deduced From Nucleotide Sequences of cDNA Clones Encoding Putative O. sativa RE2 Homolog Polypeptides

Clone | SEQ ID NO: (aa) | Percent Identity to 18652509 | Percent Identity to 17227164
rice RE2 | 26 | 100.00 | 49.6
cef1f.pk001.f4:fis | 37 | 59.1 | 56.9
cpf1c.pk006.d18a:fis | 39 | 38.5 | 40.4
cpi1c.pk005.a12:fis | 41 | 57.8 | 52.7
cr1n.pk0028.h3a:fis | 43 | 45.8 | 47.6
eel1c.pk003.b10:fis | 45 | 53.4 | 58.6
eav1c.pk003.c9 | 47 | 79.2 | 85.4
lds3c.pk011.j11:fis | 49 | 41.8 | 47.4
sdr1f.pk005.d21.f:fis | 51 | 40.8 | 42.3
wdr1f.pk002.l10:fis | 53 | 43.9 | 41.2

Sequence alignments and percent identity calculations were performed using the Megalign program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Multiple alignment of the sequences was performed using the Clustal V method of alignment (Higgins, D. G. and Sharp, P. M. (1989) Comput. Appl. Biosci. 5:151-153; Higgins, D. G. et al. (1992) Comput. Appl. Biosci. 8:189-191) and the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for pairwise alignments using the Clustal V method were KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5. Sequence alignments and BLAST scores and probabilities indicate that the nucleic acid fragments comprising the instant cDNA clones encode polypeptides with homology to the O. sativa RE2 protein and the A. thaliana LOB domain 18 protein.
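A percent identity value can be computed from two sequences that have already been aligned, as in the sketch below. This is not the Megalign calculation: the choice of denominator (aligned positions where neither sequence has a gap) is an assumption made for illustration, and the example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned, gap-free positions.

    Both inputs must be rows of the same alignment, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical aligned fragments that differ at one of twelve positions.
    print(round(percent_identity("PCGACKFLRRKC", "PCGACKFLRRRC"), 1))
```

Different programs divide by different lengths (for example, the shorter sequence or the full alignment), so values produced this way need not reproduce those in Table 3.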

Example 6 Structure of the Oryza sativa RE2 Protein and its Putative Homologs

As set forth in Table 3 of Example 5, the amino acid sequence of the RE2 polypeptide (SEQ ID NO:26), which is set forth in Example 2 above and is able to complement an re2 mutant phenotype, was identical to the Oryza sativa protein having NCBI General Identifier No. 18652509 (SEQ ID NO:27) and had sequence similarity to the Arabidopsis thaliana LOB domain 18 protein having NCBI General Identifier No. 17227164 (set forth in SEQ ID NO:54).

The LOB domain 18 protein is considered to belong to the class I group of the plant-specific Lateral Organ Boundaries (LOB) domain protein gene family. The class I LOB domain proteins contain a C-block, a GAS-block, and a leucine zipper motif (Shuai, B. et al., 2002, Plant Physiol. 129:747-761). Thus, it is expected that the Oryza sativa RE2 protein and its homologs also contain a C-block, a GAS-block, and a leucine zipper motif. The consensus sequences of these motifs were identified using a Clustal V alignment and are indicated in FIG. 3A-C.

FIG. 3A-C depicts the Clustal V alignment obtained for the amino acid sequences from the wild type rice RE2 protein (SEQ ID NO:26), the O. sativa protein having NCBI General Identifier No. 18652509 (SEQ ID NO:27), the A. thaliana LOB domain 18 protein having NCBI General Identifier No. 17227164 (SEQ ID NO:54), and the amino acid sequences of the polypeptides encoded by corn clones cef1f.pk001.f4:fis (SEQ ID NO:37), cpf1c.pk006.d18a:fis (SEQ ID NO:39), cpi1c.pk005.a12:fis (SEQ ID NO:41), and cr1n.pk0028.h3a:fis (SEQ ID NO:43), Euphorbia lagascae clone eel1c.pk003.b10:fis (SEQ ID NO:45), columbine clone eav1c.pk003.c9 (SEQ ID NO:47), guar clone lds3c.pk011.j11:fis (SEQ ID NO:49), soybean clone sdr1f.pk005.d21.f:fis (SEQ ID NO:51), and wheat clone wdr1f.pk002.l10:fis (SEQ ID NO:53). The program uses dashes to maximize the alignment. An asterisk (*) below the alignment indicates amino acids conserved among all the sequences. The conserved C-block, GAS-block, and leucine zipper motifs are boxed.

Table 4 sets forth the amino acid positions of the C-block, GAS block, and leucine zipper conserved amino acid domains in SEQ ID NOs:26, 54, 37, 39, 41, 43, 45, 47, 49, 51, and 53. The amino acids in each domain are indicated in FIG. 1 and the consensus sequence for each domain is described below the table.

TABLE 4
Location of the Conserved Domains in Oryza sativa RE2 and its Putative Homologs

SEQ ID NO: | C-Block | GAS Block N-end | GAS Block C-end | Leu Zipper
26/27 | 33-54 | 63-74 | 103-111 | 116-134
54 | 37-58 | 67-78 | 107-115 | 120-138
37 | 34-55 | 64-75 | 104-112 | 117-135
39 | 24-45 | 54-65 | 94-102 | 107-125
41 | 32-53 | 62-73 | 102-110 | 115-133
43 | 24-45 | 44-55 | 84-92 | 97-115
45 | 30-51 | 60-71 | 100-108 | 113-131
47 | - | 22-33 | 62-70 | 75-93
49 | 20-41 | 50-61 | 90-98 | 103-121
51 | 16-37 | 46-57 | 86-94 | 99-117
53 | 12-33 | 42-53 | 82-90 | 95-113

In the following consensus sequences, amino acids are indicated by their one-letter codes. Positions where more than one amino acid is found are indicated in parentheses, with the alternative amino acids separated by slashes. An X is used where any amino acid may be present at a position. The amino acids comprising the C-Block, GAS Block N-end and C-end, and Leu Zipper identified here follow:

The C block consensus sequence found in RE2 homologs is set forth in SEQ ID NO:55 and corresponds to:

SEQ ID NO:55: PCGACKFLRR(K/R)C(V/Q/A)X(G/D/E)C(V/I)FAP(Y/H)F

The GAS block has 49 amino acids, with an N-end consensus sequence set forth in SEQ ID NO:56 and a C-end consensus sequence set forth in SEQ ID NO:57.

SEQ ID NO:56: FAA(V/I)HKVFGASN
SEQ ID NO:57: RDP(V/I)(F/Y)GCV(A/S)
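The stated length of 49 amino acids for the GAS block can be checked against the positions in Table 4 by measuring from the first residue of the N-end to the last residue of the C-end. The sketch below does this for each sequence. The dictionary and function names are illustrative, and the three ranges listed for SEQ ID NO:47 are read here as GAS N-end, GAS C-end, and leucine zipper, which is an assumption based on their lengths.

```python
# (GAS N-end range, GAS C-end range) per SEQ ID NO, copied from Table 4.
TABLE_4_GAS = {
    "26/27": ((63, 74), (103, 111)),
    "54": ((67, 78), (107, 115)),
    "37": ((64, 75), (104, 112)),
    "39": ((54, 65), (94, 102)),
    "41": ((62, 73), (102, 110)),
    "43": ((44, 55), (84, 92)),
    "45": ((60, 71), (100, 108)),
    "47": ((22, 33), (62, 70)),  # assumption: first two listed ranges are the GAS ends
    "49": ((50, 61), (90, 98)),
    "51": ((46, 57), (86, 94)),
    "53": ((42, 53), (82, 90)),
}


def gas_block_length(n_end: tuple, c_end: tuple) -> int:
    """Inclusive length from the start of the N-end to the end of the C-end."""
    return c_end[1] - n_end[0] + 1


GAS_LENGTHS = {
    seq_id: gas_block_length(n_end, c_end)
    for seq_id, (n_end, c_end) in TABLE_4_GAS.items()
}

if __name__ == "__main__":
    for seq_id, length in GAS_LENGTHS.items():
        print(f"SEQ ID NO:{seq_id}: GAS block spans {length} residues")
```

Every row gives a span of 49 residues, which agrees with the length stated above.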

The consensus sequence for the Leucine Zipper domain is set forth in SEQ ID NO:58 and corresponds to:

SEQ ID NO:58: LQ(Q/H)QV(A/V/G)XLQX(E/Q)(L/V)X(Y/Q/H)(L/A/V)(Q/K/R)X(H/Q/Y)(L/V)

The C-Block, GAS Block N-end and C-end, and Leu Zipper consensus sequences set forth above were identified in a Clustal V alignment of polypeptides similar to the Oryza sativa RE2; thus, they should be present in any polypeptide having the same function in altering embryo/endosperm size during seed development as the Oryza sativa RE2 polypeptide.
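One way to test a candidate polypeptide for these motifs is to convert the consensus notation into a regular expression and scan the sequence. The sketch below is illustrative only: the function names are hypothetical, the query peptide is made up for demonstration, and an exact pattern match is a stricter criterion than similarity in a multiple alignment.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert notation such as 'RR(K/R)CX' into the regex 'RR[KR]C.'."""
    text = consensus.replace(" ", "")
    pattern = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            close = text.index(")", i)
            options = text[i + 1:close].split("/")
            pattern.append("[" + "".join(options) + "]")  # any listed residue
            i = close + 1
        elif ch == "X":
            pattern.append(".")  # any amino acid
            i += 1
        else:
            pattern.append(ch)  # invariant residue
            i += 1
    return "".join(pattern)


def find_motif(consensus: str, protein: str):
    """Return the 1-based (start, end) of the first match, or None."""
    match = re.search(consensus_to_regex(consensus), protein.upper())
    return (match.start() + 1, match.end()) if match else None


C_BLOCK = "PCGACKFLRR(K/R)C(V/Q/A)X(G/D/E)C(V/I)FAP(Y/H)F"  # SEQ ID NO:55

if __name__ == "__main__":
    hypothetical_peptide = "MSPCGACKFLRRKCVAGCVFAPYFSS"  # made-up example
    print(consensus_to_regex(C_BLOCK))
    print(find_motif(C_BLOCK, hypothetical_peptide))
```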

Example 7 Cloning and Sequencing of a Genomic Fragment Encoding a Maize Putative RE2 Homolog and Preparation of a Recombinant DNA Construct to Complement re2 Mutant Plants

A genomic DNA fragment encoding a corn RE2 homolog was amplified from a maize genomic library, cloned and sequenced. Then, the portion of DNA from the initiator ATG to the terminator codon of the fragment encoding the maize RE2 homolog was used to replace the portion of DNA from the initiator ATG to the terminator codon encoding the rice RE2 protein in vector OsRE2pML18 as follows.

Cloning and Sequencing of a Genomic Fragment Encoding a Maize RE2 Homolog

The polynucleotide in cDNA clone cpi1c.pk005.a12 was identified in Example 5 as encoding a polypeptide with similarity to the Oryza sativa RE2 protein. A genomic fragment comprising the open reading frame in clone cpi1c.pk005.a12 was amplified from a maize genomic library (Stratagene, Catalog No. 946102) using oligonucleotide primers Cpi BbsI F and Cpi BsaI R. Oligonucleotide primers Cpi BbsI F and Cpi BsaI R were designed based on the sequence of clone cpi1c.pk005.a12, are set forth in SEQ ID NO:59 and SEQ ID NO:60, respectively, and have the sequences set forth as follows:

SEQ ID NO:59: 5′-GAAGACCAATGAGCGCTGGCGGCGGCAGCAG-3′
SEQ ID NO:60: 5′-GGTCTCCTCATCTTGAGTGTGGCGGCGGGTGCTC-3′

Amplification was performed using the conditions suggested by the manufacturer of the library. The amplified DNA product comprising a maize RE2 homolog gene was named ZmRE2 ORF, was cloned into vector pGEM-T-easy, and was sequenced. The nucleotide sequence obtained for ZmRE2 ORF is set forth in SEQ ID NO:61. Nucleotides 79 through 429 correspond to the first exon, nucleotides 430 through 1363 correspond to an intron, nucleotides 1364 through 1784 correspond to the second exon, and nucleotides 1785 to 1787 correspond to a stop codon.
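The design of the primers set forth in SEQ ID NO:59 and SEQ ID NO:60 can be inspected with the short sketch below, which locates restriction enzyme recognition sequences and the start and stop codons within each primer. The recognition sequences for Bbs I (GAAGAC) and Bsa I (GGTCTC) are supplied from general knowledge rather than from this document, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences supplied as assumptions (not stated in this document).
RECOGNITION_SITES = {"BbsI": "GAAGAC", "BsaI": "GGTCTC"}

PRIMERS = {
    "SEQ ID NO:59": "GAAGACCAATGAGCGCTGGCGGCGGCAGCAG",
    "SEQ ID NO:60": "GGTCTCCTCATCTTGAGTGTGGCGGCGGGTGCTC",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list:
    """Return 0-based start positions of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


if __name__ == "__main__":
    forward = PRIMERS["SEQ ID NO:59"]
    reverse = PRIMERS["SEQ ID NO:60"]
    print("BbsI site(s) in SEQ ID NO:59:", find_sites(forward, RECOGNITION_SITES["BbsI"]))
    print("ATG position in SEQ ID NO:59:", forward.find("ATG"))
    print("BsaI site(s) in SEQ ID NO:60:", find_sites(reverse, RECOGNITION_SITES["BsaI"]))
    # The reverse primer anneals to the coding strand, so a stop codon appears
    # in its reverse complement.
    print("TGA in reverse complement of SEQ ID NO:60:", "TGA" in reverse_complement(reverse))
```

Under these assumptions the forward primer carries a Bbs I recognition sequence at its 5′ end followed closely by the initiator ATG, and the reverse primer carries a Bsa I recognition sequence at its 5′ end followed by the complement of a TGA stop codon.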

A. Preparation of a Recombinant DNA Construct Encoding a Putative Maize RE2 Homolog

A recombinant DNA construct was prepared in which a genomic DNA fragment encoding a maize RE2 homolog present in ZmRE2 ORF was used to replace the Oryza sativa RE2 coding region in vector OsRE2pML18 (prepared in Example 3, above). The resulting chimeric construct comprises the genomic DNA fragment encoding a maize RE2 homolog (referred to as ZmRE2 ORF) surrounded by the sequences upstream of the initiator ATG and downstream of the termination codon from vector OsRE2pML18. This chimeric construct was prepared by amplifying portions upstream of the initiator ATG and downstream of the termination signal in vector OsRE2pML18, adding these portions to the pGEM-T-easy vector containing ZmRE2 ORF and then replacing the Oryza sativa RE2 coding sequence with this chimeric fragment in vector OsRE2pML18 as follows.

B. Amplification of a Fragment 5′ of the O. sativa RE2 Gene in Vector OsRE2pML18

A portion of the DNA fragment 5′ of the initiator ATG in vector OsRE2pML18 was amplified using oligonucleotide primers RE2 pro Bst 2F and RE2 PRO R Bbs. Oligonucleotide primers RE2 pro Bst 2F and RE2 PRO R Bbs are set forth in SEQ ID NO:62 and SEQ ID NO:63, respectively, and have the sequences set forth as follows:

SEQ ID NO:62: 5′-CACCATCATGTCAGTGTGCCAATACGCTAAACTTAGAAGA-3′
SEQ ID NO:63: 5′-GAAGACGCTCATTCTTGGAATGAGCCCCCA-3′

The amplified fragment comprises a portion of the Oryza sativa RE2 promoter and was cloned in pGEM-T-easy (Promega) to create plasmid RE2PRO whose sequence is set forth in SEQ ID NO:64.

C. Preparation of a Chimera Comprising the Fragments Amplified in A and B Above

Digestion of the pGEM-T-easy vector containing ZmRE2 ORF (prepared in A above) with Bbs I and Aat II produced a 1760 bp fragment. Restriction endonuclease Bbs I cuts the pGEM-T-easy vector containing ZmRE2 ORF immediately upstream of the initiator ATG and Aat II cuts in the vector, downstream of the maize stop codon. Plasmid RE2PRO was digested with Bbs I which cuts immediately upstream of the initiator ATG, and with Sal I which cuts in the vector's multiple cloning region. The 4316 bp fragment obtained from plasmid RE2PRO was ligated to the 1760 bp fragment obtained from the pGEM-T-easy vector containing ZmRE2 ORF by introducing the fragments in DH10B competent cells (Invitrogen). The resulting plasmid contains a portion of the Oryza sativa RE2 promoter region operably linked to the first codon of the genomic fragment encoding a maize RE2 homolog in ZmRE2 ORF.

D. Amplification of a Fragment 3′ of the O. sativa RE2 Gene in Vector OsRE2pML18

A portion of the DNA fragment 3′ of the termination signal in vector OsRE2pML18 was amplified using oligonucleotide primers RE2 TERM XbaI R and RE2 TERM EcoBspMI. Oligonucleotide primers RE2 TERM XbaI R and RE2 TERM EcoBspMI are set forth in SEQ ID NO:65 and SEQ ID NO:66, respectively, and have the sequences set forth as follows:

SEQ ID NO:65: 5′-GTAAAAGGATCTAGACACCTGGCTCTAGCCTCCAAGTA-3′
SEQ ID NO:66: 5′-TGGAGCGAATTCACCTGCCAAGATGATCCTCCTCACTGTGTGTGATCATC-3′

The amplified DNA product comprising a portion of the Oryza sativa RE2 terminator region was cloned into vector pGEM9z to produce plasmid pRE2TERGEM whose sequence is set forth in SEQ ID NO:67.

E. Addition of the Fragment Amplified in D Above to the Chimera of C Above

The maize sequences 3′ of the termination signal were replaced with rice sequences as follows.

Plasmid pRE2TERGEM was digested with Xba I and Eco RI to remove a 758 bp fragment containing only sequences from the termination region of the rice RE2 gene. This 758 bp fragment was cloned into vector pGEM7 that had been digested with Xba I and Eco RI to produce plasmid RE2TERMpGEM7.

Plasmid RE2TERMpGEM7 was digested with Bsp HI and Eco RI and an approximately 3.7 Kb fragment was recovered. The chimera prepared in C, above, was digested with Bsa I and Eco RI to remove the fragment comprising a portion of the Oryza sativa RE2 promoter region operably linked to the genomic fragment encoding a maize RE2 homolog. These two fragments were ligated to form a plasmid comprising a portion of the rice RE2 promoter operably linked to the genomic fragment encoding a maize RE2 homolog operably linked to a portion of the rice RE2 terminator region.

F. Preparation of a Vector Comprising a Genomic Fragment Encoding a Maize RE2 Homolog Under the Control of the Oryza sativa RE2 Promoter and Terminator

A vector comprising the genomic fragment encoding the maize RE2 homolog under the control of the Oryza sativa RE2 promoter and terminator regions was assembled from vector OsRE2pML18 and the chimeric fragment prepared in part E above, as follows.

Vector OsRE2pML18 and the chimeric fragment prepared in part E above were digested with restriction endonucleases Bst EII and Sex AI. Digestion of vector OsRE2pML18 removed the Oryza sativa RE2 coding region and portions of the promoter and terminator regions from vector OsRE2pML18, leaving a 12.1 Kb DNA fragment. Digestion of the fragment prepared in part E above produced a 3.1 Kb fragment comprising a fragment encoding a maize RE2 homolog between portions of the Oryza sativa RE2 promoter and terminator regions. Ligation of the 12.1 and 3.1 Kb fragments produced a vector comprising a fragment encoding a maize RE2 homolog under the control of the Oryza sativa RE2 promoter and terminator. The vector comprising the maize RE2 homolog open reading frame under the control of the Oryza sativa RE2 promoter and terminator regions was named ZmRE2pML18.

Example 8 Genetic Complementation of a Rice re2 Mutant Plant with an RE2 Homolog from Corn

Confirmation of the function of the corn RE2 homolog, identified in Example 5 above, was performed using genetic complementation. Rice callus cells derived from rice re2 mutant plants were transformed with vector ZmRE2pML18 prepared as described in Example 7 above. Transformations were performed using a Biolistic PDS-1000/He gun and the particle bombardment technique as in Example 3 above.

Transformation of re2-1 mutant cells with vector ZmRE2pML18 produced 14 transgenic plants. Thirteen of these fourteen plants produced seeds, and ten of those thirteen produced seeds having a wild-type appearance. Some of the seeds produced by these 10 plants had a wild-type phenotype and some had an re2 mutant phenotype. The ratio of seeds having a wild-type appearance to seeds having an re2 mutant phenotype varied in each plant. Approximately 25% to 70% of the seeds obtained from individual re2-1 mutant plants transformed with vector ZmRE2pML18 had a wild-type appearance. Restoration of a wild-type appearance in seeds from plants regenerated from re2 mutant cells transformed with the vector comprising the fragment encoding the corn RE2 homolog indicates that the corn RE2 homolog, encoded by ZmRE2 (SEQ ID NO:61), is capable of complementing an re2 mutation. These results suggest that the corn RE2 homolog performs the same function in corn as the rice RE2 protein performs in rice.

Example 9 Identification of a cDNA Clone Encoding OsRE2

A cDNA clone encoding OsRE2 was identified by screening a rice phage cDNA library using an RE2-specific probe.

The phage cDNA library was prepared from total RNA extracted from developing rice seeds harvested 2-5 days after pollination as follows. Total RNA was extracted using TRIzol® Reagent containing phenol and guanidine thiocyanate (Life Technologies Inc., Rockville, Md.). Poly(A) mRNA was purified from the total RNA using mRNA Purification kits which consist of oligo (dT)-cellulose spin columns (Amersham Pharmacia Biotech Inc., Piscataway, N.J.). cDNA was synthesized using 5.5 μg of poly(A) mRNA and cDNA synthesis kits (Stratagene, La Jolla, Calif.), following the manufacturer's protocol with the exception of using Superscript® reverse transcriptase (Life Technologies Inc.) in the first step instead of Moloney murine leukemia virus reverse transcriptase. The cDNA was size-fractionated using BRL cDNA Size Fraction Columns (GIBCO-BRL). Fractions 1 to 13 were precipitated, resuspended, and ligated with 1 μg Uni-ZAP XR vector following the manufacturer's instructions. After incubation for two days at 4° C. the ligated DNA was packaged using Gigapack III Gold® packaging extract (Stratagene, La Jolla, Calif.). The titer of the resulting library was approximately 7.8×10⁵ plaque forming units per mL (pfu/mL). The cDNA phage library was amplified following the manufacturer's instructions and 150 mL of phage cDNA library were obtained. The amplified library had a titer of 5.5×10⁸ pfu/mL.

Screening for the RE2 cDNA was performed following standard protocols well known to those skilled in the art (Ausubel et al. 1993, “Current Protocols in Molecular Biology” John Wiley & Sons, USA, or Sambrook et al. 1989. Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press). Briefly, 1.0×10⁶ pfu were plated, transferred to nylon membranes, and subjected to hybridization with a radioactively labeled RE2 second exon probe. The nucleotide sequence of the RE2 second exon probe is shown in SEQ ID NO:71. Following hybridization the membranes were exposed to film, and approximately 1 positive plaque was detected per 100,000 plaques plated. Eight plaques that gave a positive signal were isolated after a second round of screening. Lambda phage DNA was prepared from all 8 plaques, converted into plasmid DNA, and sequenced. Six of the eight clones contained a cDNA sequence encoding OsRE2. One of these six clones, RE2 cDNA C1, had a 5′UTR that extended 196 nucleotides upstream of the ATG start codon predicted from the genomic sequence. The nucleotide sequence of clone RE2 cDNA C1 is shown in SEQ ID NO:72.
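The library figures given in this Example can be related to one another with simple arithmetic, sketched below. The sketch assumes that the 1.0×10⁶ pfu screened were taken from the amplified library at its stated titer, which the text does not specify, and the function names are illustrative.

```python
def total_pfu(titer_pfu_per_ml: float, volume_ml: float) -> float:
    """Total plaque forming units in a library of the given titer and volume."""
    return titer_pfu_per_ml * volume_ml


def expected_positives(pfu_plated: float, positives_per_pfu: float) -> float:
    """Expected number of positive plaques for a given hit frequency."""
    return pfu_plated * positives_per_pfu


def volume_to_plate_ul(pfu_to_plate: float, titer_pfu_per_ml: float) -> float:
    """Microliters of undiluted library containing the requested number of pfu."""
    return pfu_to_plate / titer_pfu_per_ml * 1000.0


if __name__ == "__main__":
    amplified_titer = 5.5e8    # pfu/mL, from the text
    amplified_volume = 150.0   # mL, from the text
    pfu_plated = 1.0e6         # from the text
    hit_rate = 1 / 100_000     # about 1 positive per 100,000 plaques, from the text

    print(f"Total pfu in amplified library: {total_pfu(amplified_titer, amplified_volume):.3g}")
    print(f"Expected positives in the screen: {expected_positives(pfu_plated, hit_rate):.0f}")
    print(f"Undiluted volume holding 1.0e6 pfu: {volume_to_plate_ul(pfu_plated, amplified_titer):.2f} uL")
```

At the reported frequency, plating 1.0×10⁶ pfu would be expected to yield about 10 positive plaques, which is in line with the 8 plaques isolated after the second round of screening.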

Claims

1. An isolated polynucleotide comprising:

(a) a nucleic acid sequence encoding a polypeptide involved in altering embryo/endosperm size during seed development, said polypeptide having at least 80% amino acid sequence identity, based on the Clustal V method of alignment, when compared to an amino acid sequence selected from the group consisting of SEQ ID NOs:37, 39, 41, 43, 45, 47, 49, 51, and 53; or

(b) a nucleic acid sequence set forth in SEQ ID NO:25 wherein said sequence comprises at least one of the following modifications: (i) nucleotide 271 is a T residue instead of a C; (ii) nucleotide 110 is a T residue instead of a G; or (iii) nucleotide 75 is deleted; or

(c) a nucleic acid sequence set forth in SEQ ID NO:34 wherein (i) nucleotides 4473 through 4829 correspond to a first exon, and (ii) nucleotides 5661 through 6110 correspond to a second exon, and further wherein the nucleotides of (c)(i) and/or (c)(ii) encode a polypeptide involved in altering embryo/endosperm size during seed development; or

(d) a nucleic acid sequence set forth in SEQ ID NO:72; or

(e) a full complement of (a), (b), (c), (d), or SEQ ID NO:34; or

(f) all or part of a non-coding or coding region of the isolated polynucleotide comprising sequences of (a), (b), (c), (d), (e), or SEQ ID NO:34 for use in co-suppression or antisense suppression of endogenous nucleic acid sequences encoding polypeptides involved in altering embryo/endosperm size during seed development.

2. The isolated polynucleotide of claim 1 wherein the amino acid sequence identity is at least 85%.

3. The isolated polynucleotide of claim 1 wherein the amino acid sequence identity is at least 90%.

4. The isolated polynucleotide of claim 1 wherein the amino acid sequence identity is at least 95%.

5. The isolated polynucleotide of claim 1 wherein the amino acid sequence identity is 100%.

6. The isolated polynucleotide of claim 1 wherein the nucleotide sequence corresponds to any of the nucleotide sequences set forth in SEQ ID NOs:34, 36, 38, 40, 42, 44, 46, 48, 50, 52, and 72.

7. A recombinant DNA construct comprising the isolated polynucleotide of any one of claims 1-6 operably linked to at least one regulatory sequence.

8. A plant comprising in its genome the recombinant DNA construct of claim 7.

9. Seeds and progeny thereof obtained from the plant of claim 8.

10. Oil obtained from the seeds of claim 9.

11. The plant of claim 8 wherein said plant is selected from the group consisting of rice, corn, sorghum, millet, rye, soybean, canola, wheat, barley, oat, beans, and nuts.

12. Transformed plant tissue or plant cells comprising the recombinant DNA construct of claim 7.

13. The transformed plant tissue or plant cells of claim 12 wherein the plant is selected from the group consisting of rice, corn, sorghum, millet, rye, soybean, canola, wheat, barley, oat, beans, and nuts.

14. A method of altering embryo/endosperm size during seed development in a plant comprising:

(a) transforming plant cells or plant tissue with the recombinant DNA construct of claim 7;

(b) regenerating transgenic plants from the transformed plant cells or plant tissue of (a);

(c) obtaining seeds and progeny thereof from the transgenic plants of (b) having altered embryo/endosperm size based on a comparison of embryo/endosperm size of seeds obtained from non-transformed plants.

15. The method of claim 14 wherein said plant is selected from the group consisting of rice, corn, sorghum, millet, rye, soybean, canola, wheat, barley, oat, beans, and nuts.

16. A method of mapping genetic variations related to controlling embryo/endosperm size and/or altering oil phenotype in plants comprising:

(a) crossing two plant varieties; and

(b) evaluating genetic variations with respect to (i) a nucleic acid sequence selected from the group consisting of SEQ ID NOs:25, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52 and 72; or (ii) a nucleic acid sequence encoding a polypeptide selected from the group consisting of SEQ ID NOs:26, 29, 31, 33, 37, 39, 41, 43, 45, 47, 49, 51, and 53; in progeny plants resulting from the cross of step (a) wherein the evaluation is made using a method selected from the group consisting of RFLP analysis, SNP analysis, and PCR-based analysis.

17. The method of claim 16 wherein the plant is selected from the group consisting of rice, corn, sorghum, millet, rye, soybean, canola, wheat, barley, oat, beans, and nuts.

18. A method of molecular breeding to control embryo/endosperm size and/or alter oil phenotype in plants comprising:

(a) crossing two plant varieties; and

(b) evaluating genetic variations with respect to (i) a nucleic acid sequence selected from the group consisting of SEQ ID NOs:25, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52 and 72; or (ii) a nucleic acid sequence encoding a polypeptide selected from the group consisting of SEQ ID NOs:26, 29, 31, 33, 30, 32, 34, 36, 38, 40, 42, 44, and 46; in progeny plants resulting from the cross of step (a) wherein the evaluation is made using a method selected from the group consisting of RFLP analysis, SNP analysis, and PCR-based analysis.

19. The method of claim 18 wherein the plant is selected from the group consisting of rice, corn, sorghum, millet, rye, soybean, canola, wheat, barley, oat, beans, and nuts.