PROCESS FOR THE PRODUCTION OF CELLS WHICH ARE CAPABLE OF CONVERTING ARABINOSE

Info

Publication number: 20130040297
Type: Application
Filed: Apr 19, 2011
Publication Date: Feb 14, 2013
Applicant: DSM IP ASSETS (Heerlen)
Inventors: Paul Klaassen (Dordrecht), Bianca Elisabeth Maria Gielesen (Maassluis), Wilbert Herman Marie Heijne (Dordrecht), Gijsberdina Pieternella Van Suylekom (Gravenmoer)
Application Number: 13/642,107

Abstract

The invention relates to a process for the production of cells which are capable of converting arabinose, comprising the following steps: a) Introducing into a host strain that cannot convert arabinose, the genes araA, araB and araD, this cell is designated as constructed cell; b) Subjecting the constructed cell to adaptive evolution until a cell that converts arabinose is obtained; c) Optionally, subjecting the first arabinose converting cell to adaptive evolution to improve the arabinose conversion; the cell produced in step b) or c) is designated as first arabinose converting cell; d) Analysing the full genome or part of the genome of the first arabinose converting cell and that of the constructed cell; e) Identifying single nucleotide polymorphisms (SNP's) in the first arabinose converting cell; f) Using the information of the SNP's in rational design of a cell capable of converting arabinose; and g) Construction of the cell capable of converting arabinose designed in step f).

Description

FIELD OF THE INVENTION

The invention relates to a process for the production of cells which are capable of converting arabinose. The invention also relates to cells that may be produced by the process. The invention further relates to a process in which such cells are used for the production of a fermentation product, such as ethanol.

BACKGROUND OF THE INVENTION

Large-scale consumption of traditional, fossil fuels (petroleum-based fuels) in recent decades has contributed to high levels of pollution. This, along with the realisation that the world stock of fossil fuels is not unlimited and a growing environmental awareness, has stimulated new initiatives to investigate the feasibility of alternative fuels such as ethanol, which is a particulate-free burning fuel source that releases less CO2 than unleaded gasoline on a per litre basis. Although biomass-derived ethanol may be produced by the fermentation of hexose sugars obtained from many different sources, the substrates typically used for commercial scale production of fuel alcohol, such as cane sugar and corn starch, are expensive. Increases in the production of fuel ethanol will therefore require the use of lower-cost feedstocks. Currently, only lignocellulosic feedstock derived from plant biomass is available in sufficient quantities to substitute the crops currently used for ethanol production. Most lignocellulosic material contains, next to C6 sugars, considerable amounts of C5 sugars, including arabinose. Thus, for an economically feasible fuel production process, both hexose and pentose sugars must be fermented to form ethanol. The yeast Saccharomyces cerevisiae is robust and well adapted for ethanol production, but it is unable to convert arabinose. Also, no naturally-occurring organisms are known which can ferment xylose to ethanol with both a high ethanol yield and a high ethanol productivity. There is therefore a need for an organism possessing these properties so as to enable the commercially-viable production of ethanol from lignocellulosic feedstocks.

SUMMARY OF THE INVENTION

An object of the invention is to provide a cell, in particular a yeast cell that is capable of converting arabinose.

This object is attained according to the invention that provides a process for the production of cells which are capable of converting arabinose, comprising the following steps:

a) Introducing into a host strain that cannot convert arabinose, the genes araA, araB and araD, this cell is designated as constructed cell;
b) Subjecting the constructed cell to adaptive evolution until a cell that converts arabinose is obtained,
c) Optionally, subjecting the first arabinose converting cell to adaptive evolution to improve the arabinose conversion; the cell produced in step b) or c) is designated as first arabinose converting cell;
d) Analysing the full genome or part of the genome of the first arabinose converting cell and that of the constructed cell;
e) Identifying single nucleotide polymorphisms (SNP's) in the first arabinose converting cell; and
f) Using the information of the SNP's in rational design of a cell capable of converting arabinose;
g) Construction of the cell capable of converting arabinose designed in step f).

The invention further provides a yeast cell having araA, araB and araD genes wherein chromosome VII has a size of from 1300 to 1600 Kb as determined by electrophoresis, with the exclusion of yeast cell BIE201.

The invention further relates to a polypeptide belonging to the group consisting of the polypeptides:

a. A polypeptide having a sequence encoded by polynucleotide SEQ ID NO: 14 having a substitution E455stop in SSY1 and variant polypeptides thereof wherein one or more of the other positions have mutation of an amino acid with another amino acid that is an existing amino acid in the AA trans superfamily;
b. A polypeptide having the sequence encoded by the polynucleotide SEQ ID NO: 16 having a substitution D171G in YJR154w and variant polypeptides thereof wherein one or more of the other positions have mutation of the amino acid with another amino acid that is an existing conserved amino acid in the PhyH superfamily;
c. A polypeptide having the sequence encoded by the polynucleotide SEQ ID NO: 18 having a substitution S396G in CEP3;
d. A polypeptide having the sequence encoded by SEQ ID NO: 20 having a substitution T146P in GAL80 and variant polypeptides thereof wherein one or more of the other positions may have mutation of the amino acid with an amino acid that is an existing conserved amino acid in the NADB Rossmann superfamily.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 sets out a physical map of vector pPWT006.

FIG. 2 sets out a physical map of plasmid pPWT018, the sequence of which is given in SEQ ID NO: 1.

FIG. 3 sets out an autoradiogram showing the results of a hybridization experiment showing the correct integration of one copy of the plasmid pPWT080 in CEN.PK113-7D.

FIG. 4 sets out a physical map of plasmid pPWT080, the sequence of which is given in SEQ ID NO: 8.

FIG. 5 sets out an aerobic growth curve of reference strain BIE104A2P1 on 2% arabinose as sole carbon source.

FIG. 6 sets out an anaerobic growth curve of BIE104A2P1c on 2% arabinose as sole carbon source.

FIG. 7 sets out growth curves (sugar, ethanol and glycerol concentrations, OD600 and CO2 produced (ml/hr, second axis)) for BIE104 precultured on 2% glucose, and grown on Verduyn medium with 5% glucose, 5% xylose, 3.5% arabinose and 1% galactose. All % in w/w.

FIG. 8 sets out growth curves (sugar, ethanol and glycerol concentrations, OD600 and CO2 produced (ml/hr, second axis)) for BIE104A2P1c precultured on 2% glucose, and grown on Verduyn medium with 5% glucose, 5% xylose, 3.5% arabinose and 1% galactose.

FIG. 9 sets out growth curves (sugar, ethanol and glycerol concentrations, OD600 and CO2 produced (ml/hr, second axis)) for BIE201 precultured on 2% glucose, and grown on Verduyn medium with 5% glucose, 5% xylose, 3.5% arabinose and 1% galactose. All % in w/w.

FIG. 10 sets out a schematic overview of crossing.

FIG. 11 sets out an example of “Normalized Melting Curves” (melting curves; top panel) and a “Normalized melting Peaks” curve (lower panel). The latter is derived from the first graph and is showing the change in fluorescence signal as a function of the temperature. Strains BIE104A2P1 and BIE201 are displayed. The gene tested in this figure is YJR154w. The difference in melting temperature of the probe is clear between the two strains tested, BIE201 and BIE104A2P1.

FIG. 12 sets out a schematic representation (coverage plot) of chromosome VII in strain BIE201. The read depth is set out as a function of the position along the chromosome. Some parts of chromosome VII are present in multiple copies, i.e. two or three times overrepresented.

FIG. 13 sets out a CHEF gel, stained with ethidium bromide. Chromosomes were separated on their size using the CHEF technique. Strains analyzed are BIE104 (untransformed yeast cell), BIE104A2P1a (primary transformant unable to consume arabinose, synonym of BIE104A2P1), BIE104A2P1c, a strain derived from BIE104A2P1 by adaptive evolution, which is able to grow on arabinose, and strain BIE201, derived from BIE104A2P1c by adaptive evolution, which can grow on arabinose under anaerobic conditions. Shifts in chromosomes are observed (see text). Strain YNN295 is a marker strain (Bio-Rad).

FIG. 14 sets out a CHEF gel, blotted and hybridized with the araA probe. Chromosomes were separated on their size using the CHEF technique. Strains analyzed are BIE104 (untransformed yeast cell), BIE104A2P1a (primary transformant unable to consume arabinose, synonym of BIE104A2P1), BIE104A2P1c, a strain derived from BIE104A2P1 by adaptive evolution, which is able to grow on arabinose, and strain BIE201, derived from BIE104A2P1c by adaptive evolution, which can grow on arabinose under anaerobic conditions. Shifts in chromosomes are observed (see text). Strain YNN295 is a marker strain (Bio-Rad), used as a reference for the size of the chromosomes.

FIG. 15 sets out a CHEF gel, blotted and hybridized with the ACT1 probe. Chromosomes were separated on their size using the CHEF technique. Strains analyzed are BIE104 (untransformed yeast cell), BIE104A2P1a (primary transformant unable to consume arabinose, synonym of BIE104A2P1), BIE104A2P1c, a strain derived from BIE104A2P1 by adaptive evolution, which is able to grow on arabinose, and strain BIE201, derived from BIE104A2P1c by adaptive evolution, which can grow on arabinose under anaerobic conditions. Shifts in chromosomes are observed (see text). Strain YNN295 is a marker strain (Bio-Rad), used as a reference for the size of the chromosomes.

FIG. 16 sets out a CHEF gel, blotted and hybridized with the PNC1 probe. Chromosomes were separated on their size using the CHEF technique. Strains analyzed are BIE104 (untransformed yeast cell), BIE104A2P1a (primary transformant unable to consume arabinose, synonym of BIE104A2P1), BIE104A2P1c, a strain derived from BIE104A2P1 by adaptive evolution, which is able to grow on arabinose, and strain BIE201, derived from BIE104A2P1c by adaptive evolution, which can grow on arabinose under anaerobic conditions. Shifts in chromosomes are observed (see text). Strain YNN295 is a marker strain (Bio-Rad), used as a reference for the size of the chromosomes.

FIG. 17 sets out a CHEF gel, blotted and hybridized with the HSF1 probe. Chromosomes were separated on their size using the CHEF technique. Strains analyzed are BIE104 (untransformed yeast cell), BIE104A2P1a (primary transformant unable to consume arabinose, synonym of BIE104A2P1), BIE104A2P1c, a strain derived from BIE104A2P1 by adaptive evolution, which is able to grow on arabinose, and strain BIE201, derived from BIE104A2P1c by adaptive evolution, which can grow on arabinose under anaerobic conditions. Shifts in chromosomes are observed (see text). Strain YNN295 is a marker strain (Bio-Rad), used as a reference for the size of the chromosomes.

FIG. 18 sets out a CHEF gel, blotted and hybridized with the YGR031w probe. Chromosomes were separated on their size using the CHEF technique. Strains analyzed are BIE104 (untransformed yeast cell), BIE104A2P1a (primary transformant unable to consume arabinose, synonym of BIE104A2P1), BIE104A2P1c, a strain derived from BIE104A2P1 by adaptive evolution, which is able to grow on arabinose, and strain BIE201, derived from BIE104A2P1c by adaptive evolution, which can grow on arabinose under anaerobic conditions. Shifts in chromosomes are observed (see text). Strain YNN295 is a marker strain (Bio-Rad), used as a reference for the size of the chromosomes.

FIG. 19 sets out an example of ten dissected asci from the cross BIE104A2P1×BIE201. The asci were dissected with a Singer Micromanipulator. Each ascus consists of four ascospores. These ascospores are separated from each other and are put on the agar plate at distinctive distances. In theory, four haploid spore isolates can give rise to four individual colonies. The four colonies in a “column” originate from one ascus.

FIG. 20 illustrates the performance of strain BIE252 in the BAM (Biological Activity Monitor, Halotec, The Netherlands). The strain was precultured in Verduyn medium 2% glucose. Application in the BAM was done on Verduyn medium supplemented with 5% glucose, 5% xylose, 3.5% arabinose, 1% galactose and 0.5% mannose, pH4.2, under anaerobic conditions.

FIG. 21 illustrates the performance of strain BIE252ΔGAL80 in the BAM. The strain was precultured in Verduyn medium 2% glucose. Application in the BAM was done on Verduyn medium supplemented with 5% glucose, 5% xylose, 3.5% arabinose, 1% galactose and 0.5% mannose, pH4.2, under anaerobic conditions.

FIG. 22 sets out a schematic view of the double crossover integration of the complete adipic acid pathway into the genome.

FIG. 23 sets out a resulting chromatogram of an adipic acid standard and a sample measured with the analysis method.

FIG. 24 sets out a physical map of plasmid pGBS416ARABD.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING

SEQ ID NO: 1 sets out the sequence of pPWT018;

SEQ ID NO: 2 sets out the sequence of a primer for checking integration of pPWT018;

SEQ ID NO: 3 sets out a primer for checking integration of pPWT018 (with SEQ ID NO: 2) and for checking copy number pPWT018 (with SEQ ID NO: 4);

SEQ ID NO: 4 sets out the sequence for a primer for checking copy number pPWT018;

SEQ ID NO: 5 sets out the sequence for a primer for checking presence of pPWT018 in genome in combination with SEQ ID NO: 4;

SEQ ID NO: 6 sets out the sequence for a forward primer for generating the SIT2 probe;

SEQ ID NO: 7 sets out the sequence for a reverse primer for generating the SIT2 probe;

SEQ ID NO: 8 sets out the sequence for plasmid pPWT080;

SEQ ID NO: 9 sets out the sequence for a forward primer for checking correct integration of pPWT080 at the 3′-end of the GRE3-locus (with SEQ ID NO: 10) and for checking the copy number of plasmid pPWT080 (with SEQ ID NO: 11);

SEQ ID NO: 10 sets out the sequence for a reverse primer for checking correct integration of pPWT080 at the 3′-end of the GRE3-locus;

SEQ ID NO: 11 sets out the sequence for a reverse primer for checking the copy number of plasmid pPWT080 (with SEQ ID NO: 9);

SEQ ID NO: 12 sets out the sequence for a forward primer for generating an RKI1-probe;

SEQ ID NO: 13 sets out the sequence for a reverse primer for generating an RKI1-probe;

SEQ ID NO: 14 sets out the sequence for the SSY1-gene in wild type strain BIE104;

SEQ ID NO: 15 sets out the sequence for the SSY1-gene in strains BIE104A2P1c and BIE201;

SEQ ID NO: 16 sets out the sequence for the YJR154w-gene in wild type strain BIE104;

SEQ ID NO: 17 sets out the sequence for the YJR154w-gene in strains BIE104A2P1c and BIE201;

SEQ ID NO: 18 sets out the sequence for the CEP3-gene in wild type strain BIE104;

SEQ ID NO: 19 sets out the sequence for the CEP3-gene in strains BIE104A2P1c and BIE201;

SEQ ID NO: 20 sets out the sequence for the YPL277c-gene in wild type strain BIE104;

SEQ ID NO: 21 sets out the sequence for the YPL277c-gene in strains BIE104A2P1c and BIE201;

SEQ ID NO: 22 sets out the sequence for the GAL80-gene in wild type strain BIE104;

SEQ ID NO: 23 sets out the sequence for the GAL80-gene in strain BIE201;

SEQ ID NO 24 sets out the sequence of forward primer SSY1;

SEQ ID NO 25 sets out the sequence of reverse primer SSY1;

SEQ ID NO 26 sets out the sequence of forward primer YJR154w;

SEQ ID NO 27 sets out the sequence of reverse primer YJR154w;

SEQ ID NO 28 sets out the sequence of forward primer CEP3;

SEQ ID NO 29 sets out the sequence of reverse primer CEP3;

SEQ ID NO 30 sets out the sequence of forward primer YPL277c;

SEQ ID NO 31 sets out the sequence of reverse primer YPL277c;

SEQ ID NO 32 sets out the sequence of forward primer GAL80;

SEQ ID NO 33 sets out the sequence of reverse primer GAL80;

SEQ ID NO 34 sets out the sequence of Hi-Res probe SSY1;

SEQ ID NO 35 sets out the sequence of Hi-Res probe YJR154w;

SEQ ID NO 36 sets out the sequence of Hi-Res probe CEP3;

SEQ ID NO 37 sets out the sequence of Hi-Res probe YPL277c;

SEQ ID NO 38 sets out the sequence of Hi-Res probe GAL80;

SEQ ID NO 39 sets out the sequence of forward primer YGL057c;

SEQ ID NO 40 sets out the sequence of reverse primer YGL057c;

SEQ ID NO 41 sets out the sequence of forward primer SDS23;

SEQ ID NO 42 sets out the sequence of reverse primer SDS23;

SEQ ID NO 43 sets out the sequence of forward primer ACT1;

SEQ ID NO 44 sets out the sequence of reverse primer ACT1;

SEQ ID NO 45 sets out the sequence of forward primer araA;

SEQ ID NO 46 sets out the sequence of reverse primer araA;

SEQ ID NO 47 sets out the sequence of forward primer ACT1;

SEQ ID NO 48 sets out the sequence of reverse primer ACT1;

SEQ ID NO 49 sets out the sequence of forward primer PNC1;

SEQ ID NO 50 sets out the sequence of reverse primer PNC1;

SEQ ID NO 51 sets out the sequence of forward primer HSF1;

SEQ ID NO 52 sets out the sequence of reverse primer HSF1;

SEQ ID NO 53 sets out the sequence of forward primer YGR031w;

SEQ ID NO 54 sets out the sequence of reverse primer YGR031w;

SEQ ID NO 55 sets out the sequence of forward primer (matA, matα);

SEQ ID NO 56 sets out the sequence of reverse primer matA;

SEQ ID NO 57 sets out the sequence of reverse primer matα (alpha);

SEQ ID NO 58 sets out the sequence of forward primer GAL80::kanMX;

SEQ ID NO 59 sets out the sequence of reverse primer GAL80::kanMX;

SEQ ID NO 60 sets out the sequence of Forward primer for amplification of the INT1LF;

SEQ ID NO 61 sets out the sequence of Reverse primer for the amplification of INT1LF with a 50 bp flank overlapping Adi21 expression cassette;

SEQ ID NO 62 sets out the sequence of Forward primer for amplification of the Adi21 expression cassette with 50 bp flank INT1LF;

SEQ ID NO 63 sets out the sequence of Reverse primer for the amplification of the Adi21 expression cassette;

SEQ ID NO 64 sets out the sequence of Forward primer for the amplification of the Adi22 expression cassette;

SEQ ID NO 65 sets out the sequence of Reverse primer for the amplification of the Adi22 expression cassette;

SEQ ID NO 66 sets out the sequence of Forward primer for the amplification of the Adi23 expression cassette;

SEQ ID NO 67 sets out the sequence of Reverse primer for the amplification of the Adi23 expression cassette;

SEQ ID NO 68 sets out the sequence of Forward primer for the amplification of the kanMX marker from pUG7 with 50 bp flank overlapping with Adi23;

SEQ ID NO 69 sets out the sequence of Reverse primer for the amplification of the kanMX marker from pUG7 with 50 bp flank overlapping with Adi8;

SEQ ID NO 70 sets out the sequence of Forward primer for the amplification of the Adi8 expression cassette with 25 bp flank overlap with kanMX of pUG7;

SEQ ID NO 71 sets out the sequence of Reverse primer Adi8 expression cassette;

SEQ ID NO 72 sets out the sequence of Forward primer for the amplification of the Adi24 expression cassette;

SEQ ID NO 73 sets out the sequence of Reverse primer for the amplification of the Adi24 expression cassette;

SEQ ID NO 74 sets out the sequence of Forward primer for the amplification of the Adi25 expression cassette;

SEQ ID NO 75 sets out the sequence of Reverse primer for the amplification of the Adi25 expression cassette with 50 bp overlap with SucC;

SEQ ID NO 76 sets out the sequence of Forward primer for the amplification of the SucC with 50 bp overlap with Adi25;

SEQ ID NO 77 sets out the sequence of Reverse primer for the amplification of the SucC expression cassette;

SEQ ID NO 78 sets out the sequence of Forward primer for the amplification of the SucD expression cassette;

SEQ ID NO 79 sets out the sequence of Reverse primer for the amplification of the SucD expression cassette;

SEQ ID NO 80 sets out the sequence of Forward primer for the amplification of the acdh67 expression cassette;

SEQ ID NO 81 sets out the sequence of Reverse primer for the amplification of the acdh67 construct with 50 bp flank overlapping with INTRF;

SEQ ID NO 82 sets out the sequence of Forward primer for the amplification of the INT1LF site on yeast genome;

SEQ ID NO 83 sets out the sequence of Reverse primer for the amplification of the INT1LF site on yeast genome;

SEQ ID NO 84 sets out the sequence of ADI21 PCR fragment;

SEQ ID NO 85 sets out the sequence of ADI22 PCR fragment;

SEQ ID NO 86 sets out the sequence of ADI23 PCR fragment;

SEQ ID NO 87 sets out the sequence of ADI8 PCR fragment;

SEQ ID NO 88 sets out the sequence of ADI24 PCR fragment;

SEQ ID NO 89 sets out the sequence of ADI25 PCR fragment;

SEQ ID NO 90 sets out the sequence of SUCC PCR fragment;

SEQ ID NO 91 sets out the sequence of SUCD PCR fragment;

SEQ ID NO 92 sets out the sequence of ACDH67 PCR fragment;

SEQ ID NO 93 sets out the sequence of KANMX marker fragment;

SEQ ID NO 94 sets out the sequence of INT1LF PCR fragment;

SEQ ID NO 95 sets out the sequence of INT1RF PCR fragment;

SEQ ID NO 96 sets out the sequence of forward primer araABD cassette;

SEQ ID NO 97 sets out the sequence of reverse primer araABD cassette;

SEQ ID NO 98 sets out the sequence of forward primer Ty1::araABD;

SEQ ID NO 99 sets out the sequence of reverse primer Ty1::araABD;

SEQ ID NO 100 sets out the sequence of forward primer Ty1::kanMX;

SEQ ID NO 101 sets out the sequence of reverse primer Ty1::kanMX.

DETAILED DESCRIPTION OF THE INVENTION

Throughout the present specification and the accompanying claims, the words “comprise” and “include” and variations such as “comprises”, “comprising”, “includes” and “including” are to be interpreted inclusively. That is, these words are intended to convey the possible inclusion of other elements or integers not specifically recited, where the context allows.

The articles “a” and “an” are used herein to refer to one or to more than one (i.e. to one or at least one) of the grammatical object of the article. By way of example, “an element” may mean one element or more than one element.

The various embodiments of the invention described herein may be cross-combined. The invention provides a process for the production of cells which are capable of converting arabinose, comprising the steps a) to g). These steps are described here in more detail:

Step a) Introducing into a host strain that cannot convert arabinose, the genes araA, araB and araD, this cell is designated as constructed cell

Step a) will be described below in detail in the description as well as being illustrated by the examples.

Steps b) and c) Subjecting the constructed cell to adaptive evolution until a cell that converts arabinose is obtained, Optionally, subjecting the first arabinose converting cell to adaptive evolution to improve the arabinose conversion; the cell produced in step b) or c) is designated as first arabinose converting cell;

Steps b) and c) will be described below in detail in the description under adaptive evolution as well as being illustrated by the examples.

Step d) Analysing the full genome or part of the genome of the first arabinose converting cell and that of the constructed cell;

This step d) may be executed using common techniques of genome resequencing.

Step e) Identifying single nucleotide polymorphisms (SNP's) in the first arabinose converting cell;

The SNP's are identified by looking at the differences between the genome of the first arabinose converting cell and that of the constructed cell.
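As an illustration of the comparison in step e), the following is a minimal Python sketch, assuming the two sequences have already been aligned to equal length (real resequencing data would first need read mapping and variant calling, which is not shown). The function name `find_snps`, the `Snp` record and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    position: int  # 1-based nucleotide position
    ref: str       # base in the constructed cell
    alt: str       # base in the first arabinose converting cell


def find_snps(constructed: str, evolved: str) -> List[Snp]:
    """Return single-base differences between two equal-length, pre-aligned sequences."""
    if len(constructed) != len(evolved):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Snp(i + 1, a, b)
        for i, (a, b) in enumerate(zip(constructed.upper(), evolved.upper()))
        if a != b
    ]


# Toy sequences, invented for illustration only.
constructed_seq = "ATGGAACCTTGA"
evolved_seq = "ATGTAACCTTGA"
snps = find_snps(constructed_seq, evolved_seq)
print(snps)
```

Each reported difference uses the same "reference base, position, new base" convention as the SNP names used in this specification (e.g. G1363T).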

Step f) Using the information of the SNP's in rational design of a cell capable of converting arabinose;

In step f) the skilled person will know to which SNP's arabinose conversion is attributed, and with common skill be able to design an improved strain based on that information.

In steps e), f) and/or g) the skilled person preferably uses techniques of phenotyping, i.e. the identification of cells with desired traits, in combination with techniques of genotyping, i.e. the identification of candidate genes associated with the chosen traits.

Examples of techniques for phenotyping are growth experiments, in shake flasks or fermentors, in the presence of single sugars or sugar mixtures. Also growth assays on solid agar media can be applied. However, other suitable known methods may be used.

Examples of techniques for genotyping are re-sequencing techniques, such as Solexa and the like, quantitative PCR (Q-PCR) and Southern blotting. However, other suitable known methods may be used.

Step g) Construction of the cell capable of converting arabinose designed in step f). In step g) all common techniques of construction of new strains may be used. In one embodiment, different strains (parents) are combined in order to combine advantageous properties of the parents. For example a crossing technique may be used involving the strain of step b) or c) which is crossed with a strain that does not have all SNP's present in the strains of step b) or c).

For example, a haploid yeast strain, transformed with genes necessary for or enhancing the ability to ferment arabinose (designated all together as ARA) was enhanced by a process called adaptive evolution. During the adaptive evolution process, three mutations have been introduced into the genome, designated mut1, mut2 and mut3. The genotype of such a yeast strain could be written as mut1 mut2 mut3 ARA.

Such a yeast strain may be crossed with another haploid yeast strain that also contains the genes needed for arabinose conversion but is as yet unable to convert arabinose, because it lacks the additional mutations required. However, this strain may have another beneficial property, such as tolerance to inhibitors. This property is designated as ABC. Such a process is illustrated in FIG. 10.
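The possible outcomes of such a cross can be enumerated with a short Python sketch. It rests on simplifying assumptions that are not stated in this specification: the loci mut1, mut2, mut3 and ABC are treated as unlinked, each segregating 2:2 and assorting independently. ARA is present in both parents and therefore does not segregate. The helper name `segregant_genotypes` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Loci at which the two haploid parents differ. ARA is present in both
# parents, so it is left out of the enumeration.
loci = ["mut1", "mut2", "mut3", "ABC"]


def segregant_genotypes(loci):
    """List every haploid genotype as a dict: locus -> True if the allele of interest is present."""
    return [dict(zip(loci, combo)) for combo in product([True, False], repeat=len(loci))]


genotypes = segregant_genotypes(loci)

# Segregants that combine all three mutations with the ABC property.
desired = [g for g in genotypes if all(g.values())]

# Expected fraction of such segregants, assuming (hypothetically) unlinked loci.
expected_fraction = Fraction(len(desired), len(genotypes))
print(len(genotypes), expected_fraction)
```

Under these assumptions only a small fraction of the haploid segregants carries the full combination mut1 mut2 mut3 ABC ARA, which is why the progeny has to be screened by phenotyping and genotyping. Linkage between loci would change the expected fraction.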

In an embodiment, in the above process, the yeast cell capable of converting arabinose has a chromosome that is amplified compared to the host strain, wherein the amplified chromosome has the same number as the chromosome in which the araA, araB and araD genes were introduced in the host strain. In an embodiment the amplified chromosome is chromosome VII. In an embodiment, in the yeast cell parts of chromosome VII, surrounding the centromere, are amplified (as compared to the host strain). In an embodiment, a region on the left arm of chromosome VII was amplified three times. In an embodiment, part of the right arm of chromosome VII was amplified twice, and an adjacent part was amplified three times (see FIG. 12).

The part on the right arm of chromosome VII that was amplified three times contains the arabinose expression cassette, i.e. the genes araA, araB and araD under control of strong constitutive promoters.

The invention further relates to a yeast cell having araA, araB and araD genes wherein chromosome VII has a size of from 1300 to 1600 Kb as determined by electrophoresis, with the exclusion of a yeast cell BIE201. Strain BIE201 has been disclosed in WO2011003893.

BIE201 has all the single nucleotide polymorphisms G1363T in the SSY1 gene, A512T in YJR154w gene, A1186G in CEP3 gene, and A436C in GAL80 gene.
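The nucleotide positions of these SNP's can be related to codon numbers with simple arithmetic. The sketch below is a minimal illustration, assuming positions are counted from the first base of the start codon (position 1) in an uninterrupted open reading frame; the function names are hypothetical.

```python
def codon_number(nucleotide_position: int) -> int:
    """Return the 1-based codon number containing a 1-based nucleotide position."""
    if nucleotide_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nucleotide_position - 1) // 3 + 1


def position_in_codon(nucleotide_position: int) -> int:
    """Return 1, 2 or 3: the place of the nucleotide within its codon."""
    if nucleotide_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nucleotide_position - 1) % 3 + 1


# SNP positions as listed in the text for strain BIE201.
snp_positions = {"SSY1": 1363, "YJR154w": 512, "CEP3": 1186, "GAL80": 436}
codons = {gene: codon_number(pos) for gene, pos in snp_positions.items()}
print(codons)
```

The resulting codon numbers (455, 171, 396 and 146) coincide with the amino acid positions of the substitutions E455stop, D171G, S396G and T146P named in the summary of the invention.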

In an embodiment, in the yeast cell, the copy number of the araA, araB and araD genes is two to ten, in an embodiment two to eight or three to five each. The copy number of the araA, araB and araD genes may be 2, 3, 4, 5, 6, 7, 8, 9, or 10. The copy number may be determined with methods known to the skilled person. Suitable methods are illustrated in the examples, and results are e.g. shown in FIG. 12.
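One way to read a relative copy number from a coverage plot such as FIG. 12 is to compare the read depth of a region of interest with that of a region assumed to be present once. The Python sketch below illustrates this idea only; it is not the method used in the examples, the function name `estimate_copy_number` is hypothetical, and the depth values are invented toy numbers.

```python
from statistics import median


def estimate_copy_number(region_depths, baseline_depths):
    """Ratio of the median read depth in a region to the median depth of a single-copy baseline."""
    if not region_depths or not baseline_depths:
        raise ValueError("both depth lists must be non-empty")
    baseline = median(baseline_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return median(region_depths) / baseline


# Invented read depths, for illustration only.
single_copy_region = [50, 52, 48, 51, 49]   # assumed single-copy part of the chromosome
amplified_region = [148, 152, 150, 149, 151]  # hypothetical amplified part

ratio = estimate_copy_number(amplified_region, single_copy_region)
print(ratio, round(ratio))
```

The median is used here rather than the mean so that a few positions with extreme depth have less influence on the estimate; in practice the ratio would be rounded to the nearest whole number of copies and confirmed by an independent technique such as Q-PCR.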

In an embodiment, the yeast cell has one or more, but not all, of the single nucleotide polymorphisms chosen from the group consisting of mutations G1363T in the SSY1 gene, A512T in YJR154w gene, A1186G in CEP3 gene, and A436C in GAL80 gene. In an embodiment, the yeast cell has a single polymorphism A436C in GAL80 gene. In an embodiment, the yeast cell has a single polymorphism A1186G in CEP3 gene.

Sexual Conjugation

Mating in yeast, which is mediated by diffusible molecules (pheromones), can be readily demonstrated (Manney, Duntze & Betz 1981). When cells of opposite mating type are mixed on the surface of agar growth medium in a petri dish, changes become apparent within two to three hours. As each type of cell secretes its pheromone into the medium, it responds to the one produced by the opposite type (MacKay & Manney 1974). They each respond by differentiating into a specialized functional form, a gamete. The cells stop dividing and change their shape. They elongate and become pear-shaped. These distinctive cells have been termed “shmoos”. Cells of opposite mating types that are in contact or close proximity join at the surface and fuse together forming a characteristic “peanut” shape with a central constriction, i.e. two shmoos fused at their small ends. The two haploid nuclei within each joined pair fuse into a diploid nucleus, forming a true zygote. The diploid promptly buds at the constriction, forming a characteristic “clover leaf” figure. One can easily observe all of these stages under the microscope.

The mating pheromones that are secreted by haploid cells are small peptide molecules that diffuse through agar (Betz, Manney & Duntze 1981). Consequently, their existence and their effects on cells of the opposite mating types are easy to demonstrate. If cells of the mating type α (alpha) are grown overnight on agar medium, a high concentration of the pheromone accumulates in the agar surrounding the growth. If cells of the mating type a (matA) are placed on this agar, they begin to undergo the “shmoo” transformation within a couple of hours. The same effect can be demonstrated in a liquid medium in which mating type α (alpha) cells have been grown.

Meiosis

Shmoos are the gametes in yeast. They differentiate from normal vegetative haploid cells only when a cell of the opposite mating type is present. In a like manner, any diploid cell can go through meiosis forming haploids which have the potential to become gametes (Esposito & Klapholz 1981; Fowell 1969). Meiosis is part of the process of sporulation which is initiated when diploid cells are transferred to a nutritionally unbalanced medium, but the changes become apparent under the microscope only after three to five days when the asci become quite distinctive. Theoretically, all asci should contain four spores but in practice, some contain only two or three. The ascus has a characteristic shape. Treating the sporulation mixture with a readily available crude preparation of digestive enzymes (e.g. Zymolyase, Glusulase) will remove the wall of the ascus, liberating the spores. When the spores, either within the ascus or after being liberated, are returned to a nutritionally adequate environment, they germinate and undergo vegetative growth in a stable haploid phase. Haploid strains occur in two mating types, called a and α (alpha). Within each ascus, two spores are normally mating-type a (matA) and the other two are mating-type α (matα). When a cell of one mating type encounters one of the other mating type, they initiate a series of events that leads to conjugation (see Sexual Conjugation). The result is a diploid cell, which grows by mitotic cell division in a stable diploid phase. If one merely transfers a sporulated cell culture to growth medium the result is a mixed population of haploid strains and new diploid strains which are analogous to the progeny from a cross between diploid higher organisms.

Normally, yeast geneticists isolate the spores, either randomly or by micromanipulation, to prevent the haploid strains from mating and forming the next generation of diploid strains. This degree of control and the ability to observe the genetic traits in the haploid phase makes genetic analysis in yeast powerful and efficient.

Adaptation

Adaptation is the evolutionary process whereby a population becomes better suited (adapted) to its habitat or habitats. This process takes place over several to many generations, and is one of the basic phenomena of biology.

The term adaptation may also refer to a feature which is especially important for an organism's survival. Such adaptations are produced in a variable population by the better suited forms reproducing more successfully, by natural selection.

Changes in environmental conditions alter the outcome of natural selection, affecting the selective benefits of subsequent adaptations that improve an organism's fitness under the new conditions. In the case of an extreme environmental change, the appearance and fixation of beneficial adaptations can be essential for survival. A large number of different factors, such as e.g. nutrient availability, temperature, the availability of oxygen, etcetera, can drive adaptive evolution.

Fitness

There is a clear relationship between adaptedness (the degree to which an organism is able to live and reproduce in a given set of habitats) and fitness. Fitness is an estimate and a predictor of the rate of natural selection. By the application of natural selection, the relative frequencies of alternative phenotypes will vary in time, if they are heritable.

Genetic Changes

When natural selection acts on the genetic variability of the population, genetic changes are the underlying mechanism. By this means, the population adapts genetically to its circumstances. Genetic changes may result in visible structures, or may adjust the physiological activity of the organism in a way that suits the changed habitat.

It may occur that habitats frequently change. Therefore, it follows that the process of adaptation is never finally complete. In time, it may happen that the environment changes gradually, and the species comes to fit its surroundings better and better. On the other hand, it may happen that changes in the environment occur relatively rapidly, and then the species becomes less and less well adapted. Adaptation is a genetic process, which goes on all the time to some extent, also when the population does not change the habitat or environment.

Single nucleotides in a DNA sequence may be changed (substitution), removed (deletion) or added (insertion). Insertion or deletion SNPs (InDels) may shift the translational frame.

Single nucleotide polymorphisms may fall within coding sequences of genes (Open Reading Frames or ORFs), non-coding regions of genes (like promoter sequences, terminator sequences and the like), or in the intergenic regions between genes. SNPs within a coding sequence will not necessarily change the amino acid sequence of the corresponding protein that is produced after transcription and translation, due to degeneracy of the genetic code. A SNP in which both forms lead to the same polypeptide sequence is termed synonymous (a silent mutation). If a different polypeptide sequence is produced, the SNP is nonsynonymous. A nonsynonymous change may either be missense or nonsense. A missense change results in a different amino acid in the corresponding polypeptide, while a nonsense change results in a premature stop codon, sometimes leading to the formation of a truncated protein.
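The distinction between synonymous, missense and nonsense changes can be illustrated with a minimal Python sketch. It assumes the standard genetic code and compares a reference codon with its mutated form; the function name classify_snp and the example codons are chosen for illustration only and are not taken from the invention.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(ref_codon, alt_codon):
    """Classify a single-codon change as synonymous, missense or nonsense.

    Illustrative helper: a change that removes a stop codon is reported
    as "missense" here, because the text above distinguishes only three classes.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid: silent mutation
    if alt_aa == "*":
        return "nonsense"     # premature stop codon
    return "missense"         # different amino acid


if __name__ == "__main__":
    # Hypothetical example codons
    for ref, alt in [("GAA", "GAG"), ("GAT", "GGT"), ("GAA", "TAA")]:
        print(ref, "->", alt, classify_snp(ref, alt))
```

The sketch looks only at the affected codon; InDels that shift the translational frame would need the full coding sequence to evaluate.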

SNPs that are not in protein-coding regions may still have consequences for gene expression, for instance by a changed transcription factor binding or stability of the corresponding mRNA.

The changes that may occur in the DNA are not necessarily limited to the change (substitution, deletion or insertion) of a single nucleotide, but may also comprise a change of two or more nucleotides (Small Nucleotide Variations).

In addition, chromosomal translocations may occur. A chromosome translocation is a chromosome abnormality caused by rearrangement of parts between nonhomologous chromosomes.

In particular, according to the invention, SNPs are created in the following open reading frames: SSY1, CEP3 and GAL80.

SSY1 is herein a component of the SPS plasma membrane amino acid sensor system (Ssy1p-Ptr3p-Ssy5p), which senses external amino acid concentration and transmits intracellular signals that result in regulation of expression of amino acid permease genes.

CEP3 is herein an essential kinetochore protein, component of the CBF3 complex that binds the CDEIII region of the centromere; contains an N-terminal Zn2Cys6 type zinc finger domain, a C-terminal acidic domain, and a putative coiled coil dimerization domain. GAL80 is herein a transcriptional regulator involved in the repression of GAL genes in the absence of galactose. Typically it inhibits transcriptional activation by Gal4p and inhibition is relieved by Gal3p or Gal1p binding.

According to the invention, SNP's in the genes SSY1, CEP3 and GAL80 have been shown to be important for the cell to be able to ferment a mixed sugar composition. BLAST searches were conducted for the SNP's found in these genes.

An overview of the SNPs that were identified is given in table 1:

TABLE 1 Overview of SNP's of the invention

Gene       Nucleotide mutation     Amino acid mutation
           position in ORF*        position in protein
SSY1       G1363T                  E455stop
YJR154w    A512G                   D171G
CEP3       A1186G                  S396G
GAL80      A436C                   T146P

*the A of the start codon ATG is the first nucleotide position
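The relation between the nucleotide positions and the amino acid positions in table 1 can be checked with a short calculation. The following Python sketch assumes only the numbering convention stated under the table (the A of ATG is nucleotide 1); the function name codon_position is hypothetical.

```python
def codon_position(nt_pos):
    """Map a 1-based ORF nucleotide position to (codon number, position within codon).

    Both returned values are 1-based; nucleotide 1 is the A of the ATG start codon.
    """
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


# Nucleotide position and reported amino acid position, as listed in table 1
TABLE_1 = {
    "SSY1": (1363, 455),
    "YJR154w": (512, 171),
    "CEP3": (1186, 396),
    "GAL80": (436, 146),
}

if __name__ == "__main__":
    for gene, (nt_pos, aa_pos) in TABLE_1.items():
        codon, offset = codon_position(nt_pos)
        print(f"{gene}: nt {nt_pos} -> codon {codon}, base {offset} of codon, "
              f"matches table: {codon == aa_pos}")
```

For each of the four entries the computed codon number agrees with the amino acid position given in the table.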

A BLAST search with the genes containing the SNPs resulted in the following data:

Ssy1p (Member of the AA Trans Superfamily)

Component of the SPS plasma membrane amino acid sensor system (Ssy1p-Ptr3p-Ssy5p), which senses external amino acid concentration and transmits intracellular signals that result in regulation of expression of amino acid permease genes [Saccharomyces cerevisiae]

Ssy1p                   S. cerevisiae JAY291        852 aa    99% identity
Ssy1p                   S. cerevisiae YJM789        852 aa    99% identity
YDR160w-like protein    S. cerevisiae AWRI1631      791 aa    99% identity
ZYRO0F13838p            Z. rouxii CBS 732           836 aa    56% identity
hypothetical protein    C. glabrata CBS 138         853 aa    53% identity
KLTH0G11726p            Lachancea thermotolerans    824 aa    46% identity

The shorter protein found in S. cerevisiae BIE201 is a unique feature.

YJR154w (Member of the PhyH Superfamily)

Putative protein of unknown function; green fluorescent protein (GFP)-fusion protein localizes to the cytoplasm [Saccharomyces cerevisiae]

YJR154w                         S. cerevisiae JAY291        346 aa    100% identity
conserved protein               S. cerevisiae YJM789        346 aa    99% identity
putative pimeloyl-CoA synth.    S. cerevisiae               346 aa    71% identity
YJR154Wp-like protein           S. cerevisiae AWRI1631      227 aa    99% identity
KLTH0E09900p                    Lachancea thermotolerans    340 aa    48% identity

In all these proteins, the D-residue at position 171 (or equivalent position based on the BLAST results) is conserved.

CEP3 (GAL4-Like Zn2Cys6 Binuclear Cluster DNA-Binding Domain; Found in Transcription Regulators like GAL4)

Centromere DNA-binding protein complex CBF3 subunit B

CEP3                       S. cerevisiae JAY291       608 aa    100% identity
ZYRO0A07260p               Z. rouxii CBS 732          596 aa    46% identity
unnamed protein product    Candida glabrata CBS138    611 aa    44% identity
AFL200Wp                   A. gossypii ATCC 10895     596 aa    41% identity

In all these proteins, the S-residue at position 396 (or equivalent position based on the BLAST results) is conserved.

GAL80 (Member of the NADB Rossmann Superfamily)

Galactose/lactose metabolism regulatory protein GAL80

transcriptional regulator    S. cerevisiae YJM789             435 aa    100% identity
GAL80p                       S. kudriavzevii                  435 aa    89% identity
protein Kpol_1059p5          V. polyspora DSM 70294           429 aa    73% identity
ZYRO0G04664p                 Z. rouxii CBS 732                437 aa    67% identity
KLTH0C02838p                 L. thermotolerans                424 aa    64% identity
KIGAL80 protein              Kluyveromyces lactis             457 aa    58% identity
NECHADRAFT_86878             N. haematococca mpVI 77-13-4     367 aa    30% identity

In all these proteins, the T-residue at position 146 (or equivalent position based on the BLAST results) is conserved.

The Sugar Composition

The sugar composition according to the invention comprises glucose, arabinose and xylose. Any sugar composition that satisfies those criteria may be used in the invention. Optional sugars in the sugar composition are galactose and rhamnose. In a preferred embodiment, the sugar composition is a hydrolysate of one or more lignocellulosic materials. Lignocellulose herein includes hemicellulose and hemicellulose parts of biomass. Lignocellulose also includes lignocellulosic fractions of biomass. Suitable lignocellulosic materials may be found in the following list: orchard prunings, chaparral, mill waste, urban wood waste, municipal waste, logging waste, forest thinnings, short-rotation woody crops, industrial waste, wheat straw, oat straw, rice straw, barley straw, rye straw, flax straw, soy hulls, rice hulls, rice straw, corn gluten feed, oat hulls, sugar cane, corn stover, corn stalks, corn cobs, corn husks, switch grass, miscanthus, sweet sorghum, canola stems, soybean stems, prairie grass, gamagrass, foxtail; sugar beet pulp, citrus fruit pulp, seed hulls, cellulosic animal wastes, lawn clippings, cotton, seaweed, trees, softwood, hardwood, poplar, pine, shrubs, grasses, wheat, wheat straw, sugar cane bagasse, corn, corn husks, corn cobs, corn kernel, fiber from kernels, products and by-products from wet or dry milling of grains, municipal solid waste, waste paper, yard waste, herbaceous material, agricultural residues, forestry residues, municipal solid waste, waste paper, pulp, paper mill residues, branches, bushes, canes, corn, corn husks, an energy crop, forest, a fruit, a flower, a grain, a grass, a herbaceous crop, a leaf, bark, a needle, a log, a root, a sapling, a shrub, switch grass, a tree, a vegetable, fruit peel, a vine, sugar beet pulp, wheat middlings, oat hulls, hard or soft wood, organic waste material generated from an agricultural process, forestry wood waste, or a combination of any two or more thereof.

An overview of some suitable sugar compositions derived from lignocellulose and the sugar composition of their hydrolysates is given in table 1. The listed lignocelluloses include: corn cobs, corn fiber, rice hulls, melon shells, sugar beet pulp, wheat straw, sugar cane bagasse, wood, grass and olive pressings.

TABLE 1 Overview of sugar compositions from lignocellulosic materials.

Lignocellulosic material    Gal   Xyl   Ara   Man   Glu   Rham   Sum   % Gal.   Lit.
Corn cob a                   10   286    36     -   227     11   570      1.7   (1)
Corn cob b                  131   228   160     -   144      -   663     19.8   (1)
Rice hulls a                  9   122    24    18   234     10   417      2.2   (1)
Rice hulls b                  8   120    28     -   209     12   378      2.2   (1)
Melon Shells                  6   120    11     -   208     16   361      1.7   (1)
Sugar beet pulp              51    17   209    11   211     24   523      9.8   (2)
Wheat straw Idaho            15   249    36     -   396      -   696      2.2   (3)
Corn fiber                   36   176   113     -   372      -   697      5.2   (4)
Cane Bagasse                 14   180    24     5   391      -   614      2.3   (5)
Corn stover                  19   209    29     -   370      -   626        -   (6)
Athel (wood)                  5   118     7     3   493      -   625      0.7   (7)
Eucalyptus (wood)            22   105     8     3   445      -   583      3.8   (7)
CWR (grass)                   8   165    33     -   340      -   546      1.4   (7)
JTW (grass)                   7   169    28     -   311      -   515      1.3   (7)
MSW                           4    24     5    20   440      -   493      0.9   (7)
Reed Canary Grass Veg        16   117    30     6   209      1   379      4.2   (8)
Reed Canary Grass Seed       13   163    28     6   265      1   476      2.7   (9)
Olive pressing residue       15   111    24     8   329      -   487      3.1   (9)

Gal = galactose, Xyl = xylose, Ara = arabinose, Man = mannose, Glu = glucose, Rham = rhamnose. The percentage galactose (% Gal) and literature source is given. A dash indicates that no value is given.

It is clear from table 1 that in these lignocelluloses a high amount of sugar is present in the form of glucose, xylose, arabinose and galactose. The conversion of glucose, xylose, arabinose and galactose to fermentation product is thus of great economic importance. Rhamnose is also present in some lignocellulose materials, be it in lower amounts than the previously mentioned sugars. Advantageously, therefore, rhamnose is also converted by the mixed sugar cell.
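As a worked illustration of the "% Gal" column, the following Python sketch assumes that this value is the galactose amount expressed as a percentage of the "Sum" column. That reading is an assumption; it is consistent with the three rows used below, which are copied from the table above.

```python
def percent_galactose(gal, total):
    """Return galactose as a percentage of the total sugar sum, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total sugar sum must be positive")
    return round(100.0 * gal / total, 1)


# (material, Gal, Sum, % Gal as listed in the table)
ROWS = [
    ("Corn cob b", 131, 663, 19.8),
    ("Rice hulls a", 9, 417, 2.2),
    ("Sugar beet pulp", 51, 523, 9.8),
]

if __name__ == "__main__":
    for name, gal, total, listed in ROWS:
        print(f"{name}: computed {percent_galactose(gal, total)} %, listed {listed} %")
```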

Pretreatment and Enzymatic Hydrolysis

Pretreatment and enzymatic hydrolysis may be needed to release sugars that may be fermented according to the invention from the lignocellulosic (including hemicellulosic) material. These steps may be executed with conventional methods.

The Mixed Sugar Cell

The mixed sugar cell comprises the genes araA, araB and araD integrated into the mixed sugar cell genome as defined hereafter. It is able to ferment glucose, arabinose, xylose, galactose and mannose. In one embodiment of the invention the mixed sugar cell is able to ferment one or more additional sugars, preferably C5 and/or C6 sugars. In an embodiment of the invention the mixed sugar cell comprises one or more of: a xylA-gene and/or XKS1-gene, to allow the mixed sugar cell to ferment xylose; deletion of the aldose reductase (GRE3) gene; overexpression of PPP-genes TAL1, TKL1, RPE1 and RKI1 to allow the increase of the flux through the pentose phosphate pathway in the cell.

Construction of the Mixed Sugar Strain

The genes may be introduced in the mixed sugar cell by introduction into a host cell:

a) a cluster consisting of PPP-genes TAL1, TKL1, RPE1 and RKI1, under control of strong promoters;
b) a cluster consisting of a xylA-gene and a XKS1-gene both under control of constitutive promoters,
c) a cluster consisting of the genes araA, araB and araD and/or a cluster of xylA-gene and/or the XKS1-gene; and
d) deletion of an aldose reductase gene
and adaptive evolution to produce the mixed sugar cell. The above cell may be constructed using recombinant expression techniques.

Recombinant Expression

The cell of the invention is a recombinant cell. That is to say, a cell of the invention comprises, or is transformed with or is genetically modified with a nucleotide sequence that does not naturally occur in the cell in question.

Techniques for the recombinant expression of enzymes in a cell, as well as for the additional genetic modifications of a cell of the invention are well known to those skilled in the art. Typically such techniques involve transformation of a cell with a nucleic acid construct comprising the relevant sequence. Such methods are, for example, known from standard handbooks, such as Sambrook and Russell (2001) “Molecular Cloning: A Laboratory Manual” (3rd edition), Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, or F. Ausubel et al., eds., “Current protocols in molecular biology”, Greene Publishing and Wiley Interscience, New York (1987). Methods for transformation and genetic modification of fungal host cells are known from e.g. EP-A-0635574, WO 98/46772, WO 99/60102, WO 00/37671, WO 90/14423, EP-A-0481008 and U.S. Pat. No. 6,265,186.

Typically, the nucleic acid construct may be a plasmid, for instance a low copy plasmid or a high copy plasmid. The cell according to the present invention may comprise a single or multiple copies of the nucleotide sequence encoding an enzyme, for instance by multiple copies of a nucleotide construct or by use of a construct which has multiple copies of the enzyme sequence.

The nucleic acid construct may be maintained episomally and thus comprise a sequence for autonomous replication, such as an autonomously replicating sequence (ARS). A suitable episomal nucleic acid construct may e.g. be based on the yeast 2μ or pKD1 plasmids (Fleer et al., 1991, Biotechnology 9: 968-975), or the AMA plasmids (Fierro et al., 1995, Curr Genet. 29:482-489). Alternatively, each nucleic acid construct may be integrated in one or more copies into the genome of the cell. Integration into the cell's genome may occur at random by non-homologous recombination but preferably, the nucleic acid construct may be integrated into the cell's genome by homologous recombination as is well known in the art (see e.g. WO 90/14423, EP-A-0481008, EP-A-0635574 and U.S. Pat. No. 6,265,186).

Most episomal or 2μ plasmids are relatively unstable, being lost in approximately 10⁻² or more cells after each generation. Even under conditions of selective growth, only 60% to 95% of the cells retain the episomal plasmid. The copy number of most episomal plasmids ranges from 10 to 40 per cell of cir⁺ hosts. However, the plasmids are not equally distributed among the cells, and there is a high variance in the copy number per cell in populations. Strains transformed with integrative plasmids are extremely stable, even in the absence of selective pressure. However, plasmid loss can occur at frequencies of approximately 10⁻³ to 10⁻⁴ by homologous recombination between tandemly repeated DNA, leading to looping out of the vector sequence. Preferably, the vector design in the case of stable integration is such that, upon loss of the selection marker genes (which also occurs by intramolecular, homologous recombination), looping out of the integrated construct is no longer possible. Preferably the genes are thus stably integrated. Stable integration is herein defined as integration into the genome, wherein looping out of the integrated construct is no longer possible. Preferably selection markers are absent. Typically, the enzyme encoding sequence will be operably linked to one or more nucleic acid sequences, capable of providing for or aiding the transcription and/or translation of the enzyme sequence.

The term “operably linked” refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. For instance, a promoter or enhancer is operably linked to a coding sequence if the said promoter or enhancer affects the transcription of the coding sequence.

As used herein, the term “promoter” refers to a nucleic acid fragment that functions to control the transcription of one or more genes, located upstream with respect to the direction of transcription of the transcription initiation site of the gene, and is structurally identified by the presence of a binding site for DNA-dependent RNA polymerase, transcription initiation sites and any other DNA sequences known to one skilled in the art. A “constitutive” promoter is a promoter that is active under most environmental and developmental conditions. An “inducible” promoter is a promoter that is active under environmental or developmental regulation.
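To give a feel for what a loss frequency per generation means over many generations, the following Python sketch uses a deliberately simple model: a constant loss probability per cell per generation, no selective pressure, and equal growth of plasmid-bearing and plasmid-free cells. These are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def fraction_retaining(loss_per_generation, generations):
    """Fraction of cells still carrying the construct after a number of generations.

    Assumes a constant, independent loss probability per generation and
    no growth advantage for either cell type (a simplification).
    """
    if not 0.0 <= loss_per_generation <= 1.0:
        raise ValueError("loss_per_generation must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - loss_per_generation) ** generations


if __name__ == "__main__":
    # Loss frequencies of the order mentioned in the text above
    for label, loss in [("episomal, 1e-2", 1e-2),
                        ("integrated, 1e-3", 1e-3),
                        ("integrated, 1e-4", 1e-4)]:
        print(f"{label}: {fraction_retaining(loss, 100):.3f} retained after 100 generations")
```

Under these assumptions a loss frequency of 10⁻² per generation leaves roughly a third of the population with the plasmid after 100 generations, whereas a frequency of 10⁻⁴ leaves about 99%.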

The promoter that could be used to achieve the expression of a nucleotide sequence coding for an enzyme according to the present invention, may be not native to the nucleotide sequence coding for the enzyme to be expressed, i.e. a promoter that is heterologous to the nucleotide sequence (coding sequence) to which it is operably linked. The promoter may, however, be homologous, i.e. endogenous, to the host cell.

Promoters are widely available and known to the skilled person. Suitable examples of such promoters include e.g. promoters from glycolytic genes, such as the phosphofructokinase (PFK), triose phosphate isomerase (TPI), glyceraldehyde-3-phosphate dehydrogenase (GPD, TDH3 or GAPDH), pyruvate kinase (PYK), phosphoglycerate kinase (PGK) promoters from yeasts or filamentous fungi; more details about such promoters from yeast may be found in WO 93/03159. Other useful promoters are ribosomal protein encoding gene promoters, the lactase gene promoter (LAC4), alcohol dehydrogenase promoters (ADH1, ADH4, and the like), and the enolase promoter (ENO). Other promoters, both constitutive and inducible, and enhancers or upstream activating sequences will be known to those of skill in the art. The promoters used in the host cells of the invention may be modified, if desired, to affect their control characteristics. Suitable promoters in this context include both constitutive and inducible natural promoters as well as engineered promoters, which are well known to the person skilled in the art. Suitable promoters in eukaryotic host cells may be GAL7, GAL10, or GAL1, CYC1, HIS3, ADH1, PGL, PHO5, GAPDH, ADC1, TRP1, URA3, LEU2, ENO1, TPI1, and AOX1. Other suitable promoters include PDC1, GPD1, PGK1, TEF1, and TDH3.

In a cell of the invention, the 3′-end of the nucleotide sequence encoding the enzyme preferably is operably linked to a transcription terminator sequence. Preferably the terminator sequence is operable in a host cell of choice, such as e.g. the yeast species of choice. In any case the choice of the terminator is not critical; it may e.g. be from any yeast gene, although terminators may sometimes work if from a non-yeast, eukaryotic, gene. Usually a nucleotide sequence encoding the enzyme comprises a terminator. Preferably, such terminators are combined with mutations that prevent nonsense mediated mRNA decay in the host cell of the invention (see for example: Shirley et al., 2002, Genetics 161:1465-1482).

The transcription termination sequence further preferably comprises a polyadenylation signal.

Optionally, a selectable marker may be present in a nucleic acid construct suitable for use in the invention. As used herein, the term “marker” refers to a gene encoding a trait or a phenotype which permits the selection of, or the screening for, a host cell containing the marker. The marker gene may be an antibiotic resistance gene whereby the appropriate antibiotic can be used to select for transformed cells from among cells that are not transformed. Examples of suitable antibiotic resistance markers include e.g. dihydrofolate reductase, hygromycin-B-phosphotransferase, 3′-O-phosphotransferase II (kanamycin, neomycin and G418 resistance). Antibiotic resistance markers may be most convenient for the transformation of polyploid host cells. Non-antibiotic resistance markers may also be used, such as auxotrophic markers (URA3, TRP1, LEU2) or the S. pombe TPI gene (described by Russell P R, 1985, Gene 40: 125-130). In a preferred embodiment the host cells transformed with the nucleic acid constructs are marker gene free. Methods for constructing recombinant marker gene free microbial host cells are disclosed in EP-A-0 635 574 and are based on the use of bidirectional markers such as the A. nidulans amdS (acetamidase) gene or the yeast URA3 and LYS2 genes. Alternatively, a screenable marker such as Green Fluorescent Protein, lacL, luciferase, chloramphenicol acetyltransferase, beta-glucuronidase may be incorporated into the nucleic acid constructs of the invention, allowing screening for transformed cells.

Optional further elements that may be present in the nucleic acid constructs suitable for use in the invention include, but are not limited to, one or more leader sequences, enhancers, integration factors, and/or reporter genes, intron sequences, centromeres, telomeres and/or matrix attachment (MAR) sequences. The nucleic acid constructs of the invention may further comprise a sequence for autonomous replication, such as an ARS sequence.

The recombination process may thus be executed with known recombination techniques. Various means are known to those skilled in the art for expression and overexpression of enzymes in a cell of the invention. In particular, an enzyme may be overexpressed by increasing the copy number of the gene coding for the enzyme in the host cell, e.g. by integrating additional copies of the gene in the host cell's genome, by expressing the gene from an episomal multicopy expression vector or by introducing an episomal expression vector that comprises multiple copies of the gene.

Alternatively, overexpression of enzymes in the host cells of the invention may be achieved by using a promoter that is not native to the sequence coding for the enzyme to be overexpressed, i.e. a promoter that is heterologous to the coding sequence to which it is operably linked. Although the promoter preferably is heterologous to the coding sequence to which it is operably linked, it is also preferred that the promoter is homologous, i.e. endogenous to the host cell. Preferably the heterologous promoter is capable of producing a higher steady state level of the transcript comprising the coding sequence (or is capable of producing more transcript molecules, i.e. mRNA molecules, per unit of time) than is the promoter that is native to the coding sequence. Suitable promoters in this context include both constitutive and inducible natural promoters as well as engineered promoters.

The coding sequence used for overexpression of the enzymes mentioned above may preferably be homologous to the host cell of the invention. However, coding sequences that are heterologous to the host cell of the invention may be used.

Overexpression of an enzyme, when referring to the production of the enzyme in a genetically modified cell, means that the enzyme is produced at a higher level of specific enzymatic activity as compared to the unmodified host cell under identical conditions. Usually this means that the enzymatically active protein (or proteins in case of multi-subunit enzymes) is produced in greater amounts, or rather at a higher steady state level as compared to the unmodified host cell under identical conditions. Similarly this usually means that the mRNA coding for the enzymatically active protein is produced in greater amounts, or again rather at a higher steady state level as compared to the unmodified host cell under identical conditions. Preferably in a host cell of the invention, an enzyme to be overexpressed is overexpressed by at least a factor of about 1.1, about 1.2, about 1.5, about 2, about 5, about 10 or about 20 as compared to a strain which is genetically identical except for the genetic modification causing the overexpression. It is to be understood that these levels of overexpression may apply to the steady state level of the enzyme's activity, the steady state level of the enzyme's protein as well as to the steady state level of the transcript coding for the enzyme.

The Adaptive Evolution

The mixed sugar cells are in their preparation subjected to adaptive evolution. A cell of the invention may be adapted to sugar utilisation by selection of mutants, either spontaneous or induced (e.g. by radiation or chemicals), for growth on the desired sugar, preferably as sole carbon source, and more preferably under anaerobic conditions. Selection of mutants may be performed by techniques including serial transfer of cultures as e.g. described by Kuyper et al. (2004, FEMS Yeast Res. 4: 655-664) or by cultivation under selective pressure in a chemostat culture. E.g. in a preferred host cell of the invention at least one of the genetic modifications described above, including modifications obtained by selection of mutants, confers to the host cell the ability to grow on xylose as carbon source, preferably as sole carbon source, and preferably under anaerobic conditions. Preferably the cell produces essentially no xylitol, e.g. the xylitol produced is below the detection limit or e.g. less than about 5, about 2, about 1, about 0.5, or about 0.3% of the carbon consumed on a molar basis.

Adaptive evolution is also described e.g. in Wisselink H. W. et al., Applied and Environmental Microbiology, August 2007, p. 4881-4891.

In one embodiment of adaptive evolution a regimen consisting of repeated batch cultivation with repeated cycles of consecutive growth in different media is applied, e.g. three media with different compositions (glucose, xylose, and arabinose; xylose and arabinose). See Wisselink et al. (2009) Applied and Environmental Microbiology, February 2009, p. 907-914.
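Why serial transfer enriches faster-growing mutants can be illustrated with a toy calculation. The Python sketch below is not the procedure of the cited references; it assumes two subpopulations growing exponentially at fixed specific growth rates during each batch and a dilution step that does not change their ratio. All names and numerical values are hypothetical.

```python
import math


def mutant_fraction_over_transfers(initial_fraction, mu_mutant, mu_parent,
                                   batch_hours, transfers):
    """Return the mutant fraction at the start and after each batch cycle.

    Toy model: both subpopulations grow exponentially for batch_hours per cycle
    (rates in 1/h); the transfer (dilution) leaves their ratio unchanged.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    # Work with the mutant:parent odds to avoid very large cell numbers.
    odds = initial_fraction / (1.0 - initial_fraction)
    gain_per_batch = math.exp((mu_mutant - mu_parent) * batch_hours)
    fractions = [initial_fraction]
    for _ in range(transfers):
        odds *= gain_per_batch
        fractions.append(odds / (1.0 + odds))
    return fractions


if __name__ == "__main__":
    # Hypothetical values: one mutant per million cells, growth rates in 1/h
    history = mutant_fraction_over_transfers(1e-6, 0.10, 0.05, 24.0, 20)
    for cycle in (0, 5, 10, 15, 20):
        print(f"after {cycle} transfers: mutant fraction {history[cycle]:.6f}")
```

In this simplified picture even a rare mutant with a modest growth advantage dominates the culture after a limited number of transfers, which is the effect that serial transfer and chemostat selection exploit.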

The Host Cell

The host cell may be any host cell suitable for production of a useful product. A cell of the invention may be any suitable cell, such as a prokaryotic cell, such as a bacterium, or a eukaryotic cell. Typically, the cell will be a eukaryotic cell, for example a yeast or a filamentous fungus.

Yeasts are herein defined as eukaryotic microorganisms and include all species of the subdivision Eumycotina (Alexopoulos, C. J., 1962, In: Introductory Mycology, John Wiley & Sons, Inc., New York) that predominantly grow in unicellular form.

Yeasts may either grow by budding of a unicellular thallus or may grow by fission of the organism. A preferred yeast as a cell of the invention may belong to the genera Saccharomyces, Kluyveromyces, Candida, Pichia, Schizosaccharomyces, Hansenula, Kloeckera, Schwanniomyces or Yarrowia. Preferably the yeast is one capable of anaerobic fermentation, more preferably one capable of anaerobic alcoholic fermentation.

Filamentous fungi are herein defined as eukaryotic microorganisms that include all filamentous forms of the subdivision Eumycotina. These fungi are characterized by a vegetative mycelium composed of chitin, cellulose, and other complex polysaccharides. The filamentous fungi suitable for use as a cell of the present invention are morphologically, physiologically, and genetically distinct from yeasts. Filamentous fungal cells may be advantageously used since most fungi do not require sterile conditions for propagation and are insensitive to bacteriophage infections. Vegetative growth by filamentous fungi is by hyphal elongation and carbon catabolism of most filamentous fungi is obligately aerobic. Preferred filamentous fungi as a host cell of the invention may belong to the genus Aspergillus, Trichoderma, Humicola, Acremonium, Fusarium or Penicillium. More preferably, the filamentous fungal cell may be an Aspergillus niger, Aspergillus oryzae, a Penicillium chrysogenum, or Rhizopus oryzae cell.

In one embodiment the host cell may be yeast.

Preferably the host is an industrial host, more preferably an industrial yeast. An industrial host and industrial yeast cell may be defined as follows. The living environments of yeast cells in industrial processes are significantly different from those in the laboratory. Industrial yeast cells must be able to perform well under multiple environmental conditions which may vary during the process. Such variations include change in nutrient sources, pH, ethanol concentration, temperature, oxygen concentration, etc., which together have potential impact on the cellular growth and ethanol production of Saccharomyces cerevisiae. Under adverse industrial conditions, the environmentally tolerant strains should allow robust growth and production. Industrial yeast strains are generally more robust towards these changes in environmental conditions which may occur in the applications in which they are used, such as in the baking industry, brewing industry, wine making and the ethanol industry. Examples of industrial yeast (S. cerevisiae) are Ethanol Red® (Fermentis), Fermiol® (DSM) and Thermosacc® (Lallemand).

In an embodiment the host is inhibitor tolerant. Inhibitor tolerant host cells may be selected by screening strains for growth on inhibitor-containing materials, such as illustrated in Kadar et al., Appl. Biochem. Biotechnol. (2007), Vol. 136-140, 847-858, wherein an inhibitor tolerant S. cerevisiae strain ATCC 26602 was selected.

Preferably the host cell is industrial and inhibitor tolerant.

araA, araB and araD Genes

A cell of the invention is capable of using arabinose. A cell of the invention is, therefore, capable of converting L-arabinose into L-ribulose and/or xylulose 5-phosphate and/or into a desired fermentation product, for example one of those mentioned herein.

Organisms, for example S. cerevisiae strains, able to produce ethanol from L-arabinose may be produced by modifying a cell by introducing the araA (L-arabinose isomerase), araB (L-ribulokinase) and araD (L-ribulose-5-P 4-epimerase) genes from a suitable source. Such genes may be introduced into a cell of the invention in order that it is capable of using arabinose. Such an approach is described in WO2003/095627. araA, araB and araD genes from Lactobacillus plantarum may be used and are disclosed in WO2008/041840. The araA gene from Bacillus subtilis and the araB and araD genes from Escherichia coli may be used and are disclosed in EP1499708.

PPP-Genes

A cell of the invention may comprise one ore more genetic modifications that increases the flux of the pentose phosphate pathway. In particular, the genetic modification(s) may lead to an increased flux through the non-oxidative part pentose phosphate pathway. A genetic modification that causes an increased flux of the non-oxidative part of the pentose phosphate pathway is herein understood to mean a modification that increases the flux by at least a factor of about 1.1, about 1.2, about 1.5, about 2, about 5, about 10 or about 20 as compared to the flux in a strain which is genetically identical except for the genetic modification causing the increased flux. The flux of the non-oxidative part of the pentose phosphate pathway may be measured by growing the modified host on xylose as sole carbon source, determining the specific xylose consumption rate and subtracting the specific xylitol production rate from the specific xylose consumption rate, if any xylitol is produced. However, the flux of the non-oxidative part of the pentose phosphate pathway is proportional with the growth rate on xylose as sole carbon source, preferably with the anaerobic growth rate on xylose as sole carbon source. There is a linear relation between the growth rate on xylose as sole carbon source (μ_max) and the flux of the non-oxidative part of the pentose phosphate pathway. The specific xylose consumption rate (Q_s) is equal to the growth rate (μ) divided by the yield of biomass on sugar (Y_xs) because the yield of biomass on sugar is constant (under a given set of conditions: anaerobic, growth medium, pH, genetic background of the strain, etc.; i.e. Q_s=μ/Y_xs). Therefore the increased flux of the non-oxidative part of the pentose phosphate pathway may be deduced from the increase in maximum growth rate under these conditions unless transport (uptake is limiting).
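The relations in the preceding paragraph can be written out as a short calculation. The Python sketch below assumes a constant biomass yield Y_xs, so that Q_s = μ/Y_xs, and takes the flux as Q_s minus the specific xylitol production rate, as described above. The function names, units and numerical values are hypothetical and serve only as an illustration.

```python
def specific_xylose_consumption_rate(mu, y_xs):
    """Q_s = mu / Y_xs (mu in 1/h, Y_xs in g biomass per g xylose)."""
    if y_xs <= 0:
        raise ValueError("biomass yield must be positive")
    return mu / y_xs


def non_oxidative_ppp_flux(q_xylose, q_xylitol=0.0):
    """Flux estimate: specific xylose consumption minus specific xylitol production."""
    return q_xylose - q_xylitol


def flux_fold_increase(mu_modified, mu_reference, y_xs):
    """Fold increase in flux between two strains, assuming the same constant Y_xs
    and no xylitol production in either strain."""
    q_mod = specific_xylose_consumption_rate(mu_modified, y_xs)
    q_ref = specific_xylose_consumption_rate(mu_reference, y_xs)
    return non_oxidative_ppp_flux(q_mod) / non_oxidative_ppp_flux(q_ref)


if __name__ == "__main__":
    # Hypothetical growth rates (1/h) and biomass yield (g/g)
    fold = flux_fold_increase(mu_modified=0.09, mu_reference=0.06, y_xs=0.1)
    print(f"fold increase in flux: {fold:.2f}")
    print("meets the 'at least about 1.1' criterion:", fold >= 1.1)
```

Because Y_xs cancels in the ratio, the fold increase in flux equals the fold increase in growth rate, which is why the increase in flux may be deduced from the increase in maximum growth rate when uptake is not limiting.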

One or more genetic modifications that increase the flux of the pentose phosphate pathway may be introduced in the host cell in various ways. These include, e.g., achieving higher steady state activity levels of xylulose kinase and/or one or more of the enzymes of the non-oxidative part of the pentose phosphate pathway and/or a reduced steady state level of unspecific aldose reductase activity. These changes in steady state activity levels may be effected by selection of mutants (spontaneous or induced by chemicals or radiation) and/or by recombinant DNA technology, e.g. by overexpression or inactivation, respectively, of genes encoding the enzymes or factors regulating these genes.

In a preferred host cell, the genetic modification comprises overexpression of at least one enzyme of the (non-oxidative part) pentose phosphate pathway. Preferably the enzyme is selected from the group consisting of ribulose-5-phosphate isomerase, ribulose-5-phosphate epimerase, transketolase and transaldolase. Various combinations of enzymes of the (non-oxidative part) pentose phosphate pathway may be overexpressed. E.g. the enzymes that are overexpressed may be at least the enzymes ribulose-5-phosphate isomerase and ribulose-5-phosphate epimerase; or at least the enzymes ribulose-5-phosphate isomerase and transketolase; or at least the enzymes ribulose-5-phosphate isomerase and transaldolase; or at least the enzymes ribulose-5-phosphate epimerase and transketolase; or at least the enzymes ribulose-5-phosphate epimerase and transaldolase; or at least the enzymes transketolase and transaldolase; or at least the enzymes ribulose-5-phosphate epimerase, transketolase and transaldolase; or at least the enzymes ribulose-5-phosphate isomerase, transketolase and transaldolase; or at least the enzymes ribulose-5-phosphate isomerase, ribulose-5-phosphate epimerase, and transaldolase; or at least the enzymes ribulose-5-phosphate isomerase, ribulose-5-phosphate epimerase, and transketolase. In one embodiment of the invention each of the enzymes ribulose-5-phosphate isomerase, ribulose-5-phosphate epimerase, transketolase and transaldolase is overexpressed in the host cell. More preferred is a host cell in which the genetic modification comprises at least overexpression of both the enzymes transketolase and transaldolase, as such a host cell is already capable of anaerobic growth on xylose. In fact, under some conditions host cells overexpressing only the transketolase and the transaldolase already have the same anaerobic growth rate on xylose as do host cells that overexpress all four of the enzymes, i.e. the ribulose-5-phosphate isomerase, ribulose-5-phosphate epimerase, transketolase and transaldolase. Moreover, host cells overexpressing both of the enzymes ribulose-5-phosphate isomerase and ribulose-5-phosphate epimerase are preferred over host cells overexpressing only the isomerase or only the epimerase, as overexpression of only one of these enzymes may produce metabolic imbalances.

The enzyme “ribulose 5-phosphate epimerase” (EC 5.1.3.1) is herein defined as an enzyme that catalyses the epimerisation of D-xylulose 5-phosphate into D-ribulose 5-phosphate and vice versa. The enzyme is also known as phosphoribulose epimerase; erythrose-4-phosphate isomerase; phosphoketopentose 3-epimerase; xylulose phosphate 3-epimerase; phosphoketopentose epimerase; ribulose 5-phosphate 3-epimerase; D-ribulose phosphate-3-epimerase; D-ribulose 5-phosphate epimerase; D-ribulose-5-P 3-epimerase; D-xylulose-5-phosphate 3-epimerase; pentose-5-phosphate 3-epimerase; or D-ribulose-5-phosphate 3-epimerase. A ribulose 5-phosphate epimerase may be further defined by its amino acid sequence. Likewise a ribulose 5-phosphate epimerase may be defined by a nucleotide sequence encoding the enzyme as well as by a nucleotide sequence hybridising to a reference nucleotide sequence encoding a ribulose 5-phosphate epimerase. The nucleotide sequence encoding for ribulose 5-phosphate epimerase is herein designated RPE1.

The enzyme “ribulose 5-phosphate isomerase” (EC 5.3.1.6) is herein defined as an enzyme that catalyses direct isomerisation of D-ribose 5-phosphate into D-ribulose 5-phosphate and vice versa. The enzyme is also known as phosphopentosisomerase; phosphoriboisomerase; ribose phosphate isomerase; 5-phosphoribose isomerase; D-ribose 5-phosphate isomerase; D-ribose-5-phosphate ketol-isomerase; or D-ribose-5-phosphate aldose-ketose-isomerase. A ribulose 5-phosphate isomerase may be further defined by its amino acid sequence. Likewise a ribulose 5-phosphate isomerase may be defined by a nucleotide sequence encoding the enzyme as well as by a nucleotide sequence hybridising to a reference nucleotide sequence encoding a ribulose 5-phosphate isomerase. The nucleotide sequence encoding for ribulose 5-phosphate isomerase is herein designated RPI1.

The enzyme “transketolase” (EC 2.2.1.1) is herein defined as an enzyme that catalyses the reaction: D-ribose 5-phosphate + D-xylulose 5-phosphate <−> sedoheptulose 7-phosphate + D-glyceraldehyde 3-phosphate and vice versa. The enzyme is also known as glycolaldehydetransferase or sedoheptulose-7-phosphate:D-glyceraldehyde-3-phosphate glycolaldehydetransferase. A transketolase may be further defined by its amino acid sequence. Likewise a transketolase may be defined by a nucleotide sequence encoding the enzyme as well as by a nucleotide sequence hybridising to a reference nucleotide sequence encoding a transketolase. The nucleotide sequence encoding for transketolase is herein designated TKL1.

The enzyme “transaldolase” (EC 2.2.1.2) is herein defined as an enzyme that catalyses the reaction: sedoheptulose 7-phosphate + D-glyceraldehyde 3-phosphate <−> D-erythrose 4-phosphate + D-fructose 6-phosphate and vice versa. The enzyme is also known as dihydroxyacetonetransferase; dihydroxyacetone synthase; formaldehyde transketolase; or sedoheptulose-7-phosphate:D-glyceraldehyde-3-phosphate glyceronetransferase. A transaldolase may be further defined by its amino acid sequence. Likewise a transaldolase may be defined by a nucleotide sequence encoding the enzyme as well as by a nucleotide sequence hybridising to a reference nucleotide sequence encoding a transaldolase. The nucleotide sequence encoding for transaldolase is herein designated TAL1.

Xylose Isomerase Gene

The presence of the nucleotide sequence encoding a xylose isomerase confers on the cell the ability to isomerise xylose to xylulose. According to the invention, two to fifteen copies of one or more xylose isomerase genes are introduced into the host cell.

In one embodiment, two to fifteen copies of one or more xylose isomerase genes are introduced into the host cell.

A “xylose isomerase” (EC 5.3.1.5) is herein defined as an enzyme that catalyses the direct isomerisation of D-xylose into D-xylulose and/or vice versa. The enzyme is also known as a D-xylose ketoisomerase. A xylose isomerase herein may also be capable of catalysing the conversion between D-glucose and D-fructose (and accordingly may therefore be referred to as a glucose isomerase). A xylose isomerase herein may require a bivalent cation, such as magnesium, manganese or cobalt as a cofactor.

Accordingly, a cell of the invention is capable of isomerising xylose to xylulose. The ability of isomerising xylose to xylulose is conferred on the host cell by transformation of the host cell with a nucleic acid construct comprising a nucleotide sequence encoding a defined xylose isomerase. A cell of the invention isomerises xylose into xylulose by the direct isomerisation of xylose to xylulose. This is understood to mean that xylose is isomerised into xylulose in a single reaction catalysed by a xylose isomerase, as opposed to two step conversion of xylose into xylulose via a xylitol intermediate as catalysed by xylose reductase and xylitol dehydrogenase, respectively.

A unit (U) of xylose isomerase activity may herein be defined as the amount of enzyme producing 1 nmol of xylulose per minute, under conditions as described by Kuyper et al. (2003, FEMS Yeast Res. 4: 69-78). The xylose isomerase gene may have various origins, such as for example Pyromyces sp. as disclosed in WO2006/009434. Other suitable origins are Bacteroides, in particular Bacteroides uniformis as described in PCT/EP2009/52623, Bacillus, in particular Bacillus stearothermophilus as described in PCT/EP2009/052625, Thermotoga, in particular Thermotoga maritima, as described in PCT/EP2009/052621 and Clostridium, in particular Clostridium cellulolyticum as described in PCT/EP2009/052620.
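As an illustration of the unit definition above, the following minimal sketch converts a measured amount of xylulose into units of activity. The function names are hypothetical, the normalisation per mg of protein is a common convention that is assumed here rather than taken from the text, and the assay conditions of Kuyper et al. are not modelled.

```python
def xylose_isomerase_units(xylulose_nmol: float, minutes: float) -> float:
    """Units of activity, with 1 U defined as 1 nmol of xylulose produced per minute."""
    if minutes <= 0:
        raise ValueError("assay time must be positive")
    return xylulose_nmol / minutes


def specific_activity(units: float, protein_mg: float) -> float:
    """Units per mg of protein (assumed normalisation, not stated in the text)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return units / protein_mg


if __name__ == "__main__":
    # Hypothetical assay: 300 nmol xylulose formed in 10 minutes by 0.5 mg protein.
    u = xylose_isomerase_units(300.0, 10.0)
    print(u, "U;", specific_activity(u, 0.5), "U per mg protein")
```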

XKS1 Gene

A cell of the invention may comprise one or more genetic modifications that increase the specific xylulose kinase activity. Preferably the genetic modification or modifications cause overexpression of a xylulose kinase, e.g. by overexpression of a nucleotide sequence encoding a xylulose kinase. The gene encoding the xylulose kinase may be endogenous to the host cell or may be a xylulose kinase that is heterologous to the host cell. A nucleotide sequence used for overexpression of xylulose kinase in the host cell of the invention is a nucleotide sequence encoding a polypeptide with xylulose kinase activity.

The enzyme “xylulose kinase” (EC 2.7.1.17) is herein defined as an enzyme that catalyses the reaction ATP + D-xylulose = ADP + D-xylulose 5-phosphate. The enzyme is also known as a phosphorylating xylulokinase, D-xylulokinase or ATP:D-xylulose 5-phosphotransferase. A xylulose kinase of the invention may be further defined by its amino acid sequence. Likewise a xylulose kinase may be defined by a nucleotide sequence encoding the enzyme as well as by a nucleotide sequence hybridising to a reference nucleotide sequence encoding a xylulose kinase.

In a cell of the invention, a genetic modification or modifications that increase(s) the specific xylulose kinase activity may be combined with any of the modifications increasing the flux of the pentose phosphate pathway as described above. This is not, however, essential.

Thus, a host cell of the invention may comprise only a genetic modification or modifications that increase the specific xylulose kinase activity. The various means available in the art for achieving and analysing overexpression of a xylulose kinase in the host cells of the invention are the same as described above for enzymes of the pentose phosphate pathway. Preferably in the host cells of the invention, a xylulose kinase to be overexpressed is overexpressed by at least a factor of about 1.1, about 1.2, about 1.5, about 2, about 5, about 10 or about 20 as compared to a strain which is genetically identical except for the genetic modification(s) causing the overexpression. It is to be understood that these levels of overexpression may apply to the steady state level of the enzyme's activity, the steady state level of the enzyme's protein as well as to the steady state level of the transcript coding for the enzyme.

Aldose Reductase (GRE3) Gene Deletion

A cell of the invention may comprise one or more genetic modifications that reduce unspecific aldose reductase activity in the host cell. Preferably, unspecific aldose reductase activity is reduced in the host cell by one or more genetic modifications that reduce the expression of, or inactivate, a gene encoding an unspecific aldose reductase. Preferably, the genetic modification(s) reduce or inactivate the expression of each endogenous copy of a gene encoding an unspecific aldose reductase in the host cell (herein called GRE3 deletion). Host cells may comprise multiple copies of genes encoding unspecific aldose reductases as a result of di-, poly- or aneu-ploidy, and/or the host cell may contain several different (iso)enzymes with aldose reductase activity that differ in amino acid sequence and that are each encoded by a different gene. Also in such instances, the expression of each gene that encodes an unspecific aldose reductase is preferably reduced or inactivated. Preferably, the gene is inactivated by deletion of at least part of the gene or by disruption of the gene, whereby in this context the term gene also includes any non-coding sequence up- or down-stream of the coding sequence, the (partial) deletion or inactivation of which results in a reduction of expression of unspecific aldose reductase activity in the host cell.

A nucleotide sequence encoding an aldose reductase whose activity is to be reduced in the host cell of the invention is a nucleotide sequence encoding a polypeptide with aldose reductase activity.

Thus, a host cell of the invention comprising only a genetic modification or modifications that reduce(s) unspecific aldose reductase activity in the host cell is specifically included in the invention.

The enzyme “aldose reductase” (EC 1.1.1.21) is herein defined as any enzyme that is capable of reducing xylose or xylulose to xylitol. In the context of the present invention an aldose reductase may be any unspecific aldose reductase that is native (endogenous) to a host cell of the invention and that is capable of reducing xylose or xylulose to xylitol. Unspecific aldose reductases catalyse the reaction:

aldose + NAD(P)H + H⁺ <−> alditol + NAD(P)⁺

The enzyme has a wide specificity and is also known as aldose reductase; polyol dehydrogenase (NADP⁺); alditol:NADP oxidoreductase; alditol:NADP⁺ 1-oxidoreductase; NADPH-aldopentose reductase; or NADPH-aldose reductase.

A particular example of such an unspecific aldose reductase is the one that is endogenous to S. cerevisiae and that is encoded by the GRE3 gene (Traff et al., 2001, Appl. Environ. Microbiol. 67: 5668-74). Thus, an aldose reductase of the invention may be further defined by its amino acid sequence. Likewise an aldose reductase may be defined by the nucleotide sequences encoding the enzyme as well as by a nucleotide sequence hybridising to a reference nucleotide sequence encoding an aldose reductase.

Bioproducts Production

Over the years suggestions have been made for the introduction of various organisms for the production of bio-ethanol from crop sugars. In practice, however, all major bio-ethanol production processes have continued to use the yeasts of the genus Saccharomyces as ethanol producer. This is due to the many attractive features of Saccharomyces species for industrial processes, i.e., a high acid-, ethanol- and osmo-tolerance, capability of anaerobic growth, and of course its high alcoholic fermentative capacity. Preferred yeast species as host cells include S. cerevisiae, S. bulderi, S. barnetti, S. exiguus, S. uvarum, S. diastaticus, K. lactis, K. marxianus or K. fragilis.

A cell of the invention may be able to convert plant biomass, celluloses, hemicelluloses, pectins, rhamnose, galactose, frucose, maltose, maltodextrines, ribose, ribulose, or starch, starch derivatives, sucrose, lactose and glycerol, for example into fermentable sugars. Accordingly, a cell of the invention may express one or more enzymes such as a cellulase (an endocellulase or an exocellulase), a hemicellulase (an endo- or exo-xylanase or arabinase) necessary for the conversion of cellulose into glucose monomers and hemicellulose into xylose and arabinose monomers, a pectinase able to convert pectins into glucuronic acid and galacturonic acid or an amylase to convert starch into glucose monomers.

The cell further preferably comprises those enzymatic activities required for conversion of pyruvate to a desired fermentation product, such as ethanol, butanol, lactic acid, 3-hydroxy-propionic acid, acrylic acid, acetic acid, succinic acid, citric acid, fumaric acid, malic acid, itaconic acid, an amino acid, 1,3-propane-diol, ethylene, glycerol, a β-lactam antibiotic or a cephalosporin.

A preferred cell of the invention is a cell that is naturally capable of alcoholic fermentation, preferably anaerobic alcoholic fermentation. A cell of the invention preferably has a high tolerance to ethanol, a high tolerance to low pH (i.e. capable of growth at a pH lower than about 5, about 4, about 3, or about 2.5) and to organic acids like lactic acid, acetic acid or formic acid and/or sugar degradation products such as furfural and hydroxymethylfurfural, and/or a high tolerance to elevated temperatures.

Any of the above characteristics or activities of a cell of the invention may be naturally present in the cell or may be introduced or modified by genetic modification.

A cell of the invention may be a cell suitable for the production of ethanol. A cell of the invention may, however, be suitable for the production of fermentation products other than ethanol. Such non-ethanolic fermentation products include in principle any bulk or fine chemical that is producible by a eukaryotic microorganism such as a yeast or a filamentous fungus.

Such fermentation products may be, for example, butanol, lactic acid, 3-hydroxy-propionic acid, acrylic acid, acetic acid, succinic acid, citric acid, malic acid, fumaric acid, itaconic acid, an amino acid, 1,3-propane-diol, ethylene, glycerol, a β-lactam antibiotic or a cephalosporin. A preferred cell of the invention for production of non-ethanolic fermentation products is a host cell that contains a genetic modification that results in decreased alcohol dehydrogenase activity.

In a further aspect the invention relates to fermentation processes in which the cells of the invention are used for the fermentation of a carbon source comprising a source of xylose, such as xylose. In addition to a source of xylose the carbon source in the fermentation medium may also comprise a source of glucose. The source of xylose or glucose may be xylose or glucose as such or may be any carbohydrate oligo- or polymer comprising xylose or glucose units, such as e.g. lignocellulose, xylans, cellulose, starch and the like. For release of xylose or glucose units from such carbohydrates, appropriate carbohydrases (such as xylanases, glucanases, amylases and the like) may be added to the fermentation medium or may be produced by the cell. In the latter case the cell may be genetically engineered to produce and excrete such carbohydrases. An additional advantage of using oligo- or polymeric sources of glucose is that it makes it possible to maintain a low(er) concentration of free glucose during the fermentation, e.g. by using rate-limiting amounts of the carbohydrases. This, in turn, will prevent repression of systems required for metabolism and transport of non-glucose sugars such as xylose.

In a preferred process the cell ferments both the xylose and glucose, preferably simultaneously, in which case a cell that is insensitive to glucose repression is preferably used to prevent diauxic growth. In addition to a source of xylose (and glucose) as carbon source, the fermentation medium will further comprise the appropriate ingredients required for growth of the cell. Compositions of fermentation media for growth of microorganisms such as yeasts are well known in the art. The fermentation process is a process for the production of a fermentation product such as e.g. ethanol, butanol, lactic acid, 3-hydroxy-propionic acid, acrylic acid, acetic acid, succinic acid, citric acid, malic acid, fumaric acid, itaconic acid, an amino acid, 1,3-propane-diol, ethylene, glycerol, a β-lactam antibiotic, such as Penicillin G or Penicillin V and fermentative derivatives thereof, and a cephalosporin.

A mixed sugar cell may be a cell suitable for the production of ethanol. A mixed sugar cell may, however, be suitable for the production of fermentation products other than ethanol. Such non-ethanolic fermentation products include in principle any bulk or fine chemical that is producible by a eukaryotic microorganism such as a yeast or a filamentous fungus.

A preferred mixed sugar cell for production of non-ethanolic fermentation products is a host cell that contains a genetic modification that results in decreased alcohol dehydrogenase activity.

In an embodiment the mixed sugar cell may be used in a process wherein sugars originating from lignocellulose are converted into ethanol.

Lignocellulose

Lignocellulose, which may be considered as a potential renewable feedstock, generally comprises the polysaccharides cellulose (glucans) and hemicelluloses (xylans, heteroxylans and xyloglucans). In addition, some hemicellulose may be present as glucomannans, for example in wood-derived feedstocks. The enzymatic hydrolysis of these polysaccharides to soluble sugars, including both monomers and multimers, for example glucose, cellobiose, xylose, arabinose, galactose, fructose, mannose, rhamnose, ribose, galacturonic acid, glucuronic acid and other hexoses and pentoses, occurs under the action of different enzymes acting in concert.

In addition, pectins and other pectic substances such as arabinans may make up a considerable proportion of the dry mass of typical cell walls from non-woody plant tissues (about a quarter to half of the dry mass may be pectins).

Pretreatment

Before enzymatic treatment, the lignocellulosic material may be pretreated. The pretreatment may comprise exposing the lignocellulosic material to an acid, a base, a solvent, heat, a peroxide, ozone, mechanical shredding, grinding, milling or rapid depressurization, or a combination of any two or more thereof. A chemical pretreatment is often combined with heat pretreatment, e.g. at 150-220° C. for 1 to 30 minutes.

Enzymatic Hydrolysis

The pretreated material is commonly subjected to enzymatic hydrolysis to release sugars that may be fermented according to the invention. This may be executed with conventional methods, e.g. contacting with cellulases, for instance cellobiohydrolase(s), endoglucanase(s), beta-glucosidase(s) and optionally other enzymes. The conversion with the cellulases may be executed at ambient temperatures or at higher temperatures, with a reaction time sufficient to release sufficient amounts of sugar(s). The result of the enzymatic hydrolysis is a hydrolysis product comprising C5/C6 sugars, herein designated as the sugar composition.

Fermentation

The fermentation process may be an aerobic or an anaerobic fermentation process. An anaerobic fermentation process is herein defined as a fermentation process run in the absence of oxygen or in which substantially no oxygen is consumed, preferably less than about 5, about 2.5 or about 1 mmol/L/h, more preferably 0 mmol/L/h is consumed (i.e. oxygen consumption is not detectable), and wherein organic molecules serve as both electron donors and electron acceptors. In the absence of oxygen, NADH produced in glycolysis and biomass formation cannot be oxidised by oxidative phosphorylation. To solve this problem many microorganisms use pyruvate or one of its derivatives as an electron and hydrogen acceptor, thereby regenerating NAD⁺.

Thus, in a preferred anaerobic fermentation process pyruvate is used as an electron (and hydrogen) acceptor and is reduced to fermentation products such as ethanol, butanol, lactic acid, 3-hydroxy-propionic acid, acrylic acid, acetic acid, succinic acid, citric acid, malic acid, fumaric acid, an amino acid, 1,3-propane-diol, ethylene, glycerol, a β-lactam antibiotic and a cephalosporin.

The fermentation process is preferably run at a temperature that is optimal for the cell. Thus, for most yeasts or fungal host cells, the fermentation process is performed at a temperature which is less than about 42° C., preferably less than about 38° C. For yeast or filamentous fungal host cells, the fermentation process is preferably performed at a temperature which is lower than about 35, about 33, about 30 or about 28° C. and at a temperature which is higher than about 20, about 22, or about 25° C.

The ethanol yield on xylose and/or glucose in the process preferably is at least about 50, about 60, about 70, about 80, about 90, about 95 or about 98%. The ethanol yield is herein defined as a percentage of the theoretical maximum yield.
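The yield definition above can be made concrete with a minimal sketch. The theoretical maxima are not stated in this passage; they are assumed here from the standard fermentation stoichiometry (1 glucose to 2 ethanol + 2 CO₂; 3 xylose to 5 ethanol + 5 CO₂) and rounded molar masses, which gives about 0.51 g ethanol per g of either sugar. The function name and the example amounts are hypothetical.

```python
# Rounded molar masses in g/mol.
ETHANOL_G_PER_MOL = 46.07
GLUCOSE_G_PER_MOL = 180.16
XYLOSE_G_PER_MOL = 150.13

# Theoretical maximum yields in g ethanol per g sugar (assumed stoichiometry, see lead-in).
MAX_YIELD_GLUCOSE = 2.0 * ETHANOL_G_PER_MOL / GLUCOSE_G_PER_MOL
MAX_YIELD_XYLOSE = (5.0 / 3.0) * ETHANOL_G_PER_MOL / XYLOSE_G_PER_MOL


def ethanol_yield_percent(ethanol_g: float, glucose_g: float = 0.0, xylose_g: float = 0.0) -> float:
    """Ethanol yield as a percentage of the theoretical maximum for the sugars consumed."""
    theoretical_g = glucose_g * MAX_YIELD_GLUCOSE + xylose_g * MAX_YIELD_XYLOSE
    if theoretical_g <= 0:
        raise ValueError("at least some sugar must have been consumed")
    return 100.0 * ethanol_g / theoretical_g


if __name__ == "__main__":
    # Hypothetical fermentation: 4.0 g ethanol from 5 g glucose plus 5 g xylose consumed.
    print(round(ethanol_yield_percent(4.0, glucose_g=5.0, xylose_g=5.0), 1), "% of theoretical maximum")
```

In this hypothetical case the yield is about 78% of the theoretical maximum, which would satisfy the "at least about 70%" level but not the "at least about 80%" level.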

The invention also relates to a process for producing a fermentation product.

The fermentation processes may be carried out in batch, fed-batch or continuous mode. A separate hydrolysis and fermentation (SHF) process or a simultaneous saccharification and fermentation (SSF) process may also be applied. A combination of these fermentation process modes may also be possible for optimal productivity.

The fermentation process according to the present invention may be run under aerobic or anaerobic conditions. Preferably, the process is carried out under micro-aerophilic or oxygen-limited conditions.

An anaerobic fermentation process is herein defined as a fermentation process run in the absence of oxygen or in which substantially no oxygen is consumed, preferably less than about 5, about 2.5 or about 1 mmol/L/h, and wherein organic molecules serve as both electron donor and electron acceptors.

An oxygen-limited fermentation process is a process in which the oxygen consumption is limited by the oxygen transfer from the gas to the liquid. The degree of oxygen limitation is determined by the amount and composition of the ingoing gasflow as well as the actual mixing/mass transfer properties of the fermentation equipment used. Preferably, in a process under oxygen-limited conditions, the rate of oxygen consumption is at least about 5.5, more preferably at least about 6, such as at least 7 mmol/L/h. A process of the invention comprises recovery of the fermentation product.

Fermentation Product

The fermentation product of the invention may be any useful product. In one embodiment, it is a product selected from the group consisting of ethanol, n-butanol, isobutanol, lactic acid, 3-hydroxy-propionic acid, acrylic acid, acetic acid, succinic acid, fumaric acid, malic acid, itaconic acid, maleic acid, citric acid, adipic acid, an amino acid, such as lysine, methionine, tryptophan, threonine, and aspartic acid, 1,3-propane-diol, ethylene, glycerol, a β-lactam antibiotic and a cephalosporin, vitamins, pharmaceuticals, animal feed supplements, specialty chemicals, chemical feedstocks, plastics, solvents, fuels, including biofuels and biogas or organic polymers, and an industrial enzyme, such as a protease, a cellulase, an amylase, a glucanase, a lactase, a lipase, a lyase, an oxidoreductase, a transferase or a xylanase. For example, the fermentation products may be produced by cells according to the invention, additionally following prior art cell preparation methods and fermentation processes, which examples however should herein not be construed as limiting. For example, n-butanol may be produced by cells as described in WO2008121701 or WO2008086124; lactic acid as described in US2011053231 or US2010137551; 3-hydroxy-propionic acid as described in WO2010010291; acrylic acid as described in WO2009153047. An overview of all kinds of fermentation products and how they can be prepared in yeast is given in Romanos, M. A., et al, “Foreign Gene Expression in Yeast: a Review”, Yeast vol. 8: 423-488 (1992), see e.g. table 7. The production of glycerol, 1,3-propane-diol, organic acids, and vitamin C (table 2) is described in Nevoigt, E., Microbiol. Mol. Biol. Rev. 72(3) 379-412 (2008). Gidijala, L., et al, BMC Biotechnology 8(29) (2008) describes production of beta-lactams in yeast.

Recovery of the Fermentation Product

For the recovery of the fermentation product existing technologies are used. For different fermentation products different recovery processes are appropriate. Existing methods of recovering ethanol from aqueous mixtures commonly use fractionation and adsorption techniques. For example, a beer still can be used to process a fermented product, which contains ethanol in an aqueous mixture, to produce an enriched ethanol-containing mixture that is then subjected to fractionation (e.g., fractional distillation or other like techniques). Next, the fractions containing the highest concentrations of ethanol can be passed through an adsorber to remove most, if not all, of the remaining water from the ethanol.

The following examples illustrate the invention:

EXAMPLES

Unless indicated otherwise, the methods described herein are standard biochemical techniques. Examples of suitable general methodology textbooks include Sambrook et al., Molecular Cloning, a Laboratory Manual (1989) and Ausubel et al., Current Protocols in Molecular Biology (1995), John Wiley & Sons, Inc.

Medium Composition

Growth experiments: Saccharomyces cerevisiae strains are grown on medium having the following composition: 0.67% (w/v) yeast nitrogen base or synthetic medium (Verduyn et al., Yeast 8:501-517, 1992) and glucose, arabinose, galactose or xylose, or a combination of these substrates, at varying concentrations (see examples for specific details; concentrations in % weight over volume (w/v)). For agar plates the medium is supplemented with 2% (w/v) bacteriological agar.

Ethanol Production

Pre-cultures were prepared by inoculating 25 ml Verduyn-medium (Verduyn et al., Yeast 8:501-517, 1992) supplemented with 2% glucose in a 100 ml shake flask with a frozen stock culture or a single colony from an agar plate. After incubation at 30° C. in an orbital shaker (280 rpm) for approximately 24 hours, this culture was harvested and used for determination of CO₂ evolution and ethanol production experiments.

Cultivations for ethanol production were performed at 30° C. in 100 ml synthetic model medium (Verduyn-medium (Verduyn et al., Yeast 8:501-517, 1992) with 5% glucose, 5% xylose, 3.5% arabinose and 1% galactose) in the BAM (Biological Activity Monitor, Halotec, The Netherlands). The pH of the medium was adjusted to 4.2 with 2 M NaOH/H₂SO₄ prior to sterilisation. The synthetic medium for anaerobic cultivation was supplemented with 0.01 g l⁻¹ ergosterol and 0.42 g l⁻¹ Tween 80 dissolved in ethanol (Andreasen and Stier. J. Cell Physiol. 41:23-36, 1953; and Andreasen and Stier. J. Cell Physiol. 43:271-281, 1954). The medium was inoculated at an initial OD600 of approximately 2. Cultures were stirred by a magnetic stirrer. Anaerobic conditions developed rapidly during fermentation as the culture was not aerated. CO₂ production was monitored constantly. Sugar conversion and product formation (ethanol, glycerol) were analyzed by NMR. Growth was monitored by following optical density of the culture at 600 nm on an LKB Ultrospec K spectrophotometer.
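The amounts needed for one 100 ml cultivation follow directly from the concentrations given above. The sketch below is an illustration only; the helper names are hypothetical, and the percentages are taken as weight over volume, as stated under Medium Composition.

```python
# Sugar concentrations of the synthetic model medium, in % (w/v).
SUGARS_PERCENT_WV = {"glucose": 5.0, "xylose": 5.0, "arabinose": 3.5, "galactose": 1.0}

# Anaerobic supplements, in g per litre.
SUPPLEMENTS_G_PER_L = {"ergosterol": 0.01, "Tween 80": 0.42}


def grams_for_percent_wv(percent_wv: float, volume_ml: float) -> float:
    """Mass in grams for a % (w/v) concentration: 1% (w/v) equals 1 g per 100 ml."""
    return percent_wv * volume_ml / 100.0


def milligrams_for_g_per_l(g_per_l: float, volume_ml: float) -> float:
    """Mass in milligrams: 1 g per litre equals 1 mg per ml."""
    return g_per_l * volume_ml


if __name__ == "__main__":
    volume_ml = 100.0
    for sugar, pct in SUGARS_PERCENT_WV.items():
        print(f"{sugar}: {grams_for_percent_wv(pct, volume_ml):.2f} g")
    for name, conc in SUPPLEMENTS_G_PER_L.items():
        print(f"{name}: {milligrams_for_g_per_l(conc, volume_ml):.1f} mg")
```

For 100 ml this corresponds to 14.5 g of total sugar, 1 mg of ergosterol and 42 mg of Tween 80.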

Transformation of S. cerevisiae

Transformation of S. cerevisiae was done as described by Gietz and Woods (2002; Transformation of the yeast by the LiAc/SS carrier DNA/PEG method. Methods in Enzymology 350: 87-96).

Colony PCR

A single colony isolate was picked with a plastic toothpick and resuspended in 50 μl milliQ water. The sample was incubated for 10 minutes at 99° C. 5 μl of the incubated sample was used as a template for the PCR reaction, using Phusion® DNA polymerase (Finnzymes) according to the instructions provided by the supplier.

PCR Reaction Conditions:

step 1    3′     98° C.
step 2    10″    98° C.
step 3    15″    58° C.
step 4    30″    72° C.
step 5    4′     72° C.
step 6    30″    20° C.
(repeat step 2 to 4 for 30 cycles)
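The programme above can be written down as a small data structure, which also gives the total programmed hold time. This is a sketch only: the step labels (denaturation, annealing, extension) are an interpretation that is not given in the text, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str      # interpretive label, not given in the text
    seconds: int
    celsius: int


INITIAL = Step("initial denaturation", 3 * 60, 98)   # step 1
CYCLE = (
    Step("denaturation", 10, 98),                    # step 2
    Step("annealing", 15, 58),                       # step 3
    Step("extension", 30, 72),                       # step 4
)
CYCLES = 30                                          # steps 2 to 4 repeated
FINAL_EXTENSION = Step("final extension", 4 * 60, 72)  # step 5
FINAL_HOLD = Step("hold", 30, 20)                    # step 6


def total_hold_seconds() -> int:
    """Sum of all programmed hold times, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + CYCLES * per_cycle + FINAL_EXTENSION.seconds + FINAL_HOLD.seconds


if __name__ == "__main__":
    seconds = total_hold_seconds()
    print(f"{seconds} s ({seconds / 60:.0f} min) of programmed hold time")
```

Under these assumptions the programme amounts to 2100 s (35 minutes) of hold time; the actual run time is longer because of the ramps.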

Chromosomal DNA Isolation

Yeast cells were grown in YEP-medium containing 2% glucose, in a rotary shaker (overnight, at 30° C. and 280 rpm). 1.5 ml of these cultures were transferred to an Eppendorf tube and centrifuged for 1 minute at maximum speed. The supernatant was decanted and the pellet was resuspended in 200 μl of YCPS (0.1% SB3-14 (Sigma Aldrich, the Netherlands) in 10 mM Tris.HCl pH 7.5; 1 mM EDTA) and 1 μl RNase (20 mg/ml RNase A from bovine pancreas, Sigma, the Netherlands). The cell suspension was incubated for 10 minutes at 65° C. The suspension was centrifuged in an Eppendorf centrifuge for 1 minute at 7000 rpm. The supernatant was discarded. The pellet was carefully dissolved in 200 μl CLS (25 mM EDTA, 2% SDS) and 1 μl RNase A. After incubation at 65° C. for 10 minutes, the suspension was cooled on ice. After addition of 70 μl PPS (10 M ammonium acetate) the solutions were thoroughly mixed on a Vortex mixer. After centrifugation (5 minutes in an Eppendorf centrifuge at maximum speed), the supernatant was mixed with 200 μl ice-cold isopropanol. The DNA readily precipitated and was pelleted by centrifugation (5 minutes, maximum speed). The pellet was washed with 400 μl ice-cold 70% ethanol. The pellet was dried at room temperature and dissolved in 50 μl TE (10 mM Tris.HCl pH 7.5, 1 mM EDTA).

Example 1 Construction of Strain BIE104A2P1

1.1 Construction of an Expression Vector Containing the Genes for Arabinose Pathway

Plasmid pPWT018, as set out in FIG. 2, was constructed as follows: vector pPWT006 (FIG. 1), consisting of a SIT2-locus (Gottlin-Ninfa and Kaback (1986) Molecular and Cell Biology vol. 6, no. 6, 2185-2197) and the markers allowing for selection of transformants on the antibiotic G418 and for the ability to grow on acetamide, was digested with the restriction enzymes BsiWI and MluI. The kanMX-marker, conferring resistance to G418, was isolated from p427TEF (Dualsystems Biotech) and a fragment containing the amdS-marker has been described in the literature (Swinkels, B. W., Noordermeer, A. C. M. and Renniers, A. C. H. M (1995) The use of the amdS cDNA of Aspergillus nidulans as a dominant, bidirectional selectable marker for yeast transformation. Yeast Volume 11, Issue 1995A, page S579; and US 6051431). The genes encoding arabinose isomerase (araA), L-ribulokinase (araB) and L-ribulose-5-phosphate-4-epimerase (araD) from Lactobacillus plantarum, as disclosed in patent application WO2008/041840, were synthesized by BaseClear (Leiden, the Netherlands). One large fragment was synthesized, harbouring the three arabinose-genes mentioned above, under control of (or operably linked to) strong promoters from S. cerevisiae, i.e. the TDH3-promoter controlling the expression of the araA-gene, the ENO1-promoter controlling the araB-gene and the PGI1-promoter controlling the araD-gene. This fragment was flanked by the unique restriction sites Acc65I and MluI. Cloning of this fragment into pPWT006 digested with MluI and BsiWI resulted in plasmid pPWT018 (FIG. 2). The sequence of plasmid pPWT018 is set out in SEQ ID 1.

1.2 Yeast Transformation

CEN.PK113-7D (MATa URA3 HIS3 LEU2 TRP1 MAL2-8 SUC2) was transformed with plasmid pPWT018, which had previously been linearized with SfiI (New England Biolabs) according to the instructions of the supplier. A synthetic SfiI-site was designed in the 5′-flank of the SIT2-gene (see FIG. 2). Transformation mixtures were plated on YPD-agar (per liter: 10 grams of yeast extract, 20 grams of peptone, 20 grams of dextrose, 20 grams of agar) containing 100 μg G418 (Sigma Aldrich) per ml. After two to four days, colonies appeared on the plates, whereas the negative control (i.e. no addition of DNA in the transformation experiment) resulted in blank YPD/G418-plates. The integration of plasmid pPWT018 is directed to the SIT2-locus. Transformants were characterized using PCR and Southern blotting techniques.

PCR reactions, which are indicative of the correct integration of one copy of plasmid pPWT018, were performed with the primer pairs of SEQ ID 2 and 3, and SEQ ID 3 and 4. With the primer pair of SEQ ID 2 and 3, the correct integration at the SIT2-locus was checked. If plasmid pPWT018 was integrated in multiple copies (head-to-tail integration), the primer pair of SEQ ID 3 and 4 will give a PCR-product. If the latter PCR product is absent, this is indicative of one copy integration of pPWT018. A strain in which one copy of plasmid pPWT018 was integrated in the SIT2-locus was designated BIE104R2.

1.3 Marker Rescue

In order to be able to transform the yeast strain with other constructs using the same selection markers, it is necessary to remove the selectable markers. Plasmid pPWT018 was designed such that, upon integration of pPWT018 in the chromosome, homologous sequences are in close proximity to each other. This design allows the selectable markers to be lost by spontaneous intramolecular recombination of these homologous regions.

Upon vegetative growth, intramolecular recombination will take place, although at low frequency. The frequency of this recombination depends on the length of the homology and the locus in the genome (unpublished results). Upon sequential transfer of a subfraction of the culture to fresh medium, intramolecular recombinants will accumulate in time.

To this end, strain BIE104R2 was cultured in YPD-medium (per liter: 10 grams of yeast extract, 20 grams of peptone, 20 grams of dextrose), starting from a single colony isolate. 25 μl of an overnight culture was used to inoculate fresh YPD medium. After at least five such serial transfers, the optical density of the culture was determined and cells were diluted to a concentration of approximately 5000 cells per ml. 100 μl of the cell suspension was plated on Yeast Carbon Base medium (Difco) containing 30 mM KPi (pH 6.8), 0.1% (NH4)2SO4, 40 mM fluoro-acetamide (Amersham) and 1.8% agar (Difco). Cells identical to cells of strain BIE104R2, i.e. without intramolecular recombination, still contain the amdS-gene. To those cells, fluoro-acetamide is toxic. These cells will not be able to grow and will not form colonies on a medium containing fluoro-acetamide. However, if intramolecular recombination has occurred, BIE104R2-variants that have lost the selectable markers will be able to grow on the fluoro-acetamide medium, since they are unable to convert fluoro-acetamide into growth inhibiting compounds. Those cells will form colonies on this agar medium.

The fluoro-acetamide resistant colonies thus obtained were subjected to PCR analysis using the primer pairs of SEQ ID 2 and 3, and SEQ ID 4 and 5. Primers of SEQ ID 2 and 3 will give a band if recombination of the selectable markers has taken place as intended. In that case, the cassette with the genes araA, araB and araD under control of the strong yeast promoters remains integrated in the SIT2-locus of the genome of the host strain, and a PCR reaction using primers of SEQ ID 4 and 5 should not result in a PCR product, since primer 4 primes in a region that should be lost due to the recombination. If a band is obtained with the latter primers, this is indicative of the presence of the complete plasmid pPWT018 in the genome, so no recombination has taken place.

If primers of SEQ ID 2 and 3 do not result in a PCR product, recombination has taken place, but in such a way that the complete plasmid pPWT018 has recombined out of the genome. Not only were the selectable markers lost, but also the arabinose-genes. In fact, wild-type yeast has been retrieved.
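The interpretation of the two diagnostic PCR reactions described above can be summarised as a small decision rule. The following is a minimal sketch in Python; the function name and the boolean inputs are illustrative assumptions and are not part of the method as described.

```python
def interpret_marker_rescue(band_2_3: bool, band_4_5: bool) -> str:
    """Interpret colony PCR results for a fluoro-acetamide resistant colony.

    band_2_3: a product was obtained with primers SEQ ID 2 and 3
    band_4_5: a product was obtained with primers SEQ ID 4 and 5
    (Hypothetical helper; the names are chosen for illustration only.)
    """
    if band_4_5:
        # Primer 4 anneals in the region that is lost upon recombination,
        # so a product means the complete plasmid is still present.
        return "no recombination: complete pPWT018 still integrated"
    if band_2_3:
        # Markers recombined out, arabinose cassette retained at SIT2.
        return "markers lost: araA/araB/araD cassette retained"
    # Neither product: the whole plasmid, including the arabinose genes, is gone.
    return "complete plasmid lost: wild-type locus retrieved"


for result in [(True, False), (True, True), (False, False)]:
    print(result, "->", interpret_marker_rescue(*result))
```

Only colonies of the first kind (band with SEQ ID 2 and 3, no band with SEQ ID 4 and 5) would be carried forward to Southern blot analysis.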

Isolates that showed PCR results in accordance with one copy integration of pPWT018 were subjected to Southern blot analysis. The chromosomal DNA of strain CEN.PK113-7D and of the correct recombinants was digested with EcoRI and HindIII (double digestion). A SIT2-probe was prepared with primers of SEQ ID 6 and 7, using pPWT018 as a template. The result of the hybridisation experiment is shown in FIG. 3.

In the wild-type strain, a band of 2.35 kb is observed, which is in accordance with the expected size of the wild-type gene. Upon integration and partial loss by recombination of the plasmid pPWT018, a band of 1.06 kb was expected. Indeed, this band is observed, as shown in FIG. 3 (lane 2).

One of the strains that showed the correct pattern of bands on the Southern blot (as can be deduced from FIG. 3) is the strain designated as BIE104A2.

1.4 Introduction of Four Constitutively Expressed Genes of the Non-Oxidative Pentose Phosphate Pathway

Saccharomyces cerevisiae BIE104A2, expressing the genes araA, araB and araD constitutively, was transformed with plasmid pPWT080 (FIG. 4). The sequence of plasmid pPWT080 is set out in SEQ ID 8. The procedure for transformation and selection, after selecting a one copy integration transformant, is the same as described above in sections 1.1, 1.2 and 1.3. In short, BIE104A2 was transformed with SfiI-digested pPWT080. Transformation mixtures were plated on YPD-agar (per liter: 10 grams of yeast extract, 20 grams of peptone, 20 grams of dextrose, 20 grams of agar) containing 100 μg G418 (Sigma Aldrich) per ml.

After two to four days, colonies appeared on the plates, whereas the negative control (i.e. no addition of DNA in the transformation experiment) resulted in blank YPD/G418-plates.

The integration of plasmid pPWT080 is directed to the GRE3-locus. Transformants were characterized using PCR and Southern blotting techniques. The correct integration of plasmid pPWT080 at the GRE3-locus was checked by PCR using the primer pair SEQ ID 9 and SEQ ID 10, and the primer pair SEQ ID 9 and SEQ ID 11 was used to detect single or multicopy integration of plasmid pPWT080. For Southern analysis, a probe was prepared by PCR using SEQ ID 12 and SEQ ID 13, amplifying a part of the RKI1-gene of S. cerevisiae. Next to the native RKI1-gene, an extra signal was obtained resulting from the integration of plasmid pPWT080 (data not shown).

A transformant showing correct integration of one copy of plasmid pPWT080, in accordance with the expected hybridisation pattern, was designated BIE104A2F1.

In order to remove the selection markers introduced by the integration of plasmid pPWT080, strain BIE104A2F1 was cultured in YPD-medium, starting from a colony isolate. 25 μl of an overnight culture was used to inoculate fresh YPD-medium. After five serial transfers, the optical density of the culture was determined and cells were diluted to a concentration of approximately 5000 per ml. 100 μl of the cell suspension was plated on Yeast Carbon Base medium (Difco) containing 30 mM KPi (pH 6.8), 0.1% (NH4)2SO4, 40 mM fluoro-acetamide (Amersham) and 1.8% agar (Difco). Fluoro-acetamide resistant colonies were subjected to PCR analysis using the primers of SEQ ID 9 and SEQ ID 10. In case of correct PCR-profiles, Southern blot analysis was performed in order to verify the correct integration, again using the probe of the RKI1-gene. One of the strains that showed the correct pattern of bands on the Southern blot is the strain designated as BIE104A2P1.

Example 2 Adaptive Evolution in Shake Flask Leading to BIE104A2P1c and BIE201

2.1 Adaptive Evolution (Aerobically)

A single colony isolate of strain BIE104A2P1 was used to inoculate YNB-medium (Difco) supplemented with 2% galactose. The preculture was incubated for approximately 24 hours at 30° C. and 280 rpm. Cells were harvested and inoculated in YNB medium containing 1% galactose and 1% arabinose at a starting OD600 of 0.2 (FIG. 5). Cells were grown at 30° C. and 280 rpm. The optical density at 600 nm was monitored regularly.

When the optical density reached a value of 5, an aliquot of the culture was transferred to fresh YNB medium of the same composition. The amount of cells added was such that the starting OD600 of the culture was 0.2. After reaching an OD600 of 5 again, an aliquot of the culture was transferred to YNB medium containing 2% arabinose as sole carbon source (event indicated by (1) in FIG. 5).

Upon transfer to YNB with 2% arabinose as sole carbon source, growth could be observed after approximately two weeks. When the optical density at 600 nm reached a value of at least 1, cells were transferred to a shake flask with fresh YNB-medium supplemented with 2% arabinose at a starting OD600 of 0.2 (FIG. 5, day 28). Sequential transfer was repeated three times, as set out in FIG. 5. The resulting strain, which was able to grow fast on arabinose, was designated BIE104A2P1c.
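The serial transfers described above come down to simple dilution arithmetic. The sketch below, in Python, shows how the inoculum volume for a starting OD600 of 0.2 and the approximate number of doublings per transfer could be calculated. The culture volume of 25 ml is a hypothetical value (the text does not state it), and the calculation assumes that OD600 is proportional to cell number over the range used.

```python
import math


def inoculum_volume_ml(culture_od: float, target_od: float, final_volume_ml: float) -> float:
    """Volume of grown culture needed so the fresh culture starts at target_od.

    Uses C1 * V1 = C2 * V2 with optical density as the concentration measure.
    """
    if culture_od <= 0 or target_od <= 0:
        raise ValueError("optical densities must be positive")
    return final_volume_ml * target_od / culture_od


def doublings(start_od: float, end_od: float) -> float:
    """Approximate number of doublings between two optical densities."""
    return math.log2(end_od / start_od)


# Values from the text: transfer at OD600 = 5, restart at OD600 = 0.2.
# The 25 ml culture volume is an assumption for illustration.
volume = inoculum_volume_ml(culture_od=5.0, target_od=0.2, final_volume_ml=25.0)
print(f"inoculum: about {volume:.2f} ml")
print(f"doublings per transfer: about {doublings(0.2, 5.0):.1f}")
```

Under these assumptions each transfer from OD600 0.2 to 5 corresponds to between four and five doublings, which gives a rough idea of how many generations of selection each round contributes.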

2.2 Adaptive Evolution (Anaerobically)

After adaptation to growth on arabinose under aerobic conditions, a single colony from strain BIE104A2P1c was inoculated in YNB medium supplemented with 2% glucose. The preculture was incubated for approximately 24 hours at 30° C. and 280 rpm. Cells were harvested and inoculated in YNB medium containing 2% arabinose, with an initial optical density OD600 of 0.2. The flasks were closed with waterlocks, ensuring anaerobic growth conditions after the oxygen was exhausted from the medium and head space. After reaching an OD600 of at least 3, an aliquot of the culture was transferred to fresh YNB medium containing 2% arabinose (FIG. 6), each time at an initial OD600 value of 0.2. After several transfers the resulting strain was designated BIE104A2P1d (=BIE201).

Example 3 Performance Test of Strains in the BAM Showing that Adaptive Evolution has Led to (Improved) Arabinose Conversion. Co-Fermentation with Galactose

Single colony isolates of strains BIE104, BIE104A2P1c and BIE201 were used to inoculate YNB-medium (Difco) supplemented with 2% glucose. The precultures were incubated for approximately 24 hours at 30° C. and 280 rpm. Cells were harvested and inoculated in a synthetic model medium (Verduyn et al., Yeast 8:501-517, 1992; 5% glucose, 5% xylose, 3.5% arabinose, 1% galactose) at an initial OD600 of approximately 2 in the BAM. CO₂ production was monitored constantly. Sugar conversion and product formation were analyzed by NMR. The data represent the residual amount of sugars at the indicated time points (glucose, arabinose, galactose and xylose in grams per litre) and the formation of (by-)products (ethanol, glycerol). Growth was monitored by following the optical density of the culture at 600 nm (FIGS. 7, 8 and 9). The experiment ran for approximately 140 hours.

The experiments clearly show that reference strain BIE104 converted glucose rapidly, but was not able to convert arabinose, xylose and/or galactose within 140 hours (FIG. 7). However, strains BIE104A2P1c and BIE201 were capable of converting arabinose and galactose (FIGS. 8 and 9, respectively). Galactose and arabinose utilization started immediately after glucose depletion, which occurred in less than 20 hours. Both sugars were converted simultaneously. However, strain BIE201, which was improved for arabinose growth under anaerobic conditions, consumed both sugars more rapidly (FIG. 9). In all fermentations only glycerol was generated as by-product.

Example 4 Resequencing of the Strains and Identification of SNPs Involved in Arabinose Fermentation

As can be concluded from examples 1, 2 and 3, mere introduction of the genes encoding enzymes needed for or enhancing the utilization of arabinose is not sufficient to allow growth on arabinose as sole carbon source. As shown in example 2, a process called adaptive evolution is required to select cells that utilize arabinose as sole C-source.

Presumably, spontaneous mutations (SNPs, for Single Nucleotide Polymorphisms) in the genome are responsible for this phenotypic change. Alternatively, larger variations in the genome (not limited to the substitution, insertion or deletion of a single nucleotide) may have taken place.

In order to learn which mutations or SNPs are responsible for this phenotypic change, we resequenced the genomic DNA of the transformants with the technology known in the art as Solexa® technology, using the Illumina® Genome Analyzer.

To this end, chromosomal DNA was isolated from the strains BIE104, primary transformant BIE104A2P1, evolved transformant BIE104A2P1c and further evolved transformant BIE201 from YEP 2% glucose overnight cultures. The DNA was sent to ServiceXS (Leiden, the Netherlands) for resequencing using the Illumina® Genome Analyzer (50 bp reads, paired-end sequencing).

Per strain, about 1800 Mb of sequences were obtained, which corresponds to an average genome coverage of 140, which means that on average, every base has been read 140 times.
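The relation between sequence yield and average coverage stated above can be checked with a short calculation. The sketch below is in Python; the genome size of roughly 12.1 Mb for haploid S. cerevisiae is an assumption that is not stated in the text, and the helper name is hypothetical.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each base is read (total bases / genome size)."""
    return total_bases / genome_size


TOTAL_BASES = 1800e6   # about 1800 Mb per strain (from the text)
READ_LENGTH = 50       # bp per read (from the text)
GENOME_SIZE = 12.1e6   # assumption: approximate haploid S. cerevisiae genome size

n_reads = TOTAL_BASES / READ_LENGTH
coverage = mean_coverage(TOTAL_BASES, GENOME_SIZE)
print(f"about {n_reads / 1e6:.0f} million reads")
print(f"raw average coverage of roughly {coverage:.0f}-fold")
```

With these assumed numbers the raw estimate comes out slightly above the reported coverage of about 140, which would be consistent with a fraction of the reads not being aligned, although the text does not say so.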

Using NextGene software (SoftGenetics LLC, State College, Pa. 16803, USA), the sequencing reads were aligned using the S288c genome sequence as a template. Mutations (single nucleotide polymorphisms and insertions/deletions up to 30 bp) were detected using NextGene software and summarised in a mutation report. The alignments of the different strains were compared to each other to identify the unique variations between the strains. Every entry of the mutation report was checked manually, in order to rule out the possibility of misalignment of the reads, sequencing errors or mutation calls in areas where the sequencing coverage was too low to support them. False positive mutations were removed from the mutation report.

The sequence of the primary transformant (BIE104A2P1) was identical to the sequence of wild-type strain BIE104, with the exception of the sequences that were introduced and the sequences that were deleted by the integration of the plasmids and the subsequent removal of the markers by recombination.

In the evolved transformant, strain BIE104A2P1c, a limited number of SNPs was introduced:

Gene    ORF        Nucleotide change    Effect
SSY1    YDR160w    G → T                introduction stop-codon
-       YJR154w    A → G                D → G
CEP3    YMR168c    A → G                S → G
-       YPL277c    C → T                silent

In the further evolved transformant, strain BIE201, one additional SNP was observed, next to the 4 SNPs mentioned above:

GAL80 YML051w A → C T → P

The sequences of the five open reading frames containing the SNPs, both in the wild type strain BIE104 and in the evolved strains BIE104A2P1c and BIE201, are given in SEQ ID 14 and SEQ ID 15 (SSY1), SEQ ID 16 and SEQ ID 17 (YJR154w), SEQ ID 18 and SEQ ID 19 (CEP3), SEQ ID 20 and SEQ ID 21 (YPL277c), and SEQ ID 22 and SEQ ID 23 (GAL80).
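The effect of a single nucleotide substitution on the encoded protein (stop codon, amino acid change or silent change, as listed above) follows from the standard genetic code. The sketch below, in Python, shows one way such a classification could be made. The codons used in the example are hypothetical choices that are merely compatible with the reported changes; the actual codons are in the sequence listing and are not reproduced here. Strand orientation of the ORFs is ignored for simplicity.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a substitution at a 0-based position within a coding codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if old_aa == new_aa:
        return "silent"
    if new_aa == "*":
        return "stop codon introduced"
    return f"{old_aa} -> {new_aa}"


# Hypothetical codons, chosen only to illustrate each class of change.
examples = [
    ("GAA", 0, "T"),  # G -> T creating a stop codon
    ("GAT", 1, "G"),  # A -> G giving D -> G
    ("AGT", 0, "G"),  # A -> G giving S -> G
    ("GAC", 2, "T"),  # C -> T without amino acid change
    ("ACT", 0, "C"),  # A -> C giving T -> P
]
for codon, pos, base in examples:
    print(codon, pos, base, "=>", classify_snp(codon, pos, base))
```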

Example 5 Confirmation of the SNPs

In order to (re)confirm the SNPs that were detected in the example described above, two methods were employed. The first method comprised amplification of the regions containing the SNPs followed by single read (Sanger) sequencing on an ABI3730XL sequencer (outsourced to BaseClear BV, Leiden, the Netherlands). The second method consisted of High Resolution Melting Analysis (Hi-Res).

5.1 Single Read Sanger Sequencing

Genomic DNA isolated from cultures of strains BIE104A2P1 and BIE201 was used as a template for PCR reactions using Phusion® High-Fidelity DNA Polymerase (Finnzymes, Vantaa, Finland). The PCR reactions were performed according to the suggestions made by the supplier. The following primers were used to amplify the following genes, expected to have a SNP.

TABLE 2 Primers used for amplification of PCR products
Gene of interest    Forward primer    Reverse primer
SSY1 (YDR160w)      SEQ ID NO 24      SEQ ID NO 25
YJR154w             SEQ ID NO 26      SEQ ID NO 27
CEP3 (YMR168c)      SEQ ID NO 28      SEQ ID NO 29
YPL277c             SEQ ID NO 30      SEQ ID NO 31
GAL80 (YML051w)     SEQ ID NO 32      SEQ ID NO 33

The PCR products were cloned into the pTOPO Blunt II vector (Invitrogen, Carlsbad, USA). The correct clones were selected on the basis of restriction enzyme analysis. Correct clones were sent to BaseClear BV (Leiden, the Netherlands) for single stranded Sanger sequencing.

The TOPO cloning of the CEP3 fragment was not successful. No Sanger sequencing data was obtained for this gene.

The sequence of YPL277c appeared to be identical to the sequence of the wild-type strain BIE104.

The Sanger sequencing results confirmed the SNPs in the genes SSY1, YJR154w and GAL80, i.e. the SNPs were the same as described in Example 4.

5.2 Hi-Res Analysis

The Hi-Res technology is commercialized by Idaho Technologies (Salt Lake City, Utah 84108, USA). In short, mutations in PCR products are detected by the presence of heteroduplexes optimally detected by LCGreen® dye. Variations are identified by changes in the shape of the melting profile compared to a reference sample. Hi-Res Melting® (HRM) on the LightScanner® is being used for mutation discovery in numerous research and clinical applications.

For each SNP, two primers were designed in order to amplify a region of around 100 to 200 bp containing the SNP or the wild-type sequence. In addition, a third primer was designed to function as a probe in the experiments, which detect the melting profile. The latter primer was designed such that it covers the SNP region and is exactly complementary to the wild-type sequence. The matching to the SNP sequence is imperfect, i.e. all but one of the nucleotides of the probe are complementary to the region of interest. Mismatched DNA strands will melt earlier than matched DNA strands, which results in different melting curves of wild type and SNP amplicons, which are detected using the LightScanner® (Idaho Technologies, Salt Lake City, Utah, USA).
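The probe design principle described above (a perfect match to the wild-type sequence and exactly one mismatch to the SNP allele) can be verified computationally before ordering a probe. The following Python sketch illustrates such a check. All sequences are hypothetical and the helper names are assumptions made for illustration; the actual probes are those of SEQ ID NO 34 to 38.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_mismatches(probe: str, target_strand: str) -> int:
    """Count mismatches between a probe and the strand it should anneal to."""
    expected = reverse_complement(target_strand)
    if len(probe) != len(expected):
        raise ValueError("probe and target region must have the same length")
    return sum(p != e for p, e in zip(probe, expected))


# Hypothetical 20-nt region around a SNP (A -> G at the eighth nucleotide).
wild_type = "ATGGCTGATACCGTTAAGCT"
snp_allele = "ATGGCTGGTACCGTTAAGCT"

# The probe is designed to be exactly complementary to the wild-type sequence.
probe = reverse_complement(wild_type)

print("mismatches to wild type:", probe_mismatches(probe, wild_type))
print("mismatches to SNP allele:", probe_mismatches(probe, snp_allele))
```

A probe that passes this check would be expected to melt from the SNP amplicon at a lower temperature than from the wild-type amplicon, which is the difference the melting analysis detects.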

The table below summarizes the primer sequences that were used to amplify the gene or ORF of interest, of which the SNP should be verified in strain BIE201.

TABLE 3 Primers for amplification of PCR products
Gene of interest    Forward primer    Reverse primer
SSY1 (YDR160w)      SEQ ID NO 24      SEQ ID NO 25
YJR154w             SEQ ID NO 26      SEQ ID NO 27
CEP3 (YMR168c)      SEQ ID NO 28      SEQ ID NO 29
YPL277c             SEQ ID NO 30      SEQ ID NO 31
GAL80 (YML051w)     SEQ ID NO 32      SEQ ID NO 33

The table below summarizes the SEQ ID NOs that have been used to verify the SNPs in strain BIE201 (the probes).

TABLE 4 Primers used as probes in Hi-Res analysis
Gene of interest    Probe wild-type sequence
SSY1 (YDR160w)      SEQ ID NO 34
YJR154w             SEQ ID NO 35
CEP3 (YMR168c)      SEQ ID NO 36
YPL277c             SEQ ID NO 37
GAL80 (YML051w)     SEQ ID NO 38

PCR reactions were carried out using chromosomal DNA of the strains BIE104 (wild type yeast strain) and strain BIE201 (the yeast strain capable of growing anaerobically on arabinose), using primer pairs of SEQ ID NO 24 and 25 (SSY1), 26 and 27 (YJR154w), 28 and 29 (CEP3), 30 and 31 (YPL277c) and 32 and 33 (GAL80), according to the instructions as provided by Idaho Technologies but in the absence of probe. The amplified fragments were checked on a 2% agarose gel for yield and integrity.

The Hi-Res analysis was performed as follows, analogous to the protocol provided by Idaho Technologies: 2 μl of probe (5 μM) was added to 10 μl PCR product in a PCR microplate (4titude Framestart 96, black frame, white wells (Bioké, Leiden, the Netherlands)). After mixing, the microplate was spun down. The plate was incubated for 30 seconds at 99° C. and cooled to room temperature (˜20° C.). Subsequently, the melting protocol on the LightScanner was followed with a start temperature of 55° C., an end temperature of 94° C. and exposure settings on “auto”. After the measurements were complete, data analysis was performed. The temperature boundaries between which the change in fluorescence was analysed were manually set at the temperature interval where the probe was expected to melt from the PCR products.

An example of a melting curve is shown in FIG. 11. FIG. 11 displays an example of both “Normalized Melting Curves” (melting curves; top panel) and a “Normalized melting Peaks” curve (lower panel). The latter is derived from the first graph and is showing the change in fluorescence signal as a function of the temperature. Strains BIE104A2P1 and BIE201 are displayed. The gene tested in this figure is YJR154w. The difference in melting temperature of the probe is clear between the two strains tested, BIE201 and BIE104A2P1.

All expected SNPs, except the one in YPL277c, were confirmed. The sequence of this ORF (YPL277c) in BIE201 appeared to be identical to the sequence of the wild-type strain BIE104.

In summary, in Example 5 the SNPs in the ORFs SSY1 (YDR160w), YJR154w, CEP3 (YMR168c) and GAL80 (YML051w) were confirmed. The SNP that was previously identified (Example 4) in the ORF of YPL277c was refuted by two independent methods.

Example 6 Amplification of Parts of Chromosome VII

6.1 Amplification of a Part of Chromosome VII

As was described in Example 4, resequencing of the wild-type strain BIE104, primary transformant BIE104A2P1, evolved transformant BIE104A2P1c and further evolved strain BIE201 yielded several interesting SNPs.

Using the coverage plots, which indicate the read depth of every single nucleotide of the genome, we have searched for areas in the genome that were over- or underrepresented. Indeed, we have identified a region on chromosome VII that was overrepresented (see FIG. 12).

From the read depth, it was concluded that parts of chromosome VII, surrounding the centromere, were amplified. A region on the left arm of chromosome VII was amplified three times. A part of the right arm of chromosome VII was amplified twice, and an adjacent part was amplified three times (see FIG. 12).
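The inference of copy number from read depth described above can be sketched as a ratio between the depth in a region of interest and the typical depth across the genome. The Python example below is a minimal illustration; the depth values are invented for the purpose of the example and the function name is hypothetical. It is not the analysis that was actually performed on the coverage plots.

```python
from statistics import median


def copy_number_from_depth(region_depths, genome_depths):
    """Estimate relative copy number as mean region depth / median genome depth.

    The median is used for the genome-wide baseline so that a limited number
    of amplified regions does not distort the reference level.
    """
    baseline = median(genome_depths)
    region_mean = sum(region_depths) / len(region_depths)
    return region_mean / baseline


# Invented per-position read depths (a real analysis would use whole-genome data).
genome_depths = [138, 140, 142, 139, 141]
amplified_region = [415, 420, 425]
single_copy_region = [139, 141, 140]

print(copy_number_from_depth(amplified_region, genome_depths))
print(copy_number_from_depth(single_copy_region, genome_depths))
```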

The part on the right arm of chromosome VII that was amplified three times contains the arabinose expression cassette, i.e. the genes araA, araB and araD under control of strong constitutive promoters.

Firstly, the copy number of several genes was confirmed by Q-PCR. Secondly, it was investigated whether the amplification took place on the same chromosome (duplication or triplication) or whether the amplified region was integrated into another chromosome (translocation).

6.2 Copy Number Determination by Q-PCR

In order to verify the amplification of parts of chromosome VII, as indicated by the coverage plot of FIG. 12, Q-PCR experiments were performed. Specifically, this method measures the relative copy number of a gene of interest by comparing it with another gene, with a known copy number.

To this end, the iCycler iQ system (Bio-Rad Laboratories, Hercules, Calif., USA) was used, together with the iQ SYBR Green Supermix (Bio-Rad). Experiments were set up as suggested in the manual of the provider.

From the coverage plot (read depth) it was deduced that genes SDS23 and YGL057c were expected to be part of the amplified region on the left arm of chromosome VII. As a reference single copy gene, the ACT1 gene was chosen.

The primers for the detection of the genes YGL057c, SDS23 and ACT1 are summarized in the table below.

TABLE 5 Primers used for amplification in the Q-PCR experiment
Gene of interest    Forward primer    Reverse primer
YGL057c             SEQ ID NO 39      SEQ ID NO 40
SDS23               SEQ ID NO 41      SEQ ID NO 42
ACT1                SEQ ID NO 43      SEQ ID NO 44

The Q-PCR conditions were as follows:

1) 95° C. for 3 min

steps 2 to 4, repeated for 40 cycles:

2) 95° C. for 10 sec

3) 58° C. for 45 sec

4) 72° C. for 45 sec

5) 65° C. for 10 sec

6) Increase of temperature with 0.5° C. per 10 sec to 95° C.

The melting curve was determined by measuring fluorescence starting at 65° C. for 10 seconds, after which the temperature was increased by 0.5° C. every 10 seconds until a temperature of 95° C. was reached. From the reads, the copy number of each gene of interest was calculated and/or estimated. The results are presented in the table below.

TABLE 6 Relative copy number of selected genes in strains BIE104A2P1 and BIE201
Gene of interest    Copy number in BIE104A2P1    Copy number in BIE201
YGL057c             1.2                          5.1
SDS23               1.2                          4.4
ACT1                1.0 (reference)              1.0 (reference)

The results corroborate the amplification as was apparent from the read depth analysis in Example 6 (section 6.1). The observed values are higher than the expected copy number of 3.0. The difference may be caused by a number of factors, as previously disclosed by Klein (Klein, D. (2002) TRENDS in Molecular Medicine Vol. 8 No. 6, 257-260).
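The text does not state how the relative copy numbers were derived from the Q-PCR reads. One commonly used approach for this kind of comparison is the comparative Ct (2^-ΔΔCt) calculation, sketched below in Python under the assumption of equal and near-perfect amplification efficiency for the target and the reference gene. The Ct values are invented for illustration and the function name is hypothetical.

```python
def ddct_copy_number(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_calibrator: float, ct_ref_calibrator: float,
                     efficiency: float = 2.0) -> float:
    """Relative copy number of a target gene by the comparative Ct method.

    sample:     strain of interest (for example an evolved strain)
    calibrator: strain in which the target is taken to be single copy
    efficiency: fold increase per cycle (2.0 means perfect doubling)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return efficiency ** -(delta_sample - delta_calibrator)


# Invented Ct values: the target crosses the threshold two cycles earlier in
# the sample than in the calibrator, relative to the ACT1 reference.
estimate = ddct_copy_number(ct_target_sample=18.0, ct_ref_sample=20.0,
                            ct_target_calibrator=20.0, ct_ref_calibrator=20.0)
print(f"relative copy number: {estimate:.1f}")
```

Because the estimate depends exponentially on the Ct difference, small deviations in amplification efficiency or threshold setting translate into sizeable differences in the calculated copy number, which may be one reason why measured values deviate from the value expected on the basis of read depth.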

6.3 Analysis of the Nature of the Duplication

In order to determine whether the amplified regions are located on the chromosome on which the genes were originally located, i.e. chromosome VII, or have been translocated to another chromosome, CHEF electrophoresis (Clamped Homogeneous Electric Fields electrophoresis; CHEF-DR® III Variable Angle System; Bio-Rad, Hercules, Calif. 94547, USA) was applied. Agarose plugs of yeast strains (see below) were prepared using the CHEF Yeast Genomic DNA Plug Kit (Bio-Rad) according to the instructions of the supplier. 1% agarose gels (Pulse Field Agarose, Bio-Rad) were prepared in 0.5× TBE (Tris-Borate-EDTA) according to the supplier's instructions. Gels were run according to the following settings:

Block 1

- initial time 60 sec
- final time 80 sec
- ratio 1
- run time 15 hours

Block 2

- initial time 90 sec
- final time 120 sec
- ratio 1
- run time 9 hours

As a marker for size determination of the chromosomes, agarose plugs of strain YNN295 (Bio-Rad) were included in the experiment.

After electrophoresis, gels were stained for 30 minutes using ethidium bromide at a final concentration of 70 μg per litre. In FIG. 13, an example of a stained gel is shown.

After staining, gels were blotted onto Amersham Hybond N+ membranes (GE Healthcare Life Sciences, Diegem, Belgium).

In order to be able to establish if the amplified genes are located on one chromosome or translocated to other chromosomes, probes were made for hybridization with the blotted membranes. Probes (see table below) were prepared using the PCR DIG Probe Synthesis Kit (Roche, Almere, the Netherlands) according to the instructions of the supplier.

The following probes were prepared.

TABLE 7 Primers for amplification of the indicated probes
Probe      Systematic name gene    Forward primer    Reverse primer    Size PCR product (bp)    Chromosome
araA       -                       SEQ ID NO 45      SEQ ID NO 46      641                      VII
ACT1       YFL039c                 SEQ ID NO 47      SEQ ID NO 48      392                      VI
PNC1       YGL037c                 SEQ ID NO 49      SEQ ID NO 50      384                      VII
HSF1       YGL073w                 SEQ ID NO 51      SEQ ID NO 52      381                      VII
YGR031w    YGR031w                 SEQ ID NO 53      SEQ ID NO 54      392                      VII

The araA-gene is expected to be amplified three times in BIE104A2P1c and BIE201.

The ACT1-gene is located on chromosome VI and not expected to be amplified. Hence, this probe serves as a control.

PNC1 is located on the left arm of chromosome VII and is expected to be amplified three times in BIE104A2P1c and BIE201.

HSF1 is located on the left arm of chromosome VII, upstream of the amplified region. Hence, this gene is expected to be present in the genome as a single copy in the strains tested.

YGR031w is located on the right arm of chromosome VII. This gene is expected to be present in two copies in the genome of strains BIE104A2P1c and BIE201.

Membranes were prehybridized in DIG Easy Hyb Buffer (Roche) according to the instructions of the supplier. The probes were denatured at 99° C. for 5 minutes, chilled on ice for 5 minutes, and added to the prehybridized membranes. Hybridization was done overnight at 42° C.

Washing of the membranes and blocking of the membranes prior to detection of the hybridized probes were done using the DIG Wash and Block Buffer Set (Roche) according to the instructions of the supplier. The detection was done by incubation with anti-digoxigenin-AP Fab fragments (Roche) followed by the addition of detection reagents using the CDP-Star ready-to-use kit (Roche). Detection of the chemiluminescent signals was performed using the Bio-Rad Chemidoc XRS+ System, using the appropriate settings provided by the Chemidoc apparatus.

The results are shown in FIGS. 13, 14, 15, 16, 17 and 18.

From FIG. 13 it can already be inferred that there are differences in the size of the chromosomes in the strain lineage from BIE104 to BIE201. In strain BIE104A2P1(a), the primary transformant, no large differences are observed with respect to the size of the chromosomes when compared to BIE104. In strains BIE104A2P1c and BIE201 however, the size of chromosome VII has increased. In strain BIE104, chromosome VII is close to chromosome XV; in BIE104A2P1c and BIE201 however, the chromosome has increased in size and is almost as large as chromosome IV.

Hybridization with probes of the genes araA (FIG. 14), PNC1 (FIG. 16) and HSF1 (FIG. 17) gives the same picture. This suggests that the amplification has taken place within the same chromosome, i.e. that all amplified regions are still on chromosome VII. If a translocation had occurred, multiple signals would have been expected, which is not the case. In strain BIE104A2P1(a), a smaller band is observed under the band of chromosome VII, with all three probes. This suggests that a second, smaller version of chromosome VII is present. Since its intensity is lower than that of the larger band, it may be present in only a fraction of the cells. It may also be explained by assuming an electrophoresis artefact.

The hybridisation with the ACT1 probe (FIG. 15) results in a single band in all strains which, as expected, represents chromosome VI.

Finally, the hybridisation with the YGR031w probe (FIG. 18) resulted in many bands. Apparently, cross-hybridization occurred, resulting in multiple signals in each strain. Therefore, this result cannot be used for the purpose of this experiment.

Though some differences in intensity are observed between the strains, it is difficult to conclude from these data whether amplification can be shown. Although an increase in the signal intensity may suggest an increase of the copy number of a certain gene, other factors may also influence the signal strength, like the amount of DNA applied on the gel, blotting efficiency, detection saturation, and the like.

Taken together, the results of Example 6 clearly indicate that the amplification has taken place within chromosome VII. There is no evidence for a translocation of the genetic context of the genes araA, araB and araD (including surrounding sequences) to another chromosome.

Example 7 Phenotypic Validation of the SNPs and Amplification

In order to validate whether, and if so to what extent, the discovered SNPs and amplification contribute to the ability of yeast cells to convert arabinose into ethanol (apart from the introduced homologous and heterologous pathways), cross-breeding experiments were performed. To this end, the following steps were carried out: mating type switch of strain BIE201, cross-breeding of the mating type switched BIE201 with the non-evolved parent strain BIE104A2P1, sporulation of the diploid strain followed by dissection of the four ascospores, determination of the ability to utilize arabinose as sole carbon and energy source in the haploid offspring, SNP detection in the haploid offspring using Hi-Res, and analysis of these datasets.

By crossing the evolved, mating type switched BIE201 with the non-evolved primary transformant BIE104A2P1, a diploid cell is constructed which is completely homozygous, except for the identified genomic changes (SNPs and amplification). By subsequently sporulating this diploid cell followed by dissection of the ascospores, haploid cells are obtained which may have none, some or all of the genomic changes that were introduced during adaptive evolution. The distribution of the genomic changes over the four haploid derivatives of one diploid cell is random, although per SNP, DIP or amplification, a 2:2 segregation is expected over the four haploid derivatives. For more theoretical background, see e.g. Mortimer R. K. and Hawthorne D. C. (1975) Genetic Mapping in Yeast. Methods Cell Biol. 11:221-33.
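The expected segregation can be illustrated with a minimal simulation sketch. It assumes, purely for illustration, that the five genomic changes are unlinked and that no gene conversion occurs; the function name and the random seed are hypothetical and are not part of the described experiments.

```python
import random

# The five genomic changes followed in the cross (four SNPs and the amplification).
MARKERS = ["YJR154w", "SSY1", "CEP3", "GAL80", "Amplification"]


def simulate_tetrad(rng):
    """Return four spores; every marker is inherited by exactly two of them (2:2)."""
    spores = [dict() for _ in range(4)]
    for marker in MARKERS:
        # Assumption: markers segregate independently of each other.
        carriers = set(rng.sample(range(4), 2))
        for index, spore in enumerate(spores):
            spore[marker] = index in carriers
    return spores


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tetrads = [simulate_tetrad(rng) for _ in range(5)]
for tetrad in tetrads:
    # Number of evolved alleles carried by each of the four spores.
    print([sum(spore.values()) for spore in tetrad])
```

Within one tetrad the number of changes per spore varies from none to all, while each individual change is always present in exactly two spores.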

7.1 Mating Type Switch of Strain BIE201

Plasmid pGal-HO (KAN) is a derivative of the plasmid pGAL-HO (Herskowitz, I. and Jensen, R. E. (1991) Methods in Enzymology, 194:132-146). The URA3-marker in pGAL-HO has been replaced by the kanMX marker, by cutting pGAL-HO with EcoRV followed by the ligation of the kanMX fragment from pUG6 (Güldener, U. et al (1996) Nucleic Acids Research 24: 2519-2524). The kanMX marker, allowing for G418 selection in S. cerevisiae, was cut from pUG6 with the restriction enzymes XbaI and XhoI, followed by filling in the overhanging ends with Klenow polymerase. The resulting plasmid is pGal-HO (KAN).

Strain BIE201 (relevant genotype in relation to this experiment: matA) was transformed according to the method of Gietz and Woods (2002) with the plasmid pGal-HO (KAN). Transformants were selected on YEP/agar-plates containing glucose (2%) and G418 (100 μg/ml). Colonies appeared after two days of incubation at 30° C. Eight colonies were restreaked on fresh YEP/agar-plates with glucose and G418. Two colonies of each transformation were used to inoculate 20 ml YEP-medium containing 1% galactose and 0.1% glucose. After 2 days of incubation at 30° C. and 280 rpm, cells were restreaked on YEPD-plates. Plates were incubated for 2 days at 30° C., after which colonies were visible. PCR reactions were performed for the determination of the mating type using the primers of SEQ ID NO 55 and 56 (for identification of matA cells), and primers of SEQ ID NO 55 and 57 (for identification of matα (alpha) cells).

Several matα (alpha) variants of BIE201 were obtained. In order to test whether these derivatives have indeed switched their mating type, they were restreaked on fresh YEPD-plates. Also, strain BIE104A2P1 (the primary transformant, relevant genotype in this experiment: matA) was restreaked on a separate fresh YEPD-plate.

Subsequently, both strains were allowed to mate by mixing a loopful of each strain on a fresh YEPD-agar plate. After 6 hours of incubation at 30° C., mating was scored under the microscope. Some isolates indeed appeared to form zygotes, i.e. structures in which two cells of opposite mating type have fused to form a diploid strain. These BIE201 derivatives indeed changed the mating type to matα (alpha).

7.2 Cross-Breeding of the Mating Type Switched BIE201 with the Non-Evolved Parent Strain BIE104A2P1

The preparations in which the formation of hybrids (zygotes) was observed by microscopy (section 7.1) were plated on YEPD-agar plates. Plates were incubated at 30° C. for two days. The larger colonies were picked and restreaked on fresh YEPD-plates. Subsequently, colony PCR was performed using the primers of SEQ ID NO 55 and 56 and SEQ ID NO 55 and 57. Diploids will form a PCR product with both primer pairs. Several of these colonies were obtained and used to inoculate YEP-medium with 2% glucose (30° C., 280 rpm).
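The interpretation of the two colony PCR reactions can be written down as a small decision rule. The sketch below is illustrative only; the function name and the wording of the returned labels are hypothetical.

```python
def mating_type(product_55_56: bool, product_55_57: bool) -> str:
    """Interpret the colony PCR results for one colony.

    product_55_56: a product was obtained with primers SEQ ID NO 55 and 56 (matA).
    product_55_57: a product was obtained with primers SEQ ID NO 55 and 57 (mat-alpha).
    """
    if product_55_56 and product_55_57:
        return "diploid (matA/mat-alpha)"
    if product_55_56:
        return "haploid matA"
    if product_55_57:
        return "haploid mat-alpha"
    return "no product, repeat PCR"


print(mating_type(True, True))
```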

7.3 Sporulation of the Diploid Strain and Dissection of the Ascospores

After overnight growth at 30° C. and 280 rpm, 2.5 ml was transferred to 25 ml 1.5% KAc in tap water (sterilized). Incubation was continued at 30° C. and 280 rpm. Each day, the degree of sporulation was checked microscopically. When the ratio of asci versus vegetative cells was larger than 2, 60 asci were dissected using the Singer MSM System© series 300 (Somerset, UK) apparatus, using the instructions and protocols of the supplier. Dissection was done on YEPD-plates. Plates were incubated for 2 days at 30° C. An example of the result is set out in FIG. 19.

FIG. 19 shows 10 asci that were dissected. The ascospores from the ascus were separated from each other and put on the agar plate at distinctive distances. Colonies in a “column” (10 columns are shown) originate from one ascus.

As is apparent from FIG. 19, not all four spores were viable in all cases. In a minority of the cases, only three and sometimes even only two ascospores grew into viable colonies.

Also, some differences in colony size were observed between the colonies derived from the ascospores of one ascus.

7.4 Determination of the Ability to Utilize Arabinose as Sole Carbon and Energy Source in the Haploid Offspring

All complete sets of haploid derivatives, i.e. those cases in which four viable spores were obtained from an ascus, were inoculated on YEPD-agar in 96-well microplates. Strains BIE104A2P1 and BIE201 were included as controls on each microplate, at least in duplicate. The plates were incubated for 2 days at 30° C. These plates are called the “masterplates”.

96-Well microplates containing 200 μl Verduyn-medium and 2% glucose were inoculated with colony material from the masterplates, with the aid of a disposable pin tool, which allows the transfer of cell material of all 96 strains in a microplate in one movement.

The microplate containing the liquid Verduyn medium with 2% glucose was grown for two days at 30° C. and 550 rpm, in an Infors microplate shaker, at 80% humidity.

Subsequently, 10 μl of the glucose grown microplate cultures were transferred to microplates containing 200 μl Verduyn medium containing 2% arabinose as a carbon source. The incubation in an Infors shaker at 30° C., 550 rpm and 80% humidity lasted for four days. Each day, the growth was monitored by measuring the optical density at 620 nm using a BMG FLUOstar microplate reader (BMG, Offenburg, Germany). The ability to utilize arabinose was expressed by dividing the final optical density after 4 days of incubation on arabinose as sole carbon source by the initial optical density of the same microplate. An example of the results is summarized in table 8.
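The growth measure defined above is a simple ratio of optical densities. A minimal sketch follows; the function name and the example readings are hypothetical and are not measured values.

```python
def growth_ratio(initial_od620: float, final_od620: float) -> float:
    """Growth as defined in the text: final OD620 divided by initial OD620."""
    if initial_od620 <= 0:
        raise ValueError("initial optical density must be positive")
    return final_od620 / initial_od620


# Hypothetical readings for one well: OD620 at inoculation and after 4 days.
print(round(growth_ratio(0.04, 1.0), 1))
```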

TABLE 8 Of each haploid derivative from the dissected asci and the controls BIE104A2P1 and BIE201, the growth (defined as the final optical density at 620 nm divided by the initial optical density at 620 nm) was determined.

Haploid strain   Growth
A1               27
A2               7
A3               5
A4               26
B1               6
B2               29
B3               9
B4               5
BIE201           25
BIE104A2P1a      5
C1               9
C2               11
C3               25
C4               12
D1               17
D2               8
D3               11
D4               15
E1               18
E2               6
E3               9
E4               10
F1               9
F2               8
F3               10
F4               7
G1               9
G2               9
G3               17
G4               32

From table 8 it is clear that there is, as can be expected, a large difference between the two control strains, BIE104A2P1 and BIE201. BIE104A2P1 reaches a level of 5, which in practice means that no growth was obtained. Though a factor 5 suggests that some growth has occurred, this will most likely be caused by carry over of nutrients (residual glucose, ethanol) from the preculture. Strain BIE201 reaches a growth ratio of 25, which is significantly higher than the strain BIE104A2P1.

The haploid derivatives display a wide range of growth phenotypes, ranging from low growth (similar to BIE104A2P1) to high levels of growth (similar to and exceeding the level of BIE201). Also, strains with intermediate growth levels were obtained. For instance, in the first ascus, ascus A, resulting in the four haploid strains A1, A2, A3 and A4, a 2:2 segregation of the arabinose growth phenotype is obtained. In some other asci, the segregation between low and high growth levels does not follow a 2:2 pattern. For instance, in ascus B, one strain with a high growth phenotype is obtained, one with an intermediate level (value of 9), and two haploids with a low growth phenotype. Similar observations can be made for the haploid strains derived from the other asci.

7.5 SNP Detection in the Haploid Offspring using Hi-Res

96-Well microplates containing YEP-medium supplemented with 2% glucose were inoculated with colony material from the masterplates (section 7.4). Cells were allowed to grow in an Infors shaker at 30° C., 550 rpm and 80% humidity for 2 days. As controls, strain BIE104A2P1 and BIE201 were included.

Chromosomal DNA was isolated using the above protocol in a downscaled fashion. The chromosomal DNA served as a template for Hi-Res analysis as described in section 5.2. The Hi-Res analysis allowed the identification of the SNPs in each haploid segregant from the cross BIE201 (matα) X BIE104A2P1 (matA). Likewise, the presence of the amplified regions on chromosome VII was determined according to the methods described in section 6.2. Of each haploid segregant, the genotype with respect to the SNPs and amplification was determined. The results are presented in table 9.

TABLE 9 Overview of the presence of the SNPs and the amplification in the haploid derivatives of the cross BIE104A2P1 × BIE201. As controls, BIE104A2P1 and BIE201 were included.

Haploid strain   YJR154w   SSY1   CEP3   GAL80   Amplification
A1               WT        WT     WT     SNP     +
A2               SNP       SNP    SNP    WT      −
A3               WT        WT     WT     WT      −
A4               WT        SNP    WT     SNP     +
B1               SNP       WT     SNP    SNP     −
B2               WT        WT     WT     SNP     +
B3               WT        SNP    SNP    WT      +
B4               SNP       SNP    WT     WT      −
BIE201           SNP       SNP    SNP    SNP     +
BIE104A2P1a      WT        WT     WT     WT      −
C1               SNP       SNP    SNP    WT      −
C2               WT        WT     SNP    SNP     −
C3               WT        WT     WT     SNP     +
C4               SNP       SNP    WT     WT      +
D1               WT        SNP    SNP    WT      +
D2               SNP       SNP    WT     SNP     −
D3               SNP       WT     SNP    SNP     −
D4               WT        WT     WT     WT      −
E1               WT        SNP    WT     WT      +
E2               WT        WT     SNP    SNP     −
E3               SNP       SNP    WT     SNP     −
E4               SNP       WT     SNP    WT      +
F1               SNP       WT     WT     WT      −
F2               WT        WT     SNP    SNP     −
F3               WT        SNP    SNP    WT      −
F4               SNP       SNP    WT     SNP     −
G1               SNP       SNP    WT     SNP     −
G2               WT        WT     WT     WT      −
G3               WT        SNP    SNP    WT      +
G4               SNP       WT     SNP    SNP     +

In most asci, a 2:2 segregation of the SNPs and the amplification is observed. There are some exceptions to this, which may be caused by e.g. meiotic gene conversion.
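The exceptions to 2:2 segregation can be located by counting, per ascus and per marker, how many of the four spores carry the evolved allele. The following is a minimal sketch that uses the genotype calls of table 9 (control strains omitted); the variable names are illustrative.

```python
from collections import defaultdict

MARKERS = ["YJR154w", "SSY1", "CEP3", "GAL80", "Amplification"]

# Genotype calls copied from table 9 ("-" stands for the minus sign in the table).
TABLE_9 = """
A1 WT WT WT SNP +
A2 SNP SNP SNP WT -
A3 WT WT WT WT -
A4 WT SNP WT SNP +
B1 SNP WT SNP SNP -
B2 WT WT WT SNP +
B3 WT SNP SNP WT +
B4 SNP SNP WT WT -
C1 SNP SNP SNP WT -
C2 WT WT SNP SNP -
C3 WT WT WT SNP +
C4 SNP SNP WT WT +
D1 WT SNP SNP WT +
D2 SNP SNP WT SNP -
D3 SNP WT SNP SNP -
D4 WT WT WT WT -
E1 WT SNP WT WT +
E2 WT WT SNP SNP -
E3 SNP SNP WT SNP -
E4 SNP WT SNP WT +
F1 SNP WT WT WT -
F2 WT WT SNP SNP -
F3 WT SNP SNP WT -
F4 SNP SNP WT SNP -
G1 SNP SNP WT SNP -
G2 WT WT WT WT -
G3 WT SNP SNP WT +
G4 SNP WT SNP SNP +
"""

# counts[ascus][i] = number of spores in that ascus carrying marker i.
counts = defaultdict(lambda: [0] * len(MARKERS))
for line in TABLE_9.split("\n"):
    if not line.strip():
        continue
    strain, *calls = line.split()
    ascus = strain[0]  # the first letter of the strain name identifies the ascus
    for i, call in enumerate(calls):
        if call in ("SNP", "+"):
            counts[ascus][i] += 1

# Every (ascus, marker) combination that deviates from 2:2.
exceptions = [
    (ascus, MARKERS[i], n)
    for ascus in sorted(counts)
    for i, n in enumerate(counts[ascus])
    if n != 2
]
for ascus, marker, n in exceptions:
    print(f"ascus {ascus}: {marker} present in {n} of 4 spores")
```

On the data of table 9, the deviations are YJR154w and CEP3 in ascus A and the amplification in asci D and F.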

7.6 Analysis of these Datasets

Combining the datasets of sections 7.4 and 7.5 (tables 8 and 9 respectively) yields the following table, table 10. In table 10 however, the results have been sorted from high growth to low growth on arabinose.

TABLE 10 Overview of the SNPs, the amplification and the growth phenotype of haploid derivatives of the cross BIE104A2P1 × BIE201, and the respective parent strains.

Strain        YJR154w   SSY1   CEP3   GAL80   Amplification   Growth
G4            SNP       WT     SNP    SNP     +               32
B2            WT        WT     WT     SNP     +               29
A1            WT        WT     WT     SNP     +               27
A4            WT        SNP    WT     SNP     +               26
BIE201        SNP       SNP    SNP    SNP     +               25
C3            WT        WT     WT     SNP     +               25
E1            WT        SNP    WT     WT      +               18
G3            WT        SNP    SNP    WT      +               17
D1            WT        SNP    SNP    WT      +               17
D4            WT        WT     WT     WT      −               15
C4            SNP       SNP    WT     WT      +               12
D3            SNP       WT     SNP    SNP     −               11
C2            WT        WT     SNP    SNP     −               11
E4            SNP       WT     SNP    WT      +               10
F3            WT        SNP    SNP    WT      −               10
E3            SNP       SNP    WT     SNP     −               9
G2            WT        WT     WT     WT      −               9
B3            WT        SNP    SNP    WT      +               9
G1            SNP       SNP    WT     SNP     −               9
C1            SNP       SNP    SNP    WT      −               9
F1            SNP       WT     WT     WT      −               9
F2            WT        WT     SNP    SNP     −               8
D2            SNP       SNP    WT     SNP     −               8
F4            SNP       SNP    WT     SNP     −               7
A2            SNP       SNP    SNP    WT      −               7
B1            SNP       WT     SNP    SNP     −               6
E2            WT        WT     SNP    SNP     −               6
BIE104A2P1a   WT        WT     WT     WT      −               5
A3            WT        WT     WT     WT      −               5
B4            SNP       SNP    WT     WT      −               5

The results of table 10 strongly suggest that the amplification is the key event determining the ability to grow on arabinose at a relatively high growth rate. Most of the strains having the amplification are located in the top 9 of table 10. Two-thirds of these top 9 strains also have a SNP in the GAL80 gene, suggesting an interaction between the presence of the SNP in the GAL80 gene and the presence of the amplification.

In order to determine, statistically, which of the factors are relevant for high growth and whether there are synergistic effects, an analysis of variance (ANOVA) was applied. Though the design is not balanced, based on the statistical testing of the data, it is clear that the presence of the amplification (p<<0.01) has a positive effect on the growth. The results also reveal that a strong interaction exists between the GAL80 SNP and the presence of the amplification (p<<0.01), while the other SNPs have no significant effect (p>0.01).

A median growth of 8.4 is estimated in case of absence of the amplification, while in the presence of the amplification, the median growth is 17.6. A median growth of 8.7 is estimated in case of absence of both the GAL80 SNP and the amplification, while in case both are present, the median growth is 26.8.
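As an illustration of how such group comparisons can be read from table 10, the sketch below computes raw group medians of the growth values per GAL80 and amplification status. It is not the ANOVA that was applied; it includes the two parent strains, and raw medians need not coincide with the model-based estimates quoted above. The function and variable names are illustrative.

```python
from statistics import median

# Strain, GAL80 call, amplification, growth; copied from table 10
# ("-" stands for the minus sign in the table).
TABLE_10 = """
G4 SNP + 32
B2 SNP + 29
A1 SNP + 27
A4 SNP + 26
BIE201 SNP + 25
C3 SNP + 25
E1 WT + 18
G3 WT + 17
D1 WT + 17
D4 WT - 15
C4 WT + 12
D3 SNP - 11
C2 SNP - 11
E4 WT + 10
F3 WT - 10
E3 SNP - 9
G2 WT - 9
B3 WT + 9
G1 SNP - 9
C1 WT - 9
F1 WT - 9
F2 SNP - 8
D2 SNP - 8
F4 SNP - 7
A2 WT - 7
B1 SNP - 6
E2 SNP - 6
BIE104A2P1a WT - 5
A3 WT - 5
B4 WT - 5
"""

rows = [line.split() for line in TABLE_10.split("\n") if line.strip()]


def group_median(gal80=None, amp=None):
    """Return (median growth, number of strains) for the selected group.

    gal80: "SNP", "WT" or None (ignore); amp: "+", "-" or None (ignore).
    """
    values = [
        int(growth)
        for _, gal, a, growth in rows
        if (gal80 is None or gal == gal80) and (amp is None or a == amp)
    ]
    return median(values), len(values)


for gal80 in ("WT", "SNP"):
    for amp in ("-", "+"):
        print(gal80, amp, group_median(gal80, amp))
```

In these raw data the GAL80 SNP makes little difference without the amplification, but a large difference in its presence, which is the pattern of an interaction effect.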

Also, the interaction of the presence of the CEP3 SNP and the presence of the amplification appears to have a synergistic effect, although to a lesser extent than the interaction between the presence of the GAL80 SNP and the amplification.

In conclusion, the effects and the significance of effects on growth due to the presence of SNPs and/or the amplification could be determined. The amplification has a significant effect on the growth. This effect is increased through combination of the amplification and the GAL80 SNP. A minor interaction effect was detected for the combination of amplification and the CEP3 SNP and the combination of amplification, the GAL80 SNP and the CEP3 SNP.

Example 8 Deletion of GAL80 Leads to an Even Better Arabinose Conversion

In Example 7 it was shown that the identified SNP in the GAL80 gene has a positive additive effect on the growth on arabinose, if the amplification of a part of chromosome VII is also present.

GAL80 encodes a transcriptional repressor involved in transcriptional regulation in response to galactose (Timson D J, et al. (2002) Biochem J 363(Pt 3):515-20). In conjunction with Gal4p and Gal3p, Gal80p coordinately regulates the expression of genes containing a GAL upstream activation site in their promoter (UAS-GAL), which includes the GAL metabolic genes GAL1, GAL10, GAL2, and GAL7 (reviewed in Lohr D, et al. (1995) FASEB J 9(9):777-87). Cells null for gal80 constitutively express GAL genes, even in non-inducing media (Torchia T E, et al. (1984) Mol Cell Biol 4(8):1521-7).

The hypothesis is that the SNP that was identified in the GAL80 gene influences the interaction between Gal80p, Gal3p and Gal4p. Hence, the expression of the galactose metabolic genes, including GAL2 encoding galactose permease, will also be changed compared to a yeast cell with a wild type GAL80 allele. Gal2p (galactose permease) is the main sugar transporter for arabinose (Kou et al (1970) J Bacteriol. 103(3):671-678; Becker and Boles (2003) Appl Environ Microbiol. 69(7): 4144-4150).

Apparently, the SNP in the GAL80 gene has a positive effect on the ability to convert L-arabinose. In order to investigate whether the arabinose growth phenotype could further be improved, the coding sequence of the GAL80 gene was deleted in its entirety, using a PCR-mediated gene replacement strategy.

8.1 Disruption of the GAL80 Gene

Primers of SEQ ID NO 58 and 59 (the forward and reverse primers respectively) were used for amplification of the kanMX-marker from plasmid p427-TEF (Dualsystems Biotech, Schlieren, Switzerland). The flanks of the primers are homologous to the 5′-region and 3′-region of the GAL80 gene. Upon homologous recombination, the ORF of the GAL80 gene will be replaced by the kanMX marker, similar to the approach described by Wach (Wach et al (1994) Yeast 10, 1793-1808). The obtained fragment is designated as the GAL80::kanMX fragment.

A yeast transformation of strain BIE252 was done with the purified GAL80::kanMX fragment according to the protocol described by Gietz and Woods ((2002), Methods in Enzymology 350: 87-96). The construction of strain BIE252 has been described in EP10160622.6. Strain BIE252 is a xylose and arabinose fermenting strain of S. cerevisiae, which is a derivative of BIE201. Strain BIE252 also contains the GAL80 SNP.

The transformed cells were plated on YEPD-agar containing 100 μg/ml G418 for selection. The plates were incubated at 30° C. until colonies were visible. Plasmid p427-TEF was included as a positive control and yielded many colonies. MilliQ (i.e. no DNA) was included as a negative control and yielded no colonies. The GAL80::kanMX fragment yielded many colonies. Two independent colonies were tested by Southern blotting in order to verify the correct integration (data not shown). A colony with the correct deletion of the GAL80 gene was designated BIE252ΔGAL80.

8.2 Effect of GAL80 Gene Replacement on the Performance in the BAM

A BAM (Biological Activity Monitor; Halotec B V, Veenendaal, the Netherlands) experiment was performed. Single colony isolates of strain BIE252 and strain BIE252ΔGAL80 (a transformant in which the ORF of the GAL80 gene was correctly replaced by the kanMX marker) were used to inoculate Verduyn medium (Verduyn et al., Yeast 8:501-517, 1992) supplemented with 2% glucose. The precultures were incubated for approximately 24 hours at 30° C. and 280 rpm. Cells were harvested and inoculated in a synthetic model medium (Verduyn medium supplemented with 5% glucose, 5% xylose, 3.5% arabinose, 1% galactose and 0.5% mannose, pH 4.2) at a cell density of about 1 gram dry weight per kg of medium. CO₂ production was monitored constantly. Sugar conversion and product formation were analyzed by NMR. The data represent the residual amount of sugars at the indicated time points (glucose, arabinose, galactose, mannose and xylose in grams per litre) and the formation of (by-)products (ethanol, glycerol, and the like). Growth was monitored by following the optical density of the culture at 600 nm. The experiment ran for approximately 72 hours.

The graphs are displayed in FIG. 20 (BIE252) and 21 (BIE252ΔGAL80).

The experiments clearly show that reference strain BIE252 converted glucose and mannose rapidly. After glucose depletion (around 10 hours), the conversion of xylose and arabinose commenced. Some galactose was already being fermented around the 10 hours time point, which might be due to the GAL80 SNP in this strain, which would allow (partial) simultaneous utilisation of glucose and galactose. At the end of the experiment, around 72 hours, almost all sugars were converted. An ethanol yield of 0.37 grams of ethanol per gram sugar was obtained.

Strain BIE252ΔGAL80 exhibits faster sugar conversion than strain BIE252. In this strain too, mannose and glucose are converted in the first hours of fermentation. However, as opposed to strain BIE252, in this transformant there is some co-consumption of glucose, galactose and mannose with arabinose and especially xylose. In general, sugar consumption is faster, leading to a more complete use of all available sugars. This is also apparent from the CO₂ evolution in time. In case of BIE252, a first peak is observed, which is basically the CO₂ formed from glucose and mannose. After reaching a minimum of just above 10 ml/hr (FIG. 20), a second, flatter peak is observed. In case of BIE252ΔGAL80 however (FIG. 21), the second peak appears as a tail of the first peak, due to an intensified co-use of glucose, xylose, arabinose, mannose and galactose, as is apparent from the sugar analysis by NMR. In the parent strain BIE252, the use of the different sugars is more sequential. Hence, the yield of strain BIE252ΔGAL80 is higher at the end of the experiment (72 h): 0.40 grams of ethanol per gram sugar.
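To put the two yields in perspective, they can be expressed as a fraction of the stoichiometric maximum. The sketch below is illustrative and rests on an assumption that is not stated in the text: the maximum is taken from the conversion of one hexose into two ethanol and two CO₂ (about 0.51 g ethanol per g sugar; the value for pentoses is practically the same). The constant and function names are hypothetical.

```python
# Molar masses in g/mol (standard values).
M_ETHANOL = 46.07
M_HEXOSE = 180.16

# Assumption: C6H12O6 -> 2 C2H5OH + 2 CO2, so 2 mol ethanol per mol hexose.
THEORETICAL_YIELD = 2 * M_ETHANOL / M_HEXOSE  # about 0.51 g ethanol per g sugar


def fraction_of_theoretical(observed_yield: float) -> float:
    """Observed yield (g ethanol per g sugar) as a fraction of the maximum."""
    return observed_yield / THEORETICAL_YIELD


for strain, observed in (("BIE252", 0.37), ("BIE252dGAL80", 0.40)):
    print(f"{strain}: {100 * fraction_of_theoretical(observed):.0f}% of the maximum")
```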

In conclusion, the deletion of the ORF of the GAL80 gene resulted in a further improved performance, as was tested in strain BIE252.

Example 9 Adipic Acid Production in Strain BIE201

9.1 Synthetic DNA Fragments Ordered at DNA2.0

Nine DNA fragments containing the nine open reading frames involved in the adipic acid pathway (see European Patent Application EP11160000.3 filed 28 Mar. 2011) and a S. cerevisiae promoter and terminator for efficient expression were ordered synthetically at DNA2.0 (Menlo Park, Calif. 94025, USA). In some cases homology to an adjacent part of the adipic acid pathway was added to the synthetic fragment for in vivo recombination of the pathway after transformation to BIE201. DNA2.0 delivered the synthetic fragments as cloned inserts in a standard cloning vector. This resulted in the following plasmids (between brackets the abbreviation), pADI141 (Adi21), pADI142 (Adi22), pADI143 (Adi23), pADI199 (Adi8), pADI145 (Adi24), pADI146 (Adi25), pADI149 (SucC), pADI150 (SucD) and pADI200 (Acdh67). Table 11 shows the genes involved in the pathway, the used abbreviations, source, Uniprot code and involvement in the pathway.

TABLE 11 Overview of the genes in the adipic acid pathway transformed to the BIE201 strain

Abbreviation   Name                                         Source               Uniprot code   Step in pathway
Adi21          beta-ketoadipyl CoA thiolase (DcaF)          Acinetobacter sp.    Q6FBN0         1
Adi22          beta-hydroxy-adipoyl dehydrogenase (DcaH)    Acinetobacter sp.    Q937T5         2
Adi23          enoyl-CoA hydratase (DcaE)                   Acinetobacter sp.    Q937T3         3
Adi8           trans-2-enoyl-CoA-reductase                  Candida tropicalis   Q8WZM3         4
Adi24          acyl-CoA transferase (Dcal) (subunit A)      Acinetobacter sp.    Q937T0         5
Adi25          acyl-CoA transferase (Dcal) (subunit B)      Acinetobacter sp.    Q937S9         5
Acdh67         Acetylating acetaldehyde dehydrogenase       Listeria innocua     Q92CP2         Acetyl-CoA supply
SucC           Succinyl-CoA synthetase subunit A            E. coli              P0A836         Succinyl-CoA supply
SucD           Succinyl-CoA synthetase subunit B            E. coli              P0AGE9         Succinyl-CoA supply

9.2 Preparation of PCR Fragments for Transformation to BIE201

In vivo homologous recombination was used to assemble and integrate the complete adipic acid pathway into BIE201. The necessary homology for recombination of the complete pathway (50-250 bp) was added during synthesis of the synthetic fragment or by adding the sequence to the primers used for amplification of the fragment. Primer sequences are listed in table 12.

TABLE 12 A list of all primer sequences used in the PCR-reactions to create the fragments for transformation to the BIE201 strain.

Primer         Short description
SEQ ID NO 60   Forward primer for amplification of the INT1LF
SEQ ID NO 61   Reverse primer for the amplification of INT1LF with a 50 bp flank overlapping Adi21 expression cassette
SEQ ID NO 62   Forward primer for amplification of the Adi21 expression cassette with 50 bp flank INT1LF
SEQ ID NO 63   Reverse primer for the amplification of the Adi21 expression cassette
SEQ ID NO 64   Forward primer for the amplification of the Adi22 expression cassette
SEQ ID NO 65   Reverse primer for the amplification of the Adi22 expression cassette
SEQ ID NO 66   Forward primer for the amplification of the Adi23 expression cassette
SEQ ID NO 67   Reverse primer for the amplification of the Adi23 expression cassette
SEQ ID NO 68   Forward primer for the amplification of the kanMX marker from pUG7 with 50 bp flank overlapping with Adi23
SEQ ID NO 69   Reverse primer for the amplification of the kanMX marker from pUG7 with 50 bp flank overlapping with Adi8
SEQ ID NO 70   Forward primer for the amplification of the Adi8 expression cassette with 25 bp flank overlap with kanMX of pUG7
SEQ ID NO 71   Reverse primer Adi8 expression cassette
SEQ ID NO 72   Forward primer for the amplification of the Adi24 expression cassette
SEQ ID NO 73   Reverse primer for the amplification of the Adi24 expression cassette
SEQ ID NO 74   Forward primer for the amplification of the Adi25 expression cassette
SEQ ID NO 75   Reverse primer for the amplification of the Adi25 expression cassette with 50 bp overlap with SucC
SEQ ID NO 76   Forward primer for the amplification of the SucC with 50 bp overlap with Adi25
SEQ ID NO 77   Reverse primer for the amplification of the SucC expression cassette
SEQ ID NO 78   Forward primer for the amplification of the SucD expression cassette
SEQ ID NO 79   Reverse primer for the amplification of the SucD expression cassette
SEQ ID NO 80   Forward primer for the amplification of the acdh67 expression cassette
SEQ ID NO 81   Reverse primer for the amplification of the acdh67 construct with 50 bp flank overlapping with INT1RF
SEQ ID NO 82   Forward primer for the amplification of the INT1RF site on yeast genome
SEQ ID NO 83   Reverse primer for the amplification of the INT1RF site on yeast genome

In total 12 fragments (see FIG. 22) were needed to integrate the complete adipic acid pathway into the genome of BIE201: 9 PCR fragments containing the gene expression cassettes belonging to the adipic acid pathway (SEQ ID NO 84-92), one PCR fragment containing the kanMX-marker conferring resistance to G418 (SEQ ID NO 93) and finally the INT1LF (INTegration Left Flank) and INT1RF (INTegration Right Flank) integration flanks (SEQ ID NO 94 and SEQ ID NO 95 respectively). All fragments were created with overlapping homology to each neighboring fragment in the pathway and, on the outside of the pathway, to the INT1LF and INT1RF for integration of the pathway via a double crossover into the genome. The homologous recombination event, i.e. complete assembly and integration of the pathway, is shown in a drawing in FIG. 22. The created PCR fragments used in the transformation are listed in table 13. The sequences are included herein as SEQ ID NO 84 until and including SEQ ID NO 95. Table 13 shows information on the used promoters and terminators for the genes and the primers used in the PCR amplification reactions to create the fragments for transformation.

TABLE 13 Overview of DNA elements used for in vivo recombination/integration of the adipic acid pathway. The promoter-ORF-terminator fragments are referred to as the name of the ORF. The columns 5′ and 3′ homology indicate with which other fragment(s) homology is shared (see FIG. 22). The ‘plasmid name’ column shows the name of the DNA2.0 plasmid containing the synthetic fragment.

ORF/element   Promoter   Element   Terminator   Forward primer   Reverse primer   5′ homology with element   3′ homology with element   Plasmid name   ID#
ADI21         pTPI1      ADI21     tGND2        SEQ ID NO 62     SEQ ID NO 63     INT1LF                     ADI22                      pADI141        SEQ ID NO 84
ADI22         pFBA1      ADI22     tPMA1        SEQ ID NO 64     SEQ ID NO 65     ADI21                      ADI23                      pADI142        SEQ ID NO 85
ADI23         pADH1      ADI23     tTDH1        SEQ ID NO 66     SEQ ID NO 67     ADI22                      KANMX                      pADI143        SEQ ID NO 86
ADI8          pENO1      ADI8      tPDC1        SEQ ID NO 70     SEQ ID NO 71     KANMX                      ADI24                      pADI199        SEQ ID NO 87
ADI24         pTDH1      ADI24     tADH2        SEQ ID NO 72     SEQ ID NO 73     ADI8                       ADI25                      pADI145        SEQ ID NO 88
ADI25         pENO2      ADI25     tGPM1        SEQ ID NO 74     SEQ ID NO 75     ADI24                      SUCC                       pADI146        SEQ ID NO 89
SUCC          pPDC1      SUCC      tGND2        SEQ ID NO 76     SEQ ID NO 77     ADI25                      SUCD                       pADI149        SEQ ID NO 90
SUCD          pGPM1      SUCD      tADH1        SEQ ID NO 78     SEQ ID NO 79     SUCC                       ACDH67                     pADI150        SEQ ID NO 91
A67           pOYE2      ACDH67    tTPI1        SEQ ID NO 80     SEQ ID NO 81     SUCD                       INT1RF                     pADI200        SEQ ID NO 92
INT1LF        -          INT1LF    -            SEQ ID NO 60     SEQ ID NO 61     -                          ADI21                      -              SEQ ID NO 94
INT1RF        -          INT1RF    -            SEQ ID NO 82     SEQ ID NO 83     ACDH67                     -                          -              SEQ ID NO 95
KANMX         -          KANMX     -            SEQ ID NO 68     SEQ ID NO 69     ADI23                      ADI8                       pUG7           SEQ ID NO 93

All PCR reactions were performed with Phusion® polymerase (Finnzymes) according to the manual. The plasmids ordered at DNA2.0 were used as template for amplifying the 9 adipic acid pathway genes. The kanMX-marker was amplified from a plasmid pUG7 carrying the marker sequence. pUG7 was constructed as follows: the loxP-sites of plasmid pUG6 (Güldener, U. et al (1996) Nucleic Acids Research 24: 2519-2524) were replaced in two steps by cloning linkers containing the modified loxP-sites lox66 and lox71 (Araki et al (1997) Nucleic Acids Research, 1997, Vol. 25, No. 4, pp 868-872). Restriction analysis and sequencing were done to confirm correct replacement.
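The 5′ and 3′ homology columns of table 13 define a single linear chain of fragments. A minimal sketch that walks this chain and checks that every overlap is reciprocal is given below; the dictionary layout and the function name are illustrative, and the element ACDH67 is used as the name of the acdh67 cassette.

```python
# fragment -> (5' homology partner, 3' homology partner), taken from table 13.
HOMOLOGY = {
    "INT1LF": (None, "ADI21"),
    "ADI21": ("INT1LF", "ADI22"),
    "ADI22": ("ADI21", "ADI23"),
    "ADI23": ("ADI22", "KANMX"),
    "KANMX": ("ADI23", "ADI8"),
    "ADI8": ("KANMX", "ADI24"),
    "ADI24": ("ADI8", "ADI25"),
    "ADI25": ("ADI24", "SUCC"),
    "SUCC": ("ADI25", "SUCD"),
    "SUCD": ("SUCC", "ACDH67"),
    "ACDH67": ("SUCD", "INT1RF"),
    "INT1RF": ("ACDH67", None),
}


def assembly_order(homology):
    """Follow the 3' homology links from the left flank to the right flank."""
    starts = [name for name, (five, _) in homology.items() if five is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one fragment without 5' homology")
    order = [starts[0]]
    while homology[order[-1]][1] is not None:
        nxt = homology[order[-1]][1]
        if nxt in order:
            raise ValueError(f"cycle detected at {nxt}")
        if homology[nxt][0] != order[-1]:
            raise ValueError(f"overlap {order[-1]} -> {nxt} is not reciprocal")
        order.append(nxt)
    return order


order = assembly_order(HOMOLOGY)
print(" > ".join(order))
```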

The INT1LF and INT1RF (the left and right flanks, respectively) for integration at the “INT1 locus” were amplified using chromosomal DNA isolated from BIE104 as a template.

Size of the PCR fragments was checked with standard agarose electrophoresis techniques. PCR amplified DNA fragments were purified and concentrated with the PCR purification kit from Qiagen, according to the manual. DNA concentration was measured using the Nanodrop from Thermo scientific (A260/A280 absorbance).

9.3. Yeast Transformation

Transformation of S. cerevisiae was done as described by Gietz and Woods (2002, Methods in Enzymology 350: 87-96). BIE201 was transformed with 1 μg of each of the 12 amplified and purified PCR fragments. Transformation mixtures were plated on YPD-agar (per liter: 10 grams of yeast extract, 20 grams of peptone, 20 grams of dextrose, 20 grams of agar) containing 100 μg G418 (Sigma Aldrich) per ml. After two to four days, colonies appeared on the plates, whereas the negative control (i.e. no addition of DNA in the transformation experiment) resulted in blank YPD/G418-plates. From the transformation plate, single colonies were transferred to new YPD-agar plates containing 100 μg G418 per ml. The plates were incubated for 2 days at 30° C.

9.4 Adipic acid Production on Arabinose

Single colonies of 4 transformants (strains 1, 2, 3 and 4) and BIE201 as a control strain were inoculated in duplicate in a half deepwell MTP (microplate) containing 200 μl Verduyn medium with 2% arabinose and 0.05% glucose per well. The MTP was incubated for 48 hours at 30° C., 550 rpm and 80% humidity in an Infors shaker for microplates. After 48 hours of incubation, 40 μl of each culture was transferred to two 24-well plates containing 2.5 ml Verduyn medium with 2% arabinose per well. The 24-well plates were covered with a standard MTP lid and incubated for 24 hours at 30° C., 550 rpm and 80% humidity. After the 24 hours of incubation, the 24-well plates were centrifuged for 10 minutes in a Heraeus centrifuge at 2750 g. The supernatant was removed and to each well containing a cell pellet, 4.5 ml fresh Verduyn medium with 2% arabinose was added. The cell pellet was re-suspended with a pipette. For one plate, the standard MTP lid was replaced by an airpore sheet (Qiagen) to improve aeration. For the second 24-well plate, it was replaced by a BugStopper™ Capmat (Whatman), which creates a micro-aerobic environment. The 24-well plates were incubated in the Infors Microtron incubator for 72 hours at 30° C., 350 rpm and 80% humidity. After incubation, the plates were centrifuged for 10 minutes at 2750 g in a Heraeus centrifuge. Adipic acid concentrations were measured in the supernatant with LC-MS. Results are shown in table 14.

TABLE 14 Resulting adipic acid concentrations in supernatant produced by the BIE201 transformants after growth on arabinose.

Strain     Used lid         Adipic acid concentration (mg/l)
BIE201     Airpore sheets   <0.2
BIE201     Airpore sheets   <0.2
Strain 2   Airpore sheets   1.4
Strain 2   Airpore sheets   1.4
Strain 3   Airpore sheets   1.2
Strain 3   Airpore sheets   1.3
Strain 4   Airpore sheets   1.6
Strain 4   Airpore sheets   2.0
BIE201     Bugstopper       <0.2
BIE201     Bugstopper       <0.2
Strain 2   Bugstopper       3.0
Strain 2   Bugstopper       2.4
Strain 3   Bugstopper       1.8
Strain 3   Bugstopper       2.2
Strain 4   Bugstopper       2.5
Strain 4   Bugstopper       2.8

Strains 2, 3 and 4 produce adipic acid on Verduyn media with arabinose as sole C-source. Under oxygen limited conditions, i.e. with the bugstopper lids, a higher level is obtained as compared to the plates with airpore sheets.
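The comparison between the two lid types can be made explicit by averaging the duplicate measurements of table 14. The following is a minimal sketch; the data layout and names are illustrative, and the values reported as <0.2 mg/l (strain BIE201) are left out because they are below the reported limit.

```python
from statistics import mean

# (strain, lid) -> duplicate adipic acid concentrations in mg/l, from table 14.
TABLE_14 = {
    ("Strain 2", "Airpore sheets"): [1.4, 1.4],
    ("Strain 3", "Airpore sheets"): [1.2, 1.3],
    ("Strain 4", "Airpore sheets"): [1.6, 2.0],
    ("Strain 2", "Bugstopper"): [3.0, 2.4],
    ("Strain 3", "Bugstopper"): [1.8, 2.2],
    ("Strain 4", "Bugstopper"): [2.5, 2.8],
}

means = {key: mean(values) for key, values in TABLE_14.items()}

for strain in ("Strain 2", "Strain 3", "Strain 4"):
    airpore = means[(strain, "Airpore sheets")]
    bugstopper = means[(strain, "Bugstopper")]
    print(f"{strain}: {airpore:.2f} mg/l (airpore), {bugstopper:.2f} mg/l "
          f"(Bugstopper), ratio {bugstopper / airpore:.2f}")
```

For each of the three strains the mean concentration under the Bugstopper lid is higher than under the airpore sheet.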

Reference strain BIE201 grows on arabinose but does not produce adipic acid.

9.5 UPLC-MS/MS Analysis (ESI Negative Mode)

The samples were analysed with a column having the following specifications “Waters Acquity UPLC HSS T3, 1.8 μm, 100 mm*2.1 mm I.D.”. Injection volume was 5 μl using a full loop, the flow through the column was 0.250 ml/min and the column temperature was 40° C. Table 15 shows the gradient used for mobile phase A and B. Mobile phase A contains 0.1% formic acid in water and Mobile phase B contains 0.1% formic acid in acetonitril.

TABLE 15 The gradient used during UPLC-MS/MS analysis of adipic acid concentrations in the supernatant.

| Time (min.) | % A | % B |
|---|---|---|
| 0.0 | 100.0 | 0.0 |
| 5.0 | 85.0 | 15.0 |
| 6.5 | 85.0 | 15.0 |
| 7.0 | 20.0 | 80.0 |
| 10.0 | 20.0 | 80.0 |
| 10.5 | 100.0 | 0.0 |
| 15.0 | 100.0 | 0.0 |
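The gradient programme of table 15 can be written as a small lookup that returns the mobile phase composition at any time during the run. This is a sketch that assumes linear ramps between the listed time points; the source gives only the set points and does not state the ramp shape. The function names are illustrative.

```python
# Gradient set points from table 15 as (time in min, % B); % A is 100 - % B.
GRADIENT = [(0.0, 0.0), (5.0, 15.0), (6.5, 15.0), (7.0, 80.0),
            (10.0, 80.0), (10.5, 0.0), (15.0, 0.0)]


def percent_b(t):
    """% B at time t (min), assuming linear ramps between set points."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-15 min gradient programme")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


def percent_a(t):
    """% A at time t (min); the two mobile phases always sum to 100%."""
    return 100.0 - percent_b(t)


print(percent_b(2.5), percent_a(2.5))
```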

FIG. 23 depicts an MRM chromatogram of a standard containing 10.5 mg/l adipic acid and of a sample containing 3 mg/l adipic acid, produced by strain 3 on arabinose with a BugStopper lid.

Example 10 Succinic Acid Production

10.1 Expression Constructs

Expression construct pGBS414PPK-3, comprising a phosphoenolpyruvate carboxykinase PCKa (E.C. 4.1.1.49) from Actinobacillus succinogenes and glycosomal fumarate reductase FRDg (E.C. 1.3.1.6) from Trypanosoma brucei, and expression construct pGBS415FUM3, comprising a fumarase (E.C. 4.2.1.2) from Rhizopus oryzae and a peroxisomal malate dehydrogenase MDH3 (E.C. 1.1.1.37), were made as described previously in WO2009/065778 on pp. 19-20 and 22-30, which are herein incorporated by reference, including the figures and sequence listing.

Expression construct pGBS416ARAABD, comprising the genes araA, araB and araD derived from Lactobacillus plantarum, was constructed by cloning a PCR product, comprising the araABD expression cassette from plasmid pPWT018, into plasmid pRS416. The PCR fragment was generated using Phusion® DNA polymerase (Finnzymes) and the PCR primers defined herein as SEQ ID 96 and SEQ ID 97. The PCR product was cut with the restriction enzymes SalI and NotI, as was plasmid pRS416. After ligation and transformation of E. coli TOP10, the correct recombinants were selected on the basis of restriction enzyme analysis. The physical map of plasmid pGBS416ARAABD is set out in FIG. 24.
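Cloning with SalI and NotI presupposes that neither enzyme cuts inside the cassette itself. A simple way to check a sequence for these sites is sketched below in Python. The recognition sequences (GTCGAC for SalI, GCGGCCGC for NotI) are the standard ones; the `demo` sequence is a made-up placeholder and is not SEQ ID 96, SEQ ID 97 or any sequence from the source.

```python
# Recognition sequences of the two enzymes named in the text.
# Both sites are palindromic, so scanning one strand is sufficient.
SITES = {"SalI": "GTCGAC", "NotI": "GCGGCCGC"}


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for a DNA string."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        hits[enzyme] = positions
    return hits


# Hypothetical amplicon with one site of each enzyme near its ends.
demo = "AAGTCGACTTTTACGTACGTTTGCGGCCGCAA"
print(find_sites(demo))
```

A site found anywhere other than the primer-encoded ends would indicate that the digest cuts within the insert.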

10.2 S. cerevisiae Strains

The plasmids pGBS414PPK-3 and pGBS415-FUM-3 were transformed into S. cerevisiae strain CEN.PK113-6B (MATA ura3-52 leu2-112 trp1-289). In addition, plasmid pGBS416ARAABD was transformed into this yeast to create prototrophic yeast strains. The expression vectors were transformed into yeast by electroporation. The transformation mixtures were plated on Yeast Nitrogen Base (YNB) w/o AA (Difco)+2% glucose. One such transformant was called SUC595.

As a control, strain CEN.PK113-6B was transformed with plasmid pGBS416ARAABD only. One such transformant was called SUC600.

Strains were subjected to adaptive evolution (see Example 2, section 2.1) for growth on arabinose as sole carbon source. In Example 2, YNB-medium containing arabinose was used, while in the present Example, Verduyn medium with 2% arabinose was used.

Single colony isolates from the adaptive evolution shake flasks were characterized for their ability to grow on arabinose as sole carbon source. SUC689, a derivative of SUC595 through adaptive evolution, has a growth rate of 0.1 h⁻¹ on arabinose as sole carbon source. SUC694, a derivative of SUC600 through adaptive evolution, has a growth rate of 0.09 h⁻¹ on arabinose as sole carbon source.
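For exponential growth, a specific growth rate μ corresponds to a doubling time of ln(2)/μ. The Python sketch below converts the two reported growth rates; the function name is illustrative, and the conversion assumes the rates refer to balanced exponential growth.

```python
import math


def doubling_time(mu_per_hour):
    """Doubling time in hours for exponential growth at rate mu (h^-1)."""
    if mu_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / mu_per_hour


# Growth rates on arabinose as reported in the text.
for strain, mu in {"SUC689": 0.1, "SUC694": 0.09}.items():
    print(f"{strain}: {doubling_time(mu):.1f} h")
```

Under this assumption the two strains double roughly every 6.9 and 7.7 hours, respectively.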

10.3 Growth Experiments and Succinic Acid Production

Single colony isolates of transformants SUC689 and SUC694 were inoculated in 96-well microplates containing YNB (Difco), 4% galactose and 2% agar. Four independent colonies were inoculated per strain. After growth for 2 days at 30° C., colony material was transferred with the aid of a pin tool to a 96-well microplate containing 200 μl pre-culture medium consisting of Verduyn medium (Verduyn et al., 1992, Yeast. July; 8(7):501-17) comprising 4% galactose (w/v), and grown under aerobic conditions in an Infors shaking incubator at 30° C., 550 rpm and 80% humidity. After approximately 48 hours, cells were transferred in duplicate to 24-well microplates containing 2.5 ml fresh Verduyn medium supplemented with 4% galactose. After 72 hours of incubation at 30° C., the plates were spun down in a microplate centrifuge in order to separate the cells from the medium. The supernatant was discarded. The cells were resuspended in 4 ml Verduyn medium comprising 8% arabinose. At two time points, 48 hours (microplate 1) and 72 hours (microplate 2), the incubation was stopped by spinning down the cells. The supernatant was used to measure succinic acid levels by NMR as described in section 10.4.

10.4 NMR Analysis

NMR was performed for the determination of organic acids and sugars in broth samples.

The results are presented in tables 16 and 17.

TABLE 16 Results of the NMR analysis at time point 48 hours. All values are in grams per litre. N.D. means not detected.

| Strain | Arabinose | Malic acid | Glycerol | Succinic acid | Ethanol |
|---|---|---|---|---|---|
| SUC689 | 18.5 | 0.4 | 3.3 | 0.7 | 8.4 |
| SUC689 | 14.5 | 0.4 | 4.3 | 0.8 | 10.0 |
| SUC689 | 16.6 | 0.4 | 4.3 | 0.8 | 9.7 |
| SUC689 | 14.9 | 0.4 | 4.1 | 0.7 | 9.1 |
| SUC694 | 0.7 | N.D. | N.D. | 0.2 | 18.8 |
| SUC694 | 0.4 | N.D. | 0.0 | 0.2 | 18.5 |
| SUC694 | 1.1 | N.D. | N.D. | 0.3 | 18.4 |
| SUC694 | 0.7 | N.D. | N.D. | 0.2 | 17.8 |

TABLE 17 Results of the NMR analysis at time point 72 hours. All values are in grams per litre. N.D. means not detected.

| Strain | Arabinose | Malic acid | Glycerol | Succinic acid | Ethanol |
|---|---|---|---|---|---|
| SUC689 | 14.0 | 0.5 | 3.5 | 0.7 | 6.7 |
| SUC689 | 11.2 | 0.5 | 4.3 | 0.8 | 6.8 |
| SUC689 | 13.7 | 0.5 | 3.9 | 0.8 | 6.0 |
| SUC689 | 10.3 | 0.5 | 3.9 | 0.7 | 7.5 |
| SUC694 | 0.1 | N.D. | N.D. | 0.2 | 15.6 |
| SUC694 | 0.1 | N.D. | N.D. | 0.2 | 15.2 |
| SUC694 | 0.2 | N.D. | N.D. | 0.2 | 15.6 |
| SUC694 | 0.3 | N.D. | N.D. | 0.3 | 13.6 |

It is clear from tables 16 and 17 that the amount of succinic acid is higher for strain SUC689 than for strain SUC694. The latter converts almost all arabinose, forming mainly biomass and ethanol. Strain SUC689 forms less ethanol but a significantly higher amount of succinic acid, 3 to 4 times higher than SUC694. Succinic acid yields were calculated and are shown in table 18.

TABLE 18 Succinic acid yields on arabinose as a carbon source.

| Strain | Average succinic acid yield (gram succinic acid per gram arabinose) at 48 hours | Average succinic acid yield (gram succinic acid per gram arabinose) at 72 hours |
|---|---|---|
| SUC689 | 0.012 | 0.011 |
| SUC694 | 0.003 | 0.003 |
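The source does not state how the yields in table 18 were calculated. The Python sketch below shows one plausible reconstruction under two assumptions: the starting arabinose concentration is taken as 80 g/l (8% w/v, as in the resuspension medium), and the yield is taken as succinic acid formed per gram of arabinose consumed, averaged over the four replicates. Under these assumptions the rounded results agree with table 18. The names `INITIAL_ARABINOSE`, `DATA` and `average_yield` are illustrative.

```python
from statistics import mean

# Assumption: 8% (w/v) arabinose at the start, i.e. 80 g/l.
INITIAL_ARABINOSE = 80.0

# (residual arabinose, succinic acid) in g/l per replicate, from tables 16 and 17.
DATA = {
    ("SUC689", 48): [(18.5, 0.7), (14.5, 0.8), (16.6, 0.8), (14.9, 0.7)],
    ("SUC694", 48): [(0.7, 0.2), (0.4, 0.2), (1.1, 0.3), (0.7, 0.2)],
    ("SUC689", 72): [(14.0, 0.7), (11.2, 0.8), (13.7, 0.8), (10.3, 0.7)],
    ("SUC694", 72): [(0.1, 0.2), (0.1, 0.2), (0.2, 0.2), (0.3, 0.3)],
}


def average_yield(strain, hours):
    """Mean over replicates of g succinic acid per g arabinose consumed."""
    yields = [succinic / (INITIAL_ARABINOSE - residual)
              for residual, succinic in DATA[(strain, hours)]]
    return mean(yields)


for key in DATA:
    print(key, round(average_yield(*key), 3))
```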

In conclusion, strain SUC689 produced succinic acid from arabinose, whereas production was significantly lower in strain SUC694, the strain not expressing the succinic acid pathway.

Example 11 Introduction of Extra Copies of the araA, araB and araD-Genes

11.1 Amplification of the araABD-Cassette

In order to introduce extra copies of the araA, araB and araD genes into the genome, a PCR reaction is performed using Phusion® DNA polymerase (Finnzymes) with plasmid pPWT018 as a template and the oligonucleotides with SEQ ID 98 and SEQ ID 99 as primers. With these primers, the araABD-cassette is amplified. The primer design is such that the flanks of the PCR fragment are homologous to the consensus sequence of the delta-sequences of the yeast transposon Ty-1. These sequences can be obtained from NCBI (http://www.ncbi.nlm.nih.gov/) and aligned using a suitable software package, e.g. Clone Manager 9 Professional Edition (Scientific & Educational Software, Cary, USA).

The araABD-cassette does not contain a selectable marker with which the integration into the genome can be selected for. In order to estimate transformation frequency, a second control transformation was done with the kanMX-marker. To this end, the kanMX-cassette from plasmid p427TEF (Dualsystems Biotech) was amplified in a PCR reaction using the primers corresponding to SEQ ID NO 100 and SEQ ID NO 101.

11.2 Transformation of BIE104A2P1

BIE104A2P1 is transformed according to the electroporation protocol (as described above) with either 30 μg of the araABD-cassette fragment (designated Ty1::araABD) or 10 μg of the kanMX-cassette fragment. The kanMX transformation mixture is plated on YPD-agar (per liter: 10 grams of yeast extract, 20 grams of peptone, 20 grams of dextrose, 20 grams of agar) containing 100 μg G418 (Sigma Aldrich) per ml. After two to four days, colonies appear on the plates, whereas the negative control (i.e. no addition of DNA in the transformation experiment) results in blank YPD/G418-plates. The transformation frequency is higher than 600 colonies per μg of kanMX-cassette.

The Ty1::araABD transformation mixture is used to inoculate a shake flask containing 100 ml of Verduyn medium, supplemented with 2% arabinose. As a control, the negative control of the transformation (i.e. no addition of DNA in the transformation experiment) is used. The shake flasks are incubated at 30° C. and 280 rpm in an orbital shaker. Growth is followed by measuring the optical density at 600 nm on a regular basis.
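Optical density readings taken over time can be converted into a specific growth rate by fitting a straight line to ln(OD600) against time. The Python sketch below does this with an ordinary least-squares slope. It is an illustration only: the source does not describe how growth rates were derived, the readings are synthetic values generated for the example, and only data points from the exponential phase should be passed in.

```python
import math


def specific_growth_rate(times_h, od600):
    """Least-squares slope of ln(OD600) against time (h), i.e. mu in h^-1."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two paired measurements")
    y = [math.log(value) for value in od600]
    t_mean = sum(times_h) / len(times_h)
    y_mean = sum(y) / len(y)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times_h, y))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return sxy / sxx


# Synthetic readings for illustration only (not data from the source):
# a culture starting at OD600 0.15 and growing at exactly 0.1 h^-1.
times = [0, 2, 4, 6, 8]
ods = [0.15 * math.exp(0.1 * t) for t in times]
print(round(specific_growth_rate(times, ods), 3))
```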

After approximately 25 days, the optical density of the Ty1::araABD shake flask increases, while growth in the negative control is still absent. At day 25, a flask containing fresh Verduyn medium supplemented with 2% arabinose is inoculated from the Ty1::araABD culture to a starting optical density at 600 nm of 0.15. The culture starts to grow on arabinose immediately and rapidly. Since the culture is likely a mixture of subpopulations that differ in copy number of the Ty1::araABD cassette and in growth rate on arabinose, cells are diluted in milliQ water and plated on YPD-agar plates in order to obtain single colony isolates. The single colony isolates are tested for their ability to utilize different carbon sources.

11.3 Selection of Better Arabinose Converting Strains

In order to select a strain which has gained improved growth on arabinose as a sole carbon source without losing its ability to utilize the other important sugars (glucose and galactose), ten single colony isolates of the adaptive evolution culture are restreaked on YPD-agar. Subsequently, a preculture is done on YPD-medium supplemented with 2% glucose. The ten cultures are incubated overnight at 30° C. and 280 rpm. Aliquots of each culture are used to inoculate fresh Verduyn medium supplemented with either 2% glucose, 2% arabinose or 2% galactose, at an initial optical density of 0.15. As controls, strains BIE201, BIE104A2P1 and the mixed population (from which the ten single colony isolates are retrieved) are included in the experiment. Cells are grown at 30° C. and 280 rpm in an orbital shaker. Growth is assessed on the basis of optical density measurements at 600 nm.
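Inoculating each culture at the same initial optical density requires a different aliquot volume per preculture. A minimal Python sketch of that dilution calculation follows. Only the target of 0.15 is taken from the text; the preculture density and the culture volume in the example are hypothetical, and the calculation assumes optical density scales linearly with dilution.

```python
def inoculum_volume_ml(preculture_od, target_od, final_volume_ml):
    """Volume of preculture (ml) giving target_od in final_volume_ml.

    Uses C1 * V1 = C2 * V2, where final_volume_ml includes the inoculum.
    """
    if preculture_od <= target_od:
        raise ValueError("preculture must be denser than the target")
    return target_od * final_volume_ml / preculture_od


# Hypothetical overnight OD600 of 6.0 and a 100 ml culture;
# the target OD600 of 0.15 is the value given in the text.
print(f"{inoculum_volume_ml(6.0, 0.15, 100):.2f} ml")
```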

The results show that both the mixed culture and the ten single colony isolates exhibit a higher final optical density at 600 nm.

One colony (colony T) is selected on the basis of its growth on arabinose as sole carbon source. This colony, if inoculated in Verduyn medium supplemented with 2% arabinose, shows a higher growth rate than parent strain BIE104A2P1. Its growth rate is comparable to the growth rate of strain BIE201.

Q-PCR is done on the chromosomal DNA of strains BIE201, BIE104A2P1 and colony T. The copy number of the araABD cassette is determined to be 1 in case of BIE104A2P1, and larger than 2 in case of both colony T and BIE201.
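The source does not say how copy number was derived from the Q-PCR data. One common approach is the 2^-ΔΔCt method, in which the target signal is normalised to a single-copy reference gene and to a calibrator strain of known copy number. The Python sketch below illustrates that approach as an assumption, not as the method used here: it assumes 100% amplification efficiency, takes BIE104A2P1 (one copy) as the calibrator, and uses invented Ct values.

```python
def relative_copy_number(ct_target, ct_ref, ct_target_cal, ct_ref_cal,
                         calibrator_copies=1):
    """Copy number by the 2^-ddCt method, assuming 100% PCR efficiency.

    ct_target / ct_ref: Ct of the araABD target and of a single-copy
    reference gene in the sample; *_cal: the same in the calibrator strain.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return calibrator_copies * 2 ** -(delta_sample - delta_calibrator)


# Invented Ct values for illustration only: the target amplifies two
# cycles earlier in the sample than in the one-copy calibrator.
print(relative_copy_number(18.0, 20.0, 20.0, 20.0))
```

Each cycle of earlier amplification relative to the calibrator corresponds to a doubling of the estimated copy number under the stated efficiency assumption.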

Claims

1. A process for producing cells which are capable of converting arabinose, comprising:

a) Introducing into a host strain that cannot convert arabinose, genes araA, araB and araD, to form a constructed cell;

b) Subjecting the constructed cell to adaptive evolution until a first arabinose converting cell that converts arabinose is obtained;

c) Optionally, subjecting the first arabinose converting cell to adaptive evolution to improve the arabinose conversion; said cell produced in step b) or c) is designated as first arabinose converting cell;

d) Analysing a full genome or part of a genome of said first arabinose converting cell and that of said constructed cell;

e) Identifying single nucleotide polymorphisms (SNP's) in said first arabinose converting cell; and

f) Using information of said SNP's in rational design of a cell capable of converting arabinose;

g) Constructing said cell capable of converting arabinose designed in f).

2. The process according to claim 1, wherein in e), f) and/or g) at least one technique of phenotyping is used in combination with at least one technique of genotyping.

3. The process according to claim 1, wherein, in said process, a yeast cell capable of converting arabinose has a chromosome that is amplified compared to the host strain, wherein the amplified chromosome has the same number as the chromosome in which the araA, araB and araD genes were introduced in the host strain.

4. The process according to claim 3, wherein said amplified chromosome is chromosome VII.

5. A yeast cell having araA, araB and araD genes wherein chromosome VII has a size of from 1300 to 1600 Kb as determined by electrophoresis, with the exclusion of a yeast cell BIE201.

6. The yeast cell according to claim 5, wherein a copy number of the araA, araB and araD genes is from three to five each.

7. The yeast cell according to claim 6, comprising at least one single nucleotide polymorphism selected from the group consisting of mutations G1363T in the SSY1 gene, A512T in YJR154w gene, A1186G in CEP3 gene, and A436C in GAL80 gene.

8. The yeast cell according to claim 7, comprising a single polymorphism A436C in GAL80 gene.

9. The yeast cell according to claim 8, comprising a single nucleotide polymorphism A1186G in CEP3 gene.

10. A polypeptide belonging to the group consisting of the polypeptides:

a. A polypeptide comprising the sequence encoded by polynucleotide SEQ ID NO: 14 having a substitution E455stop in SSY1 and variant polypeptides thereof wherein at least one of other positions have mutation of an aminoacid with another aminoacid that is an existing aminoacid in AA trans superfamily;

b. A polypeptide comprising the sequence encoded by the polynucleotide SEQ ID NO: 16 having a substitution D171G in YJR154w and variant polypeptides thereof wherein at least one of other positions have mutation of an aminoacid with another aminoacid that is an existing conserved aminoacid in PhyH superfamily;

c. A polypeptide comprising the sequence encoded by the polynucleotide SEQ ID NO: 18 comprising a substitution S396G in CEP3;

d. A polypeptide comprising the sequence encoded by SEQ ID NO: 20 comprising a substitution T146P in GAL80;

and variant polypeptides thereof, wherein at least one of other positions may have mutation of an aminoacid with an aminoacid that is an existing conserved aminoacid in NADB Rossmann superfamily.

11. A process for producing at least one fermentation product from a sugar composition comprising glucose, galactose, arabinose and xylose, said process comprising fermenting said sugar composition with a yeast cell according to claim 5.

12. The process according to claim 11, wherein said sugar composition is produced from lignocellulosic material by:

a) pretreatment of at least one lignocellulosic material to produce pretreated lignocellulosic material;

b) enzymatic treatment of said pretreated lignocellulosic material to produce said sugar composition.

13. The process according to claim 11, wherein said fermentation is anaerobic.

14. The process according to claim 11, wherein said fermentation product is selected from the group consisting of ethanol, n-butanol, isobutanol, lactic acid, 3-hydroxy-propionic acid, acrylic acid, acetic acid, succinic acid, fumaric acid, malic acid, itaconic acid, maleic acid, citric acid, adipic acid, an amino acid, such as lysine, methionine, tryptophan, threonine, and aspartic acid, 1,3-propane-diol, ethylene, glycerol, a β-lactam antibiotic and a cephalosporin, vitamins, pharmaceuticals, animal feed supplements, specialty chemicals, chemical feedstocks, plastics, solvents, fuels, biofuels and biogas or organic polymers, and an industrial enzyme, a protease, a cellulase, an amylase, a glucanase, a lactase, a lipase, a lyase, an oxidoreductase, a transferase or a xylanase.