NOVEL CRISPR ENZYMES AND SYSTEMS

- THE BROAD INSTITUTE, INC.

The present disclosure provides for systems, methods, and compositions for targeting nucleic acids. In particular, the invention provides mutated Cas13 proteins and their use in modifying target sequences as well as mutated Cas13 nucleic acid sequences and vectors encoding mutated Cas13 proteins and vector systems or CRISPR-Cas13 systems.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/712,809, filed Jul. 31, 2018, U.S. Provisional Application No. 62/751,421, filed Oct. 26, 2018, U.S. Provisional Application No. 62/775,865, filed Dec. 5, 2018, U.S. Provisional Application No. 62/822,639, filed Mar. 22, 2019, and U.S. Provisional Application No. 62/873,031, filed Jul. 11, 2019. The entire contents of the above-identified applications are hereby fully incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with government support under Grant Nos. HG009761, MH110049 and HL141201 awarded by the National Institutes of Health. The government has certain rights in the invention.

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The contents of the electronic sequence listing (“BROD-2660WP_ST25.txt”; Size is 1,997,857 bytes and it was created on Jul. 25, 2019) are herein incorporated by reference in their entirety.

TECHNICAL FIELD

The present invention generally relates to systems, methods and compositions used for the control of gene expression involving sequence targeting, such as perturbation of gene transcripts or nucleic acid editing, that may use vector systems related to Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and components thereof.

BACKGROUND

The CRISPR-CRISPR associated (Cas) systems of bacterial and archaeal adaptive immunity show extreme diversity of protein composition and genomic loci architecture. The CRISPR-Cas system loci have more than 50 gene families, and there are no strictly universal genes, indicating fast evolution and extreme diversity of loci architecture. So far, adopting a multi-pronged approach, there is comprehensive cas gene identification of about 395 profiles for 93 Cas proteins. Classification includes signature gene profiles plus signatures of locus architecture. A new classification of CRISPR-Cas systems is proposed in which these systems are broadly divided into two classes, Class 1 with multisubunit effector complexes and Class 2 with single-subunit effector modules exemplified by the Cas9 protein. Novel effector proteins associated with Class 2 CRISPR-Cas systems may be developed as powerful genome engineering tools and the prediction of putative novel effector proteins and their engineering and optimization is important. Novel Cas13b orthologues and uses thereof are desirable.

Following the demonstration that CRISPR-Cas9 could be repurposed for genome editing, interest in leveraging CRISPR systems led to the discovery of several new Cas enzymes and CRISPR systems with novel properties (1-3). Notable amongst these new discoveries are the Class 2 type VI CRISPR-Cas13 systems, which use a single enzyme to target RNA using a programmable CRISPR-RNA (crRNA) guide (1-6). Cas13 binding to target single-stranded RNA activates a general RNase activity that cleaves the target and degrades surrounding RNA non-specifically (4). Type VI systems have been used for RNA knockdown, transcript labeling, RNA editing, and ultra-sensitive virus detection (3, 4, 7-12). CRISPR-Cas13 systems are further divided into four subtypes based on the identity of the Cas13 protein (Cas13a-d) (2). All Cas13 protein family members contain two Higher Eukaryotes and Prokaryotes Nucleotide-binding (HEPN) domains. Citation or identification of any document in this application is not an admission that such document is available as prior art to the present invention.

There exists a pressing need for alternative and robust systems and techniques for targeting nucleic acids or polynucleotides (e.g. DNA or RNA or any hybrid or derivative thereof) with a wide array of applications, in particular the development of effector proteins having an altered functionality, including, but not limited to, increased or decreased specificity, increased or decreased activity, altered specificity and/or activity, alternative PAM recognition, etc. This invention addresses this need and provides related advantages. Adding the novel RNA-targeting systems of the present application to the repertoire of genomic, transcriptomic, and epigenomic targeting technologies may transform the study and perturbation or editing of specific target sites through direct detection, analysis and manipulation. To utilize the RNA-targeting systems of the present application effectively for RNA targeting without deleterious effects, it is critical to understand aspects of engineering and optimization of these RNA targeting tools.

SUMMARY

In one aspect, the present disclosure provides an engineered CRISPR-Cas protein comprising one or more HEPN domains and further comprising one or more modified amino acids, wherein the amino acids: interact with a guide RNA that forms a complex with the engineered CRISPR-Cas protein; are in a HEPN active site, an inter-domain linker domain, a lid domain, a helical domain 1, a helical domain 2, or a bridge helix domain of the engineered CRISPR-Cas protein; or a combination thereof.

In some embodiments, the HEPN domain comprises an RxxxxH motif. In some embodiments, the RxxxxH motif comprises an R{N/H/K}X1X2X3H (SEQ ID NO:78) sequence. In some embodiments, in the R{N/H/K}X1X2X3H sequence, X1 is R, S, D, E, Q, N, G, or Y, X2 is independently I, S, T, V, or L, and X3 is independently L, F, N, Y, V, I, S, D, E, or A.
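For illustration only, the R{N/H/K}X1X2X3H pattern described above can be expressed as a regular expression and searched for in a protein sequence. The following is a minimal sketch: the function name `find_hepn_motifs` and the toy sequence are hypothetical and are not part of the disclosure, and the toy sequence is not a Cas13 sequence.

```python
import re

# Residue classes taken from the paragraph above:
#   second position: N, H, or K
#   X1: R, S, D, E, Q, N, G, or Y
#   X2: I, S, T, V, or L
#   X3: L, F, N, Y, V, I, S, D, E, or A
HEPN_MOTIF = re.compile(r"R[NHK][RSDEQNGY][ISTVL][LFNYVISDEA]H")


def find_hepn_motifs(protein_seq):
    """Return (1-based start position, matched motif) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in HEPN_MOTIF.finditer(protein_seq.upper())]


# Hypothetical toy sequence, used only to exercise the pattern.
toy_seq = "MKTRNRILHGGSAARHEVFHPLRAAAAH"
print(find_hepn_motifs(toy_seq))  # [(4, 'RNRILH'), (15, 'RHEVFH')]
```

The trailing "RAAAAH" in the toy sequence has the general RxxxxH shape but is not reported, because its second residue is not N, H, or K.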

In some embodiments, the CRISPR-Cas protein is a Type VI CRISPR Cas protein. In some embodiments, the Type VI CRISPR Cas protein is Cas13. In some embodiments, the Type VI CRISPR Cas protein is a Cas13a, a Cas13b, a Cas13c, or a Cas13d.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R53, K943, R1041, Y164, R285, R287, K292, E296, N297, Q646, N647, R402, K393, N653, N652, R482, N480, D396, E397, D398, E399, K294, E400, R56, N157, H161, H452, N455, K484, N486, G566, H567, A656, V795, A796, W842, K871, E873, R874, R1068, N1069, or H1073.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R53, K943, R1041, Y164, R285, R287, K292, E296, N297, Q646, N647, R402, K393, N653, N652, R482, N480, D396, E397, D398, E399, K294, E400, R56, N157, H161, H452, N455, K484, N486, G566, H567, W842, K871, E873, R874, R1068, N1069, or H1073.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R53, K943, R1041, Y164, R285, R287, K292, E296, N297, Q646, N647, R402, K393, N653, N652, R482, N480, D396, E397, D398, E399, K294, or E400.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K393, R402, N482, T405, H407, S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, R56, N157, H161, R1068, N1069, or H1073.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N482, H407, S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, R56, N157, H161, R1068, N1069, or H1073.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: W842, K846, K870, E873, or R877. In some embodiments, in helical domain 1 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: W842, K846, K870, E873, or R877. In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: W842, K846, K870, E873, or R877. In some embodiments, in the bridge helix domain one or more mutation of an amino acid corresponding to the following amino acids in the bridge helix domain of PbCas13b: W842, K846, K870, E873, or R877. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N480, N482, N652, or N653. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N480, or N482. In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: K393, R402, N480, or N482. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: N652 or N653. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of PbCas13b: N652 or N653. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: T405, H407, S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, or K741. 
In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: H407, S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, or K741. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, A656, K655, N652, H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, S757, N756, or K741. In some embodiments, in a helical domain one or more mutation of an amino acid corresponding to the following amino acids in a helical domain of PbCas13b: S658, N653, A656, K655, N652, H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, S757, N756, or K741. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, S757, or N756.

In some embodiments, in helical domain 1 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, S757, or N756. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, R762, V795, A796, R791, G566, S757, or N756. In some embodiments, in helical domain 1 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: H567, H500, R762, V795, A796, R791, G566, S757, or N756. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K871, K857, K870, W842, E873, R877, K846, or R874. In some embodiments, in the bridge helix domain one or more mutation of an amino acid corresponding to the following amino acids in the bridge helix domain of PbCas13b: K871, K857, K870, W842, E873, R877, K846, or R874. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, or G566.

In some embodiments, in helical domain 1-2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-2 of PbCas13b: H567, H500, or G566. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, S757, or N756. In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, S757, or N756. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: R762, V795, A796, R791, S757, or N756. In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: R762, V795, A796, R791, S757, or N756. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, A656, K655, N652, K590, R638, or K741. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of PbCas13b: S658, N653, A656, K655, N652, K590, R638, or K741. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: T405, H407, N486, K484, N480, H452, N455, or K457. In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: T405, H407, N486, K484, N480, H452, N455, or K457. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, K655, N652, H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, S757, N756, or K741. 
In some embodiments, in a helical domain one or more mutation of an amino acid corresponding to the following amino acids in a helical domain of PbCas13b: S658, N653, K655, N652, H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, S757, N756, or K741. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, S757, or N756. In some embodiments, in helical domain 1 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, S757, or N756. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, R762, R791, G566, S757, or N756. In some embodiments, in helical domain 1 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: H567, H500, R762, R791, G566, S757, or N756. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, S757, or N756. In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, S757, or N756.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: R762, R791, S757, or N756. In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: R762, R791, S757, or N756. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, K655, N652, K590, R638, or K741. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of PbCas13b: S658, N653, K655, N652, K590, R638, or K741. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: H407, N486, K484, N480, H452, N455, or K457.

In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: H407, N486, K484, N480, H452, N455, or K457. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: R56, N157, H161, R1068, N1069, or H1073. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of PbCas13b: R56, N157, H161, R1068, N1069, or H1073. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: R56, N157, or H161. In some embodiments, in HEPN domain 1 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 1 of PbCas13b: R56, N157, or H161. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: R1068, N1069, or H1073. In some embodiments, in HEPN domain 2 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 2 of PbCas13b: R1068, N1069, or H1073. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N482, T405, H407, N486, K484, N480, H452, N455, or K457. In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: K393, R402, N482, T405, H407, N486, K484, N480, H452, N455, or K457. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N482, H407, N486, K484, N480, H452, N455, or K457. In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: K393, R402, N482, H407, N486, K484, N480, H452, N455, or K457. 
In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: T405, H407, S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, K393, R402, or N482. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: H407, S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, K393, R402, or N482. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, or K741. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, or K741. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: N486, K484, N480, H452, N455, or K457.

In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: N486, K484, N480, H452, N455, or K457. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N482, N486, K484, N480, H452, N455, or K457. In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: K393, R402, N482, N486, K484, N480, H452, N455, or K457. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, K393, R402, or N482. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, K393, R402, or N482.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, or R1041. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53 or Y164. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K943 or R1041. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, or R1041. In some embodiments, in HEPN domain 1 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): R53 or Y164. In some embodiments, in HEPN domain 2 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 2 of Prevotella buccae Cas13b (PbCas13b): K943 or R1041. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, R1041, R56, N157, H161, R1068, N1069, or H1073. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, R56, N157, or H161. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K943, R1041, R1068, N1069, or H1073. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, R1041, R56, N157, H161, R1068, N1069, or H1073. 
In some embodiments, in HEPN domain 1 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): R53, Y164, R56, N157, or H161. In some embodiments, in HEPN domain 2 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 2 of Prevotella buccae Cas13b (PbCas13b): K943, R1041, R1068, N1069, or H1073. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, K943, or R1041. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, or K193. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K943 or R1041.

In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, K943, or R1041. In some embodiments, in HEPN domain 1 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, or K193. In some embodiments, in HEPN domain 2 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 2 of Prevotella buccae Cas13b (PbCas13b): K943 or R1041. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, K943, R1041, R56, N157, H161, R1068, N1069, or H1073. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, R56, N157, or H161. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K943, R1041, R1068, N1069, or H1073. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, K943, R1041, R56, N157, H161, R1068, N1069, or H1073. In some embodiments, in HEPN domain 1 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, R56, N157, or H161.

In some embodiments, in HEPN domain 2 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 2 of Prevotella buccae Cas13b (PbCas13b): K943, R1041, R1068, N1069, or H1073. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K183 or K193. In some embodiments, in HEPN domain 1 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): K183 or K193. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, or R1041. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, or R1041. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, K943, or R1041; preferably R53A, R53K, R53D, or R53E; K943A, K943R, K943D, or K943E; or R1041A, R1041K, R1041D, or R1041E. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, K943, or R1041; preferably R53A, R53K, R53D, or R53E; K943A, K943R, K943D, or K943E; or R1041A, R1041K, R1041D, or R1041E. In some embodiments, a mutation of an amino acid corresponding to amino acid Y164 of Prevotella buccae Cas13b (PbCas13b), preferably Y164A, Y164F, or Y164W.
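Substitution labels such as R53A or Y164F follow the conventional form of wild-type residue, position, and substituted residue. As a hedged illustration of how such labels can be read and applied, the sketch below parses a label and applies it to a sequence after checking that the expected wild-type residue is present. The helper names `parse_mutation` and `apply_mutation` and the toy sequence are hypothetical; the PbCas13b sequence itself is not reproduced here.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter substituted residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'R53A' into ('R', 53, 'A')."""
    m = MUTATION.match(label)
    if m is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_mutation(seq, label):
    """Return seq with the substitution applied, checking the wild-type residue first."""
    wild_type, pos, new = parse_mutation(label)
    if pos < 1 or pos > len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


print(parse_mutation("Y164F"))        # ('Y', 164, 'F')
print(apply_mutation("MARKY", "R3A"))  # MAAKY  (hypothetical toy sequence)
```

The wild-type check is a design choice for the sketch: when a position is defined by correspondence to PbCas13b numbering, the residue number in an orthologue may differ, so a mismatch is reported rather than silently overwritten.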

In some embodiments, in HEPN domain 1 a mutation of an amino acid corresponding to amino acid Y164 in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b), preferably Y164A, Y164F, or Y164W. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, D434, K431, R402, K393, R482, N480, D396, E397, D398, or E399. In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, D434, K431, R402, K393, R482, N480, D396, E397, D398, or E399. In some embodiments, a mutation of an amino acid corresponding to amino acid H407 of Prevotella buccae Cas13b (PbCas13b), preferably H407Y, H407W, or H407F. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R402, K393, R482, N480, D396, E397, D398, or E399. In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of Prevotella buccae Cas13b (PbCas13b): R402, K393, R482, N480, D396, E397, D398, or E399. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K457, D434, or K431. In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of Prevotella buccae Cas13b (PbCas13b): K457, D434, or K431.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, Q646, N647, N653, or N652. In some embodiments, in a helical domain one or more mutation of an amino acid corresponding to the following amino acids in a helical domain of Prevotella buccae Cas13b (PbCas13b): H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, Q646, N647, N653, or N652. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H500, K570, N756, S757, R762, R791, K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some embodiments, in helical domain 1 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1 of Prevotella buccae Cas13b (PbCas13b): H500, K570, N756, S757, R762, R791, K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H500, K570, N756, S757, R762, or R791. In some embodiments, in helical domain 1 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1 of Prevotella buccae Cas13b (PbCas13b): H500, K570, N756, S757, R762, or R791.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some embodiments, in the bridge helix domain one or more mutation of an amino acid corresponding to the following amino acids in the bridge helix domain of Prevotella buccae Cas13b (PbCas13b): K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H500 or K570. In some embodiments, in helical domain 1-2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-2 of Prevotella buccae Cas13b (PbCas13b): H500 or K570. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, R791, K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, R791, K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, or R791. In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, or R791. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, R791, K846, K857, K870, or R877. 
In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, R791, K846, K857, K870, or R877. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some embodiments, in helical domain 1-3 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 1-3 of Prevotella buccae Cas13b (PbCas13b): K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K590, N634, R638, N652, N653, K655, S658, K741, K744, R600, K607, K612, R614, K617, R618, Q646, N647, N653, or N652. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): K590, N634, R638, N652, N653, K655, S658, K741, K744, R600, K607, K612, R614, K617, R618, Q646, N647, N653, or N652.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): Q646 or N647. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): Q646 or N647. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N653 or N652. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): N653 or N652. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K590, N634, R638, N652, N653, K655, S658, K741, or K744. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): K590, N634, R638, N652, N653, K655, S658, K741, or K744. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R600, K607, K612, R614, K617, or R618. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): R600, K607, K612, R614, K617, or R618. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R285, R287, K292, E296, N297, or K294. In some embodiments, in the IDL domain one or more mutation of an amino acid corresponding to the following amino acids in the IDL domain of Prevotella buccae Cas13b (PbCas13b): R285, R287, K292, E296, N297, or K294. 
In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R285, K292, E296, or N297. In some embodiments, in the IDL domain one or more mutation of an amino acid corresponding to the following amino acids in the IDL domain of Prevotella buccae Cas13b (PbCas13b): R285, K292, E296, or N297.

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R285, R287, K292, E296, N297, Q646, N647, or K294. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R402, K393, N653, N652, R482, N480, D396, E397, D398, or E399. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, K655, R762, or R1041; preferably R53A or R53D; K655A; R762A; or R1041E or R1041D. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N297, E296, K292, or R285; preferably N297A, E296A, K292A, or R285A. In some embodiments, in (the central channel of) the IDL domain one or more mutation of an amino acid corresponding to the following amino acids in (the central channel of) the IDL domain of Prevotella buccae Cas13b (PbCas13b): N297, E296, K292, or R285; preferably N297A, E296A, K292A, or R285A. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): Q831, K836, R838, N652, N653, R830, K655 or R762; preferably Q831A, K836A, R838A, N652A, N653A, R830A, K655A, or R762A. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N652, N653, R830, K655 or R762; preferably N652A, N653A, R830A, K655A, or R762A. 
In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K655 or R762; preferably K655A or R762A. In some embodiments, in a helical domain one or more mutation of an amino acid corresponding to the following amino acids in a helical domain of Prevotella buccae Cas13b (PbCas13b): Q831, K836, R838, N652, N653, R830, K655 or R762; preferably Q831A, K836A, R838A, N652A, N653A, R830A, K655A, or R762A.

In some embodiments, in a helical domain one or more mutation of an amino acid corresponding to the following amino acids in a helical domain of Prevotella buccae Cas13b (PbCas13b): N652, N653, R830, K655 or R762; preferably N652A, N653A, R830A, K655A, or R762A. In some embodiments, in helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): K655 or R762; preferably K655A or R762A. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R614, K607, K193, K183 or R600; preferably R614A, K607A, K193A, K183A or R600A. In some embodiments, in the trans-subunit loop of helical domain 2 one or more mutation of an amino acid corresponding to the following amino acids in the trans-subunit loop of helical domain 2 of Prevotella buccae Cas13b (PbCas13b): Q646 or N647; preferably Q646A or N647A. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53 or R1041; preferably R53A or R53D, or R1041E or R1041D. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53 or R1041; preferably R53A or R53D, or R1041E or R1041D. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K457, D397, E398, D399, E400, T405, H407 or D434; preferably D397A, E398A, D399A, E400A, T405A, H407A, H407W, H407Y, H407F or D434A. 
In some embodiments, in the LID domain one or more mutation of an amino acid corresponding to the following amino acids in the LID domain of Prevotella buccae Cas13b (PbCas13b): K457, D397, E398, D399, E400, T405, H407 or D434; preferably D397A, E398A, D399A, E400A, T405A, H407A, H407W, H407Y, H407F or D434A. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): amino acids 46-57, 73-79, 152-164, 1036-1046, and 1064-1074. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R156, N157, H161, R1068, N1069, and H1073. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R285, R287, K292, K294, E296, and N297. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K826, K828, K829, R824, R830, Q831, K835, K836, and R838. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, and R877.
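The substitutions named above use the conventional shorthand of wild-type residue, position, and replacement residue (for example, R53A or H407W). As an illustration only, the following minimal sketch shows how such a label might be parsed and applied to a sequence while checking that the expected wild-type residue is present. The function names and the toy sequence are hypothetical and are not part of the disclosure; the toy sequence is not a Cas13b sequence.

```python
import re

# Substitutions written as <wild-type residue><position><new residue>, e.g. "R53A".
_SUBSTITUTION = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def parse_substitution(label):
    """Split a label such as 'R53A' into (wild_type, position, replacement)."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitution(sequence, label):
    """Return a copy of `sequence` with the substitution applied (1-based positions)."""
    wild_type, position, replacement = parse_substitution(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    found = sequence[position - 1]
    if found != wild_type:
        # Guards against applying a label to the wrong protein or numbering scheme.
        raise ValueError(f"expected {wild_type} at position {position}, found {found}")
    return sequence[:position - 1] + replacement + sequence[position:]


# Hypothetical 10-residue toy sequence, used only to exercise the functions.
toy_sequence = "MKRHDENQST"
toy_mutant = apply_substitution(toy_sequence, "R3A")
```

The wild-type check is the design point here: because the residue numbers refer to PbCas13b, applying a label to a different protein without first mapping positions would be caught rather than silently producing a wrong mutant.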

In some embodiments, a mutation of an amino acid corresponding to amino acid T405 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid H407 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K457 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid H500 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K570 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K590 of Prevotella buccae Cas13b (PbCas13b).

In some embodiments, a mutation of an amino acid corresponding to amino acid N634 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R638 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N652 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N653 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K655 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid S658 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K741 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K744 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N756 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid S757 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R762 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R791 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K846 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K857 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K870 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R877 of Prevotella buccae Cas13b (PbCas13b). 
In some embodiments, a mutation of an amino acid corresponding to amino acid K183 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K193 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R600 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K607 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K612 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R614 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K617 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K826 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K828 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K829 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R824 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R830 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid Q831 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K835 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K836 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R838 of Prevotella buccae Cas13b (PbCas13b). 
In some embodiments, a mutation of an amino acid corresponding to amino acid R618 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid D434 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K431 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R53 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K943 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R1041 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid Y164 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R285 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R287 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K292 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid E296 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N297 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid Q646 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N647 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R402 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K393 of Prevotella buccae Cas13b (PbCas13b). 
In some embodiments, a mutation of an amino acid corresponding to amino acid N653 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N652 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R482 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N480 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid D396 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid E397 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid D398 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid E399 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K294 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid E400 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R56 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N157 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid H161 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid H452 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N455 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K484 of Prevotella buccae Cas13b (PbCas13b). 
In some embodiments, a mutation of an amino acid corresponding to amino acid N486 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid G566 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid H567 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid A656 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid V795 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid A796 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid W842 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid K871 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid E873 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R874 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid R1068 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid N1069 of Prevotella buccae Cas13b (PbCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid H1073 of Prevotella buccae Cas13b (PbCas13b).

In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, H602, R1278, N1279, or H1283. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, H602, R1278, N1279, or H1283. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, or H602. In some embodiments, in HEPN domain 1 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 1 of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, or H602. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Leptotrichia shahii Cas13a (LshCas13a): R1278, N1279, or H1283. In some embodiments, in HEPN domain 2 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 2 of Leptotrichia shahii Cas13a (LshCas13a): R1278, N1279, or H1283. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Porphyromonas gulae Cas13b (PguCas13b): R146, H151, R1116, or H1121. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Porphyromonas gulae Cas13b (PguCas13b): R146, H151, R1116, or H1121. 
In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Porphyromonas gulae Cas13b (PguCas13b): R146 or H151. In some embodiments, in HEPN domain 1 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 1 of Porphyromonas gulae Cas13b (PguCas13b): R146 or H151. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Porphyromonas gulae Cas13b (PguCas13b): R1116 or H1121. In some embodiments, in HEPN domain 2 one or more mutation of an amino acid corresponding to the following amino acids in HEPN domain 2 of Porphyromonas gulae Cas13b (PguCas13b): R1116 or H1121. In some embodiments, one or more mutation of an amino acid corresponding to the following amino acids of Prevotella sp. P5-125 Cas13b (PspCas13b): H133 or H1058. In some embodiments, in a HEPN domain one or more mutation of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella sp. P5-125 Cas13b (PspCas13b): H133 or H1058. In some embodiments, a mutation of an amino acid corresponding to amino acid H133 of Prevotella sp. P5-125 Cas13b (PspCas13b). In some embodiments, in HEPN domain 1 a mutation of an amino acid corresponding to amino acid H133 in HEPN domain 1 of Prevotella sp. P5-125 Cas13b (PspCas13b). In some embodiments, a mutation of an amino acid corresponding to amino acid H1058 of Prevotella sp. P5-125 Cas13b (PspCas13b). In some embodiments, in HEPN domain 2 a mutation of an amino acid corresponding to the amino acid H1058 in HEPN domain 2 of Prevotella sp. P5-125 Cas13b (PspCas13b).
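The embodiments above are phrased in terms of an amino acid "corresponding to" a numbered residue of a reference protein (PbCas13b, LshCas13a, PguCas13b, or PspCas13b). One common way to find such a corresponding residue in another ortholog is to read it off a pairwise sequence alignment. The following is a minimal sketch of that lookup, assuming the two sequences have already been aligned and are supplied as equal-length gapped strings. The function name and the toy alignment are hypothetical; the disclosure does not prescribe this particular procedure.

```python
def corresponding_position(aligned_ref, aligned_query, ref_position):
    """Map a 1-based position in the ungapped reference to the ungapped query.

    Both inputs are rows of a pairwise alignment with '-' marking gaps.
    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_index = 0    # residues of the reference seen so far
    query_index = 0  # residues of the query seen so far
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_index += 1
        if query_char != "-":
            query_index += 1
        if ref_char != "-" and ref_index == ref_position:
            return None if query_char == "-" else query_index
    raise ValueError("ref_position is outside the reference sequence")


# Hypothetical toy alignment, not real Cas13 sequences.
ref_aln = "MK-RHDEN"
query_aln = "MKAR-DQN"
```

In this toy alignment, reference residue 3 (R) lines up with query residue 4 (R), while reference residue 4 (H) falls opposite a gap and therefore has no corresponding residue.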

In some embodiments, the amino acid is mutated to A, P, or V, preferably A. In some embodiments, said amino acid is mutated to a hydrophobic amino acid. In some embodiments, said amino acid is mutated to an aromatic amino acid. In some embodiments, said amino acid is mutated to a charged amino acid. In some embodiments, said amino acid is mutated to a positively charged amino acid. In some embodiments, said amino acid is mutated to a negatively charged amino acid. In some embodiments, said amino acid is mutated to a polar amino acid. In some embodiments, said amino acid is mutated to an aliphatic amino acid. In some embodiments, the engineered CRISPR-Cas protein further comprises a functional heterologous domain.
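The residue classes named above (hydrophobic, aromatic, charged, polar, aliphatic) can be made concrete with a lookup table. The sketch below uses one common textbook grouping of the twenty standard amino acids. The exact membership of each class is an assumption made for illustration, since groupings differ between references (histidine, cysteine, glycine, and tyrosine in particular are classified differently by different authors), and the disclosure does not define them here. The names `RESIDUE_CLASSES` and `substitution_candidates` are hypothetical.

```python
# One common grouping of the standard amino acids; membership is an assumption.
RESIDUE_CLASSES = {
    "aliphatic": set("AVLI"),
    "aromatic": set("FWY"),
    "hydrophobic": set("AVLIMFWY"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "polar": set("STNQ"),
}
# "Charged" is taken as the union of the positively and negatively charged residues.
RESIDUE_CLASSES["charged"] = RESIDUE_CLASSES["positive"] | RESIDUE_CLASSES["negative"]


def substitution_candidates(wild_type, target_class):
    """List residues of `target_class` that differ from the wild-type residue."""
    if target_class not in RESIDUE_CLASSES:
        raise KeyError(f"unknown residue class: {target_class!r}")
    return sorted(RESIDUE_CLASSES[target_class] - {wild_type})
```

For example, `substitution_candidates("K", "negative")` lists the charge-reversing replacements for a lysine, and `substitution_candidates("R", "positive")` lists the charge-preserving ones.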

In some embodiments, the Cas13 protein is from a species of the genus Alistipes, Anaerosalibacter, Bacteroides, Bacteroidetes, Bergeyella, Blautia, Butyrivibrio, Capnocytophaga, Carnobacterium, Chloroflexus, Chryseobacterium, Clostridium, Demequina, Eubacteriaceae, Eubacterium, Flavobacterium, Fusobacterium, Herbinix, Insolitispirillum, Lachnospiraceae, Leptotrichia, Listeria, Myroides, Paludibacter, Phaeodactylibacter, Porphyromonadaceae, Porphyromonas, Prevotella, Pseudobutyrivibrio, Psychroflexus, Reichenbachiella, Rhodobacter, Riemerella, Sinomicrobium, Thalassospira, or Ruminococcus; preferably Leptotrichia shahii, Listeria seeligeri, Lachnospiraceae bacterium (such as Lb MA2020, Lb NK4A179, Lb NK4A144), Clostridium aminophilum (such as Ca DSM 10710), Carnobacterium gallinarum (such as Cg DSM 4847), Paludibacter propionicigenes (such as Pp WB4), Listeria weihenstephanensis (such as Lw FSL R9-0317), Listeriaceae bacterium (such as Lb FSL M6-0635), Leptotrichia wadei (such as Lw F0279), Rhodobacter capsulatus (such as Rc SB 1003, Rc R121, Rc DE442), Leptotrichia buccalis (such as Lb C-1013-b), Herbinix hemicellulosilytica, Eubacteriaceae bacterium (such as Eb CHKCI004), Blautia sp. Marseille-P2398, Leptotrichia sp. oral taxon 879 str. F0557, Chloroflexus aggregans, Demequina aurantiaca, Thalassospira sp. TSL5-1, Pseudobutyrivibrio sp. OR37, Butyrivibrio sp. YAB3001, Leptotrichia sp. Marseille-P3007, Bacteroides ihuae, Porphyromonadaceae bacterium (such as Pb KH3CP3RA), Listeria riparia, Insolitispirillum peregrinum, Alistipes sp. ZOR0009, Bacteroides pyogenes (such as Bp F0041), Bacteroidetes bacterium (such as Bb GWA2_31_9), Bergeyella zoohelcum (such as Bz ATCC 43767), Capnocytophaga canimorsus, Capnocytophaga cynodegmi, Chryseobacterium carnipullorum, Chryseobacterium jejuense, Chryseobacterium ureilyticum, Flavobacterium branchiophilum, Flavobacterium columnare, Flavobacterium sp. 316, Myroides odoratimimus (such as Mo CCUG 10230, Mo CCUG 12901, Mo CCUG 3837), Paludibacter propionicigenes, Phaeodactylibacter xiamenensis, Porphyromonas gingivalis (such as Pg F0185, Pg F0568, Pg JCVI SC001, Pg W4087), Porphyromonas gulae, Porphyromonas sp. COT-052 OH4946, Prevotella aurantiaca, Prevotella buccae (such as Pb ATCC 33574), Prevotella falsenii, Prevotella intermedia (such as Pi 17, Pi ZT), Prevotella pallens (such as Pp ATCC 700821), Prevotella pleuritidis, Prevotella saccharolytica (such as Ps F0055), Prevotella sp. MA2016, Prevotella sp. MSX73, Prevotella sp. P4-76, Prevotella sp. P5-119, Prevotella sp. P5-125, Prevotella sp. P5-60, Psychroflexus torquis, Reichenbachiella agariperforans, Riemerella anatipestifer, Sinomicrobium oceani, Fusobacterium necrophorum (such as Fn subsp. funduliforme ATCC 51357, Fn DJ-2, Fn BFTR-1, Fn subsp. funduliforme), Fusobacterium perfoetens (such as Fp ATCC 29250), Fusobacterium ulcerans (such as Fu ATCC 49185), Anaerosalibacter sp. ND1, Eubacterium siraeum, Ruminococcus flavefaciens (such as Rfx XPD3002), or Ruminococcus albus.

In some embodiments, the Cas13 protein is a Cas13a protein.

In some embodiments, the Cas13a protein is from a species of the genus Bacteroides, Blautia, Butyrivibrio, Carnobacterium, Chloroflexus, Clostridium, Demequina, Eubacterium, Herbinix, Insolitispirillum, Lachnospiraceae, Leptotrichia, Listeria, Paludibacter, Porphyromonadaceae, Pseudobutyrivibrio, Rhodobacter, or Thalassospira; preferably Leptotrichia shahii, Listeria seeligeri, Lachnospiraceae bacterium (such as Lb MA2020, Lb NK4A179, Lb NK4A144), Clostridium aminophilum (such as Ca DSM 10710), Carnobacterium gallinarum (such as Cg DSM 4847), Paludibacter propionicigenes (such as Pp WB4), Listeria weihenstephanensis (such as Lw FSL R9-0317), Listeriaceae bacterium (such as Lb FSL M6-0635), Leptotrichia wadei (such as Lw F0279), Rhodobacter capsulatus (such as Rc SB 1003, Rc R121, Rc DE442), Leptotrichia buccalis (such as Lb C-1013-b), Herbinix hemicellulosilytica, Eubacteriaceae bacterium (such as Eb CHKCI004), Blautia sp. Marseille-P2398, Leptotrichia sp. oral taxon 879 str. F0557, Chloroflexus aggregans, Demequina aurantiaca, Thalassospira sp. TSL5-1, Pseudobutyrivibrio sp. OR37, Butyrivibrio sp. YAB3001, Leptotrichia sp. Marseille-P3007, Bacteroides ihuae, Porphyromonadaceae bacterium (such as Pb KH3CP3RA), Listeria riparia, or Insolitispirillum peregrinum.

In some embodiments, the Cas13 protein is a Cas13b protein.

In some embodiments, the Cas13b protein is from a species of the genus Alistipes, Bacteroides, Bacteroidetes, Bergeyella, Capnocytophaga, Chryseobacterium, Flavobacterium, Myroides, Paludibacter, Phaeodactylibacter, Porphyromonas, Prevotella, Psychroflexus, Reichenbachiella, Riemerella, or Sinomicrobium; preferably Alistipes sp. ZOR0009, Bacteroides pyogenes (such as Bp F0041), Bacteroidetes bacterium (such as Bb GWA2_31_9), Bergeyella zoohelcum (such as Bz ATCC 43767), Capnocytophaga canimorsus, Capnocytophaga cynodegmi, Chryseobacterium carnipullorum, Chryseobacterium jejuense, Chryseobacterium ureilyticum, Flavobacterium branchiophilum, Flavobacterium columnare, Flavobacterium sp. 316, Myroides odoratimimus (such as Mo CCUG 10230, Mo CCUG 12901, Mo CCUG 3837), Paludibacter propionicigenes, Phaeodactylibacter xiamenensis, Porphyromonas gingivalis (such as Pg F0185, Pg F0568, Pg JCVI SC001, Pg W4087), Porphyromonas gulae, Porphyromonas sp. COT-052 OH4946, Prevotella aurantiaca, Prevotella buccae (such as Pb ATCC 33574), Prevotella falsenii, Prevotella intermedia (such as Pi 17, Pi ZT), Prevotella pallens (such as Pp ATCC 700821), Prevotella pleuritidis, Prevotella saccharolytica (such as Ps F0055), Prevotella sp. MA2016, Prevotella sp. MSX73, Prevotella sp. P4-76, Prevotella sp. P5-119, Prevotella sp. P5-125, Prevotella sp. P5-60, Psychroflexus torquis, Reichenbachiella agariperforans, Riemerella anatipestifer, or Sinomicrobium oceani.

In some embodiments, the Cas13 protein is a Cas13c protein.

In some embodiments, the Cas13c protein is from a species of the genus Fusobacterium or Anaerosalibacter; preferably Fusobacterium necrophorum (such as Fn subsp. funduliforme ATCC 51357, Fn DJ-2, Fn BFTR-1, Fn subsp. funduliforme), Fusobacterium perfoetens (such as Fp ATCC 29250), Fusobacterium ulcerans (such as Fu ATCC 49185), or Anaerosalibacter sp. ND1.

In some embodiments, the Cas13 protein is a Cas13d protein.

In some embodiments, the Cas13d protein is from a species of the genus Eubacterium or Ruminococcus, preferably Eubacterium siraeum, Ruminococcus flavefaciens (such as Rfx XPD3002), or Ruminococcus albus.

In some embodiments, the catalytic activity of the engineered CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the catalytic activity of the engineered CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the gRNA binding of the engineered CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the gRNA binding of the engineered CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the specificity of the CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the specificity of the CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the stability of the CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the stability of the CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the engineered CRISPR-Cas protein further comprises one or more mutations which inactivate catalytic activity. In some embodiments, the off-target binding of the CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the off-target binding of the CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the target binding of the CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the target binding of the CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. 
In some embodiments, the engineered CRISPR-Cas protein has a higher protease activity or polynucleotide-binding capability compared with a corresponding wildtype CRISPR-Cas protein. In some embodiments, PFS recognition is altered as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the engineered CRISPR-Cas protein further comprises a functional heterologous domain. In some embodiments, the engineered CRISPR-Cas protein further comprises an NLS.

In another aspect, the present disclosure provides a CRISPR-Cas protein that comprises one or more HEPN domains and is less than 1000 amino acids in length. In some embodiments, the protein is less than 950, less than 900, less than 850, less than 800, or less than 750 amino acids in size. In some embodiments, the HEPN domain comprises an RxxxxH motif sequence. In some embodiments, the RxxxxH motif comprises an R[N/H/K]X1X2X3H sequence. In some embodiments, X1 is R, S, D, E, Q, N, G, or Y, X2 is independently I, S, T, V, or L, and X3 is independently L, F, N, Y, V, I, S, D, E, or A. In some embodiments, the CRISPR-Cas protein is a Type VI CRISPR Cas protein. In some embodiments, the Type VI CRISPR Cas protein is a Cas13a, a Cas13b, a Cas13c, or a Cas13d. In some embodiments, the CRISPR-Cas protein is associated with a functional domain. In some embodiments, the CRISPR-Cas protein comprises one or more mutations equivalent to mutations described herein. In some embodiments, the CRISPR-Cas protein comprises one or more mutations in the helical domain. In some embodiments, the CRISPR-Cas protein is in a dead form or has nickase activity.
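By way of illustration only, the R[N/H/K]X1X2X3H motif and the size limit recited above can be expressed as a simple sequence check. The following Python sketch is not part of the disclosed methods; the function names and the toy sequence are hypothetical, and only the residue sets are taken from the preceding paragraph.

```python
import re

# Residue sets from the text: R, then N/H/K, then X1, X2, X3, then H.
HEPN_MOTIF = r"R[NHK][RSDEQNGY][ISTVL][LFNYVISDEA]H"

# A lookahead is used so that overlapping occurrences are also reported.
_HEPN_SCAN = re.compile(r"(?=" + HEPN_MOTIF + r")")


def find_hepn_motifs(protein_seq):
    """Return the 1-based start position of every motif occurrence."""
    seq = protein_seq.upper()
    return [m.start() + 1 for m in _HEPN_SCAN.finditer(seq)]


def is_compact_hepn_protein(protein_seq, max_length=1000):
    """True if the sequence is shorter than max_length and has a motif."""
    return len(protein_seq) < max_length and bool(find_hepn_motifs(protein_seq))


# Hypothetical toy sequence (not a real Cas13 protein) with two motifs.
toy = "MKT" + "RNRILH" + "GGSGGS" + "RKGTAH" + "AAA"
print(find_hepn_motifs(toy))
print(is_compact_hepn_protein(toy))
```

The `max_length` argument may be lowered (for example to 950, 900, 850, 800, or 750) to mirror the narrower size embodiments.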

In another aspect, the present disclosure provides a polynucleic acid encoding the engineered CRISPR-Cas protein herein. In some embodiments, the polynucleic acid is codon optimized.

In another aspect, the present disclosure provides a CRISPR-Cas system comprising the engineered CRISPR-Cas protein herein or the polynucleotide herein, and a nucleotide component capable of forming a complex with the engineered CRISPR-Cas protein and able to hybridize with a target nucleic acid sequence and direct sequence-specific binding of said complex to the target nucleic acid sequence.

In another aspect, the present disclosure provides a vector system comprising one or more vectors, the one or more vectors comprising one or more polynucleotide molecules encoding components of the engineered CRISPR-Cas protein.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid comprising: introducing in a cell or organism that comprises the target nucleic acid, the engineered CRISPR-Cas protein, the polynucleotide, the CRISPR-Cas system, or the vector or vector system described herein, such that the engineered CRISPR-Cas protein modifies the target nucleic acid in the cell or organism.

In some embodiments, the engineered CRISPR-Cas system is introduced via delivery by liposomes, nanoparticles, exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, an implantable device, or the vector system herein. In some embodiments, the engineered CRISPR-Cas protein is associated with one or more functional domains. In some embodiments, the target nucleic acid comprises a genomic locus, and the engineered CRISPR-Cas protein modifies a gene product encoded at the genomic locus or expression of the gene product. In some embodiments, the target nucleic acid is DNA or RNA and one or more nucleotides in the target nucleic acid are base edited. In some embodiments, the target nucleic acid is DNA or RNA and the target nucleic acid is cleaved. In some embodiments, the engineered CRISPR-Cas protein further cleaves non-target nucleic acid. In some embodiments, the method further comprises visualizing activity and, optionally, using a detectable label. In some embodiments, the method further comprises detecting binding of one or more components of the CRISPR-Cas system to the target nucleic acid. In some embodiments, said cell or organism is a eukaryotic cell or organism. In some embodiments, said cell or organism is an animal cell or organism. In some embodiments, said cell or organism is a plant cell or organism.

In another aspect, the present disclosure provides a method for detecting a target nucleic acid in a sample comprising: contacting a sample with: an engineered CRISPR-Cas protein herein; at least one guide polynucleotide comprising a guide sequence capable of binding to the target nucleic acid and designed to form a complex with the engineered CRISPR-Cas protein; and an RNA-based masking construct comprising a non-target sequence; wherein the engineered CRISPR-Cas protein exhibits collateral RNase activity and cleaves the non-target sequence of the detection construct; and detecting a signal from cleavage of the non-target sequence, thereby detecting the target nucleic acid in the sample.

In some embodiments, the method further comprises contacting the sample with reagents for amplifying the target nucleic acid. In some embodiments, the reagents for amplifying comprise isothermal amplification reaction reagents. In some embodiments, the isothermal amplification reagents comprise nucleic-acid sequence-based amplification, recombinase polymerase amplification, loop-mediated isothermal amplification, strand displacement amplification, helicase-dependent amplification, or nicking enzyme amplification reagents. In some embodiments, the target nucleic acid is a DNA molecule and the method further comprises contacting the target DNA molecule with a primer comprising an RNA polymerase site and RNA polymerase. In some embodiments, the masking construct: suppresses generation of a detectable positive signal until the masking construct is cleaved or deactivated, or masks a detectable positive signal or generates a detectable negative signal until the masking construct is cleaved or deactivated.

In some embodiments, the masking construct comprises: a. a silencing RNA that suppresses generation of a gene product encoded by a reporting construct, wherein the gene product generates the detectable positive signal when expressed; b. a ribozyme that generates the negative detectable signal, and wherein the positive detectable signal is generated when the ribozyme is deactivated; c. a ribozyme that converts a substrate to a first color and wherein the substrate converts to a second color when the ribozyme is deactivated; d. an aptamer and/or a polynucleotide-tethered inhibitor; e. a polynucleotide to which a detectable ligand and a masking component are attached; f. a nanoparticle held in aggregate by bridge molecules, wherein at least a portion of the bridge molecules comprises a polynucleotide, and wherein the solution undergoes a color shift when the nanoparticle is dispersed in solution; g. a quantum dot or fluorophore linked to one or more quencher molecules by a linking molecule, wherein at least a portion of the linking molecule comprises a polynucleotide; h. a polynucleotide in complex with an intercalating agent, wherein the intercalating agent changes absorbance upon cleavage of the polynucleotide; or i. two fluorophores tethered by a polynucleotide that undergo a shift in fluorescence when released from the polynucleotide.

In some embodiments, the aptamer: a. comprises a polynucleotide-tethered inhibitor that sequesters an enzyme, wherein the enzyme generates a detectable signal upon release from the aptamer or polynucleotide-tethered inhibitor by acting upon a substrate; or b. is an inhibitory aptamer that inhibits an enzyme and prevents the enzyme from catalyzing generation of a detectable signal from a substrate or wherein the polynucleotide-tethered inhibitor inhibits an enzyme and prevents the enzyme from catalyzing generation of a detectable signal from a substrate; or c. sequesters a pair of agents that when released from the aptamers combine to generate a detectable signal. In some embodiments, the nanoparticle is a colloidal metal. In some embodiments, the at least one guide polynucleotide comprises a mismatch. In some embodiments, the mismatch is up- or downstream of a single nucleotide variation on the one or more guide sequences.

In another aspect, the present disclosure provides a cell or organism comprising the engineered CRISPR-Cas protein herein, the polynucleic acid herein, the CRISPR-Cas system, or the vector or vector system herein.

In another aspect, the present disclosure provides an engineered adenosine deaminase comprising one or more mutations, wherein the engineered adenosine deaminase has cytidine deaminase activity.

In some embodiments, the engineered adenosine deaminase has adenosine deaminase activity. In some embodiments, the engineered adenosine deaminase is a portion of a fusion protein. In some embodiments, the fusion protein comprises a functional domain. In some embodiments, the functional domain is capable of directing the engineered adenosine deaminase to bind to a target nucleic acid. In some embodiments, the functional domain is a CRISPR-Cas protein herein. In some embodiments, the CRISPR-Cas protein is a dead form CRISPR-Cas protein or CRISPR-Cas nickase protein. In some embodiments, the one or more mutations comprise: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E, S661T based on amino acid sequence positions of hADAR2-D, and corresponding mutations in a homologous ADAR protein. In some embodiments, the one or more mutations comprise: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E, and S661T based on amino acid sequence positions of hADAR2-D, and corresponding mutations in a homologous ADAR protein.
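For illustration, substitution labels of the form recited above (wild-type residue, position, replacement residue) can be parsed and applied to a reference sequence programmatically. The Python sketch below is an assumption-laden example rather than a disclosed procedure: the helper names are hypothetical, the hADAR2-D sequence is not reproduced here, and the three-residue fragment used in the demonstration is invented solely to exercise the code.

```python
import re

# Substitution labels as listed in the text (hADAR2-D numbering).
LISTED_MUTATIONS = [
    "E488Q", "V351G", "S486A", "T375S", "S370C", "P462A", "N597I",
    "L332I", "I398V", "K350I", "M383L", "D619G", "S582T", "V440I",
    "S495N", "K418E", "S661T",
]

_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'E488Q' into ('E', 488, 'Q')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(seq, labels, first_position=1):
    """Apply substitutions to seq, where seq[0] is numbered first_position.

    Raises if a position is outside seq or the wild-type residue differs,
    which guards against using the wrong reference numbering.
    """
    residues = list(seq)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        index = position - first_position
        if not 0 <= index < len(residues):
            raise IndexError(f"{label}: position outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(f"{label}: found {residues[index]} instead")
        residues[index] = replacement
    return "".join(residues)


# Invented fragment numbered 487-489; NOT real hADAR2-D sequence.
print(apply_mutations("AEK", ["E488Q"], first_position=487))
```

Checking the wild-type residue before substituting is a deliberate choice: when corresponding mutations are transferred to a homologous ADAR protein, the position numbers generally shift, and the check surfaces such mismatches early.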

In another aspect, the present disclosure provides a polynucleotide encoding the engineered adenosine deaminase, or a catalytic domain thereof. In another aspect, the present disclosure provides a vector comprising the polynucleotide.

In another aspect, the present disclosure provides a pharmaceutical composition comprising the engineered adenosine deaminase or a catalytic domain thereof formulated for delivery by liposomes, nanoparticles, exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, or an implantable device.

In another aspect, the present disclosure provides an engineered cell expressing the engineered adenosine deaminase or a catalytic domain thereof. In some embodiments, the cell transiently expresses the engineered adenosine deaminase or the catalytic domain thereof. In some embodiments, the cell non-transiently expresses the engineered adenosine deaminase or the catalytic domain thereof.

In another aspect, the present disclosure provides an engineered, non-naturally occurring system for modifying nucleotides in a target nucleic acid, comprising a) a dead CRISPR-Cas or CRISPR-Cas nickase protein, or a nucleotide sequence encoding said dead Cas or Cas nickase protein; b) a guide molecule comprising a guide sequence that hybridizes to a target sequence and designed to form a complex with the dead CRISPR-Cas or CRISPR-Cas nickase protein; and c) a nucleotide deaminase protein or catalytic domain thereof, or a nucleotide sequence encoding said nucleotide deaminase protein or catalytic domain thereof, wherein said nucleotide deaminase protein or catalytic domain thereof is covalently or non-covalently linked to said dead CRISPR-Cas or CRISPR-Cas nickase protein or said guide molecule, or is adapted to link thereto after delivery.

In some embodiments, said adenosine deaminase protein or catalytic domain thereof comprises one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E, S661T based on amino acid sequence positions of hADAR2-D, and corresponding mutations in a homologous ADAR protein. In some embodiments, said adenosine deaminase protein or catalytic domain thereof comprises mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E, and S661T based on amino acid sequence positions of hADAR2-D, and corresponding mutations in a homologous ADAR protein.

In some embodiments, the CRISPR-Cas protein is Cas9, Cas12, Cas13, Cas14, CasX, or CasY. In some embodiments, the CRISPR-Cas protein is Cas13b. In some embodiments, the CRISPR-Cas protein is Cas13b-t1, Cas13b-t2, or Cas13b-t3. In some embodiments, the CRISPR-Cas protein is an engineered CRISPR-Cas protein.

In another aspect, the present disclosure provides a method for modifying a nucleotide in a target nucleic acid, comprising: delivering to said target nucleic acid the engineered adenosine deaminase, or the system, wherein the deaminase deaminates a nucleotide at one or more target loci on the target nucleic acid.

In some embodiments, said nucleotide deaminase protein or catalytic domain thereof has been modified to increase activity against a DNA-RNA heteroduplex. In some embodiments, said nucleotide deaminase protein or catalytic domain thereof has been modified to reduce off-target effects. In some embodiments, the target nucleic acid is within a cell. In some embodiments, said cell is a eukaryotic cell. In some embodiments, said cell is a non-human animal cell. In some embodiments, said cell is a human cell. In some embodiments, said cell is a plant cell. In some embodiments, said target nucleic acid is within an animal. In some embodiments, said target nucleic acid is within a plant. In some embodiments, said target nucleic acid is comprised in a DNA molecule in vitro. In some embodiments, the engineered adenosine deaminase, or one or more components of the system are delivered to the cell as a ribonucleoprotein complex. In some embodiments, the engineered adenosine deaminase, or one or more components of the system are delivered via one or more particles, one or more vesicles, or one or more viral vectors. In some embodiments, said one or more particles comprise a lipid, a sugar, a metal or a protein. In some embodiments, said one or more particles comprise lipid nanoparticles. In some embodiments, said one or more vesicles comprise exosomes or liposomes. In some embodiments, said one or more viral vectors comprise one or more adenoviral vectors, one or more lentiviral vectors, or one or more adeno-associated viral vectors. In some embodiments, said method modifies a cell, a cell line or an organism by manipulation of one or more target sequences at genomic loci of interest. In some embodiments, said deamination of said nucleotide at said target locus of interest remedies a disease caused by a G→A or C→T point mutation or a pathogenic SNP. 
In some embodiments, said disease is selected from cancer, haemophilia, beta-thalassemia, Marfan syndrome and Wiskott-Aldrich syndrome. In some embodiments, said deamination of said nucleotide at said target locus of interest remedies a disease caused by a T→C or A→G point mutation or a pathogenic SNP. In some embodiments, said deamination of said nucleotide at said target locus of interest inactivates a target gene at said target locus. In some embodiments, the engineered adenosine deaminase, or one or more components of the system are delivered by liposomes, nanoparticles, exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, an implantable device, or the vector system. In some embodiments, modification of the nucleotide modifies a gene product encoded at the target locus or expression of the gene product.

These and other aspects, objects, features, and advantages of the example embodiments will become apparent to those having ordinary skill in the art upon consideration of the following detailed description of illustrated example embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

An understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention may be utilized, and the accompanying drawings of which:

FIGS. 1A-1D. The crystal structure of PbuCas13b-crRNA Binary Complex. (FIG. 1A) Linear domain organization of PbuCas13b. Active site positioning is denoted by asterisks. (FIG. 1B) crRNA hairpin in complex with PbuCas13b. (FIG. 1C) Overall structure of PbuCas13b. Two views are rotated 180 degrees from each other. Domains are colored consistent with the linear domain map. crRNA is colored red. (FIG. 1D) Space-filling model of PbuCas13b, each view rotated 180 degrees from each other.

FIGS. 2A-2E. PbuCas13b crRNA recognition. (FIG. 2A) Diagram of PbuCas13b crRNA (SEQ ID NO:1). Direct repeat residues are colored red, and spacer residues in light blue. (FIG. 2B) Positioning of the 3′ end of the crRNA near K393 and coordinating residues within PbuCas13b. (FIG. 2C) Structure of the crRNA within the PbuCas13b complex. Coloring is consistent with panel (FIG. 2A). (FIG. 2D) Base identity swapping. Upper panel, nuclease activity; lower panel, thermal stability. Hashed fill denotes wild type base identities. (FIG. 2E) Mutagenesis of Lid domain residues that coordinate and process crRNA within PbuCas13b. Upper panel, RNase activity in SHERLOCK reaction; lower panel, crRNA processing. Cleavage bands and expected sizes are indicated by red markers; a ladder with sizes is shown on the left.

FIG. 3. Schematic view of the intermolecular contacts between PbuCas13b and crRNA (SEQ ID NO:2).

FIGS. 4A-4C. PbuCas13b comparison to LshCas13a architecture and active site. (FIG. 4A) Linear comparison of domain organization of PbuCas13b and LshCas13a (pdb 5wtk). crRNAs are shown to the right. (FIG. 4B) Two views of PbuCas13b rotated 90 degrees. Inset is zoomed in on active site residues in the same orientation as in (FIG. 4C). (FIG. 4C) LshCas13a colored consistently with (FIG. 4A). Homologous residues are labeled.

FIGS. 5A-5H. Site-directed mutagenesis of PbuCas13b; RNA interference in mammalian cells. (FIG. 5A) Effect of all PbuCas13b site-directed mutations on RNA interference in mammalian cells. Strongest interference knockdowns are colored in light blue. (FIG. 5B) PbuCas13b with strong mutations labeled and colored in red. (FIGS. 5C-5H) Mutations separated by region.

FIGS. 6A-6D. (FIG. 6A) Surface electrostatics of PbuCas13b. (FIG. 6B) Surface electrostatics of PbuCas13b rotated 180 degrees from panel A. (FIG. 6C) Surface electrostatics of PbuCas13b with the Lid domain removed, showing the inner positively charged channel. (FIG. 6D) Surface electrostatics of the putative crRNA processing active site.

FIG. 7. REPAIR assay of pgCas13b C-terminal truncations.

FIGS. 8A-8G. (FIG. 8A) PbuCas13b direct repeat structure. (FIG. 8B) Ideal A-form RNA. (FIG. 8C) Diagram of direct repeat base pairing and secondary structure (SEQ ID NO:3). (FIG. 8D) Multiplet one. (FIG. 8E) Multiplet two. (FIG. 8F) Multiplet three. (FIG. 8G) Alignment of PbuCas13b direct repeat sequences (SEQ ID NOs:4-9). Asterisks denote conserved nucleotides.

FIG. 9. Expanded data for cleavage activity of PbuCas13b with mutated crRNA, and thermal stability of crRNA mutants.

FIGS. 10A-10D. (FIG. 10A) Schematic of crRNA substrate for processing assay (SEQ ID NOs:10-11). (FIG. 10B) Gel showing complementary DR is not processed. (FIG. 10C) crRNA processing by mutants of PbuCas13b. (FIG. 10D) SHERLOCK assay measuring general RNase activity.

FIGS. 11A-11C. Melting curves of PbuCas13b with substrate RNA and Magnesium ions. (FIG. 11A) The effect of RNA substrate on PbuCas13b thermal stability. (FIG. 11B) The effect of PbuCas13b RNA cleavage and thermal stability. (FIG. 11C) The effect of magnesium on PbuCas13b thermal stability.

FIG. 12. Limited proteolysis of PbuCas13b with RNA substrate. T=Trypsin, C=Chymotrypsin, P=Pepsin.

FIGS. 13A-13C. Cas13b bridge-helix. (FIG. 13A) Cas13b with bridge-helix highlighted in red. RNA is colored in pink. (FIG. 13B) Cas12(Cpf1) with bridge-helix highlighted in cyan. RNA is colored in light blue, DNA dark blue. (FIG. 13C) Manual sequence alignment of bridge helix from PbuCas13b and LbCas12 (SEQ ID NOs:12-13).

FIG. 14. Neighbor-joining tree of all Cas13b family members. Inset, Cas13b subset with PbuCas13b (bolded).

FIG. 15. Structure based alignment of Cas13b subgroup (SEQ ID NOs:14-22).

FIG. 16. Structure based alignment of all Cas13bs (SEQ ID NOs:23-37).

FIGS. 17A-17D. Raw uncropped images of all gels shown in figures. (FIG. 17A) crRNA processing gel1. (FIG. 17B) crRNA processing gel2. (FIG. 17C) crRNA processing gel3. (FIG. 17D) limited proteolysis gel.

FIG. 18. Grouped topology map of PbuCas13b crystal structure.

FIG. 19 shows a PyMOL file that shows a position of the coordinated nucleotide in the active site of Cas13b.

FIG. 20 shows an exemplary RNA loop extension.

FIG. 21 shows exemplary fusion points via which a nucleotide deaminase is linked to a Cas13b.

FIG. 22 shows screening for mutations for RESCUEv9.

FIG. 23 shows validation of RESCUEv9's effect on T-flip guides.

FIG. 24 shows validation of RESCUEv9's effect on C-flip guides.

FIG. 25 shows performance of RESCUEv9 on endogenous targeting.

FIG. 26 shows screening for mutations for RESCUEv10.

FIG. 27 shows test results of 30-bp guides for C-flips.

FIG. 28 shows Gluc/Cluc results from comparison between Cas13b6 and Cas13b12 with RESCUE v1 through v8.

FIG. 29 shows fraction editing results from comparison between Cas13b6 and Cas13b12 with RESCUE v1 through v8.

FIG. 30 shows effects on endogenous targeting (T-flips) results from comparison between Cas13b6 and Cas13b12 with RESCUEv8.

FIG. 31 shows effects of RESCUEs on base converting.

FIG. 32 shows test results of CCN 3′ motif targeting.

FIG. 33A shows a schematic of constructs with dCas13b fused with ADAR. FIG. 33B shows test results of the constructs.

FIG. 34 shows sequencing of the N-terminal tag and linkers.

FIG. 35 shows quantification of off-targets.

FIG. 36 shows testing of off-target edits.

FIG. 37 shows test results of endogenous genes targets with (GGS)2/Q507R.

FIG. 38 and FIG. 39 show eGFP screening of mutations on (GGS)2/Q507R.

FIG. 40A shows constructs with Cas13b truncation. FIG. 40B shows test results of the constructs.

FIG. 41 shows multiplexed on/off-target guides for screening (SEQ ID NOs:38-39).

FIGS. 42A-42E show validation tests on RESCUEv10. FIG. 42A shows validation of RESCUEv10 (Rounds 50, 52). FIG. 42B shows validation of RESCUEv10 (Rounds 53, 54). FIG. 42C shows validation of RESCUEv10 (Round 58). FIG. 42D shows validation of RESCUEv10 (Round 59). FIG. 42E shows validation of RESCUEv10 (Round 61).

FIG. 43 shows NGS analysis of RESCUEv10.

FIG. 44 shows identified mutations that improve specificity.

FIG. 45 shows effects of RESCUE on endogenous targeting (C-flips and T-flips) results.

FIG. 46 shows targeting β-catenin using RESCUE v6 and v9.

FIG. 47 shows new β-catenin secreted Gluc/Cluc reporter.

FIG. 48 shows results of targeting β-catenin by RESCUEv10.

FIG. 49 shows targeting ApoE4 by RESCUEv10.

FIG. 50 shows exemplary mutations in PCSK9 that can be generated using RESCUE.

FIG. 51 shows results from Gluc knockdown in mammalian cells by Cas13b-t1.

FIG. 52 shows results from Gluc knockdown in mammalian cells by Cas13b-t2.

FIG. 53 shows results from Gluc knockdown in mammalian cells by Cas13b-t3.

FIGS. 54A-54C show loci of Cas13b-t1, Cas13b-t2, and Cas13b-t3.

FIGS. 55A-55C show more details on loci of Cas13b-t1, Cas13b-t2, and Cas13b-t3 (SEQ ID NOs:40-45).

FIG. 56 shows alignments of Cas13b-t1, Cas13b-t2, and Cas13b-t3 with other Cas13b orthologs (SEQ ID NOs:46-64).

FIG. 57 shows a summary of RESCUE mutations screened.

FIG. 58 is a graph illustrating results of an experiment in which better beta catenin mutants were selected.

FIG. 59 shows graphs illustrating results of RESCUE round 12.

FIG. 60 is a schematic illustrating the beta catenin migration assay.

FIG. 61 is a graph showing results of a cell migration assay induced by beta catenin.

FIG. 62 shows graphs illustrating that specificity mutations eliminate A-I off-targets.

FIG. 63 shows graphs illustrating that targeting Stat1/3 phosphorylation sites reduces signaling.

FIG. 64 shows graphs illustrating that targeting Stat1/3 phosphorylation sites reduces signaling (STAT1 non-treatment (left) and STAT1 IFNγ treatment (right)).

FIG. 65 shows graphs illustrating that targeting Stat1/3 phosphorylation sites reduces signaling, with FIG. 65A showing results for STAT3 IL6 activation and FIG. 65B showing results for STAT3 no treatment.

FIG. 66 shows graphs illustrating results of RESCUE round 12.

FIG. 67 shows graphs illustrating results from a potential RESCUE round 13.

FIG. 68 is a graph showing results of a cell migration assay induced by beta catenin.

FIG. 69 shows a graph illustrating results of comparison of dead and live tiny orthologs for Gluc knock down.

FIG. 70 shows a graph illustrating testing of the function of Cas13b-t1.

FIG. 71 shows a graph illustrating testing of the function of Cas13b-t3.

FIG. 72 shows a graph illustrating the guides, non-targeting comparison.

FIGS. 73A-73G: Directed evolution of an ADAR2 deaminase domain for cytidine deamination. (FIG. 73A) Schematic of the directed evolution approach, involving rational mutagenesis, yeast screening, and mammalian cell validation of activity. (FIG. 73B) Activity of RESCUE versions 0-16 on a cytidine flanked by a 5′ U and a 3′ G on a Gluc transcript. Left: Luciferase reporter activity is reported for RESCUEv0-v16. Right: Percent editing levels of RESCUEv0-v16 are reported. (FIG. 73C) Heatmap depicting the percent editing levels of RESCUEv0-v16 on cytidines flanked by varying bases on the Gluc transcript. (FIG. 73D) Percent editing of RESCUEv0-v16 on a cytidine flanked by a 5′ U and a 3′ G on a Gluc transcript at varying levels of the RESCUE plasmid transfected. (FIG. 73E) Editing activity of RESCUEv16 and RESCUEv8 on all 16 possible cytidine flanking base motifs on the Gluc transcript. Guide designs with either a T-flip or a C-flip across from the target cytidine are used. (FIG. 73F) Cytidine deamination by RESCUEv16 is compared to editing with the guide RNA along with either ADAR2dd, full length ADAR2, or no protein. (FIG. 73G) A zoomed-in crystal structure view of the mutants at the catalytic deamination site, with the RNA and its flipped-out base also shown.

FIGS. 74A-74G: C to U editing by RESCUE on endogenous and disease-relevant targets. (FIG. 74A) Editing efficiency of RESCUEv16 on a panel of endogenous genes covering multiple motifs. (FIG. 74B) Heatmap depicting editing efficiency of RESCUE versions v0-v16 on a panel of three endogenous genes. (FIG. 74C) Editing efficiency of RESCUEv16 on a set of synthetic versions of relevant T>C disease mutations. (FIG. 74D) Schematic of multiplexed C to U and A to I editing with pre-crRNA guide arrays. (FIG. 74E) Simultaneous C to U and A to I editing on beta catenin transcripts. (FIG. 74F) Schematic of rational prevention of off-target activity at neighboring adenosine sites via introduction of disfavored base flips (SEQ ID NOs:65-66). (FIG. 74G) Percent editing at on-target C and off-target A sites for Gaussia luciferase (left) and KRAS (right) using rational introduction of disfavored base flips.

FIGS. 75A-75F: Transcriptome-wide specificity of RESCUEv16. (FIG. 75A) On-target C to U editing and summary of C to U and A to I transcriptome-wide off targets of RESCUEv16 and B6-REPAIRv1, B12-REPAIRv1, and B12-REPAIRv2. (FIG. 75B) Manhattan plot of RESCUEv16 A to I and C to U off targets. The on-target C to U edit is highlighted in orange. (FIG. 75C) Schematic of the interactions between ADAR2dd residues and double stranded RNA substrate with residues used in a mutagenesis screen for improving specificity highlighted in red (SEQ ID NOs:67-68). (FIG. 75D) Luciferase values for C to U activity with a targeting guide (y-axis) and A to I activity with a non-targeting guide (x-axis) shown for RESCUEv16 and 95 RESCUEv16 mutants. Mutants highlighted in blue have efficient targeted C to U activity, but have lost their residual A to I activity, indicating an improvement in A to I specificity. (FIG. 75E) On-target C to U editing and summary of C to U and A to I transcriptome-wide off targets of RESCUEv16 and top specificity mutants. (FIG. 75F) Manhattan plot of RESCUEv16S (+S375A) A to I and C to U off targets (SEQ ID NOs:65-66). The on-target C to U edit is highlighted in orange.

FIGS. 76A-76H: Phenotypic outcomes directed by C to U RNA editing for cell growth and signaling. (FIG. 76A) Schematic of RNA targeting against phosphorylated residues of STAT3 to alter associated signaling pathways (SEQ ID NOs:69-74). (FIG. 76B) Percent editing at relevant phosphorylated residues in STAT3 (left) and STAT1 (right) by RESCUEv16. (FIG. 76C) Inhibition of STAT3 (left) and STAT1 (right) signaling by RNA editing as measured by STAT-driven luciferase expression. (FIG. 76D) Schematic of RNA targeting against phosphorylated residues of CTNNB1 to promote stabilization (SEQ ID NOs:75-77). (FIG. 76E) Schematic of beta catenin activation via editing of phosphorylated residues by RESCUE, resulting in increased cellular growth. (FIG. 76F) Percent editing at relevant phosphorylated residues in CTNNB1 by RESCUEv16. (FIG. 76G) Activation of CTNNB1 signaling by RNA editing as measured by CTNNB1-driven (TCF/LEF) luciferase expression. (FIG. 76H) Quantitation of cellular growth due to activation of CTNNB1 signaling by RNA editing.

FIGS. 77A-77B: Screening of inactivating Gluc mutations for generating a cytosine deamination luciferase reporter. (FIG. 77A) Luciferase activity of a panel of various Gluc mutants previously shown to have some effect on luciferase activity [cite Gluc paper]. Values represent mean+/−S.E.M (n=3). (FIG. 77B) Luciferase activity of a panel of leucine to proline Gluc mutants. Leucine to proline mutant reporters were the focus because they generate a CCN motif site for cytidine deamination (the center C is deaminated). This allows for assaying the effect of all four CCN motifs on RESCUE deamination activity. Values represent mean+/−S.E.M (n=3).
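The reasoning behind the leucine to proline reporters can be checked against the standard genetic code: each proline codon has the form CCN, and converting its center C to U yields CUN, which encodes leucine for every N. The short Python sketch below is illustrative only; the function name is hypothetical and the table holds just the eight standard-code codons involved.

```python
# Standard genetic code entries for the eight RNA codons involved.
CODON_TO_AA = {
    "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
}

PROLINE_CODONS = ("CCU", "CCC", "CCA", "CCG")


def deaminate_center_c(codon):
    """Model C-to-U deamination at the middle base of an RNA codon."""
    if len(codon) != 3 or codon[1] != "C":
        raise ValueError(f"expected a codon with a center C, got {codon!r}")
    return codon[0] + "U" + codon[2]


for codon in PROLINE_CODONS:
    edited = deaminate_center_c(codon)
    print(codon, CODON_TO_AA[codon], "->", edited, CODON_TO_AA[edited])
```

Because all four CCN codons revert to leucine codons, a single reporter design can cover each of the four 3′ neighbors of the deaminated cytidine.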

FIG. 78: Cytidine deamination activity of RESCUEv0-v16 on CCG, ACG, GCG, CCA, and CCU sites in Gluc. Values represent mean+/−S.E.M (n=3).

FIGS. 79A-79B: Cytidine deamination activity of varying amounts of RESCUEv0-v16. (FIG. 79A) Dose response of RESCUEv0-v16 activity as measured by restoration of luciferase activity on a UCG site in the Gluc transcript. Values represent mean of three replicates. (FIG. 79B) Dose response of RESCUEv0-v16 activity as measured by restoration of luciferase activity on the T41I site in the CTNNB1 transcript. Values represent mean of three replicates.

FIG. 80: Percent editing of a UCG site in the Gluc transcript by RESCUEv6-v9 at varying guide and RESCUE plasmid amounts. Values represent mean+/−S.E.M (n=3).

FIG. 81: Percent editing of Gluc sites with all 16 possible 5′ and 3′ base combinations with RESCUEv16 and v8 using guides with either G or A mismatches. Values represent mean+/−S.E.M (n=3).
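The count of 16 follows from pairing each of the four RNA bases at the 5′ position with each of the four at the 3′ position of the target cytidine. As a minimal illustrative sketch (the function name is hypothetical), the motifs can be enumerated as follows:

```python
from itertools import product

RNA_BASES = "ACGU"


def flanking_motifs(target="C"):
    """List every 5'-N, target, 3'-N trinucleotide motif."""
    return [five + target + three
            for five, three in product(RNA_BASES, repeat=2)]


motifs = flanking_motifs()
print(len(motifs))
print(motifs)
```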

FIG. 82: Percent editing of RESCUEv1 and RESCUEv2-v8 on a UCG site in the Gluc transcript with guide RNAs of varying U mismatch positions. RESCUE versions are compared with both RanCas13b and PspCas13b. Values represent mean+/−S.E.M (n=3). 20/22 denotes 20 mismatch distance for RanCas13b and 22 mismatch distance for PspCas13b.

FIG. 83: Percent editing of RESCUEv16 on a UCG site in the Gluc transcript with 30 bp and 50 bp guides with varying U mismatch positions. Values represent mean+/−S.E.M (n=3).

FIGS. 84A-84D: Editing rates of various yeast reporters for directed evolution. (FIG. 84A) Percent fluorescence correction of the GFP mutation Y66H by RESCUEv3, v7, and v16 with targeting and non-targeting guides. Fluorescence is measured by performing flow cytometry on 10,000 cells. (FIG. 84B) Percent editing correction of the GFP mutation Y66H by RESCUEv3, v7, and v16 with targeting and non-targeting guides. Values represent mean+/−S.E.M (n=3). (FIG. 84C) Percent editing correction of the HIS3 mutation P196L by RESCUEv7 and v16 with targeting and non-targeting guides. Values represent mean+/−S.E.M (n=3). (FIG. 84D) Percent editing correction of the HIS3 mutation S129P by RESCUEv7 and v16 with targeting and non-targeting guides. Values represent mean+/−S.E.M (n=3).

FIGS. 85A-85B: Biochemical deamination activity of ADAR2 deaminase domain containing RESCUEv2 mutations using recombinant protein. (FIG. 85A) Adenosine deamination activity of ADAR2 deaminase domain protein containing RESCUEv2 mutations with a 22 bp double-stranded RNA substrate containing a center adenine mismatched with a cytosine. Reactions were incubated for varying time points and with and without the deaminase domain. (FIG. 85B) Cytidine deamination activity of ADAR2 deaminase domain protein containing RESCUEv2 mutations with a 22 bp double-stranded RNA substrate containing a center cytosine mismatched with a uridine. Reactions were incubated for varying time points and with and without the deaminase domain.

FIGS. 86A-86E: Comparison of cytidine deaminase activity of RESCUEv16, full ADAR2 (with RESCUEv16 mutations), ADAR2 deaminase domain (with RESCUEv16 mutations), and without any protein. (FIG. 86A) Percent editing of a site in the Gluc transcript with varying 5′ bases with a targeting guide and RESCUEv16, full ADAR2 (with RESCUEv16 mutations), ADAR2 deaminase domain (with RESCUEv16 mutations), and no protein. Values represent mean+/−S.E.M (n=3). (FIG. 86B) Percent editing of a site in the Gluc transcript with varying 5′ bases with a non-targeting guide and RESCUEv16, full ADAR2 (with RESCUEv16 mutations), ADAR2 deaminase domain (with RESCUEv16 mutations), and no protein. Values represent mean+/−S.E.M (n=3). (FIG. 86C) Editing of a UCG site in the Gluc transcript with RESCUEv16 and guide RNAs containing varying mismatch positions. Values represent mean+/−S.E.M (n=3). (FIG. 86D) Editing of a UCG site in the Gluc transcript with full-length ADAR2 (with RESCUEv16 mutations) and guide RNAs containing varying mismatch positions. Values represent mean+/−S.E.M (n=3). (FIG. 86E) Editing of a UCG site in the Gluc transcript with ADAR2 deaminase domain (with RESCUEv16 mutations) and guide RNAs containing varying mismatch positions. Values represent mean+/−S.E.M (n=3).

FIGS. 87A-87C: Mismatch position tiling to find optimal editing guide design for RESCUEv16 on endogenous target sites. (FIG. 87A) Percent editing of endogenous target sites with varying base motifs with RESCUEv16 and guides with mismatches at position 7, 9, 11, and 13 and U base flips. Values represent mean+/−S.E.M (n=3). (FIG. 87B) Percent editing of endogenous target sites with varying base motifs with RESCUEv16 and guides with mismatches at position 7, 9, 11, and 13 and C base flips. Values represent mean+/−S.E.M (n=3). (FIG. 87C) Percent editing of endogenous target sites with varying base motifs with RESCUEv16 and guides with mismatches at position 3, 5, 7, 9, and 11 and C and U base flips. Values represent mean+/−S.E.M (n=3).

FIG. 88: Cytidine deamination activity of varying amounts of RESCUEv0-16 as measured by percent editing at a KRAS site. Values represent mean of three replicates.

FIG. 89: Percent editing of various disease-relevant mutations on synthetic reporters using RESCUEv16 and guides with varying mismatch positions. Values represent mean+/−S.E.M (n=3).

FIG. 90: Percent editing at the two ApoE4 cytosines (rs429358 and rs7412) using RESCUEv16 with guides of varying C and U mismatch positions. Values represent mean+/−S.E.M (n=3).

FIGS. 91A-91C: Specificity of RESCUE versions in the guide duplex window. (FIG. 91A) Schematic of editing site of Gaussia luciferase mutant C82R, with the targeted C highlighted in red and nearby adenine bases numbered and highlighted in gray. (FIG. 91B) Percent editing at nearby adenine bases in Gaussia luciferase mutant C82R with targeting by RESCUEv0, RESCUEv8, and RESCUEv16. (FIG. 91C) Percent editing of adenine to guanosine at adenine 20 by varying amounts of RESCUEv0-v16. Values represent mean of three replicates.

FIGS. 92A-92D: Adenosine deaminase activity of RESCUEv0-v16 and RESCUEv16S. (FIG. 92A) Luciferase correction via adenosine deamination of the Gluc transcript by RESCUEv0-v16 and RESCUEv16S using a targeting guide RNA. Values represent mean+/−S.E.M (n=3). (FIG. 92B) Luciferase correction via adenosine deamination of the Gluc transcript by RESCUEv0-v16 and RESCUEv16S using a non-targeting guide RNA. Values represent mean+/−S.E.M (n=3). (FIG. 92C) Percent editing of adenosine to inosine of the Gluc transcript by RESCUEv0-v16 and RESCUEv16S using a targeting guide RNA. Values represent mean+/−S.E.M (n=3). (FIG. 92D) Percent editing of adenosine to inosine of the Gluc transcript by RESCUEv0-v16 and RESCUEv16S using a non-targeting guide RNA. Values represent mean+/−S.E.M (n=3).

FIGS. 93A-93C: Cytidine deamination activity and off-target activity on a Beta-catenin target site using varying amounts of RESCUEv0-16 and RESCUEv16S. (FIG. 93A) Schematic of editing site of CTNNB1 T41I, with the targeted C highlighted in red and the nearby off-target adenine base highlighted in gray. (FIG. 93B) Percent editing of cytosine to uridine (T41I) by varying amounts of RESCUEv0-v16 and RESCUEv16S. Values represent mean of three replicates. (FIG. 93C) Percent editing of adenine to guanosine at the off-target adenine by varying amounts of RESCUEv0-v16 and RESCUEv16S. Values represent mean of three replicates.

FIGS. 94A-94E: On-target and off-target editing of RESCUEv16 and RESCUEv16S on endogenous targets. (FIG. 94A) Percent editing of endogenous target sites with varying base motifs with RESCUEv16 and RESCUEv16S. Values represent mean+/−S.E.M (n=3). (FIG. 94B) Percent editing at neighboring adenine bases in NRAS I21I with targeting by RESCUEv16 and RESCUEv16S. (FIG. 94C) Percent editing at neighboring adenine bases in NF2 T21M with targeting by RESCUEv16 and RESCUEv16S. (FIG. 94D) Percent editing at neighboring adenine bases in RAF1 P30S with targeting by RESCUEv16 and RESCUEv16S. (FIG. 94E) Percent editing at neighboring adenine bases in CTNNB1 P44S with targeting by RESCUEv16 and RESCUEv16S.

FIGS. 95A-95B: Summary of amino acid changes enabled by RESCUE. (FIG. 95A) Amino acid conversions possible using cytidine deamination by RESCUE. (FIG. 95B) Codon table showing all potential amino acid changes possible by RESCUE.
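The set of amino acid conversions reachable by cytidine deamination can be derived directly from the standard genetic code. The following is a minimal sketch, assuming a single C to U edit per codon and the standard codon table; it is not the procedure used to generate the figure, and the names `CODON_TABLE` and `c_to_u_changes` are hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def c_to_u_changes():
    """Map (from_aa, to_aa) to the codon edits produced by one C-to-U change."""
    changes = {}
    for codon, aa_from in CODON_TABLE.items():
        for pos, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:pos] + "U" + codon[pos + 1:]
            aa_to = CODON_TABLE[edited]
            if aa_to != aa_from:  # skip synonymous (silent) edits
                changes.setdefault((aa_from, aa_to), []).append((codon, edited))
    return changes


changes = c_to_u_changes()
for (aa_from, aa_to), edits in sorted(changes.items()):
    print(aa_from, "->", aa_to, ", ".join(f"{a}>{b}" for a, b in edits))
```

Stop codons appear as `*` in the output. Synonymous edits are omitted by design in this sketch.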

FIG. 96: RESCUE v16S was able to effectively edit endogenous genes.

FIG. 97: RESCUE v16S maintained some A to I activity.

FIG. 98: RESCUE v16 was used to target STAT to reduce IFNγ/IL6 induction.

FIGS. 99A-99B: RESCUE targeting induces cell growth.

FIG. 100. A schematic showing an example transcript tracking method.

FIG. 101 shows an example system and method of programmable cytidine to uridine conversion according to some embodiments herein.

FIG. 102 shows example approaches of correcting mutations and/or targeting post-translational signaling or catalysis using base editors according to some embodiments herein.

FIGS. 103A-103E Evolution of an ADAR2 deaminase domain for cytidine deamination in reporter and endogenous transcripts. FIG. 103A. Schematic of RNA targeting of the catalytic residue mutant (C82R) of Gaussia luciferase reporter transcript (SEQ ID NO:712-714). FIG. 103B. Heatmap depicting the percent editing levels of RESCUEr0-r16 on cytidines flanked by varying bases on the Gluc transcript. More favorable editing motifs are shown at the top, while less favorable motifs (5′C) are shown at the bottom. FIG. 103C. Editing activity of RESCUE on all possible 16 cytidine flanking bases motifs on the Gluc transcript with U-flip or C-flip guides. FIG. 103D. Activity comparison between RESCUE, ADAR2dd without Cas13, full-length ADAR2 without Cas13, or no protein. FIG. 103E. Editing efficiency of RESCUE on a panel of endogenous genes covering multiple motifs. The best guide for each site is shown with the entire panel of guides displayed in FIG. 125.

FIGS. 104A-104F Phenotypic outcomes of RESCUE on cell growth and signaling. FIG. 104A. Schematic of β-catenin domains and RESCUE targeting guide (SEQ ID NO:715-717). FIG. 104B. Schematic of β-catenin activation and cell growth via RESCUE editing. FIG. 104C. Percent editing by RESCUE at relevant positions in the CTNNB1 transcript. FIG. 104D. Activation of Wnt/β-catenin signaling by RNA editing as measured by β-catenin-driven (TCF/LEF) luciferase expression. FIG. 104E. Representative microscopy images of RESCUE CTNNB1 targeting and non-targeting guides in HEK293FT cells. FIG. 104F. Quantitation of cellular growth due to activation of CTNNB1 signaling by RNA editing in HEK293FT cells.

FIGS. 105A-105D RESCUE and REPAIR multiplexing and specificity enhancement via guide engineering. FIG. 105A. Schematic of multiplexed C to U and A to I editing with pre-crRNA guide arrays. FIG. 105B. Simultaneous C to U and A to I editing on CTNNB1 transcripts. FIG. 105C. Schematic of rational engineering with guanine base flips to prevent off-target activity at neighboring adenosine sites (SEQ ID NO:718-719). FIG. 105D. Percent editing at on-target C and off-target A sites for Gaussia luciferase (left) and KRAS (right) using rational introduction of disfavored base flips.

FIGS. 106A-106G Transcriptome-wide specificity of RESCUE. FIG. 106A. On-target C to U editing and summary of C to U and A to I transcriptome-wide off-targets for RESCUE compared to REPAIR. FIG. 106B. Manhattan plots of RESCUE A to I (left) and C to U (right) off-targets. The on-target C to U edit is highlighted in orange. FIG. 106C. Schematic of the interactions between ADAR2dd residues and double-stranded RNA substrate, with residues used in a mutagenesis screen for improving specificity highlighted in red (SEQ ID NO:720-721). FIG. 106D. Luciferase values for C to U activity with a targeting guide (y-axis) and A to I activity with a non-targeting guide (x-axis) shown for RESCUE and 95 RESCUE mutants. Mutants highlighted in blue have higher specificity with maintained C to U activity. RESCUE is highlighted in red. The T375G mutation that generates REPAIRv2 is shown in orange. FIG. 106E. On-target C to U editing and summary of C to U and A to I transcriptome-wide off-targets of RESCUE, REPAIR, and top specificity mutants. FIG. 106F. Manhattan plot of RESCUE-S (+S375A) A to I (left) and C to U (right) off-targets. The on-target C to U edit is highlighted in orange. FIG. 106G. Representative RNA sequencing reads surrounding the on-target Gluc editing site (blue triangle) for RESCUE (top) and RESCUE-S (bottom). A to I edits are highlighted in red; C to U (T) edits are highlighted in blue; sequencing errors are highlighted in yellow (SEQ ID NO:722-767).

FIGS. 107A-107B Targeted RNA cytidine to uridine editing enables new base conversions. FIG. 107A. Amino acid conversions possible using cytidine deamination by RESCUE, with corresponding post-translational modifications and biological activities. FIG. 107B. Schematic of the directed evolution approach, involving rational mutagenesis, yeast screening, and mammalian cell validation of activity. Rational mutagenesis began with targeting residues known to contact the RNA substrate, as shown in the schematic at the top, derived from the crystal structure of ADAR2dd (23). Residues targeted with saturation mutagenesis are highlighted in red. For directed evolution, a HIS3 growth reporter was used to enable positive selection of ADAR2dd mutants in yeast with C to U editing and restoration of the HIS3 gene. Top mutants from each round of yeast evolution are evaluated in mammalian cells for C to U editing activity and then the top mutant is used for the next round of yeast evolution.

FIG. 108. Comparison of RanCas13b-REPAIR and PspCas13b-REPAIR adenosine deamination activity in yeast with targeting and non-targeting guides. A to I correction of the Y66H mutation in EGFP restores GFP fluorescence and is measured by flow cytometry. As REPAIR with the catalytically inactive Cas13b ortholog from Riemerella anatipestifer (dRanCas13b) was more effective than REPAIR with the catalytically inactive Cas13b ortholog from Prevotella sp. P5-125 (dPspCas13b), we began with a dRanCas13b-ADAR2dd fusion for development of RESCUE.

FIGS. 109A-109B Screening of inactivating Gluc mutations for generating a cytosine deamination luciferase reporter. FIG. 109A. Luciferase activity of a panel of various Gluc mutants previously shown to have some effect on luciferase activity (33). Values represent mean+/−S.E.M (n=3). FIG. 109B. Luciferase activity of a panel of leucine to proline Gluc mutants. Leucine to proline mutant reporters were the focus because they generate a CCN motif site for cytidine deamination (the center C is deaminated). This allows for assaying the effect of all four CCN motifs on RESCUE deamination activity. Values represent mean+/−S.E.M (n=3); WT, wildtype Gluc sequence.

FIG. 110. Cytidine deamination activity of RESCUEr0-r16 on UCG, CCG, ACG, GCG, CCA, and CCU sites in Gluc. Values represent mean+/−S.E.M (n=3).

FIGS. 111A-111C Cytidine deamination activity of varying amounts of RESCUEr0-r16. FIG. 111A. Dose response of RESCUEr0-r16 activity as measured by restoration of luciferase activity on a UCG site in the Gluc transcript. Values represent mean of three replicates. FIG. 111B. Dose response of RESCUEr0-r16 activity as measured by C to U editing at a UCG site in the Gluc transcript. Values represent mean of three replicates. FIG. 111C. Dose response of RESCUEr0-r16 activity as measured by restoration of luciferase activity on the T41I site in the CTNNB1 transcript. Values represent mean of three replicates.

FIG. 112 Percent editing of a UCG site in the Gluc transcript by RESCUEr6-r9 at varying guide and RESCUE plasmid amounts. Values represent mean+/−S.E.M (n=3).

FIGS. 113A-113E Editing rates of various yeast reporters for directed evolution. FIG. 113A. Percent fluorescence correction of the GFP mutation Y66H by RESCUEr3, r7, and r16 with targeting and non-targeting guides. Fluorescence is measured by performing flow cytometry on 10,000 cells. T, targeting guide; NT, non-targeting guide. FIG. 113B. Percent editing correction of the GFP mutation Y66H by RESCUEr3, r7, and r16 with targeting and non-targeting guides. T, targeting guide; NT, non-targeting guide. FIG. 113C. Percent editing correction of the HIS3 mutation P196L by RESCUEr7 and r16 with targeting and non-targeting guides. T, targeting guide; NT, non-targeting guide. FIG. 113D. Percent editing correction of the HIS3 mutation S129P by RESCUEr7 and r16 with targeting and non-targeting guides. T, targeting guide; NT, non-targeting guide. FIG. 113E. Percent editing correction of the HIS3 mutation S22P by RESCUEr3, r7, and r16 with targeting guides of varying mismatch distance and non-targeting guide at different hours after RESCUE induction. NT, non-targeting guide.

FIGS. 114A-114C Percent editing of Gluc sites with all 16 possible 5′ and 3′ base combinations with RESCUEr16 and r8 using guides with U, C, G, or A mismatches. FIG. 114A. Percent editing of Gluc sites with all 16 possible 5′ and 3′ base combinations with RESCUEr8 using guides with either U or C mismatches. Values represent mean+/−S.E.M (n=3). FIG. 114B. Percent editing of Gluc sites with all 16 possible 5′ and 3′ base combinations with RESCUEr8 using guides with either G or A mismatches. Values represent mean+/−S.E.M (n=3). FIG. 114C. Percent editing of Gluc sites with all 16 possible 5′ and 3′ base combinations with RESCUEr16 using guides with either G or A mismatches. Values represent mean+/−S.E.M (n=3).
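The "all 16 possible 5′ and 3′ base combinations" referred to in these captions are the sequence motifs formed by one flanking base on each side of the target cytidine. A short sketch of that enumeration follows; the function name `cytidine_motifs` is hypothetical and is used only for illustration.

```python
from itertools import product

RNA_BASES = "ACGU"


def cytidine_motifs():
    """Return every 5'-N C N-3' motif around a central target cytidine."""
    return [f"{five}C{three}" for five, three in product(RNA_BASES, repeat=2)]


motifs = cytidine_motifs()
print(len(motifs), motifs)

# The CCN subset corresponds to the motifs created by leucine-to-proline reporters.
ccn_motifs = [m for m in motifs if m.startswith("CC")]
print(ccn_motifs)
```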

FIG. 115 Percent editing of RESCUE on a UCG site in the Gluc transcript with 30 bp and 50 bp guides with varying U mismatch positions. Values represent mean+/−S.E.M (n=3).

FIG. 116 Percent editing of RESCUEr1 and RESCUEr3-r8 on a UCG site in the Gluc transcript with guide RNAs of varying U mismatch positions. Candidate rounds are compared with both RanCas13b and PspCas13b. Values represent mean+/−S.E.M (n=3). 20/22 denotes 20 mismatch distance for RanCas13b and 22 mismatch distance for PspCas13b. As REPAIR uses a fusion of ADAR2dd with dPspCas13b (7), we compared our RESCUE candidate rounds with fusions of PspCas13b and RanCas13b and found them to be equivalently active.

FIGS. 117A-117B View of RESCUE mutations on the crystal structure of the ADAR2 deaminase domain. FIG. 117A. The RESCUE mutants are shown in the ADAR2 crystal structure (blue) along with the flipped-out cytidine modeled in purple. FIG. 117B. A zoomed in crystal structure view of the mutants at the catalytic deamination site with the RNA with the flipped-out base also shown in purple.

FIGS. 118A-118D Adenosine deaminase activity of RESCUEr0-r16 and RESCUEr16-S. With REPAIR, efficiency of adenosine deamination is dependent on the guide design choice of position relative to the target adenosine and base flip selection (7), as ADAR2dd prefers to deaminate in mismatch bubbles. The position of the target base within the guide:target dsRNA duplex is particularly important, as Cas13 guides can be placed anywhere without any sequence restriction and there is a small window of optimal activity for ADAR2dd (7). For RESCUE, we tested all possible guide base-flips across from the target cytosine, and found that the optimal base flips for cytidine deamination were either C or U, with optimal editing of the UCG motif with a 30-nt guide RNA with the targeting base-flip position 26 base pairs from the 5′ end of the target. FIG. 118A. Luciferase correction via adenosine deamination of the Gluc transcript by RESCUEr0-r16 and RESCUEr16-S using a targeting guide RNA. Values represent mean+/−S.E.M (n=3). FIG. 118B. Luciferase correction via adenosine deamination of the Gluc transcript by RESCUEr0-r16 and RESCUEr16-S using a non-targeting guide RNA. Values represent mean+/−S.E.M (n=3). FIG. 118C. Percent editing of adenosine to inosine of the Gluc transcript by RESCUEr0-r16 and RESCUEr16-S using a targeting guide RNA. Values represent mean+/−S.E.M (n=3). FIG. 118D. Percent editing of adenosine to inosine of the Gluc transcript by RESCUEr0-r16 and RESCUEr16-S using a non-targeting guide RNA. Values represent mean+/−S.E.M (n=3).
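The base-flip guide design discussed in this caption (a guide complementary to the target except for a C or U placed across from the target cytosine) can be sketched as follows. This is an illustration only: the target window is a made-up 30-nt sequence, `base_flip_guide` is a hypothetical helper, indices are 0-based from the 5′ end of the target window, and the sketch does not reproduce the source's mismatch-distance numbering or any direct repeat sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def base_flip_guide(target_window, target_index, flip_base):
    """Return a guide spacer (5'->3') that pairs with target_window except for
    a mismatched flip_base placed across from the target cytidine."""
    if target_window[target_index] != "C":
        raise ValueError("target_index must point at a cytidine")
    if flip_base not in ("C", "U"):
        raise ValueError("this sketch only models C or U base flips")
    # Reverse complement of the target gives a fully paired guide.
    guide = [COMPLEMENT[base] for base in reversed(target_window)]
    # The guide base opposite target position i sits at len - 1 - i.
    guide_index = len(target_window) - 1 - target_index
    guide[guide_index] = flip_base
    return "".join(guide)


# Hypothetical 30-nt target window containing a UCG motif (C at index 11).
target = "AUGGCUAGCUUCGAUCGGAUCCAUGGCAUA"
guide_u_flip = base_flip_guide(target, 11, "U")
guide_c_flip = base_flip_guide(target, 11, "C")
print(guide_u_flip)
print(guide_c_flip)
```

Moving `target_index` relative to the ends of the window is one way to model the mismatch position tiling described in the captions, under the assumptions stated above.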

FIGS. 119A-119D Evaluation of individual RESCUE mutations added on REPAIR (RESCUEr0) or individual mutations removed from RESCUEr16. FIG. 119A. Evaluation of C to U deaminase activity of individual RESCUE mutations added on REPAIR (RESCUEr0) targeting a site on the luciferase transcript, as measured by luciferase activity restoration. Values represent mean+/−S.E.M (n=3); WT, RESCUEr0 sequence. FIG. 119B. Evaluation of C to U deaminase activity of individual RESCUE mutations added on REPAIR (RESCUEr0) targeting a site on the luciferase transcript, as measured by percent editing. Values represent mean+/−S.E.M (n=3); WT, RESCUEr0 sequence. FIG. 119C. Evaluation of C to U deaminase activity of RESCUEr16 constructs with individual mutations removed targeting a site on the luciferase transcript, as measured by luciferase activity restoration. Values represent mean+/−S.E.M (n=3); WT, RESCUEr16 sequence. FIG. 119D. Evaluation of C to U deaminase activity of RESCUEr16 constructs with individual mutations removed targeting a site on the luciferase transcript, as measured by percent editing. Values represent mean+/−S.E.M (n=3); WT, RESCUEr16 sequence.

FIGS. 120A-120D Biochemical deamination activity of ADAR2 deaminase domain containing RESCUEr0, r2, r8, r13, and r16 mutations using recombinant protein. FIG. 120A. Adenosine deamination activity of ADAR2 deaminase domain protein containing various candidate mutations with a 22 bp double-stranded RNA substrate containing a center adenine mismatched with a cytidine. Reactions were incubated for varying time points and with and without the deaminase domain. Values represent mean+/−S.E.M (n=3, some error bars occluded by symbols). FIG. 120B. Cytidine deamination activity of ADAR2 deaminase domain protein containing various candidate mutations with a 22 bp double-stranded RNA substrate containing a center cytidine mismatched with a uridine. Reactions were incubated for varying time points and with and without the deaminase domain. Values represent mean+/−S.E.M (n=3, some error bars occluded by symbols). FIG. 120C. RESCUE r0 and r16 cytidine deaminase activity on RNA and DNA substrates, including a cytidine in RNA annealed to complementary DNA (RNA:DNA), a deoxycytidine in DNA annealed to complementary RNA (DNA:RNA), a deoxycytidine in double-stranded DNA (dsDNA), and a deoxycytidine in ssDNA. All double-stranded templates contain a cytidine mismatched with a thymidine. Values represent mean+/−S.E.M (n=3). FIG. 120D. RESCUE r0 and r16 adenosine deaminase activity on RNA and DNA substrates, including an adenosine in RNA annealed to complementary DNA (RNA:DNA), a deoxyadenosine in DNA annealed to complementary RNA (DNA:RNA), a deoxyadenosine in double-stranded DNA (dsDNA), and a deoxyadenosine in ssDNA. All double-stranded templates contain an adenosine mismatched with a cytidine. Values represent mean+/−S.E.M (n=3).

FIGS. 121A-121D Comparison of cytidine deaminase activity of RESCUEr16, full ADAR2 (with RESCUEr16 mutations), ADAR2 deaminase domain (with RESCUEr16 mutations), and without any protein. FIG. 121A. Adenosine deaminase activity measured by Cluc activity restoration with a targeting guide and RESCUEr16, full ADAR2 (with RESCUEr16 mutations), ADAR2 deaminase domain (with RESCUEr16 mutations), and no protein. Values represent mean+/−S.E.M (n=3). FIG. 121B. Cytidine deaminase activity measured by Gluc activity restoration with a targeting guide and RESCUEr16, full ADAR2 (with RESCUEr16 mutations), ADAR2 deaminase domain (with RESCUEr16 mutations), and no protein. Values represent mean+/−S.E.M (n=3). FIG. 121C. Percent editing of a site in the Gluc transcript with varying 5′ bases with a targeting guide and RESCUEr16, full ADAR2 (with RESCUEr16 mutations), ADAR2 deaminase domain (with RESCUEr16 mutations), and no protein. Values represent mean+/−S.E.M (n=3). FIG. 121D. Percent editing of a site in the Gluc transcript with varying 5′ bases with a non-targeting guide and RESCUEr16, full ADAR2 (with RESCUEr16 mutations), ADAR2 deaminase domain (with RESCUEr16 mutations), and no protein. Values represent mean+/−S.E.M (n=3).

FIGS. 122A-122C Comparison of cytidine deaminase activity of RESCUEr16, full ADAR2 (with RESCUEr16 mutations), ADAR2 deaminase domain (with RESCUEr16 mutations), and without any protein. FIG. 122A. Editing of a UCG site in the Gluc transcript with RESCUEr16 and guide RNAs containing varying mismatch positions. Values represent mean+/−S.E.M (n=3). FIG. 122B. Editing of a UCG site in the Gluc transcript with full-length ADAR2 (with RESCUEr16 mutations) and guide RNAs containing varying mismatch positions. Values represent mean+/−S.E.M (n=3). FIG. 122C. Editing of a UCG site in the Gluc transcript with ADAR2 deaminase domain (with RESCUEr16 mutations) and guide RNAs containing varying mismatch positions. Values represent mean+/−S.E.M (n=3).

FIGS. 123A-123C Cytidine deamination activity of RESCUEr16 on a Gluc transcript with guides without direct repeats of 30 or 50 nt in length and varying mismatches. FIG. 123A. Cytidine deamination activity of RESCUEr16 on a Gluc transcript with 30 nt guides without direct repeats and varying mismatches. Values represent mean+/−S.E.M (n=3). FIG. 123B. Cytidine deamination activity of RESCUEr16 on a Gluc transcript with 50 nt guides without direct repeats and varying mismatches. Values represent mean+/−S.E.M (n=3). FIG. 123C. Cytidine deamination activity of RESCUEr16 on a Gluc transcript with 30 nt guides with direct repeats and varying mismatches. Values represent mean+/−S.E.M (n=3).

FIGS. 124A-124F Cytidine deamination activity of alternative RNA editing technologies with RESCUE mutations incorporated into them. FIG. 124A. Cytidine deamination activity of MS2-recruited ADAR deaminase domain (24) with RESCUE mutations on a Gluc transcript with 30 nt guides with different base-flips and varying mismatches. Activity is measured by restoration of luciferase activity. Values represent mean+/−S.E.M (n=3); NT, non-targeting guide. FIG. 124B. Percent Gluc editing by MS2-recruited ADAR deaminase domain (24) with RESCUE mutations on a Gluc transcript with 30 nt guides with different base-flips and varying mismatches. Values represent mean+/−S.E.M (n=3); NT, non-targeting guide. FIG. 124C. Cytidine deamination activity of associated ADAR guide RNA technology (24) with the deaminase domain containing RESCUE mutations on a Gluc transcript with 30 nt guides with different base-flips and varying mismatches. Activity is measured by restoration of luciferase activity. Values represent mean+/−S.E.M (n=3); NT, non-targeting guide. FIG. 124D. Percent Gluc editing by associated ADAR guide RNA technology (24) with the deaminase domain containing RESCUE mutations on a Gluc transcript with 30 nt guides with different base-flips and varying mismatches. Values represent mean+/−S.E.M (n=3); NT, non-targeting guide. FIG. 124E. Cytidine deamination activity of guide RNA-recruited ADAR deaminase domain (11) with RESCUE mutations on a Gluc transcript with 30 nt guides with different base-flips and varying mismatches. Activity is measured by restoration of luciferase activity. Values represent mean+/−S.E.M (n=3); NT, non-targeting guide. FIG. 124F. Percent Gluc editing by guide RNA-recruited ADAR deaminase domain (11) with RESCUE mutations on a Gluc transcript with 30 nt guides with different base-flips and varying mismatches. Values represent mean+/−S.E.M (n=3); NT, non-targeting guide.

FIGS. 125A-125C Mismatch position tiling to find optimal editing guide design for RESCUE on endogenous target sites. FIG. 125A. Percent editing of endogenous target sites with varying base motifs with RESCUE and guides with mismatches at position 7, 9, 11, and 13 and U base flips. Values represent mean+/−S.E.M (n=3). FIG. 125B. Percent editing of endogenous target sites with varying base motifs with RESCUE and guides with mismatches at position 7, 9, 11, and 13 and C base flips. Values represent mean+/−S.E.M (n=3). FIG. 125C. Percent editing of endogenous target sites with varying base motifs with RESCUE and guides with mismatches at position 3, 5, 7, 9, and 11 and C and U base flips. Values represent mean+/−S.E.M (n=3).

FIGS. 126A-126B Cytidine deamination activity of RESCUEr0-r16 as measured by percent editing at various endogenous sites and at varying amounts. FIG. 126A. Heatmap depicting editing efficiency of RESCUEr0-r16 on a panel of three endogenous genes. Values represent mean of three replicates. FIG. 126B. Cytidine deamination activity of varying amounts of RESCUEr0-r16 as measured by percent editing at a KRAS site. Values represent mean of three replicates.

FIGS. 127A-127B Percent editing of various disease-relevant mutations on synthetic reporters. FIG. 127A. Editing efficiency of RESCUE on a set of synthetic versions of relevant T>C disease mutations with the best possible mismatch guide per target site. Editing rates vary between 1% and 42% and conditions are shown sorted by editing efficiency. All editing rates for synthetic sites are listed in Table 31. Values represent mean+/−S.E.M (n=3). FIG. 127B. Editing of disease relevant mutations using RESCUE and guides with varying mismatch positions. Values represent mean+/−S.E.M (n=3).

FIG. 128 Percent editing at ApoE4 cytosines with RESCUE with guides of varying C and U mismatch positions. ApoE4 variants (rs429358 and rs7412) increase Alzheimer's risk markedly, and are edited by RESCUE at rates of up to 5% and 12% on the two sites. All editing rates for synthetic sites are listed in Table 31. Values represent mean+/−S.E.M (n=3).

FIGS. 129A-129F RNA editing and signal modulation of STAT1/STAT3 by RESCUE. STAT3 and STAT1 are transcription factors that play important roles in signal transduction via the JAK/STAT pathway and are typically activated via phosphorylation by cytokines and growth factors. To demonstrate signaling modulation via RNA editing, we altered activation of the STAT pathway by editing phosphorylation sites Y705 and S727 on STAT3 and Y701 and S727 on STAT1 with RESCUE over the course of 48 hours. FIG. 129A. Schematic of STAT3 domains and RESCUE guides targeting phosphorylated residues of STAT3 to alter associated signaling pathways (SEQ ID NO:768-770). FIG. 129B. Percent editing at relevant phosphorylated residues in STAT3 by RESCUE. In HEK293FT cells, we observed 6% editing of the S727 STAT3 site and 11% and 7% editing of the Y701 and S727 STAT1 sites, respectively. FIG. 129C. Inhibition of STAT3 signaling by RNA editing as measured by STAT3-driven luciferase expression with guides with different base-flips. These edits resulted in 13% repression of STAT3 and STAT1 activity. FIG. 129D. Percent editing at S727F phosphorylated residue site in STAT1 by RESCUE with guides with varying base-flips. FIG. 129E. Percent editing at Y701C phosphorylated residue site in STAT1 by RESCUE with guides with varying base-flips. FIG. 129F. Inhibition of STAT1 signaling by RNA editing with RESCUE as measured by STAT-driven luciferase expression.

FIGS. 130A-130B Modulation of β-catenin phosphorylation and cell growth in HUVEC cells. FIG. 130A. Quantitation of cellular growth due to activation of CTNNB1 signaling by RNA editing in HUVEC cells. RESCUE stimulated HUVEC growth to levels comparable to those observed in cells overexpressing a β-catenin phosphorylation-null mutant. NT, non-targeting guide. FIG. 130B. Representative microscopy images of RESCUE CTNNB1 targeting and non-targeting guides in HUVEC cells.

FIG. 131. RESCUE C to U and A to I activity on transcripts with varying 5′ and 3′ flanking bases around the target site with different C-terminal truncations of dRanCas13b.

FIGS. 132A-132C Specificity of candidate rounds in the guide duplex window. FIG. 132A. Schematic of editing site of Gaussia luciferase mutant C82R, with the targeted C highlighted in red and nearby adenine bases numbered and highlighted in gray (SEQ ID NO:771). FIG. 132B. Percent editing at nearby adenine bases in Gaussia luciferase mutant C82R with targeting by RESCUEr0, RESCUEr8, and RESCUEr16. FIG. 132C. Percent editing of adenine to guanosine at adenine 20 by varying amounts of RESCUEr0-r16. Values represent mean of three replicates.

FIGS. 133A-133D Off-targets near target cytidines in single-plex and multiplex targeting by RESCUE r0, r8, and r16. FIG. 133A. Schematic of editing site of KRAS transcript, with the targeted C highlighted in red and nearby adenine bases numbered and highlighted in gray (SEQ ID NO:772). FIG. 133B. Percent editing at nearby adenine bases in KRAS transcript with targeting by RESCUEr0, RESCUEr8, and RESCUEr16. FIG. 133C. Schematic of multiplexed editing sites of CTNNB1 transcript, with the two targeted C sites highlighted in red and nearby adenine bases numbered and highlighted in gray (SEQ ID NO:773). FIG. 133D. Percent editing at nearby adenine bases in CTNNB1 transcript with multiplexed targeting by RESCUEr0, RESCUEr8, and RESCUEr16.

FIGS. 134A-134F Characterization of RESCUE and RESCUE-S transcriptome-wide off-targets. FIG. 134A. Predicted effect of transcriptome-wide off-target edits by RESCUE with a targeting guide against a site on the luciferase transcript. FIG. 134B. Predicted oncogenic effects of transcriptome-wide off-target edits by RESCUE with a targeting guide against a site on the luciferase transcript. FIG. 134C. Transcriptome-wide off-targets visualized as the number of off-target edits per transcript by RESCUE with a targeting guide against a site on the luciferase transcript. FIG. 134D. Predicted effect of transcriptome-wide off-target edits by RESCUE-S with a targeting guide against a site on the luciferase transcript. FIG. 134E. Predicted oncogenic effects of transcriptome-wide off-target edits by RESCUE-S with a targeting guide against a site on the luciferase transcript. FIG. 134F. Transcriptome-wide off-targets visualized as the number of off-target edits per transcript by RESCUE-S with a targeting guide against a site on the luciferase transcript.

FIGS. 135A-135C Characterization of 5′ and 3′ flanking bases of transcriptome-wide off-targets. FIG. 135A. The number of off-targets with each of all 16 possible 5′ and 3′ flanking bases by RESCUE with a targeting guide against a site on the luciferase transcript. FIG. 135B. The number of off-targets with each of all 16 possible 5′ and 3′ flanking bases by RESCUE-S with a targeting guide against a site on the luciferase transcript. FIG. 135C. Number of significantly differentially expressed transcripts in conditions with RESCUE constructs targeting luciferase transcripts.

FIGS. 136A-136B Biochemical deamination activity of ADAR2 deaminase domain containing RESCUEr0, RESCUEr16 and RESCUEr16-S mutations using recombinant protein. FIG. 136A. Adenosine deamination activity of ADAR2 deaminase domain protein containing various candidate mutations with a 22 bp double-stranded RNA substrate containing a center adenine mismatched with a cytosine. Reactions were incubated for varying time points and with and without the deaminase domain. Values represent mean+/−S.E.M (n=3, some error bars occluded by symbols). FIG. 136B. Cytidine deamination activity of ADAR2 deaminase domain protein containing various candidate mutations with a 22 bp double-stranded RNA substrate containing a center cytosine mismatched with a uridine. Reactions were incubated for varying time points and with and without the deaminase domain. Values represent mean+/−S.E.M (n=3, some error bars occluded by symbols).

FIGS. 137A-137D Adenosine deaminase activity of RESCUE and RESCUE-S. FIG. 137A. Luciferase correction via adenosine deamination of the Gluc transcript by RESCUE and RESCUE-S using a targeting guide RNA. Values represent mean+/−S.E.M (n=3). FIG. 137B. Luciferase correction via adenosine deamination of the Gluc transcript by RESCUE and RESCUE-S using a non-targeting guide RNA. Values represent mean+/−S.E.M (n=3). FIG. 137C. Percent editing of adenosine to inosine of the Gluc transcript by RESCUE and RESCUE-S using a targeting guide RNA. Values represent mean+/−S.E.M (n=3). FIG. 137D. Percent editing of adenosine to inosine of the Gluc transcript by RESCUE and RESCUE-S using a non-targeting guide RNA. Values represent mean+/−S.E.M (n=3).

FIGS. 138A-138C Cytidine deamination activity and off-target activity on a β-catenin target site using varying amounts of RESCUEr0-r16 and RESCUEr16-S. FIG. 138A. Schematic of editing site of CTNNB1 T41I, with the targeted C highlighted in red and the nearby off-target adenine bases highlighted in gray (SEQ ID NO:774). FIG. 138B. Percent editing of cytosine to uridine (T41A) by varying amounts of RESCUEr0-r16 and RESCUEr16-S. Values represent mean of three replicates. FIG. 138C. Percent editing of adenine to guanosine at the off-target adenine by varying amounts of RESCUEr0-r16 and RESCUEr16-S. Values represent mean of three replicates.

FIGS. 139A-139C Editing of STAT1 and STAT3 by RESCUE and RESCUE-S. FIG. 139A. Schematic of edited sites at STAT3 by C to U and A to I editing (SEQ ID NO:775-778). FIG. 139B. Percent A to I editing at tyrosine residues in STAT1 and STAT3 by RESCUE and RESCUE-S. Values represent mean+/−S.E.M (n=3); NT, non-targeting guide. FIG. 139C. Percent C to U editing at serine residues in STAT1 and STAT3 by RESCUE and RESCUE-S. Values represent mean+/−S.E.M (n=3); NT, non-targeting guide.

FIGS. 140A-140E On-target and off-target editing of RESCUE and RESCUE-S on endogenous targets. FIG. 140A. Percent editing of endogenous target sites with varying base motifs with RESCUE and RESCUE-S. Values represent mean+/−S.E.M (n=3). FIG. 140B. Percent editing at neighboring adenine bases in NRAS I21I with targeting by RESCUE and RESCUE-S. FIG. 140C. Percent editing at neighboring adenine bases in NF2 T21M with targeting by RESCUE and RESCUE-S. FIG. 140D. Percent editing at neighboring adenine bases in RAF1 P30S with targeting by RESCUE and RESCUE-S. FIG. 140E. Percent editing at neighboring adenine bases in CTNNB1 P44S with targeting by RESCUE and RESCUE-S.

FIG. 141 Summary of amino acid changes enabled by RESCUE. Codon table showing all potential amino acid changes possible by RESCUE.

The figures herein are for illustrative purposes only and are not necessarily drawn to scale.

DETAILED DESCRIPTION OF THE EXAMPLE EMBODIMENTS

Definitions

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. Definitions of common terms and techniques in molecular biology may be found in Molecular Cloning: A Laboratory Manual, 2nd edition (1989) (Sambrook, Fritsch, and Maniatis); Molecular Cloning: A Laboratory Manual, 4th edition (2012) (Green and Sambrook); Current Protocols in Molecular Biology (1987) (F. M. Ausubel et al. eds.); the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (1995) (M. J. MacPherson, B. D. Hames, and G. R. Taylor eds.); Antibodies, A Laboratory Manual (1988) (Harlow and Lane, eds.); Antibodies: A Laboratory Manual, 2nd edition (2013) (E. A. Greenfield ed.); Animal Cell Culture (1987) (R.I. Freshney, ed.); Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN 0763752223); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0632021829); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 9780471185710); Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994); March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992); and Marten H. Hofker and Jan van Deursen, Transgenic Mouse Methods and Protocols, 2nd edition (2011).

As used herein, the singular forms “a”, “an”, and “the” include both singular and plural referents unless the context clearly dictates otherwise.

The term “optional” or “optionally” means that the subsequent described event, circumstance or substituent may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.

The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within the respective ranges, as well as the recited endpoints.

The terms “about” or “approximately” as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, are meant to encompass variations of and from the specified value, such as variations of +/−10% or less, +/−5% or less, +/−1% or less, and +/−0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the disclosed invention. It is to be understood that the value to which the modifier “about” or “approximately” refers is itself also specifically, and preferably, disclosed.

As used herein, a “biological sample” may contain whole cells and/or live cells and/or cell debris. The biological sample may contain (or be derived from) a “bodily fluid”. The present invention encompasses embodiments wherein the bodily fluid is selected from amniotic fluid, aqueous humour, vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), semen, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit and mixtures of one or more thereof. Biological samples include cell cultures, bodily fluids, cell cultures from bodily fluids. Bodily fluids may be obtained from a mammal organism, for example by puncture, or other collecting or sampling procedures.

The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.

Whenever reference is made herein to Cas13, it will be understood that a mutated or engineered Cas13 according to the invention as described herein is meant, unless explicitly indicated otherwise. Whenever reference is made herein to Cas13, preferably a mutated or engineered Cas13a, Cas13b, Cas13c, or Cas13d according to the invention as described herein is meant, unless explicitly indicated otherwise. Whenever reference is made herein to Cas13, preferably a mutated or engineered Cas13b according to the invention as described herein is meant, unless explicitly indicated otherwise.

Various embodiments are described hereinafter. It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s). Reference throughout this specification to “one embodiment”, “an embodiment,” “an example embodiment,” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” or “an example embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention. For example, in the appended claims, any of the claimed embodiments can be used in any combination.

All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.

Overview

In one aspect, embodiments disclosed herein are directed to an engineered CRISPR-Cas protein comprising one or more modified amino acids. In certain embodiments, the engineered CRISPR-Cas protein increases or decreases one or more of PFS recognition/specificity, gRNA binding, protease activity, polynucleotide binding capability, stability, specificity, target binding, off-target binding, and/or catalytic activity as compared to a corresponding wild-type CRISPR-Cas protein. In certain embodiments, the CRISPR-Cas protein comprises one or more HEPN domains, and comprises one or more modified amino acids. The modified amino acids may interact with a guide RNA that forms a complex with the CRISPR-Cas protein, and/or are in a HEPN active site, an inter-domain linker domain, a lid domain, a helical domain or a bridge helix domain of the CRISPR-Cas protein, or a combination thereof. In some examples, the engineered CRISPR-Cas protein comprises one or more HEPN domains and further comprises one or more modified amino acids, wherein the amino acids: interact with a guide RNA that forms a complex with the engineered CRISPR-Cas protein; are in a HEPN active site, an inter-domain linker domain, a lid domain, a helical domain 1, a helical domain 2, or a bridge helix domain of the engineered CRISPR-Cas protein; or a combination thereof.

In another aspect, embodiments disclosed herein provide a sub-set of newly identified CRISPR-Cas orthologs that are smaller in size than previously discovered CRISPR-Cas orthologs, including further modifications to and uses thereof. In particular embodiments, the CRISPR-Cas orthologs are less than about 1000 amino acids and can be optionally provided as part of a fusion protein.

Engineered nucleotide deaminases are also provided herein. In certain embodiments, the engineered nucleotide deaminases are adenosine deaminases that can be engineered to comprise cytidine deaminase activity. In embodiments, the engineered nucleotide deaminases may be fused to a Cas protein, including the CRISPR-Cas proteins disclosed herein.

In another aspect, embodiments disclosed herein include systems and uses for such modified CRISPR-Cas proteins including, but not limited to, diagnostics, base editing therapeutics and methods of detection. Fusion proteins comprising a CRISPR Cas protein, including those disclosed herein, and nucleotide deaminase may also be used for base editing. Delivery of the proteins and systems disclosed is also provided, including to a variety of cells and via a variety of particles, vesicles and vectors.

CRISPR-Cas Systems in General

In general, the CRISPR-Cas or CRISPR system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is herein used (e.g., RNA(s) to guide Cas, such as Cas9, e.g. CRISPR RNA and transactivating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)) or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). When the CRISPR protein is a Class 2 Type VI effector, a tracrRNA is not required. In an engineered system of the invention, the direct repeat may encompass naturally-occurring sequences or non-naturally-occurring sequences. The direct repeat of the invention is not limited to naturally occurring lengths and sequences. A direct repeat can be 36 nt in length, but longer or shorter direct repeats may also be used. For example, a direct repeat can be 30 nt or longer, such as 30-100 nt or longer. For example, a direct repeat can be 30 nt, 40 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt or longer in length. In some embodiments, a direct repeat of the invention can include synthetic nucleotide sequences inserted between the 5′ and 3′ ends of naturally occurring direct repeats. In certain embodiments, the inserted sequence may be self-complementary, for example, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% self-complementary. Furthermore, a direct repeat of the invention may include insertions of nucleotides such as an aptamer or sequences that bind to an adapter protein (for association with functional domains). In certain embodiments, one end of a direct repeat containing such an insertion is roughly the first half of a short DR and the other end is roughly the second half of the short DR.

The CRISPR-Cas protein (used interchangeably herein with “Cas protein”, “Cas effector”) may include Cas9, Cas12 (e.g., Cas12a, Cas12b, Cas12c, Cas12d, etc.), Cas13 (e.g., Cas13a, Cas13b (such as Cas13b-t1, Cas13b-t2, Cas13b-t3), Cas13c, Cas13d, etc.), Cas14, CasX, and CasY. In some embodiments, the CRISPR-Cas protein may be a type VI CRISPR-Cas protein. For example, the Type VI CRISPR-Cas protein may be a Cas13 protein. The Cas13 protein may be a Cas13a, a Cas13b, a Cas13c, or a Cas13d. In some examples, the CRISPR-Cas protein is Cas13a. In some examples, the CRISPR-Cas protein is Cas13b. In some examples, the CRISPR-Cas protein is Cas13c. In some examples, the CRISPR-Cas protein is Cas13d.

In some embodiments, an engineered CRISPR-Cas protein comprises one or more HEPN domains and is less than 1000 amino acids in length. For example, the protein may be less than 950, less than 900, less than 850, less than 800, or less than 750 amino acids in size.

In certain example embodiments, the CRISPR-Cas protein comprises at least one HEPN domain, including but not limited to the HEPN domains described herein, HEPN domains known in the art, and domains recognized to be HEPN domains by comparison to consensus sequence motifs. Several such domains are provided herein. In one non-limiting example, a consensus sequence can be derived from the sequences of C2c2 or Cas13b orthologs provided herein. In certain example embodiments, the effector protein comprises a single HEPN domain. In certain other example embodiments, the effector protein comprises two HEPN domains.

In one example embodiment, the one or more HEPN domains comprises a RxxxxH motif. The RxxxxH motif sequence can be, without limitation, from a HEPN domain described herein or a HEPN domain known in the art. RxxxxH motif sequences further include motif sequences created by combining portions of two or more HEPN domains. As noted, consensus sequences can be derived from the sequences of the orthologs disclosed in U.S. Provisional Patent Application 62/432,240 entitled “Novel CRISPR Enzymes and Systems,” U.S. Provisional Patent Application 62/471,710 entitled “Novel Type VI CRISPR Orthologs and Systems” filed on Mar. 15, 2017, and U.S. Provisional patent application entitled “Novel Type VI CRISPR Orthologs and Systems,” labeled as attorney docket number 47627-05-2133 and filed on Apr. 12, 2017.

In an embodiment of the invention, a HEPN domain comprises at least one RxxxxH motif comprising the sequence of R{N/H/K}X1X2X3H. In an embodiment of the invention, a HEPN domain comprises a RxxxxH motif comprising the sequence of R{N/H}X1X2X3H. In an embodiment of the invention, a HEPN domain comprises the sequence of R{N/K}X1X2X3H. In certain embodiments, X1 is R, S, D, E, Q, N, G, Y, or H. In certain embodiments, X2 is I, S, T, V, or L. In certain embodiments, X3 is L, F, N, Y, V, I, S, D, E, or A.
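The residue sets recited above can be expressed as a regular expression for scanning a protein sequence. The following is a minimal sketch, assuming single-letter amino acid codes and the broadest variant, R{N/H/K}X1X2X3H; the function name `find_rxxxxh_motifs` and the example peptide are hypothetical illustrations and are not taken from the sequence listing.

```python
import re

# Residue sets as recited in the embodiments above.
X1 = "RSDEQNGYH"
X2 = "ISTVL"
X3 = "LFNYVISDEA"

# R{N/H/K}X1X2X3H, wrapped in a lookahead so overlapping hits are also reported.
RXXXXH = re.compile(f"(?=(R[NHK][{X1}][{X2}][{X3}]H))")


def find_rxxxxh_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for every RxxxxH hit."""
    return [(m.start(), m.group(1)) for m in RXXXXH.finditer(protein.upper())]


# Hypothetical peptide used only to exercise the pattern.
example = "MKTRNQILHAAGRHSVFHLL"
print(find_rxxxxh_motifs(example))  # [(3, 'RNQILH'), (12, 'RHSVFH')]
```

Narrowing the second position to `[NH]` or `[NK]` gives the other two variants described in this paragraph.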

In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell. In some embodiments, direct repeats may be identified in silico by searching for repetitive motifs that fulfill any or all of the following criteria: 1. found in a 2Kb window of genomic sequence flanking the type II CRISPR locus; 2. span from 20 to 50 bp; and 3. interspaced by 20 to 50 bp. In some embodiments, 2 of these criteria may be used, for instance 1 and 2, 2 and 3, or 1 and 3. In some embodiments, all 3 criteria may be used.
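The three in silico criteria for identifying direct repeats can be sketched as a simple scan. This is an illustrative sketch only: it assumes that the 2 kb flanking window has already been extracted as a string, and it assumes exact (mismatch-free) repeat copies, which is a simplification. The function name, parameter names, and all sequences below are hypothetical.

```python
def find_candidate_repeats(window, min_len=20, max_len=50, min_gap=20, max_gap=50):
    """Return (first_start, second_start, repeat_length, gap) for exact repeat pairs.

    `window` is assumed to be the (up to 2 kb) sequence flanking the locus.
    """
    window = window.upper()
    n = len(window)
    hits = []
    # d is the distance between the start positions of the two copies.
    for d in range(min_len + min_gap, max_len + max_gap + 1):
        for i in range(n - d):
            j = i + d
            # Keep only left-maximal matches so shifted duplicates are not reported.
            if i > 0 and window[i - 1] == window[j - 1]:
                continue
            length = 0
            while length < d and j + length < n and window[i + length] == window[j + length]:
                length += 1
            gap = d - length  # spacer length between the two copies
            if min_len <= length <= max_len and min_gap <= gap <= max_gap:
                hits.append((i, j, length, gap))
    return hits


# Hypothetical 36-nt repeat and 30-nt spacer, for illustration only.
repeat = "GTTGTGGAAGGTCCAGTTTTGAGGGGCTATTACAAC"
spacer = "ATCGGATCCTAGCTAGGCTTAACCGGTTAA"
window = "CCGTC" + repeat + spacer + repeat + "TGCAT"
hits = find_candidate_repeats(window)
print(hits)
```

A practical search would typically tolerate a few mismatches between repeat copies and cluster more than two copies into an array; both are omitted here for brevity.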

In embodiments of the invention the terms guide sequence and guide RNA, e.g., RNA capable of guiding CRISPR-Cas effector proteins to a target locus, are used interchangeably as in herein cited documents such as WO 2014/093622 (PCT/US2013/074667). In general, a guide sequence (or spacer sequence) is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g. the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence (or spacer sequence) is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. Preferably the guide sequence is 10-40 nucleotides long, such as 20-30 or 20-40 nucleotides long or longer, such as 30 nucleotides long or about 30 nucleotides long.
In certain embodiments, the guide sequence is 10-30 nucleotides long, such as 20-30 or 20-40 nucleotides long or longer, such as 30 nucleotides long or about 30 nucleotides long for CRISPR-Cas effectors. In certain embodiments, the guide sequence is 10-30 nucleotides long, such as 20-30 nucleotides long, such as 30 nucleotides long. The ability of a guide sequence to direct sequence-specific binding of a CRISPR complex to a target sequence may be assessed by any suitable assay. For example, the components of a CRISPR system sufficient to form a CRISPR complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CRISPR complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.

In classic CRISPR-Cas systems, the degree of complementarity between a guide sequence and its corresponding target sequence can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%; a guide RNA or crRNA can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length; or a guide RNA or crRNA can be less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length; and advantageously tracr RNA is 30 or 50 nucleotides in length. However, an aspect of the invention is to reduce off-target interactions, e.g., reduce the guide interacting with a target sequence having low complementarity. Indeed, in the examples, it is shown that the invention involves mutations that result in the CRISPR-Cas system being able to distinguish between target and off-target sequences that have greater than 80% to about 95% complementarity, e.g., 83%-84% or 88%-89% or 94%-95% complementarity (for instance, distinguishing a target having 18 nucleotides from an off-target of 18 nucleotides having 1, 2 or 3 mismatches). Accordingly, in the context of the present invention the degree of complementarity between a guide sequence and its corresponding target sequence is greater than 94.5% or 95% or 95.5% or 96% or 96.5% or 97% or 97.5% or 98% or 98.5% or 99% or 99.5% or 99.9%, or 100%. Off target is less than 100% or 99.9% or 99.5% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% or 94% or 93% or 92% or 91% or 90% or 89% or 88% or 87% or 86% or 85% or 84% or 83% or 82% or 81% or 80% complementarity between the sequence and the guide, with it advantageous that off target is 100% or 99.9% or 99.5% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% complementarity between the sequence and the guide.
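The percentage ranges quoted for the 18-nucleotide example follow directly from the mismatch counts. The sketch below reproduces that arithmetic and, as an assumption for illustration, counts mismatches between an RNA guide and an equal-length RNA target using strict Watson-Crick pairing in an ungapped antiparallel alignment (G-U wobble pairs are counted as mismatches). The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def percent_complementarity(length: int, mismatches: int) -> float:
    """Percent of positions that pair, given a duplex length and mismatch count."""
    if length <= 0 or not 0 <= mismatches <= length:
        raise ValueError("mismatches must lie between 0 and a positive length")
    return 100.0 * (length - mismatches) / length


def count_mismatches(guide: str, target: str) -> int:
    """Count non-pairing positions between a guide and target (both written 5' to 3')."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length")
    # The perfectly matched target is the reverse complement of the guide.
    perfect_target = guide.upper().translate(COMPLEMENT)[::-1]
    return sum(a != b for a, b in zip(perfect_target, target.upper()))


# 18-nt target with 1, 2 or 3 mismatches, as in the example above.
for mm in (1, 2, 3):
    print(mm, round(percent_complementarity(18, mm), 1))
# 1 94.4
# 2 88.9
# 3 83.3
```

For example, `count_mismatches("ACGUACGUACGUACGUAC", "CUACGUACGUACGUACGU")` gives 1, which corresponds to 94.4% complementarity over 18 nucleotides.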

In certain embodiments, cleavage efficiency can be modulated by introducing mismatches, e.g. 1 or more mismatches, such as 1 or 2 mismatches, between spacer sequence and target sequence, and by choosing the position of the mismatch along the spacer/target. For instance, the more central a double mismatch is (i.e. not at the 3′ or 5′ end), the more cleavage efficiency is affected. Accordingly, by choosing mismatch position along the spacer, cleavage efficiency can be modulated. By way of example, if less than 100% cleavage of targets is desired (e.g. in a cell population), 1 or more, such as preferably 2, mismatches between spacer and target sequence may be introduced in the spacer sequences. The more central the mismatch position along the spacer, the lower the cleavage percentage.

The methods according to the invention as described herein comprehend inducing one or more nucleotide modifications in a eukaryotic cell (in vitro, i.e. in an isolated eukaryotic cell) as herein discussed comprising delivering to the cell a vector as herein discussed. The mutation(s) can include the introduction, deletion, or substitution of one or more nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s) or sgRNA(s). The mutations can include the introduction, deletion, or substitution of 1-75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s). The mutations can include the introduction, deletion, or substitution of 1, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s). The mutations can include the introduction, deletion, or substitution of 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s). The mutations can include the introduction, deletion, or substitution of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s). The mutations can include the introduction, deletion, or substitution of 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s). The mutations can include the introduction, deletion, or substitution of 40, 45, 50, 75, 100, 200, 300, 400 or 500 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s).

For minimization of toxicity and off-target effect, it will be important to control the concentration of Cas mRNA or protein and guide RNA delivered. Optimal concentrations of Cas mRNA or protein and guide RNA can be determined by testing different concentrations in a cellular or non-human eukaryote animal model and using deep sequencing to analyze the extent of modification at potential off-target genomic loci.

Typically, in the context of an endogenous CRISPR system, formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins) results in cleavage in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence, but may depend on for instance secondary structure, in particular in the case of RNA targets. In some cases, in the context of an endogenous CRISPR system, formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins) results in cleavage of one or both strands (if applicable) in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence.

In particularly preferred embodiments according to the invention, the guide RNA (capable of guiding Cas to a target locus) may comprise (1) a guide sequence capable of hybridizing to a target locus (a polynucleotide target locus, such as an RNA target locus) in the eukaryotic cell; and (2) a direct repeat (DR) sequence, which reside in a single RNA, i.e. an sgRNA (arranged in a 5′ to 3′ orientation) or crRNA.

With respect to general information on CRISPR-Cas Systems, components thereof, and delivery of such components, including methods, materials, delivery vehicles, vectors, particles, AAV, and making and using thereof, including as to amounts and formulations, all useful in the practice of the instant invention, reference is made to: U.S. Pat. Nos. 8,999,641, 8,993,233, 8,945,839, 8,932,814, 8,906,616, 8,895,308, 8,889,418, 8,889,356, 8,871,445, 8,865,406, 8,795,965, 8,771,945 and 8,697,359; US Patent Publications US 2014-0310830 (U.S. application Ser. No. 14/105,031), US 2014-0287938 A1 (U.S. application Ser. No. 14/213,991), US 2014-0273234 A1 (U.S. application Ser. No. 14/293,674), US 2014-0273232 A1 (U.S. application Ser. No. 14/290,575), US 2014-0273231 (U.S. application Ser. No. 14/259,420), US 2014-0256046 A1 (U.S. application Ser. No. 14/226,274), US 2014-0248702 A1 (U.S. application Ser. No. 14/258,458), US 2014-0242700 A1 (U.S. application Ser. No. 14/222,930), US 2014-0242699 A1 (U.S. application Ser. No. 14/183,512), US 2014-0242664 A1 (U.S. application Ser. No. 14/104,990), US 2014-0234972 A1 (U.S. application Ser. No. 14/183,471), US 2014-0227787 A1 (U.S. application Ser. No. 14/256,912), US 2014-0189896 A1 (U.S. application Ser. No. 14/105,035), US 2014-0186958 (U.S. application Ser. No. 14/105,017), US 2014-0186919 A1 (U.S. application Ser. No. 14/104,977), US 2014-0186843 A1 (U.S. application Ser. No. 14/104,900), US 2014-0179770 A1 (U.S. application Ser. No. 14/104,837) and US 2014-0179006 A1 (U.S. application Ser. No. 14/183,486), US 2014-0170753 (U.S. application Ser. No. 14/183,429); European Patents EP 2 784 162 B1 and EP 2 771 468 B1; European Patent Applications EP 2 771 468 (EP13818570.7), EP 2 764 103 (EP13824232.6), and EP 2 784 162 (EP14170383.5); and PCT Patent Publications WO 2014/093661 (PCT/US2013/074743), WO 2014/093694 (PCT/US2013/074790), WO 2014/093595 (PCT/US2013/074611), WO 2014/093718 (PCT/US2013/074825), WO 2014/093709 (PCT/US2013/074812), WO 2014/093622 (PCT/US2013/074667), WO 2014/093635 (PCT/US2013/074691), WO 2014/093655 (PCT/US2013/074736), WO 2014/093712 (PCT/US2013/074819), WO 2014/093701 (PCT/US2013/074800), WO 2014/018423 (PCT/US2013/051418), WO 2014/204723 (PCT/US2014/041790), WO 2014/204724 (PCT/US2014/041800), WO 2014/204725 (PCT/US2014/041803), WO 2014/204726 (PCT/US2014/041804), WO 2014/204727 (PCT/US2014/041806), WO 2014/204728 (PCT/US2014/041808), WO 2014/204729 (PCT/US2014/041809). Reference is also made to U.S. provisional patent applications 61/758,468; 61/802,174; 61/806,375; 61/814,263; 61/819,803 and 61/828,130, filed on Jan. 30, 2013; Mar. 15, 2013; Mar. 28, 2013; Apr. 20, 2013; May 6, 2013 and May 28, 2013 respectively. Reference is also made to U.S. provisional patent application 61/836,123, filed on Jun. 17, 2013. Reference is additionally made to U.S. provisional patent applications 61/835,931, 61/835,936, 61/836,127, 61/836,101, 61/836,080 and 61/835,973, each filed Jun. 17, 2013. Further reference is made to U.S. provisional patent applications 61/862,468 and 61/862,355 filed on Aug. 5, 2013; 61/871,301 filed on Aug. 28, 2013; 61/960,777 filed on Sep. 25, 2013 and 61/961,980 filed on Oct. 28, 2013. Reference is yet further made to: PCT Patent applications Nos: PCT/US2014/041803, PCT/US2014/041800, PCT/US2014/041809, PCT/US2014/041804 and PCT/US2014/041806, each filed Jun. 10, 2014; PCT/US2014/041808 filed Jun. 11, 2014; and PCT/US2014/62558 filed Oct. 28, 2014, and U.S. Provisional Patent Applications Ser. Nos. 61/915,150, 61/915,301, 61/915,267 and 61/915,260, each filed Dec. 12, 2013; 61/757,972 and 61/768,959, filed on Jan. 29, 2013 and Feb. 25, 2013; 61/835,936, 61/836,127, 61/836,101, 61/836,080, 61/835,973, and 61/835,931, filed Jun. 17, 2013; 62/010,888 and 62/010,879, both filed Jun. 11, 2014; 62/010,329 and 62/010,441, each filed Jun. 10, 2014; 61/939,228 and 61/939,242, each filed Feb. 12, 2014; 61/980,012, filed Apr. 15, 2014; 62/038,358, filed Aug. 17, 2014; 62/054,490, 62/055,484, 62/055,460 and 62/055,487, each filed Sep. 25, 2014; and 62/069,243, filed Oct. 27, 2014. Reference is also made to U.S. provisional patent applications Nos. 62/055,484, 62/055,460, and 62/055,487, filed Sep. 25, 2014; U.S. provisional patent application 61/980,012, filed Apr. 15, 2014; and U.S. provisional patent application 61/939,242 filed Feb. 12, 2014. Reference is made to PCT application designating, inter alia, the United States, application No. PCT/US14/41806, filed Jun. 10, 2014. Reference is made to U.S. provisional patent application 61/930,214 filed on Jan. 22, 2014. Reference is made to U.S. provisional patent applications 61/915,251; 61/915,260 and 61/915,267, each filed on Dec. 12, 2013. Reference is made to US provisional patent application U.S. Ser. No. 61/980,012 filed Apr. 15, 2014.

Mention is also made of U.S. application 62/091,455, filed, 12 Dec. 2014, PROTECTED GUIDE RNAS (PGRNAS); U.S. application 62/096,708, 24 Dec. 2014, PROTECTED GUIDE RNAS (PGRNAS); U.S. application 62/091,462, 12 Dec. 2014, DEAD GUIDES FOR CRISPR TRANSCRIPTION FACTORS; U.S. application 62/096,324, 23 Dec. 2014, DEAD GUIDES FOR CRISPR TRANSCRIPTION FACTORS; U.S. application 62/091,456, 12 Dec. 2014, ESCORTED AND FUNCTIONALIZED GUIDES FOR CRISPR-CAS SYSTEMS; U.S. application 62/091,461, 12 Dec. 2014, DELIVERY, USE AND THERAPEUTIC APPLICATIONS OF THE CRISPR-CAS SYSTEMS AND COMPOSITIONS FOR GENOME EDITING AS TO HEMATOPOETIC STEM CELLS (HSCs); U.S. application 62/094,903, 19 Dec. 2014, UNBIASED IDENTIFICATION OF DOUBLE-STRAND BREAKS AND GENOMIC REARRANGEMENT BY GENOME-WISE INSERT CAPTURE SEQUENCING; U.S. application 62/096,761, 24 Dec. 2014, ENGINEERING OF SYSTEMS, METHODS AND OPTIMIZED ENZYME AND GUIDE SCAFFOLDS FOR SEQUENCE MANIPULATION; U.S. application 62/098,059, 30 Dec. 2014, RNA-TARGETING SYSTEM; U.S. application 62/096,656, 24 Dec. 2014, CRISPR HAVING OR ASSOCIATED WITH DESTABILIZATION DOMAINS; U.S. application 62/096,697, 24 Dec. 2014, CRISPR HAVING OR ASSOCIATED WITH AAV; U.S. application 62/098,158, 30 Dec. 2014, ENGINEERED CRISPR COMPLEX INSERTIONAL TARGETING SYSTEMS; U.S. application 62/151,052, 22 Apr. 2015, CELLULAR TARGETING FOR EXTRACELLULAR EXOSOMAL REPORTING; U.S. application 62/054,490, 24 Sep. 2014, DELIVERY, USE AND THERAPEUTIC APPLICATIONS OF THE CRISPR-CAS SYSTEMS AND COMPOSITIONS FOR TARGETING DISORDERS AND DISEASES USING PARTICLE DELIVERY COMPONENTS; U.S. application 62/055,484, 25 Sep. 2014, SYSTEMS, METHODS AND COMPOSITIONS FOR SEQUENCE MANIPULATION WITH OPTIMIZED FUNCTIONAL CRISPR-CAS SYSTEMS; U.S. application 62/087,537, 4 Dec. 2014, SYSTEMS, METHODS AND COMPOSITIONS FOR SEQUENCE MANIPULATION WITH OPTIMIZED FUNCTIONAL CRISPR-CAS SYSTEMS; U.S. application 62/054,651, 24 Sep. 
2014, DELIVERY, USE AND THERAPEUTIC APPLICATIONS OF THE CRISPR-CAS SYSTEMS AND COMPOSITIONS FOR MODELING COMPETITION OF MULTIPLE CANCER MUTATIONS IN VIVO; U.S. application 62/067,886, 23 Oct. 2014, DELIVERY, USE AND THERAPEUTIC APPLICATIONS OF THE CRISPR-CAS SYSTEMS AND COMPOSITIONS FOR MODELING COMPETITION OF MULTIPLE CANCER MUTATIONS IN VIVO; U.S. application 62/054,675, 24 Sep. 2014, DELIVERY, USE AND THERAPEUTIC APPLICATIONS OF THE CRISPR-CAS SYSTEMS AND COMPOSITIONS IN NEURONAL CELLS/TISSUES; U.S. application 62/054,528, 24 Sep. 2014, DELIVERY, USE AND THERAPEUTIC APPLICATIONS OF THE CRISPR-CAS SYSTEMS AND COMPOSITIONS IN IMMUNE DISEASES OR DISORDERS; U.S. application 62/055,454, 25 Sep. 2014, DELIVERY, USE AND THERAPEUTIC APPLICATIONS OF THE CRISPR-CAS SYSTEMS AND COMPOSITIONS FOR TARGETING DISORDERS AND DISEASES USING CELL PENETRATION PEPTIDES (CPP); U.S. application 62/055,460, 25 Sep. 2014, MULTIFUNCTIONAL-CRISPR COMPLEXES AND/OR OPTIMIZED ENZYME LINKED FUNCTIONAL-CRISPR COMPLEXES; U.S. application 62/087,475, 4 Dec. 2014, FUNCTIONAL SCREENING WITH OPTIMIZED FUNCTIONAL CRISPR-CAS SYSTEMS; U.S. application 62/055,487, 25 Sep. 2014, FUNCTIONAL SCREENING WITH OPTIMIZED FUNCTIONAL CRISPR-CAS SYSTEMS; U.S. application 62/087,546, 4 Dec. 2014, MULTIFUNCTIONAL CRISPR COMPLEXES AND/OR OPTIMIZED ENZYME LINKED FUNCTIONAL-CRISPR COMPLEXES; and U.S. application 62/098,285, 30 Dec. 2014, CRISPR MEDIATED IN VIVO MODELING AND GENETIC SCREENING OF TUMOR GROWTH AND METASTASIS.

Also with respect to general information on CRISPR-Cas Systems, mention is made of the following (also hereby incorporated herein by reference):

    • Multiplex genome engineering using CRISPR/Cas systems. Cong, L., Ran, F. A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P. D., Wu, X., Jiang, W., Marraffini, L. A., & Zhang, F. Science February 15; 339(6121):819-23 (2013);
    • RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Jiang W., Bikard D., Cox D., Zhang F, Marraffini L A. Nat Biotechnol March; 31(3):233-9 (2013);
    • One-Step Generation of Mice Carrying Mutations in Multiple Genes by CRISPR/Cas-Mediated Genome Engineering. Wang H., Yang H., Shivalila C S., Dawlaty M M., Cheng A W., Zhang F., Jaenisch R. Cell May 9; 153(4):910-8 (2013);
    • Optical control of mammalian endogenous transcription and epigenetic states. Konermann S, Brigham M D, Trevino A E, Hsu P D, Heidenreich M, Cong L, Platt R J, Scott D A, Church G M, Zhang F. Nature. August 22; 500(7463):472-6. doi: 10.1038/Nature12466. Epub 2013 Aug. 23 (2013);
    • Double Nicking by RNA-Guided CRISPR Cas9 for Enhanced Genome Editing Specificity. Ran, F A., Hsu, P D., Lin, C Y., Gootenberg, J S., Konermann, S., Trevino, A E., Scott, D A., Inoue, A., Matoba, S., Zhang, Y., & Zhang, F. Cell August 28. pii: S0092-8674(13)01015-5 (2013-A);
    • DNA targeting specificity of RNA-guided Cas9 nucleases. Hsu, P., Scott, D., Weinstein, J., Ran, F A., Konermann, S., Agarwala, V., Li, Y., Fine, E., Wu, X., Shalem, O., Cradick, T J., Marraffini, L A., Bao, G., & Zhang, F. Nat Biotechnol doi:10.1038/nbt.2647 (2013);
    • Genome engineering using the CRISPR-Cas9 system. Ran, F A., Hsu, P D., Wright, J., Agarwala, V., Scott, D A., Zhang, F. Nature Protocols November; 8(11):2281-308 (2013-B);
    • Genome-Scale CRISPR-Cas9 Knockout Screening in Human Cells. Shalem, O., Sanjana, N E., Hartenian, E., Shi, X., Scott, D A., Mikkelson, T., Heckl, D., Ebert, B L., Root, D E., Doench, J G., Zhang, F. Science December 12. (2013). [Epub ahead of print];
    • Crystal structure of cas9 in complex with guide RNA and target DNA. Nishimasu, H., Ran, F A., Hsu, P D., Konermann, S., Shehata, S I., Dohmae, N., Ishitani, R., Zhang, F., Nureki, O. Cell February 27, 156(5):935-49 (2014);
    • Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Wu X., Scott D A., Kriz A J., Chiu A C., Hsu P D., Dadon D B., Cheng A W., Trevino A E., Konermann S., Chen S., Jaenisch R., Zhang F., Sharp P A. Nat Biotechnol. April 20. doi: 10.1038/nbt.2889 (2014);
    • CRISPR-Cas9 Knockin Mice for Genome Editing and Cancer Modeling. Platt R J, Chen S, Zhou Y, Yim M J, Swiech L, Kempton H R, Dahlman J E, Parnas O, Eisenhaure T M, Jovanovic M, Graham D B, Jhunjhunwala S, Heidenreich M, Xavier R J, Langer R, Anderson D G, Hacohen N, Regev A, Feng G, Sharp P A, Zhang F. Cell 159(2): 440-455 DOI: 10.1016/j.cell.2014.09.014 (2014);
    • Development and Applications of CRISPR-Cas9 for Genome Engineering, Hsu P D, Lander E S, Zhang F., Cell. June 5; 157(6):1262-78 (2014).
    • Genetic screens in human cells using the CRISPR/Cas9 system, Wang T, Wei J J, Sabatini D M, Lander E S., Science. January 3; 343(6166): 80-84. doi:10.1126/science.1246981 (2014);
    • Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation, Doench J G, Hartenian E, Graham D B, Tothova Z, Hegde M, Smith I, Sullender M, Ebert B L, Xavier R J, Root D E., (published online 3 Sep. 2014) Nat Biotechnol. December; 32(12):1262-7 (2014);
    • In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9, Swiech L, Heidenreich M, Banerjee A, Habib N, Li Y, Trombetta J, Sur M, Zhang F., (published online 19 Oct. 2014) Nat Biotechnol. January; 33(1):102-6 (2015);
    • Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex, Konermann S, Brigham M D, Trevino A E, Joung J, Abudayyeh O O, Barcena C, Hsu P D, Habib N, Gootenberg J S, Nishimasu H, Nureki O, Zhang F., Nature. January 29; 517(7536):583-8 (2015).
    • A split-Cas9 architecture for inducible genome editing and transcription modulation, Zetsche B, Volz S E, Zhang F., (published online 2 Feb. 2015) Nat Biotechnol. February; 33(2):139-42 (2015);
    • Genome-wide CRISPR Screen in a Mouse Model of Tumor Growth and Metastasis, Chen S, Sanjana N E, Zheng K, Shalem O, Lee K, Shi X, Scott D A, Song J, Pan J Q, Weissleder R, Lee H, Zhang F, Sharp P A. Cell 160, 1246-1260, Mar. 12, 2015 (multiplex screen in mouse), and
    • In vivo genome editing using Staphylococcus aureus Cas9, Ran F A, Cong L, Yan W X, Scott D A, Gootenberg J S, Kriz A J, Zetsche B, Shalem O, Wu X, Makarova K S, Koonin E V, Sharp P A, Zhang F., (published online 1 Apr. 2015), Nature. April 9; 520(7546): 186-91 (2015).
    • Shalem et al., “High-throughput functional genomics using CRISPR-Cas9,” Nature Reviews Genetics 16, 299-311 (May 2015).
    • Xu et al., “Sequence determinants of improved CRISPR sgRNA design,” Genome Research 25, 1147-1157 (August 2015).
    • Parnas et al., “A Genome-wide CRISPR Screen in Primary Immune Cells to Dissect Regulatory Networks,” Cell 162, 675-686 (Jul. 30, 2015).
    • Ramanan et al., “CRISPR/Cas9 cleavage of viral DNA efficiently suppresses hepatitis B virus,” Scientific Reports 5:10833. doi: 10.1038/srep10833 (Jun. 2, 2015)
    • Nishimasu et al., “Crystal Structure of Staphylococcus aureus Cas9,” Cell 162, 1113-1126 (Aug. 27, 2015)
    • Zetsche et al. (2015), “Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system,” Cell 163, 759-771 (Oct. 22, 2015) doi: 10.1016/j.cell.2015.09.038. Epub Sep. 25, 2015
    • Shmakov et al. (2015), “Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems,” Molecular Cell 60, 385-397 (Nov. 5, 2015) doi: 10.1016/j.molcel.2015.10.008. Epub Oct. 22, 2015
    • Dahlman et al., “Orthogonal gene control with a catalytically active Cas9 nuclease,” Nature Biotechnology 33, 1159-1161 (November, 2015)
    • Gao et al, “Engineered Cpf1 Enzymes with Altered PAM Specificities,” bioRxiv 091611; doi: dx.doi.org/10.1101/091611 Epub Dec. 4, 2016
    • Smargon et al. (2017), “Cas13b Is a Type VI-B CRISPR-Associated RNA-Guided RNase Differentially Regulated by Accessory Proteins Csx27 and Csx28,” Molecular Cell 65, 618-630 (Feb. 16, 2017) doi: 10.1016/j.molcel.2016.12.023. Epub Jan. 5, 2017
      each of which is incorporated herein by reference, may be considered in the practice of the instant invention, and is discussed briefly below:
    • Cong et al. engineered type II CRISPR-Cas systems for use in eukaryotic cells based on both Streptococcus thermophilus Cas9 and also Streptococcus pyogenes Cas9 and demonstrated that Cas9 nucleases can be directed by short RNAs to induce precise cleavage of DNA in human and mouse cells. Their study further showed that Cas9, when converted into a nicking enzyme, can be used to facilitate homology-directed repair in eukaryotic cells with minimal mutagenic activity. Additionally, their study demonstrated that multiple guide sequences can be encoded into a single CRISPR array to enable simultaneous editing of several sites at endogenous genomic loci within the mammalian genome, demonstrating easy programmability and wide applicability of the RNA-guided nuclease technology. This ability to use RNA to program sequence specific DNA cleavage in cells defined a new class of genome engineering tools. These studies further showed that other CRISPR loci are likely to be transplantable into mammalian cells and can also mediate mammalian genome cleavage. Importantly, it can be envisaged that several aspects of the CRISPR-Cas system can be further improved to increase its efficiency and versatility.
    • Jiang et al. used the clustered, regularly interspaced, short palindromic repeats (CRISPR)-associated Cas9 endonuclease complexed with dual-RNAs to introduce precise mutations in the genomes of Streptococcus pneumoniae and Escherichia coli. The approach relied on dual-RNA:Cas9-directed cleavage at the targeted genomic site to kill unmutated cells and circumvents the need for selectable markers or counter-selection systems. The study reported reprogramming dual-RNA:Cas9 specificity by changing the sequence of short CRISPR RNA (crRNA) to make single- and multinucleotide changes carried on editing templates. The study showed that simultaneous use of two crRNAs enabled multiplex mutagenesis. Furthermore, when the approach was used in combination with recombineering, in S. pneumoniae, nearly 100% of cells that were recovered using the described approach contained the desired mutation, and in E. coli, 65% that were recovered contained the mutation.
    • Wang et al. (2013) used the CRISPR/Cas system for the one-step generation of mice carrying mutations in multiple genes which were traditionally generated in multiple steps by sequential recombination in embryonic stem cells and/or time-consuming intercrossing of mice with a single mutation. The CRISPR/Cas system will greatly accelerate the in vivo study of functionally redundant genes and of epistatic gene interactions.
    • Konermann et al. (2013) addressed the need in the art for versatile and robust technologies that enable optical and chemical modulation of DNA-binding domains based on the CRISPR Cas9 enzyme and also Transcriptional Activator Like Effectors.
    • Ran et al. (2013-A) described an approach that combined a Cas9 nickase mutant with paired guide RNAs to introduce targeted double-strand breaks. This addresses the issue of the Cas9 nuclease from the microbial CRISPR-Cas system being targeted to specific genomic loci by a guide sequence, which can tolerate certain mismatches to the DNA target and thereby promote undesired off-target mutagenesis. Because individual nicks in the genome are repaired with high fidelity, simultaneous nicking via appropriately offset guide RNAs is required for double-stranded breaks and extends the number of specifically recognized bases for target cleavage. The authors demonstrated that using paired nicking can reduce off-target activity by 50- to 1,500-fold in cell lines and facilitate gene knockout in mouse zygotes without sacrificing on-target cleavage efficiency. This versatile strategy enables a wide variety of genome editing applications that require high specificity.
    • Hsu et al. (2013) characterized SpCas9 targeting specificity in human cells to inform the selection of target sites and avoid off-target effects. The study evaluated >700 guide RNA variants and SpCas9-induced indel mutation levels at >100 predicted genomic off-target loci in 293T and 293FT cells. The authors showed that SpCas9 tolerates mismatches between guide RNA and target DNA at different positions in a sequence-dependent manner, sensitive to the number, position and distribution of mismatches. The authors further showed that SpCas9-mediated cleavage is unaffected by DNA methylation and that the dosage of SpCas9 and sgRNA can be titrated to minimize off-target modification. Additionally, to facilitate mammalian genome engineering applications, the authors reported providing a web-based software tool to guide the selection and validation of target sequences as well as off-target analyses.
    • Ran et al. (2013-B) described a set of tools for Cas9-mediated genome editing via non-homologous end joining (NHEJ) or homology-directed repair (HDR) in mammalian cells, as well as generation of modified cell lines for downstream functional studies. To minimize off-target cleavage, the authors further described a double-nicking strategy using the Cas9 nickase mutant with paired guide RNAs. The protocol provided by the authors includes experimentally derived guidelines for the selection of target sites, evaluation of cleavage efficiency and analysis of off-target activity. The studies showed that beginning with target design, gene modifications can be achieved within as little as 1-2 weeks, and modified clonal cell lines can be derived within 2-3 weeks.
    • Shalem et al. described a new way to interrogate gene function on a genome-wide scale. Their studies showed that delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enabled both negative and positive selection screening in human cells. First, the authors showed use of the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, the authors screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic that inhibits mutant protein kinase BRAF. Their studies showed that the highest-ranking candidates included previously validated genes NF1 and MED12 as well as novel hits NF2, CUL3, TADA2B, and TADA1. The authors observed a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, and thus demonstrated the promise of genome-scale screening with Cas9.
    • Nishimasu et al. reported the crystal structure of Streptococcus pyogenes Cas9 in complex with sgRNA and its target DNA at 2.5 Å resolution. The structure revealed a bilobed architecture composed of target recognition and nuclease lobes, accommodating the sgRNA:DNA heteroduplex in a positively charged groove at their interface. Whereas the recognition lobe is essential for binding sgRNA and DNA, the nuclease lobe contains the HNH and RuvC nuclease domains, which are properly positioned for cleavage of the complementary and non-complementary strands of the target DNA, respectively. The nuclease lobe also contains a carboxyl-terminal domain responsible for the interaction with the protospacer adjacent motif (PAM). This high-resolution structure and accompanying functional analyses have revealed the molecular mechanism of RNA-guided DNA targeting by Cas9, thus paving the way for the rational design of new, versatile genome-editing technologies.
    • Wu et al. mapped genome-wide binding sites of a catalytically inactive Cas9 (dCas9) from Streptococcus pyogenes loaded with single guide RNAs (sgRNAs) in mouse embryonic stem cells (mESCs). The authors showed that each of the four sgRNAs tested targets dCas9 to between tens and thousands of genomic sites, frequently characterized by a 5-nucleotide seed region in the sgRNA and an NGG protospacer adjacent motif (PAM). Chromatin inaccessibility decreases dCas9 binding to other sites with matching seed sequences; thus 70% of off-target sites are associated with genes. The authors showed that targeted sequencing of 295 dCas9 binding sites in mESCs transfected with catalytically active Cas9 identified only one site mutated above background levels. The authors proposed a two-state model for Cas9 binding and cleavage, in which a seed match triggers binding but extensive pairing with target DNA is required for cleavage.
    • Platt et al. established a Cre-dependent Cas9 knockin mouse. The authors demonstrated in vivo as well as ex vivo genome editing using adeno-associated virus (AAV)-, lentivirus-, or particle-mediated delivery of guide RNA in neurons, immune cells, and endothelial cells.
    • Hsu et al. (2014) is a review article that discusses generally CRISPR-Cas9 history from yogurt to genome editing, including genetic screening of cells.
    • Wang et al. (2014) relates to a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single guide RNA (sgRNA) library.
    • Doench et al. created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. The authors showed that optimization of the PAM improved activity and also provided an on-line tool for designing sgRNAs.
    • Swiech et al. demonstrate that AAV-mediated SpCas9 genome editing can enable reverse genetic studies of gene function in the brain.
    • Konermann et al. (2015) discusses the ability to attach multiple effector domains, e.g., transcriptional activator, functional and epigenomic regulators at appropriate positions on the guide such as stem or tetraloop with and without linkers.
    • Zetsche et al. demonstrates that the Cas9 enzyme can be split into two and hence the assembly of Cas9 for activation can be controlled.
    • Chen et al. relates to multiplex screening by demonstrating that a genome-wide in vivo CRISPR-Cas9 screen in mice reveals genes regulating lung metastasis.
    • Ran et al. (2015) relates to SaCas9 and its ability to edit genomes and demonstrates that one cannot extrapolate from biochemical assays.
    • Shalem et al. (2015) described ways in which catalytically inactive Cas9 (dCas9) fusions are used to synthetically repress (CRISPRi) or activate (CRISPRa) expression, showing advances using Cas9 for genome-scale screens, including arrayed and pooled screens, knockout approaches that inactivate genomic loci and strategies that modulate transcriptional activity.
    • Xu et al. (2015) assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. The authors explored efficiency of CRISPR/Cas9 knockout and nucleotide preference at the cleavage site. The authors also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout.
    • Parnas et al. (2015) introduced genome-wide pooled CRISPR-Cas9 libraries into dendritic cells (DCs) to identify genes that control the induction of tumor necrosis factor (Tnf) by bacterial lipopolysaccharide (LPS). Known regulators of Tlr4 signaling and previously unknown candidates were identified and classified into three functional modules with distinct effects on the canonical responses to LPS.
    • Ramanan et al. (2015) demonstrated cleavage of viral episomal DNA (cccDNA) in infected cells. The HBV genome exists in the nuclei of infected hepatocytes as a 3.2 kb double-stranded episomal DNA species called covalently closed circular DNA (cccDNA), which is a key component in the HBV life cycle whose replication is not inhibited by current therapies. The authors showed that sgRNAs specifically targeting highly conserved regions of HBV robustly suppressed viral replication and depleted cccDNA.
    • Nishimasu et al. (2015) reported the crystal structures of SaCas9 in complex with a single guide RNA (sgRNA) and its double-stranded DNA targets, containing the 5′-TTGAAT-3′ PAM and the 5′-TTGGGT-3′ PAM. A structural comparison of SaCas9 with SpCas9 highlighted both structural conservation and divergence, explaining their distinct PAM specificities and orthologous sgRNA recognition.

Also, “Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing”, Shengdar Q. Tsai, Nicolas Wyvekens, Cyd Khayter, Jennifer A. Foden, Vishal Thapar, Deepak Reyon, Mathew J. Goodwin, Martin J. Aryee, J. Keith Joung Nature Biotechnology 32(6): 569-77 (2014), relates to dimeric RNA-guided FokI Nucleases that recognize extended sequences and can edit endogenous genes with high efficiencies in human cells. In addition, mention is made of PCT application PCT/US14/70057, Attorney Reference 47627.99.2060 and BI-2013/107 entitled “DELIVERY, USE AND THERAPEUTIC APPLICATIONS OF THE CRISPR-CAS SYSTEMS AND COMPOSITIONS FOR TARGETING DISORDERS AND DISEASES USING PARTICLE DELIVERY COMPONENTS” (claiming priority from one or more or all of US provisional patent applications: 62/054,490, filed Sep. 24, 2014; 62/010,441, filed Jun. 10, 2014; and 61/915,118, 61/915,215 and 61/915,148, each filed on Dec. 12, 2013) (“the Particle Delivery PCT”), incorporated herein by reference, with respect to a method of preparing an sgRNA-and-Cas9 protein containing particle comprising admixing a mixture comprising an sgRNA and Cas9 protein (and optionally HDR template) with a mixture comprising or consisting essentially of or consisting of surfactant, phospholipid, biodegradable polymer, lipoprotein and alcohol; and particles from such a process. For example, wherein Cas9 protein and sgRNA were mixed together at a suitable, e.g., 3:1 to 1:3 or 2:1 to 1:2 or 1:1 molar ratio, at a suitable temperature, e.g., 15-30° C., e.g., 20-25° C., e.g., room temperature, for a suitable time, e.g., 15-45 minutes, such as 30 minutes, advantageously in sterile, nuclease free buffer, e.g., 1×PBS. 
Separately, particle components such as or comprising: a surfactant, e.g., cationic lipid, e.g., 1,2-dioleoyl-3-trimethylammonium-propane (DOTAP); phospholipid, e.g., dimyristoylphosphatidylcholine (DMPC); biodegradable polymer, such as an ethylene-glycol polymer or PEG, and a lipoprotein, such as a low-density lipoprotein, e.g., cholesterol were dissolved in an alcohol, advantageously a C1-6 alkyl alcohol, such as methanol, ethanol, isopropanol, e.g., 100% ethanol. The two solutions were mixed together to form particles containing the Cas9-sgRNA complexes. Accordingly, sgRNA may be pre-complexed with the Cas9 protein, before formulating the entire complex in a particle. Formulations may be made with a different molar ratio of different components known to promote delivery of nucleic acids into cells (e.g. 1,2-dioleoyl-3-trimethylammonium-propane (DOTAP), 1,2-ditetradecanoyl-sn-glycero-3-phosphocholine (DMPC), polyethylene glycol (PEG), and cholesterol). For example, DOTAP:DMPC:PEG:Cholesterol Molar Ratios may be DOTAP 100, DMPC 0, PEG 0, Cholesterol 0; or DOTAP 90, DMPC 0, PEG 10, Cholesterol 0; or DOTAP 90, DMPC 0, PEG 5, Cholesterol 5. That application accordingly comprehends admixing sgRNA, Cas9 protein and components that form a particle; as well as particles from such admixing. Aspects of the instant invention can involve particles; for example, particles using a process analogous to that of the Particle Delivery PCT, e.g., by admixing a mixture comprising crRNA and/or CRISPR-Cas as in the instant invention and components that form a particle, e.g., as in the Particle Delivery PCT, to form a particle and particles from such admixing (or, of course, other particles involving crRNA and/or CRISPR-Cas as in the instant invention).
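
By way of illustration only, the DOTAP:DMPC:PEG:Cholesterol molar ratios recited above can be converted into per-component amounts with simple arithmetic. The following Python sketch is not taken from the Particle Delivery PCT; the function name `lipid_amounts` and the total of 200 nmol used in the usage note are hypothetical values chosen solely for the example.

```python
from typing import Dict

# Molar ratios as recited in the text (parts of each component).
FORMULATIONS = [
    {"DOTAP": 100, "DMPC": 0, "PEG": 0, "Cholesterol": 0},
    {"DOTAP": 90, "DMPC": 0, "PEG": 10, "Cholesterol": 0},
    {"DOTAP": 90, "DMPC": 0, "PEG": 5, "Cholesterol": 5},
]


def lipid_amounts(molar_ratio: Dict[str, float], total_nmol: float) -> Dict[str, float]:
    """Split a total molar amount (hypothetical input) across components
    in proportion to their molar-ratio parts."""
    total_parts = sum(molar_ratio.values())
    if total_parts <= 0:
        raise ValueError("molar ratio must contain at least one positive part")
    return {
        name: total_nmol * parts / total_parts
        for name, parts in molar_ratio.items()
    }
```

For instance, `lipid_amounts(FORMULATIONS[2], 200.0)` apportions an assumed 200 nmol of total particle components as 180 nmol DOTAP, 0 nmol DMPC, 10 nmol PEG and 10 nmol cholesterol.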

Guide Sequences

In embodiments of the invention the terms guide sequence and guide RNA and crRNA are used interchangeably as in foregoing cited documents such as WO 2014/093622 (PCT/US2013/074667). In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. Preferably the guide sequence is 10-30 nucleotides long, such as 30 nucleotides long. The ability of a guide sequence to direct sequence-specific binding of a CRISPR complex to a target sequence may be assessed by any suitable assay. 
For example, the components of a CRISPR system sufficient to form a CRISPR complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CRISPR complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art. A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome.
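
As a non-limiting illustration of the degree of complementarity discussed above, the following Python sketch computes an ungapped percent complementarity between a guide sequence and an RNA target site of equal length, and slides the guide along a longer target to find the best-matching window. It is a deliberate simplification and is not an implementation of the Smith-Waterman, Needleman-Wunsch, or any other algorithm named above: it considers Watson-Crick pairs only (no G-U wobble) and permits no gaps. The function names and example sequences are hypothetical.

```python
from typing import Tuple

# Watson-Crick RNA base pairs only (assumption: G-U wobble pairs are ignored).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def percent_complementarity(guide: str, target_site: str) -> float:
    """Ungapped percent complementarity between a guide and an equal-length site."""
    guide, target_site = guide.upper(), target_site.upper()
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have the same length")
    # A fully complementary guide equals the reverse complement of the site.
    ideal_guide = reverse_complement(target_site)
    matches = sum(g == i for g, i in zip(guide, ideal_guide))
    return 100.0 * matches / len(guide)


def best_site(guide: str, target: str) -> Tuple[int, float]:
    """Slide the guide along the target and return (start index, percent)
    for the first window with the highest ungapped complementarity."""
    n = len(guide)
    if len(target) < n:
        raise ValueError("target is shorter than the guide")
    scores = [
        percent_complementarity(guide, target[start:start + n])
        for start in range(len(target) - n + 1)
    ]
    best_start = max(range(len(scores)), key=scores.__getitem__)
    return best_start, scores[best_start]
```

In this sketch, a guide whose best window scores at or above a chosen threshold (for example 90%) would be treated as having sufficient complementarity for further testing by a suitable assay; the threshold itself is a design choice and not a value mandated by the sketch.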

In general, and throughout this specification, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. One type of vector is a “plasmid,” which refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively-linked. Such vectors are referred to herein as “expression vectors.” Vectors for and that result in expression in a eukaryotic cell can be referred to herein as “eukaryotic expression vectors.” Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids.

Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively-linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).

The term “regulatory element” is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g., liver, pancreas), or particular cell types (e.g., lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle dependent or developmental stage-dependent manner, which may or may not also be tissue or cell-type specific. In some embodiments, a vector comprises one or more pol III promoters (e.g., 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, U6 and H1 promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al, Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter, and the EF1α promoter. 
Also encompassed by the term “regulatory element” are enhancer elements, such as WPRE; CMV enhancers; the R-U5′ segment in LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), p. 466-472, 1988); SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), p. 1527-31, 1981). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc. A vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., clustered regularly interspersed short palindromic repeats (CRISPR) transcripts, proteins, enzymes, mutant forms thereof, fusion proteins thereof, etc.).

Advantageous vectors include lentiviruses and adeno-associated viruses, and types of such vectors can also be selected for targeting particular types of cells.

As used herein, the term “crRNA” or “guide RNA” or “single guide RNA” or “sgRNA” or “one or more nucleic acid components” of a Type VI CRISPR-Cas locus effector protein comprises any polynucleotide sequence having sufficient complementarity with a target nucleic acid sequence to hybridize with the target nucleic acid sequence and direct sequence-specific binding of a RNA-targeting complex to the target RNA sequence.

In certain embodiments, the CRISPR system as provided herein can make use of a crRNA or analogous polynucleotide comprising a guide sequence, wherein the polynucleotide is an RNA, a DNA or a mixture of RNA and DNA, and/or wherein the polynucleotide comprises one or more nucleotide analogs. The sequence can comprise any structure, including but not limited to a structure of a native crRNA, such as a bulge, a hairpin or a stem loop structure. In certain embodiments, the polynucleotide comprising the guide sequence forms a duplex with a second polynucleotide sequence which can be an RNA or a DNA sequence.

In certain embodiments, guides of the invention comprise non-naturally occurring nucleic acids and/or non-naturally occurring nucleotides and/or nucleotide analogs, and/or chemical modifications. Non-naturally occurring nucleic acids can include, for example, mixtures of naturally and non-naturally occurring nucleotides. Non-naturally occurring nucleotides and/or nucleotide analogs may be modified at the ribose, phosphate, and/or base moiety. In an embodiment of the invention, a guide nucleic acid comprises ribonucleotides and non-ribonucleotides. In one such embodiment, a guide comprises one or more ribonucleotides and one or more deoxyribonucleotides. In an embodiment of the invention, the guide comprises one or more non-naturally occurring nucleotides or nucleotide analogs, such as a nucleotide with a phosphorothioate linkage, a nucleotide with a boranophosphate linkage, a locked nucleic acid (LNA) nucleotide comprising a methylene bridge between the 2′ and 4′ carbons of the ribose ring, or a bridged nucleic acid (BNA). Other examples of modified nucleotides include 2′-O-methyl analogs, 2′-deoxy analogs, 2-thiouridine analogs, N6-methyladenosine analogs, or 2′-fluoro analogs. Further examples of modified bases include, but are not limited to, 2-aminopurine, 5-bromo-uridine, pseudouridine (Ψ), N1-methylpseudouridine (me1Ψ), 5-methoxyuridine (5moU), inosine, and 7-methylguanosine. Examples of guide RNA chemical modifications include, without limitation, incorporation of 2′-O-methyl (M), 2′-O-methyl 3′phosphorothioate (MS), S-constrained ethyl (cEt), or 2′-O-methyl 3′thioPACE (MSP) at one or more terminal nucleotides. Such chemically modified guide RNAs can comprise increased stability and increased activity as compared to unmodified guide RNAs, though on-target vs. off-target specificity is not predictable. (See Hendel et al., 2015, Nat Biotechnol. 33(9):985-9, doi: 10.1038/nbt.3290, published online 29 Jun. 2015; Allerson et al., J. Med. Chem. 2005, 48:901-904; Bramsen et al., Front. Genet., 2012, 3:154; Deng et al., PNAS, 2015, 112:11870-11875; Sharma et al., MedChemComm., 2014, 5:1454-1471; Li et al., Nature Biomedical Engineering, 2017, 1, 0066 DOI:10.1038/s41551-017-0066).

In some embodiments, the 5′ and/or 3′ end of a guide RNA is modified by a variety of functional moieties including fluorescent dyes, polyethylene glycol, cholesterol, proteins, or detection tags. (See Kelly et al., 2016, J. Biotech. 233:74-83). In certain embodiments, a guide comprises ribonucleotides in a region that binds to a target DNA and one or more deoxyribonucleotides and/or nucleotide analogs in a region that binds to Cas9, Cpf1, or C2c1. In an embodiment of the invention, deoxyribonucleotides and/or nucleotide analogs are incorporated in engineered guide structures, such as, without limitation, 5′ and/or 3′ end, stem-loop regions, and the seed region. In certain embodiments, the modification is not in the 5′-handle of the stem-loop regions. Chemical modification in the 5′-handle of the stem-loop region of a guide may abolish its function (see Li, et al., Nature Biomedical Engineering, 2017, 1:0066). In certain embodiments, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides of a guide is chemically modified. In some embodiments, 3-5 nucleotides at either the 3′ or the 5′ end of a guide is chemically modified. In some embodiments, only minor modifications are introduced in the seed region, such as 2′-F modifications. In some embodiments, 2′-F modification is introduced at the 3′ end of a guide. In certain embodiments, three to five nucleotides at the 5′ and/or the 3′ end of the guide are chemically modified with 2′-O-methyl (M), 2′-O-methyl-3′-phosphorothioate (MS), S-constrained ethyl(cEt), or 2′-O-methyl-3′-thioPACE (MSP). Such modification can enhance genome editing efficiency (see Hendel et al., Nat. Biotechnol. (2015) 33(9): 985-989). In certain embodiments, all of the phosphodiester bonds of a guide are substituted with phosphorothioates (PS) for enhancing levels of gene disruption. 
In certain embodiments, more than five nucleotides at the 5′ and/or the 3′ end of the guide are chemically modified with 2′-O-Me, 2′-F or S-constrained ethyl(cEt). Such chemically modified guides can mediate enhanced levels of gene disruption (see Rahdar et al., 2015, PNAS, E7110-E7111). In an embodiment of the invention, a guide is modified to comprise a chemical moiety at its 3′ and/or 5′ end. Such moieties include, but are not limited to, amine, azide, alkyne, thio, dibenzocyclooctyne (DBCO), or Rhodamine. In certain embodiments, the chemical moiety is conjugated to the guide by a linker, such as an alkyl chain. In certain embodiments, the chemical moiety of the modified guide can be used to attach the guide to another molecule, such as DNA, RNA, protein, or nanoparticles. Such chemically modified guides can be used to identify or enrich cells genetically edited by a CRISPR system (see Lee et al., eLife, 2017, 6:e25312, DOI:10.7554).

In some embodiments, the modification to the guide is a chemical modification, an insertion, a deletion or a split. In some embodiments, the chemical modification includes, but is not limited to, incorporation of 2′-O-methyl (M) analogs, 2′-deoxy analogs, 2-thiouridine analogs, N6-methyladenosine analogs, 2′-fluoro analogs, 2-aminopurine, 5-bromo-uridine, pseudouridine (Ψ), N1-methylpseudouridine (me1Ψ), 5-methoxyuridine (5moU), inosine, 7-methylguanosine, 2′-O-methyl-3′-phosphorothioate (MS), S-constrained ethyl(cEt), phosphorothioate (PS), or 2′-O-methyl-3′-thioPACE (MSP). In some embodiments, the guide comprises one or more phosphorothioate modifications. In certain embodiments, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 25 nucleotides of the guide are chemically modified. In certain embodiments, one or more nucleotides in the seed region are chemically modified. In certain embodiments, one or more nucleotides in the 3′-terminus are chemically modified. In certain embodiments, none of the nucleotides in the 5′-handle is chemically modified. In some embodiments, the chemical modification in the seed region is a minor modification, such as incorporation of a 2′-fluoro analog. In a specific embodiment, one nucleotide of the seed region is replaced with a 2′-fluoro analog. In some embodiments, 5 or 10 nucleotides in the 3′-terminus are chemically modified. Such chemical modifications at the 3′-terminus of the Cpf1 crRNA improve gene cutting efficiency (see Li, et al., Nature Biomedical Engineering, 2017, 1:0066). In a specific embodiment, 5 nucleotides in the 3′-terminus are replaced with 2′-fluoro analogs. In a specific embodiment, 10 nucleotides in the 3′-terminus are replaced with 2′-fluoro analogs. In a specific embodiment, 5 nucleotides in the 3′-terminus are replaced with 2′-O-methyl (M) analogs.

In some embodiments, the loop of the 5′-handle of the guide is modified. In some embodiments, the loop of the 5′-handle of the guide is modified to have a deletion, an insertion, a split, or chemical modifications. In certain embodiments, the loop comprises 3, 4, or 5 nucleotides. In certain embodiments, the loop comprises the sequence of UCUU, UUUU, UAUU, or UGUU.

In one aspect, the guide comprises portions that are chemically linked or conjugated via a non-phosphodiester bond. In one aspect, the guide comprises, in non-limiting examples, a direct repeat sequence portion and a targeting sequence portion that are chemically linked or conjugated via a non-nucleotide loop. In some embodiments, the portions are joined via a non-phosphodiester covalent linker. Examples of the covalent linker include but are not limited to a chemical moiety selected from the group consisting of carbamates, ethers, esters, amides, imines, amidines, aminotriazines, hydrazones, disulfides, thioethers, thioesters, phosphorothioates, phosphorodithioates, sulfonamides, sulfonates, sulfones, sulfoxides, ureas, thioureas, hydrazide, oxime, triazole, photolabile linkages, C—C bond forming groups such as Diels-Alder cyclo-addition pairs or ring-closing metathesis pairs, and Michael reaction pairs.

In some embodiments, portions of the guide are first synthesized using the standard phosphoramidite synthetic protocol (Herdewijn, P., ed., Methods in Molecular Biology Vol. 288, Oligonucleotide Synthesis: Methods and Applications, Humana Press, New Jersey (2012)). In some embodiments, the non-targeting guide portions can be functionalized to contain an appropriate functional group for ligation using the standard protocol known in the art (Hermanson, G. T., Bioconjugate Techniques, Academic Press (2013)). Examples of functional groups include, but are not limited to, hydroxyl, amine, carboxylic acid, carboxylic acid halide, carboxylic acid active ester, aldehyde, carbonyl, chlorocarbonyl, imidazolylcarbonyl, hydrazide, semicarbazide, thiosemicarbazide, thiol, maleimide, haloalkyl, sulfonyl, allyl, propargyl, diene, alkyne, and azide. Once the non-targeting portions of a guide are functionalized, a covalent chemical bond or linkage can be formed between the two oligonucleotides. Examples of chemical bonds include, but are not limited to, those based on carbamates, ethers, esters, amides, imines, amidines, aminotriazines, hydrazones, disulfides, thioethers, thioesters, phosphorothioates, phosphorodithioates, sulfonamides, sulfonates, sulfones, sulfoxides, ureas, thioureas, hydrazide, oxime, triazole, photolabile linkages, C—C bond forming groups such as Diels-Alder cyclo-addition pairs or ring-closing metathesis pairs, and Michael reaction pairs.

In some embodiments, one or more portions of a guide can be chemically synthesized. In some embodiments, the chemical synthesis uses automated, solid-phase oligonucleotide synthesis machines with 2′-acetoxyethyl orthoester (2′-ACE) (Scaringe et al., J. Am. Chem. Soc. (1998) 120: 11820-11821; Scaringe, Methods Enzymol. (2000) 317: 3-18) or 2′-thionocarbamate (2′-TC) chemistry (Dellinger et al., J. Am. Chem. Soc. (2011) 133: 11540-11546; Hendel et al., Nat. Biotechnol. (2015) 33:985-989).

In some embodiments, the guide portions can be covalently linked using various bioconjugation reactions, loops, bridges, and non-nucleotide links via modifications of sugar, internucleotide phosphodiester bonds, purine and pyrimidine residues. Sletten et al., Angew. Chem. Int. Ed. (2009) 48:6974-6998; Manoharan, M. Curr. Opin. Chem. Biol. (2004) 8: 570-9; Behlke et al., Oligonucleotides (2008) 18: 305-19; Watts, et al., Drug. Discov. Today (2008) 13: 842-55; Shukla, et al., ChemMedChem (2010) 5: 328-49.

In some embodiments, the guide portions can be covalently linked using click chemistry. In some embodiments, guide portions can be covalently linked using a triazole linker. In some embodiments, guide portions can be covalently linked using a Huisgen 1,3-dipolar cycloaddition reaction involving an alkyne and an azide to yield a highly stable triazole linker (He et al., ChemBioChem (2015) 17: 1809-1812; WO 2016/186745). In some embodiments, guide portions are covalently linked by ligating a 5′-hexyne portion and a 3′-azide portion. In some embodiments, either or both of the 5′-hexyne guide portion and the 3′-azide guide portion can be protected with a 2′-acetoxyethyl orthoester (2′-ACE) group, which can be subsequently removed using the Dharmacon protocol (Scaringe et al., J. Am. Chem. Soc. (1998) 120: 11820-11821; Scaringe, Methods Enzymol. (2000) 317: 3-18).

In some embodiments, guide portions can be covalently linked via a linker (e.g., a non-nucleotide loop) that comprises a moiety such as spacers, attachments, bioconjugates, chromophores, reporter groups, dye labeled RNAs, and non-naturally occurring nucleotide analogues. More specifically, suitable spacers for purposes of this invention include, but are not limited to, polyethers (e.g., polyethylene glycols, polyalcohols, polypropylene glycol or mixtures of ethylene and propylene glycols), polyamines (e.g., spermine, spermidine and polymeric derivatives thereof), polyesters (e.g., poly(ethyl acrylate)), polyphosphodiesters, alkylenes, and combinations thereof. Suitable attachments include any moiety that can be added to the linker to add additional properties to the linker, such as but not limited to, fluorescent labels. Suitable bioconjugates include, but are not limited to, peptides, glycosides, lipids, cholesterol, phospholipids, diacyl glycerols and dialkyl glycerols, fatty acids, hydrocarbons, enzyme substrates, steroids, biotin, digoxigenin, carbohydrates, and polysaccharides. Suitable chromophores, reporter groups, and dye-labeled RNAs include, but are not limited to, fluorescent dyes such as fluorescein and rhodamine, and chemiluminescent, electrochemiluminescent, and bioluminescent marker compounds. The design of example linkers conjugating two RNA components is also described in WO 2004/015075.

The linker (e.g., a non-nucleotide loop) can be of any length. In some embodiments, the linker has a length equivalent to about 0-16 nucleotides. In some embodiments, the linker has a length equivalent to about 0-8 nucleotides. In some embodiments, the linker has a length equivalent to about 0-4 nucleotides. In some embodiments, the linker has a length equivalent to about 2 nucleotides. Example linker design is also described in WO2011/008730.

In some embodiments, the degree of complementarity, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting example of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). The ability of a guide sequence (within a RNA-targeting guide RNA or crRNA) to direct sequence-specific binding of a nucleic acid-targeting complex to a target nucleic acid sequence may be assessed by any suitable assay. For example, the components of a RNA-targeting CRISPR-Cas system sufficient to form a nucleic acid-targeting complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target nucleic acid sequence, such as by transfection with vectors encoding the components of the nucleic acid-targeting complex, followed by an assessment of preferential targeting (e.g., cleavage) within the target nucleic acid sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target nucleic acid sequence may be evaluated in a test tube by providing the target nucleic acid sequence, components of a nucleic acid-targeting complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art. 
A guide sequence, and hence a RNA-targeting guide RNA or crRNA may be selected to target any target nucleic acid sequence. The target sequence may be DNA. The target sequence may be any RNA sequence. In some embodiments, the target sequence may be a sequence within a RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmatic RNA (scRNA). In some preferred embodiments, the target sequence may be a sequence within a RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within a RNA molecule selected from the group consisting of ncRNA, and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.
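As an illustration of the degree-of-complementarity measure discussed above, the following is a minimal sketch only. It assumes an ungapped, equal-length comparison between a guide sequence and its candidate target site and counts Watson-Crick pairs only; it is not an implementation of Smith-Waterman, Needleman-Wunsch, or any of the other alignment tools named above. The function name `percent_complementarity` and the example sequences are hypothetical.

```python
# Watson-Crick partners for RNA. G-U wobble pairs are deliberately ignored
# in this simplified sketch (an assumption, not a statement from the text).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide: str, target_site: str) -> float:
    """Percent of guide positions able to Watson-Crick pair with the target site.

    Both sequences are written 5'->3'. The guide pairs antiparallel to the
    target, so guide position i is compared with target position (n - 1 - i).
    """
    guide = guide.upper().replace("T", "U")
    target_site = target_site.upper().replace("T", "U")
    if not guide:
        raise ValueError("sequences must be non-empty")
    if len(guide) != len(target_site):
        raise ValueError("this ungapped sketch expects equal-length sequences")
    paired = sum(
        RNA_COMPLEMENT.get(g) == t
        for g, t in zip(guide, reversed(target_site))
    )
    return 100.0 * paired / len(guide)


if __name__ == "__main__":
    # Hypothetical 4-nt sequences, chosen only to keep the example readable.
    print(percent_complementarity("AUGC", "GCAU"))  # fully complementary
    print(percent_complementarity("AUGC", "GCAA"))  # one mismatched position
```

A gapped alignment, as produced by the algorithms listed above, may report a different value for sequences of unequal length or with insertions and deletions.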

In some embodiments, a RNA-targeting guide RNA or crRNA is selected to reduce the degree of secondary structure within the RNA-targeting guide RNA or crRNA. In some embodiments, about or less than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the RNA-targeting guide RNA participate in self-complementary base pairing when optimally folded. Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and P A Carr and G M Church, 2009, Nature Biotechnology 27(12): 1151-62).
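The fraction of nucleotides participating in self-complementary base pairing can be estimated, as a rough sketch, with a Nussinov-style dynamic program that maximizes the number of nested base pairs. This is a deliberately simplified stand-in: it does not minimize Gibbs free energy and is not mFold or RNAfold. The minimum hairpin loop size of 3 nucleotides, the inclusion of G-U wobble pairs, and the function names are assumptions made for illustration.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov recursion).

    min_loop is the minimum number of unpaired nucleotides enclosed by a pair.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = max pairs within seq[i..j]; short spans stay at 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: position j pairs with k
                if (seq[k], seq[j]) in _PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def paired_fraction(seq: str, min_loop: int = 3) -> float:
    """Fraction of nucleotides involved in a pair under the sketch above."""
    if not seq:
        return 0.0
    return 2.0 * max_base_pairs(seq, min_loop) / len(seq)


if __name__ == "__main__":
    # Hypothetical hairpin: three G-C pairs closing a four-nucleotide loop.
    print(paired_fraction("GGGAAAACCC"))
```

Because pair counting ignores stacking energies and loop penalties, this estimate is an upper bound on pairing rather than a prediction of the optimally folded structure; a free-energy-based tool should be preferred for actual guide selection.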

In some embodiments, a nucleic acid-targeting guide is designed or selected to modulate intermolecular interactions among guide molecules, such as among stem-loop regions of different guide molecules. It will be appreciated that nucleotides within a guide that base-pair to form a stem-loop are also capable of base-pairing to form an intermolecular duplex with a second guide and that such an intermolecular duplex would not have a secondary structure compatible with CRISPR complex formation. Accordingly, it is useful to select or design DR sequences in order to modulate stem-loop formation and CRISPR complex formation. In some embodiments, about or less than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of nucleic acid-targeting guides are in intermolecular duplexes. It will be appreciated that stem-loop variation will often be within limits imposed by DR-CRISPR effector interactions. One way to modulate stem-loop formation or change the equilibrium between stem-loop and intermolecular duplex is to vary nucleotide pairs in the stem of the stem-loop of a DR. For example, in one embodiment, a G-C pair is replaced by an A-U or U-A pair. In another embodiment, an A-U pair is substituted for a G-C or a C-G pair. In another embodiment, a naturally occurring nucleotide is replaced by a nucleotide analog. Another way to modulate stem-loop formation or change the equilibrium between stem-loop and intermolecular duplex is to modify the loop of the stem-loop of a DR. Without being bound by theory, the loop can be viewed as an intervening sequence flanked by two sequences that are complementary to each other. When that intervening sequence is not self-complementary, its effect will be to destabilize intermolecular duplex formation. The same principle applies when guides are multiplexed: while the targeting sequences may differ, it may be advantageous to modify the stem-loop region in the DRs of the different guides. 
Moreover, when guides are multiplexed, the relative activities of the different guides can be modulated by balancing the activity of each individual guide. In certain embodiments, the equilibrium between intramolecular stem-loops vs. intermolecular duplexes is determined. The determination may be made by physical or biochemical means and can be in the presence or absence of a CRISPR effector.

In certain embodiments, a guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a guide sequence or spacer sequence. In certain embodiments, the guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat sequence fused or linked to a guide sequence or spacer sequence. In certain embodiments, the direct repeat sequence may be located upstream (i.e., 5′) from the guide sequence or spacer sequence. In other embodiments, the direct repeat sequence may be located downstream (i.e., 3′) from the guide sequence or spacer sequence. In other embodiments, multiple DRs (such as dual DRs) may be present.

In certain embodiments, the crRNA comprises a stem loop, preferably a single stem loop. In certain embodiments, the direct repeat sequence forms a stem loop, preferably a single stem loop.

In certain embodiments, the spacer length of the guide RNA is from 15 to 35 nt. In certain embodiments, the spacer length of the guide RNA is at least 15 nucleotides. In certain embodiments, the spacer length is from 15 to 17 nt, e.g., 15, 16, or 17 nt, from 17 to 20 nt, e.g., 17, 18, 19, or 20 nt, from 20 to 24 nt, e.g., 20, 21, 22, 23, or 24 nt, from 23 to 25 nt, e.g., 23, 24, or 25 nt, from 24 to 27 nt, e.g., 24, 25, 26, or 27 nt, from 27-30 nt, e.g., 27, 28, 29, or 30 nt, from 30-35 nt, e.g., 30, 31, 32, 33, 34, or 35 nt, or 35 nt or longer.
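The arrangement of direct repeat and spacer described above, together with the spacer length ranges, can be sketched as a small helper. This is an illustrative assumption-laden example only: the function name `build_crrna`, the `dr_position` labels, and the direct repeat sequence used in the usage example are hypothetical placeholders and do not correspond to any sequence of the invention. The default bounds of 15 and 35 nt reflect one of the recited ranges; passing `max_spacer=None` accommodates the "35 nt or longer" embodiments.

```python
from typing import Optional


def build_crrna(
    direct_repeat: str,
    spacer: str,
    dr_position: str = "5prime",
    min_spacer: int = 15,
    max_spacer: Optional[int] = 35,
) -> str:
    """Concatenate a direct repeat (DR) and a spacer into a crRNA string.

    dr_position: "5prime" places the DR upstream of the spacer,
                 "3prime" places the DR downstream of the spacer.
    """
    dr = direct_repeat.upper().replace("T", "U")
    sp = spacer.upper().replace("T", "U")
    if len(sp) < min_spacer:
        raise ValueError(f"spacer is {len(sp)} nt; expected at least {min_spacer} nt")
    if max_spacer is not None and len(sp) > max_spacer:
        raise ValueError(f"spacer is {len(sp)} nt; expected at most {max_spacer} nt")
    if dr_position == "5prime":
        return dr + sp
    if dr_position == "3prime":
        return sp + dr
    raise ValueError("dr_position must be '5prime' or '3prime'")


if __name__ == "__main__":
    placeholder_dr = "GUUGCAUC"   # hypothetical placeholder, not a real DR
    placeholder_spacer = "A" * 30  # hypothetical 30-nt spacer
    print(build_crrna(placeholder_dr, placeholder_spacer, dr_position="3prime"))
```

Whether the direct repeat belongs 5′ or 3′ of the spacer depends on the particular effector protein, so the orientation is left as an explicit parameter rather than fixed.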

The “tracrRNA” sequence or analogous terms includes any polynucleotide sequence that has sufficient complementarity with a crRNA sequence to hybridize. In general, degree of complementarity is with reference to the optimal alignment of the sca sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the sca sequence or tracr sequence. In some embodiments, the degree of complementarity between the tracr sequence and sca sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In certain embodiments, the tracrRNA may not be required. Indeed, the CRISPR-Cas effector protein from Bergeyella zoohelcum and orthologs thereof do not require a tracrRNA to ensure cleavage of an RNA target.

In further detail, the assay is as follows for a RNA target, provided that a PAM sequence is required to direct recognition. Two E. coli strains are used in this assay. One carries a plasmid that encodes the endogenous effector protein locus from the bacterial strain. The other strain carries an empty plasmid (e.g., pACYC184, control strain). All possible 7 or 8 bp PAM sequences are presented on an antibiotic resistance plasmid (pUC19 with ampicillin resistance gene). The PAM is located next to the sequence of proto-spacer 1 (the RNA target to the first spacer in the endogenous effector protein locus). Two PAM libraries were cloned. One has 8 random bp 5′ of the proto-spacer (i.e., a total complexity of 65536 different PAM sequences). The other library has 7 random bp 3′ of the proto-spacer (i.e., a total complexity of 16384 different PAMs). Both libraries were cloned to have on average 500 plasmids per possible PAM. Test strain and control strain were transformed with the 5′PAM and 3′PAM libraries in separate transformations and transformed cells were plated separately on ampicillin plates. Recognition and subsequent cutting/interference with the plasmid renders a cell vulnerable to ampicillin and prevents growth. Approximately 12 h after transformation, all colonies formed by the test and control strains were harvested and plasmid RNA was isolated. Plasmid RNA was used as template for PCR amplification and subsequent deep sequencing. Representation of all PAMs in the untransformed libraries showed the expected representation of PAMs in transformed cells. Representation of all PAMs found in control strains showed the actual representation. Representation of all PAMs in the test strain showed which PAMs are not recognized by the enzyme, and comparison to the control strain allows extracting the sequence of the depleted PAM. In particular embodiments, the cleavage, such as the RNA cleavage, is not PAM dependent. 
Indeed, for the Bergeyella zoohelcum Cas13b effector protein and its orthologs, RNA target cleavage appears to be PAM independent, and hence the Table 1 Cas13b of the invention may act in a PAM independent fashion.
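The library complexities stated above follow from four possible bases at each randomized position (4^8 = 65536 and 4^7 = 16384), and the comparison of test-strain to control-strain representation can be sketched as a depletion ratio. The following is a minimal illustration only, not the analysis pipeline actually used: the function names, the pseudocount of 1, and the log2 depletion threshold of 2.0 are assumptions chosen for the example, and the read counts are invented toy values.

```python
import math
from typing import Dict, List


def library_complexity(n_random_positions: int) -> int:
    """Number of distinct sequences for n fully randomized positions."""
    return 4 ** n_random_positions


def depleted_pams(
    control_counts: Dict[str, int],
    test_counts: Dict[str, int],
    min_log2_depletion: float = 2.0,  # assumed threshold, not from the text
    pseudocount: float = 1.0,         # must be > 0 to avoid division by zero
) -> List[str]:
    """Return PAMs whose frequency in the test strain is depleted vs. control."""
    pams = set(control_counts) | set(test_counts)
    control_total = sum(control_counts.values()) + pseudocount * len(pams)
    test_total = sum(test_counts.values()) + pseudocount * len(pams)
    depleted = []
    for pam in sorted(pams):
        control_freq = (control_counts.get(pam, 0) + pseudocount) / control_total
        test_freq = (test_counts.get(pam, 0) + pseudocount) / test_total
        if math.log2(control_freq / test_freq) >= min_log2_depletion:
            depleted.append(pam)
    return depleted


if __name__ == "__main__":
    print(library_complexity(8), library_complexity(7))
    # Toy read counts, invented purely for illustration.
    control = {"AAA": 100, "CCC": 100}
    test = {"AAA": 100, "CCC": 1}
    print(depleted_pams(control, test))
```

At an average of 500 plasmids per possible PAM, the 8-bp library corresponds to roughly 65536 × 500 plasmids in total, which gives a sense of the sequencing depth needed to observe depletion reliably.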

For minimization of toxicity and off-target effect, it will be important to control the concentration of RNA-targeting guide RNA delivered. Optimal concentrations of nucleic acid-targeting guide RNA can be determined by testing different concentrations in a cellular or non-human eukaryote animal model and using deep sequencing to analyze the extent of modification at potential off-target genomic loci. The concentration that gives the highest level of on-target modification while minimizing the level of off-target modification should be chosen for in vivo delivery. The RNA-targeting system is derived advantageously from a CRISPR-Cas system. In some embodiments, one or more elements of a RNA-targeting system are derived from a particular organism comprising an endogenous RNA-targeting system of a Tables 1-4 Cas13 effector protein system as herein-discussed.

Dead Guide Sequence

In one aspect, the invention provides guide sequences which are modified in a manner which allows for formation of the CRISPR Cas complex and successful binding to the target, while at the same time not allowing for successful nuclease activity (i.e., without nuclease activity/without indel activity). For purposes of explanation, such modified guide sequences are referred to as “dead guides” or “dead guide sequences”. These dead guides or dead guide sequences can be thought of as catalytically inactive or conformationally inactive with regard to nuclease activity. Indeed, dead guide sequences may not sufficiently engage in productive base pairing with respect to the ability to promote catalytic activity or to distinguish on-target and off-target binding activity. Briefly, the assay involves synthesizing a CRISPR target RNA and guide RNAs comprising mismatches with the target RNA, combining these with the RNA targeting enzyme, analyzing cleavage on gels based on the presence of bands generated by cleavage products, and quantifying cleavage based upon relative band intensities.
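The mismatch assay outlined above can be illustrated with a short sketch covering two of its steps: generating a guide with a single mismatch, and quantifying cleavage from relative band intensities. This is a hedged illustration under stated assumptions, not the protocol itself. The function names are hypothetical, the mismatch is introduced by swapping a base for its Watson-Crick partner (one of several possible choices), and cleavage is taken as product band intensity divided by total lane intensity, which assumes background-subtracted densitometry values.

```python
from typing import Sequence

# Swapping a base for its Watson-Crick partner guarantees that the new base
# no longer Watson-Crick pairs with the original target base.
_SWAP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def introduce_mismatch(guide: str, position: int) -> str:
    """Return the guide with the base at a 0-based position swapped."""
    guide = guide.upper().replace("T", "U")
    if not 0 <= position < len(guide):
        raise IndexError("position is outside the guide")
    return guide[:position] + _SWAP[guide[position]] + guide[position + 1:]


def fraction_cleaved(substrate_band: float, product_bands: Sequence[float]) -> float:
    """Cleaved fraction = product intensity / (substrate + product intensity)."""
    products = sum(product_bands)
    total = substrate_band + products
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return products / total


if __name__ == "__main__":
    # Hypothetical guide and invented intensities, for illustration only.
    print(introduce_mismatch("AUGC", 1))
    print(fraction_cleaved(60.0, [25.0, 15.0]))
```

Comparing the cleaved fraction obtained with a mismatched guide against that of the fully matched guide gives a relative measure of how strongly a given mismatch reduces nuclease activity.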

Hence, in a related aspect, the invention provides a non-naturally occurring or engineered composition RNA targeting CRISPR-Cas system comprising a functional RNA targeting enzyme as described herein, and guide RNA (gRNA) or crRNA wherein the gRNA or crRNA comprises a dead guide sequence whereby the gRNA is capable of hybridizing to a target sequence such that the RNA targeting CRISPR-Cas system is directed to a genomic locus of interest in a cell without detectable RNA cleavage activity of a non-mutant RNA targeting enzyme of the system. It is to be understood that any of the gRNAs or crRNAs according to the invention as described herein elsewhere may be used as dead gRNAs/crRNAs comprising a dead guide sequence.

The ability of a dead guide sequence to direct sequence-specific binding of a CRISPR complex to an RNA target sequence may be assessed by any suitable assay. For example, the components of a CRISPR-Cas system sufficient to form a CRISPR-Cas complex, including the dead guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the system, followed by an assessment of preferential cleavage within the target sequence.

As explained further herein, several structural parameters allow for a proper framework to arrive at such dead guides. Dead guide sequences can typically be shorter than the respective guide sequences which result in active RNA cleavage. In particular embodiments, dead guides are 5%, 10%, 20%, 30%, 40%, or 50% shorter than respective guides directed to the same.

As explained below and known in the art, one aspect of gRNA or crRNA-RNA targeting specificity is the direct repeat sequence, which is to be appropriately linked to such guides. In particular, this implies that the direct repeat sequences are designed dependent on the origin of the RNA targeting enzyme. Structural data available for validated dead guide sequences may be used for designing CRISPR-Cas specific equivalents. Structural similarity between, e.g., the orthologous nuclease domains HEPN of two or more CRISPR-Cas effector proteins may be used to transfer design equivalent dead guides. Thus, the dead guide herein may be appropriately modified in length and sequence to reflect such CRISPR-Cas specific equivalents, allowing for formation of the CRISPR-Cas complex and successful binding to the target RNA, while at the same time, not allowing for successful nuclease activity.

Dead guides allow one to use gRNA or crRNA as a means for gene targeting, without the consequence of nuclease activity, while at the same time providing directed means for activation or repression. Guide RNA or crRNA comprising a dead guide may be modified to further include elements in a manner which allow for activation or repression of gene activity, in particular protein adaptors (e.g. aptamers) as described herein elsewhere allowing for functional placement of gene effectors (e.g. activators or repressors of gene activity). One example is the incorporation of aptamers, as explained herein and in the state of the art. By engineering the gRNA or crRNA comprising a dead guide to incorporate protein-interacting aptamers (Konermann et al., “Genome-scale transcription activation by an engineered CRISPR-Cas9 complex,” doi:10.1038/nature14136, incorporated herein by reference), one may assemble multiple distinct effector domains. Such may be modeled after natural processes.

Cas13 in General

The instant invention provides particular Cas13 effectors, nucleic acids, systems, vectors, and methods of use. The features and functions of Cas13 may also be the features and functions of other CRISPR-Cas proteins described herein.

As used herein, the terms Cas13b-s1 accessory protein, Cas13b-s1 protein, Cas13b-s1, Csx27, and Csx27 protein are used interchangeably and the terms Cas13b-s2 accessory protein, Cas13b-s2 protein, Cas13b-s2, Csx28, and Csx28 protein are used interchangeably.

In particular embodiments, the wildtype Cas13 effector protein has RNA binding and cleaving function.

In particular embodiments, the (wild type or mutated) Cas13 effector protein may have RNA and/or DNA cleaving function, preferably RNA cleaving function. In these embodiments, methods may be provided based on the effector proteins provided herein which comprehend inducing one or more mutations in a eukaryotic cell (in vitro, i.e. in an isolated eukaryotic cell) as herein discussed comprising delivering to the cell a vector as herein discussed. The mutation(s) can include the introduction, deletion, or substitution of one or more nucleotides at each target sequence of cell(s) via the guide(s) RNA(s) or sgRNA(s) or crRNA(s). The mutations can include the introduction, deletion, or substitution of 1-75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s) or sgRNA(s) or crRNA(s). The mutations can include the introduction, deletion, or substitution of 1, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s) or sgRNA(s) or crRNA(s). The mutations can include the introduction, deletion, or substitution of 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s) or sgRNA(s) or crRNA(s). The mutations can include the introduction, deletion, or substitution of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s) or sgRNA(s) or crRNA(s). The mutations can include the introduction, deletion, or substitution of 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or 75 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s) or sgRNA(s) or crRNA(s).
The mutations can include the introduction, deletion, or substitution of 40, 45, 50, 75, 100, 200, 300, 400 or 500 nucleotides at each target sequence of said cell(s) via the guide(s) RNA(s) or sgRNA(s) or crRNA(s).

For minimization of toxicity and off-target effect, it will be important to control the concentration of Cas13 mRNA and guide RNA delivered. Optimal concentrations of Cas13 mRNA and guide RNA can be determined by testing different concentrations in a cellular or non-human eukaryote animal model and using deep sequencing to analyze the extent of modification at potential off-target genomic loci. Guide sequences and strategies to minimize toxicity and off-target effects can be as in WO 2014/093622 (PCT/US2013/074667); or, via mutation as herein.

The nucleic acid molecule encoding a Cas13 is advantageously codon optimized. An example of a codon optimized sequence, is in this instance a sequence optimized for expression in a eukaryote, e.g., humans (i.e. being optimized for expression in humans), or for another eukaryote, animal or mammal as herein discussed; see, e.g., SaCas9 human codon optimized sequence in WO 2014/093622 (PCT/US2013/074667). Whilst this is preferred, it will be appreciated that other examples are possible and codon optimization for a host species other than human, or for codon optimization for specific organs is known. In some embodiments, an enzyme coding sequence encoding a Cas is codon optimized for expression in particular cells, such as eukaryotic cells. The eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, or non-human eukaryote or animal or mammal as herein discussed, e.g., mouse, rat, rabbit, dog, livestock, or non-human mammal or primate. In some embodiments, processes for modifying the germ line genetic identity of human beings and/or processes for modifying the genetic identity of animals which are likely to cause them suffering without any substantial medical benefit to man or animal, and also animals resulting from such processes, may be excluded. In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit particular bias for certain codons of a particular amino acid. 
Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon/ and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. In some embodiments, one or more codons (e.g. 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding a Cas correspond to the most frequently used codon for a particular amino acid.
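By way of illustration only, the codon replacement described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used by Applicants or by any named software: the function names (preferred_codons, codon_optimize) are hypothetical, the codon preference is estimated from whatever host coding sequences the user supplies rather than from a published codon usage table, and only the standard genetic code is considered.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence (DNA or RNA letters) into DNA codons."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TO_AA[c] for c in codons(seq))


def preferred_codons(host_coding_sequences):
    """Pick, per amino acid, the codon seen most often in the host sequences.

    Assumption: the supplied sequences stand in for a codon usage table.
    """
    counts = Counter(c for cds in host_coding_sequences for c in codons(cds))
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def codon_optimize(seq, best):
    """Replace every codon with the preferred synonymous codon.

    The encoded amino acid sequence is unchanged by construction.
    """
    return "".join(best[CODON_TO_AA[c]] for c in codons(seq))
```

In this sketch every codon is replaced; replacing only a subset of codons, as contemplated above, would amount to applying the same lookup to selected positions only.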

In some embodiments, the unmodified RNA-targeting effector protein (Cas13) may have cleavage activity. In some embodiments, Cas13 may direct cleavage of one or two nucleic acid strands at the location of or near a target sequence, such as within the target sequence and/or within the complement of the target sequence or at sequences associated with the target sequence, e.g., within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, the Cas13 protein may direct more than one cleavage (such as one, two, three, four, five, or more cleavages) of one or two strands within the target sequence and/or within the complement of the target sequence or at sequences associated with the target sequence and/or within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, the cleavage may be blunt, i.e., generating blunt ends. In some embodiments, the cleavage may be staggered, i.e., generating sticky ends. In some embodiments, a vector encodes a nucleic acid-targeting Cas13 protein that may be mutated with respect to a corresponding wild-type enzyme such that the mutated nucleic acid-targeting Cas13 protein lacks the ability to cleave one or two strands of a target polynucleotide containing a target sequence, e.g., alteration or mutation in a HEPN domain to produce a mutated Cas13 substantially lacking all RNA cleavage activity, e.g., the RNA cleavage activity of the mutated enzyme is about no more than 25%, 10%, 5%, 1%, 0.1%, 0.01%, or less of the nucleic acid cleavage activity of the non-mutated form of the enzyme; an example can be when the nucleic acid cleavage activity of the mutated form is nil or negligible as compared with the non-mutated form.
By derived, Applicants mean that the derived enzyme is largely based, in the sense of having a high degree of sequence homology with, a wildtype enzyme, but that it has been mutated (modified) in some way as known in the art or as described herein.
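By way of illustration only, the comparison of a mutated enzyme's RNA cleavage activity with that of the non-mutated form can be expressed as a simple calculation. The following Python sketch is an assumption-laden example: the function names are hypothetical, and the activity values are assumed to be measured in the same assay and units for both enzymes (the assay itself is not specified here).

```python
# Percent thresholds recited above for a mutated Cas13 relative to wild type.
THRESHOLDS_PERCENT = (25.0, 10.0, 5.0, 1.0, 0.1, 0.01)


def residual_activity_percent(mutant_activity, wildtype_activity):
    """Cleavage activity of the mutant as a percentage of wild type."""
    if wildtype_activity <= 0:
        raise ValueError("wild-type activity must be positive")
    return 100.0 * mutant_activity / wildtype_activity


def tightest_threshold_met(mutant_activity, wildtype_activity):
    """Return the smallest recited threshold the mutant satisfies, or None."""
    pct = residual_activity_percent(mutant_activity, wildtype_activity)
    met = [t for t in THRESHOLDS_PERCENT if pct <= t]
    return min(met) if met else None
```

For instance, a mutant retaining 0.5 units of activity against 100 units for the wild type satisfies the "no more than 1%" threshold but not the 0.1% threshold.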

Typically, in the context of an endogenous RNA-targeting system, formation of a RNA-targeting complex (comprising a guide RNA or crRNA hybridized to a target sequence and complexed with one or more RNA-targeting effector proteins) results in cleavage of RNA strand(s) in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence. As used herein the term “sequence(s) associated with a target locus of interest” refers to sequences near the vicinity of the target sequence (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from the target sequence, wherein the target sequence is comprised within a target locus of interest).

An example of a codon optimized sequence, is in this instance a sequence optimized for expression in a eukaryote, e.g., humans (i.e. being optimized for expression in humans), or for another eukaryote, animal or mammal as herein discussed; see, e.g., SaCas9 human codon optimized sequence in WO 2014/093622 (PCT/US2013/074667) as an example of a codon optimized sequence (from knowledge in the art and this disclosure, codon optimizing coding nucleic acid molecule(s), especially as to effector protein (e.g., Cas13) is within the ambit of the skilled artisan). Whilst this is preferred, it will be appreciated that other examples are possible and codon optimization for a host species other than human, or for codon optimization for specific organs is known. In some embodiments, an enzyme coding sequence encoding a RNA-targeting Cas13 protein is codon optimized for expression in particular cells, such as eukaryotic cells. The eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, or non-human eukaryote or animal or mammal as herein discussed, e.g., mouse, rat, rabbit, dog, livestock, or non-human mammal or primate. In some embodiments, processes for modifying the germ line genetic identity of human beings and/or processes for modifying the genetic identity of animals which are likely to cause them suffering without any substantial medical benefit to man or animal, and also animals resulting from such processes, may be excluded. In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. 
Various species exhibit particular bias for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon/ and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. In some embodiments, one or more codons (e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding a DNA/RNA-targeting Cas protein correspond to the most frequently used codon for a particular amino acid.

The (i) Cas13 or nucleic acid molecule(s) encoding it or (ii) crRNA can be delivered separately; and advantageously at least one or both of (i) and (ii), e.g., an assembled complex, is delivered via a particle or nanoparticle complex. RNA-targeting effector protein mRNA can be delivered prior to the RNA-targeting guide RNA or crRNA to give time for nucleic acid-targeting effector protein to be expressed. RNA-targeting effector protein (Cas13) mRNA might be administered 1-12 hours (preferably around 2-6 hours) prior to the administration of RNA-targeting guide RNA or crRNA. Alternatively, RNA-targeting effector protein mRNA and RNA-targeting guide RNA or crRNA can be administered together. Advantageously, a second booster dose of guide RNA or crRNA can be administered 1-12 hours (preferably around 2-6 hours) after the initial administration of RNA-targeting effector (Cas13) protein mRNA+guide RNA. Additional administrations of RNA-targeting effector protein mRNA and/or guide RNA or crRNA might be useful to achieve the most efficient levels of genome modification.

In one aspect, the invention provides methods for using one or more elements of a RNA-targeting system. The RNA-targeting complex of the invention provides an effective means for modifying a target RNA single or double stranded, linear or super-coiled. The RNA-targeting complex of the invention has a wide variety of utility including modifying (e.g., deleting, inserting, translocating, inactivating, activating) a target RNA in a multiplicity of cell types. As such the RNA-targeting complex of the invention has a broad spectrum of applications in, e.g., gene therapy, drug screening, disease diagnosis, and prognosis. An exemplary RNA-targeting complex comprises a RNA-targeting effector protein complexed with a guide RNA or crRNA hybridized to a target sequence within the target locus of interest.

In one embodiment, this invention provides a method of cleaving a target RNA. The method may comprise modifying a target RNA using a RNA-targeting complex that binds to the target RNA and effects cleavage of said target RNA. In an embodiment, the RNA-targeting complex of the invention, when introduced into a cell, may create a break (e.g., a single or a double strand break) in the RNA sequence. For example, the method can be used to cleave a disease RNA in a cell. For example, an exogenous RNA template comprising a sequence to be integrated flanked by an upstream sequence and a downstream sequence may be introduced into a cell. The upstream and downstream sequences share sequence similarity with either side of the site of integration in the RNA. Where desired, a donor RNA can be mRNA. The exogenous RNA template comprises a sequence to be integrated (e.g., a mutated RNA). The sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include RNA encoding a protein or a non-coding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function. The upstream and downstream sequences in the exogenous RNA template are selected to promote recombination between the RNA sequence of interest and the donor RNA. The upstream sequence is a RNA sequence that shares sequence similarity with the RNA sequence upstream of the targeted site for integration. Similarly, the downstream sequence is a RNA sequence that shares sequence similarity with the RNA sequence downstream of the targeted site of integration. The upstream and downstream sequences in the exogenous RNA template can have 75%, 80%, 85%, 90%, 95%, or 100% sequence identity with the targeted RNA sequence.
Preferably, the upstream and downstream sequences in the exogenous RNA template have about 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the targeted RNA sequence. In some methods, the upstream and downstream sequences in the exogenous RNA template have about 99% or 100% sequence identity with the targeted RNA sequence. An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequence has about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp. In some methods, the exogenous RNA template may further comprise a marker. Such a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers. The exogenous RNA template of the invention can be constructed using recombinant techniques (see, for example, Sambrook et al., 2001 and Ausubel et al., 1996). In a method for modifying a target RNA by integrating an exogenous RNA template, a break (e.g., double or single stranded break in double or single stranded RNA) is introduced into the RNA sequence by the nucleic acid-targeting complex, and the break is repaired via homologous recombination with an exogenous RNA template such that the template is integrated into the RNA target. The presence of a double-stranded break facilitates integration of the template. In other embodiments, this invention provides a method of modifying expression of a RNA in a eukaryotic cell. The method comprises increasing or decreasing expression of a target polynucleotide by using a nucleic acid-targeting complex that binds to the DNA or RNA (e.g., mRNA or pre-mRNA).
In some methods, a target RNA can be inactivated to effect the modification of the expression in a cell. For example, upon the binding of a RNA-targeting complex to a target sequence in a cell, the target RNA is inactivated such that the sequence is not translated, the coded protein is not produced, or the sequence does not function as the wild-type sequence does. For example, a protein or microRNA coding sequence may be inactivated such that the protein or microRNA or pre-microRNA transcript is not produced. The target RNA of a RNA-targeting complex can be any RNA endogenous or exogenous to the eukaryotic cell. For example, the target RNA can be a RNA residing in the nucleus of the eukaryotic cell. The target RNA can be a sequence (e.g., mRNA or pre-mRNA) coding a gene product (e.g., a protein) or a non-coding sequence (e.g., ncRNA, lncRNA, tRNA, or rRNA). Examples of target RNA include a sequence associated with a signaling biochemical pathway, e.g., a signaling biochemical pathway-associated RNA. Examples of target RNA include a disease associated RNA. A “disease-associated” RNA refers to any RNA which is yielding translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissues compared with tissues or cells of a non-disease control. It may be a RNA transcribed from a gene that becomes expressed at an abnormally high level; it may be a RNA transcribed from a gene that becomes expressed at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated RNA also refers to a RNA transcribed from a gene possessing mutation(s) or genetic variation that is directly responsible or is in linkage disequilibrium with a gene(s) that is responsible for the etiology of a disease. The translated products may be known or unknown, and may be at a normal or abnormal level.
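By way of illustration only, the sequence identity ranges recited above for the upstream and downstream sequences of the exogenous RNA template can be checked computationally. The following Python sketch makes simplifying assumptions that are not part of the disclosure: the two sequences are already aligned, are of equal length, and contain no gaps; the function names are hypothetical.

```python
def percent_identity(template_arm, target_flank):
    """Percent of matching positions between two pre-aligned RNA sequences.

    Assumption: equal length, no gaps. DNA letters are read as RNA (T -> U).
    """
    a = template_arm.upper().replace("T", "U")
    b = target_flank.upper().replace("T", "U")
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_identity(template_arm, target_flank, minimum_percent=95.0):
    """True if the arm reaches the chosen identity threshold (default 95%)."""
    return percent_identity(template_arm, target_flank) >= minimum_percent
```

A gapped alignment would require a proper alignment algorithm, which this sketch deliberately omits.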

In some embodiments, the method may comprise allowing a RNA-targeting complex to bind to the target RNA to effect cleavage of said target RNA thereby modifying the target RNA, wherein the RNA-targeting complex comprises a nucleic acid-targeting effector (Cas13) protein complexed with a guide RNA or crRNA hybridized to a target sequence within said target RNA. In one aspect, the invention provides a method of modifying expression of RNA in a eukaryotic cell. In some embodiments, the method comprises allowing a RNA-targeting complex to bind to the RNA such that said binding results in increased or decreased expression of said RNA; wherein the RNA-targeting complex comprises a nucleic acid-targeting effector (Cas13) protein complexed with a guide RNA. Methods of modifying a target RNA can be in a eukaryotic cell, which may be in vivo, ex vivo or in vitro. In some embodiments, the method comprises sampling a cell or population of cells from a human or non-human animal, and modifying the cell or cells. Culturing may occur at any stage ex vivo. The cell or cells may even be re-introduced into the non-human animal or plant. For re-introduced cells it is particularly preferred that the cells are stem cells.

The use of two different aptamers (each associated with a distinct RNA-targeting guide RNA) allows an activator-adaptor protein fusion and a repressor-adaptor protein fusion to be used, with different RNA-targeting guide RNAs or crRNAs, to activate expression of one RNA, whilst repressing another. They, along with their different guide RNAs or crRNAs can be administered together, or substantially together, in a multiplexed approach. A large number of such modified RNA-targeting guide RNAs or crRNAs can be used all at the same time, for example 10 or 20 or 30 and so forth, whilst only one (or at least a minimal number) of effector protein (Cas13) molecules need to be delivered, as a comparatively small number of effector protein molecules can be used with a large number of modified guides. The adaptor protein may be associated (preferably linked or fused to) one or more activators or one or more repressors. For example, the adaptor protein may be associated with a first activator and a second activator. The first and second activators may be the same, but they are preferably different activators. Three or more or even four or more activators (or repressors) may be used, but package size may limit the number being higher than 5 different functional domains. Linkers are preferably used, over a direct fusion to the adaptor protein, where two or more functional domains are associated with the adaptor protein. Suitable linkers might include the GlySer linker.

It is also envisaged that the RNA-targeting effector protein-guide RNA complex as a whole may be associated with two or more functional domains. For example, there may be two or more functional domains associated with the RNA-targeting effector protein, or there may be two or more functional domains associated with the guide RNA or crRNA (via one or more adaptor proteins), or there may be one or more functional domains associated with the RNA-targeting effector protein and one or more functional domains associated with the guide RNA or crRNA (via one or more adaptor proteins).

The fusion between the adaptor protein and the activator or repressor may include a linker. For example, GlySer linkers GGGS can be used. They can be used in repeats of 3 ((GGGGS)3 (SEQ ID NO:79)) or 6, 9 or even 12 or more, to provide suitable lengths, as required. Linkers can be used between the guide RNAs and the functional domain (activator or repressor), or between the nucleic acid-targeting effector protein and the functional domain (activator or repressor). The linkers allow the user to engineer appropriate amounts of “mechanical flexibility”.
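By way of illustration only, the assembly of a repeated GlySer linker between an adaptor protein and a functional domain can be sketched in Python. The GGGGS unit and the repeat counts of 3, 6, 9, and 12 are taken from the passage above; the function names are hypothetical, and the adaptor and domain sequences are to be supplied by the user.

```python
def glyser_linker(repeats=3, unit="GGGGS"):
    """Return a linker made of `repeats` copies of the GlySer unit."""
    if repeats < 1:
        raise ValueError("at least one repeat is required")
    return unit * repeats


def fuse_with_linker(adaptor_seq, domain_seq, repeats=3):
    """Concatenate adaptor, linker, and functional domain, N- to C-terminus.

    Assumption: the adaptor is placed N-terminal to the functional domain;
    the opposite orientation would simply swap the two arguments.
    """
    return adaptor_seq + glyser_linker(repeats) + domain_seq
```

Longer linkers, e.g. 6, 9, or 12 repeats, are obtained by changing the `repeats` argument.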

CRISPR effector (Cas13) protein or mRNA therefor (or more generally a nucleic acid molecule therefor) and guide RNA or crRNA might also be delivered separately e.g., the former 1-12 hours (preferably around 2-6 hours) prior to the administration of guide RNA or crRNA, or together. A second booster dose of guide RNA or crRNA can be administered 1-12 hours (preferably around 2-6 hours) after the initial administration.

The Cas13 effector protein is sometimes referred to herein as a CRISPR Enzyme. It will be appreciated that the effector protein is based on or derived from an enzyme, so the term ‘effector protein’ certainly includes ‘enzyme’ in some embodiments. However, it will also be appreciated that the effector protein may, as required in some embodiments, have DNA or RNA binding, but not necessarily cutting or nicking, activity, including a dead-Cas effector protein function.

Cellular targets include Hemopoietic Stem/Progenitor Cells (CD34+); Human T cells; and Eye (retinal cells)—for example photoreceptor precursor cells.

Inventive methods can further comprise delivery of templates. Delivery of templates may be contemporaneous with or separate from delivery of any or all of the CRISPR effector protein (Cas13) or guide or crRNA, and via the same delivery mechanism or a different one.

In certain embodiments, the methods as described herein may comprise providing a Cas13 transgenic cell in which one or more nucleic acids encoding one or more guide RNAs are provided or introduced operably connected in the cell with a regulatory element comprising a promoter of one or more genes of interest. As used herein, the term “Cas13 transgenic cell” refers to a cell, such as a eukaryotic cell, in which a Cas13 gene has been genomically integrated. The nature, type, or origin of the cell are not particularly limiting according to the present invention. Also, the way the Cas13 transgene is introduced in the cell may vary and can be any method as is known in the art. In certain embodiments, the Cas13 transgenic cell is obtained by introducing the Cas13 transgene in an isolated cell. In certain other embodiments, the Cas13 transgenic cell is obtained by isolating cells from a Cas13 transgenic organism. By means of example, and without limitation, the Cas13 transgenic cell as referred to herein may be derived from a Cas13 transgenic eukaryote, such as a Cas13 knock-in eukaryote. Reference is made to WO 2014/093622 (PCT/US13/74667), incorporated herein by reference. Methods of US Patent Publication Nos. 20120017290 and 20110265198 assigned to Sangamo BioSciences, Inc. directed to targeting the Rosa locus may be modified to utilize the CRISPR Cas system of the present invention. Methods of US Patent Publication No. 20130236946 assigned to Cellectis directed to targeting the Rosa locus may also be modified to utilize the CRISPR Cas system of the present invention. By means of further example reference is made to Platt et al. (Cell; 159(2):440-455 (2014)), describing a Cas9 knock-in mouse, which is incorporated herein by reference. The Cas13 transgene can further comprise a Lox-Stop-polyA-Lox(LSL) cassette thereby rendering Cas13 expression inducible by Cre recombinase.
Alternatively, the Cas13 transgenic cell may be obtained by introducing the Cas13 transgene in an isolated cell. Delivery systems for transgenes are well known in the art. By means of example, the Cas13 transgene may be delivered in, for instance, a eukaryotic cell by means of a vector (e.g., AAV, adenovirus, lentivirus) and/or particle and/or nanoparticle delivery, as also described herein elsewhere.

It will be understood by the skilled person that the cell, such as the Cas13 transgenic cell, as referred to herein may comprise further genomic alterations besides having an integrated Cas13 gene or the mutations arising from the sequence specific action of Cas13 when complexed with RNA capable of guiding Cas13 to a target locus, such as for instance one or more oncogenic mutations, as for instance and without limitation described in Platt et al. (2014), Chen et al., (2014) or Kumar et al. (2009).

In some embodiments, the Cas13 sequence is fused to one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the Cas13 comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g. zero or at least one or more NLS at the amino-terminus and zero or at least one or more NLS at the carboxy terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In a preferred embodiment of the invention, the Cas13 comprises at most 6 NLSs. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 80); the NLS from nucleoplasmin (e.g.
the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK) (SEQ ID NO: 81); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 82) or RQRRNELKRSP (SEQ ID NO: 83); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 84); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 85) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 86) and PPKKARED (SEQ ID NO: 87) of the myoma T protein; the sequence POPKKKPL (SEQ ID NO: 88) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 89) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 90) and PKQKKRK (SEQ ID NO: 91) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 92) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 93) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 94) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 95) of the steroid hormone receptors (human) glucocorticoid. In general, the one or more NLSs are of sufficient strength to drive accumulation of the Cas in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs in the Cas, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the Cas, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. 
Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of CRISPR complex formation (e.g. assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity affected by CRISPR complex formation and/or Cas enzyme activity), as compared to a control not exposed to the Cas or complex, or exposed to a Cas lacking the one or more NLSs.
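By way of illustration only, the criterion above under which an NLS is considered "near" the N- or C-terminus can be sketched in Python. The SV40 large T-antigen NLS sequence is taken from the passage above; the function names and the default cut-off of 50 amino acids are assumptions chosen for the example, and only the first occurrence of the NLS in the polypeptide is examined.

```python
SV40_NLS = "PKKKRKV"  # SV40 large T-antigen NLS recited above (SEQ ID NO: 80)


def nls_terminal_distances(protein, nls=SV40_NLS):
    """Residues between the NLS and each terminus, as (n_side, c_side).

    Returns None if the NLS is absent. Only the first match is considered.
    """
    start = protein.find(nls)
    if start < 0:
        return None
    end = start + len(nls)
    return start, len(protein) - end


def nls_is_near_terminus(protein, nls=SV40_NLS, max_distance=50):
    """True if the NLS lies within `max_distance` residues of either terminus."""
    distances = nls_terminal_distances(protein, nls)
    return distances is not None and min(distances) <= max_distance
```

A polypeptide carrying several NLSs would need each occurrence checked in turn, which this sketch does not do.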

The guide RNA(s), e.g., sgRNA(s) or crRNA(s) encoding sequences and/or Cas13 encoding sequences, can be functionally or operatively linked to regulatory element(s) and hence the regulatory element(s) drive expression. The promoter(s) can be constitutive promoter(s) and/or conditional promoter(s) and/or inducible promoter(s) and/or tissue specific promoter(s). The promoter can be selected from the group consisting of RNA polymerases, pol I, pol II, pol III, T7, U6, H1, retroviral Rous sarcoma virus (RSV) LTR promoter, the cytomegalovirus (CMV) promoter, the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter, and the EF1a promoter. An advantageous promoter is the U6 promoter.

In some embodiments, a CRISPR effector protein (e.g., Cas13) may form a component of an inducible system. The inducible nature of the system would allow for spatiotemporal control of gene editing or gene expression using a form of energy. The form of energy may include but is not limited to electromagnetic radiation, sound energy, chemical energy and thermal energy. Examples of inducible systems include tetracycline inducible promoters (Tet-On or Tet-Off), small molecule two-hybrid transcription activation systems (FKBP, ABA, etc.), or light inducible systems (Phytochrome, LOV domains, or cryptochrome). In one embodiment, the CRISPR effector protein may be a part of a Light Inducible Transcriptional Effector (LITE) to direct changes in transcriptional activity in a sequence-specific manner. The components of a LITE may include a CRISPR effector protein, a light-responsive cryptochrome heterodimer (e.g. from Arabidopsis thaliana), and a transcriptional activation/repression domain. Further examples of inducible DNA binding proteins and methods for their use are provided in U.S. 61/736,465 and U.S. 61/721,283, and WO 2014018423 A2, each of which is hereby incorporated by reference in its entirety.

Whenever reference is made herein to Cas13, it will be understood that a mutated Cas13 according to the invention as described herein is meant, unless explicitly indicated otherwise. Whenever reference is made herein to Cas13, preferably a mutated Cas13a, Cas13b, Cas13c, or Cas13d according to the invention as described herein is meant, unless explicitly indicated otherwise. Whenever reference is made herein to Cas13, preferably a mutated Cas13b according to the invention as described herein is meant, unless explicitly indicated otherwise.

In one aspect, the invention provides a mutated Cas13 as described herein, such as preferably, but without limitation Cas13b as described herein elsewhere, having one or more mutations resulting in reduced off-target effects, i.e. improved CRISPR enzymes for use in effecting modifications to target loci but which reduce or eliminate activity towards off-targets, such as when complexed to guide RNAs, as well as improved CRISPR enzymes for increasing the activity of CRISPR enzymes, such as when complexed with guide RNAs. It is to be understood that mutated enzymes as described herein below may be used in any of the methods according to the invention as described herein elsewhere. Any of the methods, products, compositions and uses as described herein elsewhere are equally applicable with the mutated CRISPR enzymes as further detailed below.

Slaymaker et al. recently described a method for the generation of Cas9 orthologues with enhanced specificity (Slaymaker et al. 2015 “Rationally engineered Cas9 nucleases with improved specificity”). This strategy can be used to enhance the specificity of the Cas13 protein. Primary residues for mutagenesis are preferably all positively charged residues within the HEPN domain. Additional residues are positively charged residues that are conserved between different orthologues.
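
As a minimal sketch of how candidate residues for such mutagenesis might be enumerated, the following Python example lists the positively charged residues that fall within a given domain window of a protein sequence. The sequence, the domain boundaries, and the function name `positive_residues` are hypothetical placeholders. Treating only lysine (K) and arginine (R) as positively charged is an assumption made for this illustration.

```python
# Assumption for this sketch: lysine and arginine count as positively charged;
# histidine is left out because its charge depends on local pH.
POSITIVE = set("KR")

def positive_residues(seq, domain_start, domain_end):
    """List (1-based position, residue) for K/R within [domain_start, domain_end]."""
    return [
        (pos, aa)
        for pos, aa in enumerate(seq, start=1)
        if domain_start <= pos <= domain_end and aa in POSITIVE
    ]

# Hypothetical toy sequence and domain boundaries, for illustration only.
toy_seq = "MAKRLDEKGRNAAAHK"
candidates = positive_residues(toy_seq, 3, 10)
print(candidates)
```

In practice the domain boundaries would come from an annotation or alignment of the particular Cas13 orthologue, and conservation across orthologues could be used to rank the resulting candidates.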

In an aspect, the invention also provides methods and mutations for modulating Cas13 binding activity and/or binding specificity. In certain embodiments Cas13 proteins lacking nuclease activity are used. In certain embodiments, modified guide RNAs are employed that promote binding but not nuclease activity of a Cas13 nuclease. In such embodiments, on-target binding can be increased or decreased. Also, in such embodiments off-target binding can be increased or decreased. Moreover, there can be increased or decreased specificity as to on-target binding vs. off-target binding.

The methods and mutations which can be employed in various combinations to increase or decrease activity and/or specificity of on-target vs. off-target activity, or increase or decrease binding and/or specificity of on-target vs. off-target binding, can be used to compensate or enhance mutations or modifications made to promote other effects. Such mutations or modifications made to promote other effects include mutations or modifications to the Cas13 and/or mutations or modifications made to a guide RNA. The methods and mutations of the invention are used to modulate Cas13 nuclease activity and/or binding with chemically modified guide RNAs.

In an aspect, the invention provides methods and mutations for modulating binding and/or binding specificity of Cas13 proteins according to the invention as defined herein comprising functional domains such as nucleases, transcriptional activators, transcriptional repressors, and the like. For example, a Cas13 protein can be made nuclease-null, or having altered or reduced nuclease activity by introducing mutations such as for instance Cas13 mutations described herein elsewhere. Nuclease deficient Cas13 proteins are useful for RNA-guided target sequence dependent delivery of functional domains. The invention provides methods and mutations for modulating binding of Cas13 proteins. In one embodiment, the functional domain comprises VP64, providing an RNA-guided transcription factor. In another embodiment, the functional domain comprises Fok I, providing an RNA-guided nuclease activity. Mention is made of U.S. Pat. Pub. 2014/0356959, U.S. Pat. Pub. 2014/0342456, U.S. Pat. Pub. 2015/0031132, and Mali, P. et al., 2013, Science 339(6121):823-6, doi: 10.1126/science.1232033, published online 3 Jan. 2013 and through the teachings herein the invention comprehends methods and materials of these documents applied in conjunction with the teachings herein. In certain embodiments, on-target binding is increased. In certain embodiments, off-target binding is decreased. In certain embodiments, on-target binding is decreased. In certain embodiments, off-target binding is increased. Accordingly, the invention also provides for increasing or decreasing specificity of on-target binding vs. off-target binding of functionalized Cas13 binding proteins.

The use of Cas13 as an RNA-guided binding protein is not limited to nuclease-null Cas13. Cas13 enzymes comprising nuclease activity can also function as RNA-guided binding proteins when used with certain guide RNAs. For example, short guide RNAs and guide RNAs comprising nucleotides mismatched to the target can promote RNA directed Cas13 binding to a target sequence with little or no target cleavage. (See, e.g., Dahlman, 2015, Nat Biotechnol. 33(11):1159-1161, doi: 10.1038/nbt.3390, published online 5 Oct. 2015). In an aspect, the invention provides methods and mutations for modulating binding of Cas13 proteins that comprise nuclease activity. In certain embodiments, on-target binding is increased. In certain embodiments, off-target binding is decreased. In certain embodiments, on-target binding is decreased. In certain embodiments, off-target binding is increased. In certain embodiments, there is increased or decreased specificity of on-target binding vs. off-target binding. In certain embodiments, nuclease activity of guide RNA-Cas13 enzyme is also modulated.

RNA-RNA duplex formation is important for cleavage activity and specificity throughout the target region, not only the seed region sequence closest to the PAM. Thus, truncated guide RNAs show reduced cleavage activity and specificity. In an aspect, the invention provides methods and mutations for increasing activity and specificity of cleavage using altered guide RNAs.

In certain embodiments, the catalytic activity of the CRISPR-Cas protein (e.g., Cas13) of the invention is altered or modified. It is to be understood that mutated Cas13 has an altered or modified catalytic activity if the catalytic activity is different than the catalytic activity of the corresponding wild type CRISPR-Cas protein (e.g., unmutated CRISPR-Cas protein). Catalytic activity can be determined by means known in the art. By means of example, and without limitation, catalytic activity can be determined in vitro or in vivo by determination of indel percentage (for instance after a given time, or at a given dose). In certain embodiments, catalytic activity is increased. In certain embodiments, catalytic activity is increased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 100%. In certain embodiments, catalytic activity is decreased. In certain embodiments, catalytic activity is decreased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or (substantially) 100%. The one or more mutations herein may inactivate the catalytic activity, which may mean that substantially all catalytic activity is removed, that catalytic activity is below detectable levels, or that there is no measurable catalytic activity.
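
The percentage thresholds recited above can be illustrated with a short Python sketch that compares a measurement for a mutated protein against the corresponding wild type measurement. The function names and the numeric values are hypothetical, and the choice of readout (for example, an activity measured at a given time or dose) is left to the assay. The same arithmetic may be applied to any characteristic expressed as a positive quantity.

```python
def percent_change(mutant, wild_type):
    """Signed percent change of a measured quantity relative to wild type."""
    if wild_type <= 0:
        raise ValueError("wild type measurement must be positive")
    return 100.0 * (mutant - wild_type) / wild_type

def meets_threshold(mutant, wild_type, threshold_pct, direction):
    """True if the change is at least threshold_pct in the stated direction."""
    change = percent_change(mutant, wild_type)
    if direction == "increase":
        return change >= threshold_pct
    if direction == "decrease":
        return -change >= threshold_pct
    raise ValueError("direction must be 'increase' or 'decrease'")

# Hypothetical measurements in arbitrary units.
print(percent_change(12, 10))                    # increase relative to wild type
print(meets_threshold(5, 10, 50, "decrease"))    # decreased by at least 50%?
```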

One or more characteristics of the engineered CRISPR-Cas protein may be different from a corresponding wild type CRISPR-Cas protein. Examples of such characteristics include catalytic activity, gRNA binding, specificity of the CRISPR-Cas protein (e.g., specificity of editing a defined target), stability of the CRISPR-Cas protein, off-target binding, target binding, protease activity, nickase activity, and PFS recognition. In some examples, an engineered CRISPR-Cas protein may comprise one or more mutations of the corresponding wild type CRISPR-Cas protein. In some embodiments, the catalytic activity of the engineered CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the catalytic activity of the engineered CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the gRNA binding of the engineered CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the gRNA binding of the engineered CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the specificity of the CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the specificity of the CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the stability of the CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the stability of the CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the engineered CRISPR-Cas protein further comprises one or more mutations which inactivate catalytic activity. In some embodiments, the off-target binding of the CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. 
In some embodiments, the off-target binding of the CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the target binding of the CRISPR-Cas protein is increased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the target binding of the CRISPR-Cas protein is decreased as compared to a corresponding wildtype CRISPR-Cas protein. In some embodiments, the engineered CRISPR-Cas protein has a higher protease activity or polynucleotide-binding capability compared with a corresponding wildtype CRISPR-Cas protein. In some embodiments, the PFS recognition is altered as compared to a corresponding wildtype CRISPR-Cas protein.

In certain embodiments, the gRNA (crRNA) binding of the Cas13 protein of the invention is altered or modified. It is to be understood that mutated Cas13 has an altered or modified gRNA binding if the gRNA binding is different than the gRNA binding of the corresponding wild type Cas13 (i.e. unmutated Cas13). gRNA binding can be determined by means known in the art. By means of example, and without limitation, gRNA binding can be determined by calculating binding strength or affinity (such as based on equilibrium constants, Ka, Kd, etc). In certain embodiments, gRNA binding is increased. In certain embodiments, gRNA binding is increased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 100%. In certain embodiments, gRNA binding is decreased. In certain embodiments, gRNA binding is decreased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or (substantially) 100%.
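
As one hedged illustration of comparing binding based on equilibrium constants, the following Python sketch assumes a simple 1:1 binding model, in which the association constant Ka is the reciprocal of the dissociation constant Kd and a lower Kd indicates stronger binding. Expressing a change in binding as the percent change in Ka is an assumption of this sketch, not a definition taken from the disclosure. The function names and Kd values are hypothetical.

```python
def fraction_bound(ligand_conc, kd):
    """Fraction of protein bound at a free ligand concentration (1:1 model)."""
    return ligand_conc / (kd + ligand_conc)

def affinity_change_pct(kd_mutant, kd_wild_type):
    """Percent change in Ka (= 1/Kd) of the mutant relative to wild type."""
    if kd_mutant <= 0 or kd_wild_type <= 0:
        raise ValueError("Kd values must be positive")
    return 100.0 * (kd_wild_type / kd_mutant - 1.0)

# Hypothetical Kd values in the same concentration units (e.g., nM).
print(affinity_change_pct(5.0, 10.0))   # mutant binds more tightly
print(fraction_bound(10.0, 10.0))       # half of the protein is bound when [L] = Kd
```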

In certain embodiments, the specificity of the Cas13 protein of the invention is altered or modified. It is to be understood that mutated Cas13 has an altered or modified specificity if the specificity is different than the specificity of the corresponding wild type Cas13 (i.e. unmutated Cas13). Specificity can be determined by means known in the art. By means of example, and without limitation, specificity can be determined by comparison of on-target activity and off-target activity. In certain embodiments, specificity is increased. In certain embodiments, specificity is increased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 100%. In certain embodiments, specificity is decreased. In certain embodiments, specificity is decreased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or (substantially) 100%.
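
A minimal sketch of one possible comparison of on-target and off-target activity follows. Here specificity is taken, by assumption, as the ratio of on-target activity to the summed activity at a set of off-target sites; other metrics are equally possible. The function name and all activity values are hypothetical.

```python
def specificity_ratio(on_target, off_target_activities):
    """On-target activity divided by the summed off-target activity."""
    total_off = sum(off_target_activities)
    if total_off == 0:
        return float("inf")  # no detectable off-target activity
    return on_target / total_off

# Hypothetical activities in arbitrary units.
wild_type = specificity_ratio(80.0, [10.0, 10.0])
mutant = specificity_ratio(72.0, [2.0, 2.0])
print(wild_type, mutant, mutant > wild_type)
```

In this toy comparison the mutant retains most of its on-target activity while its off-target activity falls, so the ratio rises.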

In certain embodiments, the stability of the Cas13 protein of the invention is altered or modified. It is to be understood that mutated Cas13 has an altered or modified stability if the stability is different than the stability of the corresponding wild type Cas13 (i.e. unmutated Cas13). Stability can be determined by means known in the art. By means of example, and without limitation, stability can be determined by determining the half-life of the Cas13 protein. In certain embodiments, stability is increased. In certain embodiments, stability is increased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 100%. In certain embodiments, stability is decreased. In certain embodiments, stability is decreased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or (substantially) 100%.
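
For illustration only, the following Python sketch estimates a half-life from two measurements of protein amount, assuming first-order (exponential) decay, so that the half-life equals ln(2) divided by the decay rate constant. The decay model, the function names, and the measurements are assumptions of this sketch.

```python
import math

def half_life(t0, amount0, t1, amount1):
    """Half-life from two time points, assuming first-order decay."""
    if t1 <= t0 or amount1 <= 0 or amount0 <= amount1:
        raise ValueError("need t1 > t0 and amount0 > amount1 > 0")
    rate = math.log(amount0 / amount1) / (t1 - t0)  # decay rate constant
    return math.log(2) / rate

def stability_change_pct(mutant_half_life, wild_type_half_life):
    """Percent change in half-life of the mutant relative to wild type."""
    return 100.0 * (mutant_half_life - wild_type_half_life) / wild_type_half_life

# Hypothetical measurements: amount falls from 100 to 25 units over 4 hours.
print(half_life(0, 100, 4, 25))
```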

In certain embodiments, the target binding of the Cas13 protein of the invention is altered or modified. It is to be understood that mutated Cas13 has an altered or modified target binding if the target binding is different than the target binding of the corresponding wild type Cas13 (i.e. unmutated Cas13). Target binding can be determined by means known in the art. By means of example, and without limitation, target binding can be determined by calculating binding strength or affinity (such as based on equilibrium constants, Ka, Kd, etc). In certain embodiments, target binding is increased. In certain embodiments, target binding is increased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 100%. In certain embodiments, target binding is decreased. In certain embodiments, target binding is decreased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or (substantially) 100%.

In certain embodiments, the off-target binding of the Cas13 protein of the invention is altered or modified. It is to be understood that mutated Cas13 has an altered or modified off-target binding if the off-target binding is different than the off-target binding of the corresponding wild type Cas13 (i.e. unmutated Cas13). Off-target binding can be determined by means known in the art. By means of example, and without limitation, off-target binding can be determined by calculating binding strength or affinity (such as based on equilibrium constants, Ka, Kd, etc). In certain embodiments, off-target binding is increased. In certain embodiments, off-target binding is increased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 100%. In certain embodiments, off-target binding is decreased. In certain embodiments, off-target binding is decreased by at least 5%, preferably at least 10%, more preferably at least 20%, such as at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or (substantially) 100%.

In certain embodiments, the PFS (or PAM) recognition or specificity of the Cas13 protein of the invention is altered or modified. It is to be understood that mutated Cas13 has an altered or modified PFS recognition or specificity if the PFS recognition or specificity is different than the PFS recognition or specificity of the corresponding wild type Cas13 (i.e. unmutated Cas13). PFS recognition or specificity can be determined by means known in the art. By means of example, and without limitation, PFS recognition or specificity can be determined by PFS (PAM) screens. In certain embodiments, at least one different PFS is recognized by the Cas13. In certain embodiments, at least one PFS is recognized by the mutated Cas13 which is not recognized by the corresponding wild type Cas13. In certain embodiments, at least one PFS is recognized by the mutated Cas13 which is not recognized by the corresponding wild type Cas13, in addition to the wild type PFS. In certain embodiments, at least one PFS is recognized by the mutated Cas13 which is not recognized by the corresponding wild type Cas13, and the wild type PFS is no longer recognized. In certain embodiments, the PFS recognized by the mutated Cas13 is longer than the PFS recognized by the wild type Cas13, such as 1, 2, or 3 nucleotides longer. In certain embodiments, the PFS recognized by the mutated Cas13 is shorter than the PFS recognized by the wild type Cas13, such as 1, 2, or 3 nucleotides shorter.
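
The alternatives described above (a PFS gained in addition to the wild type PFS, or gained while the wild type PFS is lost) can be pictured as set operations on the PFS sequences recovered from a screen. The following Python sketch is a hypothetical illustration; the function name `compare_pfs` and the PFS sets are invented for the example and do not describe any particular orthologue.

```python
def compare_pfs(wild_type_pfs, mutant_pfs):
    """Classify PFS sequences as gained, lost, or retained by the mutant."""
    wt, mut = set(wild_type_pfs), set(mutant_pfs)
    return {
        "gained": sorted(mut - wt),    # recognized only by the mutant
        "lost": sorted(wt - mut),      # recognized only by the wild type
        "retained": sorted(wt & mut),  # recognized by both
    }

# Hypothetical screen results: single-nucleotide PFS sequences.
result = compare_pfs({"A", "U", "C"}, {"A", "U", "C", "G"})
print(result)
```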

The invention provides a non-naturally occurring or engineered composition comprising

i) a mutated Cas13 effector protein, and
ii) a crRNA,
wherein the crRNA comprises a) a guide sequence that is capable of hybridizing to a target RNA sequence, and b) a direct repeat sequence,

whereby there is formed a CRISPR complex comprising the Cas13 effector protein complexed with the guide sequence that is hybridized to the target RNA sequence. The complex can be formed in vitro or ex vivo and introduced into a cell or contacted with RNA; or can be formed in vivo.
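
By way of non-limiting illustration, the following Python sketch models a crRNA as a guide sequence joined to a direct repeat and locates sites in a target RNA to which the guide is perfectly complementary. The sequences are hypothetical placeholders, the placement of the direct repeat at the 3' end of the guide is an assumption exposed as a parameter, and only perfect Watson-Crick pairing is considered. Real guide design would also account for mismatches, secondary structure, and PFS requirements.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_target_sites(guide, target_rna):
    """0-based starts in target_rna that are perfectly complementary to guide."""
    site = reverse_complement(guide)
    return [i for i in range(len(target_rna) - len(site) + 1)
            if target_rna.startswith(site, i)]

def build_crrna(guide, direct_repeat, repeat_at_3prime=True):
    """Join guide and direct repeat; the orientation is an assumption."""
    return guide + direct_repeat if repeat_at_3prime else direct_repeat + guide

# Hypothetical target RNA containing one intended site, and a placeholder repeat.
target_rna = "GGAUCC" + "AUGGCUAGCU" + "UAACG"
guide = reverse_complement("AUGGCUAGCU")
crrna = build_crrna(guide, "GUUGUGGAAG")
print(find_target_sites(guide, target_rna), crrna)
```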

In some embodiments, such as for Cas13b, a non-naturally occurring or engineered composition of the invention may comprise an accessory protein that enhances Type VI-B CRISPR-Cas effector protein activity.

In certain such embodiments, the accessory protein that enhances Cas13b effector protein activity is a csx28 protein. In such embodiments, the Type VI-B CRISPR-Cas effector protein and the Type VI-B CRISPR-Cas accessory protein may be from the same source or from a different source.

In some embodiments, a non-naturally occurring or engineered composition of the invention comprises an accessory protein that represses Cas13b effector protein activity.

In certain such embodiments, the accessory protein that represses Cas13b effector protein activity is a csx27 protein. In such embodiments, the Type VI-B CRISPR-Cas effector protein and the Type VI-B CRISPR-Cas accessory protein may be from the same source or from a different source. In certain embodiments of the invention, the Type VI-B CRISPR-Cas effector protein is from Table 1.

In some embodiments, a non-naturally occurring or engineered composition of the invention comprises two or more crRNAs.

In some embodiments, a non-naturally occurring or engineered composition of the invention comprises a guide sequence that hybridizes to a target RNA sequence in a prokaryotic cell.

In some embodiments, a non-naturally occurring or engineered composition of the invention comprises a guide sequence that hybridizes to a target RNA sequence in a eukaryotic cell.

In some embodiments, the Cas13 effector protein comprises one or more nuclear localization signals (NLSs).

In certain embodiments, the Cas13 effector protein of the invention is, or in, or comprises, or consists essentially of, or consists of, or involves or relates to such a protein derived from or as set forth in Tables 1-4, and comprising one or more mutations of the invention as described herein elsewhere.

In some embodiments of the non-naturally occurring or engineered composition of the invention, the Cas13 effector protein is associated with one or more functional domains. The association can be by direct linkage of the effector protein to the functional domain, or by association with the crRNA. In a non-limiting example, the crRNA comprises an added or inserted sequence that can be associated with a functional domain of interest, including, for example, an aptamer or a nucleotide that binds to a nucleic acid binding adapter protein. The functional domain may be a functional heterologous domain.

In certain non-limiting embodiments, a non-naturally occurring or engineered composition of the invention comprises a functional domain that cleaves the target RNA sequence.

In certain non-limiting embodiments, the non-naturally occurring or engineered composition of the invention comprises a functional domain that modifies transcription or translation of the target RNA sequence.

In some embodiments of the composition of the invention, the Cas13 effector protein is associated with one or more functional domains; and the effector protein contains one or more mutations within an HEPN domain, whereby the complex can deliver an epigenetic modifier or a transcriptional or translational activation or repression signal. The complex can be formed in vitro or ex vivo and introduced into a cell or contacted with RNA; or can be formed in vivo.

In some embodiments of the non-naturally occurring or engineered composition of the invention, the Cas13b effector protein and the accessory protein are from the same organism.

In some embodiments of the non-naturally occurring or engineered composition of the invention, the Cas13b effector protein and the accessory protein are from different organisms.

The invention also provides a Type VI CRISPR-Cas vector system, which comprises one or more vectors comprising:

a first regulatory element operably linked to a nucleotide sequence encoding the Cas13 effector protein, and a second regulatory element operably linked to a nucleotide sequence encoding the crRNA.

In certain embodiments, the vector system of the invention further comprises a regulatory element operably linked to a nucleotide sequence of a Type VI-B CRISPR-Cas accessory protein.

When appropriate, the nucleotide sequence encoding the Type VI CRISPR-Cas effector protein (and/or optionally the nucleotide sequence encoding the Type VI-B CRISPR-Cas accessory protein) is codon optimized for expression in a eukaryotic cell.

In some embodiments of the vector system of the invention, the nucleotide sequences encoding the Cas13 effector protein (and optionally the accessory protein) are codon optimized for expression in a eukaryotic cell.

In some embodiments, the vector system of the invention comprises a single vector.

In some embodiments of the vector system of the invention, the one or more vectors comprise viral vectors.

In some embodiments of the vector system of the invention, the one or more vectors comprise one or more retroviral, lentiviral, adenoviral, adeno-associated or herpes simplex viral vectors.

The invention provides a delivery system configured to deliver a Cas13 effector protein and one or more nucleic acid components of a non-naturally occurring or engineered composition comprising

i) a mutated Cas13 effector protein according to the invention as described herein, and

ii) a crRNA,

wherein the crRNA comprises a) a guide sequence that hybridizes to a target RNA sequence in a cell, and b) a direct repeat sequence,

wherein the Cas13 effector protein forms a complex with the crRNA,

wherein the guide sequence directs sequence-specific binding to the target RNA sequence,

whereby there is formed a CRISPR complex comprising the Cas13 effector protein complexed with the guide sequence that is hybridized to the target RNA sequence. The complex can be formed in vitro or ex vivo and introduced into a cell or contacted with RNA; or can be formed in vivo.

In some embodiments of the delivery system of the invention, the system comprises one or more vectors or one or more polynucleotide molecules, the one or more vectors or polynucleotide molecules comprising one or more polynucleotide molecules encoding the Cas13 effector protein and one or more nucleic acid components of the non-naturally occurring or engineered composition.

In some embodiments, the delivery system of the invention comprises a delivery vehicle comprising liposome(s), particle(s), exosome(s), microvesicle(s), a gene-gun or one or more viral vector(s).

In some embodiments, the non-naturally occurring or engineered composition of the invention is for use in a therapeutic method of treatment or in a research program.

In some embodiments, the non-naturally occurring or engineered vector system of the invention is for use in a therapeutic method of treatment or in a research program.

In some embodiments, the non-naturally occurring or engineered delivery system of the invention is for use in a therapeutic method of treatment or in a research program.

The invention provides a method of modifying expression of a target gene of interest, the method comprising contacting a target RNA with one or more non-naturally occurring or engineered compositions comprising

i) a mutated Cas13 effector protein according to the invention as described herein, and

ii) a crRNA,

wherein the crRNA comprises a) a guide sequence that hybridizes to a target RNA sequence in a cell, and b) a direct repeat sequence,

wherein the Cas13 effector protein forms a complex with the crRNA,

wherein the guide sequence directs sequence-specific binding to the target RNA sequence in a cell,

whereby there is formed a CRISPR complex comprising the Cas13 effector protein complexed with the guide sequence that is hybridized to the target RNA sequence,

whereby expression of the target locus of interest is modified. The complex can be formed in vitro or ex vivo and introduced into a cell or contacted with RNA; or can be formed in vivo.

In some embodiments, the method of modifying expression of a target gene of interest further comprises contacting the target RNA with an accessory protein that enhances Cas13b effector protein activity.

In some embodiments of the method of modifying expression of a target gene of interest, the accessory protein that enhances Cas13b effector protein activity is a csx28 protein.

In some embodiments, the method of modifying expression of a target gene of interest further comprises contacting the target RNA with an accessory protein that represses Cas13b effector protein activity.

In some embodiments of the method of modifying expression of a target gene of interest, the accessory protein that represses Cas13b effector protein activity is a csx27 protein.

In some embodiments, the method of modifying expression of a target gene of interest comprises cleaving the target RNA.

In some embodiments, the method of modifying expression of a target gene of interest comprises increasing or decreasing expression of the target RNA.

In some embodiments of the method of modifying expression of a target gene of interest, the target gene is in a prokaryotic cell.

In some embodiments of the method of modifying expression of a target gene of interest, the target gene is in a eukaryotic cell.

The invention provides a cell comprising a modified target of interest, wherein the target of interest has been modified according to any of the methods disclosed herein.

In some embodiments of the invention, the cell is a prokaryotic cell.

In some embodiments of the invention, the cell is a eukaryotic cell.

In some embodiments, modification of the target of interest in a cell results in:

a cell comprising altered expression of at least one gene product;
a cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is increased; or
a cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is decreased.

In some embodiments, the cell is a mammalian cell or a human cell.

The invention provides a cell line of or comprising a cell disclosed herein or a cell modified by any of the methods disclosed herein, or progeny thereof.

The invention provides a multicellular organism comprising one or more cells disclosed herein or one or more cells modified according to any of the methods disclosed herein.

The invention provides a plant or animal model comprising one or more cells disclosed herein or one or more cells modified according to any of the methods disclosed herein.

The invention provides a gene product from a cell or the cell line or the organism or the plant or animal model disclosed herein.

In some embodiments, the amount of gene product expressed is greater than or less than the amount of gene product from a cell that does not have altered expression.

In certain embodiments, the Cas13 protein originates from a species of the genus Alistipes, Anaerosalibacter, Bacteroides, Bacteroidetes, Bergeyella, Blautia, Butyrivibrio, Capnocytophaga, Carnobacterium, Chloroflexus, Chryseobacterium, Clostridium, Demequina, Eubacteriaceae, Eubacterium, Flavobacterium, Fusobacterium, Herbinix, Insolitispirillum, Lachnospiraceae, Leptotrichia, Listeria, Myroides, Paludibacter, Phaeodactylibacter, Porphyromonadaceae, Porphyromonas, Prevotella, Pseudobutyrivibrio, Psychroflexus, Reichenbachiella, Rhodobacter, Riemerella, Sinomicrobium, Thalassospira, or Ruminococcus. As used herein, when a Cas13 protein originates from a species, it may be the wild type Cas13 protein in the species, or a homolog of the wild type Cas13 protein in the species. The Cas13 protein that is a homolog of the wild type Cas13 protein in the species may comprise one or more variations (e.g., mutations, truncations, etc.) of the wild type Cas13 protein.

In certain embodiments, the Cas13 protein originates from Leptotrichia shahii, Listeria seeligeri, Lachnospiraceae bacterium (such as Lb MA2020, Lb NK4A179, Lb NK4A144), Clostridium aminophilum (such as Ca DSM 10710), Carnobacterium gallinarum (such as Cg DSM 4847), Paludibacter propionicigenes (such as Pp WB4), Listeria weihenstephanensis (such as Lw FSL R9-0317), Listeriaceae bacterium (such as Lb FSL M6-0635), Leptotrichia wadei (such as Lw F0279), Rhodobacter capsulatus (such as Rc SB 1003, Rc R121, Rc DE442), Leptotrichia buccalis (such as Lb C-1013-b), Herbinix hemicellulosilytica, Eubacteriaceae bacterium (such as Eb CHKCI004), Blautia sp. Marseille-P2398, Leptotrichia sp. oral taxon 879 str. F0557, Chloroflexus aggregans, Demequina aurantiaca, Thalassospira sp. TSL5-1, Pseudobutyrivibrio sp. OR37, Butyrivibrio sp. YAB3001, Leptotrichia sp. Marseille-P3007, Bacteroides ihuae, Porphyromonadaceae bacterium (such as Pb KH3CP3RA), Listeria riparia, Insolitispirillum peregrinum, Alistipes sp. ZOR0009, Bacteroides pyogenes (such as Bp F0041), Bacteroidetes bacterium (such as Bb GWA2_31_9), Bergeyella zoohelcum (such as Bz ATCC 43767), Capnocytophaga canimorsus, Capnocytophaga cynodegmi, Chryseobacterium carnipullorum, Chryseobacterium jejuense, Chryseobacterium ureilyticum, Flavobacterium branchiophilum, Flavobacterium columnare, Flavobacterium sp. 316, Myroides odoratimimus (such as Mo CCUG 10230, Mo CCUG 12901, Mo CCUG 3837), Paludibacter propionicigenes, Phaeodactylibacter xiamenensis, Porphyromonas gingivalis (such as Pg F0185, Pg F0568, Pg JCVI SC001, Pg W4087), Porphyromonas gulae, Porphyromonas sp. COT-052 OH4946, Prevotella aurantiaca, Prevotella buccae (such as Pb ATCC 33574), Prevotella falsenii, Prevotella intermedia (such as Pi 17, Pi ZT), Prevotella pallens (such as Pp ATCC 700821), Prevotella pleuritidis, Prevotella saccharolytica (such as Ps F0055), Prevotella sp. MA2016, Prevotella sp. MSX73, Prevotella sp. P4-76, Prevotella sp. 
P5-119, Prevotella sp. P5-125, Prevotella sp. P5-60, Psychroflexus torquis, Reichenbachiella agariperforans, Riemerella anatipestifer, Sinomicrobium oceani, Fusobacterium necrophorum (such as Fn subsp. funduliforme ATCC 51357, Fn DJ-2, Fn BFTR-1, Fn subsp. funduliforme), Fusobacterium perfoetens (such as Fp ATCC 29250), Fusobacterium ulcerans (such as Fu ATCC 49185), Anaerosalibacter sp. ND1, Eubacterium siraeum, Ruminococcus flavefaciens (such as Rfx XPD3002), or Ruminococcus albus.

In certain embodiments, the Cas13 is Cas13a and originates from a species of the genus Bacteroides, Blautia, Butyrivibrio, Carnobacterium, Chloroflexus, Clostridium, Demequina, Eubacterium, Herbinix, Insolitispirillum, Lachnospiraceae, Leptotrichia, Listeria, Paludibacter, Porphyromonadaceae, Pseudobutyrivibrio, Rhodobacter, or Thalassospira.

In certain embodiments, the Cas13 is Cas13a and originates from Leptotrichia shahii, Listeria seeligeri, Lachnospiraceae bacterium (such as Lb MA2020, Lb NK4A179, Lb NK4A144), Clostridium aminophilum (such as Ca DSM 10710), Carnobacterium gallinarum (such as Cg DSM 4847), Paludibacter propionicigenes (such as Pp WB4), Listeria weihenstephanensis (such as Lw FSL R9-0317), Listeriaceae bacterium (such as Lb FSL M6-0635), Leptotrichia wadei (such as Lw F0279), Rhodobacter capsulatus (such as Rc SB 1003, Rc R121, Rc DE442), Leptotrichia buccalis (such as Lb C-1013-b), Herbinix hemicellulosilytica, Eubacteriaceae bacterium (such as Eb CHKCI004), Blautia sp. Marseille-P2398, Leptotrichia sp. oral taxon 879 str. F0557, Chloroflexus aggregans, Demequina aurantiaca, Thalassospira sp. TSL5-1, Pseudobutyrivibrio sp. OR37, Butyrivibrio sp. YAB3001, Leptotrichia sp. Marseille-P3007, Bacteroides ihuae, Porphyromonadaceae bacterium (such as Pb KH3CP3RA), Listeria riparia, or Insolitispirillum peregrinum.

In certain embodiments, the Cas13 is Cas13b and originates from a species of the genus Alistipes, Bacteroides, Bacteroidetes, Bergeyella, Capnocytophaga, Chryseobacterium, Flavobacterium, Myroides, Paludibacter, Phaeodactylibacter, Porphyromonas, Prevotella, Psychroflexus, Reichenbachiella, Riemerella, or Sinomicrobium.

In certain embodiments, the Cas13 is Cas13b and originates from Alistipes sp. ZOR0009, Bacteroides pyogenes (such as Bp F0041), Bacteroidetes bacterium (such as Bb GWA2_31_9), Bergeyella zoohelcum (such as Bz ATCC 43767), Capnocytophaga canimorsus, Capnocytophaga cynodegmi, Chryseobacterium carnipullorum, Chryseobacterium jejuense, Chryseobacterium ureilyticum, Flavobacterium branchiophilum, Flavobacterium columnare, Flavobacterium sp. 316, Myroides odoratimimus (such as Mo CCUG 10230, Mo CCUG 12901, Mo CCUG 3837), Paludibacter propionicigenes, Phaeodactylibacter xiamenensis, Porphyromonas gingivalis (such as Pg F0185, Pg F0568, Pg JCVI SC001, Pg W4087), Porphyromonas gulae, Porphyromonas sp. COT-052 OH4946, Prevotella aurantiaca, Prevotella buccae (such as Pb ATCC 33574), Prevotella falsenii, Prevotella intermedia (such as Pi 17, Pi ZT), Prevotella pallens (such as Pp ATCC 700821), Prevotella pleuritidis, Prevotella saccharolytica (such as Ps F0055), Prevotella sp. MA2016, Prevotella sp. MSX73, Prevotella sp. P4-76, Prevotella sp. P5-119, Prevotella sp. P5-125, Prevotella sp. P5-60, Psychroflexus torquis, Reichenbachiella agariperforans, Riemerella anatipestifer, or Sinomicrobium oceani. In some examples, the Cas13 is Riemerella anatipestifer Cas13b. In some examples, the Cas13 is a dead Riemerella anatipestifer Cas13b. In some examples, the Cas13 is Prevotella sp. P5-125 Cas13b. In some examples, the Cas13 is a dead Prevotella sp. P5-125 Cas13b.

In certain embodiments, the Cas13 is Cas13c and originates from a species of the genus Fusobacterium or Anaerosalibacter.

In certain embodiments, the Cas13 is Cas13c and originates from Fusobacterium necrophorum (such as Fn subsp. funduliforme ATCC 51357, Fn DJ-2, Fn BFTR-1, Fn subsp. funduliforme), Fusobacterium perfoetens (such as Fp ATCC 29250), Fusobacterium ulcerans (such as Fu ATCC 49185), or Anaerosalibacter sp. ND1.

In certain embodiments, the Cas13 is Cas13d and originates from a species of the genus Eubacterium or Ruminococcus.

In certain embodiments, the Cas13 is Cas13d and originates from Eubacterium siraeum, Ruminococcus flavefaciens (such as Rfx XPD3002), or Ruminococcus albus.

In certain embodiments, the invention provides an isolated Cas13 effector protein, comprising or consisting essentially of or consisting of or as set forth in Tables 1-4, and comprising one or more mutations as described elsewhere herein. A Tables 1-4 Cas13 effector protein is as discussed in more detail herein in conjunction with Tables 1-4. The invention provides an isolated nucleic acid encoding the Cas13 effector protein. In some embodiments of the invention the isolated nucleic acid comprises a DNA sequence and further comprises a sequence encoding a crRNA. The invention provides an isolated eukaryotic cell comprising the nucleic acid encoding the Cas13 effector protein. Thus, herein, “Cas13 effector protein” or “effector protein” or “Cas” or “Cas protein” or “RNA targeting effector protein” or “RNA targeting protein” or like expressions are to be understood as including Cas13a, Cas13b, Cas13c, or Cas13d; expressions such as “RNA targeting CRISPR system” are to be understood as including Cas13a, Cas13b, Cas13c, or Cas13d CRISPR systems, and in certain embodiments can be read as a Tables 1-4 Cas13 effector protein CRISPR system; and references to guide RNA or sgRNA are to be read in conjunction with the discussion herein of the Cas13 system crRNA, e.g., that which is sgRNA in other systems may be considered as or akin to crRNA in the instant invention.

The invention provides a method of identifying the requirements of a suitable guide sequence for the Cas13 effector protein of the invention (e.g., Tables 1-4), said method comprising:

(a) selecting a set of essential genes within an organism;

(b) designing a library of targeting guide sequences capable of hybridizing to the coding regions of these genes as well as to the 5′ and 3′ UTRs of these genes;

(c) generating randomized guide sequences that do not hybridize to any region within the genome of said organism as control guides;

(d) preparing a plasmid comprising the RNA-targeting protein and a first resistance gene and a guide plasmid library comprising said library of targeting guides and said control guides and a second resistance gene;

(e) co-introducing said plasmids into a host cell;

(f) introducing said host cells onto a selective medium for said first and second resistance genes;

(g) sequencing essential genes of growing host cells;

(h) determining significance of depletion of cells transformed with targeting guides by comparing depletion of cells with control guides; and

(i) determining based on the depleted guide sequences the requirements of a suitable guide sequence.

In one aspect of such method, determining the PFS sequence for a suitable guide sequence of the RNA-targeting protein is by comparison of sequences targeted by guides in depleted cells. In one aspect of such method, the method further comprises comparing the guide abundance for the different conditions in different replicate experiments. In one aspect of such method, the control guides are selected in that they are determined to show limited deviation in guide depletion in replicate experiments. In one aspect of such method, the significance of depletion is determined as (a) a depletion which is more than the most depleted control guide; or (b) a depletion which is more than the average depletion plus two times the standard deviation for the control guides. In one aspect of such method, the host cell is a bacterial host cell. In one aspect of such method, the step of co-introducing the plasmids is by electroporation and the host cell is an electro-competent host cell.
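By way of illustration only, the two significance criteria described above may be sketched as follows. The function name, the guide identifiers, the depletion scores, and the convention that a larger score denotes stronger depletion are assumptions made for this example and are not taken from the disclosure; the use of the sample standard deviation is likewise an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, stdev


def significant_guides(targeting, controls, rule="mean_plus_2sd"):
    """Return targeting guides whose depletion exceeds a control-derived threshold.

    targeting, controls: dicts mapping a guide ID to a depletion score, where a
    larger score means stronger depletion (an assumed convention).
    """
    control_scores = list(controls.values())
    if rule == "most_depleted_control":
        # Criterion (a): more depleted than the most depleted control guide.
        threshold = max(control_scores)
    elif rule == "mean_plus_2sd":
        # Criterion (b): more than the average control depletion plus two
        # standard deviations (sample standard deviation, an assumption).
        threshold = mean(control_scores) + 2 * stdev(control_scores)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return {guide for guide, score in targeting.items() if score > threshold}


# Hypothetical depletion scores, for illustration only.
controls = {"ctrl1": 0.0, "ctrl2": 0.2, "ctrl3": 0.4, "ctrl4": 0.2}
targeting = {"guideA": 0.45, "guideB": 0.6, "guideC": 0.1}

hits_a = significant_guides(targeting, controls, rule="most_depleted_control")
hits_b = significant_guides(targeting, controls, rule="mean_plus_2sd")
```

With these made-up scores, criterion (a) is the less stringent of the two; which criterion is stricter in practice depends on the spread of the control guides.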

The invention provides a method of modifying sequences associated with or at a target locus of interest, the method comprising delivering to said locus a non-naturally occurring or engineered composition comprising a Cas13 effector protein and one or more nucleic acid components, wherein the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the locus of interest the effector protein induces the modification of the sequences associated with or at the target locus of interest. In a preferred embodiment, the modification is the introduction of a strand break. In a preferred embodiment, the sequences associated with or at the target locus of interest comprises RNA or consists of RNA.

The invention provides a method of modifying sequences associated with or at a target locus of interest, the method comprising delivering to said locus a non-naturally occurring or engineered composition comprising a Cas13 effector protein, optionally a small accessory protein, and one or more nucleic acid components, wherein the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the locus of interest the effector protein induces the modification of the sequences associated with or at the target locus of interest. In a preferred embodiment, the modification is the introduction of a strand break. In a preferred embodiment, the sequences associated with or at the target locus of interest comprises RNA or consists of RNA.

The invention provides a method of modifying sequences associated with or at a target locus of interest, the method comprising delivering to said sequences associated with or at the locus a non-naturally occurring or engineered composition comprising a Cas13 loci effector protein and one or more nucleic acid components, wherein the Cas13 effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the locus of interest the effector protein induces the modification of sequences associated with or at the target locus of interest. In a preferred embodiment, the modification is the introduction of a strand break. In a preferred embodiment the Cas13 effector protein forms a complex with one nucleic acid component; advantageously an engineered or non-naturally occurring nucleic acid component. The induction of modification of sequences associated with or at the target locus of interest can be Cas13 effector protein-nucleic acid guided. In a preferred embodiment the one nucleic acid component is a CRISPR RNA (crRNA). In a preferred embodiment the one nucleic acid component is a mature crRNA or guide RNA, wherein the mature crRNA or guide RNA comprises a spacer sequence (or guide sequence) and a direct repeat (DR) sequence or derivatives thereof. In a preferred embodiment the spacer sequence or the derivative thereof comprises a seed sequence, wherein the seed sequence is critical for recognition and/or hybridization to the sequence at the target locus. In a preferred embodiment of the invention the crRNA is a short crRNA that may be associated with a short DR sequence. In another embodiment of the invention the crRNA is a long crRNA that may be associated with a long DR sequence (or dual DR). Aspects of the invention relate to Cas13 effector protein complexes having one or more non-naturally occurring or engineered or modified or optimized nucleic acid components. 
In a preferred embodiment the nucleic acid component comprises RNA. In a preferred embodiment the nucleic acid component of the complex may comprise a guide sequence linked to a direct repeat sequence, wherein the direct repeat sequence comprises one or more stem loops or optimized secondary structures. In preferred embodiments of the invention, the direct repeat may be a short DR or a long DR (dual DR). In a preferred embodiment the direct repeat may be modified to comprise one or more protein-binding RNA aptamers. In a preferred embodiment, one or more aptamers may be included such as part of optimized secondary structure. Such aptamers may be capable of binding a bacteriophage coat protein. The bacteriophage coat protein may be selected from the group comprising Qβ, F2, GA, fr, JP501, MS2, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, ϕCb23r, 7s and PRR1. In a preferred embodiment the bacteriophage coat protein is MS2. The invention also provides for the nucleic acid component of the complex being 30 or more, 40 or more or 50 or more nucleotides in length.

The invention provides methods of genome editing or modifying sequences associated with or at a target locus of interest wherein the method comprises introducing a Cas13 complex into any desired cell type, prokaryotic or eukaryotic cell, whereby the Cas13 effector protein complex effectively functions to interfere with RNA in the eukaryotic or prokaryotic cell. In preferred embodiments, the cell is a eukaryotic cell and the RNA is transcribed from a mammalian genome or is present in a mammalian cell. In preferred methods of RNA editing or genome editing in human cells, the Cas13 effector proteins may include but are not limited to the specific species of Cas13 effector proteins disclosed herein.

The invention also provides a method of modifying a target locus of interest, the method comprising delivering to said locus a non-naturally occurring or engineered composition comprising a Cas13 effector protein and one or more nucleic acid components, wherein the Cas13 effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the locus of interest the effector protein induces the modification of the target locus of interest. In a preferred embodiment, the modification is the introduction of a strand break.

In such methods the target locus of interest may be comprised within an RNA molecule. In such methods the target locus of interest may be comprised in an RNA molecule in vitro.

In such methods the target locus of interest may be comprised in an RNA molecule within a cell. The cell may be a prokaryotic cell or a eukaryotic cell. The cell may be a mammalian cell. The modification introduced to the cell by the present invention may be such that the cell and progeny of the cell are altered for improved production of biologic products such as an antibody, starch, alcohol or other desired cellular output. The modification introduced to the cell by the present invention may be such that the cell and progeny of the cell include an alteration that changes the biologic product produced.

The mammalian cell may be a non-human mammalian cell, e.g., a primate, bovine, ovine, porcine, canine, rodent, or Leporidae cell, such as a monkey, cow, sheep, pig, dog, rabbit, rat or mouse cell. The cell may be a non-mammalian eukaryotic cell such as a poultry bird (e.g., chicken), vertebrate fish (e.g., salmon) or shellfish (e.g., oyster, clam, lobster, shrimp) cell. The cell may also be a plant cell. The plant cell may be of a monocot or dicot or of a crop or grain plant such as cassava, corn, sorghum, soybean, wheat, oat or rice. The plant cell may also be of an algae, tree or production plant, fruit or vegetable (e.g., trees such as citrus trees, e.g., orange, grapefruit or lemon trees; peach or nectarine trees; apple or pear trees; nut trees such as almond or walnut or pistachio trees; nightshade plants; plants of the genus Brassica; plants of the genus Lactuca; plants of the genus Spinacia; plants of the genus Capsicum; cotton, tobacco, asparagus, carrot, cabbage, broccoli, cauliflower, tomato, eggplant, pepper, lettuce, spinach, strawberry, blueberry, raspberry, blackberry, grape, coffee, cocoa, etc.).

The invention provides a method of modifying a target locus of interest, the method comprising delivering to said locus a non-naturally occurring or engineered composition comprising a Cas13 effector protein and one or more nucleic acid components, wherein the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the locus of interest the effector protein induces the modification of the target locus of interest. In a preferred embodiment, the modification is the introduction of a strand break.

In such methods the target locus of interest may be comprised within an RNA molecule. In a preferred embodiment, the target locus of interest comprises or consists of RNA.

The invention also provides a method of modifying a target locus of interest, the method comprising delivering to said locus a non-naturally occurring or engineered composition comprising a Cas13 effector protein and one or more nucleic acid components, wherein the Cas13 effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the locus of interest the effector protein induces the modification of the target locus of interest. In a preferred embodiment, the modification is the introduction of a strand break.

Preferably, in such methods the target locus of interest may be comprised in an RNA molecule in vitro. Also preferably, in such methods the target locus of interest may be comprised in an RNA molecule within a cell. The cell may be a prokaryotic cell or a eukaryotic cell. The cell may be a mammalian cell. The cell may be a rodent cell. The cell may be a mouse cell.

In any of the described methods the target locus of interest may be a genomic or epigenomic locus of interest. In any of the described methods the complex may be delivered with multiple guides for multiplexed use. In any of the described methods more than one protein(s) may be used.

In further aspects of the invention the nucleic acid components may comprise a CRISPR RNA (crRNA) sequence. As the effector protein is a Cas13 effector protein, the nucleic acid components may comprise a CRISPR RNA (crRNA) sequence and generally may not comprise any trans-activating crRNA (tracrRNA) sequence.

In any of the described methods the effector protein and nucleic acid components may be provided via one or more polynucleotide molecules encoding the protein and/or nucleic acid component(s), and wherein the one or more polynucleotide molecules are operably configured to express the protein and/or the nucleic acid component(s). The one or more polynucleotide molecules may comprise one or more regulatory elements operably configured to express the protein and/or the nucleic acid component(s). The one or more polynucleotide molecules may be comprised within one or more vectors. In any of the described methods the target locus of interest may be a genomic, epigenomic, or transcriptomic locus of interest. In any of the described methods the complex may be delivered with multiple guides for multiplexed use. In any of the described methods more than one protein(s) may be used.

In any of the described methods the strand break may be a single strand break or a double strand break. In preferred embodiments the double strand break may refer to the breakage of two sections of RNA, such as the two sections formed when a single-stranded RNA molecule has folded onto itself, or the putative double helices formed when an RNA molecule containing self-complementary sequences folds and pairs with itself.

Regulatory elements may comprise inducible promoters. Polynucleotides and/or vector systems may comprise inducible systems.

In any of the described methods the one or more polynucleotide molecules may be comprised in a delivery system, or the one or more vectors may be comprised in a delivery system.

In any of the described methods the non-naturally occurring or engineered composition may be delivered via liposomes, particles including nanoparticles, exosomes, microvesicles, a gene-gun or one or more viral vectors.

The invention also provides a non-naturally occurring or engineered composition which is a composition having the characteristics as discussed herein or defined in any of the herein described methods.

In certain embodiments, the invention thus provides a non-naturally occurring or engineered composition, such as particularly a composition capable of or configured to modify a target locus of interest, said composition comprising a Cas13 effector protein and one or more nucleic acid components, wherein the effector protein forms a complex with the one or more nucleic acid components and upon binding of the said complex to the locus of interest the effector protein induces the modification of the target locus of interest. In certain embodiments, the effector protein may be a Cas13a, Cas13b, Cas13c, or Cas13d effector protein, preferably a Cas13b effector protein.

The invention also provides in a further aspect a non-naturally occurring or engineered composition, such as particularly a composition capable of or configured to modify a target locus of interest, said composition comprising: (a) a guide RNA molecule (or a combination of guide RNA molecules, e.g., a first guide RNA molecule and a second guide RNA molecule) or a nucleic acid encoding the guide RNA molecule (or one or more nucleic acids encoding the combination of guide RNA molecules); (b) a Cas13 effector protein. In certain embodiments, the effector protein may be a Cas13b effector protein.

The invention also provides in a further aspect a non-naturally occurring or engineered composition comprising: (I.) one or more CRISPR-Cas system polynucleotide sequences comprising (a) a guide sequence capable of hybridizing to a target sequence in a polynucleotide locus, (b) a tracr mate (i.e. direct repeat) sequence, and (II.) a second polynucleotide sequence encoding a Cas13 effector protein, wherein when transcribed, the guide sequence directs sequence-specific binding of a CRISPR complex to the target sequence, and wherein the CRISPR complex comprises the Cas13 effector protein complexed with the guide sequence that is hybridized to the target sequence. In certain embodiments, the effector protein may be a Cas13b effector protein.

In certain embodiments, a tracrRNA may not be required. Hence, the invention also provides in certain embodiments a non-naturally occurring or engineered composition comprising: (I.) one or more CRISPR-Cas system polynucleotide sequences comprising (a) a guide sequence capable of hybridizing to a target sequence in a polynucleotide locus, and (b) a direct repeat sequence, and (II.) a second polynucleotide sequence encoding a Cas13 effector protein, wherein when transcribed, the guide sequence directs sequence-specific binding of a CRISPR complex to the target sequence, and wherein the CRISPR complex comprises the Cas13 effector protein complexed with (1) the guide sequence that is hybridized to the target sequence, and (2) the direct repeat sequence. Preferably, the effector protein may be a Cas13b effector protein. Without limitation, the Applicants hypothesize that in such instances, the direct repeat sequence may comprise secondary structure that is sufficient for crRNA loading onto the effector protein. By means of example and not limitation, such secondary structure may comprise, consist essentially of or consist of a stem loop (such as one or more stem loops) within the direct repeat.

The invention also provides a vector system comprising one or more vectors, the one or more vectors comprising one or more polynucleotide molecules encoding components of a non-naturally occurring or engineered composition which is a composition having the characteristics as defined in any of the herein described methods.

The invention also provides a delivery system comprising one or more vectors or one or more polynucleotide molecules, the one or more vectors or polynucleotide molecules comprising one or more polynucleotide molecules encoding components of a non-naturally occurring or engineered composition which is a composition having the characteristics discussed herein or as defined in any of the herein described methods.

The invention also provides a non-naturally occurring or engineered composition, or one or more polynucleotides encoding components of said composition, or vector or delivery systems comprising one or more polynucleotides encoding components of said composition for use in a therapeutic method of treatment. The therapeutic method of treatment may comprise gene or genome editing, or gene therapy.

The invention also provides for methods and compositions wherein one or more amino acid residues of the effector protein may be modified, e.g., an engineered or non-naturally-occurring Cas13 effector protein of or comprising or consisting of or consisting essentially of a Tables 1-4 protein. In an embodiment, the modification may comprise mutation of one or more amino acid residues of the effector protein. The one or more mutations may be in one or more catalytically active domains of the effector protein. The effector protein may have reduced or abolished nuclease activity compared with an effector protein lacking said one or more mutations. The effector protein may not direct cleavage of one RNA strand at the target locus of interest. In a preferred embodiment, the one or more mutations may comprise two mutations. In a preferred embodiment the one or more amino acid residues are modified in the Cas13 effector protein, e.g., an engineered or non-naturally-occurring Cas13 effector protein. In certain embodiments of the invention the effector protein comprises one or more HEPN domains. In a preferred embodiment, the effector protein comprises two HEPN domains. In another preferred embodiment, the effector protein comprises one HEPN domain at the C-terminus and another HEPN domain at the N-terminus of the protein. In certain embodiments, the one or more mutations or the two or more mutations may be in a catalytically active domain of the effector protein comprising a HEPN domain, or a catalytically active domain which is homologous to a HEPN domain. In certain embodiments, the effector protein comprises one or more of the following mutations: R116A, H121A, R1177A, H1182A (wherein amino acid positions correspond to amino acid positions of the Group 29 protein originating from Bergeyella zoohelcum ATCC 43767). The skilled person will understand that corresponding amino acid positions in different Cas13 proteins may be mutated to the same effect.
In certain embodiments, one or more mutations abolish catalytic activity of the protein completely or partially (e.g. altered cleavage rate, altered specificity, etc.). In certain embodiments, the effector protein as described herein is a “dead” effector protein, such as a dead Cas13 effector protein (e.g. dCas13b). In certain embodiments, the effector protein has one or more mutations in HEPN domain 1. In certain embodiments, the effector protein has one or more mutations in HEPN domain 2. In certain embodiments, the effector protein has one or more mutations in HEPN domain 1 and HEPN domain 2. The effector protein may comprise one or more heterologous functional domains. The one or more heterologous functional domains may comprise one or more nuclear localization signal (NLS) domains. The one or more heterologous functional domains may comprise two or more NLS domains. The one or more NLS domain(s) may be positioned at or near or in proximity to a terminus of the effector protein (e.g., Cas13b effector protein) and if two or more NLSs, each of the two may be positioned at or near or in proximity to a terminus of the effector protein (e.g., Cas13 effector protein). The one or more heterologous functional domains may comprise one or more transcriptional activation domains. In a preferred embodiment the transcriptional activation domain may comprise VP64. The one or more heterologous functional domains may comprise one or more transcriptional repression domains. In a preferred embodiment the transcriptional repression domain comprises a KRAB domain or a SID domain (e.g. SID4X). The one or more heterologous functional domains may comprise one or more nuclease domains. In a preferred embodiment a nuclease domain comprises FokI.
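By way of illustration only, the point-mutation notation used above (e.g., R116A, denoting replacement of arginine at position 116 with alanine) may be handled as sketched below. The helper names and the six-residue toy sequence are hypothetical and do not represent any Cas13 sequence of Tables 1-4; locating corresponding positions in a different Cas13 protein would additionally require a sequence alignment, which is not shown.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'R116A' into ('R', 116, 'A')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(sequence, labels):
    """Return a copy of `sequence` with each labelled substitution applied.

    Raises ValueError if a position is out of range or if the residue found
    there does not match the wild-type residue named in the label.
    """
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: found {residues[position - 1]} at that position")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy sequence for illustration only; not a real Cas13 sequence.
toy_sequence = "MKRAHL"
toy_mutant = apply_mutations(toy_sequence, ["R3A", "H5A"])
```

Checking the wild-type residue before substituting guards against applying a label numbered for one ortholog to the sequence of another.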

The invention also provides for the one or more heterologous functional domains to have one or more of the following activities: methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity and nucleic acid binding activity. At least one heterologous functional domain may be at or near the amino-terminus of the effector protein and/or at least one heterologous functional domain may be at or near the carboxy-terminus of the effector protein. The one or more heterologous functional domains may be fused to the effector protein. The one or more heterologous functional domains may be tethered to the effector protein. The one or more heterologous functional domains may be linked to the effector protein by a linker moiety.

In certain embodiments, the Cas13 effector proteins as intended herein may be associated with a locus comprising short CRISPR repeats between 30 and 40 bp long, more typically between 34 and 38 bp long, even more typically between 36 and 37 bp long, e.g., 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 bp long. In certain embodiments the CRISPR repeats are long or dual repeats between 80 and 350 bp long such as between 80 and 200 bp long, even more typically between 86 and 88 bp long, e.g., 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90 bp long.

In certain embodiments, a protospacer adjacent motif (PAM) or PAM-like motif directs binding of the effector protein (e.g. a Cas13 effector protein) complex as disclosed herein to the target locus of interest. In some embodiments, the PAM may be a 5′ PAM (i.e., located upstream of the 5′ end of the protospacer). In other embodiments, the PAM may be a 3′ PAM (i.e., located downstream of the 3′ end of the protospacer). In other embodiments, both a 5′ PAM and a 3′ PAM are required. In certain embodiments of the invention, a PAM or PAM-like motif may not be required for directing binding of the effector protein (e.g. a Cas13 effector protein). In certain embodiments, a 5′ PAM is D (i.e., A, G, or U). In certain embodiments, a 5′ PAM is D for Cas13b effectors. In certain embodiments of the invention, cleavage at repeat sequences may generate crRNAs (e.g. short or long crRNAs) containing a full spacer sequence flanked by a short nucleotide (e.g. 5, 6, 7, 8, 9, or 10 nt or longer if it is a dual repeat) repeat sequence at the 5′ end (this may be referred to as a crRNA “tag”) and the rest of the repeat at the 3′ end. In certain embodiments, targeting by the effector proteins described herein may require the lack of homology between the crRNA tag and the target 5′ flanking sequence. This requirement may be similar to that described further in Samai et al. “Co-transcriptional DNA and RNA Cleavage during Type III CRISPR-Cas Immunity” Cell 161, 1164-1174, May 21, 2015, where the requirement is thought to distinguish bona fide targets on invading nucleic acids from the CRISPR array itself, and where the presence of repeat sequences will lead to full homology with the crRNA tag and prevent autoimmunity.
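By way of illustration only, a check for a 5′ PAM of D (A, G, or U) may be sketched as follows. The function name, the 0-based indexing, the toy target sequence, and the assumption that the motif is the single nucleotide immediately upstream of the protospacer are choices made for this example and are not taken from the disclosure.

```python
# IUPAC code D: A, G, or U (i.e., any RNA base except C).
D_BASES = frozenset("AGU")


def has_5prime_d(target_rna, protospacer_start):
    """Return True if the base immediately 5' of the protospacer is A, G, or U.

    target_rna: target RNA sequence written 5' to 3'.
    protospacer_start: 0-based index of the first protospacer base.
    """
    if not 0 <= protospacer_start < len(target_rna):
        raise ValueError("protospacer_start lies outside the target sequence")
    if protospacer_start == 0:
        # No flanking base is available upstream of the protospacer.
        return False
    return target_rna[protospacer_start - 1].upper() in D_BASES


# Toy target sequence for illustration only.
toy_target = "CCAGGUACGU"
flank_is_a = has_5prime_d(toy_target, 3)   # flanking base is A
flank_is_c = has_5prime_d(toy_target, 2)   # flanking base is C
```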

In certain embodiments, the Cas13 effector protein is engineered and can comprise one or more mutations that reduce or eliminate nuclease activity, thereby reducing or eliminating RNA-interfering activity. Mutations can also be made at neighboring residues, e.g., at amino acids near those that participate in the nuclease activity. In some embodiments, one or more putative catalytic nuclease domains are inactivated and the effector protein complex lacks cleavage activity and functions as an RNA binding complex. In a preferred embodiment, the resulting RNA binding complex may be linked with one or more functional domains as described herein.

In certain embodiments, the one or more functional domains are controllable, i.e. inducible.

In certain embodiments of the invention, the guide RNA or mature crRNA comprises, consists essentially of, or consists of a direct repeat sequence and a guide sequence or spacer sequence. In certain embodiments, the guide RNA or mature crRNA comprises, consists essentially of, or consists of a direct repeat sequence linked to a guide sequence or spacer sequence. In preferred embodiments of the invention, the mature crRNA comprises a stem loop or an optimized stem loop structure or an optimized secondary structure. In preferred embodiments the mature crRNA comprises a stem loop or an optimized stem loop structure in the direct repeat sequence, wherein the stem loop or optimized stem loop structure is important for cleavage activity. In certain embodiments, the mature crRNA preferably comprises a single stem loop. In certain embodiments, the direct repeat sequence preferably comprises a single stem loop. In certain embodiments, the cleavage activity of the effector protein complex is modified by introducing mutations that affect the stem loop RNA duplex structure. In preferred embodiments, mutations which maintain the RNA duplex of the stem loop may be introduced, whereby the cleavage activity of the effector protein complex is maintained. In other preferred embodiments, mutations which disrupt the RNA duplex structure of the stem loop may be introduced, whereby the cleavage activity of the effector protein complex is completely abolished.
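As a non-limiting illustration, whether a given direct repeat mutation maintains or disrupts the stem loop duplex can be checked computationally. The following Python sketch assumes the paired positions of the stem are already known (supplied here as hypothetical 0-based index pairs) and assumes that G-U wobble pairs count as maintaining the duplex; neither assumption is taken from the disclosure, and the example sequence is invented.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def stem_is_maintained(rna: str, stem_pairs) -> bool:
    """Return True if every expected stem position pair can still base-pair.

    stem_pairs is a list of (i, j) 0-based index pairs that form the stem.
    """
    rna = rna.upper().replace("T", "U")
    return all((rna[i], rna[j]) in ALLOWED_PAIRS for i, j in stem_pairs)


# Invented 12-nt hairpin: a 4-bp stem closed by a 4-nt loop.
EXAMPLE_PAIRS = [(0, 11), (1, 10), (2, 9), (3, 8)]
```

With this sketch, a compensatory double mutation (both partners of a pair changed so that they still pair) is reported as maintaining the duplex, while a single mutation on one strand of the stem is reported as disrupting it.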

The CRISPR system as provided herein can make use of a crRNA or analogous polynucleotide comprising a guide sequence, wherein the polynucleotide is an RNA, a DNA or a mixture of RNA and DNA, and/or wherein the polynucleotide comprises one or more nucleotide analogs. The sequence can comprise any structure, including but not limited to a structure of a native crRNA, such as a bulge, a hairpin or a stem loop structure. In certain embodiments, the polynucleotide comprising the guide sequence forms a duplex with a second polynucleotide sequence which can be an RNA or a DNA sequence.

The present disclosure also provides cells, tissues, organisms comprising the engineered CRISPR-Cas protein, the CRISPR-Cas systems, the polynucleotides encoding one or more components of the CRISPR-Cas systems, and/or vectors comprising the polynucleotides. The invention also provides for the nucleotide sequence encoding the effector protein being codon optimized for expression in a eukaryote or eukaryotic cell in any of the herein described methods or compositions. In an embodiment of the invention, the codon optimized effector protein is any Cas13 effector protein discussed herein and is codon optimized for operability in a eukaryotic cell or organism, e.g., such cell or organism as elsewhere herein mentioned, for instance, without limitation, a yeast cell, or a mammalian cell or organism, including a mouse cell, a rat cell, and a human cell or non-human eukaryote organism, e.g., plant.

In certain embodiments of the invention, at least one nuclear localization signal (NLS) is attached to the nucleic acid sequences encoding the Cas13 effector proteins. In preferred embodiments at least one or more C-terminal or N-terminal NLSs are attached (and hence nucleic acid molecule(s) coding for the Cas13 effector protein can include coding for NLS(s) so that the expressed product has the NLS(s) attached or connected). In a preferred embodiment a C-terminal NLS is attached for optimal expression and nuclear targeting in eukaryotic cells, preferably human cells. The invention also encompasses methods for delivering multiple nucleic acid components, wherein each nucleic acid component is specific for a different target locus of interest thereby modifying multiple target loci of interest. The nucleic acid component of the complex may comprise one or more protein-binding RNA aptamers. The one or more aptamers may be capable of binding a bacteriophage coat protein.

In a further aspect, the invention provides a eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to any of the herein described methods. A further aspect provides a cell line of said cell. Another aspect provides a multicellular organism comprising one or more said cells.

In certain embodiments, the modification of the target locus of interest may result in: the eukaryotic cell comprising altered expression of at least one gene product; the eukaryotic cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is increased; the eukaryotic cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is decreased; or the eukaryotic cell comprising an edited genome.

In certain embodiments, the eukaryotic cell may be a mammalian cell or a human cell.

In further embodiments, the non-naturally occurring or engineered compositions, the vector systems, or the delivery systems as described in the present specification may be used for: site-specific gene knockout; site-specific genome editing; RNA sequence-specific interference; or multiplexed genome engineering.

Also provided is a gene product from the cell, the cell line, or the organism as described herein. In certain embodiments, the amount of gene product expressed may be greater than or less than the amount of gene product from a cell that does not have altered expression or edited genome. In certain embodiments, the gene product may be altered in comparison with the gene product from a cell that does not have altered expression or edited genome.

In another aspect, the invention provides a method for identifying novel nucleic acid modifying effectors, comprising: identifying putative nucleic acid modifying loci from a set of nucleic acid sequences encoding the putative nucleic acid modifying enzyme loci that are within a defined distance from a conserved genomic element of the loci, that comprise at least one protein above a defined size limit, or both; grouping the identified putative nucleic acid modifying loci into subsets comprising homologous proteins; and identifying a final set of candidate nucleic acid modifying loci by selecting nucleic acid modifying loci from one or more subsets based on one or more of the following: subsets comprising loci with putative effector proteins with low domain homology matches to known protein domains relative to loci in other subsets, subsets comprising putative proteins with minimal distances to the conserved genomic element relative to loci in other subsets, subsets with loci comprising large effector proteins having the same orientation as putative adjacent accessory proteins relative to large effector proteins in other subsets, subsets comprising putative effector proteins with lower existing nucleic acid modifying classifications relative to other loci, subsets comprising loci with a lower proximity to known nucleic acid modifying loci relative to other subsets, and total number of candidate loci in each subset.

In one embodiment, the set of nucleic acid sequences is obtained from a genomic or metagenomic database, such as a genomic or metagenomic database comprising prokaryotic genomic or metagenomic sequences.

In one embodiment, the defined distance from the conserved genomic element is between 1 kb and 25 kb.

In one embodiment, the conserved genomic element comprises a repetitive element, such as a CRISPR array. In a specific embodiment, the defined distance from the conserved genomic element is within 10 kb of the CRISPR array.

In one embodiment, the defined size limit of a protein comprised within the putative nucleic acid modifying (effector) locus is greater than 200 amino acids, or more particularly, the defined size limit is greater than 700 amino acids. In one embodiment, the putative nucleic acid modifying locus comprises a protein of between 900 and 1800 amino acids.

In one embodiment, the conserved genomic elements are identified using a repeat or pattern finding analysis of the set of nucleic acids, such as PILER-CR.

In one embodiment, the grouping step of the method described herein is based, at least in part, on results of a domain homology search or an HHpred protein domain homology search.

In one embodiment, the defined threshold is a BLAST nearest-neighbor cut-off value of 0 to 1e-7.

In one embodiment, the method described herein further comprises a filtering step that includes only loci with putative proteins between 900 and 1800 amino acids.

In one embodiment, the method described herein further comprises experimental validation of the nucleic acid modifying function of the candidate nucleic acid modifying effectors comprising generating a set of nucleic acid constructs encoding the nucleic acid modifying effectors and performing one or more biochemical validation assays, such as through the use of PAM validation in bacterial colonies, in vitro cleavage assays, the Surveyor method, experiments in mammalian cells, PFS validation, or a combination thereof.

In one embodiment, the method described herein further comprises preparing a non-naturally occurring or engineered composition comprising one or more proteins from the identified nucleic acid modifying loci.

In one embodiment, the identified loci comprise a Class 2 CRISPR effector, or the identified loci lack Cas1 or Cas2, or the identified loci comprise a single effector.

In one embodiment, the single large effector protein is greater than 900, or greater than 1100 amino acids in length, or comprises at least one HEPN domain.

In one embodiment, the at least one HEPN domain is near a N- or C-terminus of the effector protein, or is located in an interior position of the effector protein.

In one embodiment, the single large effector protein comprises a HEPN domain at the N- and C-terminus and two HEPN domains internal to the protein.

In one embodiment, the identified loci further comprise one or two small putative accessory proteins within 2 kb to 10 kb of the CRISPR array.

In one embodiment, a small accessory protein is less than 700 amino acids. In one embodiment, the small accessory protein is from 50 to 300 amino acids in length.

In one embodiment, the small accessory protein comprises multiple predicted transmembrane domains, or comprises four predicted transmembrane domains, or comprises at least one HEPN domain.

In one embodiment, the small accessory protein comprises at least one HEPN domain and at least one transmembrane domain.

In one embodiment, the loci comprise no additional proteins out to 25 kb from the CRISPR array.

In one embodiment, the CRISPR array comprises direct repeat sequences comprising about 36 nucleotides in length. In a specific embodiment, the direct repeat comprises a GTTG/GUUG at the 5′ end that is reverse complementary to a CAAC at the 3′ end.
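For illustration, the end complementarity described for the direct repeat (a 5′ GTTG/GUUG that is the reverse complement of a 3′ CAAC) can be tested with a short Python sketch. The function names and the default window of four nucleotides are assumptions made here, and the example repeats used to exercise the function are invented rather than taken from any sequence of the disclosure.

```python
# RNA complement table; DNA-style input is converted to RNA before use.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.translate(_COMPLEMENT)[::-1]


def ends_are_reverse_complementary(repeat: str, k: int = 4) -> bool:
    """Return True if the first k bases are the reverse complement of the last k."""
    rna = repeat.upper().replace("T", "U")
    return rna[:k] == reverse_complement(rna[-k:])
```

A 36-nt repeat beginning with GTTG and ending with CAAC satisfies the check, since the reverse complement of CAAC is GUUG.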

In one embodiment, the CRISPR array comprises spacer sequences comprising about 30 nucleotides in length.

In one embodiment, the identified loci lack a small accessory protein.

The invention provides a method of identifying novel CRISPR effectors, comprising: a) identifying sequences in a genomic or metagenomic database encoding a CRISPR array; b) identifying one or more Open Reading Frames (ORFs) in said selected sequences within 10 kb of the CRISPR array; c) selecting loci based on the presence of a putative CRISPR effector protein between 900-1800 amino acids in size, d) selecting loci encoding a putative accessory protein of 50-300 amino acids; and e) identifying loci encoding a putative CRISPR effector and CRISPR accessory proteins and optionally classifying them based on structure analysis.
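Steps b) to d) of the method above can be sketched, purely as an illustration, as a filter over pre-annotated loci. The `Orf` and `Locus` data structures, the coordinate convention, and the decision to require both an effector-sized and an accessory-sized ORF are assumptions of this sketch (other embodiments herein contemplate loci lacking an accessory protein); CRISPR array detection and ORF calling in step a) are presumed to have been done by separate tools.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Orf:
    start: int      # nucleotide coordinate of ORF start on the contig
    end: int        # nucleotide coordinate of ORF end on the contig
    length_aa: int  # length of the encoded protein in amino acids


@dataclass
class Locus:
    name: str
    array_start: int  # CRISPR array start coordinate
    array_end: int    # CRISPR array end coordinate
    orfs: List[Orf]


def distance_to_array(orf: Orf, locus: Locus) -> int:
    """Gap in nucleotides between an ORF and the CRISPR array (0 if they overlap)."""
    if orf.end < locus.array_start:
        return locus.array_start - orf.end
    if orf.start > locus.array_end:
        return orf.start - locus.array_end
    return 0


def select_candidate_loci(
    loci: List[Locus],
    max_distance: int = 10_000,                 # within 10 kb of the array
    effector_aa: Tuple[int, int] = (900, 1800),  # putative effector size range
    accessory_aa: Tuple[int, int] = (50, 300),   # putative accessory size range
) -> List[str]:
    """Return names of loci with an effector-sized and an accessory-sized ORF nearby."""
    selected = []
    for locus in loci:
        nearby = [o for o in locus.orfs if distance_to_array(o, locus) <= max_distance]
        has_effector = any(effector_aa[0] <= o.length_aa <= effector_aa[1] for o in nearby)
        has_accessory = any(accessory_aa[0] <= o.length_aa <= accessory_aa[1] for o in nearby)
        if has_effector and has_accessory:
            selected.append(locus.name)
    return selected
```

The size and distance thresholds are exposed as parameters so that the other ranges recited herein (for example, a 25 kb window) could be substituted without changing the logic.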

In one embodiment, the CRISPR effector is a Type VI CRISPR effector. In an embodiment, step (a) comprises i) comparing sequences in a genomic and/or metagenomic database with at least one pre-identified seed sequence that encodes a CRISPR array, and selecting sequences comprising said seed sequence; or ii) identifying CRISPR arrays based on a CRISPR algorithm.

In an embodiment, step (d) comprises identifying nuclease domains. In an embodiment, step (d) comprises identifying RuvC, HPN, and/or HEPN domains.

In an embodiment, no ORF encoding Cas1 or Cas2 is present within 10 kb of the CRISPR array.

In an embodiment, an ORF in step (b) encodes a putative accessory protein of 50-300 amino acids.

In an embodiment, putative novel CRISPR effectors obtained in step (d) are used as seed sequences for further comparing genomic and/or metagenomic sequences and subsequently selecting loci of interest as described in steps a) to d) of claim 1. In an embodiment, the pre-identified seed sequence is obtained by a method comprising: (a) identifying CRISPR motifs in a genomic or metagenomic database, (b) extracting multiple features in said identified CRISPR motifs, (c) classifying the CRISPR loci using unsupervised learning, (d) identifying conserved locus elements based on said classification, and (e) selecting therefrom a putative CRISPR effector suitable as seed sequence.

In an embodiment, the features include protein elements, repeat structure, repeat sequence, spacer sequence and spacer mapping. In an embodiment, the genomic and metagenomic databases are bacterial and/or archaeal genomes. In an embodiment, the genomic and metagenomic sequences are obtained from the Ensembl and/or NCBI genome databases. In an embodiment, the structure analysis in step (d) is based on secondary structure prediction and/or sequence alignments. In an embodiment, step (d) is achieved by clustering of the remaining loci based on the proteins they encode and manual curation of the obtained clusters.

In another aspect, the disclosure provides a mutated Cas13 protein comprising one or more mutations of amino acids, wherein the amino acids: interact with a guide RNA that forms a complex with the mutated Cas13 protein; or are in a HEPN active site, a lid domain which is a domain that caps the 3′ end of the crRNA with two beta hairpins (see, e.g., FIG. 1, FIG. 18), a helical domain, selected from a helical 1 or a helical 2 domain, an inter-domain linker (IDL) domain, or a bridge helix domain of the engineered Cas13 protein. In certain embodiments the helical domain 1 is helical domain 1-1, 1-2 or 1-3. In embodiments helical domain 2 is helical domain 2-1 or 2-2. In one aspect, the engineered Cas13 protein has a higher protease activity or polynucleotide-binding capability compared with a naturally-occurring counterpart Cas13 protein.

In some embodiments, the Cas13 protein is Cas13a, Cas13b, Cas13c, or Cas13d. In some embodiments, the Cas13 protein is Cas13b. In some embodiments, the amino acids interact with the guide RNA that forms a complex with the mutated Cas13 protein. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, and R877. In some embodiments, the amino acids are in a HEPN active site. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): amino acids 46-57, 73-79, 152-164, 1036-1046, and 1064-1074. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R156, N157, H161, R1068, N1069, and H1073. In some embodiments, the amino acids are in the inter-domain linker domain of the mutated Cas13 protein. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R285, R287, K292, K294, E296, and N297. In some embodiments, the amino acids are in the bridge helix domain of the mutated Cas13 protein. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K826, K828, K829, R824, R830, Q831, K835, K836, and R838.

In another aspect, the disclosure provides a method of altering activity of a Cas13 protein, comprising: identifying one or more candidate amino acids in the Cas13 protein based on a three-dimensional structure of at least a portion of the Cas13 protein, wherein the one or more candidate amino acids interact with a guide RNA that forms a complex with the Cas13 protein, or are in a HEPN active site, an inter-domain linker domain, or a bridge helix domain of the Cas13 protein; and mutating the one or more candidate amino acids thereby generating a mutated Cas13 protein, wherein activity of the mutated Cas13 protein is different from that of the Cas13 protein.

In some embodiments, the Cas13 protein is Cas13a, Cas13b, Cas13c, or Cas13d. In some embodiments, the Cas13 protein is Cas13b. In some embodiments, the amino acids interact with the guide RNA that forms a complex with the mutated Cas13 protein. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, and R877. In some embodiments, the amino acids are in a HEPN active site. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): amino acids 46-57, 73-79, 152-164, 1036-1046, and 1064-1074. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R156, N157, H161, R1068, N1069, and H1073. In some embodiments, the amino acids are in the inter-domain linker domain of the mutated Cas13 protein. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R285, R287, K292, K294, E296, and N297. In some embodiments, the amino acids are in the bridge helix domain of the mutated Cas13 protein. In some embodiments, the amino acids correspond to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K826, K828, K829, R824, R830, Q831, K835, K836, and R838.

In some embodiments, the Cas13 protein is Cas13b. In some embodiments, the Cas13b is a Cas13 ortholog smaller in size than Cas13 systems discovered to date. In some embodiments, the Cas13b is Cas13b-t1, Cas13b-t1a, Cas13b-t2, or Cas13b-t3. In some embodiments, the Cas13b is Cas13b-t1. In some embodiments, the Cas13b is Cas13b-t1a. In some embodiments, the Cas13b is Cas13b-t2. In some embodiments, the Cas13b is Cas13b-t3.

CAS13 ORTHOLOGS

The terms “orthologue” (also referred to as “ortholog” herein) and “homologue” (also referred to as “homolog” herein) are well known in the art. By means of further guidance, a “homologue” of a protein as used herein is a protein of the same species which performs the same or a similar function as the protein it is a homologue of. Homologous proteins may but need not be structurally related, or are only partially structurally related. An “orthologue” of a protein as used herein is a protein of a different species which performs the same or a similar function as the protein it is an orthologue of. Orthologous proteins may but need not be structurally related, or are only partially structurally related. In particular embodiments, the homologue or orthologue of a Cas13 protein as referred to herein has a sequence homology or identity of at least 60%, preferably at least 70%, preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, such as for instance at least 95% with a Cas13 effector protein set forth in Tables 1-4, below. In a preferred embodiment, the Cas13b effector protein may be of or from an organism identified in Tables 1-4 or the genus to which the organism belongs.

It has been found that a number of Cas13 orthologs are characterized by common motifs. Accordingly, in particular embodiments, the Cas13b effector protein is a protein comprising a sequence having at least 70% sequence identity with one or more of the sequences consisting of DKHXFGAFLNLARHN (SEQ ID NO:96), GLLFFVSLFLDK (SEQ ID NO:97), SKIXGFK (SEQ ID NO:98), DMLNELXRCP (SEQ ID NO:99), RXZDRFPYFALRYXD (SEQ ID NO:100) and LRFQVBLGXY (SEQ ID NO:101). In further particular embodiments, the Cas13b effector protein comprises a sequence having at least 70% sequence identity with at least 2, 3, 4, 5 or all 6 of these sequences. In further particular embodiments, the sequence identity with these sequences is at least 75%, 80%, 85%, 90%, 95% or 100%. In further particular embodiments, the Cas13b effector protein is a protein comprising a sequence having 100% sequence identity with GLLFFVSLFL (SEQ ID NO:102) and RHQXRFPYF (SEQ ID NO:103). In further particular embodiments, the Cas13b effector is a Cas13b effector protein comprising a sequence having 100% sequence identity with RHQDRFPY (SEQ ID NO:104).
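As a simplified illustration of screening a protein for the motifs above, the following Python sketch scores the best ungapped window of a protein sequence against a motif. It is not a full alignment-based identity calculation. Treating X as matching any residue, and B and Z per the usual IUPAC amino acid ambiguity codes (D/N and E/Q), are assumptions of this sketch rather than definitions taken from the disclosure.

```python
# Ambiguity codes assumed for this sketch (standard IUPAC amino acid usage).
AMBIGUOUS = {"B": frozenset("DN"), "Z": frozenset("EQ")}


def residue_matches(motif_res: str, protein_res: str) -> bool:
    """Return True if a protein residue satisfies one motif position."""
    if motif_res == "X":
        return True  # X is treated as a wildcard
    if motif_res in AMBIGUOUS:
        return protein_res in AMBIGUOUS[motif_res]
    return motif_res == protein_res


def best_window_identity(protein: str, motif: str) -> float:
    """Return the highest fraction of matching positions over all ungapped windows."""
    protein, motif = protein.upper(), motif.upper()
    width = len(motif)
    best = 0.0
    for i in range(len(protein) - width + 1):
        window = protein[i:i + width]
        matches = sum(residue_matches(m, p) for m, p in zip(motif, window))
        best = max(best, matches / width)
    return best
```

For example, a fragment containing GLLFFVSLFLDK scores 1.0 against SEQ ID NO:97 and would satisfy the 70% threshold, whereas a fragment whose best window matches only 7 of 12 positions would not.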

In particular embodiments, the Cas13b effector protein is a Cas13b effector protein having at least 65%, preferably at least 70%, 75%, 80%, 85%, 90%, 95% or more sequence identity with a Cas13b protein from Prevotella buccae, Porphyromonas gingivalis, Prevotella saccharolytica, or Riemerella anatipestifer. In further particular embodiments, the Cas13b effector is selected from the Cas13b protein from Bacteroides pyogenes, Prevotella sp. MA2016, Riemerella anatipestifer, Porphyromonas gulae, Porphyromonas gingivalis, and Porphyromonas sp. COT-052 OH4946.

It will be appreciated that orthologs of a Table 1 Cas13b enzyme that can be within the invention can include a chimeric enzyme comprising fragments of Table 1 Cas13b enzymes of multiple orthologs. Examples of such orthologs are described elsewhere herein. A chimeric enzyme may comprise a fragment of a Table 1 Cas13b enzyme and a fragment from another CRISPR enzyme, such as an ortholog of a Table 1 Cas13b enzyme of an organism which includes but is not limited to Bergeyella, Prevotella, Porphyromonas, Bacteroides, Alistipes, Riemerella, Myroides, Flavobacterium, Capnocytophaga, Chryseobacterium, Phaeodactylibacter, Paludibacter or Psychroflexus. A chimeric enzyme can comprise a first fragment and a second fragment, wherein one of the first and second fragments is of or from a Table 1 Cas13b enzyme and the other fragment is of or from a CRISPR enzyme ortholog of a different species. In some cases, Cas13b is Cas13b-t. For example, Cas13b may be Cas13b-t1 (e.g., Cas13b-t1a), Cas13b-t2, or Cas13b-t3 (see, e.g. FIGS. 54A-54C).

In embodiments, the Cas13 RNA-targeting effector proteins referred to herein also encompass a functional variant of the effector protein or a homologue or an orthologue thereof. A “functional variant” of a protein as used herein refers to a variant of such protein which retains at least partially the activity of that protein. Functional variants may include mutants (which may be insertion, deletion, or replacement mutants), including polymorphs, etc., including as discussed herein in conjunction with Table 1. Also included within functional variants are fusion products of such protein with another, usually unrelated, nucleic acid, protein, polypeptide or peptide. Functional variants may be naturally occurring or may be man-made. In an embodiment, nucleic acid molecule(s) encoding the Cas13 RNA-targeting effector proteins, or an ortholog or homolog thereof, may be codon-optimized for expression in a eukaryotic cell. A eukaryote can be as herein discussed. Nucleic acid molecule(s) can be engineered or non-naturally occurring.

In an embodiment, the Cas13 RNA-targeting effector protein or an ortholog or homolog thereof, may comprise one or more mutations. The mutations may be artificially introduced mutations and may include but are not limited to one or more mutations in a catalytic domain, e.g., one or more mutations are introduced into one or more of the HEPN domains.

In an embodiment, the Cas13 protein or an ortholog or homolog thereof, may be used as a generic nucleic acid binding protein with fusion to or being operably linked to a functional domain. Exemplary functional domains may include but are not limited to translational initiator, translational activator, translational repressor, nucleases, in particular ribonucleases, a spliceosome, beads, a light inducible/controllable domain or a chemically inducible/controllable domain.

In an advantageous embodiment, the present invention encompasses Cas13 effector proteins with reference to Tables 1-5. In certain example embodiments, the Cas13 effector protein is from an organism identified in Tables 1-5. In certain example embodiments, the Cas13 effector protein is from an organism selected from Bergeyella zoohelcum, Prevotella intermedia, Prevotella buccae, Porphyromonas gingivalis, Bacteroides pyogenes, Alistipes sp. ZOR0009, Prevotella sp. MA2016, Riemerella anatipestifer, Prevotella aurantiaca, Prevotella saccharolytica, Myroides odoratimimus CCUG 10230, Capnocytophaga canimorsus, Porphyromonas gulae, Prevotella sp. P5-125, Flavobacterium branchiophilum, Myroides odoratimimus, Flavobacterium columnare, or Porphyromonas sp. COT-052 OH4946. In another embodiment, the one or more guide RNAs are designed to bind to one or more target RNA sequences that are diagnostic for a disease state.

In certain example embodiments, the CRISPR effector protein is a Cas13b protein selected from Table 1.

TABLE 1 Bergeyella 1 MENKTSLGNNIYYNPFKPQDKSYFAGYFNAAMENTDSVFRELG zoohelcum KRLKGKEYTSENFFDAIFKENISLVEYERYVKLLSDYFPMARLL (SEQ ID DKKEVPIKERKENFKKNFKGIIKAVRDLRNFYTHKEHGEVEITD No. 105) EIFGVLDEMLKSTVLTVKKKKVKTDKTKEILKKSIEKQLDILCQ KKLEYLRDTARKIEEKRRNQRERGEKELVAPFKYSDKRDDLIA AIYNDAFDVYIDKKKDSLKESSKAKYNTKSDPQQEEGDLKIPIS KNGVVFLLSLFLTKQEIHAFKSKIAGFKATVIDEATVSEATVSHG KNSICFMATHEIFSHLAYKKLKRKVRTAEINYGEAENAEQLSVY AKETLMMQMLDELSKVPDVVYQNLSEDVQKTFIEDWNEYLKE NNGDVGTMEEEQVIHPVIRKRYEDKFNYFAIRFLDEFAQFPTLR FQVHLGNYLHDSRPKENLISDRRIKEKITVFGRLSELEHKKALFI KNTETNEDREHYWEIFPNPNYDFPKENISVNDKDFPIAGSILDRE KQPVAGKIGIKVKLLNQQYVSEVDKAVKAHQLKQRKASKPSIQ NIIEEIVPINESNPKEAIVFGGQPTAYLSMNDIHSILYEFFDKWEK KKEKLEKKGEKELRKEIGKELEKKIVGKIQAQIQQIIDKDTNAKI LKPYQDGNSTAIDKEKLIKDLKQEQNILQKLKDEQTVREKEYN DFIAYQDKNREINKVRDRNHKQYLKDNLKRKYPEAPARKEVL YYREKGKVAVWLANDIKRFMPTDFKNEWKGEQHSLLQKSLAY YEQCKEELKNLLPEKVFQHLPFKLGGYFQQKYLYQFYTCYLDK RLEYISGLVQQAENFKSENKVFKKVENECFKFLKKQNYTHKEL DARVQSILGYPIFLERGFMDEKPTIIKGKTFKGNEALFADWFRY YKEYQNFQTFYDTENYPLVELEKKQADRKRKTKIYQQKKNDV FTLLMAKHIFKSVFKQDSIDQFSLEDLYQSREERLGNQERARQT GERNTNYIWNKTVDLKLCDGKITVENVKLKNVGDFIKYEYDQR VQAFLKYEENIEWQAFLIKESKEEENYPYVVEREIEQYEKVRRE ELLKEVHLIEEYILEKVKDKEILKKGDNQNFKYYILNGLLKQLK NEDVESYKVFNLNTEPEDVNINQLKQEATDLEQKAFVLTYIRN KFAHNQLPKKEFWDYCQEKYGKIEKEKTYAEYFAEVFKKEKE ALIK Prevotella 2 MEDDKKTTDSIRYELKDKHFWAAFLNLARHNVYITVNHINKIL intermedia EEGEINRDGYETTLKNTWNEIKDINKKDRLSKLIIKHFPFLEAAT (SEQ ID YRLNPTDTTKQKEEKQAEAQSLESLRKSFFVFIYKLRDLRNHYS No. 
106) HYKHSKSLERPKFEEGLLEKMYNIFNASIRLVKEDYQYNKDINP DEDFKHLDRTEEEFNYYFTKDNEGNITESGLLFFVSLFLEKKDAI WMQQKLRGFKDNRENKKKMTNEVFCRSRMLLPKLRLQSTQTQ DWILLDMLNELIRCPKSLYERLREEDREKFRVPIEIADEDYDAEQ EPFKNTLVRHQDRFPYFALRYFDYNEIFTNLRFQIDLGTYHFSIY KKQIGDYKESHHLTHKLYGFERIQEFTKQNRPDEWRKFVKTFN SFETSKEPYIPETTPHYHLENQKIGIRFRNDNDKIWPSLKTNSEK NEKSKYKLDKSFQAEAFLSVHELLPMMFYYLLLKTENTDNDNE IETKKKENKNDKQEKHKIEEIIENKITEIYALYDTFANGEIKSIDE LEEYCKGKDIEIGHLPKQMIAILKDEHKVMATEAERKQEEMLV DVQKSLESLDNQINEEIENVERKNSSLKSGKIASWLVNDMMRF QPVQKDNEGKPLNNSKANSTEYQLLQRTLAFFGSEHERLAPYF KQTKLIESSNPHPFLKDTEWEKCNNILSFYRSYLEAKKNFLESLK PEDWEKNQYFLKLKEPKTKPKTLVQGWKNGFNLPRGIFTEPIRK WFMKHRENITVAELKRVGLVAKVIPLFFSEEYKDSVQPFYNYH FNVGNINKPDEKNFLNCEERRELLRKKKDEFKKMTDKEKEENP SYLEFKSWNKFERELRLVRNQDIVTWLLCMELFNKKKIKELNV EKIYLKNINTNTTKKEKNTEEKNGEEKNIKEKNNILNRIMPMRL PIKVYGRENFSKNKKKKIRRNTFFTVYIEEKGTKLLKQGNFKAL ERDRRLGGLFSFVKTPSKAESKSNTISKLRVEYELGEYQKARIEII KDMLALEKTLIDKYNSLDTDNFNKMLTDWLELKGEPDKASFQ NDVDLLIAVRNAFSHNQYPMRNRIAFANINPFSLSSANTSEEKG LGIANQLKDKTHKTIEKIIEIEKPIETKE Prevotella 3 MQKQDKLFVDRKKNAIFAFPKYITIMENKEKPEPIYYELTDKHF buccae WAAFLNLARHNVYTTINHINRRLEIAELKDDGYMMGIKGSWNE (SEQ ID QAKKLDKKVRLRDLIMKHFPFLEAAAYEMTNSKSPNNKEQRE No. 
107) KEQSEALSLNNLKNVLFIFLEKLQVLRNYYSHYKYSEESPKPIFE WP_004343973.1 TSLLKNMYKVFDANVRLVKRDYMHHENIDMQRDFTHLNRKK QVGRTKNIIDSPNFHYHFADKEGNMTIAGLLFFVSLFLDKKDAI WMQKKLKGFKDGRNLREQMTNEVFCRSRISLPKLKLENVQTK DWMQLDMLNELVRCPKSLYERLREKDRESFKVPFDIFSDDYNA EEEPFKNTLVRHQDRFPYFVLRYFDLNEIFEQLRFQIDLGTYHFS IYNKRIGDEDEVRHLTHHLYGFARIQDFAPQNQPEEWRKLVKD LDHFETSQEPYISKTAPHYHLENEKIGIKFCSAHNNLFPSLQTDK TCNGRSKFNLGTQFTAEAFLSVHELLPMMFYYLLLTKDYSRKE SADKVEGIIRKEISNIYAIYDAFANNEINSIADLTRRLQNTNILQG HLPKQMISILKGRQKDMGKEAERKIGEMIDDTQRRLDLLCKQT NQKIRIGKRNAGLLKSGKIADWLVNDMMRFQPVQKDQNNIPIN NSKANSTEYRMLQRALALFGSENFRLKAYFNQMNLVGNDNPH PFLAETQWEHQTNILSFYRNYLEARKKYLKGLKPQNWKQYQH FLILKVQKTNRNTLVTGWKNSFNLPRGIFTQPIREWFEKHNNSK RIYDQILSFDRVGFVAKAIPLYFAEEYKDNVQPFYDYPFNIGNRL KPKKRQFLDKKERVELWQKNKELFKNYPSEKKKTDLAYLDFLS WKKFERELRLIKNQDIVTWLMFKELFNMATVEGLKIGEIHLRDI DTNTANEESNNILNRIMPMKLPVKTYETDNKGNILKERPLATFY IEETETKVLKQGNFKALVKDRRLNGLFSFAETTDLNLEEHPISKL SVDLELIKYQTTRISIFEMTLGLEKKLIDKYSTLPTDSFRNMLER WLQCKANRPELKNYVNSLIAVRNAFSHNQYPMYDATLFAEVK KFTLFPSVDTKKIELNIAPQLLEIVGKAIKEIEKSENKN Porphyromonas 4 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK gingivalis FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF (SEQ ID DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
108) DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFAVFFKPDDFVLA KNRKEQLISVADGKECLTVSGFAFFICLFLDREQASGMLSRIRGF KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLDEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPQSMGFISVHDLRKLLLMELLCEGSFSRMQSDFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMDQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLRKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRRQFRAIV AELRLLDPSSGHPFLSATMETAHRYTEGFYKCYLEKKREWLAK IFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKVMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVRDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDEENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDPENRFFGKLLNNMSQPINDL Bacteroides 5 MESIKNSQKSTGKTLQKDPPYFGLYLNMALLNVRKVENHIRKW pyogenes LGDVALLPEKSGFHSLLTTDNLSSAKWTRFYYKSRKFLPFLEMF (SEQ ID DSDKKSYENRRETAECLDTIDRQKISSLLKEVYGKLQDIRNAFS No. 
109) HYHIDDQSVKHTALIISSEMHRFIENAYSFALQKTRARFTGVFVE TDFLQAEEKGDNKKFFAIGGNEGIKLKDNALIFLICLFLDREEAF KFLSRATGFKSTKEKGFLAVRETFCALCCRQPHERLLSVNPREA LLMDMLNELNRCPDILFEMLDEKDQKSFLPLLGEEEQAHILENS LNDELCEAIDDPFEMIASLSKRVRYKNRFPYLMLRYIEEKNLLPF IRFRIDLGCLELASYPKKMGEENNYERSVTDHAMAFGRLTDFH NEDAVLQQITKGITDEVRFSLYAPRYAIYNNKIGFVRTSGSDKIS FPTLKKKGGEGHCVAYTLQNTKSFGFISIYDLRKILLLSFLDKDK AKNIVSGLLEQCEKHWKDLSENLFDAIRTELQKEFPVPLIRYTLP RSKGGKLVSSKLADKQEKYESEFERRKEKLTEILSEKDFDLSQIP RRMIDEWLNVLPTSREKKLKGYVETLKLDCRERLRVFEKREKG EHPLPPRIGEMATDLAKDIIRMVIDQGVKQRITSAYYSEIQRCLA QYAGDDNRRHLDSIIRELRLKDTKNGHPFLGKVLRPGLGHTEK LYQRYFEEKKEWLEATFYPAASPKRVPRFVNPPTGKQKELPLII RNLMKERPEWRDWKQRKNSHPIDLPSQLFENEICRLLKDKIGKE PSGKLKWNEMFKLYWDKEFPNGMQRFYRCKRRVEVFDKVVE YEYSEEGGNYKKYYEALIDEVVRQKISSSKEKSKLQVEDLTLSV RRVFKRAINEKEYQLRLLCEDDRLLFMAVRDLYDWKEAQLDL DKIDNMLGEPVSVSQVIQLEGGQPDAVIKAECKLKDVSKLMRY CYDGRVKGLMPYFANHEATQEQVEMELRHYEDHRRRVFNWV FALEKSVLKNEKLRRFYEESQGGCEHRRCIDALRKASLVSEEEY EFLVHIRNKSAHNQFPDLEIGKLPPNVTSGFCECIWSKYKAIICRI IPFIDPERRFFGKLLEQK Alistipes 6 MSNEIGAFREHQFAYAPGNEKQEEATFATYFNLALSNVEGMMF sp. GEVESNPDKIEKSLDTLPPAILRQIASFIWLSKEDHPDKAYSTEE ZOR0009 VKVIVTDLVRRLCFYRNYFSHCFYLDTQYFYSDELVDTTAIGEK (SEQ ID LPYNFHHFITNRLFRYSLPEITLFRWNEGERKYEILRDGLIFFCCL No. 
110) FLKRGQAERFLNELRFFKRTDEEGRIKRTIFTKYCTRESHKHIGIE EQDFLIFQDIIGDLNRVPKVCDGVVDLSKENERYIKNRETSNESD ENKARYRLLIREKDKFPYYLMRYIVDFGVLPCITFKQNDYSTKE GRGQFHYQDAAVAQEERCYNFVVRNGNVYYSYMPQAQNVVR ISELQGTISVEELRNMVYASINGKDVNKSVEQYLYHLHLLYEKI LTISGQTIKEGRVDVEDYRPLLDKLLLRPASNGEELRRELRKLLP KRVCDLLSNRFDCSEGVSAVEKRLKAILLRHEQLLLSQNPALHI DKIKSVIDYLYLFFSDDEKFRQQPTEKAHRGLKDEEFQMYHYL VGDYDSHPLALWKELEASGRLKPEMRKLTSATSLHGLYMLCL KGTVEWCRKQLMSIGKGTAKVEAIADRVGLKLYDKLKEYTPE QLEREVKLVVMHGYAAAATPKPKAQAAIPSKLTELRFYSFLGK REMSFAAFIRQDKKAQKLWLRNFYTVENIKTLQKRQAAADAA CKKLYNLVGEVERVHTNDKVLVLVAQRYRERLLNVGSKCAVT LDNPERQQKLADVYEVQNAWLSIRFDDLDFTLTHVNLSNLRKA YNLIPRKHILAFKEYLDNRVKQKLCEECRNVRRKEDLCTCCSPR YSNLTSWLKENHSESSIEREAATMMLLDVERKLLSFLLDERRKA IIEYGKFIPFSALVKECRLADAGLCGIRNDVLHDNVISYADAIGK LSAYFPKEASEAVEYIRRTKEVREQRREELMANSSQ Prevotella 7a MSKECKKQRQEKKRRLQKANFSISLTGKHVFGAYFNMARTNF sp. VKTINYILPIAGVRGNYSENQINKMLHALFLIQAGRNEELTTEQK MA2016 QWEKKLRLNPEQQTKFQKLLFKHFPVLGPMMADVADHKAYL (SEQ ID NKKKSTVQTEDETFAMLKGVSLADCLDIICLMADTLTECRNFY No. 
111) THKDPYNKPSQLADQYLHQEMIAKKLDKVVVASRRILKDREGL SVNEVEFLTGIDHLHQEVLKDEFGNAKVKDGKVMKTFVEYDD FYFKISGKRLVNGYTVTTKDDKPVNVNTMLPALSDFGLLYFCV LFLSKPYAKLFIDEVRLFEYSPFDDKENMIMSEMLSIYRIRTPRL HKIDSHDSKATLAMDIFGELRRCPMELYNLLDKNAGQPFFHDE VKHPNSHTPDVSKRLRYDDRFPTLALRYIDETELFKRIRFQLQL GSFRYKFYDKENCIDGRVRVRRIQKEINGYGRMQEVADKRMD KWGDLIQKREERSVKLEHEELYINLDQFLEDTADSTPYVTDRRP AYNIHANRIGLYWEDSQNPKQYKVFDENGMYIPELVVTEDKKA PIKMPAPRCALSVYDLPAMLFYEYLREQQDNEFPSAEQVIIEYE DDYRKFFKAVAEGKLKPFKRPKEFRDFLKKEYPKLRMADIPKK LQLFLCSHGLCYNNKPETVYERLDRLTLQHLEERELHIQNRLEH YQKDRDMIGNKDNQYGKKSFSDVRHGALARYLAQSMMEWQP TKLKDKEKGHDKLTGLNYNVLTAYLATYGHPQVPEEGFTPRTL EQVLINAHLIGGSNPHPFINKVLALGNRNIEELYLHYLEEELKHI RSRIQSLSSNPSDKALSALPFIHHDRMRYHERTSEEMMALAARY TTIQLPDGLFTPYILEILQKHYTENSDLQNALSQDVPVKLNPTCN AAYLITLFYQTVLKDNAQPFYLSDKTYTRNKDGEKAESFSFKR AYELFSVLNNNKKDTFPFEMIPLFLTSDEIQERLSAKLLDGDGNP VPEVGEKGKPATDSQGNTIWKRRIYSEVDDYAEKLTDRDMKIS FKGEWEKLPRWKQDKIIKRRDETRRQMRDELLQRMPRYIRDIK DNERTLRRYKTQDMVLFLLAEKMFTNIISEQSSEFNWKQMRLS KVCNEAFLRQTLTFRVPVTVGETTIYVEQENMSLKNYGEFYRFL TDDRLMSLLNNIVETLKPNENGDLVIRHTDLMSELAAYDQYRS TIFMLIQSIENLIITNNAVLDDPDADGFWVREDLPKRNNFASLLE LINQLNNVELTDDERKLLVAIRNAFSHNSYNIDFSLIKDVKHLPE VAKGILQHLQSMLGVEITK Prevotella 7b MSKECKKQRQEKKRRLQKANFSISLTGKHVFGAYFNMARTNF sp. VKTINYILPIAGVRGNYSENQINKMLHALFLIQAGRNEELTTEQK MA2016 QWEKKLRLNPEQQTKFQKLLFKHFPVLGPMMADVADHKAYL (SEQ ID NKKKSTVQTEDETFAMLKGVSLADCLDIICLMADTLTECRNFY No. 
112) THKDPYNKPSQLADQYLHQEMIAKKLDKVVVASRRILKDREGL SVNEVEFLTGIDHLHQEVLKDEFGNAKVKDGKVMKTFVEYDD FYFKISGKRLVNGYTVTTKDDKPVNVNTMLPALSDFGLLYFCV LFLSKPYAKLFIDEVRLFEYSPFDDKENMIMSEMLSIYRIRTPRL HKIDSHDSKATLAMDIFGELRRCPMELYNLLDKNAGQPFFHDE VKHPNSHTPDVSKRLRYDDRFPTLALRYIDETELFKRIRFQLQL GSFRYKFYDKENCIDGRVRVRRIQKEINGYGRMQEVADKRMD KWGDLIQKREERSVKLEHEELYINLDQFLEDTADSTPYVTDRRP AYNIHANRIGLYWEDSQNPKQYKVFDENGMYIPELVVTEDKKA PIKMPAPRCALSVYDLPAMLFYEYLREQQDNEFPSAEQVIIEYE DDYRKFFKAVAEGKLKPFKRPKEFRDFLKKEYPKLRMADIPKK LQLFLCSHGLCYNNKPETVYERLDRLTLQHLEERELHIQNRLEH YQKDRDMIGNKDNQYGKKSFSDVRHGALARYLAQSMMEWQP TKLKDKEKGHDKLTGLNYNVLTAYLATYGHPQVPEEGFTPRTL EQVLINAHLIGGSNPHPFINKVLALGNRNIEELYLHYLEEELKHI RSRIQSLSSNPSDKALSALPFIHHDRMRYHERTSEEMMALAARY TTIQLPDGLFTPYILEILQKHYTENSDLQNALSQDVPVKLNPTCN AAYLITLFYQTVLKDNAQPFYLSDKTYTRNKDGEKAESFSFKR AYELFSVLNNNKKDTFPFEMIPLFLTSDEIQERLSAKLLDGDGNP VPEVGEKGKPATDSQGNTIWKRRIYSEVDDYAEKLTDRDMKIS FKGEWEKLPRWKQDKIIKRRDETRRQMRDELLQRMPRYIRDIK DNERTLRRYKTQDMVLFLLAEKMFTNIISEQSSEFNWKQMRLS KVCNEAFLRQTLTFRVPVTVGETTIYVEQENMSLKNYGEFYRFL TDDRLMSLLNNIVETLKPNENGDLVIRHTDLMSELAAYDQYRS TIFMLIQSIENLIITNNAVLDDPDADGFWVREDLPKRNNFASLLE LINQLNNVELTDDERKLLVAIRNAFSHNSYNIDFSLIKDVKHLPE VAKGILQHLQSMLGVEITK Riemerella 8 MEKPLLPNVYTLKHKFFWGAFLNIARHNAFITICHINEQLGLKT anatipestifer PSNDDKIVDVVCETWNNILNNDHDLLKKSQLTELILKHFPFLTA (SEQ ID MCYHPPKKEGKKKGHQKEQQKEKESEAQSQAEALNPSKLIEAL No. 
113) EILVNQLHSLRNYYSHYKHKKPDAEKDIFKHLYKAFDASLRMV KEDYKAHFTVNLTRDFAHLNRKGKNKQDNPDFNRYRFEKDGF FTESGLLFFTNLFLDKRDAYWMLKKVSGFKASHKQREKMTTE VFCRSRILLPKLRLESRYDHNQMLLDMLSELSRCPKLLYEKLSE ENKKHFQVEADGFLDEIEEEQNPFKDTLIRHQDRFPYFALRYLD LNESFKSIRFQVDLGTYHYCIYDKKIGDEQEKRHLTRTLLSFGRL QDFTEINRPQEWKALTKDLDYKETSNQPFISKTTPHYHITDNKIG FRLGTSKELYPSLEIKDGANRIAKYPYNSGFVAHAFISVHELLPL MFYQHLTGKSEDLLKETVRHIQRIYKDFEEERINTIEDLEKANQ GRLPLGAFPKQMLGLLQNKQPDLSEKAKIKIEKLIAETKLLSHR LNTKLKSSPKLGKRREKLIKTGVLADWLVKDFMRFQPVAYDA QNQPIKSSKANSTEFWFIRRALALYGGEKNRLEGYFKQTNLIGN TNPHPFLNKFNWKACRNLVDFYQQYLEQREKFLEAIKNQPWEP YQYCLLLKIPKENRKNLVKGWEQGGISLPRGLFTEAIRETLSED LMLSKPIRKEIKKHGRVGFISRAITLYFKEKYQDKHQSFYNLSY KLEAKAPLLKREEHYEYWQQNKPQSPTESQRLELHTSDRWKD YLLYKRWQHLEKKLRLYRNQDVMLWLMTLELTKNHFKELNL NYHQLKLENLAVNVQEADAKLNPLNQTLPMVLPVKVYPATAF GEVQYHKTPIRTVYIREEHTKALKMGNFKALVKDRRLNGLFSFI KEENDTQKHPISQLRLRRELEIYQSLRVDAFKETLSLEEKLLNKH TSLSSLENEFRALLEEWKKEYAASSMVTDEHIAFIASVRNAFCH NQYPFYKEALHAPIPLFTVAQPTTEEKDGLGIAEALLKVLREYC EIVKSQI Prevotella 9 MEDDKKTTGSISYELKDKHFWAAFLNLARHNVYITINHINKLLE aurantiaca IREIDNDEKVLDIKTLWQKGNKDLNQKARLRELMTKHFPFLET (SEQ ID AIYTKNKEDKKEVKQEKQAEAQSLESLKDCLFLFLDKLQEARN No. 
114) YYSHYKYSEFSKEPEFEEGLLEKMYNIFGNNIQLVINDYQHNKD INPDEDFKHLDRKGQFKYSFADNEGNITESGLLFFVSLFLEKKD AIWMQQKLNGFKDNLENKKKMTHEVFCRSRILMPKLRLESTQT QDWILLDMLNELIRCPKSLYERLQGDDREKFKVPFDPADEDYN AEQEPFKNTLIRHQDRFPYFVLRYFDYNEIFKNLRFQIDLGTYHF SIYKKLIGGQKEDRHLTHKLYGFERIQEFAKQNRPDEWKAIVKD LDTYETSNKRYISETTPHYHLENQKIGIRFRNGNKEIWPSLKTND ENNEKSKYKLDKQYQAEAFLSVHELLPMMFYYLLLKKEKPNN DEINASIVEGFIKREIRNIFKLYDAFANGEINNIDDLEKYCADKGI PKRHLPKQMVAILYDEHKDMVKEAKRKQKEMVKDTKKLLAT LEKQTQKEKEDDGRNVKLLKSGEIARWLVNDMMRFQPVQKD NEGKPLNNSKANSTEYQMLQRSLALYNNEEKPTRYFRQVNLIE SNNPHPFLKWTKWEECNNILTFYYSYLTKKIEFLNKLKPEDWK KNQYFLKLKEPKTNRETLVQGWKNGFNLPRGIFTEPIREWFKR HQNNSKEYEKVEALDRVGLVTKVIPLFFKEEYFKDKEENFKED TQKEINDCVQPFYNFPYNVGNIHKPKEKDFLHREERIELWDKKK DKFKGYKEKIKSKKLTEKDKEEFRSYLEFQSWNKFERELRLVR NQDIVTWLLCKELIDKLKIDELNIEELKKLRLNNIDTDTAKKEK NNILNRVMPMELPVTVYEIDDSHKIVKDKPLHTIYIKEAETKLL KQGNFKALVKDRRLNGLFSFVKTNSEAESKRNPISKLRVEYELG EYQEARIEIIQDMLALEEKLINKYKDLPTNKFSEMLNSWLEGKD EADKARFQNDVDFLIAVRNAFSHNQYPMHNKIEFANIKPFSLYT ANNSEEKGLGIANQLKDKTKETTDKIKKIEKPIETKE Prevotella 10 MEDKPFWAAFFNLARHNVYLTVNHINKLLDLEKLYDEGKHKEI saccharolytica FEREDIFNISDDVMNDANSNGKKRKLDIKKIWDDLDTDLTRKY (SEQ QLRELILKHFPFIQPAIIGAQTKERTTIDKDKRSTSTSNDSLKQTG ID No. 
EGDINDLLSLSNVKSMFFRLLQILEQLRNYYSHVKHSKSATMPN 115) FDEDLLNWMRYIFIDSVNKVKEDYSSNSVIDPNTSFSHLIYKDE QGKIKPCRYPFTSKDGSINAFGLLFFVSLFLEKQDSIWMQKKIPG FKKASENYMKMTNEVFCRNHILLPKIRLETVYDKDWMLLDML NEVVRCPLSLYKRLTPAAQNKFKVPEKSSDNANRQEDDNPFSRI LVRHQNRFPYFVLRFFDLNEVFTTLRFQINLGCYHFAICKKQIGD KKEVHHLIRTLYGFSRLQNFTQNTRPEEWNTLVKTTEPSSGNDG KTVQGVPLPYISYTIPHYQIENEKIGIKIFDGDTAVDTDIWPSVST EKQLNKPDKYTLTPGFKADVFLSVHELLPMMFYYQLLLCEGML KTDAGNAVEKVLIDTRNAIFNLYDAFVQEKINTITDLENYLQDK PILIGHLPKQMIDLLKGHQRDMLKAVEQKKAMLIKDTERRLKL LDKQLKQETDVAAKNTGTLLKNGQIADWLVNDMMRFQPVKR DKEGNPINCSKANSTEYQMLQRAFAFYATDSCRLSRYFTQLHLI HSDNSHLFLSRFEYDKQPNLIAFYAAYLKAKLEFLNELQPQNW ASDNYFLLLRAPKNDRQKLAEGWKNGFNLPRGLFTEKIKTWFN EHKTIVDISDCDIFKNRVGQVARLIPVFFDKKFKDHSQPFYRYDF NVGNVSKPTEANYLSKGKREELFKSYQNKFKNNIPAEKTKEYR EYKNFSLWKKFERELRLIKNQDILIWLMCKNLFDEKIKPKKDIL EPRIAVSYIKLDSLQTNTSTAGSLNALAKVVPMTLAIHIDSPKPK GKAGNNEKENKEFTVYIKEEGTKLLKWGNFKTLLADRRIKGLF SYIEHDDIDLKQHPLTKRRVDLELDLYQTCRIDIFQQTLGLEAQL LDKYSDLNTDNFYQMLIGWRKKEGIPRNIKEDTDFLKDVRNAF SHNQYPDSKKIAFRRIRKFNPKELILEEEEGLGIATQMYKEVEKV VNRIKRIELFD HMPREF9712_03108 11 MKDILTTDTTEKQNRFYSHKIADKYFFGGYFNLASNNIYEVFEE [Myroides VNKRNTFGKLAKRDNGNLKNYIIHVFKDELSISDFEKRVAIFAS odoratimimus YFPILETVDKKSIKERNRTIDLTLSQRIRQFREMLISLVTAVDQLR CCUG 10230] NFYTHYHHSDIVIENKVLDFLNSSFVSTALHVKDKYLKTDKTKE (SEQ ID FLKETIAAELDILIEAYKKKQIEKKNTRFKANKREDILNAIYNEA No. 
116) FWSFINDKDKDKDKETVVAKGADAYFEKNHHKSNDPDFALNIS EKGIVYLLSFFLTNKEMDSLKANLTGFKGKVDRESGNSIKYMA TQRIYSFHTYRGLKQKIRTSEEGVKETLLMQMIDELSKVPNVVY QHLSTTQQNSFIEDWNEYYKDYEDDVETDDLSRVIHPVIRKRYE DRFNYFAIRFLDEFFDFPTLRFQVHLGDYVHDRRTKQLGKVESD RIIKEKVTVFARLKDINSAKASYFHSLEEQDKEELDNKWTLFPN PSYDFPKEHTLQHQGEQKNAGKIGIYVKLRDTQYKEKAALEEA RKSLNPKERSATKASKYDIITQIIEANDNVKSEKPLVFTGQPIAY LSMNDIHSMLFSLLTDNAELKKTPEEVEAKLIDQIGKQINEILSK DTDTKILKKYKDNDLKETDTDKITRDLARDKEEIEKLILEQKQR ADDYNYTSSTKFNIDKSRKRKHLLFNAEKGKIGVWLANDIKRF MFKESKSKWKGYQHTELQKLFAYFDTSKSDLELILSNMVMVK DYPIELIDLVKKSRTLVDFLNKYLEARLEYIENVITRVKNSIGTP QFKTVRKECFTFLKKSNYTVVSLDKQVERILSMPLFIERGFMDD KPTMLEGKSYKQHKEKFADWFVHYKENSNYQNFYDTEVYEIT TEDKREKAKVTKKIKQQQKNDVFTLMMVNYMLEEVLKLSSND RLSLNELYQTKEERIVNKQVAKDTQERNKNYIWNKVVDLQLC DGLVHIDNVKLKDIGNFRKYENDSRVKEFLTYQSDIVWSAYLS NEVDSNKLYVIERQLDNYESIRSKELLKEVQEIECSVYNQVANK ESLKQSGNENFKQYVLQGLLPIGMDVREMLILSTDVKFKKEEII QLGQAGEVEQDLYSLIYIRNKFAHNQLPIKEFFDFCENNYRSISD NEYYAEYYMEIFRSIKEKYAN Prevotella 12 MEDDKKTTDSIRYELKDKHFWAAFLNLARHNVYITVNHINKIL intermedia EEDEINRDGYENTLENSWNEIKDINKKDRLSKLIIKHFPFLEATT (SEQ ID YRQNPTDTTKQKEEKQAEAQSLESLKKSFFVFIYKLRDLRNHYS No. 
117) HYKHSKSLERPKFEEDLQNKMYNIFDVSIQFVKEDYKHNTDINP KKDFKHLDRKRKGKFHYSFADNEGNITESGLLFFVSLFLEKKDA IWVQKKLEGFKCSNKSYQKMTNEVFCRSRMLLPKLRLESTQTQ DWILLDMLNELIRCPKSLYERLQGVNRKKFYVSFDPADEDYDA EQEPFKNTLVRHQDRFPYFALRYFDYNEVFANLRFQIDLGTYHF SIYKKLIGGQKEDRHLTHKLYGFERIQEFDKQNRPDEWKAIVKD SDTFKKKEEKEEEKPYISETTPHYHLENKKIGIAFKNHNIWPSTQ TELTNNKRKKYNLGTSIKAEAFLSVHELLPMMFYYLLLKTENT KNDNKVGGKKETKKQGKHKIEAIIESKIKDIYALYDAFANGEIN SEDELKEYLKGKDIKIVHLPKQMIAILKNEHKDMAEKAEAKQE KMKLATENRLKTLDKQLKGKIQNGKRYNSAPKSGEIASWLVN DMMRFQPVQKDENGESLNNSKANSTEYQLLQRTLAFFGSEHER LAPYFKQTKLIESSNPHPFLNDTEWEKCSNILSFYRSYLKARKNF LESLKPEDWEKNQYFLMLKEPKTNRETLVQGWKNGFNLPRGFF TEPIRKWFMEHWKSIKVDDLKRVGLVAKVTPLFFSEKYKDSVQ PFYNYPFNVGDVNKPKEEDFLHREERIELWDKKKDKFKGYKA KKKFKEMTDKEKEEHRSYLEFQSWNKFERELRLVRNQDIVTWL LCTELIDKLKIDELNIKELKKLRLKDINTDTAKKEKNNILNRVMP MELPVTVYKVNKGGYIIKNKPLHTIYIKEAETKLLKQGNFKALV KDRRLNGLFSFVKTPSEAESESNPISKLRVEYELGKYQNARLDII EDMLALEKKLIDKYNSLDTDNFHNMLTGWLELKGEAKKARFQ NDVKLLTAVRNAFSHNQYPMYDENLFGNIERFSLSSSNIIESKGL DIAAKLKEEVSKAAKKIQNEEDNKKEKET Capnocytophaga 13 MKNIQRLGKGNEFSPFKKEDKFYFGGFLNLANNNIEDFFKEIITR canimorsus FGIVITDENKKPKETFGEKILNEIFKKDISIVDYEKWVNIFADYFP (SEQ ID FTKYLSLYLEEMQFKNRVICFRDVMKELLKTVEALRNFYTHYD No. 
118) HEPIKIEDRVFYFLDKVLLDVSLTVKNKYLKTDKTKEFLNQHIG EELKELCKQRKDYLVGKGKRIDKESEIINGIYNNAFKDFICKREK QDDKENHNSVEKILCNKEPQNKKQKSSATVWELCSKSSSKYTE KSFPNRENDKHCLEVPISQKGIVFLLSFFLNKGEIYALTSNIKGFK AKITKEEPVTYDKNSIRYMATHRMFSFLAYKGLKRKIRTSEINY NEDGQASSTYEKETLMLQMLDELNKVPDVVYQNLSEDVQKTFI EDWNEYLKENNGDVGTMEEEQVIHPVIRKRYEDKFNYFAIRFL DEFAQFPTLRFQVHLGNYLCDKRTKQICDTTTEREVKKKITVFG RLSELENKKAIFLNEREEIKGWEVFPNPSYDFPKENISVNYKDFP IVGSILDREKQPVSNKIGIRVKIADELQREIDKAIKEKKLRNPKNR KANQDEKQKERLVNEIVSTNSNEQGEPVVFIGQPTAYLSMNDIH SVLYEFLINKISGEALETKIVEKIETQIKQIIGKDATTKILKPYTNA NSNSINREKLLRDLEQEQQILKTLLEEQQQREKDKKDKKSKRK HELYPSEKGKVAVWLANDIKRFMPKAFKEQWRGYHHSLLQKY LAYYEQSKEELKNLLPKEVFKHFPFKLKGYFQQQYLNQFYTDY LKRRLSYVNELLLNIQNFKNDKDALKATEKECFKFFRKQNYIIN PINIQIQSILVYPIFLKRGFLDEKPTMIDREKFKENKDTELADWF MHYKNYKEDNYQKFYAYPLEKVEEKEKFKRNKQINKQKKND VYTLMMVEYIIQKIFGDKFVEENPLVLKGIFQSKAERQQNNTHA ATTQERNLNGILNQPKDIKIQGKITVKGVKLKDIGNFRKYEIDQR VNTFLDYEPRKEWMAYLPNDWKEKEKQGQLPPNNVIDRQISK YETVRSKILLKDVQELEKIISDEIKEEHRHDLKQGKYYNFKYYIL NGLLRQLKNENVENYKVFKLNTNPEKVNITQLKQEATDLEQKA FVLTYIRNKFAHNQLPKKEFWDYCQEKYGKIEKEKTYAEYFAE VFKREKEALIK Porphyromonas 14 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ gulae LAYSKADITNDQDVLSFKALWKNFDNDLERKSRLRSLILKHFSF (SEQ ID LEGAAYGKKLFESKSSGNKSSKNKELTKKEKEELQANALSLDN No. 
119) LKSILFDFLQKLKDFRNYYSHYRHSGSSELPLFDGNMLQRLYNV FDVSVQRVKIDHEHNDEVDPHYHFNHLVRKGKKDRYGHNDNP SFKHHFVDGEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFKG GTETYQQMTNEVFCRSRISLPKLKLESLRMDDWMLLDMLNEL VRCPKPLYDRLREDDRACFRVPVDILPDEDDTDGGGEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKMIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETG DKPYISQTSPHYHIEKGKIGLRFMPEGQHLWPSPEVGTTRTGRS KYAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAERV QGRIKRVIEDVYAVYDAFARDEINTRDELDACLADKGIRRGHLP RQMIAILSQEHKDMEEKIRKKLQEMMADTDHRLDMLDRQTDR KIRIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDASGKPLNNS KANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFL HETRWESHTNILSFYRSYLRARKAFLERIGRSDRVENRPFLLLKE PKTDRQTLVAGWKGEFHLPRGIFTEAVRDCLIEMGHDEVASYK EVGFMAKAVPLYFERACEDRVQPFYDSPFNVGNSLKPKKGRFL SKEERAEEWERGKERFRDLEAWSYSAARRIEDAFAGIEYASPG NKKKIEQLLRDLSLWEAFESKLKVRADRINLAKLKKEILEAQEH PYHDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLD TGTLYLKDIRPNVQEQGSLNVLNRVKPMRLPVVVYRADSRGH VHKEEAPLATVYIEERDTKLLKQGNFKSFVKDRRLNGLFSFVDT GGLAMEQYPISKLRVEYELAKYQTARVCVFELTLRLEESLLTRY PHLPDESFREMLESWSDPLLAKWPELHGKVRLLIAVRNAFSHN QYPMYDEAVFSSIRKYDPSSPDAIEERMGLNIAHRLSEEVKQAK ETVERIIQA Prevotella 15 MNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQ sp. P5-125 NENNENLWFHPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPF (SEQ ID LKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKRAFGVLKMY No. 
120) RDLTNHYKTYEEKLNDGCEFLTSTEQPLSGMINNYYTVALRNM NERYGYKTEDLAFIQDKRFKFVKDAYGKKKSQVNTGFFLSLQD YNGDTQKKLHLSGVGIALLICLFLDKQYINIFLSRLPIFSSYNAQS EERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDEL FTTLSAEKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLF DHIRFHVNMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLE EAETMRKQENGTFGNSGIRIRDFENMKRDDANPANYPYIVDTY THYILENNKVEMFINDKEDSAPLLPVIEDDRYVVKTIPSCRMSTL EIPAMAFHMFLFGSKKTEKLIVDVHNRYKRLFQAMQKEEVTAE NIASFGIAESDLPQKILDLISGNAHGKDVDAFIRLTVDDMLTDTE RRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIVLF QPSVNDGENKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFE KARLIGKGTTEPHPFLYKVFARSIPANAVEFYERYLIERKFYLTG LSNEIKKGNRVDVPFIRRDQNKWKTPAMKTLGRIYSEDLPVELP RQMFDNEIKSHLKSLPQMEGIDFNNANVTYLIAEYMKRVLDDD FQTFYQWNRNYRYMDMLKGEYDRKGSLQHCFTSVEEREGLW KERASRTERYRKQASNKIRSNRQMRNASSEEIETILDKRLSNSR NEYQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGERFKLKEI MPDAEKGILSEIMPMSFTFEKGGKKYTITSEGMKLKNYGDFFVL ASDKRIGNLLELVGSDIVSKEDIMEEFNKYDQCRPEISSIVFNLE KWAFDTYPELSARVDREEKVDFKSILKILLNNKNINKEQSDILR KIRNAFDHNNYPDKGVVEIKALPEIAMSIKKAFGEYAIMK Flavobacterium 16 MENLNKILDKENEICISKIFNTKGIAAPITEKALDNIKSKQKNDL branchiophilum NKEARLHYFSIGHSFKQIDTKKVFDYVLIEELKDEKPLKFITLQK (SEQ DFFTKEFSIKLQKLINSIRNINNHYVHNFNDINLNKIDSNVFHFLK ID No. 
ESFELAIIEKYYKVNKKYPLDNEIVLFLKELFIKDENTALLNYFT 121) NLSKDEAIEYILTFTITENKIWNINNEHNILNIEKGKYLTFEAMLF LITIFLYKNEANHLLPKLYDFKNNKSKQELFTFFSKKFTSQDIDA EEGHLIKFRDMIQYLNHYPTAWNNDLKLESENKNKIMTTKLIDS IIEFELNSNYPSFATDIQFKKEAKAFLFASNKKRNQTSFSNKSYN EEIRHNPHIKQYRDEIASALTPISFNVKEDKFKIFVKKHVLEEYFP NSIGYEKFLEYNDFTEKEKEDFGLKLYSNPKTNKLIERIDNHKL VKSHGRNQDRFMDFSMRFLAENNYFGKDAFFKCYKFYDTQEQ DEFLQSNENNDDVKFHKGKVTTYIKYEEHLKNYSYWDCPFVEE NNSMSVKISIGSEEKILKIQRNLMIYFLENALYNENVENQGYKL VNNYYRELKKDVEESIASLDLIKSNPDFKSKYKKILPKRLLHNY APAKQDKAPENAFETLLKKADFREEQYKKLLKKAEHEKNKED FVKRNKGKQFKLHFIRKACQMMYFKEKYNTLKEGNAAFEKKD PVIEKRKNKEHEFGHHKNLNITREEFNDYCKWMFAFNGNDSYK KYLRDLFSEKHFFDNQEYKNLFESSVNLEAFYAKTKELFKKWIE TNKPTNNENRYTLENYKNLILQKQVFINVYHFSKYLIDKNLLNS ENNVIQYKSLENVEYLISDFYFQSKLSIDQYKTCGKLFNKLKSN KLEDCLLYEIAYNYIDKKNVHKIDIQKILTSKIILTINDANTPYKIS VPFNKLERYTEMIAIKNQNNLKARFLIDLPLYLSKNKIKKGKDS AGYEIIIKNDLEIEDINTINNKIINDSVKFTEVLMELEKYFILKDKC ILSKNYIDNSEIPSLKQFSKVWIKENENEIINYRNIACHFHLPLLET FDNLLLNVEQKFIKEELQNVSTINDLSKPQEYLILLFIKFKHNNF YLNLFNKNESKTIKNDKEVKKNRVLQKFINQVILKKK Myroides 17 MKDILTTDTTEKQNRFYSHKIADKYFFGGYFNLASNNIYEVFEE odoratimimus VNKRNTFGKLAKRDNGNLKNYIIHVFKDELSISDFEKRVAIFAS (SEQ ID YFPILETVDKKSIKERNRTIDLTLSQRIRQFREMLISLVTAVDQLR No. 
122) NFYTHYHHSDIVIENKVLDFLNSSFVSTALHVKDKYLKTDKTKE FLKETIAAELDILIEAYKKKQIEKKNTRFKANKREDILNAIYNEA FWSFINDKDKDKDKETVVAKGADAYFEKNHHKSNDPDFALNIS EKGIVYLLSFFLTNKEMDSLKANLTGFKGKVDRESGNSIKYMA TQRIYSFHTYRGLKQKIRTSEEGVKETLLMQMIDELSKVPNVVY QHLSTTQQNSFIEDWNEYYKDYEDDVETDDLSRVTHPVIRKRY EDRFNYFAIRFLDEFFDFPTLRFQVEILGDYVHDRRTKQLGKVES DRIIKEKVTVFARLKDINSAKASYFHSLEEQDKEELDNKWTLFP NPSYDFPKEHTLQHQGEQKNAGKIGIYVKLRDTQYKEKAALEE ARKSLNPKERSATKASKYDIITQIIEANDNVKSEKPLVFTGQPIA YLSMNDIHSMLFSLLTDNAELKKTPEEVEAKLIDQIGKQINEILS KDTDTKILKKYKDNDLKETDTDKITRDLARDKEEIEKLILEQKQ RADDYNYTSSTKFNIDKSRKRKHLLFNAEKGKIGVWLANDIKR FMFKESKSKWKGYQHIELQKLFAYFDTSKSDLELILSNMVMVK DYPIELIDLVKKSRTLVDFLNKYLEARLEYIENVITRVKNSIGTP QFKTVRKECFTFLKKSNYTVVSLDKQVERILSMPLFIERGFMDD KPTMLEGKSYKQHKEKFADWFVHYKENSNYQNFYDTEVYEIT TEDKREKAKVTKKIKQQQKNDVFTLMMVNYMLEEVLKLSSND RLSLNELYQTKEERIVNKQVAKDTQERNKNYIWNKVVDLQLC DGLVHIDNVKLKDIGNFRKYENDSRVKEFLTYQSDIVWSAYLS NEVDSNKLYVIERQLDNYESIRSKELLKEVQEIECSVYNQVANK ESLKQSGNENFKQYVLQGLLPIGMDVREMLILSTDVKFKKEEII QLGQAGEVEQDLYSLIYIRNKFAHNQLPIKEFFDFCENNYRSISD NEYYAEYYMEIFRSIKEKYAN Flavobacterium 18 MSSKNESYNKQKTFNHYKQEDKYFFGGFLNNADDNLRQVGKE columnare FKTRINFNRNNNELASVFKDYFNKEKSVAKREHALNLLSNYFP (SEQ ID VLERIQKHTNHNFEQTREIFELLLDTIKKLRDYYTHHYHKPITIN No. 
123) PKIYDFLDDTLLDVLITIKKKKVKNDTSRELLKEKLRPELTQLKN QKREELIKKGKKLLEENLENAVFNHCLIPFLEENKTDDKQNKTV SLRKYRKSKPNEETSITLTQSGLVFLMSFFLHRKEFQVFTSGLER FKAKVNTIKEEEISLNKNNIVYMITHWSYSYYNFKGLKHRIKTD QGVSTLEQNNTTHSLTNTNTKEALLTQIVDYLSKVPNEIYETLSE KQQKEFEEDINEYMRENPENEDSTFSSIVSHKVIRKRYENKFNY FAMRFLDEYAELPTLRFMVNFGDYIKDRQKKILESIQFDSERIIK KEIHLFEKLSLVTEYKKNVYLKETSNIDLSRFPLFPNPSYVMAN NNIPFYIDSRSNNLDEYLNQKKKAQSQNKKRNLTFEKYNKEQS KDAIIAMLQKEIGVKDLQQRSTIGLLSCNELPSMLYEVIVKDIKG AELENKIAQKIREQYQSIRDFTLDSPQKDNIPTTLIKTINTDSSVT FENQPIDIPRLKNALQKELTLTQEKLLNVKEHEIEVDNYNRNKN TYKFKNQPKNKVDDKKLQRKYVFYRNEIRQEANWLASDLIHF MKNKSLWKGYMHNELQSFLAFFEDKKNDCIALLETVFNLKED CILTKGLKNLFLKHGNFIDFYKEYLKLKEDFLSTESTFLENGFIG LPPKILKKELSKRLKYIFIVFQKRQFIIKELEEKKNNLYADAINLS RGIFDEKPTMIPFKKPNPDEFASWFVASYQYNNYQSFYELTPDI VERDKKKKYKNLRAINKVKIQDYYLKLMVDTLYQDLFNQPLD KSLSDFYVSKAEREKIKADAKAYQKLNDSSLWNKVIHLSLQNN RITANPKLKDIGKYKRALQDEKIATLLTYDARTWTYALQKPEK ENENDYKELHYTALNMELQEYEKVRSKELLKQVQELEKKILDK FYDFSNNASHPEDLEIEDKKGKRHPNFKLYITKALLKNESEIINL ENIDIEILLKYYDYNTEELKEKIKNMDEDEKAKIINTKENYNKIT NVLIKKALVLIIIRNKMAHNQYPPKFIYDLANRFVPKKEEEYFAT YFNRVFETITKELWENKEKKDKTQV Porphyromonas 19 MTEQNEKPYNGTYYTLEDKHFWAAFLNLARHNAYITLAHIDR gingivalis QLAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFS (SEQ ID FLEGAAYGKKLFESQSSGNKSSKKKELSKKEKEELQANALSLD No. 
124) NLKSILFDFLQKLKDFRNYYSHYRHPESSELPLFDGNMLQRLYN VFDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDKYGNND NPFFKHHFVDREGTVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTEAYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKSLYDRLREEDRARFRVPVDILSDEDDTDGTEEDPFKNTL VRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKNIGE QPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETGD KPYITQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGATRTGRSK YAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKVQ GRIKRVIEDVYAVYDAFARDEINTRDELDACLADKGIRRGHLPR QMIAILSQEHKDMEEKVRKKLQEMIADTDHRLDMLDRQTDRKI RIGRKNAGLPKSGVVADWLVRDMMRFQPVAKDTSGKPLNNSK ANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLH ETRWESHTNILSFYRSYLEARKAFLQSIGRSDRVENHRFLLLKEP KTDRQTLVAGWKGEFHLPRGIFTEAVRDCLIEMGYDEVGSYKE VGFMAKAVPLYFERASKDRVQPFYDYPFNVGNSLKPKKGRFLS KEKRAEEWESGKERFRLAKLKKEILEAKEHPYHDFKSWQKFER ELRLVKNQDIITWMMCRDLMEENKVEGLDTGTLYLKDIRTDV QEQGSLNVLNRVKPMRLPVVVYRADSRGHVHKEQAPLATVYI EERDTKLLKQGNFKSFVKDRRLNGLFSFVDTGALAMEQYPISK LRVEYELAKYQTARVCAFEQTLELEESLLTRYPHLPDKNFRKM LESWSDPLLDKWPDLHGNVRLLIAVRNAFSHNQYPMYDETLFS SIRKYDPSSPDAIEERMGLNIAHRLSEEVKQAKEMVERIIQA Porphyromonas 20 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ sp. LAYSKADITNDQDVLSFKALWKNFDNDLERKSRLRSLILKHFSF COT-052 LEGAAYGKKLFESKSSGNKSSKNKELTKKEKEELQANALSLDN OH4946 LKSILFDFLQKLKDFRNYYSHYRHSESSELPLFDGNMLQRLYNV (SEQ ID FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRYGHNDN No. 
125) PSFKHHFVDSEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTETYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKPLYDRLREDDRACFRVPVDILPDEDDTDGGGEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKMIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETG DKPYISQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGTTRTGRSK YAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKVQ GRIKRVIEDVYAIYDAFARDEINTLKELDACLADKGIRRGHLPK QMIGILSQERKDMEEKVRKKLQEMIADTDHRLDMLDRQTDRKI RIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSK ANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLH ETRWESHTNILSFYRSYLRARKAFLERIGRSDRVENCPFLLLKEP KTDRQTLVAGWKGEFHLPRGIFTEAVRDCLIEMGYDEVGSYRE VGFMAKAVPLYFERACEDRVQPFYDSPFNVGNSLKPKKGRFLS KEDRAEEWERGKERFRDLEAWSHSAARRIKDAFAGIEYASPGN KKKIEQLLRDLSLWEAFESKLKVRADKINLAKLKKEILEAQEHP YHDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLDT GTLYLKDIRPNVQEQGSLNVLNRVKPMRLPVVVYRADSRGHV HKEEAPLATVYIEERDTKLLKQGNFKSFVKDRRLNGLFSFVDTG GLAMEQYPISKLRVEYELAKYQTARVCVFELTLRLEESLLSRYP HLPDESFREMLESWSDPLLAKWPELHGKVRLLIAVRNAFSHNQ YPMYDEAVFSSIRKYDPSSPDAIEERMGLNIAHRLSEEVKQAKE TVERIIQA Prevotella 21 MEDDKKTKESTNMLDNKHFWAAFLNLARHNVYITVNHINKVL intermedia ELKNKKDQDIIIDNDQDILAIKTHWEKVNGDLNKTERLRELMTK (SEQ ID HFPFLETAIYTKNKEDKEEVKQEKQAKAQSFDSLKHCLFLFLEK No. 
126) LQEARNYYSHYKYSESTKEPMLEKELLKKMYNIFDDNIQLVIK DYQHNKDINPDEDFKHLDRTEEEFNYYFTTNKKGNITASGLLFF VSLFLEKKDAIWMQQKLRGFKDNRESKKKMTHEVFCRSRMLL PKLRLESTQTQDWILLDMLNELIRCPKSLYERLQGEYRKKFNVP FDSADEDYDAEQEPFKNTLVRHQDRFPYFALRYFDYNEIFTNLR FQIDLGTYHFSIYKKLIGGQKEDRHLTHKLYGFERIQEFAKQNR TDEWKAIVKDFDTYETSEEPYISETAPHYHLENQKIGIRFRNDN DEIWPSLKTNGENNEKRKYKLDKQYQAEAFLSVHELLPMMFY YLLLKKEEPNNDKKNASIVEGFIKREIRDIYKLYDAFANGEINNI DDLEKYCEDKGIPKRHLPKQMVAILYDEHKDMAEEAKRKQKE MVKDTKKLLATLEKQTQGEIEDGGRNIRLLKSGEIARWLVNDM MRFQPVQKDNEGNPLNNSKANSTEYQMLQRSLALYNKEEKPT RYFRQVNLINSSNPHPFLKWTKWEECNNILSFYRSYLTKKIEFLN KLKPEDWEKNQYFLKLKEPKTNRETLVQGWKNGFNLPRGIFTE PIREWFKRHQNDSEEYEKVETLDRVGLVTKVIPLFFKKEDSKDK EEYLKKDAQKEINNCVQPFYGFPYNVGNIHKPDEKDFLPSEERK KLWGDKKYKFKGYKAKVKSKKLTDKEKEEYRSYLEFQSWNK FERELRLVRNQDIVTWLLCTELIDKLKVEGLNVEELKKLRLKDI DTDTAKQEKNNILNRVMPMQLPVTVYEIDDSHNIVKDRPLHTV YIEETKTKLLKQGNFKALVKDRRLNGLFSFVDTSSETELKSNPIS KSLVEYELGEYQNARIETIKDMLLLEETLIEKYKTLPTDNFSDM LNGWLEGKDEADKARFQNDVKLLVAVRNAFSHNQYPMRNRIA FANINPFSLSSADTSEEKKLDIANQLKDKTHKIIKRIIEIEKPIETK E PIN17_0200 AFJ07523 MKMEDDKKTKESTNMLDNKHFWAAFLNLARHNVYITVNHIN [Prevotella KVLELKNKKDQDIIIDNDQDILAIKTHWEKVNGDLNKTERLREL intermedia MTKHFPFLETAIYTKNKEDKEEVKQEKQAKAQSFDSLKHCLFL 17] (SEQ FLEKLQEARNYYSHYKYSESTKEPMLEKELLKKMYNIFDDNIQ ID No. 
LVIKDYQHNKDINPDEDFKHLDRTEEEFNYYFTTNKKGNITASG 127) LLFFVSLFLEKKDAIWMQQKLRGFKDNRESKKKMTHEVFCRSR MLLPKLRLESTQTQDWILLDMLNELIRCPKSLYERLQGEYRKKF NVPFDSADEDYDAEQEPFKNTLVRHQDRFPYFALRYFDYNEIFT NLRFQIDLGTYHFSIYKKLIGGQKEDRHLTHKLYGFERIQEFAK QNRTDEWKAIVKDFDTYETSEEPYISETAPHYHLENQKIGIRFRN DNDEIWPSLKTNGENNEKRKYKLDKQYQAEAFLSVHELLPMM FYYLLLKKEEPNNDKKNASIVEGFIKREIRDIYKLYDAFANGEIN NIDDLEKYCEDKGIPKRHLPKQMVAILYDEHKDMAEEAKRKQ KEMVKDTKKLLATLEKQTQGEIEDGGRNIRLLKSGEIARWLVN DMMRFQPVQKDNEGNPLNNSKANSTEYQMLQRSLALYNKEEK PTRYFRQVNLINSSNPHPFLKWTKWEECNNILSFYRSYLTKKIEF LNKLKPEDWEKNQYFLKLKEPKTNRETLVQGWKNGFNLPRGIF TEPIREWFKRHQNDSEEYEKVETLDRVGLVTKVIPLFFKKEDSK DKEEYLKKDAQKEINNCVQPFYGFPYNVGNIHKPDEKDFLPSEE RKKLWGDKKYKFKGYKAKVKSKKLTDKEKEEYRSYLEFQSW NKFERELRLVRNQDIVTWLLCTELIDKLKVEGLNVEELKKLRLK DIDTDTAKQEKNNILNRVMPMQLPVTVYEIDDSHNIVKDRPLHT VYIEETKTKLLKQGNFKALVKDRRLNGLFSFVDTSSETELKSNPI SKSLVEYELGEYQNARIETIKDMLLLEETLIEKYKTLPTDNFSDM LNGWLEGKDEADKARFQNDVKLLVAVRNAFSHNQYPMRNRIA FANINPFSLSSADTSEEKKLDIANQLKDKTHKIIKRIIEIEKPIETK E Prevotella BAU18623 MEDDKKTTDSISYELKDKHFWAAFLNLARHNVYITVNHINKVL intermedia ELKNKKDQDIIIDNDQDILAIKTHWEKVNGDLNKTERLRELMTK (SEQ ID HFPFLETAIYSKNKEDKEEVKQEKQAKAQSFDSLKHCLFLFLEK No. 
128) LQETRNYYSHYKYSESTKEPMLEKELLKKMYNIFDDNIQLVIKD YQHNKDINPDEDFKHLDRTEEDFNYYFTRNKKGNITESGLLFFV SLFLEKKDAIWMQQKLRGFKDNRESKKKMTHEVFCRSRMLLP KLRLESTQTQDWILLDMLNELIRCPKSLYERLQGEDREKFKVPF DPADEDYDAEQEPFKNTLVRHQDRFPYFALRYFDYNEIFTNLRF QIDLGTFHFSIYKKLIGGQKEDRHLTHKLYGFERIQEFAKQNRPD EWKAIVKDLDTYETSNERYISETTPHYHLENQKIGIRFRNDNDEI WPSLKTNGENNEKSKYKLDKQYQAEAFLSVHELLPMMFYYLL LKKEEPNNDKKNASIVEGFIKREIRDMYKLYDAFANGEINNIDD LEKYCEDKGIPKRHLPKQMVAILYDEHKDMVKEAKRKQRKMV KDTEKLLAALEKQTQEKTEDGGRNIRLLKSGEIARWLVNDMM RFQPVQKDNEGNPLNNSKANSTEYQMLQRSLALYNKEEKPTRY FRQVNLINSSNPHPFLKWTKWEECNNILSFYRSYLTKKIEFLNKL KPEDWEKNQYFLKLKEPKTNRETLVQGWKNGFNLPRGIFTEPIR EWFKRHQNDSKEYEKVEALDRVGLVTKVIPLFFKKEDSKDKEE DLKKDAQKEINNCVQPFYSFPYNVGNIHKPDEKDFLHREERIEL WDKKKDKFKGYKAKVKSKKLTDKEKEEYRSYLEFQSWNKFER ELRLVRNQDIVTWLLCTELIDKLKVEGLNVEELKKLRLKDIDTD TAKQEKNNILNRVMPMQLPVTVYEIDDSHNIVKDRPLHTVYIEE TKTKLLKQGNFKALVKDRRLNGLFSFVDTSSEAELKSNPISKSL VEYELGEYQNARIETIKDMLLLEETLIEKYKNLPTDNFSDMLNG WLEGKDEADKARFQNDVKLLVAVRNAFSHNQYPMRNRIAFAN INPFSLSSADTSEEKKLDIANQLKDKTHKIIKRIIEIEKPIETKE HMPREF6485_0083 EFU31981 MQKQDKLFVDRKKNAIFAFPKYITIMENKEKPEPIYYELTDKHF [Prevotella WAAFLNLARHNVYTTINHINRRLEIAELKDDGYMMGIKGSWNE buccae QAKKLDKKVRLRDLIMKHFPFLEAAAYEMTNSKSPNNKEQRE ATCC KEQSEALSLNNLKNVLFIFLEKLQVLRNYYSHYKYSEESPKPIFE 33574] TSLLKNMYKVFDANVRLVKRDYMHHENIDMQRDFTHLNRKK (SEQ ID QVGRTKNIIDSPNFHYHFADKEGNMTIAGLLFFVSLFLDKKDAI No. 
129) WMQKKLKGFKDGRNLREQMTNEVFCRSRISLPKLKLENVQTK DWMQLDMLNELVRCPKSLYERLREKDRESFKVPFDIFSDDYNA EEEPFKNTLVRHQDRFPYFVLRYFDLNEIFEQLRFQIDLGTYHFS IYNKRIGDEDEVRHLTHHLYGFARIQDFAPQNQPEEWRKLVKD LDHFETSQEPYISKTAPHYHLENEKIGIKFCSAHNNLFPSLQTDK TCNGRSKFNLGTQFTAEAFLSVHELLPMMFYYLLLTKDYSRKE SADKVEGIIRKEISNIYAIYDAFANNEINSIADLTRRLQNTNILQG HLPKQMISILKGRQKDMGKEAERKIGEMIDDTQRRLDLLCKQT NQKIRIGKRNAGLLKSGKIADWLVNDMMRFQPVQKDQNNIPIN NSKANSTEYRMLQRALALFGSENFRLKAYFNQMNLVGNDNPH PFLAETQWEHQTNILSFYRNYLEARKKYLKGLKPQNWKQYQH FLILKVQKTNRNTLVTGWKNSFNLPRGIFTQPIREWFEKHNNSK RIYDQILSFDRVGFVAKAIPLYFAEEYKDNVQPFYDYPFNIGNRL KPKKRQFLDKKERVELWQKNKELFKNYPSEKKKTDLAYLDFLS WKKFERELRLIKNQDIVTWLMFKELFNMATVEGLKIGEIHLRDI DTNTANEESNNILNRIMPMKLPVKTYETDNKGNILKERPLATFY IEETETKVLKQGNFKALVKDRRLNGLFSFAETTDLNLEEHPISKL SVDLELIKYQTTRISIFEMTLGLEKKLIDKYSTLPTDSFRNMLER WLQCKANRPELKNYVNSLIAVRNAFSHNQYPMYDATLFAEVK KFTLFPSVDTKKIELNIAPQLLEIVGKAIKEIEKSENKN HMPREF9144_1146 EGQ18444 MKEEEKGKTPVVSTYNKDDKHFWAAFLNLARHNVYITVNHIN [Prevotella KILGEGEINRDGYENTLEKSWNEIKDINKKDRLSKLIIKHFPFLE pallens VTTYQRNSADTTKQKEEKQAEAQSLESLKKSFFVFIYKLRDLRN ATCC HYSHYKHSKSLERPKFEEDLQEKMYNIFDASIQLVKEDYKHNT 700821] DIKTEEDFKHLDRKGQFKYSFADNEGNITESGLLFFVSLFLEKK (SEQ ID DAIWVQKKLEGFKCSNESYQKMTNEVFCRSRMLLPKLRLQSTQ No. 
130) TQDWILLDMLNELIRCPKSLYERLREEDRKKFRVPIEIADEDYD AEQEPFKNALVRHQDRFPYFALRYFDYNEIFTNLRFQIDLGTYH FSIYKKQIGDYKESHHLTHKLYGFERIQEFTKQNRPDEWRKFVK TFNSFETSKEPYIPETTPHYHLENQKIGIRFRNDNDKIWPSLKTNS EKNEKSKYKLDKSFQAEAFLSVHELLPMMFYYLLLKTENTDND NEIETKKKENKNDKQEKHKIEEIIENKITEIYALYDAFANGKINSI DKLEEYCKGKDIEIGHLPKQMIAILKSEHKDMATEAKRKQEEM LADVQKSLESLDNQINEEIENVERKNSSLKSGEIASWLVNDMM RFQPVQKDNEGNPLNNSKANSTEYQMLQRSLALYNKEEKPTRY FRQVNLIESSNPHPFLNNTEWEKCNNILSFYRSYLEAKKNFLESL KPEDWEKNQYFLMLKEPKTNCETLVQGWKNGFNLPRGIFTEPI RKWFMEHRKNITVAELKRVGLVAKVIPLFFSEEYKDSVQPFYN YLFNVGNINKPDEKNFLNCEERRELLRKKKDEFKKMTDKEKEE NPSYLEFQSWNKFERELRLVRNQDIVTWLLCMELFNKKKIKEL NVEKIYLKNINTNTTKKEKNTEEKNGEEKIIKEKNNILNRIMPMR LPIKVYGRENFSKNKKKKIRRNTFFTVYIEEKGTKLLKQGNFKA LERDRRLGGLFSFVKTHSKAESKSNTISKSRVEYELGEYQKARIE IIKDMLALEETLIDKYNSLDTDNFHNMLTGWLKLKDEPDKASF QNDVDLLIAVRNAFSHNQYPMRNRIAFANINPFSLSSANTSEEK GLGIANQLKDKTHKTIEKIIEIEKPIETKE HMPREF9714_02132 EHO08761 MKDILTTDTTEKQNRFYSHKIADKYFFGGYFNLASNNIYEVFEE [Myroides VNKRNTFGKLAKRDNGNLKNYIIHVFKDELSISDFEKRVAIFAS odoratimimus YFPILETVDKKSIKERNRTIDLTLSQRIRQFREMLISLVTAVDQLR CCUG NFYTHYHHSEIVIENKVLDFLNSSLVSTALHVKDKYLKTDKTKE 12901] FLKETIAAELDILIEAYKKKQIEKKNTRFKANKREDILNAIYNEA (SEQ ID FWSFINDKDKDKETVVAKGADAYFEKNHHKSNDPDFALNISEK No. 
131) GIVYLLSFFLTNKEMDSLKANLTGFKGKVDRESGNSIKYMATQ RIYSFHTYRGLKQKIRTSEEGVKETLLMQMIDELSKVPNVVYQH LSTTQQNSFIEDWNEYYKDYEDDVETDDLSRVIHPVIRKRYEDR FNYFAIRFLDEFFDFPTLRFQVHLGDYVHDRRTKQLGKVESDRII KEKVTVFARLKDINSAKANYFHSLEEQDKEELDNKWTLFPNPS YDFPKEHTLQHQGEQKNAGKIGIYVKLRDTQYKEKAALEEARK SLNPKERSATKASKYDIITQIIEANDNVKSEKPLVFTGQPIAYLS MNDIHSMLFSLLTDNAELKKTPEEVEAKLIDQIGKQINEILSKDT DTKILKKYKDNDLKETDTDKITRDLARDKEEIEKLILEQKQRAD DYNYTSSTKFNIDKSRKRKHLLFNAEKGKIGVWLANDIKRFMT EEFKSKWKGYQHTELQKLFAYYDTSKSDLDLILSDMVMVKDY PIELIALVKKSRTLVDFLNKYLEARLGYMENVITRVKNSIGTPQF KTVRKECFTFLKKSNYTVVSLDKQVERILSMPLFIERGFMDDKP TMLEGKSYQQHKEKFADWFVHYKENSNYQNFYDTEVYEITTE DKREKAKVTKKIKQQQKNDVFTLMMVNYMLEEVLKLSSNDRL SLNELYQTKEERIVNKQVAKDTQERNKNYIWNKVVDLQLCEG LVRIDKVKLKDIGNFRKYENDSRVKEFLTYQSDIVWSAYLSNEV DSNKLYVIERQLDNYESIRSKELLKEVQEIECSVYNQVANKESL KQSGNENFKQYVLQGLVPIGMDVREMLILSTDVKFIKEEIIQLG QAGEVEQDLYSLIYIRNKFAHNQLPIKEFFDFCENNYRSISDNEY YAEYYMEIFRSIKEKYTS HMPREF9711_00870 EKB06014 MKDILTTDTTEKQNRFYSHKIADKYFFGGYFNLASNNIYEVFEE [Myroides VNKRNTFGKLAKRDNGNLKNYIIHVFKDELSISDFEKRVAIFAS odoratimimus YFPILETVDKKSIKERNRTIDLTLSQRIRQFREMLISLVTAVDQLR CCUG NFYTHYHHSEIVIENKVLDFLNSSLVSTALHVKDKYLKTDKTKE 3837] FLKETIAAELDILIEAYKKKQIEKKNTRFKANKREDILNAIYNEA (SEQ ID FWSFINDKDKDKETVVAKGADAYFEKNHHKSNDPDFALNISEK No. 
132) GIVYLLSFFLTNKEMDSLKANLTGFKGKVDRESGNSIKYMATQ RIYSFHTYRGLKQKIRTSEEGVKETLLMQMIDELSKVPNVVYQH LSTTQQNSFIEDWNEYYKDYEDDVETDDLSRVIHPVIRKRYEDR FNYFAIRFLDEFFDFPTLRFQVHLGDYVHDRRTKQLGKVESDRII KEKVTVFARLKDINSAKASYFHSLEEQDKEELDNKWTLFPNPS YDFPKEHTLQHQGEQKNAGKIGIYVKLRDTQYKEKAALEEARK SLNPKERSATKASKYDIITQIIEANDNVKSEKPLVFTGQPIAYLS MNDIHSMLFSLLTDNAELKKTPEEVEAKLIDQIGKQINEILSKDT DTKILKKYKDNDLKETDTDKITRDLARDKEEIEKLILEQKQRAD DYNYTSSTKFNIDKSRKRKHLLFNAEKGKIGVWLANDIKRFMF KESKSKWKGYQHTELQKLFAYFDTSKSDLELILSDMVMVKDYP IELIDLVRKSRTLVDFLNKYLEARLGYIENVITRVKNSIGTPQFKT VRKECFAFLKESNYTVASLDKQIERILSMPLFIERGFMDSKPTML EGKSYQQHKEDFADWFVHYKENSNYQNFYDTEVYEIITEDKRE QAKVTKKIKQQQKNDVFTLMMVNYMLEEVLKLPSNDRLSLNE LYQTKEERIVNKQVAKDTQERNKNYIWNKVVDLQLCEGLVRID KVKLKDIGNFRKYENDSRVKEFLTYQSDIVWSGYLSNEVDSNK LYVIERQLDNYESIRSKELLKEVQEIECIVYNQVANKESLKQSGN ENFKQYVLQGLLPRGTDVREMLILSTDVKFKKEEIMQLGQVRE VEQDLYSLIYIRNKFAHNQLPIKEFFDFCENNYRPISDNEYYAEY YMEIFRSIKEKYAS HMPREF9699_02005 EKB54193 MENKTSLGNNIYYNPFKPQDKSYFAGYFNAAMENTDSVFRELG [Bergeyella KRLKGKEYTSENFFDAIFKENISLVEYERYVKLLSDYFPMARLL zoohelcum DKKEVPIKERKENFKKNFKGIIKAVRDLRNFYTHKEHGEVEITD ATCC EIFGVLDEMLKSTVLTVKKKKVKTDKTKEILKKSIEKQLDILCQ 43767] KKLEYLRDTARKIEEKRRNQRERGEKELVAPFKYSDKRDDLIA (SEQ ID AIYNDAFDVYIDKKKDSLKESSKAKYNTKSDPQQEEGDLKIPIS No. 
133) KNGVVFLLSLFLTKQEIHAFKSKIAGFKATVIDEATVSEATVSHG KNSICFMATHEIFSHLAYKKLKRKVRTAEINYGEAENAEQLSVY AKETLMMQMLDELSKVPDVVYQNLSEDVQKTFIEDWNEYLKE NNGDVGTMEEEQVIHPVIRKRYEDKFNYFAIRFLDEFAQFPTLR FQVHLGNYLHDSRPKENLISDRRIKEKITVFGRLSELEHKKALFI KNTETNEDREHYWEIFPNPNYDFPKENISVNDKDFPIAGSILDRE KQPVAGKIGIKVKLLNQQYVSEVDKAVKAHQLKQRKASKPSIQ NIIEEIVPINESNPKEAIVFGGQPTAYLSMNDIHSILYEFFDKWEK KKEKLEKKGEKELRKEIGKELEKKIVGKIQAQIQQIIDKDTNAKI LKPYQDGNSTAIDKEKLIKDLKQEQNILQKLKDEQTVREKEYN DFIAYQDKNREINKVRDRNHKQYLKDNLKRKYPEAPARKEVL YYREKGKVAVWLANDIKRFMPTDFKNEWKGEQHSLLQKSLAY YEQCKEELKNLLPEKVFQHLPFKLGGYFQQKYLYQFYTCYLDK RLEYISGLVQQAENFKSENKVFKKVENECFKFLKKQNYTHKEL DARVQSILGYPIFLERGFMDEKPTIIKGKTFKGNEALFADWFRY YKEYQNFQTFYDTENYPLVELEKKQADRKRKTKIYQQKKNDV FTLLMAKHIFKSVFKQDSIDQFSLEDLYQSREERLGNQERARQT GERNTNYIWNKTVDLKLCDGKITVENVKLKNVGDFIKYEYDQR VQAFLKYEENIEWQAFLIKESKEEENYPYVVEREIEQYEKVRRE ELLKEVHLIEEYILEKVKDKEILKKGDNQNFKYYILNGLLKQLK NEDVESYKVFNLNTEPEDVNINQLKQEATDLEQKAFVLTYIRN KFAHNQLPKKEFWDYCQEKYGKIEKEKTYAEYFAEVFKKEKE ALIK HMPREF9151_01387 EKY00089 MMEKENVQGSHIYYEPTDKCFWAAFYNLARHNAYLTIAHINSF [Prevotella VNSKKGINNDDKVLDIIDDWSKFDNDLLMGARLNKLILKHFPFL saccharolytica KAPLYQLAKRKTRKQQGKEQQDYEKKGDEDPEVIQEAIANAFK F0055] MANVRKTLHAFLKQLEDLRNHFSHYNYNSPAKKMEVKFDDGF (SEQ ID CNKLYYVFDAALQMVKDDNRMNPEINMQTDFEHLVRLGRNR No. 
134) KIPNTFKYNFTNSDGTINNNGLLFFVSLFLEKRDAIWMQKKIKG FKGGTENYMRMTNEVFCRNRMVIPKLRLETDYDNHQLMFDML NELVRCPLSLYKRLKQEDQDKFRVPIEFLDEDNEADNPYQENA NSDENPTEETDPLKNTLVRHQHRFPYFVLRYFDLNEVFKQLRFQ INLGCYHFSIYDKTIGERTEKRHLTRTLFGFDRLQNFSVKLQPEH WKNMVKHLDTEESSDKPYLSDAMPHYQIENEKIGIHFLKTDTE KKETVWPSLEVEEVSSNRNKYKSEKNLTADAFLSTHELLPMMF YYQLLSSEEKTRAAAGDKVQGVLQSYRKKIFDIYDDFANGTINS MQKLDERLAKDNLLRGNMPQQMLAILEHQEPDMEQKAKEKL DRLITETKKRIGKLEDQFKQKVRIGKRRADLPKVGSIADWLVND MMRFQPAKRNADNTGVPDSKANSTEYRLLQEALAFYSAYKDR LEPYFRQVNLIGGTNPHPFLHRVDWKKCNHLLSFYHDYLEAKE QYLSHLSPADWQKHQHFLLLKVRKDIQNEKKDWKKSLVAGW KNGFNLPRGLFTESIKTWFSTDADKVQITDTKLFENRVGLIAKLI PLYYDKVYNDKPQPFYQYPFNINDRYKPEDTRKRFTAASSKLW NEKKMLYKNAQPDSSDKIEYPQYLDFLSWKKLERELRMLRNQ DMMVWLMCKDLFAQCTVEGVEFADLKLSQLEVDVNVQDNLN VLNNVSSMILPLSVYPSDAQGNVLRNSKPLHTVYVQENNTKLL KQGNFKSLLKDRRLNGLFSFIAAEGEDLQQHPLTKNRLEYELSI YQTMRISVFEQTLQLEKAILTRNKTLCGNNFNNLLNSWSEHRTD KKTLQPDIDFLIAVRNAFSHNQYPMSTNTVMQGIEKFNIQTPKL EEKDGLGIASQLAKKTKDAASRLQNIINGGTN A3431752 EOA10535 MTEQNEKPYNGTYYTLEDKHFWAAFFNLARHNAYITLTHIDRQ [Porphyromonas LAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFSF gingivalis LEGAAYGKKLFESQSSGNKSSKKKELTKKEKEELQANALSLDN JCVI LKSILFDFLQKLKDFRNYYSHYRHPESSELPLFDGNMLQRLYNV SC001] FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRCGNNDN (SEQ ID PFFKHHFVDREEKVTEAGLLFFVSLFLEKRDAIWMQKKIRGFKG No. 
135) GTETYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNELV RCPKSLYDRLREEDRARFRVPVDILSDEDDTDGTEEDPFKNTLV RHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKNIGEQ PEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETGDK PYITQTTPHYHIEKGKIGLRFVPEGQLLWPSPEVGATRTGRSKY AQDKRFTAEAFLSVHELMPMMFYYFLLREKYSEEASAERVQGR IKRVIEDVYAVYDAFARGEIDTLDRLDACLADKGIRRGHLPRQ MIAILSQEHKDMEEKVRKKLQEMIADTDHRLDMLDRQTDRKIR IGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSKA NSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPETPFLHE TRWESHTNILSFYRSYLKARKAFLQSIGRSDRVENHRFLLLKEP KTDRQTLVAGWKGEFHLPRGIFTEAVRDCLIEMGLDEVGSYKE VGFMAKAVPLYFERACKDRVQPFYDYPFNVGNSLKPKKGRFLS KEKRAEEWESGKERFRDLEAWSHSAARRIEDAFAGIENASREN KKKIEQLLQDLSLWETFESKLKVKADKINIAKLKKEILEAKEHP YLDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLDT GTLYLKDIRTDVHEQGSLNVLNRVKPMRLPVVVYRADSRGHV HKEQAPLATVYIEERDTKLLICQGNFKSFVKDRRLNGLFSFVDT GALAMEQYPISKLRVEYELAKYQTARVCAFEQTLELEESLLTRY PHLPDKNFRKMLESWSDPLLDKWPDLHGNVRLLIAVRNAFSHN QYPMYDETLFSSIRKYDPSSPDAIEERMGLNIAHRLSEEVKQAK EMVERIIQA HMPREF1981_03090 ERI81700 MESIKNSQKSTGKTLQKDPPYFGLYLNMALLNVRKVENHIRKW [Bacteroides LGDVALLPEKSGFHSLLTTDNLSSAKWTRFYYKSRKFLPFLEMF pyogenes DSDKKSYENRRETTECLDTIDRQKISSLLKEVYGKLQDIRNAFS F0041] HYHIDDQSVKHTALIISSEMHRFIENAYSFALQKTRARFTGVFVE (SEQ ID TDFLQAEEKGDNKKFFAIGGNEGIKLKDNALIFLICLFLDREEAF No. 
136) KFLSRATGFKSTKEKGFLAVRETFCALCCRQPHERLLSVNPREA LLMDMLNELNRCPDILFEMLDEKDQKSFLPLLGEEEQAHILENS LNDELCEAIDDPFEMIASLSKRVRYKNRFPYLMLRYIEEKNLLPF IRFRIDLGCLELASYPKKMGEENNYERSVTDHAMAFGRLTDFH NEDAVLQQITKGITDEVRFSLYAPRYAIYNNKIGFVRTGGSDKIS FPTLKKKGGEGHCVAYTLQNTKSFGFISIYDLRKILLLSFLDKDK AKNIVSGLLEQCEKHWKDLSENLFDAIRTELQKEFPVPLIRYTLP RSKGGKLVSSKLADKQEKYESEFERRKEKLTEILSEKDFDLSQIP RRMIDEWLNVLPTSREKKLKGYVETLKLDCRERLRVFEKREKG EHPVPPRIGEMATDLAKDIIRMVIDQGVKQRITSAYYSEIQRCLA QYAGDDNRRHLDSIIRELRLKDTKNGHPFLGKVLRPGLGHTEK LYQRYFEEKKEWLEATFYPAASPKRVPRFVNPPTGKQKELPLII RNLMKERPEWRDWKQRKNSHPIDLPSQLFENEICRLLKDKIGKE PSGKLKWNEMFKLYWDKEFPNGMQRFYRCKRRVEVFDKVVE YEYSEEGGNYKKYYEALIDEVVRQKISSSKEKSKLQVEDLTLSV RRVFKRAINEKEYQLRLLCEDDRLLFMAVRDLYDWKEAQLDL DKIDNMLGEPVSVSQVIQLEGGQPDAVIKAECKLKDVSKLMRY CYDGRVKGLMPYFANHEATQEQVEMELRHYEDHRRRVFNWV FALEKSVLKNEKLRRFYEESQGGCEHRRCIDALRKASLVSEEEY EFLVHIRNKSAHNQFPDLEIGKLPPNVTSGFCECIWSKYKAIICRI IPFIDPERRFFGKLLEQK HMPREF1553_02065 ERJ65637 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK [Porphyromonas FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF gingivalis DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL F0568] DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA (SEQ ID KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF No. 
137) KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPRSMGFISVHDLRKLLLMELLCEGSFSRMQSDFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMDQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLQKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRHQFRAIV AELRLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVQDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDEENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDPENRFFGKLLNNMSQPINDL HMPREF1988_01768 ERJ81987 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK [Porphyromonas FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF gingivalis DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL F0185] DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA (SEQ ID KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF No. 
138) KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPQSMGFISVHDLRKLLLMELLCEGSFSRMQSGFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMNQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLRKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRRQFRAIV AELHLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVQDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDEENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDHENRFFGKLLNNMSQPINDL HMPREF1990_01800 ERJ87335 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK [Porphyromonas FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF gingivalis DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL W4087] DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA (SEQ ID KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF No. 
139) KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPRSMGFISVHDLRKLLLMELLCEGSFSRMQSDFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMDQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLQKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRHQFRAIV AELRLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKVMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVRDKKRELRTAGKPVPPDLAAYIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKIMTDREEDILPGLKNIDSILDKENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEIPLIYRDVSAKVGSIEGSSAKDLPEG SSLVDSLWKKYEMIIRKILPILDPENRFFGKLLNNMSQPINDL M573_117042 KJJ86756 MKMEDDKKTTESTNMLDNKHFWAAFLNLARHNVYITVNHINK [Prevotella VLELKNKKDQDIIIDNDQDILAIKTHWEKVNGDLNKTERLRELM intermedia TKHFPFLETAIYTKNKEDKEEVKQEKQAEAQSLESLKDCLFLFL ZT] (SEQ EKLQEARNYYSHYKYSESTKEPMLEEGLLEKMYNIFDDNIQLVI ID No. 
KDYQHNKDINPDEDFKHLDRKGQFKYSFADNEGNITESGLLFF 140) VSLFLEKKDAIWMQQKLTGFKDNRESKKKMTHEVFCRRRMLL PKLRLESTQTQDWILLDMLNELIRCPKSLYERLQGEYRKKFNVP FDSADEDYDAEQEPFKNTLVRHQDREPYFALRYFDYNEIFTNLR FQIDLGTYHFSIYKKLIGGQKEDRHLTHKLYGFERIQEFAKQNRP DEWKALVKDLDTYETSNERYISETTPHYHLENQKIGIRFRNGNK EIWPSLKTNGENNEKSKYKLDKPYQAEAFLSVHELLPMMFYYL LLKKEEPNNDKKNASIVEGFIKREIRDMYKLYDAFANGEINNIG DLEKYCEDKGIPKRHLPKQMVAILYDEPKDMVKEAKRKQKEM VKDTKKLLATLEKQTQEEIEDGGRNIRLLKSGEIARWLVNDMM RFQPVQKDNEGNPLNNSKANSTEYQMLQRSLALYNKEEKPTRY FRQVNLINSSNPHPFLKWTKWEECNNILSFYRNYLTKKIEFLNK LKPEDWEKNQYFLKLKEPKTNRETLVQGWKNGFNLPRGIFTEPI REWFKRHQNDSKEYEKVEALKRVGLVTKVIPLEFKEEYEKEDA QKEINNCVQPFYSFPYNVGNIHKPDEKDFLPSEERKKLWGDKK DKFKGYKAKVKSKKLTDKEKEEYRSYLEFQSWNKFERELRLV RNQDIVTWLLCTELIDKMKVEGLNVEELQKLRLKDIDTDTAKQ EKNNILNRIMPMQLPVTVYEIDDSHNIVKDRPLHTVYIEETKTKL LKQGNFKALVKDRRLNGLFSFVDTSSKAELKDKPISKSVVEYEL GEYQNARIETIKDMLLLEKTLIKKYEKLPTDNFSDMLNGWLEG KDESDKARFQNDVKLLVAVRNAFSHNQYPMRNRIAFANINPFS LSSADISEEKKLDIANQLKDKTHKIIKKIIEIEKPIETKE A2033_10205 OFX18020.1 MENQTQKGKGIYYYYTKNEDKHYFGSFLNLANNNIEQIIEEFRI [Bacteroidetes RLSLKDEKNIKEIINNYFTDKKSYTDWERGINILKEYLPVIDYLD bacterium LAITDKEFEKIDLKQKETAKRKYFRTNFSLLIDTIIDLRNFYTHYF GWA2_31_9] HKPISINPDVAKFLDKNLLNVCLDIKKQKMKTDKTKQALKDGL (SEQ DKELKKLIELKKAELKEKKIKTWNITENVEGAVYNDAFNHMVY ID No. 
KNNAGVTILKDYHKSILPDDKIDSELKLNFSISGLVFLLSMFLSK 141) KEIEQFKSNLEGFKGKVIGENGEYEISKFNNSLKYMATHWIFSY LTFKGLKQRVKNTEDKETLLMQMIDELNKVPHEVYQTLSKEQQ NEFLEDINEYVQDNEENKKSMENSIVVHPVIRKRYDDKENYFAI RELDEFANEPTLKFFVTAGNEVHDKREKQIQGSMLTSDRMIKEK INVFGKLTEIAKYKSDYFSNENTLETSEWELFPNPSYLLIQNNIPV HIDLIHNTEEAKQCQIAIDRIKCTTNPAKKRNTRKSKEEIIKIIYQ KNKNIKYGDPTALLSSNELPALIYELLVNKKSGKELENIIVEKIV NQYKTIAGFEKGQNLSNSLITKKLKKSEPNEDKINAEKIILAINRE LEITENKLNIIKNNRAEFRTGAKRKHIFYSKELGQEATWIAYDLK RFMPEASRKEWKGEHHSELQKFLAFYDRNKNDAKALLNMFW NFDNDQLIGNDLNSAFREFHFDKEYEKYLIKRDEILEGFKSFISN FKDEPKLLKKGIKDIYRVEDKRYYIIKSTNAQKEQLLSKPICLPR GIFDNKPTYIEGVKVESNSALFADWYQYTYSDKHEFQSFYDMP RDYKEQFEKFELNNIKSIQNKKNLNKSDKFIYFRYKQDLKIKQIK SQDLFIKLMVDELENVVEKNNIELNLKKLYQTSDERFKNQLIAD VQKNREKGDTSDNKMNENFIWNMTIPLSLCNGQIEEPKVKLKD IGKFRKLETDDKVIQLLEYDKSKVWKKLEIEDELENMPNSYERI RREKLLKGIQEFEHFLLEKEKEDGINHPKHFEQDLNPNEKTYVIN GVLRKNSKLNYTEIDKLLDLEHISIKDIETSAKEIHLAYFLIHVRN KFGHNQLPKLEAFELMKKYYKKNNEETYAEYFHKVSSQIVNEF KNSLEKHS SAMN05421542_0666 SDI27289.1 MEKTQTGLGIYYDHTKLQDKYFFGGFFNLAQNNIDNVIKAFIIK [Chryseobacterium FFPERKDKDINIAQFLDICFKDNDADSDFQKKNKFLRIHFPVIGF jejuense] LTSDNDKAGFKKKFALLLKTISELRNFYTHYYHKSIEFPSELFEL (SEQ ID LDDIFVKTTSEIKKLKKKDDKTQQLLNKNLSEEYDIRYQQQIER No. 
142) LKELKAQGKRVSLTDETAIRNGVFNAAFNHLIYRDGENVKPSR LYQSSYSEPDPAENGISLSQNSILFLLSMFLERKETEDLKSRVKG FKAKIIKQGEEQISGLKFMATHWVFSYLCFKGIKQKLSTEFHEET LLIQIIDELSKVPDEVYSAFDSKTKEKFLEDINEYMKEGNADLSL EDSKVIHPVIRKRYENKFNYFAIRFLDEYLSSTSLKFQVHVGNY VHDRRVKHINGTGFQTERIVKDRIKVFGRLSNISNLKADYIKEQ LELPNDSNGWEIFPNPSYIFIDNNVPIHVLADEATKKGIELFKDK RRKEQPEELQKRKGKISKYNIVSMIYKEAKGKDKLRIDEPLALL SLNEIPALLYQILEKGATPKDIELIIKNKLTERFEKIKNYDPETPAP ASQISKRLRNNTTAKGQEALNAEKLSLLIEREIENTETKLSSIEEK RLKAKKEQRRNTPQRSIFSNSDLGRIAAWLADDIKRFMPAEQRK NWKGYQHSQLQQSLAYFEKRPQEAFLLLKEGWDTSDGSSYWN NWVMNSFLENNHFEKFYKNYLMKRVKYFSELAGNIKQHTHNT KFLRKFIKQQMPADLFPKRHYILKDLETEKNKVLSKPLVFSRGL FDNNPTFIKGVKVTENPELFAEWYSYGYKTEHVFQHFYGWERD YNELLDSELQKGNSFAKNSIYYNRESQLDLIKLKQDLKIKKIKIQ DLFLKRIAEKLFENVFNYPTTLSLDEFYLTQEERAEKERIALAQS LREEGDNSPNIIKDDFIWSKTIAFRSKQIYEPAIKLKDIGKFNRFV LDDEESKASKLLSYDKNKIWNKEQLERELSIGENSYEVIRREKL FKEIQNLELQILSNWSWDGINHPREFEMEDQKNTRHPNFKMYL VNGILRKNINLYKEDEDFWLESLKENDFKTLPSEVLETKSEMVQ LLFLVILIRNQFAHNQLPEIQFYNFIRKNYPEIQNNTVAELYLNLI KLAVQKLKDNS SAMN05444360_11366 SHM52812.1 MNTRVTGMGVSYDHTKKEDKHFFGGFLNLAQDNITAVIKAFCI [Chryseobacterium KFDKNPMSSVQFAESCFTDKDSDTDFQNKVRYVRTHLPVIGYL carnipullorum] NYGGDRNTFRQKLSTLLKAVDSLRNFYTHYYHSPLALSTELFEL (SEQ LDTVFASVAVEVKQHKMKDDKTRQLLSKSLAEELDIRYKQQLE ID No. 
RLKELKEQGKNIDLRDEAGIRNGVLNAAFNHLIYKEGEIAKPTL 143) SYSSFYYGADSAENGITISQSGLLFLLSMFLGKKEIEDLKSRIRGF KAKIVRDGEENISGLKFMATHWIFSYLSFKGMKQRLSTDFHEET LLIQIIDELSKVPDEVYHDFDTATREKFVEDINEYIREGNEDFSLG DSTIIHPVIRKRYENKFNYFAVRFLDEFIKFPSLRFQVHLGNFVH DRRIKDIHGTGFQTERVVKDRIKVFGKLSETSSLKTEYIEKELDL DSDTGWEIFPNPSYVFIDNNIPIYISTNKTFKNGSSEFIKLRRKEKP EEMKMRGEDKKEKRDIASMIGNAGSLNSKTPLAMLSLNEMPAL LYEILVKKTTPEEIELIIKEKLDSHFENIKNYDPEKPLPASQISKRL RNNTTDKGKKVINPEKLIHLINKEIDATEAKFALLAKNRKELKE KFRGKPLRQTIFSNMELGREATWLADDIKRFMPDILRKNWKGY QHNQLQQSLAFFNSRPKEAFTILQDGWDFADGSSFWNGWIINSF VKNRSFEYFYEAYFEGRKEYFSSLAENIKQHTSNHRNLRRFIDQ QMPKGLFENRHYLLENLETEKNKILSKPLVFPRGLFDTKPTFIKG IKVDEQPELFAEWYQYGYSTEHVFQNFYGWERDYNDLLESELE KDNDFSKNSIHYSRTSQLELIKLKQDLKIKKIKIQDLFLKLIAGHI FENIFKYPASFSLDELYLTQEERLNKEQEALIQSQRKEGDHSDNII KDNFIGSKTVTYESKQISEPNVKLKDIGKFNRFLLDDKVKTLLS YNEDKVWNKNDLDLELSIGENSYEVIRREKLFKKIQNFELQTLT DWPWNGTDHPEEFGTTDNKGVNHPNFKMYVVNGILRKHTDW FKEGEDNWLENLNETHFKNLSFQELETKSKSIQTAFLIIMIRNQF AHNQLPAVQFFEFIQKKYPEIQGSTTSELYLNFINLAVVELLELL EK SAMN05421786_1011119 SIS70481.1 METQILGNGISYDHTKTEDKHFFGGFLNTAQNNIDLLIKAYISKF [Chryseobacterium ESSPRKLNSVQFPDVCFKKNDSDADFQHKLQFIRKHLPVIQYLK ureilyticum] YGGNREVLKEKFRLLLQAVDSLRNFYTHFYHKPIQLPNELLTLL (SEQ ID DTIFGEIGNEVRQNKMKDDKTRHLLKKNLSEELDFRYQEQLER No. 
144) LRKLKSEGKKVDLRDTEAIRNGVLNAAFNHLIFKDAEDFKPTVS YSSYYYDSDTAENGISISQSGLLFLLSMFLGRREMEDLKSRVRG FKARIIKHEEQHVSGLKFMATHWVFSEFCFKGIKTRLNADYHEE TLLIQLIDELSKVPDELYRSFDVATRERFIEDINEYIRDGKEDKSL IESKIVHPVIRKRYESKFNYFAIRFLDEFVNFPTLRFQVHAGNYV HDRRIKSIEGTGFKTERLVKDRIKVFGKLSTISSLKAEYLAKAVN ITDDTGWELLPHPSYVFIDNNIPIHLTVDPSFKNGVKEYQEKRKL QKPEEMKNRQGGDKMHKPAISSKIGKSKDINPESPVALLSMNEI PALLYEILVKKASPEEVEAKIRQKLTAVFERIRDYDPKVPLPASQ VSKRLRNNTDTLSYNKEKLVELANKEVEQTERKLALITKNRRE CREKVKGKFKRQKVFKNAELGTEATWLANDIKRFMPEEQKKN WKGYQHSQLQQSLAFFESRPGEARSLLQAGWDFSDGSSFWNG WVMNSFARDNTFDGFYESYLNGRMKYFLRLADNIAQQSSTNK LISNFIKQQMPKGLFDRRLYMLEDLATEKNKILSKPLIFPRGIFD DKPTFKKGVQVSEEPEAFADWYSYGYDVKHKFQEFYAWDRD YEELLREELEKDTAFTKNSIHYSRESQIELLAKKQDLKVKKVRI QDLYLKLMAEFLFENVFGHELALPLDQFYLTQEERLKQEQEAIV QSQRPKGDDSPNIVKENFIWSKTIPFKSGRVFEPNVKLKDIGKFR NLLTDEKVDILLSYNNTEIGKQVIENELIIGAGSYEFIRREQLFKEI QQMKRLSLRSVRGMGVPIRLNLK Prevotella WP_004343581 MQKQDKLFVDRKKNAIFAFPKYITIMENQEKPEPIYYELTDKHF buccae WAAFLNLARHNVYTTINHINRRLEIAELKDDGYMMDIKGSWNE (SEQ ID QAKKLDKKVRLRDLIMKHFPFLEAAAYEITNSKSPNNKEQREK No. 
145) EQSEALSLNNLKNVLFIFLEKLQVLRNYYSHYKYSEESPKPIFET SLLKNMYKVFDANVRLVKRDYMHHENIDMQRDFTHLNRKKQ VGRTKNIIDSPNFHYHFADKEGNMTIAGLLFFVSLFLDKKDAIW MQKKLKGFKDGRNLREQMTNEVFCRSRISLPKLKLENVQTKD WMQLDMLNELVRCPKSLYERLREKDRESFKVPFDIFSDDYDAE EEPFKNTLVRHQDRFPYFVLRYFDLNEIFEQLRFQIDLGTYHFSI YNKRIGDEDEVRHLTHHLYGFARIQDFAQQNQPEVWRKLVKD LDYFEASQEPYIPKTAPHYHLENEKIGIKFCSTHNNLFPSLKTEK TCNGRSKFNLGTQFTAEAFLSVHELLPMMFYYLLLTKDYSRKE SADKVEGIIRKEISNIYAIYDAFANGEINSIADLTCRLQKTNILQG HLPKQMISILEGRQKDMEKEAERKIGEMIDDTQRRLDLLCKQTN QKIRIGKRNAGLLKSGKIADWLVNDMMRFQPVQKDQNNIPINN SKANSTEYRMLQRALALFGSENFRLKAYFNQMNLVGNDNPHP FLAETQWEHQTNILSFYRNYLEARKKYLKGLKPQNWKQYQHF LILKVQKTNRNTLVTGWKNSFNLPRGIFTQPIREWFEKHNNSKR IYDQILSFDRVGFVAKAIPLYFAEEYKDNVQPFYDYPFNIGNKL KPQKGQFLDKKERVELWQKNKELFKNYPSEKKKTDLAYLDFL SWKKFERELRLIKNQDIVTWLMFKELFNMATVEGLKIGEIHLRD IDTNTANEESNNILNRIMPMKLPVKTYETDNKGNILKERPLATF YIEETETKVLKQGNFKVLAKDRRLNGLLSFAETTDIDLEKNPITK LSVDHELIKYQTTRISIFEMTLGLEKKLINKYPTLPTDSFRNMLE RWLQCKANRPELKNYVNSLIAVRNAFSHNQYPMYDATLFAEV KKFTLFPSVDTKKIELNIAPQLLEIVGKAIKEIEKSENKN Porphyromonas WP_005873511 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK gingivalis FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF (SEQ ID DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
146) DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPQSMGFISVHNLRKLLLMELLCEGSFSRMQSDFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMNQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLRKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRRQFRAIV AELHLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVQDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDEENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDPENRFFGKLLNNMSQPINDL Porphyromonas WP_005874195 MTEQNEKPYNGTYYTLEDKHFWAAFFNLARHNAYITLAHIDRQ gingivalis LAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFSF (SEQ ID LEGAAYGKKLFESQSSGNKSSKKKELTKKEKEELQANALSLDN No. 
147) LKSILFDFLQKLKDFRNYYSHYRHPESSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDKYGNNDN PFFKHHFVDREEKVTEAGLLFFVSLFLEKRDAIWMQKKIRGFKG GTEAYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNELV RCPKSLYDRLREEDRARFRVPVDILSDEDDTDGTEEDPFKNTLV RHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKNIGEQ PEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETGDK PYITQTTPHYHIEKGKIGLRFVPEGQLLWPSPEVGATRTGRSKY AQDKRFTAEAFLSVHELMPMMFYYFLLREKYSEEASAEKVQG RIKRVIEDVYAVYDAFARDEINTRDELDACLADKGIRRGHLPRQ MIAILSQEHKDMEEKVRKKLQEMIADTDHRLDMLDRQTDRKIR IGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSKA NSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLHE TRWESHTNILSFYRSYLKARKAFLQSIGRSDREENHRFLLLKEPK TDRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSYKEV GFMAKAVPLYFERACKDRVQPFYDYPFNVGNSLKPKKGRFLSK EKRAEEWESGKERFRDLEAWSHSAARRIEDAFVGIEYASWENK KKIEQLLQDLSLWETFESKLKVKADKINIAKLKKEILEAKEHPY HDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLDTG TLYLKDIRTDVQEQGSLNVLNHVKPMRLPVVVYRADSRGHVH KEEAPLATVYIEERDTKLLKQGNFKSFVKDRRLNGLFSFVDTGA LAMEQYPISKLRVEYELAKYQTARVCAFEQTLELEESLLTRYPH LPDESFREMLESWSDPLLDKWPDLQREVRLLIAVRNAFSHNQY PMYDETIFSSIRKYDPSSLDAIEERMGLNIAHRLSEEVKLAKEMV ERIIQA Prevotella WP_006044833 MKEEEKGKTPVVSTYNKDDKHFWAAFLNLARHNVYITVNHIN pallens KILGEGEINRDGYENTLEKSWNEIKDINKKDRLSKLIIKHFPFLE (SEQ ID VTTYQRNSADTTKQKEEKQAEAQSLESLKKSFFVFIYKLRDLRN No. 
148) HYSHYKHSKSLERPKFEEDLQEKMYNIFDASIQLVKEDYKHNT DIKTEEDFKHLDRKGQFKYSFADNEGNITESGLLFFVSLFLEKK DAIWVQKKLEGFKCSNESYQKMTNEVFCRSRMLLPKLRLQSTQ TQDWILLDMLNELIRCPKSLYERLREEDRKKFRVPIEIADEDYD AEQEPFKNALVRHQDRFPYFALRYFDYNEIFTNLRFQIDLGTYH FSIYKKQIGDYKESHHLTHKLYGFERIQEFTKQNRPDEWRKFVK TFNSFETSKEPYIPETTPHYHLENQKIGIRFRNDNDKIWPSLKTNS EKNEKSKYKLDKSFQAEAFLSVHELLPMMFYYLLLKTENTDND NEIETKKKENKNDKQEKHKIEEIIENKITEIYALYDAFANGKINSI DKLEEYCKGKDIEIGHLPKQMIAILKSEHKDMATEAKRKQEEM LADVQKSLESLDNQINEEIENVERKNSSLKSGEIASWLVNDMM RFQPVQKDNEGNPLNNSKANSTEYQMLQRSLALYNKEEKPTRY FRQVNLIESSNPHPFLNNTEWEKCNNILSFYRSYLEAKKNFLESL KPEDWEKNQYFLMLKEPKTNCETLVQGWKNGFNLPRGIFTEPI RKWFMEHRKNITVAELKRVGLVAKVIPLFFSEEYKDSVQPFYN YLFNVGNINKPDEKNFLNCEERRELLRKKKDEFKKMTDKEKEE NPSYLEFQSWNKFERELRLVRNQDIVTWLLCMELFNKKKIKEL NVEKIYLKNINTNTTKKEKNTEEKNGEEKIIKEKNNILNRIMPMR LPIKVYGRENFSKNKKKKIRRNTFFTVYIEEKGTKLLKQGNFKA LERDRRLGGLFSFVKTHSKAESKSNTISKSRVEYELGEYQKARIE IIKDMLALEETLIDKYNSLDTDNFHNMLTGWLKLKDEPDKASF QNDVDLLIAVRNAFSHNQYPMRNRIAFANINPFSLSSANTSEEK GLGIANQLKDKTHKTIEKIIEIEKPIETKE Myroides WP_006261414 MKDILTTDTTEKQNRFYSHKIADKYFFGGYFNLASNNIYEVFEE odoratimimus VNKRNTFGKLAKRDNGNLKNYIIHVFKDELSISDFEKRVAIFAS (SEQ ID YFPILETVDKKSIKERNRTIDLTLSQRIRQFREMLISLVTAVDQLR No. 
149) NFYTHYHHSEIVIENKVLDFLNSSLVSTALHVKDKYLKTDKTKE FLKETIAAELDILIEAYKKKQIEKKNTRFKANKREDILNAIYNEA FWSFINDKDKDKETVVAKGADAYFEKNHHKSNDPDFALNISEK GIVYLLSFFLTNKEMDSLKANLTGFKGKVDRESGNSIKYMATQ RIYSFHTYRGLKQKIRTSEEGVKETLLMQMIDELSKVPNVVYQH LSTTQQNSFIEDWNEYYKDYEDDVETDDLSRVIHPVIRKRYEDR FNYFAIRFLDEFFDFPTLRFQVHLGDYVHDRRTKQLGKVESDRII KEKVTVFARLKDINSAKANYFHSLEEQDKEELDNKWTLFPNPS YDFPKEHTLQHQGEQKNAGKIGIYVKLRDTQYKEKAALEEARK SLNPKERSATKASKYDIITQIIEANDNVKSEKPLVFTGQPIAYLS MNDIHSMLFSLLTDNAELKKTPEEVEAKLIDQIGKQINEILSKDT DTKILKKYKDNDLKETDTDKITRDLARDKEEIEKLILEQKQRAD DYNYTSSTKFNIDKSRKRKHLLFNAEKGKIGVWLANDIKRFMT EEFKSKWKGYQHTELQKLFAYYDTSKSDLDLILSDMVMVKDY PIELIALVKKSRTLVDFLNKYLEARLGYMENVITRVKNSIGTPQF KTVRKECFTFLKKSNYTVVSLDKQVERILSMPLFIERGFMDDKP TMLEGKSYQQHKEKFADWFVHYKENSNYQNFYDTEVYEITTE DKREKAKVTKKIKQQQKNDVFTLMMVNYMLEEVLKLSSNDRL SLNELYQTKEERIVNKQVAKDTQERNKNYIWNKVVDLQLCEG LVRIDKVKLKDIGNFRKYENDSRVKEFLTYQSDIVWSAYLSNEV DSNKLYVIERQLDNYESIRSKELLKEVQEIECSVYNQVANKESL KQSGNENFKQYVLQGLVPIGMDVREMLILSTDVKFIKEEIIQLG QAGEVEQDLYSLIYIRNKFAHNQLPIKEFFDFCENNYRSISDNEY YAEYYMEIFRSIKEKYTS Myroides WP_006265509 MKDILTTDTTEKQNRFYSHKIADKYFFGGYFNLASNNIYEVFEE odoratimimus VNKRNTFGKLAKRDNGNLKNYIIHVFKDELSISDFEKRVAIFAS (SEQ ID YFPILETVDKKSIKERNRTIDLTLSQRIRQFREMLISLVTAVDQLR No. 
150) NFYTHYHHSEIVIENKVLDFLNSSLVSTALHVKDKYLKTDKTKE FLKETIAAELDILIEAYKKKQIEKKNTRFKANKREDILNAIYNEA FWSFINDKDKDKETVVAKGADAYFEKNHHKSNDPDFALNISEK GIVYLLSFFLTNKEMDSLKANLTGFKGKVDRESGNSIKYMATQ RIYSFHTYRGLKQKIRTSEEGVKETLLMQMIDELSKVPNVVYQH LSTTQQNSFIEDWNEYYKDYEDDVETDDLSRVIHPVIRKRYEDR FNYFAIRFLDEFFDFPTLRFQVHLGDYVHDRRTKQLGKVESDRII KEKVTVFARLKDINSAKASYFHSLEEQDKEELDNKWTLFPNPS YDFPKEHTLQHQGEQKNAGKIGIYVKLRDTQYKEKAALEEARK SLNPKERSATKASKYDIITQIIEANDNVKSEKPLVFTGQPIAYLS MNDIHSMLFSLLTDNAELKKTPEEVEAKLIDQIGKQINEILSKDT DTKILKKYKDNDLKETDTDKITRDLARDKEEIEKLILEQKQRAD DYNYTSSTKFNIDKSRKRKHLLFNAEKGKIGVWLANDIKRFMF KESKSKWKGYQHTELQKLFAYFDTSKSDLELILSDMVMVKDYP IELIDLVRKSRTLVDFLNKYLEARLGYIENVITRVKNSIGTPQFKT VRKECFAFLKESNYTVASLDKQIERILSMPLFIERGFMDSKPTML EGKSYQQHKEDFADWFVHYKENSNYQNFYDTEVYEIITEDKRE QAKVTKKIKQQQKNDVFTLMMVNYMLEEVLKLPSNDRLSLNE LYQTKEERIVNKQVAKDTQERNKNYIWNKVVDLQLCEGLVRID KVKLKDIGNFRKYENDSRVKEFLTYQSDIVWSGYLSNEVDSNK LYVIERQLDNYESIRSKELLKEVQEIECIVYNQVANKESLKQSGN ENFKQYVLQGLLPRGTDVREMLILSTDVKFKKEEIMQLGQVRE VEQDLYSLIYIRNKFAHNQLPIKEFFDFCENNYRPISDNEYYAEY YMEIFRSIKEKYAS Prevotella WP_007412163 MQKQDKLFVDRKKNAIFAFPKYITIMENQEKPEPIYYELTDKHF sp. MSX73 WAAFLNLARHNVYTTINHINRRLEIAELKDDGYMMGIKGSWNE (SEQ ID QAKKLDKKVRLRDLIMKHFPFLEAAAYEITNSKSPNNKEQREK No. 
151) EQSEALSLNNLKNVLFIFLEKLQVLRNYYSHYKYSEESPKPIFET SLLKNMYKVFDANVRLVKRDYMHHENIDMQRDFTHLNRKKQ VGRTKNIIDSPNFHYHFADKEGNMTIAGLLFFVSLFLDKKDAIW MQKKLKGFKDGRNLREQMTNEVFCRSRISLPKLKLENVQTKD WMQLDMLNELVRCPKSLYERLREKDRESFKVPFDIFSDDYDAE EEPFKNTLVRHQDRFPYFVLRYFDLNEIFEQLRFQIDLGTYHFSI YNKRIGDEDEVRHLTHHLYGFARIQDFAPQNQPEEWRKLVKDL DHFETSQEPYISKTAPHYHLENEKIGIKFCSTHNNLFPSLKREKT CNGRSKFNLGTQFTAEAFLSVHELLPMMFYYLLLTKDYSRKES ADKVEGIIRKEISNIYAIYDAFANNEINSIADLTCRLQKTNILQGH LPKQMISILEGRQKDMEKEAERKIGEMIDDTQRRLDLLCKQTNQ KIRIGKRNAGLLKSGKIADWLVSDMMRFQPVQKDTNNAPINNS KANSTEYRMLQHALALFGSESSRLKAYFRQMNLVGNANPHPFL AETQWEHQTNILSFYRNYLEARKKYLKGLKPQNWKQYQHFLIL KVQKTNRNTLVTGWKNSFNLPRGIFTQPIREWFEKHNNSKRIYD QILSFDRVGFVAKAIPLYFAEEYKDNVQPFYDYPFNIGNKLKPQ KGQFLDKKERVELWQKNKELFKNYPSEKNKTDLAYLDFLSWK KFERELRLIKNQDIVTWLMFKELFKTTTVEGLKIGEIHLRDIDTN TANEESNNILNRIMPMKLPVKTYETDNKGNILKERPLATFYIEET ETKVLKQGNFKVLAKDRRLNGLLSFAETTDIDLEKNPITKLSVD YELIKYQTTRISIFEMTLGLEKKLIDKYSTLPTDSFRNMLERWLQ CKANRPELKNYVNSLIAVRNAFSHNQYPMYDATLFAEVKKFTL FPSVDTKKIELNIAPQLLEIVGKAIKEIEKSENKN Porphyromonas WP_012458414 MTEQNERPYNGTYYTLEDKHFWAAFFNLARHNAYITLAHIDRQ gingivalis LAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFSF (SEQ ID LEGAAYGKKLFESQSSGNKSSKKKELTKKEKEELQANALSLDN No. 
152) LKSILFDFLQKLKDFRNYYSHYRHPESSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRYGNNDN PFFKHHFVDREEKVTEAGLLFFVSLFLEKRDAIWMQKKIRGFKG GTETYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNELV RCPKSLYDRLREEDRARFRVPVDILSDEDDTDGTEEDPFKNTLV RHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKNIGEQ PEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETGDK PYITQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGATRTGRSKY AQDKRLTAEAFLSVHELMPMMFYYFLLREKYSDEASAERVQG RIKRVIEDVYAVYDAFARGEINTRDELDACLADKGIRRGHLPRQ MIGILSQEHKDMEEKVRKKLQEMIVDTDHRLDMLDRQTDRKIR IGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSKA NSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLHE TRWESHTNILSFYRSYLKARKAFLQSIGRSDRVENHRFLLLKEP KTDRQTLVAGWKGEFHLPRGIFTEAVRDCLIEMGLDEVGSYKE VGFMAKAVPLYFERACKDRVQPFYDYPFNVGNSLKPKKGRFLS KEKRAEEWESGKERFRLAKLKKEILEAKEHPYLDFKSWQKFER ELRLVKNQDIITWMICRDLMEENKVEGLDTGTLYLKDIRTDVQ EQGNLNVLNRVKPMRLPVVVYRADSRGHVHKEQAPLATVYIE ERDTKLLKQGNFKSFVKDRRLNGLFSFVDTGALAMEQYPISKL RVEYELAKYQTARVCAFEQTLELEESLLTRYPHLPDKNFRKML ESWSDPLLDKWPDLHGNVRLLIAVRNAFSHNQYPMYDEAVFSS IRKYDPSSPDAIEERMGLNIAHRLSEEVKQAKEMAERIIQA Paludibacter WP_013446107 MKTSANNIYFNGINSFKKIFDSKGAIAPIAEKSCRNFDIKAQNDV propionicigenes NKEQRIHYFAVGHTFKQLDTENLFEYVLDENLRAKRPTRFISLQ (SEQ QFDKEFIENIKRLISDIRNINSHYIHRFDPLKIDAVPTNIIDFLKESF ID No. 
ELAVIQIYLKEKGINYLQFSENPHADQKLVAFLHDKFLPLDEKK 153) TSMLQNETPQLKEYKEYRKYFKTLSKQAAIDQLLFAEKETDYI WNLFDSHPVLTISAGKYLSFYSCLFLLSMFLYKSEANQLISKIKG FKKNTTEEEKSKREIFTFFSKRFNSMDIDSEENQLVKFRDLILYL NHYPVAWNKDLELDSSNPAMTDKLKSKIIELEINRSFPLYEGNE RFATFAKYQIWGKKHLGKSIEKEYINASFTDEEITAYTYETDTCP ELKDAHKKLADLKAAKGLFGKRKEKNESDIKKTETSIRELQHEP NPIKDKLIQRIEKNLLTVSYGRNQDRFMDFSARFLAEINYFGQD ASFKMYHFYATDEQNSELEKYELPKDKKKYDSLKFHQGKLVH FISYKEHLKRYESWDDAFVIENNAIQLKLSFDGVENTVTIQRAL LIYLLEDALRNIQNNTAENAGKQLLQEYYSHNKADLSAFKQILT QQDSIEPQQKTEFKKLLPRRLLNNYSPAINHLQTPHSSLPLILEK ALLAEKRYCSLVVKAKAEGNYDDFIKRNKGKQFKLQFIRKAW NLMYFRNSYLQNVQAAGHHKSFHIERDEFNDFSRYMFAFEELS QYKYYLNEMFEKKGFFENNEFKILFQSGTSLENLYEKTKQKFEI WLASNTAKTNKPDNYHLNNYEQQFSNQLFFINLSHFINYLKSTG KLQTDANGQIIYEALNNVQYLIPEYYYTDKPERSESKSGNKLYN KLKATKLEDALLYEMAMCYLKADKQIADKAKHPITKLLTSDVE FNITNKEGIQLYHLLVPFKKIDAFIGLKMHKEQQDKKHPTSFLA NIVNYLELVKNDKDIRKTYEAFSTNPVKRTLTYDDLAKIDGHLI SKSIKFTNVTLELERYFIFKESLIVKKGNNIDFKYIKGLRNYYNN EKKKNEGIRNKAFHFGIPDSKSYDQLIRDAEVMFIANEVKPTHA TKYTDLNKQLHTVCDKLMETVHNDYFSKEGDGKKKREAAGQ KYFENIISAK Porphyromonas WP_013816155 MTEQNEKPYNGTYYTLEDKHFWAAFFNLARHNAYITLAHIDRQ gingivalis LAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFSF (SEQ ID LEGAAYGKKLFESQSSGNKSSKNKELTKKEKEELQANALSLDN No. 
154) LKSILFDFLQKLKDFRNYYSHYRHPESSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRYGNNDN PFFKHHFVDREGTVTEAGLLFFVSLFLEKRDAIWMQKKIRGFKG GTETYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNELV RCPKSLYDRLREEDRARFRVPVDILSDEEDTDGAEEDPFKNTLV RHQDRFPYFALRYFDLKKVFTSLRFQIDLGTYHFAIYKKNIGEQ PEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETGDK PYITQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGATRTGRSKY AQDKRFTAEAFLSAHELMPMMFYYFLLREKYSEEASAERVQGR IKRVIEDVYAVYDAFARDEINTRDELDACLADKGIRRGHLPRQ MIGILSQEHKDMEEKIRKKLQEMMADTDHRLDMLDRQTDRKIR IGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSKA NSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLHE TRWESHTNILSFYRSYLKARKAFLQSIGRSDRVENHRFLLLKEP KTDRQTLVAGWKGEFHLPRGIFTEAVRDCLIEMGLDEVGSYKE VGFMAKAVPLYFERACKDWVQPFYNYPFNVGNSLKPKKGRFL SKEKRAEEWESGKERFRLAKLKKEILEAKEHPYLDFKSWQKFE RELRLVKNQDIITWMICGDLMEENKVEGLDTGTLYLKDIRTDV QEQGSLNVLNRVKPMRLPVVVYRADSRGHVHKEQAPLATVYI EERDTKLLKQGNFKSFVKDRRLNGLFSFVDTGALAMEQYPISK LRVEYELAKYQTARVCAFEQTLELEESLLTRCPHLPDKNFRKM LESWSDPLLDKWPDLHRKVRLLIAVRNAFSHNQYPMYDEAVFS SIRKYDPSFPDAIEERMGLNIAHRLSEEVKQAKETVERIIQA Flavobacterium WP_014165541 MSSKNESYNKQKTFNHYKQEDKYFFGGFLNNADDNLRQVGKE columnare FKTRINFNHNNNELASVFKDYFNKEKSVAKREHALNLLSNYFP (SEQ ID VLERIQKHTNHNFEQTREIFELLLDTIKKLRDYYTHHYHKPITIN No. 
155) PKIYDFLDDTLLDVLITIKKKKVKNDTSRELLKEKLRPELTQLKN QKREELIKKGKKLLEENLENAVFNHCLRPFLEENKTDDKQNKT VSLRKYRKSKPNEETSITLTQSGLVFLMSFFLHRKEFQVFTSGLE GFKAKVNTIKEEEISLNKNNIVYMITHWSYSYYNFKGLKHRIKT DQGVSTLEQNNTTHSLTNTNTKEALLTQIVDYLSKVPNEIYETL SEKQQKEFEEDINEYMRENPENEDSTFSSIVSHKVIRKRYENKFN YFAMRFLDEYAELPTLRFMVNFGDYIKDRQKKILESIQFDSERII KKEIHLFEKLSLVTEYKKNVYLKETSNIDLSRFPLFPNPSYVMA NNNIPFYIDSRSNNLDEYLNQKKKAQSQNKKRNLTFEKYNKEQ SKDAIIAMLQKEIGVKDLQQRSTIGLLSCNELPSMLYEVIVKDIK GAELENKIAQKIREQYQSIRDFTLDSPQKDNIPTTLIKTINTDSSV TFENQPIDIPRLKNAIQKELTLTQEKLLNVKEHEIEVDNYNRNKN TYKFKNQPKNKVDDKKLQRKYVFYRNEIRQEANWLASDLIHF MKNKSLWKGYMHNELQSFLAFFEDKKNDCIALLETVFNLKED CILTKGLKNLFLKHGNFIDFYKEYLKLKEDFLNTESTFLENGLIG LPPKILKKELSKRFKYIFIVFQKRQFIIKELEEKKNNLYADAINLS RGIFDEKPTMIPFKKPNPDEFASWFVASYQYNNYQSFYELTPDI VERDKKKKYKNLRAINKVKIQDYYLKLMVDTLYQDLFNQPLD KSLSDFYVSKAEREKIKADAKAYQKRNDSSLWNKVIHLSLQNN RITANPKLKDIGKYKRALQDEKIATLLTYDDRTWTYALQKPEK ENENDYKELHYTALNMELQEYEKVRSKELLKQVQELEKQILEE YTDFLSTQIHPADFEREGNPNFKKYLAHSILENEDDLDKLPEKV EAMRELDETITNPIIKKAIVLIIIRNKMAHNQYPPKFIYDLANRFV PKKEEEYFATYFNRVFETITKELWENKEKKDKTQV Psychroflexus WP_015024765 MESIIGLGLSFNPYKTADKHYFGSFLNLVENNLNAVFAEFKERIS torquis YKAKDENISSLIEKHFIDNMSIVDYEKKISILNGYLPIIDFLDDELE (SEQ ID NNLNTRVKNFKKNFIILAEAIEKLRDYYTHFYHDPITFEDNKEPL No. 
156) LELLDEVLLKTILDVKKKYLKTDKTKEILKDSLREEMDLLVIRK TDELREKKKTNPKIQHTDSSQIKNSIFNDAFQGLLYEDKGNNKK TQVSHRAKTRLNPKDIHKQEERDFEIPLSTSGLVFLMSLFLSKKE IEDFKSNIKGFKGKVVKDENHNSLKYMATHRVYSILAFKGLKY RIKTDTFSKETLMMQMIDELSKVPDCVYQNLSETKQKDFIEDW NEYFKDNEENTENLENSRVVHPVIRKRYEDKFNYFAIRFLDEFA NFKTLKFQVFMGYYIHDQRTKTIGTTNITTERTVKEKINVFGKL SKMDNLKKHFFSQLSDDENTDWEFFPNPSYNFLTQADNSPANN IPIYLELKNQQIIKEKDAIKAEVNQTQNRNPNKPSKRDLLNKILK TYEDFHQGDPTAILSLNEIPALLHLFLVKPNNKTGQQIENIIRIKIE KQFKAINHPSKNNKGIPKSLFADTNVRVNAIKLKKDLEAELDM LNKKHIAFKENQKASSNYDKLLKEHQFTPKNKRPELRKYVFYK SEKGEEATWLANDIKRFMPKDFKTKWKGCQHSELQRKLAFYD RHTKQDIKELLSGCEFDHSLLDINAYFQKDNFEDFFSKYLENRIE TLEGVLKKLHDFKNEPTPLKGVFKNCFKFLKRQNYVTESPEIIK KRILAKPTFLPRGVFDERPTMKKGKNPLKDKNEFAEWFVEYLE NKDYQKFYNAEEYRMRDADFKKNAVIKKQKLKDFYTLQMVN YLLKEVFGKDEMNLQLSELFQTRQERLKLQGIAKKQMNKETG DSSENTRNQTYIWNKDVPVSFFNGKVTIDKVKLKNIGKYKRYE RDERVKTFIGYEVDEKWMMYLPHNWKDRYSVKPINVIDLQIQE YEEIRSHELLKEIQNLEQYIYDHTTDKNILLQDGNPNFKMYVLN GLLIGIKQVNIPDFIVLKQNTNFDKIDFTGIASCSELEKKTIILIAIR NKFAHNQLPNKMIYDLANEFLKIEKNETYANYYLKVLKKMISD LA Riemerella WP_015345620 MFFSFHNAQRVIFKHLYKAFDASLRMVKEDYKAHFTVNLTRDF anatipestifer AHLNRKGKNKQDNPDFNRYRFEKDGFFTESGLLFFTNLFLDKR (SEQ ID DAYWMLKKVSGFKASHKQREKMTTEVFCRSRILLPKLRLESRY No. 
157) DHNQMLLDMLSELSRCPKLLYEKLSEENKKHFQVEADGFLDEI EEEQNPFKDTLIRHQDRFPYFALRYLDLNESFKSIRFQVDLGTYH YCIYDKKIGDEQEKRHLTRTLLSFGRLQDFTEINRPQEWKALTK DLDYKETSNQPFISKTTPHYHITDNKIGFRLGTSKELYPSLEIKDG ANRIAKYPYNSGFVAHAFISVHELLPLMFYQHLTGKSEDLLKET VRHIQRIYKDFEEERINTIEDLEKANQGRLPLGAFPKQMLGLLQ NKQPDLSEKAKIKIEKLIAETKLLSHRLNTKLKSSPKLGKRREKL IKTGVLADWLVKDFMRFQPVAYDAQNQPIKSSKANSTEFWFIR RALALYGGEKNRLEGYFKQTNLIGNTNPHPFLNKFNWKACRNL VDFYQQYLEQREKFLEAIKHQPWEPYQYCLLLKVPKENRKNLV KGWEQGGISLPRGLFTEAIRETLSKDLTLSKPIRKEIKKHGRVGFI SRAITLYFKEKYQDKHQSFYNLSYKLEAKAPLLKKEEHYEYWQ QNKPQSPTESQRLELHTSDRWKDYLLYKRWQHLEKKLRLYRN QDIMLWLMTLELTKNHFKELNLNYHQLKLENLAVNVQEADAK LNPLNQTLPMVLPVKVYPTTAFGEVQYHETPIRTVYIREEQTKA LKMGNFKALVKDRRLNGLFSFIKEENDTQKHPISQLRLRRELEI YQSLRVDAFKETLSLEEKLLNKHASLSSLENEFRTLLEEWKKKY AASSMVTDKHIAFIASVRNAFCHNQYPFYKETLHAPILLFTVAQ PTTEEKDGLGIAEALLKVLREYCEIVKSQI Prevotella WP_021584635 MENDKRLEESACYTLNDKHFWAAFLNLARHNVYITVNHINKTL pleuritidis ELKNKKNQEIIIDNDQDILAIKTHWAKVNGDLNKTDRLRELMIK (SEQ ID HFPFLEAAIYSNNKEDKEEVKEEKQAKAQSFKSLKDCLFLFLEK No. 
158) LQEARNYYSHYKYSESSKEPEFEEGLLEKMYNTFDASIRLVKED YQYNKDIDPEKDFKHLERKEDFNYLFTDKDNKGKITKNGLLFF VSLFLEKKDAIWMQQKFRGFKDNRGNKEKMTHEVFCRSRMLL PKIRLESTQTQDWILLDMLNELIRCPKSLYERLQGAYREKFKVP FDSIDEDYDAEQEPFRNTLVRHQDRFPYFALRYFDYNEIFKNLR FQIDLGTYHFSIYKKLIGGKKEDRHLTHKLYGFERIQEFTKQNRP DKWQAIIKDLDTYETSNERYISETTPHYHLENQKIGIRFRNDNN DIWPSLKTNGEKNEKSKYNLDKPYQAEAFLSVHELLPMMFYYL LLKMENTDNDKEDNEVGTKKKGNKNNKQEKHKIEEIIENKIKDI YALYDAFTNGEINSIDELAEQREGKDIEIGHLPKQLIVILKNKSK DMAEKANRKQKEMIKDTKKRLATLDKQVKGEIEDGGRNIRLL KSGEIARWLVNDMMRFQPVQKDNEGKPLNNSKANSTEYQMLQ RSLALYNKEEKPTRYFRQVNLIKSSNPHPFLEDTKWEECYNILSF YRNYLKAKIKFLNKLKPEDWKKNQYFLMLKEPKTNRKTLVQG WKNGFNLPRGIFTEPIKEWFKRHQNDSEEYKKVEALDRVGLVA KVIPLFFKEEYFKEDAQKEINNCVQPFYSFPYNVGNIHKPEEKNF LHCEERRKLWDKKKDKFKGYKAKEKSKKMTDKEKEEHRSYLE FQSWNKFERELRLVRNQDILTWLLCTKLIDKLKIDELNIEELQKL RLKDIDTDTAKKEKNNILNRVMPMRLPVTVYEIDKSFNIVKDKP LHTVYIEETGTKLLKQGNFKALVKDRRLNGLFSFVKTSSEAESK SKPISKLRVEYELGAYQKARIDIIKDMLALEKTLIDNDENLPTNK FSDMLKSWLKGKGEANKARLQNDVGLLVAVRNAFSHNQYPM YNSEVFKGMKLLSLSSDIPEKEGLGIAKQLKDKIKETIERIIEIEKE IRN Porphyromonas WP_021663197 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK gingivalis FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF (SEQ ID DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
159) DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPRSMGFISVHDLRKLLLMELLCEGSFSRMQSDFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMDQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLQKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRHQFRAIV AELRLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVQDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDEENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDPENRFFGKLLNNMSQPINDL Porphyromonas WP_021665475 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK gingivalis FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF (SEQ ID DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
160) DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF KRTNENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPQSMGFISVHDLRKLLLMELLCEGSFSRMQSGFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMNQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLRKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRRQFRAIV AELHLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVQDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDKENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDHENRFFGKLLNNMSQPINDL Porphyromonas WP_021677657 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK gingivalis FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF (SEQ ID DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
161) DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPQSMGFISVHDLRKLLLMELLCEGSFSRMQSGFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMNQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLRKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRRQFRAIV AELHLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVQDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDEENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDHENRFFGKLLNNMSQPINDL Porphyromonas WP_021680012 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK gingivalis FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF (SEQ ID DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
162) DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPRSMGFISVHDLRKLLLMELLCEGSFSRMQSDFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMDQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLQKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRHQFRAIV AELRLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKVMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVRDKKRELRTAGKPVPPDLAAYIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKIMTDREEDILPGLKNIDSILDKENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEIPLIYRDVSAKVGSIEGSSAKDLPEG SSLVDSLWKKYEMIIRKILPILDPENRFFGKLLNNMSQPINDL Porphyromonas WP_023846767 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK gingivalis FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF (SEQ ID DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
163) DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPRSMGFISVHDLRKLLLMELLCEGSFSRMQSDFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMNQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLRKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRRQFRAIV AELHLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVQDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDEENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDPENRFFGKLLNNMSQPINDL Prevotella WP_036884929 MKNDNNSTKSTDYTLGDKHFWAAFLNLARHNVYITVNHINKV falsenii LELKNKKDQEIIIDNDQDILAIKTLWGKVDTDINKKDRLRELIM (SEQ ID KHFPFLEAATYQQSSTNNTKQKEEEQAKAQSFESLKDCLFLFLE No. 
164) KLREARNYYSHYKHSKSLEEPKLEEKLLENMYNIFDTNVQLVIK DYEHNKDINPEEDFKHLGRAEGEFNYYFTRNKKGNITESGLLFF VSLFLEKKDAIWAQTKIKGFKDNRENKQKMTHEVFCRSRMLLP KLRLESTQTQDWILLDMLNELIRCPKSLYKRLQGEKREKFRVPF DPADEDYDAEQEPFKNTLVRHQDRFPYFALRYFDYNEIFTNLRF QIDLGTYHFSIYKKQIGDKKEDRHLTHKLYGFERIQEFAKENRP DEWKALVKDLDTFEESNEPYISETTPHYHLENQKIGIRNKNKKK KKTIWPSLETKTTVNERSKYNLGKSFKAEAFLSVHELLPMMFY YLLLNKEEPNNGKINASKVEGIIEKKIRDIYKLYGAFANEEINNE EELKEYCEGKDIAIRHLPKQMIAILKNEYKDMAKKAEDKQKKM IKDTKKRLAALDKQVKGEVEDGGRNIKPLKSGRIASWLVNDM MRFQPVQRDRDGYPLNNSKANSTEYQLLQRTLALFGSERERLA PYFRQMNLIGKDNPHPFLKDTKWKEHNNILSFYRSYLEAKKNF LGSLKPEDWKKNQYFLKLKEPKTNRETLVQGWKNGFNLPRGIF TEPIREWFIRHQNESEEYKKVKDFDRIGLVAKVIPLFFKEDYQKE IEDYVQPFYGYPFNVGNIHNSQEGTFLNKKEREELWKGNKTKF KDYKTKEKNKEKTNKDKFKKKTDEEKEEFRSYLDFQSWKKFE RELRLVRNQDIVTWLLCMELIDKLKIDELNIEELQKLRLKDIDTD TAKKEKNNILNRIMPMELPVTVYETDDSNNIIKDKPLHTIYIKEA ETKLLKQGNFKALVKDRRLNGLFSFVETSSEAELKSKPISKSLVE YELGEYQRARVEIIKDMLRLEETLIGNDEKLPTNKFRQMLDKW LEHKKETDDTDLKNDVKLLTEVRNAFSHNQYPMRDRIAFANIK PFSLSSANTSNEEGLGIAKKLKDKTKETIDRIIEIEEQTATKR Prevotella WP_036931485 MENDKRLEESTCYTLNDKHFWAAFLNLARHNVYITINHINKLL pleuritidis EIRQIDNDEKVLDIKALWQKVDKDINQKARLRELMIKHFPFLEA (SEQ ID AIYSNNKEDKEEVKEEKQAKAQSFKSLKDCLFLFLEKLQEARN No. 
165) YYSHYKSSESSKEPEFEEGLLEKMYNTFGVSIRLVKEDYQYNKD IDPEKDFKHLERKEDFNYLFTDKDNKGKITKNGLLFFVSLFLEK KDAIWMQQKLRGFKDNRGNKEKMTHEVFCRSRMLLPKIRLES TQTQDWILLDMLNELIRCPKSLYERLQGAYREKFKVPFDSIDED YDAEQEPFRNTLVRHQDRFPYFALRYFDYNEIFKNLRFQIDLGT YHFSIYKKLIGDNKEDRHLTHKLYGFERIQEFAKQKRPNEWQA LVKDLDIYETSNEQYISETTPHYHLENQKIGIRFKNKKDKIWPSL ETNGKENEKSKYNLDKSFQAEAFLSIHELLPMMFYDLLLKKEEP NNDEKNASIVEGFIKKEIKRMYAIYDAFANEEINSKEGLEEYCK NKGFQERHLPKQMIAILTNKSKNMAEKAKRKQKEMIKDTKKR LATLDKQVKGEIEDGGRNIRLLKSGEIARWLVNDMMRFQSVQK DKEGKPLNNSKANSTEYQMLQRSLALYNKEQKPTPYFIQVNLI KSSNPHPFLEETKWEECNNILSFYRSYLEAKKNFLESLKPEDWK KNQYFLMLKEPKTNRKTLVQGWKNGFNLPRGIFTEPIKEWFKR HQNDSEEYKKVEALDRVGLVAKVIPLFFKEEYFKEDAQKEINN CVQPFYSFPYNVGNIHKPEEKNFLHCEERRKLWDKKKDKFKGY KAKEKSKKMTDKEKEEHRSYLEFQSWNKFERELRLVRNQDIVT WLLCTELIDKLKIDELNIEELQKLRLKDIDTDTAKKEKNNILNRI MPMQLPVTVYEIDKSFNIVKDKPLHTIYIEETGTKLLKQGNFKA LVKDRRLNGLFSFVKTSSEAESKSKPISKLRVEYELGAYQKARI DIIKDMLALEKTLIDNDENLPTNKFSDMLKSWLKGKGEANKAR LQNDVDLLVAIRNAFSHNQYPMYNSEVFKGMKLLSLSSDIPEKE GLGIAKQLKDKIKETIERIIEIEKEIRN Porphyromonas WP_039417390 MTEQNERPYNGTYYTLEDKHFWAAFFNLARHNAYITLAHIDRQ gingivalis LAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFSF (SEQ ID LEGAAYGKKLFESQSSGNKSSKKKELTKKEKEELQANALSLDN No. 
166) LKSILFDFLQKLKDFRNYYSHYRHPESSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRYGNNDN PFFKHHFVDREGTVTEAGLLFFVSLFLEKRDAIWMQKKIRGFKG GTEAYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNELV RCPKSLYDRLREEDRARFRVPIDILSDEDDTDGTEEDPFKNTLVR HQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKNIGEQPE DRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETGDKPY ITQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGATRTGRSKYAQ DKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKVQGRIK RVIEDVYAVYDAFARGEIDTLDRLDACLADKGIRRGHLPRQMI AILSQEHKDMEEKVRKKLQEMIADTDHRLDMLDRQTDRKIRIG RKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSKANS TEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLHETR WESHTNILSFYRSYLKARKAFLQSIGRSDREENHRFLLLKEPKT DRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSYKEVG FMAKAVPLYFERACKDRVQPFYDYPFNVGNSLKPKKGRFLSKE KRAEEWESGKERFRLAKLKKEILEAKEHPYLDFKSWQKFEREL RLVKNQDIITWMMCRDLMEENKVEGLDTGTLYLKDIRTDVHE QGSLNVLNRVKPMRLPVVVYRADSRGHVHKEQAPLATVYIEE RDTKLLKQGNFKSFVKDRRLNGLFSFVDTGALAMEQYPISKLR VEYELAKYQTARVCAFEQTLELEESLLTRYPHLPDKNFRKMLE SWSDPLLDKWPDLHRKVRLLIAVRNAFSHNQYPMYDEAVFSSI RKYDPSSPDAIEERMGLNIAHRLSEEVKQAKEMAERIIQV Porphyromonas WP_094189123 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ gulae LAYSKADITNDQDVLSFKALWKNLDNDLERKSRLRSLILKHFSF (SEQ ID LEGAAYGKKLFESKSSGNKSSKNKELTKKEKEELQANALSLDN No. 
167) LKSILFDFLQKLKDFRNYYSHYRHSGSSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRYGHNDN PSFKHHFVDSEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTETYQQMTNEVFCRSRISLPKLKLESLRMDDWMLLDMLNE LVRCPKPLYDRLREDDRACFRVPVDILPDEDDTDGGGEDPFKN TLVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKMI GEQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFET GDKPYISQTSPHYHIEKGKIGLRFMPEGQHLWPSPEVGTTRTGR SKYAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKV QGRIKRVIEDVYAIYDAFARDEINTLKELDACLADKGIRRGHLP KQMIAILSQEHKNMEEKVRKKLQEMIADTDHRLDMLDRQTDR KIRIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDASGKPLNNS KANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFL HDTRWESHTNILSFYRSYLRARKAFLERIGRSDRMENRPFLLLK EPKTDRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSY REVGFMAKAVPLYFERACEDRVQPFYDSPFNVGNSLKPKKGRF LSKEERAEEWERGKERFRDLEAWSHSAARRIEDAFAGIEYASPG NKKKIEQLLRDLSLWEAFESKLKVRADKINLAKLKKEILEAQEH PYHDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLD TGTLYLKDIRTNVQEQGSLNVLNHVKPMRLPVVVYRADSRGH VHKEEAPLATVYIEERDTKLLKQGNFKSFVKDRRLNGLFSFVDT GGLAMEQYPISKLRVEYELAKYQTARVCAFEQTLELEESLLTRY PHLPDKNFRKMLESWSDPLLAKWPELHGKVRLLIAVRNAFSHN QYPMYDEAVFSSIRKYDPSSPDAIEERMGLNIAHRLSEEVKQAK ETVERIIQA Porphyromonas WP_039419792 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ gulae LAYSKADITNDQDVLSFKALWKNLDNDLERKSRLRSLILKHFSF (SEQ ID LEGAAYGKKLFESKSSGNKSSKNKELTKKEKEELQANALSLDN No. 
168) LKSILFDFLQKLKDFRNYYSHYRHSGSSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRYGHNDN PSFKHHFVDGEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTETYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKPLYDRLREKDRARFRVPVDILPDEDDTDGGGEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKVIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETG DKPYISQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGTTRTGRSK YAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKVQ GRIKRVIEDVYAIYDAFARDEINTRDELDACLADKGIRRGHLPK QMIGILSQEHKNMEEKVRKKLQEMIADTDHRLDMLDRQTDRKI RIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSK ANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLD ETRWESHTNILSFYRSYLRARKAFLERIGRSDRVENRPFLLLKEP KTDRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSYKE VGFMAKAVPLYFERACKDRVQPFYDSPFNVGNSLKPKKGRFLS KEKRAEEWESGKERFRLAKLKKEILEAQEHPYHDFKSWQKFER ELRLVKNQDIITWMMCRDLMEENKVEGLDTGTLYLKDIRPNVQ EQGSLNVLNRVKPMRLPVVVYRADSRGHVHKEEAPLATVYIEE RDTKLLKQGNFKSFVKDRRLNGLFSFVDTGGLAMEQYPISKLR VEYELAKYQTARVCVFELTLRLEESLLSRYPHLPDESFREMLES WSDPLLAKWPELHGKVRLLIAVRNAFSHNQYPMYDEAVFSSIR KYDPSSPDAIEERMGLNIAHRLSEEVKQAKETVERIIQA Porphyromonas WP_039426176 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ gulae LAYSKADITNDQDVLSFKALWKNFDNDLERKSRLRSLILKHFSF (SEQ ID LEGAAYGKKLFESKSSGNKSSKNKELTKKEKEELQANALSLDN No. 
169) LKSILFDFLQKLKDFRNYYSHYRHSGSSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHYHFNHLVRKGKKDRYGHNDN PSFKHHFVDSEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTGPYEQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKPLYDRLREKDRACFRVPVDILPDEDDTDGGGEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKMIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETG DKPYISQTTPHYHIEKGKIGLRFMPEGQHLWPSPEVGTTRTGRS KYAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKV QGRIKRVIKDVYAIYDAFARDEINTLKELDACSADKGIRRGHLP KQMIGILSQEHKNMEEKVRKKLQEMIADTDHRLDMLDRQTDR KIRIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNS KANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFL DETRWESHTNILSFYRSYLRARKAFLERIGRSDRVENRPFLLLKE PKNDRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSYK EVGFMAKAVPLYFERACKDRVQPFYDSPFNVGNSLKPKKGRFL SKEKRAEEWESGKERFRLAKLKKEILEAKEHPYHDFKSWQKFE RELRLVKNQDIITWMMCRDLMEENKVEGLDTGTLYLKDIRTDV HEQGSLNVLNRVKPMRLPVVVYRADSRGHVHKEQAPLATVYI EERDTKLLKQGNFKSFVKDRRLNGLFSFVDTGGLAMEQYPISK LRVEYELAKYQTARVCAFEQTLELEESLLTRYPHLPDENFREML ESWSDPLLGKWPDLHGKVRLLIAVRNAFSHNQYPMYDEAVFSS IRKYDPSSPDAIEERMGLNIAHRLSEEVKQAKETVERIIQA Porphyromonas WP_039431778 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ gulae LAYSKADITNDQDVLSFKALWKNFDNDLERKSRLRSLILKHFSF (SEQ ID LEGAAYGKKLFESKSSGNKSSKNKELTKKEKEELQANALSLDN No. 
170) LKSILFDFLQKLKDFRNYYSHYRHSESSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRYGHNDN PSFKHHFVDGEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTETYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKPLYDRLREDDRACFRVPVDILPDEDDTDGGGEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKMIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETG DKPYISQTSPHYHIEKGKIGLRFMPEGQHLWPSPEVGTTRTGRS KYAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKV QGRIKRVIEDVYAIYDAFARDEINTLKELDACLADKGIRRGHLP KQMIAILSQEHKDMEEKIRKKLQEMIADTDHRLDMLDRQTDRK IRIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSK ANSTEYRMLQRALALFGGEKKRLTPYFRQMNLTGGNNPHPFLH ETRWESHTNILSFYRSYLRARKAFLERIGRSDRMENRPFLLLKEP KTDRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSYRE VGFMAKAVPLYFERACEDRVQPFYDSPFNVGNSLKPKKGRFLS KEERAEEWERGKERFRDLEAWSHSAARRIEDAFAGIEYASPGN KKKIEQLLRDLSLWEAFESKLKVRADKINLAKLKKEILEAQEHP YHDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLDT GTLYLKDIRPNVQEQGSLNVLNRVKPMRLPVVVYRADSRGHV HKEEAPLATVYIEERDTKLLKQGNFKSFVKDRRLNGLFSFVDTG GLAMEQYPISKLRVEYELAKYQTARVCVFELTLRLEESLLTRYP HLPDESFRKMLESWSDPLLAKWPELHGKVRLLIAVRNAFSHNQ YPMYDEAVFSSIRKYDPSSPDAIEERMGLNIAHRLSEEVKQAKE TVERIIQV Porphyromonas WP_039437199 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ gulae LAYSKADITNDEDILFFKGQWKNLDNDLERKSRLRSLILKHFSF (SEQ ID LEGAAYGKKFFESKSSGNKSSKNKELTKKEKEELQANALSLDN No. 
171) LKSILFDFLQKLKDFRNYYSHYRHSGSSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDEVDPHYHFNHLVRKGKKDRYGHNDN PSFKHHFVDGEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTEPYEQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKPLYDRLREKDRACFRVPVDILPDEDDTDGGGEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKMIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETG DKPYISQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGTTRTGRSK YAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKVQ GRIKRVIEDVYAIYDAFARDEINTLKELDACLADKGIRRGHLPK QMIGILSQERKDMEEKVRKKLQEMIADTDHRLDMLDRQTDRKI RIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSK ANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLH ETRWESHTNILSFYRSYLRARKAFLERIGRSDRVENCPFLLLKEP KTDRQTLVAGWKGEFHLPRGIFTEAVRDCLIEMGYDEVGSYRE VGFMAKAVPLYFERACEDRVQPFYDSPFNVGNSLKPKKGRFLS KEKRAEEWESGKERFRLAKLKKEILEAQEHPYHDFKSWQKFER ELRLVKNQDIITWMMCRDLMEENKVEGLDTGTLYLKDIRPNVQ EQGSLNVLNRVKPMRLPVVVYRADSRGHVHKEEAPLATVYIEE RDTKLLKQGNFKSFVKDRRLNGLFSFVDTGALAMEQYPISKLR VEYELAKYQTARVCAFEQTLELEESLLTRYPHLPDESFREMLES WSDPLLTKWPELHGKVRLLIAVRNAFSHNQYPMYDEAVFSSIW KYDPSSPDAIEERMGLNIAHRLSEEVKQAKETIERIIQA Porphyromonas WP_039442171 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ gulae LAYSKADITNDQDVLSFKALWKNLDNDLERKSRLRSLILKHFSF (SEQ ID LEGAAYGKKLFESKSSGNKSSKNKELTKKEKEELQANALSLDN No. 
172) LKSILFDFLQKLKDFRNYYSHYRHSGSSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHYHFNHLVRKGKKDRYGHNDN PSFKHHFVDSEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTGPYEQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKPLYDRLREKDRACFRVPVDILPDEDDTDGGGEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKMIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYLETG DKPYISQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGTTRTGRSK CAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKVQ GRIKRVIEDVYAIYDAFARDEINTLKELDTCLADKGIRRGHLPK QMITILSQERKDMKEKIRKKLQEMIADTDHRLDMLDRQTDRKI RIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDASGKPLNNSK ANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLH ETRWESHTNILSFYRSYLRARKAFLERIGRSDRVENCPFLLLKEP KTDRQTLVAGWKDEFHLPRGIFTEAVRDCLIEMGYDEVGSYRE VGFMAKAVPLYFERACEDRVQPFYDSPFNVGNSLKPKKGRFLS KEDRAEEWERGMERFRDLEAWSHSAARRIKDAFAGIEYASPGN KKKIEQLLRDLSLWEAFESKLKVRADKINLAKLKKEILEAQEHP YHDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLDT GTLYLKDIRPNVQEQGSLNVLNRVKPMRLPVVVYRADSRGHV HKEAPLATVYIEERNTKLLKQGNFKSFVKDRRLNGLFSFVDTG GLAMEQYPISKLRVEYELAKYQTARVCVFELTLRLEESLLSRYP HLPDESFREMLESWSDPLLAKWPELHGKVRLLIAVRNAFSHNQ YPMYDEAVFSSIRKYDPSSPDAIEERMGLNIAHRLSEEVKQAKE TVERIIQA Porphyromonas WP_039445055 MNTVPATENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRI gulae KFGKKKLNEESLKQSLLCDHLLSIDRWTKVYGHSRRYLPFLHCF (SEQ ID DPDSGIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
173) DGTTFEHLKVSPDISSFITGAYTFACERAQSRFADFFKPDDFLLA KNRKEQLISVADGKECLTVSGFAFFICLFLDREQASGMLSRIRGF KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPQSMGFISVHDLRKLLLMELLCEGSFSRMQSDFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMNQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLRKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRRQFRAIV AELHLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVRDKKRELRTAGKPVPPDLAAYIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDEENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDHENRFFGKLLNNMSQPINDL Capnocytophaga WP_041989581 MENKTSLGNNIYYNPFKPQDKSYFAGYLNAAMENIDSVFRELG cynodegmi KRLKGKEYTSENFFDAIFKENISLVEYERYVKLLSDYFPMARLL (SEQ ID DKKEVPIKERKENFKKNFRGIIKAVRDLRNFYTHKEHGEVEITD No. 
174) EIFGVLDEMLKSTVLTVKKKKIKTDKTKEILKKSIEKQLDILCQK KLEYLKDTARKIEEKRRNQRERGEKKLVPRFEYSDRRDDLIAAI YNDAFDVYIDKKKDSLKESSKTKYNTESYPQQEEGDLKIPISKN GVVFLLSLFLSKQEVHAFKSKIAGFKATVIDEATVSHRKNSICF MATHEIFSHLAYKKLKRKVRTAEINYSEAENAEQLSIYAKETLM MQMLDELSKVPDVVYQNLSEDVQKTFIEDWNEYLKENNGDVG TMEEEQVIHPVIRKRYEDKFNYFAIRFLDEFAQFPTLRFQVHLG NYLHDSRPKEHLISDRRIKEKITVFGRLSELEHKKALFIKNTETN EDRKHYWEVFPNPNYDFPKENISVNDKDFPIAGSILDREKQPTA GKIGIKVNLLNQKYISEVDKAVKAHQLKQRNNKPSIQNIIEEIVPI NGSNPKEIIVFGGQPTAYLSMNDIHSILYEFFDKWEKKKEKLEK KGEKELRKEIGKELEEKIVGKIQTQIQQIIDKDINAKILKPYQDDD STAIDKEKLIKDLKQEQKILQKLKNEQTAREKEYQECIAYQEES RKIKRSDKSRQKYLRNQLKRKYPEVPTRKEILYYQEKGKVAVW LANDIKRFMPTDFKNEWKGEQHSLLQKSLAYYEQCKEELKNLL PQQKVFKHLPFELGGHFQQKYLYQFYTRYLDKRLEHISGLVQQ AENFKNENKVFKKVENECFKFLKKQNYTHKGLDAQAQSVLGY PIFLERGFMDEKPTIIKGKTFKGNESLFTDWFRYYKEYQNFQTF YDTENYPLVELEKKQADRKRETKIYQQKKNDVFTLLMAKHIFK SVFKQDSIDRFSLEDLYQSREERLENQEKAKQTGERNTNYIWNK TVDLNLCDGKVTVENVKLKNVGNFIKYEYDQRVQTFLKYEENI KWQAFLIKESKEEENYPYIVEREIEQYEKVRREELLKEVHLIEEY ILEKVKDKEILKKGDNQNFKYYILNGLLKQLKNEDVESYKVFN LNTKPEDVNINQLKQEATDLEQKAFVLTYIRNKFAHNQLPKKEF WDYCQEKYGKIEKEKTYAEYFAEVFKREKEALMK Prevotella WP_042518169 MNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQ sp. P5-119 NENNENLWFHPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPF (SEQ ID LKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKRAFGVLKMY No. 
175) RDLTNHYKTYEEKLIDGCEFLTSTEQPLSGMISKYYTVALRNTK ERYGYKTEDLAFIQDNIKKITKDAYGKRKSQVNTGFFLSLQDYN GDTQKKLHLSGVGIALLICLFLDKQYINIFLSRLPIFSSYNAQSEE RRIIIRSFGINSIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDELFT TLSAEKQSRFRIISDDHNEVLMKRSTDRFVPLLLQYIDYGKLFD HIRFHVNMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLEE AETMRKQENGTFGNSGIRIRDFENVKRDDANPANYPYIVDTYT HYILENNKVEMFISDKGSSAPLLPLIEDDRYVVKTIPSCRMSTLEI PAMAFHMFLFGSKKTEKLIVDVHNRYKRLFQAMQKEEVTAENI ASFGIAESDLPQKILDLISGNAHGKDVDAFIRLTVDDMLTDTER RIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIVLFQ PSVNDGENKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEK ARLIGKGTTEPHPFLYKVFARSIPANAVDFYERYLIERKFYLTGL CNEIKRGNRVDVPFIRRDQNKWKTPAMKTLGRIYSEDLPVELPR QMFDNEIKSHLKSLPQMEGIDFNNANVTYLIAEYMKRVLNDDF QTFYQWKRNYHYMDMLKGEYDRKGSLQHCFTSVEEREGLWK ERASRTERYRKLASNKIRSNRQMRNASSEEIETILDKRLSNCRNE YQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGERFKLKEIMP DAEKGILSEIMPMSFTFEKGGKKYTITSEGMKLKNYGDFFVLAS DKRIGNLLELVGSDIVSKEDIMEEFNKYDQCRPEISSIVFNLEKW AFDTYPELSARVDREEKVDFKSILKILLNNKNINKEQSDILRKIR NAFDHNNYPDKGIVEIKALPEIAMSIKKAFGEYAIMK Prevotella WP_044072147 MNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQ sp. P4-76 NENNENLWFHPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPF (SEQ ID LKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKRAFGVLKMY No. 
176) RDQASHYKTYDEKLIDGCEFLTSTEQPLSGMINNYYTVALRNM NERYGYKTEDLAFIQDKRFKFVKDAYGKKKSQVNTGFFLSLQD YNGDTQKKLHLSGVGIALLICLFLDKQYINIFLSRLPIFSSYNAQS EERRIIIRSFGINSIKQPKDRIHSEKSNKSVAMDMLNEIKRCPNEL FETLSAEKQSRFRIISNDHNEVLMKRSSDRFVPLLLQYIDYGKLF DHIRFHVNMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLE EVETMRKQENGTFGNSGIRIRDFENMKRDDANPANYPYIVDTY THYILENNKVEMFISDEETPAPLLPVIEDDRYVVKTIPSCRMSTL EIPAMAFHMFLFGSKKTEKLIVDVHNRYKRLFKAMQKEEVTAE NIASFGIAESDLPQKIIDLISGNAHGKDVDAFIRLTVDDMLADTE RRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIVLF QPSVNDGENKITGLNYRIMQSAIAVYNSGDDYEAKQQFKLMFE KARLIGKGTTEPHPFLYKVFVRSIPANAVDFYERYLIERKFYLIG LSNEIKKGNRVDVPFIRRDQNKWKTPAMKTLGRIYDEDLPVELP RQMFDNEIKSHLKSLPQMEGIDFNNANVTYLIAEYMKRVLNDD FQTFYQWKRNYRYMDMLRGEYDRKGSLQSCFTSVEEREGLWK ERASRTERYRKLASNKIRSNRQMRNASSEEIETILDKRLSNSRNE YQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGERFKLKEIMP DAEKGILSEIMPMSFTFEKGGKKYTITSEGMKLKNYGDFFVLAS DKRIGNLLELVGSDTVSKEDIMEEFKKYDQCRPEISSIVFNLEKW AFDTYPELSARVDREEKVDFKSILKILLNNKNINKEQSDILRKIR NAFDHNNYPDKGVVEIRALPEIAMSIKKAFGEYAIMK Prevotella WP_044074780 MNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQ sp. P5-60 NENNENLWFHPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPF (SEQ ID LKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKRAFGVLKMY No. 
177) RDLTNHYKTYEEKLIDGCEFLTSTEQPFSGMISKYYTVALRNTK ERYGYKAEDLAFIQDNRYKFTKDAYGKRKSQVNTGSFLSLQDY NGDTTKKLHLSGVGIALLICLFLDKQYINLFLSRLPIFSSYNAQSE ERRIIIRSFGINSIKQPKDRIHSEKSNKSVAMDMLNEVKRCPDELF TTLSAEKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFD HIRFHVNMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLEE VETMRKQENGTFGNSGIRIRDFENMKRDDANPANYPYIVETYT HYILENNKVEMFISDEENPTPLLPVIEDDRYVVKTIPSCRMSTLEI PAMAFHMFLFGSEKTEKLIIDVHDRYKRLFQAMQKEEVTAENI ASFGIAESDLPQKIMDLISGNAHGKDVDAFIRLTVDDMLTDTER RIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIVLFQ PSVNDGENKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEK ARLIGKGTTEPHPFLYKVFVRSIPANAVDFYERYLIERKFYLIGL SNEIKKGNRVDVPFIRRDQNKWKTPAMKTLGRIYSEDLPVELPR QMFDNEIKSHLKSLPQMEGIDFNNANVTYLIAEYMKRVLNDDF QTFYQWKRNYRYMDMLRGEYDRKGSLQHCFTSIEEREGLWKE RASRTERYRKLASNKIRSNRQMRNASSEEIETILDKRLSNCRNE YQKSEKIIRRYRVQDALLFLLAKKTLTELADFDGERFKLKEIMP DAEKGILSEIMPMSFTFEKGGKIYTITSGGMKLKNYGDFFVLAS DKRIGNLLELVGSNTVSKEDIMEEFKKYDQCRPEISSIVFNLEKW AFDTYPELPARVDRKEKVDFWSILDVLSNNKDINNEQSYILRKI RNAFDHNNYPDKGIVEIKALPEIAMSIKKAFGEYAIMK Phaeodactylibacter WP_044218239 MTNTPKRRTLHRHPSYFGAFLNIARHNAFMIMEHLSTKYDMED xiamenensis KNTLDEAQLPNAKLFGCLKKRYGKPDVTEGVSRDLRRYFPFLN (SEQ ID YPLFLHLEKQQNAEQAATYDINPEDIEFTLKGFFRLLNQMRNNY No. 
178) SHYISNTDYGKFDKLPVQDIYEAAIFRLLDRGKHTKRFDVFESK HTRHLESNNSEYRPRSLANSPDHENTVAFVTCLFLERKYAFPFL SRLDCFRSTNDAAEGDPLIRKASHECYTMFCCRLPQPKLESSDIL LDMVNELGRCPSALYNLLSEEDQARFHIKREEITGFEEDPDEELE QEIVLKRHSDRFPYFALRYFDDTEAFQTLRFDVYLGRWRTKPV YKKRIYGQERDRVLTQSIRTFTRLSRLLPIYENVKHDAVRQNEE DGKLVNPDVTSQFHKSWIQIESDDRAFLSDRIEHFSPHYNFGDQ VIGLKFINPDRYAAIQNVFPKLPGEEKKDKDAKLVNETADAIIST HEIRSLFLYHYLSKKPISAGDERRFIQVDTETFIKQYIDTIKLFFED IKSGELQPIADPPNYQKNEPLPYVRGDKEKTQEERAQYRERQKE IKERRKELNTLLQNRYGLSIQYIPSRLREYLLGYKKVPYEKLAL QKLRAQRKEVKKRIKDIEKMRTPRVGEQATWLAEDIVFLTPPK MHTPERKTTKHPQKLNNDQFRIMQSSLAYFSVNKKAIKKFFQK ETGIGLSNRETSHPFLYRIDVGRCRGILDFYTGYLKYKMDWLDD AIKKVDNRKHGKKEAKKYEKYLPSSIQHKTPLELDYTRLPVYLP RGLFKKAIVKALAAHADFQVEPEEDNVIFCLDQLLDGDTQDFY NWQRYYRSALTEKETDNQLVLAHPYAEQILGTIKTLEGKQKNN KLGNKAKQKIKDELIDLKRAKRRLLDREQYLRAVQAEDRALW LMIQERQKQKAEHEEIAFDQLDLKNITKILTESIDARLRIPDTKV DITDKLPLRRYGDLRRVAKDRRLVNLASYYHVAGLSEIPYDLV KKELEEYDRRRVAFFEHVYQFEKEVYDRYAAELRNENPKGEST YFSHWEYVAVAVKHSADTHFNELFKEKVMQLRNKFHHNEFPY FDWLLPEVEKASAALYADRVFDVAEGYYQKMRKLMRQ Flavobacterium WP_045968377 MDNNITVEKTELGLGITYNHDKVEDKHYFGGFFNLAQNNIDLV sp. 316 AQEFKKRLLIQGKDSINIFANYFSDQCSITNLERGIKILAEYFPVV (SEQ ID SYIDLDEKNKSKSIREHLILLLETINNLRNYYTHYYHKKIIIDGSL No. 
179) FPLLDTILLKVVLEIKKKKLKEDKTKQLLKKGLEKEMTILFNLM KAEQKEKKIKGWNIDENIKGAVLNRAFSHLLYNDELSDYRKSK YNTEDETLKDTLTESGILFLLSFFLNKKEQEQLKANIKGYKGKIA SIPDEEITLKNNSLRNMATHWTYSHLTYKGLKHRIKTDHEKETL LVNMVDYLSKVPHEIYQNLSEQNKSLFLEDINEYMRDNEENHD SSEASRVIHPVIRKRYENKFAYFAIRFLDEFAEFPTLRFMVNVGN YIHDNRKKDIGGTSLITNRTIKQQINVFGNLTEIHKKKNDYFEKE ENKEKTLEWELFPNPSYHFQKENIPIFIDLEKSKETNDLAKEYAK EKKKIFGSSRKKQQNTAKKNRETIINLVFDKYKTSDRKTVTFEQ PTALLSFNELNSFLYAFLVENKTGKELEKIIIEKIANQYQILKNCS STVDKTNDNIPKSIKKIVNTTTDSFYFEGKKIDIEKLEKDITIEIEK TNEKLETIKENEESAQNYKRNERNTQKRKLYRKYVFFTNEIGIE ATWITNDILRFLDNKENWKGYQHSELQKFISQYDNYKKEALGL LESEWNLESDAFFGQNLKRMFQSNSTFETFYKKYLDNRKNTLE TYLSAIENLKTMTDVRPKVLKKKWTELFRFFDKKIYLLSTIETKI NELITKPINLSRGIFEEKPTFINGKNPNKENNQHLFANWFIYAKK QTILQDFYNLPLEQPKAITNLKKHKYKLERSINNLKIEDIYIKQM VDFLYQKLFEQSFIGSLQDLYTSKEKREIEKGKAKNEQTPDESFI WKKQVEINTHNGRIIAKTKIKDIGKFKNLLTDNKIAHLISYDDRI WDFSLNNDGDITKKLYSINTELESYETIRREKLLKQIQQFEQFLL EQETEYSAERKHPEKFEKDCNPNFKKYIIEGVLNKIIPNHEIEEIEI LKSKEDVFKINFSDILILNNDNIKKGYLLIMIRNKFAHNQLIDKN LFNFSLQLYSKNENENFSEYLNKVCQNIIQEFKEKLK Porphyromonas WP_046201018 MTEQSERPYNGTYYTLEDKHFWAAFLNLARHNAYITLTHIDRQ gulae LAYSKADITNDQDVLSFKALWKNFDNDLERKSRLRSLILKHFSF (SEQ ID LEGAAYGKKLFESKSSGNKSSKNKELTKKEKEELQANALSLDN No. 
180) LKSILFDFLQKLKDFRNYYSHYRHSESSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRYGHNDN PSFKHHFVDSEGMVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTETYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKPLYDRLREKDRARFRVPVDILPDEDDTDGGGEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKMIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETG DKPYISQTTPHYHIEKGKIGLRFMPEGQHLWPSPEVGTTRTGRS KYAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKV QGRIKRVIEDVYAIYDAFARDEINTLKELDACLADKGIRRGHLP KQMIAILSQEHKDMEEKIRKKLQEMIADTDHRLDMLDRQTDRK IRIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSK ANSTEYRMLQRALALFGGEKKRLTPYFRQMNLTGGNNPHPFLH ETRWESHTNILSFYRSYLRARKAFLERIGRSDRMENRPFLLLKEP KTDRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSYRE VGFMAKAVPLYFERACEDRVQPFYDSPFNVGNSLKPKKGRFLS KEERAEEWERGKERFRDLEAWSHSAARRIEDAFAGIEYASPGN KKKIEQLLRDLSLWEAFESKLKVRADKINLAKLKKEILEAQEHP YHDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLDT GTLYLKDIRPNVQEQGSLNVLNRVKPMRLPVVVYRADSRGHV HKEEAPLATVYIEERDTKLLKQGNFKSFVKDRRLNGLFSFVDTG GLAMEQYPISKLRVEYELAKYQTARVCVFELTLRLEESLLTRYP HLPDESFRKMLESWSDPLLAKWPELHGKVRLLIAVRNAFSHNQ YPMYDEAVFSSIRKYDPSSPDAIEERMGLNIAHRLSEEVKQAKE TVERIIQV WP_047431796 Chryseobacterium METQTIGHGIAYDHSKIQDKHFFGGFLNLAENNIKAVLKAFSEK (SEQ sp. FNVGNVDVKQFADVSLKDNLPDNDFQKRVSFLKMYFPVVDFIN ID No. 
YR477 IPNNRAKFRSDLTTLFKSVDQLRNFYTHYYHKPLDFDASLFILLD 181) DIFARTAKEVRDQKMKDDKTRQLLSKSLSEELQKGYELQLERL KELNRLGKKVNIHDQLGIKNGVLNNAFNHLIYKDGESFKTKLT YSSALTSFESAENGIEISQSGLLFLLSMFLKRKEIEDLKNRNKGF KAKVVIDEDGKVNGLKFMATHWVFSYLCFKGLKSKLSTEFHEE TLLIQIIDELSKVPDELYCAFDKETRDKFIEDINEYVKEGHQDFSL EDAKVIHPVIRKRYENKFNYFAIRFLDEFVKFPSLRFQVHVGNY VHDRRIKNIDGTTFETERVVKDRIKVFGRLSEISSYKAQYLSSVS DKHDETGWEIFPNPSYVFINNNIPIHISVDTSFKKEIADFKKLRRA QVPDELKIRGAEKKRKFEITQMIGSKSVLNQEEPIALLSLNEIPAL LYEILINGKEPAEIERIIKDKLNERQDVIKNYNPENWLPASQISRR LRSNKGERIINTDKLLQLVTKELLVTEQKLKIISDNREALKQKKE GKYIRKFIFTNSELGREAIWLADDIKRFMPADVRKEWKGYQHS QLQQSLAFYNSRPKEALAILESSWNLKDEKIIWNEWILKSFTQN KFFDAFYNEYLKGRKKYFAFLSEHIVQYTSNAKNLQKFIKQQM PKDLFEKRHYIIEDLQTEKNKILSKPFIFPRGIFDKKPTFIKGVKV EDSPESFANWYQYGYQKDHQFQKFYDWKRDYSDVFLEHLGKP FINNGDRRTLGMEELKERIIIKQDLKIKKIKIQDLFLRLIAENLFQ KVFKYSAKLPLSDFYLTQEERMEKENMAALQNVREEGDKSPNI IKDNFIWSKMIPYKKGQIIENAVKLKDIGKLNVLSLDDKVQTLL SYDDAKPWSKIALENEFSIGENSYEVIRREKLFKEIQQFESEILFR SGWDGINHPAQLEDNRNPKFKMYIVNGILRKSAGLYSQGEDIW FEYNADFNNLDADVLETKSELVQLAFLVTAIRNKFAHNQLPAK EFYFYIRAKYGFADEPSVALVYLNFTKYAINEFKKVMI Riemerella WP_049354263 MFFSFHNAQRVIFKHLYKAFDASLRMVKEDYKAHFTVNLTRDF anatipestifer AHLNRKGKNKQDNPDFNRYRFEKDGFFTESGLLFFTNLFLDKR (SEQ ID DAYWMLKKVSGFKASHKQREKMTTEVFCRSRILLPKLRLESRY No. 
182) DHNQMLLDMLSELSRCPKLLYEKLSEENKKHFQVEADGFLDEI EEEQNPFKDTLIRHQDRFPYFALRYLDLNESFKSIRFQVDLGTYH YCIYDKKIGDEQEKRHLTRTLLSFGRLQDFTEINRPQEWKALTK DLDYKETSNQPFISKTTPHYHITDNKIGFRLGTSKELYPSLEIKDG ANRIAKYPYNSGFVAHAFISVHELLPLMFYQHLTGKSEDLLKET VRHIQRIYKDFEEERINTIEDLEKANQGRLPLGAFPKQMLGLLQ NKQPDLSEKAKIKIEKLIAETKLLSHRLNTKLKSSPKLGKRREKL IKTGVLADWLVKDFMRFQPVAYDAQNQPIKSSKANSTEFWFIR RALALYGGEKNRLEGYFKQTNLIGNTNPHPFLNKFNWKACRNL VDFYQQYLEQREKFLEAIKNQPWEPYQYCLLLKIPKENRKNLV KGWEQGGISLPRGLFTEAIRETLSEDLMLSKPIRKEIKKHGRVGF ISRAITLYFKEKYQDKHQSFYNLSYKLEAKAPLLKREEHYEYW QQNKPQSPTESQRLELHTSDRWKDYLLYKRWQHLEKKLRLYR NQDVMLWLMTLELTKNHFKELNLNYHQLKLENLAVNVQEAD AKLNPLNQTLPMVLPVKVYPATAFGEVQYHKTPIRTVYIREEHT KALKMGNFKALVKDRRLNGLFSFIKEENDTQKHPISQLRLRREL EIYQSLRVDAFKETLSLEEKLLNKHTSLSSLENEFRALLEEWKK EYAASSMVTDEHIAFIASVRNAFCHNQYPFYKEALHAPIPLFTV AQPTTEEKDGLGIAEALLKVLREYCEIVKSQI Porphyromonas WP_052912312 MTEQNEKPYNGTYYTLEDKHFWAAFFNLARHNAYITLAHIDRQ gingivalis LAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFSF (SEQ ID LEGAAYGKKLFESQSSGNKSSKKKELTKKEKEELQANALSLDN No. 
183) LKSILFDFLQKLKDFRNYYSHYRHPESSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDKYGNNDN PFFKHHFVDREEKVTEAGLLFFVSLFLEKRDAIWMQKKIRGFKG GTEAYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNELV RCPKLLYDRLREEDRARFRVPVDILSDEDDTDGTEEDPFKNTLV RHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKNIGEQ PEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETGDK PYITQTTPHYHIEKGKIGLRFVPEGQLLWPSPEVGATRTGRSKY AQDKRFTAEAFLSVHELMPMMFYYFLLREKYSEEASAEKVQG RIKRVIEDVYAVYDAFARDEINTRDELDACLADKGIRRGHLPRQ MIAILSQEHKDMEEKVRKKLQEMIADTDHRLDMLDRQTDRKIR IGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSKA NSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLHE TRWESHTNILSFYRSYLKARKAFLQSIGRSDREENHRFLLLKEPK TDRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSYKEV GFMAKAVPLYFERACKDRVQPFYDYPFNVGNSLKPKKGRFLSK EKRAEEWESGKERFRDLEAWSHSAARRIEDAFVGIEYASWENK KKIEQLLQDLSLWETFESKLKVKADKINIAKLKKEILEAKEHPY HDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGLDTG TLYLKDIRTDVQEQGSLNVLNHVKPMRLPVVVYRADSRGHVH KEEAPLATVYIEERDTKLLKQGNFKSFVKDRRLNGLFSFVDTGA LAMEQYPISKLRVEYELAKYQTARVCAFEQTLELEESLLTRYPH LPDESFREMLESWSDPLLDKWPDLQREVRLLIAVRNAFSHNQY PMYDETIFSSIRKYDPSSLDAIEERMGLNIAHRLSEEVKLAKEMV ERIIQA Porphyromonas WP_058019250 MTEQNEKPYNGTYYTLKDKHFWAAFFNLARHNAYITLTHIDRQ gingivalis LAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFSF (SEQ ID LEGAAYGKKLFESQSSGNKSSKKKELTKKEKEELQANALSLDN No. 
184) LKSILFDFLQKLKDFRNYYSHYRHPESSELPMFDGNMLQRLYN VFDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRCGNND NPFFKHHFVDREGKVTEAGLLFFVSLFLEKRDAIWMQKKIRGF KGGTETYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNE LVRCPKSLYDRLREEDRACFRVPVDILSDEDDTDGAEEDPFKNT LVRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKNIG EQPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDCFETG DKPYITQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGATRTGRS KYAQDKRFTAEAFLSVHELMPMMFYYFLLREKYSEEVSAERV QGRIKRVIEDVYAVYDAFARDEINTRDELDACLADKGIRRGHLP RQMIAILSQKHKDMEEKVRKKLQEMIADTDHRLDMLDRQTDR KIRIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNS KANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFL HETRWESHTNILSFYRSYLKARKAFLQSIGRSDRVENHRFLLLK EPKTDRQTLVAGWKGEFHLPRGIFTEAVRDCLIEMGLDEVGSY KEVGFMAKAVPLYFERACKDRVQPFYDYPFNVGNSLKPKKGR FLSKEKRAEEWESGKERFRDLEAWSHSAARRIEDAFAGIENASR ENKKKIEQLLQDLSLWETFESKLKVKADKINIAKLKKEILEAKE HPYLDFKSWQKFERELRLVKNQDIITWMMCRDLMEENKVEGL DTGTLYLKDIRTDVQEQGSLNVLNHVKPMRLPVVVYRADSRG HVHKEQAPLATVYIEERDTKLLKQGNFKSFVKDRRLNGLFSFV DTGALAMEQYPISKLRVEYELAKYQTARVCAFEQTLELEESLLT RYPHLPDENFRKMLESWSDPLLDKWPDLHRKVRLLIAVRNAFS HNQYPMYDEAVFSSIRKYDPSSPDAIEERMGLNIAHRLSEEVKQ AKEMAERIIQA Flavobacterium WP_060381855 MSSKNESYNKQKTFNHYKQEDKYFFGGFLNNADDNLRQVGKE columnare FKTRINFHNNNELASVFKDYFNKEKSVAKREHALNLLSNYFP (SEQ ID VLERIQKHTNHNFEQTREIFELLLDTIKKLRDYYTHHYHKPITIN No. 
185) PKVYDFLDDTLLDVLITIKKKKVKNDTSRELLKEKFRPELTQLK NQKREELIKKGKKLLEENLENAVFNHCLRPFLEENKTDDKQNK TVSLRKYRKSKPNEETSITLTQSGLVFLISFFLHRKEFQVFTSGLE GFKAKVNTIKEEEISLNKNNIVYMITHWSYSYYNFKGLKHRIKT DQGVSTLEQNNTTHSLTNTNTKEALLTQIVDYLSKVPNEIYETL SEKQQKEFEEDINEYMRENPENEDSTFSSIVSHKVIRKRYENKFN YFAMRFLDEYAELPTLRFMVNFGDYIKDRQKKILESIQFDSERII KKEIHLFEKLGLVTEYKKNVYLKETSNIDLSRFPLFPSPSYVMA NNNIPFYIDSRSNNLDEYLNQKKKAQSQNRKRNLTFEKYNKEQ SKDAIIAMLQKEIGVKDLQQRSTIGLLSCNELPSMLYEVIVKDIK GAELENKIAQKIREQYQSIRDFTLDSPQKDNIPTTLTKTISTDTSV TFENQPIDIPRLKNALQKELTLTQEKLLNVKQHEIEVDNYNRNK NTYKFKNQPKDKVDDNKLQRKYVFYRNEIGQEANWLASDLIH FMKNKSLWKGYMHNELQSFLAFFEDKKNDCIALLETVFNLKED CILTKDLKNLFLKHGNFIDFYKEYLKLKEDFLNTESTFLENGFIG LPPKILKKELSKRLNYIFIVFQKRQFIIKELEEKKNNLYADAINLS RGIFDEKPTMIPFKKPNPDEFASWFVASYQYNNYQSFYELTPDK IENDKKKKYKNLRAINKVKIQDYYLKLMVDTLYQDLFNQPLDK SLSDFYVSKTDREKIKADAKAYQKRNDSFLWNKVIHLSLQNNR ITANPKLKDIGKYKRALQDEKIATLLTYDDRTWTYALQKPEKE NENDYKELHYTALNMELQEYEKVRSKKLLKQVQELEKQILDKF YDFSNNATHPEDLEIEDKKGKRHPNFKLYITKALLKNESEIINLE NIDIEILIKYYDYNTEKLKEKIKNMDEDEKAKIVNTKENYNKITN VLIKKALVLIIIRNKMAHNQYPPKFIYDLATRFVPKKEEEYFACY FNRVFETITTELWENKKKAKEIV Porphyromonas WP_061156470 MTEQNERPYNGTYYTLEDKHFWAAFFNLARHNAYITLTHIDRQ gingivalis LAYSKADITNDEDILFFKGQWKNLDNDLERKARLRSLILKHFSF (SEQ ID LEGAAYGKKLFENKSSGNKSSKKKELTKKEKEELQANALSLDN No. 
186) LKSILFDFLQKLKDFRNYYSHYRHPESSELPLFDGNMLQRLYNV FDVSVQRVKRDHEHNDKVDPHRHFNHLVRKGKKDRCGNNDN PFFKHHFVDREGKVTEAGLLFFVSLFLEKRDAIWMQKKIRGFK GGTEAYQQMTNEVFCRSRISLPKLKLESLRTDDWMLLDMLNEL VRCPKSLYDRLREEDRARFRVPVDILSDEDDTDGTEEDPFKNTL VRHQDRFPYFALRYFDLKKVFTSLRFHIDLGTYHFAIYKKNIGE QPEDRHLTRNLYGFGRIQDFAEEHRPEEWKRLVRDLDYFETGD KPYITQTTPHYHIEKGKIGLRFVPEGQHLWPSPEVGATRTGRSK YAQDKRLTAEAFLSVHELMPMMFYYFLLREKYSEEVSAEKVQ GRIKRVIEDVYAVYDAFARGEIDTLDRLDACLADKGIRRGHLPR QMIAILSQEHKDMEEKVRKKLQEMIADTDHRLDMLDRQTDRKI RIGRKNAGLPKSGVIADWLVRDMMRFQPVAKDTSGKPLNNSK ANSTEYRMLQRALALFGGEKERLTPYFRQMNLTGGNNPHPFLH ETRWESHTNILSFYRSYLKARKAFLQSIGRSDREENHRFLLLKEP KTDRQTLVAGWKSEFHLPRGIFTEAVRDCLIEMGYDEVGSYKE VGFMAKAVPLYFERACKDRVQPFYDYPFNVGNSLKPKKGRFLS KEKRAEEWESGKERFRLAKLKKEILEAKEHPYLDFKSWQKFER ELRLVKNQDIITWMMCRDLMEENKVEGLDTGTLYLKDIRTEVQ EQGSLNVLNRVKPMRLPVVVYRADSRGHVHKEQAPLATVYIEE RDTKLLKQGNFKSFVKDRRLNGLFSFVDTGGLAMEQYPISKLR VEYELAKYQTARVCAFEQTLELEESLLTRCPHLPDKNFRKMLES WSDPLLDKWPDLQREVWLLIAVRNAFSHNQYPMYDEAVFSSIR KYDPSSPDAIEERMGLNIAHRLSEEVKQAKEMAERIIQA Porphyromonas WP_061156637 MNTVPASENKGQSRTVEDDPQYFGLYLNLARENLIEVESHVRIK gingivalis FGKKKLNEESLKQSLLCDHLLSVDRWTKVYGHSRRYLPFLHYF (SEQ ID DPDSQIEKDHDSKTGVDPDSAQRLIRELYSLLDFLRNDFSHNRL No. 
187) DGTTFEHLEVSPDISSFITGTYSLACGRAQSRFADFFKPDDFVLA KNRKEQLISVADGKECLTVSGLAFFICLFLDREQASGMLSRIRGF KRTDENWARAVHETFCDLCIRHPHDRLESSNTKEALLLDMLNE LNRCPRILYDMLPEEERAQFLPALDENSMNNLSENSLNEESRLL WDGSSDWAEALTKRIRHQDRFPYLMLRFIEEMDLLKGIRFRVD LGEIELDSYSKKVGRNGEYDRTITDHALAFGKLSDFQNEEEVSR MISGEASYPVRFSLFAPRYAIYDNKIGYCHTSDPVYPKSKTGEK RALSNPQSMGFISVHDLRKLLLMELLCEGSFSRMQSGFLRKANR ILDETAEGKLQFSALFPEMRHRFIPPQNPKSKDRREKAETTLEKY KQEIKGRKDKLNSQLLSAFDMNQRQLPSRLLDEWMNIRPASHS VKLRTYVKQLNEDCRLRLRKFRKDGDGKARAIPLVGEMATFLS QDIVRMIISEETKKLITSAYYNEMQRSLAQYAGEENRRQFRAIV AELHLLDPSSGHPFLSATMETAHRYTEDFYKCYLEKKREWLAK TFYRPEQDENTKRRISVFFVPDGEARKLLPLLIRRRMKEQNDLQ DWIRNKQAHPIDLPSHLFDSKIMELLKVKDGKKKWNEAFKDW WSTKYPDGMQPFYGLRRELNIHGKSVSYIPSDGKKFADCYTHL MEKTVQDKKRELRTAGKPVPPDLAADIKRSFHRAVNEREFMLR LVQEDDRLMLMAINKMMTDREEDILPGLKNIDSILDKENQFSLA VHAKVLEKEGEGGDNSLSLVPATIEIKSKRKDWSKYIRYRYDR RVPGLMSHFPEHKATLDEVKTLLGEYDRCRIKIFDWAFALEGAI MSDRDLKPYLHESSSREGKSGEHSTLVKMLVEKKGCLTPDESQ YLILIRNKAAHNQFPCAAEMPLIYRDVSAKVGSIEGSSAKDLPE GSSLVDSLWKKYEMIIRKILPILDPENRFFGKLLNNMSQPINDL Riemerella WP_061710138 MFFSFHNAQRVIFKHLYKAFDASLRMVKEDYKAHFTVNLTRDF anatipestifer AHLNRKGKNKQDNPDFNRYRFEKDGFFTESGLLFFTNLFLDKR (SEQ ID DAYWMLKKVSGFKASHKQSEKMTTEVFCRSRILLPKLRLESRY No. 
188) DHNQMLLDMLSELSRCPKLLYEKLSEKDKKCFQVEADGFLDEI EEEQNPFKDTLIRHQDRFPYFALRYLDLNESFKSIRFQVDLGTYH YCIYDKKIGYEQEKRHLTRTLLNFGRLQDFTEINRPQEWKALTK DLDYNETSNQPFISKTTPHYHITDNKIGFRLRTSKELYPSLEVKD GANRIAKYPYNSDFVAHAFISISVHELLPLMFYQHLTGKSEDLL KETVRHIQRIYKDFEEERINTIEDLEKANQGRLPLGAFPKQMLGL LQNKQPDLSEKAKIKIEKLIAETKLLSHRLNTKLKSSPKLGKRRE KLIKTGVLADWLVKDFMRFQPVVYDAQNQPIKSSKANSTESRLI RRALALYGGEKNRLEGYFKQTNLIGNTNPHPFLNKFNWKACRN LVDFYQQYLEQREKFLEAIKHQPWEPYQYCLLLKVPKENRKNL VKGWEQGGISLPRGLFTEAIRETLSKDLTLSKPIRKEIKKHGRVG FISRAITLYFKEKYQDKHQSFYNLSYKLEAKAPLLKKEEHYEYW QQNKPQSPTESQRLELHTSDRWKDYLLYKRWQHLEKKLRLYR NQDIMLWLMTLELTKNHFKELNLNYHQLKLENLAVNVQEADA KLNPLNQTLPMVLPVKVYPTTAFGEVQYHETPIRTVYIREEQTK ALKMGNFKALVKDRHLNGLFSFIKEENDTQKHPISQLRLRRELE IYQSLRVDAFKETLSLEEKLLNKHASLSSLENEFRTLLEEWKKK YAASSMVTDKHIAFIASVRNAFCHNQYPFYKETLHAPILLFTVA QPTTEEKDGLGIAEALLRVLREYCEIVKSQI Flavobacterium WP_063744070 MSSKNESYNKQKTFNHYKQEDKYFFGGFLNNADDNLRQVGKE columnare FKTRINFLASVFKDYFNKEKSVAKREHALNLLSNYFP (SEQ ID VLERIQKHTNHNFEQTREIFELLLDTIKKLRDYYTHHYHKPITIN No. 
189) PKIYDFLDDTLLDVLITIKKKKVKNDTSRELLKEKLRPELTQLKN QKREELIKKGKKLLEENLENAVFNHCLRPFLEENKTDDKQNKT VSLRKYRKSKPNEETSITLTQSGLVFLMSFFLHRKEFQVFTSGLE GFKAKVNTIKEEKISLNKNNIVYMITHWSYSYYNFKGLKHRIKT DQGVSTLEQNNTTHSLTNTNTKEALLTQIVDYLSKVPNEIYETL SEKQQKEFEEDINEYMRENPENEDSTFSSIVSHKVIRKRYENKFN YFAMRFLDEYAELPTLRFMVNFGDYIKDRQKKILESIQFDSERII KKEIHLFEKLGLVTEYKKNVYLKETSNIDLSRFPLFPSPSYVMA NNNIPFYIDSRSNNLDEYLNQKKKAQSQNRKRNLTFEKYNKEQ SKDAIIAMLQKEIGVKDLQQRSTIGLLSCNELPSMLYEVIVKDIK GAELENKIAQKIREQYQSIRDFTLNSPQKDNIPTTLIKTISTDTSV TFENQPIDIPRLKNAIQKELALTQEKLLNVKQHEIEVNNYNRNK NTYKFKNQPKDKVDDNKLQRKYVFYRNEIGQEANWLASDLIH FMKNKSLWKGYMHNELQSFLAFFEDKKNDCIALLETVFNLKED CILTKDLKNLFLKHGNFIDFYKEYLKLKEDFLNTESTFLENGFIG LPPKILKKELSKRLNYIFIVFQKRQFIIKELEEKKNNLYADAINLS RGIFDEKPTMIPFKKPNPDEFASWFVASYQYNNYQSFYELTPDK IENDKKKKYKNLRAINKVKIQDYYLKLMVDTLYQDLFNQPLDK SLSDFYVSKTDREKIKADAKAYQKRNDSFLWNKVIHLSLQNNR ITANPKLKDIGKYKRALQDEKIATLLTYDDRTWTYALQKPEKE NENDYKELHYTALNMELQEYEKVRSKKLLKQVQELEKQILDKF YDFSNNATHPEDLEIEDKKGKRHPNFKLYITKALLKNESEIINLE NIDIEILIKYYDYNTEKLKEKIKNMDEDEKAKIVNTKENYNKITN VLIKKALVLIIIRNKMAHNQYPPKFIYDLATRFVPKKEEEYFACY FNRVFETITTELWENKKKAKEIV Riemerella WP_064970887 MEKPLPPNVYTLKHKFFWGAFLNIARHNAFITICHINEQLGLTTP anatipestifer PNDDKIADVVCGTWNNILNNDHDLLKKSQLTELILKHFPFLAA (SEQ ID MCYHPPKKEGKKKGSQKEQQKEKENEAQSQAEALNPSELIKVL No. 
190) KTLVKQLRTLRNYYSHHSHKKPDAEKDIFKHLYKAFDASLRMV KEDYKAHFTVNLTQDFAHLNRKGKNKQDNPDFDRYRFEKDGF FTESGLLFFTNLFLDKRDAYWMLKKVSGFKASHKQSEKMTTEV FCRSRILLPKLRLESRYDHNQMLLDMLSELSRYPKLLYEKLSEE DKKRFQVEADGFLDEIEEEQNPFKDTLIRHQDRFPYFALRYLDL NESFKSIRFQVDLGTYHYCIYDKKIGDEQEKRHLTRTLLSFGRL QDFTEINRPQEWKALTKDLDYKETSKQPFISKTTPHYHITDNKIG FRLGTSKELYPSLEVKDGANRIAQYPYNSDFVAHAFISVHELLP LMFYQHLTGKSEDLLKETVRHIQRIYKDFEEERINTIEDLEKANQ GRLPLGAFPKQMLGLLQNKQPDLSEKAKIKIEKLIAETKLLSHR LNTKLKSSPKLGKRREKLIKTGVLADWLVKDFMRFQPVAYDA QNQPIESSKANSTEFQLIQRALALYGGEKNRLEGYFKQTNLIGN TNPHPFLNKFNWKACRNLVDFYQQYLEQREKFLEAIKNQPWEP YQYCLLLKIPKENRKNLVKGWEQGGISLPRGLFTEAIRETLSKD LTLSKPIRKEIKKHGRVGFISRAITLYFREKYQDDHQSFYDLPYK LEAKASPLPKKEHYEYWQQNKPQSPTELQRLELHTSDRWKDYL LYKRWQHLEKKLRLYRNQDVMLWLMTLELTKNHFKELNLNY HQLKLENLAVNVQEADAKLNPLNQTLPMVLPVKVYPATAFGE VQYQETPIRTVYIREEQTKALKMGNFKALVKDRRLNGLFSFIKE ENDTQKHPISQLRLRRELEIYQSLRVDAFKETLNLEEKLLKKHTS LSSVENKFRILLEEWKKEYAASSMVTDEHIAFIASVRNAFCHNQ YPFYEEALHAPIPLFTVAQQTTEEKDGLGIAEALLRVLREYCEIV KSQI Sinomicrobium WP_072319476.1 MESTTTLGLHLKYQHDLFEDKHYFGGGVNLAVQNIESIFQAFA oceani ERYGIQNPLRKNGVPAINNIFHDNISISNYKEYLKFLKQYLPVVG (SEQ ID FLEKSNEINIFEFREDFEILINAIYKLRHFYTHYYHSPIKLEDRFYT No. 
191) CLNELFVAVAIQVKKHKMKSDKTRQLLNKNLHQLLQQLIEQKR EKLKDKKAEGEKVSLDTKSIENAVLNDAFVHLLDKDENIRLNY SSRLSEDIITKNGITLSISGLLFLLSLFLQRKEAEDLRSRIEGFKGK GNELRFMATHWVFSYLNVKRIKHRLNTDFQKETLLIQIADELSK VPDEVYKTLDHENRSKFLEDINEYIREGNEDASLNESTVVHGVI RKRYENKFHYLVLRYLDEFVDFPSLRFQVHLGNYIHDRRDKVI DGTNFITNRVIKEPIKVFGKLSHVSKLKSDYMESLSREHKNGWD VFPNPSYNFVGHNIPIFINLRSASSKGKELYRDLMKIKSEKKKKS REEGIPMERRDGKPTKIEISNQIDRNIKDNNFKDIYPGEPLAMLS LNELPALLFELLRRPSITPQDIEDRMVEKLYERFQIIRDYKPGDG LSTSKISKKLRKADNSTRLDGKKLLRAIQTETRNAREKLHTLEE NKALQKNRKRRTVYTTREQGREASWLAQDLKRFMPIASRKEW RGYHHSQLQQILAFYDQNPKQPLELLEQFWDLKEDTYVWNSWI HKSLSQHNGFVPMYEGYLKGRLGYYKKLESDIIGFLEEHKVLK RYYTQQHLNVIFRERLYFIKTETKQKLELLARPLVFPRGIFDDKP TFVQDKKVVDHPELFADWYVYSYKDDHSFQEFYHYKRDYNEI FETELSWDIDFKDNKRQLNPSEQMDLFRMKWDLKIKKIKIQDIF LKIVAEDIYLKIFGHKIPLSLSDFYISRQERLTLDEQAVAQSMRLP GDTSENQIKESNLWQTTVPYEKEQIREPKIKLKDIGKFKYFLQQ QKVLNLLKYDPQHVWTKAELEEELYIGKHSYEVVRREMLLQK CHQLEKHILEQFRFDGSNHPRELEQGNHPNFKMYIVNGILTKRG ELEIEAENWWLELGNSKNSLDKVEVELLTMKTIPEQKAFLLILIR NKFAHNQLPADNYFHYASNLMNLKKSDTYSLFWFTVADTIVQ EFMSL Reichenbachiella WP_073124441.1 MKTNPLIASSGEKPNYKKFNTESDKSFKKIFQNKGSIAPIAEKAC agariperforans KNFEIKSKSPVNRDGRLHYFSVGHAFKNIDSKNVFRYELDESQM (SEQ DMKPTQFLALQKEFFDFQGALNGLLKHIRNVNSHYVHTFEKLEI ID No. 
QSINQKLITFLIEAFELAVIHSYLNEEELSYEAYKDDPQSGQKLV 192) QFLCDKFYPNKEHEVEERKTILAKNKRQALEHLLFIEVTSDIDW KLFEKHKVFTISNGKYLSFHACLFLLSLFLYKSEANQLISKIKGF KRNDDNQYRSKRQIFTFFSKKFTSQDVNSEEQHLVKFRDVIQYL NHYPSAWNKHLELKSGYPQMTDKLMRYIVEAEIYRSFPDQTDN HRFLLFAIREFFGQSCLDTWTGNTPINFSNQEQKGFSYEINTSAEI KDIETKLKALVLKGPLNFKEKKEQNRLEKDLRREKKEQPTNRV KEKLLTRIQHNMLYVSYGRNQDRFMDFAARFLAETDYFGKDA KFKMYQFYTSDEQRDHLKEQKKELPKKEFEKLKYHQSKLVDY FTYAEQQARYPDWDTPFVVENNAIQIKVTLFNGAKKIVSVQRN LMLYLLEDALYSEKRENAGKGLISGYFVHHQKELKDQLDILEK ETEISREQKREFKKLLPKRLLHRYSPAQINDTTEWNPMEVILEEA KAQEQRYQLLLEKAILHQTEEDFLKRNKGKQFKLRFVRKAWH LMYLKELYMNKVAEHGHHKSFHITKEEFNDFCRWMFAFDEVP KYKEYLCDYFSQKGFFNNAEFKDLIESSTSLNDLYEKTKQRFEG WSKDLTKQSDENKYLLANYESMLKDDMLYVNISHFISYLESKG KINRNAHGHIAYKALNNVPHLIEEYYYKDRLAPEEYKSHGKLY NKLKTVKLEDALLYEMAMHYLSLEPALVPKVKTKVKDILSSNI AFDIKDAAGHHLYHLLIPFHKIDSFVALINHQSQQEKDPDKTSFL AKIQPYLEKVKNSKDLKAVYHYYKDTPHTLRYEDLNMIHSHIV SQSVQFTKVALKLEEYFIAKKSITLQIARQISYSEIADLSNYFTDE VRNTAFHFDVPETAYSMILQGIESEFLDREIKPQKPKSLSELSTQ QVSVCTAFLETLHNNLFDRKDDKKERLSKARERYFEQIN

In certain example embodiments, the CRISPR effector protein is a Cas13a protein selected from Table 2.
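Because the sequences in Table 2 are printed as whitespace-separated blocks, a reader working with them programmatically would typically first rejoin the blocks into one contiguous string. The following is a minimal sketch of that step in Python, not part of the disclosed methods. The helper names `normalize_sequence` and `unexpected_residues` are hypothetical. Two further assumptions are made: that a trailing `*` marks the end of a sequence and can be dropped, and that sequences use only the 20 standard one-letter amino-acid codes. The example input is the first two blocks of the c2c2-5 entry.

```python
# Illustrative sketch only; helper names and validation rules are assumptions.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def normalize_sequence(blocks: str) -> str:
    """Join whitespace-separated sequence blocks into one uppercase string.

    A trailing '*' (assumed here to be a terminator mark) is removed.
    """
    joined = "".join(blocks.split()).upper()
    return joined.rstrip("*")


def unexpected_residues(sequence: str) -> set:
    """Return any characters outside the 20 standard amino-acid letters."""
    return set(sequence) - STANDARD_AMINO_ACIDS


# First two printed blocks of the c2c2-5 (Lachnospiraceae bacterium MA2020) entry.
blocks = (
    "MQISKVNHKHVAVGQKDRERITGFIYNDPVGDEKSLEDVVA "
    "KRANDTKVLFNVFNTKDLYDSQESDKSEKDKEIISKGAKFV"
)

sequence = normalize_sequence(blocks)
print(len(sequence), sorted(unexpected_residues(sequence)))
```

A non-empty result from `unexpected_residues` would suggest that organism names, accession numbers, or SEQ ID labels from neighboring table columns were mixed into the sequence text and need to be removed by hand.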

TABLE 2 c2c2-5 1 Lachnospiraceae MQISKVNHKHVAVGQKDRERITGFIYNDPVGDEKSLEDVVA bacterium KRANDTKVLFNVFNTKDLYDSQESDKSEKDKEIISKGAKFV MA2020 AKSFNSAITILKKQNKIYSTLTSQQVIKELKDKFGGARIYDDD (SEQ ID IEEALTETLKKSFRKENVRNSIKVLIENAAGIRSSLSKDEEELI No. 193) QEYFVKQLVEEYTKTKLQKNVVKSIKNQNMVIQPDSDSQVL SLSESRREKQSSAVSSDTLVNCKEKDVLKAFLTDYAVLDEDE RNSLLWKLRNLVNLYFYGSESIRDYSYTKEKSVWKEHDEQK ANKTLFIDEICHITKIGKNGKEQKVLDYEENRSRCRKQNINY YRSALNYAKNNTSGIFENEDSNHFWIHLIENEVERLYNGIEN GEEFKFETGYISEKVWKAVINHLSIKYIALGKAVYNYAMKEL SSPGDIEPGKIDDSYINGITSFDYEIIKAEESLQRDISMNVVFAT NYLACATVDTDKDFLLFSKEDIRSCTKKDGNLCKNIMQFWG GYSTWKNFCEEYLKDDKDALELLYSLKSMLYSMRNSSFHFS TENVDNGSWDTELIGKLFEEDCNRAARIEKEKFYNNNLHMF YSSSLLEKVLERLYSSHHERASQVPSFNRVFVRKNFPSSLSEQ RITPKFTDSKDEQIWQSAVYYLCKEIYYNDFLQSKEAYKLFR EGVKNLDKNDINNQKAADSFKQAVVYYGKAIGNATLSQVC QAIMTEYNRQNNDGLKKKSAYAEKQNSNKYKHYPLFLKQV LQSAFWEYLDENKEIYGFISAQIHKSNVEIKAEDFIANYSSQQ YKKLVDKVKKTPELQKWYTLGRLINPRQANQFLGSIRNYVQ FVKDIQRRAKENGNPIRNYYEVLESDSIIKILEMCTKLNGTTS NDIHDYFRDEDEYAEYISQFVNFGDVHSGAALNAFCNSESEG KKNGIYYDGINPIVNRNWVLCKLYGSPDLISKITSRVNENMIH DFHKQEDLIREYQIKGICSNKKEQQDLRTFQVLKNRVELRDI VEYSEIINELYGQLIKWCYLRERDLMYFQLGFHYLCLNNASS KEADYIKINVDDRNISGAILYQIAAMYINGLPVYYKKDDMY VALKSGKKASDELNSNEQTSKKINYFLKYGNNILGDKKDQL YLAGLELFENVAEHENIIIFRNEIDHFHYFYDRDRSMLDLYSE VFDRFFTYDMKLRKNVVNMLYNILLDHNIVSSFVFETGEKK VGRGDSEVIKPSAKIRLRANNGVSSDVFTYKVGSKDELKIAT LPAKNEEFLLNVARLIYYPDMEAVSENMVREGVVKVEKSND KKGKISRGSNTRSSNQSKYNNKSKNRMNYSMGSIFEKMDLK FD c2c2-6 2 Lachnospiraceae MKISKVREENRGAKLTVNAKTAVVSENRSQEGILYNDPSRY bacterium GKSRKNDEDRDRYIESRLKSSGKLYRIFNEDKNKRETDELQ NK4A179 WFLSEIVKKINRRNGLVLSDMLSVDDRAFEKAFEKYAELSYT (SEQ ID NRRNKVSGSPAFETCGVDAATAERLKGIISETNFINRIKNNID No. 
194) NKVSEDIIDRIIAKYLKKSLCRERVKRGLKKLLMNAFDLPYS DPDIDVQRDFIDYVLEDFYHVRAKSQVSRSIKNMNMPVQPE GDGKFAITVSKGGTESGNKRSAEKEAFKKFLSDYASLDERV RDDMLRRMRRLVVLYFYGSDDSKLSDVNEKFDVWEDHAA RRVDNREFIKLPLENKLANGKTDKDAERIRKNTVKELYRNQ NIGCYRQAVKAVEEDNNGRYFDDKMLNMFFIHRIEYGVEKI YANLKQVTEFKARTGYLSEKIWKDLINYISIKYIAMGKAVYN YAMDELNASDKKEIELGKISEEYLSGISSFDYELIKAEEMLQR ETAVYVAFAARHLSSQTVELDSENSDFLLLKPKGTMDKNDK NKLASNNILNFLKDKETLRDTILQYFGGHSLWTDFPFDKYLA GGKDDVDFLTDLKDVIYSMRNDSFHYATENHNNGKWNKEL ISAMFEHETERMTVVMKDKFYSNNLPMFYKNDDLKKLLIDL YKDNVERASQVPSFNKVFVRKNFPALVRDKDNLGIELDLKA DADKGENELKFYNALYYMFKEIYYNAFLNDKNVRERFITKA TKVADNYDRNKERNLKDRIKSAGSDEKKKLREQLQNYIAEN DFGQRIKNIVQVNPDYTLAQICQLIMTEYNQQNNGCMQKKS AARKDINKDSYQHYKMLLLVNLRKAFLEFIKENYAFVLKPY KHDLCDKADFVPDFAKYVKPYAGLISRVAGSSELQKWYIVS RFLSPAQANHMLGFLHSYKQYVWDIYRRASETGTEINHSIAE DKIAGVDITDVDAVIDLSVKLCGTISSEISDYFKDDEVYAEYI SSYLDFEYDGGNYKDSLNRFCNSDAVNDQKVALYYDGEHP KLNRNIILSKLYGERRFLEKITDRVSRSDIVEYYKLKKETSQY QTKGIFDSEDEQKNIKKFQEMKNIVEFRDLMDYSEIADELQG QLINWIYLRERDLMNFQLGYHYACLNNDSNKQATYVTLDY QGKKNRKINGAILYQICAMYINGLPLYYVDKDSSEWTVSDG KESTGAKIGEFYRYAKSFENTSDCYASGLEIFENISEHDNITEL RNYIEHFRYYSSFDRSFLGIYSEVFDRFFTYDLKYRKNVPTIL YNILLQHFVNVRFEFVSGKKMIGIDKKDRKIAKEKECARITIR EKNGVYSEQFTYKLKNGTVYVDARDKRYLQSIIRLLFYPEK VNMDEMIEVKEKKKPSDNNTGKGYSKRDRQQDRKEYDKY KEKKKKEGNFLSGMGGNINWDEINAQLKN c2c2-7 3 [Clostridium] MKFSKVDHTRSAVGIQKATDSVHGMLYTDPKKQEVNDLDK aminophilum RFDQLNVKAKRLYNVFNQSKAEEDDDEKRFGKVVKKLNRE DSM 10710 LKDLLFHREVSRYNSIGNAKYNYYGIKSNPEEIVSNLGMVES SEQ ID LKGERDPQKVISKLLLYYLRKGLKPGTDGLRMILEASCGLRK No. 
195) LSGDEKELKVFLQTLDEDFEKKTFKKNLIRSIENQNMAVQPS NEGDPIIGITQGRFNSQKNEEKSAIERMMSMYADLNEDHRED VLRKLRRLNVLYFNVDTEKTEEPTLPGEVDTNPVFEVWHDH EKGKENDRQFATFAKILTEDRETRKKEKLAVKEALNDLKSAI RDHNIMAYRCSIKVTEQDKDGLFFEDQRINRFWIHHIESAVE RILASINPEKLYKLRIGYLGEKVWKDLLNYLSIKYIAVGKAV FHFAMEDLGKTGQDIELGKLSNSVSGGLTSFDYEQIRADETL QRQLSVEVAFAANNLFRAVVGQTGKKIEQSKSEENEEDFLL WKAEKIAESIKKEGEGNTLKSILQFFGGASSWDLNHFCAAYG NESSALGYETKFADDLRKAIYSLRNETFHFTTLNKGSFDWNA KLIGDMFSHEAATGIAVERTRFYSNNLPMFYRESDLKRIMDH LYNTYHPRASQVPSFNSVFVRKNFRLFLSNTLNTNTSFDTEV YQKWESGVYYLFKEIYYNSFLPSGDAHHLFFEGLRRIRKEAD NLPIVGKEAKKRNAVQDFGRRCDELKNLSLSAICQMIMTEY NEQNNGNRKVKSTREDKRKPDIFQHYKMLLLRTLQEAFAIYI RREEFKFIFDLPKTLYVMKPVEEFLPNWKSGMFDSLVERVK QSPDLQRWYVLCKFLNGRLLNQLSGVIRSYIQFAGDIQRRAK ANHNRLYMDNTQRVEYYSNVLEVVDFCIKGTSRFSNVFSDY FRDEDAYADYLDNYLQFKDEKIAEVSSFAALKTFCNEEEVK AGIYMDGENPVMQRNIVMAKLFGPDEVLKNVVPKVTREEIE EYYQLEKQIAPYRQNGYCKSEEDQKKLLRFQRIKNRVEFQTI TEFSEIINELLGQLISWSFLRERDLLYFQLGFHYLCLHNDTEK PAEYKEISREDGTVIRNAILHQVAAMYVGGLPVYTLADKKL AAFEKGEADCKLSISKDTAGAGKKIKDFFRYSKYVLIKDRML TDQNQKYTIYLAGLELFENTDEHDNITDVRKYVDHFKYYAT SDENAMSILDLYSEIHDRFFTYDMKYQKNVANMLENILLRH FVLIRPEFFTGSKKVGEGKKITCKARAQIEIAENGMRSEDFTY KLSDGKKNISTCMIAARDQKYLNTVARLLYYPHEAKKSIVD TREKKNNKKTNRGDGTFNKQKGTARKEKDNGPREFNDTGF SNTPFAGFDPFRNS c2c2-8 5 Carnobacterium MRITKVKIKLDNKLYQVTMQKEEKYGTLKLNEESRKSTAEIL gallinarum RLKKASFNKSFHSKTINSQKENKNATIKKNGDYISQIFEKLVG DSM 4847 VDTNKNIRKPKMSLTDLKDLPKKDLALFIKRKFKNDDIVEIK (SEQ ID NLDLISLFYNALQKVPGEHFTDESWADFCQEMMPYREYKNK No. 
196) FIERKIILLANSIEQNKGFSINPETFSKRKRVLHQWAIEVQERG DFSILDEKLSKLAEIYNFKKMCKRVQDELNDLEKSMKKGKN PEKEKEAYKKQKNFKIKTIWKDYPYKTHIGLIEKIKENEELN QFNIEIGKYFEHYFPIKKERCTEDEPYYLNSETIATTVNYQLK NALISYLMQIGKYKQFGLENQVLDSKKLQEIGIYEGFQTKFM DACVFATSSLKNIIEPMRSGDILGKREFKEAIATSSFVNYHHF FPYFPFELKGMKDRESELIPFGEQTEAKQMQNIWALRGSVQQ IRNEIFHSFDKNQKFNLPQLDKSNFEFDASENSTGKSQSYIET DYKFLFEAEKNQLEQFFIERIKSSGALEYYPLKSLEKLFAKKE MKFSLGSQVVAFAPSYKKLVKKGHSYQTATEGTANYLGLS YYNRYELKEESFQAQYYLLKLIYQYVFLPNFSQGNSPAFRET VKAILRINKDEARKKMKKNKKFLRKYAFEQVREMEFKETPD QYMSYLQSEMREEKVRKAEKNDKGFEKNITMNFEKLLMQIF VKGFDVFLTTFAGKELLLSSEEKVIKETEISLSKKINEREKTLK ASIQVEHQLVATNSAISYWLFCKLLDSRHLNELRNEMIKFKQ SRIKFNHTQHAELIQNLLPIVELTILSNDYDEKNDSQNVDVSA YFEDKSLYETAPYVQTDDRTRVSFRPILKLEKYHTKSLIEALL KDNPQFRVAATDIQEWMHKREEIGELVEKRKNLHTEWAEG QQTLGAEKREEYRDYCKKIDRFNWKANKVTLTYLSQLHYLI TDLLGRMVGFSALFERDLVYFSRSFSELGGETYHISDYKNLS GVLRLNAEVKPIKIKNIKVIDNEENPYKGNEPEVKPFLDRLH AYLENVIGIKAVHGKIRNQTAHLSVLQLELSMIESMNNLRDL MAYDRKLKNAVTKSMIKILDKHGMILKLKIDENHKNFEIESL IPKEIIHLKDKAIKTNQVSEEYCQLVLALLTTNPGNQLN c2c2-9 6 Carnobacterium MRMTKVKINGSPVSMNRSKLNGHLVWNGTTNTVNILTKKE gallinarum QSFAASFLNKTLVKADQVKGYKVLAENIFIIFEQLEKSNSEKP DSM 4847 SVYLNNIRRLKEAGLKRFFKSKYHEEIKYTSEKNQSVPTKLN (SEQ ID LIPLFFNAVDRIQEDKFDEKNWSYFCKEMSPYLDYKKSYLNR No. 
197) KKEILANSIQQNRGFSMPTAEEPNLLSKRKQLFQQWAMKFQ ESPLIQQNNFAVEQFNKEFANKINELAAVYNVDELCTAITEK LMNFDKDKSNKTRNFEIKKLWKQHPHNKDKALIKLFNQEG NEALNQFNIELGKYFEHYFPKTGKKESAESYYLNPQTIIKTVG YQLRNAFVQYLLQVGKLHQYNKGVLDSQTLQEIGMYEGFQ TKFMDACVFASSSLRNIIQATTNEDILTREKFKKELEKNVELK HDLFFKTEIVEERDENPAKKIAMTPNELDLWAIRGAVQRVR NQIFHQQINKRHEPNQLKVGSFENGDLGNVSYQKTIYQKLFD AEIKDIEIYFAEKIKSSGALEQYSMKDLEKLFSNKELTLSLGG QVVAFAPSYKKLYKQGYFYQNEKTIELEQFTDYDFSNDVFK ANYYLIKLIYHYVFLPQFSQANNKLFKDTVHYVIQQNKELNT TEKDKKNNKKIRKYAFEQVKLMKNESPEKYMQYLQREMQE ERTIKEAKKTNEEKPNYNFEKLLIQIFIKGFDTFLRNFDLNLNP AEELVGTVKEKAEGLRKRKERIAKILNVDEQIKTGDEEIAFW IFAKLLDARHLSELRNEMIKFKQSSVKKGLIKNGDLIEQMQPI LELCILSNDSESMEKESFDKIEVFLEKVELAKNEPYMQEDKL TPVKFRFMKQLEKYQTRNFIENLVIENPEFKVSEKIVLNWHE EKEKIADLVDKRTKLHEEWASKAREIEEYNEKIKKNKSKKL DKPAEFAKFAEYKIICEAIENFNRLDHKVRLTYLKNLHYLMI DLMGRMVGFSVLFERDFVYMGRSYSALKKQSIYLNDYDTF ANIRDWEVNENKHLFGTSSSDLTFQETAEFKNLKKPMENQL KALLGVTNHSFEIRNNIAHLHVLRNDGKGEGVSLLSCMNDL RKLMSYDRKLKNAVTKAIIKILDKHGMILKLTNNDHTKPFEI ESLKPKKIIHLEKSNHSFPMDQVSQEYCDLVKKMLVFTN c2c2- 7 Paludibacter MRVSKVKVKDGGKDKMVLVHRKTTGAQLVYSGQPVSNET 10 propionicigenes SNILPEKKRQSFDLSTLNKTIIKFDTAKKQKLNVDQYKIVEKI WB4 FKYPKQELPKQIKAEEILPFLNHKFQEPVKWKNGKEESFNL (SEQ ID TLLIVEAVQAQDKRKLQPYYDWKTWYIQTKSDLLKKSIENN No. 
198) RIDLTENLSKRKKALLAWETEFTASGSIDLTHYHKVYMTDV LCKMLQDVKPLTDDKGKINTNAYHRGLKKALQNHQPAIFGT REVPNEANRADNQLSIYHLEVVKYLEHYFPIKTSKRRNTADD IAHYLKAQTLKTTIEKQLVNAIRANIIQQGKTNHHELKADTT SNDLIRIKTNEAFVLNLTGTCAFAANNIRNMVDNEQTNDILG KGDFIKSLLKDNTNSQLYSFFFGEGLSTNKAEKETQLWGIRG AVQQIRNNVNHYKKDALKTVFNISNFENPTITDPKQQTNYA DTIYKARFINELEKIPEAFAQQLKTGGAVSYYTIENLKSLLTT FQFSLCRSTIPFAPGFKKVFNGGINYQNAKQDESFYELMLEQ YLRKENFAEESYNARYFMLKLIYNNLFLPGFTTDRKAFADSV GFVQMQNKKQAEKVNPRKKEAYAFEAVRPMTAADSIADY MAYVQSELMQEQNKKEEKVAEETRINFEKFVLQVFIKGFDS FLRAKEFDFVQMPQPQLTATASNQQKADKLNQLEASITADC KLTPQYAKADDATHIAFYVFCKLLDAAHLSNLRNELIKFRES VNEFKFHHLLEIIEICLLSADVVPTDYRDLYSSEADCLARLRP FIEQGADITNWSDLFVQSDKHSPVIHANIELSVKYGTTKLLEQ IINKDTQFKTTEANFTAWNTAQKSIEQLIKQREDHHEQWVK AKNADDKEKQERKREKSNFAQKFIEKHGDDYLDICDYINTY NWLDNKMHFVHLNRLHGLTIELLGRMAGFVALFDRDFQFF DEQQIADEFKLHGFVNLHSIDKKLNEVPTKKIKEIYDIRNKIIQ INGNKINESVRANLIQFISSKRNYYNNAFLHVSNDEIKEKQM YDIRNHIAHFNYLTKDAADFSLIDLINELRELLHYDRKLKNA VSKAFIDLFDKHGMILKLKLNADHKLKVESLEPKKIYHLGSS AKDKPEYQYCTNQVMMAYCNMCRSLLEMKK c2c2- 9 Listeria MLALLHQEVPSQKLHNLKSLNTESLTKLFKPKFQNMISYPPS 11 weihenstephanensis KGAEHVQFCLTDIAVPAIRDLDEIKPDWGIFFEKLKPYTDWA FSL R9- ESYIHYKQTTIQKSIEQNKIQSPDSPRKLVLQKYVTAFLNGEP 0317 (SEQ LGLDLVAKKYKLADLAESFKVVDLNEDKSANYKIKACLQQ ID No. 
199) HQRNILDELKEDPELNQYGIEVKKYIQRYFPIKRAPNRSKHA RADFLKKELIESTVEQQFKNAVYHYVLEQGKMEAYELTDPK TKDLQDIRSGEAFSFKFINACAFASNNLKMILNPECEKDILGK GDFKKNLPNSTTQSDVVKKMIPFFSDEIQNVNFDEAIWAIRG SIQQIRNEVYHCKKHSWKSILKIKGFEFEPNNMKYTDSDMQK LMDKDIAKIPDFIEEKLKSSGIIRFYSHDKLQSIWEMKQGFSL LTTNAPFVPSFKRVYAKGHDYQTSKNRYYDLGLTTFDILEY GEEDFRARYFLTKLVYYQQFMPWFTADNNAFRDAANFVLR LNKNRQQDAKAFINIREVEEGEMPRDYMGYVQGQIAIHEDS TEDTPNHFEKFISQVFIKGFDSHMRSADLKFIKNPRNQGLEQS EIEEMSFDIKVEPSFLKNKDDYIAFWTFCKMLDARHLSELRN EMIKYDGHLTGEQEIIGLALLGVDSRENDWKQFFSSEREYEK IMKGYVGEELYQREPYRQSDGKTPILFRGVEQARKYGTETVI QRLFDASPEFKVSKCNITEWERQKETIEETIERRKELHNEWE KNPKKPQNNAFFKEYKECCDAIDAYNWHKNKTTLVYVNEL HHLLIEILGRYVGYVAIADRDFQCMANQYFKHSGITERVEY WGDNRLKSIKKLDTFLKKEGLFVSEKNARNHIAHLNYLSLK SECTLLYLSERLREIFKYDRKLKNAVSKSLIDILDRHGMSVVF ANLKENKHRLVIKSLEPKKLRHLGEKKIDNGYIETNQVSEEY CGIVKRLLEI c2c2- 10 Listeriaceae MKITKMRVDGRTIVMERTSKEGQLGYEGIDGNKTTEIIFDKK 12 bacterium KESFYKSILNKTVRKPDEKEKNRRKQAINKAINKEITELMLA FSL M6- VLHQEVPSQKLHNLKSLNTESLTKLFKPKFQNMISYPPSKGA 0635 = EHVQFCLTDIAVPAIRDLDEIKPDWGIFFEKLKPYTDWAESYI Listeria HYKQTTIQKSIEQNKIQSPDSPRKLVLQKYVTAFLNGEPLGL newyorkensis FSL DLVAKKYKLADLAESFKLVDLNEDKSANYKIKACLQQHQR M6-0635 NILDELKEDPELNQYGIEVKKYIQRYFPIKRAPNRSKHARADF (SEQ ID LKKELIESTVEQQFKNAVYHYVLEQGKMEAYELTDPKTKDL No. 
200) QDIRSGEAFSFKFINACAFASNNLKMILNPECEKDILGKGNFK KNLPNSTTRSDVVKKMIPFFSDELQNVNFDEAIWAIRGSIQQI RNEVYHCKKHSWKSILKIKGFEFEPNNMKYADSDMQKLMD KDIAKIPEFIEEKLKSSGVVRFYRHDELQSIWEMKQGFSLLTT NAPFVPSFKRVYAKGHDYQTSKNRYYNLDLTTFDILEYGEE DFRARYFLTKLVYYQQFMPWFTADNNAFRDAANFVLRLNK NRQQDAKAFINIREVEEGEMPRDYMGYVQGQIAIHEDSIEDT PNHFEKFISQVFIKGFDRHMRSANLKFIKNPRNQGLEQSEIEE MSFDIKVEPSFLKNKDDYIAFWIFCKMLDARHLSELRNEMIK YDGHLTGEQEIIGLALLGVDSRENDWKQFFSSEREYEKIMKG YVVEELYQREPYRQSDGKTPILFRGVEQARKYGTETVIQRLF DANPEFKVSKCNLAEWERQKETIEETIKRRKELHNEWAKNP KKPQNNAFFKEYKECCDAIDAYNWHKNKTTLAYVNELHHL LIEILGRYVGYVAIADRDFQCMANQYFKHSGITERVEYWGD NRLKSIKKLDTFLKKEGLFVSEKNARNHIAHLNYLSLKSECT LLYLSERLREIFKYDRKLKNAVSKSLIDILDRHGMSVVFANL KENKHRLVIKSLEPKKLRHLGGKKIDGGYIETNQVSEEYCGI VKRLLEM c2c2- 12 Leptotrichia MKVTKVDGISHKKYIEEGKLVKSTSEENRTSERLSELLSIRLD 13 wadei IYIKNPDNASEEENRIRRENLKKFFSNKVLHLKDSVLYLKNR F0279 KEKNAVQDKNYSEEDISEYDLKNKNSFSVLKKILLNEDVNSE (SEQ ID ELEIFRKDVEAKLNKINSLKYSFEENKANYQKINENNVEKVG No. 201) GKSKRNIIYDYYRESAKRNDYINNVQEAFDKLYKKEDIEKLF FLIENSKKHEKYKIREYYHKIIGRKNDKENFAKIIYEEIQNVN NIKELIEKIPDMSELKKSQVFYKYYLDKEELNDKNIKYAFCH FVEIEMSQLLKNYVYKRLSNISNDKIKRIFEYQNLKKLIENKL LNKLDTYVRNCGKYNYYLQVGEIATSDFIARNRQNEAFLRNI IGVSSVAYFSLRNILETENENDITGRMRGKTVKNNKGEEKYV SGEVDKIYNENKQNEVKENLKMFYSYDFNMDNKNEIEDFFA NIDEAISSIRHGIVHFNLELEGKDIFAFKNIAPSEISKKMFQNEI NEKKLKLKIFKQLNSANVFNYYEKDVIIKYLKNTKFNFVNK NIPFVPSFTKLYNKIEDLRNTLKFFWSVPKDKEEKDAQIYLLK NIYYGEFLNKFVKNSKVFFKITNEVIKINKQRNQKTGHYKYQ KFENIEKTVPVEYLAIIQSREMINNQDKEEKNTYIDFIQQIFLK GFIDYLNKNNLKYIESNNNNDNNDIFSKIKIKKDNKEKYDKIL KNYEKHNRNKEIPHEINEFVREIKLGKILKYTENLNMFYLILK LLNHKELTNLKGSLEKYQSANKEETFSDELELINLLNLDNNR VTEDFELEANEIGKFLDFNENKIKDRKELKKFDTNKIYFDGE NIIKHRAFYNIKKYGMLNLLEKIADKAKYKISLKELKEYSNK KNEIEKNYTMQQNLHRKYARPKKDEKFNDEDYKEYEKAIG NIQKYTHLKNKVEFNELNLLQGLLLKILHRLVGYTSIWERDL RFRLKGEFPENHYIEEIFNFDNSKNVKYKSGQIVEKYINFYKE LYKDNVEKRSIYSDKKVKKLKQEKKDLYIRNYIAHFNYIPHA EISLLEVLENLRKLLSYDRKLKNAIMKSIVDILKEYGFVATFK IGADKKIEIQTLESEKIVHLKNLKKKKLMTDRNSEELCELVK VMFEYKALE c2c2- 15 Rhodobacter 
MQIGKVQGRTISEFGDPAGGLKRKISTDGKNRKELPAHLSSD 14 capsulatus PKALIGQWISGIDKIYRKPDSRKSDGKAIHSPTPSKMQFDARD SB 1003 DLGEAFWKLVSEAGLAQDSDYDQFKRRLHPYGDKFQPADS (SEQ ID GAKLKFEADPPEPQAFHGRWYGAMSKRGNDAKELAAALYE No. 202) HLHVDEKRIDGQPKRNPKTDKFAPGLVVARALGIESSVLPRG MARLARNWGEEEIQTYFVVDVAASVKEVAKAAVSAAQAFD PPRQVSGRSLSPKVGFALAEHLERVTGSKRCSFDPAAGPSVL ALHDEVKKTYKRLCARGKNAARAFPADKTELLALMRHTHE NRVRNQMVRMGRVSEYRGQQAGDLAQSHYWTSAGQTEIK ESEIFVRLWVGAFALAGRSMKAWIDPMGKIVNTEKNDRDLT AAVNIRQVISNKEMVAEAMARRGIYFGETPELDRLGAEGNE GFVFALLRYLRGCRNQTFHLGARAGFLKEIRKELEKTRWGK AKEAEHVVLTDKTVAAIRAIIDNDAKALGARLLADLSGAFV AHYASKEHFSTLYSEIVKAVKDAPEVSSGLPRLKLLLKRADG VRGYVHGLRDTRKHAFATKLPPPPAPRELDDPATKARYIALL RLYDGPFRAYASGITGTALAGPAARAKEAATALAQSVNVTK AYSDVMEGRTSRLRPPNDGETLREYLSALTGETATEFRVQIG YESDSENARKQAEFIENYRRDMLAFMFEDYIRAKGFDWILKI EPGATAMTRAPVLPEPIDTRGQYEHWQAALYLVMHFVPASD VSNLLHQLRKWEALQGKYELVQDGDATDQADARREALDL VKRFRDVLVLFLKTGEARFEGRAAPFDLKPFRALFANPATFD RLFMATPTTARPAEDDPEGDGASEPELRVARTLRGLRQIARY NHMAVLSDLFAKHKVRDEEVARLAEIEDETQEKSQIVAAQE LRTDLHDKVMKCHPKTISPEERQSYAAAIKTIEEHRFLVGRV YLGDHLRLHRLMMDVIGRLIDYAGAYERDTGTFLINASKQL GAGADWAVTIAGAANTDARTQTRKDLAHFNVLDRADGTPD LTALVNRAREMMAYDRKRKNAVPRSILDMLARLGLTLKWQ MKDHLLQDATITQAAIKHLDKVRLTVGGPAAVTEARFSQDY LQMVAAVFNGSVQNPKPRRRDDGDAWHKPPKPATAQSQPD QKPPNKAPSAGSRLPPPQVGEVYEGVVVKVIDTGSLGFLAVE GVAGNIGLHISRLRRIREDAIIVGRRYRFRVEIYVPPKSNTSKL NAADLVRID c2c2- 16 Rhodobacter MQIGKVQGRTISEFGDPAGGLKRKISTDGKNRKELPAHLSSD 15 capsulatus PKALIGQWISGIDKIYRKPDSRKSDGKAIHSPTPSKMQFDARD R121 (SEQ DLGEAFWKLVSEAGLAQDSDYDQFKRRLHPYGDKFQPADS ID No. 
203) GAKLKFEADPPEPQAFHGRWYGAMSKRGNDAKELAAALYE HLHVDEKRIDGQPKRNPKTDKFAPGLVVARALGIESSVLPRG MARLARNWGEEEIQTYFVVDVAASVKEVAKAAVSAAQAFD PPRQVSGRSLSPKVGFALAEHLERVTGSKRCSFDPAAGPSVL ALHDEVKKTYKRLCARGKNAARAFPADKTELLALMRHTHE NRVRNQMVRMGRVSEYRGQQAGDLAQSHYWTSAGQTEIK ESEIFVRLWVGAFALAGRSMKAWIDPMGKIVNTEKNDRDLT AAVNIRQVISNKEMVAEAMARRGIYFGETPELDRLGAEGNE GFVFALLRYLRGCRNQTFHLGARAGFLKEIRKELEKTRWGK AKEAEHVVLTDKTVAAIRAIIDNDAKALGARLLADLSGAFV AHYASKEHFSTLYSEIVKAVKDAPEVSSGLPRLKLLLKRADG VRGYVHGLRDTRKHAFATKLPPPPAPRELDDPATKARYIALL RLYDGPFRAYASGITGTALAGPAARAKEAATALAQSVNVTK AYSDVMEGRSSRLRPPNDGETLREYLSALTGETATEFRVQIG YESDSENARKQAEFIENYRRDMLAFMFEDYIRAKGFDWILKI EPGATAMTRAPVLPEPIDTRGQYEHWQAALYLVMHFVPASD VSNLLHQLRKWEALQGKYELVQDGDATDQADARREALDL VKRFRDVLVLFLKTGEARFEGRAAPFDLKPFRALFANPATFD RLFMATPTTARPAEDDPEGDGASEPELRVARTLRGLRQIARY NHMAVLSDLFAKHKVRDEEVARLAEIEDETQEKSQIVAAQE LRTDLHDKVMKCHPKTISPEERQSYAAAIKTIEEHRFLVGRV YLGDHLRLHRLMMDVIGRLIDYAGAYERDTGTFLINASKQL GAGADWAVTIAGAANTDARTQTRKDLAHFNVLDRADGTPD LTALVNRAREMMAYDRKRKNAVPRSILDMLARLGLTLKWQ MKDHLLQDATITQAAIKHLDKVRLTVGGPAAVTEARFSQDY LQMVAAVFNGSVQNPKPRRRDDGDAWHKPPKPATAQSQPD QKPPNKAPSAGSRLPPPQVGEVYEGVVVKVIDTGSLGFLAVE GVAGNIGLHISRLRRIREDAIIVGRRYRFRVEIYVPPKSNTSKL NAADLVRID c2c2- 17 Rhodobacter MQIGKVQGRTISEFGDPAGGLKRKISTDGKNRKELPAHLSSD 16 capsulatus PKALIGQWISGIDKIYRKPDSRKSDGKAIHSPTPSKMQFDARD DE442 DLGEAFWKLVSEAGLAQDSDYDQFKRRLHPYGDKFQPADS (SEQ ID GAKLKFEADPPEPQAFHGRWYGAMSKRGNDAKELAAALYE No. 
204) HLHVDEKRIDGQPKRNPKTDKFAPGLVVARALGIESSVLPRG MARLARNWGEEEIQTYFVVDVAASVKEVAKAAVSAAQAFD PPRQVSGRSLSPKVGFALAEHLERVTGSKRCSFDPAAGPSVL ALHDEVKKTYKRLCARGKNAARAFPADKTELLALMRHTHE NRVRNQMVRMGRVSEYRGQQAGDLAQSHYWTSAGQTEIK ESEIFVRLWVGAFALAGRSMKAWIDPMGKIVNTEKNDRDLT AAVNIRQVISNKEMVAEAMARRGIYFGETPELDRLGAEGNE GFVFALLRYLRGCRNQTFHLGARAGFLKEIRKELEKTRWGK AKEAEHVVLTDKTVAAIRAIIDNDAKALGARLLADLSGAFV AHYASKEHFSTLYSEIVKAVKDAPEVSSGLPRLKLLLKRADG VRGYVHGLRDTRKHAFATKLPPPPAPRELDDPATKARYIALL RLYDGPFRAYASGITGTALAGPAARAKEAATALAQSVNVTK AYSDVMEGRSSRLRPPNDGETLREYLSALTGETATEFRVQIG YESDSENARKQAEFIENYRRDMLAFMFEDYIRAKGFDWILKI EPGATAMTRAPVLPEPIDTRGQYEHWQAALYLVMHFVPASD VSNLLHQLRKWEALQGKYELVQDGDATDQADARREALDL VKRFRDVLVLFLKTGEARFEGRAAPFDLKPFRALFANPATFD RLFMATPTTARPAEDDPEGDGASEPELRVARTLRGLRQIARY NHMAVLSDLFAKHKVRDEEVARLAEIEDETQEKSQIVAAQE LRTDLHDKVMKCHPKTISPEERQSYAAAIKTIEEHRFLVGRV YLGDHLRLHRLMMDVIGRLIDYAGAYERDTGTFLINASKQL GAGADWAVTIAGAANTDARTQTRKDLAHFNVLDRADGTPD LTALVNRAREMMAYDRKRKNAVPRSILDMLARLGLTLKWQ MKDHLLQDATITQAAIKHLDKVRLTVGGPAAVTEARFSQDY LQMVAAVFNGSVQNPKPRRRDDGDAWHKPPKPATAQSQPD QKPPNKAPSAGSRLPPPQVGEVYEGVVVKVIDTGSLGFLAVE GVAGNIGLHISRLRRIREDAIIVGRRYRFRVEIYVPPKSNTSKL NAADLVRID c2c2-2 (SEQ ID MGNLFGHKRWYEVRDKKDFKIKRKVKVKRNYDGNKYILNI No. 
205) NENNNKEKIDNNKFIRKYINYKKNDNILKEFTRKFHAGNILF KLKGKEGIIRIENNDDFLETEEVVLYIEAYGKSEKLKALGITK KKIIDEAIRQGITKDDKKIEIKRQENEEEIEIDIRDEYTNKTLND CSIILRIIENDELETKKSIYEIFKNINMSLYKIIEKIIENETEKVFE NRYYEEHLREKLLKDDKIDVILTNFMEIREKIKSNLEILGFVK FYLNVGGDKKKSKNKKMLVEKILNINVDLTVEDIADFVIKEL EFWNITKRIEKVKKVNNEFLEKRRNRTYIKSYVLLDKHEKFK IERENKKDKIVKFFVENIKNNSIKEKIEKILAEFKIDELIKKLEK ELKKGNCDTEIFGIFKKHYKVNFDSKKFSKKSDEEKELYKIIY RYLKGRIEKILVNEQKVRLKKMEKIEIEKILNESILSEKILKRV KQYTLEHIMYLGKLRHNDIDMTTVNTDDFSRLHAKEELDLE LITFFASTNMELNKIFSRENINNDENIDFFGGDREKNYVLDKK ILNSKIKIIRDLDFIDNKNNITNNFIRKFTKIGTNERNRILHAISK ERDLQGTQDDYNKVINIIQNLKISDEEVSKALNLDVVFKDKK NIITKINDIKISEENNNDIKYLPSFSKVLPEILNLYRNNPKNEPF DTIETEKIVLNALIYVNKELYKKLILEDDLEENESKNIFLQELK KTLGNIDEIDENIIENYYKNAQISASKGNNKAIKKYQKKVIEC YIGYLRKNYEELFDFSDFKMNIQEIKKQIKDINDNKTYERITV KTSDKTIVINDDFEYIISIFALLNSNAVINKIRNRFFATSVWLN TSEYQNIIDILDEIMQLNTLRNECITENWNLNLEEFIQKMKEIE KDFDDFKIQTKKEIFNNYYEDIKNNILTEFKDDINGCDVLEKK LEKIVIFDDETKFEIDKKSNILQDEQRKLSNINKKDLKKKVDQ YIKDKDQEIKSKILCRIIFNSDFLKKYKKEIDNLIEDMESENEN KFQEIYYPKERKNELYIYKKNLFLNIGNPNFDKIYGLISNDIK MADAKFLFNIDGKNIRKNKISEIDAILKNLNDKLNGYSKEYK EKYIKKLKENDDFFAKNIQNKNYKSFEKDYNRVSEYKKIRD LVEFNYLNKIESYLIDINWKLAIQMARFERDMHYIVNGLREL GIIKLSGYNTGISRAYPKRNGSDGFYTTTAYYKFFDEESYKKF EKICYGFGIDLSENSEINKPENESIRNYISHFYIVRNPFADYSIA EQIDRVSNLLSYSTRYNNSTYASVFEVFKKDVNLDYDELKK KFKLIGNNDILERLMKPKKVSVLELESYNSDYIKNLIIELLTKI ENTNDTL c2c2-3 L wadei MKVTKVDGISHKKYIEEGKLVKSTSEENRTSERLSELLSIRLD (Lw2) IYIKNPDNASEEENRIRRENLKKFFSNKVLHLKDSVLYLKNR (SEQ ID KEKNAVQDKNYSEEDISEYDLKNKNSFSVLKKILLNEDVNSE No. 
206) ELEIFRKDVEAKLNKINSLKYSFEENKANYQKINENNVEKVG GKSKRNIIYDYYRESAKRNDYINNVQEAFDKLYKKEDIEKLF FLIENSKKHEKYKIREYYHKIIGRKNDKENFAKIIYEEIQNVN NIKELIEKIPDMSELKKSQVFYKYYLDKEELNDKNIKYAFCH FVEIEMSQLLKNYVYKRLSNISNDKIKRIFEYQNLKKLIENKL LNKLDTYVRNCGKYNYYLQVGEIATSDFIARNRQNEAFLRNI IGVSSVAYFSLRNILETENENDITGRMRGKTVKNNKGEEKYV SGEVDKIYNENKQNEVKENLKMFYSYDFNMDNKNEIEDFFA NIDEAISSIRHGIVHFNLELEGKDIFAFKNIAPSEISKKMFQNEI NEKKLKLKIFKQLNSANVFNYYEKDVIIKYLKNTKFNFVNK NIPFVPSFTKLYNKIEDLRNTLKFFWSVPKDKEEKDAQIYLLK NIYYGEFLNKFVKNSKVFFKITNEVIKINKQRNQKTGHYKYQ KFENIEKTVPVEYLAIIQSREMINNQDKEEKNTYIDFIQQIFLK GFIDYLNKNNLKYIESNNNNDNNDIFSKIKIKKDNKEKYDKIL KNYEKHNRNKEIPHEINEFVREIKLGKILKYTENLNMFYLILK LLNHKELTNLKGSLEKYQSANKEETFSDELELINLLNLDNNR VTEDFELEANEIGKFLDFNENKIKDRKELKKFDTNKIYFDGE NIIKHRAFYNIKKYGMLNLLEKIADKAKYKISLKELKEYSNK KNEIEKNYTMQQNLHRKYARPKKDEKFNDEDYKEYEKAIG NIQKYTHLKNKVEFNELNLLQGLLLKILHRLVGYTSIWERDL RFRLKGEFPENHYIEEIFNFDNSKNVKYKSGQIVEKYINFYKE LYKDNVEKRSIYSDKKVKKLKQEKKDLYIRNYIAHFNYIPHA EISLLEVLENLRKLLSYDRKLKNAIMKSIVDILKEYGFVATFK IGADKKIEIQTLESEKIVHLKNLKKKKLMTDRNSEELCELVK VMFEYKALEKRPAATKKAGQAKKKKGSYPYDVPDYAYPY DVPDYAYPYDVPDYA* c2c2-4 Listeria MWISIKTLIHHLGVLFFCDYMYNRREKKIIEVKTMRITKVEV seeligeri DRKKVLISRDKNGGKLVYENEMQDNTEQIMHHKKSSFYKS (SEQ ID VVNKTICRPEQKQMKKLVHGLLQENSQEKIKVSDVTKLNIS No. 
207) NFLNHRFKKSLYYFPENSPDKSEEYRIEINLSQLLEDSLKKQQ GTFICWESFSKDMELYINWAENYISSKTKLIKKSIRNNRIQST ESRSGQLMDRYMKDILNKNKPFDIQSVSEKYQLEKLTSALK ATFKEAKKNDKEINYKLKSTLQNHERQIIEELKENSELNQFNI EIRKHLETYFPIKKTNRKVGDIRNLEIGEIQKIVNHRLKNKIVQ RILQEGKLASYEIESTVNSNSLQKIKIEEAFALKFINACLFASN NLRNMVYPVCKKDILMIGEFKNSFKEIKHKKFIRQWSQFFSQ EITVDDIELASWGLRGAIAPIRNEIIHLKKHSWKKFFNNPTFK VKKSKIINGKTKDVTSEFLYKETLFKDYFYSELDSVPELIINK MESSKILDYYSSDQLNQVFTIPNFELSLLTSAVPFAPSFKRVY LKGFDYQNQDEAQPDYNLKLNIYNEKAFNSEAFQAQYSLFK MVYYQVFLPQFTTNNDLFKSSVDFILTLNKERKGYAKAFQDI RKMNKDEKPSEYMSYIQSQLMLYQKKQEEKEKINHFEKFIN QVFIKGFNSFIEKNRLTYICHPTKNTVPENDNIEIPFHTDMDDS NIAFWLMCKLLDAKQLSELRNEMIKFSCSLQSTEEISTFTKAR EVIGLALLNGEKGCNDWKELFDDKEAWKKNMSLYVSEELL QSLPYTQEDGQTPVINRSIDLVKKYGTETILEKLFSSSDDYKV SAKDIAKLHEYDVTEKIAQQESLHKQWIEKPGLARDSAWTK KYQNVINDISNYQWAKTKVELTQVRHLHQLTIDLLSRLAGY MSIADRDFQFSSNYILERENSEYRVTSWILLSENKNKNKYND YELYNLKNASIKVSSKNDPQLKVDLKQLRLTLEYLELFDNRL KEKRNNISHFNYLNGQLGNSILELFDDARDVLSYDRKLKNA VSKSLKEILSSHGMEVTFKPLYQTNHHLKIDKLQPKKIHHLG EKSTVSSNQVSNEYCQLVRTLLTMK C2-17 Leptotrichia MKVTKVGGISHKKYTSEGRLVKSESEENRTDERLSALLNMR buccalis LDMYIKNPSSTETKENQKRIGKLKKFFSNKMVYLKDNTLSL C-1013-b KNGKKENIDREYSETDILESDVRDKKNFAVLKKIYLNENVNS (SEQ ID EELEVFRNDIKKKLNKINSLKYSFEKNKANYQKINENNIEKV No. 
208) EGKSKRNIIYDYYRESAKRDAYVSNVKEAFDKLYKEEDIAK LVLEIENLTKLEKYKIREFYHEIIGRKNDKENFAKIIYEEIQNV NNMKELIEKVPDMSELKKSQVFYKYYLDKEELNDKNIKYAF CHFVEIEMSQLLKNYVYKRLSNISNDKIKRIFEYQNLKKLIEN KLLNKLDTYVRNCGKYNYYLQDGEIATSDFIARNRQNEAFL RNIIGVSSVAYFSLRNILETENENDITGRMRGKTVKNNKGEE KYVSGEVDKIYNENKKNEVKENLKMFYSYDFNMDNKNEIE DFFANIDEAISSIRHGIVHFNLELEGKDIFAFKNIAPSEISKKMF QNEINEKKLKLKIFRQLNSANVFRYLEKYKILNYLKRTRFEF VNKNIPFVPSFTKLYSRIDDLKNSLGIYWKTPKTNDDNKTKEI IDAQIYLLKNIYYGEFLNYFMSNNGNFFEISKEIIELNKNDKR NLKTGFYKLQKFEDIQEKIPKEYLANIQSLYMINAGNQDEEE KDTYIDFIQKIFLKGFMTYLANNGRLSLIYIGSDEETNTSLAE KKQEFDKFLKKYEQNNNIKIPYEINEFLREIKLGNILKYTERL NMFYLILKLLNHKELTNLKGSLEKYQSANKEEAFSDQLELIN LLNLDNNRVTEDFELEADEIGKFLDFNGNKVKDNKELKKFD TNKIYFDGENIIKHRAFYNIKKYGMLNLLEKIADKAGYKISIE ELKKYSNKKNEIEKNHKMQENLHRKYARPRKDEKFTDEDY ESYKQAIENIEEYTHLKNKVEFNELNLLQGLLLRILHRLVGY TSIWERDLRFRLKGEFPENQYIEEIFNFENKKNVKYKGGQIVE KYIKFYKELHQNDEVKINKYSSANIKVLKQEKKDLYIRNYIA HFNYIPHAEISLLEVLENLRKLLSYDRKLKNAVMKSVVDILK EYGFVATFKIGADKKIGIQTLESEKIVHLKNLKKKKLMTDRN SEELCKLVKIMFEYKMEEKKSEN C2-18 Herbinix MKLTRRRISGNSVDQKITAAFYRDMSQGLLYYDSEDNDCTD hemicellulosilytica KVIESMDFERSWRGRILKNGEDDKNPFYMFVKGLVGSNDKI (SEQ ID VCEPIDVDSDPDNLDILINKNLTGFGRNLKAPDSNDTLENLIR No. 
209) KIQAGIPEEEVLPELKKIKEMIQKDIVNRKEQLLKSIKNNRIPF SLEGSKLVPSTKKMKWLFKLIDVPNKTFNEKMLEKYWEIYD YDKLKANITNRLDKTDKKARSISRAVSEELREYHKNLRTNY NRFVSGDRPAAGLDNGGSAKYNPDKEEFLLFLKEVEQYFKK YFPVKSKHSNKSKDKSLVDKYKNYCSYKVVKKEVNRSIINQ LVAGLIQQGKLLYYFYYNDTWQEDFLNSYGLSYIQVEEAFK KSVMTSLSWGINRLTSFFIDDSNTVKFDDITTKKAKEAIESNY FNKLRTCSRMQDHFKEKLAFFYPVYVKDKKDRPDDDIENLI VLVKNAIESVSYLRNRTFHFKESSLLELLKELDDKNSGQNKI DYSVAAEFIKRDIENLYDVFREQIRSLGIAEYYKADMISDCFK TCGLEFALYSPKNSLMPAFKNVYKRGANLNKAYIRDKGPKE TGDQGQNSYKALEEYRELTWYIEVKNNDQSYNAYKNLLQLI YYHAFLPEVRENEALITDFINRTKEWNRKETEERLNTKNNKK HKNFDENDDITVNTYRYESIPDYQGESLDDYLKVLQRKQMA RAKEVNEKEEGNNNYIQFIRDVVVWAFGAYLENKLKNYKN ELQPPLSKENIGLNDTLKELFPEEKVKSPFNIKCRFSISTFIDNK GKSTDNTSAEAVKTDGKEDEKDKKNIKRKDLLCFYLFLRLL DENEICKLQHQFIKYRCSLKERRFPGNRTKLEKETELLAELEE LMELVRFTMPSIPEISAKAESGYDTMIKKYFKDFIEKKVFKNP KTSNLYYHSDSKTPVTRKYMALLMRSAPLHLYKDIFKGYYL ITKKECLEYIKLSNIIKDYQNSLNELHEQLERIKLKSEKQNGK DSLYLDKKDFYKVKEYVENLEQVARYKHLQHKINFESLYRI FRIHVDIAARMVGYTQDWERDMHFLFKALVYNGVLEERRF EAIFNNNDDNNDGRIVKKIQNNLNNKNRELVSMLCWNKKL NKNEFGAIIWKRNPIAHLNHFTQTEQNSKSSLESLINSLRILLA YDRKRQNAVTKTINDLLLNDYHIRIKWEGRVDEGQIYFNIKE KEDIENEPIIHLKHLHKKDCYIYKNSYMFDKQKEWICNGIKE EVYDKSILKCIGNLFKFDYEDKNKSSANPKHT C2-19 [Eubacterium] MLRRDKEVKKLYNVFNQIQVGTKPKKWNNDEKLSPEENER rectale RAQQKNIKMKNYKWREACSKYVESSQRIINDVIFYSYRKAK (SEQ ID NKLRYMRKNEDILKKMQEAEKLSKFSGGKLEDFVAYTLRKS No. 
210) LVVSKYDTQEFDSLAAMVVFLECIGKNNISDHEREIVCKLLE LIRKDFSKLDPNVKGSQGANIVRSVRNQNMIVQPQGDRFLFP QVYAKENETVTNKNVEKEGLNEFLLNYANLDDEKRAESLR KLRRILDVYFSAPNHYEKDMDITLSDNIEKEKFNVWEKHEC GKKETGLFVDIPDVLMEAEAENIKLDAVVEKRERKVLNDRV RKQNIICYRYTRAVVEKYNSNEPLFFENNAINQYWIHHIENA VERILKNCKAGKLFKLRKGYLAEKVWKDAINLISIKYIALGK AVYNFALDDIWKDKKNKELGIVDERIRNGITSFDYEMIKAHE NLQRELAVDIAFSVNNLARAVCDMSNLGNKESDFLLWKRN DIADKLKNKDDMASVSAVLQFFGGKSSWDINIFKDAYKGKK KYNYEVRFIDDLRKAIYCARNENFHFKTALVNDEKWNTELF GKIFERETEFCLNVEKDRFYSNNLYMFYQVSELRNMLDHLY SRSVSRAAQVPSYNSVIVRTAFPEYITNVLGYQKPSYDADTL GKWYSACYYLLKEIYYNSFLQSDRALQLFEKSVKTLSWDDK KQQRAVDNFKDHFSDIKSACTSLAQVCQIYMTEYNQQNNQI KKVRSSNDSIFDQPVYQHYKVLLKKAIANAFADYLKNNKDL FGFIGKPFKANEIREIDKEQFLPDWTSRKYEALCIEVSGSQEL QKWYIVGKFLNARSLNLMVGSMRSYIQYVTDIKRRAASIGN ELHVSVHDVEKVEKWVQVIEVCSLLASRTSNQFEDYFNDKD DYARYLKSYVDFSNVDMPSEYSALVDFSNEEQSDLYVDPKN PKVNRNIVHSKLFAADHILRDIVEPVSKDNIEEFYSQKAEIAY CKIKGKEITAEEQKAVLKYQKLKNRVELRDIVEYGEIINELLG QLINWSFMRERDLLYFQLGFHYDCLRNDSKKPEGYKNIKVD ENSIKDAILYQIIGMYVNGVTVYAPEKDGDKLKEQCVKGGV GVKVSAFHRYSKYLGLNEKTLYNAGLEIFEVVAEHEDIINLR NGIDHFKYYLGDYRSMLSIYSEVFDRFFTYDIKYQKNVLNLL QNILLRHNVIVEPILESGFKTIGEQTKPGAKLSIRSIKSDTFQY KVKGGTLITDAKDERYLETIRKILYYAENEEDNLKKSVVVTN ADKYEKNKESDDQNKQKEKKNKDNKGKKNEETKSDAEKN NNERLSYNPFANLNFKLSN C2-20 Eubacteriaceae MKISKESHKRTAVAVMEDRVGGVVYVPGGSGIDLSNNLKK bacterium RSMDTKSLYNVFNQIQAGTAPSEYEWKDYLSEAENKKREAQ CHKCI004 KMIQKANYELRRECEDYAKKANLAVSRIIFSKKPKKIFSDDDI (SEQ ID ISHMKKQRLSKFKGRMEDFVLIALRKSLVVSTYNQEVFDSR No. 
211) KAATVFLKNIGKKNISADDERQIKQLMALIREDYDKWNPDK DSSDKKESSGTKVIRSIEHQNMVIQPEKNKLSLSKISNVGKKT KTKQKEKAGLDAFLKEYAQIDENSRMEYLKKLRRLLDTYFA APSSYIKGAAVSLPENINFSSELNVWERHEAAKKVNINFVEIP ESLLNAEQNNNKINKVEQEHSLEQLRTDIRRRNITCYHFANA LAADERYHTLFFENMAMNQFWIHHMENAVERILKKCNVGT LFKLRIGYLSEKVWKDMLNLLSIKYIALGKAVYHFALDDIW KADIWKDASDKNSGKINDLTLKGISSFDYEMVKAQEDLQRE MAVGVAFSTNNLARVTCKMDDLSDAESDFLLWNKEAIRRH VKYTEKGEILSAILQFFGGRSLWDESLFEKAYSDSNYELKFL DDLKRAIYAARNETFHFKTAAIDGGSWNTRLFGSLFEKEAGL CLNVEKNKFYSNNLVLFYKQEDLRVFLDKLYGKECSRAAQI PSYNTILPRKSFSDFMKQLLGLKEPVYGSAILDQWYSACYYL FKEVYYNLFLQDSSAKALFEKAVKALKGADKKQEKAVESFR KRYWEISKNASLAEICQSYITEYNQQNNKERKVRSANDGMF NEPIYQHYKMLLKEALKMAFASYIKNDKELKFVYKPTEKLF EVSQDNFLPNWNSEKYNTLISEVKNSPDLQKWYIVGKFMNA RMLNLLLGSMRSYLQYVSDIQKRAAGLGENQLHLSAENVG QVKKWIQVLEVCLLLSVRISDKFTDYFKDEEEYASYLKEYV DFEDSAMPSDYSALLAFSNEGKIDLYVDASNPKVNRNIIQAK LYAPDMVLKKVVKKISQDECKEFNEKKEQIIVIQFKNKGDEVS WEEQQKILEYQKLKNRVELRDLSEYGELINELLGQLINWSYL RERDLLYFQLGFHYSCLMNESKKPDAYKTIRRGTVSIENAVL YQIIAMYINGFPVYAPEKGELKPQCKTGSAGQKIRAFCQWAS MVEKKKYELYNAGLELFEVVKEHDNIIDLRNKIDHFKYYQG NDSILALYGEIFDRFFTYDMKYRNNVLNHLQNILLRHNVIIKP IISKDKKEVGRGKMKDRAAFLLEEVSSDRFTYKVKEGERKID AKNRLYLETVRDILYFPNRAVNDKGEDVIICSKKAQDLNEK KADRDKNHDKSKDTNQKKEGKNQEEKSENKEPYSDRMTW KPFAGIKLE C2-21 Blautia sp. MKISKVDHVKSGIDQKLSSQRGMLYKQPQKKYEGKQLEEH Marseille- VRNLSRKAKALYQVFPVSGNSKMEKELQIINSFIKNILLRLDS P2398 GKTSEEIVGYINTYSVASQISGDHIQELVDQHLKESLRKYTCV (SEQ ID GDKRIYVPDIIVALLKSKFNSETLQYDNSELKILIDFIREDYLK No. 
212) EKQIKQIVHSIENNSTPLRIAEINGQKRLIPANVDNPKKSYIFE FLKEYAQSDPKGQESLLQHMRYLILLYLYGPDKITDDYCEEI EAWNFGSIVMDNEQLFSEEASMLIQDRIYVNQQIEEGRQSKD TAKVKKNKSKYRMLGDKIEHSINESVVKHYQEACKAVEEK DIPWIKYISDHVMSVYSSKNRVDLDKLSLPYLAKNTWNTWI SFIAMKYVDMGKGVYHFAMSDVDKVGKQDNLIIGQIDPKFS DGISSFDYERIKAEDDLHRSMSGYIAFAVNNFARAICSDEFRK KNRKEDVLTVGLDEIPLYDNVKRKLLQYFGGASNWDDSIIDI IDDKDLVACIKENLYVARNVNFHFAGSEKVQKKQDDILEEIV RKETRDIGKHYRKVFYSNNVAVFYCDEDIIKLMNHLYQREK PYQAQIPSYNKVISKTYLPDLIFMLLKGKNRTKISDPSIMNMF RGTFYFLLKEIYYNDFLQASNLKEMFCEGLKNNVKNKKSEK PYQNFMRRFEELENMGMDFGEICQQIMTDYEQQNKQKKKT ATAVMSEKDKKIRTLDNDTQKYKHFRTLLYIGLREAFIIYLK DEKNKEWYEFLREPVKREQPEEKEFVNKWKLNQYSDCSELI LKDSLAAAWYVVAHFINQAQLNHLIGDIKNYIQFISDIDRRA KSTGNPVSESTEIQIERYRKILRVLEFAKFFCGQITNVLTDYY QDENDFSTHVGHYVKFEKKNMEPAHALQAFSNSLYACGKE KKKAGFYYDGMNPIVNRNITLASMYGNKKLLENAMNPVTE QDIRKYYSLMAELDSVLKNGAVCKSEDEQKNLRHFQNLKN RIELVDVLTLSELVNDLVAQLIGWVYIRERDMMYLQLGLHY IKLYFTDSVAEDSYLRTLDLEEGSIADGAVLYQIASLYSFNLP MYVKPNKSSVYCKKHVNSVATKFDIFEKEYCNGDETVIENG LRLFENINLHKDMVKFRDYLAHFKYFAKLDESILELYSKAYD FFFSYNIKLKKSVSYVLTNVLLSYFINAKLSFSTYKSSGNKTV QHRTTKISVVAQTDYFTYKLRSIVKNKNGVESIENDDRRCEV VNIAARDKEFVDEVCNVINYNSDK C2-22 Leptotrichia MGNLFGHKRWYEVRDKKDFKIKRKVKVKRNYDGNKYILNI sp. oral taxon NENNNKEKIDNNKFIGEFVNYKKNNNVLKEFKRKFHAGNIL 879 str. F0557 FKLKGKEEIIRIENNDDFLETEEVVLYIEVYGKSEKLKALEITK (SEQ ID KKIIDEAIRQGITKDDKKIEIKRQENEEEIEIDIRDEYTNKTLND No. 
213) CSIILRIIENDELETKKSIYEIFKNINMSLYKIIEKIIENETEKVFE NRYYEEHLREKLLKDNKIDVILTNFMEIREKIKSNLEIMGFVK FYLNVSGDKKKSENKKMFVEKILNTNVDLTVEDIVDFIVKEL KFWNITKRIEKVKKFNNEFLENRRNRTYIKSYVLLDKHEKFK IERENKKDKIVKFFVENIKNNSIKEKIEKILAEFKINELIKKLEK ELKKGNCDTEIFGIFKKHYKVNFDSKKFSNKSDEEKELYKIIY RYLKGRIEKILVNEQKVRLKKMEKIEIEKILNESILSEKILKRV KQYTLEHIMYLGKLRHNDIVKMTVNTDDFSRLHAKEELDLE LITFFASTNMELNKIFNGKEKVTDFFGFNLNGQKITLKEKVPS FKLNILKKLNFINNENNIDEKLSHFYSFQKEGYLLRNKILHNS YGNIQETKNLKGEYENVEKLIKELKVSDEEISKSLSLDVIFEG KVDIINKINSLKIGEYKDKKYLPSFSKIVLEITRKFREINKDKL FDIESEKIILNAVKYVNKILYEKITSNEENEFLKTLPDKLVKKS NNKKENKNLLSIEEYYKNAQVSSSKGDKKAIKKYQNKVTNA YLEYLENTFTEIIDFSKFNLNYDEIKTKIEERKDNKSKIIIDSIST NINITNDIEYIISIFALLNSNTYINKIRNRFFATSVWLEKQNGTK EYDYENIISILDEVLLINLLRENNITDILDLKNAIIDAKIVENDE TYIKNYIFESNEEKLKKRLFCEELVDKEDIRKIFEDENFKFKSF IKKNEIGNFKINFGILSNLECNSEVEAKKIIGKNSKKLESFIQNI IDEYKSNIRTLFSSEFLEKYKEEIDNLVEDTESENKNKFEKIYY PKEHKNELYIYKKNLFLNIGNPNFDKIYGLISKDIKNVDTKIL FDDDIKKNKISEIDAILKNLNDKLNGYSNDYKAKYVNKLKE NDDFFAKNIQNENYSSFGEFEKDYNKVSEYKKIRDLVEFNYL NKIESYLIDINWKLAIQMARFERDMHYIVNGLRELGIIKLSGY NTGISRAYPKRNGSDGFYTTTAYYKFFDEESYKKFEKICYGF GIDLSENSEINKPENESIRNYISHFYIVRNPFADYSIAEQIDRVS NLLSYSTRYNNSTYASVFEVFKKDVNLDYDELKKKFRLIGN NDILERLMKPKKVSVLELESYNSDYIKNLIIELLTKIENTNDTL C2-23 Lachnospiraceae MKISKVDHTRMAVAKGNQHRRDEISGILYKDPTKTGSIDFDE bacterium RFKKLNCSAKILYHVFNGIAEGSNKYKNIVDKVNNNLDRVL NK4A144 FTGKSYDRKSIIDIDTVLRNVEKINAFDRISTEEREQIIDDLLEI (SEQ ID QLRKGLRKGKAGLREVLLIGAGVIVRTDKKQEIADFLEILDE No. 
214) DFNKTNQAKNIKLSIENQGLVVSPVSRGEERIFDVSGAQKGK SSKKAQEKEALSAFLLDYADLDKNVRFEYLRKIRRLINLYFY VKNDDVMSLTEIPAEVNLEKDFDIWRDHEQRKEENGDFVGC PDILLADRDVKKSNSKQVKIAERQLRESIREKNIKRYRFSIKTI EKDDGTYFFANKQISVFWIHRIENAVERILGSINDKKLYRLRL GYLGEKVWKDILNFLSIKYIAVGKAVFNFAMDDLQEKDRDI EPGKISENAVNGLTSFDYEQIKADEMLQREVAVNVAFAANN LARVTVDIPQNGEKEDILLWNKSDIKKYKKNSKKGILKSILQ FFGGASTWNMKMFEIAYHDQPGDYEENYLYDIIQIIYSLRNK SFHFKTYDHGDKNWNRELIGKMIEHDAERVISVEREKFHSN NLPMFYKDADLKKILDLLYSDYAGRASQVPAFNTVLVRKNF PEFLRKDMGYKVHFNNPEVENQWHSAVYYLYKEIYYNLFL RDKEVKNLFYTSLKNIRSEVSDKKQKLASDDFASRCEEIEDR SLPEICQIIMTEYNAQNFGNRKVKSQRVIEKNKDIFRHYKML LIKTLAGAFSLYLKQERFAFIGKATPIPYETTDVKNFLPEWKS GMYASFVEEIKNNLDLQEWYIVGRFLNGRMLNQLAGSLRSY IQYAEDIERRAAENRNKLFSKPDEKIEACKKAVRVLDLCIKIS TRISAEFTDYFDSEDDYADYLEKYLKYQDDAIKELSGSSYAA LDHFCNKDDLKFDIYVNAGQKPILQRNIVMAKLFGPDNILSE VMEKVTESAIREYYDYLKKVSGYRVRGKCSTEKEQEDLLKF QRLKNAVEFRDVTEYAEVINELLGQLISWSYLRERDLLYFQL GFHYMCLKNKSFKPAEYVDIRRNNGTIIHNAILYQIVSMYIN GLDFYSCDKEGKTLKPIETGKGVGSKIGQFIKYSQYLYNDPS YKLEIYNAGLEVFENIDEHDNITDLRKYVDHFKYYAYGNKM SLLDLYSEFFDRFFTYDMKYQKNVVNVLENILLRHFVIFYPK FGSGKKDVGIRDCKKERAQIEISEQSLTSEDFMFKLDDKAGE EAKKFPARDERYLQTIAKLLYYPNEIEDMNRFMKKGETINK KVQFNRKKKITRKQKNNSSNEVLSSTMGYLFKNIKL C2-24 Chloroflexus MTDQVRREEVAAGELADTPLAAAQTPAADAAVAATPAPAE aggregans AVAPTPEQAVDQPATTGESEAPVTTAQAAAHEAEPAEATGA (SEQ ID SFTPVSEQQPQKPRRLKDLQPGMELEGKVTSIALYGIFVDVG No. 215) VGRDGLVHISEMSDRRIDTPSELVQIGDTVKVWVKSVDLDA RRISLTMLNPSRGEKPRRSRQSQPAQPQPRRQEVDREKLASL KVGEIVEGVITGFAPFGAFADIGVGKDGLIHISELSEGRVEKP EDAVKVGERYQFKVLEIDGEGTRISLSLRRAQRTQRMQQLEP GQIIEGTVSGIATFGAFVDIGVGRDGLVHISALAPHRVAKVED VVKVGDKVKVKVLGVDPQSKRISLTMRLEEEQPATTAGDEA AEPAEEVTPTRRGNLERFAAAAQTARERSERGERSERGERRE RRERRPAQSSPDTYIVGEDDDESFEGNATIEDLLTKFGGSSSR RDRDRRRRHEDDDDEEMERPSNRRQREAIRRTLQQIGYDE C2-25 Demequina MDLTWHALLILFIVALLAGFLDTLAGGGGLLTVPALLLTGIP aurantiaca PLQALGTNKLQSSFGTGMATYQVIRKKRVHWRDVRWPMV (SEQ ID WAFLGSAAGAVAVQFIDTDALLIIIPVVLALVAAYFLFVPKS No. 
216) HLPPPEPRMSDPAYEATLVPIIGAYDGAFGPGTGSLYALSGV ALRAKTLVQSTAIAKTLNFATNFAALLVFAFAGHMLWTVGA VMIAGQLIGAYAGSHMLFRVNPLVLRVLIVVMSLGMLIRVL LD C2-26 Thalassospira MRIIKPYGRSHVEGVATQEPRRKLRLNSSPDISRDIPGFAQSH sp. DALIIAQWISAIDKIATKPKPDKKPTQAQINLRTTLGDAAWQ TSL5-1 HVMAENLLPAATDPAIREKLHLIWQSKIAPWGTARPQAEKD (SEQ ID GKPTPKGGWYERFCGVLSPEAITQNVARQIAKDIYDHLHVA No. 217) AKRKGREPAKQGESSNKPGKFKPDRKRGLIEERAESIAKNAL RPGSHAPCPWGPDDQATYEQAGDVAGQIYAAARDCLEEKK RRSGNRNTSSVQYLPRDLAAKILYAQYGRVFGPDTTIKAALD EQPSLFALHKAIKDCYHRLINDARKRDILRILPRNMAALFRL VRAQYDNRDINALIRLGKVIHYHASEQGKSEHHGIRDYWPS QQDIQNSRFWGSDGQADIKRHEAFSRIWRHIIALASRTLHDW ADPHSQKFSGENDDILLLAKDAIEDDVFKAGHYERKCDVLF GAQASLFCGAEDFEKAILKQAITGTGNLRNATFHFKGKVRFE KELQELTKDVPVEVQSAIAALWQKDAEGRTRQIAETLQAVL AGHFLTEEQNRHIFAALTAAMAQPGDVPLPRLRRVLARHDSI CQRGRILPLSPCPDRAKLEESPALTCQYTVLKMLYDGPFRAW LAQQNSTILNHYIDSTIARTDKAARDMNGRKLAQAEKDLITS RAADLPRLSVDEKMGDFLARLTAATATEMRVQRGYQSDGE NAQKQAAFIGQFECDVIGRAFADFLNQSGFDFVLKLKADTP QPDAAQCDVTALIAPDDISVSPPQAWQQVLYFILHLVPVDDA SHLLHQIRKWQVLEGKEKPAQIAHDVQSVLMLYLDMHDAK FTGGAALHGIEKFAEFFAHAADFRAVFPPQSLQDQDRSIPRR GLREIVRFGHLPLLQHMSGTVQITHDNVVAWQAARTAGAT GMSPIARRQKQREELHALAVERTARFRNADLQNYMHALVD VIKHRQLSAQVTLSDQVRLHRLMMGVLGRLVDYAGLWERD LYFVVLALLYHHGATPDDVFKGQGKKNLADGQVVAALKPK NRKAAAPVGVFDDLDHYGIYQDDRQSIRNGLSHFNMLRGG KAPDLSHWVNQTRSLVAHDRKLKNAVAKSVIEMLAREGFD LDWGIQTDRGQHILSHGKIRTRQAQHFQKSRLHIVKKSAKPD KNDTVKIRENLHGDAMVERVVQLFAAQVQKRYDITVEKRL DHLFLKPQDQKGKNGIHTHNGWSKTEKKRRPSRENRKGNH EN C2-27 SAMN04487830_13920 MKFSKESHRKTAVGVTESNGIIGLLYKDPLNEKEKIEDVVNQ [Pseudobutyrivibrio RANSTKRLFNLFGTEATSKDISRASKDLAKVVNKAIGNLKGN sp. OR37] KKFNKKEQITKGLNTKIIVEELKNVLKDEKKLIVNKDIIDEAC (SEQ ID SRLLKTSFRTAKTKQAVKMILTAVLIENTNLSKEDEAFVHEY No. 
218) FVKKLVNEYNKTSVKKQIPVALSNQNMVIQPNSVNGTLEISE TKKSKETKTTEKDAFRAFLRDYATLDENRRHKMRLCLRNLV NLYFYGETSVSKDDFDEWRDHEDKKQNDELFVKKIVSIKTD RKGNVKEVLDVDATIDAIRTNNIACYRRALAYANENPDVFF SDTMLNKFWIHHVENEVERIYGHINNNTGDYKYQLGYLSEK VWKGIINYLSIKYIAEGKAVYNYAMNALAKDNNSNAFGKLD EKFVNGITSFEYERIKAEETLQRECAVNIAFAANHLANATVD LNEKDSDFLLLKHEDNKDTLGAVARPNILRNILQFFGGKSRW NDFDFSGIDEIQLLDDLRKMIYSLRNSSFHFKTENIDNDSWNT KLIGDMFAYDFNMAGNVQKDKMYSNNVPMFYSTSDIEKML DRLYAEVHERASQVPSFNSVFVRKNFPDYLKNDLKITSAFGV DDALKWQSAVYYVCKEIYYNDFLQNPETFTMLKDYVQCLPI DIDKSMDQKLKSERNAHKNFKEAFATYCKECDSLSAICQMI MTEYNNQNKGNRKVISARTKDGDKLIYKHYKMILFEALKN VFTIYLEKNINTYGFLKKPKLINNVPAIEEFLPNYNGRQYETL VNRITEETELQKWYIVGRLLNPKQVNQLIGNFRSYVQYVND VARRAKQTGNNLSNDNIAWDVKNIIQIFDVCTKLNGVTSNIL EDYFDDGDDYARYLKNFVDYTNKNNDHSATLLGDFCAKEI DGIKIGIYHDGTNPIVNRNIIQCKLYGATGIISDLTKDGSILSV DYEIIKKYMQMQKEIKVYQQKGICKTKEEQQNLKKYQELKN IVELRNIIDYSEILDELQGQLINWGYLRERDLMYFQLGFHYLC LHNESKKPVGYNNAGDISGAVLYQIVAMYTNGLSLIDANGK SKKNAKASAGAKVGSFCSYSKEIRGVDKDTKEDDDPIYLAG VELFENINEHQQCINLRNYIEHFHYYAKHDRSMLDLYSEVFD RFFTYDMKYTKNVPNMMYNILLQHLVVPAFEFGSSEKRLDD NDEQTKPRAMFTLREKNGLSSEQFTYRLGDGNSTVKLSARG DDYLRAVASLLYYPDRAPEGLIRDAEAEDKFAKINHSNPKSD NRNNRGNFKNPKVQWYNNKTKRK C2-28 SAMN02910398_00008 MKISKVDHRKTAVKITDNKGAEGFIYQDPTRDSSTMEQIISN [Butyrivibrio sp. RARSSKVLFNIFGDTKKSKDLNKYTESLIIYVNKAIKSLKGDK YAB3001] RNNKYEEITESLKTERVLNALIQAGNEFTCSENNIEDALNKY (SEQ ID LKKSFRVGNTKSALKKLLMAAYCGYKLSIEEKEEIQNYFVD No. 
219) KLVKEYNKDTVLKYTAKSLKHQNMVVQPDTDNHVFLPSRI AGATQNKMSEKEALTEFLKAYAVLDEEKRHNLRIILRKLVN LYFYESPDFIYPENNEWKEHDDRKNKTETFVSPVKVNEEKN GKTFVKIDVPATKDLIRLKNIECYRRSVAETAGNPITYFTDHN ISKFWIHHIENEVEKIFALLKSNWKDYQFSVGYISEKVWKEII NYLSIKYIAIGKAVYNYALEDIKKNDGTLNFGVIDPSFYDGIN SFEYEKIKAEETFQREVAVYVSFAVNHLSSATVKLSEAQSDM LVLNKNDIEKIAYGNTKRNILQFFGGQSKWKEFDFDRYINPV NYTDIDFLFDIKKMVYSLRNESFHFTTTDTESDWNKNLISAM FEYECRRISTVQKNKFFSNNLPLFYGENSLERVLHKLYDDYV DRMSQVPSFGNVFVRKKFPDYMKEIGIKHNLSSEDNLKLQG ALYFLYKEIYYNAFISSEKAMKIFVDLVNKLDTNARDDKGRI THEAMAHKNFKDAISHYMTHDCSLADICQKIMTEYNQQNT GHRKKQTTYSSEKNPEIFRHYKMILFMLLQKAMTEYISSEEIF DFIMKPNSPKTDIKEEEFLPQYKSCAYDNLIKLIADNVELQK WYITARLLSPREVNQLIGSFRSYKQFVSDIERRAKETNNSLSK SGMTVDVENITKVLDLCTKLNGRFSNELTDYFDSKDDYAVY VSKFLDFGFKIDEKFPAALLGEFCNKEENGKKIGIYHNGTEPI LNSNIIKSKLYGITDVVSRAVKPVSEKLIREYLQQEVKIKPYL ENGVCKNKEEQAALRKYQELKNRIEFRDIVEYSEIINELMGQ LINFSYLRERDLMYFQLGFHYLCLNNYGAKPEGYYSIVNDK RTIKGAILYQIVAMYTYGLPIYHYVDGTISDRRKNKKTVLDT LNSSETVGAKIKYFIYYSDELFNDSLILYNAGLELFENINEHE NIVNLRKYIDHFKYYVSQDRSLLDIYSEVFDRYFTYDRKYKK NVMNLFSNIMLKHFIITDFEFSTGEKTIGEKNTAKKECAKVRI KRGGLSSDKFTYKFKDAKPIELSAKNTEFLDGVARILYYPEN VVLTDLVRNSEVEDEKRIEKYDRNHNSSPTRKDKTYKQDVK KNYNKKTSKAFDSSKLDTKSVGNNLSDNPVLKQFLSESKKK R C2-29 Blautia sp. MKISKVDHVKSGIDQKLSSQRGMLYKQPQKKYEGKQLEEH Marseille- VRNLSRKAKALYQVFPVSGNSKMEKELQIINSFIKNILLRLDS P2398 GKTSEEIVGYINTYSVASQISGDHIQELVDQHLKESLRKYTCV (SEQ ID GDKRIYVPDIIVALLKSKFNSETLQYDNSELKILIDFIREDYLK No. 
220) EKQIKQIVHSIENNSTPLRIAEINGQKRLIPANVDNPKKSYIFE FLKEYAQSDPKGQESLLQHMRYLILLYLYGPDKITDDYCEEI EAWNFGSIVMDNEQLFSEEASMLIQDRIYVNQQIEEGRQSKD TAKVKKNKSKYRMLGDKIEHSINESVVKHYQEACKAVEEK DIPWIKYISDHVMSVYSSKNRVDLDKLSLPYLAKNTWNTWI SFIAMKYVDMGKGVYHFAMSDVDKVGKQDNLIIGQIDPKFS DGISSFDYERIKAEDDLHRSMSGYIAFAVNNFARAICSDEFRK KNRKEDVLTVGLDEIPLYDNVKRKLLQYFGGASNWDDSIIDI IDDKDLVACIKENLYVARNVNFHFAGSEKVQKKQDDILEEIV RKETRDIGKHYRKVFYSNNVAVFYCDEDIIKLMNHLYQREK PYQAQIPSYNKVISKTYLPDLIFMLLKGKNRTKISDPSIMNMF RGTFYFLLKEIYYNDFLQASNLKEMFCEGLKNNVKNKKSEK PYQNFMRRFEELENMGMDFGEICQQIMTDYEQQNKQKKKT ATAVMSEKDKKIRTLDNDTQKYKHFRTLLYIGLREAFIIYLK DEKNKEWYEFLREPVKREQPEEKEFVNKWKLNQYSDCSELI LKDSLAAAWYVVAHFINQAQLNHLIGDIKNYIQFISDIDRRA KSTGNPVSESTEIQIERYRKILRVLEFAKFFCGQITNVLTDYY QDENDFSTHVGHYVKFEKKNMEPAHALQAFSNSLYACGKE KKKAGFYYDGMNPIVNRNITLASMYGNKKLLENAMNPVTE QDIRKYYSLMAELDSVLKNGAVCKSEDEQKNLRHFQNLKN RIELVDVLTLSELVNDLVAQLIGWVYIRERDMMYLQLGLHY IKLYFTDSVAEDSYLRTLDLEEGSIADGAVLYQIASLYSFNLP MYVKPNKSSVYCKKHVNSVATKFDIFEKEYCNGDETVIENG LRLFENINLHKDMVKFRDYLAHFKYFAKLDESILELYSKAYD FFFSYNIKLKKSVSYVLTNVLLSYFINAKLSFSTYKSSGNKTV QHRTTKISVVAQTDYFTYKLRSIVKNKNGVESIENDDRRCEV VNIAARDKEFVDEVCNVINYNSDK C2-30 Leptotrichia MKITKIDGISHKKYIKEGKLVKSTSEENKTDERLSELLTIRLD sp. TYIKNPDNASEEENRIRRENLKEFFSNKVLYLKDGILYLKDR Marseille- REKNQLQNKNYSEEDISEYDLKNKNNFLVLKKILLNEDINSE P3007 ELEIFRNDFEKKLDKINSLKYSLEENKANYQKINENNIKKVE (SEQ ID GKSKRNIFYNYYKDSAKRNDYINNIQEAFDKLYKKEDIENLF No. 
221) FLIENSKKHEKYKIRECYHKIIGRKNDKENFATIIYEEIQNVNN MKELIEKVPNVSELKKSQVFYKYYLNKEKLNDENIKYVFCH FVEIEMSKLLKNYVYKKPSNISNDKVKRIFEYQSLKKLIENKL LNKLDTYVRNCGKYSFYLQDGEIATSDFIVGNRQNEAFLRNI IGVSSTAYFSLRNILETENENDITGRMRGKTVKNNKGEEKYIS GEIDKLYDNNKQNEVKKNLKMFYSYDFNMNSKKEIEDFFSN IDEAISSIRHGIVHFNLELEGKDIFTFKNIVPSQISKKMFHDEIN EKKLKLKIFKQLNSANVFRYLEKYKILNYLNRTRFEFVNKNI PFVPSFTKLYSRIDDLKNSLGIYWKTPKTNDDNKTKEITDAQI YLLKNIYYGEFLNYFMSNNGNFFEITKEIIELNKNDKRNLKT GFYKLQKFENLQEKTPKEYLANIQSLYMINAGNQDEEEKDT YIDFIQKIFLKGFMTYLANNGRLSLIYIGSDEETNTSLAEKKQ EFDKFLKKYEQNNNIEIPYEINEFVREIKLGKILKYTERLNMF YLILKLLNHKELTNLKGSLEKYQSANKEEAFSDQLELINLLN LDNNRVTEDFELEADEIGKFLDFNGNKVKDNKELKKFDTNK IYFDGENIIKHRAFYNIKKYGMLNLLEKISDEAKYKISIEELKN YSKKKNEIEENHTTQENLHRKYARPRKDEKFTDEDYKKYEK AIRNIQQYTHLKNKVEFNELNLLQSLLLRILHRLVGYTSIWER DLRFRLKGEFPENQYIEEIFNFDNSKNVKYKNGQIVEKYINFY KELYKDDTEKISIYSDKKVKELKKEKKDLYIRNYIAHFNYIPN AEISLLEMLENLRKLLSYDRKLKNAIMKSIVDILKEYGFVVTF KIEKDKKIRIESLKSEEVVHLKKLKLKDNDKKKEPIKTYRNS KELCKLVKVMFEYKMKEKKSEN C2-31 Bacteroides MRITKVKVKESSDQKDKMVLIHRKVGEGTLVLDENLADLTA ihuae (SEQ PIIDKYKDKSFELSLLKQTLVSEKEMNIPKCDKCTAKERCLSC ID No. 
222) KQREKRLKEVRGAIEKTIGAVIAGRDIIPRLNIFNEDEICWLIK PKLRNEFTFKDVNKQVVKLNLPKVLVEYSKKNDPTLFLAYQ QWIAAYLKNKKGHIKKSILNNRVVIDYSDESKLSKRKQALEL WGEEYETNQRIALESYHTSYNIGELVTLLPNPEEYVSDKGEIR PAFHYKLKNVLQMHQSTVFGTNEILCINPIFNENRANIQLSAY NLEVVKYFEHYFPIKKKKKNLSLNQAIYYLKVETLKERLSLQ LENALRMNLLQKGKIKKHEFDKNTCSNTLSQIKRDEFFVLNL VEMCAFAANNIRNIVDKEQVNEILSKKDLCNSLSKNTIDKEL CTKFYGADFSQIPVAIWAMRGSVQQIRNEIVHYKAEAIDKIF ALKTFEYDDMEKDYSDTPFKQYLELSIEKIDSFFIEQLSSNDV LNYYCTEDVNKLLNKCKLSLRRTSIPFAPGFKTIYELGCHLQ DSSNTYRIGHYLMLIGGRVANSTVTKASKAYPAYRFMLKLI YNHLFLNKFLDNHNKRFFMKAVAFVLKDNRENARNKFQYA FKEIRMMNNDESIASYMSYIHSLSVQEQEKKGDKNDKVRYN TEKFIEKVFVKGFDDFLSWLGVEFILSPNQEERDKTVTREEYE NLMIKDRVEHSINSNQESHIAFFTFCKLLDANHLSDLRNEWI KFRSSGDKEGFSYNFAIDIIELCLLTVDRVEQRRDGYKEQTEL KEYLSFFIKGNESENTVWKGFYFQQDNYTPVLYSPIELIRKY GTLELLKLIIVDEDKITQGEFEEWQTLKKVVEDKVTRRNELH QEWEDMKNKSSFSQEKCSIYQKLCRDIDRYNWLDNKLHLV HLRKLHNLVIQILSRMARFIALWDRDFVLLDASRANDDYKL LSFFNFRDFINAKKTKTDDELLAEFGSKIEKKNAPFIKAEDVP LMVECIEAKRSFYQKVFFRNNLQVLADRNFIAHYNYISKTAK CSLFEMIIKLRTLMYYDRKLRNAVVKSIANVFDQNGMVLQL SLDDSHELKVDKVISKRIVHLKNNNIMTDQVPEEYYKICRRL LEMKK C2-32 SAMN05216357_1045 MEFRDSIFKSLLQKEIEKAPLCFAEKLISGGVFSYYPSERLKEF [Porphyromonadaceae VGNHPFSLFRKTMPFSPGFKRVMKSGGNYQNANRDGRFYD bacterium LDIGVYLPKDGFGDEEWNARYFLMKLIYNQLFLPYFADAEN KH3CP3RA] HLFRECVDFVKRVNRDYNCKNNNSEEQAFIDIRSMREDESIA (SEQ ID DYLAFIQSNIIIEENKKKETNKEGQINFNKFLLQVFVKGFDSFL No. 
223) KDRTELNFLQLPELQGDGTRGDDLESLDKLGAVVAVDLKLD ATGIDADLNENISFYTFCKLLDSNHLSRLRNEIIKYQSANSDF SHNEDFDYDRIISIIELCMLSADHVSTNDNESIFPNNDKDFSGI RPYLSTDAKVETFEDLYVHSDAKTPITNATMVLNWKYGTDK LFERLMISDQDFLVTEKDYFVWKELKKDIEEKIKLREELHSL WVNTPKGKKGAKKKNGRETTGEFSEENKKEYLEVCREIDRY VNLDNKLHFVHLKRMHSLLIELLGRFVGFTYLFERDYQYYH LEIRSRRNKDAGVVDKLEYNKIKDQNKYDKDDFFACTFLYE KANKVRNFIAHFNYLTMWNSPQEEEHNSNLSGAKNSSGRQN LKCSLTELINELREVMSYDRKLKNAVTKAVIDLFDKHGMVI KFRIVNNNNNDNKNKHHLELDDIVPKKIMHLRGIKLKRQDG KPIPIQTDSVDPLYCRMWKKLLDLKPTPF C2-33 Listeria MHDAWAENPKKPQSDAFLKEYKACCEAIDTYNWHKNKAT riparia LVYVNELHHLLIDILGRLVGYVAIADRDFQCMANQYLKSSG (SEQ ID HTERVDSWINTIRKNRPDYIEKLDIFMNKAGLFVSEKNGRNY No. 224) IAHLNYLSPKHKYSLLYLFEKLREMLKYDRKLKNAVTKSLID LLDKHGMCVVFANLKNNKHRLVIASLKPKKIETFKWKKIK C2-34 Insolitispinillum MRIIRPYGSSTVASPSPQDAQPLRSLQRQNGTFDVAEFSRRHP peregrinum ELVLAQWVAMLDKIIRKPAPGKNSTALPRPTAEQRRLRQQV (SEQ ID GAALWAEMQRHTPVPPELKAVWDSKVHPYSKDNAPATAKT No. 225) PSHRGRWYDRFGDPETSAATVAEGVRRHLLDSAQPFRANGG QPKGKGVIEHRALTIQNGTLLHHHQSEKAGPLPEDWSTYRA DELVSTIGKDARWIKVAASLYQHYGRIFGPTTPISEAQTRPEF VLHTAVKAYYRRLFKERKLPAERLERLLPRTGEALRHAVTV QHGNRSLADAVRIGKILHYGWLQNGEPDPWPDDAALYSSR YWGSDGQTDIKHSEAVSRVWRRALTAAQRTLTSWLYPAGT DAGDILLIGQKPDSIDRNRLPLLYGDSTRHWTRSPGDVWLFL KQTLENLRNSSFHFKTLSAFTSHLDGTCESEPAEQQAAQALW QDDRQQDHQQVFLSLRALDATTYLPTGPLHRIVNAVQSTDA TLPLPRFRRVVTRAANTRLKGFPVEPVNRRTMEDDPLLRCR YGVLKLLYERGFRAWLETRPSIASCLDQSLKRSTKAAQTING KNSPQGVEILSRATKLLQAEGGGGHGIHDLFDRLYAATARE MRVQVGYHHDAEAARQQAEFIEDLKCEVVARAFCAYLKTL GIQGDTFRRQPEPLPTWPDLPDLPSSTIGTAQAALYSVLHLMP VEDVGSLLHQLRRWLVALQARGGEDGTAITATIPLLELYLN RHDAKFSGGGAGTGLRWDDWQVFEDCQATFDRVFPPGPAL DSHRLPLRGLREVLRFGRVNDLAALIGQDKITAAEVDRWHT AEQTIAAQQQRREALHEQLSRKKGTDAEVDEYRALVTAIAD HRHLTAHVTLSNVVRLHRLMTTVLGRLVDYGGLWERDLTF VTLYEAHRLGGLRNLLSESRVNKFLDGQTPAALSKKNNAEE NGMISKVLGDKARRQIRNDFAHFNMLQQGKKTINLTDEINN ARKLMAHDRKLKNAITRSVTTLLQQDGLDIVWTMDASHRL TDAKIDSRNAIHLHKTHNRANIREPLHGKSYCRWVAALFGA TSTPSATKKSDKIR

In certain example embodiments, the RNA-targeting effector protein is a Cas13c effector protein as disclosed in U.S. Provisional Patent Application No. 62/525,165 filed Jun. 26, 2017, and PCT Application No. US 2017/047193 filed Aug. 16, 2017. Example wildtype orthologue sequences of Cas13c are provided in Tables 3 and 4 below. In certain example embodiments, the CRISPR effector protein is a Cas13c protein from Table 3 or 4.

TABLE 3 Fusobacterium MEKFRRQNRNSIIKIIISNYDTKGIKELKVRYRKQAQLDTFIIKTEI necrophorum VNNDIFIKSIIEKAREKYRYSFLFDGEEKYHFKNKSSVEIVKKDIF subsp. SQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTILKDGRRSA funduliforme RREKSMTERKLIEEKVAKNYSLLANCPMEEVDSIKIYKIKRFLT ATCC 51357 YRSNMLLYFASINSFLCEGIKGKDNETEEIWHLKDNDVRKEKV contig00003 RENFKNKLIQSTENYNSSLKNQIEEKEKLLRKEFKKGAFYRTIIK (SEQ ID No. KLQQERIKELSEKSLTEDCEKIIKLYSKLRHSLMHYDYQYFENLF 226) ENKKNDDLMKDLNLDLFKSLPLIRKMKLNNKVNYLEDGDTLF VLQKTKKAKTLYQIYDALCEQKNGFNKFINDFFVSDGEENTVF KQIINEKFQSEMEFLEKRISESEKKNEKLKKKLDSMKAHFRNINS EDTKEAYFWDIHSSRNYKTKYNERKNLVNEYTELLGSSKEKKL LREEITKINRQLLKLKQEMEEITKKNSLFRLEYKMKIAFGFLFCE FDGNISKFKDEFDASNQEKIIQYHKNGEKYLTSFLKEEEKEKFNL EKMQKIIQKTEEEDWLLPETKNNLFKFYLLTYLLLPYELKGDFL GFVKKHYYDIKNVDFIDENQNNIQVSQTVEKQEDYFYHKIRLFE KNTKKYEIVKYSIVPNEKLKQYFEDLGIDIKYLTVEQKSEVSEEK NKKVSLKNNGMFNKTILLFVFKYYQIAFKLFNDIELYSLFFLRE KSGKPLEIFRKELESKMKDGYLNFGQLLYVVYEVLVKNKDLDK ILSKKIDYRKDKSFSPEIAYLRNFLSHLNYSKFLDNFMKINTNKS DENKEVLIPSIKIQKMIQFIEKCNLQNQIDFDFNFVNDFYMRKEK MFFIQLKQIFPDINSTEKQKMNEKEEILRNRYHLTDKKNEQIKDE HEAQSQLYEKILSLQKIYSSDKNNFYGRLKEEKLLFLEKQGKKK LSMEEIKDKIAGDISDLLGILKKEITRDIKDKLTEKFRYCEEKLLN LSFYNHQDKKKEESIRVFLIRDKNSDNFKFESILDDGSNKIFISKN GKEITIQCCDKVLETLIIEKNTLKISSNGKIISLIPHYSYSIDVKY Fusobacterium MEKFRRQNRSSIIKIIISNYDTKGIKELKVRYRKQAQLDTFIIKTEI necrophorum VNNDIFIKSBEKAREKYRYSFLFDGEEKYHFKNKSSVEIVKKDIF DJ-2 SQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTILKDGRRSA contig0065, RREKSMTERKLIEEKVAENYSLLANCPMEEVDSIKIYKIKRFLTY whole genome RSNMLLYFASINSFLCEGIKGKDNETEEIWHLKDNDVRKEKVKE shotgun NFKNKLIQSTENYNSSLKNQIEEKEKLLRKESKKGAFYRTIIKKL sequence (SEQ QQERIKELSEKSLTEDCEKIIKLYSELRHPLMHYDYQYFENLFEN ID No. 
227) KENSELTKNLNLDIFKSLPLVRKMKLNNKVNYLEDNDTLFVLQ KTKKAKTLYQIYDALCEQKNGFNKFINDFFVSDGEENTVFKQII NEKFQSEIEFLEKRISESEKKNEKLKKKLDSMKAHFRNINSEDTK EAYFWDIHSSRNYKTKYNERKNLVNEYTELLGSSKEKKLLREEI TKINRQLLKLKQEMEEITKKNSLFRLEYKMKMAFGFLFCEFDG NISRFKDEFDASNQEKIIQYHKNGEKYLTYFLKEEEKEKFNLKK LQETIQKTGEENWLLPQNKNNLFKFYLLTYLLLPYELKGDFLGF VKKHYYDIKNVDFMDENQSSKIIESKEDDFYHKIRLFEKNTKKY EIVKYSIVPDKKLKQYFKDLGIDTKYLILDQKSEVSGEKNKKVS LKNNGMFNKTILLFVFKYYQIAFKLFNDIELYSLFFLREKSGKPF EVFLKELKDKMIGKQLNFGQLLYVVYEVLVKNKDLSEILSERID YRKDMCFSAEIADLRNFLSHLNYSKFLDNFMKINTNKSDENKE VLIPSIKIQKMIKFIEECNLQSQIDFDFNFVNDFYMRKEKMFFIQL KQIFPDINSTEKQKMNEKEEILRNRYHLTDKKNEQIKDEHEAQS QLYEKILSLQKIYSSDKNNFYGRLKEEKLLFLEKQEKKKLSMEEI KDKIAGDISDLLGILKKEITRDIKDKLTEKFRYCEEKLLNLSFYN HQDKKKEESIRVFLIRDKNSDNFKFESILDDGSNKIFISKNGKEIT IQCCDKVLETLIIEKNTLKISSNGKIISLIPHYSYSIDVKY Fusobacterium MKVRYRKQAQLDTFIIKTEIVNNDIFIKSIIEKAREKYRYSFLFDG necrophorum EEKYHFKNKSSVEIVKNDIFSQTPDNMIRNYKITLKISEKNPRVV BFTR-1 EAEIEDLMNSTILKDGRRSARREKSMTERKLIEEKVAENYSLLA contig0068 NCPIEEVDSIKIYKIKRFLTYRSNMLLYFASINSFLCEGIKGKDNE (SEQ ID No. 
TEEIWHLKDNDVRKEKVKENFKNKLIQSTENYNSSLKNQIEEKE 228) KLSSKEFKKGAFYRTIIKKLQQERIKELSEKSLTEDCEKIIKLYSE LRHPLMHYDYQYFENLFENKENSELTKNLNLDIFKSLPLVRKM KLNNKVNYLEDNDTLFVLQKTKKAKTLYQIYDALCEQKNGFN KFINDFFVSDGEENTVFKQIINEKFQSEMEFLEKRISESEKKNEKL KKKLDSMKAHFRNINSEDTKEAYFWDIHSSRNYKTKYNERKNL VNEYTKLLGSSKEKKLLREEITKINRQLLKLKQEMEEITKKNSLF RLEYKMKIAFGFLFCEFDGNISKFKDEFDASNQEKIIQYHKNGE KYLTSFLKEEEKEKFNLEKMQKIIQKTEEEDWLLPETKNNLFKF YLLTYLLLPYELKGDFLGFVKKHYYDIKNVDFMDENQNNIQVS QTVEKQEDYFYHKIRLFEKNTKKYEIVKYSIVPNEKLKQYFEDL GIDIKYLTGSVESGEKWLGENLGIDIKYLTVEQKSEVSEEKNKK VSLKNNGMFNKTILLFVFKYYQIAFKLFNDIELYSLFFLREKSEK PFEVFLEELKDKMIGKQLNFGQLLYVVYEVLVKNKDLDKILSK KIDYRKDKSFSPEIAYLRNFLSHLNYSKFLDNFMKINTNKSDEN KEVLIPSIKIQKMIQFIEKCNLQNQIDFDFNFVNDFYMRKEKMFF IQLKQIFPDINSTEKQKKSEKEEILRKRYHLINKKNEQIKDEHEA QSQLYEKILSLQKIFSCDKNNFYRRLKEEKLLFLEKQGKKKISM KEIKDKIASDISDLLGILKKEITRDIKDKLTEKFRYCEEKLLNISFY NHQDKKKEEGIRVFLIRDKNSDNFKFESILDDGSNKIFISKNGKEI TIQCCDKVLETLMIEKNTLKISSNGKIISLIPHYSYSIDVKY Fusobacterium MTEKKSIIFKNKSSVEIVKKDIFSQTPDNMIRNYKITLKISEKNPR necrophorum VVEAEIEDLMNSTILKDGRRSARREKSMTERKLIEEKVAENYSL subsp. LANCPMEEVDSIKIYKIKRFLTYRSNMLLYFASINSFLCEGIKGK funduliforme DNETEEIWHLKDNDVRKEKVKENFKNKLIQSTENYNSSLKNQIE 1_1_36S EKEKLLRKESKKGAFYRTIIKKLQQERIKELSEKSLTEDCEKIIKL cont1.14 (SEQ YSELRHPLMHYDYQYFENLFENKENSELTKNLNLDIFKSLPLVR ID No. 
229) KMKLNNKVNYLEDNDTLFVLQKTKKAKTLYQIYDALCEQKNG FNKFINDFFVSDGEENTVFKQIINEKFQSEMEFLEKRISESEKKNE KLKKKFDSMKAHFHNINSEDTKEAYFWDIHSSSNYKTKYNERK NLVNEYTELLGSSKEKKLLREEITQINRKLLKLKQEMEEITKKNS LFRLEYKMKIAFGFLFCEFDGNISKFKDEFDASNQEKIIQYHKNG EKYLTYFLKEEEKEKFNLEKMQKIIQKTEEEDWLLPETKNNLFK FYLLTYLLLPYELKGDFLGFVKKHYYDIKNVDFMDENQNNIQV SQTVEKQEDYFYHKIRLFEKNTKKYEIVKYSIVPNEKLKQYFED LGIDIKYLTGSVESGEKWLGENLGIDIKYLTVEQKSEVSEEKIKK FL Fusobacterium MGKPNRSSIIKIIISNYDNKGIKEVKVRYNKQAQLDTFLIKSELK perfoetens DGKFILYSIVDKAREKYRYSFEIDKTNINKNEILIIKKDIYSNKED ATCC 29250 KVIRKYILSFEVSEKNDRTIVTKIKDCLETQKKEKFERENTRRLIS T364DRAFT_scaffold00009.9_C ETERKLLSEETQKTYSKIACCSPEDIDSVKIYKIKRYLAYRSNML (SEQ ID LFFSLINDIFVKGVVKDNGEEVGEIWRIIDSKEIDEKKTYDLLVE No. 230) NFKKRMSQEFINYKQSIENKIEKNTNKIKEIEQKLKKEKYKKEIN RLKKQLIELNRENDLLEKDKIELSDEEIREDIEKILKIYSDLRHKL MHYNYQYFENLFENKKISKEKNEDVNLTELLDLNLFRYLPLVR QLKLENKTNYLEKEDKITVLGVSDSAIKYYSYYNFLCEQKNGF NNFINSFFSNDGEENKSFKEKINLSLEKEIEIMEKETNEKIKEINK NELQLMKEQKELGTAYVLDIHSLNDYKISHNERNKNVKLQNDI MNGNRDKNALDKINKKLVELKIKMDKITKRNSILRLKYKLQVA YGFLMEEYKGNIKKFKDEFDISKEKIKSYKSKGEKYLEVKSEKK YITKILNSIEDIHNITWLKNQEENNLFKFYVLTYILLPFEFRGDFL GFVKKHYYDIKNVEFLDENNDRLTPEQLEKMKNDSFFNKIRLFE KNSKKYDILKESILTSERIGKYFSLLNTGAKYFEYGGEENRGIFN KNIIIPIFKYYQIVLKLYNDVELAMLLTLSESDEKDINKIKELVTL KEKVSPKKIDYEKKYKFSVLLDCFNRIINLGKKDFLASEEVKEV AKTFTNLAYLRNKICHLNYSKFIDDLLTIDTNKSTTDSEGKLLIN DRIRKLIKFIRENNQKMNISIDYNYINDYYMKKEKFIFGQRKQA KTIIDSGKKANKRNKAEELLKMYRVKKENINLIYELSKKLNELT KSELFLLDKKLLKDIDFTDVKIKNKSFFELKNDVKEVANIKQAL QKHSSELIGIYKKEVIMAIKRSIVSKLIYDEEKVLSIIIYDKTNKK YEDFLLEIRRERDINKFQFLIDEKKEKLGYEKIIETKEKKKVVVKI QNNSELVSEPRIIKNKDKKKAKTPEEISKLGILDLTNHYCFNLKI TL Fusobacterium MENKGNNKKIDFDENYNILVAQIKEYFTKEIENYNNRIDNIIDKK ulcerans ATCC ELLKYSEKKEESEKNKKLEELNKLKSQKLKILTDEEIKADVIKII 49185 cont2.38 KIFSDLRHSLMHYEYKYFENLFENKKNEELAELLNLNLFKNLTL (SEQ ID No. 
LRQMKIENKTNYLEGREEFNIIGKNIKAKEVLGHYNLLAEQKNG 231) FNNFINSFFVQDGTENLEFKKLIDEHFVNAKKRLERNIKKSKKLE KELEKMEQHYQRLNCAYVWDIHTSTTYKKLYNKRKSLIEEYN KQINEIKDKEVITAINVELLRIKKEMEEITKSNSLFRLKYKMQIA YAFLEIEFGGNIAKFKDEFDCSKMEEVQKYLKKGVKYLKYYKD KEAQKNYEFPFEEIFENKDTHNEEWLENTSENNLFKFYILTYLLL PMEFKGDFLGVVKKHYYDIKNVDFTDESEKELSQVQLDKMIGD SFFHKIRLFEKNTKRYEIIKYSILTSDEIKRYFRLLELDVPYFEYE KGTDEIGIFNKNIILTIFKYYQIIFRLYNDLEIHGLFNISSDLDKILR DLKSYGNKNINFREFLYVIKQNNNSSTEEEYRKIWENLEAKYLR LHLLTPEKEEIKTKTKEELEKLNEISNLRNGICHLNYKEIIEEILKT EISEKNKEATLNEKIRKVINFIKENELDKVELGFNFINDFFMKKE QFMFGQIKQVKEGNSDSITTERERKEKNNKKLKETYELNCDNL SEFYETSNNLRERANSSSLLEDSAFLKKIGLYKVKNNKVNSKVK DEEKRIENIKRKLLKDSSDIMGMYKAEVVKKLKEKLILIFKHDE EKRIYVTVYDTSKAVPENISKEILVKRNNSKEEYFFEDNNKKYV TEYYTLEITETNELKVIPAKKLEGKEFKTEKNKENKLMLNNHYC FNVKIIY Anaerosalibacter MKSGRREKAKSNKSSIVRVIISNFDDKQVKEIKVLYTKQGGIDVI sp. ND1 KFKSTEKDEKGRMKFNFDCAYNRLEEEEFNSFGGKGKQSFFVT genome TNEDLTELHVTKRHKTTGEIIKDYTIQGKYTPIKQDRTKVTVSIT assembly DNKDHFDSNDLGDKIRLSRSLTQYTNRILLDADVMKNYREIVCS Anaerosalibacter DSEKVDETINIDSQEIYKINRFLSYRSNMIIYYQMINNFLLHYDG massiliensis EEDKGGNDSINLINEIWKYENKKNDEKEKIIERSYKSIEKSINQYI ND1 (SEQ ID LNHNTEVESGDKEKKIDISEERIKEDLKKTFILFSRLRHYMVHYN No. 
232) YKFYENLYSGKNFIIYNKDKSKSRRFSELLDLNIFKELSKIKLVK NRAVSNYLDKKTTIHVLNKNINAIKLLDIYRDICETKNGFNNFIN NMMTISGEEDKEYKEMVTKHFNENMNKLSIYLENFKKHSDFKT NNKKKETYNLLKQELDEQKKLRLWFNAPYVYDIHSSKKYKEL YVERKKYVDIHSKLIEAGINNDNKKKLNEINVKLCELNTEMKE MTKLNSKYRLQYKLQLAFGFILEEFNLDIDKFVSAFDKDNNLTI SKFMEKRETYLSKSLDRRDNRFKKLIKDYKFRDTEDIFCSDREN NLVKLYILMYILLPVEIRGDFLGFVKKNYYDLKHVDFIDKRNND NKDTFFHDLRLFEKNVKRLEVTSYSLSDGFLGKKSREKFGKELE KFIYKNVSIALPTNIDIKEFNKSLVLPMMKNYQIIFKLLNDIEISA LFLIAKKEGNEGSITFKKVIDKVRKEDMNGNINFSQVMKMALN EKVNCQIRNSIAHINMKQLYIEPLNIYINNNQNKKTISEQMEEIID ICITKGLTGKELNKNIINDYYMKKEKLVFNLKLRKRNNLVSIDA QQKNMKEKSILNKYDLNYKDENLNIKEIILKVNDLNNKQKLLK ETTEGESNYKNALSKDILLLNGIIRKNINFKIKEMILGIIQQNEYR YVNINIYDKIRKEDHNIDLKINNKYIEISCYENKSNESTDERINFK IKYMDLKVKNELLVPSCYEDIYIKKKIDLEIRYIENCKVVYIDIY YKKYNINLEFDGKTLFVKFNKDVKKNNQKVNLESNYIQNIKFIV S

TABLE 4 Name Sequence EH019081 MTEKKSIIFKNKSSVEIVKKDIFSQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTILKDG RRSARREKSMTERKLIEEKVAENYSLLANCPMEEVDSIKIYKIKRFLTYRSNMLLYFASINSFL CEGIKGKDNETEEIWHLKDNDVRKEKVKENFKNKLIQSTENYNSSLKNQIEEKEKLLRKESK KGAFYRTIIKKLQQERIKELSEKSLTEDCEKIIKLYSELRHPLMHYDYQYFENLFENKENSELTK NLNLDIFKSLPLVRKMKLNNKVNYLEDNDTLFVLQKTKKAKTLYQIYDALCEQKNGFNKFI NDFFVSDGEENTVFKQIINEKFQSEMEFLEKRISESEKKNEKLKKKFDSMKAHFHNINSEDT KEAYFWDIHSSSNYKTKYNERKNLVNEYTELLGSSKEKKLLREEITQINRKLLKLKQEMEEIT KKNSLFRLEYKMKIAFGFLFCEFDGNISKFKDEFDASNQEKIIQYHKNGEKYLTYFLKEEEKE KFNLEKMQKIIQKTEEEDWLLPETKNNLFKFYLLTYLLLPYELKGDFLGFVKKHYYDIKNVDF MDENQNNIQVSQTVEKQEDYFYHKIRLFEKNTKKYEIVKYSIVPNEKLKQYFEDLGIDIKYLT GSVESGEKWLGENLGIDIKYLTVEQKSEVSEEKIKKFL WP_094899336 MEKDKKGEKIDISQEMIEEDLRKILILFSRLRHSMVHYDYEFYQALYSGKDFVISDKNNLEN RMISQLLDLNIFKELSKVKLIKDKAISNYLDKNTTIHVLGQDIKAIRLLDIYRDICGSKNGFNKF INTMITISGEEDREYKEKVIEHFNKKMENLSTYLEKLEKQDNAKRNNKRVYNLLKQKLIEQQ KLKEWFGGPYVYDIHSSKRYKELYIERKKLVDRHSKLFEEGLDEKNKKELTKINDELSKLNSE MKEMTKLNSKYRLQYKLQLAFGFILEEFDLNIDTFINNFDKDKDLIISNFMKKRDIYLNRVL DRGDNRLKNIIKEYKFRDTEDIFCNDRDNNLVKLYILMYILLPVEIRGDFLGFVKKNYYDMK HVDFIDKKDKEDKDTFFHDLRLFEKNIRKLEITDYSLSSGFLSKEHKVDIEKKINDFINRNGA MKLPEDITIEEFNKSLILPIMKNYQINFKLLNDIEISALFKIAKDRSITFKQAIDEIKNEDIKKNS KKNDKNNHKDKNINFTQLMKRALHEKIPYKAGMYQIRNNISHIDMEQLYIDPLNSYMNS NKNNITISEQIEKIIDVCVTGGVTGKELNNNIINDYYMKKEKLVFNLKLRKQNDIVSIESQEK NKREEFVFKKYGLDYKDGEINIIEVIQKVNSLQEELRNIKETSKEKLKNKETLFRDISLINGTIR KNINFKIKEMVLDIVRMDEIRHINIHIYYKGENYTRSNIIKFKYAIDGENKKYYLKQHEINDIN LELKDKFVTLICNMDKHPNKNKQTINLESNYIQNVKFIIP WP_040490876 MENKGNNKKIDFDENYNILVAQIKEYFTKEIENYNNRIDNIIDKKELLKYSEKKEESEKNKKL EELNKLKSQKLKILTDEEIKADVIKIIKIFSDLRHSLMHYEYKYFENLFENKKNEELAELLNLNL FKNLTLLRQMKIENKTNYLEGREEFNIIGKNIKAKEVLGHYNLLAEQKNGFNNFINSFFVQD GTENLEFKKLIDEHFVNAKKRLERNIKKSKKLEKELEKMEQHYQRLNCAYVWDIHTSTTYK KLYNKRKSLIEEYNKQINEIKDKEVITAINVELLRIKKEMEEITKSNSLFRLKYKMQIAYAFLEIE FGGNIAKFKDEFDCSKMEEVQKYLKKGVKYLKYYKDKEAQKNYEFPFEEIFENKDTHNEE WLENTSENNLFKFYILTYLLLPMEFKGDFLGVVKKHYYDIKNVDFTDESEKELSQVQLDKMI 
GDSFFHKIRLFEKNTKRYEIIKYSILTSDEIKRYFRLLELDVPYFEYEKGTDEIGIFNKNIILTIFKY YQIIFRLYNDLEIHGLFNISSDLDKILRDLKSYGNKNINFREFLYVIKQNNNSSTEEEYRKIWE NLEAKYLRLHLLTPEKEEIKTKTKEELEKLNEISNLRNGICHLNYKEIIEEILKTEISEKNKEATLN EKIRKVINFIKENELDKVELGFNFINDFFMKKEQFMFGQIKQVKEGNSDSITTERERKEKNN KKLKETYELNCDNLSEFYETSNNLRERANSSSLLEDSAFLKKIGLYKVKNNKVNSKVKDEEKR IENIKRKLLKDSSDIMGMYKAEVVKKLKEKLILIFKHDEEKRIYVTVYDTSKAVPENISKEILVK RNNSKEEYFFEDNNKKYVTEYYTLEITETNELKVIPAKKLEGKEFKTEKNKENKLMLNNHYC FNVKIIY WP_047396607 MEEIKHKKNKSSIIRVIVSNYDMTGIKEIKVLYQKQGGVDTFNLKTIINLESGNLEIISCKPKE REKYRYEFNCKTEINTISITKKDKVLKKEIRKYSLELYFKNEKKDTVVAKVTDLLKAPDKIEGER NHLRKLSSSTERKLLSKTLCKNYSEISKTPIEEIDSIKIYKIKRFLNYRSNFLIYFALINDFLCAGV KEDDINEVWLIQDKEHTAFLENRIEKITDYIFDKLSKDIENKKNQFEKRIKKYKTSLEELKTET LEKNKTFYIDSIKTKITNLENKITELSLYNSKESLKEDLIKIISIFTNLRHSLMHYDYKSFENLFEN IENEELKNLLDLNLFKSIRMSDEFKTKNRTNYLDGTESFTIVKKHQNLKKLYTYYNNLCDKK NGFNTFINSFFVTDGIENTDFKNLIILHFEKEMEEYKKSIEYYKIKISNEKNKSKKEKLKEKIDLL QSELINMREHKNLLKQIYFFDIHNSIKYKELYSERKNLIEQYNLQINGVKDVTAINHINTKLLS LKNKMDKITKQNSLYRLKYKLKIAYSFLMIEFDGDVSKFKNNFDPTNLEKRVEYLDKKEEYL NYTAPKNKFNFAKLEEELQKIQSTSEMGADYLNVSPENNLFKFYILTYIMLPVEFKGDFLGF VKNHYYNIKNVDFMDESLLDENEVDSNKLNEKIENLKDSSFFNKIRLFEKNIKKYEIVKYSVS TQENMKEYFKQLNLDIPYLDYKSTDEIGIFNKNMILPIFKYYQNVFKLCNDIEIHALLALANK KQQNLEYAIYCCSKKNSLNYNELLKTFNRKTYQNLSFIRNKIAHLNYKELFSDLFNNELDLNT KVRCLIEFSQNNKFDQIDLGMNFINDYYMKKTRFIFNQRRLRDLNVPSKEKIIDGKRKQQN DSNNELLKKYGLSRTNIKDIFNKAWY WP_035935671 MKVRYRKQAQLDTFIIKTEIVNNDIFIKSIIEKAREKYRYSFLFDGEEKYHFKNKSSVEIVKNDI FSQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTILKDGRRSARREKSMTERKLIEEKV AENYSLLANCPIEEVDSIKIYKIKRFLTYRSNMLLYFASINSFLCEGIKGKDNETEEIWHLKDN DVRKEKVKENFKNKLIQSTENYNSSLKNQIEEKEKLSSKEFKKGAFYRTIIKKLQQERIKELSE KSLTEDCEKIIKLYSELRHPLMHYDYQYFENLFENKENSELTKNLNLDIFKSLPLVRKMKLNN KVNYLEDNDTLFVLQKTKKAKTLYQIYDALCEQKNGFNKFINDFFVSDGEENTVFKQIINEK FQSEMEFLEKRISESEKKNEKLKKKLDSMKAHFRNINSEDTKEAYFWDIHSSRNYKTKYNE RKNLVNEYTKLLGSSKEKKLLREEITKINRQLLKLKQEMEEITKKNSLFRLEYKMKIAFGFLFC 
EFDGNISKFKDEFDASNQEKIIQYHKNGEKYLTSFLKEEEKEKFNLEKMQKIIQKTEEEDWL LPETKNNLFKFYLLTYLLLPYELKGDFLGFVKKHYYDIKNVDFMDENQNNIQVSQTVEKQE DYFYHKIRLFEKNTKKYEIVKYSIVPNEKLKQYFEDLGIDIKYLTGSVESGEKWLGENLGIDIK YLTVEQKSEVSEEKNKKVSLKNNGMFNKTILLFVFKYYQIAFKLFNDIELYSLFFLREKSEKPF EVFLEELKDKMIGKQLNFGQLLYVVYEVLVKNKDLDKILSKKIDYRKDKSFSPEIAYLRNFLS HLNYSKFLDNFMKINTNKSDENKEVLIPSIKIQKMIQFIEKCNLQNQIDFDFNFVNDFYMR KEKMFFIQLKQIFPDINSTEKQKKSEKEEILRKRYHLINKKNEQIKDEHEAQSQLYEKILSLQK IFSCDKNNFYRRLKEEKLLFLEKQGKKKISMKEIKDKIASDISDLLGILKKEITRDIKDKLTEKFR YCEEKLLNISFYNHQDKKKEEGIRVFLIRDKNSDNFKFESILDDGSNKIFISKNGKEITIQCCD KVLETLMIEKNTLKISSNGKIISLIPHYSYSIDVKY WP_035906563 MEKFRRQNRSSIIKIIISNYDTKGIKELKVRYRKQAQLDTFIIKTEIVNNDIFIKSIIEKAREKYRY SFLFDGEEKYHFKNKSSVEIVKKDIFSQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTIL KDGRRSARREKSMTERKLIEEKVAENYSLLANCPMEEVDSIKIYKIKRFLTYRSNMLLYFASI NSFLCEGIKGKDNETEEIWHLKDNDVRKEKVKENFKNKLIQSTENYNSSLKNQIEEKEKLLR KESKKGAFYRTIIKKLQQERIKELSEKSLTEDCEKIIKLYSELRHPLMHYDYQYFENLFENKEN SELTKNLNLDIFKSLPLVRKMKLNNKVNYLEDNDTLFVLQKTKKAKTLYQIYDALCEQKNGF NKFINDFFVSDGEENTVFKQIINEKFQSEIEFLEKRISESEKKNEKLKKKLDSMKAHFRNINSE DTKEAYFWDIHSSRNYKTKYNERKNLVNEYTELLGSSKEKKLLREEITKINRQLLKLKQEME EITKKNSLFRLEYKMKMAFGFLFCEFDGNISRFKDEFDASNQEKIIQYHKNGEKYLTYFLKEE EKEKFNLKKLQETIQKTGEENWLLPQNKNNLFKFYLLTYLLLPYELKGDFLGFVKKHYYDIK NVDFMDENQSSKIIESKEDDFYHKIRLFEKNTKKYEIVKYSIVPDKKLKQYFKDLGIDTKYLIL DQKSEVSGEKNKKVSLKNNGMFNKTILLFVFKYYQIAFKLFNDIELYSLFFLREKSGKPFEVF LKELKDKMIGKQLNFGQLLYVVYEVLVKNKDLSEILSERIDYRKDMCFSAEIADLRNFLSHN YSKFLDNFMKINTNKSDENKEVLIPSIKIQKMIKFIEECNLQSQIDFDFNFVNDFYMRKEKM FFIQLKQIFPDINSTEKQKMNEKEEILRNRYHLTDKKNEQIKDEHEAQSQLYEKILSLQKIYSS DKNNFYGRLKEEKLLFLEKQEKKKLSMEEIKDKIAGDISDLLGILKKEITRDIKDKLTEKFRYCE EKLLNLSFYNHQDKKKEESIRVFLIRDKNSDNFKFESILDDGSNKIFISKNGKEITIQCCDKVL ETLIIEKNTLKISSNGKIISLIPHYSYSIDVKY WP_042678931 MKSGRREKAKSNKSSIVRVIISNFDDKQVKEIKVLYTKQGGIDVIKFKSTEKDEKGRMKFNF DCAYNRLEEEEFNSFGGKGKQSFFVTTNEDLTELHVTKRHKTTGEIIKDYTIQGKYTPIKQD RTKVTVSITDNKDHFDSNDLGDKIRLSRSLTQYTNRILLDADVMKNYREIVCSDSEKVDETI 
NIDSQEIYKINRFLSYRSNMIIYYQMINNFLLHYDGEEDKGGNDSINLINEIWKYENKKNDE KEKIIERSYKSIEKSINQYILNHNTEVESGDKEKKIDISEERIKEDLKKTFILFSRLRHYMVHYNY KFYENLYSGKNFIIYNKDKSKSRRFSELLDLNIFKELSKIKLVKNRAVSNYLDKKTTIHVLNKNI NAIKLLDIYRDICETKNGFNNFINNMMTISGEEDKEYKEMVTKHFNENMNKLSIYLENFKK HSDFKTNNKKKETYNLLKQELDEQKKLRLWFNAPYVYDIHSSKKYKELYVERKKYVDIHSKL IEAGINNDNKKKLNEINVKLCELNTEMKEMTKLNSKYRLQYKLQLAFGFILEEFNLDIDKFV SAFDKDNNLTISKFMEKRETYLSKSLDRRDNRFKKLIKDYKFRDTEDIFCSDRENNLVKLYIL MYILLPVEIRGDFLGFVKKNYYDLKHVDFIDKRNNDNKDTFFHDLRLFEKNVKRLEVTSYSL SDGFLGKKSREKFGKELEKFIYKNVSIALPTNIDIKEFNKSLVLPMMKNYQIIFKLLNDIEISAL FLIAKKEGNEGSITFKKVIDKVRKEDMNGNINFSQVMKMALNEKVNCQIRNSIAHINMKQ LYIEPLNIYINNNQNKKTISEQMEEIIDICITKGLTGKELNKNIINDYYMKKEKLVFNLKLRKR NNLVSIDAQQKNMKEKSILNKYDLNYKDENLNIKEIILKVNDLNNKQKLLKETTEGESNYK NALSKDILLLNGIIRKNINFKIKEMILGIIQQNEYRYVNINIYDKIRKEDHNIDLKINNKYIEISC YENKSNESTDERINFKIKYMDLKVKNELLVPSCYEDIYIKKKIDLEIRYIENCKVVYIDIYYKKY NINLEFDGKTLFVKFNKDVKKNNQKVNLESNYIQNIKFIVS WP_062627846 MEKFRRQNRNSIIKIIISNYDTKGIKELKVRYRKQAQLDTFIIKTEIVNNDIFIKSIIEKAREKYR YSFLFDGEEKYHFKNKSSVEIVKKDIFSQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTI LKDGRRSARREKSVTERKLIEEKVAENYSLLANCPMEEVDSIKIYKIKRFLTYRSNMLLYFASI NSFLCEGIKGKENETEEIWHLKDNDVRKEKVKENFKNKLIQSTENYNSSLKNQIEEKEKLLR KESKKGAFYRTIIKKLQQERIKELSEKSLTEDCEKIIKLYSKLRHSLMHYDYQYFENLFENKETP ELKDKLDLHLFKSLPLIRKMKLNNKVNYLEDGDTLFVLQKTKKAKTLYQIYDALCEQKNGFN KFINDFFVSDGEENTVFKQIINEKFQSEMEFLGKRISESEEKNPKLKKKFDSMKAHFHNINS EDTKEAYFWDIHSSSNYKTKYNERKNLVNEYTELLGSSKEKKLLREEITQINRKLLKLKQEME EITKKNSLFRLEYKMKMAFGFLFCEFDGNISRFKDEFDASNQEKIIQYHKNGEKYLTYFLKEE EKEKFNLKKLQETIQKTGKENWLLPQNKNNLFKFYLLTYLLLPYELKGDFLGFVKKHYYDIK NVDFMDENQSSKIIESKEDDFYHKIRLFEKNTKKYEIVKYSIVPDEKLKQYFKDLGIDTKYLIL EQKSEVSGEKNKKVSLKNNGMFNKTILLFVFKYYQIAFKLFNDIELYSLFFLREKSGKPFEVFL KELKDKMIGKQLNFGQLLYVIYEVLVKNKDLSEILSERIDYRKDMCFSAEIADLRNFLSHLNY SKFLDNFMKINTNKSDENKEVLIPSIKIQKMIKFIEECNLQSQIDFDFNFVNDFYMRKEKMF FIQLKQIFPDINSTEKQKMNEKEEILRNRYHLTDKKNEQIKDEHEAQSQLYEKILSLQKIYSS DKNNFYGRLKEEKLLFLGKQGKKKLSMEEIKDKIAGDISDLLGILKKEITRDIKDKLTEKFRYC 
EEKLLNLSFYNHQDKKKEESIRVFLIRDKNSDNFKFESILDDGSNKIFISKNGKEITIQCCDKV LETLMIEKNTLKISSNGKIISLVPHYSYSIDVKY WP_005959231 MEKFRRQNRNSIIKIIISNYDTKGIKELKVRYRKQAQLDTFIIKTEIVNNDIFIKSIIEKAREKYR YSFLFDGEEKYHFKNKSSVEIVKKDIFSQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTI LKDGRRSARREKSMTERKLIEEKVAKNYSLLANCPMEEVDSIKIYKIKRFLTYRSNMLLYFAS INSFLCEGIKGKDNETEEIWHLKDNDVRKEKVRENFKNKLIQSTENYNSSLKNQIEEKEKLL RKEFKKGAFYRTIIKKLQQERIKELSEKSLTEDCEKIIKLYSKLRHSLMHYDYQYFENLFENKK NDDLMKDLNLDLFKSLPLIRKMKLNNKVNYLEDGDTLFVLQKTKKAKTLYQIYDALCEQKN GFNKFINDFFVSDGEENTVFKQIINEKFQSEMEFLEKRISESEKKNEKLKKKLDSMKAHFRN INSEDTKEAYFWDIHSSRNYKTKYNERKNLVNEYTELLGSSKEKKLLREEITKINRQLLKLKQ EMEEITKKNSLFRLEYKMKIAFGFLFCEFDGNISKFKDEFDASNQEKIIQYHKNGEKYLTSFL KEEEKEKFNLEKMQKIIQKTEEEDWLLPETKNNLFKFYLLTYLLLPYELKGDFLGFVKKHYYD IKNVDFIDENQNNIQVSQTVEKQEDYFYHKIRLFEKNTKKYEIVKYSIVPNEKLKQYFEDLGI DIKYLTVEQKSEVSEEKNKKVSLKNNGMFNKTILLFVFKYYQIAFKLFNDIELYSLFFLREKSG KPLEIFRKELESKMKDGYLNFGQLLYVVYEVLVKNKDLDKILSKKIDYRKDKSFSPEIAYLRNF LSHLNYSKFLDNFMKINTNKSDENKEVLIPSIKIQKMIQFIEKCNLQNQIDFDFNFVNDFYM RKEKMFFIQLKQIFPDINSTEKQKMNEKEEILRNRYHLTDKKNEQIKDEHEAQSQLYEKILS LQKIYSSDKNNFYGRLKEEKLLFLEKQGKKKLSMEEIKDKIAGDISDLLGILKKEITRDIKDKLT EKFRYCEEKLLNLSFYNHQDKKKEESIRVFLIRDKNSDNFKFESILDDGSNKIFISKNGKEITIQ CCDKVLETLIIEKNTLKISSNGKIISLIPHYSYSIDVKY WP_027128616 MGKPNRSSIIKIIISNYDNKGIKEVKVRYNKQAQLDTFLIKSELKDGKFILYSIVDKAREKYRYS FEIDKTNINKNEILIIKKDIYSNKEDKVIRKYILSFEVSEKNDRTIVTKIKDCLETQKKEKFEREN TRRLISETERKLLSEETQKTYSKIACCSPEDIDSVKIYKIKRYLAYRSNMLLFFSLINDIFVKGVV KDNGEEVGEIWRIIDSKEIDEKKTYDLLVENFKKRMSQEFINYKQSIENKIEKNTNKIKEIEQ KLKKEKYKKEINRLKKQLIELNRENDLLEKDKIELSDEEIREDIEKILKIYSDLRHKLMHYNYQY FENLFENKKISKEKNEDVNLTELLDLNLFRYLPLVRQLKLENKTNYLEKEDKITVLGVSDSAIK YYSYYNFLCEQKNGFNNFINSFFSNDGEENKSFKEKINLSLEKEIEIMEKETNEKIKEINKNEL QLMKEQKELGTAYVLDIHSLNDYKISHNERNKNVKLQNDIMNGNRDKNALDKINKKLVEL KIKMDKITKRNSILRLKYKLQVAYGFLMEEYKGNIKKFKDEFDISKEKIKSYKSKGEKYLEVKS EKKYITKILNSIEDIHNITWLKNQEENNLFKFYVLTYILLPFEFRGDFLGFVKKHYYDIKNVEFL DENNDRLTPEQLEKMKNDSFFNKIRLFEKNSKKYDILKESILTSERIGKYFSLLNTGAKYFEY 
GGEENRGIFNKNIIIPIFKYYQIVLKLYNDVELAMLLTLSESDEKDINKIKELVTLKEKVSPKKI DYEKKYKFSVLLDCFNRIINLGKKDFLASEEVKEVAKTFTNLAYLRNKICHLNYSKFIDDLLTI DTNKSTTDSEGKLLINDRIRKLIKFIRENNQKMNISIDYNYINDYYMKKEKFIFGQRKQAKTII DSGKKANKRNKAEELLKMYRVKKENINLIYELSKKLNELTKSELFLLDKKLLKDIDFTDVKIKN KSFFELKNDVKEVANIKQALQKHSSELIGIYKKEVIMAIKRSIVSKLIYDEEKVLSIIIYDKTNKK YEDFLLEIRRERDINKFQFLIDEKKEKLGYEKIIETKEKKKVVVKIQNNSELVSEPRIIKNKDKK KAKTPEEISKLGILDLTNHYCFNLKITL WP_062624740 MEKFRRQNRNSIIKIIISNYDTKGIKELKVRYRKQAQLDTFIIKTEIVNNDIFIKSIIEKAREKYR YSFLFDGEEKYHFKNKSSVEIVKKDIFSQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTI LKDGRRSARREKSMTERKLIEEKVAKNYSLLANCPMEEVDSIKIYKIKRFLTYRSNMLLYFAS INSFLCEGIKGKDNETEEIWHLKDNDVRKEKVRENFKNKLIQSTENYNSSLKNQIEEKEKLL RKEFKKGAFYRTIIKKLQQERIKELSEKSLTEDCEKIIKLYSKLRHSLMHYDYQYFENLFENKK NDDLMKDLNLDLFKSLPLIRKMKLNNKVNYLEDGDTLFVLQKTKKAKTLYQIYDALCEQKN GFNKFINDFFVSDGEENTVFKQIINEKFQSEMEFLEKRISESEKKNEKLKKKLDSMKAHFRN INSEDTKEAYFWDIHSSRNYKTKYNERKNLVNEYTELLGSSKEKKLLREEITKINRQLLKLKQ EMEEITKKNSLFRLEYKMKIAFGFLFCEFDGNISKFKDEFDASNQEKIIQYHKNGEKYLTSFL KEEEKEKFNLEKMQKIIQKTEEEDWLLPETKNNLFKFYLLTYLLLPYELKGDFLGFVKKHYYD IKNVDFIDENQNNIQVSQTVEKQEDYFYHKIRLFEKNTKKYEIVKYSIVPNEKLKQYFEDLGI DIKYLTGSVESGEKWLGENLGIDIKYLTVEQKSEVSEEKNKKVSLKNNGMFNKTILLFVFKY YQIAFKLFNDIELYSLFFLREKSGKPLEIFRKELESKMKDGYLNFGOLLYVVYEVLVKNKDLD KILSKKIDYRKDKSFSPEIAYLRNFLSHLNYSKFLDNFMKINTNKSDENKEVLIPSIKIQKMIQF IEKCNLQNQIDFDFNFVNDFYMRKEKMFFIQLKQIFPDINSTEKQKMNEKEEILRNRYHLT DKKNEQIKDEHEAQSQLYEKILSLQKIYSSDKNNFYGRLKEEKLLFLEKQGKKKLSMEEIKDK IAGDISDLLGILKKEITRDIKDKLTEKFRYCEEKLLNLSFYNHQDKKKEESIRVFLIRDKNSDNF KFESILDDGSNKIFISKNGKEITIQCCDKVLETLIIEKNTLKISSNGKIISLIPHYSYSIDVKY WP_096402050 MENKNKPNRGSIVRIIISNYDMKGIKELKVRYRKQAQLDTFILQTTLDKSNNSILINDFRVK AREKYRYSFTYDGKEKFSVPSNSIIVTKIDNAAPEKSKEIRKYKITLGIDEKCKTGSMITAAIED LLEDDRVREGIRNPRRKASKTERKLITESICHNYAQITQCPVEEIDAVKIYKVKRFLSYRSNM LLFFALINDFLCKNLKNEKGEKINEIWEMENKGNNKKIDFDENYNILVAQIKEYFTKEIENY NNRIDNIIDKKELLKYSEEKEESEKNKKLEELNKLESQKLKILTDEEIKADVIKIIKIFSDLRHSL MHYEYKYFENLFENKKNEELAELLNLNLFKNLTLLRQMKIENKTNYLEGDEKFNILGKDVR 
AKNALGHYDLLVEQKNGFNNFINSFFVQDGTENLEFKKFIDENFIKAQKELEEDIKNCKESV KKLEKKLKENPKKSEDLEKKLEKKQKKLKELKKELEKMKQHYKRLNCAYVWDIHSSTVYKKL YNERKNLIEKYNKQLNGLQDKNAITGINAQLLRIKKEMEEITKSNSLFRLKYKMQIAYAFLE MEYEGNIAKFKNEFDCSKTEKIQEWLEKSEEYLNYCMEKEEDGKNYKFHFKEISEIKDTHN EEWLENTSENNLFKFYILTYLLLPMEFKGDFLGVVKKHYYDIKNVDFTDESEKELSQEQIDK MIGDSFFHKIRLFEKNTKRYEIIKYSILTSDEIKKYFELLELKVPYLEYKGIDEIGIFNKNIILPIFKY YQIIFRLYNDLEIHGLFNVSFDINKILSDLKSYGNENINFREFLYVIKQNNNSSTEEEYQKIWE KLESKYLKEPLLTPEKKEINKKTEKELKKLDGISFLRNKISHLEYEKIIEGVLKTAVNGENKKTSE TNADKVFLNEKIKKIINFIKENELDKIELGFNFINDFFMKKEQFMFGQIKQVKEGNSDSITTE RKRKEENNKRLKITYGLNYNNLSKIYEFSNTLREIVNSPLFLKDSTLLKKVDLSKVMLKEKPIC SLQYENNTKLEDDIKRILLKDSSDIMGIYKAEVVKKLKEKLVLIFKYDEEKKIYVTVYDTSKAV PENISKEILVKRNNSKEEYFFEDNKKKYTTQYYTLEITKENELKVIPAKKLEGKEFKTEKKEEN KLMLNNHYCFNVKIIY
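
The sequences in the tables above are given in single-letter amino acid code, with spaces introduced where the table columns wrap. As an illustration only, the following Python sketch joins the wrapped fragments of one entry and reports any character outside the 20 standard residue letters, which may help when transcribing an entry from the table. The function names `join_wrapped` and `nonstandard_positions` are hypothetical helpers and are not part of the disclosure; the example input is the opening of the EH019081 entry in Table 4.

```python
# Illustrative helpers (hypothetical names, not part of the disclosure).
# The 20 standard amino acids in single-letter code.
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def join_wrapped(fragments: str) -> str:
    """Remove the whitespace introduced by column wrapping in the tables."""
    return "".join(fragments.split())


def nonstandard_positions(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, character) for each non-standard letter."""
    return [
        (index, letter)
        for index, letter in enumerate(sequence, start=1)
        if letter.upper() not in STANDARD_RESIDUES
    ]


if __name__ == "__main__":
    # Opening fragments of the EH019081 entry as printed in Table 4.
    wrapped = (
        "MTEKKSIIFKNKSSVEIVKKDIFSQTPDNMIRNYKITLKISEKNPRVVEAEIEDLMNSTILKDG "
        "RRSARREKSMTERKLIEEKVAENYSLLANCPMEEVDSIKIYKIKRFLTYRSNMLLYFASINSFL"
    )
    sequence = join_wrapped(wrapped)
    print("length:", len(sequence))
    print("non-standard letters:", nonstandard_positions(sequence))
```

A flagged letter is only a prompt to check the entry against the electronic sequence listing, which remains the authoritative source for each SEQ ID No.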

In certain example embodiments, the CRISPR effector protein is a Cas13d protein selected from Table 5.

TABLE 5

RfxCas13d (SEQ ID NO: 233)
MIEKKKSFAKGMGVKSTLVSGSKVYMTTFAEGSDARLEKIVEGDSIRS VNEGEAFSAEMADKNAGYKIGNAKFSHPKGYAVVANNPLYTGPVQQ DMLGLKETLEKRYFGESADGNDNICIQVIHNILDIEKILAEYITNAAYA VNNISGLDKDIIGFGKFSTVYTYDEFKDPEHHRAAFNNNDKLINAIKA QYDEFDNFLDNPRLGYFGQAFFSKEGRNYIINYGNECYDILALLSGLR HWVVHNNEEESRISRTWLYNLDKNLDNEYISTLNYLYDRITNELTNSF SKNSAANVNYIAETLGINPAEFAEQYFRFSIMKEQKNLGFNITKLREV MLDRKDMSEIRKNHKVFDSIRTKVYTMMDFVIYRYYIEEDAKVAAA NKSLPDNEKSLSEKDIFVINLRGSFNDDQKDALYYDEANRIWRKLENI MHNIKEFRGNKTREYKKKDAPRLPRILPAGRDVSAFSKLMYALTMFL DGKEINDLLTTLINKFDNIQSFLKVMPLIGVNAKFVEEYAFFKDSAKIA DELRLIKSFARMGEPIADARRAMYIDAIRILGTNLSYDELKALADTFSL DENGNKLKKGKHGMRNFIINNVISNKRFHYLIRYGDPAHLHEIAKNEA VVKFVLGRIADIQKKQGQNGKNQIDRYYETCIGKDKGKSVSEKVDAL TKIITGMNYDQFDKKRSVIEDTGRENAEREKFKKIISLYLTVIYHILKNI VNINARYVIGFHCVERDAQLYKEKGYDINLKKLEEKGFSSVTKLCAGI DETAPDKRKDVEKEMAERAKESIDSLESANPKLYANYIKYSDEKKAE EFTRQINREKAKTALNAYLRNTKWNVIIREDLLRIDNKTCTLFRNKAV HLEVARYVHAYINDIAEVNSYFQLYHYIMQRIIMNERYEKSSGKVSEY FDAVNDEKKYNDRLLKLLCVPFGYCIPRFKNLSIEALFDRNEAAKFDK EKKKVSGNS

AdmCas13d (SEQ ID NO: 234)
MNNKRKTKAKAAGLKSVFFDQKQAVLTTFAKGNNSQIEKKVVNSEV KDLRQPPAFDLELKEKTFYISGKNNINTSRENPLASASLPLSKRQRIRA ERIKRAREENRPYHNVKRVGEDDLRAKADLEKHYFGKEYSDNLKIQII YNILDINKIISPYINDIVYSMNNLARNDEYIDGKIDVIGSLSSTTDYSSFM SPNKDLEKEKKFSFHRENYKKFVEASKPYMRYYGKVFIRDVKKSKLS TGKGEKIEVMYRSDEEIFTIFQILSYVRQSIMHNDIGNKSSILAIEKYPA RFVGFLSDLLKTKTNDVNRMFIDNNSQTNFWVLFSIFGLQDHTSGAD KICRNFYDFVIKADSKNLGFSLKKIRELMLDLPNANMLRDHQFDTVRS KFYTLLDFIIYQHYLEEKSRIDNMVEKLRMTLKEEEKEVLYAAEAKIV WNAIGAKVINKLVPMMNGDALKEIKRKNRDRKLPQSVIATVQVNSD ANVFSGLIYFLTLFLDGKEINEMVSNLITKFENIDSLLHVDREIYKSDEK DLDLEIEKLALFFKGVVRPNAKTDTGAGEISKSFSIFQSAERIIEELKFIK NVTRMDNEIFPSEGVFLDAANVLGVRGDDFDFSNEFVGDDLHSDANK KIINKINGTKEDRNLRNFIINNVVKSRRFQYIAREININTHYVKQLANNE TLNRFVLNKMGDAKIINRYYESISGNTPNIEVRSQIDYLVKRLRSFSFE DLNDVKQKVRPGTNESIEKEKKKALVGLCLTIQYLVYKNLVNINARY TTAFYCLERDSKLKGFGVDVWRDFESYTALTNHFIKEGYLPVRKAEIL RANLKHLDCEDGFKYYRNQVTHLNAIRVAYKYINEIKSVHSYFALYH YIMQRHLYDSLQAKAKDSSGFVIDALKKSFEHKIYSKDLLHVLHSPFG YNTARYKNLSIEALFDKNESRPEVNPLSTND

UrCas13d (SEQ ID NO: 235)
MAKKNKMKPRELREAQKKARQLKAAEINNNAAPAIAAMPAAEVIAP AAEKKKSSVKAAGMKSILVSENKMYITSFGKGNSAVLEYEVDNNDY NQTQLSSKDNSNIQLGGVNEVNITFSSKHGFESGVEINTSNPTHRSGES SPVRGDMLGLKSELEKRFFGKTFDDNIHIQLIYNILDIEKILAVYVTNIV YALNNMLGVKGSESHDDFIGYLSTNNIYDVFIDPDNSSLSDDKKANVR KSLSKFNALLKTKRLGYFGLEEPKTKDNRVSQAYKKRVYHMLAIVGQ IRQCVFHDKSGAKRFDLYSFINNIDPEYRDTLDYLVEERLKSINKDFIE DNKVNISLLIDMMKGYEADDIIRLYYDFIVLKSQKNLGFSIKKLREKM LDEYGFRFKDKQYDSVRSKMYKLMDFLLFCNYYRNDIAAGESLVRK LRFSMTDDEKEGIYADEAAKLWGKFRNDFENIADHMNGDVIKELGK ADMDFDEKILDSEKKNASDLLYFSKMIYMLTYFLDGKEINDLLTTLIS KFDNIKEFLKIMKSSAVDVECELTAGYKLFNDSQRITNELFIVKNIASM RKPAASAKLTMFRDALTILGIDDKITDDRISGILKLKEKGKGIHGLRNFI TNNVIESSRFVYLIKYANAQKIREVAKNEKVVMFVLGGIPDTQIERYY KSCVEFPDMNSSLGVKRSELARMIKNISFDDFKNVKQQAKGRENVAK ERAKAVIGLYLTVMYLLVKNLVNVNARYVIAIHCLERDFGLYKEIIPE LASKNLKNDYRILSQTLCELCDKSPNLFLKKNERLRKCVEVDINNADS SMTRKYRNCIAHLTVVRELKEYIGDICTVDSYFSIYHYVMQRCITKRE NDTKQEEKIKYEDDLLKNHGYTKDFVKALNSPFGYNIPRFKNLSIEQL FDRNEYLTEK

P1E0Cas13d (SEQ ID NO: 236)
MEREVKKPPKKSLAKAAGLKSTFVISPQEKELAMTAFGRGNDALLQK RIVDGVVRDVAGEKQQFQVQRQDESRFRLQNSRLADRTVTADDPLH RAETPRRQPLGAGMDQLRRKAILEQKYFGRTFDDNIHIQLIYNILDIHK MLAVPANHIVHTLNLLGGYGETDFVGMLPAGLPYDKLRVVKKKNGD TVDIKADIAAYAKRPQLAYLGAAFYDVTPGKSKRDAARGRVKREQD VYAILSLMSLLRQFCARDSVRIWGQNTTAALYHLQALPQDMKDLLD DGWRRALGGVNDHFLDTNKVNLLTLFEYYGAETKQARVALTQDFYR FVVLKEQKNMGFSLRRLREELLKLPDAAYLTGQEYDSVRQKLYMLL DFLLCRLYAQERADRCEELVSALRCALSDEEKDTVYQAEAAALWQA LGDTLRRKLLPLLKGKKLQDKDKKKSDELGLSRDVLDGVLFRPAQQG SRANADYFCRLMHLSTWFMDGKEINTLLTTLISKLENIDSLRSVLESM GLAYSFVPAYAMFDHSRYIAGQLRVVNNIARMRKPAIGAKREMYRA AVVLLGVDSPEAAAAITDDLLQIDPETGKVRPRSDSARDTGLRNFIAN NVVESRRFTYLLRYMTPEQARVLAQNEKLIAFVLSTVPDTQLERYCRT CGREDITGRPAQIRYLTAQIMGVRYESFTDVEQRGRGDNPKKERYKA LIGLYLTVLYLAVKNMVNCNARYVIAFYCRDRDTALYQKEVCWYDL EEDKKSGKQRQVEDYTALTRYFVSQGYLNRHACGYLRSNIVINGISNSL LTAYRNAVDHLNAIPPLGSLCRDIGRVDSYFALYHYAVQQYLNGRYY RKTPREQELFAAMAQHRTWCSDLVKALNTPFGYNLARYKNLSIDGLF DREGDHVVREDGEKPAE

RffCas13d (SEQ ID NO: 237)
MKKKMSLREKREAEKQAKKAAYSAASKNTDSKPAEKKAETPKPAEII SDNSRNKTAVKAAGLKSTIISGDKLYMTSFGKGNAAVIEQKIDINDYS FSAMKDTPSLEVDKAESKEISFSSHHPFVKNDKLTTYNPLYGGKDNPE KPVGRDMLGLKDKLEERYFGCTFNDNLHIQIIYNILDIEKILAVHSANI TTALDHMVDEDDEKYLNSDYIGYMNTINTYDVFMDPSKNSSLSPKDR KNIDNSRAKFEKLLSTKRLGYFGFDYDANGKDKKKNEEIKKRLYHLT AFAGQLRQWSFHSAGNYPRTWLYKLDSLDKEYLDTLDHYFDKRFND INDDFVTKNATNLYILKEVFPEANFKDIADLYYDFIVIKSHKNMGFSIK KLREKMLECDGADRIKEQDMDSVRSKLYKLIDFCIFKYYHEFPELSEK NVDILRAAVSDTKKDNLYSDEAARLWSIFKEKFLGFCDKIVVWVTGE HEKDITSVIDKDAYRNRSNVSYFSKLMYAMCFFLDGKEINDLLTTLIN KFDNIANQIKTAKELGINTAFVKNYDFFNHSEKYVDELNIVKNIARMK KPSSNAKKAMYHDALTILGIPEDMDEKALDEELDLILEKKTDPVTGKP LKGKNPLRNFIANNVIENSRFIYLIKFCNPENVRKIVNNTKVTEFVLKRI PDAQIERYYKSCTDSEMNPPTEKKITELAGKLKDMNFGNFRNVRQSA KENMEKERFKAVIGLYLTVVYRVVKNLVDVNSRYIMAFHSLERDSQL YNVSVDNDYLALTDTLVKEGDNSRSRYLAGNKRLRDCVKQDIDNAK KWFVSDKYNSITKYRNNVAHLTAVRNCAEFIGDITKIDSYFALYHYLI QRQLAKGLDHERSGFDRNYPQYAPLFKWHTYVKDVVKALNAPFGYN IPRFKNLSIDALFDRNEIKKNDGEKKSDD

RaCas13d (SEQ ID NO: 238)
MAKKSKGMSLREKRELEKQKRIQKAAVNSVNDTPEKTEEANVVSVN VRTSAENKHSKKSAAKALGLKSGLVIGDELYLTSFGRGNEAKLEKKIS GDTVEKLGIGAFEVAERDESTLTLESGRIKDKTARPKDPRHITVDTQG KFKEDMLGIRSVLEKKIFGKTFDDNIHVQLAYNILDVEKIMAQYVSDI VYMLHNTDKTERNDNLMGYMSIRNTYKTFCDTSNLPDDTKQKVENQ KREFDKIIKSGRLGYFGEAFMVNSGNSTKLRPEKEIYHIFALMASLRQS YFHGYVKDTDYQGTTWAYTLEDKLKGPSHEFRETIDKIFDEGFSKISK DFGKMNKVNLQILEQMIGELYGSIERQNLTCDYYDFIQLKKHKYLGFS IKRLRETMLETTPAECYKAECYNSERQKLYKLIDFLIYDLYYNRKPARI EEIVDKLRESVNDEEKESIYSVEAKYVYESLSKVLDKSLKNSVSGETIK DLQKRYDDETANRIWDISQHSISGNVNCFCKLIYIMTLMLDGKEINDL LTTLVNKFDNIASFIDVMDELGLEHSFTDNYKMFADSKAICLDLQFINS FARMSKIDDEKSKRQLFRDALVILDIGNKDETWINNYLDSDIFKLDKE GNKLKGARHDFRNFIANNVIKSSRFKYLVKYSSADGMIKLKTNEKLIG FVLDKLPETQIDRYYESCGLDNAVVDKKVRIEKLSGLIRDMKFDDFSG VKTSNKAGDNDKQDKAKYQAIISLYLMVLYQIVKNMIYVNSRYVIAF HCLERDFGMYGKDFGKYYQGCRKLTDHFIEEKYMKEGKLGCNKKV GRYLKNNISCCTDGLINTYRNQVDHFAVVRKIGNYAAYIKSIGSWFEL YHYVIQRIVFDEYRFALNNTESNYKNSIIKHHTYCKDMVKALNTPFGY DLPRYKNLSIGDLFDRNNYLNKTKESIDANSSIDSQ

EsCas13d (SEQ ID NO: 239)
MGKKIHARDLREQRKTDRTEKFADQNKKREAERAVPKKDAAVSVKS VSSVSSKKDNVTKSMAKAAGVKSVFAVGNTVYMTSFGRGNDAVLEQ KIVDTSHEPLNIDDPAYQLNVVTMNGYSVTGHRGETVSAVTDNPLRR FNGRKKDEPEQSVPTDMLCLKPTLEKKFFGKEFDDNIHIQLIYNILDIE KILAVYSTNAIYALNNMSADENIENSDFFMKRTTDETFDDFEKKKEST NSREKADFDAFEKFIGNYRLAYFADAFYVNKKNPKGKAKNVLREDK ELYSVLTLIGKLRHWCVHSEEGRAEFWLYKLDELKDDFKNVLDVVY NRPVEEINNRFIENNKVNIQILGSVYKNTDIAELVRSYYEFLITKKYKN MGFSIKKLRESMLEGKGYADKEYDSVRNKLYQMTDFILYTGYINEDS DRADDLVNTLRSSLKEDDKTTVYCKEADYLWKKYRESIREVADALD GDNIKKLSKSNIEIQEDKLRKCFISYADSVSEFTKLIYLLTRFLSGKEIN DLVTTLINKFDNIRSFLEIMDELGLDRTFTAEYSFFEGSTKYLAELVEL NSFVKSCSFDINAKRTMYRDALDILGIESDKTEEDIEKMIDNILQIDAN GDKKLKKNNGLRNFIASNVIDSNRFKYLVRYGNPKKIRETAKCKPAV RFVLNEIPDAQIERYYEACCPKNTALCSANKRREKLADMIAEIKFENFS DAGNYQKANVTSRTSEAEIKRKNQAIIRLYLTVMYIMLKNLVNVNAR YVIAFHCVERDTKLYAESGLEVGNIEKNKTNLTMAVMGVKLENGIIK TEFDKSFAENAANRYLRNARWYKLILDNLKKSERAVVNEFRNTVCHL NAIRNININIKEIKEVENYFALYHYLIQKHLENRFADKKVERDTGDFIS KLEEHKTYCKDFVKAYCTPFGYNLVRYKNLTIDGLFDKNYPGKDDSD EQK

Cas13 Variants and Mutations

The present disclosure provides for variants and mutated forms of Cas proteins. In some examples, the present disclosure includes variants and mutated forms of Cas13, e.g., Cas13b. The variants or mutated forms of the Cas protein may be catalytically inactive, e.g., have no or reduced nuclease activity compared to a corresponding wildtype. In certain examples, the variants or mutated forms of the Cas protein have nickase activity.

Mutations of Cas13

In some cases, the present disclosure provides for mutated Cas13 proteins comprising one or more modified amino acids, wherein the amino acids: (a) interact with a guide RNA that forms a complex with the mutated Cas13 protein; (b) are in a HEPN active site, an inter-domain linker domain, or a bridge helix domain of the mutated Cas13 protein; or a combination thereof.

The term “corresponding amino acid” or “residue which corresponds to” refers to a particular amino acid or analogue thereof in a Cas13 homologue or orthologue that is identical or functionally equivalent to an amino acid in a reference Cas protein. Accordingly, as used herein, referral to an “amino acid position corresponding to amino acid position [X]” of a specified Cas13 protein represents referral to a collection of equivalent positions in other recognized Cas13 proteins and structural homologues and families. The mutations described herein apply to all Cas13 proteins that are orthologs or homologs of the referred Cas protein (e.g., PbCas13b). For example, the mutations apply to Cas13a, Cas13b, Cas13c, Cas13d, Cas13b-t1, Cas13b-t2, or Cas13b-t3.
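By way of non-limiting illustration, when corresponding amino acids are identified from a sequence alignment, the mapping from a reference position to an orthologue position may be sketched as follows. The function name `corresponding_position` and the two toy aligned rows are hypothetical and are not Cas13 sequences; the sketch assumes a pairwise alignment has already been computed by any suitable tool and does not capture structure-based or functional equivalence.

```python
def corresponding_position(ref_aln: str, orth_aln: str, ref_pos: int):
    """Map a 1-based residue position in the reference to the orthologue.

    ref_aln and orth_aln are the two rows of one pairwise alignment
    (equal length, '-' marks a gap). Returns (orthologue_position,
    orthologue_residue), or None if the reference residue aligns to a gap.
    """
    if len(ref_aln) != len(orth_aln):
        raise ValueError("aligned rows must have equal length")
    ref_i = 0   # residues consumed so far in the reference row
    orth_i = 0  # residues consumed so far in the orthologue row
    for ref_c, orth_c in zip(ref_aln, orth_aln):
        if ref_c != "-":
            ref_i += 1
        if orth_c != "-":
            orth_i += 1
        if ref_c != "-" and ref_i == ref_pos:
            return None if orth_c == "-" else (orth_i, orth_c)
    raise ValueError("ref_pos is outside the reference sequence")


# Toy alignment (hypothetical, for illustration only).
REF_ROW = "MK-RHNAE"
ORTH_ROW = "MKTRH-AE"

print(corresponding_position(REF_ROW, ORTH_ROW, 3))  # (4, 'R')
print(corresponding_position(REF_ROW, ORTH_ROW, 5))  # None (aligned to a gap)
```

In this toy case, reference residue R3 corresponds to R4 of the orthologue because the orthologue carries one extra residue upstream, while reference residue N5 has no counterpart.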

In an aspect, the invention relates to a mutated Cas13 protein comprising one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R53, K943, R1041, Y164, R285, R287, K292, E296, N297, Q646, N647, R402, K393, N653, N652, R482, N480, D396, E397, D398, E399, K294, E400, R56, N157, H161, H452, N455, K484, N486, G566, H567, A656, V795, A796, W842, K871, E873, R874, R1068, N1069, or H1073.

PbCas13b as used herein preferably has the sequence of NCBI Reference Sequence WP_004343973.1. It is to be understood that WP_004343973.1 refers to the wild type (i.e. unmutated) PbCas13b. LshCas13a (Leptotrichia shahii Cas13a) as used herein preferably has the sequence of NCBI Reference Sequence WP_018451595.1. It is to be understood that WP_018451595.1 refers to the wild type (i.e. unmutated) LshCas13a. Pgu Cas13b (Porphyromonas gulae Cas13b) as used herein preferably has the sequence of NCBI Reference Sequence WP_039434803.1. It is to be understood that WP_039434803.1 refers to the wild type (i.e. unmutated) Pgu Cas13b. Psp Cas13b (Prevotella sp. P5-125 Cas13b) as used herein preferably has the sequence of NCBI Reference Sequence WP_044065294.1. It is to be understood that WP_044065294.1 refers to the wild type (i.e. unmutated) Psp Cas13b.

In embodiments of the invention, a Type VI system comprises a mutated Cas13 effector protein according to the invention as described herein (and optionally a small accessory protein encoded upstream or downstream of a Cas13b effector protein). In certain embodiments, the small accessory protein enhances the Cas13b effector's ability to target RNA.

Insights from the structure of Cas13 enable further rational engineering to improve functionality for RNA targeting specificity, base editing, nucleic acid detection, etc. Based on the elucidated crystal structure of the Cas13 effector with its crRNA described herein, functional implications of rational engineering and mutagenesis can be postulated, of which non-limiting mutations are exemplified in Table 6 below (with reference to PbCas13b; WP_004343973.1).

TABLE 6
Residue | Description | Expected result
T405 | coordinates first base of guide (U) | alter activity
H407 | base stacking with UO | possible PFS involvement
H407Y/W/F | base stacking with UO | alter PFS
K457 | direct readout of A31 |
H500 | hydrogen bond with bb of G11 | alter activity
K570 | direct readout of G25 | alter activity
K590 | bb of U27 | alter activity
N634 | bb of A29 | alter activity
R638 | bb of A28 | alter activity
N652 | direct readout of U2 and C36 | alter activity
N653 | direct readout of C36 | alter activity
K655 | hydrogen bonds with bb of na 3 | alter activity
S658 | coordinates first base of guide | alter activity
K741 | direct readout of U27 | alter activity
K744 | hydrogen bonds with bb of na 6 | alter activity
N756 | direct readout of C33 and C5 | alter activity
S757 | direct readout of A32 | alter activity
R762 | hydrogen bond with bb of G10 | alter activity
R791 | bb of A22 | alter activity
K846 | hydrogen bond with bb of U18 | alter activity
K857 | hydrogen bond with bb of C15 | alter activity
K870 | hydrogen bond with base of U19 | alter activity
R877 | direct readout of U18 | alter activity
Channels
K183 | Outer channel rim | alter activity
K193 | Outer channel rim | alter activity
R600 | Outer channel rim | alter activity
K607 | Outer channel rim | alter activity
K612 | Outer channel rim | alter activity
R614 | Outer channel rim | alter activity
K617 | Outer channel rim | alter activity
K826 | Bridge helix domain | alter activity
K828 | Bridge helix domain | alter activity
K829 | Bridge helix domain | alter activity
R824 | Bridge helix domain | alter activity
R830 | Bridge helix domain | alter activity
Q831 | Bridge helix domain | alter activity
K835 | Bridge helix domain | alter activity
K836 | Bridge helix domain | alter activity
R838 | Bridge helix domain | alter activity
R618 | conserved outer channel arginine | alter activity
D434 | Conserved loop | alter activity
K431 | Conserved loop | alter activity
Active site pocket
46-57 | HEP1 |
73-79 | HEP1 |
152-164 | HEP1 |
1036-1046 | HEP2 |
1064-1074 | HEP2 |
R53A/K/D/E | HEP1 | change in base specificity
K943A/R/D/E | HEP2 | change in base specificity
R1041A/K/D/E | HEP2 | change in base specificity
Y164A/F/W | | affect base stacking at active site
Interdomain linker 285-299
R285 | central channel active pocket | alter activity
R287 | central channel active pocket | alter activity
K292 | central channel active pocket | alter activity
E296 | central channel active pocket | alter activity
N297 | central channel active pocket | alter activity
Other | Trans active site loop | alter activity
Q646 | Trans active site loop | alter activity
N647 | Trans active site loop | alter activity
HEPN interface crRNA processing
R402 | remove crRNA processing | alter crRNA processing
K393 | remove crRNA processing | alter crRNA processing
N653 | remove crRNA processing | alter crRNA processing
N652 | remove crRNA processing | alter crRNA processing
R482 | remove crRNA processing | alter crRNA processing
N480 | remove crRNA processing | alter crRNA processing
LID domain
D396 | hairpin with unknown function | alter crRNA processing
E397 | hairpin with unknown function | alter crRNA processing
D398 | hairpin with unknown function | alter crRNA processing
E399 | hairpin with unknown function | alter crRNA processing
K294 | IDL | alter activity
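By way of non-limiting illustration, residue entries of Table 6 such as T405 or R53A/K/D/E may be read as a wild-type residue, a position in PbCas13b, and an optional set of replacement residues. The following sketch parses that notation; the names `parse_entry` and `expand_entry` are hypothetical helpers, and range entries such as 46-57 are deliberately rejected rather than interpreted.

```python
import re

# One-letter wild-type residue, position, optional slash-separated substitutions.
_ENTRY = re.compile(r"^([A-Z])(\d+)((?:[A-Z](?:/[A-Z])*)?)$")


def parse_entry(entry: str):
    """Split an entry such as 'R53A/K/D/E' into ('R', 53, ['A', 'K', 'D', 'E'])."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"not a residue entry: {entry!r}")
    wild_type, position, subs = match.groups()
    substitutions = subs.split("/") if subs else []
    return wild_type, int(position), substitutions


def expand_entry(entry: str):
    """List the individual point mutations named by one entry."""
    wild_type, position, substitutions = parse_entry(entry)
    return [f"{wild_type}{position}{sub}" for sub in substitutions]


print(parse_entry("T405"))         # ('T', 405, [])
print(expand_entry("R53A/K/D/E"))  # ['R53A', 'R53K', 'R53D', 'R53E']
```

An entry without listed substitutions, such as T405, yields an empty substitution list, reflecting that the table names the position without prescribing a replacement.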

Structural (Sub)Domains

In another aspect, the disclosure provides a mutated Cas13 protein comprising one or more mutations of amino acids, wherein the amino acids: interact with a guide RNA that forms a complex with the engineered Cas13 protein; or are in a HEPN active site, a lid domain, a helical domain selected from a helical 1 or a helical 2 domain, an inter-domain linker (IDL) domain, or a bridge helix domain of the mutated Cas13 protein, or a combination thereof.

Based on the crystal structure of the Cas protein, different structural domains can be identified. In addition to sequence alignments, the information of the crystal structure and domain architecture allows corresponding amino acids of different orthologues (e.g. Cas13b orthologues) and homologues (other Cas13 proteins, such as Cas13a, Cas13c, or Cas13d) to be identified. By means of example, and without limitation, the crystal structure of PbCas13b in complex with crRNA as reported herein identifies the following structural domains (see also FIG. 1A): HEPN1 and HEPN2 (catalytic domains, respectively spanning from amino acids 1 to 285 and 930 to 1127); IDL (interdomain linker, spanning from amino acids 286 to 301); helical domains 1 and 2, whereby helical domain 1 is split into helical domains 1-1, 1-2, and 1-3 (respectively spanning from amino acids 302 to 374, 499 to 581, and 747 to 929), and helical domain 2 spans from amino acids 582 to 746; and LID (spanning from amino acids 375 to 498). Helical domain 1, in particular helical domain 1-3, encompasses a bridge helix as a discernible subdomain. Accordingly, particular mutations according to the invention as described herein, apart from having a specified amino acid position in the Cas13 polypeptide, can also be linked to a particular structural domain of the Cas13 protein. Hence a corresponding amino acid in a Cas13 orthologue or homologue can have a specified amino acid position in the Cas13 polypeptide as well as belong to a corresponding structural domain (see also for instance FIG. 4 as an example of corresponding amino acids in HEPN1 and HEPN2 of Cas13a and Cas13b). Mutations may be identified by locations in structural (sub)domains, by position corresponding to amino acids of a particular Cas13 protein (e.g. PbCas13b), by interactions with a guide RNA, or a combination thereof.
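By way of non-limiting illustration, the PbCas13b domain boundaries recited above may be used to assign a residue position to its structural domain. In the sketch below, the ranges are taken from the preceding paragraph, while the names `PBCAS13B_DOMAINS`, `domain_of`, and `group_by_domain` are hypothetical; the bridge helix is not resolved separately because its boundaries are not recited in this paragraph.

```python
# (domain name, first residue, last residue), 1-based and inclusive,
# as recited for PbCas13b in the paragraph above.
PBCAS13B_DOMAINS = [
    ("HEPN1", 1, 285),
    ("IDL", 286, 301),
    ("helical 1-1", 302, 374),
    ("LID", 375, 498),
    ("helical 1-2", 499, 581),
    ("helical 2", 582, 746),
    ("helical 1-3", 747, 929),
    ("HEPN2", 930, 1127),
]


def domain_of(position: int) -> str:
    """Return the structural domain containing a PbCas13b position."""
    for name, start, end in PBCAS13B_DOMAINS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside 1-1127")


def group_by_domain(residues):
    """Group residue labels such as 'W842' by structural domain."""
    groups = {}
    for residue in residues:
        position = int(residue[1:])  # drop the one-letter residue code
        groups.setdefault(domain_of(position), []).append(residue)
    return groups


print(domain_of(842))  # helical 1-3
print(group_by_domain(["K393", "N652", "H500", "R56", "R1068"]))
```

Under these ranges, W842 falls in helical domain 1-3, N652 in helical domain 2, K393 in the LID domain, and R56 and R1068 in HEPN1 and HEPN2 respectively, which agrees with the domain assignments recited elsewhere herein.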

The types of mutations can be conservative mutations or non-conservative mutations. In certain preferred embodiments, the amino acid which is mutated is mutated into alanine (A). In certain preferred embodiments, if the amino acid to be mutated is an aromatic amino acid, it is mutated into alanine or another aromatic amino acid (e.g. H, Y, W, or F). In certain preferred embodiments, if the amino acid to be mutated is a charged amino acid, it is mutated into alanine or another charged amino acid (e.g. H, K, R, D, or E). In certain preferred embodiments, if the amino acid to be mutated is a charged amino acid, it is mutated into alanine or another charged amino acid having the same charge. In certain preferred embodiments, if the amino acid to be mutated is a charged amino acid, it is mutated into alanine or another charged amino acid having the opposite charge.
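By way of non-limiting illustration, the substitution preferences described above may be enumerated programmatically. In the sketch below, the aromatic and charged groupings follow the examples given in the preceding paragraph; the split of charged residues into positive (H, K, R) and negative (D, E), including the treatment of histidine as positively charged, is an assumption of this sketch, and the name `candidate_substitutions` is hypothetical.

```python
AROMATIC = ("H", "Y", "W", "F")  # aromatic examples given above
POSITIVE = ("H", "K", "R")       # assumption: H counted as positively charged
NEGATIVE = ("D", "E")


def candidate_substitutions(residue: str, charge_rule: str = "any"):
    """List candidate replacement residues for one wild-type residue.

    charge_rule: 'any', 'same', or 'opposite'; it only affects charged residues.
    Alanine is always offered first (unless the residue is already alanine).
    """
    if charge_rule not in ("any", "same", "opposite"):
        raise ValueError("charge_rule must be 'any', 'same', or 'opposite'")
    candidates = [] if residue == "A" else ["A"]
    if residue in AROMATIC:
        candidates += [aa for aa in AROMATIC if aa != residue]
    if residue in POSITIVE or residue in NEGATIVE:
        if residue in POSITIVE:
            same, opposite = POSITIVE, NEGATIVE
        else:
            same, opposite = NEGATIVE, POSITIVE
        pool = {"any": same + opposite, "same": same, "opposite": opposite}[charge_rule]
        candidates += [aa for aa in pool if aa != residue and aa not in candidates]
    return candidates


print(candidate_substitutions("K", "same"))      # ['A', 'H', 'R']
print(candidate_substitutions("K", "opposite"))  # ['A', 'D', 'E']
print(candidate_substitutions("Y"))              # ['A', 'H', 'W', 'F']
```

The substitutions exemplified elsewhere herein, such as R53A/K/D/E and Y164A/F/W, are members of the candidate sets this sketch returns for R and Y respectively.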

The invention also provides for methods and compositions wherein one or more amino acid residues of the effector protein may be modified, e.g., in an engineered or non-naturally-occurring effector protein or Cas13. In an embodiment, the modification may comprise mutation of one or more amino acid residues of the effector protein. The one or more mutations may be in one or more catalytically active domains of the effector protein, or a domain interacting with the crRNA (such as the guide sequence or direct repeat sequence). The effector protein may have reduced or abolished nuclease activity or alternatively increased nuclease activity compared with an effector protein lacking said one or more mutations. The effector protein may not direct cleavage of the RNA strand at the target locus of interest. In a preferred embodiment, the one or more mutations may comprise two mutations. In a preferred embodiment the one or more amino acid residues are modified in a Cas13b effector protein, e.g., an engineered or non-naturally-occurring effector protein or Cas13b. In some cases, the CRISPR-Cas protein comprises one or more mutations in the helical domain.

The Cas13 protein herein may comprise one or more mutations. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R53, K943, R1041, Y164, R285, R287, K292, E296, N297, Q646, N647, R402, K393, N653, N652, R482, N480, D396, E397, D398, E399, K294, E400, R56, N157, H161, H452, N455, K484, N486, G566, H567, A656, V795, A796, W842, K871, E873, R874, R1068, N1069, or H1073.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R53, K943, R1041, Y164, R285, R287, K292, E296, N297, Q646, N647, R402, K393, N653, N652, R482, N480, D396, E397, D398, E399, K294, E400, R56, N157, H161, H452, N455, K484, N486, G566, H567, W842, K871, E873, R874, R1068, N1069, or H1073.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R53, K943, R1041, Y164, R285, R287, K292, E296, N297, Q646, N647, R402, K393, N653, N652, R482, N480, D396, E397, D398, E399, K294, or E400.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K393, R402, N482, T405, H407, S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, R56, N157, H161, R1068, N1069, or H1073. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N482, H407, S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, R56, N157, H161, R1068, N1069, or H1073.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: W842, K846, K870, E873, or R877. In some cases, the Cas13 protein comprises in helical domain 1 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: W842, K846, K870, E873, or R877. In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: W842, K846, K870, E873, or R877. In some cases, the Cas13 protein comprises in the helical bridge domain one or more mutations of an amino acid corresponding to the following amino acids in the helical bridge domain of PbCas13b: W842, K846, K870, E873, or R877. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N480, N482, N652, or N653. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N480, or N482. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: K393, R402, N480, or N482. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: N652 or N653. In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of PbCas13b: N652 or N653.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: T405, H407, S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, or K741. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: H407, S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, or K741. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, A656, K655, N652, H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, S757, N756, or K741. In some cases, the Cas13 protein comprises in a helical domain one or more mutations of an amino acid corresponding to the following amino acids in a helical domain of PbCas13b: S658, N653, A656, K655, N652, H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, S757, N756, or K741.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, S757, or N756. In some cases, the Cas13 protein comprises in helical domain 1 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, S757, or N756. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, R762, V795, A796, R791, G566, S757, or N756. In some cases, the Cas13 protein comprises in helical domain 1 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: H567, H500, R762, V795, A796, R791, G566, S757, or N756.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K871, K857, K870, W842, E873, R877, K846, or R874. In some cases, the Cas13 protein comprises in the helical bridge domain one or more mutations of an amino acid corresponding to the following amino acids in the helical bridge domain of PbCas13b: K871, K857, K870, W842, E873, R877, K846, or R874. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, or G566. In some cases, the Cas13 protein comprises in helical domain 1-2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-2 of PbCas13b: H567, H500, or G566. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, S757, or N756. In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, S757, or N756. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: R762, V795, A796, R791, S757, or N756. In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: R762, V795, A796, R791, S757, or N756. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, A656, K655, N652, K590, R638, or K741.
In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of PbCas13b: S658, N653, A656, K655, N652, K590, R638, or K741. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: T405, H407, N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: T405, H407, N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, K655, N652, H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, S757, N756, or K741.

In some cases, the Cas13 protein comprises in a helical domain one or more mutations of an amino acid corresponding to the following amino acids in a helical domain of PbCas13b: S658, N653, K655, N652, H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, S757, N756, or K741. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, S757, or N756. In some cases, the Cas13 protein comprises in helical domain 1 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: H567, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, S757, or N756. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: H567, H500, R762, R791, G566, S757, or N756. In some cases, the Cas13 protein comprises in helical domain 1 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1 of PbCas13b: H567, H500, R762, R791, G566, S757, or N756. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, S757, or N756. In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, S757, or N756. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: R762, R791, S757, or N756. 
In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of PbCas13b: R762, R791, S757, or N756.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, K655, N652, K590, R638, or K741. In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of PbCas13b: S658, N653, K655, N652, K590, R638, or K741.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: H407, N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: H407, N486, K484, N480, H452, N455, or K457.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: R56, N157, H161, R1068, N1069, or H1073. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of PbCas13b: R56, N157, H161, R1068, N1069, or H1073.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: R56, N157, or H161. In some cases, the Cas13 protein comprises in HEPN domain 1 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 1 of PbCas13b: R56, N157, or H161. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: R1068, N1069, or H1073. In some cases, the Cas13 protein comprises in HEPN domain 2 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 2 of PbCas13b: R1068, N1069, or H1073.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N482, T405, H407, N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: K393, R402, N482, T405, H407, N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N482, H407, N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: K393, R402, N482, H407, N486, K484, N480, H452, N455, or K457.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: T405, H407, S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, K393, R402, or N482. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: H407, S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, K393, R402, or N482.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, or K741. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, or K741.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: K393, R402, N482, N486, K484, N480, H452, N455, or K457. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of PbCas13b: K393, R402, N482, N486, K484, N480, H452, N455, or K457.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, A656, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, V795, A796, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, K393, R402, or N482. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of PbCas13b: S658, N653, K655, N652, H567, N455, H500, K871, K857, K870, W842, E873, R877, K846, R874, R762, R791, G566, K590, R638, H452, S757, N756, N486, K484, N480, K457, K741, K393, R402, or N482.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, or R1041. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53 or Y164.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K943 or R1041. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, or R1041. In some cases, the Cas13 protein comprises in HEPN domain 1 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): R53 or Y164. In some cases, the Cas13 protein comprises in HEPN domain 2 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 2 of Prevotella buccae Cas13b (PbCas13b): K943 or R1041.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, R1041, R56, N157, H161, R1068, N1069, or H1073.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, R56, N157, or H161. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K943, R1041, R1068, N1069, or H1073. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, R1041, R56, N157, H161, R1068, N1069, or H1073. In some cases, the Cas13 protein comprises in HEPN domain 1 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): R53, Y164, R56, N157, or H161. In some cases, the Cas13 protein comprises in HEPN domain 2 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 2 of Prevotella buccae Cas13b (PbCas13b): K943, R1041, R1068, N1069, or H1073.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, K943, or R1041. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, or K193. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K943 or R1041. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, K943, or R1041.

In some cases, the Cas13 protein comprises in HEPN domain 1 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, or K193. In some cases, the Cas13 protein comprises in HEPN domain 2 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 2 of Prevotella buccae Cas13b (PbCas13b): K943 or R1041. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, K943, R1041, R56, N157, H161, R1068, N1069, or H1073. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, R56, N157, or H161. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K943, R1041, R1068, N1069, or H1073. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, K943, R1041, R56, N157, H161, R1068, N1069, or H1073. In some cases, the Cas13 protein comprises in HEPN domain 1 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K183, K193, R56, N157, or H161.

In some cases, the Cas13 protein comprises in HEPN domain 2 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 2 of Prevotella buccae Cas13b (PbCas13b): K943, R1041, R1068, N1069, or H1073. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K183 or K193. In some cases, the Cas13 protein comprises in HEPN domain 1 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b): K183 or K193.
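For ease of reference, the HEPN domain assignments recited above may be collected into a simple lookup. The following is a minimal sketch; the residue labels are those recited above for HEPN domain 1 and HEPN domain 2 of PbCas13b, while the names `HEPN_DOMAIN_RESIDUES` and `hepn_domain_of` are hypothetical.

```python
from typing import Optional

# PbCas13b residues recited above, grouped by HEPN domain.
HEPN_DOMAIN_RESIDUES = {
    1: ("R53", "Y164", "K183", "K193", "R56", "N157", "H161"),
    2: ("K943", "R1041", "R1068", "N1069", "H1073"),
}


def hepn_domain_of(residue: str) -> Optional[int]:
    """Return 1 or 2 for a recited HEPN residue, or None if not listed."""
    for domain, residues in HEPN_DOMAIN_RESIDUES.items():
        if residue in residues:
            return domain
    return None


print(hepn_domain_of("K193"))   # 1
print(hepn_domain_of("H1073"))  # 2
```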

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, or R1041. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, Y164, K943, or R1041. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, K943, or R1041; preferably R53A, R53K, R53D, or R53E; K943A, K943R, K943D, or K943E; or R1041A, R1041K, R1041D, or R1041E.

In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53, K943, or R1041; preferably R53A, R53K, R53D, or R53E; K943A, K943R, K943D, or K943E; or R1041A, R1041K, R1041D, or R1041E.
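Substitutions such as R53A or R1041E follow the conventional notation of wild-type residue, position, and replacement residue. The following is a minimal sketch of reading that notation and applying a substitution to a sequence string. The helper names `parse_mutation` and `apply_mutation` and the toy sequence are hypothetical, and the sketch assumes single-letter amino acid codes and 1-based numbering.

```python
import re
from typing import Tuple

# One wild-type letter, a 1-based position, one replacement letter (e.g. R53A).
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'R53A' into ('R', 53, 'A')."""
    match = _MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutation(sequence: str, label: str) -> str:
    """Return a copy of sequence carrying the substitution named by label."""
    wild_type, position, replacement = parse_mutation(label)
    if position < 1 or position > len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + replacement + sequence[position:]


print(parse_mutation("R1041E"))        # ('R', 1041, 'E')
print(apply_mutation("MKRTY", "R3A"))  # MKATY (toy sequence, not PbCas13b)
```

The wild-type check in `apply_mutation` is a design choice of the sketch: it rejects a label whose stated wild-type residue does not match the sequence, which guards against numbering that is off by one or taken from a different orthologue.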

In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid Y164 of Prevotella buccae Cas13b (PbCas13b), preferably Y164A, Y164F, or Y164W. In some cases, the Cas13 protein comprises in HEPN domain 1 a mutation of an amino acid corresponding to amino acid Y164 in HEPN domain 1 of Prevotella buccae Cas13b (PbCas13b), preferably Y164A, Y164F, or Y164W. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, D434, K431, R402, K393, R482, N480, D396, E397, D398, or E399.

In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of Prevotella buccae Cas13b (PbCas13b): T405, H407, K457, D434, K431, R402, K393, R482, N480, D396, E397, D398, or E399. In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H407 of Prevotella buccae Cas13b (PbCas13b), preferably H407Y, H407W, or H407F. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R402, K393, R482, N480, D396, E397, D398, or E399. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of Prevotella buccae Cas13b (PbCas13b): R402, K393, R482, N480, D396, E397, D398, or E399. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K457, D434, or K431. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of Prevotella buccae Cas13b (PbCas13b): K457, D434, or K431.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, Q646, N647, N653, or N652. In some cases, the Cas13 protein comprises in a helical domain one or more mutations of an amino acid corresponding to the following amino acids in a helical domain of Prevotella buccae Cas13b (PbCas13b): H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, Q646, N647, N653, or N652. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H500, K570, N756, S757, R762, R791, K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some cases, the Cas13 protein comprises in helical domain 1 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1 of Prevotella buccae Cas13b (PbCas13b): H500, K570, N756, S757, R762, R791, K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H500, K570, N756, S757, R762, or R791. In some cases, the Cas13 protein comprises in helical domain 1 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1 of Prevotella buccae Cas13b (PbCas13b): H500, K570, N756, S757, R762, or R791. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some cases, the Cas13 protein comprises in the helical bridge domain one or more mutations of an amino acid corresponding to the following amino acids in the helical bridge domain of Prevotella buccae Cas13b (PbCas13b): K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): H500 or K570. In some cases, the Cas13 protein comprises in helical domain 1-2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-2 of Prevotella buccae Cas13b (PbCas13b): H500 or K570.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, R791, K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, R791, K846, K857, K870, R877, K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, or R791. In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, or R791. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, R791, K846, K857, K870, or R877. In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of Prevotella buccae Cas13b (PbCas13b): N756, S757, R762, R791, K846, K857, K870, or R877.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K826, K828, K829, R824, R830, Q831, K835, K836, or R838. In some cases, the Cas13 protein comprises in helical domain 1-3 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 1-3 of Prevotella buccae Cas13b (PbCas13b): K826, K828, K829, R824, R830, Q831, K835, K836, or R838.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K590, N634, R638, N652, N653, K655, S658, K741, K744, R600, K607, K612, R614, K617, R618, Q646, N647, N653, or N652. In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): K590, N634, R638, N652, N653, K655, S658, K741, K744, R600, K607, K612, R614, K617, R618, Q646, N647, N653, or N652. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): Q646 or N647. In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): Q646 or N647. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N653 or N652. In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): N653 or N652. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K590, N634, R638, N652, N653, K655, S658, K741, or K744. In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): K590, N634, R638, N652, N653, K655, S658, K741, or K744. 
In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R600, K607, K612, R614, K617, or R618. In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): R600, K607, K612, R614, K617, or R618. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R285, R287, K292, E296, N297, or K294. In some cases, the Cas13 protein comprises in the IDL domain one or more mutations of an amino acid corresponding to the following amino acids in the IDL domain of Prevotella buccae Cas13b (PbCas13b): R285, R287, K292, E296, N297, or K294. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R285, K292, E296, or N297. In some cases, the Cas13 protein comprises in the IDL domain one or more mutations of an amino acid corresponding to the following amino acids in the IDL domain of Prevotella buccae Cas13b (PbCas13b): R285, K292, E296, or N297.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): T405, H500, K570, K590, N634, R638, N652, N653, K655, S658, K741, K744, N756, S757, R762, R791, K846, K857, K870, R877, K183, K193, R600, K607, K612, R614, K617, K826, K828, K829, R824, R830, Q831, K835, K836, R838, R618, D434, K431, R285, R287, K292, E296, N297, Q646, N647, or K294. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R402, K393, N653, N652, R482, N480, D396, E397, D398, or E399. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53, K655, R762, or R1041; preferably R53A or R53D; K655A; R762A; or R1041E or R1041D. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N297, E296, K292, or R285; preferably N297A, E296A, K292A, or R285A. In some cases, the Cas13 protein comprises in (e.g., the central channel of) the IDL domain one or more mutations of an amino acid corresponding to the following amino acids in (e.g., the central channel of) the IDL domain of Prevotella buccae Cas13b (PbCas13b): N297, E296, K292, or R285; preferably N297A, E296A, K292A, or R285A. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): Q831, K836, R838, N652, N653, R830, K655 or R762; preferably Q831A, K836A, R838A, N652A, N653A, R830A, K655A, or R762A.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): N652, N653, R830, K655 or R762; preferably N652A, N653A, R830A, K655A, or R762A. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K655 or R762; preferably K655A or R762A. In some cases, the Cas13 protein comprises in a helical domain one or more mutations of an amino acid corresponding to the following amino acids in a helical domain of Prevotella buccae Cas13b (PbCas13b): Q831, K836, R838, N652, N653, R830, K655 or R762; preferably Q831A, K836A, R838A, N652A, N653A, R830A, K655A, or R762A. In some cases, the Cas13 protein comprises in a helical domain one or more mutations of an amino acid corresponding to the following amino acids in a helical domain of Prevotella buccae Cas13b (PbCas13b): N652, N653, R830, K655 or R762; preferably N652A, N653A, R830A, K655A, or R762A.

In some cases, the Cas13 protein comprises in helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in helical domain 2 of Prevotella buccae Cas13b (PbCas13b): K655 or R762; preferably K655A or R762A. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R614, K607, K193, K183 or R600; preferably R614A, K607A, K193A, K183A or R600A. In some cases, the Cas13 protein comprises in the trans-subunit loop of helical domain 2 one or more mutations of an amino acid corresponding to the following amino acids in the trans-subunit loop of helical domain 2 of Prevotella buccae Cas13b (PbCas13b): Q646 or N647; preferably Q646A or N647A. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): R53 or R1041; preferably R53A or R53D, or R1041E or R1041D. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella buccae Cas13b (PbCas13b): R53 or R1041; preferably R53A or R53D, or R1041E or R1041D. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella buccae Cas13b (PbCas13b): K457, D397, E398, D399, E400, T405, H407 or D434; preferably D397A, E398A, D399A, E400A, T405A, H407A, H407W, H407Y, H407F or D434A. In some cases, the Cas13 protein comprises in the LID domain one or more mutations of an amino acid corresponding to the following amino acids in the LID domain of Prevotella buccae Cas13b (PbCas13b): K457, D397, E398, D399, E400, T405, H407 or D434; preferably D397A, E398A, D399A, E400A, T405A, H407A, H407W, H407Y, H407F or D434A.

In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid T405 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H407 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K457 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H500 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K570 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K590 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N634 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R638 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N652 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N653 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K655 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid S658 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K741 of Prevotella buccae Cas13b (PbCas13b). 
In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K744 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N756 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid S757 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R762 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R791 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K846 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K857 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K870 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R877 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K183 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K193 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R600 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K607 of Prevotella buccae Cas13b (PbCas13b). 
In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K612 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R614 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K617 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K826 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K828 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K829 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R824 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R830 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid Q831 of Prevotella buccae Cas13b (PbCas13b).

In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K835 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K836 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R838 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R618 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid D434 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K431 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R53 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K943 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R1041 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid Y164 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R285 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R287 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K292 of Prevotella buccae Cas13b (PbCas13b). 
In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid E296 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N297 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid Q646 of Prevotella buccae Cas13b (PbCas13b).

In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N647 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R402 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K393 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N653 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N652 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R482 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N480 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid D396 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid E397 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid D398 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid E399 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K294 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid E400 of Prevotella buccae Cas13b (PbCas13b). 
In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R56 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N157 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H161 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H452 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N455 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K484 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N486 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid G566 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H567 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid A656 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid V795 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid A796 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid W842 of Prevotella buccae Cas13b (PbCas13b). 
In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid K871 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid E873 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R874 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid R1068 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid N1069 of Prevotella buccae Cas13b (PbCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H1073 of Prevotella buccae Cas13b (PbCas13b).

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, H602, R1278, N1279, or H1283. The present disclosure also includes a mutated Cas13 protein comprising one or more mutations of an amino acid corresponding to the following amino acids of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, H602, R1278, N1279, or H1283. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, H602, R1278, N1279, or H1283. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, or H602. In some cases, the Cas13 protein comprises in HEPN domain 1 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 1 of Leptotrichia shahii Cas13a (LshCas13a): R597, N598, or H602. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Leptotrichia shahii Cas13a (LshCas13a): R1278, N1279, or H1283. In some cases, the Cas13 protein comprises in HEPN domain 2 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 2 of Leptotrichia shahii Cas13a (LshCas13a): R1278, N1279, or H1283. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Porphyromonas gulae Cas13b (PguCas13b): R146, H151, R1116, or H1121.
In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Porphyromonas gulae Cas13b (PguCas13b): R146, H151, R1116, or H1121.

In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Porphyromonas gulae Cas13b (PguCas13b): R146 or H151. In some cases, the Cas13 protein comprises in HEPN domain 1 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 1 of Porphyromonas gulae Cas13b (PguCas13b): R146 or H151. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Porphyromonas gulae Cas13b (PguCas13b): R1116 or H1121. In some cases, the Cas13 protein comprises in HEPN domain 2 one or more mutations of an amino acid corresponding to the following amino acids in HEPN domain 2 of Porphyromonas gulae Cas13b (PguCas13b): R1116 or H1121. In some cases, the Cas13 protein comprises one or more mutations of an amino acid corresponding to the following amino acids of Prevotella sp. P5-125 Cas13b (PspCas13b): H133 or H1058. The present disclosure also provides a mutated Cas13 protein comprising one or more mutations of an amino acid corresponding to the following amino acids of Prevotella sp. P5-125 Cas13b (PspCas13b): H133 or H1058. In some cases, the Cas13 protein comprises in a HEPN domain one or more mutations of an amino acid corresponding to the following amino acids in a HEPN domain of Prevotella sp. P5-125 Cas13b (PspCas13b): H133 or H1058.

In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H133 of Prevotella sp. P5-125 Cas13b (PspCas13b). In some cases, the Cas13 protein comprises in HEPN domain 1 a mutation of an amino acid corresponding to amino acid H133 in HEPN domain 1 of Prevotella sp. P5-125 Cas13b (PspCas13b). In some cases, the Cas13 protein comprises a mutation of an amino acid corresponding to amino acid H1058 of Prevotella sp. P5-125 Cas13b (PspCas13b). In some cases, the Cas13 protein comprises in HEPN domain 2 a mutation of an amino acid corresponding to the amino acid H1058 in HEPN domain 2 of Prevotella sp. P5-125 Cas13b (PspCas13b).
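The HEPN-domain residues listed above can be collected into a small lookup table. The following is a minimal sketch only: the residue labels are transcribed from the preceding paragraphs, while the helper names (`parse_residue`, `substitution_names`) and the default replacement residue (alanine) are illustrative assumptions and not part of the disclosure.

```python
import re

# Residue labels transcribed from the text, grouped by orthologue and HEPN domain.
HEPN_RESIDUES = {
    "LshCas13a": {"HEPN1": ["R597", "N598", "H602"],
                  "HEPN2": ["R1278", "N1279", "H1283"]},
    "PguCas13b": {"HEPN1": ["R146", "H151"],
                  "HEPN2": ["R1116", "H1121"]},
    "PspCas13b": {"HEPN1": ["H133"],
                  "HEPN2": ["H1058"]},
}


def parse_residue(label):
    """Split a label such as 'R597' into (one-letter amino acid, position)."""
    match = re.fullmatch(r"([A-Z])(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return match.group(1), int(match.group(2))


def substitution_names(orthologue, domain, new_residue="A"):
    """Hypothetical helper: name point substitutions, e.g. 'R597' -> 'R597A'.

    Residues that already equal `new_residue` are skipped.
    """
    names = []
    for label in HEPN_RESIDUES[orthologue][domain]:
        original, _position = parse_residue(label)
        if original != new_residue:
            names.append(f"{label}{new_residue}")
    return names
```

For example, `substitution_names("PguCas13b", "HEPN2")` yields the names for alanine substitutions at R1116 and H1121. Mapping these positions onto a different Cas13 orthologue would additionally require a sequence alignment, which this sketch does not attempt.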

The CRISPR-Cas protein herein may comprise one or more amino acids mutated. In some embodiments, the amino acid is mutated to A, P, or V, preferably A. In some embodiments, the amino acid is mutated to a hydrophobic amino acid. In some embodiments, the amino acid is mutated to an aromatic amino acid. In some embodiments, the amino acid is mutated to a charged amino acid. In some embodiments, the amino acid is mutated to a positively charged amino acid. In some embodiments, the amino acid is mutated to a negatively charged amino acid. In some embodiments, the amino acid is mutated to a polar amino acid. In some embodiments, the amino acid is mutated to an aliphatic amino acid.
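The residue classes named above can be expressed as sets of one-letter codes. The sketch below uses one common textbook grouping as an assumption; class boundaries vary between references (for instance, histidine is sometimes counted as positively charged or aromatic, and glycine as aliphatic), and the disclosure does not fix a particular grouping. The function names are hypothetical.

```python
# One common grouping of the 20 standard amino acids (an assumption;
# other references draw some of these boundaries differently).
AMINO_ACID_CLASSES = {
    "hydrophobic": set("AVLIMFW"),
    "aromatic": set("FWY"),
    "aliphatic": set("AVLI"),
    "positively charged": set("KR"),
    "negatively charged": set("DE"),
    "polar": set("STNQ"),
}
AMINO_ACID_CLASSES["charged"] = (
    AMINO_ACID_CLASSES["positively charged"]
    | AMINO_ACID_CLASSES["negatively charged"]
)


def classes_of(residue):
    """Return the sorted names of all classes that contain `residue`."""
    return sorted(name for name, members in AMINO_ACID_CLASSES.items()
                  if residue in members)


def replacement_options(original, target_class):
    """List residues of `target_class` that differ from the original residue."""
    return sorted(AMINO_ACID_CLASSES[target_class] - {original})
```

Under this grouping, alanine (the preferred replacement named above) falls in both the aliphatic and the hydrophobic class.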

The present disclosure also provides for methods of altering activity of CRISPR-Cas proteins. In some examples, such methods comprise identifying one or more candidate amino acids in the Cas13 protein based on a three-dimensional structure of at least a portion of the Cas13 protein, wherein the one or more candidate amino acids interact with a guide RNA that forms a complex with the Cas13 protein; or are in a HEPN active site, an inter-domain linker domain, or a bridge helix domain of the Cas13 protein; and mutating the one or more candidate amino acids, thereby generating a mutated Cas13 protein, wherein activity of the mutated Cas13 protein is different from that of the Cas13 protein.
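One way the structure-based identification step might be approximated computationally is a simple distance filter. The sketch below is an assumption-laden illustration: it treats a residue as "interacting" with the guide RNA when any of its atoms lies within a cutoff distance of any RNA atom. The 4.0 Å default, the function name, and the toy coordinates in the usage note are hypothetical; the disclosure does not specify a distance criterion.

```python
import math


def candidate_residues(residue_atoms, rna_atoms, cutoff=4.0):
    """Return labels of residues having any atom within `cutoff` of any RNA atom.

    residue_atoms: dict mapping a residue label to a list of (x, y, z) tuples.
    rna_atoms: list of (x, y, z) tuples for the guide RNA.
    Coordinates and cutoff are assumed to share the same unit (e.g. angstroms).
    """
    hits = []
    for label, atoms in residue_atoms.items():
        close = any(math.dist(atom, rna_atom) <= cutoff
                    for atom in atoms
                    for rna_atom in rna_atoms)
        if close:
            hits.append(label)
    return hits
```

With toy data such as `{"res_A": [(0.0, 0.0, 0.0)], "res_B": [(10.0, 0.0, 0.0)]}` and a single RNA atom at `(3.0, 0.0, 0.0)`, only `res_A` passes the filter. Residues in a HEPN active site, inter-domain linker, or bridge helix would be selected from domain annotations instead, which this sketch does not cover.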

Destabilized Cas13 and Fusion Proteins

In certain embodiments, the effector protein according to the invention as described herein is associated with or fused to a destabilization domain (DD). In some embodiments, the DD is ER50. A corresponding stabilizing ligand for this DD is, in some embodiments, 4HT. As such, in some embodiments, one of the at least one DDs is ER50 and a stabilizing ligand therefor is 4HT or CMP8. In some embodiments, the DD is DHFR50. A corresponding stabilizing ligand for this DD is, in some embodiments, TMP. As such, in some embodiments, one of the at least one DDs is DHFR50 and a stabilizing ligand therefor is TMP. In some embodiments, the DD is ER50. A corresponding stabilizing ligand for this DD is, in some embodiments, CMP8. CMP8 may therefore be an alternative stabilizing ligand to 4HT in the ER50 system. While it may be possible that CMP8 and 4HT can/should be used in a competitive manner, some cell types may be more susceptible to one or the other of these two ligands, and from this disclosure and the knowledge in the art the skilled person can use CMP8 and/or 4HT.

In some embodiments, one or two DDs may be fused to the N-terminal end of the Cas13 with one or two DDs fused to the C-terminal end of the Cas13. In some embodiments, the at least two DDs are associated with the Cas13 and the DDs are the same DD, i.e. the DDs are homologous. Thus, both (or two or more) of the DDs could be ER50 DDs. This is preferred in some embodiments. Alternatively, both (or two or more) of the DDs could be DHFR50 DDs. This is also preferred in some embodiments. In some embodiments, the at least two DDs are associated with the Cas13 and the DDs are different DDs, i.e. the DDs are heterologous. Thus, one of the DDs could be ER50 while one or more of the DDs or any other DDs could be DHFR50. Having two or more DDs which are heterologous may be advantageous as it would provide a greater level of degradation control. A tandem fusion of more than one DD at the N- or C-terminus may enhance degradation; such a tandem fusion can be, for example, ER50-ER50-Cas13 or DHFR-DHFR-Cas13. It is envisaged that high levels of degradation would occur in the absence of either stabilizing ligand, intermediate levels of degradation would occur in the absence of one stabilizing ligand and the presence of the other (or another) stabilizing ligand, while low levels of degradation would occur in the presence of both (or two or more) of the stabilizing ligands. Control may also be imparted by having an N-terminal ER50 DD and a C-terminal DHFR50 DD.
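The envisaged relationship between heterologous DDs, their stabilizing ligands, and the level of degradation can be written as a small qualitative model. This is a sketch only: the DD and ligand pairings are taken from the text, whereas the function name and the three-level output are illustrative simplifications and make no quantitative claim.

```python
# Ligand pairings as stated in the text: ER50 with 4HT or CMP8, DHFR50 with TMP.
STABILIZING_LIGANDS = {
    "ER50": {"4HT", "CMP8"},
    "DHFR50": {"TMP"},
}


def degradation_level(dds, ligands_present):
    """Return 'high', 'intermediate', or 'low' for a fusion carrying `dds`.

    A DD counts as stabilized when at least one of its ligands is present.
    """
    if not dds:
        raise ValueError("at least one destabilization domain is required")
    present = set(ligands_present)
    stabilized = sum(1 for dd in dds if STABILIZING_LIGANDS[dd] & present)
    if stabilized == 0:
        return "high"          # no DD is stabilized
    if stabilized == len(dds):
        return "low"           # every DD is stabilized
    return "intermediate"      # some, but not all, DDs are stabilized
```

For an N-terminal ER50 and a C-terminal DHFR50, supplying TMP alone gives the intermediate level, while supplying TMP together with 4HT or CMP8 gives the low level.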

In some embodiments, the fusion of the Cas13 with the DD comprises a linker between the DD and the Cas13. In some embodiments, the linker is a GlySer linker. In some embodiments, the DD-Cas13 further comprises at least one Nuclear Export Signal (NES). In some embodiments, the DD-Cas13 comprises two or more NESs. In some embodiments, the DD-Cas13 comprises at least one Nuclear Localization Signal (NLS). This may be in addition to an NES. In some embodiments, the Cas13 comprises or consists essentially of or consists of a localization (nuclear import or export) signal as, or as part of, the linker between the Cas13 and the DD. HA or Flag tags are also within the ambit of the invention as linkers. Applicants use NLS and/or NES as linker and also use Glycine Serine linkers as short as GS up to (GGGGS)3.
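The range of glycine-serine linkers mentioned above, from GS up to (GGGGS)3, can be generated with a short helper. This is a minimal sketch; the function names are hypothetical, and the DD and Cas13 sequences passed to `fuse` would be supplied by the user (short placeholder strings are used in the usage note, not real sequences).

```python
def glyser_linker(unit="GGGGS", repeats=1):
    """Build a Gly-Ser linker by repeating `unit`, e.g. ('GGGGS', 3) for (GGGGS)3."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if not unit or set(unit) - set("GS"):
        raise ValueError("linker unit must consist of G and S only")
    return unit * repeats


def fuse(dd_sequence, linker, cas13_sequence):
    """Concatenate an N-terminal DD, a linker, and Cas13 into one sequence."""
    return dd_sequence + linker + cas13_sequence
```

For instance, `glyser_linker("GS")` gives the shortest linker named in the text, and `fuse("DD", glyser_linker("GGGGS", 3), "CAS")` places a 15-residue linker between two placeholder strings.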

Destabilizing domains have general utility to confer instability to a wide range of proteins; see, e.g., Miyazaki, J Am Chem Soc. Mar. 7, 2012; 134(9): 3942-3945, incorporated herein by reference. CMP8 or 4-hydroxytamoxifen can be destabilizing domains. More generally, a temperature-sensitive mutant of mammalian DHFR (DHFRts), a destabilizing residue by the N-end rule, was found to be stable at a permissive temperature but unstable at 37° C. The addition of methotrexate, a high-affinity ligand for mammalian DHFR, to cells expressing DHFRts inhibited degradation of the protein partially. This was an important demonstration that a small molecule ligand can stabilize a protein otherwise targeted for degradation in cells. A rapamycin derivative was used to stabilize an unstable mutant of the FRB domain of mTOR (FRB*) and restore the function of the fused kinase, GSK-3β. This system demonstrated that ligand-dependent stability represented an attractive strategy to regulate the function of a specific protein in a complex biological environment. A system to control protein activity can involve the DD becoming functional when the ubiquitin complementation occurs by rapamycin-induced dimerization of FK506-binding protein and FKBP12. Mutants of human FKBP12 or ecDHFR protein can be engineered to be metabolically unstable in the absence of their high-affinity ligands, Shield-1 or trimethoprim (TMP), respectively. These mutants are some of the possible destabilizing domains (DDs) useful in the practice of the invention and instability of a DD as a fusion with a Cas13 confers to the Cas13 degradation of the entire fusion protein by the proteasome. Shield-1 and TMP bind to and stabilize the DD in a dose-dependent manner. The estrogen receptor ligand binding domain (ERLBD, residues 305-549 of ERS1) can also be engineered as a destabilizing domain. 
Since the estrogen receptor signaling pathway is involved in a variety of diseases such as breast cancer, the pathway has been widely studied and numerous agonists and antagonists of estrogen receptor have been developed. Thus, compatible pairs of ERLBD and drugs are known. There are ligands that bind to mutant but not wild-type forms of the ERLBD. By using one of these mutant domains encoding three mutations (L384M, M421G, G521R), it is possible to regulate the stability of an ERLBD-derived DD using a ligand that does not perturb endogenous estrogen-sensitive networks. An additional mutation (Y537S) can be introduced to further destabilize the ERLBD and to configure it as a potential DD candidate. This tetra-mutant is an advantageous DD development. The mutant ERLBD can be fused to a Cas13 and its stability can be regulated or perturbed using a ligand, whereby the Cas13 has a DD. Another DD can be a 12-kDa (107-amino-acid) tag based on a mutated FKBP protein, stabilized by Shield1 ligand; see, e.g., Nature Methods 5, (2008). For instance, a DD can be a modified FK506 binding protein 12 (FKBP12) that binds to and is reversibly stabilized by a synthetic, biologically inert small molecule, Shield-1; see, e.g., Banaszynski L A, Chen L C, Maynard-Smith L A, Ooi A G, Wandless T J. A rapid, reversible, and tunable method to regulate protein function in living cells using synthetic small molecules. Cell. 2006; 126:995-1004; Banaszynski L A, Sellmyer M A, Contag C H, Wandless T J, Thorne S H. Chemical control of protein stability and function in living mice. Nat Med. 2008; 14:1123-1127; Maynard-Smith L A, Chen L C, Banaszynski L A, Ooi A G, Wandless T J. A directed approach for engineering conditional protein stability using biologically silent small molecules. The Journal of biological chemistry. 2007; 282:24866-24872; and Rodriguez, Chem Biol. Mar. 23, 2012; 19(3): 391-398, all of which are incorporated herein by reference and may be employed in the practice of the invention in selecting a DD to associate with a Cas13 in the practice of this invention. As can be seen, the knowledge in the art includes a number of DDs, and the DD can be associated with, e.g., fused to, advantageously with a linker, a Cas13, whereby the DD can be stabilized in the presence of a ligand and when there is the absence thereof the DD can become destabilized, whereby the Cas13 is entirely destabilized, or the DD can be stabilized in the absence of a ligand and when the ligand is present the DD can become destabilized; the DD allows the Cas13 and hence the CRISPR-Cas13 complex or system to be regulated or controlled (turned on or off, so to speak), to thereby provide means for regulation or control of the system, e.g., in an in vivo or in vitro environment. For instance, when a protein of interest is expressed as a fusion with the DD tag, it is destabilized and rapidly degraded in the cell, e.g., by proteasomes. Thus, absence of stabilizing ligand leads to a DD-associated Cas being degraded. When a new DD is fused to a protein of interest, its instability is conferred to the protein of interest, resulting in the rapid degradation of the entire fusion protein. Peak activity for Cas is sometimes beneficial to reduce off-target effects. Thus, short bursts of high activity are preferred. The present invention is able to provide such peaks. In some senses the system is inducible. In some other senses, the system is repressed in the absence of stabilizing ligand and de-repressed in the presence of stabilizing ligand.

Dead Cas Proteins

In certain embodiments, the effector protein herein is a catalytically inactive or dead Cas protein. In some cases, the effector protein (CRISPR enzyme; Cas13; effector protein) according to the invention as described herein is a catalytically inactive or dead Cas13 effector protein (dCas13). In some cases, a dead Cas protein, e.g., a dead Cas13 protein has nickase activity. In some embodiments, the dCas13 effector comprises mutations in the nuclease domain. In some embodiments, the dCas13 effector protein has been truncated. In some cases, the dead Cas proteins may be fused with a deaminase herein, e.g., an adenosine deaminase.

To reduce the size of a fusion protein of the Cas13 effector and the one or more functional domains, the C-terminus of the Cas13 effector can be truncated while still maintaining its RNA binding function. For example, at least 20 amino acids, at least 40 amino acids, at least 50 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 150 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 250 amino acids, at least 260 amino acids, or at least 300 amino acids, or at least 350 amino acids, or up to 120 amino acids, or up to 140 amino acids, or up to 160 amino acids, or up to 180 amino acids, or up to 200 amino acids, or up to 250 amino acids, or up to 300 amino acids, or up to 350 amino acids, or up to 400 amino acids, may be truncated at the C-terminus of the Cas13 effector. Specific examples of Cas13 truncations include C-terminal Δ984-1090, C-terminal Δ1026-1090, C-terminal Δ1053-1090, C-terminal Δ934-1090, C-terminal Δ884-1090, C-terminal Δ834-1090, C-terminal Δ784-1090, and C-terminal Δ734-1090, wherein amino acid positions correspond to amino acid positions of Prevotella sp. P5-125 Cas13b protein. The skilled person will understand that similar truncations can be designed for other Cas13b orthologues, or other Cas13 types or subtypes, such as Cas13a, Cas13c, or Cas13d. In some cases, the truncated Cas13b is encoded by nt 1-984 of Prevotella sp. P5-125 Cas13b or the corresponding nt of a Cas13b orthologue or homologue. Examples of Cas13 truncations also include C-terminal Δ 795-1095, wherein amino acid positions correspond to amino acid positions of Riemerella anatipestifer Cas13b protein. 
Examples of Cas13 truncations further include C-terminal Δ 875-1175, C-terminal Δ 895-1175, C-terminal Δ 915-1175, C-terminal Δ 935-1175, C-terminal Δ 955-1175, C-terminal Δ 975-1175, C-terminal Δ 995-1175, C-terminal Δ 1015-1175, C-terminal Δ 1035-1175, C-terminal Δ 1055-1175, C-terminal Δ 1075-1175, C-terminal Δ 1095-1175, C-terminal Δ 1115-1175, C-terminal Δ 1135-1175, C-terminal Δ 1155-1175, wherein amino acid positions correspond to amino acid positions of Porphyromonas gulae Cas13b protein.

In some embodiments, the N-terminus of the Cas13 effector protein may be truncated. For example, at least 20 amino acids, at least 40 amino acids, at least 50 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 150 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 250 amino acids, at least 260 amino acids, or at least 300 amino acids, or at least 350 amino acids, or up to 120 amino acids, or up to 140 amino acids, or up to 160 amino acids, or up to 180 amino acids, or up to 200 amino acids, or up to 250 amino acids, or up to 300 amino acids, or up to 350 amino acids, or up to 400 amino acids, may be truncated at the N-terminus of the Cas13 effector. Examples of Cas13 truncations include N-terminal Δ1-125, N-terminal Δ1-88, or N-terminal Δ1-72, wherein amino acid positions of the truncations correspond to amino acid positions of Prevotella sp. P5-125 Cas13b protein.
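The deletion notation used for these truncations (a range of 1-based, inclusive residue positions removed from the protein) can be applied to a sequence string as follows. This is a minimal sketch; the function name is hypothetical, and the usage note works on placeholder strings rather than on any sequence from the sequence listing.

```python
def apply_deletion(sequence, start, end):
    """Remove residues `start` through `end` (1-based, inclusive).

    An N-terminal truncation has start == 1; a C-terminal truncation has
    end == len(sequence).
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("deletion range must lie within the sequence")
    return sequence[:start - 1] + sequence[end:]
```

For example, `apply_deletion("ABCDEFGHIJ", 1, 3)` removes the first three residues and `apply_deletion("ABCDEFGHIJ", 8, 10)` removes the last three. Applied to a 1095-residue placeholder, a C-terminal deletion of positions 795-1095 leaves 794 residues. A combined N- and C-terminal truncation can be obtained by applying the C-terminal deletion first, so that the N-terminal positions are unaffected by renumbering.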

In some embodiments, both the N- and the C-termini of the Cas13 effector protein may be truncated. For example, at least 20 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 40 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 60 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. 
For example, at least 80 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 100 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 120 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. 
For example, at least 140 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 160 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 180 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. 
For example, at least 200 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 220 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 240 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. 
For example, at least 260 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 280 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 300 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. 
For example, at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector. For example, at least 20 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 40 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. 
For example, at least 60 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 80 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 100 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. 
For example, at least 120 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 140 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 160 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. 
For example, at least 180 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 200 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 220 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. 
For example, at least 240 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 260 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 280 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. 
For example, at least 300 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector. For example, at least 350 amino acids may be truncated at the N-terminus of the Cas13 effector, and at least 20 amino acids, at least 40 amino acids, at least 60 amino acids, at least 80 amino acids, at least 100 amino acids, at least 120 amino acids, at least 140 amino acids, at least 160 amino acids, at least 180 amino acids, at least 200 amino acids, at least 220 amino acids, at least 240 amino acids, at least 260 amino acids, at least 300 amino acids, or at least 350 amino acids may be truncated at the C-terminus of the Cas13 effector.
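
By way of a non-limiting illustration, the pairing of N-terminal and C-terminal truncations enumerated above can be sketched in Python. The function names `truncate` and `truncation_pairs` are hypothetical and do not appear in the disclosure. The step list uses the minimum values recited in the lists above (the 280-residue N-terminal value recited in one example is not part of the repeated list and is omitted here), and each "at least" threshold is represented only by its minimum value.

```python
from itertools import product

# Minimum truncation lengths (in amino acids) recited in the lists above.
TRUNCATION_STEPS = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200,
                    220, 240, 260, 300, 350]


def truncate(sequence, n_term=0, c_term=0):
    """Remove n_term residues from the N-terminus and c_term residues
    from the C-terminus of an amino acid sequence (hypothetical helper)."""
    if n_term < 0 or c_term < 0:
        raise ValueError("truncation lengths must be non-negative")
    if n_term + c_term >= len(sequence):
        raise ValueError("truncation would remove the entire sequence")
    return sequence[n_term:len(sequence) - c_term]


def truncation_pairs(steps=TRUNCATION_STEPS):
    """Enumerate every (N-terminal, C-terminal) pair of truncation lengths."""
    return list(product(steps, repeat=2))
```

For instance, `truncate(seq, n_term=20, c_term=40)` corresponds to the embodiment in which 20 amino acids are removed from the N-terminus and 40 from the C-terminus; whether any given truncated effector retains activity must be determined experimentally.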

Split Proteins

It is noted that in this context, and more generally for the various applications described herein, the use of a split version of the RNA targeting effector protein can be envisaged. Indeed, this may not only allow increased specificity but may also be advantageous for delivery. The Cas13 is split in the sense that the two parts of the Cas13 enzyme together substantially comprise a functioning Cas13. Ideally, the split should always be made so that the catalytic domain(s) are unaffected. The Cas13 may function as a nuclease, or it may be a dead-Cas13, which is essentially an RNA-binding protein with very little or no catalytic activity, typically due to mutation(s) in its catalytic domains.

Each half of the split Cas13 may be fused to a dimerization partner. By way of example, and without limitation, employing rapamycin-sensitive dimerization domains allows generation of a chemically inducible split Cas13 for temporal control of Cas13 activity. Cas13 can thus be rendered chemically inducible by being split into two fragments, and rapamycin-sensitive dimerization domains may be used for controlled reassembly of the Cas13. The two parts of the split Cas13 can be thought of as the N′ terminal part and the C′ terminal part of the split Cas13. The fusion is typically at the split point of the Cas13. In other words, the C′ terminal of the N′ terminal part of the split Cas13 is fused to one of the dimer halves, whilst the N′ terminal of the C′ terminal part is fused to the other dimer half.

The Cas13 does not have to be split in the sense that the break is newly created. The split point is typically designed in silico and cloned into the constructs. Together, the two parts of the split Cas13, the N′ terminal and C′ terminal parts, form a full Cas13, comprising preferably at least 70% or more of the wildtype amino acids (or nucleotides encoding them), preferably at least 80% or more, preferably at least 90% or more, preferably at least 95% or more, and most preferably at least 99% or more of the wildtype amino acids (or nucleotides encoding them). Some trimming may be possible, and mutants are envisaged. Non-functional domains may be removed entirely. What is important is that the two parts may be brought together and that the desired Cas13 function is restored or reconstituted. The dimer may be a homodimer or a heterodimer.
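
As a minimal sketch of the in silico design step, and without limitation, the following Python illustrates splitting a sequence at a chosen split point, fusing a dimer half at the split point of each part, and estimating the fraction of wildtype residues retained. All function names and the placeholder dimer sequences are hypothetical; the retained fraction is a simple length-based approximation that assumes each part is a trimmed sub-sequence of the wildtype.

```python
def split_protein(sequence, split_index):
    """Split a sequence into N-terminal and C-terminal parts at split_index."""
    if not 0 < split_index < len(sequence):
        raise ValueError("split_index must fall inside the sequence")
    return sequence[:split_index], sequence[split_index:]


def fuse_dimer_halves(n_part, c_part, dimer_a, dimer_b):
    """Fuse dimer halves at the split point: dimer_a to the C-terminal end
    of the N-terminal part, dimer_b to the N-terminal end of the
    C-terminal part."""
    return n_part + dimer_a, dimer_b + c_part


def retained_fraction(n_part, c_part, wildtype):
    """Length-based estimate of the fraction of wildtype residues retained
    by the two parts (assumes the parts are sub-sequences of wildtype)."""
    return (len(n_part) + len(c_part)) / len(wildtype)
```

A design could then be screened against the thresholds recited above, for example by requiring `retained_fraction(...) >= 0.70`. Restoration of the desired Cas13 function upon dimerization is an experimental question that this sketch does not address.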

In certain embodiments, the Cas13 effector as described herein may be used for mutation-specific or allele-specific targeting, such as for mutation-specific or allele-specific knockdown.

The RNA targeting effector protein can moreover be fused to another functional RNase domain, such as a non-specific RNase or Argonaute 2, which acts in synergy to increase the RNase activity or to ensure further degradation of the message.

Modulating Cas13 Effector Proteins

The invention provides accessory proteins that modulate CRISPR protein function. In certain embodiments, the accessory protein modulates catalytic activity of a CRISPR protein. In an embodiment of the invention an accessory protein modulates targeted, or sequence specific, nuclease activity. In an embodiment of the invention, an accessory protein modulates collateral nuclease activity. In an embodiment of the invention, an accessory protein modulates binding to a target nucleic acid.

According to the invention, the nuclease activity to be modulated can be directed against nucleic acids comprising or consisting of RNA, including without limitation mRNA, miRNA, siRNA and nucleic acids comprising cleavable RNA linkages along with nucleotide analogs. In an embodiment of the invention, the nuclease activity to be modulated can be directed against nucleic acids comprising or consisting of DNA, including without limitation nucleic acids comprising cleavable DNA linkages and nucleic acid analogs.

In an embodiment of the invention, an accessory protein enhances an activity of a CRISPR protein. In certain such embodiments, the accessory protein comprises a HEPN domain and enhances RNA cleavage. In certain embodiments, the accessory protein inhibits an activity of a CRISPR protein. In certain such embodiments, the accessory protein comprises an inactivated HEPN domain or lacks an HEPN domain altogether.

According to the invention, naturally occurring accessory proteins of Type VI CRISPR systems comprise small proteins encoded at or near a CRISPR locus that function to modify an activity of a CRISPR protein. In general, a CRISPR locus can be identified as comprising a putative CRISPR array and/or encoding a putative CRISPR effector protein. In an embodiment, an effector protein can be from 800 to 2000 amino acids, or from 900 to 1800 amino acids, or from 950 to 1300 amino acids. In an embodiment, an accessory protein can be encoded within 25 kb, or within 20 kb or within 15 kb, or within 10 kb of a putative CRISPR effector protein or array, or from 2 kb to 10 kb from a putative CRISPR effector protein or array.

In an embodiment of the invention, an accessory protein is from 50 to 300 amino acids, or from 100 to 300 amino acids or from 150 to 250 amino acids or about 200 amino acids. Non-limiting examples of accessory proteins include the csx27 and csx28 proteins identified herein.
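
The size and proximity criteria described above can be sketched, purely for illustration, as a filter over annotated open reading frames. The `Orf` record, its fields, and the function names are hypothetical; the default thresholds (within 25 kb of the effector, 50 to 300 amino acids) are taken from the ranges recited above, and all ORFs are assumed to lie on the same contig.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    name: str
    start: int      # genomic start coordinate (bp)
    end: int        # genomic end coordinate (bp)
    length_aa: int  # encoded protein length (amino acids)


def distance_bp(a, b):
    """Gap in bp between two ORFs on the same contig (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def candidate_accessories(effector, orfs, max_distance_bp=25_000,
                          min_aa=50, max_aa=300):
    """Return ORFs whose size and distance from the effector fall within
    the recited ranges; these are candidates only, not confirmed
    accessory proteins."""
    return [
        orf for orf in orfs
        if orf is not effector
        and min_aa <= orf.length_aa <= max_aa
        and distance_bp(effector, orf) <= max_distance_bp
    ]
```

Narrower embodiments (for example, within 10 kb, or 150 to 250 amino acids) correspond to passing different values for `max_distance_bp`, `min_aa`, and `max_aa`.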

Identification and use of a CRISPR accessory protein of the invention is independent of CRISPR effector protein classification. Accessory proteins of the invention can be found in association with or engineered to function with a variety of CRISPR effector proteins. Examples of accessory proteins identified and used herein are representative of CRISPR effector proteins generally. It is understood that CRISPR effector protein classification may involve homology, feature location (e.g., location of REC domains, NUC domains, HEPN sequences), nucleic acid target (e.g. DNA or RNA), absence or presence of tracr RNA, location of guide/spacer sequence 5′ or 3′ of a direct repeat, or other criteria. In embodiments of the invention, accessory protein identification and use transcend such classifications.

In type VI CRISPR-Cas systems that target RNA, the Cas proteins usually comprise two conserved HEPN domains which are involved in RNA cleavage. In certain embodiments, the Cas protein processes crRNA to generate mature crRNA. The guide sequence of the crRNA recognizes target RNA with a complementary sequence and the Cas protein degrades the target strand. More particularly, in certain embodiments, upon target binding, the Cas protein undergoes a structural rearrangement that brings two HEPN domains together to form an active HEPN catalytic site and the target RNA is then cleaved. The location of the catalytic site near the surface of the Cas protein allows non-specific collateral ssRNA cleavage.

In certain embodiments, accessory proteins are instrumental in increasing or reducing target and/or collateral RNA cleavage. Without being bound by theory, an accessory protein that activates CRISPR activity (e.g., a csx28 protein or ortholog or variant comprising a HEPN domain) can be envisioned as capable of interacting with a Cas protein and combining its HEPN domain with a HEPN domain of the Cas protein to form an active HEPN catalytic site, whereas an inhibitory accessory protein (e.g., csx27, which lacks an HEPN domain) can be envisioned as capable of interacting with a Cas protein and reducing or blocking a conformation of the Cas protein that would bring together two HEPN domains.

According to the invention, in certain embodiments, enhancing activity of a Type VI Cas protein or complex thereof comprises contacting the Type VI Cas protein or complex thereof with an accessory protein from the same organism that activates the Cas protein. In other embodiments, enhancing activity of a Type VI Cas protein or complex thereof comprises contacting the Type VI Cas protein or complex thereof with an activator accessory protein from a different organism within the same subclass (e.g., Type VI-b). In other embodiments, enhancing activity of a Type VI Cas protein or complex thereof comprises contacting the Type VI Cas protein or complex thereof with an accessory protein not within the subclass (e.g., a Type VI Cas protein other than Type VI-b with a Type VI-b accessory protein or vice-versa).

According to the invention, in certain embodiments, repressing activity of a Type VI Cas protein or complex thereof comprises contacting the Type VI Cas protein or complex thereof with an accessory protein from the same organism that represses the Cas protein. In other embodiments, repressing activity of a Type VI Cas protein or complex thereof comprises contacting the Type VI Cas protein or complex thereof with a repressor accessory protein from a different organism within the same subclass (e.g., Type VI-b). In other embodiments, repressing activity of a Type VI Cas protein or complex thereof comprises contacting the Type VI Cas protein or complex thereof with a repressor accessory protein not within the subclass (e.g., a Type VI Cas protein other than Type VI-b with a Type VI-b repressor accessory protein or vice-versa).

In certain embodiments where the Type VI Cas protein and the Type VI accessory protein are from the same organism, the two proteins will function together in an engineered CRISPR system. In certain embodiments, it will be desirable to alter the function of the engineered CRISPR system, for example by modifying either or both of the proteins or their expression. In embodiments where the Type VI Cas protein and the Type VI accessory protein are from different organisms which may be within the same class or different classes, the proteins may function together in an engineered CRISPR system but it will often be desired or necessary to modify either or both of the proteins to function together.

Accordingly, in certain embodiments of the invention, either or both of a Cas protein and an accessory protein may be modified to adjust aspects of protein-protein interactions between the Cas protein and accessory protein. In certain embodiments, either or both of a Cas protein and an accessory protein may be modified to adjust aspects of protein-nucleic acid interactions. Ways to adjust protein-protein interactions and protein-nucleic acid interactions include, without limitation, fitting molecular surfaces and modulating polar interactions, hydrogen bonds, and van der Waals interactions. In certain embodiments, adjusting protein-protein interactions or protein-nucleic acid binding comprises increasing or decreasing binding interactions. In certain embodiments, adjusting protein-protein interactions or protein-nucleic acid binding comprises modifications that favor or disfavor a conformation of the protein or nucleic acid.

By “fitting” is meant determining, including by automatic or semi-automatic means, interactions between one or more atoms of a Cas13 protein (and optionally at least one atom of a Cas13 accessory protein), or between one or more atoms of a Cas13 protein and one or more atoms of a nucleic acid (or optionally between one or more atoms of a Cas13 accessory protein and a nucleic acid), and calculating the extent to which such interactions are stable. Interactions include attraction and repulsion, brought about by charge, steric considerations and the like.

The three-dimensional structure of a Type VI CRISPR protein or complex thereof (and/or a Type VI CRISPR accessory protein or complex thereof in the context of Cas13b) provides, in the context of the instant invention, an additional tool for identifying additional mutations in orthologs of Cas13. The crystal structure can also be the basis for the design of new and specific Cas13s (and optionally Cas13 accessory proteins). Various computer-based methods for fitting are described further. Binding interactions of Cas13s (and optionally accessory proteins) and nucleic acids can be examined through the use of computer modeling using a docking program. Docking programs are known; for example GRAM, DOCK or AUTODOCK (see Walters et al. Drug Discovery Today, vol. 3, no. 4 (1998), 160-178, and Dunbrack et al. Folding and Design 2 (1997), 27-42). This procedure can include computer fitting to ascertain how well the shape and the chemical structure of the binding partners complement each other. Computer-assisted, manual examination of the active site or binding site of a Type VI system may be performed. Programs such as GRID (P. Goodford, J. Med. Chem, 1985, 28, 849-57), a program that determines probable interaction sites between molecules with various functional groups, may also be used to analyze the active site or binding site to predict partial structures of binding compounds. Computer programs can be employed to estimate the attraction, repulsion or steric hindrance of the two binding partners, e.g., components of a Type VI CRISPR system, or a nucleic acid molecule and a component of a Type VI CRISPR system.

Amino acid substitutions may be made on the basis of differences or similarities in amino acid properties (such as polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues) and it is therefore useful to group amino acids together in functional groups. Amino acids may be grouped together based on the properties of their side chains alone. In comparing orthologs, there are likely to be residues conserved for structural or catalytic reasons. These sets may be described in the form of a Venn diagram (Livingstone C. D. and Barton G. J. (1993) “Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation” Comput. Appl. Biosci. 9: 745-756) (Taylor W.R. (1986) “The classification of amino acid conservation” J. Theor. Biol. 119; 205-218). Conservative substitutions may be made, for example according to the table below which describes a generally accepted Venn diagram grouping of amino acids (see Table 7 below).

TABLE 7

Set                                     Sub-set
Hydrophobic: F W Y H K M I L V A G C    Aromatic: F W Y H (SEQ ID NO: 240)
                                        Aliphatic: I L V
Polar: W Y H K R E D C S T N Q          Charged: H K R E D (SEQ ID NO: 241)
                                        Positively charged: H K R
                                        Negatively charged: E D
Small: V C A G S P T N D                Tiny: A G S (SEQ ID NO: 242)
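
For illustration only, the groupings of Table 7 can be encoded as sets and queried programmatically. The names `AMINO_ACID_SETS`, `sets_for`, and `is_conservative` are hypothetical, and the rule used here (a substitution is treated as conservative when both residues share at least one of the narrower sub-sets) is an assumption made for this sketch rather than a definition given in the disclosure.

```python
# Set membership transcribed from Table 7 (one-letter amino acid codes).
AMINO_ACID_SETS = {
    "hydrophobic": set("FWYHKMILVAGC"),
    "aromatic": set("FWYH"),
    "aliphatic": set("ILV"),
    "polar": set("WYHKREDCSTNQ"),
    "charged": set("HKRED"),
    "positively charged": set("HKR"),
    "negatively charged": set("ED"),
    "small": set("VCAGSPTND"),
    "tiny": set("AGS"),
}

# Assumption: only these narrower sub-sets count toward "conservative".
NARROW_SUBSETS = ("aromatic", "aliphatic", "positively charged",
                  "negatively charged", "tiny")


def sets_for(residue):
    """Return the sorted names of all Table 7 sets containing the residue."""
    return sorted(name for name, members in AMINO_ACID_SETS.items()
                  if residue in members)


def is_conservative(original, replacement, subsets=NARROW_SUBSETS):
    """True when both residues belong to at least one shared narrow sub-set."""
    return any(original in AMINO_ACID_SETS[name]
               and replacement in AMINO_ACID_SETS[name]
               for name in subsets)
```

Under this assumed rule, a lysine-to-arginine substitution is conservative (both are positively charged), whereas a leucine-to-aspartate substitution is not.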

In an engineered Cas13 system, modification may comprise modification of one or more amino acid residues of the Cas13 protein (and/or may comprise modification of one or more amino acid residues of the Cas13 accessory protein in the case of Cas13b).

In an engineered Cas13 system, modification may comprise modification of one or more amino acid residues located in a region which comprises residues which are positively charged in the unmodified Cas13 protein (and/or Cas13 accessory protein).

In an engineered Cas13 system, modification may comprise modification of one or more amino acid residues which are positively charged in the unmodified Cas13 protein (and/or Cas13 accessory protein).

In an engineered Cas13 system, modification may comprise modification of one or more amino acid residues which are not positively charged in the unmodified Cas13 protein (and/or Cas13 accessory protein).

The modification may comprise modification of one or more amino acid residues which are uncharged in the unmodified Cas13 protein (and/or Cas13 accessory protein).

The modification may comprise modification of one or more amino acid residues which are negatively charged in the unmodified Cas13 protein (and/or Cas13 accessory protein).

The modification may comprise modification of one or more amino acid residues which are hydrophobic in the unmodified Cas13 protein (and/or Cas13 accessory protein).

The modification may comprise modification of one or more amino acid residues which are polar in the unmodified Cas13 protein (and/or Cas13 accessory protein).

The modification may comprise substitution of a hydrophobic amino acid or polar amino acid with a charged amino acid, which can be a negatively charged or positively charged amino acid. The modification may comprise substitution of a negatively charged amino acid with a positively charged or polar or hydrophobic amino acid. The modification may comprise substitution of a positively charged amino acid with a negatively charged or polar or hydrophobic amino acid.

Embodiments of the invention include sequences (both polynucleotide and polypeptide) which may comprise homologous substitution (substitution and replacement are both used herein to mean the interchange of an existing amino acid residue or nucleotide with an alternative residue or nucleotide), i.e., like-for-like substitution in the case of amino acids, such as basic for basic, acidic for acidic, polar for polar, etc. Non-homologous substitution may also occur, i.e., from one class of residue to another, or alternatively involving the inclusion of unnatural amino acids such as ornithine (hereinafter referred to as Z), diaminobutyric acid (hereinafter referred to as B), norleucine (hereinafter referred to as O), pyridylalanine, thienylalanine, naphthylalanine and phenylglycine. Variant amino acid sequences may include suitable spacer groups that may be inserted between any two amino acid residues of the sequence, including alkyl groups such as methyl, ethyl or propyl groups in addition to amino acid spacers such as glycine or β-alanine residues. A further form of variation, which involves the presence of one or more amino acid residues in peptoid form, may be well understood by those skilled in the art. For the avoidance of doubt, “the peptoid form” is used to refer to variant amino acid residues wherein the α-carbon substituent group is on the residue's nitrogen atom rather than the α-carbon. Processes for preparing peptides in the peptoid form are known in the art, for example Simon R J et al., PNAS (1992) 89(20), 9367-9371 and Horwell D C, Trends Biotechnol. (1995) 13(4), 132-134.

Homology modelling: Corresponding residues in other Cas13 orthologs can be identified by the methods of Zhang et al., 2012 (Nature; 490(7421): 556-60) and Chen et al., 2015 (PLoS Comput Biol; 11(5): e1004248), a computational protein-protein interaction (PPI) method to predict interactions mediated by domain-motif interfaces. PrePPI (Predicting PPI), a structure-based PPI prediction method, combines structural evidence with non-structural evidence using a Bayesian statistical framework. The method involves taking a pair of query proteins and using structural alignment to identify structural representatives that correspond to either their experimentally determined structures or homology models. Structural alignment is further used to identify both close and remote structural neighbors by considering global and local geometric relationships. Whenever two neighbors of the structural representatives form a complex reported in the Protein Data Bank, this defines a template for modelling the interaction between the two query proteins. Models of a complex are created by superimposing the representative structures on their corresponding structural neighbor in the template. This approach is described in Dey et al., 2013 (Prot Sci; 22: 359-66).

Collateral Activity

Collateral activity was recently leveraged for a highly sensitive and specific nucleic acid detection platform termed SHERLOCK that is useful for many clinical diagnoses (Gootenberg, J. S. et al. Nucleic acid detection with CRISPR-Cas13a/C2c2. Science 356, 438-442 (2017)).

According to the invention, engineered CRISPR-Cas systems are optimized for RNA endonuclease activity and can be expressed in mammalian cells and targeted to effectively knock down reporter molecules or transcripts in cells.

The collateral effect of engineered CRISPR-Cas with isothermal amplification provides a CRISPR-based diagnostic providing rapid DNA or RNA detection with high sensitivity and single-base mismatch specificity. The CRISPR-Cas-based molecular detection platform is used to detect specific strains of virus, distinguish pathogenic bacteria, genotype human DNA, and identify cell-free tumor DNA mutations. Furthermore, reaction reagents can be lyophilized for cold-chain independence and long-term storage, and readily reconstituted on paper for field applications.

The ability to rapidly detect nucleic acids with high sensitivity and single-base specificity on a portable platform may aid in disease diagnosis and monitoring, epidemiology, and general laboratory tasks. Although methods exist for detecting nucleic acids, they have trade-offs among sensitivity, specificity, simplicity, cost, and speed.

Microbial Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (CRISPR-Cas) adaptive immune systems contain programmable endonucleases that can be leveraged for CRISPR-based diagnostics (CRISPR-Dx). CRISPR-Cas can be reprogrammed with CRISPR RNAs (crRNAs) to provide a platform for specific DNA sensing. Upon recognition of its DNA target, activated CRISPR-Cas engages in “collateral” cleavage of nearby non-targeted nucleic acids (i.e., RNA and/or ssDNA). This crRNA-programmed collateral cleavage activity allows CRISPR-Cas to detect the presence of a specific DNA in vivo by triggering programmed cell death or by nonspecific degradation of labelled RNA or ssDNA. Here is described an in vitro nucleic acid detection platform with high sensitivity based on nucleic acid amplification and CRISPR-Cas-mediated collateral cleavage of a commercial reporter RNA, allowing for real-time detection of the target.

Conservation of non-specific ssDNA- and RNA-directed proteins will inevitably lead to further and, potentially, improved CRISPR proteins that demonstrate collateral cleavage, may be used for detection, and offer greater breadth for multiplexed detection of nucleic acid targets in amplified and highly sensitive diagnostic systems, especially SHERLOCK.

RNA-Based Masking

In certain example embodiments, an RNA-based masking construct suppresses generation of a detectable positive signal, or the RNA-based masking construct suppresses generation of a detectable positive signal by masking the detectable positive signal, or generating a detectable negative signal instead, or the RNA-based masking construct comprises a silencing RNA that suppresses generation of a gene product encoded by a reporting construct, wherein the gene product generates the detectable positive signal when expressed.

In another example embodiment, the RNA-based masking construct is a ribozyme that generates a negative detectable signal, and wherein the positive detectable signal is generated when the ribozyme is deactivated. In one example embodiment, the ribozyme converts a substrate to a first color and wherein the substrate converts to a second color when the ribozyme is deactivated. In another example embodiment, the RNA-based masking agent is an aptamer that sequesters an enzyme, wherein the enzyme generates a detectable signal upon release from the aptamer by acting upon a substrate, or the aptamer sequesters a pair of agents that when released from the aptamers combine to generate a detectable signal.

In another example embodiment, the RNA-based masking construct comprises an RNA oligonucleotide to which are attached a detectable ligand oligonucleotide and a masking component. In certain example embodiments, the detectable ligand is a fluorophore and the masking component is a quencher molecule.

In another aspect, the invention provides a method for detecting target nucleic acids (e.g., RNAs) in samples, comprising: distributing a sample or set of samples into one or more individual discrete volumes, the individual discrete volumes comprising a CRISPR system comprising an effector protein, one or more guide RNAs, and an RNA-based masking construct; incubating the sample or set of samples under conditions sufficient to allow binding of the one or more guide RNAs to one or more target molecules; activating the CRISPR effector protein via binding of the one or more guide RNAs to the one or more target molecules, wherein activating the CRISPR effector protein results in modification of the RNA-based masking construct such that a detectable positive signal is produced; and detecting the detectable positive signal, wherein detection of the detectable positive signal indicates a presence of one or more target molecules in the sample.
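
The final detecting step can be illustrated, under stated assumptions, as a comparison of reporter signal in a sample volume against a no-target control volume. The function name `call_detection`, the use of replicate fluorescence readings, and the default fold-change threshold of 2.0 are assumptions introduced for this sketch; the disclosure does not prescribe a particular readout or threshold.

```python
from statistics import mean


def call_detection(sample_readings, control_readings, min_fold_change=2.0):
    """Compare mean reporter signal in the sample against a no-target
    control and return (fold_change, detected).

    The fold-change threshold is an illustrative assumption, not a value
    taken from the disclosure.
    """
    control_mean = mean(control_readings)
    if control_mean <= 0:
        raise ValueError("control signal must be positive")
    fold_change = mean(sample_readings) / control_mean
    return fold_change, fold_change >= min_fold_change
```

In practice the threshold would be set empirically for a given masking construct, instrument, and incubation time.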

In some embodiments, the method for detecting a target nucleic acid in a sample comprises: contacting a sample with: an engineered CRISPR-Cas protein; at least one guide polynucleotide comprising a guide sequence capable of binding to the target nucleic acid and designed to form a complex with the engineered CRISPR-Cas; and an RNA-based masking construct comprising a non-target sequence; wherein the engineered CRISPR-Cas protein exhibits collateral RNase activity and cleaves the non-target sequence of the detection construct; and detecting a signal from cleavage of the non-target sequence, thereby detecting the target nucleic acid in the sample. In some embodiments, the method further comprises contacting the sample with reagents for amplifying the target nucleic acid. In some embodiments, the reagents for amplifying comprise isothermal amplification reaction reagents. In some embodiments, the isothermal amplification reagents comprise nucleic-acid sequence-based amplification, recombinase polymerase amplification, loop-mediated isothermal amplification, strand displacement amplification, helicase-dependent amplification, or nicking enzyme amplification reagents.

In some embodiments, the target nucleic acid is a DNA molecule and the method further comprises contacting the target DNA molecule with a primer comprising an RNA polymerase site and RNA polymerase.

In some embodiments, the masking construct: suppresses generation of a detectable positive signal until the masking construct is cleaved or deactivated, or masks a detectable positive signal or generates a detectable negative signal until the masking construct is cleaved or deactivated.

In some embodiments, the masking construct comprises: a. a silencing RNA that suppresses generation of a gene product encoded by a reporting construct, wherein the gene product generates the detectable positive signal when expressed; b. a ribozyme that generates the negative detectable signal, and wherein the positive detectable signal is generated when the ribozyme is deactivated; c. a ribozyme that converts a substrate to a first color and wherein the substrate converts to a second color when the ribozyme is deactivated; d. an aptamer and/or a polynucleotide-tethered inhibitor; e. a polynucleotide to which a detectable ligand and a masking component are attached; f. a nanoparticle held in aggregate by bridge molecules, wherein at least a portion of the bridge molecules comprises a polynucleotide, and wherein the solution undergoes a color shift when the nanoparticle is dispersed in solution; g. a quantum dot or fluorophore linked to one or more quencher molecules by a linking molecule, wherein at least a portion of the linking molecule comprises a polynucleotide; h. a polynucleotide in complex with an intercalating agent, wherein the intercalating agent changes absorbance upon cleavage of the polynucleotide; or i. two fluorophores tethered by a polynucleotide that undergo a shift in fluorescence when released from the polynucleotide. In some embodiments, the aptamer a. comprises a polynucleotide-tethered inhibitor that sequesters an enzyme, wherein the enzyme generates a detectable signal upon release from the aptamer or polynucleotide-tethered inhibitor by acting upon a substrate; or b. is an inhibitory aptamer that inhibits an enzyme and prevents the enzyme from catalyzing generation of a detectable signal from a substrate or wherein the polynucleotide-tethered inhibitor inhibits an enzyme and prevents the enzyme from catalyzing generation of a detectable signal from a substrate; or c. sequesters a pair of agents that when released from the aptamers combine to generate a detectable signal. In some embodiments, the nanoparticle is a colloidal metal. In some embodiments, the at least one guide polynucleotide comprises a mismatch. In some embodiments, the mismatch is up- or downstream of a single nucleotide variation on the one or more guide sequences.

In another aspect, the invention provides a method for detecting peptides in samples, comprising: distributing a sample or set of samples into a set of individual discrete volumes, the individual discrete volumes comprising peptide detection aptamers, a CRISPR system comprising an effector protein, one or more guide RNAs, and an RNA-based masking construct, wherein the peptide detection aptamers comprise a masked RNA polymerase site and are configured to bind one or more target molecules; incubating the sample or set of samples under conditions sufficient to allow binding of the peptide detection aptamers to the one or more target molecules, wherein binding of the aptamer to a corresponding target molecule exposes the RNA polymerase binding site resulting in RNA synthesis of a trigger RNA; activating the CRISPR effector protein via binding of the one or more guide RNAs to the trigger RNA, wherein activating the CRISPR effector protein results in modification of the RNA-based masking construct such that a detectable positive signal is produced; and detecting the detectable positive signal, wherein detection of the detectable positive signal indicates a presence of one or more target molecules in a sample.

In certain example embodiments, the one or more guide RNAs are designed to bind to one or more target molecules that are diagnostic for a disease state. In certain other example embodiments, the disease state is an infection, an organ disease, a blood disease, an immune system disease, a cancer, a brain and nervous system disease, an endocrine disease, a pregnancy or childbirth-related disease, an inherited disease, an environmentally-acquired disease, a fungal infection, a bacterial infection, a parasite infection, or a viral infection.

In certain example embodiments, the RNA-based masking construct suppresses generation of a detectable positive signal, or the RNA-based masking construct suppresses generation of a detectable positive signal by masking the detectable positive signal, or generating a detectable negative signal instead, or the RNA-based masking construct comprises a silencing RNA that suppresses generation of a gene product encoded by a reporting construct, wherein the gene product generates the detectable positive signal when expressed, or the RNA-based masking construct is a ribozyme that generates the negative detectable signal, and wherein the positive detectable signal is generated when the ribozyme is inactivated. In other example embodiments, the ribozyme converts a substrate to a first state and wherein the substrate converts to a second state when the ribozyme is inactivated, or the RNA-based masking agent is an aptamer, or the aptamer sequesters an enzyme, wherein the enzyme generates a detectable signal upon release from the aptamer by acting upon a substrate, or the aptamer sequesters a pair of agents that when released from the aptamers combine to generate a detectable signal. In still further embodiments, the RNA-based masking construct comprises an RNA oligonucleotide with a detectable ligand on a first end of the RNA oligonucleotide and a masking component on a second end of the RNA oligonucleotide, or the detectable ligand is a fluorophore and the masking component is a quencher molecule.

Base Editing

The present disclosure also provides for a base editing system. In general, such a system may comprise a deaminase (e.g., an adenosine deaminase or cytidine deaminase) fused with a Cas protein. The Cas protein may be a dead Cas protein or a Cas nickase protein. In certain examples, the system comprises a mutated form of an adenosine deaminase fused with a dead CRISPR-Cas or CRISPR-Cas nickase. The mutated form of the adenosine deaminase may have both adenosine deaminase and cytidine deaminase activities.

In certain example embodiments, a dCas13b can be fused with an adenosine deaminase or cytidine deaminase for base editing purposes. In some cases, the dCas13b is dCas13b-t1, dCas13b-t2, or dCas13b-t3.

In one aspect, the present disclosure provides an engineered adenosine deaminase. The engineered adenosine deaminase may comprise one or more mutations herein. In some embodiments, the engineered adenosine deaminase has cytidine deaminase activity. In certain examples, the engineered adenosine deaminase has both cytidine deaminase activity and adenosine deaminase activity. FIG. 101 shows an example system and method of programmable cytidine to uridine conversion according to some embodiments herein. In some cases, the modifications by base editors herein may be used for targeting post-translational signaling or catalysis. FIG. 102 shows example approaches.

Adenosine Deaminase

The term “adenosine deaminase” or “adenosine deaminase protein” as used herein refers to a protein, a polypeptide, or one or more functional domain(s) of a protein or a polypeptide that is capable of catalyzing a hydrolytic deamination reaction that converts an adenine (or an adenine moiety of a molecule) to a hypoxanthine (or a hypoxanthine moiety of a molecule), as shown below. In some embodiments, the adenine-containing molecule is an adenosine (A), and the hypoxanthine-containing molecule is an inosine (I). The adenine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

According to the present disclosure, adenosine deaminases that can be used in connection with the present disclosure include, but are not limited to, members of the enzyme family known as adenosine deaminases that act on RNA (ADARs), members of the enzyme family known as adenosine deaminases that act on tRNA (ADATs), and other adenosine deaminase domain-containing (ADAD) family members. According to the present disclosure, the adenosine deaminase is capable of targeting adenine in RNA/DNA and RNA duplexes. Indeed, Zheng et al. (Nucleic Acids Res. 2017, 45(6): 3369-3377) demonstrate that ADARs can carry out adenosine to inosine editing reactions on RNA/DNA and RNA/RNA duplexes. In particular embodiments, the adenosine deaminase has been modified to increase its ability to edit DNA in an RNA/DNA heteroduplex or in an RNA duplex as detailed herein below.

In some embodiments, the adenosine deaminase is derived from one or more metazoa species, including but not limited to, mammals, birds, frogs, squids, fish, flies and worms. In some embodiments, the adenosine deaminase is a human, squid or Drosophila adenosine deaminase.

In some embodiments, the adenosine deaminase is a human ADAR, including hADAR1, hADAR2, hADAR3. In some embodiments, the adenosine deaminase is a Caenorhabditis elegans ADAR protein, including ADR-1 and ADR-2. In some embodiments, the adenosine deaminase is a Drosophila ADAR protein, including dAdar. In some embodiments, the adenosine deaminase is a squid Loligo pealeii ADAR protein, including sqADAR2a and sqADAR2b. In some embodiments, the adenosine deaminase is a human ADAT protein. In some embodiments, the adenosine deaminase is a Drosophila ADAT protein. In some embodiments, the adenosine deaminase is a human ADAD protein, including TENR (hADAD1) and TENRL (hADAD2).

In some embodiments, the adenosine deaminase is a TadA protein such as E. coli TadA. See Kim et al., Biochemistry 45:6407-6416 (2006); Wolf et al., EMBO J. 21:3841-3851 (2002). In some embodiments, the adenosine deaminase is mouse ADA. See Grunebaum et al., Curr. Opin. Allergy Clin. Immunol. 13:630-638 (2013). In some embodiments, the adenosine deaminase is human ADAT2. See Fukui et al., J. Nucleic Acids 2010:260512 (2010). In some embodiments, the deaminase (e.g., adenosine or cytidine deaminase) is one or more of those described in Cox et al., Science. 2017 Nov. 24; 358(6366): 1019-1027; Komor et al., Nature. 2016 May 19; 533(7603):420-4; and Gaudelli et al., Nature. 2017 Nov. 23; 551(7681):464-471.

In some embodiments, the adenosine deaminase protein recognizes and converts one or more target adenosine residue(s) in a double-stranded nucleic acid substrate into inosine residue(s). In some embodiments, the double-stranded nucleic acid substrate is an RNA-DNA hybrid duplex. In some embodiments, the adenosine deaminase protein recognizes a binding window on the double-stranded substrate. In some embodiments, the binding window contains at least one target adenosine residue. In some embodiments, the binding window is in the range of about 3 bp to about 100 bp. In some embodiments, the binding window is in the range of about 5 bp to about 50 bp. In some embodiments, the binding window is in the range of about 10 bp to about 30 bp. In some embodiments, the binding window is about 1 bp, 2 bp, 3 bp, 5 bp, 7 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, or 100 bp.

In some embodiments, the adenosine deaminase protein comprises one or more deaminase domains. Not intended to be bound by a particular theory, it is contemplated that the deaminase domain functions to recognize and convert one or more target adenosine (A) residue(s) contained in a double-stranded nucleic acid substrate into inosine (I) residue(s). In some embodiments, the deaminase domain comprises an active center. In some embodiments, the active center comprises a zinc ion. In some embodiments, during the A-to-I editing process, base pairing at the target adenosine residue is disrupted, and the target adenosine residue is “flipped” out of the double helix to become accessible by the adenosine deaminase. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotide(s) 5′ to a target adenosine residue. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotide(s) 3′ to a target adenosine residue. In some embodiments, amino acid residues in or near the active center further interact with the nucleotide complementary to the target adenosine residue on the opposite strand. In some embodiments, the amino acid residues form hydrogen bonds with the 2′ hydroxyl group of the nucleotides.

In some embodiments, the adenosine deaminase comprises human ADAR2 full protein (hADAR2) or the deaminase domain thereof (hADAR2-D). In some embodiments, the adenosine deaminase is an ADAR family member that is homologous to hADAR2 or hADAR2-D.

Particularly, in some embodiments, the homologous ADAR protein is human ADAR1 (hADAR1) or the deaminase domain thereof (hADAR1-D). In some embodiments, glycine 1007 of hADAR1-D corresponds to glycine 487 of hADAR2-D, and glutamic acid 1008 of hADAR1-D corresponds to glutamic acid 488 of hADAR2-D.
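By way of illustration only, the two stated correspondences can be used to translate a substitution written in hADAR1-D numbering into hADAR2-D numbering. The following is a minimal sketch; the names `HADAR1_TO_HADAR2` and `to_hadar2_numbering` are hypothetical and not part of the disclosure. Only the two position pairs given above are encoded, and any other position would have to be taken from a sequence alignment, which is not attempted here.

```python
# Position correspondences stated in the text:
# hADAR1-D position -> hADAR2-D position.
HADAR1_TO_HADAR2 = {1007: 487, 1008: 488}


def to_hadar2_numbering(mutation):
    """Translate a substitution such as 'E1008Q' (hADAR1-D numbering)
    into the corresponding hADAR2-D shorthand ('E488Q').

    Assumes the wild-type residue is the same in both proteins, which
    holds for the two pairs stated in the text (glycine and glutamic acid).
    """
    wild_type = mutation[0]
    position = int(mutation[1:-1])
    replacement = mutation[-1]
    if position not in HADAR1_TO_HADAR2:
        # No correspondence is stated for this position; do not guess.
        raise KeyError(f"no stated correspondence for position {position}")
    return f"{wild_type}{HADAR1_TO_HADAR2[position]}{replacement}"
```

For example, `to_hadar2_numbering("E1008Q")` yields `"E488Q"`. Positions outside the stated pairs raise an error rather than being extrapolated.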

In some embodiments, the adenosine deaminase comprises the wild-type amino acid sequence of hADAR2-D. In some embodiments, the adenosine deaminase comprises one or more mutations in the hADAR2-D sequence, such that the editing efficiency, and/or substrate editing preference of hADAR2-D is changed according to specific needs. The engineered adenosine deaminase may be fused with a Cas protein, e.g., Cas9, Cas12 (e.g., Cas12a, Cas12b, Cas12c, Cas12d, etc.), Cas13 (e.g., Cas13a, Cas13b (such as Cas13b-t1, Cas13b-t2, Cas13b-t3), Cas13c, Cas13d, etc.), Cas14, CasX, CasY, or an engineered form of the Cas protein (e.g., an inactive, dead form, a nickase form). In some examples, provided herein is an engineered adenosine deaminase fused with a dead Cas13b protein or a Cas13 nickase.

Certain mutations of hADAR1 and hADAR2 proteins have been described in Kuttan et al., Proc Natl Acad Sci USA. (2012) 109(48):E3295-304; Wang et al. ACS Chem Biol. (2015) 10(11):2512-9; and Zheng et al. Nucleic Acids Res. (2017) 45(6):3369-3377, each of which is incorporated herein by reference in its entirety.

In some embodiments, the adenosine deaminase comprises a mutation at glycine336 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the glycine residue at position 336 is replaced by an aspartic acid residue (G336D).
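The substitution shorthand used throughout this section (e.g., G336D) follows the usual pattern of wild-type residue, position, and replacement residue. As a non-limiting sketch, the shorthand can be expanded mechanically using the standard one-letter amino acid codes. The function names below are hypothetical and are offered only to illustrate the notation.

```python
import re

# Standard one-letter amino acid codes (general convention, not specific
# to this disclosure).
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

# One letter, one or more digits, one letter, e.g. "G336D".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split shorthand such as 'G336D' into (wild_type, position, replacement)."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, replacement = match.groups()
    if wild_type not in AMINO_ACIDS or replacement not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid letter in {code!r}")
    return wild_type, int(position), replacement


def describe_substitution(code):
    """Return a prose description of a substitution code."""
    wild_type, position, replacement = parse_substitution(code)
    return (f"{AMINO_ACIDS[wild_type]} at position {position} "
            f"replaced by {AMINO_ACIDS[replacement]}")
```

For instance, `describe_substitution("G336D")` returns "glycine at position 336 replaced by aspartic acid", matching the wording used above. A check of this kind may help keep the residue names in prose consistent with the shorthand.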

In some embodiments, the adenosine deaminase comprises a mutation at glycine487 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the glycine residue at position 487 is replaced by a non-polar amino acid residue with relatively small side chains. For example, in some embodiments, the glycine residue at position 487 is replaced by an alanine residue (G487A). In some embodiments, the glycine residue at position 487 is replaced by a valine residue (G487V). In some embodiments, the glycine residue at position 487 is replaced by an amino acid residue with relatively large side chains. In some embodiments, the glycine residue at position 487 is replaced by an arginine residue (G487R). In some embodiments, the glycine residue at position 487 is replaced by a lysine residue (G487K). In some embodiments, the glycine residue at position 487 is replaced by a tryptophan residue (G487W). In some embodiments, the glycine residue at position 487 is replaced by a tyrosine residue (G487Y).

In some embodiments, the adenosine deaminase comprises a mutation at glutamic acid488 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the glutamic acid residue at position 488 is replaced by a glutamine residue (E488Q). In some embodiments, the glutamic acid residue at position 488 is replaced by a histidine residue (E488H). In some embodiments, the glutamic acid residue at position 488 is replaced by an arginine residue (E488R). In some embodiments, the glutamic acid residue at position 488 is replaced by a lysine residue (E488K). In some embodiments, the glutamic acid residue at position 488 is replaced by an asparagine residue (E488N). In some embodiments, the glutamic acid residue at position 488 is replaced by an alanine residue (E488A). In some embodiments, the glutamic acid residue at position 488 is replaced by a methionine residue (E488M). In some embodiments, the glutamic acid residue at position 488 is replaced by a serine residue (E488S). In some embodiments, the glutamic acid residue at position 488 is replaced by a phenylalanine residue (E488F). In some embodiments, the glutamic acid residue at position 488 is replaced by a leucine residue (E488L). In some embodiments, the glutamic acid residue at position 488 is replaced by a tryptophan residue (E488W).

In some embodiments, the adenosine deaminase comprises a mutation at threonine490 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the threonine residue at position 490 is replaced by a cysteine residue (T490C). In some embodiments, the threonine residue at position 490 is replaced by a serine residue (T490S). In some embodiments, the threonine residue at position 490 is replaced by an alanine residue (T490A). In some embodiments, the threonine residue at position 490 is replaced by a phenylalanine residue (T490F). In some embodiments, the threonine residue at position 490 is replaced by a tyrosine residue (T490Y). In some embodiments, the threonine residue at position 490 is replaced by an arginine residue (T490R). In some embodiments, the threonine residue at position 490 is replaced by a lysine residue (T490K). In some embodiments, the threonine residue at position 490 is replaced by a proline residue (T490P). In some embodiments, the threonine residue at position 490 is replaced by a glutamic acid residue (T490E).

In some embodiments, the adenosine deaminase comprises a mutation at valine493 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the valine residue at position 493 is replaced by an alanine residue (V493A). In some embodiments, the valine residue at position 493 is replaced by a serine residue (V493S). In some embodiments, the valine residue at position 493 is replaced by a threonine residue (V493T). In some embodiments, the valine residue at position 493 is replaced by an arginine residue (V493R). In some embodiments, the valine residue at position 493 is replaced by an aspartic acid residue (V493D). In some embodiments, the valine residue at position 493 is replaced by a proline residue (V493P). In some embodiments, the valine residue at position 493 is replaced by a glycine residue (V493G).

In some embodiments, the adenosine deaminase comprises a mutation at alanine589 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the alanine residue at position 589 is replaced by a valine residue (A589V).

In some embodiments, the adenosine deaminase comprises a mutation at asparagine597 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the asparagine residue at position 597 is replaced by a lysine residue (N597K). In some embodiments, the adenosine deaminase comprises a mutation at position 597 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 597 is replaced by an arginine residue (N597R). In some embodiments, the adenosine deaminase comprises a mutation at position 597 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 597 is replaced by an alanine residue (N597A). In some embodiments, the adenosine deaminase comprises a mutation at position 597 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 597 is replaced by a glutamic acid residue (N597E). In some embodiments, the adenosine deaminase comprises a mutation at position 597 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 597 is replaced by a histidine residue (N597H). In some embodiments, the adenosine deaminase comprises a mutation at position 597 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 597 is replaced by a glycine residue (N597G). In some embodiments, the adenosine deaminase comprises a mutation at position 597 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 597 is replaced by a tyrosine residue (N597Y). 
In some embodiments, the asparagine residue at position 597 is replaced by a phenylalanine residue (N597F). In some embodiments, the adenosine deaminase comprises mutation N597I. In some embodiments, the adenosine deaminase comprises mutation N597L. In some embodiments, the adenosine deaminase comprises mutation N597V. In some embodiments, the adenosine deaminase comprises mutation N597M. In some embodiments, the adenosine deaminase comprises mutation N597C. In some embodiments, the adenosine deaminase comprises mutation N597P. In some embodiments, the adenosine deaminase comprises mutation N597T. In some embodiments, the adenosine deaminase comprises mutation N597S. In some embodiments, the adenosine deaminase comprises mutation N597W. In some embodiments, the adenosine deaminase comprises mutation N597Q. In some embodiments, the adenosine deaminase comprises mutation N597D. In certain example embodiments, the mutations at N597 described above are further made in the context of an E488Q background.

In some embodiments, the adenosine deaminase comprises a mutation at serine599 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the serine residue at position 599 is replaced by a threonine residue (S599T).

In some embodiments, the adenosine deaminase comprises a mutation at asparagine613 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the asparagine residue at position 613 is replaced by a lysine residue (N613K). In some embodiments, the adenosine deaminase comprises a mutation at position 613 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 613 is replaced by an arginine residue (N613R). In some embodiments, the adenosine deaminase comprises a mutation at position 613 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 613 is replaced by an alanine residue (N613A) In some embodiments, the adenosine deaminase comprises a mutation at position 613 of the amino acid sequence, which has an asparagine residue in the wild type sequence. In some embodiments, the asparagine residue at position 613 is replaced by a glutamic acid residue (N613E). In some embodiments, the adenosine deaminase comprises mutation N613I. In some embodiments, the adenosine deaminase comprises mutation N613L. In some embodiments, the adenosine deaminase comprises mutation N613V. In some embodiments, the adenosine deaminase comprises mutation N613F. In some embodiments, the adenosine deaminase comprises mutation N613M. In some embodiments, the adenosine deaminase comprises mutation N613C. In some embodiments, the adenosine deaminase comprises mutation N613G. In some embodiments, the adenosine deaminase comprises mutation N613P. In some embodiments, the adenosine deaminase comprises mutation N613T. In some embodiments, the adenosine deaminase comprises mutation N613S. In some embodiments, the adenosine deaminase comprises mutation N613Y. In some embodiments, the adenosine deaminase comprises mutation N613W. 
In some embodiments, the adenosine deaminase comprises mutation N613Q. In some embodiments, the adenosine deaminase comprises mutation N613H. In some embodiments, the adenosine deaminase comprises mutation N613D. In some embodiments, the mutations at N613 described above are further made in combination with a E488Q mutation.

In some embodiments, to improve editing efficiency, the adenosine deaminase may comprise one or more of the mutations: G336D, G487A, G487V, E488Q, E488H, E488R, E488N, E488A, E488S, E488M, T490C, T490S, V493T, V493S, V493A, V493R, V493D, V493P, V493G, N597K, N597R, N597A, N597E, N597H, N597G, N597Y, A589V, S599T, N613K, N613R, N613A, N613E, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above.

In some embodiments, to reduce editing efficiency, the adenosine deaminase may comprise one or more of the mutations: E488F, E488L, E488W, T490A, T490F, T490Y, T490R, T490K, T490P, T490E, N597F, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In particular embodiments, it can be of interest to use an adenosine deaminase enzyme with reduced efficacy to reduce off-target effects.
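The two groups of mutations recited above, those listed for improving editing efficiency and those listed for reducing it, can be represented as simple lookup sets. The following sketch is illustrative only; the names `IMPROVE_EFFICIENCY`, `REDUCE_EFFICIENCY`, and `efficiency_effect` are hypothetical, and the sets merely transcribe the lists given in the text (hADAR2-D numbering). No prediction is made for mutations that appear in neither list.

```python
# Mutations listed in the text as improving editing efficiency
# (hADAR2-D numbering).
IMPROVE_EFFICIENCY = frozenset("""
    G336D G487A G487V E488Q E488H E488R E488N E488A E488S E488M
    T490C T490S V493T V493S V493A V493R V493D V493P V493G
    N597K N597R N597A N597E N597H N597G N597Y A589V S599T
    N613K N613R N613A N613E
""".split())

# Mutations listed in the text as reducing editing efficiency.
REDUCE_EFFICIENCY = frozenset("""
    E488F E488L E488W T490A T490F T490Y T490R T490K T490P T490E N597F
""".split())


def efficiency_effect(mutation):
    """Return 'improve', 'reduce', or 'unlisted' for a substitution code."""
    if mutation in IMPROVE_EFFICIENCY:
        return "improve"
    if mutation in REDUCE_EFFICIENCY:
        return "reduce"
    # Not in either list; the text makes no efficiency statement here.
    return "unlisted"
```

Under this sketch, `efficiency_effect("E488Q")` returns `"improve"` and `efficiency_effect("T490A")` returns `"reduce"`, while a substitution such as G487R, which is described elsewhere but appears in neither list, returns `"unlisted"`.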

In some embodiments, to reduce off-target effects, the adenosine deaminase comprises one or more of mutations at R348, V351, T375, K376, E396, C451, R455, N473, R474, K475, R477, R481, S486, E488, T490, S495, R510, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase comprises mutation at E488 and one or more additional positions selected from R348, V351, T375, K376, E396, C451, R455, N473, R474, K475, R477, R481, S486, T490, S495, R510. In some embodiments, the adenosine deaminase comprises mutation at T375, and optionally at one or more additional positions. In some embodiments, the adenosine deaminase comprises mutation at N473, and optionally at one or more additional positions. In some embodiments, the adenosine deaminase comprises mutation at V351, and optionally at one or more additional positions. In some embodiments, the adenosine deaminase comprises mutation at E488 and T375, and optionally at one or more additional positions. In some embodiments, the adenosine deaminase comprises mutation at E488 and N473, and optionally at one or more additional positions. In some embodiments, the adenosine deaminase comprises mutation at E488 and V351, and optionally at one or more additional positions. In some embodiments, the adenosine deaminase comprises mutation at E488 and one or more of T375, N473, and V351.

In some embodiments, to reduce off-target effects, the adenosine deaminase comprises one or more of mutations selected from R348E, V351L, T375G, T375S, R455G, R455S, R455E, N473D, R474E, K475Q, R477E, R481E, S486T, E488Q, T490A, T490S, S495T, and R510E, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase comprises mutation E488Q and one or more additional mutations selected from R348E, V351L, T375G, T375S, R455G, R455S, R455E, N473D, R474E, K475Q, R477E, R481E, S486T, T490A, T490S, S495T, and R510E. In some embodiments, the adenosine deaminase comprises mutation T375G or T375S, and optionally one or more additional mutations. In some embodiments, the adenosine deaminase comprises mutation N473D, and optionally one or more additional mutations. In some embodiments, the adenosine deaminase comprises mutation V351L, and optionally one or more additional mutations. In some embodiments, the adenosine deaminase comprises mutation E488Q, and T375G or T375S, and optionally one or more additional mutations. In some embodiments, the adenosine deaminase comprises mutation E488Q and N473D, and optionally one or more additional mutations. In some embodiments, the adenosine deaminase comprises mutation E488Q and V351L, and optionally one or more additional mutations. In some embodiments, the adenosine deaminase comprises mutation E488Q and one or more of T375G/S, N473D and V351L.
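As a non-limiting sketch, the combination recited in the last embodiment above (E488Q together with one or more of T375G/S, N473D and V351L) can be expressed as a simple set test. The names `OFF_TARGET_MUTATIONS`, `E488Q_PARTNERS`, `is_e488q_combination`, and `listed_off_target` are hypothetical; the sets only transcribe the mutations listed in the text (hADAR2-D numbering) and imply nothing about the activity of any particular combination.

```python
# Specific mutations listed in the text for reducing off-target effects.
OFF_TARGET_MUTATIONS = frozenset("""
    R348E V351L T375G T375S R455G R455S R455E N473D R474E K475Q
    R477E R481E S486T E488Q T490A T490S S495T R510E
""".split())

# Mutations recited as partners of E488Q in the final embodiment.
E488Q_PARTNERS = frozenset({"T375G", "T375S", "N473D", "V351L"})


def is_e488q_combination(mutations):
    """True if the set contains E488Q plus at least one recited partner."""
    chosen = set(mutations)
    return "E488Q" in chosen and bool(chosen & E488Q_PARTNERS)


def listed_off_target(mutations):
    """Return, sorted, the mutations that appear in the off-target list."""
    return sorted(set(mutations) & OFF_TARGET_MUTATIONS)
```

For example, `is_e488q_combination(["E488Q", "T375G"])` is true, whereas E488Q alone, or T375S with N473D but without E488Q, does not satisfy this particular test. Additional mutations in the input are permitted, consistent with the "optionally one or more additional mutations" language.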

In certain examples, the adenosine deaminase protein or catalytic domain thereof has been modified to comprise a mutation at E488, preferably E488Q, of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein and/or wherein the adenosine deaminase protein or catalytic domain thereof has been modified to comprise a mutation at T375, preferably T375G, of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In certain examples, the adenosine deaminase protein or catalytic domain thereof has been modified to comprise a mutation at E1008, preferably E1008Q, of the hADAR1-D amino acid sequence, or a corresponding position in a homologous ADAR protein.

Crystal structures of the human ADAR2 deaminase domain bound to duplex RNA reveal a protein loop that binds the RNA on the 5′ side of the modification site. This 5′ binding loop is one contributor to substrate specificity differences between ADAR family members. See Wang et al., Nucleic Acids Res., 44(20):9872-9880 (2016), the content of which is incorporated herein by reference in its entirety. In addition, an ADAR2-specific RNA-binding loop was identified near the enzyme active site. See Matthews et al., Nat. Struct. Mol. Biol., 23(5):426-33 (2016), the content of which is incorporated herein by reference in its entirety. In some embodiments, the adenosine deaminase comprises one or more mutations in the RNA binding loop to improve editing specificity and/or efficiency.

In some embodiments, the adenosine deaminase comprises a mutation at alanine454 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the alanine residue at position 454 is replaced by a serine residue (A454S). In some embodiments, the alanine residue at position 454 is replaced by a cysteine residue (A454C). In some embodiments, the alanine residue at position 454 is replaced by an aspartic acid residue (A454D).

In some embodiments, the adenosine deaminase comprises a mutation at arginine455 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the arginine residue at position 455 is replaced by an alanine residue (R455A). In some embodiments, the arginine residue at position 455 is replaced by a valine residue (R455V). In some embodiments, the arginine residue at position 455 is replaced by a histidine residue (R455H). In some embodiments, the arginine residue at position 455 is replaced by a glycine residue (R455G). In some embodiments, the arginine residue at position 455 is replaced by a serine residue (R455S). In some embodiments, the arginine residue at position 455 is replaced by a glutamic acid residue (R455E). In some embodiments, the adenosine deaminase comprises mutation R455C. In some embodiments, the adenosine deaminase comprises mutation R455I. In some embodiments, the adenosine deaminase comprises mutation R455K. In some embodiments, the adenosine deaminase comprises mutation R455L. In some embodiments, the adenosine deaminase comprises mutation R455M. In some embodiments, the adenosine deaminase comprises mutation R455N. In some embodiments, the adenosine deaminase comprises mutation R455Q. In some embodiments, the adenosine deaminase comprises mutation R455F. In some embodiments, the adenosine deaminase comprises mutation R455W. In some embodiments, the adenosine deaminase comprises mutation R455P. In some embodiments, the adenosine deaminase comprises mutation R455Y. In some embodiments, the adenosine deaminase comprises mutation R455E. In some embodiments, the adenosine deaminase comprises mutation R455D. In some embodiments, the mutations at R455 described above are further made in combination with a E488Q mutation.

In some embodiments, the adenosine deaminase comprises a mutation at isoleucine456 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the isoleucine residue at position 456 is replaced by a valine residue (I456V). In some embodiments, the isoleucine residue at position 456 is replaced by a leucine residue (I456L). In some embodiments, the isoleucine residue at position 456 is replaced by an aspartic acid residue (I456D).

In some embodiments, the adenosine deaminase comprises a mutation at phenylalanine457 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the phenylalanine residue at position 457 is replaced by a tyrosine residue (F457Y). In some embodiments, the phenylalanine residue at position 457 is replaced by an arginine residue (F457R). In some embodiments, the phenylalanine residue at position 457 is replaced by a glutamic acid residue (F457E).

In some embodiments, the adenosine deaminase comprises a mutation at serine458 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the serine residue at position 458 is replaced by a valine residue (S458V). In some embodiments, the serine residue at position 458 is replaced by a phenylalanine residue (S458F). In some embodiments, the serine residue at position 458 is replaced by a proline residue (S458P). In some embodiments, the adenosine deaminase comprises mutation S458I. In some embodiments, the adenosine deaminase comprises mutation S458L. In some embodiments, the adenosine deaminase comprises mutation S458M. In some embodiments, the adenosine deaminase comprises mutation S458C. In some embodiments, the adenosine deaminase comprises mutation S458A. In some embodiments, the adenosine deaminase comprises mutation S458G. In some embodiments, the adenosine deaminase comprises mutation S458T. In some embodiments, the adenosine deaminase comprises mutation S458Y. In some embodiments, the adenosine deaminase comprises mutation S458W. In some embodiments, the adenosine deaminase comprises mutation S458Q. In some embodiments, the adenosine deaminase comprises mutation S458N. In some embodiments, the adenosine deaminase comprises mutation S458H. In some embodiments, the adenosine deaminase comprises mutation S458E. In some embodiments, the adenosine deaminase comprises mutation S458D. In some embodiments, the adenosine deaminase comprises mutation S458K. In some embodiments, the adenosine deaminase comprises mutation S458R. In some embodiments, the mutations at S458 described above are further made in combination with an E488Q mutation.

In some embodiments, the adenosine deaminase comprises a mutation at proline459 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the proline residue at position 459 is replaced by a cysteine residue (P459C). In some embodiments, the proline residue at position 459 is replaced by a histidine residue (P459H). In some embodiments, the proline residue at position 459 is replaced by a tryptophan residue (P459W).

In some embodiments, the adenosine deaminase comprises a mutation at histidine460 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the histidine residue at position 460 is replaced by an arginine residue (H460R). In some embodiments, the histidine residue at position 460 is replaced by an isoleucine residue (H460I). In some embodiments, the histidine residue at position 460 is replaced by a proline residue (H460P). In some embodiments, the adenosine deaminase comprises mutation H460L. In some embodiments, the adenosine deaminase comprises mutation H460V. In some embodiments, the adenosine deaminase comprises mutation H460F. In some embodiments, the adenosine deaminase comprises mutation H460M. In some embodiments, the adenosine deaminase comprises mutation H460C. In some embodiments, the adenosine deaminase comprises mutation H460A. In some embodiments, the adenosine deaminase comprises mutation H460G. In some embodiments, the adenosine deaminase comprises mutation H460T. In some embodiments, the adenosine deaminase comprises mutation H460S. In some embodiments, the adenosine deaminase comprises mutation H460Y. In some embodiments, the adenosine deaminase comprises mutation H460W. In some embodiments, the adenosine deaminase comprises mutation H460Q. In some embodiments, the adenosine deaminase comprises mutation H460N. In some embodiments, the adenosine deaminase comprises mutation H460E. In some embodiments, the adenosine deaminase comprises mutation H460D. In some embodiments, the adenosine deaminase comprises mutation H460K. In some embodiments, the mutations at H460 described above are further made in combination with an E488Q mutation.

In some embodiments, the adenosine deaminase comprises a mutation at proline462 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the proline residue at position 462 is replaced by a serine residue (P462S). In some embodiments, the proline residue at position 462 is replaced by a tryptophan residue (P462W). In some embodiments, the proline residue at position 462 is replaced by a glutamic acid residue (P462E).

In some embodiments, the adenosine deaminase comprises a mutation at aspartic acid469 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the aspartic acid residue at position 469 is replaced by a glutamine residue (D469Q). In some embodiments, the aspartic acid residue at position 469 is replaced by a serine residue (D469S). In some embodiments, the aspartic acid residue at position 469 is replaced by a tyrosine residue (D469Y).

In some embodiments, the adenosine deaminase comprises a mutation at arginine470 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the arginine residue at position 470 is replaced by an alanine residue (R470A). In some embodiments, the arginine residue at position 470 is replaced by an isoleucine residue (R470I). In some embodiments, the arginine residue at position 470 is replaced by an aspartic acid residue (R470D).

In some embodiments, the adenosine deaminase comprises a mutation at histidine471 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the histidine residue at position 471 is replaced by a lysine residue (H471K). In some embodiments, the histidine residue at position 471 is replaced by a threonine residue (H471T). In some embodiments, the histidine residue at position 471 is replaced by a valine residue (H471V).

In some embodiments, the adenosine deaminase comprises a mutation at proline472 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the proline residue at position 472 is replaced by a lysine residue (P472K). In some embodiments, the proline residue at position 472 is replaced by a threonine residue (P472T). In some embodiments, the proline residue at position 472 is replaced by an aspartic acid residue (P472D).

In some embodiments, the adenosine deaminase comprises a mutation at asparagine473 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the asparagine residue at position 473 is replaced by an arginine residue (N473R). In some embodiments, the asparagine residue at position 473 is replaced by a tryptophan residue (N473W). In some embodiments, the asparagine residue at position 473 is replaced by a proline residue (N473P). In some embodiments, the asparagine residue at position 473 is replaced by an aspartic acid residue (N473D).

In some embodiments, the adenosine deaminase comprises a mutation at arginine 474 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the arginine residue at position 474 is replaced by a lysine residue (R474K). In some embodiments, the arginine residue at position 474 is replaced by a glycine residue (R474G). In some embodiments, the arginine residue at position 474 is replaced by an aspartic acid residue (R474D). In some embodiments, the arginine residue at position 474 is replaced by a glutamic acid residue (R474E).

In some embodiments, the adenosine deaminase comprises a mutation at lysine475 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the lysine residue at position 475 is replaced by a glutamine residue (K475Q). In some embodiments, the lysine residue at position 475 is replaced by an asparagine residue (K475N). In some embodiments, the lysine residue at position 475 is replaced by an aspartic acid residue (K475D).

In some embodiments, the adenosine deaminase comprises a mutation at alanine476 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the alanine residue at position 476 is replaced by a serine residue (A476S). In some embodiments, the alanine residue at position 476 is replaced by an arginine residue (A476R). In some embodiments, the alanine residue at position 476 is replaced by a glutamic acid residue (A476E).

In some embodiments, the adenosine deaminase comprises a mutation at arginine477 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the arginine residue at position 477 is replaced by a lysine residue (R477K). In some embodiments, the arginine residue at position 477 is replaced by a threonine residue (R477T). In some embodiments, the arginine residue at position 477 is replaced by a phenylalanine residue (R477F). In some embodiments, the arginine residue at position 477 is replaced by a glutamic acid residue (R477E).

In some embodiments, the adenosine deaminase comprises a mutation at glycine478 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the glycine residue at position 478 is replaced by an alanine residue (G478A). In some embodiments, the glycine residue at position 478 is replaced by an arginine residue (G478R). In some embodiments, the glycine residue at position 478 is replaced by a tyrosine residue (G478Y). In some embodiments, the adenosine deaminase comprises mutation G478I. In some embodiments, the adenosine deaminase comprises mutation G478L. In some embodiments, the adenosine deaminase comprises mutation G478V. In some embodiments, the adenosine deaminase comprises mutation G478F. In some embodiments, the adenosine deaminase comprises mutation G478M. In some embodiments, the adenosine deaminase comprises mutation G478C. In some embodiments, the adenosine deaminase comprises mutation G478P. In some embodiments, the adenosine deaminase comprises mutation G478T. In some embodiments, the adenosine deaminase comprises mutation G478S. In some embodiments, the adenosine deaminase comprises mutation G478W. In some embodiments, the adenosine deaminase comprises mutation G478Q. In some embodiments, the adenosine deaminase comprises mutation G478N. In some embodiments, the adenosine deaminase comprises mutation G478H. In some embodiments, the adenosine deaminase comprises mutation G478E. In some embodiments, the adenosine deaminase comprises mutation G478D. In some embodiments, the adenosine deaminase comprises mutation G478K. In some embodiments, the mutations at G478 described above are further made in combination with an E488Q mutation.

In some embodiments, the adenosine deaminase comprises a mutation at glutamine479 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the glutamine residue at position 479 is replaced by an asparagine residue (Q479N). In some embodiments, the glutamine residue at position 479 is replaced by a serine residue (Q479S). In some embodiments, the glutamine residue at position 479 is replaced by a proline residue (Q479P).

In some embodiments, the adenosine deaminase comprises a mutation at arginine348 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the arginine residue at position 348 is replaced by an alanine residue (R348A). In some embodiments, the arginine residue at position 348 is replaced by a glutamic acid residue (R348E).

In some embodiments, the adenosine deaminase comprises a mutation at valine351 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the valine residue at position 351 is replaced by a leucine residue (V351L). In some embodiments, the adenosine deaminase comprises mutation V351Y. In some embodiments, the adenosine deaminase comprises mutation V351M. In some embodiments, the adenosine deaminase comprises mutation V351T. In some embodiments, the adenosine deaminase comprises mutation V351G. In some embodiments, the adenosine deaminase comprises mutation V351A. In some embodiments, the adenosine deaminase comprises mutation V351F. In some embodiments, the adenosine deaminase comprises mutation V351E. In some embodiments, the adenosine deaminase comprises mutation V351I. In some embodiments, the adenosine deaminase comprises mutation V351C. In some embodiments, the adenosine deaminase comprises mutation V351H. In some embodiments, the adenosine deaminase comprises mutation V351P. In some embodiments, the adenosine deaminase comprises mutation V351S. In some embodiments, the adenosine deaminase comprises mutation V351K. In some embodiments, the adenosine deaminase comprises mutation V351N. In some embodiments, the adenosine deaminase comprises mutation V351W. In some embodiments, the adenosine deaminase comprises mutation V351Q. In some embodiments, the adenosine deaminase comprises mutation V351D. In some embodiments, the adenosine deaminase comprises mutation V351R. In some embodiments, the mutations at V351 described above are further made in combination with an E488Q mutation.

In some embodiments, the adenosine deaminase comprises a mutation at threonine375 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the threonine residue at position 375 is replaced by a glycine residue (T375G). In some embodiments, the threonine residue at position 375 is replaced by a serine residue (T375S). In some embodiments, the adenosine deaminase comprises mutation T375H. In some embodiments, the adenosine deaminase comprises mutation T375Q. In some embodiments, the adenosine deaminase comprises mutation T375C. In some embodiments, the adenosine deaminase comprises mutation T375N. In some embodiments, the adenosine deaminase comprises mutation T375M. In some embodiments, the adenosine deaminase comprises mutation T375A. In some embodiments, the adenosine deaminase comprises mutation T375W. In some embodiments, the adenosine deaminase comprises mutation T375V. In some embodiments, the adenosine deaminase comprises mutation T375R. In some embodiments, the adenosine deaminase comprises mutation T375E. In some embodiments, the adenosine deaminase comprises mutation T375K. In some embodiments, the adenosine deaminase comprises mutation T375F. In some embodiments, the adenosine deaminase comprises mutation T375I. In some embodiments, the adenosine deaminase comprises mutation T375D. In some embodiments, the adenosine deaminase comprises mutation T375P. In some embodiments, the adenosine deaminase comprises mutation T375L. In some embodiments, the adenosine deaminase comprises mutation T375Y. In some embodiments, the mutations at T375 described above are further made in combination with an E488Q mutation.

In some embodiments, the adenosine deaminase comprises a mutation at Arg481 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the arginine residue at position 481 is replaced by a glutamic acid residue (R481E).

In some embodiments, the adenosine deaminase comprises a mutation at Ser486 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the serine residue at position 486 is replaced by a threonine residue (S486T).

In some embodiments, the adenosine deaminase comprises a mutation at Thr490 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the threonine residue at position 490 is replaced by an alanine residue (T490A). In some embodiments, the threonine residue at position 490 is replaced by a serine residue (T490S).

In some embodiments, the adenosine deaminase comprises a mutation at Ser495 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the serine residue at position 495 is replaced by a threonine residue (S495T).

In some embodiments, the adenosine deaminase comprises a mutation at Arg510 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the arginine residue at position 510 is replaced by a glutamine residue (R510Q). In some embodiments, the arginine residue at position 510 is replaced by an alanine residue (R510A). In some embodiments, the arginine residue at position 510 is replaced by a glutamic acid residue (R510E).

In some embodiments, the adenosine deaminase comprises a mutation at Gly593 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the glycine residue at position 593 is replaced by an alanine residue (G593A). In some embodiments, the glycine residue at position 593 is replaced by a glutamic acid residue (G593E).

In some embodiments, the adenosine deaminase comprises a mutation at Lys594 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the lysine residue at position 594 is replaced by an alanine residue (K594A).

In some embodiments, the adenosine deaminase comprises a mutation at any one or more of positions A454, R455, I456, F457, S458, P459, H460, P462, D469, R470, H471, P472, N473, R474, K475, A476, R477, G478, Q479, R348, R510, G593, K594 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein.

In some embodiments, the adenosine deaminase comprises any one or more of mutations A454S, A454C, A454D, R455A, R455V, R455H, I456V, I456L, I456D, F457Y, F457R, F457E, S458V, S458F, S458P, P459C, P459H, P459W, H460R, H460I, H460P, P462S, P462W, P462E, D469Q, D469S, D469Y, R470A, R470I, R470D, H471K, H471T, H471V, P472K, P472T, P472D, N473R, N473W, N473P, R474K, R474G, R474D, K475Q, K475N, K475D, A476S, A476R, A476E, R477K, R477T, R477F, G478A, G478R, G478Y, Q479N, Q479S, Q479P, R348A, R510Q, R510A, G593A, G593E, K594A of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein.
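By way of non-limiting illustration, the substitution labels used above follow the pattern of wild-type residue, position, replacement residue (for example, A454S). The following Python sketch parses such labels and applies them to a sequence. The names `Substitution`, `parse_substitution`, and `apply_substitutions`, the `offset` parameter, and the five-residue fragment in the demonstration are assumptions made for illustration only; the fragment is merely inferred from the residue names recited above for positions 454 to 458 and is not taken from the sequence listing.

```python
import re
from typing import Iterable, NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_LABEL = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    wild_type: str   # one-letter code of the original residue
    position: int    # residue number in the reference numbering
    mutant: str      # one-letter code of the replacement residue


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'A454S'; reject malformed labels such as '1456V'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a valid substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def apply_substitutions(sequence: str, labels: Iterable[str], offset: int = 1) -> str:
    """Apply substitutions to `sequence`, whose first residue is numbered `offset`.

    Each label is checked against the residue currently at that position, so a
    label with the wrong wild-type residue (or a second label at an already
    mutated position) raises ValueError instead of being applied silently.
    """
    residues = list(sequence)
    for label in labels:
        sub = parse_substitution(label)
        index = sub.position - offset
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the supplied sequence")
        if residues[index] != sub.wild_type:
            raise ValueError(
                f"{label}: expected {sub.wild_type}, found {residues[index]}"
            )
        residues[index] = sub.mutant
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical fragment covering positions 454-458 (A, R, I, F, S).
    print(apply_substitutions("ARIFS", ["A454S", "S458F"], offset=454))  # SRIFF
```

Checking the wild-type residue before substituting is a design choice that helps catch numbering mismatches when a label is transferred to a corresponding position in a homologous protein.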

In some embodiments, the adenosine deaminase comprises a mutation at any one or more of positions T375, V351, G478, S458, H460 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein, optionally in combination with a mutation at E488. In some embodiments, the adenosine deaminase comprises one or more of mutations selected from T375G, T375C, T375H, T375Q, V351M, V351T, V351Y, G478R, S458F, H460I, optionally in combination with E488Q.

In some embodiments, the adenosine deaminase comprises one or more of mutations selected from T375H, T375Q, V351M, V351Y, H460P, optionally in combination with E488Q.

In some embodiments, the adenosine deaminase comprises mutations T375S and S458F, optionally in combination with E488Q.

In some embodiments, the adenosine deaminase comprises a mutation at two or more of positions T375, N473, R474, G478, S458, P459, V351, R455, T490, R348, Q479 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein, optionally in combination with a mutation at E488. In some embodiments, the adenosine deaminase comprises two or more of mutations selected from T375G, T375S, N473D, R474E, G478R, S458F, P459W, V351L, R455G, R455S, T490A, R348E, Q479P, optionally in combination with E488Q.

In some embodiments, the adenosine deaminase comprises mutations T375G and V351L. In some embodiments, the adenosine deaminase comprises mutations T375G and R455G. In some embodiments, the adenosine deaminase comprises mutations T375G and R455S. In some embodiments, the adenosine deaminase comprises mutations T375G and T490A. In some embodiments, the adenosine deaminase comprises mutations T375G and R348E. In some embodiments, the adenosine deaminase comprises mutations T375S and V351L. In some embodiments, the adenosine deaminase comprises mutations T375S and R455G. In some embodiments, the adenosine deaminase comprises mutations T375S and R455S. In some embodiments, the adenosine deaminase comprises mutations T375S and T490A. In some embodiments, the adenosine deaminase comprises mutations T375S and R348E. In some embodiments, the adenosine deaminase comprises mutations N473D and V351L. In some embodiments, the adenosine deaminase comprises mutations N473D and R455G. In some embodiments, the adenosine deaminase comprises mutations N473D and R455S. In some embodiments, the adenosine deaminase comprises mutations N473D and T490A. In some embodiments, the adenosine deaminase comprises mutations N473D and R348E. In some embodiments, the adenosine deaminase comprises mutations R474E and V351L. In some embodiments, the adenosine deaminase comprises mutations R474E and R455G. In some embodiments, the adenosine deaminase comprises mutations R474E and R455S. In some embodiments, the adenosine deaminase comprises mutations R474E and T490A. In some embodiments, the adenosine deaminase comprises mutations R474E and R348E. In some embodiments, the adenosine deaminase comprises mutations S458F and T375G. In some embodiments, the adenosine deaminase comprises mutations S458F and T375S. In some embodiments, the adenosine deaminase comprises mutations S458F and N473D. In some embodiments, the adenosine deaminase comprises mutations S458F and R474E. 
In some embodiments, the adenosine deaminase comprises mutations S458F and G478R. In some embodiments, the adenosine deaminase comprises mutations G478R and T375G. In some embodiments, the adenosine deaminase comprises mutations G478R and T375S. In some embodiments, the adenosine deaminase comprises mutations G478R and N473D. In some embodiments, the adenosine deaminase comprises mutations G478R and R474E. In some embodiments, the adenosine deaminase comprises mutations P459W and T375G. In some embodiments, the adenosine deaminase comprises mutations P459W and T375S. In some embodiments, the adenosine deaminase comprises mutations P459W and N473D. In some embodiments, the adenosine deaminase comprises mutations P459W and R474E. In some embodiments, the adenosine deaminase comprises mutations P459W and G478R. In some embodiments, the adenosine deaminase comprises mutations P459W and S458F. In some embodiments, the adenosine deaminase comprises mutations Q479P and T375G. In some embodiments, the adenosine deaminase comprises mutations Q479P and T375S. In some embodiments, the adenosine deaminase comprises mutations Q479P and N473D. In some embodiments, the adenosine deaminase comprises mutations Q479P and R474E. In some embodiments, the adenosine deaminase comprises mutations Q479P and G478R. In some embodiments, the adenosine deaminase comprises mutations Q479P and S458F. In some embodiments, the adenosine deaminase comprises mutations Q479P and P459W. All mutations described in this paragraph may also further be made in combination with an E488Q mutation.

In some embodiments, the adenosine deaminase comprises a mutation at any one or more of positions K475, Q479, P459, G478, S458 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein, optionally in combination with a mutation at E488. In some embodiments, the adenosine deaminase comprises one or more of mutations selected from K475N, Q479N, P459W, G478R, S458P, S458F, optionally in combination with E488Q.

In some embodiments, the adenosine deaminase comprises a mutation at any one or more of positions T375, V351, R455, H460, A476 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein, optionally in combination with a mutation at E488. In some embodiments, the adenosine deaminase comprises one or more of mutations selected from T375G, T375C, T375H, T375Q, V351M, V351T, V351Y, R455H, H460P, H460I, A476E, optionally in combination with E488Q.

In certain embodiments, improvement of editing and reduction of off-target modification is achieved by chemical modification of gRNAs. gRNAs which are chemically modified as exemplified in Vogel et al. (2014), Angew Chem Int Ed, 53:6267-6271, doi:10.1002/anie.201402634 (incorporated herein by reference in its entirety) reduce off-target activity and improve on-target efficiency. 2′-O-methyl and phosphorothioate modified guide RNAs in general improve editing efficiency in cells.

ADAR has been known to demonstrate a preference for neighboring nucleotides on either side of the edited A (www.nature.com/nsmb/journal/v23/n5/full/nsmb.3203.html, Matthews et al. (2017), Nature Structural Mol Biol, 23(5): 426-433, incorporated herein by reference in its entirety). Accordingly, in certain embodiments, the gRNA, target, and/or ADAR is selected or optimized for motif preference.

Intentional mismatches have been demonstrated in vitro to allow for editing of non-preferred motifs (academic.oup.com/nar/article-lookup/doi/10.1093/nar/gku272; Schneider et al. (2014), Nucleic Acids Res, 42(10):e87; Fukuda et al. (2017), Scientific Reports, 7, doi:10.1038/srep41478, incorporated herein by reference in its entirety). Accordingly, in certain embodiments, to enhance RNA editing efficiency on non-preferred 5′ or 3′ neighboring bases, intentional mismatches in neighboring bases are introduced.

In some embodiments, the adenosine deaminase may be a tRNA-specific adenosine deaminase or a variant thereof. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: W23L, W23R, R26G, H36L, N37S, P48S, P48T, P48A, I49V, R51L, N72D, L84F, S97C, A106V, D108N, H123Y, G125A, A142N, S146C, D147Y, R152H, R152P, E155V, I156F, K157N, K161T, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise the mutation D108N, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, A142N, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above.
In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, H36L, R51L, S146C, K157N, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, H36L, R51L, S146C, K157N, P48S, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, H36L, R51L, S146C, K157N, P48S, A142N, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, H36L, R51L, S146C, K157N, P48S, W23R, P48A, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, H36L, R51L, S146C, K157N, P48S, W23R, P48A, A142N, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, H36L, R51L, S146C, K157N, P48S, W23R, P48A, R152P, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above. 
In some embodiments, the adenosine deaminase may comprise one or more of the mutations: A106V, D108N, D147Y, E155V, L84F, H123Y, I156F, H36L, R51L, S146C, K157N, P48S, W23R, P48A, R152P, A142N, based on amino acid sequence positions of E. coli TadA, and mutations in a homologous deaminase protein corresponding to the above.

Results suggest that A's opposite C's in the targeting window of the ADAR deaminase domain are preferentially edited over other bases. Additionally, A's base-paired with U's within a few bases of the targeted base show low levels of editing by CRISPR-Cas-ADAR fusions, suggesting that there is flexibility for the enzyme to edit multiple A's. These two observations suggest that multiple A's in the activity window of CRISPR-Cas-ADAR fusions could be specified for editing by mismatching all A's to be edited with C's. Accordingly, in certain embodiments, multiple A:C mismatches in the activity window are designed to create multiple A:I edits. In certain embodiments, to suppress potential off-target editing in the activity window, non-target A's are paired with A's or G's.
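By way of non-limiting illustration, the mismatch strategy described above can be sketched in Python as follows. The sketch assumes that the target RNA is supplied 5′ to 3′, that the activity window is given as a half-open range of 0-based target indices, and that every position not otherwise specified is Watson-Crick paired; the function name `design_guide` and its parameters are hypothetical and are not taken from any reference implementation. Spacer length, placement of the window relative to the guide, and other design constraints are deliberately not modelled.

```python
from typing import Iterable, Tuple

# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def design_guide(
    target: str,
    edit_sites: Iterable[int],
    window: Tuple[int, int],
    offtarget_base: str = "G",
) -> str:
    """Return a guide (5'->3') that pairs antiparallel with `target` (5'->3').

    - Each adenosine listed in `edit_sites` is placed opposite a C (A:C mismatch).
    - Every other adenosine inside `window` is placed opposite `offtarget_base`
      (A or G) to discourage editing at that site.
    - All remaining positions receive the Watson-Crick complement.
    """
    if offtarget_base not in ("A", "G"):
        raise ValueError("offtarget_base must be 'A' or 'G'")
    start, end = window
    sites = set(edit_sites)
    for i in sites:
        if not 0 <= i < len(target) or target[i] != "A":
            raise ValueError(f"edit site {i} is not an adenosine in the target")

    opposite = []  # base opposite each target position, in target order
    for i, base in enumerate(target):
        if base not in COMPLEMENT:
            raise ValueError(f"unexpected base {base!r} at position {i}")
        if i in sites:
            opposite.append("C")
        elif base == "A" and start <= i < end:
            opposite.append(offtarget_base)
        else:
            opposite.append(COMPLEMENT[base])

    # The guide is antiparallel to the target, so reverse to write it 5'->3'.
    return "".join(reversed(opposite))


if __name__ == "__main__":
    # Edit the A at index 3; the A's at indices 1 and 5 lie in the window.
    print(design_guide("GAUACAG", {3}, (1, 6)))  # CGGCAGC
```

Passing several indices in `edit_sites` corresponds to the embodiment in which multiple A:C mismatches are designed within the activity window.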

The terms “editing specificity” and “editing preference” are used interchangeably herein to refer to the extent of A-to-I editing at a particular adenosine site in a double-stranded substrate. In some embodiments, the substrate editing preference is determined by the 5′ nearest neighbor and/or the 3′ nearest neighbor of the target adenosine residue. In some embodiments, the adenosine deaminase has preference for the 5′ nearest neighbor of the substrate ranked as U>A>C>G (“>” indicates greater preference). In some embodiments, the adenosine deaminase has preference for the 3′ nearest neighbor of the substrate ranked as G>C˜A>U (“>” indicates greater preference; “˜” indicates similar preference). In some embodiments, the adenosine deaminase has preference for the 3′ nearest neighbor of the substrate ranked as G>C>U˜A (“>” indicates greater preference; “˜” indicates similar preference). In some embodiments, the adenosine deaminase has preference for the 3′ nearest neighbor of the substrate ranked as G>C>A>U (“>” indicates greater preference). In some embodiments, the adenosine deaminase has preference for the 3′ nearest neighbor of the substrate ranked as C˜G˜A>U (“>” indicates greater preference; “˜” indicates similar preference). In some embodiments, the adenosine deaminase has preference for a triplet sequence containing the target adenosine residue ranked as TAG>AAG>CAC>AAT>GAA>GAC (“>” indicates greater preference), the center A being the target adenosine residue.
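By way of non-limiting illustration, the nearest-neighbor rankings recited above can be encoded as lookup tables and used to order candidate adenosines in a sequence. The following Python sketch uses the 5′ ranking U>A>C>G and the 3′ ranking G>C˜A>U from the text, with tier 0 denoting the greatest preference. Combining the two tiers by simple addition is an assumption made only for this example and is not a scoring scheme described herein; the names `adenosine_contexts` and `rank_sites` are likewise hypothetical.

```python
from typing import List, Tuple

# Tier 0 = most preferred. Bases of similar preference share a tier.
FIVE_PRIME_TIER = {"U": 0, "A": 1, "C": 2, "G": 3}    # U > A > C > G
THREE_PRIME_TIER = {"G": 0, "C": 1, "A": 1, "U": 2}   # G > C ~ A > U

Context = Tuple[int, str, int, int]  # (index, triplet, 5' tier, 3' tier)


def adenosine_contexts(sequence: str) -> List[Context]:
    """List every internal adenosine with its triplet and neighbor tiers.

    DNA-style input is accepted: T is treated as U. Adenosines at either end
    of the sequence are skipped because one neighbor is missing.
    """
    rna = sequence.upper().replace("T", "U")
    contexts = []
    for i in range(1, len(rna) - 1):
        if rna[i] != "A":
            continue
        five, three = rna[i - 1], rna[i + 1]
        contexts.append(
            (i, rna[i - 1 : i + 2], FIVE_PRIME_TIER[five], THREE_PRIME_TIER[three])
        )
    return contexts


def rank_sites(sequence: str) -> List[Context]:
    """Order candidate adenosines from most to least preferred context.

    Assumption for illustration: the two tiers are summed, and ties are
    broken by position in the sequence.
    """
    return sorted(adenosine_contexts(sequence), key=lambda c: (c[2] + c[3], c[0]))


if __name__ == "__main__":
    for index, triplet, five_tier, three_tier in rank_sites("GACUAG"):
        print(index, triplet, five_tier, three_tier)
    # 4 UAG 0 0
    # 1 GAC 3 1
```

In this toy input the UAG context is ordered ahead of the GAC context, which agrees with the relative order of TAG and GAC in the triplet ranking recited above.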

In some embodiments, the substrate editing preference of an adenosine deaminase is affected by the presence or absence of a nucleic acid binding domain in the adenosine deaminase protein. In some embodiments, to modify substrate editing preference, the deaminase domain is connected with a double-strand RNA binding domain (dsRBD) or a double-strand RNA binding motif (dsRBM). In some embodiments, the dsRBD or dsRBM may be derived from an ADAR protein, such as hADAR1 or hADAR2. In some embodiments, a full length ADAR protein that comprises at least one dsRBD and a deaminase domain is used. In some embodiments, the one or more dsRBM or dsRBD is at the N-terminus of the deaminase domain. In other embodiments, the one or more dsRBM or dsRBD is at the C-terminus of the deaminase domain.

In some embodiments, the substrate editing preference of an adenosine deaminase is affected by amino acid residues near or in the active center of the enzyme. In some embodiments, to modify substrate editing preference, the adenosine deaminase may comprise one or more of the mutations: G336D, G487R, G487K, G487W, G487Y, E488Q, E488N, T490A, V493A, V493T, V493S, N597K, N597R, A589V, S599T, N613K, N613R, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above.

Particularly, in some embodiments, to reduce editing specificity, the adenosine deaminase can comprise one or more of mutations E488Q, V493A, N597K, N613K, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, to increase editing specificity, the adenosine deaminase can comprise mutation T490A.

In some embodiments, to increase editing preference for target adenosine (A) with an immediate 5′ G, such as substrates comprising the triplet sequence GAC, the center A being the target adenosine residue, the adenosine deaminase can comprise one or more of mutations G336D, E488Q, E488N, V493T, V493S, V493A, A589V, N597K, N597R, S599T, N613K, N613R, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above.

Particularly, in some embodiments, the adenosine deaminase comprises mutation E488Q or a corresponding mutation in a homologous ADAR protein for editing substrates comprising the following triplet sequences: GAC, GAA, GAU, GAG, CAU, AAU, UAC, the center A being the target adenosine residue.

In some embodiments, the adenosine deaminase comprises the wild-type amino acid sequence of hADAR1-D. In some embodiments, the adenosine deaminase comprises one or more mutations in the hADAR1-D sequence, such that the editing efficiency, and/or substrate editing preference of hADAR1-D is changed according to specific needs.

In some embodiments, the adenosine deaminase comprises a mutation at Glycine1007 of the hADAR1-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the glycine residue at position 1007 is replaced by a non-polar amino acid residue with relatively small side chains. For example, in some embodiments, the glycine residue at position 1007 is replaced by an alanine residue (G1007A). In some embodiments, the glycine residue at position 1007 is replaced by a valine residue (G1007V). In some embodiments, the glycine residue at position 1007 is replaced by an amino acid residue with relatively large side chains. In some embodiments, the glycine residue at position 1007 is replaced by an arginine residue (G1007R). In some embodiments, the glycine residue at position 1007 is replaced by a lysine residue (G1007K). In some embodiments, the glycine residue at position 1007 is replaced by a tryptophan residue (G1007W). In some embodiments, the glycine residue at position 1007 is replaced by a tyrosine residue (G1007Y). Additionally, in other embodiments, the glycine residue at position 1007 is replaced by a leucine residue (G1007L). In other embodiments, the glycine residue at position 1007 is replaced by a threonine residue (G1007T). In other embodiments, the glycine residue at position 1007 is replaced by a serine residue (G1007S).

In some embodiments, the adenosine deaminase comprises a mutation at glutamic acid1008 of the hADAR1-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the glutamic acid residue at position 1008 is replaced by a polar amino acid residue having a relatively large side chain. In some embodiments, the glutamic acid residue at position 1008 is replaced by a glutamine residue (E1008Q). In some embodiments, the glutamic acid residue at position 1008 is replaced by a histidine residue (E1008H). In some embodiments, the glutamic acid residue at position 1008 is replaced by an arginine residue (E1008R). In some embodiments, the glutamic acid residue at position 1008 is replaced by a lysine residue (E1008K). In some embodiments, the glutamic acid residue at position 1008 is replaced by a nonpolar or small polar amino acid residue. In some embodiments, the glutamic acid residue at position 1008 is replaced by a phenylalanine residue (E1008F). In some embodiments, the glutamic acid residue at position 1008 is replaced by a tryptophan residue (E1008W). In some embodiments, the glutamic acid residue at position 1008 is replaced by a glycine residue (E1008G). In some embodiments, the glutamic acid residue at position 1008 is replaced by an isoleucine residue (E1008I). In some embodiments, the glutamic acid residue at position 1008 is replaced by a valine residue (E1008V). In some embodiments, the glutamic acid residue at position 1008 is replaced by a proline residue (E1008P). In some embodiments, the glutamic acid residue at position 1008 is replaced by a serine residue (E1008S). In other embodiments, the glutamic acid residue at position 1008 is replaced by an asparagine residue (E1008N). In other embodiments, the glutamic acid residue at position 1008 is replaced by an alanine residue (E1008A). In other embodiments, the glutamic acid residue at position 1008 is replaced by a methionine residue (E1008M). In some embodiments, the glutamic acid residue at position 1008 is replaced by a leucine residue (E1008L).

In some embodiments, to improve editing efficiency, the adenosine deaminase may comprise one or more of the mutations: G1007S, G1007A, G1007V, E1008Q, E1008R, E1008H, E1008M, E1008N, E1008K, based on amino acid sequence positions of hADAR1-D, and mutations in a homologous ADAR protein corresponding to the above.

In some embodiments, to reduce editing efficiency, the adenosine deaminase may comprise one or more of the mutations: G1007R, G1007K, G1007Y, G1007L, G1007T, E1008G, E1008I, E1008P, E1008V, E1008F, E1008W, E1008S, E1008N, E1008K, based on amino acid sequence positions of hADAR1-D, and mutations in a homologous ADAR protein corresponding to the above.

In some embodiments, the substrate editing preference, efficiency and/or selectivity of an adenosine deaminase is affected by amino acid residues near or in the active center of the enzyme. In some embodiments, the adenosine deaminase comprises a mutation at the glutamic acid 1008 position in hADAR1-D sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the mutation is E1008R, or a corresponding mutation in a homologous ADAR protein. In some embodiments, the E1008R mutant has an increased editing efficiency for target adenosine residue that has a mismatched G residue on the opposite strand.

In some embodiments, the adenosine deaminase protein further comprises or is connected to one or more double-stranded RNA (dsRNA) binding motifs (dsRBMs) or domains (dsRBDs) for recognizing and binding to double-stranded nucleic acid substrates. In some embodiments, the interaction between the adenosine deaminase and the double-stranded substrate is mediated by one or more additional protein factor(s), including a CRISPR/CAS protein factor. In some embodiments, the interaction between the adenosine deaminase and the double-stranded substrate is further mediated by one or more nucleic acid component(s), including a guide RNA.

Modified Adenosine Deaminase Having C to U Deamination Activity

In certain example embodiments, directed evolution may be used to design modified ADAR proteins capable of catalyzing additional reactions besides deamination of an adenine to a hypoxanthine. For example, the modified ADAR protein may be capable of catalyzing deamination of a cytidine to a uracil. While not bound by a particular theory, mutations that improve C to U activity may alter the shape of the binding pocket to be more amenable to the smaller cytidine base. In some cases, the modified ADAR comprises mutations on residues in the catalytic core and/or residues that contact the RNA target. Examples of mutations on residues in the catalytic core include V351G and K350I, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. Examples of mutations on residues that contact the RNA target include S486A and S495N, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above.

In certain embodiments, the adenosine deaminase is engineered to convert the activity to cytidine deaminase. Such engineered adenosine deaminase may also retain its adenosine deaminase activity, i.e., such mutated adenosine deaminase may have both adenosine deaminase and cytidine deaminase activities. Accordingly, in some embodiments, the adenosine deaminase comprises one or more mutations in positions selected from E396, C451, V351, R455, T375, K376, S486, Q488, R510, K594, R348, G593, S397, H443, L444, Y445, F442, E438, T448, A353, V355, T339, P539, V525, I520, P462 and N579. In particular embodiments, the adenosine deaminase comprises one or more mutations in a position selected from V351, L444, V355, V525 and I520. In some embodiments, the adenosine deaminase may comprise one or more of mutations at E488, V351, S486, T375, S370, P462, N597, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above.

In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. 
In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. 
In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some embodiments, the adenosine deaminase may comprise one or more of the mutations: E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E, S661T based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In some examples, provided herein includes a mutated adenosine deaminase e.g., an adenosine deaminase comprising one or more mutations of E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E, S661T (based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above), fused with a dead CRISPR-Cas protein or CRISPR-Cas nickase. In a particular example, provided herein includes a mutated adenosine deaminase e.g., an adenosine deaminase comprising E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E, and S661T (based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above), fused with a dead CRISPR-Cas protein or a CRISPR-Cas nickase.
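The substitution notation used above (wild-type residue, position, replacement residue, e.g., E488Q) can be applied to a sequence programmatically. The following is a minimal sketch under stated assumptions: `apply_mutations` and its `first_position` argument are hypothetical names, and the five-residue sequence is a toy placeholder, not the hADAR2-D sequence.

```python
import re

# One-letter wild-type residue, position number, one-letter replacement.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(seq, mutations, first_position=1):
    """Apply substitutions such as 'E488Q' to `seq` and return the result.

    first_position is the residue number assigned to seq[0] (an assumed
    way of handling domain numbering). Each mutation is checked against
    the unmodified sequence, so a wrong wild-type residue raises an error.
    """
    residues = list(seq)
    for name in mutations:
        match = MUTATION.match(name)
        if match is None:
            raise ValueError(f"unrecognized mutation: {name}")
        wild_type, position, replacement = match.groups()
        index = int(position) - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"{name} is outside the sequence")
        if seq[index] != wild_type:
            raise ValueError(f"{name}: sequence has {seq[index]} at {position}")
        residues[index] = replacement
    return "".join(residues)


# Toy fragment numbered 485-489; NOT the real hADAR2-D sequence.
toy_fragment = "MKVES"
mutant = apply_mutations(toy_fragment, ["E488Q"], first_position=485)
```

Checking the wild-type residue before substituting is a design choice that catches numbering mismatches when a mutation list is transferred to a homologous ADAR protein with different residue numbering.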

In some embodiments, the modified adenosine deaminase having C-to-U deamination activity comprises a mutation at any one or more of positions V351, T375, R455, and E488 of the hADAR2-D amino acid sequence, or a corresponding position in a homologous ADAR protein. In some embodiments, the adenosine deaminase comprises mutation E488Q. In some embodiments, the adenosine deaminase comprises one or more of mutations selected from V351I, V351L, V351F, V351M, V351C, V351A, V351G, V351P, V351T, V351S, V351Y, V351W, V351Q, V351N, V351H, V351E, V351D, V351K, V351R, T375I, T375L, T375V, T375F, T375M, T375C, T375A, T375G, T375P, T375S, T375Y, T375W, T375Q, T375N, T375H, T375E, T375D, T375K, T375R, R455I, R455L, R455V, R455F, R455M, R455C, R455A, R455G, R455P, R455T, R455S, R455Y, R455W, R455Q, R455N, R455H, R455E, R455D, R455K. In some embodiments, the adenosine deaminase comprises mutation E488Q, and further comprises one or more of mutations selected from V351I, V351L, V351F, V351M, V351C, V351A, V351G, V351P, V351T, V351S, V351Y, V351W, V351Q, V351N, V351H, V351E, V351D, V351K, V351R, T375I, T375L, T375V, T375F, T375M, T375C, T375A, T375G, T375P, T375S, T375Y, T375W, T375Q, T375N, T375H, T375E, T375D, T375K, T375R, R455I, R455L, R455V, R455F, R455M, R455C, R455A, R455G, R455P, R455T, R455S, R455Y, R455W, R455Q, R455N, R455H, R455E, R455D, R455K.

In some cases, the modified ADAR may further comprise one or more mutations that reduce off-target activities. In cases where modified ADAR has C-to-U deamination activity, such mutations may reduce A to I off-target activity and increase C-to-U on-target deamination activity. In general, such mutations may be on residues that interact with the RNA target. Examples of such mutations include S375N, S375C, S375A, and N473I, based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above. In one example, the ADAR has S375N mutation. In one example, provided herein includes a mutated adenosine deaminase e.g., an adenosine deaminase comprising E488Q, V351G, S486A, T375S, S370C, P462A, N597I, L332I, I398V, K350I, M383L, D619G, S582T, V440I, S495N, K418E, S661T, and S375N (based on amino acid sequence positions of hADAR2-D, and mutations in a homologous ADAR protein corresponding to the above), fused with a dead CRISPR-Cas protein or a CRISPR-Cas nickase.

In connection with the aforementioned modified ADAR protein having C-to-U deamination activity, the invention described herein also relates to a method for deaminating a C in a target RNA sequence of interest, comprising delivering to a target RNA or DNA an AD-functionalized composition disclosed herein.

In certain example embodiments, the method for deaminating a C in a target RNA sequence comprises delivering to said target RNA: (a) a catalytically inactive (dead) Cas; (b) a guide molecule which comprises a guide sequence linked to a direct repeat sequence; and (c) a modified ADAR protein having C-to-U deamination activity or catalytic domain thereof; wherein said modified ADAR protein or catalytic domain thereof is covalently or non-covalently linked to said dead Cas protein or said guide molecule or is adapted to link thereto after delivery; wherein said guide molecule forms a complex with said dead Cas protein and directs said complex to bind said target RNA sequence of interest; wherein said guide sequence is capable of hybridizing with a target sequence comprising said C to form an RNA duplex; wherein, optionally, said guide sequence comprises a non-pairing A or U at a position corresponding to said C resulting in a mismatch in the RNA duplex formed; and wherein said modified ADAR protein or catalytic domain thereof deaminates said C in said RNA duplex.

In connection with the aforementioned modified ADAR protein having C-to-U deamination activity, the invention described herein further relates to an engineered, non-naturally occurring system suitable for deaminating a C in a target locus of interest, comprising: (a) a guide molecule which comprises a guide sequence linked to a direct repeat sequence, or a nucleotide sequence encoding said guide molecule; (b) a catalytically inactive CRISPR-Cas protein, or a nucleotide sequence encoding said catalytically inactive CRISPR-Cas protein; (c) a modified ADAR protein having C-to-U deamination activity or catalytic domain thereof, or a nucleotide sequence encoding said modified ADAR protein or catalytic domain thereof; wherein said modified ADAR protein or catalytic domain thereof is covalently or non-covalently linked to said CRISPR-Cas protein or said guide molecule or is adapted to link thereto after delivery; wherein said guide sequence is capable of hybridizing with a target RNA sequence comprising a C to form an RNA duplex; wherein, optionally, said guide sequence comprises a non-pairing A or U at a position corresponding to said C resulting in a mismatch in the RNA duplex formed; wherein, optionally, the system is a vector system comprising one or more vectors comprising: (a) a first regulatory element operably linked to a nucleotide sequence encoding said guide molecule which comprises said guide sequence, (b) a second regulatory element operably linked to a nucleotide sequence encoding said catalytically inactive CRISPR-Cas protein; and (c) a nucleotide sequence encoding a modified ADAR protein having C-to-U deamination activity or catalytic domain thereof which is under control of said first or second regulatory element or operably linked to a third regulatory element; wherein, if said nucleotide sequence encoding a modified ADAR protein or catalytic domain thereof is operably linked to a third regulatory element, said modified ADAR protein or catalytic domain thereof is adapted to link to said guide molecule or said CRISPR-Cas protein after expression; wherein components (a), (b) and (c) are located on the same or different vectors of the system, optionally wherein said first, second, and/or third regulatory element is an inducible promoter.

In an embodiment of the invention, the substrate of the adenosine deaminase is an RNA/DNA heteroduplex formed upon binding of the guide molecule to its DNA target which then forms the CRISPR-Cas complex with the CRISPR-Cas enzyme. The RNA/DNA or DNA/RNA heteroduplex is also referred to herein as the “RNA/DNA hybrid”, “DNA/RNA hybrid” or “double-stranded substrate”.

According to the present invention, the substrate of the adenosine deaminase is an RNA/DNA heteroduplex formed upon binding of the guide molecule to its DNA target which then forms the CRISPR-Cas complex with the CRISPR-Cas enzyme. The substrate of the adenosine deaminase can also be an RNA/RNA duplex formed upon binding of the guide molecule to its RNA target which then forms the CRISPR-Cas complex with the CRISPR-Cas enzyme. The RNA/DNA or DNA/RNA heteroduplex is also referred to herein as the “RNA/DNA hybrid”, “DNA/RNA hybrid” or “double-stranded substrate”. The particular features of the guide molecule and CRISPR-Cas enzyme are detailed below.

The term “editing selectivity” as used herein refers to the fraction of all sites on a double-stranded substrate that is edited by an adenosine deaminase. Without being bound by theory, it is contemplated that editing selectivity of an adenosine deaminase is affected by the double-stranded substrate's length and secondary structures, such as the presence of mismatched bases, bulges and/or internal loops.

In some embodiments, when the substrate is a perfectly base-paired duplex longer than 50 bp, the adenosine deaminase may be able to deaminate multiple adenosine residues within the duplex (e.g., 50% of all adenosine residues). In some embodiments, when the substrate is shorter than 50 bp, the editing selectivity of an adenosine deaminase is affected by the presence of a mismatch at the target adenosine site. Particularly, in some embodiments, an adenosine (A) residue having a mismatched cytidine (C) residue on the opposite strand is deaminated with high efficiency. In some embodiments, an adenosine (A) residue having a mismatched guanosine (G) residue on the opposite strand is skipped without editing.

In particular embodiments, the adenosine deaminase protein or catalytic domain thereof is delivered to the cell or expressed within the cell as a separate protein, but is modified so as to be able to link to either the Cas protein or the guide molecule. In particular embodiments, this is ensured by the use of orthogonal RNA-binding protein or adaptor protein/aptamer combinations that exist within the diversity of bacteriophage coat proteins. Examples of such coat proteins include but are not limited to: MS2, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, ϕCb23r, 7s and PRR1. Aptamers can be naturally occurring or synthetic oligonucleotides that have been engineered through repeated rounds of in vitro selection or SELEX (systematic evolution of ligands by exponential enrichment) to bind to a specific target.

In particular embodiments, the guide molecule is provided with one or more distinct RNA loop(s) or distinct sequence(s) that can recruit an adaptor protein. A guide molecule may be extended, without colliding with the Cas protein, by the insertion of distinct RNA loop(s) or distinct sequence(s) that may recruit adaptor proteins that can bind to the distinct RNA loop(s) or distinct sequence(s). Examples of modified guides and their use in recruiting effector domains to the Cas complex are provided in Konermann et al. (Nature 2015, 517(7536): 583-588). In particular embodiments, the aptamer is a minimal hairpin aptamer which selectively binds dimerized MS2 bacteriophage coat proteins in mammalian cells and is introduced into the guide molecule, such as in the stemloop and/or in a tetraloop. In these embodiments, the adenosine deaminase protein is fused to MS2. The adenosine deaminase protein is then co-delivered together with the Cas protein and corresponding guide RNA.

In some embodiments, the Cas-ADAR base editing system described herein comprises (a) a Cas protein, which is catalytically inactive or a nickase; (b) a guide molecule which comprises a guide sequence; and (c) an adenosine deaminase protein or catalytic domain thereof; wherein the adenosine deaminase protein or catalytic domain thereof is covalently or non-covalently linked to the Cas protein or the guide molecule or is adapted to link thereto after delivery; wherein the guide sequence is substantially complementary to the target sequence but comprises a non-pairing C corresponding to the A being targeted for deamination, resulting in an A-C mismatch in a DNA-RNA or RNA-RNA duplex formed by the guide sequence and the target sequence. For application in eukaryotic cells, the Cas protein and/or the adenosine deaminase are preferably NLS-tagged.

In some embodiments, the components (a), (b) and (c) are delivered to the cell as a ribonucleoprotein complex. The ribonucleoprotein complex can be delivered via one or more lipid nanoparticles.

In some embodiments, the components (a), (b) and (c) are delivered to the cell as one or more RNA molecules, such as one or more guide RNAs and one or more mRNA molecules encoding the Cas protein, the adenosine deaminase protein, and optionally the adaptor protein. The RNA molecules can be delivered via one or more lipid nanoparticles.

In some embodiments, the components (a), (b) and (c) are delivered to the cell as one or more DNA molecules. In some embodiments, the one or more DNA molecules are comprised within one or more vectors such as viral vectors (e.g., AAV). In some embodiments, the one or more DNA molecules comprise one or more regulatory elements operably configured to express the Cas protein, the guide molecule, and the adenosine deaminase protein or catalytic domain thereof, optionally wherein the one or more regulatory elements comprise inducible promoters.

In some embodiments, the guide molecule is capable of hybridizing with a target sequence comprising the Adenine to be deaminated within a first DNA strand or an RNA strand at the target locus to form a DNA-RNA or RNA-RNA duplex which comprises a non-pairing Cytosine opposite to said Adenine. Upon duplex formation, the guide molecule forms a complex with the Cas protein and directs the complex to bind said first DNA strand or said RNA strand at the target locus of interest. Details on the aspect of the guide of the Cas-ADAR base editing system are provided herein below.

In some embodiments, a Cas guide RNA having a canonical length (e.g., about 20 nt for AacCas) is used to form a DNA-RNA or RNA-RNA duplex with the target DNA or RNA. In some embodiments, a Cas guide molecule longer than the canonical length (e.g., >20 nt for AacCas) is used to form a DNA-RNA or RNA-RNA duplex with the target DNA or RNA including outside of the Cas-guide RNA-target DNA complex. In certain example embodiments, the guide sequence has a length of about 29-53 nt capable of forming a DNA-RNA or RNA-RNA duplex with said target sequence. In certain other example embodiments, the guide sequence has a length of about 40-50 nt capable of forming a DNA-RNA or RNA-RNA duplex with said target sequence. In certain example embodiments, the distance between said non-pairing C and the 5′ end of said guide sequence is 20-30 nucleotides. In certain example embodiments, the distance between said non-pairing C and the 3′ end of said guide sequence is 20-30 nucleotides.
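The length and spacing ranges recited above lend themselves to a simple check. The sketch below is illustrative only: `check_guide` is a hypothetical helper, the distance is taken as the number of nucleotides between the non-pairing C and the chosen end (an assumed convention), and the default ranges are the 29-53 nt and 20-30 nt figures from the preceding paragraph.

```python
def check_guide(guide, mismatch_index, end="5",
                length_range=(29, 53), distance_range=(20, 30)):
    """Return a list of problems; an empty list means all checks passed.

    guide: guide sequence written 5'->3'.
    mismatch_index: 0-based index of the non-pairing C within the guide.
    end: "5" or "3", the end from which the distance is measured.
    """
    problems = []
    if guide[mismatch_index] != "C":
        problems.append("mismatch position is not a C")
    if not length_range[0] <= len(guide) <= length_range[1]:
        problems.append(f"length {len(guide)} is outside {length_range}")
    if end == "5":
        distance = mismatch_index  # nucleotides 5' of the C
    elif end == "3":
        distance = len(guide) - 1 - mismatch_index  # nucleotides 3' of the C
    else:
        raise ValueError("end must be '5' or '3'")
    if not distance_range[0] <= distance <= distance_range[1]:
        problems.append(f"distance {distance} is outside {distance_range}")
    return problems


# Placeholder 50-nt guide with the non-pairing C at index 25.
example_guide = "A" * 25 + "C" + "A" * 24
issues = check_guide(example_guide, mismatch_index=25)
```

Returning a list of problems rather than a single boolean makes it easier to see which of the recited ranges a candidate guide falls outside of.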

In at least a first design, the Cas-ADAR system comprises (a) an adenosine deaminase fused or linked to a Cas protein, wherein the Cas protein is catalytically inactive or a nickase, and (b) a guide molecule comprising a guide sequence designed to introduce an A-C mismatch in a DNA-RNA or RNA-RNA duplex formed between the guide sequence and the target sequence. In some embodiments, the Cas protein and/or the adenosine deaminase are NLS-tagged, on either the N- or C-terminus or both.

In at least a second design, the Cas-ADAR system comprises (a) a Cas protein that is catalytically inactive or a nickase, (b) a guide molecule comprising a guide sequence designed to introduce an A-C mismatch in a DNA-RNA or RNA-RNA duplex formed between the guide sequence and the target sequence, and an aptamer sequence (e.g., MS2 RNA motif or PP7 RNA motif) capable of binding to an adaptor protein (e.g., MS2 coat protein or PP7 coat protein), and (c) an adenosine deaminase fused or linked to an adaptor protein, wherein the binding of the aptamer and the adaptor protein recruits the adenosine deaminase to the DNA-RNA or RNA-RNA duplex formed between the guide sequence and the target sequence for targeted deamination at the A of the A-C mismatch. In some embodiments, the adaptor protein and/or the adenosine deaminase are NLS-tagged, on either the N- or C-terminus or both. The Cas protein can also be NLS-tagged.

The use of different aptamers and corresponding adaptor proteins also allows orthogonal gene editing to be implemented. In one example in which adenosine deaminases are used in combination with cytidine deaminases for orthogonal gene editing/deamination, sgRNAs targeting different loci are modified with distinct RNA loops in order to recruit MS2-adenosine deaminase and PP7-cytidine deaminase (or PP7-adenosine deaminase and MS2-cytidine deaminase), respectively, resulting in orthogonal deamination of A or C at the target loci of interest, respectively. PP7 is the RNA-binding coat protein of a Pseudomonas bacteriophage. Like MS2, it binds a specific RNA sequence and secondary structure. The PP7 RNA-recognition motif is distinct from that of MS2. Consequently, PP7 and MS2 can be multiplexed to mediate distinct effects at different genomic loci simultaneously. For example, an sgRNA targeting locus A can be modified with MS2 loops, recruiting MS2-adenosine deaminase, while another sgRNA targeting locus B can be modified with PP7 loops, recruiting PP7-cytidine deaminase. In the same cell, orthogonal, locus-specific modifications are thus realized. This principle can be extended to incorporate other orthogonal RNA-binding proteins.

In at least a third design, the Cas-ADAR CRISPR system comprises (a) an adenosine deaminase inserted into an internal loop or unstructured region of a Cas protein, wherein the Cas protein is catalytically inactive or a nickase, and (b) a guide molecule comprising a guide sequence designed to introduce an A-C mismatch in a DNA-RNA or RNA-RNA duplex formed between the guide sequence and the target sequence.

Cas protein split sites that are suitable for insertion of adenosine deaminase can be identified with the help of a crystal structure. For example, with respect to AacCas mutants, the corresponding position should be readily apparent from, for example, a sequence alignment. For other Cas proteins, one can use the crystal structure of an ortholog if a relatively high degree of homology exists between the ortholog and the intended Cas protein.

The split position may be located within a region or loop. Preferably, the split position occurs where an interruption of the amino acid sequence does not result in the partial or full destruction of a structural feature (e.g., alpha-helices or β-sheets). Unstructured regions (regions that did not show up in the crystal structure because these regions are not structured enough to be “frozen” in a crystal) are often preferred options. Splits in all unstructured regions that are exposed on the surface of Cas are envisioned in the practice of the invention. The positions within the unstructured regions or outside loops may not need to be exactly the numbers provided above, but may vary by, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, or even 10 amino acids either side of the position given above, depending on the size of the loop, so long as the split position still falls within an unstructured region or outside loop.

The Cas-ADAR system described herein can be used to target a specific Adenine within a DNA sequence for deamination. For example, the guide molecule can form a complex with the Cas protein and direct the complex to bind a target sequence at the target locus of interest. Because the guide sequence is designed to have a non-pairing C, the heteroduplex formed between the guide sequence and the target sequence comprises an A-C mismatch, which directs the adenosine deaminase to contact and deaminate the A opposite to the non-pairing C, converting it to an Inosine (I). Since Inosine (I) base pairs with C and functions like G in cellular processes, the targeted deamination of A described herein is useful for correction of undesirable G-A and C-T mutations, as well as for obtaining desirable A-G and T-C mutations. In some embodiments, the guide may comprise one or more mismatches to increase specificity. For example, the guide may comprise one or more disfavorable guanine mismatches across from off-target adenosines.
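
The guide design logic described above can be sketched in a few lines of Python. This is a minimal illustration only, not a design tool: the function name `design_guide`, its parameters, and the example target are hypothetical, and the sketch assumes a guide that is the exact reverse complement of the target except at the positions listed.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A", "U": "A"}


def design_guide(target, edit_index, avoid_indices=()):
    """Return a guide (5'->3', RNA letters) for a target written 5'->3'.

    The guide pairs with every target base except:
      * the adenine at edit_index, placed opposite a C (A-C mismatch);
      * adenines at avoid_indices, placed opposite a G (A-G mismatch).
    Indices are 0-based positions in the target.
    """
    target = target.upper()
    if edit_index in avoid_indices:
        raise ValueError("the target adenine cannot also be an avoided site")
    for i in (edit_index, *avoid_indices):
        if target[i] != "A":
            raise ValueError(f"position {i} is {target[i]!r}, expected 'A'")
    paired = [COMPLEMENT[base] for base in target]
    paired[edit_index] = "C"          # non-pairing C opposite the target A
    for i in avoid_indices:
        paired[i] = "G"               # disfavored G opposite off-target A
    # Duplex strands are antiparallel, so reverse to write the guide 5'->3'.
    return "".join(reversed(paired))


guide = design_guide("GGATCAGTCC", edit_index=5, avoid_indices=[2])
print(guide)
```

A real guide would be longer (see the length ranges given elsewhere in this description) and would also have to satisfy the requirements of the particular Cas protein, which this sketch does not model.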

Base Excision Repair Inhibitor

In some embodiments, the AD-functionalized CRISPR system further comprises a base excision repair (BER) inhibitor. Without wishing to be bound by any particular theory, cellular DNA-repair response to the presence of I:T pairing may be responsible for a decrease in nucleobase editing efficiency in cells. Alkyladenine DNA glycosylase (also known as DNA-3-methyladenine glycosylase, 3-alkyladenine DNA glycosylase, or N-methylpurine DNA glycosylase) catalyzes removal of hypoxanthine from DNA in cells, which may initiate base excision repair, with reversion of the I:T pair to an A:T pair as outcome.

In some embodiments, the BER inhibitor is an inhibitor of alkyladenine DNA glycosylase. In some embodiments, the BER inhibitor is an inhibitor of human alkyladenine DNA glycosylase. In some embodiments, the BER inhibitor is a polypeptide inhibitor. In some embodiments, the BER inhibitor is a protein that binds hypoxanthine. In some embodiments, the BER inhibitor is a protein that binds hypoxanthine in DNA. In some embodiments, the BER inhibitor is a catalytically inactive alkyladenine DNA glycosylase protein or binding domain thereof. In some embodiments, the BER inhibitor is a catalytically inactive alkyladenine DNA glycosylase protein or binding domain thereof that does not excise hypoxanthine from the DNA. Other proteins that are capable of inhibiting (e.g., sterically blocking) an alkyladenine DNA glycosylase base-excision repair enzyme are within the scope of this disclosure. Additionally, any proteins that block or inhibit base-excision repair are also within the scope of this disclosure.

Without wishing to be bound by any particular theory, base excision repair may be inhibited by molecules that bind the edited strand, block the edited base, inhibit alkyladenine DNA glycosylase, inhibit base excision repair, protect the edited base, and/or promote fixing of the non-edited strand. It is believed that the use of the BER inhibitor described herein can increase the editing efficiency of an adenosine deaminase that is capable of catalyzing an A to I change.

Accordingly, in the first design of the AD-functionalized CRISPR system discussed above, the CRISPR-Cas protein or the adenosine deaminase can be fused to or linked to a BER inhibitor (e.g., an inhibitor of alkyladenine DNA glycosylase). In some embodiments, the BER inhibitor can be comprised in one of the following structures (nCas=Cas nickase; dCas=dead Cas): [AD]-[optional linker]-[nCas/dCas]-[optional linker]-[BER inhibitor]; [AD]-[optional linker]-[BER inhibitor]-[optional linker]-[nCas/dCas]; [BER inhibitor]-[optional linker]-[AD]-[optional linker]-[nCas/dCas]; [BER inhibitor]-[optional linker]-[nCas/dCas]-[optional linker]-[AD]; [nCas/dCas]-[optional linker]-[AD]-[optional linker]-[BER inhibitor]; [nCas/dCas]-[optional linker]-[BER inhibitor]-[optional linker]-[AD].
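
The six structures listed for the first design are simply every N-to-C ordering of the three modules, with an optional linker at each junction. The following Python sketch enumerates them; the function name `fusion_architectures` and the module labels are illustrative choices, not part of the disclosure.

```python
from itertools import permutations

# Labels for the three fused modules named in the text.
MODULES = ("AD", "nCas/dCas", "BER inhibitor")


def fusion_architectures(modules=MODULES):
    """List every N-to-C ordering, with an optional linker at each junction."""
    return [
        "-[optional linker]-".join(f"[{module}]" for module in order)
        for order in permutations(modules)
    ]


architectures = fusion_architectures()
for architecture in architectures:
    print(architecture)
```

Three modules give 3! = 6 orderings, which matches the number of structures recited above.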

Similarly, in the second design of the AD-functionalized CRISPR system discussed above, the CRISPR-Cas protein, the adenosine deaminase, or the adaptor protein can be fused to or linked to a BER inhibitor (e.g., an inhibitor of alkyladenine DNA glycosylase). In some embodiments, the BER inhibitor can be comprised in one of the following structures (nCas=Cas nickase; dCas=dead Cas): [nCas/dCas]-[optional linker]-[BER inhibitor]; [BER inhibitor]-[optional linker]-[nCas/dCas]; [AD]-[optional linker]-[Adaptor]-[optional linker]-[BER inhibitor]; [AD]-[optional linker]-[BER inhibitor]-[optional linker]-[Adaptor]; [BER inhibitor]-[optional linker]-[AD]-[optional linker]-[Adaptor]; [BER inhibitor]-[optional linker]-[Adaptor]-[optional linker]-[AD]; [Adaptor]-[optional linker]-[AD]-[optional linker]-[BER inhibitor]; [Adaptor]-[optional linker]-[BER inhibitor]-[optional linker]-[AD].

In the third design of the AD-functionalized CRISPR system discussed above, the BER inhibitor can be inserted into an internal loop or unstructured region of a CRISPR-Cas protein.

Cytidine Deaminase

In some embodiments, the deaminase is a cytidine deaminase. The term “cytidine deaminase” or “cytidine deaminase protein” or “cytidine deaminase activity” as used herein refers to a protein, a polypeptide, or one or more functional domain(s) of a protein or a polypeptide that is capable of catalyzing a hydrolytic deamination reaction that converts a cytosine (or a cytosine moiety of a molecule) to a uracil (or a uracil moiety of a molecule), as shown below. In some embodiments, the cytosine-containing molecule is a cytidine (C), and the uracil-containing molecule is a uridine (U). The cytosine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). In certain examples, a cytidine deaminase may be a cytidine deaminase acting on RNA (CDAR).

According to the present disclosure, cytidine deaminases that can be used in connection with the present disclosure include, but are not limited to, members of the enzyme family known as apolipoprotein B mRNA-editing complex (APOBEC) family deaminases, an activation-induced deaminase (AID), or a cytidine deaminase 1 (CDA1). In particular embodiments, the deaminase is an APOBEC1 deaminase, an APOBEC2 deaminase, an APOBEC3A deaminase, an APOBEC3B deaminase, an APOBEC3C deaminase, an APOBEC3D deaminase, an APOBEC3E deaminase, an APOBEC3F deaminase, an APOBEC3G deaminase, an APOBEC3H deaminase, or an APOBEC4 deaminase.

In the methods and systems of the present invention, the cytidine deaminase or engineered adenosine deaminase with cytidine deaminase activity is capable of targeting Cytosine in a DNA single strand. In certain example embodiments, the cytidine deaminase activity may edit on a single strand present outside of the binding component, e.g., bound CRISPR-Cas. In other example embodiments, the cytidine deaminase may edit at a localized bubble, such as a localized bubble formed by a mismatch at the target edit site by the guide sequence. In certain example embodiments, the cytidine deaminase may contain mutations that help focus the area of activity, such as those disclosed in Kim et al., Nature Biotechnology (2017) 35(4):371-377 (doi:10.1038/nbt.3803).

In some embodiments, the cytidine deaminase is derived from one or more metazoa species, including but not limited to, mammals, birds, frogs, squids, fish, flies and worms. In some embodiments, the cytidine deaminase is a human, primate, cow, dog, rat or mouse cytidine deaminase.

In some embodiments, the cytidine deaminase is a human APOBEC, including hAPOBEC1 or hAPOBEC3. In some embodiments, the cytidine deaminase is a human AID.

In some embodiments, the cytidine deaminase protein recognizes and converts one or more target cytosine residue(s) in a single-stranded bubble of an RNA duplex into uracil residue(s). In some embodiments, the cytidine deaminase protein recognizes a binding window on the single-stranded bubble of an RNA duplex. In some embodiments, the binding window contains at least one target cytosine residue(s). In some embodiments, the binding window is in the range of about 3 bp to about 100 bp. In some embodiments, the binding window is in the range of about 5 bp to about 50 bp. In some embodiments, the binding window is in the range of about 10 bp to about 30 bp. In some embodiments, the binding window is about 1 bp, 2 bp, 3 bp, 5 bp, 7 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, or 100 bp.

In some embodiments, the cytidine deaminase protein comprises one or more deaminase domains. Not intended to be bound by theory, it is contemplated that the deaminase domain functions to recognize and convert one or more target cytosine (C) residue(s) contained in a single-stranded bubble of an RNA duplex into (a) uracil (U) residue(s). In some embodiments, the deaminase domain comprises an active center. In some embodiments, the active center comprises a zinc ion. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotide(s) 5′ to a target cytosine residue. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotide(s) 3′ to a target cytosine residue.

In some embodiments, the cytidine deaminase comprises human APOBEC1 full protein (hAPOBEC1) or the deaminase domain thereof (hAPOBEC1-D) or a C-terminally truncated version thereof (hAPOBEC-T). In some embodiments, the cytidine deaminase is an APOBEC family member that is homologous to hAPOBEC1, hAPOBEC1-D or hAPOBEC-T.

In some embodiments, the cytidine deaminase comprises human AID1 full protein (hAID) or the deaminase domain thereof (hAID-D) or a C-terminally truncated version thereof (hAID-T). In some embodiments, the cytidine deaminase is an AID family member that is homologous to hAID, hAID-D or hAID-T. In some embodiments, the hAID-T is a hAID which is C-terminally truncated by about 20 amino acids.

In some embodiments, the cytidine deaminase comprises the wild-type amino acid sequence of a cytosine deaminase. In some embodiments, the cytidine deaminase comprises one or more mutations in the cytosine deaminase sequence, such that the editing efficiency, and/or substrate editing preference of the cytosine deaminase is changed according to specific needs.

Certain mutations of APOBEC1 and APOBEC3 proteins have been described in Kim et al., Nature Biotechnology (2017) 35(4):371-377 (doi:10.1038/nbt.3803); and Harris et al. Mol. Cell (2002) 10:1247-1253, each of which is incorporated herein by reference in its entirety.

In some embodiments, the cytidine deaminase is an APOBEC1 deaminase comprising one or more mutations at amino acid positions corresponding to W90, R118, H121, H122, R126, or R132 in rat APOBEC1, or an APOBEC3G deaminase comprising one or more mutations at amino acid positions corresponding to W285, R313, D316, D317, R320, or R326 in human APOBEC3G.

In some embodiments, the cytidine deaminase comprises a mutation at Tryptophan90 of the rat APOBEC1 amino acid sequence, or a corresponding position in a homologous APOBEC protein, such as Tryptophan285 of APOBEC3G. In some embodiments, the tryptophan residue at position 90 is replaced by a tyrosine or phenylalanine residue (W90Y or W90F).

In some embodiments, the cytidine deaminase comprises a mutation at Arginine118 of the rat APOBEC1 amino acid sequence, or a corresponding position in a homologous APOBEC protein. In some embodiments, the arginine residue at position 118 is replaced by an alanine residue (R118A).

In some embodiments, the cytidine deaminase comprises a mutation at Histidine121 of the rat APOBEC1 amino acid sequence, or a corresponding position in a homologous APOBEC protein. In some embodiments, the histidine residue at position 121 is replaced by an arginine residue (H121R).

In some embodiments, the cytidine deaminase comprises a mutation at Histidine122 of the rat APOBEC1 amino acid sequence, or a corresponding position in a homologous APOBEC protein. In some embodiments, the histidine residue at position 122 is replaced by an arginine residue (H122R).

In some embodiments, the cytidine deaminase comprises a mutation at Arginine126 of the rat APOBEC1 amino acid sequence, or a corresponding position in a homologous APOBEC protein, such as Arginine320 of APOBEC3G. In some embodiments, the arginine residue at position 126 is replaced by an alanine residue (R126A) or by a glutamic acid (R126E).

In some embodiments, the cytidine deaminase comprises a mutation at Arginine132 of the rat APOBEC1 amino acid sequence, or a corresponding position in a homologous APOBEC protein. In some embodiments, the arginine residue at position 132 is replaced by a glutamic acid residue (R132E).

In some embodiments, to narrow the width of the editing window, the cytidine deaminase may comprise one or more of the mutations: W90Y, W90F, R126E and R132E, based on amino acid sequence positions of rat APOBEC1, and mutations in a homologous APOBEC protein corresponding to the above.

In some embodiments, to reduce editing efficiency, the cytidine deaminase may comprise one or more of the mutations: W90A, R118A, R132E, based on amino acid sequence positions of rat APOBEC1, and mutations in a homologous APOBEC protein corresponding to the above. In particular embodiments, it can be of interest to use a cytidine deaminase enzyme with reduced efficacy to reduce off-target effects.

In some embodiments, the cytidine deaminase is wild-type rat APOBEC1 (rAPOBEC1) or a catalytic domain thereof. In some embodiments, the cytidine deaminase comprises one or more mutations in the rAPOBEC1 sequence, such that the editing efficiency, and/or substrate editing preference of rAPOBEC1 is changed according to specific needs.

rAPOBEC1: (SEQ ID NO: 243) MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSI WRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAI TEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESG YCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQ PQLTFFTIALQSCHYQRLPPHILWATGLK
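
The substitution notation used in this section (for example W90Y: wild-type residue, 1-based position, replacement residue) can be applied mechanically to the sequence above. The Python sketch below uses only the first 150 residues of SEQ ID NO: 243 as printed here; the function name `apply_mutations` and its validation behavior are illustrative assumptions.

```python
import re

# First 150 residues of rat APOBEC1 as printed in SEQ ID NO: 243.
RAPOBEC1_N150 = (
    "MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSI"
    "WRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAI"
    "TEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESG"
)

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence, mutations):
    """Apply substitutions such as 'W90Y' (1-based positions) to a sequence.

    Raises ValueError if a name is malformed, the position is out of range,
    or the stated wild-type residue does not match the sequence.
    """
    residues = list(sequence)
    for name in mutations:
        match = MUTATION.match(name)
        if match is None:
            raise ValueError(f"cannot parse mutation {name!r}")
        wild_type, replacement = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"{name}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{name}: found {residues[position - 1]} at {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Triple mutant named in this description.
mutant = apply_mutations(RAPOBEC1_N150, ["W90Y", "R126E", "R132E"])
print(mutant[85:95], mutant[120:135])
```

The wild-type check is a cheap guard against off-by-one numbering when the same notation is carried over to a homologous protein.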

In some embodiments, the cytidine deaminase is wild-type human APOBEC1 (hAPOBEC1) or a catalytic domain thereof. In some embodiments, the cytidine deaminase comprises one or more mutations in the hAPOBEC1 sequence, such that the editing efficiency, and/or substrate editing preference of hAPOBEC1 is changed according to specific needs.

APOBEC1: (SEQ ID NO: 244) MTSEKGPSTGDPTLRRRIEPWEFDVFYDPRELRKEACLLYEIKWGMSRKI WRSSGKNTTNHVEVNFIKKFTSERDFHPSMSCSITWFLSWSPCWECSQAI REFLSRHPGVTLVIYVARLFWHMDQQNRQGLRDLVNSGVTIQIMRASEYY HCWRNFVNYPPGDEAHWPQYPPLWMMLYALELHCIILSLPPCLKISRRWQ NHLTFFRLHLQNCHYQTIPPHILLATGLIHPSVAWR

In some embodiments, the cytidine deaminase is wild-type human APOBEC3G (hAPOBEC3G) or a catalytic domain thereof. In some embodiments, the cytidine deaminase comprises one or more mutations in the hAPOBEC3G sequence, such that the editing efficiency, and/or substrate editing preference of hAPOBEC3G is changed according to specific needs.

hAPOBEC3G: (SEQ ID NO: 245) MELKYHPEMRFFHWFSKWRKLHRDQEYEVTWYISWSPCTKCTRDMATFLA EDPKVTLTIFVARLYYFWDPDYQEALRSLCQKRDGPRATMKIMNYDEFQH CWSKFVYSQRELEEPWNNLPKYYILLHIMLGEILRHSMDPPTFTENENNE PWVRGRHETYLCYEVERMHNDTWVLLNQRRGELCNQAPHKHGELEGRHAE LCFLDVIPFWKLDLDQDYRVTCFTSWSPCFSCAQEMAKFISKNKHVSLCI FTARIYDDQGRCQEGLRTLAEAGAKISIMTYSEFKHCWDTFVDHQGCPFQ PWDGLDEHSQDLSGRLRAILQNQEN

In some embodiments, the cytidine deaminase is wild-type Petromyzon marinus CDA1 (pmCDA1) or a catalytic domain thereof. In some embodiments, the cytidine deaminase comprises one or more mutations in the pmCDA1 sequence, such that the editing efficiency, and/or substrate editing preference of pmCDA1 is changed according to specific needs.

pmCDA1: (SEQ ID NO: 246) MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFW GYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADC AEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNV MVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMIQVKIL HTTKSPAV

In some embodiments, the cytidine deaminase is wild-type human AID (hAID) or a catalytic domain thereof. In some embodiments, the cytidine deaminase comprises one or more mutations in the hAID sequence, such that the editing efficiency, and/or substrate editing preference of hAID is changed according to specific needs.

hAID: (SEQ ID NO: 247) MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLR NKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRG NPYLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNT FVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTLGLLD

In some embodiments, the cytidine deaminase is a truncated version of hAID (hAID-DC) or a catalytic domain thereof. In some embodiments, the cytidine deaminase comprises one or more mutations in the hAID-DC sequence, such that the editing efficiency, and/or substrate editing preference of hAID-DC is changed according to specific needs.

hAID-DC: (SEQ ID NO: 248) MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLR NKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRG NPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNT FVENHERTFKAWEGLHENSVRLSRQLRRILL

Additional embodiments of the cytidine deaminase are disclosed in WO2017/070632, titled “Nucleobase Editor and Uses Thereof,” which is incorporated herein by reference in its entirety.

In some embodiments, the cytidine deaminase has an efficient deamination window that encloses the nucleotides susceptible to deamination editing. Accordingly, in some embodiments, the “editing window width” refers to the number of nucleotide positions at a given target site for which editing efficiency of the cytidine deaminase exceeds the half-maximal value for that target site. In some embodiments, the cytidine deaminase has an editing window width in the range of about 1 to about 6 nucleotides. In some embodiments, the editing window width of the cytidine deaminase is 1, 2, 3, 4, 5, or 6 nucleotides.
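
The definition of editing window width given above (the number of positions whose editing efficiency exceeds the half-maximal value for the site) translates directly into code. In the Python sketch below, the function name and the per-position efficiencies are hypothetical and serve only to show the arithmetic.

```python
def editing_window_width(efficiencies):
    """Count positions whose editing efficiency exceeds half the site maximum."""
    if not efficiencies:
        return 0
    half_max = max(efficiencies) / 2
    return sum(1 for efficiency in efficiencies if efficiency > half_max)


# Hypothetical per-position editing efficiencies (fraction of reads edited)
# across one target site; these values are made up for illustration.
example_site = [0.02, 0.05, 0.21, 0.38, 0.40, 0.33, 0.12, 0.03]
print(editing_window_width(example_site))
```

For this made-up profile the maximum is 0.40, so the threshold is 0.20 and four positions exceed it.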

Not intended to be bound by theory, it is contemplated that in some embodiments, the length of the linker sequence affects the editing window width. In some embodiments, the editing window width increases (e.g., from about 3 to about 6 nucleotides) as the linker length extends (e.g., from about 3 to about 21 amino acids). In a non-limiting example, a 16-residue linker offers an efficient deamination window of about 5 nucleotides. In some embodiments, the length of the guide RNA affects the editing window width. In some embodiments, shortening the guide RNA leads to a narrowed efficient deamination window of the cytidine deaminase.

In some embodiments, mutations to the cytidine deaminase affect the editing window width. In some embodiments, the cytidine deaminase component of the CD-functionalized CRISPR system comprises one or more mutations that reduce the catalytic efficiency of the cytidine deaminase, such that the deaminase is prevented from deamination of multiple cytidines per DNA binding event. In some embodiments, tryptophan at residue 90 (W90) of APOBEC1 or a corresponding tryptophan residue in a homologous sequence is mutated. In some embodiments, the catalytically inactive CRISPR-Cas is fused to or linked to an APOBEC1 mutant that comprises a W90Y or W90F mutation. In some embodiments, tryptophan at residue 285 (W285) of APOBEC3G, or a corresponding tryptophan residue in a homologous sequence is mutated. In some embodiments, the catalytically inactive CRISPR-Cas is fused to or linked to an APOBEC3G mutant that comprises a W285Y or W285F mutation.

In some embodiments, the cytidine deaminase component of the CD-functionalized CRISPR system comprises one or more mutations that reduce tolerance for non-optimal presentation of a cytidine to the deaminase active site. In some embodiments, the cytidine deaminase comprises one or more mutations that alter substrate binding activity of the deaminase active site. In some embodiments, the cytidine deaminase comprises one or more mutations that alter the conformation of DNA to be recognized and bound by the deaminase active site. In some embodiments, the cytidine deaminase comprises one or more mutations that alter the substrate accessibility to the deaminase active site. In some embodiments, arginine at residue 126 (R126) of APOBEC1 or a corresponding arginine residue in a homologous sequence is mutated. In some embodiments, the catalytically inactive CRISPR-Cas is fused to or linked to an APOBEC1 mutant that comprises a R126A or R126E mutation. In some embodiments, arginine at residue 320 (R320) of APOBEC3G, or a corresponding arginine residue in a homologous sequence is mutated. In some embodiments, the catalytically inactive CRISPR-Cas is fused to or linked to an APOBEC3G mutant that comprises a R320A or R320E mutation. In some embodiments, arginine at residue 132 (R132) of APOBEC1 or a corresponding arginine residue in a homologous sequence is mutated. In some embodiments, the catalytically inactive CRISPR-Cas is fused to or linked to an APOBEC1 mutant that comprises a R132E mutation.

In some embodiments, the APOBEC1 domain of the CD-functionalized CRISPR system comprises one, two, or three mutations selected from W90Y, W90F, R126A, R126E, and R132E. In some embodiments, the APOBEC1 domain comprises double mutations of W90Y and R126E. In some embodiments, the APOBEC1 domain comprises double mutations of W90Y and R132E. In some embodiments, the APOBEC1 domain comprises double mutations of R126E and R132E. In some embodiments, the APOBEC1 domain comprises three mutations of W90Y, R126E and R132E.

In some embodiments, one or more mutations in the cytidine deaminase as disclosed herein reduce the editing window width to about 2 nucleotides. In some embodiments, one or more mutations in the cytidine deaminase as disclosed herein reduce the editing window width to about 1 nucleotide. In some embodiments, one or more mutations in the cytidine deaminase as disclosed herein reduce the editing window width while only minimally or modestly affecting the editing efficiency of the enzyme. In some embodiments, one or more mutations in the cytidine deaminase as disclosed herein reduce the editing window width without reducing the editing efficiency of the enzyme. In some embodiments, one or more mutations in the cytidine deaminase as disclosed herein enable discrimination of neighboring cytidine nucleotides, which would be otherwise edited with similar efficiency by the cytidine deaminase.

In some embodiments, the cytidine deaminase protein further comprises or is connected to one or more double-stranded RNA (dsRNA) binding motifs (dsRBMs) or domains (dsRBDs) for recognizing and binding to double-stranded nucleic acid substrates. In some embodiments, the interaction between the cytidine deaminase and the substrate is mediated by one or more additional protein factor(s), including a CRISPR/Cas protein factor. In some embodiments, the interaction between the cytidine deaminase and the substrate is further mediated by one or more nucleic acid component(s), including a guide RNA.

According to the present invention, the substrate of the cytidine deaminase is a DNA single-strand bubble of an RNA duplex comprising a Cytosine of interest, made accessible to the cytidine deaminase upon binding of the guide molecule to its DNA target, which then forms the CRISPR-Cas complex with the CRISPR-Cas enzyme, whereby the cytidine deaminase is fused to or is capable of binding to one or more components of the CRISPR-Cas complex, i.e., the CRISPR-Cas enzyme and/or the guide molecule. The particular features of the guide molecule and CRISPR-Cas enzyme are detailed below.

The cytidine deaminase or catalytic domain thereof may be a human, a rat, or a lamprey cytidine deaminase protein or catalytic domain thereof.

The cytidine deaminase protein or catalytic domain thereof may be an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase. The cytidine deaminase protein or catalytic domain thereof may be an activation-induced deaminase (AID). The cytidine deaminase protein or catalytic domain thereof may be a cytidine deaminase 1 (CDA1).

The cytidine deaminase protein or catalytic domain thereof may be an APOBEC1 deaminase. The APOBEC1 deaminase may comprise one or more mutations corresponding to W90A, W90Y, R118A, H121R, H122R, R126A, R126E, or R132E in rat APOBEC1, or an APOBEC3G deaminase comprising one or more mutations corresponding to W285A, W285Y, R313A, D316R, D317R, R320A, R320E, or R326E in human APOBEC3G.

The system may further comprise a uracil glycosylase inhibitor (UGI). In some embodiments, the cytidine deaminase protein or catalytic domain thereof is delivered together with a uracil glycosylase inhibitor (UGI). The UGI may be linked (e.g., covalently linked) to the cytidine deaminase protein or catalytic domain thereof and/or a catalytically inactive CRISPR-Cas protein.

Regulation of Post-Translational Modification of Gene Products

In some cases, base editing may be used for regulating post-translational modification of gene products. In some cases, an amino acid residue that is a post-translational modification site may be mutated by base editing to an amino acid residue that cannot be modified. Examples of such post-translational modifications include disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, methylation, ubiquitination, sumoylation, or any combinations thereof.

In some embodiments, the base editors herein may regulate the Stat3/IRF-5 pathway, e.g., for reduction of inflammation. For example, phosphorylation on Tyr705 of Stat3, or on Thr10, Ser158, Ser309, Ser317, Ser451, and/or Ser462 of IRF-5, may be involved in interleukin signaling. Base editors herein may be used to mutate one or more of these phosphorylation sites for regulating immunity, autoimmunity, and/or inflammation.

In some embodiments, the base editors herein may regulate the insulin receptor substrate (IRS) pathway. For example, phosphorylation on Ser265, Ser302, Ser325, Ser336, Ser358, Ser407, and/or Ser408 may be involved in regulating (e.g., inhibiting) the IRS pathway. Alternatively or additionally, Serine 307 in mouse (or Serine 312 in human) may be mutated so the phosphorylation may be regulated. For example, Serine 307 phosphorylation may lead to degradation of IRS-1 and reduce MAPK signaling. Serine 307 phosphorylation may be induced under insulin insensitivity conditions, such as insulin overstimulation and/or TNFα treatment. In some examples, a S307F mutation may be generated for stabilizing the interaction between IRS-1 and other components in the pathway. Base editors herein may be used to mutate one or more of these phosphorylation sites for regulating the IRS pathway.
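
Whether a serine-to-phenylalanine change of the kind mentioned above is reachable by a single C-to-U conversion depends on the serine codon actually present, which this description does not specify. The Python sketch below enumerates the possibilities using the standard genetic code; the function name is hypothetical, and it assumes the conversion acts on the codon as written (i.e., on the transcript or coding strand).

```python
# Standard genetic code, codons ordered with bases U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def single_c_to_u_routes(source_aa, target_aa):
    """Return (codon, edited codon) pairs where one C->U change converts
    a codon for source_aa into a codon for target_aa."""
    routes = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != source_aa:
            continue
        for position, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:position] + "U" + codon[position + 1:]
            if CODON_TABLE[edited] == target_aa:
                routes.append((codon, edited))
    return routes


for codon, edited in single_c_to_u_routes("S", "F"):
    print(f"Ser {codon} -> Phe {edited}")
```

Only two of the six serine codons qualify under this assumption, so the codon at the site of interest would need to be checked before planning such an edit.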

Regulation of Stability of Gene Products

In some embodiments, base editing may be used for regulating the stability of gene products. For example, one or more amino acid residues that regulate protein degradation rates may be mutated by the base editors herein. In some cases, such amino acid residues may be in a degron. A degron may refer to a portion of a protein involved in regulating the degradation rate of the protein. Degrons may include short amino acid sequences, structural motifs, and exposed amino acids (e.g., lysine or arginine). Some proteins may comprise multiple degrons. The degrons may be ubiquitin-dependent (e.g., regulating protein degradation based on ubiquitination of the protein) or ubiquitin-independent.

In some cases, base editing may be used to mutate one or more amino acid residues in a signal peptide for protein degradation. In some examples, the signal peptide may be a PEST sequence, which is a peptide sequence that is rich in proline (P), glutamic acid (E), serine (S), and threonine (T). For example, the stability of NANOG, which comprises a PEST sequence, may be increased, e.g., to promote embryonic stem cell pluripotency.
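
As a rough illustration of what "rich in P, E, S and T" means in practice, the Python sketch below scores sliding windows of a protein sequence by their P/E/S/T content. It is a simple composition count, not an established PEST-prediction algorithm; the function names, the window length and the threshold are arbitrary assumptions.

```python
PEST_RESIDUES = frozenset("PEST")


def pest_fraction(peptide):
    """Fraction of residues in the peptide that are P, E, S or T."""
    peptide = peptide.upper()
    if not peptide:
        return 0.0
    return sum(residue in PEST_RESIDUES for residue in peptide) / len(peptide)


def pest_rich_windows(sequence, window=12, threshold=0.5):
    """Return (start, fraction) for each window at or above the threshold.

    start is a 0-based index; window and threshold are arbitrary defaults.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        fraction = pest_fraction(sequence[start:start + window])
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits


# Hypothetical sequence used only to exercise the functions.
print(pest_rich_windows("AAAAPESTPESTAAAA", window=8, threshold=1.0))
```

Candidate windows found this way would only be a starting point for choosing residues to edit.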

In some examples, the base editors may be used for mutating SMN2 (e.g., to generate a S270A mutation) to increase stability of the SMN2 protein, which is involved in spinal muscular atrophy. Other mutations in SMN2 that may be generated by base editors include those described in Cho S. et al., Genes Dev. 2010 Mar. 1; 24(5): 438-442. In certain examples, the base editors may be used for generating mutations on IκBα, as described in Fortmann K T et al., J Mol Biol. 2015 Aug. 28; 427(17): 2748-2756. Target sites in degrons may be identified by computational tools, e.g., the online tools provided on slim.ucd.ie/apc/index.php. Other targets include Cdc25A phosphatase.

Examples of Genes that can be Targeted by Base Editors

In some examples, the base editors may be used for modifying PCSK9. The base editors may introduce stop codons and/or disease-associated mutations that reduce PCSK9 activity. The base editing may introduce one or more of the following mutations in PCSK9: R46L, R46A, A53V, A53A, E57K, Y142X, L253F, R237W, H391N, N425S, A443T, I474V, I474A, Q554E, Q619P, E670G, E670A, C679X, H417Q, R469W, E482G, F515L, and/or H553R.

In some examples, the base editors may be used for modifying ApoE. The base editors may target ApoE in synthetic model and/or patient-derived neurons (e.g., those derived from iPSC). The targeting may be tested by sequencing.

In some examples, the base editors may be used for modifying Stat1/3. The base editor may target Y705 and/or S727 for reducing Stat1/3 activation. The base editing may be tested by luciferase-based promoter. Targeting Stat1/3 by base editing may block monocyte to macrophage differentiation, and inflammation in response to ox-LDL stimulation of macrophages.

In some examples, the base editors may be used for modifying TFEB (transcription factor EB). The base editor may target one or more amino acid residues that regulate translocation of the TFEB. In some cases, the base editor may target one or more amino acid residues that regulate autophagy.

In some examples, the base editors may be used for modifying ornithine carbamoyl transferase (OTC). Such modification may be used to correct ornithine carbamoyl transferase deficiency. For example, base editing may correct the Leu45Pro mutation by converting nucleotide 134C to U. An example approach is shown in FIG. 102.
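
The relationship between nucleotide 134 and codon 45 can be checked with simple arithmetic. The Python sketch below assumes nucleotides are numbered from the first base of the coding sequence; the function names are hypothetical, and because the third base of the mutant codon is not given here, all four proline codons are shown.

```python
PRO_CODONS = {"CCU", "CCC", "CCA", "CCG"}
LEU_CODONS = {"CUU", "CUC", "CUA", "CUG", "UUA", "UUG"}


def codon_position(nucleotide_number):
    """Return (codon number, position in codon), both 1-based, assuming
    numbering starts at the first nucleotide of the coding sequence."""
    index = nucleotide_number - 1
    return index // 3 + 1, index % 3 + 1


def edit_c_to_u(codon, position_in_codon):
    """Replace the C at the given 1-based codon position with U."""
    if codon[position_in_codon - 1] != "C":
        raise ValueError("no C at the requested position")
    return codon[:position_in_codon - 1] + "U" + codon[position_in_codon:]


codon_number, position_in_codon = codon_position(134)
print(codon_number, position_in_codon)

for pro_codon in sorted(PRO_CODONS):
    edited = edit_c_to_u(pro_codon, position_in_codon)
    outcome = "Leu" if edited in LEU_CODONS else "not Leu"
    print(f"Pro {pro_codon} -> {edited} ({outcome})")
```

Nucleotide 134 is the second base of codon 45, and converting the middle C of any proline codon (CCN) to U gives a leucine codon (CUN), consistent with the Leu45Pro correction described above.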

In some examples, the base editors may be used for modifying Lipin1. The base editor may target one or more serine residues that can be phosphorylated by mTOR. Base editing of Lipin1 may regulate lipid accumulation. The base editors may target Lipin1 in a 3T3L1 preadipocyte model. Effects of the base editing may be tested by measuring reduction of lipid accumulation (e.g., via oil red).

Base Editing Guide Molecule Design Considerations

In some embodiments, the guide sequence is an RNA sequence of between 10 and 50 nt in length, but more particularly of about 20-30 nt, advantageously about 20 nt, 23-25 nt or 24 nt. In base editing embodiments, the guide sequence is selected so as to ensure that it hybridizes to the target sequence comprising the adenosine to be deaminated. This is described in more detail below. Selection can encompass further steps which increase efficacy and specificity of deamination.

In some embodiments, the guide sequence is about 20 nt to about 30 nt long and hybridizes to the target DNA strand to form an almost perfectly matched duplex, except for having a dA-C mismatch at the target adenosine site. Particularly, in some embodiments, the dA-C mismatch is located close to the center of the target sequence (and thus the center of the duplex upon hybridization of the guide sequence to the target sequence), thereby restricting the adenosine deaminase to a narrow editing window (e.g., about 4 bp wide). In some embodiments, the target sequence may comprise more than one target adenosine to be deaminated. In further embodiments, the target sequence may further comprise one or more dA-C mismatches 3′ to the target adenosine site. In some embodiments, to avoid off-target editing at an unintended Adenine site in the target sequence, the guide sequence can be designed to comprise a non-pairing Guanine at a position corresponding to said unintended Adenine to introduce a dA-G mismatch, which is catalytically unfavorable for certain adenosine deaminases such as ADAR1 and ADAR2