ANALYSIS OF NUCLEIC ACIDS ASSOCIATED WITH SINGLE CELLS USING NUCLEIC ACID BARCODE

Info

Publication number: 20220243240
Type: Application
Filed: Apr 8, 2022
Publication Date: Aug 4, 2022
Inventors: Yann Chong TAN (Singapore), Gary WITHEY (San Francisco, CA)
Application Number: 17/716,617

Abstract

Provided herein are methods and compositions for analyzing nucleic acids associated with single cells using nucleic acid barcodes. According to some embodiments, a method for producing one or more polynucleotides of interest comprises: obtaining a plurality of RNAs associated with one or more samples, wherein the samples are obtained from one or more subjects, each RNA is associated with a single sample, and the RNAs associated with each sample are present in a separate reaction volume; adding an adapter molecule to the RNAs associated with each sample, wherein the adapter molecule is generated using an enzymatic reaction and comprises a universal priming sequence, a barcode sequence, and a binding site; and incorporating the barcode sequence into one or more polynucleotides associated with each sample, thereby producing the one or more polynucleotides of interest.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/402,606, filed May 3, 2019, which is a continuation of U.S. application Ser. No. 15/428,064, filed Feb. 8, 2017, now U.S. Pat. No. 10,316,345, which is a continuation of U.S. application Ser. No. 14/586,857, filed Dec. 30, 2014, now U.S. Pat. No. 9,580,736, which claims benefit of U.S. Application No. 61/922,012, filed Dec. 30, 2013, the entire contents of each of which are incorporated herein by reference.

INCORPORATION-BY-REFERENCE OF SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Mar. 31, 2022, is named “NATE-054_C04US_SeqList.txt” and is about 8.7 MB in size.

BACKGROUND

Variable genes such as immunoglobulin (Ig) and T cell receptor (TCR) genes are formed from rearrangement of V(D)J gene segments with P/N nucleotide additions between the junctions. A fully functional Ig or TCR protein is formed by association of two genes—heavy and light chain genes for Ig, alpha and beta genes for an αβTCR and gamma and delta genes for a γδTCR. This combinatorial approach results in an extremely large variety of different possible sequences.

This repertoire allows the immune system to be able to respond to novel immunological insults that have not yet been encountered by the organism. Immunoglobulin genes also undergo somatic hypermutation which further increases the repertoire size.

Correspondingly, any nucleic acid analysis of variable genes that allows for expression of the native Ig or TCR protein to investigate its functional properties requires not just sequencing individual B cells (for Ig genes) or T cells (for TCR genes), but also native pairing of the two genes that make up the protein. This can be done by single cell cloning and Sanger sequencing, but that approach is slow and laborious (see, e.g., Wrammert et al., Nature, 2008, 453:667-671).

High-throughput methods have been developed for sequencing natively paired genes, and they fall into two approaches. The first approach is to attach a unique nucleic acid barcode identifier to nucleic acids from a cell; pairing is then achieved bioinformatically by linking together genes that share the same barcode and therefore originate from the same cell (PCT/US2012/000221). The second approach is to physically link nucleic acids from the two genes together (see, e.g., U.S. Pat. No. 7,749,697).

The first approach is superior as it allows pairing for multiple genes (such as B or T cell co-expressed genes that identify specific T cell or B cell subsets), while the second approach is limited to physically linking a few nucleic acids. To date, experimental data exists only for cases in which no more than two nucleic acids have been physically linked.

Associating nucleic acids unambiguously to a single cell (the first approach) rather than associating them with each other via linking (the second approach) has advantages. When nucleic acids are associated with each other, it can be difficult to distinguish PCR and sequencing errors from true biological variation. Assumptions have to be made about the accuracy of the sequencing platform, and reads are arbitrarily assigned to different sequences based on a percentage similarity cutoff; for example, all reads with >95% similarity are assigned to one sequence and any differences between them are assumed to be due to sequencing errors. Such a cutoff cannot distinguish between sequences that are very similar to one another (see Zhu et al., Frontiers in Microbiology, 2012, 3:315).

Furthermore, assumptions about how many cells share an identical sequence are made using the relative frequency of reads assigned to the sequence. This is an approximate measure and is affected by PCR amplification biases, as is well known in the field. Therefore, associating Ig or TCR nucleic acids with each other can give only an approximate, not a true, representation of the repertoire sequenced (see Zhu et al., Frontiers in Microbiology, 2012, 3:315).

However, associating nucleic acids to single cells using nucleic acid barcodes allows for unambiguous differentiation between similar or even identical sequences from single B or T cells as each read can be assigned to a cell.
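As an illustration only, barcode-based pairing can be sketched as grouping sequencing reads by their barcode. The read tuples, chain labels, and barcode strings below are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def pair_by_barcode(reads):
    """Group (barcode, chain, sequence) reads so that chains sharing a
    barcode, and hence a cell of origin, end up together."""
    cells = defaultdict(dict)
    for barcode, chain, sequence in reads:
        cells[barcode].setdefault(chain, []).append(sequence)
    return dict(cells)


# Hypothetical reads: two cells carry the same heavy chain sequence,
# yet remain distinguishable because their barcodes differ.
reads = [
    ("AAGT", "heavy", "CAGGTGCAG"),
    ("AAGT", "light", "GACATCCAG"),
    ("CCTA", "heavy", "CAGGTGCAG"),
    ("CCTA", "light", "GAAATTGTG"),
]
paired = pair_by_barcode(reads)
```

In this sketch, identical heavy chain sequences from different cells stay separate because each read is assigned to a cell by its barcode rather than by sequence similarity.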

Furthermore, by building a consensus sequence with all reads associated with a cell, very accurate and almost completely error-free sequences can be obtained and an accurate representation of the repertoire sequenced can be obtained. This is also generalizable to analysis of all nucleic acids in a cell.
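A minimal sketch of consensus building follows, assuming the reads for one cell and one gene have already been aligned to equal length; the disclosure does not specify a consensus algorithm, so the per-position majority vote used here is an assumption, and the example reads are hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Return the per-position majority base over equal-length reads."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to equal length")
    # zip(*reads) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Hypothetical reads sharing one barcode; two of them carry isolated errors.
cell_reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAA"]
cell_consensus = consensus(cell_reads)
```

Isolated PCR or sequencing errors are outvoted at each position, which is why a consensus over all reads for a cell can be more accurate than any single read.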

Still, technical difficulties in delivering unique barcodes to each single cell remain. The current best technology to attach nucleic acid barcodes to variable genes has unique barcodes in aqueous solution and each barcode exists in a separate storage container even before the reaction to attach barcodes to variable gene nucleic acids (PCT/US2012/000221), otherwise the nucleic acid barcodes will be mixed before use. This creates a logistical difficulty of barcoding many thousands of cells, due to the large number of containers required to contain the individual barcodes.

The requirement for a large number of storage containers also makes this approach incompatible with any sort of approach where a unique barcode cannot be individually pipetted into each individual reaction container (which will also contain a single cell). An example is nanoliter-sized reaction containers such as a nanowell approach, where it is impractical to pipette a unique barcode individually to each nanowell as there are thousands to hundreds of thousands of nanowells.

This is also infeasible in a nanodroplet approach, in which droplets are made using a water-in-oil emulsion, as hundreds of thousands of nanodroplets are generated with only a few aqueous streams (see, e.g., products by Dolomite Microfluidics or Raindance Technologies), and it is not possible to have unique barcodes in individual storage containers before delivering them to the nanodroplets.

One method to deliver unique barcodes to individual reaction containers is by using limiting dilution to deposit a unique barcode into the majority of reaction containers. One may perform limiting dilution of barcodes attached to manipulable objects, such as beads, each of which has multiple copies of one particular barcode attached, or one may perform limiting dilution of barcodes in solution. Upon diluting such beads, multiple copies of one particular nucleic acid barcode are present in a reaction container, whereas upon diluting barcodes in solution, only a single copy of a particular nucleic acid barcode is present in a reaction container.
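Limiting dilution is commonly modeled with Poisson statistics. The disclosure does not state a loading model, so the sketch below is an assumption: it estimates what fraction of reaction containers would be empty, hold exactly one barcode (or bead), or hold more than one, for a chosen mean loading per container.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k items in a container under Poisson loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def occupancy(mean):
    """Fractions of containers that are empty, singly or multiply loaded."""
    empty = poisson_pmf(0, mean)
    single = poisson_pmf(1, mean)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


# Hypothetical mean loading of 0.1 barcodes (or beads) per container.
occ = occupancy(0.1)
single_among_loaded = occ["single"] / (1.0 - occ["empty"])
```

Under this assumed model, lowering the mean loading raises the share of loaded containers that hold a single unique barcode, at the cost of leaving more containers empty.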

Moreover, addition of a nucleic acid barcode to the sample-derived nucleic acids of interest present in a reaction container will be more complete if the introduced barcode is amplified, to ensure that it is present in a sufficient quantity in the reaction chamber. For example, a typical mammalian cell contains roughly 400,000 copies of mRNA. To maximize the efficiency of the overall single-cell analysis, as many of these mRNA copies as possible should be barcoded. Therefore, at a minimum, roughly the same number of copies of a particular nucleic acid barcode as there are mRNA copies need to be present in the reaction container. Limiting dilution of barcodes in solution leads to just a single copy of a particular barcode in the reaction container, while dilution of beads bearing barcodes would be expected to provide maximally tens of thousands of copies. Thus, amplification of the barcode in either case is important to generate sufficient quantities of a particular nucleic acid barcode in a reaction container such that successful addition of the barcode to the greatest number of sample-derived nucleic acids occurs. However, beads are expected to provide significantly more starting material for barcode amplification, and therefore significantly better amplification.
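The copy-number argument above can be made concrete with a short calculation. The figure of roughly 400,000 mRNA copies comes from the text; the bead load of 10,000 template copies is a hypothetical value chosen from within the stated range of "tens of thousands".

```python
import math

MRNA_COPIES_PER_CELL = 400_000  # rough figure given in the text


def min_fold_amplification(starting_copies, target_copies=MRNA_COPIES_PER_CELL):
    """Smallest whole-number fold amplification that reaches the target copy number."""
    if starting_copies < 1:
        raise ValueError("starting_copies must be at least 1")
    return math.ceil(target_copies / starting_copies)


fold_solution = min_fold_amplification(1)    # single barcode copy in solution
fold_bead = min_fold_amplification(10_000)   # hypothetical bead carrying 10,000 copies
```

With these assumed numbers, a single barcode in solution needs about 400,000-fold amplification to match the mRNA copy number, whereas a bead-borne template needs about 40-fold.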

Furthermore, if the nucleic acids are attached to a solid surface, they will not be as free to move about in comparison to nucleic acids in solution. Solid phase kinetics are much slower than aqueous phase kinetics for nucleic acid complementary base pairing, and may result in much less efficient addition of barcodes to nucleic acids of interest. Preferably, nucleic acid barcodes should exist in the aqueous phase before participating in the barcoding reaction.

The current invention improves upon a previous invention (PCT/US2012/000221) to attach unique barcodes to each sample, where each sample is usually a single cell, but is generalizable to any type of sample. The current invention enables delivery of unique barcodes to any type of reaction container, is suitable for nanoliter-sized reaction containers, and does not require keeping unique nucleic acid barcodes in separate storage containers. It is amenable to, but does not require, manually pipetting a unique barcode into each reaction container. It delivers one or more copies of a unique barcode or unique barcode set into each reaction container, and the barcode is attached to nucleic acids of interest in a reaction that occurs in the aqueous phase with rapid aqueous phase kinetics. Furthermore, the amplification reaction can occur at a sufficiently low temperature that it is compatible with the mesophilic enzymes (which are otherwise inactivated at high temperatures) used to add barcodes to nucleic acids of interest.

SUMMARY

Disclosed herein are methods and compositions for analyzing nucleic acids associated with single cells using nucleic acid barcodes.

Disclosed herein is a method for producing one or more polynucleotides of interest, comprising obtaining a cDNA library comprising a plurality of cDNAs associated with one or more samples obtained from one or more subjects, wherein each cDNA is associated with a single sample in the one or more samples, and wherein each cDNA associated with each sample is present in a separate container or compartment; and adding an adapter molecule to the cDNA associated with each sample to produce the one or more polynucleotides of interest, wherein the adapter molecule is generated from an adaptor construct comprising a universal priming sequence, a barcode, and a cDNA binding site.

In some aspects, the adapter molecules are generated using an isothermal reaction. In some aspects, the adaptor construct further comprises an RNA polymerase (RNAP) promoter. In some aspects, the RNAP promoter is selected from the group consisting of T7, T3, and SP6. In some aspects, the adaptor construct further comprises a nicking endonuclease restriction site. In some aspects, the nicking endonuclease restriction site is selected from the group consisting of Nt.BbvCI, Nt.BspQI, Nt.BsmAI, Nt.BstNBI, Nt.AlwI, and Nt.BsmAI. In some aspects, the adaptor is an RNA adaptor generated by RNAP. In some aspects, the adaptor is a DNA adaptor generated by a nicking endonuclease and strand displacing DNA polymerase. In some aspects, the strand displacing DNA polymerase is selected from the group consisting of Klenow exo- and Bst Large Fragment and its engineered variants, such as Bst 2.0.

In some aspects, the method further comprises allowing the 3′ end of the adapter molecule to attach to the 3′ end of each cDNA in the library to produce the one or more polynucleotides of interest.

In some aspects, the adaptor is added by annealing the adaptor to the 3′ tail of a cDNA generated during a reverse transcription reaction. In some aspects, each cDNA comprises at least one C nucleotide, wherein C is located at the 3′ end of each cDNA, wherein the adapter region comprises at least one G nucleotide, wherein G is located at the 3′ end of the adapter region, and wherein the adapter region is attached to each cDNA via binding between the G and C. In some aspects, the adapter molecule is single-stranded, and the method further comprises incorporating the complement of the adapter molecule into each cDNA by allowing an enzyme to make the adapter molecule double-stranded. In some aspects, the complement of the adapter molecule is incorporated into each cDNA to produce the polynucleotide of interest by an MMLV H-reverse transcriptase.

In some aspects, each sample comprises a cell. In some aspects, the cell is a B cell. In some aspects, the B cell is a plasmablast, memory B cell, or a plasma cell.

Also disclosed herein is a method of attaching a barcode to a solid support comprising the steps of: a) generating a hydrophilic compartment of an inverse emulsion, the hydrophilic compartment comprising: a solid support contained therein, wherein the solid support comprises an oligonucleotide bound to the surface via a capture moiety, wherein the oligonucleotide comprises a 3′ sequence complementary to a 3′ sequence on a barcode oligonucleotide; a barcode oligonucleotide comprising a 3′ sequence complementary to the 3′ end of the bound oligonucleotide, and a barcode sequence; and b) performing a polymerase extension reaction to add the sequence of the barcode to the bound oligonucleotide on the solid support.

In some aspects, the barcode oligonucleotide further comprises a 5′ sequence identical or complementary to a reverse PCR primer. In some aspects, the method further comprises performing a PCR reaction using a fluorophore-labeled reverse primer.

In some aspects, the solid support is a bead. In some aspects, the capture moiety is streptavidin. In some aspects, the barcode oligonucleotide further comprises an RNA polymerase (RNAP) promoter and/or a nicking endonuclease restriction site, a universal priming sequence, and a cDNA binding site. In some aspects, the RNAP promoter is selected from the group consisting of T7, T3, and SP6. In some aspects, the nicking endonuclease restriction site is selected from the group consisting of Nt.BbvCI, Nt.BspQI, Nt.BsmAI, Nt.BstNBI, Nt.AlwI, and Nt.BsmAI. In some aspects, the cDNA binding site is one or more G nucleotides.

Also disclosed herein is a method of attaching a barcode to a solid support comprising the steps of: a) providing a solid support, with an oligonucleotide bound to the solid support via a capture moiety, wherein the oligonucleotide comprises an S1_x sequence, and a sequence complementary to a 3′ sequence on a first barcode oligonucleotide; a first barcode oligonucleotide comprising a 3′ sequence complementary to a sequence of the bound oligonucleotide, and a W sequence; and b) performing a polymerase extension reaction or ligation reaction to add the W sequence to the S1_x sequence of the bound oligonucleotide on the solid support; c) providing a second barcode oligonucleotide with an S2_y sequence comprising a 3′ sequence complementary to the 3′ end of the oligonucleotide extended in step b); d) performing a polymerase extension reaction or ligation reaction to add the S2_y sequence to the S1_x and W sequences of the bound oligonucleotide on the solid support, where the barcode sequence comprises the S1_x, W, and S2_y sequences.

In some aspects, the solid support is a bead. In some aspects, the capture moiety is streptavidin. In some aspects, the first or second barcode oligonucleotide further comprises an RNA polymerase (RNAP) promoter and/or a nicking endonuclease restriction site, a universal priming sequence, and a cDNA binding site. In some aspects, the RNAP promoter is selected from the group consisting of T7, T3, and SP6. In some aspects, the nicking endonuclease restriction site is selected from the group consisting of Nt.BbvCI, Nt.BspQI, Nt.BsmAI, Nt.BstNBI, Nt.AlwI, and Nt.BsmAI. In some aspects, the cDNA binding site is one or more G nucleotides.

Also disclosed herein is a solid support with an attached barcode generated by any of the methods disclosed above. Also disclosed herein is a beaded barcode library comprising a plurality of such solid supports with attached barcodes.

Also disclosed herein is a barcode adaptor construct comprising a universal priming sequence, a barcode, and a cDNA binding site. In some aspects, the construct further comprises an RNAP promoter. In some aspects, the RNAP promoter is selected from the group consisting of T7, T3, and SP6. In some aspects, the construct further comprises a nicking endonuclease restriction site. In some aspects, the nicking endonuclease restriction site is selected from the group consisting of Nt.BbvCI, Nt.BspQI, Nt.BsmAI, Nt.BstNBI, Nt.AlwI, and Nt.BsmAI.

Also disclosed herein is a barcode adaptor template bead comprising a solid support and a barcode adaptor molecule bound to the solid support via a capture moiety, wherein the barcode adaptor molecule comprises a barcode sequence and a cDNA binding site. In some aspects, the cDNA binding site comprises one or more G nucleotides. In some aspects, the barcode sequence comprises a sequence S1_x-W-S2_y. Also disclosed herein is a beaded barcode library comprising a plurality of the barcode adaptor template beads as disclosed above.

Also disclosed herein is a polynucleotide library comprising a plurality of barcode adaptor template beads comprising a solid support and a barcode adaptor molecule bound to the solid support via a capture moiety, wherein the barcode adaptor molecule comprises a barcode sequence and a cDNA binding site, wherein a cDNA region is coupled to the 3′ end of the adaptor.

In some aspects, the cDNA binding site comprises one or more G nucleotides. In some aspects, the barcode sequence comprises a sequence S1_x-W-S2_y.

In some aspects, the cDNA is derived from a B cell. In some aspects, the B cell is a plasmablast, memory B cell, or a plasma cell. In some aspects, the cDNA is a B-cell derived variable immunoglobulin region.

Also disclosed herein is a microfluidic droplet device as shown in FIG. 7.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

These and other features, aspects, and advantages will become better understood with regard to the following description, and accompanying drawings, where:

FIG. 1. Barcode adapter map. The barcode adapter sequence comprises an RNA polymerase promoter and/or a nicking endonuclease site followed by a universal priming sequence used in subsequent PCR steps for primers to anneal to, followed by a barcode sequence, and a nucleic acid binding sequence.

FIG. 2. Amplifying barcode adapter. a) RNA barcode adapters can be synthesized in a linear amplification reaction by an RNAP, such as T7, which binds to its promoter sequence and synthesizes single-stranded barcode adapter RNA. b) A nicking endonuclease, such as Nt.BbvCI (NEB) can be used to introduce a nick on the sense strand and DNA barcode adapters can be synthesized in an amplification reaction by a strand-displacing enzyme, such as Klenow exo-, which will extend the nick and displace the single-stranded barcode adapter.

FIG. 3. Incorporating barcode adapter sequence into 1st strand cDNA. Here RNA barcode adapters are synthesized to demonstrate barcoding of cDNA. DNA barcode adapters (synthesized in FIG. 2b) may also be used. a) An RNAP primes off its promoter and synthesizes RNA barcode adapters. b) In the same reaction, reverse transcription occurs and 1st strand cDNA is generated. The MMLV-based H-reverse transcriptase has 3′ tailing activity and adds several dCs to the 3′ end of the 1st strand cDNA. c) The barcode adapter base-pairs with the tailed dCs and the reverse transcriptase continues transcription using the barcode adapter as a template, incorporating the barcode sequence into the 1st strand cDNA. All mRNAs in the reaction are therefore barcoded.

FIG. 4. RNA barcode adapters have less background. In the barcoding reaction in FIG. 3, both oligo(dT) and barcode adapters are present, and both oligos can prime the reverse transcription reaction. a) When the reaction is primed with oligo(dT), the reaction proceeds as normal. b) When the RT reaction is misprimed with a DNA barcode adapter, during PCR the forward primer can prime off both the sense and anti-sense strands and amplify non-desired products. c) When the RT reaction is primed with an RNA barcode adapter, the growing strand cannot use RNA nucleotides as a template when using a proof-reading DNA polymerase in PCR1, and as a result misprimed cDNAs will not contain barcode adapter sequences on both the sense and anti-sense strands. Therefore non-desired products should not be exponentially amplified, resulting in significantly less background.

FIG. 5. Barcode synthesis in one reaction. a) Beads were coupled to an oligonucleotide. Coupling may be done by coupling biotinylated oligos onto streptavidin-coated beads, or by other means known in the field. b) In a reaction container, coupled beads, forward and reverse primers, and a barcode sequence oligo containing forward and reverse primer complementary sequences were present, with the barcode sequence oligo preferably present at only a single copy in a reaction container. PCR was then conducted to amplify the barcode sequence oligo and incorporate it into the bead to form barcode adapter template beads.

FIG. 6. Barcode synthesis in multi-steps. a) Beads are coupled to (multiple copies of) an oligonucleotide containing a unique S1 sequence. Multiple, separate coupling reactions are performed, with each coupling reaction using an oligonucleotide containing a different unique S1 sequence. Beads, each coupled to an oligonucleotide with a different unique S1 sequence, are then pooled together, forming a library of beads having S1_x sequences. b) These beads are then used in an extension reaction. In each reaction, an oligonucleotide that contains a unique W sequence complementarily base-pairs with the S1_x-containing oligonucleotides coupled to the beads, and an extension reaction using a DNA polymerase is performed. Beads from all the extension reactions are pooled, and a library of beads containing a combination of S1_x sequences each with the unique W sequence is formed. c) The double-stranded DNA from (b) is denatured and the antisense strand washed off the beads. Additional, separate extension reactions are performed on the beads as in (b), but the oligonucleotide that complementarily base-pairs with the S1_x and W containing oligonucleotide coupled to the beads contains a different unique S2 sequence in each separate reaction. Beads from all extension reactions are pooled, and a library of beads containing barcode adapter templates is obtained, with a combination of S1_x, W, and S2_y sequences forming the barcode sequence. A large number of unique barcode sequences can thus be obtained in this combinatorial approach. Furthermore, multiple unique W sequences can each be combined with the S1_x and S2_y sequences, yielding barcodes of the general format S1_x-W_z-S2_y.

FIG. 7. Example of microfluidic droplet device setup. Three Dolomite P-Pumps were equipped with flow sensors. The first P-Pump was connected directly to a 2-Reagent Droplet Chip via microfluidic tubing that incorporated a T-junction to split the line into two inputs. This was the oil input line. The other two P-Pumps were connected via fluidic tubing to PEEK sample loops that coiled around an ice bin that served to keep samples chilled while the device was operating, and each of these loops was connected to the 2-Reagent Droplet Chip. Each sample loop incorporated a four-way valve at its front end so that sample could be loaded into the loop by means of a syringe. The first sample loop was filled with the cell and barcoded bead suspension while the second loop was filled with RT/barcoding/lysis mix. Importantly, the sample loops were oriented horizontally and above or level with the droplet chip so as to avoid any uphill sections through which it may be difficult for cells and beads to travel. The ice bin was filled with ice prior to use.

FIG. 8. Barcoding reaction works in a variety of different buffers. 1, 2, and 3 refer to the 3 reaction buffers, which were the 0.5×MMLV, 1× Thermopol DF and 0.5×TAE buffers respectively. K, L, and G refer to kappa, lambda and gamma immunoglobulin chains. All chains were amplified in the different reaction buffers used.

FIG. 9. Barcoding reaction works better using RNA barcodes. 1, 2, and 3 refer to the 3 reaction conditions, which were the 1×MMLV and 0.5×MMLV conditions using RNA barcode adapters, and the 1×MMLV condition using DNA barcode adapters. K, L, and G refer to kappa, lambda and gamma immunoglobulin chains. The bands in the reaction using DNA adapters were obscured due to high background.

FIG. 10. Amplified product from barcoding single B cells in droplet reaction containers with barcode adapter templates. The bands corresponding to kappa and lambda light chains (“K/L”) and mu heavy chain (“M”) can be clearly seen.

FIG. 11. RNA barcode adapters amplified from barcode adapter template beads made using a multi-step approach. The barcode adapter template beads were used in an in vitro transcription reaction. Bands were present from beads made using S1-oligo+W-oligo-a+S2-oligo-a and S1-oligo+W-oligo-b+S2-oligo-b respectively.
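The combinatorial barcode format S1_x-W_z-S2_y described for FIG. 6 can be sketched as a Cartesian product of three segment sets. This is an illustration only: the segment sequences below are hypothetical placeholders, and the sketch models the resulting sequence space rather than the bead chemistry.

```python
from itertools import product


def combinatorial_barcodes(s1_seqs, w_seqs, s2_seqs):
    """Enumerate every S1_x-W_z-S2_y concatenation from three segment sets."""
    return ["".join(parts) for parts in product(s1_seqs, w_seqs, s2_seqs)]


# Hypothetical segment sets (not from the disclosure).
s1 = ["ACGT", "TGCA", "GATC"]
w = ["AAC", "GGT"]
s2 = ["CTAG", "TCGA"]
barcodes = combinatorial_barcodes(s1, w, s2)
```

The number of distinct barcodes is the product of the three set sizes (here 3 × 2 × 2 = 12), which is how a modest number of separate coupling and extension reactions can yield a large barcode library.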

DETAILED DESCRIPTION

The invention provides a method to generate unique nucleic acid barcoded adapters in each reaction container such that the nucleic acid barcoded adapters are in aqueous phase but the template from which they were generated can either be attached to a solid surface (such as attached to beads) or be free in solution. Nucleic acid barcoded adapters are any polynucleotide sequences that comprise a unique barcode sequence and may or may not have modifications (for example, biotinylation or C18 spacers) or contain modified polynucleotides (such as 2′-O-methyl RNA bases).

Also provided are compositions generated using the methods disclosed herein. Accordingly, the present invention provides compositions of RNA and DNA adaptors and constructs for their generation. Also provided are barcode adapter template bead libraries comprising compound and non-compound barcodes, emulsion droplet libraries loaded with RNA barcodes, emulsions containing barcode libraries with cells, barcoded cDNA libraries, and microfluidic droplet generating devices, among others.

The barcoded adapter template is a double-stranded DNA (dsDNA) template which comprises the following sequence: 5′-T7 promoter-universal priming sequence-barcode sequence-binding sequence-3′. The T7 promoter sequence allows for synthesis of an RNA barcoded adapter from the template by T7 RNA polymerase. The universal priming sequence is used for complementarity to PCR primers that are used downstream. The binding sequence consists of 1 or more guanine bases (G's) and allows for complementary base-pairing of the barcoded adapter to the 3′ end of 1st strand cDNA (FIG. 1).
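The template layout described above can be sketched as a simple concatenation of the sense strand. The disclosure gives no sequences at this point, so everything concrete here is an assumption: the promoter string is the widely used 18-nt minimal T7 promoter, and the universal priming sequence and barcode in the example are hypothetical placeholders.

```python
# Widely used minimal T7 promoter (assumption; the text gives no sequence here).
T7_PROMOTER = "TAATACGACTCACTATAG"


def build_adapter_template(universal_priming, barcode, n_g=3, promoter=T7_PROMOTER):
    """Return the sense strand 5'-promoter-universal priming-barcode-G(n)-3'."""
    if n_g < 1:
        raise ValueError("the binding sequence needs at least one G")
    for name, seq in (("universal_priming", universal_priming), ("barcode", barcode)):
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"{name} must be a non-empty DNA string")
    return promoter + universal_priming + barcode + "G" * n_g


# Hypothetical universal priming sequence and 15-nt barcode.
template = build_adapter_template("CACGACGCTCTTCCGATCT", "ACGTTGCAATGCCAT")
```

The number of trailing G's is left as a parameter because the text requires only that the binding sequence contain one or more guanines.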

Other promoter sequences can be used, such as but not limited to T3 and SP6 promoter sequences, which allow for synthesis of an RNA barcoded adapter by T3 and SP6 RNA polymerases respectively. Other RNA polymerases which do not have a specific promoter sequence may also be used, as long as a full length or near full length barcoded adapter is synthesized in a large fraction of cases (FIG. 2a). Isothermal amplification may also be used, typically using DNA polymerases with strand-displacement activity such as Bst large fragment and Klenow 3′→5′ exo-, as long as full length or near full length barcoded adapters are synthesized in a large fraction of cases. Specific primer or nicking endonuclease sequences may need to be used instead of a promoter sequence, depending on the isothermal amplification method used (FIG. 2b). Barcoded adapters thus generated will comprise DNA nucleotides instead of RNA nucleotides. Both RNA and DNA barcoded adapters can be attached to polynucleotides of interest.

Attaching barcoded adapters to the 3′ end of 1st strand cDNA has been previously described (PCT/US2012/000221). Briefly, H-MMLV reverse transcriptases have a 3′ dC tailing activity and add non-templated dCs to 1st strand cDNA. If a barcoded adapter ending in at least 1 G is also present, the adapter can base-pair with the 3′ dC of the 1st strand cDNA and the reverse transcriptase undergoes template switching and continues transcription using the barcoded adapter as a template and thus covalently adds the barcoded adapter to the 3′ end of the 1st strand cDNA via phosphodiester bonds (FIG. 3).
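The template-switching step can be sketched at the level of sequence strings. This is a deliberately simplified model, not the enzymology: it assumes a fixed number of non-templated C's, assumes the adapter's 3′ G's pair with those C's, and writes everything in the DNA alphabet (U in an RNA adapter is complemented to A). The sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement, written in the DNA alphabet."""
    return seq.translate(COMPLEMENT)[::-1]


def template_switch(cdna, adapter, n_tail=3):
    """Model dC tailing of 1st strand cDNA followed by extension on the adapter.

    Returns the cDNA (5'->3') carrying the complement of the adapter at its 3' end.
    """
    g_run = len(adapter) - len(adapter.rstrip("G"))
    if g_run < 1:
        raise ValueError("adapter must end in at least one G")
    tailed = cdna + "C" * n_tail           # non-templated dC tail
    overlap = min(n_tail, g_run)           # G:C pairs formed with the adapter
    # The reverse transcriptase copies the adapter upstream of the paired G's.
    extension = reverse_complement(adapter[: len(adapter) - overlap])
    return tailed + extension


# Hypothetical adapter: priming sequence "ACGT" + barcode "TGCAA" + "GGG".
barcoded_cdna = template_switch("TTTTACGGA", "ACGTTGCAAGGG")
```

In the result, the complement of the barcode and of the universal priming sequence sits at the 3′ end of the cDNA, which is what allows downstream PCR primers to anneal there.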

Barcoded adapters were linearly amplified from double-stranded DNA (dsDNA) containing a 5′ T7 promoter using a T7 RNA polymerase, and this amplification occurred in the same reaction as the reverse transcription reaction. Amplifying barcoded adapters from a dsDNA template provides unique advantages:

- 1. Barcoded adapter templates can be attached to beads (a unique barcode per bead) and stored in the same storage container
- 2. Multiple copies of a unique barcoded adapter can be delivered into a reaction container without use of an individual pipetting step
- 3. Barcoded adapters are amplified, overcoming the limited amount of polynucleotides that can be attached to each bead
- 4. Amplified barcodes are in aqueous phase and utilize much more rapid liquid phase rather than solid phase kinetics

There are also advantages involved in using an RNA barcoded adapter rather than a DNA barcoded adapter:

- 1. An RNA barcoded adapter may be more efficient in the template switching reaction which attaches the barcode sequence to polynucleotides of interest as reverse transcriptases typically use RNA rather than DNA as a template and template switching is used by the reverse transcriptase in vivo to switch to an RNA template in the replication of retroviruses.
- 2. Using an entirely RNA transcript as an adapter results in less background when using proof-reading DNA polymerases in downstream PCR reactions. Background occurs when the barcode adapter misprimes and initiates reverse transcription, resulting in barcode adapter sequences added at both the 5′ and 3′ end of 1st strand cDNA. These can be amplified in PCR by just one primer complementary to the barcode adapter. However, if proof-reading DNA polymerases are used during PCR, they will not transcribe the RNA primer (FIG. 4), eliminating background from barcode adapter mispriming.

Due to the large number of barcoding reactions involved, NextGen sequencing is best suited to sequencing the barcoded nucleic acids to bioinformatically associate nucleic acids from the same reaction container with one another. Additional barcodes may be associated with a set of samples that are distinct from another set of samples and can be associated using PCR primers with unique barcode sequences. These additional barcodes are also referred to as plate-IDs. Plate-IDs confer advantages such as allowing distinguishing between different sets of samples in the same sequencing run, or bioinformatically tracking and eliminating any potential contaminations between different sets of samples.

As PCR and NextGen sequencing errors are unavoidable, the barcodes in this invention can be designed to be a reasonable distance apart such that the majority of barcode sequencing reads can be correctly assigned, with a small percentage of unassigned and misassigned barcodes.

One way of doing this is to design pre-determined barcode sequences that are a minimum Hamming or edit distance apart. Another way is to design barcodes with random nucleotides, such as (N)₁₅. With this, there is a total possible space of 4¹⁵, or ~1 billion unique barcode sequences. If the number of samples to be barcoded is much smaller than this total space, e.g. 1 million, or 0.1% of the total barcode space, we can expect the barcodes used to be of sufficient distance apart from one another that the majority of barcodes should be correctly assigned.
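The two design approaches above can be illustrated with a minimal sketch. It assumes a greedy random search with a pairwise minimum Hamming distance; the function names, the distance threshold of 4, and the library size of 50 are hypothetical choices for illustration and are not taken from the disclosure.

```python
import itertools
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(length: int, count: int, min_dist: int, seed: int = 0) -> list[str]:
    """Greedily collect random barcodes that are pairwise >= min_dist apart."""
    rng = random.Random(seed)
    chosen: list[str] = []
    attempts = 0
    # Cap the number of attempts so the loop always terminates.
    while len(chosen) < count and attempts < 100_000:
        attempts += 1
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, b) >= min_dist for b in chosen):
            chosen.append(candidate)
    return chosen


# Size of the (N)15 random barcode space: 4 choices at each of 15 positions.
barcode_space = 4 ** 15

# Hypothetical library: 50 barcodes of length 15, at least 4 mismatches apart.
barcodes = pick_barcodes(length=15, count=50, min_dist=4)
min_pairwise = min(hamming(a, b) for a, b in itertools.combinations(barcodes, 2))
```

A greedy search of this kind does not produce a maximal barcode set; it only guarantees that every accepted barcode satisfies the chosen distance constraint against the ones already accepted.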

As long as the misassignment rate is sufficiently low, misassigned sequencing reads can be detected and discarded simply because the nucleic acids linked to the misassigned barcode sequence are different from the consensus sequence. We would expect the consensus sequence for each gene (e.g. gamma heavy chain, TCR alpha chain) associated to a barcode sequence to be assembled from correctly assigned reads as the barcode sequences were designed to be of a sufficient distance apart.
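The consensus-based filtering described above might be sketched as follows. This is an illustration only: it assumes reads for one barcode and one gene are already aligned to equal length, uses a column-wise majority vote as the consensus, and applies a hypothetical mismatch threshold. None of these specifics come from the disclosure.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Column-wise majority base over equal-length, pre-aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def split_by_consensus(reads: list[str], max_mismatches: int = 1):
    """Return (consensus, kept_reads, discarded_reads) for one barcode and gene."""
    cons = consensus(reads)
    kept, discarded = [], []
    for read in reads:
        mismatches = sum(a != b for a, b in zip(read, cons))
        # Reads far from the consensus are treated as misassigned.
        (kept if mismatches <= max_mismatches else discarded).append(read)
    return cons, kept, discarded


# Hypothetical reads sharing one barcode; the last one is a misassigned read.
reads_for_barcode = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "TTGCAAGC"]
cons, kept, discarded = split_by_consensus(reads_for_barcode)
```

In this toy input the first three reads dominate the vote, so the fourth read, which differs from the consensus at five positions, ends up in `discarded`.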

Samples in reaction containers can be barcoded with either a unique barcode, or a unique barcode set. A unique barcode set can be used by, e.g., delivering two or more barcode adapter template beads per reaction container, and each nucleic acid of a sample is barcoded with one of the barcodes in the unique barcode set. Nucleic acids are then associated to a sample by use of a unique barcode set.

One method to determine which barcode sets are used for which samples is to examine reads from NextGen sequencing. Each barcode sequence is expected to be associated with assembled contigs from different samples, as barcode sequences are reused across unique barcode sets. But contigs from the same sample are expected to be identical. For example, identical immunoglobulin gamma heavy chain contigs may be observed to be using barcode sequences a, b and c. And barcode sequences a, b and d may be observed to be associated with another immunoglobulin gamma heavy chain contig. From this, we can then conclude that a, b and c comprise barcode set1, and a, b and d comprise barcode set2.
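The a, b, c and a, b, d example can be expressed as a short sketch that groups barcodes by the identical contig they were observed with. The contig labels and the `infer_barcode_sets` helper are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Hypothetical (barcode, contig) observations recovered from sequencing reads.
observations = [
    ("a", "IgG_heavy_contig_1"), ("b", "IgG_heavy_contig_1"), ("c", "IgG_heavy_contig_1"),
    ("a", "IgG_heavy_contig_2"), ("b", "IgG_heavy_contig_2"), ("d", "IgG_heavy_contig_2"),
]


def infer_barcode_sets(pairs):
    """Group barcodes by the identical contig they were observed with."""
    by_contig = defaultdict(set)
    for barcode, contig in pairs:
        by_contig[contig].add(barcode)
    # Each distinct group of barcodes is treated as one candidate barcode set.
    return {frozenset(barcodes) for barcodes in by_contig.values()}


barcode_sets = infer_barcode_sets(observations)
```

Here `barcode_sets` holds two sets, {a, b, c} and {a, b, d}, matching barcode set1 and barcode set2 in the example.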

A library of barcode adapter template beads of N unique barcode sequences needs to be sufficiently diverse to barcode n samples such that the majority of samples are barcoded with either a unique barcode or a unique barcode set. If the number of barcode adapter template beads greatly exceeds N, sampling with replacement can be approximated, and the number of samples barcoded with a unique barcode, U, follows the binomial distribution and is given by:

$U = N \binom{n}{k} p^{k} (1 - p)^{n - k}$

Where k=1, and p=1/N.

The fraction of samples that are not barcoded with a unique barcode (and thus have two or more samples associated with one another) is given by

1−U/n
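Substituting k=1 and p=1/N into the expression for U collapses it to a closed form. The exponential approximation in the last step is a standard large-N limit added here as an illustration; it is not stated in the source text:

$U = N \cdot n \cdot \frac{1}{N} \left(1 - \frac{1}{N}\right)^{n - 1} = n \left(1 - \frac{1}{N}\right)^{n - 1}$

$1 - \frac{U}{n} = 1 - \left(1 - \frac{1}{N}\right)^{n - 1} \approx 1 - e^{-n/N}$

For example, with N=10n the approximation gives $1 - e^{-0.1}$, or about 9.5%.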

The relationship between N, n and the fraction of samples not barcoded with a unique barcode is given in Table 1.

TABLE 1 Fraction of samples not barcoded with a unique barcode

Columns give the number of samples barcoded (n).

| # unique barcodes (N) | n = 1,000 | n = 10,000 | n = 100,000 | n = 1,000,000 | n = 10,000,000 |
|---|---|---|---|---|---|
| 1,000 | 63.19% | 100.00% | 100.00% | 100.00% | 100.00% |
| 5,000 | 18.11% | 86.47% | 100.00% | 100.00% | 100.00% |
| 10,000 | 9.51% | 63.21% | 100.00% | 100.00% | 100.00% |
| 50,000 | 1.98% | 18.13% | 86.47% | 100.00% | 100.00% |
| 100,000 | 0.99% | 9.52% | 63.21% | 100.00% | 100.00% |
| 500,000 | 0.20% | 1.98% | 18.13% | 86.47% | 100.00% |
| 1,000,000 | 0.10% | 0.99% | 9.52% | 63.21% | 100.00% |
| 5,000,000 | 0.02% | 0.20% | 1.98% | 18.13% | 86.47% |
| 10,000,000 | 0.01% | 0.10% | 1.00% | 9.52% | 63.21% |
| 50,000,000 | 0.00% | 0.02% | 0.20% | 1.98% | 18.13% |
| 100,000,000 | 0.00% | 0.01% | 0.10% | 1.00% | 9.52% |

As can be seen, if N=10n, >90% of the samples will be barcoded with a unique barcode.
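The relationship tabulated in Table 1 can be recomputed directly from the binomial expression with k=1 and p=1/N. The following is a minimal sketch; the function name and the three (N, n) pairs chosen for printing are illustrative assumptions.

```python
def fraction_not_unique(num_barcodes: int, num_samples: int) -> float:
    """1 - U/n, with U = N * C(n, 1) * p * (1 - p)**(n - 1) and p = 1/N."""
    p = 1.0 / num_barcodes
    # C(n, 1) equals n, so the binomial coefficient reduces to num_samples.
    unique = num_barcodes * num_samples * p * (1.0 - p) ** (num_samples - 1)
    return 1.0 - unique / num_samples


# Spot-check a few (N, n) pairs against Table 1.
for N, n in [(1_000, 1_000), (10_000, 1_000), (1_000_000, 100_000)]:
    print(f"N={N:>9,} n={n:>7,} -> {fraction_not_unique(N, n):.2%}")
```

The three pairs correspond to the Table 1 entries 63.19%, 9.51% and 9.52%, respectively.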

The number of samples barcoded with a unique barcode set, U_SET, with x barcodes in a set also follows the binomial distribution, and can be thought of as a barcode library with

$\binom{N}{x}$

unique barcode combinations (N is assumed to be sufficiently large that combination is essentially without repetition), with nx barcodes used to barcode n samples and is given by:

$U_{SET} = \binom{N}{x} \binom{n}{k} p^{k} (1 - p)^{n - k}$

Where k=1, and

$p = 1 / \binom{N}{x}.$

The fraction of samples that are not barcoded with a unique barcode set (and thus have two or more samples associated with one another) is given by

1−U_SET/n

The relationship between N, n, x and the fraction of samples not barcoded with a unique barcode set is given in Tables 2 and 3.

TABLE 2 Fraction of samples not barcoded with a unique barcode set when x = 2

Columns give the number of samples barcoded (n).

| # unique barcodes (N) when using barcode set with x = 2 | n = 1,000 | n = 10,000 | n = 100,000 | n = 1,000,000 | n = 10,000,000 |
|---|---|---|---|---|---|
| 100 | 18.28% | 86.74% | 100.00% | 100.00% | 100.00% |
| 500 | 0.80% | 7.70% | 55.14% | 99.97% | 100.00% |
| 1,000 | 0.20% | 1.98% | 18.14% | 86.49% | 100.00% |
| 5,000 | 0.01% | 0.08% | 0.80% | 7.69% | 55.07% |
| 10,000 | 0.00% | 0.02% | 0.20% | 1.98% | 18.13% |
| 50,000 | 0.00% | 0.00% | 0.01% | 0.08% | 0.80% |
| 100,000 | 0.00% | 0.00% | 0.00% | 0.02% | 0.20% |

TABLE 3 Fraction of samples not barcoded with a unique barcode set when x = 3

Columns give the number of samples barcoded (n).

| # unique barcodes (N) when using barcode set with x = 3 | n = 1,000 | n = 10,000 | n = 100,000 | n = 1,000,000 | n = 10,000,000 |
|---|---|---|---|---|---|
| 100 | 0.62% | 6.00% | 46.12% | 99.79% | 100.00% |
| 500 | 0.00% | 0.05% | 0.48% | 4.71% | 38.30% |
| 1,000 | 0.00% | 0.01% | 0.06% | 0.60% | 5.84% |
| 5,000 | 0.00% | 0.00% | 0.00% | 0.00% | 0.05% |
| 10,000 | 0.00% | 0.00% | 0.00% | 0.00% | 0.01% |
| 50,000 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 100,000 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |

As can be seen, when using unique barcode sets instead of unique barcodes, a much smaller number of unique barcodes in the barcode adapter library is required to barcode a similar number of samples such that the majority of samples can be identified with a unique barcode set.
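The barcode-set calculation can be sketched in the same way, replacing N with the number of x-barcode combinations. The function name is a hypothetical choice, and `log1p`/`expm1` are used only for numerical stability when the number of combinations is very large; this is an implementation detail not found in the disclosure.

```python
import math


def fraction_not_unique_set(num_barcodes: int, set_size: int, num_samples: int) -> float:
    """1 - U_SET/n, with p = 1 / C(N, x) and k = 1."""
    combos = math.comb(num_barcodes, set_size)
    p = 1.0 / combos
    # With k = 1, U_SET / n reduces to (1 - p)**(n - 1).
    # 1 - (1 - p)**(n - 1) is evaluated as -expm1((n - 1) * log1p(-p)).
    return -math.expm1((num_samples - 1) * math.log1p(-p))


# Spot-check against Table 2 (x = 2) and Table 3 (x = 3).
x2 = fraction_not_unique_set(100, 2, 1_000)
x3 = fraction_not_unique_set(100, 3, 1_000)
```

With N=100 and n=1,000, `x2` is about 18.28% and `x3` is about 0.62%, in agreement with the first rows of Tables 2 and 3.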

Compositions

Polynucleotides

In some aspects, a composition can include a polynucleotide. The term “polynucleotide(s)” refers to nucleic acids such as DNA molecules and RNA molecules and analogs thereof (e.g., DNA or RNA generated using nucleotide analogs or using nucleic acid chemistry). As desired, the polynucleotides may be made synthetically, e.g., using art-recognized nucleic acid chemistry or enzymatically using, e.g., a polymerase, and, if desired, can be modified. Typical modifications include methylation, biotinylation, and other art-known modifications. In addition, a polynucleotide can be single-stranded or double-stranded and, where desired, linked to a detectable moiety. In some aspects, a polynucleotide can include hybrid molecules, e.g., comprising DNA and RNA.

“G,” “C,” “A,” “T” and “U” each generally stand for a nucleotide that contains guanine, cytosine, adenine, thymine and uracil as a base, respectively. However, it will be understood that the term “ribonucleotide” or “nucleotide” can also refer to a modified nucleotide or a surrogate replacement moiety. The skilled person is well aware that guanine, cytosine, adenine, and uracil may be replaced by other moieties without substantially altering the base pairing properties of an oligonucleotide comprising a nucleotide bearing such replacement moiety. For example, without limitation, a nucleotide comprising inosine as its base may base pair with nucleotides containing adenine, cytosine, or uracil. Hence, nucleotides containing uracil, guanine, or adenine may be replaced in nucleotide sequences by a nucleotide containing, for example, inosine. In another example, adenine and cytosine anywhere in the oligonucleotide can be replaced with guanine and uracil, respectively, to form G-U Wobble base pairing with the target mRNA. Sequences containing such replacement moieties are suitable for the compositions and methods described herein.

As used herein, and unless otherwise indicated, the term “complementary,” when used to describe a first nucleotide sequence in relation to a second nucleotide sequence, refers to the ability of a polynucleotide comprising the first nucleotide sequence to hybridize and form a duplex structure under certain conditions with a polynucleotide comprising the second nucleotide sequence, as will be understood by the skilled person. Such conditions can, for example, be stringent conditions, where stringent conditions may include: 400 mM NaCl, 40 mM PIPES pH 6.4, 1 mM EDTA, 50° C. or 70° C. for 12-16 hours followed by washing. Other conditions, such as physiologically relevant conditions as may be encountered inside an organism, can apply. The skilled person will be able to determine the set of conditions most appropriate for a test of complementarity of two sequences in accordance with the ultimate application of the hybridized nucleotides.

Complementary sequences include base-pairing of a region of a polynucleotide comprising a first nucleotide sequence to a region of a polynucleotide comprising a second nucleotide sequence over the length or a portion of the length of one or both nucleotide sequences. Such sequences can be referred to as “complementary” with respect to each other herein. However, where a first sequence is referred to as “substantially complementary” with respect to a second sequence herein, the two sequences can be complementary, or they may include one or more, but generally not more than about 5, 4, 3, or 2 mismatched base pairs within regions that are base-paired. For two sequences with mismatched base pairs, the sequences will be considered “substantially complementary” as long as the two nucleotide sequences bind to each other via base-pairing.

“Complementary” sequences, as used herein, may also include, or be formed entirely from, non-Watson-Crick base pairs and/or base pairs formed from non-natural and modified nucleotides, in as far as the above embodiments with respect to their ability to hybridize are fulfilled. Such non-Watson-Crick base pairs include, but are not limited to, G:U Wobble or Hoogsteen base pairing.
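As a rough illustration of the mismatch-count notion of “substantially complementary,” the sketch below pairs two equal-length DNA sequences antiparallel and counts mismatched positions. It is a crude proxy under stated assumptions: it handles only standard A/C/G/T Watson-Crick pairs, ignores wobble and Hoogsteen pairing, and does not model hybridization conditions. The function names and the default threshold of 5 are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatches_in_duplex(first: str, second: str) -> int:
    """Count mismatched pairs when `first` is paired antiparallel with `second`."""
    if len(first) != len(second):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(a != b for a, b in zip(first, reverse_complement(second)))


def is_substantially_complementary(first: str, second: str, max_mismatches: int = 5) -> bool:
    """Crude proxy: at most `max_mismatches` mismatched base pairs."""
    return mismatches_in_duplex(first, second) <= max_mismatches


perfect = mismatches_in_duplex("AAGG", "CCTT")    # fully complementary pair
one_off = mismatches_in_duplex("AAGG", "CCTA")    # one mismatched base pair
```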

The term percent “identity,” in the context of two or more nucleic acid or polypeptide sequences, refers to two or more sequences or subsequences that have a specified percentage of nucleotides or amino acid residues that are the same, when compared and aligned for maximum correspondence, as measured using one of the sequence comparison algorithms described below (e.g., BLASTP and BLASTN or other algorithms available to persons of skill) or by visual inspection. Depending on the application, the percent “identity” can exist over a region of the sequence being compared, e.g., over a functional domain, or, alternatively, exist over the full length of the two sequences to be compared.

For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.

Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally Ausubel et al., infra).

One example of an algorithm that is suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information web-site.
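Once two sequences have been aligned by one of the algorithms above, percent identity reduces to counting matching columns. The sketch below assumes the alignment has already been produced and that gaps are marked with `-`; excluding gapped columns from the denominator is one of several conventions in use and is an assumption here, as is the function name.

```python
def percent_identity(reference: str, test: str, gap: str = "-") -> float:
    """Percent of aligned, non-gap columns where test matches reference."""
    if len(reference) != len(test):
        raise ValueError("sequences must already be aligned to equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(r, t) for r, t in zip(reference, test) if r != gap and t != gap]
    if not columns:
        return 0.0
    matches = sum(r == t for r, t in columns)
    return 100.0 * matches / len(columns)


identity = percent_identity("ACGTACGT", "ACGTACGA")  # 7 of 8 columns match
```

Tools such as BLAST may report identity over a different denominator (for example, the full alignment length including gaps), so values from this sketch are not directly comparable without checking the convention used.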

Identical sequences include 100% identity of a polynucleotide comprising a first nucleotide sequence to a polynucleotide comprising a second nucleotide sequence over the entire length of one or both nucleotide sequences. Such sequences can be referred to as “fully identical” with respect to each other herein. However, in some aspects, where a first sequence is referred to as “substantially identical” with respect to a second sequence herein, the two sequences can be fully identical, or they may have one or more, but generally not more than about 5, 4, 3, or 2 mismatched nucleotides upon alignment. In some aspects, where a first sequence is referred to as “substantially identical” with respect to a second sequence herein, the two sequences can be fully identical, or they may be about 50, 60, 70, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% identical to each other.

Where a first sequence is referred to as “distinct” with respect to the identity of a second sequence herein, the two sequences have at least one or more mismatched nucleotides upon alignment. In some aspects, distinct sequences can have 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more mismatched nucleotides upon alignment. In some aspects, distinct sequences can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or less than 100% identical to each other. In some aspects, where a first sequence is referred to as “distinct” with respect to a second sequence herein, the two sequences can have substantially or fully identical sequences, but instead differ from one another based upon differing patterns of modification within the sequences. Such modifications are generally known in the art, e.g., methylation.

In some aspects, a polynucleotide can be present in a library of polynucleotides. In some aspects, a polynucleotide library can include a plurality of polynucleotides. In some aspects, each polynucleotide in the plurality of polynucleotides can be derived from a single sample. In some aspects, a single sample can include a single cell such as a B cell.

Conventional notation is used herein to describe nucleotide sequences: the left-hand end of a single-stranded nucleotide sequence is the 5′-end; the left-hand direction of a double-stranded nucleotide sequence is referred to as the 5′-direction. The direction of 5′ to 3′ addition of nucleotides to nascent RNA transcripts is referred to as the transcription direction. The DNA strand having the same sequence as an mRNA is referred to as the “coding strand;” sequences on the DNA strand having the same sequence as an mRNA transcribed from that DNA and which are located 5′ to the 5′-end of the RNA transcript are referred to as “upstream sequences;” sequences on the DNA strand having the same sequence as the RNA and which are 3′ to the 3′ end of the coding RNA transcript are referred to as “downstream sequences.”

The term “messenger RNA” or “mRNA” refers to an RNA that is without introns and that can be translated into a polypeptide.

The term “cDNA” refers to a DNA that is complementary or identical to an mRNA, in either single stranded or double stranded form.

The term “amplicon” refers to the amplified product of a nucleic acid amplification reaction, e.g., RT-PCR.

The term “hybridize” refers to a sequence specific non-covalent binding interaction with a complementary nucleic acid. Hybridization may occur to all or a portion of a nucleic acid sequence. Those skilled in the art will recognize that the stability of a nucleic acid duplex, or hybrids, can be determined by the Tm. Additional guidance regarding hybridization conditions may be found in: Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 1989, 6.3.1-6.3.6 and in: Sambrook et al., Molecular Cloning, a Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989, Vol. 3.

As used herein, “region” refers to a contiguous portion of the nucleotide sequence of a polynucleotide. Examples of regions are described herein and include identification regions, sample identification regions, plate identification regions, adapter regions, and the like. In some aspects, a polynucleotide can include one or more regions. In some aspects, a polynucleotide can include less than 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more regions. In some aspects, regions can be coupled. In some aspects, regions can be operatively coupled. In some aspects, regions can be physically coupled.

As used herein, “variable region” refers to a variable nucleotide sequence that arises from a gene recombination or gene conversion event, such as V(D)J recombination and homologous recombination between upstream VH gene segments and rearranged VDJ genes to produce a final, expressed gene product. Examples include, but are not limited to, immunoglobulin genes and T cell receptor genes. For example, it can include a V, J, and/or D region of an immunoglobulin or T cell receptor sequence isolated from a T cell or B cell of interest, such as an activated T cell or an activated B cell.

As used herein “B cell variable immunoglobulin region” refers to a variable immunoglobulin nucleotide sequence isolated from a B cell. For example, a variable immunoglobulin sequence can include a V, J, and/or D region of an immunoglobulin sequence isolated from a B cell of interest such as a memory B cell, an activated B cell, or plasmablast.

As used herein “barcode” refers to any unique sequence label that can be coupled to at least one nucleotide sequence for, e.g., later identification of the at least one nucleotide sequence.

As used herein, “barcode set” refers to any unique set of sequences that can be coupled to nucleotide sequences from a sample, where each nucleotide sequence is coupled to one barcode sequence in the set, for, e.g., later identification of the nucleotide sequences.

As used herein, “barcode adapter” refers to an oligonucleotide that comprises a unique barcode sequence.

As used herein, “barcode adapter template” refers to a double stranded oligonucleotide that comprises a barcode adapter sequence and is able to be used as a template to amplify and produce single stranded barcode adapter oligonucleotides.

As used herein, “barcode adapter template beads” refers to beads that have barcode adapter templates coupled to them.

As used herein, “barcoding or barcoding reaction” refers to a reaction that links a barcode sequence, or the complement of a barcode sequence, with a nucleic acid. The barcode adapter need not necessarily be covalently linked with the nucleic acid, but the barcode sequence information itself is linked with the nucleic acid. “Barcoding nucleic acids”, “barcoding cells”, “barcoding nucleic acids from cells”, “barcoding nucleic acids from reaction containers”, “barcoding reaction containers” are used interchangeably.

As used herein “identification region” refers to a nucleotide sequence label (e.g., a unique barcode sequence) that can be coupled to at least one nucleotide sequence for, e.g., later identification of the at least one nucleotide sequence. In some aspects, a barcode sequence is used as a sample identification region. In some aspects, a barcode set is used as a sample identification region.

As used herein “immunoglobulin region” refers to a contiguous portion of nucleotide sequence from one or both chains (heavy and light) of an antibody.

As used herein “adapter region” or “adaptor molecule” refers to a linker that couples a first nucleotide sequence to a second nucleotide sequence. In some aspects, an adapter region can include a contiguous portion of nucleotide sequence that acts as a linker. In some aspects, an adaptor region or adaptor molecule can include a binding site, such as a cDNA binding site. For example, a binding site can have the sequence GGG and couples a first sequence to a second sequence via binding between GGG and CCC. In some aspects, the adaptor region or adaptor molecule can comprise elements such as an RNA polymerase promoter, a nicking endonuclease restriction site, a universal priming sequence, a barcode, and a cDNA binding site.

In some aspects, a polynucleotide can include a cDNA region. In some aspects, a polynucleotide can include a sample identification (barcode)-adapter region. In some aspects, a polynucleotide can include a sample identification (barcode) region. In some aspects, a polynucleotide can include an adapter region. In some aspects, a polynucleotide can include a universal primer region. In some aspects, a polynucleotide can include an amplicon region. In some aspects, a polynucleotide can include a plate identification region. In some aspects, a polynucleotide can include a first plate identification region. In some aspects, a polynucleotide can include a second plate identification region. In some aspects, a polynucleotide can include a restriction site region. In some aspects, a polynucleotide can include a first restriction site region. In some aspects, a polynucleotide can include a second restriction site region. In some aspects, a polynucleotide can include a sequencing region. In some aspects, a polynucleotide can include a first sequencing region. In some aspects, a polynucleotide can include a second sequencing region.

In some aspects, a polynucleotide can include a plurality of any region described herein. For example, a polynucleotide can include a first sample identification (barcode) region and a second sample identification (barcode) region. In some aspects, the first sample identification (barcode) region and the second sample identification (barcode) region are identical or substantially identical. In some aspects, the first sample identification (barcode) region and the second sample identification (barcode) region are distinct. In some aspects, an identification (barcode) region is coupled to a variable immunoglobulin region.

In some aspects the sequence of a region will be at least long enough to serve as a target sequence for a primer or a probe in a PCR reaction. In some aspects, a region can be 1 to greater than 5000 base pairs in length. For example, a region can be from 1-10,000 nucleotides in length, e.g., 2-30 nucleotides in length, including all sub-ranges therebetween. As non-limiting examples, a region can be from 1-30 nucleotides, 1-26 nucleotides, 1-23 nucleotides, 1-22 nucleotides, 1-21 nucleotides, 1-20 nucleotides, 1-19 nucleotides, 1-18 nucleotides, 1-17 nucleotides, 18-30 nucleotides, 18-26 nucleotides, 18-23 nucleotides, 18-22 nucleotides, 18-21 nucleotides, 18-20 nucleotides, 19-30 nucleotides, 19-26 nucleotides, 19-23 nucleotides, 19-22 nucleotides, 19-21 nucleotides, 19-20 nucleotides, 20-30 nucleotides, 20-26 nucleotides, 20-25 nucleotides, 20-24 nucleotides, 20-23 nucleotides, 20-22 nucleotides, 20-21 nucleotides, 21-30 nucleotides, 21-26 nucleotides, 21-25 nucleotides, 21-24 nucleotides, 21-23 nucleotides, or 21-22 nucleotides. In some aspects, a region can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more nucleotides in length. In some aspects, a region can be less than 50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, or greater than 1000 nucleotides in length. In some aspects, a region can be less than 1000, 1000-2000, 2000-3000, 3000-4000, 4000-5000, 5000-6000, 6000-7000, 7000-8000, 8000-9000, 9000-10000, or greater than 10000 nucleotides in length. In some aspects, a region can include at least two nucleotides, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20 or more nucleotides of a polynucleotide disclosed herein.

The term “sample” can include RNA, DNA, a single cell or multiple cells or fragments of cells or an aliquot of body fluid, taken from a subject (e.g., a mammalian subject, an animal subject, a human subject, or a non-human animal subject). Samples can be selected by one of skill in the art using any means now known or later discovered including centrifugation, venipuncture, blood draw, excretion, swabbing, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision, laser capture microdissection, gradient separation, or intervention or other means known in the art. Samples can also be selected by one of skill in the art using one or more markers known to be associated with a sample of interest. Samples can also be selected using methods known in the art such as cell sorting and FACS.

In some aspects a polynucleotide can be derived from or associated with a single sample. In some aspects a region can be derived from or associated with a single sample. In some aspects, a cDNA region can be derived from or associated with a single sample. In some aspects, an amplicon region can be derived from or associated with a single sample. A “single sample” includes a sample comprising polynucleotides that is taken from a single source. In some aspects, a single source includes a sample taken at a particular time point or at a particular location, e.g., in a subject or flask of cells or plate of cells. In some aspects, a first single sample is taken from a first subject at a first time point and a second single sample is taken from the first subject at a second time point that is distinct from the first time point. In some aspects, a first single sample is taken from a first subject at a first location and a second sample is taken from the first subject at a second location that is distinct from the first location. In some aspects, a first single sample is taken from a first subject at a time point and a second single sample is taken from a second subject at a time point. In some aspects, a first single sample is taken from a first subject at a location and a second sample is taken from a second subject at a location. In one embodiment, a sample comprises polynucleotides that include mRNA derived from one or more B cells. In another embodiment, a sample comprises polynucleotides including cDNA derived from one or more B cells. In another embodiment, a single sample comprises mRNA derived from one or more B cells sorted into a single well of a 96-well or 384-well plate. Samples are generally derived from a prokaryotic cell(s) (e.g., a bacterial cell(s)), a eukaryotic cell(s) (e.g., a mammalian and yeast cell(s)), or other sources of genetic material such as a virus or phage. 
The term “mammal” or “mammalian” as used herein includes both humans and non-humans, including but not limited to humans, non-human primates, canines, felines, murines, bovines, equines, and porcines. In some aspects, the methods of the invention are applied to single samples in a plate with at least 96 wells, at least 384 wells, at least 1536 wells, or more wells. In further aspects, the methods of the invention are applied to single samples in at least one, two, three, four, five, six, seven, eight, ten, fifteen, twenty, thirty or more plates with at least 96 wells each.

In some aspects a 5′ adaptor region sequence and/or a sample identification region are added to all cDNAs from a single sample, e.g., during RT and not just to Ig genes. In some aspects, 3′ gene specific primers (GSPs) can be used to amplify any expressed gene in the single sample. In some aspects, genes are amplified that have a 5′ variable region, e.g., T cell receptors and B cell receptors without needing multiple degenerate 5′ primers to amplify the gene(s) of interest. GSPs can include primers specific for IgG, IgM, IgD, IgA, IgE, TCR chains, and other genes of interest.

In some aspects, multiple rounds of PCR can also be performed, e.g., using nested GSPs. For such nested GSPs, the GSP for the second round of PCR hybridizes to its target gene sequence at a position 5′ along that sequence relative to the position hybridized to by the GSP used in the first round of PCR.

In some aspects, a cDNA region or an amplicon region can include a DNA polynucleotide. In some aspects, a cDNA region or an amplicon region can include a cDNA polynucleotide. In some aspects, a cDNA region or an amplicon region can include an RNA polynucleotide hybridized to a DNA polynucleotide. In some aspects, a cDNA region or an amplicon region can include an mRNA polynucleotide hybridized to a cDNA polynucleotide.

In some aspects, a universal primer region is not fully complementary to any human exon. In some aspects, a universal primer region is not fully complementary to any expressed human gene. In some aspects, a universal primer region has minimal secondary structure.

In some aspects, an amplicon region comprises an immunoglobulin heavy chain amplicon sequence. In some aspects, an amplicon region comprises an immunoglobulin light chain amplicon sequence. In some aspects, an amplicon region comprises a T cell receptor alpha amplicon sequence. In some aspects, an amplicon region comprises a T cell receptor beta amplicon sequence.

In some aspects, a polynucleotide is present in a library of polynucleotides and can be differentiated from other polynucleotides present in the library based on a region of the polynucleotide.

In some aspects, the sequence of the sample identification region of each polynucleotide in a library derived from a first single sample is distinct from the sequence of the sample identification region of the other polynucleotides in the library derived from one or more samples distinct from the first single sample. In some aspects, the sequence of the sample identification region of each polynucleotide in a library derived from a first single sample differs by at least 1 nucleotide from the sequence of the sample identification region of the other polynucleotides in the library derived from one or more samples distinct from the first single sample. In some aspects, the sequence of the sample identification region of each polynucleotide in a library derived from a first single sample differs by at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides from the sequence of the sample identification region of the other polynucleotides in the library derived from one or more samples distinct from the first single sample. In some aspects, the sequence of the sample identification region of each polynucleotide in a library derived from a first single sample can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or less than 100% identical to the sequence of the sample identification region of the other polynucleotides in the library derived from one or more samples distinct from the first single sample. 
In some aspects, the sequence of the sample identification region of each polynucleotide in a library derived from a first single sample is less than 100% identical to the sequence of the sample identification region of the other polynucleotides in the library derived from one or more samples distinct from the first single sample. In some aspects, a sample-identification region acts as a digital barcode on all 1st strand cDNA reverse transcribed from a single sample. In some aspects, the sample identification region is at least 1 nucleotide in length. In some aspects, a sample-identification region can comprise at least 3 nucleotides, and sample-identification regions can differ from each other by at least 1 nucleotide. In one embodiment, sample-identification regions are 3-15 nucleotides in length and differ from each other by at least 1 nucleotide. In some aspects, sample-identification regions can comprise at least 64 variants (using sample-identification regions 3 nucleotides in length with each sample-ID differing from each other by at least 1 nucleotide), or in some aspects larger numbers of variants. In some aspects, the sequence attached 3′ to the sample-identification region can be an adapter region comprising at least 1 G. In a preferred embodiment, the sequence attached 3′ to the sample-identification region can be an adapter region comprising at least 2 G's. In one embodiment, a sequence attached to the 5′ end of a sample-identification region is a universal primer sequence that can be used during PCR amplification to avoid the need for the subsequent addition of a 5′ universal primer sequence (by ligation or another method) or the use of multiple degenerate 5′ primers to amplify genes with variable 5′ regions. 
In some aspects, the sequence of the first plate identification region of each polynucleotide in a library derived from a first set of single samples is distinct from the sequence of the first plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, the sequence of the first plate identification region of each polynucleotide in a library derived from the first set of single samples differs by at least 1 nucleotide from the sequence of the first plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, the sequence of the first plate identification region of each polynucleotide in a library derived from the first set of single samples differs by at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides from the sequence of the first plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, the sequence of the first plate identification region of each polynucleotide in a library derived from the first set of single samples can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or less than 100% identical to sequence of the first plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. 
In some aspects, the sequence of the first plate identification region of each polynucleotide in a library derived from the first set of single samples is less than 100% identical to sequence of the first plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, the sequence of the second plate identification region of each polynucleotide in a library derived from a first set of single samples is distinct from the sequence of the second plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, the sequence of the second plate identification region of each polynucleotide in a library derived from the first set of single samples differs by at least 1 nucleotide from the sequence of the second plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, the sequence of the second plate identification region of each polynucleotide in a library derived from the first set of single samples differs by at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides from the sequence of the second plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, the sequence of the second plate identification region is identical to the sequence of the first plate identification region on a polynucleotide. 
In some aspects, the sequence of the second plate identification region of each polynucleotide in a library derived from the first set of single samples can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or less than 100% identical to sequence of the second plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, the sequence of the second plate identification region of each polynucleotide in a library derived from the first set of single samples is less than 100% identical to sequence of the second plate identification region of the other polynucleotides in the library derived from one or more single sample sets distinct from the first set of single samples. In some aspects, a plate-identification region (e.g., a first plate identification region or a second plate identification region) can comprise at least 2 nucleotides, and plate-identification regions differ from each other by at least 1 nucleotide. In one embodiment, plate-identification regions are 2-10 nucleotides in length and differ from each other by at least 1 nucleotide. In some aspects, use of plate-identification regions is found in only some embodiments, as the use of a larger number of different sample-identification regions (one per single sample to be analyzed) can eliminate the need for plate-identification regions. In some aspects, plate-identification regions are used to reduce the number of unique oligonucleotides containing a sample-identification region that need to be synthesized.
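By way of illustration only, the counting behind the 64-variant figure and the oligonucleotide savings from plate-identification regions can be sketched in Python. The function names, the 3-nucleotide sample-identification length, and the 2-nucleotide plate-identification length are assumptions chosen for the example; they are not values required by any embodiment.

```python
from itertools import product


def all_barcodes(length):
    """Enumerate every DNA sequence of the given length (4**length in total)."""
    return ["".join(p) for p in product("ACGT", repeat=length)]


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(
        hamming(a, b)
        for i, a in enumerate(barcodes)
        for b in barcodes[i + 1:]
    )


# Assumed lengths for illustration: 3-nt sample IDs, 2-nt plate IDs.
sample_ids = all_barcodes(3)
plate_ids = all_barcodes(2)

# Samples distinguishable by a (plate ID, sample ID) pair.
distinguishable_samples = len(sample_ids) * len(plate_ids)

# Distinct barcode oligonucleotides that would need to be synthesized.
oligos_needed = len(sample_ids) + len(plate_ids)
```

Under these assumed lengths, 64 sample-identification sequences and 16 plate-identification sequences distinguish 1,024 samples while requiring 80 distinct barcode sequences rather than 1,024. Because every 3-nucleotide sequence is used, the minimum pairwise difference is exactly 1 nucleotide; a set with a larger minimum difference would contain fewer members.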

In some aspects, a polynucleotide includes one or more adapter regions. In some aspects, an adapter region includes one or more G's. In some aspects, an adapter region includes 2, 3, 4, 5, 6, 7, 8, 9, 10 or more G's. In some aspects, adapter regions are attached to the 3′ ends of cDNAs using the template switching property of MMLV H⁻ reverse transcriptases. Different methods to attach adapter regions exist, including but not limited to, PCR with primers having 5′ flanking adapter region sequences, sticky and blunt end ligations, template-switching-mediated addition of nucleotides, or other methods to covalently attach nucleotides to the 5′ end, to the 3′ end, or to the 5′ and 3′ ends of the polynucleotides. These methods can employ properties of enzymes commonly used in molecular biology. PCR can use, e.g., thermophilic DNA polymerase. Sticky ends that are complementary or substantially complementary are created either by cutting dsDNA with restriction enzymes that leave overhanging ends or through 3′ tailing activities of enzymes such as TdT (terminal transferase). Sticky and blunt ends can then be ligated with a complementary adapter region using ligases such as T4 ligase. Template-switching utilizes the 3′ tailing activity of MMLV H⁻ reverse transcriptase to add one or more cytosines (C's) to the 3′ end of cDNAs and its ability to switch template from mRNA to an adapter region with complementary G's. In some aspects, a cDNA includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more C's on its 3′ end.
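As a simplified, non-limiting sketch, the outcome of template switching can be modeled as string operations in Python: the G's at the 3′ end of an adapter pair with the C's tailed onto the 3′ end of the cDNA, and the cDNA is then extended with the complement of the remaining adapter sequence. All sequences and function names below are hypothetical, and the model ignores enzyme kinetics, RNA versus DNA chemistry, and mispairing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def trailing_run(seq, base):
    """Length of the uninterrupted run of `base` at the 3' end of `seq`."""
    return len(seq) - len(seq.rstrip(base))


def template_switch(cdna, adapter):
    """Extend a C-tailed cDNA by copying the adapter upstream of its paired G's.

    Both arguments are written 5'->3'. The adapter's 3' G's are assumed to
    pair with the cDNA's 3' C's; the rest of the adapter serves as template.
    """
    paired = min(trailing_run(cdna, "C"), trailing_run(adapter, "G"))
    if paired == 0:
        raise ValueError("no G:C pairing between adapter and cDNA 3' ends")
    template = adapter[: len(adapter) - paired]
    return cdna + reverse_complement(template)


# Hypothetical sequences chosen only for illustration.
universal_primer = "AAGCAGTGG"
sample_id = "ACT"
adapter = universal_primer + sample_id + "GGG"
cdna = "ATGGCTTCA" + "CCC"  # cDNA body plus three tailed C's

extended = template_switch(cdna, adapter)
```

In this toy model the extended cDNA carries the complement of the sample identification region followed by the complement of the universal primer region at its 3′ end, which is how a barcode present on the adapter would become part of the 1st strand cDNA.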

In some aspects, a polynucleotide includes one or more restriction site regions. Restriction site regions include one or more restriction sites. Restriction sites can include: NheI, XhoI, BstBI, EcoRI, SacII, BbvCI, PspXI, AgeI, ApaI, KpnI, Acc65I, XmaI, BstEII, DraIII, PacI, FseI, AsiSI, and AscI. In some aspects, any rare 8-cutter enzyme restriction site can be used.
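For illustration, a short Python sketch shows how a sequence might be scanned for the four 8-cutter sites named above. The recognition sequences are the commonly published ones for PacI, FseI, AsiSI, and AscI; the function name and the example sequence are hypothetical. Because all four sites are palindromic, scanning one strand is sufficient.

```python
# Commonly published 8-bp recognition sequences (all palindromic).
EIGHT_CUTTERS = {
    "PacI": "TTAATTAA",
    "FseI": "GGCCGGCC",
    "AsiSI": "GCGATCGC",
    "AscI": "GGCGCGCC",
}


def find_sites(seq, sites=EIGHT_CUTTERS):
    """Return a dict mapping enzyme name to 0-based start positions of its site."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # allow overlapping matches
        hits[name] = positions
    return hits


# In random sequence with equal base frequencies, a given 8-bp site is
# expected about once every 4**8 base pairs.
expected_spacing = 4 ** 8

# Hypothetical amplicon flanked by an AscI site and a PacI site.
construct = "ACGT" + "GGCGCGCC" + "ATGAAATTTGGG" + "TTAATTAA" + "CC"
site_hits = find_sites(construct)
```

The expected spacing of 65,536 base pairs is why 8-cutter sites rarely occur by chance inside an amplicon region, which makes them convenient flanking restriction site regions.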

In some aspects, one or more regions of a polynucleotide described herein can be operatively coupled to one or more other regions of the polynucleotide. In some aspects, two or more distinct regions of a single polynucleotide can be operatively coupled. For example, a universal primer region can be operatively coupled to an adapter region. In some aspects two or more regions can be operatively coupled together that are substantially identical to each other in sequence or identical in description. For example, a first sample identification region can be operatively coupled to a second sample identification region. In some aspects, the sequences of the first sample identification region and the second sample identification region are identical or substantially identical. In some aspects, the sequences of the first sample identification region and the second sample identification region are different or distinct.

In some aspects, one or more regions of a polynucleotide described herein can be coupled to one or more other regions of the polynucleotide. In some aspects, two or more distinct regions of a single polynucleotide can be coupled. For example, a universal primer region can be coupled to an adapter region. In some aspects two or more regions can be coupled together that are substantially identical to each other in sequence or identical in description. For example, a first sample identification region can be coupled to a second sample identification region. In some aspects, the sequences of the first sample identification region and the second sample identification region are identical or substantially identical. In some aspects, the sequences of the first sample identification region and the second sample identification region are different or distinct.

In some aspects, a polynucleotide includes the sequence 5′-A-B-3′, wherein A is a sample identification region, and wherein B is an adapter region. In some aspects, a polynucleotide includes the sequence 5′-A-B-C-3′, wherein A is a universal primer region, wherein B is a sample identification region, and wherein C is an adapter region. In some aspects, a polynucleotide includes the sequence 5′-A-B-C-3′, wherein A is a sample identification region, wherein B is an adapter region, and wherein C is an amplicon region derived from a single sample. In some aspects, a polynucleotide includes the sequence 5′-A-B-C-D-3′, wherein A is a universal primer region, wherein B is a sample identification region, wherein C is an adapter region, and wherein D is an amplicon region derived from a single sample. In some aspects, a polynucleotide includes the sequence 5′-A-B-C-D-E-3′, wherein A is a plate identification region, wherein B is a universal primer region, wherein C is a sample identification region, wherein D is an adapter region, and wherein E is an amplicon region derived from a single sample. In some aspects, a polynucleotide includes the sequence 5′-A-B-C-D-E-F-3′, wherein A is a first restriction site region, wherein B is a universal primer region, wherein C is a sample identification region, wherein D is an adapter region, wherein E is an amplicon region derived from a single sample, and wherein F is a second restriction site region.

In some aspects, the regions of each of the above sequences can be rearranged in a different order, e.g., 5′-C-A-D-B-3′ or 5′-E-A-C-B-D-F-3′ or 5′-B-A-3′. In some aspects, one or more regions of the above sequences can be deleted, e.g., 5′-A-D-3′ or 5′-B-C-3′. In some aspects, one or more additional regions can be added to the above sequences, e.g., 5′-A-A₂-B-3′ or 5′-A-B-C-D-E-F-G-3′. In such examples the one or more additional regions can be any region disclosed herein or equivalents thereof. In some aspects, one or more regions of the sequences above can be modified, e.g., methylated.

In some aspects, a polynucleotide can include an adapter molecule. In some aspects, a polynucleotide adapter molecule can include a universal primer region, a sample identification region, and an adapter region, wherein the 3′ end of the universal primer region is coupled to the 5′ end of the sample identification region, and wherein the 3′ end of the sample identification region is coupled to the 5′ end of the adapter region. In some aspects, an adapter molecule includes a polynucleotide comprising at least 2 nucleotides that bind to C's added by a reverse transcriptase at the 3′ end of a 1st strand cDNA. In some aspects, an adapter molecule includes a deoxyribose polynucleotide comprising 3-6 G's (DNA G's). In another embodiment, an adapter molecule includes a ribose polynucleotide consisting of 3-6 G's (RNA G's). In other embodiments, the adapter molecule can utilize nucleotide analogues, such as locked nucleic acids (LNAs), e.g., LNA G's. In other embodiments, the nucleotide base may also be a universal or degenerate base such as 5-nitroindole and 3-nitropyrrole that can base-pair to C's as well as other nucleotides, in any combination.
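The 5′-universal primer region-sample identification region-adapter region-3′ arrangement can be sketched, purely as an example, with a small Python data structure and a parser that recovers the sample identification region from a sequence laid out in that order. The class name, function name, and all sequences are hypothetical, and exact matching is assumed (no sequencing errors).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterMolecule:
    """Regions of an adapter molecule, each written 5'->3'."""
    universal_primer: str
    sample_id: str
    adapter: str = "GGG"  # assumed 3-G adapter region

    def sequence(self):
        """Concatenate the regions in 5'->3' order."""
        return self.universal_primer + self.sample_id + self.adapter


def parse_read(read, universal_primer, sample_id_len, adapter="GGG"):
    """Split a read laid out as universal primer, sample ID, adapter, amplicon.

    Returns (sample_id, amplicon), or None if the expected layout is absent.
    """
    if not read.startswith(universal_primer):
        return None
    id_start = len(universal_primer)
    id_end = id_start + sample_id_len
    sample_id = read[id_start:id_end]
    if len(sample_id) < sample_id_len or not read.startswith(adapter, id_end):
        return None
    return sample_id, read[id_end + len(adapter):]


# Hypothetical sequences for illustration only.
UNIVERSAL = "AAGCAGTGG"
molecule = AdapterMolecule(UNIVERSAL, "ACT")
read = molecule.sequence() + "ATGGCTTCA"
parsed = parse_read(read, UNIVERSAL, sample_id_len=3)
```

A parser of this kind would be one way to assign sequences back to their sample of origin; a practical implementation would likely also tolerate mismatches in the barcode.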

In some aspects, a polynucleotide can include a primer or a probe. In some aspects, a primer can include a universal primer region and a plate identification region, wherein the 3′ end of the plate identification region is coupled to the 5′ end of the universal primer region.

In some aspects, a composition can include a polynucleotide composition library. In some aspects, a polynucleotide composition library includes a plurality of polynucleotide compositions. In some aspects each composition is present in a separate container. In some aspects, a container can be a test tube. In some aspects, a container can be a well in a plate. In some aspects, a container can be a well in a 96-well plate. In some aspects, a container can be a well in a 384-well plate. In some aspects, each composition comprises a cDNA region derived from a single sample. In some aspects, each composition comprises a sample identification-adapter region comprising a sample identification region coupled to an adapter region. In some aspects the sequence of the sample identification region of each sample identification-adapter region in a library is distinct from the nucleotide sequence of the sample identification region of the other sample identification-adapter regions present in each separate container in the library. In some aspects the sample identification-adapter region is attached to the cDNA region. In some aspects the sample identification-adapter region is attached to the cDNA region by binding between their 3′ regions. In some aspects the sample identification-adapter region is attached to the cDNA region by G:C binding. In some aspects, the cDNA region comprises an RNA polynucleotide hybridized to a DNA polynucleotide. In some aspects, the cDNA region comprises an mRNA polynucleotide hybridized to a cDNA polynucleotide.

In some aspects, the plurality of polynucleotide compositions in a polynucleotide library can comprise at least 2, at least 3, at least 10, at least 30, at least 100, at least 300, at least 1000, at least 3000, at least 10,000, at least 30,000, at least 100,000, at least 300,000, at least 1,000,000, at least 3,000,000, at least 10,000,000, at least 30,000,000, or more members. In other aspects, the plurality of polynucleotide compositions in a polynucleotide library can comprise at least 2, at least 3, at least 10, at least 30, at least 100, at least 300, at least 1000, at least 3000, at least 10,000, at least 30,000, or more genes of a cell sample's whole transcriptome. In other aspects, the plurality of polynucleotide compositions in a polynucleotide library comprises at least 1, at least 2, at least 3, at least 10, at least 30, at least 100, at least 300, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 1,000,000,000 or more of the different antibody species present in the blood of an individual. These antibody species can be expressed by plasmablasts, plasma cells, memory B cells, long-lived plasma cells, naïve B cells, other B lineage cells, or combinations thereof.

Vectors

In some aspects, a composition can include a vector. Vectors can be used in the transformation of a host cell with a nucleic acid sequence. In some aspects, a vector can include one or more polynucleotides described herein. In one embodiment, a library of nucleic acid sequences encoding target polypeptides may be introduced into a population of cells, thereby allowing screening of a library. The term “vector” is used to refer to a carrier nucleic acid molecule into which a nucleic acid sequence can be inserted for introduction into a cell where it can be replicated. A nucleic acid sequence can be “exogenous” or “heterologous” which means that it is foreign to the cell into which the vector is being introduced or that the sequence is homologous to a sequence in the cell but in a position within the host cell nucleic acid in which the sequence is ordinarily not found. Vectors include plasmids, cosmids, and viruses (e.g., bacteriophage). One of skill in the art may construct a vector through standard recombinant techniques, which are described in Maniatis et al., 1988 and Ausubel et al., 1994, both of which references are incorporated herein by reference. In some aspects, a vector can be a vector with the constant regions of an antibody pre-engineered in. In this way, one of skill can clone just the VDJ regions of an antibody of interest and clone those regions into the pre-engineered vector.

The term “expression vector” refers to a vector containing a nucleic acid sequence coding for at least part of a gene product capable of being transcribed. In some cases, RNA molecules are then translated into a protein, polypeptide, or peptide. Expression vectors can contain a variety of “control sequences,” which refer to nucleic acid sequences for the transcription and possibly translation of an operably linked coding sequence in a particular host organism. In addition to control sequences that govern transcription and translation, vectors and expression vectors may contain nucleic acid sequences that serve other functions as well.

In some aspects, a vector can include a promoter. In some aspects, a vector can include an enhancer. A “promoter” is a control sequence that is a region of a nucleic acid sequence at which initiation and rate of transcription are controlled. It may contain genetic elements at which regulatory proteins and molecules may bind such as RNA polymerase and other transcription factors. The phrases “operatively positioned,” “operatively linked,” “under control,” and “under transcriptional control” mean that a promoter is in a correct functional location and/or orientation in relation to a nucleic acid sequence to control transcriptional initiation and/or expression of that sequence. A promoter may or may not be used in conjunction with an “enhancer,” which refers to a cis-acting regulatory sequence involved in the transcriptional activation of a nucleic acid sequence.

A promoter may be one naturally associated with a gene or sequence, as may be obtained by isolating the 5′ non-coding sequences located upstream of the coding segment and/or exon. Such a promoter can be referred to as “endogenous.” Similarly, an enhancer may be one naturally associated with a nucleic acid sequence, located either downstream or upstream of that sequence. Alternatively, certain advantages will be gained by positioning the coding nucleic acid segment under the control of a recombinant or heterologous promoter, which refers to a promoter that is not normally associated with a nucleic acid sequence in its natural environment. A recombinant or heterologous enhancer refers also to an enhancer not normally associated with a nucleic acid sequence in its natural environment. Such promoters or enhancers may include promoters or enhancers of other genes, and promoters or enhancers isolated from any other prokaryotic cell, and promoters or enhancers not “naturally occurring,” i.e., containing different elements of different transcriptional regulatory regions, and/or mutations that alter expression. In addition to producing nucleic acid sequences of promoters and enhancers synthetically, sequences may be produced using recombinant cloning and/or nucleic acid amplification technology, including PCR, in connection with the compositions disclosed herein (see U.S. Pat. Nos. 4,683,202, 5,928,906, each incorporated herein by reference).

In some aspects, a promoter and/or enhancer is employed that effectively directs the expression of the DNA segment in the cell type chosen for expression. One example of such a promoter that may be used is the E. coli arabinose or T7 promoter. Those of skill in the art of molecular biology generally are familiar with the use of promoters, enhancers, and cell type combinations for protein expression, for example, see Sambrook et al. (1989), incorporated herein by reference. The promoters employed may be constitutive, tissue-specific, inducible, and/or useful under the appropriate conditions to direct high level expression of the introduced DNA segment, such as is advantageous in the large-scale production of recombinant proteins and/or peptides. The promoter may be heterologous or endogenous.

In some aspects, vectors can include initiation signals and/or internal ribosome binding sites. A specific initiation signal also may be included for efficient translation of coding sequences. These signals include the ATG initiation codon or adjacent sequences. Exogenous translational control signals, including the ATG initiation codon, may need to be provided. One of ordinary skill in the art would readily be capable of determining this and providing the necessary signals. It is well known that the initiation codon must be “in-frame” with the reading frame of the desired coding sequence to ensure translation of the entire insert. The exogenous translational control signals and initiation codons can be either natural or synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer elements.
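As an illustrative sketch only, the in-frame requirement can be checked in Python by confirming that the distance from the ATG initiation codon to the first base of the insert is a multiple of three and that no stop codon lies between them. The function name and all sequences below are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(construct, atg_pos, insert_start):
    """Check that the ATG at `atg_pos` is in frame with the insert.

    Positions are 0-based indices into `construct` (written 5'->3').
    """
    if construct[atg_pos:atg_pos + 3] != "ATG":
        return False
    if (insert_start - atg_pos) % 3 != 0:
        return False
    # Reject any stop codon in frame between the ATG and the insert.
    for i in range(atg_pos, insert_start, 3):
        if construct[i:i + 3] in STOP_CODONS:
            return False
    return True


# Hypothetical construct: 2-nt upstream sequence, ATG plus one codon, insert.
insert = "GAAGTTCAG"
good = "CC" + "ATGGCT" + insert
shifted = "CC" + "ATGGCTA" + insert  # one extra base shifts the frame

good_ok = is_in_frame(good, atg_pos=2, insert_start=8)
shifted_ok = is_in_frame(shifted, atg_pos=2, insert_start=9)
```

In the second construct the single added base places the insert out of frame, so translation initiated at the ATG would not read the insert in its intended reading frame.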

In some aspects, a vector can include sequences that increase or optimize the expression level of the DNA segment encoding the gene of interest. An example of such sequences includes addition of introns in the expressed mRNA (Brinster, R. L. et al. (1988) Introns increase transcriptional efficiency in transgenic mice. Proc. Natl. Acad. Sci. USA 85, 836-40; Choi, T. et al. (1991) A generic intron increases gene expression in transgenic mice. Mol. Cell. Biol. 11, 3070-4). Another example of a method for optimizing expression of the DNA segment is “codon optimization”. Codon optimization involves insertion of silent mutations in the DNA segment to reduce the use of rare codons to optimize protein translation (Codon engineering for improved antibody expression in mammalian cells. Carton J M, Sauerwald T, Hawley-Nelson P, Morse B, Peffer N, Beck H, Lu J, Cotty A, Amegadzie B, Sweet R. Protein Expr Purif 2007 October; 55(2):279-86. Epub 2007 Jun. 16).
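A minimal Python sketch of codon optimization by silent substitution follows. It uses the standard genetic code; the table of preferred codons is a placeholder assumption, since actual preferences depend on the expression host, and the example coding sequence is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(cds):
    """Split a coding sequence into consecutive codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def optimize(cds, preferred):
    """Replace each codon with the preferred synonymous codon, if one is given."""
    optimized = "".join(preferred.get(CODON_TABLE[c], c) for c in codons(cds))
    # Every substitution must be silent: the protein sequence is unchanged.
    assert translate(optimized) == translate(cds)
    return optimized


# Placeholder preferences for illustration; real tables are host-specific.
PREFERRED = {"R": "CGC", "L": "CTG"}

original = "ATGAGACTACCCTAA"  # encodes M R L P stop
optimized = optimize(original, PREFERRED)
```

Only the codons for which a preferred synonym is supplied are changed, and the translation check guards against a preference table that would accidentally alter the encoded protein.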

In some aspects, a vector can include multiple cloning sites. Vectors can include a multiple cloning site (MCS), which is a nucleic acid region that contains multiple restriction enzyme sites, any of which can be used in conjunction with standard recombinant technology to digest the vector (see Carbonelli et al., 1999, Levenson et al., 1998, and Cocea, 1997, incorporated herein by reference.) “Restriction enzyme digestion” refers to catalytic cleavage of a nucleic acid molecule with an enzyme that functions only at specific locations in a nucleic acid molecule. Many of these restriction enzymes are commercially available. Use of such enzymes is understood by those of skill in the art. Frequently, a vector is linearized or fragmented using a restriction enzyme that cuts within the MCS to enable exogenous sequences to be ligated to the vector. “Ligation” refers to the process of forming phosphodiester bonds between two nucleic acid fragments, which may or may not be contiguous with each other. Techniques involving restriction enzymes and ligation reactions are well known to those of skill in the art of recombinant technology.
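For illustration, restriction enzyme digestion of one strand can be modeled in Python as cutting a string at a fixed offset within each occurrence of a recognition site. The example assumes EcoRI, which cuts G^AATTC; the function name and the example sequence are hypothetical, and the complementary strand and the resulting overhangs are not modeled.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut `seq` (top strand, 5'->3') at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment; the defaults model EcoRI (G^AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical linear sequence containing two EcoRI sites.
sequence = "AAAGAATTCTTTGAATTCGGG"
fragments = digest(sequence)
```

Rejoining the fragments in order reproduces the original sequence, which corresponds to ligation restoring the phosphodiester bonds at each cut position.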

In some aspects, a vector can include a termination signal. The vectors or constructs will generally comprise at least one termination signal. A “termination signal” or “terminator” is comprised of the DNA sequences involved in specific termination of an RNA transcript by an RNA polymerase. Thus, in certain embodiments, a termination signal that ends the production of an RNA transcript is contemplated. A terminator may be necessary in vivo to achieve desirable message levels.

Terminators contemplated for use include any known terminator of transcription described herein or known to one of ordinary skill in the art, including but not limited to, for example, rho dependent or rho independent terminators. In certain embodiments, the termination signal may be a lack of transcribable or translatable sequence, such as due to a sequence truncation.

In some aspects, a vector can include an origin of replication.

In order to propagate a vector in a host cell, it may contain one or more origin of replication sites (often termed “ori”), each of which is a specific nucleic acid sequence at which replication is initiated.

In some aspects, a vector can include one or more selectable and/or screenable markers. In certain embodiments, cells containing a nucleic acid construct may be identified in vitro or in vivo by including a marker in the expression vector. Such markers would confer an identifiable change to the cell permitting easy identification of cells containing the expression vector. Generally, a selectable marker is one that confers a property that allows for selection. A positive selectable marker is one in which the presence of the marker allows for its selection, while a negative selectable marker is one in which its presence prevents its selection. An example of a positive selectable marker is a drug resistance marker.

Usually the inclusion of a drug selection marker aids in the cloning and identification of transformants; for example, genes that confer resistance to neomycin, puromycin, hygromycin, DHFR, GPT, zeocin and histidinol are useful selectable markers. In addition to markers conferring a phenotype that allows for the discrimination of transformants based on the implementation of conditions, other types of markers including screenable markers such as GFP, whose basis is colorimetric analysis, are also contemplated. Alternatively, screenable enzymes such as chloramphenicol acetyltransferase (CAT) may be utilized. One of skill in the art would also know how to employ immunologic markers, possibly in conjunction with FACS analysis. The marker used is not believed to be important, so long as it is capable of being expressed simultaneously with the nucleic acid encoding a gene product. Further examples of selectable and screenable markers are well known to one of skill in the art.

In one aspect, the vector can express DNA segments encoding multiple polypeptides of interest. For example, DNA segments encoding both the immunoglobulin heavy chain and light chain can be encoded and expressed by a single vector. In one aspect, both DNA segments can be included on the same expressed RNA and internal ribosome binding site (IRES) sequences used to enable expression of the DNA segments as separate polypeptides (Pinkstaff J K, Chappell S A, Mauro V P, Edelman G M, Krushel L A., Internal initiation of translation of five dendritically localized neuronal mRNAs, Proc Natl Acad Sci USA. 2001 Feb. 27; 98(5):2770-5. Epub 2001 Feb. 20). In another aspect, each DNA segment has its own promoter region resulting in expression of separate mRNAs (Andersen C R, Nielsen L S, Baer A, Tolstrup A B, Weilguny D. Efficient Expression from One CMV Enhancer Controlling Two Core Promoters. Mol Biotechnol. 2010 Nov. 27. [Epub ahead of print]).

Host Cells and Expression Systems

In some aspects, a composition can include a host cell. In some aspects, a host cell can include a polynucleotide or vector described herein. In some aspects, a host cell can include a eukaryotic cell (e.g., insect, yeast, or mammalian) or a prokaryotic cell (e.g., bacteria). In the context of expressing a heterologous nucleic acid sequence, “host cell” can refer to a prokaryotic cell, and it includes any transformable organism that is capable of replicating a vector and/or expressing a heterologous gene encoded by a vector. A host cell can be, and has been, used as a recipient for vectors. A host cell may be “transfected” or “transformed,” which refers to a process by which exogenous nucleic acid is transferred or introduced into the host cell. A transformed cell includes the primary subject cell and its progeny.

In particular embodiments, a host cell is a Gram negative bacterial cell. These bacteria are suited for use in that they possess a periplasmic space between the inner and outer membrane and, particularly, the aforementioned inner membrane between the periplasm and cytoplasm, which is also known as the cytoplasmic membrane. As such, any other cell with such a periplasmic space could be used. Examples of Gram negative bacteria include, but are not limited to, E. coli, Pseudomonas aeruginosa, Vibrio cholerae, Salmonella typhimurium, Shigella flexneri, Haemophilus influenzae, Bordetella pertussis, Erwinia amylovora, and Rhizobium sp. The Gram negative bacterial cell may be still further defined as a bacterial cell which has been transformed with the coding sequence of a fusion polypeptide comprising a candidate binding polypeptide capable of binding a selected ligand. The polypeptide is anchored to the outer face of the cytoplasmic membrane, facing the periplasmic space, and may comprise an antibody coding sequence or another sequence. One means for expression of the polypeptide is by attaching a leader sequence to the polypeptide capable of causing such directing.

Numerous prokaryotic cell lines and cultures are available for use as a host cell, and they can be obtained through the American Type Culture Collection (ATCC), which is an organization that serves as an archive for living cultures and genetic materials. An appropriate host can be determined by one of skill in the art based on the vector backbone and the desired result. A plasmid or cosmid, for example, can be introduced into a prokaryote host cell for replication of many vectors. Bacterial cells used as host cells for vector replication and/or expression include DH5-alpha, JM109, and KC8, as well as a number of commercially available bacterial hosts such as SURE™ Competent Cells and SOLOPACK™ Gold Cells (STRATAGENE™, La Jolla). In some aspects, other bacterial cells such as E. coli LE392 are contemplated for use as host cells.

Many host cells from various cell types and organisms are available and would be known to one of skill in the art. Similarly, a viral vector may be used in conjunction with a prokaryotic host cell, particularly one that is permissive for replication or expression of the vector. Some vectors may employ control sequences that allow it to be replicated and/or expressed in both prokaryotic and eukaryotic cells. One of skill in the art would further understand the conditions under which to incubate all of the above described host cells to maintain them and to permit replication of a vector. Also understood and known are techniques and conditions that would allow large-scale production of vectors, as well as production of the nucleic acids encoded by vectors and their cognate polypeptides, proteins, or peptides.

In some aspects, a host cell is mammalian. Examples include CHO cells, CHO-K1 cells, or CHO-S cells. Other mammalian host cells include NS0 cells and CHO cells that are dhfr-, e.g., CHO-dhfr-, DUKX-B11 CHO cells, and DG44 CHO cells.

Numerous expression systems exist that can comprise at least a part or all of the compositions disclosed herein. Expression systems can include eukaryotic expression systems and prokaryotic expression systems. Such systems could be used, for example, for the production of a polypeptide product identified as capable of binding a particular ligand. Prokaryote-based systems can be employed to produce nucleic acid sequences, or their cognate polypeptides, proteins and peptides. Many such systems are commercially and widely available. Other examples of expression systems comprise vectors containing a strong prokaryotic promoter such as T7, Tac, Trc, BAD, lambda pL, Tetracycline or Lac promoters, the pET Expression System and an E. coli expression system.

Polypeptides

In some aspects, a composition can include a polypeptide. In some aspects, a polypeptide encoded by a polynucleotide described herein can be expressed, e.g., from a host cell. The terms “polypeptide” or “protein” include a macromolecule having the amino acid sequence of a native protein, that is, a protein produced by a naturally-occurring and non-recombinant cell; or it is produced by a genetically-engineered or recombinant cell, and comprise molecules having the amino acid sequence of the native protein, or molecules having deletions from, additions to, and/or substitutions of one or more amino acids of the native sequence. The term also includes amino acid polymers in which one or more amino acids are chemical analogs of a corresponding naturally-occurring amino acid and polymers. The terms “polypeptide” and “protein” encompass antigen binding proteins, antibodies, or sequences that have deletions from, additions to, and/or substitutions of one or more amino acids of antigen-binding protein. The term “polypeptide fragment” refers to a polypeptide that has an amino-terminal deletion, a carboxyl-terminal deletion, and/or an internal deletion as compared with the full-length native protein. Such fragments can also contain modified amino acids as compared with the native protein. In certain embodiments, fragments are about five to 500 amino acids long. For example, fragments can be at least 5, 6, 8, 10, 14, 20, 50, 70, 100, 110, 150, 200, 250, 300, 350, 400, or 450 amino acids long. Useful polypeptide fragments include immunologically functional fragments of antibodies, including binding domains. In the case of a binding antibody, useful fragments include but are not limited to a CDR region, a variable domain of a heavy and/or light chain, a portion of an antibody chain or just its variable region including two CDRs, and the like.

The term “isolated protein” means that a subject protein (1) is free of at least some other proteins with which it would normally be found, (2) is essentially free of other proteins from the same source, e.g., from the same species, (3) is expressed by a cell from a different species, (4) has been separated from at least about 50 percent of polynucleotides, lipids, carbohydrates, or other materials with which it is associated in nature, (5) is operably associated (by covalent or noncovalent interaction) with a polypeptide with which it is not associated in nature, or (6) does not occur in nature. Typically, an “isolated protein” constitutes at least about 5%, at least about 10%, at least about 25%, or at least about 50% of a given sample. Genomic DNA, cDNA, mRNA or other RNA, nucleic acids of synthetic origin, or any combination thereof can encode such an isolated protein. Preferably, the isolated protein is substantially free from proteins or polypeptides or other contaminants that are found in its natural environment that would interfere with its therapeutic, diagnostic, prophylactic, research or other use.

In some aspects, a polypeptide can include an antigen binding protein (ABP). An “antigen binding protein” (“ABP”) as used herein means any protein that binds a specified target antigen. “Antigen binding protein” includes but is not limited to antibodies and binding parts thereof, such as immunologically functional fragments. Peptibodies are another example of antigen binding proteins. The term “immunologically functional fragment” (or simply “fragment”) of an antibody or immunoglobulin chain (heavy or light chain) antigen binding protein, as used herein, is a species of antigen binding protein comprising a portion (regardless of how that portion is obtained or synthesized) of an antibody that lacks at least some of the amino acids present in a full-length chain but which is still capable of specifically binding to an antigen. Such fragments are biologically active in that they bind to the target antigen and can compete with other antigen binding proteins, including intact antibodies, for binding to a given epitope. In some embodiments, the fragments are neutralizing fragments. These biologically active fragments can be produced by recombinant DNA techniques, or can be produced by enzymatic or chemical cleavage of antigen binding proteins, including intact antibodies. Immunologically functional immunoglobulin fragments include, but are not limited to, Fab, a diabody (heavy chain variable domain on the same polypeptide as a light chain variable domain, connected via a short peptide linker that is too short to permit pairing between the two domains on the same chain), Fab′, F(ab′)2, Fv, domain antibodies and single-chain antibodies, and can be derived from any mammalian source, including but not limited to human, mouse, rat, camelid or rabbit. 
It is further contemplated that a functional portion of the antigen binding proteins disclosed herein, for example, one or more CDRs, could be covalently bound to a second protein or to a small molecule to create a therapeutic agent directed to a particular target in the body, possessing bifunctional therapeutic properties, or having a prolonged serum half-life. As will be appreciated by one of skill in the art, an antigen binding protein can include nonprotein components. Additional details about antigen binding proteins and antibodies such as modifications, variants, methods of making, and methods of screening can be found in U.S. Pat. Pub. 20110027287, herein incorporated by reference in its entirety for all purposes.

In some aspects, a polypeptide can include an antibody. The term “antibody” refers to an intact immunoglobulin of any isotype, or a fragment thereof that can compete with the intact antibody for specific binding to the target antigen, and includes, for instance, chimeric, humanized, fully human, and bispecific antibodies. An “antibody” is a species of an antigen binding protein. An intact antibody will generally comprise at least two full-length heavy chains and two full-length light chains, but in some instances can include fewer chains such as antibodies naturally occurring in camelids which can comprise only heavy chains. Antibodies can be derived solely from a single source, or can be “chimeric,” that is, different portions of the antibody can be derived from two different antibodies. The antigen binding proteins, antibodies, or binding fragments can be produced in hybridomas, by recombinant DNA techniques, or by enzymatic or chemical cleavage of intact antibodies. Unless otherwise indicated, the term “antibody” includes, in addition to antibodies comprising two full-length heavy chains and two full-length light chains, derivatives, variants, fragments, and muteins thereof. Furthermore, unless explicitly excluded, antibodies include monoclonal antibodies, bispecific antibodies, minibodies, domain antibodies, synthetic antibodies (sometimes referred to herein as “antibody mimetics”), chimeric antibodies, humanized antibodies, human antibodies, antibody fusions (sometimes referred to herein as “antibody conjugates”), and fragments thereof, respectively. In some embodiments, the term also encompasses peptibodies.

A therapeutically effective amount of an ABP can be administered to a subject in need thereof. ABPs can be formulated in pharmaceutical compositions. These compositions can comprise, in addition to one or more of the ABPs, a pharmaceutically acceptable excipient, carrier, buffer, stabilizer or other materials well known to those skilled in the art. Such materials should be non-toxic and should not interfere with the efficacy of the active ingredient. The precise nature of the carrier or other material can depend on the route of administration, e.g. oral, intravenous, cutaneous or subcutaneous, nasal, intramuscular, intraperitoneal routes.

Pharmaceutical compositions for oral administration can be in tablet, capsule, powder or liquid form. A tablet can include a solid carrier such as gelatin or an adjuvant. Liquid pharmaceutical compositions generally include a liquid carrier such as water, petroleum, animal or vegetable oils, mineral oil or synthetic oil. Physiological saline solution, dextrose or other saccharide solution or glycols such as ethylene glycol, propylene glycol or polyethylene glycol can be included.

For intravenous, cutaneous or subcutaneous injection, or injection at the site of affliction, the active ingredient will be in the form of a parenterally acceptable aqueous solution which is pyrogen-free and has suitable pH, isotonicity and stability. Those of relevant skill in the art are well able to prepare suitable solutions using, for example, isotonic vehicles such as Sodium Chloride Injection, Ringer's Injection, Lactated Ringer's Injection. Preservatives, stabilizers, buffers, antioxidants and/or other additives can be included, as required.

ABP administration is preferably in a “therapeutically effective amount” or “prophylactically effective amount” (as the case can be, although prophylaxis can be considered therapy), this being sufficient to show benefit to the individual. The actual amount administered, and rate and time-course of administration, will depend on the nature and severity of disease being treated. Prescription of treatment, e.g. decisions on dosage etc., is within the responsibility of general practitioners and other medical doctors, and typically takes account of the disorder to be treated, the condition of the individual patient, the site of delivery, the method of administration and other factors known to practitioners. Examples of the techniques and protocols mentioned above can be found in Remington's Pharmaceutical Sciences, 16th edition, Osol, A. (ed), 1980.

A composition can be administered alone or in combination with other treatments, either simultaneously or sequentially dependent upon the condition to be treated.

Immune Cells

A sample can include immune cells. The immune cells can include T cells and B cells. T-cells (T lymphocytes) include, for example, cells that express T cell receptors. B-cells include, for example, activated B cells, blasting B cells, plasma cells, plasmablasts, memory B cells, B1 cells, B2 cells, marginal-zone B cells, and follicular B cells. T cells include activated T cells, blasting T cells, Helper T cells (effector T cells or Th cells), cytotoxic T cells (CTLs), memory T cells, central memory T cells, effector memory T cells and regulatory T cells. A sample can include a single cell in some applications (e.g., a calibration test to define relevant T or B cells) or more generally at least 1,000, at least 10,000, at least 100,000, at least 250,000, at least 500,000, at least 750,000, or at least 1,000,000 cells.

B Cells

As used herein a “B cell” refers to any cell that has at least one rearranged immunoglobulin gene locus. A B cell can include at least one rearranged immunoglobulin heavy chain locus or at least one rearranged immunoglobulin light chain locus. A B cell can include at least one rearranged immunoglobulin heavy chain locus and at least one rearranged immunoglobulin light chain locus. B cells are lymphocytes that are part of the adaptive immune system. B cells can include any cells that express antibodies either in the membrane-bound form as the B-cell receptor (BCR) on the cell surface or as secreted antibodies. B cells can express immunoglobulins (antibodies, B cell receptor). Antibodies can include heterodimers formed from the heavy and light immunoglobulin chains. The heavy chain is formed from gene rearrangements of the variable, diversity, and junctional (VDJ) genes to form the variable region, which is joined to the constant region. The light chain is formed from gene rearrangements of the variable and junctional (VJ) genes to form the variable region, which is then joined to the constant region. Owing to a large possible number of junctional combinations, the variable regions of the antibody gene (which is also the BCR) have huge diversity, enabling B cells to recognize any foreign antigen and mount a response against it.

B-Cell Activation and Differentiation

B cells are activated and differentiate when they recognize an antigen in the context of an inflammatory immune response. They usually require 2 signals to become activated, one signal delivered through BCR (a membrane-bound form of the rearranged immunoglobulin), and another delivered through CD40 or another co-stimulatory molecule. This second signal can be provided through interaction with helper T cells, which express the ligand for CD40 (CD40L) on their surface. B cells then proliferate and may undergo somatic hypermutation, where random changes in the nucleotide sequences of the antibody genes are made, and B cells whose antibodies have a higher affinity are selected. They may also undergo “class-switching”, in which the constant region of the heavy chain encoding the IgM isotype is switched to the constant region encoding the IgG, IgA, or IgE isotype. Differentiating B cells may end up as memory B cells, which are usually of higher affinity and class-switched, though some memory B cells are still of the IgM isotype. Memory B cells can also become activated and differentiate into plasmablasts and ultimately, into plasma cells. Differentiating B cells may also first become plasmablasts, which then differentiate to become plasma cells.

Affinity Maturation and Clonal Families

A clonal family is generally defined by the use of related immunoglobulin heavy chain and/or light chain V(D)J sequences by 2 or more samples. Related immunoglobulin heavy chain V(D)J sequences can be identified by their shared usage of V(D)J gene segments encoded in the genome. Within a clonal family there are generally subfamilies that vary based on shared mutations within their V(D)J segments, that can arise during B cell gene recombination and somatic hypermutation.

Activated B cells migrate and form germinal centers within lymphoid or other tissues, where they undergo affinity maturation. B cells may also undergo affinity maturation outside of germinal centers. During affinity maturation, B cells undergo random mutations in their antibody genes, concentrated in the complementarity determining regions (CDRs) of the genes, which encode the parts of the antibody that directly bind to and recognize the target antigen against which the B cell was activated. This creates sub-clones from the original proliferating B cell that express immunoglobulins that are slightly different from the original clone and from each other. Clones compete for antigen and the higher-affinity clones are selected, while the lower-affinity clones die by apoptosis. This process results in the “affinity maturation” of B cells and consequently in the generation of B cells expressing immunoglobulins that bind to the antigen with higher affinity. All the B cells that originate from the same ‘parent’ B cell form clonal families, and these clonal families include B cells that recognize the same or similar antigenic epitopes. In some aspects, we expect that clones present at higher frequencies represent clones that bind to antigen with higher affinity, because the highest-affinity clones are selected during affinity maturation. In some aspects, clones with different V(D)J segment usage exhibit different binding characteristics. In some aspects, clones with the same V(D)J segment usage but different mutations exhibit different binding characteristics.
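As a purely illustrative sketch of the clonal family definition above, sequences can be grouped by shared V and J gene segment usage. The record layout (keys `cell_id`, `v_gene`, `j_gene`), the function name, and the example gene calls are hypothetical and are not taken from this disclosure; a practical analysis would likely also compare junction sequences and shared mutations.

```python
from collections import defaultdict


def group_clonal_families(records):
    """Group heavy-chain records that share V and J gene segment usage.

    `records` is an iterable of dicts with the hypothetical keys
    'cell_id', 'v_gene', and 'j_gene'.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["v_gene"], rec["j_gene"])].append(rec["cell_id"])
    # Keep only groups with two or more members, mirroring the
    # "2 or more" criterion in the definition of a clonal family.
    return {key: ids for key, ids in groups.items() if len(ids) >= 2}


# Hypothetical example input.
example_records = [
    {"cell_id": "c1", "v_gene": "IGHV3-23", "j_gene": "IGHJ4"},
    {"cell_id": "c2", "v_gene": "IGHV3-23", "j_gene": "IGHJ4"},
    {"cell_id": "c3", "v_gene": "IGHV1-69", "j_gene": "IGHJ6"},
]
families = group_clonal_families(example_records)
print(families)
```

Sorting the resulting groups by size would give a simple frequency ranking of the kind discussed above for higher-frequency clones.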

Memory B Cells

Memory B cells are usually affinity-matured B cells, and may be class-switched. These are cells that can respond more rapidly to a subsequent antigenic challenge, significantly reducing the time required for affinity-matured antibody secretion against the antigen from ˜14 days in a naive organism to ˜7 days.

Plasmablasts and Plasma Cells

Plasma cells can be either long-lived or short-lived. Long-lived plasma cells may survive for the lifetime of the organism, whereas short-lived plasma cells can last for 3-4 days. Long-lived plasma cells reside either in areas of inflammation, in the mucosal areas (in the case of IgA-secreting plasma cells), in secondary lymphoid tissues (such as the spleen or lymph nodes), or in the bone marrow. To reach these divergent areas, plasmablasts fated to become long-lived plasma cells may first travel through the bloodstream before utilizing various chemokine gradients to traffic to the appropriate areas. Plasmablasts are cells that are affinity matured, are typically class-switched, and usually secrete antibodies, though generally in lower quantities than the quantity of antibody produced by plasma cells. Plasma cells are dedicated antibody secretors.

Characteristics of TCR and BCR Genes

Since identifying recombinations are present in the DNA of each individual adaptive immune cell as well as their associated RNA transcripts, either RNA or DNA can be sequenced. A recombined sequence from a T-cell or B-cell can also be referred to as a clonotype. The DNA or RNA can correspond to sequences from T-cell receptor (TCR) genes or immunoglobulin (Ig) genes that encode antibodies. For example, the DNA and RNA can correspond to sequences encoding alpha, beta, gamma, or delta chains of a TCR. In a majority of T-cells, the TCR is a heterodimer consisting of an alpha-chain and beta-chain. The TCR-alpha chain is generated by VJ recombination, and the beta chain receptor is generated by V(D)J recombination. For the TCR-beta chain, in humans there are 48 V segments, 2 D segments, and 13 J segments. Several bases may be deleted and others added (called N and P nucleotides) at each of the two junctions. In a minority of T-cells, the TCRs consist of gamma and delta chains. The TCR gamma chain is generated by VJ recombination, and the TCR delta chain is generated by V(D)J recombination (Kenneth Murphy, Paul Travers, and Mark Walport, Janeway's Immunology 7th edition, Garland Science, 2007, which is herein incorporated by reference in its entirety).

The DNA and RNA analyzed in the methods can correspond to sequences encoding heavy chain immunoglobulins (IgH) with constant regions (alpha, delta, gamma, epsilon, or mu) or light chain immunoglobulins (IgK or IgL) with constant regions lambda or kappa. Each antibody can have two identical light chains and two identical heavy chains. Each chain is composed of a constant (C) and a variable region. For the heavy chain, the variable region is composed of a variable (V), diversity (D), and joining (J) segments. Several distinct sequences coding for each type of these segments are present in the genome. A specific VDJ recombination event occurs during the development of a B-cell, marking that cell to generate a specific heavy chain. Diversity in the light chain is generated in a similar fashion except that there is no D region so there is only VJ recombination. Somatic mutation often occurs close to the site of the recombination, causing the addition or deletion of several nucleotides, further increasing the diversity of heavy and light chains generated by B-cells. The possible diversity of the antibodies generated by a B-cell is then the product of the different heavy and light chains. The variable regions of the heavy and light chains contribute to form the antigen recognition (or binding) region or site. Added to this diversity is a process of somatic hypermutation which can occur after a specific response is mounted against some epitope. In this process mutations occur in those B-cells that are able to recognize the specific epitope leading to greater diversity in antibodies that may be able to bind the specific epitope more strongly. All these factors contribute to great diversity of antibodies generated by the B-cells. Many billions and maybe more than a trillion distinct antibodies may be generated. The basic premise for generating T-cell diversity is similar to that for generating antibodies by B-cells. 
An element of T-cell and B-cell activation is their binding to epitopes. The activation of a specific cell leads to the production of more of the same type of cells leading to a clonal expansion.

Complementarity determining regions (CDR), or hypervariable regions, are sequences in the variable domains of antigen receptors (e.g., T cell receptor and immunoglobulin) that can bind an antigen. The chain of each antigen receptor contains three CDRs (CDR1, CDR2, and CDR3). The two polypeptides making up T cell receptors (alpha and beta) and immunoglobulins (IgH and IgK or IgL) contribute to the formation of the three CDRs.

The part of CDR1 and CDR2 that is coded for by TCR-beta lies within one of 47 functional V segments. Most of the diversity of CDRs is found in CDR3, with the diversity being generated by somatic recombination events during the development of T lymphocytes.

A great diversity of BCR is present inter and intra-individuals. The BCR is composed of two genes IgH and IgK (or IgL) coding for antibody heavy and light chains. Three Complementarity Determining Region (CDR) sequences that bind antigens and MHC molecules have the most diversity in IgH and IgK (or IgL). The part of CDR1 and CDR2 coded for by IgH lies within one of 44 functional V segments. Most of the diversity in naive B cells emerges in the generation of CDR3 through somatic recombination events during the development of B lymphocytes. The recombination can generate a molecule with one of each of the V, D, and J segments. In humans, there are 44 V, 27 D, and 6 J segments; thus, there is a theoretical possibility of more than 7,000 combinations. In a small fraction of BCRs (about 5%) two D segments are found. Furthermore, several bases may be deleted and others added (called N and P nucleotides) at each of the two junctions generating a great degree of diversity. After B cell activation a process of affinity maturation through somatic hypermutation occurs. In this process progeny cells of the activated B cells accumulate distinct somatic mutations throughout the gene with higher mutation concentration in the CDR regions leading to generating antibodies with higher affinity to the antigens. In addition to somatic hypermutation activated B cells undergo the process of isotype switching. Antibodies with the same variable segments can have different forms (isotypes) depending on the constant segment. Whereas all naive B cells express IgM (or IgD), activated B cells mostly express IgG but also IgM, IgA and IgE. This expression switching from IgM (and/or IgD) to IgG, IgA, or IgE occurs through a recombination event causing one cell to specialize in producing a specific isotype. There is one segment for each IgM, IgD, and IgE, two segments for IgA, and four segments for IgG.
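The combinatorial figure quoted above can be checked with a short calculation. This is a minimal sketch that uses only the segment counts stated in the text (44 V, 27 D, and 6 J); it ignores junctional N and P nucleotides, the minority of BCRs with two D segments, and somatic hypermutation, all of which increase diversity further.

```python
from math import prod

# Functional heavy-chain segment counts as stated in the text.
segments = {"V": 44, "D": 27, "J": 6}

# One V, one D, and one J segment are combined per rearrangement.
combinations = prod(segments.values())
print(combinations)  # 44 * 27 * 6 = 7128
```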

Computer Implementation

In some aspects, one or more methods described herein can be implemented on a computer. In one embodiment, a computer comprises at least one processor coupled to a chipset. Also coupled to the chipset are a memory, a storage device, a keyboard, a graphics adapter, a pointing device, and a network adapter. A display is coupled to the graphics adapter. In one embodiment, the functionality of the chipset is provided by a memory controller hub and an I/O controller hub. In another embodiment, the memory is coupled directly to the processor instead of the chipset.

The storage device is any device capable of holding data, like a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory holds instructions and data used by the processor. The pointing device may be a mouse, track ball, or other type of pointing device, and is used in combination with the keyboard to input data into the computer system. The graphics adapter displays images and other information on the display. The network adapter couples the computer system to a local or wide area network.

As is known in the art, a computer can have different and/or other components than those described previously. In addition, the computer can lack certain components. Moreover, the storage device can be local and/or remote from the computer (such as embodied within a storage area network (SAN)).

As is known in the art, the computer is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic utilized to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device, loaded into the memory, and executed by the processor.

Embodiments of the entities described herein can include other and/or different modules than the ones described here. In addition, the functionality attributed to the modules can be performed by other or different modules in other embodiments. Moreover, this description occasionally omits the term “module” for purposes of clarity and convenience.

Kits

A kit can include a polynucleotide, a polynucleotide library, a vector, and/or a host cell disclosed herein and instructions for use. The kits may comprise, in a suitable container, a polynucleotide, a polynucleotide library, a vector, and/or a host cell disclosed herein, one or more controls, and various buffers, reagents, enzymes and other standard ingredients well known in the art.

The container can include at least one well on a plate comprising one or more wells. The container can include at least one vial, test tube, flask, bottle, syringe, or other container means, into which a polynucleotide, a polynucleotide library, a vector, and/or a host cell may be placed, and in some instances, suitably aliquoted. Where an additional component is provided, the kit can contain additional containers into which this component may be placed. The kits can also include a means for containing the polynucleotide, a polynucleotide library, a vector, and/or a host cell and any other reagent containers in close confinement for commercial sale. Such containers may include injection or blow-molded plastic containers into which the desired vials are retained. Containers can include labeling with instructions for use and/or warnings.

EXAMPLES

The examples are offered for illustrative purposes only, and are not intended to limit the scope of any embodiment of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

Various methods can employ, unless otherwise indicated, conventional methods of protein chemistry, biochemistry, recombinant DNA techniques and pharmacology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., T. E. Creighton, Proteins: Structures and Molecular Properties (W.H. Freeman and Company, 1993); A. L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.); Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pa.: Mack Publishing Company, 1990); Carey and Sundberg Advanced Organic Chemistry 3rd Ed. (Plenum Press) Vols A and B (1992); Current Protocols in Molecular Biology (2002-; Wiley; Online ISBN: 9780471142720; DOI: 10.1002/0471142727); Current Protocols in Immunology (2001-; Wiley; Online ISBN: 9780471142737; DOI: 10.1002/0471142735).

Example 1: Making Barcode Adapter Template Bead Library in a Single Reaction

The method described below was used to create a barcode adapter template bead library using emulsion PCR, where polymerase chain reaction (PCR) was performed to attach unique barcode adapter templates to each bead (see FIG. 5).

TABLE 4 Oligos used to make barcode adapter template bead library in a single reaction

Primer name | Sequence
emB-T7bridge2 | dual-biotin-C18spacer-C18spacer-TAA TAC GAC TCA CTA TAG GAT AAA GCG GCC GCA AAT
emB-BCbridge2 | mCmCC CCT GTT TAA ACC THH HTH HHH THH HHT HHH THH HHA TTT GCG GCC GCT TTA T (random combination of HH HTH HHH THH HHT HHH THH HH has 3¹⁸, or about 387×10⁶, possibilities, giving about 387 million unique barcodes)
emB-T7bridgefree | TAA TAC GAC TCA CTA TAG GAT AAA GCG GCC GCA AAT
emB-Rv3 | AlexaFluor647-C18spacer-mCmCC CCT GTT TAA ACC T
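The barcode count given for emB-BCbridge2 follows from the IUPAC degenerate base code, in which H denotes A, C, or T (any base except G). The sketch below counts the sequences encoded by the degenerate region and checks whether a candidate sequence is consistent with it. The function names and the validation step are illustrative assumptions, not part of the described method.

```python
# IUPAC code: H means "A, C, or T" (any base except G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "H": "ACT"}

# Degenerate barcode region from emB-BCbridge2, with spaces removed.
BARCODE_PATTERN = "HH HTH HHH THH HHT HHH THH HH".replace(" ", "")


def count_sequences(pattern):
    """Number of distinct sequences encoded by a degenerate pattern."""
    total = 1
    for symbol in pattern:
        total *= len(IUPAC[symbol])
    return total


def matches_pattern(seq, pattern):
    """True if every base of `seq` is allowed at its position in `pattern`."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[symbol] for base, symbol in zip(seq, pattern)
    )


n_barcodes = count_sequences(BARCODE_PATTERN)
example = BARCODE_PATTERN.replace("H", "C")  # one valid barcode
print(n_barcodes)                            # 3**18 = 387420489
print(matches_pattern(example, BARCODE_PATTERN))
```

The fixed T positions do not add diversity, so only the 18 H positions contribute to the count.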

Streptavidin-coated M-270 Dynabeads® (Life Technologies) were coupled with biotinylated oligonucleotide (“emB_T7bridge2”):

- 1. Beads were resuspended by gently swirling
- 2. 1 mL of M270 beads (approx. 6.7×10⁸ beads) was placed into each of three 1.5 mL microfuge tubes, for a total of 3 mL
- 3. Tubes were placed on a magnet for 3 minutes
- 4. Supernatant was removed from each tube and the beads were resuspended in 1 mL (1× vol) Bind/Wash Buffer (BWB; 1 M NaCl, 5 mM Tris, 0.5 mM EDTA)
- 5. Step 4 was repeated twice more followed by final resuspension in 540 μL volume BWB
- 6. 60 μL of 100 μM emB_T7bridge2 was added to beads and incubated for 15 minutes with gentle rotation
- 7. Following incubation, beads were washed 3× with 1 mL BWB buffer, and combined into a single tube
- 8. Beads were stored at 4° C. with 0.01% sodium azide
- 9. Beads were washed 3× with 10 mM Tris before use

Barcode oligonucleotides and forward and reverse primers were added to the coupled beads from above in an emulsion-based PCR:

- 1. The following PCR mix (3 mL total volume) was prepared in three 1.5 mL microcentrifuge tubes (VWR Cat. No. 20170-650):

Component | Volume
ddH₂O | 715.9 μL
10X HiFi PCR buffer | 100 μL
50 mM MgSO₄ | 50 μL
10 mM dNTP mix | 20 μL
emB_T7bridge2-labeled Dynabeads (1.2×10⁷ beads/μL) | 50 μL
emB_T7bridgefree (10 μM) | 4 μL
emB_BCandbridge2 (1 pM) | 16.6 μL
emB_Rv3 (100 μM) | 30 μL
Thermostable inorganic pyrophosphatase (NEB, 2,000 units/mL) | 1.5 μL
Platinum Taq Hifi (Life Technologies, 5 units/μL) | 12 μL
Total volume | 1000 μL

- 2. An oil-surfactant mix was prepared (1 mL total volume):

a. Mineral oil (Sigma) 900 μL
b. EM90 (Evonik) 100 μL

- 3. 800 μL of oil-surfactant mix and 200 μL of PCR mix were combined into each of 15 Axygen 2.0 mL Maxymum Recovery conical-bottom microcentrifuge tubes (MCT-200-L-C). Tubes were sealed and shaken for 3 seconds
- 4. Tubes were placed into a Qiagen TissueLyzer II, and shaken for 5 minutes at 14 Hz
- 5. The emulsion was divided among the wells of a VWR 96-well PCR plate (83007-374), with 160 μL of emulsion added per well
- 6. Tubes were thermocycled using the following program:

Initial: 94° C. 2′
35 Cycles: 94° C. 20″, 42° C. 30″, 68° C. 15″
50 Cycles: 55° C. 5.5′, 72° C. 30″
Final extension: 68° C. 5′
Hold: 10° C.
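The PCR mix volumes in step 1 and the thermocycling program in step 6 can be sanity-checked with a few lines of arithmetic. The following is a sketch only: the volumes, stock concentrations, and step durations are copied from the steps above, while the variable and function names are illustrative, and the time estimate ignores ramp times and the final hold.

```python
# Volumes (microliters) copied from the PCR mix in step 1.
pcr_mix_ul = {
    "ddH2O": 715.9,
    "10X HiFi PCR buffer": 100.0,
    "50 mM MgSO4": 50.0,
    "10 mM dNTP mix": 20.0,
    "emB_T7bridge2-labeled Dynabeads": 50.0,
    "emB_T7bridgefree (10 uM)": 4.0,
    "emB_BCandbridge2 (1 pM)": 16.6,
    "emB_Rv3 (100 uM)": 30.0,
    "Thermostable inorganic pyrophosphatase": 1.5,
    "Platinum Taq Hifi": 12.0,
}
total_ul = sum(pcr_mix_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution: C_final = C_stock * V_added / V_total (same unit as stock)."""
    return stock * volume_ul / total_volume_ul


mgso4_mM = final_concentration(50.0, 50.0, total_ul)
dntp_mM = final_concentration(10.0, 20.0, total_ul)
rv3_uM = final_concentration(100.0, 30.0, total_ul)

# Thermocycling program from step 6: (cycles, step durations in seconds).
program = [
    (1, [120]),          # initial 94 C, 2 min
    (35, [20, 30, 15]),  # 94 C, 42 C, 68 C
    (50, [330, 30]),     # 55 C for 5.5 min, 72 C for 30 s
    (1, [300]),          # final extension 68 C, 5 min
]
programmed_s = sum(cycles * sum(steps) for cycles, steps in program)

print(f"total volume: {total_ul:.1f} uL")
print(f"MgSO4 {mgso4_mM:.2f} mM, dNTP {dntp_mM:.2f} mM, emB_Rv3 {rv3_uM:.2f} uM")
print(f"programmed time: {programmed_s} s ({programmed_s / 3600:.2f} h)")
```

Under these inputs the listed volumes add up to the stated 1000 μL, and the programmed incubation time is roughly 5.7 hours before ramping is considered.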

The emulsion was broken and beads recovered:

- 1. The contents of the PCR plate were transferred into 1.5 mL microcentrifuge tubes (VWR 20170-650), with no more than 0.5 mL of emulsion volume per tube
- 2. 100 μL of 1 μM emB_T7bridgefree primer was added to each tube
- 3. Tubes were topped off with isobutanol, sealed and shaken to mix thoroughly
- 4. Tubes were centrifuged for 1 min at 14,000 rpm
- 5. Tubes were placed on a magnetic strip to draw the beads to the side of the tubes, then as much of the supernatant as possible was aspirated while leaving the pelleted beads behind
- 6. 1 mL of isobutanol was added, mixed well by pipetting up and down until the remaining oil/emulsion volume had dispersed into the isobutanol
- 7. Tubes were placed on a magnetic strip to draw the beads to the side of the tubes, then the isobutanol was aspirated. Beads from all of the tubes were combined into a single tube by first aspirating the supernatant from the tube into which the beads will be combined and then transferring the full volume from another tube, allowing time for the beads to collect at the magnet, then aspirating the supernatant and repeating
- 8. 1 mL of fresh isobutanol was added, mixed well and let rest for 60 seconds
- 9. Isobutanol was aspirated
- 10. 1 mL of 100% ethanol was added, mixed well and let rest for 60 seconds
- 11. Ethanol was aspirated
- 12. Steps 10 & 11 were repeated
- 13. 1 mL of 70% ethanol was added, mixed well and let rest for 60 seconds
- 14. Ethanol was aspirated
- 15. Steps 13 & 14 were repeated
- 16. 1 mL PBS was added, mixed well and let rest for 60 seconds.
- 17. PBS was aspirated
- 18. Steps 16 & 17 were repeated

Beads that incorporated barcode adapter templates were then sorted from non-barcoded beads using a Becton Dickinson FACS Aria III, utilizing the fluorescence from the Alexa Fluor 647 dye incorporated into the emB_Rv3 reverse primer.

Beads were stored in 0.01% sodium azide at 4° C.

Note that this example makes barcode adapter template beads with a T7 RNAP promoter sequence for amplification of barcode adapters by T7 RNAP. By replacing the T7 RNAP promoter sequence “TAA TAC GAC TCA CTA TAG G” in emB-T7bridge2 with other RNAP promoter sequences, barcode adapters can be amplified using other RNAPs. Also by replacing the promoter sequence with a nicking endonuclease site, such as Nt.BbvCI's “CCT CAG C”, barcode adapters can be amplified using a nicking endonuclease (e.g. Nt.BbvCI) and a strand-displacing DNAP such as Klenow exo-.

Also, “HH HTH HHH THH HHT HHH THH HH” in emB-BCbridge2 gives ˜387 million unique barcodes. When this barcode library is used to barcode even, for example, 10 million cells, only ˜2.5% of the unique barcodes are used. It is expected that the majority of the barcodes are of sufficient distance from one another that the majority of barcode sequence reads from NextGen sequencing are easily distinguishable from one another (with a proportion of reads being discarded), regardless of PCR and sequencing errors.
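The size of the barcode space quoted above can be checked with a short calculation. The sketch below assumes only that H is the IUPAC code for A, C, or T (three choices per position) and that each T in the pattern is a fixed base; the variable names are illustrative. It gives 3¹⁸ = 387,420,489 sequences, of which 10 million cells would use roughly 2.5 to 2.6%.

```python
# Count the sequences encoded by the degenerate barcode pattern.
# H is the IUPAC code for "A, C, or T" (3 choices); T is a fixed base.
pattern = "HH HTH HHH THH HHT HHH THH HH".replace(" ", "")

n_degenerate = pattern.count("H")   # number of variable positions
n_barcodes = 3 ** n_degenerate      # size of the barcode space

cells = 10_000_000                  # cell count used as the example in the text
fraction_used = cells / n_barcodes  # share of the barcode space consumed

print(n_degenerate, n_barcodes, f"{fraction_used:.1%}")
```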

The emulsion can be made using a variety of methods known in the field, and in this case was made using a shaking method; the resulting droplets were polydisperse with an average droplet diameter of ˜25 μm. Barcode oligonucleotides were amplified with forward and reverse primers, and the reverse primer was labeled with a fluorescent tag, which in this example was Alexa Fluor 647, so that beads that incorporated barcode adapter template were distinguishable from unlabeled beads. Bright fluorescent beads that incorporated barcode adapter template were then FACS sorted from dim unlabeled beads.

At the specified concentrations of beads and barcode oligonucleotide in this example, by Poisson distribution beads were loaded into droplets at an average of ˜7 beads per droplet, and we observed that roughly 28% of droplets contained one or more copies of a unique barcode oligonucleotide, while the rest of the droplets contained no barcode oligonucleotide at all. Of the droplets that contained at least one barcode oligonucleotide, ˜70% should contain exactly one barcode oligonucleotide while the remaining ˜30% should contain two or more barcodes. Therefore ˜70% of the barcode template adapter bead library was monoclonal (one unique barcode sequence per bead) and ˜30% was polyclonal.

The end yield of the method described above was roughly 12 million barcode adapter template beads, of which ˜8.4 million were monoclonal barcode adapter template beads. Although droplets were filled with ˜7 beads per droplet on average, after breaking the emulsion the yield of beads was ˜2%. Based on the binomial distribution, ˜7.7 million unique barcode sequences were present in this barcode adapter template bead library.

When this library of barcode adapter template beads is used to barcode a set of, for example, 100,000 cells, we would expect nucleic acids from 30%, or 30,000 cells, to be barcoded with barcodes from polyclonal beads. This would be detected as a set of unique barcodes associating nucleic acids to a single cell. Also, nucleic acids from ˜0.9%, or ˜630 cells, would be expected to be labeled with non-unique barcodes and share a barcode sequence with another cell (see Table 1 and related discussion). This is detectable as more than one variable gene nucleic acid, such as two immunoglobulin heavy chains or two TCR alpha chains, being associated with each other, and such cells may be discarded from further analysis. Therefore, from this library we would expect ˜99% of the cells with barcoded nucleic acids to be useable.
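The expected share of cells carrying a non-unique barcode can be estimated with a simple sampling model. The sketch below assumes that each of the 70,000 cells receiving a monoclonal bead draws its barcode uniformly at random from the ˜7.7 million unique sequences stated above; the function name is hypothetical. Under that assumption the result comes out close to the ˜0.9% and ˜630 cells quoted in the text.

```python
import math

def fraction_sharing_barcode(n_cells: int, n_barcodes: float) -> float:
    """Probability that a given cell's barcode is also drawn by at least one
    of the other n_cells - 1 cells, assuming uniform random draws."""
    # 1 - (1 - 1/N)^(n - 1), computed stably with log1p/expm1
    return -math.expm1((n_cells - 1) * math.log1p(-1.0 / n_barcodes))

monoclonal_cells = 70_000   # 70% of 100,000 cells (figure from the text)
unique_barcodes = 7.7e6     # unique barcode sequences (figure from the text)

p = fraction_sharing_barcode(monoclonal_cells, unique_barcodes)
print(f"{p:.2%} of cells, about {p * monoclonal_cells:.0f} cells")
```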

The concentrations of beads and barcode oligonucleotide can be adjusted to obtain a barcode adapter template bead library with differing proportions of monoclonal and polyclonal beads and a different number of unique barcode sequences present. This will allow for barcoding nucleic acids from single cells to achieve differing proportions of nucleic acids associated to a single cell via a unique barcode, or a set of unique barcodes, and also to change the percentage of barcoded nucleic acids discarded from further analysis.

This barcode adapter template bead making process can be optimized so that we achieve a ratio of monoclonal:polyclonal beads of 90%:10%. This improvement over the current ˜70%:30% ratio can be achieved by several different methods, the simplest of which would be to further dilute the oligo containing the barcode sequence (emB-BCbridge2 in this case) so that fewer copies are divided among the droplets in the emulsion, resulting in a reduced incidence of multiple barcode sequences being encapsulated in any given droplet.
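The effect of diluting the barcode oligonucleotide can be illustrated with an idealized model. The sketch below assumes the number of barcode templates per droplet follows a Poisson distribution with mean `lam`, and solves for the mean that gives a chosen monoclonal share among template-containing droplets. This is an idealization: the proportions observed in this example need not follow it exactly, and the function names are hypothetical.

```python
import math

def monoclonal_fraction(lam: float) -> float:
    """Among droplets holding at least one barcode template, the share holding
    exactly one, for a Poisson mean of lam templates per droplet."""
    # P(k = 1) / P(k >= 1) = lam * exp(-lam) / (1 - exp(-lam)) = lam / (exp(lam) - 1)
    return lam / math.expm1(lam)

def mean_for_target(target: float, lo: float = 1e-9, hi: float = 10.0) -> float:
    """Bisection search; monoclonal_fraction decreases as lam grows."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if monoclonal_fraction(mid) > target:
            lo = mid   # purer than required, so more template is tolerable
        else:
            hi = mid
    return (lo + hi) / 2.0

lam_90 = mean_for_target(0.90)
lam_70 = mean_for_target(0.70)
print(f"mean templates per droplet for 90% monoclonal: {lam_90:.3f}")
print(f"fold dilution relative to 70% monoclonal: {lam_70 / lam_90:.1f}")
```

Under this model, moving from a 70:30 to a 90:10 ratio corresponds to roughly a threefold reduction in template per droplet, at the cost of fewer droplets containing any template at all.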

Example 2: Making Barcode Adapter Template Bead Library in a Single Reaction II

The method described below was used to create a barcode adapter template bead library using emulsion PCR, where polymerase chain reaction (PCR) was performed to attach unique barcode adapter templates to each bead (see FIG. 5).

TABLE 5 Oligos used to make barcode adapter template bead library in a single reaction II

| Primer name | Sequence |
| --- | --- |
| emB-T7bridgeIsceI | dual-biotin-C18spacer-C18spacer-TAA TAC GAC TCA CTA TAG GAT AGG GAT AAC AGG GTA ATA GGA |
| emB_BCbridgeISceI_2 | mCmCC CCA GTT TAA ACT CCTH HHT HHH HTH HHH THH HTH HHH TCC TAT TAC CCT GTT ATC CC (random combination of HH HTH HHH THH HHT HHH THH HH, has 3¹⁸ or 387 × 10⁶ possibilities, giving 387 million unique barcodes) |
| emB-T7bridgefreeIsceI_2 | TAA TAC GAC TCA CTA TAG GAT AG GGATAACAGGGTAATAGGA |
| emB_IsceI_RV | AlexaFluor647-mCmCC CCA GTT TAA ACT CCT |
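The sequences in Table 5 suggest how the barcode oligonucleotide anneals to the bead-bound oligonucleotide: the 3′ tail of emB_BCbridgeISceI_2 appears to be the reverse complement of the 3′ end of emB-T7bridgeIsceI. The sketch below checks this with a plain string comparison. The sequences are copied from Table 5 with spaces and modifications removed; the interpretation as an annealing region is an inference, not something stated in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Bead-bound oligo from Table 5 (biotin and spacer omitted, spaces removed)
bead_oligo = "TAATACGACTCACTATAGGATAGGGATAACAGGGTAATAGGA"
# 3' tail of the barcode oligo, i.e. the bases after the degenerate H region
barcode_oligo_tail = "TCCTATTACCCTGTTATCCC"

overlap = reverse_complement(barcode_oligo_tail)
print(overlap, bead_oligo.endswith(overlap))
```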

Streptavidin-coated M-270 Dynabeads® (Life Technologies) were coupled with biotinylated oligonucleotide (“emB_T7bridgeIsceI”):

- 1. Beads were resuspended by gently swirling
- 2. 1 mL of M270 beads (approx. 6.7×10⁸ beads) was placed into each of three 1.5 mL microfuge tubes, for a total of 3 mL
- 3. Tubes were placed on a magnet for 3 minutes
- 4. Supernatant was removed from each tube and the beads were resuspended in 1 mL (1×vol) Bind/Wash Buffer (BWB; 1M NaCl, 5 mM Tris, 0.5 mM EDTA)
- 5. Step 4 was repeated twice more followed by final resuspension in 540 μL volume BWB
- 6. 60 μL of 100 μM emB_T7bridgeIsceI was added to beads and incubated for 15 minutes with gentle rotation
- 7. Following incubation, beads were washed 3× with 1 mL BWB buffer, and combined into a single tube
- 8. Beads were stored at 4° C. with 0.01% sodium azide
- 9. Beads were washed 3× with 10 mM Tris before use

Barcode oligonucleotides and forward and reverse primers were added to the coupled beads from above in an emulsion-based PCR:

- 1. The following PCR mix (3 mL total volume) was prepared in three 1.5 mL microcentrifuge tubes (VWR Cat. No. 20170-650):

| Component | Volume |
| --- | --- |
| ddH₂O | 715.9 μL |
| 10X HiFi PCR buffer | 100 μL |
| 50 mM MgSO₄ | 50 μL |
| 10 mM dNTP mix | 20 μL |
| emB_T7bridgeIsceI-labeled Dynabeads (1.2 × 10⁷ beads/μL) | 50 μL |
| emB_T7bridgefreeIsceI_2 (10 μM) | 4 μL |
| emB_BCbridgeISceI_2 (1 pM) | 16.6 μL |
| emB_IsceI_RV (100 μM) | 30 μL |
| Thermostable inorganic pyrophosphatase (NEB, 2,000 units/mL) | 1.5 μL |
| Platinum Taq Hifi (Life Technologies, 5 units/μL) | 12 μL |
| Total volume | 1000 μL |
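For orientation, the nominal number of barcode template molecules and beads in each 1000 μL of PCR mix can be worked out from the listed concentrations and volumes. The sketch below uses only the Avogadro constant and the values in the mix above; the helper name is hypothetical, and the figures are nominal inputs rather than measured quantities.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules(conc_molar: float, volume_ul: float) -> float:
    """Nominal molecule count for a given molarity and volume in microliters."""
    return conc_molar * volume_ul * 1e-6 * AVOGADRO

templates = molecules(1e-12, 16.6)   # 1 pM barcode oligo, 16.6 uL
beads = 1.2e7 * 50                   # beads per uL x uL of bead suspension

print(f"{templates:.2e} templates, {beads:.1e} beads, "
      f"{beads / templates:.0f} beads per template")
```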

- 2. An oil-surfactant mix was prepared (1 mL total volume):

| Component | Volume |
| --- | --- |
| a. Mineral oil (Sigma) | 900 μL |
| b. EM90 (Evonik) | 100 μL |

- 3. 800 μL of oil-surfactant mix and 200 μL of PCR mix were combined into each of 15 Axygen 2.0 mL Maxymum Recovery conical-bottom microcentrifuge tubes (MCT-200-L-C). Tubes were sealed and shaken for 3 seconds
- 4. Tubes were placed into a Qiagen TissueLyzer II, and shaken for 5 minutes at 14 Hz
- 5. The emulsion was divided among the wells of a VWR 96-well PCR plate (83007-374), with 160 μL of emulsion added per well
- 6. The plate was thermocycled using the following program:

| Step | Conditions |
| --- | --- |
| Initial | 94° C. 2′ |
| 35 Cycles | 94° C. 20″; 42° C. 30″; 68° C. 15″ |
| 50 Cycles | 55° C. 5.5′; 72° C. 30″ |
| Final extension | 68° C. 5′ |
| Hold | 10° C. |

The emulsion was broken and beads recovered:

- 1. The contents of the PCR plate were transferred into 1.5 mL microcentrifuge tubes (VWR 20170-650), with no more than 0.5 mL of emulsion volume per tube
- 2. 100 μL of 1 μM emB_T7bridgefreeIsceI_2 primer was added to each tube
- 3. Tubes were topped off with isobutanol, sealed and shaken to mix thoroughly
- 4. Tubes were centrifuged for 1 min at 14,000 rpm
- 5. Tubes were placed on a magnetic strip to draw the beads to the side of the tubes, then as much of the supernatant as possible was aspirated while leaving the pelleted beads behind
- 6. 1 mL of isobutanol was added and mixed well by pipetting up and down until the remaining oil/emulsion volume had dispersed into the isobutanol
- 7. Tubes were placed on a magnetic strip to draw the beads to the side of the tubes, then the isobutanol was aspirated. Beads from all of the tubes were combined into a single tube by first aspirating the supernatant from the tube into which the beads will be combined and then transferring the full volume from another tube, allowing time for the beads to collect at the magnet, then aspirating the supernatant and repeating
- 8. 1 mL of fresh isobutanol was added, mixed well and let rest for 60 seconds
- 9. Isobutanol was aspirated
- 10. 1 mL of 100% ethanol was added, mixed well and let rest for 60 seconds
- 11. Ethanol was aspirated
- 12. Steps 10 & 11 were repeated
- 13. 1 mL of 70% ethanol was added, mixed well and let rest for 60 seconds
- 14. Ethanol was aspirated
- 15. Steps 13 & 14 were repeated
- 16. 1 mL PBS was added, mixed well and let rest for 60 seconds.
- 17. PBS was aspirated
- 18. Steps 16 & 17 were repeated

Beads that incorporated barcode adapter templates were then sorted from non-barcoded beads using a Becton Dickinson FACS Aria III, utilizing the fluorescence from the Alexa Fluor 647 dye incorporated into the emB_IsceI_RV reverse primer.

Beads were stored in 0.01% sodium azide at 4° C.

Note that this example makes barcode adapter template beads with a T7 RNAP promoter sequence for amplification of barcode adapters by T7 RNAP. By replacing the T7 RNAP promoter sequence “TAA TAC GAC TCA CTA TAG G” in emB-T7bridgeIsceI with other RNAP promoter sequences, barcode adapters can be amplified using other RNAPs. Also by replacing the promoter sequence with a nicking endonuclease site, such as Nt.BbvCI's “CCT CAG C”, barcode adapters can be amplified using a nicking endonuclease (e.g. Nt.BbvCI) and a strand-displacing DNAP such as Klenow exo-.

Also, “HH HTH HHH THH HHT HHH THH HH” in emB-BCbridgeIsceI_2 gives ˜387 million unique barcodes. When this barcode library is used to barcode even, for example, 10 million cells, only ˜2.5% of the unique barcodes are used. It is expected that the majority of the barcodes are of sufficient distance from one another that the majority of barcode sequence reads from NextGen sequencing are easily distinguishable from one another (with a proportion of reads being discarded), regardless of PCR and sequencing errors.

At the specified concentrations of beads and barcode oligonucleotide in this example, by Poisson distribution beads were loaded into droplets at an average of ˜7 beads per droplet, and we observed that roughly 25% of droplets contained one or more copies of a unique barcode oligonucleotide, while the rest of the droplets contained no barcode oligonucleotide at all. Of the droplets that contained at least one barcode oligonucleotide, ˜75% contained exactly one barcode oligonucleotide while the remaining ˜25% contained two or more barcodes. Therefore ˜75% of the barcode template adapter bead library was monoclonal (one unique barcode sequence per bead) and ˜25% was polyclonal.

The end yield of the method described above was roughly 50 million barcode adapter template beads, of which ˜37.5 million were monoclonal beads. Although droplets were filled with ˜7 beads per droplet on average, after breaking the emulsion the yield of beads was ˜11%. Based on the binomial distribution, ˜28 million monoclonal beads with unique barcode sequences were present.

When this library of barcode adapter template beads is used to barcode a set of 100,000 cells, we would expect nucleic acids from 25%, or 25,000 cells, to be barcoded with barcodes from polyclonal beads. This would be detected as a set of unique barcodes associating nucleic acids to a single cell. Also, nucleic acids from ˜0.2%, or ˜200 cells, would be expected to be labeled with non-unique barcodes and share a barcode sequence with another cell (see Table 1 and related discussion). This is detectable as more than one variable gene nucleic acid, such as two immunoglobulin heavy chains or two TCR alpha chains, being associated with each other, and such cells may be discarded from further analysis. Therefore, from this library we would expect ˜99% of the cells with barcoded nucleic acids to be useable.

The concentrations of beads and barcode oligonucleotide can be adjusted to obtain a barcode adapter template bead library with differing proportions of monoclonal and polyclonal beads and a different number of unique barcode sequences present. This will allow for barcoding nucleic acids from single cells to achieve differing proportions of nucleic acids associated to a single cell via a unique barcode, or a set of unique barcodes, and also to change the percentage of barcoded nucleic acids discarded from further analysis.

Example 3: Making Barcode Adapter Template Bead Library in Multi-Steps

In this example, reactions were done as per FIG. 6, except that only one S1, one W, and one S2 barcode sequence were used. Therefore, pooling of beads coupled to different S1 sequences did not occur, and similarly, beads were not pooled after the polymerase extension reaction to add W sequences to the S1 oligo.

This example can be easily extended to be done as per FIG. 6 simply by having multiple S1-oligo, W-oligo and S2-oligo with unique barcode sequences.

TABLE 6 Oligos used to make barcode adapter template bead library in a single reaction

| Primer name | Sequence |
| --- | --- |
| S1-oligo | Desthiobiotin-18C spacer-ATA TTA ATA CGA CTC ACT ATA GGC ATA GGG ATA ACA GGG TAA TGA [S1] AG, where S1=GATGGAT |
| W-oligo-a | CCT CCT CCT CCT CCC [W] CTI III III TCA TTA CCC TGT TAT CCC TAT GCC, where W=AGTGAGCTGCGT |
| W-oligo-b | CT CCT CCT CCC [W] CTI III III TCA TTA CCC TGT TAT CCC TAT GCC, where W=AGTGAGCTGCGT |
| S2-oligo-a | mCmCC CT [S2] TCC TCC TCC TCC TCC C, where S2=CCTAACC |
| S2-oligo-b | mCmCC CT [S2] CTC CTC CTC CC, where S2=CCTAACC |

Streptavidin-coated M-270 Dynabeads® (Life Technologies) were coupled with biotinylated oligonucleotides containing S1 sequence in individual reactions:

- 1. Beads were resuspended by gently swirling
- 2. M270 beads (Life Technologies) were placed on magnet for 3 minutes.
- 3. Supernatant was removed from each tube and the beads were resuspended in (1×vol) 0.5× Bind/Wash Buffer (BWB; 1M NaCl, 5 mM Tris, 0.5 mM EDTA)
- 4. Step 3 was repeated twice more followed by final resuspension in BWB buffer
- 5. 10 μM S1-oligo was added to beads and incubated for 15 minutes with gentle rotation
- 6. Following incubation, beads were washed 3× with BWB buffer
- 7. Beads were stored at 4° C. with 0.01% sodium azide
- 8. Beads were washed 3× with 10 mM Tris before use

Coupled beads were then pooled together, and an extension reaction using W-oligo was performed.

For w extension reaction:

| Component | Volume |
| --- | --- |
| ddH₂O | 26.1 μL |
| 10x Taq buffer | 5 μL |
| 100 mM MgCl₂ | 4.25 μL |
| 20% Tween 20 | 0.125 μL |
| 100X BSA | 5 μL |
| S1-coupled beads (1 mg in 20 μL) | 5 μL |
| dNTP | 1 μL |
| Taq (NEB) | 0.5 μL |
| TIPP (NEB) | 0.025 μL |
| 100 μM W-oligo-a OR W-oligo-b | 3 μL |

- Incubated at 55° C. overnight in a shaking incubator, shaking at 800 rpm.

Beads were pooled and washed thrice with 1×BWB buffer. The anti-sense strand was then melted in 70° C. melt buffer (50 mM NaCl, 10 mM Tris pH 8.0). Beads were pelleted with a magnet and the supernatant removed entirely, then beads were washed thrice in 1 mL TE0.1 and resuspended in TE0.1 at 1 mg/20 μL.

For s2 extension reaction (per 250 μg beads):

| Component | Volume |
| --- | --- |
| ddH₂O | 24.5 μL |
| 10x Taq buffer | 5 μL |
| 100 mM MgCl₂ | 4.25 μL |
| 20% Tween 20 | 0.125 μL |
| 100X BSA | 5 μL |
| S1 + w-a or S1 + w-b Beads | 5 μL |
| dNTP | 1 μL |
| 100 μM S2-oligo-a OR S2-oligo-b | 3 μL |

- S2-oligo-a was used with S1+w-a beads, and S2-oligo-b was used with S1+w-b beads. Incubated at 60° C. for 10 min then slowly cooled to 37° C. Incubated at 37° C. for 2 hours, shaken at 800 rpm. Reaction was then allowed to cool to room temperature.
- Then the following was added:

| Component | Volume |
| --- | --- |
| dNTP (NEB) | 1 μL |
| E. coli pyrophosphatase (NEB) | 0.1 μL |
| Klenow fragment (NEB) | 1 μL |

- Reaction was incubated at 25° C. for 3 hours, shaking at 800 rpm. Every hour reaction was refreshed with 1 μL dNTP.

Beads were pooled and washed thrice with 1×BWB buffer. Beads were stored at 4° C. with 0.01% sodium azide and were washed 3× with 10 mM Tris before use.

A small aliquot of barcode adapter template beads was also used in an in vitro transcription reaction using T7 RNAP to determine whether the beads had been made successfully. If successful, T7 RNAP would be able to transcribe RNA off the double stranded T7 promoter present in the S1-oligo sequence. The Megascript T7 kit (Life Technologies) was used and the manufacturer's instructions were followed. 5 μL of reaction was run on an RNA Flashgel (Lonza). See FIG. 11.

The number of unique barcode sequences formed from the combination of S1, W, and S2 sequences can be increased or decreased as desired. For example, as can be seen in Table 1, if the number of unique barcodes is ˜10× greater than the number of cells to be barcoded, then, as determined by the binomial distribution, we can expect ˜10% of cells to share identical barcodes and thus be discarded during bioinformatic linking of nucleic acids to one another (this is detectable as more than one variable gene nucleic acid, such as two immunoglobulin heavy chains or two TCR alpha chains, being associated with each other). Therefore, from such a library we can expect ˜90% of barcoded cells to be successfully barcoded with a unique sequence, enabling proper informatics linkage of nucleic acids to one another.

Therefore, the number of S1_x, W, and S2_y sequences required is dependent on the desired number of cells to be barcoded. In Table 7, the W-extension reaction is envisioned to occur in 96-well plates, and an identical number of S1_x and S2_y sequences are used. As can be seen, to barcode 10 million cells, at most 323 S1_x and S2_y oligos and 960 W_z oligos are required. These are manageable numbers, especially if the reactions are done in 96-well plates, necessitating a total of only 18 96-well plates to perform the reactions to make a barcode adapter template bead library of the desired size.

TABLE 7 Number of S1_x, W_z, and S2_y sequences required to obtain a barcode adapter template library of sufficient size to barcode nucleic acids from a desired number of cells

| # cells to be barcoded | 1,000 | 10,000 | 100,000 | 1,000,000 | 10,000,000 |
| --- | --- | --- | --- | --- | --- |
| # unique beads required | 10,000 | 100,000 | 1,000,000 | 10,000,000 | 100,000,000 |
| # S1/S2 required if 96 W | 11 | 33 | 103 | 323 | 1021 |
| # S1/S2 required if 960 W | 4 | 11 | 33 | 103 | 323 |
| # S1/S2 required if 9600 W | 2 | 4 | 11 | 33 | 103 |
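The entries in Table 7 are consistent with a simple combinatorial rule: with k S1 sequences, k S2 sequences, and a given number of W sequences, the number of distinct barcodes is k × k × W, and k is the smallest integer for which this reaches ten times the number of cells. The sketch below reproduces the table under that assumption; the function name is hypothetical.

```python
import math

def oligos_needed(unique_beads: int, n_w: int) -> int:
    """Smallest k with k * k * n_w >= unique_beads (k S1 and k S2 oligos)."""
    k = math.isqrt(unique_beads // n_w)   # integer lower bound on k
    while k * k * n_w < unique_beads:
        k += 1
    return k

for cells in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    beads = 10 * cells   # tenfold excess of unique barcodes over cells
    print(cells, [oligos_needed(beads, w) for w in (96, 960, 9600)])
```

For 10 million cells this gives 1021, 323, and 103 S1/S2 oligos for 96, 960, and 9600 W oligos respectively, matching the last column of the table.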

Also, it is desirable for the barcodes in S1_x, S2_y, and W_z to be designed to be a minimum Hamming distance apart, with this minimum being 2. With this minimum, only barcode sequence reads from NextGen sequencing with an exact match to the barcode sequence are used; barcode sequence reads with errors are discarded. If the Hamming distance or edit distance used is increased to a minimum of 3, then error-correction is possible.
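The role of the minimum Hamming distance can be sketched in a few lines. With a minimum distance of 3, a read carrying a single substitution is within distance 1 of exactly one barcode and can be reassigned to it. In the example below, two of the barcodes are the S1 and S2 sequences from Table 6 and the third is invented purely for illustration; the function names are hypothetical, and only substitution errors (not insertions or deletions) are considered.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_distance(barcodes):
    """Smallest pairwise Hamming distance within a barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def correct(read, barcodes, max_errors=1):
    """Return the unique barcode within max_errors of read, else None."""
    hits = [b for b in barcodes if hamming(read, b) <= max_errors]
    return hits[0] if len(hits) == 1 else None

# GATGGAT and CCTAACC are from Table 6; ATCATGA is a made-up third barcode
barcodes = ["GATGGAT", "CCTAACC", "ATCATGA"]
print(min_distance(barcodes))
print(correct("GATGGAA", barcodes))   # read with one substitution
```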

Note that this example makes barcode adapter template beads with a T7 RNAP promoter sequence for amplification of barcode adapters by T7 RNAP. By replacing the T7 RNAP promoter sequence “TAA TAC GAC TCA CTA TAG G” in emB-T7bridge2 with other RNAP promoter sequences, barcode adapters can be amplified using other RNAPs. Also by replacing the promoter sequence with a nicking endonuclease site, such as Nt.BbvCI's “CCT CAG C”, barcode adapters can be amplified using a nicking endonuclease (e.g. Nt.BbvCI) and a strand-displacing DNAP such as Klenow exo-.

Example 4: Making Aqueous Barcode Adapter Template

Aqueous barcode adapter templates that were not coupled to beads were also synthesized to demonstrate that they could work in non-bead situations, and the broad applicability of the method.

A reaction mix was prepared as described below:

| Component | Volume |
| --- | --- |
| ddH2O | 353 μL |
| 10x HiFi Buffer | 50 μL |
| 50 mM MgSO₄ | 20 μL |
| 10 mM dNTP mix | 10 μL |
| 10 μM emB_T7bridge2 (Refer to Table 4) | 25 μL |
| 1 pM emB_BCbridge2 (Refer to Table 4) | 13 μL |
| 10 μM emB_RV3 (Refer to Table 4) | 25 μL |
| Platinum Taq HiFi (Life Technologies) | 4 μL |
| Total Volume | 500 μL |

The reaction mix was then aliquoted into a 96-well PCR plate at 25 μL per well and thermocycled as follows:

| Step | Conditions |
| --- | --- |
| Initial | 95° C. 1′ |
| 22 Cycles | 95° C. 20″; 46° C. 30″; 68° C. 30″ |
| Final extension | 68° C. 5′ |
| Hold | KFC hold |

The resulting PCR product, which is the barcode adapter template, was then blunted to remove A overhangs:

| Component | Volume |
| --- | --- |
| NEBuffer 2 | 162 μL |
| 10 mM dNTPs | 30 μL |
| T4 DNA polymerase (New England Biolabs) | 2 μL |

2.5 μL of the blunting mix was added to each 25 μL reaction volume, and incubated at 12° C. for 15 minutes. 1 μL of 250 mM EDTA was then added to each 25 μL reaction volume and heated to 75° C. for 20 minutes to inactivate the enzyme.

The reaction was cleaned up and quantitated:

- 1. Reactions were then pooled and cleaned up using the Zymo Research RNA Clean and Concentrator kit following the manufacturer's instructions
- 2. The Picogreen quantitation kit (Life Technologies) was used to quantify the DNA, and the barcode adapter template concentration was adjusted to 55 ng/μL

Note that this example makes barcode adapter template beads with a T7 RNAP promoter sequence for amplification of barcode adapters by T7 RNAP. By replacing the T7 RNAP promoter sequence “TAA TAC GAC TCA CTA TAG G” in emB-T7bridge2 with other RNAP promoter sequences, barcode adapters can be amplified using other RNAPs. Also by replacing the promoter sequence with a nicking endonuclease site, such as Nt.BbvCI's “CCT CAG C”, barcode adapters can be amplified using a nicking endonuclease (e.g. Nt.BbvCI) and a strand-displacing DNAP such as Klenow exo-.

Example 5: Adding Barcodes from Barcode Adapter Templates to mRNA in Different Reaction Buffers

This example shows that the method is useable in a variety of different buffers. Barcode adapter templates were made as described above in Example 4.

TABLE 8 Composition of reaction buffers

| Buffer name | Composition |
| --- | --- |
| 1x MMLV | 50 mM Tris-HCl, 75 mM KCl, 3 mM MgCl2, 10 mM DTT, pH 8.3 @ 25° C. |
| 1x Thermopol DF | 20 mM Tris-HCl, 10 mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4, pH 8.8 @ 25° C. |
| 1x TAE | 40 mM Tris, 20 mM acetic acid, 1 mM EDTA |

The following reactions were set up:

Using 0.5×MMLV Buffer

| Component | Volume |
| --- | --- |
| ddH2O | 4.8 μL |
| 10x MMLV buffer (NEB) | 1.25 μL |
| 100X BSA (NEB) | 1.25 μL |
| 100 mM MgCl2 | 1.75 μL |
| 50 μM oligo(dT)20VN | 0.5 μL |
| NTP mix (from Life Technologies Megascript SP6 kit) | 2 μL |
| dNTP (NEB) | 1.25 μL |
| barcode adapter template (55 ng/μL) | 0.6 μL |
| Ribolock (Thermo Scientific) | 0.6 μL |
| Total PBMC RNA (50 ng/μL) | 4 μL |

The above was heated to 55° C. for 3 minutes, then the following was added:

| Component | Volume |
| --- | --- |
| Ribolock (Thermo Scientific) | 0.4 μL |
| E. coli inorganic pyrophosphatase (NEB) | 2 μL |
| T7 RNAP (NEB) | 1 μL |
| T4gp32 (NEB) | 0.6 μL |
| Maxima H- RTase (Thermo Scientific) | 3 μL |

T7 RNAP linear amplification of barcode adapters from barcode adapter template, reverse transcription, and addition of barcodes to 1st strand cDNA was performed at 42° C. for 2 hours.
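The "0.5×" in the heading of this reaction follows from the listed volumes: the two component lists above add up to 25 μL, in which 1.25 μL of 10× buffer is diluted twentyfold. The sketch below works through that arithmetic using the standard dilution relation (stock concentration × stock volume ÷ final volume); the helper name is hypothetical.

```python
# Volumes in microliters, copied from the 0.5x MMLV reaction above
first_mix_ul = [4.8, 1.25, 1.25, 1.75, 0.5, 2, 1.25, 0.6, 0.6, 4]
second_mix_ul = [0.4, 2, 1, 0.6, 3]   # added after the 55 C step

total_ul = sum(first_mix_ul) + sum(second_mix_ul)

def final_conc(stock: float, volume_ul: float, total: float = total_ul) -> float:
    """Final concentration after dilution, in the same units as the stock."""
    return stock * volume_ul / total

print(round(total_ul, 2), "uL total")
print(round(final_conc(10, 1.25), 3), "x MMLV buffer")
print(round(final_conc(55, 0.6), 2), "ng/uL barcode adapter template")
```

The same calculation applied to the other reactions gives 1× for 2.5 μL of 10× Thermopol DF and 0.25× for 1.25 μL of 5× TAE, since each reaction also totals 25 μL.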

Using Thermopol Buffer:

| Component | Volume |
| --- | --- |
| ddH2O | 3.3 μL |
| 10x Thermopol DF (NEB) | 2.5 μL |
| 1M DTT | 0.25 μL |
| 100X BSA (NEB) | 1.25 μL |
| 100 mM MgCl2 | 1.75 μL |
| 50 μM Oligo(dT)20VN | 0.5 μL |
| NTP mix (from Life Technologies Megascript SP6 kit) | 2 μL |
| dNTP (NEB) | 1.25 μL |
| barcode adapter template (55 ng/μL) | 0.6 μL |
| Ribolock (Thermo Scientific) | 0.6 μL |
| Total PBMC RNA (50 ng/μL) | 4 μL |

The above was heated to 55° C. for 3 minutes, then the following was added:

| Component | Volume |
| --- | --- |
| Ribolock (Thermo Scientific) | 0.4 μL |
| E. coli inorganic pyrophosphatase (NEB) | 2 μL |
| T7 RNAP (NEB) | 1 μL |
| T4gp32 (NEB) | 0.6 μL |
| Maxima H- RTase (Thermo Scientific) | 3 μL |

T7 RNAP linear amplification of barcode adapters from barcode adapter template, reverse transcription, and addition of barcodes to 1st strand cDNA was performed at 42° C. for 2 hours.

Using TAE Buffer:

| Component | Volume |
| --- | --- |
| ddH2O | 4.55 μL |
| 5x TAE | 1.25 μL |
| 1M DTT | 0.25 μL |
| 100X BSA (NEB) | 1.25 μL |
| 100 mM MgCl2 | 1.75 μL |
| 50 μM Oligo(dT)20VN | 0.5 μL |
| NTP mix (from Life Technologies Megascript SP6 kit) | 2 μL |
| dNTP (NEB) | 1.25 μL |
| barcode adapter template (55 ng/μL) | 0.6 μL |
| Ribolock (Thermo Scientific) | 0.6 μL |
| Total PBMC RNA (50 ng/μL) | 4 μL |

The above was heated to 55° C. for 3 minutes, then the following was added:

| Component | Volume |
| --- | --- |
| Ribolock (Thermo Scientific) | 0.4 μL |
| E. coli inorganic pyrophosphatase (NEB) | 2 μL |
| T7 RNAP (NEB) | 1 μL |
| T4gp32 (NEB) | 0.6 μL |
| Maxima H- RTase (Thermo Scientific) | 3 μL |

T7 RNAP linear amplification of barcode adapters from barcode adapter template, reverse transcription, and addition of barcodes to 1st strand cDNA was performed at 42° C. for 2 hours.

The reaction was then cleaned up using a modified traditional phenol/chloroform method:

- 1. 200 μL of TE0.1 (10 mM Tris pH 8.0, 0.1 mM EDTA) was added to each reaction mix
- 2. 200 μL of Phenol/chloroform/isoamyl alcohol (Sigma) was added to each reaction mix and shaken vigorously in pre-spun Gel Phase Lock tubes (5Prime)
- 3. Gel Phase Lock tubes were centrifuged at 14,000 g for 3 minutes and the top aqueous fraction was pipetted into Amicon 100 kDa columns (Millipore) and spun at 14,000 g for 3 minutes
- 4. 450 μL TE (10 mM Tris, pH 8.0, 1 mM EDTA) was then pipetted into the Amicon column, and spun at 14,000 g for 3 minutes
- 5. 450 μL of 10 mM Tris (pH8.0) was then pipetted into the Amicon column and spun at 14,000 g for 5 minutes
- 6. The Amicon column was inverted into a new collection tube and spun at 1000 g for 2 minutes to collect the eluate, which contained the purified mRNA/1st strand cDNA duplex

Two rounds of PCR (PCR1 and PCR2) were then performed:

TABLE 9 PCR1 and PCR2 primer sequences

| Primer name | Sequence |
| --- | --- |
| L_GSP1 | TYT GTG GGA CTT CCA CTG CTC |
| G_GSP1 | TCT TGT CCA CCT TGG TGT TGC TG |
| K_GSP1 | CGA TTG GAG GGC GTT ATC CAC |
| K_GSP2 | CTA TGC GCC TTG CCA GCC CGC TCA GTC AGA TGG CGG GAA GAT GAA GAC |
| L_GSP2 | CTA TGC GCC TTG CCA GCC CGC TCA GGA GGA GGG YGG GAA CAG AGT GAC |
| G_GSP2 | CTA TGC GCC TTG CCA GCC CGC TCA GGG GAA GTA GTC CTT GAC CAG GCA G |
| BC_Long | GAG AGA CTG ACA GCG TAT CGC CTC CCT CGC GCC ATC AGA CGA GTG CGT GGA TAA AGC GGC CGC AAA T |
| FW_1short | GAG AGA CTG ACA GCG TAT CGC CTC |
| 2FR | CGT ATC GCC TCC CTC GCG and CTA TGC GCC TTG CCA GCC C mixed 1:1 |
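Two of the primers in Table 9 (L_GSP1 and L_GSP2) contain the degenerate base Y, which in IUPAC notation stands for C or T, so each name denotes a mixture of two sequences. The sketch below expands a degenerate primer into its explicit variants using the standard IUPAC nucleotide codes; the function name is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(primer: str):
    """List every unambiguous sequence encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    return ["".join(p) for p in product(*(IUPAC[b] for b in bases))]

variants = expand("TYT GTG GGA CTT CCA CTG CTC")   # L_GSP1 from Table 9
print(variants)
```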

The following PCR1 Phusion (Thermo Scientific) reaction mix was set up per RT reaction:

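| Component | Volume |
| --- | --- |
| H2O | 11.28 μL |
| 5x GC buffer | 5 μL |
| MgCl2 | 0.15 μL |
| DMSO | 1 μL |
| dNTP | 0.5 μL |
| 10 μM FW1-short | 1 μL |
| 10 μM BC-Long | 1 μL |
| 10 μM K-GSP1 | 0.56 μL |
| 10 μM L-GSP1 | 1.25 μL |
| 10 μM G-GSP1 | 0.56 μL |
| ET-SSB (NEB) | 0.25 μL |
| BSA | 0.25 μL |
| Phusion | 0.2 μL |
| cDNA template | 2 μL |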

| Step | Conditions |
| --- | --- |
| Initial | 95° C. 5′ |
| 18 Cycles | 98° C. 30″; 62° C. 30″; 72° C. 45″ |
| Final extension | 72° C. 5′ |
| Hold | 10° C. |

The reactions from PCR1 were then diluted 50× and used as a template in 3 separate PCR2 reactions, one for kappa light chain, one for lambda light chain and one for gamma heavy chain.

The following PCR2 Phusion (Thermo Scientific) reaction mixes were set up per RT reaction:

| Component | Volume |
| --- | --- |
| H2O | 17.82 μL |
| 5x GC buffer | 6 μL |
| MgCl2 | 0.18 μL |
| DMSO | 1 μL |
| dNTP | 0.6 μL |
| 10 μM 2FW | 1.2 μL |
| 10 μM K or L or G-GSP2 | 0.6 μL |
| BSA | 0.3 μL |
| Phusion | 0.3 μL |
| Dil. PCR1 template | 2 μL |

| Step | Conditions |
| --- | --- |
| Initial | 95° C. 5′ |
| 28 Cycles | 98° C. 30″; 65° C. 30″; 72° C. 45″ |
| Final extension | 72° C. 5′ |
| Hold | 10° C. |

5 μL of product was run on a gel (FIG. 8). As can be seen, the barcoding reaction works well in a variety of different buffers that contain a variety of different ions such as potassium, ammonium, chloride, sulphate, and acetate ions.

Example 6: RNA Barcode Adapters Amplified from Barcode Adapter Templates Work Better than Unamplified DNA Barcode Adapters

This example shows that the method is useable in a variety of different buffers with different salt concentrations. Also, the method as described, with RNA barcode adapters amplified from barcode adapter templates, works better than just adding DNA barcode adapters into the reaction. This is presumably due to a reaction with RNA barcode adapters resulting in lower background (see FIG. 3). Barcode adapter templates were made as described above in Example 4.

TABLE 10 Additional oligo sequences

| Primer name | Sequence |
| --- | --- |
| DNA barcode adapter w24 | TYT GTG GGA CTT CCA CTG CTC |
| FW1_Long | GAG AGA CTG ACA GCG TAT CGC CTC CCT CGC GCC ATC AGA CGA GTG CGT CAC GAC CGG TGC TCG ATT TAG |

The following reactions were set up and buffer compositions are as in Table 8:

Using 1×MMLV Buffer

| Component | Volume |
| --- | --- |
| ddH₂O | 3.55 μL |
| 10x MMLV buffer (NEB) | 2.5 μL |
| 100X BSA (NEB) | 1.25 μL |
| 100 mM MgCl₂ | 1.75 μL |
| 50 μM oligo(dT)₂₀VN | 0.5 μL |
| NTP mix (from Life Technologies Megascript SP6 kit) | 2 μL |
| dNTP (NEB) | 1.25 μL |
| barcode adapter template (55 ng/μL) | 0.6 μL |
| Ribolock (Thermo Scientific) | 0.6 μL |
| Total PBMC RNA (50 ng/μL) | 4 μL |

The above was heated to 55° C. for 3 minutes, then the following was added:

| Component | Volume |
| --- | --- |
| Ribolock (Thermo Scientific) | 0.4 μL |
| E. coli inorganic pyrophosphatase (NEB) | 2 μL |
| T7 RNAP (NEB) | 1 μL |
| T4gp32 (NEB) | 0.6 μL |
| Maxima H- RTase (Thermo Scientific) | 3 μL |

T7 RNAP linear amplification of barcode adapters from barcode adapter template, reverse transcription, and addition of barcodes to 1st strand cDNA was performed at 42° C. for 2 hours.

Using 0.5×MMLV Buffer

| Component | Volume |
| --- | --- |
| ddH₂O | 4.8 μL |
| 10x MMLV buffer (NEB) | 1.25 μL |
| 100X BSA (NEB) | 1.25 μL |
| 100 mM MgCl₂ | 1.75 μL |
| 50 μM oligo(dT)₂₀VN | 0.5 μL |
| NTP mix (from Life Technologies Megascript SP6 kit) | 2 μL |
| dNTP (NEB) | 1.25 μL |
| barcode adapter template (55 ng/μL) | 0.6 μL |
| Ribolock (Thermo Scientific) | 0.6 μL |
| Total PBMC RNA (50 ng/μL) | 4 μL |

The above was heated to 55° C. for 3 minutes, then the following was added:

| Component | Volume |
| --- | --- |
| Ribolock (Thermo Scientific) | 0.4 μL |
| E. coli inorganic pyrophosphatase (NEB) | 2 μL |
| T7 RNAP (NEB) | 1 μL |
| T4gp32 (NEB) | 0.6 μL |
| Maxima H- RTase (Thermo Scientific) | 3 μL |

T7 RNAP linear amplification of barcode adapters from barcode adapter template, reverse transcription, and addition of barcodes to 1st strand cDNA was performed at 42° C. for 2 hours.

Using DNA Barcode Adapter

| Component | Volume |
| --- | --- |
| ddH₂O | 13 μL |
| 10x MMLV buffer (NEB) | 2.5 μL |
| 100X BSA (NEB) | 0.25 μL |
| 100 mM MgCl₂ | 0.75 μL |
| 50 μM Oligo(dT)₂₀VN | 1 μL |
| 10 μM DNA barcode adapter w24 | 2.5 μL |
| Ribolock (Thermo Scientific) | 0.6 μL |
| Total PBMC RNA (50 ng/μL) | 2 μL |

The above was heated to 55° C. for 3 minutes, then the following was added:

| Component | Volume |
| --- | --- |
| Ribolock (Thermo Scientific) | 0.4 μL |
| T4gp32 (NEB) | 1 μL |
| Maxima H- RTase (Thermo Scientific) | 1 μL |

Reverse transcription and addition of barcodes to 1st strand cDNA was performed at 42° C. for 2 hours.

The reaction was then cleaned up using a modified traditional phenol/chloroform method:

- 1. 200 μL of TE0.1 (10 mM Tris pH 8.0, 0.1 mM EDTA) was added to each reaction mix
- 2. 200 μL of Phenol/chloroform/isoamyl alcohol (Sigma) was added to each reaction mix and shaken vigorously in pre-spun Gel Phase Lock tubes (5Prime)
- 3. Gel Phase Lock tubes were centrifuged at 14,000 g for 3 minutes and the top aqueous fraction was pipetted into Amicon 100 kDa columns (Millipore) and spun at 14,000 g for 3 minutes
- 4. 450 μL TE (10 mM Tris, pH 8.0, 1 mM EDTA) was then pipetted into the Amicon column, and spun at 14,000 g for 3 minutes
- 5. 450 μL of 10 mM Tris (pH8.0) was then pipetted into the Amicon column and spun at 14,000 g for 5 minutes
- 6. The Amicon column was inverted into a new collection tube and spun at 1000 g for 2 minutes to collect the eluate, which contained the purified mRNA/1st strand cDNA duplex

Two rounds of PCR (PCR1 and PCR2) were then performed:

The following PCR1 Phusion (Thermo Scientific) reaction mix was set up per RT reaction that used a barcode adapter template:

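| Reagent | Volume |
| --- | --- |
| H₂O | 11.28 μL |
| 5x GC buffer | 5 μL |
| MgCl₂ | 0.15 μL |
| DMSO | 1 μL |
| dNTP | 0.5 μL |
| 10 μM FW1-short | 1 μL |
| 10 μM BC-Long | 1 μL |
| 10 μM K-GSP1 | 0.56 μL |
| 10 μM L-GSP1 | 1.25 μL |
| 10 μM G-GSP1 | 0.56 μL |
| ET-SSB (NEB) | 0.25 μL |
| BSA | 0.25 μL |
| Phusion | 0.2 μL |
| cDNA template | 2 μL |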

The following PCR1 Phusion (Thermo Scientific) reaction mix was set up per RT reaction that used a DNA barcode adapter:

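| Reagent | Volume |
| --- | --- |
| H₂O | 11.28 μL |
| 5x GC buffer | 5 μL |
| MgCl₂ | 0.15 μL |
| DMSO | 1 μL |
| dNTP | 0.5 μL |
| 10 μM FW1-short | 1 μL |
| 10 μM FW-Long | 1 μL |
| 10 μM K-GSP1 | 0.56 μL |
| 10 μM L-GSP1 | 1.25 μL |
| 10 μM G-GSP1 | 0.56 μL |
| ET-SSB (NEB) | 0.25 μL |
| BSA | 0.25 μL |
| Phusion | 0.2 μL |
| cDNA template | 2 μL |

| Step | Temperature | Time |
| --- | --- | --- |
| Initial | 95° C. | 5′ |
| 18 cycles | 98° C. | 30″ |
| | 62° C. | 30″ |
| | 72° C. | 45″ |
| Final extension | 72° C. | 5′ |
| Hold | 10° C. | hold |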

The reactions from PCR1 were then diluted 50× and used as a template in 3 separate PCR2 reactions, one for kappa light chain, one for lambda light chain, and one for gamma heavy chain.

The following PCR2 Phusion (Thermo Scientific) reaction mixes were set up per PCR1 reaction:

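| Reagent | Volume |
| --- | --- |
| H₂O | 17.82 μL |
| 5x GC buffer | 6 μL |
| MgCl₂ | 0.18 μL |
| DMSO | 1 μL |
| dNTP | 0.6 μL |
| 10 μM 2FW | 1.2 μL |
| 10 μM K or L or G-GSP2 | 0.6 μL |
| BSA | 0.3 μL |
| Phusion | 0.3 μL |
| Dil. PCR1 template | 2 μL |

| Step | Temperature | Time |
| --- | --- | --- |
| Initial | 95° C. | 5′ |
| 28 cycles | 98° C. | 30″ |
| | 65° C. | 30″ |
| | 72° C. | 45″ |
| Final extension | 72° C. | 5′ |
| Hold | 10° C. | hold |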
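When several PCR2 reactions share the same gene-specific primer, the per-reaction volumes listed above are typically combined into a master mix. The sketch below checks that the listed volumes sum to the 30 μL reaction volume and scales them for a given number of reactions. The 10% pipetting overage and the function name `master_mix` are assumptions made for illustration and are not part of the protocol.

```python
# Per-reaction PCR2 volumes in microliters, copied from the mix above.
PCR2_MIX_UL = {
    "H2O": 17.82,
    "5x GC buffer": 6.0,
    "MgCl2": 0.18,
    "DMSO": 1.0,
    "dNTP": 0.6,
    "10 uM 2FW": 1.2,
    "10 uM GSP2 (K, L, or G)": 0.6,
    "BSA": 0.3,
    "Phusion": 0.3,
}
TEMPLATE_UL = 2.0  # diluted PCR1 product, added to each tube separately


def master_mix(per_reaction, n_reactions, overage=0.10):
    """Scale per-reaction volumes for n reactions plus a fractional overage.

    The default 10% overage is an assumed allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


per_reaction_total = sum(PCR2_MIX_UL.values()) + TEMPLATE_UL
print(f"Per-reaction total: {per_reaction_total:.2f} uL")

for name, vol in master_mix(PCR2_MIX_UL, n_reactions=8).items():
    print(f"{name:28s}{vol:8.2f} uL")
```

Since the kappa, lambda, and gamma reactions each use a different GSP2 primer, a shared mix in practice would be prepared per chain, or the primer would be left out of the mix and added per tube.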

5 μL of product was run on a gel (FIG. 9). As can be seen, the barcoding reaction works well in buffers with differing salt concentrations. While the reaction works better in a low salt buffer (0.5×MMLV) due to the salt sensitivity of the T7 RNAP, it does work in a higher salt buffer (1×MMLV). Note that due to non-specific priming during the RT step when using DNA barcode adapters (refer to FIG. 3), in this particular example there was exceptionally high background and the desired bands were obscured.

Example 7: Barcoding Nucleic Acids from Cells Using Aqueous Barcode Adapter Templates in Droplets Made Using a Microfluidic Droplet Device

A device for creating monodisperse emulsions was used to encapsulate single cells along with barcoded beads and other reagents necessary for the barcoding assay. For the purposes of this disclosure, we describe our own droplet system, although any number of comparable systems could be assembled. Three Dolomite P-Pumps were equipped with flow sensors (Dolomite 3200016, 3200095, and 3200098). The first P-Pump was connected directly to a 2-Reagent Droplet Chip (Dolomite 3200287) via microfluidic tubing that incorporated a T-junction to split the line into two inputs. This was the oil input line. The other two P-Pumps were connected via fluidic tubing to PEEK sample loops that coiled around an ice bin, which kept samples chilled while the device was operating, and each of these loops was connected to the 2-Reagent Droplet Chip. Each sample loop incorporated a four-way valve at its front end so that sample could be loaded into the loop by means of a syringe. The first sample loop was filled with cells while the second loop was filled with RT/barcoding/lysis mix. An example of the device configuration is shown in FIG. 7. The ice bin was filled with ice prior to use.

A murine B220+ B cell population was FACS sorted and a cell suspension was prepared using 300 mM betaine with 10 mM NaCl and 0.5 mg/ml BSA as a suspension buffer. Cells were used at a concentration of 4,500 cells/μL.

An RT/aqueous barcode mix was prepared as follows:

| Reagent | Volume |
| --- | --- |
| 10X Thermopol DF | 30 μL |
| 1M DTT | 3 μL |
| 1M MgCl₂ | 3.6 μL |
| 50 μM oligo(dT)₂₀VN | 6 μL |
| NTP mix (from Life Technologies Megascript SP6 kit) | 48 μL |
| dNTP (NEB) | 15 μL |
| barcode adapter template (55 ng/μL) | 7.2 μL |
| 10% Tween-20 | 1 μL |
| Ribolock (Thermo Scientific) | 12 μL |
| E. coli inorganic pyrophosphatase (NEB) | 24 μL |
| T7 RNAP (NEB) | 12 μL |
| T4gp32 (NEB) | 7.2 μL |
| Maxima H- RTase (Thermo Scientific) | 36 μL |
| Total volume | 205 μL |

The cell suspension was loaded into one sample loop and the RT/barcoding/lysis mix was loaded into the other sample loop using syringes. Cell and barcode concentrations were chosen to minimize the occurrence of multiple cells or barcodes in a single droplet, while keeping those concentrations high enough that a sufficiently large number of cells were encapsulated with barcodes. The 4-way valves were switched so that the sample loops were in line with the pump, and all three pumps were activated. The two aqueous inputs were flowed at rates so that they mixed at a 1:2 (cell suspension:RT/barcoding/lysis mix) ratio. The aqueous and oil inputs were flowed at rates so that droplets that were ~50 μm in diameter were formed, and at a high enough flow rate so that cells flowed through the device. The emulsion was collected in a Sorenson Bioscience 0.2 mL PCR tube. After the sample had been created, it was first given a pre-heat step (3 minutes at 55° C.) and then incubated for 2 hours at 42° C. to allow the reaction to proceed. Following the reaction, the emulsion was broken using the “breaking non-bead emulsions” process described below. This produced a purified sample of cDNA for subsequent PCR amplification and sequencing.
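The trade-off described above, between avoiding multiple cells per droplet and encapsulating enough cells, can be illustrated with a simple occupancy estimate. The sketch below assumes that cells are distributed among droplets according to Poisson statistics, that droplets are spheres of exactly 50 μm diameter, and that the 1:2 mixing ratio dilutes the 4,500 cells/μL suspension threefold. These modeling assumptions are ours and are not stated in the disclosure.

```python
import math


def droplet_volume_ul(diameter_um: float) -> float:
    """Volume of a spherical droplet in microliters (1 uL = 1e9 um^3)."""
    return (math.pi / 6.0) * diameter_um ** 3 / 1e9


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson mean of lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


DIAMETER_UM = 50.0          # approximate droplet diameter from the text
CELLS_PER_UL = 4500.0       # cell suspension concentration from the text
CELL_FRACTION = 1.0 / 3.0   # 1:2 cell suspension : RT/barcoding/lysis mix

# Mean number of cells per droplet under the stated assumptions.
lam = CELLS_PER_UL * CELL_FRACTION * droplet_volume_ul(DIAMETER_UM)

p_empty = poisson_pmf(0, lam)
p_single = poisson_pmf(1, lam)
p_multi = 1.0 - p_empty - p_single

print(f"mean cells per droplet: {lam:.3f}")
print(f"P(no cell)   = {p_empty:.4f}")
print(f"P(one cell)  = {p_single:.4f}")
print(f"P(>= 2 cells) = {p_multi:.4f}")
```

Under these assumptions the mean is roughly 0.1 cells per droplet, so most droplets are empty and droplets holding two or more cells are rare compared with single-cell droplets.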

Breaking non-bead emulsions

- 1. 200 μL TE, 400 μL phenol/chloroform/isoamyl alcohol, 800 μL chloroform were pipetted into pre-spun Gel Phase Lock tubes
- 2. Each sample was pipetted into a corresponding Gel Phase Lock tube
- 3. Tubes were spun down for 3 minutes at 14,000 g
- 4. The aqueous layers were pipetted into 100 kDa Amicon tubes (Millipore).
- 5. Tubes were spun down for 3 minutes at 14,000 g
- 6. 450 μL of TE was pipetted into the Amicon tubes
- 7. Tubes were spun down for 3 minutes at 14,000 g
- 8. 450 μL of 10 mM Tris was added to the Amicon tubes
- 9. Tubes were spun down for 5 minutes at 14,000 g
- 10. Amicon tubes were placed inverted into fresh collection tubes
- 11. Tubes were spun down for 2 minutes at 1,000 g

Two rounds of PCR (PCR1 and PCR2) were then performed, using the following primers in addition to some primer sequences listed in Table 9.

TABLE 11 Additional primers for PCR of murine immunoglobulin genes

| Primer name | Sequence |
| --- | --- |
| L_GSP1_murine | ACT CTT CTC CAC AGT GTC CCC TTC ATG and ACT CTT CTC CAC AGT GTG ACC TTC ATG, mixed 50:50 |
| G_GSP1_murine | CTG GAC AGG GAT CCA GAG TTC C and CTG GAC AGG GCT CCA TAG TTC C, mixed 50:50 |
| K_GSP1_murine | CCA TTT TGT CGT TCA CTG CCA TC |
| M_GSP1_murine | CCA GAG AAG CCA TCC CGT GGT |
| K_GSP2_murine | CTA TGC GCC TTG CCA GCC CGC TCA GCA CTG GAT GGT GGG AAG ATG GA |
| L_GSP2_murine | CTA TGC GCC TTG CCA GCC CGC TCA GGG CCT TGT TAG TCT CGA GCT CTT C and CTA TGC GCC TTG CCA GCC CGC TCA GGG CTT TGT TTT CCT KGA GCT CCT C, mixed 50:50 |
| G_GSP2_murine | CTA TGC GCC TTG CCA GCC CGC TCA GGG GGC CAG TGG ATA GAC HGA TG and CTA TGC GCC TTG CCA GCC CGC TCA GCA GGG ACC AAG GGA TAG ACA GAT G, mixed 50:50 |
| M_GSP2_murine | CTA TGC GCC TTG CCA GCC CGC TCA GGR AAG ACA TTT GGG RAG GAC TGA CTC |
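Several primers in Table 11 contain degenerate positions written with IUPAC ambiguity codes (K, H, and R). The following sketch enumerates the concrete sequences that one such primer represents, using the M_GSP2 murine primer as the input. The helper name `expand_degenerate` is hypothetical; the code table is the standard IUPAC nucleotide code set.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """Return every unambiguous sequence encoded by a degenerate primer.

    Spaces (used in the table to group bases in threes) are ignored.
    """
    bases = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in bases]
    return ["".join(combo) for combo in product(*choices)]


# M_GSP2 murine primer as listed in Table 11 (two degenerate R positions).
M_GSP2_MURINE = (
    "CTA TGC GCC TTG CCA GCC CGC TCA GGR AAG ACA TTT GGG RAG GAC TGA CTC"
)

variants = expand_degenerate(M_GSP2_MURINE)
print(f"{len(variants)} concrete sequences:")
for seq in variants:
    print(seq)
```

The number of variants is the product of the number of bases allowed at each position, so a primer with two R positions expands to four sequences.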

The following PCR1 Phusion (Thermo Scientific) reaction mix was set up per RT reaction that used a barcode adapter template:

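| Reagent | Volume |
| --- | --- |
| H₂O | 10.53 μL |
| 5x GC buffer | 5 μL |
| MgCl₂ | 0.15 μL |
| DMSO | 1 μL |
| dNTP | 0.5 μL |
| 10 μM FW1-short | 1 μL |
| 10 μM BC-Long | 1 μL |
| 10 μM mK-GSP1 | 0.5 μL |
| 10 μM mL-GSP1 | 0.5 μL |
| 10 μM mG-GSP1 | 0.56 μL |
| 10 μM mM-GSP1 | 0.56 μL |
| ET-SSB (NEB) | 0.25 μL |
| BSA | 0.25 μL |
| Phusion | 0.2 μL |
| cDNA template | 2 μL |

| Step | Temperature | Time |
| --- | --- | --- |
| Initial | 95° C. | 5′ |
| 18 cycles | 98° C. | 30″ |
| | 62° C. | 30″ |
| | 72° C. | 45″ |
| Final extension | 72° C. | 5′ |
| Hold | 10° C. | hold |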

The reactions from PCR1 were then diluted 50× and used as a template in 3 separate PCR2 reactions, one for kappa and lambda light chains, one for mu heavy chain, and one for gamma heavy chain.

The following PCR2 Phusion (Thermo Scientific) reaction mixes were set up per PCR1 reaction:

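| Reagent | Volume |
| --- | --- |
| H₂O | to 30 μL |
| 5x GC buffer | 6 μL |
| MgCl₂ | 0.18 μL |
| DMSO | 1 μL |
| dNTP | 0.6 μL |
| 10 μM 2FW | 1.2 μL |
| 10 μM mK and mL or mM-GSP2 | 0.6 μL |
| BSA | 0.3 μL |
| Phusion | 0.3 μL |
| Dil. PCR1 template | 2 μL |

| Step | Temperature | Time |
| --- | --- | --- |
| Initial | 95° C. | 5′ |
| 28 cycles | 98° C. | 30″ |
| | 65° C. | 30″ |
| | 72° C. | 45″ |
| Final extension | 72° C. | 5′ |
| Hold | 10° C. | hold |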

5 μL of PCR product was run on a gel (FIG. 10). Bands corresponding to kappa and lambda light chains and to mu heavy chain were clearly seen. Of the heavy chains, only mu was amplified, as the majority of B220+ B cells were expected to be naïve B cells, which are IgM+.

The immunoglobulin heavy and light chains thus amplified can be purified and prepared for next generation sequencing, such as, but not limited to, 454 sequencing. As this example used barcode adapter templates at concentrations of >1 copy per reaction container, a unique set of barcodes, rather than a single unique barcode, is incorporated into the nucleic acids in each reaction container. Paired immunoglobulin heavy and light chains can therefore be associated with each other by their sharing a unique set of barcodes, rather than a unique barcode.

Barcode adapter templates can also be used at a concentration such that, by limiting dilution, the majority of reaction containers that contain a barcode adapter template will contain it at 1 copy per reaction container. In this case, paired immunoglobulin heavy and light chains can be associated with each other by their sharing a unique barcode sequence.
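The association of heavy and light chains by a shared set of barcodes can be sketched as a grouping step over sequencing reads. The example below is a simplified illustration and not the analysis used in this disclosure: it assumes each read has already been reduced to a (barcode, chain, sequence) tuple, that every cDNA in a container is recovered with every barcode present in that container, and that identical sequences do not occur in different containers. All barcode and sequence labels are hypothetical placeholders.

```python
from collections import defaultdict


def pair_by_barcode_set(reads):
    """Group chains whose observed barcode sets are identical.

    reads: iterable of (barcode, chain, sequence) tuples, where chain is
    "heavy" or "light". Returns {frozenset_of_barcodes: {chain: [sequences]}}.
    """
    # Collect the set of barcodes observed on each distinct chain sequence.
    barcodes_seen = defaultdict(set)
    for barcode, chain, sequence in reads:
        barcodes_seen[(chain, sequence)].add(barcode)

    # Sequences carrying the same barcode set are assigned to one container.
    containers = defaultdict(lambda: {"heavy": [], "light": []})
    for (chain, sequence), barcodes in barcodes_seen.items():
        containers[frozenset(barcodes)][chain].append(sequence)
    return dict(containers)


# Hypothetical reads: container 1 received two barcodes, container 2 one.
reads = [
    ("BC1", "heavy", "H_A"), ("BC2", "heavy", "H_A"),
    ("BC1", "light", "L_A"), ("BC2", "light", "L_A"),
    ("BC3", "heavy", "H_B"), ("BC3", "light", "L_B"),
]

for barcode_set, chains in pair_by_barcode_set(reads).items():
    print(sorted(barcode_set), chains)
```

The limiting-dilution case with one barcode per container is the special case in which every barcode set has a single member. Real data would additionally need tolerance for sequencing errors and for incomplete recovery of a barcode set, which this sketch does not attempt.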

Example 8: Barcoding Nucleic Acids from Cells Using Barcode Adapter Template Beads in Droplets Made Using a Microfluidic Droplet Device

A microfluidic device to generate droplets as described in Example 7 is used, with the only difference being that the first sample loop contains both cells and barcode adapter template beads as made in Examples 1, 2 or 3.

A murine B220+ B cell population is FACS sorted and a cell and barcode adapter template bead suspension is prepared using 300 mM betaine with 10 mM NaCl and 0.5 mg/ml BSA as a suspension buffer. Cells are included at a concentration of 4,500 cells/μL and beads are used at a concentration of 60,000 beads/μL.

An RT mix is prepared as follows:

| Reagent | Volume |
| --- | --- |
| ddH₂O | 7.4 μL |
| 10X Thermopol DF | 36 μL |
| 1M DTT | 3.6 μL |
| 1M MgCl₂ | 4.3 μL |
| 50 μM oligo(dT) | 7.2 μL |
| NTP mix (from Life Technologies Megascript SP6 kit) | 57.6 μL |
| dNTP (NEB) | 18 μL |
| 10% Tween-20 | 1.2 μL |
| Ribolock (Thermo Scientific) | 14.4 μL |
| E. coli inorganic pyrophosphatase (NEB) | 28.8 μL |
| T7 RNAP (NEB) | 14.4 μL |
| T4gp32 (NEB) | 8.6 μL |
| Maxima H- RTase (Thermo Scientific) | 43.2 μL |
| Total volume | 244.8 μL |

The cell & barcoded bead suspension is loaded into one sample loop and the RT/barcoding/lysis mix is loaded into the other sample loop using syringes. The 4-way valves are switched so that the sample loops are in line with the pump, and all three pumps are activated. The two aqueous inputs are flowed at rates so that they mix at a 1:2 (cell & bead suspension:RT/barcoding/lysis mix) ratio. The aqueous and oil inputs are flowed at rates so that droplets that are ~50 μm in diameter are formed, and at a high enough flow rate so that cells and beads flow through the device. The emulsion is collected in a Sorenson Bioscience 0.2 mL PCR tube. After the sample has been created, it is first given a heat step (3 minutes at 55° C.) and then incubated for 2 hours at 42° C. to allow the RT/barcoding reaction to proceed. Following the barcoding reaction, the emulsion is broken using the “breaking non-bead emulsions” process described in Example 7. Subsequent PCR reactions are performed as in Example 7.

The immunoglobulin heavy and light chains thus amplified are purified and prepared for next generation sequencing, such as, but not limited to, 454 sequencing. As this example uses barcode adapter template beads at ~1 bead per reaction container, paired immunoglobulin heavy and light chains are paired by their shared use of a unique barcode sequence.

Barcode adapter template beads can also be used at a higher concentration, such that the majority of reaction containers will contain >1 bead. In this case, paired immunoglobulin heavy and light chains are associated with each other by their sharing a unique set of barcode sequences.
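Whether a droplet carries a single barcode or a set of barcodes depends on how many beads it receives. The sketch below estimates the bead-count distribution per droplet for the concentrations given in this example. It assumes Poisson-distributed, independent loading of cells and beads, spherical 50 μm droplets, and a threefold dilution of the cell and bead suspension by the 1:2 mixing ratio; none of these modeling assumptions is stated in the disclosure.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson mean of lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Volume of a 50 um spherical droplet in microliters (1 uL = 1e9 um^3).
DROPLET_UL = (math.pi / 6.0) * 50.0 ** 3 / 1e9
SUSPENSION_FRACTION = 1.0 / 3.0   # 1:2 suspension : RT/barcoding/lysis mix

mean_cells = 4500.0 * SUSPENSION_FRACTION * DROPLET_UL    # cells/uL from text
mean_beads = 60000.0 * SUSPENSION_FRACTION * DROPLET_UL   # beads/uL from text

p_no_bead = poisson_pmf(0, mean_beads)
p_one_bead = poisson_pmf(1, mean_beads)
p_multi_bead = 1.0 - p_no_bead - p_one_bead

# Droplets usable for pairing hold exactly one cell and at least one bead.
p_usable = poisson_pmf(1, mean_cells) * (1.0 - p_no_bead)

print(f"mean beads per droplet: {mean_beads:.2f}")
print(f"P(0 beads)   = {p_no_bead:.3f}")
print(f"P(1 bead)    = {p_one_bead:.3f}")
print(f"P(>= 2 beads) = {p_multi_bead:.3f}")
print(f"P(one cell and >= 1 bead) = {p_usable:.4f}")
```

Under these assumptions the mean is about 1.3 beads per droplet, in line with the "~1 bead per reaction container" figure read as an average. A substantial share of droplets would still receive two or more beads, which is the situation handled by pairing on a shared set of barcodes.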

Example 9: Barcoding Nucleic Acids from Cells Using Barcode Adapters Amplified from Barcode Adapter Template Beads with a DNA Polymerase

A microfluidic device to generate droplets as described in Example 7 is used, with the only difference being that the first sample loop contains both cells and barcode adapter template beads as made in Examples 1, 2 or 3. In this example, the barcode adapter template beads comprise a 5′ Nt.BbvCI nicking endonuclease sequence rather than a T7 RNAP promoter sequence to allow for amplification of barcode adapters by a DNA polymerase.

A murine B220+ B cell population is FACS sorted and a cell and barcode adapter template bead suspension is prepared using 300 mM betaine with 10 mM NaCl and 0.5 mg/ml BSA as a suspension buffer. Cells are included at a concentration of 4,500 cells/μL and beads are used at a concentration of 60,000 beads/μL.

An RT mix is prepared as follows:

| Reagent | Volume |
| --- | --- |
| ddH₂O | 32.7 μL |
| 10X Thermopol DF | 36 μL |
| 1M DTT | 3.6 μL |
| 1M MgCl₂ | 4.3 μL |
| 50 μM oligo(dT) | 7.2 μL |
| dNTP (NEB) | 36 μL |
| 10% Tween-20 | 1.2 μL |
| Ribolock (Thermo Scientific) | 14.4 μL |
| E. coli inorganic pyrophosphatase (NEB) | 28.8 μL |
| Nt.BbvCI (NEB) | 14.4 μL |
| Klenow exo− (NEB) | 14.4 μL |
| T4gp32 (NEB) | 8.6 μL |
| Maxima H- RTase (Thermo Scientific) | 43.2 μL |
| Total volume | 244.8 μL |

The cell & barcoded bead suspension is loaded into one sample loop and the RT/barcoding/lysis mix is loaded into the other sample loop using syringes. The 4-way valves are switched so that the sample loops are in line with the pump, and all three pumps are activated. The two aqueous inputs are flowed at rates so that they mix at a 1:2 (cell & bead suspension:RT/barcoding/lysis mix) ratio. The aqueous and oil inputs are flowed at rates so that droplets that are ~50 μm in diameter are formed, and at a high enough flow rate so that cells and beads flow through the device. The emulsion is collected in a Sorenson Bioscience 0.2 mL PCR tube. After the sample has been created, it is first given a heat step (3 minutes at 55° C.) and then incubated for 2 hours at 42° C. to allow the RT/barcoding reaction to proceed. Following the barcoding reaction, the emulsion is broken using the “breaking non-bead emulsions” process described in Example 7. Subsequent PCR reactions are performed as in Example 7.

The immunoglobulin heavy and light chains thus amplified are purified and prepared for next generation sequencing, such as, but not limited to, 454 sequencing. As this example uses barcode adapter template beads at ~1 bead per reaction container, paired immunoglobulin heavy and light chains are paired by their shared use of a unique barcode sequence.

Barcode adapter template beads can also be used at a higher concentration, such that the majority of reaction containers will contain >1 bead. In this case, paired immunoglobulin heavy and light chains are associated with each other by their sharing a unique set of barcode sequences.

Example 10: Barcoding Nucleic Acids from Cells Using Barcode Adapter Templates in Multi-Well Reaction Containers

Barcode adapter templates with a composition as in FIG. 1 are synthesized as duplex oligos from a vendor such as IDT. Each unique barcode adapter template is kept in a different storage container such that there is no mixing or cross-contamination of barcode sequences. Activated B cells (plasmablasts) are single cell sorted using a FACS Aria II (Becton Dickinson) into 10 μL of a lysis buffer in each well of a 96-well plate. The composition of the buffer in each well is:

| Reagent | Volume |
| --- | --- |
| 10 mM Tris pH 8.0 | to 10 μL |
| 10x MMLV buffer | 1 μL |
| 100 mM MgCl₂ | 0.3 μL |
| 1M DTT | 0.015 μL |
| 100x BSA (NEB) | 0.075 μL |
| dNTP | 0.5 μL |
| 10 μM Oligo(dT)₂₅ | 0.5 μL |
| 20% IGEPAL-630 (Sigma) | 0.15 μL |
| 1 μM barcode adapter template | 0.25 μL |
| Ribolock (Thermo Scientific) | 0.4 μL |
| Maxima H- RTase (Thermo Scientific) | 0.25 μL |

The plate is then incubated at 55° C. for 3 minutes, then incubated at 42° C. for 2 hours for the RT/barcoding reaction to occur. The reactions in all wells of the 96-well plate are then pooled together and cleanup is performed using a modified traditional phenol/chloroform method:

- 1. 400 μL of Phenol/chloroform/isoamyl alcohol (Sigma) is added to the pooled reactions and shaken vigorously in pre-spun Gel Phase Lock tubes (5Prime)
- 2. Gel Phase Lock tubes are centrifuged at 14,000 g for 3 minutes and the top aqueous fraction is pipetted into Amicon 100 kDa columns (Millipore) and spun at 14,000 g for 3 minutes
- 3. Step 2 is repeated as necessary to get the entire aqueous volume spun through the Amicon column
- 4. 450 μL TE (10 mM Tris, pH 8.0, 1 mM EDTA) is then pipetted into the Amicon column, and spun at 14,000 g for 3 minutes
- 5. 450 μL of 10 mM Tris (pH8.0) is then pipetted into the Amicon column and spun at 14,000 g for 5 minutes
- 6. The Amicon column is inverted into a new collection tube and spun at 1000 g for 2 minutes to collect the eluate, which contains the purified mRNA/1st strand cDNA duplex

The following PCR1 Phusion (Thermo Scientific) reaction mix is set up:

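| Reagent | Volume |
| --- | --- |
| H₂O | 11.28 μL |
| 5x GC buffer | 5 μL |
| MgCl₂ | 0.15 μL |
| DMSO | 1 μL |
| dNTP | 0.5 μL |
| 10 μM FW1-short | 1 μL |
| 10 μM FW-Long | 1 μL |
| 10 μM K-GSP1 | 0.56 μL |
| 10 μM L-GSP1 | 1.25 μL |
| 10 μM G-GSP1 | 0.56 μL |
| ET-SSB (NEB) | 0.25 μL |
| BSA | 0.25 μL |
| Phusion | 0.2 μL |
| cDNA template | 2 μL |

| Step | Temperature | Time |
| --- | --- | --- |
| Initial | 95° C. | 5′ |
| 18 cycles | 98° C. | 30″ |
| | 62° C. | 30″ |
| | 72° C. | 45″ |
| Final extension | 72° C. | 5′ |
| Hold | 10° C. | hold |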

The reaction from PCR1 is then diluted 50× and used as a template in 3 separate PCR2 reactions, one for kappa light chain, one for lambda light chain and one for gamma heavy chain.

The following PCR2 Phusion (Thermo Scientific) reaction mixes are set up:

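| Reagent | Volume |
| --- | --- |
| H₂O | 17.82 μL |
| 5x GC buffer | 6 μL |
| MgCl₂ | 0.18 μL |
| DMSO | 1 μL |
| dNTP | 0.6 μL |
| 10 μM 2FW | 1.2 μL |
| 10 μM K or L or G-GSP2 | 0.6 μL |
| BSA | 0.3 μL |
| Phusion | 0.3 μL |
| Dil. PCR1 template | 2 μL |

| Step | Temperature | Time |
| --- | --- | --- |
| Initial | 95° C. | 5′ |
| 23 or 28 cycles | 98° C. | 30″ |
| | 65° C. | 30″ |
| | 72° C. | 45″ |
| Final extension | 72° C. | 5′ |
| Hold | 10° C. | hold |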

The immunoglobulin heavy and light chains thus amplified are purified and prepared for next generation sequencing, such as, but not limited to, 454 sequencing. As this example uses a unique barcode adapter template individually pipetted into each reaction container (in this case, wells of a 96-well plate), paired immunoglobulin heavy and light chains are bioinformatically paired by their shared use of a unique barcode sequence.
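Because each well receives a known barcode adapter template, the bioinformatic pairing in this example amounts to a lookup from barcode to well followed by grouping. The following is a minimal sketch under the assumption that reads have already been reduced to (barcode, chain, sequence) tuples. The barcode strings, sequence labels, and function names are hypothetical placeholders and do not come from this disclosure.

```python
from collections import defaultdict
from string import ascii_uppercase


def plate_wells(rows: int = 8, cols: int = 12) -> list:
    """Well names of a plate in row-major order, e.g. A1..H12 for 96 wells."""
    return [f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)]


def pair_chains_by_well(reads, barcode_to_well):
    """Return wells for which both a heavy and a light chain were recovered.

    reads: iterable of (barcode, chain, sequence); chain is "heavy" or "light".
    barcode_to_well: dict mapping each known barcode to its well name.
    """
    wells = defaultdict(lambda: defaultdict(set))
    for barcode, chain, sequence in reads:
        well = barcode_to_well.get(barcode)
        if well is None:
            continue  # barcode not assigned to any well; skip the read
        wells[well][chain].add(sequence)
    return {
        well: {chain: sorted(seqs) for chain, seqs in chains.items()}
        for well, chains in wells.items()
        if {"heavy", "light"} <= set(chains)
    }


# Hypothetical barcode assignment for the first three wells of the plate.
barcode_to_well = dict(zip(["AAAC", "CCGT", "GGTA"], plate_wells()))

reads = [
    ("AAAC", "heavy", "H_1"), ("AAAC", "light", "L_1"),
    ("CCGT", "heavy", "H_2"),                      # no light chain recovered
    ("GGTA", "light", "L_3"), ("GGTA", "heavy", "H_3"),
    ("TTTT", "heavy", "H_x"),                      # unknown barcode
]

for well, chains in pair_chains_by_well(reads, barcode_to_well).items():
    print(well, chains)
```

Wells with only one chain type are dropped here for simplicity; a fuller analysis might retain them and might also correct barcodes that differ from a known barcode by a sequencing error.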

Claims

1. A structure comprising: a plurality of single-stranded DNA barcode adaptors attached to a solid support, wherein each barcode adaptor in the plurality includes a barcode sequence, a universal priming sequence, and an RNA binding site.

2. The structure of claim 1, wherein the RNA binding site is a polyT tract.

3. The structure of claim 1, wherein the RNA binding site comprises a sequence complementary to at least one sequence region in one or more mRNAs.

4. The structure of claim 1, wherein the barcode adaptor is attached to the solid support via the 5′ end of the barcode adaptor.

5. The structure of claim 1, wherein the barcode adaptor is attached to the solid support via a thiol group.

6. A method for producing one or more polynucleotides of interest, the method comprising:

a. providing a plurality of barcode adaptors, each barcode adaptor including a single-stranded DNA sequence with a barcode sequence, a universal priming sequence, a UMI, and an RNA binding site;

b. obtaining a plurality of RNA molecules associated with one or more samples;

c. adding the plurality of adapter molecules to the RNA molecules associated with the sample,

d. performing reverse transcription on the RNA molecules from the cells to generate cDNA molecules including the barcode sequence;

e. collecting the cDNA molecules including the barcode sequences; and

f. sequencing the cDNA molecules including the barcode sequences.

7. The method of claim 6, wherein the RNA molecules associated with a sample are in a separate reaction volume.

8. The method of claim 6, wherein the RNA binding site is a polyT tract.

9. The method of claim 6, wherein the RNA binding site comprises a sequence complementary to at least one of the RNA molecules.

10. The method of claim 6, wherein the barcode adaptor is attached to a solid support.

11. The method of claim 10, wherein the barcode adaptor is attached to the solid support via the 5′ end of the barcode adaptor.

12. The method of claim 10, wherein the barcode adaptor is attached to the solid support via avidin, streptavidin, biotin, gold, a thiol group, a carboxyl group, an epoxy group, a hydroxyl group or any combination thereof.