METHODS AND SYSTEMS FOR SPATIAL MAPPING OF GENETIC VARIANTS

Info

Publication number: 20220049303
Type: Application
Filed: Aug 16, 2021
Publication Date: Feb 17, 2022
Inventors: Michele A. Busby (Needham, MA), Evan Daugharthy (Cambridge, MA), Richard Terry (Carlisle, MA)
Application Number: 17/403,405

Abstract

The present disclosure provides methods and systems for determining the location and identity of variant sequences. The present disclosure also provides methods of treating diseases and of using spatial information of variant sequences in a biological sample to guide treatment selection in a corresponding subject.

Description

CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/066,604, filed Aug. 17, 2020, which is entirely incorporated herein by reference.

BACKGROUND

Cancer immunotherapies utilize immune cells to detect and eliminate cancerous cells from the body. Immune cells often identify cancerous cells due to the presentation of tumor antigens on the cancer cell surface by major histocompatibility complex (MHC) proteins. In some instances, the efficacy of a cancer immunotherapy can be dependent on the expression of particular antigens and/or MHC proteins by cancerous cells. Antigens and MHC proteins can be expressed heterogeneously within a tumor volume leading to the presence of a single tumor with several clonal populations of cancer cells. Cancer treatments targeting clonal populations expressed throughout the volume of a tumor may improve treatment efficacy.

SUMMARY

Recognized herein is a need to spatially map the expression of antigens/major histocompatibility complex (MHC) proteins. The methods and compositions provided herein can inform the selection of cancer immunotherapies that are more likely to be effective for a particular patient.

In an aspect, the present disclosure provides a method of analyzing a spatial distribution of a first human leukocyte antigen (HLA) variant sequence in a biological sample comprising: (a) obtaining a biological sample comprising a nucleic acid corresponding to the first HLA variant sequence from a subject; (b) hybridizing a first probe comprising an HLA targeting sequence to the nucleic acid corresponding to the first HLA variant sequence; (c) identifying at least a portion of the first probe; and (d) determining a location of the first HLA variant sequence within the biological sample by determining a location of the first probe.

In some embodiments, identifying at least a portion of the first probe comprises sequencing at least a portion of the first probe in situ. In some embodiments, determining a location of the first probe comprises sequencing at least a portion of the first probe in situ. In some embodiments, determining a location of the first HLA variant sequence further comprises identifying the first HLA variant sequence. In some embodiments, the first HLA variant sequence comprises an HLA allele. In some embodiments, identifying the first HLA variant sequence comprises identifying the first probe. In some embodiments, the method further comprises providing the biological sample within a three-dimensional (3D) matrix that preserves spatial information of the first HLA variant sequence prior to operations (c) and (d). In some embodiments, providing the biological sample within the 3D matrix comprises generating the 3D matrix. In some embodiments, the method further comprises immobilizing the first probe on the 3D matrix. In some embodiments, the method further comprises immobilizing the nucleic acid corresponding to the first HLA variant sequence on the 3D matrix. In some embodiments, the biological sample is provided within the 3D matrix by directing a precursor of the 3D matrix through the biological sample and subjecting the precursor of the 3D matrix to a reaction to generate cross-links and form the 3D matrix. In some embodiments, the cross-links comprise chemical crosslinks. In some embodiments, the cross-links comprise physical crosslinks. In some embodiments, the reaction comprises free-radical polymerization. In some embodiments, the reaction comprises a chemical conjugation reaction. In some embodiments, the reaction comprises a bioconjugation reaction. In some embodiments, the reaction comprises a photopolymerization reaction.

In some embodiments, the biological sample comprises a second nucleic acid corresponding to a second variant sequence and the method further comprises: (A) hybridizing a second probe comprising a second nucleic acid targeting sequence to the second nucleic acid corresponding to the second variant sequence; (B) identifying at least a portion of the second probe; and (C) determining a location of the second variant sequence within the biological sample by determining a location of the second probe. In some embodiments, the second variant sequence comprises a mutation. In some embodiments, the mutation is associated with an increased risk of cancer. In some embodiments, the mutation is associated with a tumor antigen. In some embodiments, the mutation is associated with a cancer/testis antigen. In some embodiments, the mutation is associated with an oncofetal protein. In some embodiments, the mutation is a tumor mutation. In some embodiments, the mutation is associated with a tumor suppressor protein. In some embodiments, the mutation is associated with a neoantigen. In some embodiments, the method further comprises generating a visual representation of the location of the first HLA variant sequence and the location of the second variant sequence for display on a graphical user interface (GUI). In some embodiments, the method further comprises detecting a clone within the biological sample by comparing the location of the first HLA variant sequence and the location of the second variant sequence. In some embodiments, the method further comprises generating a visual representation of a location of the clone within the biological sample for display on a graphical user interface (GUI). In some embodiments, the method further comprises identifying a cell or derivative thereof within the biological sample, wherein the cell or derivative thereof comprises the first HLA variant sequence and the second variant sequence.
In some embodiments, the method further comprises predicting the presentation of a peptide on a major histocompatibility complex (MHC) protein expressed in the biological sample, wherein the peptide is at least partially encoded by the second variant sequence and the MHC protein is at least partially encoded by the HLA variant sequence. In some embodiments, the peptide is a mutant peptide. In some embodiments, the peptide is associated with an increased risk of cancer. In some embodiments, the method further comprises selecting a treatment to be administered to the subject, wherein: the treatment comprises administration of a cell to the subject; and the cell comprises a cell receptor that recognizes the peptide on the MHC protein expressed in the biological sample. In some embodiments, the cell is a T-cell, a B cell, or a natural killer T (NKT) cell. In some embodiments, the cell is a recombinant T-cell. In some embodiments, the cell expresses a chimeric antigen receptor. In some embodiments, the cell expresses a recombinant T cell receptor. In some embodiments, the method further comprises selecting a treatment to be administered to the subject, wherein the treatment is more likely to be effective in a subject with one or more cancer cells presenting the peptide on the MHC protein expressed in the biological sample than it is in a subject without the one or more cancer cells that present the peptide on the MHC protein expressed in the biological sample. In some embodiments, the treatment is an immunotherapy. In some embodiments, the treatment comprises administration of a checkpoint inhibitor to the subject. 
In some embodiments, the method further comprises selecting a treatment to be administered to the subject, wherein: the treatment comprises administration of the peptide; and the treatment is more likely to be effective in a subject with one or more cancer cells expressing the MHC protein expressed in the biological sample than it is in a subject without one or more cancer cells that express the MHC protein. In some embodiments, the biological sample further comprises a third nucleic acid, and wherein the third nucleic acid corresponds to a second HLA variant sequence, the method further comprising: (1) hybridizing a third probe comprising a second HLA targeting sequence to the third nucleic acid; (2) identifying at least a portion of the third probe; and (3) determining a location of the second HLA variant sequence within the biological sample by determining a location of the third probe. In some embodiments, a plurality of additional probes (e.g., fourth, fifth, sixth, etc.) are used to detect various genes encoding antigens, inflammatory markers, or cell typing markers within a biological sample, such as a panel of genes for assaying expression of antigens, inflammatory markers, or cell typing markers.

In some embodiments, the method further comprises, prior to operation (b): (I) obtaining a genetic profile of the subject; (II) detecting a presence or absence of a first HLA allele in the subject by analyzing the genetic profile. In some embodiments, the first HLA variant sequence comprises the first HLA allele detected in the genetic profile. In some embodiments, the first HLA allele comprises a mutation. In some embodiments, the first HLA allele is a gene variant. In some embodiments, the method further comprises identifying a first group of HLA alleles, wherein the first group of HLA alleles are expressed in the biological sample, and wherein the first probe is designed to hybridize to a nucleic acid corresponding to one of the HLA alleles of the first group of alleles. In some embodiments, the first probe discriminates between two alleles of the first group of HLA alleles.
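By way of illustration only, the selection of a targeting region that discriminates between two alleles may be sketched as follows. The sketch assumes two allele sequences that have already been aligned to equal length; the short sequences shown are made-up toy strings and are not real HLA sequences, and the function names `differing_positions` and `discriminating_window` are hypothetical. The sketch locates the positions at which the two sequences differ and extracts a window centered on one such position.

```python
def differing_positions(allele_a: str, allele_b: str) -> list[int]:
    """Return 0-based positions at which two pre-aligned sequences differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


def discriminating_window(allele: str, position: int, flank: int) -> str:
    """Return the bases of `allele` within `flank` bases on each side of
    `position`, clipped at the ends of the sequence."""
    start = max(0, position - flank)
    end = min(len(allele), position + flank + 1)
    return allele[start:end]


# Toy sequences for illustration only (NOT real HLA alleles).
ALLELE_1 = "ACGTTGCAAGGCTTACGATC"
ALLELE_2 = "ACGTTGCATGGCTTACGTTC"

diffs = differing_positions(ALLELE_1, ALLELE_2)

# Windows centered on the first differing position; a probe targeting
# sequence complementary to one window would mismatch the other allele.
window_1 = discriminating_window(ALLELE_1, diffs[0], flank=4)
window_2 = discriminating_window(ALLELE_2, diffs[0], flank=4)
```

In practice, the choice among candidate windows may also depend on factors not modeled here, such as melting temperature and similarity to other expressed sequences.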

In some embodiments, the method further comprises, prior to operation (b): (I) obtaining a genetic profile of the subject; (II) detecting a plurality of HLA alleles in the subject by analyzing the genetic profile; wherein the first probe preferentially hybridizes to a nucleic acid corresponding to only one of the HLA alleles detected in the genetic profile. In some embodiments, the genetic profile is generated via ribonucleic acid (RNA) sequencing. In some embodiments, the genetic profile is generated via exome sequencing. In some embodiments, the first HLA variant sequence is a class I HLA allele. In some embodiments, the first HLA variant sequence is a class II HLA allele. In some embodiments, the first HLA variant sequence is HLA-A*01:01. In some embodiments, the first HLA variant sequence is HLA-A*02:01. In some embodiments, the first HLA variant sequence is HLA-B*44:02. In some embodiments, the first HLA variant sequence is HLA-C*07:01. In some embodiments, the first HLA variant sequence is HLA-C*08:02. In some embodiments, the first HLA variant sequence is HLA-DPA1. In some embodiments, the first HLA variant sequence is HLA-DPB1*01. In some embodiments, the first HLA variant sequence is HLA-DQA1. In some embodiments, the first HLA variant sequence is HLA-DQB1. In some embodiments, the first HLA variant sequence is HLA-DRB1. In some embodiments, the first HLA variant sequence is HLA-DRA. In some embodiments, the nucleic acid corresponding to the first HLA variant sequence is a deoxyribonucleic acid (DNA) molecule. In some embodiments, the nucleic acid corresponding to the first HLA variant sequence is an RNA molecule. In some embodiments, the method further comprises, prior to operation (b), reverse transcribing RNA expressed in the biological sample to form complementary deoxyribonucleic acid (cDNA), wherein the cDNA comprises the nucleic acid corresponding to the first HLA variant sequence. In some embodiments, the biological sample is a tissue biopsy.
In some embodiments, the biological sample is a tumor biopsy. In some embodiments, the biological sample is biological tissue. In some embodiments, the biological sample is a surgical resection. In some embodiments, the biological sample is a tumor. In some embodiments, the biological sample is a blood sample.

In some embodiments, the method further comprises, prior to operation (b), generating a section, wherein the section comprises a portion of the biological sample. In some embodiments, the method further comprises, prior to operation (c) subjecting the first probe to an amplification reaction to generate an amplified nucleic acid molecule that corresponds to the first HLA variant sequence. In some embodiments, identifying at least a portion of the first probe comprises identifying at least a portion of the amplified nucleic acid molecule. In some embodiments, determining the location of the first probe comprises determining a location of the amplified nucleic acid molecule. In some embodiments, the method further comprises, prior to operation (c) subjecting the first probe to an amplification reaction to generate an amplified nucleic acid molecule that corresponds to the first HLA variant sequence and immobilizing the amplified nucleic acid molecule on the 3D matrix. In some embodiments, the first probe is a circularizable probe. In some embodiments, the circularizable probe is a padlock probe. In some embodiments, the padlock probe comprises: a first end; a second end; a 5′ terminal region; and a 3′ terminal region; and wherein the 5′ terminal region and the 3′ terminal region hybridize to the nucleic acid corresponding to the first HLA variant sequence. In some embodiments, the method further comprises circularizing the padlock probe by ligating the first end and the second end of the padlock probe together, thereby generating a circularized padlock probe. In some embodiments, the first end and the second end are contiguous. In some embodiments, the first end and the second end are separated by a gap region containing at least one nucleotide. In some embodiments, the gap region contains from 2 to 500 nucleotides. In some embodiments, the method further comprises filling the gap region by incorporating at least one nucleotide in an extension reaction. 
In some embodiments, the amplification reaction is a rolling circle amplification (RCA) reaction. In some embodiments: the nucleic acid corresponding to the first HLA variant sequence is a DNA molecule hybridized to an RNA molecule; the nucleic acid corresponding to the first HLA variant sequence comprises a first sequence; the RNA molecule comprises a second sequence; the first sequence is the reverse complement of the second sequence; the method further comprises, prior to (b); (i) degrading or digesting at least a portion of the RNA molecule; and the second sequence is identified based on the identification of at least a portion of the amplified nucleic acid sequence. In some embodiments, the DNA molecule is a cDNA molecule. In some embodiments, the biological sample is present in a 3D matrix; and the DNA molecule is immobilized to the 3D matrix. In some embodiments, the biological sample is present in a 3D matrix; and the first probe is immobilized to the 3D matrix. In some embodiments, the method further comprises administering a treatment to the subject, wherein the treatment is selected for administration to the subject based at least partially on the spatial distribution of the first HLA variant sequence in the biological sample. In some embodiments, the treatment comprises an immunotherapy. In some embodiments, the treatment comprises a checkpoint inhibitor. In some embodiments, the treatment comprises a cancer vaccine. In some embodiments, the treatment comprises a chimeric antigen receptor T-cell therapy. In some embodiments, the treatment comprises a recombinant T-cell therapy.

In another aspect, the present disclosure provides a method of identifying a location of a human leukocyte antigen (HLA) allele in a biological sample comprising targeting a nucleobase to a nucleic acid molecule encoding the HLA allele in the biological sample and identifying a sequence of the nucleic acid molecule or derivative thereof in situ to identify the location of the HLA allele within the biological sample.

In another aspect, the present disclosure provides a method of identifying a location of a human leukocyte antigen (HLA) allele in a biological sample comprising targeting a nucleic acid probe molecule to a nucleic acid molecule encoding the HLA allele in the biological sample and identifying a sequence of the nucleic acid molecule or derivative thereof in situ to identify the location of the HLA allele within the biological sample.

In another aspect, the present disclosure provides a method for analyzing a spatial distribution of a human leukocyte antigen (HLA) variant sequence in a biological sample from a subject, comprising (a) obtaining a genetic profile of the subject and detecting a presence or absence of the HLA variant sequence in the subject by analyzing the genetic profile; (b) hybridizing a first probe comprising an HLA targeting sequence to a first nucleic acid in the biological sample corresponding to the HLA variant sequence; (c) identifying at least a portion of the first probe; and (d) determining a location of the HLA variant sequence within the biological sample by determining a location of the first probe, wherein the first probe preferentially hybridizes to the first nucleic acid corresponding to the HLA variant sequence detected in the genetic profile. In some embodiments, the genetic profile is generated via RNA sequencing. In some embodiments, the genetic profile is generated via exome sequencing. In some embodiments, the biological sample comprises a second nucleic acid and the method further comprises hybridizing a second probe comprising another nucleic acid targeting sequence to the second nucleic acid, identifying at least a portion of the second probe; and determining a location of the second nucleic acid within the biological sample by determining a location of the second probe.

In another aspect, the present disclosure provides a method of analyzing a biological sample, comprising (a) obtaining the biological sample comprising a first nucleic acid and a second nucleic acid, wherein the first nucleic acid corresponds to an HLA variant sequence from a subject; (b) hybridizing a first probe comprising an HLA targeting sequence to the first nucleic acid corresponding to the HLA variant sequence and hybridizing a second probe comprising another nucleic acid targeting sequence to the second nucleic acid; (c) identifying at least a portion of the first probe and at least a portion of the second probe; and (d) determining a location of the HLA variant sequence and a location of the second nucleic acid within the biological sample.

In some embodiments, the second nucleic acid corresponds to an additional HLA variant sequence. In some embodiments, the second nucleic acid comprises a mutation. In some embodiments, the mutation is associated with an increased risk of cancer. In some embodiments, the mutation is associated with a cancer/testis antigen. In some embodiments, the mutation is associated with an oncofetal protein. In some embodiments, the mutation is a tumor mutation. In some embodiments, the mutation is associated with a tumor suppressor protein. In some embodiments, the mutation is associated with a neoantigen.

In some embodiments, the second nucleic acid is associated with a tumor antigen. In some embodiments, the second nucleic acid is associated with a marker of inflammation. In some embodiments, the second nucleic acid is associated with a marker for cell typing. In some embodiments, the second nucleic acid is associated with a marker for an immune cell or a cancer cell. In some embodiments, the method further comprises generating a visual representation of the location of the HLA variant sequence and the location of the second nucleic acid for display on a graphical user interface (GUI). In some embodiments, the method further comprises detecting a clone within the biological sample by comparing the location of the HLA variant sequence and a location of the additional HLA variant sequence. In some embodiments, the method further comprises generating a visual representation of a location of the clone within the biological sample for display on a graphical user interface (GUI). In some embodiments, the method further comprises identifying a cell or derivative thereof within the biological sample, wherein the cell or derivative thereof comprises the HLA variant sequence and the second nucleic acid. In some embodiments, the method further comprises predicting the presentation of a peptide on a major histocompatibility complex (MHC) protein expressed in the biological sample, wherein the peptide is at least partially encoded by the second nucleic acid and the MHC protein is at least partially encoded by the HLA variant sequence.

In some embodiments, the peptide is a mutant peptide. In some embodiments, the peptide is associated with an increased risk of cancer. In some embodiments, identifying at least a portion of the first probe comprises sequencing at least a portion of the first probe in situ. In some embodiments, determining a location of the first probe comprises sequencing at least a portion of the first probe in situ. In some embodiments, determining a location of the HLA variant sequence further comprises identifying the HLA variant sequence. In some embodiments, the HLA variant sequence comprises an HLA allele. In some embodiments, the additional HLA variant sequence comprises an HLA allele. In some embodiments, identifying the HLA variant sequence comprises identifying the first probe.

In some embodiments, the method further comprises providing the biological sample within a three-dimensional (3D) matrix that preserves spatial information of the HLA variant sequence prior to operation (b). In some embodiments, providing the biological sample within the 3D matrix comprises generating the 3D matrix. In some embodiments, the method further comprises immobilizing the first probe on the 3D matrix. In some embodiments, the method further comprises immobilizing the first nucleic acid on the 3D matrix. In some embodiments, the biological sample is provided within the 3D matrix by directing a precursor of the 3D matrix through the biological sample and subjecting the precursor of the 3D matrix to a reaction to generate cross-links and form the 3D matrix. In some embodiments, the cross-links comprise chemical crosslinks. In some embodiments, the cross-links comprise physical crosslinks. In some embodiments, the reaction comprises free-radical polymerization. In some embodiments, the reaction comprises a chemical conjugation reaction. In some embodiments, the reaction comprises a bioconjugation reaction. In some embodiments, the reaction comprises a photopolymerization reaction.

In some embodiments, the HLA variant sequence is a class I HLA allele. In some embodiments, the HLA variant sequence is a class II HLA allele. In some embodiments, the HLA variant sequence is HLA-A*01:01. In some embodiments, the HLA variant sequence is HLA-A*02:01. In some embodiments, the HLA variant sequence is HLA-B*44:02. In some embodiments, the HLA variant sequence is HLA-C*07:01. In some embodiments, the HLA variant sequence is HLA-C*08:02. In some embodiments, the HLA variant sequence is HLA-DPA1. In some embodiments, the HLA variant sequence is HLA-DPB1*01. In some embodiments, the HLA variant sequence is HLA-DQA1. In some embodiments, the HLA variant sequence is HLA-DQB1. In some embodiments, the HLA variant sequence is HLA-DRB1. In some embodiments, the HLA variant sequence is HLA-DRA. In some embodiments, the first nucleic acid corresponding to the HLA variant sequence is a DNA molecule. In some embodiments, the first nucleic acid corresponding to the HLA variant sequence is an RNA molecule. In some embodiments, the second nucleic acid is a DNA molecule. In some embodiments, the second nucleic acid is an RNA molecule.

In some embodiments, the biological sample is a tissue biopsy. In some embodiments, the biological sample is a tumor biopsy. In some embodiments, the biological sample is biological tissue. In some embodiments, the biological sample is a surgical resection. In some embodiments, the biological sample is a tumor. In some embodiments, the biological sample is a blood sample. In some embodiments, the method further comprises, prior to hybridizing the first probe, generating a section, wherein the section comprises a portion of the biological sample. In some embodiments, the method further comprises, prior to identifying the at least a portion of the first probe, subjecting the first probe to an amplification reaction to generate an amplified nucleic acid molecule that corresponds to the HLA variant sequence. In some embodiments, identifying the at least a portion of the first probe comprises identifying at least a portion of the amplified nucleic acid molecule. In some embodiments, determining the location of the first probe comprises determining a location of the amplified nucleic acid molecule. In some embodiments, the method further comprises, subjecting the first probe to an amplification reaction to generate an amplified nucleic acid molecule that corresponds to the HLA variant sequence. In some embodiments, the first probe is a circularizable probe. In some embodiments, the circularizable probe is a padlock probe. In some embodiments, the padlock probe comprises: a first end, a second end, a 5′ terminal region, and a 3′ terminal region. In some embodiments, the 5′ terminal region and the 3′ terminal region hybridize to the first nucleic acid corresponding to the HLA variant sequence.

In some embodiments, the method further comprises circularizing the padlock probe by ligating the first end and the second end of the padlock probe together, thereby generating a circularized padlock probe. In some embodiments, the first end and the second end are contiguous. In some embodiments, the first end and the second end are separated by a gap region containing at least one nucleotide. In some embodiments, the method further comprises filling the gap region by incorporating at least one nucleotide in an extension reaction. In some embodiments, the amplification reaction is a rolling circle amplification (RCA) reaction. In some embodiments, the method further comprises contacting the biological sample with a plurality of fluorescently labeled oligonucleotides directly or indirectly to identify at least a portion of the first probe. In some embodiments, the method further comprises contacting the biological sample with a plurality of fluorescently labeled oligonucleotides directly or indirectly to identify at least a portion of the second probe.
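By way of illustration only, the geometry of a padlock probe whose terminal regions hybridize on either side of an optional gap region may be sketched as follows. The target sequence is a made-up toy string, the function names `reverse_complement` and `padlock_arms` and the `junction`, `arm_len`, and `gap` parameters are hypothetical, and the backbone between the two terminal regions is omitted. The sketch assumes standard antiparallel base pairing, under which the 5′ terminal region pairs with the target immediately upstream of the junction and the 3′ terminal region pairs with the target downstream of the gap.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def padlock_arms(target: str, junction: int, arm_len: int, gap: int = 0):
    """Return (five_prime_arm, three_prime_arm, gap_fill), each written 5'->3'.

    `target` is written 5'->3'. The 5' arm pairs with the `arm_len` target
    bases ending just before index `junction`; the 3' arm pairs with the
    `arm_len` bases starting `gap` bases after `junction`. `gap_fill` is the
    sequence an extension reaction would add to the probe's 3' end.
    """
    up_start = junction - arm_len
    down_start = junction + gap
    down_end = down_start + arm_len
    if gap < 0 or up_start < 0 or down_end > len(target):
        raise ValueError("arms and gap must fit within the target")
    upstream = target[up_start:junction]
    gap_seq = target[junction:down_start]
    downstream = target[down_start:down_end]
    return (
        reverse_complement(upstream),
        reverse_complement(downstream),
        reverse_complement(gap_seq),
    )


# Toy target for illustration only (NOT a real HLA sequence).
TARGET = "ACGTTGCAAGGCTTACGATC"
five_arm, three_arm, fill = padlock_arms(TARGET, junction=9, arm_len=5, gap=2)

# After gap filling and ligation, reading the circle from the 3' arm through
# the filled gap into the 5' arm gives the reverse complement of the
# contiguous target footprint.
footprint = TARGET[4:16]
joined = three_arm + fill + five_arm
```

With `gap=0` the two ends are contiguous and `gap_fill` is empty, corresponding to direct ligation without an extension reaction.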

Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.

Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.

Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:

FIG. 1 shows a schematic of a workflow for generating genetic profiles.

FIG. 2 shows an example of designing a probe to focus on the discrimination of two different human leukocyte antigen (HLA) alleles known to be present in a sample.

FIG. 3 shows a computer system that is programmed or otherwise configured to implement methods provided herein.

FIG. 4 shows a schematic of a spatial map of gene expression in a biological sample.

FIGS. 5A-5B show various HLA gene alleles and T cell markers detected in a biological sample.

DETAILED DESCRIPTION

While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

Provided herein are methods and systems for spatially mapping genetic variants in a biological sample, such as a tumor or derivative thereof. Methods and systems of the disclosure can, for example, identify the presence or absence of clonal populations within a tumor by spatially mapping the expression of tumor antigens and major histocompatibility complex (MHC) proteins. MHC proteins are encoded by human leukocyte antigen (HLA) genes. HLA genes are highly polymorphic and many HLA variants exist, including alleles. Also disclosed are methods of selecting a treatment for cancer based on the spatial expression of HLA variants and/or tumor antigens and the use of such treatments as cancer therapies.

Treatment of cancer via immunotherapies offers many advantages over other chemotherapeutic-based approaches. However, the efficacy of a cancer immunotherapy can be dependent on the expression of particular tumor antigens and/or MHC proteins by cancerous cells. Antigens such as tumor antigens bind to MHC proteins to form MHC-antigen complexes which are presented on the cancer cell surface, where antigens can be recognized by immune cells. Within and between subjects, the antigen profiles of cells vary. The proper presentation of peptides on the surface of cells by MHC proteins allows the immune system to distinguish between cancerous cells and normal cells that are not to be harmed. In some instances, cancerous cells that express tumor antigens form in a subject. Non-limiting examples of tumor antigens include tumor-specific antigens (antigens only present on tumor cells), tumor-associated antigens (antigens present on tumor cells and some normal cells), and neoantigens (antigens not previously recognized by the immune system that can arise from altered tumor proteins formed as a result of tumor mutations or from viral proteins). In some instances, immune cells can detect the expression of tumor antigens and facilitate the destruction of cancerous cells displaying the tumor antigen.

For the effective detection of cancerous cells by immune cells, particular MHC proteins and antigens may be co-expressed by the cancerous cells. Over time, mutations in cancer cells can result in the loss or alteration of the expression of an antigen or its presenting MHC proteins, resulting in the formation of a tumor sub-clone. The loss of antigen expression in cancer cells, the mutation of antigens in cancer cells, the loss of MHC protein expression, or mutations to MHC proteins can allow cancerous cells to evade immune detection. Immunotherapies frequently rely on the detection of cancerous cells by immune cells, thus the loss or mutation of antigen expression in cancer cells or loss or mutations of MHC proteins can render an immunotherapy ineffective against a clonal population of cancer cells. In some instances, a clonal population of cancer cells can be expressed in a portion of a tumor, rather than throughout the volume of a tumor.

Some methods of selecting an immunotherapy based on antigen and/or MHC expression rely on analyzing gene expression data of tumors. However, these methods do not provide spatial information or identify the presence of clonal populations within tumors. Thus, the methods may improperly assume that a particular antigen and MHC molecule are both co-expressed in the same cell when they are not. Treatments selected based on such methods may be effective against clonal populations present in a small fraction of the tumor volume and may therefore have limited efficacy. A method that can spatially assess the expression of antigen and/or MHC proteins throughout a tumor can allow for the selection of a treatment or treatments aimed against clonal populations identified to be present in the tumor. Such clonal populations can be present in part of the tumor or throughout the volume of a tumor. In some cases, the clonal populations may be resistant to other therapies or may have evaded immune detection. Thus, the methods can allow for more targeted therapy selection. Treatments directed towards clonal populations identified in a tumor may have improved efficacy compared to those selected based on global tumor expression data directed towards unexpressed clonal populations or clonal populations expressed in only part of a tumor.

Methods and systems disclosed herein can overcome many of the shortcomings of some methods for immunotherapy selection. By utilizing probes that can identify and locate HLA gene variants and genes encoding antigen proteins within a biological sample, the methods and systems can spatially map clonal populations of cancer cells throughout a tumor. In some embodiments, probes can be used to identify and locate gene variants, genes encoding antigens, inflammatory markers, or cell typing markers within a biological sample. In some cases, the methods and systems can be used to detect presence or absence of, mutations of, or level of expression of gene variants and/or genes encoding antigens, inflammatory markers, or cell typing markers (e.g., immune cell marker, cancer marker) within a biological sample. In some cases, the biological samples comprise tumor tissue or a derivative thereof containing a three-dimensional (3D) matrix that preserves the spatial information of nucleic acid molecules corresponding to MHC proteins and/or tumor antigens. In some cases, loss of antigens and/or MHC protein expression may be detected. In some cases, biological samples are bodily fluids, such as blood, that preserve the cellular structure of nucleic acid molecules corresponding to MHC proteins and/or tumor antigens. In some embodiments, the methods provided herein are useful for the purpose of distilling data for diagnosis, prognosis, therapeutic guidance, or monitoring or evaluating the efficacy of treatment. Probes of the disclosure can hybridize to such nucleic acid molecules. In some cases, designing probes for the HLA genes can be difficult because of homology in sequences shared by various HLA genes. In some embodiments, detecting the clonal loss of HLA expression can be difficult because HLA genes can be both highly polymorphic and homologous to one another.
In some aspects, provided herein are methods and systems that comprise generating a genetic profile (e.g., by next-generation sequencing) to identify a plurality of variant sequences (e.g., HLA variant sequences, sequences encoding tumor antigen variants, or other tumor characteristics) present in the subject prior to designing probes for use with an individual or a tumor sample. Examples of probes include, but are not limited to, padlock probes, molecular inversion probes, or variants thereof. The probes can comprise a region that has complementarity with a target and may comprise an additional region that does not hybridize with the target. The probes can be circular probes. The probes can be circularizable probes.

The probes can comprise two or more components (e.g., multicomponent probes). For example, the probes can comprise two or more separate nucleic acid fragments, and the two or more separate nucleic acid fragments can be joined together to form a circular probe when the two or more separate nucleic acid fragments hybridize to a target nucleic acid molecule. In some cases, the probe can comprise a first nucleic acid molecule comprising (i) a first hybridizing region having a first sequence complementary to a first target sequence of a target nucleic acid molecule and (ii) a first nonhybridizing region at a first end of the first nucleic acid molecule. The probe can further comprise a second nucleic acid molecule comprising (i) a second hybridizing region having a second sequence complementary to a second target sequence of the target nucleic acid molecule and (ii) a second nonhybridizing region at a second end of the second nucleic acid molecule. The first nucleic acid molecule and the second molecule may be configured such that, upon hybridization of the first sequence to the first target sequence and the second sequence to the second target sequence: (1) the first nonhybridizing region and the second nonhybridizing region do not hybridize with the target nucleic acid molecule; and (2) the first end and the second end undergo coupling to one another. The first nucleic acid molecule and the second nucleic acid molecule may be configured such that, upon hybridization of the first sequence to the first target sequence and the second sequence to the second target sequence, the first end of the first nucleic acid molecule and the second end of the second nucleic acid molecule may be adjacent.

The probes such as padlock probes can undergo gap filling or circularization and amplification for detection. Once hybridized to targeted nucleic acid molecules, the identity and location of the probe molecules can be determined with the use of techniques such as sequencing by extension with reversible terminators, fluorescent in situ sequencing, pyrosequencing, and massively parallel signature sequencing (MPSS) in the context of the 3D matrix. Determination of the identity and location of probe molecules can allow for the generation of spatial maps showing the location of MHC protein and/or tumor antigen expression. Spatial maps can inform the selection of treatments for cancer. In some instances, administration of a treatment selected based in part on the spatial expression of MHC proteins and/or tumor antigens can have an increased probability of efficacy compared to a treatment that was selected without knowledge related to the spatial expression of MHC proteins and/or tumor antigens.

Methods of Analyzing the Spatial Distributions of Target Nucleic Acids

Methods and systems disclosed herein can utilize probe molecules to identify target sequences (e.g., variant sequences) in a biological sample such as tissue, tumor tissue, populations of cells, individual cells, or a derivative of any of the foregoing. In some cases, the methods and systems can further determine the location of identified target sequences. In some cases, the determined identities and locations of target sequences can be used to generate visual representations (e.g., for display on a user interface, such as a graphical user interface) of biological samples. In some instances, a target sequence is a sequence of nucleotides within a biological sample.

In some cases, probes of the disclosure comprise a nucleic acid targeting sequence (or target hybridizing sequence or region). Additionally, probe molecules of the disclosure can further comprise identifier nucleic acid sequences (e.g., barcode sequences), sequencing primer binding sites, and padlock binding sites. Nucleic acid targeting sequences can be used to direct the binding of a probe to a specific nucleic acid within a cell. For example, a nucleic acid targeting sequence can comprise a specific sequence of nucleotides that causes the probe to target and hybridize to a specific nucleic acid within a cell. In some cases, the nucleic acid targeting sequence comprises a sequence of nucleotides that is complementary to a sequence of the targeted nucleic acid, causing hybridization of the targeting sequence and the target via Watson-Crick base pairing. In some cases, a nucleic acid targeting sequence is an HLA targeting sequence that is designed to bind to a nucleic acid corresponding to an HLA variant sequence. Nucleic acid targeting sequences of the disclosure can be, for example, from 21 to 200 nucleotides long. In some cases, the nucleic acid targeting sequences of the disclosure can be at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150 or more nucleotides long. In some cases, the nucleic acid targeting sequences of the disclosure can be at most about 200, 150, 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30 or less nucleotides long.
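
As a non-limiting illustration of the complementarity described above, a targeting sequence can be derived as the reverse complement of a chosen region of the target. The following is a minimal sketch only: the function names and the example sequence are hypothetical, and the sketch does not account for melting temperature, secondary structure, or cross-hybridization to homologous sequences.

```python
# Hypothetical illustration: derive a targeting sequence by Watson-Crick complementarity.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def targeting_sequence(target, start, length):
    """Return a sequence complementary to target[start:start + length]."""
    region = target[start:start + length]
    if len(region) != length:
        raise ValueError("requested region extends past the end of the target")
    return reverse_complement(region)


# Made-up 24-nucleotide target fragment (not a real HLA sequence).
example_target = "ATGGCCGTCATGGCGCCCCGAACC"
# 21 nucleotides is the lower end of the length range given in the text.
probe_arm = targeting_sequence(example_target, 0, 21)
```

Taking the reverse complement of `probe_arm` recovers the targeted region, which is a quick way to confirm that the two strands would pair along their full length.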

In some embodiments, each probe targets and hybridizes to a specific target nucleic acid molecule and the specific target nucleic acid molecule is not bound by multiple probes. For example, a sequence of at most about 20, 15, 10, 5 or less nucleotides long of a particular MHC or human leukocyte antigen (HLA) variant sequence is targeted by each probe. In some cases, a position unique to a particular MHC or HLA allele is at most about 5, 4, 3, 2, 1 or less nucleotides long. In some aspects, padlock probes may be useful for detecting short target nucleic acid sequences (e.g., distinguishing base(s) of an HLA variant).

In some cases, the target nucleic acid corresponds to and/or is specifically co-localized with a variant or transcript sequence of interest. In some instances, the target sequence is an HLA sequence or a variant or a transcript sequence corresponding to an antigen. In some instances, the target sequence comprises an HLA allele or a mutation. The target can be associated with, for example, an increased risk of a cancer, a tumor or tumor associated antigen, a cancer/testis antigen, a cancer antigen, an oncofetal protein, a tumor mutation, a tumor suppressor protein, or a neoantigen. When a probe molecule hybridizes to a nucleic acid molecule that corresponds to and is co-localized with a target sequence of interest, detection of the identity and location of the probe molecule allows for the detection of the identity and location of the targeted sequence. In some cases, the target sequence encodes a protein or a portion thereof and detection of the identity and location of the target sequence allows for the identity and location of the protein to be determined.

Probe molecules of the disclosure can be identified by, for example, identifying an identifier nucleotide sequence. An identifier nucleotide sequence can allow for the specific identification of a probe molecule. According to one aspect of the disclosure, an identifier nucleotide sequence is a unique nucleotide sequence. According to another aspect of the disclosure, an identifier nucleotide sequence is a substantially unique nucleotide sequence. According to yet another aspect of the disclosure, an identifier nucleotide sequence can convey enough information about a probe molecule to infer the identity of a probe molecule with at least a non-uniform probability. In some cases, the identifier nucleotide sequence can be held in common for some or all of the probes that bind to a particular nucleic acid sequence or target. In some cases, the probe can comprise a combination of identifier nucleotide sequences described herein.

In some cases, additional information can be used to support an inference on the identity of a probe molecule. Additional information can include the identity of the different probe molecules that have contacted the sample or information about the biological sample being contacted. In some cases, such as when multiple replicates of a probe molecule are used to target multiple replicates of the same nucleic acid molecule, each replicate probe molecule can comprise the same identifier nucleotide sequence.

In some embodiments, a probe molecule of the disclosure has a length ranging from 21 to 200 nucleotides long. In some cases, the probe molecule of the disclosure can be at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150 or more nucleotides long. In some cases, the probe molecule of the disclosure can be at most about 200, 150, 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30 or less nucleotides long.

The probe can be a multicomponent probe comprising a first nucleic acid molecule and a second nucleic acid molecule, and the variant sequence can be included within the target hybridizing region of one of the two components, for example, the first nucleic acid molecule. The first nucleic acid molecule can further include the identifier nucleotide sequence, such that the plurality of probes targeting a sequence variation can share a common component (e.g., the first nucleic acid molecule) with conserved sequence. In some cases, the probes can comprise degenerate bases (e.g., W). Such probes can be prepared during synthesis, comprising a mixture of probes with A and T at that position. The identifier nucleotide sequence can be detected by sequencing such as sequencing by hybridization.
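
As a non-limiting illustration, a probe written with degenerate bases can be expanded into the set of concrete sequences that make up the synthesized mixture. The sketch below uses the standard IUPAC nucleotide codes (in which W denotes A or T); the function name and the example probe are hypothetical.

```python
from itertools import product

# Subset of the standard IUPAC nucleotide codes; W denotes A or T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def expand_degenerate(probe):
    """List every concrete sequence represented by a degenerate probe."""
    choices = [IUPAC[base] for base in probe.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical probe with a single degenerate position.
mixture = expand_degenerate("ACWGT")
```

For the hypothetical probe above, `mixture` contains two sequences, one with A and one with T at the degenerate position.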

The identity and/or location of a target sequence (e.g., variant sequence) can be determined via the identification of a portion of a probe molecule. In some cases, the identity and/or location of a target sequence is determined via the identification of about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or about 100% of a probe molecule. In some cases, the identity and/or location of a target sequence is determined via the identification of 50%-100% of a probe molecule. In some cases, the identity and/or location of a target sequence is determined via the identification of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of a probe molecule. In some cases, the identity and/or location of a target sequence is determined via the identification of at most 10%, at most 20%, at most 30%, at most 40%, at most 50%, at most 60%, at most 70%, at most 80%, or at most 90% of a probe molecule. In some cases, the identity and/or location of a target sequence is determined via the identification of the identifier nucleotide sequence (e.g., barcode sequence) of a probe. In some cases, the identity and/or location of a target sequence is determined via the identification of about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or about 100% of an identifier nucleotide sequence. In some cases, the identity and/or location of a target sequence is determined via the identification of 50%-100% of an identifier nucleotide sequence. In some cases, the identity and/or location of a target sequence is determined via the identification of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of an identifier nucleotide sequence.
In some cases, the identity and/or location of a target sequence is determined via the identification of at most 10%, at most 20%, at most 30%, at most 40%, at most 50%, at most 60%, at most 70%, at most 80%, or at most 90% of an identifier nucleotide sequence. In other cases, the identity and/or location of a target sequence is determined via the identification of an entire probe molecule. In some cases, a single nucleotide in the identifier nucleotide sequence of the probe may be sufficient to determine a variant. For example, within the identifier nucleotide sequence, a nucleotide identity at a single position can distinguish between up to 4 variant target sequences.
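
The same reasoning extends to longer identifiers: because each position can take one of four nucleotide identities, k positions can distinguish up to 4^k targets. As a non-limiting sketch, the smallest number of positions needed for a given number of targets can be computed as follows; the function name is hypothetical, and the calculation assumes error-free reads with no positions reserved for error detection.

```python
def min_identifier_length(n_targets, alphabet_size=4):
    """Smallest number of positions whose combinations cover n_targets."""
    if n_targets < 1:
        raise ValueError("n_targets must be at least 1")
    length = 0
    capacity = 1  # number of distinct identifiers of the current length
    while capacity < n_targets:
        capacity *= alphabet_size
        length += 1
    return length
```

For example, `min_identifier_length(4)` gives 1, matching the single-position case described above, while 500 targets would need 5 positions because 4^4 = 256 is too few and 4^5 = 1024 is sufficient.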

Determining the Identity and Location of Variant Sequences Via Probe Detection

Methods and systems of the disclosure can identify and locate target sequences within a biological sample. In some cases, the target sequences can be variant sequences, which encode for MHC proteins, antigens (e.g., tumor antigens), or portions thereof.

In some cases, the target sequences (e.g., a second nucleic acid for analysis) can encode antigens, inflammatory markers, or cell typing markers (e.g., immune cell marker, cancer marker). For example, a target sequence may be associated with immune cells such as B cells, T cells, NK (natural killer) cells, NK T cells, professional antigen-presenting cells (APCs), non-professional antigen-presenting cells, and inflammatory cells (neutrophils, macrophages, monocytes, eosinophils, and basophils). In some examples, a target sequence may be associated with modulation of an existing immune response, a developing immune response, a potential immune response, or the capacity to induce, regulate, influence, or respond to an immune response. In some cases, a target sequence may be associated with altered production and/or secretion of certain classes of molecules such as cytokines, chemokines, growth factors, transcription factors, kinases, costimulatory molecules, or other cell surface receptors. In some cases, the target sequences can encode a receptor or ligand on the surface of the cancer cell (e.g., epidermal growth factor receptor (EGFR, ErbB-1, HER1)). In some cases, the target sequences can encode a T cell marker (e.g., CD3, CD4, or CD8).

The location and identity of variant sequences can be determined by identifying and locating probe molecules (or portions thereof) targeting the variant sequences. In some instances, probe molecules can target a nucleic acid corresponding to a specific variant sequence. A nucleic acid corresponding to a specific variant sequence can be, for example, a genomic deoxyribonucleic acid (DNA) sequence, a complementary deoxyribonucleic acid (cDNA) molecule reverse transcribed from a variant sequence, a ribonucleic acid (RNA) molecule transcribed from a variant sequence, and amplification products of the foregoing. Additionally/alternatively, probe molecules can discriminate between two or more nucleic acids corresponding to different variant sequences known to be present in a biological sample. For example, a method of the disclosure can comprise generating a genetic profile of a subject from which a sample is obtained to identify a plurality of variant sequences (e.g., HLA variant sequences or sequences encoding tumor antigen variants) present in the subject. Genetic profiles can be generated via, for example, RNA sequencing, exome sequencing, or whole genome sequencing. Probe molecules can be designed to discriminate between two nucleic acid molecules corresponding to sequences of the plurality of variant sequences and/or to preferentially hybridize to a nucleic acid molecule corresponding to only one of the identified variant sequences. The identity and location of the identified variant sequences within the biological sample can then be identified by contacting the sample with the designed probe molecules.
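
As a non-limiting illustration of designing probes that discriminate between variants found in a genetic profile, the positions at which the variants differ can be located first, since a probe may need to cover at least one such position. The sketch below assumes the variant sequences have already been aligned to equal length; the function name, variant labels, and sequences are hypothetical.

```python
def discriminating_positions(variants):
    """Return positions at which the aligned variant sequences do not all agree.

    `variants` maps a variant label to its aligned sequence.
    """
    sequences = list(variants.values())
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [
        index
        for index, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]


# Made-up aligned fragments of two variants identified in a subject.
profile = {"variant_1": "ACGTTGCA", "variant_2": "ACGATGCA"}
positions = discriminating_positions(profile)
```

Here the two made-up variants differ only at index 3 (zero-based), so a discriminating probe would be placed to interrogate that position.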

In some examples, the location and identity of a probe molecule or portion thereof is determined by first amplifying the probe molecule to generate a corresponding amplification product and subsequently determining the location and identity of the amplification product or portion thereof. In other examples, the location and identity of a probe molecule or portion thereof can be determined directly.

Methods that can be used to determine the location and identity of probe molecules or portions thereof include methods of sequencing nucleic acids in situ within a 3D matrix. Sequencing methods, such as sequencing by extension with reversible terminators, fluorescent in situ sequencing (FISSEQ), OligoFISSEQ, pyrosequencing, massively parallel signature sequencing (MPSS) and the like, can be used to sequence nucleic acids, including the probes comprising nucleic acids described herein, within a 3D matrix. Sequencing methods of the disclosure can determine the location and identity of probe molecules or portions thereof either via the sequencing of probe molecules (or portions thereof) directly or via sequencing of corresponding amplification products (or portions thereof).

Pyrosequencing is one such method that can be used with the methods and systems described herein. Pyrosequencing is a method in which pyrophosphate (PPi) is released during each nucleotide incorporation event (i.e., when a nucleotide is added to a growing polynucleotide sequence). The PPi released in the DNA polymerase-catalyzed reaction can be detected by ATP sulfurylase and luciferase in a coupled reaction which can be visibly detected. The added nucleotides can be continuously degraded by a nucleotide-degrading enzyme. After the first added nucleotide has been degraded, the next nucleotide can be added. As this procedure is repeated, longer stretches of the template sequence can be deduced.
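
As a non-limiting illustration of how a sequence can be deduced from repeated nucleotide additions, the following sketch simulates an idealized pyrosequencing run. It assumes a fixed, cyclic dispensation order and a signal exactly proportional to the number of nucleotides incorporated in each addition; both are simplifications, and the function names are hypothetical.

```python
def flowgram(nascent, dispensation_order="ACGT"):
    """Simulate idealized signals as (nucleotide, count) per dispensation.

    `nascent` is the strand being synthesized, i.e. the complement of the template.
    """
    if set(nascent) - set(dispensation_order):
        raise ValueError("sequence contains a base that is never dispensed")
    signals = []
    position = 0
    flow = 0
    while position < len(nascent):
        nucleotide = dispensation_order[flow % len(dispensation_order)]
        count = 0
        # A run of identical bases is incorporated in a single dispensation.
        while position < len(nascent) and nascent[position] == nucleotide:
            count += 1
            position += 1
        signals.append((nucleotide, count))
        flow += 1
    return signals


def decode(signals):
    """Reconstruct the synthesized sequence from the idealized signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)
```

Dispensations that produce no signal are recorded with a count of zero, so decoding simply concatenates each dispensed nucleotide as many times as its signal indicates.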

Massively Parallel Signature Sequencing (MPSS) is another such sequencing method which utilizes ligation-based DNA sequencing simultaneously on microbeads. A mixture of labelled adaptors comprising all possible overhangs is annealed to a target sequence of four nucleotides. The label can be detected upon successful ligation of an adaptor. A restriction enzyme is then used to cleave the DNA template to expose the next four bases.

Fluorescent in situ sequencing (FISSEQ) is another sequencing method that can be used with the methods and systems described herein. FISSEQ is a process during which a series of biochemical processing operations are interlaced with fluorescent imaging operations within a biological sample. Sequencing methods that can be employed by FISSEQ include sequencing by synthesis, sequencing by ligation, and sequencing by hybridization. A FISSEQ assay can involve: (1) the extension of DNA via the addition of a single type of fluorescently-labelled nucleotide triphosphate (dNTP) to a reaction, (2) washing away of unincorporated nucleotide, (3) the detection of nucleotide incorporation by fluorescence imaging, (4) the repetition of operations 1-3 with each of the four dNTPs in turn, and (5) the repetition of operations 1-4 in cycles. At the beginning of each subsequent cycle, the fluorescence from previous cycles can be bleached or digitally subtracted or the fluorophore can be cleaved from the nucleotide and washed away. Following the completion of each cycle, a nucleotide present in the sequence can be identified, thus allowing for the identification of a sequence via the completion of multiple cycles. In some embodiments, operation (1) of the FISSEQ cycle described above can involve the addition of a mixture of all four dNTPs each labelled with a different fluorophore. In such embodiments operations 1-3 may not be repeated individually for each of the four dNTPs, as the identity of the dNTP incorporated in each cycle can be determined based on detection of the fluorescent label corresponding to the dNTP. The probes described herein can also be detected by OligoFISSEQ, which leverages fluorescence in situ sequencing (FISSEQ) of barcoded Oligopaint probes to enable visualization of many targeted genomic regions. The probe can be an Oligopaint probe.
The probe can comprise a barcode, which can be interrogated by various methods to identify the probe and corresponding target sequence. For example, the barcode can be identified by sequencing by synthesis, sequencing by ligation, sequencing by hybridization, hybridization chain reaction or cyclic hybridization chain reaction. For another example, the barcode can be identified by temporal detection, where two or more subsequences of the barcode can be sequentially detected by sequencing or hybridization.

Sequencing data can be processed to allow for the visualization of each sequenced nucleotide strand as a localized spot in a fluorescent image. In some cases, the data include a sequence of colors corresponding to the nucleotide sequence of the strand. By analyzing successive fluorescent images of FISSEQ cycles, the identity and location of a vast number of unique probes can be determined in a single biological sample. In some instances, the identity and location of expressed antigen-encoding genes or HLA genes can be inferred from the identity and location of probe molecules. Computational methods can then be employed to construct visual representations of gene expression in biological samples. In some embodiments, a method disclosed herein can simultaneously present the location of at least about 50, 100, 200, 300, 400, 500 or more expressed genes within a genome, or all expressed genes within a genome (e.g., whole genome or transcriptome).
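
As a non-limiting illustration of this processing, each imaged spot can be represented by its coordinates and the color observed in each cycle. The colors are translated into a nucleotide sequence and looked up in a table of known identifier sequences. The channel-to-base assignment, the identifier table, the target labels, and the function name below are all hypothetical.

```python
# Hypothetical assignment of fluorescence channels to nucleotides.
CHANNEL_TO_BASE = {"red": "A", "green": "C", "blue": "G", "yellow": "T"}

# Hypothetical identifier sequences and the targets they denote.
CODEBOOK = {
    "ACGT": "HLA_variant_1",
    "ACGA": "HLA_variant_2",
    "TTGC": "antigen_X",
}


def decode_spots(spots):
    """Map each target label to the coordinates of the spots assigned to it.

    `spots` is a list of ((x, y, z), [color observed in each cycle]) pairs.
    Spots whose sequence is not in the codebook are skipped.
    """
    spatial_map = {}
    for coordinates, colors in spots:
        identifier = "".join(CHANNEL_TO_BASE[color] for color in colors)
        target = CODEBOOK.get(identifier)
        if target is None:
            continue
        spatial_map.setdefault(target, []).append(coordinates)
    return spatial_map


example_spots = [
    ((10, 20, 3), ["red", "green", "blue", "yellow"]),
    ((11, 20, 3), ["red", "green", "blue", "red"]),
    ((50, 5, 1), ["yellow", "yellow", "yellow", "yellow"]),
]
spatial_map = decode_spots(example_spots)
```

The resulting mapping from target label to coordinates is the kind of data from which a visual representation could be rendered. The third spot in the example decodes to a sequence that is absent from the table and is therefore left unassigned.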

In cases where sequencing methods (e.g., FISSEQ) are used for the sequencing of probe molecules or portions thereof (e.g., identifier nucleotide sequences) directly, the probe sequences may be designed to optimize the sequencing protocol for properties such as compactness and error robustness. For example, probe or identifier nucleotide sequence of the minimum length for unique specification of each target species can be used. In some embodiments, the spatial sequencing of identifier molecules can be combined with non-spatial methods of high throughput sequencing to sequence and identify longer nucleic acid molecules such as those corresponding to DNA variants, transcripts, antigens, and HLA alleles. In some embodiments, the spatial sequencing of identifier molecules can be combined with the identification of proteins, such as antigens. This can happen through antibody staining or the spatial sequencing of nucleic acid tagged antibodies that identify proteins. Additionally, error correction features such as redundant or parity bits of information can be added to probe sequences.
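
As a non-limiting illustration of adding redundant information to an identifier sequence, one extra base can be appended so that the base values sum to zero modulo 4. Any single-base substitution then changes the sum and can be detected. This particular checksum scheme and the function names are assumptions chosen for simplicity; the scheme detects a single substitution but does not correct it.

```python
BASES = "ACGT"  # each base is treated as a value from 0 to 3


def add_check_base(identifier):
    """Append one base so that the base values sum to 0 modulo 4."""
    total = sum(BASES.index(base) for base in identifier)
    return identifier + BASES[(-total) % 4]


def is_valid(read):
    """Return True if a read (identifier plus check base) passes the checksum."""
    return sum(BASES.index(base) for base in read) % 4 == 0


protected = add_check_base("ACGT")
```

A read that fails `is_valid` can be discarded or flagged rather than assigned to the wrong target, at the cost of one additional sequencing cycle.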

Amplification Methods

In some embodiments, the target sequences such as variant sequences, nucleic acids corresponding to variant sequences, probes, derivatives of the foregoing, and/or portions of any of the foregoing (e.g., identifier nucleotide sequences) are subjected to amplification reactions prior to being identified. Amplification of a probe or portion thereof generates a corresponding amplification product comprising an amplified nucleic acid sequence. In some cases, target sequences, nucleic acids corresponding to target sequences or probe molecules (or portions thereof) are identified via the identification of a corresponding amplification nucleic acid sequence, such as an amplicon, or a portion thereof. In some cases, target sequences, nucleic acids corresponding to target sequences, probe molecules or portions thereof are identified via the identification of about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or about 100% of an amplified nucleic acid sequence. In some cases, variant sequences, nucleic acids corresponding to variant sequences, probe molecules, or portions thereof are identified via the identification of 50%-100% of an amplified nucleic acid sequence. In some cases, the identity and/or location of a variant sequence is determined via the identification of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of an amplified nucleic acid sequence. In some cases, the identity and/or location of a variant sequence is determined via the identification of at most 10%, at most 20%, at most 30%, at most 40%, at most 50%, at most 60%, at most 70%, at most 80%, or at most 90% of an amplified nucleic acid sequence.

Amplification products can be generated using various methods for nucleic acid amplification, including solid-state or semi-solid-state amplification methods. Nucleic acid molecules can be amplified by rolling circle amplification (RCA), as by using a circular template molecule and an enzyme capable of rolling circle amplification, such as Phi29, Bst, Vent, 9° N DNA polymerases and related enzymes. Nucleic acid molecules can be amplified by polymerase chain reaction (PCR), as by using a DNA polymerase enzyme. Nucleic acid molecules can be amplified by an RNA polymerase using the in vitro transcription reaction, such as by T7 RNA polymerase. In some cases, nucleic acid molecules can be subjected to multiple different amplification reactions.

Amplification products may comprise functional linkage groups for tethering to a 3D matrix, such as acrylamide or click-reactive groups, enabling the products of amplification to be spatially immobilized via covalent gel linkages. According to one aspect, the functional linkages can be incorporated during amplification using nucleotide analogs, including amino-allyl dUTP, 5-TCO-PEG4-dUTP, C8-Alkyne-dUTP, 5-Azidomethyl-dUTP, 5-Vinyl-dUTP, 5-Ethynyl dUTP, or a combination thereof. According to a separate aspect, for amplification methods using one or more primers, one or more of the primers may comprise a functional linkage group for tethering to a 3D matrix, i.e., solid-state.

Further, amplification products may be subsequently processed, chemically or biochemically, using mechanisms including, but not limited to, fragmentation, end-modification, second-stranding, annealing of accessory strands, such as priming, gap filling, circularization, blunt ending, phosphorylation, dephosphorylation, protection, and deprotection. End-modifications may entail the addition and/or removal of bases or sequences. For example, additional sequences may be used to subject the amplification products to next generation sequencing reactions. End modifications may also entail adding chemical moieties that may be useful for linkages or coupling the amplification products to another molecule. For example, the end may be phosphorylated or dephosphorylated by an enzyme, for example, a kinase or phosphatase. Protecting groups may also be added or removed to allow or prevent particular reactions from taking place. Blunt ending may also occur in which an overhang comprising a portion of single stranded nucleic acid is removed. Amplification products may be subjected to second stranding reactions which may result in additional nucleic acid molecules. Second stranding may be performed by adding in a primer which is complementary to the amplification product and a polymerizing enzyme to generate additional nucleic acid molecules. Amplification products may also be subjected to gap filling via a polymerizing enzyme, which may link two strands of DNA together via the synthesis of intervening bases. Amplification products may also be circularized via the activity of a gap filling reaction, extension reaction, ligation reaction, or a combination thereof, in which a circular nucleic acid is generated from a linear nucleic acid.

Amplification products or subunits thereof may be processed into a greater plurality of subunits, such as by fragmentation. Methods of fragmenting amplification products include mechanisms which are random, or substantially random, including by the DNA hydrolysis or DNA nicking activities of enzymes including, but not limited to, DNase, endonucleases, and DNA repair enzymes. According to one aspect of the random fragmentation mechanism, nucleotides or nucleotide analogs can be incorporated into an amplification product during synthesis, which subsequently become the site of amplification product fragmentation. Examples include incorporation of dUTP with fragmentation by Uracil-Specific Excision Reagents (USER), such as the combination of Uracil DNA Glycosylase (UDG) and an endonuclease such as Endonuclease VIII or Endonuclease IV; incorporation of inosine with fragmentation by Endonuclease V; and by incorporation of monomers bearing internal cleavage sites, such as oligonucleotides with internal disulfide or bridging phosphorothioate linkages during ligase-mediated amplification. Methods of fragmenting an amplification product can include mechanisms for site-directed fragmentation, such as by restriction endonucleases, for which single-stranded sites may be splinted with an accessory oligonucleotide to facilitate the restriction endonuclease reaction, and by other sequence-specific nucleic acid cutting mechanisms including by Cas9, C2c2, and other nucleic-acid-directed nucleic-acid restriction enzymes, and by Transcription Activator-Like Effector Nucleases (TALENs). According to some embodiments, a rolling circle amplification product can be fragmented into molecules corresponding to a probe or identifier nucleotide sequence. Fragmentation may also be performed by subjecting a rolling circle amplification product to a reverse primer in a process known as hyberbranched rolling circle amplification (RCA). 
For example, RCA may create a long nucleic acid strand comprising multiple repeats of the template nucleic acid. Subjecting the amplification product to a reverse primer may allow polymerization that creates separate double-stranded DNA molecules, resulting in the RCA amplification product being fragmented.

Probes for Amplification

Methods and systems disclosed herein can utilize various probes. The probe can be a linear probe. The probe can be a circularizable probe. The probe can be a circular probe. The probe can be a padlock probe. The probe can be a molecular inversion probe. The probe can comprise a target hybridizing region and target nonhybridizing region. The probe (e.g., the target nonhybridizing region) can further comprise a primer binding sequence and/or an identifier nucleotide sequence.

The probe can be a multicomponent probe. The probe can comprise two or more separate nucleic acid fragments, and the two or more separate nucleic acid fragments can be joined together to form a circular probe when the two or more separate nucleic acid fragments hybridize to a target nucleic acid molecule. In some cases, the probe can comprise a first nucleic acid molecule comprising (i) a first hybridizing region having a first sequence complementary to a first target sequence of a target nucleic acid molecule and (ii) a first nonhybridizing region at a first end of the first nucleic acid molecule. The probe can further comprise a second nucleic acid molecule comprising (i) a second hybridizing region having a second sequence complementary to a second target sequence of the target nucleic acid molecule and (ii) a second nonhybridizing region at a second end of the second nucleic acid molecule. The first nucleic acid molecule and the second molecule may be configured such that, upon hybridization of the first sequence to the first target sequence and the second sequence to the second target sequence: (1) the first nonhybridizing region and the second nonhybridizing region do not hybridize with the target nucleic acid molecule; and (2) the first end and the second end undergo coupling to one another. The first nucleic acid molecule and the second nucleic acid molecule may be configured such that, upon hybridization of the first sequence to the first target sequence and the second sequence to the second target sequence, the first end of the first nucleic acid molecule and the second end of the second nucleic acid molecule may be adjacent. The first nucleic acid molecule and the second nucleic acid molecule may be configured such that, upon hybridization of the first sequence to the first target sequence and the second sequence to the second target sequence, the first end and the second end undergo coupling to one another via a nucleic acid extension reaction. 
The first nucleic acid molecule and the second nucleic acid molecule may be configured such that, upon hybridization of the first sequence to the first target sequence and the second sequence to the second target sequence, the first end and the second end undergo coupling to one another via a nucleic acid ligation reaction. The first nucleic acid molecule and the second nucleic acid molecule may be configured such that, upon hybridization of the first sequence to the first target sequence and the second sequence to the second target sequence, the first end and the second end undergo coupling to one another via a hybridization reaction. The first nucleic acid molecule and the second nucleic acid molecule may be configured such that, upon hybridization of the first sequence to the first target sequence and the second sequence to the second target sequence, the first end and the second end undergo coupling to one another via a nucleic acid extension and a nucleic acid ligation reaction.

Methods and systems disclosed herein can utilize padlock probes (herein known as “padlocks”) in the detection of target sequences, nucleic acids corresponding to target sequences, probes, or portions of probes (e.g., identifier nucleotide sequences) within a biological sample. Padlocks can be designed to bind specifically to targets such as nucleic acids corresponding to target sequences, probes, or portions of probes disclosed herein and can comprise a first end, a second end, and 5′ and 3′ terminal regions. In some cases, padlocks bind to specific portions of probes disclosed herein such as padlock binding sites or identifier nucleotide sequence. By hybridization to a target, the ends of the padlock are brought into juxtaposition for ligation. The ligation may be direct or indirect. In other words, the ends of the padlock may be ligated directly to each other or they may be ligated to an intervening nucleic acid molecule or a sequence of nucleotides. Thus, the terminal regions of the padlock probe may be complementary to adjacent, or contiguous, regions in the target (e.g., probe) to which it is targeted, or they may be complementary to non-adjacent or non-contiguous regions in the target (e.g., probe) to which it is targeted. In the cases where the padlock probe is complementary to non-adjacent or non-contiguous regions of the target, for ligation to occur, the “gap” between the two ends of the hybridized padlock probe can be filled by an intervening oligonucleotide molecule or a sequence of nucleotides.

A “gap” region described herein may be of various lengths. For example, a gap region can comprise at least 1, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50 or more nucleotide(s). For another example, a gap region can comprise from 2 to 10, from 10 to 20, from 20 to 50, from 50 to 100, from 100 to 150, from 150 to 200, from 200 to 300, from 300 to 400, from 400 to 500, or from 2 to 500 nucleotides. A gap region can be filled in by incorporating one or more nucleotides in an extension reaction, for example, by extending from the 3′ end of the two ends. The extension reaction can be performed by a polymerase. The gap region can also be filled in by hybridizing an additional nucleotide or an additional oligonucleotide sequence to the gap region. The length of the additional oligonucleotide sequence can be determined based on the length of the gap region. For example, the additional oligonucleotide sequence can be of the same length as the gap region such that after hybridizing with the gap region, the 5′ end of the additional oligonucleotide sequence is adjacent to the 3′ end of the padlock and the 3′ end of the additional oligonucleotide sequence is adjacent to the 5′ end of the padlock. For example, the additional oligonucleotide sequence can comprise from 2 to 10, from 10 to 20, from 20 to 50, from 50 to 100, from 100 to 150, from 150 to 200, from 200 to 300, from 300 to 400, from 400 to 500, or from 2 to 500 nucleotides. The circularized padlock can then be generated by ligating the ends of the additional oligonucleotide sequence with the ends of the padlock.
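
The choice between direct ligation and gap filling described above can be sketched in a few lines. The following Python example is illustrative only: the function name `plan_gap_fill`, the coordinate convention (0-based, half-open intervals along the target), and the returned labels are assumptions made for this sketch and are not part of the disclosure.

```python
def plan_gap_fill(arm_a, arm_b):
    """Classify how a hybridized padlock might be closed.

    arm_a, arm_b: (start, end) half-open intervals on the target to which
    the two terminal regions of the padlock hybridize. This coordinate
    convention is a hypothetical choice made for this sketch.
    Returns a (strategy, gap_length) tuple.
    """
    # Order the two terminal regions by their position on the target.
    first, second = sorted([arm_a, arm_b])
    gap = second[0] - first[1]
    if gap < 0:
        raise ValueError("terminal regions overlap on the target")
    if gap == 0:
        # Adjacent (contiguous) regions: the ends are juxtaposed and
        # may be ligated directly to each other.
        return ("direct_ligation", 0)
    # Non-adjacent regions: the gap may be filled by an extension
    # reaction or by an additional oligonucleotide of matching length.
    return ("gap_fill", gap)
```

In this sketch, a returned gap length of 5 would correspond to an additional oligonucleotide of 5 nucleotides, or to the incorporation of 5 nucleotides by extension, before ligation.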

Upon addition to a sample having a target molecule, the ends of the padlock may hybridize to complementary regions in a target molecule or derivative thereof (e.g., probe molecule). Following hybridization, the padlock may be circularized by direct or indirect ligation of the ends of the padlock by a ligase enzyme. The circularized padlock may be subjected to amplification to generate an amplification product. For example, the circularized padlock may be subjected to rolling circle amplification (RCA) to generate a DNA nanoball (i.e., rolony). The circularized padlock may be primed by the 3′ end of the probe nucleic acid sequence (i.e., the RCA is target-primed). A DNA polymerase with 3′-5′ exonuclease activity may be used. This can permit the digestion of the probe strand in a 3′-5′ direction to a point adjacent to the bound padlock. Alternatively, the targeted probe may be of appropriate length and may act as the primer for the DNA polymerase-mediated amplification reaction without such digestion. As a further alternative, instead of priming the RCA with the targeted probe molecule, an additional primer that can hybridize to the padlock may be added in the sample and used for an amplification reaction. The amplification product (e.g., rolony) can be identified and its location determined. The location and identity of the amplification product can be used to subsequently locate and identify the targeted probe molecule, which in turn can locate and identify a sub-cellular structure that co-localizes with the nucleic acid target of the probe molecule. Using a plurality of padlock probes, a number of target nucleic acids can be detected in a multiplex manner. In some embodiments, a set of probes for detecting a plurality of target nucleic acids is designed and provided for a particular sample, e.g., a personalized probe set for an individual or a tumor sample from the individual.

A padlock disclosed herein may comprise functional moieties for immobilization to a 3D matrix, either directly or indirectly, as via a hybridized oligonucleotide. For example, a tethering oligo hybridized to the backbone of the padlock, e.g., outside the domains responsible for hybridizing to the target probe molecule, may serve as a rolling circle amplification primer, thereby serving to tether the padlock molecule (and targeted probe molecule) via DNA hybridization prior to rolling circle amplification, and subsequently serving to tether the rolling circle amplification product (i.e., rolony) after rolling circle amplification for the purpose of preserving the spatial information associated with the original targeted (by the probe) nucleic acid molecule, targeted (by the padlock) probe molecule, padlock, and rolony.

Predicting Presentation of Antigen Peptides in the Context of MHC Proteins

Methods disclosed herein may comprise predicting the presentation of specific MHC-antigen complexes on the surface of a cell. In some cases, the spatial distribution of an MHC-antigen complex presented on the surface of a cell within a biological sample is predicted. Predictions can be based in whole or in part on, for example, the presence, absence, or spatial distribution of target sequences in a biological sample. A variant sequence can be, for example, an HLA variant sequence or be associated with a tumor antigen (e.g., comprise a mutation associated with a tumor antigen) such as a tumor-specific antigen, a tumor-associated antigen, or a neoantigen. In some instances, a variant sequence is detected within a cell. Additionally or alternatively, the presence of multiple variant sequences (e.g., an HLA variant sequence and a variant sequence associated with a tumor antigen) can be detected within the same cell. In some cases, antigens presented as part of MHC-antigen complexes are informative of cancer. For example, an antigen can be a tumor antigen. In some cases, the antigen can be a protein that healthy subjects normally only express in specific cells (e.g., a cancer/testis antigen) or during specific developmental stages (e.g., an oncofetal protein).

In some cases, methods disclosed herein comprise using a computer algorithm to predict the presentation or absence and/or spatial distribution of MHC-antigen complexes within a biological sample. Computer algorithms may aid in the prediction of MHC-antigen complex presentation by predicting the ability of MHC binding motifs to bind antigens. Computer algorithms can be used alone or in combination with methods to determine the presence, absence, or spatial distribution of variant sequences. Non-limiting examples of computer algorithms that can be used with methods of the disclosure include scoring function based computer algorithms such as SYFPEITHI, RANKPEP, PickPocket 1.1, SMMPMBEC, PSSMHCpan 1.0, and MixMHCpred 2.0.1; machine learning based algorithms such as NetMHC 4.0, NetMHCstabpan, NetMHCPan 4.0, MHCnuggets 2.0, ConvMHC, and HLA-CNN; and consensus algorithms such as IEDB-AR-Consensus, and NetMHCcons.
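
To illustrate the general idea behind a scoring function based algorithm, the following Python sketch scores 9-residue peptides against a position-specific scoring matrix. It is a toy example under stated assumptions: the matrix `TOY_PSSM`, its weights, and the function names are invented for illustration and do not reproduce the matrices, parameters, or interfaces of any tool named above.

```python
# Toy position-specific scoring matrix for a 9-residue peptide.
# All weights are invented for illustration and are NOT taken from
# SYFPEITHI, RANKPEP, or any other real tool.
TOY_PSSM = {
    1: {"L": 2.0, "M": 1.5},  # 0-based index 1: hypothetical anchor position
    8: {"V": 2.0, "L": 1.5},  # 0-based index 8: hypothetical anchor position
}


def score_peptide(peptide, pssm=TOY_PSSM, default=0.0):
    """Sum the per-position weights for a 9-residue peptide."""
    if len(peptide) != 9:
        raise ValueError("this sketch only scores 9-residue peptides")
    # Positions or residues absent from the matrix contribute `default`.
    return sum(pssm.get(i, {}).get(aa, default) for i, aa in enumerate(peptide))


def rank_peptides(peptides, pssm=TOY_PSSM):
    """Return peptides ordered from highest to lowest toy score."""
    return sorted(peptides, key=lambda p: score_peptide(p, pssm), reverse=True)
```

A higher score in this sketch only means a closer match to the invented matrix; predicting binding for a real MHC protein would call for one of the published algorithms listed above.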

Visualizing MHC-Antigen Co-Localization

Methods disclosed herein may comprise visualizing the co-localization of antigen and MHC protein expression in a biological sample. In some cases, a visual representation of MHC-antigen expression co-localization can be generated by a computer for display on a user interface, such as a graphical user interface. Visual representations of the disclosure can depict the presence or absence of nucleic acids, such as nucleic acids associated with and/or corresponding to MHC or antigen (e.g., tumor antigen) expression. Additionally or alternatively, visual representations can depict predictions of MHC-antigen complex presentation within a biological sample. Visual representations can depict the predicted presence, or absence, of MHC-antigen complexes within a biological sample.

In some instances, visual representations display information on a cellular level. For example, a visual representation can depict the presence of nucleic acids, the absence of nucleic acids, the predicted presence of MHC-antigen complexes, or the predicted absence of MHC-antigen complexes within or on the surface of individual cells. In some cases, methods disclosed herein can identify clones within a biological sample. Clones can be identified based on the co-expression, or predicted co-expression, of an antigen and MHC protein in a single cell. In some cases, clones are identified by analyzing a generated visual representation. In some cases, a visual representation is generated to depict the distribution of clones and/or clonal populations within a biological sample (i.e., the clonal structure of the sample). Visual representations of the clonal structure of a biological sample such as a tumor can be used to design, or select, a treatment for a subject from which the sample is derived. For example, a visual representation of a tumor's clonal structure can be used to design or select a bispecific antibody or T-cell receptor (TCR) based therapy against specific MHC-antigen complexes or cancer vaccines against such complexes.
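
As a minimal sketch of identifying candidate clones from per-cell co-expression, the following Python example groups cells by the set of targets detected in each cell. The input layout (pairs of cell identifier and target name), the function names, and the simplification that cells sharing an identical detection profile form one group are all assumptions made for illustration.

```python
from collections import defaultdict


def group_cells_by_profile(detections):
    """Group cells that share the same set of detected targets.

    detections: iterable of (cell_id, target_name) pairs, e.g. derived
    from probes located within segmented cells (hypothetical input).
    Returns {frozenset_of_targets: sorted list of cell_ids}.
    """
    per_cell = defaultdict(set)
    for cell_id, target in detections:
        per_cell[cell_id].add(target)
    groups = defaultdict(list)
    for cell_id, targets in per_cell.items():
        groups[frozenset(targets)].append(cell_id)
    return {profile: sorted(cells) for profile, cells in groups.items()}


def cells_coexpressing(detections, hla, antigen):
    """Return the cells in which both an HLA target and an antigen target were detected."""
    groups = group_cells_by_profile(detections)
    return sorted(
        cell
        for profile, cells in groups.items()
        if hla in profile and antigen in profile
        for cell in cells
    )
```

Each group could then be drawn in a distinct color at the positions of its cells to give a simple depiction of clonal structure.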

Identification of Cancer Cells

Information pertaining to MHC and/or antigen expression can be used to identify a cancerous cell. For example, the presence of a tumor antigen may indicate that a cell is cancerous. Cancer cell clones, sub-clones, and clonal populations of cells can also be identified based on the presence or absence of HLA gene variants and/or genes encoding antigen proteins in a cell. The presence of an HLA gene variant and/or gene encoding an antigen protein can be detected by, for example, determining the identity and location of a probe molecule hybridized to a corresponding nucleic acid using in situ sequencing. In some instances, the presence and location of cancer cells, cancer cell clones, sub-clones, and/or clonal populations of cancer cells can be portrayed in a visual representation.

Selection and Administration of Therapies

Knowledge of MHC and/or antigen expression in a biological sample derived from a subject can identify the presence of a cancer cell and/or guide the selection of an effective immunotherapy for the subject. Immunotherapies often rely on immune cells targeting tumor antigens presented on the surface of cancer cells. Frequently, the ability of an immune cell to effectively target a tumor antigen depends on display of the tumor antigen as part of a particular MHC-antigen complex. A loss of expression of, or mutation in, an MHC protein from a cancer cell can lead to altered or improper presentation of a tumor antigen on the cell surface and render an immunotherapy ineffective. Similarly, a loss of expression of, or mutation in, a tumor antigen can render an immunotherapy ineffective. Conversely, the presence of MHC-antigen complexes on a cancer cell surface can indicate that a cancer cell will be susceptible to a particular immunotherapy. In some cases, antigens and/or MHC protein expression is lost in only a portion of cells in a tumor. In some cases, antigens and/or MHC proteins are mutated in only a portion of cells in a tumor.

Methods disclosed herein can comprise selecting a therapy (e.g., an immunotherapy) to be administered to a subject. In some cases, the methods further comprise administering a therapy such as an immunotherapy to a subject. A therapy can be selected based at least in part on the spatial distribution of antigens, MHC proteins, and/or mutants thereof. Additionally or alternatively, a therapy can be selected based at least in part on the presence or absence of antigens, MHC proteins, and/or mutants thereof in a biological sample. For example, a therapy that targets an MHC-antigen complex that is expressed or predicted to be expressed homogeneously throughout a tumor volume can be selected and administered to a subject. In some cases, a therapy (e.g., an immunotherapy) that is likely to effectively target a clonal population can be selected and administered to a subject. In some cases, multiple clonal populations can be predicted to exist in a tumor and multiple therapies likely to effectively target each clonal population can be selected. In some cases, a method of the disclosure comprises administering one or more therapies likely to target a clonal population to a subject. In some cases, a method of the disclosure comprises identifying a clonal population that is likely to evade a particular immunotherapy and targeting the clonal population with a second therapy rather than with the immunotherapy that is not likely to be effective. Non-limiting examples of immunotherapies that can be selected and/or administered as part of methods disclosed herein include recombinant T-cell therapies (including T cells expressing recombinant or exogenous T cell receptors), chimeric antigen receptor (CAR) T-cell therapies, cancer vaccines, checkpoint inhibitors, and B-cell therapies.
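
One way to make "expressed homogeneously throughout a tumor volume" concrete is to divide the sample into regions and check that a complex is predicted in a sufficient fraction of cells in every region. The Python sketch below does this on a two-dimensional grid. The grid approach, the names `per_region_fraction` and `is_homogeneous`, and the default `bin_size` and `min_fraction` values are assumptions chosen for illustration; the disclosure does not prescribe a particular criterion.

```python
from collections import defaultdict


def per_region_fraction(cells, complex_name, bin_size):
    """Fraction of cells per grid region predicted to present a complex.

    cells: list of (x, y, set_of_predicted_complexes) tuples
    (hypothetical input layout for this sketch).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for x, y, complexes in cells:
        # Assign each cell to a square grid region by its coordinates.
        region = (int(x // bin_size), int(y // bin_size))
        totals[region] += 1
        if complex_name in complexes:
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


def is_homogeneous(cells, complex_name, bin_size=100.0, min_fraction=0.8):
    """True if every occupied region meets the minimum fraction."""
    fractions = per_region_fraction(cells, complex_name, bin_size)
    return bool(fractions) and all(f >= min_fraction for f in fractions.values())
```

A complex that passes such a check in every region might be prioritized as a therapy target, whereas a complex confined to some regions could indicate a clonal population that calls for a second therapy.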

T-cell receptors (TCRs) are generally found on the surface of T-cells and are responsible for recognizing antigens bound to MHC proteins. When TCRs engage with MHC-antigen complexes the corresponding T-cell is activated. T-cell activation can facilitate destruction of the recognized cell. In recombinant T-cell therapies, T-cells can be engineered to express TCRs targeted towards MHC-antigen complexes expressed by cancer cells so that when the TCR binds the MHC-antigen complex T-cell activation leads to destruction of the cancer cell. In some cases, TCRs are engineered to target a particular MHC-antigen complex predicted or known to be expressed on a cancer cell within a biological sample. For example, a target antigen or portion thereof can be used to produce or generate a TCR. In some cases, directed evolution methods are used to generate TCRs with altered properties, such as with higher affinity for a specific MHC-antigen complex. In some cases, directed evolution is achieved by display methods including, but not limited to, yeast display, phage display, or T cell display. In some cases, display approaches involve engineering, or modifying, a known, parent, or reference TCR. For example, in some cases, a wild-type TCR can be used as a template for producing mutagenized TCRs with a predetermined altered property, such as higher affinity for a predetermined target antigen or MHC-antigen complex. In some embodiments, a recombinant T-cell therapy is engineered or selected to target an antigen and/or MHC-antigen complex known or predicted to be expressed within a biological sample. In some cases, a recombinant T-cell therapy is engineered or selected based at least in part on the spatial distribution of antigen and/or MHC expression in a biological sample. In some cases, a method of the disclosure comprises administering an engineered or selected recombinant T-cell therapy to a subject.

CAR T-cell therapies utilize CAR T-cell receptors that have been engineered to combine both antigen binding and T-cell activating functions into a single receptor. In some embodiments of a method disclosed herein, a CAR T-cell therapy can be engineered or selected to target an antigen and/or MHC-antigen complex known or predicted to be expressed within a biological sample. In some cases, a CAR T-cell therapy is engineered or selected based at least in part on the spatial distribution of antigen and/or MHC expression in a biological sample. In some cases, a method of the disclosure comprises administering an engineered or selected CAR T-cell therapy to a subject.

A recombinant or CAR T-cell therapy of the disclosure can, in some instances, be a natural killer T (NKT) cell therapy. NKT cells are specialized T-cells. NKT cells can recognize lipid antigens presented on MHC proteins and can lead to activation of innate and adaptive immune cells in the tumor microenvironment. In some cases, an NKT cell therapy is engineered or selected based at least in part on the spatial distribution of antigen and/or MHC expression in a biological sample. In some cases, a method of the disclosure comprises administering an engineered or selected NKT cell therapy to a subject.

Cancer vaccines can treat cancer by activating a subject's immune system against cancer cells. Some cancer vaccines work by immunizing subjects against antigens expressed by cancer cells so that the immune system is stimulated to kill cancer cells expressing the antigen. In some cases, a subject can be immunized with an effective amount of an immunogen containing an effective amount of a particular antigen or MHC-antigen complex. In some cases, a subject can be immunized with precursors that will stimulate the subject's body to produce an immune response to the antigen, such as through the use of synthetic peptides, self-replicating synthetic RNA, and viral vectors, which may be used alone or in combination.

In some cases, the antigen of the MHC-antigen complex is an epitope of an antigen capable of binding to the MHC. In some embodiments, a cancer vaccine is engineered or selected to immunize a subject against a target antigen and/or MHC-antigen complex known or predicted to be expressed within a biological sample. In some cases, a cancer vaccine is engineered or selected based at least in part on the spatial distribution of antigen and/or MHC expression in a biological sample. In some cases, a method disclosed herein comprises administering an engineered or selected cancer vaccine to a subject.

Checkpoint inhibitors target immune checkpoints, immune system regulators that can dampen the immune response to a stimulus. Often, cancer cells evade destruction by immune cells by stimulating immune checkpoint targets, a process that can be reversed via the administration of checkpoint inhibitors. Non-limiting examples of molecules that can be targeted for inhibition by checkpoint inhibitors include CD25, PD-1 (CD279), PD-L1 (CD274, B7-H1), PD-L2 (CD273, B7-DC), CTLA-4, LAG3 (CD223), TIM3, 4-1BB (CD137), 4-1BBL (CD137L), GITR (TNFRSF18, AITR), CD40, CD40L, ICOS, ICOS-L, OX40 (CD134, TNFRSF4), OX40L, CXCR2, tumor associated antigens (TAA), B7-H3, B7-H4, BTLA, HVEM, GAL9, B7H3, B7H4, CD28, VISTA, CD27, CD30, STING, A2A adenosine receptor, KIR, and 2B4. In some instances, the effectiveness of a checkpoint inhibitor can be predicted based on the expression of antigens and/or MHC proteins by cancer cells. In some embodiments, a checkpoint inhibitor is selected based at least in part on the number of expected antigens within a biological sample. In some embodiments, a checkpoint inhibitor is selected based at least in part on an antigen, MHC protein, and/or MHC-antigen complex known or predicted to be expressed within a biological sample. In some cases, a checkpoint inhibitor is selected based at least in part on the spatial distribution of antigen and/or MHC expression in a biological sample. In some cases, a method disclosed herein comprises administering a selected checkpoint inhibitor to a subject.

Activated B-cells can induce specific T-cell responses directed towards cancer cell destruction. In some embodiments of a method disclosed herein, a B-cell therapy can be engineered or selected based at least in part on an antigen, MHC protein, and/or MHC-antigen complex known or predicted to be expressed within a biological sample. In some cases, a B-cell therapy is engineered or selected based at least in part on the spatial distribution of antigen and/or MHC expression in a biological sample. In some cases, a method of the disclosure comprises administering an engineered or selected B-cell therapy to a subject.

Variant Sequences and Alleles

Methods disclosed herein can analyze the spatial distribution of target sequences in a biological sample. In some cases, the identity and location of a target sequence can be determined using a probe molecule. For example, the method can comprise hybridizing a probe comprising a targeting sequence to a nucleic acid corresponding to a target sequence, identifying at least a portion of the probe, and determining a location of the target sequence within a biological sample by determining a location of the probe. In some cases, identifying and/or determining a location of the probe or at least a portion thereof comprises sequencing at least a portion of the probe in situ. In some cases, determining a location of a target sequence comprises identifying a target sequence. In some cases, a target sequence comprises a sequence of nucleotides. In some cases, a target sequence is part of a DNA molecule, cDNA molecule, RNA molecule, messenger ribonucleic acid (mRNA) molecule, transfer ribonucleic acid (tRNA) molecule, micro ribonucleic acid (miRNA) molecule, ribosomal ribonucleic acid (rRNA) molecule, small nucleolar ribonucleic acid (snoRNA) molecule, or a derivative of any of the foregoing. In some cases, a nucleic acid corresponding to a variant sequence is a DNA molecule, cDNA molecule, RNA molecule, mRNA molecule, miRNA molecule, rRNA molecule, snoRNA molecule, tRNA molecule, or a portion or derivative of any of the foregoing.
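
The relationship between identifying a probe and locating its target can be sketched as a lookup from sequenced identifier to target name, carrying the observed position along. In the Python example below, the `codebook` dictionary, the read layout (identifier sequence paired with an (x, y, z) position), and the function name `locate_targets` are assumptions made for illustration rather than a description of any particular in situ sequencing pipeline.

```python
def locate_targets(reads, codebook):
    """Map in situ sequenced probe identifiers to target locations.

    reads: list of (identifier_sequence, (x, y, z)) pairs, one per
    detected probe or amplification product (hypothetical layout).
    codebook: dict mapping identifier sequence -> target sequence name.
    Returns {target_name: [positions]}.
    """
    located = {}
    for identifier, position in reads:
        target = codebook.get(identifier)
        if target is None:
            # Identifier not in the codebook: skipped in this sketch.
            continue
        located.setdefault(target, []).append(position)
    return located
```

A target whose name is absent from the returned dictionary was not detected at any position, which in this sketch stands in for the absence of that target sequence in the imaged region.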

In some cases, a target sequence is an HLA variant sequence. An HLA variant sequence can, in some instances, comprise an HLA allele. HLA genes can encode for MHC proteins and come in two forms: class I and class II. In some instances, an HLA variant sequence is a class I or class II HLA allele. Class I HLA alleles encode class I MHC proteins and class II HLA alleles encode class II MHC proteins. MHC proteins are generally glycoproteins that contain a polymorphic antigen binding site or binding groove that can, in some cases, complex with peptides, including peptides processed by the cell machinery. A non-limiting example of a peptide that can complex with MHC proteins is an antigen such as a tumor antigen. In some cases, MHC molecules can be displayed or expressed on the cell surface, including as a complex with an antigen, i.e., MHC-antigen complex, for presentation of an antigen in a conformation recognizable by an antigen receptor on immune cells such as T-cells. Generally, MHC class I molecules are heterodimers having a membrane-spanning α chain, in some cases with three alpha domains, and a non-covalently associated β2 microglobulin. Generally, MHC class II molecules are composed of two transmembrane glycoproteins, α and β, both of which can span the membrane. An MHC molecule can include an effective portion of an MHC that contains an antigen binding site or sites for binding a peptide and the sequences necessary for recognition by the appropriate antigen receptor. In some embodiments, MHC class I molecules deliver antigens originating in the cytosol to the cell surface, where an MHC-antigen complex is recognized by T cells, such as generally CD8+ T cells, but in some cases CD4+ T cells. In some embodiments, MHC class II molecules deliver antigens originating in the vesicular system to the cell surface, where they are generally recognized by CD4+ T-cells.

Expression of an HLA allele can affect MHC protein and antigen presentation on cells such as cancer cells. In some instances, an HLA variant sequence is an HLA allele. HLA alleles can be classified to a level of precision that ranges from a group of similar HLA molecules (e.g., HLA-A*02) to distinct proteins, coding sequences, nucleotide sequences, and expression levels (e.g., HLA-A*02:01:101:01:02N). Non-limiting examples of HLA alleles include HLA-A*01, HLA-A*02:01, HLA-A*02:01:01:03:04N, HLA-A*03:01, HLA-A*24:02, HLA-B*44:02, HLA-C*07:01, HLA-C*08:02, HLA-DPA1*01, HLA-DPB1*04:02, HLA-DQA1*01, HLA-DQB1*01, HLA-DRB1*01, and HLA-DRA. In some embodiments, a method of the disclosure can assess the absence or spatial expression of one or more (e.g., 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 1,000, 2,000, 3,000, 4,000, 5,000, 10,000, 100,000 or more) HLA alleles in a biological sample. For example, the method of the disclosure can assess the absence or spatial expression of a selection of clinically actionable target sequences (e.g., HLA variants). The absence or spatial expression of parental HLA alleles and/or modifications or mutants thereof can be assessed. The expression of corresponding MHC proteins can be predicted from HLA expression patterns. In some cases, a therapy (e.g., an immunotherapy) can be engineered or selected for treatment based on the spatial expression of one or more HLA alleles in a biological sample. Predictions on the spatial organization of MHC proteins or MHC-antigen complexes in a biological sample can be made based on predicted expression of HLA alleles. In some cases, a method of the disclosure comprises generating a visual representation (e.g., for display on a user interface, such as a graphical user interface) of HLA allele expression in a biological sample.
In some cases, biological samples, such as blood and tissue needle aspirate, may not be analyzed in the context of an extracellular or tissue matrix structure, but have cellular structure, and the spatial expression comprises the observed or predicted cellular colocalization of MHC alleles and antigens.
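
Because HLA allele names encode their level of precision in colon-separated fields, a small parser can be used to compare alleles at a chosen resolution. The Python sketch below is a simplified illustration: the regular expression, the function names `parse_allele` and `truncate`, and the decision to accept any number of numeric fields are assumptions of this sketch, which does not validate names against the official HLA nomenclature.

```python
import re

# Simplified pattern: "HLA-<gene>", optionally followed by "*", one or
# more colon-separated numeric fields, and an optional one-letter suffix.
# This is an illustrative assumption, not the official nomenclature grammar.
_ALLELE = re.compile(
    r"^HLA-(?P<gene>[A-Z0-9]+)"
    r"(?:\*(?P<fields>\d+(?::\d+)*)(?P<suffix>[A-Z]?))?$"
)


def parse_allele(name):
    """Split an allele name into gene, numeric fields, and suffix."""
    match = _ALLELE.match(name)
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    fields = match.group("fields").split(":") if match.group("fields") else []
    return {
        "gene": match.group("gene"),
        "fields": fields,
        "suffix": match.group("suffix") or "",
    }


def truncate(name, n_fields):
    """Reduce an allele name to its first n_fields fields (suffix dropped)."""
    parsed = parse_allele(name)
    kept = parsed["fields"][:n_fields]
    return "HLA-" + parsed["gene"] + ("*" + ":".join(kept) if kept else "")
```

Truncating to one field, for example, groups HLA-A*02:01 with other members of the HLA-A*02 group, which may be useful when probes cannot resolve alleles more finely.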

A variant sequence of the disclosure can, in some cases, encode for, or be associated with, an antigen such as a tumor antigen. Tumor antigens are antigenic substances produced in tumor cells. Tumor antigens can trigger an immune response. Non-limiting examples of tumor antigens include tumor-specific antigens, tumor-associated antigens, and neoantigens. Tumor-specific antigens are expressed on cancer cells, but not on healthy cells. Tumor-associated antigens can be expressed on both cancerous and healthy cells. Neoantigens are antigens created due to tumor mutations. A method disclosed herein can comprise determining the location and identity of a target sequence that corresponds to or is associated with an antigen such as a tumor antigen. In some cases, a method disclosed herein comprises determining the identity and location of more than one target sequence, with each target sequence corresponding to or being associated with a tumor antigen, an HLA allele, or another sequence which can be used to infer the clonal structure of the tumor. For example, a method of the disclosure can determine the identity and location of a single neoantigen and its presenting HLA allele. In other cases, a method may determine the identity and location of hundreds or thousands of (e.g., at least about 10, 100, 1000, 10,000, 100,000 or more) target sequences. In these cases, targets may include identified antigens such as neoantigens, dozens or more nucleotide sequences that define each of the class I and II HLA alleles, sequences that can identify expressed cancer-associated antigens, and hundreds or more expressed mutations that can be used together to help define and predict the clonal structure. In some instances, a method of the disclosure can determine the absence of hundreds or more target sequences.
In some cases, the cancer samples may be highly mutated and hundreds of expressed somatic mutations may be identified, which can be used to inform clonal structure even if the specific antigen is not identified or is unknown. The absence or identity and location of one or more target sequences in a biological sample can be used to select or engineer a treatment such as an immunotherapy for a subject. In some cases, the selected or engineered treatment can be administered to the subject.

In some cases, a variant sequence comprises a mutation. Mutations can be associated with, for example, an increased risk of cancer or a neoantigen. Mutations can, in some instances, be tumor mutations.

Probe Design

Probe molecules can be designed to detect the presence or absence of target sequences in a biological sample. In some cases, the presence, absence, and/or location of a target sequence can be determined via identification of a probe molecule or a portion thereof. Probe molecules can be designed to hybridize to nucleic acid molecules corresponding to a variant sequence. A nucleic acid molecule can be, for example, a DNA molecule, cDNA molecule, RNA molecule, mRNA molecule, tRNA molecule, miRNA molecule, rRNA molecule, snoRNA molecule, or a derivative of any of the foregoing (e.g., an amplification product). In some embodiments, a method disclosed herein comprises reverse transcribing RNA to form cDNA, wherein the cDNA is a nucleic acid corresponding to a variant sequence. In some cases, a probe molecule hybridizes specifically to a variant sequence.

Different HLA alleles and nucleotides associated with or encoding tumor antigens can have very similar nucleotide sequences. Similarities in nucleotide sequences can make designing probes that are specific for a single allele or gene difficult. Thus, in some cases probe molecules of the disclosure can discriminate between two or more nucleic acids corresponding to different variant sequences known to be present in a biological sample. In some cases, probes can be designed or selected based on characteristics known about the type of cancer to be treated. In some cases, probes can be designed or selected based on information learned about a subject. For example, a method of the disclosure can comprise generating a genetic profile of a subject from which a sample is obtained to identify a plurality of variant sequences (e.g. HLA variant sequences or sequences encoding antigen variants) present in the subject. In some cases, the genetic profile is obtained from a tumor or cancerous cells. In some cases, the genetic profile is generated from healthy tissue in a subject. For example, a genetic profile may be generated from healthy tissue in a subject with a tumor so that it can be compared to the genetic profile of the tumor. Genetic profiles can be generated via, for example, microarrays, next generation sequencing, single cell sequencing of tissues, RNA sequencing, exome sequencing, or whole genome sequencing. In some instances, genetic profiles can be used to identify a plurality of variant sequences present in a biological sample to spatially analyze. In some instances, genetic profiles can be used to identify any mutations present in a biological sample. In some cases, the mutation(s) identified can be used to predict any associated antigen(s). In some embodiments, an antigen prediction algorithm can be used to identify a mutation and an associated antigen. 
Probe molecules can be designed or selected to discriminate between two nucleic acid molecules corresponding to sequences of the plurality of variant sequences identified and/or to preferentially hybridize to a nucleic acid molecule corresponding to only one of the identified variant sequences. An example of such a process is depicted in FIG. 1. An example of designing a probe to focus on the discrimination of two different HLA alleles known to be present in a sample is shown in FIG. 2, with bolded alleles representing alleles identified in a biological sample, and non-bolded alleles representing similar alleles that were not detected in the sample. Probes may be designed based on population-level or clinical cohort sequencing datasets, such as from “cancer atlas” databases, to target variants present at more than 1 variant in the cohort, or based on a percent frequency, such as 0.1%, 1%, 2%, 5%, 10%, or more.
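The discrimination step described above can be illustrated with a minimal sketch. It assumes the candidate sequences have already been aligned to the same length, and it uses toy sequences rather than real HLA allele sequences. The function names `discriminating_positions` and `candidate_probe_windows` and the parameter `probe_len` are hypothetical and are not part of the disclosure. The sketch returns target-side windows; an actual probe targeting sequence would be complementary to the chosen window.

```python
from typing import List


def discriminating_positions(target: str, others: List[str]) -> List[int]:
    """Return 0-based positions where `target` differs from every sequence in `others`.

    Assumption: all sequences are pre-aligned and of equal length.
    """
    if any(len(seq) != len(target) for seq in others):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i, base in enumerate(target)
        if all(seq[i] != base for seq in others)
    ]


def candidate_probe_windows(target: str, others: List[str], probe_len: int) -> List[str]:
    """Return target windows of length `probe_len` covering a discriminating position."""
    positions = set(discriminating_positions(target, others))
    windows = []
    for start in range(len(target) - probe_len + 1):
        if any(p in positions for p in range(start, start + probe_len)):
            windows.append(target[start:start + probe_len])
    return windows


# Toy aligned sequences (hypothetical, not real allele sequences).
allele_a = "ACGTACGTAC"
allele_b = "ACGTTCGTAC"

positions = discriminating_positions(allele_a, [allele_b])
windows = candidate_probe_windows(allele_a, [allele_b], probe_len=4)
print(positions)
print(windows)
```

Restricting the comparison to alleles actually identified in the sample, as in FIG. 2, keeps the list `others` short and may leave more discriminating positions available than a comparison against every known allele.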

Samples

Any suitable biological sample that comprises nucleic acid may be obtained from a subject. Any suitable biological sample that comprises nucleic acid may be used in the methods and systems described herein. A biological sample may be solid matter (e.g., biological tissue) or may be a fluid (e.g., a biological fluid). In general, a biological fluid can include any fluid associated with living organisms. Non-limiting examples of a biological sample include blood (or components of blood, e.g., white blood cells, red blood cells, platelets) obtained from any anatomical location (e.g., tissue, circulatory system, bone marrow) of a subject, and cells or tissue obtained from any anatomical location of a subject such as skin, heart tissue, lung tissue, kidney tissue, breath, bone marrow, stool, semen, vaginal fluid, interstitial fluids derived from tumorous tissue, breast tissue, pancreatic tissue, cerebrospinal fluid, throat swab, biopsies, placental fluid, amniotic fluid, liver tissue, muscle, smooth muscle, bladder tissue, gall bladder tissue, colonic tissue, intestinal tissue, brain tissue, cavity fluids, sputum, pus, microbiota, meconium, breast milk, prostate tissue, esophageal tissue, thyroid tissue, serum, saliva, urine, gastric and digestive fluid, tears, ocular fluids, sweat, mucus, earwax, oil, glandular secretions, spinal fluid, hair, fingernails, plasma, nasal swab or nasopharyngeal wash, cord blood, lymphatic fluids, and/or other excretions or body tissues. A biological sample may be a cell-free sample. Such cell-free sample may include DNA and/or RNA.

Additionally, biological samples of the disclosure include, without limitation, cells, populations of cells, needle or fine needle aspirates, tissue biopsies, tissue sections, tumor biopsies, biological tissues, surgical resections, tumors, and cancer cells. In some embodiments, methods disclosed herein analyze the spatial distribution of variant sequences in cancer/tumor samples, aid in the selection of treatment of cancer, and/or administer therapies for cancer treatment. Non-limiting examples of cancers disclosed herein include a fibrosarcoma, myosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, gastric cancer, esophageal cancer, rectal cancer, pancreatic cancer, ovarian cancer, prostate cancer, uterine cancer, cancer of the head and neck, skin cancer, brain cancer, squamous cell carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinoma, cystadenocarcinoma, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, choriocarcinoma, seminoma, embryonal carcinoma, Wilms' tumor, cervical cancer, testicular cancer, small cell lung carcinoma, non-small cell lung carcinoma, bladder carcinoma, epithelial carcinoma, glioma, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, melanoma, neuroblastoma, retinoblastoma, leukemia, lymphoma, or Kaposi sarcoma. In some instances, a biological sample is derived from healthy tissue. For example, a biological sample can be derived from a healthy control tissue from a subject with cancer.

Construction of the 3D Matrix

Methods disclosed herein may comprise providing a biological sample such as tumor tissue or a derivative thereof within a 3D matrix prior to contacting the sample with a probe molecule. An in situ 3D matrix can be formed from an original biological specimen using a number of approaches described herein. Formation of the 3D matrix can cause the termination of in vivo biochemical processes, substantially preserving spatial information within the biological sample such as the spatial information of the nucleic acid molecules and other sub-cellular components. Common methods for forming the 3D matrix from a biological specimen can include fixation, or the formation of chemical (via covalent bonds) or physical (via weak interactions) crosslinks among the 3D matrix of biomolecules, such as by temperature, electromagnetic radiation (e.g., microwave), or chemicals, such as formaldehyde, glutaraldehyde, or other material for biological sample fixation, within the cell and tissue. Any convenient fixation agent, or “fixative,” may be used to fix the biological sample in the absence or in the presence of hydrogel subunits, for example, formaldehyde, paraformaldehyde, glutaraldehyde, acetone, ethanol, methanol, formalin, osmium tetroxide, etc. In some cases, the fixative may be diluted in a buffer, e.g., saline, phosphate buffer (PB), phosphate buffered saline (PBS), citric acid buffer, potassium phosphate buffer, etc., usually at a concentration of 1-10%, e.g., about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, or 10%, for example, 4% paraformaldehyde/0.1M phosphate buffer; 2% paraformaldehyde/0.2% picric acid/0.1M phosphate buffer; 4% paraformaldehyde/0.2% periodate/1.2% lysine in 0.1 M phosphate buffer; 4% paraformaldehyde/0.05% glutaraldehyde in phosphate buffer; etc. 
The type of fixative used and the duration of exposure to the fixative will depend on the sensitivity of the molecules of interest in the specimen to denaturation by the fixative and may be readily determined using histochemical or immunohistochemical techniques.
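The dilution of a fixative into a buffer follows the standard relation C1 × V1 = C2 × V2. The following minimal sketch works through that arithmetic. The 16% stock concentration and 10 mL final volume are illustrative assumptions only and are not taken from the disclosure, and the function name `stock_volume_ml` is hypothetical.

```python
def stock_volume_ml(stock_pct: float, final_pct: float, final_volume_ml: float) -> float:
    """Volume of stock fixative needed, from C1 * V1 = C2 * V2."""
    if not 0 < final_pct <= stock_pct:
        raise ValueError("final concentration must be positive and not exceed the stock")
    return final_pct * final_volume_ml / stock_pct


# Assumed example: dilute a 16% stock to 4% in a total volume of 10 mL.
stock_ml = stock_volume_ml(stock_pct=16.0, final_pct=4.0, final_volume_ml=10.0)
buffer_ml = 10.0 - stock_ml
print(f"stock: {stock_ml} mL, buffer: {buffer_ml} mL")
```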

Alternatively, or in addition to the process of fixation, a tissue-chemical hydrogel may be formed by generation of chemical or physical crosslinks between biomolecules and other natural or synthetic components added to the sample to supplement or replace native cellular components for the purpose of immobilizing biomolecules. The chemical 3D matrix can comprise a polymeric compound. A 3D matrix may be formed in situ throughout the cell and tissue sample, such as through the formation of a hydrogel matrix. A hydrogel matrix may be formed upon cross-linking, gelling, or polymerizing subunits, such as, for example, cross-linking, gelling or polymerizing a polyacrylamide or polyethylene glycol (PEG). The 3D matrix may be generated by directing precursors of the 3D matrix into the biological specimen and subjecting the precursors to crosslinking or polymerization reactions. For example, acrylamide may be directed into the biological specimen and polymerized to form polyacrylamide. Further according to this embodiment, the chemical matrix can be composed substantially of polyacrylamide. According to another embodiment, the 3D matrix can be an expanding FISSEQ matrix, such as one comprised substantially of poly(acrylate-co-acrylic acid) (PAA) or Poly(N-isopropylacrylamide) (NIPAM). The matrix comprising NIPAM may be expandable or configured to expand by a change in temperature. According to another embodiment, the 3D matrix can be composed substantially of cross-linked poly-ethylene-glycol (PEG). The PEG can be of various molecular weights.

The 3D matrix may be formed by various processes such as via free-radical polymerization, chemical conjugation and bioconjugation reactions. For example, the reaction between a primary amine and N-hydroxysuccinimide esters or between thiols and maleimides or other chemical mechanisms may be used to form the 3D matrix. Aggregation and non-covalent mechanisms may also be used to generate the 3D matrix.

The 3D matrix may be formed using photopolymerization. Photopolymerization may use photons to initiate a polymerization reaction. The photopolymerization reaction may be initiated by a single-photon or a multiphoton excitation system as described elsewhere herein. Light may be manipulated to form specific two dimensional (2D) or 3D patterns and used to initiate the photopolymerization reaction. This may be used to construct a particular shape or pattern for the 3D matrix such that the matrix is generated in one part of the cell or cell derivative but not generated in another part of the cell or cell derivative. Light and patterns of light may be generated by spatial light modulators, such as a digital spatial light modulator. The spatial light modulators may employ a transmissive liquid crystal, reflective liquid crystal on silicon (LCOS), digital light processing, a digital micromirror device (DMD), or a combination thereof.
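A 2D illumination pattern of this kind can be represented as a binary mask, in which each element indicates whether light is delivered at that position. The sketch below builds such a mask for a rectangular region. It is only a data representation under assumed dimensions: it does not drive any actual spatial light modulator, and the function name `rectangular_mask` and its parameters are hypothetical.

```python
from typing import List


def rectangular_mask(rows: int, cols: int, top: int, left: int,
                     height: int, width: int) -> List[List[bool]]:
    """Return a rows x cols mask that is True inside the rectangle to be illuminated."""
    return [
        [top <= r < top + height and left <= c < left + width for c in range(cols)]
        for r in range(rows)
    ]


# Assumed example: illuminate a 2 x 3 region of a 4 x 6 field.
mask = rectangular_mask(rows=4, cols=6, top=1, left=2, height=2, width=3)
for row in mask:
    print("".join("#" if on else "." for on in row))
```

Positions marked True would correspond to regions where the matrix is generated, and positions marked False to regions where it is not.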

The fixative/hydrogel composition may comprise any hydrogel subunits, such as, but not limited to, poly(ethylene glycol) and derivatives thereof (e.g., PEG-diacrylate (PEG-DA), PEG-RGD), polyaliphatic polyurethanes, polyether polyurethanes, polyester polyurethanes, polyethylene copolymers, polyamides, polyvinyl alcohols, polypropylene glycol, polytetramethylene oxide, polyvinyl pyrrolidone, polyacrylamide, poly(hydroxyethyl acrylate), and poly(hydroxyethyl methacrylate), collagen, hyaluronic acid, chitosan, dextran, agarose, gelatin, alginate, protein polymers, methylcellulose and the like. Agents such as hydrophilic nanoparticles, e.g., poly-lactic acid (PLA), poly-glycolic acid (PLG), poly(lactic-co-glycolic acid) (PLGA), polystyrene, poly(dimethylsiloxane) (PDMS), etc. may be used to improve the permeability of the hydrogel while maintaining patternability. Materials such as block copolymers of PEG, degradable PEO, poly(lactic acid) (PLA), and other similar materials can be used to add specific properties to the hydrogel. Crosslinkers (e.g., bis-acrylamide, diazirine, etc.) and initiators (e.g., azobisisobutyronitrile (AIBN), riboflavin, L-arginine, etc.) may be included to promote covalent bonding between interacting macromolecules in later polymerization operations.

Nucleic acids (e.g., RNA molecules, DNA molecules, cDNA molecules, primers, probes, padlock probes) disclosed herein can comprise functional moieties which can be used to link the nucleic acid molecules to a 3D matrix. The functional moiety can be reacted with a reactive group on the 3D matrix through conjugation chemistry. In some cases, the functional moiety can be attached to a target of interest through conjugation chemistry. In some cases, the functional moiety can be directly attached to a reactive group on the native nucleic acid molecule. In some cases, the functional moiety can be indirectly linked to a target through an intermediate chemical or group. The conjugation approaches described herein are not limited to nucleic acid targets and can be used for protein or small molecule targets as well. A nucleotide analog comprising a functional moiety may be incorporated into a growing chain of the nucleic acid (e.g., cDNA molecule, probe, or primer) during nucleic acid synthesis or an extension reaction.

As used herein, the term “reactive group” or “functional moiety” generally refers to any moiety on a first reactant that is capable of reacting chemically with another functional moiety or reactive group on a second reactant to form a covalent or ionic linkage. “Reactive group” and “functional moiety” may be used interchangeably. For example, a reactive group of the monomer or polymer of the matrix-forming material can react chemically with a functional moiety (or another reactive group) on the substrate of interest or the target to form a covalent or ionic linkage. The substrate of interest or the target may then be immobilized to the matrix via the linkage formed by the reactive group and the functional moiety. Examples of suitable reactive groups or functional moieties include electrophiles or nucleophiles that can form a covalent linkage by reaction with a corresponding nucleophile or electrophile, respectively, on the substrate of interest. Non-limiting examples of suitable electrophilic reactive groups may include, for example, esters including activated esters (such as, for example, succinimidyl esters), amides, acrylamides, acridines, acyl azides, acyl halides, acyl nitriles, aldehydes, ketones, alkyl halides, alkyl sulfonates, anhydrides, aryl halides, aziridines, boronates, carbodiimides, diazoalkanes, epoxides, haloacetamides, haloplatinates, halotriazines, imido esters, isocyanates, isothiocyanates, maleimides, phosphoramidites, silyl halides, sulfonate esters, sulfonyl halides, and the like. Non-limiting examples of suitable nucleophilic reactive groups may include, for example, amines, anilines, thiols, alcohols, phenols, hydrazines, hydroxylamines, carboxylic acids, glycols, heterocycles, and the like.

Further according to these aspects of the present disclosure, endogenous or exogenous biomolecules, especially nucleic acids, may be covalently or noncovalently linked to the 3D matrix, preserving the spatial origin of the molecules during sample processing. The nucleic acid molecules or derivatives thereof can be coupled to the 3D matrix by coupling agents. To facilitate coupling or other downstream processes, endogenous nucleic acids may be modified using chemical reactions, such as alkylation, oxymercuration, periodate oxidation of RNA 3′ vicinal diols, carbodiimide activation of RNA and DNA 5′ phosphate, or by other nucleic-acid reactive chemistries such as psoralen and phenyl azide, for functional attachment of acryloyl or click-reactive moieties, which may be subsequently reacted with the 3D matrix. Alternatively, endogenous nucleic acids may be modified using biochemical reactions, such as ligation, polymerase extension, and hybridization, for functional attachment of acryloyl or click-reactive moieties, which may be subsequently reacted with the 3D matrix. For example, a DNA molecule may be ligated using a DNA ligase to attach the 3D matrix to the DNA molecule. The coupling reaction may couple probes or sequences comprising an identifier nucleotide sequence to the 3D matrix or may couple sequences to the 3D matrix that are associated with probes or an identifier nucleotide sequence.

Reference to the 3D matrix may be understood to be inclusive of a number of matrix compositions, including those comprised of biomolecules, synthetic polymers, hydrogels, or combinations thereof. An intermediate or final 3D matrix composition may comprise multiple independently formed matrixes, such as re-embedded hydrogels, or other forms of spatially coincident, or in situ, 3D matrix(es).

Further according to these aspects of the present disclosure, the synthetic 3D matrix may be partially or substantially cleared of certain species or classes of biomolecules, such as lipids and proteins, as by use of detergent and/or protease reagents. According to some aspects of the present disclosure, the sample can be cleared using a detergent solution, such as Triton-X or sodium dodecyl sulfate (SDS). The detergent may interact with the molecules allowing the molecules to be washed out or removed. Other non-limiting examples of detergents include Triton X-100, Triton X-114, Tween-20, Tween 80, saponin, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), and NP-40. According to some aspects of the present disclosure, the sample can be cleared using a protease reaction, such as Proteinase K. The protease may cleave or digest proteins such that the fragments or amino acids can be removed. According to some aspects of the present disclosure, the extracellular matrix can be substantially cleared using one or more specific or non-specific proteases. Other non-limiting examples of proteases include trypsin, chymotrypsin, papain, thrombin, and pepsin.

The synthetic 3D matrix may be immobilized onto a solid substrate, such as glass or plastic, facilitating handling and reagent exchange. According to one aspect, the 3D matrix can be affixed to a glass slide via oxysilane-functionalization with acrylamide- or free-radical-polymerizing groups, such as methacryloxypropyltrimethoxysilane. The 3D matrix may be free-floating or otherwise not attached to a solid substrate.

A matrix may be used in conjunction with a solid support. For example, the matrix can be polymerized in such a way that one surface of the matrix is attached to a solid support (e.g., a glass surface, a flow cell, a glass slide, a well), while the other surface of the matrix is exposed or sandwiched between two solid supports. According to some aspects of the present disclosure, the matrix can be contained within a container. In some cases, the biological sample may be fixed or immobilized on a solid support.

Solid supports of the present disclosure may be fashioned into a variety of shapes. In certain embodiments, the solid support is substantially planar. Examples of solid supports include plates such as slides, microtitre plates, flow cells, coverslips, microchips, and the like, containers such as microfuge tubes, test tubes and the like, tubing, sheets, pads, films and the like. Additionally, the solid supports may be, for example, biological, nonbiological, organic, inorganic, or a combination thereof.

The term “solid surface” or “solid support,” as used herein, refers to the surface of a solid support or substrate and includes any material that can serve as a solid or semi-solid foundation for attachment of a biological sample such as polynucleotides, amplicons (i.e., amplification products), DNA balls, other nucleic acids and/or other polymers, including biopolymers. Examples of materials comprising solid surfaces include glass, modified glass, functionalized glass, inorganic glasses, microspheres, including inert and/or magnetic particles, plastics, polysaccharides, nylon, nitrocellulose, ceramics, resins, silica, silica-based materials, carbon, metals, an optical fiber or optical fiber bundles, a variety of polymers other than those exemplified above and multi-well microtiter plates. Examples of plastics include acrylics, polystyrene, copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes and Teflon™. Examples of silica-based materials include silicon and various forms of modified silicon.

Solid surfaces can also be varied in their shape depending on the application in a method described herein. For example, a solid surface useful in the present disclosure can be planar or contain regions that are concave or convex.

Computer Systems

The present disclosure provides computer systems that are programmed to implement methods of the disclosure. FIG. 3 shows a computer system 301 that is programmed or otherwise configured to aid in designing probes, engineering immunotherapies, selecting immunotherapies for administration to a subject, predicting MHC-antigen complex presentation, processing sequencing reads, sequencing nucleic acids, or constructing visual representations of HLA expression, antigen expression, MHC expression, or MHC-antigen presentation within a biological sample. The computer system 301 can regulate various aspects of the present disclosure, such as, for example, store data relating to 3D spatial positions of nucleic acids (e.g., probes, amplification products, rolonies, etc.), align sequencing reads, map nucleic acid locations, and generate outputs regarding spatial positions. In some aspects, the computer system may be programmed to control release of reagents, activation of reactions (e.g., amplification reactions), and/or may initiate a sequencing reaction to take place. The computer system 301 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.

The computer system 301 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 305, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 301 also includes memory or memory location 310 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 315 (e.g., hard disk), communication interface 320 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 325, such as cache, other memory, data storage and/or electronic display adapters. The memory 310, storage unit 315, interface 320 and peripheral devices 325 are in communication with the CPU 305 through a communication bus (solid lines), such as a motherboard. The storage unit 315 can be a data storage unit (or data repository) for storing data. The computer system 301 can be operatively coupled to a computer network (“network”) 330 with the aid of the communication interface 320. The network 330 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 330 in some cases is a telecommunication and/or data network. The network 330 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 330, in some cases with the aid of the computer system 301, can implement a peer-to-peer network, which may enable devices coupled to the computer system 301 to behave as a client or a server.

The CPU 305 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 310. The instructions can be directed to the CPU 305, which can subsequently program or otherwise configure the CPU 305 to implement methods of the present disclosure. Examples of operations performed by the CPU 305 can include fetch, decode, execute, and writeback.

The CPU 305 can be part of a circuit, such as an integrated circuit. One or more other components of the system 301 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).

The storage unit 315 can store files, such as drivers, libraries and saved programs. The storage unit 315 can store user data, e.g., user preferences and user programs. The computer system 301 in some cases can include one or more additional data storage units that are external to the computer system 301, such as located on a remote server that is in communication with the computer system 301 through an intranet or the Internet.

The computer system 301 can communicate with one or more remote computer systems through the network 330. For instance, the computer system 301 can communicate with a remote computer system of a user (e.g., a user generating the indices of the current disclosure or a user utilizing such indices). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), personal digital assistants, or cloud systems (e.g. Amazon AWS). The user can access the computer system 301 via the network 330.

Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 301, such as, for example, on the memory 310 or electronic storage unit 315. The machine executable or machine-readable code can be provided in the form of software. During use, the code can be executed by the processor 305. In some cases, the code can be retrieved from the storage unit 315 and stored on the memory 310 for ready access by the processor 305. In some situations, the electronic storage unit 315 can be precluded, and machine-executable instructions are stored on memory 310.

The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

Aspects of the systems and methods provided herein, such as the computer system 301, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.

Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 301 can include or be in communication with an electronic display 335 that comprises a user interface (UI) 340. In some instances, the UI can provide the spatial origin of nucleic acid molecules, show the detection and/or sequencing of biomolecules of interest, or generate or display an electronic report associating the 3D spatial position with a sequence of a nucleic acid molecule. Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.

In some embodiments, data including the spatial origin/position of nucleic acid molecules detected may be generated and compiled (e.g., in a clinical “report”). In some embodiments, the data may be processed by computer algorithm or with human assistance, e.g., by an oncologist, clinical genomicist, or pathologist, into a concise representation of the presence and/or absence of variants and clonality thereof within the tumor and with respect to histological tissue features, when present, such as by reference to databases, for the purpose of distilling clinically actionable or potentially actionable aspects of the high-dimensional data for diagnosis, prognosis, or therapeutic guidance. The report may be presented in analog or digital form and, in the latter embodiment, may include interactive graphical user interface features for the purpose of visualization and performing statistical analysis with respect to the patient sample and/or external datasets.
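One simple way to reduce such spatial data to a concise presence/absence representation is to divide the sample into tiles and record which variants are detected in each tile. The sketch below illustrates this under stated assumptions: the tile size, the variant names, and the coordinates are invented for illustration, and the fraction of occupied tiles is only a crude stand-in for a clonality measure, not a method stated in the disclosure. The function names `variants_by_tile` and `tile_fraction` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]
Tile = Tuple[int, int]


def variants_by_tile(locations: Dict[str, List[Point]], tile_size: float) -> Dict[Tile, Set[str]]:
    """Group detected variants by the (x, y) tile in which each detection falls."""
    tiles: Dict[Tile, Set[str]] = defaultdict(set)
    for variant, points in locations.items():
        for x, y, _z in points:
            tiles[(int(x // tile_size), int(y // tile_size))].add(variant)
    return dict(tiles)


def tile_fraction(locations: Dict[str, List[Point]], tile_size: float) -> Dict[str, float]:
    """Fraction of tiles containing any detection in which each variant is present."""
    tiles = variants_by_tile(locations, tile_size)
    if not tiles:
        return {variant: 0.0 for variant in locations}
    return {
        variant: sum(variant in found for found in tiles.values()) / len(tiles)
        for variant in locations
    }


# Hypothetical detections: variant name -> list of (x, y, z) positions.
locations = {
    "variant_1": [(1.0, 2.0, 0.5), (12.0, 3.0, 0.5)],
    "variant_2": [(13.0, 4.0, 1.0)],
}
tiles = variants_by_tile(locations, tile_size=10.0)
fractions = tile_fraction(locations, tile_size=10.0)
print(tiles)
print(fractions)
```

In this toy input, a variant detected in every occupied tile is consistent with a broadly distributed population, whereas a variant confined to a subset of tiles may indicate a spatially restricted one; any clinical interpretation would require the expert review described above.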

Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 305. The algorithm can, for example, be executed to generate the indices of the current disclosure, or map and align sequencing reads to identify a spatial origin of a given sequence.
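By way of non-limiting illustration, one way an algorithm might associate sequencing reads with a spatial origin is sketched below. The sketch assumes that each read has already been paired with the coordinates at which it was imaged and that a table of expected probe sequences is available. The function names, the mismatch threshold, and the example sequences are hypothetical and are not taken from the disclosure. Nearest-match assignment by Hamming distance is used here only as a simple stand-in for read mapping and alignment.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_reads(reads, references, max_mismatches=1):
    """Pair each read position with the closest reference name.

    reads: iterable of (position, sequence) tuples.
    references: dict mapping a hypothetical probe name to its sequence.
    Reads with no reference within max_mismatches are dropped.
    On a tie, the reference listed first in the dict is kept.
    """
    assigned = []
    for position, sequence in reads:
        best_name, best_distance = None, None
        for name, reference in references.items():
            if len(reference) != len(sequence):
                continue  # only compare sequences of equal length
            distance = hamming(sequence, reference)
            if best_distance is None or distance < best_distance:
                best_name, best_distance = name, distance
        if best_distance is not None and best_distance <= max_mismatches:
            assigned.append((position, best_name))
    return assigned


if __name__ == "__main__":
    # Hypothetical reference sequences and reads, for illustration only.
    references = {"HLA_probe": "ACGTAC", "CTA_probe": "TTGCAA"}
    reads = [((10, 20, 3), "ACGTAC"), ((4, 8, 1), "TTGCAT")]
    for position, name in assign_reads(reads, references):
        print(position, name)
```

In practice, a dedicated aligner and quality filtering would likely replace the exhaustive comparison shown here.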

While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

Definitions

Unless defined otherwise, all terms of art, notations and other technical and scientific terms or terminology used herein are intended to have the same meaning as is commonly understood by one of ordinary skill in the art to which the claimed subject matter pertains. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over what is generally understood in the art.

Throughout this application, various embodiments may be presented in a range format. It may be understood that the description in range format is merely for convenience and brevity and may not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range can be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 can be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

As used in the specification and claims, the singular form “a”, “an” or “the” includes plural references unless the context clearly dictates otherwise. For example, the term “a cell” includes a plurality of cells, including mixtures thereof.

Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.

Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.

As used herein, the term “about” a number refers to that number plus or minus 10% of that number. The term “about” a range refers to that range minus 10% of its lowest value and plus 10% of its greatest value.
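As a non-limiting illustration of the preceding definition, the bounds implied by “about” can be computed as in the following sketch. The function names are hypothetical, and the sketch assumes positive values so that the lowest value of a range is decreased and the greatest value is increased.

```python
def about_number(value, percent=10):
    """Return (low, high) for "about" a number: the number plus or minus 10%."""
    delta = abs(value) * percent / 100
    return (value - delta, value + delta)


def about_range(lowest, greatest, percent=10):
    """Return (low, high) for "about" a range.

    The lower bound is the lowest value minus 10% of the lowest value;
    the upper bound is the greatest value plus 10% of the greatest value.
    """
    low = lowest - abs(lowest) * percent / 100
    high = greatest + abs(greatest) * percent / 100
    return (low, high)


if __name__ == "__main__":
    print(about_number(100))    # bounds for "about 100"
    print(about_range(10, 20))  # bounds for "about 10 to 20"
```

Under this reading, “about 100” spans 90 to 110, and “about 10 to 20” spans 9 to 22.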

As used herein, the terms “amplifying” and “amplification” generally refer to generating an extension product or one or more copies (or “amplified product” or “amplification product”) of a nucleic acid. The one or more copies may be generated by nucleic acid extension. Such extension may be a single round of extension or multiple rounds of extension. The amplified product may be generated by polymerase chain reaction (PCR).

The term “rolony,” as used herein, generally refers to a rolling circle colony, such as, for example, a colony of nucleic acid molecules generated by rolling circle amplification (RCA).

The term “nucleic acid,” as used herein, generally refers to a polymeric form of nucleotides of any length. A nucleic acid may comprise either deoxyribonucleotides (dNTPs) or ribonucleotides (rNTPs), or analogs thereof. A nucleic acid may be an oligonucleotide or a polynucleotide. Nucleic acids may have any three-dimensional structure and may perform any function. Non-limiting examples of nucleic acids include DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant nucleic acids, branched nucleic acids, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A nucleic acid may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be made before or after assembly of the nucleic acid. The sequence of nucleotides of a nucleic acid may be interrupted by non-nucleotide components. A nucleic acid may be further modified after polymerization, such as by conjugation, with a functional moiety for immobilization.

As used herein, the term “subject,” generally refers to an entity or a medium that has testable or detectable genetic information. A subject can be a person or an individual. A subject can be a vertebrate, such as, for example, a mammal. Non-limiting examples of mammals include murines, simians, and humans. A subject may be an animal, such as a farm animal. A subject may be a pet, such as dog, cat, mouse, rat, or bird. Other examples of subjects include food, plant, soil, and water. A subject may be displaying a disease. As an alternative, the subject may be asymptomatic.

The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.

EMBODIMENTS

The following non-limiting embodiments provide illustrative examples of the disclosure, but do not limit the scope of the disclosure.

Embodiment 1. A method of analyzing a spatial distribution of a first human leukocyte antigen (HLA) variant sequence in a biological sample comprising:

(a) obtaining a biological sample comprising a nucleic acid corresponding to the first HLA variant sequence from a subject;

(b) hybridizing a first probe comprising an HLA targeting sequence to the nucleic acid corresponding to the first HLA variant sequence;

(c) identifying at least a portion of the first probe; and

(d) determining a location of the first HLA variant sequence within the biological sample by determining a location of the first probe.

Embodiment 2. The method of embodiment 1, wherein identifying at least a portion of the first probe comprises sequencing at least a portion of the first probe in situ.

Embodiment 3. The method of embodiment 1 or 2, wherein determining a location of the first probe comprises sequencing at least a portion of the first probe in situ.

Embodiment 4. The method of any one of embodiments 1-3, wherein determining a location of the first HLA variant sequence further comprises identifying the first HLA variant sequence.

Embodiment 5. The method of embodiment 4, wherein the first HLA variant sequence comprises an HLA allele.

Embodiment 6. The method of embodiment 4 or 5, wherein identifying the first HLA variant sequence comprises identifying the first probe.

Embodiment 7. The method of any one of embodiments 1-6, further comprising providing the biological sample within a three-dimensional (3D) matrix that preserves spatial information of the first HLA variant sequence prior to operations (c) and (d).

Embodiment 8. The method of embodiment 7, wherein providing the biological sample within the 3D matrix comprises generating the 3D matrix.

Embodiment 9. The method of embodiment 7 or 8, further comprising immobilizing the first probe on the 3D matrix.

Embodiment 10. The method of any one of embodiments 7-9, further comprising immobilizing the nucleic acid corresponding to the first HLA variant sequence on the 3D matrix.

Embodiment 11. The method of any one of embodiments 7-10, wherein the biological sample is provided within the 3D matrix by directing a precursor of the 3D matrix through the biological sample and subjecting the precursor of the 3D matrix to a reaction to generate cross-links and form the 3D matrix.

Embodiment 12. The method of embodiment 11, wherein the cross-links comprise chemical crosslinks.

Embodiment 13. The method of embodiment 11, wherein the cross-links comprise physical crosslinks.

Embodiment 14. The method of embodiment 11, wherein the reaction comprises free-radical polymerization.

Embodiment 15. The method of embodiment 11, wherein the reaction comprises a chemical conjugation reaction.

Embodiment 16. The method of embodiment 11, wherein the reaction comprises a bioconjugation reaction.

Embodiment 17. The method of embodiment 11, wherein the reaction comprises a photopolymerization reaction.

Embodiment 18. The method of any one of embodiments 1-17, wherein the biological sample comprises a second nucleic acid corresponding to a second variant sequence and the method further comprises:

(A) hybridizing a second probe comprising a second nucleic acid targeting sequence to the second nucleic acid corresponding to the second variant sequence;

(B) identifying at least a portion of the second probe; and

(C) determining a location of the second variant sequence within the biological sample by determining a location of the second probe.

Embodiment 19. The method of embodiment 18, wherein the second variant sequence comprises a mutation.

Embodiment 20. The method of embodiment 19, wherein the mutation is associated with an increased risk of cancer.

Embodiment 21. The method of embodiment 19, wherein the mutation is associated with a tumor antigen.

Embodiment 22. The method of embodiment 19, wherein the mutation is associated with a cancer/testis antigen.

Embodiment 23. The method of embodiment 19, wherein the mutation is associated with an oncofetal protein.

Embodiment 24. The method of embodiment 19, wherein the mutation is a tumor mutation.

Embodiment 25. The method of embodiment 19, wherein the mutation is associated with a tumor suppressor protein.

Embodiment 26. The method of embodiment 19, wherein the mutation is associated with a neoantigen.

Embodiment 27. The method of any one of embodiments 18-26, further comprising generating a visual representation of the location of the first HLA variant sequence and the location of the second variant sequence for display on a graphical user interface (GUI).

Embodiment 28. The method of any one of embodiments 18-26, further comprising detecting a clone within the biological sample by comparing the location of the first HLA variant sequence and the location of the second variant sequence.

Embodiment 29. The method of embodiment 28, further comprising generating a visual representation of the location of the clone within the biological sample for display on a graphical user interface (GUI).

Embodiment 30. The method of any one of embodiments 18-29, further comprising identifying a cell or derivative thereof within the biological sample, wherein the cell or derivative thereof comprises the first HLA variant sequence and the second variant sequence.

Embodiment 31. The method of any one of embodiments 18-30, further comprising predicting the presentation of a peptide on a major histocompatibility complex (MHC) protein expressed in the biological sample, wherein the peptide is at least partially encoded by the second variant sequence and the MHC protein is at least partially encoded by the HLA variant sequence.

Embodiment 32. The method of embodiment 31, wherein the peptide is a mutant peptide.

Embodiment 33. The method of embodiment 31 or 32, wherein the peptide is associated with an increased risk of cancer.

Embodiment 34. The method of any one of embodiments 31-33, further comprising selecting a treatment to be administered to the subject, wherein:

- the treatment comprises administration of a cell to the subject; and
- the cell comprises a cell receptor that recognizes the peptide on the MHC protein expressed in the biological sample.

Embodiment 35. The method of embodiment 34, wherein the cell is a T-cell, a B cell, or a natural killer T (NKT) cell.

Embodiment 36. The method of embodiment 34, wherein the cell is a recombinant T-cell.

Embodiment 37. The method of any one of embodiments 34-36, wherein the cell expresses a chimeric antigen receptor.

Embodiment 38. The method of any one of embodiments 33-35, wherein the cell expresses a recombinant T cell receptor.

Embodiment 39. The method of any one of embodiments 31-38, further comprising selecting a treatment to be administered to the subject, wherein the treatment is more likely to be effective in a subject with one or more cancer cells presenting the peptide on the MHC protein expressed in the biological sample than it is in a subject without the one or more cancer cells that present the peptide on the MHC protein expressed in the biological sample.

Embodiment 40. The method of embodiment 39, wherein the treatment is an immunotherapy.

Embodiment 41. The method of embodiment 39 or embodiment 40, wherein the treatment comprises administration of a checkpoint inhibitor to the subject.

Embodiment 42. The method of any one of embodiments 31-41, further comprising selecting a treatment to be administered to the subject, wherein:

- the treatment comprises administration of the peptide; and
- the treatment is more likely to be effective in a subject with one or more cancer cells expressing the MHC protein expressed in the biological sample than it is in a subject without one or more cancer cells that express the MHC protein.

Embodiment 43. The method of any one of embodiments 1-42, wherein the biological sample further comprises a third nucleic acid, and wherein the third nucleic acid corresponds to a second HLA variant sequence, the method further comprising:

(1) hybridizing a third probe comprising a second HLA targeting sequence to the third nucleic acid;

(2) identifying at least a portion of the third probe; and

(3) determining a location of the second HLA variant sequence within the biological sample by determining a location of the third probe.

Embodiment 44. The method of any one of embodiments 1-43, further comprising, prior to operation (b):

(I) obtaining a genetic profile of the subject;

(II) detecting a presence or absence of a first HLA allele in the subject by analyzing the genetic profile.

Embodiment 45. The method of embodiment 44, wherein the first HLA variant sequence comprises the first HLA allele detected in the genetic profile.

Embodiment 46. The method of embodiment 44, wherein the first HLA allele comprises a mutation.

Embodiment 47. The method of embodiment 44, wherein the first HLA allele is a gene variant.

Embodiment 48. The method of any one of embodiments 1-43, further comprising identifying a first group of HLA alleles, wherein the first group of HLA alleles are expressed in the biological sample, and wherein the first probe is designed to hybridize to a nucleic acid corresponding to one of the HLA alleles of the first group of alleles.

Embodiment 49. The method of embodiment 48, wherein the first probe discriminates between two alleles of the first group of HLA alleles.

Embodiment 50. The method of any one of embodiments 1-43, further comprising, prior to operation (b):

(I) obtaining a genetic profile of the subject;

(II) detecting a plurality of HLA alleles in the subject by analyzing the genetic profile;

wherein the first probe preferentially hybridizes to a nucleic acid corresponding to only one of the HLA alleles detected in the genetic profile.

Embodiment 51. The method of any one of embodiments 44-50, wherein the genetic profile is generated via RNA sequencing.

Embodiment 52. The method of any one of embodiments 44-50, wherein the genetic profile is generated via exome sequencing.

Embodiment 53. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is a class I HLA allele.

Embodiment 54. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is a class II HLA allele.

Embodiment 55. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-A*01:01.

Embodiment 56. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-A*02:01.

Embodiment 57. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-B*44:02.

Embodiment 58. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-C*07:01.

Embodiment 59. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-C*08:02.

Embodiment 60. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-DPA1.

Embodiment 61. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-DPB1*01.

Embodiment 62. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-DQA1.

Embodiment 63. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-DQB1.

Embodiment 64. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-DRB1.

Embodiment 65. The method of any one of embodiments 1-52, wherein the first HLA variant sequence is HLA-DRA.

Embodiment 66. The method of any one of embodiments 1-65, wherein the nucleic acid corresponding to the first HLA variant sequence is a DNA molecule.

Embodiment 67. The method of any one of embodiments 1-65, wherein the nucleic acid corresponding to the first HLA variant sequence is an RNA molecule.

Embodiment 68. The method of any one of embodiments 1-66, further comprising, prior to operation (b), reverse transcribing RNA expressed in the biological sample to form cDNA, wherein the cDNA comprises the nucleic acid corresponding to the first HLA variant sequence.

Embodiment 69. The method of any one of embodiments 1-68, wherein the biological sample is a tissue biopsy.

Embodiment 70. The method of any one of embodiments 1-68, wherein the biological sample is a tumor biopsy.

Embodiment 71. The method of any one of embodiments 1-68, wherein the biological sample is biological tissue.

Embodiment 72. The method of any one of embodiments 1-68, wherein the biological sample is a surgical resection.

Embodiment 73. The method of any one of embodiments 1-68, wherein the biological sample is a tumor.

Embodiment 74. The method of any one of embodiments 1-68, wherein the biological sample is a blood sample.

Embodiment 75. The method of any one of embodiments 1-73, further comprising, prior to operation (b), generating a section, wherein the section comprises a portion of the biological sample.

Embodiment 76. The method of any one of embodiments 1-74, further comprising, prior to operation (c), subjecting the first probe to an amplification reaction to generate an amplified nucleic acid molecule that corresponds to the first HLA variant sequence.

Embodiment 77. The method of embodiment 75, wherein identifying at least a portion of the first probe comprises identifying at least a portion of the amplified nucleic acid molecule.

Embodiment 78. The method of embodiment 75 or 76, wherein determining the location of the first probe comprises determining a location of the amplified nucleic acid molecule.

Embodiment 79. The method of any one of embodiments 7-77, further comprising, prior to operation (c), subjecting the first probe to an amplification reaction to generate an amplified nucleic acid molecule that corresponds to the first HLA variant sequence and immobilizing the amplified nucleic acid molecule on the 3D matrix.

Embodiment 80. The method of any one of embodiments 1-78, wherein the first probe is a circularizable probe.

Embodiment 81. The method of embodiment 80, wherein the circularizable probe is a padlock probe.

Embodiment 82. The method of embodiment 81, wherein the padlock probe comprises:

- a first end;
- a second end;
- a 5′ terminal region; and
- a 3′ terminal region; and
  wherein the 5′ terminal region and the 3′ terminal region hybridize to the nucleic acid corresponding to the first HLA variant sequence.

Embodiment 83. The method of embodiment 82, further comprising circularizing the padlock probe by ligating the first end and the second end of the padlock probe together, thereby generating a circularized padlock probe.

Embodiment 84. The method of embodiment 82, wherein the first end and the second end are contiguous.

Embodiment 85. The method of embodiment 82, wherein the first end and the second end are separated by a gap region containing at least one nucleotide.

Embodiment 86. The method of embodiment 85, wherein the gap region contains from 2 to 500 nucleotides.

Embodiment 87. The method of embodiment 86, further comprising filling the gap region by incorporating at least one nucleotide in an extension reaction.

Embodiment 88. The method of any one of embodiments 76-87, wherein the amplification reaction is a rolling circle amplification (RCA) reaction.

Embodiment 89. The method of any one of embodiments 76-88, wherein:

- the nucleic acid corresponding to the first HLA variant sequence is a DNA molecule hybridized to an RNA molecule;
- the nucleic acid corresponding to the first HLA variant sequence comprises a first sequence;
- the RNA molecule comprises a second sequence;
- the first sequence is the reverse complement of the second sequence;
- the method further comprises, prior to (b);
  - (i) degrading or digesting at least a portion of the RNA molecule; and
- the second sequence is identified based on the identification of at least a portion of the amplified nucleic acid sequence.

Embodiment 90. The method of embodiment 89, wherein the DNA molecule is a cDNA molecule.

Embodiment 91. The method of embodiment 89 or 90, wherein:

- the biological sample is present in a 3D matrix; and
- the DNA molecule is immobilized to the 3D matrix.

Embodiment 92. The method of any one of embodiments 89-90, wherein:

- the biological sample is present in a 3D matrix; and
- the first probe is immobilized to the 3D matrix.

Embodiment 93. The method of any one of embodiments 1-92, further comprising administering a treatment to the subject, wherein the treatment is selected for administration to the subject based at least partially on the spatial distribution of the HLA variant sequence in the biological sample.

Embodiment 94. The method of embodiment 93, wherein the treatment comprises an immunotherapy.

Embodiment 95. The method of embodiment 93, wherein the treatment comprises a checkpoint inhibitor.

Embodiment 96. The method of embodiment 93, wherein the treatment comprises a cancer vaccine.

Embodiment 97. The method of embodiment 93, wherein the treatment comprises a chimeric antigen receptor T-cell therapy.

Embodiment 98. The method of embodiment 93, wherein the treatment comprises a recombinant T-cell therapy.

Embodiment 99. A method of identifying a location of a human leukocyte antigen (HLA) allele in a biological sample comprising targeting a nucleobase to a nucleic acid molecule encoding the HLA allele in the biological sample and identifying a sequence of the nucleic acid molecule or derivative thereof in situ to identify the location of the HLA allele within the biological sample.

Embodiment 100. A method of identifying a location of a human leukocyte antigen (HLA) allele in a biological sample comprising targeting a nucleic acid probe molecule to a nucleic acid molecule encoding the HLA allele in the biological sample and identifying a sequence of the nucleic acid molecule or derivative thereof in situ to identify the location of the HLA allele within the biological sample.

EXAMPLES

The following examples are included for illustrative purposes only and are not intended to limit the scope of the disclosure.

Example 1—RNA-Seq Guided Selection and Administration of Cancer Treatment

Cancer treatment is selected and administered based on RNA sequencing of a tumor.

A tumor biopsy is taken from a subject and placed in a microcentrifuge tube. The biopsy sample is homogenized in 1 mL of TRIzol (TRI) reagent per 10 cm² of tissue area. The homogenized tissue is then transferred to a centrifuge tube. The homogenized tissue is then spun in a centrifuge at 12,000×g for 10 minutes at 4° C. The supernatant is then transferred to a fresh tube which is incubated at room temperature for 5 minutes. Chloroform is then added at a ratio of 1:5 chloroform to TRI reagent and the tube is shaken and incubated at room temperature for 2 minutes. Following the incubation period, the centrifuge tube is spun at 12,000×g for fifteen minutes at 4° C. After centrifugation, the aqueous (top) phase is collected via a pipette and 70% EtOH is added to the aqueous phase in a 1:1 ratio of ethyl alcohol (EtOH) to the aqueous phase. 700 μL of the EtOH/aqueous mixture is then added to an RNeasy column. The RNeasy column is spun at 12,000×g for 30 seconds and the flow through is discarded. 700 μL of RW1 buffer (from an RNeasy kit) is added to the column which is again spun at 12,000×g for 30 seconds. 500 μL of RPE buffer (from an RNeasy kit) is added to the column which is spun at 12,000×g before the flow through is discarded. 500 μL of RPE buffer is then again added to the column which is spun at 12,000×g before the flow through is discarded. The column is then spun via a centrifuge at 12,000×g for 2 minutes to dry the column of EtOH. 50 μL of Ribonuclease (RNase)/deoxyribonuclease (DNase)-free H₂O is added to the column. One minute later, the column is spun at 12,000×g for 5 minutes. Following this centrifugation, the column eluate (which contains isolated RNA) is transferred to a new tube. The absorbance of the RNA solution is measured at 260 nm and 280 nm to determine the RNA concentration of the solution.
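As a non-limiting illustration, the absorbance readings described above can be converted to an RNA concentration as in the following sketch. The conversion factor of 40 μg/mL per absorbance unit at 260 nm for a 1 cm path length is a commonly used convention for RNA and is an assumption here, as are the function names, the dilution factor parameter, and the example readings; none of these is specified in this example. The ratio of absorbance at 260 nm to absorbance at 280 nm is commonly used as a purity check, with values near 2.0 typically taken to indicate RNA that is largely free of protein.

```python
# Assumed convention: 1 absorbance unit at 260 nm ~ 40 ug/mL RNA (1 cm path).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate RNA concentration (ug/mL) from absorbance at 260 nm."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / path_length_cm


def a260_a280_ratio(a260, a280):
    """Return the A260/A280 ratio, commonly used as a purity indicator."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 1:10 before measurement.
    a260, a280 = 0.5, 0.25
    print(rna_concentration_ug_per_ml(a260, dilution_factor=10))
    print(a260_a280_ratio(a260, a280))
```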

RNA transcripts are enriched via a RiboMinus Eukaryote Kit for RNA-Seq. Library construction is performed using a SOLiD® Total RNA-Seq Kit. SOLiD® next-generation sequencing is performed to assess the expression of genes including epidermal growth factor receptor (EGFR), HLA and cancer/testis antigen (CTA) genes. Results show that the tumor is an EGFR negative tumor that expresses HLA-A*02:02 and a CTA. The tumor is treated via the administration of recombinant T-cells containing T-cell receptors targeted towards MHC-antigen complexes displaying the CTA. Treatment is unable to eliminate the tumor. A physician realizes that the recombinant T-cell therapy lacks efficacy. The physician recommends a chemotherapeutic agent as a second line therapy. The delay in administration of the second line therapy decreases the likelihood of a positive outcome for the patient.

Example 2—Selection and Administration of Cancer Treatment Using a Method of the Disclosure

HLA and CTA genes are labelled with probe molecules of the disclosure. A visual representation of HLA and CTA expression is generated by a computer algorithm via the analysis of FISSEQ data. The generated visual representation is used to guide the selection of cancer treatment.

A tumor biopsy is taken from a subject and fixed using 4% formaldehyde overnight, followed by 3 washes (including one overnight wash) with 70% EtOH. The sample is washed using PBS and cross-linked using 100 μM BS(PEG)9 (Thermo-Fisher Scientific) in PBS for 1 hour, followed by 1M Tris treatment for fifteen minutes. The biopsy sample is then incubated with probe molecules designed to target EGFR, CTA, and various HLA genes including HLA-A*02:02. Padlocks are then added to the sample to allow hybridization of the circularizing probe molecules, such as padlock probes. The circularization mixture containing 2000U T4 DNA ligase in 10× T4 ligase buffer (NEB) is then added, and the sample is incubated at 60° C. for two hours. Alternatively, the circularization mixture can contain 25U CircLigase (Epicentre), 1 mM MnCl₂ and 1 M betaine. The RCA primer is then hybridized to the sample at 60° C. for fifteen minutes and washed. For rolling circle amplification, 100 U phi29 DNA polymerase (Enzymatics), 250 μM dNTP and 40 μM aminoallyl dNTP are added to the sample and incubated at 30° C. overnight. The sample is then washed using PBS and cross-linked using 100 μM BS(PEG)9 in PBS for 1 hour, followed by 1M Tris treatment for fifteen minutes. For the amplification product detection via FISSEQ analysis, 1 μM fluorescently labeled oligonucleotides are diluted in 2×SSC and hybridized to the matrix containing the DNA amplicons at 60° C. and washed. Imaging is performed using a Leica SP5 scanning confocal microscope using 10×, 20× or 63× objectives in four color channels (FITC, Cy3, Texas Red and Cy5).

Detection of amplification products allows for the identification of each type of probe molecule within the biological sample. Each type of probe molecule is bound to a nucleic acid corresponding to a variant sequence encoding EGFR, an HLA gene, or a CTA gene. Thus, the location of each probe molecule is indicative of EGFR, HLA, or CTA gene expression. A computer program then assigns a color to each probe molecule detected and generates a spatial map of HLA and CTA gene expression.
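As a non-limiting illustration, the color assignment step might be sketched as follows. The sketch assumes that each detected amplification product has already been reduced to a position and a target label. The palette, the function names, and the example coordinates are hypothetical and are not the program used in this example.

```python
# Hypothetical palette; one color is assigned per distinct target.
PALETTE = ("red", "green", "blue", "yellow", "magenta", "cyan")


def assign_colors(targets):
    """Map each distinct target label to a color, in sorted label order."""
    unique_targets = sorted(set(targets))
    if len(unique_targets) > len(PALETTE):
        raise ValueError("more targets than available colors")
    return dict(zip(unique_targets, PALETTE))


def build_spatial_map(detections):
    """Attach a color to every detection.

    detections: list of (position, target) tuples, where position is an
    (x, y, z) coordinate and target is a label such as "CTA".
    """
    colors = assign_colors(target for _, target in detections)
    return [
        {"position": position, "target": target, "color": colors[target]}
        for position, target in detections
    ]


if __name__ == "__main__":
    # Hypothetical detections, for illustration only.
    detections = [
        ((1.0, 2.0, 0.0), "HLA-A*02:02"),
        ((5.0, 1.0, 0.0), "CTA"),
        ((1.2, 2.1, 0.0), "CTA"),
    ]
    for point in build_spatial_map(detections):
        print(point)
```

The resulting list of colored points could then be passed to any plotting or imaging tool to render the spatial map.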

The spatial map of gene expression shows that, although the tumor expresses CTA, this expression is limited to only a small portion of the tumor, as shown in FIG. 4. A physician thus administers recombinant T-cells containing T-cell receptors targeted towards MHC-antigen complexes displaying CTA in combination with chemotherapeutic agents as a second-line therapy. A lack of delay in administration of the second-line therapy increases the odds of a positive patient outcome.

Example 3—Targeting Multiple Clonal Populations Identified with a Method of the Disclosure

HLA, somatic mutations, and tumor associated antigens are labelled with probe molecules of the disclosure. A visual representation of HLA, somatic mutations, and tumor associated antigen expression is generated by a computer algorithm via the analysis of FISSEQ data. The generated visual representation is used to guide the selection of multiple immunotherapies administered in combination for cancer treatment.

A tumor biopsy is taken from a subject and fixed using 4% formaldehyde overnight, followed by 3 washes (including one overnight wash) with 70% EtOH. The sample is washed using PBS and cross-linked using 100 μM BS(PEG)9 (Thermo-Fisher Scientific) in PBS for 1 hour, followed by 1M Tris treatment for fifteen minutes. The biopsy sample is then incubated with probe molecules designed to target EGFR, CTA, and various HLA genes including HLA-A*02:02. Padlocks are then added to the sample to allow hybridization of the padlocks to the probe molecules. The circularization mixture containing 2000U T4 DNA ligase in 10× T4 ligase buffer (NEB) is then added, and the sample is incubated at 60° C. for two hours. Alternatively, the circularization mixture can contain 25U CircLigase (Epicentre), 1 mM MnCl₂ and 1 M betaine. The RCA primer is then hybridized to the sample at 60° C. for fifteen minutes and washed. For rolling circle amplification, 100 U phi29 DNA polymerase (Enzymatics), 250 μM dNTP and 40 μM aminoallyl dNTP are added to the sample and incubated at 30° C. overnight. The sample is then washed using PBS and cross-linked using 100 μM BS(PEG)9 in PBS for 1 hour, followed by 1M Tris treatment for fifteen minutes. For the amplification product detection via FISSEQ analysis, 1 μM fluorescently labeled oligonucleotides are diluted in 2×SSC and hybridized to the matrix containing the DNA amplicons at 60° C. and washed. Imaging is performed using a Leica SP5 scanning confocal microscope using 10×, 20× or 63× objectives in four color channels (FITC, Cy3, Texas Red and Cy5).

Detection of amplification products allows for the identification of each type of probe molecule within the biological sample. Each type of probe molecule is bound to a nucleic acid corresponding to a variant sequence encoding EGFR, an HLA, or a CTA gene. Thus, the location of each probe molecule is indicative of EGFR, HLA, or CTA gene expression. A computer program then assigns a color to each probe molecule detected and generates a spatial map of EGFR, HLA, and CTA gene expression.
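The color assignment and map generation described above can be sketched as follows. This is a minimal illustration only, not the program used in the disclosure: the detection coordinates, the palette, the 10 μm bin size, and the function names `assign_colors` and `build_spatial_map` are all assumptions introduced for the example.

```python
from collections import defaultdict

# Hypothetical decoded detections: (x in um, y in um, probe target).
detections = [
    (1.0, 2.0, "EGFR"),
    (1.5, 2.2, "HLA-A*02:02"),
    (40.0, 41.0, "CTA"),
    (40.5, 41.5, "HLA-A*02:02"),
]

# Arbitrary display palette (assumption, not from the disclosure).
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]


def assign_colors(detections, palette=PALETTE):
    """Give each distinct probe target one color, in sorted target order."""
    targets = sorted({target for _, _, target in detections})
    return {t: palette[i % len(palette)] for i, t in enumerate(targets)}


def build_spatial_map(detections, bin_um=10.0):
    """Count detections of each target inside square bins of side bin_um."""
    grid = defaultdict(lambda: defaultdict(int))
    for x, y, target in detections:
        key = (int(x // bin_um), int(y // bin_um))
        grid[key][target] += 1
    return {key: dict(counts) for key, counts in grid.items()}


colors = assign_colors(detections)
spatial_map = build_spatial_map(detections)
```

Each bin of `spatial_map` can then be rendered with the colors in `colors`; the bin size trades spatial resolution against counts per bin.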

The spatial map of gene expression shows two clonal populations of cancer cells—clones expressing EGFR and HLA-A*02:02 that do not express CTA, and clones expressing CTA and HLA-A*02:02 that do not express EGFR. A physician thus administers two therapies simultaneously. One therapy contains recombinant T-cells engineered to target the CTA-HLA-A*02:02 complexes, while the other therapy is the EGFR inhibitor erlotinib. Treatment with both therapies increases the odds of a positive patient outcome versus treatment with either single therapy alone, as clones expressing the CTA-HLA-A*02:02 complexes but not EGFR are resistant to erlotinib, and clones expressing EGFR but not the CTA can evade the recombinant T-cell therapy.
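The reasoning behind the combination can be expressed as a simple coverage check. The sketch below is illustrative only and assumes a deliberately simplified rule in which a therapy addresses a clone when the clone expresses every target the therapy depends on; the names `THERAPY_REQUIREMENTS`, `CLONES`, and `covered_clones` are hypothetical.

```python
# Targets a clone must express for each therapy to act on it (simplified assumption).
THERAPY_REQUIREMENTS = {
    "CTA/HLA-A*02:02 T-cell therapy": {"CTA", "HLA-A*02:02"},
    "erlotinib": {"EGFR"},
}

# Expression profiles of the two clonal populations described in this example.
CLONES = {
    "clone_1": {"EGFR", "HLA-A*02:02"},
    "clone_2": {"CTA", "HLA-A*02:02"},
}


def covered_clones(therapies, clones=CLONES, requirements=THERAPY_REQUIREMENTS):
    """Return the clones addressed by at least one of the given therapies."""
    return {
        name
        for name, expressed in clones.items()
        if any(requirements[t] <= expressed for t in therapies)
    }


# Coverage of each therapy alone, and of both together.
single = {t: covered_clones([t]) for t in THERAPY_REQUIREMENTS}
combined = covered_clones(list(THERAPY_REQUIREMENTS))
```

Under these assumptions each single therapy leaves one clone unaddressed, while the combination covers both.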

Example 4—Prioritizing Somatic Variants in Cancer Vaccines

HLA and somatic mutations are labelled with probe molecules of the disclosure. A visual representation of HLA, somatic mutations, and tumor associated antigen expression is generated by a computer algorithm via the analysis of FISSEQ data. The generated visual representation is used to guide the selection of multiple immunotherapies administered in combination for cancer treatment.

A tumor biopsy is taken from a subject. Part of this tumor biopsy is used for next generation sequencing to identify somatic variants and the germline class I HLA alleles. The remainder is fixed using 4% formaldehyde overnight, followed by 3 washes (including one overnight wash) with 70% EtOH. The sample is washed using PBS and cross-linked using 100 μM BS(PEG)9 (Thermo-Fisher Scientific) in PBS for 1 hour, followed by 1 M Tris treatment for fifteen minutes. The biopsy sample is then incubated with probe molecules designed to target somatic mutations and various HLA alleles. Padlocks are then added to the sample to allow hybridization of the padlocks to the probe molecules. The circularization mixture containing 2000 U T4 DNA ligase in 10× T4 ligase buffer (NEB) is then added, and the sample is incubated at 60° C. for two hours. Alternatively, the circularization mixture can contain 25 U CircLigase (Epicentre), 1 mM MnCl2 and 1 M betaine. The RCA primer is then hybridized to the sample at 60° C. for fifteen minutes and washed. For rolling circle amplification, 100 U phi29 DNA polymerase (Enzymatics), 250 μM dNTP and 40 μM aminoallyl dNTP are added to the sample and incubated at 30° C. overnight. The sample is then washed using PBS and cross-linked using 100 μM BS(PEG)9 in PBS for 1 hour, followed by 1 M Tris treatment for fifteen minutes. For the amplification product detection via FISSEQ analysis, 1 μM fluorescently labeled oligonucleotides are diluted in 2×SSC and hybridized to the matrix containing the DNA amplicons at 60° C. and washed. Imaging is performed using a Leica SP5 scanning confocal microscope with 10×, 20× or 63× objectives in four color channels (FITC, Cy3, Texas Red and Cy5).

Detection of amplification products allows for the identification of each type of probe molecule within the biological sample. Each type of probe molecule is bound to a nucleic acid corresponding to a variant sequence encoding a somatic variant or a class I HLA allele. A computer program then assigns a color to each probe molecule detected and generates a spatial map of HLA and somatic variant expression. This map is used to predict the level of expression of each somatic variant, as well as colocalization of somatic variants and HLA expression.
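One way such expression and colocalization estimates might be computed is sketched below. The sketch assumes that amplicons have already been assigned to segmented cells, which the text above does not specify; the cell identifiers, the target name `variant_1`, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical decoded amplicons after cell segmentation: (cell id, target).
amplicons = [
    ("cell_1", "variant_1"), ("cell_1", "variant_1"), ("cell_1", "HLA-A*02:02"),
    ("cell_2", "variant_1"), ("cell_2", "HLA-A*02:02"),
    ("cell_3", "variant_1"),
    ("cell_4", "HLA-A*02:02"),
]


def counts_per_cell(amplicons):
    """Tally amplicons of each target within each cell."""
    counts = defaultdict(lambda: defaultdict(int))
    for cell, target in amplicons:
        counts[cell][target] += 1
    return counts


def mean_expression(counts, target):
    """Mean amplicon count per cell; cells lacking the target contribute 0."""
    return sum(c.get(target, 0) for c in counts.values()) / len(counts)


def colocalization_fraction(counts, variant, allele):
    """Fraction of variant-positive cells in which the allele is also detected."""
    positive = [c for c in counts.values() if c.get(variant, 0) > 0]
    if not positive:
        return 0.0
    both = sum(1 for c in positive if c.get(allele, 0) > 0)
    return both / len(positive)


counts = counts_per_cell(amplicons)
level = mean_expression(counts, "variant_1")
coloc = colocalization_fraction(counts, "variant_1", "HLA-A*02:02")
```

Amplicon counts are used here as a proxy for expression level; any calibration between counts and transcript abundance is outside the scope of this sketch.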

The spatial map of gene expression shows three clonal populations of cancer cells. Two clones are observed to express all class I HLA alleles, while no expression from the germline HLA-A*02:02 allele is observed in the third clone. The therapeutic vaccine construct can target at most twenty neoantigens, so the twenty most effective neoantigens may be prioritized for inclusion. However, no single presented neoantigen is present on all three clones. To ensure the complete tumor is targeted, neoantigens are selected based on the spatial map to ensure coverage of each of the three clones and to avoid neoantigens presented by the HLA-A*02:02 allele on the third clone.
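The selection logic can be illustrated with a greedy coverage heuristic. This is a sketch under stated assumptions, not the selection method of the disclosure: a neoantigen is treated as presented on a clone only when the clone expresses both the somatic variant and the presenting HLA allele, and all clone profiles, neoantigen names, and the second allele label are hypothetical example data.

```python
# Hypothetical clone profiles; clone_C has lost expression of HLA-A*02:02.
clones = {
    "clone_A": {"variants": {"v1", "v2"}, "alleles": {"HLA-A*02:02", "HLA-B*07:02"}},
    "clone_B": {"variants": {"v2", "v3"}, "alleles": {"HLA-A*02:02", "HLA-B*07:02"}},
    "clone_C": {"variants": {"v1", "v3"}, "alleles": {"HLA-B*07:02"}},
}

# Hypothetical candidates: source variant and the allele predicted to present it.
neoantigens = {
    "n1": {"variant": "v1", "allele": "HLA-A*02:02"},
    "n2": {"variant": "v2", "allele": "HLA-B*07:02"},
    "n3": {"variant": "v3", "allele": "HLA-B*07:02"},
}


def presented_on(neo, clone):
    """A clone presents a neoantigen only if it expresses variant and allele."""
    return neo["variant"] in clone["variants"] and neo["allele"] in clone["alleles"]


def select_neoantigens(neoantigens, clones, budget=20):
    """Greedily pick neoantigens until every clone is covered or budget is spent."""
    coverage = {
        name: {c for c, profile in clones.items() if presented_on(neo, profile)}
        for name, neo in neoantigens.items()
    }
    uncovered, chosen = set(clones), []
    candidates = set(neoantigens)
    while uncovered and candidates and len(chosen) < budget:
        # Sorting makes tie-breaking deterministic (first name wins).
        best = max(sorted(candidates), key=lambda n: len(coverage[n] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
        candidates.remove(best)
    return chosen, uncovered


chosen, uncovered = select_neoantigens(neoantigens, clones)
```

In this toy data no single neoantigen reaches all three clones, and `n1` reaches only `clone_A` because the third clone lacks HLA-A*02:02. Any slots remaining after coverage is achieved could be filled by ranking the remaining candidates on predicted effectiveness.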

Example 5—Assessment of HLA Variants and Immune Marker Expression

This example illustrates the use of a tumor's HLA type and specific probes to create a clonal map of the allele-specific expression of class I HLA genes. In the same sample, the expression of immune cell markers was also mapped. Padlock probes were designed against target sequences for the class I HLA genes and immune genes including CD3 and CD4. HLA typing was performed on the tumor prior to probe design. The tumor's HLA type was assigned from the RNA sequencing data using the software OptiType (see, e.g., Szolek et al., Bioinformatics (2014) 30(23):3310-6).
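Because class I HLA alleles are highly homologous, allele-specific probes need to overlap positions that distinguish the target allele from the subject's other alleles. The sketch below shows one possible way to locate such positions in pre-aligned sequences. It is an assumption about how probe design might begin, not the design procedure of this example, and the sequences are toy strings, not real HLA sequences.

```python
def discriminating_positions(target, others):
    """Return 0-based positions where target differs from every other sequence.

    All sequences are assumed to be pre-aligned and of equal length.
    """
    for seq in others:
        if len(seq) != len(target):
            raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, base in enumerate(target)
        if all(seq[i] != base for seq in others)
    ]


# Toy aligned sequences (hypothetical, for illustration only).
target_allele = "ACGTACGTAC"
other_alleles = ["ACGTTCGTAC", "ACGTGCGTAT"]

positions = discriminating_positions(target_allele, other_alleles)
```

Here only position 4 separates the target from both other sequences; position 9 differs from just one of them and so would not be sufficient on its own.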

A tumor biopsy was obtained and fixed using 4% formaldehyde. The sample was washed, cross-linked, and treated with Tris. The sample was then incubated with the probe molecules to allow hybridization to nucleic acids in the sample. A circularization mixture containing a ligase in a buffer was added to the sample and incubated. Following ligation of the padlock probes, a primer was added to the sample for rolling circle amplification (RCA). The sample was incubated with a mixture of phi29 DNA polymerase and dNTPs and then washed using PBS.

For detection via FISSEQ analysis, fluorescently labeled oligonucleotides were added to the sample for hybridization. Imaging was performed to detect the RCA products and identify target nucleic acid molecules within the biological sample. A computer program then assigned a color to each target allele or gene detected via the analysis of the FISSEQ data and generated a spatial map of HLA and somatic variant expression. As shown in FIGS. 5A-5B, the image shows a visual representation of detected expression of the indicated somatic variant, as well as presence of T cell markers. In the sample, HLA-A and HLA-B variant expression was observed in proximity to markers of T cells. HLA Class I variants were detected by spatial sequencing in lung adenocarcinoma despite sequence homology.
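Proximity between HLA variant detections and T cell markers can be quantified in several ways. The following sketch uses a nearest-neighbor distance threshold; the coordinates, the 5 μm radius, and the function names are assumptions made for illustration and do not reflect the data of FIGS. 5A-5B.

```python
import math


def nearest_distance(point, others):
    """Euclidean distance from point to the closest point in others."""
    return min(math.dist(point, other) for other in others)


def fraction_near(sources, markers, radius_um):
    """Fraction of source spots with at least one marker spot within radius_um."""
    if not sources or not markers:
        return 0.0
    near = sum(1 for s in sources if nearest_distance(s, markers) <= radius_um)
    return near / len(sources)


# Hypothetical spot coordinates in micrometers.
hla_spots = [(0.0, 0.0), (3.0, 4.0), (50.0, 50.0)]
cd3_spots = [(0.0, 5.0), (6.0, 8.0)]

frac = fraction_near(hla_spots, cd3_spots, radius_um=5.0)
```

The brute-force search is adequate for small examples; a spatial index would be preferable for whole-section data with many spots.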

Using this approach and methods described herein, clonal antigen presentation can be detected including a combination of HLA expression and the expression of various cancer antigens, immune markers, and/or inflammation markers. In some examples, this can provide a spatial map of the tumor microenvironment and provide a method for evaluating and/or monitoring the effectiveness of HLA-restricted immunotherapies, such as cancer vaccines. In some aspects, HLA clonal information can be used to prioritize neoantigens presented throughout the tumor. In some cases, the methods described herein may provide insight into why immunotherapies fail for some individuals. In some cases, tumors with sufficient HLA antigen presentation throughout the tumor can be identified as potential candidates for immunotherapy such as checkpoint inhibitors. In some aspects, HLA clonal information can be used to select the most effective therapies and/or exclude patients from high-risk treatments where immune evasion is likely.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

1.-78. (canceled)

79. A method of analyzing a spatial distribution of a human leukocyte antigen (HLA) variant sequence in a biological sample from a subject comprising:

(a) obtaining a genetic profile of the subject and detecting the HLA variant sequence in the subject by analyzing the genetic profile;

(b) hybridizing a first probe comprising an HLA targeting sequence to a first nucleic acid in the biological sample corresponding to the HLA variant sequence;

(c) identifying at least a portion of the first probe; and

(d) determining a location of the HLA variant sequence within the biological sample by determining a location of the first probe, wherein the first probe preferentially hybridizes to the first nucleic acid corresponding to the HLA variant sequence detected in the genetic profile.

80. The method of claim 79, wherein the genetic profile is generated via ribonucleic acid (RNA) sequencing or exome sequencing.

81. The method of claim 79, wherein the biological sample comprises a second nucleic acid and the method further comprises:

(A) hybridizing a second probe comprising another nucleic acid targeting sequence to the second nucleic acid;

(B) identifying at least a portion of the second probe; and

(C) determining a location of the second nucleic acid within the biological sample by determining a location of the second probe.

82. The method of claim 81, wherein the second nucleic acid corresponds to an additional HLA variant sequence.

83. The method of claim 81, wherein the second nucleic acid comprises a mutation.

84. The method of claim 81, wherein the second nucleic acid is associated with a tumor antigen or a marker of inflammation.

85. The method of claim 81, wherein the second nucleic acid is associated with a marker for cell typing.

86. The method of claim 81, further comprising generating a visual representation of the location of the HLA variant sequence and the location of the second nucleic acid for display on a graphical user interface (GUI).

87. The method of claim 81, further comprising predicting the presentation of a peptide on a major histocompatibility complex (MHC) protein expressed in the biological sample, wherein the peptide is at least partially encoded by the second nucleic acid and the MHC protein is at least partially encoded by the HLA variant sequence.

88. The method of claim 79, further comprising, prior to identifying the at least a portion of the first probe, subjecting the first probe to an amplification reaction to generate an amplified nucleic acid molecule that corresponds to the HLA variant sequence.

89. The method of claim 79, further comprising contacting the biological sample with a plurality of fluorescently labeled oligonucleotides directly or indirectly to identify at least a portion of the first probe.

90. A method of analyzing a biological sample from a subject, comprising:

(a) obtaining the biological sample comprising a first nucleic acid and a second nucleic acid, wherein the first nucleic acid corresponds to an HLA variant sequence;

(b) hybridizing a first probe comprising an HLA targeting sequence to the first nucleic acid corresponding to the HLA variant sequence and hybridizing a second probe comprising another nucleic acid targeting sequence to the second nucleic acid;

(c) identifying at least a portion of the first probe and at least a portion of the second probe; and

(d) determining a location of the HLA variant sequence and a location of the second nucleic acid within the biological sample.

91. The method of claim 90, wherein the second nucleic acid corresponds to an additional HLA variant sequence.

92. The method of claim 90, wherein the second nucleic acid comprises a mutation.

93. The method of claim 90, wherein the second nucleic acid is associated with a tumor antigen or a marker of inflammation.

94. The method of claim 90, wherein the second nucleic acid is associated with a marker for cell typing.

95. The method of claim 90, further comprising generating a visual representation of the location of the HLA variant sequence and the location of the second nucleic acid for display on a graphical user interface (GUI).

96. The method of claim 90, further comprising predicting the presentation of a peptide on a major histocompatibility complex (MHC) protein expressed in the biological sample, wherein the peptide is at least partially encoded by the second nucleic acid and the MHC protein is at least partially encoded by the HLA variant sequence.

97. The method of claim 90, further comprising, prior to identifying the at least a portion of the first probe, subjecting the first probe to an amplification reaction to generate an amplified nucleic acid molecule that corresponds to the HLA variant sequence.

98. The method of claim 90, further comprising contacting the biological sample with a plurality of fluorescently labeled oligonucleotides directly or indirectly to identify at least a portion of the first probe.