Patents by Inventor Keith D. Noto
Keith D. Noto has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240061886
Abstract: A computing server may generate a catalog of overrepresented data strings from a database that stores a plurality of data instances. An overrepresented data string is a data string that matches to a number of data instances and the number exceeds a number threshold. The computing server may receive a target data instance that is to be compared to a related data instance. The computing server may determine one or more matched data strings that match between the target data instance and the related data instance. The computing server may compare the matched data strings to the catalog to exclude a subset of matched data strings that are matched to the overrepresented data strings. The computing server may determine a total length of the matched data strings excluding the subset of matched data strings that are matched to the overrepresented data strings.
Type: Application
Filed: August 18, 2023
Publication date: February 22, 2024
Inventor: Keith D. Noto
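The catalog-and-exclude steps in the abstract for publication 20240061886 can be illustrated with a minimal sketch. It assumes that each data instance is a list of literal strings and that "matching" means exact equality; the function names `build_catalog` and `matched_length` and the sample data are hypothetical and do not come from the application.

```python
from collections import Counter


def build_catalog(instances, threshold):
    """Return the strings that occur in more than `threshold` instances."""
    counts = Counter()
    for strings in instances:
        counts.update(set(strings))  # count each string once per instance
    return {s for s, n in counts.items() if n > threshold}


def matched_length(target, related, catalog):
    """Total length of strings shared by both instances, skipping catalog entries."""
    matched = set(target) & set(related)
    return sum(len(s) for s in matched - catalog)


# Hypothetical database of four data instances.
database = [
    ["AAAA", "CCGT", "TTGA"],
    ["AAAA", "CCGT"],
    ["AAAA", "GGGG"],
    ["AAAA", "TTGA", "GGGG"],
]
catalog = build_catalog(database, threshold=3)
total = matched_length(database[0], database[1], catalog)
```

Here "AAAA" appears in every instance, so it lands in the catalog and does not contribute to the total for the first two instances.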
-
Publication number: 20230317300
Abstract: Disclosed herein is a method that uses the RAM of multiple servers to increase the efficiency of identifying segments of a target dataset that match segments of other datasets in a database. An encoding system may encode large genetic datasets to produce pairs of bitmap sequences that correspond to an encoding scheme. The servers each store portions of the database in their hard drives based on a shared characteristic of the genetic datasets in the database, such as ethnicity or location of birth. The servers encode data from their hard drives and sustain the encoded data in their RAM. A target, or query, individual is input for matching. The servers match the encoded data of the target individual with encoded data in their RAM and can determine a relationship. The servers sustain the encoded data in RAM to compare against subsequent target individuals.
Type: Application
Filed: November 21, 2022
Publication date: October 5, 2023
Inventor: Keith D. Noto
-
Publication number: 20220382730
Abstract: Disclosed herein are processes that identify segments of a target dataset that match segments of other datasets in a database. A computing server may encode the target dataset to generate a pair of encoded target bitmap sequences based on an encoding scheme. The encoding scheme defines encoding values based on homogeneity between the pair of data value sequences. The computing server may compare the pair of encoded target bitmap sequences with other pairs of encoded bitmap sequences to identify homogeneous mismatched locations. A homogeneous mismatched location may be a location where the target dataset and the other dataset in comparison are both homogeneous but have different types of homogeneity at the location. The computing server may identify a matched segment between the target dataset and one of the other datasets based on the homogeneous mismatched locations identified. The matched segment is contained within two homogeneous mismatched locations.
Type: Application
Filed: May 26, 2022
Publication date: December 1, 2022
Inventor: Keith D. Noto
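The homogeneity-based encoding in the abstract for publication 20220382730 can be sketched as follows. This is an illustration under assumptions, not the encoding scheme of the application: data values are taken to be 0 or 1, each bitmap is held in a Python integer, and the ends of the sequence are treated as segment boundaries. All function names are hypothetical.

```python
def encode(seq_a, seq_b):
    """Encode a pair of 0/1 sequences as two bitmaps held in Python ints.

    Bit i of hom0 is set when both sequences hold 0 at position i, and
    bit i of hom1 is set when both hold 1. Mixed positions set neither bit.
    """
    hom0 = hom1 = 0
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b == 0:
            hom0 |= 1 << i
        elif a == b == 1:
            hom1 |= 1 << i
    return hom0, hom1


def mismatch_locations(target, other, length):
    """Positions where both datasets are homogeneous but of different types."""
    t0, t1 = target
    o0, o1 = other
    bits = (t0 & o1) | (t1 & o0)
    return [i for i in range(length) if (bits >> i) & 1]


def matched_segments(mismatches, length, min_len=1):
    """Half-open [start, end) runs between consecutive mismatched locations.

    Assumption: the two ends of the sequence also act as boundaries.
    """
    segments = []
    start = 0
    for pos in mismatches + [length]:
        if pos - start >= min_len:
            segments.append((start, pos))
        start = pos + 1
    return segments


target = encode([0, 0, 1, 1, 0, 1, 0, 0], [0, 1, 1, 1, 0, 0, 0, 1])
other = encode([0, 0, 0, 1, 1, 1, 0, 1], [0, 0, 0, 1, 1, 0, 1, 1])
mismatches = mismatch_locations(target, other, 8)
segments = matched_segments(mismatches, 8, min_len=2)
```

Because the comparison reduces to two bitwise ANDs and one OR, a whole sequence can be screened in a few machine operations, which is presumably the appeal of a bitmap representation.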
-
Patent number: 11335435
Abstract: Identification of inheritance-by-descent haplotype matches between individuals is described. A set of tables including word match, haplotypes and segment match tables are populated. DNA samples are received and stored. A word identification module extracts haplotype values from each sample. The word match table is indexed according to the unique combination of position and haplotype. Each column represents a different sample, and each cell indicates whether that sample includes that haplotype at that position. The haplotypes table includes the raw haplotype data for each sample. The segment match table is indexed by sample identifier, and columns represent other samples. Each cell is populated to indicate for each identified sample pair which position range(s) include matching haplotypes for both samples. The tables are persistently stored in databases of the matching system. As new sample data is received, each table is updated to include the newly received samples, and additional matching takes place.
Type: Grant
Filed: October 4, 2018
Date of Patent: May 17, 2022
Assignee: Ancestry.com DNA, LLC
Inventors: Jake Kelly Byrnes, Aaron Ling, Keith D. Noto, Jeremy Pollack, Catherine Ann Ball, Kenneth Gregory Chahine
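The word match and segment match tables described in the abstract for patent 11335435 can be approximated with in-memory dictionaries. The sketch below is an assumption-laden illustration: haplotypes are plain strings cut into fixed-length, non-overlapping words, and the names `build_word_match` and `build_segment_match` are hypothetical. The patent describes persistent database tables, which this does not attempt to model.

```python
from collections import defaultdict
from itertools import combinations


def build_word_match(samples, word_len):
    """Map (window index, haplotype word) to the set of samples carrying that word."""
    table = defaultdict(set)
    for sample_id, haplotype in samples.items():
        for start in range(0, len(haplotype) - word_len + 1, word_len):
            word = haplotype[start:start + word_len]
            table[(start // word_len, word)].add(sample_id)
    return table


def build_segment_match(word_match):
    """Map each sample pair to the window ranges where both carry the same word."""
    windows = defaultdict(list)
    for (window, _word), ids in word_match.items():
        for pair in combinations(sorted(ids), 2):
            windows[pair].append(window)
    segments = {}
    for pair, wins in windows.items():
        ranges = []
        for w in sorted(wins):
            if ranges and w == ranges[-1][1] + 1:
                ranges[-1][1] = w  # extend the current run of adjacent windows
            else:
                ranges.append([w, w])
        segments[pair] = [tuple(r) for r in ranges]
    return segments


# Hypothetical samples, three windows of four characters each.
samples = {
    "s1": "ACGTACGTTTGA",
    "s2": "ACGTACGTCCCC",
    "s3": "GGGGACGTTTGA",
}
word_match = build_word_match(samples, word_len=4)
segments = build_segment_match(word_match)
```

Indexing by (position, word) means a new sample only needs to be looked up against the keys it actually contains, which fits the incremental updating the abstract mentions.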
-
Publication number: 20210183474
Abstract: A system identifies ancestral birth locations or surnames estimated to be associated with an individual's ancestors using an individual's genetic sample. The system identifies users who are genetic matches to the individual and determines whether and how often a birth location or surname appears in the pedigrees of those users. Birth locations or surnames that appear frequently throughout the pedigrees of genetically matching users may represent birth locations or surnames that are affiliated with the individual's ancestors. The system determines whether the frequency of appearance of a birth location or surname is statistically significant to eliminate biases for certain birth locations or surnames that appear more frequently than others. The birth location or surname may be provided to the individual based on an also-determined enrichment score.
Type: Application
Filed: February 24, 2021
Publication date: June 17, 2021
Inventors: Amir R. Kermany, Julie M. Granka, Keith D. Noto
-
Patent number: 10957422
Abstract: A system identifies ancestral birth locations or surnames estimated to be associated with an individual's ancestors using an individual's genetic sample. The system identifies users who are genetic matches to the individual and determines whether and how often a birth location or surname appears in the pedigrees of those users. Birth locations or surnames that appear frequently throughout the pedigrees of genetically matching users may represent birth locations or surnames that are affiliated with the individual's ancestors. The system determines whether the frequency of appearance of a birth location or surname is statistically significant to eliminate biases for certain birth locations or surnames that appear more frequently than others. The birth location or surname may be provided to the individual based on an also-determined enrichment score.
Type: Grant
Filed: July 6, 2016
Date of Patent: March 23, 2021
Assignee: Ancestry.com DNA, LLC
Inventors: Amir R. Kermany, Julie M. Granka, Keith D. Noto
-
Publication number: 20210034647
Abstract: A computer-implemented method for linking individuals' datasets in a database may include receiving a target individual dataset of a target individual and a plurality of additional individual datasets. A computing server may generate a plurality of sub-cluster pairs of first parental groups and second parental groups. At least one of the sub-cluster pairs includes a first parental group of matched segments and a second parental group of matched segments. A computing server may link the first parental groups and the second parental groups across the plurality of sub-cluster pairs to generate at least one super-cluster of a parental side. A computing server may assign metadata to one or more additional individual datasets of the plurality of additional individual datasets. The metadata may specify that the one or more additional individual datasets are connected to the target individual dataset by the parental side of the super-cluster.
Type: Application
Filed: July 23, 2020
Publication date: February 4, 2021
Inventors: Thi Hong Luong Nguyen, Jingwen Pei, Harendra Guturu, Keith D. Noto
-
Publication number: 20200303035
Abstract: Novel haplotype cluster Markov models are used to phase genomic samples. After the models are built, they rapidly and accurately phase new samples without requiring that the new samples be used to re-build the models. The models set transition probabilities such that the probability for an appearance of any allele within any haplotype is a non-zero number. Furthermore, the most unlikely pairs of haplotypes are discarded from each model at each level until ε of the likelihood mass at each level is discarded. The models are also constructed such that contributing windows of SNPs partially overlap so that phasing decisions near one of the extreme ends of any model are not significantly determinative of the phase. Additionally, the models are configured such that two or more nodes can be merged during the building/updating procedure to consolidate haplotype clusters having similar distributions.
Type: Application
Filed: April 29, 2020
Publication date: September 24, 2020
Inventors: Catherine Ann Ball, Keith D. Noto, Kenneth G. Chahine, Mathew J. Barber, Yong Wang
-
Publication number: 20200286579
Abstract: An input genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid HMM is computed based on genotypes and/or phased haplotypes to determine a probability of a haplotype sequence being associated with a particular label. For example, the diploid HMM for a window is used to determine the emission probability that the window corresponds to a set of labels. An inter-window HMM, with a set of states for each window, is computed. Labels are assigned to the input genotype based on the inter-window HMM. Upper and lower bounds are estimated to produce a range of likely percentage values an input can be assigned to a given label. Confidence values are determined indicating a likelihood that an individual inherits DNA from a certain population. Maps are generated with polygons representing regions where a measure of ethnicity of population falls within specific ranges.
Type: Application
Filed: May 13, 2020
Publication date: September 10, 2020
Inventors: Shiya Song, Keith D. Noto, Yong Wang
-
Publication number: 20200286591
Abstract: Systems, computer program products, and methods are disclosed for estimating a degree of ancestral relatedness between two individuals. The haplotype data for a population of individuals is divided into segment windows based on genetic markers, and matched segments for the haplotype data are generated. Each matched segment having a first cM width that exceeds a threshold cM width is included in counting the matched segments in each segment window. A weight associated with each segment window is estimated based on the count of matched segments in the associated segment window. A weighted sum of per-window cM widths for each matched segment is calculated based on the first cM width and the weights associated with the segment windows of the matched segment. The weighted sum of per-window cM widths is used to estimate a degree of ancestral relatedness between two individuals.
Type: Application
Filed: May 27, 2020
Publication date: September 10, 2020
Inventors: Mathew J. Barber, Yong Wang, Keith D. Noto, Kenneth G. Chahine, Catherine Ann Ball
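The counting and weighting steps in the abstract for publication 20200286591 can be walked through with a small sketch. The abstract does not say how a window's weight is derived from its count, so the `window_weights` rule below (down-weighting windows that exceed a `baseline` count) is purely an assumption, as are all function names and the example coordinates in centimorgans (cM).

```python
def overlap(seg, win):
    """Width in cM of the overlap between a segment and a window."""
    return max(0.0, min(seg[1], win[1]) - max(seg[0], win[0]))


def window_counts(segments, windows, min_cm):
    """Count, per window, the matched segments wider than min_cm that overlap it."""
    counts = [0] * len(windows)
    for seg in segments:
        if seg[1] - seg[0] > min_cm:
            for i, win in enumerate(windows):
                if overlap(seg, win) > 0:
                    counts[i] += 1
    return counts


def window_weights(counts, baseline):
    """Hypothetical rule: windows with more than `baseline` matches are down-weighted."""
    return [1.0 if c <= baseline else baseline / c for c in counts]


def weighted_width(seg, windows, weights):
    """Weighted sum of the per-window cM widths of one matched segment."""
    return sum(w * overlap(seg, win) for win, w in zip(windows, weights))


# Hypothetical windows and matched segments, as (start cM, end cM).
windows = [(0, 10), (10, 20), (20, 30)]
segments = [(0, 12), (11, 19), (12, 18), (13, 20), (5, 8)]

counts = window_counts(segments, windows, min_cm=5)
weights = window_weights(counts, baseline=2)
score = weighted_width((0, 12), windows, weights)
```

In this example the middle window attracts four qualifying segments, so the 2 cM of the first segment that falls inside it counts for only half its width.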
-
Patent number: 10720229
Abstract: Systems, computer program products, and methods are disclosed for estimating a degree of ancestral relatedness between two individuals. The haplotype data for a population of individuals is divided into segment windows based on genetic markers, and matched segments for the haplotype data are generated. Each matched segment having a first cM width that exceeds a threshold cM width is included in counting the matched segments in each segment window. A weight associated with each segment window is estimated based on the count of matched segments in the associated segment window. A weighted sum of per-window cM widths for each matched segment is calculated based on the first cM width and the weights associated with the segment windows of the matched segment. The weighted sum of per-window cM widths is used to estimate a degree of ancestral relatedness between two individuals.
Type: Grant
Filed: October 14, 2015
Date of Patent: July 21, 2020
Assignee: Ancestry.com DNA, LLC
Inventors: Mathew J. Barber, Yong Wang, Keith D. Noto, Kenneth G. Chahine, Catherine Ann Ball
-
Patent number: 10692587
Abstract: An input genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid HMM is computed based on genotypes and/or phased haplotypes to determine a probability of a haplotype sequence being associated with a particular label. For example, the diploid HMM for a window is used to determine the emission probability that the window corresponds to a set of labels. An inter-window HMM, with a set of states for each window, is computed. Labels are assigned to the input genotype based on the inter-window HMM. Upper and lower bounds are estimated to produce a range of likely percentage values an input can be assigned to a given label. Confidence values are determined indicating a likelihood that an individual inherits DNA from a certain population. Maps are generated with polygons representing regions where a measure of ethnicity of population falls within specific ranges.
Type: Grant
Filed: September 11, 2019
Date of Patent: June 23, 2020
Assignee: Ancestry.com DNA, LLC
Inventors: Shiya Song, Keith D. Noto, Yong Wang
-
Patent number: 10679729
Abstract: Novel haplotype cluster Markov models are used to phase genomic samples. After the models are built, they rapidly and accurately phase new samples without requiring that the new samples be used to re-build the models. The models set transition probabilities such that the probability for an appearance of any allele within any haplotype is a non-zero number. Furthermore, the most unlikely pairs of haplotypes are discarded from each model at each level until ε of the likelihood mass at each level is discarded. The models are also constructed such that contributing windows of SNPs partially overlap so that phasing decisions near one of the extreme ends of any model are not significantly determinative of the phase. Additionally, the models are configured such that two or more nodes can be merged during the building/updating procedure to consolidate haplotype clusters having similar distributions.
Type: Grant
Filed: October 19, 2015
Date of Patent: June 9, 2020
Assignee: Ancestry.com DNA, LLC
Inventors: Catherine Ann Ball, Keith D. Noto, Kenneth G. Chahine, Mathew J. Barber, Yong Wang
-
Publication number: 20200160202
Abstract: An input sample SNP genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid hidden Markov Model (HMM) is built from a haplotype Markov Model (MM). The diploid HMM for a window is used to determine the probability that the window corresponds to a pair of labels (e.g., ethnicity labels). An inter-window HMM, with a set of states for each window, is built based on the diploid HMMs for each window. Labels are assigned to the input sample genotype based on the inter-window HMM.
Type: Application
Filed: January 8, 2020
Publication date: May 21, 2020
Inventors: Keith D. Noto, Yong Wang
-
Publication number: 20200098445
Abstract: Described are computational methods to reconstruct the chromosomes (and genomes) of ancestors given genetic data, IBD information, and full or partial pedigree information of some number of their descendants.
Type: Application
Filed: December 3, 2019
Publication date: March 26, 2020
Inventors: Julie M. Granka, Keith D. Noto
-
Publication number: 20200082903
Abstract: An input genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid HMM is computed based on genotypes and/or phased haplotypes to determine a probability of a haplotype sequence being associated with a particular label. For example, the diploid HMM for a window is used to determine the emission probability that the window corresponds to a set of labels. An inter-window HMM, with a set of states for each window, is computed. Labels are assigned to the input genotype based on the inter-window HMM. Upper and lower bounds are estimated to produce a range of likely percentage values an input can be assigned to a given label. Confidence values are determined indicating a likelihood that an individual inherits DNA from a certain population. Maps are generated with polygons representing regions where a measure of ethnicity of population falls within specific ranges.
Type: Application
Filed: September 11, 2019
Publication date: March 12, 2020
Inventors: Shiya Song, Keith D. Noto, Yong Wang
-
Patent number: 10558930
Abstract: An input sample SNP genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid hidden Markov Model (HMM) is built from a haplotype Markov Model (MM). The diploid HMM for a window is used to determine the probability that the window corresponds to a pair of labels (e.g., ethnicity labels). An inter-window HMM, with a set of states for each window, is built based on the diploid HMMs for each window. Labels are assigned to the input sample genotype based on the inter-window HMM.
Type: Grant
Filed: July 13, 2016
Date of Patent: February 11, 2020
Assignee: Ancestry.com DNA, LLC
Inventors: Keith D. Noto, Yong Wang
-
Patent number: 10504611
Abstract: Described are computational methods to reconstruct the chromosomes (and genomes) of ancestors given genetic data, IBD information, and full or partial pedigree information of some number of their descendants.
Type: Grant
Filed: October 19, 2015
Date of Patent: December 10, 2019
Assignee: Ancestry.com DNA, LLC
Inventors: Julie M. Granka, Keith D. Noto
-
Publication number: 20190139624
Abstract: Identification of inheritance-by-descent haplotype matches between individuals is described. A set of tables including word match, haplotypes and segment match tables are populated. DNA samples are received and stored. A word identification module extracts haplotype values from each sample. The word match table is indexed according to the unique combination of position and haplotype. Each column represents a different sample, and each cell indicates whether that sample includes that haplotype at that position. The haplotypes table includes the raw haplotype data for each sample. The segment match table is indexed by sample identifier, and columns represent other samples. Each cell is populated to indicate for each identified sample pair which position range(s) include matching haplotypes for both samples. The tables are persistently stored in databases of the matching system. As new sample data is received, each table is updated to include the newly received samples, and additional matching takes place.
Type: Application
Filed: October 4, 2018
Publication date: May 9, 2019
Inventors: Jake Kelly Byrnes, Aaron Ling, Keith D. Noto, Jeremy Pollack, Catherine Ann Ball, Kenneth Gregory Chahine
-
Patent number: 10114922
Abstract: Identification of inheritance-by-descent haplotype matches between individuals is described. A set of tables including word match, haplotypes and segment match tables are populated. DNA samples are received and stored. A word identification module extracts haplotype values from each sample. The word match table is indexed according to the unique combination of position and haplotype. Each column represents a different sample, and each cell indicates whether that sample includes that haplotype at that position. The haplotypes table includes the raw haplotype data for each sample. The segment match table is indexed by sample identifier, and columns represent other samples. Each cell is populated to indicate for each identified sample pair which position range(s) include matching haplotypes for both samples. The tables are persistently stored in databases of the matching system. As new sample data is received, each table is updated to include the newly received samples, and additional matching takes place.
Type: Grant
Filed: September 17, 2013
Date of Patent: October 30, 2018
Assignee: Ancestry.com DNA, LLC
Inventors: Jake Kelly Byrnes, Aaron Ling, Keith D. Noto, Jeremy Pollack, Catherine Ann Ball, Kenneth Gregory Chahine