Patents by Inventor Keith D. Noto
Keith D. Noto has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240411781
Abstract: A computing device may receive a target data instance. The computing device may identify a plurality of matched segments that match to the target data instance for at least a threshold length. The computing device may define, based on overlapping of the matched segments, the target data instance as a plurality of data string ranges, wherein each divided data string is matched to a set of overlapping matched segments. The computing device may apply an iterative clustering algorithm to group the plurality of data string ranges based on values of a similarity metric among data string ranges that are assigned to a given group. The computing device may attribute a first set of data string ranges that are assigned to a first group to a first inheritance.
Type: Application
Filed: June 25, 2024
Publication date: December 12, 2024
Inventor: Keith D. Noto
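The range-splitting step in the abstract above can be illustrated with a minimal sketch. It assumes matched segments are half-open intervals tagged with a match identifier; the function name `split_into_ranges` and the tuple layout are hypothetical and are not taken from the patent. The clustering step is not shown.

```python
def split_into_ranges(segments):
    """Cut the target into elementary ranges at every segment boundary.

    segments: list of (start, end, match_id) with half-open [start, end).
    Returns (start, end, covering_match_ids) for each covered range.
    """
    # Every segment start or end is a potential breakpoint.
    points = sorted({p for s, e, _ in segments for p in (s, e)})
    ranges = []
    for lo, hi in zip(points, points[1:]):
        # A segment covers the range if it spans it completely.
        covering = frozenset(m for s, e, m in segments if s <= lo and hi <= e)
        if covering:  # skip gaps that no segment covers
            ranges.append((lo, hi, covering))
    return ranges


segments = [(0, 10, "A"), (5, 15, "B"), (20, 30, "C")]
ranges = split_into_ranges(segments)
```

Each resulting range carries the set of matches that overlap it, which is the kind of input a similarity metric between ranges could be computed from.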
-
Patent number: 12148507
Abstract: Described are computational methods to reconstruct the chromosomes (and genomes) of ancestors given genetic data, IBD information, and full or partial pedigree information of some number of their descendants.
Type: Grant
Filed: December 3, 2019
Date of Patent: November 19, 2024
Assignee: Ancestry.com DNA, LLC
Inventors: Julie M. Granka, Keith D. Noto
-
Patent number: 12086735
Abstract: An input sample SNP genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid hidden Markov Model (HMM) is built from a haplotype Markov Model (MM). The diploid HMM for a window is used to determine the probability that the window corresponds to a pair of labels (e.g., ethnicity labels). An inter-window HMM, with a set of states for each window, is built based on the diploid HMMs for each window. Labels are assigned to the input sample genotype based on the inter-window HMM.
Type: Grant
Filed: January 8, 2020
Date of Patent: September 10, 2024
Assignee: Ancestry.com DNA, LLC
Inventors: Keith D. Noto, Yong Wang
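To make the inter-window step in the abstract above concrete, here is a rough sketch of Viterbi decoding over windows. It assumes the per-window probabilities for each label pair have already been computed (the diploid HMMs are not modeled here), and it uses a single hypothetical `stay` probability for keeping the same label pair between adjacent windows. None of these names or values come from the patent.

```python
import math

def viterbi(emissions, stay=0.9):
    """Most likely label-pair sequence across windows.

    emissions: one dict per window mapping label pair -> probability (> 0).
    stay: assumed probability of keeping the same label pair in the next window.
    """
    states = list(emissions[0])
    switch = (1.0 - stay) / (len(states) - 1)  # needs at least two states
    score = {s: math.log(1.0 / len(states)) + math.log(emissions[0][s]) for s in states}
    back = []
    for em in emissions[1:]:
        new_score, pointer = {}, {}
        for s in states:
            # Best predecessor for state s in this window.
            best = max(states, key=lambda p: score[p] + math.log(stay if p == s else switch))
            trans = stay if best == s else switch
            new_score[s] = score[best] + math.log(trans) + math.log(em[s])
            pointer[s] = best
        back.append(pointer)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for pointer in reversed(back):
        path.append(pointer[path[-1]])
    return path[::-1]


AA, AB, BB = ("A", "A"), ("A", "B"), ("B", "B")
windows = [
    {AA: 0.90, AB: 0.05, BB: 0.05},
    {AA: 0.90, AB: 0.05, BB: 0.05},
    {AA: 0.05, AB: 0.05, BB: 0.90},
    {AA: 0.05, AB: 0.05, BB: 0.90},
]
labels = viterbi(windows)
```

With these toy numbers the decoded path keeps one label pair for the first two windows and switches once, rather than flipping on every noisy window.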
-
Patent number: 12050629
Abstract: A computing device may receive a target data instance. The computing device may identify a plurality of matched segments that match to the target data instance for at least a threshold length. The computing device may define, based on overlapping of the matched segments, the target data instance as a plurality of data string ranges, wherein each divided data string is matched to a set of overlapping matched segments. The computing device may apply an iterative clustering algorithm to group the plurality of data string ranges based on values of a similarity metric among data string ranges that are assigned to a given group. The computing device may attribute a first set of data string ranges that are assigned to a first group to a first inheritance.
Type: Grant
Filed: October 6, 2023
Date of Patent: July 30, 2024
Assignee: Ancestry.com DNA, LLC
Inventor: Keith D. Noto
-
Patent number: 12045219
Abstract: Disclosed herein is a method that improves the accuracy of producing family trees. The DNA of a target individual is processed to find a matching individual. Using the known family tree of the matching individual, multiple candidate family trees are generated with multiple proposed placements for the target individual. For each candidate family tree, a genetic likelihood is determined for a proposed relationship and the other DNA test takers in the family tree. A birth-year probability is determined by identifying a most recent common ancestor (MRCA). The birth-year probability is based on the number of years between the target individual and the matching individual and a normal distribution of ages for parent-child age differences in a population. The genetic likelihood is converted to a genetic probability so that it can be compared with or added to the birth-year probability. Based on the two probabilities, the candidate family trees are sorted.
Type: Grant
Filed: November 23, 2022
Date of Patent: July 23, 2024
Assignee: Ancestry.com DNA, LLC
Inventors: Jingwen Pei, Keith D. Noto
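One way to read the birth-year step in the abstract above is sketched below. The sketch assumes each parent-child age gap is an independent normal variable, so the birth-year difference between two relatives connected through an MRCA is also normal. The mean gap of 28 years, the standard deviation of 6 years, and the function name are illustrative assumptions, not values from the patent, and the function returns a density rather than a calibrated probability.

```python
from statistics import NormalDist

def birth_year_density(year_diff, up, down, mean_gap=28.0, sd_gap=6.0):
    """Density of (target birth year - match birth year).

    up:   generations from the target up to the MRCA.
    down: generations from the MRCA down to the match.
    mean_gap, sd_gap: assumed parent-child age-difference distribution.
    """
    steps = up + down
    if steps == 0:
        raise ValueError("target and match must be distinct individuals")
    # A sum of independent normal gaps is normal: means add, variances add.
    dist = NormalDist(mu=(up - down) * mean_gap, sigma=sd_gap * steps ** 0.5)
    return dist.pdf(year_diff)


# Parent-child placement (target one generation below the match):
child_of_match = birth_year_density(28, up=1, down=0)
# First-cousin placement (two generations up, two down):
cousin_same_age = birth_year_density(0, up=2, down=2)
```

A candidate tree that places the target as a child of someone born the same year scores far lower than one that places them 28 years apart, which is the kind of signal that could be combined with a genetic probability when ranking candidate trees.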
-
Patent number: 12040054
Abstract: An input genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid HMM is computed based on genotypes and/or phased haplotypes to determine a probability of a haplotype sequence being associated with a particular label. For example, the diploid HMM for a window is used to determine the emission probability that the window corresponds to a set of labels. An inter-window HMM, with a set of states for each window, is computed. Labels are assigned to the input genotype based on the inter-window HMM. Upper and lower bounds are estimated to produce a range of likely percentage values an input can be assigned to a given label. Confidence values are determined indicating a likelihood that an individual inherits DNA from a certain population. Maps are generated with polygons representing regions where a measure of ethnicity of population falls within specific ranges.
Type: Grant
Filed: May 13, 2020
Date of Patent: July 16, 2024
Assignee: ANCESTRY.COM DNA, LLC
Inventors: Shiya Song, Keith D. Noto, Yong Wang
-
Publication number: 20240061886
Abstract: A computing server may generate a catalog of overrepresented data strings from a database that stores a plurality of data instances. An overrepresented data string is a data string that matches to a number of data instances and the number exceeds a number threshold. The computing server may receive a target data instance that is to be compared to a related data instance. The computing server may determine one or more matched data strings that match between the target data instance and the related data instance. The computing server may compare the matched data strings to the catalog to exclude a subset of matched data strings that are matched to the overrepresented data strings. The computing server may determine a total length of the matched data strings excluding the subset of matched data strings that are matched to the overrepresented data strings.
Type: Application
Filed: August 18, 2023
Publication date: February 22, 2024
Inventor: Keith D. Noto
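The catalog-and-exclude flow in the abstract above can be sketched in a few lines. The sketch assumes each data instance is simply a collection of Python strings and that two instances "match" on a string when both contain it exactly; the function names and the toy data are hypothetical.

```python
from collections import Counter

def build_catalog(instances, threshold):
    """Strings that appear in more than `threshold` instances."""
    # set(inst) so a string is counted once per instance.
    counts = Counter(s for inst in instances for s in set(inst))
    return {s for s, n in counts.items() if n > threshold}

def total_matched_length(target, related, catalog):
    """Total length of shared strings, ignoring overrepresented ones."""
    shared = set(target) & set(related)
    return sum(len(s) for s in shared if s not in catalog)


database = [
    ["AAAA", "CCG"],
    ["AAAA", "TT"],
    ["AAAA", "CCG"],
]
catalog = build_catalog(database, threshold=2)
length = total_matched_length(database[0], database[2], catalog)
```

Here "AAAA" is shared by every instance, so it is cataloged and contributes nothing to the total, while "CCG" still counts.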
-
Publication number: 20230317300
Abstract: Disclosed herein is a method that uses the RAM of multiple servers to increase the efficiency of identifying segments of a target dataset that match segments of other datasets in a database. An encoding system may encode large genetic datasets to produce pairs of bitmap sequence pairs that correspond to an encoding scheme. The servers each store portions of the database in their hard drives based on a shared characteristic of the genetic datasets in the database, such as ethnicity or location of birth. The servers encode data from their hard drives and sustain the encoded data in their RAM. A target, or query, individual is input for matching. The servers match the encoded data of the target individual with encoded data in their flash drives and can determine a relationship. The servers sustain the encoded data in RAM to compare against subsequent target individuals.
Type: Application
Filed: November 21, 2022
Publication date: October 5, 2023
Inventor: Keith D. Noto
-
Publication number: 20220382730
Abstract: Disclosed herein are processes that identify segments of a target dataset that match segments of other datasets in a database. A computing server may encode the target dataset to generate a pair of encoded target bitmap sequences based on an encoding scheme. The encoding scheme defines encoding values based on homogeneity between the pair of data value sequences. The computing server may compare the pair of encoded target bitmap sequences with other pairs of encoded bitmap sequences to identify homogeneous mismatched locations. A homogeneous mismatched location may be a location where the target dataset and the other dataset in comparison are both homogeneous but have different types of homogeneity at the location. The computing server may identify a matched segment between the target dataset and one of the other datasets based on the homogeneous mismatched locations identified. The matched segment is contained within two homogeneous mismatched locations.
Type: Application
Filed: May 26, 2022
Publication date: December 1, 2022
Inventor: Keith D. Noto
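The bitmap comparison in the abstract above can be sketched under a common genotype convention. The sketch assumes each site is coded 0 (homozygous for the first allele), 1 (heterozygous), or 2 (homozygous for the second allele), and uses plain Python integers as bitmaps. The encoding, the function names, and the minimum-length filter are illustrative assumptions rather than the scheme claimed in the application.

```python
def encode(genotypes):
    """Return (hom_first, hom_second) bitmaps; bit i is set for site i."""
    hom_first = hom_second = 0
    for i, g in enumerate(genotypes):
        if g == 0:
            hom_first |= 1 << i
        elif g == 2:
            hom_second |= 1 << i
    return hom_first, hom_second

def mismatch_positions(a, b, n_sites):
    """Sites where both samples are homozygous but for different alleles."""
    bits = (a[0] & b[1]) | (a[1] & b[0])
    return [i for i in range(n_sites) if (bits >> i) & 1]

def matched_segments(a, b, n_sites, min_len):
    """Half-open site ranges lying between consecutive mismatches."""
    bounds = [-1] + mismatch_positions(a, b, n_sites) + [n_sites]
    return [(lo + 1, hi) for lo, hi in zip(bounds, bounds[1:])
            if hi - (lo + 1) >= min_len]


sample_a = encode([0, 1, 2, 0, 0, 2, 1, 0])
sample_b = encode([0, 0, 2, 2, 1, 2, 2, 0])
segments = matched_segments(sample_a, sample_b, n_sites=8, min_len=4)
```

Two bitwise ANDs and one OR locate every opposite-homozygous site at once, which is why a bitmap encoding suits this kind of comparison across many sample pairs.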
-
Patent number: 11335435
Abstract: Identification of inheritance-by-descent haplotype matches between individuals is described. A set of tables including word match, haplotypes and segment match tables are populated. DNA samples are received and stored. A word identification module extracts haplotype values from each sample. The word match table is indexed according to the unique combination of position and haplotype. Each column represents a different sample, and each cell indicates whether that sample includes that haplotype at that position. The haplotypes table includes the raw haplotype data for each sample. The segment match table is indexed by sample identifier, and columns represent other samples. Each cell is populated to indicate for each identified sample pair which position range(s) include matching haplotypes for both samples. The tables are persistently stored in databases of the matching system. As new sample data is received, each table is updated to include the newly received samples, and additional matching takes place.
Type: Grant
Filed: October 4, 2018
Date of Patent: May 17, 2022
Assignee: Ancestry.com DNA, LLC
Inventors: Jake Kelly Byrnes, Aaron Ling, Keith D. Noto, Jeremy Pollack, Catherine Ann Ball, Kenneth Gregory Chahine
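The word match table described in the abstract above behaves like an inverted index keyed by (position, haplotype word). The sketch below is an in-memory illustration only: it assumes haplotypes are strings, words are fixed-length and non-overlapping, and the names `build_word_index` and `matching_positions` are hypothetical. The persistent storage and incremental updates mentioned in the abstract are not modeled.

```python
from collections import defaultdict
from itertools import combinations

def build_word_index(samples, word_len):
    """Map (position, word) -> set of sample ids carrying that word there."""
    index = defaultdict(set)
    for sample_id, haplotype in samples.items():
        for pos in range(0, len(haplotype) - word_len + 1, word_len):
            index[(pos, haplotype[pos:pos + word_len])].add(sample_id)
    return index

def matching_positions(index):
    """For each sample pair, the sorted positions where their words match."""
    matches = defaultdict(list)
    for (pos, _word), sample_ids in index.items():
        for a, b in combinations(sorted(sample_ids), 2):
            matches[(a, b)].append(pos)
    return {pair: sorted(positions) for pair, positions in matches.items()}


samples = {"s1": "ACGTACGT", "s2": "ACGTTTTT", "s3": "GGGGACGT"}
index = build_word_index(samples, word_len=4)
pairs = matching_positions(index)
```

Adding a new sample only requires inserting its words into the index and pairing it with the samples already listed under the same keys, which fits the incremental update the abstract describes.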
-
Publication number: 20210183474
Abstract: A system identifies ancestral birth locations or surnames estimated to be associated with an individual's ancestors using an individual's genetic sample. The system identifies users who are genetic matches to the individual and determines whether and how often a birth location or surname appears in the pedigrees of those users. Birth locations or surnames that appear frequently throughout the pedigrees of genetically matching users may represent birth locations or surnames that are affiliated with the individual's ancestors. The system determines whether the frequency of appearance of a birth location or surname is statistically significant to eliminate biases for certain birth locations or surnames that appear more frequently than others. The birth location or surname may be provided to the individual based on an also-determined enrichment score.
Type: Application
Filed: February 24, 2021
Publication date: June 17, 2021
Inventors: Amir R. Kermany, Julie M. Granka, Keith D. Noto
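The abstract above does not say which statistical test or enrichment formula is used. As one plausible illustration, the sketch below scores a surname by the ratio of its frequency among genetic matches to its frequency in the whole database, and checks significance with a hypergeometric tail probability. Both choices, and all names, are assumptions made for the example.

```python
from math import comb

def enrichment_score(k, n, K, N):
    """Observed frequency among matches divided by background frequency.

    k: matched pedigrees containing the surname, n: matched pedigrees,
    K: pedigrees in the database containing the surname, N: all pedigrees.
    """
    return (k / n) / (K / N)

def hypergeom_tail(k, n, K, N):
    """P(at least k of n randomly drawn pedigrees contain the surname)."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Toy numbers: the surname is in 3 of 10 pedigrees overall, 3 of 4 matches.
score = enrichment_score(k=3, n=4, K=3, N=10)
p_value = hypergeom_tail(k=3, n=4, K=3, N=10)
```

A common surname can appear often among matches by chance alone; comparing against the background rate is what guards against that bias.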
-
Patent number: 10957422
Abstract: A system identifies ancestral birth locations or surnames estimated to be associated with an individual's ancestors using an individual's genetic sample. The system identifies users who are genetic matches to the individual and determines whether and how often a birth location or surname appears in the pedigrees of those users. Birth locations or surnames that appear frequently throughout the pedigrees of genetically matching users may represent birth locations or surnames that are affiliated with the individual's ancestors. The system determines whether the frequency of appearance of a birth location or surname is statistically significant to eliminate biases for certain birth locations or surnames that appear more frequently than others. The birth location or surname may be provided to the individual based on an also-determined enrichment score.
Type: Grant
Filed: July 6, 2016
Date of Patent: March 23, 2021
Assignee: Ancestry.com DNA, LLC
Inventors: Amir R. Kermany, Julie M. Granka, Keith D. Noto
-
Publication number: 20210034647
Abstract: A computer-implemented method for linking individuals' datasets in a database may include receiving a target individual dataset of a target individual and a plurality of additional individual datasets. A computing server may generate a plurality of sub-cluster pairs of first parental groups and second parental groups. At least one of the sub-cluster pairs includes a first parental group of matched segments and a second parental group of matched segments. A computing server may link the first parental groups and the second parental groups across the plurality of sub-cluster pairs to generate at least one super-cluster of a parental side. A computing server may assign metadata to one or more additional individual datasets of the plurality of additional individual datasets. The metadata may specify that the one or more additional individual datasets are connected to the target individual dataset by the parental side of the super-cluster.
Type: Application
Filed: July 23, 2020
Publication date: February 4, 2021
Inventors: Thi Hong Luong Nguyen, Jingwen Pei, Harendra Guturu, Keith D. Noto
-
Publication number: 20200303035
Abstract: Novel haplotype cluster Markov models are used to phase genomic samples. After the models are built, they rapidly and accurately phase new samples without requiring that the new samples be used to re-build the models. The models set transition probabilities such that the probability for an appearance of any allele within any haplotype is a non-zero number. Furthermore, the most unlikely pairs of haplotypes are discarded from each model at each level until ε of the likelihood mass at each level is discarded. The models are also constructed such that contributing windows of SNPs partially overlap so that phasing decisions near one of the extreme ends of any model are not significantly determinative of the phase. Additionally, the models are configured such that two or more nodes can be merged during the building/updating procedure to consolidate haplotype clusters having similar distributions.
Type: Application
Filed: April 29, 2020
Publication date: September 24, 2020
Inventors: Catherine Ann Ball, Keith D. Noto, Kenneth G. Chahine, Mathew J. Barber, Yong Wang
-
Publication number: 20200286579
Abstract: An input genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid HMM is computed based on genotypes and/or phased haplotypes to determine a probability of a haplotype sequence being associated with a particular label. For example, the diploid HMM for a window is used to determine the emission probability that the window corresponds to a set of labels. An inter-window HMM, with a set of states for each window, is computed. Labels are assigned to the input genotype based on the inter-window HMM. Upper and lower bounds are estimated to produce a range of likely percentage values an input can be assigned to a given label. Confidence values are determined indicating a likelihood that an individual inherits DNA from a certain population. Maps are generated with polygons representing regions where a measure of ethnicity of population falls within specific ranges.
Type: Application
Filed: May 13, 2020
Publication date: September 10, 2020
Inventors: Shiya Song, Keith D. Noto, Yong Wang
-
Publication number: 20200286591
Abstract: Systems, computer program products, and methods are disclosed for estimating a degree of ancestral relatedness between two individuals. The haplotype data for a population of individuals is divided into segment windows based on genetic markers, and matched segments for the haplotype data are generated. Each matched segment having a first cM width that exceeds a threshold cM width is included in counting the matched segments in each segment window. A weight associated with each segment window is estimated based on the count of matched segments in the associated segment window. A weighted sum of per-window cM widths for each matched segment is calculated based on the first cM width and the weights associated with the segment windows of the matched segment. The weighted sum of per-window cM widths is used to estimate a degree of ancestral relatedness between two individuals.
Type: Application
Filed: May 27, 2020
Publication date: September 10, 2020
Inventors: Mathew J. Barber, Yong Wang, Keith D. Noto, Kenneth G. Chahine, Catherine Ann Ball
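The counting and weighting steps in the abstract above can be sketched as follows. The abstract does not give the weight formula, so the sketch assumes a simple one: a window whose match count exceeds a hypothetical `expected` count is down-weighted in proportion. Windows and segments are (start, end) pairs in cM, and every name and number here is illustrative.

```python
def overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the overlap between two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))

def window_counts(segments, windows, min_cm):
    """Count segments wider than min_cm that touch each window."""
    counts = [0] * len(windows)
    for lo, hi in segments:
        if hi - lo > min_cm:
            for i, (w_lo, w_hi) in enumerate(windows):
                if overlap(lo, hi, w_lo, w_hi) > 0:
                    counts[i] += 1
    return counts

def window_weights(counts, expected):
    """Assumed rule: down-weight windows with more matches than expected."""
    return [min(1.0, expected / c) if c else 1.0 for c in counts]

def weighted_width(segment, windows, weights):
    """Weighted sum of the segment's per-window cM widths."""
    lo, hi = segment
    return sum(w * overlap(lo, hi, w_lo, w_hi)
               for (w_lo, w_hi), w in zip(windows, weights))


windows = [(0, 10), (10, 20), (20, 30)]
segments = [(0, 12), (8, 18), (11, 19), (12, 16), (22, 29), (1, 3)]
counts = window_counts(segments, windows, min_cm=5)
weights = window_weights(counts, expected=2)
width = weighted_width((8, 18), windows, weights)
```

The segment from 8 to 18 cM is 10 cM wide, but most of it falls in the busiest window, so its weighted width comes out smaller than its raw width.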
-
Patent number: 10720229
Abstract: Systems, computer program products, and methods are disclosed for estimating a degree of ancestral relatedness between two individuals. The haplotype data for a population of individuals is divided into segment windows based on genetic markers, and matched segments for the haplotype data are generated. Each matched segment having a first cM width that exceeds a threshold cM width is included in counting the matched segments in each segment window. A weight associated with each segment window is estimated based on the count of matched segments in the associated segment window. A weighted sum of per-window cM widths for each matched segment is calculated based on the first cM width and the weights associated with the segment windows of the matched segment. The weighted sum of per-window cM widths is used to estimate a degree of ancestral relatedness between two individuals.
Type: Grant
Filed: October 14, 2015
Date of Patent: July 21, 2020
Assignee: Ancestry.com DNA, LLC
Inventors: Mathew J. Barber, Yong Wang, Keith D. Noto, Kenneth G. Chahine, Catherine Ann Ball
-
Patent number: 10692587
Abstract: An input genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid HMM is computed based on genotypes and/or phased haplotypes to determine a probability of a haplotype sequence being associated with a particular label. For example, the diploid HMM for a window is used to determine the emission probability that the window corresponds to a set of labels. An inter-window HMM, with a set of states for each window, is computed. Labels are assigned to the input genotype based on the inter-window HMM. Upper and lower bounds are estimated to produce a range of likely percentage values an input can be assigned to a given label. Confidence values are determined indicating a likelihood that an individual inherits DNA from a certain population. Maps are generated with polygons representing regions where a measure of ethnicity of population falls within specific ranges.
Type: Grant
Filed: September 11, 2019
Date of Patent: June 23, 2020
Assignee: Ancestry.com DNA, LLC
Inventors: Shiya Song, Keith D. Noto, Yong Wang
-
Patent number: 10679729
Abstract: Novel haplotype cluster Markov models are used to phase genomic samples. After the models are built, they rapidly and accurately phase new samples without requiring that the new samples be used to re-build the models. The models set transition probabilities such that the probability for an appearance of any allele within any haplotype is a non-zero number. Furthermore, the most unlikely pairs of haplotypes are discarded from each model at each level until ε of the likelihood mass at each level is discarded. The models are also constructed such that contributing windows of SNPs partially overlap so that phasing decisions near one of the extreme ends of any model are not significantly determinative of the phase. Additionally, the models are configured such that two or more nodes can be merged during the building/updating procedure to consolidate haplotype clusters having similar distributions.
Type: Grant
Filed: October 19, 2015
Date of Patent: June 9, 2020
Assignee: Ancestry.com DNA, LLC
Inventors: Catherine Ann Ball, Keith D. Noto, Kenneth G. Chahine, Mathew J. Barber, Yong Wang
-
Publication number: 20200160202
Abstract: An input sample SNP genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid hidden Markov Model (HMM) is built from a haplotype Markov Model (MM). The diploid HMM for a window is used to determine the probability that the window corresponds to a pair of labels (e.g., ethnicity labels). An inter-window HMM, with a set of states for each window, is built based on the diploid HMMs for each window. Labels are assigned to the input sample genotype based on the inter-window HMM.
Type: Application
Filed: January 8, 2020
Publication date: May 21, 2020
Inventors: Keith D. Noto, Yong Wang