Patents by Inventor Mike PIPPIN
Mike PIPPIN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12019601
Abstract: Disclosed are embodiments for horizontally skimming composite datasets. In one embodiment, a method is disclosed comprising receiving a script, the script including commands to access a composite dataset; pre-processing the script to identify a set of columns; loading a metadata file associated with the composite dataset file; parsing the metadata file to identify one or more datasets that include a column in the set of columns; loading data from the one or more datasets; and executing the script on the one or more datasets.
Type: Grant
Filed: December 26, 2019
Date of Patent: June 25, 2024
Assignee: YAHOO ASSETS LLC
Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
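The steps in this abstract (identify the columns a script uses, consult the metadata, load only the datasets that hold those columns) can be illustrated with a minimal Python sketch. It is not the patented implementation: the metadata layout, the dataset and column names, and the token-matching pre-processor are all assumptions made for illustration.

```python
import re

# Hypothetical metadata: each member dataset of a composite dataset
# mapped to the columns it stores. Names are illustrative only.
METADATA = {
    "root": ["user_id", "url", "timestamp"],
    "geo_annotations": ["country", "city"],
    "spam_annotations": ["spam_score"],
}


def columns_in_script(script, known_columns):
    """Pre-process the script: keep identifiers that name known columns."""
    tokens = set(re.findall(r"[A-Za-z_]\w*", script))
    return tokens & set(known_columns)


def datasets_to_load(metadata, columns):
    """Return member datasets holding at least one referenced column."""
    return [name for name, cols in metadata.items() if columns & set(cols)]


def skim(script, metadata):
    """Return the referenced columns and the datasets that must be loaded."""
    known = {col for cols in metadata.values() for col in cols}
    needed = columns_in_script(script, known)
    return needed, datasets_to_load(metadata, needed)
```

For a script that references only `user_id` and `country`, `skim` selects `root` and `geo_annotations` and leaves `spam_annotations` unread.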
-
Patent number: 11947927
Abstract: Disclosed are embodiments for sorting rows of a dataset after a JOIN operation. In one embodiment, a method is disclosed comprising performing a JOIN operation on an annotation dataset, the performing of the JOIN operation generating an unordered dataset; grouping a plurality of rows in the unordered dataset into a plurality of buckets, the grouping performed based on a root dataset associated with the annotation dataset; sorting each bucket, the sorting comprising sorting each bucket independently; and combining each sorted bucket into a sorted dataset.
Type: Grant
Filed: December 26, 2019
Date of Patent: April 2, 2024
Assignee: YAHOO ASSETS LLC
Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
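The group, sort, and combine sequence described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how rows are tied to the root dataset, so the `root_file` and `offset` fields used as the bucket key and sort key are hypothetical.

```python
from collections import defaultdict


def bucket_sort_joined_rows(rows):
    """Restore order in the unordered output of a JOIN.

    Each row is assumed (hypothetically) to carry the root file it came
    from and its offset within that file.
    """
    # Group rows into buckets keyed by their root file.
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["root_file"]].append(row)

    # Sort each bucket independently, then concatenate in bucket order.
    result = []
    for root_file in sorted(buckets):
        result.extend(sorted(buckets[root_file], key=lambda r: r["offset"]))
    return result
```

Because each bucket is sorted on its own, the per-bucket sorts could run in parallel; the sketch runs them sequentially for brevity.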
-
Patent number: 11809396
Abstract: Disclosed are embodiments for generating metadata files for composite datasets. In one embodiment, a method is disclosed comprising generating a tree representing a plurality of datasets; parsing the tree into an algebraic representation of the tree; identifying a plurality of terms in the algebraic representation, each term in the terms comprising at least two factors, each of the two factors associated with a dataset in the plurality of datasets; generating a metadata object of the plurality of terms; serializing the metadata object to generate serialized terms; and storing the serialized terms in a metadata file associated with the plurality of datasets.
Type: Grant
Filed: November 21, 2022
Date of Patent: November 7, 2023
Assignee: YAHOO ASSETS LLC
Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
-
Publication number: 20230259506
Abstract: Disclosed embodiments are methods, apparatuses, and computer-readable media for annotating distributed data without redundant data copying. In one embodiment, a method is disclosed comprising reading a raw dataset, the raw dataset comprising a first set of columns and a first set of rows; generating an annotation dataset, the annotation dataset comprising a second set of columns and a second set of rows; assigning row identifiers to each row in the second set of rows, the row identifiers aligning the second set of rows with the first set of rows based on the underlying storage of the raw dataset and annotation dataset; and writing the annotation dataset to a distributed storage medium.
Type: Application
Filed: April 21, 2023
Publication date: August 17, 2023
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
-
Publication number: 20230252021
Abstract: Disclosed embodiments are methods, apparatuses, and computer-readable media for annotating distributed data without redundant data copying. In one embodiment, a method is disclosed comprising reading a raw dataset, the raw dataset comprising a first set of columns and a first set of rows; generating an annotation dataset, the annotation dataset comprising a second set of columns and a second set of rows; assigning row identifiers to each row in the second set of rows, the row identifiers aligning the second set of rows with the first set of rows based on the underlying storage of the raw dataset and annotation dataset; and writing the annotation dataset to a distributed storage medium.
Type: Application
Filed: April 21, 2023
Publication date: August 10, 2023
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
-
Patent number: 11663162
Abstract: Disclosed are embodiments for replacing database table join keys with index keys. In one embodiment, a method is disclosed comprising: receiving, by a processor, annotation data, the annotation data comprising a set of rows; retrieving, by the processor, a root dataset, the root dataset stored in one or more files; generating, by the processor, a row identifier for each row in the set of rows, the row identifier storing a plurality of fields enabling alignment of a respective row in the annotation data to a corresponding row in the root dataset; generating, by the processor, an annotation dataset, the annotation dataset comprising the set of rows and corresponding row identifiers; and writing, by the processor, the annotation dataset to at least one file, the at least one file separate from the one or more files.
Type: Grant
Filed: August 29, 2022
Date of Patent: May 30, 2023
Assignee: YAHOO ASSETS LLC
Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
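The idea of aligning annotation rows to root rows through a multi-field row identifier, rather than through a join key, can be sketched roughly as below. The abstract does not name the fields, so the `file` and `index` fields, the in-memory representation of files, and the assumption that annotations arrive in root-row order are all hypothetical.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    file: str   # hypothetical field: root file that holds the row
    index: int  # hypothetical field: position of the row in that file


def build_annotation_dataset(root_files, annotations):
    """Pair each annotation row with a RowId locating its root row.

    root_files maps a file name to its list of root rows. Annotations are
    assumed to follow root-row order, with files read in sorted name order.
    """
    ids = [
        RowId(name, i)
        for name in sorted(root_files)
        for i in range(len(root_files[name]))
    ]
    if len(ids) != len(annotations):
        raise ValueError("annotation rows do not match root rows")
    return list(zip(ids, annotations))


def align(root_files, annotation_dataset):
    """Fetch each root row by position instead of comparing join keys."""
    return [
        (root_files[rid.file][rid.index], annotation)
        for rid, annotation in annotation_dataset
    ]
```

In this sketch the annotation dataset stays separate from the root files, and alignment is a direct positional lookup, so no key comparison or root-data copy is needed.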
-
Patent number: 11650977
Abstract: Disclosed embodiments are methods, apparatuses, and computer-readable media for annotating distributed data without redundant data copying. In one embodiment, a method is disclosed comprising reading a raw dataset, the raw dataset comprising a first set of columns and a first set of rows; generating an annotation dataset, the annotation dataset comprising a second set of columns and a second set of rows; assigning row identifiers to each row in the second set of rows, the row identifiers aligning the second set of rows with the first set of rows based on the underlying storage of the raw dataset and annotation dataset; and writing the annotation dataset to a distributed storage medium.
Type: Grant
Filed: December 26, 2019
Date of Patent: May 16, 2023
Assignee: YAHOO ASSETS LLC
Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
-
Publication number: 20230086741
Abstract: Disclosed are embodiments for generating metadata files for composite datasets. In one embodiment, a method is disclosed comprising generating a tree representing a plurality of datasets; parsing the tree into an algebraic representation of the tree; identifying a plurality of terms in the algebraic representation, each term in the terms comprising at least two factors, each of the two factors associated with a dataset in the plurality of datasets; generating a metadata object of the plurality of terms; serializing the metadata object to generate serialized terms; and storing the serialized terms in a metadata file associated with the plurality of datasets.
Type: Application
Filed: November 21, 2022
Publication date: March 23, 2023
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
-
Publication number: 20220414057
Abstract: Disclosed are embodiments for replacing database table join keys with index keys. In one embodiment, a method is disclosed comprising: receiving, by a processor, annotation data, the annotation data comprising a set of rows; retrieving, by the processor, a root dataset, the root dataset stored in one or more files; generating, by the processor, a row identifier for each row in the set of rows, the row identifier storing a plurality of fields enabling alignment of a respective row in the annotation data to a corresponding row in the root dataset; generating, by the processor, an annotation dataset, the annotation dataset comprising the set of rows and corresponding row identifiers; and writing, by the processor, the annotation dataset to at least one file, the at least one file separate from the one or more files.
Type: Application
Filed: August 29, 2022
Publication date: December 29, 2022
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
-
Patent number: 11507554
Abstract: Disclosed are embodiments for generating metadata files for composite datasets. In one embodiment, a method is disclosed comprising generating a tree representing a plurality of datasets; parsing the tree into an algebraic representation of the tree; identifying a plurality of terms in the algebraic representation, each term in the terms comprising at least two factors, each of the two factors associated with a dataset in the plurality of datasets; generating a metadata object of the plurality of terms; serializing the metadata object to generate serialized terms; and storing the serialized terms in a metadata file associated with the plurality of datasets.
Type: Grant
Filed: December 26, 2019
Date of Patent: November 22, 2022
Assignee: YAHOO ASSETS LLC
Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
-
Patent number: 11429561
Abstract: Disclosed are embodiments for replacing database table join keys with index keys. In one embodiment, a method is disclosed comprising: receiving, by a processor, annotation data, the annotation data comprising a set of rows; retrieving, by the processor, a root dataset, the root dataset stored in one or more files; generating, by the processor, a row identifier for each row in the set of rows, the row identifier storing a plurality of fields enabling alignment of a respective row in the annotation data to a corresponding row in the root dataset; generating, by the processor, an annotation dataset, the annotation dataset comprising the set of rows and corresponding row identifiers; and writing, by the processor, the annotation dataset to at least one file, the at least one file separate from the one or more files.
Type: Grant
Filed: December 26, 2019
Date of Patent: August 30, 2022
Assignee: YAHOO ASSETS LLC
Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
-
Publication number: 20210200715
Abstract: Disclosed are embodiments for replacing database table join keys with index keys. In one embodiment, a method is disclosed comprising: receiving, by a processor, annotation data, the annotation data comprising a set of rows; retrieving, by the processor, a root dataset, the root dataset stored in one or more files; generating, by the processor, a row identifier for each row in the set of rows, the row identifier storing a plurality of fields enabling alignment of a respective row in the annotation data to a corresponding row in the root dataset; generating, by the processor, an annotation dataset, the annotation dataset comprising the set of rows and corresponding row identifiers; and writing, by the processor, the annotation dataset to at least one file, the at least one file separate from the one or more files.
Type: Application
Filed: December 26, 2019
Publication date: July 1, 2021
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
-
Publication number: 20210200732
Abstract: Disclosed are embodiments for generating metadata files for composite datasets. In one embodiment, a method is disclosed comprising generating a tree representing a plurality of datasets; parsing the tree into an algebraic representation of the tree; identifying a plurality of terms in the algebraic representation, each term in the terms comprising at least two factors, each of the two factors associated with a dataset in the plurality of datasets; generating a metadata object of the plurality of terms; serializing the metadata object to generate serialized terms; and storing the serialized terms in a metadata file associated with the plurality of datasets.
Type: Application
Filed: December 26, 2019
Publication date: July 1, 2021
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
-
Publication number: 20210200731
Abstract: Disclosed are embodiments for horizontally skimming composite datasets.
Type: Application
Filed: December 26, 2019
Publication date: July 1, 2021
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
-
Publication number: 20210200717
Abstract: Disclosed are embodiments for generating a dataset metadata file based on partial metadata files. In one embodiment, a method is disclosed comprising receiving data to write to disk, the data comprising a subset of a dataset; writing a first portion of the data to disk; detecting a split boundary after writing the first portion; recording metadata describing the split boundary; continuing to write a remaining portion of the data to disk; and after completing the writing of the data to disk: generating a partial metadata file for the data, the partial metadata file including the split boundary, and transmitting the partial metadata to a partial metadata collector.
Type: Application
Filed: December 26, 2019
Publication date: July 1, 2021
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
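The write, detect, record, and collect flow in this abstract can be sketched as below. This is a simplified illustration, not the disclosed method: the abstract does not say how a split boundary is detected, so the fixed `split_size` byte threshold, the dictionary layout of the partial metadata, and the `collect` function standing in for the partial metadata collector are all assumptions.

```python
import io


def write_with_split_metadata(records, out, split_size):
    """Write byte records to `out`, noting where split boundaries fall.

    A boundary is recorded (by assumption) at the end of the first record
    that reaches or passes the next multiple-of-`split_size` threshold.
    """
    boundaries = []
    next_split = split_size
    for i, record in enumerate(records):
        out.write(record)
        if out.tell() >= next_split:
            boundaries.append({"offset": out.tell(), "rows_written": i + 1})
            next_split = out.tell() + split_size
    # Partial metadata describing this writer's share of the dataset.
    return {
        "bytes": out.tell(),
        "rows": len(records),
        "split_boundaries": boundaries,
    }


def collect(partials):
    """Hypothetical collector: merge partial metadata from many writers."""
    return {"rows": sum(p["rows"] for p in partials), "parts": list(partials)}
```

Each writer would call `write_with_split_metadata` on its own subset of the data, for example with an `io.BytesIO()` buffer or an open binary file, and pass the returned dictionary to `collect`.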
-
Publication number: 20210200747
Abstract: Disclosed embodiments are methods, apparatuses, and computer-readable media for annotating distributed data without redundant data copying. In one embodiment, a method is disclosed comprising reading a raw dataset, the raw dataset comprising a first set of columns and a first set of rows; generating an annotation dataset, the annotation dataset comprising a second set of columns and a second set of rows; assigning row identifiers to each row in the second set of rows, the row identifiers aligning the second set of rows with the first set of rows based on the underlying storage of the raw dataset and annotation dataset; and writing the annotation dataset to a distributed storage medium.
Type: Application
Filed: December 26, 2019
Publication date: July 1, 2021
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
-
Publication number: 20210200512
Abstract: Disclosed are embodiments for sorting rows of a dataset after a JOIN operation. In one embodiment, a method is disclosed comprising performing a JOIN operation on an annotation dataset, the performing of the JOIN operation generating an unordered dataset; grouping a plurality of rows in the unordered dataset into a plurality of buckets, the grouping performed based on a root dataset associated with the annotation dataset; sorting each bucket, the sorting comprising sorting each bucket independently; and combining each sorted bucket into a sorted dataset.
Type: Application
Filed: December 26, 2019
Publication date: July 1, 2021
Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN