Patents by Inventor Mike PIPPIN

Mike PIPPIN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Horizontal skimming of composite datasets

Patent number: 12019601

Abstract: Disclosed are embodiments for horizontally skimming composite datasets. In one embodiment, a method is disclosed comprising receiving a script, the script including commands to access a composite dataset; pre-processing the script to identify a set of columns; loading a metadata file associated with the composite dataset file; parsing the metadata file to identify one or more datasets that include a column in the set of columns; loading data from the one or more datasets; and executing the script on the one or more datasets.

Type: Grant

Filed: December 26, 2019

Date of Patent: June 25, 2024

Assignee: YAHOO ASSETS LLC

Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
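
The column-driven loading described in the abstract of patent 12019601 can be illustrated with a minimal sketch. Everything named here (the `METADATA` and `STORAGE` dictionaries, the dataset and column names, and the regex-based script scan) is a hypothetical stand-in rather than the patented implementation, and the final step of executing the script is left out.

```python
import re

# Hypothetical metadata: maps each dataset in the composite to its columns.
METADATA = {
    "root": ["user_id", "timestamp", "url"],
    "geo_annotations": ["country", "city"],
    "device_annotations": ["device_type", "os"],
}

# Hypothetical in-memory stand-in for the dataset files.
STORAGE = {
    "root": [{"user_id": 1, "timestamp": 100, "url": "/a"}],
    "geo_annotations": [{"country": "US", "city": "NYC"}],
    "device_annotations": [{"device_type": "phone", "os": "android"}],
}


def columns_in_script(script, known_columns):
    """Pre-process the script: collect the known column names it mentions."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", script))
    return tokens & known_columns


def datasets_to_load(metadata, columns):
    """Select only the datasets that hold at least one referenced column."""
    return [name for name, cols in metadata.items() if columns & set(cols)]


def skim(script, metadata, storage):
    """Return the referenced columns and the data of the datasets that hold them."""
    known = {col for cols in metadata.values() for col in cols}
    needed = columns_in_script(script, known)
    selected = datasets_to_load(metadata, needed)
    return needed, {name: storage[name] for name in selected}


script = "SELECT user_id, country FROM composite WHERE country = 'US'"
needed, loaded = skim(script, METADATA, STORAGE)
print(sorted(needed))  # ['country', 'user_id']
print(sorted(loaded))  # ['geo_annotations', 'root']
```

In this sketch the `device_annotations` dataset is never read, because the script references none of its columns.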

Sorting unsorted rows of a composite dataset after a join operation

Patent number: 11947927

Abstract: Disclosed are embodiments for sorting rows of a dataset after a JOIN operation. In one embodiment, a method is disclosed comprising performing a JOIN operation on an annotation dataset, the performing of the JOIN operation generating an unordered dataset; grouping a plurality of rows in the unordered dataset into a plurality of buckets, the grouping performed based on a root dataset associated with the annotation dataset; sorting each bucket, the sorting comprising sorting each bucket independently; and combining each sorted bucket into a sorted dataset.

Type: Grant

Filed: December 26, 2019

Date of Patent: April 2, 2024

Assignee: YAHOO ASSETS LLC

Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
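
The bucket-then-sort flow in the abstract of patent 11947927 can be sketched as follows. The `root_file` and `row_offset` fields are assumptions made for illustration; the abstract says only that the grouping is based on the root dataset, not how each row records that association.

```python
from collections import defaultdict


def sort_after_join(rows):
    """Group rows by their (hypothetical) root file, sort each bucket, combine."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["root_file"]].append(row)

    combined = []
    # Each bucket is sorted on its own; the buckets are then concatenated
    # in root-file order to form the sorted dataset.
    for root_file in sorted(buckets):
        combined.extend(sorted(buckets[root_file], key=lambda r: r["row_offset"]))
    return combined


# Unordered output of a JOIN, as a list of dictionaries.
unordered = [
    {"root_file": "part-1", "row_offset": 1, "value": "d"},
    {"root_file": "part-0", "row_offset": 1, "value": "b"},
    {"root_file": "part-1", "row_offset": 0, "value": "c"},
    {"root_file": "part-0", "row_offset": 0, "value": "a"},
]

sorted_rows = sort_after_join(unordered)
print([row["value"] for row in sorted_rows])  # ['a', 'b', 'c', 'd']
```

Because the buckets do not depend on one another, the per-bucket sorts in this sketch could in principle run in parallel before the final concatenation.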

Tree-like metadata structure for composite datasets

Patent number: 11809396

Abstract: Disclosed are embodiments for generating metadata files for composite datasets. In one embodiment, a method is disclosed comprising generating a tree representing a plurality of datasets; parsing the tree into an algebraic representation of the tree; identifying a plurality of terms in the algebraic representation, each term in the terms comprising at least two factors, each of the two factors associated with a dataset in the plurality of datasets; generating a metadata object of the plurality of terms; serializing the metadata object to generate serialized terms; and storing the serialized terms in a metadata file associated with the plurality of datasets.

Type: Grant

Filed: November 21, 2022

Date of Patent: November 7, 2023

Assignee: YAHOO ASSETS LLC

Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
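
One way to picture the tree-to-terms step in the abstract of patent 11809396 is the sketch below. The abstract does not define the algebra, so this example assumes that each root-to-leaf path becomes a product term whose factors are the datasets on that path. The tree contents, the JSON serialization, and the file name are likewise hypothetical.

```python
import json
import os
import tempfile

# Hypothetical tree: each key is a dataset, each value lists its child datasets.
TREE = {"X": ["Y", "Z"], "Y": ["W"], "Z": [], "W": []}


def paths(tree, node, prefix=()):
    """Yield every root-to-leaf path below `node` as a tuple of dataset names."""
    prefix = prefix + (node,)
    children = tree.get(node, [])
    if not children:
        yield prefix
    for child in children:
        yield from paths(tree, child, prefix)


def to_terms(tree, root):
    """Keep only the terms that have at least two factors."""
    return [list(path) for path in paths(tree, root) if len(path) >= 2]


def to_expression(terms):
    """Render the terms as a sum of products, for display only."""
    return " + ".join("*".join(term) for term in terms)


terms = to_terms(TREE, "X")
print(to_expression(terms))  # X*Y*W + X*Z

# Build the metadata object, serialize it, and store it in a metadata file.
metadata = {"root": "X", "terms": terms}
serialized = json.dumps(metadata, sort_keys=True)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "composite.meta.json")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(serialized)
    with open(path, encoding="utf-8") as handle:
        restored = json.load(handle)
```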

ANNOTATING DATASETS WITHOUT REDUNDANT COPYING

Publication number: 20230259506

Abstract: Disclosed embodiments are methods, apparatuses, and computer-readable media for annotating distributed data without redundant data copying. In one embodiment, a method is disclosed comprising reading a raw dataset, the raw dataset comprising a first set of columns and a first set of rows; generating an annotation dataset, the annotation dataset comprising a second set of columns and a second set of rows; assigning row identifiers to each row in the second set of rows, the row identifiers aligning the second set of rows with the first set of rows based on the underlying storage of the raw dataset and annotation dataset; and writing the annotation dataset to a distributed storage medium.

Type: Application

Filed: April 21, 2023

Publication date: August 17, 2023

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
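
The annotation flow described in the abstract above can be sketched as follows, under stated assumptions: plain dictionaries stand in for the raw files and for the distributed storage medium, and a row identifier is taken to be a `(file name, position)` pair. None of these names comes from the publication.

```python
# Hypothetical raw dataset stored as two files (lists stand in for files).
RAW_FILES = {
    "raw/part-0": [{"url": "/a"}, {"url": "/b"}],
    "raw/part-1": [{"url": "/c"}],
}

# Hypothetical stand-in for a distributed storage medium.
STORE = {}


def annotate(raw_files, compute):
    """Build annotation rows that carry a row identifier, not a copy of raw data."""
    annotation = []
    for file_name in sorted(raw_files):
        for position, raw_row in enumerate(raw_files[file_name]):
            annotation.append({"row_id": (file_name, position), **compute(raw_row)})
    return annotation


def url_depth(raw_row):
    """Example annotation column derived from a raw row."""
    return {"depth": raw_row["url"].count("/")}


def read_joined(raw_files, annotation):
    """Align annotation rows with raw rows through the row identifier."""
    for row in annotation:
        file_name, position = row["row_id"]
        yield {**raw_files[file_name][position], "depth": row["depth"]}


annotation_rows = annotate(RAW_FILES, url_depth)
STORE["annotations/depth"] = annotation_rows  # written apart from the raw files

joined = list(read_joined(RAW_FILES, STORE["annotations/depth"]))
print(joined[1])  # {'url': '/b', 'depth': 1}
```

The annotation rows in this sketch hold only the new column and the identifier, so the raw columns are stored once.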

ANNOTATING DATASETS WITHOUT REDUNDANT COPYING

Publication number: 20230252021

Abstract: Disclosed embodiments are methods, apparatuses, and computer-readable media for annotating distributed data without redundant data copying. In one embodiment, a method is disclosed comprising reading a raw dataset, the raw dataset comprising a first set of columns and a first set of rows; generating an annotation dataset, the annotation dataset comprising a second set of columns and a second set of rows; assigning row identifiers to each row in the second set of rows, the row identifiers aligning the second set of rows with the first set of rows based on the underlying storage of the raw dataset and annotation dataset; and writing the annotation dataset to a distributed storage medium.

Type: Application

Filed: April 21, 2023

Publication date: August 10, 2023

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN

Replacing database table join keys with index keys

Patent number: 11663162

Abstract: Disclosed are embodiments for replacing database table join keys with index keys. In one embodiment, a method is disclosed comprising: receiving, by a processor, annotation data, the annotation data comprising a set of rows; retrieving, by the processor, a root dataset, the root dataset stored in one or more files; generating, by the processor, a row identifier for each row in the set of rows, the row identifier storing a plurality of fields enabling alignment of a respective row in the annotation data to a corresponding row in the root dataset; generating, by the processor, an annotation dataset, the annotation dataset comprising the set of rows and corresponding row identifiers; and writing, by the processor, the annotation dataset to at least one file, the at least one file separate from the one or more files.

Type: Grant

Filed: August 29, 2022

Date of Patent: May 30, 2023

Assignee: YAHOO ASSETS LLC

Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin
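
The multi-field row identifier in the abstract of patent 11663162 can be illustrated with a short sketch. The `RowId` fields, the sample data, and the assumption that annotation rows arrive in the same order as the root rows are all hypothetical; the abstract does not name the fields.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Hypothetical row identifier: which root file, and which row in that file."""
    file_index: int
    row_index: int


# Hypothetical root dataset stored in two files.
ROOT_FILES = [
    [{"user": "ann", "clicks": 3}, {"user": "bob", "clicks": 5}],
    [{"user": "cat", "clicks": 1}],
]


def build_annotation_dataset(annotation_data, root_files):
    """Attach a RowId to each annotation row (assumes matching row order)."""
    positions = [
        RowId(file_index, row_index)
        for file_index, rows in enumerate(root_files)
        for row_index in range(len(rows))
    ]
    if len(positions) != len(annotation_data):
        raise ValueError("annotation data must have one row per root row")
    return [{"row_id": rid, **row} for rid, row in zip(positions, annotation_data)]


def align(annotation_dataset, root_files):
    """Combine rows by direct indexing instead of matching on a join key."""
    combined = []
    for row in annotation_dataset:
        rid = row["row_id"]
        extra = {key: value for key, value in row.items() if key != "row_id"}
        combined.append({**root_files[rid.file_index][rid.row_index], **extra})
    return combined


annotation_data = [{"score": 0.9}, {"score": 0.5}, {"score": 0.1}]
annotation_dataset = build_annotation_dataset(annotation_data, ROOT_FILES)
aligned = align(annotation_dataset, ROOT_FILES)
print(aligned[1])  # {'user': 'bob', 'clicks': 5, 'score': 0.5}
```

In this sketch no column such as `user` is compared during alignment; each lookup is two list index operations.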

Annotating datasets without redundant copying

Patent number: 11650977

Abstract: Disclosed embodiments are methods, apparatuses, and computer-readable media for annotating distributed data without redundant data copying. In one embodiment, a method is disclosed comprising reading a raw dataset, the raw dataset comprising a first set of columns and a first set of rows; generating an annotation dataset, the annotation dataset comprising a second set of columns and a second set of rows; assigning row identifiers to each row in the second set of rows, the row identifiers aligning the second set of rows with the first set of rows based on the underlying storage of the raw dataset and annotation dataset; and writing the annotation dataset to a distributed storage medium.

Type: Grant

Filed: December 26, 2019

Date of Patent: May 16, 2023

Assignee: YAHOO ASSETS LLC

Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin

TREE-LIKE METADATA STRUCTURE FOR COMPOSITE DATASETS

Publication number: 20230086741

Abstract: Disclosed are embodiments for generating metadata files for composite datasets. In one embodiment, a method is disclosed comprising generating a tree representing a plurality of datasets; parsing the tree into an algebraic representation of the tree; identifying a plurality of terms in the algebraic representation, each term in the terms comprising at least two factors, each of the two factors associated with a dataset in the plurality of datasets; generating a metadata object of the plurality of terms; serializing the metadata object to generate serialized terms; and storing the serialized terms in a metadata file associated with the plurality of datasets.

Type: Application

Filed: November 21, 2022

Publication date: March 23, 2023

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN

REPLACING DATABASE TABLE JOIN KEYS WITH INDEX KEYS

Publication number: 20220414057

Abstract: Disclosed are embodiments for replacing database table join keys with index keys. In one embodiment, a method is disclosed comprising: receiving, by a processor, annotation data, the annotation data comprising a set of rows; retrieving, by the processor, a root dataset, the root dataset stored in one or more files; generating, by the processor, a row identifier for each row in the set of rows, the row identifier storing a plurality of fields enabling alignment of a respective row in the annotation data to a corresponding row in the root dataset; generating, by the processor, an annotation dataset, the annotation dataset comprising the set of rows and corresponding row identifiers; and writing, by the processor, the annotation dataset to at least one file, the at least one file separate from the one or more files.

Type: Application

Filed: August 29, 2022

Publication date: December 29, 2022

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN

Tree-like metadata structure for composite datasets

Patent number: 11507554

Abstract: Disclosed are embodiments for generating metadata files for composite datasets. In one embodiment, a method is disclosed comprising generating a tree representing a plurality of datasets; parsing the tree into an algebraic representation of the tree; identifying a plurality of terms in the algebraic representation, each term in the terms comprising at least two factors, each of the two factors associated with a dataset in the plurality of datasets; generating a metadata object of the plurality of terms; serializing the metadata object to generate serialized terms; and storing the serialized terms in a metadata file associated with the plurality of datasets.

Type: Grant

Filed: December 26, 2019

Date of Patent: November 22, 2022

Assignee: YAHOO ASSETS LLC

Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin

Replacing database table join keys with index keys

Patent number: 11429561

Abstract: Disclosed are embodiments for replacing database table join keys with index keys. In one embodiment, a method is disclosed comprising: receiving, by a processor, annotation data, the annotation data comprising a set of rows; retrieving, by the processor, a root dataset, the root dataset stored in one or more files; generating, by the processor, a row identifier for each row in the set of rows, the row identifier storing a plurality of fields enabling alignment of a respective row in the annotation data to a corresponding row in the root dataset; generating, by the processor, an annotation dataset, the annotation dataset comprising the set of rows and corresponding row identifiers; and writing, by the processor, the annotation dataset to at least one file, the at least one file separate from the one or more files.

Type: Grant

Filed: December 26, 2019

Date of Patent: August 30, 2022

Assignee: YAHOO ASSETS LLC

Inventors: George Aleksandrovich, Allie K. Watfa, Robin Sahner, Mike Pippin

REPLACING DATABASE TABLE JOIN KEYS WITH INDEX KEYS

Publication number: 20210200715

Abstract: Disclosed are embodiments for replacing database table join keys with index keys. In one embodiment, a method is disclosed comprising: receiving, by a processor, annotation data, the annotation data comprising a set of rows; retrieving, by the processor, a root dataset, the root dataset stored in one or more files; generating, by the processor, a row identifier for each row in the set of rows, the row identifier storing a plurality of fields enabling alignment of a respective row in the annotation data to a corresponding row in the root dataset; generating, by the processor, an annotation dataset, the annotation dataset comprising the set of rows and corresponding row identifiers; and writing, by the processor, the annotation dataset to at least one file, the at least one file separate from the one or more files.

Type: Application

Filed: December 26, 2019

Publication date: July 1, 2021

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN

ANNOTATING DATASETS WITHOUT REDUNDANT COPYING

Publication number: 20210200747

Abstract: Disclosed embodiments are methods, apparatuses, and computer-readable media for annotating distributed data without redundant data copying. In one embodiment, a method is disclosed comprising reading a raw dataset, the raw dataset comprising a first set of columns and a first set of rows; generating an annotation dataset, the annotation dataset comprising a second set of columns and a second set of rows; assigning row identifiers to each row in the second set of rows, the row identifiers aligning the second set of rows with the first set of rows based on the underlying storage of the raw dataset and annotation dataset; and writing the annotation dataset to a distributed storage medium.

Type: Application

Filed: December 26, 2019

Publication date: July 1, 2021

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN

TREE-LIKE METADATA STRUCTURE FOR COMPOSITE DATASETS

Publication number: 20210200732

Abstract: Disclosed are embodiments for generating metadata files for composite datasets. In one embodiment, a method is disclosed comprising generating a tree representing a plurality of datasets; parsing the tree into an algebraic representation of the tree; identifying a plurality of terms in the algebraic representation, each term in the terms comprising at least two factors, each of the two factors associated with a dataset in the plurality of datasets; generating a metadata object of the plurality of terms; serializing the metadata object to generate serialized terms; and storing the serialized terms in a metadata file associated with the plurality of datasets.

Type: Application

Filed: December 26, 2019

Publication date: July 1, 2021

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN

GENERATING FULL METADATA FROM PARTIAL DISTRIBUTED METADATA

Publication number: 20210200717

Abstract: Disclosed are embodiments for generating a dataset metadata file based on partial metadata files. In one embodiment, a method is disclosed comprising receiving data to write to disk, the data comprising a subset of a dataset; writing a first portion of the data to disk; detecting a split boundary after writing the first portion; recording metadata describing the split boundary; continuing to write a remaining portion of the data to disk; and after completing the writing of the data to disk: generating a partial metadata file for the data, the partial metadata file including the split boundary, and transmitting the partial metadata to a partial metadata collector.

Type: Application

Filed: December 26, 2019

Publication date: July 1, 2021

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN
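
The write-and-record sequence in the abstract above can be sketched as follows. The fixed `BLOCK_SIZE`, the in-memory `io.BytesIO` sink standing in for disk, and the `PartialMetadataCollector` class are assumptions made for illustration; the publication does not specify how a split boundary is detected or how the collector merges its inputs.

```python
import io

BLOCK_SIZE = 16  # hypothetical split size in bytes


class PartialMetadataCollector:
    """Hypothetical collector that merges partial metadata into full metadata."""

    def __init__(self):
        self.partials = []

    def receive(self, partial):
        self.partials.append(partial)

    def full_metadata(self):
        ordered = sorted(self.partials, key=lambda p: p["part"])
        return {"parts": ordered, "total_rows": sum(p["rows"] for p in ordered)}


def write_part(part, rows, sink, collector):
    """Write one subset of the dataset, recording split boundaries along the way."""
    boundaries = []
    next_boundary = BLOCK_SIZE
    for index, row in enumerate(rows):
        sink.write(row.encode("utf-8") + b"\n")
        if sink.tell() >= next_boundary:
            # A block boundary was crossed: record it, then keep writing.
            boundaries.append({"offset": sink.tell(), "rows_written": index + 1})
            next_boundary = (sink.tell() // BLOCK_SIZE + 1) * BLOCK_SIZE
    # Only after all data is written is the partial metadata sent on.
    collector.receive({"part": part, "rows": len(rows), "split_boundaries": boundaries})


collector = PartialMetadataCollector()
write_part(0, ["alpha", "bravo", "charlie", "delta"], io.BytesIO(), collector)
write_part(1, ["echo", "foxtrot"], io.BytesIO(), collector)

full = collector.full_metadata()
print(full["total_rows"])                    # 6
print(full["parts"][0]["split_boundaries"])  # [{'offset': 20, 'rows_written': 3}]
```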

HORIZONTAL SKIMMING OF COMPOSITE DATASETS

Publication number: 20210200731

Abstract: Disclosed are embodiments for horizontally skimming composite datasets.

Type: Application

Filed: December 26, 2019

Publication date: July 1, 2021

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN

SORTING UNSORTED ROWS OF A COMPOSITE DATASET AFTER A JOIN OPERATION

Publication number: 20210200512

Abstract: Disclosed are embodiments for sorting rows of a dataset after a JOIN operation. In one embodiment, a method is disclosed comprising performing a JOIN operation on an annotation dataset, the performing of the JOIN operation generating an unordered dataset; grouping a plurality of rows in the unordered dataset into a plurality of buckets, the grouping performed based on a root dataset associated with the annotation dataset; sorting each bucket, the sorting comprising sorting each bucket independently; and combining each sorted bucket into a sorted dataset.

Type: Application

Filed: December 26, 2019

Publication date: July 1, 2021

Inventors: George ALEKSANDROVICH, Allie K. WATFA, Robin SAHNER, Mike PIPPIN