Patents by Inventor Philip Ogren

Philip Ogren has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

NON-LEXICALIZED FEATURES FOR LANGUAGE IDENTITY CLASSIFICATION USING SUBWORD TOKENIZATION

Publication number: 20220343072

Abstract: A natural language identity classifier system is described, which employs a supervised machine learning (ML) model to perform language identity classification on input text. The ML model takes, as input, non-lexicalized features of target text derived from subword tokenization of the text. Specifically, these non-lexicalized features are generated based on statistics determined for tokens identified for the input text. According to an embodiment, at least some of the non-lexicalized features are based on natural language-specific summary statistics that indicate how often tokens were found within a corpus for each natural language. Use of such summary statistics allows for generation of natural language specific conditional probability-based features.

Type: Application

Filed: April 22, 2021

Publication date: October 27, 2022

Inventor: Philip Ogren
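
Illustrative sketch (not taken from the application): the abstract for publication 20220343072 describes features derived from per-language token statistics rather than from token identities. The Python below shows one way such features might be computed. The character-trigram `tokenize` function stands in for a real subword tokenizer, and the feature names, the toy corpus, and the choice of P(language | token) as the conditional probability are all assumptions made for illustration.

```python
from collections import Counter

def tokenize(text, n=3):
    """Hypothetical stand-in for a subword tokenizer: overlapping character trigrams."""
    text = text.lower()
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]

def build_stats(corpus_by_language):
    """Summary statistics: how often each token occurs in each language's corpus."""
    return {
        lang: Counter(tok for doc in docs for tok in tokenize(doc))
        for lang, docs in corpus_by_language.items()
    }

def language_features(text, stats):
    """Aggregate per-language statistics over the tokens of `text`.

    Feature names depend only on the language, never on a specific token,
    which is the sense in which they are non-lexicalized in this sketch.
    """
    tokens = tokenize(text)
    feats = {}
    for lang, counts in stats.items():
        cond_probs = []
        for tok in tokens:
            total = sum(s[tok] for s in stats.values())
            # Estimate P(language | token); tokens unseen in every corpus contribute 0.
            cond_probs.append(counts[tok] / total if total else 0.0)
        feats[f"{lang}_mean_cond_prob"] = sum(cond_probs) / len(tokens)
        feats[f"{lang}_seen_fraction"] = sum(counts[tok] > 0 for tok in tokens) / len(tokens)
    return feats

# Toy corpus, invented for this example.
corpus = {
    "en": ["the quick brown fox", "hello there"],
    "es": ["el zorro marrón", "hola amigo"],
}
stats = build_stats(corpus)
feats = language_features("the fox", stats)
```

In a full system, a feature dictionary like `feats` would be passed to a supervised classifier. The sketch stops at feature generation because the abstract does not specify the model.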

Automated entity correlation and classification across heterogeneous datasets

Patent number: 10915233

Abstract: The present disclosure describes techniques for entity classification and data enrichment of data sets. A data enrichment system is disclosed that can extract, repair, and enrich datasets, resulting in more precise entity resolution and classification for purposes of subsequent indexing and clustering. Disclosed techniques may include performing entity recognition to identify segments of interest that relate to an entity. Related data may be analyzed for classification, which can be used to transform the data for enrichment to its users.

Type: Grant

Filed: September 24, 2015

Date of Patent: February 9, 2021

Assignee: ORACLE INTERNATIONAL CORPORATION

Inventors: Alexander Sasha Stojanovic, Philip Ogren, Kevin L. Markey, Mark Kreider

Declarative language and visualization system for recommended data transformations and repairs

Patent number: 10891272

Abstract: The present disclosure relates generally to a data enrichment service that extracts, repairs, and enriches datasets, resulting in more precise entity resolution and correlation for purposes of subsequent indexing and clustering. The data enrichment service can include a visual recommendation engine and language for performing large-scale data preparation, repair, and enrichment of heterogeneous datasets. This enables the user to select and see how the recommended enrichments (e.g., transformations and repairs) will affect the user's data and make adjustments as needed. The data enrichment service can receive feedback from users through a user interface and can filter recommendations based on the user feedback.

Type: Grant

Filed: September 24, 2015

Date of Patent: January 12, 2021

Assignee: ORACLE INTERNATIONAL CORPORATION

Inventors: Alexander Sasha Stojanovic, Luis E. Rivas, Philip Ogren, Glenn Allen Murray

Data standardization techniques

Patent number: 10885056

Abstract: Techniques are disclosed for standardization of data. According to a first technique, standard representation terms are determined for to-be-standardized data using the to-be-standardized data itself and without using any external reference data. According to a second technique, a combination of the to-be-standardized data and an external reference is used to determine standard representation terms for the to-be-standardized data.

Type: Grant

Filed: September 25, 2018

Date of Patent: January 5, 2021

Assignee: ORACLE INTERNATIONAL CORPORATION

Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider, Philip Ogren, Robert James Oberbreckling
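
Illustrative sketch (not taken from the patent): the abstract for patent 10885056 mentions a first technique that picks standard representation terms from the to-be-standardized data itself, with no external reference. One simple way to approximate that idea is to group values that share a normalized key and adopt the most frequent spelling in each group. The `normalize` rule and the "most frequent variant wins" policy are assumptions for this example. The patent's actual method is not described in the abstract.

```python
import re
from collections import Counter, defaultdict

def normalize(value):
    """Hypothetical grouping key: lowercase, with punctuation and spacing collapsed."""
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()

def standardize(values):
    """Map each value to the most common surface form among its variants."""
    groups = defaultdict(Counter)
    for v in values:
        groups[normalize(v)][v] += 1
    # The standard term for each group comes from the data itself.
    standard = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return [standard[normalize(v)] for v in values]

# Toy input, invented for this example.
raw = ["New York", "new york", "NEW-YORK", "New York", "Boston"]
cleaned = standardize(raw)
```

The second technique in the abstract, which combines the data with an external reference, is not covered by this sketch.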

Scalable approach to information-theoretic string similarity using a guaranteed rank threshold

Patent number: 10482128

Abstract: A string analysis tool for calculating a similarity metric between an input string and a plurality of strings in a collection to be searched. The string analysis tool may include optimizations that may reduce the number of calculations to be carried out when calculating the similarity metric for large volumes of data. In this regard, the string analysis tool may represent strings as features. As such, analysis may be performed relative to features (e.g., of either the input string or plurality of strings to be searched) such that features from the strings may be eliminated from consideration when identifying candidate strings from the collection for which a similarity metric is to be calculated. The elimination of features may be based on a minimum similarity metric threshold, wherein features that are incapable of contributing to a similarity metric above the minimum similarity metric threshold are eliminated from consideration.

Type: Grant

Filed: May 15, 2017

Date of Patent: November 19, 2019

Assignee: ORACLE INTERNATIONAL CORPORATION

Inventor: Philip Ogren
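
Illustrative sketch (not taken from the patent): the abstract for patent 10482128 describes representing strings as features and eliminating features that cannot contribute to a similarity above a minimum threshold. The Python below shows one well-known way to realize that kind of pruning. The character-bigram features, the log-inverse-frequency weights, the weighted Jaccard metric, and the `StringIndex` class are all assumptions chosen for illustration, not the patented method.

```python
import math
from collections import defaultdict

def features(s, n=2):
    """Represent a string as a set of character bigrams (an assumed feature choice)."""
    s = f" {s.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

class StringIndex:
    def __init__(self, strings):
        self.strings = list(strings)
        self.feats = [features(s) for s in self.strings]
        self.postings = defaultdict(set)  # feature -> ids of strings containing it
        for idx, fs in enumerate(self.feats):
            for f in fs:
                self.postings[f].add(idx)
        self.n = len(self.strings)

    def weight(self, f):
        """Rarer features carry more information; unseen features get the maximum weight."""
        df = len(self.postings[f]) if f in self.postings else 0
        return math.log((self.n + 1) / max(df, 1))

    def total(self, fs):
        return sum(self.weight(f) for f in fs)

    def similarity(self, q, idx):
        """Weighted Jaccard similarity between query features and an indexed string."""
        shared = self.total(q & self.feats[idx])
        return shared / (self.total(q) + self.total(self.feats[idx]) - shared)

    def candidates(self, q, threshold):
        """Ids of strings that could still reach `threshold` after pruning features.

        A match needs shared weight >= threshold * total(q), so the weakest
        features whose combined weight stays below that bound can be ignored.
        """
        required = threshold * self.total(q) - 1e-9  # small margin for float rounding
        skipped, probe = 0.0, []
        for f in sorted(q, key=self.weight):
            w = self.weight(f)
            if skipped + w < required:
                skipped += w  # too weak, even combined, to reach the threshold
            else:
                probe.append(f)
        ids = set()
        for f in probe:
            ids |= self.postings.get(f, set())
        return ids

    def search(self, query, threshold):
        q = features(query)
        scored = ((self.strings[i], self.similarity(q, i)) for i in self.candidates(q, threshold))
        return sorted((hit for hit in scored if hit[1] >= threshold), key=lambda hit: -hit[1])

# Toy collection, invented for this example.
names = ["oracle corporation", "oracle corp", "orange county",
         "international business machines", "oracle international corporation"]
index = StringIndex(names)
hits = index.search("oracle corporation", 0.5)
```

Because the skipped features are the most common ones, they also tend to have the longest posting lists, which is where most of the saved work comes from. The exact similarity is still computed for every surviving candidate, so pruning does not change the results.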

DATA STANDARDIZATION TECHNIQUES

Publication number: 20190102441

Abstract: Techniques are disclosed for standardization of data. According to a first technique, standard representation terms are determined for to-be-standardized data using the to-be-standardized data itself and without using any external reference data. According to a second technique, a combination of the to-be-standardized data and an external reference is used to determine standard representation terms for the to-be-standardized data.

Type: Application

Filed: September 25, 2018

Publication date: April 4, 2019

Applicant: Oracle International Corporation

Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider, Philip Ogren, Robert James Oberbreckling

SCALABLE APPROACH TO INFORMATION-THEORETIC STRING SIMILARITY USING A GUARANTEED RANK THRESHOLD

Publication number: 20180330015

Abstract: A string analysis tool for calculating a similarity metric between an input string and a plurality of strings in a collection to be searched. The string analysis tool may include optimizations that may reduce the number of calculations to be carried out when calculating the similarity metric for large volumes of data. In this regard, the string analysis tool may represent strings as features. As such, analysis may be performed relative to features (e.g., of either the input string or plurality of strings to be searched) such that features from the strings may be eliminated from consideration when identifying candidate strings from the collection for which a similarity metric is to be calculated. The elimination of features may be based on a minimum similarity metric threshold, wherein features that are incapable of contributing to a similarity metric above the minimum similarity metric threshold are eliminated from consideration.

Type: Application

Filed: May 15, 2017

Publication date: November 15, 2018

Inventor: Philip Ogren

DECLARATIVE LANGUAGE AND VISUALIZATION SYSTEM FOR RECOMMENDED DATA TRANSFORMATIONS AND REPAIRS

Publication number: 20160092474

Abstract: The present disclosure relates generally to a data enrichment service that extracts, repairs, and enriches datasets, resulting in more precise entity resolution and correlation for purposes of subsequent indexing and clustering. The data enrichment service can include a visual recommendation engine and language for performing large-scale data preparation, repair, and enrichment of heterogeneous datasets. This enables the user to select and see how the recommended enrichments (e.g., transformations and repairs) will affect the user's data and make adjustments as needed. The data enrichment service can receive feedback from users through a user interface and can filter recommendations based on the user feedback.

Type: Application

Filed: September 24, 2015

Publication date: March 31, 2016

Inventors: Alexander Sasha Stojanovic, Luis E. Rivas, Philip Ogren, Glenn Allen Murray

AUTOMATED ENTITY CORRELATION AND CLASSIFICATION ACROSS HETEROGENEOUS DATASETS

Publication number: 20160092475

Abstract: The present disclosure describes techniques for entity classification and data enrichment of data sets. A data enrichment system is disclosed that can extract, repair, and enrich datasets, resulting in more precise entity resolution and classification for purposes of subsequent indexing and clustering. Disclosed techniques may include performing entity recognition to identify segments of interest that relate to an entity. Related data may be analyzed for classification, which can be used to transform the data for enrichment to its users.

Type: Application

Filed: September 24, 2015

Publication date: March 31, 2016

Inventors: Alexander Sasha Stojanovic, Philip Ogren, Kevin L. Markey, Mark Kreider

Contextually blind data conversion using indexed string matching

Patent number: 9201869

Abstract: Computer-based tools and methods for conversion of data from a first form to a second form without reference to the context of data to be converted. The conversion may be facilitated by matching source data with external information (e.g., public and/or private schema) that contains rules (e.g., context-specific rules) for conversion of the data. The matching may be performed based on an optimized index string matching technique that may be operable to match source data to external information that is context dependent without specific identification of the context of either the source data or the external information identified. Accordingly, the conversion of data may be performed in an unsupervised machine learning environment.

Type: Grant

Filed: May 21, 2013

Date of Patent: December 1, 2015

Assignee: Oracle International Corporation

Inventors: Philip Ogren, Luis Rivas, Edward A. Green

Scalable string matching as a component for unsupervised learning in semantic meta-model development

Patent number: 9070090

Abstract: A string analysis tool for calculating a similarity metric between a source string and a plurality of target strings. The string analysis tool may include optimizations that may reduce the number of calculations to be carried out when calculating the similarity metric for large volumes of data. In this regard, the string analysis tool may represent strings as features. As such, analysis may be performed relative to features (e.g., of either the source string or plurality of target strings) such that features from the strings may be eliminated from consideration when identifying target strings for which a similarity metric is to be calculated. The elimination of features may be based on a minimum similarity metric threshold, wherein features that are incapable of contributing to a similarity metric above the minimum similarity metric threshold are eliminated from consideration.

Type: Grant

Filed: August 28, 2012

Date of Patent: June 30, 2015

Assignee: Oracle International Corporation

Inventors: Philip Ogren, Luis Rivas, Edward A. Green

CONTEXTUALLY BLIND DATA CONVERSION USING INDEXED STRING MATCHING

Publication number: 20140067363

Abstract: Computer-based tools and methods for conversion of data from a first form to a second form without reference to the context of data to be converted. The conversion may be facilitated by matching source data with external information (e.g., public and/or private schema) that contains rules (e.g., context-specific rules) for conversion of the data. The matching may be performed based on an optimized index string matching technique that may be operable to match source data to external information that is context dependent without specific identification of the context of either the source data or the external information identified. Accordingly, the conversion of data may be performed in an unsupervised machine learning environment.

Type: Application

Filed: May 21, 2013

Publication date: March 6, 2014

Inventors: Philip Ogren, Luis Rivas, Edward A. Green

SCALABLE STRING MATCHING AS A COMPONENT FOR UNSUPERVISED LEARNING IN SEMANTIC META-MODEL DEVELOPMENT

Publication number: 20140067728

Abstract: A string analysis tool for calculating a similarity metric between a source string and a plurality of target strings. The string analysis tool may include optimizations that may reduce the number of calculations to be carried out when calculating the similarity metric for large volumes of data. In this regard, the string analysis tool may represent strings as features. As such, analysis may be performed relative to features (e.g., of either the source string or plurality of target strings) such that features from the strings may be eliminated from consideration when identifying target strings for which a similarity metric is to be calculated. The elimination of features may be based on a minimum similarity metric threshold, wherein features that are incapable of contributing to a similarity metric above the minimum similarity metric threshold are eliminated from consideration.

Type: Application

Filed: August 28, 2012

Publication date: March 6, 2014

Applicant: ORACLE INTERNATIONAL CORPORATION

Inventors: Philip Ogren, Luis Rivas, Edward A. Green