Patents by Inventor Michael Malak
Michael Malak has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11941018
Abstract: Techniques for generating regular expressions are disclosed. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence. A negative example may be used to generate the regular expression. Context from the negative example may be determined in order to generate the regular expression.
Type: Grant
Filed: June 17, 2020
Date of Patent: March 26, 2024
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Patent number: 11797582
Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Type: Grant
Filed: June 11, 2019
Date of Patent: October 24, 2023
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
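The regular expression abstracts in this listing describe converting character sequences into codes, finding a longest common subsequence (LCS), and emitting a pattern from it. The following is only a minimal sketch of that general idea, not the patented method: the code alphabet (`D` for digits, `L` for letters, literal characters otherwise), the function names, and the pairwise folding of the LCS across examples are all assumptions made for illustration. Span data structures and negative examples are not modeled.

```python
import re
from functools import reduce


def to_codes(text):
    """Map each character to a coarse code: 'D' digit, 'L' letter, else the literal character.
    This code alphabet is a hypothetical simplification."""
    codes = []
    for ch in text:
        if ch.isdigit():
            codes.append("D")
        elif ch.isalpha():
            codes.append("L")
        else:
            codes.append(ch)
    return codes


def lcs(a, b):
    """Longest common subsequence of two lists, by standard dynamic programming."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def codes_to_regex(codes):
    """Turn codes into a pattern, collapsing runs of one class code into a single '+' token."""
    fragment = {"D": r"\d", "L": r"[A-Za-z]"}
    parts, prev = [], None
    for code in codes:
        if code in fragment:
            if code != prev:
                parts.append(fragment[code] + "+")
        else:
            parts.append(re.escape(code))
        prev = code
    return "".join(parts)


def generate_regex(examples):
    """Fold the LCS pairwise over all examples (an approximation of a true multi-sequence LCS)."""
    common = reduce(lcs, (to_codes(e) for e in examples))
    return codes_to_regex(common)


pattern = generate_regex(["2020-06-17", "1999-1-5"])
print(pattern)
```

In this sketch, codes that are not common to every example are simply dropped, so the resulting pattern is only guaranteed to be useful when the examples share the same overall structure, as the two date strings do here.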
-
Patent number: 11755630
Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Type: Grant
Filed: April 1, 2022
Date of Patent: September 12, 2023
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Patent number: 11704321
Abstract: The present disclosure relates to techniques for analyzing data from multiple different data sources to determine a relationship between the data (also referred to herein as “data relationship discovery”). The relationships between any two compared datasets may be used to determine one or more recommendations for merging (e.g., joining), or “blending,” the datasets together. Relationship discovery may include determining a relationship between a subset of data, such as a relationship between a pair of columns, or column pair, each column in a different dataset of the datasets that are compared. Given two datasets to process for relationship discovery, relationship discovery may identify and recommend a ranked subset of column pairs between two compared datasets. The ranked column pairs identified as a relationship may be useful for blending the datasets with respect to those column pairs.
Type: Grant
Filed: March 23, 2020
Date of Patent: July 18, 2023
Assignee: Oracle International Corporation
Inventors: Robert James Oberbreckling, Luis E. Rivas, Michael Malak, Glenn Allen Murray
-
Patent number: 11694029
Abstract: Techniques are provided for identifying attributes associated with a neologism or an unknown word or name. Real world characteristics can be predicted for the neologism. Trigrams are identified for an input word and word embedding model vector values are calculated for the identified trigrams and entered into a matrix. Trigrams are identified for nearest names. Classification values are calculated based on the trigrams for the input word and the trigrams from the nearest names and the classification values are entered into the matrix. A convolutional neural network can process the matrix to identify one or more characteristics associated with the neologism.
Type: Grant
Filed: August 4, 2020
Date of Patent: July 4, 2023
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark Lee Kreider
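The first step named in this abstract, identifying trigrams for an input word, could look roughly like the sketch below. It assumes character-level trigrams with a boundary marker and lowercasing, none of which the abstract specifies; `char_trigrams` and the `#` padding symbol are hypothetical names chosen for illustration. The embedding lookup and the convolutional network are out of scope here.

```python
def char_trigrams(word, pad="#"):
    """Return overlapping three-character windows of a word.

    Assumption: the word is lowercased and wrapped in one boundary
    marker on each side so that prefixes and suffixes get their own trigrams.
    """
    padded = f"{pad}{word.lower()}{pad}"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


print(char_trigrams("Malak"))
```

Each trigram would then be mapped to a vector and stacked as a row of the matrix that the abstract describes.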
-
Publication number: 20230057706
Abstract: In accordance with an embodiment, described herein is a system and method for use of text analytics to transform, analyze, and visualize data, including support for data flows of unstructured text or other types of textual data input. Additionally described are various examples of algorithmic processes and user interfaces that can be used to enable text analytics in particular environments or use cases. In accordance with an embodiment, the system can be implemented within a cloud environment that enables self-service text analytics. A user, for example an organizational business user who may not be expert in the use of machine learning as applied to data processing, can interact with the system via a user interface, to apply natural language processing or other text analysis techniques to a data flow or set of input data, to generate visualizations or other types of useful information associated with the data.
Type: Application
Filed: August 20, 2021
Publication date: February 23, 2023
Inventors: MICHAEL MALAK, MANISHA GUPTA, NIKHIL SURVE, CHAOHUI YU, LUIS E. RIVAS, LUIS RAMIREZ, DOUGLAS SAVOLAINEN
-
Patent number: 11580166
Abstract: Techniques for generating regular expressions are disclosed. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence. Alignment of span data structures may be performed when generating the regular expression.
Type: Grant
Filed: June 17, 2020
Date of Patent: February 14, 2023
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Publication number: 20220261426
Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Type: Application
Filed: April 1, 2022
Publication date: August 18, 2022
Applicant: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Patent number: 11417131
Abstract: Techniques are provided for performing sentiment analysis on words in a first data set. An example embodiment includes generating a word embedding model including a first plurality of features. A value indicating sentiment for the words in the first data set can be determined using a convolutional neural network (CNN). A second plurality of features are generated based on bigrams identified in the data set. The bigrams can be generated using a co-occurrence graph. The model is updated to include the second plurality of features, and sentiment analysis can be performed on a second data set using the updated model.
Type: Grant
Filed: August 28, 2020
Date of Patent: August 16, 2022
Assignee: Oracle International Corporation
Inventors: Michael Malak, Mark L. Kreider
-
Patent number: 11379506
Abstract: The present disclosure relates to performing similarity metric analysis and data enrichment using knowledge sources. A data enrichment service can compare an input data set to reference data sets stored in a knowledge source to identify similarly related data. A similarity metric can be calculated corresponding to the semantic similarity of two or more datasets. The similarity metric can be used to identify datasets based on their metadata attributes and data values enabling easier indexing and high performance retrieval of data values. An input data set can be labeled with a category based on the data set having the best match with the input data set. The similarity of an input data set with a data set provided by a knowledge source can be used to query a knowledge source to obtain additional information about the data set. The additional information can be used to provide recommendations to the user.
Type: Grant
Filed: December 31, 2018
Date of Patent: July 5, 2022
Assignee: Oracle International Corporation
Inventors: Alexander Sasha Stojanovic, Mark Kreider, Michael Malak, Glenn Allen Murray
-
Patent number: 11354305
Abstract: Techniques for generating regular expressions are disclosed. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence. Generation of the regular expressions can be implemented on an interactive user interface. Commands can be applied to the one or more character sequences and regular expressions are generated based on the applied commands.
Type: Grant
Filed: June 17, 2020
Date of Patent: June 7, 2022
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Patent number: 11347779
Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Type: Grant
Filed: June 11, 2019
Date of Patent: May 31, 2022
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Patent number: 11321368
Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Type: Grant
Filed: June 11, 2019
Date of Patent: May 3, 2022
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Patent number: 11269934
Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Type: Grant
Filed: June 11, 2019
Date of Patent: March 8, 2022
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Patent number: 11263247
Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Type: Grant
Filed: June 11, 2019
Date of Patent: March 1, 2022
Assignee: Oracle International Corporation
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
-
Patent number: 11120086
Abstract: Techniques are disclosed for toponym disambiguation. Toponym disambiguation can be performed for a set of geographic location data, such as placenames. A subset of the data and additional location information associated with the subset of the data can be initially determined. The remaining geographic location data in the set of geographic location data can be scored in order to determine additional location information for the remaining geographic location data. Additional location information for the remaining geographic location data can be determined based on calculated scores.
Type: Grant
Filed: February 11, 2019
Date of Patent: September 14, 2021
Assignee: Oracle International Corporation
Inventors: Luis E. Rivas, Michael Malak, Mark L. Kreider
-
Publication number: 20210056264
Abstract: Techniques are provided for identifying attributes associated with a neologism or an unknown word or name. Real world characteristics can be predicted for the neologism. Trigrams are identified for an input word and word embedding model vector values are calculated for the identified trigrams and entered into a matrix. Trigrams are identified for nearest names. Classification values are calculated based on the trigrams for the input word and the trigrams from the nearest names and the classification values are entered into the matrix. A convolutional neural network can process the matrix to identify one or more characteristics associated with the neologism.
Type: Application
Filed: August 4, 2020
Publication date: February 25, 2021
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventors: Michael Malak, Luis E. Rivas, Mark Lee Kreider
-
Patent number: 10885056
Abstract: Techniques are disclosed for standardization of data. According to a first technique, standard representation terms are determined for to-be-standardized data using the to-be-standardized data itself and without using any external reference data. According to a second technique, a combination of the to-be-standardized data and an external reference is used to determine standard representation terms for the to-be-standardized data.
Type: Grant
Filed: September 25, 2018
Date of Patent: January 5, 2021
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider, Philip Ogren, Robert James Oberbreckling
-
Publication number: 20200394478
Abstract: Techniques are provided for performing sentiment analysis on words in a first data set. An example embodiment includes generating a word embedding model including a first plurality of features. A value indicating sentiment for the words in the first data set can be determined using a convolutional neural network (CNN). A second plurality of features are generated based on bigrams identified in the data set. The bigrams can be generated using a co-occurrence graph. The model is updated to include the second plurality of features, and sentiment analysis can be performed on a second data set using the updated model.
Type: Application
Filed: August 28, 2020
Publication date: December 17, 2020
Applicant: Oracle International Corporation
Inventors: Michael Malak, Mark L. Kreider
-
Patent number: 10810472
Abstract: Techniques are provided for performing sentiment analysis on words in a first data set. An example embodiment includes generating a word embedding model including a first plurality of features. A value indicating sentiment for the words in the first data set can be determined using a convolutional neural network (CNN). A second plurality of features are generated based on bigrams identified in the data set. The bigrams can be generated using a co-occurrence graph. The model is updated to include the second plurality of features, and sentiment analysis can be performed on a second data set using the updated model.
Type: Grant
Filed: May 10, 2018
Date of Patent: October 20, 2020
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Michael Malak, Mark L. Kreider