Patents by Inventor Michael Malak

Michael Malak has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11941018
    Abstract: Techniques for generating regular expressions are disclosed. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence. A negative example may be used to generate the regular expression. Context from the negative example may be determined in order to generate the regular expression.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: March 26, 2024
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11797582
    Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: October 24, 2023
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11755630
    Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
    Type: Grant
    Filed: April 1, 2022
    Date of Patent: September 12, 2023
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11704321
    Abstract: The present disclosure relates to techniques for analyzing data from multiple different data sources to determine a relationship between the data (also referred to herein as “data relationship discovery”). The relationships between any two compared datasets may be used to determine one or more recommendations for merging (e.g., joining), or “blending,” the datasets together. Relationship discovery may include determining a relationship between a subset of data, such as a relationship between a pair of columns, or column pair, each column in a different dataset of the datasets that are compared. Given two datasets to process for relationship discovery, relationship discovery may identify and recommend a ranked subset of column pairs between the two compared datasets. The ranked column pairs identified as a relationship may be useful for blending the datasets with respect to those column pairs.
    Type: Grant
    Filed: March 23, 2020
    Date of Patent: July 18, 2023
    Assignee: Oracle International Corporation
    Inventors: Robert James Oberbreckling, Luis E. Rivas, Michael Malak, Glenn Allen Murray
  • Patent number: 11694029
    Abstract: Techniques are provided for identifying attributes associated with a neologism or an unknown word or name. Real-world characteristics can be predicted for the neologism. Trigrams are identified for an input word, and word embedding model vector values are calculated for the identified trigrams and entered into a matrix. Trigrams are identified for nearest names. Classification values are calculated based on the trigrams for the input word and the trigrams from the nearest names, and the classification values are entered into the matrix. A convolutional neural network can process the matrix to identify one or more characteristics associated with the neologism.
    Type: Grant
    Filed: August 4, 2020
    Date of Patent: July 4, 2023
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark Lee Kreider
  • Publication number: 20230057706
    Abstract: In accordance with an embodiment, described herein is a system and method for use of text analytics to transform, analyze, and visualize data, including support for data flows of unstructured text or other types of textual data input. Additionally described are various examples of algorithmic processes and user interfaces that can be used to enable text analytics in particular environments or use cases. In accordance with an embodiment, the system can be implemented within a cloud environment that enables self-service text analytics. A user, for example an organizational business user who may not be expert in the use of machine learning as applied to data processing, can interact with the system via a user interface, to apply natural language processing or other text analysis techniques to a data flow or set of input data, to generate visualizations or other types of useful information associated with the data.
    Type: Application
    Filed: August 20, 2021
    Publication date: February 23, 2023
    Inventors: Michael Malak, Manisha Gupta, Nikhil Surve, Chaohui Yu, Luis E. Rivas, Luis Ramirez, Douglas Savolainen
  • Patent number: 11580166
    Abstract: Techniques for generating regular expressions are disclosed. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence. Alignment of span data structures may be performed when generating the regular expression.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: February 14, 2023
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Publication number: 20220261426
    Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
    Type: Application
    Filed: April 1, 2022
    Publication date: August 18, 2022
    Applicant: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11417131
    Abstract: Techniques are provided for performing sentiment analysis on words in a first data set. An example embodiment includes generating a word embedding model including a first plurality of features. A value indicating sentiment for the words in the first data set can be determined using a convolutional neural network (CNN). A second plurality of features are generated based on bigrams identified in the data set. The bigrams can be generated using a co-occurrence graph. The model is updated to include the second plurality of features, and sentiment analysis can be performed on a second data set using the updated model.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: August 16, 2022
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Mark L. Kreider
  • Patent number: 11379506
    Abstract: The present disclosure relates to performing similarity metric analysis and data enrichment using knowledge sources. A data enrichment service can compare an input data set to reference data sets stored in a knowledge source to identify similarly related data. A similarity metric can be calculated corresponding to the semantic similarity of two or more data sets. The similarity metric can be used to identify data sets based on their metadata attributes and data values, enabling easier indexing and high-performance retrieval of data values. An input data set can be labeled with a category based on the data set having the best match with the input data set. The similarity of an input data set with a data set provided by a knowledge source can be used to query a knowledge source to obtain additional information about the data set. The additional information can be used to provide recommendations to the user.
    Type: Grant
    Filed: December 31, 2018
    Date of Patent: July 5, 2022
    Assignee: Oracle International Corporation
    Inventors: Alexander Sasha Stojanovic, Mark Kreider, Michael Malak, Glenn Allen Murray
  • Patent number: 11354305
    Abstract: Techniques for generating regular expressions are disclosed. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence. Generation of the regular expressions can be implemented on an interactive user interface. Commands can be applied to the one or more character sequences, and regular expressions are generated based on the applied commands.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: June 7, 2022
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11347779
    Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: May 31, 2022
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11321368
    Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: May 3, 2022
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11269934
    Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: March 8, 2022
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11263247
    Abstract: Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: March 1, 2022
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider
  • Patent number: 11120086
    Abstract: Techniques are disclosed for toponym disambiguation. Toponym disambiguation can be performed for a set of geographic location data, such as place names. A subset of the data and additional location information associated with the subset of the data can be initially determined. The remaining geographic location data in the set of geographic location data can be scored in order to determine additional location information for the remaining geographic location data. Additional location information for the remaining geographic location data can be determined based on calculated scores.
    Type: Grant
    Filed: February 11, 2019
    Date of Patent: September 14, 2021
    Assignee: Oracle International Corporation
    Inventors: Luis E. Rivas, Michael Malak, Mark L. Kreider
  • Publication number: 20210056264
    Abstract: Techniques are provided for identifying attributes associated with a neologism or an unknown word or name. Real-world characteristics can be predicted for the neologism. Trigrams are identified for an input word, and word embedding model vector values are calculated for the identified trigrams and entered into a matrix. Trigrams are identified for nearest names. Classification values are calculated based on the trigrams for the input word and the trigrams from the nearest names, and the classification values are entered into the matrix. A convolutional neural network can process the matrix to identify one or more characteristics associated with the neologism.
    Type: Application
    Filed: August 4, 2020
    Publication date: February 25, 2021
    Applicant: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark Lee Kreider
  • Patent number: 10885056
    Abstract: Techniques are disclosed for standardization of data. According to a first technique, standard representation terms are determined for to-be-standardized data using the to-be-standardized data itself and without using any external reference data. According to a second technique, a combination of the to-be-standardized data and an external reference is used to determine standard representation terms for the to-be-standardized data.
    Type: Grant
    Filed: September 25, 2018
    Date of Patent: January 5, 2021
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Luis E. Rivas, Mark L. Kreider, Philip Ogren, Robert James Oberbreckling
  • Publication number: 20200394478
    Abstract: Techniques are provided for performing sentiment analysis on words in a first data set. An example embodiment includes generating a word embedding model including a first plurality of features. A value indicating sentiment for the words in the first data set can be determined using a convolutional neural network (CNN). A second plurality of features are generated based on bigrams identified in the data set. The bigrams can be generated using a co-occurrence graph. The model is updated to include the second plurality of features, and sentiment analysis can be performed on a second data set using the updated model.
    Type: Application
    Filed: August 28, 2020
    Publication date: December 17, 2020
    Applicant: Oracle International Corporation
    Inventors: Michael Malak, Mark L. Kreider
  • Patent number: 10810472
    Abstract: Techniques are provided for performing sentiment analysis on words in a first data set. An example embodiment includes generating a word embedding model including a first plurality of features. A value indicating sentiment for the words in the first data set can be determined using a convolutional neural network (CNN). A second plurality of features are generated based on bigrams identified in the data set. The bigrams can be generated using a co-occurrence graph. The model is updated to include the second plurality of features, and sentiment analysis can be performed on a second data set using the updated model.
    Type: Grant
    Filed: May 10, 2018
    Date of Patent: October 20, 2020
    Assignee: Oracle International Corporation
    Inventors: Michael Malak, Mark L. Kreider
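
Several of the abstracts above describe converting character sequences into codes and then finding a longest common subsequence shared by those code sequences. The sketch below is only an illustration of that general idea, not the patented method: the coding scheme (`N` for digits, `L` for letters, other characters kept as-is) and the function names `to_codes` and `lcs` are assumptions made up for this example, and the step of turning the result into a regular expression is left out.

```python
from typing import List


def to_codes(text: str) -> List[str]:
    """Map each character to a coarse class code (hypothetical scheme)."""
    codes = []
    for ch in text:
        if ch.isdigit():
            codes.append("N")  # any digit
        elif ch.isalpha():
            codes.append("L")  # any letter
        else:
            codes.append(ch)   # punctuation and whitespace stay literal
    return codes


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence via the standard dynamic-programming table."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


if __name__ == "__main__":
    shared = lcs(to_codes("12-ab"), to_codes("3-cde"))
    print(shared)
```

Working on class codes rather than raw characters is what lets two inputs with different digits and letters still share a long common subsequence; a generator could then map each shared code back to a regular expression token.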