Patents by Inventor Youngja Park

Youngja Park has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240152698
    Abstract: An enhanced system and method are provided for data-driven named entity type disambiguation. The system and a non-limiting computer-implemented method provide named-entity type disambiguation by receiving an unstructured document and analyzing the document using a set of Named Entity Recognition (NER) annotators, each generating annotated entities. For each respective annotated entity, an Entity Disambiguation Module resolves a target entity type when a mention was assigned multiple entity types by different NER annotators, leveraging domain knowledge to form a set of first resolved entities. An Annotation Ranker associates a computed score to each entity in the set of first resolved entities using information in a knowledge base.
    Type: Application
    Filed: November 9, 2022
    Publication date: May 9, 2024
    Inventors: Mohammed Fahd Alhamid, Stefano Braghin, Jing Xin Duan, Mokhtar Kandil, Youngja Park, Micha Gideon Moffie
  • Patent number: 11977625
    Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
    Type: Grant
    Filed: May 12, 2023
    Date of Patent: May 7, 2024
    Assignee: International Business Machines Corporation
    Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang
  • Publication number: 20230418859
    Abstract: A method, computer system, and a computer program product for data processing, comprising obtaining a plurality of files from a data source. These files are analyzed for information about their content and to determine structural information of each file. Once the files have been analyzed, information in each file may be sorted and categorized by common content. Sensitive information may also be extracted and categorized separately. Information may then be merged using the categories to create a single unified file.
    Type: Application
    Filed: June 27, 2022
    Publication date: December 28, 2023
    Inventors: Youngja Park, Mohammed Fahd Alhamid, Stefano Braghin, Jing Xin Duan, Mokhtar Kandil, Michael Vu Le, Killian Levacher, Micha Gideon Moffie, Ian Michael Molloy, Walid Rjaibi, Ariel Farkash
  • Publication number: 20230319090
    Abstract: An automated method for processing security events. It begins by building an initial version of a knowledge graph based on security information received from structured data sources. Using entities identified in the initial version, additional security information is then received. The additional information is extracted from one or more unstructured data sources. The additional information includes text in which the entities (from the structured data sources) appear. The text is processed to extract relationships involving the entities (from the structured data sources) to generate entities and relationships extracted from the unstructured data sources. The initial version of the knowledge graph is then augmented with the entities and relationships extracted from the unstructured data sources to build a new version of the knowledge graph that consolidates the intelligence received from the structured data sources and the unstructured data sources. The new version is then used to process security event data.
    Type: Application
    Filed: June 5, 2023
    Publication date: October 5, 2023
    Inventors: Youngja Park, Jiyong Jang, Dhilung Hang Kirat, Josyula R. Rao, Marc Philippe Stoecklin
  • Patent number: 11755839
    Abstract: Natural language processing (NLP) methodologies and mechanisms are provided that include a named entity recognition (NER) computer model augmented to operate on an entity pattern embedding input feature in addition to other embedding input features. The mechanisms tokenize natural language content (NLC) to generate tokens and process a selected token in accordance with a predetermined entity pattern embedding technique to generate an entity pattern embedding input feature for the selected token. The entity pattern embedding input feature specifies a pattern of characters present in the selected token. The mechanisms process the NLC to generate the other embedding input features in accordance with other embedding techniques, and process, by the NER computer model, the other embedding input features and the entity pattern embedding input feature for the selected token to generate a predicted tag for the selected token. The predicted tag specifies a named entity type classification for the selected token.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: September 12, 2023
    Assignee: International Business Machines Corporation
    Inventors: Youngja Park, Jatin Arora
  • Publication number: 20230281298
    Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
    Type: Application
    Filed: May 12, 2023
    Publication date: September 7, 2023
    Applicant: International Business Machines Corporation
    Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang
  • Patent number: 11675896
    Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: June 13, 2023
    Assignee: International Business Machines Corporation
    Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang
  • Publication number: 20220374602
    Abstract: Natural language processing (NLP) methodologies and mechanisms are provided that include a named entity recognition (NER) computer model augmented to operate on an entity pattern embedding input feature in addition to other embedding input features. The mechanisms tokenize natural language content (NLC) to generate tokens and process a selected token in accordance with a predetermined entity pattern embedding technique to generate an entity pattern embedding input feature for the selected token. The entity pattern embedding input feature specifies a pattern of characters present in the selected token. The mechanisms process the NLC to generate the other embedding input features in accordance with other embedding techniques, and process, by the NER computer model, the other embedding input features and the entity pattern embedding input feature for the selected token to generate a predicted tag for the selected token. The predicted tag specifies a named entity type classification for the selected token.
    Type: Application
    Filed: May 19, 2021
    Publication date: November 24, 2022
    Inventors: Youngja Park, Jatin Arora
  • Patent number: 11494496
    Abstract: Mechanisms are provided to determine a susceptibility of a trained machine learning model to a cybersecurity threat. The mechanisms execute a trained machine learning model on a test dataset to generate test results output data, and determine an overfit measure of the trained machine learning model based on the generated test results output data. The overfit measure quantifies an amount of overfitting of the trained machine learning model to a specific sub-portion of the test dataset. The mechanisms apply analytics to the overfit measure to determine a susceptibility probability that indicates a likelihood that the trained machine learning model is susceptible to a cybersecurity threat based on the determined amount of overfitting of the trained machine learning model. The mechanisms perform a corrective action based on the determined susceptibility probability.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: November 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Kathrin Grosse, Taesung Lee, Youngja Park, Ian Michael Molloy
  • Publication number: 20220188390
    Abstract: A behavioral biometrics deep learning (BBDL) pipeline is provided, comprising a plurality of stages of machine learning computer models that operate to provide a behavioral biometric based authenticator operating based on spatiotemporal input data. The BBDL pipeline receives spatiotemporal input data over a plurality of time intervals, each time interval having a corresponding subset of the spatiotemporal input data. For each time interval, machine learning computer model(s) of a corresponding stage process a subset of the spatiotemporal input data corresponding to the time interval to generate an output vector having values indicative of an internal representation of spatiotemporal traits of the entity. Output vectors are accumulated across the plurality of stages of the BBDL pipeline to generate a final output vector indicative of the spatiotemporal traits of the entity represented in the spatiotemporal input data. The entity is authenticated based on the final output vector.
    Type: Application
    Filed: December 16, 2020
    Publication date: June 16, 2022
    Inventors: Taesung Lee, Ian Michael Molloy, Youngja Park
  • Publication number: 20220121995
    Abstract: A method of forming an anomaly detection monitor includes obtaining data samples of operations performed on an application by a plurality of users and detecting, by a processor, anomalous behavior associated with a target user of the plurality of users with respect to the application based on a portion of the data samples associated with the target user and a portion of the data samples associated with a second user of the plurality of users, different from the target user.
    Type: Application
    Filed: December 30, 2021
    Publication date: April 21, 2022
    Inventors: Suresh N. Chari, Ian Michael Molloy, Youngja Park
  • Patent number: 11227232
    Abstract: A method for anomaly detection on a system or application used by a plurality of users includes providing an access to a memory device storing user data samples of a usage of the system or application for all users of the plurality of users. A target user is selected from among the plurality of users, using a processor on a computer, with data samples of the target user forming a cluster of data points in a data space. The data samples for the target user are used to generate a normal sample data set as training data set for training a model for an anomaly detection monitor for the target user. A local outlier factor (LOF) function is used to generate an abnormal sample data set for training the anomaly detection monitor for the target user.
    Type: Grant
    Filed: October 3, 2018
    Date of Patent: January 18, 2022
    Assignee: Arkose Labs Holdings, Inc.
    Inventors: Suresh N. Chari, Ian Michael Molloy, Youngja Park
  • Patent number: 11182562
    Abstract: Mechanisms are provided to perform embedding of content of a natural language document. The mechanisms receive a document data object of an electronic document and analyze a structure of the electronic document to identify one or more structural document elements that have a relationship with the document data object. A dependency data structure is generated, representing the electronic document, where edges define relationships between document elements and at least one edge represents at least one relationship between the one or more structural document elements and the document data object. The mechanisms embed the document data object based on the at least one relationship to thereby represent the document data object as a vector data structure. The mechanisms perform natural language processing on the portion of natural language content based on the vector data structure. The one or more structural document elements are non-local non-contiguous with the document data object.
    Type: Grant
    Filed: August 12, 2019
    Date of Patent: November 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Taesung Lee, Youngja Park
  • Patent number: 11176456
    Abstract: Aspects of the present invention disclose a method, computer program product, and system for pre-training a neural network. The method includes extracting features of a data set received from a source, where the data set includes labeled data and unlabeled data. The method further includes generating a plurality of data clusters from instances of data in the data set, where the data clusters are weighted according to a respective number of similar instances of labeled data and unlabeled data within a respective data cluster. The method further includes determining a data label indicating a data class that corresponds to labeled data within a data cluster of the generated plurality of data clusters, and applying the determined data label to unlabeled data within that data cluster. In response to applying the determined data label to the unlabeled data within the data cluster, the method includes deploying the data cluster to a neural network.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: November 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kyusong Lee, Youngja Park
  • Patent number: 11171946
    Abstract: Managing passwords is provided. A machine training process is performed using a set of existing passwords to train a machine learning component. Members of a set of semantic categories are used to categorize respective passwords in the set of existing passwords. Password strengths corresponding to a set of candidate passwords are evaluated using the machine learning component. A resource is secured with a candidate password having a password strength greater than or equal to a defined password strength threshold level.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: November 9, 2021
    Assignee: International Business Machines Corporation
    Inventors: Suresh Chari, Taesung Lee, Ian Michael Molloy, Youngja Park
  • Patent number: 11159547
    Abstract: A computer system extracts features of documents that mention malware programs to determine textual features that correspond to individual ones of the malware programs. The computer system performs analysis of samples of malware programs to determine features corresponding to the samples. The computer system performs clustering using the textual features and using the features that correspond to the samples of the malware programs. The clustering creates clusters of data points, each data point corresponding to an individual one of the malware programs. The clusters contain data points considered by the clustering to be similar. The computer system outputs indications of the clusters to allow determination of whether data points in the clusters correspond to individual ones of specific malwares. Apparatus, methods, and computer program products are disclosed.
    Type: Grant
    Filed: August 3, 2017
    Date of Patent: October 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Suresh Chari, Heqing Huang, Taesung Lee, Youngja Park
  • Publication number: 20210319093
    Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
    Type: Application
    Filed: April 9, 2020
    Publication date: October 14, 2021
    Applicant: International Business Machines Corporation
    Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang
  • Patent number: 11144718
    Abstract: In configuring a processing system with an application made up of machine learning components, where the application has been trained on a set of training data, the application is executed on the processing system using another set of training data. Outputs of the application produced from the other set of training data that concur with ground truth data are identified. The components are adapted to produce outputs of the application that concur with the ground truth data using the identified outputs of the application.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: October 12, 2021
    Assignee: International Business Machines Corporation
    Inventors: Youngja Park, Siddharth A. Patwardhan
  • Publication number: 20210303695
    Abstract: Mechanisms are provided to determine a susceptibility of a trained machine learning model to a cybersecurity threat. The mechanisms execute a trained machine learning model on a test dataset to generate test results output data, and determine an overfit measure of the trained machine learning model based on the generated test results output data. The overfit measure quantifies an amount of overfitting of the trained machine learning model to a specific sub-portion of the test dataset. The mechanisms apply analytics to the overfit measure to determine a susceptibility probability that indicates a likelihood that the trained machine learning model is susceptible to a cybersecurity threat based on the determined amount of overfitting of the trained machine learning model. The mechanisms perform a corrective action based on the determined susceptibility probability.
    Type: Application
    Filed: March 30, 2020
    Publication date: September 30, 2021
    Inventors: Kathrin Grosse, Taesung Lee, Youngja Park, Ian Michael Molloy
  • Patent number: 11132507
    Abstract: A first vector representation of a first word within a first narrative text and a machine-generated label corresponding to the first word are constructed. Using the first vector representation, an annotator model is trained. The annotator model is configured to produce a set of probabilities, each probability in the set of probabilities representing a probable output annotation corresponding to a word within a narrative text. The training includes minimizing a difference between a first human-generated label corresponding to the first word and a first probable output annotation corresponding to the first word. Using the trained annotator model and a second narrative text, second training data is generated. The trained annotator model is configured to produce an output annotation corresponding to a word within a narrative text. The second training data is usable to train a relation extraction model.
    Type: Grant
    Filed: April 2, 2019
    Date of Patent: September 28, 2021
    Assignee: International Business Machines Corporation
    Inventors: Youngja Park, Taesung Lee, Arpita Roy
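
Illustrative note: the abstracts for patent 11755839 and publication 20220374602 above describe an entity pattern input feature that "specifies a pattern of characters present in the selected token." The abstracts do not say how that pattern is computed. The sketch below assumes one common choice, a word-shape encoding in which uppercase letters map to "A", lowercase letters to "a", digits to "9", and all other characters are kept as-is. The function names `char_pattern` and `compressed_pattern` are hypothetical and are not taken from the patents, and the step that turns the pattern string into an embedding vector is not shown.

```python
def char_pattern(token: str) -> str:
    """Map each character of a token to a coarse character class.

    Assumed scheme (not specified in the abstracts):
    uppercase -> "A", lowercase -> "a", digit -> "9", anything else unchanged.
    """
    classes = []
    for ch in token:
        if ch.isupper():
            classes.append("A")
        elif ch.islower():
            classes.append("a")
        elif ch.isdigit():
            classes.append("9")
        else:
            classes.append(ch)
    return "".join(classes)


def compressed_pattern(token: str) -> str:
    """Collapse runs of the same class so tokens of different lengths share a pattern."""
    compressed = []
    for symbol in char_pattern(token):
        if not compressed or compressed[-1] != symbol:
            compressed.append(symbol)
    return "".join(compressed)


if __name__ == "__main__":
    for tok in ["CVE-2021-44228", "Youngja", "192.168.0.1"]:
        print(tok, char_pattern(tok), compressed_pattern(tok))
```

Under this assumed scheme, tokens that look alike structurally (for example, identifiers with the same letter-dash-digit layout) produce the same compressed pattern, which a model could then look up in an embedding table alongside its other input features.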