Patents by Inventor Youngja Park

Youngja Park has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Data-driven named entity type disambiguation

Publication number: 20240152698

Abstract: An enhanced system and method are provided for data-driven named entity type disambiguation. In one or more disclosed embodiments, a system and a non-limiting computer-implemented method provide named-entity type disambiguation by receiving an unstructured document and analyzing the document using a set of Named Entity Recognition (NER) annotators, each of which generates annotated entities. For each annotated entity, when a mention was assigned multiple entity types by different NER annotators, an Entity Disambiguation Module resolves a target entity type by leveraging domain knowledge, forming a set of first resolved entities. An Annotation Ranker associates a computed score with each entity in the set of first resolved entities using information in a knowledge base.
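The two stages can be illustrated with a minimal Python sketch. It assumes the "domain knowledge" is a hypothetical priority list of entity types used to break ties in a majority vote across annotators, and that the knowledge base is a plain set of known (mention, type) pairs. The names `DOMAIN_PRIORITY`, `resolve_type`, and `rank`, and the 1.0/0.5 scores, are illustrative only and do not come from the application.

```python
from collections import Counter

# Hypothetical domain knowledge: entity types ordered from most to least
# preferred when NER annotators disagree on a mention.
DOMAIN_PRIORITY = ["ORGANIZATION", "PERSON", "LOCATION"]


def resolve_type(types: list[str], priority: list[str] = DOMAIN_PRIORITY) -> str:
    """Resolve one target type for a mention labeled by several annotators.

    Majority vote first; ties are broken by the domain priority list, and
    types missing from that list rank last.
    """
    counts = Counter(types)
    best = max(counts.values())
    tied = [t for t, n in counts.items() if n == best]
    return min(tied, key=lambda t: priority.index(t) if t in priority else len(priority))


def rank(resolved: dict[str, str], kb: set[tuple[str, str]]) -> dict[str, float]:
    """Hypothetical ranker: 1.0 if the knowledge base confirms the pair, else 0.5."""
    return {mention: 1.0 if (mention, t) in kb else 0.5 for mention, t in resolved.items()}


annotations = {
    "Jordan": ["PERSON", "LOCATION"],  # two annotators disagree (a tie)
    "Acme": ["ORGANIZATION", "ORGANIZATION", "PERSON"],
}
resolved = {m: resolve_type(ts) for m, ts in annotations.items()}
scores = rank(resolved, kb={("Acme", "ORGANIZATION")})
print(resolved)  # {'Jordan': 'PERSON', 'Acme': 'ORGANIZATION'}
print(scores)    # {'Jordan': 0.5, 'Acme': 1.0}
```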

Type: Application

Filed: November 9, 2022

Publication date: May 9, 2024

Inventors: Mohammed Fahd Alhamid, Stefano Braghin, Jing Xin Duan, Mokhtar Kandil, Youngja Park, Micha Gideon Moffie

Using multimodal model consistency to detect adversarial attacks

Patent number: 11977625

Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
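The core rejection test can be sketched in Python, assuming both modalities have already been encoded into a shared embedding space (for example by jointly trained image and text encoders) and that "sufficiently far away" means a cosine distance above a hypothetical threshold of 0.5. The function names and vectors are invented for illustration.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def is_adversarial(attacked_feats, reference_feats, threshold=0.5) -> bool:
    """Reject a sample whose features in one modality disagree with another modality.

    Assumes both vectors live in a shared embedding space (an assumption of
    this sketch, not a detail taken from the patent).
    """
    return cosine_distance(attacked_feats, reference_feats) > threshold


image_vec = [0.9, 0.1, 0.0]
caption_vec = [0.8, 0.2, 0.1]
perturbed_image_vec = [0.0, 0.1, 0.9]  # image features pushed away by an attack
print(is_adversarial(image_vec, caption_vec))            # False
print(is_adversarial(perturbed_image_vec, caption_vec))  # True
```

The un-attacked caption acts as the anchor: an attacker who perturbs only the image moves its features away from the caption's, which is what the distance check detects.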

Type: Grant

Filed: May 12, 2023

Date of Patent: May 7, 2024

Assignee: International Business Machines Corporation

Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang

Unified data classification techniques

Publication number: 20230418859

Abstract: A method, computer system, and computer program product for data processing, comprising obtaining a plurality of files from a data source. The files are analyzed for information about their content and to determine structural information of each file. Once the files have been analyzed, information in each file may be sorted and categorized by common content. Sensitive information may also be extracted and categorized separately. The information may then be merged using the categories to create a single unified file.
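A minimal sketch of the categorize, extract, and merge flow, assuming each file can be treated as lines of plain text (the structural analysis step is omitted), that categories come from a hypothetical keyword table, and that "sensitive information" means email addresses. `CATEGORY_KEYWORDS` and `unify` are illustrative names.

```python
import re
from collections import defaultdict

# Hypothetical category rules; a real system would learn or configure these.
CATEGORY_KEYWORDS = {
    "billing": ("invoice", "payment"),
    "support": ("ticket", "issue"),
}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def unify(files: dict[str, str]) -> dict[str, list[str]]:
    """Merge lines from many files into one structure keyed by category."""
    unified = defaultdict(list)
    for name, text in files.items():
        for line in text.splitlines():
            # Sensitive values (here: email addresses) get their own category.
            unified["sensitive"].extend(EMAIL.findall(line))
            lowered = line.lower()
            category = next(
                (c for c, kws in CATEGORY_KEYWORDS.items() if any(k in lowered for k in kws)),
                "other",
            )
            # Keep provenance and redact the sensitive value in the merged copy.
            unified[category].append(f"{name}: {EMAIL.sub('[REDACTED]', line)}")
    return dict(unified)


result = unify({
    "a.txt": "Invoice 42 sent to bob@example.com\nMisc note",
    "b.csv": "ticket 7, open issue",
})
print(result)
```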

Type: Application

Filed: June 27, 2022

Publication date: December 28, 2023

Inventors: Youngja Park, Mohammed Fahd Alhamid, Stefano Braghin, Jing Xin Duan, Mokhtar Kandil, Michael Vu Le, Killian Levacher, Micha Gideon Moffie, Ian Michael Molloy, Walid Rjaibi, Ariel Farkash

Consolidating structured and unstructured security and threat intelligence with knowledge graphs

Publication number: 20230319090

Abstract: An automated method for processing security events. It begins by building an initial version of a knowledge graph based on security information received from structured data sources. Using entities identified in the initial version, additional security information is then received. The additional information is extracted from one or more unstructured data sources. The additional information includes text in which the entities (from the structured data sources) appear. The text is processed to extract relationships involving the entities (from the structured data sources) to generate entities and relationships extracted from the unstructured data sources. The initial version of the knowledge graph is then augmented with the entities and relationships extracted from the unstructured data sources to build a new version of the knowledge graph that consolidates the intelligence received from the structured data sources and the unstructured data sources. The new version is then used to process security event data.
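The build, extract, and augment loop can be sketched with the graph held as a set of (subject, relation, object) triples. This assumes a hypothetical structured record format and replaces real relation extraction with a toy regular expression that only fires when a known entity from the initial graph appears in a sentence; `ExampleBot` and `FakeVuln-1` are made-up names.

```python
import re

Triple = tuple[str, str, str]
# Toy relation pattern: "<entity> <verb> <Capitalized name>".
RELATION = r"\s+(targets|uses|exploits)\s+([A-Z][\w-]*)"


def build_initial_graph(records: list[dict]) -> set[Triple]:
    """Initial knowledge graph from structured records such as {'malware': ..., 'uses': ...}."""
    return {(r["malware"], "uses", r["uses"]) for r in records}


def entities_of(graph: set[Triple]) -> set[str]:
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def extract_relations(text: str, entities: set[str]) -> set[Triple]:
    """Only sentences mentioning an entity already in the graph yield new triples."""
    triples = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for ent in entities:
            for verb, obj in re.findall(rf"\b{re.escape(ent)}{RELATION}", sentence):
                triples.add((ent, verb, obj))
    return triples


def related(entity: str, graph: set[Triple]) -> set[Triple]:
    """Graph context for an entity observed in a security event."""
    return {t for t in graph if entity in (t[0], t[2])}


initial = build_initial_graph([{"malware": "ExampleBot", "uses": "phishing"}])
report = ("ExampleBot targets Windows hosts. "
          "A vendor report says ExampleBot exploits FakeVuln-1 in mail clients.")
graph = initial | extract_relations(report, entities_of(initial))  # the new version
print(sorted(related("ExampleBot", graph)))
```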

Type: Application

Filed: June 5, 2023

Publication date: October 5, 2023

Inventors: Youngja Park, Jiyong Jang, Dhilung Hang Kirat, Josyula R. Rao, Marc Philippe Stoecklin

Low resource named entity recognition for sensitive personal information

Patent number: 11755839

Abstract: Natural language processing (NLP) methodologies and mechanisms are provided that include a named entity recognition (NER) computer model augmented to operate on an entity pattern embedding input feature in addition to other embedding input features. The mechanisms tokenize natural language content (NLC) to generate tokens and process a selected token in accordance with a predetermined entity pattern embedding technique to generate an entity pattern embedding input feature for the selected token. The entity pattern embedding input feature specifies a pattern of characters present in the selected token. The mechanisms process the NLC to generate the other embedding input features in accordance with other embedding techniques, and process, by the NER computer model, the other embedding input features and the entity pattern embedding input feature for the selected token to generate a predicted tag for the selected token. The predicted tag specifies a named entity type classification for the selected token.
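One plausible reading of an "entity pattern" is a character-shape string, so that a Social Security-style number and a capitalized name map to different patterns. The Python sketch below assumes that scheme (X for uppercase, x for lowercase, d for digit) and a hypothetical `PatternVocab` that assigns each pattern an integer id. In a full model, that id would index an embedding table whose output is combined with the other embedding features.

```python
def entity_pattern(token: str) -> str:
    """Map each character to a shape class: X upper, x lower, d digit; keep the rest."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


class PatternVocab:
    """Hypothetical vocabulary: one integer id per distinct pattern (0 = unknown)."""

    def __init__(self) -> None:
        self.ids = {"<unk>": 0}

    def index(self, token: str, grow: bool = True) -> int:
        pattern = entity_pattern(token)
        if grow and pattern not in self.ids:
            self.ids[pattern] = len(self.ids)
        return self.ids.get(pattern, 0)


vocab = PatternVocab()
for tok in ["Jane", "SSN", "555-12-3456", "123-45-6789"]:
    print(tok, entity_pattern(tok), vocab.index(tok))
```

Because both numbers share the pattern `ddd-dd-dddd`, they share an id, which is what lets the model generalize to unseen values in low-resource settings.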

Type: Grant

Filed: May 19, 2021

Date of Patent: September 12, 2023

Assignee: International Business Machines Corporation

Inventors: Youngja Park, Jatin Arora

Using multimodal model consistency to detect adversarial attacks

Publication number: 20230281298

Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.

Type: Application

Filed: May 12, 2023

Publication date: September 7, 2023

Applicant: International Business Machines Corporation

Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang

Using multimodal model consistency to detect adversarial attacks

Patent number: 11675896

Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.

Type: Grant

Filed: April 9, 2020

Date of Patent: June 13, 2023

Assignee: International Business Machines Corporation

Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang

Low resource named entity recognition for sensitive personal information

Publication number: 20220374602

Abstract: Natural language processing (NLP) methodologies and mechanisms are provided that include a named entity recognition (NER) computer model augmented to operate on an entity pattern embedding input feature in addition to other embedding input features. The mechanisms tokenize natural language content (NLC) to generate tokens and process a selected token in accordance with a predetermined entity pattern embedding technique to generate an entity pattern embedding input feature for the selected token. The entity pattern embedding input feature specifies a pattern of characters present in the selected token. The mechanisms process the NLC to generate the other embedding input features in accordance with other embedding techniques, and process, by the NER computer model, the other embedding input features and the entity pattern embedding input feature for the selected token to generate a predicted tag for the selected token. The predicted tag specifies a named entity type classification for the selected token.

Type: Application

Filed: May 19, 2021

Publication date: November 24, 2022

Inventors: Youngja Park, Jatin Arora

Measuring overfitting of machine learning computer model and susceptibility to security threats

Patent number: 11494496

Abstract: Mechanisms are provided to determine a susceptibility of a trained machine learning model to a cybersecurity threat. The mechanisms execute a trained machine learning model on a test dataset to generate test results output data, and determine an overfit measure of the trained machine learning model based on the generated test results output data. The overfit measure quantifies an amount of overfitting of the trained machine learning model to a specific sub-portion of the test dataset. The mechanisms apply analytics to the overfit measure to determine a susceptibility probability that indicates a likelihood that the trained machine learning model is susceptible to a cybersecurity threat based on the determined amount of overfitting of the trained machine learning model. The mechanisms perform a corrective action based on the determined susceptibility probability.
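A minimal sketch, assuming the overfit measure is the accuracy gap between a flagged sub-portion of the test set and the remainder, and that a hypothetical logistic function turns that gap into a susceptibility probability. `scale`, `midpoint`, and the corrective action printed at the end are assumptions, not values from the patent.

```python
import math


def overfit_measure(preds, labels, in_subset):
    """Accuracy on the flagged sub-portion minus accuracy on the rest of the test set."""
    def acc(pairs):
        return sum(p == y for p, y in pairs) / len(pairs)

    rows = list(zip(preds, labels, in_subset))
    sub = [(p, y) for p, y, s in rows if s]
    rest = [(p, y) for p, y, s in rows if not s]
    return acc(sub) - acc(rest)


def susceptibility(gap, scale=10.0, midpoint=0.2):
    """Hypothetical logistic mapping from the overfit gap to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-scale * (gap - midpoint)))


preds = [1, 1, 0, 0, 1, 0, 1, 0]
labels = [1, 1, 0, 0, 0, 1, 0, 1]
in_subset = [True] * 4 + [False] * 4  # perfect on the sub-portion, wrong elsewhere
gap = overfit_measure(preds, labels, in_subset)
prob = susceptibility(gap)
if prob > 0.5:
    # Hypothetical corrective action.
    print(f"gap={gap:.2f}, p={prob:.3f}: flag model for retraining with regularization")
```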

Type: Grant

Filed: March 30, 2020

Date of Patent: November 8, 2022

Assignee: International Business Machines Corporation

Inventors: Kathrin Grosse, Taesung Lee, Youngja Park, Ian Michael Molloy

Spatiotemporal deep learning for behavioral biometrics

Publication number: 20220188390

Abstract: A behavioral biometrics deep learning (BBDL) pipeline is provided, comprising a plurality of stages of machine learning computer models that operate to provide a behavioral biometric based authenticator operating based on spatiotemporal input data. The BBDL pipeline receives spatiotemporal input data over a plurality of time intervals, each time interval having a corresponding subset of the spatiotemporal input data. For each time interval, machine learning computer model(s) of a corresponding stage process a subset of the spatiotemporal input data corresponding to the time interval to generate an output vector having values indicative of an internal representation of spatiotemporal traits of the entity. Output vectors are accumulated across the plurality of stages of the BBDL pipeline to generate a final output vector indicative of the spatiotemporal traits of the entity represented in the spatiotemporal input data. The entity is authenticated based on the final output vector.
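A toy version of the per-interval stages and the accumulation step. It assumes spatiotemporal input as (x, y, t) touch samples, a hypothetical hand-written `stage` function standing in for each stage's trained model, accumulation by summing output vectors, and authentication by Euclidean distance to an enrolled template within a made-up tolerance.

```python
import math

Sample = tuple[float, float, float]  # (x, y, timestamp)


def stage(window: list[Sample]) -> list[float]:
    """Hypothetical stage model: summarize one interval as [mean x, mean y, mean speed].

    Assumes timestamps within a window are strictly increasing.
    """
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    speeds = [math.dist(a[:2], b[:2]) / (b[2] - a[2]) for a, b in zip(window, window[1:])]
    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0
    return [sum(xs) / len(xs), sum(ys) / len(ys), mean_speed]


def pipeline(intervals: list[list[Sample]]) -> list[float]:
    """Accumulate each interval's output vector into a final vector (running sum)."""
    final = [0.0, 0.0, 0.0]
    for window in intervals:
        final = [f + o for f, o in zip(final, stage(window))]
    return final


def authenticate(final: list[float], enrolled: list[float], tol: float = 1.0) -> bool:
    """Accept the user if the final vector lies within tol of the enrolled template."""
    return math.dist(final, enrolled) <= tol


intervals = [[(0, 0, 0), (3, 4, 1)], [(0, 0, 0), (0, 2, 1)]]
final = pipeline(intervals)
print(final, authenticate(final, [1.5, 3.0, 7.2]))
```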

Type: Application

Filed: December 16, 2020

Publication date: June 16, 2022

Inventors: Taesung Lee, Ian Michael Molloy, Youngja Park

Automatic generation of training data for anomaly detection using other user's data samples

Publication number: 20220121995

Abstract: A method of forming an anomaly detection monitor includes obtaining data samples of operations performed on an application by a plurality of users and detecting, by a processor, anomalous behavior associated with a target user of the plurality of users with respect to the application based on a portion of the data samples associated with the target user and a portion of the data samples associated with a second user of the plurality of users, different from the target user.

Type: Application

Filed: December 30, 2021

Publication date: April 21, 2022

Inventors: Suresh N. Chari, Ian Michael Molloy, Youngja Park

Automatic generation of training data for anomaly detection using other user's data samples

Patent number: 11227232

Abstract: A method for anomaly detection on a system or application used by a plurality of users includes providing access to a memory device storing user data samples of a usage of the system or application for all users of the plurality of users. A target user is selected from among the plurality of users, using a processor on a computer, with data samples of the target user forming a cluster of data points in a data space. The data samples for the target user are used to generate a normal sample data set as a training data set for training a model for an anomaly detection monitor for the target user. A local outlier factor (LOF) function is used to generate an abnormal sample data set for training the anomaly detection monitor for the target user.
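The LOF step can be sketched in pure Python using the standard k-nearest-neighbor formulation of the local outlier factor. It assumes data samples are numeric feature tuples, that the target user's samples are distinct points, and that other users' samples whose LOF relative to the target's cluster exceeds a hypothetical threshold of 1.5 become the abnormal training set. `lof_against` and `build_training_sets` are illustrative names.

```python
import math


def knn(p, data, k, exclude=None):
    """The k nearest (distance, index) pairs in data, optionally skipping one index."""
    cands = sorted((math.dist(p, q), i) for i, q in enumerate(data) if i != exclude)
    return cands[:k]


def lof_against(reference, query, k=2):
    """Local outlier factor of query relative to the target user's reference samples.

    Assumes reference points are distinct, so every k-distance is positive.
    """
    kdist = [knn(o, reference, k, exclude=i)[-1][0] for i, o in enumerate(reference)]

    def lrd(p, exclude=None):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(kdist[i], d) for d, i in knn(p, reference, k, exclude)]
        return len(reach) / sum(reach)

    neighbors = knn(query, reference, k)
    mean_neighbor_lrd = sum(lrd(reference[i], exclude=i) for _, i in neighbors) / len(neighbors)
    return mean_neighbor_lrd / lrd(query)


def build_training_sets(samples_by_user, target, k=2, threshold=1.5):
    """Normal set: the target's own samples. Abnormal set: other users' outliers."""
    normal = samples_by_user[target]
    abnormal = [
        s
        for user, samples in samples_by_user.items()
        if user != target
        for s in samples
        if lof_against(normal, s, k) > threshold
    ]
    return normal, abnormal


samples = {
    "alice": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "bob": [(0.5, 0.5), (10.0, 10.0)],
}
normal, abnormal = build_training_sets(samples, "alice")
print(abnormal)  # [(10.0, 10.0)]
```

Bob's sample inside Alice's cluster scores an LOF of about 1 and is not used as an abnormal example, so the monitor is not trained to reject behavior that looks like Alice's own.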

Type: Grant

Filed: October 3, 2018

Date of Patent: January 18, 2022

Assignee: Arkose Labs Holdings, Inc.

Inventors: Suresh N. Chari, Ian Michael Molloy, Youngja Park

Deep embedding for natural language content based on semantic dependencies

Patent number: 11182562

Abstract: Mechanisms are provided to perform embedding of content of a natural language document. The mechanisms receive a document data object of an electronic document and analyze a structure of the electronic document to identify one or more structural document elements that have a relationship with the document data object. A dependency data structure is generated, representing the electronic document, where edges define relationships between document elements and at least one edge represents at least one relationship between the one or more structural document elements and the document data object. The mechanisms embed the document data object based on the at least one relationship to thereby represent the document data object as a vector data structure. The mechanisms perform natural language processing on the portion of natural language content based on the vector data structure. The one or more structural document elements are non-local non-contiguous with the document data object.
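A simplified sketch of embedding a paragraph together with a non-contiguous structural element such as its section heading. It assumes bag-of-words count vectors in place of learned embeddings, a dependency structure given as a hypothetical adjacency dict, and a made-up mixing weight `alpha`.

```python
from collections import Counter


def bow(text: str, vocab: list[str]) -> list[float]:
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def embed(node, texts, edges, vocab, alpha=0.5):
    """Own vector plus alpha times the mean of structurally related elements' vectors."""
    own = bow(texts[node], vocab)
    related = [bow(texts[n], vocab) for n in edges.get(node, [])]
    if not related:
        return own
    mean = [sum(col) / len(related) for col in zip(*related)]
    return [o + alpha * m for o, m in zip(own, mean)]


# Dependency structure: paragraph p7 sits under heading h1, far from it in the text.
texts = {"h1": "Side effects", "p7": "Headache was reported"}
edges = {"p7": ["h1"]}
vocab = ["side", "effects", "headache", "reported"]
print(embed("p7", texts, edges, vocab))  # [0.5, 0.5, 1.0, 1.0]
```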

Type: Grant

Filed: August 12, 2019

Date of Patent: November 23, 2021

Assignee: International Business Machines Corporation

Inventors: Taesung Lee, Youngja Park

Pre-training neural networks using data clusters

Patent number: 11176456

Abstract: Aspects of the present invention disclose a method, computer program product, and system for pre-training a neural network. The method includes extracting features of a data set received from a source, where the data set includes labeled data and unlabeled data; generating a plurality of data clusters from instances of data in the data set, where the data clusters are weighted according to a respective number of similar instances of labeled data and unlabeled data within a respective data cluster; determining a data label indicating a data class that corresponds to labeled data within a data cluster of the generated plurality of data clusters; applying the determined data label to unlabeled data within that data cluster; and, in response to applying the determined data label, deploying the data cluster to a neural network.
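The label-propagation step can be sketched as follows. It assumes the clusters have already been produced (for example by k-means on the extracted features), that each cluster takes the majority label of its labeled members, and that a cluster's weight is simply its size. Skipping clusters with no labeled member, and the name `propagate_labels`, are assumptions of this sketch.

```python
from collections import Counter


def propagate_labels(clusters):
    """Label each cluster by majority vote of its labeled members.

    clusters: list of clusters, each a list of (features, label_or_None).
    Returns (labeled_members, weight) pairs; weight is the cluster size.
    """
    out = []
    for members in clusters:
        known = [label for _, label in members if label is not None]
        if not known:
            continue  # no labeled member to borrow a label from
        label = Counter(known).most_common(1)[0][0]
        out.append(([(x, label) for x, _ in members], len(members)))
    return out


clusters = [
    [([0.1], "cat"), ([0.2], None), ([0.15], "cat")],
    [([5.0], None)],
    [([9.0], "dog"), ([9.1], None)],
]
for data, weight in propagate_labels(clusters):
    print(weight, data)
```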

Type: Grant

Filed: November 29, 2017

Date of Patent: November 16, 2021

Assignee: International Business Machines Corporation

Inventors: Kyusong Lee, Youngja Park

Two-level sequence learning for analyzing, metering, generating, and cracking passwords

Patent number: 11171946

Abstract: Managing passwords is provided. A machine training process is performed using a set of existing passwords to train a machine learning component. Members of a set of semantic categories are used to categorize respective passwords in the set of existing passwords. Password strengths corresponding to a set of candidate passwords are evaluated using the machine learning component. A resource is secured with a candidate password having a password strength greater than or equal to a defined password strength threshold level.
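The patent trains a machine learning component; the sketch below swaps in a simple heuristic so the flow stays runnable. It splits a password into semantic segments (common word, digits, letters, symbols), estimates a guess-space size in bits per segment, and secures a resource only with a candidate at or above a hypothetical 40-bit threshold. `COMMON_WORDS` and the per-class alphabet sizes are illustrative.

```python
import math
import re

# Hypothetical semantic categories: a tiny common-word list plus character classes.
COMMON_WORDS = {"password", "love", "summer", "dragon"}
SEGMENT = re.compile(r"[A-Za-z]+|\d+|[^A-Za-z\d]+")


def segment_bits(seg: str) -> float:
    """Rough guess-space size, in bits, for one semantic segment."""
    if seg.lower() in COMMON_WORDS:
        return math.log2(len(COMMON_WORDS))  # word picked from a known list
    if seg.isdigit():
        return len(seg) * math.log2(10)      # digit run
    if seg.isalpha():
        return len(seg) * math.log2(52)      # letters of either case
    return len(seg) * math.log2(33)          # printable ASCII symbols incl. space


def strength_bits(password: str) -> float:
    return sum(segment_bits(s) for s in SEGMENT.findall(password))


def choose_password(candidates, threshold_bits=40.0):
    """Return the first candidate at or above the strength threshold, else None."""
    return next((c for c in candidates if strength_bits(c) >= threshold_bits), None)


print(choose_password(["password123", "Tq7#vLp2!xR"]))
```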

Type: Grant

Filed: February 18, 2020

Date of Patent: November 9, 2021

Assignee: International Business Machines Corporation

Inventors: Suresh Chari, Taesung Lee, Ian Michael Molloy, Youngja Park

Malware clustering approaches based on cognitive computing techniques

Patent number: 11159547

Abstract: A computer system extracts features of documents that mention malware programs to determine textual features that correspond to individual ones of the malware programs. The computer system performs analysis of samples of malware programs to determine features corresponding to the samples. The computer system performs clustering using the textual features and using the features that correspond to the samples of the malware programs. The clustering creates clusters of data points, each data point corresponding to an individual one of the malware programs. The clusters contain data points considered by the clustering to be similar. The computer system outputs indications of the clusters to allow determination of whether data points in the clusters correspond to individual ones of specific malwares. Apparatus, methods, and computer program products are disclosed.
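A minimal sketch of clustering on both feature sources together. It assumes each malware program already has a textual feature vector (from documents that mention it) and a sample-analysis feature vector, a hypothetical equal weighting when concatenating them, and single-link clustering with a made-up distance cutoff standing in for whatever clustering the patent uses.

```python
import math


def combine(text_vec, sample_vec, w_text=0.5):
    """Concatenate textual and sample-analysis features, weighting the two sources."""
    return [w_text * v for v in text_vec] + [(1 - w_text) * v for v in sample_vec]


def single_link_clusters(points, max_dist):
    """Union-find single-link clustering: join any two points within max_dist."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


malware = {
    "A": ([1.0, 0.0], [0.9, 0.1]),  # (text features, sample features)
    "B": ([0.9, 0.1], [1.0, 0.0]),
    "C": ([0.0, 1.0], [0.0, 1.0]),
}
names = list(malware)
points = [combine(t, s) for t, s in malware.values()]
named = [[names[i] for i in group] for group in single_link_clusters(points, max_dist=0.3)]
print(named)  # [['A', 'B'], ['C']]
```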

Type: Grant

Filed: August 3, 2017

Date of Patent: October 26, 2021

Assignee: International Business Machines Corporation

Inventors: Suresh Chari, Heqing Huang, Taesung Lee, Youngja Park

Using multimodal model consistency to detect adversarial attacks

Publication number: 20210319093

Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.

Type: Application

Filed: April 9, 2020

Publication date: October 14, 2021

Applicant: International Business Machines Corporation

Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang

Adaptable processing components

Patent number: 11144718

Abstract: In configuring a processing system with an application made up of machine learning components, where the application has been trained on a set of training data, the application is executed on the processing system using another set of training data. Outputs of the application produced from the other set of training data that concur with ground truth data are identified. The components are then adapted, using the identified outputs, to produce outputs of the application that concur with the ground truth data.
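One way to read the adaptation step is as filtered self-training: keep only the outputs that agree with ground truth and refit the component on them. The sketch assumes a hypothetical nearest-centroid component; `NearestCentroid` and `adapt` are illustrative names.

```python
import math


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class NearestCentroid:
    """Hypothetical ML component: predicts the label whose centroid is nearest."""

    def __init__(self, examples):
        self.fit(examples)

    def fit(self, examples):
        by_label = {}
        for x, y in examples:
            by_label.setdefault(y, []).append(x)
        self.centroids = {y: centroid(xs) for y, xs in by_label.items()}

    def predict(self, x):
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


def adapt(component, train, new_data, ground_truth):
    """Keep outputs that concur with ground truth, then refit on train + those outputs."""
    concurring = [(x, y) for x, y in zip(new_data, ground_truth) if component.predict(x) == y]
    component.fit(train + concurring)
    return concurring


train = [([0, 0], "neg"), ([10, 10], "pos")]
component = NearestCentroid(train)
concurring = adapt(component, train, [[1, 1], [9, 9], [6, 6]], ["neg", "pos", "neg"])
print(concurring, component.centroids)
```

The sample at `[6, 6]` is predicted `pos` but labeled `neg`, so it is left out and cannot pull the centroids in the wrong direction.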

Type: Grant

Filed: February 28, 2017

Date of Patent: October 12, 2021

Assignee: International Business Machines Corporation

Inventors: Youngja Park, Siddharth A. Patwardhan

Measuring overfitting of machine learning computer model and susceptibility to security threats

Publication number: 20210303695

Abstract: Mechanisms are provided to determine a susceptibility of a trained machine learning model to a cybersecurity threat. The mechanisms execute a trained machine learning model on a test dataset to generate test results output data, and determine an overfit measure of the trained machine learning model based on the generated test results output data. The overfit measure quantifies an amount of overfitting of the trained machine learning model to a specific sub-portion of the test dataset. The mechanisms apply analytics to the overfit measure to determine a susceptibility probability that indicates a likelihood that the trained machine learning model is susceptible to a cybersecurity threat based on the determined amount of overfitting of the trained machine learning model. The mechanisms perform a corrective action based on the determined susceptibility probability.

Type: Application

Filed: March 30, 2020

Publication date: September 30, 2021

Inventors: Kathrin Grosse, Taesung Lee, Youngja Park, Ian Michael Molloy

Cross-subject model-generated training data for relation extraction modeling

Patent number: 11132507

Abstract: A first vector representation of a first word within a first narrative text and a machine-generated label corresponding to the first word are constructed. Using the first vector representation, an annotator model is trained. The annotator model is configured to produce a set of probabilities, each probability in the set of probabilities representing a probable output annotation corresponding to a word within a narrative text. The training includes minimizing a difference between a first human-generated label corresponding to the first word and a first probable output annotation corresponding to the first word. Using the trained annotator model and a second narrative text, second training data is generated. The trained annotator model is configured to produce an output annotation corresponding to a word within a narrative text. The second training data is usable to train a relation extraction model.
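A count-based stand-in for the trained annotator model. It assumes the input features are the lowercased word plus its machine-generated label, rather than a learned vector representation. Picking the majority human label for each feature minimizes 0-1 disagreement with the human labels on the first text; the model then labels a second text to produce relation-extraction training data. All names are hypothetical.

```python
from collections import Counter, defaultdict


def train_annotator(words, machine_labels, human_labels):
    """Map each (word, machine label) feature to its most frequent human label.

    Choosing the majority label per feature minimizes 0-1 disagreement with the
    human annotations on the first text.
    """
    votes = defaultdict(Counter)
    for w, m, h in zip(words, machine_labels, human_labels):
        votes[(w.lower(), m)][h] += 1
    return {feat: c.most_common(1)[0][0] for feat, c in votes.items()}


def annotate(model, words, machine_labels):
    """Label a second text; features never seen in training keep the machine label."""
    return [(w, model.get((w.lower(), m), m)) for w, m in zip(words, machine_labels)]


model = train_annotator(
    ["Aspirin", "treats", "headache", "aspirin"],
    ["DRUG", "O", "O", "DRUG"],
    ["DRUG", "O", "SYMPTOM", "DRUG"],
)
data = annotate(model, ["Ibuprofen", "treats", "headache"], ["DRUG", "O", "O"])
print(data)  # [('Ibuprofen', 'DRUG'), ('treats', 'O'), ('headache', 'SYMPTOM')]
```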

Type: Grant

Filed: April 2, 2019

Date of Patent: September 28, 2021

Assignee: International Business Machines Corporation

Inventors: Youngja Park, Taesung Lee, Arpita Roy
