Patents by Inventor Youngja Park
Youngja Park has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12019720
Abstract: A behavioral biometrics deep learning (BBDL) pipeline is provided, comprising a plurality of stages of machine learning computer models that operate to provide a behavioral biometric based authenticator operating based on spatiotemporal input data. The BBDL pipeline receives spatiotemporal input data over a plurality of time intervals, each time interval having a corresponding subset of the spatiotemporal input data. For each time interval, machine learning computer model(s) of a corresponding stage process a subset of the spatiotemporal input data corresponding to the time interval to generate an output vector having values indicative of an internal representation of spatiotemporal traits of the entity. Output vectors are accumulated across the plurality of stages of the BBDL pipeline to generate a final output vector indicative of the spatiotemporal traits of the entity represented in the spatiotemporal input data. The entity is authenticated based on the final output vector.
Type: Grant
Filed: December 16, 2020
Date of Patent: June 25, 2024
Assignee: International Business Machines Corporation
Inventors: Taesung Lee, Ian Michael Molloy, Youngja Park
-
Publication number: 20240152698
Abstract: An enhanced system and method are provided for data-driven named entity type disambiguation. A non-limiting computer-implemented method receives an unstructured document and analyzes the document using a set of Named Entity Recognition (NER) annotators, each generating annotated entities. For each respective annotated entity, an Entity Disambiguation Module resolves a target entity type when a mention was assigned multiple entity types by different NER annotators, leveraging domain knowledge to form a set of first resolved entities. An Annotation Ranker associates a computed score with each entity in the set of first resolved entities using information in a knowledge base.
Type: Application
Filed: November 9, 2022
Publication date: May 9, 2024
Inventors: Mohammed Fahd Alhamid, Stefano Braghin, Jing Xin Duan, Mokhtar Kandil, Youngja Park, Micha Gideon Moffie
-
Patent number: 11977625
Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
Type: Grant
Filed: May 12, 2023
Date of Patent: May 7, 2024
Assignee: International Business Machines Corporation
Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang
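The rejection rule in the abstract of patent 11977625 can be illustrated with a minimal sketch. It assumes that features from both modalities have already been projected into a shared embedding space and that Euclidean distance with a fixed threshold is an acceptable measure of "sufficiently far away"; the abstract specifies neither. The names `euclidean`, `is_adversarial`, and `threshold` are hypothetical and not taken from the patent.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_adversarial(features_a, features_b, threshold):
    """Flag a sample when its two modalities disagree by more than threshold.

    Assumption: both vectors live in a shared embedding space, so a large
    distance suggests that one modality was tampered with.
    """
    return euclidean(features_a, features_b) > threshold


# Hypothetical feature vectors for the same entity in two modalities.
image_features = [0.9, 0.1, 0.0]
text_features = [0.8, 0.2, 0.0]
print(is_adversarial(image_features, text_features, threshold=0.5))
```

In practice the threshold would have to be tuned on clean data so that legitimate cross-modal variation is not rejected.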
-
Publication number: 20230418859
Abstract: A method, computer system, and a computer program product for data processing, comprising obtaining a plurality of files from a data source. These files are analyzed for information about their content and to determine the structural information of each file. Once the files have been analyzed, information in each file may be sorted and categorized by common content. Sensitive information may also be extracted and categorized separately. Information may then be merged using the categories to create a single unified file.
Type: Application
Filed: June 27, 2022
Publication date: December 28, 2023
Inventors: Youngja Park, Mohammed Fahd Alhamid, Stefano Braghin, Jing Xin Duan, Mokhtar Kandil, Michael Vu Le, Killian Levacher, Micha Gideon Moffie, Ian Michael Molloy, Walid Rjaibi, Ariel Farkash
-
Publication number: 20230319090
Abstract: An automated method for processing security events. It begins by building an initial version of a knowledge graph based on security information received from structured data sources. Using entities identified in the initial version, additional security information is then received. The additional information is extracted from one or more unstructured data sources. The additional information includes text in which the entities (from the structured data sources) appear. The text is processed to extract relationships involving the entities (from the structured data sources) to generate entities and relationships extracted from the unstructured data sources. The initial version of the knowledge graph is then augmented with the entities and relationships extracted from the unstructured data sources to build a new version of the knowledge graph that consolidates the intelligence received from the structured data sources and the unstructured data sources. The new version is then used to process security event data.
Type: Application
Filed: June 5, 2023
Publication date: October 5, 2023
Inventors: Youngja Park, Jiyong Jang, Dhilung Hang Kirat, Josyula R. Rao, Marc Philippe Stoecklin
-
Patent number: 11755839
Abstract: Natural language processing (NLP) methodologies and mechanisms are provided that include a named entity recognition (NER) computer model augmented to operate on an entity pattern embedding input feature in addition to other embedding input features. The mechanisms tokenize natural language content (NLC) to generate tokens and process a selected token in accordance with a predetermined entity pattern embedding technique to generate an entity pattern embedding input feature for the selected token. The entity pattern embedding input feature specifies a pattern of characters present in the selected token. The mechanisms process the NLC to generate the other embedding input features in accordance with other embedding techniques, and process, by the NER computer model, the other embedding input features and the entity pattern embedding input feature for the selected token to generate a predicted tag for the selected token. The predicted tag specifies a named entity type classification for the selected token.
Type: Grant
Filed: May 19, 2021
Date of Patent: September 12, 2023
Assignee: International Business Machines Corporation
Inventors: Youngja Park, Jatin Arora
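One plausible reading of "a pattern of characters present in the selected token" (patent 11755839) is a character-class shape string, which a model could then look up in an embedding table. The sketch below shows only the shape computation. The class symbols (`A`, `a`, `9`), the run-collapsing step, and the function names `char_pattern` and `compress_pattern` are assumptions made for illustration, not details taken from the patent.

```python
def char_pattern(token):
    """Map each character to a class symbol: A (upper), a (lower), 9 (digit).

    Any other character (punctuation, symbols) is kept unchanged.
    """
    symbols = []
    for ch in token:
        if ch.isupper():
            symbols.append("A")
        elif ch.islower():
            symbols.append("a")
        elif ch.isdigit():
            symbols.append("9")
        else:
            symbols.append(ch)
    return "".join(symbols)


def compress_pattern(pattern):
    """Collapse runs of the same symbol so tokens of varying length share a shape."""
    out = []
    for symbol in pattern:
        if not out or out[-1] != symbol:
            out.append(symbol)
    return "".join(out)


print(char_pattern("CVE-2021-1234"))                    # AAA-9999-9999
print(compress_pattern(char_pattern("CVE-2021-1234")))  # A-9-9
print(char_pattern("192.168.0.1"))                      # 999.999.9.9
```

Each distinct shape string could serve as a key into a learned embedding table, giving the NER model a feature that generalizes across identifiers with the same layout.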
-
Publication number: 20230281298
Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
Type: Application
Filed: May 12, 2023
Publication date: September 7, 2023
Applicant: International Business Machines Corporation
Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang
-
Patent number: 11675896
Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
Type: Grant
Filed: April 9, 2020
Date of Patent: June 13, 2023
Assignee: International Business Machines Corporation
Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang
-
Publication number: 20220374602
Abstract: Natural language processing (NLP) methodologies and mechanisms are provided that include a named entity recognition (NER) computer model augmented to operate on an entity pattern embedding input feature in addition to other embedding input features. The mechanisms tokenize natural language content (NLC) to generate tokens and process a selected token in accordance with a predetermined entity pattern embedding technique to generate an entity pattern embedding input feature for the selected token. The entity pattern embedding input feature specifies a pattern of characters present in the selected token. The mechanisms process the NLC to generate the other embedding input features in accordance with other embedding techniques, and process, by the NER computer model, the other embedding input features and the entity pattern embedding input feature for the selected token to generate a predicted tag for the selected token. The predicted tag specifies a named entity type classification for the selected token.
Type: Application
Filed: May 19, 2021
Publication date: November 24, 2022
Inventors: Youngja Park, Jatin Arora
-
Patent number: 11494496
Abstract: Mechanisms are provided to determine a susceptibility of a trained machine learning model to a cybersecurity threat. The mechanisms execute a trained machine learning model on a test dataset to generate test results output data, and determine an overfit measure of the trained machine learning model based on the generated test results output data. The overfit measure quantifies an amount of overfitting of the trained machine learning model to a specific sub-portion of the test dataset. The mechanisms apply analytics to the overfit measure to determine a susceptibility probability that indicates a likelihood that the trained machine learning model is susceptible to a cybersecurity threat based on the determined amount of overfitting of the trained machine learning model. The mechanisms perform a corrective action based on the determined susceptibility probability.
Type: Grant
Filed: March 30, 2020
Date of Patent: November 8, 2022
Assignee: International Business Machines Corporation
Inventors: Kathrin Grosse, Taesung Lee, Youngja Park, Ian Michael Molloy
-
Publication number: 20220188390
Abstract: A behavioral biometrics deep learning (BBDL) pipeline is provided, comprising a plurality of stages of machine learning computer models that operate to provide a behavioral biometric based authenticator operating based on spatiotemporal input data. The BBDL pipeline receives spatiotemporal input data over a plurality of time intervals, each time interval having a corresponding subset of the spatiotemporal input data. For each time interval, machine learning computer model(s) of a corresponding stage process a subset of the spatiotemporal input data corresponding to the time interval to generate an output vector having values indicative of an internal representation of spatiotemporal traits of the entity. Output vectors are accumulated across the plurality of stages of the BBDL pipeline to generate a final output vector indicative of the spatiotemporal traits of the entity represented in the spatiotemporal input data. The entity is authenticated based on the final output vector.
Type: Application
Filed: December 16, 2020
Publication date: June 16, 2022
Inventors: Taesung Lee, Ian Michael Molloy, Youngja Park
-
Publication number: 20220121995
Abstract: A method of forming an anomaly detection monitor includes obtaining data samples of operations performed on an application by a plurality of users and detecting, by a processor, anomalous behavior associated with a target user of the plurality of users with respect to the application based on a portion of the data samples associated with the target user and a portion of the data samples associated with a second user of the plurality of users, different from the target user.
Type: Application
Filed: December 30, 2021
Publication date: April 21, 2022
Inventors: Suresh N. Chari, Ian Michael Molloy, Youngja Park
-
Patent number: 11227232
Abstract: A method for anomaly detection on a system or application used by a plurality of users includes providing an access to a memory device storing user data samples of a usage of the system or application for all users of the plurality of users. A target user is selected from among the plurality of users, using a processor on a computer, with data samples of the target user forming a cluster of data points in a data space. The data samples for the target user are used to generate a normal sample data set as training data set for training a model for an anomaly detection monitor for the target user. A local outlier factor (LOF) function is used to generate an abnormal sample data set for training the anomaly detection monitor for the target user.
Type: Grant
Filed: October 3, 2018
Date of Patent: January 18, 2022
Assignee: Arkose Labs Holdings, Inc.
Inventors: Suresh N. Chari, Ian Michael Molloy, Youngja Park
-
Patent number: 11182562
Abstract: Mechanisms are provided to perform embedding of content of a natural language document. The mechanisms receive a document data object of an electronic document and analyze a structure of the electronic document to identify one or more structural document elements that have a relationship with the document data object. A dependency data structure is generated, representing the electronic document, where edges define relationships between document elements and at least one edge represents at least one relationship between the one or more structural document elements and the document data object. The mechanisms embed the document data object based on the at least one relationship to thereby represent the document data object as a vector data structure. The mechanisms perform natural language processing on the portion of natural language content based on the vector data structure. The one or more structural document elements are non-local non-contiguous with the document data object.
Type: Grant
Filed: August 12, 2019
Date of Patent: November 23, 2021
Assignee: International Business Machines Corporation
Inventors: Taesung Lee, Youngja Park
-
Patent number: 11176456
Abstract: Aspects of the present invention disclose a method, computer program product, and system for pre-training a neural network. The method includes extracting features of a data set received from a source, where the data set includes labeled data and unlabeled data. The method further includes generating a plurality of data clusters from instances of data in the data set, where the data clusters are weighted according to a respective number of similar instances of labeled data and unlabeled data within a respective data cluster; determining a data label indicating a data class that corresponds to labeled data within a data cluster of the generated plurality of data clusters; applying the determined data label to unlabeled data within that data cluster; and, in response to applying the determined data label, deploying the data cluster to a neural network.
Type: Grant
Filed: November 29, 2017
Date of Patent: November 16, 2021
Assignee: International Business Machines Corporation
Inventors: Kyusong Lee, Youngja Park
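The label-application step in the abstract of patent 11176456 can be sketched as follows. The sketch assumes that cluster assignments are already available and that the label for a cluster is chosen by majority vote among its labeled members; the abstract does not say how the label is determined, so the vote is an illustrative choice. The function name `propagate_labels` and the sample data are hypothetical.

```python
from collections import Counter


def propagate_labels(cluster_ids, labels):
    """Fill in missing labels (None) using the majority label of each cluster.

    cluster_ids[i] is the cluster of instance i; labels[i] is its label or None.
    Clusters with no labeled member leave their instances unlabeled.
    """
    votes = {}
    for cluster_id, label in zip(cluster_ids, labels):
        if label is not None:
            votes.setdefault(cluster_id, Counter())[label] += 1

    result = []
    for cluster_id, label in zip(cluster_ids, labels):
        if label is None and cluster_id in votes:
            # Assumption: the most frequent known label stands for the cluster.
            label = votes[cluster_id].most_common(1)[0][0]
        result.append(label)
    return result


cluster_ids = [0, 0, 0, 1, 1, 2]
labels = ["spam", None, "spam", "ham", None, None]
print(propagate_labels(cluster_ids, labels))
```

Instances in cluster 2 stay unlabeled here because that cluster contains no labeled example to vote with.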
-
Patent number: 11171946
Abstract: Managing passwords is provided. A machine training process is performed using a set of existing passwords to train a machine learning component. Members of a set of semantic categories are used to categorize respective passwords in the set of existing passwords. Password strengths corresponding to a set of candidate passwords are evaluated using the machine learning component. A resource is secured with a candidate password having a password strength greater than or equal to a defined password strength threshold level.
Type: Grant
Filed: February 18, 2020
Date of Patent: November 9, 2021
Assignee: International Business Machines Corporation
Inventors: Suresh Chari, Taesung Lee, Ian Michael Molloy, Youngja Park
-
Patent number: 11159547
Abstract: A computer system extracts features of documents that mention malware programs to determine textual features that correspond to individual ones of the malware programs. The computer system performs analysis of samples of malware programs to determine features corresponding to the samples. The computer system performs clustering using the textual features and using the features that correspond to the samples of the malware programs. The clustering creates clusters of data points, each data point corresponding to an individual one of the malware programs. The clusters contain data points considered by the clustering to be similar. The computer system outputs indications of the clusters to allow determination of whether data points in the clusters correspond to individual ones of specific malwares. Apparatus, methods, and computer program products are disclosed.
Type: Grant
Filed: August 3, 2017
Date of Patent: October 26, 2021
Assignee: International Business Machines Corporation
Inventors: Suresh Chari, Heqing Huang, Taesung Lee, Youngja Park
-
Publication number: 20210319093
Abstract: A method, apparatus and computer program product to defend learning models that are vulnerable to adversarial example attack. It is assumed that data (a “dataset”) is available in multiple modalities (e.g., text and images, audio and images in video, etc.). The defense approach herein is premised on the recognition that the correlations between the different modalities for the same entity can be exploited to defend against such attacks, as it is not realistic for an adversary to attack multiple modalities. To this end, according to this technique, adversarial samples are identified and rejected if the features from one (the attacked) modality are determined to be sufficiently far away from those of another un-attacked modality for the same entity. In other words, the approach herein leverages the consistency between multiple modalities in the data to defend against adversarial attacks on one modality.
Type: Application
Filed: April 9, 2020
Publication date: October 14, 2021
Applicant: International Business Machines Corporation
Inventors: Ian Michael Molloy, Youngja Park, Taesung Lee, Wenjie Wang
-
Patent number: 11144718
Abstract: In configuring a processing system with an application made up of machine learning components, where the application has been trained on a set of training data, the application is executed on the processing system using another set of training data. Outputs of the application produced from the other set of training data that concur with ground truth data are identified. The components are adapted to produce outputs of the application that concur with the ground truth data using the identified outputs of the application.
Type: Grant
Filed: February 28, 2017
Date of Patent: October 12, 2021
Assignee: International Business Machines Corporation
Inventors: Youngja Park, Siddharth A. Patwardhan
-
Publication number: 20210303695
Abstract: Mechanisms are provided to determine a susceptibility of a trained machine learning model to a cybersecurity threat. The mechanisms execute a trained machine learning model on a test dataset to generate test results output data, and determine an overfit measure of the trained machine learning model based on the generated test results output data. The overfit measure quantifies an amount of overfitting of the trained machine learning model to a specific sub-portion of the test dataset. The mechanisms apply analytics to the overfit measure to determine a susceptibility probability that indicates a likelihood that the trained machine learning model is susceptible to a cybersecurity threat based on the determined amount of overfitting of the trained machine learning model. The mechanisms perform a corrective action based on the determined susceptibility probability.
Type: Application
Filed: March 30, 2020
Publication date: September 30, 2021
Inventors: Kathrin Grosse, Taesung Lee, Youngja Park, Ian Michael Molloy