Patents by Inventor Tetsuya Nasukawa

Tetsuya Nasukawa has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Data augmentation based on failure cases

Patent number: 11797425

Abstract: A computer-implemented method is provided for data augmentation. The method includes receiving a set of different base models already pretrained and a set of different test cases. The method further includes collecting a plurality of prediction results of the set of different test cases from the set of different base models. The method also includes identifying a test case as a candidate for the data augmentation based on a number of models in the set of different base models which fail to solve the test case. The method additionally includes augmenting, by a processor device, the identified test case with additional data to form an augmented training dataset. The method further includes retraining at least some of the different base models with the augmented training dataset.

Type: Grant

Filed: July 9, 2021

Date of Patent: October 24, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Masayasu Muraoka, Issei Yoshida, Tetsuya Nasukawa
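
The selection step in the abstract above (picking test cases by how many base models fail them) could be sketched as follows. This is a minimal illustration, not the patented method: the function name `select_failure_cases`, the `min_failures` threshold, and the toy models are all assumptions, and the augmentation and retraining steps are left out.

```python
def select_failure_cases(models, test_cases, min_failures):
    """Return the test cases that at least `min_failures` models get wrong.

    `models` is a list of callables; `test_cases` is a list of
    (inputs, expected) pairs. The threshold rule is an assumption.
    """
    selected = []
    for inputs, expected in test_cases:
        # Count how many base models fail to solve this test case.
        failures = sum(1 for model in models if model(inputs) != expected)
        if failures >= min_failures:
            selected.append((inputs, expected))
    return selected


# Toy stand-ins for pretrained base models (hypothetical).
models = [lambda x: x % 2 == 0, lambda x: x > 0, lambda x: True]
test_cases = [(2, True), (-3, False), (5, False)]

# Candidates for augmentation: cases failed by two or more models.
hard_cases = select_failure_cases(models, test_cases, min_failures=2)
```

In this toy setup only `(5, False)` is failed by two models, so it alone would be passed on for augmentation.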
User-centric optimization for interactive dictionary expansion

Patent number: 11645461

Abstract: A method is provided for dictionary expansion. The method acquires an object from a user and adds the object to a set of objects previously acquired from the user that form an expandable dictionary. The method calculates a centroid based on the set. The method calculates a similarity score of each of a plurality of objects relative to the centroid for each of a plurality of object features to calculate a weighted sum of similarity scores for each of the plurality of objects. The method presents candidate objects selected among the plurality of objects based on the weighted sum. The method acquires, from the user, a preferred candidate object among the candidate objects. The method updates weights of the plurality of features to maximize the weighted sum of similarity scores for the preferred candidate object. The method expands the dictionary by adding the preferred candidate object to the expandable dictionary.

Type: Grant

Filed: February 10, 2020

Date of Patent: May 9, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ryosuke Kohita, Issei Yoshida, Hiroshi Kanayama, Tetsuya Nasukawa
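
The loop described in the abstract above (centroid, per-feature similarity, weighted sum, weight update) might look roughly like this. It is a sketch under stated assumptions: cosine similarity, the feature names `spelling` and `context`, and the additive update in `update_weights` are illustrative choices, and the update only raises the preferred candidate's score rather than strictly maximizing it.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def feature_similarities(obj, dictionary):
    """Similarity of `obj` to the dictionary centroid, one value per feature."""
    sims = {}
    for feature in obj:
        vectors = [entry[feature] for entry in dictionary]
        centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
        sims[feature] = cosine(obj[feature], centroid)
    return sims


def rank(candidates, dictionary, weights):
    """Order candidate names by weighted sum of per-feature similarities."""
    def score(name):
        sims = feature_similarities(candidates[name], dictionary)
        return sum(weights[f] * s for f, s in sims.items())
    return sorted(candidates, key=score, reverse=True)


def update_weights(weights, preferred, dictionary, lr=1.0):
    """Shift weight toward features on which the preferred object scores well."""
    sims = feature_similarities(preferred, dictionary)
    raised = {f: w + lr * sims[f] for f, w in weights.items()}
    total = sum(raised.values())
    return {f: w / total for f, w in raised.items()}


dictionary = [{"spelling": [1, 0], "context": [0, 1]},
              {"spelling": [1, 0], "context": [0, 1]}]
candidates = {"a": {"spelling": [1, 0], "context": [1, 0]},
              "b": {"spelling": [0, 1], "context": [0, 1]}}
weights = {"spelling": 0.7, "context": 0.3}

before = rank(candidates, dictionary, weights)
# Suppose the user picks "b": reweight, then re-rank.
weights = update_weights(weights, candidates["b"], dictionary)
after = rank(candidates, dictionary, weights)
```

Appending the chosen candidate to `dictionary` would complete one round of expansion.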
Data augmentation by dynamic word replacement

Patent number: 11636338

Abstract: A computer-implemented method is provided for data augmentation. The method includes calculating, by a hardware processor for each of words in a text data, a word replacement probability based on a word occurrence frequency in the text data, wherein the word replacement probability decreases with increasing word occurrence frequency. The method additionally includes selectively replacing at least one of the words in the text data with words predicted therefor by a Bidirectional Neural Network Language Model (BiNNLM) to generate augmented text data, based on the word replacement probability.

Type: Grant

Filed: March 20, 2020

Date of Patent: April 25, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Masayasu Muraoka, Tetsuya Nasukawa
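
A frequency-dependent replacement probability of the kind described in the abstract above could be sketched like this. The formula `1 / count` is only one possible decreasing function and is an assumption, as is the `predict_word` callable, which stands in for the bidirectional language model (no real model is used here).

```python
import random
from collections import Counter


def replacement_probabilities(tokens):
    """Map each word to a probability that falls as its frequency rises."""
    counts = Counter(tokens)
    return {word: 1.0 / count for word, count in counts.items()}


def augment(tokens, predict_word, rng):
    """Replace each token with a predicted word, with frequency-based probability."""
    probs = replacement_probabilities(tokens)
    augmented = []
    for index, token in enumerate(tokens):
        if rng.random() < probs[token]:
            # `predict_word` is a hypothetical stand-in for the language model.
            augmented.append(predict_word(tokens, index))
        else:
            augmented.append(token)
    return augmented


tokens = "the cat sat on the mat".split()
probs = replacement_probabilities(tokens)
new_tokens = augment(tokens, lambda toks, i: "<new>", random.Random(0))
```

With this formula, words that occur once are always replaced, while "the" (two occurrences) is replaced about half the time.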
DATA AUGMENTATION BASED ON FAILURE CASES

Publication number: 20230027777

Abstract: A computer-implemented method is provided for data augmentation. The method includes receiving a set of different base models already pretrained and a set of different test cases. The method further includes collecting a plurality of prediction results of the set of different test cases from the set of different base models. The method also includes identifying a test case as a candidate for the data augmentation based on a number of models in the set of different base models which fail to solve the test case. The method additionally includes augmenting, by a processor device, the identified test case with additional data to form an augmented training dataset. The method further includes retraining at least some of the different base models with the augmented training dataset.

Type: Application

Filed: July 9, 2021

Publication date: January 26, 2023

Inventors: Masayasu Muraoka, Issei Yoshida, Tetsuya Nasukawa
Extraction of semantic relation

Patent number: 11556570

Abstract: A computer-implemented method for extracting semantic relations is disclosed. In the method, a plurality of hierarchal structures that originates from a corpus of documents is obtained. Each hierarchal structure includes a plurality of elements having respective recitations included in a corresponding document. In the method, for each predetermined relationship between ancestor and descendant elements in the hierarchal structures, a first keyword list is extracted from the ancestor element and a second keyword list is extracted from the descendant element. A statistical index is calculated for each pair of first and second keywords using the first keyword lists and the second keyword lists. The index indicates a strength of association between the first and second keywords. In the method, a candidate list of keyword pairs having semantic relationships is output using the statistical index calculated for each pair.

Type: Grant

Filed: September 20, 2018

Date of Patent: January 17, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shoko Suzuki, Tetsuya Nasukawa, Hiroshi Kanayama
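
The abstract above does not say which statistical index is used. As one possible choice, pointwise mutual information (PMI) over ancestor/descendant keyword lists could be computed as below; the function name `pmi_scores` and the sample keywords are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import product


def pmi_scores(pairs):
    """PMI for each (ancestor keyword, descendant keyword) pair.

    `pairs` is a list of (ancestor_keywords, descendant_keywords), one
    entry per ancestor/descendant element relationship.
    """
    n = len(pairs)
    first, second, joint = Counter(), Counter(), Counter()
    for ancestor_keywords, descendant_keywords in pairs:
        anc, desc = set(ancestor_keywords), set(descendant_keywords)
        first.update(anc)
        second.update(desc)
        joint.update(product(anc, desc))
    # PMI = log( P(a, d) / (P(a) * P(d)) ), with probabilities estimated from counts.
    return {
        (a, d): math.log(count * n / (first[a] * second[d]))
        for (a, d), count in joint.items()
    }


pairs = [
    (["engine"], ["overheat", "noise"]),
    (["engine"], ["overheat"]),
    (["brake"], ["noise"]),
]
scores = pmi_scores(pairs)
# Candidate list: keyword pairs ordered from strongest to weakest association.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here ("engine", "overheat") scores above ("engine", "noise"), since "overheat" appears only under "engine".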
TEXT MINING BASED ON DOCUMENT STRUCTURE INFORMATION EXTRACTION

Publication number: 20220358287

Abstract: Frequent sequences extracted from a set of documents according to a common rule are obtained. Based on comparing occurrence frequencies of various sequences, confidence of the first frequent sequence being a label expression representing a document part in a target document is evaluated. Keywords are extracted from the target document based on evaluation of the confidence.

Type: Application

Filed: May 10, 2021

Publication date: November 10, 2022

Inventors: Tetsuya Nasukawa, Shoko Suzuki, Daisuke Takuma, Issei Yoshida
Word grouping using a plurality of models

Patent number: 11308274

Abstract: A computer-implemented method is provided. The method includes acquiring a seed word; calculating a similarity score of each of a plurality of words relative to the seed word for each of a plurality of models to calculate a weighted sum of similarity scores for each of the plurality of words; outputting a plurality of candidate words among the plurality of words; acquiring annotations indicating at least one of preferred words and non-preferred words among the plurality of the candidate words; updating weights of the plurality of models in a manner to cause weighted sums of similarity scores for the preferred words to be relatively larger than the weighted sums of the similarity scores for the non-preferred words, based on the annotations; and grouping the plurality of candidate words output based on the weighted sum of similarity scores calculated with updated weights of the plurality of models.

Type: Grant

Filed: May 17, 2019

Date of Patent: April 19, 2022

Assignee: International Business Machines Corporation

Inventors: Ryosuke Kohita, Issei Yoshida, Tetsuya Nasukawa, Hiroshi Kanayama
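
The weight update in the abstract above (making preferred words score relatively higher than non-preferred ones) could be approximated as follows. The update rule, the learning rate `lr`, and the similarity values are assumptions chosen for illustration; each list holds one similarity per model, relative to a seed word.

```python
def weighted_score(sims, weights):
    """Weighted sum of one word's per-model similarity scores."""
    return sum(w * s for w, s in zip(weights, sims))


def update_weights(weights, sims_by_word, preferred, non_preferred, lr=0.5):
    """Favor models that rate preferred words above non-preferred ones."""
    def mean(words, model):
        if not words:
            return 0.0
        return sum(sims_by_word[w][model] for w in words) / len(words)

    raised = [
        max(0.0, w + lr * (mean(preferred, m) - mean(non_preferred, m)))
        for m, w in enumerate(weights)
    ]
    total = sum(raised)
    return [w / total for w in raised] if total else list(weights)


# Hypothetical similarities to a seed word from two models.
sims_by_word = {"apple": [0.9, 0.2], "orange": [0.3, 0.8], "car": [0.8, 0.1]}
weights = [0.5, 0.5]

# The annotator marks "apple" and "orange" as preferred, "car" as not.
weights = update_weights(weights, sims_by_word, ["apple", "orange"], ["car"])
scores = {w: weighted_score(s, weights) for w, s in sims_by_word.items()}
```

After the update, the second model carries more weight, and both preferred words outscore "car".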
Identifying expressions for target concept with images

Patent number: 11132393

Abstract: A computer-implemented method for identifying an expression for a target concept includes obtaining a set of texts as a target set of texts, with each text being associated with one of the images relevant to the target concept. Candidate expressions for the target concept are extracted from the target set of texts. The candidate expressions are characteristic of the target set of texts. Each image relevant to one of the candidate expressions is collected by using an image search engine. A target expression for the target concept is selected from the candidate expressions based on a comparison result of the target concept and the collected images.

Type: Grant

Filed: October 30, 2018

Date of Patent: September 28, 2021

Assignee: International Business Machines Corporation

Inventors: Tetsuya Nasukawa, Masayasu Muraoka, Khan Md. Anwarus Salam
DATA AUGMENTATION BY DYNAMIC WORD REPLACEMENT

Publication number: 20210295149

Abstract: A computer-implemented method is provided for data augmentation. The method includes calculating, by a hardware processor for each of words in a text data, a word replacement probability based on a word occurrence frequency in the text data, wherein the word replacement probability decreases with increasing word occurrence frequency. The method additionally includes selectively replacing at least one of the words in the text data with words predicted therefor by a Bidirectional Neural Network Language Model (BiNNLM) to generate augmented text data, based on the word replacement probability.

Type: Application

Filed: March 20, 2020

Publication date: September 23, 2021

Inventors: Masayasu Muraoka, Tetsuya Nasukawa
USER-CENTRIC OPTIMIZATION FOR INTERACTIVE DICTIONARY EXPANSION

Publication number: 20210248315

Abstract: A method is provided for dictionary expansion. The method acquires an object from a user and adds the object to a set of objects previously acquired from the user that form an expandable dictionary. The method calculates a centroid based on the set. The method calculates a similarity score of each of a plurality of objects relative to the centroid for each of a plurality of object features to calculate a weighted sum of similarity scores for each of the plurality of objects. The method presents candidate objects selected among the plurality of objects based on the weighted sum. The method acquires, from the user, a preferred candidate object among the candidate objects. The method updates weights of the plurality of features to maximize the weighted sum of similarity scores for the preferred candidate object. The method expands the dictionary by adding the preferred candidate object to the expandable dictionary.

Type: Application

Filed: February 10, 2020

Publication date: August 12, 2021

Inventors: Ryosuke Kohita, Issei Yoshida, Hiroshi Kanayama, Tetsuya Nasukawa
Finding of asymmetric relation between words

Patent number: 10970488

Abstract: A computer-implemented method for finding an asymmetric relation between a plurality of target words is disclosed. The method includes preparing a plurality of image sets, each of which includes one or more images relevant to a corresponding one of the plurality of the target words. The method also includes obtaining a plurality of object labels for each of the plurality of image sets. The method further includes computing a representation for each of the plurality of the target words using the plurality of the object labels obtained for each of the plurality of image sets. The method further includes determining whether there is an asymmetric relation between the plurality of the target words using representations computed for the plurality of the target words.

Type: Grant

Filed: February 27, 2019

Date of Patent: April 6, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Masayasu Muraoka, Tetsuya Nasukawa, Khan Md. Anwarus Salam
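
One way to picture the final step in the abstract above is to treat each word's representation as the set of object labels found in its images and compare how much each set covers the other. This is an assumed heuristic, not the patented computation; the label sets, the `margin` value, and the returned strings are hypothetical.

```python
def coverage(labels_a, labels_b):
    """Fraction of the labels in `labels_a` that also appear in `labels_b`."""
    return len(labels_a & labels_b) / len(labels_a) if labels_a else 0.0


def asymmetric_relation(labels_x, labels_y, margin=0.2):
    """Guess which word is broader from one-sided label coverage."""
    x_in_y = coverage(labels_x, labels_y)
    y_in_x = coverage(labels_y, labels_x)
    if x_in_y - y_in_x > margin:
        return "y is broader than x"
    if y_in_x - x_in_y > margin:
        return "x is broader than y"
    return "no asymmetric relation found"


# Hypothetical object labels detected in images retrieved for each word.
labels = {
    "animal": {"dog", "cat", "bird", "grass", "fur"},
    "dog": {"dog", "fur", "grass"},
}
relation = asymmetric_relation(labels["dog"], labels["animal"])
```

All labels for "dog" appear under "animal", but only three of five labels for "animal" appear under "dog", so the sketch reports "animal" as the broader word.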
Text analysis in unsupported languages using backtranslation

Patent number: 10929617

Abstract: Text analysis includes determining one or more global analysis parameters based on backtranslation of a first corpus between supported languages. A new text analysis model is determined for an unsupported language based on the one or more global analysis parameters and a text analysis model for a first supported language. An input text is analyzed in the unsupported language with the new text analysis model.

Type: Grant

Filed: July 20, 2018

Date of Patent: February 23, 2021

Assignee: International Business Machines Corporation

Inventors: Kohichi Kamijoh, Tetsuya Nasukawa, Yohei Ikawa, Masaki Ono
WORD GROUPING USING A PLURALITY OF MODELS

Publication number: 20200364298

Abstract: A computer-implemented method is provided. The method includes acquiring a seed word; calculating a similarity score of each of a plurality of words relative to the seed word for each of a plurality of models to calculate a weighted sum of similarity scores for each of the plurality of words; outputting a plurality of candidate words among the plurality of words; acquiring annotations indicating at least one of preferred words and non-preferred words among the plurality of the candidate words; updating weights of the plurality of models in a manner to cause weighted sums of similarity scores for the preferred words to be relatively larger than the weighted sums of the similarity scores for the non-preferred words, based on the annotations; and grouping the plurality of candidate words output based on the weighted sum of similarity scores calculated with updated weights of the plurality of models.

Type: Application

Filed: May 17, 2019

Publication date: November 19, 2020

Inventors: Ryosuke Kohita, Issei Yoshida, Tetsuya Nasukawa, Hiroshi Kanayama
FINDING OF ASYMMETRIC RELATION BETWEEN WORDS

Publication number: 20200272696

Abstract: A computer-implemented method for finding an asymmetric relation between a plurality of target words is disclosed. The method includes preparing a plurality of image sets, each of which includes one or more images relevant to a corresponding one of the plurality of the target words. The method also includes obtaining a plurality of object labels for each of the plurality of image sets. The method further includes computing a representation for each of the plurality of the target words using the plurality of the object labels obtained for each of the plurality of image sets. The method further includes determining whether there is an asymmetric relation between the plurality of the target words using representations computed for the plurality of the target words.

Type: Application

Filed: February 27, 2019

Publication date: August 27, 2020

Inventors: Masayasu Muraoka, Tetsuya Nasukawa, Khan Md. Anwarus Salam
Method for identifying concepts that cause significant deviations of regional distribution in a large data set

Patent number: 10671882

Abstract: A technique for use in analyzing multidimensional data is disclosed. In the technique, a subset of texts specified by a textual feature is selected from the multidimensional data. Each text of the subset is projected into a target image based on the corresponding spatial information to obtain a spatial distribution map for the textual feature. The similarity between the spatial distribution map for the textual feature and each property distribution map for each predefined property is determined. For the similarity exceeding a threshold, the textual feature is outputted as a notable textual feature.

Type: Grant

Filed: November 14, 2017

Date of Patent: June 2, 2020

Assignee: International Business Machines Corporation

Inventors: Tetsuya Nasukawa, Kazuki Sato
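
The comparison described in the abstract above (a spatial distribution map for a textual feature matched against property distribution maps) can be illustrated with a small grid. Cosine similarity, the 2x2 grid, the property names, and the 0.8 threshold are assumptions; texts are assumed to be already mapped to grid cells.

```python
import math


def distribution_map(cells, rows, cols):
    """Count texts per grid cell; `cells` holds (row, col) positions."""
    grid = [0] * (rows * cols)
    for row, col in cells:
        grid[row * cols + col] += 1
    return grid


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either map is empty."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_properties(feature_map, property_maps, threshold):
    """Names of property maps whose similarity to the feature map exceeds the threshold."""
    return [
        name for name, prop_map in property_maps.items()
        if cosine(feature_map, prop_map) > threshold
    ]


# Hypothetical cells of texts that mention one textual feature.
feature_map = distribution_map([(0, 0), (0, 0), (0, 1)], rows=2, cols=2)
property_maps = {"cold_region": [3, 1, 0, 0], "warm_region": [0, 0, 2, 2]}
matches = matching_properties(feature_map, property_maps, threshold=0.8)
```

A non-empty `matches` list would mark the textual feature as notable for those properties.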
IDENTIFYING EXPRESSIONS FOR TARGET CONCEPT WITH IMAGES

Publication number: 20200134055

Abstract: A computer-implemented method for identifying an expression for a target concept includes obtaining a set of texts as a target set of texts, with each text being associated with one of the images relevant to the target concept. Candidate expressions for the target concept are extracted from the target set of texts. The candidate expressions are characteristic of the target set of texts. Each image relevant to one of the candidate expressions is collected by using an image search engine. A target expression for the target concept is selected from the candidate expressions based on a comparison result of the target concept and the collected images.

Type: Application

Filed: October 30, 2018

Publication date: April 30, 2020

Inventors: Tetsuya Nasukawa, Masayasu Muraoka, Khan Md. Anwarus Salam
EXTRACTION OF SEMANTIC RELATION

Publication number: 20200097596

Abstract: A computer-implemented method for extracting semantic relations is disclosed. In the method, a plurality of hierarchal structures that originates from a corpus of documents is obtained. Each hierarchal structure includes a plurality of elements having respective recitations included in a corresponding document. In the method, for each predetermined relationship between ancestor and descendant elements in the hierarchal structures, a first keyword list is extracted from the ancestor element and a second keyword list is extracted from the descendant element. A statistical index is calculated for each pair of first and second keywords using the first keyword lists and the second keyword lists. The index indicates a strength of association between the first and second keywords. In the method, a candidate list of keyword pairs having semantic relationships is output using the statistical index calculated for each pair.

Type: Application

Filed: September 20, 2018

Publication date: March 26, 2020

Inventors: Shoko Suzuki, Tetsuya Nasukawa, Hiroshi Kanayama
TEXT ANALYSIS IN UNSUPPORTED LANGUAGES

Publication number: 20200026761

Abstract: Text analysis includes determining one or more global analysis parameters based on backtranslation of a first corpus between supported languages. A new text analysis model is determined for an unsupported language based on the one or more global analysis parameters and a text analysis model for a first supported language. An input text is analyzed in the unsupported language with the new text analysis model.

Type: Application

Filed: July 20, 2018

Publication date: January 23, 2020

Inventors: Kohichi Kamijoh, Tetsuya Nasukawa, Yohei Ikawa, Masaki Ono
METHOD FOR IDENTIFYING CONCEPTS THAT CAUSE SIGNIFICANT DEVIATIONS OF REGIONAL DISTRIBUTION IN A LARGE DATA SET

Publication number: 20190147291

Abstract: A technique for use in analyzing multidimensional data is disclosed. In the technique, a subset of texts specified by a textual feature is selected from the multidimensional data. Each text of the subset is projected into a target image based on the corresponding spatial information to obtain a spatial distribution map for the textual feature. The similarity between the spatial distribution map for the textual feature and each property distribution map for each predefined property is determined. For the similarity exceeding a threshold, the textual feature is outputted as a notable textual feature.

Type: Application

Filed: November 14, 2017

Publication date: May 16, 2019

Inventors: Tetsuya Nasukawa, Kazuki K. Sato
EXTRACTION OF EXPRESSION FOR NATURAL LANGUAGE PROCESSING

Publication number: 20190095525

Abstract: A computer-implemented method, a computer program product, and a computer system are provided for extracting an expression in a text for natural language processing. The computer system reads a text to generate a plurality of substrings in which each substring includes one or more units appearing in the text. The computer system obtains an image set for each substring, using the one or more units as a query for an image search system, wherein the image set includes one or more images. The computer system calculates a deviation in the image set for each substring. The computer system selects a respective one of the plurality of substrings as an expression to be extracted, based on the deviation and a length of each substring.

Type: Application

Filed: September 27, 2017

Publication date: March 28, 2019

Inventors: Masayasu Muraoka, Tetsuya Nasukawa
