Patents by Inventor Daisuke Takuma

Daisuke Takuma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Apparatus, method, and program for visualizing boolean expressions

Patent number: 7930320

Abstract: An apparatus, a method, and a program for visualizing a Boolean expression so that it is readily recognized what is added to or excluded from conditions. A Boolean expression to be visualized is input in the form of a binary tree in which a leaf node represents an operand in the Boolean expression and a node other than the leaf node represents an operator in the Boolean expression. The input binary tree is transformed into a two-dimensional nested representation composed of a plurality of regions, and a pictorial representation for visualization is drawn on the basis of the nested representation and is displayed. When the Boolean expression is provided in a string expression, the string expression is transformed into a binary tree.
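The abstract describes two steps: parsing a string expression into a binary tree, and turning that tree into nested regions. The Python sketch below shows one way these steps could look. The operator set (AND, OR), the precedence rule, and the layout rule that merges chains of the same operator into a single region are assumptions made for illustration. The bracketed text output stands in for the pictorial drawing.

```python
import re
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    name: str  # operand, e.g. a keyword condition

@dataclass
class Op:
    op: str  # "AND" or "OR" (assumed operator set)
    left: "Node"
    right: "Node"

Node = Union[Leaf, Op]

# Parentheses, the two binary operators, or an operand made of word characters.
TOKEN = re.compile(r"\s*(\(|\)|AND\b|OR\b|\w+)")

def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Recursive-descent parser: AND binds tighter than OR, both left-associative."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = Op("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_atom()
        while peek() == "AND":
            take()
            node = Op("AND", node, parse_atom())
        return node

    def parse_atom():
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return node
        if tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return Leaf(tok)

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return tree

def to_regions(node):
    """Nested regions: a chain of the same operator becomes one region (hypothetical layout rule)."""
    if isinstance(node, Leaf):
        return node.name
    children = []
    for child in (node.left, node.right):
        region = to_regions(child)
        if isinstance(region, tuple) and region[0] == node.op:
            children.extend(region[1])  # same operator: merge into this region
        else:
            children.append(region)     # different operator: keep as a nested region
    return (node.op, children)

def render(region):
    """Text stand-in for the drawing: '|' separates OR alternatives, '&' separates AND conditions."""
    if isinstance(region, str):
        return region
    op, children = region
    sep = " | " if op == "OR" else " & "
    return "[" + sep.join(render(c) for c in children) + "]"
```

For example, `render(to_regions(parse(tokenize("(a OR b) AND c AND (d OR e OR f)"))))` returns `"[[a | b] & c & [d | e | f]]"`: the outer region lists the three conditions that must all hold, and each inner region lists the alternatives added by OR.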

Type: Grant

Filed: October 12, 2007

Date of Patent: April 19, 2011

Assignee: International Business Machines Corporation

Inventors: Kinya Kuriyama, Mariko Nagai, Daisuke Takuma

Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building

Patent number: 7917350

Abstract: Calculates a word n-gram probability with high accuracy in a situation where a first corpus, which is a relatively small corpus containing manually segmented word information, and a second corpus, which is a relatively large corpus, are given as a training corpus, that is, a store containing vast quantities of sample sentences. Vocabulary including contextual information is expanded from words occurring in the relatively small first corpus to words occurring in the relatively large second corpus by using a word n-gram probability estimated from an unknown word model and the raw corpus. The first corpus (word-segmented) is used for calculating n-grams and the probability that the position between two adjacent characters is a word boundary (segmentation probability). The second corpus (word-unsegmented), in which probabilistic word boundaries are assigned based on information in the first corpus, is used for calculating word n-grams.
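The abstract leaves open how probabilistic word boundaries in the unsegmented corpus become n-gram counts. One common approach, assumed here rather than taken from the patent text, is to count every substring as a word occurrence weighted by the probability that there is a boundary at both of its ends and none inside it. The sketch below computes these expected unigram frequencies; the boundary probabilities would come from the segmented corpus, and `max_len` is a hypothetical cap on word length.

```python
from collections import defaultdict

def expected_word_counts(text, boundary_probs, max_len=4):
    """Expected frequency of every substring (up to max_len characters) as a word.

    boundary_probs[k] is the probability of a word boundary between text[k] and
    text[k + 1], e.g. estimated from the manually segmented corpus.
    """
    n = len(text)
    if len(boundary_probs) != n - 1:
        raise ValueError("need one boundary probability per adjacent character pair")
    # b[i] = probability of a boundary just before text[i]; both string ends are certain boundaries.
    b = [1.0] + list(boundary_probs) + [1.0]
    counts = defaultdict(float)
    for start in range(n):
        no_inner = 1.0  # probability that no boundary falls inside text[start:end]
        for end in range(start + 1, min(n, start + max_len) + 1):
            counts[text[start:end]] += b[start] * no_inner * b[end]
            no_inner *= 1.0 - b[end]  # extending the word makes b[end] an inner position
    return dict(counts)
```

For the string "ab" with a boundary probability of 0.25 between the two characters, the result is `{"a": 0.25, "ab": 0.75, "b": 0.25}`. Summing such values over every sentence of the unsegmented corpus gives expected word frequencies; higher-order n-gram counts would extend the same boundary conditions across consecutive words.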

Type: Grant

Filed: May 26, 2008

Date of Patent: March 29, 2011

Assignee: International Business Machines Corporation

Inventors: Shinsuke Mori, Daisuke Takuma

CREATING A TERMS DICTIONARY WITH NAMED ENTITIES OR TERMINOLOGIES INCLUDED IN TEXT DATA

Publication number: 20100174528

Abstract: A computer system of an embodiment of the disclosure can be used to automatically create or populate a terms dictionary using a set of computing units. A morphological analysis unit can acquire token sequence data by performing morphological analysis for the text data. A category distinguishing unit can distinguish tokens of the token sequence data by using a category dictionary to extract uncategorized words. An uncategorized-word comparing unit can compare each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word. A token-sequence comparing unit can compare a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words. A permission unit can permit a user to select whether to register the registration candidate words in the category dictionary.
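A minimal sketch of the candidate-extraction pipeline described above. The token list stands in for the output of a morphological analyzer; the part-of-speech labels, the two rule formats (a regular expression per word, a POS-tag sequence per token run), and the `approve` callback that models the user's decision are all hypothetical.

```python
import re
from typing import Callable

# (surface form, part of speech): assumed output format of a morphological analyzer
Token = tuple[str, str]

def uncategorized(tokens: list[Token], category_dict: dict[str, str]) -> list[str]:
    """Nouns that the category dictionary does not yet know."""
    return [s for s, pos in tokens if pos == "NOUN" and s not in category_dict]

def match_word_rule(words: list[str], pattern: str) -> list[str]:
    """Hypothetical uncategorized-word rule: a regular expression the whole word must match."""
    return [w for w in words if re.fullmatch(pattern, w)]

def match_sequence_rule(tokens: list[Token], category_dict: dict[str, str],
                        pos_pattern: list[str]) -> list[str]:
    """Hypothetical token-sequence rule: a run of uncategorized tokens whose POS tags
    equal pos_pattern becomes one multi-word candidate."""
    k, found = len(pos_pattern), []
    for i in range(len(tokens) - k + 1):
        window = tokens[i:i + k]
        if [pos for _, pos in window] == pos_pattern and all(s not in category_dict for s, _ in window):
            found.append(" ".join(s for s, _ in window))
    return found

def register(candidates: list[str], category_dict: dict[str, str], category: str,
             approve: Callable[[str], bool]) -> dict[str, str]:
    """`approve` stands in for the user's register-or-skip decision."""
    for word in dict.fromkeys(candidates):  # drop duplicates, keep order
        if approve(word):
            category_dict[word] = category
    return category_dict
```

With the tokens `Acme/NOUN ships/VERB Turbo/NOUN Widget/NOUN in/ADP Osaka/NOUN` and a dictionary that already knows "Acme" and "Osaka", the word rule `[A-Z][a-z]+` proposes "Turbo" and "Widget", and the sequence rule `["NOUN", "NOUN"]` proposes "Turbo Widget"; only the candidates the user approves are added.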

Type: Application

Filed: January 4, 2010

Publication date: July 8, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: HIROKI OYA, DAISUKE TAKUMA, HIROBUMI TOYOSHIMA

Methods and Apparatus for Optimizing Keyword Data Analysis

Publication number: 20090248628

Abstract: Techniques for analyzing keyword data for quality management purposes are provided. One or more keywords are selected, each representing a category of quality management. A keyword time series is prepared for each selected keyword. A set of fixed form time series is also prepared for each selected keyword; the set comprises one or more fixed form time series representing statistical data related to the selected keywords. One or more correction sets, each comprising one or more correction parameters, are obtained. Each correction parameter corresponds to one of the fixed form time series within each set of fixed form time series. A set of corrected time series is generated for each correction set.
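The abstract does not say how a correction parameter is applied to its fixed form time series. As one hypothetical rule, the sketch below divides the keyword time series by each fixed form series raised to its parameter, so a parameter of 0 ignores that series and 1 normalizes fully by it (for example, complaint counts per unit shipped). One corrected series is produced per correction set, as the abstract describes.

```python
import math

def corrected_series(keyword_series, fixed_series, params):
    """Hypothetical correction: keyword[t] / prod_k fixed_k[t] ** params[k]."""
    if len(fixed_series) != len(params):
        raise ValueError("one correction parameter per fixed form series")
    out = []
    for t, value in enumerate(keyword_series):
        denom = math.prod(series[t] ** p for series, p in zip(fixed_series, params))
        out.append(value / denom)
    return out

def all_corrections(keyword_series, fixed_series, correction_sets):
    """One corrected time series per correction set, keyed by the parameter tuple."""
    return {tuple(params): corrected_series(keyword_series, fixed_series, params)
            for params in correction_sets}
```

With keyword counts `[10, 20, 30]`, fixed form series `[[100, 200, 300], [1, 2, 1]]`, and correction sets `(0, 0)`, `(1, 0)`, `(1, 1)`, the corrected series are `[10.0, 20.0, 30.0]`, `[0.1, 0.1, 0.1]`, and `[0.1, 0.05, 0.1]`, which makes it easy to compare how each correction changes the apparent trend.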

Type: Application

Filed: March 28, 2008

Publication date: October 1, 2009

Inventors: Hirobumi Toyoshima, Daisuke Takuma, Hiroki Oya

INFORMATION SEARCH SYSTEM, METHOD AND PROGRAM

Publication number: 20090222407

Abstract: A system, method and computer program product for searching at high speed for documents matching a dependency pattern from document data containing a large volume of text documents. The system includes a storage device for storing the document data, index storage means for storing occurrence information in the storage device, receiving means for receiving information, reading means for reading from the index storage means, and searching means for comparing occurrence information. The method and computer program product include the steps of storing in the storage device, receiving information, reading from the storage device, comparing occurrence information, and searching. The computer program product includes instructions to execute the steps of storing each of the plurality of document data in the storage device and storing occurrence information in the storage device.
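The abstract is stated only in terms of means and steps. As a hypothetical illustration, the sketch below stores, for each word, occurrence information of the form (document id, token position, position of the token's syntactic head), assumed to come from a dependency parser. It answers a two-word dependency pattern by joining the two words' occurrence lists, so no document has to be re-parsed at search time.

```python
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: [(word, head_position), ...]}; head_position is -1 for the root (assumed format)."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for i, (word, head) in enumerate(tokens):
            index[word].append((doc_id, i, head))  # occurrence information for this word
    return index

def search(index, head_word, dep_word):
    """Ids of documents in which dep_word directly depends on head_word."""
    heads = {(doc_id, i) for doc_id, i, _ in index.get(head_word, [])}
    return sorted({doc_id for doc_id, _, head in index.get(dep_word, [])
                   if (doc_id, head) in heads})
```

Comparing occurrence lists keeps the search cost proportional to the number of occurrences of the two words rather than to the size of the document collection.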

Type: Application

Filed: March 3, 2009

Publication date: September 3, 2009

Inventors: Daisuke Takuma, Yuta Tsuboi

System of effectively searching text for keyword, and method thereof

Patent number: 7584184

Abstract: A system of the present invention stores: a first index that maps the identifier of each text to the list of keywords contained in that text; a second index that maps the identifier of each keyword to the list of texts containing that keyword; and the number of texts containing each keyword. Upon receiving a text search condition, the system estimates the search time using the first index and the search time using the second index, and determines which of the two indexes makes the search faster. Using that index, the system then searches for keywords that appear with higher frequency in the texts satisfying the text search condition.
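The sketch below keeps both indexes and the per-keyword text counts described in the abstract, and restricts the text search condition to a single keyword so the number of matching texts can be read directly from the stored counts. The cost model is hypothetical: the first index pays a per-text random-access cost (`seek_cost`, an arbitrary illustrative value) plus the average keyword-list length for each matching text, while the second index pays one step per posting because every keyword's posting list is scanned.

```python
from collections import Counter

class DualIndex:
    """First index: text -> keywords. Second index: keyword -> texts. Plus texts per keyword."""

    def __init__(self, texts, seek_cost=2.0):
        self.forward = {tid: set(kws) for tid, kws in texts.items()}
        self.inverted = {}
        for tid, kws in self.forward.items():
            for kw in kws:
                self.inverted.setdefault(kw, set()).add(tid)
        self.df = {kw: len(tids) for kw, tids in self.inverted.items()}
        self.total_postings = sum(self.df.values())
        self.avg_len = self.total_postings / max(len(self.forward), 1)
        self.seek_cost = seek_cost  # hypothetical per-text random-access cost

    def estimate_costs(self, query_kw):
        n_matching = self.df.get(query_kw, 0)  # taken from the stored counts
        forward_cost = n_matching * (self.seek_cost + self.avg_len)
        inverted_cost = float(self.total_postings)
        return forward_cost, inverted_cost

    def cooccurring(self, query_kw):
        """Count, for each other keyword, the matching texts it appears in, via the cheaper index."""
        matching = self.inverted.get(query_kw, set())
        forward_cost, inverted_cost = self.estimate_costs(query_kw)
        counts = Counter()
        if forward_cost <= inverted_cost:
            strategy = "forward"
            for tid in matching:  # read only the keyword lists of matching texts
                counts.update(self.forward[tid] - {query_kw})
        else:
            strategy = "inverted"
            for kw, tids in self.inverted.items():  # scan every posting list
                if kw != query_kw and tids & matching:
                    counts[kw] = len(tids & matching)
        return strategy, dict(counts)
```

Both strategies return the same counts; only the amount of work differs. A rare query keyword favors the first index, while a common one favors scanning the second, which is the trade-off the estimate is meant to capture.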

Type: Grant

Filed: November 2, 2006

Date of Patent: September 1, 2009

Assignee: International Business Machines Corporation

Inventors: Daisuke Takuma, Issei Yoshida, Yuta Tsuboi

Document data retrieval and reporting

Patent number: 7571383

Abstract: Enables retrieving document data appropriately reflecting content of a retrieval statement and detecting problems in sequentially added document data.

Type: Grant

Filed: July 13, 2005

Date of Patent: August 4, 2009

Assignee: International Business Machines Corporation

Inventors: Hiroshi Nomiyama, Daisuke Takuma

SYSTEM OF EFFECTIVELY SEARCHING TEXT FOR KEYWORD, AND METHOD THEREOF

Publication number: 20090030892

Abstract: A system of the present invention stores: a first index that maps the identifier of each text to the list of keywords contained in that text; a second index that maps the identifier of each keyword to the list of texts containing that keyword; and the number of texts containing each keyword. Upon receiving a text search condition, the system estimates the search time using the first index and the search time using the second index, and determines which of the two indexes makes the search faster. Using that index, the system then searches for keywords that appear with higher frequency in the texts satisfying the text search condition.

Type: Application

Filed: March 26, 2008

Publication date: January 29, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Daisuke Takuma, Issei Yoshida, Yuta Tsuboi

SYSTEM, METHOD AND PROGRAM FOR CREATING INDEX FOR DATABASE

Publication number: 20080319987

Abstract: An entire document set is partitioned into disjoint subsets. Next, the set of keywords appearing in each subset is divided into groups according to the remainder obtained by dividing a hash value of each keyword by a fixed integer, and an index file is created for each group. Among the index files prepared for the respective subsets, those having the same group number are merged, creating an integrated index file for each group number. There are, however, as many such index files as there are group numbers, and they do not yet form an index for the entire document set. Therefore, the index files for all group numbers are next merged into one, creating an index file for the entire document set.
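A minimal in-memory sketch of the four steps, with Python dictionaries standing in for index files. Splitting the documents round-robin into subsets and using CRC-32 as the keyword hash are assumptions; Python's built-in `hash()` for strings is randomized per process, so it would not give stable group numbers across runs.

```python
import zlib
from collections import defaultdict

def group_of(keyword, num_groups):
    """Group number = stable hash of the keyword modulo a fixed integer."""
    return zlib.crc32(keyword.encode("utf-8")) % num_groups

def build_partial_indexes(subset, num_groups):
    """subset: {doc_id: [keywords]} -> one postings dict ("index file") per group number."""
    groups = [defaultdict(list) for _ in range(num_groups)]
    for doc_id in sorted(subset):
        for kw in set(subset[doc_id]):
            groups[group_of(kw, num_groups)][kw].append(doc_id)
    return groups

def merge(indexes):
    merged = defaultdict(list)
    for index in indexes:
        for kw, postings in index.items():
            merged[kw].extend(postings)
    return {kw: sorted(p) for kw, p in merged.items()}

def build_index(docs, num_subsets=2, num_groups=3):
    doc_ids = sorted(docs)
    # 1. Split the document set into disjoint subsets (round-robin, an assumption).
    subsets = [{d: docs[d] for d in doc_ids[i::num_subsets]} for i in range(num_subsets)]
    # 2. For each subset, create one index per hash group.
    partial = [build_partial_indexes(s, num_groups) for s in subsets]
    # 3. Merge the indexes that share a group number across subsets.
    per_group = [merge(p[g] for p in partial) for g in range(num_groups)]
    # 4. Merge the per-group indexes into one index for the entire document set.
    return merge(per_group)
```

Because each keyword falls into exactly one group, the final step amounts to concatenating the per-group indexes; only the same-group merge in step 3 has to combine postings for the same keyword, and those merges are independent of each other.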

Type: Application

Filed: June 3, 2008

Publication date: December 25, 2008

Inventors: Daisuke Takuma, Issei Yoshida

Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building

Publication number: 20080228463

Abstract: Calculates a word n-gram probability with high accuracy in a situation where a first corpus, which is a relatively small corpus containing manually segmented word information, and a second corpus, which is a relatively large corpus, are given as a training corpus, that is, a store containing vast quantities of sample sentences. Vocabulary including contextual information is expanded from words occurring in the relatively small first corpus to words occurring in the relatively large second corpus by using a word n-gram probability estimated from an unknown word model and the raw corpus. The first corpus (word-segmented) is used for calculating n-grams and the probability that the position between two adjacent characters is a word boundary (segmentation probability). The second corpus (word-unsegmented), in which probabilistic word boundaries are assigned based on information in the first corpus, is used for calculating word n-grams.

Type: Application

Filed: May 26, 2008

Publication date: September 18, 2008

Inventors: Shinsuke Mori, Daisuke Takuma

METHOD AND DEVICE FOR EVALUATING A TREND ANALYSIS SYSTEM

Publication number: 20080126160

Abstract: A device for evaluating a trend analysis system comprises: an allowable value input unit for receiving allowable values of false positives and allowable values of false negatives made by the trend analysis system; and an accuracy computation unit for computing an accuracy of the trend analysis system as a function of the allowable values of false positives and the allowable values of false negatives.
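The abstract specifies only that accuracy is computed as a function of the two allowable values. One hypothetical definition, used below, is the share of evaluation runs in which the system's false positives and false negatives both stay within the allowances. Evaluating it over a grid of allowances gives an accuracy surface rather than a single number.

```python
def accuracy(results, allowed_fp, allowed_fn):
    """results: one (false_positives, false_negatives) pair per evaluation run (assumed input)."""
    if not results:
        raise ValueError("need at least one evaluation run")
    ok = sum(1 for fp, fn in results if fp <= allowed_fp and fn <= allowed_fn)
    return ok / len(results)

def accuracy_surface(results, fp_values, fn_values):
    """Accuracy for every combination of allowable false positives and false negatives."""
    return {(a, b): accuracy(results, a, b) for a in fp_values for b in fn_values}
```

For runs with error counts `(0, 1)`, `(2, 0)`, `(1, 1)`, `(3, 2)`, allowing one false positive and one false negative gives an accuracy of 0.5, and relaxing the false-positive allowance to 2 raises it to 0.75, showing how the result depends on the tolerances a user is willing to accept.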

Type: Application

Filed: November 29, 2007

Publication date: May 29, 2008

Inventors: Hironori Takuechi, Daisuke Takuma

APPARATUS, METHOD, AND PROGRAM FOR VISUALIZING BOOLEAN EXPRESSIONS

Publication number: 20080104088

Abstract: An apparatus, a method, and a program for visualizing a Boolean expression so that it is readily recognized what is added to or excluded from conditions. A Boolean expression to be visualized is input in the form of a binary tree in which a leaf node represents an operand in the Boolean expression and a node other than the leaf node represents an operator in the Boolean expression. The input binary tree is transformed into a two-dimensional nested representation composed of a plurality of regions, and a pictorial representation for visualization is drawn on the basis of the nested representation and is displayed. When the Boolean expression is provided in a string expression, the string expression is transformed into a binary tree.

Type: Application

Filed: October 12, 2007

Publication date: May 1, 2008

Applicant: International Business Machines Corporation

Inventors: Kinya Kuriyama, Mariko Nagai, Daisuke Takuma

CHARACTER STRING PROCESSING METHOD, APPARATUS, AND PROGRAM

Publication number: 20070157123

Abstract: Disclosed as a first aspect is a method including the steps of analyzing a character string in a document into partial character strings; calculating, for each partial character string, a score incorporating the appearance frequency of that partial character string; presenting the partial character strings and their scores to a user; determining which of the partial character strings the user has selected; storing the selected partial character strings as a safe partial character string list; and replacing, with predetermined replacement character strings, the partial character strings that do not exist in the safe partial character string list.
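A minimal sketch of the claimed steps, assuming that partial strings are runs of word characters, that the score is the raw appearance count, and that a `select` callback stands in for the interactive user choice. These are illustrative choices; the abstract does not fix any of them. Every partial string not on the safe list is replaced with a fixed string.

```python
import re
from collections import Counter

PART = re.compile(r"\w+")  # assumed split into partial strings: runs of word characters

def score_parts(documents):
    """Hypothetical score: total appearance count of each partial string."""
    return Counter(p for doc in documents for p in PART.findall(doc))

def build_safe_list(scores, select):
    """`select(part, score)` stands in for the user picking parts after seeing their scores."""
    return {part for part, score in scores.most_common() if select(part, score)}

def mask(document, safe, replacement="***"):
    """Replace every partial string that is not on the safe list."""
    return PART.sub(lambda m: m.group(0) if m.group(0) in safe else replacement, document)
```

Given the two documents "Customer Tanaka called about error 504" and "Customer Sato called about error 504", keeping parts that appear at least twice makes `mask` return "Customer *** called about error 504" for the first one: frequent, generic parts survive while one-off names are replaced.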

Type: Application

Filed: December 8, 2006

Publication date: July 5, 2007

Inventors: Yohei Ikawa, Hiroshi Kanayama, Daisuke Takuma

SYSTEM OF EFFECTIVELY SEARCHING TEXT FOR KEYWORD, AND METHOD THEREOF

Publication number: 20070136274

Abstract: A system of the present invention stores: a first index that maps the identifier of each text to the list of keywords contained in that text; a second index that maps the identifier of each keyword to the list of texts containing that keyword; and the number of texts containing each keyword. Upon receiving a text search condition, the system estimates the search time using the first index and the search time using the second index, and determines which of the two indexes makes the search faster. Using that index, the system then searches for keywords that appear with higher frequency in the texts satisfying the text search condition.

Type: Application

Filed: November 2, 2006

Publication date: June 14, 2007

Inventors: Daisuke Takuma, Issei Yoshida, Yuta Tsuboi

Document data retrieval and reporting

Publication number: 20060015486

Abstract: Enables retrieving document data appropriately reflecting content of a retrieval statement and detecting problems in sequentially added document data.

Type: Application

Filed: July 13, 2005

Publication date: January 19, 2006

Applicant: International Business Machines Corporation

Inventors: Hiroshi Nomiyama, Daisuke Takuma

Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building

Publication number: 20060015326

Abstract: Calculates a word n-gram probability with high accuracy in a situation where a first corpus, which is a relatively small corpus containing manually segmented word information, and a second corpus, which is a relatively large corpus, are given as a training corpus, that is, a store containing vast quantities of sample sentences. Vocabulary including contextual information is expanded from words occurring in the relatively small first corpus to words occurring in the relatively large second corpus by using a word n-gram probability estimated from an unknown word model and the raw corpus. The first corpus (word-segmented) is used for calculating n-grams and the probability that the position between two adjacent characters is a word boundary (segmentation probability). The second corpus (word-unsegmented), in which probabilistic word boundaries are assigned based on information in the first corpus, is used for calculating word n-grams.

Type: Application

Filed: July 13, 2005

Publication date: January 19, 2006

Applicant: International Business Machines Corporation

Inventors: Shinsuke Mori, Daisuke Takuma
