Patents by Inventor Takahiko Kawatani

Takahiko Kawatani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and vector analysis for a document

Patent number: 8171026

Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents. The inventive method detects terms that occur in the input document, segments the input document into document segments, each segment being an appropriately sized chunk, and generates document segment vectors, each vector having as its elements values according to occurrence frequencies of the terms occurring in the document segments. The method further calculates eigenvalues and eigenvectors of a square sum matrix in which a rank of the respective document segment vector is represented by R, and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining the importance.

Type: Grant

Filed: April 16, 2009

Date of Patent: May 1, 2012

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani

Method and vector analysis for a document

Publication number: 20090216759

Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents. The inventive method detects terms that occur in the input document, segments the input document into document segments, each segment being an appropriately sized chunk, and generates document segment vectors, each vector having as its elements values according to occurrence frequencies of the terms occurring in the document segments. The method further calculates eigenvalues and eigenvectors of a square sum matrix in which a rank of the respective document segment vector is represented by R, and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining the importance.

Type: Application

Filed: April 16, 2009

Publication date: August 27, 2009

Applicant: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani

Method of vector analysis for a document

Patent number: 7562066

Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents. The inventive method detects terms that occur in the input document, segments the input document into document segments, each segment being an appropriately sized chunk, and generates document segment vectors, each vector having as its elements values according to occurrence frequencies of the terms occurring in the document segments. The method further calculates eigenvalues and eigenvectors of a square sum matrix in which a rank of the respective document segment vector is represented by R, and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining the importance.

Type: Grant

Filed: November 15, 2001

Date of Patent: July 14, 2009

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani
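
The abstract above describes building term-frequency vectors for document segments, forming their square sum matrix, and using its eigenvectors to judge importance. A minimal Python sketch of that idea follows. It is not the patented method: it keeps only the leading eigenvector (L = 1) found by power iteration, splits terms on whitespace, and scores each segment by its squared projection, all of which are assumptions made for illustration. Every function name is hypothetical.

```python
import math
from collections import Counter


def segment_vectors(segments):
    """Return (vocabulary, term-frequency vectors), one vector per segment."""
    vocab = sorted({t for s in segments for t in s.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for s in segments:
        v = [0.0] * len(vocab)
        for term, count in Counter(s.lower().split()).items():
            v[index[term]] = float(count)
        vectors.append(v)
    return vocab, vectors


def square_sum_matrix(vectors):
    """Sum of outer products d * d^T over all segment vectors d."""
    n = len(vectors[0])
    S = [[0.0] * n for _ in range(n)]
    for d in vectors:
        for i in range(n):
            if d[i]:
                for j in range(n):
                    S[i][j] += d[i] * d[j]
    return S


def leading_eigenpair(S, iterations=200):
    """Power iteration: approximate the largest eigenvalue and its eigenvector."""
    n = len(S)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


def segment_importance(segments):
    """Assumed scoring rule: squared projection onto the leading eigenvector."""
    _, vectors = segment_vectors(segments)
    _, phi = leading_eigenpair(square_sum_matrix(vectors))
    return [sum(a * b for a, b in zip(d, phi)) ** 2 for d in vectors]
```

Called as `segment_importance(["cat sat mat", "cat sat", "dog"])`, the sketch ranks the first segment highest and gives the unrelated third segment a score near zero, because that segment shares no terms with the dominant direction.
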
Document clustering method and apparatus based on common information of documents

Patent number: 7499923

Abstract: In document (or pattern) clustering, the correct number of clusters and accurate assignment of each document (or pattern) to the correct cluster are attained. Documents (or patterns) describing the same topic (or object) are grouped, so a document (or pattern) group belonging to the same cluster has some commonality. Each topic (or object) has distinctive terms (or object features) or term (or object feature) pairs. When the closeness of each document (or pattern) to a given cluster is obtained, common information about the given cluster is extracted and used while the influence of terms (or object features) or term (or object feature) pairs not distinctive to the given cluster is excluded.

Type: Grant

Filed: March 4, 2004

Date of Patent: March 3, 2009

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani

Evaluating commonality of documents using segment vector, co-occurrence matrix, and common co-occurrence matrix

Patent number: 7392175

Abstract: In evaluating commonality of documents, each sentence is represented by a binary vector whose components indicate the presence or absence of corresponding terms, whereupon the concept of a common vector among documents is introduced. One sentence vector is derived from each of the documents to form a group of sentence groups, and only components which assume “1” (one) in all the vectors are “1”, the other components being “0” (zero). The commonality of a document set is evaluated by employing the sum or squared sum of the numbers of components whose values are not zero in the individual common vectors, for all the common vectors.

Type: Grant

Filed: October 29, 2003

Date of Patent: June 24, 2008

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani
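
The common-vector idea in the abstract above can be sketched in a few lines of Python. The sketch assumes each document is given as a list of sentence strings, that terms are whitespace-separated tokens, and that every combination of one sentence per document is enumerated; the function names are hypothetical and the code is an illustration rather than the patented procedure.

```python
from itertools import product


def sentence_vectors(document, vocab):
    """One binary tuple per sentence: 1 where the term occurs, else 0."""
    vectors = []
    for sentence in document:
        present = set(sentence.lower().split())
        vectors.append(tuple(1 if term in present else 0 for term in vocab))
    return vectors


def commonality(documents, squared=False):
    """Sum (or squared sum) of nonzero counts over all common vectors."""
    vocab = sorted({t for doc in documents for s in doc for t in s.lower().split()})
    per_document = [sentence_vectors(doc, vocab) for doc in documents]
    total = 0
    # Take one sentence vector from each document and AND them component-wise.
    for combination in product(*per_document):
        common = [min(column) for column in zip(*combination)]
        nonzero = sum(common)
        total += nonzero * nonzero if squared else nonzero
    return total
```

With `[["the cat sat", "dogs bark"], ["the cat ran", "birds sing"]]` as input, only one sentence pair shares terms ("the" and "cat"), so the plain sum is 2 and the squared sum is 4. The number of combinations grows as the product of the sentence counts, so this brute-force enumeration only suits small inputs.
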
Document segmentation method

Patent number: 7308138

Abstract: A document segmentation method of detecting segmentation points where a topic of an input document is discontinuous before and after the point to divide the document into plural blocks includes: detecting terms that occur in an input document; segmenting the input document into document segments, each segment being an appropriately sized chunk; generating document segment vectors having as their elements values related to frequencies of the terms occurring in the document segments; calculating eigenvalues and eigenvectors of a square sum matrix of the document segment vectors; selecting the basis vectors constituting a subspace from the eigenvectors to calculate the topic continuity of the document segments; calculating vectors having as their elements the values corresponding to the projection values of each document segment vector onto the basis vectors; and determining segmentation points of the document based on the continuity of the projected vectors.

Type: Grant

Filed: November 16, 2001

Date of Patent: December 11, 2007

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani
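
The last two steps of the abstract above (projecting segment vectors onto basis vectors, then looking for breaks in continuity) can be sketched as follows. The sketch assumes the basis vectors have already been obtained from the eigen-analysis and are passed in. It also assumes that continuity is measured by cosine similarity between adjacent projected vectors against a fixed threshold; the abstract does not specify the measure, so both choices and all names are illustrative.

```python
import math


def project(vector, basis):
    """Coordinates of a segment vector in the subspace spanned by basis."""
    return [sum(a * b for a, b in zip(vector, axis)) for axis in basis]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def segmentation_points(segment_vectors, basis, threshold=0.5):
    """Indices i where the topic is assumed to break between segment i-1 and i."""
    projected = [project(d, basis) for d in segment_vectors]
    return [
        i + 1
        for i in range(len(projected) - 1)
        if cosine(projected[i], projected[i + 1]) < threshold
    ]
```

For four segments whose first two use one pair of terms and whose last two use a different pair, with one basis vector per pair, the sketch reports a single segmentation point at index 2.
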
Method of extracting important terms, phrases, and sentences

Patent number: 7225120

Abstract: A computer extracts important terms, phrases or sentences from a document that it segments. The computer generates a square sum matrix from the document segments. The computer determines the importance of a given term, phrase or sentence on the basis of eigenvectors and eigenvalues of the matrix. The computer thereby selects the important terms, phrases or sentences related to the central concepts of the document.

Type: Grant

Filed: May 30, 2002

Date of Patent: May 29, 2007

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani

Evaluating distinctiveness of document

Patent number: 7200802

Abstract: Two document sets are compared in natural language processing and the distinctiveness of each constituent element (such as a sentence, term or phrase) of one document set is evaluated by dividing both the target and comparison documents into document segments, constructing the sentence vector of each document segment whose components are the occurring frequencies of terms occurring in the document segment, and projecting all the sentence vectors of both the documents on a projection axis to find a projection axis which maximizes a ratio equal to: (squared sum of projected values originating from the target document)/(squared sum of projected values originating from the comparison document). Projected values are obtained by projecting the sentence vectors on the projection axis, and the degrees of distinctiveness of the individual sentences of the target document are calculated on the basis of the projected values.

Type: Grant

Filed: June 13, 2003

Date of Patent: April 3, 2007

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani
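
The ratio in the abstract above can be written compactly. The notation here is an assumption introduced for illustration and does not come from the patent: t_k are the sentence vectors of the target document, c_m those of the comparison document, and a is the projection axis.

```latex
J(\mathbf{a})
  = \frac{\sum_{k} \left(\mathbf{a}^{\top}\mathbf{t}_k\right)^{2}}
         {\sum_{m} \left(\mathbf{a}^{\top}\mathbf{c}_m\right)^{2}}
  = \frac{\mathbf{a}^{\top} S_T\, \mathbf{a}}{\mathbf{a}^{\top} S_C\, \mathbf{a}},
\qquad
S_T = \sum_{k} \mathbf{t}_k \mathbf{t}_k^{\top},
\quad
S_C = \sum_{m} \mathbf{c}_m \mathbf{c}_m^{\top}.
```

Under the additional assumption that S_C is invertible, this is a generalized Rayleigh quotient: it is maximized by the eigenvector with the largest eigenvalue λ of the generalized problem S_T a = λ S_C a, and the maximum value of J equals that λ.
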
Document and information retrieval method and apparatus

Patent number: 7194461

Abstract: A data processing unit is programmed to retrieve a document item and/or an information item from a plurality of document items and/or information items. Each of the items is identified by an index vector. The retrieval (15) is in response to a query (11) including plural query terms related to each other by Boolean logic. The program causes the data processing unit to transform the query into vector form through matrix calculations (12) and to measure the similarities of the item index vectors and the vector form of the query to determine which of the items correspond with the query.

Type: Grant

Filed: March 1, 2002

Date of Patent: March 20, 2007

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani

Document classification method and apparatus

Patent number: 7185008

Abstract: A document is classified into at least one document class by selecting terms for use in the classification from among terms that occur in the document. A similarity between the input document and each class is calculated using information saved for every document class. The calculated similarity to each class is corrected. The class to which the input document belongs is determined in accordance with the corrected similarity to each class.

Type: Grant

Filed: February 27, 2003

Date of Patent: February 27, 2007

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani

Classification evaluation system, method, and program

Publication number: 20050097436

Abstract: A document classification system automatically sorts an input document into pre-determined document classes by matching the input document to class models. The content of the input documents changes with time and the class models deteriorate. Similarities between a training document set and an actual document set (which is classified into multiple classes) are calculated with respect to each class. A class with a low similarity is selected. Alternatively, classes where deterioration has occurred are detected by calculating similarities between the training document set in each individual class and the actual document set in all other classes. Class-pairs with low similarities are calculated. Close topic class-pairs are detected by calculating similarities between the training document set and all the class-pairs. Class-pairs with low similarities are selected.

Type: Application

Filed: October 29, 2004

Publication date: May 5, 2005

Inventor: Takahiko Kawatani

Document and pattern clustering method and apparatus

Publication number: 20040230577

Abstract: In document (or pattern) clustering, the correct number of clusters and accurate assignment of each document (or pattern) to the correct cluster are attained. Documents (or patterns) describing the same topic (or object) are grouped, so a document (or pattern) group belonging to the same cluster has some commonality. Each topic (or object) has distinctive terms (or object features) or term (or object feature) pairs. When the closeness of each document (or pattern) to a given cluster is obtained, common information about the given cluster is extracted and used while the influence of terms (or object features) or term (or object feature) pairs not distinctive to the given cluster is excluded.

Type: Application

Filed: March 4, 2004

Publication date: November 18, 2004

Inventor: Takahiko Kawatani

Method and apparatus for pattern recognition using a recognition dictionary partitioned into subcategories

Patent number: 6778704

Abstract: A pattern recognition method that determines the category of an unknown pattern. The category is one of a set of categories corresponding to a set of known patterns. A subcategory-level recognition dictionary is provided that stores reference information for each one of plural subcategories obtained by partitioning the categories constituting the category set. A pattern signal representing the unknown pattern is received (12) and is processed to extract a feature vector from it. The reference information of one subcategory of each category in the recognition dictionary is selected (14, 16) from the recognition dictionary in response to the feature vector. Finally, a distance between the feature vector and the reference information of the subcategory of each category selected in the selecting step is determined (18) to determine the category of the unknown pattern.

Type: Grant

Filed: January 4, 2002

Date of Patent: August 17, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani

Evaluating commonality of documents

Publication number: 20040093557

Abstract: In evaluating commonality of documents, each sentence is represented by a binary vector whose components indicate the presence or absence of corresponding terms, whereupon the concept of a common vector among documents is introduced. One sentence vector is derived from each of the documents to form a group of sentence groups, and only components which assume “1” (one) in all the vectors are “1”, the other components being “0” (zero). The commonality of a document set is evaluated by employing the sum or squared sum of the numbers of components whose values are not zero in the individual common vectors, for all the common vectors.

Type: Application

Filed: October 29, 2003

Publication date: May 13, 2004

Inventor: Takahiko Kawatani

Document segmentation method

Publication number: 20040086178

Abstract: The invention provides a document segmentation method of detecting segmentation points where a topic of an input document is discontinuous before and after the point to divide the document into plural blocks.

Type: Application

Filed: December 8, 2003

Publication date: May 6, 2004

Inventor: Takahiko Kawatani

Document and information retrieval method and apparatus

Publication number: 20040078363

Abstract: A data processing unit is programmed to retrieve a document item and/or an information item from a plurality of document items and/or information items. Each of the items is identified by an index vector. The retrieval (15) is in response to a query (11) including plural query terms related to each other by Boolean logic. The program causes the data processing unit to transform the query into vector form through matrix calculations (12) and to measure the similarities of the item index vectors and the vector form of the query to determine which of the items correspond with the query.

Type: Application

Filed: September 2, 2003

Publication date: April 22, 2004

Inventor: Takahiko Kawatani

Method of vector analysis for a document

Publication number: 20040068396

Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents.

Type: Application

Filed: September 22, 2003

Publication date: April 8, 2004

Inventor: Takahiko Kawatani

Evaluating distinctiveness of document

Publication number: 20040006736

Abstract: Two document sets are compared in natural language processing and the distinctiveness of each constituent element (such as a sentence, term or phrase) of one document set is evaluated by dividing both the target and comparison documents into document segments, constructing the sentence vector of each document segment whose components are the occurring frequencies of terms occurring in the document segment, and projecting all the sentence vectors of both the documents on a projection axis to find a projection axis which maximizes a ratio equal to: (squared sum of projected values originating from the target document)/(squared sum of projected values originating from the comparison document). Projected values are obtained by projecting the sentence vectors on the projection axis, and the degrees of distinctiveness of the individual sentences of the target document are calculated on the basis of the projected values.

Type: Application

Filed: June 13, 2003

Publication date: January 8, 2004

Inventor: Takahiko Kawatani

Method and apparatus for recognizing patterns

Patent number: 6671404

Abstract: A pattern recognition apparatus that comprises an input section, a feature extraction module, a feature transform module, a recognition section that includes a recognition dictionary, and a categorizer. The input section receives input patterns that include a pattern belonging to one of plural categories constituting a category set. The feature extraction module expresses features of the pattern as a feature vector. The feature transform module uses transform vector matrices to transform at least part of the feature vector to generate an at least partially transformed feature vector corresponding to each of the categories. The transform vector matrices include a transform vector matrix generated in response to a rival pattern set composed of rival patterns misrecognized as belonging to plural ones of the categories. The plural ones of the categories constitute a category subset. The at least partially transformed feature vector is common to the ones of the categories constituting the category subset.

Type: Grant

Filed: January 22, 2000

Date of Patent: December 30, 2003

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Takahiko Kawatani, Hiroyuki Shimizu

Document classification method and apparatus

Publication number: 20030167267

Abstract: A document is classified into at least one document class by selecting terms for use in the classification from among terms that occur in the document. A similarity between the input document and each class is calculated using information saved for every document class. The calculated similarity to each class is corrected. The class to which the input document belongs is determined in accordance with the corrected similarity to each class.

Type: Application

Filed: February 27, 2003

Publication date: September 4, 2003

Inventor: Takahiko Kawatani
