Patents by Inventor Takahiko Kawatani

Takahiko Kawatani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8171026
    Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents. The inventive method detects terms that occur in the input document, segments the input document into document segments, each segment being an appropriately sized chunk, and generates document segment vectors, each vector having as its elements values corresponding to the occurrence frequencies of the terms occurring in the document segments. The method further calculates eigenvalues and eigenvectors of a square sum matrix in which a rank of the respective document segment vector is represented by R and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining the importance.
    Type: Grant
    Filed: April 16, 2009
    Date of Patent: May 1, 2012
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Publication number: 20090216759
    Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents. The inventive method detects terms that occur in the input document, segments the input document into document segments, each segment being an appropriately sized chunk, and generates document segment vectors, each vector having as its elements values corresponding to the occurrence frequencies of the terms occurring in the document segments. The method further calculates eigenvalues and eigenvectors of a square sum matrix in which a rank of the respective document segment vector is represented by R and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining the importance.
    Type: Application
    Filed: April 16, 2009
    Publication date: August 27, 2009
    Applicant: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
    Inventor: Takahiko KAWATANI
  • Patent number: 7562066
    Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents. The inventive method detects terms that occur in the input document, segments the input document into document segments, each segment being an appropriately sized chunk, and generates document segment vectors, each vector having as its elements values corresponding to the occurrence frequencies of the terms occurring in the document segments. The method further calculates eigenvalues and eigenvectors of a square sum matrix in which a rank of the respective document segment vector is represented by R and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining the importance.
    Type: Grant
    Filed: November 15, 2001
    Date of Patent: July 14, 2009
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Patent number: 7499923
    Abstract: In document (or pattern) clustering, the correct number of clusters and accurate assignment of each document (or pattern) to the correct cluster are attained. Documents (or patterns) describing the same topic (or object) are grouped, so a document (or pattern) group belonging to the same cluster has some commonality. Each topic (or object) has distinctive terms (or object features) or term (or object feature) pairs. When the closeness of each document (or pattern) to a given cluster is obtained, common information about the given cluster is extracted and used while the influence of terms (or object features) or term (or object feature) pairs not distinctive to the given cluster is excluded.
    Type: Grant
    Filed: March 4, 2004
    Date of Patent: March 3, 2009
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Patent number: 7392175
    Abstract: In evaluating commonality of documents, each sentence is represented by a binary vector whose components indicate the presence or absence of corresponding terms, whereupon the concept of a common vector among documents is introduced. One sentence vector is derived from each of the documents to form a group of sentence groups, and only components which assume “1” (one) in all the vectors are “1”, the other components being “0” (zero). The commonality of a document set is evaluated by employing the sum or squared sum of the numbers of components whose values are not zero in the individual common vectors, for all the common vectors.
    Type: Grant
    Filed: October 29, 2003
    Date of Patent: June 24, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Patent number: 7308138
    Abstract: A document segmentation method of detecting segmentation points where a topic of an input document is discontinuous before and after the point to divide the document into plural blocks includes: detecting terms that occur in an input document; segmenting the input document into document segments, each segment being an appropriately sized chunk; generating document segment vectors whose elements are values related to frequencies of the terms occurring in the document segments; calculating eigenvalues and eigenvectors of a square sum matrix of the document segment vectors; selecting from the eigenvectors the basis vectors constituting a subspace to calculate the topic continuity of the document segments; calculating vectors whose elements are the values corresponding to the projection values of each document segment vector onto the basis vectors; and determining segmentation points of the document based on the continuity of the projected vectors.
    Type: Grant
    Filed: November 16, 2001
    Date of Patent: December 11, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Patent number: 7225120
    Abstract: A computer extracts important terms, phrases or sentences from a document that it segments. The computer generates a square sum matrix from the document segments. The computer determines the importance of a given term, phrase or sentence on the basis of eigenvectors and eigenvalues of the matrix. The computer thereby selects the important terms, phrases or sentences related to the central concepts of the document.
    Type: Grant
    Filed: May 30, 2002
    Date of Patent: May 29, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Patent number: 7200802
    Abstract: Two document sets are compared in natural language processing and the distinctiveness of each constituent element (such as a sentence, term or phrase) of one document set is evaluated by dividing both the target and comparison documents into document segments, constructing the sentence vector of each document segment whose components are the occurring frequencies of terms occurring in the document segment, and projecting all the sentence vectors of both the documents on a projection axis to find a projection axis which maximizes a ratio equal to: (squared sum of projected values originating from the target document)/(squared sum of projected values originating from the comparison document). Projected values are obtained by projecting the sentence vectors on the projection axis, and the degrees of distinctiveness of the individual sentences of the target document are calculated on the basis of the projected values.
    Type: Grant
    Filed: June 13, 2003
    Date of Patent: April 3, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Patent number: 7194461
    Abstract: A data processing unit is programmed to retrieve a document item and/or an information item from a plurality of document items and/or information items. Each of the items is identified by an index vector. The retrieval (15) is in response to a query (11) including plural query terms related to each other by Boolean logic. The program causes the data processing unit to transform the query into vector form through matrix calculations (12) and to measure the similarities of the item index vectors and the vector form of the query to determine which of the items correspond with the query.
    Type: Grant
    Filed: March 1, 2002
    Date of Patent: March 20, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Patent number: 7185008
    Abstract: A document is classified into at least one document class by selecting terms for use in the classification from among terms that occur in the document. A similarity between the input document and each class is calculated using information saved for every document class. The calculated similarity to each class is corrected. The class to which the input document belongs is determined in accordance with the corrected similarity to each class.
    Type: Grant
    Filed: February 27, 2003
    Date of Patent: February 27, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Publication number: 20050097436
    Abstract: A document classification system automatically sorts an input document into pre-determined document classes by matching the input document to class models. The content of the input documents changes with time and the class models deteriorate. Similarities between a training document set and an actual document set (which is classified into multiple classes) are calculated with respect to each class. A class with a low similarity is selected. Alternatively, classes where deterioration has occurred are detected by calculating similarities between the training document set in each individual class and the actual document set in all other classes. Class-pairs with low similarities are calculated. Close topic class-pairs are detected by calculating similarities between the training document set and all the class-pairs. Class-pairs with low similarities are selected.
    Type: Application
    Filed: October 29, 2004
    Publication date: May 5, 2005
    Inventor: Takahiko Kawatani
  • Publication number: 20040230577
    Abstract: In document (or pattern) clustering, the correct number of clusters and accurate assignment of each document (or pattern) to the correct cluster are attained. Documents (or patterns) describing the same topic (or object) are grouped, so a document (or pattern) group belonging to the same cluster has some commonality. Each topic (or object) has distinctive terms (or object features) or term (or object feature) pairs. When the closeness of each document (or pattern) to a given cluster is obtained, common information about the given cluster is extracted and used while the influence of terms (or object features) or term (or object feature) pairs not distinctive to the given cluster is excluded.
    Type: Application
    Filed: March 4, 2004
    Publication date: November 18, 2004
    Inventor: Takahiko Kawatani
  • Patent number: 6778704
    Abstract: A pattern recognition method that determines the category of an unknown pattern. The category is one of a set of categories corresponding to a set of known patterns. A subcategory-level recognition dictionary is provided that stores reference information for each one of plural subcategories obtained by partitioning the categories constituting the category set. A pattern signal representing the unknown pattern is received (12) and is processed to extract a feature vector from it. The reference information of one subcategory of each category in the recognition dictionary is selected (14, 16) from the recognition dictionary in response to the feature vector. Finally, a distance between the feature vector and the reference information of the subcategory of each category selected in the selecting step is determined (18) to determine the category of the unknown pattern.
    Type: Grant
    Filed: January 4, 2002
    Date of Patent: August 17, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Takahiko Kawatani
  • Publication number: 20040093557
    Abstract: In evaluating commonality of documents, each sentence is represented by a binary vector whose components indicate the presence or absence of corresponding terms, whereupon the concept of a common vector among documents is introduced. One sentence vector is derived from each of the documents to form a group of sentence groups, and only components which assume “1” (one) in all the vectors are “1”, the other components being “0” (zero). The commonality of a document set is evaluated by employing the sum or squared sum of the numbers of components whose values are not zero in the individual common vectors, for all the common vectors.
    Type: Application
    Filed: October 29, 2003
    Publication date: May 13, 2004
    Inventor: Takahiko Kawatani
  • Publication number: 20040086178
    Abstract: The invention provides a document segmentation method of detecting segmentation points where a topic of an input document is discontinuous before and after the point to divide the document into plural blocks.
    Type: Application
    Filed: December 8, 2003
    Publication date: May 6, 2004
    Inventor: Takahiko Kawatani
  • Publication number: 20040078363
    Abstract: A data processing unit is programmed to retrieve a document item and/or an information item from a plurality of document items and/or information items. Each of the items is identified by an index vector. The retrieval (15) is in response to a query (11) including plural query terms related to each other by Boolean logic. The program causes the data processing unit to transform the query into vector form through matrix calculations (12) and to measure the similarities of the item index vectors and the vector form of the query to determine which of the items correspond with the query.
    Type: Application
    Filed: September 2, 2003
    Publication date: April 22, 2004
    Inventor: Takahiko Kawatani
  • Publication number: 20040068396
    Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents.
    Type: Application
    Filed: September 22, 2003
    Publication date: April 8, 2004
    Inventor: Takahiko Kawatani
  • Publication number: 20040006736
    Abstract: Two document sets are compared in natural language processing and the distinctiveness of each constituent element (such as a sentence, term or phrase) of one document set is evaluated by dividing both the target and comparison documents into document segments, constructing the sentence vector of each document segment whose components are the occurring frequencies of terms occurring in the document segment, and projecting all the sentence vectors of both the documents on a projection axis to find a projection axis which maximizes a ratio equal to: (squared sum of projected values originating from the target document)/(squared sum of projected values originating from the comparison document). Projected values are obtained by projecting the sentence vectors on the projection axis, and the degrees of distinctiveness of the individual sentences of the target document are calculated on the basis of the projected values.
    Type: Application
    Filed: June 13, 2003
    Publication date: January 8, 2004
    Inventor: Takahiko Kawatani
  • Patent number: 6671404
    Abstract: A pattern recognition apparatus comprises an input section, a feature extraction module, a feature transform module, a recognition section that includes a recognition dictionary, and a categorizer. The input section receives input patterns that include a pattern belonging to one of plural categories constituting a category set. The feature extraction module expresses features of the pattern as a feature vector. The feature transform module uses transform vector matrices to transform at least part of the feature vector to generate an at least partially transformed feature vector corresponding to each of the categories. The transform vector matrices include a transform vector matrix generated in response to a rival pattern set composed of rival patterns misrecognized as belonging to plural ones of the categories. The plural ones of the categories constitute a category subset. The at least partially transformed feature vector is common to the ones of the categories constituting the category subset.
    Type: Grant
    Filed: January 22, 2000
    Date of Patent: December 30, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Takahiko Kawatani, Hiroyuki Shimizu
  • Publication number: 20030167267
    Abstract: A document is classified into at least one document class by selecting terms for use in the classification from among terms that occur in the document. A similarity between the input document and each class is calculated using information saved for every document class. The calculated similarity to each class is corrected. The class to which the input document belongs is determined in accordance with the corrected similarity to each class.
    Type: Application
    Filed: February 27, 2003
    Publication date: September 4, 2003
    Inventor: Takahiko Kawatani
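
Several abstracts in this listing (for example, patent numbers 7225120 and 7562066) describe building a square sum matrix from document segment vectors and using its eigenvectors and eigenvalues to judge importance. The following is a minimal illustrative sketch of that general idea, not the patented method: it assumes whitespace tokenization, treats each sentence as one segment, keeps only the dominant eigenvector (L = 1), and scores each sentence by its squared projection onto that eigenvector. All function names are hypothetical, and power iteration is used here only to avoid external dependencies.

```python
import math
from collections import Counter


def segment_vectors(segments):
    """Build term-frequency vectors over a shared, sorted vocabulary."""
    counts = [Counter(s.lower().split()) for s in segments]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def square_sum_matrix(vectors):
    """Return S = sum over segments of d * d^T (a term-by-term matrix)."""
    k = len(vectors[0])
    s = [[0.0] * k for _ in range(k)]
    for d in vectors:
        for i in range(k):
            if d[i]:
                for j in range(k):
                    s[i][j] += d[i] * d[j]
    return s


def dominant_eigenpair(matrix, iterations=200):
    """Approximate the largest eigenvalue and its unit eigenvector
    by power iteration (an assumption of this sketch)."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(k)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


def sentence_importance(segments):
    """Score each segment by its squared projection onto the
    dominant eigenvector of the square sum matrix."""
    _, vectors = segment_vectors(segments)
    eigenvalue, axis = dominant_eigenpair(square_sum_matrix(vectors))
    scores = [sum(d[i] * axis[i] for i in range(len(axis))) ** 2
              for d in vectors]
    return eigenvalue, scores


if __name__ == "__main__":
    sentences = [
        "eigenvectors rank sentences",
        "eigenvectors rank terms",
        "weather today",
    ]
    eigenvalue, scores = sentence_importance(sentences)
    for sentence, score in zip(sentences, scores):
        print(f"{score:.3f}  {sentence}")
```

In this toy input the first two sentences share terms and so align with the dominant eigenvector, while the third shares no terms with them and receives a score near zero. A fuller treatment would combine several eigenvectors, weighted by their eigenvalues, as the abstracts suggest.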