Patents Assigned to Word Data Corp.

Code, system and method for representing a natural-language text in a form suitable for text manipulation

Patent number: 7386442

Abstract: A computer method, system and code, for representing a natural-language document in a vector form suitable for text manipulation operations are disclosed. The method involves determining (a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), a selectivity value of the term related to the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively. The document is represented as a vector of terms, where the coefficient assigned to each term includes a function of the selectivity value determined for that term.

Type: Grant

Filed: July 1, 2003

Date of Patent: June 10, 2008

Assignee: Word Data Corp.

Inventors: Peter J. Dehlinger, Shao Chin
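
The selectivity value and term vector described in the abstract of patent 7386442 can be illustrated with a minimal sketch. It is not taken from the patent: the function names, the use of per-text occurrence rates as the "frequency of occurrence", the averaging across other libraries, and the `floor` guard against division by zero are all assumptions made for illustration.

```python
from collections import Counter

def term_rate(term, library):
    """Fraction of texts in `library` (a list of token lists) that contain `term`."""
    if not library:
        return 0.0
    return sum(term in text for text in library) / len(library)

def selectivity(term, field_library, other_libraries, floor=1e-6):
    """Rate of `term` in one field relative to its mean rate in the other fields.

    Assumption: the abstract only says the value is "related to" these
    frequencies; a plain ratio is one possible choice.
    """
    other_rates = [term_rate(term, lib) for lib in other_libraries]
    baseline = sum(other_rates) / len(other_rates) if other_rates else 0.0
    return term_rate(term, field_library) / max(baseline, floor)

def document_vector(tokens, field_library, other_libraries, generic_words=frozenset()):
    """Map each non-generic term in the document to its selectivity value."""
    terms = Counter(t for t in tokens if t not in generic_words)
    return {t: selectivity(t, field_library, other_libraries) for t in terms}

# Hypothetical toy libraries: one "optics" field and one other field.
optics = [["laser", "beam"], ["laser", "lens"]]
surgery = [["laser", "surgery"], ["bone", "graft"]]
vector = document_vector(["the", "laser"], optics, [surgery], generic_words={"the"})
```

Here "laser" occurs in every optics text but only half of the surgery texts, so it receives a selectivity above 1 for the optics field.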

Processing input text to generate the selectivity value of a word or word group in a library of texts in a field is related to the frequency of occurrence of that word or word group in library

Patent number: 7181451

Abstract: Disclosed is an automated system, machine-readable storage medium embodying computer-executable code, and method for generating descriptive words and optionally, multi-word groups derived from a digitally encoded, natural-language input text that describes a concept, invention, or event in a selected field. The system includes (a) an electronic digital computer, (b) a database of words and optionally, word-groups derived from a plurality of texts, and (c) machine-readable storage medium embodying computer-executable code for accessing the database. The database provides, or can be used to calculate, a selectivity value for each of the words and optionally, word groups contained in or derived from the input text. Words and optionally, word groups having an above-threshold selectivity value are selected as descriptive terms from the input text.

Type: Grant

Filed: September 30, 2002

Date of Patent: February 20, 2007

Assignee: Word Data Corp.

Inventors: Peter J. Dehlinger, Shao Chin
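
The threshold step in the abstract of patent 7181451 (selecting words whose selectivity value is above a threshold) might look like the following sketch. The database layout (a dict of per-field text counts), the field names, the threshold of 2.0, and the helper names are hypothetical; the patent abstract does not specify them.

```python
def selectivity_from_db(word, db, field_sizes, field, floor=1e-6):
    """Compute a selectivity value from stored per-field text counts.

    db maps word -> {field: number of texts containing the word};
    field_sizes maps field -> total number of texts in that field's library.
    """
    counts = db.get(word, {})
    in_rate = counts.get(field, 0) / field_sizes[field]
    others = [f for f in field_sizes if f != field]
    out_total = sum(field_sizes[f] for f in others)
    out_hits = sum(counts.get(f, 0) for f in others)
    out_rate = out_hits / out_total if out_total else 0.0
    return in_rate / max(out_rate, floor)

def descriptive_terms(tokens, db, field_sizes, field, threshold=2.0):
    """Return the distinct input words whose selectivity exceeds `threshold`."""
    selected = []
    for token in tokens:
        if token in selected:
            continue
        if selectivity_from_db(token, db, field_sizes, field) > threshold:
            selected.append(token)
    return selected

# Hypothetical word database built from two libraries of 100 texts each.
db = {
    "stent": {"medical": 40, "software": 1},
    "system": {"medical": 50, "software": 50},
}
field_sizes = {"medical": 100, "software": 100}
terms = descriptive_terms(["system", "stent", "stent"], db, field_sizes, "medical")
```

In this toy data "system" is equally common in both fields and is rejected, while "stent" is strongly field-specific and is kept as a descriptive term.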

Text-classification code, system and method

Patent number: 7024408

Abstract: Disclosed are a computer-readable code, system and method for classifying a target document in the form of a digitally encoded natural-language text as belonging to one or more of two or more different classes. For each of a plurality of non-generic words and/or word groups characterizing the target document, there is determined a selectivity value calculated as the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, and the document is represented as a vector of terms, where the coefficient assigned to each term is a function of the selectivity value determined for that term. There is then determined, for each of the plurality of sample texts having associated classification identifiers, a match score related to the number of descriptive terms present in or derived from that text that match those in the target text.

Type: Grant

Filed: July 1, 2003

Date of Patent: April 4, 2006

Assignee: Word Data Corp.

Inventors: Peter J. Dehlinger, Shao Chin

Text-classification system and method

Patent number: 7016895

Abstract: Disclosed are a computer-readable code, system and method for classifying a target document in the form of a digitally encoded natural-language text as belonging to one or more of two or more different classes. Each of a plurality of non-generic words and, optionally, word groups characterizing the target document is selected as a descriptive term if the term has an above-threshold selectivity value in at least one library of texts in a field, where the selectivity value of a term is a measure of the field-specificity of that term. There is then determined, for each of the plurality of sample texts having associated classification identifiers, a match score related to the number of descriptive terms present in or derived from that text that match those in the target text. From the selected matched texts, and the associated classification identifiers, a classification determination of the target document is made.

Type: Grant

Filed: February 25, 2003

Date of Patent: March 21, 2006

Assignee: Word Data Corp.

Inventors: Peter J. Dehlinger, Shao Chin
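
The match-score classification outlined in the abstract of patent 7016895 can be sketched roughly as follows. The scoring rule (count of shared descriptive terms), the `top_n` cutoff, the score-weighted vote, and the class identifiers are assumptions for illustration only; the abstract says only that the score is "related to" the number of matching terms.

```python
from collections import Counter

def match_score(target_terms, sample_terms):
    """Number of descriptive terms shared by the target and a sample text."""
    return len(set(target_terms) & set(sample_terms))

def classify(target_terms, samples, top_n=3):
    """Pick a class from the best-matching sample texts.

    samples is a list of (classification_id, descriptive_terms) pairs.
    Returns None when no sample shares any term with the target.
    """
    scored = sorted(
        ((match_score(target_terms, terms), class_id) for class_id, terms in samples),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for score, class_id in scored[:top_n]:
        if score > 0:
            votes[class_id] += score  # weight each vote by its match score
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical sample texts with made-up classification identifiers.
samples = [
    ("A61", ["stent", "catheter"]),
    ("A61", ["stent", "balloon"]),
    ("G06", ["compiler", "parser"]),
]
label = classify(["stent", "balloon", "catheter"], samples)
```

Weighting votes by match score is one design choice among several; a simple majority over the top matches would also fit the abstract's description.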

Text representation and method

Patent number: 7003516

Abstract: A computer method for representing a natural-language document in a vector form suitable for text manipulation operations is disclosed. The method involves determining (a) for each of a plurality of terms composed of non-generic words and, optionally, proximately arranged word groups in the document, a selectivity value of the term related to the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively. The document is represented as a vector of terms, where the coefficient assigned to each term includes a function of the selectivity value determined for that term, and optionally related to the inverse document frequency of that word in one or more libraries of texts. Also disclosed are a computer-readable code for carrying out the method, a computer system that employs the code, and a vector produced by the method.

Type: Grant

Filed: May 15, 2003

Date of Patent: February 21, 2006

Assignee: Word Data Corp.

Inventors: Peter J. Dehlinger, Shao Chin
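
The abstract of patent 7003516 mentions a term coefficient that includes a function of the selectivity value and is optionally related to inverse document frequency. The sketch below shows one way such a coefficient could be combined. The square-root function, the smoothed IDF formula, and the unit-length normalization are assumptions, not details taken from the patent.

```python
import math

def idf(term, library):
    """Smoothed inverse document frequency of `term` in a list of token lists."""
    containing = sum(term in text for text in library)
    return math.log((1 + len(library)) / (1 + containing)) + 1.0

def weighted_vector(selectivities, library, use_idf=True):
    """Build a unit-length term vector from precomputed selectivity values.

    Assumption: the coefficient is sqrt(selectivity), optionally times IDF.
    """
    vector = {}
    for term, sel in selectivities.items():
        coeff = math.sqrt(sel)
        if use_idf:
            coeff *= idf(term, library)
        vector[term] = coeff
    norm = math.sqrt(sum(c * c for c in vector.values()))
    return {t: c / norm for t, c in vector.items()} if norm else vector

# Hypothetical library and selectivity values.
library = [["laser", "beam"], ["laser", "lens"], ["lens", "mount"]]
selectivities = {"laser": 4.0, "beam": 4.0}
with_idf = weighted_vector(selectivities, library)
without_idf = weighted_vector(selectivities, library, use_idf=False)
```

With equal selectivity values, the IDF factor gives the rarer term ("beam") the larger coefficient, while the vector without IDF weights both terms equally.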
