Abstract: A method for performing a semantic analysis process on a computer system including a storage unit and an interface includes the steps of: receiving a syntactic tree generated from a natural language sentence text; determining whether an analysis object, which is one of nodes of the syntactic tree, is a verb phrase class which has a verb as a head or a non-verb phrase class which has mainly a noun as the head on the basis of subdivided type information of a phrase of the node with reference to first data stored in the storage unit; analyzing a relation between a verb in the analysis object and a deep case of the verb when the analysis object is the verb phrase class; analyzing a modificative relation in the analysis object when the analysis object is the non-verb phrase class; generating a semantic structure of the natural language sentence text wherein the semantic structure comprises semantic frames corresponding to nodes of the syntactic tree, at least two semantic frames of the semantic frames being linked
Type:
Grant
Filed:
August 31, 1999
Date of Patent:
June 5, 2001
Assignee:
Nippon Telegraph and Telephone Corporation
Abstract: An information abstracting method and apparatus for extracting and displaying keywords as an information abstract. Given a large number of character string data sets divided into prescribed units, the extracted keywords are significant and effective in describing a topic common to the plurality of units. The information abstracting apparatus comprises an input section for accepting an input of character string data divided into prescribed units, with each individual character represented by a character code, and an output section for displaying the result of information abstracting. Keywords contained in each of the prescribed units are extracted by a keyword extracting section from the character string input data from the input section. A score is calculated for each keyword by a score calculating section, so that a higher score is given to a keyword extracted from a larger number of units.
Type:
Grant
Filed:
December 14, 1998
Date of Patent:
May 29, 2001
Assignee:
Matsushita Electric Industrial Co., Ltd.
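The scoring idea in the abstract above, where a keyword extracted from more units receives a higher score, can be sketched as follows. This is a minimal illustration and not the patented method: the whitespace tokenizer, the function names `extract_keywords` and `score_keywords`, and the use of the raw unit count as the score are all assumptions.

```python
from collections import Counter


def extract_keywords(unit):
    """Hypothetical extractor: lowercase alphabetic words longer than 3 characters."""
    words = (w.strip(".,;:!?").lower() for w in unit.split())
    return {w for w in words if len(w) > 3 and w.isalpha()}


def score_keywords(units):
    """Score each keyword by the number of units it was extracted from."""
    scores = Counter()
    for unit in units:
        # extract_keywords returns a set, so each unit counts a keyword once.
        scores.update(extract_keywords(unit))
    return scores


# Three made-up units; two of them share a topic.
units = [
    "Typhoon approaches coast tonight.",
    "Coast guard issues typhoon warning.",
    "Stock market closes higher.",
]
scores = score_keywords(units)
```

Keywords shared by the first two units ("typhoon", "coast") end up with a higher score than keywords that occur in a single unit.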
Abstract: A terminology extraction system which allows for automatic creation of bilingual terminology has a source text which comprises at least one sequence of source terms, aligned with a target text which also comprises at least one sequence of target terms. A term extractor builds a network from each source and target sequence wherein each node of the network comprises at least one term and such that each combination of source terms is included within one source node and each combination of target terms is included within one target node. The term extractor links each source node with each target node, and through a flow optimization method selects relevant links in the resulting network. Once the term extractor has been run on the entire set of aligned sequences, a term statistics circuit computes an association score for each pair of linked source/target terms, and finally the scored pairs of linked source/target terms that are considered relevant bilingual terms are stored in a bilingual terminology database.
Type:
Grant
Filed:
May 15, 1998
Date of Patent:
May 22, 2001
Assignee:
International Business Machines Corporation
Abstract: An improved natural language parser uses a directed search template set to identify problematic word sequences, thus reducing processing time while increasing accuracy. The directed search template set is used to identify problematic input spans or portions of input spans. A problematic input span is one that contains at least one word or phrase that can be constructed in alternative ways. Problematic input spans can reduce the efficiency of a natural language parser and can result in the production of an inaccurate parse tree. Once a problematic span has been identified, the improved parser generates alternative parses for the problematic portion of the input span. This on-the-fly alternative parse generation permits the parser to consider the alternatives as early in the parse process as possible, thus reducing the overall time needed to parse a problematic input span.
Abstract: A computer program product for retrieving multi-media objects uses a natural language having a pronoun. The computer program product includes a computer readable storage medium having a computer program stored thereon for performing the steps of receiving a query in the natural language containing the pronoun; determining the pronoun in the query; determining whether either a phrase or sentence containing the pronoun conforms to a predetermined phrase structure; determining a noun or noun phrase to which the pronoun refers; and processing the query.
Abstract: A source signal embodying knowledge is decomposed into a simple and regular internal representation of a decomposition of epistemic instances of the knowledge. An epistemic instance is a fundamental semantic structure that expresses a transformation of two objects. The internal representation is then transformed into another internal representation from which a target signal is constructed. The complexity of language is localized within rules for decomposing the source signal into the internal representation and rules for transforming the internal representation into another internal representation. In one embodiment, source signal decomposition and target signal constructions are facilitated by look ups into a universal dictionary.
Abstract: A universal machine translator of arbitrary languages enables the semantic, or meaningful, translation of arbitrary languages with zero loss of meaning of the source language in the target language translation, which loss is typical in prior art human and machine translations.
Abstract: A method and apparatus for parsing in a spoken language translation system are provided, wherein an input is received comprising at least one input sentence or expression. A parsing table is accessed and consulted for a next action, wherein the parser looks up the next action in the parsing table. During parsing operations, the parser may perform shift actions and reduce actions. In performing a shift action, a next item of the input string is shifted onto a stack or intermediate data structure of the parser. A new parse node is generated, and a feature structure or lexical feature structure of the shifted input item is obtained from a morphological analyzer and associated with the new parse node. The new node is placed on the stack or intermediate data structure. In performing a reduce action, a grammar rule and an associated compiled feature structure manipulation are applied.
Type:
Grant
Filed:
January 29, 1999
Date of Patent:
April 24, 2001
Assignees:
Sony Corporation, Sony Electronics, Inc.
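The shift and reduce actions described in the abstract above can be sketched with a toy parser. This is a simplified illustration under stated assumptions: the parsing table is replaced by greedy matching of rule right-hand sides against the top of the stack, the `LEXICON` dictionary stands in for the morphological analyzer, and the grammar, feature names, and function names are hypothetical.

```python
# Hypothetical lexicon standing in for the morphological analyzer.
LEXICON = {
    "the": {"cat": "Det"},
    "dog": {"cat": "N", "num": "sg"},
    "barks": {"cat": "V", "num": "sg"},
}

# Toy grammar rules: (left-hand side, right-hand side).
RULES = [
    ("NP", ("Det", "N")),
    ("VP", ("V",)),
    ("S", ("NP", "VP")),
]


def make_node(cat, features=None, children=()):
    """Create a parse node carrying a category, a feature structure, and children."""
    return {"cat": cat, "features": features or {}, "children": list(children)}


def try_reduce(stack):
    """Reduce: apply the first rule whose right-hand side matches the stack top."""
    for lhs, rhs in RULES:
        n = len(rhs)
        if len(stack) >= n and tuple(node["cat"] for node in stack[-n:]) == rhs:
            children = stack[-n:]
            del stack[-n:]
            stack.append(make_node(lhs, children=children))
            return True
    return False


def parse(tokens):
    """Shift each token onto the stack, reducing whenever a rule applies."""
    stack = []
    for token in tokens:
        features = LEXICON[token]  # feature structure of the shifted item
        stack.append(make_node(features["cat"], features))  # shift action
        while try_reduce(stack):  # reduce actions
            pass
    return stack


result = parse(["the", "dog", "barks"])
```

For this input the stack ends with a single `S` node whose children are an `NP` and a `VP`. A table-driven parser would instead look up each shift or reduce decision by parser state and lookahead.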
Abstract: Patient monitoring apparatus for use in an environment which includes a plurality of sensors. The apparatus provides collection and display of patient data signals collected from a medical patient using the sensors, including periods when the patient is being transported. The apparatus comprises a portable monitor coupled to a plurality of distinct data acquisition modules, which are coupled to the sensors. The modules include cartridges, which detachably mount to the portable monitor, and pods, which are positioned independent of the monitor. The pods reduce the number of cables extending between the patient's bed and the portable monitor by combining signals from many sensors into a single output signal. The modules collect patient data in analog form from the sensors and provide digital data signals to the monitor. The portable monitor includes a display device for displaying the patient data, and storage for the patient data. The portable monitor may be coupled to a docking station.
Type:
Grant
Filed:
January 6, 1995
Date of Patent:
April 24, 2001
Assignee:
Siemens Medical Electronics, Inc.
Inventors:
Michael Maschke, Thomas Bishop, Bengt Hermanrud, Wolfgang Scholz, Clifford Mark Kelly
Abstract: In an analogically similar word production apparatus, based on three inputted unit strings, an analogically similar word which is a word analogically similar to inputted unit strings is produced at high speed without using attributes and without any finite state automaton. A pseudo-distance matrix memory stores therein only matrix elements sufficient for computation of limited pseudo distances between two letter strings, out of the elements of two pseudo-distance matrices, and more specifically, matrix elements including diagonal elements are computed by a preprocessing section and then stored in the pseudo-distance matrix memory.
Type:
Grant
Filed:
August 6, 1999
Date of Patent:
April 17, 2001
Assignee:
ATR Interpreting Telecommunications Research Laboratories
Abstract: A method involving computer-mediated linguistic analysis of online technical documentation to extract and catalog from the documentation knowledge essential to, for example, creating an online help database useful in providing online assistance to users in performing a task. The method comprises stripping markup tags from the documentation, linguistically analyzing and annotating the text, including the steps of morphologically and lexically analyzing the text, disambiguating between possible parts-of-speech for each word, and syntactically analyzing and labeling each word.
Abstract: A method and device for identifying word boundaries in continuous text compares the continuous text to a set of varying length strings to identify candidate word-initial boundaries and candidate word-final boundaries in the continuous text. Each candidate word-initial boundary and candidate word-final boundary has an associated probability value. Each candidate word boundary in the continuous text is identified by calculating a word boundary score for such candidate word boundary using the probability values associated with the candidate word-initial boundaries and candidate word-final boundaries. The set of varying length strings may include words and n-grams.
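The boundary scoring described in the abstract above can be sketched as follows. This is only an illustration: the `STRINGS` table, its probability values, and the rule that combines the two probabilities (a simple product of the best candidates at each position) are assumptions, since the abstract does not state the formula.

```python
# Hypothetical table of varying-length strings:
# string -> (probability of a word-initial boundary before it,
#            probability of a word-final boundary after it).
STRINGS = {
    "the": (0.9, 0.8),
    "cat": (0.8, 0.9),
    "he": (0.3, 0.4),
    "at": (0.2, 0.6),
}


def boundary_scores(text):
    """Return one score per position 0..len(text) in the continuous text."""
    initial = [0.0] * (len(text) + 1)
    final = [0.0] * (len(text) + 1)
    for s, (p_init, p_final) in STRINGS.items():
        start = text.find(s)
        while start != -1:
            end = start + len(s)
            # Keep the strongest candidate seen at each position.
            initial[start] = max(initial[start], p_init)
            final[end] = max(final[end], p_final)
            start = text.find(s, start + 1)
    # Assumed combination rule: a boundary is plausible where one string
    # ends and another begins, so multiply the two probabilities.
    return [f * i for f, i in zip(final, initial)]


text = "thecat"
scores = boundary_scores(text)
# Highest-scoring interior position (excluding the two ends of the text).
best = max(range(1, len(text)), key=scores.__getitem__)
```

With this table, the highest-scoring interior position in "thecat" is 3, the point between "the" and "cat".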
Abstract: A computerized method for extracting information from natural-language text data includes parsing the text data to determine the grammatical structure of the text data and regularizing the parsed text data to form structured word terms. The parsing step, which can be performed in one or more parsing modes, includes the step of referring to a domain parameter having a value indicative of a domain from which the text data originated, wherein the domain parameter corresponds to one or more rules of grammar within a knowledge base related to the domain to be applied for parsing the text data. Preferably, the structured output is mapped back to the words in the original sentences of the text data input using XML tags.
Type:
Grant
Filed:
August 6, 1999
Date of Patent:
January 30, 2001
Assignee:
The Trustees of Columbia University in the City of New York
Abstract: A document or sentence processing apparatus having an input unit for inputting characters, a display unit for displaying input characters and a processing unit for converting and editing the input characters, in which the processing unit has a candidate word extraction unit which extracts candidates for the words with their characters omitted and/or omitted words themselves by referring to the vocabulary dictionary storing words and their usage frequency, to the dictionary of transition between words defining the information on the transition between words and the probability of the transition between words, and by searching the vocabulary dictionary for the characters before and after the elliptic character included in the input sentence, and a determination unit which selects a single word among the extracted candidate words by referring to the dictionary of transition between words.
Abstract: A Chinese error check system comprises: an input device for inputting a sentence to be checked; a regular dictionary storing device for storing regular words and their weights; a special segmentation classes storing device for storing special segmentation classes and their weights; a segmentation device for segmenting an inputted sentence by retrieving the contents of the regular dictionary storing device and the special segmentation classes storing device and employing a dynamic programming method to select a most probable segmentation of the inputted sentence; a lone character bigram table storage device for storing the lone character bigram table, the table having the probability of Chinese character pairs being adjacent lone character pairs stored therein; and a segmentation results processing device operatively coupled to the lone character bigram table storage device and the segmentation device for processing the segmentation results and displaying possible errors in the inputted sentence.
Abstract: A computer based software system and method for semantically processing a user entered natural language request to identify and store linguistic subject-action-object (SAO) structures, using such structures as key words/phrases to search local and web-based databases for downloading candidate natural language documents, semantically processing candidate document texts into candidate document SAO structures, and selecting and storing only relevant documents whose SAO structures include a match with a stored request SAO structure. Further features include analyzing relationships among relevant document SAO structures and creating new SAO structures based on such relationships that may yield new knowledge concepts and ideas for display to the user and generating and displaying natural language summaries based on the relevant document SAO structures.
Type:
Grant
Filed:
May 27, 1999
Date of Patent:
December 26, 2000
Assignee:
Invention Machine Corporation
Inventors:
Valery M. Tsourikov, Leonid S. Batchilo, Igor V. Sovpel
Abstract: A network based language translation system is provided. A network is provided that has language translation software installed on the network. A user communication device that is interconnected to the network is utilized to communicate with the network. The user communication device both inputs text and/or spoken communications into the network and receives text and/or spoken communications from the network. The network is able to receive communication inputs from multiple users in multiple languages and translate and transmit output communications to those users in languages designated by the users.
Abstract: The present invention provides an ideogrammatic character editor method and apparatus for creating, editing and communicating ideogrammatic characters which are comprised of a series of strokes forming a word in a particular language. A platform displays pre-formed strokes and provides an area on which the pre-formed strokes are positioned. A selector selects and positions the pre-formed strokes on the platform. An encoder encodes each pre-formed stroke selected and positioned by the selector on the platform as a stroke code and a position on the platform. A processor stores the stroke code and the position for each pre-formed stroke encoded by the encoding unit in a stroke loc list. In preferred embodiments, the present invention creates Japanese Kanji, Chinese and Korean characters, but also creates ideogrammatic characters of any language including those presently existing or those yet to be developed.
Abstract: A translating apparatus and a translating method wherein a first language sentence is divided into syntax units, consisting of predetermined units of sentence structure such as clauses and phrases, in stages from large syntax units into small syntax units; at each stage, stored examples most similar to these syntax units are detected using probability models taking into account grammatical attributes of the syntax units and of the examples, using generalized linguistic knowledge, and with reference to a thesaurus; the syntax units are translated on the basis of these detected examples; and the results of translation of the syntax units are compounded to generate a second language sentence.
Abstract: The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an "is a" relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms.
Type:
Grant
Filed:
August 3, 1999
Date of Patent:
December 12, 2000
Assignee:
Microsoft Corporation
Inventors:
John J. Messerly, George E. Heidorn, Stephen D. Richardson, William B. Dolan, Karen Jensen
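The construction of alternative logical forms by hypernym substitution, as described in the abstract above, can be sketched as follows. This is an illustration under stated assumptions: a logical form is reduced to a (subject, verb, object) tuple, the `HYPERNYMS` table is hypothetical, and tokens are formed by simply joining the words with underscores.

```python
from itertools import product

# Hypothetical "is a" table: word -> hypernyms.
HYPERNYMS = {
    "dog": ["animal"],
    "bite": ["attack"],
}


def logical_forms(primary):
    """Return the primary form followed by all hypernym-substituted alternatives."""
    # For each selected word, offer the word itself plus its hypernyms.
    options = [[word] + HYPERNYMS.get(word, []) for word in primary]
    return [tuple(combo) for combo in product(*options)]


def make_tokens(forms):
    """Assumed token format: the words of each form joined by underscores."""
    return ["_".join(form) for form in forms]


forms = logical_forms(("dog", "bite", "man"))
tokens = make_tokens(forms)
```

The first entry in `forms` is the primary logical form; the remaining entries replace one or more words with a hypernym, so a query and a document phrased with "animal" instead of "dog" can still share a token.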