Patents by Inventor Gregory Grefenstette

Gregory Grefenstette has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8489384
    Abstract: The present invention relates to an automatic translation method. When a sentence in a source language is translated into a sentence in a target language, the method comprises: a step (1) of extracting the set of sentence portions of the target language from a textual database that correspond to a total or partial translation of the source sentence to be translated; a step (2) of determining all the assemblies of these target sentence portions that overlap the source sentence; a step (3) of choosing the best assemblies according to a criterion of maximum overlap between the target sentence portions assembled in the preceding step and according to a criterion of minimizing the number of assembled elements; a step (4) of determining the target sentence by choosing the best assembly according to coherence criteria. The invention is notably applicable to the translation of texts in a rare language. More generally, it applies to translation with no previously established bilingual texts.
    Type: Grant
    Filed: March 12, 2008
    Date of Patent: July 16, 2013
    Assignee: Commissariat a l'Energie Atomique
    Inventors: Christian Fluhr, Gregory Grefenstette, Nasredine Semmar
  • Patent number: 8219904
    Abstract: A system includes a meta-document, i.e., a document including content information which has a set of document service requests associated with it. A document service is a process which uses a portion of the document content as a starting point to obtain other information pertaining to that content. A scheduler selects a document service request from the set, then initiates and manages communication with a service provider to satisfy the selected document service. Any results received from the selected document service are integrated into the document.
    Type: Grant
    Filed: May 21, 2010
    Date of Patent: July 10, 2012
    Assignee: Xerox Corporation
    Inventors: James Shanahan, Gregory Grefenstette
  • Publication number: 20100235337
    Abstract: A system includes a meta-document, i.e., a document including content information which has a set of document service requests associated with it. A document service is a process which uses a portion of the document content as a starting point to obtain other information pertaining to that content. A scheduler selects a document service request from the set, then initiates and manages communication with a service provider to satisfy the selected document service. Any results received from the selected document service are integrated into the document.
    Type: Application
    Filed: May 21, 2010
    Publication date: September 16, 2010
    Inventors: James Shanahan, Gregory Grefenstette
  • Patent number: 7757168
    Abstract: A system includes a meta-document, i.e., a document including content information which has a set of document service requests associated with it. A document service is a process which uses a portion of the document content as a starting point to obtain other information pertaining to that content. A scheduler selects a document service request from the set, then initiates and manages communication with a service provider to satisfy the selected document service. Any results received from the selected document service are integrated into the document.
    Type: Grant
    Filed: April 7, 2000
    Date of Patent: July 13, 2010
    Assignee: Xerox Corporation
    Inventors: James Shanahan, Gregory Grefenstette
  • Publication number: 20100114558
    Abstract: The present invention relates to an automatic translation method. When a sentence in a source language is translated into a sentence in a target language, the method comprises: a step (1) of extracting the set of sentence portions of the target language from a textual database that correspond to a total or partial translation of the source sentence to be translated; a step (2) of determining all the assemblies of these target sentence portions that overlap the source sentence; a step (3) of choosing the best assemblies according to a criterion of maximum overlap between the target sentence portions assembled in the preceding step and according to a criterion of minimizing the number of assembled elements; a step (4) of determining the target sentence by choosing the best assembly according to coherence criteria. The invention is notably applicable to the translation of texts in a rare language. More generally, it applies to translation with no previously established bilingual texts.
    Type: Application
    Filed: March 12, 2008
    Publication date: May 6, 2010
    Inventors: Christian Fluhr, Gregory Grefenstette, Nasredine Semmar
  • Publication number: 20080005651
    Abstract: A method, system and article of manufacture therefor, are disclosed for automatically generating a query from document content.
    Type: Application
    Filed: September 10, 2007
    Publication date: January 3, 2008
    Applicant: XEROX CORPORATION
    Inventors: Gregory Grefenstette, James Shanahan
  • Publication number: 20070021956
    Abstract: A method of generating an ideographic representation of a name given in a letter-based system begins with a determination of the language of origin. After determining the language of origin for the name, the name is segmented into a segmentation sequence in response to the determined language of origin. A candidate representation is generated for the segmentation sequence based on ideographic representations of the segments. A corpus is used to validate the candidate representation. The corpus can be either a monolingual corpus or a multilingual corpus. The method can also include adding an additional validation step using either a monolingual corpus or a multilingual corpus, whichever was not used in the first validation step. Because of the rules governing abstracts, this abstract should not be used to construe the claims.
    Type: Application
    Filed: July 6, 2006
    Publication date: January 25, 2007
    Inventors: Yan Qu, Gregory Grefenstette
  • Patent number: 6973423
    Abstract: A processor implemented method of identifying the text genre of a machine-readable, untagged text. The processor implemented method begins by generating a cue vector from the text, which represents occurrences in the text of a first set of nonstructural, surface cues, which are easily computable. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.
    Type: Grant
    Filed: June 18, 1998
    Date of Patent: December 6, 2005
    Assignee: Xerox Corporation
    Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler, Gregory Grefenstette
  • Publication number: 20050154578
    Abstract: A method and system for identifying the language of a textual passage are disclosed. The method and system include parsing the textual passage into n-grams and assigning an initial weight to each n-gram, and adjusting the weight initially assigned to a word or n-gram parsed from the textual passage. The initially assigned weight is adjusted in a manner proportionate to the inverse of the number of languages within which such words or n-grams appear. Reducing the weight assigned to such words or n-grams diminishes, without completely eliminating, their importance in comparison to other words or n-grams parsed from the same textual passage when determining the language of a passage. The method and system of the present invention appropriately weigh the short words or n-grams common to multiple languages without affecting the short words or n-grams that are uncommon to several languages.
    Type: Application
    Filed: January 14, 2004
    Publication date: July 14, 2005
    Inventors: Xiang Tong, Gregory Grefenstette, David Evans
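
The weighting idea in the abstract above (publication 20050154578) can be illustrated with a minimal Python sketch. It is not the claimed method: the character trigrams, the set-based language profiles, the initial weight of 1.0, and the names `char_ngrams` and `identify_language` are all assumptions made for illustration.

```python
from collections import defaultdict


def char_ngrams(text, n=3):
    """Split a passage into overlapping character n-grams (assumed tokenization)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def identify_language(text, profiles, n=3):
    """Score each language; shared n-grams count for less.

    profiles: hypothetical dict mapping a language code to the set of
    n-grams observed for that language.
    """
    # Count in how many language profiles each n-gram appears.
    languages_per_ngram = defaultdict(int)
    for ngrams in profiles.values():
        for gram in ngrams:
            languages_per_ngram[gram] += 1

    scores = dict.fromkeys(profiles, 0.0)
    for gram in char_ngrams(text, n):
        count = languages_per_ngram.get(gram, 0)
        if count == 0:
            continue  # n-gram unknown to every profile
        # Initial weight 1.0, scaled by the inverse of the number of
        # languages that contain the n-gram: reduced, never zeroed.
        weight = 1.0 / count
        for language, ngrams in profiles.items():
            if gram in ngrams:
                scores[language] += weight
    best = max(scores, key=scores.get)
    return best, scores
```

An n-gram found in only one profile keeps its full weight, while one found in two profiles contributes half as much to each, which mirrors the abstract's point that common short n-grams are diminished without being eliminated.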
  • Publication number: 20050022114
    Abstract: A digitally readable identifier located proximate to a physical object communicates a personality identifier. The personality identifier is automatically or manually associated with document content. The personality identifier is automatically associated with document content using context information that identifies the place of a physical object and/or the time the personality identifier is communicated. Once the personality identifier is associated with document content, a meta-document server enriches the document content in accordance with a predefined thematic set of document services identified by the personality identifier.
    Type: Application
    Filed: December 5, 2001
    Publication date: January 27, 2005
    Applicant: Xerox Corporation
    Inventors: James Shanahan, Gregory Grefenstette
  • Patent number: 6498567
    Abstract: A system and remote control device that controls an appliance, and obtains a desired function of the appliance, by invoking a remote procedure call in a remote server which will thereby actually control the appliance. The remote control device comprises a sensor; sensor responsive means for capturing information provided by the sensor, wherein the captured information at least contains parameters representative of the identification of the appliance, user-profile parameters, parameters representative of the address of the remote server, and a function name indicating the function to be performed by the appliance; marshalling means for encoding the captured information and for packaging the captured information into data in a remote procedure call format; and a transmitter for establishing a communication protocol with the remote server in order to transmit the packaged data to the remote server so that the remote server may execute the remote procedure call.
    Type: Grant
    Filed: December 20, 1999
    Date of Patent: December 24, 2002
    Assignee: Xerox Corporation
    Inventors: Gregory Grefenstette, Francois Pacull, Max Copperman
  • Patent number: 6473729
    Abstract: A system and method are provided for translating an input text from a natural source language to a natural target language. The system stores a database that contains a plurality of pairs of text fragments with each pair including a text fragment in the source language and a corresponding text fragment in the target language. Each text fragment contains at least one word phrase and represents a primary grammatical unit such as a sentence or a clause. For translating a word phrase, the database is queried using a phrase index of the database, where the phrase index indexes text fragments by word phrases. Word phrases are noun phrases or verb phrases. Alternatively, word phrases are predicates involving at least one verb and one noun or adjective used as a noun. The system further comprises a phrase extractor for extracting a word phrase from a text fragment of an input text.
    Type: Grant
    Filed: December 20, 1999
    Date of Patent: October 29, 2002
    Assignee: Xerox Corporation
    Inventors: Michel Gastaldo, Gregory Grefenstette
  • Patent number: 6446035
    Abstract: Expression/person data are obtained and, in turn, are used to obtain information about groups of people in a population. The people access resources that include linguistically analyzable content, such as Web pages that include text. The expression/person data identify, for each of a set of expression types that occur in the resources, people who have accessed resources that include that type. The group information indicates a group of people who have accessed resources that include instances of expression types that have similar conceptual content. For example, an item of expression/person data can be obtained when a person accesses a Web page in an acquisition mode, by performing linguistic analysis in the background. An expression type can be indicated, for example, by a syntactic relation and a pair of normalized words that occur in the syntactic relation in the analyzed text. The expression/person data can be stored in a database.
    Type: Grant
    Filed: May 5, 1999
    Date of Patent: September 3, 2002
    Assignee: Xerox Corporation
    Inventors: Gregory Grefenstette, Claude Roux
  • Patent number: 6430557
    Abstract: A query word is used to identify one of a number of word groups, by first determining whether the query word is in any of the word groups. If not, attempts to modify the query word are made in accordance with successive suffix relationships in a sequence until a modified query word is obtained that is in one of the word groups. The sequence of suffix relationships, which can be pairwise relationships, can be defined by a list ordered according to the frequencies of occurrence of the suffix relationships in a natural language. If a modified query word is obtained that is in one of the word groups, information identifying the word group can be provided, such as a representative of the group or a list of words in the group.
    Type: Grant
    Filed: December 16, 1998
    Date of Patent: August 6, 2002
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Gregory Grefenstette, Jean-Pierre Chanod
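
The lookup described in the abstract above (patent 6430557) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: word groups are plain sets, each suffix relationship is a pairwise `(old_suffix, new_suffix)` tuple, the list is assumed to be already ordered by frequency, and the function name `find_word_group` is hypothetical.

```python
def find_word_group(query, groups, suffix_pairs):
    """Return the word group for a query word, or None if no match is found.

    groups: list of sets of words (hypothetical word groups).
    suffix_pairs: list of (old_suffix, new_suffix) tuples, assumed to be
    ordered from most to least frequent in the language.
    """
    # Map every known word to the position of its group.
    index = {word: gid for gid, words in enumerate(groups) for word in words}

    # First check whether the unmodified query word is already in a group.
    if query in index:
        return groups[index[query]]

    # Otherwise try each suffix relationship in sequence until a modified
    # query word lands in one of the groups.
    for old_suffix, new_suffix in suffix_pairs:
        if not query.endswith(old_suffix):
            continue
        candidate = query[:len(query) - len(old_suffix)] + new_suffix
        if candidate in index:
            return groups[index[candidate]]
    return None
```

For example, with a group containing "compute" and the pair `("ing", "e")` listed first, the query "computing" is rewritten to "compute" and resolves to that group.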
  • Patent number: 6396951
    Abstract: To obtain a query for use in information retrieval, a document is scanned. The resulting text image data define an image of a segment of text in a first language. Automatic recognition is then performed on at least part of the text image data to obtain text code data including a series of element codes. Each element code indicates an element that occurs in the first language, and the series of element codes defines a set of expressions that also occur in the first language. Automatic translation is then performed on a version of the text code data to obtain translation data indicating a set of counterpart expressions in a second language. The counterpart expressions are used to automatically obtain query data defining the query. The query can then be provided to an information retrieval engine.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: May 28, 2002
    Assignee: Xerox Corporation
    Inventor: Gregory Grefenstette
  • Patent number: 6308149
    Abstract: A set of words of a natural language is grouped by automatically obtaining suffix relation data that indicate a relation value for each of a set of relationships between suffixes that occur in the natural language, and, then, by automatically clustering the words in the set using the relation values from the suffix relation data, to obtain group data indicating groups of words. Two or more words in a group have suffixes as in one of the relationships and, preceding the suffixes, equivalent substrings. The relationships can be pairwise relationships, and the relation value can indicate the number of occurrences of a suffix pair. The suffix relation data can be obtained using an inflectional lexicon. Complete link clustering can be used.
    Type: Grant
    Filed: December 16, 1998
    Date of Patent: October 23, 2001
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Gregory Grefenstette, Jean-Pierre Chanod
  • Patent number: 6289304
    Abstract: Text is summarized using part-of-speech (POS) data indicating parts of speech for tokens in the text. The POS data can be obtained using input text data defining the text, such as by POS tagging. The POS data can be used to obtain group data indicating groups of tokens of the text, such as verb groups and noun groups. The group data can also indicate, within each group, any tokens that meet a POS based removal criterion. The group data can be used to obtain summarized text data by removing tokens that meet the removal criterion. The original text may be obtained via scanner or video camera from a user's document, and may be recognized to obtain input text data. The summarized text may be output as text or as audio pronunciation using a speech synthesizer.
    Type: Grant
    Filed: March 17, 1999
    Date of Patent: September 11, 2001
    Assignee: Xerox Corporation
    Inventor: Gregory Grefenstette
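
The POS-based summarization in the abstract above (patent 6289304) can be illustrated with a minimal Python sketch. It assumes the text is already tagged; the tag names, the grouping rule, the removal criterion (drop determiners, adjectives, and adverbs inside groups), and the function names are illustrative assumptions, not the patented procedure.

```python
# Assumed tag sets; a real tagger would supply its own tag inventory.
NOUN_GROUP_TAGS = {"DET", "ADJ", "NOUN"}
VERB_GROUP_TAGS = {"AUX", "ADV", "VERB"}
REMOVABLE_TAGS = {"DET", "ADJ", "ADV"}  # hypothetical removal criterion


def group_tokens(tagged):
    """Collect consecutive tokens into noun groups and verb groups."""
    groups = []
    for token, tag in tagged:
        if tag in NOUN_GROUP_TAGS:
            kind = "noun"
        elif tag in VERB_GROUP_TAGS:
            kind = "verb"
        else:
            kind = "other"
        # Extend the current group when the kind matches; otherwise start anew.
        if groups and kind != "other" and groups[-1][0] == kind:
            groups[-1][1].append((token, tag))
        else:
            groups.append((kind, [(token, tag)]))
    return groups


def summarize(tagged):
    """Drop tokens inside groups that meet the POS-based removal criterion."""
    kept = []
    for kind, members in group_tokens(tagged):
        for token, tag in members:
            if kind != "other" and tag in REMOVABLE_TAGS:
                continue
            kept.append(token)
    return " ".join(kept)
```

Given the tagged sentence "The old scanner has quickly read the long document", this sketch keeps the head noun and verb tokens of each group and returns "scanner has read document".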