Patents by Inventor Ashok C. Popat

Ashok C. Popat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and system for document image layout deconstruction and redisplay

Patent number: 10606933

Abstract: The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display by reformatting, or “re-flowing,” the document to fit an arbitrarily sized display device. A two-stage system analyzes, or “deconstructs,” page image layout. The deconstruction includes both physical (geometric) and logical (functional) segmentation of page images. The segmented image elements may include blocks, lines, and/or words of text, and other segmented image elements. The segmented image elements are synthesized and converted into an intermediate structure. The intermediate data structure is then distilled or converted or redisplayed into any number of standard print formats.

Type: Grant

Filed: June 3, 2011

Date of Patent: March 31, 2020

Assignee: XEROX CORPORATION

Inventors: Thomas M. Breuel, Henry S. Baird, William C. Janssen, Ashok C. Popat, Daniel S. Bloomberg
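
The abstract does not spell out the re-flow step itself. As a rough illustration only, the Python sketch below covers just that final step. It assumes segmentation has already produced word images in reading order, each reduced to a label and a pixel width; the WordBox type, the reflow function, and the 8-pixel gap are hypothetical, not taken from the patent. Words are packed greedily into lines no wider than the target display. This is not the patented two-stage method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    """A segmented word image, reduced to a label and its width in pixels (hypothetical)."""
    text: str
    width: int


def reflow(words: List[WordBox], display_width: int, gap: int = 8) -> List[List[WordBox]]:
    """Greedily pack word boxes, in reading order, into lines no wider than display_width.

    A word wider than the display is placed on a line of its own rather than split.
    """
    lines: List[List[WordBox]] = []
    current: List[WordBox] = []
    used = 0
    for word in words:
        # Width the current line would need if this word were appended to it.
        needed = word.width if not current else used + gap + word.width
        if current and needed > display_width:
            lines.append(current)
            current, used = [word], word.width
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(current)
    return lines
```

For example, on a 150-pixel display, boxes 90, 40, and 60 pixels wide re-flow into two lines. The first two words share a line (90 + 8 + 40 = 138 pixels) and the third wraps.
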
Identifying matching canonical documents in response to a visual query

Patent number: 9183224

Abstract: A server system receives a visual query from a client system. The visual query is an image containing text, such as a picture of a document. At the receiving server or another server, optical character recognition (OCR) is performed on the visual query to produce text recognition data representing textual characters. Each character in a contiguous region of the visual query is individually scored according to its quality. The quality score of a respective character is influenced by the quality scores of neighboring or nearby characters. Using the scores, one or more high quality strings of characters are identified. Each high quality string has a plurality of high quality characters. A canonical document containing the one or more high quality textual strings is retrieved. At least a portion of the canonical document is sent to the client system.

Type: Grant

Filed: August 6, 2010

Date of Patent: November 10, 2015

Assignee: Google Inc.

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
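
One way to picture neighbor-influenced character scoring is a sliding-window average over per-character OCR confidences, followed by extraction of the runs of characters that clear a threshold. The sketch below assumes confidences in [0, 1] and uses hypothetical parameters (radius, threshold, min_len). The abstract does not say how neighbor scores are combined, so the simple average here is an illustrative choice, not the patented scoring.

```python
from typing import List


def smooth_scores(scores: List[float], radius: int = 1) -> List[float]:
    """Average each character's score with the scores of neighbors within `radius` positions."""
    smoothed = []
    for i in range(len(scores)):
        window = scores[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def high_quality_strings(text: str, scores: List[float], threshold: float = 0.7,
                         min_len: int = 3, radius: int = 1) -> List[str]:
    """Return maximal runs of at least `min_len` characters whose smoothed score >= threshold."""
    if len(text) != len(scores):
        raise ValueError("expected one score per character")
    smoothed = smooth_scores(scores, radius)
    runs: List[str] = []
    start = None
    # A trailing -inf sentinel closes any run that reaches the end of the text.
    for i, s in enumerate(smoothed + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                runs.append(text[start:i])
            start = None
    return runs
```

With scores [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9] for "abcdefgh", only "abc" survives. "d" has a raw score of 0.9, but its low-scoring neighbor pulls its smoothed score below 0.7, and the trailing "h" forms a run of only one character.
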
Generating a combination of a visual query and matching canonical document

Patent number: 9176986

Abstract: A server system receives a visual query from a client system distinct from the server system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query, and scores each textual character in the plurality of textual characters. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query; retrieves a canonical document having the one or more high quality textual strings; generates a combination of the visual query and at least a portion of the canonical document; and sends the combination to the client system.

Type: Grant

Filed: December 1, 2011

Date of Patent: November 3, 2015

Assignee: Google Inc.

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
Identifying matching canonical documents consistent with visual query structural information

Patent number: 9087235

Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.

Type: Grant

Filed: July 29, 2014

Date of Patent: July 21, 2015

Assignee: Google Inc.

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
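
The abstract does not say what form the structural information takes. Purely as an assumption for illustration, the sketch below treats it as the reading order of the high-quality strings in the query and accepts a candidate canonical document only if that document contains every string in the same order. The corpus dictionary and both function names are hypothetical.

```python
from typing import Dict, List, Optional


def order_consistent(doc: str, strings: List[str]) -> bool:
    """True if every string occurs in doc, in the given order, without overlapping."""
    pos = 0
    for s in strings:
        found = doc.find(s, pos)  # earliest occurrence at or after pos
        if found < 0:
            return False
        pos = found + len(s)
    return True


def find_canonical(corpus: Dict[str, str], strings: List[str]) -> Optional[str]:
    """Return the id of the first document containing all strings in query order, else None."""
    for doc_id, text in corpus.items():
        if order_consistent(text, strings):
            return doc_id
    return None
```

Given the strings ["quick", "fox"], a document reading "the quick brown fox jumps" is accepted, while "fox jumps the quick" is rejected even though it contains both strings.
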
Compound splitting

Patent number: 9075792

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.

Type: Grant

Filed: February 14, 2011

Date of Patent: July 7, 2015

Assignee: Google Inc.

Inventors: Andrew M. Dai, Klaus Macherey, Franz Josef Och, Ashok C. Popat, David R. Talbot
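
One way to realize cost-based decompounding is dynamic programming over split points. The sketch below assumes sub-word costs are negative log relative frequencies from a hypothetical count table, and models a single morphological operation: a German-style linking "s" between constituents, which is not itself a dictionary word and carries a fixed, assumed cost (link_cost). It illustrates summing sub-word and operation costs; the patent's actual operations and cost functions may differ.

```python
import math
from typing import Dict, List, Optional, Tuple


def split_compound(token: str, counts: Dict[str, int],
                   link_cost: float = 1.0, min_len: int = 3) -> Optional[List[str]]:
    """Return the lowest-cost segmentation of token, or None if none exists.

    "+s" in the result marks a linking "s" inserted between two sub-words.
    """
    total = sum(counts.values())

    def word_cost(word: str) -> float:
        # Negative log relative frequency; short or unknown pieces are disallowed.
        c = counts.get(word, 0)
        return -math.log(c / total) if c > 0 and len(word) >= min_len else math.inf

    n = len(token)
    # best[i] holds (cost, parts) for the cheapest segmentation of token[:i].
    best: List[Tuple[float, List[str]]] = [(math.inf, []) for _ in range(n + 1)]
    best[0] = (0.0, [])
    for i in range(n):
        cost_i, parts_i = best[i]
        if cost_i == math.inf:
            continue
        for j in range(i + 1, n + 1):
            wc = word_cost(token[i:j])
            if wc == math.inf:
                continue
            # Plain concatenation: the sub-word ends at position j.
            if cost_i + wc < best[j][0]:
                best[j] = (cost_i + wc, parts_i + [token[i:j]])
            # Morphological operation: a linking "s" follows the sub-word.
            if j + 1 < n and token[j] == "s" and cost_i + wc + link_cost < best[j + 1][0]:
                best[j + 1] = (cost_i + wc + link_cost, parts_i + [token[i:j], "+s"])
    cost, parts = best[n]
    return parts if cost < math.inf else None
```

With counts for "verkehr" and "unfall", the token "verkehrsunfall" splits into ["verkehr", "+s", "unfall"], where "+s" records the linking-s operation.
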
Identifying Matching Canonical Documents Consistent With Visual Query Structural Information

Publication number: 20140334746

Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.

Type: Application

Filed: July 29, 2014

Publication date: November 13, 2014

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
Identifying matching canonical documents consistent with visual query structural information

Patent number: 8811742

Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.

Type: Grant

Filed: December 1, 2011

Date of Patent: August 19, 2014

Assignee: Google Inc.

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
Large language models in machine translation

Patent number: 8812291

Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.

Type: Grant

Filed: December 10, 2012

Date of Patent: August 19, 2014

Assignee: Google Inc.

Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
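
The backoff scoring in this abstract resembles the scheme commonly called "stupid backoff": use the n-gram's relative frequency when it was seen, and otherwise multiply a constant backoff factor by the score of the order n-1 backoff n-gram. The sketch below assumes a factor of 0.4, backs off by dropping the leftmost token, and counts n-grams from an in-memory token list; count_ngrams and backoff_score are hypothetical names. These scores are not normalized probabilities and need not sum to 1.

```python
from collections import Counter
from typing import Sequence, Tuple


def count_ngrams(tokens: Sequence[str], max_order: int) -> Counter:
    """Count every n-gram of order 1..max_order in the token sequence."""
    counts: Counter = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(ngram: Tuple[str, ...], counts: Counter,
                  total_tokens: int, alpha: float = 0.4) -> float:
    """Relative frequency if the n-gram was seen, else alpha times the backoff (n-1)-gram score."""
    if len(ngram) == 1:
        return counts[ngram] / total_tokens
    if counts[ngram] > 0:
        # Relative frequency of the n-gram given its (n-1)-token context.
        return counts[ngram] / counts[ngram[:-1]]
    return alpha * backoff_score(ngram[1:], counts, total_tokens, alpha)
```

On the six-token corpus "the cat sat on the mat", the bigram ("the", "cat") scores 1/2, while the unseen trigram ("cat", "on", "the") backs off to ("on", "the") and scores 0.4 * 1.0 = 0.4.
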
Identifying matching canonical documents in response to a visual query and in accordance with geographic information

Patent number: 8805079

Abstract: A server system receives a visual query from a client system distinct from the server system. The server system performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system scores each textual character in the plurality of textual characters in accordance with the geographic location of the client system. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. Then the server system retrieves a canonical document having the one or more high quality textual strings and sends at least a portion of the canonical document to the client system.

Type: Grant

Filed: December 1, 2011

Date of Patent: August 12, 2014

Assignee: Google Inc.

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
Methods and systems for assessing the quality of automatically generated text

Patent number: 8442813

Abstract: A set of ordered characters is received in association with information specifying the locations of the characters within the image of the document. Language-conditional character probabilities for each character are determined based on a set of language models and the ordering of the characters. Neighbor characters associated with a target character are identified based on the locations of the characters. Language-conditional character probabilities associated with the neighbor characters and language-conditional character probabilities associated with the target character are combined to generate a local language-conditional likelihood associated with the target character, the local language-conditional likelihood representing a concordance of the target character to a language model.

Type: Grant

Filed: February 5, 2009

Date of Patent: May 14, 2013

Assignee: Google Inc.

Inventor: Ashok C. Popat
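
A minimal sketch of the local likelihood computation, under several assumptions. A single character bigram model stands in for the abstract's set of language models (the bigram table, the "^" start symbol, and the probability floor are hypothetical). Neighbors are characters whose image coordinates lie within a fixed radius of the target, and the combination is the mean log probability of the target and its neighbors. A low value flags a character that fits the language model poorly, such as a likely recognition error.

```python
import math
from typing import Dict, List, Tuple

# A recognized character with its (x, y) position in the page image.
Char = Tuple[str, float, float]


def char_log_probs(chars: List[Char], bigram: Dict[Tuple[str, str], float],
                   floor: float = 1e-4) -> List[float]:
    """Log P(c_i | c_{i-1}) for each character, following the given character order."""
    logps = []
    prev = "^"  # hypothetical start-of-text symbol
    for ch, _, _ in chars:
        logps.append(math.log(bigram.get((prev, ch), floor)))
        prev = ch
    return logps


def local_likelihood(chars: List[Char], bigram: Dict[Tuple[str, str], float],
                     target: int, radius: float = 20.0) -> float:
    """Mean log probability over the target character and its spatial neighbors."""
    logps = char_log_probs(chars, bigram)
    _, tx, ty = chars[target]
    members = [i for i, (_, x, y) in enumerate(chars)
               if math.hypot(x - tx, y - ty) <= radius]  # always includes the target
    return sum(logps[i] for i in members) / len(members)
```

In a line reading "the" followed by an isolated "q", the "h" scores log 0.5 from its well-predicted neighbors, while the "q", with no bigram entry and no nearby neighbors, scores log 1e-4.
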
Large language models in machine translation

Patent number: 8332207

Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.

Type: Grant

Filed: June 22, 2007

Date of Patent: December 11, 2012

Assignee: Google Inc.

Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
Identifying Matching Canonical Documents in Response to a Visual Query and in Accordance with Geographic Information

Publication number: 20120134590

Abstract: A server system receives a visual query from a client system distinct from the server system. The server system performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system scores each textual character in the plurality of textual characters in accordance with the geographic location of the client system. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. Then the server system retrieves a canonical document having the one or more high quality textual strings and sends at least a portion of the canonical document to the client system.

Type: Application

Filed: December 1, 2011

Publication date: May 31, 2012

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
Generating a Combination of a Visual Query and Matching Canonical Document

Publication number: 20120128250

Abstract: A server system receives a visual query from a client system distinct from the server system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query, and scores each textual character in the plurality of textual characters. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query; retrieves a canonical document having the one or more high quality textual strings; generates a combination of the visual query and at least a portion of the canonical document; and sends the combination to the client system.

Type: Application

Filed: December 1, 2011

Publication date: May 24, 2012

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
Identifying Matching Canonical Documents Consistent with Visual Query Structural Information

Publication number: 20120128251

Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.

Type: Application

Filed: December 1, 2011

Publication date: May 24, 2012

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
Parallel Document Mining

Publication number: 20120047172

Abstract: A technique includes providing a collection of documents in multiple languages, identifying, from the collection of documents, a group of candidate documents, where each candidate document in the group shares multiple corresponding rare features, evaluating pairs of candidate documents in the group using multiple common features present in the collection of documents, and determining, based on evaluating the pairs of candidate documents, whether each pair of candidate documents corresponds to a translated pair of documents.

Type: Application

Filed: August 22, 2011

Publication date: February 23, 2012

Applicant: Google Inc.

Inventors: Jay M. Ponte, Jakob Uszkoreit, Ashok C. Popat, Moshe Dubiner
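
The candidate-then-verify structure can be sketched with an inverted index. This is an illustration under stated assumptions, not the application's method. Each document is a language tag plus a set of language-independent features (for example numbers, URLs, or names). A feature is "rare" if it occurs in at most max_df documents, documents in different languages that share at least min_shared_rare rare features become candidate pairs, and each pair is then scored by Jaccard similarity over all of its features. All names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Doc = Tuple[str, Set[str]]  # (language, features)


def mine_pairs(docs: Dict[str, Doc], max_df: int = 2, min_shared_rare: int = 2,
               min_similarity: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for cross-language pairs judged to be translations."""
    # Document frequency of every feature across the collection.
    df = Counter(f for _, feats in docs.values() for f in feats)
    # Inverted index restricted to rare features.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, (_, feats) in docs.items():
        for f in feats:
            if df[f] <= max_df:
                index[f].add(doc_id)
    # Count how many rare features each pair of documents shares.
    shared: Counter = Counter()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    pairs = []
    for (a, b), n_shared in shared.items():
        (lang_a, feats_a), (lang_b, feats_b) = docs[a], docs[b]
        if n_shared < min_shared_rare or lang_a == lang_b:
            continue
        # Evaluate the candidate pair on all of its features.
        sim = len(feats_a & feats_b) / len(feats_a | feats_b)
        if sim >= min_similarity:
            pairs.append((a, b, sim))
    return sorted(pairs)
```

In a small collection where an English and a German document share a name and a URL that appear nowhere else, and three of their five distinct features overall, that pair is returned with similarity 0.6; pairs sharing only one rare feature are never evaluated.
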
Method and System for Document Image Layout Deconstruction and Redisplay

Publication number: 20110289395

Abstract: The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display by reformatting, or “re-flowing,” the document to fit an arbitrarily sized display device. A two-stage system analyzes, or “deconstructs,” page image layout. The deconstruction includes both physical (geometric) and logical (functional) segmentation of page images. The segmented image elements may include blocks, lines, and/or words of text, and other segmented image elements. The segmented image elements are synthesized and converted into an intermediate structure. The intermediate data structure is then distilled or converted or redisplayed into any number of standard print formats.

Type: Application

Filed: June 3, 2011

Publication date: November 24, 2011

Applicant: XEROX CORPORATION

Inventors: Thomas M. Breuel, Henry S. Baird, William C. Janssen, Ashok C. Popat, Dan S. Bloomberg
Compound Splitting

Publication number: 20110202330

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.

Type: Application

Filed: February 14, 2011

Publication date: August 18, 2011

Applicant: GOOGLE INC.

Inventors: Andrew M. Dai, Klaus Macherey, Franz Josef Och, Ashok C. Popat, David R. Talbot
Identifying Matching Canonical Documents in Response to a Visual Query

Publication number: 20110129153

Abstract: A server system receives a visual query from a client system. The visual query is an image containing text, such as a picture of a document. At the receiving server or another server, optical character recognition (OCR) is performed on the visual query to produce text recognition data representing textual characters. Each character in a contiguous region of the visual query is individually scored according to its quality. The quality score of a respective character is influenced by the quality scores of neighboring or nearby characters. Using the scores, one or more high quality strings of characters are identified. Each high quality string has a plurality of high quality characters. A canonical document containing the one or more high quality textual strings is retrieved. At least a portion of the canonical document is sent to the client system.

Type: Application

Filed: August 6, 2010

Publication date: June 2, 2011

Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
Large Language Models in Machine Translation

Publication number: 20080243481

Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.

Type: Application

Filed: June 22, 2007

Publication date: October 2, 2008

Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
Document image decoding systems and methods using modified stack algorithm

Patent number: 7289668

Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.

Type: Grant

Filed: August 9, 2002

Date of Patent: October 30, 2007

Assignee: Xerox Corporation

Inventors: Daniel H. Greene, Tze-Lei Poo, Ashok C. Popat
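
The general shape of stack decoding can be shown on a toy one-dimensional "image" of ink ("#") and blank (".") columns. In the sketch below, a best-first (stack) search extends partial paths by matching character templates. Each partial path's priority adds an optimistic bound for the unmatched columns, a stand-in for the patent's provisional weight, and hypotheses ending at the same column are treated as equivalent, so only the best-scoring one is kept. The templates, the +1/-1 column scoring, and stack_decode are assumptions for illustration; the iteration described in the abstract is not modeled.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def stack_decode(observed: str, templates: Dict[str, str]) -> Optional[str]:
    """Return the best-scoring character string whose templates tile `observed` exactly."""
    n = len(observed)
    neg_inf = float("-inf")
    # Heap entries: (-priority, -score, end_column, text). Priority = score + (n - end_column),
    # an optimistic bound that assumes every remaining column will match (+1 each).
    heap: List[Tuple[int, int, int, str]] = [(-n, 0, 0, "")]
    best_at: Dict[int, int] = {0: 0}  # best score of any hypothesis ending at each column
    while heap:
        _, neg_score, pos, text = heapq.heappop(heap)
        score = -neg_score
        if score < best_at.get(pos, neg_inf):
            continue  # superseded by an equivalent, better hypothesis
        if pos == n:
            return text  # no open hypothesis can beat a complete one popped first
        for ch, pattern in templates.items():
            end = pos + len(pattern)
            if end > n:
                continue
            # +1 for each column the template matches, -1 for each mismatch.
            gain = sum(1 if a == b else -1 for a, b in zip(observed[pos:end], pattern))
            new_score = score + gain
            if new_score <= best_at.get(end, neg_inf):
                continue  # an equivalent hypothesis at this column already scores as well
            best_at[end] = new_score
            heapq.heappush(heap, (-(new_score + n - end), -new_score, end, text + ch))
    return None
```

For templates {"a": "#.", "b": "##.", "c": "."}, the observation "#.##." decodes to "ab", the only tiling in which every column matches.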
