Patents by Inventor Ashok C. Popat
Ashok C. Popat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10606933Abstract: The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display, by reformatting or “re-flowing” of the document to fit an arbitrarily sized display device. A two-stage system analyzes, or “deconstructs,” page image layout. The deconstruction includes both physical (geometric) and logical (functional) segmentation of page images. The segment that image elements may include blocks, lines, and/or words of text, and other segmented image elements. The segment that image elements are synthesized and converted into an intermediate structure. The intermediate data structure is then distilled or converted or redisplayed into any number of standard print formats.Type: GrantFiled: June 3, 2011Date of Patent: March 31, 2020Assignee: XEROX CORPORATIONInventors: Thomas M. Breuel, Henry S. Baird, William C. Janssen, Ashok C. Popat, Daniel S. Bloomberg
-
Patent number: 9183224Abstract: A server system receives a visual query from a client system. The visual query is an image containing text such as a picture of a document. At the receiving server or another server, optical character recognition (OCR) is performed on the visual query to produce text recognition data representing textual characters. Each character in a contiguous region of the visual query is individually scored according to its quality. The quality score of a respective character is influenced by the quality scores of neighboring or nearby characters. Using the scores, one or more high quality strings of characters are identified. Each high quality string has a plurality of high quality characters. A canonical document containing the one or more high quality textual strings is retrieved. At least a portion of the canonical document is sent to the client system.Type: GrantFiled: August 6, 2010Date of Patent: November 10, 2015Assignee: Google Inc.Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Patent number: 9176986Abstract: A server system receives a visual query from a client system distinct from the server system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query, and scores each textual character in the plurality of textual characters. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query; retrieves a canonical document having the one or more high quality textual strings; generates a combination of the visual query and at least a portion of the canonical document; and sends the combination to the client system.Type: GrantFiled: December 1, 2011Date of Patent: November 3, 2015Assignee: Google Inc.Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Patent number: 9087235Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.Type: GrantFiled: July 29, 2014Date of Patent: July 21, 2015Assignee: Google Inc.Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Patent number: 9075792Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.Type: GrantFiled: February 14, 2011Date of Patent: July 7, 2015Assignee: Google Inc.Inventors: Andrew M. Dai, Klaus Macherey, Franz Josef Och, Ashok C. Popat, David R. Talbot
-
Publication number: 20140334746Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.Type: ApplicationFiled: July 29, 2014Publication date: November 13, 2014Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Patent number: 8812291Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n?1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.Type: GrantFiled: December 10, 2012Date of Patent: August 19, 2014Assignee: Google Inc.Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
-
Patent number: 8811742Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.Type: GrantFiled: December 1, 2011Date of Patent: August 19, 2014Assignee: Google Inc.Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Patent number: 8805079Abstract: A server system receives a visual query from a client system distinct from the server system. The server system performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system scores each textual character in the plurality of textual characters in accordance with the geographic location of the client system. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. Then the server system retrieves a canonical document having the one or more high quality textual strings and sends at least a portion of the canonical document to the client system.Type: GrantFiled: December 1, 2011Date of Patent: August 12, 2014Assignee: Google Inc.Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Patent number: 8442813Abstract: A set of ordered characters is received in association with information specifying the locations of the characters within the image of the document. Language-conditional character probabilities for each character are determined based on a set of language models and the ordering of the characters. Neighbor characters associated with a target character are identified based on the locations of the characters. Language-conditional character probabilities associated with the neighbor characters and language-conditional character probabilities associated with the target character are combined to generate a local language-conditional likelihood associated with the target character, the local language-conditional likelihood representing a concordance of the target character to a language model.Type: GrantFiled: February 5, 2009Date of Patent: May 14, 2013Assignee: Google Inc.Inventor: Ashok C. Popat
-
Patent number: 8332207Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.Type: GrantFiled: June 22, 2007Date of Patent: December 11, 2012Assignee: Google Inc.Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
-
Publication number: 20120134590Abstract: A server system receives a visual query from a client system distinct from the server system. The server system performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system scores each textual character in the plurality of textual characters in accordance with the geographic location of the client system. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. Then the server system retrieves a canonical document having the one or more high quality textual strings and sends at least a portion of the canonical document to the client system.Type: ApplicationFiled: December 1, 2011Publication date: May 31, 2012Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Publication number: 20120128251Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.Type: ApplicationFiled: December 1, 2011Publication date: May 24, 2012Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Publication number: 20120128250Abstract: A server system receives a visual query from a client system distinct from the server system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query, and scores each textual character in the plurality of textual characters. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query; retrieves a canonical document having the one or more high quality textual strings; generates a combination of the visual query and at least a portion of the canonical document; and sends the combination to the client system.Type: ApplicationFiled: December 1, 2011Publication date: May 24, 2012Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Publication number: 20120047172Abstract: A technique includes providing a collection of documents in multiple languages, identifying, from the collection of documents, a group of candidate documents, where each candidate document in the group shares multiple corresponding rare features, evaluating pairs of candidate documents in the group using multiple common features present in the collection of documents, and determining, based on evaluating the pairs of candidate documents, whether each pair of candidate documents corresponds to a translated pair of documents.Type: ApplicationFiled: August 22, 2011Publication date: February 23, 2012Applicant: Google Inc.Inventors: Jay M. Ponte, Jakob Uszkoreit, Ashok C. Popat, Moshe Dubiner
-
Publication number: 20110289395Abstract: The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display, by reformatting or “re-flowing” of the document to fit an arbitrarily sized display device. A two-stage system analyzes, or “deconstructs,” page image layout. The deconstruction includes both physical (geometric) and logical (functional) segmentation of page images. The segment that image elements may include blocks, lines, and/or words of text, and other segmented image elements. The segment that image elements are synthesized and converted into an intermediate structure. The intermediate data structure is then distilled or converted or redisplayed into any number of standard print formats.Type: ApplicationFiled: June 3, 2011Publication date: November 24, 2011Applicant: XEROX CORPORATIONInventors: Thomas M. BREUEL, Henry S. BAIRD, William C. JANSSEN, Ashok C. POPAT, Dan S. BLOOMBERG
-
Publication number: 20110202330Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.Type: ApplicationFiled: February 14, 2011Publication date: August 18, 2011Applicant: GOOGLE INC.Inventors: Andrew M. Dai, Klaus Macherey, Franz Josef Och, Ashok C. Popat, David R. Talbot
-
Publication number: 20110129153Abstract: A server system receives a visual query from a client system. The visual query is an image containing text such as a picture of a document. At the receiving server or another server, optical character recognition (OCR) is performed on the visual query to produce text recognition data representing textual characters. Each character in a contiguous region of the visual query is individually scored according to its quality. The quality score of a respective character is influenced by the quality scores of neighboring or nearby characters. Using the scores, one or more high quality strings of characters are identified. Each high quality string has a plurality of high quality characters. A canonical document containing the one or more high quality textual strings is retrieved. At least a portion of the canonical document is sent to the client system.Type: ApplicationFiled: August 6, 2010Publication date: June 2, 2011Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
-
Publication number: 20080243481Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.Type: ApplicationFiled: June 22, 2007Publication date: October 2, 2008Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
-
Patent number: 7289668Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improved decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.Type: GrantFiled: August 9, 2002Date of Patent: October 30, 2007Assignee: Xerox CorporationInventors: Daniel H. Greene, Tze-Lei Poo, Ashok C. Popat