Patents by Inventor Ashok C. Popat

Ashok C. Popat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10606933
    Abstract: The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display, by reformatting or “re-flowing” of the document to fit an arbitrarily sized display device. A two-stage system analyzes, or “deconstructs,” page image layout. The deconstruction includes both physical (geometric) and logical (functional) segmentation of page images. The segment that image elements may include blocks, lines, and/or words of text, and other segmented image elements. The segment that image elements are synthesized and converted into an intermediate structure. The intermediate data structure is then distilled or converted or redisplayed into any number of standard print formats.
    Type: Grant
    Filed: June 3, 2011
    Date of Patent: March 31, 2020
    Assignee: XEROX CORPORATION
    Inventors: Thomas M. Breuel, Henry S. Baird, William C. Janssen, Ashok C. Popat, Daniel S. Bloomberg
  • Patent number: 9183224
    Abstract: A server system receives a visual query from a client system. The visual query is an image containing text such as a picture of a document. At the receiving server or another server, optical character recognition (OCR) is performed on the visual query to produce text recognition data representing textual characters. Each character in a contiguous region of the visual query is individually scored according to its quality. The quality score of a respective character is influenced by the quality scores of neighboring or nearby characters. Using the scores, one or more high quality strings of characters are identified. Each high quality string has a plurality of high quality characters. A canonical document containing the one or more high quality textual strings is retrieved. At least a portion of the canonical document is sent to the client system.
    Type: Grant
    Filed: August 6, 2010
    Date of Patent: November 10, 2015
    Assignee: Google Inc.
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Patent number: 9176986
    Abstract: A server system receives a visual query from a client system distinct from the server system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query, and scores each textual character in the plurality of textual characters. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query; retrieves a canonical document having the one or more high quality textual strings; generates a combination of the visual query and at least a portion of the canonical document; and sends the combination to the client system.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: November 3, 2015
    Assignee: Google Inc.
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Patent number: 9087235
    Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.
    Type: Grant
    Filed: July 29, 2014
    Date of Patent: July 21, 2015
    Assignee: Google Inc.
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Patent number: 9075792
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: July 7, 2015
    Assignee: Google Inc.
    Inventors: Andrew M. Dai, Klaus Macherey, Franz Josef Och, Ashok C. Popat, David R. Talbot
  • Publication number: 20140334746
    Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.
    Type: Application
    Filed: July 29, 2014
    Publication date: November 13, 2014
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Patent number: 8812291
    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n?1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: August 19, 2014
    Assignee: Google Inc.
    Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
  • Patent number: 8811742
    Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: August 19, 2014
    Assignee: Google Inc.
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Patent number: 8805079
    Abstract: A server system receives a visual query from a client system distinct from the server system. The server system performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system scores each textual character in the plurality of textual characters in accordance with the geographic location of the client system. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. Then the server system retrieves a canonical document having the one or more high quality textual strings and sends at least a portion of the canonical document to the client system.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: August 12, 2014
    Assignee: Google Inc.
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Patent number: 8442813
    Abstract: A set of ordered characters is received in association with information specifying the locations of the characters within the image of the document. Language-conditional character probabilities for each character are determined based on a set of language models and the ordering of the characters. Neighbor characters associated with a target character are identified based on the locations of the characters. Language-conditional character probabilities associated with the neighbor characters and language-conditional character probabilities associated with the target character are combined to generate a local language-conditional likelihood associated with the target character, the local language-conditional likelihood representing a concordance of the target character to a language model.
    Type: Grant
    Filed: February 5, 2009
    Date of Patent: May 14, 2013
    Assignee: Google Inc.
    Inventor: Ashok C. Popat
  • Patent number: 8332207
    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
    Type: Grant
    Filed: June 22, 2007
    Date of Patent: December 11, 2012
    Assignee: Google Inc.
    Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
  • Publication number: 20120134590
    Abstract: A server system receives a visual query from a client system distinct from the server system. The server system performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system scores each textual character in the plurality of textual characters in accordance with the geographic location of the client system. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. Then the server system retrieves a canonical document having the one or more high quality textual strings and sends at least a portion of the canonical document to the client system.
    Type: Application
    Filed: December 1, 2011
    Publication date: May 31, 2012
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Publication number: 20120128251
    Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.
    Type: Application
    Filed: December 1, 2011
    Publication date: May 24, 2012
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Publication number: 20120128250
    Abstract: A server system receives a visual query from a client system distinct from the server system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query, and scores each textual character in the plurality of textual characters. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query; retrieves a canonical document having the one or more high quality textual strings; generates a combination of the visual query and at least a portion of the canonical document; and sends the combination to the client system.
    Type: Application
    Filed: December 1, 2011
    Publication date: May 24, 2012
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Publication number: 20120047172
    Abstract: A technique includes providing a collection of documents in multiple languages, identifying, from the collection of documents, a group of candidate documents, where each candidate document in the group shares multiple corresponding rare features, evaluating pairs of candidate documents in the group using multiple common features present in the collection of documents, and determining, based on evaluating the pairs of candidate documents, whether each pair of candidate documents corresponds to a translated pair of documents.
    Type: Application
    Filed: August 22, 2011
    Publication date: February 23, 2012
    Applicant: Google Inc.
    Inventors: Jay M. Ponte, Jakob Uszkoreit, Ashok C. Popat, Moshe Dubiner
  • Publication number: 20110289395
    Abstract: The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display, by reformatting or “re-flowing” of the document to fit an arbitrarily sized display device. A two-stage system analyzes, or “deconstructs,” page image layout. The deconstruction includes both physical (geometric) and logical (functional) segmentation of page images. The segment that image elements may include blocks, lines, and/or words of text, and other segmented image elements. The segment that image elements are synthesized and converted into an intermediate structure. The intermediate data structure is then distilled or converted or redisplayed into any number of standard print formats.
    Type: Application
    Filed: June 3, 2011
    Publication date: November 24, 2011
    Applicant: XEROX CORPORATION
    Inventors: Thomas M. BREUEL, Henry S. BAIRD, William C. JANSSEN, Ashok C. POPAT, Dan S. BLOOMBERG
  • Publication number: 20110202330
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.
    Type: Application
    Filed: February 14, 2011
    Publication date: August 18, 2011
    Applicant: GOOGLE INC.
    Inventors: Andrew M. Dai, Klaus Macherey, Franz Josef Och, Ashok C. Popat, David R. Talbot
  • Publication number: 20110129153
    Abstract: A server system receives a visual query from a client system. The visual query is an image containing text such as a picture of a document. At the receiving server or another server, optical character recognition (OCR) is performed on the visual query to produce text recognition data representing textual characters. Each character in a contiguous region of the visual query is individually scored according to its quality. The quality score of a respective character is influenced by the quality scores of neighboring or nearby characters. Using the scores, one or more high quality strings of characters are identified. Each high quality string has a plurality of high quality characters. A canonical document containing the one or more high quality textual strings is retrieved. At least a portion of the canonical document is sent to the client system.
    Type: Application
    Filed: August 6, 2010
    Publication date: June 2, 2011
    Inventors: David Petrou, Ashok C. Popat, Matthew R. Casey
  • Publication number: 20080243481
    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
    Type: Application
    Filed: June 22, 2007
    Publication date: October 2, 2008
    Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
  • Patent number: 7289668
    Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improved decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.
    Type: Grant
    Filed: August 9, 2002
    Date of Patent: October 30, 2007
    Assignee: Xerox Corporation
    Inventors: Daniel H. Greene, Tze-Lei Poo, Ashok C. Popat