Patents by Inventor Aswath Manoharan

Aswath Manoharan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7908279
    Abstract: Systems and methods for filtering tokens from a document for determining whether the document describes substantially similar subject matter compared to another document are described. In one embodiment, a first document is obtained. This document is organized into a plurality of fields, and at least some of the fields include tokens representing the subject matter described by the document. A field of this document is selected and a token from within the selected field having the highest inverse document frequency (IDF) is selected. Those tokens that have a higher IDF than the selected token are removed. Using the remaining tokens, a determination is made as to whether the first document describes substantially similar subject matter to the subject matter described by a second document. An indication is provided as to whether the first document describes substantially similar subject matter to that described by a second document according to the determination.
    Type: Grant
    Filed: September 17, 2007
    Date of Patent: March 15, 2011
    Assignee: Amazon Technologies, Inc.
    Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
  • Patent number: 7904462
    Abstract: Systems and methods are described for determining whether a first document is a potential duplicate of a second document such that the two documents describe the same or substantially the same subject matter, wherein the first and second documents include attribute data in attribute fields. A set of rules is obtained for determining whether the first document is a potential duplicate of the second document. For each rule in the set of rules, a determination is made as to whether data in a first set of attributes of the first document is contained in a second set of attributes of the second document. According to the results of the evaluated rules in the rules set, a determination is made as to whether the first document is a potential duplicate of the second document. If, according to the evaluated rules in the rules set, the first document is determined to be a potential duplicate of the second document, a reference to the first document is stored in a set of potential duplicates of the second document.
    Type: Grant
    Filed: December 10, 2007
    Date of Patent: March 8, 2011
    Assignee: Amazon Technologies, Inc.
    Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
  • Patent number: 7895225
    Abstract: According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document is provided. A source document is obtained. A list of queries corresponding to the source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents is stored or displayed.
    Type: Grant
    Filed: December 6, 2007
    Date of Patent: February 22, 2011
    Assignee: Amazon Technologies, Inc.
    Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
  • Patent number: 7814107
    Abstract: A system and method for determining the likelihood of two documents describing substantially similar subject matter is presented. A set of tokens for each of two documents is obtained, each set representing strings of characters found in the corresponding document. A matrix of token pairs is determined, each token pair comprising a token from each set of tokens. For each token pair in the matrix, a similarity score is determined. Those token pairs in the matrix with a similarity score above a threshold score are selected and added to a set of matched tokens. A similarity score for the two documents is determined according to the scores of the token pairs added to the set of matched tokens. The determined similarity score is provided as the likelihood that the first and second documents describe substantially similar subject matter.
    Type: Grant
    Filed: May 25, 2007
    Date of Patent: October 12, 2010
    Assignee: Amazon Technologies, Inc.
    Inventors: Srikanth Thirumalai, Egidio Terra, Vijai Mohan, Mark J. Tomko, Grant M. Emery, Aswath Manoharan
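
The token-pair matching outlined in the abstract of patent 7814107 can be illustrated with a minimal Python sketch. This is not the patented implementation: the whitespace tokenizer, the use of `difflib.SequenceMatcher` as the per-pair similarity measure, the 0.8 threshold, and the final aggregation (sum of matched scores divided by the larger token count) are all assumptions made for illustration, since the abstract does not specify them. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def tokenize(text):
    """Split text into lowercase whitespace-delimited tokens (assumed tokenizer)."""
    return text.lower().split()


def pair_score(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the unspecified metric."""
    return SequenceMatcher(None, a, b).ratio()


def document_similarity(doc_a, doc_b, threshold=0.8):
    """Return (score, matched_pairs) for two documents.

    The score is a raw aggregate, not a calibrated probability, and it can
    exceed 1.0 when one token matches several tokens in the other document.
    """
    tokens_a, tokens_b = tokenize(doc_a), tokenize(doc_b)
    if not tokens_a or not tokens_b:
        return 0.0, []

    # Matrix of token pairs: rows follow tokens_a, columns follow tokens_b.
    matrix = [[pair_score(a, b) for b in tokens_b] for a in tokens_a]

    # Keep only the pairs whose similarity is above the threshold.
    matched = [
        (tokens_a[i], tokens_b[j], matrix[i][j])
        for i in range(len(tokens_a))
        for j in range(len(tokens_b))
        if matrix[i][j] > threshold
    ]

    # Assumed aggregation: sum of matched scores over the larger token count.
    score = sum(s for _, _, s in matched) / max(len(tokens_a), len(tokens_b))
    return score, matched


if __name__ == "__main__":
    score, pairs = document_similarity("red cotton shirt", "red cotton shirts")
    print(score, pairs)
```

Raising the threshold makes the match stricter, so near-identical spellings stop counting as matched pairs; lowering it admits looser matches and inflates the aggregate score.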