Patents Represented by Attorney Sylvia Rodriguez
  • Patent number: 8055633
    Abstract: A method of duplicate detection for data items in a stream of data items, the method comprising the steps of: receiving a data item from the stream of data items; applying at least two different hashing algorithms to the data item to generate hash keys that identify elements in a first bloom filter data structure having a plurality of elements; checking a state of each of the identified elements to determine if the data item is a potential duplicate, the determination depending on whether the identified elements are indicated as having been also identified for a previous data item received from the stream; and in response to the determination that the data item is a potential duplicate, checking an index of hash keys to determine if at least one of the generated hash keys exists in the index to identify the data item as an actual duplicate.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: November 8, 2011
    Assignee: International Business Machines Corporation
    Inventor: James Richard Hamilton Whyte
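
The two-stage check described in the abstract of patent 8055633 can be illustrated with a minimal sketch. This is not the patented implementation: the class name `DuplicateDetector`, the method `is_duplicate`, the choice of SHA-256 and BLAKE2b as the two hashing algorithms, the filter size, and the use of an in-memory set as the index of hash keys are all assumptions made for illustration. The sketch also assumes each full digest serves as the hash key, reduced modulo the filter size to pick an element.

```python
import hashlib


class DuplicateDetector:
    """Bloom filter pre-check followed by an index lookup (illustrative only)."""

    def __init__(self, num_elements=1024):
        self.num_elements = num_elements
        # One byte per Bloom filter element: 0 = never identified, 1 = identified.
        self.elements = bytearray(num_elements)
        # Hypothetical stand-in for the index of hash keys.
        self.index = set()

    def _hash_keys(self, item):
        # Two different hashing algorithms applied to the same data item.
        return [
            int.from_bytes(hashlib.sha256(item).digest(), "big"),
            int.from_bytes(hashlib.blake2b(item).digest(), "big"),
        ]

    def is_duplicate(self, item):
        keys = self._hash_keys(item)
        positions = [key % self.num_elements for key in keys]

        # Stage 1: the item is a potential duplicate only if every identified
        # element was already identified for a previously received item.
        potential = all(self.elements[pos] for pos in positions)

        # Stage 2: confirm against the index only for potential duplicates.
        if potential and any(key in self.index for key in keys):
            return True

        # New item: record it in both the filter and the index.
        for pos in positions:
            self.elements[pos] = 1
        self.index.update(keys)
        return False


detector = DuplicateDetector()
stream = [b"alpha", b"beta", b"alpha"]
results = [detector.is_duplicate(item) for item in stream]
```

In this sketch the index is consulted only when the filter reports a potential duplicate, so most new items are handled by the inexpensive filter check alone, and a filter false positive is caught by the index lookup.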