Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)

User-determinable method and system for manipulating and displaying textual and graphical information

Patent number: 8874529

Abstract: One or more aspects of the invention include transforming source data in order to display a work product. A plurality of rules relating to content manipulation of the source data include at least one rule relating to content selection and at least one rule relating to content compression. Source data for content manipulation may also be received. A selected portion of the source data and a compressed portion of the source data may be formed. The compressed portion may then be received and presented on a computer as a work product.

Type: Grant

Filed: March 16, 2009

Date of Patent: October 28, 2014

Inventor: Bert A. Silich
DATA DE-DUPLICATION

Publication number: 20140317067

Abstract: Disclosed are computer implemented methods, computer program products, and computer systems for storing a file into a storage system. An embodiment includes, responsive to a determination that a descriptive information describing content of a first file corresponds to a descriptive information describing content of a second file, that a format of the first file is convertible to a format of the second file using a transformation matrix, and that the format of the first file has a higher quality indicator value than the format of the second file, storing the first file into the storage system.

Type: Application

Filed: April 23, 2014

Publication date: October 23, 2014

Applicant: International Business Machines Corporation

Inventors: Michael Baessler, Peng Hui Jiang, Pi Jun Jiang
System and method for generating and updating location check digits

Patent number: 8868519

Abstract: Method, apparatus and program product for generating check data for a location within an area of a workspace include receiving an identifier for a selected location that has check data associated therewith. Candidate check data for use with the selected location is generated. The candidate check data is evaluated for a match against at least one of existing check data for the selected location or check data associated with a related location. Based on the evaluation, a determination is made of whether the candidate check data is acceptable for use for the selected location.

Type: Grant

Filed: May 27, 2011

Date of Patent: October 21, 2014

Assignee: Vocollect, Inc.

Inventors: James D. Maloy, Michael Kusar, Alexander Mranca, Venkatesh Narayan, Jeffrey Thorsen
Systems and methods for protecting data in a network host environment

Patent number: 8868505

Abstract: Data protection programs are installed at each network host. The programs communicate with each other to scan the hosts and identify duplicate and unique data objects stored at the hosts. Duplicate data objects are maintained on the hosts. Unique data objects are broken into chunks, copied to other hosts, and a parity data is calculated. When a network host becomes unavailable and is replaced with a new network host, duplicate data objects stored on the now unavailable network host may be rebuilt on the new network host using the maintained duplicate data objects on the other hosts. Unique data objects stored on the now unavailable network host may be rebuilt on the new network host using the copied chunks and parity data.

Type: Grant

Filed: March 20, 2012

Date of Patent: October 21, 2014

Assignee: EMC Corporation

Inventor: Mahendra Nag Jayanthi
Blob manipulation in an integrated structured storage system

Patent number: 8868624

Abstract: Embodiments of the present invention relate to systems, methods and computer storage media for facilitating the structured storage of binary large objects (Blobs) to be accessed by an application program being executed by a computing device. Generally, the manipulation of Blobs in a structured storage system includes receiving a request for a Blob, which may be located by way of a Blob pointer. The Blob pointer allows for the data, such as properties, of the Blob to be identified and located. Expired properties are garbage collected as a manipulation of the Blob data within a structured storage system. In an embodiment, the Blob is identified by a key that is utilized within a primary structured index to locate the requested Blob. In another embodiment, the requested Blob is located utilizing a secondary hash index. In an additional embodiment, the Blob is located utilizing a file table.

Type: Grant

Filed: July 22, 2013

Date of Patent: October 21, 2014

Assignee: Microsoft Corporation

Inventors: Bradley Gene Calder, Ju Wang, Xinran Wu, Niranjan Nilakantan, Deepali Bhardwaj, Shashwat Srivastav, Alexander Felsobuki Nagy
System and method for removing overlapping ranges from a flat sorted data structure

Patent number: 8868520

Abstract: A system and method efficiently removes ranges of entries from a flat sorted data structure, such as a fingerprint database, of a storage system. The ranges of entries represent fingerprints that have become stale, i.e., are not representative of current states of corresponding blocks in the file system, due to various file system operations such as, e.g., deletion of a data block without overwriting its contents. A deduplication module performs an attributes intersect range calculation (AIRC) procedure on the stale fingerprint data structure to compute a set of non-overlapping and latest consistency point (CP) ranges. The output from the AIRC procedure, i.e., the set of non-overlapping and latest CP ranges, is then used to remove stale fingerprints associated with that deleted block (as well as each other deleted data block) from the fingerprint database.

Type: Grant

Filed: March 1, 2012

Date of Patent: October 21, 2014

Assignee: NetApp, Inc.

Inventors: Rohini Raghuwanshi, Ashish Shukla, Praveen Killamsetti
Processing of streaming data with keyed aggregation

Patent number: 8868518

Abstract: Keyed aggregation is used in the processing of streaming data to streamline processing to provide higher throughput and decreased use of resources. The most recent event for each unique replacement key value(s) is maintained. In response to an incoming event having a same key as a previous event, the effect on an aggregation of the previous event is removed. The aggregation is then updated with one or more values from the arriving event and the updated aggregation is output.

Type: Grant

Filed: August 14, 2009

Date of Patent: October 21, 2014

Assignee: International Business Machines Corporation

Inventors: Henrique Andrade, Mitchell A. Cohen, Bugra Gedik
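
Illustration (not part of the patent text): the abstract of patent 8868518 above describes keeping the most recent event per key, removing the previous event's effect when a new event arrives with the same key, and then outputting the updated aggregation. The following is a minimal Python sketch of that idea for a running sum. The class name `KeyedSum`, its fields, and the choice of a sum as the aggregation are assumptions made for illustration only.

```python
from typing import Dict, Hashable


class KeyedSum:
    """Running sum in which a new event replaces the previous event with the same key.

    Hypothetical illustration; the patent abstract does not prescribe this structure.
    """

    def __init__(self) -> None:
        self.latest: Dict[Hashable, float] = {}  # most recent value seen per key
        self.total: float = 0.0                  # current aggregation

    def on_event(self, key: Hashable, value: float) -> float:
        previous = self.latest.get(key)
        if previous is not None:
            # Remove the effect of the earlier event that shares this key.
            self.total -= previous
        self.latest[key] = value
        self.total += value
        return self.total  # the updated aggregation is the output


agg = KeyedSum()
outputs = [agg.on_event(k, v) for k, v in [("a", 10.0), ("b", 5.0), ("a", 3.0)]]
print(outputs)
```

Because only the latest value per key is retained, each update touches a constant amount of state regardless of how many events have arrived.
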
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM

Publication number: 20140310252

Abstract: An information processing apparatus is provided, in which content and position information generated independently of each other are recorded in a recording medium. The apparatus includes a recording medium in which the content and the position information are recorded and a deletion unit deleting position information temporally associated with a piece of the content from the recording medium when the piece of content is deleted from the recording medium.

Type: Application

Filed: June 26, 2014

Publication date: October 16, 2014

Applicant: Sony Corporation

Inventors: Masayuki ICHIHARA, Masanao TSUTSUI
INTELLIGENT DEDUPLICATION DATA PREFETCHING

Publication number: 20140310251

Abstract: Deduplication dictionaries are used to maintain data chunk identifier and location pairings in a deduplication system. When access to a particular data chunk is requested, a deduplication dictionary is accessed to determine the location of the data chunk and a datastore is accessed to retrieve the data chunk. However, deduplication dictionaries are large and typically maintained on disk, so dictionary access is expensive. Techniques and mechanisms of the present invention allow prefetches or read aheads of datastore (DS) headers. For example, if a dictionary hit results in datastore DS(X), then headers for DS(X+1), DS(X+2), ..., DS(X+read-ahead-window) are prefetched ahead of time. These datastore headers are cached in memory, and indexed by datastore identifier. Before going to the dictionary, a lookup is first performed in the cached headers to reduce deduplication data access request latency.

Type: Application

Filed: June 23, 2014

Publication date: October 16, 2014

Applicant: Dell Products L.P.

Inventors: Vinod Jayaraman, Ratna Manoj Bolla
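
Illustration (not part of the application text): the abstract of publication 20140310251 above describes consulting cached datastore headers before the on-disk dictionary, and prefetching the headers of the following datastores after a dictionary hit. A minimal Python sketch follows. The class `PrefetchingLocator`, the in-memory dicts that stand in for the on-disk dictionary and headers, and the linear scan over cached headers are all simplifying assumptions.

```python
from typing import Dict, Optional, Set


class PrefetchingLocator:
    """Hypothetical sketch of header read-ahead in front of a deduplication dictionary."""

    def __init__(self, dictionary: Dict[str, int], headers: Dict[int, Set[str]],
                 window: int = 2) -> None:
        self.dictionary = dictionary  # stand-in for the on-disk dictionary: chunk id -> datastore id
        self.headers = headers        # stand-in for on-disk headers: datastore id -> chunk ids
        self.window = window          # read-ahead window
        self.cache: Dict[int, Set[str]] = {}  # cached headers, indexed by datastore id
        self.dictionary_lookups = 0   # counts the expensive accesses

    def locate(self, chunk_id: str) -> Optional[int]:
        # First look in the cached headers (linear scan, for simplicity).
        for ds_id, chunk_ids in self.cache.items():
            if chunk_id in chunk_ids:
                return ds_id
        # Fall back to the dictionary.
        self.dictionary_lookups += 1
        ds_id = self.dictionary.get(chunk_id)
        if ds_id is not None:
            # Dictionary hit on DS(X): prefetch headers for DS(X+1) .. DS(X+window).
            for nxt in range(ds_id + 1, ds_id + self.window + 1):
                if nxt in self.headers:
                    self.cache[nxt] = self.headers[nxt]
        return ds_id


locator = PrefetchingLocator(
    dictionary={"c0": 0, "c1": 1, "c2": 2, "c3": 3},
    headers={0: {"c0"}, 1: {"c1"}, 2: {"c2"}, 3: {"c3"}},
)
print([locator.locate(c) for c in ["c0", "c1", "c2"]], locator.dictionary_lookups)
```

The sketch pays off only when chunks are requested in roughly the order in which they were written to consecutive datastores, which is the access pattern the read-ahead presumes.
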
STORAGE-NETWORK DE-DUPLICATION

Publication number: 20140310250

Abstract: Techniques are provided for de-duplication of data. In one embodiment, a system comprises de-duplication logic that is coupled to a de-duplication repository. The de-duplication logic is operable to receive, from a client device over a network, a request to store a file in the de-duplicated repository using a single storage encoding. The request includes a file identifier and a set of signatures that identify a set of chunks from the file. The de-duplication logic determines whether any chunks in the set are missing from the de-duplicated repository and requests the missing chunks from the client device. Then, for each missing chunk, the de-duplication logic stores in the de-duplicated repository that chunk and a signature representing that chunk. The de-duplication logic also stores, in the de-duplicated repository, a file entry that represents the file and that associates the set of signatures with the file identifier.

Type: Application

Filed: January 7, 2014

Publication date: October 16, 2014

Applicant: VMware, Inc.

Inventors: Israel Zvi BEN-SHAUL, Leonid VASETSKY
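
Illustration (not part of the application text): the abstract of publication 20140310250 above describes an exchange in which the client sends a file identifier plus chunk signatures, the repository asks only for the chunks it lacks, and a file entry then ties the signatures to the file identifier. A minimal Python sketch follows. SHA-256 signatures, fixed-size chunking, and the names `DedupRepository`, `missing`, `store`, `read`, and `upload` are assumptions; the abstract names none of them.

```python
import hashlib
from typing import Dict, List


def signature(chunk: bytes) -> str:
    """Signature of a chunk; SHA-256 is an assumption, the abstract names no algorithm."""
    return hashlib.sha256(chunk).hexdigest()


class DedupRepository:
    """Hypothetical in-memory stand-in for the de-duplicated repository."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}     # signature -> chunk, each stored once
        self.files: Dict[str, List[str]] = {}  # file identifier -> ordered signatures

    def missing(self, signatures: List[str]) -> List[str]:
        """Distinct signatures in the request whose chunks are not yet stored."""
        return [s for s in dict.fromkeys(signatures) if s not in self.chunks]

    def store(self, file_id: str, signatures: List[str], supplied: Dict[str, bytes]) -> None:
        for sig, chunk in supplied.items():
            self.chunks[sig] = chunk             # store each missing chunk with its signature
        self.files[file_id] = list(signatures)   # file entry: signatures tied to the identifier

    def read(self, file_id: str) -> bytes:
        return b"".join(self.chunks[s] for s in self.files[file_id])


def upload(repo: DedupRepository, file_id: str, data: bytes, chunk_size: int = 4) -> int:
    """Client side: send signatures first, then only the chunks the repository lacks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    by_signature = {signature(c): c for c in chunks}
    signatures = [signature(c) for c in chunks]
    wanted = repo.missing(signatures)
    repo.store(file_id, signatures, {s: by_signature[s] for s in wanted})
    return len(wanted)  # number of chunks actually transferred


repo = DedupRepository()
print(upload(repo, "f1", b"aaaabbbbcccc"), upload(repo, "f2", b"aaaaddddcccc"))
```

Sending signatures before data means that a file sharing most of its chunks with stored files costs little network transfer, which is the point of doing the de-duplication across the network rather than after upload.
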
File management apparatus, method, and program product thereof for sending a file-saving related notice that indicates the contents of the saved file

Patent number: 8862562

Abstract: A file management apparatus, file management method, and file management program product are provided in which a user who receives a file-saving related notice from a system can easily grasp the contents of the notified file. Accordingly, an apparatus is provided that notifies a designated notice destination of the end of a save period of a file recorded in a file saving apparatus. The apparatus includes a save-period counter, a save-period monitoring section for monitoring an end of a save period of each file based on timing by the save-period counter, an attachment-file making section for making a partial file composed of a part of file contents, a notice transmitting section for notifying a notice destination of a fact that there is a file at the end of a save period, and a notice-file making section for attaching a partial file of the file to the notice of the notice transmitting section.

Type: Grant

Filed: December 16, 2004

Date of Patent: October 14, 2014

Assignee: Konica Minolta, Inc.

Inventors: Takeshi Hibino, Kazuyuki Kawabata, Hideyuki Hashimoto
Single instantiation method using file clone and file storage system utilizing the same

Patent number: 8862558

Abstract: In file de-duplication using hash value comparison, hash values of all target files must be calculated and actual data of all files must be read for hash value calculation, so that the processing time was long. The present invention provides a file storage system comprising a controller and a volume storing a plurality of files, the volume including a first directory storing a first file and a second file and a second directory storing a third file being created, wherein the controller migrates actual data of the second file to the third file, sets up a management information of the second file so that the third file is referred to when the second file is read, and if the sizes of actual data of the first file and the actual data of the third file are identical and the binaries of the actual data of the first file and the actual data of the third file are identical, sets up a management information of the first file to refer to the third file when reading the first file.

Type: Grant

Filed: January 25, 2012

Date of Patent: October 14, 2014

Assignee: Hitachi, Ltd.

Inventors: Tomonori Esaka, Takaki Nakamura, Hitoshi Kamei, Masakuni Agetsuma
METHOD AND APPARATUS FOR DETECTING DUPLICATE MESSAGES

Publication number: 20140304238

Abstract: An approach is provided for detecting duplicate messages with multiple probabilistic data structures. A de-duplication platform causes, at least in part, a representing of one or more messages in two or more probabilistic data structures. The de-duplication platform further causes, at least in part, an alternating clearing of the two or more probabilistic data structures as respective probabilistic data structures are filled with the one or more messages to respective thresholds, with the two or more probabilistic data structures facilitating determination of one or more duplicates among the one or more messages.

Type: Application

Filed: April 5, 2013

Publication date: October 9, 2014

Applicant: Nokia Corporation

Inventors: Tero Mikael Halla-Aho, Yongbeom Pak, Srikanth Kyatham, Eero Tapani Lepisto
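
Illustration (not part of the application text): the abstract of publication 20140304238 above describes representing messages in two or more probabilistic data structures that are cleared alternately as each fills to its threshold. One possible reading is sketched below in Python with two Bloom filters whose fill counters are staggered so they never clear at the same moment. The Bloom filter parameters, the staggering scheme, and the names `BloomFilter` and `AlternatingDeduplicator` are assumptions, not details taken from the application.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Small Bloom filter; sizes are illustrative only."""

    def __init__(self, bits: int = 1024, hashes: int = 3) -> None:
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits)  # one byte per bit position, for clarity
        self.count = 0                # messages added since the last clear

    def _positions(self, item: bytes) -> Iterator[int]:
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.array[p] = 1
        self.count += 1

    def __contains__(self, item: bytes) -> bool:
        return all(self.array[p] for p in self._positions(item))

    def clear(self) -> None:
        self.array = bytearray(self.bits)
        self.count = 0


class AlternatingDeduplicator:
    """Hypothetical sketch: two filters, cleared alternately as each reaches its threshold."""

    def __init__(self, threshold: int = 100) -> None:
        self.threshold = threshold
        self.filters = [BloomFilter(), BloomFilter()]
        # Stagger the second filter so the two reach the threshold at different times.
        self.filters[1].count = threshold // 2

    def is_duplicate(self, message: bytes) -> bool:
        seen = any(message in f for f in self.filters)
        if not seen:
            for f in self.filters:
                if f.count >= self.threshold:
                    f.clear()  # a full filter is cleared before it takes the new message
                f.add(message)
        return seen


dedup = AlternatingDeduplicator(threshold=4)
print([dedup.is_duplicate(m) for m in [b"m1", b"m1", b"m2"]])
```

With this arrangement, whichever filter was cleared most recently is backed by the other, which still holds the recent history. Bloom filters can report false positives, so a small fraction of new messages may be flagged as duplicates.
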
SYSTEMS AND METHODS FOR SCHEDULING DEDUPLICATION OF A STORAGE SYSTEM

Publication number: 20140304239

Abstract: Systems for deduplicating one or more storage units of a storage system provide a scheduler, which is operable to select at least one storage unit (e.g. a storage volume) for deduplication and perform a deduplication process, which removes duplicate data blocks from the selected storage volume. The systems are operable to determine the state of one or more storage units and manage deduplication requests in part based on state information. The system is further operable to manage user generated requests and manage deduplication requests in part based on user input information. The system may include a rules engine which prioritizes system operations including determining an order in which to perform state-gathering information and determining an order in which to perform deduplication. The system is further operable to determine the order in which storage units are processed.

Type: Application

Filed: April 5, 2013

Publication date: October 9, 2014

Applicant: NetApp, Inc.

Inventors: Blake Lewis, Ling Zheng, Craig Johnston, Vinod Daga
SYSTEM AND METHOD FOR ACCELERATING ANCHOR POINT DETECTION

Publication number: 20140304241

Abstract: A sampling based technique for eliminating duplicate data (de-duplication) stored on storage resources is provided. According to the invention, when a new data set, e.g., a backup data stream, is received by a server, e.g., a storage system or virtual tape library (VTL) system implementing the invention, one or more anchors are identified within the new data set. The anchors are identified using a novel anchor detection circuitry in accordance with an illustrative embodiment of the present invention. Upon receipt of the new data set by, for example, a network adapter of a VTL system, the data set is transferred using direct memory access (DMA) operations to a memory associated with an anchor detection hardware card that is operatively interconnected with the storage system. The anchor detection hardware card may be implemented as, for example, an FPGA to quickly identify anchors within the data set.

Type: Application

Filed: June 20, 2014

Publication date: October 9, 2014

Inventors: Steven C. Miller, Roger Stager
Pruning of Blob Replicas

Publication number: 20140304240

Abstract: A method allocates object replicas in a distributed storage system. The method identifies a plurality of objects in the distributed storage system. Each object has an associated storage policy that specifies a target number of object replicas stored at distinct instances of the distributed storage system. The method identifies an object of the plurality of objects whose number of object replicas exceeds the target number of object replicas specified by the storage policy associated with the object. The method selects a first replica of the object for removal based on last access times for replicas of the object, and transmits a request to a first instance of the distributed storage system that stores the first replica. The request instructs the first instance to remove the first replica of the object.

Type: Application

Filed: June 2, 2014

Publication date: October 9, 2014

Applicant: Google Inc.

Inventors: Yonatan Zunger, Alexandre Drobychev, Alexander Kesselman, Rebekah C. Vickrey, Frank C. Dachille, George Datuashvili
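
Illustration (not part of the application text): the abstract of publication 20140304240 above describes finding objects whose replica count exceeds the target in their storage policy and choosing replicas to remove based on last access times. A minimal Python sketch follows. The abstract does not say which access time wins; removing the least recently accessed replica first is an assumption, as are the names `Replica` and `removal_requests`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Replica:
    instance: str       # storage instance that holds this replica
    last_access: float  # last access time, e.g. seconds since the epoch


def removal_requests(
    objects: Dict[str, Tuple[int, List[Replica]]]
) -> List[Tuple[str, str]]:
    """Return (instance, object_id) pairs, one per replica that should be removed.

    `objects` maps an object id to (target replica count, current replicas).
    """
    requests: List[Tuple[str, str]] = []
    for object_id, (target, replicas) in objects.items():
        excess = len(replicas) - target
        if excess <= 0:
            continue  # policy satisfied, nothing to prune
        # Assumption: the least recently accessed replicas are pruned first.
        stale_first = sorted(replicas, key=lambda r: r.last_access)
        for replica in stale_first[:excess]:
            requests.append((replica.instance, object_id))
    return requests


catalog = {
    "blob-1": (2, [Replica("us", 300.0), Replica("eu", 100.0), Replica("asia", 200.0)]),
    "blob-2": (2, [Replica("us", 50.0), Replica("eu", 60.0)]),
}
print(removal_requests(catalog))
```

Each returned pair corresponds to one request sent to the instance that stores the replica, instructing it to remove that replica.
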
STORAGE SYSTEM FOR ELIMINATING DUPLICATED DATA

Publication number: 20140304242

Abstract: A storage system 103 carries out first and second de-duplication processes in response to receiving a write request from a client. First, a determination is made as to whether a write target data item overlaps with any of the stored data items of a part of a stored data item group, which is a user data item group stored in a storage device 209, and if so, the write target data item is prevented from being stored in the storage device. Second, a determination is made as to whether a target stored data item, which is not finished being evaluated as to whether it overlaps with the stored data item in the first de-duplication process, overlaps with another stored data item, and if so, the target stored data item or the same data item overlapping with the target stored data item is deleted from the storage device 209.

Type: Application

Filed: June 20, 2014

Publication date: October 9, 2014

Inventors: Takaki NAKAMURA, Akira YAMAMOTO, Masaaki IWASAKI, Yohsuke ISHII, Nobumitsu TAKAOKA
Policy based population of genealogical archive data

Patent number: 8856082

Abstract: An approach for managing a family tree archive is provided. The approach includes creating an electronic archive based on a family tree. The approach also includes automatically discovering Internet-based data associated with at least one member of the family tree. The approach additionally includes adding the Internet-based data to the archive. The approach further includes storing the archive at a storage device.

Type: Grant

Filed: May 23, 2012

Date of Patent: October 7, 2014

Assignee: International Business Machines Corporation

Inventors: Michael D. Hale, Tian M. Pan, Randy A. Rendahl
Typed relevance scores in an identity resolution system

Patent number: 8856144

Abstract: Techniques are disclosed for configuring an identity resolution system to support distinct relevance types. Identity records are accessed that are assigned relevance scores of distinct relevance types. Upon determining that the identity records refer to a common individual, the identity records are resolved into an entity representing the common individual. Relevance scores of the distinct relevance types are then determined for the entity, based on the identity records.

Type: Grant

Filed: July 18, 2012

Date of Patent: October 7, 2014

Assignee: International Business Machines Corporation

Inventors: Thomas B. Allen, Barry M. Caceres
TECHNIQUES FOR RECONCILING METADATA AND DATA IN A CLOUD STORAGE SYSTEM WITHOUT SERVICE INTERRUPTION

Publication number: 20140297604

Abstract: A system and methods for reconciling data and metadata in a cloud storage system while the cloud storage system is fully operational are provided. The method comprises scanning for broken references in a metadata database containing metadata of blocks stored in the cloud storage system, wherein the scanning for the broken references is performed as a background process; and synchronously verifying blocks for at least existence of the blocks in the object storage system, wherein the synchronous block verification is performed using a foreground process as blocks are requested.

Type: Application

Filed: March 27, 2014

Publication date: October 2, 2014

Applicant: CTERA NETWORKS, LTD.

Inventor: Aron Brand
MULTIPLE USER PROFILE CLEANER

Publication number: 20140297602

Abstract: A cleaning application that can clean, for one or more user profiles, at least one of one or more files of a computer or a registry of the computer is provided. The cleaning application can include a cleaning module. The cleaning module can select a plurality of user profiles of the computer. The cleaning module can further select at least one of a file location or a user profile hive for each user profile of the plurality of user profiles. The cleaning module can further clean at least one of one or more files stored within the file location or a registry stored within the user profile hive for each user profile of the plurality of user profiles.

Type: Application

Filed: March 29, 2013

Publication date: October 2, 2014

Applicant: Piriform Ltd.

Inventor: Guy SANER
SYSTEM AND METHOD FOR DELETION COMPACTOR FOR LARGE STATIC DATA IN NOSQL DATABASE

Publication number: 20140297601

Abstract: System and method to compact a NoSQL database, the method including: receiving, by a receiver coupled to a processor, an indication of a record to delete in the NoSQL database; for each file in the NoSQL database, performing the steps of: if said file does not contain the record to delete, placing said file in a first memory; if said file contains the record to delete: placing said file in a second memory; searching whether the record to delete from said file in the second memory matches a record in one or more files in the first memory; and if a searched file in the first memory contains the record to delete from said file in the second memory, compacting said file in the second memory with the files in the first memory that contain the record to delete.

Type: Application

Filed: March 28, 2013

Publication date: October 2, 2014

Applicant: Avaya Inc.

Inventor: Anne Pruner
SYSTEM AND METHOD FOR DETECTING DUPLICATION IN DATA FEEDS

Publication number: 20140297576

Abstract: A system and method for filtering data sources is provided. Data corresponding to an entity listing is received from a set of data sources including one or more primary data sources and at least one secondary data source. The received data is grouped based on attributes of the entity listing. Common values between data from the one or more primary data sources and data from the at least one secondary data source are identified for each attribute of the entity listing. A probability that one of the at least one secondary data source copied data from the one or more primary data sources is calculated based on the identified common values. A determination of whether the calculated probability is greater than a predetermined value is made. If the calculated probability is greater than the predetermined value, the one data source is removed from the at least one secondary data source.

Type: Application

Filed: April 1, 2013

Publication date: October 2, 2014

Applicant: Google Inc.

Inventor: Google Inc.
METHOD AND APPARATUS FOR DEDUPLICATION OF REPLICATED FILE

Publication number: 20140297603

Abstract: A replicated file deduplication apparatus generates a hash key of a requested data block, determines whether the same data block as the requested data block exists in data blocks of a replicated image file that is derived from the same golden image file as the requested data block using the hash key of the requested data block, and records, if the same data block as the requested data block exists, information of a chunk in which the same data block as the requested data block is stored at a layout of the requested data block.

Type: Application

Filed: June 26, 2013

Publication date: October 2, 2014

Inventors: Young-Chang KIM, Hong Yeon KIM, Young Kyun KIM
Systems and methods for classifying files as candidates for deduplication

Patent number: 8849768

Abstract: A computer-implemented method may include identifying at least one file and detecting an event that is suggestive of at least a portion of the file being duplicated in at least one additional file. The computer-implemented method may also include classifying the file as a candidate for deduplication in response to detecting the event. The computer-implemented method may further include maintaining the file's candidate-for-deduplication classification for use in prompting a determination on whether the portion of the file is already stored within a storage device.

Type: Grant

Filed: March 8, 2011

Date of Patent: September 30, 2014

Assignee: Symantec Corporation

Inventor: Namita Agrawal
Methods and apparatus for active optimization of data

Patent number: 8849773

Abstract: Techniques and mechanisms are provided to support live file optimization. Active I/O access to an optimization target is monitored during optimization. Active files need not be taken offline or made unavailable to an application during optimization and retain the ability to support file operations such as read, write, unlink, and truncate while an optimization engine performs deduplication and/or compression on active file ranges.

Type: Grant

Filed: March 4, 2011

Date of Patent: September 30, 2014

Assignee: Dell Products L.P.

Inventors: Abhijit Dinkar, Vinod Jayaraman, Murali Bashyam, Goutham Rao
Data replication with delta compression

Patent number: 8849772

Abstract: Data replication with delta compression is disclosed. A primary system and a replica system are determined to both have an identical first data segment that is similar to a second data segment. The second data segment is encoded, wherein the encoding refers to the first data segment.

Type: Grant

Filed: November 14, 2008

Date of Patent: September 30, 2014

Assignee: EMC Corporation

Inventors: Mark Huang, Philip Shilane, Grant Wallace, Ming Benjamin Zhu
Typed relevance scores in an identity resolution system

Patent number: 8843501

Abstract: Techniques are disclosed for configuring an identity resolution system to support distinct relevance types. Identity records are accessed that are assigned relevance scores of distinct relevance types. Upon determining that the identity records refer to a common individual, the identity records are resolved into an entity representing the common individual. Relevance scores of the distinct relevance types are then determined for the entity, based on the identity records.

Type: Grant

Filed: February 18, 2011

Date of Patent: September 23, 2014

Assignee: International Business Machines Corporation

Inventors: Thomas B. Allen, Barry M. Caceres
Elimination of duplicate objects in storage clusters

Patent number: 8843454

Abstract: Digital objects within a fixed-content storage cluster use a page mapping table and a hash-to-UID table to store a representation of each object. For each object stored within the cluster, a record in the hash-to-UID table stores the object's hash value and its unique identifier (or portions thereof). To detect a duplicate of an object, a portion of its hash value is used as a key into the page mapping table. The page mapping table indicates a node holding a hash-to-UID table indicating currently stored objects in a particular page range. Finding the same hash value but with a different unique identifier in the table indicates that a duplicate of an object exists. Portions of the hash value and unique identifier may be used in the hash-to-UID table. Unneeded duplicate objects are deleted by copying their metadata to a manifest and then redirecting unique identifiers to point at the manifest.

Type: Grant

Filed: April 25, 2014

Date of Patent: September 23, 2014

Assignee: Caringo, Inc.

Inventors: Paul R. M. Carpentier, Russell Turpin
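
Illustration (not part of the patent text): the abstract of patent 8843454 above describes using a portion of an object's hash as a key into a page mapping table, which points to the node holding the hash-to-UID table for that page range. Finding the same hash under a different unique identifier signals a duplicate. A minimal Python sketch follows. SHA-256, the use of the first hash byte as the page number, the even split of pages across nodes, and the name `Cluster` are assumptions. The manifest-based deletion step from the abstract is not modeled.

```python
import hashlib
from typing import Dict, List, Optional


class Cluster:
    """Hypothetical sketch of duplicate detection via a page mapping table."""

    def __init__(self, node_count: int = 4) -> None:
        # One hash-to-UID table per node: full hash (hex) -> unique identifier.
        self.node_tables: List[Dict[str, str]] = [dict() for _ in range(node_count)]
        # Page mapping table: page number (first hash byte) -> index of the responsible node.
        self.page_map: Dict[int, int] = {
            page: page * node_count // 256 for page in range(256)
        }

    def store(self, uid: str, content: bytes) -> Optional[str]:
        """Record the object; return the UID of an existing duplicate, or None."""
        digest = hashlib.sha256(content).digest()
        node = self.page_map[digest[0]]  # a portion of the hash selects the page
        table = self.node_tables[node]
        key = digest.hex()
        existing = table.get(key)
        if existing is not None and existing != uid:
            return existing              # same hash, different UID: a duplicate exists
        table[key] = uid
        return None


cluster = Cluster()
print(cluster.store("uid-1", b"payload"), cluster.store("uid-2", b"payload"))
```

Routing each lookup by a hash prefix means only one node's table has to be consulted per object, so the tables can be spread across the cluster.
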
SYSTEM AND METHOD FOR METADATA MODIFICATION

Publication number: 20140279950

Abstract: The present invention provides a method for modifying a first storage medium having a plurality of files, the method including providing a first modification tool; operatively coupling the first storage medium to the modification tool, wherein the operatively coupling includes bypassing a first operating system used to access the plurality of files; and dematerializing, using the first modification tool, at least a first file to form one or more dematerialized files. In some embodiments, the present invention provides a modification system for modifying a first storage medium having a plurality of files, the system including a first modification tool that includes an attachment module configured to operatively couple the modification tool to the first storage medium such that a first operating system used to access the plurality of files is bypassed; and a dematerialization module configured to dematerialize at least a first file to form one or more dematerialized files.

Type: Application

Filed: March 14, 2013

Publication date: September 18, 2014

Inventors: Joshua Shapiro, Robert Gezelter
EFFICIENT CALCULATION OF SIMILARITY SEARCH VALUES AND DIGEST BLOCK BOUNDARIES FOR DATA DEDUPLICATION

Publication number: 20140279952

Abstract: For efficient calculation of both similarity search values and boundaries of digest blocks in data deduplication, input data is partitioned into chunks, and for each chunk a set of rolling hash values is calculated. A single linear scan of the rolling hash values is used to produce both similarity search values and boundaries of the digest blocks of the chunk.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shay H. AKIRAV, Lior ARONOVICH, Shira BEN-DOR, Michael HIRSCH, Ofer LENEMAN
REDUCING DIGEST STORAGE CONSUMPTION IN A DATA DEDUPLICATION SYSTEM

Publication number: 20140279953

Abstract: For reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, digest values are calculated for input data. The digest values are used to locate matches with data stored in a repository. The digest values are stored in the repository. The digest values of the data stored in the repository that is determined to be redundant with the input data are removed.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
TABULAR DATA MANIPULATION SYSTEM AND METHOD

Publication number: 20140279957

Abstract: A system and method that implements a tabular graph editor are disclosed. The system supports employing tables to browse and edit comparisons by multiple attributes of nodes in a graph.

Type: Application

Filed: March 10, 2014

Publication date: September 18, 2014

Applicant: Clados Management LLC

Inventors: Peter Moore, Franz Christian Halaschek-Wiener, Andrey Pleshakov, Maxwell Bernardy
DIGEST RETRIEVAL BASED ON SIMILARITY SEARCH IN DATA DEDUPLICATION

Publication number: 20140279951

Abstract: For digest retrieval based on similarity search in deduplication processing in a data deduplication system using a processor device in a computing environment, input data is partitioned into fixed sized data chunks. Similarity elements and digest block boundaries and digest values are calculated for each of the fixed sized data chunks. Matching similarity elements are searched for in a search structure containing the similarity elements for each of the fixed sized data chunks in a repository of data. Positions of similar data are located in the repository. The positions of the similar data are used to locate and load into the memory stored digest values and corresponding stored digest block boundaries of the similar data in the repository. The digest values and the corresponding digest block boundaries of the input data are matched with the stored digest values and the corresponding stored digest block boundaries to find data matches.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shay H. AKIRAV, Lior ARONOVICH, Shira BEN-DOR, Michael HIRSCH, Ofer LENEMAN
SYSTEMS AND METHODS OF LOCATING REDUNDANT DATA USING PATTERNS OF MATCHING FINGERPRINTS

Publication number: 20140279956

Abstract: A system configured to compute match potential between first data and second data is provided. The system includes data storage storing the first data and the second data, and at least one processor coupled to the data storage. The at least one processor is configured to identify a first sequence of fingerprints characterizing a first plurality of sections of the first data, the first sequence being ordered according to an order of the first plurality of sections within the first data; identify a second sequence of fingerprints comprising fingerprints that match fingerprints within the first sequence, the second sequence of fingerprints characterizing a second plurality of sections of the second data, the second sequence being ordered according to an order of the second plurality of sections within the second data; quantify a similarity between the first sequence and the second sequence; and adjust the match potential based on the similarity.

Type: Application

Filed: January 22, 2014

Publication date: September 18, 2014

Inventors: Ronald Ray Trimble, Jon Christopher Kennedy, Timmie G. Reiter, David Michael Biernacki, Carey Jay McMaster, Stefan Merrill King
INDUSTRIAL ASSET EVENT CHRONOLOGY

Publication number: 20140279948

Abstract: Among other things, one or more techniques and/or systems are provided for developing a timeline chronicling events pertaining to an industrial asset. Data is received from a plurality of assets, processed (e.g., to reduce duplicative and/or redundant data), and organized chronologically for presentation in a timeline. The data is further grouped and/or prioritized to display some portions of the data more prominently relative to other portions of the data in the timeline (e.g., which may be hidden). Grouping rules and/or prioritization rules for grouping and/or prioritizing the data may be a function of user interaction with the timeline and/or a function of a machine learning algorithm which may be configured to identify patterns in how users interact with the timeline based upon, among other things, a role the user plays relative to the industrial asset and/or an operating state of the industrial asset.

Type: Application

Filed: March 13, 2013

Publication date: September 18, 2014

Applicant: ABB Research Ltd.

Inventors: Shakeel Mahamood Mahate, Karen J. Smiley, Paul F. Wood
SCALABLE GRAPH MODELING OF METADATA FOR DEDUPLICATED STORAGE SYSTEMS

Publication number: 20140279927

Abstract: Embodiments of the invention relate to a method and computer program product for providing a scalable representation of metadata for deduplicated storage systems. The method includes identifying shared data segments that are contained in a plurality of data objects in a deduplicated storage system. A data object centric graph is generated. The generating includes creating vertices that represent the data objects and creating edges between the data objects. An edge connecting two data objects indicates that the two data objects contain at least one shared data segment in common. Each shared data segment between any two data objects is represented by at most one of the edges. At least one of the data objects is manipulated based on the data object centric graph.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Mihail Corneliu Constantinescu, Abdullah Gharaibeh, Maohua Lu
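
The object-centric graph described in the abstract above can be illustrated with a minimal sketch. It assumes each data object is already available as a set of segment identifiers; the function name `build_object_graph` and the sample objects are hypothetical and do not come from the application. The pairwise comparison used here is quadratic in the number of objects, so it only shows the graph's shape and says nothing about how the application achieves scalability.

```python
from itertools import combinations


def build_object_graph(objects):
    """Build an undirected graph whose vertices are data objects.

    `objects` maps an object name to the set of segment ids it contains.
    Two objects get exactly one edge if they share at least one segment.
    """
    graph = {name: set() for name in objects}
    for a, b in combinations(objects, 2):
        if objects[a] & objects[b]:  # at least one shared segment
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical sample data: A and B share segment "s2"; C shares nothing.
objects = {
    "A": {"s1", "s2"},
    "B": {"s2", "s3"},
    "C": {"s4"},
}
graph = build_object_graph(objects)
print(graph)
```

Storing neighbours in sets means a pair of objects is connected by at most one edge, however many segments they share.
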
OBJECT STORE MANAGEMENT OPERATIONS WITHIN COMPUTE-CENTRIC OBJECT STORES

Publication number: 20140279955

Abstract: Object store management operations within compute-centric object stores are provided herein. An exemplary method may include transforming an object storage dump into an object store table by a table generator container, wherein the object storage dump includes at least objects within an object store that are marked for deletion, transmitting records for objects from the object store table to reducer containers, such that each reducer container receives object records for at least one object, the object records comprising all object records for the at least one object, generating a set of cleanup tasks by the reducer containers, and executing the cleanup tasks by cleanup agents.

Type: Application

Filed: September 26, 2013

Publication date: September 18, 2014

Inventors: Mark Cavage, Nathan Fitch, Fred Kuo, Yunong Xiao, David Pacheco, Bryan Cantrill
Method and system for Data De-Duplication in storage devices

Publication number: 20140279949

Abstract: A method and system for data de-duplication in storage devices is disclosed. The method scans for the content within the storage device. When the method obtains all the content within the storage device, it checks for the duplicate content in the storage device. The method identifies duplicate content based on two criteria which include parametric level and Meta data level. The method switches to Meta data level when the method fails to identify duplicate content in parametric level. Further, the method obtains the input from user to delete or retain the duplicate content. If the user provides a confirmation for deleting the duplicate content, the method deletes the duplicate content.

Type: Application

Filed: March 13, 2013

Publication date: September 18, 2014

Inventor: Kadari Subbarao Sudeendra Thirtha Koushik
REDUCING DIGEST STORAGE CONSUMPTION BY TRACKING SIMILARITY ELEMENTS IN A DATA DEDUPLICATION SYSTEM

Publication number: 20140279954

Abstract: For reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, input data is partitioned into chunks, and the chunks are grouped into chunk sets. Digests are calculated for input data and stored in sets corresponding to the chunk sets. Similarity elements are calculated for the input data and the similarity elements are stored in a similarity search structure. The number of similarity elements associated with a chunk set which are currently contained in the similarity search structure is maintained for each chunk set, and when this number of a specific chunk set becomes lower than a threshold, the digests set associated with that chunk set are removed from the repository.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Lior ARONOVICH
REPRESENTING DE-DUPLICATED FILE DATA

Publication number: 20140279958

Abstract: Providing a subset of de-duplicated data as output is disclosed. In some embodiments, the output comprises a subset of data stored in de-duplicated form in a plurality of containers each comprising a plurality of data segments comprising the data. For each container that includes one or more data segments comprising the subset, a corresponding container data is included in the output. Each container may include one or more segments not included in the subset. For each container the corresponding container data of which is included in the output, a corresponding value in a data structure comprising for each container stored on the de-duplicated storage system a data value indicating whether or not the corresponding container data has been included in the output is updated.

Type: Application

Filed: May 28, 2014

Publication date: September 18, 2014

Applicant: EMC Corporation

Inventor: Mark Huang
Detecting duplicate records

Patent number: 8838549

Abstract: A method for finding duplicates by matching group of fields in records is disclosed. The method comprises standardizing data using field specific knowledge base; extracting at least part of one or more related fields of records; applying a matching attribute function to generate keys on the “comparable” field part extracted data; generating record level keys using generated field level keys; clustering the records based on generated record level keys; identifying reference record for each cluster identified; and calculating matching percentage for each record in a cluster with respect to reference record of the cluster. Devices and systems are disclosed that enable the method for finding duplicates.

Type: Grant

Filed: July 7, 2008

Date of Patent: September 16, 2014

Inventors: Chandra Bodapati, Noel Vijay Gunasekar
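
The pipeline in the abstract above (standardize, build field-level keys, combine them into record-level keys, cluster, pick a reference record, score each record against it) can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the tiny `ABBREVIATIONS` table stands in for a field-specific knowledge base, the key is simply the first four alphanumeric characters of each field, the first record in a cluster is taken as the reference, and `difflib.SequenceMatcher` supplies the matching percentage. The patent does not state that it uses any of these choices.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical stand-in for a field-specific knowledge base.
ABBREVIATIONS = {"st": "street", "st.": "street", "rd": "road", "rd.": "road"}
FIELDS = ("name", "address")


def standardize(value):
    """Lower-case the value and expand known abbreviations."""
    words = value.lower().replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def field_key(value, length=4):
    """Field-level key: first `length` alphanumeric characters after standardizing."""
    compact = "".join(ch for ch in standardize(value) if ch.isalnum())
    return compact[:length]


def record_key(record):
    """Record-level key built from the field-level keys."""
    return "|".join(field_key(record[f]) for f in FIELDS)


def cluster_records(records):
    """Group records that share the same record-level key."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record_key(record)].append(record)
    return dict(clusters)


def match_percent(record, reference):
    """Similarity of a record to its cluster's reference record, as a percentage."""
    a = " ".join(standardize(record[f]) for f in FIELDS)
    b = " ".join(standardize(reference[f]) for f in FIELDS)
    return round(100 * SequenceMatcher(None, a, b).ratio(), 1)


# Hypothetical sample records.
records = [
    {"name": "John Smith", "address": "12 Main St."},
    {"name": "john smith", "address": "12 Main Street"},
    {"name": "Mary Jones", "address": "4 Oak Rd"},
]
clusters = cluster_records(records)
for key, members in clusters.items():
    reference = members[0]  # assumption: first record acts as the reference
    print(key, [match_percent(m, reference) for m in members])
```
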
EFFICIENT DATA DEDUPLICATION

Publication number: 20140258245

Abstract: Efficient data deduplication is described herein. A deduplication bit array partition can be created that corresponds to a number of data items in an expected dataset. The deduplication bit array partition can track whether the data items have been received. When a data item in the expected dataset is received, a bit in the deduplication bit array partition corresponding to the received data item can be accessed to determine, based on the value of the bit, if the received data item has already been received. When the value of the bit indicates that the received data item has not already been received, the value can be changed to indicate that the data item has now been received. When the value of the bit indicates that the received data item has already been received, the data item can be deleted or ignored.

Type: Application

Filed: March 7, 2013

Publication date: September 11, 2014

Applicant: JIVE SOFTWARE, INC.

Inventor: James Donald Estes
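
The bit-array check described in the abstract above might look like the following minimal sketch. It assumes every item in the expected dataset can be mapped to an integer index in a known range; the class name `BitArrayDeduplicator` and its `accept` method are hypothetical names chosen for this example and are not taken from the application.

```python
class BitArrayDeduplicator:
    """Tracks which items of an expected dataset have arrived, one bit per item."""

    def __init__(self, expected_count):
        self.expected_count = expected_count
        # One bit per expected item, rounded up to whole bytes.
        self.bits = bytearray((expected_count + 7) // 8)

    def accept(self, index):
        """Return True the first time `index` is received, False for a duplicate."""
        if not 0 <= index < self.expected_count:
            raise IndexError("item index outside the expected dataset")
        byte, mask = index >> 3, 1 << (index & 7)
        if self.bits[byte] & mask:
            return False  # bit already set: the item was received before
        self.bits[byte] |= mask  # mark the item as received
        return True


dedup = BitArrayDeduplicator(16)
received = [3, 7, 3, 0, 7, 15]
kept = [i for i in received if dedup.accept(i)]
print(kept)  # duplicates of 3 and 7 are dropped
```

Each membership test and update touches a single byte, so the memory cost is one bit per expected item regardless of item size.
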
STORAGE SYSTEM DEDUPLICATION WITH SERVICE LEVEL AGREEMENTS

Publication number: 20140258244

Abstract: Mechanisms are provided for adjusting a configuration of data stored in a storage system. According to various embodiments, a storage module may be configured to store a configuration of data. A processor may be configured to identify an estimated performance level for the storage system based on a configuration of data stored on the storage system.

Type: Application

Filed: March 6, 2013

Publication date: September 11, 2014

Applicant: DELL PRODUCTS L.P.

Inventors: Goutham Rao, Ratna Manoj Bolla, Vinod Jayaraman
RECOGNIZING AND COMBINING REDUNDANT MERCHANT DESIGNATIONS IN A TRANSACTION DATABASE

Publication number: 20140258246

Abstract: Determining whether two merchant location database entries are describing the same merchant location. A subject merchant location database entry and comparison candidate merchant location database entries include a DBA name field, a street address field, and one or more additional descriptive fields descriptive of one or more predetermined characteristics of the respective merchant location. The subject merchant location database entry is compared to a set populated with candidate merchant location database entries, candidates having a predetermined minimum textual similarity with the subject merchant location database entry on the basis of each entry's DBA name field or street address field.

Type: Application

Filed: March 8, 2013

Publication date: September 11, 2014

Applicant: MASTERCARD INTERNATIONAL INCORPORATED

Inventors: Walter Francis Lo Faro, Steve Oshry, Anita Christine Galliani, Gary Randall Horn
Method and system to scan data from a system that supports deduplication

Patent number: 8832042

Abstract: An interface is disclosed that makes information obtained from a file deduplication process available to an application for the efficient operation thereof. A data deduplication repository is scanned to determine a plurality of file segments and respective checksum values associated with the segments. A data structure is generated that allows shared segments to be identified by indexing using a common checksum value. The segments also indicate the file to which they belong and may also include a timestamp value. This data structure is updated as files are modified, etc. The data structure is accessible to an application program so that the application program can readily determine which segments are shared between multiple files. With this information, the application can efficiently process the segment once rather than multiple times. Timestamps can be used by the application to efficiently identify only those segments that were accessed after a given time.

Type: Grant

Filed: March 15, 2010

Date of Patent: September 9, 2014

Assignee: Symantec Corporation

Inventor: Mukund Agrawal
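
A checksum-indexed structure of the kind the abstract above describes could be sketched as below. The sketch assumes fixed-size segments and SHA-256 checksums, neither of which the patent specifies, and it omits the timestamp field for brevity. The names `build_segment_index` and `shared_segments` are hypothetical.

```python
import hashlib
from collections import defaultdict


def build_segment_index(files, segment_size=4):
    """Map each segment checksum to the (file name, offset) pairs that contain it.

    `files` maps a file name to its content as bytes.
    """
    index = defaultdict(list)
    for name, data in files.items():
        for offset in range(0, len(data), segment_size):
            segment = data[offset:offset + segment_size]
            checksum = hashlib.sha256(segment).hexdigest()
            index[checksum].append((name, offset))
    return index


def shared_segments(index):
    """Return only the checksums whose segment appears in more than one file."""
    return {
        checksum: refs
        for checksum, refs in index.items()
        if len({name for name, _ in refs}) > 1
    }


# Hypothetical sample data: both files contain the segment b"BBBB".
files = {"a.txt": b"AAAABBBBCCCC", "b.txt": b"BBBBDDDD"}
index = build_segment_index(files)
shared = shared_segments(index)
print(shared)
```

An application such as a scanner could walk `shared` and process each shared segment once, then skip it in every other file that references the same checksum.
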
Computer-Implemented System And Method For Identifying Relevant Documents For Display

Publication number: 20140250087

Abstract: A computer-implemented system and method for identifying relevant documents for display are provided. Themes for a set of documents are generated. The documents are clustered based on the themes. A matrix including an inner product of document frequency occurrences and cluster concept weightings for each theme is generated for the documents. From the matrix, documents most relevant to a particular theme are identified, and the relevant documents are displayed.

Type: Application

Filed: May 12, 2014

Publication date: September 4, 2014

Applicant: FTI TECHNOLOGY LLC

Inventors: Dan Gallivan, Kenji Kawai
CHANGE TRACKING FOR MULTIPHASE DEDUPLICATION

Publication number: 20140250080

Abstract: Change tracking for multiphase deduplication. In one example embodiment, a method of tracking changes to a source storage of a source system for multiphase deduplication includes a change tracking phase that includes performing various steps for only allocated blocks in the source storage that are changed between a prior point in time and a subsequent point in time. These steps include temporarily storing a copy of the changed block in a volatile memory of the source system prior to writing the changed block to the source storage, performing a hash function only once on the copy of the changed block, while the copy is temporarily stored in a volatile memory of the source system, to calculate a hash value, writing the changed block to the source storage, and tracking, in a change log, a location in the source storage of the changed block and the corresponding hash value.

Type: Application

Filed: April 23, 2014

Publication date: September 4, 2014

Applicant: STORAGECRAFT TECHNOLOGY CORPORATION

Inventor: Andrew Lynn Gardner
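
The change tracking phase in the abstract above can be approximated by the following sketch, which models the source storage as an in-memory list of blocks. The class name `TrackedStorage`, the SHA-256 hash, and the dictionary-based change log are assumptions made for this example; the application does not name a hash function or a log format.

```python
import hashlib


class TrackedStorage:
    """Toy source storage that logs the location and hash of each changed block."""

    def __init__(self, block_count, block_size=4):
        self.blocks = [bytes(block_size)] * block_count
        self.change_log = {}  # block location -> hash of its latest content

    def write_block(self, location, data):
        copy = bytes(data)  # temporary in-memory copy of the changed block
        digest = hashlib.sha256(copy).hexdigest()  # hashed once, before the write
        self.blocks[location] = copy  # write the changed block to storage
        self.change_log[location] = digest  # track location and hash value


storage = TrackedStorage(block_count=8)
storage.write_block(2, b"abcd")
storage.write_block(5, b"wxyz")
print(sorted(storage.change_log))  # locations changed since tracking began
```

Because the hash is computed while the block is still in memory, a later deduplication phase can read it from the change log instead of re-reading and re-hashing the block from storage.
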
WAN Gateway Optimization by Indicia Matching to Pre-cached Data Stream Apparatus, System, and Method of Operation

Publication number: 20140250086

Abstract: A network gateway coupled to a backup server on a wide area network which receives and de-duplicates binary objects. The backup server provides selected data segments of binary objects to the gateway to store into a prescient cache (p-cache) store. The network gateway optimizes network traffic by fulfilling a local client request from its local p-cache store instead of requiring further network traffic when it matches indicia of stored data segments stored in its p-cache store with indicia of a first segment of a binary object requested from and received from a remote server.

Type: Application

Filed: June 12, 2013

Publication date: September 4, 2014

Applicant: BARRACUDA NETWORKS, INC.

Inventor: Fleming Shi
Method and system for matching and consolidating addresses in a database

Patent number: RE45160

Abstract: An address consolidating system that has a name and address database where duplicate names and addresses are consolidated by matching name and address and e-mail address simultaneously. The address consolidating system utilizes a database along with off-the-shelf and custom proprietary software. There are two segments to the database: records with name and address data (which may or may not include e-mail address data), and records with e-mail address data (which may include incomplete portions of associated name and address data). Periodically the database is updated with new or corrected name, address, or e-mail information, or with new records obtained from other database lists.

Type: Grant

Filed: January 10, 2008

Date of Patent: September 23, 2014

Assignee: I-BR Technologies, L.L.C.

Inventors: Henry T. Ferlauto, Stephen H. Yu
