Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 8874529
    Abstract: One or more aspects of the invention include transforming source data in order to display a work product. A plurality of rules relating to content manipulation of the source data include at least one rule relating to content selection and at least one rule relating to content compression. Source data for content manipulation may also be received. A selected portion of the source data and a compressed portion of the source data may be formed. The compressed portion may then be received and presented on a computer as a work product.
    Type: Grant
    Filed: March 16, 2009
    Date of Patent: October 28, 2014
    Inventor: Bert A. Silich
  • Publication number: 20140317067
    Abstract: Disclosed are computer implemented methods, computer program products, and computer systems for storing a file into a storage system. An embodiment includes, responsive to a determination that a descriptive information describing content of a first file corresponds to a descriptive information describing content of a second file, that a format of the first file is convertible to a format of the second file using a transformation matrix, and that the format of the first file has a higher quality indicator value than the format of the second file, storing the first file into the storage system.
    Type: Application
    Filed: April 23, 2014
    Publication date: October 23, 2014
    Applicant: International Business Machines Corporation
    Inventors: Michael Baessler, Peng Hui Jiang, Pi Jun Jiang
  • Patent number: 8868519
    Abstract: Method, apparatus and program product for generating check data for a location within an area of a workspace include receiving an identifier for a selected location that has check data associated therewith. Candidate check data for use with the selected location is generated. The candidate check data is evaluated for a match against at least one of existing check data for the selected location or check data associated with a related location. Based on the evaluation, a determination is made of whether the candidate check data is acceptable for use for the selected location.
    Type: Grant
    Filed: May 27, 2011
    Date of Patent: October 21, 2014
    Assignee: Vocollect, Inc.
    Inventors: James D. Maloy, Michael Kusar, Alexander Mranca, Venkatesh Narayan, Jeffrey Thorsen
  • Patent number: 8868505
    Abstract: Data protection programs are installed at each network host. The programs communicate with each other to scan the hosts and identify duplicate and unique data objects stored at the hosts. Duplicate data objects are maintained on the hosts. Unique data objects are broken into chunks, copied to other hosts, and a parity data is calculated. When a network host becomes unavailable and is replaced with a new network host, duplicate data objects stored on the now unavailable network host may be rebuilt on the new network host using the maintained duplicate data objects on the other hosts. Unique data objects stored on the now unavailable network host may be rebuilt on the new network host using the copied chunks and parity data.
    Type: Grant
    Filed: March 20, 2012
    Date of Patent: October 21, 2014
    Assignee: EMC Corporation
    Inventor: Mahendra Nag Jayanthi
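The rebuild of unique data objects from copied chunks and parity data can be illustrated with a minimal sketch. The abstract does not say how the parity is calculated; the XOR parity below, the equal-size zero-padded chunks, and the function names are all assumptions made for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, n: int):
    """Split data into n equal-size chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(n)]
    parity = reduce(xor_bytes, chunks)
    return chunks, parity


def rebuild(chunks, parity: bytes) -> bytes:
    """Recover the single missing chunk (marked None) from the survivors and the parity."""
    present = [c for c in chunks if c is not None]
    return reduce(xor_bytes, present, parity)


chunks, parity = split_with_parity(b"unique data object", 3)
lost = chunks[1]                          # pretend the host holding this chunk failed
survivors = [chunks[0], None, chunks[2]]
recovered = rebuild(survivors, parity)
print(recovered == lost)
```

A single XOR parity chunk tolerates the loss of exactly one chunk; tolerating more failures would need a different code.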
  • Patent number: 8868624
    Abstract: Embodiments of the present invention relate to systems, methods and computer storage media for facilitating the structured storage of binary large objects (Blobs) to be accessed by an application program being executed by a computing device. Generally, the manipulation of Blobs in a structured storage system includes receiving a request for a Blob, which may be located by way of a Blob pointer. The Blob pointer allows for the data, such as properties, of the Blob to be identified and located. Expired properties are garbage collected as a manipulation of the Blob data within a structured storage system. In an embodiment, the Blob is identified by a key that is utilized within a primary structured index to locate the requested Blob. In another embodiment, the requested Blob is located utilizing a secondary hash index. In an additional embodiment, the Blob is located utilizing a file table.
    Type: Grant
    Filed: July 22, 2013
    Date of Patent: October 21, 2014
    Assignee: Microsoft Corporation
    Inventors: Bradley Gene Calder, Ju Wang, Xinran Wu, Niranjan Nilakantan, Deepali Bhardwaj, Shashwat Srivastav, Alexander Felsobuki Nagy
  • Patent number: 8868520
    Abstract: A system and method efficiently removes ranges of entries from a flat sorted data structure, such as a fingerprint database, of a storage system. The ranges of entries represent fingerprints that have become stale, i.e., are not representative of current states of corresponding blocks in the file system, due to various file system operations such as, e.g., deletion of a data block without overwriting its contents. A deduplication module performs an attributes intersect range calculation (AIRC) procedure on the stale fingerprint data structure to compute a set of non-overlapping and latest consistency point (CP) ranges. The output from the AIRC procedure, i.e., the set of non-overlapping and latest CP ranges, is then used to remove stale fingerprints associated with that deleted block (as well as each other deleted data block) from the fingerprint database.
    Type: Grant
    Filed: March 1, 2012
    Date of Patent: October 21, 2014
    Assignee: NetApp, Inc.
    Inventors: Rohini Raghuwanshi, Ashish Shukla, Praveen Killamsetti
  • Patent number: 8868518
    Abstract: Keyed aggregation is used in the processing of streaming data to streamline processing to provide higher throughput and decreased use of resources. The most recent event for each unique replacement key value(s) is maintained. In response to an incoming event having a same key as a previous event, the effect on an aggregation of the previous event is removed. The aggregation is then updated with one or more values from the arriving event and the updated aggregation is output.
    Type: Grant
    Filed: August 14, 2009
    Date of Patent: October 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Henrique Andrade, Mitchell A. Cohen, Bugra Gedik
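The replacement-key update described in the abstract above can be sketched as follows. The choice of a running sum as the aggregation, and the names `KeyedSum` and `on_event`, are assumptions for illustration only.

```python
class KeyedSum:
    """Running sum in which a newer event replaces the older event with the same key."""

    def __init__(self):
        self.latest = {}  # replacement key -> value of the most recent event for that key
        self.total = 0    # the aggregation (a sum, chosen for illustration)

    def on_event(self, key, value):
        # Remove the effect of the previous event with this key, if there was one.
        if key in self.latest:
            self.total -= self.latest[key]
        # Apply the arriving event and remember it as the latest for its key.
        self.latest[key] = value
        self.total += value
        return self.total  # the updated aggregation is output


agg = KeyedSum()
outputs = [agg.on_event(k, v) for k, v in [("a", 5), ("b", 3), ("a", 2)]]
print(outputs)  # [5, 8, 5]
```

Only one value per key is retained, so each update costs constant time regardless of how many events have been seen.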
  • Publication number: 20140310252
    Abstract: An information processing apparatus is provided, in which content and position information generated independently of each other are recorded in a recording medium. The apparatus includes a recording medium in which the content and the position information are recorded and a deletion unit deleting position information temporally associated with a piece of the content from the recording medium when the piece of content is deleted from the recording medium.
    Type: Application
    Filed: June 26, 2014
    Publication date: October 16, 2014
    Applicant: Sony Corporation
    Inventors: Masayuki ICHIHARA, Masanao TSUTSUI
  • Publication number: 20140310251
    Abstract: Deduplication dictionaries are used to maintain data chunk identifier and location pairings in a deduplication system. When access to a particular data chunk is requested, a deduplication dictionary is accessed to determine the location of the data chunk and a datastore is accessed to retrieve the data chunk. However, deduplication dictionaries are large and typically maintained on disk, so dictionary access is expensive. Techniques and mechanisms of the present invention allow prefetches or read aheads of datastore (DS) headers. For example, if a dictionary hit results in datastore DS(X), then headers for DS (X+1), DS (X+2), DS(X+read-ahead-window) are prefetched ahead of time. These datastore headers are cached in memory, and indexed by datastore identifier. Before going to the dictionary, a lookup is first performed in the cached headers to reduce deduplication data access request latency.
    Type: Application
    Filed: June 23, 2014
    Publication date: October 16, 2014
    Applicant: Dell Products L.P.
    Inventors: Vinod Jayaraman, Ratna Manoj Bolla
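A minimal sketch of the read-ahead idea in the abstract above: check cached datastore headers first, and on a dictionary hit for DS(X) prefetch the headers that follow it. The in-memory dictionaries standing in for on-disk structures, the representation of a header as a set of chunk identifiers, and the class name `HeaderCache` are assumptions.

```python
class HeaderCache:
    def __init__(self, dictionary, headers, window):
        self.dictionary = dictionary  # chunk id -> datastore id (stands in for the on-disk dictionary)
        self.headers = headers        # datastore id -> set of chunk ids (stands in for DS headers)
        self.window = window          # read-ahead window
        self.cached = {}              # datastore id -> header held in memory
        self.dictionary_lookups = 0   # counts the expensive dictionary accesses

    def locate(self, chunk_id):
        # Look in the cached headers before going to the dictionary.
        for ds_id, header in self.cached.items():
            if chunk_id in header:
                return ds_id
        # Cache miss: pay for a dictionary access.
        self.dictionary_lookups += 1
        ds_id = self.dictionary[chunk_id]
        # Prefetch headers for DS(X+1) .. DS(X+window).
        for nxt in range(ds_id + 1, ds_id + self.window + 1):
            if nxt in self.headers:
                self.cached[nxt] = self.headers[nxt]
        return ds_id


headers = {i: {f"c{i}"} for i in range(4)}
dictionary = {f"c{i}": i for i in range(4)}
cache = HeaderCache(dictionary, headers, window=2)
locations = [cache.locate(f"c{i}") for i in range(4)]
print(locations, cache.dictionary_lookups)  # [0, 1, 2, 3] 2
```

In this run only `c0` and `c3` reach the dictionary; `c1` and `c2` are answered from the prefetched headers.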
  • Publication number: 20140310250
    Abstract: Techniques are provided for de-duplication of data. In one embodiment, a system comprises de-duplication logic that is coupled to a de-duplication repository. The de-duplication logic is operable to receive, from a client device over a network, a request to store a file in the de-duplicated repository using a single storage encoding. The request includes a file identifier and a set of signatures that identify a set of chunks from the file. The de-duplication logic determines whether any chunks in the set are missing from the de-duplicated repository and requests the missing chunks from the client device. Then, for each missing chunk, the de-duplication logic stores in the de-duplicated repository that chunk and a signature representing that chunk. The de-duplication logic also stores, in the de-duplicated repository, a file entry that represents the file and that associates the set of signatures with the file identifier.
    Type: Application
    Filed: January 7, 2014
    Publication date: October 16, 2014
    Applicant: VMware, Inc.
    Inventors: Israel Zvi BEN-SHAUL, Leonid VASETSKY
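The store request in the abstract above can be sketched as a two-step exchange: the repository reports which signatures it lacks, then stores only those chunks plus a file entry. SHA-256 as the signature function and the names `Repository`, `missing`, `store`, and `read` are assumptions, not details from the application.

```python
import hashlib


def signature(chunk: bytes) -> str:
    """Chunk signature (SHA-256 is an assumption; the abstract names no algorithm)."""
    return hashlib.sha256(chunk).hexdigest()


class Repository:
    def __init__(self):
        self.chunks = {}  # signature -> chunk bytes, stored once
        self.files = {}   # file identifier -> ordered list of signatures

    def missing(self, signatures):
        """Signatures from the request that the repository does not hold yet."""
        return [s for s in dict.fromkeys(signatures) if s not in self.chunks]

    def store(self, file_id, signatures, supplied_chunks):
        """Store the chunks the client supplied, then the file entry."""
        for chunk in supplied_chunks:
            self.chunks[signature(chunk)] = chunk
        if self.missing(signatures):
            raise ValueError("client did not supply every missing chunk")
        self.files[file_id] = list(signatures)

    def read(self, file_id) -> bytes:
        return b"".join(self.chunks[s] for s in self.files[file_id])


repo = Repository()
for file_id, parts in [("f1", [b"aa", b"bb"]), ("f2", [b"bb", b"cc"])]:
    sigs = [signature(p) for p in parts]
    need = set(repo.missing(sigs))
    # The client sends only the chunks the repository asked for.
    repo.store(file_id, sigs, [p for p in parts if signature(p) in need])
print(len(repo.chunks), repo.read("f2"))  # 3 b'bbcc'
```

The second file transfers only `b"cc"`, because `b"bb"` is already in the repository.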
  • Patent number: 8862562
    Abstract: A file management apparatus, file management method, and file management program product are provided in which a user who receives a file-saving related notice from a system can easily grasp the contents of the notified file. Accordingly, a file management apparatus is provided that notifies a designated notice destination of the end of a save period of a file recorded in a file saving apparatus. The apparatus includes a save-period counter, a save-period monitoring section for monitoring an end of a save period of each file based on timing by the save-period counter, an attachment-file making section for making a partial file composed of a part of file contents, a notice transmitting section for notifying a notice destination of a fact that there is a file at the end of a save period, and a notice-file making section for attaching a partial file of the file to the notice of the notice transmitting section.
    Type: Grant
    Filed: December 16, 2004
    Date of Patent: October 14, 2014
    Assignee: Konica Minolta, Inc.
    Inventors: Takeshi Hibino, Kazuyuki Kawabata, Hideyuki Hashimoto
  • Patent number: 8862558
    Abstract: In file de-duplication using hash value comparison, hash values of all target files must be calculated and actual data of all files must be read for hash value calculation, so that the processing time was long. The present invention provides a file storage system comprising a controller and a volume storing a plurality of files, the volume including a first directory storing a first file and a second file and a second directory storing a third file being created, wherein the controller migrates actual data of the second file to the third file, sets up a management information of the second file so that the third file is referred to when the second file is read, and if the sizes of actual data of the first file and the actual data of the third file are identical and the binaries of the actual data of the first file and the actual data of the third file are identical, sets up a management information of the first file to refer to the third file when reading the first file.
    Type: Grant
    Filed: January 25, 2012
    Date of Patent: October 14, 2014
    Assignee: Hitachi, Ltd.
    Inventors: Tomonori Esaka, Takaki Nakamura, Hitoshi Kamei, Masakuni Agetsuma
  • Publication number: 20140304238
    Abstract: An approach is provided for detecting duplicate messages with multiple probabilistic data structures. A de-duplication platform causes, at least in part, a representing of one or more messages in two or more probabilistic data structures. The de-duplication platform further causes, at least in part, an alternating clearing of the two or more probabilistic data structures as respective probabilistic data structures are filled with the one or more messages to respective thresholds, with the two or more probabilistic data structures facilitating determination of one or more duplicates among the one or more messages.
    Type: Application
    Filed: April 5, 2013
    Publication date: October 9, 2014
    Applicant: Nokia Corporation
    Inventors: Tero Mikael Halla-Aho, Yongbeom Pak, Srikanth Kyatham, Eero Tapani Lepisto
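One possible reading of the alternating clearing described above, sketched with two small Bloom filters: new messages go into the active filter, and when it reaches its threshold the other filter is cleared and becomes active. The Bloom filter parameters, the use of SHA-256, and the class names are assumptions; the abstract does not name a specific probabilistic data structure.

```python
import hashlib


class Bloom:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits)  # one byte per bit, for simplicity
        self.count = 0                # number of messages added since the last clear

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item: str):
        for p in self._positions(item):
            self.array[p] = 1
        self.count += 1

    def __contains__(self, item: str):
        return all(self.array[p] for p in self._positions(item))

    def clear(self):
        self.array = bytearray(self.bits)
        self.count = 0


class AlternatingDeduper:
    def __init__(self, threshold: int):
        self.filters = [Bloom(), Bloom()]
        self.active = 0
        self.threshold = threshold

    def is_duplicate(self, message: str) -> bool:
        # A message counts as a duplicate if either filter may contain it.
        seen = any(message in f for f in self.filters)
        if not seen:
            if self.filters[self.active].count >= self.threshold:
                # Active filter is full: clear the other one and switch to it.
                self.active = 1 - self.active
                self.filters[self.active].clear()
            self.filters[self.active].add(message)
        return seen


dedup = AlternatingDeduper(threshold=2)
print(dedup.is_duplicate("m1"), dedup.is_duplicate("m1"))  # False True
```

Because one filter always retains recent history while the other is cleared, recently seen messages stay detectable; Bloom filters can still report false positives, so a "duplicate" answer is probabilistic.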
  • Publication number: 20140304239
    Abstract: Systems for deduplicating one or more storage units of a storage system provide a scheduler, which is operable to select at least one storage unit (e.g. a storage volume) for deduplication and perform a deduplication process, which removes duplicate data blocks from the selected storage volume. The systems are operable to determine the state of one or more storage units and manage deduplication requests in part based on state information. The system is further operable to manage user generated requests and manage deduplication requests in part based on user input information. The system may include a rules engine which prioritizes system operations including determining an order in which to perform state-gathering information and determining an order in which to perform deduplication. The system is further operable to determine the order in which storage units are processed.
    Type: Application
    Filed: April 5, 2013
    Publication date: October 9, 2014
    Applicant: NetApp, Inc.
    Inventors: Blake Lewis, Ling Zheng, Craig Johnston, Vinod Daga
  • Publication number: 20140304241
    Abstract: A sampling based technique for eliminating duplicate data (de-duplication) stored on storage resources is provided. According to the invention, when a new data set, e.g., a backup data stream, is received by a server, e.g., a storage system or virtual tape library (VTL) system implementing the invention, one or more anchors are identified within the new data set. The anchors are identified using a novel anchor detection circuitry in accordance with an illustrative embodiment of the present invention. Upon receipt of the new data set by, for example, a network adapter of a VTL system, the data set is transferred using direct memory access (DMA) operations to a memory associated with an anchor detection hardware card that is operatively interconnected with the storage system. The anchor detection hardware card may be implemented as, for example, an FPGA to quickly identify anchors within the data set.
    Type: Application
    Filed: June 20, 2014
    Publication date: October 9, 2014
    Inventors: Steven C. Miller, Roger Stager
  • Publication number: 20140304240
    Abstract: A method allocates object replicas in a distributed storage system. The method identifies a plurality of objects in the distributed storage system. Each object has an associated storage policy that specifies a target number of object replicas stored at distinct instances of the distributed storage system. The method identifies an object of the plurality of objects whose number of object replicas exceeds the target number of object replicas specified by the storage policy associated with the object. The method selects a first replica of the object for removal based on last access times for replicas of the object, and transmits a request to a first instance of the distributed storage system that stores the first replica. The request instructs the first instance to remove the first replica of the object.
    Type: Application
    Filed: June 2, 2014
    Publication date: October 9, 2014
    Applicant: Google Inc.
    Inventors: Yonatan Zunger, Alexandre Drobychev, Alexander Kesselman, Rebekah C. Vickrey, Frank C. Dachille, George Datuashvili
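The selection step in the abstract above can be sketched as follows. Removing the least recently accessed replicas first is an assumption (the abstract says only that selection is based on last access times), as are the `Replica` type and the shape of the removal requests.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    instance: str       # storage instance holding this replica
    last_access: float  # last access time, e.g. seconds since the epoch


def removal_requests(objects, policies):
    """Return (instance, object_id) removal requests for over-replicated objects.

    objects:  object id -> list of Replica
    policies: object id -> target number of replicas
    """
    requests = []
    for obj_id, replicas in objects.items():
        excess = len(replicas) - policies[obj_id]
        if excess > 0:
            # Assumption: the least recently accessed replicas are removed first.
            stale_first = sorted(replicas, key=lambda r: r.last_access)
            requests.extend((r.instance, obj_id) for r in stale_first[:excess])
    return requests


objects = {
    "obj1": [Replica("us", 300.0), Replica("eu", 100.0), Replica("asia", 200.0)],
    "obj2": [Replica("us", 50.0)],
}
print(removal_requests(objects, {"obj1": 2, "obj2": 1}))  # [('eu', 'obj1')]
```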
  • Publication number: 20140304242
    Abstract: A storage system 103 carries out first and second de-duplication processes in response to receiving a write request from a client. First, a determination is made as to whether a write target data item overlaps with any of the stored data items of a part of a stored data item group, which is a user data item group stored in a storage device 209, and if so, the write target data item is prevented from being stored in the storage device. Second, a determination is made as to whether a target stored data item, which is not finished being evaluated as to whether it overlaps with the stored data item in the first de-duplication process, overlaps with another stored data item, and if so, the target stored data item or the same data item overlapping with the target stored data item is deleted from the storage device 209.
    Type: Application
    Filed: June 20, 2014
    Publication date: October 9, 2014
    Inventors: Takaki NAKAMURA, Akira YAMAMOTO, Masaaki IWASAKI, Yohsuke ISHII, Nobumitsu TAKAOKA
  • Patent number: 8856082
    Abstract: An approach for managing a family tree archive is provided. The approach includes creating an electronic archive based on a family tree. The approach also includes automatically discovering Internet-based data associated with at least one member of the family tree. The approach additionally includes adding the Internet-based data to the archive. The approach further includes storing the archive at a storage device.
    Type: Grant
    Filed: May 23, 2012
    Date of Patent: October 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Michael D. Hale, Tian M. Pan, Randy A. Rendahl
  • Patent number: 8856144
    Abstract: Techniques are disclosed for configuring an identity resolution system to support distinct relevance types. Identity records are accessed that are assigned relevance scores of distinct relevance types. Upon determining that the identity records refer to a common individual, the identity records are resolved into an entity representing the common individual. Relevance scores of the distinct relevance types are then determined for the entity, based on the identity records.
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: October 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Thomas B. Allen, Barry M. Caceres
  • Publication number: 20140297604
    Abstract: A system and methods for reconciling data and metadata in a cloud storage system while the cloud storage system is fully operational are provided. The method comprises scanning for broken references in a metadata database containing metadata of blocks stored in the cloud storage system, wherein the scanning for the broken references is performed as a background process; and synchronously verifying blocks for at least existence of the blocks in the object storage system, wherein the synchronous block verification is performed using a foreground process as blocks are requested.
    Type: Application
    Filed: March 27, 2014
    Publication date: October 2, 2014
    Applicant: CTERA NETWORKS, LTD.
    Inventor: Aron Brand
  • Publication number: 20140297602
    Abstract: A cleaning application that can clean, for one or more user profiles, at least one of one or more files of a computer or a registry of the computer is provided. The cleaning application can include a cleaning module. The cleaning module can select a plurality of user profiles of the computer. The cleaning module can further select at least one of a file location or a user profile hive for each user profile of the plurality of user profiles. The cleaning module can further clean at least one of one or more files stored within the file location or a registry stored within the user profile hive for each user profile of the plurality of user profiles.
    Type: Application
    Filed: March 29, 2013
    Publication date: October 2, 2014
    Applicant: Piriform Ltd.
    Inventor: Guy SANER
  • Publication number: 20140297601
    Abstract: System and method to compact a NoSQL database, the method including: receiving, by a receiver coupled to a processor, an indication of a record to delete in the NoSQL database; for each file in the NoSQL database, performing the steps of: if said file does not contain the record to delete, placing said file in a first memory; if said file contains the record to delete: placing said file in a second memory; searching whether the record to delete from said file in the second memory matches a record in one or more files in the first memory; and if searched files in the first memory contain the record to delete from said file in the second memory, compacting said file in the second memory with the files in the first memory that contain the record to delete.
    Type: Application
    Filed: March 28, 2013
    Publication date: October 2, 2014
    Applicant: Avaya Inc.
    Inventor: Anne Pruner
  • Publication number: 20140297576
    Abstract: A system and method for filtering data sources is provided. Data corresponding to an entity listing is received from a set of data sources including one or more primary data sources and at least one secondary data source. The received data is grouped based on attributes of the entity listing. Common values between data from the one or more primary data sources and data from the at least one secondary data source are identified for each attribute of the entity listing. A probability that one of the at least one secondary data source copied data from the one or more primary data sources is calculated based on the identified common values. A determination of whether the calculated probability is greater than a predetermined value is made. If the calculated probability is greater than the predetermined value, the one data source is removed from the at least one secondary data source.
    Type: Application
    Filed: April 1, 2013
    Publication date: October 2, 2014
    Applicant: Google Inc.
    Inventor: Google Inc.
  • Publication number: 20140297603
    Abstract: A replicated file deduplication apparatus generates a hash key of a requested data block, determines whether the same data block as the requested data block exists in data blocks of a replicated image file that is derived from the same golden image file as the requested data block using the hash key of the requested data block, and records, if the same data block as the requested data block exists, information of a chunk in which the same data block as the requested data block is stored at a layout of the requested data block.
    Type: Application
    Filed: June 26, 2013
    Publication date: October 2, 2014
    Inventors: Young-Chang KIM, Hong Yeon KIM, Young Kyun KIM
  • Patent number: 8849768
    Abstract: A computer-implemented method may include identifying at least one file and detecting an event that is suggestive of at least a portion of the file being duplicated in at least one additional file. The computer-implemented method may also include classifying the file as a candidate for deduplication in response to detecting the event. The computer-implemented method may further include maintaining the file's candidate-for-deduplication classification for use in prompting a determination on whether the portion of the file is already stored within a storage device.
    Type: Grant
    Filed: March 8, 2011
    Date of Patent: September 30, 2014
    Assignee: Symantec Corporation
    Inventor: Namita Agrawal
  • Patent number: 8849773
    Abstract: Techniques and mechanisms are provided to support live file optimization. Active I/O access to an optimization target is monitored during optimization. Active files need not be taken offline or made unavailable to an application during optimization and retain the ability to support file operations such as read, write, unlink, and truncate while an optimization engine performs deduplication and/or compression on active file ranges.
    Type: Grant
    Filed: March 4, 2011
    Date of Patent: September 30, 2014
    Assignee: Dell Products L.P.
    Inventors: Abhijit Dinkar, Vinod Jayaraman, Murali Bashyam, Goutham Rao
  • Patent number: 8849772
    Abstract: Data replication with delta compression is disclosed. A primary system and a replica system are determined to both have an identical first data segment that is similar to a second data segment. The second data segment is encoded, wherein the encoding refers to the first data segment.
    Type: Grant
    Filed: November 14, 2008
    Date of Patent: September 30, 2014
    Assignee: EMC Corporation
    Inventors: Mark Huang, Philip Shilane, Grant Wallace, Ming Benjamin Zhu
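A minimal sketch of encoding one segment by reference to a similar segment that both systems already hold, as the abstract above describes. The copy/insert instruction format and the use of Python's `difflib.SequenceMatcher` to find common runs are assumptions for illustration; the patent does not specify an encoding.

```python
from difflib import SequenceMatcher


def encode(base: bytes, target: bytes):
    """Encode target as copy-from-base and insert-literal instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # bytes both sides already share
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # new bytes that must be sent
        # "delete" needs no instruction: those base bytes are simply not copied
    return ops


def decode(base: bytes, ops) -> bytes:
    """Rebuild the target on the replica from its own copy of base."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


first = b"the quick brown fox jumps over the lazy dog"
second = b"the quick brown cat jumps over the lazy dog!"
delta = encode(first, second)
print(decode(first, delta) == second)  # True
```

Only the literal bytes in the `insert` instructions have to cross the network; the `copy` ranges are resolved against the first segment on the replica.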
  • Patent number: 8843501
    Abstract: Techniques are disclosed for configuring an identity resolution system to support distinct relevance types. Identity records are accessed that are assigned relevance scores of distinct relevance types. Upon determining that the identity records refer to a common individual, the identity records are resolved into an entity representing the common individual. Relevance scores of the distinct relevance types are then determined for the entity, based on the identity records.
    Type: Grant
    Filed: February 18, 2011
    Date of Patent: September 23, 2014
    Assignee: International Business Machines Corporation
    Inventors: Thomas B. Allen, Barry M. Caceres
  • Patent number: 8843454
    Abstract: Digital objects within a fixed-content storage cluster use a page mapping table and a hash-to-UID table to store a representation of each object. For each object stored within the cluster, a record in the hash-to-UID table stores the object's hash value and its unique identifier (or portions thereof). To detect a duplicate of an object, a portion of its hash value is used as a key into the page mapping table. The page mapping table indicates a node holding a hash-to-UID table indicating currently stored objects in a particular page range. Finding the same hash value but with a different unique identifier in the table indicates that a duplicate of an object exists. Portions of the hash value and unique identifier may be used in the hash-to-UID table. Unneeded duplicate objects are deleted by copying their metadata to a manifest and then redirecting unique identifiers to point at the manifest.
    Type: Grant
    Filed: April 25, 2014
    Date of Patent: September 23, 2014
    Assignee: Caringo, Inc.
    Inventors: Paul R. M. Carpentier, Russell Turpin
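The duplicate check in the abstract above can be sketched with two lookups: a portion of the hash selects a node through the page mapping table, and that node's hash-to-UID table reveals whether the same hash is already stored under a different unique identifier. Using the first hex digit of a SHA-256 hash as the page key, and the names `Cluster` and `register`, are assumptions for illustration.

```python
import hashlib


class Cluster:
    def __init__(self, nodes):
        # Page mapping table: first hex digit of the hash (the "page") -> node name.
        self.page_map = {format(p, "x"): nodes[p % len(nodes)] for p in range(16)}
        # Each node holds a hash-to-UID table for the pages it is responsible for.
        self.tables = {node: {} for node in nodes}

    def register(self, uid: str, content: bytes):
        """Record an object; return the UID of an existing duplicate, or None."""
        digest = hashlib.sha256(content).hexdigest()
        table = self.tables[self.page_map[digest[0]]]
        existing = table.get(digest)
        if existing is not None and existing != uid:
            return existing  # same hash under a different UID: a duplicate exists
        table[digest] = uid
        return None


cluster = Cluster(["node-a", "node-b"])
print(cluster.register("uid-1", b"payload"))  # None
print(cluster.register("uid-2", b"payload"))  # uid-1
```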
  • Publication number: 20140279950
    Abstract: The present invention provides a method for modifying a first storage medium having a plurality of files, the method including providing a first modification tool; operatively coupling the first storage medium to the modification tool, wherein the operatively coupling includes bypassing a first operating system used to access the plurality of files; and dematerializing, using the first modification tool, at least a first file to form one or more dematerialized files. In some embodiments, the present invention provides a modification system for modifying a first storage medium having a plurality of files, the system including a first modification tool that includes an attachment module configured to operatively couple the modification tool to the first storage medium such that a first operating system used to access the plurality of files is bypassed; and a dematerialization module configured to dematerialize at least a first file to form one or more dematerialized files.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Inventors: Joshua Shapiro, Robert Gezelter
  • Publication number: 20140279952
    Abstract: For efficient calculation of both similarity search values and boundaries of digest blocks in data deduplication, input data is partitioned into chunks, and for each chunk a set of rolling hash values is calculated. A single linear scan of the rolling hash values is used to produce both similarity search values and boundaries of the digest blocks of the chunk.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shay H. AKIRAV, Lior ARONOVICH, Shira BEN-DOR, Michael HIRSCH, Ofer LENEMAN
  • Publication number: 20140279953
    Abstract: For reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, digest values are calculated for input data. The digest values are used to locate matches with data stored in a repository. The digest values are stored in the repository. The digest values of the data stored in the repository that is determined to be redundant with the input data are removed.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
  • Publication number: 20140279957
    Abstract: A system and method that implements a tabular graph editor are disclosed. The system supports employing tables to browse and edit comparisons by multiple attributes of nodes in a graph.
    Type: Application
    Filed: March 10, 2014
    Publication date: September 18, 2014
    Applicant: Clados Management LLC
    Inventors: Peter Moore, Franz Christian Halaschek-Wiener, Andrey Pleshakov, Maxwell Bernardy
  • Publication number: 20140279951
    Abstract: For digest retrieval based on similarity search in deduplication processing in a data deduplication system using a processor device in a computing environment, input data is partitioned into fixed sized data chunks. Similarity elements and digest block boundaries and digest values are calculated for each of the fixed sized data chunks. Matching similarity elements are searched for in a search structure containing the similarity elements for each of the fixed sized data chunks in a repository of data. Positions of similar data are located in the repository. The positions of the similar data are used to locate and load into the memory stored digest values and corresponding stored digest block boundaries of the similar data in the repository. The digest values and the corresponding digest block boundaries of the input data are matched with the stored digest values and the corresponding stored digest block boundaries to find data matches.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shay H. AKIRAV, Lior ARONOVICH, Shira BEN-DOR, Michael HIRSCH, Ofer LENEMAN
  • Publication number: 20140279956
    Abstract: A system configured to compute match potential between first data and second data is provided. The system includes data storage storing the first data and the second data, and at least one processor coupled to the data storage. The at least one processor is configured to identify a first sequence of fingerprints characterizing a first plurality of sections of the first data, the first sequence being ordered according to an order of the first plurality of sections within the first data; identify a second sequence of fingerprints comprising fingerprints that match fingerprints within the first sequence, the second sequence of fingerprints characterizing a second plurality of sections of the second data, the second sequence being ordered according to an order of the second plurality of sections within the second data; quantify a similarity between the first sequence and the second sequence; and adjust the match potential based on the similarity.
    Type: Application
    Filed: January 22, 2014
    Publication date: September 18, 2014
    Inventors: Ronald Ray Trimble, Jon Christopher Kennedy, Timmie G. Reiter, David Michael Biernacki, Carey Jay McMaster, Stefan Merrill King
  • Publication number: 20140279948
    Abstract: Among other things, one or more techniques and/or systems are provided for developing a timeline chronicling events pertaining to an industrial asset. Data is received from a plurality of assets, processed (e.g., to reduce duplicative and/or redundant data), and organized chronologically for presentation in a timeline. The data is further grouped and/or prioritized to display some portions of the data more prominently relative to other portions of the data in the timeline (e.g., which may be hidden). Grouping rules and/or prioritization rules for grouping and/or prioritizing the data may be a function of user interaction with the timeline and/or a function of a machine learning algorithm which may be configured to identify patterns in how users interact with the timeline based upon, among other things, a role the user plays relative to the industrial asset and/or an operating state of the industrial asset.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: ABB Research Ltd.
    Inventors: Shakeel Mahamood Mahate, Karen J. Smiley, Paul F. Wood
  • Publication number: 20140279927
    Abstract: Embodiments of the invention relate to a method and computer program product for providing a scalable representation of metadata for deduplicated storage systems. The method includes identifying shared data segments that are contained in a plurality of data objects in a deduplicated storage system. A data object centric graph is generated. The generating includes creating vertices that represent the data objects and creating edges between the data objects. An edge connecting two data objects indicates that the two data objects contain at least one shared data segment in common. Each shared data segment between any two data objects is represented by at most one of the edges. At least one of the data objects is manipulated based on the data object centric graph.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mihail Corneliu Constantinescu, Abdullah Gharaibeh, Maohua Lu
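The data object centric graph in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed method: the input mapping of object names to segment identifiers, the function name `build_object_graph`, and the choice to label each edge with the set of shared segments are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_object_graph(objects: dict[str, set[str]]) -> dict[frozenset, set[str]]:
    """Return edges keyed by an unordered pair of object names.

    Vertices are the data objects; an edge exists only when two objects
    share at least one segment, and it carries the segments they share.
    """
    # Invert the mapping: segment id -> objects that contain it.
    segment_owners = defaultdict(set)
    for name, segments in objects.items():
        for segment in segments:
            segment_owners[segment].add(name)

    edges = defaultdict(set)
    for segment, owners in segment_owners.items():
        # One edge per pair of objects, so a shared segment is recorded
        # at most once for any given pair.
        for a, b in combinations(sorted(owners), 2):
            edges[frozenset((a, b))].add(segment)
    return dict(edges)
```

Objects with no shared segments simply have no incident edges, so the graph grows with the amount of sharing rather than with the total number of segments.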
  • Publication number: 20140279955
    Abstract: Object store management operations within compute-centric object stores are provided herein. An exemplary method may include transforming an object storage dump into an object store table by a table generator container, wherein the object storage dump includes at least objects within an object store that are marked for deletion, transmitting records for objects from the object store table to reducer containers, such that each reducer container receives object records for at least one object, the object records comprising all object records for the at least one object, generating a set of cleanup tasks by the reducer containers, and executing the cleanup tasks by cleanup agents.
    Type: Application
    Filed: September 26, 2013
    Publication date: September 18, 2014
    Inventors: Mark Cavage, Nathan Fitch, Fred Kuo, Yunong Xiao, David Pacheco, Bryan Cantrill
  • Publication number: 20140279949
    Abstract: A method and system for data de-duplication in storage devices is disclosed. The method scans for the content within the storage device. When the method obtains all the content within the storage device, it checks for duplicate content in the storage device. The method identifies duplicate content based on two criteria: a parametric level and a metadata level. The method switches to the metadata level when it fails to identify duplicate content at the parametric level. Further, the method obtains input from the user to delete or retain the duplicate content. If the user provides a confirmation for deleting the duplicate content, the method deletes the duplicate content.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Inventor: Kadari Subbarao Sudeendra Thirtha Koushik
  • Publication number: 20140279954
    Abstract: For reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, input data is partitioned into chunks, and the chunks are grouped into chunk sets. Digests are calculated for input data and stored in sets corresponding to the chunk sets. Similarity elements are calculated for the input data and the similarity elements are stored in a similarity search structure. The number of similarity elements associated with a chunk set which are currently contained in the similarity search structure is maintained for each chunk set, and when this number for a specific chunk set becomes lower than a threshold, the digests set associated with that chunk set is removed from the repository.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Lior ARONOVICH
  • Publication number: 20140279958
    Abstract: Providing a subset of de-duplicated data as output is disclosed. In some embodiments, the output comprises a subset of data stored in de-duplicated form in a plurality of containers each comprising a plurality of data segments comprising the data. For each container that includes one or more data segments comprising the subset, a corresponding container data is included in the output. Each container may include one or more segments not included in the subset. For each container the corresponding container data of which is included in the output, a corresponding value in a data structure comprising for each container stored on the de-duplicated storage system a data value indicating whether or not the corresponding container data has been included in the output is updated.
    Type: Application
    Filed: May 28, 2014
    Publication date: September 18, 2014
    Applicant: EMC Corporation
    Inventor: Mark Huang
  • Patent number: 8838549
    Abstract: A method for finding duplicates by matching a group of fields in records is disclosed. The method comprises standardizing data using a field-specific knowledge base; extracting at least part of one or more related fields of records; applying a matching attribute function to generate keys on the “comparable” field part extracted data; generating record level keys using generated field level keys; clustering the records based on generated record level keys; identifying a reference record for each cluster identified; and calculating a matching percentage for each record in a cluster with respect to the reference record of the cluster. Devices and systems are disclosed that enable the method for finding duplicates.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: September 16, 2014
    Inventors: Chandra Bodapati, Noel Vijay Gunasekar
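The key-based clustering steps in this abstract could look roughly like the sketch below. Everything specific here is an assumption for illustration and not taken from the patent: standardization is reduced to lowercasing and whitespace cleanup, field keys are three-character prefixes, the first record in a cluster serves as the reference record, and the matching percentage uses `difflib`.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def standardize(value: str) -> str:
    """Stand-in for knowledge-base standardization: lowercase, collapse spaces."""
    return " ".join(value.lower().split())


def field_key(value: str, length: int = 3) -> str:
    """Field-level key built from the leading part of the standardized field."""
    return standardize(value).replace(" ", "")[:length]


def record_key(record: dict, fields: list[str]) -> str:
    """Record-level key built by joining the field-level keys."""
    return "|".join(field_key(record[f]) for f in fields)


def cluster_and_score(records: list[dict], fields: list[str]) -> dict[str, list[float]]:
    """Cluster records by record key, then score each against its cluster's reference."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record_key(record, fields)].append(record)

    scores = {}
    for key, members in clusters.items():
        reference = members[0]  # assumed rule: first record is the reference
        ref_text = " ".join(standardize(reference[f]) for f in fields)
        scores[key] = [
            round(100 * SequenceMatcher(
                None, ref_text, " ".join(standardize(m[f]) for f in fields)
            ).ratio(), 1)
            for m in members
        ]
    return scores
```

Clustering on keys first keeps the expensive pairwise comparison confined to small groups of likely duplicates.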
  • Publication number: 20140258245
    Abstract: Efficient data deduplication is described herein. A deduplication bit array partition can be created that corresponds to a number of data items in an expected dataset. The deduplication bit array partition can track whether the data items have been received. When a data item in the expected dataset is received, a bit in the deduplication bit array partition corresponding to the received data item can be accessed to determine, based on the value of the bit, if the received data item has already been received. When the value of the bit indicates that the received data item has not already been received, the value can be changed to indicate that the data item has now been received. When the value of the bit indicates that the received data item has already been received, the data item can be deleted or ignored.
    Type: Application
    Filed: March 7, 2013
    Publication date: September 11, 2014
    Applicant: JIVE SOFTWARE, INC.
    Inventor: James Donald Estes
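The bit-array check described in this abstract can be illustrated with a short sketch. It assumes each expected data item maps to an integer index in a known range; the class name `BitArrayDeduplicator` and its method are hypothetical, and partitioning across several arrays is left out.

```python
class BitArrayDeduplicator:
    """Track which items of an expected dataset have been received, one bit per item."""

    def __init__(self, expected_items: int) -> None:
        self._size = expected_items
        # One bit per expected item, rounded up to whole bytes.
        self._bits = bytearray((expected_items + 7) // 8)

    def seen_before(self, index: int) -> bool:
        """Return True for a duplicate; otherwise mark the item as received."""
        if not 0 <= index < self._size:
            raise IndexError("item index outside the expected dataset")
        byte, mask = index // 8, 1 << (index % 8)
        if self._bits[byte] & mask:
            return True  # already received: the caller can delete or ignore it
        self._bits[byte] |= mask
        return False
```

A caller would drop any item for which `seen_before` returns `True`; memory use is fixed at one bit per expected item regardless of how many duplicates arrive.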
  • Publication number: 20140258244
    Abstract: Mechanisms are provided for adjusting a configuration of data stored in a storage system. According to various embodiments, a storage module may be configured to store a configuration of data. A processor may be configured to identify an estimated performance level for the storage system based on a configuration of data stored on the storage system.
    Type: Application
    Filed: March 6, 2013
    Publication date: September 11, 2014
    Applicant: DELL PRODUCTS L.P.
    Inventors: Goutham Rao, Ratna Manoj Bolla, Vinod Jayaraman
  • Publication number: 20140258246
    Abstract: Determining whether two merchant location database entries are describing the same merchant location. A subject merchant location database entry and comparison candidate merchant location database entries include a DBA name field, a street address field, and one or more additional descriptive fields descriptive of one or more predetermined characteristics of the respective merchant location. The subject merchant location database entry is compared to a set populated with candidate merchant location database entries, candidates having a predetermined minimum textual similarity with the subject merchant location database entry on the basis of each entry's DBA name field or street address field.
    Type: Application
    Filed: March 8, 2013
    Publication date: September 11, 2014
    Applicant: MASTERCARD INTERNATIONAL INCORPORATED
    Inventors: Walter Francis Lo Faro, Steve Oshry, Anita Christine Galliani, Gary Randall Horn
  • Patent number: 8832042
    Abstract: An interface is disclosed that makes information obtained from a file deduplication process available to an application for the efficient operation thereof. A data deduplication repository is scanned to determine a plurality of file segments and respective checksum values associated with the segments. A data structure is generated that allows shared segments to be identified by indexing using a common checksum value. The segments also indicate the file to which they belong and may also include a timestamp value. This data structure is updated as files are modified, etc. The data structure is accessible to an application program so that the application program can readily determine which segments are shared between multiple files. With this information, the application can efficiently process the segment once rather than multiple times. Timestamps can be used by the application to efficiently identify only those segments that were accessed after a given time.
    Type: Grant
    Filed: March 15, 2010
    Date of Patent: September 9, 2014
    Assignee: Symantec Corporation
    Inventor: Mukund Agrawal
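A rough sketch of the checksum-indexed data structure this abstract describes, so that an application can find segments shared between files. The fixed segment size, SHA-256 as the checksum, and the function names are assumptions for illustration; the timestamps mentioned in the abstract are omitted for brevity.

```python
import hashlib
from collections import defaultdict


def build_segment_index(files: dict[str, bytes],
                        segment_size: int = 4) -> dict[str, set[str]]:
    """Map each segment checksum to the set of files containing that segment."""
    index = defaultdict(set)
    for name, content in files.items():
        for offset in range(0, len(content), segment_size):
            segment = content[offset:offset + segment_size]
            index[hashlib.sha256(segment).hexdigest()].add(name)
    return dict(index)


def shared_segments(index: dict[str, set[str]]) -> dict[str, set[str]]:
    """Keep only the checksums whose segment appears in more than one file."""
    return {checksum: names for checksum, names in index.items() if len(names) > 1}
```

An application such as a scanner could then process each shared segment once and apply the result to every file listed for its checksum.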
  • Publication number: 20140250087
    Abstract: A computer-implemented system and method for identifying relevant documents for display are provided. Themes for a set of documents are generated. The documents are clustered based on the themes. A matrix including an inner product of document frequency occurrences and cluster concept weightings for each theme is generated for the documents. From the matrix, documents most relevant to a particular theme are identified, and the relevant documents are displayed.
    Type: Application
    Filed: May 12, 2014
    Publication date: September 4, 2014
    Applicant: FTI TECHNOLOGY LLC
    Inventors: Dan Gallivan, Kenji Kawai
  • Publication number: 20140250080
    Abstract: Change tracking for multiphase deduplication. In one example embodiment, a method of tracking changes to a source storage of a source system for multiphase deduplication includes a change tracking phase that includes performing various steps for only allocated blocks in the source storage that are changed between a prior point in time and a subsequent point in time. These steps include temporarily storing a copy of the changed block in a volatile memory of the source system prior to writing the changed block to the source storage, performing a hash function only once on the copy of the changed block, while the copy is temporarily stored in a volatile memory of the source system, to calculate a hash value, writing the changed block to the source storage, and tracking, in a change log, a location in the source storage of the changed block and the corresponding hash value.
    Type: Application
    Filed: April 23, 2014
    Publication date: September 4, 2014
    Applicant: STORAGECRAFT TECHNOLOGY CORPORATION
    Inventor: Andrew Lynn Gardner
  • Publication number: 20140250086
    Abstract: A network gateway coupled to a backup server on a wide area network which receives and de-duplicates binary objects. The backup server provides selected data segments of binary objects to the gateway to store into a prescient cache (p-cache) store. The network gateway optimizes network traffic by fulfilling a local client request from its local p-cache store instead of requiring further network traffic when it matches indicia of stored data segments stored in its p-cache store with indicia of a first segment of a binary object requested from and received from a remote server.
    Type: Application
    Filed: June 12, 2013
    Publication date: September 4, 2014
    Applicant: BARRACUDA NETWORKS, INC.
    Inventor: Fleming Shi
  • Patent number: RE45160
    Abstract: An address consolidating system that has a name and address database where duplicate names and addresses are consolidated by matching name and address and e-mail address simultaneously. The address consolidating system utilizes a database along with off-the-shelf and custom proprietary software. There are two segments to the database: records with name and address data (which may or may not include e-mail address data), and records with e-mail address data (which may include incomplete portions of associated name and address data). Periodically the database is updated with new or corrected name, address, or e-mail information, or with new records obtained from other database lists.
    Type: Grant
    Filed: January 10, 2008
    Date of Patent: September 23, 2014
    Assignee: I-BR Technologies, L.L.C.
    Inventors: Henry T. Ferlauto, Stephen H. Yu