Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)

Reducing digest storage consumption by tracking similarity elements in a data deduplication system

Patent number: 9116941

Abstract: For reducing digest storage consumption in a data deduplication system using a processor device in a computing environment, input data is partitioned into chunks, and the chunks are grouped into chunk sets. Digests are calculated for the input data and stored in sets corresponding to the chunk sets. Similarity elements are calculated for the input data and stored in a similarity search structure. For each chunk set, the number of its similarity elements currently contained in the similarity search structure is maintained, and when this number for a specific chunk set falls below a threshold, the digest set associated with that chunk set is removed from the repository.

Type: Grant

Filed: March 15, 2013

Date of Patent: August 25, 2015

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Lior Aronovich
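
The per-chunk-set bookkeeping this abstract describes can be illustrated with a minimal Python sketch. It is not IBM's implementation: the class `DigestTracker`, its methods, the eviction hook, and the threshold value are hypothetical, and each similarity element is assumed to belong to exactly one chunk set.

```python
from collections import defaultdict


class DigestTracker:
    """Hypothetical sketch: count, per chunk set, how many of its similarity
    elements are still in the similarity search structure, and drop the chunk
    set's digests once that count falls below a threshold.
    Assumption: every similarity element belongs to exactly one chunk set."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.similarity_index: dict[int, int] = {}   # similarity element -> chunk set id
        self.live_counts: dict[int, int] = defaultdict(int)
        self.digest_sets: dict[int, list[str]] = {}  # chunk set id -> its digests

    def add_chunk_set(self, set_id: int, digests: list[str], elements: list[int]) -> None:
        self.digest_sets[set_id] = list(digests)
        for element in elements:
            self.similarity_index[element] = set_id
            self.live_counts[set_id] += 1

    def evict_similarity_element(self, element: int) -> None:
        """Hypothetical hook, called when the search structure evicts an element."""
        set_id = self.similarity_index.pop(element, None)
        if set_id is None:
            return
        self.live_counts[set_id] -= 1
        if self.live_counts[set_id] < self.threshold:
            # Too few similarity elements remain to find this set: drop its digests.
            self.digest_sets.pop(set_id, None)


tracker = DigestTracker(threshold=2)
tracker.add_chunk_set(1, ["d1", "d2", "d3"], [101, 102, 103])
tracker.evict_similarity_element(101)  # 2 elements left: digests kept
tracker.evict_similarity_element(102)  # 1 element left: digests removed
```

In this sketch the keep-or-drop decision costs one counter update per eviction, with no scan of the similarity structure.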

Method for maintaining multiple fingerprint tables in a deduplicating storage system

Patent number: 9069786

Abstract: A system and method for managing multiple fingerprint tables in a deduplicating storage system. A computer system includes a data storage medium, a first fingerprint table comprising a first plurality of entries, and a second fingerprint table comprising a second plurality of entries. Each of the first plurality of entries and each of the second plurality of entries are configured to store fingerprint related data corresponding to data stored in the data storage medium. A data storage controller is configured to select the first fingerprint table for storage of entries corresponding to data stored in the data storage medium that has been deemed more likely to be successfully deduplicated than other data stored in the data storage medium; and select the second fingerprint table for storage of entries corresponding to data stored in the data storage medium that has been deemed less likely to be successfully deduplicated than other data stored in the data storage medium.

Type: Grant

Filed: November 18, 2013

Date of Patent: June 30, 2015

Assignee: Pure Storage, Inc.

Inventors: John Colgrove, John Hayes, Ethan Miller, Joseph S. Hasbani, Cary Sandvig

Hierarchical identification and mapping of duplicate data in a storage system

Patent number: 9043292

Abstract: The technique introduced here includes a system and method for identifying and mapping duplicate data blocks referenced by data objects. The technique illustratively uses a hierarchical tree of fingerprints for each data object to compare the data objects and identify duplicate data blocks referenced by them. A progressive comparison of the hierarchical trees starts from the top layer and proceeds toward the base layer. Between the compared data objects (i.e., the compared hierarchical trees), the technique maps matching fingerprints only at the top-most layer at which the fingerprints match. Lower-layer matching fingerprints are neither compared nor mapped. Data blocks corresponding to the matching fingerprints are then deleted. This identification and mapping technique substantially reduces the amount of mapping metadata stored in data objects that have been subject to deduplication.

Type: Grant

Filed: June 14, 2011

Date of Patent: May 26, 2015

Assignee: NetApp, Inc.

Inventors: Giridhar Appaji Nag Yasa, Nagesh Panyam Chandrasekarasastry
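
A minimal Python sketch of the top-down comparison described above follows. It assumes both data objects have the same number of blocks, that fingerprints are compared position by position, and that each upper-layer fingerprint is a SHA-256 hash over a group of `fanout` child fingerprints. These choices and the function names are illustrative assumptions, not NetApp's design.

```python
import hashlib


def fingerprint(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks: list[bytes], fanout: int = 2) -> list[list[bytes]]:
    """Layers of fingerprints: layers[0] is the base (one per data block);
    each higher layer fingerprints groups of `fanout` nodes below it."""
    layers = [[fingerprint(b) for b in blocks]]
    while len(layers[-1]) > 1:
        below = layers[-1]
        layers.append([fingerprint(b"".join(below[i:i + fanout]))
                       for i in range(0, len(below), fanout)])
    return layers


def map_duplicates(a: list[list[bytes]], b: list[list[bytes]],
                   fanout: int = 2) -> list[tuple[int, int]]:
    """Compare two trees of the same shape from the top layer down and return
    (layer, index) pairs for matches at the top-most matching layer only."""
    mappings: list[tuple[int, int]] = []

    def visit(layer: int, index: int) -> None:
        if a[layer][index] == b[layer][index]:
            mappings.append((layer, index))  # children are neither compared nor mapped
            return
        if layer == 0:
            return
        first = index * fanout
        for child in range(first, min(first + fanout, len(a[layer - 1]))):
            visit(layer - 1, child)

    top = len(a) - 1
    for index in range(len(a[top])):
        visit(top, index)
    return mappings


tree_a = build_tree([b"A", b"B", b"C", b"D"])
tree_b = build_tree([b"A", b"B", b"X", b"D"])
print(map_duplicates(tree_a, tree_b))  # [(1, 0), (0, 3)]
```

The pair `(1, 0)` is a single mapping that covers blocks A and B together. This is where the metadata saving comes from. Only the half of the tree that differs is descended.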

Method for Intelligently Categorizing Data to Delete Specified Amounts of Data Based on Selected Data Characteristics

Publication number: 20150142758

Abstract: A method assigns stored documents within a distributed storage system (DSS) to various document categories to enable a target number of documents to be deleted. An intelligent storage management (ISM) utility identifies a data storage threshold value used to control data storage within the DSS. If a current storage usage exceeds the data storage threshold value, the ISM utility calculates, based on the current storage usage, a target number of documents that can be deleted from the DSS. The ISM utility utilizes a recursive process which includes assigning stored documents to groups including a set of document categories based on data characteristics of the stored documents. The ISM utility further utilizes the recursive process to delete, based on an established ordering of the groups, all of the stored documents assigned to a subset of the groups in order to remove the target number of stored documents.

Type: Application

Filed: August 29, 2014

Publication date: May 21, 2015

Inventors: Dinakaran Joseph, Devaprasad Khandurao Nadgir, Ramkumar Ramalingam, David Elliot Shepard
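
The selection step can be sketched in Python as a recursive grouping. The sketch rests on several assumptions: the target count is estimated from an average document size, groups are ordered by the value of each grouping key, and the hypothetical `rank` table and key functions stand in for whatever data characteristics the ISM utility actually uses.

```python
import math
from itertools import groupby
from typing import Callable, Sequence

Doc = dict  # hypothetical: a document is a dict of its data characteristics


def target_count(usage_bytes: int, threshold_bytes: int, avg_doc_bytes: int) -> int:
    """Documents to delete to bring usage back under the threshold
    (assumption: estimated from an average document size)."""
    if usage_bytes <= threshold_bytes:
        return 0
    return math.ceil((usage_bytes - threshold_bytes) / avg_doc_bytes)


def select_for_deletion(docs: Sequence[Doc], target: int,
                        keys: Sequence[Callable[[Doc], object]]) -> list[Doc]:
    """Group by keys[0] in key order, take whole groups while they fit the
    remaining target, and recurse with the remaining keys into the first
    group that is too large."""
    if target <= 0 or not docs:
        return []
    if not keys:
        return list(docs[:target])  # assumption: no characteristic left, take a prefix
    key, rest = keys[0], keys[1:]
    selected: list[Doc] = []
    for _, members in groupby(sorted(docs, key=key), key=key):
        group = list(members)
        remaining = target - len(selected)
        if len(group) <= remaining:
            selected.extend(group)
        else:
            selected.extend(select_for_deletion(group, remaining, rest))
            break
    return selected


rank = {"tmp": 0, "log": 1, "report": 2}  # hypothetical deletion priority
docs = [
    {"id": 1, "type": "tmp", "age_days": 5},
    {"id": 2, "type": "tmp", "age_days": 1},
    {"id": 3, "type": "log", "age_days": 30},
    {"id": 4, "type": "log", "age_days": 2},
    {"id": 5, "type": "report", "age_days": 90},
]
keys = [lambda d: rank[d["type"]], lambda d: -d["age_days"]]  # type, then oldest first
target = target_count(usage_bytes=1_000, threshold_bytes=700, avg_doc_bytes=100)
print([d["id"] for d in select_for_deletion(docs, target, keys)])  # [1, 2, 3]
```

Whole groups are deleted while they fit. Only the group that would overshoot the target is split further by the next characteristic.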

METHOD AND DEVICE FOR DEDUPLICATING WEB PAGE

Publication number: 20150142760

Abstract: A method and a device are described for de-duplicating a web page. The method includes: extracting at least one core sentence from a target web page; mapping each core sentence to a unique numeric value to form a first numeric value set; determining the intersection of the first numeric value set with each second numeric value set and the number of numeric values in each intersection, and determining the maximum of these numbers; and, when the ratio of this maximum to the total number of numeric values in the first numeric value set is greater than a set threshold, processing the target web page as a duplicate web page. In embodiments of the present invention, web page de-duplication accuracy can be improved, noise resistance can be enhanced, and the computational scale can be reduced.

Type: Application

Filed: December 23, 2014

Publication date: May 21, 2015

Inventors: Nan Jiang, Hui Zhang, Jia Wan
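
A rough Python sketch of the comparison follows, with two explicit assumptions. The abstract does not say how core sentences are chosen, so the n longest sentences are used here. The "unique numeric value" is approximated by the first 64 bits of an MD5 digest, which is unique with high probability but not guaranteed. The function names and the 0.6 threshold are hypothetical.

```python
import hashlib
import re


def core_sentences(text: str, n: int = 5) -> list[str]:
    """Assumption: treat the n longest sentences as the page's core sentences."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return sorted(sentences, key=len, reverse=True)[:n]


def sentence_ids(sentences: list[str]) -> set[int]:
    # Map each sentence to a 64-bit number taken from its MD5 digest.
    return {int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")
            for s in sentences}


def is_duplicate(page: str, known_pages: list[set[int]], threshold: float = 0.6) -> bool:
    """Largest overlap with any indexed page, relative to this page's set size."""
    first = sentence_ids(core_sentences(page))
    if not first or not known_pages:
        return False
    best = max(len(first & other) for other in known_pages)
    return best / len(first) > threshold


indexed = [sentence_ids(core_sentences(
    "Prices rose sharply today. The central bank held rates. Analysts expect more volatility."))]
candidate = ("Prices rose sharply today. The central bank held rates. "
             "Analysts expect more volatility. Click here to subscribe.")
print(is_duplicate(candidate, indexed))  # True (3 of 4 sentences match)
```

The ratio is taken against the candidate's own sentence set rather than a symmetric similarity. As a result, a page that copies an indexed article and adds boilerplate is still flagged.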

METHOD FOR DETECTING THE PLAYBACK OF A DATA PACKET

Publication number: 20150142759

Abstract: A method of detecting whether a packet from a plurality of packets transmitted by at least one transmitting station over a network has been played back is disclosed. Each packet includes a message and an identifier, and the packets are transmitted successively over several consecutive time periods. The method includes receiving the packet by at least one receiving station, reading the identifier of the received packet to obtain a received identifier, and consulting, by the receiving station, a database of identifiers already received to determine whether the received identifier has already been received. If it has not, the method also includes updating the database to include the received identifier. The identifier includes an indicator of membership in groups of packets.

Type: Application

Filed: November 20, 2014

Publication date: May 21, 2015

Inventors: Patrick DUPUTZ, Sepideh FOULADGAR, Carlos PINTO

INFORMATION PROCESSING METHOD AND ELECTRONIC DEVICE

Publication number: 20150142757

Abstract: The disclosure provides an information processing method and an electronic device. When installing a recording application program, the electronic device generates M components to be embedded into a first application program, where M is an integer greater than or equal to 1. There is an association relationship between the M components and the recording application program. When the M components are embedded into the first application program, the method includes: when the first application program runs, displaying, by the electronic device, a first graphical interface corresponding to the first application program, the first graphical interface including the M components; obtaining a first triggering operation for a first component of the M components; in response to the first triggering operation, directly collecting first data content under the first graphical interface; and storing the collected first data content.

Type: Application

Filed: March 28, 2014

Publication date: May 21, 2015

Applicant: Lenovo (Beijing) Co., Ltd.

Inventors: Kai Li, Wei Huang, Wenhui Lu, Kangli Zhao

DEDUPLICATION IN DISTRIBUTED FILE SYSTEMS

Publication number: 20150142756

Abstract: Deduplication in a distributed file system is described. Key classes are determined from a set of potential keys, the potential keys used to represent file content stored by the file system. Control of the key classes is apportioned among index nodes of the file system. Nodes in the file system, during deduplication of data chunks of the file content, generate keys calculated from the data chunks. The keys are distributed among the index nodes based on relations between the keys and the key classes controlled by the index nodes.

Type: Application

Filed: June 14, 2011

Publication date: May 21, 2015

Inventors: Mark Robert Watkins, Boris Zuckerman, Oskar Y. Batuner
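
One way to picture key classes is sketched below in Python. The sketch assumes SHA-1 chunk keys, a key class given by the key's first byte, and round-robin assignment of classes to index nodes. The abstract fixes none of these choices, and the node names are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_CLASSES = 256  # assumption: a key's class is the first byte of the key


def chunk_key(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


def apportion_classes(index_nodes: list[str]) -> dict[int, str]:
    """Give each index node control of a share of the key classes (round-robin)."""
    return {c: index_nodes[c % len(index_nodes)] for c in range(NUM_CLASSES)}


def distribute_keys(chunks: list[bytes], owners: dict[int, str]) -> dict[str, list[bytes]]:
    """Batch each chunk's key for the index node that controls its class."""
    batches: dict[str, list[bytes]] = defaultdict(list)
    for chunk in chunks:
        key = chunk_key(chunk)
        batches[owners[key[0]]].append(key)
    return dict(batches)


owners = apportion_classes(["index-a", "index-b", "index-c"])  # hypothetical node names
batches = distribute_keys([b"alpha", b"beta", b"gamma", b"delta"], owners)
```

A key's class depends only on the key itself. Every node therefore routes the same chunk to the same index node without consulting a central directory.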

STORAGE APPARATUS AND DATA MANAGEMENT METHOD

Publication number: 20150142755

Abstract: A control unit of a storage apparatus divides received data into one or more chunks and compresses the divided chunk(s); and regarding the chunk whose compressibility is equal to or lower than a threshold value, the control unit does not store the chunk in the first storage area, but calculates a hash value of the compressed chunk, compares the hash value with a hash value of another data already stored in the second storage area and executes first deduplication processing; and regarding the chunk whose compressibility is higher than the threshold value, the control unit stores the compressed chunk in the first storage area, reads the compressed chunk from the first storage area, calculates a hash value of the compressed chunk, compares the relevant hash value with a hash value of another data already stored in the second storage area, and executes secondary deduplication processing.

Type: Application

Filed: August 24, 2012

Publication date: May 21, 2015

Applicants: HITACHI, LTD., HITACHI INFORMATION & TELECOMMUNICATION ENGINEERING, LTD.

Inventor: Masayuki Kishi
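
The two deduplication paths can be sketched in Python as follows. The sketch makes several assumptions: "compressibility" is taken as compressed size divided by original size, zlib and SHA-256 stand in for the unspecified compression and hash functions, and the class name, chunk size, and 0.5 threshold are hypothetical. As in the abstract, the hash is computed over the compressed chunk.

```python
import hashlib
import zlib


class StorageSketch:
    """Hypothetical sketch: chunks at or below the compressibility threshold are
    deduplicated inline; the rest are staged in a first (temporary) area and
    deduplicated in a later pass."""

    def __init__(self, chunk_size: int = 4096, threshold: float = 0.5) -> None:
        self.chunk_size = chunk_size
        self.threshold = threshold
        self.first_area: list[bytes] = []        # staging area
        self.second_area: dict[str, bytes] = {}  # hash -> stored compressed chunk

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            compressed = zlib.compress(chunk)
            # Assumption: compressibility = compressed size / original size.
            ratio = len(compressed) / len(chunk)
            if ratio <= self.threshold:
                self._dedup(compressed)          # first (inline) deduplication
            else:
                self.first_area.append(compressed)

    def post_process(self) -> None:
        # Secondary deduplication: read staged chunks back and deduplicate them.
        while self.first_area:
            self._dedup(self.first_area.pop())

    def _dedup(self, compressed: bytes) -> None:
        digest = hashlib.sha256(compressed).hexdigest()
        self.second_area.setdefault(digest, compressed)


store = StorageSketch()
store.write(bytes(8192))  # two identical, highly compressible chunks: inline path
store.post_process()
print(len(store.second_area))  # 1
```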

Redundant attribute values

Patent number: 9037551

Abstract: Aspects of the present disclosure provide techniques that determine whether an attribute value is associated with each configuration item in a plurality of configuration items. If it is determined that the attribute value is associated with each configuration item in the plurality of configuration items, the attribute value is deemed a redundant attribute value.

Type: Grant

Filed: March 7, 2012

Date of Patent: May 19, 2015

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: David Azriel, Nimrod Nahum, Nir Mardiks

Write horizon data management

Patent number: 9037825

Abstract: Conditions are enforced to prevent unintended deletion of data stored by a data storage system. For example, to delete a collection of data, a condition on the collection's size may be enforced, such as requiring the collection to be empty. In addition, a condition that there is no pending data processing operation that could affect fulfillment of the size condition is also enforced.

Type: Grant

Filed: November 20, 2012

Date of Patent: May 19, 2015

Assignee: Amazon Technologies, Inc.

Inventors: Bryan James Donlan, Sandeep Kumar

System, method and computer program product for automatic code generation for database object deletion

Patent number: 9037546

Abstract: In accordance with embodiments, there are provided mechanisms and methods for automatic code generation for database object deletion. These mechanisms and methods for automatic code generation for database object deletion can generate code for deleting database objects in an automated manner. The ability to generate code for deleting database objects in an automated manner can enable the efficient and accurate deletion of database objects, including database objects with relationships to other database objects.

Type: Grant

Filed: March 25, 2011

Date of Patent: May 19, 2015

Assignee: salesforce.com, inc.

Inventors: Simon Wong, Sonali Agrawal

Data abstraction layer for interfacing with reporting systems

Patent number: 9037534

Abstract: A data transformation system receives data from one or more external source systems and stores and transforms the data for providing to reporting systems. The data transformation system maintains multiple versions of data received from an external source system. The data transformation system can combine data from different versions of data and provide to the reporting system. As a result, external source systems that do not maintain data in a format appropriate for reporting systems and/or do not maintain sufficient historical data to generate different types of reports are able to generate these reports. The data transformation system can also enhance older versions of data stored in the system or exclude portions of data from reports. The data transformation system can purge older versions of data so that older data that is less frequently requested is maintained at a lower frequency than recent data.

Type: Grant

Filed: December 12, 2014

Date of Patent: May 19, 2015

Assignee: GoodData Corporation

Inventor: Pavel Kolesnikov

Database management system and method which monitors activity levels and determines appropriate schedule times

Patent number: 9037536

Abstract: A system and method for automated database management are provided. Statistics relating to operation of a database may be collected, wherein the database comprises one or more database objects. Characteristics of the database objects may be determined, either automatically or by user intervention, using the collected statistics, one or more policies, and/or one or more definitions. The policies and definitions may be defaults or may be customized by a user. Actions to be performed on the database objects may be determined, either automatically or by user intervention, based on the characteristics of the database objects. A schedule for performing the actions on the database objects may be automatically determined. The actions may be performed on the database objects based on the schedule.

Type: Grant

Filed: October 30, 2007

Date of Patent: May 19, 2015

Assignee: BMC SOFTWARE, INC.

Inventors: Melody Vos, Jeff Slavin

CONTENT ITEM PURGING

Publication number: 20150134624

Abstract: Methods, systems, and computer readable media for content item purging functionality are provided. A content item purger, such as may be incorporated within a local client application of a content management system, leverages its knowledge of which items have been uploaded to the content management system, and how long content items have been stored on the user device, to propose items for local deletion, thus reclaiming storage on the user device. A content item purger may run on one or more devices of a user associated with an account on a content management system upon various triggering events, and may run with or without user interaction, thus maintaining available user device memory capacity at all times.

Type: Application

Filed: November 12, 2013

Publication date: May 14, 2015

Applicant: Dropbox, Inc.

Inventors: Michael Dwan, Anthony Grue, Daniel Kluesing

PRUNING OF SERVER DUPLICATION INFORMATION FOR EFFICIENT CACHING

Publication number: 20150134625

Abstract: Technology is disclosed for improving the storage efficiency and communication efficiency for a storage client device by maximizing the cache hit rate and minimizing data requests to the storage server. The storage server provides a duplication list to the storage client device. The duplication list contains references (e.g. storage addresses) to data blocks that contain duplicate data content. The storage client uses the duplication list to improve the cache hit rate. The duplication list is pruned to contain references to data blocks relevant to the storage client device. The storage server can prune the duplication list based on a working set of storage objects for a client. Alternatively, the storage server can prune the duplication list based on content characteristics, e.g. duplication degree and access frequency. Duplicate blocks to which the client does not have access can be excluded from the duplication list.

Type: Application

Filed: November 13, 2013

Publication date: May 14, 2015

Inventors: James F. Lentini, Anshul Madan, Deepak R. Kenchammana-Hosekote

SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR CONTACT INFORMATION

Publication number: 20150134603

Abstract: A computer-based method and system for managing contact information from the contacts of a user. The contact information is collected and transformed to a consistent format, which permits resolution of conflicting information from multiple sources, such as differences in location information from different social media. This transformation enables cross-media communication, such as notifications between users and contacts about location or other matters. In addition, the transformation permits a single communication to be transformed for use in multiple social media platforms, whether to a single contact or a select group. User interfaces are provided for display and use of such functional interactions.

Type: Application

Filed: October 15, 2014

Publication date: May 14, 2015

Applicant: Connect Software Corporation

Inventors: ZACH MELAMED, Ryan Allis, Anima Sarah LaVoy, Jared Weinstock, Dan Ho, Nick Gonzalez, Lilia Tamm, Dana Chambers

PARALLEL DATA PARTITIONING

Publication number: 20150134623

Abstract: A method, system, and data storage medium for parallel partitioning of input data into chunks for data deduplication, comprising: dividing said input data into segments; for at least one segment, appending a portion of a subsequent segment; searching the segments in parallel for candidate breaking points; and partitioning each segment into chunks based on a group of final breaking points selected from said candidate breaking points.

Type: Application

Filed: February 17, 2011

Publication date: May 14, 2015

Applicant: JITCOMM NETWORKS PTE LTD

Inventor: Yong Steven Liu
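
A Python sketch of the idea follows; it is not JITCOMM's algorithm. Each segment is extended with `WINDOW - 1` bytes of the next segment so that every window starting inside it can be tested. Candidate breaking points are found per segment in a thread pool, and final points are then chosen sequentially with a minimum chunk size. The CRC-32-over-a-window test, the constants, and the selection rule are assumptions. A production chunker would use a rolling hash.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

WINDOW = 16     # bytes examined per candidate test (assumption)
MASK = 0xFF     # about one candidate per 256 positions (assumption)
MIN_CHUNK = 64  # minimum chunk length when choosing final points (assumption)


def segment_candidates(data: bytes, start: int, end: int) -> list[int]:
    """Candidate breaking points for windows starting in [start, end). The
    segment is extended with WINDOW - 1 bytes of the following segment."""
    extended = data[start:min(end + WINDOW - 1, len(data))]
    points = []
    for offset in range(end - start):
        window = extended[offset:offset + WINDOW]
        if len(window) < WINDOW:
            break
        if zlib.crc32(window) & MASK == 0:
            points.append(start + offset + WINDOW)  # break after the window
    return points


def parallel_chunk(data: bytes, num_segments: int = 4) -> list[bytes]:
    if not data:
        return []
    seg_len = -(-len(data) // num_segments)  # ceiling division
    bounds = [(s, min(s + seg_len, len(data))) for s in range(0, len(data), seg_len)]
    with ThreadPoolExecutor() as pool:
        found = list(pool.map(lambda b: segment_candidates(data, *b), bounds))
    # Choose final breaking points in order, enforcing a minimum chunk size.
    cuts, last = [], 0
    for point in sorted(p for seg in found for p in seg):
        if point - last >= MIN_CHUNK and point < len(data):
            cuts.append(point)
            last = point
    cuts.append(len(data))
    return [data[a:b] for a, b in zip([0] + cuts[:-1], cuts)]
```

Each window starts in exactly one segment. The parallel candidates therefore equal those of a single sequential pass, so the resulting chunks do not depend on the number of segments. In CPython, threads do not speed up this CPU-bound loop; a `ProcessPoolExecutor` would be needed for real parallelism.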

Data replication feedback for transport input/output

Patent number: 9032032

Abstract: Architecture for efficiently ensuring that data is stored to the desired destination datastore, such as for replication processes. A copy of data (e.g., messages) sent to a datastore for storage is stored at an alternate location until a received signal indicates that the storage and replication were successful. As soon as the feedback signal is received, the copy is removed from the alternate location, which improves input/output (I/O) and storage patterns. The feedback mechanism can also be used for monitoring the status of data transport associated with log shipping, for example, and for taking the appropriate actions when storage (e.g., replication) is not being performed properly.

Type: Grant

Filed: June 26, 2008

Date of Patent: May 12, 2015

Assignee: Microsoft Technology Licensing, LLC

Inventors: David Mills, Todd Luttinen, Victor Boctor

USE OF SOLID STATE STORAGE DEVICES AND THE LIKE IN DATA DEDUPLICATION

Publication number: 20150127621

Abstract: Systems and methods of data deduplication are disclosed comprising generating a hash value of a data block and comparing the hash value to a table in a first memory that correlates ranges of hash values with buckets of hash values in a second memory different from the first memory. A bucket is identified based on the comparison and the bucket is searched to locate the hash value. If the hash value is not found in the bucket, the hash value is stored in the bucket and the data block is stored in a third memory. The first memory may be volatile memory and the second memory may be non-volatile random access memory, such as an SSD. Rebalancing of buckets and the table, and use of additional metadata to determine where data blocks should be stored, are also disclosed.

Type: Application

Filed: November 4, 2014

Publication date: May 7, 2015

Inventor: Chin L. KUO
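
The lookup path can be sketched minimally in Python. An in-memory sorted list of range boundaries plays the role of the first-memory table, Python sets stand in for SSD-resident buckets, and a dict stands in for the third memory. SHA-256, the bucket count, and all names are assumptions, and bucket rebalancing is not shown.

```python
import bisect
import hashlib


class BucketedDedupIndex:
    """Hypothetical sketch: an in-memory table maps hash ranges to buckets
    (standing in for buckets kept on an SSD); new blocks go to a block store."""

    def __init__(self, num_buckets: int = 16) -> None:
        space = 2 ** 256
        # First memory: sorted lower bound of each bucket's hash range.
        self.range_starts = [i * space // num_buckets for i in range(num_buckets)]
        # Second memory (e.g. SSD): one set of hash values per bucket.
        self.buckets: list[set[int]] = [set() for _ in range(num_buckets)]
        # Third memory: the deduplicated data blocks.
        self.block_store: dict[int, bytes] = {}

    def write(self, block: bytes) -> bool:
        """Store the block if it is new; return True if it was a duplicate."""
        h = int.from_bytes(hashlib.sha256(block).digest(), "big")
        bucket = self.buckets[bisect.bisect_right(self.range_starts, h) - 1]
        if h in bucket:
            return True
        bucket.add(h)
        self.block_store[h] = block
        return False
```

Each write searches only one bucket. The in-memory table therefore stays small (one entry per bucket), while the full set of hashes lives on the slower device.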

DISTRIBUTED DATA SYSTEM WITH DOCUMENT MANAGEMENT AND ACCESS CONTROL

Publication number: 20150127607

Abstract: Data management systems and methods include a cloud-based platform coupled to a system of agents or folders hosted on client devices. The platform does not store actual data but instead makes use of metadata provided by the agents to track a location of all data in the system and manage the distributed storage, movement and processing of the actual data among the agents. In so doing, the platform pools networked storage into “virtual clusters” using local storage at the agents. The agents collectively monitor, store, and transfer or move data, and perform data processing operations as directed by the platform, as described in detail herein. The agents include agents hosted on or coupled to processor-based devices, agents hosted on devices of a local area network, agents hosted on devices of a wide area network, agents hosted on mobile devices, and agents hosted on cloud-based devices.

Type: Application

Filed: September 15, 2014

Publication date: May 7, 2015

Inventors: Bret SAVAGE, Casey MARSHALL, Geoffrey STUTCHMAN, Ross ELTHERINGTON, Steve OWENS, George NORTHUP

METHODS AND APPARATUS FOR NETWORK EFFICIENT DEDUPLICATION

Publication number: 20150127622

Abstract: Mechanisms are provided for performing network efficient deduplication. Segments are extracted from files received for deduplication at a host connected to a target over one or more networks and/or fabrics in a deduplication system. Segment identifiers (IDs) are determined and compared with segment IDs for segments already deduplicated. Segments already deduplicated need not be transmitted to a target system. References and reference counts are modified at a target system. Updating references and reference counts may involve modifying filemaps, dictionaries, and datastore suitcases for both already deduplicated and not already deduplicated segments.

Type: Application

Filed: January 13, 2015

Publication date: May 7, 2015

Applicant: Dell Products L.P.

Inventor: Vinod Jayaraman
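
The host/target exchange can be sketched in Python as below. The sketch assumes fixed-size segments, SHA-256 segment IDs, and in-process method calls in place of network requests. The `Target` class and `send_file` function are hypothetical and omit filemaps, dictionaries, and suitcases.

```python
import hashlib
from collections import Counter


class Target:
    """Hypothetical deduplication target: segment store plus reference counts."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}
        self.refcounts: Counter = Counter()

    def missing(self, segment_ids: list[str]) -> set[str]:
        return {sid for sid in segment_ids if sid not in self.store}

    def commit(self, segment_ids: list[str], new_segments: dict[str, bytes]) -> None:
        self.store.update(new_segments)
        self.refcounts.update(segment_ids)  # every referenced segment, new or old


def send_file(data: bytes, target: Target, segment_size: int = 4096) -> int:
    """Send only segments the target lacks; return the payload bytes sent."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    ids = [hashlib.sha256(s).hexdigest() for s in segments]
    needed = target.missing(ids)
    payload = {sid: seg for sid, seg in zip(ids, segments) if sid in needed}
    target.commit(ids, payload)
    return sum(len(seg) for seg in payload.values())


target = Target()
data = b"A" * 4096 + b"B" * 4096
print(send_file(data, target))  # 8192
print(send_file(data, target))  # 0: only segment IDs were exchanged
```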

Fragmentation control for performing deduplication operations

Patent number: 9026503

Abstract: The techniques introduced here provide for enabling deduplication operations for a file system without significantly affecting read performance of the file system due to fragmentation of the data sets in the file system. The techniques include determining, by a storage server that hosts the file system, a level of fragmentation that would be introduced to a data set stored in the file system as a result of performing a deduplication operation on the data set. The storage server then compares the level of fragmentation with a threshold value and determines whether to perform the deduplication operation based on a result of comparing the level of fragmentation with the threshold value. The threshold value represents an acceptable level of fragmentation in the data sets of the file system.

Type: Grant

Filed: February 29, 2012

Date of Patent: May 5, 2015

Assignee: NetApp, Inc.

Inventors: Alok Sharma, Sunil Walwaiker, Vaijayanti Bharadwaj
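
A Python sketch of the decision follows. It assumes fragmentation is measured as the number of contiguous extents per MiB of file data after duplicate blocks are remapped to their shared copies. The abstract does not define the metric, so this measure, the 4 KiB block size, and the function names are illustrative.

```python
def count_extents(physical_blocks: list[int]) -> int:
    """Number of runs of consecutive physical block numbers in a file layout."""
    if not physical_blocks:
        return 0
    breaks = sum(1 for prev, cur in zip(physical_blocks, physical_blocks[1:])
                 if cur != prev + 1)
    return breaks + 1


def should_deduplicate(layout: list[int], remap: dict[int, int],
                       max_extents_per_mib: float, block_size: int = 4096) -> bool:
    """Deduplicate only if the layout after remapping duplicate blocks to their
    shared copies stays within the fragmentation threshold (assumed metric)."""
    if not layout:
        return False
    new_layout = [remap.get(block, block) for block in layout]
    size_mib = len(layout) * block_size / 2**20
    return count_extents(new_layout) / size_mib <= max_extents_per_mib


layout = list(range(1000, 1256))  # 256 contiguous 4 KiB blocks = 1 MiB
remap = {1010: 50, 1100: 60}      # two blocks duplicate data stored elsewhere
```

Remapping two blocks splits the single extent into five. A threshold of 8 extents per MiB therefore allows this deduplication, and a threshold of 4 rejects it.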

Multi-row database data loading for enterprise workflow application

Patent number: 9026504

Abstract: Embodiments of the invention are directed to a system, method, or computer program product for providing expedited loading/inserting of data by an entity. Specifically, the invention expedites the loading/inserting of large quantities of data into database tables. Initially received data for loading is processed, via multi-row insert, onto in-memory or temporary tables. The data is staged on a temporary table while the appropriate base table is determined. Once it is determined, data from the temporary table is pointed to the base table. In this way, a massive amount of data can be loaded from the temporary table into a base table. This avoids the logging and locking associated with adding individual data points or rows to a base table independently. Errors are checked and processed accordingly. Once the base table is updated, the data on the temporary table is deleted en masse and a checkpoint restart is issued.

Type: Grant

Filed: February 4, 2013

Date of Patent: May 5, 2015

Assignee: Bank of America Corporation

Inventors: Ron G. Rambo, Steven A. Walker

SYSTEM AND METHOD FOR AGGREGATING MEDIA CONTENT METADATA

Publication number: 20150120681

Abstract: A system and a method to aggregate multiple content servers' metadata into a local database are provided, enabling features such as improved performance, support for non-searchable servers, duplicate handling, and protocol independence. The system performs local content crawling, remote server crawling, and remote server searching to create an aggregated database of metadata. Because the content is located in a single database, duplicate metadata can be removed easily.

Type: Application

Filed: October 27, 2013

Publication date: April 30, 2015

Applicant: Videon Central, Inc.

Inventors: Robert Behe, Robert Kennedy, Russ Shanahan, Derek Andrews, James Condon

DISCUSSION SUMMARY

Publication number: 20150120680

Abstract: One or more techniques and/or systems are provided for providing a discussion summary corresponding to a search query and/or for providing discussion session search results. For example, discussion data (e.g., corresponding to real-time messaging, such as a microblog discussion) may be evaluated to identify a discussion topic for a discussion session (e.g., a kitchen renovation topic may be assigned to a 1-hour exchange of kitchen renovation messages by a discussion group). A discussion summary of a discussion session may be provided based upon the discussion session having a discussion topic corresponding to a search query topic of a search query. The discussion summary may be provided along with other results for the query and may describe the discussion group, identifiers such as hashtags used by the discussion group, meeting dates/times, average number(s) of participants, other discussion sessions hosted by the discussion group, future discussion sessions, and/or other information.

Type: Application

Filed: October 24, 2013

Publication date: April 30, 2015

Applicant: Microsoft Corporation

Inventors: Omar Alonso, Kartikay Khandelwal, Mohamed Mansour, Paul Ko, Nina Mishra, Krishnaram Kenthapadi, Abhimanyu Das

AUTOMATED RECOGNITION OF PATTERNS IN A LOG FILE HAVING UNKNOWN GRAMMAR

Publication number: 20150120682

Abstract: Embodiments of the present invention disclose a method, computer program product, and system for recognizing patterns in log files with unknown grammar. A computer replaces one or more alphanumeric strings with a first alphanumeric character to generate a first resulting string. The computer then replaces one or more identical pairs of characters of the first resulting string with a second alphanumeric character to generate a second resulting string. The computer then replaces one or more consecutive instances of the second alphanumeric character, in the second resulting string, with one instance of the second alphanumeric character to generate a compressed string.

Type: Application

Filed: October 28, 2013

Publication date: April 30, 2015

Applicant: International Business Machines Corporation

Inventors: Fiona M. Crowther, Geza Geleji, Martin A. Ross

Distributed deduplicated storage system

Patent number: 9020900

Abstract: A distributed, deduplicated storage system according to certain embodiments is arranged in a parallel configuration including multiple deduplication nodes. Deduplicated data is distributed across the deduplication nodes. The deduplication nodes can be networked together and communicate with one another using a lightweight, customized communication scheme (e.g., a scheme based on FTP or HTTP). In some cases, deduplication management information including deduplication signatures and/or other metadata is stored separately from the deduplicated data in deduplication management nodes, improving performance and scalability.

Type: Grant

Filed: December 13, 2011

Date of Patent: April 28, 2015

Assignee: CommVault Systems, Inc.

Inventors: Manoj Kumar Vijayan Retnamma, Rajiv Kottomtharayil, Deepak Raghunath Attarde

Active file Instant Cloning

Patent number: 9020909

Abstract: Techniques and mechanisms are provided to instantly clone active files, including active optimized files. When a new instance of an active file is created, a new stub is generated in the user namespace and a block map file is cloned. The cloned block map file includes the same offsets and location pointers that existed in the original block map file. No user file data needs to be copied. If the cloned file is later modified, the behavior can be the same as what happens when a de-duplicated file is modified.

Type: Grant

Filed: February 7, 2013

Date of Patent: April 28, 2015

Assignee: Dell Products L.P.

Inventors: Vinod Jayaraman, Goutham Rao, Ratna Manoj Bolla

Recovering duplicate blocks in file systems

Patent number: 9020903

Abstract: A method is used in recovering duplicate blocks in file systems. A duplicate file system block is detected in a file system. The duplicate file system block is referred to by a first inode associated with a first file of the file system and a second inode associated with a second file of the file system. Metadata of the duplicate file system block is evaluated. Based on the evaluation, a set of inodes in the file system is determined, each of which refers to the duplicate file system block. Based on the determination, the set of inodes is updated.

Type: Grant

Filed: June 29, 2012

Date of Patent: April 28, 2015

Assignee: EMC Corporation

Inventors: Srinivasa Rao Vempati, Dixitkumar Vishnubhai Patel, Jean-Pierre Bono, Marshall Hansi Wu

SYSTEMS AND METHODS FOR PROVIDING INCREASED SCALABILITY IN DEDUPLICATION STORAGE SYSTEMS

Publication number: 20150112950

Abstract: A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independent of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on less than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.

Type: Application

Filed: December 23, 2014

Publication date: April 23, 2015

Inventors: Xianbo Zhang, Fanglu Guo, Weibao Wu
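
A minimal Python sketch of the idea follows. It assumes that the size-related characteristic is the number of entries, that sub-databases are chosen by the CRC-32 of the segment key, and that each reference object is a set of referrers. The class and its parameters are hypothetical.

```python
import zlib


class PartitionedReferenceDB:
    """Hypothetical sketch: a reference database (segment -> referencing
    objects) that splits into independently updatable sub-databases once its
    entry count reaches a threshold."""

    def __init__(self, split_threshold: int, num_partitions: int = 4) -> None:
        self.split_threshold = split_threshold
        self.num_partitions = num_partitions
        self.partitions: list[dict[str, set[str]]] = [{}]  # unpartitioned at first

    def _index(self, segment: str) -> int:
        return zlib.crc32(segment.encode()) % len(self.partitions)

    def _split(self) -> None:
        old = self.partitions[0]
        self.partitions = [{} for _ in range(self.num_partitions)]
        for segment, refs in old.items():
            self.partitions[self._index(segment)][segment] = refs

    def add_reference(self, segment: str, referrer: str) -> int:
        """Record a reference; return the index of the only sub-database touched."""
        if len(self.partitions) == 1 and len(self.partitions[0]) >= self.split_threshold:
            self._split()
        part = self._index(segment)
        self.partitions[part].setdefault(segment, set()).add(referrer)
        return part


db = PartitionedReferenceDB(split_threshold=3)
for name in ["seg0", "seg1", "seg2", "seg3"]:
    db.add_reference(name, "fileA")
print(len(db.partitions))  # 4: the fourth insert triggered the split
```

Each update touches, and returns, a single partition. This is the "less than all of the sub-databases" property the abstract describes.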

Method and apparatus for eventually consistent delete in a distributed data store

Patent number: 9015126

Abstract: Techniques for effective delete operations in a distributed data store with eventually consistent replicated entries include determining to delete a particular entry from the distributed data store. Each entry includes a first field that holds data that indicates a key and a second field that holds data that indicates content associated with the key and a third field that holds data that indicates a version for the content. The method also comprises causing, at least in part, actions that result in marking the particular entry as deleted without removing the particular entry, and updating a version in the third field for the particular entry.

Type: Grant

Filed: April 21, 2011

Date of Patent: April 21, 2015

Assignee: Nokia Corporation

Inventors: Mark Rambacher, Abhijit Bagri, Yekesa Kosuru
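
The delete described above is essentially a tombstone: the entry stays in place, is flagged as deleted, and gets a newer version so replicas converge on the deletion. A minimal sketch, assuming a simple "higher version wins" merge rule between replicas (the `merge` function is a hypothetical illustration, not taken from the patent):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    key: str                  # first field: the key
    content: Optional[bytes]  # second field: content associated with the key
    version: int              # third field: version of the content
    deleted: bool = False     # tombstone flag; the entry is kept, not removed


def mark_deleted(store: Dict[str, Entry], key: str) -> None:
    """Mark the entry as deleted and bump its version instead of removing it."""
    entry = store[key]
    entry.deleted = True
    entry.version += 1


def merge(local: Entry, remote: Entry) -> Entry:
    """Hypothetical replica merge rule: the entry with the higher version wins."""
    return remote if remote.version > local.version else local
```

Keeping the entry means a replica that still holds the old version cannot resurrect it during anti-entropy: its stale copy loses the version comparison.
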
Data deduplication using CRC-seed differentiation between data and stubs

Patent number: 9015552

Abstract: Various embodiments for differentiating between data and stubs pointing to a parent copy of deduplicated data are provided. Undeduplicated data is stored with a checksum of an initial value as a first cyclic redundancy check (CRC) seed. A stub pointing to the parent copy of the deduplicated data is stored with an additional checksum of a differing, additional initial value as a second CRC seed.

Type: Grant

Filed: May 14, 2013

Date of Patent: April 21, 2015

Assignee: International Business Machines Corporation

Inventors: Allen K. Bates, Nils Haustein, Craig A. Klein, Frank Krick, Ulf Troppens, Daniel J. Winarski
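
A minimal sketch of the seed idea, assuming Python's `zlib.crc32`, whose optional second argument is the CRC starting value. The two seed constants are hypothetical. A checksum computed with one initial value does not verify under the other, so a reader can tell data from stubs without a separate flag.

```python
import zlib
from typing import Tuple

DATA_SEED = 0x00000000  # hypothetical initial value for undeduplicated data
STUB_SEED = 0x5EED5EED  # hypothetical, different initial value for stubs


def store_block(payload: bytes, is_stub: bool) -> Tuple[bytes, int]:
    """Return the payload with a CRC-32 computed from the seed for its kind."""
    seed = STUB_SEED if is_stub else DATA_SEED
    return payload, zlib.crc32(payload, seed)


def classify(payload: bytes, checksum: int) -> str:
    """Decide whether a stored block is data, a stub, or fails both checks."""
    if zlib.crc32(payload, DATA_SEED) == checksum:
        return "data"
    if zlib.crc32(payload, STUB_SEED) == checksum:
        return "stub"
    return "corrupt"
```
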
Information management method, and computer for providing information

Patent number: 9015131

Abstract: When an online storage service is used to expand the storage capacity of a file server, the amount of communication in synchronization processing and the amount of data retained on the online storage service are reduced to lower usage charges. In a kernel module provided with a storage area on the online storage service, files are divided into block files and managed, and blocks that duplicate an already registered and saved block-file group are not uploaded; only the configuration information of the files is changed. The DBs for managing meta information and for eliminating duplication are divided and managed separately, and only updated sections are uploaded.

Type: Grant

Filed: August 26, 2011

Date of Patent: April 21, 2015

Assignee: Hitachi Solutions, Ltd.

Inventors: Yasuhiro Kirihata, Kouji Nakayama
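
A minimal sketch of block-level upload deduplication as described above, assuming fixed-size blocks identified by SHA-256 digests and an in-memory dict standing in for the online storage service. The "configuration information" of a file is modeled as its ordered list of block digests. The block size and function names are illustrative.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload_file(data: bytes, remote: Dict[str, bytes]) -> List[str]:
    """Upload only blocks the remote store lacks; return the file's block list."""
    recipe = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in remote:
            remote[digest] = block  # stands in for an actual upload call
        recipe.append(digest)       # file configuration: ordered block digests
    return recipe
```

Uploading `b"abcdabcdwxyz"` sends two blocks rather than three, and a later file that reuses `b"abcd"` sends only its new block.
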
System and method for exposing cloud stored data to a content delivery network

Patent number: 9015212

Abstract: A system for exposing data stored in a cloud computing system to a content delivery network provider includes a database configured to receive and store metadata about the data, the database being implemented in the cloud computing system to store configuration metadata for the data related to the content delivery network, and an origin server configured to receive requests for the data from the content delivery network provider, and configured to provide the data to the content delivery network provider based on the metadata.

Type: Grant

Filed: October 16, 2012

Date of Patent: April 21, 2015

Assignee: Rackspace US, Inc.

Inventors: Goetz David, Gregory Lee Holt
METHODS AND SYSTEMS FOR INTELLIGENT ARCHIVE SEARCHING IN MULTIPLE REPOSITORY SYSTEMS

Publication number: 20150106344

Abstract: Systems and methods for providing a configurable table of rules that defines a search priority across multiple repositories/archives. Repositories/archives are searched successively, and the search stops once a first result is returned. Repositories/archives are searched in priority order based on their location in pre-configured "tiers." This enables searches to be directed to the repositories/archives best able to handle the load for different types of searches, and for different types of studies as well. A duplicate priority list enables an administrator to designate which repository/archive will appear in the search results list if duplicates are found. For example, in clinical study archiving systems, the search priority enables an administrator to direct searches to the repository best able to handle the load for different types of searches and for different types of studies.

Type: Application

Filed: October 7, 2014

Publication date: April 16, 2015

Inventor: Mark Allan Wagner
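
A minimal sketch of tiered, first-hit search, assuming each repository is modeled as a name plus a search function that returns a result or `None`. This illustrates only the stop-at-first-result ordering; the rule table and the duplicate priority list are not modeled.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a repository is a name plus a search function.
Repository = Tuple[str, Callable[[str], Optional[str]]]


def tiered_search(tiers: List[List[Repository]], query: str) -> Optional[Tuple[str, str]]:
    """Search tiers in priority order and stop at the first repository that answers."""
    for tier in tiers:
        for name, search in tier:
            result = search(query)
            if result is not None:
                return name, result  # later repositories are never queried
    return None
```
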
MULTI-NODE HYBRID DEDUPLICATION

Publication number: 20150106345

Abstract: According to at least one embodiment, a data storage system is provided. The data storage system includes memory, at least one processor in data communication with the memory, and a deduplication director component executable by the at least one processor. The deduplication director component is configured to receive data for storage on the data storage system, analyze the data to determine whether the data is suitable for at least one of summary-based deduplication, content-based deduplication, and no deduplication, and store, in a common object store, at least one of the data and a reference to duplicate data stored in the common object store.

Type: Application

Filed: October 15, 2014

Publication date: April 16, 2015

Inventors: Ronald Ray Trimble, Jeffrey V. Tofano, Thomas R. Ramsdell, Jon Christopher Kennedy
TECHNIQUE FOR GLOBAL DEDUPLICATION ACROSS DATACENTERS WITH MINIMAL COORDINATION

Publication number: 20150106343

Abstract: A system and method for global data de-duplication in a cloud storage environment utilizing a plurality of data centers is provided. Each cloud storage gateway appliance divides a data stream into a plurality of data objects and generates a content-based hash value as a key for each data object. An IMMUTABLE PUT operation is utilized to store the data object at the associated key within the cloud.

Type: Application

Filed: October 16, 2013

Publication date: April 16, 2015

Applicant: NetApp, Inc.

Inventors: Kiran Nenmeli Srinivasan, Kishore Kasi Udayashankar, Swetha Krishnan
Addressing Cross-Allocated Blocks in a File System

Publication number: 20150106336

Abstract: A mechanism is provided for cross-allocated block repair in a mounted file system. A set of cross-allocated blocks are identified from a plurality of blocks within an inode of the mounted file system, based on a corresponding bit associated with each cross-allocated block in a duplicated block information bitmap being in a first identified state. The set of cross-allocated blocks are repaired using a user-defined repair process. Then one or more of the set of cross-allocated blocks are deallocated based on results of the user-defined repair process.

Type: Application

Filed: December 15, 2014

Publication date: April 16, 2015

Inventors: Kalyan C. Gunda, Srikanth Srinivasan
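
A minimal sketch of detecting and repairing cross-allocated blocks (blocks claimed by more than one inode), assuming inodes are modeled as lists of block numbers. The `duplicated` byte array plays the role of the duplicated-block bitmap, and `keep_owner` stands in for the user-defined repair process. Both are illustrative assumptions; mounted-file-system concerns are ignored.

```python
from typing import Callable, Dict, List, Set


def find_cross_allocated(inodes: Dict[int, List[int]], nblocks: int) -> Set[int]:
    """Return blocks claimed by more than one inode (or twice by the same inode)."""
    seen = bytearray(nblocks)        # 1 = block already claimed
    duplicated = bytearray(nblocks)  # 1 = block claimed more than once
    for blocks in inodes.values():
        for b in blocks:
            if seen[b]:
                duplicated[b] = 1
            seen[b] = 1
    return {b for b in range(nblocks) if duplicated[b]}


def repair(inodes: Dict[int, List[int]], cross: Set[int],
           keep_owner: Callable[[int, List[int]], int]) -> None:
    """Apply a caller-supplied policy that picks one owner per cross-allocated block."""
    for b in cross:
        owners = [ino for ino, blocks in inodes.items() if b in blocks]
        winner = keep_owner(b, owners)
        for ino in owners:
            if ino != winner:
                inodes[ino].remove(b)  # deallocate the block from the other inodes
```
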
Methods and systems for data cleanup using physical image of files on storage devices

Patent number: 9009435

Abstract: Systems and computer program products are provided for optimizing selection of files for deletion from one or more data storage devices to free up a predetermined amount of space in the one or more data storage devices. A method includes analyzing an effective space occupied by each file of a plurality of files in the one or more data storage devices, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for deletion, based on the identified one or more data blocks, and deleting the one or more candidate files for deletion from the one or more data storage devices.

Type: Grant

Filed: August 13, 2012

Date of Patent: April 14, 2015

Assignee: International Business Machines Corporation

Inventors: Duane Mark Baldwin, Sandeep Ramesh Patil, Riyazahamad Moulasab Shiraguppi, Prashant Sodhiya
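
The "effective space" idea in this family of patents can be illustrated on deduplicated storage: deleting a file frees only the blocks no other file references. A minimal sketch, assuming files are modeled as sets of block IDs and using a simple greedy selection. It counts each file's unshared blocks independently, so blocks shared only among the chosen files are not credited; that is a simplification, not the patented procedure.

```python
from collections import Counter
from typing import Dict, List, Set


def effective_space(files: Dict[str, Set[str]], block_size: int) -> Dict[str, int]:
    """Bytes freed by deleting each file on its own: only blocks no other file references."""
    refs = Counter(b for blocks in files.values() for b in blocks)
    return {name: block_size * sum(1 for b in blocks if refs[b] == 1)
            for name, blocks in files.items()}


def pick_for_deletion(files: Dict[str, Set[str]], block_size: int, target: int) -> List[str]:
    """Greedily choose files with the largest effective space until target bytes are freed."""
    eff = effective_space(files, block_size)
    chosen: List[str] = []
    freed = 0
    for name in sorted(eff, key=lambda n: eff[n], reverse=True):
        if freed >= target:
            break
        chosen.append(name)
        freed += eff[name]
    return chosen
```

A file whose blocks are all shared has an effective space of zero, even if its logical size is the largest.
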
Methods and systems for data cleanup using physical image of files on storage devices

Patent number: 9009434

Abstract: Systems and computer program products are provided for optimizing selection of files for eviction from a first storage pool to free up a predetermined amount of space in the first storage pool. A method includes analyzing an effective space occupied by each file of a plurality of files in the first storage pool, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for eviction, based on the identified one or more data blocks, and evicting the one or more candidate files for eviction from the first storage pool to a second storage pool.

Type: Grant

Filed: August 13, 2012

Date of Patent: April 14, 2015

Assignee: International Business Machines Corporation

Inventors: Duane Mark Baldwin, Sandeep Ramesh Patil, Riyazahamad Moulasab Shiraguppi, Prashant Sodhiya
ATTRIBUTE REDUNDANCY REMOVAL

Publication number: 20150100554

Abstract: Systems, methods, and other embodiments associated with attribute redundancy removal are described. In one embodiment, a method includes identifying redundant attribute values in a group of attributes that describe two items. The example method also includes generating a pruned group of attributes having the redundant attribute values removed. The similarity of the two items is calculated based, at least in part, on the pruned group of attribute values.

Type: Application

Filed: October 31, 2013

Publication date: April 9, 2015

Inventors: Z. Maria WANG, Su-Ming WU
Methods and systems for data cleanup using physical image of files on storage devices

Patent number: 9003152

Abstract: Methods, systems, and computer program products are provided for optimizing selection of files for eviction from a first storage pool to free up a predetermined amount of space in the first storage pool. A method includes analyzing an effective space occupied by each file of a plurality of files in the first storage pool, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for eviction, based on the identified one or more data blocks, and evicting the one or more candidate files for eviction from the first storage pool to a second storage pool.

Type: Grant

Filed: November 5, 2013

Date of Patent: April 7, 2015

Assignee: International Business Machines Corporation

Inventors: Duane M. Baldwin, Sandeep R. Patil, Riyazahamad M. Shiraguppi, Prashant Sodhiya
Conditional storage object deletion

Patent number: 9002805

Abstract: Methods and apparatus for conditional deletes of storage objects are disclosed. A storage medium comprises program instructions that, when executed, implement a metadata node of a storage service in which a protocol based on sequence numbers is used to resolve update conflicts. The instructions store, as part of a conditional deletion record associated with a key of a particular storage object identified as a deletion candidate, a deletion sequence number derived from a particular modification sequence number of the object. In accordance with the protocol, the instructions determine whether an additional modification sequence number larger than the deletion sequence number has been generated in response to an operation associated with the key. If such an additional sequence number has been generated, the deletion of the storage object is canceled.

Type: Grant

Filed: December 14, 2012

Date of Patent: April 7, 2015

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey Michael Barber, Praveen Kumar Gattu, Christopher Henning Elving, Derek Ernest Denny-Brown, II, Carl Yates Perry
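
A minimal sketch of the conditional delete described above, assuming a single in-memory metadata node with a monotonically increasing sequence counter. The class and method names are hypothetical. The deletion record remembers the modification sequence number at the time the object became a candidate, and the delete is canceled if a larger one appears before it completes.

```python
import itertools
from typing import Dict


class MetadataNode:
    def __init__(self) -> None:
        self._seq = itertools.count(1)            # monotonically increasing sequence numbers
        self.latest: Dict[str, int] = {}          # key -> latest modification sequence number
        self.pending_delete: Dict[str, int] = {}  # key -> deletion sequence number
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> int:
        seq = next(self._seq)
        self.objects[key] = value
        self.latest[key] = seq
        return seq

    def mark_for_deletion(self, key: str) -> None:
        # Deletion sequence number derived from the object's current modification number.
        self.pending_delete[key] = self.latest[key]

    def finish_deletion(self, key: str) -> bool:
        del_seq = self.pending_delete.pop(key)
        if self.latest.get(key, 0) > del_seq:
            return False  # a newer modification exists: cancel the deletion
        self.objects.pop(key, None)
        self.latest.pop(key, None)
        return True
```
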
Methods and systems for data cleanup using physical image of files on storage devices

Patent number: 9003151

Abstract: Methods, systems, and computer program products are provided for optimizing selection of files for deletion from one or more data storage devices to free up a predetermined amount of space in the one or more data storage devices. A method includes analyzing an effective space occupied by each file of a plurality of files in the one or more data storage devices, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for deletion, based on the identified one or more data blocks, and deleting the one or more candidate files for deletion from the one or more data storage devices.

Type: Grant

Filed: November 5, 2013

Date of Patent: April 7, 2015

Assignee: International Business Machines Corporation

Inventors: Duane M. Baldwin, Sandeep R. Patil, Riyazahamad M. Shiraguppi, Prashant Sodhiya
Identifying Product Groups in Ecommerce

Publication number: 20150095291

Abstract: Systems and methods are disclosed herein for supplementing product records with product groups that are relevant to the product records. Queries from users may be analyzed to extract keywords. Search results for keywords are evaluated to determine category consistency among product records, including such values as entropy and taxonomy depth. Keywords whose search results have adequate category consistency are selected as product groups, and the search results are associated with those product groups. Product groups are associated with product records according to a random walk of a graph whose nodes are products and product groups and whose links represent membership of a product in a product group. Product groups may be selected based on a transition probability derived from the random walk and a quality score based on usage of the product group page for the product group.

Type: Application

Filed: September 30, 2013

Publication date: April 2, 2015

Applicant: Wal-Mart Stores, Inc.

Inventor: Shankara B. Subramanya
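
Of the category-consistency measures the abstract mentions, entropy is the easiest to illustrate. A minimal sketch, assuming each keyword's search results are reduced to a list of category labels. The `max_entropy` threshold is a hypothetical parameter, and taxonomy depth and the random-walk step are not modeled.

```python
import math
from collections import Counter
from typing import List


def category_entropy(categories: List[str]) -> float:
    """Shannon entropy (bits) of the category distribution in a keyword's search results."""
    counts = Counter(categories)
    total = len(categories)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_product_group(categories: List[str], max_entropy: float = 1.0) -> bool:
    # Hypothetical threshold: low entropy means the results agree on a category.
    return category_entropy(categories) <= max_entropy
```

Results that all share one category have entropy 0, and four equally represented categories give 2 bits.
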
System for ensuring the internal consistency of a fact repository

Patent number: 8996470

Abstract: Methods and systems for maintaining the internal consistency of a fact repository are described. Accessed objects are checked for attribute-value pairs that have links to other objects. For any link to an object, the name of the linked-to object is inserted into the attribute-value pair having the link. The accessed objects are filtered to remove attribute-value pairs meeting predefined criteria, possibly resulting in null objects. Links to null objects are identified and removed.

Type: Grant

Filed: May 31, 2005

Date of Patent: March 31, 2015

Assignee: Google Inc.

Inventors: Andrew William Hogue, Robert Joseph Siemborski, Jonathan T. Betz
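
A minimal sketch of the three consistency steps in the abstract, assuming objects are modeled as a name table plus a list of (attribute, value, linked object ID or None) facts per object, with a caller-supplied filter predicate. Removing links can in principle empty further objects, so this sketch repeats the link-removal step until nothing changes; that loop is an assumption, not something the abstract states.

```python
from typing import Callable, Dict, List, Optional, Tuple

Fact = Tuple[str, str, Optional[str]]  # (attribute, value, id of linked object or None)


def enforce_consistency(names: Dict[str, str], facts: Dict[str, List[Fact]],
                        should_drop: Callable[[Fact], bool]) -> None:
    # 1. Write the linked-to object's name into each fact that links to it.
    for oid in facts:
        facts[oid] = [(a, names[t] if t in names else v, t) for a, v, t in facts[oid]]
    # 2. Filter out facts meeting the predefined criteria; this may leave null objects.
    for oid in facts:
        facts[oid] = [f for f in facts[oid] if not should_drop(f)]
    # 3. Remove links that point at null objects, until no more links are removed.
    while True:
        null_ids = {oid for oid, fs in facts.items() if not fs}
        before = sum(len(fs) for fs in facts.values())
        for oid in facts:
            facts[oid] = [f for f in facts[oid] if f[2] not in null_ids]
        if sum(len(fs) for fs in facts.values()) == before:
            break
```
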
Accessing an image in a continuous data protection using deduplication-based storage

Patent number: 8996460

Abstract: In one aspect, a method to generate a point-in-time (PIT) snapshot of a deduplication-based volume includes generating a virtual access data structure, generating a preliminary snapshot of the volume, and modifying the preliminary snapshot to point to a block according to the virtual access data structure to generate the PIT snapshot of the deduplication-based volume.

Type: Grant

Filed: March 14, 2013

Date of Patent: March 31, 2015

Assignee: EMC Corporation

Inventors: Shahar Frank, Assaf Natanzon, Jehuda Shemer
Distributed scalable deduplicated data backup system

Patent number: 8996467

Abstract: A distributed, cloud-based storage system provides a reliable, deduplicated, scalable and high-performance backup service to heterogeneous clients that connect to it via a communications network. The distributed cloud-based storage system guarantees consistent and reliable data storage while using structured storage that lacks ACID compliance. Consistency and reliability are guaranteed using a system that includes: 1) back references from shared objects to referring objects, 2) safe orders of operation for object deletion and creation, and 3) simultaneous access to shared resources through sub-resources.

Type: Grant

Filed: December 29, 2011

Date of Patent: March 31, 2015

Assignee: Druva Inc.

Inventors: Anand Apte, Faisal Puthuparackat, Jaspreet Singh, Milind Borate, Shekhar S. Deshkar
Global information management system and method

Patent number: 8996475

Abstract: A global information management system (GIMS) includes a collection of standards and methods that allow information management on a global scale. A GIMS computer network includes a central registration database (CRD) and one or more GIMS computer systems connected over a network. Each GIMS computer system includes a relational database having a set of standardized tables. The CRD may provide a GIMS network-unique system ID to each GIMS computer system. Each GIMS computer system uses the GIMS network-unique system ID as part of a primary key for each record generated by and stored in the set of standardized tables of the GIMS database. The GIMS enables global database normalization through the globally unique identification of database records.

Type: Grant

Filed: July 17, 2013

Date of Patent: March 31, 2015

Assignee: Asibo Inc.

Inventor: Borsu Asisi Namini
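
A minimal sketch of the key scheme described above, assuming a composite primary key of (network-unique system ID, locally generated record ID). The class names are hypothetical, and a counter stands in for the central registration database. Two systems can both issue local ID 1 without their records colliding once the system ID is part of the key.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalKey:
    system_id: int  # network-unique ID issued by the central registration database
    local_id: int   # ID that is unique only within one GIMS computer system


class GimsSystem:
    def __init__(self, system_id: int) -> None:
        self.system_id = system_id
        self._next = itertools.count(1)

    def new_key(self) -> GlobalKey:
        # Every record key embeds the system ID, making it globally unique.
        return GlobalKey(self.system_id, next(self._next))


class CentralRegistry:
    """Hypothetical stand-in for the CRD: hands out network-unique system IDs."""

    def __init__(self) -> None:
        self._next = itertools.count(1)

    def register(self) -> GimsSystem:
        return GimsSystem(next(self._next))
```
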
