Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
-
Patent number: 9116941
Abstract: For reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, input data is partitioned into chunks, and the chunks are grouped into chunk sets. Digests are calculated for input data and stored in sets corresponding to the chunk sets. Similarity elements are calculated for the input data and the similarity elements are stored in a similarity search structure. The number of similarity elements associated with a chunk set which are currently contained in the similarity search structure is maintained for each chunk set, and when this number of a specific chunk set becomes lower than a threshold, the digests set associated with that chunk set is removed from the repository.
Type: Grant
Filed: March 15, 2013
Date of Patent: August 25, 2015
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Lior Aronovich
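The counter-and-threshold bookkeeping in this abstract can be illustrated with a small in-memory model. This is a minimal sketch, not IBM's implementation: the class `DigestRepository`, its dict-based "similarity search structure", and the rule that a re-used similarity element is re-pointed to the newer chunk set are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class DigestRepository:
    """Hypothetical sketch: drop a chunk set's digests once too few of its
    similarity elements remain in the similarity search structure."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.similarity_index: Dict[str, str] = {}          # element -> chunk set id
        self.live_count: Dict[str, int] = defaultdict(int)  # chunk set id -> indexed elements
        self.digest_sets: Dict[str, List[str]] = {}         # chunk set id -> digests

    def add_chunk_set(self, set_id: str, digests: Iterable[str],
                      similarity_elements: Iterable[str]) -> None:
        self.digest_sets[set_id] = list(digests)
        for element in set(similarity_elements):
            previous = self.similarity_index.get(element)
            if previous == set_id:
                continue
            if previous is not None:
                # Assumption: the element is re-pointed to the newer chunk set.
                self._decrement(previous)
            self.similarity_index[element] = set_id
            self.live_count[set_id] += 1

    def remove_similarity_element(self, element: str) -> None:
        set_id = self.similarity_index.pop(element, None)
        if set_id is not None:
            self._decrement(set_id)

    def _decrement(self, set_id: str) -> None:
        self.live_count[set_id] -= 1
        if self.live_count[set_id] < self.threshold:
            # Below the threshold: remove this chunk set's digests set.
            self.digest_sets.pop(set_id, None)
```

With a threshold of 2, a chunk set indexed by three similarity elements keeps its digests after one element is evicted and loses them after the second.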
-
Patent number: 9069786
Abstract: A system and method for managing multiple fingerprint tables in a deduplicating storage system. A computer system includes a data storage medium, a first fingerprint table comprising a first plurality of entries, and a second fingerprint table comprising a second plurality of entries. Each of the first plurality of entries and each of the second plurality of entries are configured to store fingerprint related data corresponding to data stored in the data storage medium. A data storage controller is configured to select the first fingerprint table for storage of entries corresponding to data stored in the data storage medium that has been deemed more likely to be successfully deduplicated than other data stored in the data storage medium; and select the second fingerprint table for storage of entries corresponding to data stored in the data storage medium that has been deemed less likely to be successfully deduplicated than other data stored in the data storage medium.
Type: Grant
Filed: November 18, 2013
Date of Patent: June 30, 2015
Assignee: Pure Storage, Inc.
Inventors: John Colgrove, John Hayes, Ethan Miller, Joseph S. Hasbani, Cary Sandvig
-
Patent number: 9043292
Abstract: The technique introduced here includes a system and method for identifying and mapping duplicate data objects referenced by data objects. The technique illustratively utilizes a hierarchical tree of fingerprints for each data object to compare the data objects and identify duplicate data blocks referenced by the data objects. A progressive comparison of the hierarchical trees starts from a top layer of the hierarchical trees and proceeds toward a base layer. Between the compared data objects (i.e., the compared hierarchical trees), the technique maps matching fingerprints only at the top-most layer of the hierarchical trees at which the fingerprints match. Lower layer matching fingerprints are neither compared nor mapped. Data blocks corresponding to the matching fingerprints are then deleted. Such an identification and mapping technique substantially reduces the amount of mapping metadata stored in data objects that have been subject to deduplication.
Type: Grant
Filed: June 14, 2011
Date of Patent: May 26, 2015
Assignee: NetApp, Inc.
Inventors: Giridhar Appaji Nag Yasa, Nagesh Panyam Chandrasekarasastry
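The top-down comparison described above resembles comparing two Merkle trees and stopping at the highest layer where fingerprints agree. The sketch below assumes binary trees built with SHA-256 over equal-shaped, non-empty block lists and compares children position by position; `Node`, `build_tree`, and `map_duplicates` are hypothetical names, not NetApp's design.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    fingerprint: bytes
    children: List["Node"] = field(default_factory=list)
    first_block: int = 0   # index of the first data block this node covers
    block_count: int = 1   # number of data blocks this node covers


def build_tree(blocks: List[bytes], fanout: int = 2) -> Node:
    # Base layer: one fingerprint per data block (blocks must be non-empty).
    layer = [Node(hashlib.sha256(b).digest(), first_block=i, block_count=1)
             for i, b in enumerate(blocks)]
    while len(layer) > 1:
        parents = []
        for i in range(0, len(layer), fanout):
            group = layer[i:i + fanout]
            digest = hashlib.sha256(b"".join(n.fingerprint for n in group)).digest()
            parents.append(Node(digest, group, group[0].first_block,
                                sum(n.block_count for n in group)))
        layer = parents
    return layer[0]


def map_duplicates(a: Node, b: Node) -> List[Tuple[int, int, int]]:
    """Return (first_block_in_a, first_block_in_b, block_count) for each match
    at the top-most layer where fingerprints agree; lower layers are skipped."""
    if a.fingerprint == b.fingerprint:
        return [(a.first_block, b.first_block, a.block_count)]
    matches: List[Tuple[int, int, int]] = []
    for child_a, child_b in zip(a.children, b.children):
        matches.extend(map_duplicates(child_a, child_b))
    return matches
```

For blocks `[a, b, c, d]` versus `[a, b, x, d]`, one mapping covers the whole `(a, b)` subtree and a second covers block `d`, instead of three separate per-block mappings.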
-
Publication number: 20150142758
Abstract: A method assigns stored documents within a distributed storage system (DSS) to various document categories to enable a target number of documents to be deleted. An intelligent storage management (ISM) utility identifies a data storage threshold value used to control data storage within the DSS. If a current storage usage exceeds the data storage threshold value, the ISM utility calculates, based on the current storage usage, a target number of documents that can be deleted from the DSS. The ISM utility utilizes a recursive process which includes assigning stored documents to groups including a set of document categories based on data characteristics of the stored documents. The ISM utility further utilizes the recursive process to delete, based on an established ordering of the groups, all of the stored documents assigned to a subset of the groups in order to remove the target number of stored documents.
Type: Application
Filed: August 29, 2014
Publication date: May 21, 2015
Inventors: Dinakaran Joseph, Devaprasad Khandurao Nadgir, Ramkumar Ramalingam, David Elliot Shepard
-
Publication number: 20150142760
Abstract: A method and a device are described for de-duplicating a web page. The method includes: extracting at least one core sentence from a target web page; mapping each core sentence to a unique numeric value to form a first numeric value set; determining an intersection set of the first numeric value set and each second numeric value set, and the number of numeric values included in each intersection set, and determining a maximum number of numeric values included in each intersection set; and when a ratio of the maximum number to a total number of numeric values in the first numeric value set is greater than a set threshold, processing the target web page as a duplicate web page. In embodiments of the present invention, during web page de-duplication processing, accuracy can be improved, an anti-noise capability can be enhanced, and a calculating scale can be reduced.
Type: Application
Filed: December 23, 2014
Publication date: May 21, 2015
Inventors: Nan Jiang, Hui Zhang, Jia Wan
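The ratio test in this abstract can be written in a few lines. This sketch assumes the core sentences have already been extracted (the abstract does not say how) and uses the first 8 bytes of a SHA-1 digest as the "unique numeric value", which is unique only with high probability; `sentence_ids`, `is_duplicate`, and the 0.8 threshold are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set


def sentence_ids(core_sentences: Iterable[str]) -> Set[int]:
    # Map each core sentence to a numeric value (first 8 bytes of SHA-1).
    return {int.from_bytes(hashlib.sha1(s.strip().encode("utf-8")).digest()[:8], "big")
            for s in core_sentences}


def is_duplicate(target_sentences: Iterable[str],
                 known_pages: List[Set[int]],
                 threshold: float = 0.8) -> bool:
    first = sentence_ids(target_sentences)
    if not first or not known_pages:
        return False
    # Largest intersection between the target page and any already-known page.
    best = max(len(first & other) for other in known_pages)
    return best / len(first) > threshold
```

Working with sets of integers keeps the comparison cheap: each page costs one set intersection rather than a sentence-by-sentence string comparison.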
-
Publication number: 20150142759
Abstract: A method of detecting whether a packet from a plurality of packets transmitted by at least one transmitting station over a network has been played back is disclosed. Each packet includes a message and an identifier, the packets being successively transmitted over several consecutive time periods. The method includes receiving the packet by at least one receiving station and reading of the identifier of the received packet to obtain a received identifier, and consulting, by the receiving station, a database of identifiers already received to determine whether the received identifier has already been received. If the received identifier has not already been received, the method also includes updating the database to include the received identifier. The identifier includes an indicator of belonging to groups of packets.
Type: Application
Filed: November 20, 2014
Publication date: May 21, 2015
Inventors: Patrick DUPUTZ, Sepideh FOULADGAR, Carlos PINTO
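A receiving station's identifier database can be modeled as a set of counters per group. This is a sketch under stated assumptions: the identifier is taken to be a `(group, counter)` pair, only the most recent `groups_to_keep` groups are retained, and packets from an already-purged group are treated as replays. None of these details come from the abstract; `ReplayDetector` and `accept` are hypothetical names.

```python
from typing import Dict, Set, Tuple

Identifier = Tuple[int, int]  # (group indicator, counter within group): assumed layout


class ReplayDetector:
    """Hypothetical receiving-station sketch: remembers identifiers per group."""

    def __init__(self, groups_to_keep: int = 2):
        self.groups_to_keep = groups_to_keep
        self.seen: Dict[int, Set[int]] = {}  # group -> counters already received

    def accept(self, identifier: Identifier) -> bool:
        """Return True for a new packet, False for a played-back one."""
        group, counter = identifier
        if self.seen and group <= max(self.seen) - self.groups_to_keep:
            return False  # group already purged: treated as a replay (assumption)
        counters = self.seen.setdefault(group, set())
        if counter in counters:
            return False
        counters.add(counter)  # update the database with the new identifier
        # Purge groups that fell out of the retention window.
        newest = max(self.seen)
        for old in [g for g in self.seen if g <= newest - self.groups_to_keep]:
            del self.seen[old]
        return True
```

The group indicator is what keeps the database bounded: whole groups can be dropped at once instead of ageing out individual identifiers.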
-
Publication number: 20150142757
Abstract: The disclosure provides an information processing method and an electronic device. The electronic device generates M components to be embedded into a first application program when installing a recording application program, M is an integer greater than or equal to 1. There is an association relationship between the M components and the recording application program. In a case where the M components are embedded into the first application program, the method includes: when the first application program runs, displaying a first graphical interface corresponding to the first application program by the electronic device, the first graphical interface including the M components; obtaining a first triggering operation for a first component of the M components; collecting, in response to the first triggering operation, first data content under the first graphical interface directly; and storing the collected first data content.
Type: Application
Filed: March 28, 2014
Publication date: May 21, 2015
Applicant: Lenovo (Beijing) Co., Ltd.
Inventors: Kai Li, Wei Huang, Wenhui Lu, Kangli Zhao
-
Publication number: 20150142756
Abstract: Deduplication in a distributed file system is described. Key classes are determined from a set of potential keys, the potential keys used to represent file content stored by the file system. Control of the key classes is apportioned among index nodes of the file system. Nodes in the file system, during deduplication of data chunks of the file content, generate keys calculated from the data chunks. The keys are distributed among the index nodes based on relations between the keys and the key classes controlled by the index nodes.
Type: Application
Filed: June 14, 2011
Publication date: May 21, 2015
Inventors: Mark Robert Watkins, Boris Zuckerman, Oskar Y. Batuner
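One simple way to realize "key classes controlled by index nodes" is to derive a class from a fixed part of each chunk key and assign classes to nodes. In this sketch the key is a SHA-256 digest, the class is its first byte modulo a hypothetical `NUM_CLASSES`, and classes are apportioned round-robin; the abstract does not specify any of these choices.

```python
import hashlib
from typing import Dict, List

NUM_CLASSES = 16  # hypothetical number of key classes


def key_class(key: bytes) -> int:
    # Assumption: a key's class is derived from its first byte.
    return key[0] % NUM_CLASSES


def apportion_classes(index_nodes: List[str]) -> Dict[int, str]:
    # Hand out control of the key classes round-robin among the index nodes.
    return {c: index_nodes[c % len(index_nodes)] for c in range(NUM_CLASSES)}


def route_keys(chunks: List[bytes], owners: Dict[int, str]) -> Dict[str, List[bytes]]:
    """Compute a key per chunk and group the keys by the index node that
    controls each key's class."""
    batches: Dict[str, List[bytes]] = {}
    for chunk in chunks:
        key = hashlib.sha256(chunk).digest()
        batches.setdefault(owners[key_class(key)], []).append(key)
    return batches
```

Because identical chunks always produce the same key, they always reach the same index node, which is what lets each node detect duplicates within its own classes without consulting the others.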
-
Publication number: 20150142755
Abstract: A control unit of a storage apparatus divides received data into one or more chunks and compresses the divided chunk(s); and regarding the chunk whose compressibility is equal to or lower than a threshold value, the control unit does not store the chunk in the first storage area, but calculates a hash value of the compressed chunk, compares the hash value with a hash value of other data already stored in the second storage area and executes first deduplication processing; and regarding the chunk whose compressibility is higher than the threshold value, the control unit stores the compressed chunk in the first storage area, reads the compressed chunk from the first storage area, calculates a hash value of the compressed chunk, compares the relevant hash value with a hash value of other data already stored in the second storage area, and executes secondary deduplication processing.
Type: Application
Filed: August 24, 2012
Publication date: May 21, 2015
Applicants: HITACHI, LTD., HITACHI INFORMATION & TELECOMMUNICATION ENGINEERING, LTD.
Inventor: Masayuki Kishi
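The two processing paths can be contrasted in a toy store. This sketch assumes compressibility means the fraction of space saved by `zlib` compression, a fixed 4 KiB chunk size, and SHA-256 of the compressed chunk as the hash; `ChunkStore` and its method names are hypothetical, not Hitachi's design.

```python
import hashlib
import zlib
from typing import Dict, List

CHUNK_SIZE = 4096  # hypothetical chunk size


def compressibility(raw: bytes, compressed: bytes) -> float:
    # Assumption: fraction of space saved by compression (can be negative).
    return 1 - len(compressed) / len(raw)


class ChunkStore:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.first_area: List[bytes] = []        # staged compressed chunks
        self.second_area: Dict[str, bytes] = {}  # hash of compressed chunk -> chunk

    def _deduplicate(self, compressed: bytes) -> None:
        digest = hashlib.sha256(compressed).hexdigest()
        # Store only if no chunk with the same hash exists yet.
        self.second_area.setdefault(digest, compressed)

    def write(self, data: bytes) -> None:
        for offset in range(0, len(data), CHUNK_SIZE):
            raw = data[offset:offset + CHUNK_SIZE]
            compressed = zlib.compress(raw)
            if compressibility(raw, compressed) <= self.threshold:
                self._deduplicate(compressed)       # first (inline) deduplication
            else:
                self.first_area.append(compressed)  # staged for secondary pass

    def run_secondary_deduplication(self) -> None:
        while self.first_area:
            self._deduplicate(self.first_area.pop(0))
```

Highly compressible chunks are cheap to stage, so deferring their hashing moves work off the write path; poorly compressible chunks skip the staging area entirely.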
-
Patent number: 9037551
Abstract: Aspects of the present disclosure provide techniques that determine whether an attribute value is associated with each configuration item in a plurality of configuration items. If it is determined that the attribute value is associated with each configuration item in the plurality of configuration items, the attribute value is deemed a redundant attribute value.
Type: Grant
Filed: March 7, 2012
Date of Patent: May 19, 2015
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: David Azriel, Nimrod Nahum, Nir Mardiks
-
Patent number: 9037825
Abstract: Conditions are enforced to prevent unintended deletion of data stored by a data storage system. For example, to delete a collection of data, a condition on the collection of data's size may be enforced. The collection may be required to be empty, for example. In addition, a condition that there not exist a pending data processing operation that can affect fulfillment of the condition on the collection of data's size is also enforced.
Type: Grant
Filed: November 20, 2012
Date of Patent: May 19, 2015
Assignee: Amazon Technologies, Inc.
Inventors: Bryan James Donlan, Sandeep Kumar
-
Patent number: 9037546
Abstract: In accordance with embodiments, there are provided mechanisms and methods for automatic code generation for database object deletion. These mechanisms and methods for automatic code generation for database object deletion can generate code for deleting database objects in an automated manner. The ability to generate code for deleting database objects in an automated manner can enable the efficient and accurate deletion of database objects, including database objects with relationships to other database objects.
Type: Grant
Filed: March 25, 2011
Date of Patent: May 19, 2015
Assignee: salesforce.com, inc.
Inventors: Simon Wong, Sonali Agrawal
-
Patent number: 9037534
Abstract: A data transformation system receives data from one or more external source systems and stores and transforms the data for providing to reporting systems. The data transformation system maintains multiple versions of data received from an external source system. The data transformation system can combine data from different versions of data and provide to the reporting system. As a result, external source systems that do not maintain data in a format appropriate for reporting systems and/or do not maintain sufficient historical data to generate different types of reports are able to generate these reports. The data transformation system can also enhance older versions of data stored in the system or exclude portions of data from reports. The data transformation system can purge older versions of data so that older data that is less frequently requested is maintained at a lower frequency than recent data.
Type: Grant
Filed: December 12, 2014
Date of Patent: May 19, 2015
Assignee: GoodData Corporation
Inventor: Pavel Kolesnikov
-
Patent number: 9037536
Abstract: A system and method for automated database management are provided. Statistics relating to operation of a database may be collected, wherein the database comprises one or more database objects. Characteristics of the database objects may be determined, either automatically or by user intervention, using the collected statistics, one or more policies, and/or one or more definitions. The policies and definitions may be defaults or may be customized by a user. Actions to be performed on the database objects may be determined, either automatically or by user intervention, based on the characteristics of the database objects. A schedule for performing the actions on the database objects may be automatically determined. The actions may be performed on the database objects based on the schedule.
Type: Grant
Filed: October 30, 2007
Date of Patent: May 19, 2015
Assignee: BMC SOFTWARE, INC.
Inventors: Melody Vos, Jeff Slavin
-
Publication number: 20150134624
Abstract: Methods, systems, and computer readable media for content item purging functionality are provided. A content item purger, such as may be incorporated within a local client application of a content management system, leverages its knowledge as to which items have been uploaded to the content management system, and how long content items have been stored on the user device, to propose items for local deletion, thereby reclaiming storage on the user device. A content item purger may run on one or more devices of a user associated with an account on a content management system upon various triggering events, and may run with or without user interaction, thus maintaining available user device memory capacity at all times.
Type: Application
Filed: November 12, 2013
Publication date: May 14, 2015
Applicant: Dropbox, Inc.
Inventors: Michael Dwan, Anthony Grue, Daniel Kluesing
-
Publication number: 20150134625
Abstract: Technology is disclosed for improving the storage efficiency and communication efficiency for a storage client device by maximizing the cache hit rate and minimizing data requests to the storage server. The storage server provides a duplication list to the storage client device. The duplication list contains references (e.g. storage addresses) to data blocks that contain duplicate data content. The storage client uses the duplication list to improve the cache hit rate. The duplication list is pruned to contain references to data blocks relevant to the storage client device. The storage server can prune the duplication list based on a working set of storage objects for a client. Alternatively, the storage server can prune the duplication list based on content characteristics, e.g. duplication degree and access frequency. Duplicate blocks to which the client does not have access can be excluded from the duplication list.
Type: Application
Filed: November 13, 2013
Publication date: May 14, 2015
Inventors: James F. Lentini, Anshul Madan, Deepak R. Kenchammana-Hosekote
-
Publication number: 20150134603
Abstract: A computer based method and system for managing contact information from the contacts of a user. The contact information is collected and transformed to a consistent format, which permits resolution of conflicting information from multiple sources, such as differences in location information from different social mediums. This transformation enables cross media communication, such as notifications between users and contacts about location or other matters. In addition, the transformation permits a single communication to be transformed for use in multiple social media platforms, whether to a single contact or a select group. User interfaces are provided for display and use of such functional interactions.
Type: Application
Filed: October 15, 2014
Publication date: May 14, 2015
Applicant: Connect Software Corporation
Inventors: ZACH MELAMED, Ryan Allis, Anima Sarah LaVoy, Jared Weinstock, Dan Ho, Nick Gonzalez, Lilia Tamm, Dana Chambers
-
Publication number: 20150134623
Abstract: A method, system, and data storage medium for parallel partitioning of input data into chunks for data deduplication, comprising: dividing said input data into segments; for at least one segment, appending a portion of a subsequent segment; searching the segments in parallel for candidate breaking points; and partitioning each segment into chunks based on a group of final breaking points selected from said candidate breaking points.
Type: Application
Filed: February 17, 2011
Publication date: May 14, 2015
Applicant: JITCOMM NETWORKS PTE LTD
Inventor: Yong Steven Liu
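The point of appending part of the next segment is that a boundary test which looks at a fixed window can be evaluated near a segment's end without waiting for its neighbor. The abstract does not say how candidates are found or how final points are chosen, so this sketch substitutes a CRC-32 over a 16-byte window and a minimum-chunk-length rule; `WINDOW`, `DIVISOR`, `MIN_CHUNK`, and the function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

WINDOW = 16     # bytes inspected per candidate test (assumption)
DIVISOR = 64    # controls average candidate spacing (assumption)
MIN_CHUNK = 32  # minimum chunk length when selecting final points (assumption)


def segment_candidates(data: bytes, bounds: Tuple[int, int]) -> List[int]:
    start, end = bounds
    # Append WINDOW - 1 bytes of the subsequent segment so that windows starting
    # near the end of this segment can still be evaluated.
    view = data[start:end + WINDOW - 1]
    return [start + i + WINDOW
            for i in range(len(view) - WINDOW + 1)
            if zlib.crc32(view[i:i + WINDOW]) % DIVISOR == 0]


def parallel_chunk(data: bytes, segments: int = 4) -> List[bytes]:
    seg_len = max(1, -(-len(data) // segments))  # ceiling division
    bounds = [(s, min(s + seg_len, len(data))) for s in range(0, len(data), seg_len)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda b: segment_candidates(data, b), bounds))
    # Segments come back in order, so candidates are already sorted.
    cuts, last = [], 0
    for c in (c for seg in results for c in seg):
        if c - last >= MIN_CHUNK and c < len(data):
            cuts.append(c)
            last = c
    cuts.append(len(data))
    chunks, prev = [], 0
    for c in cuts:
        chunks.append(data[prev:c])
        prev = c
    return chunks
```

Because each candidate depends only on its own window, the parallel result matches a single-segment run exactly; only the final selection pass is sequential.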
-
Patent number: 9032032
Abstract: Architecture for efficiently ensuring that data is stored to the desired destination datastore such as for replication processes. A copy of data (e.g., messages) sent to a datastore for storage is stored at an alternate location until a received signal indicates that the storage and replication was successful. As soon as the feedback signal is received, the copy is removed from the alternate location, which improves input/output (I/O) and storage patterns. The feedback mechanism can also be used for monitoring the status of data transport associated with log shipping, for example, and taking the appropriate actions when storage (e.g., replication) is not being performed properly.
Type: Grant
Filed: June 26, 2008
Date of Patent: May 12, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: David Mills, Todd Luttinen, Victor Boctor
-
Publication number: 20150127621
Abstract: Systems and methods of data deduplication are disclosed comprising generating a hash value of a data block and comparing the hash value to a table in a first memory that correlates ranges of hash values with buckets of hash values in a second memory different from the first memory. A bucket is identified based on the comparison and the bucket is searched to locate the hash value. If the hash value is not found in the bucket, the hash value is stored in the bucket and the data block is stored in a third memory. The first memory may be volatile memory and the second memory may be non-volatile random access memory, such as an SSD. Rebalancing of buckets and the table, and use of additional metadata to determine where data blocks should be stored, are also disclosed.
Type: Application
Filed: November 4, 2014
Publication date: May 7, 2015
Inventor: Chin L. KUO
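The range table in the first memory can be a sorted list of bucket upper bounds searched with `bisect`. This sketch models all three memories as Python objects, truncates SHA-256 to a 64-bit hash, and splits the hash space evenly; rebalancing is omitted. `RangeIndexedDedup` is a hypothetical name.

```python
import bisect
import hashlib
from typing import Dict, List, Set


class RangeIndexedDedup:
    def __init__(self, num_buckets: int = 4):
        space = 2 ** 64
        # First memory: sorted upper bounds of the hash ranges, one per bucket.
        self.upper_bounds = [space * (i + 1) // num_buckets - 1 for i in range(num_buckets)]
        # Second memory: buckets of hash values (stand-in for an SSD-resident store).
        self.buckets: List[Set[int]] = [set() for _ in range(num_buckets)]
        # Third memory: the unique data blocks.
        self.blocks: Dict[int, bytes] = {}

    def write(self, block: bytes) -> bool:
        """Return True if the block was new and stored, False if it was a duplicate."""
        h = int.from_bytes(hashlib.sha256(block).digest()[:8], "big")
        # The first upper bound >= h identifies the bucket covering h.
        bucket = self.buckets[bisect.bisect_left(self.upper_bounds, h)]
        if h in bucket:
            return False
        bucket.add(h)
        self.blocks[h] = block
        return True
```

Only the small range table needs to stay in fast memory; each lookup then touches exactly one bucket in the slower tier.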
-
Publication number: 20150127607
Abstract: Data management systems and methods include a cloud-based platform coupled to a system of agents or folders hosted on client devices. The platform does not store actual data but instead makes use of metadata provided by the agents to track a location of all data in the system and manage the distributed storage, movement and processing of the actual data among the agents. In so doing, the platform pools networked storage into "virtual clusters" using local storage at the agents. The agents collectively monitor, store, and transfer or move data, and perform data processing operations as directed by the platform, as described in detail herein. The agents include agents hosted on or coupled to processor-based devices, agents hosted on devices of a local area network, agents hosted on devices of a wide area network, agents hosted on mobile devices, and agents hosted on cloud-based devices.
Type: Application
Filed: September 15, 2014
Publication date: May 7, 2015
Inventors: Bret SAVAGE, Casey MARSHALL, Geoffrey STUTCHMAN, Ross ELTHERINGTON, Steve OWENS, George NORTHUP
-
Publication number: 20150127622
Abstract: Mechanisms are provided for performing network efficient deduplication. Segments are extracted from files received for deduplication at a host connected to a target over one or more networks and/or fabrics in a deduplication system. Segment identifiers (IDs) are determined and compared with segment IDs for segments already deduplicated. Segments already deduplicated need not be transmitted to a target system. References and reference counts are modified at a target system. Updating references and reference counts may involve modifying filemaps, dictionaries, and datastore suitcases for both already deduplicated and not already deduplicated segments.
Type: Application
Filed: January 13, 2015
Publication date: May 7, 2015
Applicant: Dell Products L.P.
Inventor: Vinod Jayaraman
-
Patent number: 9026503
Abstract: The techniques introduced here provide for enabling deduplication operations for a file system without significantly affecting read performance of the file system due to fragmentation of the data sets in the file system. The techniques include determining, by a storage server that hosts the file system, a level of fragmentation that would be introduced to a data set stored in the file system as a result of performing a deduplication operation on the data set. The storage server then compares the level of fragmentation with a threshold value and determines whether to perform the deduplication operation based on a result of comparing the level of fragmentation with the threshold value. The threshold value represents an acceptable level of fragmentation in the data sets of the file system.
Type: Grant
Filed: February 29, 2012
Date of Patent: May 5, 2015
Assignee: NetApp, Inc.
Inventors: Alok Sharma, Sunil Walwaiker, Vaijayanti Bharadwaj
-
Patent number: 9026504
Abstract: Embodiments of the invention are directed to a system, method, or computer program product for providing expedited loading/inserting of data by an entity. Specifically, the invention expedites the loading/inserting of large quantities of data to database tables. Initially received data for loading is processed, via multi-row insert, onto in-memory or temporary tables. The data is staged on a temporary table while the appropriate base table is determined. Once determined, data from the temporary table is pointed to the base table. In this way, a massive amount of data loading from the temporary table to a base table may occur. This prevents logging and locking associated with adding individual data points or rows to a base table independently. Errors are checked and processed accordingly. Once updated, the data on the temporary table is deleted in bulk and a check point restart is issued.
Type: Grant
Filed: February 4, 2013
Date of Patent: May 5, 2015
Assignee: Bank of America Corporation
Inventors: Ron G. Rambo, Steven A. Walker
-
Publication number: 20150120681
Abstract: A system and a method to aggregate multiple content servers' metadata to a local database is provided that enable various features such as improved performance, non searchable server support, duplicate handling and protocol independence. The system performs local content crawling, remote server crawling and remote server searching to create an aggregated database of metadata. The content is located in a single database. Hence, the duplicate metadata can be removed easily.
Type: Application
Filed: October 27, 2013
Publication date: April 30, 2015
Applicant: Videon Central, Inc.
Inventors: Robert Behe, Robert Kennedy, Russ Shanahan, Derek Andrews, James Condon
-
Publication number: 20150120680
Abstract: One or more techniques and/or systems are provided for providing a discussion summary corresponding to a search query and/or for providing discussion session search results. For example, discussion data (e.g., corresponding to real-time messaging, such as a microblog discussion) may be evaluated to identify a discussion topic for a discussion session (e.g., a kitchen renovation topic may be assigned to a 1 hour exchange of kitchen renovation messages by a discussion group). A discussion summary of a discussion session may be provided based upon the discussion session having a discussion topic corresponding to a search query topic of a search query. The discussion summary may be provided along with other results for the query and may describe the discussion group, identifiers such as hashtags used by the discussion group, meeting dates/times, average number(s) of participants, other discussion sessions hosted by the discussion group, future discussion sessions, and/or other information.
Type: Application
Filed: October 24, 2013
Publication date: April 30, 2015
Applicant: Microsoft Corporation
Inventors: Omar Alonso, Kartikay Khandelwal, Mohamed Mansour, Paul Ko, Nina Mishra, Krishnaram Kenthapadi, Abhimanyu Das
-
Publication number: 20150120682
Abstract: Embodiments of the present invention disclose a method, computer program product, and system for recognizing patterns in log files with unknown grammar. A computer replaces one or more alphanumeric strings with a first alphanumeric character to generate a first resulting string. The computer then replaces one or more identical pairs of characters of the first resulting string with a second alphanumeric character to generate a second resulting string. The computer then replaces one or more consecutive instances of the second alphanumeric character, in the second resulting string, with one instance of the second alphanumeric character to generate a compressed string.
Type: Application
Filed: October 28, 2013
Publication date: April 30, 2015
Applicant: International Business Machines Corporation
Inventors: Fiona M. Crowther, Geza Geleji, Martin A. Ross
-
Patent number: 9020900
Abstract: A distributed, deduplicated storage system according to certain embodiments is arranged in a parallel configuration including multiple deduplication nodes. Deduplicated data is distributed across the deduplication nodes. The deduplication nodes can be networked together and communicate with one another using a light-weight, customized communication scheme (e.g., a scheme based on FTP or HTTP). In some cases, deduplication management information including deduplication signatures and/or other metadata is stored separately from the deduplicated data in deduplication management nodes, improving performance and scalability.
Type: Grant
Filed: December 13, 2011
Date of Patent: April 28, 2015
Assignee: CommVault Systems, Inc.
Inventors: Manoj Kumar Vijayan Retnamma, Rajiv Kottomtharayil, Deepak Raghunath Attarde
-
Patent number: 9020909
Abstract: Techniques and mechanisms are provided to instantly clone active files including active optimized files. When a new instance of an active file is created, a new stub is generated in the user namespace and a block map file is cloned. The block map file includes the same offsets and location pointers that existed in the original block map file. No user file data needs to be copied. If the cloned file is later modified, the behavior can be the same as what happens when a de-duplicated file is modified.
Type: Grant
Filed: February 7, 2013
Date of Patent: April 28, 2015
Assignee: Dell Products L.P.
Inventors: Vinod Jayaraman, Goutham Rao, Ratna Manoj Bolla
-
Patent number: 9020903
Abstract: A method is used in recovering duplicate blocks in file systems. A duplicate file system block is detected in a file system. The duplicate file system block is referenced by a first inode associated with a first file of the file system and a second inode associated with a second file of the file system. Metadata of the duplicate file system block is evaluated. Based on the evaluation, a set of inodes in the file system is determined. Each inode of the set of inodes refers to the duplicate file system block. Based on the determination, the set of inodes is updated.
Type: Grant
Filed: June 29, 2012
Date of Patent: April 28, 2015
Assignee: EMC Corporation
Inventors: Srinivasa Rao Vempati, Dixitkumar Vishnubhai Patel, Jean-Pierre Bono, Marshall Hansi Wu
-
Publication number: 20150112950
Abstract: A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independent of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on less than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.
Type: Application
Filed: December 23, 2014
Publication date: April 23, 2015
Inventors: Xianbo Zhang, Fanglu Guo, Weibao Wu
-
Patent number: 9015126
Abstract: Techniques for effective delete operations in a distributed data store with eventually consistent replicated entries include determining to delete a particular entry from the distributed data store. Each entry includes a first field that holds data that indicates a key and a second field that holds data that indicates content associated with the key and a third field that holds data that indicates a version for the content. The method also comprises causing, at least in part, actions that result in marking the particular entry as deleted without removing the particular entry, and updating a version in the third field for the particular entry.
Type: Grant
Filed: April 21, 2011
Date of Patent: April 21, 2015
Assignee: Nokia Corporation
Inventors: Mark Rambacher, Abhijit Bagri, Yekesa Kosuru
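Marking an entry deleted and bumping its version is the familiar tombstone pattern: a late-arriving replica write with an older version cannot bring the entry back. This sketch assumes a higher-version-wins merge rule and a single in-process replica; `Entry`, `Replica`, and `apply` are hypothetical names, not Nokia's API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    key: str                  # first field: the key
    content: Optional[bytes]  # second field: content associated with the key
    version: int              # third field: version of the content
    deleted: bool = False     # tombstone marker


class Replica:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def apply(self, entry: Entry) -> None:
        # Assumption: the copy with the higher version wins when replicas merge.
        current = self.entries.get(entry.key)
        if current is None or entry.version > current.version:
            self.entries[entry.key] = entry

    def delete(self, key: str) -> None:
        current = self.entries.get(key)
        version = current.version + 1 if current else 1
        content = current.content if current else None
        # Mark as deleted and bump the version instead of removing the entry.
        self.apply(Entry(key, content, version, deleted=True))

    def get(self, key: str) -> Optional[bytes]:
        entry = self.entries.get(key)
        return None if entry is None or entry.deleted else entry.content
```

If the entry were physically removed instead, a stale copy replicated later would look like a fresh insert and the deleted value would reappear.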
-
Patent number: 9015552
Abstract: Various embodiments for differentiating between data and stubs pointing to a parent copy of deduplicated data are provided. Undeduplicated data is stored with a checksum of an initial value as a first cyclic redundancy check (CRC) seed. A stub pointing to the parent copy of the deduplicated data is stored with an additional checksum of a differing, additional initial value as a second CRC seed.
Type: Grant
Filed: May 14, 2013
Date of Patent: April 21, 2015
Assignee: International Business Machines Corporation
Inventors: Allen K. Bates, Nils Haustein, Craig A. Klein, Frank Krick, Ulf Troppens, Daniel J. Winarski
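Python's `zlib.crc32(data, value)` takes a starting value, which is enough to show how two seeds let a reader tell a stub from ordinary data by checking which seed validates the stored CRC. The seed constants, the 4-byte big-endian trailer, and the function names are hypothetical choices for this sketch.

```python
import zlib

DATA_SEED = 0x00000000  # hypothetical seed for undeduplicated data
STUB_SEED = 0x5A5A5A5A  # hypothetical, differing seed for stubs


def seal(payload: bytes, is_stub: bool) -> bytes:
    # Append a CRC-32 computed with the seed that encodes the record type.
    seed = STUB_SEED if is_stub else DATA_SEED
    return payload + zlib.crc32(payload, seed).to_bytes(4, "big")


def classify(record: bytes) -> str:
    payload, stored = record[:-4], int.from_bytes(record[-4:], "big")
    if zlib.crc32(payload, DATA_SEED) == stored:
        return "data"
    if zlib.crc32(payload, STUB_SEED) == stored:
        return "stub"
    return "corrupt"
```

For a given payload the two seeds always yield different CRCs, so no extra type flag is needed; the same check that detects corruption also identifies the record type.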
-
Patent number: 9015131
Abstract: When an online storage service is used to expand the storage capacity of a file server, the amount of communication in synchronization processing and the amount of data retained on the online storage service are reduced to lower the usage charges. In a kernel module provided with a storage area on the online storage service, files are divided into block files and managed, and blocks overlapping with an already registered and saved block file group are not uploaded, but only configuration information of the files is changed. A mechanism is adopted, in which DBs for managing meta information and elimination of duplication are divided and managed, and only updated sections are appropriately uploaded.
Type: Grant
Filed: August 26, 2011
Date of Patent: April 21, 2015
Assignee: Hitachi Solutions, Ltd.
Inventors: Yasuhiro Kirihata, Kouji Nakayama
-
Patent number: 9015212
Abstract: A system for exposing data stored in a cloud computing system to a content delivery network provider includes a database configured to receive and store metadata about the data, the database being implemented in the cloud computing system to store configuration metadata for the data related to the content delivery network, and an origin server configured to receive requests for the data from the content delivery network provider, and configured to provide the data to the content delivery network provider based on the metadata.
Type: Grant
Filed: October 16, 2012
Date of Patent: April 21, 2015
Assignee: Rackspace US, Inc.
Inventors: Goetz David, Gregory Lee Holt
-
Publication number: 20150106344
Abstract: Systems and methods of providing a configurable table of rules that defines a repository/archive search priority that includes multiple repositories/archives. In this manner, repository/archives are successively searched and after a first result is returned the search is stopped. Repository/archives are searched in priority order based on their location in pre-configured “tiers.” This enables searches to be directed to repository/archives that are best able to handle load for different types of searches, and for different types of studies as well. A duplicate priority list enables an administrator to designate which repository/archive will appear on the search results list if duplicates are found. For example, in clinical study archiving systems, the search priority enables an administrator to direct searches to the repository best able to handle load for different types of searches and for different types of studies.
Type: Application
Filed: October 7, 2014
Publication date: April 16, 2015
Inventor: Mark Allan Wagner
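A small Python sketch of tiered search with early stopping plus a duplicate priority list. The data shapes are assumptions: each tier is a list of `(repo_name, search_fn)` pairs, and `duplicate_priority` is an ordered list of repository names. Function names are hypothetical.

```python
def tiered_search(tiers, query):
    """Search repositories tier by tier; stop at the first one that returns results."""
    for tier in tiers:
        for repo_name, search in tier:
            results = search(query)
            if results:
                return repo_name, results
    return None, []


def pick_duplicate(candidates, duplicate_priority):
    """Among repositories holding the same study, return the highest-priority one.

    Repositories missing from the priority list rank after all listed ones.
    """
    rank = {name: i for i, name in enumerate(duplicate_priority)}
    return min(candidates, key=lambda name: rank.get(name, len(rank)))
```

Once the `"archive"` repository in the second tier answers, the `"other"` repository in the same tier is never queried.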
-
Publication number: 20150106345
Abstract: According to at least one embodiment, a data storage system is provided. The data storage system includes memory, at least one processor in data communication with the memory, and a deduplication director component executable by the at least one processor. The deduplication director component is configured to receive data for storage on the data storage system, analyze the data to determine whether the data is suitable for at least one of summary-based deduplication, content-based deduplication, and no deduplication, and store, in a common object store, at least one of the data and a reference to duplicate data stored in the common object store.
Type: Application
Filed: October 15, 2014
Publication date: April 16, 2015
Inventors: Ronald Ray Trimble, Jeffrey V. Tofano, Thomas R. Ramsdell, Jon Christopher Kennedy
-
Publication number: 20150106343
Abstract: A system and method for global data de-duplication in a cloud storage environment utilizing a plurality of data centers is provided. Each cloud storage gateway appliance divides a data stream into a plurality of data objects and generates a content-based hash value as a key for each data object. An IMMUTABLE PUT operation is utilized to store the data object at the associated key within the cloud.
Type: Application
Filed: October 16, 2013
Publication date: April 16, 2015
Applicant: NetApp, Inc.
Inventors: Kiran Nenmeli Srinivasan, Kishore Kasi Udayashankar, Swetha Krishnan
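A minimal Python sketch of content-addressed storage with put-if-absent semantics, assuming SHA-256 as the content-based hash and fixed-size objects. `ImmutableObjectStore` and `store_stream` are hypothetical stand-ins; the real IMMUTABLE PUT operation is a cloud-side primitive not specified in the abstract. Because the key is derived from the content, two gateways that see the same object compute the same key, and the second write is a no-op.

```python
import hashlib


class ImmutableObjectStore:
    """Stand-in for a shared cloud store: a key, once written, never changes."""

    def __init__(self):
        self._objects = {}

    def immutable_put(self, key, data):
        """Write data at key only if key is absent; return True if a write happened."""
        if key in self._objects:
            return False
        self._objects[key] = data
        return True

    def get(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


def store_stream(store, stream, object_size):
    """Divide a stream into objects keyed by their SHA-256 hash; return the key list."""
    keys = []
    for i in range(0, len(stream), object_size):
        obj = stream[i:i + object_size]
        key = hashlib.sha256(obj).hexdigest()
        store.immutable_put(key, obj)
        keys.append(key)
    return keys
```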
-
Publication number: 20150106336
Abstract: A mechanism is provided for cross-allocated block repair in a mounted file system. A set of cross-allocated blocks are identified from a plurality of blocks within an inode of the mounted file system, based on a corresponding bit associated with each cross-allocated block in a duplicated block information bitmap being in a first identified state. The set of cross-allocated blocks are repaired using a user-defined repair process. Then one or more of the set of cross-allocated blocks are deallocated based on results of the user-defined repair process.
Type: Application
Filed: December 15, 2014
Publication date: April 16, 2015
Inventors: Kalyan C. Gunda, Srikanth Srinivasan
-
Patent number: 9009435
Abstract: Systems and computer program products are provided for optimizing selection of files for deletion from one or more data storage devices to free up a predetermined amount of space in the one or more data storage devices. A method includes analyzing an effective space occupied by each file of a plurality of files in the one or more data storage devices, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for deletion, based on the identified one or more data blocks, and deleting the one or more candidate files for deletion from the one or more data storage devices.
Type: Grant
Filed: August 13, 2012
Date of Patent: April 14, 2015
Assignee: International Business Machines Corporation
Inventors: Duane Mark Baldwin, Sandeep Ramesh Patil, Riyazahamad Moulasab Shiraguppi, Prashant Sodhiya
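On a deduplicated volume, deleting a file frees only the blocks no other file references, so a file's "effective space" can be much smaller than its logical size. The Python sketch below assumes that interpretation and a greedy selection policy; neither the definition nor the greedy rule is taken from the patent, and the function names are hypothetical. Effective space is recomputed after each pick, because removing one file can leave a previously shared block referenced by only one remaining file.

```python
from collections import Counter


def effective_space(files, block_sizes):
    """Bytes freed by deleting each file on its own: blocks no other file references."""
    refcount = Counter(b for blocks in files.values() for b in set(blocks))
    return {
        name: sum(block_sizes[b] for b in set(blocks) if refcount[b] == 1)
        for name, blocks in files.items()
    }


def select_for_deletion(files, block_sizes, target):
    """Greedily pick the file with the largest effective space until target bytes are freed."""
    remaining = dict(files)
    chosen, freed = [], 0
    while freed < target and remaining:
        eff = effective_space(remaining, block_sizes)
        name = max(eff, key=eff.get)
        if eff[name] == 0:
            break  # no single remaining deletion frees anything by itself
        chosen.append(name)
        freed += eff[name]
        del remaining[name]
    return chosen, freed
```

With `a` and `b` sharing a 50-byte block, deleting `a` alone frees 10 bytes, but once `a` is gone, deleting `b` frees its 10 bytes plus the formerly shared 50.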
-
Patent number: 9009434
Abstract: Systems and computer program products are provided for optimizing selection of files for eviction from a first storage pool to free up a predetermined amount of space in the first storage pool. A method includes analyzing an effective space occupied by each file of a plurality of files in the first storage pool, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for eviction, based on the identified one or more data blocks, and evicting the one or more candidate files for eviction from the first storage pool to a second storage pool.
Type: Grant
Filed: August 13, 2012
Date of Patent: April 14, 2015
Assignee: International Business Machines Corporation
Inventors: Duane Mark Baldwin, Sandeep Ramesh Patil, Riyazahamad Moulasab Shiraguppi, Prashant Sodhiya
-
Publication number: 20150100554
Abstract: Systems, methods, and other embodiments associated with attribute redundancy removal are described. In one embodiment, a method includes identifying redundant attribute values in a group of attributes that describe two items. The example method also includes generating a pruned group of attributes having the redundant attribute values removed. The similarity of the two items is calculated based, at least in part, on the pruned group of attribute values.
Type: Application
Filed: October 31, 2013
Publication date: April 9, 2015
Inventors: Z. Maria WANG, Su-Ming WU
-
Patent number: 9003152
Abstract: Methods, systems, and computer program products are provided for optimizing selection of files for eviction from a first storage pool to free up a predetermined amount of space in the first storage pool. A method includes analyzing an effective space occupied by each file of a plurality of files in the first storage pool, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for eviction, based on the identified one or more data blocks, and evicting the one or more candidate files for eviction from the first storage pool to a second storage pool.
Type: Grant
Filed: November 5, 2013
Date of Patent: April 7, 2015
Assignee: International Business Machines Corporation
Inventors: Duane M. Baldwin, Sandeep R. Patil, Riyazahamad M. Shiraguppi, Prashant Sodhiya
-
Patent number: 9002805
Abstract: Methods and apparatus for conditional deletes of storage objects are disclosed. A storage medium comprises program instructions that, when executed, implement a metadata node of a storage service in which a protocol based on sequence numbers is used to resolve update conflicts. The instructions store, as part of a conditional deletion record associated with a key of a particular storage object identified as a deletion candidate, a deletion sequence number derived from a particular modification sequence number of the object. In accordance with the protocol, the instructions determine whether an additional modification sequence number larger than the deletion sequence number has been generated in response to an operation associated with the key. If such an additional sequence number has been generated, the deletion of the storage object is canceled.
Type: Grant
Filed: December 14, 2012
Date of Patent: April 7, 2015
Assignee: Amazon Technologies, Inc.
Inventors: Jeffrey Michael Barber, Praveen Kumar Gattu, Christopher Henning Elving, Derek Ernest Denny-Brown, II, Carl Yates Perry
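A minimal single-node Python sketch of the conditional-delete check, assuming one global counter for modification sequence numbers and taking the deletion sequence number to be equal to the object's latest modification number at the time it is marked. `MetadataNode` and its methods are hypothetical names, not the service's API.

```python
class MetadataNode:
    def __init__(self):
        self.seq = 0               # global modification sequence counter
        self.latest = {}           # key -> latest modification sequence number
        self.pending_deletes = {}  # key -> deletion sequence number
        self.objects = {}          # key -> stored value

    def _next_seq(self):
        self.seq += 1
        return self.seq

    def put(self, key, value):
        """Any modification of a key gets a new, larger sequence number."""
        self.objects[key] = value
        self.latest[key] = self._next_seq()

    def mark_for_deletion(self, key):
        """Record a conditional delete derived from the object's current sequence number."""
        self.pending_deletes[key] = self.latest[key]

    def apply_deletes(self):
        """Delete marked objects unless a newer modification arrived; return deleted keys."""
        deleted = []
        for key, del_seq in list(self.pending_deletes.items()):
            if self.latest.get(key, 0) <= del_seq:
                self.objects.pop(key, None)
                deleted.append(key)
            # If a larger sequence number exists, the deletion is simply canceled.
            del self.pending_deletes[key]
        return deleted
```

In this sketch, an object written again after being marked survives `apply_deletes`, which mirrors the cancellation rule in the abstract.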
-
Patent number: 9003151
Abstract: Methods, systems, and computer program products are provided for optimizing selection of files for deletion from one or more data storage devices to free up a predetermined amount of space in the one or more data storage devices. A method includes analyzing an effective space occupied by each file of a plurality of files in the one or more data storage devices, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for deletion, based on the identified one or more data blocks, and deleting the one or more candidate files for deletion from the one or more data storage devices.
Type: Grant
Filed: November 5, 2013
Date of Patent: April 7, 2015
Assignee: International Business Machines Corporation
Inventors: Duane M. Baldwin, Sandeep R. Patil, Riyazahamad M. Shiraguppi, Prashant Sodhiya
-
Publication number: 20150095291
Abstract: Systems and methods are disclosed herein for supplementing product records with product groups that are relevant to the product records. Queries from users may be analyzed to extract keywords. Search results for keywords are evaluated to determine category consistency among product records, including such values as entropy and taxonomy depth. Those keywords with search results having adequate category consistency are selected as product groups, and the search results are associated with the product groups. Product groups are associated with product records according to a random walk of a graph having as nodes products and product groups and links representing belonging of a product to a product group. Product groups may be selected based on a transition probability based on a random walk and a quality score based on usage of a product group page for the product group.
Type: Application
Filed: September 30, 2013
Publication date: April 2, 2015
Applicant: Wal-Mart Stores, Inc.
Inventor: Shankara B. Subramanya
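One way to read "category consistency ... including such values as entropy" is the Shannon entropy of the category labels among a keyword's search results: 0 bits when every result shares one category, higher when results are scattered. The Python sketch below assumes that reading; the `max_entropy` threshold of 1.0 bit is hypothetical, and taxonomy depth and the random-walk step are not modeled.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (bits) of the category distribution among search results."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_consistent(categories, max_entropy=1.0):
    """Treat a keyword as a candidate product group if its results are consistent enough.

    max_entropy is a hypothetical threshold; empty result lists are never consistent.
    """
    return bool(categories) and category_entropy(categories) <= max_entropy
```

Results split evenly across four categories score 2 bits and would be rejected under this threshold; three televisions and one radio score about 0.81 bits and would pass.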
-
Patent number: 8996470
Abstract: Methods and systems for maintaining the internal consistency of a fact repository are described. Accessed objects are checked for attribute-value pairs that have links to other objects. For any link to an object, the name of the linked-to object is inserted into the attribute-value pair having the link. The accessed objects are filtered to remove attribute-value pairs meeting predefined criteria, possibly resulting in null objects. Links to null objects are identified and removed.
Type: Grant
Filed: May 31, 2005
Date of Patent: March 31, 2015
Assignee: Google Inc.
Inventors: Andrew William Hogue, Robert Joseph Siemborski, Jonathan T. Betz
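The three passes in the abstract (annotate links with names, filter pairs, drop links to emptied objects) can be sketched in Python as below. The object layout (`{"name": ..., "pairs": [...]}` with an optional `"link"` field), the `link_name` field, and the caller-supplied `should_remove` predicate are all assumptions. The sketch reads "links ... removed" as stripping the link from the pair while keeping the attribute and value.

```python
def maintain_consistency(objects, should_remove):
    """objects: id -> {"name": str, "pairs": [dict]}; mutated in place.

    Returns the ids of objects left with no attribute-value pairs (null objects).
    """
    # Pass 1: copy the linked-to object's name into each linking pair.
    for obj in objects.values():
        for pair in obj["pairs"]:
            target = pair.get("link")
            if target in objects:
                pair["link_name"] = objects[target]["name"]

    # Pass 2: drop pairs meeting the (hypothetical) predefined criteria.
    for obj in objects.values():
        obj["pairs"] = [p for p in obj["pairs"] if not should_remove(p)]
    null_ids = {oid for oid, obj in objects.items() if not obj["pairs"]}

    # Pass 3: remove links that now point at null objects.
    for obj in objects.values():
        for pair in obj["pairs"]:
            if pair.get("link") in null_ids:
                pair.pop("link")
                pair.pop("link_name", None)
    return null_ids
```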
-
Patent number: 8996460
Abstract: In one aspect, a method to generate a point-in-time (PIT) snapshot of deduplication-based volume includes generating a virtual access data structure, generating a preliminary snapshot of the volume and modifying the preliminary snapshot to point to a block according to the virtual access data structure to generate the PIT snapshot of the deduplication-based volume.
Type: Grant
Filed: March 14, 2013
Date of Patent: March 31, 2015
Assignee: EMC Corporation
Inventors: Shahar Frank, Assaf Natanzon, Jehuda Shemer
-
Patent number: 8996467
Abstract: A distributed, cloud-based storage system provides a reliable, deduplicated, scalable and high performance backup service to heterogeneous clients that connect to it via a communications network. The distributed cloud-based storage system guarantees consistent and reliable data storage while using structured storage that lacks ACID compliance. Consistency and reliability are guaranteed using a system that includes: 1) back references from shared objects to referring objects, 2) safe orders of operation for object deletion and creation, and 3) simultaneous access to shared resources through sub-resources.
Type: Grant
Filed: December 29, 2011
Date of Patent: March 31, 2015
Assignee: Druva Inc.
Inventors: Anand Apte, Faisal Puthuparackat, Jaspreet Singh, Milind Borate, Shekhar S. Deshkar
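A toy Python sketch of points 1) and 2), assuming deduplicated chunks shared by files, where each chunk keeps a set of back references to the files using it. The specific orderings shown (chunks and back references before the file record on create; file record first, then back references, then orphaned chunks on delete) are one plausible "safe order" chosen for illustration, not the patented one; the idea is that an interruption leaves at worst an unreferenced chunk, never a file pointing at a missing chunk. `BackRefStore` is a hypothetical name.

```python
class BackRefStore:
    def __init__(self):
        self.chunks = {}    # chunk id -> data (shared, deduplicated)
        self.backrefs = {}  # chunk id -> set of file ids referring to it
        self.files = {}     # file id -> list of chunk ids

    def create_file(self, file_id, chunks):
        """chunks: chunk id -> data. Chunks and back references are written first."""
        for cid, data in chunks.items():
            self.chunks.setdefault(cid, data)
            self.backrefs.setdefault(cid, set()).add(file_id)
        # The file record becomes visible only after everything it points at exists.
        self.files[file_id] = list(chunks)

    def delete_file(self, file_id):
        """Remove the file record first, then its back references, then orphaned chunks."""
        chunk_ids = self.files.pop(file_id)
        for cid in chunk_ids:
            refs = self.backrefs[cid]
            refs.discard(file_id)
            if not refs:  # no file refers to this chunk any more
                del self.backrefs[cid]
                del self.chunks[cid]
```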
-
Patent number: 8996475
Abstract: A global information management system (GIMS) includes a collection of standards and methods that allow information management on a global scale. A GIMS computer network includes a central registration database (CRD) and one or more GIMS computer systems connected over a network. Each GIMS computer system includes a relational database having a set of standardized tables. The CRD may provide a GIMS network-unique system ID to each GIMS computer system. Each GIMS computer system uses the GIMS network-unique system ID as part of a primary key for each record generated by and stored in the set of standardized tables of the GIMS database. The GIMS enables global database normalization through the globally unique identification of database records.
Type: Grant
Filed: July 17, 2013
Date of Patent: March 31, 2015
Assignee: Asibo Inc.
Inventor: Borsu Asisi Namini
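A minimal Python sketch of the composite-key idea: the registry hands out network-unique system IDs, and each system builds primary keys as `(system_id, local_sequence)`. Records from different systems can then be merged without key collisions. The class names and the tuple key layout are assumptions for illustration.

```python
import itertools


class CentralRegistrationDatabase:
    """Hands out a network-unique ID to each registering system."""

    def __init__(self):
        self._next_id = itertools.count(1)

    def register(self):
        return next(self._next_id)


class GimsSystem:
    def __init__(self, crd):
        self.system_id = crd.register()
        self._local_seq = itertools.count(1)
        self.table = {}  # primary key -> record

    def insert(self, record):
        """Primary key = (network-unique system ID, locally generated sequence number)."""
        key = (self.system_id, next(self._local_seq))
        self.table[key] = record
        return key
```

Two systems that each insert their first record produce keys `(1, 1)` and `(2, 1)`, so both tables can be merged into one without renumbering.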