Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 9116941
    Abstract: For reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, input data is partitioned into chunks, and the chunks are grouped into chunk sets. Digests are calculated for input data and stored in sets corresponding to the chunk sets. Similarity elements are calculated for the input data and the similarity elements are stored in a similarity search structure. The number of similarity elements associated with a chunk set which are currently contained in the similarity search structure is maintained for each chunk set, and when this number of a specific chunk set becomes lower than a threshold, the digests set associated with that chunk set is removed from the repository.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: August 25, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Lior Aronovich
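The counting scheme in the abstract of patent 9116941 can be illustrated with a minimal sketch. It is not the patented implementation: the class name `DigestRepository`, the rule that a new chunk set displaces the previous owner of a similarity element, and the in-memory dictionaries are all assumptions made for illustration.

```python
class DigestRepository:
    """Hypothetical sketch: drop a chunk set's digests once too few of its
    similarity elements remain in the similarity search structure."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.digests = {}      # chunk set id -> list of digests
        self.similarity = {}   # similarity element -> owning chunk set id
        self.counts = {}       # chunk set id -> elements currently in the structure

    def add_chunk_set(self, set_id, digests, elements):
        self.digests[set_id] = list(digests)
        self.counts[set_id] = 0
        for element in elements:
            # Assumption: a newer chunk set takes over an existing element.
            self.remove_element(element)
            self.similarity[element] = set_id
            self.counts[set_id] += 1

    def remove_element(self, element):
        set_id = self.similarity.pop(element, None)
        if set_id is None:
            return
        self.counts[set_id] -= 1
        # Below the threshold, the digest set is no longer worth keeping.
        if self.counts[set_id] < self.threshold:
            self.digests.pop(set_id, None)


repo = DigestRepository(threshold=2)
repo.add_chunk_set("A", ["d1", "d2"], [1, 2, 3])
repo.add_chunk_set("B", ["d3"], [3, 4])   # A keeps elements 1 and 2
repo.add_chunk_set("C", ["d4"], [2, 5])   # A keeps only element 1
```

After the third call, chunk set "A" has a single element left in the structure, which is below the threshold of 2, so its digests are dropped while "B" and "C" remain.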
  • Patent number: 9069786
    Abstract: A system and method for managing multiple fingerprint tables in a deduplicating storage system. A computer system includes a data storage medium, a first fingerprint table comprising a first plurality of entries, and a second fingerprint table comprising a second plurality of entries. Each of the first plurality of entries and each of the second plurality of entries are configured to store fingerprint related data corresponding to data stored in the data storage medium. A data storage controller is configured to select the first fingerprint table for storage of entries corresponding to data stored in the data storage medium that has been deemed more likely to be successfully deduplicated than other data stored in the data storage medium; and select the second fingerprint table for storage of entries corresponding to data stored in the data storage medium that has been deemed less likely to be successfully deduplicated than other data stored in the data storage medium.
    Type: Grant
    Filed: November 18, 2013
    Date of Patent: June 30, 2015
    Assignee: Pure Storage, Inc.
    Inventors: John Colgrove, John Hayes, Ethan Miller, Joseph S. Hasbani, Cary Sandvig
  • Patent number: 9043292
    Abstract: The technique introduced here includes a system and method for identifying and mapping duplicate data objects referenced by data objects. The technique illustratively utilizes a hierarchical tree of fingerprints for each data object to compare the data objects and identify duplicate data blocks referenced by the data objects. A progressive comparison of the hierarchical trees starts from a top layer of the hierarchical trees and proceeds toward a base layer. Between the compared data objects (i.e., the compared hierarchical trees), the technique maps matching fingerprints only at the top-most layer of the hierarchical trees at which the fingerprints match. Lower layer matching fingerprints are neither compared nor mapped. Data blocks corresponding to the matching fingerprints are then deleted. Such an identification and mapping technique substantially reduces the amount of mapping metadata stored in data objects that have been subject to deduplication.
    Type: Grant
    Filed: June 14, 2011
    Date of Patent: May 26, 2015
    Assignee: NetApp, Inc.
    Inventors: Giridhar Appaji Nag Yasa, Nagesh Panyam Chandrasekarasastry
  • Publication number: 20150142758
    Abstract: A method assigns stored documents within a distributed storage system (DSS) to various document categories to enable a target number of documents to be deleted. An intelligent storage management (ISM) utility identifies a data storage threshold value used to control data storage within the DSS. If a current storage usage exceeds the data storage threshold value, the ISM utility calculates, based on the current storage usage, a target number of documents that can be deleted from the DSS. The ISM utility utilizes a recursive process which includes assigning stored documents to groups including a set of document categories based on data characteristics of the stored documents. The ISM utility further utilizes the recursive process to delete, based on an established ordering of the groups, all of the stored documents assigned to a subset of the groups in order to remove the target number of stored documents.
    Type: Application
    Filed: August 29, 2014
    Publication date: May 21, 2015
    Inventors: Dinakaran Joseph, Devaprasad Khandurao Nadgir, Ramkumar Ramalingam, David Elliot Shepard
  • Publication number: 20150142760
    Abstract: A method and a device are described for de-duplicating a web page. The method includes: extracting at least one core sentence from a target web page; mapping each core sentence to a unique numeric value to form a first numeric value set; determining an intersection set of the first numeric value set and each second numeric value set, and the number of numeric values included in each intersection set, and determining a maximum number of numeric values included in each intersection set; and when a ratio of the maximum number to a total number of numeric values in the first numeric value set is greater than a set threshold, processing the target web page as a duplicate web page. In embodiments of the present invention, during web page de-duplication processing, accuracy can be improved, an anti-noise capability can be enhanced, and a calculating scale can be reduced.
    Type: Application
    Filed: December 23, 2014
    Publication date: May 21, 2015
    Inventors: Nan Jiang, Hui Zhang, Jia Wan
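The intersection-ratio test described in publication 20150142760 can be sketched as follows. The abstract does not say how a "core sentence" is chosen or how it is mapped to a number, so two assumptions are made here: a core sentence is any sentence with at least `min_words` words, and the numeric value is a truncated SHA-1 hash (which is only unique with high probability). The function names are hypothetical.

```python
import hashlib
import re


def core_sentence_values(text, min_words=5):
    """Map each 'core' sentence to a 64-bit number (assumed selection rule)."""
    values = set()
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.lower().split()
        if len(words) >= min_words:
            digest = hashlib.sha1(" ".join(words).encode("utf-8")).digest()
            values.add(int.from_bytes(digest[:8], "big"))
    return values


def is_duplicate(target_text, known_value_sets, threshold=0.8):
    """Duplicate if the largest overlap with any known page exceeds the threshold."""
    first = core_sentence_values(target_text)
    if not first:
        return False
    best = max((len(first & second) for second in known_value_sets), default=0)
    return best / len(first) > threshold


page = ("The quick brown fox jumps over the lazy dog. "
        "Deduplication removes repeated content from an index. Short one.")
known = [core_sentence_values(page + " Copyright footer text here.")]
```

Because short sentences such as the footer are ignored, boilerplate added around the same body text does not change the outcome, which is one plausible reading of the "anti-noise capability" the abstract mentions.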
  • Publication number: 20150142759
    Abstract: A method of detecting whether a packet from a plurality of packets transmitted by at least one transmitting station over a network has been played back is disclosed. Each packet includes a message and an identifier, the packets being successively transmitted over several consecutive time periods. The method includes receiving the packet by at least one receiving station and reading of the identifier of the received packet to obtain a received identifier, and consulting, by the receiving station, a database of identifiers already received to determine whether the received identifier has already been received. If the received identifier has not already been received, the method also includes updating the database to include the received identifier. The identifier includes an indicator of belonging to groups of packets.
    Type: Application
    Filed: November 20, 2014
    Publication date: May 21, 2015
    Inventors: Patrick DUPUTZ, Sepideh FOULADGAR, Carlos PINTO
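The lookup-then-update flow in publication 20150142759 can be sketched as below. The abstract only states that the identifier carries an indicator of group membership; treating that indicator as a dictionary key that partitions the database of received identifiers is an assumption, and `ReplayDetector` is a hypothetical name.

```python
class ReplayDetector:
    """Hypothetical sketch of a receiving station's identifier database."""

    def __init__(self):
        self.seen = {}  # group indicator -> set of identifiers already received

    def receive(self, group, identifier):
        """Return True if the packet was already received (played back)."""
        identifiers = self.seen.setdefault(group, set())
        if identifier in identifiers:
            return True
        identifiers.add(identifier)  # first sighting: update the database
        return False


detector = ReplayDetector()
first = detector.receive(group=1, identifier=42)
replay = detector.receive(group=1, identifier=42)
other_group = detector.receive(group=2, identifier=42)
```

Partitioning by group keeps each lookup confined to the identifiers of one group of packets rather than the whole history.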
  • Publication number: 20150142757
    Abstract: The disclosure provides an information processing method and an electronic device. The electronic device generates M components to be embedded into a first application program when installing a recording application program, M is an integer greater than or equal to 1. There is an association relationship between the M components and the recording application program. In a case where the M components are embedded into the first application program, the method includes: when the first application program runs, displaying a first graphical interface corresponding to the first application program by the electronic device, the first graphical interface including the M components; obtaining a first triggering operation for a first component of the M components; collecting, in response to the first triggering operation, first data content under the first graphical interface directly; and storing the collected first data content.
    Type: Application
    Filed: March 28, 2014
    Publication date: May 21, 2015
    Applicant: Lenovo (Beijing) Co., Ltd.
    Inventors: Kai Li, Wei Huang, Wenhui Lu, Kangli Zhao
  • Publication number: 20150142756
    Abstract: Deduplication in a distributed file system is described. Key classes are determined from a set of potential keys, the potential keys used to represent file content stored by the file system. Control of the key classes is apportioned among index nodes of the file system. Nodes in the file system, during deduplication of data chunks of the file content, generate keys calculated from the data chunks. The keys are distributed among the index nodes based on relations between the keys and the key classes controlled by the index nodes.
    Type: Application
    Filed: June 14, 2011
    Publication date: May 21, 2015
    Inventors: Mark Robert Watkins, Boris Zuckerman, Oskar Y. Batuner
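The routing idea in publication 20150142756 can be sketched as follows. The abstract does not specify how keys are computed or how key classes are defined, so this sketch assumes SHA-256 keys, classes derived from the first key byte, and round-robin apportioning of classes to index nodes. All function names are hypothetical.

```python
import hashlib


def key_for_chunk(chunk: bytes) -> bytes:
    """Key calculated from a data chunk (assumed: SHA-256)."""
    return hashlib.sha256(chunk).digest()


def key_class(key: bytes, num_classes: int = 256) -> int:
    """Assumed relation between key and class: first byte modulo class count."""
    return key[0] % num_classes


def assign_classes(nodes, num_classes: int = 256):
    """Apportion control of the key classes among index nodes (round robin)."""
    return {c: nodes[c % len(nodes)] for c in range(num_classes)}


def route(chunk: bytes, class_owner, num_classes: int = 256):
    """Return the index node responsible for this chunk's key, plus the key."""
    key = key_for_chunk(chunk)
    return class_owner[key_class(key, num_classes)], key


nodes = ["index-0", "index-1", "index-2"]
owners = assign_classes(nodes)
node, key = route(b"hello world", owners)
```

Because the class is a pure function of the key, every node that sees the same chunk sends its key to the same index node, so duplicates meet in one place.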
  • Publication number: 20150142755
    Abstract: A control unit of a storage apparatus divides received data into one or more chunks and compresses the divided chunk(s); and regarding the chunk whose compressibility is equal to or lower than a threshold value, the control unit does not store the chunk in the first storage area, but calculates a hash value of the compressed chunk, compares the hash value with a hash value of another data already stored in the second storage area and executes first deduplication processing; and regarding the chunk whose compressibility is higher than the threshold value, the control unit stores the compressed chunk in the first storage area, reads the compressed chunk from the first storage area, calculates a hash value of the compressed chunk, compares the relevant hash value with a hash value of another data already stored in the second storage area, and executes secondary deduplication processing.
    Type: Application
    Filed: August 24, 2012
    Publication date: May 21, 2015
    Applicants: HITACHI, LTD., HITACHI INFORMATION & TELECOMMUNICATION ENGINEERING, LTD.
    Inventor: Masayuki Kishi
  • Patent number: 9037551
    Abstract: Aspects of the present disclosure provide techniques that determine whether an attribute value is associated with each configuration item in a plurality of configuration items. If it is determined that the attribute value is associated with each configuration item in the plurality of configuration items, the attribute value is deemed a redundant attribute value.
    Type: Grant
    Filed: March 7, 2012
    Date of Patent: May 19, 2015
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: David Azriel, Nimrod Nahum, Nir Mardiks
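The redundancy check in patent 9037551 amounts to finding attribute values shared by every configuration item. A minimal sketch, assuming each configuration item is a plain dictionary with hashable values (the function name is hypothetical):

```python
def redundant_attribute_values(items):
    """Return the (attribute, value) pairs present in every configuration item."""
    if not items:
        return set()
    common = set(items[0].items())
    for item in items[1:]:
        common &= set(item.items())
    return common


config_items = [
    {"os": "linux", "owner": "alice", "site": "east"},
    {"os": "linux", "owner": "bob", "site": "east"},
    {"os": "linux", "owner": "carol", "site": "west"},
]
redundant = redundant_attribute_values(config_items)
```

Here only `os = linux` is associated with every item, so it is the only value deemed redundant.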
  • Patent number: 9037825
    Abstract: Conditions are enforced to prevent unintended deletion of data stored by a data storage system. For example, to delete a collection of data, a condition on the collection of data's size may be enforced. The collection may be required to be empty, for example. In addition, a condition that there not exist a pending data processing operation that can affect fulfillment of the condition on the collection of data's size is also enforced.
    Type: Grant
    Filed: November 20, 2012
    Date of Patent: May 19, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Bryan James Donlan, Sandeep Kumar
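The two conditions in patent 9037825 can be sketched as guards on a delete call. This is an illustration only: the `Collection` class, the `pending_writes` counter standing in for "pending data processing operations", and the `DeleteRefused` exception are all hypothetical.

```python
class DeleteRefused(Exception):
    """Raised when a guard condition on deletion is not met."""


class Collection:
    def __init__(self):
        self.items = set()
        self.pending_writes = 0  # operations that could still add items


def delete_collection(store, name):
    collection = store[name]
    # Condition 1: the collection must be empty.
    if collection.items:
        raise DeleteRefused("collection is not empty")
    # Condition 2: nothing pending may make it non-empty again.
    if collection.pending_writes:
        raise DeleteRefused("pending operations could affect the collection")
    del store[name]


store = {"logs": Collection()}
store["logs"].pending_writes = 1
try:
    delete_collection(store, "logs")
    refused = False
except DeleteRefused:
    refused = True

store["logs"].pending_writes = 0
delete_collection(store, "logs")
```

The second guard matters because an empty collection with an in-flight write is only momentarily empty.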
  • Patent number: 9037546
    Abstract: In accordance with embodiments, there are provided mechanisms and methods for automatic code generation for database object deletion. These mechanisms and methods for automatic code generation for database object deletion can generate code for deleting database objects in an automated manner. The ability to generate code for deleting database objects in an automated manner can enable the efficient and accurate deletion of database objects, including database objects with relationships to other database objects.
    Type: Grant
    Filed: March 25, 2011
    Date of Patent: May 19, 2015
    Assignee: salesforce.com, inc.
    Inventors: Simon Wong, Sonali Agrawal
  • Patent number: 9037534
    Abstract: A data transformation system receives data from one or more external source systems and stores and transforms the data for providing to reporting systems. The data transformation system maintains multiple versions of data received from an external source system. The data transformation system can combine data from different versions of data and provide to the reporting system. As a result, external source systems that do not maintain data in a format appropriate for reporting systems and/or do not maintain sufficient historical data to generate different types of reports are able to generate these reports. The data transformation system can also enhance older versions of data stored in the system or exclude portions of data from reports. The data transformation system can purge older versions of data so that older data that is less frequently requested is maintained at a lower frequency than recent data.
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: May 19, 2015
    Assignee: GoodData Corporation
    Inventor: Pavel Kolesnikov
  • Patent number: 9037536
    Abstract: A system and method for automated database management are provided. Statistics relating to operation of a database may be collected, wherein the database comprises one or more database objects. Characteristics of the database objects may be determined, either automatically or by user intervention, using the collected statistics, one or more policies, and/or one or more definitions. The policies and definitions may be defaults or may be customized by a user. Actions to be performed on the database objects may be determined, either automatically or by user intervention, based on the characteristics of the database objects. A schedule for performing the actions on the database objects may be automatically determined. The actions may be performed on the database objects based on the schedule.
    Type: Grant
    Filed: October 30, 2007
    Date of Patent: May 19, 2015
    Assignee: BMC SOFTWARE, INC.
    Inventors: Melody Vos, Jeff Slavin
  • Publication number: 20150134624
    Abstract: Methods, systems, and computer readable media for content item purging functionality are provided. A content item purger, such as may be incorporated within a local client application of a content management system, leverages its knowledge as to which items have been uploaded to the content management system, and how long content items have been stored on the user device, to propose items for local deletion and thus reclaiming storage on the user device. A content item purger may run on one or more devices of a user associated with an account on a content management system upon various triggering events, and may run with or without user interaction, thus maintaining available user device memory capacity at all times.
    Type: Application
    Filed: November 12, 2013
    Publication date: May 14, 2015
    Applicant: Dropbox, Inc.
    Inventors: Michael Dwan, Anthony Grue, Daniel Kluesing
  • Publication number: 20150134625
    Abstract: Technology is disclosed for improving the storage efficiency and communication efficiency for a storage client device by maximizing the cache hit rate and minimizing data requests to the storage server. The storage server provides a duplication list to the storage client device. The duplication list contains references (e.g. storage addresses) to data blocks that contain duplicate data content. The storage client uses the duplication list to improve the cache hit rate. The duplication list is pruned to contain references to data blocks relevant to the storage client device. The storage server can prune the duplication list based on a working set of storage objects for a client. Alternatively, the storage server can prune the duplication list based on content characteristics, e.g. duplication degree and access frequency. Duplicate blocks to which the client does not have access can be excluded from the duplication list.
    Type: Application
    Filed: November 13, 2013
    Publication date: May 14, 2015
    Inventors: James F. Lentini, Anshul Madan, Deepak R. Kenchammana-Hosekote
  • Publication number: 20150134603
    Abstract: A computer based method and system for managing contact information from the contacts of a user. The contact information is collected and transformed to a consistent format, which permits resolution of conflicting information from multiple sources, such as differences in location information from different social mediums. This transformation enables cross media communication, such as notifications between users and contacts about location or other matters. In addition, the transformation permits a single communication to be transformed for use in multiple social media platforms, whether to a single contact or a select group. User interfaces are provided for display and use of such functional interactions.
    Type: Application
    Filed: October 15, 2014
    Publication date: May 14, 2015
    Applicant: Connect Software Corporation
    Inventors: ZACH MELAMED, Ryan Allis, Anima Sarah LaVoy, Jared Weinstock, Dan Ho, Nick Gonzalez, Lilia Tamm, Dana Chambers
  • Publication number: 20150134623
    Abstract: A method, system, and data storage medium for parallel partitioning of input data into chunks for data deduplication, comprising: dividing said input data into segments; for at least one segment, appending a portion of a subsequent segment; searching the segments in parallel for candidate breaking points; and partitioning each segment into chunks based on a group of final breaking points selected from said candidate breaking points.
    Type: Application
    Filed: February 17, 2011
    Publication date: May 14, 2015
    Applicant: JITCOMM NETWORKS PTE LTD
    Inventor: Yong Steven Liu
  • Patent number: 9032032
    Abstract: Architecture for efficiently ensuring that data is stored to the desired destination datastore such as for replication processes. A copy of data (e.g., messages) sent to a datastore for storage is stored at an alternate location until a received signal indicates that the storage and replication was successful. As soon as the feedback signal is received, the copy is removed from the alternate location, and hence, improves input/output (I/O) and storage patterns. The feedback mechanism can also be used for monitoring the status of data transport associated with log shipping, for example, and taking the appropriate actions when storage (e.g., replication) is not being performed properly.
    Type: Grant
    Filed: June 26, 2008
    Date of Patent: May 12, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: David Mills, Todd Luttinen, Victor Boctor
  • Publication number: 20150127621
    Abstract: Systems and methods of data deduplication are disclosed comprising generating a hash value of a data block and comparing the hash value to a table in a first memory that correlates ranges of hash values with buckets of hash values in a second memory different from the first memory. A bucket is identified based on the comparison and the bucket is searched to locate the hash value. If the hash value is not found in the bucket, the hash value is stored in the bucket and the data block is stored in a third memory. The first memory may be volatile memory and the second memory may be non-volatile random access memory, such as an SSD. Rebalancing of buckets and the table, and use of additional metadata to determine where data blocks should be stored, are also disclosed.
    Type: Application
    Filed: November 4, 2014
    Publication date: May 7, 2015
    Inventor: Chin L. KUO
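The range-table lookup in publication 20150127621 can be sketched in memory as follows. The three "memories" are modeled as ordinary Python containers, the hash is assumed to be SHA-256, and buckets are assumed to cover equal ranges of the first 32 bits of the hash. Rebalancing is omitted, and `BucketIndex` is a hypothetical name.

```python
import bisect
import hashlib


class BucketIndex:
    def __init__(self, num_buckets=16):
        span = 2**32 // num_buckets
        # "First memory": table of exclusive upper bounds, one per hash range.
        self.upper_bounds = [span * (i + 1) for i in range(num_buckets)]
        self.upper_bounds[-1] = 2**32
        # "Second memory": buckets of hash values.
        self.buckets = [set() for _ in range(num_buckets)]
        # "Third memory": the data blocks themselves.
        self.blocks = {}

    def store(self, block: bytes) -> bool:
        """Store the block if its hash is new; return False for a duplicate."""
        digest = hashlib.sha256(block).digest()
        prefix = int.from_bytes(digest[:4], "big")
        bucket = self.buckets[bisect.bisect_right(self.upper_bounds, prefix)]
        if digest in bucket:
            return False
        bucket.add(digest)
        self.blocks[digest] = block
        return True


index = BucketIndex()
results = [index.store(b"alpha"), index.store(b"alpha"), index.store(b"beta")]
```

Only the small range table needs to stay in fast memory; each lookup then touches a single bucket.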
  • Publication number: 20150127607
    Abstract: Data management systems and methods include a cloud-based platform coupled to a system of agents or folders hosted on client devices. The platform does not store actual data but instead makes use of metadata provided by the agents to track a location of all data in the system and manage the distributed storage, movement and processing of the actual data among the agents. In so doing, the platform pools networked storage into “virtual clusters” using local storage at the agents. The agents collectively monitor, store, and transfer or move data, and perform data processing operations as directed by the platform, as described in detail herein. The agents include agents hosted on or coupled to processor-based devices, agents hosted on devices of a local area network, agents hosted on devices of a wide area network, agents hosted on mobile devices, and agents hosted on cloud-based devices.
    Type: Application
    Filed: September 15, 2014
    Publication date: May 7, 2015
    Inventors: Bret SAVAGE, Casey MARSHALL, Geoffrey STUTCHMAN, Ross ELTHERINGTON, Steve OWENS, George NORTHUP
  • Publication number: 20150127622
    Abstract: Mechanisms are provided for performing network efficient deduplication. Segments are extracted from files received for deduplication at a host connected to a target over one or more networks and/or fabrics in a deduplication system. Segment identifiers (IDs) are determined and compared with segment IDs for segments already deduplicated. Segments already deduplicated need not be transmitted to a target system. References and reference counts are modified at a target system. Updating references and reference counts may involve modifying filemaps, dictionaries, and datastore suitcases for both already deduplicated and not already deduplicated segments.
    Type: Application
    Filed: January 13, 2015
    Publication date: May 7, 2015
    Applicant: Dell Products L.P.
    Inventor: Vinod Jayaraman
  • Patent number: 9026503
    Abstract: The techniques introduced here provide for enabling deduplication operations for a file system without significantly affecting read performance of the file system due to fragmentation of the data sets in the file system. The techniques include determining, by a storage server that hosts the file system, a level of fragmentation that would be introduced to a data set stored in the file system as a result of performing a deduplication operation on the data set. The storage server then compares the level of fragmentation with a threshold value and determines whether to perform the deduplication operation based on a result of comparing the level of fragmentation with the threshold value. The threshold value represents an acceptable level of fragmentation in the data sets of the file system.
    Type: Grant
    Filed: February 29, 2012
    Date of Patent: May 5, 2015
    Assignee: NetApp, Inc.
    Inventors: Alok Sharma, Sunil Walwaiker, Vaijayanti Bharadwaj
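The decision rule in patent 9026503 can be sketched as below. The abstract does not define how the level of fragmentation is measured; this sketch assumes it is the fraction of adjacent logical blocks whose physical addresses are not consecutive. The function names and the `remap` dictionary (old address to address of the shared duplicate block) are hypothetical.

```python
def fragmentation(block_addrs):
    """Fraction of adjacent block pairs that are not physically contiguous."""
    if len(block_addrs) < 2:
        return 0.0
    breaks = sum(1 for a, b in zip(block_addrs, block_addrs[1:]) if b != a + 1)
    return breaks / (len(block_addrs) - 1)


def should_deduplicate(current_addrs, remap, threshold):
    """Estimate the layout after deduplication and compare it to the threshold."""
    proposed = [remap.get(addr, addr) for addr in current_addrs]
    return fragmentation(proposed) <= threshold


layout = [100, 101, 102, 103]       # contiguous data set
remap = {101: 500}                  # block 101 duplicates a block stored at 500
strict = should_deduplicate(layout, remap, threshold=0.5)
lenient = should_deduplicate(layout, remap, threshold=0.7)
```

Redirecting one block out of four introduces two breaks in three adjacent pairs, so the operation is skipped under the stricter threshold and allowed under the more lenient one.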
  • Patent number: 9026504
    Abstract: Embodiments of the invention are directed to a system, method, or computer program product for providing expedited loading/inserting of data by an entity. Specifically, the invention expedites the loading/inserting of large quantities of data to database tables. Initially received data for loading is processed, via multi-row insert, onto in-memory or temporary tables. The data is staged on a temporary table while the appropriate base table is determined. Once determined, data from the temporary table is pointed to the base table. In this way, a massive amount of data loading from the temporary table to a base table may occur. This prevents logging and locking associated with adding individual data points or rows to a base table independently. Errors are checked and processed accordingly. Once updated, the data on the temporary table is deleted en masse and a check point restart is issued.
    Type: Grant
    Filed: February 4, 2013
    Date of Patent: May 5, 2015
    Assignee: Bank of America Corporation
    Inventors: Ron G. Rambo, Steven A. Walker
  • Publication number: 20150120681
    Abstract: A system and a method to aggregate multiple content servers' metadata to a local database are provided that enable various features such as improved performance, non-searchable server support, duplicate handling and protocol independence. The system performs local content crawling, remote server crawling and remote server searching to create an aggregated database of metadata. The content is located in a single database. Hence, the duplicate metadata can be removed easily.
    Type: Application
    Filed: October 27, 2013
    Publication date: April 30, 2015
    Applicant: Videon Central, Inc.
    Inventors: Robert Behe, Robert Kennedy, Russ Shanahan, Derek Andrews, James Condon
  • Publication number: 20150120680
    Abstract: One or more techniques and/or systems are provided for providing a discussion summary corresponding to a search query and/or for providing discussion session search results. For example, discussion data (e.g., corresponding to real-time messaging, such as a microblog discussion) may be evaluated to identify a discussion topic for a discussion session (e.g., a kitchen renovation topic may be assigned to a 1 hour exchange of kitchen renovation messages by a discussion group). A discussion summary of a discussion session may be provided based upon the discussion session having a discussion topic corresponding to a search query topic of a search query. The discussion summary may be provided along with other results for the query and may describe the discussion group, identifiers such as hashtags used by the discussion group, meeting dates/times, average number(s) of participants, other discussion sessions hosted by the discussion group, future discussion sessions, and/or other information.
    Type: Application
    Filed: October 24, 2013
    Publication date: April 30, 2015
    Applicant: Microsoft Corporation
    Inventors: Omar Alonso, Kartikay Khandelwal, Mohamed Mansour, Paul Ko, Nina Mishra, Krishnaram Kenthapadi, Abhimanyu Das
  • Publication number: 20150120682
    Abstract: Embodiments of the present invention disclose a method, computer program product, and system for recognizing patterns in log files with unknown grammar. A computer replaces one or more alphanumeric strings with a first alphanumeric character to generate a first resulting string. The computer then replaces one or more identical pairs of characters of the first resulting string with a second alphanumeric character to generate a second resulting string. The computer then replaces one or more consecutive instances of the second alphanumeric character, in the second resulting string, with one instance of the second alphanumeric character to generate a compressed string.
    Type: Application
    Filed: October 28, 2013
    Publication date: April 30, 2015
    Applicant: International Business Machines Corporation
    Inventors: Fiona M. Crowther, Geza Geleji, Martin A. Ross
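The three replacement steps in publication 20150120682 can be sketched with regular expressions. The abstract leaves the details open, so this is one possible reading: every run of alphanumeric characters becomes `a`, every pair of identical adjacent characters becomes `b`, and runs of `b` collapse to a single `b`. The choice of `a` and `b` and the function name are assumptions.

```python
import re


def compress_log_line(line: str) -> str:
    # Step 1: replace each alphanumeric string with the first character.
    first = re.sub(r"[A-Za-z0-9]+", "a", line)
    # Step 2: replace each identical pair of characters with the second character.
    second = re.sub(r"(.)\1", "b", first)
    # Step 3: collapse consecutive instances of the second character.
    return re.sub(r"b+", "b", second)


sig_one = compress_log_line("error code 42   at line 7")
sig_two = compress_log_line("fault id 9913   in file 12")
separator = compress_log_line("========")
```

Lines that differ only in their words and numbers reduce to the same compressed string, so they can be grouped as one pattern without knowing the log grammar.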
  • Patent number: 9020900
    Abstract: A distributed, deduplicated storage system according to certain embodiments is arranged in a parallel configuration including multiple deduplication nodes. Deduplicated data is distributed across the deduplication nodes. The deduplication nodes can be networked together and communicate with one another using a light-weight, customized communication scheme (e.g., a scheme based on FTP or HTTP). In some cases, deduplication management information including deduplication signatures and/or other metadata is stored separately from the deduplicated data in deduplication management nodes, improving performance and scalability.
    Type: Grant
    Filed: December 13, 2011
    Date of Patent: April 28, 2015
    Assignee: CommVault Systems, Inc.
    Inventors: Manoj Kumar Vijayan Retnamma, Rajiv Kottomtharayil, Deepak Raghunath Attarde
  • Patent number: 9020909
    Abstract: Techniques and mechanisms are provided to instantly clone active files including active optimized files. When a new instance of an active file is created, a new stub is generated in the user namespace and a block map file is cloned. The block map file includes the same offsets and location pointers that existed in the original block map file. No user file data needs to be copied. If the cloned file is later modified, the behavior can be the same as what happens when a de-duplicated file is modified.
    Type: Grant
    Filed: February 7, 2013
    Date of Patent: April 28, 2015
    Assignee: Dell Products L.P.
    Inventors: Vinod Jayaraman, Goutham Rao, Ratna Manoj Bolla
  • Patent number: 9020903
    Abstract: A method is used in recovering duplicate blocks in file systems. A duplicate file system block is detected in a file system. The duplicate file system block is referred to by a first inode associated with a first file of the file system and a second inode associated with a second file of the file system. Metadata of the duplicate file system block is evaluated. Based on the evaluation, a set of inodes in the file system is determined. Each inode of the set of inodes refers to the duplicate file system block. Based on the determination, the set of inodes is updated.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: April 28, 2015
    Assignee: EMC Corporation
    Inventors: Srinivasa Rao Vempati, Dixitkumar Vishnubhai Patel, Jean-Pierre Bono, Marshall Hansi Wu
  • Publication number: 20150112950
    Abstract: A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independent of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on less than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.
    Type: Application
    Filed: December 23, 2014
    Publication date: April 23, 2015
    Inventors: Xianbo Zhang, Fanglu Guo, Weibao Wu
  • Patent number: 9015126
    Abstract: Techniques for effective delete operations in a distributed data store with eventually consistent replicated entries include determining to delete a particular entry from the distributed data store. Each entry includes a first field that holds data that indicates a key and a second field that holds data that indicates content associated with the key and a third field that holds data that indicates a version for the content. The method also comprises causing, at least in part, actions that result in marking the particular entry as deleted without removing the particular entry, and updating a version in the third field for the particular entry.
    Type: Grant
    Filed: April 21, 2011
    Date of Patent: April 21, 2015
    Assignee: Nokia Corporation
    Inventors: Mark Rambacher, Abhijit Bagri, Yekesa Kosuru
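The mark-as-deleted approach in patent 9015126 resembles what is commonly called a tombstone. A minimal sketch follows, assuming an integer version counter and a "highest version wins" merge between replicas; the merge rule, the `deleted` flag, and all names are assumptions rather than details from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str          # first field
    content: object   # second field
    version: int      # third field
    deleted: bool = False


def delete(store, key):
    """Mark the entry as deleted without removing it, and bump its version."""
    entry = store[key]
    entry.deleted = True
    entry.version += 1


def merge(local: Entry, remote: Entry) -> Entry:
    """Assumed reconciliation rule: the entry with the higher version wins."""
    return remote if remote.version > local.version else local


replica_a = {"k": Entry("k", "payload", version=3)}
replica_b = {"k": Entry("k", "payload", version=3)}
delete(replica_a, "k")
replica_b["k"] = merge(replica_b["k"], replica_a["k"])
```

Because the deleted entry still exists and carries a newer version, a replica that has not yet seen the delete cannot resurrect the old content during reconciliation.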
  • Patent number: 9015552
    Abstract: Various embodiments for differentiating between data and stubs pointing to a parent copy of deduplicated data are provided. Undeduplicated data is stored with a checksum of an initial value as a first cyclic redundancy check (CRC) seed. A stub pointing to the parent copy of the deduplicated data is stored with an additional checksum of a differing, additional initial value as a second CRC seed.
    Type: Grant
    Filed: May 14, 2013
    Date of Patent: April 21, 2015
    Assignee: International Business Machines Corporation
    Inventors: Allen K. Bates, Nils Haustein, Craig A. Klein, Frank Krick, Ulf Troppens, Daniel J. Winarski
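The seed-based distinction in patent 9015552 can be sketched with Python's `zlib.crc32`, which accepts a starting value as its second argument. The two seed constants are arbitrary choices for illustration, and the function names are hypothetical.

```python
import zlib

DATA_SEED = 0x00000000   # assumed initial value for undeduplicated data
STUB_SEED = 0xFFFFFFFF   # assumed, differing initial value for stubs


def checksum(payload: bytes, is_stub: bool) -> int:
    """Compute the CRC with the seed that matches the record type."""
    return zlib.crc32(payload, STUB_SEED if is_stub else DATA_SEED)


def classify(payload: bytes, stored_crc: int) -> str:
    """Infer the record type from which seed reproduces the stored checksum."""
    if zlib.crc32(payload, DATA_SEED) == stored_crc:
        return "data"
    if zlib.crc32(payload, STUB_SEED) == stored_crc:
        return "stub"
    return "corrupt"


record = b"pointer-to-parent-copy"
as_stub = classify(record, checksum(record, is_stub=True))
as_data = classify(record, checksum(record, is_stub=False))
```

The same bytes yield different checksums under the two seeds, so the checksum alone is enough to tell a stub from real data without a separate type flag.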
  • Patent number: 9015131
    Abstract: When an online storage service is used to expand a storage capacity of a file server, an amount of communication in synchronization processing and an amount of data retained on the online storage service are reduced to save an amount of charge. In a kernel module provided with a storage area on the online storage service, files are divided into block files and managed, and blocks overlapping with an already registered and saved block file group are not uploaded, but only configuration information of the files is changed. A mechanism is adopted, in which DBs for managing meta information and elimination of duplication are divided and managed, and only updated sections are appropriately uploaded.
    Type: Grant
    Filed: August 26, 2011
    Date of Patent: April 21, 2015
    Assignee: Hitachi Solutions, Ltd.
    Inventors: Yasuhiro Kirihata, Kouji Nakayama
  • Patent number: 9015212
    Abstract: A system for exposing data stored in a cloud computing system to a content delivery network provider includes a database configured to receive and store metadata about the data, the database being implemented in the cloud computing system to store configuration metadata for the data related to the content delivery network, and an origin server configured to receive requests for the data from the content delivery network provider, and configured to provide the data to the content delivery network provider based on the metadata.
    Type: Grant
    Filed: October 16, 2012
    Date of Patent: April 21, 2015
    Assignee: Rackspace US, Inc.
    Inventors: Goetz David, Gregory Lee Holt
  • Publication number: 20150106344
    Abstract: Systems and methods of providing a configurable table of rules that defines a repository/archive search priority that includes multiple repositories/archives. In this manner, repositories/archives are searched successively, and the search is stopped after a first result is returned. Repositories/archives are searched in priority order based on their location in pre-configured “tiers.” This enables searches to be directed to the repositories/archives that are best able to handle the load for different types of searches and for different types of studies. A duplicate priority list enables an administrator to designate which repository/archive will appear on the search results list if duplicates are found. For example, in clinical study archiving systems, the search priority enables an administrator to direct searches to the repository best able to handle the load for different types of searches and for different types of studies.
    Type: Application
    Filed: October 7, 2014
    Publication date: April 16, 2015
    Inventor: Mark Allan Wagner
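The priority search described in this abstract can be sketched roughly as below. The data model is an assumption made for illustration: repositories are plain dictionaries keyed by a study identifier, tiers are lists of repository names, and the function name `prioritized_search` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def prioritized_search(
    tiers: List[List[str]],
    repositories: Dict[str, Dict[str, str]],
    study_id: str,
) -> Optional[Tuple[str, str]]:
    """Search repositories tier by tier and stop at the first result."""
    for tier in tiers:                 # tiers are listed in priority order
        for name in tier:              # order within a tier is also a priority
            record = repositories[name].get(study_id)
            if record is not None:
                return name, record    # first hit ends the search
    return None                        # no repository holds the study
```

With `tiers = [["fast_cache"], ["site_archive", "cloud_archive"]]`, a study present in both `fast_cache` and `cloud_archive` is returned from `fast_cache`, and the lower tier is never queried.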
  • Publication number: 20150106345
    Abstract: According to at least one embodiment, a data storage system is provided. The data storage system includes memory, at least one processor in data communication with the memory, and a deduplication director component executable by the at least one processor. The deduplication director component is configured to receive data for storage on the data storage system, analyze the data to determine whether the data is suitable for at least one of summary-based deduplication, content-based deduplication, and no deduplication, and store, in a common object store, at least one of the data and a reference to duplicate data stored in the common object store.
    Type: Application
    Filed: October 15, 2014
    Publication date: April 16, 2015
    Inventors: Ronald Ray Trimble, Jeffrey V. Tofano, Thomas R. Ramsdell, Jon Christopher Kennedy
  • Publication number: 20150106343
    Abstract: A system and method for global data de-duplication in a cloud storage environment utilizing a plurality of data centers is provided. Each cloud storage gateway appliance divides a data stream into a plurality of data objects and generates a content-based hash value as a key for each data object. An IMMUTABLE PUT operation is utilized to store the data object at the associated key within the cloud.
    Type: Application
    Filed: October 16, 2013
    Publication date: April 16, 2015
    Applicant: NetApp, Inc.
    Inventors: Kiran Nenmeli Srinivasan, Kishore Kasi Udayashankar, Swetha Krishnan
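A rough sketch of content-keyed storage with a write-once put follows. Several details are assumptions: the abstract does not say how the stream is divided (fixed-size objects are used here), which hash is used (SHA-256 here), or how IMMUTABLE PUT behaves on an existing key (here it leaves the stored object untouched). The `ImmutableStore` class is a hypothetical in-memory stand-in for the cloud store.

```python
import hashlib
from typing import Dict, List


class ImmutableStore:
    """In-memory stand-in for a cloud object store (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def immutable_put(self, key: str, value: bytes) -> bool:
        """Write value under key only if the key is unused; report if written."""
        if key in self._objects:
            return False
        self._objects[key] = value
        return True

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


def store_stream(store: ImmutableStore, stream: bytes,
                 object_size: int = 4) -> List[str]:
    """Split a stream into objects and store each under its content hash."""
    keys = []
    for offset in range(0, len(stream), object_size):
        obj = stream[offset:offset + object_size]
        key = hashlib.sha256(obj).hexdigest()  # content-based key
        store.immutable_put(key, obj)          # duplicates are not rewritten
        keys.append(key)
    return keys
```

Since the key is derived from the content, identical objects written by different gateways map to the same key, and the returned key list is enough to reassemble the stream.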
  • Publication number: 20150106336
    Abstract: A mechanism is provided for cross-allocated block repair in a mounted file system. A set of cross-allocated blocks are identified from a plurality of blocks within an inode of the mounted file system, based on a corresponding bit associated with each cross-allocated block in a duplicated block information bitmap being in a first identified state. The set of cross-allocated blocks are repaired using a user-defined repair process. Then one or more of the set of cross-allocated blocks are deallocated based on results of the user-defined repair process.
    Type: Application
    Filed: December 15, 2014
    Publication date: April 16, 2015
    Inventors: Kalyan C. Gunda, Srikanth Srinivasan
  • Patent number: 9009435
    Abstract: Systems and computer program products are provided for optimizing selection of files for deletion from one or more data storage devices to free up a predetermined amount of space in the one or more data storage devices. A method includes analyzing an effective space occupied by each file of a plurality of files in the one or more data storage devices, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for deletion, based on the identified one or more data blocks, and deleting the one or more candidate files for deletion from the one or more data storage devices.
    Type: Grant
    Filed: August 13, 2012
    Date of Patent: April 14, 2015
    Assignee: International Business Machines Corporation
    Inventors: Duane Mark Baldwin, Sandeep Ramesh Patil, Riyazahamad Moulasab Shiraguppi, Prashant Sodhiya
  • Patent number: 9009434
    Abstract: Systems and computer program products are provided for optimizing selection of files for eviction from a first storage pool to free up a predetermined amount of space in the first storage pool. A method includes analyzing an effective space occupied by each file of a plurality of files in the first storage pool, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for eviction, based on the identified one or more data blocks, and evicting the one or more candidate files for eviction from the first storage pool to a second storage pool.
    Type: Grant
    Filed: August 13, 2012
    Date of Patent: April 14, 2015
    Assignee: International Business Machines Corporation
    Inventors: Duane Mark Baldwin, Sandeep Ramesh Patil, Riyazahamad Moulasab Shiraguppi, Prashant Sodhiya
  • Publication number: 20150100554
    Abstract: Systems, methods, and other embodiments associated with attribute redundancy removal are described. In one embodiment, a method includes identifying redundant attribute values in a group of attributes that describe two items. The example method also includes generating a pruned group of attributes having the redundant attribute values removed. The similarity of the two items is calculated based, at least in part, on the pruned group of attribute values.
    Type: Application
    Filed: October 31, 2013
    Publication date: April 9, 2015
    Inventors: Z. Maria Wang, Su-Ming Wu
  • Patent number: 9003152
    Abstract: Methods, systems, and computer program products are provided for optimizing selection of files for eviction from a first storage pool to free up a predetermined amount of space in the first storage pool. A method includes analyzing an effective space occupied by each file of a plurality of files in the first storage pool, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for eviction, based on the identified one or more data blocks, and evicting the one or more candidate files for eviction from the first storage pool to a second storage pool.
    Type: Grant
    Filed: November 5, 2013
    Date of Patent: April 7, 2015
    Assignee: International Business Machines Corporation
    Inventors: Duane M. Baldwin, Sandeep R. Patil, Riyazahamad M. Shiraguppi, Prashant Sodhiya
  • Patent number: 9002805
    Abstract: Methods and apparatus for conditional deletes of storage objects are disclosed. A storage medium comprises program instructions that when executed, implement a metadata node of a storage service in which a protocol based on sequence numbers is used to resolve update conflicts. The instructions store, as part of a conditional deletion record associated with a key of a particular storage object identified as a deletion candidate, a deletion sequence number derived from a particular modification sequence number of the object. In accordance with the protocol, the instructions determine whether an additional modification sequence number larger than the deletion sequence number has been generated in response to an operation associated with the key. If such an additional sequence number has been generated, the deletion of the storage object is canceled.
    Type: Grant
    Filed: December 14, 2012
    Date of Patent: April 7, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey Michael Barber, Praveen Kumar Gattu, Christopher Henning Elving, Derek Ernest Denny-Brown, II, Carl Yates Perry
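The sequence-number check in this abstract might look like the sketch below. The `MetadataNode` class, its method names, and the choice to set the deletion sequence number equal to the object's latest modification sequence number are assumptions for illustration; the abstract only says the deletion number is derived from a modification number.

```python
from typing import Dict, Tuple


class MetadataNode:
    """Toy metadata node that orders updates with sequence numbers."""

    def __init__(self) -> None:
        self._seq = 0
        self.objects: Dict[str, Tuple[str, int]] = {}  # key -> (value, mod seq)
        self.pending: Dict[str, int] = {}              # key -> deletion seq

    def put(self, key: str, value: str) -> int:
        """Create or modify an object and record a new modification number."""
        self._seq += 1
        self.objects[key] = (value, self._seq)
        return self._seq

    def schedule_delete(self, key: str) -> None:
        """Record a conditional deletion for a deletion candidate."""
        _, mod_seq = self.objects[key]
        self.pending[key] = mod_seq  # assumed derivation: same number

    def apply_delete(self, key: str) -> bool:
        """Delete unless a larger modification number appeared meanwhile."""
        del_seq = self.pending.pop(key)
        entry = self.objects.get(key)
        if entry is not None and entry[1] > del_seq:
            return False             # newer write wins; deletion is canceled
        self.objects.pop(key, None)
        return True
```

If a `put` on the same key lands between `schedule_delete` and `apply_delete`, the deletion is canceled and the newer value survives.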
  • Patent number: 9003151
    Abstract: Methods, systems, and computer program products are provided for optimizing selection of files for deletion from one or more data storage devices to free up a predetermined amount of space in the one or more data storage devices. A method includes analyzing an effective space occupied by each file of a plurality of files in the one or more data storage devices, identifying, from the plurality of files, one or more data blocks making up a file to free up the predetermined amount of space based on the analysis of the effective space of each file of the plurality of files, selecting one or more of the plurality of files as one or more candidate files for deletion, based on the identified one or more data blocks, and deleting the one or more candidate files for deletion from the one or more data storage devices.
    Type: Grant
    Filed: November 5, 2013
    Date of Patent: April 7, 2015
    Assignee: International Business Machines Corporation
    Inventors: Duane M. Baldwin, Sandeep R. Patil, Riyazahamad M. Shiraguppi, Prashant Sodhiya
  • Publication number: 20150095291
    Abstract: Systems and methods are disclosed herein for supplementing product records with product groups that are relevant to the product records. Queries from users may be analyzed to extract keywords. Search results for keywords are evaluated to determine category consistency among product records, including such values as entropy and taxonomy depth. Those keywords with search results having adequate category consistency are selected as product groups and the search results are associated with the product groups. Product groups are associated with product records according to a random walk of a graph having as nodes products and product groups and links representing belonging of a product to a product group. Product groups may be selected based on a transition probability based on a random walk and a quality score based on usage of a product group page for the product group.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: Wal-Mart Stores, Inc.
    Inventor: Shankara B. Subramanya
  • Patent number: 8996470
    Abstract: Methods and systems for maintaining the internal consistency of a fact repository are described. Accessed objects are checked for attribute-value pairs that have links to other objects. For any link to an object, the name of the linked-to object is inserted into the attribute-value pair having the link. The accessed objects are filtered to remove attribute-value pairs meeting predefined criteria, possibly resulting in null objects. Links to null objects are identified and removed.
    Type: Grant
    Filed: May 31, 2005
    Date of Patent: March 31, 2015
    Assignee: Google Inc.
    Inventors: Andrew William Hogue, Robert Joseph Siemborski, Jonathan T. Betz
  • Patent number: 8996460
    Abstract: In one aspect, a method to generate a point-in-time (PIT) snapshot of a deduplication-based volume includes generating a virtual access data structure, generating a preliminary snapshot of the volume and modifying the preliminary snapshot to point to a block according to the virtual access data structure to generate the PIT snapshot of the deduplication-based volume.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: March 31, 2015
    Assignee: EMC Corporation
    Inventors: Shahar Frank, Assaf Natanzon, Jehuda Shemer
  • Patent number: 8996467
    Abstract: A distributed, cloud-based storage system provides a reliable, deduplicated, scalable and high-performance backup service to heterogeneous clients that connect to it via a communications network. The distributed cloud-based storage system guarantees consistent and reliable data storage while using structured storage that lacks ACID compliance. Consistency and reliability are guaranteed using a system that includes: 1) back references from shared objects to referring objects, 2) safe orders of operation for object deletion and creation, and 3) simultaneous access to shared resources through sub-resources.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: March 31, 2015
    Assignee: Druva Inc.
    Inventors: Anand Apte, Faisal Puthuparackat, Jaspreet Singh, Milind Borate, Shekhar S. Deshkar
  • Patent number: 8996475
    Abstract: A global information management system (GIMS) includes a collection of standards and methods that allow information management on a global scale. A GIMS computer network includes a central registration database (CRD) and one or more GIMS computer systems connected over a network. Each GIMS computer system includes a relational database having a set of standardized tables. The CRD may provide a GIMS network-unique system ID to each GIMS computer system. Each GIMS computer system uses the GIMS network-unique system ID as part of a primary key for each record generated by and stored in the set of standardized tables of the GIMS database. The GIMS enables global database normalization through the globally unique identification of database records.
    Type: Grant
    Filed: July 17, 2013
    Date of Patent: March 31, 2015
    Assignee: Asibo Inc.
    Inventor: Borsu Asisi Namini