Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 11966630
    Abstract: A data storage device includes a memory device and a controller coupled to the memory device. The controller is configured to segment a key to physical (K2P) table into two or more segments, wherein each segment of the two or more segments corresponds to a caching priority of key value (KV) pair data, organize the K2P table by storing and relocating one or more K2P table entries into a respective segment of the two or more segments, wherein the storing and relocating comprises moving a K2P table entry based on the caching priority of the KV pair data into the respective segment having the caching priority, and utilize the K2P table to manage KV pair data stored in the memory device, wherein utilizing the K2P table comprises applying a same management operation, such as prefetching, to each K2P table entry of a same segment.
    Type: Grant
    Filed: June 27, 2022
    Date of Patent: April 23, 2024
    Assignee: Western Digital Technologies, Inc.
    Inventors: Ran Zamir, Alexander Bazarsky, David Avraham
  • Patent number: 11954331
    Abstract: A computer-implemented method enables workload scheduling in a storage system for optimized deduplication. The method includes determining dynamic correlations of deduplications between workload processes in a prior time window. Workload processes include one or more tasks with defined execution timing parameters. The method further includes determining deduplication ratios based on the correlations of the deduplications between the workload processes. The method further includes scheduling multiple workload processes based on a highest determined deduplication ratio of the determined deduplication ratios.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: April 9, 2024
    Assignee: International Business Machines Corporation
    Inventors: Miles Mulholland, Anuj Chandra, Kirsty G. Rodwell, Jorden Luke Allcock
  • Patent number: 11949751
    Abstract: The present disclosure relates to restricting electronic activities from being linked with record objects. According to at least one aspect of the disclosure, a method can include accessing, by one or more processors, a plurality of electronic activities, accessing a plurality of record objects of one or more systems of record, identifying an electronic activity of the plurality of electronic activities to match to one or more record objects, determining a data source provider associated with providing access to the electronic activity, and identifying a system of record corresponding to the determined data source provider. The system of record can include a plurality of candidate record objects to which to match the electronic activity. The method can include restricting the electronic activity from being linked with the at least one record object.
    Type: Grant
    Filed: January 23, 2023
    Date of Patent: April 2, 2024
    Inventors: Oleg Rogynskyy, Tetiana Lutsaievska, John Wulf, Sathya Hariesh Prakash
  • Patent number: 11936931
    Abstract: Methods, apparatus, systems and articles of manufacture to perform media device asset qualification are disclosed. An example apparatus includes at least one memory, and at least one processor to execute instructions to at least identify a first set of candidate media device assets for disqualification, the candidate media device assets including A) a signature and B) a media identifier that identifies media, generate a hash table using a second set of the candidate media device assets, determine one or more counts of matches between C) a first signature and a first media identifier of a first candidate media device asset of the second set and D) respective signatures and media identifiers of multiple ones of the second set using the hash table, the multiple ones of the second set not including the first candidate media device asset, and load the first signature into a reference database as a reference signature.
    Type: Grant
    Filed: October 17, 2022
    Date of Patent: March 19, 2024
    Assignee: The Nielsen Company (US), LLC
    Inventors: Daniel Nelson, James Petro, Albert T. Borawski
  • Patent number: 11934346
    Abstract: A cloud computing infrastructure hosts a web service with customer accounts. In a customer account, files of the customer account are listed in an index. Files indicated in the index are arranged in groups, with files in each group being scanned using scanning serverless functions in the customer account. The files in the customer account include a compressed tar archive of a software container. Member files of a compressed tar archive in a customer account are randomly-accessed by way of locators that indicate a tar offset, a logical offset, and a decompressor state for a corresponding member file. A member file is accessed by seeking to the tar offset in the compressed tar archive, restoring a decompressor to the decompressor state, decompressing the compressed tar archive using the decompressor, and moving to the logical offset in the decompressed data.
    Type: Grant
    Filed: October 17, 2022
    Date of Patent: March 19, 2024
    Assignee: Trend Micro Incorporated
    Inventor: Brendan M. Johnson
  • Patent number: 11914554
    Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
    Type: Grant
    Filed: January 30, 2023
    Date of Patent: February 27, 2024
    Assignee: Rubrik, Inc.
    Inventors: Noel Moldvai, Jihang Lim
  • Patent number: 11907133
    Abstract: Standardized address generation from address substrings includes receiving an address string for a place-of-interest, one-to-many mapping at least one of a plurality of address substrings of the address string to respective address components, concatenating the address substrings using a template that specifies an order of concatenating the address substrings, and making the concatenated address substrings available for further use.
    Type: Grant
    Filed: July 29, 2022
    Date of Patent: February 20, 2024
    Assignee: SafeGraph, Inc.
    Inventor: Vera Sazonova
  • Patent number: 11893373
    Abstract: Techniques are disclosed for deploying functions in a cloud computing environment. Parameters are annotated in a plurality of Helm charts with a predetermined token. Duplicated values in the Helm charts are identified and the predetermined token is reused for the duplicated values. Schema files from the plurality of Helm charts are parsed to extract the predetermined tokens. Input data are received as values for the predetermined tokens. The function is deployed in the cloud computing environment using the values for the predetermined tokens as parameters in the Helm charts.
    Type: Grant
    Filed: January 28, 2022
    Date of Patent: February 6, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Frank John D'Innocenzo, Kam Yee Lee
  • Patent number: 11886397
    Abstract: Provided are methods and systems for determining multi-faceted trust scores for data. A method may commence with receiving data and determining a plurality of metadata items associated with the data. The method may continue with determining one or more facets associated with each of the plurality of metadata items. The method may further include determining a parameter and a weight associated with each of the one or more facets. Upon determining the parameter and the weight, a trust score associated with each of the plurality of metadata items may be calculated based on the parameter and the weight associated with each of the one or more facets. The method may further include calculating a multi-faceted trust score of the data based on the trust score of each of the plurality of metadata items.
    Type: Grant
    Filed: February 19, 2020
    Date of Patent: January 30, 2024
    Assignee: ASG Technologies Group, Inc.
    Inventors: Jean-Philippe Moresmau, Marcus MacNeill
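The abstract above does not say how the parameters and weights are combined. As a minimal sketch, assume each metadata item's trust score is the weighted average of its facet parameters and the overall score is the plain mean of the item scores. The function names and both formulas are illustrative assumptions, not the patented method.

```python
def item_trust_score(facets):
    """facets is a list of (parameter, weight) pairs for one metadata item.

    Assumption: the item score is the weighted average of the parameters.
    """
    total_weight = sum(weight for _, weight in facets)
    if total_weight == 0:
        return 0.0
    return sum(param * weight for param, weight in facets) / total_weight


def data_trust_score(items):
    """items maps a metadata item name to its list of (parameter, weight) facets.

    Assumption: the multi-faceted score is the mean of the item scores.
    """
    scores = [item_trust_score(facets) for facets in items.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

Any other aggregation (minimum, product, learned weights) would fit the same two-level structure.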
  • Patent number: 11888936
    Abstract: A method for providing a proxy redirect to facilitate a storage and a retrieval of an object is disclosed. The method includes receiving a mapping of a user to a logical container that stores the object and to a storage provider that stores the logical container; receiving a key corresponding to the logical container and associated with the user; storing the mapping and the key in a database; generating, for the user, an application protocol that redirects to a pre-signed web address based on the stored mapping and the stored key; and transmitting, via a communication interface, the application protocol to the one user. The method further includes the user using the application protocol to directly access the storage provider and retrieve the object.
    Type: Grant
    Filed: July 1, 2020
    Date of Patent: January 30, 2024
    Assignee: JPMORGAN CHASE BANK, N.A.
    Inventor: Zachariah Antonas
  • Patent number: 11853326
    Abstract: A technology for retrieving data from a database. The technology includes receiving a search query specifying a target attribute and a target attribute value, accessing an index to determine one or more target files in which the target attribute value appears, the index including a plurality of attribute values, and for each of the attribute values, one or more files in which the attribute value appears, and retrieving data from the one or more target files.
    Type: Grant
    Filed: October 14, 2021
    Date of Patent: December 26, 2023
    Assignee: Google LLC
    Inventors: Hossein Ahmadi, Guang Cheng, Yannis Sismanis, Huong Thi Thu Phan, Shiyu Xie, Leo Chen, Zewen Zhang, Jing Jing Long, Amir Hossein Hormati
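The index described in this abstract resembles an inverted index from attribute values to files. The sketch below is a hedged illustration of that idea only: the in-memory `FILES` layout, the function names, and keying the index by `(attribute, value)` are assumptions made for the example.

```python
from collections import defaultdict


def build_index(files):
    """Map (attribute, value) to the set of file names in which the value appears."""
    index = defaultdict(set)
    for name, rows in files.items():
        for row in rows:
            for attr, value in row.items():
                index[(attr, value)].add(name)
    return index


def search(files, index, attr, value):
    """Read only the files that the index lists for the target attribute value."""
    hits = []
    for name in sorted(index.get((attr, value), ())):
        hits.extend(row for row in files[name] if row.get(attr) == value)
    return hits


# Hypothetical data set: three small "files" of rows.
FILES = {
    "part-0": [{"city": "Oslo", "id": 1}, {"city": "Lima", "id": 2}],
    "part-1": [{"city": "Lima", "id": 3}],
    "part-2": [{"city": "Kyiv", "id": 4}],
}
INDEX = build_index(FILES)
```

A query for `city == "Lima"` touches `part-0` and `part-1` and never opens `part-2`, which is the point of consulting the index first.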
  • Patent number: 11836175
    Abstract: Semantic search techniques via focused summarizations are described. For example, a search query is received for a text-based content item in a data set comprising a plurality of text-based content items. A first feature vector representative of the search query is obtained. A respective semantic similarity score is determined between the first feature vector and each of a plurality of second feature vectors. Each of the second feature vectors is representative of a machine-generated summarization of a respective text-based content item. The machine-generated summarization comprises a plurality of multi-word fragments that are selected from the respective text-based content item via a transformer-based machine learning model. A search result is provided responsive to the search query.
    Type: Grant
    Filed: June 29, 2022
    Date of Patent: December 5, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Itzik Malkiel, Noam Koenigstein, Oren Barkan, Jonathan Ephrath, Yonathan Weill, Nir Nice
  • Patent number: 11797220
    Abstract: Data is ingested from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure. After data ingestion is complete, one or more duplicate data chunks that were stored during the data ingestion are determined and a second data structure is updated to include one or more entries corresponding to one or more determined duplicate data chunks.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: October 24, 2023
    Assignee: Cohesity, Inc.
    Inventors: Zhihuan Qiu, Sachin Jain, Anubhav Gupta, Apurv Gupta, Mohit Aron
  • Patent number: 11797486
    Abstract: A device configured to identify a file in a network device, to generate a first set of block hash codes for data blocks for a first instance of the file, and to generate a second set of block hash codes for data blocks for a second instance of the file. The device is further configured to determine the first set of block hash codes matches the second set of block hash codes and to generate an entry in a file list for the instances of the file. The device is further configured to count the number of entries that are associated with the file and to determine the number of entries is greater than the redundancy threshold value. The device is further configured to delete one or more instances of the file in response to determining that the number of entries is greater than the redundancy threshold value.
    Type: Grant
    Filed: January 3, 2022
    Date of Patent: October 24, 2023
    Assignee: Bank of America Corporation
    Inventors: Pratap Dande, Gilberto R. Dos Santos, Jayabalaji Murugan, Murali M. Atyam, Manoj Bohra
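The comparison and threshold steps in this abstract can be sketched as follows. The block size, the use of SHA-256, and the policy of keeping exactly `threshold` copies are assumptions; the abstract only says that one or more instances are deleted once the count exceeds the redundancy threshold.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not give one


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def prune_redundant(instances, threshold, block_size=BLOCK_SIZE):
    """instances maps a path to file contents; returns the surviving instances."""
    paths = sorted(instances)
    if not paths:
        return {}
    reference = block_hashes(instances[paths[0]], block_size)
    # File list: every instance whose block hash codes match the first instance.
    file_list = [p for p in paths
                 if block_hashes(instances[p], block_size) == reference]
    if len(file_list) <= threshold:
        return dict(instances)
    # Assumption: keep `threshold` copies and delete the rest.
    doomed = set(file_list[threshold:])
    return {p: d for p, d in instances.items() if p not in doomed}
```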
  • Patent number: 11782878
    Abstract: A deduplicated storage system storing objects receives a search term. Storage includes metadata and segments into which the objects have been split and deduplicated. The metadata includes fingerprint sequences according to which the segments should be assembled. A partial match is found when a prefix of the term is found at an end of a segment or a suffix is found at a beginning of the segment. A fingerprint of the segment having the partial match is recorded. A first sequence of fingerprints associated with a first object is read to check whether any fingerprints in the first sequence have been recorded. When a fingerprint in the sequence has been recorded, a check of a next fingerprint in the sequence is made to see if it has been recorded as having the partial match. If the next fingerprint has been recorded, the first object is reported as having the term.
    Type: Grant
    Filed: December 14, 2021
    Date of Patent: October 10, 2023
    Assignee: EMC IP Holding Company LLC
    Inventor: Philip Shilane
  • Patent number: 11748014
    Abstract: Host computers running applications that store data on a block-based storage system such as a SAN provide hints that differentiate IO data based on which application generated the IO. The hints may include tags that are associated with IO commands sent to the block-based storage system. Each host application is associated with a unique identifier that is placed in the tag. Application name-to-identifier mappings may be sent from the hosts to the block-based storage system. Per-identifier/application deduplication statistics are maintained by the block-based storage system and shared with other block-based storage systems. Deduplication is disabled or de-emphasized for IO data generated by applications with statistically low deduplication ratios.
    Type: Grant
    Filed: February 14, 2020
    Date of Patent: September 5, 2023
    Assignee: DELL PRODUCTS L.P.
    Inventors: Kurumurthy Gokam, Md Haris Iqbal, Prasad Paple, Kundan Kumar
  • Patent number: 11734239
    Abstract: A record processing and storage system is operable to receive a plurality of labeled row data from a data source. Each labeled row data of the plurality of labeled row data includes at least one record and a corresponding row number of a plurality of row numbers. A plurality of pages are generated from records included in the labeled row data. The plurality of pages are stored via a page storage system. A plurality of page metadata corresponding to the plurality of pages is generated, where each of the plurality of page metadata is generated based on at least one corresponding row number of at least one labeled row data with records included in a corresponding one of the plurality of pages. Deduplication of duplicated records included in the plurality of pages is facilitated based on the plurality of page metadata.
    Type: Grant
    Filed: March 15, 2022
    Date of Patent: August 22, 2023
    Assignee: Ocient Holdings LLC
    Inventors: George Kondiles, Ravi V. Khadiwala, Donald Scott Clark, Anna Veselova
  • Patent number: 11704036
    Abstract: Systems and methods for implementing a deduplication process based on performance analyses. The system may include a processing device to determine a first performance metric associated with retrieving a second stored data block that is within a specified range of a duplicate of the first data block and a second performance metric associated with retrieving a hash value corresponding to the second stored data block. The processing device is further to retrieve the second stored data block within a specified range of the duplicate of the first data block in response to the first performance metric not exceeding the second performance metric.
    Type: Grant
    Filed: November 16, 2018
    Date of Patent: July 18, 2023
    Assignee: PURE STORAGE, INC.
    Inventors: John Colgrove, Ronald Karr, Ethan L. Miller
  • Patent number: 11681660
    Abstract: Embodiments presented herein describe techniques for deduplicating chunks of data across multiple clusters. A process executing in a storage system identifies one or more chunks in an incoming stream of data. For each chunk, a first fingerprint corresponding to the chunk is generated. The process determines whether the first fingerprint matches a second fingerprint listed in a corresponding entry in a deduplication map. Each entry of the deduplication map corresponds to a chunk stored in a location in one of the storage clusters. Upon determining that the first fingerprint matches the second fingerprint, the process writes, to a local persistent storage, a pointer referencing the location in that storage cluster.
    Type: Grant
    Filed: January 22, 2021
    Date of Patent: June 20, 2023
    Assignee: Cohesity, Inc.
    Inventor: Ganesha Shanmuganathan
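A deduplication map of the kind this abstract describes can be sketched as a dictionary from fingerprint to storage location. The class below is an illustration under assumptions: SHA-256 as the fingerprint, in-memory lists standing in for storage clusters, and the `DedupStore` name and methods are all hypothetical.

```python
import hashlib


class DedupStore:
    """Toy cross-cluster deduplication map (all names are hypothetical)."""

    def __init__(self):
        self.dedup_map = {}       # fingerprint -> (cluster, offset)
        self.clusters = {}        # cluster name -> list of stored chunks
        self.local_pointers = []  # pointers written for the incoming stream

    def write(self, chunk, cluster):
        """Store chunk in cluster unless an identical chunk exists anywhere."""
        fingerprint = hashlib.sha256(chunk).hexdigest()
        location = self.dedup_map.get(fingerprint)
        if location is None:
            # First time this fingerprint is seen: store the chunk itself.
            store = self.clusters.setdefault(cluster, [])
            store.append(chunk)
            location = (cluster, len(store) - 1)
            self.dedup_map[fingerprint] = location
        # In both cases only a pointer to the location is recorded locally.
        self.local_pointers.append(location)
        return location

    def read(self, pointer):
        cluster, offset = pointer
        return self.clusters[cluster][offset]
```

Writing the same chunk to a second cluster returns the pointer into the first cluster and stores nothing new.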
  • Patent number: 11675741
    Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
    Type: Grant
    Filed: July 8, 2021
    Date of Patent: June 13, 2023
    Assignee: Rubrik, Inc.
    Inventors: Noel Moldvai, Jihang Lim
  • Patent number: 11663194
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
  • Patent number: 11663178
    Abstract: A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to assign the deduplication databases based on the type of the client device and automatically create a new deduplication database when critical thresholds are reached. In other embodiments, deduplication databases are further split into multiple database partitions. Based on a data block distribution policy, each data block is then further assigned to a particular database partition within the deduplication database to further improve efficiency and speed of the deduplication process.
    Type: Grant
    Filed: October 23, 2020
    Date of Patent: May 30, 2023
    Assignee: Commvault Systems, Inc.
    Inventor: Prasad Nara
  • Patent number: 11663196
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
  • Patent number: 11663195
    Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: May 30, 2023
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Peter Marelas
  • Patent number: 11665377
    Abstract: Aspects of the subject disclosure may include, for example, a device having a processing system including a processor; and a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations including receiving encrypted hypertext transport protocol (HTTPS) traffic including media content; separating the HTTPS traffic into audio segments and video segments; calculating a size for each audio segment in the HTTPS traffic; maintaining a sliding window of a plurality of sizes of consecutive audio segments to form a fingerprint; and identifying the media content by matching the fingerprint with a reference in a catalog. Other embodiments are disclosed.
    Type: Grant
    Filed: April 23, 2021
    Date of Patent: May 30, 2023
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yuan Ding, Natalia Schenck, Daniel Sanchez, Umut Akyol, Lawrence E. Bakst, Vinay Sharma
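The sliding-window fingerprint in this abstract can be illustrated with a short sketch. It assumes the audio segment sizes have already been separated from the traffic, that the catalog stores one reference size sequence per title, and that matching is exact; a real system would probably need some tolerance. The function name, window length, and catalog layout are hypothetical.

```python
from collections import deque


def identify(segment_sizes, catalog, window=3):
    """Return the first title whose reference sizes contain the current window.

    catalog maps a title to its list of audio segment sizes (assumed layout).
    Returns None when no window matches.
    """
    recent = deque(maxlen=window)  # sliding window of consecutive sizes
    for size in segment_sizes:
        recent.append(size)
        if len(recent) < window:
            continue
        fingerprint = tuple(recent)
        for title, reference in catalog.items():
            for i in range(len(reference) - window + 1):
                if tuple(reference[i:i + window]) == fingerprint:
                    return title
    return None
```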
  • Patent number: 11659006
    Abstract: An assessment component that facilitates assessment and enforcement of policies within a computer environment can comprise a compliance component that determines whether a policy, that defines one or more requirements associated with usage of one or more enterprise components of an enterprise computing system, is in compliance with a plurality of standardized policies that govern operation of the one or more enterprise components of the enterprise computing system. The assessment component can also comprise a policy optimization component that determines one or more changes to the policy that achieve the compliance with the plurality of standardized policies based on a determination that the policy complies with a first standardized policy of the plurality of standardized policies and fails to comply with a second standardized policy of the plurality of standardized policies.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: May 23, 2023
    Assignee: Kyndryl, Inc.
    Inventors: Milton H. Hernandez, Anup Kalia, Brian Peterson, Vugranam C. Sreedhar, Sai Zeng
  • Patent number: 11625167
    Abstract: An embodiment of a semiconductor apparatus may include technology to determine if a threshold is met based on runtime memory usage, and enable foreground memory deduplication if the threshold is determined to be met. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: November 16, 2018
    Date of Patent: April 11, 2023
    Assignee: Intel Corporation
    Inventors: Dujian Wu, Yuping Yang, Donggui Yin
  • Patent number: 11609883
    Abstract: An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a compression estimate for that dataset, to designate a scan criterion to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset. The processing device generates the compression estimate for the dataset based at least in part on contents of the compression estimate table.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: March 21, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Anton Kucherov, David Meiri
  • Patent number: 11599507
    Abstract: A file system may include an object storage, a merged index, and a distributed database. When a file is stored in the file system, the file may be converted to an object and be stored in the object storage. The deduplication index of the file may be stored in the distributed database. The namespace metadata of the file may be stored in the merged index. The merged index generates namespace entries of the file when the file is created, deleted, and/or modified. A namespace entry may be associated with a specific file and may include a creation version and a deletion version. When a file is deleted or modified, instead of modifying the existing namespace entries, new entries associated with different versions and including different creation or deletion versions are created. The status of a file may be monitored by one or more entries associated with a file.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: March 7, 2023
    Assignee: Druva Inc.
    Inventors: Milind Borate, Alok Kumar, Aditya Agrawal, Anup Agarwal, Somesh Jain, Aditya Kelkar, Yogendra Acharya, Anand Apte, Amit Kulkarni
  • Patent number: 11593028
    Abstract: A method of operating a computing device for processing data is provided. The method includes (a) monitoring a set of performance characteristics of the processing of the data; (b) periodically calculating, using a predefined set of coefficients, a linear combination of the monitored set of performance characteristics to yield a combined metric; and (c) upon detecting that the combined metric exceeds a threshold while operating in a first processing mode, transitioning from operating in the first processing mode to operating in a second processing mode. (1) The second processing mode has a higher bandwidth than the first processing mode, and (2) processing of data in the second processing mode is less robust than processing of data in the first processing mode. An apparatus, system, and computer program product for performing a similar method are also provided.
    Type: Grant
    Filed: March 11, 2021
    Date of Patent: February 28, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Vladimir Shveidel, Alexei Kabishcer
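Steps (b) and (c) of this abstract amount to a weighted sum compared against a threshold. A minimal sketch follows, in which the metric names, the coefficient values, and the mode labels `"robust"` and `"high_bandwidth"` are assumptions chosen for illustration.

```python
def combined_metric(samples, coefficients):
    """Linear combination of the monitored performance characteristics."""
    return sum(coefficients[name] * samples[name] for name in coefficients)


def next_mode(mode, samples, coefficients, threshold):
    """Leave the first ("robust") mode only when the combined metric exceeds the threshold."""
    if mode == "robust" and combined_metric(samples, coefficients) > threshold:
        return "high_bandwidth"
    return mode
```

For example, with coefficients `{"queue_depth": 0.5, "latency_ms": 2.0}` and samples `{"queue_depth": 10, "latency_ms": 3}`, the combined metric is 11.0, so a threshold of 10 triggers the transition and a threshold of 11 does not.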
  • Patent number: 11573928
    Abstract: Techniques for processing data may include: receiving a data block stored in a data set, wherein a hash value is derived from the data block; determining, in accordance with selection criteria, whether the hash value is included in a subset; responsive to determining the hash value is included in the subset, performing processing that updates a table in accordance with the hash value and the data set, and determining, in accordance with the information in the table, whether to perform deduplication processing for the data block to determine whether the data block is a duplicate of another stored data block. The table may include an entry for the hash value. The entry may include information identifying data sets referencing the data block and, for each of the data sets, may specify a reference count denoting a number of times the data set references the data block.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: February 7, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Anton Kucherov, David Meiri
  • Patent number: 11573924
    Abstract: Methods and systems for storing and managing large numbers of small files. A data processing system includes clients that generate large numbers of small files to be stored on a storage device managed by a File System (FS). An Archive Server (AS) receives multiple files from the client, archives the files in larger archives, and sends the archives to the FS for storage. When requested to read a file, the AS retrieves the archive in which the file is stored, extracts the file and sends it to the requesting client. In other words, the AS communicates with the clients in individual file units, and with the storage device in archive units. The AS is typically constructed as an add-on layer on top of a conventional FS, which enables the FS to handle small files efficiently without modification.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: February 7, 2023
    Assignee: COGNYTE TECHNOLOGIES ISRAEL LTD.
    Inventor: Yossi Chai
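The archive-server layer in this abstract can be sketched with Python's standard `zipfile` module. The zip format, the fixed number of files per archive, and the in-memory list standing in for the file system are assumptions; the abstract does not specify an archive format or batching policy.

```python
import io
import zipfile


class ArchiveServer:
    """Toy add-on layer: clients see files, the backing store sees archives."""

    def __init__(self, files_per_archive=3):
        self.files_per_archive = files_per_archive  # assumed batching policy
        self.pending = {}    # small files not yet archived
        self.storage = []    # archive bytes, standing in for the FS
        self.location = {}   # file name -> index of its archive in storage

    def write(self, name, data):
        self.pending[name] = data
        if len(self.pending) >= self.files_per_archive:
            self.flush()

    def flush(self):
        """Pack all pending files into one archive and hand it to storage."""
        if not self.pending:
            return
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            for name, data in self.pending.items():
                zf.writestr(name, data)
        self.storage.append(buf.getvalue())
        for name in self.pending:
            self.location[name] = len(self.storage) - 1
        self.pending = {}

    def read(self, name):
        """Retrieve the containing archive and extract the single file."""
        if name in self.pending:
            return self.pending[name]
        archive = self.storage[self.location[name]]
        with zipfile.ZipFile(io.BytesIO(archive)) as zf:
            return zf.read(name)
```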
  • Patent number: 11570196
    Abstract: A method for determining duplication of a vulnerability may include a vulnerability extraction step of extracting vulnerability uniform resource locator (URL) addresses including the vulnerability from an analysis target server; a hash generation step of generating the URL hash value corresponding to the extracted vulnerability from the vulnerability URL address; and a duplication determination step of determining, when the URL hash value is present in the first comparison table, that the vulnerability is duplicated and excluding the corresponding vulnerability from vulnerability information.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: January 31, 2023
    Assignee: NAVER CLOUD CORPORATION
    Inventors: Bong Goo Kang, Min Seob Lee, Won Tae Jang, June Ahn, Jihwan Yoon
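The duplication check in this abstract reduces to a hash lookup. In the sketch below, SHA-256, the Python set used as the comparison table, and the step that adds newly seen hashes to the table are assumptions; the abstract only states that a vulnerability whose URL hash is already in the table is treated as a duplicate and excluded.

```python
import hashlib


def url_hash(url):
    """Hash a vulnerability URL address (SHA-256 is an assumed choice)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def filter_duplicates(vulnerability_urls, comparison_table):
    """Return URLs whose hash is not yet in comparison_table (a set of hashes)."""
    fresh = []
    for url in vulnerability_urls:
        digest = url_hash(url)
        if digest in comparison_table:
            continue  # duplicate: exclude from the vulnerability information
        comparison_table.add(digest)  # assumption: remember it for later scans
        fresh.append(url)
    return fresh
```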
  • Patent number: 11561863
    Abstract: A method for enabling data set changes to be reverted to a prior point in time or state is disclosed. In one embodiment, such a method includes providing a data set comprising one or more data elements and a specified number of generations of the data elements. In certain embodiments, the data set is a partitioned data set extended (PDSE) data set, and the data elements are “members” within the PDSE data set. The method further includes tracking changes made by a job to data elements of the data set. The method further references, in a data structure (also referred to herein as a “cluster”) associated with the job, previous generations of the data elements changed by the job. In certain embodiments, the data structure is stored in the data set. A corresponding system and computer program product are also disclosed.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: January 24, 2023
    Assignee: International Business Machines Corporation
    Inventors: Trevor A. Geisler, David C. Reed, Thomas C. Reed, Max D. Smith
  • Patent number: 11539811
    Abstract: Systems, devices and methods for adaptive compression of stored information include a memory management computing device programmed to monitor a size of a plurality of data structures stored in a data repository. The computing device compares the size of each of a plurality of data structures to a predetermined threshold. When a size of an uncompressed data structure meets the threshold, the memory management computing device calculates a value of a first compression parameter based on a value of a first parameter and a value of a second parameter of each data element of the uncompressed data structure, calculates a value of a second compression parameter based on the value of the first parameter of each data element of the uncompressed data structure, generates a compressed data structure based on the value of the first compression parameter and the second compression parameter, and replaces, in the data repository, the uncompressed data structure with the compressed data structure.
    Type: Grant
    Filed: June 21, 2022
    Date of Patent: December 27, 2022
    Assignee: Chicago Mercantile Exchange Inc.
    Inventors: Fateen Sharaby, Sriram A. Raju Datla, Dhiraj Subhash Bawadhankar, John Charles Redfield, Justin Yeong-Juin Lee
  • Patent number: 11520744
    Abstract: Described is a system (and method) that intelligently distributes data within a clustered storage environment. To provide such a capability, the system may distribute backup files by considering a source of the data to be backed-up. In particular, the system may leverage the ability of front-end components such as a backup application to perform a granular data source identification of data. Such information may be propagated to back-end components such as a storage filesystem in the form of a data source identifier (e.g. placement tag). The data source identifiers may then be accessed by the clustered storage system to intelligently distribute backup files amongst a set of storage nodes forming a cluster. For example, backup files from the same data source may be stored on the same storage node to obtain the same deduplication efficiency as a single storage system.
    Type: Grant
    Filed: August 21, 2019
    Date of Patent: December 6, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Abhishek Rajimwale, George Mathew, Murthy Mamidi, Donna Barry Lewis
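    One way to picture placement by data source identifier is a deterministic mapping from placement tag to storage node, so that backups from the same source always land together. The hash-modulo mapping below is an assumption made for illustration; the abstract does not say how a node is chosen, and `pick_node` is a hypothetical name.

```python
import hashlib


def pick_node(placement_tag, nodes):
    """Map a data source identifier to one node of the cluster.

    The same tag always maps to the same node, so files from one data
    source are deduplicated against each other on a single node.
    """
    digest = hashlib.sha256(placement_tag.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

    A plain modulo mapping reshuffles most tags when the node list changes; a real cluster would need a more stable scheme.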
  • Patent number: 11514054
    Abstract: Supervised partitioning is used to perform record matching. A request to identify matches between records is received. A graph representation that indicates similarities between the records is partitioned and an evaluation of the partitioning is performed according to a supervised machine learning technique to generate a confidence value in the partitioning. An indication of equivalent records according to the partitioning and the confidence value of the partitioning may be provided.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: November 29, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Andrew Borthwick, Robert Anthony Barton, Jr., Stephen Michael Ash, Russell Reas
  • Patent number: 11514025
    Abstract: Performing snapshot conscious internal file modification for network-attached storage is presented herein. A file system can comprise a first component configured to modify, during a service request, storage for a subset of data blocks of a file—the service request not being recognized by an external entity as a change of content of the file. Further, the file system can comprise a second component configured to prevent, based on the service request, a copy of the storage from being created for servicing of a snapshot—the snapshot comprising a point-in-time copy of the file system.
    Type: Grant
    Filed: August 19, 2019
    Date of Patent: November 29, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Ravi V. Batchu
  • Patent number: 11500841
    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate encoding a tree data structure into a vector based on a set of constraints are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a constraint former that can form a set of constraints based on a first tree data structure and a vector encoder that can encode the first tree data structure into a vector based on the set of constraints.
    Type: Grant
    Filed: January 4, 2019
    Date of Patent: November 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Achille Fokoue-Nkoutche, Maxwell Crouse, Michael Witbrock, Ryan A. Musa, Maria Chang
  • Patent number: 11474700
    Abstract: Technologies for compressing communications for accelerator devices are disclosed. An accelerator device may include a communication abstraction logic unit to manage communication with one or more remote accelerator devices. The communication abstraction logic unit may receive communication to and from a kernel on the accelerator device. The communication abstraction logic unit may compress and decompress the communication without instruction from the corresponding kernel. The communication abstraction logic unit may choose when and how to compress communications based on telemetry of the accelerator device and the remote accelerator device.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: October 18, 2022
    Assignee: Intel Corporation
    Inventors: Susanne M. Balle, Evan Custodio, Francesc Guim Bernat
  • Patent number: 11468063
    Abstract: The subject technology provides information, corresponding to properties of a build side of a join operation, to a bloom filter. The subject technology, based at least in part on the information from the bloom filter, determines, during executing of a query plan, at least one property of the join operation to determine whether to switch an aggregation operator to a pass through mode, the at least one property comprising at least a reduction rate. The subject technology switches, in response to the reduction rate being below a threshold value, the aggregation operator to the pass through mode during runtime of the query plan and, while the aggregation operator is in the pass through mode, an input stream of data goes through the aggregation operator without being analyzed and the input stream of data matches an output stream of data flowing out of the aggregation operator.
    Type: Grant
    Filed: April 16, 2021
    Date of Patent: October 11, 2022
    Assignee: Snowflake Inc.
    Inventors: Bowei Chen, Thierry Cruanes, Florian Andreas Funke, Allison Waingold Lee, Jiaqi Yan
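    The pass-through switch can be sketched as follows. This simplified version measures the reduction rate directly from each batch it aggregates, whereas the abstract derives it from bloom filter information about the join; the class name `AdaptiveSum`, the sum aggregate, and the formula `1 - groups / rows` are assumptions.

```python
from collections import defaultdict


class AdaptiveSum:
    """Sums (key, value) rows per batch until aggregation stops paying off."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold   # minimum useful reduction rate (assumed)
        self.pass_through = False

    def process(self, batch):
        # In pass-through mode the input is returned without being analyzed.
        if self.pass_through or not batch:
            return list(batch)
        sums = defaultdict(int)
        for key, value in batch:
            sums[key] += value
        # Fraction of rows eliminated by grouping (assumed definition).
        reduction_rate = 1 - len(sums) / len(batch)
        if reduction_rate < self.threshold:
            self.pass_through = True  # later batches skip aggregation
        return list(sums.items())
```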
  • Patent number: 11461269
    Abstract: A data management device includes a persistent storage and a processor. The persistent storage includes an object storage. The processor segments a file into file segments. The processor generates meta-data of the file segments. The processor stores a portion of the file segments in a data object of the object storage. The processor stores a portion of the meta-data of the file segments in a meta-data object of the object storage.
    Type: Grant
    Filed: July 21, 2017
    Date of Patent: October 4, 2022
    Assignee: EMC IP HOLDING COMPANY
    Inventors: Shuang Liang, Mahesh Kamat, Bhimsen Bhanjois
  • Patent number: 11429573
    Abstract: A data deduplication system includes a data deduplication subsystem coupled to each of a host system and a storage system. The data deduplication system receives data from the host system, generates a data deduplication identifier for the data, and determines whether the data deduplication identifier for the data is stored in a data deduplication database. In response to determining that the data deduplication identifier is not stored in the data deduplication database, the data deduplication system stores the data deduplication identifier for the data in the data deduplication database in association with a data counter for the data, and transmits the data to the storage system for storage. In response to determining that the data deduplication identifier is stored in the data deduplication database, the data deduplication system increments a data counter that is associated with the data deduplication identifier in the data deduplication database, and discards the data.
    Type: Grant
    Filed: October 16, 2019
    Date of Patent: August 30, 2022
    Assignee: Dell Products L.P.
    Inventors: Dharmesh M. Patel, Ravikanth Chaganti, Rizwan Ali
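    The two branches described in this abstract map naturally onto a lookup table keyed by a content identifier. The sketch below assumes SHA-256 as the deduplication identifier, a counter that starts at 1, and dictionaries standing in for the database and the storage system; `DedupSubsystem` is a hypothetical name.

```python
import hashlib


class DedupSubsystem:
    """Sits between a host and storage; forwards only data not seen before."""

    def __init__(self):
        self.database = {}  # deduplication identifier -> data counter
        self.storage = {}   # stand-in for the storage system

    def write(self, data):
        """Return True if data was sent to storage, False if it was discarded."""
        identifier = hashlib.sha256(data).hexdigest()
        if identifier in self.database:
            self.database[identifier] += 1   # known data: count it, discard it
            return False
        self.database[identifier] = 1        # new data: assumed initial count
        self.storage[identifier] = data
        return True
```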
  • Patent number: 11429634
    Abstract: In some embodiments, an interface of a content management system manages synchronized content on storage systems. For example, the interface stores, on a metadata storage structure, records of metadata associated with blocks of data stored on a storage, the records including block identifiers that uniquely identify the blocks and timestamps associated with the blocks. The interface identifies a batch of storage operations associated with the blocks, including one or more delete operations. For each delete operation, the interface queries the metadata storage structure for a timestamp corresponding to a block of data associated with the delete operation, determines whether the delete operation creates a race condition between the delete operation and an add operation associated with the block of data, and rejects the delete operation when the delete operation creates the race condition or the timestamp corresponding to the block of data is newer than a predetermined period of time.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: August 30, 2022
    Assignee: Dropbox, Inc.
    Inventors: Nipunn Koorapati, Daniel Horn, Elmer Charles Jubb, IV
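    A minimal sketch of the delete check, under two assumptions: a race condition is taken to mean an add operation for the same block in the same batch, and "newer than a predetermined period" is taken to mean the block's timestamp is less than `min_age_seconds` old. The function and parameter names are hypothetical.

```python
def filter_deletes(batch, block_timestamps, now, min_age_seconds):
    """Split the delete operations of a batch into accepted and rejected.

    batch: list of (op, block_id) pairs, op being "add" or "delete".
    block_timestamps: block_id -> timestamp from the metadata store.
    """
    added = {block_id for op, block_id in batch if op == "add"}
    accepted, rejected = [], []
    for op, block_id in batch:
        if op != "delete":
            continue
        ts = block_timestamps.get(block_id)
        too_new = ts is not None and now - ts < min_age_seconds
        if block_id in added or too_new:
            rejected.append(block_id)   # race with an add, or block too recent
        else:
            accepted.append(block_id)
    return accepted, rejected
```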
  • Patent number: 11429575
    Abstract: Example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to deduplicate common devices across multiple data sources are disclosed. An example system includes a comparison controller to identify a first device in a first data source and a second device in a second data source as a possible common device.
    Type: Grant
    Filed: July 10, 2020
    Date of Patent: August 30, 2022
    Assignee: THE NIELSEN COMPANY (US), LLC
    Inventors: Rachel Worth Olson, Michael Evan Anderson, Rishi Sriram, Margaret M. Orton, Fatemehossadat Miri, Samantha M. Mowrer, David J. Kurzynski, Molly Poppie
  • Patent number: 11423027
    Abstract: A system and method for a text search of a database, including converting a text search expression to a query plan and implementing the text search as the query plan on the database. The implementing of the text search includes a one-pass indexing as a single scan of an inverse index table associated with the database.
    Type: Grant
    Filed: January 29, 2016
    Date of Patent: August 23, 2022
    Assignee: MICRO FOCUS LLC
    Inventors: Qiming Chen, Meichun Hsu, Malu G. Castellanos
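    The single-scan idea can be sketched for the simplest case, a query requiring all of several terms. The inverse index table is modeled as an iterable of `(term, doc_id)` rows; the conversion of a search expression into a query plan is not modeled, and `search_all_terms` is a hypothetical name.

```python
def search_all_terms(index_rows, terms):
    """Return sorted doc ids containing every term, scanning the index once."""
    wanted = set(terms)
    seen = {}  # doc_id -> set of wanted terms found so far
    for term, doc_id in index_rows:   # the only pass over the index table
        if term in wanted:
            seen.setdefault(doc_id, set()).add(term)
    return sorted(doc for doc, found in seen.items() if found == wanted)
```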
  • Patent number: 11416316
    Abstract: A first-to-second correlation engine determines correlations between first objects from a first object feed, and second objects from a second object storage, and generates first correlation messages indicative of the correlations for a first-to-second object direction and a second-to-first object direction. A second-to-first correlation engine determines respective correlations between the second objects from a second object feed and the first objects from a first object storage, and generates second correlation messages indicative of the respective correlations for the second-to-first object direction and the first-to-second object direction. A first-to-second correlation storage engine receives the first and second correlation messages for the first-to-second object direction and updates first-to-second correlation storage based on the received messages.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: August 16, 2022
    Assignee: AMADEUS S.A.S.
    Inventors: Serge Beuzit, Jean-Samuel Pasquali
  • Patent number: 11409766
    Abstract: Disclosed herein is the creation of probabilistic data structures for container reclamation. One method involves retrieving a segment object list of a data container and creating a probabilistic data structure. The segment object list comprises a plurality of segment objects, the data container comprises the plurality of segment objects and a plurality of data objects, and each segment object of the plurality of segment objects comprises a hash value determined by performing a hashing function on a corresponding data object of the plurality of data objects. The creating includes, for each segment object in the segment object list, identifying an element of a plurality of elements of the probabilistic data structure using a hash value of the each segment object and setting the element to indicate the segment object references a corresponding data object of the plurality of data objects.
    Type: Grant
    Filed: October 26, 2020
    Date of Patent: August 9, 2022
    Assignee: Veritas Technologies LLC
    Inventors: Yingsong Jia, Xin Wang, Guangbin Zhang
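    The structure described here resembles a Bloom-style bit array indexed by segment hashes. The sketch below assumes one element per hash (hash value modulo the array size), SHA-256 hex digests as the segment hash values, and an array of 1024 elements; both function names are hypothetical.

```python
import hashlib


def build_reference_filter(segment_hashes, num_elements=1024):
    """Set one element per segment object, chosen by its hash value."""
    bits = bytearray(num_elements)
    for h in segment_hashes:           # h is a hex digest string
        bits[int(h, 16) % num_elements] = 1
    return bits


def may_be_referenced(bits, data):
    """False means the data object is definitely unreferenced.

    True means it is probably referenced (false positives are possible).
    """
    h = hashlib.sha256(data).hexdigest()
    return bits[int(h, 16) % len(bits)] == 1
```

    Because a clear element is a definite answer, data objects that test as unreferenced are safe candidates for reclamation, while false positives only delay reclaiming some space.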
  • Patent number: 11403266
    Abstract: A method for deleting a row from a table in a database system comprises logically deleting the row in the first table in the database system by inserting a key of the row into a corresponding row of a dedicated table in the database system; querying the dedicated table during a query against the first table to identify the corresponding row in the dedicated table; and in response to identifying the corresponding row in the dedicated table, deleting the row from the first table and the corresponding row from the dedicated table as part of query processing during a subsequent query.
    Type: Grant
    Filed: June 4, 2019
    Date of Patent: August 2, 2022
    Assignee: International Business Machines Corporation
    Inventors: Andreas Brodt, Oliver Koeth, Daniel Martin, Knut Stolze
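    A small SQLite sketch of the logical-delete pattern: deleting only records the key in a dedicated table, and the physical delete is piggybacked on query processing. For brevity the cleanup here happens in the same call that first sees the marker rather than in a subsequent query; the table names `orders` and `orders_deleted` and both functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("CREATE TABLE orders_deleted (id INTEGER PRIMARY KEY)")  # dedicated table
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "pen"), (2, "ink"), (3, "pad")])


def logical_delete(row_id):
    """Mark a row as deleted by recording its key; the row itself stays."""
    conn.execute("INSERT OR IGNORE INTO orders_deleted VALUES (?)", (row_id,))


def query_items():
    """Return live rows, then physically remove the marked ones."""
    rows = conn.execute(
        "SELECT id, item FROM orders "
        "WHERE id NOT IN (SELECT id FROM orders_deleted) ORDER BY id"
    ).fetchall()
    # Cleanup piggybacked on the query: drop marked rows and their markers.
    conn.execute("DELETE FROM orders WHERE id IN (SELECT id FROM orders_deleted)")
    conn.execute("DELETE FROM orders_deleted")
    return rows
```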
  • Patent number: 11403019
    Abstract: A method includes receiving a request to write a data block to a volume resident on a multi-tenant storage array, wherein the request is associated with a first tenant of the multi-tenant storage array, and determining whether the data block matches an existing data block on the multi-tenant storage array, wherein the existing block corresponds to a second tenant. In response to determining that the decrypted data block matches the existing data block: encrypting the existing data block with a shared volume encryption key; encrypting the shared volume encryption key with a first tenant encryption key and providing the shared volume encryption key encrypted with the first tenant encryption key to the first tenant; and encrypting the shared volume encryption key with a second tenant encryption key and providing the shared volume encryption key encrypted with the second tenant encryption key to the second tenant.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: August 2, 2022
    Assignee: Pure Storage, Inc.
    Inventors: Swapnil Chandrashekhar Nagle, Virendra Prakashaiah, Ronald Karr