Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
-
Patent number: 11966630
Abstract: A data storage device includes a memory device and a controller coupled to the memory device. The controller is configured to segment a key to physical (K2P) table into two or more segments, wherein each segment of the two or more segments corresponds to a caching priority of key value (KV) pair data, organize the K2P table by storing and relocating one or more K2P table entries into a respective segment of the two or more segments, wherein the storing and relocating comprises moving a K2P table entry based on the caching priority of the KV pair data into the respective segment having the caching priority, and utilize the K2P table to manage KV pair data stored in the memory device, wherein utilizing the K2P table comprises applying a same management operation, such as prefetching, to each K2P table entry of a same segment.
Type: Grant
Filed: June 27, 2022
Date of Patent: April 23, 2024
Assignee: Western Digital Technologies, Inc.
Inventors: Ran Zamir, Alexander Bazarsky, David Avraham
-
Patent number: 11954331
Abstract: A computer-implemented method enables workload scheduling in a storage system for optimized deduplication. The method includes determining dynamic correlations of deduplications between workload processes in a prior time window. Workload processes include one or more tasks with defined execution timing parameters. The method further includes determining deduplication ratios based on the correlations of the deduplications between the workload processes. The method further includes scheduling multiple workload processes based on a highest determined deduplication ratio of the determined deduplication ratios.
Type: Grant
Filed: October 7, 2021
Date of Patent: April 9, 2024
Assignee: International Business Machines Corporation
Inventors: Miles Mulholland, Anuj Chandra, Kirsty G. Rodwell, Jorden Luke Allcock
-
Patent number: 11949751
Abstract: The present disclosure relates to restricting electronic activities from being linked with record objects. According to at least one aspect of the disclosure, a method can include accessing, by one or more processors, a plurality of electronic activities, accessing a plurality of record objects of one or more systems of record, identifying an electronic activity of the plurality of electronic activities to match to one or more record objects, determining a data source provider associated with providing access to the electronic activity, and identifying a system of record corresponding to the determined data source provider. The system of record can include a plurality of candidate record objects to which to match the electronic activity. The method can include restricting the electronic activity from being linked with the at least one record object.
Type: Grant
Filed: January 23, 2023
Date of Patent: April 2, 2024
Inventors: Oleg Rogynskyy, Tetiana Lutsaievska, John Wulf, Sathya Hariesh Prakash
-
Patent number: 11936931
Abstract: Methods, apparatus, systems and articles of manufacture to perform media device asset qualification are disclosed. An example apparatus includes at least one memory, and at least one processor to execute instructions to at least identify a first set of candidate media device assets for disqualification, the candidate media device assets including A) a signature and B) a media identifier that identifies media, generate a hash table using a second set of the candidate media device assets, determine one or more counts of matches between C) a first signature and a first media identifier of a first candidate media device asset of the second set and D) respective signatures and media identifiers of multiple ones of the second set using the hash table, the multiple ones of the second set not including the first candidate media device asset, and load the first signature into a reference database as a reference signature.
Type: Grant
Filed: October 17, 2022
Date of Patent: March 19, 2024
Assignee: The Nielsen Company (US), LLC
Inventors: Daniel Nelson, James Petro, Albert T. Borawski
-
Patent number: 11934346
Abstract: A cloud computing infrastructure hosts a web service with customer accounts. In a customer account, files of the customer account are listed in an index. Files indicated in the index are arranged in groups, with files in each group being scanned using scanning serverless functions in the customer account. The files in the customer account include a compressed tar archive of a software container. Member files of a compressed tar archive in a customer account are randomly-accessed by way of locators that indicate a tar offset, a logical offset, and a decompressor state for a corresponding member file. A member file is accessed by seeking to the tar offset in the compressed tar archive, restoring a decompressor to the decompressor state, decompressing the compressed tar archive using the decompressor, and moving to the logical offset in the decompressed data.
Type: Grant
Filed: October 17, 2022
Date of Patent: March 19, 2024
Assignee: Trend Micro Incorporated
Inventor: Brendan M. Johnson
-
Patent number: 11914554
Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
Type: Grant
Filed: January 30, 2023
Date of Patent: February 27, 2024
Assignee: Rubrik, Inc.
Inventors: Noel Moldvai, Jihang Lim
-
Patent number: 11907133
Abstract: Standardized address generation from address substrings includes receiving an address string for a place-of-interest, one-to-many mapping at least one of a plurality of address substrings of the address string to respective address components, concatenating the address substrings using a template that specifies an order of concatenating the address substrings, and making the concatenated address substrings available for further use.
Type: Grant
Filed: July 29, 2022
Date of Patent: February 20, 2024
Assignee: SafeGraph, Inc.
Inventor: Vera Sazonova
-
Patent number: 11893373
Abstract: Techniques are disclosed for deploying functions in a cloud computing environment. Parameters are annotated in a plurality of Helm charts with a predetermined token. Duplicated values in the Helm charts are identified and the predetermined token is reused for the duplicated values. Schema files from the plurality of Helm charts are parsed to extract the predetermined tokens. Input data are received as values for the predetermined tokens. The function is deployed in the cloud computing environment using the values for the predetermined tokens as parameters in the Helm charts.
Type: Grant
Filed: January 28, 2022
Date of Patent: February 6, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Frank John D'Innocenzo, Kam Yee Lee
-
Patent number: 11886397
Abstract: Provided are methods and systems for determining multi-faceted trust scores for data. A method may commence with receiving data and determining a plurality of metadata items associated with the data. The method may continue with determining one or more facets associated with each of the plurality of metadata items. The method may further include determining a parameter and a weight associated with each of the one or more facets. Upon determining the parameter and the weight, a trust score associated with each of the plurality of metadata items may be calculated based on the parameter and the weight associated with each of the one or more facets. The method may further include calculating a multi-faceted trust score of the data based on the trust score of each of the plurality of metadata items.
Type: Grant
Filed: February 19, 2020
Date of Patent: January 30, 2024
Assignee: ASG Technologies Group, Inc.
Inventors: Jean-Philippe Moresmau, Marcus MacNeill
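The scoring flow in this abstract can be illustrated with a small sketch. The abstract does not say how parameters and weights are combined, so the weighted average per metadata item and the plain mean across items below are assumptions, and the names `item_trust_score` and `data_trust_score` are hypothetical.

```python
def item_trust_score(facets):
    """Combine (parameter, weight) pairs for one metadata item.

    Assumption: a weighted average; the abstract only says the score is
    "based on" the parameter and weight of each facet.
    """
    total_weight = sum(weight for _, weight in facets)
    if total_weight == 0:
        return 0.0
    return sum(param * weight for param, weight in facets) / total_weight


def data_trust_score(items):
    """Combine per-item scores into one multi-faceted score for the data.

    items maps a metadata item name to its list of (parameter, weight) facets.
    Assumption: an unweighted mean across metadata items.
    """
    scores = [item_trust_score(facets) for facets in items.values()]
    return sum(scores) / len(scores) if scores else 0.0


example = {
    "origin": [(1.0, 1.0), (0.0, 1.0)],   # two equally weighted facets
    "freshness": [(0.5, 2.0)],            # a single facet
}
overall = data_trust_score(example)
```

Keeping the per-item and per-data steps in separate functions mirrors the two-stage calculation the abstract describes, so either combination rule can be swapped without touching the other.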
-
Patent number: 11888936
Abstract: A method for providing a proxy redirect to facilitate a storage and a retrieval of an object is disclosed. The method includes receiving a mapping of a user to a logical container that stores the object and to a storage provider that stores the logical container; receiving a key corresponding to the logical container and associated with the user; storing the mapping and the key in a database; generating, for the user, an application protocol that redirects to a pre-signed web address based on the stored mapping and the stored key; and transmitting, via a communication interface, the application protocol to the user. The method further includes the user using the application protocol to directly access the storage provider and retrieve the object.
Type: Grant
Filed: July 1, 2020
Date of Patent: January 30, 2024
Assignee: JPMORGAN CHASE BANK, N.A.
Inventor: Zachariah Antonas
-
Patent number: 11853326
Abstract: A technology for retrieving data from a database. The technology includes receiving a search query specifying a target attribute and a target attribute value, accessing an index to determine one or more target files in which the target attribute value appears, the index including a plurality of attribute values, and for each of the attribute values, one or more files in which the attribute value appears, and retrieving data from the one or more target files.
Type: Grant
Filed: October 14, 2021
Date of Patent: December 26, 2023
Assignee: Google LLC
Inventors: Hossein Ahmadi, Guang Cheng, Yannis Sismanis, Huong Thi Thu Phan, Shiyu Xie, Leo Chen, Zewen Zhang, Jing Jing Long, Amir Hossein Hormati
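The lookup this abstract describes resembles an inverted index from attribute values to files. The sketch below is only an illustration under stated assumptions: files are modeled as in-memory lists of dictionaries, and `build_index` and `search` are hypothetical names, not anything taken from the patent.

```python
from collections import defaultdict


def build_index(files):
    """Map each (attribute, value) pair to the set of files containing it.

    files maps a file name to a list of row dictionaries (an assumed layout).
    """
    index = defaultdict(set)
    for name, rows in files.items():
        for row in rows:
            for attribute, value in row.items():
                index[(attribute, value)].add(name)
    return index


def search(files, index, attribute, value):
    """Read only the files the index names, then filter their rows."""
    matches = []
    for name in sorted(index.get((attribute, value), ())):
        matches.extend(row for row in files[name] if row.get(attribute) == value)
    return matches


files = {
    "part-0": [{"country": "NZ", "id": 1}, {"country": "FR", "id": 2}],
    "part-1": [{"country": "FR", "id": 3}],
    "part-2": [{"country": "DE", "id": 4}],
}
index = build_index(files)
rows = search(files, index, "country", "FR")
```

The point of the index is that `search` never opens `part-2` for this query, because no entry for `("country", "FR")` names it.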
-
Patent number: 11836175
Abstract: Semantic search techniques via focused summarizations are described. For example, a search query is received for a text-based content item in a data set comprising a plurality of text-based content items. A first feature vector representative of the search query is obtained. A respective semantic similarity score is determined between the first feature vector and each of a plurality of second feature vectors. Each of the second feature vectors is representative of a machine-generated summarization of a respective text-based content item. The machine-generated summarization comprises a plurality of multi-word fragments that are selected from the respective text-based content item via a transformer-based machine learning model. A search result is provided responsive to the search query.
Type: Grant
Filed: June 29, 2022
Date of Patent: December 5, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Itzik Malkiel, Noam Koenigstein, Oren Barkan, Jonathan Ephrath, Yonathan Weill, Nir Nice
-
Patent number: 11797220
Abstract: Data is ingested from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure. After data ingestion is complete, one or more duplicate data chunks that were stored during the data ingestion are determined and a second data structure is updated to include one or more entries corresponding to one or more determined duplicate data chunks.
Type: Grant
Filed: August 20, 2021
Date of Patent: October 24, 2023
Assignee: Cohesity, Inc.
Inventors: Zhihuan Qiu, Sachin Jain, Anubhav Gupta, Apurv Gupta, Mohit Aron
-
Patent number: 11797486
Abstract: A device configured to identify a file in a network device, to generate a first set of block hash codes for data blocks for a first instance of the file, and to generate a second set of block hash codes for data blocks for a second instance of the file. The device is further configured to determine the first set of block hash codes matches the second set of block hash codes and to generate an entry in a file list for the instances of the file. The device is further configured to count the number of entries that are associated with the file and to determine the number of entries is greater than the redundancy threshold value. The device is further configured to delete one or more instances of the file in response to determining that the number of entries is greater than the redundancy threshold value.
Type: Grant
Filed: January 3, 2022
Date of Patent: October 24, 2023
Assignee: Bank of America Corporation
Inventors: Pratap Dande, Gilberto R. Dos Santos, Jayabalaji Murugan, Murali M. Atyam, Manoj Bohra
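A minimal sketch of the block-hash comparison and redundancy threshold described in this abstract follows. The 4096-byte block size, the use of SHA-256, the policy of keeping the first copies in sorted order, and the names `block_hashes` and `instances_to_delete` are all assumptions made for illustration; the abstract specifies none of them.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # hypothetical block size; the abstract does not give one


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return tuple(
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    )


def instances_to_delete(instances, redundancy_threshold):
    """List surplus instances whose block-hash sequences match.

    instances maps a location name to the bytes stored there.
    """
    file_list = defaultdict(list)  # block-hash sequence -> matching locations
    for location, data in sorted(instances.items()):
        file_list[block_hashes(data)].append(location)

    surplus = []
    for locations in file_list.values():
        if len(locations) > redundancy_threshold:
            # Assumed policy: keep the first `redundancy_threshold` copies.
            surplus.extend(locations[redundancy_threshold:])
    return surplus


instances = {
    "node-a": b"x" * 5000,
    "node-b": b"x" * 5000,
    "node-c": b"x" * 5000,
    "node-d": b"y" * 10,
}
doomed = instances_to_delete(instances, redundancy_threshold=2)
```

Comparing per-block digests instead of whole-file digests costs more memory, but it lets the same hashes serve partial-match checks later; that trade-off is a design choice of this sketch rather than something the abstract states.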
-
Patent number: 11782878
Abstract: A deduplicated storage system storing objects receives a search term. Storage includes metadata and segments into which the objects have been split and deduplicated. The metadata includes fingerprint sequences according to which the segments should be assembled. A partial match is found when a prefix of the term is found at an end of a segment or a suffix is found at a beginning of the segment. A fingerprint of the segment having the partial match is recorded. A first sequence of fingerprints associated with a first object is read to check whether any fingerprints in the first sequence have been recorded. When a fingerprint in the sequence has been recorded, a check of a next fingerprint in the sequence is made to see if it has been recorded as having the partial match. If the next fingerprint has been recorded, the first object is reported as having the term.
Type: Grant
Filed: December 14, 2021
Date of Patent: October 10, 2023
Assignee: EMC IP Holding Company LLC
Inventor: Philip Shilane
-
Patent number: 11748014
Abstract: Host computers running applications that store data on a block-based storage system such as a SAN provide hints that differentiate IO data based on which application generated the IO. The hints may include tags that are associated with IO commands sent to the block-based storage system. Each host application is associated with a unique identifier that is placed in the tag. Application name-to-identifier mappings may be sent from the hosts to the block-based storage system. Per-identifier/application deduplication statistics are maintained by the block-based storage system and shared with other block-based storage systems. Deduplication is disabled or de-emphasized for IO data generated by applications with statistically low deduplication ratios.
Type: Grant
Filed: February 14, 2020
Date of Patent: September 5, 2023
Assignee: DELL PRODUCTS L.P.
Inventors: Kurumurthy Gokam, Md Haris Iqbal, Prasad Paple, Kundan Kumar
-
Patent number: 11734239
Abstract: A record processing and storage system is operable to receive a plurality of labeled row data from a data source. Each labeled row data of the plurality of labeled row data includes at least one record and a corresponding row number of a plurality of row numbers. A plurality of pages are generated from records included in the labeled row data. The plurality of pages are stored via a page storage system. A plurality of page metadata corresponding to the plurality of pages is generated, where each of the plurality of page metadata is generated based on at least corresponding one row number of at least one labeled row data with records included in a corresponding one of the plurality of pages. Deduplication of duplicated records included in the plurality of pages is facilitated based on the plurality of page metadata.
Type: Grant
Filed: March 15, 2022
Date of Patent: August 22, 2023
Assignee: Ocient Holdings LLC
Inventors: George Kondiles, Ravi V. Khadiwala, Donald Scott Clark, Anna Veselova
-
Patent number: 11704036
Abstract: Systems and methods for implementing a deduplication process based on performance analyses. The system may include a processing device to determine a first performance metric associated with retrieving a second stored data block that is within a specified range of a duplicate of the first data block and a second performance metric associated with retrieving a hash value corresponding to the second stored data block. The processing device is further to retrieve the second stored data block within a specified range of the duplicate of the first data block in response to the first performance metric not exceeding the second performance metric.
Type: Grant
Filed: November 16, 2018
Date of Patent: July 18, 2023
Assignee: PURE STORAGE, INC.
Inventors: John Colgrove, Ronald Karr, Ethan L. Miller
-
Patent number: 11681660
Abstract: Embodiments presented herein describe techniques for deduplicating chunks of data across multiple clusters. A process executing in a storage system identifies one or more chunks in an incoming stream of data. For each chunk, a first fingerprint corresponding to the chunk is generated. The process determines whether the first fingerprint matches a second fingerprint listed in a corresponding entry in a deduplication map. Each entry of the deduplication map corresponds to a chunk stored in a location in one of the storage clusters. Upon determining that the first fingerprint matches the second fingerprint, the process writes, to a local persistent storage, a pointer referencing the location in that storage cluster.
Type: Grant
Filed: January 22, 2021
Date of Patent: June 20, 2023
Assignee: Cohesity, Inc.
Inventor: Ganesha Shanmuganathan
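The fingerprint lookup in this abstract can be sketched with a dictionary acting as the deduplication map. This is an illustrative sketch only: SHA-256 as the fingerprint, a Python list standing in for a cluster's chunk store, and the names `fingerprint`, `ingest`, and `cluster-a` are assumptions, and the handling of unmatched chunks is not described in the abstract.

```python
import hashlib


def fingerprint(chunk):
    """Fingerprint a chunk; SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(chunk).hexdigest()


def ingest(chunks, dedup_map, cluster_store, cluster_name="cluster-a"):
    """Return one (cluster, position) pointer per incoming chunk.

    dedup_map maps a fingerprint to the location of an already stored chunk.
    """
    pointers = []
    for chunk in chunks:
        fp = fingerprint(chunk)
        location = dedup_map.get(fp)
        if location is None:
            # Assumed behaviour for new data: store it and record where.
            cluster_store.append(chunk)
            location = (cluster_name, len(cluster_store) - 1)
            dedup_map[fp] = location
        # On a match, only the pointer is written, never the chunk itself.
        pointers.append(location)
    return pointers


dedup_map = {}
store = []
pointers = ingest([b"alpha", b"beta", b"alpha"], dedup_map, store)
```

Because the map stores a cluster name alongside the position, a pointer can reference a chunk held by a different cluster than the one doing the ingest, which is the cross-cluster aspect the abstract emphasizes.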
-
Patent number: 11675741
Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
Type: Grant
Filed: July 8, 2021
Date of Patent: June 13, 2023
Assignee: Rubrik, Inc.
Inventors: Noel Moldvai, Jihang Lim
-
Patent number: 11663194
Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
Type: Grant
Filed: October 28, 2021
Date of Patent: May 30, 2023
Assignee: EMC IP HOLDING COMPANY LLC
Inventor: Peter Marelas
-
Patent number: 11663178
Abstract: A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to assign the deduplication databases based on the type of the client device and automatically create a new deduplication database when critical thresholds are reached. In other embodiments, deduplication databases are further split into multiple database partitions. Based on a data block distribution policy, each data block is then further assigned to a particular database partition within the deduplication database to further improve efficiency and speed of the deduplication process.
Type: Grant
Filed: October 23, 2020
Date of Patent: May 30, 2023
Assignee: Commvault Systems, Inc.
Inventor: Prasad Nara
-
Patent number: 11663196
Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
Type: Grant
Filed: October 28, 2021
Date of Patent: May 30, 2023
Assignee: EMC IP HOLDING COMPANY LLC
Inventor: Peter Marelas
-
Patent number: 11663195
Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.
Type: Grant
Filed: October 28, 2021
Date of Patent: May 30, 2023
Assignee: EMC IP HOLDING COMPANY LLC
Inventor: Peter Marelas
-
Patent number: 11665377
Abstract: Aspects of the subject disclosure may include, for example, a device having a processing system including a processor; and a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations including receiving encrypted hypertext transport protocol (HTTPS) traffic including media content; separating the HTTPS traffic into audio segments and video segments; calculating a size for each audio segment in the HTTPS traffic; maintaining a sliding window of a plurality of sizes of consecutive audio segments to form a fingerprint; and identifying the media content by matching the fingerprint with a reference in a catalog. Other embodiments are disclosed.
Type: Grant
Filed: April 23, 2021
Date of Patent: May 30, 2023
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Yuan Ding, Natalia Schenck, Daniel Sanchez, Umut Akyol, Lawrence E. Bakst, Vinay Sharma
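The sliding-window fingerprint in this abstract can be sketched as follows. The sketch assumes the audio segment sizes have already been separated from the traffic and that a fingerprint must match a catalog entry exactly; real encrypted traffic would likely need some tolerance. The window length of 3 and the name `identify` are hypothetical.

```python
from collections import deque


def identify(observed_sizes, catalog, window=3):
    """Return the first catalog title whose sizes contain the current window.

    catalog maps a title to its known sequence of audio segment sizes.
    Assumption: exact equality of sizes is enough to declare a match.
    """
    # Precompute every run of `window` consecutive sizes in the catalog.
    reference = {}
    for title, sizes in catalog.items():
        for start in range(len(sizes) - window + 1):
            reference.setdefault(tuple(sizes[start:start + window]), title)

    recent = deque(maxlen=window)  # the sliding window of latest sizes
    for size in observed_sizes:
        recent.append(size)
        if len(recent) == window and tuple(recent) in reference:
            return reference[tuple(recent)]
    return None


catalog = {
    "title-1": [4100, 4096, 4210, 3990, 4005],
    "title-2": [5120, 5000, 5050, 4980],
}
match = identify([7, 4096, 4210, 3990], catalog)
miss = identify([1, 2, 3, 4], catalog)
```

A `deque` with `maxlen` drops the oldest size automatically, so the window advances by one segment per observation without any manual slicing.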
-
Patent number: 11659006
Abstract: An assessment component that facilitates assessment and enforcement of policies within a computer environment can comprise a compliance component that determines whether a policy, that defines one or more requirements associated with usage of one or more enterprise components of an enterprise computing system, is in compliance with a plurality of standardized policies that govern operation of the one or more enterprise components of the enterprise computing system. The assessment component can also comprise a policy optimization component that determines one or more changes to the policy that achieve the compliance with the plurality of standardized policies based on a determination that the policy complies with a first standardized policy of the plurality of standardized policies and fails to comply with a second standardized policy of the plurality of standardized policies.
Type: Grant
Filed: December 23, 2020
Date of Patent: May 23, 2023
Assignee: Kyndryl, Inc.
Inventors: Milton H. Hernandez, Anup Kalia, Brian Peterson, Vugranam C. Sreedhar, Sai Zeng
-
Patent number: 11625167
Abstract: An embodiment of a semiconductor apparatus may include technology to determine if a threshold is met based on runtime memory usage, and enable foreground memory deduplication if the threshold is determined to be met. Other embodiments are disclosed and claimed.
Type: Grant
Filed: November 16, 2018
Date of Patent: April 11, 2023
Assignee: Intel Corporation
Inventors: Dujian Wu, Yuping Yang, Donggui Yin
-
Patent number: 11609883
Abstract: An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a compression estimate for that dataset, to designate a scan criterion to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset. The processing device generates the compression estimate for the dataset based at least in part on contents of the compression estimate table.
Type: Grant
Filed: May 29, 2018
Date of Patent: March 21, 2023
Assignee: EMC IP Holding Company LLC
Inventors: Anton Kucherov, David Meiri
-
Patent number: 11599507
Abstract: A file system may include an object storage, a merged index, and a distributed database. When a file is stored in the file system, the file may be converted to an object and be stored in the object storage. The deduplication index of the file may be stored in the distributed database. The namespace metadata of the file may be stored in the merged index. The merged index generates namespace entries of the file when the file is created, deleted, and/or modified. A namespace entry may be associated with a specific file and may include a creation version and a deletion version. When a file is deleted or modified, instead of modifying the existing namespace entries, new entries associated with different versions and including different creation or deletion versions are created. The status of a file may be monitored by one or more entries associated with a file.
Type: Grant
Filed: December 9, 2021
Date of Patent: March 7, 2023
Assignee: Druva Inc.
Inventors: Milind Borate, Alok Kumar, Aditya Agrawal, Anup Agarwal, Somesh Jain, Aditya Kelkar, Yogendra Acharya, Anand Apte, Amit Kulkarni
-
Patent number: 11593028
Abstract: A method of operating a computing device for processing data is provided. The method includes (a) monitoring a set of performance characteristics of the processing of the data; (b) periodically calculating, using a predefined set of coefficients, a linear combination of the monitored set of performance characteristics to yield a combined metric; and (c) upon detecting that the combined metric exceeds a threshold while operating in a first processing mode, transitioning from operating in the first processing mode to operating in a second processing mode. (1) The second processing mode has a higher bandwidth than the first processing mode, and (2) processing of data in the second processing mode is less robust than processing of data in the first processing mode. An apparatus, system, and computer program product for performing a similar method are also provided.
Type: Grant
Filed: March 11, 2021
Date of Patent: February 28, 2023
Assignee: EMC IP Holding Company LLC
Inventors: Vladimir Shveidel, Alexei Kabishcer
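Steps (b) and (c) of this abstract amount to a weighted sum compared against a threshold. The sketch below shows that arithmetic; the characteristic names `latency_ms` and `queue_depth`, the coefficient values, the threshold, and the mode labels are all hypothetical, and the abstract does not describe a transition back to the first mode, so none is modeled.

```python
def combined_metric(samples, coefficients):
    """Linear combination of monitored characteristics.

    samples and coefficients are dictionaries keyed by characteristic name.
    """
    return sum(coefficients[name] * samples[name] for name in coefficients)


def next_mode(current_mode, samples, coefficients, threshold):
    """Switch from the first mode to the second when the metric exceeds the threshold."""
    if current_mode == "first" and combined_metric(samples, coefficients) > threshold:
        return "second"
    return current_mode


coefficients = {"latency_ms": 0.5, "queue_depth": 2.0}  # assumed weights
samples = {"latency_ms": 10.0, "queue_depth": 3.0}      # one periodic reading

metric = combined_metric(samples, coefficients)  # 0.5 * 10.0 + 2.0 * 3.0
mode = next_mode("first", samples, coefficients, threshold=10.0)
```

Calling `next_mode` once per monitoring period reproduces the "periodically calculating" step; the function is stateless, so the caller keeps track of the current mode.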
-
Patent number: 11573928
Abstract: Techniques for processing data may include: receiving a data block stored in a data set, wherein a hash value is derived from the data block; determining, in accordance with selection criteria, whether the hash value is included in a subset; responsive to determining the hash value is included in the subset, performing processing that updates a table in accordance with the hash value and the data set, and determining, in accordance with the information in the table, whether to perform deduplication processing for the data block to determine whether the data block is a duplicate of another stored data block. The table may include an entry for the hash value. The entry may include information identifying data sets referencing the data block and, for each of the data sets, may specify a reference count denoting a number of times the data set references the data block.
Type: Grant
Filed: March 13, 2020
Date of Patent: February 7, 2023
Assignee: EMC IP Holding Company LLC
Inventors: Anton Kucherov, David Meiri
-
Patent number: 11573924
Abstract: Methods and systems for storing and managing large numbers of small files. A data processing system includes clients that generate large numbers of small files to be stored on a storage device managed by a File System (FS). An Archive Server (AS) receives multiple files from the client, archives the files in larger archives, and sends the archives to the FS for storage. When requested to read a file, the AS retrieves the archive in which the file is stored, extracts the file and sends it to the requesting client. In other words, the AS communicates with the clients in individual file units, and with the storage device in archive units. The AS is typically constructed as an add-on layer on top of a conventional FS, which enables the FS to handle small files efficiently without modification.
Type: Grant
Filed: September 23, 2019
Date of Patent: February 7, 2023
Assignee: COGNYTE TECHNOLOGIES ISRAEL LTD.
Inventor: Yossi Chai
-
Patent number: 11570196
Abstract: A method for determining duplication of a vulnerability may include a vulnerability extraction step of extracting vulnerability uniform resource locator (URL) addresses including the vulnerability from an analysis target server; a hash generation step of generating the URL hash value corresponding to the extracted vulnerability from the vulnerability URL address; and a duplication determination step of determining, when the URL hash value is present in the first comparison table, that the vulnerability is duplicated and excluding the corresponding vulnerability from vulnerability information.
Type: Grant
Filed: February 26, 2020
Date of Patent: January 31, 2023
Assignee: NAVER CLOUD CORPORATION
Inventors: Bong Goo Kang, Min Seob Lee, Won Tae Jang, June Ahn, Jihwan Yoon
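The hash-and-compare step in this abstract can be illustrated with a set serving as the comparison table. The abstract does not say how URLs are normalized or hashed, nor whether new hashes are added to the table, so the SHA-256 digest of the raw URL string, the insertion of new hashes, and the names `url_hash` and `exclude_duplicates` are assumptions.

```python
import hashlib


def url_hash(url):
    """Hash a vulnerability URL; SHA-256 over the raw string is assumed."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def exclude_duplicates(vulnerability_urls, comparison_table):
    """Keep only URLs whose hash is not already in the comparison table.

    comparison_table is a set of hashes seen so far. Recording new hashes
    in it is an assumption that makes repeated scans idempotent.
    """
    fresh = []
    for url in vulnerability_urls:
        digest = url_hash(url)
        if digest in comparison_table:
            continue  # duplicated vulnerability: exclude it from the report
        comparison_table.add(digest)
        fresh.append(url)
    return fresh


table = {url_hash("https://example.test/login?id=1")}
found = [
    "https://example.test/login?id=1",
    "https://example.test/search?q=a",
    "https://example.test/search?q=a",
]
report = exclude_duplicates(found, table)
```

Storing fixed-length digests rather than the URLs themselves keeps table entries uniform in size regardless of URL length.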
-
Patent number: 11561863
Abstract: A method for enabling data set changes to be reverted to a prior point in time or state is disclosed. In one embodiment, such a method includes providing a data set comprising one or more data elements and a specified number of generations of the data elements. In certain embodiments, the data set is a partitioned data set extended (PDSE) data set, and the data elements are “members” within the PDSE data set. The method further includes tracking changes made by a job to data elements of the data set. The method further references, in a data structure (also referred to herein as a “cluster”) associated with the job, previous generations of the data elements changed by the job. In certain embodiments, the data structure is stored in the data set. A corresponding system and computer program product are also disclosed.
Type: Grant
Filed: August 20, 2015
Date of Patent: January 24, 2023
Assignee: International Business Machines Corporation
Inventors: Trevor A. Geisler, David C. Reed, Thomas C. Reed, Max D. Smith
-
Patent number: 11539811
Abstract: Systems, devices and methods for adaptive compression of stored information include a memory management computing device programmed to monitor a size of a plurality of data structures stored in a data repository. The computing device compares the size of each of a plurality of data structures to a predetermined threshold. When a size of an uncompressed data structure meets the threshold, the memory management computing device calculates a value of a first compression parameter based on a value of a first parameter and a value of a second parameter of each data element of the uncompressed data structure, calculates a value of a second compression parameter based on the value of the first parameter of each data element of the uncompressed data structure, generates a compressed data structure based on the value of the first compression parameter and the second compression parameter; and replaces, in the data repository, the uncompressed data structure with the compressed data structure.
Type: Grant
Filed: June 21, 2022
Date of Patent: December 27, 2022
Assignee: Chicago Mercantile Exchange Inc.
Inventors: Fateen Sharaby, Sriram A. Raju Datla, Dhiraj Subhash Bawadhankar, John Charles Redfield, Justin Yeong-Juin Lee
-
Patent number: 11520744Abstract: Described is a system (and method) that intelligently distributes data within a clustered storage environment. To provide such a capability, the system may distribute backup files by considering a source of the data to be backed-up. In particular, the system may leverage the ability of front-end components such as a backup application to perform a granular data source identification of data. Such information may be propagated to back-end components such as a storage filesystem in the form of a data source identifier (e.g. placement tag). The data source identifiers may then be accessed by the clustered storage system to intelligently distribute backup files amongst a set of storage nodes forming a cluster. For example, backup files from the same data source may be stored on the same storage node to obtain the same deduplication efficiency as a single storage system.Type: GrantFiled: August 21, 2019Date of Patent: December 6, 2022Assignee: EMC IP Holding Company LLCInventors: Abhishek Rajimwale, George Mathew, Murthy Mamidi, Donna Barry Lewis
-
Patent number: 11514054Abstract: Supervised partitioning is used to perform record matching. A request to identify matches between records is received. A graph representation that indicates similarities between the records is partitioned and an evaluation of the partitioning is performed according to a supervised machine learning technique to generate a confidence value in the partitioning. An indication of equivalent records according to the partitioning and the confidence value of the partitioning may be provided.Type: GrantFiled: September 27, 2018Date of Patent: November 29, 2022Assignee: Amazon Technologies, Inc.Inventors: Andrew Borthwick, Robert Anthony Barton, Jr., Stephen Michael Ash, Russell Reas
-
Patent number: 11514025Abstract: Performing snapshot conscious internal file modification for network-attached storage is presented herein. A file system can comprise a first component configured to modify, during a service request, storage for a subset of data blocks of a file—the service request not being recognized by an external entity as a change of content of the file. Further, the file system can comprise a second component configured to prevent, based on the service request, a copy of the storage from being created for servicing of a snapshot—the snapshot comprising a point-in-time copy of the file system.Type: GrantFiled: August 19, 2019Date of Patent: November 29, 2022Assignee: EMC IP HOLDING COMPANY LLCInventor: Ravi V. Batchu
-
Patent number: 11500841Abstract: Systems, computer-implemented methods, and computer program products that can facilitate encoding a tree data structure into a vector based on a set of constraints are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a constraint former that can form a set of constraints based on a first tree data structure and a vector encoder that can encode the first tree data structure into a vector based on the set of constraints.Type: GrantFiled: January 4, 2019Date of Patent: November 15, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Achille Fokoue-Nkoutche, Maxwell Crouse, Michael Witbrock, Ryan A. Musa, Maria Chang
-
Patent number: 11474700Abstract: Technologies for compressing communications for accelerator devices are disclosed. An accelerator device may include a communication abstraction logic unit to manage communication with one or more remote accelerator devices. The communication abstraction logic unit may receive communication to and from a kernel on the accelerator device. The communication abstraction logic unit may compress and decompress the communication without instruction from the corresponding kernel. The communication abstraction logic unit may choose when and how to compress communications based on telemetry of the accelerator device and the remote accelerator device.Type: GrantFiled: April 30, 2019Date of Patent: October 18, 2022Assignee: Intel CorporationInventors: Susanne M. Balle, Evan Custodio, Francesc Guim Bernat
-
Patent number: 11468063Abstract: The subject technology provides information, corresponding to properties of a build side of a join operation, to a bloom filter. The subject technology, based at least in part on the information from the bloom filter, determines, during execution of a query plan, at least one property of the join operation to determine whether to switch an aggregation operator to a pass through mode, the at least one property comprising at least a reduction rate. The subject technology switches, in response to the reduction rate being below a threshold value, the aggregation operator to the pass through mode during runtime of the query plan and, while the aggregation operator is in the pass through mode, an input stream of data goes through the aggregation operator without being analyzed and the input stream of data matches an output stream of data flowing out of the aggregation operator.Type: GrantFiled: April 16, 2021Date of Patent: October 11, 2022Assignee: Snowflake Inc.Inventors: Bowei Chen, Thierry Cruanes, Florian Andreas Funke, Allison Waingold Lee, Jiaqi Yan
-
Patent number: 11461269Abstract: A data management device includes a persistent storage and a processor. The persistent storage includes an object storage. The processor segments a file into file segments. The processor generates meta-data of the file segments. The processor stores a portion of the file segments in a data object of the object storage. The processor stores a portion of the meta-data of the file segments in a meta-data object of the object storage.Type: GrantFiled: July 21, 2017Date of Patent: October 4, 2022Assignee: EMC IP HOLDING COMPANYInventors: Shuang Liang, Mahesh Kamat, Bhimsen Bhanjois
-
Patent number: 11429573Abstract: A data deduplication system includes a data deduplication subsystem coupled to each of a host system and a storage system. The data deduplication system receives data from the host system, generates a data deduplication identifier for the data, and determines whether the data deduplication identifier for the data is stored in a data deduplication database. In response to determining that the data deduplication identifier is not stored in the data deduplication database, the data deduplication system stores the data deduplication identifier for the data in the data deduplication database in association with a data counter for the data, and transmits the data to the storage system for storage. In response to determining that the data deduplication identifier is stored in the data deduplication database, the data deduplication system increments a data counter that is associated with the data deduplication identifier in the data deduplication database, and discards the data.Type: GrantFiled: October 16, 2019Date of Patent: August 30, 2022Assignee: Dell Products L.P.Inventors: Dharmesh M. Patel, Ravikanth Chaganti, Rizwan Ali
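Illustrative sketch (not taken from the patent): the identifier-and-counter flow in the abstract above might look roughly like the following. The class name `DedupStore`, the use of SHA-256 as the deduplication identifier, the in-memory dictionaries, and the initial counter value of 1 are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory stand-in for the deduplication subsystem."""

    def __init__(self):
        self.counters = {}  # deduplication identifier -> data counter
        self.storage = {}   # stand-in for the backing storage system

    def write(self, data: bytes) -> bool:
        """Return True if the data was forwarded to storage, False if discarded."""
        # Assumption: a SHA-256 digest serves as the deduplication identifier.
        ident = hashlib.sha256(data).hexdigest()
        if ident in self.counters:
            # Identifier already known: increment its counter and discard the data.
            self.counters[ident] += 1
            return False
        # New identifier: record it with a counter and transmit the data to storage.
        self.counters[ident] = 1
        self.storage[ident] = data
        return True
```

Writing the same payload twice stores it once and leaves its counter at 2; a later delete path could decrement the counter and reclaim the data when it reaches zero, although the abstract does not describe that step.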
-
Patent number: 11429634Abstract: In some embodiments, an interface of a content management system manages synchronized content on storage systems. For example, the interface stores, on a metadata storage structure, records of metadata associated with blocks of data stored on a storage, the records including block identifiers that uniquely identify the blocks and timestamps associated with the blocks. The interface identifies a batch of storage operations associated with the blocks, including one or more delete operations. For each delete operation, the interface queries the metadata storage structure for a timestamp corresponding to a block of data associated with the delete operation, determines whether the delete operation creates a race condition between the delete operation and an add operation associated with the block of data, and rejects the delete operation when the delete operation creates the race condition or the timestamp corresponding to the block of data is newer than a predetermined period of time.Type: GrantFiled: December 29, 2017Date of Patent: August 30, 2022Assignee: Dropbox, Inc.Inventors: Nipunn Koorapati, Daniel Horn, Elmer Charles Jubb, IV
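Illustrative sketch (not taken from the patent): the rejection rule for delete operations described above could be expressed along these lines. The function name, the representation of a race condition as a pending add for the same block in the batch, and the `min_age_seconds` parameter are hypothetical simplifications.

```python
def should_reject_delete(block_id, block_timestamps, pending_adds, now, min_age_seconds):
    """Decide whether a delete operation for block_id should be rejected.

    block_timestamps: dict mapping block identifier -> timestamp (seconds),
                      standing in for the metadata storage structure.
    pending_adds:     set of block identifiers with an add operation in flight
                      (assumed model of the race condition).
    """
    # Reject if the delete would race with an add for the same block.
    if block_id in pending_adds:
        return True
    # Reject if the block's timestamp is newer than the predetermined period.
    timestamp = block_timestamps.get(block_id)
    if timestamp is not None and now - timestamp < min_age_seconds:
        return True
    return False
```

Under these assumptions a delete is allowed only for a block that has no concurrent add and whose timestamp is at least `min_age_seconds` old.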
-
Patent number: 11429575Abstract: Example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to deduplicate common devices across multiple data sources are disclosed. An example system includes a comparison controller to identify a first device in a first data source and a second device in a second data source as a possible common device.Type: GrantFiled: July 10, 2020Date of Patent: August 30, 2022Assignee: THE NIELSEN COMPANY (US), LLCInventors: Rachel Worth Olson, Michael Evan Anderson, Rishi Sriram, Margaret M. Orton, Fatemehossadat Miri, Samantha M. Mowrer, David J. Kurzynski, Molly Poppie
-
Patent number: 11423027Abstract: A system and method for a text search of a database, including converting a text search expression to a query plan and implementing the text search as the query plan on the database. The implementing of the text search includes a one-pass indexing as a single scan of an inverse index table associated with the database.Type: GrantFiled: January 29, 2016Date of Patent: August 23, 2022Assignee: MICRO FOCUS LLCInventors: Qiming Chen, Meichun Hsu, Malu G. Castellanos
-
Patent number: 11416316Abstract: A first-to-second correlation engine determines correlations between first objects from a first object feed, and second objects from a second object storage, and generates first correlation messages indicative of the correlations for a first-to-second object direction and a second-to-first object direction. A second-to-first correlation engine determines respective correlations between the second objects from a second object feed and the first objects from a first object storage, and generates second correlation messages indicative of the respective correlations for the second-to-first object direction and the first-to-second object direction. A first-to-second correlation storage engine receives the first and second correlation messages for the first-to-second object direction and updates first-to-second correlation storage based on the received messages.Type: GrantFiled: October 15, 2020Date of Patent: August 16, 2022Assignee: AMADEUS S.A.S.Inventors: Serge Beuzit, Jean-Samuel Pasquali
-
Patent number: 11409766Abstract: Disclosed herein is the creation of probabilistic data structures for container reclamation. One method involves retrieving a segment object list of a data container and creating a probabilistic data structure. The segment object list comprises a plurality of segment objects, the data container comprises the plurality of segment objects and a plurality of data objects, and each segment object of the plurality of segment objects comprises a hash value determined by performing a hashing function on a corresponding data object of the plurality of data objects. The creating includes, for each segment object in the segment object list, identifying an element of a plurality of elements of the probabilistic data structure using a hash value of the each segment object and setting the element to indicate the segment object references a corresponding data object of the plurality of data objects.Type: GrantFiled: October 26, 2020Date of Patent: August 9, 2022Assignee: Veritas Technologies LLCInventors: Yingsong Jia, Xin Wang, Guangbin Zhang
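Illustrative sketch (not taken from the patent): building a probabilistic structure from segment-object hash values, as the abstract above describes, might look like the following single-hash, Bloom-filter-style bit array. The function names, the SHA-256 hashing function, and the default of 1024 elements are assumptions.

```python
import hashlib


def segment_hash(data: bytes) -> int:
    """Assumed hashing function: first 8 bytes of SHA-256 as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def build_reference_filter(segment_hashes, num_elements=1024):
    """Set one element per segment object to mark its data object as referenced."""
    elements = bytearray(num_elements)
    for h in segment_hashes:
        # The hash value of the segment object identifies the element to set.
        elements[h % num_elements] = 1
    return elements


def may_be_referenced(elements, h):
    """False means definitely unreferenced; True means possibly referenced."""
    return elements[h % len(elements)] == 1
```

Because distinct hash values can map to the same element, a set element only indicates that a data object may be referenced, while an unset element indicates it is not referenced by any segment object in the list and is a candidate for reclamation.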
-
Patent number: 11403266Abstract: A method for deleting a row from a table in a database system comprises logically deleting the row in the first table in the database system by inserting a key of the row into a corresponding row of a dedicated table in the database system; querying the dedicated table during a query against the first table to identify the corresponding row in the dedicated table; and in response to identifying the corresponding row in the dedicated table, deleting the row from the first table and the corresponding row from the dedicated table as part of query processing during a subsequent query.Type: GrantFiled: June 4, 2019Date of Patent: August 2, 2022Assignee: International Business Machines CorporationInventors: Andreas Brodt, Oliver Koeth, Daniel Martin, Knut Stolze
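Illustrative sketch (not taken from the patent): one reading of the logical-delete scheme above is that a delete only records the row key in a dedicated table, and the physical removal happens later as a side effect of a query. The `Table` class, its dictionary and set representation, and the choice to purge during the first query that encounters the key are hypothetical.

```python
class Table:
    """Hypothetical first table paired with a dedicated table of deleted keys."""

    def __init__(self, rows):
        self.rows = dict(rows)     # first table: key -> row
        self.deleted_keys = set()  # dedicated table: keys of logically deleted rows

    def logical_delete(self, key):
        # Logical delete: only insert the key into the dedicated table.
        if key in self.rows:
            self.deleted_keys.add(key)

    def query(self, predicate=lambda row: True):
        result = []
        for key in list(self.rows):
            if key in self.deleted_keys:
                # Physically delete the row and its dedicated-table entry
                # as part of query processing.
                del self.rows[key]
                self.deleted_keys.discard(key)
                continue
            if predicate(self.rows[key]):
                result.append(self.rows[key])
        return result
```

After `logical_delete`, the row is still physically present but no longer visible to queries; the next query removes it from both tables.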
-
Patent number: 11403019Abstract: A method includes receiving a request to write a data block to a volume resident on a multi-tenant storage array, wherein the request is associated with a first tenant of the multi-tenant storage array, and determining whether the data block matches an existing data block on the multi-tenant storage array, wherein the existing block corresponds to a second tenant. In response to determining that the decrypted data block matches the existing data block: encrypting the existing data block with a shared volume encryption key; encrypting the shared volume encryption key with a first tenant encryption key and providing the shared volume encryption key encrypted with the first tenant encryption key to the first tenant; and encrypting the shared volume encryption key with a second tenant encryption key and providing the shared volume encryption key encrypted with the second tenant encryption key to the second tenant.Type: GrantFiled: October 26, 2018Date of Patent: August 2, 2022Assignee: Pure Storage, Inc.Inventors: Swapnil Chandrashekhar Nagle, Virendra Prakashaiah, Ronald Karr