Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
-
Patent number: 12265503
Abstract: Techniques are described for selectively extending a WORM lock expiration time for a chunkfile. An example method comprises identifying, by a data platform implemented by a computing system, a chunkfile that includes a chunk that matches data for an object of a file system; determining, by the data platform after identifying the chunkfile, whether to deduplicate the data for the object of the file system by adding a reference to the matching chunk, wherein determining whether to deduplicate the data comprises applying a policy to at least one of a property of the chunkfile or properties of one or more of a plurality of chunks included in the chunkfile; and in response to determining to not deduplicate the data for the object of the file system, causing a new chunk for the data for the object of the file system to be stored in a different, second chunkfile.
Type: Grant
Filed: March 14, 2023
Date of Patent: April 1, 2025
Assignee: Cohesity, Inc.
Inventors: Aiswarya Bhavani Shankar, Dane Van Dyck, Venkata Ranga Radhanikanth Guturi, Leo Prasath Arulraj
-
Patent number: 12238066
Abstract: Techniques are disclosed for processing data packets and implementing policies in a software defined network (SDN) of a virtual computing environment. A plurality of computing nodes are communicatively coupled to network devices. The computing nodes are configured to provide at least one cloud edge processing function. The network devices are configured to enable communications between virtual machines within a virtual network of the virtual computing environment in accordance with associated policies. The network devices and the processing function are disaggregated from dependencies on particular computing nodes that are hosting the virtual machines.
Type: Grant
Filed: February 18, 2022
Date of Patent: February 25, 2025
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Deepak Bansal, Gerald Roy Degrace
-
Patent number: 12197467
Abstract: Methods for establishing a second database and maintaining synchronization between a first database and the second database in a data management system are described. According to the method, a snapshot of a state of the first database may be acquired and mounted to a second server. The second database may be restored to the second server based on the mount. The second database may replicate the state of the first database. Synchronization may be enabled between the first database and the second database. One or more metrics associated with replication of data between the databases may be identified. A backup process for transaction logs associated with the first database may be initiated and the transaction logs may be mounted to the second server based on the metrics. One or more transactions may be applied to the second database based on the transaction logs mounted to the second server.
Type: Grant
Filed: February 4, 2022
Date of Patent: January 14, 2025
Assignee: Rubrik, Inc.
Inventors: Bala Sunil Kandi, Peter John Milanese
-
Patent number: 12189586
Abstract: Example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to deduplicate common devices across multiple data sources are disclosed. An example apparatus includes instructions to identify a first device in a first data source and a second device in a second data source as a possible common device, calculate at least one of a station duration metric, a time match metric or a station path metric, the station duration metric, the time match metric based on times of day that the first device tuned to a second set of stations and times of day that the second device tuned to the second set of stations, determine a score based on the at least one of the station duration metric, the time match metric, or the station path metric, and determine when the first device and the second device are a common device based on the score.
Type: Grant
Filed: August 29, 2022
Date of Patent: January 7, 2025
Assignee: The Nielsen Company (US), LLC
Inventors: Rachel Worth Olson, Michael Evan Anderson, Rishi Sriram, Margaret M. Orton, Fatemehossadat Miri, Samantha M. Mowrer, David J. Kurzynski, Molly Poppie
-
Patent number: 12189572
Abstract: Computing systems, methods, and non-transitory storage media are provided for obtaining images, extracting layers from each of the images, extracting segments from each of the layers, generating a compressed version of the segments by storing a single copy of each segment and metadata to reconstruct the layers from the segments and the images from the layers, and simulating a reconstruction of the image from the compressed version.
Type: Grant
Filed: June 13, 2023
Date of Patent: January 7, 2025
Assignee: Palantir Technologies Inc.
Inventors: Ashray Jain, Bradley Moylan, Callum Rogers, Charissa Sonder Plattner
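
The single-copy-per-segment idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the SHA-256 content key, the `compress_images` and `reconstruct` helpers, and the sample image names are all assumptions chosen for illustration.

```python
import hashlib


def compress_images(images):
    """Store each distinct segment once and keep metadata to rebuild images.

    `images` maps an image name to a list of layers; each layer is a list of
    byte-string segments. (This input shape is an assumption of the sketch.)
    """
    store = {}     # segment hash -> the single stored copy of that segment
    manifest = {}  # image name -> list of layers, each a list of segment hashes
    for name, layers in images.items():
        manifest[name] = []
        for layer in layers:
            keys = []
            for segment in layer:
                key = hashlib.sha256(segment).hexdigest()
                store.setdefault(key, segment)  # keep only the first copy
                keys.append(key)
            manifest[name].append(keys)
    return store, manifest


def reconstruct(store, manifest, name):
    """Simulate rebuilding an image's layers from the deduplicated store."""
    return [[store[key] for key in layer] for layer in manifest[name]]


images = {
    "app:v1": [[b"base-os", b"libs"], [b"app-v1"]],
    "app:v2": [[b"base-os", b"libs"], [b"app-v2"]],
}
store, manifest = compress_images(images)
```

Here the two hypothetical images share a base layer, so six segments collapse into four stored copies, and `reconstruct(store, manifest, "app:v1")` returns the original layers.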
-
Patent number: 12190335
Abstract: Methods, apparatus, systems, and articles of manufacture to generate reference signature assets from meter signatures are disclosed. Example apparatus disclosed herein include a signature comparator to compare meter signature strings with search signature strings to identify a first fragment match result, which is associated with a sequence position within a first media represented by the search signature strings included in the first fragment match result, and which is also associated with a length of the first media. Disclosed example apparatus also include candidate signature asset generation circuitry to generate a candidate signature asset from a meter signature sequence based on the sequence position and the length of the first media, and store the candidate signature asset in a candidate pool associated with the first media.
Type: Grant
Filed: October 29, 2021
Date of Patent: January 7, 2025
Assignee: The Nielsen Company (US), LLC
Inventors: Albert T. Borawski, Geetanjali Arya, Satish Kumar Kukunuru
-
Patent number: 12189488
Abstract: One example method includes receiving from a node, in an HSAN that includes multiple nodes, an ADD_DATA request to add an entry to a distributed ledger of the HSAN, the request comprising a user ID that identifies the node, a hash of a data segment, and a storage location of the data segment at the node, performing a challenge-and-response process with the node to verify that the node has a copy of the data that was the subject of the entry, making a determination that a replication factor X has not been met, and adding the entry to the distributed ledger upon successful conclusion of the challenge-and-response process.
Type: Grant
Filed: July 31, 2023
Date of Patent: January 7, 2025
Assignee: EMC IP Holding Company LLC
Inventors: Arun Murti, Joey C. Lei, Adam E. Brenner, Mark D. Malamut
-
Patent number: 12182088
Abstract: A method includes generating a plurality of pages from a plurality of records received from a plurality of data sources. Deduplication of the plurality of pages is facilitated, for each page of the plurality of pages, based on a plurality of page metadata of the plurality of pages. A filtered set of potentially-intersecting pages is identified for each given page as a proper subset of the plurality of pages stored in the page storage system based on first comparison parameters, and an intersecting set of pages that include a row number intersection with the given page is identified as a proper subset of the filtered set of potentially-intersecting pages based on second comparison parameters. Records with row numbers included in row number intersections with other pages in the intersecting set of pages are removed from each page.
Type: Grant
Filed: September 15, 2023
Date of Patent: December 31, 2024
Assignee: Ocient Holdings LLC
Inventors: George Kondiles, Ravi V. Khadiwala, Donald Scott Clark, Anna Veselova
-
Patent number: 12174848
Abstract: A computer-implemented method is provided for an automated extract, transform, and load process for a target database comprising linked data. During the data transformation phase linked data elements are added as data to a data set.
Type: Grant
Filed: August 19, 2019
Date of Patent: December 24, 2024
Assignee: ONTOFORCR NV
Inventors: Kenny Knecht, Paul Vauterin, Hans Constandt
-
Patent number: 12164477
Abstract: A repository of replicated chunk files is analyzed to identify chunk files that meet at least a portion of combination criteria. Selected chunk files are associated together under a data protection grouping container. Erasure coding is applied to the data protection grouping container including by utilizing the selected chunk files as different data stripes of the erasure coding and generating one or more parity stripes based on the different data stripes.
Type: Grant
Filed: January 24, 2022
Date of Patent: December 10, 2024
Assignee: Cohesity, Inc.
Inventors: Apurv Gupta, Akshat Agarwal, Manvendra Singh Tomar, Donthula Akshith Reddy, Kushal Singh, Tarun Kumar Yadav, Mandar Suresh Naik
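
As a rough illustration of treating chunk files as data stripes and deriving a parity stripe from them, the Python sketch below uses single-parity XOR, the simplest erasure code. The abstract does not name a coding scheme, so XOR, the zero-padding of shorter stripes, and the helper names `xor_parity` and `recover` are assumptions.

```python
def xor_parity(stripes):
    """Return one parity stripe: the byte-wise XOR of all stripes.

    Shorter stripes are treated as zero-padded to the longest stripe.
    """
    width = max(len(stripe) for stripe in stripes)
    parity = bytearray(width)
    for stripe in stripes:
        for i, byte in enumerate(stripe):
            parity[i] ^= byte
    return bytes(parity)


def recover(surviving, parity, length):
    """Rebuild the single missing stripe, given its original length."""
    return xor_parity(list(surviving) + [parity])[:length]


# Three hypothetical chunk files grouped as data stripes.
chunk_files = [b"alpha-chunk", b"beta", b"gamma-chk"]
parity = xor_parity(chunk_files)

# Pretend the second chunk file was lost and rebuild it from the rest.
rebuilt = recover([chunk_files[0], chunk_files[2]], parity, len(chunk_files[1]))
```

Single parity tolerates the loss of only one stripe. Production systems typically use codes such as Reed-Solomon to generate several parity stripes.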
-
Patent number: 12164493
Abstract: A method for inserting a KV pair to a separated database, the method may include receiving a request to insert the KV pair to the separated database, wherein the separated database comprises a log structured merge (LSM) tree and a KV database that is separated from the LSM tree; determining whether the KV pair should be associated with a versioned LSM entry or with a non-versioned LSM entry; and inserting the KV pair and a KV timestamp in the separated database according to the determining; wherein the inserting includes: storing a combination of the value and the KV timestamp in the KV database; defining an access key to the KV database; wherein the access key is based on the combination when determining that the KV pair should be associated with a versioned LSM; and wherein the access key is based on the key and not on the timestamp when determining that the KV pair should be associated with a non-versioned LSM.
Type: Grant
Filed: February 14, 2022
Date of Patent: December 10, 2024
Assignee: Pliops Ltd.
Inventors: Guy Guetta, Edward Bortnikov, Michael Pan, Moshe Twitto, Tamar Weiss, Shmuel Dashevsky, Niv Dayan
-
Patent number: 12164799
Abstract: Data associated with a source system is ingested. After the data is ingested, a post-processing metadata conversion process is performed including by selecting an entry of a chunk metadata data structure and determining that a data chunk associated with the selected entry is not referenced by at least a threshold number of objects. In response to determining that the data chunk associated with the selected entry is not referenced by at least the threshold number of objects, metadata of a tree data structure node corresponding to a chunk identifier associated with the data chunk is updated to store a reference to a chunk file storing the data chunk and the selected entry is removed from the chunk metadata data structure.
Type: Grant
Filed: August 28, 2023
Date of Patent: December 10, 2024
Assignee: Cohesity, Inc.
Inventors: Zhihuan Qiu, Sachin Jain, Anubhav Gupta, Apurv Gupta, Mohit Aron
-
Patent number: 12159110
Abstract: The present disclosure relates to systems, methods, and computer-readable media for utilizing a concept graphing system to determine and provide relationships between concepts within document collections or corpora. For example, the concept graphing system can generate and utilize machine-learning models, such as a sparse graph recovery machine-learning model, to identify less-obvious correlations between concepts, including positive and negative concept connections, as well as provide these connections within a visual concept graph. Additionally, the concept graphing system can provide a visual concept graph that determines and displays concept correlations based on the input of a single concept, multiple concepts, or no concepts.
Type: Grant
Filed: June 6, 2022
Date of Patent: December 3, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Harsh Shrivastava, Maurice Diesendruck, Robin Abraham
-
Patent number: 12158869
Abstract: A method of obtaining and imputing missing data and a measurement system having the same are disclosed.
Type: Grant
Filed: January 4, 2023
Date of Patent: December 3, 2024
Assignees: SAMSUNG ELECTRONICS CO., LTD., KOREA UNIVERSITY RESEARCH AND BUSINESS FOUNDATION
Inventors: Seongwook Yoon, Sanghoon Sull, Jaehyun Kim, Heejeong Lim
-
Patent number: 12153555
Abstract: A system for data space limitation includes an interface and a processor. The interface is configured to receive a query for a structured data set. The processor is configured to determine an ordered list for calculations to respond to the query; perform the calculations according to the ordered list until an allowed time required for interactivity is reached; and in response to the allowed time being reached, provide results of the calculations.
Type: Grant
Filed: May 23, 2024
Date of Patent: November 26, 2024
Assignee: Workday, Inc.
Inventors: Viktor Brada, Peter Fedorocko, Filip Dousek, Hynek Walner
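
The "calculate in order until the allowed time is reached" behaviour can be sketched in Python as follows. This is only an illustration under assumptions: the `run_within_budget` helper, the injectable `clock` parameter, and checking the deadline between (not during) calculations are choices of this sketch, not details from the patent.

```python
import time


def run_within_budget(calculations, budget_seconds, clock=time.monotonic):
    """Run zero-argument callables in order until the time budget is used up.

    Returns the results of the calculations that were started in time. The
    deadline is only checked between calculations, so one long calculation
    can still overrun the budget.
    """
    deadline = clock() + budget_seconds
    results = []
    for calculation in calculations:
        if clock() >= deadline:
            break  # allowed time reached: return what we have so far
        results.append(calculation())
    return results


# Usage with the real clock: cheap calculations all finish within 0.5 s.
partial = run_within_budget([lambda: 1 + 1, lambda: 2 * 3], budget_seconds=0.5)
```

Passing a fake `clock` makes the cut-off deterministic in tests, which is why the clock is a parameter rather than a hard-coded call.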
-
Patent number: 12147557
Abstract: Computer systems and associated methods are disclosed to implement the non-interactive join of privacy-preserving dataset sketches. In some embodiments, an entity can publish a one-time sketch of their dataset that would enable another entity to join their data without exposing private information. The sketch can map, using a hash function, the identities associated with a first value of the dataset to a data structure, in some embodiments. A same or different entity can join the first sketch with a privacy-preserving second sketch of a second dataset that includes added noise, and can determine an estimate of a number of identities that correspond with specific values of the first and second datasets from the joined dataset. The sketch can be published just one time, and therefore does not require separate new private computations with privacy budgeting for each additional party when a join is desired, in some embodiments.
Type: Grant
Filed: June 30, 2022
Date of Patent: November 19, 2024
Assignee: Amazon Technologies, Inc.
Inventors: James Alexander Cook, Nina Mishra
-
Patent number: 12135691
Abstract: A method for storing a received data chunk (DC) in a storage system, the method includes (a) obtaining a received fingerprint of the received DC, the received fingerprint may include received fingerprint elements that are indicative of occurrences, within the received DC, of content elements, the received fingerprint elements are ordered according to a given order; (b) searching, within a tree, for a similar stored fingerprint; the tree may include tree nodes that represent multiple stored fingerprints of stored data chunks that are stored in the storage system; different levels of the tree are allocated to different content elements; (c) compressing, when finding the similar stored fingerprint, the received DC based on a similar DC associated with the similar stored fingerprint, and updating storage system metadata to indicate that the received DC is stored in the storage system in a compressed form, and based on the similar stored DC.
Type: Grant
Filed: October 26, 2022
Date of Patent: November 5, 2024
Assignee: VAST DATA LTD.
Inventors: Yogev Vaknin, Niko Farhi, Asaf Levy
-
Patent number: 12093187
Abstract: Logical address space portions and virtual layer blocks (VLBs) can be partitioned into multiple sets. Each of multiple nodes in a system can be assigned exclusive ownership of one of the multiple sets. In at least one embodiment, for a read I/O which is received at a first node and directed to a logical address LA1 that is owned by a second node, the first node can request that the second owning node perform resolution processing for LA1. The second node can return either a VLB address or a PLB address based on whether the second node owns a VLB used in mapping LA1 to a corresponding physical location PA1 which includes content C1 stored at LA1. The second node can set a flag in its response to indicate whether a returned address is a VLB address or a PLB address.
Type: Grant
Filed: March 31, 2023
Date of Patent: September 17, 2024
Assignee: Dell Products L.P.
Inventors: Vladimir Shveidel, Uri Shabi, Dror Zalstein
-
Patent number: 12074953
Abstract: The present disclosure relates to generating, updating, modifying, and otherwise managing configurations for virtual services on a cloud computing system. The present disclosure provides example implementations of a configuration management system and configuration handlers on respective server nodes that receive and process requests for modifying one or more configurations that manage operation of virtual services on the cloud. Systems described herein involve leveraging a hierarchical model of configuration characteristics to facilitate both large and small scale modifications. Moreover, the systems described herein leverage a persistent store on server nodes to identify how to update a current base configuration and sub-version as well as synchronize modifications across a set of server nodes.
Type: Grant
Filed: September 23, 2021
Date of Patent: August 27, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Sameer Kumar Patro, Aritra Basu, Raghavendra Subhash
-
Patent number: 12061581
Abstract: Example implementations relate to metadata operations in a storage system. An example includes generating, by a storage controller of a deduplication storage system, a candidate list of container indexes for matching operations of a received data segment, each container index in the candidate list having an associated match cost; identifying, by the storage controller, a journal group associated with a first container index listed in the candidate list; reducing, by the storage controller, a match cost associated with the first container index in response to a determination that the identified journal group is in a modified state; and performing, by the storage controller, the matching operations of the received data segment based at least on the reduced match cost of the first container index.
Type: Grant
Filed: July 26, 2022
Date of Patent: August 13, 2024
Assignee: Hewlett Packard Enterprise Development LP
Inventors: Aman Sahil, Richard Phillip Mayo
-
Patent number: 12050879
Abstract: A device may generate first scores for sentences of text based on a cumulative frequency of words in each sentence, may generate second scores for the sentences based on a cumulative frequency of domain entities in each sentence, and may generate third scores for the sentences based on a sentiment analysis of each sentence. The device may generate a summary of the text, may filter the sentences to extract a first set of sentences, may filter the sentences to extract a second set of sentences, and may filter the sentences to extract a third set of sentences. The device may identify and assign weights to a first group of sentences, a second group of sentences, and a third group of sentences, may generate a ranked list of sentences based on the weighted first group, second group, and third group, and may perform actions based on the final summary.
Type: Grant
Filed: May 24, 2022
Date of Patent: July 30, 2024
Assignee: Verizon Patent and Licensing Inc.
Inventors: Prakash Ranganathan, Miruna Jayakrishnasamy
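
Only the first of the three scores, the cumulative word frequency of each sentence, is simple enough to sketch without further detail from the patent. The Python below is a minimal illustration; the regex tokenizer, the lowercase normalization, and the `frequency_scores` name are assumptions.

```python
import re
from collections import Counter


def frequency_scores(sentences):
    """Score each sentence by the summed corpus frequency of its words."""
    tokenized = [re.findall(r"[a-z']+", sentence.lower()) for sentence in sentences]
    counts = Counter(word for words in tokenized for word in words)
    return [sum(counts[word] for word in words) for words in tokenized]


scores = frequency_scores(["The cat sat", "The cat", "Dog"])
```

With this toy input, "the" and "cat" each occur twice and "sat" and "dog" once, so the scores are 5, 4 and 1. A real system would likely drop stop words and normalize for sentence length before ranking.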
-
Patent number: 12050790
Abstract: Aspects of the present disclosure configure a memory sub-system processor to manage memory operations with repeating data patterns. The processor receives a request to write a block of data comprising a plurality of portions to a set of memory components and determines whether a pattern of data repeats across the plurality of portions of the block of data. In response to determining that the pattern of data repeats across the plurality of portions, the processor stores a representation of the pattern of data in a mapping table and discards the block of data to prevent storing the block of data on the set of memory components.
Type: Grant
Filed: August 16, 2022
Date of Patent: July 30, 2024
Assignee: Micron Technology, Inc.
Inventor: Anoop Achuthan Rajendrababu
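
The write path described here, which detects a repeating pattern, records it in a mapping table, and skips the media write, can be sketched in Python. The fixed `portion_size`, the `PatternAwareStore` class, and the tuple layout of mapping-table entries are hypothetical choices for illustration, not the patent's design.

```python
def find_repeating_pattern(block, portion_size):
    """Return the portion that repeats across the whole block, else None."""
    if portion_size <= 0 or not block or len(block) % portion_size != 0:
        return None
    first = block[:portion_size]
    for offset in range(portion_size, len(block), portion_size):
        if block[offset:offset + portion_size] != first:
            return None
    return first


class PatternAwareStore:
    """Toy store: patterned blocks live only in the mapping table."""

    def __init__(self, portion_size=4):
        self.portion_size = portion_size
        self.mapping_table = {}  # address -> ("pattern", bytes, length) or ("media", index)
        self.media = []          # stands in for the memory components

    def write(self, address, block):
        pattern = find_repeating_pattern(block, self.portion_size)
        if pattern is not None:
            # Record the pattern and discard the block itself.
            self.mapping_table[address] = ("pattern", pattern, len(block))
        else:
            self.media.append(block)
            self.mapping_table[address] = ("media", len(self.media) - 1)

    def read(self, address):
        entry = self.mapping_table[address]
        if entry[0] == "pattern":
            _, pattern, length = entry
            return pattern * (length // len(pattern))
        return self.media[entry[1]]


store = PatternAwareStore(portion_size=4)
store.write(0, b"\x00" * 16)  # all-zero block: kept as a pattern only
store.write(1, b"abcdefgh")   # no repeating portion: written to media
```

After these two writes only one block occupies `store.media`, yet both addresses read back their original contents.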
-
Patent number: 12045211
Abstract: One example method includes collaborative deduplication. A deduplication engine implemented at a cloud level collaborates or coordinates with an extension engine of the deduplication at an edge node. This allows data ingested at a node to be collaboratively deduplicated prior to transfer to the cloud and after transfer to the cloud.
Type: Grant
Filed: October 27, 2020
Date of Patent: July 23, 2024
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Mohamed Sohail, Karim Fathy, Robert A. Lincourt
-
Patent number: 12032534
Abstract: A method and system is used in managing deduplication of data in storage systems. A first digest for a deduplication candidate is received. At least one stream associated with the deduplication candidate is detected. At least one neighboring digest segment of a first loaded digest segment associated with the at least one stream is loaded. Whether the digest is located in the at least one neighboring digest segment is determined. If the digest is not located in the at least one neighboring digest segment, the digest is processed.
Type: Grant
Filed: August 2, 2019
Date of Patent: July 9, 2024
Assignee: EMC IP Holding Company LLC
Inventors: Nickolay Dalmatov, Richard Ruef, Kurt Everson
-
Patent number: 12026386
Abstract: A method for differential compression includes receiving input data blocks that are selected for compression. For each input data block, the input data block is divided into at least two segments. For each of the at least two segments, a similarity degree between the respective segment and each of the data blocks excluding the respective data block is computed. For each of the at least two segments, the data block which has a biggest similarity degree with the respective segment among the data blocks excluding the respective data block is selected as an optimal reference data block for the respective segment. The differential compression is applied to the input data block and optimal reference blocks in response to determining a differential compression that is to be applied based on the similarity degree between the segments of the input data block and the corresponding optimal reference blocks.
Type: Grant
Filed: September 23, 2022
Date of Patent: July 2, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Assaf Natanzon
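
The per-segment selection of a reference block can be sketched in Python as below. The abstract does not define the "similarity degree", so the Jaccard similarity over 4-byte shingles used here is an assumption, as are the helper names and the equal-size split into segments. The sketch covers only reference selection, not the delta encoding itself.

```python
def shingles(data, k=4):
    """Set of all k-byte substrings of `data` (a crude content signature)."""
    return {data[i:i + k] for i in range(max(len(data) - k + 1, 1))}


def similarity(a, b):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def best_references(blocks, index, segments=2):
    """For each segment of blocks[index], pick the most similar other block.

    Returns a list of (similarity, block_index) pairs, one per segment.
    Requires at least two blocks.
    """
    block = blocks[index]
    size = -(-len(block) // segments)  # ceiling division
    chosen = []
    for s in range(segments):
        segment = block[s * size:(s + 1) * size]
        candidates = [
            (similarity(segment, other), j)
            for j, other in enumerate(blocks)
            if j != index  # never use the block as its own reference
        ]
        chosen.append(max(candidates))
    return chosen


blocks = [b"AAAABBBBCCCCDDDD", b"AAAABBBBxxxxxxxx", b"yyyyyyyyCCCCDDDD"]
refs = best_references(blocks, 0)
```

In this toy input the first half of block 0 resembles block 1 and the second half resembles block 2, so each segment gets a different reference block.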
-
Patent number: 12001685
Abstract: A plurality of data stripes and one or more parity stripes are generated using a plurality of data chunks stored in a write-ahead log based on an erasure coding configuration. The plurality of data stripes and the one or more parity stripes are stored on corresponding different storage devices. The plurality of data stripes and the one or more parity stripes are associated together under a data protection grouping container.
Type: Grant
Filed: March 31, 2022
Date of Patent: June 4, 2024
Assignee: Cohesity, Inc.
Inventors: Apurv Gupta, Akshat Agarwal
-
Patent number: 11995467
Abstract: Systems, devices, and methods are provided for validation, deletion, and/or recovery of resources in a service environment. A machine (e.g., server) may receive a request to identify or discover a list of resources that are unused in a service environment. A machine (e.g., server) may receive a request to delete one or more resources in a service environment. In at least one embodiment, deletion of a resource involves a two-stage process where the resource is recoverably deleted in a first stage (e.g., by deactivating or disabling the resource) such that the resource can be recovered prior to a predetermined time period by reactivating or re-enabling the resource and, in a second stage, the resource is unrecoverably deleted.
Type: Grant
Filed: July 14, 2021
Date of Patent: May 28, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Suresh Prakash Goacher, Arun Anilkumar, Nishit Nihal Vas
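
The two-stage deletion described above maps naturally onto a soft-delete followed by a purge. The Python sketch below is a minimal illustration under assumptions: the `ResourceRegistry` class, its method names, and the explicit `now` arguments (used so the behaviour is deterministic) are all hypothetical.

```python
import time


class ResourceRegistry:
    """Toy registry with recoverable (stage 1) and permanent (stage 2) deletion."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.active = {}    # resource id -> resource
        self.disabled = {}  # resource id -> (resource, time it was disabled)

    def soft_delete(self, resource_id, now=None):
        """Stage 1: disable the resource but keep it recoverable."""
        now = time.time() if now is None else now
        self.disabled[resource_id] = (self.active.pop(resource_id), now)

    def recover(self, resource_id, now=None):
        """Re-enable a disabled resource if the retention period has not passed."""
        now = time.time() if now is None else now
        resource, disabled_at = self.disabled[resource_id]
        if now - disabled_at >= self.retention:
            raise ValueError("retention period elapsed; resource not recoverable")
        del self.disabled[resource_id]
        self.active[resource_id] = resource

    def purge(self, now=None):
        """Stage 2: permanently remove resources whose retention has elapsed."""
        now = time.time() if now is None else now
        expired = [rid for rid, (_, at) in self.disabled.items()
                   if now - at >= self.retention]
        for rid in expired:
            del self.disabled[rid]
        return expired


registry = ResourceRegistry(retention_seconds=60)
registry.active["queue-1"] = {"kind": "queue"}
registry.soft_delete("queue-1", now=0)
registry.recover("queue-1", now=30)  # still within the 60 s window
```

Keeping disabled resources in a separate table makes the recovery window explicit and lets `purge` run as a periodic background job.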
-
Patent number: 11977527
Abstract: In certain embodiments, machine learning and lineage data may be used to manage data. In some embodiments, a computing system may use lineage data to identify two datasets that may be related. The computing system may determine that a user has access to a derivative dataset but does not have access to an original dataset that was used to create the derivative dataset. In response, the computing system may use a machine learning model to generate a similarity score indicating a level of similarity between the original dataset and the derivative dataset. If the similarity score satisfies a threshold score, the computing system may modify access rights of the user so that the user is unable to access a portion of the data in the derivative dataset.
Type: Grant
Filed: January 3, 2022
Date of Patent: May 7, 2024
Assignee: Capital One Services, LLC
Inventors: William Ye, Jon Stofer, Thomas J. O'Connor, Jose Moreno
-
Patent number: 11966630
Abstract: A data storage device includes a memory device and a controller coupled to the memory device. The controller is configured to segment a key to physical (K2P) table into two or more segments, wherein each segment of the two or more segments corresponds to a caching priority of key value (KV) pair data, organize the K2P table by storing and relocating one or more K2P table entries into a respective segment of the two or more segments, wherein the storing and relocating comprises moving a K2P table entry based on the caching priority of the KV pair data into the respective segment having the caching priority, and utilize the K2P table to manage KV pair data stored in the memory device, wherein utilizing the K2P table comprises applying a same management operation, such as prefetching, to each K2P table entry of a same segment.
Type: Grant
Filed: June 27, 2022
Date of Patent: April 23, 2024
Assignee: Western Digital Technologies, Inc.
Inventors: Ran Zamir, Alexander Bazarsky, David Avraham
-
Patent number: 11954331
Abstract: A computer-implemented method enables workload scheduling in a storage system for optimized deduplication. The method includes determining dynamic correlations of deduplications between workload processes in a prior time window. Workload processes include one or more tasks with defined execution timing parameters. The method further includes determining deduplication ratios based on the correlations of the deduplications between the workload processes. The method further includes scheduling multiple workload processes based on a highest determined deduplication ratio of the determined deduplication ratios.
Type: Grant
Filed: October 7, 2021
Date of Patent: April 9, 2024
Assignee: International Business Machines Corporation
Inventors: Miles Mulholland, Anuj Chandra, Kirsty G. Rodwell, Jorden Luke Allcock
-
Patent number: 11949751
Abstract: The present disclosure relates to restricting electronic activities from being linked with record objects. According to at least one aspect of the disclosure, a method can include accessing, by one or more processors, a plurality of electronic activities, accessing a plurality of record objects of one or more systems of record, identifying an electronic activity of the plurality of electronic activities to match to one or more record objects, determining a data source provider associated with providing access to the electronic activity, and identifying a system of record corresponding to the determined data source provider. The system of record can include a plurality of candidate record objects to which to match the electronic activity. The method can include restricting the electronic activity from being linked with the at least one record object.
Type: Grant
Filed: January 23, 2023
Date of Patent: April 2, 2024
Inventors: Oleg Rogynskyy, Tetiana Lutsaievska, John Wulf, Sathya Hariesh Prakash
-
Patent number: 11936931
Abstract: Methods, apparatus, systems and articles of manufacture to perform media device asset qualification are disclosed. An example apparatus includes at least one memory, and at least one processor to execute instructions to at least identify a first set of candidate media device assets for disqualification, the candidate media device assets including A) a signature and B) a media identifier that identifies media, generate a hash table using a second set of the candidate media device assets, determine one or more counts of matches between C) a first signature and a first media identifier of a first candidate media device asset of the second set and D) respective signatures and media identifiers of multiple ones of the second set using the hash table, the multiple ones of the second set not including the first candidate media device asset, and load the first signature into a reference database as a reference signature.
Type: Grant
Filed: October 17, 2022
Date of Patent: March 19, 2024
Assignee: The Nielsen Company (US), LLC
Inventors: Daniel Nelson, James Petro, Albert T. Borawski
-
Patent number: 11934346
Abstract: A cloud computing infrastructure hosts a web service with customer accounts. In a customer account, files of the customer account are listed in an index. Files indicated in the index are arranged in groups, with files in each group being scanned using scanning serverless functions in the customer account. The files in the customer account include a compressed tar archive of a software container. Member files of a compressed tar archive in a customer account are randomly-accessed by way of locators that indicate a tar offset, a logical offset, and a decompressor state for a corresponding member file. A member file is accessed by seeking to the tar offset in the compressed tar archive, restoring a decompressor to the decompressor state, decompressing the compressed tar archive using the decompressor, and moving to the logical offset in the decompressed data.
Type: Grant
Filed: October 17, 2022
Date of Patent: March 19, 2024
Assignee: Trend Micro Incorporated
Inventor: Brendan M. Johnson
-
Patent number: 11914554
Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.
Type: Grant
Filed: January 30, 2023
Date of Patent: February 27, 2024
Assignee: Rubrik, Inc.
Inventors: Noel Moldvai, Jihang Lim
-
Patent number: 11907133
Abstract: Standardized address generation from address substrings includes receiving an address string for a place-of-interest, one-to-many mapping at least one of a plurality of address substrings of the address string to respective address components, concatenating the address substrings using a template that specifies an order of concatenating the address substrings, and making the concatenated address substrings available for further use.
Type: Grant
Filed: July 29, 2022
Date of Patent: February 20, 2024
Assignee: SafeGraph, Inc.
Inventor: Vera Sazonova
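
The template-driven concatenation step can be shown with a short Python sketch. It assumes the substrings have already been mapped to named components; the component names in `TEMPLATE`, the space separator, and the `standardize` helper are illustrative assumptions rather than anything stated in the abstract.

```python
# Hypothetical template: the order in which address components are joined.
TEMPLATE = ["number", "street", "unit", "city", "region", "postal_code"]


def standardize(components, template=TEMPLATE):
    """Join the available address components in the order the template gives.

    `components` maps a component name to its substring; missing or blank
    components are skipped.
    """
    parts = [
        components[name].strip()
        for name in template
        if components.get(name, "").strip()
    ]
    return " ".join(parts)


address = standardize(
    {"city": "Denver", "street": "Main St", "number": "12", "region": "CO"}
)
```

Because the template fixes the order, the same components always produce the same string regardless of the order in which they were parsed, which is what makes the result usable as a matching key.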
-
Patent number: 11893373
Abstract: Techniques are disclosed for deploying functions in a cloud computing environment. Parameters are annotated in a plurality of Helm charts with a predetermined token. Duplicated values in the Helm charts are identified and the predetermined token is reused for the duplicated values. Schema files from the plurality of Helm charts are parsed to extract the predetermined tokens. Input data are received as values for the predetermined tokens. The function is deployed in the cloud computing environment using the values for the predetermined tokens as parameters in the Helm charts.
Type: Grant
Filed: January 28, 2022
Date of Patent: February 6, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Frank John D'Innocenzo, Kam Yee Lee
-
Patent number: 11886397Abstract: Provided are methods and systems for determining multi-faceted trust scores for data. A method may commence with receiving data and determining a plurality of metadata items associated with the data. The method may continue with determining one or more facets associated with each of the plurality of metadata items. The method may further include determining a parameter and a weight associated with each of the one or more facets. Upon determining the parameter and the weight, a trust score associated with each of the plurality of metadata items may be calculated based on the parameter and the weight associated with each of the one or more facets. The method may further include calculating a multi-faceted trust score of the data based on the trust score of each of the plurality of metadata items.Type: GrantFiled: February 19, 2020Date of Patent: January 30, 2024Assignee: ASG Technologies Group, Inc.Inventors: Jean-Philippe Moresmau, Marcus MacNeill
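One way to picture the two-level scoring in this abstract is the sketch below. The abstract does not say how parameters and weights are combined, so the weighted average per metadata item and the plain mean across items are assumptions made for illustration; `Facet`, `item_trust_score`, and `data_trust_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Facet:
    parameter: float  # assumed to be normalized to the range [0, 1]
    weight: float     # assumed to be non-negative


def item_trust_score(facets: list[Facet]) -> float:
    """Weighted average of facet parameters (an assumed combination rule)."""
    total_weight = sum(f.weight for f in facets)
    if total_weight == 0:
        return 0.0
    return sum(f.parameter * f.weight for f in facets) / total_weight


def data_trust_score(items: dict[str, list[Facet]]) -> float:
    """Mean of the per-item trust scores (also an assumed rule)."""
    scores = [item_trust_score(facets) for facets in items.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a metadata item with facets `Facet(1.0, 1.0)` and `Facet(0.0, 1.0)` would score 0.5 under these assumptions.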
-
Patent number: 11888936Abstract: A method for providing a proxy redirect to facilitate a storage and a retrieval of an object is disclosed. The method includes receiving a mapping of a user to a logical container that stores the object and to a storage provider that stores the logical container; receiving a key corresponding to the logical container and associated with the user; storing the mapping and the key in a database; generating, for the user, an application protocol that redirects to a pre-signed web address based on the stored mapping and the stored key; and transmitting, via a communication interface, the application protocol to the one user. The method further includes the user using the application protocol to directly access the storage provider and retrieve the object.Type: GrantFiled: July 1, 2020Date of Patent: January 30, 2024Assignee: JPMORGAN CHASE BANK, N.A.Inventor: Zachariah Antonas

-
Patent number: 11853326Abstract: A technology for retrieving data from a database. The technology includes receiving a search query specifying a target attribute and a target attribute value, accessing an index to determine one or more target files in which the target attribute value appears, the index including a plurality of attribute values, and for each of the attribute values, one or more files in which the attribute value appears, and retrieving data from the one or more target files.Type: GrantFiled: October 14, 2021Date of Patent: December 26, 2023Assignee: Google LLCInventors: Hossein Ahmadi, Guang Cheng, Yannis Sismanis, Huong Thi Thu Phan, Shiyu Xie, Leo Chen, Zewen Zhang, Jing Jing Long, Amir Hossein Hormati
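The index lookup described in this abstract resembles a conventional inverted index from attribute values to files. The sketch below assumes, purely for illustration, that files are in-memory lists of record dictionaries; `build_index` and `search` are hypothetical names and the real system's storage format is not stated in the abstract.

```python
from collections import defaultdict


def build_index(files: dict[str, list[dict]]) -> dict[tuple, set[str]]:
    """Map each (attribute, value) pair to the set of files in which it appears."""
    index: dict[tuple, set[str]] = defaultdict(set)
    for file_name, records in files.items():
        for record in records:
            for attribute, value in record.items():
                index[(attribute, value)].add(file_name)
    return index


def search(files, index, attribute, value) -> list[dict]:
    """Read only the target files named by the index, then filter their records."""
    target_files = index.get((attribute, value), set())
    return [
        record
        for file_name in sorted(target_files)
        for record in files[file_name]
        if record.get(attribute) == value
    ]
```

The point of the index is that files in which the target attribute value never appears are not opened at all.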
-
Patent number: 11836175Abstract: Semantic search techniques via focused summarizations are described. For example, a search query is received for a text-based content item in a data set comprising a plurality of text-based content items. A first feature vector representative of the search query is obtained. A respective semantic similarity score is determined between the first feature vector and each of a plurality of second feature vectors. Each of the second feature vectors is representative of a machine-generated summarization of a respective text-based content item. The machine-generated summarization comprises a plurality of multi-word fragments that are selected from the respective text-based content item via a transformer-based machine learning model. A search result is provided responsive to the search query.Type: GrantFiled: June 29, 2022Date of Patent: December 5, 2023Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Itzik Malkiel, Noam Koenigstein, Oren Barkan, Jonathan Ephrath, Yonathan Weill, Nir Nice
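The scoring step in this abstract, comparing a query vector against one vector per summarized content item, can be sketched as below. Cosine similarity is used here as an assumed similarity measure, since the abstract does not name one, and the feature vectors are taken as given; producing them with a transformer-based model is outside this sketch. `cosine` and `rank_by_similarity` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, summary_vecs: dict[str, list[float]]):
    """Score every summary vector against the query and sort best-first."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in summary_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first entries of the returned list would form the search result in this simplified picture.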
-
Patent number: 11797220Abstract: Data is ingested from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure. After data ingestion is complete, one or more duplicate data chunks that were stored during the data ingestion are determined and a second data structure is updated to include one or more entries corresponding to one or more determined duplicate data chunks.Type: GrantFiled: August 20, 2021Date of Patent: October 24, 2023Assignee: Cohesity, Inc.Inventors: Zhihuan Qiu, Sachin Jain, Anubhav Gupta, Apurv Gupta, Mohit Aron
-
Patent number: 11797486Abstract: A device configured to identify a file in a network device, to generate a first set of block hash codes for data blocks for a first instance of the file, and to generate a second set of block hash codes for data blocks for a second instance of the file. The device is further configured to determine the first set of block hash codes matches the second set of block hash codes and to generate an entry in a file list for the instances of the file. The device is further configured to count the number of entries that are associated with the file and to determine the number of entries is greater than the redundancy threshold value. The device is further configured to delete one or more instances of the file in response to determining that the number of entries is greater than the redundancy threshold value.Type: GrantFiled: January 3, 2022Date of Patent: October 24, 2023Assignee: Bank of America CorporationInventors: Pratap Dande, Gilberto R. Dos Santos, Jayabalaji Murugan, Murali M. Atyam, Manoj Bohra
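A rough sketch of the block-hash comparison and redundancy threshold from this abstract follows. SHA-256, the 4096-byte block size, and the policy of keeping exactly `redundancy_threshold` instances are assumptions for illustration; the abstract only says that one or more instances are deleted once the count exceeds the threshold. The function names are hypothetical.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096) -> list[str]:
    """Hash each fixed-size block of a file instance."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def instances_to_delete(instances: dict[str, bytes], redundancy_threshold: int) -> list[str]:
    """Return surplus instances of any file whose copy count exceeds the threshold.

    Instances are treated as the same file when their block hash sequences match.
    """
    file_list: dict[tuple, list[str]] = {}
    for path, data in instances.items():
        file_list.setdefault(tuple(block_hashes(data)), []).append(path)

    surplus = []
    for paths in file_list.values():
        if len(paths) > redundancy_threshold:
            # Assumed policy: keep the first `redundancy_threshold` paths in sorted order.
            surplus.extend(sorted(paths)[redundancy_threshold:])
    return surplus
```

This sketch only reports which instances are surplus; actually deleting them is left to the caller.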
-
Patent number: 11782878Abstract: A deduplicated storage system storing objects receives a search term. Storage includes metadata and segments into which the objects have been split and deduplicated. The metadata includes fingerprint sequences according to which the segments should be assembled. A partial match is found when a prefix of the term is found at an end of a segment or a suffix is found at a beginning of the segment. A fingerprint of the segment having the partial match is recorded. A first sequence of fingerprints associated with a first object is read to check whether any fingerprints in the first sequence have been recorded. When a fingerprint in the sequence has been recorded, a check of a next fingerprint in the sequence is made to see if it has been recorded as having the partial match. If the next fingerprint has been recorded, the first object is reported as having the term.Type: GrantFiled: December 14, 2021Date of Patent: October 10, 2023Assignee: EMC IP Holding Company LLCInventor: Philip Shilane
-
Patent number: 11748014Abstract: Host computers running applications that store data on a block-based storage system such as a SAN provide hints that differentiate IO data based on which application generated the IO. The hints may include tags that are associated with IO commands sent to the block-based storage system. Each host application is associated with a unique identifier that is placed in the tag. Application name-to-identifier mappings may be sent from the hosts to the block-based storage system. Per-identifier/application deduplication statistics are maintained by the block-based storage system and shared with other block-based storage systems. Deduplication is disabled or de-emphasized for IO data generated by applications with statistically low deduplication ratios.Type: GrantFiled: February 14, 2020Date of Patent: September 5, 2023Assignee: DELL PRODUCTS L.P.Inventors: Kurumurthy Gokam, Md Haris Iqbal, Prasad Paple, Kundan Kumar
-
Patent number: 11734239Abstract: A record processing and storage system is operable to receive a plurality of labeled row data from a data source. Each labeled row data of the plurality of labeled row data includes at least one record and a corresponding row number of a plurality of row numbers. A plurality of pages are generated from records included in the labeled row data. The plurality of pages are stored via a page storage system. A plurality of page metadata corresponding to the plurality of pages is generated, where each of the plurality of page metadata is generated based on at least one corresponding row number of at least one labeled row data with records included in a corresponding one of the plurality of pages. Deduplication of duplicated records included in the plurality of pages is facilitated based on the plurality of page metadata.Type: GrantFiled: March 15, 2022Date of Patent: August 22, 2023Assignee: Ocient Holdings LLCInventors: George Kondiles, Ravi V. Khadiwala, Donald Scott Clark, Anna Veselova
-
Patent number: 11704036Abstract: Systems and methods for implementing a deduplication process based on performance analyses. The system may include a processing device to determine a first performance metric associated with retrieving a second stored data block that is within a specified range of a duplicate of the first data block and a second performance metric associated with retrieving a hash value corresponding to the second stored data block. The processing device is further to retrieve the second stored data block within a specified range of the duplicate of the first data block in response to the first performance metric not exceeding the second performance metric.Type: GrantFiled: November 16, 2018Date of Patent: July 18, 2023Assignee: PURE STORAGE, INC.Inventors: John Colgrove, Ronald Karr, Ethan L. Miller
-
Patent number: 11681660Abstract: Embodiments presented herein describe techniques for deduplicating chunks of data across multiple clusters. A process executing in a storage system identifies one or more chunks in an incoming stream of data. For each chunk, a first fingerprint corresponding to the chunk is generated. The process determines whether the first fingerprint matches a second fingerprint listed in a corresponding entry in a deduplication map. Each entry of the deduplication map corresponds to a chunk stored in a location in one of the storage clusters. Upon determining that the first fingerprint matches the second fingerprint, the process writes, to a local persistent storage, a pointer referencing the location in that storage cluster.Type: GrantFiled: January 22, 2021Date of Patent: June 20, 2023Assignee: Cohesity, Inc.Inventor: Ganesha Shanmuganathan
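The fingerprint lookup in this abstract can be sketched with a single deduplication map shared across clusters. In this illustrative sketch, SHA-256 stands in for the unspecified fingerprint function, clusters are modeled as in-memory lists, and the returned location tuple plays the role of the pointer written to local storage; `DedupStore` and its fields are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy model of cross-cluster deduplication (all structures are assumptions)."""

    def __init__(self) -> None:
        self.dedup_map: dict[str, tuple[str, int]] = {}  # fingerprint -> (cluster, index)
        self.clusters: dict[str, list[bytes]] = {}       # cluster -> stored chunks

    def write(self, chunk: bytes, cluster: str) -> tuple[str, int]:
        """Store a chunk once and return a pointer to wherever it lives."""
        fingerprint = hashlib.sha256(chunk).hexdigest()
        location = self.dedup_map.get(fingerprint)
        if location is None:
            # First time this chunk is seen: store it in the requested cluster.
            chunks = self.clusters.setdefault(cluster, [])
            chunks.append(chunk)
            location = (cluster, len(chunks) - 1)
            self.dedup_map[fingerprint] = location
        # On a match, only the pointer is returned, even if it names another cluster.
        return location
```

Writing the same chunk through two different clusters therefore yields the same pointer, and the second cluster stores no copy of the data.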
-
Patent number: 11675741Abstract: Methods and systems for improving data back-up, recovery, and search across different cloud-based applications, services, and platforms are described. A data management and storage system may direct compute and storage resources within a customer's cloud-based data storage account to back-up and restore data while the customer retains full control of their data. The data management and storage system may direct the compute and storage resources within the customer's cloud-based data storage account to generate and store secondary layers that are used for generating search indexes, to generate and store shared space layers and user specific layers to facilitate the deduplication of email attachments and text blocks, to perform a controlled restoration of email snapshots such that sensitive information (e.g., restricted keywords) located within stored snapshots remains protected, and to detect and preserve emails that were received or transmitted and then deleted between two consecutive snapshots.Type: GrantFiled: July 8, 2021Date of Patent: June 13, 2023Assignee: Rubrik, Inc.Inventors: Noel Moldvai, Jihang Lim
-
Patent number: 11663194Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.Type: GrantFiled: October 28, 2021Date of Patent: May 30, 2023Assignee: EMC IP HOLDING COMPANY LLCInventor: Peter Marelas
-
Patent number: 11663195Abstract: In one example, a method includes receiving, at a cloud storage site, chunks that each take the form of a hash of a combination that includes two or more salts and a file object, and one of the salts is a retention salt shared by the chunks, monitoring a time period associated with the retention salt, when the time period has expired, removing the chunks that include the retention salt, and depositing the removed chunks in a deleted items cloud store.Type: GrantFiled: October 28, 2021Date of Patent: May 30, 2023Assignee: EMC IP HOLDING COMPANY LLCInventor: Peter Marelas