Patents by Inventor James Robert Benton

James Robert Benton has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Scalable chunk store for data deduplication

Patent number: 10394757

Abstract: Data streams may be stored in a chunk store in the form of stream maps and data chunks. Data chunks corresponding to a data stream may be stored in a chunk container, and a stream map corresponding to the data stream may point to the data chunks in the chunk container. Multiple stream maps may be stored in a stream container, and may point to the data chunks in the chunk container in a manner that duplicate data chunks are not present. Techniques are provided herein for localizing the storage of related data chunks in such chunk containers, for locating data chunks stored in chunk containers, for storing data streams in chunk stores in localized manners that enhance locality and decrease fragmentation, and for reorganizing stored data streams in chunk stores.
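
The relationship between stream maps and a chunk container described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `ChunkStore`, the fixed-size chunking, and the SHA-256 lookup table are all assumptions made for illustration.

```python
import hashlib


class ChunkStore:
    """Toy chunk store: one chunk container plus one stream container."""

    def __init__(self):
        self.chunk_container = []   # unique data chunks, in append order
        self.index = {}             # chunk digest -> position in chunk_container
        self.stream_container = {}  # stream name -> stream map (list of positions)

    def add_stream(self, name, data, chunk_size=4):
        # Fixed-size chunking is an assumption; the abstract does not name a scheme.
        stream_map = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            position = self.index.get(digest)
            if position is None:
                # First time this chunk is seen: store it once.
                position = len(self.chunk_container)
                self.chunk_container.append(chunk)
                self.index[digest] = position
            stream_map.append(position)
        self.stream_container[name] = stream_map

    def read_stream(self, name):
        # Rebuild the stream by following its stream map into the chunk container.
        return b"".join(self.chunk_container[p] for p in self.stream_container[name])


store = ChunkStore()
store.add_stream("a", b"aaaabbbbcccc")
store.add_stream("b", b"bbbbccccdddd")
print(len(store.chunk_container), store.stream_container["b"])
```

Both streams share the chunks `bbbb` and `cccc`, so the container holds four chunks rather than six, and each stream map only records positions.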

Type: Grant

Filed: November 18, 2010

Date of Patent: August 27, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chun Ho (Ian) Cheung, Paul Adrian Oltean, Ran Kalach, Abhishek Gupta, James Robert Benton, Ronakkumar Desai

Using index partitioning and reconciliation for data deduplication

Patent number: 9785666

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.
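
As a rough illustration of the subspace indexes and reconciliation described in this abstract, the sketch below partitions a hash index by a subspace key and later reconciles two subspaces. The class `PartitionedIndex`, its methods, the use of file type as the key, and the single-cached-subspace model are hypothetical simplifications, not the method claimed in the patent.

```python
import hashlib


class PartitionedIndex:
    """Toy hash index split into subspaces; only one subspace is 'cached'."""

    def __init__(self):
        # subspace key -> {digest: chunk_id}; stands in for per-subspace storage
        self.subspaces = {}
        self.cached_key = None  # the one subspace assumed to be held in memory
        self.next_id = 0

    def add_chunk(self, subspace_key, chunk):
        """Return (chunk_id, is_new), consulting only the chunk's own subspace."""
        self.cached_key = subspace_key
        index = self.subspaces.setdefault(subspace_key, {})
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in index:
            return index[digest], False
        chunk_id = self.next_id
        self.next_id += 1
        index[digest] = chunk_id
        return chunk_id, True

    def reconcile(self, keep_key, other_key):
        """Drop entries in other_key that duplicate keep_key.

        Returns {removed_chunk_id: surviving_chunk_id} so that callers can
        repoint references and delete the redundant chunks.
        """
        keep = self.subspaces[keep_key]
        other = self.subspaces[other_key]
        remap = {}
        for digest in list(other):
            if digest in keep:
                remap[other.pop(digest)] = keep[digest]
        return remap


idx = PartitionedIndex()
first, _ = idx.add_chunk("docx", b"hello")
second, _ = idx.add_chunk("pdf", b"hello")   # not detected: different subspace
print(idx.reconcile("docx", "pdf"))
```

Because each lookup touches only one subspace, the same chunk can be indexed twice under different keys; the reconciliation pass is what eventually finds and removes such duplicates.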

Type: Grant

Filed: July 13, 2015

Date of Patent: October 10, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jin Li, Sudipta Sengupta, Ran Kalach, Ronakkumar N. Desai, Paul Adrian Oltean, James Robert Benton

USING INDEX PARTITIONING AND RECONCILIATION FOR DATA DEDUPLICATION

Publication number: 20160012098

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.

Type: Application

Filed: July 13, 2015

Publication date: January 14, 2016

Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Jin Li, Sudipta Sengupta, Ran Kalach, Ronakkumar N. Desai, Paul Adrian Oltean, James Robert Benton

Using index partitioning and reconciliation for data deduplication

Patent number: 9110936

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.

Type: Grant

Filed: December 28, 2010

Date of Patent: August 18, 2015

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Jin Li, Sudipta Sengupta, Ran Kalach, Ronakkumar N. Desai, Paul Adrian Oltean, James Robert Benton

Optimization of a partially deduplicated file

Patent number: 8990171

Abstract: The subject disclosure is directed towards transforming a file having at least one undeduplicated portion into a fully deduplicated file. For each of the at least one undeduplicated portion, a deduplication mechanism defines at least one chunk between file offsets associated with the at least one undeduplicated portion. Chunk boundaries associated with the at least one chunk are stored within deduplication metadata. The deduplication mechanism aligns the at least one chunk with chunk boundaries of at least one deduplicated portion of the file. Then, the at least one chunk is committed to a chunk store.

Type: Grant

Filed: September 1, 2011

Date of Patent: March 24, 2015

Assignee: Microsoft Corporation

Inventors: Ran Kalach, Kashif Hasan, Paul Adrian Oltean, James Robert Benton, Chun Ho Cheung, Abhishek Gupta

Partial recall of deduplicated files

Patent number: 8645335

Abstract: The subject disclosure is directed towards changing a file from a fully deduplicated state to a partially deduplicated state in which some of the file data is deduplicated in a chunk store, and some is recalled into the file, that is, in the file's storage volume. A partial recall mechanism such as in a file system filter tracks (e.g., via a bitmap in a file reparse point) whether file data is maintained in the chunk store or has been recalled to the file. Data is recalled from the chunk store as needed, and committed (e.g., flushed) to the file. Also described is efficiently returning the file to a fully deduplicated state by using the tracking information to determine which parts of the file are already deduplicated into the chunk store so as to avoid their further deduplication processing.
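
The bitmap tracking described in this abstract can be sketched as follows. This is an illustrative model only: the class `PartiallyRecalledFile`, the block granularity, and the method names are assumptions, and a real file system filter and reparse point are replaced here by plain Python lists.

```python
class PartiallyRecalledFile:
    """Toy model of a deduplicated file whose blocks can be recalled on demand."""

    def __init__(self, chunks):
        self.chunk_store = list(chunks)         # deduplicated copy of each block
        self.local = [None] * len(chunks)       # data recalled into the file itself
        self.recalled = [False] * len(chunks)   # bitmap: True = block lives in the file

    def recall(self, block):
        # Copy the block out of the chunk store only the first time it is needed.
        if not self.recalled[block]:
            self.local[block] = bytearray(self.chunk_store[block])
            self.recalled[block] = True

    def write(self, block, offset, data):
        # A write forces a recall of just the affected block, then modifies it.
        self.recall(block)
        self.local[block][offset:offset + len(data)] = data

    def read(self, block):
        if self.recalled[block]:
            return bytes(self.local[block])
        return self.chunk_store[block]

    def blocks_to_deduplicate(self):
        # Only recalled blocks need processing to return to a fully deduplicated state.
        return [i for i, bit in enumerate(self.recalled) if bit]


f = PartiallyRecalledFile([b"aaaa", b"bbbb", b"cccc"])
f.write(1, 1, b"XY")
print(f.read(1), f.recalled, f.blocks_to_deduplicate())
```

After the write, only the second block has been recalled, so a later re-deduplication pass can skip the other two blocks entirely.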

Type: Grant

Filed: December 16, 2010

Date of Patent: February 4, 2014

Assignee: Microsoft Corporation

Inventors: Abhishek Gupta, Ran Kalach, Chun Ho Cheung, James Robert Benton, Joerg-Thomas Pfenning

Optimization of a Partially Deduplicated File

Publication number: 20130060739

Abstract: The subject disclosure is directed towards transforming a file having at least one undeduplicated portion into a fully deduplicated file. For each of the at least one undeduplicated portion, a deduplication mechanism defines at least one chunk between file offsets associated with the at least one undeduplicated portion. Chunk boundaries associated with the at least one chunk are stored within deduplication metadata. The deduplication mechanism aligns the at least one chunk with chunk boundaries of at least one deduplicated portion of the file. Then, the at least one chunk is committed to a chunk store.

Type: Application

Filed: September 1, 2011

Publication date: March 7, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Ran Kalach, Kashif Hasan, Paul Adrian Oltean, James Robert Benton, Chun Ho Cheung, Abhishek Gupta

VERIFYING A DATA RECOVERY COMPONENT USING A MANAGED INTERFACE

Publication number: 20130054533

Abstract: The subject disclosure is directed towards verifying a data recovery component of a volume snapshot service using a managed interface. The managed interface enables interoperability between the data recovery component and one or more complementary data recovery components by converting compatible instructions for the data recovery component and a complementary data recovery component into native data recovery operations for the volume snapshot service and vice versa. Via the managed interface, the complementary data recovery component emulates the native data recovery operations. Using status information associated with such an emulation, the data recovery component is verifiable.

Type: Application

Filed: August 24, 2011

Publication date: February 28, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Howard Hao, James Robert Benton, Thothathri Vanamamalai

Extensible pipeline for data deduplication

Patent number: 8380681

Abstract: The subject disclosure is directed towards data deduplication (optimization) performed by phases/modules of a modular data deduplication pipeline. At each phase, the pipeline allows modules to be replaced, selected or extended, e.g., different algorithms can be used for chunking or compression based upon the type of data being processed. The pipeline facilitates secure data processing, batch processing, and parallel processing. The pipeline is tunable based upon feedback, e.g., by selecting modules to increase deduplication quality, performance and/or throughput. Also described is selecting, filtering, ranking, sorting and/or grouping the files to deduplicate, e.g., based upon properties and/or statistical properties of the files and/or a file dataset and/or internal or external feedback.
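
One way to picture the replaceable modules described in this abstract is a pipeline that picks a chunking module and a compression module per file type. The sketch below is a simplified assumption: `Pipeline`, `fixed_chunker`, and the two compressor functions are hypothetical names, and only two phases are modeled.

```python
import zlib


def fixed_chunker(size):
    """Return a chunking module that splits data into fixed-size pieces."""
    def chunk(data):
        return [data[i:i + size] for i in range(0, len(data), size)]
    return chunk


def no_compression(chunk):
    # Suitable for data that is already compressed (an assumed policy).
    return chunk


def zlib_compression(chunk):
    return zlib.compress(chunk)


class Pipeline:
    """Two-phase pipeline (chunk, then compress) with per-type module selection."""

    def __init__(self, default_chunker, default_compressor):
        self.default = (default_chunker, default_compressor)
        self.by_type = {}  # file type -> (chunker, compressor)

    def register(self, file_type, chunker, compressor):
        # Replace the modules used for one kind of data without touching the rest.
        self.by_type[file_type] = (chunker, compressor)

    def process(self, file_type, data):
        chunker, compressor = self.by_type.get(file_type, self.default)
        return [compressor(c) for c in chunker(data)]


pipeline = Pipeline(fixed_chunker(4), zlib_compression)
pipeline.register("jpg", fixed_chunker(8), no_compression)
print(pipeline.process("jpg", b"0123456789"))
```

Registering a different pair of modules for `jpg` changes how that data type is processed while every other type keeps the defaults.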

Type: Grant

Filed: December 16, 2010

Date of Patent: February 19, 2013

Assignee: Microsoft Corporation

Inventors: Paul Adrian Oltean, Ran Kalach, Ahmed M. El-Shimi, James Robert Benton

Using Index Partitioning and Reconciliation for Data Deduplication

Publication number: 20120166401

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.

Type: Application

Filed: December 28, 2010

Publication date: June 28, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Jin Li, Sudipta Sengupta, Ran Kalach, Ronakkumar N. Desai, Paul Adrian Oltean, James Robert Benton

Partial Recall of Deduplicated Files

Publication number: 20120158675

Abstract: The subject disclosure is directed towards changing a file from a fully deduplicated state to a partially deduplicated state in which some of the file data is deduplicated in a chunk store, and some is recalled into the file, that is, in the file's storage volume. A partial recall mechanism such as in a file system filter tracks (e.g., via a bitmap in a file reparse point) whether file data is maintained in the chunk store or has been recalled to the file. Data is recalled from the chunk store as needed, and committed (e.g., flushed) to the file. Also described is efficiently returning the file to a fully deduplicated state by using the tracking information to determine which parts of the file are already deduplicated into the chunk store so as to avoid their further deduplication processing.

Type: Application

Filed: December 16, 2010

Publication date: June 21, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Abhishek Gupta, Ran Kalach, Chun Ho Cheung, James Robert Benton, Joerg-Thomas Pfenning

Extensible Pipeline for Data Deduplication

Publication number: 20120158672

Abstract: The subject disclosure is directed towards data deduplication (optimization) performed by phases/modules of a modular data deduplication pipeline. At each phase, the pipeline allows modules to be replaced, selected or extended, e.g., different algorithms can be used for chunking or compression based upon the type of data being processed. The pipeline facilitates secure data processing, batch processing, and parallel processing. The pipeline is tunable based upon feedback, e.g., by selecting modules to increase deduplication quality, performance and/or throughput. Also described is selecting, filtering, ranking, sorting and/or grouping the files to deduplicate, e.g., based upon properties and/or statistical properties of the files and/or a file dataset and/or internal or external feedback.

Type: Application

Filed: December 16, 2010

Publication date: June 21, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Paul Adrian Oltean, Ran Kalach, Ahmed M. El-Shimi, James Robert Benton

GARBAGE COLLECTION AND HOTSPOTS RELIEF FOR A DATA DEDUPLICATION CHUNK STORE

Publication number: 20120159098

Abstract: Techniques for garbage collecting unused data chunks in storage are provided. According to one implementation, data chunks stored in a chunk container that are unused are identified based on an analysis of one or more stream map chunks indicated as deleted. The identified data chunks are indicated as deleted. The storage space in the chunk container filled by the data chunks indicated as deleted may then be reclaimed. Techniques for selectively backing up data chunks are also provided. According to one implementation, a data chunk is received for storing in a chunk container. A backup copy of the received data chunk is stored in a backup container if the received data chunk is in a predetermined top percentage of most referenced data chunks in the chunk container and has a number of references greater than a predetermined reference threshold.
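
Both ideas in this abstract, reclaiming chunks referenced only by deleted stream maps and backing up heavily referenced chunks, can be sketched briefly. The function names, the dictionary-based containers, and the parameters `top_fraction` and `min_refs` are assumptions chosen for illustration, not details taken from the application.

```python
def collect_garbage(chunk_container, stream_maps, deleted_streams):
    """Remove chunks referenced only by deleted stream maps.

    chunk_container: {chunk_id: bytes}
    stream_maps:     {stream_name: [chunk_id, ...]}
    deleted_streams: set of stream names indicated as deleted
    Returns the set of chunk ids that were reclaimed.
    """
    # Chunks still referenced by any live stream map must be kept.
    live = set()
    for name, refs in stream_maps.items():
        if name not in deleted_streams:
            live.update(refs)

    # Only chunks named by deleted stream maps are candidates for removal.
    candidates = set()
    for name in deleted_streams:
        candidates.update(stream_maps[name])

    unused = candidates - live
    for chunk_id in unused:
        del chunk_container[chunk_id]
    return unused


def should_back_up(chunk_id, ref_counts, top_fraction, min_refs):
    """Back up a chunk only if it is among the most referenced AND above a threshold."""
    ranked = sorted(ref_counts, key=ref_counts.get, reverse=True)
    top_n = max(1, int(len(ranked) * top_fraction))
    return chunk_id in ranked[:top_n] and ref_counts[chunk_id] > min_refs


container = {1: b"a", 2: b"b", 3: b"c"}
maps = {"s1": [1, 2], "s2": [2, 3]}
print(collect_garbage(container, maps, {"s1"}), sorted(container))
print(should_back_up(2, {1: 1, 2: 50, 3: 3, 4: 2}, top_fraction=0.25, min_refs=10))
```

Starting from the deleted stream maps keeps the candidate set small, and the two-part backup test avoids duplicating a chunk that merely ranks highly in a container where nothing is referenced often.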

Type: Application

Filed: December 17, 2010

Publication date: June 21, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Chun Ho (Ian) Cheung, Paul Adrian Oltean, James Robert Benton

SCALABLE CHUNK STORE FOR DATA DEDUPLICATION

Publication number: 20120131025

Abstract: Data streams may be stored in a chunk store in the form of stream maps and data chunks. Data chunks corresponding to a data stream may be stored in a chunk container, and a stream map corresponding to the data stream may point to the data chunks in the chunk container. Multiple stream maps may be stored in a stream container, and may point to the data chunks in the chunk container in a manner that duplicate data chunks are not present. Techniques are provided herein for localizing the storage of related data chunks in such chunk containers, for locating data chunks stored in chunk containers, for storing data streams in chunk stores in localized manners that enhance locality and decrease fragmentation, and for reorganizing stored data streams in chunk stores.

Type: Application

Filed: November 18, 2010

Publication date: May 24, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Chun Ho (Ian) Cheung, Paul Adrian Oltean, Ran Kalach, Abhishek Gupta, James Robert Benton, Ronakkumar Desai