Patents by Inventor Paul Adrian Oltean

Paul Adrian Oltean has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Content aware chunking for achieving an improved chunk size distribution

Patent number: 8918375

Abstract: The subject disclosure is directed towards partitioning a file into chunks that satisfy a chunk size restriction, such as maximum and minimum chunk sizes, using a sliding window. For file positions within the chunk size restriction, a signature representative of a window fingerprint is compared with a target pattern, with a chunk boundary candidate identified if matched. Other signatures and patterns are then checked to determine a highest ranking signature (corresponding to a lowest numbered Rule) to associate with that chunk boundary candidate, or set an actual boundary if the highest ranked signature is matched. If the maximum chunk size is reached without matching the highest ranked signature, the chunking mechanism regresses to set the boundary based on the candidate with the next highest ranked signature (if no candidates, the boundary is set at the maximum). Also described is setting chunk boundaries based upon pattern detection (e.g., runs of zeros).

Type: Grant

Filed: August 31, 2011

Date of Patent: December 23, 2014

Assignee: Microsoft Corporation

Inventors: Jin Li, Sudipta Sengupta, Sanjeev Mehrotra, Ran Kalach, Paul Adrian Oltean
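
The chunking scheme in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patented implementation: the function names, the BLAKE2 hash standing in for a rolling window fingerprint, the "low bits are zero" target patterns, and the choice to keep the first candidate seen for the best rule. Zero-run pattern detection is omitted.

```python
import hashlib


def fingerprint(data: bytes, pos: int, window: int) -> int:
    """Hash of the `window` bytes ending at `pos` (stand-in for a rolling hash)."""
    digest = hashlib.blake2b(data[pos - window:pos], digest_size=8).digest()
    return int.from_bytes(digest, "big")


def matched_rank(fp: int, rule_bits):
    """Index of the highest-ranked rule whose low bits are all zero, else None."""
    for rank, bits in enumerate(rule_bits):
        if fp & ((1 << bits) - 1) == 0:
            return rank
    return None


def chunk_boundaries(data: bytes, min_size=64, max_size=256, window=16,
                     rule_bits=(7, 6, 5)):
    """Return chunk end offsets; rule_bits[0] is the highest-ranked rule.

    Assumes window <= min_size <= max_size.
    """
    boundaries, start, n = [], 0, len(data)
    while start < n:
        end = min(start + max_size, n)
        best_rank, best_pos, cut = None, None, None
        # Only positions that satisfy the minimum chunk size are examined.
        for pos in range(start + min_size, end + 1):
            rank = matched_rank(fingerprint(data, pos, window), rule_bits)
            if rank == 0:            # highest-ranked rule: actual boundary
                cut = pos
                break
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_pos = rank, pos   # remember best candidate
        if cut is None:
            if end == n and n - start < max_size:
                cut = n              # short tail: the data simply ends
            elif best_pos is not None:
                cut = best_pos       # regress to the best lower-ranked candidate
            else:
                cut = end            # no candidates: cut at the maximum size
        boundaries.append(cut)
        start = cut
    return boundaries
```

Calling `chunk_boundaries(data)` on the same bytes always yields the same offsets, which is the property that lets identical content produce identical chunks.
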
ALTERNATE DATA STREAM CACHE FOR FILE CLASSIFICATION

Publication number: 20140351225

Abstract: Described is caching classification-related metadata for a file in an alternate data stream of that file. When a file is classified (e.g., for data management), the classification properties are cached in association with the file, along with classification-related metadata that indicates the state of the file at the time of caching. The classification-related metadata in the alternate data stream is then useable in determining whether the classification properties are valid and up-to-date when next accessed, or whether the file needs to be reclassified. If the properties are valid and up-to-date, they may be used without requiring the computationally costly steps of reclassification. Also described is using more than one alternate data stream for the cache, and extending the classification-related metadata through a defined extension mechanism.

Type: Application

Filed: August 11, 2014

Publication date: November 27, 2014

Applicant: Microsoft Corporation

Inventors: Clyde Law, Paul Adrian Oltean, Ran Kalach, Nir Ben-Zvi, Matthias H. Wollnik
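
As a rough illustration of the caching idea in the abstract above, the sketch below stores classification properties together with the file state seen at caching time and reclassifies only when that state has changed. The names `classify`, `file_state`, and `ClassificationCache` are hypothetical, an in-memory dictionary stands in for the alternate data stream, and size plus modification time is only one assumed choice of validity metadata.

```python
import json
import os


def file_state(path: str) -> dict:
    """Metadata describing the file at caching time (an assumed choice)."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def classify(path: str) -> dict:
    """Hypothetical stand-in for a costly classification step."""
    with open(path, "rb") as f:
        return {"mentions_ssn": b"SSN" in f.read()}


class ClassificationCache:
    def __init__(self):
        self._streams = {}          # path -> serialized cache record
        self.reclassifications = 0  # how often the costly step ran

    def get_properties(self, path: str) -> dict:
        raw = self._streams.get(path)
        if raw is not None:
            record = json.loads(raw)
            # Cached properties are valid only if the file is unchanged.
            if record["state"] == file_state(path):
                return record["properties"]
        properties = classify(path)
        self.reclassifications += 1
        self._streams[path] = json.dumps(
            {"state": file_state(path), "properties": properties})
        return properties
```
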
PREDICTING DATA COMPRESSIBILITY USING DATA ENTROPY ESTIMATION

Publication number: 20140244604

Abstract: The subject disclosure is directed towards predicting compressibility of a data block, and using the predicted compressibility in determining whether a data block, if compressed, will be sufficiently compressible to justify compression. In one aspect, data of the data block is processed to obtain an entropy estimate of the data block, e.g., based upon distinct value estimation. The compressibility prediction may be used in conjunction with a chunking mechanism of a data deduplication system.

Type: Application

Filed: February 28, 2013

Publication date: August 28, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Paul Adrian Oltean, Cosmin A. Rusu, Arnd Christian König, Mark Steven Manasse, Jin Li, Sudipta Sengupta, Sanjeev Mehrotra
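
To make the compressibility prediction above concrete, here is a minimal sketch that uses a plain byte-frequency Shannon entropy. This is a swapped-in, simpler estimator than the distinct value estimation the abstract mentions, and the function names, sampling stride, and 7.0 bits-per-byte threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_byte(block: bytes, stride: int = 1) -> float:
    """Shannon entropy of the byte histogram, optionally over a sample."""
    sample = block[::stride]
    if not sample:
        return 0.0
    total = len(sample)
    counts = Counter(sample)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def should_compress(block: bytes, threshold: float = 7.0) -> bool:
    """Predict that compression is worthwhile when entropy is below threshold."""
    return entropy_bits_per_byte(block) < threshold
```

A block of identical bytes scores 0 bits per byte and is predicted compressible, while a block in which all 256 byte values are equally frequent scores 8 bits per byte and is skipped.
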
Alternate data stream cache for file classification

Patent number: 8805837

Abstract: Described is caching classification-related metadata for a file in an alternate data stream of that file. When a file is classified (e.g., for data management), the classification properties are cached in association with the file, along with classification-related metadata that indicates the state of the file at the time of caching. The classification-related metadata in the alternate data stream is then useable in determining whether the classification properties are valid and up-to-date when next accessed, or whether the file needs to be reclassified. If the properties are valid and up-to-date, they may be used without requiring the computationally costly steps of reclassification. Also described is using more than one alternate data stream for the cache, and extending the classification-related metadata through a defined extension mechanism.

Type: Grant

Filed: October 26, 2009

Date of Patent: August 12, 2014

Assignee: Microsoft Corporation

Inventors: Clyde Law, Paul Adrian Oltean, Ran Kalach, Nir Ben-Zvi, Matthias H. Wollnik
Integrated Data Deduplication and Encryption

Publication number: 20140189348

Abstract: The subject disclosure is directed towards encryption and deduplication integration between computing devices and a network resource. Files are partitioned into data blocks and deduplicated via removal of duplicate data blocks. Using multiple cryptographic keys, each data block is encrypted and stored at the network resource but can only be decrypted by an authorized user, such as a domain entity having an appropriate deduplication domain-based cryptographic key. Another cryptographic key referred to as a content-derived cryptographic key ensures that duplicate data blocks encrypt to substantially equivalent encrypted data.

Type: Application

Filed: December 31, 2012

Publication date: July 3, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Ahmed Moustafa El-Shimi, Paul Adrian Oltean, Ran Kalach, Sudipta Sengupta, Jin Li, Roy D'Souza, Omkant Pandey, Ramarathnam Venkatesan
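
One way to picture a content-derived key, as mentioned in the abstract above, is to derive the key from a hash of the block keyed by a per-domain secret. The sketch below shows only that derivation step; the function name and the use of HMAC-SHA256 are assumptions, the abstract does not state how its keys are combined, and no actual encryption is performed here.

```python
import hashlib
import hmac


def content_derived_key(block: bytes, domain_key: bytes) -> bytes:
    """Derive a per-block key from the block's content and a domain secret.

    Identical blocks inside one domain get identical keys, so a deterministic
    cipher keyed this way would map duplicates to the same ciphertext.
    """
    content_hash = hashlib.sha256(block).digest()
    return hmac.new(domain_key, content_hash, hashlib.sha256).digest()
```

Under these assumptions, two users in the same deduplication domain derive the same key for the same block, while a party without the domain secret cannot derive it from the content alone.
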
DATA ERROR DETECTION AND CORRECTION USING HASH VALUES

Publication number: 20140181575

Abstract: The subject disclosure is directed towards a data storage service that uses hash values, such as substantially collision-free hash values, to maintain data integrity. These hash values are persisted in the form of mappings corresponding to data blocks in one or more data stores. If a data error is detected, these mappings allow the data storage service to search the one or more data stores for data blocks having matching hash values. If a data block is found that corresponds to a hash value for a corrupted or lost data block, the data storage service uses that data block to repair the corrupted or lost data block.

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Ran Kalach, Kashif Hasan, Paul Adrian Oltean, James R. Benton, Chun Ho Cheung, Ahmed Moustafa El-Shimi
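
The repair flow described in the abstract above can be sketched as follows. Plain dictionaries stand in for the data stores, SHA-256 stands in for the "substantially collision-free" hash, and the class and method names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class IntegrityService:
    def __init__(self, stores):
        self.stores = stores   # list of dicts: block id -> bytes
        self.mappings = {}     # (store index, block id) -> expected hash

    def write(self, store: int, block_id: str, data: bytes) -> None:
        self.stores[store][block_id] = data
        self.mappings[(store, block_id)] = digest(data)

    def read(self, store: int, block_id: str) -> bytes:
        expected = self.mappings[(store, block_id)]
        data = self.stores[store].get(block_id)
        if data is not None and digest(data) == expected:
            return data
        # Block is corrupted or lost: look for any block with the same hash.
        for (other, other_id), h in self.mappings.items():
            if h != expected or (other, other_id) == (store, block_id):
                continue
            candidate = self.stores[other].get(other_id)
            if candidate is not None and digest(candidate) == expected:
                self.stores[store][block_id] = candidate  # repair in place
                return candidate
        raise OSError(f"block {block_id!r} is damaged and no copy was found")
```
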
Optimization of a Partially Deduplicated File

Publication number: 20130060739

Abstract: The subject disclosure is directed towards transforming a file having at least one undeduplicated portion into a fully deduplicated file. For each of the at least one undeduplicated portion, a deduplication mechanism defines at least one chunk between file offsets associated with the at least one undeduplicated portion. Chunk boundaries associated with the at least one chunk are stored within deduplication metadata. The deduplication mechanism aligns the at least one chunk with chunk boundaries of at least one deduplicated portion of the file. Then, the at least one chunk is committed to a chunk store.

Type: Application

Filed: September 1, 2011

Publication date: March 7, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Ran Kalach, Kashif Hasan, Paul Adrian Oltean, James Robert Benton, Chun Ho Cheung, Abhishek Gupta
Content Aware Chunking for Achieving an Improved Chunk Size Distribution

Publication number: 20130054544

Abstract: The subject disclosure is directed towards partitioning a file into chunks that satisfy a chunk size restriction, such as maximum and minimum chunk sizes, using a sliding window. For file positions within the chunk size restriction, a signature representative of a window fingerprint is compared with a target pattern, with a chunk boundary candidate identified if matched. Other signatures and patterns are then checked to determine a highest ranking signature (corresponding to a lowest numbered Rule) to associate with that chunk boundary candidate, or set an actual boundary if the highest ranked signature is matched. If the maximum chunk size is reached without matching the highest ranked signature, the chunking mechanism regresses to set the boundary based on the candidate with the next highest ranked signature (if no candidates, the boundary is set at the maximum). Also described is setting chunk boundaries based upon pattern detection (e.g., runs of zeros).

Type: Application

Filed: August 31, 2011

Publication date: February 28, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Jin Li, Sudipta Sengupta, Sanjeev Mehrotra, Ran Kalach, Paul Adrian Oltean
Extensible pipeline for data deduplication

Patent number: 8380681

Abstract: The subject disclosure is directed towards data deduplication (optimization) performed by phases/modules of a modular data deduplication pipeline. At each phase, the pipeline allows modules to be replaced, selected or extended, e.g., different algorithms can be used for chunking or compression based upon the type of data being processed. The pipeline facilitates secure data processing, batch processing, and parallel processing. The pipeline is tunable based upon feedback, e.g., by selecting modules to increase deduplication quality, performance and/or throughput. Also described is selecting, filtering, ranking, sorting and/or grouping the files to deduplicate, e.g., based upon properties and/or statistical properties of the files and/or a file dataset and/or internal or external feedback.

Type: Grant

Filed: December 16, 2010

Date of Patent: February 19, 2013

Assignee: Microsoft Corporation

Inventors: Paul Adrian Oltean, Ran Kalach, Ahmed M. El-Shimi, James Robert Benton
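
The modular pipeline in the abstract above might be pictured as a set of replaceable stages chosen per data type. In this minimal sketch, the class name, the selection by file extension, and the two toy chunkers are assumptions for illustration only; a real store would also need to record which compressor was applied to each chunk.

```python
import hashlib
import zlib


def fixed_chunker(data: bytes, size: int = 8):
    return [data[i:i + size] for i in range(0, len(data), size)]


def line_chunker(data: bytes):
    return data.splitlines(keepends=True)


class DedupPipeline:
    """Chunking and compression modules are looked up per file type."""

    def __init__(self, chunkers, compressors, default="*"):
        self.chunkers = chunkers
        self.compressors = compressors
        self.default = default

    def process(self, filename: str, data: bytes, store: dict):
        ext = filename.rsplit(".", 1)[-1] if "." in filename else self.default
        chunk = self.chunkers.get(ext, self.chunkers[self.default])
        compress = self.compressors.get(ext, self.compressors[self.default])
        recipe = []
        for piece in chunk(data):
            chunk_id = hashlib.sha256(piece).hexdigest()
            if chunk_id not in store:       # store each distinct chunk once
                store[chunk_id] = compress(piece)
            recipe.append(chunk_id)
        return recipe
```

Swapping in a different chunker or compressor for a file type is then a dictionary update rather than a change to the pipeline itself.
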
Creating host-level application-consistent backups of virtual machines

Patent number: 8321377

Abstract: A host server hosting one or more virtual machines can back up host volumes and the one or more virtual machines installed thereon in an application-consistent manner. In one implementation, a host-level requestor instructs a host-level writer to identify which virtual machines qualify for application-consistent backups. The host-level requestor then instructs the host-level writer to initiate virtual machine backups through guest-level requestors in each appropriately-configured virtual machine, wherein the virtual machines create application-consistent backups within the virtual machine volumes. The host-level requestor then initiates snapshots of the server volumes on the host-level. The virtual machine-level snapshots can thus be retrieved from within the host-level snapshots of the server volumes.

Type: Grant

Filed: April 17, 2006

Date of Patent: November 27, 2012

Assignee: Microsoft Corporation

Inventors: Michael L. Michael, William L. Scheidel, Paul Brandon Luber, Paul Adrian Oltean, Ran Kalach
DETERMINATION OF LANDMARKS

Publication number: 20120259897

Abstract: Hash values corresponding to a file are processed in windows to determine a minimum hash value for each window. Each window may begin at a minimum hash value determined for a previous window and end after a fixed number of hash values. If a hash value is less than a threshold hash value, it is added to a buffer that is used to store the hash values in sorted order for a current window. If a hash value is greater than the threshold, it is added to another buffer whose hash values are not stored in sorted order. At the end of the current window, the minimum hash value in the first buffer is selected as the landmark for the window. If the first buffer is empty, then the hash values in the other buffer are sorted and the minimum hash value is selected as the landmark for the window.

Type: Application

Filed: April 7, 2011

Publication date: October 11, 2012

Applicant: Microsoft Corporation

Inventors: Mark S. Manasse, Arnd Christian König, Paul Adrian Oltean
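
The two-buffer landmark selection in the abstract above can be sketched roughly as below. The function name is hypothetical, the next window is assumed to start just after the previous landmark, hash values equal to the threshold go to the unsorted buffer, and `min()` replaces an explicit sort of that buffer.

```python
import bisect


def find_landmarks(hashes, window_size: int, threshold: int):
    """Return the index of the minimum hash value chosen for each window."""
    landmarks, start, n = [], 0, len(hashes)
    while start < n:
        end = min(start + window_size, n)
        low = []    # (hash, index) pairs below the threshold, kept sorted
        high = []   # remaining pairs, kept in arrival order
        for i in range(start, end):
            if hashes[i] < threshold:
                bisect.insort(low, (hashes[i], i))
            else:
                high.append((hashes[i], i))
        # Prefer the sorted buffer; fall back to the unsorted one if empty.
        _, index = low[0] if low else min(high)
        landmarks.append(index)
        start = index + 1   # assumed: next window begins after the landmark
    return landmarks
```

With a well-chosen threshold, most windows are resolved from the small sorted buffer and the larger unsorted buffer is rarely examined.
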
BACKUP AND RESTORE STRATEGIES FOR DATA DEDUPLICATION

Publication number: 20120233417

Abstract: Techniques for backup and restore of optimized data streams are described. A chunk store includes each optimized data stream as a plurality of chunks including at least one data chunk and corresponding optimized stream metadata. The chunk store includes data chunks in a deduplicated manner. Optimized data streams stored in the chunk store are identified for backup. At least a portion of the chunk store is stored in backup storage according to an optimized backup technique, an un-optimized backup technique, an item level backup technique, or a data chunk identifier backup technique. Optimized data streams stored in the backup storage may be restored. A file reconstructor includes a callback module that generates calls to a restore application to request optimized stream metadata and any referenced data chunks from the backup storage. The file reconstructor reconstructs the data streams from the referenced data chunks.

Type: Application

Filed: March 11, 2011

Publication date: September 13, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Ran Kalach, Chun Ho (Ian) Cheung, Paul Adrian Oltean, Mathew James Dickson
Using Index Partitioning and Reconciliation for Data Deduplication

Publication number: 20120166401

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.

Type: Application

Filed: December 28, 2010

Publication date: June 28, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Jin Li, Sudipta Sengupta, Ran Kalach, Ronakkumar N. Desai, Paul Adrian Oltean, James Robert Benton
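
A small sketch may help picture the subspace indexes and reconciliation described in the abstract above. All names are hypothetical, a dictionary of dictionaries stands in for the persisted subspaces, and only one subspace is treated as cached at a time.

```python
class PartitionedIndex:
    def __init__(self):
        self.subspaces = {}      # name -> {chunk hash: location}
        self.cached_name = None  # only this subspace counts as "in memory"
        self.cached = None
        self.loads = 0           # how often a subspace had to be loaded

    def _load(self, name: str) -> dict:
        if self.cached_name != name:
            self.cached_name = name
            self.cached = self.subspaces.setdefault(name, {})
            self.loads += 1
        return self.cached

    def add_chunk(self, subspace: str, chunk_hash: str, location: str):
        """Return (location, is_new); duplicates are detected per subspace."""
        index = self._load(subspace)
        if chunk_hash in index:
            return index[chunk_hash], False
        index[chunk_hash] = location
        return location, True


def reconcile(index: PartitionedIndex, keep: str, scan: str) -> dict:
    """Drop entries of `scan` that duplicate `keep`; return old -> new locations."""
    kept, scanned = index.subspaces[keep], index.subspaces[scan]
    redirects = {}
    for chunk_hash in list(scanned):
        if chunk_hash in kept:
            redirects[scanned.pop(chunk_hash)] = kept[chunk_hash]
    return redirects
```

Because lookups consult only the cached subspace, the same chunk can be stored in two subspaces until reconciliation finds the duplicate and redirects it.
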
GARBAGE COLLECTION AND HOTSPOTS RELIEF FOR A DATA DEDUPLICATION CHUNK STORE

Publication number: 20120159098

Abstract: Techniques for garbage collecting unused data chunks in storage are provided. According to one implementation, data chunks stored in a chunk container that are unused are identified based on an analysis of one or more stream map chunks indicated as deleted. The identified data chunks are indicated as deleted. The storage space in the chunk container filled by the data chunks indicated as deleted may then be reclaimed. Techniques for selectively backing up data chunks are also provided. According to one implementation, a data chunk is received for storing in a chunk container. A backup copy of the received data chunk is stored in a backup container if the received data chunk is in a predetermined top percentage of most referenced data chunks in the chunk container and has a number of references greater than a predetermined reference threshold.

Type: Application

Filed: December 17, 2010

Publication date: June 21, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Chun Ho (Ian) Cheung, Paul Adrian Oltean, James Robert Benton
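
The garbage collection step in the abstract above can be illustrated with a minimal sketch. The function name and data layout are assumptions: a dictionary stands in for the chunk container, and each stream map is a list of chunk identifiers. The selective backup of heavily referenced chunks is not covered.

```python
from collections import Counter


def collect_garbage(container: dict, stream_maps: dict, deleted_maps: set):
    """Reclaim chunks referenced only by deleted stream maps.

    container:    chunk id -> bytes
    stream_maps:  stream map id -> list of chunk ids
    deleted_maps: ids of stream maps indicated as deleted
    Returns (set of reclaimed chunk ids, number of bytes reclaimed).
    """
    live_refs = Counter()
    for map_id, chunk_ids in stream_maps.items():
        if map_id not in deleted_maps:
            live_refs.update(chunk_ids)
    # Only chunks named by deleted stream maps need to be examined.
    candidates = {cid for m in deleted_maps for cid in stream_maps.get(m, [])}
    unused = {cid for cid in candidates
              if live_refs[cid] == 0 and cid in container}
    reclaimed = sum(len(container.pop(cid)) for cid in unused)
    for map_id in deleted_maps:
        stream_maps.pop(map_id, None)
    return unused, reclaimed
```
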
Extensible Pipeline for Data Deduplication

Publication number: 20120158672

Abstract: The subject disclosure is directed towards data deduplication (optimization) performed by phases/modules of a modular data deduplication pipeline. At each phase, the pipeline allows modules to be replaced, selected or extended, e.g., different algorithms can be used for chunking or compression based upon the type of data being processed. The pipeline facilitates secure data processing, batch processing, and parallel processing. The pipeline is tunable based upon feedback, e.g., by selecting modules to increase deduplication quality, performance and/or throughput. Also described is selecting, filtering, ranking, sorting and/or grouping the files to deduplicate, e.g., based upon properties and/or statistical properties of the files and/or a file dataset and/or internal or external feedback.

Type: Application

Filed: December 16, 2010

Publication date: June 21, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Paul Adrian Oltean, Ran Kalach, Ahmed M. El-Shimi, James Robert Benton
Data Deduplication in a Virtualization Environment

Publication number: 20120151177

Abstract: Techniques are described herein that are capable of optimizing (i.e., deduplicating) data in a virtualization environment. For example, optimization designations (a.k.a. deduplication designations) may be assigned to respective regions of a virtualized storage file. A virtualized storage file is a file that is configured to be mounted as a disk or a volume to provide a file system interface for accessing hosted files. In accordance with this example, each optimization designation indicates an extent to which the respective region is to be optimized (i.e., deduplicated). In another example, a virtualized storage file is mounted to provide a virtual disk that includes hosted files. In accordance with this example, optimization designations are assigned to the respective hosted files. In further accordance with this example, each optimization designation indicates an extent to which the respective hosted file is to be optimized.

Type: Application

Filed: December 14, 2010

Publication date: June 14, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Ran Kalach, Paul Adrian Oltean, Cristian G. Teodorescu, Mathew James Dickson
SCALABLE CHUNK STORE FOR DATA DEDUPLICATION

Publication number: 20120131025

Abstract: Data streams may be stored in a chunk store in the form of stream maps and data chunks. Data chunks corresponding to a data stream may be stored in a chunk container, and a stream map corresponding to the data stream may point to the data chunks in the chunk container. Multiple stream maps may be stored in a stream container, and may point to the data chunks in the chunk container in a manner that duplicate data chunks are not present. Techniques are provided herein for localizing the storage of related data chunks in such chunk containers, for locating data chunks stored in chunk containers, for storing data streams in chunk stores in localized manners that enhance locality and decrease fragmentation, and for reorganizing stored data streams in chunk stores.

Type: Application

Filed: November 18, 2010

Publication date: May 24, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Chun Ho (Ian) Cheung, Paul Adrian Oltean, Ran Kalach, Abhishek Gupta, James Robert Benton, Ronakkumar Desai
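
The relationship between stream maps and chunk containers in the abstract above can be sketched as follows. The class and attribute names are hypothetical, in-memory lists and dictionaries stand in for the containers, and appending new chunks in arrival order is only an assumed, simple way to keep a stream's chunks near each other.

```python
import hashlib


class ChunkStore:
    def __init__(self):
        self.chunk_container = []   # data chunks, appended in arrival order
        self.position_of = {}       # chunk hash -> position in the container
        self.stream_container = {}  # stream name -> stream map (positions)

    def add_stream(self, name: str, chunks) -> None:
        stream_map = []
        for chunk in chunks:
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.position_of:   # store each chunk only once
                self.position_of[key] = len(self.chunk_container)
                self.chunk_container.append(chunk)
            stream_map.append(self.position_of[key])
        self.stream_container[name] = stream_map

    def read_stream(self, name: str) -> bytes:
        """Rebuild a stream by following its stream map."""
        return b"".join(self.chunk_container[i]
                        for i in self.stream_container[name])
```
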
Fast and Low-RAM-Footprint Indexing for Data Deduplication

Publication number: 20110276781

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.

Type: Application

Filed: December 28, 2010

Publication date: November 10, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Sudipta Sengupta, Biplob Debnath, Jin Li, Ronakkumar N. Desai, Paul Adrian Oltean
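
As a rough picture of the indexing scheme in the abstract above, the sketch below keeps full hashes in a list that stands in for secondary storage, a compact table of 2-byte signatures in memory, and a look-ahead cache filled with the entries that follow a hit. The names, the signature width, and the unbounded cache are assumptions for illustration; the session cache and the read-only table encoding are not shown.

```python
class ChunkIndex:
    def __init__(self, prefetch: int = 4):
        self.log = []        # stands in for on-disk entries: (hash, location)
        self.compact = {}    # 2-byte signature -> list of log positions
        self.lookahead = {}  # full hash -> location, filled by prefetching
        self.prefetch = prefetch
        self.disk_reads = 0  # counts simulated secondary-storage accesses

    def insert(self, full_hash: bytes, location: int) -> None:
        position = len(self.log)
        self.log.append((full_hash, location))
        self.compact.setdefault(full_hash[:2], []).append(position)

    def lookup(self, full_hash: bytes):
        if full_hash in self.lookahead:         # served entirely from RAM
            return self.lookahead[full_hash]
        for position in self.compact.get(full_hash[:2], []):
            self.disk_reads += 1                # must verify the full hash
            stored_hash, location = self.log[position]
            if stored_hash == full_hash:
                # Prefetch neighbours: chunks stored together are often
                # looked up together during deduplication.
                following = self.log[position + 1:position + 1 + self.prefetch]
                for next_hash, next_location in following:
                    self.lookahead[next_hash] = next_location
                return location
        return None
```
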
Fast and Low-RAM-Footprint Indexing for Data Deduplication

Publication number: 20110276780

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.

Type: Application

Filed: December 28, 2010

Publication date: November 10, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Sudipta Sengupta, Biplob Debnath, Jin Li, Ronakkumar N. Desai, Paul Adrian Oltean
Controlling Resource Access Based on Resource Properties

Publication number: 20110126281

Abstract: Described is a technology by which access to a resource is determined by evaluating a resource label of the resource against a user claim of an access request, according to policy decoupled from the resource. The resource may be a file, and the resource label may be obtained by classifying the file into classification properties, such that a change to the file may change its resource label, thereby changing which users have access to the file. The resource label-based access evaluation may be logically combined with a conventional ACL-based access evaluation to determine whether to grant or deny access to the resource.

Type: Application

Filed: November 20, 2009

Publication date: May 26, 2011

Inventors: Nir Ben-Zvi, Raja Pazhanivel Perumal, Anders Samuelsson, Jeffrey B. Hamblin, Ran Kalach, Ziquan Li, Matthias H. Wollnik, Clyde Law, Paul Adrian Oltean
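
The label-based access check in the abstract above can be illustrated with a minimal sketch. The function names and the shape of the policy, a mapping from a label property and value to a required claim, are assumptions; the label check is combined with a simple ACL membership test by logical AND.

```python
def label_allows(label: dict, claims: dict, policy: dict) -> bool:
    """policy maps (property, value) -> (claim name, required claim value)."""
    for item in label.items():
        required = policy.get(item)
        if required is not None:
            claim_name, claim_value = required
            if claims.get(claim_name) != claim_value:
                return False
    return True


def access_granted(user: str, acl: set, label: dict,
                   claims: dict, policy: dict) -> bool:
    """Grant access only if both the ACL and the label policy allow it."""
    return user in acl and label_allows(label, claims, policy)
```

Because the policy is kept apart from the resource, reclassifying a file changes its label and therefore who may access it, without touching the ACL.
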
