Patents by Inventor Frank D. McSherry
Frank D. McSherry has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10171284
Abstract: A computer-readable storage medium stores computer-executable instructions that, when executed by a processor, perform operations including scheduling first and second threads to operate independently on first and second partitions of data. The operations include beginning a first operation on the first and second partitions by the first and second threads, respectively. The operations include tracking progress of the first operation by the first and second threads using a replicated data structure. The operations include, for a record on which the first operation will be performed, adding an entry to the replicated data structure with a timestamp indicating an epoch and iteration. The operations include determining a number of yet-to-be-processed records for a selected entry of the replicated data structure. The selected entry has the most recent timestamp for the first thread. The operations include terminating the first thread when the number of yet-to-be-processed records for the selected entry is zero.
Type: Grant
Filed: November 24, 2017
Date of Patent: January 1, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Frank D. McSherry, Rebecca Isaacs, Michael A. Isard, Derek G. Murray
-
Publication number: 20180097684
Abstract: A computer-readable storage medium stores computer-executable instructions that, when executed by a processor, perform operations including scheduling first and second threads to operate independently on first and second partitions of data. The operations include beginning a first operation on the first and second partitions by the first and second threads, respectively. The operations include tracking progress of the first operation by the first and second threads using a replicated data structure. The operations include, for a record on which the first operation will be performed, adding an entry to the replicated data structure with a timestamp indicating an epoch and iteration. The operations include determining a number of yet-to-be-processed records for a selected entry of the replicated data structure. The selected entry has the most recent timestamp for the first thread. The operations include terminating the first thread when the number of yet-to-be-processed records for the selected entry is zero.
Type: Application
Filed: November 24, 2017
Publication date: April 5, 2018
Inventors: Frank D. McSherry, Rebecca Isaacs, Michael A. Isard, Derek G. Murray
-
Patent number: 9832068
Abstract: Various embodiments provide techniques for working with large-scale collections of data pertaining to real world systems, such as a social network, a roadmap/GPS system, etc. The techniques perform incremental, iterative, and interactive parallel computation using a coordination clock protocol, which applies to scheduling computations and managing resources such as memory and network resources, etc., in cyclic graphs including those resulting from a differential dataflow model that performs computations on differences in the collections of data.
Type: Grant
Filed: December 17, 2012
Date of Patent: November 28, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Frank D. McSherry, Rebecca Isaacs, Michael A. Isard, Derek G. Murray
-
Patent number: 9165035
Abstract: The techniques discussed herein efficiently perform data-parallel computations on collections of data by implementing a differential dataflow model that performs computations on differences in the collections of data. The techniques discussed herein describe defined operators for use in a data-parallel program that performs the computations on the determined differences between the collections of data by creating a lattice and indexing the differences in the collection of data according to the lattice.
Type: Grant
Filed: May 10, 2012
Date of Patent: October 20, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Frank D. McSherry, Rebecca Isaacs, Michael A. Isard, Derek G. Murray
-
Publication number: 20140172939
Abstract: Various embodiments provide techniques for working with large-scale collections of data pertaining to real world systems, such as a social network, a roadmap/GPS system, etc. The techniques perform incremental, iterative, and interactive parallel computation using a coordination clock protocol, which applies to scheduling computations and managing resources such as memory and network resources, etc., in cyclic graphs including those resulting from a differential dataflow model that performs computations on differences in the collections of data.
Type: Application
Filed: December 17, 2012
Publication date: June 19, 2014
Applicant: MICROSOFT CORPORATION
Inventors: Frank D. McSherry, Rebecca Isaacs, Michael A. Isard, Derek G. Murray
-
Patent number: 8639649
Abstract: Given that a differentially private mechanism has a known conditional distribution, probabilistic inference techniques may be used along with the known conditional distribution, and generated results from previously computed queries on private data, to generate a posterior distribution for the differentially private mechanism used by the system. The generated posterior distribution may be used to describe the probability of every possible result being the correct result. The probability may then be used to qualify conclusions or calculations that may depend on the returned result.
Type: Grant
Filed: March 23, 2010
Date of Patent: January 28, 2014
Assignee: Microsoft Corporation
Inventors: Frank D. McSherry, Oliver M. C. Williams
-
Patent number: 8619984
Abstract: User rating data may be received at a correlation engine through a network. The user rating data may include ratings generated by a plurality of users for a plurality of items. Correlation data may be generated from the received user rating data by the correlation engine. The correlation data may identify correlations between the items based on the user generated ratings. Noise may be generated by the correlation engine, and the generated noise may be added to the generated correlation data by the correlation engine to provide differential privacy protection to the user rating data.
Type: Grant
Filed: September 11, 2009
Date of Patent: December 31, 2013
Assignee: Microsoft Corporation
Inventors: Frank D. McSherry, Ilya Mironov
-
Publication number: 20130304744
Abstract: The techniques discussed herein efficiently perform data-parallel computations on collections of data by implementing a differential dataflow model that performs computations on differences in the collections of data. The techniques discussed herein describe defined operators for use in a data-parallel program that performs the computations on the determined differences between the collections of data by creating a lattice and indexing the differences in the collection of data according to the lattice.
Type: Application
Filed: May 10, 2012
Publication date: November 14, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Frank D. McSherry, Rebecca Isaacs, Michael A. Isard, Derek G. Murray
-
Patent number: 8145682
Abstract: A query log includes a list of queries and a count for each query representing the number of times that the query was received by a search engine. In order to provide differential privacy protection to the queries, noise is generated and added to each count, and queries that have counts that fall below a threshold are removed from the query log. A distribution associated with a function used to generate the noise is referenced to determine a distribution of a number of times that a hypothetical query having a zero count would have its count exceed the threshold after the addition of noise. Random queries of an amount equal to a sample from the distribution of number of times are added to the query log with a count that is greater than the threshold count.
Type: Grant
Filed: February 25, 2010
Date of Patent: March 27, 2012
Assignee: Microsoft Corporation
Inventors: Frank D. McSherry, Kunal Talwar
-
Publication number: 20110238611
Abstract: Given that a differentially private mechanism has a known conditional distribution, probabilistic inference techniques may be used along with the known conditional distribution, and generated results from previously computed queries on private data, to generate a posterior distribution for the differentially private mechanism used by the system. The generated posterior distribution may be used to describe the probability of every possible result being the correct result. The probability may then be used to qualify conclusions or calculations that may depend on the returned result.
Type: Application
Filed: March 23, 2010
Publication date: September 29, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Frank D. McSherry, Oliver M. C. Williams
-
Publication number: 20110208763
Abstract: A query log includes a list of queries and a count for each query representing the number of times that the query was received by a search engine. In order to provide differential privacy protection to the queries, noise is generated and added to each count, and queries that have counts that fall below a threshold are removed from the query log. A distribution associated with a function used to generate the noise is referenced to determine a distribution of a number of times that a hypothetical query having a zero count would have its count exceed the threshold after the addition of noise. Random queries of an amount equal to a sample from the distribution of number of times are added to the query log with a count that is greater than the threshold count.
Type: Application
Filed: February 25, 2010
Publication date: August 25, 2011
Applicant: Microsoft Corporation
Inventors: Frank D. McSherry, Kunal Talwar
-
Patent number: 8005821
Abstract: Systems and methods for injecting noise into secure function evaluation to protect the privacy of the participants and for computing a collective noisy result by combining results and noise generated based on input from the participants. When implemented using distributed computing devices, each device may have access to a subset of data. A query may be distributed to the devices, and each device applies the query to its own subset of data to obtain a subset result. Each device then divides its subset result into one or more shares, and the shares are combined to form a collective result. The devices may also generate random bits. The random bits may be combined and used to generate noise. The collective result can be combined with the noise to obtain a collective noisy result.
Type: Grant
Filed: October 6, 2005
Date of Patent: August 23, 2011
Assignee: Microsoft Corporation
Inventors: Cynthia Dwork, Frank D. McSherry
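Illustrative sketch (not taken from the patent): the share-and-noise flow in the abstract above might look roughly like the following Python. Everything named here is an assumption made for illustration: the modulus `MODULUS`, the use of additive shares, XOR as the way random bits are combined, and a centered count of ones as the noise. A real secure function evaluation would keep shares on separate devices and add the noise inside the protected computation; this single-process version only shows the arithmetic.

```python
import secrets

# Hypothetical public modulus for additive sharing (an assumption, not from the patent).
MODULUS = 2**61 - 1


def split_into_shares(value, n_shares):
    """Split an integer into n_shares additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def combine_shares(all_shares):
    """Recombine shares from every device into the collective result."""
    return sum(all_shares) % MODULUS


def combine_random_bits(bit_vectors):
    """XOR the devices' bit vectors position by position."""
    combined = [0] * len(bit_vectors[0])
    for bits in bit_vectors:
        combined = [a ^ b for a, b in zip(combined, bits)]
    return combined


def noise_from_bits(bits):
    """Turn combined bits into zero-centered integer noise (ones minus half the length)."""
    return sum(bits) - len(bits) // 2


if __name__ == "__main__":
    subset_results = [12, 7, 30]  # one made-up query result per device
    shares = [s for r in subset_results for s in split_into_shares(r, 3)]
    device_bits = [[secrets.randbits(1) for _ in range(16)] for _ in subset_results]
    noise = noise_from_bits(combine_random_bits(device_bits))
    print("collective noisy result:", combine_shares(shares) + noise)
```

Because the bits are XORed, the combined bits stay uniformly random as long as at least one device contributes honest random bits, which is one plausible reason to combine them before deriving noise.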
-
Publication number: 20110064221
Abstract: User rating data may be received at a correlation engine through a network. The user rating data may include ratings generated by a plurality of users for a plurality of items. Correlation data may be generated from the received user rating data by the correlation engine. The correlation data may identify correlations between the items based on the user generated ratings. Noise may be generated by the correlation engine, and the generated noise may be added to the generated correlation data by the correlation engine to provide differential privacy protection to the user rating data.
Type: Application
Filed: September 11, 2009
Publication date: March 17, 2011
Applicant: Microsoft Corporation
Inventors: Frank D. McSherry, Ilya Mironov
-
Patent number: 7818335
Abstract: Systems and methods are provided for selectively determining privacy guarantees. For example, a first class of data may be guaranteed a first level of privacy, while other data classes are only guaranteed some lesser level of privacy. An amount of privacy is guaranteed by adding noise values to database query outputs. Noise distributions can be tailored to be appropriate for the particular data in a given database by calculating a “diameter” of the data. When the distribution is based on the diameter of a first class of data, and the diameter measurement does not account for additional data in the database, the result is that query outputs leak information about the additional data.
Type: Grant
Filed: December 22, 2005
Date of Patent: October 19, 2010
Assignee: Microsoft Corporation
Inventors: Cynthia Dwork, Frank D. McSherry
-
Patent number: 7769707
Abstract: Privacy of data can be preserved while utility of the output is maximized by selecting from an appropriately calculated distribution of noise values to add to an output. A distribution that includes a high likelihood of large noise values may lead to less useful output data. Conversely, a distribution that includes very low likelihood of large noise values may lead to less privacy. A distribution should be calculated to provide an appropriate level of output utility and privacy based on the query that is performed and the desired privacy level.
Type: Grant
Filed: November 30, 2005
Date of Patent: August 3, 2010
Assignee: Microsoft Corporation
Inventors: Cynthia Dwork, Frank D. McSherry
-
Patent number: 7739356
Abstract: An improved entity naming scheme employs the use of two sets of names: local names and global names. The local and global naming scheme may be applied to entities that are assigned to a number of different global compartments. Local entities are entities that are assigned to the same compartment, while non-local entities are entities that are assigned to different compartments. Each entity is assigned a local name that is unique among all local entities. Additionally, a number of global entities are identified. Global entities are entities that are referenced by one or more non-local entities. Each global entity is assigned a global name that is unique among all global entities.
Type: Grant
Filed: December 16, 2005
Date of Patent: June 15, 2010
Assignee: Microsoft Corporation
Inventors: Frank D. McSherry, Ulfar Erlingsson
-
Patent number: 7716144
Abstract: Techniques are provided that identify near-duplicate items in large collections of items. A list of (value, frequency) pairs is received, and a sample (value, instance) is returned. The value is chosen from the values of the first list, and the instance is a value less than frequency, in such a way that the probability of selecting the same sample from two lists is equal to the similarity of the two lists.
Type: Grant
Filed: March 22, 2007
Date of Patent: May 11, 2010
Assignee: Microsoft Corporation
Inventors: Frank D. McSherry, Kunal Talwar, Mark Steven Manasse
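Illustrative sketch (not taken from the patent): for non-negative integer frequencies, the sampling property in the abstract above can be reproduced naively by expanding each (value, frequency) pair into the pairs (value, 0) through (value, frequency - 1), hashing every pair, and returning the one with the smallest hash. Two lists then pick the same sample with probability equal to their weighted Jaccard similarity (sum of per-value minimum frequencies divided by sum of per-value maximum frequencies). The function names, the SHA-256 hash, and the `seed` parameter are assumptions for illustration, and the expansion costs time proportional to the total frequency, so this shows the property rather than an efficient method.

```python
import hashlib


def _pair_hash(value, instance, seed):
    """Deterministic 64-bit hash of a (value, instance) pair under a given seed."""
    data = f"{seed}|{value!r}|{instance}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def consistent_sample(pairs, seed=0):
    """Return the (value, instance) pair with the smallest hash, where instance < frequency.

    Returns None when the list has no pair with positive frequency.
    """
    best, best_key = None, None
    for value, frequency in pairs:
        for instance in range(frequency):
            key = (_pair_hash(value, instance, seed), repr(value), instance)
            if best_key is None or key < best_key:
                best, best_key = (value, instance), key
    return best


def estimate_similarity(list_a, list_b, trials=200):
    """Fraction of seeds for which both lists yield the same sample."""
    matches = sum(
        consistent_sample(list_a, seed) == consistent_sample(list_b, seed)
        for seed in range(trials)
    )
    return matches / trials


if __name__ == "__main__":
    doc_a = [("x", 3), ("y", 1)]
    doc_b = [("x", 1), ("y", 1)]
    # Weighted Jaccard similarity of these lists is (1 + 1) / (3 + 1) = 0.5.
    print(estimate_similarity(doc_a, doc_b))
```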
-
Patent number: 7698250
Abstract: Systems and methods are provided for controlling privacy loss associated with database participation. In general, privacy loss can be evaluated based on information available to a hypothetical adversary with access to a database under two scenarios: a first scenario in which the database does not contain data about a particular privacy principal, and a second scenario in which the database does contain data about the privacy principal. Such evaluation can be made for example by a mechanism for determining sensitivity of at least one database query output to addition to the database of data associated with a privacy principal. An appropriate noise distribution can be calculated based on the sensitivity measurement and optionally a privacy parameter. A noise value is selected from the distribution and added to query outputs.
Type: Grant
Filed: December 16, 2005
Date of Patent: April 13, 2010
Assignee: Microsoft Corporation
Inventors: Cynthia Dwork, Frank D. McSherry
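Illustrative sketch (not taken from the patent): one common way to calibrate noise to a sensitivity measurement and a privacy parameter is the Laplace mechanism, which draws noise from a Laplace distribution with scale sensitivity / epsilon. The abstract above does not name a distribution, so the Laplace choice, the function name `noisy_answer`, and the parameter name `epsilon` are assumptions for illustration.

```python
import random


def noisy_answer(true_answer, sensitivity, epsilon, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to a numeric query output.

    sensitivity: largest change in the query output caused by adding one
                 principal's data to the database (assumed to be known).
    epsilon:     privacy parameter; smaller values mean more noise.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rate = epsilon / sensitivity
    # The difference of two independent exponentials with the same rate
    # is Laplace-distributed with scale 1 / rate.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_answer + noise


if __name__ == "__main__":
    # A counting query changes by at most 1 when one person's data is added.
    print(noisy_answer(true_answer=1234, sensitivity=1.0, epsilon=0.5))
```

Under this choice, halving epsilon or doubling the sensitivity doubles the typical magnitude of the added noise, which is the utility and privacy trade-off the abstracts in this listing describe.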
-
Publication number: 20100070511
Abstract: Documents that are near-duplicates may be determined using techniques involving consistent uniform hashing. A biased bit may be placed in the leading position of a sequence of bits that may be generated and subsequently used in comparison techniques to determine near-duplicate documents. Unbiased bits may be used in subsequent positions of the sequence of bits, after the biased bit, for use in comparison techniques. Samples may be used collectively, as opposed to individually, in the generation of biased bits. Sequences of bits may thus be produced not on a single sample basis, but for multiple samples, thereby amortizing the cost of generating randomness for the samples. Less than one bit of randomness per sample may be used.
Type: Application
Filed: September 17, 2008
Publication date: March 18, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Mark Steven Manasse, Frank D. McSherry, Kunal Talwar
-
Patent number: 7676513
Abstract: While consulting indexes to conduct a search, a determination is made from time to time as to whether it is more efficient to consult individual indexes in a set or to merge the indexes and consult the merged index. The cost of merging indexes is compared with the cost of individually querying indexes. In accordance with the result of this comparison, the indexes are merged and the merged index is consulted, or the indexes are individually consulted. A cost-balance invariant in the form of an inequality is used to equate the cost of merging indexes to a weighted cost of individually querying indexes. As query events are received, the costs are updated. As long as the cost-balance invariant is not violated, indexes are merged and the merged index is queried. If the cost-balance invariant is violated, indexes are not merged, and the indexes are individually queried.
Type: Grant
Filed: January 6, 2006
Date of Patent: March 9, 2010
Assignee: Microsoft Corporation
Inventors: Frank D. McSherry, John P. MacCormick