Patents by Inventor Frank McSherry

Frank McSherry has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

High level programming extensions for distributed data parallel processing

Patent number: 8209664

Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. A set of extensions to a sequential high-level computing language are provided to support distributed parallel computations and to facilitate generation and optimization of distributed execution plans. The extensions are fully integrated with the programming language, thereby enabling developers to write sequential language programs using known constructs while providing the ability to invoke the extensions to enable better generation and optimization of the execution plan for a distributed computing environment.

Type: Grant

Filed: March 18, 2009

Date of Patent: June 26, 2012

Assignee: Microsoft Corporation

Inventors: Yuan Yu, Ulfar Erlingsson, Michael A. Isard, Frank McSherry

High level programming extensions for distributed data parallel processing

Publication number: 20100241827

Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. A set of extensions to a sequential high-level computing language are provided to support distributed parallel computations and to facilitate generation and optimization of distributed execution plans. The extensions are fully integrated with the programming language, thereby enabling developers to write sequential language programs using known constructs while providing the ability to invoke the extensions to enable better generation and optimization of the execution plan for a distributed computing environment.

Type: Application

Filed: March 18, 2009

Publication date: September 23, 2010

Applicant: Microsoft Corporation

Inventors: Yuan Yu, Ulfar Erlingsson, Michael A. Isard, Frank McSherry
Bidding on related keywords

Publication number: 20090234734

Abstract: Advertising slots on a search engine results page may be determined based on keywords and/or results to a user query. Advertisers may use the keywords and/or the results to the query to place their ads into the advertising slots. Rules may be applied to determine how ads are displayed or not displayed. For example, a larger set of keywords may be inferred from the initial set of keywords on which the ad provider placed her bids. This greatly increases the potential reach of an advertiser's ad campaign or a search engine provider's revenue from ad placement.

Type: Application

Filed: March 17, 2008

Publication date: September 17, 2009

Applicant: Microsoft Corporation

Inventors: Sreenivas Gollapudi, Frank McSherry, Rina Panigrahy, Kunal Talwar
Consistent contingency table release

Publication number: 20090182797

Abstract: Techniques for contingency table release provide an accurate and consistent set of tables while guaranteeing that privacy is preserved. A positive and integral database is constructed that corresponds to these tables. Therefore, a database can be generated that preserves low-order marginals up to a small error. Moreover, a gracefully degrading version of the results is provided as a database can be computed such that the error in the low-order marginals is small, and increases smoothly with the order of the marginal.

Type: Application

Filed: January 10, 2008

Publication date: July 16, 2009

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry, Kunal Talwar, Boaz Barak, Kamalika Chaudhuri, Satyen Kale
Hash tables

Publication number: 20070234005

Abstract: Hash tables comprising load factors of up to and above 97% are disclosed. The hash tables may be associated with three or more hash functions, each hash function being applied to a key to identify a location in a hash table. The load factor of a hash table may be increased, obviating any need to increase the size of the hash table to accommodate more insertions. Such increase in load factor may be accomplished by a combination of increasing the number of cells per bucket in a hash table and increasing the number of hash functions associated with the hash table.

Type: Application

Filed: March 29, 2006

Publication date: October 4, 2007

Applicant: Microsoft Corporation

Inventors: Ulfar Erlingsson, Mark Manasse, Frank McSherry, Abraham Flaxman
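
The combination described in the abstract above (several hash functions per key, several cells per bucket) can be illustrated with a minimal sketch. The class name, the use of Python's built-in `hash` on an `(index, key)` pair, and the "least-loaded candidate bucket" placement rule are all assumptions made for illustration; the sketch performs no relocation of existing keys and is not the method claimed in the application.

```python
class MultiChoiceHashTable:
    """Illustrative table: each key may live in one of several candidate buckets."""

    def __init__(self, num_buckets, cells_per_bucket=4, num_hashes=3):
        self.num_buckets = num_buckets
        self.cells_per_bucket = cells_per_bucket
        self.num_hashes = num_hashes
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0

    def _candidates(self, key):
        # Hypothetical hash family: salt the key with the function index.
        return [hash((i, key)) % self.num_buckets for i in range(self.num_hashes)]

    def contains(self, key):
        # A lookup probes every candidate bucket.
        return any(key in self.buckets[b] for b in self._candidates(key))

    def insert(self, key):
        """Return True if the key is stored, False if every candidate bucket is full."""
        if self.contains(key):
            return True
        # Assumed placement rule: pick the candidate bucket with the most free cells.
        best = min(self._candidates(key), key=lambda b: len(self.buckets[b]))
        if len(self.buckets[best]) >= self.cells_per_bucket:
            return False
        self.buckets[best].append(key)
        self.size += 1
        return True

    def load_factor(self):
        # Fraction of all cells that are occupied.
        return self.size / (self.num_buckets * self.cells_per_bucket)
```

Raising `cells_per_bucket` or `num_hashes` gives each key more places to land, which is the lever the abstract describes for reaching higher load factors without growing the table.
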
Scheduling of index merges

Publication number: 20070174314

Abstract: While consulting indexes to conduct a search, a determination is made from time to time as to whether it is more efficient to consult individual indexes in a set or to merge the indexes and consult the merged index. The cost of merging indexes is compared with the cost of individually querying indexes. In accordance with the result of this comparison, the indexes are merged and the merged index is consulted, or the indexes are individually consulted. A cost-balance invariant in the form of an inequality is used to equate the cost of merging indexes to a weighted cost of individually querying indexes. As query events are received, the costs are updated. As long as the cost-balance invariant is not violated, indexes are merged and the merged index is queried. If the cost-balance invariant is violated, indexes are not merged, and the indexes are individually queried.

Type: Application

Filed: January 6, 2006

Publication date: July 26, 2007

Applicant: Microsoft Corporation

Inventors: Frank McSherry, John MacCormick
Protection against timing and resource consumption attacks

Publication number: 20070150437

Abstract: Systems and methods are provided for obscuring an amount of a resource used to process an item. In general, contemplated techniques comprise assigning a maximum allowable amount of the resource for processing a sub-part of the item. If the maximum allowable amount of the resource is reached, processing the sub-part may be terminated. Once all sub-parts are processed, a noisy quantity of the resource that was consumed in processing the item may be released. The noisy quantity is determined by adding a positive amount of the resource, combined with a noise value, to an actual quantity of the resource that was consumed.

Type: Application

Filed: December 22, 2005

Publication date: June 28, 2007

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry, Ilya Mironov
Selective privacy guarantees

Publication number: 20070147606

Abstract: Systems and methods are provided for selectively determining privacy guarantees. For example, a first class of data may be guaranteed a first level of privacy, while other data classes are only guaranteed some lesser level of privacy. An amount of privacy is guaranteed by adding noise values to database query outputs. Noise distributions can be tailored to be appropriate for the particular data in a given database by calculating a “diameter” of the data. When the distribution is based on the diameter of a first class of data, and the diameter measurement does not account for additional data in the database, the result is that query outputs leak information about the additional data.

Type: Application

Filed: December 22, 2005

Publication date: June 28, 2007

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry
Global and local entity naming

Publication number: 20070143437

Abstract: An improved entity naming scheme employs the use of two sets of names: local names and global names. The local and global naming scheme may be applied to entities that are assigned to a number of different global compartments. Local entities are entities that are assigned to the same compartment, while non-local entities are entities that are assigned to different compartments. Each entity is assigned a local name that is unique among all local entities. Additionally, a number of global entities are identified. Global entities are entities that are referenced by one or more non-local entities. Each global entity is assigned a global name that is unique among all global entities.

Type: Application

Filed: December 16, 2005

Publication date: June 21, 2007

Applicant: Microsoft Corporation

Inventors: Frank McSherry, Ulfar Erlingsson
Differential data privacy

Publication number: 20070143289

Abstract: Systems and methods are provided for controlling privacy loss associated with database participation. In general, privacy loss can be evaluated based on information available to a hypothetical adversary with access to a database under two scenarios: a first scenario in which the database does not contain data about a particular privacy principal, and a second scenario in which the database does contain data about the privacy principal. Such evaluation can be made for example by a mechanism for determining sensitivity of at least one database query output to addition to the database of data associated with a privacy principal. An appropriate noise distribution can be calculated based on the sensitivity measurement and optionally a privacy parameter. A noise value is selected from the distribution and added to query outputs.

Type: Application

Filed: December 16, 2005

Publication date: June 21, 2007

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry
Noisy histograms

Publication number: 20070136027

Abstract: A histogram can be generated and displayed with noisy category values, where the noise values are selected from a noise distribution that is calculated using a histogram diameter. The noise values are combined with histogram category values, thereby producing noisy histogram category values that do not reveal information about the contributors.

Type: Application

Filed: December 9, 2005

Publication date: June 14, 2007

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry
Exponential noise distribution to optimize database privacy and output utility

Publication number: 20070130147

Abstract: An amount of noise to add to a query output may be selected to preserve privacy of inputs while maximizing utility of the released output. Noise values can be distributed according to a substantially symmetric exponential density function (“exponential distribution”). That is, the most likely noise value can be zero, and noise values of increasing absolute value can decrease in probability according to the exponential function.

Type: Application

Filed: December 2, 2005

Publication date: June 7, 2007

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry
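
A symmetric exponential density of the kind described in the abstract above is commonly known as a Laplace distribution. The sketch below shows one standard way to sample it and add the result to a query output. The function names and the `scale` parameter are assumptions for illustration; how the scale should be chosen for a given query and privacy level is not stated in the abstract and is left to the caller.

```python
import random


def sample_symmetric_exponential(scale, rng):
    """Draw noise with density proportional to exp(-|x| / scale).

    The difference of two independent exponential variates with the same
    rate is symmetric around zero and has exactly this density.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_query_output(true_output, scale, rng=None):
    """Return the query output with symmetric exponential noise added."""
    if rng is None:
        rng = random.Random()
    return true_output + sample_symmetric_exponential(scale, rng)
```

For example, `noisy_query_output(100, 2.0)` returns a value near 100. Zero is the most likely noise value, and larger deviations become exponentially less likely as `scale` shrinks.
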
Data diameter privacy policies

Publication number: 20070124268

Abstract: Privacy of data can be preserved while utility of the output is maximized by selecting from an appropriately calculated distribution of noise values to add to an output. A distribution that includes a high likelihood of large noise values may lead to less useful output data. Conversely, a distribution that includes very low likelihood of large noise values may lead to less privacy. A distribution should be calculated to provide an appropriate level of output utility and privacy based on the query that is performed and the desired privacy level.

Type: Application

Filed: November 30, 2005

Publication date: May 31, 2007

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry
Noise in secure function evaluation

Publication number: 20070083493

Abstract: Techniques are provided for injecting noise into secure function evaluation to protect the privacy of the participants. A system and method are illustrated that can compute a collective noisy result by combining results and noise generated based on input from the participants. When implemented using distributed computing devices, each device may have access to a subset of data. A query may be distributed to the devices, and each device applies the query to its own subset of data to obtain a subset result. Each device then divides its subset result into one or more shares, and the shares are combined to form a collective result. The devices may also generate random bits. The random bits may be combined and used to generate noise. The collective result can be combined with the noise to obtain a collective noisy result.

Type: Application

Filed: October 6, 2005

Publication date: April 12, 2007

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry
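
The data flow in the abstract above (subset results, shares, combined random bits, noise) can be simulated in a single process. This is only a sketch under stated assumptions: the modulus, the additive sharing scheme, the XOR combination of bits, and the centered-binomial noise are illustrative choices, and all helper names are hypothetical. A real deployment would exchange shares between devices and generate the noise inside the secure evaluation itself.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus for additive sharing


def split_into_shares(value, num_shares):
    """Split a value into additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_shares - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def combine_shares(shares):
    """Recombine additive shares."""
    return sum(shares) % MODULUS


def combined_random_bits(per_device_bits):
    """XOR the devices' bit strings position by position."""
    return [sum(column) % 2 for column in zip(*per_device_bits)]


def noise_from_bits(bits):
    """Assumed noise rule: centered binomial, in [-len(bits)//2, len(bits) - len(bits)//2]."""
    return sum(bits) - len(bits) // 2


def collective_noisy_result(subsets, query, num_noise_bits=64):
    """Return (collective result, collective noisy result) for non-negative query results."""
    all_shares = []
    for data in subsets:
        # Each device applies the query to its own subset and shares the result.
        all_shares.extend(split_into_shares(query(data), len(subsets)))
    collective = combine_shares(all_shares)
    per_device_bits = [
        [secrets.randbits(1) for _ in range(num_noise_bits)] for _ in subsets
    ]
    noise = noise_from_bits(combined_random_bits(per_device_bits))
    return collective, collective + noise
```

Because the bits are XORed, the combined bits are uniformly random as long as at least one device contributes honest random bits, which is the usual reason for combining randomness this way.
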
Determination of useful convergence of static rank

Publication number: 20070061315

Abstract: An input or query is determined for which a search engine's static ranking computation is the answer. By understanding how this input or query differs from the posed input or query, the precise termination point of an iterative convergence problem can be determined. An iterative process provides the following inputs to the system: a graph of hyperlinks, and a vector of how the probability mass is redistributed. Given the set of ranks (the output results), it is determined how the input (e.g., the query) would have to be changed to get the rank(s) as the answer or result. Backward answer analysis is provided in the web page context. The difference between what was asked and what should have been asked is determined. After the difference is computed, it is determined if the iterative process should be stopped or not.

Type: Application

Filed: September 15, 2005

Publication date: March 15, 2007

Applicant: Microsoft Corporation

Inventor: Frank McSherry
Private clustering and statistical queries while analyzing a large database

Publication number: 20060200431

Abstract: A database has a plurality of entries and a plurality of attributes common to each entry, where each entry corresponds to an individual. A query is received from a querying entity and is passed to the database, and an answer is received in response. An amount of noise is generated and added to the answer to result in an obscured answer, and the obscured answer is returned to the querying entity. The noise is normally distributed around zero with a particular variance. The variance R may be determined in accordance with R > 8 T log²(T/δ)/ε², where T is the permitted number of queries, δ is the utter failure probability, and ε is the largest admissible increase in confidence. Thus, a level of protection of privacy is provided to each individual represented within the database. Example noise generation techniques, systems, and methods may be used for privacy preservation in such areas as k-means, principal component analysis, statistical query learning models, and perceptron algorithms.

Type: Application

Filed: March 1, 2005

Publication date: September 7, 2006

Applicant: Microsoft Corporation

Inventors: Cynthia Dwork, Frank McSherry, Yaacov Nissim Kobliner, Avrim Blum
Efficiently ranking web pages via matrix index manipulation and improved caching

Publication number: 20060026191

Abstract: Methods and systems are described for computing page rankings more efficiently. Using an interconnectivity matrix describing the interconnection of web pages, a new matrix is computed. The new matrix is used to compute the average of values associated with each web page's neighboring web pages. The secondary eigenvector of this new matrix is computed, and indices for web pages are relabeled according to the eigenvector. The data structure storing the interconnectivity information is preferably also physically sorted according to the eigenvector. By reorganizing the matrix used in the web page ranking computations, caching is performed more efficiently, resulting in faster page ranking techniques. Methods for efficiently allocating the distribution of resources are also described.

Type: Application

Filed: July 30, 2004

Publication date: February 2, 2006

Applicant: Microsoft Corporation

Inventor: Frank McSherry
Partitioning social networks

Publication number: 20060015588

Abstract: The present invention provides a unique system and method that facilitates reducing network traffic between a plurality of servers located on a social-based network. The system and method involve identifying a plurality of vertices or service users on the network with respect to their server or network locations. The vertices' contacts or connections can be located or determined as well. In order to minimize communication traffic, the vertices and their connections with respect to their respective server locations can be analyzed to determine whether at least a subset of nodes should be moved or relocated to another server to facilitate mitigating network traffic while balancing user load among the various servers or parts of the network. Thus, an underlying social network can be effectively partitioned. In addition, the network can be parsed into a collection of nested layers, whereby each successively less dense layer can be partitioned with respect to the previous (partitioned) more dense layer.

Type: Application

Filed: June 30, 2004

Publication date: January 19, 2006

Applicant: Microsoft Corporation

Inventors: Dimitris Achlioptas, Frank McSherry
Efficient computation of web page rankings

Publication number: 20060004811

Abstract: Methods and systems are provided for efficiently computing page rankings of web pages or other interconnected objects. The rankings are produced by efficiently computing a principal eigenvector of a page ranking transition matrix. The methods and systems provided herein can be used to produce page rankings in a distributed and/or incremental manner, and can be used to allocate computing resources to processing page rankings for those pages that most demand them.

Type: Application

Filed: July 1, 2004

Publication date: January 5, 2006

Applicant: Microsoft Corporation

Inventor: Frank McSherry
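
For orientation, the principal eigenvector mentioned in the abstract above can be approximated with plain power iteration. The sketch below is the textbook technique, not the efficient, distributed, or incremental methods the application describes. The function name, the damping factor of 0.85, the convergence tolerance, and the uniform treatment of pages without outgoing links are assumptions.

```python
def page_rank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate page ranks by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page receives a uniform share of the teleportation mass.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumed rule: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank
```

Calling `page_rank({"a": ["c"], "b": ["c"], "c": ["a"]})` ranks `c` highest, then `a`, then `b`, since `c` is linked from both other pages and `b` has no incoming links.
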
Personalization of web page search rankings

Publication number: 20050149502

Abstract: Methods and systems are provided for efficiently computing personalized rankings of web pages or other interconnected objects. The personalized rankings are produced by efficiently computing an approximation matrix to an ideal personalized page ranking matrix. The methods and systems provided herein can be used to produce search results with particular relevance to an individual searcher.

Type: Application

Filed: January 5, 2004

Publication date: July 7, 2005

Applicant: Microsoft Corporation

Inventor: Frank McSherry
