Patents by Inventor Christopher A. Olston
Christopher A. Olston has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10789544
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for batching inputs to machine learning models. One of the methods includes receiving a stream of requests, each request identifying a respective input for processing by a first machine learning model; adding the respective input from each request to a first queue of inputs for processing by the first machine learning model; determining, at a first time, that a count of inputs in the first queue as of the first time equals or exceeds a maximum batch size and, in response: generating a first batched input from the inputs in the queue as of the first time so that a count of inputs in the first batched input equals the maximum batch size, and providing the first batched input for processing by the first machine learning model.
Type: Grant
Filed: April 5, 2016
Date of Patent: September 29, 2020
Inventors: Noah Fiedel, Christopher Olston, Jeremiah Harmsen
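The batching step described in the abstract above can be illustrated with a minimal sketch. This is not the patented implementation; the class name `BatchingQueue` and its methods are hypothetical, and the model call is left out entirely. It only shows the idea of queuing inputs and emitting a batch of exactly the maximum size once enough inputs have arrived.

```python
from collections import deque


class BatchingQueue:
    """Hypothetical queue that releases batches of exactly max_batch_size inputs."""

    def __init__(self, max_batch_size):
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        self.max_batch_size = max_batch_size
        self._queue = deque()

    def add(self, item):
        # Each incoming request contributes one input to the queue.
        self._queue.append(item)

    def take_full_batch(self):
        # Only form a batch when the queue holds at least max_batch_size inputs.
        if len(self._queue) < self.max_batch_size:
            return None
        # The batch contains exactly max_batch_size inputs, oldest first;
        # any surplus inputs stay queued for a later batch.
        return [self._queue.popleft() for _ in range(self.max_batch_size)]
```

In this sketch a caller would add inputs as requests arrive and pass any non-`None` batch to the model; inputs beyond the maximum batch size wait in the queue.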
-
Patent number: 10417439
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a catalog for multiple datasets, the method comprising accessing multiple extant data sets, the extant data sets including data sets that are independently generated and structurally dissimilar; organizing the data sets into collections, each data set in each collection belonging to the collection based on collection data associated with the data set; for each collection of data sets: determining, from a subset of the data sets that belong to the collection, metadata that describe the data sets that belong to the collection, wherein the metadata does not include the collection data, and attributing, to other data sets in the collection, the metadata determined from the subset of data sets; and generating, from the collections of data sets and the determined metadata, a catalog for the multiple datasets.
Type: Grant
Filed: April 6, 2017
Date of Patent: September 17, 2019
Assignee: Google LLC
Inventors: Philip Korn, Steven Euijong Whang, Natalya Fridman Noy, Sudip Roy, Neoklis Polyzotis, Alon Yitzchak Halevy, Christopher Olston
-
Publication number: 20170293671
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a catalog for multiple datasets, the method comprising accessing multiple extant data sets, the extant data sets including data sets that are independently generated and structurally dissimilar; organizing the data sets into collections, each data set in each collection belonging to the collection based on collection data associated with the data set; for each collection of data sets: determining, from a subset of the data sets that belong to the collection, metadata that describe the data sets that belong to the collection, wherein the metadata does not include the collection data, and attributing, to other data sets in the collection, the metadata determined from the subset of data sets; and generating, from the collections of data sets and the determined metadata, a catalog for the multiple datasets.
Type: Application
Filed: April 6, 2017
Publication date: October 12, 2017
Inventors: Philip Korn, Steven Euijong Whang, Natalya Fridman Noy, Sudip Roy, Neoklis Polyzotis, Alon Yitzchak Halevy, Christopher Olston
-
Publication number: 20170286864
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for batching inputs to machine learning models. One of the methods includes receiving a stream of requests, each request identifying a respective input for processing by a first machine learning model; adding the respective input from each request to a first queue of inputs for processing by the first machine learning model; determining, at a first time, that a count of inputs in the first queue as of the first time equals or exceeds a maximum batch size and, in response: generating a first batched input from the inputs in the queue as of the first time so that a count of inputs in the first batched input equals the maximum batch size, and providing the first batched input for processing by the first machine learning model.
Type: Application
Filed: April 5, 2016
Publication date: October 5, 2017
Inventors: Noah Fiedel, Christopher Olston, Jeremiah Harmsen
-
Patent number: 8965842
Abstract: A method and system are given for providing a virtual environment spanning a desktop and a cloud. In one example, the method includes receiving a query template over a data set that resides in the cloud, optimizing the query template to segment the query template into an offline phase and an online phase, executing the offline phase on the cloud to build one or more indexes, and sending the one or more indexes to the desktop.
Type: Grant
Filed: July 15, 2014
Date of Patent: February 24, 2015
Assignee: Yahoo! Inc.
Inventor: Christopher Olston
-
Patent number: 8949834
Abstract: Disclosed are methods and apparatus for scheduling an asynchronous workflow having a plurality of processing paths. In one embodiment, one or more predefined constraint metrics that constrain temporal asynchrony for one or more portions of the workflow may be received or provided. Input data is periodically received or intermediate or output data is generated for one or more of the processing paths of the workflow, via one or more operators, based on a scheduler process. One or more of the processing paths for generating the intermediate or output data are dynamically selected based on received input data or generated intermediate or output data and the one or more constraint metrics. The selected one or more processing paths of the workflow are then executed so that each selected processing path generates intermediate or output data for the workflow.
Type: Grant
Filed: April 7, 2010
Date of Patent: February 3, 2015
Assignee: Yahoo! Inc.
Inventor: Christopher A. Olston
-
Publication number: 20140324822
Abstract: A method and system are given for providing a virtual environment spanning a desktop and a cloud. In one example, the method includes receiving a query template over a data set that resides in the cloud, optimizing the query template to segment the query template into an offline phase and an online phase, executing the offline phase on the cloud to build one or more indexes, and sending the one or more indexes to the desktop.
Type: Application
Filed: July 15, 2014
Publication date: October 30, 2014
Inventor: Christopher Olston
-
Patent number: 8838527
Abstract: A method and system are given for providing a virtual environment spanning a desktop and a cloud. In one example, the method includes receiving a query template over a data set that resides in the cloud, optimizing the query template to segment the query template into an offline phase and an online phase, executing the offline phase on the cloud to build one or more indexes, and sending the one or more indexes to the desktop.
Type: Grant
Filed: November 6, 2008
Date of Patent: September 16, 2014
Assignee: Yahoo! Inc.
Inventor: Christopher Olston
-
Patent number: 8745183
Abstract: An improved system and method is provided for adaptively refreshing a web page. A base version of the web page may be partitioned into a collection of fragments. Then the collection of fragments may be compared with the corresponding fragments of a recent version of the web page to determine a divergence measurement of the difference between the base version and the recent version of the web page. The divergence measurement may be recorded in a change profile representing a change history of the web page that includes a sequence of numeric pairs indicating a time offset and a divergence measurement of the difference between a version of the web page at the time offset and a base version of the web page. The refresh period for the web page may be adjusted by applying an adaptive refresh policy using the divergence measurements recorded in the change profile.
Type: Grant
Filed: October 26, 2006
Date of Patent: June 3, 2014
Assignee: Yahoo! Inc.
Inventor: Christopher Olston
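As a rough illustration of the change-profile idea in the abstract above, the following sketch records (time offset, divergence) pairs and adjusts a refresh period from them. Everything specific here is an assumption made for illustration and is not taken from the patent: fragments are taken to be non-empty lines, divergence is the fraction of base fragments missing from the recent version, and the policy simply halves or doubles the period around a hypothetical threshold.

```python
def fragments(page_text):
    # Assumption: one fragment per non-empty line of the page text.
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def divergence(base_text, recent_text):
    # Assumption: divergence is the fraction of base fragments
    # that no longer appear anywhere in the recent version.
    base = fragments(base_text)
    if not base:
        return 0.0
    recent = set(fragments(recent_text))
    missing = sum(1 for fragment in base if fragment not in recent)
    return missing / len(base)


def adjust_refresh_period(period, change_profile, threshold=0.25,
                          min_period=1.0, max_period=64.0):
    # change_profile is a list of (time_offset, divergence) pairs.
    # Hypothetical policy: refresh more often after a large recent change,
    # less often otherwise, within fixed bounds.
    if not change_profile:
        return period
    _, latest_divergence = change_profile[-1]
    if latest_divergence > threshold:
        return max(min_period, period / 2)
    return min(max_period, period * 2)
```

A caller would append a new pair to the change profile after each fetch and then recompute the period; a real policy could weigh the whole history rather than only the latest pair.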
-
Patent number: 8732160
Abstract: A method and a system are provided for exploring a large textual data set via interactive aggregation. In one example, the method includes receiving the large textual data set and an original query template, building an index for the query template, wherein the building the index comprises ordering the index a particular way to optimize query time, receiving one or more bindings for the original query template, computing an answer to the original query template using the index and the one or more bindings, and anticipating one or more future queries that a user may submit and that are related to the original query template.
Type: Grant
Filed: November 11, 2008
Date of Patent: May 20, 2014
Assignee: Yahoo! Inc.
Inventor: Christopher Olston
-
Patent number: 8312011
Abstract: The present invention relates to methods, systems, and computer readable media comprising instructions for identifying needy queries for which additional responsive content is needed. The method of the present invention comprises receiving a query comprising one or more terms and retrieving one or more content items identified as responsive to the query, the one or more content items ranked according to one or more ranking techniques. A score is generated for the one or more ranked content items identified as responsive to the query. A determination is thereafter made as to whether the query is needy based upon a comparison of the one or more scores associated with the one or more content items identified as responsive to the query and a needy query score threshold.
Type: Grant
Filed: May 16, 2011
Date of Patent: November 13, 2012
Assignee: Yahoo! Inc.
Inventors: Christopher Olston, Sandeep Pandey
-
Publication number: 20110252427
Abstract: Disclosed are methods and apparatus for scheduling an asynchronous workflow having a plurality of processing paths. In one embodiment, one or more predefined constraint metrics that constrain temporal asynchrony for one or more portions of the workflow may be received or provided. Input data is periodically received or intermediate or output data is generated for one or more of the processing paths of the workflow, via one or more operators, based on a scheduler process. One or more of the processing paths for generating the intermediate or output data are dynamically selected based on received input data or generated intermediate or output data and the one or more constraint metrics. The selected one or more processing paths of the workflow are then executed so that each selected processing path generates intermediate or output data for the workflow.
Type: Application
Filed: April 7, 2010
Publication date: October 13, 2011
Applicant: YAHOO! INC.
Inventor: Christopher A. Olston
-
Publication number: 20110218991
Abstract: The present invention relates to methods, systems, and computer readable media comprising instructions for identifying needy queries for which additional responsive content is needed. The method of the present invention comprises receiving a query comprising one or more terms and retrieving one or more content items identified as responsive to the query, the one or more content items ranked according to one or more ranking techniques. A score is generated for the one or more ranked content items identified as responsive to the query. A determination is thereafter made as to whether the query is needy based upon a comparison of the one or more scores associated with the one or more content items identified as responsive to the query and a needy query score threshold.
Type: Application
Filed: May 16, 2011
Publication date: September 8, 2011
Applicant: YAHOO! INC.
Inventors: Christopher Olston, Sandeep Pandey
-
Patent number: 7991769
Abstract: An improved system and method is provided for searching a collection of objects that may be located in hierarchies of auxiliary information for retrieval of response objects. A framework to perform a generalization search in hierarchies may be used to generalize a search by moving up to a higher level in a hierarchy of taxonomies or to specialize a search by moving down to a lower level in the hierarchy of taxonomies. Once the system may decide to enumerate response objects at a particular level of generalization, a budgeted generalization search may be used for enumerating a set of response objects within a budgeted cost.
Type: Grant
Filed: July 7, 2006
Date of Patent: August 2, 2011
Assignee: Yahoo! Inc.
Inventors: Marcus Felipe Fontoura, Vanja Josifovski, Christopher Olston, Shanmugasundaram Ravikumar, Andrew Tomkins
-
Patent number: 7970760
Abstract: Methods, systems, and computer readable media comprising instructions for identifying needy queries for which additional responsive content is needed. A method comprises receiving a query comprising one or more terms and retrieving one or more content items identified as responsive to the query, the one or more content items ranked according to one or more ranking techniques. A score is generated for the one or more ranked content items identified as responsive to the query. A determination is thereafter made as to whether the query is needy based upon a comparison of the one or more scores associated with the one or more content items identified as responsive to the query and a needy query score threshold.
Type: Grant
Filed: March 11, 2008
Date of Patent: June 28, 2011
Assignee: Yahoo! Inc.
Inventors: Christopher Olston, Sandeep Pandey
-
Patent number: 7921416
Abstract: The present invention, in an example embodiment, provides a special-purpose formal language and translator for the parallel processing of large databases in a distributed system. The special-purpose language has features of both a declarative programming language and a procedural programming language and supports the co-grouping of tables, each with an arbitrary alignment function, and the specification of procedural operations to be performed on the resulting co-groups. The language's translator translates a program in the language into optimized structured calls to an application programming interface for implementations of functionality related to the parallel processing of tasks over a distributed system. In an example embodiment, the application programming interface includes interfaces for MapReduce functionality, whose implementations are supplemented by the embodiment.
Type: Grant
Filed: October 20, 2006
Date of Patent: April 5, 2011
Assignee: Yahoo! Inc.
Inventors: Marcus Felipe Fontoura, Vanja Josifovski, Shanmugasundaram Ravikumar, Christopher Olston, Benjamin Clay Reed, Andrew Tomkins
-
Patent number: 7899807
Abstract: An improved system and method for crawl ordering of a web crawler by impact upon search results of a search engine is provided. Content-independent features of uncrawled web pages may be obtained, and the impact of uncrawled web pages may be estimated for queries of a workload using the content-independent features. The impact of uncrawled web pages may be estimated for queries by computing an expected impact score for uncrawled web pages that match needy queries. Query sketches may be created for a subset of the queries by computing an expected impact score for crawled web pages and uncrawled web pages matching the queries. Web pages may then be selected to fetch using a combined query-based estimate and query-independent estimate of the impact of fetching the web pages on search query results.
Type: Grant
Filed: December 20, 2007
Date of Patent: March 1, 2011
Assignee: Yahoo! Inc.
Inventors: Christopher Olston, Sandeep Pandey
-
Patent number: 7805447
Abstract: Computer-implemented methods, modules and clients relate to expanded, pruned sample table for testing database queries against a base table. The expanded, pruned sample table is formed from the base table by a process of initial sampling, synthesis, and pruning.
Type: Grant
Filed: January 16, 2008
Date of Patent: September 28, 2010
Assignee: Yahoo! Inc.
Inventors: Christopher Olston, Utkarsh Srivastava
-
Publication number: 20100121847
Abstract: A method and a system are provided for exploring a large textual data set via interactive aggregation. In one example, the method includes receiving the large textual data set and an original query template, building an index for the query template, wherein the building the index comprises ordering the index a particular way to optimize query time, receiving one or more bindings for the original query template, computing an answer to the original query template using the index and the one or more bindings, and anticipating one or more future queries that a user may submit and that are related to the original query template.
Type: Application
Filed: November 11, 2008
Publication date: May 13, 2010
Inventor: Christopher Olston
-
Publication number: 20100114867
Abstract: A method and system are given for providing a virtual environment spanning a desktop and a cloud. In one example, the method includes receiving a query template over a data set that resides in the cloud, optimizing the query template to segment the query template into an offline phase and an online phase, executing the offline phase on the cloud to build one or more indexes, and sending the one or more indexes to the desktop.
Type: Application
Filed: November 6, 2008
Publication date: May 6, 2010
Inventor: Christopher Olston