Patents by Inventor Michael Bendersky

Michael Bendersky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method for Training Large Language Models to Perform Query Intent Classification

Publication number: 20240135187

Abstract: Provided are computing systems, methods, and platforms that train query processing models, such as large language models, to perform query intent classification tasks by using retrieval augmentation and multi-stage distillation. Unlabeled training examples of queries may be obtained, and a set of the training examples may be augmented with additional feature annotations to generate augmented training examples. A first query processing model may annotate the retrieval augmented queries to generate inferred labels for the augmented training examples. A second query processing model may be trained on the inferred labels, distilling the query processing model that was trained with retrieval augmentation into a non-retrieval augmented query processing model. The second query processing model may annotate the entire set of unlabeled training examples. Another stage of distillation may train a third query processing model using the entire set of unlabeled training examples without retrieval augmentation.

Type: Application

Filed: October 22, 2023

Publication date: April 25, 2024

Inventors: Krishna Pragash Srinivasan, Michael Bendersky, Anupam Samanta, Lingrui Liao, Luca Bertelli, Ming-Wei Chang, Iftekhar Naim, Siddhartha Brahma, Siamak Shakeri, Hongkun Yu, John Nham, Karthik Raman, Raphael Dominik Hoffmann
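
The multi-stage distillation flow described in the abstract above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the method claimed in the application: `RETRIEVAL_INDEX`, `retrieve`, `teacher_label`, `train_token_model`, `distill`, the keyword rule, and the intent labels are all hypothetical stand-ins for real retrieval systems and language models.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for a retrieval system: maps a query to extra context.
RETRIEVAL_INDEX = {
    "jaguar price": "car dealership listing",
    "jaguar habitat": "wildlife encyclopedia",
    "python install": "software download page",
}


def retrieve(query):
    """Return retrieved context for a query, or an empty string if none."""
    return RETRIEVAL_INDEX.get(query, "")


def teacher_label(query, context):
    """First model (a toy rule here): infers an intent label using retrieved context."""
    if "dealership" in context or "download" in context:
        return "transactional"
    return "informational"


def train_token_model(labeled):
    """Train a toy student from (query, label) pairs; it uses no retrieval at inference."""
    votes = defaultdict(Counter)
    for query, label in labeled:
        for tok in query.split():
            votes[tok][label] += 1

    def predict(query):
        tally = Counter()
        for tok in query.split():
            tally.update(votes.get(tok, {}))
        # Assumed default label when no token has been seen before.
        return tally.most_common(1)[0][0] if tally else "informational"

    return predict


def distill(unlabeled, augmented_subset):
    # The first model labels only the retrieval-augmented subset.
    inferred = [(q, teacher_label(q, retrieve(q))) for q in augmented_subset]
    # The second model learns from those inferred labels without retrieval.
    second = train_token_model(inferred)
    # The second model labels the entire unlabeled set; a third model trains on it.
    relabeled = [(q, second(q)) for q in unlabeled]
    return train_token_model(relabeled)
```

In this sketch the third model sees labels for queries that were never retrieval-augmented, which is the point of the second distillation stage.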

Dynamic Language Models for Continuously Evolving Content

Publication number: 20230401382

Abstract: Provided are systems and methods for incremental training of machine learning models to adapt to changes in an underlying data distribution. One example setting in which the techniques described herein may be beneficial is for incrementally training natural language models to enable the models to have or adapt to a dynamically changing vocabulary. Incremental training is provided as a feasible and inexpensive way of adapting machine learning models to evolving vocabulary without having to retrain them from scratch.

Type: Application

Filed: October 19, 2021

Publication date: December 14, 2023

Inventors: Spurthi Amba Hombaiah, Mingyang Zhang, Michael Bendersky, Tao Chen, Marc Alexander Najork
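
One ingredient of adapting a model to an evolving vocabulary can be sketched as below. This is only an assumption about what such a step might look like, not the procedure in the application: the function `update_vocabulary`, the `min_count` threshold, and the random initialization of new embedding rows are hypothetical. The idea shown is that existing rows are carried over rather than rebuilt from scratch.

```python
import random
from collections import Counter


def update_vocabulary(vocab, embeddings, new_texts, min_count=2, seed=0):
    """Add frequent unseen tokens to the vocabulary, keeping existing rows as-is.

    vocab: dict mapping token -> row index into `embeddings`.
    embeddings: non-empty list of equal-length float lists.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    counts = Counter(tok for text in new_texts for tok in text.lower().split())

    new_vocab = dict(vocab)
    new_embeddings = [row[:] for row in embeddings]  # copy; old rows are preserved
    for token, count in sorted(counts.items()):
        if count >= min_count and token not in new_vocab:
            new_vocab[token] = len(new_embeddings)
            # New rows start small and random; further training would tune them.
            new_embeddings.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return new_vocab, new_embeddings
```

After such a step, training would continue on recent data only, which is what makes the approach cheaper than retraining the whole model.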

Systems and Methods for Machine-Learned Prediction of Semantic Similarity Between Documents

Publication number: 20230297783

Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.

Type: Application

Filed: May 22, 2023

Publication date: September 21, 2023

Inventors: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li
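
The parse, encode, and compare steps in the abstract above can be illustrated with a minimal sketch. The machine-learned encoder is replaced here by a bag-of-words count vector, and blocks are assumed to be paragraphs separated by blank lines; the names `parse_blocks`, `encode_block`, `encode_document`, and `similarity`, and the choice of cosine similarity as the metric, are assumptions made for illustration.

```python
import math
from collections import Counter


def parse_blocks(document):
    """Split a document into textual blocks (here: blank-line-separated paragraphs)."""
    return [b.strip() for b in document.split("\n\n") if b.strip()]


def encode_block(block):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(block.lower().split())


def encode_document(document):
    """Combine the block encodings into one document encoding by summing them."""
    encoding = Counter()
    for block in parse_blocks(document):
        encoding.update(encode_block(block))
    return encoding


def similarity(doc_a, doc_b):
    """Cosine similarity between the two document encodings (0.0 if either is empty)."""
    a, b = encode_document(doc_a), encode_document(doc_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

A learned encoder would replace `encode_block` and would typically combine blocks with something richer than a sum, but the overall shape of the pipeline stays the same.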

SYSTEMS AND METHODS FOR USING DOCUMENT ACTIVITY LOGS TO TRAIN MACHINE-LEARNED MODELS FOR DETERMINING DOCUMENT RELEVANCE

Publication number: 20230267277

Abstract: Systems and methods of the present disclosure are directed to a method for training a machine-learned semantic matching model. The method can include obtaining a first and second document and a first and second activity log. The method can include determining, based on the first document activity log and the second document activity log, a relation label indicative of whether the documents are related. The method can include inputting the documents into the model to receive a semantic similarity value representing an estimated semantic similarity between the first document and the second document. The method can include evaluating a loss function that evaluates a difference between the relation label and the semantic similarity value. The method can include modifying values of parameters of the model based on the loss function.

Type: Application

Filed: June 15, 2020

Publication date: August 24, 2023

Inventors: Weize Kong, Michael Bendersky, Marc Najork, Rama Kumar Pasumarthi, Zhen Qin, Rolf Jagerman
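
The training loop in the abstract above (derive a relation label from activity logs, score the pair, evaluate a loss, update parameters) can be sketched as follows. Everything specific here is assumed for illustration: the co-access rule in `relation_label`, the 60-second window, the one-feature logistic model over Jaccard word overlap, and the squared-error loss are hypothetical choices, not details from the application.

```python
import math


def relation_label(log_a, log_b, window=60):
    """Return 1.0 if one user touched both documents within `window` seconds, else 0.0.

    Each log is a list of (user_id, timestamp_seconds) pairs. The rule is an assumption.
    """
    for user_a, t_a in log_a:
        for user_b, t_b in log_b:
            if user_a == user_b and abs(t_a - t_b) <= window:
                return 1.0
    return 0.0


def jaccard(doc_a, doc_b):
    """Word-set overlap, used as the single input feature of the toy model."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def predict(params, doc_a, doc_b):
    """Semantic similarity value in (0, 1) from a one-feature logistic model."""
    w, b = params
    return 1.0 / (1.0 + math.exp(-(w * jaccard(doc_a, doc_b) + b)))


def loss(label, value):
    """Squared difference between the relation label and the similarity value."""
    return (label - value) ** 2


def train_step(params, doc_a, doc_b, label, lr=0.5):
    """One gradient-descent update of (w, b) on the squared-error loss."""
    w, b = params
    x = jaccard(doc_a, doc_b)
    s = predict(params, doc_a, doc_b)
    grad_z = 2.0 * (s - label) * s * (1.0 - s)  # d(loss) / d(pre-activation)
    return (w - lr * grad_z * x, b - lr * grad_z)
```

The appeal of this setup is that labels come from logged behavior rather than manual annotation, so the number of training pairs scales with usage.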

Layout-Aware Multimodal Pretraining for Multimodal Document Understanding

Publication number: 20230222285

Abstract: Systems and methods for document processing that can process and understand the layout, text size, text style, and multimedia of a document can generate more accurate and informed document representations. The layout of a document paired with text size and style can indicate what portions of a document are possibly more important, and the understanding of that importance can help with understanding of the document. Systems and methods utilizing a hierarchical framework that processes the block-level and the document-level of a document can capitalize on these indicators to generate a better document representation.

Type: Application

Filed: December 22, 2020

Publication date: July 13, 2023

Inventors: Mingyang Zhang, Cheng Li, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky, Marc Alexander Najork, Te-Lin Wu

Systems and methods for machine-learned prediction of semantic similarity between documents

Patent number: 11694034

Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.

Type: Grant

Filed: October 23, 2020

Date of Patent: July 4, 2023

Assignee: GOOGLE LLC

Inventors: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li

AUTOMATIC FILE ORGANIZATION WITHIN A CLOUD STORAGE SYSTEM

Publication number: 20230177004

Abstract: Techniques are described herein for enabling more computationally efficient organization of files within a cloud storage system. A method includes: receiving information identifying a document and a set of folders; for each folder in the set of folders, using a trained model to predict a similarity measure between the folder and the document; for each folder in the set of folders, determining a score for the folder based on the predicted similarity measure for the folder; selecting a candidate folder from the set of folders using the scores of the folders within the set of folders; and providing, on a user interface, a selectable option to associate the document with the candidate folder.

Type: Application

Filed: December 7, 2021

Publication date: June 8, 2023

Inventors: Weize Kong, Mingyang Zhang, Michael Bendersky, Marc Alexander Najork, Mike Colagrosso, Brandon Vargo, Remy Burger
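
The score-and-select steps in the abstract above can be sketched as below. The trained model is replaced by a simple word-overlap heuristic, and the `min_score` cutoff for showing a suggestion is an added assumption; `predict_similarity` and `suggest_folder` are hypothetical names, not part of any real storage API.

```python
def predict_similarity(folder_docs, document):
    """Toy stand-in for the trained model: mean word overlap with the folder's documents."""
    words = set(document.lower().split())
    if not folder_docs or not words:
        return 0.0
    overlaps = [
        len(words & set(doc.lower().split())) / len(words) for doc in folder_docs
    ]
    return sum(overlaps) / len(overlaps)


def suggest_folder(document, folders, min_score=0.3):
    """Score every folder and return the best candidate, or None if none scores well.

    folders: dict mapping folder name -> list of document texts already in it.
    """
    scores = {
        name: predict_similarity(docs, document) for name, docs in folders.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # The returned folder is what a UI would offer as a selectable option.
    return best if scores[best] >= min_score else None
```

Returning `None` below the cutoff reflects a design choice in this sketch: showing no suggestion is assumed to be better than showing a poor one.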

ADVERSARIAL BANDITS POLICY FOR CRAWLING HIGHLY DYNAMIC CONTENT

Publication number: 20230169128

Abstract: Techniques of generating recrawl policies for commercial offer pages include generating a multiple strategy approach using a number of different strategies. In some implementations, each strategy is an arm of a K-armed adversarial bandits algorithm with reinforcement learning. Moreover, in some implementations, the multiple strategy approach also uses a machine learning algorithm to estimate parameters such as a click rate, impression rate, and likelihood of price change, i.e., change rate, which was assumed known in the conventional approaches.

Type: Application

Filed: March 30, 2020

Publication date: June 1, 2023

Inventors: Michael Bendersky, Przemyslaw Gajda, Sergey Novikov, Marc Alexander Najork, Shuguang Han
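
To make the "each strategy is an arm" idea concrete, here is a minimal sketch of EXP3, one well-known K-armed adversarial bandit algorithm. The abstract does not say which algorithm is used, so EXP3 is an assumption, as are the class name `Exp3`, the exploration rate `gamma`, and the convention that rewards lie in [0, 1].

```python
import math
import random


class Exp3:
    """EXP3: exponential weights with uniform exploration mixed in."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalized weights with a uniform distribution over arms."""
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def select(self):
        """Sample an arm (a recrawl strategy) according to the current probabilities."""
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, arm, reward):
        """Update the chosen arm with an importance-weighted reward in [0, 1]."""
        p = self.probabilities()[arm]
        estimate = reward / p  # unbiased estimate of the arm's reward
        self.weights[arm] *= math.exp(self.gamma * estimate / len(self.weights))
```

In a recrawl setting, each round would pick one strategy with `select`, run it, and feed back a reward (for example, a normalized measure of how fresh the served offer pages were) through `update`.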

Systems and Methods for Machine-Learned Prediction of Semantic Similarity Between Documents

Publication number: 20220129638

Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.

Type: Application

Filed: October 23, 2020

Publication date: April 28, 2022

Inventors: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li

Search and retrieval of structured information cards

Patent number: 11238058

Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.

Type: Grant

Filed: November 2, 2020

Date of Patent: February 1, 2022

Assignee: Google LLC

Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang

TRAINING A RANKING MODEL

Publication number: 20210125108

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a ranking machine learning model. In one aspect, a method includes the actions of receiving training data for a ranking machine learning model, the training data including training examples, and each training example including data identifying: a search query, result documents from a result list for the search query, and a result document that was selected by a user from the result list, receiving position data for each training example in the training data, the position data identifying a respective position of the selected result document in the result list for the search query in the training example; determining, for each training example in the training data, a respective selection bias value; and determining a respective importance value for each training example from the selection bias value for the training example, the importance value.

Type: Application

Filed: October 24, 2016

Publication date: April 29, 2021

Applicant: Google LLC

Inventors: Donald Arthur Metzler, JR., Xuanhui Wang, Marc Alexander Najork, Michael Bendersky
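
The abstract above says that an importance value is derived from a per-example selection bias value, but it does not give the formula. One common way to do this in click-based ranking is inverse propensity weighting, sketched below as an assumption: the power-law position model in `selection_bias`, the parameter `eta`, and the `clicked_position` field name are hypothetical.

```python
def selection_bias(position, eta=1.0):
    """Assumed position-bias model: chance of selection decays as 1 / position**eta."""
    return 1.0 / (position ** eta)


def importance_values(training_examples, eta=1.0):
    """Importance of each example is the inverse of its selection bias value."""
    values = []
    for example in training_examples:
        bias = selection_bias(example["clicked_position"], eta)
        values.append(1.0 / bias)
    return values


def weighted_loss(per_example_losses, importances):
    """Importance-weighted average of the per-example ranking losses."""
    total = sum(importances)
    return sum(l * w for l, w in zip(per_example_losses, importances)) / total
```

Under this weighting, a click on a low-ranked result counts for more than a click on the top result, which counteracts the tendency of users to select whatever is shown first.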

SEARCH AND RETRIEVAL OF STRUCTURED INFORMATION CARDS

Publication number: 20210049165

Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.

Type: Application

Filed: November 2, 2020

Publication date: February 18, 2021

Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang

Search and retrieval of structured information cards

Patent number: 10824630

Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.

Type: Grant

Filed: October 26, 2016

Date of Patent: November 3, 2020

Assignee: GOOGLE LLC

Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang

SEARCH AND RETRIEVAL OF STRUCTURED INFORMATION CARDS

Publication number: 20180113865

Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.

Type: Application

Filed: October 26, 2016

Publication date: April 26, 2018

Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang

HIERARCHICALLY-STRUCTURED INDEXING AND RETRIEVAL

Publication number: 20110258034

Abstract: Novel and efficient methods are described for indexing advertisements (“ads”) and other resources that are defined and organized in accordance with a hierarchical schema. In accordance with at least one embodiment, an ad corpus is transformed into a collection of hierarchically structured textual documents. An indexing technique that exploits the hierarchical structure is then applied to construct a compact yet effective ad index that can be used for performing advanced match or other ad retrieval functions. Various retrieval methods are also described herein that are capable of exploiting the hierarchical structure of the ad corpus to retrieve more relevant ads than those yielded by conventional methods.

Type: Application

Filed: April 15, 2010

Publication date: October 20, 2011

Applicant: YAHOO! INC.

Inventors: Donald Metzler, Evgeniy Gabrilovich, Vanja Josifovski, Michael Bendersky