Patents by Inventor Michael Bendersky

Michael Bendersky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240135187
    Abstract: Provided are computing systems, methods, and platforms that train query processing models, such as large language models, to perform query intent classification tasks by using retrieval augmentation and multi-stage distillation. Unlabeled training examples of queries may be obtained, and a set of the training examples may be augmented with additional feature annotations to generate augmented training examples. A first query processing model may annotate the retrieval augmented queries to generate inferred labels for the augmented training examples. A second query processing model may be trained on the inferred labels, distilling the query processing model that was trained with retrieval augmentation into a non-retrieval augmented query processing model. The second query processing model may annotate the entire set of unlabeled training examples. Another stage of distillation may train a third query processing model using the entire set of unlabeled training examples without retrieval augmentation.
    Type: Application
    Filed: October 22, 2023
    Publication date: April 25, 2024
    Inventors: Krishna Pragash Srinivasan, Michael Bendersky, Anupam Samanta, Lingrui Liao, Luca Bertelli, Ming-Wei Chang, Iftekhar Naim, Siddhartha Brahma, Siamak Shakeri, Hongkun Yu, John Nham, Karthik Raman, Raphael Dominik Hoffmann
  • Publication number: 20230401382
    Abstract: Provided are systems and methods for incremental training of machine learning models to adapt to changes in an underlying data distribution. One example setting in which the techniques described herein may be beneficial is for incrementally training natural language models to enable the models to have or adapt to a dynamically changing vocabulary. Incremental training is provided as a feasible and inexpensive way of adapting machine learning models to evolving vocabulary without having to retrain them from scratch.
    Type: Application
    Filed: October 19, 2021
    Publication date: December 14, 2023
    Inventors: Spurthi Amba Hombaiah, Mingyang Zhang, Michael Bendersky, Tao Chen, Marc Alexander Najork
  • Publication number: 20230297783
    Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.
    Type: Application
    Filed: May 22, 2023
    Publication date: September 21, 2023
    Inventors: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li
  • Publication number: 20230267277
    Abstract: Systems and methods of the present disclosure are directed to a method for training a machine-learned semantic matching model. The method can include obtaining a first and second document and a first and second activity log. The method can include determining, based on the first document activity log and the second document activity log, a relation label indicative of whether the documents are related. The method can include inputting the documents into the model to receive a semantic similarity value representing an estimated semantic similarity between the first document and the second document. The method can include evaluating a loss function that evaluates a difference between the relation label and the semantic similarity value. The method can include modifying values of parameters of the model based on the loss function.
    Type: Application
    Filed: June 15, 2020
    Publication date: August 24, 2023
    Inventors: Weize Kong, Michael Bendersky, Marc Najork, Rama Kumar Pasumarthi, Zhen Qin, Rolf Jagerman
  • Publication number: 20230222285
    Abstract: Systems and methods for document processing that can process and understand the layout, text size, text style, and multimedia of a document can generate more accurate and informed document representations. The layout of a document paired with text size and style can indicate what portions of a document are possibly more important, and the understanding of that importance can help with understanding of the document. Systems and methods utilizing a hierarchical framework that processes the block-level and the document-level of a document can capitalize on these indicators to generate a better document representation.
    Type: Application
    Filed: December 22, 2020
    Publication date: July 13, 2023
    Inventors: Mingyang Zhang, Cheng Li, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky, Marc Alexander Najork, Te-Lin Wu
  • Patent number: 11694034
    Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.
    Type: Grant
    Filed: October 23, 2020
    Date of Patent: July 4, 2023
    Assignee: Google LLC
    Inventors: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li
  • Publication number: 20230177004
    Abstract: Techniques are described herein for enabling more computationally efficient organization of files within a cloud storage system. A method includes: receiving information identifying a document and a set of folders; for each folder in the set of folders, using a trained model to predict a similarity measure between the folder and the document; for each folder in the set of folders, determining a score for the folder based on the predicted similarity measure for the folder; selecting a candidate folder from the set of folders using the scores of the folders within the set of folders; and providing, on a user interface, a selectable option to associate the document with the candidate folder.
    Type: Application
    Filed: December 7, 2021
    Publication date: June 8, 2023
    Inventors: Weize Kong, Mingyang Zhang, Michael Bendersky, Marc Alexander Najork, Mike Colagrosso, Brandon Vargo, Remy Burger
  • Publication number: 20230169128
    Abstract: Techniques of generating recrawl policies for commercial offer pages include generating a multiple strategy approach using a number of different strategies. In some implementations, each strategy is an arm of a K-armed adversarial bandits algorithm with reinforcement learning. Moreover, in some implementations, the multiple strategy approach also uses a machine learning algorithm to estimate parameters such as a click rate, impression rate, and likelihood of price change, i.e., change rate, which was assumed known in the conventional approaches.
    Type: Application
    Filed: March 30, 2020
    Publication date: June 1, 2023
    Inventors: Michael Bendersky, Przemyslaw Gajda, Sergey Novikov, Marc Alexander Najork, Shuguang Han
  • Publication number: 20220129638
    Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.
    Type: Application
    Filed: October 23, 2020
    Publication date: April 28, 2022
    Inventors: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li
  • Patent number: 11238058
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: February 1, 2022
    Assignee: Google LLC
    Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
  • Publication number: 20210125108
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a ranking machine learning model. In one aspect, a method includes the actions of receiving training data for a ranking machine learning model, the training data including training examples, and each training example including data identifying: a search query, result documents from a result list for the search query, and a result document that was selected by a user from the result list, receiving position data for each training example in the training data, the position data identifying a respective position of the selected result document in the result list for the search query in the training example; determining, for each training example in the training data, a respective selection bias value; and determining a respective importance value for each training example from the selection bias value for the training example, the importance value.
    Type: Application
    Filed: October 24, 2016
    Publication date: April 29, 2021
    Applicant: Google LLC
    Inventors: Donald Arthur Metzler, Jr., Xuanhui Wang, Marc Alexander Najork, Michael Bendersky
  • Publication number: 20210049165
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
    Type: Application
    Filed: November 2, 2020
    Publication date: February 18, 2021
    Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
  • Patent number: 10824630
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
    Type: Grant
    Filed: October 26, 2016
    Date of Patent: November 3, 2020
    Assignee: Google LLC
    Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
  • Publication number: 20180113865
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
    Type: Application
    Filed: October 26, 2016
    Publication date: April 26, 2018
    Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
  • Publication number: 20110258034
    Abstract: Novel and efficient methods are described for indexing advertisements (“ads”) and other resources that are defined and organized in accordance with a hierarchical schema. In accordance with at least one embodiment, an ad corpus is transformed into a collection of hierarchically structured textual documents. An indexing technique that exploits the hierarchical structure is then applied to construct a compact yet effective ad index that can be used for performing advanced match or other ad retrieval functions. Various retrieval methods are also described herein that are capable of exploiting the hierarchical structure of the ad corpus to retrieve more relevant ads than those yielded by conventional methods.
    Type: Application
    Filed: April 15, 2010
    Publication date: October 20, 2011
    Applicant: Yahoo! Inc.
    Inventors: Donald Metzler, Evgeniy Gabrilovich, Vanja Josifovski, Michael Bendersky
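
The block-based document similarity method described in the abstract shared by patent 11694034 and publications 20230297783 and 20220129638 can be illustrated with a small sketch. This is not the patented implementation. The abstract names a machine-learned semantic document encoding model but gives no details of it, so the sketch substitutes a hashed bag-of-words vector as a stand-in encoder. It also assumes that blocks are paragraphs separated by blank lines, that block encodings are combined by averaging, and that the similarity metric is cosine similarity. All function names and the DIM constant are hypothetical.

```python
import hashlib
import math
import re

DIM = 64  # size of the toy encoding space (arbitrary, assumed for illustration)


def split_into_blocks(document: str) -> list[str]:
    """Parse a document into textual blocks (assumed: blank-line-separated paragraphs)."""
    return [block.strip() for block in re.split(r"\n\s*\n", document) if block.strip()]


def encode_block(block: str) -> list[float]:
    """Stand-in for a learned block encoder: hashed bag-of-words counts."""
    vector = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", block.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return vector


def encode_document(document: str) -> list[float]:
    """Encode every block, then average the block encodings into one document encoding."""
    blocks = split_into_blocks(document)
    if not blocks:
        return [0.0] * DIM
    encodings = [encode_block(block) for block in blocks]
    return [sum(column) / len(encodings) for column in zip(*encodings)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def document_similarity(first: str, second: str) -> float:
    """Similarity metric between two documents, computed from their encodings."""
    return cosine_similarity(encode_document(first), encode_document(second))


if __name__ == "__main__":
    doc_a = "Ranking models order search results.\n\nThey are trained on click logs."
    doc_b = "Click logs are used to train ranking models."
    print(document_similarity(doc_a, doc_b))
```

In a real system the hashed encoder would be replaced by a trained model, and the averaging step could be replaced by any learned aggregation over block encodings. The overall flow of parse, encode, and compare stays the same.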