Patents by Inventor Marc Alexander Najork

Marc Alexander Najork has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230401382
    Abstract: Provided are systems and methods for incremental training of machine learning models to adapt to changes in an underlying data distribution. One example setting in which the techniques described herein may be beneficial is for incrementally training natural language models to enable the models to have or adapt to a dynamically changing vocabulary. Incremental training is provided as a feasible and inexpensive way of adapting machine learning models to evolving vocabulary without having to retrain them from scratch.
    Type: Application
    Filed: October 19, 2021
    Publication date: December 14, 2023
    Inventors: Spurthi Amba Hombaiah, Mingyang Zhang, Michael Bendersky, Tao Chen, Marc Alexander Najork
  • Publication number: 20230222285
    Abstract: Systems and methods for document processing that can process and understand the layout, text size, text style, and multimedia of a document can generate more accurate and informed document representations. The layout of a document paired with text size and style can indicate what portions of a document are possibly more important, and the understanding of that importance can help with understanding of the document. Systems and methods utilizing a hierarchical framework that processes the block-level and the document-level of a document can capitalize on these indicators to generate a better document representation.
    Type: Application
    Filed: December 22, 2020
    Publication date: July 13, 2023
    Inventors: Mingyang Zhang, Cheng Li, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky, Marc Alexander Najork, Te-Lin Wu
  • Publication number: 20230177004
    Abstract: Techniques are described herein for enabling more computationally efficient organization of files within a cloud storage system. A method includes: receiving information identifying a document and a set of folders; for each folder in the set of folders, using a trained model to predict a similarity measure between the folder and the document; for each folder in the set of folders, determining a score for the folder based on the predicted similarity measure for the folder; selecting a candidate folder from the set of folders using the scores of the folders within the set of folders; and providing, on a user interface, a selectable option to associate the document with the candidate folder.
    Type: Application
    Filed: December 7, 2021
    Publication date: June 8, 2023
    Inventors: Weize Kong, Mingyang Zhang, Michael Bendersky, Marc Alexander Najork, Mike Colagrosso, Brandon Vargo, Remy Burger
  • Publication number: 20230169128
    Abstract: Techniques of generating recrawl policies for commercial offer pages include generating a multiple strategy approach using a number of different strategies. In some implementations, each strategy is an arm of a K-armed adversarial bandits algorithm with reinforcement learning. Moreover, in some implementations, the multiple strategy approach also uses a machine learning algorithm to estimate parameters such as a click rate, impression rate, and likelihood of price change, i.e., change rate, which was assumed known in the conventional approaches.
    Type: Application
    Filed: March 30, 2020
    Publication date: June 1, 2023
    Inventors: Michael Bendersky, Przemyslaw Gajda, Sergey Novikov, Marc Alexander Najork, Shuguang Han
  • Publication number: 20230094198
    Abstract: Implementations relate to training a model that can be used to process values for defined features, where the values are specific to a user account, to generate a predicted user measure that reflects both popularity and quality of the user account. The model is trained based on losses that are each generated as a function of both a corresponding generated popularity measure and a corresponding generated quality measure of a corresponding training instance. Accordingly, the model can be trained to generate, based on values for a given user account, a single measure that reflects both quality and popularity of the given user account. Implementations are additionally or alternatively directed to utilizing such predicted user measures to restrict provisioning of content items that are from user accounts having respective predicted user measures that fail to satisfy a threshold.
    Type: Application
    Filed: December 5, 2022
    Publication date: March 30, 2023
    Inventors: Spurthi Amba Hombaiah, Vladimir Ofitserov, Mike Bendersky, Marc Alexander Najork
  • Patent number: 11551150
    Abstract: Implementations relate to training a model that can be used to process values for defined features, where the values are specific to a user account, to generate a predicted user measure that reflects both popularity and quality of the user account. The model is trained based on losses that are each generated as a function of both a corresponding generated popularity measure and a corresponding generated quality measure of a corresponding training instance. Accordingly, the model can be trained to generate, based on values for a given user account, a single measure that reflects both quality and popularity of the given user account. Implementations are additionally or alternatively directed to utilizing such predicted user measures to restrict provisioning of content items that are from user accounts having respective predicted user measures that fail to satisfy a threshold.
    Type: Grant
    Filed: July 6, 2020
    Date of Patent: January 10, 2023
    Assignee: GOOGLE LLC
    Inventors: Spurthi Amba Hombaiah, Vladimir Ofitserov, Mike Bendersky, Marc Alexander Najork
  • Patent number: 11238058
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: February 1, 2022
    Assignee: GOOGLE LLC
    Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
  • Publication number: 20220004918
    Abstract: Implementations relate to training a model that can be used to process values for defined features, where the values are specific to a user account, to generate a predicted user measure that reflects both popularity and quality of the user account. The model is trained based on losses that are each generated as a function of both a corresponding generated popularity measure and a corresponding generated quality measure of a corresponding training instance. Accordingly, the model can be trained to generate, based on values for a given user account, a single measure that reflects both quality and popularity of the given user account. Implementations are additionally or alternatively directed to utilizing such predicted user measures to restrict provisioning of content items that are from user accounts having respective predicted user measures that fail to satisfy a threshold.
    Type: Application
    Filed: July 6, 2020
    Publication date: January 6, 2022
    Inventors: Spurthi Amba Hombaiah, Vladimir Ofitserov, Mike Bendersky, Marc Alexander Najork
  • Publication number: 20210374345
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a tuple of respective input sequences to generate an output. In one aspect, one of the systems includes a neural network comprising a plurality of encoder neural networks and a head neural network, each encoder neural network configured to: receive a respective input sequence from the tuple; process the respective input sequence using one or more encoder network layers to generate an encoded representation comprising a sequence of tokens; and process each of some or all of the tokens in the sequence of tokens using a projection layer to generate a lower-dimensional representation, and the head neural network configured to: receive lower-dimensional representations of a respective proper subset of the sequence of tokens generated by the encoder neural network; and process the lower-dimensional representations to generate the output.
    Type: Application
    Filed: June 1, 2021
    Publication date: December 2, 2021
    Inventors: Karthik Raman, Liu Yang, Mike Bendersky, Jiecao Chen, Marc Alexander Najork
  • Publication number: 20210125108
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a ranking machine learning model. In one aspect, a method includes the actions of receiving training data for a ranking machine learning model, the training data including training examples, and each training example including data identifying: a search query, result documents from a result list for the search query, and a result document that was selected by a user from the result list, receiving position data for each training example in the training data, the position data identifying a respective position of the selected result document in the result list for the search query in the training example; determining, for each training example in the training data, a respective selection bias value; and determining a respective importance value for each training example from the selection bias value for the training example, the importance value.
    Type: Application
    Filed: October 24, 2016
    Publication date: April 29, 2021
    Applicant: Google LLC
    Inventors: Donald Arthur Metzler, Jr., Xuanhui Wang, Marc Alexander Najork, Michael Bendersky
  • Patent number: 10970293
    Abstract: Methods and apparatus related to using document feature(s) of a document that is responsive to a query, and optionally query feature(s) of the query, to determine a presentation characteristic for presenting a search result that corresponds to the document. In some implementations, measures associated with the document feature(s) and/or query feature(s) may be used to determine the presentation characteristic. The measures may be based on past interactions, by corresponding users, with other documents that share one or more of the document features with the document, where a plurality of the other documents are different from the document (and optionally each different from one another). In some implementations, the document and/or the other documents include, or are restricted to, documents that are access restricted.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: April 6, 2021
    Assignee: GOOGLE LLC
    Inventors: Mike Bendersky, Marc Alexander Najork, Donald Metzler, Xuanhui Wang
  • Publication number: 20210049165
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
    Type: Application
    Filed: November 2, 2020
    Publication date: February 18, 2021
    Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
  • Patent number: 10824630
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
    Type: Grant
    Filed: October 26, 2016
    Date of Patent: November 3, 2020
    Assignee: GOOGLE LLC
    Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
  • Patent number: 10540610
    Abstract: Methods, apparatus, and computer-readable media are provided for analyzing a cluster of communications, such as B2C emails, to generate a template for the cluster that defines transient segments and fixed segments of the cluster of communications. More particularly, methods, apparatus, and computer-readable media are provided for generating and/or applying a trained structured machine learning model for a generated template that can be used to determine, for one or more transient segments of subsequent communications, a corresponding probability that a given semantic label is the correct semantic label for extracted content of the transient segment(s).
    Type: Grant
    Filed: April 27, 2016
    Date of Patent: January 21, 2020
    Assignee: GOOGLE LLC
    Inventors: Jie Yang, Amr Ahmed, Luis Garcia Pueyo, Mike Bendersky, Amitabh Saikia, Marc-Allen Cartright, Marc Alexander Najork, MyLinh Yang, Hui Tan, Weinan Zhang, Vanja Josifovski, Alexander J. Smola
  • Publication number: 20190377741
    Abstract: Methods and apparatus related to using document feature(s) of a document that is responsive to a query, and optionally query feature(s) of the query, to determine a presentation characteristic for presenting a search result that corresponds to the document. In some implementations, measures associated with the document feature(s) and/or query feature(s) may be used to determine the presentation characteristic. The measures may be based on past interactions, by corresponding users, with other documents that share one or more of the document features with the document, where a plurality of the other documents are different from the document (and optionally each different from one another). In some implementations, the document and/or the other documents include, or are restricted to, documents that are access restricted.
    Type: Application
    Filed: August 26, 2019
    Publication date: December 12, 2019
    Inventors: Mike Bendersky, Marc Alexander Najork, Donald Metzler, Xuanhui Wang
  • Patent number: 10394832
    Abstract: Methods and apparatus related to using document feature(s) of a document that is responsive to a query, and optionally query feature(s) of the query, to determine a presentation characteristic for presenting a search result that corresponds to the document. In some implementations, measures associated with the document feature(s) and/or query feature(s) may be used to determine the presentation characteristic. The measures may be based on past interactions, by corresponding users, with other documents that share one or more of the document features with the document, where a plurality of the other documents are different from the document (and optionally each different from one another). In some implementations, the document and/or the other documents include, or are restricted to, documents that are access restricted.
    Type: Grant
    Filed: October 24, 2016
    Date of Patent: August 27, 2019
    Assignee: GOOGLE LLC
    Inventors: Mike Bendersky, Marc Alexander Najork, Donald Metzler, Xuanhui Wang
  • Publication number: 20180113865
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
    Type: Application
    Filed: October 26, 2016
    Publication date: April 26, 2018
    Inventors: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
  • Publication number: 20180113866
    Abstract: Methods and apparatus related to using document feature(s) of a document that is responsive to a query, and optionally query feature(s) of the query, to determine a presentation characteristic for presenting a search result that corresponds to the document. In some implementations, measures associated with the document feature(s) and/or query feature(s) may be used to determine the presentation characteristic. The measures may be based on past interactions, by corresponding users, with other documents that share one or more of the document features with the document, where a plurality of the other documents are different from the document (and optionally each different from one another). In some implementations, the document and/or the other documents include, or are restricted to, documents that are access restricted.
    Type: Application
    Filed: October 24, 2016
    Publication date: April 26, 2018
    Inventors: Mike Bendersky, Marc Alexander Najork, Donald Metzler, Xuanhui Wang
  • Patent number: 9953185
    Abstract: In various implementations, a plurality of non-private n-grams that satisfy a privacy criterion may be identified within a search log of private search queries and corresponding post-search activity. A plurality of query patterns may be generated based on the plurality of non-private n-grams. Aggregate search activity statistics associated with each of the plurality of query patterns may be determined from the search log. Aggregate search activity statistics associated with each query pattern may be indicative of search activity associated with a plurality of private search queries in the search log that match the query pattern. In response to a determination that aggregate search activity statistics for a given query pattern satisfy a performance criterion, a methodology for generating data that is presented in response to search queries that match the given query pattern may be altered based on aggregate search activity statistics associated with the given query pattern.
    Type: Grant
    Filed: November 24, 2015
    Date of Patent: April 24, 2018
    Assignee: GOOGLE LLC
    Inventors: Mike Bendersky, Donald Metzler, Marc Alexander Najork, Dor Naveh, Vlad Panait, Xuanhui Wang
  • Publication number: 20170147834
    Abstract: In various implementations, a plurality of non-private n-grams that satisfy a privacy criterion may be identified within a search log of private search queries and corresponding post-search activity. A plurality of query patterns may be generated based on the plurality of non-private n-grams. Aggregate search activity statistics associated with each of the plurality of query patterns may be determined from the search log. Aggregate search activity statistics associated with each query pattern may be indicative of search activity associated with a plurality of private search queries in the search log that match the query pattern. In response to a determination that aggregate search activity statistics for a given query pattern satisfy a performance criterion, a methodology for generating data that is presented in response to search queries that match the given query pattern may be altered based on aggregate search activity statistics associated with the given query pattern.
    Type: Application
    Filed: November 24, 2015
    Publication date: May 25, 2017
    Inventors: Mike Bendersky, Donald Metzler, Marc Alexander Najork, Dor Naveh, Vlad Panait, Xuanhui Wang
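
The query-pattern idea in the abstract of the final entry above (non-private n-grams, query patterns, aggregate statistics) can be illustrated with a small sketch. This is not the patented method; it is a toy illustration under several assumptions that do not come from the abstract: n-grams are single terms, the privacy criterion is "issued by at least min_users distinct users", private terms are masked with a "*" wildcard, and the aggregate statistic is a click rate. All function names and the sample log are hypothetical.

```python
from collections import defaultdict

WILDCARD = "*"  # hypothetical placeholder for masked (private) terms


def non_private_terms(log, min_users):
    """Terms issued by at least `min_users` distinct users (assumed privacy criterion)."""
    users_by_term = defaultdict(set)
    for user, query, _clicked in log:
        for term in query.split():
            users_by_term[term].add(user)
    return {t for t, users in users_by_term.items() if len(users) >= min_users}


def to_pattern(query, public_terms):
    """Mask every term outside `public_terms`, collapsing adjacent wildcards."""
    out = []
    for term in query.split():
        tok = term if term in public_terms else WILDCARD
        if tok == WILDCARD and out and out[-1] == WILDCARD:
            continue
        out.append(tok)
    return " ".join(out)


def aggregate_click_rate(log, min_users):
    """Click rate per query pattern, computed over all queries matching the pattern."""
    public = non_private_terms(log, min_users)
    counts = defaultdict(lambda: [0, 0])  # pattern -> [query count, click count]
    for _user, query, clicked in log:
        pattern = to_pattern(query, public)
        counts[pattern][0] += 1
        counts[pattern][1] += int(clicked)
    return {p: clicks / n for p, (n, clicks) in counts.items()}


# Hypothetical search log: (user id, query text, whether a result was clicked).
log = [
    ("u1", "invoice from acme", True),
    ("u2", "invoice from globex", False),
    ("u3", "invoice from initech", True),
    ("u1", "flight to boston", False),
]
rates = aggregate_click_rate(log, min_users=3)
for pattern, rate in rates.items():
    print(pattern, round(rate, 3))
```

In this sketch only "invoice" and "from" are shared by enough users to count as non-private, so the three invoice queries fall under one pattern and the remaining query is masked entirely. A performance criterion in the sense of the abstract could then be a threshold on the per-pattern rate, though the abstract does not say which statistic or threshold is used.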