Patents by Inventor Chenyan XIONG

Chenyan XIONG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240070202
    Abstract: A computer-implemented technique is described herein for assisting a user in advancing a task objective. The technique uses a suggestion-generating system (SGS) to provide one or more suggestions to a user in response to at least a last-submitted query provided by the user. The SGS may correspond to a classification-type or generative-type neural network. The SGS uses a machine-trained model that is trained using a multi-task training framework based on plural groups of training examples, which, in turn, are produced using different respective example-generating methods. One such example-generating method constructs a training example from queries in a search session. It operates by identifying the task-related intent of the queries, and then identifying at least one sequence of queries in the search session that exhibits a coherent task-related intent. A training example is constructed based on queries in such a sequence.
    Type: Application
    Filed: November 7, 2023
    Publication date: February 29, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Corby Louis ROSSET, Chenyan XIONG, Paul Nathan BENNETT, Saurabh Kumar TIWARY, Daniel Fernando CAMPOS, Xia SONG, Nicholas Eric CRASWELL
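The example-generating method described in the abstract above (segmenting a search session into intent-coherent query sequences, then forming training pairs) can be sketched roughly as follows. The session data, intent labels, and function names are illustrative assumptions, not the patented implementation:

```python
def coherent_spans(queries, intents):
    """Group consecutive session queries that share the same task-related intent."""
    spans, start = [], 0
    for i in range(1, len(queries) + 1):
        if i == len(queries) or intents[i] != intents[start]:
            if i - start >= 2:  # a coherent span needs at least two queries
                spans.append(queries[start:i])
            start = i
    return spans

def make_training_examples(queries, intents):
    """Build (context queries, next-query suggestion) pairs from coherent spans."""
    examples = []
    for span in coherent_spans(queries, intents):
        for k in range(1, len(span)):
            examples.append((tuple(span[:k]), span[k]))
    return examples
```

Each pair treats the earlier queries in a coherent span as context and the following query as the suggestion target, which is one plausible way to read "a training example is constructed based on queries in such a sequence."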
  • Publication number: 20240046026
    Abstract: A method for text compression comprises recognizing a prefix string of one or more text characters preceding a target string of a plurality of text characters to be compressed. The prefix string is provided to a natural language generation (NLG) model configured to output one or more predicted continuations each having an associated rank. If the one or more predicted continuations include a matching predicted continuation relative to the next one or more text characters of the target string, the next one or more text characters are compressed as an NLG-type compressed representation. If no predicted continuations match the next one or more text characters of the target string, a longest matching entry in a compression dictionary is identified. The next one or more text characters of the target string are compressed as a dictionary-type compressed representation that includes the dictionary index value of the longest matching entry.
    Type: Application
    Filed: October 17, 2023
    Publication date: February 8, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Ronny LEMPEL, Chenyan XIONG
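The compression loop the abstract describes (try a ranked NLG continuation first, fall back to the longest matching dictionary entry) can be sketched as a round-trip codec. The toy `predict` callable, the literal-character fallback, and the token format are assumptions for illustration; the patent's actual NLG model and encoding are not specified here:

```python
def compress(text, predict, dictionary):
    """predict(prefix) -> ranked list of continuation strings."""
    tokens, i = [], 0
    while i < len(text):
        continuations = predict(text[:i])
        match = next((c for c in continuations if c and text.startswith(c, i)), None)
        if match is not None:                 # NLG-type representation: store the rank
            tokens.append(("nlg", continuations.index(match)))
            i += len(match)
            continue
        best = max((e for e in dictionary if text.startswith(e, i)),
                   key=len, default=None)
        if best is not None:                  # dictionary-type representation: store the index
            tokens.append(("dict", dictionary.index(best)))
            i += len(best)
        else:                                 # literal fallback for uncovered characters
            tokens.append(("lit", text[i]))
            i += 1
    return tokens

def decompress(tokens, predict, dictionary):
    """Invert compress() by replaying ranks/indices against the same model and dictionary."""
    text = ""
    for kind, value in tokens:
        if kind == "nlg":
            text += predict(text)[value]
        elif kind == "dict":
            text += dictionary[value]
        else:
            text += value
    return text
```

Because the decompressor rebuilds the same prefix the compressor saw, replaying a stored rank against the same deterministic model recovers the identical continuation.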
  • Patent number: 11853362
    Abstract: A computer-implemented technique is described herein for assisting a user in advancing a task objective. The technique uses a suggestion-generating system (SGS) to provide one or more suggestions to a user in response to at least a last-submitted query provided by the user. The SGS may correspond to a classification-type or generative-type neural network. The SGS uses a machine-trained model that is trained using a multi-task training framework based on plural groups of training examples, which, in turn, are produced using different respective example-generating methods. One such example-generating method constructs a training example from queries in a search session. It operates by identifying the task-related intent of the queries, and then identifying at least one sequence of queries in the search session that exhibits a coherent task-related intent. A training example is constructed based on queries in such a sequence.
    Type: Grant
    Filed: April 16, 2020
    Date of Patent: December 26, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Corby Louis Rosset, Chenyan Xiong, Paul Nathan Bennett, Saurabh Kumar Tiwary, Daniel Fernando Campos, Xia Song, Nicholas Eric Craswell
  • Patent number: 11829374
    Abstract: Document embedding vectors for each document of a corpus may be generated by combining embedding vectors for document subparts, thereby yielding a final embedding vector for the document. A machine learning model is trained using a query corpus and the document corpus, where the model generates a ranking score for a given (query, document) pair. During training, ranking scores are generated using the model, such that the training dataset is further refined using the generated ranking scores. For example, top documents and a negative document may be determined for a given query and subsequently used as training data. Multiple negative documents may therefore be determined for a given query. A negative document for a given query may be determined from the negative documents using noise-contrastive estimation. Such determined negative documents may be evaluated using a loss function during model training, thereby yielding a more robust model for search processing.
    Type: Grant
    Filed: March 19, 2021
    Date of Patent: November 28, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Junaid Ahmed, Li Xiong, Arnold Overwijk, Chenyan Xiong
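Two pieces of the abstract above lend themselves to a short sketch: combining subpart embeddings into one document vector, and picking a hard negative from candidate documents with a noise-contrastive-style sampling step (probability proportional to softmaxed ranking score). The mean-pooling combiner and the exact sampling scheme are illustrative assumptions:

```python
import numpy as np

def document_embedding(subpart_vectors):
    """Combine subpart embedding vectors into a final document vector (here: mean)."""
    return np.mean(subpart_vectors, axis=0)

def pick_hard_negative(query_vec, candidate_vecs, positive_idx, rng):
    """Sample a negative document with probability proportional to its softmaxed
    ranking score (dot product with the query): high-scoring non-relevant
    documents make the most informative negatives."""
    scores = np.array([query_vec @ v for v in candidate_vecs], dtype=float)
    scores[positive_idx] = -np.inf          # never sample the relevant document
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(rng.choice(len(candidate_vecs), p=probs))
```

Sampled negatives would then feed the loss function during training, per the abstract.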
  • Patent number: 11803693
    Abstract: A method for text compression comprises recognizing a prefix string of one or more text characters preceding a target string of a plurality of text characters to be compressed. The prefix string is provided to a natural language generation (NLG) model configured to output one or more predicted continuations each having an associated rank. If the one or more predicted continuations include a matching predicted continuation relative to the next one or more text characters of the target string, the next one or more text characters are compressed as an NLG-type compressed representation. If no predicted continuations match the next one or more text characters of the target string, a longest matching entry in a compression dictionary is identified. The next one or more text characters of the target string are compressed as a dictionary-type compressed representation that includes the dictionary index value of the longest matching entry.
    Type: Grant
    Filed: June 18, 2021
    Date of Patent: October 31, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ronny Lempel, Chenyan Xiong
  • Patent number: 11734559
    Abstract: To provide automated categorization of structured textual content, individual nodes of textual content, from a document object model encapsulation of the structured textual content, have a multidimensional vector associated with them, where the values of the various dimensions of the multidimensional vector are based on the textual content in the corresponding node, the visual features applied or associated with the textual content of the corresponding node, and positional information of the textual content of the corresponding node. The multidimensional vectors are input to a neighbor-imbuing neural network. The enhanced multidimensional vectors output by the neighbor-imbuing neural network are then provided to a categorization neural network. The resulting output can be in the form of multidimensional vectors whose dimensionality is proportional to the number of categories into which the structured textual content is to be categorized. A weighted merge takes into account multiple nodes that are grouped together.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: August 22, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Charumathi Lakshmanan, Ye Li, Arnold Overwijk, Chenyan Xiong, Jiguang Shen, Junaid Ahmed, Jiaming Guo
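The pipeline in the abstract above (per-node feature vector, neighbor-imbuing pass, categorization layer) can be sketched with plain arrays. The concatenation of feature groups, mean-pooling over DOM neighbors, and a single linear-plus-softmax categorizer are stand-ins for the patent's neural networks:

```python
import numpy as np

def node_vector(text_feats, visual_feats, position_feats):
    """Per-node multidimensional vector: textual + visual + positional features."""
    return np.concatenate([text_feats, visual_feats, position_feats])

def imbue_neighbors(vectors, adjacency):
    """'Neighbor-imbuing' step sketched as mean-pooling each node with its
    DOM neighbors (one graph-convolution-like pass)."""
    out = []
    for i, v in enumerate(vectors):
        neighbors = [vectors[j] for j in adjacency[i]]
        out.append(np.mean([v] + neighbors, axis=0))
    return out

def categorize(vectors, weight):
    """Categorization layer: softmax over per-node category logits."""
    logits = np.stack(vectors) @ weight
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)
```

A weighted merge over grouped nodes, as the abstract mentions, would average the resulting category distributions with per-node weights.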
  • Patent number: 11657223
    Abstract: A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
    Type: Grant
    Filed: December 16, 2021
    Date of Patent: May 23, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Li Xiong, Chuan Hu, Arnold Overwijk, Junaid Ahmed, Daniel Fernando Campos, Chenyan Xiong
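The layered flow of the key-phrase model above (hybrid word embeddings, convolutional n-gram embeddings, feedforward scoring into a distribution over n-grams) can be sketched numerically. Mean-pooling windows in place of the convolutional transformer and a single weight vector in place of the feedforward layer are simplifying assumptions:

```python
import numpy as np

def hybrid_embeddings(elmo, position, visual):
    """Concatenate ELMo, position, and visual features per word."""
    return [np.concatenate([e, p, v]) for e, p, v in zip(elmo, position, visual)]

def ngram_embeddings(word_vecs, n):
    """Convolutional step, sketched as a mean over each length-n window of words."""
    return [np.mean(word_vecs[i:i + n], axis=0)
            for i in range(len(word_vecs) - n + 1)]

def keyphrase_scores(ngram_vecs, weight):
    """Feedforward scoring + softmax: a probability distribution over the n-grams."""
    logits = np.array([v @ weight for v in ngram_vecs])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

Training against labeled key phrases with a cross-entropy loss, as the abstract states, would compare these per-n-gram scores with annotated labels.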
  • Publication number: 20220405461
    Abstract: A method for text compression comprises recognizing a prefix string of one or more text characters preceding a target string of a plurality of text characters to be compressed. The prefix string is provided to a natural language generation (NLG) model configured to output one or more predicted continuations each having an associated rank. If the one or more predicted continuations include a matching predicted continuation relative to the next one or more text characters of the target string, the next one or more text characters are compressed as an NLG-type compressed representation. If no predicted continuations match the next one or more text characters of the target string, a longest matching entry in a compression dictionary is identified. The next one or more text characters of the target string are compressed as a dictionary-type compressed representation that includes the dictionary index value of the longest matching entry.
    Type: Application
    Filed: June 18, 2021
    Publication date: December 22, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Ronny LEMPEL, Chenyan XIONG
  • Publication number: 20220374479
    Abstract: This document relates to natural language processing using a framework such as a neural network. One example method involves obtaining a first document and a second document and propagating attention from the first document to the second document. The example method also involves producing contextualized semantic representations of individual words in the second document based at least on the propagating. The contextualized semantic representations can provide a basis for performing one or more natural language processing operations.
    Type: Application
    Filed: July 18, 2022
    Publication date: November 24, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Chenyan Xiong, Chen Zhao, Corbin Louis Rosset, Paul Nathan Bennett, Xia Song, Saurabh Kumar Tiwary
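"Propagating attention from the first document to the second" in the abstract above can be sketched as one scaled dot-product cross-attention pass that contextualizes each word of the second document against the first. The residual connection and single-pass form are illustrative assumptions about an otherwise unspecified framework:

```python
import numpy as np

def propagate_attention(doc1_vecs, doc2_vecs):
    """Contextualize each word vector of the second document by attending
    over the first document's word vectors (scaled dot-product attention)."""
    D1, D2 = np.stack(doc1_vecs), np.stack(doc2_vecs)
    scores = D2 @ D1.T / np.sqrt(D1.shape[1])
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return D2 + attn @ D1   # residual: original word vector + attended context
```

The returned rows are contextualized semantic representations of the second document's words, which downstream natural language processing operations could consume.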
  • Patent number: 11423093
    Abstract: This document relates to natural language processing using a framework such as a neural network. One example method involves obtaining a first document and a second document and propagating attention from the first document to the second document. The example method also involves producing contextualized semantic representations of individual words in the second document based at least on the propagating. The contextualized semantic representations can provide a basis for performing one or more natural language processing operations.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 23, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Chenyan Xiong, Chen Zhao, Corbin Louis Rosset, Paul Nathan Bennett, Xia Song, Saurabh Kumar Tiwary
  • Publication number: 20220179871
    Abstract: Document embedding vectors for each document of a corpus may be generated by combining embedding vectors for document subparts, thereby yielding a final embedding vector for the document. A machine learning model is trained using a query corpus and the document corpus, where the model generates a ranking score for a given (query, document) pair. During training, ranking scores are generated using the model, such that the training dataset is further refined using the generated ranking scores. For example, top documents and a negative document may be determined for a given query and subsequently used as training data. Multiple negative documents may therefore be determined for a given query. A negative document for a given query may be determined from the negative documents using noise-contrastive estimation. Such determined negative documents may be evaluated using a loss function during model training, thereby yielding a more robust model for search processing.
    Type: Application
    Filed: March 19, 2021
    Publication date: June 9, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Junaid AHMED, Li XIONG, Arnold OVERWIJK, Chenyan XIONG
  • Publication number: 20220108078
    Abstract: A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
    Type: Application
    Filed: December 16, 2021
    Publication date: April 7, 2022
    Inventors: Li XIONG, Chuan HU, Arnold OVERWIJK, Junaid AHMED, Daniel Fernando CAMPOS, Chenyan XIONG
  • Patent number: 11250214
    Abstract: A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
    Type: Grant
    Filed: July 2, 2019
    Date of Patent: February 15, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Li Xiong, Chuan Hu, Arnold Overwijk, Junaid Ahmed, Daniel Fernando Campos, Chenyan Xiong
  • Publication number: 20210397944
    Abstract: To provide automated categorization of structured textual content, individual nodes of textual content, from a document object model encapsulation of the structured textual content, have a multidimensional vector associated with them, where the values of the various dimensions of the multidimensional vector are based on the textual content in the corresponding node, the visual features applied or associated with the textual content of the corresponding node, and positional information of the textual content of the corresponding node. The multidimensional vectors are input to a neighbor-imbuing neural network. The enhanced multidimensional vectors output by the neighbor-imbuing neural network are then provided to a categorization neural network. The resulting output can be in the form of multidimensional vectors whose dimensionality is proportional to the number of categories into which the structured textual content is to be categorized. A weighted merge takes into account multiple nodes that are grouped together.
    Type: Application
    Filed: June 19, 2020
    Publication date: December 23, 2021
    Inventors: Charumathi Lakshmanan, Ye Li, Arnold Overwijk, Chenyan Xiong, Jiguang Shen, Junaid Ahmed, Jiaming Guo
  • Publication number: 20210326742
    Abstract: A computer-implemented technique is described herein for assisting a user in advancing a task objective. The technique uses a suggestion-generating system (SGS) to provide one or more suggestions to a user in response to at least a last-submitted query provided by the user. The SGS may correspond to a classification-type or generative-type neural network. The SGS uses a machine-trained model that is trained using a multi-task training framework based on plural groups of training examples, which, in turn, are produced using different respective example-generating methods. One such example-generating method constructs a training example from queries in a search session. It operates by identifying the task-related intent of the queries, and then identifying at least one sequence of queries in the search session that exhibits a coherent task-related intent. A training example is constructed based on queries in such a sequence.
    Type: Application
    Filed: April 16, 2020
    Publication date: October 21, 2021
    Inventors: Corby Louis ROSSET, Chenyan XIONG, Paul Nathan BENNETT, Saurabh Kumar TIWARY, Daniel Fernando CAMPOS, Xia SONG, Nicholas Eric CRASWELL
  • Patent number: 11138285
    Abstract: A computer-implemented technique receives an input expression that a user submits with an intent to accomplish some objective. The technique then uses a machine-trained intent encoder component to map the input expression into an input expression intent vector (IEIV). The IEIV corresponds to a distributed representation of the intent associated with the input expression, within an intent vector space. The technique then leverages the intent vector to facilitate some downstream application task, such as the retrieval of information. Some application tasks also use a neighbor search component to find expressions that express an intent similar to that of the input expression. A training system trains the intent encoder component based on the nexus between queries and user clicks, as recorded in a search engine's search log.
    Type: Grant
    Filed: March 7, 2019
    Date of Patent: October 5, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hongfei Zhang, Xia Song, Chenyan Xiong, Corbin Louis Rosset, Paul Nathan Bennett, Nicholas Eric Craswell, Saurabh Kumar Tiwary
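The neighbor-search component mentioned in the abstract above, which finds expressions with intent similar to the input expression, can be sketched as a cosine-similarity nearest-neighbor lookup over intent vectors. The corpus layout and the use of cosine similarity are illustrative assumptions; producing the intent vectors themselves is the job of the trained encoder:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two intent vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_intents(query_vec, corpus, k=2):
    """Neighbor search: the k stored expressions whose intent vectors lie
    closest to the input expression's intent vector."""
    ranked = sorted(corpus.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [expression for expression, _ in ranked[:k]]
```

A downstream task, per the abstract, could then retrieve information associated with the returned neighbor expressions.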
  • Publication number: 20210089594
    Abstract: This document relates to natural language processing using a framework such as a neural network. One example method involves obtaining a first document and a second document and propagating attention from the first document to the second document. The example method also involves producing contextualized semantic representations of individual words in the second document based at least on the propagating. The contextualized semantic representations can provide a basis for performing one or more natural language processing operations.
    Type: Application
    Filed: September 25, 2019
    Publication date: March 25, 2021
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Chenyan Xiong, Chen Zhao, Corbin Louis Rosset, Paul Nathan Bennett, Xia Song, Saurabh Kumar Tiwary
  • Publication number: 20210004439
    Abstract: A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
    Type: Application
    Filed: July 2, 2019
    Publication date: January 7, 2021
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Li XIONG, Chuan HU, Arnold OVERWIJK, Junaid AHMED, Daniel Fernando CAMPOS, Chenyan XIONG
  • Publication number: 20200285687
    Abstract: A computer-implemented technique is described herein that receives an input expression that a user submits with an intent to accomplish some objective. The technique then uses a machine-trained intent encoder component to map the input expression into an input expression intent vector (IEIV). The IEIV corresponds to a distributed representation of the intent associated with the input expression, within an intent vector space. The technique then leverages the intent vector to facilitate some downstream application task, such as the retrieval of information. Some application tasks also use a neighbor search component to find expressions that express an intent similar to that of the input expression. A training system trains the intent encoder component based on the nexus between queries and user clicks, as recorded in a search engine's search log.
    Type: Application
    Filed: March 7, 2019
    Publication date: September 10, 2020
    Inventors: Hongfei ZHANG, Xia SONG, Chenyan XIONG, Corbin Louis ROSSET, Paul Nathan BENNETT, Nicholas Eric CRASWELL, Saurabh Kumar TIWARY