Patents by Inventor Chenyan XIONG

Chenyan XIONG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240070202
    Abstract: A computer-implemented technique is described herein for assisting a user in advancing a task objective. The technique uses a suggestion-generating system (SGS) to provide one or more suggestions to a user in response to at least a last-submitted query provided by the user. The SGS may correspond to a classification-type or generative-type neural network. The SGS uses a machine-trained model that is trained using a multi-task training framework based on plural groups of training examples, which, in turn, are produced using different respective example-generating methods. One such example-generating method constructs a training example from queries in a search session. It operates by identifying the task-related intent of the queries, and then identifying at least one sequence of queries in the search session that exhibits a coherent task-related intent. A training example is constructed based on queries in such a sequence.
    Type: Application
    Filed: November 7, 2023
    Publication date: February 29, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Corby Louis ROSSET, Chenyan XIONG, Paul Nathan BENNETT, Saurabh Kumar TIWARY, Daniel Fernando CAMPOS, Xia SONG, Nicholas Eric CRASWELL
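The example-generating method described in the abstract above (segmenting a search session into intent-coherent query sequences, then forming training pairs) can be sketched roughly as follows. The session data, intent labels, and function names are illustrative assumptions, not the patented implementation:

```python
def coherent_spans(queries, intents):
    """Group consecutive session queries that share the same task-related intent."""
    spans, start = [], 0
    for i in range(1, len(queries) + 1):
        if i == len(queries) or intents[i] != intents[start]:
            if i - start >= 2:  # a coherent span needs at least two queries
                spans.append(queries[start:i])
            start = i
    return spans

def make_training_examples(queries, intents):
    """Build (context queries, next-query suggestion) pairs from coherent spans."""
    examples = []
    for span in coherent_spans(queries, intents):
        for k in range(1, len(span)):
            examples.append((tuple(span[:k]), span[k]))
    return examples
```

Each pair treats the earlier queries in a coherent span as context and the following query as the suggestion target, which is one plausible way to read "a training example is constructed based on queries in such a sequence."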
  • Publication number: 20240046026
    Abstract: A method for text compression comprises recognizing a prefix string of one or more text characters preceding a target string of a plurality of text characters to be compressed. The prefix string is provided to a natural language generation (NLG) model configured to output one or more predicted continuations each having an associated rank. If the one or more predicted continuations include a matching predicted continuation relative to the next one or more text characters of the target string, the next one or more text characters are compressed as an NLG-type compressed representation. If no predicted continuations match the next one or more text characters of the target string, a longest matching entry in a compression dictionary is identified. The next one or more text characters of the target string are compressed as a dictionary-type compressed representation that includes the dictionary index value of the longest matching entry.
    Type: Application
    Filed: October 17, 2023
    Publication date: February 8, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Ronny LEMPEL, Chenyan XIONG
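The compression loop the abstract describes (try a ranked NLG continuation first, fall back to the longest matching dictionary entry) can be sketched as a round-trip codec. The toy `predict` callable, the literal-character fallback, and the token format are assumptions for illustration; the patent's actual NLG model and encoding are not specified here:

```python
def compress(text, predict, dictionary):
    """predict(prefix) -> ranked list of continuation strings."""
    tokens, i = [], 0
    while i < len(text):
        continuations = predict(text[:i])
        match = next((c for c in continuations if c and text.startswith(c, i)), None)
        if match is not None:                 # NLG-type representation: store the rank
            tokens.append(("nlg", continuations.index(match)))
            i += len(match)
            continue
        best = max((e for e in dictionary if text.startswith(e, i)),
                   key=len, default=None)
        if best is not None:                  # dictionary-type representation: store the index
            tokens.append(("dict", dictionary.index(best)))
            i += len(best)
        else:                                 # literal fallback for uncovered characters
            tokens.append(("lit", text[i]))
            i += 1
    return tokens

def decompress(tokens, predict, dictionary):
    """Invert compress() by replaying ranks/indices against the same model and dictionary."""
    text = ""
    for kind, value in tokens:
        if kind == "nlg":
            text += predict(text)[value]
        elif kind == "dict":
            text += dictionary[value]
        else:
            text += value
    return text
```

Because the decompressor rebuilds the same prefix the compressor saw, replaying a stored rank against the same deterministic model recovers the identical continuation.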
  • Patent number: 11853362
    Abstract: A computer-implemented technique is described herein for assisting a user in advancing a task objective. The technique uses a suggestion-generating system (SGS) to provide one or more suggestions to a user in response to at least a last-submitted query provided by the user. The SGS may correspond to a classification-type or generative-type neural network. The SGS uses a machine-trained model that is trained using a multi-task training framework based on plural groups of training examples, which, in turn, are produced using different respective example-generating methods. One such example-generating method constructs a training example from queries in a search session. It operates by identifying the task-related intent of the queries, and then identifying at least one sequence of queries in the search session that exhibits a coherent task-related intent. A training example is constructed based on queries in such a sequence.
    Type: Grant
    Filed: April 16, 2020
    Date of Patent: December 26, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Corby Louis Rosset, Chenyan Xiong, Paul Nathan Bennett, Saurabh Kumar Tiwary, Daniel Fernando Campos, Xia Song, Nicholas Eric Craswell
  • Patent number: 11829374
    Abstract: Document embedding vectors for each document of a corpus may be generated by combining embedding vectors for document subparts, thereby yielding a final embedding vector for the document. A machine learning model is trained using a query corpus and the document corpus, where the model generates a ranking score for a given (query, document) pair. During training, ranking scores are generated using the model, such that the training dataset is further refined using the generated ranking scores. For example, top documents and a negative document may be determined for a given query and subsequently used as training data. Multiple negative documents may therefore be determined for a given query. A negative document for a given query may be determined from the negative documents using noise-contrastive estimation. Such determined negative documents may be evaluated using a loss function during model training, thereby yielding a more robust model for search processing.
    Type: Grant
    Filed: March 19, 2021
    Date of Patent: November 28, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Junaid Ahmed, Li Xiong, Arnold Overwijk, Chenyan Xiong
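Two pieces of the abstract above lend themselves to a short sketch: combining subpart embeddings into one document vector, and picking a hard negative from candidate documents with a noise-contrastive-style sampling step (probability proportional to softmaxed ranking score). The mean-pooling combiner and the exact sampling scheme are illustrative assumptions:

```python
import numpy as np

def document_embedding(subpart_vectors):
    """Combine subpart embedding vectors into a final document vector (here: mean)."""
    return np.mean(subpart_vectors, axis=0)

def pick_hard_negative(query_vec, candidate_vecs, positive_idx, rng):
    """Sample a negative document with probability proportional to its softmaxed
    ranking score (dot product with the query): high-scoring non-relevant
    documents make the most informative negatives."""
    scores = np.array([query_vec @ v for v in candidate_vecs], dtype=float)
    scores[positive_idx] = -np.inf          # never sample the relevant document
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(rng.choice(len(candidate_vecs), p=probs))
```

Sampled negatives would then feed the loss function during training, per the abstract.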
  • Patent number: 11803693
    Abstract: A method for text compression comprises recognizing a prefix string of one or more text characters preceding a target string of a plurality of text characters to be compressed. The prefix string is provided to a natural language generation (NLG) model configured to output one or more predicted continuations each having an associated rank. If the one or more predicted continuations include a matching predicted continuation relative to the next one or more text characters of the target string, the next one or more text characters are compressed as an NLG-type compressed representation. If no predicted continuations match the next one or more text characters of the target string, a longest matching entry in a compression dictionary is identified. The next one or more text characters of the target string are compressed as a dictionary-type compressed representation that includes the dictionary index value of the longest matching entry.
    Type: Grant
    Filed: June 18, 2021
    Date of Patent: October 31, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ronny Lempel, Chenyan Xiong
  • Patent number: 11734559
    Abstract: To provide automated categorization of structured textual content, individual nodes of textual content, from a document object model encapsulation of the structured textual content, have a multidimensional vector associated with them, where the values of the various dimensions of the multidimensional vector are based on the textual content in the corresponding node, the visual features applied or associated with the textual content of the corresponding node, and positional information of the textual content of the corresponding node. The multidimensional vectors are input to a neighbor-imbuing neural network. The enhanced multidimensional vectors output by the neighbor-imbuing neural network are then provided to a categorization neural network. The resulting output can be in the form of multidimensional vectors whose dimensionality is proportional to the number of categories into which the structured textual content is to be categorized. A weighted merge takes into account multiple nodes that are grouped together.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: August 22, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Charumathi Lakshmanan, Ye Li, Arnold Overwijk, Chenyan Xiong, Jiguang Shen, Junaid Ahmed, Jiaming Guo
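The pipeline in the abstract above (per-node feature vector, neighbor-imbuing pass, categorization layer) can be sketched with plain arrays. The concatenation of feature groups, mean-pooling over DOM neighbors, and a single linear-plus-softmax categorizer are stand-ins for the patent's neural networks:

```python
import numpy as np

def node_vector(text_feats, visual_feats, position_feats):
    """Per-node multidimensional vector: textual + visual + positional features."""
    return np.concatenate([text_feats, visual_feats, position_feats])

def imbue_neighbors(vectors, adjacency):
    """'Neighbor-imbuing' step sketched as mean-pooling each node with its
    DOM neighbors (one graph-convolution-like pass)."""
    out = []
    for i, v in enumerate(vectors):
        neighbors = [vectors[j] for j in adjacency[i]]
        out.append(np.mean([v] + neighbors, axis=0))
    return out

def categorize(vectors, weight):
    """Categorization layer: softmax over per-node category logits."""
    logits = np.stack(vectors) @ weight
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)
```

A weighted merge over grouped nodes, as the abstract mentions, would average the resulting category distributions with per-node weights.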
  • Patent number: 11657223
    Abstract: A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
    Type: Grant
    Filed: December 16, 2021
    Date of Patent: May 23, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Li Xiong, Chuan Hu, Arnold Overwijk, Junaid Ahmed, Daniel Fernando Campos, Chenyan Xiong
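The layered flow of the key-phrase model above (hybrid word embeddings, convolutional n-gram embeddings, feedforward scoring into a distribution over n-grams) can be sketched numerically. Mean-pooling windows in place of the convolutional transformer and a single weight vector in place of the feedforward layer are simplifying assumptions:

```python
import numpy as np

def hybrid_embeddings(elmo, position, visual):
    """Concatenate ELMo, position, and visual features per word."""
    return [np.concatenate([e, p, v]) for e, p, v in zip(elmo, position, visual)]

def ngram_embeddings(word_vecs, n):
    """Convolutional step, sketched as a mean over each length-n window of words."""
    return [np.mean(word_vecs[i:i + n], axis=0)
            for i in range(len(word_vecs) - n + 1)]

def keyphrase_scores(ngram_vecs, weight):
    """Feedforward scoring + softmax: a probability distribution over the n-grams."""
    logits = np.array([v @ weight for v in ngram_vecs])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

Training against labeled key phrases with a cross-entropy loss, as the abstract states, would compare these per-n-gram scores with annotated labels.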
  • Publication number: 20220405461
    Abstract: A method for text compression comprises recognizing a prefix string of one or more text characters preceding a target string of a plurality of text characters to be compressed. The prefix string is provided to a natural language generation (NLG) model configured to output one or more predicted continuations each having an associated rank. If the one or more predicted continuations include a matching predicted continuation relative to the next one or more text characters of the target string, the next one or more text characters are compressed as an NLG-type compressed representation. If no predicted continuations match the next one or more text characters of the target string, a longest matching entry in a compression dictionary is identified. The next one or more text characters of the target string are compressed as a dictionary-type compressed representation that includes the dictionary index value of the longest matching entry.
    Type: Application
    Filed: June 18, 2021
    Publication date: December 22, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Ronny LEMPEL, Chenyan XIONG
  • Publication number: 20220374479
    Abstract: This document relates to natural language processing using a framework such as a neural network. One example method involves obtaining a first document and a second document and propagating attention from the first document to the second document. The example method also involves producing contextualized semantic representations of individual words in the second document based at least on the propagating. The contextualized semantic representations can provide a basis for performing one or more natural language processing operations.
    Type: Application
    Filed: July 18, 2022
    Publication date: November 24, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Chenyan Xiong, Chen Zhao, Corbin Louis Rosset, Paul Nathan Bennett, Xia Song, Saurabh Kumar Tiwary
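"Propagating attention from the first document to the second" in the abstract above can be sketched as one scaled dot-product cross-attention pass that contextualizes each word of the second document against the first. The residual connection and single-pass form are illustrative assumptions about an otherwise unspecified framework:

```python
import numpy as np

def propagate_attention(doc1_vecs, doc2_vecs):
    """Contextualize each word vector of the second document by attending
    over the first document's word vectors (scaled dot-product attention)."""
    D1, D2 = np.stack(doc1_vecs), np.stack(doc2_vecs)
    scores = D2 @ D1.T / np.sqrt(D1.shape[1])
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return D2 + attn @ D1   # residual: original word vector + attended context
```

The returned rows are contextualized semantic representations of the second document's words, which downstream natural language processing operations could consume.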
  • Patent number: 11423093
    Abstract: This document relates to natural language processing using a framework such as a neural network. One example method involves obtaining a first document and a second document and propagating attention from the first document to the second document. The example method also involves producing contextualized semantic representations of individual words in the second document based at least on the propagating. The contextualized semantic representations can provide a basis for performing one or more natural language processing operations.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 23, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Chenyan Xiong, Chen Zhao, Corbin Louis Rosset, Paul Nathan Bennett, Xia Song, Saurabh Kumar Tiwary
  • Publication number: 20220179871
    Abstract: Document embedding vectors for each document of a corpus may be generated by combining embedding vectors for document subparts, thereby yielding a final embedding vector for the document. A machine learning model is trained using a query corpus and the document corpus, where the model generates a ranking score for a given (query, document) pair. During training, ranking scores are generated using the model, such that the training dataset is further refined using the generated ranking scores. For example, top documents and a negative document may be determined for a given query and subsequently used as training data. Multiple negative documents may therefore be determined for a given query. A negative document for a given query may be determined from the negative documents using noise-contrastive estimation. Such determined negative documents may be evaluated using a loss function during model training, thereby yielding a more robust model for search processing.
    Type: Application
    Filed: March 19, 2021
    Publication date: June 9, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Junaid AHMED, Li XIONG, Arnold OVERWIJK, Chenyan XIONG
  • Publication number: 20220108078
    Abstract: A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
    Type: Application
    Filed: December 16, 2021
    Publication date: April 7, 2022
    Inventors: Li XIONG, Chuan HU, Arnold OVERWIJK, Junaid AHMED, Daniel Fernando CAMPOS, Chenyan XIONG
  • Patent number: 11250214
    Abstract: A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
    Type: Grant
    Filed: July 2, 2019
    Date of Patent: February 15, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Li Xiong, Chuan Hu, Arnold Overwijk, Junaid Ahmed, Daniel Fernando Campos, Chenyan Xiong
  • Publication number: 20210397944
    Abstract: To provide automated categorization of structured textual content, individual nodes of textual content, from a document object model encapsulation of the structured textual content, have a multidimensional vector associated with them, where the values of the various dimensions of the multidimensional vector are based on the textual content in the corresponding node, the visual features applied or associated with the textual content of the corresponding node, and positional information of the textual content of the corresponding node. The multidimensional vectors are input to a neighbor-imbuing neural network. The enhanced multidimensional vectors output by the neighbor-imbuing neural network are then provided to a categorization neural network. The resulting output can be in the form of multidimensional vectors whose dimensionality is proportional to the number of categories into which the structured textual content is to be categorized. A weighted merge takes into account multiple nodes that are grouped together.
    Type: Application
    Filed: June 19, 2020
    Publication date: December 23, 2021
    Inventors: Charumathi Lakshmanan, Ye Li, Arnold Overwijk, Chenyan Xiong, Jiguang Shen, Junaid Ahmed, Jiaming Guo
  • Publication number: 20210326742
    Abstract: A computer-implemented technique is described herein for assisting a user in advancing a task objective. The technique uses a suggestion-generating system (SGS) to provide one or more suggestions to a user in response to at least a last-submitted query provided by the user. The SGS may correspond to a classification-type or generative-type neural network. The SGS uses a machine-trained model that is trained using a multi-task training framework based on plural groups of training examples, which, in turn, are produced using different respective example-generating methods. One such example-generating method constructs a training example from queries in a search session. It operates by identifying the task-related intent of the queries, and then identifying at least one sequence of queries in the search session that exhibits a coherent task-related intent. A training example is constructed based on queries in such a sequence.
    Type: Application
    Filed: April 16, 2020
    Publication date: October 21, 2021
    Inventors: Corby Louis ROSSET, Chenyan XIONG, Paul Nathan BENNETT, Saurabh Kumar TIWARY, Daniel Fernando CAMPOS, Xia SONG, Nicholas Eric CRASWELL
  • Patent number: 11138285
    Abstract: A computer-implemented technique receives an input expression that a user submits with an intent to accomplish some objective. The technique then uses a machine-trained intent encoder component to map the input expression into an input expression intent vector (IEIV). The IEIV corresponds to a distributed representation of the intent associated with the input expression, within an intent vector space. The technique then leverages the intent vector to facilitate some downstream application task, such as the retrieval of information. Some application tasks also use a neighbor search component to find expressions that express an intent similar to that of the input expression. A training system trains the intent encoder component based on the nexus between queries and user clicks, as recorded in a search engine's search log.
    Type: Grant
    Filed: March 7, 2019
    Date of Patent: October 5, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hongfei Zhang, Xia Song, Chenyan Xiong, Corbin Louis Rosset, Paul Nathan Bennett, Nicholas Eric Craswell, Saurabh Kumar Tiwary
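The neighbor-search component mentioned in the abstract above, which finds expressions with intent similar to the input expression, can be sketched as a cosine-similarity nearest-neighbor lookup over intent vectors. The corpus layout and the use of cosine similarity are illustrative assumptions; producing the intent vectors themselves is the job of the trained encoder:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two intent vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_intents(query_vec, corpus, k=2):
    """Neighbor search: the k stored expressions whose intent vectors lie
    closest to the input expression's intent vector."""
    ranked = sorted(corpus.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [expression for expression, _ in ranked[:k]]
```

A downstream task, per the abstract, could then retrieve information associated with the returned neighbor expressions.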
  • Publication number: 20210089594
    Abstract: This document relates to natural language processing using a framework such as a neural network. One example method involves obtaining a first document and a second document and propagating attention from the first document to the second document. The example method also involves producing contextualized semantic representations of individual words in the second document based at least on the propagating. The contextualized semantic representations can provide a basis for performing one or more natural language processing operations.
    Type: Application
    Filed: September 25, 2019
    Publication date: March 25, 2021
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Chenyan Xiong, Chen Zhao, Corbin Louis Rosset, Paul Nathan Bennett, Xia Song, Saurabh Kumar Tiwary
  • Publication number: 20210004439
    Abstract: A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
    Type: Application
    Filed: July 2, 2019
    Publication date: January 7, 2021
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Li XIONG, Chuan HU, Arnold OVERWIJK, Junaid AHMED, Daniel Fernando CAMPOS, Chenyan XIONG
  • Publication number: 20200285687
    Abstract: A computer-implemented technique is described herein that receives an input expression that a user submits with an intent to accomplish some objective. The technique then uses a machine-trained intent encoder component to map the input expression into an input expression intent vector (IEIV). The IEIV corresponds to a distributed representation of the intent associated with the input expression, within an intent vector space. The technique then leverages the intent vector to facilitate some downstream application task, such as the retrieval of information. Some application tasks also use a neighbor search component to find expressions that express an intent similar to that of the input expression. A training system trains the intent encoder component based on the nexus between queries and user clicks, as recorded in a search engine's search log.
    Type: Application
    Filed: March 7, 2019
    Publication date: September 10, 2020
    Inventors: Hongfei ZHANG, Xia SONG, Chenyan XIONG, Corbin Louis ROSSET, Paul Nathan BENNETT, Nicholas Eric CRASWELL, Saurabh Kumar TIWARY