Patents by Inventor Daniel Gillick

Daniel Gillick has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Identifying codemixed text

Patent number: 10579733

Abstract: A method for identifying codemixed text includes receiving codemixed text and segmenting the codemixed text into a plurality of tokens. Each token includes at least one character and is delineated from any adjacent tokens by a space. For each token of the codemixed text, the method also includes extracting features from the token and predicting a probability distribution over possible languages for the token using a language identifier model configured to receive the extracted features from the token as feature inputs. The method also includes assigning a language to each token of the codemixed text by executing a greedy search on the probability distribution over the possible languages predicted for each respective token.

Type: Grant

Filed: May 10, 2018

Date of Patent: March 3, 2020

Assignee: Google LLC

Inventors: Jason Riesa, Daniel Gillick, Yuan Zhang, Anton Bakalov, Jason Baldridge, David Weiss
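
The steps named in the abstract of patent 10579733 (segment on spaces, extract features per token, predict a distribution over languages, pick a language greedily) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the character-bigram features, the `toy_model` scoring function, and the reading of "greedy search" as a per-token argmax are not taken from the patent, which may define its features, model, and search differently.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Distribution = Dict[str, float]


def segment(text: str) -> List[str]:
    """Split on whitespace so every token has at least one character."""
    return text.split()


def extract_features(token: str, n: int = 2) -> Counter:
    """Character n-gram counts with boundary markers (an assumed feature set)."""
    padded = f"^{token.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def toy_model(features: Counter) -> Distribution:
    """Hypothetical stand-in for a trained language identifier model."""
    cues = {"en": {"th", "he", "is"}, "es": {"qu", "ue", "es"}}
    # Add-one smoothing keeps every language at a non-zero probability.
    scores = {lang: 1.0 + sum(features[g] for g in grams)
              for lang, grams in cues.items()}
    total = sum(scores.values())
    return {lang: score / total for lang, score in scores.items()}


def identify(text: str,
             model: Callable[[Counter], Distribution]) -> List[Tuple[str, str]]:
    """Assign each token the most probable language under the model."""
    labeled = []
    for token in segment(text):
        dist = model(extract_features(token))
        labeled.append((token, max(dist, key=dist.get)))
    return labeled


print(identify("the queso", toy_model))
```

A real system would replace `toy_model` with a trained classifier; the surrounding loop only shows how per-token distributions turn into per-token labels.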

Identifying Codemixed Text

Publication number: 20190347323

Abstract: A method for identifying codemixed text includes receiving codemixed text and segmenting the codemixed text into a plurality of tokens. Each token includes at least one character and is delineated from any adjacent tokens by a space. For each token of the codemixed text, the method also includes extracting features from the token and predicting a probability distribution over possible languages for the token using a language identifier model configured to receive the extracted features from the token as feature inputs. The method also includes assigning a language to each token of the codemixed text by executing a greedy search on the probability distribution over the possible languages predicted for each respective token.

Type: Application

Filed: May 10, 2018

Publication date: November 14, 2019

Applicant: Google LLC

Inventors: Jason Riesa, Daniel Gillick, Yuan Zhang, Anton Bakalov, Jason Baldridge, David Weiss

Techniques for automatically identifying salient entities in documents

Patent number: 9619457

Abstract: A computer-implemented technique can include obtaining a training corpus including pairs of (i) documents and (ii) corresponding abstracts. The technique can include identifying a set of entity mentions in each abstract and each corresponding document based on their respective part-of-speech (POS) tags and dependency parses. The technique can include clustering the sets of entity mentions referring to a same underlying entity to obtain clusters for each document and each corresponding abstract. The technique can include aligning specific abstract entity mentions to corresponding document entity mentions to obtain a set of aligned abstract and document entities. The technique can include labeling the set of aligned entities as salient and unaligned entities as non-salient to generate a labeled corpus. The technique can also include training features of a classifier using the labeled corpus to obtain a trained classifier.

Type: Grant

Filed: July 16, 2014

Date of Patent: April 11, 2017

Assignee: GOOGLE INC.

Inventors: Daniel Gillick, Amarnag Subramanya
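
The labeling step in the abstract of patent 9619457 (document entities aligned to an abstract entity are salient, unaligned ones are non-salient) can be sketched as follows. This is a simplified illustration under stated assumptions: entity clusters are taken as given, and alignment is approximated by matching normalized mention strings, which is a hypothetical rule and not the alignment method the patent describes. The function names and the sample data are invented for the example.

```python
from typing import Dict, Iterable, Set


def normalize(mention: str) -> str:
    """Lowercase and collapse whitespace so mentions compare loosely."""
    return " ".join(mention.lower().split())


def label_salience(doc_clusters: Dict[str, Set[str]],
                   abstract_clusters: Iterable[Set[str]]) -> Dict[str, bool]:
    """Mark a document entity salient if any of its mentions also
    appears in some abstract entity cluster (assumed alignment rule)."""
    abstract_mentions = {normalize(m)
                         for cluster in abstract_clusters
                         for m in cluster}
    return {
        entity_id: any(normalize(m) in abstract_mentions for m in mentions)
        for entity_id, mentions in doc_clusters.items()
    }


# Hypothetical clusters: entity id -> mentions found in the document.
doc_clusters = {
    "e1": {"Acme Corp", "Acme"},
    "e2": {"Springfield"},
}
abstract_clusters = [{"acme"}]

labels = label_salience(doc_clusters, abstract_clusters)
print(labels)
```

The resulting labels would serve as training targets for a classifier, as the abstract describes; the classifier and its features are outside this sketch.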
