Patents by Inventor Ani Nenkova
Ani Nenkova has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240161529
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a digital document hierarchy comprising layers of parent-child element relationships from visual elements of a digital document image. For example, for a layer of the layers, the disclosed systems determine, from the visual elements, candidate parent visual elements and child visual elements. In addition, for the layer, the disclosed systems generate, from feature embeddings of the visual elements utilizing a neural network, element classifications for the candidate parent visual elements and parent-child element link probabilities for the candidate parent visual elements and the child visual elements. Moreover, for the layer, the disclosed systems select parent visual elements from the candidate parent visual elements based on the parent-child element link probabilities. Further, the disclosed systems utilize the digital document hierarchy to generate an interactive digital document from the digital document image.
Type: Application
Filed: November 15, 2022
Publication date: May 16, 2024
Inventors: Vlad Morariu, Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Nedim Lipka, Ani Nenkova
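The final selection step the abstract describes — choosing each child element's parent from the link probabilities — can be sketched as follows. This is a minimal illustration, assuming the network's probabilities are already computed; the function name and threshold are illustrative, not from the patent.

```python
import numpy as np

def select_parents(link_probs, threshold=0.5):
    """For each child element (column), pick the candidate parent (row)
    with the highest link probability, if it clears the threshold.

    link_probs: (num_candidate_parents, num_children) array of
    parent-child link probabilities, as produced by the neural network.
    Returns a list of parent indices (or None for unlinked children).
    """
    parents = []
    for j in range(link_probs.shape[1]):
        i = int(np.argmax(link_probs[:, j]))
        parents.append(i if link_probs[i, j] >= threshold else None)
    return parents

# Toy layer: 2 candidate parents, 3 child elements.
probs = np.array([[0.90, 0.20, 0.10],
                  [0.05, 0.70, 0.30]])
assignment = select_parents(probs)
```

Repeating this per layer yields the layered parent-child hierarchy the disclosure builds on.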
-
Publication number: 20240135096
Abstract: Systems and methods for document classification are described. Embodiments of the present disclosure generate classification data for a plurality of samples using a neural network trained to identify a plurality of known classes; select a set of samples for annotation from the plurality of samples using an open-set metric based on the classification data, wherein the annotation includes an unknown class; and train the neural network to identify the unknown class based on the annotation of the set of samples.
Type: Application
Filed: October 23, 2022
Publication date: April 25, 2024
Inventors: Rajiv Bhawanji Jain, Michelle Yuan, Vlad Ion Morariu, Ani Nenkova, Smitha Bangalore Naresh, Nikolaos Barmpalios, Ruchi Deshpande, Ruiyi Zhang, Jiuxiang Gu, Varun Manjunatha, Nedim Lipka, Andrew Marc Greene
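The selection step can be sketched with a simple open-set metric. The metric below (low maximum softmax probability over the known classes) is an illustrative stand-in, not necessarily the metric claimed in the application:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def select_for_annotation(logits, budget):
    """Rank samples by an open-set score -- here, one minus the maximum
    softmax probability over the known classes -- and return the indices
    of the `budget` most likely-unknown samples for human annotation."""
    probs = softmax(np.asarray(logits, dtype=float))
    openness = 1.0 - probs.max(axis=1)   # high => fits no known class well
    return list(np.argsort(-openness)[:budget])

logits = np.array([[5.0, 0.1, 0.2],    # confidently a known class
                   [1.1, 1.0, 0.9],    # diffuse: open-set candidate
                   [0.3, 4.0, 0.1]])   # confidently a known class
picked = select_for_annotation(logits, budget=1)
```

Once the selected samples are annotated (possibly with the new unknown class), the network is retrained to recognize it.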
-
Publication number: 20240135165
Abstract: One aspect of systems and methods for data correction includes identifying a false label from among predicted labels corresponding to different parts of an input sample, wherein the predicted labels are generated by a neural network trained based on a training set comprising training samples and training labels corresponding to parts of the training samples; computing an influence of each of the training labels on the false label by approximating a change in a conditional loss for the neural network corresponding to each of the training labels; identifying a part of a training sample of the training samples and a corresponding source label from among the training labels based on the computed influence; and modifying the training set based on the identified part of the training sample and the corresponding source label to obtain a corrected training set.
Type: Application
Filed: October 18, 2022
Publication date: April 25, 2024
Inventors: Varun Manjunatha, Sarthak Jain, Rajiv Bhawanji Jain, Ani Nenkova, Christopher Alan Tensmeyer, Franck Dernoncourt, Quan Hung Tran, Ruchi Deshpande
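The influence computation can be illustrated with a first-order approximation: score each training label by the inner product of its loss gradient with the loss gradient at the false prediction. This toy logistic-regression version is a simplified stand-in for the conditional-loss approximation the abstract describes:

```python
import numpy as np

def grad_logistic(w, x, y):
    """Gradient of the logistic loss -log p(y|x) w.r.t. weights w, y in {0,1}."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def influence_scores(w, train_X, train_y, x_false, y_false):
    """Score each training label's influence on a suspect (false) prediction
    via gradient inner products. A large positive score marks a training
    label that pushed the model toward the false label -- a candidate for
    correction in the training set."""
    g_false = grad_logistic(w, x_false, y_false)
    return [float(g_false @ grad_logistic(w, x, y))
            for x, y in zip(train_X, train_y)]

w = np.array([1.0, -1.0])
train_X = np.array([[1.0, 0.0], [0.0, 1.0]])
train_y = np.array([1, 1])                 # the second label may be noisy
scores = influence_scores(w, train_X, train_y,
                          x_false=np.array([0.0, 1.0]), y_false=1)
worst = int(np.argmax(scores))             # most influential training label
```

The highest-scoring training label is the one to inspect and, if wrong, fix to obtain the corrected training set.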
-
Publication number: 20240104951
Abstract: In various examples, a table recognition machine learning model receives an image of a table and generates, using a first encoder of the table recognition machine learning model, an image feature vector including features extracted from the image of the table; generates, using a first decoder of the table recognition machine learning model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table; and generates, using a second decoder of the table recognition machine learning model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table. The model then determines, using a third decoder of the table recognition machine learning model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
Type: Application
Filed: September 19, 2022
Publication date: March 28, 2024
Inventors: Jiuxiang Gu, Vlad Morariu, Tong Sun, Jason Wen Yong Kuen, Ani Nenkova
-
Patent number: 11880655
Abstract: Embodiments are disclosed for performing fact correction of natural language sentences using data tables. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input sentence, tokenizing elements of the input sentence, and identifying, by a first machine learning model, a data table associated with the input sentence. The systems and methods further comprise a second machine learning model identifying a tokenized element of the input sentence that renders the input sentence false based on the data table and masking the tokenized element of the tokenized input sentence that renders the input sentence false. The systems and methods further include a third machine learning model predicting a new value for the masked tokenized element based on the input sentence with the masked tokenized element and the identified data table and providing an output including a modified input sentence with the new value.
Type: Grant
Filed: April 19, 2022
Date of Patent: January 23, 2024
Assignee: Adobe Inc.
Inventors: Christopher Tensmeyer, Danilo Neves Ribeiro, Varun Manjunatha, Nedim Lipka, Ani Nenkova
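The three-stage pipeline (retrieve table, mask the false token, fill the mask) can be sketched end to end with rule-based stand-ins for the three learned models; the table, sentence, and matching rules here are toy assumptions, not the patented models:

```python
# Stage stand-ins for the three machine learning models in the pipeline.
TABLE = {"France": {"capital": "Paris"}, "Spain": {"capital": "Madrid"}}

def retrieve_table(tokens):
    """Stage 1 stand-in: pick the table row for the entity the sentence mentions."""
    for tok in tokens:
        if tok in TABLE:
            return tok, TABLE[tok]
    return None, None

def mask_false_token(tokens, row):
    """Stage 2 stand-in: mask a value-position token that the table contradicts."""
    out = list(tokens)
    for i, t in enumerate(tokens):
        if i > 0 and tokens[i - 1] == "is" and t not in row.values():
            out[i] = "[MASK]"
    return out

def fill_mask(tokens, row):
    """Stage 3 stand-in: predict the masked value from the retrieved row."""
    value = row["capital"]
    return [value if t == "[MASK]" else t for t in tokens]

tokens = "The capital of France is Lyon".split()
entity, row = retrieve_table(tokens)
masked = mask_false_token(tokens, row)
corrected = " ".join(fill_mask(masked, row))
```

In the patented embodiments each stage is a separate machine learning model rather than a rule; the data flow between stages is what this sketch shows.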
-
Publication number: 20230376687
Abstract: Embodiments are provided for facilitating multimodal extraction across multiple granularities. In one implementation, a set of features of a document for a plurality of granularities of the document is obtained. Via a machine learning model, the set of features of the document is modified to generate a set of modified features using a set of self-attention values to determine relationships within a first type of feature and a set of cross-attention values to determine relationships between the first type of feature and a second type of feature. Thereafter, the set of modified features is provided to a second machine learning model to perform a classification task.
Type: Application
Filed: May 17, 2022
Publication date: November 23, 2023
Inventors: Vlad Ion Morariu, Tong Sun, Nikolaos Barmpalios, Zilong Wang, Jiuxiang Gu, Ani Nenkova, Christopher Tensmeyer
-
Publication number: 20230368003
Abstract: The technology described herein is directed to an adaptive sparse attention pattern that is learned during fine-tuning and deployed in a machine-learning model. In aspects, a row or a column in an attention matrix with an importance score for a task that is above a threshold importance score is identified. The important row or column is included in an adaptive attention pattern used with a machine-learning model having a self-attention operation. In response to an input, a task-specific inference is generated for the input using the machine-learning model with the adaptive attention pattern.
Type: Application
Filed: May 10, 2022
Publication date: November 16, 2023
Inventors: Jiuxiang Gu, Zihan Wang, Jason Wen Yong Kuen, Handong Zhao, Vlad Ion Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun
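Building the sparse pattern from importance scores can be sketched as a boolean mask: keep a local band around the diagonal plus any full row or column whose fine-tuned importance exceeds the threshold. The local-band component and the scores below are illustrative assumptions; in the described technology the scores are learned during fine-tuning:

```python
import numpy as np

def adaptive_attention_mask(row_scores, col_scores, threshold, local=1):
    """Build a boolean attention mask that keeps (1) a local band around
    the diagonal and (2) every full row/column whose importance score
    for the task exceeds the threshold."""
    n = len(row_scores)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):                                   # local band
        mask[i, max(0, i - local):i + local + 1] = True
    mask[np.asarray(row_scores) > threshold, :] = True   # important rows
    mask[:, np.asarray(col_scores) > threshold] = True   # important columns
    return mask

mask = adaptive_attention_mask(row_scores=[0.1, 0.9, 0.2, 0.1],
                               col_scores=[0.8, 0.1, 0.1, 0.1],
                               threshold=0.5)
```

At inference time, attention scores are computed only where the mask is true, which is what makes the pattern sparse and task-adaptive.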
-
Publication number: 20230334244
Abstract: Embodiments are disclosed for performing fact correction of natural language sentences using data tables. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input sentence, tokenizing elements of the input sentence, and identifying, by a first machine learning model, a data table associated with the input sentence. The systems and methods further comprise a second machine learning model identifying a tokenized element of the input sentence that renders the input sentence false based on the data table and masking the tokenized element of the tokenized input sentence that renders the input sentence false. The systems and methods further include a third machine learning model predicting a new value for the masked tokenized element based on the input sentence with the masked tokenized element and the identified data table and providing an output including a modified input sentence with the new value.
Type: Application
Filed: April 19, 2022
Publication date: October 19, 2023
Applicant: Adobe Inc.
Inventors: Christopher Tensmeyer, Danilo Neves Ribeiro, Varun Manjunatha, Nedim Lipka, Ani Nenkova
-
Publication number: 20230186667
Abstract: Techniques described herein are directed to assisting review of documents. In one embodiment, one or more text segments and one or more subjects in a document are identified. A text segment in the document is associated with a corresponding subject identified in the document. The text segment is classified with a content type value corresponding to a relation of the text segment to the corresponding subject. Thereafter, information is provided for the text segment associated with the corresponding subject for display on a user interface. Such information can include a representation of the content type value for the text segment.
Type: Application
Filed: December 13, 2021
Publication date: June 15, 2023
Inventors: Navita Goyal, Ani Nenkova, Natwar Modani, Ayush Maheshwari, Inderjeet Jayakumar Nair
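The segment-to-subject association step can be illustrated with a simple heuristic: pair each text segment with the most recently seen subject. The colon-based subject detection here is an illustrative stand-in for the identification techniques the embodiment describes; content-type classification would follow on each pair:

```python
def associate_segments(lines):
    """Pair each text segment with the most recent subject (here, a line
    ending with ':'). Returns (subject, segment) pairs ready for
    content-type classification."""
    pairs, subject = [], None
    for line in lines:
        if line.endswith(":"):
            subject = line.rstrip(":")
        elif line.strip():
            pairs.append((subject, line))
    return pairs

pairs = associate_segments(["Side effects:", "Mild nausea was reported.",
                            "Dosage:", "Take twice daily."])
```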
-
Publication number: 20230154221
Abstract: The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document-encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
Type: Application
Filed: November 16, 2021
Publication date: May 18, 2023
Inventors: Jiuxiang Gu, Ani Nenkova, Nikolaos Barmpalios, Vlad Ion Morariu, Tong Sun, Rajiv Bhawanji Jain, Jason Wen Yong Kuen, Handong Zhao
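The cross-attention step between the two modalities can be sketched as standard scaled dot-product attention, with textual features as queries and visual features as keys/values. This single-head NumPy version with random toy features only illustrates the operation, not the full masked, multi-head encoder:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: queries from one modality
    attend over keys/values from the other modality."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ values

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((3, 8))    # masked textual features (3 tokens)
visual_feats = rng.standard_normal((5, 8))  # masked visual features (5 regions)
fused = cross_attention(text_feats, visual_feats, visual_feats)
```

The fused features would then feed the pretraining objectives (masked sentence modeling, contrastive learning, alignment).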
-
Publication number: 20120240032
Abstract: A system for generating a summary of a plurality of documents is provided. The system includes a computer readable document collection containing a plurality of related documents stored in electronic form therein, a plurality of forms of multiple document summarization engines, and a router for determining a temporal relationship of at least a subset of the documents in the collection and selecting one of the plurality of forms of multiple document summarization engines for generating a summary of the subset of documents based on the temporal relationship.
Type: Application
Filed: March 30, 2012
Publication date: September 20, 2012
Inventors: Kathleen R. McKeown, Regina Barzilay, Dave Evans, Vasileios Hatzivassiloglou, Judith Klavans, Ani Nenkova, Barry Schiffman
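The router's temporal decision can be sketched with publication dates: documents clustered within a short window look like a single event, while a wide spread calls for a different multi-document engine. The window size and engine names are illustrative assumptions:

```python
from datetime import date

def route_summarizer(doc_dates, single_event_window_days=7):
    """Router stand-in: inspect the temporal relationship of a document
    subset and select a summarization engine accordingly."""
    spread = (max(doc_dates) - min(doc_dates)).days
    if spread <= single_event_window_days:
        return "single_event_engine"
    return "multi_document_engine"

engine = route_summarizer([date(2004, 3, 1), date(2004, 3, 3)])
```

The selected engine then generates the summary for that subset of documents.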
-
Publication number: 20120240020
Abstract: A computer-based method of generating a summary of one or more documents comprises identifying content including text having a measurable quality from a predetermined location, evaluating the content, using a computer processor, to determine whether the content represents a document of interest, and preparing a summary of the content if the content represents a document of interest. A computer-based method of generating a summary of one or more documents, each including two or more sentences, is also provided.
Type: Application
Filed: May 17, 2012
Publication date: September 20, 2012
Inventors: Kathleen R. McKeown, Regina Barzilay, Dave Evans, Vasileios Hatzivassiloglou, Judith Klavans, Ani Nenkova, Barry Schiffman
-
Patent number: 8176418
Abstract: A system for generating a summary of a plurality of documents and presenting the summary information to a user is provided which includes a computer readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group documents into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. A number of multiple document summarization engines are provided which generate summaries for specific classes of multiple-document clusters. A summarizer router is employed to determine a relationship of the documents in a cluster and to select one of the document summarization engines for use in generating a summary of the cluster. A single event engine is provided to generate summaries of documents which are closely related temporally and to a specific event.
Type: Grant
Filed: March 4, 2005
Date of Patent: May 8, 2012
Assignee: The Trustees of Columbia University in the City of New York
Inventors: Kathleen R. McKeown, Regina Barzilay, Dave Evans, Vasileios Hatzivassiloglou, Judith Klavans, Ani Nenkova, Barry Schiffman
-
Publication number: 20060031304
Abstract: Methods and apparatus are disclosed for classifying the relative position of one or more text messages (including transcribed audio messages) in a related thread of text messages. One or more classifiers are applied to the text messages; and a classification of the text messages is obtained that indicates the relative position of the text messages in the thread. For example, a thread can include a root message, a leaf message and one or more inner messages, and the classification can indicate whether each text message is a root message, a leaf message or an inner message. The classifiers are trained on a set of training messages that have been previously classified to indicate a relative position of each training message in a corresponding thread. The classifiers employ one or more features that help to distinguish between root and non-root messages.
Type: Application
Filed: April 27, 2004
Publication date: February 9, 2006
Inventors: Amit Bagga, Ani Nenkova
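A rule-based sketch of the root/inner/leaf distinction shows the kind of features such classifiers might use; quoted-text lines and "Re:" prefixes as non-root signals are illustrative assumptions standing in for the trained classifiers:

```python
def classify_thread_position(message, has_reply):
    """Toy stand-in for trained thread-position classifiers: quoted text
    ('> ' lines) or a 'Re:' subject marks a non-root message; whether
    anything replied to it separates inner from leaf."""
    is_reply = any(line.startswith(">") for line in message.splitlines()) \
        or message.lower().startswith("re:")
    if not is_reply:
        return "root"
    return "inner" if has_reply else "leaf"

label = classify_thread_position("Re: meeting\n> when?\nNoon works.",
                                 has_reply=False)
```

A learned classifier would combine many such features rather than hard rules.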
-
Publication number: 20050262214
Abstract: A method and apparatus are provided for summarizing a text message, such as an email message or a transcribed audio message. A portion of each text message, such as a sentence, is extracted as an indicative summary of the text message based on a degree of overlap of words in the sentence with a set of words, such as words in the message subject or words in a related root message. The extracted portion is based on a score for each portion of the text message, such as a sentence. An interface is also provided for presenting the indicative summaries of a set of related text messages to a user.
Type: Application
Filed: April 27, 2004
Publication date: November 24, 2005
Inventors: Amit Bagga, Ani Nenkova
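The overlap-based extraction the abstract describes can be sketched directly: score each sentence by its word overlap with a reference word set (e.g., the subject line) and extract the best-scoring one. Tokenization details here are simplified assumptions:

```python
def indicative_summary(sentences, context_words):
    """Score each candidate sentence by word overlap with a reference word
    set (e.g., subject-line words or a related root message) and extract
    the highest-scoring sentence as the indicative summary."""
    ref = {w.lower() for w in context_words}

    def score(sentence):
        words = {w.strip(".,!?").lower() for w in sentence.split()}
        return len(words & ref)

    return max(sentences, key=score)

summary = indicative_summary(
    ["Thanks for the update.", "The budget review moved to Friday."],
    context_words="budget review schedule".split())
```

The interface described in the application would then display one such summary per message in a related set.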
-
Publication number: 20050203970
Abstract: A system for generating a summary of a plurality of documents and presenting the summary information to a user is provided which includes a computer readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group documents into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. A number of multiple document summarization engines are provided which generate summaries for specific classes of multiple-document clusters. A summarizer router is employed to determine a relationship of the documents in a cluster and to select one of the document summarization engines for use in generating a summary of the cluster. A single event engine is provided to generate summaries of documents which are closely related temporally and to a specific event.
Type: Application
Filed: March 4, 2005
Publication date: September 15, 2005
Inventors: Kathleen McKeown, Regina Barzilay, Dave Evans, Vasileios Hatzivassiloglou, Judith Klavans, Ani Nenkova, Barry Schiffman