Patents by Inventor Ani Nenkova

Ani Nenkova has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240161529
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a digital document hierarchy comprising layers of parent-child element relationships from the visual elements. For example, for a layer of the layers, the disclosed systems determine, from the visual elements, candidate parent visual elements and child visual elements. In addition, for the layer of the layers, the disclosed systems generate, from the feature embeddings utilizing a neural network, element classifications for the candidate parent visual elements and parent-child element link probabilities for the candidate parent visual elements and the child visual elements. Moreover, for the layer, the disclosed systems select parent visual elements from the candidate parent visual elements based on the parent-child element link probabilities. Further, the disclosed systems utilize the digital document hierarchy to generate an interactive digital document from the digital document image.
    Type: Application
    Filed: November 15, 2022
    Publication date: May 16, 2024
    Inventors: Vlad Morariu, Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Nedim Lipka, Ani Nenkova
  • Publication number: 20240135096
    Abstract: Systems and methods for document classification are described. Embodiments of the present disclosure generate classification data for a plurality of samples using a neural network trained to identify a plurality of known classes; select a set of samples for annotation from the plurality of samples using an open-set metric based on the classification data, wherein the annotation includes an unknown class; and train the neural network to identify the unknown class based on the annotation of the set of samples.
    Type: Application
    Filed: October 23, 2022
    Publication date: April 25, 2024
    Inventors: Rajiv Bhawanji Jain, Michelle Yuan, Vlad Ion Morariu, Ani Nenkova Nenkova, Smitha Bangalore Naresh, Nikolaos Barmpalios, Ruchi Deshpande, Ruiyi Zhang, Jiuxiang Gu, Varun Manjunatha, Nedim Lipka, Andrew Marc Greene
  • Publication number: 20240135165
    Abstract: One aspect of systems and methods for data correction includes identifying a false label from among predicted labels corresponding to different parts of an input sample, wherein the predicted labels are generated by a neural network trained based on a training set comprising training samples and training labels corresponding to parts of the training samples; computing an influence of each of the training labels on the false label by approximating a change in a conditional loss for the neural network corresponding to each of the training labels; identifying a part of a training sample of the training samples and a corresponding source label from among the training labels based on the computed influence; and modifying the training set based on the identified part of the training sample and the corresponding source label to obtain a corrected training set.
    Type: Application
    Filed: October 18, 2022
    Publication date: April 25, 2024
    Inventors: Varun Manjunatha, Sarthak Jain, Rajiv Bhawanji Jain, Ani Nenkova Nenkova, Christopher Alan Tensmeyer, Franck Dernoncourt, Quan Hung Tran, Ruchi Deshpande
  • Publication number: 20240104951
    Abstract: In various examples, a table recognition model receives an image of a table and generates, using a first encoder of the table recognition machine learning model, an image feature vector including features extracted from the image of the table; generates, using a first decoder of the table recognition machine learning model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table, and generates, using a second decoder of the table recognition machine learning model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table, then determines, using a third decoder of the table recognition machine learning model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
    Type: Application
    Filed: September 19, 2022
    Publication date: March 28, 2024
    Inventors: Jiuxiang Gu, Vlad Morariu, Tong Sun, Jason Wen Yong Kuen, Ani Nenkova
  • Patent number: 11880655
    Abstract: Embodiments are disclosed for performing fact correction of natural language sentences using data tables. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input sentence, tokenizing elements of the input sentence, and identifying, by a first machine learning model, a data table associated with the input sentence. The systems and methods further comprise a second machine learning model identifying a tokenized element of the input sentence that renders the input sentence false based on the data table and masking the tokenized element of the tokenized input sentence that renders the input sentence false. The systems and methods further include a third machine learning model predicting a new value for the masked tokenized element based on the input sentence with the masked tokenized element and the identified data table and providing an output including a modified input sentence with the new value.
    Type: Grant
    Filed: April 19, 2022
    Date of Patent: January 23, 2024
    Assignee: Adobe Inc.
    Inventors: Christopher Tensmeyer, Danilo Neves Ribeiro, Varun Manjunatha, Nedim Lipka, Ani Nenkova
  • Publication number: 20230376687
    Abstract: Embodiments are provided for facilitating multimodal extraction across multiple granularities. In one implementation, a set of features of a document for a plurality of granularities of the document is obtained. Via a machine learning model, the set of features of the document are modified to generate a set of modified features using a set of self-attention values to determine relationships within a first type of feature and a set of cross-attention values to determine relationships between the first type of feature and a second type of feature. Thereafter, the set of modified features are provided to a second machine learning model to perform a classification task.
    Type: Application
    Filed: May 17, 2022
    Publication date: November 23, 2023
    Inventors: Vlad Ion Morariu, Tong Sun, Nikolaos Barmpalios, Zilong Wang, Jiuxiang Gu, Ani Nenkova Nenkova, Christopher Tensmeyer
  • Publication number: 20230368003
    Abstract: The technology described herein is directed to an adaptive sparse attention pattern that is learned during fine-tuning and deployed in a machine-learning model. In aspects, a row or a column in an attention matrix with an importance score for a task that is above a threshold importance score is identified. The important row or the column is included in an adaptive attention pattern used with a machine-learning model having a self-attention operation. In response to an input, a task-specific inference is generated for the input using the machine-learning model with the adaptive attention pattern.
    Type: Application
    Filed: May 10, 2022
    Publication date: November 16, 2023
    Inventors: Jiuxiang Gu, Zihan Wang, Jason Wen Yong Kuen, Handong Zhao, Vlad Ion Morariu, Ruiyi Zhang, Ani Nenkova Nenkova, Tong Sun
  • Publication number: 20230334244
    Abstract: Embodiments are disclosed for performing fact correction of natural language sentences using data tables. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input sentence, tokenizing elements of the input sentence, and identifying, by a first machine learning model, a data table associated with the input sentence. The systems and methods further comprise a second machine learning model identifying a tokenized element of the input sentence that renders the input sentence false based on the data table and masking the tokenized element of the tokenized input sentence that renders the input sentence false. The systems and methods further include a third machine learning model predicting a new value for the masked tokenized element based on the input sentence with the masked tokenized element and the identified data table and providing an output including a modified input sentence with the new value.
    Type: Application
    Filed: April 19, 2022
    Publication date: October 19, 2023
    Applicant: Adobe Inc.
    Inventors: Christopher Tensmeyer, Danilo Neves Ribeiro, Varun Manjunatha, Nedim Lipka, Ani Nenkova
  • Publication number: 20230186667
    Abstract: Techniques described herein are directed to assisting review of documents. In one embodiment, one or more text segments and one or more subjects in a document are identified. A text segment in the document is associated with a corresponding subject identified in the document. The text segment is classified with a content type value corresponding to a relation of the text segment to the corresponding subject. Thereafter, information is provided for the text segment associated with the corresponding subject for display on a user interface. Such information can include a representation of the content type value for the text segment.
    Type: Application
    Filed: December 13, 2021
    Publication date: June 15, 2023
    Inventors: Navita Goyal, Ani Nenkova Nenkova, Natwar Modani, Ayush Maheshwari, Inderjeet Jayakumar Nair
  • Publication number: 20230154221
    Abstract: The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document-encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
    Type: Application
    Filed: November 16, 2021
    Publication date: May 18, 2023
    Inventors: Jiuxiang Gu, Ani Nenkova Nenkova, Nikolaos Barmpalios, Vlad Ion Morariu, Tong Sun, Rajiv Bhawanji Jain, Jason Wen Yong Kuen, Handong Zhao
  • Publication number: 20120240032
    Abstract: System for generating a summary of a plurality of documents is provided. The system includes a computer readable document collection containing a plurality of related documents stored in electronic form therein, a plurality of forms of multiple document summarization engines, and a router for determining a temporal relationship of at least a subset of the documents in the collection and selecting one of the plurality of forms of multiple document summarization engines for generating a summary of the subset of documents based on the temporal relationship.
    Type: Application
    Filed: March 30, 2012
    Publication date: September 20, 2012
    Inventors: Kathleen R. McKeown, Regina Barzilay, Dave Evans, Vasileios Hatzivassiloglou, Judith Klavans, Ani Nenkova, Barry Schiffman
  • Publication number: 20120240020
    Abstract: Computer-based method of generating a summary of one or more documents comprises identifying content including text having a measurable quality from a predetermined location, evaluating the content, using a computer processor, to determine whether the content represents a document of interest, and preparing a summary of the content if the content represents a document of interest. A computer-based method of generating a summary of one or more documents, each including two or more sentences, is also provided.
    Type: Application
    Filed: May 17, 2012
    Publication date: September 20, 2012
    Inventors: Kathleen R. McKeown, Regina Barzilay, Dave Evans, Vasileios Hatzivassiloglou, Judith Klavans, Ani Nenkova, Barry Schiffman
  • Patent number: 8176418
    Abstract: A system for generating a summary of a plurality of documents and presenting the summary information to a user is provided which includes a computer readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group documents into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. A number of multiple document summarization engines are provided which generate summaries for specific classes of multiple document clusters. A summarizer router is employed to determine a relationship of the documents in a cluster and select one of the document summarization engines for use in generating a summary of the cluster. A single event engine is provided to generate summaries of documents which are closely related temporally and to a specific event.
    Type: Grant
    Filed: March 4, 2005
    Date of Patent: May 8, 2012
    Assignee: The Trustees of Columbia University in the City of New York
    Inventors: Kathleen R. McKeown, Regina Barzilay, Dave Evans, Vasileios Hatzivassiloglou, Judith Klavans, Ani Nenkova, Barry Schiffman
  • Publication number: 20060031304
    Abstract: Methods and apparatus are disclosed for classifying the relative position of one or more text messages (including transcribed audio messages) in a related thread of text messages. One or more classifiers are applied to the text messages; and a classification of the text messages is obtained that indicates the relative position of the text messages in the thread. For example, a thread can include a root message, a leaf message and one or more inner messages, and the classification can indicate whether each text message is a root message, a leaf message or an inner message. The classifiers are trained on a set of training messages that have been previously classified to indicate a relative position of each training message in a corresponding thread. The classifiers employ one or more features that help to distinguish between root and non-root messages.
    Type: Application
    Filed: April 27, 2004
    Publication date: February 9, 2006
    Inventors: Amit Bagga, Ani Nenkova
  • Publication number: 20050262214
    Abstract: A method and apparatus are provided for summarizing a text message, such as an email message or a transcribed audio message. A portion of each text message, such as a sentence, is extracted as an indicative summary of the text message based on a degree of overlap of words in the sentence with a set of words, such as words in the message subject or words in a related root message. The extracted portion is based on a score for each portion of the text message, such as a sentence. An interface is also provided for presenting the indicative summaries of a set of related text messages to a user.
    Type: Application
    Filed: April 27, 2004
    Publication date: November 24, 2005
    Inventors: Amit Bagga, Ani Nenkova
  • Publication number: 20050203970
    Abstract: A system for generating a summary of a plurality of documents and presenting the summary information to a user is provided which includes a computer readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group documents into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. A number of multiple document summarization engines are provided which generate summaries for specific classes of multiple document clusters. A summarizer router is employed to determine a relationship of the documents in a cluster and select one of the document summarization engines for use in generating a summary of the cluster. A single event engine is provided to generate summaries of documents which are closely related temporally and to a specific event.
    Type: Application
    Filed: March 4, 2005
    Publication date: September 15, 2005
    Inventors: Kathleen McKeown, Regina Barzilay, Dave Evans, Vasileios Hatzivassiloglou, Judith Klavans, Ani Nenkova, Barry Schiffman
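
The abstract of publication 20050262214 above describes extracting one sentence as an indicative summary of a message, based on how many of its words overlap with a reference set such as the message subject. The following is only a minimal sketch of that general idea, not the method claimed in the application. The function names (tokenize, split_sentences, indicative_summary), the regular-expression tokenizer and sentence splitter, and the scoring rule (raw count of shared words, ties going to the earliest sentence) are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def split_sentences(message):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", message)
    return [p.strip() for p in parts if p.strip()]


def indicative_summary(message, reference):
    """Return the sentence sharing the most words with the reference text.

    The reference could be the message subject or a related root message.
    Scoring by raw overlap count is an assumption; ties keep the earliest
    sentence. An empty message yields an empty string.
    """
    reference_words = tokenize(reference)
    best_sentence = ""
    best_score = -1
    for sentence in split_sentences(message):
        score = len(tokenize(sentence) & reference_words)
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence


if __name__ == "__main__":
    body = ("Hi all. The budget review meeting moved to Friday. "
            "Please bring your laptops.")
    print(indicative_summary(body, "Budget review meeting"))
```

A production system would likely weight words (for example, discounting stop words) and handle quoted reply text; this sketch leaves both out to keep the overlap scoring visible.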