Patents by Inventor Salim Roukos

Salim Roukos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Knowledge base question answering

Patent number: 11868716

Abstract: One or more computer processors parse a received natural language question into an abstract meaning representation (AMR) graph. The one or more computer processors enrich the AMR graph into an extended AMR graph. The one or more computer processors transform the extended AMR graph into a query graph utilizing a path-based approach, wherein the query graph is a directed edge-labeled graph. The one or more computer processors generate one or more answers to the natural language question through one or more queries created utilizing the query graph.
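
As a rough illustration of the final step only, the sketch below serializes a query graph into a SPARQL SELECT query. The graph is held as a list of directed, labeled (subject, label, object) edges. It is not the patented method: AMR parsing, graph enrichment, and the path-based transformation are not shown. The function name `query_graph_to_sparql` and the DBpedia-style identifiers (`dbo:River`, `dbr:Germany`) are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (subject node, edge label, object node)


def query_graph_to_sparql(edges: List[Edge], answer_var: str = "?x") -> str:
    """Serialize a directed edge-labeled query graph as a SPARQL SELECT query.

    Each edge becomes one triple pattern. Node names are emitted verbatim,
    so names such as '?x' act as SPARQL variables and prefixed names such as
    'dbr:Germany' act as knowledge-base identifiers.
    """
    patterns = [f"  {s} {label} {o} ." for s, label, o in edges]
    return f"SELECT DISTINCT {answer_var} WHERE {{\n" + "\n".join(patterns) + "\n}"


# Hypothetical query graph for "Which rivers flow through Germany?"
graph = [
    ("?x", "rdf:type", "dbo:River"),
    ("?x", "dbo:country", "dbr:Germany"),
]
print(query_graph_to_sparql(graph))
```

A real endpoint would typically also need PREFIX declarations for `rdf:`, `dbo:` and `dbr:`, which the sketch omits.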

Type: Grant

Filed: August 31, 2021

Date of Patent: January 9, 2024

Assignee: International Business Machines Corporation

Inventors: Srinivas Ravishankar, Pavan Kapanipathi Bangalore, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Dinesh Garg, Salim Roukos, Alexander Gray

Generating error event descriptions using context-specific attention

Patent number: 11853149

Abstract: Generating error event descriptions by receiving a set of error messages associated with an error event, generating a tokenization of at least one line of the set of error messages, providing the tokenization to an attention head according to a context of the tokenization, providing an output of the attention head as input to a generative model, generating a description of the error event according to the output, and providing the description to a user.

Type: Grant

Filed: September 10, 2021

Date of Patent: December 26, 2023

Assignee: International Business Machines Corporation

Inventors: Anjali Shah, Jennifer A. Mallette, Salim Roukos

Zero-shot entity linking based on symbolic information

Publication number: 20230229859

Abstract: Methods, systems, and computer program products for zero-shot entity linking based on symbolic information are provided herein. A computer-implemented method includes obtaining a knowledge graph comprising a set of entities and a training dataset comprising text samples for at least a subset of the entities in the knowledge graph; training a machine learning model to map an entity mention substring of a given sample of text to one corresponding entity in the set of entities, wherein the machine learning model is trained using a multi-task machine learning framework using symbolic information extracted from the knowledge graph; and mapping an entity mention substring of a new sample of text to one of the entities in the set using the trained machine learning model.

Type: Application

Filed: January 14, 2022

Publication date: July 20, 2023

Inventors: Dinesh Khandelwal, G P Shrivatsa Bhargav, Saswati Dana, Dinesh Garg, Pavan Kapanipathi Bangalore, Salim Roukos, Alexander Gray, L. Venkata Subramaniam

Generating error event descriptions using context-specific attention

Publication number: 20230084422

Abstract: Generating error event descriptions by receiving a set of error messages associated with an error event, generating a tokenization of at least one line of the set of error messages, providing the tokenization to an attention head according to a context of the tokenization, providing an output of the attention head as input to a generative model, generating a description of the error event according to the output, and providing the description to a user.

Type: Application

Filed: September 10, 2021

Publication date: March 16, 2023

Inventors: Anjali Shah, Jennifer A. Mallette, Salim Roukos

Knowledge base question answering

Publication number: 20230060589

Abstract: One or more computer processors parse a received natural language question into an abstract meaning representation (AMR) graph. The one or more computer processors enrich the AMR graph into an extended AMR graph. The one or more computer processors transform the extended AMR graph into a query graph utilizing a path-based approach, wherein the query graph is a directed edge-labeled graph. The one or more computer processors generate one or more answers to the natural language question through one or more queries created utilizing the query graph.

Type: Application

Filed: August 31, 2021

Publication date: March 2, 2023

Inventors: Srinivas Ravishankar, Pavan Kapanipathi Bangalore, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Dinesh Garg, Salim Roukos, Alexander Gray

Extracting facts from unstructured text

Publication number: 20220207384

Abstract: A system, computer program product, and method are provided for extraction of factual data from unstructured natural language (NL) text. A detection model is applied to convert unstructured NL text in a first language to annotated NL text. The detection model identifies two or more mentions from the unstructured NL text and a logical position of the mentions. The detection model further identifies a sequential position for each of the mentions and attaches a sequential position identifier. A pattern of rules corresponding with the annotated NL text is identified and applied to the annotated NL text, and one or more facts embedded within the annotated NL text are extracted and converted into structured data.
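
A minimal sketch of the last two steps, assuming the detection model has already produced typed mentions carrying sequential position identifiers. The rule table, the `Mention` class, and the trigger-phrase matching are hypothetical stand-ins for the "pattern of rules" the abstract mentions, not the patented rule format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Mention:
    text: str
    entity_type: str
    seq_id: int  # sequential position identifier attached by the detection step


# Hypothetical rule patterns: (left type, trigger phrase, right type) -> relation name
RULES: Dict[Tuple[str, str, str], str] = {
    ("PERSON", "works for", "ORG"): "employer",
    ("ORG", "is headquartered in", "LOC"): "headquarters",
}


def extract_facts(sentence: str, mentions: List[Mention]) -> List[Dict[str, str]]:
    """Apply the rule patterns to consecutive mention pairs and emit structured facts."""
    facts = []
    ordered = sorted(mentions, key=lambda m: m.seq_id)
    for left, right in zip(ordered, ordered[1:]):
        # Text between the two mentions; assumes each mention string occurs once
        start = sentence.index(left.text) + len(left.text)
        end = sentence.index(right.text, start)
        trigger = sentence[start:end].strip()
        relation = RULES.get((left.entity_type, trigger, right.entity_type))
        if relation is not None:
            facts.append({"subject": left.text, "relation": relation, "object": right.text})
    return facts


sentence = "Jane Doe works for Acme Corp"
mentions = [Mention("Acme Corp", "ORG", 2), Mention("Jane Doe", "PERSON", 1)]
print(extract_facts(sentence, mentions))
```

Sorting by `seq_id` rather than by list order is what lets the rule see the mentions in sentence order.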

Type: Application

Filed: December 30, 2020

Publication date: June 30, 2022

Applicant: International Business Machines Corporation

Inventors: Radu Florian, Salim Roukos, Martin Franz

Text classification using models with complementary granularity and accuracy

Patent number: 11373041

Abstract: A processor may receive a text segment. The processor may analyze the text segment at a plurality of granularity levels wherein each of the plurality of granularity levels has a comparative selection value for identifying one or more objects of interest within the text segment. The processor may select an optimized granularity level with an optimum comparative selection value. The processor may identify the one or more objects of interest within the text segment. The processor may display the one or more objects of interest to a user.

Type: Grant

Filed: September 18, 2020

Date of Patent: June 28, 2022

Assignee: International Business Machines Corporation

Inventors: Jian Ni, Radu Florian, Salim Roukos, Vittorio Castelli

Implementing relation linking for knowledge bases

Publication number: 20220129770

Abstract: A computer-implemented method according to one embodiment includes identifying a natural language query; translating the natural language query into an intermediate representation; converting the intermediate representation into one or more query triples; and performing relation linking between each of the one or more query triples and a plurality of knowledge base triples.
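
The sketch below illustrates only the final linking step. It uses plain string similarity from Python's standard `difflib.SequenceMatcher` as a stand-in scoring function, since the abstract does not say how triples are matched. The query triples and KB relation labels are hypothetical, and the translation of the question into an intermediate representation is not shown.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation phrase, object)


def link_relation(query_triple: Triple, kb_relations: List[str]) -> Tuple[str, float]:
    """Return the KB relation label most similar to the triple's relation phrase, with its score."""
    _, phrase, _ = query_triple
    scored = [
        (rel, SequenceMatcher(None, phrase.lower(), rel.lower()).ratio())
        for rel in kb_relations
    ]
    return max(scored, key=lambda pair: pair[1])


kb_relations = ["birthPlace", "deathPlace", "spouse", "author"]  # hypothetical KB schema
query_triples = [("?x", "birth place", "Paris"), ("Hamlet", "author", "?y")]
for triple in query_triples:
    print(triple, "->", link_relation(triple, kb_relations))
```

Surface similarity alone misses paraphrases such as "wrote" for `author`, which is one reason a practical linker would use richer signals than this sketch.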

Type: Application

Filed: October 23, 2020

Publication date: April 28, 2022

Inventors: Nandana Mihindukulasooriya, Gaetano Rossiello, Alfio Massimiliano Gliozzo, Pavan Kapanipathi Bangalore, Salim Roukos

Text classification using models with complementary granularity and accuracy

Publication number: 20220092262

Abstract: A processor may receive a text segment. The processor may analyze the text segment at a plurality of granularity levels wherein each of the plurality of granularity levels has a comparative selection value for identifying one or more objects of interest within the text segment. The processor may select an optimized granularity level with an optimum comparative selection value. The processor may identify the one or more objects of interest within the text segment. The processor may display the one or more objects of interest to a user.

Type: Application

Filed: September 18, 2020

Publication date: March 24, 2022

Inventors: Jian Ni, Radu Florian, Salim Roukos, Vittorio Castelli

Automatic cognate detection in a computer-assisted language learning system

Patent number: 9665562

Abstract: According to an aspect, a first word in a first language and a second word in a second language in a bilingual corpus are stemmed. A probability for aligning the first stem and the second stem and a distance metric between the normalized first stem and the normalized second stem are calculated. The first word and the second word are identified as a cognate pair when the probability and the distance metric meet a threshold criterion and stored as a cognate pair in a set of cognates. A candidate sentence in the second language is retrieved from a corpus. The candidate sentence is filtered by the active vocabulary of a user in the second language and the set of cognates. A sentence quality score is calculated for the candidate sentence, and the candidate sentence is ranked for presentation to the user based on the sentence quality score.
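
The cognate test described above combines two criteria. The sketch below makes several assumptions: a toy suffix-stripping stemmer, a hypothetical table of stem-alignment probabilities (in the abstract these come from aligning a bilingual corpus), and illustrative thresholds. Sentence retrieval, vocabulary filtering, and ranking are not shown.

```python
from typing import Dict, Tuple

# Hypothetical toy suffix list; a real system would use a proper stemmer per language.
SUFFIXES = ("tion", "ción", "s")


def naive_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix (illustration only)."""
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w


def normalized_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer length (0.0 means identical)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1] / max(len(a), len(b), 1)


def is_cognate(w1: str, w2: str, align_prob: Dict[Tuple[str, str], float],
               min_prob: float = 0.5, max_dist: float = 0.3) -> bool:
    """Both criteria must hold: the stems align well and are spelled similarly."""
    s1, s2 = naive_stem(w1), naive_stem(w2)
    return (align_prob.get((s1, s2), 0.0) >= min_prob
            and normalized_edit_distance(s1, s2) <= max_dist)


# Hypothetical stem-alignment probabilities (English stem, Spanish stem)
align_prob = {("informa", "informa"): 0.8, ("family", "familia"): 0.7, ("dog", "perro"): 0.9}
for en, es in [("information", "información"), ("family", "familia"),
               ("dog", "perro"), ("embarrassed", "embarazada")]:
    print(en, es, is_cognate(en, es, align_prob))
```

Requiring both criteria is what separates the cases: `dog`/`perro` align well but are spelled very differently, while the false friend `embarrassed`/`embarazada` has no alignment probability in the toy table and is rejected.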

Type: Grant

Filed: June 22, 2016

Date of Patent: May 30, 2017

Assignee: International Business Machines Corporation

Inventors: Jiri Navratil, Salim Roukos, Robert T. Ward

Automatic cognate detection in a computer-assisted language learning system

Patent number: 9400781

Abstract: According to an aspect, a first word in a first language and a second word in a second language in a bilingual corpus are stemmed. A probability for aligning the first stem and the second stem and a distance metric between the normalized first stem and the normalized second stem are calculated. The first word and the second word are identified as a cognate pair when the probability and the distance metric meet a threshold criterion and stored as a cognate pair in a set of cognates. A candidate sentence in the second language is retrieved from a corpus. The candidate sentence is filtered by the active vocabulary of a user in the second language and the set of cognates. A sentence quality score is calculated for the candidate sentence, and the candidate sentence is ranked for presentation to the user based on the sentence quality score.

Type: Grant

Filed: February 8, 2016

Date of Patent: July 26, 2016

Assignee: International Business Machines Corporation

Inventors: Jiri Navratil, Salim Roukos, Robert T. Ward

Method and apparatus for annotating a document

Publication number: 20070061703

Abstract: Methods and apparatus are provided for annotating documents with one or more of entities, events and relations. Documents are annotated by presenting the document to a user; presenting the user with a list of possible entity types, wherein the list of possible entity types is configurable; and obtaining at least one mention annotation that associates a selected phrase in the document with one of the possible entity types. The selected phrase can be presented to the user, for example, based on one or more presentation rules associated with the associated entity type. The method can be implemented, for example, in a client-server configuration where a browser communicates with a remote server.

Type: Application

Filed: September 12, 2005

Publication date: March 15, 2007

Applicant: International Business Machines Corporation

Inventors: Nandakishore Kambhatla, Salim Roukos

Mention-synchronous entity tracking system and method for chaining mentions

Publication number: 20050237227

Abstract: A Bell tree data structure is provided to model the process of chaining mentions from one or more documents into entities, tracking the entire process. The data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.
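
A Bell tree enumerates every way of partitioning mentions into entities: each new mention either joins an existing entity or starts a new one, so a full tree over n mentions has B(n) leaves (the Bell number; 15 for four mentions). The toy sketch below grows such a tree with an optional beam and ranks leaves by the product of step scores. The linking function, the "start a new entity" score, and the mention strings are hypothetical; the abstract does not describe the actual probability models.

```python
from typing import Callable, List, Tuple

Partition = List[List[int]]  # each inner list holds the mention indices of one entity


def bell_tree_search(n_mentions: int,
                     link_prob: Callable[[List[int], int], float],
                     beam: int = 0) -> List[Tuple[Partition, float]]:
    """Grow a Bell tree one mention at a time and return scored leaves, best first.

    At each level the next mention either joins one of the existing entities
    or starts a new one. A hypothesis is scored by the product of its step
    scores; beam > 0 keeps only that many partial hypotheses per level.
    """
    hyps: List[Tuple[Partition, float]] = [([], 1.0)]
    for m in range(n_mentions):
        expanded = []
        for partition, score in hyps:
            p_links = [link_prob(entity, m) for entity in partition]
            for k, p in enumerate(p_links):  # link mention m to entity k
                joined = [e + [m] if i == k else list(e) for i, e in enumerate(partition)]
                expanded.append((joined, score * p))
            # Toy "start a new entity" score: 1 minus the best link score
            p_new = 1.0 - max(p_links, default=0.0)
            expanded.append(([list(e) for e in partition] + [[m]], score * p_new))
        expanded.sort(key=lambda h: h[1], reverse=True)
        hyps = expanded[:beam] if beam > 0 else expanded
    return hyps


mentions = ["John Smith", "Acme", "Smith"]


def toy_link_prob(entity: List[int], m: int) -> float:
    """Hypothetical: 0.9 if mention m shares a word with any mention in the entity, else 0.1."""
    words = set(mentions[m].split())
    return 0.9 if any(words & set(mentions[i].split()) for i in entity) else 0.1


best_partition, best_score = bell_tree_search(len(mentions), toy_link_prob, beam=3)[0]
print(best_partition, round(best_score, 2))  # [[0, 2], [1]] 0.81
```

Because the leaf count grows as the Bell number, a beam (or similar pruning) is what keeps the search tractable beyond a handful of mentions.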

Type: Application

Filed: April 27, 2004

Publication date: October 27, 2005

Applicant: International Business Machines Corporation

Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim Roukos

System and method for rapid development of natural language understanding using active learning

Publication number: 20040111253

Abstract: A method, computer program product, and data processing system for training a statistical parser by utilizing active learning techniques to reduce the size of the corpus of human-annotated training samples (e.g., sentences) needed is disclosed. According to a preferred embodiment of the present invention, the statistical parser under training is used to compare the grammatical structure of the samples according to the parser's current level of training. The samples are then divided into clusters, with each cluster representing samples having a similar structure as ascertained by the statistical parser. Uncertainty metrics are applied to the clustered samples to select samples from each cluster that reflect uncertainty in the statistical parser's grammatical model. These selected samples may then be annotated by a human trainer for training the statistical parser.
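
A minimal sketch of the selection step, assuming the clustering has already been done. It uses the entropy of the parser's n-best parse probabilities as the uncertainty metric, which is an assumption since the abstract does not name one. The sentences, probabilities, and cluster ids are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a parser's distribution over candidate parses."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(parse_probs: Dict[str, Sequence[float]],
                          cluster_of: Dict[str, int],
                          per_cluster: int = 1) -> List[str]:
    """Pick the most uncertain samples from each structural cluster."""
    by_cluster: Dict[int, List[str]] = defaultdict(list)
    for sample, cluster in cluster_of.items():
        by_cluster[cluster].append(sample)
    selected = []
    for cluster in sorted(by_cluster):
        ranked = sorted(by_cluster[cluster], key=lambda s: entropy(parse_probs[s]), reverse=True)
        selected.extend(ranked[:per_cluster])
    return selected


parse_probs = {  # hypothetical n-best parse probabilities per sentence
    "show me flights to boston": [0.95, 0.05],
    "list fares to denver": [0.6, 0.4],
    "what does ap57 mean": [0.34, 0.33, 0.33],
    "what is fare code y": [0.9, 0.1],
}
cluster_of = {  # hypothetical cluster ids derived from parser-assigned structure
    "show me flights to boston": 0,
    "list fares to denver": 0,
    "what does ap57 mean": 1,
    "what is fare code y": 1,
}
print(select_for_annotation(parse_probs, cluster_of))
```

Selecting per cluster rather than globally keeps the annotation batch structurally diverse instead of spending it on many near-identical uncertain sentences.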

Type: Application

Filed: December 10, 2002

Publication date: June 10, 2004

Applicant: International Business Machines Corporation

Inventors: Xiaoqiang Luo, Salim Roukos, Min Tang

Specific task composite acoustic models

Patent number: 6260014

Abstract: A method for recognizing speech includes the steps of providing a generic model having a baseform representation of a vocabulary of words, identifying a subset of words relating to an application, constructing a task-specific model for the subset of words, constructing a composite model by combining the generic and task-specific models, and modifying the baseform representation of the subset of words such that the subset of words is recognized by the task-specific model. A system for recognizing speech includes a composite model having a generic model with a generic baseform representation of a vocabulary of words and a task-specific model for recognizing a subset of words relating to an application, wherein the subset of words is recognized using a modified baseform representation. A recognizer compares words input thereto with the generic model for words other than the subset of words and with the task-specific model for the subset of words.
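
A toy sketch of the composite lexicon idea. It assumes words are represented by phone-sequence baseforms and that a "modified baseform" can be modeled by tagging each phone with a task marker (`_T`, a hypothetical convention). It only shows how words are routed to the generic or task-specific model; no acoustic scoring is performed.

```python
from typing import Dict, Iterable, List, Tuple

Baseforms = Dict[str, List[str]]  # word -> phone sequence


def build_composite(generic: Baseforms, task_words: Iterable[str],
                    tag: str = "_T") -> Dict[str, Tuple[str, List[str]]]:
    """Map every vocabulary word to (model name, baseform).

    Task words get a modified baseform (here: each phone tagged with `tag`)
    so they are scored by the task-specific model; all other words keep the
    generic model and their generic baseform.
    """
    composite = {w: ("generic", list(phones)) for w, phones in generic.items()}
    for w in task_words:
        composite[w] = ("task", [ph + tag for ph in generic[w]])
    return composite


generic = {  # hypothetical ARPAbet-style baseforms
    "call": ["K", "AO", "L"],
    "balance": ["B", "AE", "L", "AH", "N", "S"],
}
composite = build_composite(generic, task_words=["balance"])
print(composite["call"])     # ('generic', ['K', 'AO', 'L'])
print(composite["balance"])  # ('task', ['B_T', 'AE_T', 'L_T', 'AH_T', 'N_T', 'S_T'])
```

Tagging the phones keeps the task words' baseforms disjoint from the generic inventory, so a decoder can tell which model should score them.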

Type: Grant

Filed: September 14, 1998

Date of Patent: July 10, 2001

Assignee: International Business Machines Corporation

Inventors: Lalit Rai Bahl, David Lubensky, Mukund Padmanabhan, Salim Roukos

Natural language task-oriented dialog manager and method

Patent number: 6246981

Abstract: A system for conversant interaction includes a recognizer for receiving and processing input information and outputting a recognized representation of the input information. A dialog manager is coupled to the recognizer for receiving the recognized representation of the input information, the dialog manager having task-oriented forms for associating user input information therewith, the dialog manager being capable of selecting an applicable form from the task-oriented forms responsive to the input information by scoring the forms relative to each other. A synthesizer is employed for converting a response generated by the dialog manager to output the response. A program storage device and method are also provided.
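
A minimal sketch of form selection by relative scoring, assuming the recognizer and parser have already produced the utterance text and any slot values. The `Form` class and the scoring rule (keyword hits plus the fraction of slots filled) are hypothetical; the abstract does not specify how forms are scored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Form:
    name: str
    keywords: Set[str]   # words suggesting this task (hypothetical)
    slots: List[str]     # information the task needs from the user
    values: Dict[str, str] = field(default_factory=dict)


def score(form: Form, words: Set[str], slot_values: Dict[str, str]) -> float:
    """Toy score: keyword hits plus the fraction of the form's slots the input fills."""
    hits = len(form.keywords & words)
    filled = sum(1 for s in form.slots if s in slot_values) / len(form.slots)
    return hits + filled


def select_form(forms: List[Form], utterance: str,
                slot_values: Dict[str, str]) -> Optional[Form]:
    """Score all forms against each other, pick the best, and copy matching slot values into it."""
    words = set(utterance.lower().split())
    best = max(forms, key=lambda f: score(f, words, slot_values), default=None)
    if best is not None:
        best.values.update({k: v for k, v in slot_values.items() if k in best.slots})
    return best


forms = [
    Form("transfer", {"transfer", "move"}, ["amount", "from_account", "to_account"]),
    Form("balance", {"balance", "much"}, ["account"]),
]
chosen = select_form(forms, "Transfer 100 dollars to savings",
                     {"amount": "100", "to_account": "savings"})
print(chosen.name, chosen.values)  # transfer {'amount': '100', 'to_account': 'savings'}
```

The still-empty `from_account` slot is what a dialog manager of this kind would prompt for next.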

Type: Grant

Filed: November 25, 1998

Date of Patent: June 12, 2001

Assignee: International Business Machines Corporation

Inventors: Kishore A. Papineni, Salim Roukos, Robert T. Ward

Telephone messaging and editing system

Patent number: 6219638

Abstract: A messaging system for receiving speech over a telephone and converting the speech to text includes a first server for receiving speech input by a user, a speech recognition system for converting the speech to text, a speech synthesizer for converting the text to speech for playing back the synthesized speech for correction by the user and a correction mechanism for enabling the user to correct the speech such that the corrected speech is provided as text for transmittal over a communication system.

Type: Grant

Filed: November 3, 1998

Date of Patent: April 17, 2001

Assignee: International Business Machines Corporation

Inventors: Mukund Padmanabhan, Michael Picheny, David Nahamoo, Salim Roukos

Statistical translation system and method for fast sense disambiguation and translation of large corpora using fertility models and sense models

Patent number: 6092034

Abstract: A system and method for translating a series of source words in a first language to a series of target words in a second language is provided. The system includes an input device for inputting the series of source words. A fertility hypothesis generator operatively coupled to the input device generates at least one fertility hypothesis for a fertility of a source word, based on the source word and a context of the source word. A sense hypothesis generator operatively coupled to the input device generates sense hypotheses for a translation of the source word, based on the source word and the context of the source word. A fertility model operatively coupled to the fertility hypothesis generator determines a probability of the fertility of the source word, based on the source word and the context of the source word.

Type: Grant

Filed: July 27, 1998

Date of Patent: July 18, 2000

Assignee: International Business Machines Corporation

Inventors: Jeffrey Scott McCarley, Salim Roukos

Building scalable n-gram language models using maximum likelihood maximum entropy n-gram models

Patent number: 5640487

Abstract: The present invention is an n-gram language modeler that significantly reduces the memory storage requirement and convergence time for language modelling systems and methods. The present invention assigns each n-gram to one of n non-intersecting classes. A count is determined for each n-gram representing the number of times the n-gram occurred in the training data. The n-grams are separated into classes and complement counts are determined. Using these counts and complement counts, factors are determined, one factor for each class, using an iterative scaling algorithm. The language model probability, i.e., the probability that a word occurs given the occurrence of the previous two words, is determined using these factors.

Type: Grant

Filed: June 7, 1995

Date of Patent: June 17, 1997

Assignee: International Business Machines Corporation

Inventors: Raymond Lau, Ronald Rosenfeld, Salim Roukos

Building scalable n-gram language models using maximum likelihood maximum entropy n-gram models

Patent number: 5467425

Abstract: The present invention is an n-gram language modeler that significantly reduces the memory storage requirement and convergence time for language modelling systems and methods. The present invention assigns each n-gram to one of n non-intersecting classes. A count is determined for each n-gram representing the number of times the n-gram occurred in the training data. The n-grams are separated into classes and complement counts are determined. Using these counts and complement counts, factors are determined, one factor for each class, using an iterative scaling algorithm. The language model probability, i.e., the probability that a word occurs given the occurrence of the previous two words, is determined using these factors.

Type: Grant

Filed: February 26, 1993

Date of Patent: November 14, 1995

Assignee: International Business Machines Corporation

Inventors: Raymond Lau, Ronald Rosenfeld, Salim Roukos