Patents by Inventor Branimir K. Boguraev

Branimir K. Boguraev has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160203120
    Abstract: According to an aspect, a candidate lexical kernel unit that includes a word token sequence having two or more words is received. Domain terms that contain the two or more words are retrieved from a terminology resource file of domain terms associated with a domain. The candidate lexical kernel unit and the retrieved domain terms are analyzed to determine whether the candidate lexical kernel unit satisfies specified criteria for use as a building block by a natural-language processing (NLP) tool for building larger lexical units in the domain. Each of the larger lexical units includes a greater number of words than the candidate lexical kernel unit. The candidate lexical kernel unit is identified as a lexical kernel unit based on determining that the candidate lexical kernel unit satisfies the specified criteria. The lexical kernel unit is output to a domain-specific lexical kernel unit file for input to the NLP tool.
    Type: Application
    Filed: March 11, 2015
    Publication date: July 14, 2016
    Inventors: Branimir K. Boguraev, Esme Manandise, Benjamin P. Segal
  • Publication number: 20160203119
    Abstract: According to an aspect, a candidate lexical kernel unit that includes a word token sequence having two or more words is received. Domain terms that contain the two or more words are retrieved from a terminology resource file of domain terms associated with a domain. The candidate lexical kernel unit and the retrieved domain terms are analyzed to determine whether the candidate lexical kernel unit satisfies specified criteria for use as a building block by a natural-language processing (NLP) tool for building larger lexical units in the domain. Each of the larger lexical units includes a greater number of words than the candidate lexical kernel unit. The candidate lexical kernel unit is identified as a lexical kernel unit based on determining that the candidate lexical kernel unit satisfies the specified criteria. The lexical kernel unit is output to a domain-specific lexical kernel unit file for input to the NLP tool.
    Type: Application
    Filed: January 9, 2015
    Publication date: July 14, 2016
    Inventors: Branimir K. Boguraev, Esme Manandise, Benjamin P. Segal
  • Publication number: 20160179783
    Abstract: According to an aspect, a candidate token sequence including one or more word tokens is extracted from an unstructured domain glossary that includes entries associated with a domain. A look-up operation is performed to retrieve language data for each word token in the candidate token sequence and to annotate each word token found by the look-up operation with corresponding retrieved language data to form an annotated sequence. A pattern match of the annotated sequence is performed relative to a repository of patterns to identify a best matching pattern from the repository of patterns to the annotated sequence based on matching criteria. The annotated sequence is refined with lexical information associated with the best matching pattern as a refined annotated sequence. The candidate token sequence and the refined annotated sequence are output to a domain-specific computational lexicon file.
    Type: Application
    Filed: March 5, 2015
    Publication date: June 23, 2016
    Inventors: Branimir K. Boguraev, Esme Manandise, Benjamin P. Segal
  • Publication number: 20160179782
    Abstract: According to an aspect, a candidate token sequence including one or more word tokens is extracted from an unstructured domain glossary that includes entries associated with a domain. A look-up operation is performed to retrieve language data for each word token in the candidate token sequence and to annotate each word token found by the look-up operation with corresponding retrieved language data to form an annotated sequence. A pattern match of the annotated sequence is performed relative to a repository of patterns to identify a best matching pattern from the repository of patterns to the annotated sequence based on matching criteria. The annotated sequence is refined with lexical information associated with the best matching pattern as a refined annotated sequence. The candidate token sequence and the refined annotated sequence are output to a domain-specific computational lexicon file.
    Type: Application
    Filed: December 23, 2014
    Publication date: June 23, 2016
    Inventors: Branimir K. Boguraev, Esme Manandise, Benjamin P. Segal
  • Publication number: 20160147840
    Abstract: Annotations can be handled by a computer system that receives a query that specifies parameters for extraction of particular annotations from a set of annotations. Annotations include metadata that describes properties of the associated text fragment. A first entity subset, a second entity subset and a relations subset of annotations are extracted from an annotated text corpus. Contextual information relative to the extracted annotations is also extracted from the corpus. A user interface is generated to display frame elements that include the extracted annotations subsets and the extracted contextual information. In response to selections to the frame elements, the system receives input that specifies modifications to the annotations. Based on the input received, the set of annotations is modified in the annotated text corpus.
    Type: Application
    Filed: November 21, 2014
    Publication date: May 26, 2016
    Inventors: Branimir K. Boguraev, Anthony T. Levas
  • Publication number: 20160062980
    Abstract: Mechanisms are provided in a question answering (QA) system comprising a QA system pipeline that analyzes an input question and generates an answer to the input question, for pre-processing the input question. The mechanisms receive an input question and input the input question to a pre-processor flow path having one or more pre-processors. The one or more pre-processors transform the input question into a transformed question by correcting errors in a formulation of the input question that are determined to be detrimental to efficient and accurate processing of the input question by a QA system pipeline of the QA system. The transformed question is then input to the QA system pipeline of the QA system which processes the transformed question to generate and output an answer to the input question.
    Type: Application
    Filed: August 29, 2014
    Publication date: March 3, 2016
    Inventors: Branimir K. Boguraev, John P. Bufe, III, Matthew T. Hatem, Jared M.D. Smythe
  • Patent number: 9031832
    Abstract: Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: May 12, 2015
    Assignee: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, Jennifer Chu-Carroll, David A. Ferrucci, Anthony T. Levas, John M. Prager
  • Patent number: 9020805
    Abstract: Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index.
    Type: Grant
    Filed: September 23, 2011
    Date of Patent: April 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, Jennifer Chu-Carroll, David A. Ferrucci, Anthony T. Levas, John M. Prager
  • Publication number: 20140072948
    Abstract: A method of generating secondary questions in a question-answer system. Missing information is identified from a corpus of data using a computerized device. The missing information comprises any information that improves confidence scores for candidate answers to a question. The computerized device automatically generates a plurality of hypotheses concerning the missing information. The computerized device automatically generates at least one secondary question based on each of the plurality of hypotheses. The hypotheses are ranked based on relative utility to determine an order in which the computerized device outputs the at least one secondary question to external sources to obtain responses.
    Type: Application
    Filed: September 11, 2012
    Publication date: March 13, 2014
    Applicant: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, David W. Buchanan, Jennifer Chu-Carroll, David A. Ferrucci, Aditya A. Kalyanpur, James W. Murdock, IV, Siddharth A. Patwardhan
  • Publication number: 20140072947
    Abstract: A method of generating secondary questions in a question-answer system. Missing information is identified from a corpus of data using a computerized device. The missing information comprises any information that improves confidence scores for candidate answers to a question. The computerized device automatically generates a plurality of hypotheses concerning the missing information. The computerized device automatically generates at least one secondary question based on each of the plurality of hypotheses. The hypotheses are ranked based on relative utility to determine an order in which the computerized device outputs the at least one secondary question to external sources to obtain responses.
    Type: Application
    Filed: September 11, 2012
    Publication date: March 13, 2014
    Applicant: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, David W. Buchanan, Jennifer Chu-Carroll, David A. Ferrucci, Aditya A. Kalyanpur, James W. Murdock, IV, Siddharth A. Patwardhan
  • Publication number: 20130013546
    Abstract: A system for providing community for customer questions receives a customer question. The customer question may be classified into a classification from a plurality of classifications categorizing whether a question is answerable, needs expert assistance, needs more information, or is not answerable. Based on the classification and one or more incentives, the question may be further routed to an appropriate community. The interactions with a customer in receiving and answering the customer question may be recorded.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 10, 2013
    Applicant: International Business Machines Corporation
    Inventors: Sugato Bagchi, Branimir K. Boguraev, Anthony T. Levas, Roberto Sicconi, Wlodek W. Zadrozny
  • Publication number: 20120330648
    Abstract: Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index.
    Type: Application
    Filed: September 6, 2012
    Publication date: December 27, 2012
    Applicant: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, Jennifer Chu-Carroll, David A. Ferrucci, Anthony T. Levas, John M. Prager
  • Publication number: 20120084112
    Abstract: A system for providing community for customer questions receives a customer question. The customer question may be classified into a classification from a plurality of classifications categorizing whether a question is answerable, needs expert assistance, needs more information, or is not answerable. Based on the classification and one or more incentives, the question may be further routed to an appropriate community. The interactions with a customer in receiving and answering the customer question may be recorded.
    Type: Application
    Filed: September 24, 2011
    Publication date: April 5, 2012
    Applicant: International Business Machines Corporation
    Inventors: Sugato Bagchi, Branimir K. Boguraev, Anthony T. Levas, Roberto Sicconi, Wlodek W. Zadrozny
  • Publication number: 20120084076
    Abstract: Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index.
    Type: Application
    Filed: September 23, 2011
    Publication date: April 5, 2012
    Applicant: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, Jennifer Chu-Carroll, David A. Ferrucci, Anthony T. Levas, John M. Prager
  • Publication number: 20120036478
    Abstract: An apparatus includes a data processing system for generating and displaying a semantic type concordance. The data processing system includes memory storing a computer program, a display to display data of a concordance generated by the program, and a processor configured to execute the computer program. The computer program includes instructions for displaying a user interface configured to enable a user to select semantic types and specify at least one text document, generating a concordance of the at least one document based on the semantic types, and displaying data of the generated concordance on the display.
    Type: Application
    Filed: August 6, 2010
    Publication date: February 9, 2012
    Applicant: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, Youssef Drissi, David A. Ferrucci, Paul T. Keyser, Anthony T. Levas
  • Patent number: 7937338
    Abstract: A system and method for processing documents by utilizing the textual content and layout of the documents, including visual indicators, to more efficiently and reliably process the documents across various document types. The system and method identifies visually distinguishable elements within the document, such as section and sub-section boundary indicators, to mark, divide and label the boundaries and content type such that the sections are more clearly identifiable and easily processed. The system and method uses known elements, including section heading types, keywords, section type classifiers, sub-section heading constructs, stop words, and the like to adaptively identify and process a broad range of document types. The system and method continually refines and updates these known elements and allows users to discover and define new elements for further refinement and updating.
    Type: Grant
    Filed: April 30, 2008
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, Roy J. Byrd, Keh-Shin F. Cheng, Anni R. Coden, Michael A. Tanenblatt, Wilfried Teiken
  • Publication number: 20090276378
    Abstract: A system and method for processing documents by utilizing the textual content and layout of the documents, including visual indicators, to more efficiently and reliably process the documents across various document types. The system and method identifies visually distinguishable elements within the document, such as section and sub-section boundary indicators, to mark, divide and label the boundaries and content type such that the sections are more clearly identifiable and easily processed. The system and method uses known elements, including section heading types, keywords, section type classifiers, sub-section heading constructs, stop words, and the like to adaptively identify and process a broad range of document types. The system and method continually refines and updates these known elements and allows users to discover and define new elements for further refinement and updating.
    Type: Application
    Filed: April 30, 2008
    Publication date: November 5, 2009
    Applicant: International Business Machines Corporation
    Inventors: Branimir K. Boguraev, Roy J. Byrd, Keh-Shin F. Cheng, Anni R. Coden, Michael A. Tanenblatt, Wilfried Teiken
  • Patent number: 6212494
    Abstract: A method involving computer-mediated linguistic analysis of online technical documentation to extract and catalog from the documentation knowledge essential to, for example, creating an online help database useful in providing online assistance to users in performing a task. The method comprises stripping markup tags from the documentation, linguistically analyzing and annotating the text, including the steps of morphologically and lexically analyzing the text, disambiguating between possible parts-of-speech for each word, and syntactically analyzing and labeling each word.
    Type: Grant
    Filed: July 20, 1998
    Date of Patent: April 3, 2001
    Assignee: Apple Computer, Inc.
    Inventor: Branimir K. Boguraev
  • Patent number: 5799268
    Abstract: A method involving computer-mediated linguistic analysis of online technical documentation to extract and catalog from the documentation knowledge essential to, for example, creating an online help database useful in providing online assistance to users in performing a task. The method comprises stripping markup tags from the documentation, linguistically analyzing and annotating the text, including the steps of morphologically and lexically analyzing the text, disambiguating between possible parts-of-speech for each word, and syntactically analyzing and labeling each word.
    Type: Grant
    Filed: September 28, 1994
    Date of Patent: August 25, 1998
    Assignee: Apple Computer, Inc.
    Inventor: Branimir K. Boguraev
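
Illustrative note: the abstract of patent 5799268 names "stripping markup tags from the documentation" as the step that precedes linguistic analysis. The Python sketch below shows one way such a step might look for HTML-like markup. It is not the patented method. The names `TagStripper` and `strip_markup` are hypothetical, and the choice of Python's standard `html.parser` module is an assumption made only for illustration.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text content and discards markup tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text found between tags; the tags themselves are ignored.
        self._chunks.append(data)

    def text(self):
        # Collapse whitespace runs left behind where tags were removed.
        return " ".join("".join(self._chunks).split())


def strip_markup(document: str) -> str:
    """Return the plain text of an HTML-like document (assumed input format)."""
    parser = TagStripper()
    parser.feed(document)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    sample = "<p>Click <b>Save</b> to store the file.</p>"
    print(strip_markup(sample))
```

The plain text produced this way could then be passed to whatever tokenizer or part-of-speech tagger performs the later analysis steps the abstract lists; those steps are outside the scope of this sketch.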