Patents by Inventor Nicolae Duta
Nicolae Duta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11348330
Abstract: Systems, methods, and computer-executable instructions for extracting key value data. Optical character recognition (OCR) text of a document is received. The y-coordinate of characters are adjusted to a common y-coordinate. The rows of OCR text are tokenized into tokens based on a distance between characters. The tokens are ordered based on the x,y coordinates of the characters. The document is clustered into a cluster based on the ordered tokens and ordered tokens from other documents. Keys for the cluster are determined from the first set of documents. Each key is a token from a first set of documents. A value is assigned to each key based on the tokens for the document, and values are assigned to each key for the other documents. The values for the document and the values for the other documents are stored in an output document.
Type: Grant
Filed: June 9, 2020
Date of Patent: May 31, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventor: Nicolae Duta
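The row-alignment and tokenization steps named in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Char` record, the `tokenize` function, and the `row_tol` and `gap` thresholds are all hypothetical names and values chosen for illustration, and the clustering and key/value assignment steps are left out.

```python
from dataclasses import dataclass


@dataclass
class Char:
    """One OCR character with its position (hypothetical record layout)."""
    text: str
    x: float      # left edge
    y: float      # vertical position reported by OCR
    width: float


def tokenize(chars, row_tol=3.0, gap=5.0):
    """Group characters into rows, split rows into tokens, order the tokens.

    row_tol and gap are assumed thresholds, not values from the patent.
    Returns a list of (row_y, token_x, token_text) sorted by row, then x.
    """
    # Snap characters whose y lies within row_tol of a row's first
    # character onto that row's common y-coordinate.
    rows = []
    for c in sorted(chars, key=lambda c: c.y):
        if rows and abs(c.y - rows[-1][0]) <= row_tol:
            rows[-1][1].append(c)
        else:
            rows.append((c.y, [c]))

    tokens = []
    for row_y, row in rows:
        row.sort(key=lambda c: c.x)
        current = [row[0]]
        for prev, cur in zip(row, row[1:]):
            # A horizontal gap wider than the threshold starts a new token.
            if cur.x - (prev.x + prev.width) > gap:
                tokens.append((row_y, current[0].x, "".join(ch.text for ch in current)))
                current = [cur]
            else:
                current.append(cur)
        tokens.append((row_y, current[0].x, "".join(ch.text for ch in current)))

    # Order tokens by their (y, x) coordinates.
    return sorted(tokens)
```

The ordered token list is the kind of input the later clustering step in the abstract would consume; how documents are actually compared and clustered is not specified here.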
-
Patent number: 11030403
Abstract: Methods and systems are provided for creating a calendar event using context. A natural language expression including at least one of words, terms, and phrases of text may be received at a calendar event creation module from an application. The calendar event creation module may identify one or more slots in the text of the natural language expression related to the calendar event using a first grammar module and a second grammar module. The one or more slots identified by the first grammar module and the second grammar module that indicate a calendar event may be compared to determine whether there is a match between the one or more identified slots. If a match is found, at least one calendar event using the one or more slots identified by the first grammar module and the second grammar module may be created.
Type: Grant
Filed: May 11, 2016
Date of Patent: June 8, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Timothy J. Hazen, Diamond Bishop, Nicolae Duta, Mohammad Babaeizadeh, Peter Longo
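The compare-then-create flow in this abstract can be sketched as follows. Both "grammar modules" below are hypothetical stand-ins (a regular-expression tagger and a token scan) invented for illustration; the abstract does not say how the real modules work, and the slot names `date` and `time` are assumptions.

```python
import re

# Assumed vocabulary for the illustrative date slot.
DAY_WORDS = {"today", "tomorrow", "monday", "tuesday", "wednesday",
             "thursday", "friday", "saturday", "sunday"}


def first_grammar(text):
    """Hypothetical first grammar module: regular-expression slot tagging."""
    slots = {}
    m = re.search(r"\b(\d{1,2}(?::\d{2})?\s?(?:am|pm))\b", text, re.I)
    if m:
        slots["time"] = m.group(1).lower().replace(" ", "")
    m = re.search(r"\b(" + "|".join(sorted(DAY_WORDS)) + r")\b", text, re.I)
    if m:
        slots["date"] = m.group(1).lower()
    return slots


def second_grammar(text):
    """Hypothetical second grammar module: token-by-token lookup."""
    slots = {}
    for tok in text.lower().split():
        tok = tok.strip(".,!?")
        if tok in DAY_WORDS:
            slots.setdefault("date", tok)
        elif re.fullmatch(r"\d{1,2}(:\d{2})?(am|pm)", tok):
            slots.setdefault("time", tok)
    return slots


def create_event(text):
    """Create an event only from slots on which both modules agree."""
    a, b = first_grammar(text), second_grammar(text)
    matched = {k: v for k, v in a.items() if b.get(k) == v}
    return {"source": text, **matched} if matched else None
```

For example, `create_event("Lunch with Sam tomorrow at 1pm")` yields an event with both a date and a time slot, while text containing no recognizable slots yields `None`.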
-
Patent number: 10878195
Abstract: A “Table Extractor” provides various techniques for automatically delimiting and extracting tables from arbitrary documents. In various implementations, the Table Extractor also generates functional relationships on those tables that are suitable for generating query responses via any of a variety of natural language processing techniques. In other words, the Table Extractor provides techniques for detecting and representing table information in a way suitable for information extraction. These techniques output relational functions on the table in the form of tuples constructed from automatically identified headers and labels and the relationships between those headers and labels and the contents of one or more cells of the table. These tuples are suitable for correlating natural language questions about a specific piece of information in the table with the rows, columns, and/or cells that contain that information.
Type: Grant
Filed: May 3, 2018
Date of Patent: December 29, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventor: Nicolae Duta
-
Publication number: 20200302219
Abstract: Systems, methods, and computer-executable instructions for extracting key value data. Optical character recognition (OCR) text of a document is received. The y-coordinate of characters are adjusted to a common y-coordinate. The rows of OCR text are tokenized into tokens based on a distance between characters. The tokens are ordered based on the x,y coordinates of the characters. The document is clustered into a cluster based on the ordered tokens and ordered tokens from other documents. Keys for the cluster are determined from the first set of documents. Each key is a token from a first set of documents. A value is assigned to each key based on the tokens for the document, and values are assigned to each key for the other documents. The values for the document and the values for the other documents are stored in an output document.
Type: Application
Filed: June 9, 2020
Publication date: September 24, 2020
Inventor: Nicolae Duta
-
Patent number: 10713524
Abstract: Systems, methods, and computer-executable instructions for extracting key value data. Optical character recognition (OCR) text of a document is received. The y-coordinate of characters are adjusted to a common y-coordinate. The rows of OCR text are tokenized into tokens based on a distance between characters. The tokens are ordered based on the x,y coordinates of the characters. The document is clustered into a cluster based on the ordered tokens and ordered tokens from other documents. Keys for the cluster are determined from the first set of documents. Each key is a token from a first set of documents. A value is assigned to each key based on the tokens for the document, and values are assigned to each key for the other documents. The values for the document and the values for the other documents are stored in an output document.
Type: Grant
Filed: October 10, 2018
Date of Patent: July 14, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventor: Nicolae Duta
-
Publication number: 20200117944
Abstract: Systems, methods, and computer-executable instructions for extracting key value data. Optical character recognition (OCR) text of a document is received. The y-coordinate of characters are adjusted to a common y-coordinate. The rows of OCR text are tokenized into tokens based on a distance between characters. The tokens are ordered based on the x,y coordinates of the characters. The document is clustered into a cluster based on the ordered tokens and ordered tokens from other documents. Keys for the cluster are determined from the first set of documents. Each key is a token from a first set of documents. A value is assigned to each key based on the tokens for the document, and values are assigned to each key for the other documents. The values for the document and the values for the other documents are stored in an output document.
Type: Application
Filed: October 10, 2018
Publication date: April 16, 2020
Inventor: Nicolae Duta
-
Publication number: 20190340240
Abstract: A “Table Extractor” provides various techniques for automatically delimiting and extracting tables from arbitrary documents. In various implementations, the Table Extractor also generates functional relationships on those tables that are suitable for generating query responses via any of a variety of natural language processing techniques. In other words, the Table Extractor provides techniques for detecting and representing table information in a way suitable for information extraction. These techniques output relational functions on the table in the form of tuples constructed from automatically identified headers and labels and the relationships between those headers and labels and the contents of one or more cells of the table. These tuples are suitable for correlating natural language questions about a specific piece of information in the table with the rows, columns, and/or cells that contain that information.
Type: Application
Filed: May 3, 2018
Publication date: November 7, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventor: Nicolae DUTA
-
Patent number: 10282419
Abstract: An arrangement and corresponding method are described for multi-domain natural language processing. Multiple parallel domain pipelines are used for processing a natural language input. Each domain pipeline represents a different specific subject domain of related concepts. Each domain pipeline includes a mention module that processes the natural language input using natural language understanding (NLU) to determine a corresponding list of mentions, and an interpretation generator that receives the list of mentions and produces a rank-ordered domain output set of sentence-level interpretation candidates. A global evidence ranker receives the domain output sets from the domain pipelines and produces an overall rank-ordered final output set of sentence-level interpretations.
Type: Grant
Filed: December 12, 2012
Date of Patent: May 7, 2019
Assignee: Nuance Communications, Inc.
Inventors: Matthieu Hebert, Jean-Philippe Robichaud, Christopher M. Parisien, Nicolae Duta, Jerome Tremblay, Amjad Almahairi, Lakshmish Kaushik, Maryse Boisvert
-
Patent number: 9620110
Abstract: An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data substantially without manually transcribed in-domain training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels from at least one source of already collected language data. When the adaptation training data satisfies a pre-determined adaptation criteria, the semantic classification system is automatically retrained based on the adaptation training data.
Type: Grant
Filed: April 28, 2014
Date of Patent: April 11, 2017
Assignee: Nuance Communications, Inc.
Inventors: Nicolae Duta, Réal Tremblay, Andrew D. Mauro, S. Douglas Peters
-
Publication number: 20160253310
Abstract: Methods and systems are provided for creating a calendar event using context. A natural language expression including at least one of words, terms, and phrases of text may be received at a calendar event creation module from an application. The calendar event creation module may identify one or more slots in the text of the natural language expression related to the calendar event using a first grammar module and a second grammar module. The one or more slots identified by the first grammar module and the second grammar module that indicate a calendar event may be compared to determine whether there is a match between the one or more identified slots. If a match is found, at least one calendar event using the one or more slots identified by the first grammar module and the second grammar module may be created.
Type: Application
Filed: May 11, 2016
Publication date: September 1, 2016
Applicant: Microsoft Technology Licensing, LLC
Inventors: Timothy J. Hazen, Diamond Bishop, Nicolae Duta, Mohammad Babaeizadeh, Peter Longo
-
Patent number: 9372851
Abstract: Methods and systems are provided for creating a calendar event using context. A natural language expression including at least one of words, terms, and phrases of text may be received at a calendar event creation module from an application. The calendar event creation module may identify one or more slots in the text of the natural language expression related to the calendar event using a first grammar module and a second grammar module. The one or more slots identified by the first grammar module and the second grammar module that indicate a calendar event may be compared to determine whether there is a match between the one or more identified slots. If a match is found, at least one calendar event using the one or more slots identified by the first grammar module and the second grammar module may be created.
Type: Grant
Filed: April 1, 2014
Date of Patent: June 21, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Timothy J. Hazen, Diamond Bishop, Nicolae Duta, Mohammad Babaeizadeh, Peter Longo
-
Publication number: 20160140957
Abstract: An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data substantially without manually transcribed in-domain training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels from at least one source of already collected language data. When the adaptation training data satisfies a pre-determined adaptation criteria, the semantic classification system is automatically retrained based on the adaptation training data.
Type: Application
Filed: April 28, 2014
Publication date: May 19, 2016
Applicant: Nuance Communications, Inc.
Inventors: Nicolae Duta, Réal Tremblay, Andrew D. Mauro, S. Douglas Peters
-
Publication number: 20150278199
Abstract: Methods and systems are provided for creating a calendar event using context. A natural language expression including at least one of words, terms, and phrases of text may be received at a calendar event creation module from an application. The calendar event creation module may identify one or more slots in the text of the natural language expression related to the calendar event using a first grammar module and a second grammar module. The one or more slots identified by the first grammar module and the second grammar module that indicate a calendar event may be compared to determine whether there is a match between the one or more identified slots. If a match is found, at least one calendar event using the one or more slots identified by the first grammar module and the second grammar module may be created.
Type: Application
Filed: April 1, 2014
Publication date: October 1, 2015
Applicant: Microsoft Corporation
Inventors: Timothy J. Hazen, Diamond Bishop, Nicolae Duta, Mohammad Babaeizadeh, Peter Longo
-
Patent number: 8781833
Abstract: An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data having little or no in-domain manually transcribed training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels. When the adaptation training data satisfies a pre-determined adaptation criteria, the semantic classification system is automatically retrained based on the adaptation training data.
Type: Grant
Filed: July 15, 2009
Date of Patent: July 15, 2014
Assignee: Nuance Communications, Inc.
Inventors: Nicolae Duta, Rèal Tremblay, Andy Mauro, Douglas Peters
-
Publication number: 20140163959
Abstract: An arrangement and corresponding method are described for multi-domain natural language processing. Multiple parallel domain pipelines are used for processing a natural language input. Each domain pipeline represents a different specific subject domain of related concepts. Each domain pipeline includes a mention module that processes the natural language input using natural language understanding (NLU) to determine a corresponding list of mentions, and an interpretation generator that receives the list of mentions and produces a rank-ordered domain output set of sentence-level interpretation candidates. A global evidence ranker receives the domain output sets from the domain pipelines and produces an overall rank-ordered final output set of sentence-level interpretations.
Type: Application
Filed: December 12, 2012
Publication date: June 12, 2014
Applicant: Nuance Communications, Inc.
Inventors: Matthieu Hebert, Jean-Philippe Robichaud, Christopher M. Parisien, Nicolae Duta, Jerome Tremblay, Amjad Almahairi, Lakshmish Kaushik, Maryse Boisvert
-
Patent number: 8515736
Abstract: Techniques disclosed herein include systems and methods for reusing semantically-labeled data collected for previous or existing call routing applications. Such reuse of semantically-labeled utterances can be used for automating and accelerating application design as well as data transcription and labeling for new and future call routing applications. Such techniques include using a semantic database containing transcriptions and semantic labels for several call routing applications along with corresponding baseline routers trained for those applications. This semantic database can be used to derive a semantic similarity measure between any pair of utterances, such as transcribed sentences. A mathematical model predicts how semantically related two utterances are, such as by identifying a same user intent to identifying completely unrelated intents.
Type: Grant
Filed: September 30, 2010
Date of Patent: August 20, 2013
Assignee: Nuance Communications, Inc.
Inventor: Nicolae Duta
-
Publication number: 20130018864
Abstract: Some embodiments relate to techniques for receiving a query comprising content; in response to the query being received, determining that the content may have at least a first semantic meaning or a second semantic meaning that is different than the first semantic meaning; and identifying a plurality of search engines to which to submit a representation of the query, the plurality of search engines comprising a first search engine identified based on the first semantic meaning and a second search engine identified based on the second semantic meaning.
Type: Application
Filed: July 14, 2011
Publication date: January 17, 2013
Applicant: Nuance Communications, Inc.
Inventors: Marc W. Regan, Vladimir Sejnoha, Matthieu Hebert, Nicolae Duta, Nir Halperin, Carmit Brikman, Michael Leong
-
Publication number: 20100023331
Abstract: An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data having little or no in-domain manually transcribed training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels. When the adaptation training data satisfies a pre-determined adaptation criteria, the semantic classification system is automatically retrained based on the adaptation training data.
Type: Application
Filed: July 15, 2009
Publication date: January 28, 2010
Applicant: Nuance Communications, Inc.
Inventors: Nicolae Duta, Rèal Tremblay, Andy Mauro, Douglas Peters
-
Patent number: 7400757
Abstract: A method is provided for segmenting an image of interest of a left ventricle. The method includes determining a myocardium contour according to a graph cut of candidate endocardium contours, and a spline fitting to candidate epicardium contours in the absence of shape propagation. The method further includes applying a plurality of shape constraints to candidate endocardium contours and candidate epicardium contours to determine the myocardium contour, wherein a template is determined by shape propagation of a plurality of images in a sequence including the image of interest in the presence of shape propagation.
Type: Grant
Filed: February 2, 2005
Date of Patent: July 15, 2008
Assignee: Siemens Medical Solutions USA, Inc.
Inventors: Marie-Pierre Jolly, Ying Sun, Nicolae Duta
-
Publication number: 20030035573
Abstract: An automated method for detection of an object of interest in magnetic resonance (MR) two-dimensional (2-D) images wherein the images comprise gray level patterns, the method includes a learning stage utilizing a set of positive/negative training samples drawn from a specified feature space.
Type: Application
Filed: December 20, 2000
Publication date: February 20, 2003
Inventors: Nicolae Duta, Marie-Pierre Jolly