Patents by Inventor Nicholas Kibre

Nicholas Kibre has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230351098
    Abstract: Solutions for custom display post processing (DPP) in speech recognition (SR) use a customized multi-stage DPP pipeline that transforms a stream of SR tokens from lexical form to display form. A first transformation stage of the DPP pipeline passes the stream of tokens, in turn, through an upstream filter, a base model stage, and a downstream filter, and transforms a first aspect of the stream of tokens (e.g., disfluency, inverse text normalization (ITN), capitalization, etc.) from lexical form into display form. The upstream filter and/or the downstream filter alter the stream of tokens to change the default behavior of the DPP pipeline into custom behavior. Additional transformation stages of the DPP pipeline perform further transforms, allowing for outputting final text in a display format that is customized for a specific user. This permits each user to efficiently leverage a common baseline DPP pipeline to produce a custom output.
    Type: Application
    Filed: July 26, 2022
    Publication date: November 2, 2023
    Inventors: Wei LIU, Padma VARADHARAJAN, Piyush BEHRE, Nicholas KIBRE, Edward C. LIN, Shuangyu CHANG, Che ZHAO, Khuram SHAHID, Heiko Willy RAHMEL
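The filtered, staged pipeline this abstract describes can be sketched as a chain of stages, each wrapping a base transform with optional upstream and downstream filters. This is a minimal illustration only; the particular transforms and the per-user filter below are invented for the example, not the patented implementation.

```python
# Sketch of a multi-stage DPP pipeline: each stage wraps a base transform
# with optional filters that change its default behavior into custom behavior.

def make_stage(base, upstream=None, downstream=None):
    def stage(tokens):
        if upstream:
            tokens = upstream(tokens)
        tokens = base(tokens)
        if downstream:
            tokens = downstream(tokens)
        return tokens
    return stage

def remove_disfluencies(tokens):      # base transform: disfluency aspect
    return [t for t in tokens if t not in {"um", "uh"}]

def capitalize_first(tokens):         # base transform: capitalization aspect
    return [tokens[0].capitalize()] + tokens[1:] if tokens else tokens

def user_phrase_filter(tokens):       # downstream filter: per-user override
    return ["OK" if t.lower() == "ok" else t for t in tokens]

PIPELINE = [
    make_stage(remove_disfluencies),
    make_stage(capitalize_first, downstream=user_phrase_filter),
]

def to_display(tokens):
    for stage in PIPELINE:
        tokens = stage(tokens)
    return " ".join(tokens)
```

With this setup, `to_display(["um", "ok", "let", "us", "start"])` returns `"OK let us start"`: the baseline stages remove the disfluency and capitalize, while the downstream filter substitutes the user's preferred display form.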
  • Publication number: 20230289536
    Abstract: Solutions for on-device streaming inverse text normalization (ITN) include: receiving a stream of tokens, each token representing an element of human speech; tagging, by a tagger that can work in a streaming manner (e.g., a neural network), the stream of tokens with one or more tags of a plurality of tags to produce a tagged stream of tokens, each tag of the plurality of tags representing a different normalization category of a plurality of normalization categories; based on at least a first tag representing a first normalization category, converting, by a first language converter of a plurality of category-specific natural language converters (e.g., weighted finite state transducers, WFSTs), at least one token of the tagged stream of tokens, from a first lexical language form, to a first natural language form; and based on at least the first natural language form, outputting a natural language representation of the stream of tokens.
    Type: Application
    Filed: March 11, 2022
    Publication date: September 14, 2023
    Inventors: Yashesh GAUR, Nicholas KIBRE, Issac J. ALPHONSO, Jian XUE, Jinyu LI, Piyush BEHRE, Shawn CHANG
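The tag-then-convert flow in this abstract can be illustrated with a toy version in which a rule-based lookup stands in for the streaming neural tagger and a dictionary conversion stands in for the category-specific WFST converters; both stand-ins are assumptions for the sake of the sketch.

```python
# Toy streaming ITN: tag each token with a normalization category, buffer
# runs of same-tagged tokens, and convert each run with a category-specific
# converter as soon as the run ends.

CARDINALS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}

def tag(token):
    return "CARDINAL" if token in CARDINALS else "O"  # "O" = leave as-is

def convert_cardinal(tokens):
    return "".join(CARDINALS[t] for t in tokens)

CONVERTERS = {"CARDINAL": convert_cardinal}

def itn_stream(tokens):
    out, buf, cur = [], [], "O"
    for t in tokens:
        tg = tag(t)
        if tg == cur:
            buf.append(t)
            continue
        if buf:  # flush the finished run through its converter
            out.append(CONVERTERS[cur](buf) if cur in CONVERTERS else " ".join(buf))
        buf, cur = [t], tg
    if buf:
        out.append(CONVERTERS[cur](buf) if cur in CONVERTERS else " ".join(buf))
    return " ".join(out)
```

For example, `itn_stream("meeting in room four five".split())` yields `"meeting in room 45"`: the run of CARDINAL-tagged tokens is converted from lexical form to its natural (written) form while the rest of the stream passes through.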
  • Patent number: 10923105
    Abstract: In non-limiting examples of the present disclosure, systems, methods and devices for mapping hyperarticulated sounds to text units are presented. A plurality of textual units may be received. The plurality of textual units may be processed with a natural language processing engine. A sentence structure for the plurality of textual units may be identified, wherein the sentence structure comprises a plurality of words. The plurality of words may be processed with a text-to-speech engine. A text-to-speech output comprising a plurality of pronunciations may be identified, wherein each of the plurality of pronunciations corresponds to a syllabic unit of one of the plurality of words. A hyperarticulated vowel sound may be mapped to each syllabic unit from the text-to-speech output. A pronunciation instruction corresponding to each hyperarticulated vowel sound may be caused to be surfaced.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: February 16, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kevin Chad Larson, Tanya Matskewich, Gregory Carl Hitchcock, Michael Tholfsen, Guillaume Simonnet, Viktoryia Akulich, Nicholas Kibre, Christina Chen Campbell
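The mapping from syllabic units to hyperarticulated vowel sounds can be sketched as follows. The crude vowel-group syllabifier and the five-entry vowel inventory are simplifying assumptions; the abstract's system derives syllabic units from a text-to-speech engine's pronunciations.

```python
# Toy mapping of a hyperarticulated vowel sound to each syllabic unit of a
# word, surfaced as a pronunciation instruction.

VOWELS = "aeiou"
HYPER = {"a": "AH", "e": "EH", "i": "EE", "o": "OH", "u": "OO"}

def syllabic_units(word):
    """Rough syllabification: close a unit after each vowel group."""
    units, cur = [], ""
    for i, ch in enumerate(word):
        cur += ch
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS and nxt and nxt not in VOWELS:
            units.append(cur)
            cur = ""
    if cur:
        if units and not any(c in VOWELS for c in cur):
            units[-1] += cur  # trailing consonants attach to the last unit
        else:
            units.append(cur)
    return units

def pronunciation_instruction(word):
    parts = []
    for unit in syllabic_units(word):
        vowel = next((c for c in unit if c in VOWELS), None)
        parts.append(f"{unit}({HYPER[vowel]})" if vowel else unit)
    return "-".join(parts)
```

Here `pronunciation_instruction("banana")` produces `"ba(AH)-na(AH)-na(AH)"`, i.e. one hyperarticulated vowel cue per syllabic unit.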
  • Publication number: 20200118542
    Abstract: In non-limiting examples of the present disclosure, systems, methods and devices for mapping hyperarticulated sounds to text units are presented. A plurality of textual units may be received. The plurality of textual units may be processed with a natural language processing engine. A sentence structure for the plurality of textual units may be identified, wherein the sentence structure comprises a plurality of words. The plurality of words may be processed with a text-to-speech engine. A text-to-speech output comprising a plurality of pronunciations may be identified, wherein each of the plurality of pronunciations corresponds to a syllabic unit of one of the plurality of words. A hyperarticulated vowel sound may be mapped to each syllabic unit from the text-to-speech output. A pronunciation instruction corresponding to each hyperarticulated vowel sound may be caused to be surfaced.
    Type: Application
    Filed: November 29, 2018
    Publication date: April 16, 2020
    Inventors: Kevin Chad Larson, Tanya Matskewich, Gregory Carl Hitchcock, Michael Tholfsen, Guillaume Simonnet, Viktoryia Akulich, Nicholas Kibre, Christina Chen Campbell
  • Patent number: 9947317
    Abstract: A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
    Type: Grant
    Filed: February 13, 2017
    Date of Patent: April 17, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Nicholas Kibre, Umut Ozertem, Sarangarajan Parthasarathy, Ziad Al Bawab
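The final aggregate-and-adjudicate step of this abstract can be sketched as a vote over per-utterance acoustic match scores. The pronunciations and scores below are fabricated for illustration; a real system would obtain them from offline recognition against the logged audio.

```python
# Adjudication sketch: for each utterance believed to contain the word,
# take the hypothetical pronunciation with the best acoustic match score,
# then select the pronunciation that wins most often across utterances.
from collections import Counter

def adjudicate(match_logs):
    """match_logs: list of {pronunciation: acoustic_score} dicts,
    one dict per utterance."""
    votes = Counter(max(scores, key=scores.get) for scores in match_logs)
    return votes.most_common(1)[0][0]
```

For example, with hypothetical scores for three utterances of an unknown name, the pronunciation that best matched the audio in the majority of utterances is selected as the new lexicon entry.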
  • Publication number: 20170154623
    Abstract: A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
    Type: Application
    Filed: February 13, 2017
    Publication date: June 1, 2017
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Nicholas Kibre, Umut Ozertem, Sarangarajan Parthasarathy, Ziad Al Bawab
  • Patent number: 9589562
    Abstract: A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
    Type: Grant
    Filed: February 21, 2014
    Date of Patent: March 7, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Nicholas Kibre, Umut Ozertem, Sarangarajan Parthasarathy, Ziad Al Bawab
  • Publication number: 20150243278
    Abstract: A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
    Type: Application
    Filed: February 21, 2014
    Publication date: August 27, 2015
    Applicant: MICROSOFT CORPORATION
    Inventors: Nicholas Kibre, Umut Ozertem, Sarangarajan Parthasarathy, Ziad Al Bawab
  • Patent number: 6985862
    Abstract: A multi-level method for estimating and training weights associated with grammar options is presented. The implementation of the method differs depending on the amount of utterance data available for each option to be tuned. A first implementation, modified maximum likelihood estimation (MLE), can be used to estimate weights for a grammar option when few utterances are available for the option. Option weights are then estimated using an obtainable statistic that creates a basis for the predictability model. A second implementation, error corrective training (ECT), can be used to estimate option weights when a sufficiently large number of utterances are available. The ECT method minimizes the errors in the score of the correct interpretation of the utterance and the highest scoring incorrect interpretation in an utterance training set. The ECT method is iterated to converge on a solution for option weights.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: January 10, 2006
    Assignee: Tellme Networks, Inc.
    Inventors: Nikko Ström, Nicholas Kibre
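The modified-MLE branch for sparse data can be illustrated as smoothed relative-frequency estimation: raw option counts are backed off toward a uniform prior so that options with few utterances still receive nonzero weight. The additive smoothing constant is an illustrative choice, not the statistic named in the patent.

```python
# Sketch of weight estimation for grammar options from sparse utterance
# counts, using additive smoothing toward a uniform prior.

def estimate_option_weights(counts, alpha=1.0):
    """counts: {option: number of utterances that matched that option}.
    Returns normalized weights that sum to 1."""
    total = sum(counts.values()) + alpha * len(counts)
    return {opt: (c + alpha) / total for opt, c in counts.items()}
```

For example, with counts `{"pizza": 3, "pasta": 1}` the smoothed weights are 4/6 and 2/6, closer together than the raw 3:1 ratio, which is the desired behavior when so few utterances are available.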
  • Patent number: 6845358
    Abstract: A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
    Type: Grant
    Filed: January 5, 2001
    Date of Patent: January 18, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Ted H. Applebaum
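The penalty-scored lookup this abstract describes can be sketched with templates keyed by syllable stress patterns. The patterns, contour labels, and padding-by-cloning rule below are simplified stand-ins for the patent's tree structure and node cloning.

```python
# Sketch of prosody template lookup: score the target word's stress pattern
# (1 = stressed syllable, 0 = unstressed) against each template pattern,
# cloning a template's last node when syllable counts differ, and select
# the lowest-penalty (then shortest) pattern as the table index.

TEMPLATES = {
    (1, 0): "fall",        # illustrative pitch-contour labels
    (0, 1): "rise",
    (1, 0, 0): "fall-flat",
}

def penalty(target, pattern):
    pattern = list(pattern) + [pattern[-1]] * (len(target) - len(pattern))
    pattern = pattern[:len(target)]
    return sum(a != b for a, b in zip(target, pattern))

def lookup(target):
    return min(TEMPLATES, key=lambda p: (penalty(target, p), len(p)))
```

A four-syllable target like `(1, 0, 0, 0)` matches the two-node template `(1, 0)` with zero penalty once its last node is cloned, so that template's contour index is selected.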
  • Patent number: 6792407
    Abstract: A new speaker provides speech from which comparison snippets are extracted. The comparison snippets are compared with initial snippets stored in a recorded snippet database that is associated with a concatenative synthesizer. The comparison of the snippets to the initial snippets produces required sound units. A greedy selection algorithm is performed with the required sound units for identifying the smallest subset of the input text that contains all of the text for the new speaker to read. The new speaker then reads the optimally selected text and sound units are extracted from the human speech such that the recorded snippet database is modified and the speech synthesized adopts the voice quality and characteristics of the new speaker.
    Type: Grant
    Filed: March 30, 2001
    Date of Patent: September 14, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Steven Pearson, Brian Hanson, Jean-Claude Junqua
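The greedy selection step is, in essence, a set-cover heuristic: repeatedly pick the sentence that covers the most still-needed sound units. The sentences and unit inventories below are invented for the example.

```python
# Greedy set-cover sketch: choose a small subset of sentences whose sound
# units jointly cover all units required for the new speaker to record.

def greedy_select(sentences, required_units):
    """sentences: {sentence_text: set of sound units it contains}."""
    needed, chosen = set(required_units), []
    while needed:
        best = max(sentences, key=lambda s: len(sentences[s] & needed))
        if not sentences[best] & needed:
            break  # remaining units appear in no sentence
        chosen.append(best)
        needed -= sentences[best]
    return chosen
```

Given three candidate sentences, the algorithm picks the one covering the most uncovered units first, then fills the gaps, typically yielding far less text for the new speaker to read than the full corpus.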
  • Publication number: 20030004717
    Abstract: A multi-level method for estimating and training weights associated with grammar options is presented. The implementation of the method differs depending on the amount of utterance data available for each option to be tuned. A first implementation, modified maximum likelihood estimation (MLE), can be used to estimate weights for a grammar option when few utterances are available for the option. Option weights are then estimated using an obtainable statistic that creates a basis for the predictability model. A second implementation, error corrective training (ECT), can be used to estimate option weights when a sufficiently large number of utterances are available. The ECT method minimizes the errors in the score of the correct interpretation of the utterance and the highest scoring incorrect interpretation in an utterance training set. The ECT method is iterated to converge on a solution for option weights.
    Type: Application
    Filed: March 22, 2001
    Publication date: January 2, 2003
    Inventors: Nikko Strom, Nicholas Kibre
  • Publication number: 20020193994
    Abstract: A new speaker provides speech from which comparison snippets are extracted. The comparison snippets are compared with initial snippets stored in a recorded snippet database that is associated with a concatenative synthesizer. The comparison of the snippets to the initial snippets produces required sound units. A greedy selection algorithm is performed with the required sound units for identifying the smallest subset of the input text that contains all of the text for the new speaker to read. The new speaker then reads the optimally selected text and sound units are extracted from the human speech such that the recorded snippet database is modified and the speech synthesized adopts the voice quality and characteristics of the new speaker.
    Type: Application
    Filed: March 30, 2001
    Publication date: December 19, 2002
    Inventors: Nicholas Kibre, Steven Pearson, Brian Hanson, Jean-Claude Junqua
  • Publication number: 20020128841
    Abstract: A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
    Type: Application
    Filed: January 5, 2001
    Publication date: September 12, 2002
    Inventors: Nicholas Kibre, Ted H. Applebaum
  • Patent number: 6202049
    Abstract: Speech signal parameters are extracted from time-series data corresponding to different sound units containing the same vowel. The extracted parameters are used to train a statistical model, such as a Hidden Markov-based Model, that has a data structure for separately modeling the nuclear trajectory region of the vowel and its surrounding transition elements. The model is trained as through embedded re-estimation to automatically determine optimally aligned models that identify the nuclear trajectory region. The boundaries of the nuclear trajectory region serve to delimit the overlap region for subsequent sound unit concatenation.
    Type: Grant
    Filed: March 9, 1999
    Date of Patent: March 13, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Steve Pearson
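As a rough stand-in for the trained Hidden Markov Model, the nuclear trajectory region can be pictured as the steadiest stretch of a formant track, with the flanking frames playing the role of the transition elements. The windowed-variation heuristic below is purely illustrative and much cruder than embedded re-estimation.

```python
# Heuristic sketch: mark the fixed-width window of a formant track with the
# least frame-to-frame movement as the vowel's nuclear trajectory region;
# its boundaries delimit the overlap region for concatenation.

def nucleus_bounds(track, width=3):
    """Return (start, end) indices of the steadiest width-frame window."""
    best, best_cost = 0, float("inf")
    for i in range(len(track) - width + 1):
        win = track[i:i + width]
        cost = sum(abs(b - a) for a, b in zip(win, win[1:]))
        if cost < best_cost:
            best, best_cost = i, cost
    return best, best + width
```

On a track that rises, plateaus, and falls, the plateau is selected as the nucleus, mirroring the stable region the HMM alignment is meant to isolate.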
  • Patent number: 6144939
    Abstract: The concatenative speech synthesizer employs demi-syllable subword units to generate speech. The synthesizer is based on a source-filter model that uses source signals that correspond closely to the human glottal source and that uses filter parameters that correspond closely to the human vocal tract. Concatenation of the demi-syllable units is facilitated by two separate cross fade techniques, one applied in the time domain to the demi-syllable source signal waveforms, and one applied in the frequency domain by interpolating the corresponding filter parameters of the concatenated demi-syllables. The dual cross fade technique results in natural sounding synthesis that avoids time-domain glitches without degrading or smearing characteristic resonances in the filter domain.
    Type: Grant
    Filed: November 25, 1998
    Date of Patent: November 7, 2000
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Steve Pearson, Nicholas Kibre, Nancy Niedzielski
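The dual cross-fade can be sketched as the same blending operation applied twice: once to the demi-syllable source waveforms in the time domain, and once to the corresponding filter-parameter tracks. The linear ramp is an illustrative choice of fade shape.

```python
# Cross-fade sketch: blend the tail of sequence a into the head of
# sequence b over `overlap` samples with a linear ramp. Applied to source
# waveforms and, separately, to filter-parameter tracks.

def crossfade(a, b, overlap):
    ramp = [i / (overlap - 1) for i in range(overlap)] if overlap > 1 else [1.0]
    blended = [x * (1 - r) + y * r
               for x, y, r in zip(a[-overlap:], b[:overlap], ramp)]
    return a[:-overlap] + blended + b[overlap:]
```

Fading the waveforms avoids time-domain glitches at the join, while interpolating the filter parameters keeps the characteristic resonances from smearing, which is the point of doing both.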
  • Patent number: 5966691
    Abstract: The operating system or application program generates events captured by an event handler mechanism that, in turn, invokes message assembler and graphics assembler modules. The message assembler constructs pseudo-random sentences or notification messages based on the type of event captured and optionally upon selected state variables. The message assembler supplies text strings to a text-to-speech engine. A linguistic database containing a lexicon of predefined words supplies text data that the message assembler concatenates to form the text strings. Text strings are assembled and optionally tagged to alter the message, speaking voice and inflection based on the nature of the message and its context. Graphics elements are assembled by a graphics assembler in response to the event handler and based on optionally supplied user-defined graphics parameters.
    Type: Grant
    Filed: April 29, 1997
    Date of Patent: October 12, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Yoshizumi Terada, Kazue Hata, Rhonda Shaw
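The event-to-speech flow can be sketched as a message assembler that looks up templates for the captured event type in a small lexicon and varies the wording pseudo-randomly. The event type, templates, and state variable below are invented for the example; the output string would be handed to a text-to-speech engine.

```python
# Message-assembler sketch: pick a wording pseudo-randomly from the lexicon
# entries for the event type, then splice in an optional state variable.
import random

LEXICON = {
    "mail_arrived": ["You have new mail{}.", "A new message just arrived{}."],
}

def assemble_message(event_type, state="", rng=None):
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    template = rng.choice(LEXICON[event_type])
    return template.format(f", {state}" if state else "")
```

Because the wording is drawn from a pool per event, repeated notifications do not sound identical, while the state variable lets the message reflect context.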
  • Patent number: 5878393
    Abstract: Computer-stored text, such as numerical information, is processed by a word list generator to develop a word list corresponding to those words that are to be spoken by the system. The word list generator assigns a prosodic environment state or token to each entry in the list. The prosodic environment identifies how the word functions in its current prosodic context. Different intonations are applied based on the prosodic environment. Next, the preceding and adjacent words are examined to determine how each word may need to be pronounced differently, based on the ending phoneme of the preceding word and the beginning phoneme of the following word. Using this phonological information along with the prosodic information, a sample list is generated by accessing a dictionary of stored samples. The sample list is then serially played through suitable digital-to-analog conversion circuitry to generate the text-to-speech output.
    Type: Grant
    Filed: September 9, 1996
    Date of Patent: March 2, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Kazue Hata, Nicholas Kibre
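The two passes this abstract describes can be sketched as follows: a first pass tags each word with a prosodic-environment token, and a second pass adjusts each word's spoken form based on its neighbors' boundary phonemes. The tokens and the single sandhi-style rule below are illustrative assumptions.

```python
# Word-list sketch: tag prosodic environment, then apply a toy phonological
# adjustment based on the following word's initial phoneme.

def tag_prosody(words):
    """Tag the last word phrase-final (falling intonation), the rest medial."""
    return [(w, "FINAL" if i == len(words) - 1 else "MEDIAL")
            for i, w in enumerate(words)]

def adjust(words):
    out = []
    for i, (w, env) in enumerate(tag_prosody(words)):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        # Toy rule: "the" before a vowel-initial word is spoken "thee".
        form = "thee" if w == "the" and nxt[:1] in "aeiou" else w
        out.append((form, env))
    return out
```

Each (form, environment) pair would then index a dictionary of stored samples, and the resulting sample list is played back to produce the text-to-speech output.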
  • Patent number: RE39336
    Abstract: The concatenative speech synthesizer employs demi-syllable subword units to generate speech. The synthesizer is based on a source-filter model that uses source signals that correspond closely to the human glottal source and that uses filter parameters that correspond closely to the human vocal tract. Concatenation of the demi-syllable units is facilitated by two separate cross fade techniques, one applied in the time domain to the demi-syllable source signal waveforms, and one applied in the frequency domain by interpolating the corresponding filter parameters of the concatenated demi-syllables. The dual cross fade technique results in natural sounding synthesis that avoids time-domain glitches without degrading or smearing characteristic resonances in the filter domain.
    Type: Grant
    Filed: November 5, 2002
    Date of Patent: October 10, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Steve Pearson, Nicholas Kibre, Nancy Niedzielski