Patents by Inventor Nicholas Kibre

Nicholas Kibre has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230351098
    Abstract: Solutions for custom display post processing (DPP) in speech recognition (SR) use a customized multi-stage DPP pipeline that transforms a stream of SR tokens from lexical form to display form. A first transformation stage of the DPP pipeline passes the stream of tokens, in turn, through an upstream filter, a base model stage, and a downstream filter, and transforms a first aspect of the stream of tokens (e.g., disfluency, inverse text normalization (ITN), capitalization, etc.) from lexical form into display form. The upstream filter and/or the downstream filter alter the stream of tokens to change the default behavior of the DPP pipeline into custom behavior. Additional transformation stages of the DPP pipeline perform further transforms, allowing for outputting final text in a display format that is customized for a specific user. This permits each user to efficiently leverage a common baseline DPP pipeline to produce a custom output.
    Type: Application
    Filed: July 26, 2022
    Publication date: November 2, 2023
    Inventors: Wei LIU, Padma VARADHARAJAN, Piyush BEHRE, Nicholas KIBRE, Edward C. LIN, Shuangyu CHANG, Che ZHAO, Khuram SHAHID, Heiko Willy RAHMEL
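The filtered, staged pipeline this abstract describes can be sketched as a chain of stages, each wrapping a base transform with optional upstream and downstream filters. This is a minimal illustration only; the particular transforms and the per-user filter below are invented for the example, not the patented implementation.

```python
# Sketch of a multi-stage DPP pipeline: each stage wraps a base transform
# with optional filters that change its default behavior into custom behavior.

def make_stage(base, upstream=None, downstream=None):
    def stage(tokens):
        if upstream:
            tokens = upstream(tokens)
        tokens = base(tokens)
        if downstream:
            tokens = downstream(tokens)
        return tokens
    return stage

def remove_disfluencies(tokens):      # base transform: disfluency aspect
    return [t for t in tokens if t not in {"um", "uh"}]

def capitalize_first(tokens):         # base transform: capitalization aspect
    return [tokens[0].capitalize()] + tokens[1:] if tokens else tokens

def user_phrase_filter(tokens):       # downstream filter: per-user override
    return ["OK" if t.lower() == "ok" else t for t in tokens]

PIPELINE = [
    make_stage(remove_disfluencies),
    make_stage(capitalize_first, downstream=user_phrase_filter),
]

def to_display(tokens):
    for stage in PIPELINE:
        tokens = stage(tokens)
    return " ".join(tokens)
```

With this setup, `to_display(["um", "ok", "let", "us", "start"])` returns `"OK let us start"`: the baseline stages remove the disfluency and capitalize, while the downstream filter substitutes the user's preferred display form.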
  • Publication number: 20230289536
    Abstract: Solutions for on-device streaming inverse text normalization (ITN) include: receiving a stream of tokens, each token representing an element of human speech; tagging, by a tagger that can work in a streaming manner (e.g., a neural network), the stream of tokens with one or more tags of a plurality of tags to produce a tagged stream of tokens, each tag of the plurality of tags representing a different normalization category of a plurality of normalization categories; based on at least a first tag representing a first normalization category, converting, by a first language converter of a plurality of category-specific natural language converters (e.g., weighted finite state transducers, WFSTs), at least one token of the tagged stream of tokens, from a first lexical language form, to a first natural language form; and based on at least the first natural language form, outputting a natural language representation of the stream of tokens.
    Type: Application
    Filed: March 11, 2022
    Publication date: September 14, 2023
    Inventors: Yashesh GAUR, Nicholas KIBRE, Issac J. ALPHONSO, Jian XUE, Jinyu LI, Piyush BEHRE, Shawn CHANG
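The tag-then-convert flow in this abstract can be illustrated with a toy version in which a rule-based lookup stands in for the streaming neural tagger and a dictionary conversion stands in for the category-specific WFST converters; both stand-ins are assumptions for the sake of the sketch.

```python
# Toy streaming ITN: tag each token with a normalization category, buffer
# runs of same-tagged tokens, and convert each run with a category-specific
# converter as soon as the run ends.

CARDINALS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}

def tag(token):
    return "CARDINAL" if token in CARDINALS else "O"  # "O" = leave as-is

def convert_cardinal(tokens):
    return "".join(CARDINALS[t] for t in tokens)

CONVERTERS = {"CARDINAL": convert_cardinal}

def itn_stream(tokens):
    out, buf, cur = [], [], "O"
    for t in tokens:
        tg = tag(t)
        if tg == cur:
            buf.append(t)
            continue
        if buf:  # flush the finished run through its converter
            out.append(CONVERTERS[cur](buf) if cur in CONVERTERS else " ".join(buf))
        buf, cur = [t], tg
    if buf:
        out.append(CONVERTERS[cur](buf) if cur in CONVERTERS else " ".join(buf))
    return " ".join(out)
```

For example, `itn_stream("meeting in room four five".split())` yields `"meeting in room 45"`: the run of CARDINAL-tagged tokens is converted from lexical form to its natural (written) form while the rest of the stream passes through.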
  • Patent number: 10923105
    Abstract: In non-limiting examples of the present disclosure, systems, methods and devices for mapping hyperarticulated sounds to text units are presented. A plurality of textual units may be received. The plurality of textual units may be processed with a natural language processing engine. A sentence structure for the plurality of textual units may be identified, wherein the sentence structure comprises a plurality of words. The plurality of words may be processed with a text-to-speech engine. A text-to-speech output comprising a plurality of pronunciations may be identified, wherein each of the plurality of pronunciations corresponds to a syllabic unit of one of the plurality of words. A hyperarticulated vowel sound may be mapped to each syllabic unit from the text-to-speech output. A pronunciation instruction corresponding to each hyperarticulated vowel sound may be caused to be surfaced.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: February 16, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kevin Chad Larson, Tanya Matskewich, Gregory Carl Hitchcock, Michael Tholfsen, Guillaume Simonnet, Viktoryia Akulich, Nicholas Kibre, Christina Chen Campbell
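The mapping from syllabic units to hyperarticulated vowel sounds can be sketched as follows. The crude vowel-group syllabifier and the five-entry vowel inventory are simplifying assumptions; the abstract's system derives syllabic units from a text-to-speech engine's pronunciations.

```python
# Toy mapping of a hyperarticulated vowel sound to each syllabic unit of a
# word, surfaced as a pronunciation instruction.

VOWELS = "aeiou"
HYPER = {"a": "AH", "e": "EH", "i": "EE", "o": "OH", "u": "OO"}

def syllabic_units(word):
    """Rough syllabification: close a unit after each vowel group."""
    units, cur = [], ""
    for i, ch in enumerate(word):
        cur += ch
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS and nxt and nxt not in VOWELS:
            units.append(cur)
            cur = ""
    if cur:
        if units and not any(c in VOWELS for c in cur):
            units[-1] += cur  # trailing consonants attach to the last unit
        else:
            units.append(cur)
    return units

def pronunciation_instruction(word):
    parts = []
    for unit in syllabic_units(word):
        vowel = next((c for c in unit if c in VOWELS), None)
        parts.append(f"{unit}({HYPER[vowel]})" if vowel else unit)
    return "-".join(parts)
```

Here `pronunciation_instruction("banana")` produces `"ba(AH)-na(AH)-na(AH)"`, i.e. one hyperarticulated vowel cue per syllabic unit.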
  • Publication number: 20200118542
    Abstract: In non-limiting examples of the present disclosure, systems, methods and devices for mapping hyperarticulated sounds to text units are presented. A plurality of textual units may be received. The plurality of textual units may be processed with a natural language processing engine. A sentence structure for the plurality of textual units may be identified, wherein the sentence structure comprises a plurality of words. The plurality of words may be processed with a text-to-speech engine. A text-to-speech output comprising a plurality of pronunciations may be identified, wherein each of the plurality of pronunciations corresponds to a syllabic unit of one of the plurality of words. A hyperarticulated vowel sound may be mapped to each syllabic unit from the text-to-speech output. A pronunciation instruction corresponding to each hyperarticulated vowel sound may be caused to be surfaced.
    Type: Application
    Filed: November 29, 2018
    Publication date: April 16, 2020
    Inventors: Kevin Chad Larson, Tanya Matskewich, Gregory Carl Hitchcock, Michael Tholfsen, Guillaume Simonnet, Viktoryia Akulich, Nicholas Kibre, Christina Chen Campbell
  • Patent number: 9947317
    Abstract: A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
    Type: Grant
    Filed: February 13, 2017
    Date of Patent: April 17, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Nicholas Kibre, Umut Ozertem, Sarangarajan Parthasarathy, Ziad Al Bawab
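The final aggregate-and-adjudicate step of this abstract can be sketched as a vote over per-utterance acoustic match scores. The pronunciations and scores below are fabricated for illustration; a real system would obtain them from offline recognition against the logged audio.

```python
# Adjudication sketch: for each utterance believed to contain the word,
# take the hypothetical pronunciation with the best acoustic match score,
# then select the pronunciation that wins most often across utterances.
from collections import Counter

def adjudicate(match_logs):
    """match_logs: list of {pronunciation: acoustic_score} dicts,
    one dict per utterance."""
    votes = Counter(max(scores, key=scores.get) for scores in match_logs)
    return votes.most_common(1)[0][0]
```

For example, with hypothetical scores for three utterances of an unknown name, the pronunciation that best matched the audio in the majority of utterances is selected as the new lexicon entry.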
  • Publication number: 20170154623
    Abstract: A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
    Type: Application
    Filed: February 13, 2017
    Publication date: June 1, 2017
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Nicholas Kibre, Umut Ozertem, Sarangarajan Parthasarathy, Ziad Al Bawab
  • Patent number: 9589562
    Abstract: A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
    Type: Grant
    Filed: February 21, 2014
    Date of Patent: March 7, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Nicholas Kibre, Umut Ozertem, Sarangarajan Parthasarathy, Ziad Al Bawab
  • Publication number: 20150243278
    Abstract: A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
    Type: Application
    Filed: February 21, 2014
    Publication date: August 27, 2015
    Applicant: MICROSOFT CORPORATION
    Inventors: Nicholas Kibre, Umut Ozertem, Sarangarajan Parthasarathy, Ziad Al Bawab
  • Patent number: 6985862
    Abstract: A multi-level method for estimating and training weights associated with grammar options is presented. The implementation of the method differs depending on the amount of utterance data available for each option to be tuned. A first implementation, modified maximum likelihood estimation (MLE), can be used to estimate weights for a grammar option when few utterances are available for the option. Option weights are then estimated using an obtainable statistic that creates a basis for the predictability model. A second implementation, error corrective training (ECT), can be used to estimate option weights when a sufficiently large number of utterances are available. The ECT method minimizes the errors in the score of the correct interpretation of the utterance and the highest scoring incorrect interpretation in an utterance training set. The ECT method is iterated to converge on a solution for option weights.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: January 10, 2006
    Assignee: Tellme Networks, Inc.
    Inventors: Nikko Ström, Nicholas Kibre
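The modified-MLE branch for sparse data can be illustrated as smoothed relative-frequency estimation: raw option counts are backed off toward a uniform prior so that options with few utterances still receive nonzero weight. The additive smoothing constant is an illustrative choice, not the statistic named in the patent.

```python
# Sketch of weight estimation for grammar options from sparse utterance
# counts, using additive smoothing toward a uniform prior.

def estimate_option_weights(counts, alpha=1.0):
    """counts: {option: number of utterances that matched that option}.
    Returns normalized weights that sum to 1."""
    total = sum(counts.values()) + alpha * len(counts)
    return {opt: (c + alpha) / total for opt, c in counts.items()}
```

For example, with counts `{"pizza": 3, "pasta": 1}` the smoothed weights are 4/6 and 2/6, closer together than the raw 3:1 ratio, which is the desired behavior when so few utterances are available.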
  • Patent number: 6845358
    Abstract: A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
    Type: Grant
    Filed: January 5, 2001
    Date of Patent: January 18, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Ted H. Applebaum
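The penalty-scored lookup this abstract describes can be sketched with templates keyed by syllable stress patterns. The patterns, contour labels, and padding-by-cloning rule below are simplified stand-ins for the patent's tree structure and node cloning.

```python
# Sketch of prosody template lookup: score the target word's stress pattern
# (1 = stressed syllable, 0 = unstressed) against each template pattern,
# cloning a template's last node when syllable counts differ, and select
# the lowest-penalty (then shortest) pattern as the table index.

TEMPLATES = {
    (1, 0): "fall",        # illustrative pitch-contour labels
    (0, 1): "rise",
    (1, 0, 0): "fall-flat",
}

def penalty(target, pattern):
    pattern = list(pattern) + [pattern[-1]] * (len(target) - len(pattern))
    pattern = pattern[:len(target)]
    return sum(a != b for a, b in zip(target, pattern))

def lookup(target):
    return min(TEMPLATES, key=lambda p: (penalty(target, p), len(p)))
```

A four-syllable target like `(1, 0, 0, 0)` matches the two-node template `(1, 0)` with zero penalty once its last node is cloned, so that template's contour index is selected.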
  • Patent number: 6792407
    Abstract: A new speaker provides speech from which comparison snippets are extracted. The comparison snippets are compared with initial snippets stored in a recorded snippet database that is associated with a concatenative synthesizer. The comparison of the snippets to the initial snippets produces required sound units. A greedy selection algorithm is performed with the required sound units for identifying the smallest subset of the input text that contains all of the text for the new speaker to read. The new speaker then reads the optimally selected text and sound units are extracted from the human speech such that the recorded snippet database is modified and the speech synthesized adopts the voice quality and characteristics of the new speaker.
    Type: Grant
    Filed: March 30, 2001
    Date of Patent: September 14, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Steven Pearson, Brian Hanson, Jean-Claude Junqua
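The greedy selection step is, in essence, a set-cover heuristic: repeatedly pick the sentence that covers the most still-needed sound units. The sentences and unit inventories below are invented for the example.

```python
# Greedy set-cover sketch: choose a small subset of sentences whose sound
# units jointly cover all units required for the new speaker to record.

def greedy_select(sentences, required_units):
    """sentences: {sentence_text: set of sound units it contains}."""
    needed, chosen = set(required_units), []
    while needed:
        best = max(sentences, key=lambda s: len(sentences[s] & needed))
        if not sentences[best] & needed:
            break  # remaining units appear in no sentence
        chosen.append(best)
        needed -= sentences[best]
    return chosen
```

Given three candidate sentences, the algorithm picks the one covering the most uncovered units first, then fills the gaps, typically yielding far less text for the new speaker to read than the full corpus.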
  • Publication number: 20030004717
    Abstract: A multi-level method for estimating and training weights associated with grammar options is presented. The implementation of the method differs depending on the amount of utterance data available for each option to be tuned. A first implementation, modified maximum likelihood estimation (MLE), can be used to estimate weights for a grammar option when few utterances are available for the option. Option weights are then estimated using an obtainable statistic that creates a basis for the predictability model. A second implementation, error corrective training (ECT), can be used to estimate option weights when a sufficiently large number of utterances are available. The ECT method minimizes the errors in the score of the correct interpretation of the utterance and the highest scoring incorrect interpretation in an utterance training set. The ECT method is iterated to converge on a solution for option weights.
    Type: Application
    Filed: March 22, 2001
    Publication date: January 2, 2003
    Inventors: Nikko Strom, Nicholas Kibre
  • Publication number: 20020193994
    Abstract: A new speaker provides speech from which comparison snippets are extracted. The comparison snippets are compared with initial snippets stored in a recorded snippet database that is associated with a concatenative synthesizer. The comparison of the snippets to the initial snippets produces required sound units. A greedy selection algorithm is performed with the required sound units for identifying the smallest subset of the input text that contains all of the text for the new speaker to read. The new speaker then reads the optimally selected text and sound units are extracted from the human speech such that the recorded snippet database is modified and the speech synthesized adopts the voice quality and characteristics of the new speaker.
    Type: Application
    Filed: March 30, 2001
    Publication date: December 19, 2002
    Inventors: Nicholas Kibre, Steven Pearson, Brian Hanson, Jean-Claude Junqua
  • Publication number: 20020128841
    Abstract: A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
    Type: Application
    Filed: January 5, 2001
    Publication date: September 12, 2002
    Inventors: Nicholas Kibre, Ted H. Applebaum
  • Patent number: 6202049
    Abstract: Speech signal parameters are extracted from time-series data corresponding to different sound units containing the same vowel. The extracted parameters are used to train a statistical model, such as a Hidden Markov-based Model, that has a data structure for separately modeling the nuclear trajectory region of the vowel and its surrounding transition elements. The model is trained as through embedded re-estimation to automatically determine optimally aligned models that identify the nuclear trajectory region. The boundaries of the nuclear trajectory region serve to delimit the overlap region for subsequent sound unit concatenation.
    Type: Grant
    Filed: March 9, 1999
    Date of Patent: March 13, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Steve Pearson
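As a rough stand-in for the trained Hidden Markov Model, the nuclear trajectory region can be pictured as the steadiest stretch of a formant track, with the flanking frames playing the role of the transition elements. The windowed-variation heuristic below is purely illustrative and much cruder than embedded re-estimation.

```python
# Heuristic sketch: mark the fixed-width window of a formant track with the
# least frame-to-frame movement as the vowel's nuclear trajectory region;
# its boundaries delimit the overlap region for concatenation.

def nucleus_bounds(track, width=3):
    """Return (start, end) indices of the steadiest width-frame window."""
    best, best_cost = 0, float("inf")
    for i in range(len(track) - width + 1):
        win = track[i:i + width]
        cost = sum(abs(b - a) for a, b in zip(win, win[1:]))
        if cost < best_cost:
            best, best_cost = i, cost
    return best, best + width
```

On a track that rises, plateaus, and falls, the plateau is selected as the nucleus, mirroring the stable region the HMM alignment is meant to isolate.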
  • Patent number: 6144939
    Abstract: The concatenative speech synthesizer employs demi-syllable subword units to generate speech. The synthesizer is based on a source-filter model that uses source signals that correspond closely to the human glottal source and that uses filter parameters that correspond closely to the human vocal tract. Concatenation of the demi-syllable units is facilitated by two separate cross fade techniques, one applied in the time domain to the demi-syllable source signal waveforms, and one applied in the frequency domain by interpolating the corresponding filter parameters of the concatenated demi-syllables. The dual cross fade technique results in natural sounding synthesis that avoids time-domain glitches without degrading or smearing characteristic resonances in the filter domain.
    Type: Grant
    Filed: November 25, 1998
    Date of Patent: November 7, 2000
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Steve Pearson, Nicholas Kibre, Nancy Niedzielski
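The dual cross-fade can be sketched as the same blending operation applied twice: once to the demi-syllable source waveforms in the time domain, and once to the corresponding filter-parameter tracks. The linear ramp is an illustrative choice of fade shape.

```python
# Cross-fade sketch: blend the tail of sequence a into the head of
# sequence b over `overlap` samples with a linear ramp. Applied to source
# waveforms and, separately, to filter-parameter tracks.

def crossfade(a, b, overlap):
    ramp = [i / (overlap - 1) for i in range(overlap)] if overlap > 1 else [1.0]
    blended = [x * (1 - r) + y * r
               for x, y, r in zip(a[-overlap:], b[:overlap], ramp)]
    return a[:-overlap] + blended + b[overlap:]
```

Fading the waveforms avoids time-domain glitches at the join, while interpolating the filter parameters keeps the characteristic resonances from smearing, which is the point of doing both.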
  • Patent number: 5966691
    Abstract: The operating system or application program generates events captured by an event handler mechanism that, in turn, invokes message assembler and graphics assembler modules. The message assembler constructs pseudo-random sentences or notification messages based on the type of event captured and optionally upon selected state variables. The message assembler supplies text strings to a text-to-speech engine. A linguistic database containing a lexicon of predefined words supplies text data that the message assembler concatenates to form the text strings. Text strings are assembled and optionally tagged to alter the message, speaking voice and inflection based on the nature of the message and its context. Graphics elements are assembled by a graphics assembler in response to the event handler and based on optionally supplied user-defined graphics parameters.
    Type: Grant
    Filed: April 29, 1997
    Date of Patent: October 12, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Yoshizumi Terada, Kazue Hata, Rhonda Shaw
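The event-to-speech flow can be sketched as a message assembler that looks up templates for the captured event type in a small lexicon and varies the wording pseudo-randomly. The event type, templates, and state variable below are invented for the example; the output string would be handed to a text-to-speech engine.

```python
# Message-assembler sketch: pick a wording pseudo-randomly from the lexicon
# entries for the event type, then splice in an optional state variable.
import random

LEXICON = {
    "mail_arrived": ["You have new mail{}.", "A new message just arrived{}."],
}

def assemble_message(event_type, state="", rng=None):
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    template = rng.choice(LEXICON[event_type])
    return template.format(f", {state}" if state else "")
```

Because the wording is drawn from a pool per event, repeated notifications do not sound identical, while the state variable lets the message reflect context.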
  • Patent number: 5878393
    Abstract: Computer-stored text, such as numerical information, is processed by a word list generator to develop a word list corresponding to those words that are to be spoken by the system. The word list generator assigns a prosodic environment state or token to each entry in the list. The prosodic environment identifies how the word functions in its current prosodic context. Different intonations are applied based on the prosodic environment. Next, the preceding and adjacent words are examined to determine how each word may need to be pronounced differently, based on the ending phoneme of the preceding word and the beginning phoneme of the following word. Using this phonological information along with the prosodic information, a sample list is generated by accessing a dictionary of stored samples. The sample list is then serially played through suitable digital-to-analog conversion circuitry to generate the text-to-speech output.
    Type: Grant
    Filed: September 9, 1996
    Date of Patent: March 2, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Kazue Hata, Nicholas Kibre
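The two passes this abstract describes can be sketched as follows: a first pass tags each word with a prosodic-environment token, and a second pass adjusts each word's spoken form based on its neighbors' boundary phonemes. The tokens and the single sandhi-style rule below are illustrative assumptions.

```python
# Word-list sketch: tag prosodic environment, then apply a toy phonological
# adjustment based on the following word's initial phoneme.

def tag_prosody(words):
    """Tag the last word phrase-final (falling intonation), the rest medial."""
    return [(w, "FINAL" if i == len(words) - 1 else "MEDIAL")
            for i, w in enumerate(words)]

def adjust(words):
    out = []
    for i, (w, env) in enumerate(tag_prosody(words)):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        # Toy rule: "the" before a vowel-initial word is spoken "thee".
        form = "thee" if w == "the" and nxt[:1] in "aeiou" else w
        out.append((form, env))
    return out
```

Each (form, environment) pair would then index a dictionary of stored samples, and the resulting sample list is played back to produce the text-to-speech output.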
  • Patent number: RE39336
    Abstract: The concatenative speech synthesizer employs demi-syllable subword units to generate speech. The synthesizer is based on a source-filter model that uses source signals that correspond closely to the human glottal source and that uses filter parameters that correspond closely to the human vocal tract. Concatenation of the demi-syllable units is facilitated by two separate cross fade techniques, one applied in the time domain to the demi-syllable source signal waveforms, and one applied in the frequency domain by interpolating the corresponding filter parameters of the concatenated demi-syllables. The dual cross fade technique results in natural sounding synthesis that avoids time-domain glitches without degrading or smearing characteristic resonances in the filter domain.
    Type: Grant
    Filed: November 5, 2002
    Date of Patent: October 10, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Steve Pearson, Nicholas Kibre, Nancy Niedzielski