Patents by Inventor Vincent Pollet
Vincent Pollet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11069335
Abstract: Aspects of the disclosure are related to synthesizing speech or other audio based on input data. Additionally, aspects of the disclosure are related to using one or more recurrent neural networks. For example, a computing device may receive text input; may determine features based on the text input; may provide the features as input to a recurrent neural network; may determine embedded data from one or more activations of a hidden layer of the recurrent neural network; may determine speech data based on a speech unit search that attempts to select, from a database, speech units based on the embedded data; and may generate speech output based on the speech data.
Type: Grant
Filed: July 12, 2017
Date of Patent: July 20, 2021
Assignee: Cerence Operating Company
Inventors: Vincent Pollet, Enrico Zovato
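The steps named in this abstract (features in, hidden-layer activations out, then a database search over speech units) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the patent: the simple Elman-style recurrence, the random weights, the nearest-neighbour search by squared Euclidean distance, and the names `rnn_hidden_states` and `select_units` are all hypothetical.

```python
import math
import random


def rnn_hidden_states(features, w_in, w_rec, bias):
    """Run a simple Elman-style RNN; return the hidden activations per step.

    Hypothetical stand-in for "embedded data from one or more activations
    of a hidden layer"; the patent does not prescribe this architecture.
    """
    hidden = [0.0] * len(bias)
    states = []
    for x in features:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
        states.append(hidden)
    return states


def select_units(embeddings, unit_embeddings):
    """For each embedding, return the index of the closest database unit.

    Assumption: plain nearest-neighbour search by squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [
        min(range(len(unit_embeddings)),
            key=lambda k: sq_dist(e, unit_embeddings[k]))
        for e in embeddings
    ]


# Toy data: random weights, features, and a 20-unit "database".
rng = random.Random(0)
n_feat, n_hidden, n_steps, n_units = 4, 8, 5, 20
w_in = [[rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_hidden)]
w_rec = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_hidden)]
bias = [0.0] * n_hidden
features = [[rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_steps)]
unit_embeddings = [[rng.gauss(0, 1) for _ in range(n_hidden)]
                   for _ in range(n_units)]

embedded = rnn_hidden_states(features, w_in, w_rec, bias)
selected = select_units(embedded, unit_embeddings)
```

A real system would use a trained network and would likely combine this target match with a cost for how well neighbouring units join; the sketch keeps only the embedding lookup that the abstract describes.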
-
Publication number: 20200211529
Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech.
Type: Application
Filed: February 11, 2020
Publication date: July 2, 2020
Applicant: Nuance Communications, Inc.
Inventor: Vincent Pollet
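One way to picture the style-similarity idea in this abstract is a segment search that penalises candidates by how dissimilar their recorded style is from the desired one. The sketch below is only an illustration under stated assumptions: the `Segment` fields, the similarity table and its values, the additive cost, and the `style_weight` parameter are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    unit: str             # hypothetical unit label, e.g. a diphone
    style: str            # speaking style the segment was recorded in
    acoustic_cost: float  # hypothetical cost of the acoustic match


def style_similarity(a, b, table):
    """Return a similarity in [0, 1]; identical styles score 1.0.

    Assumption: unknown style pairs are treated as fully dissimilar (0.0).
    """
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)


def pick_segments(units, desired_style, inventory, table, style_weight=1.0):
    """Pick, per unit, the candidate with the lowest combined cost."""
    chosen = []
    for unit in units:
        candidates = [s for s in inventory if s.unit == unit]
        if not candidates:
            raise LookupError(f"no segment available for unit {unit!r}")
        best = min(
            candidates,
            key=lambda s: s.acoustic_cost
            + style_weight
            * (1.0 - style_similarity(desired_style, s.style, table)),
        )
        chosen.append(best)
    return chosen


# Toy similarity table and inventory (made-up values).
table = {
    frozenset(("cheerful", "neutral")): 0.7,
    frozenset(("cheerful", "sad")): 0.1,
}
inventory = [
    Segment("h-e", "cheerful", 0.2),
    Segment("h-e", "sad", 0.1),
    Segment("e-l", "neutral", 0.3),
    Segment("e-l", "sad", 0.05),
]
chosen = pick_segments(["h-e", "e-l"], "cheerful", inventory, table)
styles = [s.style for s in chosen]
```

In this toy inventory no "cheerful" recording exists for the unit `e-l`, so the search falls back to the "neutral" segment because that style is rated closer to "cheerful" than "sad" is, even though the "sad" segment has the lower acoustic cost.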
-
Publication number: 20190108830
Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech.
Type: Application
Filed: June 4, 2018
Publication date: April 11, 2019
Applicant: Nuance Communications, Inc.
Inventor: Vincent Pollet
-
Patent number: 9990915
Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech.
Type: Grant
Filed: December 28, 2016
Date of Patent: June 5, 2018
Assignee: Nuance Communications, Inc.
Inventor: Vincent Pollet
-
Publication number: 20180096677
Abstract: Aspects of the disclosure are related to synthesizing speech or other audio based on input data. Additionally, aspects of the disclosure are related to using one or more recurrent neural networks. For example, a computing device may receive text input; may determine features based on the text input; may provide the features as input to a recurrent neural network; may determine embedded data from one or more activations of a hidden layer of the recurrent neural network; may determine speech data based on a speech unit search that attempts to select, from a database, speech units based on the embedded data; and may generate speech output based on the speech data.
Type: Application
Filed: July 12, 2017
Publication date: April 5, 2018
Inventors: Vincent Pollet, Enrico Zovato
-
Publication number: 20170110110
Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments.
Type: Application
Filed: December 28, 2016
Publication date: April 20, 2017
Applicant: Nuance Communications, Inc.
Inventor: Vincent Pollet
-
Patent number: 9570065
Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments.
Type: Grant
Filed: September 29, 2014
Date of Patent: February 14, 2017
Assignee: Nuance Communications, Inc.
Inventor: Vincent Pollet
-
Patent number: 9484045
Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
Type: Grant
Filed: September 7, 2012
Date of Patent: November 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
-
Publication number: 20160093289
Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments.
Type: Application
Filed: September 29, 2014
Publication date: March 31, 2016
Inventor: Vincent Pollet
-
Publication number: 20140074468
Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
Type: Application
Filed: September 7, 2012
Publication date: March 13, 2014
Applicant: Nuance Communications, Inc.
Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
-
Patent number: 8321222
Abstract: A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.
Type: Grant
Filed: August 14, 2007
Date of Patent: November 27, 2012
Assignee: Nuance Communications, Inc.
Inventors: Vincent Pollet, Andrew Breen
-
Patent number: 7567896
Abstract: A system and method generate synthesized speech through concatenation of speech segments that are derived from a large prosodically-rich corpus of speech segments including using an additional dictionary of speech segment identifier sequences.
Type: Grant
Filed: January 18, 2005
Date of Patent: July 28, 2009
Assignee: Nuance Communications, Inc.
Inventors: Geert Coorman, Vincent Pollet, Stefaan Van Gerven, Mario De Bock, Bert Van Coile, Jan De Moortel
-
Publication number: 20090048841
Abstract: A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.
Type: Application
Filed: August 14, 2007
Publication date: February 19, 2009
Applicant: Nuance Communications, Inc.
Inventors: Vincent Pollet, Andrew Breen
-
Publication number: 20050182629
Abstract: A system and method generate synthesized speech through concatenation of speech segments that are derived from a large prosodically-rich corpus of speech segments including using an additional dictionary of speech segment identifier sequences.
Type: Application
Filed: January 18, 2005
Publication date: August 18, 2005
Inventors: Geert Coorman, Vincent Pollet, Stefaan Van Gerven, Mario De Bock, Bert Van Coile, Jan De Moortel