Sound Editing, Manipulating Voice Of The Synthesizer (EPO) Patents (Class 704/E13.004)
-
Patent number: 11967324
Abstract: The method S200 can include: at an aircraft, receiving an audio utterance from air traffic control S210, converting the audio utterance to text, determining commands from the text using a question-and-answer model S240, and optionally controlling the aircraft based on the commands S250. The method functions to automatically interpret flight commands from the air traffic control (ATC) stream.
Type: Grant
Filed: October 28, 2022
Date of Patent: April 23, 2024
Assignee: Merlin Labs, Inc.
Inventors: Michael Pust, Joseph Bondaryk, Matthew George
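A minimal sketch of the pipeline this abstract describes (audio utterance → text → question-and-answer model → optional aircraft control). All function names, the `Command` type, and the stand-in QA output are hypothetical; the patent does not publish an implementation.

```python
# Hypothetical sketch of the claimed steps; stubs stand in for the ASR and
# question-and-answer models, which the abstract does not specify.
from dataclasses import dataclass

@dataclass
class Command:
    action: str    # e.g. "set_heading"
    value: float   # e.g. 270.0

def transcribe(atc_audio: bytes) -> str:
    """Speech-to-text over the ATC stream (stubbed here)."""
    return "Cessna one two three, turn left heading two seven zero"

def extract_commands(transcript: str) -> list[Command]:
    """S240: pose command slots as questions to a QA model (stubbed).

    A real system might ask "What heading was assigned?" and parse the
    model's span answer into a typed command.
    """
    answers = {"set_heading": 270.0}          # stand-in for QA model output
    return [Command(a, v) for a, v in answers.items()]

def maybe_control_aircraft(commands: list[Command], autopilot_enabled: bool) -> None:
    """S250 (optional): hand validated commands to the flight controller."""
    if autopilot_enabled:
        for cmd in commands:
            print(f"autopilot <- {cmd.action}({cmd.value})")

commands = extract_commands(transcribe(b"...raw ATC audio..."))
maybe_control_aircraft(commands, autopilot_enabled=False)
```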
-
Patent number: 11948555
Abstract: A method of processing a video file to generate a modified video file, the modified video file including a translated audio content of the video file, the method comprising: receiving the video file; accessing a facial model or a speech model for a specific speaker, wherein the facial model maps speech to facial expressions, and the speech model maps text to speech; receiving a reference content for the originating video file for the specific speaker; generating modified audio content for the specific speaker and/or modified facial expression for the specific speaker; and modifying the video file in accordance with the modified content and/or the modified expression to generate the modified video file.
Type: Grant
Filed: March 20, 2020
Date of Patent: April 2, 2024
Assignee: NEP SUPERSHOOTERS L.P.
Inventors: Mark Christie, Gerald Chao
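A hedged sketch of the claimed flow: translated text goes through a speaker-specific speech model, the resulting audio drives a speaker-specific facial model, and the video file is modified accordingly. `SpeechModel` and `FacialModel` are hypothetical stand-ins, not a published API.

```python
# Toy stand-ins for the two speaker-specific models named in the abstract.
class SpeechModel:
    """Maps text to speech in a specific speaker's voice (stub)."""
    def synthesize(self, text: str) -> bytes:
        return b"translated-audio"

class FacialModel:
    """Maps speech audio to facial-expression parameters (stub)."""
    def expressions_for(self, audio: bytes) -> list[dict]:
        return [{"frame": 0, "mouth_open": 0.4}]

def translate_video(video: bytes, translated_text: str) -> bytes:
    speech = SpeechModel().synthesize(translated_text)
    faces = FacialModel().expressions_for(speech)
    # A real implementation would re-render the face and remux the audio
    # track; here we only tag the container to show where that step belongs.
    return video + b"|" + speech + b"|" + str(len(faces)).encode()

modified = translate_video(b"original-video", "Hola, buenos dias")
```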
-
Patent number: 11942071
Abstract: An information processing system includes at least one memory storing a program and at least one processor. The at least one processor implements the program to input a piece of sound source data obtained by encoding a first identification data representative of a sound source, a piece of style data obtained by encoding a second identification data representative of a performance style, and synthesis data representative of sounding conditions into a synthesis model generated by machine learning, and to generate, using the synthesis model, feature data representative of acoustic features of a target sound of the sound source to be generated in the performance style and according to the sounding conditions, and to generate an audio signal corresponding to the target sound using the generated feature data.
Type: Grant
Filed: May 4, 2021
Date of Patent: March 26, 2024
Assignee: YAMAHA CORPORATION
Inventors: Ryunosuke Daido, Merlijn Blaauw, Jordi Bonada
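A sketch of the conditioning scheme described: a sound-source embedding, a performance-style embedding, and sounding conditions are concatenated and fed to a learned synthesis model that emits acoustic features for a vocoder. Dimensions, table sizes, and the mel-frame output are illustrative assumptions only.

```python
# Toy embedding tables and a stand-in synthesis model; the real model is
# generated by machine learning, per the abstract.
import numpy as np

rng = np.random.default_rng(0)
source_table = rng.normal(size=(8, 16))   # encodes sound-source identity
style_table = rng.normal(size=(4, 16))    # encodes performance style

def synthesis_model(cond: np.ndarray) -> np.ndarray:
    """Stand-in for the trained model: conditions -> acoustic features."""
    return np.tanh(cond @ rng.normal(size=(cond.shape[-1], 80)))  # e.g. mel frames

def synthesize(source_id: int, style_id: int, pitch: float, duration: float) -> np.ndarray:
    cond = np.concatenate([source_table[source_id],
                           style_table[style_id],
                           [pitch, duration]])        # sounding conditions
    features = synthesis_model(cond[None, :])
    return features  # a vocoder would turn these features into a waveform

mel = synthesize(source_id=2, style_id=1, pitch=440.0, duration=0.5)
```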
-
Patent number: 11922942
Abstract: Devices and techniques are generally described for generating response templates for natural language processing. In various examples, a first knowledge graph comprising a plurality of entities may be received. First text data may be received for a first response template, the first text data defining a natural language input configured to invoke the first response template. A response definition may be received for the first response template, the response definition defining a response associated with the first response template. Natural language input data may be received. A determination may be made that the natural language input data corresponds to the natural language input configured to invoke the first response template. The first response template may be configured to generate natural language output data.
Type: Grant
Filed: June 4, 2020
Date of Patent: March 5, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Emre Can Kilinc, Thomas Reno, John Zucchi, Joshua Kaplan
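A minimal sketch of template invocation as the abstract lays it out: text that defines the invoking input, a response definition, and a knowledge graph that fills entity slots. Matching by normalized string prefix is a simplification, and all names here are hypothetical.

```python
# Toy response-template matcher; a production system would use trained
# NLU rather than prefix matching.
from dataclasses import dataclass

@dataclass
class ResponseTemplate:
    invocation: str        # text data defining the input that invokes it
    response_def: str      # response definition, with entity slots

templates = [
    ResponseTemplate("how tall is {entity}", "{entity} is {height} tall"),
]
knowledge_graph = {"mount everest": {"height": "8,849 m"}}

def respond(utterance: str) -> str | None:
    for t in templates:
        prefix = t.invocation.split("{")[0]          # "how tall is "
        if utterance.startswith(prefix):
            entity = utterance[len(prefix):]
            facts = knowledge_graph.get(entity, {})
            return t.response_def.format(entity=entity, **facts)
    return None

print(respond("how tall is mount everest"))  # -> "mount everest is 8,849 m tall"
```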
-
Patent number: 11803399
Abstract: A method, computer program product, and computer system for defining, at a computing device, psychometric data for a user. An interactive virtual assistant, selected from a plurality of interactive virtual assistants, may be provided on the computing device based upon, at least in part, the psychometric data defined for the user. The user may be prompted, via the interactive virtual assistant, with one or more options.
Type: Grant
Filed: May 31, 2020
Date of Patent: October 31, 2023
Assignee: Happy Money, Inc.
Inventors: Adam Zarlengo, Chris Courtney, Michael Tepper, Josh Hemsley, Ryan Howes, Daniel Sinner, Scott Saunders
-
Patent number: 11798542
Abstract: The disclosed computer-implemented method may include receiving input voice data synchronous with a visual state of a user interface of the third-party application, generating multiple sentence alternatives for the received input voice data, identifying a best sentence of the multiple sentence alternatives, executing a dialog script for the third-party application using the best sentence, the dialog script generating a response to the received voice data comprising output voice data and a corresponding visual response, and providing the visual response and the output voice data to the third-party application, the third-party application playing the output voice data synchronous with updating the user interface based on the visual response. Various other methods, systems, and computer-readable media are also disclosed.
Type: Grant
Filed: March 26, 2021
Date of Patent: October 24, 2023
Assignee: Alan AI, Inc.
Inventors: Andrey Ryabov, Ramu V. Sunkara
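A sketch of the n-best flow described: generate sentence alternatives for the voice input, pick the best one, then run a dialog script that returns voice output plus a visual response for the third-party app to apply. The scoring and the dialog logic are stand-ins, not the patented implementation.

```python
# Toy n-best ASR list, best-sentence selection, and a dialog script that
# pairs spoken output with a visual response.
def sentence_alternatives(audio: bytes) -> list[tuple[str, float]]:
    """Stub ASR n-best list: (hypothesis, confidence)."""
    return [("show my orders", 0.91), ("show my folders", 0.42)]

def best_sentence(alts: list[tuple[str, float]]) -> str:
    return max(alts, key=lambda hs: hs[1])[0]

def dialog_script(sentence: str, ui_state: dict) -> tuple[str, dict]:
    """Returns (output voice text, visual response); ui_state mirrors the
    abstract's point that voice input is synchronous with the UI's state."""
    if sentence == "show my orders":
        return "Here are your orders.", {"navigate": "/orders"}
    return "Sorry, I didn't catch that.", {}

voice_out, visual = dialog_script(
    best_sentence(sentence_alternatives(b"...")), {"page": "/home"})
```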
-
Patent number: 11776537
Abstract: A computer-implemented method is provided to optimize natural language processing of voice interaction data in product/service categorization and product/service application. The computer-implemented method receives, from a voice interaction device through a context discovery interface, user voice data corresponding to a user. Furthermore, the computer-implemented method performs, with an NLP engine, natural language processing of the user voice data to determine a context category. Additionally, the computer-implemented method selects, with an AI engine, one of a plurality of context-specific applier interfaces based on the context category. The computer-implemented method automatically transitions, with the AI engine, to said one of the plurality of context-specific applier interfaces. Finally, the computer-implemented method interacts, via the AI engine, with the user via a voice interaction to initiate the product/service application.
Type: Grant
Filed: December 7, 2022
Date of Patent: October 3, 2023
Assignee: Blue Lakes Technology, Inc.
Inventors: Anand Menon, Satyaprashvitha Nara
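A minimal sketch of the routing step: the NLP engine assigns a context category to the user's words, and the AI engine transitions to a category-specific applier interface. The categories, interface names, and keyword classifier below are assumptions for illustration.

```python
# Toy context classifier and applier-interface routing table.
APPLIER_INTERFACES = {
    "auto_loan": "AutoLoanApplier",
    "insurance": "InsuranceApplier",
}

def classify_context(utterance: str) -> str:
    """Stand-in NLP engine: keyword match instead of a trained classifier."""
    return "auto_loan" if "car" in utterance else "insurance"

def transition(utterance: str) -> str:
    category = classify_context(utterance)
    applier = APPLIER_INTERFACES[category]
    return f"transitioning to {applier} for category '{category}'"

print(transition("I want to finance a car"))
```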
-
Patent number: 11741941
Abstract: A discriminator trained on labeled samples of speech can compute probabilities of voice properties. A speech synthesis generative neural network that takes in text and continuous scale values of voice properties is trained to synthesize speech audio that the discriminator will infer as matching the values of the input voice properties. Voice parameters can include speaker voice parameters, accents, and attitudes, among others. Training can be done by transfer learning from an existing neural speech synthesis model or such a model can be trained with a loss function that considers speech and parameter values. A graphical user interface can allow voice designers for products to synthesize speech with a desired voice or generate a speech synthesis engine with frozen voice parameters. A vector of parameters can be used for comparison to previously registered voices in databases such as ones for trademark registration.
Type: Grant
Filed: June 7, 2021
Date of Patent: August 29, 2023
Assignee: SoundHound, Inc.
Inventor: Andrew Richards
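A conceptual sketch of the training signal described: a generator conditioned on text plus continuous voice-property values is penalized when a frozen discriminator's inferred properties differ from the requested ones. The toy tensors and the identity-like "discriminator" below are assumptions, not the patented networks.

```python
# Toy generator/discriminator pair illustrating the property-matching loss.
import numpy as np

rng = np.random.default_rng(1)

def generator(text_emb: np.ndarray, props: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy synthesizer: text embedding + property values -> 'audio' features."""
    return np.tanh(np.concatenate([text_emb, props]) @ w)

def discriminator(audio: np.ndarray) -> np.ndarray:
    """Frozen property estimator: audio -> inferred voice properties."""
    return audio[:3]  # pretend the first 3 features encode the properties

w = rng.normal(size=(8, 16), scale=0.1)           # generator weights
text_emb, target_props = rng.normal(size=5), np.array([0.2, -0.7, 1.0])
audio = generator(text_emb, target_props, w)
loss = float(np.mean((discriminator(audio) - target_props) ** 2))
# Gradient descent on `loss` w.r.t. the generator weights would push the
# synthesized voice toward the requested property values.
```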
-
Patent number: 11741954
Abstract: Provided are a method and an apparatus for providing an intelligent voice response at a voice assistant device. The method includes obtaining, by a voice assistant device, a voice input from a user, identifying non-speech input while obtaining the voice input, determining a correlation between the voice input and the non-speech input, and generating, based on the correlation, a response comprising an action related to the correlation or a suggestion related to the correlation.
Type: Grant
Filed: February 11, 2021
Date of Patent: August 29, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Vinay Vasanth Patage, Sourabh Tiwari, Ravibhushan B. Tayshete
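A minimal sketch of the correlation idea: a non-speech event detected while the user is speaking (a cough, a doorbell) is matched against the words and shapes the response. The event types and rule table are illustrative assumptions; the patent does not enumerate them.

```python
# Toy correlation table between spoken phrases and co-occurring non-speech
# events; a real system would learn or infer this relationship.
def correlate(voice_text: str, non_speech_event: str) -> str | None:
    rules = {
        ("play something", "cough"): "suggest_rest",
        ("who is it", "doorbell"): "show_door_camera",
    }
    for (phrase, event), action in rules.items():
        if phrase in voice_text and event == non_speech_event:
            return action
    return None

action = correlate("who is it at this hour", "doorbell")
print(action or "no correlation: answer the voice input alone")
```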
-
Patent number: 11455985
Abstract: An information processing apparatus determines, on the basis of a speech of a user to be evaluated, a reference feature quantity representing a feature of the user's speech at normal times, acquires audio feature quantity data of a target speech to be evaluated made by the user, and evaluates the feature of the target speech on the basis of a comparison result between the audio feature quantity of the target speech and the reference feature quantity.
Type: Grant
Filed: February 9, 2017
Date of Patent: September 27, 2022
Assignee: SONY INTERACTIVE ENTERTAINMENT INC.
Inventors: Shinichi Kariya, Shinichi Honda, Hiroyuki Segawa
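A sketch of the evaluation in the abstract: derive a per-user reference feature from speech at normal times, then score a target utterance by its deviation from that baseline. Pitch is used here as a stand-in feature; the patent's feature quantities and thresholds are not specified in the abstract.

```python
# Baseline-versus-target comparison with pitch as a toy feature quantity.
import numpy as np

normal_pitches = np.array([118.0, 121.5, 119.2, 120.8])  # Hz, "normal times"
reference = normal_pitches.mean()                         # reference feature quantity

def evaluate(target_pitch: float) -> str:
    deviation = target_pitch - reference
    return "excited" if deviation > 15 else "calm"

print(evaluate(142.0))  # large positive deviation -> "excited"
```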
-
Patent number: 11430445
Abstract: A system including one or more processors and one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform receiving a voice command from a user to perform a virtual action intended to apply to one item of two or more items in a cart of the user; generating a concept vector representing a concept in the voice command; transforming the respective item attributes for each of the two or more items into a respective feature vector; generating a respective candidate score for the each of the two or more items; identifying the one item to which the voice command is intended to apply; and executing an action with respect to the one item based on the voice command. Other embodiments are disclosed.
Type: Grant
Filed: January 30, 2020
Date of Patent: August 30, 2022
Assignee: WALMART APOLLO, LLC
Inventors: Ghodratollah Aalipour Hafshejani, Phani Ram Sayapaneni
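A sketch of the disambiguation step: embed the concept in the voice command and each cart item's attributes, score candidates by similarity, and act on the best match. The bag-of-words embedding and cosine scoring below are simplifications of whatever vectorization the patent actually uses.

```python
# Toy concept-vector matching over cart items.
import numpy as np

VOCAB = ["organic", "milk", "bread", "whole", "wheat"]

def embed(text: str) -> np.ndarray:
    words = text.lower().split()
    return np.array([float(w in words) for w in VOCAB])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

cart = {"organic whole milk": embed("organic whole milk"),
        "whole wheat bread": embed("whole wheat bread")}
concept = embed("remove the milk")                  # concept vector
scores = {item: cosine(concept, v) for item, v in cart.items()}
target = max(scores, key=scores.get)
print(f"remove '{target}' from cart")               # act on the best-scoring item
```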
-
Patent number: 11380327
Abstract: The present disclosure relates to the field of intelligent communications, and discloses a speech communication system and method with human-machine coordination. The system and method resolve the poor client experience caused by the noticeable differences that arise after a switchover during a call under prior human-machine coordination, as well as the resulting waste of the client's time.
Type: Grant
Filed: April 2, 2021
Date of Patent: July 5, 2022
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventor: Huapeng Sima
-
Patent number: 11361751
Abstract: In a speech synthesis method, an emotion intensity feature vector is set for a target synthesis text, an acoustic feature vector corresponding to an emotion intensity is generated based on the emotion intensity feature vector by using an acoustic model, and a speech corresponding to the emotion intensity is synthesized based on the acoustic feature vector. The emotion intensity feature vector is continuously adjustable, and emotion speeches of different intensities can be generated based on values of different emotion intensity feature vectors, so that emotion types of a synthesized speech are more diversified. This application may be applied to a human-computer interaction process in the artificial intelligence (AI) field, to perform intelligent emotion speech synthesis.
Type: Grant
Filed: April 8, 2021
Date of Patent: June 14, 2022
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Liqun Deng, Jiansheng Wei, Wenhua Sun
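A sketch of what "continuously adjustable" emotion intensity can mean in practice: an emotion one-hot scaled by an intensity value conditions the acoustic model, so intermediate values yield intermediate expressiveness. The vector layout and the stub model are assumptions, not Huawei's design.

```python
# Toy continuous emotion-intensity conditioning.
import numpy as np

EMOTIONS = {"neutral": 0, "happy": 1, "angry": 2}

def emotion_vector(emotion: str, intensity: float) -> np.ndarray:
    """Continuous emotion intensity feature vector (intensity in [0, 1])."""
    v = np.zeros(len(EMOTIONS))
    v[EMOTIONS[emotion]] = intensity
    return v

def acoustic_model(text_features: np.ndarray, emo: np.ndarray) -> np.ndarray:
    """Stub: text features + emotion vector -> acoustic feature vector."""
    return np.concatenate([text_features, emo])

soft = acoustic_model(np.ones(4), emotion_vector("happy", 0.3))
strong = acoustic_model(np.ones(4), emotion_vector("happy", 0.9))
# Same text, same emotion class, different intensity -> different acoustics.
```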
-
Patent number: 8311830
Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected from both an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved and stored on the server and then set up on the server to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
Type: Grant
Filed: December 6, 2011
Date of Patent: November 13, 2012
Assignee: Cepstral, LLC
Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
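A sketch of the service loop this abstract (and its sibling, 8086457 below) describes: deliver prompts, collect recordings, build a voice database with text-to-speech tools, then let the user refine settings and rebuild. All names are hypothetical; the patent describes a workflow, not an API.

```python
# Toy record -> build -> refine loop for a custom voice database.
def collect_recordings(prompts: list[str]) -> dict[str, bytes]:
    """User reads each prompt; the recording is saved on the server."""
    return {p: f"recording-of:{p}".encode() for p in prompts}

def build_voice_database(recordings: dict[str, bytes], settings: dict) -> dict:
    """Stand-in for the text-to-speech voice-building tools."""
    return {"units": len(recordings), "pitch_shift": settings["pitch_shift"]}

prompts = ["The quick brown fox.", "How are you today?"]
settings = {"pitch_shift": 0.0}
voice_db = build_voice_database(collect_recordings(prompts), settings)
settings["pitch_shift"] = -0.5        # user refines via the graphical interface
voice_db = build_voice_database(collect_recordings(prompts), settings)  # rebuild
```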
-
Patent number: 8311831
Abstract: A voice emphasizing device emphasizes in a speech a "strained rough voice" at a position where a speaker or user of the speech intends to generate emphasis or musical expression. Thereby, the voice emphasizing device can provide the position with emphasis of anger, excitement, tension, or an animated way of speaking, or musical expression of Enka (Japanese ballad), blues, rock, or the like. As a result, rich vocal expression can be achieved. The voice emphasizing device includes: an emphasis utterance section detection unit (12) detecting, from an input speech waveform, an emphasis section that is a time duration having a waveform intended by the speaker or user to be converted; and a voice emphasizing unit (13) increasing fluctuation of an amplitude envelope of the waveform in the detected emphasis section.
Type: Grant
Filed: September 29, 2008
Date of Patent: November 13, 2012
Assignee: Panasonic Corporation
Inventors: Yumiko Kato, Takahiro Kamai, Masakatsu Hoshimi
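A sketch of the emphasis effect described: within a detected emphasis section, add fluctuation to the amplitude envelope (here, modulation in the tens-of-Hz range) to evoke a strained, rough voice. The modulation depth, rate, and section bounds are illustrative values, not the patent's.

```python
# Toy amplitude-envelope fluctuation applied only inside an emphasis section.
import numpy as np

sr = 16000
t = np.arange(sr) / sr                       # 1 s of audio
speech = np.sin(2 * np.pi * 150 * t)         # toy voiced signal

emphasis = (t >= 0.3) & (t < 0.7)            # detected emphasis section
depth, mod_hz = 0.6, 40.0                    # fluctuation depth and rate
envelope = np.ones_like(t)
envelope[emphasis] += depth * np.sin(2 * np.pi * mod_hz * t[emphasis])

emphasized = speech * envelope               # amplitude-envelope fluctuation
```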
-
Patent number: 8086457
Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected from both an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved and stored on the server and then set up on the server to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
Type: Grant
Filed: May 29, 2008
Date of Patent: December 27, 2011
Assignee: Cepstral, LLC
Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
-
Publication number: 20090076822
Abstract: A sequence is received of time domain digital audio samples representing sound (e.g., a sound generated by a human voice or a musical instrument). The time domain digital audio samples are processed to derive a corresponding sequence of audio pulses in the time domain. Each of the audio pulses is associated with a characteristic frequency. Frequency domain information is derived about each of at least some of the audio pulses. The sound represented by the time domain digital audio samples is transformed by processing the audio pulses using the frequency domain information.
Type: Application
Filed: September 13, 2007
Publication date: March 19, 2009
Inventor: Jordi Bonada Sanjaume
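A sketch of the pulse-based analysis: slice the time-domain samples into pulses (here, one per fundamental period), attach a characteristic frequency to each, and derive per-pulse frequency-domain information via an FFT. The fixed-period slicing is a simplification of whatever pulse derivation the application actually claims.

```python
# Toy period-synchronous pulse slicing with per-pulse spectra.
import numpy as np

sr = 16000
t = np.arange(sr // 10) / sr                  # 100 ms
samples = np.sin(2 * np.pi * 200 * t)         # toy 200 Hz tone

period = sr // 200                            # samples per fundamental period
pulses = [samples[i:i + period] for i in range(0, len(samples) - period, period)]

for pulse in pulses[:3]:
    spectrum = np.abs(np.fft.rfft(pulse))
    char_freq = np.argmax(spectrum) * sr / len(pulse)   # characteristic frequency
    # Transformation (e.g. pitch shift) would edit `spectrum` and invert it.
    print(f"pulse characteristic frequency ~ {char_freq:.0f} Hz")
```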
-
Publication number: 20090006097
Abstract: Pronunciation correction for text-to-speech (TTS) systems and speech recognition (SR) systems between different languages is provided. If a word requiring pronunciation by a target language TTS or SR is from a same language as the target language, but is not found in a lexicon of words from the target language, a letter-to-speech (LTS) rules set of the target language is used to generate a letter-to-speech output for the word for use by the TTS or SR configured according to the target language. If the word is from a different language as the target language, phonemes comprising the word according to its native language are mapped to phonemes of the target language. The phoneme mapping is used by the TTS or SR configured according to the target language for generating or recognizing an audible form of the word according to the target language.
Type: Application
Filed: June 29, 2007
Publication date: January 1, 2009
Applicant: Microsoft Corporation
Inventors: Cameron Ali Etezadi, Timothy David Sharpe
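A minimal sketch of the fallback order described: in-lexicon words use the lexicon, same-language words missing from it go through letter-to-speech (LTS) rules, and foreign words get their native phonemes mapped to the target language's phoneme set. The lexicon, rules, and mapping table below are toy assumptions.

```python
# Toy lexicon lookup, LTS fallback, and cross-language phoneme mapping.
LEXICON = {"hello": ["HH", "AH", "L", "OW"]}
PHONEME_MAP = {"ʁ": "R", "ø": "ER"}          # native phoneme -> target phoneme

def lts_rules(word: str) -> list[str]:
    """Stand-in letter-to-speech rules for the target language."""
    return [c.upper() for c in word]

def pronounce(word: str, native_phonemes: list[str] | None = None) -> list[str]:
    if native_phonemes:                       # foreign word: map its phonemes
        return [PHONEME_MAP.get(p, p) for p in native_phonemes]
    return LEXICON.get(word) or lts_rules(word)   # lexicon hit, else LTS

print(pronounce("hello"))                     # lexicon hit
print(pronounce("zorbit"))                    # LTS fallback
print(pronounce("cœur", ["K", "ø", "ʁ"]))     # cross-language phoneme mapping
```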
-
Publication number: 20080235025
Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme, by using data representing a regular or statistical phoneme length in an utterance of a human, with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information, so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
Type: Application
Filed: February 11, 2008
Publication date: September 25, 2008
Applicant: FUJITSU LIMITED
Inventors: Kentaro Murase, Nobuyuki Katae
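A sketch of the modification this abstract describes: where a phoneme boundary extracted from real speech looks unreliable, pull it toward a boundary predicted from regular (statistical) phoneme lengths. The durations are toy values in milliseconds, and the linear blending rule is an assumption; the application does not disclose its exact reset formula.

```python
# Toy reset of real-voice phoneme boundaries toward regular boundaries.
REGULAR_MS = {"k": 70, "a": 110, "n": 60}     # statistical phoneme lengths (ms)

def regular_boundaries(phonemes: list[str], start: float) -> list[float]:
    bounds, t = [], start
    for p in phonemes:
        t += REGULAR_MS[p]
        bounds.append(t)
    return bounds

real = [65.0, 240.0, 255.0]                   # extracted real-voice boundaries (ms)
regular = regular_boundaries(["k", "a", "n"], start=0.0)   # [70, 180, 240]

blend = 0.5                                   # how far to pull toward regular values
modified = [(1 - blend) * r + blend * g for r, g in zip(real, regular)]
print(modified)                               # reset real-voice phoneme boundaries
```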