Sound Editing, Manipulating Voice Of The Synthesizer (EPO) Patents (Class 704/E13.004)
  • Patent number: 11948555
    Abstract: A method of processing a video file to generate a modified video file, the modified video file including a translated audio content of the video file, the method comprising: receiving the video file; accessing a facial model or a speech model for a specific speaker, wherein the facial model maps speech to facial expressions, and the speech model maps text to speech; receiving a reference content for the originating video file for the specific speaker; generating modified audio content for the specific speaker and/or modified facial expression for the specific speaker; and modifying the video file in accordance with the modified content and/or the modified expression to generate the modified video file.
    Type: Grant
    Filed: March 20, 2020
    Date of Patent: April 2, 2024
    Assignee: NEP SUPERSHOOTERS L.P.
    Inventors: Mark Christie, Gerald Chao
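The pipeline this abstract describes can be sketched end to end: translated reference content is run through a speech model (text to speech) and a facial model (speech to facial expressions), and both modified streams are written back into the video. The functions below are illustrative stand-ins, not the patented models.

```python
# Toy sketch of the dubbing pipeline (all model internals are assumptions).
def speech_model(text: str) -> list:
    # Stand-in TTS for the specific speaker: one "audio frame" per word.
    return [f"audio:{w}" for w in text.split()]

def facial_model(audio_frames: list) -> list:
    # Stand-in speech-to-expression mapping: one viseme frame per audio frame.
    return [f.replace("audio:", "viseme:") for f in audio_frames]

def modify_video(video: dict, reference_text: str) -> dict:
    # Generate modified audio, derive matching facial expressions,
    # then splice both back into the video file.
    audio = speech_model(reference_text)
    faces = facial_model(audio)
    return {**video, "audio_track": audio, "face_track": faces}

clip = modify_video({"id": "clip42"}, "bonjour le monde")
```

The key design point the claim hinges on is that the facial model is conditioned on the *modified* audio, so lip movement tracks the translated speech rather than the original.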
  • Patent number: 11942071
    Abstract: An information processing system includes at least one memory storing a program and at least one processor. The at least one processor implements the program to input a piece of sound source data obtained by encoding a first identification data representative of a sound source, a piece of style data obtained by encoding a second identification data representative of a performance style, and synthesis data representative of sounding conditions into a synthesis model generated by machine learning, and to generate, using the synthesis model, feature data representative of acoustic features of a target sound of the sound source to be generated in the performance style and according to the sounding conditions, and to generate an audio signal corresponding to the target sound using the generated feature data.
    Type: Grant
    Filed: May 4, 2021
    Date of Patent: March 26, 2024
    Assignee: YAMAHA CORPORATION
    Inventors: Ryunosuke Daido, Merlijn Blaauw, Jordi Bonada
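The data flow in this abstract can be sketched as follows: an encoded sound-source identifier, an encoded style identifier, and per-frame sounding conditions are concatenated and fed to a learned synthesis model that emits acoustic feature frames. The embedding tables and the toy "model" below are illustrative stand-ins, not Yamaha's trained network.

```python
from typing import List

# Assumed encodings of the first (sound source) and second (style) identification data.
SOURCE_EMBEDDINGS = {"flute": [1.0, 0.0], "violin": [0.0, 1.0]}
STYLE_EMBEDDINGS = {"legato": [0.5], "staccato": [-0.5]}

def synthesis_model(frame_input: List[float]) -> List[float]:
    """Stand-in for the machine-learned synthesis model: maps one
    conditioning frame to acoustic features (e.g. a spectrum slice)."""
    return [sum(frame_input) * w for w in (0.1, 0.2, 0.3)]

def generate_features(source: str, style: str,
                      conditions: List[List[float]]) -> List[List[float]]:
    src, sty = SOURCE_EMBEDDINGS[source], STYLE_EMBEDDINGS[style]
    # One conditioning frame per sounding condition (e.g. pitch, loudness).
    return [synthesis_model(src + sty + cond) for cond in conditions]

frames = generate_features("flute", "legato", [[60.0, 0.8], [62.0, 0.7]])
```

A vocoder would then turn the feature frames into the audio signal of the target sound, the final step the abstract names.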
  • Patent number: 11922942
    Abstract: Devices and techniques are generally described for generating response templates for natural language processing. In various examples, a first knowledge graph comprising a plurality of entities may be received. First text data may be received for a first response template, the first text data defining a natural language input configured to invoke the first response template. A response definition may be received for the first response template, the response definition defining a response associated with the first response template. Natural language input data may be received. A determination may be made that the natural language input data corresponds to the natural language input configured to invoke the first response template. The first response template may be configured to generate natural language output data.
    Type: Grant
    Filed: June 4, 2020
    Date of Patent: March 5, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Emre Can Kilinc, Thomas Reno, John Zucchi, Joshua Kaplan
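The invoke-and-respond flow described above can be sketched with a template that pairs an invoking natural-language input with a response definition, backed by a small entity graph. The matching rule (normalized string equality) and all data are assumptions for brevity.

```python
from dataclasses import dataclass

# Toy stand-in for the "first knowledge graph comprising a plurality of entities".
KNOWLEDGE_GRAPH = {"Acme": {"founder": "Jane Doe"}}

@dataclass
class ResponseTemplate:
    invocation: str           # natural-language input that invokes the template
    response_definition: str  # response associated with the template

def answer(utterance: str, entity: str, template: ResponseTemplate):
    # Determine whether the input corresponds to the template's invocation.
    if utterance.lower() != template.invocation.format(entity=entity).lower():
        return None
    facts = KNOWLEDGE_GRAPH[entity]
    # Generate natural language output from the response definition.
    return template.response_definition.format(entity=entity, **facts)

t = ResponseTemplate("who founded {entity}", "The founder of {entity} is {founder}.")
reply = answer("Who founded Acme", "Acme", t)
```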
  • Patent number: 11803399
    Abstract: A method, computer program product, and computer system for defining, at a computing device, psychometric data for a user. An interactive virtual assistant, selected from a plurality of interactive virtual assistants, may be provided on the computing device based upon, at least in part, the psychometric data defined for the user. The user may be prompted, via the interactive virtual assistant, with one or more options.
    Type: Grant
    Filed: May 31, 2020
    Date of Patent: October 31, 2023
    Assignee: Happy Money, Inc.
    Inventors: Adam Zarlengo, Chris Courtney, Michael Tepper, Josh Hemsley, Ryan Howes, Daniel Sinner, Scott Saunders
  • Patent number: 11798542
    Abstract: The disclosed computer-implemented method may include receiving input voice data synchronous with a visual state of a user interface of the third-party application, generating multiple sentence alternatives for the received input voice data, identifying a best sentence of the multiple sentence alternatives, executing a dialog script for the third-party application using the best sentence, the dialog script generating a response to the received voice data comprising output voice data and a corresponding visual response, and providing the visual response and the output voice data to the third-party application, the third-party application playing the output voice data synchronous with updating the user interface based on the visual response. Various other methods, systems, and computer-readable media are also disclosed.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: October 24, 2023
    Assignee: Alan AI, Inc.
    Inventors: Andrey Ryabov, Ramu V. Sunkara
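The n-best step in this abstract lends itself to a small sketch: the recognizer yields several sentence alternatives, the best one is selected by score, and a dialog script turns it into paired voice and visual output. The scoring and the toy script are assumptions, not the patented implementation.

```python
def pick_best(alternatives):
    # Each alternative: (sentence, recognizer confidence). Identify the best sentence.
    return max(alternatives, key=lambda a: a[1])[0]

def dialog_script(sentence, visual_state):
    # Toy dialog script: produce output voice data plus a visual response
    # keyed to the UI state the voice input arrived with.
    return {"output_voice": f"You said: {sentence}",
            "visual_response": {"screen": visual_state["screen"],
                                "highlight": sentence}}

alts = [("show my orders", 0.91), ("show my borders", 0.34)]
response = dialog_script(pick_best(alts), {"screen": "home"})
```

The synchronization the claim emphasizes would then amount to playing `output_voice` while applying `visual_response` to the third-party UI in the same step.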
  • Patent number: 11776537
    Abstract: A computer-implemented method is provided to optimize natural language processing of voice interaction data in product/service categorization and product/service application. The computer-implemented method receives, from a voice interaction device through a context discovery interface, user voice data corresponding to a user. Furthermore, the computer-implemented method performs, with an NLP engine, natural language processing of the user voice data to determine a context category. Additionally, the computer-implemented method selects, with an AI engine, one of a plurality of context-specific applier interfaces based on the context category. The computer-implemented method automatically transitions, with the AI engine, to said one of the plurality of context-specific applier interfaces. Finally, the computer-implemented method interacts, via the AI engine, with the user via a voice interaction to initiate the product/service application.
    Type: Grant
    Filed: December 7, 2022
    Date of Patent: October 3, 2023
    Assignee: Blue Lakes Technology, Inc.
    Inventors: Anand Menon, Satyaprashvitha Nara
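The category-then-dispatch flow above can be sketched with a keyword stand-in for the NLP engine: the utterance is mapped to a context category, and the category selects one of several context-specific applier interfaces. The keyword table and flow names are toy assumptions.

```python
# Assumed keyword-based stand-in for the NLP engine's context discovery.
CATEGORY_KEYWORDS = {"mortgage": "home_loan", "car": "auto_loan"}
APPLIER_INTERFACES = {
    "home_loan": lambda: "home-loan application flow",
    "auto_loan": lambda: "auto-loan application flow",
}

def determine_category(utterance: str) -> str:
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in utterance.lower():
            return category
    return "general"

def transition(utterance: str) -> str:
    # Select and automatically transition to a context-specific applier interface.
    category = determine_category(utterance)
    applier = APPLIER_INTERFACES.get(category, lambda: "general inquiry flow")
    return applier()

flow = transition("I want to buy a car")
```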
  • Patent number: 11741954
    Abstract: Provided are a method and an apparatus for providing an intelligent voice response at a voice assistant device. The method includes obtaining, by a voice assistant device, a voice input from a user, identifying non-speech input while obtaining the voice input, determining a correlation between the voice input and the non-speech input, and generating, based on the correlation, a response comprising an action related to the correlation or a suggestion related to the correlation.
    Type: Grant
    Filed: February 11, 2021
    Date of Patent: August 29, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Vinay Vasanth Patage, Sourabh Tiwari, Ravibhushan B. Tayshete
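The correlation step here can be sketched minimally: a non-speech input detected while the user speaks is correlated with the voice input, and the generated response carries an action or suggestion tied to that correlation. The correlation table is a toy assumption.

```python
# Assumed correlations between (voice input, concurrent non-speech input).
CORRELATED_SUGGESTIONS = {
    ("play music", "doorbell"): "Pause playback? Someone may be at the door.",
    ("set a timer", "sizzling"): "Should the timer be labelled 'cooking'?",
}

def respond(voice_input, non_speech=None):
    # If a non-speech input was identified while obtaining the voice input,
    # attach a suggestion related to the correlation; otherwise just act.
    if non_speech is not None:
        suggestion = CORRELATED_SUGGESTIONS.get((voice_input, non_speech))
        if suggestion:
            return {"action": voice_input, "suggestion": suggestion}
    return {"action": voice_input, "suggestion": None}

result = respond("play music", non_speech="doorbell")
```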
  • Patent number: 11741941
    Abstract: A discriminator trained on labeled samples of speech can compute probabilities of voice properties. A speech synthesis generative neural network that takes in text and continuous scale values of voice properties is trained to synthesize speech audio that the discriminator will infer as matching the values of the input voice properties. Voice parameters can include speaker voice parameters, accents, and attitudes, among others. Training can be done by transfer learning from an existing neural speech synthesis model or such a model can be trained with a loss function that considers speech and parameter values. A graphical user interface can allow voice designers for products to synthesize speech with a desired voice or generate a speech synthesis engine with frozen voice parameters. A vector of parameters can be used for comparison to previously registered voices in databases such as ones for trademark registration.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: August 29, 2023
    Assignee: SoundHound, Inc.
    Inventor: Andrew Richards
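The training signal described above can be sketched with toy models: a discriminator predicts voice-property values from synthesized audio, and the generator is penalized for the gap between the requested continuous property values and what the discriminator infers. Both "models" below are illustrative stand-ins.

```python
def discriminator(audio):
    # Stand-in discriminator: "infers" two voice properties from the
    # signal (its mean and its spread) instead of learned probabilities.
    mean = sum(audio) / len(audio)
    spread = max(audio) - min(audio)
    return [mean, spread]

def property_loss(requested_properties, audio):
    # Loss term pushing synthesized speech to match the continuous
    # input voice-property values, as judged by the discriminator.
    inferred = discriminator(audio)
    return sum((r - i) ** 2 for r, i in zip(requested_properties, inferred))

loss = property_loss([0.0, 2.0], [-1.0, 0.0, 1.0])
```

In the patent's framing this term would be combined with a speech-quality loss, so the generator learns both to sound natural and to realize the requested voice parameters.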
  • Patent number: 11455985
    Abstract: An information processing apparatus determines, on the basis of a speech of a user to be evaluated, a reference feature quantity representing a feature of the user's speech at normal times, acquires audio feature quantity data of a target speech to be evaluated made by the user, and evaluates the feature of the target speech on the basis of a comparison result between the audio feature quantity of the target speech and the reference feature quantity.
    Type: Grant
    Filed: February 9, 2017
    Date of Patent: September 27, 2022
    Assignee: SONY INTERACTIVE ENTERTAINMENT INC.
    Inventors: Shinichi Kariya, Shinichi Honda, Hiroyuki Segawa
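The comparison this abstract describes can be sketched directly: a reference feature quantity is estimated from the user's speech at normal times, and a target utterance is evaluated by its deviation from that reference. Reducing each utterance to a single scalar feature is an illustrative simplification.

```python
def reference_feature(normal_utterance_features):
    # Reference feature quantity: e.g. average pitch over everyday speech.
    return sum(normal_utterance_features) / len(normal_utterance_features)

def evaluate(target_feature, reference):
    # Evaluate the target speech by comparison against the reference.
    deviation = target_feature - reference
    if deviation > 0:
        return "raised"
    return "lowered" if deviation < 0 else "normal"

ref = reference_feature([200.0, 210.0, 190.0])   # Hz, speech at normal times
verdict = evaluate(230.0, ref)
```

Anchoring to a per-user baseline, rather than a population norm, is the point of the claim: the same 230 Hz utterance could read as "raised" for one speaker and "normal" for another.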
  • Patent number: 11430445
    Abstract: A system including one or more processors and one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform receiving a voice command from a user to perform a virtual action intended to apply to one item of two or more items in a cart of the user; generating a concept vector representing a concept in the voice command; transforming the respective item attributes for each of the two or more items into a respective feature vector; generating a respective candidate score for the each of the two or more items; identifying the one item to which the voice command is intended to apply; and executing an action with respect to the one item based on the voice command. Other embodiments are disclosed.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: August 30, 2022
    Assignee: WALMART APOLLO, LLC
    Inventors: Ghodratollah Aalipour Hafshejani, Phani Ram Sayapaneni
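The scoring step above can be sketched with an assumed similarity metric: the voice command becomes a concept vector, each cart item's attributes become a feature vector, and the item with the highest candidate score is taken as the command's target. Cosine similarity is a stand-in choice, not necessarily the patented scorer.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def identify_item(concept_vector, items):
    # items: item name -> feature vector transformed from its attributes.
    scores = {name: cosine(concept_vector, vec) for name, vec in items.items()}
    return max(scores, key=scores.get)

cart = {"whole milk": [1.0, 0.0, 0.2], "almond milk": [0.9, 0.4, 0.1]}
target = identify_item([1.0, 0.0, 0.25], cart)  # concept from "remove the milk"
```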
  • Patent number: 11380327
    Abstract: The present disclosure relates to the field of intelligent communications, and discloses a speech communication system and method with human-machine coordination. The system addresses the poor client experience caused by the abrupt differences a client perceives when a call is switched over under prior human-machine coordination, and the client time wasted as a result.
    Type: Grant
    Filed: April 2, 2021
    Date of Patent: July 5, 2022
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventor: Huapeng Sima
  • Patent number: 11361751
    Abstract: In a speech synthesis method, an emotion intensity feature vector is set for a target synthesis text, an acoustic feature vector corresponding to an emotion intensity is generated based on the emotion intensity feature vector by using an acoustic model, and a speech corresponding to the emotion intensity is synthesized based on the acoustic feature vector. The emotion intensity feature vector is continuously adjustable, and emotion speeches of different intensities can be generated based on values of different emotion intensity feature vectors, so that emotion types of a synthesized speech are more diversified. This application may be applied to a human-computer interaction process in the artificial intelligence (AI) field, to perform intelligent emotion speech synthesis.
    Type: Grant
    Filed: April 8, 2021
    Date of Patent: June 14, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Liqun Deng, Jiansheng Wei, Wenhua Sun
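The continuously adjustable intensity this abstract emphasizes can be sketched as scaling a base emotion embedding by a scalar: any in-between strength is representable, not just discrete categories. The basis vectors are toy assumptions.

```python
# Assumed base embeddings per emotion type.
EMOTION_BASIS = {"happy": [0.8, 0.1], "sad": [-0.6, 0.3]}

def emotion_intensity_vector(emotion, intensity):
    # intensity is a continuous scalar, so values like 0.37 yield
    # intermediate emotion strengths in the synthesized speech.
    return [intensity * x for x in EMOTION_BASIS[emotion]]

mild = emotion_intensity_vector("happy", 0.25)
strong = emotion_intensity_vector("happy", 1.0)
```

The acoustic model would consume this vector alongside the text to produce acoustic features, and a vocoder would synthesize the final speech, per the abstract's pipeline.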
  • Patent number: 8311830
    Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected from both an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved and stored on the server and then set up on the server to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
    Type: Grant
    Filed: December 6, 2011
    Date of Patent: November 13, 2012
    Assignee: Cepstral, LLC
    Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
  • Patent number: 8311831
    Abstract: A voice emphasizing device emphasizes in a speech a “strained rough voice” at a position where a speaker or user of the speech intends to generate emphasis or musical expression. Thereby, the voice emphasizing device can provide the position with emphasis of anger, excitement, tension, or an animated way of speaking, or musical expression of Enka (Japanese ballad), blues, rock, or the like. As a result, rich vocal expression can be achieved. The voice emphasizing device includes: an emphasis utterance section detection unit (12) detecting, from an input speech waveform, an emphasis section that is a time duration having a waveform intended by the speaker or user to be converted; and a voice emphasizing unit (13) increasing fluctuation of an amplitude envelope of the waveform in the detected emphasis section.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: November 13, 2012
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai, Masakatsu Hoshimi
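The emphasis operation in this abstract is concrete enough for a DSP sketch: within a detected emphasis section, the amplitude envelope is given extra fluctuation by amplitude-modulating the samples. The roughly 80 Hz modulation rate and the depth below are illustrative choices, not values from the patent.

```python
import math

def emphasize(samples, sample_rate, start, end, mod_hz=80.0, depth=0.5):
    # Increase fluctuation of the amplitude envelope inside the
    # emphasis section [start, end); leave the rest of the waveform intact.
    out = list(samples)
    for n in range(start, end):
        t = n / sample_rate
        out[n] = samples[n] * (1.0 + depth * math.sin(2 * math.pi * mod_hz * t))
    return out

# A 220 Hz tone at 8 kHz, "strained" only in its middle half-second.
tone = [math.sin(2 * math.pi * 220 * n / 8000) for n in range(8000)]
rough = emphasize(tone, 8000, start=2000, end=6000)
```

Low-frequency amplitude modulation of this kind is a common way to approximate the irregular envelope of a strained or "rough" voice.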
  • Patent number: 8086457
    Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected from both an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved and stored on the server and then set up on the server to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
    Type: Grant
    Filed: May 29, 2008
    Date of Patent: December 27, 2011
    Assignee: Cepstral, LLC
    Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
  • Publication number: 20090076822
    Abstract: A sequence is received of time domain digital audio samples representing sound (e.g., a sound generated by a human voice or a musical instrument). The time domain digital audio samples are processed to derive a corresponding sequence of audio pulses in the time domain. Each of the audio pulses is associated with a characteristic frequency. Frequency domain information is derived about each of at least some of the audio pulses. The sound represented by the time domain digital audio samples is transformed by processing the audio pulses using the frequency domain information.
    Type: Application
    Filed: September 13, 2007
    Publication date: March 19, 2009
    Inventor: Jordi Bonada Sanjaume
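The pulse-based flow above can be sketched roughly: the time-domain samples are cut into pitch-synchronous pulses, frequency-domain information is derived per pulse, and a characteristic frequency is attached to each. The fixed pulse length and naive DFT below are simplifying assumptions.

```python
import cmath
import math

def split_into_pulses(samples, period):
    # Cut the time-domain samples into fixed-length pulses (one pitch period each).
    return [samples[i:i + period]
            for i in range(0, len(samples) - period + 1, period)]

def dft(pulse):
    # Naive discrete Fourier transform: frequency-domain info for one pulse.
    n = len(pulse)
    return [sum(x * cmath.exp(-2j * math.pi * k * m / n)
                for m, x in enumerate(pulse)) for k in range(n)]

def characteristic_frequency(pulse, sample_rate):
    spectrum = dft(pulse)
    # Strongest bin below Nyquist, ignoring DC.
    k = max(range(1, len(pulse) // 2 + 1), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / len(pulse)

sr, period = 8000, 80                      # 100 Hz pitch -> 80-sample pulses
wave = [math.sin(2 * math.pi * 100 * n / sr) for n in range(800)]
pulses = split_into_pulses(wave, period)
f0 = characteristic_frequency(pulses[0], sr)
```

Transforming the sound would then mean processing each pulse's spectrum (e.g. shifting or reshaping it) and overlap-adding the modified pulses back into a waveform.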
  • Publication number: 20090006097
    Abstract: Pronunciation correction for text-to-speech (TTS) systems and speech recognition (SR) systems between different languages is provided. If a word requiring pronunciation by a target language TTS or SR is from a same language as the target language, but is not found in a lexicon of words from the target language, a letter-to-speech (LTS) rules set of the target language is used to generate a letter-to-speech output for the word for use by the TTS or SR configured according to the target language. If the word is from a different language as the target language, phonemes comprising the word according to its native language are mapped to phonemes of the target language. The phoneme mapping is used by the TTS or SR configured according to the target language for generating or recognizing an audible form of the word according to the target language.
    Type: Application
    Filed: June 29, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Cameron Ali Etezadi, Timothy David Sharpe
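The three-way fallback in this abstract maps cleanly to a sketch: a word found in the target-language lexicon uses its stored pronunciation; a same-language unknown word falls back to letter-to-speech rules; a foreign word has its native phonemes mapped to target-language phonemes. All tables here are toy data, not real phoneme inventories.

```python
# Toy target-language lexicon and native-to-target phoneme mapping.
TARGET_LEXICON = {"hello": ["HH", "AH", "L", "OW"]}
PHONEME_MAP = {"R_uvular": "R", "Y_front": "UW"}

def letter_to_speech(word):
    # Crude stand-in for the target language's LTS rules set.
    return [c.upper() for c in word]

def pronounce(word, native_phonemes=None):
    if word in TARGET_LEXICON:
        return TARGET_LEXICON[word]
    if native_phonemes is None:
        return letter_to_speech(word)        # same language, not in lexicon
    return [PHONEME_MAP.get(p, p) for p in native_phonemes]  # foreign word

p1 = pronounce("hello")
p2 = pronounce("zyx")
p3 = pronounce("rue", native_phonemes=["R_uvular", "Y_front"])
```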
  • Publication number: 20080235025
    Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme by using data representing a regular or statistical phoneme length in an utterance of a human with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
    Type: Application
    Filed: February 11, 2008
    Publication date: September 25, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Kentaro Murase, Nobuyuki Katae
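The boundary-reset step above can be sketched as pulling measured ("real voice") phoneme lengths toward regular/statistical lengths for the phonemes being modified. The interpolation weight is an assumed knob, not a parameter from the patent.

```python
# Assumed regular/statistical phoneme lengths in milliseconds.
REGULAR_LENGTH_MS = {"a": 90.0, "k": 60.0}

def modify_lengths(real_lengths, weight=0.5):
    # real_lengths: list of (phoneme, measured length in ms) extracted
    # from a human utterance. Blend each toward its regular length.
    out = []
    for phoneme, measured in real_lengths:
        regular = REGULAR_LENGTH_MS.get(phoneme, measured)
        out.append((phoneme, (1 - weight) * measured + weight * regular))
    return out

fixed = modify_lengths([("k", 40.0), ("a", 130.0)])
```

Resetting the phoneme boundaries would then follow from the cumulative sums of the blended lengths.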