Synthesis Patents (Class 704/258)
-
Patent number: 12233338
Abstract: This specification describes a computer-implemented method of training a machine-learned speech audio generation system for use in video games. The training comprises: receiving one or more training examples. Each training example comprises: (i) ground-truth acoustic features for speech audio, (ii) speech content data representing speech content of the speech audio, and (iii) a ground-truth speaker identifier for a speaker of the speech audio.
Type: Grant
Filed: November 16, 2021
Date of Patent: February 25, 2025
Assignee: Electronic Arts Inc.
Inventors: Ping Zhong, Zahra Shakeri, Siddharth Gururani, Kilol Gupta, Shahab Raji
-
Patent number: 12230244
Abstract: Systems and methods are described herein for an application and graphical user interface ("GUI") for customized storytelling. In an example, a user can create profiles for a listener user and a reader user. The listener user profile can include information about the listener user. The reader user profile can include a voice model of the reader user's voice. The GUI can allow the user to provide a brief description of a story. The application can send the story description and listener user profile to a server that uses an artificial intelligence engine to generate a customized story for the listener user. The application can apply the reader user voice model to the story and play audio of the reader user's voice reading the story.
Type: Grant
Filed: December 28, 2023
Date of Patent: February 18, 2025
Inventor: Todd Searcy
-
Patent number: 12217742
Abstract: Embodiments are disclosed for generating full-band audio from narrowband audio using a GAN-based audio super resolution model. A method of generating full-band audio may include receiving narrow-band input audio data, upsampling the narrow-band input audio data to generate upsampled audio data, providing the upsampled audio data to an audio super resolution model, the audio super resolution model trained to perform bandwidth expansion from narrow-band to wide-band, and returning wide-band output audio data corresponding to the narrow-band input audio data.
Type: Grant
Filed: November 23, 2021
Date of Patent: February 4, 2025
Assignees: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Jiaqi Su, Adam Finkelstein
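The first stage the abstract describes — upsampling the narrow-band signal before handing it to the learned super-resolution model — can be sketched with naive linear interpolation. This is an illustrative stand-in, not the patent's actual upsampler; a real system would follow it with the GAN model that refines the missing high-frequency content.

```python
def upsample_linear(samples, factor):
    """Upsample a narrow-band signal by an integer factor using linear
    interpolation -- the kind of crude upsampling a learned bandwidth-
    expansion model would then refine into plausible wide-band audio."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        for k in range(factor):
            t = k / factor
            out.append(a + (b - a) * t)  # interpolate between neighbours
    out.append(samples[-1])  # keep the final sample
    return out
```

For an input of n samples and factor f this yields (n − 1)·f + 1 samples, matching the target rate before the model is applied.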
-
Patent number: 12198673
Abstract: The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wavetables, and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks. Finally, these wavetables are used to initialize another machine learning model so as to help reduce the computational complexity of the audio synthesis obtained as output of that model.
Type: Grant
Filed: November 12, 2021
Date of Patent: January 14, 2025
Assignee: LEMON INC.
Inventors: Lamtharn Hantrakul, Siyuan Shan, Jitong Chen, Matthew David Avent, David Trevelyan
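The length-L wavetables this abstract describes are read back by a classic wavetable oscillator: a phase pointer steps through the table at a rate set by the desired pitch, interpolating between slots. The sketch below uses a plain sine table; in the patent the tables themselves are learned parameters.

```python
import math

def make_wavetable(length):
    # One cycle of a sine as a placeholder; the patent's tables are learned.
    return [math.sin(2 * math.pi * i / length) for i in range(length)]

def synthesize(table, freq, sample_rate, num_samples):
    """Classic wavetable lookup: advance a fractional read pointer through
    the length-L table at freq * L / sample_rate slots per output sample,
    linearly interpolating between adjacent table entries."""
    L = len(table)
    phase = 0.0
    step = freq * L / sample_rate
    out = []
    for _ in range(num_samples):
        i0 = int(phase) % L
        i1 = (i0 + 1) % L
        frac = phase - int(phase)
        out.append(table[i0] * (1 - frac) + table[i1] * frac)
        phase = (phase + step) % L  # wrap around the single stored cycle
    return out
```

Because lookup replaces per-sample oscillator math, a dictionary of such tables can cheaply initialize a downstream synthesis model, which is the computational saving the abstract points to.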
-
Patent number: 12198676
Abstract: This disclosure describes a system that converts an audio object (e.g., an audio book, a podcast, a videoconference meeting) to text with SSML tags so that any future text-to-speech conversion enables speech synthesis to sound more human-like. The system analyzes the audio object to identify speech output characteristics for different tokens. Variations in speech output characteristics can distinguish between an utterance spoken by one character and an utterance spoken by another character. The system assigns the tokens to the characters and compares a speech output characteristic for a token to a baseline speech output characteristic associated with an identified character. Next, the system determines an amount of deviation between the speech output characteristic for the token and the baseline speech output characteristic. The system uses this deviation to determine a relative speech output characteristic value, which is to be included in an SSML tag for a token.
Type: Grant
Filed: March 31, 2022
Date of Patent: January 14, 2025
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Mikayel Mirzoyan, Vidush Vishwanath
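The deviation-to-SSML step can be sketched for one concrete speech output characteristic, speaking rate: compare a token's measured rate against the character's baseline and emit the deviation as a relative value in an SSML `prosody` tag. The function and its units are illustrative, not the patent's implementation.

```python
def prosody_ssml(token, rate_wpm, baseline_wpm):
    """Turn a token's measured speaking rate into a *relative* SSML
    prosody attribute by computing its deviation from the character's
    baseline rate, as the abstract describes."""
    deviation = (rate_wpm - baseline_wpm) / baseline_wpm
    pct = round(deviation * 100)  # relative value, in percent
    sign = "+" if pct >= 0 else ""  # SSML relative rates carry a sign
    return f'<prosody rate="{sign}{pct}%">{token}</prosody>'
```

A token spoken at 180 wpm against a 150 wpm baseline deviates by +20%, so a later text-to-speech pass reproduces the emphasis rather than reading the token flatly.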
-
Patent number: 12190860
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
Type: Grant
Filed: November 21, 2023
Date of Patent: January 7, 2025
Assignee: Google LLC
Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
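The claimed "subsystem" is essentially a thin wrapper: it accepts raw characters, encodes them for the sequence-to-sequence network, and returns the network's spectrogram. The sketch below shows only that wiring; the `toy_model` callable is a stand-in for the actual recurrent network, which the abstract does not specify beyond its input/output contract.

```python
def chars_to_ids(text, vocab):
    """Map a character sequence to integer ids -- the input format a
    sequence-to-sequence spectrogram model typically expects."""
    return [vocab[c] for c in text if c in vocab]

class SpectrogramSynthesizer:
    """The subsystem from the claim: receives characters, feeds them to
    the seq2seq model, and returns the predicted spectrogram frames."""
    def __init__(self, vocab, model):
        self.vocab = vocab
        self.model = model  # any callable: list[int] -> list of frames

    def __call__(self, text):
        ids = chars_to_ids(text, self.vocab)
        return self.model(ids)

# Toy stand-in "model": one 3-bin frame per input character.
toy_model = lambda ids: [[float(i), 0.0, 0.0] for i in ids]
```

In a real system the spectrogram would then be inverted to a waveform by a vocoder, a stage outside this claim.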
-
Patent number: 12190014
Abstract: A method for headphone playback of target sound. Some of the bins of a sound file are accessed, wherein the sound file has several bins, each bin storing a number of audio sections. Audio sections are selected from the accessed bins, and mixed while cross fading to form a target sound sequence. A headphone speaker is driven with the target sound sequence. Other aspects are also described and claimed.
Type: Grant
Filed: June 4, 2021
Date of Patent: January 7, 2025
Assignee: Apple Inc.
Inventors: Cecilia Casarini, Ian M. Fisch, Jakub Mazur, Mitchell R. Lerner, Pablo David Brazell Ruiz, Stephen W. Ryner, Jr., Tyrone T. Chen
-
Patent number: 12159631
Abstract: Aspects of the disclosure relate to using machine learning to simulate an interactive voice response system. A computing platform may receive user interaction information corresponding to interactions between a user and enterprise computing devices. Based on the user interaction information, the computing platform may identify predicted intents for the user, and may generate hotkey information based on the predicted intents. The computing platform may send the hotkey information and commands directing the mobile device to output the hotkey information. The computing platform may receive hotkey input information from the mobile device. Based on the hotkey input information, the computing platform may generate a hotkey response message. The computing platform may send, to the mobile device, the hotkey response message and commands directing the mobile device to convert the hotkey response message to an audio output and to output the audio output.
Type: Grant
Filed: September 21, 2023
Date of Patent: December 3, 2024
Assignee: Bank of America Corporation
Inventors: Srinivas Dundigalla, Pavan Chayanam, Saurabh Mehta
-
Patent number: 12154543
Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
Type: Grant
Filed: October 2, 2023
Date of Patent: November 26, 2024
Assignee: Google LLC
Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson
-
Patent number: 12142264
Abstract: A method and system of reducing noise associated with telephony-based activities occurring in shared workspaces is provided. An end-user may lower their own voice to a whisper or other less audible or intelligible utterances and submit such low-quality audio signals to an automated speech recognition system via a microphone. The words identified by the automated speech recognition system are provided to a speech synthesizer, and a synthesized audio signal is created artificially that carries the content of the original human-produced utterances. The synthesized audio signal is significantly more audible and intelligible than the original audio signal. The method allows customer support agents to speak at barely audible levels yet be heard clearly by their customers.
Type: Grant
Filed: August 25, 2022
Date of Patent: November 12, 2024
Assignee: United Services Automobile Association (USAA)
Inventors: Justin Dax Haslam, Donnette L. Moncrief Brown, Eric David Schroeder, Ravi Durairaj, Deborah Janette Schulz
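The recognize-then-resynthesize pipeline in this abstract can be sketched as a simple router: when the captured audio is at whisper level, pass it through speech recognition and replace it with synthesized speech; otherwise send the original signal on unchanged. The RMS threshold and the `asr`/`tts` callables are illustrative stand-ins for real engines.

```python
def rms(samples):
    """Root-mean-square level of a sample buffer, a crude loudness proxy."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def route_speech(samples, asr, tts, whisper_rms=0.05):
    """If the captured audio is below a whisper-level threshold, recognize
    its words and return synthesized speech carrying the same content;
    otherwise pass the original audio through. Threshold is illustrative."""
    if rms(samples) < whisper_rms:
        text = asr(samples)   # automated speech recognition stand-in
        return tts(text)      # speech synthesizer stand-in
    return samples
```

The synthesized branch is what lets an agent whisper at their desk while the far end hears full-level, intelligible speech.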
-
Patent number: 12099532
Abstract: Embodiments of systems and methods for providing search term suggestions in a search system are disclosed. Embodiments as disclosed may utilize the sound of an original search term to locate candidate terms based on the sound of the candidate terms and the frequency of appearance of the candidate terms in the corpus of documents being searched. A set of search term suggestions can then be determined from the candidate terms and returned to the user as search term suggestions for the original search term.
Type: Grant
Filed: January 11, 2023
Date of Patent: September 24, 2024
Assignee: OPEN TEXT SA ULC
Inventors: Jingjing Liu, Dingmeng Xue, Minhong Zhou
-
Patent number: 12094447
Abstract: A method and apparatus for generating speech through neural text-to-speech (TTS) synthesis. A text input may be obtained (1310). Phoneme or character level text information may be generated based on the text input (1320). Context-sensitive text information may be generated based on the text input (1330). A text feature may be generated based on the phoneme or character level text information and the context-sensitive text information (1340). A speech waveform corresponding to the text input may be generated based at least on the text feature (1350).
Type: Grant
Filed: December 13, 2018
Date of Patent: September 17, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Huaiping Ming, Lei He
-
Patent number: 12087275
Abstract: Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.
Type: Grant
Filed: February 16, 2022
Date of Patent: September 10, 2024
Assignee: GOOGLE LLC
Inventors: Daisy Antonia Stanton, Sean Matthew Shannon, Soroosh Mariooryad, Russell John-Wyatt Skerry-Ryan, Eric Dean Battenberg, Thomas Edward Bagby, David Teh-Hwa Kao
-
Patent number: 12087268
Abstract: Systems, devices, and methods are provided for training and/or inferencing using machine-learning models. In at least one embodiment, a user selects a source media (e.g., video or audio file) and a target identity. A content embedding may be extracted from the source media, and an identity embedding may be obtained for the target identity. The content embedding of the source media and the identity embedding of the target identity may be provided to a transfer model that generates synthesized media. For example, a user may select a song that is sung by a first artist and then select a second artist as the target identity to produce a cover of the song in the voice of the second artist.
Type: Grant
Filed: December 3, 2021
Date of Patent: September 10, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Wenbin Ouyang, Naveen Sudhakaran Nair
-
Patent number: 12080261
Abstract: Systems, devices, media, and methods are presented for playing audio sounds, such as music, on a portable electronic device using a digital color image of a note matrix on a map. A computer vision engine, in an example implementation, includes a mapping module, a color detection module, and a music playback module. The camera captures a color image of the map, including a marker and a note matrix. Based on the color image, the computer vision engine detects a token color value associated with each field. Each token color value is associated with a sound sample from a specific musical instrument. A global state map is stored in memory, including the token color value and location of each field in the note matrix. The music playback module, for each column, in order, plays the notes associated with one or more of the rows, using the corresponding sound sample, according to the global state map.
Type: Grant
Filed: May 2, 2023
Date of Patent: September 3, 2024
Assignee: Snap Inc.
Inventors: Ilteris Canberk, Donald Giovannini, Sana Park
-
Patent number: 12067968
Abstract: In some implementations, a system may receive an audio stream associated with a call between a user and an agent. The system may process, by the device and using a speech alteration model, speech from a first channel of the audio stream to alter the speech from having a first speech characteristic to having a second speech characteristic, wherein the speech alteration model is trained based on reference audio data associated with the first speech characteristic and the second speech characteristic and based on reference speech data associated with the first speech characteristic and the second speech characteristic. The system may extract the speech from the first channel that has the first speech characteristic. The system may provide, within a second channel of the audio stream, altered speech that corresponds to the speech and that has the first speech characteristic.
Type: Grant
Filed: August 30, 2021
Date of Patent: August 20, 2024
Assignee: Capital One Services, LLC
Inventors: Tasneem Adamjee, Lin Ni Lisa Cheng, Tyler Maiman, Yichen Zhang
-
Patent number: 12020682
Abstract: A method for parametric resynthesis (PR) producing an audible signal. A degraded audio signal is received which includes a distorted target audio signal. A prediction model predicts parameters of the audible signal from the degraded signal. The prediction model was trained to minimize a loss function between the target audio signal and the predicted audible signal. The predicted parameters are provided to a waveform generator which synthesizes the audible signal.
Type: Grant
Filed: March 20, 2020
Date of Patent: June 25, 2024
Assignee: Research Foundation of the City University of New York
Inventors: Michael Mandel, Soumi Maiti
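The training objective this abstract mentions can be illustrated with a mean-squared error between predicted and clean acoustic parameter frames. The patent does not specify which loss is used, so this is a representative example only; frames here are plain lists of floats standing in for per-frame acoustic parameters.

```python
def parameter_loss(predicted, target):
    """Mean-squared error between predicted and clean (target) acoustic
    parameter frames -- a representative loss a parametric-resynthesis
    prediction model could be trained to minimize."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count
```

During training, gradients of this loss with respect to the prediction model's weights push the predicted parameters toward those of the undistorted target, so the downstream waveform generator receives clean inputs even when the recorded audio is degraded.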
-
Patent number: 12014723
Abstract: An information processing method is realized by a computer and includes generating a first characteristic transition which is a transition of acoustic characteristics, in accordance with an instruction from a user, generating a second characteristic transition which is a transition of acoustic characteristics of voice that is pronounced in a specific pronunciation style selected from a plurality of pronunciation styles, and generating a combined characteristic transition which is a transition of the acoustic characteristics of synthesized voice by combining the first characteristic transition and the second characteristic transition.
Type: Grant
Filed: December 11, 2020
Date of Patent: June 18, 2024
Assignee: YAMAHA CORPORATION
Inventors: Makoto Tachibana, Motoki Ogasawara
-
Patent number: 12008989
Abstract: An electronic apparatus and a processing system are disclosed. In one embodiment, an electronic apparatus comprises a communication unit and at least one processor. The communication unit is configured to acquire information related to an other-party apparatus. The at least one processor is configured to receive input of a voice signal output from a first voice input unit. The electronic apparatus and the other-party apparatus are capable of communicating with each other with voice/message converted communication in which first voice input to the first voice input unit is converted into a first message and the first message is displayed on the other-party apparatus. The at least one processor determines execution of the voice/message converted communication, based on the information.
Type: Grant
Filed: October 16, 2020
Date of Patent: June 11, 2024
Assignee: KYOCERA Corporation
Inventors: Akihito Hatanaka, Tomoki Iwaizumi, Youji Hamada, Hisae Honma, Kousuke Nagase, Tomohiro Sudou
-
Patent number: 11948550
Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent.
Type: Grant
Filed: August 27, 2021
Date of Patent: April 2, 2024
Assignee: SANAS.AI INC.
Inventors: Maxim Serebryakov, Shawn Zhang
-
Patent number: 11943153
Abstract: A facility for conveying a first side of a voice call from a first participant to a second participant is described. Over the duration of the voice call, the facility receives the first side of the call. The facility seeks to forward the received first side of the voice call to a downstream node on a path to the second participant. The facility records the received first side of the call for at least part of the call. The facility identifies a just-ended portion of the voice call for which forwarding of the received first side of the voice call was unsuccessful. In response, the facility transmits to the downstream node the recorded first side of the voice call that coincides with the identified portion of the voice call.
Type: Grant
Filed: June 28, 2021
Date of Patent: March 26, 2024
Assignee: DISH Wireless L.L.C.
Inventors: Kevin Yao, Prashant Raghuvanshi
-
Patent number: 11875439
Abstract: Embodiments described herein relate to an augmented expression system to generate and cause display of a specially configured interface to present an augmented reality perspective. The augmented expression system receives image and video data of a user and tracks facial landmarks of the user based on the image and video data, in real-time to generate and present a 3-dimensional (3D) bitmoji of the user.
Type: Grant
Filed: April 15, 2020
Date of Patent: January 16, 2024
Assignee: Snap Inc.
Inventors: Chen Cao, Yang Gao, Zehao Xue
-
Patent number: 11869529
Abstract: It is intended to accurately convert a speech rhythm. A model storage unit (10) stores a speech rhythm conversion model which is a neural network that receives, as an input thereto, a first feature value vector including information related to a speech rhythm of at least a phoneme extracted from a first speech signal resulting from a speech uttered by a speaker in a first group, converts the speech rhythm of the first speech signal to a speech rhythm of a speaker in a second group, and outputs the speech rhythm of the speaker in the second group. A feature value extraction unit (11) extracts, from the input speech signal resulting from the speech uttered by the speaker in the first group, information related to a vocal tract spectrum and information related to the speech rhythm.
Type: Grant
Filed: June 20, 2019
Date of Patent: January 9, 2024
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Sadao Hiroya
-
Patent number: 11862142
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
Type: Grant
Filed: August 2, 2021
Date of Patent: January 2, 2024
Assignee: Google LLC
Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
-
Patent number: 11824819
Abstract: A method, a computer program product, and a computer system generate an accurate mental model of an automated agent. The method includes receiving an input from a user device associated with a user during a communication session between the user and the automated agent. The method includes determining a response to the input. The method includes determining a confidence score of the response relative to a confidence threshold. The method includes determining an assertiveness feature associated with the response, the assertiveness feature comprising an expression of the automated agent based on the confidence score. The method includes transmitting the response and the assertiveness feature to the user device, the expression configured to update anthropomorphic characteristics of a graphical representation of the automated agent shown on a graphical user interface of the communication session displayed on a display device of the user device.
Type: Grant
Filed: January 26, 2022
Date of Patent: November 21, 2023
Assignee: International Business Machines Corporation
Inventors: Qian Pan, James Johnson, Zahra Ashktorab, Dakuo Wang
-
Patent number: 11817079
Abstract: The present disclosure provides a GAN-based speech synthesis model, a training method, and a speech synthesis method. According to the speech synthesis method, to-be-converted text is obtained and is converted into a text phoneme, the text phoneme is further digitized to obtain text data, and the text data is converted into a text vector to be input into a speech synthesis model. In this way, target audio corresponding to the to-be-converted text is obtained. When a target Mel-frequency spectrum is generated by using a trained generator, accuracy of the generated target Mel-frequency spectrum can reach that of a standard Mel-frequency spectrum. Through constant adversary between the generator and a discriminator and the trainings thereof, acoustic losses of the target Mel-frequency spectrum are reduced, and acoustic losses of the target audio generated based on the target Mel-frequency spectrum are also reduced, thereby improving accuracy of audio synthesized from speech.
Type: Grant
Filed: June 16, 2023
Date of Patent: November 14, 2023
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Zhiqiang Mao
-
Patent number: 11810565
Abstract: Aspects of the disclosure relate to using machine learning to simulate an interactive voice response system. A computing platform may receive user interaction information corresponding to interactions between a user and enterprise computing devices. Based on the user interaction information, the computing platform may identify predicted intents for the user, and may generate hotkey information based on the predicted intents. The computing platform may send the hotkey information and commands directing the mobile device to output the hotkey information. The computing platform may receive hotkey input information from the mobile device. Based on the hotkey input information, the computing platform may generate a hotkey response message. The computing platform may send, to the mobile device, the hotkey response message and commands directing the mobile device to convert the hotkey response message to an audio output and to output the audio output.
Type: Grant
Filed: July 27, 2022
Date of Patent: November 7, 2023
Assignee: Bank of America Corporation
Inventors: Srinivas Dundigalla, Pavan Chayanam, Saurabh Mehta
-
Patent number: 11798526
Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
Type: Grant
Filed: March 1, 2022
Date of Patent: October 24, 2023
Assignee: Google LLC
Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson
-
Patent number: 11783813
Abstract: A hearing aid system presents a hearing impaired user with customized enhanced intelligibility speech sound in a preferred language while maintaining the voice identity of speaker. The system includes a neural network model trained with a set of source speech data representing sampling from a speech population relevant to the user. The model is also custom trained with a set of parallel or non-parallel alternative articulations, collected during an interactive session with user or algorithmically generated based on the hearing profile of the user or category of users with common linguistic and hearing profiles.
Type: Grant
Filed: July 25, 2022
Date of Patent: October 10, 2023
Inventor: Abbas Rafii
-
Patent number: 11762698
Abstract: A hardware decompression acceleration engine including: an input buffer for receiving to-be-decompressed data from a software layer of a host computer; a decompression processing unit coupled to the input buffer for decompressing the to-be-decompressed data, the decompression processing unit further receiving first and second flags from the software layer of the host computer, wherein the first flag is indicative of a location of the to-be-decompressed data in a to-be-decompressed data block and the second flag is indicative of a presence of an intermediate state; and an output buffer for storing decompressed data from the decompression processing unit.
Type: Grant
Filed: June 18, 2021
Date of Patent: September 19, 2023
Assignee: SCALEFLUX, INC.
Inventors: Linqiang Ouyang, Mark Vernon, Dan Liu, Jinchao Lyu, Yang Liu
-
Patent number: 11749291
Abstract: An audio signal processing system and method is executed by an audio signal processing device to decode an audio packet to obtain decoded audio and determine an occurrence of a discontinuity occurring with a sudden increase of an amplitude of the decoded audio obtained by decoding the audio packet. The audio packet may be received correctly after an occurrence of a packet loss, and corrected to improve subjective quality of the decoded audio, wherein correcting the discontinuity of the decoded audio comprises causing distances between ISF/LSF parameters corresponding to a frame in which a packet loss has occurred to be equal.
Type: Grant
Filed: January 13, 2022
Date of Patent: September 5, 2023
Assignee: NTT DOCOMO, INC.
Inventors: Kimitaka Tsutsumi, Kei Kikuiri, Atsushi Yamaguchi
-
Patent number: 11705107
Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
Type: Grant
Filed: October 1, 2020
Date of Patent: July 18, 2023
Assignee: Baidu USA LLC
Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
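The data flow through the building blocks this abstract enumerates can be sketched as a composition of four inference stages (the segmentation model is used at training time to align data, so it does not appear at inference). All four stages below are stand-in callables, not real networks, and their interfaces are illustrative assumptions.

```python
def tts_pipeline(text, g2p, duration_model, f0_model, vocoder):
    """Inference-time wiring of the abstract's building blocks:
    text -> phonemes (grapheme-to-phoneme) -> per-phoneme durations
    -> fundamental frequency contour -> synthesized audio frames."""
    phonemes = g2p(text)                    # grapheme-to-phoneme model
    durations = duration_model(phonemes)    # phoneme duration prediction
    f0 = f0_model(phonemes, durations)      # fundamental frequency prediction
    return vocoder(phonemes, durations, f0) # audio synthesis (WaveNet-like)
```

Keeping each stage behind a plain callable interface is what makes the all-neural design flexible: any one model can be retrained or swapped without touching the others.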
-
Patent number: 11670267
Abstract: Systems, devices, media, and methods are presented for playing audio sounds, such as music, on a portable electronic device using a digital color image of a note matrix on a map. A computer vision engine, in an example implementation, includes a mapping module, a color detection module, and a music playback module. The camera captures a color image of the map, including a marker and a note matrix. Based on the color image, the computer vision engine detects a token color value associated with each field. Each token color value is associated with a sound sample from a specific musical instrument. A global state map is stored in memory, including the token color value and location of each field in the note matrix. The music playback module, for each column, in order, plays the notes associated with one or more of the rows, using the corresponding sound sample, according to the global state map.
Type: Grant
Filed: August 9, 2021
Date of Patent: June 6, 2023
Assignee: Snap Inc.
Inventors: Ilteris Canberk, Donald Giovannini, Sana Park
-
Patent number: 11631419
Abstract: A recording device records a video and an imaging time, and a voice. Based on the voice, a sound parameter calculator calculates a sound parameter for specifying magnitude of the voice in a monitoring area at the imaging time for each of pixels and for each of certain times. A sound parameter storage unit stores the sound parameter. A sound parameter display controller superimposes a voice heat map on a captured image of the monitoring area and displays the superimposed image on a monitor. At this time, the sound parameter display controller displays the voice heat map based on a cumulative time value of magnitude of the voice, according to designation of a time range.
Type: Grant
Filed: February 12, 2021
Date of Patent: April 18, 2023
Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
Inventors: Ryota Fujii, Hiroyuki Matsumoto, Hiroaki Hayashi, Kazunori Hayashi
-
Patent number: 11606629Abstract: An information processing apparatus includes an acquisition unit that acquires voice data and image data, respectively, a display control unit that performs control to display the image data acquired by the acquisition unit in synchronization with the voice data, a reception unit that receives a display element to be added for display to a specific character in the image data displayed by the display control unit, and a setting unit that sets a playback period in which the specific character in the voice data is played back, as a display period of the display element received by the reception unit in the image data.Type: GrantFiled: July 19, 2019Date of Patent: March 14, 2023Assignee: FUJIFILM Business Innovation Corp.Inventor: Mai Suzuki
-
Patent number: 11587541Abstract: The present disclosure provides method and apparatus for providing personalized songs in automated chatting. A message may be received in a chat flow. Personalized lyrics of a user may be generated based at least on a personal language model of the user in response to the message. A personalized song may be generated based on the personalized lyrics. The personalized song may be provided in the chat flow.Type: GrantFiled: June 21, 2017Date of Patent: February 21, 2023Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Xianchao Wu, Kazushige Ito, Kazuna Tsuboi
-
Patent number: 11586654Abstract: Embodiments of systems and methods for providing search term suggestions in a search system are disclosed. Embodiments as disclosed may utilize the sound of an original search term to locate candidate terms based on the sound of the candidate terms and the frequency of appearance of the candidate terms in the corpus of documents being searched. A set of search term suggestions can then be determined from the candidate terms and returned to the user as search term suggestions for the original search term.Type: GrantFiled: August 30, 2018Date of Patent: February 21, 2023Assignee: OPEN TEXT SA ULCInventors: Jingjing Liu, Dingmeng Xue, Minhong Zhou
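The abstract combines two signals: candidate terms that sound like the query, ranked by how often they appear in the searched corpus. The patent does not name a phonetic algorithm; the classic choice for a sound-alike key is Soundex, so the sketch below uses it as an assumed stand-in, with an invented frequency table:

```python
# Sound-based suggestion sketch: index terms by a Soundex key, then rank
# sound-alike candidates by corpus frequency. Soundex is an assumption;
# the patent only says candidates are located "based on the sound".

def soundex(word):
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    key = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "hw":  # h/w are transparent: they keep the prior code
            prev = code
    return (key + "000")[:4]

def suggest(term, corpus_freq, k=3):
    # Candidates share the query's phonetic key; most frequent first.
    target = soundex(term)
    matches = [w for w in corpus_freq if w != term and soundex(w) == target]
    return sorted(matches, key=lambda w: -corpus_freq[w])[:k]

freq = {"robert": 40, "rupert": 10, "roberto": 25, "richard": 30}
suggestions = suggest("robert", freq)
```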
-
Patent number: 11545136Abstract: A method for removing private data from an acoustic model includes capturing speech from a large population of users, creating a text-to-speech voice from at least a portion of the large population of users, discarding speech data from a database of speech, creating text-to-speech waveforms from the text-to-speech voice and the new database of speech with the discarded speech data and generating an automatic speech recognition model using the text-to-speech waveforms.Type: GrantFiled: October 21, 2019Date of Patent: January 3, 2023Assignee: NUANCE COMMUNICATIONS, INC.Inventors: Vincent Laurent Pollet, Carl Benjamin Quillen, Philip Charles Woodland, William F. Ganong, III, Steven Hoskins
-
Patent number: 11487821Abstract: In some embodiments, methods and systems are provided for processing information requests of workers at a retail facility and retrieving information associated with the retail facility based on the information requests. An electronic device permits a worker at the retail facility to input an information request in association with at least one worker at the retail facility or at least one product at the retail facility. A computing device receives, from the electronic device, electronic data representative of a scope of the information request, analyzes this electronic data to determine the scope of the information request, obtains relevant information from one or more databases, and transmits the obtained information to the electronic device, which in turn outputs the information to the worker.Type: GrantFiled: April 30, 2020Date of Patent: November 1, 2022Assignee: Walmart Apollo, LLCInventors: William Craig Robinson, Jr., Dong T. Nguyen, Mekhala M. Vithala, Makeshwaran Sampath, Spencer S. Seeger, Bahula Bosetti, Praneeth Gubbala, Songshan Li, Santosh Kumar Kurashetty, Srihari Attuluri, Venkata Maguluri, Lindsay S. Leftwich, George E. Loring
-
Patent number: 11490229Abstract: Various embodiments generally relate to systems and methods for creation of voice memos while an electronic device is in a driving mode. In some embodiments, a triggering event can be used to indicate that the electronic device is within a car or about to be within a car and that text communications should be translated (e.g., via an application or a conversion platform) into a voice memo that can be played via a speaker. These triggering events can include a manual selection or an automatic selection based on a set of transition criteria (e.g., electronic device moving above a certain speed, following a roadway, approaching a location in a map of a marked car, etc.).Type: GrantFiled: June 5, 2020Date of Patent: November 1, 2022Assignee: T-Mobile USA, Inc.Inventor: Niraj Nayak
-
Patent number: 11482207Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in a closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.Type: GrantFiled: December 21, 2020Date of Patent: October 25, 2022Assignee: Baidu USA LLCInventors: Wei Ping, Kainan Peng, Jitong Chen
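The "closed form" the abstract relies on is the standard result for two univariate Gaussians, KL(p‖q) = log(σq/σp) + (σp² + (μp − μq)²)/(2σq²) − ½, which makes the distillation loss cheap to evaluate per sample. A sketch of just that term (the patent's regularized variant adds a component not shown here):

```python
import math

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    # Closed-form KL(p || q) between two univariate Gaussians:
    # log(sq/sp) + (sp^2 + (mp - mq)^2) / (2 sq^2) - 1/2
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)
```

Identical distributions give zero divergence, and shifting one mean by a standard deviation gives 1/2, which is a quick sanity check on the formula.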
-
Patent number: 11478710Abstract: A computer functions as: an individual performance data receiver that receives a plurality of pieces of individual performance data from two or more performer user terminals connected via a network, each of the pieces of individual performance data being generated in accordance with a performance operation by a user on the corresponding performer user terminal and transmitted during a performance; a synthetic performance data generator that generates synthetic performance data by synthesizing pieces of individual performance data in an identical time slot, of the plurality of pieces of individual performance data received from the performer user terminals, during the performance, the synthetic performance data being used to reproduce sound in which individual performance contents on the two or more performer user terminals are mixed; and a synthetic performance data transmitter that transmits the synthetic performance data to at least one appreciator user terminal connected via the network during the performance.Type: GrantFiled: September 9, 2020Date of Patent: October 25, 2022Assignee: SQUARE ENIX CO., LTD.Inventors: Kiyotaka Akaza, Yosuke Hayashi, Kei Odagiri
-
Patent number: 11455992Abstract: According to an embodiment, an electronic device includes a user interface, a memory, and a processor. The memory is configured to store a main database including at least one input data or an auxiliary database including at least one category including pieces of response data. The processor is configured to receive a user input, using the user interface, and extract first information from the user input. The processor is also configured to identify whether a portion of the input data of the main database matches the first information. The processor is further configured to identify a category of the user input in response to identifying that the input data of the main database does not match the first information. Additionally, the processor is configured to identify the auxiliary database corresponding to the category, and provide first response data based on an event that the auxiliary database is identified.Type: GrantFiled: February 19, 2020Date of Patent: September 27, 2022Assignee: Samsung Electronics Co., Ltd.Inventors: Sunbeom Kwon, Sunok Kim, Hyelim Woo, Jeehun Ha
-
Patent number: 11443732Abstract: A speech synthesizer includes a memory configured to store a plurality of sentences and prior information of a word classified into a minor class among a plurality of classes with respect to each sentence, and a processor configured to determine an oversampling rate of the word based on the prior information, determine the number of times of oversampling of the word using the determined oversampling rate and generate sentences including the word by the determined number of times of oversampling. The plurality of classes includes a first class corresponding to first reading break, a second class corresponding to second reading break greater than the first break and a third class corresponding to third reading break greater than the second break, and the minor class has a smallest count among the first to third classes in one sentence.Type: GrantFiled: February 15, 2019Date of Patent: September 13, 2022Assignee: LG ELECTRONICS INC.Inventors: Jonghoon Chae, Sungmin Han
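The core of the abstract above is balancing reading-break classes: the class with the smallest count in a sentence is the minor class, and extra sentences containing its words are generated at a computed oversampling rate. The rate formula below (top up every class to the majority count) is an illustrative choice, not the patent's actual formula:

```python
# Toy oversampling schedule for imbalanced reading-break classes.
# Assumed policy: generate enough extra examples to bring every class
# up to the majority class's count. Class names are illustrative.

def oversampling_counts(class_counts):
    """class_counts: class -> example count. Returns extra examples needed."""
    target = max(class_counts.values())
    return {cls: target - n for cls, n in class_counts.items()}

counts = {"first_break": 500, "second_break": 300, "third_break": 40}
extra = oversampling_counts(counts)  # "third_break" is the minor class
```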
-
Patent number: 11437037Abstract: Aspects of the disclosure relate to using machine learning to simulate an interactive voice response system. A computing platform may receive user interaction information corresponding to interactions between a user and enterprise computing devices. Based on the user interaction information, the computing platform may identify predicted intents for the user, and may generate hotkey information based on the predicted intents. The computing platform may send the hotkey information and commands directing the mobile device to output the hotkey information. The computing platform may receive hotkey input information from the mobile device. Based on the hotkey input information, the computing platform may generate a hotkey response message. The computing platform may send, to the mobile device, the hotkey response message and commands directing the mobile device to convert the hotkey response message to an audio output and to output the audio output.Type: GrantFiled: July 20, 2020Date of Patent: September 6, 2022Assignee: Bank of America CorporationInventors: Srinivas Dundigalla, Saurabh Mehta, Pavan K. Chayanam
-
Patent number: 11430423Abstract: A method for automatically translating raw data into real human voiced audio content is provided according to an embodiment of the present disclosure. The method may comprise ingesting data, separating the data into or associating the data with a data type, and creating a list of descriptive data associated with the data type. In some embodiments, the method further comprises compiling audio phrase types associated with the descriptive data, associating a pre-recorded audio file with each audio phrase, and merging a plurality of pre-recorded audio files to create a final audio file.Type: GrantFiled: April 17, 2019Date of Patent: August 30, 2022Assignee: Weatherology, LLCInventor: Derek Christopher Heit
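The steps in the abstract above (type the data, derive descriptive phrases, map each phrase to a pre-recorded clip, merge the clips) can be sketched end to end. The phrase library, file names, and forecast fields below are all hypothetical, chosen only to show the dataflow:

```python
# Toy data-to-audio assembly: typed raw data -> descriptive phrases ->
# (hypothetical) pre-recorded clip names -> ordered playlist to merge.
PHRASE_FILES = {  # hypothetical clip library
    "high": "high.wav", "low": "low.wav", "degrees": "degrees.wav",
    "70": "70.wav", "55": "55.wav",
}

def describe(forecast):
    # Descriptive phrase list for one assumed data type (a daily forecast).
    return ["high", str(forecast["high"]), "degrees",
            "low", str(forecast["low"]), "degrees"]

def build_playlist(forecast):
    # A real system would concatenate audio; here we return the file order.
    return [PHRASE_FILES[p] for p in describe(forecast)]

playlist = build_playlist({"high": 70, "low": 55})
```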
-
Patent number: 11430427Abstract: This application provides a method and electronic device for separating a mixed sound signal. The method includes: obtaining a first hidden variable representing a human voice feature and a second hidden variable representing an accompaniment sound feature by inputting feature data of a mixed sound extracted from a mixed sound signal into a coding model for the mixed sound; obtaining first feature data of a human voice and second feature data of an accompaniment sound by inputting the first hidden variable and the second hidden variable into a first decoding model for the human voice and a second decoding model for the accompaniment sound respectively; and obtaining, based on the first feature data and the second feature data, the human voice and the accompaniment sound.Type: GrantFiled: June 21, 2021Date of Patent: August 30, 2022Assignee: Beijing Dajia Internet Information Technology Co., Ltd.Inventors: Ning Zhang, Yan Li, Tao Jiang
-
Patent number: 11417194Abstract: A hand washing supporting device according to embodiments includes a processor configured to perform: a detection process that detects a motion of a hand by a user; a determination process that determines a start of hand washing by the user upon the detection process detecting the motion of the hand for a predetermined period of time; and a notification process that, in accordance with the determination of the start of the hand washing, notifies the user of a difference between the time required from the start to an end of the hand washing by the user and the predetermined period of time.Type: GrantFiled: June 25, 2019Date of Patent: August 16, 2022Assignee: Nippon Telegraph and Telephone CorporationInventors: Atsuhiko Maeda, Midori Kodama, Motonori Nakamura, Ippei Shake
-
Patent number: 11417345Abstract: An encoding apparatus performs encoding by an encoding process in which bits are preferentially assigned to the low-frequency side to obtain a spectrum code. The encoding apparatus judges whether a sound signal is a hissing sound. If so, it exchanges all or a part of the spectrum lying below a predetermined frequency in the frequency spectrum sequence of the sound signal with all or a part of the spectrum lying above that frequency, the number of exchanged high-side components being the same as the number of exchanged low-side components, and then encodes the result.Type: GrantFiled: December 3, 2018Date of Patent: August 16, 2022Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATIONInventors: Ryosuke Sugiura, Yutaka Kamamoto, Takehiro Moriya
-
Patent number: 11399229Abstract: Methods, systems, computer-readable media, and apparatuses for audio signal processing are presented. Some configurations include determining that first audio activity in at least one microphone signal is voice activity; determining whether the voice activity is voice activity of a participant in an application session active on a device; based at least on a result of the determining whether the voice activity is voice activity of a participant in the application session, generating an antinoise signal to cancel the first audio activity; and by a loudspeaker, producing an acoustic signal that is based on the antinoise signal. Applications relating to shared virtual spaces are described.Type: GrantFiled: July 9, 2020Date of Patent: July 26, 2022Assignee: QUALCOMM IncorporatedInventors: Robert Tartz, Scott Beith, Mehrad Tavakoli, Gerhard Reitmayr
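At its simplest, the antinoise generation described above produces a phase-inverted copy of the unwanted audio so that the two cancel when superposed at the listener's ear. The sketch below shows only that idealized superposition; a real active-noise-control system must additionally model the acoustic path with adaptive filtering, which is omitted here:

```python
# Idealized antinoise sketch: negate the unwanted signal and superpose.
# Assumes perfect time alignment and a flat acoustic path, which real
# ANC systems achieve only approximately via adaptive filters.

def antinoise(samples):
    # Phase-inverted copy of the unwanted signal.
    return [-s for s in samples]

def residual(noise, anti):
    # What the listener hears when noise and antinoise superpose.
    return [a + b for a, b in zip(noise, anti)]

noise = [0.2, -0.5, 0.7]
quiet = residual(noise, antinoise(noise))
```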