Speech Signal Processing Patents (Class 704/200)

Psychoacoustic (Class 704/200.1)

For storage or transmission (Class 704/201)

Recognition (Class 704/231)

Synthesis (Class 704/258)

Application (Class 704/270)

3D audio rendering using volumetric audio rendering and scripted audio level-of-detail

Patent number: 12363497

Abstract: An audio engine is provided for acoustically rendering a three-dimensional virtual environment. The audio engine uses geometric volumes to represent sound sources and any sound occluders. A volumetric response is generated based on sound projected from a volumetric sound source to a listener, taking into consideration any volumetric occluders in-between. The audio engine also provides for modification of a level of detail of sound over time based on distance between a listener and a sound source. Other aspects are also described and claimed.

Type: Grant

Filed: March 14, 2024

Date of Patent: July 15, 2025

Assignee: Apple Inc.

Inventors: David Thall, Christopher A. Wolfe, James E. McCartney

Asset tracking and notification processing

Patent number: 12348903

Abstract: An asset is identified in a predefined area. The type of asset is identified through video or wireless tag identification. A security policy associated with the type of asset is obtained. Entries in an asset log are recorded based on a location of the asset, an individual handling the asset, and/or actions taken by an individual with respect to the asset based on the security policy. A real-time notification is sent when the location, the handling, and/or any of the actions warrant a notification based on the security policy.

Type: Grant

Filed: March 9, 2022

Date of Patent: July 1, 2025

Assignee: NCR Atleos Corporation

Inventors: Sudip Rahman Khan, Matthew Robert Burris, Christopher John Costello, Gregory Joseph Hartl

Speaker verification system using ultrasound energy in human speech

Patent number: 12347440

Abstract: A speaker verification system includes an ultrasonic microphone generating an electrical sound signal from an audio signal corresponding to both ultrasonic frequency sound and sub-ultrasonic frequency sound. A liveness detection module generates a high frequency energy distribution based on the electrical sound signal. The liveness detection module generates a verification result based on the high frequency energy distribution. A speaker verification module verifies the audio signal based on the high frequency energy distribution signals and a neural network architecture.

Type: Grant

Filed: May 3, 2023

Date of Patent: July 1, 2025

Assignee: Board of Trustees of Michigan State University

Inventors: Qiben Yan, Hanqing Guo, Li Xiao

Processing continued conversations over multiple devices

Patent number: 12347420

Abstract: Implementations relate to facilitating continued conversations of a user with an automated assistant when the user changes locations relative to one or more devices in an ecosystem of linked assistant devices. The user initially invokes a first device and provides a request, which is processed by the first device. The first device provides a notification to one or more other devices in the ecosystem to indicate that the user is likely to issue a further assistant request. The first device processes subsequent audio data to determine whether the subsequent audio data includes a further assistant request. The one or more other notified devices process device-specific sensor data to determine whether the user is co-present with one of the other devices. If the user presence is detected, an indication is provided to the first device, causing the first device to cease processing subsequent audio data. Further, the co-present device starts to process subsequent audio data.

Type: Grant

Filed: October 17, 2022

Date of Patent: July 1, 2025

Assignee: GOOGLE LLC

Inventors: Victor Carbune, Matthew Sharifi

Packet loss recovery method for audio data packet, electronic device and storage medium

Patent number: 12277940

Abstract: The disclosure provides a packet loss recovery method for an audio data packet, an electronic device, and a storage medium. The method includes: receiving an audio data packet sent by a vehicle-mounted terminal, and identifying a discarded first sampling point set in response to detecting packet loss; obtaining a second sampling point set and a third sampling point set each adjacent to the first sampling point set, in which the second sampling point set is prior to the first sampling point set and the third sampling point set is behind the first sampling point set; and generating target audio data of the first sampling points based on first audio data sampled at the second sampling points and second audio data sampled at the third sampling points, and inserting the target audio data at sampling positions of the first sampling points.

Type: Grant

Filed: September 12, 2022

Date of Patent: April 15, 2025

Assignee: APOLLO INTELLIGENT CONNECTIVITY (BEIJING) TECHNOLOGY CO., LTD.

Inventor: Wenhuan Zhou
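
The recovery step described in the abstract of patent 12277940 (estimating lost samples from the sample sets immediately before and after the gap, then inserting them at the lost positions) can be illustrated with a minimal Python sketch. Linear interpolation is an assumption chosen for illustration only; the abstract does not say how the target audio data is generated, and the function names here are hypothetical.

```python
def fill_lost_samples(before, after, gap_len):
    """Estimate gap_len lost samples from their neighbours.

    Assumption: plain linear interpolation between the last sample before
    the gap and the first sample after it. The patent abstract does not
    specify the estimation method.
    """
    if not before or not after:
        raise ValueError("need at least one sample on each side of the gap")
    start, end = before[-1], after[0]
    step = (end - start) / (gap_len + 1)
    return [start + step * (i + 1) for i in range(gap_len)]


def recover(before, after, gap_len):
    """Insert the estimated samples at the positions of the lost ones."""
    return list(before) + fill_lost_samples(before, after, gap_len) + list(after)
```

For example, `recover([0.0, 1.0], [4.0, 5.0], 2)` fills the two missing positions with values on the straight line between 1.0 and 4.0.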
Model based prediction in a critically sampled filterbank

Patent number: 12277945

Abstract: The present document relates to audio source coding systems which make use of linear prediction in combination with a filterbank. A method for estimating a sample of a subband signal from two or more previous samples of the subband signal is described. The subband signal corresponds to a plurality of subbands, having an equal subband spacing, of a subband-domain representation of an audio signal. The method comprises determining signal model data using a model parameter; determining a first prediction coefficient in response to the model parameter using a first lookup table and/or a first analytical function; determining a second prediction coefficient in response to the model parameter using a second lookup table and/or a second analytical function; and determining the estimate of the sample by applying the first prediction coefficient to the first previous sample and applying the second prediction coefficient to the second previous sample.

Type: Grant

Filed: February 20, 2024

Date of Patent: April 15, 2025

Assignee: DOLBY INTERNATIONAL AB

Inventor: Lars Villemoes
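
The abstract of patent 12277945 describes deriving two prediction coefficients from a single model parameter and applying them to the two previous samples. As a hedged illustration of that shape, the Python sketch below uses the textbook two-tap recurrence for a pure sinusoid in the time domain, where the model parameter is the normalized angular frequency `omega`. This is a stand-in: the patent works on subband samples of a critically sampled filterbank, and its lookup tables and analytical functions are not given in the abstract. All names are hypothetical.

```python
import math


def sinusoid_coefficients(omega):
    """Map a model parameter (angular frequency, radians/sample) to two
    prediction coefficients via an analytical function.

    Assumption: the signal is a pure sinusoid, for which
    x[n] = 2*cos(omega)*x[n-1] - x[n-2] holds exactly.
    """
    first_coefficient = 2.0 * math.cos(omega)
    second_coefficient = -1.0
    return first_coefficient, second_coefficient


def predict_sample(prev1, prev2, omega):
    """Estimate the next sample from the two previous samples.

    prev1 is the most recent sample, prev2 the one before it.
    """
    c1, c2 = sinusoid_coefficients(omega)
    return c1 * prev1 + c2 * prev2
```

With `x[n] = cos(omega * n)`, calling `predict_sample(x[1], x[0], omega)` reproduces `x[2]` up to floating-point rounding.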
Attention-based clockwork hierarchical variational encoder

Patent number: 12272349

Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.

Type: Grant

Filed: October 16, 2023

Date of Patent: April 8, 2025

Assignee: Google LLC

Inventors: Robert Clark, Chun-An Chan, Vincent Wan

Dereverberation and noise reduction

Patent number: 12272369

Abstract: A system configured to improve audio processing by performing dereverberation and noise reduction during a communication session. In some examples, the system may include a deep neural network (DNN) configured to perform speech enhancement, which is located after an Acoustic Echo Cancellation (AEC) component. For example, the DNN may process isolated audio data output by the AEC component to jointly mitigate additive noise and reverberation. In other examples, the system may include a DNN configured to perform acoustic interference cancellation, which may jointly mitigate additive noise, reverberation, and residual echo, removing the need to perform residual echo suppression processing. The DNN is configured to process complex-valued spectrograms corresponding to the isolated audio data and/or estimated echo data generated by the AEC component.

Type: Grant

Filed: January 19, 2022

Date of Patent: April 8, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Amit Singh Chhetri, Mrudula V. Athi, Pradeep Kumar Govindaraju, Rong Hu

Multiple state digital assistant for continuous dialog

Patent number: 12266380

Abstract: Systems and processes for operating an intelligent automated assistant are provided. For example, a first speech input is received from a user. In response to receiving the first speech input, a response is provided. A first output is provided corresponding to a digital assistant in a first state, and a second speech input is received from the user. A first plurality of values is obtained. Based on the first plurality of values, a first confidence level corresponding to the second speech input is obtained. In accordance with a determination that the first confidence level exceeds a first threshold confidence level, a second output is provided corresponding to the digital assistant in a second state. The second speech input continues to be received.

Type: Grant

Filed: August 23, 2023

Date of Patent: April 1, 2025

Assignee: Apple Inc.

Inventors: Sreeneel Maddika, Ahmed Serag El Din Hussen Abdelaziz, Chaitanya Mannemala, Srikanth Vishnubhotla, Garrett L. Weinberg

Response endpoint selection based on audio characteristics of physical environments

Patent number: 12267288

Abstract: A computing system may determine audio data representing a message that is to be provided to a first user, wherein the first user is associated with at least a first device and a second device. First data corresponding to an audio characteristic of a first physical environment in which the first device is located and second data corresponding to the audio characteristic of a second physical environment in which the second device is located may be determined, wherein the second physical environment is different than the first physical environment and the second data is different than the first data.

Type: Grant

Filed: January 2, 2023

Date of Patent: April 1, 2025

Assignee: Amazon Technologies, Inc.

Inventor: Scott Ian Blanksteen

Shared encoder for natural language understanding processing

Patent number: 12266355

Abstract: Techniques for using a shared encoder and multiple different decoders for natural language understanding (NLU) tasks are described. The individual decoders are configured to perform different tasks using the output from one shared encoder. The decoders can process with respect to different domains and different languages. Using the shared encoder can reduce computation time during runtime. Using the shared encoder can reduce training costs (e.g., time and resources) when the system is updated to incorporate additional intents and entities. The system employs an attention mechanism to extract encoded representation data that can be used by the different decoders for its specific task.

Type: Grant

Filed: March 9, 2022

Date of Patent: April 1, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Jonathan Jakob Hueser, Fabian Triefenbach, Chandana Satya Prakash, Jin Cao, Wael Hamza, Mariusz Momotko

System and method for watermarking audio data for automated speech recognition (ASR) systems

Patent number: 12260866

Abstract: A method, computer program product, and computing system for processing audio information associated with a speech processing system and encoding a watermark in a non-disruptive portion of the audio information.

Type: Grant

Filed: August 30, 2022

Date of Patent: March 25, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Patrick Aubrey Naylor, Dushyant Sharma, William Francis Ganong, III, Uwe Helmut Jost, Ljubomir Milanovic

Automatic interpretation server and method based on zero UI for connecting terminal devices only within a speech-receiving distance

Patent number: 12260865

Abstract: Provided is a method performed by an automatic interpretation server based on a zero user interface (UI), which communicates with a plurality of terminal devices having a microphone function, a speaker function, a communication function, and a wearable function. The method includes connecting terminal devices disposed within a designated automatic interpretation zone, receiving a voice signal of a first user from a first terminal device among the terminal devices within the automatic interpretation zone, matching a plurality of users placed within a speech-receivable distance of the first terminal device, and performing automatic interpretation on the voice signal and transmitting results of the automatic interpretation to a second terminal device of at least one second user corresponding to a result of the matching.

Type: Grant

Filed: July 19, 2022

Date of Patent: March 25, 2025

Assignee: Electronics and Telecommunications Research Institute

Inventors: Seung Yun, Sang Hun Kim, Min Kyu Lee, Joon Gyu Maeng

Augmenting identifying metadata related to group communication session participants using artificial intelligence techniques

Patent number: 12255936

Abstract: Methods, apparatus, and processor-readable storage media for augmenting identifying metadata related to group communication session participants using artificial intelligence techniques are provided herein.

Type: Grant

Filed: February 8, 2023

Date of Patent: March 18, 2025

Assignee: Dell Products L.P.

Inventors: Dhilip S. Kumar, Hung T. Dinh, Bijan Kumar Mohanty

Audio device with uncertainty quantification and related methods

Patent number: 12248727

Abstract: An audio device comprising memory, an interface, and one or more processors, wherein the one or more processors are configured to obtain audio data; process the audio data for provision of an audio output; process the audio data for provision of one or more audio parameters indicative of one or more characteristics of the audio data; map the one or more audio parameters to a first latent space of a first neural network for provision of a mapping parameter indicative of whether the one or more audio parameters belong to a training manifold of the first latent space; determine, based on the mapping parameter, an uncertainty parameter indicative of an uncertainty of processing quality; and control the processing of the audio data for provision of the audio output based on the uncertainty parameter.

Type: Grant

Filed: March 14, 2024

Date of Patent: March 11, 2025

Inventors: Clément Laroche, Diego Caviedes Nozal

Text assisted telephony on wireless device method and apparatus

Patent number: 12244752

Abstract: A communication system and method usable to facilitate communication between a hearing user and an assisted user. In particular, the system employs a wireless portable tablet or other portable electronic computing device linked to a captioning enabled phone as a remote interface for that phone, thereby providing an assisted user with more options, more freedom, and improved usability of the system.

Type: Grant

Filed: September 27, 2023

Date of Patent: March 4, 2025

Assignee: ULTRATEC, INC.

Inventors: Christopher R. Engelke, Kevin R. Colwell, Troy Vitek

Adaptive visual speech recognition

Patent number: 12211488

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data using an adaptive visual speech recognition model. One of the methods includes receiving a video that includes a plurality of video frames that depict a first speaker; obtaining a first embedding characterizing the first speaker; and processing a first input comprising (i) the video and (ii) the first embedding using a visual speech recognition neural network having a plurality of parameters, wherein the visual speech recognition neural network is configured to process the video and the first embedding in accordance with trained values of the parameters to generate a speech recognition output that defines a sequence of one or more words being spoken by the first speaker in the video.

Type: Grant

Filed: June 15, 2022

Date of Patent: January 28, 2025

Assignee: DeepMind Technologies Limited

Inventors: Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas

System and method for cardiovascular risk prediction and computer readable medium thereof

Patent number: 12198340

Abstract: Provided are a system and a method for cardiovascular risk prediction, where artificial intelligence is utilized to perform segmentation on non-contrast or contrast medical images to identify precise regions of the heart, pericardium, and aorta of a subject, such that the adipose tissue volume and calcium score can be derived from the medical images to assist in cardiovascular risk prediction. Also provided is a computer readable medium for storing a computer executable code to implement the method.

Type: Grant

Filed: May 31, 2022

Date of Patent: January 14, 2025

Assignee: NATIONAL TAIWAN UNIVERSITY

Inventors: Tzung-Dau Wang, Wen-Jeng Lee, Yu-Cheng Huang, Chiu-Wang Tseng, Cheng-Kuang Lee, Wei-Chung Wang, Cheng-Ying Chou

Shared speech processing network for multiple speech applications

Patent number: 12200450

Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.

Type: Grant

Filed: May 26, 2023

Date of Patent: January 14, 2025

Assignee: QUALCOMM Incorporated

Inventors: Lae-Hoon Kim, Sunkuk Moon, Erik Visser, Prajakt Kulkarni

Audio spoof detection using attention-based contrastive learning

Patent number: 12189712

Abstract: An exemplary method for detecting fake audios comprises: converting audio data into an image representation of the audio data; providing the image representation of the audio data to a trained machine-learning model, the machine learning model: generating, using a trained self-attention branch, one or more representation embeddings corresponding to the image representation of the audio data; and receiving, using a trained classifier component, the one or more representation embeddings and outputting a classification result. The machine-learning model is trained by: in a first stage, training one or more self- and cross-attention components via contrastive learning, each self- and cross-attention component comprises a first self-attention branch, a second self-attention branch, and a cross-attention branch; and in a second stage, training the classifier component; and providing the classification result.

Type: Grant

Filed: January 29, 2024

Date of Patent: January 7, 2025

Assignee: Reality Defender, Inc.

Inventors: Gaurav Bharaj, Chirag Goel, Surya Koppisetti, Ben Colman, Ali Shahriyari

Systems and methods for determining a next action based on entities and intents

Patent number: 12183344

Abstract: Systems, apparatuses, methods, and computer program products are disclosed for predicting an entity and intent based on captured speech. An example method includes capturing speech and converting the speech to text. The example method further includes causing generation of one or more entities and one or more intents based on the speech and the text. The example method further includes determining a next action based on each of the one or more entities and each of the one or more intents.

Type: Grant

Filed: November 24, 2021

Date of Patent: December 31, 2024

Assignee: Wells Fargo Bank, N.A.

Inventors: Vinothkumar Venkataraman, Rahul Ignatius, Naveen Gururaja Yeri, Paul Davis

Touch-free, voice-assistant, and/or tracking systems and methods for automating inventory generation and tracking of childcare products

Patent number: 12175967

Abstract: Touch-free, voice-assistant, and/or tracking systems and methods are described for automating inventory generation and tracking of childcare products. In various aspects, the touch-free, voice-assistant, and/or tracking systems and methods comprise receiving, by one or more processors, one or more name values corresponding to one or more children of a childcare program associated with at least one physical childcare location. A childcare product inventory, comprising a childcare product of at least one product type, is generated based on a count of the one or more name values. Child event data is received comprising information related to use of the childcare product. The child event data is based on audible input of a user as received via a voice command interface of a voice-assistant application (app) as implemented on a voice assistance device. The childcare product inventory may be updated based on the child event data.

Type: Grant

Filed: October 29, 2021

Date of Patent: December 24, 2024

Assignee: The Procter & Gamble Company

Inventor: Brad S. Hoekzema

Action estimation device, action estimation method, and recording medium

Patent number: 12142291

Abstract: An action estimation device includes: an obtainer that obtains sound information pertaining to an inaudible sound, the inaudible sound being a sound in an ultrasonic band collected by a sound collector; and an estimator that estimates an output result, obtained by inputting the sound information obtained by the obtainer into a trained model indicating a relationship between the sound information and action information pertaining to an action of a person, as the action information of the person.

Type: Grant

Filed: June 21, 2022

Date of Patent: November 12, 2024

Assignee: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA

Inventors: Taketoshi Nakao, Toshiyuki Matsumura, Tatsumi Nagashima, Tetsuji Fuchikami

System and method for integrating auditory and non-auditory inputs for adaptable speech recognition

Patent number: 12136421

Abstract: A speech recognition method includes receiving audible data and user data. The audible data includes information about an utterance by the user. The user data includes information about movements by the user. The method further includes fusing the audible data and the user data to obtain fused data and determining at least one spoken word of the utterance based on the fused data.

Type: Grant

Filed: March 3, 2022

Date of Patent: November 5, 2024

Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC

Inventors: Jacob Alan Bond, Hannah Elizabeth Wagner, Joseph F. Szczerba, Alan D. Hejl

Methods for neural network-based voice enhancement and systems thereof

Patent number: 12125496

Abstract: The disclosed technology relates to methods, voice enhancement systems, and non-transitory computer readable media for real-time voice enhancement. In some examples, input audio data including foreground speech content, non-content elements, and speech characteristics is fragmented into input speech frames. The input speech frames are converted to low-dimensional representations of the input speech frames. One or more of the fragmentation or the conversion is based on an application of a first trained neural network to the input audio data. The low-dimensional representations of the input speech frames omit one or more of the non-content elements. A second trained neural network is applied to the low-dimensional representations of the input speech frames to generate target speech frames. The target speech frames are combined to generate output audio data. The output audio data further includes one or more portions of the foreground speech content and one or more of the speech characteristics.

Type: Grant

Filed: April 24, 2024

Date of Patent: October 22, 2024

Assignee: SANAS.AI INC.

Inventors: Shawn Zhang, Lukas Pfeifenberger, Jason Wu, Piotr Dura, David Braude, Bajibabu Bollepalli, Alvaro Escudero, Gokce Keskin, Ankita Jha, Maxim Serebryakov

Speech signal processing method and speech separation method

Patent number: 12106768

Abstract: This application provides a speech signal processing method performed by a computer device. Through an iterative training process, a teacher speech separation model can play a smooth role in the training of a student speech separation model based on the accuracy of separation results of the student speech separation model of outputting a target speech signal from a mixed speech signal and the consistency between separation results obtained by the teacher speech separation model of outputting the target speech signal from the mixed speech signal and the student speech separation model of performing the same task, thereby maintaining the separation stability while improving the separation accuracy of the student speech separation model as a trained speech separation model, and greatly improving the separation capability of the trained speech separation model.

Type: Grant

Filed: February 17, 2022

Date of Patent: October 1, 2024

Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventors: Jun Wang, Wingyip Lam

Processing of audio signals during high frequency reconstruction

Patent number: 12106762

Abstract: The application relates to HFR (High Frequency Reconstruction/Regeneration) of audio signals. In particular, the application relates to a method and system for performing HFR of audio signals having large variations in energy level across the low frequency range which is used to reconstruct the high frequencies of the audio signal. A system configured to generate a plurality of high frequency subband signals covering a high frequency interval from a plurality of low frequency subband signals is described.

Type: Grant

Filed: May 2, 2024

Date of Patent: October 1, 2024

Assignee: DOLBY INTERNATIONAL AB

Inventor: Kristofer Kjoerling

Weakly-supervised sound event detection method and system based on adaptive hierarchical pooling

Patent number: 12080319

Abstract: The present disclosure provides a weakly-supervised sound event detection method and system based on adaptive hierarchical pooling. The system includes an acoustic model and an adaptive hierarchical pooling algorithm module (AHPA-module), where the acoustic model takes as input a pre-processed and feature-extracted audio signal and predicts a frame-level prediction probability, which the AHPA-module aggregates to obtain a sentence-level prediction probability. The acoustic model and a relaxation parameter are jointly optimized to obtain an optimal model weight and an optimal relaxation parameter for formulating the pooling strategy of each category of sound event. A pre-processed and feature-extracted unknown audio signal is input to obtain frame-level prediction probabilities of all target sound events (TSEs), and sentence-level prediction probabilities of all categories of TSEs are obtained based on an optimal pooling strategy of each category of TSE.

Type: Grant

Filed: June 27, 2022

Date of Patent: September 3, 2024

Assignee: Jiangsu University

Inventors: Qirong Mao, Lijian Gao, Yaxin Shen, Qinghua Ren, Yongzhao Zhan, Keyang Cheng
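
The abstract of patent 12080319 describes aggregating frame-level probabilities into a sentence-level probability under a per-category relaxation parameter. The Python sketch below shows one well-known way a single parameter can interpolate between pooling behaviors: a softmax-weighted average, where `alpha = 0` gives mean pooling and large `alpha` approaches max pooling. This is a stand-in for illustration, not the patent's adaptive hierarchical pooling algorithm, which the abstract does not define. The function and parameter names are hypothetical.

```python
import math


def relaxed_pool(frame_probs, alpha):
    """Aggregate frame-level probabilities into one clip-level probability.

    Assumption: softmax-weighted pooling, with alpha as the relaxation
    parameter. Each frame is weighted by exp(alpha * p), so higher alpha
    lets the most confident frames dominate.
    """
    if not frame_probs:
        raise ValueError("frame_probs must not be empty")
    weights = [math.exp(alpha * p) for p in frame_probs]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, frame_probs)) / total


def pool_all_categories(frame_probs_by_category, alpha_by_category):
    """Apply a separately tuned relaxation parameter to each event category."""
    return {
        category: relaxed_pool(probs, alpha_by_category[category])
        for category, probs in frame_probs_by_category.items()
    }
```

A short event such as a door slam would tend toward a large `alpha` (close to max pooling), while a long event such as running water would tend toward a small one (close to mean pooling).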
Methods and systems for transcription of audio data

Patent number: 12062374

Abstract: Systems, devices, and methods transcribe words recorded in audio data. A computer-generated transcript is provided. The transcript comprises records for each word in the computer-generated transcript. At least one confirmation input is received for each record. The at least one confirmation input modifies a selected record and automatically identifies a next record for receiving a next confirmation input. A sequence of confirmation inputs may rapidly modify and validate each record in a sequence of records in the computer-generated transcript. A validated transcript is generated from the modified records and is provided from an evidence management system.

Type: Grant

Filed: February 7, 2023

Date of Patent: August 13, 2024

Assignee: Axon Enterprise, Inc.

Inventors: Noah Spitzer-Williams, Choongyeun Cho, Thomas Crosley, Zachary Goist, Daniel Bellia, Vinh Nguyen, Chelsea Alexander-Taylor

Electronic apparatus and method for controlling electronic apparatus

Patent number: 12056517

Abstract: Disclosed are an electronic device and a method for controlling the same. According to an embodiment, a method for controlling an electronic apparatus includes: obtaining a voice command while a first application is executed in foreground; obtaining a text by recognizing the voice command; identifying at least one second application to perform the voice command based on the text; identifying, based on information of the first application and the at least one second application, whether to execute each of the first application and the at least one second application in the foreground or background of the electronic apparatus; and providing the first application and the at least one second application based on the identification.

Type: Grant

Filed: February 17, 2022

Date of Patent: August 6, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventors: Yeonho Lee, Sangwook Park, Youngbin Shin, Kookjin Yeo

Device leadership negotiation among voice interface devices

Patent number: 12046241

Abstract: The various implementations described herein include methods and systems for determining device leadership among voice interface devices. In one aspect, a method is performed at a first electronic device of a plurality of electronic devices, each having microphones, a speaker, processors, and memory storing programs for execution by the processors. The first device detects a voice input. It determines a device state and a relevance of the voice input. It identifies a subset of electronic devices from the plurality to which the voice input is relevant. In accordance with a determination that the subset includes the first device, the first device determines a first score of a criterion associated with the voice input and receives second scores of the criterion from other devices in the subset. In accordance with a determination that the first score is higher than the second scores, the first device responds to the detected input.

Type: Grant

Filed: May 4, 2023

Date of Patent: July 23, 2024

Assignee: Google LLC

Inventors: Kenneth Mixter, Diego Melendo Casado, Alexander H. Gruenstein, Terry Tai, Christopher Thaddeus Hughes, Matthew Nirvan Sharifi
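
The decision rule in the abstract of patent 12046241 (respond only if the device is in the relevant subset and its own score is higher than every score received from the other devices in that subset) can be sketched in Python as follows. The device identifiers, score values, and function name are hypothetical, and the abstract does not say how ties are handled; this sketch treats a tie as "do not respond".

```python
def should_respond(device_id, relevant_ids, own_score, peer_scores):
    """Decide whether this device responds to the detected voice input.

    device_id:    identifier of this (first) device
    relevant_ids: set of devices to which the voice input is relevant
    own_score:    this device's score for the criterion
    peer_scores:  mapping of other subset devices to their reported scores
    """
    if device_id not in relevant_ids:
        return False
    # Strict comparison: a tie with any peer means this device stays silent
    # (an assumption; the abstract only covers the "higher than" case).
    return all(own_score > score for score in peer_scores.values())
```

For instance, a device scoring 0.8 against a single peer at 0.6 would respond, while the same device scoring 0.5 would defer.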
Method for determining and linking important parts among STT result and reference data

Patent number: 12033659

Abstract: Disclosed is a method for determining important parts among a speech-to-text (STT) result and reference data, which is performed by a computing device. The method may include acquiring STT data generated by performing STT with respect to a speech signal; acquiring reference data; determining first important information in one data among the STT data and the reference data; and determining second important information linked with the first important information in other data different from data in which the first important information is determined among the STT data and the reference data.

Type: Grant

Filed: December 27, 2023

Date of Patent: July 9, 2024

Assignee: ActionPower Corp.

Inventors: Hyungwoo Kim, Hwanbok Mun, Kangwook Kim

Generating expressive speech audio from text data

Patent number: 12033611

Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.

Type: Grant

Filed: February 28, 2022

Date of Patent: July 9, 2024

Assignee: ELECTRONIC ARTS INC.

Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
Artificial latency for moderating voice communication

Patent number: 12027177

Abstract: A computer-implemented method to determine whether to introduce latency into an audio stream from a particular speaker includes receiving an audio stream from a sender device. The method further includes providing, as input to a trained machine-learning model, the audio stream and a speech analysis score, information about one or more voice emotion parameters, and one or more voice emotion scores for a first user associated with the sender device, wherein the trained machine-learning model is iteratively applied to the audio stream and wherein each iteration corresponds to a respective portion of the audio stream. The method further includes generating as output, with the trained machine-learning model, a level of toxicity in the audio stream. The method further includes transmitting the audio stream to a recipient device, wherein the transmitting is performed to introduce a time delay in the audio stream based on the level of toxicity.

Type: Grant

Filed: September 8, 2022

Date of Patent: July 2, 2024

Assignee: Roblox Corporation

Inventors: Mahesh Kumar Nandwana, Philippe Clavel, Morgan McGuire
Emitting word timings with end-to-end models

Patent number: 12027154

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

Type: Grant

Filed: February 9, 2023

Date of Patent: July 2, 2024

Assignee: Google LLC

Inventors: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang
Unsupervised learning of disentangled speech content and style representation

Patent number: 12027151

Abstract: A linguistic content and speaking style disentanglement model includes a content encoder, a style encoder, and a decoder. The content encoder is configured to receive input speech as input and generate a latent representation of linguistic content for the input speech as output. The content encoder is trained to disentangle speaking style information from the latent representation of linguistic content. The style encoder is configured to receive the input speech as input and generate a latent representation of speaking style for the input speech as output. The style encoder is trained to disentangle linguistic content information from the latent representation of speaking style. The decoder is configured to generate output speech based on the latent representation of linguistic content for the input speech and the latent representation of speaking style for the same or different input speech.

Type: Grant

Filed: November 18, 2021

Date of Patent: July 2, 2024

Assignee: Google LLC

Inventors: Ruoming Pang, Andros Tjandra, Yu Zhang, Shigeki Karita
Formalizing informal agreements in physical space and digital space

Patent number: 12021822

Abstract: A computer-implemented method includes receiving a communication between first and second users via a communication channel associated with a communication space, and identifying the first user having a first role and the second user having a second role. A formality of the communication is determined based on the second role. The method includes identifying a transformer model for the communication space and monitoring the communication for an agreement clause via the transformer model by deriving an agreement clause based on the communication and classifying the derived agreement clause.

Type: Grant

Filed: October 4, 2022

Date of Patent: June 25, 2024

Assignee: International Business Machines Corporation

Inventors: Aaron K. Baughman, Jeremy R. Fox, Raghuveer Prasad Nagar, Dinesh Kumar Bhudavaram
Methods and systems for training convolutional neural networks

Patent number: 12020405

Abstract: A computer-implemented method for training a convolutional neural network includes receiving a captured image. A denoised image is generated by applying the convolutional neural network to the captured image. The convolutional neural network is trained based on a high frequency loss function, as well as the captured image and the denoised image.

Type: Grant

Filed: November 3, 2021

Date of Patent: June 25, 2024

Assignee: LEICA MICROSYSTEMS CMS GMBH

Inventor: Jose Miguel Serra Lleti
Method and system for conversation transcription with metadata

Patent number: 12020708

Abstract: Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

Type: Grant

Filed: October 11, 2021

Date of Patent: June 25, 2024

Assignee: SoundHound AI IP, LLC.

Inventors: Kiersten L. Bradley, Ethan Coeytaux, Ziming Yin
Knowledge graph question answering with neural machine translation

Patent number: 12013884

Abstract: A modular two-stage neural architecture is used in translating a natural language question into a logic form such as a SPARQL Protocol and RDF Query Language (SPARQL) query. In a first stage, a neural machine translation (NMT)-based sequence-to-sequence (Seq2Seq) model translates a question into a sketch of the desired SPARQL query called a SPARQL silhouette. In a second stage, a neural graph search module predicts the correct relations in the underlying knowledge graph.

Type: Grant

Filed: June 30, 2022

Date of Patent: June 18, 2024

Assignee: International Business Machines Corporation

Inventors: Saswati Dana, Dinesh Garg, Dinesh Khandelwal, G P Shrivatsa Bhargav, Sukannya Purkayastha
Oversampling in a combined transposer filterbank

Patent number: 11993817

Abstract: The present invention relates to coding of audio signals, and in particular to high frequency reconstruction methods including a frequency domain harmonic transposer. A system and method for generating a high frequency component of a signal from a low frequency component of the signal is described.

Type: Grant

Filed: January 19, 2023

Date of Patent: May 28, 2024

Assignee: Dolby International AB

Inventors: Lars Villemoes, Per Ekstrand
Neural networks for speaker verification

Patent number: 11961525

Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.

Type: Grant

Filed: August 3, 2021

Date of Patent: April 16, 2024

Assignee: Google LLC

Inventors: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
Sound categorization system

Patent number: 11947593

Abstract: A system, method, and computer program product for hierarchical categorization of sound comprising one or more neural networks implemented on one or more processors. The one or more neural networks are configured to categorize a sound into a two or more tiered hierarchical coarse categorization and a finest level categorization in the hierarchy. The categorization of the sound may be used to search a database for similar or contextually related sounds.

Type: Grant

Filed: September 28, 2018

Date of Patent: April 2, 2024

Inventors: Arindam Jati, Naveen Kumar, Ruxin Chen
Inter-channel feature extraction method, audio separation method and apparatus, and computing device

Patent number: 11908483

Abstract: This application relates to a method of extracting an inter-channel feature from a multi-channel multi-sound source mixed audio signal performed at a computing device.

Type: Grant

Filed: August 12, 2021

Date of Patent: February 20, 2024

Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventors: Rongzhi Gu, Shixiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu
Audio forwarding method, device and storage medium

Patent number: 11903067

Abstract: An audio forwarding method, an audio forwarding device and a storage medium are described. The audio forwarding method comprises: establishing a first communication link with a sound source device based on a first wireless communication protocol; establishing a second communication link with an audio playback device based on a second wireless communication protocol; receiving first audio data from the sound source device through the first communication link; processing the first audio data to generate second audio data, and storing the second audio data into a second buffer; and transmitting the second audio data to the audio playback device through the second communication link.

Type: Grant

Filed: June 24, 2022

Date of Patent: February 13, 2024

Assignee: Nanjing Zgmicro Company Limited

Inventor: Bin Xu
Embedded audio sensor system and methods

Patent number: 11894015

Abstract: An embedded sensor can include an audio detector, a digital signal processor, a library, and a rules engine. The digital signal processor can be configured to receive signals from the audio detector and to identify the environment in which the embedded sensor is located. The library can store statistical models associated with specific environments, and the digital signal processor can be configured to identify specific events based on detected sounds within the particular environment by utilizing the statistical model associated with the particular environment. The DSP can associate a probability of accuracy for the identified audible event. A rules engine can be configured to receive the probability and transmit a report of the detected audible event.

Type: Grant

Filed: October 31, 2022

Date of Patent: February 6, 2024

Assignee: CELLULAR SOUTH, INC.

Inventors: Brett Rogers, Tommy Naugle, Stephen Bye, Craig Sparks, Arman Kirakosyan
Low-frequency emphasis for LPC-based coding in frequency domain

Patent number: 11854561

Abstract: The invention provides an audio encoder including a combination of a linear predictive coding filter having a plurality of linear predictive coding coefficients and a time-frequency converter, wherein the combination is configured to filter and to convert a frame of the audio signal into a frequency domain in order to output a spectrum based on the frame and on the linear predictive coding coefficients; a low frequency emphasizer configured to calculate a processed spectrum based on the spectrum, wherein spectral lines of the processed spectrum representing a lower frequency than a reference spectral line are emphasized; and a control device configured to control the calculation of the processed spectrum by the low frequency emphasizer depending on the linear predictive coding coefficients of the linear predictive coding filter.

Type: Grant

Filed: November 22, 2022

Date of Patent: December 26, 2023

Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.

Inventors: Stefan Doehla, Bernhard Grill, Christian Helmrich, Nikolaus Rettelbach
Activation trigger processing

Patent number: 11823670

Abstract: Utterance-based user interfaces can include activation trigger processing techniques for detecting activation triggers and causing execution of certain commands associated with particular command pattern activation triggers without waiting for output from a separate speech processing engine. The activation trigger processing techniques can also detect speech analysis patterns and selectively activate a speech processing engine.

Type: Grant

Filed: April 17, 2020

Date of Patent: November 21, 2023

Assignee: Spotify AB

Inventor: Richard Mitic
Multi-threaded speaker identification

Patent number: 11810572

Abstract: A system, method, and computer-program product includes distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads, executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, generating, via a clustering algorithm, a plurality of clusters of embedding signatures based on a plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files, and detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.

Type: Grant

Filed: June 8, 2023

Date of Patent: November 7, 2023

Assignee: SAS INSTITUTE INC.

Inventors: Xiaozhuo Cheng, Xiaolong Li, Xu Yang
Low power mode for speech capture devices

Patent number: 11810593

Abstract: A system configured to perform low power mode wakeword detection is provided. A device reduces power consumption without compromising functionality by placing a primary processor into a low power mode and using a secondary processor to monitor for sound detection. The secondary processor stores input audio data in a buffer component while performing sound detection on the input audio data. If the secondary processor detects a sound, the secondary processor sends an interrupt signal to the primary processor, causing the primary processor to enter an active mode. While in the active mode, the primary processor performs wakeword detection using the buffered audio data. To reduce latency, the primary processor processes the buffered audio data at an accelerated rate. In some examples, the device may further reduce power consumption by including a second buffer component and only processing the input audio data after detecting a sound.

Type: Grant

Filed: November 6, 2020

Date of Patent: November 7, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Dibyendu Nandy, Om Prakash Gangwal
