Pitch Patents (Class 704/207)
  • Patent number: 11348591
    Abstract: A speaker identification system and method to identify a speaker based on the speaker's voice is disclosed. In an exemplary embodiment, the speaker identification system comprises a Gaussian Mixture Model (GMM) for speaker accent and dialect identification for a given speech signal input by the speaker and an Artificial Neural Network (ANN) to identify the speaker based on the identified dialect, in which the output of the GMM is input to the ANN.
    Type: Grant
    Filed: September 23, 2021
    Date of Patent: May 31, 2022
    Assignee: King Abdulaziz University
    Inventors: Muhammad Moinuddin, Ubaid M. Al-Saggaf, Shahid Munir Shah, Rizwan Ahmed Khan, Zahraa Ubaid Al-Saggaf
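    Illustrative sketch (Python; not the patented implementation): the abstract describes feeding a GMM-based dialect/accent decision into an ANN speaker classifier. The sketch below assumes scikit-learn's GaussianMixture and MLPClassifier as stand-ins, with random placeholder features; all names, shapes, and data are invented for illustration.
      # Per-dialect GMMs score an utterance's feature frames; the resulting
      # log-likelihood vector is the GMM output that the small neural network
      # consumes to predict a speaker label.
      import numpy as np
      from sklearn.mixture import GaussianMixture
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(0)

      # Toy 13-dimensional "feature frames" for 3 dialects (placeholder data).
      dialect_train = {d: rng.normal(loc=d, size=(200, 13)) for d in range(3)}
      gmms = {d: GaussianMixture(n_components=4, random_state=0).fit(X)
              for d, X in dialect_train.items()}

      def dialect_scores(frames):
          # Average per-frame log-likelihood under each dialect GMM -> fixed-length vector.
          return np.array([gmms[d].score(frames) for d in sorted(gmms)])

      # Toy speaker enrollment data: one frame matrix per speaker.
      speaker_frames = [rng.normal(loc=s % 3, scale=1.0 + 0.1 * s, size=(200, 13))
                        for s in range(4)]
      X_spk = np.stack([dialect_scores(f) for f in speaker_frames])
      y_spk = np.arange(4)

      ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                          random_state=0).fit(X_spk, y_spk)
      print(ann.predict(dialect_scores(speaker_frames[2]).reshape(1, -1)))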
  • Patent number: 11303489
    Abstract: A transmitting apparatus includes a first signal generating unit that generates, on the basis of data, a first signal transmitted by single-carrier block transmission; a second signal generating unit that generates, on the basis of an RS, a second signal transmitted by orthogonal frequency division multiplex transmission; a switching operator that selects and outputs the second signal in a first transmission period and selects and outputs the first signal in a second transmission period; an antenna that transmits the signal output from the switching operator; and a control-signal generating unit that controls the second signal generating unit such that, in the first transmission period, the RS is arranged in a frequency band allocated for transmission of the RS from the transmitting apparatus among frequency bands usable in OFDM.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: April 12, 2022
    Assignee: Mitsubishi Electric Corporation
    Inventors: Fumihiro Hasegawa, Akinori Taira
  • Patent number: 11289067
    Abstract: Methods and systems for generating voices based on characteristics of an avatar. One or more characteristics of an avatar are obtained and one or more parameters of a voice synthesizer for generating a voice corresponding to the one or more avatar characteristics are determined. The voice synthesizer is configured based on the one or more parameters and a voice is generated using the parameterized voice synthesizer.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: March 29, 2022
    Assignee: International Business Machines Corporation
    Inventors: Kristina Marie Brimijoin, Gregory Boland, Joseph Schwarz
  • Patent number: 11282534
    Abstract: Systems and methods for intelligent playback of media content may include an intelligent media playback system that, in response to determining the speech tempo of audio content by measuring the syllable density of speech in the audio content, automatically adjusts the playback speed of the audio content as it is being played based on the determined speech tempo. In some embodiments, the system may automatically and dynamically adjust the playback speed to reach a desired target speech tempo. In addition, the system may determine whether to automatically adjust the playback speed of the audio content, as the media is being played, based on the detected speech tempo of the speech in the audio content and the determined type of content of the media. Such automatic adjustments in playback speed result in more efficient playback of the audio content.
    Type: Grant
    Filed: August 3, 2018
    Date of Patent: March 22, 2022
    Assignee: Sling Media PVT Ltd
    Inventors: Yatish Jayant Naik Raikar, Varunkumar Tripathi, Karthik Mahabaleshwar Hegde
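    Illustrative sketch (Python; not Sling's implementation): the abstract ties playback speed to a measured speech tempo. The function below assumes syllable counting happens elsewhere and simply derives a clamped playback rate that pushes the measured syllables-per-second toward a target value; the target and clamp values are invented for illustration.
      def playback_rate(measured_syll_per_sec, target_syll_per_sec=4.5,
                        min_rate=0.75, max_rate=2.0):
          # No detected speech: leave the playback speed unchanged.
          if measured_syll_per_sec <= 0:
              return 1.0
          rate = target_syll_per_sec / measured_syll_per_sec
          return max(min_rate, min(max_rate, rate))

      print(playback_rate(3.0))   # slow narration sped up: 1.5
      print(playback_rate(6.0))   # fast speech slowed, clamped at 0.75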
  • Patent number: 11276412
    Abstract: A method and device allocates a bit-budget to a plurality of first parts of a CELP core module of (a) an encoder for encoding a sound signal or (b) a decoder for decoding the sound signal. In the method and device, bit-budget allocation tables assign, for each of a plurality of intermediate bit rates, respective bit-budgets to the first CELP core module parts. A CELP core module bit rate is determined and one of the intermediate bit rates is selected based on the determined CELP core module bit rate. The respective bit-budgets assigned by the bit-budget allocation tables for the selected intermediate bit rate are allocated to the first CELP core module parts.
    Type: Grant
    Filed: September 20, 2018
    Date of Patent: March 15, 2022
    Assignee: VOICEAGE CORPORATION
    Inventor: Vaclav Eksler
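    Illustrative sketch (Python; not VoiceAge's tables): the abstract describes bit-budget allocation tables keyed by intermediate bit rates. The sketch selects the highest intermediate rate not exceeding the CELP core bit rate and returns that row's per-part budgets; the rates, part names, and bit counts are placeholders, not values from the patent.
      BUDGET_TABLE = {            # intermediate bit rate (bit/s) -> bits per CELP core part
          7200:  {"lpc": 26, "adaptive_cb": 20, "fixed_cb": 48,  "gains": 18},
          9600:  {"lpc": 30, "adaptive_cb": 24, "fixed_cb": 72,  "gains": 22},
          13200: {"lpc": 34, "adaptive_cb": 28, "fixed_cb": 110, "gains": 26},
      }

      def allocate(core_bit_rate):
          eligible = [r for r in BUDGET_TABLE if r <= core_bit_rate]
          selected = max(eligible) if eligible else min(BUDGET_TABLE)
          return selected, dict(BUDGET_TABLE[selected])

      print(allocate(11000))      # selects the 9600 bit/s row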
  • Patent number: 11270071
    Abstract: Systems, apparatuses, and methods are described herein for providing language-level content recommendations to users based on an analysis of closed captions of content viewed by the users and other data. Language-level analysis of content viewed by a user may be performed to generate metrics that are associated with the user. The metrics may be used to provide recommendations for content, which may include advertising, that is closely aligned with the user's interests.
    Type: Grant
    Filed: December 28, 2017
    Date of Patent: March 8, 2022
    Assignee: Comcast Cable Communications, LLC
    Inventor: Richard Walsh
  • Patent number: 11270721
    Abstract: Pre-processing systems, methods of pre-processing, and speech processing systems for improved Automated Speech Recognition are provided. Some pre-processing systems for improved speech recognition of a speech signal are provided, which systems comprise a pitch estimation circuit; and a pitch equalization processor. The pitch estimation circuit is configured to receive the speech signal to determine a pitch index of the speech signal, and the pitch equalization processor is configured to receive the speech signal and pitch information, to equalize a speech pitch of the speech signal using the pitch information, and to provide a pitch-equalized speech signal.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: March 8, 2022
    Assignee: PLANTRONICS, INC.
    Inventors: Youhong Lu, Arun Rajasekaran
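    Illustrative sketch (Python; not Plantronics' method): one simple way to "equalize" pitch before recognition is to estimate a frame's pitch by autocorrelation and time-scale the frame so its pitch lands on a reference value. The reference pitch, search range, and toy sine-wave input below are assumptions for illustration only.
      import numpy as np

      def estimate_pitch_hz(frame, fs, fmin=60.0, fmax=400.0):
          frame = frame - np.mean(frame)
          ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
          lo, hi = int(fs / fmax), int(fs / fmin)
          lag = lo + int(np.argmax(ac[lo:hi]))
          return fs / lag

      def equalize_pitch(frame, fs, target_hz=150.0):
          pitch = estimate_pitch_hz(frame, fs)
          factor = pitch / target_hz                  # >1 stretches, <1 compresses
          n_out = max(2, int(round(len(frame) * factor)))
          x_old = np.linspace(0.0, 1.0, len(frame))
          x_new = np.linspace(0.0, 1.0, n_out)
          return np.interp(x_new, x_old, frame), pitch

      fs = 16000
      t = np.arange(0, 0.032, 1.0 / fs)
      frame = np.sin(2 * np.pi * 200.0 * t)           # toy 200 Hz "voiced" frame
      equalized, detected = equalize_pitch(frame, fs)
      print(round(detected, 1), len(frame), len(equalized))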
  • Patent number: 11263876
    Abstract: Data is collected for Self-Service Terminals (SSTs) including tallies, events, and outcomes associated with servicing the SSTs. Statistical correlations are derived from the tallies and events with respect to the outcomes. Subsequent collected data is processed with the statistical correlations and a probability for a failure of a component or a part of the component associated with a particular SST is reported for servicing the component or part before the failure.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: March 1, 2022
    Assignee: NCR Corporation
    Inventors: Claudio Cifarelli, Gardiner Arthur, Iain M. N. Cowan, Massimo Mastropietro, Callum Ellis Morton
  • Patent number: 11250826
    Abstract: Digital signal processing and machine learning techniques can be employed in a vocal capture and performance social network to computationally generate vocal pitch tracks from a collection of vocal performances captured against a common temporal baseline such as a backing track or an original performance by a popularizing artist. In this way, crowd-sourced pitch tracks may be generated and distributed for use in subsequent karaoke-style vocal audio captures or other applications. Large numbers of performances of a song can be used to generate a pitch track. Computationally determined pitch trackings from individual audio signal encodings of the crowd-sourced vocal performance set are aggregated and processed as an observation sequence of a trained Hidden Markov Model (HMM) or other statistical model to produce an output pitch track.
    Type: Grant
    Filed: October 28, 2019
    Date of Patent: February 15, 2022
    Assignee: Smule, Inc.
    Inventors: Stefan Sullivan, John Shimmin, Dean Schaffer, Perry R. Cook
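    Illustrative sketch (Python; a simplified stand-in for the statistical-model step in the abstract): many noisy per-performance pitch tracks aligned to a common frame grid are reduced to one crowd-sourced track by per-frame voiced voting and a median over the voiced estimates. The HMM smoothing the abstract mentions is not shown; the threshold and toy data are invented.
      import numpy as np

      def aggregate_pitch_tracks(tracks, min_voiced_fraction=0.5):
          tracks = np.asarray(tracks, dtype=float)    # shape: (performances, frames), 0 = unvoiced
          voiced = tracks > 0
          enough = voiced.mean(axis=0) >= min_voiced_fraction
          out = np.zeros(tracks.shape[1])
          for f in np.nonzero(enough)[0]:
              out[f] = np.median(tracks[voiced[:, f], f])
          return out

      tracks = [
          [220.0, 220.0, 0.0,   330.0],
          [221.0, 219.0, 218.0, 329.0],
          [110.0, 220.0, 0.0,   331.0],               # first frame is an octave error
      ]
      print(aggregate_pitch_tracks(tracks))           # ~[220, 220, 0, 330]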
  • Patent number: 11250221
    Abstract: Methods, systems, and computer-readable storage media for contextual interpretation of a Japanese word are provided. A first set of characters representing Japanese words is received. The first set of characters is input to a neural network. The neural network is trained to process characters based on bi-directional context interpretation. The first set of characters is processed by the neural network through a plurality of learning layers that process the first set of characters in the order of the first set of characters and in the reverse order to determine semantical meanings of the characters in the first set of characters. An alphabet representation of at least one character of the first set of characters representing a Japanese word is output. The alphabet representation corresponds to a semantical meaning of the at least one character within the first set of characters.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: February 15, 2022
    Assignee: SAP SE
    Inventor: Sean Saito
  • Patent number: 11241635
    Abstract: The present disclosure, according to at least one embodiment, relates to a method and system for providing an interactive service by using a smart toy in the learning process of a child. The method and system provide a more accurately classified emotional state of the child based on at least one or more sensed data items among an optical image, a thermal image, and voice data of the child, and adaptively provide a flexible and versatile interactive service according to the classified emotions.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: February 8, 2022
    Inventors: Heui Yul Noh, Myeong Ho Roh, Chang Woo Ban, Oh Soung Kwon, Seung Pil Lee, Seung Min Shin
  • Patent number: 11244694
    Abstract: A method is described that processes an audio signal. A discontinuity between a filtered past frame and a filtered current frame of the audio signal is removed using linear predictive filtering.
    Type: Grant
    Filed: January 23, 2017
    Date of Patent: February 8, 2022
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Emmanuel Ravelli, Manuel Jander, Grzegorz Pietrzyk, Martin Dietz, Marc Gayer
  • Patent number: 11239859
    Abstract: A method for partitioning of input vectors for coding is presented. The method comprises obtaining of an input vector. The input vector is segmented, in a non-recursive manner, into an integer number, NSEG, of input vector segments. A representation of a respective relative energy difference between parts of the input vector on each side of each boundary between the input vector segments is determined, in a recursive manner. The input vector segments and the representations of the relative energy differences are provided for individual coding. Partitioning units and computer programs for partitioning of input vectors for coding, as well as positional encoders, are presented.
    Type: Grant
    Filed: June 5, 2020
    Date of Patent: February 1, 2022
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: Tomas Jansson Toftgård, Volodya Grancharov, Jonas Svedberg
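    Illustrative sketch (Python; not Ericsson's codec): the abstract segments an input vector non-recursively and derives relative energy differences at segment boundaries recursively. The sketch splits the vector into NSEG roughly equal segments, then visits boundaries in a binary order and emits a log-energy ratio between the parts on each side; the binary visiting order and dB scale are assumptions.
      import numpy as np

      def segment(vec, nseg):
          return np.array_split(np.asarray(vec, dtype=float), nseg)

      def boundary_energy_diffs(segments):
          energies = [float(np.sum(s * s)) + 1e-12 for s in segments]
          diffs = {}

          def recurse(lo, hi):                        # boundaries among segments lo..hi
              if hi - lo < 1:
                  return
              mid = (lo + hi + 1) // 2                # boundary between segments mid-1 and mid
              left, right = sum(energies[lo:mid]), sum(energies[mid:hi + 1])
              diffs[mid] = 10.0 * np.log10(left / right)
              recurse(lo, mid - 1)
              recurse(mid, hi)

          recurse(0, len(segments) - 1)
          return diffs                                # boundary index -> relative energy (dB)

      segs = segment(np.concatenate([np.ones(40), 0.1 * np.ones(40)]), 4)
      print(boundary_energy_diffs(segs))              # middle boundary shows a ~20 dB drop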
  • Patent number: 11227586
    Abstract: Systems and methods for improving the performance of statistical model-based single-channel speech enhancement systems using a deep neural network (DNN) are disclosed. Embodiments include a DNN-trained system to predict speech presence in the input signal, and this information can be used to create frameworks for tracking noise and conducting a priori signal-to-noise ratio estimation. Example frameworks provide increased flexibility for various aspects of system design, such as gain estimation. Examples include training a DNN to detect speech in the presence of both noise and reverberation, enabling joint suppression of additive noise and reverberation. Example frameworks provide significant improvements in objective speech quality metrics relative to baseline systems.
    Type: Grant
    Filed: September 11, 2019
    Date of Patent: January 18, 2022
    Assignee: Massachusetts Institute of Technology
    Inventors: Bengt J. Borgstrom, Michael S. Brandstein, Robert B. Dunn
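    Illustrative sketch (Python; one possible framework of the kind the abstract alludes to, not MIT's system): a per-frame speech-presence probability (a placeholder array standing in for DNN output) gates noise-power tracking, and a decision-directed rule estimates the a priori SNR behind a Wiener-style gain. Smoothing constants, the gain floor, and the toy data are invented.
      import numpy as np

      def enhance_band(power_spec, speech_prob, alpha=0.98, noise_lr=0.1, gain_floor=0.1):
          noise = power_spec[0]                       # crude initial noise estimate
          prev_clean = power_spec[0]
          gains = []
          for p, q in zip(power_spec, speech_prob):
              # Update the noise estimate only to the extent speech is judged absent.
              noise = (1 - (1 - q) * noise_lr) * noise + (1 - q) * noise_lr * p
              post_snr = max(p / noise - 1.0, 0.0)
              prio_snr = alpha * (prev_clean / noise) + (1 - alpha) * post_snr
              gain = max(prio_snr / (1.0 + prio_snr), gain_floor)
              prev_clean = (gain ** 2) * p
              gains.append(gain)
          return np.array(gains)

      power = np.array([1.0, 1.1, 0.9, 8.0, 9.0, 1.0])      # toy single-band frame powers
      prob  = np.array([0.05, 0.05, 0.1, 0.95, 0.95, 0.1])  # stand-in DNN speech presence
      print(np.round(enhance_band(power, prob), 2))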
  • Patent number: 11217237
    Abstract: At least one exemplary embodiment is directed to a method and device for voice operated control with learning. The method can include measuring a first sound received from a first microphone, measuring a second sound received from a second microphone, detecting a spoken voice based on an analysis of measurements taken at the first and second microphone, learning from the analysis when the user is speaking and a speaking level in noisy environments, training a decision unit from the learning to be robust to a detection of the spoken voice in the noisy environments, mixing the first sound and the second sound to produce a mixed signal, and controlling the production of the mixed signal based on the learning of one or more aspects of the spoken voice and ambient sounds in the noisy environments.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: January 4, 2022
    Assignee: Staton Techiya, LLC
    Inventors: John Usher, Steven Goldstein, Marc Boillot
  • Patent number: 11216853
    Abstract: A method and system for advertising dynamic content in an immersive digital medium user experience operate a plurality of computer processors and databases in an associated network for receiving, processing, and communicating instructions and data relating to advertising content in the immersive digital medium user experience. The method and system execute instructions and process data relating to advertising objects, the advertising objects comprising images of objects, signs, labels, and related indicia of object origin for indicating sources for purchasing one or more objects being advertised. The method and system receive advertising instructions and data from a plurality of software applications and further respond to variations in said advertising instructions and data, whereby operation of said computer processors and databases enables swapping out various advertising messages and images according to the context of said immersive digital medium user experience.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: January 4, 2022
    Inventor: Quintan Ian Pribyl
  • Patent number: 11159589
    Abstract: Described are a system, method, and computer program product for task-based teleconference management. The method includes initiating a teleconference bridge and generating a teleconference session hosted by the bridge. The method also includes connecting teleconference participants of an organization to the bridge and receiving a participant identifier for each participant. The method further includes determining an association of an organization group with each participant based on a respective participant identifier. The method further includes generating display data configured to cause a computing device to display a control interface depicting: (i) the teleconference session having groups of participants, the groups selected from predetermined groups based on task data, and each participant visually associated with its group; and (ii) labels of each participant to identify the group associated therewith.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: October 26, 2021
    Assignee: Visa International Service Association
    Inventors: Yi Shen, Trinath Anaparthi, Sangram Pattanaik
  • Patent number: 11134330
    Abstract: Embodiments of the invention determine a speech estimate using a bone conduction sensor or accelerometer, without employing voice activity detection gating of speech estimation. Speech estimation is based either exclusively on the bone conduction signal, or is performed in combination with a microphone signal. The speech estimate is then used to condition an output signal of the microphone. There are multiple use cases for speech processing in audio devices.
    Type: Grant
    Filed: July 12, 2019
    Date of Patent: September 28, 2021
    Assignee: Cirrus Logic, Inc.
    Inventors: David Leigh Watts, Brenton Robert Steele, Thomas Ivan Harvey, Vitaliy Sapozhnykov
  • Patent number: 11127416
    Abstract: A method and an apparatus for voice activity detection provided in embodiments of the present disclosure allow for dividing a to-be-detected audio file into frames to obtain a first sequence of audio frames, extracting an acoustic feature of each audio frame in the first sequence of audio frames, and then inputting the acoustic feature of each audio frame to a noise-added VAD model in chronological order to obtain a probability value of each audio frame in the first sequence of audio frames; and then determining, by an electronic device, a start and an end of the voice signal according to the probability value of each audio frame. During the VAD detection, the start and the end of a voice signal in an audio file are recognized with a noise-added VAD model to realize the purpose of accurately recognizing the start and the end of the voice signal.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: September 21, 2021
    Inventor: Chao Li
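    Illustrative sketch (Python; not Baidu's model): given the per-frame probability values the abstract describes, the start of the voice signal can be taken as the first frame of a sufficiently long high-probability run and the end as the last speech frame before a sufficiently long low-probability run. The threshold and run lengths are invented for illustration.
      def find_endpoints(probs, threshold=0.5, min_speech_frames=3, min_silence_frames=5):
          start = end = None
          run = 0
          for i, p in enumerate(probs):
              if start is None:
                  run = run + 1 if p >= threshold else 0
                  if run >= min_speech_frames:
                      start = i - min_speech_frames + 1
                      run = 0
              else:
                  run = run + 1 if p < threshold else 0
                  if run >= min_silence_frames:
                      end = i - min_silence_frames
                      break
          if start is not None and end is None:
              end = len(probs) - 1
          return start, end

      probs = [0.1, 0.2, 0.7, 0.8, 0.9, 0.85, 0.3, 0.2, 0.1, 0.1, 0.05, 0.1]
      print(find_endpoints(probs))                    # (2, 5)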
  • Patent number: 11094328
    Abstract: Various embodiments herein each include at least one of systems, methods, and software for conference audio manipulation for inclusion and accessibility. One embodiment is in the form of a method that may be performed, for example, on a server or a participant computing device. This method includes receiving a voice signal via a network and modifying an audible characteristic of the voice signal that is perceptible when the voice signal is audibly output. The method further includes outputting the voice signal including the modified audible characteristic.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: August 17, 2021
    Assignee: NCR Corporation
    Inventor: Phil Noel Day
  • Patent number: 11087231
    Abstract: This disclosure is directed to an apparatus for intelligent matching of disparate input data received from disparate input data systems in a complex computing network for establishing targeted communication to a computing device associated with the intelligently matched disparate input data.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: August 10, 2021
    Assignee: Research Now Group, LLC
    Inventors: Melanie D. Courtright, Vincent P. Derobertis, Michael D. Bigby, William C. Robinson, Greg Ellis, Heidi D. E. Wilton, John R. Rothwell, Jeremy S. Antoniuk
  • Patent number: 11062094
    Abstract: A method of analyzing sentiments includes receiving one or more strings of text, identifying sentiments related to a first topic from the one or more strings of text, and assigning a sentiment score to each of the sentiments related to the first topic, where the sentiment score corresponds to a degree of positivity or negativity of a sentiment of the sentiments. The method further includes calculating an average sentiment score for the first topic based on the sentiment score for each of the sentiments related to the first topic, determining a percentile for the first topic based on a frequency of sentiments related to the first topic, where the percentile for the first topic is determined with respect to a maximum frequency of sentiments related to one or more other topics, and computing an X-Score based on the average sentiment score and the percentile of the first topic.
    Type: Grant
    Filed: May 7, 2019
    Date of Patent: July 13, 2021
    Assignee: LANGUAGE LOGIC, LLC
    Inventors: Rick Kieser, Charles Baylis, Serge Luyens
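    Illustrative sketch (Python; the combination formula is an assumption, since this abstract does not pin it down): average the per-mention sentiment scores for a topic, convert the topic's mention frequency into a percentile of the busiest topic, and combine the two into an X-Score, here simply by scaling the signed average by the percentile.
      def x_score(sentiment_scores, topic_frequency, max_topic_frequency):
          avg = sum(sentiment_scores) / len(sentiment_scores)          # e.g. -1.0 .. +1.0
          percentile = 100.0 * topic_frequency / max_topic_frequency   # vs. the busiest topic
          return avg * percentile

      # Hypothetical topic: mostly negative mentions, moderately discussed.
      print(round(x_score([-0.8, -0.5, 0.2, -0.6],
                          topic_frequency=40, max_topic_frequency=160), 1))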
  • Patent number: 11049492
    Abstract: Described herein are real-time musical translation devices (RETM) and methods of use thereof. Exemplary uses of RETMs include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user.
    Type: Grant
    Filed: November 10, 2020
    Date of Patent: June 29, 2021
    Assignee: YAO THE BARD, LLC
    Inventors: Leonardus H. T. Van Der Ploeg, Halley Young
  • Patent number: 11038787
    Abstract: In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus thereof for selecting a packet loss concealment procedure for a lost audio frame of a received audio signal. A method for selecting a packet loss concealment procedure comprises detecting an audio type of a received audio frame and determining a packet loss concealment procedure based on the audio type. In the method, detecting an audio type comprises determining a stability of a spectral envelope of signals of received audio frames.
    Type: Grant
    Filed: October 1, 2019
    Date of Patent: June 15, 2021
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventor: Stefan Bruhn
  • Patent number: 11011160
    Abstract: A computerized system for transforming recorded speech into a derived expression of intent from the recorded speech includes: (1) a text classification module comparing a transcription of at least a portion of recorded speech against a text classifier to generate a first set of one or more of the representations of potential intents based upon such comparison; (2) a phonetics classification module comparing a phonetic transcription of at least a portion of the recorded speech against a phonetics classifier to generate a second set of one or more of the representations of potential intents based upon such comparison; (3) an audio classification module comparing an audio version of at least a portion of the recorded speech with an audio classifier to generate a third set of one or more of the representations of potential intents based upon such comparison; and a (4) discriminator module for receiving the first, second and third sets of the one or more representations of potential intents and generating at least
    Type: Grant
    Filed: January 19, 2018
    Date of Patent: May 18, 2021
    Assignee: OPEN WATER DEVELOPMENT LLC
    Inventor: Moshe Villaizan
  • Patent number: 10984813
    Abstract: A method and an apparatus for detecting correctness of a pitch period are provided, where the method for detecting correctness of a pitch period includes determining, according to an initial pitch period of an input signal in a time domain, a pitch frequency bin of the input signal, where the initial pitch period is obtained by performing open-loop detection on the input signal, determining, based on an amplitude spectrum of the input signal in a frequency domain, a pitch period correctness decision parameter, associated with the pitch frequency bin, of the input signal, and determining correctness of the initial pitch period according to the pitch period correctness decision parameter. Hence, the method and apparatus improve the accuracy of detecting correctness of the pitch period with a relatively low-complexity algorithm.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: April 20, 2021
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Fengyan Qi, Lei Miao
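    Illustrative sketch (Python; not Huawei's decision rule): map an open-loop pitch period to its expected frequency bin and form a simple correctness parameter by comparing the amplitude spectrum around that bin (and its first harmonic) with the average spectral amplitude; a small ratio suggests the initial pitch period is wrong. The FFT size, band width, and toy signal are invented.
      import numpy as np

      def pitch_correctness(signal, fs, pitch_period_samples, nfft=1024, band=2):
          spec = np.abs(np.fft.rfft(signal, nfft))
          f0 = fs / pitch_period_samples
          bins = [int(round(k * f0 * nfft / fs)) for k in (1, 2)]   # fundamental + 1st harmonic
          peak = sum(spec[max(0, b - band): b + band + 1].max() for b in bins) / 2.0
          return peak / (np.mean(spec) + 1e-12)                     # decision parameter

      fs = 8000
      t = np.arange(0, 0.064, 1.0 / fs)
      voiced = np.sin(2 * np.pi * 125.0 * t) + 0.5 * np.sin(2 * np.pi * 250.0 * t)
      print(round(pitch_correctness(voiced, fs, pitch_period_samples=64), 1))  # correct period: large value
      print(round(pitch_correctness(voiced, fs, pitch_period_samples=47), 1))  # wrong period: small value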
  • Patent number: 10971191
    Abstract: A generally diverse set of audiovisual clips is sourced from one or more repositories for use in preparing a coordinated audiovisual work. In some cases, audiovisual clips are retrieved using tags such as user-assigned hashtags or metadata. Pre-existing associations of such tags can be used as hints that certain audiovisual clips are likely to share correspondence with an audio signal encoding of a particular song or other audio baseline. Clips are evaluated for computationally determined correspondence with an audio baseline track. In general, comparisons of audio power spectra, of rhythmic features, tempo, pitch sequences and other extracted audio features may be used to establish correspondence. For clips exhibiting a desired level of correspondence, computationally determined temporal alignments of individual clips with the baseline audio track are used to prepare a coordinated audiovisual work that mixes the selected audiovisual clips with the audio track.
    Type: Grant
    Filed: June 15, 2015
    Date of Patent: April 6, 2021
    Inventors: Mark T. Godfrey, Turner Evan Kirk, Ian S. Simon, Nick Kruge
  • Patent number: 10938366
    Abstract: A volume level meter has a housing that is mounted on a microphone and is connected to a pop filter positioned in front of a vocalist and adjacent to the microphone. The display faces the vocalist and is arranged on the housing so that it indicates a volume level of audio signals received from the microphone. The vocalist can see indicators on the display and know the volume level of the audio signal from the microphone. This allows the vocalist to monitor the volume level indicators of the volume level display and control their vocal volume levels based on the indicators. In this way, the vocalist can reduce fluctuations in vocal volume levels that may lead to distortion of the audio signal by monitoring the volume level display.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: March 2, 2021
    Inventors: Joseph N Griffin, Corey D Chapman
  • Patent number: 10924193
    Abstract: Embodiments include techniques for transmitting and receiving radio frequency (RF) signals, where the techniques for generating, via a digital analog converter (DAC), a frequency signal, and filtering the frequency signal to produce a first filtered signal and a second filtered signal. The techniques also include transmitting the second filtered signal to a device under test, and filtering the second filtered signal into a sub-signal having one or more components. The techniques include mixing the first filtered signal with the sub-signal to produce a first mixed signal, subsequently mixing the first mixed signal with an output signal received from the device under test to produce a second mixed signal, and converting the second mixed signal for analysis.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: February 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Mohit Kapur, Muir Kumph
  • Patent number: 10902841
    Abstract: Systems, methods, and computer program products customize and deliver contextually relevant, artificially synthesized, voiced content that is targeted toward the individual user behaviors, viewing habits, experiences, and preferences of each individual user accessing the content of a content provider. A network-accessible profile service collects and analyzes user profile data and recommends contextually applicable voices based on the user's profile data. As user input accesses or triggers voiced content maintained by a content provider, the voiced content delivered to the user is a modified version comprising artificially synthesized human speech mimicking the recommended voice and delivering the dialogue of the voiced content in a manner that imitates the sounds and speech patterns of the recommended voice.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Su Liu, Eric J. Rozner, Inseok Hwang, Chungkuk Yoo
  • Patent number: 10878800
    Abstract: According to one aspect of the present disclosure, a computer-implemented method for changing a voice interacting with a user can be provided. Identity information for a user can be received. The identity information can be analyzed to identify the user. Voice change information for the user indicating help for the user to understand the voice can be retrieved. A change to be made to the voice based on retrieved user information can be made. The changed voice can be provided to the user.
    Type: Grant
    Filed: May 29, 2019
    Date of Patent: December 29, 2020
    Assignee: Capital One Services, LLC
    Inventors: Anh Truong, Mark Watson, Jeremy Goodsitt, Vincent Pham, Fardin Abdi Taghi Abad, Kate Key, Austin Walters, Reza Farivar
  • Patent number: 10867525
    Abstract: Computer-implemented systems and methods are provided for automatically generating recitation items. For example, a computer performing the recitation item generation can receive one or more text sets that each includes one or more texts. The computer can determine a value for each text set using one or more metrics, such as a vocabulary difficulty metric, a syntactic complexity metric, a phoneme distribution metric, a phonetic difficulty metric, and a prosody distribution metric. Then the computer can select a final text set based on the value associated with each text set. The selected final text set can be used as the recitation items for a speaking assessment test.
    Type: Grant
    Filed: February 13, 2018
    Date of Patent: December 15, 2020
    Assignee: Educational Testing Service
    Inventors: Su-Youn Yoon, Lei Chen, Keelan Evanini, Klaus Zechner
  • Patent number: 10861482
    Abstract: Temporal regions of a time-based media program that contain spoken dialog in a language that is dubbed from a primary language are identified automatically. A primary language audio track of the media program is compared with an alternate language audio track. Closely similar regions are assumed not to contain dubbed dialog, while the temporal inverse of the similar regions are candidate regions for containing dubbed speech. The candidate regions are provided to a dub validator to facilitate locating each region to be validated without having to play back or search the entire time-based media program. Corresponding regions of the primary and alternate language tracks that are closely similar and that contain voice activity are candidate regions of forced narrative, and the temporal locations of these regions may be used by a validator to facilitate rapid validation of forced narrative in the program.
    Type: Grant
    Filed: October 12, 2018
    Date of Patent: December 8, 2020
    Assignee: Avid Technology, Inc.
    Inventors: Jacob B. Garland, Vedantha G. Hothur
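    Illustrative sketch (Python; not Avid's pipeline): compare frame-aligned windows of the primary and alternate language tracks (here as per-window feature vectors), treat closely similar windows as "not dubbed", and return the temporal inverse as candidate dubbed regions merged into spans. The cosine-similarity measure, threshold, and toy features are assumptions.
      import numpy as np

      def candidate_dub_regions(primary, alternate, similarity_threshold=0.95):
          primary, alternate = np.asarray(primary, float), np.asarray(alternate, float)
          sims = []
          for a, b in zip(primary, alternate):         # per-window cosine similarity
              denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
              sims.append(float(np.dot(a, b) / denom))
          candidate = [s < similarity_threshold for s in sims]   # dissimilar -> maybe dubbed
          regions, start = [], None
          for i, flag in enumerate(candidate + [False]):
              if flag and start is None:
                  start = i
              elif not flag and start is not None:
                  regions.append((start, i - 1))
                  start = None
          return regions

      rng = np.random.default_rng(1)
      shared = rng.normal(size=(10, 8))
      alternate = shared.copy()
      alternate[3:6] = rng.normal(size=(3, 8))          # windows 3-5 carry different (dubbed) audio
      print(candidate_dub_regions(shared, alternate))   # [(3, 5)]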
  • Patent number: 10854217
    Abstract: A wind noise filtering device includes a mixer, an extraction unit, a decision unit, a wind noise filter and an output module. The mixer receives a source sound and outputs an input audio. The extraction unit is electrically connected to the mixer to receive the input audio, and the extraction unit performs feature extraction on the input audio to generate a plurality of feature data. The decision unit is electrically connected to the extraction unit to receive the feature data, and the decision unit outputs a decision signal according to the plurality of feature data. The wind noise filter is electrically connected to the decision unit to receive the decision signal and is controlled to be turned on or off by the decision signal. The output module is electrically connected to the wind noise filter and the mixer to output an output audio according to the input audio or the filtered audio.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: December 1, 2020
    Assignee: COMPAL ELECTRONICS, INC.
    Inventor: Chung-Han Lin
  • Patent number: 10847179
    Abstract: The present disclosure provides a method, an apparatus and a device for recognizing voice endpoints. In the method of the present disclosure, a start point recognition model and a finish point recognition model are obtained by training a cyclic neural network with a start point training set and a finish point training set, respectively, and a voice start point frame among audio frames is recognized according to each of the acoustic features of the audio frames and the start point recognition model, thereby keeping the accuracy of start point frame recognition as high as possible without affecting the delay time of finish point frame recognition; and a voice finish point frame among the audio frames is recognized according to the acoustic features of the audio frames and the finish point recognition model.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: November 24, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Weixin Zhu
  • Patent number: 10838881
    Abstract: A device management server computer (“server”) is programmed to manage a plurality of input devices and output devices in a physical room. The server is programmed to analyze media data capturing actions performed by a user in real time as a participant in the physical room, determine from the analysis how the user would like to connect at least one of the input devices and one of the output devices, and enable the connection. The server is programmed to interpret the actions and derive commands for connecting two or more devices based on predetermined data regarding the input devices and output devices and rules for referring to and connecting these devices.
    Type: Grant
    Filed: April 26, 2019
    Date of Patent: November 17, 2020
    Assignee: XIO RESEARCH, INC.
    Inventors: Aditya Vempaty, Robert Smith, Shom Ponoth, Sharad Sundararajan, Ravindranath Kokku, Robert Hutter, Satya Nitta
  • Patent number: 10818311
    Abstract: An auditory selection method based on a memory and attention model, including: step S1, encoding an original speech signal into a time-frequency matrix; step S2, encoding and transforming the time-frequency matrix to convert the matrix into a speech vector; step S3, using a long-term memory unit to store a speaker and a speech vector corresponding to the speaker; step S4, obtaining a speech vector corresponding to a target speaker, and separating a target speech from the original speech signal through an attention selection model. A storage device includes a plurality of programs stored in the storage device. The plurality of programs are configured to be loaded by a processor and execute the auditory selection method based on the memory and attention model. A processing unit includes the processor and the storage device.
    Type: Grant
    Filed: November 14, 2018
    Date of Patent: October 27, 2020
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jiaming Xu, Jing Shi, Bo Xu
  • Patent number: 10818308
    Abstract: Systems, devices, media, and methods are presented for converting sounds in an audio stream. The systems and methods receive an audio conversion request initiating conversion of one or more sound characteristics of an audio stream from a first state to a second state. The systems and methods access an audio conversion model associated with an audio signature for the second state. The audio stream is converted based on the audio conversion model and an audio construct is compiled from the converted audio stream and a base audio segment. The compiled audio construct is presented at a client device.
    Type: Grant
    Filed: April 27, 2018
    Date of Patent: October 27, 2020
    Assignee: Snap Inc.
    Inventor: Wei Chu
  • Patent number: 10789938
    Abstract: A speech synthesis method and device are provided. The method comprises: determining language types of a statement to be synthesized; determining base models corresponding to the language types; determining a target timbre, performing adaptive transformation on the spectrum parameter models based on the target timbre, and training the statement to be synthesized based on the spectrum parameter models subjected to adaptive transformation to generate spectrum parameters; training the statement to be synthesized based on the fundamental frequency parameter models to generate fundamental frequency parameters, and adjusting the fundamental frequency parameters based on the target timbre; and synthesizing the statement to be synthesized into a target speech based on the spectrum parameters and the adjusted fundamental frequency parameters.
    Type: Grant
    Filed: September 5, 2016
    Date of Patent: September 29, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD
    Inventors: Hao Li, Yongguo Kang
  • Patent number: 10777215
    Abstract: A method and system for enhancing a speech signal is provided herein. The method may include the following steps: obtaining an original video, wherein the original video includes a sequence of original input images showing a face of at least one human speaker, and an original soundtrack synchronized with said sequence of images; and processing, using a computer processor, the original video, to yield an enhanced speech signal of said at least one human speaker, by detecting sounds that are acoustically unrelated to the speech of the at least one human speaker, based on visual data derived from the sequence of original input images.
    Type: Grant
    Filed: November 11, 2019
    Date of Patent: September 15, 2020
    Assignee: Yissum Research Development Company of The Hebrew University of Jerusalem Ltd.
    Inventors: Shmuel Peleg, Asaph Shamir, Tavi Halperin, Aviv Gabbay, Ariel Ephrat
  • Patent number: 10762907
    Abstract: An apparatus for improving a transition from a concealed audio signal portion is provided. The apparatus includes a processor being configured to generate a decoded audio signal portion of the audio signal. The processor is configured to generate the decoded audio signal portion using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that for each sample of two or more samples of the second audio signal portion, the sample position of the sample of the two or more samples of the second audio signal portion is equal to the sample position of one of the samples of the decoded audio signal portion.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: September 1, 2020
    Assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
    Inventors: Adrian Tomasek, Jérémie LeComte
  • Patent number: 10755718
    Abstract: A method for classifying speakers includes: receiving, by a speaker recognition system including a processor and memory, input audio including speech from a speaker; extracting, by the speaker recognition system, a plurality of speech frames containing voiced speech from the input audio; computing, by the speaker recognition system, a plurality of features for each of the speech frames of the input audio; computing, by the speaker recognition system, a plurality of recognition scores for the plurality of features; computing, by the speaker recognition system, a speaker classification result in accordance with the recognition scores; and outputting, by the speaker recognition system, the speaker classification result.
    Type: Grant
    Filed: December 7, 2017
    Date of Patent: August 25, 2020
    Inventors: Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Ram Sundaram, Aravind Ganapathiraju
  • Patent number: 10715173
    Abstract: A method for partitioning of input vectors for coding is presented. The method comprises obtaining of an input vector. The input vector is segmented, in a non-recursive manner, into an integer number, NSEG, of input vector segments. A representation of a respective relative energy difference between parts of the input vector on each side of each boundary between the input vector segments is determined, in a recursive manner. The input vector segments and the representations of the relative energy differences are provided for individual coding. Partitioning units and computer programs for partitioning of input vectors for coding, as well as positional encoders, are presented.
    Type: Grant
    Filed: May 7, 2019
    Date of Patent: July 14, 2020
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: Tomas Jansson Toftgård, Volodya Grancharov, Jonas Svedberg
  • Patent number: 10686465
    Abstract: An improved mixed oscillator-and-external-excitation model and methods for estimating the model parameters, for evaluating model quality, and for combining it with methods known in the art are disclosed. The improvement over existing oscillators allows the model to receive, as an input, all except the most recent point in the acquired data. Model stability is achieved through a process which includes restoring data unavailable to the decoder from the optimal model parameters and by using metrics to select a stable restored model output. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech, audio, and image data, and for analysis, detection, and classification of signals. Operations can be performed in real time, and parameterization can be achieved at a user-specified level of compression.
    Type: Grant
    Filed: July 24, 2018
    Date of Patent: June 16, 2020
    Assignee: Luce Communications
    Inventors: Irina Gorodnitsky, Anton Yen
  • Patent number: 10684683
    Abstract: Technologies for natural language interactions with virtual personal assistant systems include a computing device configured to capture audio input, distort the audio input to produce a number of distorted audio variations, and perform speech recognition on the audio input and the distorted audio variants. The computing device selects a result from a large number of potential speech recognition results based on contextual information. The computing device may measure a user's engagement level by using an eye tracking sensor to determine whether the user is visually focused on an avatar rendered by the virtual personal assistant. The avatar may be rendered in a disengaged state, a ready state, or an engaged state based on the user engagement level. The avatar may be rendered as semitransparent in the disengaged state, and the transparency may be reduced in the ready state or the engaged state. Other embodiments are described and claimed.
    Type: Grant
    Filed: January 25, 2019
    Date of Patent: June 16, 2020
    Assignee: Intel Corporation
    Inventor: William C. Deleeuw
  • Patent number: 10679632
    Abstract: An apparatus for decoding an audio signal includes a receiving interface, wherein the receiving interface is configured to receive a first frame and a second frame. Moreover, the apparatus includes a noise level tracing unit for determining noise level information being represented in a tracing domain. Furthermore, the apparatus includes a first reconstruction unit for reconstructing a third audio signal portion of the audio signal depending on the noise level information and a second reconstruction unit for reconstructing a fourth audio signal portion depending on noise level information being represented in the second reconstruction domain.
    Type: Grant
    Filed: January 24, 2018
    Date of Patent: June 9, 2020
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Michael Schnabel, Goran Markovic, Ralph Sperschneider, Jérémie Lecomte, Christian Helmrich
  • Patent number: 10679644
    Abstract: A method, a system, and a computer program product are provided for interpreting low amplitude speech and transmitting amplified speech to a remote communication device. At least one computing device receives sensor data from multiple sensors. The sensor data is associated with the low amplitude speech. At least one of the at least one computing device analyzes the sensor data to map the sensor data to at least one syllable resulting in a string of one or more words. An electronic representation of the string of the one or more words may be generated and transmitted to a remote communication device for producing the amplified speech from the electronic representation.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: June 9, 2020
    Assignee: International Business Machines Corporation
    Inventors: Sarbajit K. Rakshit, Martin G. Keen, James E. Bostick, John M. Ganci, Jr.
  • Patent number: 10679256
    Abstract: A content server uses a form of artificial intelligence such as machine learning to identify audio content with musicological characteristics. The content server obtains an indication of a music item presented by a client device and obtains reference music features describing musicological characteristics of the music item. The content server identifies candidate audio content associated with candidate music features. The candidate music features are determined by analyzing acoustic features of the candidate audio content and mapping the acoustic features to music features according to a music feature model. Acoustic features quantify low-level properties of the candidate audio content. One of the candidate audio content items is selected according to comparisons between the candidate music features of the candidate audio advertisements and the reference music features of the music item. The selected audio content is provided to the client device for presentation.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: June 9, 2020
    Assignee: Pandora Media, LLC
    Inventors: Christopher Irwin, Shriram Bharath, Andrew J. Asman
  • Patent number: 10665253
    Abstract: Voice activity detection (VAD) is an enabling technology for a variety of speech-based applications. Herein disclosed is a robust VAD algorithm that is also language independent. Rather than classifying short segments of the audio as either “speech” or “silence”, the VAD as disclosed herein employs a soft-decision mechanism. The VAD outputs a speech-presence probability, which is based on a variety of characteristics.
    Type: Grant
    Filed: April 23, 2018
    Date of Patent: May 26, 2020
    Assignee: VERINT SYSTEMS LTD.
    Inventor: Ron Wein
  • Patent number: 10650837
    Abstract: Network communication speech handling systems are provided herein. In one example, a method of processing audio signals by a network communications handling node is provided. The method includes processing an audio signal to determine a pitch cycle property associated with the audio signal, determining transfer times for encoded segments of the audio signal based at least in part on the pitch cycle property, and transferring packets comprising one or more encoded segments for delivery to a target node in accordance with the transfer time.
    Type: Grant
    Filed: August 29, 2017
    Date of Patent: May 12, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Karsten Vandborg Sørensen, Sriram Srinivasan, Koen Bernard Vos
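    Illustrative sketch (Python; a loose reading of the abstract, not Microsoft's scheduler): with a pitch cycle length in hand, the signal can be cut into encoded segments spanning whole pitch cycles and a transfer time derived for the packet carrying each segment, so packet pacing follows the pitch cycle property. Segment sizing and pacing choices here are assumptions.
      def schedule_packets(total_ms, pitch_cycle_ms, cycles_per_segment=4, start_time_ms=0.0):
          seg_ms = pitch_cycle_ms * cycles_per_segment
          schedule, t, pos = [], start_time_ms, 0.0
          while pos < total_ms:
              dur = min(seg_ms, total_ms - pos)
              schedule.append({"segment_start_ms": pos, "duration_ms": dur,
                               "transfer_time_ms": t})
              pos += dur
              t += dur                                  # pace transmissions at the segment rate
          return schedule

      for packet in schedule_packets(total_ms=60.0, pitch_cycle_ms=6.25):
          print(packet)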