Patents Examined by Bhavesh M. Mehta
  • Patent number: 11538487
    Abstract: A voice signal enhancing method and device divide a voice signal from the current scene into multiple frame signals based on a preset time interval; feed the frame signals into a trained neural network at a preset step size, performing convolution operations through skip-connected convolutional layers to obtain multiple enhanced frame signals; and superpose the enhanced frame signals according to their positions in the time domain to obtain an enhanced voice signal. Compared with the prior art, the present disclosure enhances voice signals automatically through the neural network without manual intervention, so the effects and application scenarios of voice enhancement need not be limited by a preset method or its designers. This reduces the frequency of signal distortion and extra noise, which in turn improves the results of voice signal enhancement.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: December 27, 2022
    Assignee: YEALINK (XIAMEN) NETWORK TECHNOLOGY CO., LTD.
    Inventors: Wanjian Feng, Lianchang Zhang, Jiantao Liu
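The frame/enhance/superpose pipeline in the abstract above can be sketched as a framing step followed by overlap-add reconstruction. This is a minimal illustration, not the patent's method: the neural-network enhancement is replaced by an identity placeholder, and the frame length and hop are arbitrary.

```python
# Sketch of the frame -> enhance -> superpose pipeline. The "enhance" step is a
# placeholder; the patent uses a skip-connected convolutional network there.

def frame_signal(signal, frame_len, hop):
    """Divide a signal into frames of frame_len samples, spaced hop apart."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames

def overlap_add(frames, hop):
    """Superpose frames back onto the time axis at their original positions."""
    out = [0.0] * (hop * (len(frames) - 1) + len(frames[0]))
    for i, frame in enumerate(frames):
        for j, sample in enumerate(frame):
            out[i * hop + j] += sample
    return out

signal = [float(n) for n in range(16)]
frames = frame_signal(signal, frame_len=4, hop=4)   # non-overlapping for clarity
enhanced = [list(f) for f in frames]                # identity "enhancement"
reconstructed = overlap_add(enhanced, hop=4)
assert reconstructed == signal
```

With an overlapping hop, a matching synthesis window would normally be applied before the superposition so the overlapping contributions sum to unity.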
  • Patent number: 11538461
    Abstract: Some implementations include methods for detecting missing subtitles associated with a media presentation and may include receiving an audio component and a subtitle component associated with a media presentation, the audio component including an audio sequence, the audio sequence divided into a plurality of audio segments; evaluating the plurality of audio segments using a combination of a recurrent neural network and a convolutional neural network to identify refined speech segments associated with the audio sequence, the recurrent neural network trained based on a plurality of languages, the convolutional neural network trained based on a plurality of categories of sound; determining timestamps associated with the identified refined speech segments; and determining missing subtitles based on the timestamps associated with the identified refined speech segments and timestamps associated with subtitles included in the subtitle component.
    Type: Grant
    Filed: March 18, 2021
    Date of Patent: December 27, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Honey Gupta, Mayank Sharma
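The final step of the method above, comparing speech-segment timestamps against subtitle timestamps, can be sketched as an interval-overlap check. The segment detection itself (the patent's RNN + CNN stage) is assumed done upstream, and the overlap threshold is a hypothetical parameter.

```python
# Hypothetical sketch: flag speech segments with insufficient subtitle coverage.

def find_missing_subtitles(speech_segments, subtitle_segments, min_overlap=0.5):
    """Return speech segments whose subtitle coverage ratio is below min_overlap.

    Each segment is a (start, end) pair of timestamps in seconds.
    """
    missing = []
    for s_start, s_end in speech_segments:
        covered = 0.0
        for t_start, t_end in subtitle_segments:
            covered += max(0.0, min(s_end, t_end) - max(s_start, t_start))
        if covered / (s_end - s_start) < min_overlap:
            missing.append((s_start, s_end))
    return missing

speech = [(0.0, 2.0), (5.0, 7.0), (10.0, 12.0)]
subs = [(0.0, 2.5), (9.8, 12.1)]
missing = find_missing_subtitles(speech, subs)
assert missing == [(5.0, 7.0)]   # spoken from 5s to 7s with no subtitle shown
```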
  • Patent number: 11530930
    Abstract: A transportation vehicle having a navigation system and an operating system connected to the navigation system for data transmission via a bus system. The transportation vehicle has a microphone and includes a phoneme generation module for generating phonemes from an acoustic voice signal or the output signal of the microphone; the phonemes are part of a predefined selection of exclusively monosyllabic phonemes; and a phoneme-to-grapheme module for generating inputs to operate the transportation vehicle based on monosyllabic phonemes generated by the phoneme generation module.
    Type: Grant
    Filed: September 12, 2018
    Date of Patent: December 20, 2022
    Assignee: VOLKSWAGEN AKTIENGESELLSCHAFT
    Inventors: Okko Buss, Mark Pleschka
  • Patent number: 11521596
    Abstract: An artificial intelligence (“AI”) system for tuning a machine learning interactive voice response system is provided. The AI system may perform analysis of outputs generated by the machine learning models. The AI system may determine an expected model output for a given test input. The AI system may determine accuracy, precision, and recall scores for an actual output generated in response to the test input. The system may determine performance metrics for interim outputs generated by individual machine learning models within the interactive voice response system. The AI system may replace malfunctioning models with replacement models.
    Type: Grant
    Filed: August 14, 2020
    Date of Patent: December 6, 2022
    Assignee: Bank of America Corporation
    Inventors: Bharathiraja Krishnamoorthy, Emad Noorizadeh, Ravisha Andar
  • Patent number: 11521638
    Abstract: An audio event detection method including performing a framing processing on an audio to obtain audio data for each time period in the audio and extracting a specified feature vector from the audio data of each time period; inputting the specified feature vector of the audio data to a Recurrent Neural Network/Bidirectional Recurrent Neural Network (RNN/BI-RNN) model, to obtain a posterior probability of each pre-set audio event in the audio data of each time period; obtaining, for each time period, a target audio event of the audio data according to the posterior probability of each audio event in the audio data and a pre-set audio decoding algorithm; and extracting an optimal audio data sequence of the target audio event from the audio data of each time period.
    Type: Grant
    Filed: November 1, 2019
    Date of Patent: December 6, 2022
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LTD
    Inventor: Haibo Liu
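The decoding stage of the method above, turning per-period posterior probabilities into target audio events, can be sketched with a deliberately simple decoder. A real decoder (the patent's "pre-set audio decoding algorithm") would add transition costs; here each period simply takes its most probable event, and consecutive periods with the same event are merged into one segment. The event names are hypothetical.

```python
# Minimal sketch: per-frame argmax over event posteriors, then run-length
# merging of identical labels into (event, start_frame, end_frame) segments.

EVENTS = ["silence", "speech", "music"]

def decode_events(posteriors):
    """posteriors: one probability list per time period, one value per event."""
    labels = [max(range(len(p)), key=lambda k: p[k]) for p in posteriors]
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], i)   # extend current run
        else:
            segments.append((label, i, i))               # start a new run
    return [(EVENTS[lbl], a, b) for lbl, a, b in segments]

post = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]
events = decode_events(post)
assert events == [("silence", 0, 0), ("speech", 1, 2), ("music", 3, 3)]
```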
  • Patent number: 11521595
    Abstract: A method for training a speech recognition model with a loss function includes receiving an audio signal including a first segment corresponding to audio spoken by a first speaker, a second segment corresponding to audio spoken by a second speaker, and an overlapping region where the first segment overlaps the second segment. The overlapping region includes a known start time and a known end time. The method also includes generating a respective masked audio embedding for each of the first and second speakers. The method also includes applying a masking loss after the known end time to the respective masked audio embedding for the first speaker when the first speaker was speaking prior to the known start time, or applying the masking loss prior to the known start time when the first speaker was speaking after the known end time.
    Type: Grant
    Filed: May 1, 2020
    Date of Patent: December 6, 2022
    Assignee: Google LLC
    Inventors: Anshuman Tripathi, Han Lu, Hasim Sak
  • Patent number: 11514893
    Abstract: Techniques performed by a data processing system for processing voice content received from a user include: receiving a first audio input from a user comprising spoken content; analyzing the first audio input using one or more natural language processing models to produce a first textual output comprising a textual representation of the first audio input; analyzing the first textual output using one or more machine learning models to determine first context information of the first textual output; and processing the first textual output in an application based on the first context information.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: November 29, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Erez Kikin-Gil, Emily Tran, Benjamin David Smith, Alan Liu, Erik Thomas Oveson
  • Patent number: 11508385
    Abstract: Disclosed are a method of processing a residual signal for audio coding and an audio coding apparatus. The method learns a feature map of a reference signal through a residual signal learning engine comprising a convolutional layer and a neural network, and performs learning based on a mapping between the nodes of the neural network's output layer and the quantization level indices of the residual signal.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: November 22, 2022
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Seung Kwon Beack, Jongmo Sung, Mi Suk Lee, Tae Jin Lee, Hui Yong Kim
  • Patent number: 11508389
    Abstract: An audio signal processing apparatus, audio signal processing system, and audio signal processing method include: a first converting part that converts an input data sequence of an audio signal into frequency data using an IIR-system DFT at each processing timing; a window processing part that performs window processing on the frequency data using a window function; a signal processing part that performs predetermined signal processing on the windowed frequency data; and a second converting part that converts the processed frequency data into a time-axis data sequence.
    Type: Grant
    Filed: February 11, 2021
    Date of Patent: November 22, 2022
    Assignee: Audio-Technica Corporation
    Inventor: Ariisa Wada
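The "first converting part" above suggests a sliding (recursive, IIR-style) DFT, which updates every frequency bin at each new sample instead of recomputing a full transform. The sketch below is one common formulation of that idea under that assumption, not the patent's exact converter: each incoming sample updates all N bins in O(N).

```python
import cmath

# Sliding DFT: bins[k] <- (bins[k] + x_new - x_oldest) * e^{j 2 pi k / N}.
# After each sample, bins holds the DFT of the most recent N samples
# (up to the usual per-bin phase rotation convention).

def sliding_dft(samples, n):
    """Yield the length-n spectrum of the last n samples after each new sample."""
    bins = [0j] * n
    window = [0.0] * n            # circular buffer of the last n samples
    twiddle = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
    for i, x in enumerate(samples):
        oldest = window[i % n]
        window[i % n] = x
        for k in range(n):
            bins[k] = (bins[k] + x - oldest) * twiddle[k]
        yield list(bins)

samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
spectra = list(sliding_dft(samples, n=4))
# Bin 0 telescopes to the running sum of the last 4 samples.
assert abs(spectra[-1][0] - sum(samples[-4:])) < 1e-9
```

Windowing in the frequency domain, as the abstract describes, is typically done by convolving adjacent bins rather than multiplying in the time domain.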
  • Patent number: 11495234
    Abstract: A data mining device, and a speech recognition method and system using the same are disclosed. The speech recognition method includes selecting speech data including a dialect from speech data, analyzing and refining the speech data including a dialect, and learning an acoustic model and a language model through an artificial intelligence (AI) algorithm using the refined speech data including a dialect. The user is able to use a dialect speech recognition service which is improved using services such as eMBB, URLLC, or mMTC of 5G mobile communications.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: November 8, 2022
    Assignee: LG Electronics Inc.
    Inventors: Jee Hye Lee, Seon Yeong Park
  • Patent number: 11462228
    Abstract: A speech intelligibility calculating method is a method executed by a speech intelligibility calculating apparatus, the speech intelligibility calculating method including: a speech intelligibility calculating step of calculating a speech intelligibility that is an objective assessment index of a speech quality, based on a difference component between features found through an analysis of an input clean speech and an input enhanced speech, using one or more filter banks; and a step of outputting the speech intelligibility calculated at the speech intelligibility calculating step. This speech intelligibility calculating method is capable of calculating a speech intelligibility without any dependency on a speech enhancement method.
    Type: Grant
    Filed: August 3, 2018
    Date of Patent: October 4, 2022
    Assignees: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, WAKAYAMA UNIVERSITY
    Inventors: Shoko Araki, Tomohiro Nakatani, Keisuke Kinoshita, Toshio Irino, Toshie Matsui, Katsuhiko Yamamoto
  • Patent number: 11443761
    Abstract: A technique, suitable for real-time processing, is disclosed for pitch tracking by detection of glottal excitation epochs in speech signal. It uses Hilbert envelope to enhance saliency of the glottal excitation epochs and to reduce the ripples due to the vocal tract filter. The processing comprises the steps of dynamic range compression, calculation of the Hilbert envelope, and epoch marking. The Hilbert envelope is calculated using the output of a FIR filter based Hilbert transformer and the delay-compensated signal. The epoch marking uses a dynamic peak detector with fast rise and slow fall and nonlinear smoothing to further enhance the saliency of the epochs, followed by a differentiator or a Teager energy operator, and amplitude-duration thresholding. The technique is meant for use in speech codecs, voice conversion, speech and speaker recognition, diagnosis of voice disorders, speech training aids, and other applications involving pitch estimation.
    Type: Grant
    Filed: August 3, 2019
    Date of Patent: September 13, 2022
    Inventors: Prem Chand Pandey, Hirak Dasgupta, Nataraj Kathriki Shambulingappa
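The Hilbert envelope at the core of the pitch-tracking technique above can be illustrated offline. The patent uses an FIR Hilbert transformer with delay compensation for real-time use; this sketch instead builds the analytic signal through a plain DFT (zeroing the negative-frequency half), which is equivalent for a short buffer but not a real-time substitute.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive O(n^2) DFT (sign=-1) or unscaled inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m in range(n)) for k in range(n)]

def hilbert_envelope(x):
    """Magnitude of the analytic signal: keep only positive frequencies."""
    n = len(x)
    spectrum = dft(x)
    analytic_spec = [0j] * n
    analytic_spec[0] = spectrum[0]
    for k in range(1, n // 2):
        analytic_spec[k] = 2 * spectrum[k]   # double the positive frequencies
    if n % 2 == 0:
        analytic_spec[n // 2] = spectrum[n // 2]
    analytic = [v / n for v in dft(analytic_spec, sign=+1)]
    return [abs(v) for v in analytic]

# Sanity check: a pure cosine has a flat envelope equal to its amplitude,
# whereas glottal excitation epochs show up as sharp envelope peaks.
n = 64
tone = [3.0 * math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
env = hilbert_envelope(tone)
assert all(abs(e - 3.0) < 1e-6 for e in env)
```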
  • Patent number: 11380343
    Abstract: A method for encoding an audio signal, comprising using one or more algorithms operating on a processor to filter the audio signal into two output signals, wherein each output signal has a sampling rate that is equal to a sampling rate of the audio signal, and wherein one of the output signals includes high frequency data. Using one or more algorithms operating on the processor to window the high frequency data by selecting a set of the high frequency data. Using one or more algorithms operating on the processor to determine a set of linear predictive coding (LPC) coefficients for the windowed data. Using one or more algorithms operating on the processor to generate energy scale values for the windowed data. Using one or more algorithms operating on the processor to generate an encoded high frequency bitstream.
    Type: Grant
    Filed: September 12, 2019
    Date of Patent: July 5, 2022
    Assignee: IMMERSION NETWORKS, INC.
    Inventors: James David Johnston, King Wei Hor
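The LPC step in the encoding method above is conventionally computed by autocorrelation followed by the Levinson-Durbin recursion; the sketch below shows that standard computation, not the patent's specific encoder, and the AR(1) test signal is illustrative.

```python
import random

def autocorrelation(frame, max_lag):
    """r[lag] = sum of frame[n] * frame[n - lag] over valid n."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations; returns [1, a1, ..., a_order]."""
    a = [0.0] * (order + 1)
    a[0] = 1.0
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                    # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= (1.0 - k * k)              # prediction error shrinks each order
    return a

# An AR(1) process x[n] = 0.9 x[n-1] + e[n] should yield a1 close to -0.9.
random.seed(0)
x = [0.0]
for _ in range(5000):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
r = autocorrelation(x, max_lag=1)
a = levinson_durbin(r, order=1)
assert abs(a[1] + 0.9) < 0.05
```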
  • Patent number: 11189302
    Abstract: A speech emotion detection system may obtain to-be-detected speech data. The system may generate speech frames based on framing processing and the to-be-detected speech data. The system may extract speech features corresponding to the speech frames to form a speech feature matrix corresponding to the to-be-detected speech data. The system may input the speech feature matrix to an emotion state probability detection model. The system may generate, based on the speech feature matrix and the emotion state probability detection model, an emotion state probability matrix corresponding to the to-be-detected speech data. The system may input the emotion state probability matrix and the speech feature matrix to an emotion state transition model. The system may generate an emotion state sequence based on the emotion state probability matrix, the speech feature matrix, and the emotion state transition model. The system may determine an emotion state based on the emotion state sequence.
    Type: Grant
    Filed: October 11, 2019
    Date of Patent: November 30, 2021
    Assignee: Tencent Technology (Shenzhen) Company Limited
    Inventor: Haibo Liu
  • Patent number: 11164582
    Abstract: Set forth is a motorized computing device that selectively navigates to a user according to the content of a spoken utterance directed at the motorized computing device. The motorized computing device can modify operations of one or more of its motors according to whether the user provided a spoken utterance while those motors were operating. The motorized computing device can render content according to interactions between the user and an automated assistant. For instance, when the automated assistant is requested to provide graphical content for the user, the motorized computing device can navigate to the user in order to present the content to the user. However, in some implementations, when the user requests audio content, the motorized computing device can bypass navigating to the user when it is already within a suitable distance from the user for audibly rendering the audio content.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: November 2, 2021
    Assignee: GOOGLE LLC
    Inventors: Scott Stanford, Keun-Young Park, Vitalii Tomkiv, Hideaki Matsui, Angad Sidhu
  • Patent number: 11132509
    Abstract: A speech interface device is configured to perform natural language understanding (NLU) processing in a manner that optimizes the use of resources on the speech interface device. In an example process, a domain classifier(s) is used to generate domain classifier scores associated with multiple candidate domains, and the candidate domains can then be evaluated, one candidate domain at a time, in accordance with the domain classifier scores (e.g., starting with the highest scoring candidate domain). For each candidate domain undergoing the evaluation, input data is processed by that domain's NLU model(s), and, as soon as a domain-specific NLU model(s) produces an NLU result with a confidence score that satisfies a threshold confidence score, the evaluation can be stopped for any remaining candidate domains.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: September 28, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Stanislaw Ignacy Pasko, Ross William McGowan, Aliaksei Kuzmin, Rui Liu
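The evaluation loop described above can be sketched as trying candidate domains in descending classifier-score order and stopping as soon as one domain's NLU result clears the confidence threshold. The domain names, scores, and the stand-in NLU models below are hypothetical.

```python
# Sketch: score-ordered evaluation with early stopping, so lower-scoring
# domains never pay the cost of running their NLU models.

def resolve_domain(domain_scores, nlu_models, utterance, threshold):
    """domain_scores: {domain: classifier score};
    nlu_models: {domain: fn(utterance) -> NLU confidence}."""
    evaluated = []
    for domain in sorted(domain_scores, key=domain_scores.get, reverse=True):
        confidence = nlu_models[domain](utterance)
        evaluated.append(domain)
        if confidence >= threshold:
            return domain, evaluated      # stop: remaining domains are skipped
    return None, evaluated

scores = {"music": 0.9, "smart_home": 0.7, "weather": 0.2}
models = {
    "music": lambda text: 0.3,        # high classifier score, weak NLU result
    "smart_home": lambda text: 0.95,  # clears the threshold; evaluation stops
    "weather": lambda text: 0.99,     # never evaluated
}
domain, tried = resolve_domain(scores, models, "turn on the lights", 0.8)
assert domain == "smart_home"
assert tried == ["music", "smart_home"]
```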
  • Patent number: 10991362
    Abstract: Provided is a target speech signal extraction method for robust speech recognition including: receiving information on a direction of arrival of the target speech source with respect to the microphones; generating a nullformer by using the information on the direction of arrival of the target speech source to remove the target speech signal from the input signals and to estimate noise; setting a real output of the target speech source using an adaptive vector as a first channel and setting a dummy output by the nullformer as a remaining channel; setting a cost function for minimizing dependency between the real output of the target speech source and the dummy output using the nullformer by performing independent component analysis (ICA) or independent vector analysis (IVA); setting an auxiliary function to the cost function; and estimating the target speech signal by using the cost function and the auxiliary function.
    Type: Grant
    Filed: April 15, 2020
    Date of Patent: April 27, 2021
    Assignee: INDUSTRY-UNIVERSITY COOPERATION FOUNDATION SOGANG UNIVERSITY
    Inventors: Hyung Min Park, Seoyoung Lee, Seung-Yun Kim, Byung Joon Cho, Uihyeop Shin
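The nullformer above removes the target from the inputs to obtain a noise estimate. A simplified two-microphone version of that idea, assuming the direction of arrival translates to an integer sample delay, is to delay-align the target across channels and subtract, so the target cancels and only noise remains. This is an illustrative reduction, not the patent's multichannel formulation.

```python
import math

def nullformer(mic_near, mic_far, target_delay):
    """Delay the near channel to align the target, then subtract: the target
    component cancels, leaving a noise estimate (the 'dummy output')."""
    out = []
    for n in range(len(mic_far)):
        aligned = mic_near[n - target_delay] if n >= target_delay else 0.0
        out.append(mic_far[n] - aligned)
    return out

# The target reaches the far microphone two samples after the near one.
n_samples, delay = 64, 2
s = [math.sin(0.3 * n) for n in range(n_samples)]
mic_near = s                                      # target hits this mic first
mic_far = [0.0] * delay + s[:n_samples - delay]   # same target, 2 samples later
noise_estimate = nullformer(mic_near, mic_far, target_delay=delay)
assert all(abs(v) < 1e-9 for v in noise_estimate)  # target fully cancelled
```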
  • Patent number: 10930277
    Abstract: A voice interaction architecture has a hands-free, electronic voice controlled assistant that permits users to verbally request information from cloud services. Since the assistant relies primarily, if not exclusively, on voice interactions, configuring the assistant for the first time may pose a challenge, particularly to a novice user who is unfamiliar with network settings (such as wifi access keys). The architecture supports several approaches to configuring the voice controlled assistant that may be accomplished without much or any user input, thereby promoting a positive out-of-box experience for the user. More particularly, these approaches involve use of audible or optical signals to configure the voice controlled assistant.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: February 23, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Tony David, Parag Garg
  • Patent number: 10923101
    Abstract: A system, a computer program product, and a method for controlling synthesized speech output on a voice-controlled device. The voice-controlled device recognizes that speech input is being received and outputs synthesized speech based on that input. While the synthesized speech is being output, audio is captured. If the voice-controlled device recognizes the captured audio as speech, it pauses the output of synthesized speech; otherwise, it pauses the output when the captured audio, though not recognized as speech, is above a settable background noise threshold. The paused output of synthesized speech is resumed when the pause falls within a settable pause timeframe.
    Type: Grant
    Filed: December 26, 2017
    Date of Patent: February 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Shang Qing Guo, Jonathan Lenchner
  • Patent number: 10909992
    Abstract: The lossless coding method includes selecting one of a first coding method and a second coding method, based on a range in which a quantization index of energy is represented, and coding the quantization index by using the selected coding method. The lossless decoding method includes determining a coding method of a differential quantization index of energy included in a bitstream and decoding the differential quantization index by using one of a first decoding method and a second decoding method based on a range in which a quantization index of energy is represented, in response to the determined coding method.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: February 2, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Ki-hyun Choo
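The selection step above, choosing between two coding methods based on the range in which a quantization index is represented, can be illustrated with a toy encoder. The cutoff, code widths, and the two "methods" below are illustrative stand-ins, not the patent's actual codes.

```python
# Toy sketch: small differential indices take a short fixed-width code;
# out-of-range values fall back to an escape-coded wider representation.

def encode_index(diff_index, small_range=16):
    """Pick a coding method from the range of the differential index."""
    if -small_range <= diff_index < small_range:
        # method 1: 5-bit fixed-width code covering [-16, 15]
        return ("fixed", format(diff_index + small_range, "05b"))
    # method 2: escape marker plus a 16-bit code for out-of-range values
    return ("escape", format(diff_index & 0xFFFF, "016b"))

method, bits = encode_index(5)
assert method == "fixed" and len(bits) == 5
method, bits = encode_index(200)
assert method == "escape" and len(bits) == 16
```

A decoder would mirror the same range test, reading the method marker first and then the corresponding number of bits, as the lossless decoding method above describes.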