End Point Detection (EPO) Patents (Class 704/E11.005)
  • Patent number: 11972751
    Abstract: Disclosed are a method and an apparatus for detecting a voice end point by using acoustic and language modeling information to achieve robust voice recognition. A voice end point detection method according to an embodiment may comprise the steps of: inputting an acoustic feature vector sequence extracted from a microphone input signal into an acoustic embedding extraction unit, a phonemic embedding extraction unit, and a decoder embedding extraction unit, which are based on a recurrent neural network (RNN); combining the acoustic embedding, phonemic embedding, and decoder embedding produced by the acoustic embedding extraction unit, the phonemic embedding extraction unit, and the decoder embedding extraction unit, respectively, to configure a feature vector; and inputting the combined feature vector into a deep neural network (DNN)-based classifier to detect a voice end point.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 30, 2024
    Assignee: IUCF-HYU (INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY)
    Inventors: Joon-Hyuk Chang, Inyoung Hwang
  • Patent number: 11921916
    Abstract: Image editing on a wearable device includes a system which obtains sensor data via the wearable device. The sensor data includes a representation of hand movement, head movement or voice command associated with a user. The system executes an application for editing an image based on the obtained sensor data. The system provides for display a list of image adjustment types associated with the application. The system selects an image adjustment type based on one or more of the hand movement, the head movement or the voice command. The system provides for display a prompt having options to adjust a property of the selected image adjustment type. The system selects one of the options included in the prompt. The system modifies an image based on the selected option. The system then provides the modified image for storage in a data structure of a memory unit in the wearable device.
    Type: Grant
    Filed: December 31, 2020
    Date of Patent: March 5, 2024
    Assignee: Google LLC
    Inventors: Thomas Binder, Ronald Frank Wotzlaw
  • Patent number: 11817117
    Abstract: In various examples, end of speech (EOS) for an audio signal is determined based at least in part on a rate of speech for a speaker. For a segment of the audio signal, EOS is indicated based at least in part on an EOS threshold determined based at least in part on the rate of speech for the speaker.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: November 14, 2023
    Assignee: NVIDIA CORPORATION
    Inventors: Utkarsh Vaidya, Ravindra Yeshwant Lokhande, Viraj Gangadhar Karandikar, Niranjan Rajendra Wartikar, Sumit Kumar Bhattacharya
  • Patent number: 11798575
    Abstract: Embodiments allow an auto-mixer to gate microphones on and off based on speech detection without losing or discarding the speech received during the speech recognition period. An example method includes receiving and storing an input audio signal. The method also includes determining, based on a first segment of the input audio signal, that the input audio signal comprises speech, and determining a delay between the input audio signal and a corresponding output audio signal provided to a speaker. The method also includes reducing the delay, wherein reducing the delay comprises removing one or more segments of the stored input audio signal to create a time-compressed audio signal and providing the time-compressed audio signal as the corresponding output audio signal. The method also includes determining that the delay is less than a threshold duration, and responsively providing the input audio signal as the corresponding output audio signal.
    Type: Grant
    Filed: May 3, 2021
    Date of Patent: October 24, 2023
    Assignee: Shure Acquisition Holdings, Inc.
    Inventors: Michael Ryan Lester, Jose Roberto Regalbuto, David Grant Cason
  • Patent number: 11776529
    Abstract: A method includes determining a target segment partially overlapping a preceding segment from a speech signal, determining a target character sequence corresponding to the target segment by decoding the target segment, identifying a first overlapping portion between the target character sequence and a preceding character sequence based on an edit distance, and merging the target character sequence and the preceding character sequence based on the first overlapping portion. A cost applied to the edit distance is determined based on any one or any combination of two or more of: the type of operation performed in the edit distance, whether the characters operated on are located in the first overlapping portion, and whether the characters operated on match. The portion of the target segment that overlaps the preceding segment is greater than or equal to 8.3% of the target segment.
    Type: Grant
    Filed: July 7, 2021
    Date of Patent: October 3, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Tae Gyoon Kang
  • Patent number: 11769517
    Abstract: This invention provides a signal processing apparatus capable of obtaining an output signal of sufficiently high quality even when the phase of an input signal differs greatly from the phase of the true voice. The signal processing apparatus includes a voice detector that receives a mixed signal including a voice and a signal other than the voice and outputs the presence of the voice as a voice flag, a corrector that receives the mixed signal and the voice flag and obtains a corrected mixed signal by correcting the mixed signal in accordance with the state of the voice flag, and a shaper that receives the corrected mixed signal and shapes it.
    Type: Grant
    Filed: August 24, 2018
    Date of Patent: September 26, 2023
    Assignees: NEC CORPORATION, NEC Platforms, Ltd.
    Inventors: Akihiko Sugiyama, Ryoji Miyahara
  • Patent number: 11727917
    Abstract: Embodiments describe a method for speech endpoint detection including receiving identification data for a first state associated with a first frame of speech data from a weighted finite-state transducer (WFST) language model, determining that the first frame of the speech data includes silence data, incrementing a silence counter associated with the first state, copying a value of the silence counter of the first state to a corresponding silence counter field in a second state associated with the first state in an active state list, and determining that the value of the silence counter for the first state is above a silence threshold. The method further includes determining that an endpoint of the speech has occurred in response to determining that the silence counter is above the silence threshold, and outputting text data representing a plurality of words determined from the speech data that was received prior to the endpoint.
    Type: Grant
    Filed: May 24, 2021
    Date of Patent: August 15, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Pushkaraksha Gejji
  • Patent number: 11721323
    Abstract: A method includes determining a target segment from a speech signal, determining a target character sequence corresponding to the target segment by decoding the target segment, identifying a first overlapping portion between the target character sequence and a preceding character sequence based on an edit distance, and merging the target character sequence and the preceding character sequence based on the first overlapping portion. A cost applied to the edit distance is determined based on any one or any combination of two or more of: the type of operation performed in the edit distance, whether the characters operated on are located in the first overlapping portion, and whether the characters operated on match.
    Type: Grant
    Filed: October 29, 2020
    Date of Patent: August 8, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Tae Gyoon Kang
  • Patent number: 11662974
    Abstract: In one aspect, a device-side audio handling input/output unit (DIO) of a hardware device writes audio data generated by the hardware device into a ring buffer. An input provided by a user to activate a software program is received, and a notification that the software program is ready to accept the audio data is generated. A system-side audio handling input/output unit (SIO) additionally provides past audio data from the ring buffer to the software program. Other aspects are also described.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: May 30, 2023
    Assignee: Apple Inc.
    Inventors: Jeffrey C. Moore, Richard M. Powell, Alexander C. Powers, Anthony J. Guetta
  • Patent number: 11640227
    Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: May 2, 2023
    Assignee: SNAP INC.
    Inventor: Jesse Chand
  • Patent number: 11636871
    Abstract: Disclosed are a method, an electronic apparatus, and a storage medium for detecting tampered audio. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to be detected to obtain a first low-frequency coefficient and first high-frequency coefficients corresponding to the signal, the number of high-frequency coefficients being equal to the first preset order; performing an inverse wavelet transform on the first high-frequency coefficients having an order greater than or equal to a second preset order to obtain a first high-frequency component signal corresponding to the signal to be detected; calculating a first Mel cepstrum feature of the first high-frequency component signal frame by frame; and concatenating the first Mel cepstrum features of a current frame signal and a preset number of frame signals.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: April 25, 2023
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shan Liang, Shuai Nie, Jiangyan Yi
  • Patent number: 11601542
    Abstract: A crash detection system for a vehicle comprises first and second batteries and a computational system comprising a processor and a non-transitory computer-readable medium. First and second antennas are both in electronic communication with the computational system and powered by one of the batteries, and the antennas are configured to communicate wirelessly and independently with an external network. A microphone is powered by one of the batteries and is in electronic communication with the computational system. The microphone continuously receives sound waves in real time and transmits sound signals to the processor. The processor monitors properties of the sound waves within the sound signals, compares the properties to thresholds stored in the non-transitory computer-readable medium, determines that the vehicle has been involved in a collision if at least one of the properties crosses its respective threshold, and reports the collision to the external network via one of the antennas.
    Type: Grant
    Filed: September 22, 2021
    Date of Patent: March 7, 2023
    Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Alex Jose Veloso, Mateus Amstalden Santa Rosa, Russell A. Patenaude, Matthew Edward Gilbert-Eyres, Dipankar Pal
  • Patent number: 11488603
    Abstract: Embodiments of the present disclosure provide a method and apparatus for processing speech. The method may include: acquiring an original speech; performing speech recognition on the original speech to obtain an original text corresponding to the original speech; associating a speech segment in the original speech with a text segment in the original text; recognizing an abnormal segment in the original speech and/or the original text; and processing the text segment indicated by the abnormal segment in the original text and/or the speech segment indicated by the abnormal segment in the original speech to generate a final speech. Associating speech segments in the original speech with text segments in the original text allows the speech to be processed visually.
    Type: Grant
    Filed: December 11, 2019
    Date of Patent: November 1, 2022
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Wanqi Tang, Jiamei Kang, Lixia Zeng, Yijing Zhou, Hanmei Xie, Lina Zhu
  • Patent number: 8938313
    Abstract: An auditory event boundary detector employs down-sampling of the input digital audio signal without an anti-aliasing filter, resulting in a narrower bandwidth intermediate signal with aliasing. Spectral changes of that intermediate signal, indicating event boundaries, may be detected using an adaptive filter to track a linear predictive model of the samples of the intermediate signal. Changes in the magnitude or power of the filter error correspond to changes in the spectrum of the input audio signal. The adaptive filter converges at a rate consistent with the duration of auditory events, so filter error magnitude or power changes indicate event boundaries. The detector is much less complex than methods employing time-to-frequency transforms for the full bandwidth of the audio signal.
    Type: Grant
    Filed: April 12, 2010
    Date of Patent: January 20, 2015
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: Glenn N. Dickins
  • Publication number: 20100030559
    Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
    Type: Application
    Filed: June 25, 2009
    Publication date: February 4, 2010
    Applicant: MINDSPEED TECHNOLOGIES, INC.
    Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
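Patent 11972751 (Hanyang): the final stage, where the three RNN embeddings are concatenated and passed to a classifier, can be sketched as below. This is a minimal illustration under stated assumptions, not the patented model: the embeddings are taken as already-computed lists, and a single logistic unit with hypothetical weights stands in for the DNN-based classifier. All function names are hypothetical.

```python
import math


def fuse_embeddings(acoustic, phonemic, decoder):
    """Concatenate the three per-frame embeddings into one feature vector."""
    return list(acoustic) + list(phonemic) + list(decoder)


def end_point_probability(feature, weights, bias):
    """Logistic score; a stand-in (assumption) for the DNN classifier output."""
    z = sum(w * x for w, x in zip(weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_end_point(acoustic, phonemic, decoder, weights, bias, threshold=0.5):
    """Fuse the embeddings and declare an end point when the score passes the threshold."""
    feature = fuse_embeddings(acoustic, phonemic, decoder)
    if len(feature) != len(weights):
        raise ValueError("weight vector must match the fused feature length")
    return end_point_probability(feature, weights, bias) >= threshold
```

A trained system would learn the classifier weights jointly with, or on top of, the embedding extractors; the fixed weights here only show how the fused vector feeds a single end-point decision.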
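Patent 11817117 (NVIDIA) ties the end-of-speech (EOS) threshold to the speaker's rate of speech. A minimal sketch, assuming the threshold is a trailing-silence duration scaled inversely with words per second and clamped to a range; the base value, reference rate, clamps, and function names are hypothetical, not taken from the patent.

```python
def eos_threshold(words_per_second, base_threshold_s=0.8, reference_rate=2.5,
                  min_s=0.3, max_s=1.5):
    """Trailing-silence threshold (seconds), scaled inversely with speaking rate.

    All default values are illustrative assumptions.
    """
    if words_per_second <= 0:
        # No rate estimate yet: be conservative and wait the longest.
        return max_s
    scaled = base_threshold_s * reference_rate / words_per_second
    return min(max(scaled, min_s), max_s)


def is_end_of_speech(trailing_silence_s, words_per_second, **kwargs):
    """Indicate EOS once the current silence reaches the rate-dependent threshold."""
    return trailing_silence_s >= eos_threshold(words_per_second, **kwargs)
```

With these hypothetical defaults, a fast talker at 5 words/s gets a 0.4 s threshold, while a slow talker at 1 word/s is clamped to 1.5 s, so pauses within slow speech are less likely to end the utterance early.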
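Patent 11798575 (Shure) buffers audio while speech is being detected, then catches up by removing stored segments. The class below is a hypothetical sketch of that catch-up logic. It assumes fixed-duration segments and simply discards the oldest buffered segment on each new input until the delay falls below the threshold, at which point it passes input straight through. The abstract does not state how the real system chooses which segments to remove.

```python
from collections import deque


class DelayCompressor:
    """Hypothetical sketch: reduce output delay by dropping buffered segments."""

    def __init__(self, segment_ms=10, threshold_ms=20):
        if segment_ms <= 0 or threshold_ms <= 0:
            raise ValueError("segment and threshold durations must be positive")
        self.segment_ms = segment_ms
        self.threshold_ms = threshold_ms
        self.buffer = deque()

    def store(self, segment):
        """Buffer a segment while speech detection is still running (no output yet)."""
        self.buffer.append(segment)

    def delay_ms(self):
        """Delay of the newest buffered segment relative to the oldest one."""
        return max(len(self.buffer) - 1, 0) * self.segment_ms

    def push(self, segment):
        """Accept a new input segment and return the segment to play."""
        self.buffer.append(segment)
        if self.delay_ms() < self.threshold_ms:
            # Delay is below the threshold: switch to passthrough.
            self.buffer.clear()
            return segment
        # Still behind: drop the oldest stored segment (time compression),
        # then play the next one. delay >= threshold > 0 guarantees >= 2 items.
        self.buffer.popleft()
        return self.buffer.popleft()
```

For example, with 10 ms segments, a 20 ms threshold, and three segments (s0 to s2) stored during detection, pushing s3 through s6 plays s1, s3, s5, s6: s0 and s2 are dropped while catching up, and s4 is discarded at the switch to passthrough.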
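Patents 11776529 and 11721323 (Samsung) merge character sequences decoded from overlapping segments by locating their overlap with an edit distance. The sketch below assumes uniform edit costs (the patents make the cost depend on the operation type, position in the overlap, and whether characters match) and picks the overlap length with the lowest per-character edit distance. The 0.5 acceptance threshold and the function names are hypothetical. For scale, a 1 s overlap in a 12 s segment is 1/12 ≈ 8.3%.

```python
def edit_distance(a, b, sub_cost=1, ins_cost=1, del_cost=1):
    """Weighted Levenshtein distance between strings a and b (uniform costs here)."""
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * del_cost] + [0] * len(b)
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            cur[j] = min(
                prev[j] + del_cost,                       # delete a[i-1]
                cur[j - 1] + ins_cost,                    # insert b[j-1]
                prev[j - 1] + (0 if same else sub_cost),  # match or substitute
            )
        prev = cur
    return prev[len(b)]


def merge_overlapping(preceding, target, max_overlap=None, accept=0.5):
    """Merge two decoded sequences on their best-matching suffix/prefix overlap."""
    limit = min(len(preceding), len(target))
    if max_overlap is not None:
        limit = min(limit, max_overlap)
    best_k, best_score = 0, float("inf")
    for k in range(1, limit + 1):
        score = edit_distance(preceding[-k:], target[:k]) / k
        # "<=" prefers the longer overlap when normalized costs tie.
        if score <= best_score:
            best_k, best_score = k, score
    if best_k == 0 or best_score >= accept:
        # No convincing overlap: plain concatenation.
        return preceding + target
    return preceding + target[best_k:]
```

Because the overlap is matched approximately, a decoding difference inside the overlap still merges cleanly: merging "the quick brow" with "brovn fox" selects the 4-character overlap and yields "the quick brown fox".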
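Patent 11727917 (Amazon) keeps a silence counter on each decoder state and copies it forward along the active state list. A minimal sketch, assuming a dictionary of active states and a hypothetical `transitions` map standing in for the WFST arcs. When several predecessors reach the same state, it keeps the largest counter; that is an assumption, since a real decoder would more likely keep the value from the best-scoring path.

```python
from dataclasses import dataclass


@dataclass
class DecoderState:
    state_id: int
    silence_frames: int = 0  # consecutive silent frames on the path into this state


def advance(active, transitions, frame_is_silence, silence_threshold):
    """Process one frame; return (new_active_states, endpoint_detected).

    active: dict mapping state_id -> DecoderState (the active state list).
    transitions: hypothetical stand-in for WFST arcs, state_id -> successor ids.
    """
    new_active = {}
    endpoint = False
    for state in active.values():
        # Increment on silence, reset on speech.
        count = state.silence_frames + 1 if frame_is_silence else 0
        if count > silence_threshold:
            endpoint = True
        for nxt in transitions.get(state.state_id, [state.state_id]):
            # Copy the counter into the successor state's field.
            # Keeping the maximum on merges is an assumption.
            current = new_active.get(nxt)
            if current is None or count > current.silence_frames:
                new_active[nxt] = DecoderState(nxt, count)
    return new_active, endpoint
```

Once `endpoint` is true, the decoder would emit the words recognized before the trailing silence; that output step is outside this sketch.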
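The ring buffer in patent 11662974 (Apple) lets the system hand a newly activated program audio that was captured before the program was ready. A minimal fixed-capacity ring buffer is sketched below; `AudioRingBuffer`, `write`, and `read_past` are hypothetical names, and samples are plain Python numbers.

```python
class AudioRingBuffer:
    """Fixed-capacity buffer that always holds the most recent samples."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [0] * capacity
        self._capacity = capacity
        self._write = 0   # index of the next slot to overwrite
        self._count = 0   # number of valid samples currently stored

    def write(self, samples):
        """Device side: append samples, overwriting the oldest when full."""
        for s in samples:
            self._data[self._write] = s
            self._write = (self._write + 1) % self._capacity
            self._count = min(self._count + 1, self._capacity)

    def read_past(self, n):
        """System side: return up to n most recent samples, oldest first."""
        n = min(n, self._count)
        start = (self._write - n) % self._capacity
        return [self._data[(start + i) % self._capacity] for i in range(n)]
```

Writing continuously and reading backwards only on activation keeps the capture path simple: the writer never waits for the reader, and the reader gets whatever history fits in the buffer.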
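The first steps of patent 11636871 can be illustrated with a multi-level Haar wavelet transform, chosen here for simplicity because the abstract does not name a wavelet. The sketch assumes level 1 is the finest detail band and reads "order greater than or equal to a second preset order" as keeping detail levels `min_level` through `levels`. It zeroes the low-frequency coefficient and the other detail bands before inverting. Mel cepstrum extraction and frame concatenation are omitted, and all names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One orthonormal Haar analysis step: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Inverse of haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def haar_dwt(x, levels):
    """Return (approximation, [detail_1, ..., detail_levels]); detail_1 is finest."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def high_frequency_component(x, levels, min_level):
    """Inverse transform using only detail levels >= min_level (assumed reading)."""
    approx, details = haar_dwt(x, levels)
    signal = [0.0] * len(approx)  # drop the low-frequency coefficient
    for level in range(levels, 0, -1):
        d = details[level - 1]
        if level < min_level:
            d = [0.0] * len(d)  # discard detail bands below the second preset order
        signal = inverse_haar_step(signal, d)
    return signal
```

A constant signal has no detail energy, so its high-frequency component is all zeros, while an alternating signal lives entirely in the finest band and is recovered exactly when `min_level` is 1.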
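The threshold comparison in patent 11601542 (GM) can be sketched as below, assuming that the monitored properties are the peak amplitude and RMS level of normalized sample frames. The abstract does not say which properties are used, so the property set, threshold values, and function names are hypothetical. Reporting over the antennas is left out.

```python
import math


def frame_properties(samples):
    """Peak amplitude and RMS level of one frame of normalized samples."""
    if not samples:
        raise ValueError("empty frame")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {"peak": peak, "rms": rms}


def first_collision_frame(frames, thresholds):
    """Index of the first frame where any property crosses its threshold, else None."""
    for index, frame in enumerate(frames):
        props = frame_properties(frame)
        if any(props[name] >= limit for name, limit in thresholds.items()):
            return index
    return None
```

Using `any` matches the abstract's "at least one of the properties crosses the respective threshold"; requiring several properties at once would trade sensitivity for fewer false alarms.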
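Patent 8938313 (Dolby) detects auditory event boundaries from the prediction error of an adaptive linear predictor that runs on a signal decimated without an anti-aliasing filter. The sketch below uses normalized LMS (NLMS) as the adaptive filter and a simple jump test on smoothed error power. NLMS, the predictor order, step size `mu`, smoothing factor, jump ratio, and function names are all assumptions for illustration, not values from the patent.

```python
def decimate_no_filter(x, factor):
    """Keep every factor-th sample with no anti-aliasing filter (aliasing allowed)."""
    return x[::factor]


def nlms_prediction_error(x, order=4, mu=0.5, eps=1e-8):
    """Per-sample error of a one-step NLMS linear predictor; error i is sample i + order."""
    w = [0.0] * order
    errors = []
    for n in range(order, len(x)):
        past = x[n - order:n][::-1]  # most recent sample first
        pred = sum(wi * xi for wi, xi in zip(w, past))
        e = x[n] - pred
        norm = eps + sum(v * v for v in past)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, past)]
        errors.append(e)
    return errors


def boundary_indices(errors, alpha=0.9, ratio=4.0, floor=1e-12):
    """Flag error indices where error power jumps above `ratio` x its smoothed value."""
    smoothed = None
    flags = []
    for i, e in enumerate(errors):
        p = e * e
        if smoothed is not None and p > ratio * smoothed + floor:
            flags.append(i)
        smoothed = p if smoothed is None else alpha * smoothed + (1 - alpha) * p
    return flags
```

Feeding it a tone at 0.3 rad/sample that switches to 1.3 rad/sample at sample 400 flags error index 396, which corresponds to the switch at input sample 400 (error index plus `order`). A smaller `mu` makes the filter converge more slowly, which is how this sketch would tie adaptation speed to the expected event duration.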
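The endpointer in publication 20100030559 (Mindspeed) can be sketched as follows. The sketch assumes that cepstral feature vectors are already computed per frame, that the "distance" is the Euclidean distance to the mean background feature vector, and that a frame counts as speech only when both its energy and its distance exceed the background values by fixed ratios. The ratios, the class name, and the AND combination are hypothetical choices; the abstract does not specify them.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Endpointer:
    """Hypothetical energy plus cepstral-distance endpointer."""

    def __init__(self, background_frames, background_features,
                 energy_ratio=3.0, distance_ratio=2.0):
        # First portion: background energy and average feature distance.
        self.background_energy = (
            sum(frame_energy(f) for f in background_frames) / len(background_frames))
        n = len(background_features)
        dims = len(background_features[0])
        self.centroid = [sum(v[i] for v in background_features) / n for i in range(dims)]
        self.avg_distance = sum(euclidean(v, self.centroid) for v in background_features) / n
        self.energy_ratio = energy_ratio
        self.distance_ratio = distance_ratio

    def is_speech(self, frame, features):
        """Classify a later frame by contrasting energy and comparing distance."""
        louder = frame_energy(frame) > self.energy_ratio * self.background_energy
        farther = euclidean(features, self.centroid) > self.distance_ratio * self.avg_distance
        return louder and farther
```

Requiring both tests rejects loud but spectrally background-like noise (high energy, small distance); an OR combination would be more sensitive to quiet speech at the cost of more false triggers.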