End Point Detection (epo) Patents (Class 704/E11.005)
  • Patent number: 12236950
    Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and increase a volume of the output audio, or it may accept the interrupt event, ending the output audio and performing speech processing on the audio data.
    Type: Grant
    Filed: January 3, 2023
    Date of Patent: February 25, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
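    Sketch: The two-stage flow above (fast interrupt detection, then whole-utterance classification) can be pictured with a short Python sketch. The object names, methods, thresholds, and volume values below are illustrative assumptions, not the patent's implementation.

```python
def handle_playback_barge_in(frames, interrupt_detector, directed_classifier,
                             player, asr_client, max_utterance_frames=500):
    """Illustrative two-stage barge-in flow: a low-latency interrupt detector
    ducks the output volume, then a whole-utterance device-directed classifier
    accepts or rejects the interrupt. Names and constants are hypothetical."""
    frames = iter(frames)
    for frame in frames:
        if not interrupt_detector.is_interrupt(frame):
            continue
        player.set_volume(0.2)                      # duck output immediately
        utterance = [frame]
        for next_frame in frames:                   # buffer the rest of the utterance
            utterance.append(next_frame)
            if len(utterance) >= max_utterance_frames:
                break
        if directed_classifier.is_device_directed(utterance):
            player.stop()                           # accept: end the output audio
            asr_client.send(utterance)              # forward audio for speech processing
        else:
            player.set_volume(1.0)                  # reject: restore the volume
```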
  • Patent number: 12210085
    Abstract: An angle measuring device includes an antenna device having antenna elements equally spaced along a first axis and a second axis, respectively; a selecting unit that selects, from a plurality of phase differences of signals received from a transmission device by the antenna elements, phase differences whose variance is a predetermined value or less; an azimuth angle computing unit that computes an azimuth angle of the transmission device from a ratio of a first phase difference between signals received by two antenna elements spaced by a predetermined distance along the first axis and a second phase difference between signals received by two antenna elements spaced by the predetermined distance along the second axis; and an elevation angle computing unit that computes an elevation angle of the transmission device based on the computed azimuth angle and the first or second phase difference.
    Type: Grant
    Filed: August 9, 2022
    Date of Patent: January 28, 2025
    Assignee: ALPS ALPINE CO., LTD.
    Inventors: Taiki Igarashi, Mitsunobu Inoue, Naoya Shimada, Daisuke Takai
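    Sketch: The geometry implied by the abstract can be made concrete with a plane-wave model: the azimuth follows from the ratio of the two phase differences, and the elevation from the azimuth plus one of them. The model, formulas, and variable names below are standard assumptions, not taken from the patent.

```python
import math

def azimuth_elevation(dphi_x, dphi_y, d, wavelength):
    """Estimate azimuth/elevation (radians) of a transmitter from the phase
    differences dphi_x, dphi_y measured between element pairs spaced d apart
    along two orthogonal axes. Assumes a plane wave:
        dphi_x = (2*pi*d/wavelength) * cos(elevation) * cos(azimuth)
        dphi_y = (2*pi*d/wavelength) * cos(elevation) * sin(azimuth)
    """
    # Azimuth depends only on the ratio of the two phase differences.
    azimuth = math.atan2(dphi_y, dphi_x)
    # Elevation then follows from the azimuth and one phase difference.
    k = 2 * math.pi * d / wavelength
    if abs(math.cos(azimuth)) > 1e-9:
        cos_el = dphi_x / (k * math.cos(azimuth))
    else:
        cos_el = dphi_y / (k * math.sin(azimuth))
    cos_el = max(-1.0, min(1.0, cos_el))    # clamp numerical noise
    elevation = math.acos(cos_el)
    return azimuth, elevation
```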
  • Patent number: 12183322
    Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.
    Type: Grant
    Filed: September 22, 2022
    Date of Patent: December 31, 2024
    Assignee: Google LLC
    Inventors: Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
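    Sketch: To show how the per-frame silence labels can feed an endpointing decision, the snippet below turns a sequence of frame classes into an end-of-utterance index. The label names follow the abstract; the decision rule (a long enough run of final-silence frames) is an assumption.

```python
def detect_end_of_utterance(frame_labels, min_final_silence_frames=30):
    """Return the frame index at which the utterance is considered finished, or None.

    frame_labels: iterable of strings in
      {"speech", "initial_silence", "intermediate_silence", "final_silence"}.
    The rule used here (a run of "final_silence" frames) is illustrative.
    """
    run = 0
    for i, label in enumerate(frame_labels):
        if label == "final_silence":
            run += 1
            if run >= min_final_silence_frames:
                return i
        else:
            run = 0
    return None

# Example: a 3-frame threshold fires on the third consecutive final-silence frame.
labels = ["initial_silence", "speech", "intermediate_silence", "speech",
          "final_silence", "final_silence", "final_silence"]
print(detect_end_of_utterance(labels, min_final_silence_frames=3))  # -> 6
```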
  • Patent number: 12161481
    Abstract: The invention is a method for automatic detection of neurocognitive impairment, comprising: generating, in a segmentation and labelling step (11), a labelled segment series (26) from a speech sample (22) using a speech recognition unit (24); and generating, in an acoustic parameter calculation step (12), acoustic parameters (30) characterizing the speech sample (22) from the labelled segment series (26).
    Type: Grant
    Filed: December 16, 2019
    Date of Patent: December 10, 2024
    Assignee: Szegedi Tudomanyegyetem
    Inventors: Gábor Gosztolya, Ildikó Hoffmann, János Kálmán, Magdolna Pákáski, László Tóth, Veronika Vincze
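    Sketch: The step from a labelled segment series to acoustic parameters can be illustrated with a few pause-based measures. The specific parameters below (pause count, mean pause duration, pause ratio) are common choices in this literature and are assumptions, not the patent's list.

```python
def pause_parameters(segments):
    """Compute simple pause-based acoustic parameters from a labelled segment
    series: a list of (label, start_sec, end_sec) tuples where label is
    "speech" or "pause". Parameter choices here are illustrative."""
    speech_time = sum(e - s for lab, s, e in segments if lab == "speech")
    pauses = [e - s for lab, s, e in segments if lab == "pause"]
    total_time = speech_time + sum(pauses)
    return {
        "pause_count": len(pauses),
        "mean_pause_duration": sum(pauses) / len(pauses) if pauses else 0.0,
        "pause_ratio": sum(pauses) / total_time if total_time else 0.0,
        "speech_time_ratio": speech_time / total_time if total_time else 0.0,
    }

segments = [("speech", 0.0, 2.1), ("pause", 2.1, 2.9),
            ("speech", 2.9, 5.0), ("pause", 5.0, 6.2)]
print(pause_parameters(segments))
```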
  • Patent number: 12119012
    Abstract: The present disclosure relates to a method and an apparatus for audio processing and a storage medium. The method includes: obtaining an audio mixing feature of a target object, in which the audio mixing feature at least includes: a voiceprint feature and a pitch feature of the target object; and determining a target audio matching with the target object in the mixed audio according to the audio mixing feature.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: October 15, 2024
    Assignee: Beijing Xiaomi Pinecone Electronics Co., Ltd.
    Inventors: Na Xu, Yongtao Jia, Linzhang Wang
  • Patent number: 12112744
    Abstract: The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and second logarithmic mel-frequency spectral coefficients into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user.
    Type: Grant
    Filed: March 2, 2022
    Date of Patent: October 8, 2024
    Assignee: Zhejiang University
    Inventors: Feng Lin, Tiantian Liu, Ming Gao, Chao Wang, Zhongjie Ba, Jinsong Han, Wenyao Xu, Kui Ren
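    Sketch: One possible reading of the fusion network: each modality's log-mel feature is rescaled by a gate computed from the other modality (the calibration module), and the calibrated features are then concatenated (the mapping module). The gating form, dimensions, and random stand-in weights below are assumptions, not the patent's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(audio_logmel, mmwave_logmel, w_a, w_m):
    """Illustrative mutual calibration + fusion of two log-mel feature vectors.

    audio_logmel, mmwave_logmel: (D,) feature vectors for one frame.
    w_a, w_m: (D, D) matrices standing in for learned calibration weights."""
    gate_for_audio = sigmoid(w_m @ mmwave_logmel)    # mmWave gates the audio feature
    gate_for_mmwave = sigmoid(w_a @ audio_logmel)    # audio gates the mmWave feature
    calibrated_audio = gate_for_audio * audio_logmel
    calibrated_mmwave = gate_for_mmwave * mmwave_logmel
    return np.concatenate([calibrated_audio, calibrated_mmwave])  # fused feature

D = 40
rng = np.random.default_rng(0)
fused = fuse_features(rng.normal(size=D), rng.normal(size=D),
                      rng.normal(size=(D, D)) * 0.1, rng.normal(size=(D, D)) * 0.1)
print(fused.shape)  # (80,)
```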
  • Patent number: 12100410
    Abstract: As pitch enhancement processing, a pitch enhancement apparatus obtains, for a time segment judged to include a consonant signal and for each time in that segment, an output signal that includes the signal at the time plus a signal obtained by multiplying together the signal at the time that is earlier than the current time by the number of samples T0 corresponding to a pitch period of the time segment, the pitch gain ?0 of the time segment, a predetermined constant B0, and a value that is greater than 0 and less than 1.
    Type: Grant
    Filed: March 22, 2019
    Date of Patent: September 24, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Yutaka Kamamoto, Ryosuke Sugiura, Takehiro Moriya
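    Sketch: Read literally, the abstract adds to each sample of a consonant segment a scaled copy of the sample one pitch period earlier, with the scale formed from the pitch gain, the constant B0, and a factor between 0 and 1. The snippet below implements that reading; the variable names and default values are assumptions.

```python
def pitch_enhance(x, t0, pitch_gain, b0=1.0, c=0.5):
    """Illustrative pitch enhancement for one consonant segment.

    x: list of samples for the segment, t0: pitch period in samples,
    pitch_gain: pitch gain of the segment, b0: predetermined constant,
    c: a value greater than 0 and less than 1. Each output sample is the
    input sample plus a scaled copy of the sample t0 samples earlier."""
    scale = c * b0 * pitch_gain
    y = []
    for n, sample in enumerate(x):
        past = x[n - t0] if n >= t0 else 0.0   # no enhancement before one period
        y.append(sample + scale * past)
    return y

print(pitch_enhance([0.0, 1.0, 0.0, -1.0, 0.0, 1.0], t0=4, pitch_gain=0.8))
```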
  • Patent number: 12061646
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to identify media that has been pitch shifted, time shifted, and/or resampled. An example apparatus includes: memory; instructions in the apparatus; and processor circuitry to execute the instructions to: transmit a fingerprint of an audio signal and adjusting instructions to a central facility to facilitate a query, the adjusting instructions identifying at least one of a pitch shift, a time shift, or a resample ratio; obtain a response including an identifier for the audio signal and information corresponding to how the audio signal was adjusted; and change the adjusting instructions based on the information.
    Type: Grant
    Filed: July 5, 2023
    Date of Patent: August 13, 2024
    Assignee: GRACENOTE, INC.
    Inventors: Robert Coover, Matthew James Wilkinson, Jeffrey Scott, Yongju Hong
  • Patent number: 12050557
    Abstract: A computerized system and method of compressing symbolic information organized into a plurality of documents, each document having a plurality of symbols, the system and method including: (i) automatically identifying a plurality of sequential (also referred to as adjacent) and/or non-sequential (also referred to as non-adjacent) symbol pairs in an input document; (ii) counting the number of appearances of each unique symbol pair; and (iii) producing a compressed document that includes a replacement symbol at each position associated with one of the plurality of symbol pairs, at least one of which corresponds to a non-sequential symbol pair. For each non-sequential pair, the compressed document includes corresponding indicia indicating a distance between the locations of the non-sequential symbols of the pair in the input document.
    Type: Grant
    Filed: November 22, 2021
    Date of Patent: July 30, 2024
    Inventor: Takashi Suzuki
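    Sketch: The counting-and-replacement step is easiest to see on a toy example: count symbol pairs up to a small gap, replace the most frequent pair with a fresh replacement symbol, and record the gap as the distance indicia for non-adjacent pairs. The window size, tuple encoding, and single-pass replacement below are simplifying assumptions.

```python
from collections import Counter

def compress_once(symbols, max_gap=3, next_symbol=1000):
    """One illustrative compression step: find the most frequent (a, b, gap)
    pair with gap <= max_gap and replace its occurrences with
    (next_symbol, gap), dropping the second symbol of each pair."""
    counts = Counter()
    for i, a in enumerate(symbols):
        for gap in range(1, max_gap + 1):
            if i + gap < len(symbols):
                counts[(a, symbols[i + gap], gap)] += 1
    if not counts:
        return symbols
    (a, b, gap), _ = counts.most_common(1)[0]

    out, used = [], set()
    for i, s in enumerate(symbols):
        if i in used:
            continue
        j = i + gap
        if s == a and j < len(symbols) and j not in used and symbols[j] == b:
            out.append((next_symbol, gap))   # replacement symbol + distance indicia
            used.add(j)                      # second member of the pair is dropped
        else:
            out.append(s)
    return out

doc = [1, 2, 7, 1, 2, 9, 1, 5, 2]
print(compress_once(doc))   # [(1000, 1), 7, (1000, 1), 9, 1, 5, 2]
```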
  • Patent number: 11996119
    Abstract: The end-of-talk prediction device (10) of the present invention comprises: a divide unit (11) for dividing a string, in which the utterance in the dialog has been converted to text by speech recognition, using delimiter symbols that indicate delimitations within segments and that are included in the speech recognition result; and an end-of-talk prediction unit (12) for predicting, using an end-of-talk prediction model (14), whether the utterance corresponding to a divided string produced by the divide unit (11) is an end-of-talk utterance of the speaker.
    Type: Grant
    Filed: August 14, 2019
    Date of Patent: May 28, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Setsuo Yamada, Yoshiaki Noda, Takaaki Hasegawa
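    Sketch: The split-then-classify flow reduces to dividing the recognized string at delimiter symbols and scoring each divided string with an end-of-talk model. The delimiter set and the trivial stand-in rule below are assumptions; a trained model (14) would replace the rule.

```python
import re

DELIMITERS = "。、.,!?！？"   # assumed delimiter symbols present in the ASR output

def split_on_delimiters(text):
    """Divide a speech-recognized string at delimiter symbols."""
    parts = re.split("[" + re.escape(DELIMITERS) + "]", text)
    return [p.strip() for p in parts if p.strip()]

def is_end_of_talk(segment, model=None):
    """Stand-in for the end-of-talk prediction model: here, a trivial rule
    flags short confirmations as turn-ending. A trained classifier would
    replace this in practice."""
    if model is not None:
        return model.predict(segment)
    return len(segment.split()) <= 3

for seg in split_on_delimiters("I see, thanks. Could you repeat the last part?"):
    print(seg, "->", "end of talk" if is_end_of_talk(seg) else "continuing")
```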
  • Patent number: 11972751
    Abstract: Disclosed are a method and an apparatus for detecting a voice end point by using acoustic and language modeling information to achieve robust voice recognition. A voice end point detection method according to an embodiment may comprise the steps of: inputting an acoustic feature vector sequence extracted from a microphone input signal into an acoustic embedding extraction unit, a phonemic embedding extraction unit, and a decoder embedding extraction unit, which are based on a recurrent neural network (RNN); combining the acoustic embedding, phonemic embedding, and decoder embedding into a feature vector by means of the acoustic embedding extraction unit, the phonemic embedding extraction unit, and the decoder embedding extraction unit; and inputting the combined feature vector into a deep neural network (DNN)-based classifier to detect a voice end point.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 30, 2024
    Assignee: IUCF-HYU (INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY)
    Inventors: Joon-Hyuk Chang, Inyoung Hwang
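    Sketch: The feature construction is a concatenation of the three embeddings followed by a small classifier. The NumPy snippet below uses random vectors and weights as stand-ins for the trained RNN extractors and DNN; all shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_classifier(x, w1, b1, w2, b2):
    """Tiny DNN: one hidden layer + sigmoid output giving P(end point)."""
    h = np.maximum(0.0, w1 @ x + b1)                  # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))

# Stand-ins for the acoustic, phonemic, and decoder embeddings of one frame.
acoustic_emb = rng.normal(size=64)
phonemic_emb = rng.normal(size=32)
decoder_emb = rng.normal(size=32)

# Combine the three embeddings into a single feature vector.
feature = np.concatenate([acoustic_emb, phonemic_emb, decoder_emb])   # (128,)

# Random weights stand in for a trained classifier.
w1, b1 = rng.normal(size=(32, 128)) * 0.1, np.zeros(32)
w2, b2 = rng.normal(size=32) * 0.1, 0.0
print("P(voice end point) =", float(mlp_classifier(feature, w1, b1, w2, b2)))
```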
  • Patent number: 11921916
    Abstract: Image editing on a wearable device includes a system which obtains sensor data via the wearable device. The sensor data includes a representation of hand movement, head movement or voice command associated with a user. The system executes an application for editing an image based on the obtained sensor data. The system provides for display a list of image adjustment types associated with the application. The system selects an image adjustment type based on one or more of the hand movement, the head movement or the voice command. The system provides for display a prompt having options to adjust a property of the selected image adjustment type. The system selects one of the options included in the prompt. The system modifies an image based on the selected option. The system then provides the modified image for storage in a data structure of a memory unit in the wearable device.
    Type: Grant
    Filed: December 31, 2020
    Date of Patent: March 5, 2024
    Assignee: Google LLC
    Inventors: Thomas Binder, Ronald Frank Wotzlaw
  • Patent number: 11817117
    Abstract: In various examples, end of speech (EOS) for an audio signal is determined based at least in part on a rate of speech for a speaker. For a segment of the audio signal, EOS is indicated based at least in part on an EOS threshold determined based at least in part on the rate of speech for the speaker.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: November 14, 2023
    Assignee: NVIDIA CORPORATION
    Inventors: Utkarsh Vaidya, Ravindra Yeshwant Lokhande, Viraj Gangadhar Karandikar, Niranjan Rajendra Wartikar, Sumit Kumar Bhattacharya
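    Sketch: The core idea, scaling the end-of-speech silence threshold by the speaker's rate, fits in a few lines. The baseline value, reference rate, and inverse-proportional scaling below are assumptions.

```python
def eos_threshold_ms(speech_rate_wps, baseline_ms=800.0, reference_rate_wps=2.5):
    """Silence (ms) to wait before declaring end of speech, scaled so that
    slower speakers get a longer threshold. Constants are illustrative."""
    speech_rate_wps = max(speech_rate_wps, 0.1)
    return baseline_ms * (reference_rate_wps / speech_rate_wps)

def is_end_of_speech(trailing_silence_ms, speech_rate_wps):
    return trailing_silence_ms >= eos_threshold_ms(speech_rate_wps)

print(eos_threshold_ms(2.5))                         # average speaker: 800 ms
print(eos_threshold_ms(1.5))                         # slower speaker: ~1333 ms
print(is_end_of_speech(900, speech_rate_wps=3.0))    # fast speaker -> True
```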
  • Patent number: 11798575
    Abstract: Embodiments allow for an auto-mixer to gate microphones on and off based on speech detection, without losing or discarding the speech received during the speech recognition period. An example method includes receiving and storing an input audio signal. The method also includes determining, based on a first segment of the input audio signal, that the input audio signal comprises speech, and determining a delay between the input audio signal and a corresponding output audio signal provided to a speaker. The method also includes reducing the delay, wherein reducing the delay comprises removing one or more segments of the stored input audio signal to create a time-compressed audio signal and providing the time-compressed audio signal as the corresponding output audio signal. The method also includes determining that the delay is less than a threshold duration, and responsively providing the input audio signal as the corresponding output audio signal.
    Type: Grant
    Filed: May 3, 2021
    Date of Patent: October 24, 2023
    Assignee: Shure Acquisition Holdings, Inc.
    Inventors: Michael Ryan Lester, Jose Roberto Regalbuto, David Grant Cason
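    Sketch: The catch-up step can be pictured as dropping low-energy segments from the buffered input until the playback delay falls below a threshold. The segment length, energy rule, and threshold below are assumptions.

```python
def catch_up(buffered_segments, delay_ms, segment_ms=20, threshold_ms=40,
             silence_level=1e-4):
    """Remove low-energy segments from the buffered input until the playback
    delay is below threshold_ms. Each element of buffered_segments is a list
    of samples; the constants here are illustrative."""
    def energy(seg):
        return sum(s * s for s in seg) / max(len(seg), 1)

    output = []
    for seg in buffered_segments:
        if delay_ms > threshold_ms and energy(seg) < silence_level:
            delay_ms -= segment_ms           # drop the segment: time-compression
            continue
        output.append(seg)
    return output, delay_ms

segments = [[0.0] * 160, [0.3] * 160, [0.0] * 160, [0.2] * 160]
compressed, remaining = catch_up(segments, delay_ms=60)
print(len(compressed), remaining)            # 3 segments kept, delay reduced to 40 ms
```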
  • Patent number: 11776529
    Abstract: A method, the method includes determining a target segment partially overlapping a preceding segment from a speech signal, determining a target character sequence corresponding to the target segment by decoding the target segment, identifying a first overlapping portion between the target character sequence and a preceding character sequence based on an edit distance, and merging the target character sequence and the preceding character sequence based on the first overlapping portion. A cost applied to the edit distance is determined based on any one or any combination of any two or more of a type of operation performed at the edit distance, whether characters to be operated are located in the first overlapping portion, and whether the characters to be operated match. A portion overlapping the preceding segment in the target segment is greater than or equal to 8.3% of the target segment.
    Type: Grant
    Filed: July 7, 2021
    Date of Patent: October 3, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Tae Gyoon Kang
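    Sketch: A toy version of the merge computes an edit distance between the tail of the preceding character sequence and the head of the new one over candidate overlap lengths, picks the best match, and splices there. The uniform costs below are a simplification of the patent's position- and match-dependent costs.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (uniform costs, as a simplification)."""
    dp = list(range(len(b) + 1))                 # row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i                   # prev holds dp[i-1][j-1]
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution / match
    return dp[len(b)]

def merge(preceding, target, max_overlap=20):
    """Find the overlap length with the smallest tail/head edit distance and
    merge the two character sequences there."""
    best_len, best_cost = 0, float("inf")
    for k in range(1, min(max_overlap, len(preceding), len(target)) + 1):
        cost = edit_distance(preceding[-k:], target[:k])
        if cost < best_cost:
            best_cost, best_len = cost, k
    return preceding + target[best_len:]

print(merge("the quick brown fox", "brown fox jumps over"))
# -> "the quick brown fox jumps over"
```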
  • Patent number: 11769517
    Abstract: This invention provides a signal processing apparatus capable of obtaining an output signal of sufficiently high quality even when the phase of an input signal is largely different from the phase of the true voice. The signal processing apparatus includes a voice detector that receives a mixed signal including a voice and a signal other than the voice and obtains the existence of the voice as a voice flag, a corrector that receives the mixed signal and the voice flag and obtains a corrected mixed signal generated by correcting the mixed signal in accordance with a state of the voice flag, and a shaper that receives the corrected mixed signal and shapes the corrected mixed signal.
    Type: Grant
    Filed: August 24, 2018
    Date of Patent: September 26, 2023
    Assignees: NEC CORPORATION, NEC Platforms, Ltd.
    Inventors: Akihiko Sugiyama, Ryoji Miyahara
  • Patent number: 11727917
    Abstract: Embodiments describe a method for speech endpoint detection including receiving identification data for a first state associated with a first frame of speech data from a WFST language model, determining that the first frame of the speech data includes silence data, incrementing a silence counter associated with the first state, copying a value of the silence counter of the first state to a corresponding silence counter field in a second state associated with the first state in an active state list, and determining that the value of the silence counter for the first state is above a silence threshold. The method further includes, determining that an endpoint of the speech has occurred in response to determining that the silence counter is above the silence threshold, and outputting text data representing a plurality of words determined from the speech data that was received prior to the endpoint.
    Type: Grant
    Filed: May 24, 2021
    Date of Patent: August 15, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Pushkaraksha Gejji
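    Sketch: The per-state bookkeeping reduces to a silence counter per active state that is incremented on silence frames, copied to successor states, and checked against a threshold. The state representation below is an assumption; a real WFST decoder tracks far more per state.

```python
def endpoint_step(active_states, frame_is_silence, silence_threshold=50):
    """One illustrative decoding step of silence-counter endpointing.

    active_states: dict of state_id -> {"silence_frames": int,
    "successors": [state_id, ...]}. Returns (next_active_states, endpointed).
    Successor topology handling is heavily simplified here."""
    endpointed = False
    next_states = {}
    for info in active_states.values():
        count = info["silence_frames"] + 1 if frame_is_silence else 0
        if count > silence_threshold:
            endpointed = True                        # enough trailing silence
        for succ in info["successors"]:
            # Copy the silence counter into the successor's counter field.
            next_states[succ] = {
                "silence_frames": count,
                "successors": active_states.get(succ, info)["successors"],
            }
    return next_states, endpointed

states = {0: {"silence_frames": 50, "successors": [0]}}
print(endpoint_step(states, frame_is_silence=True))  # counter 51 > 50 -> endpointed
```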
  • Patent number: 11721323
    Abstract: A method, the method includes determining a target segment from a speech signal, determining a target character sequence corresponding to the target segment by decoding the target segment, identifying a first overlapping portion between the target character sequence and a preceding character sequence based on an edit distance, and merging the target character sequence and the preceding character sequence based on the first overlapping portion. A cost applied to the edit distance is determined based on any one or any combination of any two or more of a type of operation performed at the edit distance, whether characters to be operated are located in the first overlapping portion, and whether the characters to be operated match.
    Type: Grant
    Filed: October 29, 2020
    Date of Patent: August 8, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Tae Gyoon Kang
  • Patent number: 11662974
    Abstract: In one aspect, a device-side audio handling input/output unit (DIO) of a hardware device writes audio data generated by the hardware device within a ring buffer. An input provided by a user for activation of a software program is received, and a notification that the software program is ready to accept the audio data is generated. A system-side audio handling input/output unit (SIO) additionally provides past audio data from the ring buffer to the software program. Other aspects are also described.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: May 30, 2023
    Assignee: Apple Inc.
    Inventors: Jeffrey C. Moore, Richard M. Powell, Alexander C. Powers, Anthony J. Guetta
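    Sketch: The buffering side reduces to a fixed-capacity ring buffer that the device side keeps writing and the system side can read back from once the program is ready. The class below is a generic sketch, not Apple's DIO/SIO interfaces.

```python
class AudioRingBuffer:
    """Fixed-capacity ring buffer holding the most recent audio samples."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = [0.0] * capacity
        self.write_pos = 0
        self.filled = 0

    def write(self, samples):
        """Device side: append samples, overwriting the oldest when full."""
        for s in samples:
            self.buffer[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.filled = min(self.filled + 1, self.capacity)

    def read_past(self, n):
        """System side: return the n most recent samples, oldest first."""
        n = min(n, self.filled)
        start = (self.write_pos - n) % self.capacity
        return [self.buffer[(start + i) % self.capacity] for i in range(n)]

rb = AudioRingBuffer(capacity=4)
rb.write([1, 2, 3, 4, 5])        # 1 is overwritten
print(rb.read_past(3))           # [3, 4, 5]
```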
  • Patent number: 11640227
    Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: May 2, 2023
    Assignee: SNAP INC.
    Inventor: Jesse Chand
  • Patent number: 11636871
    Abstract: Disclosed are a method, an electronic apparatus for detecting tampering audio and a storage medium. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to be detected so as to obtain a first low-frequency coefficient and a first high-frequency coefficient corresponding to the signal to be detected, the number of which is equal to that of the first preset order; performing an inverse wavelet transform on the first high-frequency coefficient having an order greater than or equal to a second preset order so as to obtain a first high-frequency component signal corresponding to the signal to be detected; calculating a first Mel cepstrum feature of the first high-frequency component signal in units of frame, and concatenating the first Mel cepstrum features of a current frame signal and a preset number of frame signals.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: April 25, 2023
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shan Liang, Shuai Nie, Jiangyan Yi
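    Sketch: A rough version of the described front end, using PyWavelets for the wavelet decomposition and librosa for the Mel cepstrum. The wavelet family, orders, and frame handling below are assumptions.

```python
import numpy as np
import pywt          # PyWavelets
import librosa

def high_frequency_mel_cepstrum(signal, sr, first_order=4, second_order=2,
                                n_mfcc=13):
    """Keep only high-frequency wavelet detail coefficients of order >=
    second_order, reconstruct that component, and compute per-frame Mel
    cepstral features from it. Parameter choices are illustrative."""
    # Wavelet decomposition: coeffs = [cA_n, cD_n, ..., cD_1].
    coeffs = pywt.wavedec(signal, "db4", level=first_order)
    kept = [coeffs[0] * 0]                    # drop the low-frequency approximation
    for i, c in enumerate(coeffs[1:], start=1):
        order = first_order - i + 1           # cD_n has order n
        kept.append(c if order >= second_order else c * 0)
    high_freq_component = pywt.waverec(kept, "db4")[: len(signal)]
    # Per-frame Mel cepstrum of the high-frequency component signal.
    return librosa.feature.mfcc(y=high_freq_component.astype(np.float32),
                                sr=sr, n_mfcc=n_mfcc)

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
test_signal = np.sin(2 * np.pi * 3000 * t) + 0.1 * np.random.randn(sr)
print(high_frequency_mel_cepstrum(test_signal, sr).shape)   # (13, n_frames)
```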
  • Patent number: 11601542
    Abstract: A crash detection system for a vehicle comprises first and second batteries and a computational system comprising a processor and a non-transitory computer-readable medium. First and second antennas are both in electronic communication with the computational system and powered by one of the batteries, with the antennas configured to independently wirelessly communicate with an external network. A microphone is powered by one of the batteries and is in electronic communication with the computational system. The microphone continuously receives sound waves in real time and transmits sound signals to the processor. The processor monitors properties of the sound waves within the sound signals, compares the properties to thresholds stored in the non-transitory computer-readable medium, determines that the vehicle has been involved in a collision if at least one of the properties crosses the respective threshold, and communicates with the external network to report the collision with one of the antennas.
    Type: Grant
    Filed: September 22, 2021
    Date of Patent: March 7, 2023
    Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Alex Jose Veloso, Mateus Amstalden Santa Rosa, Russell A. Patenaude, Matthew Edward Gilbert-Eyres, Dipankar Pal
  • Patent number: 11488603
    Abstract: Embodiments of the present disclosure provide a method and apparatus for processing a speech. The method may include: acquiring an original speech; performing speech recognition on the original speech, to obtain an original text corresponding to the original speech; associating a speech segment in the original speech with a text segment in the original text; recognizing an abnormal segment in the original speech and/or the original text; and processing a text segment indicated by the abnormal segment in the original text and/or the speech segment indicated by the abnormal segment in the original speech, to generate a final speech. A speech segment in the original speech is associated with a text segment in the original text to realize visual processing of the speech.
    Type: Grant
    Filed: December 11, 2019
    Date of Patent: November 1, 2022
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Wanqi Tang, Jiamei Kang, Lixia Zeng, Yijing Zhou, Hanmei Xie, Lina Zhu
  • Patent number: 8938313
    Abstract: An auditory event boundary detector employs down-sampling of the input digital audio signal without an anti-aliasing filter, resulting in a narrower bandwidth intermediate signal with aliasing. Spectral changes of that intermediate signal, indicating event boundaries, may be detected using an adaptive filter to track a linear predictive model of the samples of the intermediate signal. Changes in the magnitude or power of the filter error correspond to changes in the spectrum of the input audio signal. The adaptive filter converges at a rate consistent with the duration of auditory events, so filter error magnitude or power changes indicate event boundaries. The detector is much less complex than methods employing time-to-frequency transforms for the full bandwidth of the audio signal.
    Type: Grant
    Filed: April 12, 2010
    Date of Patent: January 20, 2015
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: Glenn N. Dickins
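    Sketch: The detector can be mimicked in a few lines of NumPy: decimate without an anti-aliasing filter (deliberately allowing aliasing), run an LMS-adapted linear predictor over the decimated signal, and flag a boundary where the smoothed prediction-error power jumps. The step size, order, and jump rule below are assumptions.

```python
import numpy as np

def event_boundaries(x, decimate=8, order=8, mu=0.1, jump_ratio=3.0):
    """Illustrative auditory event boundary detector: the input is decimated
    with no anti-aliasing filter, a normalized-LMS linear predictor tracks the
    decimated signal, and a boundary is reported where the smoothed prediction
    error power jumps by more than jump_ratio."""
    y = np.asarray(x, dtype=float)[::decimate]       # aliased intermediate signal
    w = np.zeros(order)                              # predictor coefficients
    err_power = float(np.var(y)) + 1e-9              # smoothed error power
    boundaries = []
    for n in range(order, len(y)):
        past = y[n - order:n][::-1]                  # most recent sample first
        e = y[n] - w @ past                          # prediction error
        w += mu * e * past / (past @ past + 1e-9)    # normalized LMS update
        new_power = 0.9 * err_power + 0.1 * e * e
        if new_power > jump_ratio * err_power:
            boundaries.append(n * decimate)          # boundary in input-sample units
        err_power = new_power
    return boundaries

sr = 8000
t = np.arange(sr) / sr
audio = np.concatenate([np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 1400 * t)])
print(event_boundaries(audio))                       # boundaries near sample 8000
```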
  • Publication number: 20100030559
    Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
    Type: Application
    Filed: June 25, 2009
    Publication date: February 4, 2010
    Applicant: MINDSPEED TECHNOLOGIES, INC.
    Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
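    Sketch: The decision combines an energy contrast with a cepstral-distance comparison. The snippet below assumes the features are cepstral vectors and uses a Euclidean distance with fixed margins, which are illustrative choices.

```python
import math

def classify_frame(frame_energy, frame_cepstrum,
                   background_energy, background_cepstrum, background_distance,
                   energy_margin_db=6.0, distance_margin=1.5):
    """Classify one frame as speech or non-speech by contrasting its energy
    with the background energy and comparing its cepstral distance with the
    background (first-portion) distance. Margins are illustrative."""
    energy_contrast_db = 10.0 * math.log10((frame_energy + 1e-12) /
                                           (background_energy + 1e-12))
    distance = math.dist(frame_cepstrum, background_cepstrum)
    is_speech = (energy_contrast_db > energy_margin_db or
                 distance > distance_margin * background_distance)
    return "speech" if is_speech else "non-speech"

print(classify_frame(frame_energy=2e-3, frame_cepstrum=[1.2, -0.4, 0.9],
                     background_energy=1e-4, background_cepstrum=[0.2, -0.1, 0.3],
                     background_distance=0.5))       # -> "speech"
```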