End Point Detection (epo) Patents (Class 704/E11.005)

Device-directed utterance detection

Patent number: 12236950

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

Type: Grant

Filed: January 3, 2023

Date of Patent: February 25, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
Angle measuring device and angle measuring method

Patent number: 12210085

Abstract: An angle measuring device includes an antenna device having antenna elements equally spaced along a first axis and a second axis, respectively, a selecting unit that selects phase differences with which a variance thereof becomes a predetermined value or less, from a plurality of phase differences of signals received from a transmission device by the antenna elements, an azimuth angle computing unit that computes an azimuth angle of the transmission device from a ratio of a first phase difference between signals received by two antenna elements spaced by a predetermined distance along the first axis and a second phase difference between signals received by two antenna elements spaced by the predetermined distance along the second axis, and an elevation angle computing unit that computes an elevation angle of the transmission device, based on the computed azimuth angle and the first or second phase difference.
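
The azimuth and elevation steps in this abstract can be illustrated with a short sketch. It assumes a far-field plane-wave model that the abstract does not state: the first phase difference equals k·cos(elevation)·cos(azimuth) and the second equals k·cos(elevation)·sin(azimuth), with k = 2π·spacing/wavelength. The function name `estimate_angles` and its parameters are hypothetical, and the variance-based selection step is omitted.

```python
import math

def estimate_angles(phase_x, phase_y, spacing, wavelength):
    """Return (azimuth, elevation) in radians from two phase differences.

    Assumed model (not taken from the patent text):
      phase_x = k * cos(elevation) * cos(azimuth)
      phase_y = k * cos(elevation) * sin(azimuth)
    where k = 2 * pi * spacing / wavelength.
    """
    k = 2.0 * math.pi * spacing / wavelength
    # The ratio phase_y / phase_x equals tan(azimuth); atan2 keeps the quadrant.
    azimuth = math.atan2(phase_y, phase_x)
    # Use whichever phase difference has the larger trigonometric factor,
    # so the division below stays well conditioned.
    if abs(math.cos(azimuth)) >= abs(math.sin(azimuth)):
        cos_elevation = phase_x / (k * math.cos(azimuth))
    else:
        cos_elevation = phase_y / (k * math.sin(azimuth))
    # Clamp against measurement noise before taking the arc cosine.
    cos_elevation = max(-1.0, min(1.0, cos_elevation))
    return azimuth, math.acos(cos_elevation)
```

Under this model the common factor k·cos(elevation) cancels in the ratio, which is why the azimuth can be computed first and the elevation recovered afterwards from a single phase difference.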

Type: Grant

Filed: August 9, 2022

Date of Patent: January 28, 2025

Assignee: ALPS ALPINE CO., LTD.

Inventors: Taiki Igarashi, Mitsunobu Inoue, Naoya Shimada, Daisuke Takai
Language agnostic multilingual end-to-end streaming on-device ASR system

Patent number: 12183322

Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.

Type: Grant

Filed: September 22, 2022

Date of Patent: December 31, 2024

Assignee: Google LLC

Inventors: Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
Automatic detection of neurocognitive impairment based on a speech sample

Patent number: 12161481

Abstract: The invention is a method for automatic detection of neurocognitive impairment, comprising: generating, in a segmentation and labelling step (11), a labelled segment series (26) from a speech sample (22) using a speech recognition unit (24); and generating from the labelled segment series (26), in an acoustic parameter calculation step (12), acoustic parameters (30) characterizing the speech sample (22).

Type: Grant

Filed: December 16, 2019

Date of Patent: December 10, 2024

Assignee: Szegedi Tudományegyetem

Inventors: Gábor Gosztolya, Ildikó Hoffmann, János Kálmán, Magdolna Pákáski, László Tóth, Veronika Vincze
Method and apparatus for voice recognition in mixed audio based on pitch features using network models, and storage medium

Patent number: 12119012

Abstract: The present disclosure relates to a method and an apparatus for audio processing and a storage medium. The method includes: obtaining an audio mixing feature of a target object, in which the audio mixing feature at least includes: a voiceprint feature and a pitch feature of the target object; and determining a target audio matching with the target object in the mixed audio according to the audio mixing feature.

Type: Grant

Filed: June 21, 2021

Date of Patent: October 15, 2024

Assignee: Beijing Xiaomi Pinecone Electronics Co., Ltd.

Inventors: Na Xu, Yongtao Jia, Linzhang Wang
Multimodal speech recognition method and system, and computer-readable storage medium

Patent number: 12112744

Abstract: The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficient into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user.

Type: Grant

Filed: March 2, 2022

Date of Patent: October 8, 2024

Assignee: Zhejiang University

Inventors: Feng Lin, Tiantian Liu, Ming Gao, Chao Wang, Zhongjie Ba, Jinsong Han, Wenyao Xu, Kui Ren
Pitch emphasis apparatus, method, program, and recording medium for the same

Patent number: 12100410

Abstract: As pitch enhancement processing, a pitch enhancement apparatus obtains, for a time segment judged to be a time segment including a signal that is a consonant, for each time of the time segment, as an output signal, a signal including a signal obtained by adding a signal, which was obtained by multiplying a signal at a time that is an earlier time than the time by the number of samples T0 corresponding to a pitch period of the time segment, the pitch gain σ0 of the time segment, a predetermined constant B0, and a value that is greater than 0 and less than 1, and a signal at the time.

Type: Grant

Filed: March 22, 2019

Date of Patent: September 24, 2024

Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventors: Yutaka Kamamoto, Ryosuke Sugiura, Takehiro Moriya
Methods and apparatus to identify media that has been pitch shifted, time shifted, and/or resampled

Patent number: 12061646

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to identify media that has been pitch shifted, time shifted, and/or resampled. An example apparatus includes: memory; instructions in the apparatus; and processor circuitry to execute the instructions to: transmit a fingerprint of an audio signal and adjusting instructions to a central facility to facilitate a query, the adjusting instructions identifying at least one of a pitch shift, a time shift, or a resample ratio; obtain a response including an identifier for the audio signal and information corresponding to how the audio signal was adjusted; and change the adjusting instructions based on the information.

Type: Grant

Filed: July 5, 2023

Date of Patent: August 13, 2024

Assignee: GRACENOTE, INC.

Inventors: Robert Coover, Matthew James Wilkinson, Jeffrey Scott, Yongju Hong
Computerized systems and methods of data compression

Patent number: 12050557

Abstract: A computerized system and method of compressing symbolic information organized into a plurality of documents, each document having a plurality of symbols, the system and method including: (i) automatically identifying a plurality of sequential (also referred to as adjacent) and/or non-sequential symbol (also referred to as non-adjacent) pairs in an input document; (ii) counting the number of appearances of each unique symbol pair; and (iii) producing a compressed document that includes a replacement symbol at each position associated with one of the plurality of symbol pairs, at least one of which corresponds to a non-sequential symbol pair. For each non-sequential pair the compressed document includes corresponding indicia indicating a distance between locations of the non-sequential symbols of the pair in the input document.

Type: Grant

Filed: November 22, 2021

Date of Patent: July 30, 2024

Inventor: Takashi Suzuki
End-of-talk prediction device, end-of-talk prediction method, and non-transitory computer readable recording medium

Patent number: 11996119

Abstract: The end-of-talk prediction device (10) of the present invention comprises: a divide unit (11) for dividing, using delimiter symbols indicating delimitations within segments, a string in which the utterance in the dialog has been text-converted by speech recognition, the delimiter symbols included in the result of the speech recognition; and an end-of-talk prediction unit (12) for predicting, using an end-of-talk prediction model (14), whether the utterance corresponding to the divided string divided by the divide unit (11) is an end-of-talk utterance of the speaker.

Type: Grant

Filed: August 14, 2019

Date of Patent: May 28, 2024

Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventors: Setsuo Yamada, Yoshiaki Noda, Takaaki Hasegawa
Method and apparatus for detecting voice end point using acoustic and language modeling information for robust voice

Patent number: 11972751

Abstract: Disclosed are a method and an apparatus for detecting a voice end point by using acoustic and language modeling information to accomplish strong voice recognition. A voice end point detection method according to an embodiment may comprise the steps of: inputting an acoustic feature vector sequence extracted from a microphone input signal into an acoustic embedding extraction unit, a phonemic embedding extraction unit, and a decoder embedding extraction unit, which are based on a recurrent neural network (RNN); combining acoustic embedding, phonemic embedding, and decoder embedding to configure a feature vector by the acoustic embedding extraction unit, the phonemic embedding extraction unit, and the decoder embedding extraction unit; and inputting the combined feature vector into a deep neural network (DNN)-based classifier to detect a voice end point.

Type: Grant

Filed: June 29, 2020

Date of Patent: April 30, 2024

Assignee: IUCF-HYU (INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY)

Inventors: Joon-Hyuk Chang, Inyoung Hwang
Image editing with audio data

Patent number: 11921916

Abstract: Image editing on a wearable device includes a system which obtains sensor data via the wearable device. The sensor data includes a representation of hand movement, head movement or voice command associated with a user. The system executes an application for editing an image based on the obtained sensor data. The system provides for display a list of image adjustment types associated with the application. The system selects an image adjustment type based on one or more of the hand movement, the head movement or the voice command. The system provides for display a prompt having options to adjust a property of the selected image adjustment type. The system selects one of the options included in the prompt. The system modifies an image based on the selected option. The system then provides the modified image for storage in a data structure of a memory unit in the wearable device.

Type: Grant

Filed: December 31, 2020

Date of Patent: March 5, 2024

Assignee: Google LLC

Inventors: Thomas Binder, Ronald Frank Wotzlaw
Speaker adaptive end of speech detection for conversational AI applications

Patent number: 11817117

Abstract: In various examples, end of speech (EOS) for an audio signal is determined based at least in part on a rate of speech for a speaker. For a segment of the audio signal, EOS is indicated based at least in part on an EOS threshold determined based at least in part on the rate of speech for the speaker.
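
A minimal sketch of a rate-dependent EOS threshold follows. The abstract only says that the threshold depends on the speaker's rate of speech; the inverse scaling, the constants, and the names `eos_threshold_ms` and `is_end_of_speech` are assumptions made for illustration.

```python
def eos_threshold_ms(words_per_second, base_ms=800.0, reference_rate=2.5,
                     min_ms=300.0, max_ms=1500.0):
    """Silence duration (ms) required before declaring end of speech.

    Assumption: the threshold scales inversely with speech rate, so slower
    speakers are given longer pauses. All constants are illustrative.
    """
    if words_per_second <= 0:
        # No usable rate estimate yet: fall back to the most patient setting.
        return max_ms
    scaled = base_ms * reference_rate / words_per_second
    # Keep the threshold inside a sane operating range.
    return max(min_ms, min(max_ms, scaled))

def is_end_of_speech(trailing_silence_ms, words_per_second):
    """True when the trailing silence reaches the speaker-specific threshold."""
    return trailing_silence_ms >= eos_threshold_ms(words_per_second)
```

With these illustrative constants, a 500 ms pause ends the turn for a speaker at 5 words per second but not for one at 2.5 words per second.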

Type: Grant

Filed: January 29, 2021

Date of Patent: November 14, 2023

Assignee: NVIDIA CORPORATION

Inventors: Utkarsh Vaidya, Ravindra Yeshwant Lokhande, Viraj Gangadhar Karandikar, Niranjan Rajendra Wartikar, Sumit Kumar Bhattacharya
Systems and methods for intelligent voice activation for auto-mixing

Patent number: 11798575

Abstract: Embodiments allow for an auto-mixer to gate microphones on and off based on speech detection, without losing or discarding the speech received during the speech recognition period. An example method includes receiving and storing an input audio signal. The method also includes determining, based on a first segment of the input audio signal, that the input audio signal comprises speech, and determining a delay between the input audio signal and a corresponding output audio signal provided to a speaker. The method also includes reducing the delay, wherein reducing the delay comprises removing one or more segments of the stored input audio signal to create a time-compressed audio signal and providing the time-compressed audio signal as the corresponding output audio signal. The method also includes determining that the delay is less than a threshold duration, and responsively providing the input audio signal as the corresponding output audio signal.

Type: Grant

Filed: May 3, 2021

Date of Patent: October 24, 2023

Assignee: Shure Acquisition Holdings, Inc.

Inventors: Michael Ryan Lester, Jose Roberto Regalbuto, David Grant Cason
Method and apparatus with speech processing

Patent number: 11776529

Abstract: A method includes determining a target segment partially overlapping a preceding segment from a speech signal, determining a target character sequence corresponding to the target segment by decoding the target segment, identifying a first overlapping portion between the target character sequence and a preceding character sequence based on an edit distance, and merging the target character sequence and the preceding character sequence based on the first overlapping portion. A cost applied to the edit distance is determined based on any one or any combination of any two or more of a type of operation performed at the edit distance, whether characters to be operated are located in the first overlapping portion, and whether the characters to be operated match. A portion overlapping the preceding segment in the target segment is greater than or equal to 8.3% of the target segment.
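
The idea of locating the overlap by edit distance and then merging can be sketched as below. This is a simplified stand-in, not the claimed method: it uses a plain unit-cost Levenshtein distance rather than the position- and operation-dependent costs the abstract describes, and the names `edit_distance` and `merge_overlapping` and the 0.5 acceptance limit are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def merge_overlapping(preceding, target):
    """Merge two decoded strings whose ends overlap.

    Tries every overlap length k, scores the last k characters of
    `preceding` against the first k characters of `target` by normalized
    edit distance, and keeps the best-scoring k (ties go to the longer one).
    """
    best_k, best_score = 0, 0.5  # assumed limit: at most half may differ
    for k in range(1, min(len(preceding), len(target)) + 1):
        score = edit_distance(preceding[-k:], target[:k]) / k
        if score <= best_score:
            best_k, best_score = k, score
    # Keep the preceding text for the overlap, append the rest of the target.
    return preceding + target[best_k:]
```

Because the match is approximate, the overlap is still found when the two decodings disagree on a character. This sketch keeps the preceding segment's version of the overlapping text.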

Type: Grant

Filed: July 7, 2021

Date of Patent: October 3, 2023

Assignee: Samsung Electronics Co., Ltd.

Inventor: Tae Gyoon Kang
Signal processing apparatus, signal processing method, and signal processing program

Patent number: 11769517

Abstract: This invention provides a signal processing apparatus capable of obtaining an output signal of sufficiently high quality if the phase of an input signal is largely different from the phase of a true voice. The signal processing apparatus includes a voice detector that receives a mixed signal including a voice and a signal other than the voice and obtains existence of the voice as a voice flag, a corrector that receives the mixed signal and the voice flag and obtains a corrected mixed signal generated by correcting the mixed signal in accordance with a state of the voice flag, and a shaper that receives the corrected mixed signal and shapes the corrected mixed signal.

Type: Grant

Filed: August 24, 2018

Date of Patent: September 26, 2023

Assignees: NEC CORPORATION, NEC Platforms, Ltd.

Inventors: Akihiko Sugiyama, Ryoji Miyahara
Silent phonemes for tracking end of speech

Patent number: 11727917

Abstract: Embodiments describe a method for speech endpoint detection including receiving identification data for a first state associated with a first frame of speech data from a WFST language model, determining that the first frame of the speech data includes silence data, incrementing a silence counter associated with the first state, copying a value of the silence counter of the first state to a corresponding silence counter field in a second state associated with the first state in an active state list, and determining that the value of the silence counter for the first state is above a silence threshold. The method further includes determining that an endpoint of the speech has occurred in response to determining that the silence counter is above the silence threshold, and outputting text data representing a plurality of words determined from the speech data that was received prior to the endpoint.
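
The counter-and-threshold logic can be shown in a reduced form. The sketch below keeps a single silence counter over a stream of per-frame silence flags. It deliberately leaves out the WFST decoder and the per-state counter copying described in the abstract, and the name `find_endpoint` is hypothetical.

```python
def find_endpoint(frame_is_silence, silence_threshold):
    """Return the index of the frame at which the endpoint is declared.

    frame_is_silence: iterable of booleans, one per audio frame.
    silence_threshold: the endpoint fires once the run of consecutive
    silent frames is strictly above this count.
    Returns None if no endpoint is reached.
    """
    silence_run = 0
    for index, is_silence in enumerate(frame_is_silence):
        # Count consecutive silent frames; any speech frame resets the run.
        silence_run = silence_run + 1 if is_silence else 0
        if silence_run > silence_threshold:
            return index
    return None
```

In a decoder, the frames before the returned index would be the ones whose recognized words are emitted as text.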

Type: Grant

Filed: May 24, 2021

Date of Patent: August 15, 2023

Assignee: Amazon Technologies, Inc.

Inventor: Pushkaraksha Gejji
Method and apparatus with speech processing

Patent number: 11721323

Abstract: A method includes determining a target segment from a speech signal, determining a target character sequence corresponding to the target segment by decoding the target segment, identifying a first overlapping portion between the target character sequence and a preceding character sequence based on an edit distance, and merging the target character sequence and the preceding character sequence based on the first overlapping portion. A cost applied to the edit distance is determined based on any one or any combination of any two or more of a type of operation performed at the edit distance, whether characters to be operated are located in the first overlapping portion, and whether the characters to be operated match.

Type: Grant

Filed: October 29, 2020

Date of Patent: August 8, 2023

Assignee: Samsung Electronics Co., Ltd.

Inventor: Tae Gyoon Kang
Mechanism for retrieval of previously captured audio

Patent number: 11662974

Abstract: In one aspect a device-side audio handling input/output unit (DIO) of a hardware device writes audio data generated by the hardware device within a ring buffer. An input provided by a user for activation of a software program is received, and a notification that the software program is ready to accept the audio data is generated. A system-side audio handling input/output unit (SIO) additionally provides past audio data from the ring buffer to the software program. Other aspects also are described.
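
A ring buffer that retains the most recent audio can be sketched in a few lines. The class `AudioRingBuffer` and its methods are hypothetical names, the frames are opaque Python objects, and the split between device-side and system-side units is not modeled.

```python
from collections import deque

class AudioRingBuffer:
    """Fixed-capacity buffer that always holds the newest frames."""

    def __init__(self, capacity_frames):
        # A deque with maxlen discards the oldest item when it is full.
        self._frames = deque(maxlen=capacity_frames)

    def write(self, frame):
        """Called by the producer for every captured frame."""
        self._frames.append(frame)

    def snapshot(self):
        """Return the buffered past frames, oldest first."""
        return list(self._frames)
```

A consumer that starts late can call `snapshot()` once it is ready and receive the audio captured before it was activated, up to the buffer capacity.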

Type: Grant

Filed: March 26, 2021

Date of Patent: May 30, 2023

Assignee: Apple Inc.

Inventors: Jeffrey C. Moore, Richard M. Powell, Alexander C. Powers, Anthony J. Guetta
Voice driven dynamic menus

Patent number: 11640227

Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.

Type: Grant

Filed: October 20, 2020

Date of Patent: May 2, 2023

Assignee: SNAP INC.

Inventor: Jesse Chand
Method and electronic apparatus for detecting tampering audio, and storage medium

Patent number: 11636871

Abstract: Disclosed are a method, an electronic apparatus for detecting tampering audio and a storage medium. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to be detected so as to obtain a first low-frequency coefficient and a first high-frequency coefficient corresponding to the signal to be detected, the number of which is equal to that of the first preset order; performing an inverse wavelet transform on the first high-frequency coefficient having an order greater than or equal to a second preset order so as to obtain a first high-frequency component signal corresponding to the signal to be detected; calculating a first Mel cepstrum feature of the first high-frequency component signal in units of frame, and concatenating the first Mel cepstrum features of a current frame signal and a preset number of frame signals.

Type: Grant

Filed: February 8, 2022

Date of Patent: April 25, 2023

Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Inventors: Jianhua Tao, Shan Liang, Shuai Nie, Jiangyan Yi
Crash detection system comprising a microphone, first and second batteries, and first and second antennas and a method of operating the same

Patent number: 11601542

Abstract: A crash detection system for a vehicle comprises first and second batteries and a computational system comprising a processor and a non-transitory computer-readable medium. First and second antennas are both in electronic communication with the computational system and powered by one of the batteries, with the antennas configured to independently wirelessly communicate with an external network. A microphone is powered by one of the batteries and is in electronic communication with the computational system. The microphone continuously receives sound waves in real time and transmits sound signals to the processor. The processor monitors properties of the sound waves within the sound signals, compares the properties to thresholds stored in the non-transitory computer-readable medium, determines that the vehicle has been involved in a collision if at least one of the properties crosses the respective threshold, and communicates with the external network to report the collision with one of the antennas.

Type: Grant

Filed: September 22, 2021

Date of Patent: March 7, 2023

Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC

Inventors: Alex Jose Veloso, Mateus Amstalden Santa Rosa, Russell A. Patenaude, Matthew Edward Gilbert-Eyres, Dipankar Pal
Method and apparatus for processing speech

Patent number: 11488603

Abstract: Embodiments of the present disclosure provide a method and apparatus for processing speech. The method may include: acquiring an original speech; performing speech recognition on the original speech, to obtain an original text corresponding to the original speech; associating a speech segment in the original speech with a text segment in the original text; recognizing an abnormal segment in the original speech and/or the original text; and processing a text segment indicated by the abnormal segment in the original text and/or the speech segment indicated by the abnormal segment in the original speech, to generate a final speech. A speech segment in the original speech is associated with a text segment in the original text to realize visual processing of the speech.

Type: Grant

Filed: December 11, 2019

Date of Patent: November 1, 2022

Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.

Inventors: Wanqi Tang, Jiamei Kang, Lixia Zeng, Yijing Zhou, Hanmei Xie, Lina Zhu
Low complexity auditory event boundary detection

Patent number: 8938313

Abstract: An auditory event boundary detector employs down-sampling of the input digital audio signal without an anti-aliasing filter, resulting in a narrower bandwidth intermediate signal with aliasing. Spectral changes of that intermediate signal, indicating event boundaries, may be detected using an adaptive filter to track a linear predictive model of the samples of the intermediate signal. Changes in the magnitude or power of the filter error correspond to changes in the spectrum of the input audio signal. The adaptive filter converges at a rate consistent with the duration of auditory events, so filter error magnitude or power changes indicate event boundaries. The detector is much less complex than methods employing time-to-frequency transforms for the full bandwidth of the audio signal.

Type: Grant

Filed: April 12, 2010

Date of Patent: January 20, 2015

Assignee: Dolby Laboratories Licensing Corporation

Inventor: Glenn N. Dickins
System and method for an endpoint detection of speech for improved speech recognition in noisy environments

Publication number: 20100030559

Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
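
The two-stage flow in this abstract (model the background from a first portion, then classify a second portion by energy contrast and feature distance) can be sketched as follows. The decision rule, the ratio constants, and the names `background_model` and `classify_portion` are assumptions, and the feature vectors stand in for cepstral features that would come from a separate front end.

```python
import math

def background_model(frames):
    """Summarize the first portion of the signal.

    frames: list of (energy, feature_vector) pairs.
    Returns (mean energy, mean feature vector, average distance of the
    frames from that mean vector).
    """
    n = len(frames)
    energy = sum(e for e, _ in frames) / n
    dim = len(frames[0][1])
    mean = [sum(f[i] for _, f in frames) / n for i in range(dim)]
    avg_distance = sum(math.dist(f, mean) for _, f in frames) / n
    return energy, mean, avg_distance

def classify_portion(energy, features, model,
                     energy_ratio=2.0, distance_ratio=1.5):
    """Label a later portion as 'speech' or 'non-speech'.

    Assumed rule: speech must be both louder than the background and
    farther from the background features than the background itself is.
    """
    bg_energy, bg_mean, bg_distance = model
    louder = energy > energy_ratio * bg_energy
    different = math.dist(features, bg_mean) > distance_ratio * bg_distance
    return "speech" if louder and different else "non-speech"
```

Requiring both conditions is one way to make the decision less sensitive to loud but spectrally background-like noise. The abstract itself does not say how the contrast and the comparison are combined.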

Type: Application

Filed: June 25, 2009

Publication date: February 4, 2010

Applicant: MINDSPEED TECHNOLOGIES, INC.

Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh