End Point Detection (epo) Patents (Class 704/E11.005)
-
Patent number: 12236950
Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
Type: Grant
Filed: January 3, 2023
Date of Patent: February 25, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
-
Patent number: 12210085
Abstract: An angle measuring device includes an antenna device having antenna elements equally spaced along a first axis and a second axis, respectively, a selecting unit that selects phase differences with which a variance thereof becomes a predetermined value or less, from a plurality of phase differences of signals received from a transmission device by the antenna elements, an azimuth angle computing unit that computes an azimuth angle of the transmission device from a ratio of a first phase difference between signals received by two antenna elements spaced by a predetermined distance along the first axis and a second phase difference between signals received by two antenna elements spaced by the predetermined distance along the second axis, and an elevation angle computing unit that computes an elevation angle of the transmission device, based on the computed azimuth angle and the first or second phase difference.
Type: Grant
Filed: August 9, 2022
Date of Patent: January 28, 2025
Assignee: ALPS ALPINE CO., LTD.
Inventors: Taiki Igarashi, Mitsunobu Inoue, Naoya Shimada, Daisuke Takai
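The azimuth-from-ratio idea in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a far-field source, two orthogonal axes in the antenna plane, elevation measured up from that plane, and unwrapped phase differences; the function name `estimate_angles` and its parameters are hypothetical.

```python
import math

def estimate_angles(dphi_first, dphi_second, spacing, wavelength):
    """Return (azimuth, elevation) in radians from two phase differences.

    Assumed model (not taken from the patent text):
        dphi_first  = k * d * cos(elevation) * cos(azimuth)
        dphi_second = k * d * cos(elevation) * sin(azimuth)
    with k = 2*pi / wavelength and d = element spacing.
    """
    k = 2.0 * math.pi / wavelength
    # The ratio of the two phase differences is tan(azimuth).
    azimuth = math.atan2(dphi_second, dphi_first)
    # Recover cos(elevation) from whichever axis is better conditioned.
    if abs(math.cos(azimuth)) >= abs(math.sin(azimuth)):
        cos_el = dphi_first / (k * spacing * math.cos(azimuth))
    else:
        cos_el = dphi_second / (k * spacing * math.sin(azimuth))
    cos_el = max(-1.0, min(1.0, cos_el))  # guard against measurement noise
    return azimuth, math.acos(cos_el)
```

The variance-based selection of phase differences described in the abstract is omitted here; in this sketch the caller is assumed to pass values that have already been selected.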
-
Patent number: 12183322
Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.
Type: Grant
Filed: September 22, 2022
Date of Patent: December 31, 2024
Assignee: Google LLC
Inventors: Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
-
Patent number: 12161481
Abstract: The invention is a method for automatic detection of neurocognitive impairment, comprising, generating, in a segmentation and labelling step (11), a labelled segment series (26) from a speech sample (22) using a speech recognition unit (24); and generating from the labelled segment series (26), in an acoustic parameter calculation step (12), acoustic parameters (30) characterizing the speech sample (22).
Type: Grant
Filed: December 16, 2019
Date of Patent: December 10, 2024
Assignee: Szededi Tudomanyegyetem
Inventors: Gábor Gosztolya, Ildikó Hoffmann, János Kálmán, Magdolna Pákáski, László Tóth, Veronika Vincze
-
Patent number: 12119012
Abstract: The present disclosure relates to a method and an apparatus for audio processing and a storage medium. The method includes: obtaining an audio mixing feature of a target object, in which the audio mixing feature at least includes: a voiceprint feature and a pitch feature of the target object; and determining a target audio matching with the target object in the mixed audio according to the audio mixing feature.
Type: Grant
Filed: June 21, 2021
Date of Patent: October 15, 2024
Assignee: Beijing Xiaomi Pinecone Electronics Co., Ltd.
Inventors: Na Xu, Yongtao Jia, Linzhang Wang
-
Patent number: 12112744
Abstract: The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficient into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user.
Type: Grant
Filed: March 2, 2022
Date of Patent: October 8, 2024
Assignee: Zhejiang University
Inventors: Feng Lin, Tiantian Liu, Ming Gao, Chao Wang, Zhongjie Ba, Jinsong Han, Wenyao Xu, Kui Ren
-
Patent number: 12100410
Abstract: As pitch enhancement processing, a pitch enhancement apparatus obtains, for a time segment judged to be a time segment including a signal that is a consonant, for each time of the time segment, as an output signal, a signal including a signal obtained by adding a signal, which was obtained by multiplying a signal at a time that is an earlier time than the time by the number of samples T0 corresponding to a pitch period of the time segment, the pitch gain σ0 of the time segment, a predetermined constant B0, and a value that is greater than 0 and less than 1, and a signal at the time.
Type: Grant
Filed: March 22, 2019
Date of Patent: September 24, 2024
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Yutaka Kamamoto, Ryosuke Sugiura, Takehiro Moriya
-
Patent number: 12061646
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to identify media that has been pitch shifted, time shifted, and/or resampled. An example apparatus includes: memory; instructions in the apparatus; and processor circuitry to execute the instructions to: transmit a fingerprint of an audio signal and adjusting instructions to a central facility to facilitate a query, the adjusting instructions identifying at least one of a pitch shift, a time shift, or a resample ratio; obtain a response including an identifier for the audio signal and information corresponding to how the audio signal was adjusted; and change the adjusting instructions based on the information.
Type: Grant
Filed: July 5, 2023
Date of Patent: August 13, 2024
Assignee: GRACENOTE, INC.
Inventors: Robert Coover, Matthew James Wilkinson, Jeffrey Scott, Yongju Hong
-
Patent number: 12050557
Abstract: A computerized system and method of compressing symbolic information organized into a plurality of documents, each document having a plurality of symbols, the system and method including: (i) automatically identifying a plurality of sequential (also referred to as adjacent) and/or non-sequential symbol (also referred to as non-adjacent) pairs in an input document; (ii) counting the number of appearances of each unique symbol pair; and (iii) producing a compressed document that includes a replacement symbol at each position associated with one of the plurality of symbol pairs, at least one of which corresponds to a non-sequential symbol pair. For each non-sequential pair the compressed document includes corresponding indicia indicating a distance between locations of the non-sequential symbols of the pair in the input document.
Type: Grant
Filed: November 22, 2021
Date of Patent: July 30, 2024
Inventor: Takashi Suzuki
-
Patent number: 11996119
Abstract: The end-of-talk prediction device (10) of the present invention comprises: a divide unit (11) for dividing, using delimiter symbols indicating delimitations within segments, a string in which the utterance in the dialog has been text-converted by speech recognition, the delimiter symbols included in the result of the speech recognition; and an end-of-talk prediction unit (12) for predicting, using an end-of-talk prediction model (14), whether the utterance corresponding to the divided string divided by the divide unit (11) is an end-of-talk utterance of the speaker.
Type: Grant
Filed: August 14, 2019
Date of Patent: May 28, 2024
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Setsuo Yamada, Yoshiaki Noda, Takaaki Hasegawa
-
Patent number: 11972751
Abstract: Disclosed are a method and an apparatus for detecting a voice end point by using acoustic and language modeling information to accomplish strong voice recognition. A voice end point detection method according to an embodiment may comprise the steps of: inputting an acoustic feature vector sequence extracted from a microphone input signal into an acoustic embedding extraction unit, a phonemic embedding extraction unit, and a decoder embedding extraction unit, which are based on a recurrent neural network (RNN); combining acoustic embedding, phonemic embedding, and decoder embedding to configure a feature vector by the acoustic embedding extraction unit, the phonemic embedding extraction unit, and the decoder embedding extraction unit; and inputting the combined feature vector into a deep neural network (DNN)-based classifier to detect a voice end point.
Type: Grant
Filed: June 29, 2020
Date of Patent: April 30, 2024
Assignee: IUCF-HYU (INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY)
Inventors: Joon-Hyuk Chang, Inyoung Hwang
-
Patent number: 11921916
Abstract: Image editing on a wearable device includes a system which obtains sensor data via the wearable device. The sensor data includes a representation of hand movement, head movement or voice command associated with a user. The system executes an application for editing an image based on the obtained sensor data. The system provides for display a list of image adjustment types associated with the application. The system selects an image adjustment type based on one or more of the hand movement, the head movement or the voice command. The system provides for display a prompt having options to adjust a property of the selected image adjustment type. The system selects one of the options included in the prompt. The system modifies an image based on the selected option. The system then provides the modified image for storage in a data structure of a memory unit in the wearable device.
Type: Grant
Filed: December 31, 2020
Date of Patent: March 5, 2024
Assignee: Google LLC
Inventors: Thomas Binder, Ronald Frank Wotzlaw
-
Patent number: 11817117
Abstract: In various examples, end of speech (EOS) for an audio signal is determined based at least in part on a rate of speech for a speaker. For a segment of the audio signal, EOS is indicated based at least in part on an EOS threshold determined based at least in part on the rate of speech for the speaker.
Type: Grant
Filed: January 29, 2021
Date of Patent: November 14, 2023
Assignee: NVIDIA CORPORATION
Inventors: Utkarsh Vaidya, Ravindra Yeshwant Lokhande, Viraj Gangadhar Karandikar, Niranjan Rajendra Wartikar, Sumit Kumar Bhattacharya
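One way to picture a rate-dependent EOS threshold is the sketch below. The abstract does not say how the threshold is derived, so the inverse scaling, the default constants, and the names `eos_threshold_ms` and `is_end_of_speech` are all assumptions made for illustration.

```python
def eos_threshold_ms(words_per_second, base_ms=500.0, reference_rate=2.5,
                     min_ms=200.0, max_ms=1500.0):
    """Silence duration (ms) required before declaring end of speech.

    Assumption: faster talkers pause more briefly, so the threshold scales
    inversely with speech rate and is clamped to [min_ms, max_ms].
    """
    if words_per_second <= 0:
        return max_ms  # no rate estimate yet: be conservative
    scaled = base_ms * reference_rate / words_per_second
    return max(min_ms, min(max_ms, scaled))


def is_end_of_speech(trailing_silence_ms, words_per_second):
    """True when the trailing silence reaches the speaker-specific threshold."""
    return trailing_silence_ms >= eos_threshold_ms(words_per_second)
```

With these assumed defaults, a 300 ms pause ends the utterance for a speaker at 5 words per second but not for one at 2.5 words per second.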
-
Patent number: 11798575
Abstract: Embodiments allow for an auto-mixer to gate microphones on and off based on speech detection, without losing or discarding the speech received during the speech recognition period. An example method includes receiving and storing an input audio signal. The method also includes determining, based on a first segment of the input audio signal, that the input audio signal comprises speech, and determining a delay between the input audio signal and a corresponding output audio signal provided to a speaker. The method also includes reducing the delay, wherein reducing the delay comprises removing one or more segments of the stored input audio signal to create a time-compressed audio signal and providing the time-compressed audio signal as the corresponding output audio signal. The method also includes determining that the delay is less than a threshold duration, and responsively providing the input audio signal as the corresponding output audio signal.
Type: Grant
Filed: May 3, 2021
Date of Patent: October 24, 2023
Assignee: Shure Acquisition Holdings, Inc.
Inventors: Michael Ryan Lester, Jose Roberto Regalbuto, David Grant Cason
-
Patent number: 11776529
Abstract: A method includes determining a target segment partially overlapping a preceding segment from a speech signal, determining a target character sequence corresponding to the target segment by decoding the target segment, identifying a first overlapping portion between the target character sequence and a preceding character sequence based on an edit distance, and merging the target character sequence and the preceding character sequence based on the first overlapping portion. A cost applied to the edit distance is determined based on any one or any combination of any two or more of a type of operation performed at the edit distance, whether characters to be operated are located in the first overlapping portion, and whether the characters to be operated match. A portion overlapping the preceding segment in the target segment is greater than or equal to 8.3% of the target segment.
Type: Grant
Filed: July 7, 2021
Date of Patent: October 3, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventor: Tae Gyoon Kang
-
Patent number: 11769517
Abstract: This invention provides a signal processing apparatus capable of obtaining an output signal of sufficiently high quality if the phase of an input signal is largely different from the phase of a true voice. The signal processing apparatus includes a voice detector that receives a mixed signal including a voice and a signal other than the voice and obtains existence of the voice as a voice flag, a corrector that receives the mixed signal and the voice flag and obtains a corrected mixed signal generated by correcting the mixed signal in accordance with a state of the voice flag, and a shaper that receives the corrected mixed signal and shapes the corrected mixed signal.
Type: Grant
Filed: August 24, 2018
Date of Patent: September 26, 2023
Assignees: NEC CORPORATION, NEC Platforms, Ltd.
Inventors: Akihiko Sugiyama, Ryoji Miyahara
-
Patent number: 11727917
Abstract: Embodiments describe a method for speech endpoint detection including receiving identification data for a first state associated with a first frame of speech data from a WFST language model, determining that the first frame of the speech data includes silence data, incrementing a silence counter associated with the first state, copying a value of the silence counter of the first state to a corresponding silence counter field in a second state associated with the first state in an active state list, and determining that the value of the silence counter for the first state is above a silence threshold. The method further includes determining that an endpoint of the speech has occurred in response to determining that the silence counter is above the silence threshold, and outputting text data representing a plurality of words determined from the speech data that was received prior to the endpoint.
Type: Grant
Filed: May 24, 2021
Date of Patent: August 15, 2023
Assignee: Amazon Technologies, Inc.
Inventor: Pushkaraksha Gejji
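The silence-counter idea can be shown in a deliberately reduced form. The sketch below keeps a single counter over per-frame silence flags and leaves out the WFST states and the copying of counters between active states that the abstract describes; `find_endpoint` and its parameters are hypothetical names.

```python
def find_endpoint(frame_is_silence, silence_threshold):
    """Return the index of the frame at which the endpoint is declared, or None.

    frame_is_silence: iterable of booleans, one per audio frame.
    The counter counts consecutive silent frames and resets on any
    non-silent frame; an endpoint is declared once it exceeds the threshold.
    """
    silence_counter = 0
    for index, silent in enumerate(frame_is_silence):
        silence_counter = silence_counter + 1 if silent else 0
        if silence_counter > silence_threshold:
            return index
    return None
```

In a decoder, the frames received before the returned index would be the ones whose recognized words are output as text.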
-
Patent number: 11721323
Abstract: A method includes determining a target segment from a speech signal, determining a target character sequence corresponding to the target segment by decoding the target segment, identifying a first overlapping portion between the target character sequence and a preceding character sequence based on an edit distance, and merging the target character sequence and the preceding character sequence based on the first overlapping portion. A cost applied to the edit distance is determined based on any one or any combination of any two or more of a type of operation performed at the edit distance, whether characters to be operated are located in the first overlapping portion, and whether the characters to be operated match.
Type: Grant
Filed: October 29, 2020
Date of Patent: August 8, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventor: Tae Gyoon Kang
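A simplified sketch of merging two decoded character sequences through an edit-distance overlap follows. It uses a plain Levenshtein distance with unit costs rather than the operation- and position-dependent costs the abstract describes, and the overlap search, `min_overlap`, and `max_ratio` are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insert, delete, and substitute costs."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def merge_overlapping(preceding, target, min_overlap=3, max_ratio=0.34):
    """Append `target` to `preceding`, dropping the part judged to overlap.

    The overlap length k is chosen so that the last k characters of
    `preceding` and the first k characters of `target` have the lowest
    edit distance per character (ties prefer the longer overlap).
    """
    best_k, best_ratio = 0, None
    for k in range(min_overlap, min(len(preceding), len(target)) + 1):
        ratio = edit_distance(preceding[-k:], target[:k]) / k
        if ratio <= max_ratio and (best_ratio is None or ratio <= best_ratio):
            best_k, best_ratio = k, ratio
    return preceding + target[best_k:]
```

Because a small nonzero distance is tolerated, the overlap can still be found when the two decodings disagree on a character; this sketch then keeps the preceding sequence's version of the overlapping text.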
-
Patent number: 11662974
Abstract: In one aspect a device-side audio handling input/output unit (DIO) of a hardware device writes audio data generated by the hardware device within a ring buffer. An input provided by a user for activation of a software program is received, and a notification that the software program is ready to accept the audio data is generated. A system-side audio handling input/output unit (SIO) additionally provides past audio data from the ring buffer to the software program. Other aspects also are described.
Type: Grant
Filed: March 26, 2021
Date of Patent: May 30, 2023
Assignee: Apple Inc.
Inventors: Jeffrey C. Moore, Richard M. Powell, Alexander C. Powers, Anthony J. Guetta
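The role of the ring buffer, retaining recent audio so it can be handed over once a program becomes ready, can be sketched with a bounded deque. This is only an illustration of the data structure; the class `AudioRingBuffer` and its methods are hypothetical and do not model the DIO and SIO units named in the abstract.

```python
from collections import deque


class AudioRingBuffer:
    """Fixed-capacity buffer that always holds the most recent frames."""

    def __init__(self, capacity_frames):
        # A deque with maxlen silently discards the oldest item when full.
        self._frames = deque(maxlen=capacity_frames)

    def write(self, frame):
        """Producer side: store one newly captured audio frame."""
        self._frames.append(frame)

    def read_past(self):
        """Consumer side: return buffered frames, oldest first."""
        return list(self._frames)
```

A consumer that starts late can call `read_past()` once to recover audio captured before it was ready, then continue with live frames.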
-
Patent number: 11640227
Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
Type: Grant
Filed: October 20, 2020
Date of Patent: May 2, 2023
Assignee: SNAP INC.
Inventor: Jesse Chand
-
Patent number: 11636871
Abstract: Disclosed are a method, an electronic apparatus for detecting tampering audio and a storage medium. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to be detected so as to obtain a first low-frequency coefficient and a first high-frequency coefficient corresponding to the signal to be detected, the number of which is equal to that of the first preset order; performing an inverse wavelet transform on the first high-frequency coefficient having an order greater than or equal to a second preset order so as to obtain a first high-frequency component signal corresponding to the signal to be detected; calculating a first Mel cepstrum feature of the first high-frequency component signal in units of frame, and concatenating the first Mel cepstrum features of a current frame signal and a preset number of frame signals.
Type: Grant
Filed: February 8, 2022
Date of Patent: April 25, 2023
Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Inventors: Jianhua Tao, Shan Liang, Shuai Nie, Jiangyan Yi
-
Patent number: 11601542
Abstract: A crash detection system for a vehicle comprises first and second batteries and a computational system comprising a processor and a non-transitory computer-readable medium. First and second antennas are both in electronic communication with the computational system and powered by one of the batteries, with the antennas configured to independently wirelessly communicate with an external network. A microphone powered by one of the batteries and in electronic communication with the computational system. The microphone continuously receives sound waves in real time and transmits sound signals to the processor. The processor monitors properties of the sound waves within the sound signals, compares the properties to thresholds stored in the non-transitory computer-readable medium, determines if the vehicle has been involved in a collision if at least one of the properties crosses the respective threshold, and communicates with the external network to report the collision with one of the antennas.
Type: Grant
Filed: September 22, 2021
Date of Patent: March 7, 2023
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: Alex Jose Veloso, Mateus Amstalden Santa Rosa, Russell A. Patenaude, Matthew Edward Gilbert-Eyres, Dipankar Pal
-
Patent number: 11488603
Abstract: Embodiments of the present disclosure provide a method and apparatus for processing a speech. The method may include: acquiring an original speech; performing speech recognition on the original speech, to obtain an original text corresponding to the original speech; associating a speech segment in the original speech with a text segment in the original text; recognizing an abnormal segment in the original speech and/or the original text; and processing a text segment indicated by the abnormal segment in the original text and/or the speech segment indicated by the abnormal segment in the original speech, to generate a final speech. A speech segment in the original speech is associated with a text segment in the original text to realize visual processing of the speech.
Type: Grant
Filed: December 11, 2019
Date of Patent: November 1, 2022
Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
Inventors: Wanqi Tang, Jiamei Kang, Lixia Zeng, Yijing Zhou, Hanmei Xie, Lina Zhu
-
Patent number: 8938313
Abstract: An auditory event boundary detector employs down-sampling of the input digital audio signal without an anti-aliasing filter, resulting in a narrower bandwidth intermediate signal with aliasing. Spectral changes of that intermediate signal, indicating event boundaries, may be detected using an adaptive filter to track a linear predictive model of the samples of the intermediate signal. Changes in the magnitude or power of the filter error correspond to changes in the spectrum of the input audio signal. The adaptive filter converges at a rate consistent with the duration of auditory events, so filter error magnitude or power changes indicate event boundaries. The detector is much less complex than methods employing time-to-frequency transforms for the full bandwidth of the audio signal.
Type: Grant
Filed: April 12, 2010
Date of Patent: January 20, 2015
Assignee: Dolby Laboratories Licensing Corporation
Inventor: Glenn N. Dickins
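The chain described above (decimate without filtering, track a linear predictor adaptively, watch the prediction error) can be sketched as follows. The normalized-LMS update, the filter order, the step size, and the fixed error threshold are all assumptions made for illustration; the abstract does not specify them, and all function names are hypothetical.

```python
import math


def decimate_without_filter(samples, factor):
    """Keep every `factor`-th sample with no anti-aliasing filter (aliasing expected)."""
    return samples[::factor]


def nlms_prediction_errors(signal, order=2, step=0.5, eps=1e-8):
    """Adapt a linear predictor with normalized LMS; return per-sample errors."""
    weights = [0.0] * order
    errors = [0.0] * len(signal)
    for n in range(order, len(signal)):
        past = [signal[n - k] for k in range(1, order + 1)]
        predicted = sum(w * x for w, x in zip(weights, past))
        err = signal[n] - predicted
        errors[n] = err
        norm = sum(x * x for x in past) + eps
        weights = [w + step * err * x / norm for w, x in zip(weights, past)]
    return errors


def first_boundary(errors, threshold, warmup):
    """Index of the first error magnitude above threshold after a warm-up period."""
    for n in range(warmup, len(errors)):
        if abs(errors[n]) > threshold:
            return n
    return None


def two_tone(n_samples, switch_at, w1, w2):
    """Test signal: a sinusoid whose frequency (rad/sample) changes at `switch_at`."""
    phase, out = 0.0, []
    for n in range(n_samples):
        phase += w1 if n < switch_at else w2
        out.append(math.sin(phase))
    return out
```

On a tone that changes frequency, the predictor's error stays near zero once it has converged and jumps when the spectrum changes, which is the cue that `first_boundary` looks for.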
-
Publication number: 20100030559
Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
Type: Application
Filed: June 25, 2009
Publication date: February 4, 2010
Applicant: MINDSPEED TECHNOLOGIES, INC.
Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
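The two-cue decision in the abstract above (energy contrast plus feature distance) can be sketched roughly as below. The abstract does not define how the distances are measured, so this sketch assumes Euclidean distance from the centroid of the background feature vectors, and the ratio parameters and function names are hypothetical.

```python
import math


def calibrate_background(energies, feature_vectors):
    """Summarize the first (background) portion of the signal.

    Returns (mean energy, feature centroid, average distance to the centroid).
    """
    count = len(feature_vectors)
    centroid = [sum(column) / count for column in zip(*feature_vectors)]
    average_distance = sum(math.dist(f, centroid) for f in feature_vectors) / count
    return sum(energies) / len(energies), centroid, average_distance


def is_speech(energy, features, background_energy, centroid, average_distance,
              energy_ratio=2.0, distance_ratio=2.0):
    """Classify a later portion as speech only if both cues agree.

    Assumption: speech is both louder than the background and spectrally
    farther from it than typical background frames are from each other.
    """
    louder = energy > energy_ratio * background_energy
    different = math.dist(features, centroid) > distance_ratio * average_distance
    return louder and different
```

Requiring both cues means a loud but spectrally background-like burst is rejected in this sketch, which is one plausible reason for combining energy with cepstral features.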