Patents by Inventor Bjorn Hoffmeister

Bjorn Hoffmeister has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DEVICE-DIRECTED UTTERANCE DETECTION

Publication number: 20250174231

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

Type: Application

Filed: January 21, 2025

Publication date: May 29, 2025

Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
Device-directed utterance detection

Patent number: 12236950

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

Type: Grant

Filed: January 3, 2023

Date of Patent: February 25, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
Endpointing in speech processing

Patent number: 12211517

Abstract: A speech-processing system may determine potential endpoints in a user's speech. Such endpoint prediction may include determining a potential endpoint in a stream of audio data, and may additionally including determining an endpoint score representing a likelihood that the potential endpoint represents an end of speech representing a complete user input. When the potential endpoint has been determined, the system may publish a transcript of speech that preceded the potential endpoint, and send it to downstream components. The system may continue to transcribe audio data and determine additional potential endpoints while the downstream components process the transcript. The downstream components may determine whether the transcript is complete; e.g., represents the entirety of the user input. Final endpoint determinations may be made based on the results of the downstream processing including automatic speech recognition, natural language understanding, etc.

Type: Grant

Filed: September 15, 2021

Date of Patent: January 28, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Roland Maximilian Rolf Maas, Bjorn Hoffmeister, Ariya Rastrow, James Garnet Droppo, Veerdhawal Pande, Maarten Van Segbroeck, Gautam Tiwari, Andrew Smith, Eli Joshua Fidler
Language model adaptation

Patent number: 12014726

Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis, to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.

Type: Grant

Filed: March 28, 2022

Date of Patent: June 18, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Ankur Gandhe, Ariya Rastrow, Roland Maximilian Rolf Maas, Bjorn Hoffmeister
DEVICE-DIRECTED UTTERANCE DETECTION

Publication number: 20230223023

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

Type: Application

Filed: January 3, 2023

Publication date: July 13, 2023

Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
Wake word detection modeling

Patent number: 11657804

Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based usage patterns associated with the users.

Type: Grant

Filed: November 5, 2020

Date of Patent: May 23, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
Deep multi-channel acoustic modeling using multiple microphone array geometries

Patent number: 11574628

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.

Type: Grant

Filed: March 28, 2019

Date of Patent: February 7, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
Device-directed utterance detection

Patent number: 11551685

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

Type: Grant

Filed: March 18, 2020

Date of Patent: January 10, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
Anchored speech detection and speech recognition

Patent number: 11514901

Abstract: A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech that is from a same speaker as reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame-by-frame) as to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component which may allow the ASR component to focus its processing on the desired speech.

Type: Grant

Filed: June 11, 2019

Date of Patent: November 29, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Sree Hari Krishnan Parthasarathi, Bjorn Hoffmeister, Brian King, Roland Maas
LANGUAGE MODEL ADAPTATION

Publication number: 20220358908

Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis, to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.

Type: Application

Filed: March 28, 2022

Publication date: November 10, 2022

Inventors: Ankur Gandhe, Ariya Rastrow, Roland Maximilian Rolf Maas, Bjorn Hoffmeister
Deep multi-channel acoustic modeling

Patent number: 11475881

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

Type: Grant

Filed: July 17, 2020

Date of Patent: October 18, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
Detecting system-directed speech

Patent number: 11361763

Abstract: A speech-processing system capable of receiving and processing audio data to determine if the audio data includes speech that was intended for the system. Non-system directed speech may be filtered out while system-directed speech may be selected for further processing. A system-directed speech detector may use a trained machine learning model (such as a deep neural network or the like) to process a feature vector representing a variety of characteristics of the incoming audio data, including the results of automatic speech recognition and/or other data. Using the feature vector the model may output an indicator as to whether the speech is system-directed. The system may also incorporate other filters such as voice activity detection prior to speech recognition, or the like.

Type: Grant

Filed: September 1, 2017

Date of Patent: June 14, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Roland Maximilian Rolf Maas, Sri Harish Reddy Mallidi, Spyridon Matsoukas, Bjorn Hoffmeister
Language model adaptation

Patent number: 11302310

Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis, to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.

Type: Grant

Filed: May 30, 2019

Date of Patent: April 12, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Ankur Gandhe, Ariya Rastrow, Roland Maximilian Rolf Maas, Bjorn Hoffmeister
DEVICE-DIRECTED UTTERANCE DETECTION

Publication number: 20210295833

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

Type: Application

Filed: March 18, 2020

Publication date: September 23, 2021

Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
KEYWORD DETECTION MODELING USING CONTEXTUAL INFORMATION

Publication number: 20210134276

Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based usage patterns associated with the users.

Type: Application

Filed: November 5, 2020

Publication date: May 6, 2021

Inventors: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
Monophone-based background modeling for wakeword detection

Patent number: 10964315

Abstract: An approach to wakeword detection uses an explicit representation of non-wakeword speech in the form of subword (e.g., phonetic monophone) units that do not necessarily occur in the wakeword and that broadly represent general speech. These subword units are arranged in a “background” model, which at runtime essentially competes with the wakeword model such that a wakeword is less likely to be declare as occurring when the input matches that background model well. An HMM may be used with the model to locate possible occurrences of the wakeword. Features are determined from portions of the input corresponding to subword units of the wakeword detected using the HMM. A secondary classifier is then used to process the features to yield a decision of whether the wakeword occurred.

Type: Grant

Filed: June 30, 2017

Date of Patent: March 30, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Minhua Wu, Sankaran Panchapagesan, Ming Sun, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Ryan Paul Thomas, Arindam Mandal
Speech detection and speech recognition

Patent number: 10923111

Abstract: A system configured to recognize text represented by speech may determine that a first portion of audio data corresponds to speech from a first speaker and that a second portion of audio data corresponds to speech from the first speaker and a second speaker. Features of the first portion are compared to features of the second portion to determine a similarity therebetween. Based on this similarity, speech from the first speaker is distinguished from speech from the second speaker and text corresponding to speech from the first speaker is determined.

Type: Grant

Filed: March 28, 2019

Date of Patent: February 16, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Xing Fan, I-Fan Chen, Yuzong Liu, Bjorn Hoffmeister, Yiming Wang, Tongfei Chen
Keyword detection modeling using contextual information

Patent number: 10832662

Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based usage patterns associated with the users.

Type: Grant

Filed: July 3, 2017

Date of Patent: November 10, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
DEEP MULTI-CHANNEL ACOUSTIC MODELING

Publication number: 20200349928

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

Type: Application

Filed: July 17, 2020

Publication date: November 5, 2020

Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
Deep multi-channel acoustic modeling

Patent number: 10726830

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

Type: Grant

Filed: September 27, 2018

Date of Patent: July 28, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

1 2 3 next