Patents Examined by Seong-Ah A Shin
  • Patent number: 11978435
    Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts a transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, when the long audio recording includes multiple speakers, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance.
    Type: Grant
    Filed: October 13, 2020
    Date of Patent: May 7, 2024
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
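The sliding-window decoding described in the abstract above can be sketched as follows; `transcribe_last` is a placeholder standing in for the Transformer, which receives a window of utterances and returns a transcript for only the last one.

```python
# A minimal sketch of sliding-window decoding with a one-utterance shift.
# `transcribe_last` is a hypothetical callable, not the patented model.

def recognize_recording(utterances, context_size, transcribe_last):
    """Decode a long recording one utterance at a time, each with left context."""
    transcripts = []
    for i in range(len(utterances)):
        # The window holds up to `context_size` previous utterances plus the
        # current one; it then shifts by one utterance per step.
        window = utterances[max(0, i - context_size):i + 1]
        transcripts.append(transcribe_last(window))
    return transcripts
```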
  • Patent number: 11972213
    Abstract: This application discloses an event recognition method, including: obtaining, by a terminal device, a target sentence used for recognizing a type of a target event; processing, by the terminal device, the target sentence based on an event recognition model, to obtain the type of the target event, the event recognition model being used for determining the type of the target event by using a trigger word in the target sentence and at least one context word of the trigger word, the trigger word being used for indicating candidate types of the target event, and the candidate types including the type of the target event; and outputting, by the terminal device, the type of the target event. According to the technical solutions of this application, an event recognition process is performed by using a trigger word and a context word of the trigger word.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: April 30, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventor: Shulin Liu
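The trigger-word-plus-context idea in the abstract above can be illustrated with a toy scorer (not the patented model); `TRIGGER_TYPES` and the context-hint lists are invented for the example.

```python
# Illustrative sketch: a trigger word indicates candidate event types, and
# context words around the trigger disambiguate among them.

TRIGGER_TYPES = {
    "fired": ["End-Position", "Attack"],
    "married": ["Marry"],
}

def recognize_event(sentence_words, trigger, context_hints):
    """Return the candidate type best supported by the trigger's context words."""
    candidates = TRIGGER_TYPES.get(trigger)
    if not candidates:
        return None
    idx = sentence_words.index(trigger)
    # Context words: up to three words on either side of the trigger.
    context = sentence_words[max(0, idx - 3):idx] + sentence_words[idx + 1:idx + 4]
    scores = {t: sum(w in context_hints.get(t, ()) for w in context)
              for t in candidates}
    return max(candidates, key=scores.get)
```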
  • Patent number: 11967319
    Abstract: Methods and electronic devices for processing a spoken utterance associated with a user of an electronic device are disclosed. The method includes generating a textual representation of the spoken utterance having words, identifying a nonce word and a non-normalized word amongst the words, and generating a plurality of candidate textual representations based on the textual representation. The candidates have at least one of a first set of candidate textual representations and a second set of candidate textual representations, such that candidates from the first set are missing the nonce word from the words of the textual representation, and candidates from the second set have the non-normalized word from the words of the textual representation replaced by a normalized version thereof. The method includes comparing the candidates against grammars, and in response to a match, triggering an action associated with the grammar.
    Type: Grant
    Filed: August 3, 2021
    Date of Patent: April 23, 2024
    Assignee: Direct Cursus Technology L.L.C
    Inventors: Daniil Garrievich Anastasyev, Boris Andreevich Samoylov, Vyacheslav Vyacheslavovich Alipov
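The candidate-generation step described above can be sketched directly; which word is the nonce word and which is non-normalized would come from the upstream identification step, so they are passed in here.

```python
# Sketch of the two candidate sets named in the abstract: one dropping the
# nonce word, one replacing the non-normalized word with its normalized form.

def generate_candidates(words, nonce_word, non_normalized, normalized):
    """Return [first_set, second_set] of candidate textual representations."""
    first_set = [w for w in words if w != nonce_word]
    second_set = [normalized if w == non_normalized else w for w in words]
    return [first_set, second_set]
```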
  • Patent number: 11967316
    Abstract: Embodiments of this application disclose a method and an apparatus for positioning a target audio signal by an audio interaction device, and an audio interaction device. The method includes: obtaining audio signals in a plurality of directions in a space, and performing echo cancellation on the audio signals, the audio signals including a target-audio direct signal; obtaining weights of a plurality of time-frequency points in the audio signals, a weight of each time-frequency point indicating, at the time-frequency point, a relative proportion of the target-audio direct signal in the audio signals; weighting time-frequency components of the audio signals at the plurality of time-frequency points separately for each of the plurality of directions by using the weights of the plurality of time-frequency points, to obtain a weighted audio signal energy distribution; and obtaining a sound source azimuth corresponding to the target-audio direct signal in the audio signals accordingly.
    Type: Grant
    Filed: February 23, 2021
    Date of Patent: April 23, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Jimeng Zheng, Ian Ernan Liu, Yi Gao, Weiwei Li
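The weighting step above can be sketched numerically, assuming precomputed steered energy: `energy_tfd[t, f, d]` is the energy at time-frequency point (t, f) steered toward direction d, and `weights[t, f]` approximates the relative proportion of the target-audio direct signal at that point. Both arrays are invented inputs for illustration.

```python
import numpy as np

def weighted_azimuth(energy_tfd, weights, azimuths):
    """Pick the azimuth whose weighted energy, summed over all TF points, is largest."""
    # Scale each time-frequency component by the direct-signal proportion,
    # then sum over time and frequency to get one energy value per direction.
    weighted = energy_tfd * weights[:, :, None]
    per_direction = weighted.sum(axis=(0, 1))
    return azimuths[int(np.argmax(per_direction))]
```

Down-weighting echo-dominated time-frequency points is what lets the weighted distribution point at the direct signal even when a reflection carries more raw energy.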
  • Patent number: 11961517
    Abstract: Operations are changed appropriately in accordance with the method of use. A keyword detection unit 11 generates a keyword detection result indicating a result of detecting an utterance of a predetermined keyword from an input voice. A voice detection unit 12 generates a voice section detection result indicating a result of detecting a voice section from the input voice. A sequential utterance detection unit 13 generates a sequential utterance detection result indicating that a sequential utterance has been made if the keyword detection result indicates that the keyword has been detected and if the voice section detection result indicates that the voice section has been detected.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: April 16, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Kazunori Kobayashi, Shoichiro Saito, Hiroaki Ito
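The detection rule above is a conjunction of the two detector outputs; a per-frame sketch, assuming each detector emits one boolean per frame of the input voice:

```python
# Sequential-utterance detection as described: flag frames where both the
# keyword detector and the voice-section detector fire.

def detect_sequential_utterance(keyword_flags, voice_section_flags):
    """Return per-frame sequential-utterance flags."""
    return [kw and vs for kw, vs in zip(keyword_flags, voice_section_flags)]
```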
  • Patent number: 11948570
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
    Type: Grant
    Filed: March 9, 2022
    Date of Patent: April 2, 2024
    Assignee: Google LLC
    Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. Mcgraw, Anton Bakhtin
  • Patent number: 11942076
    Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
    Type: Grant
    Filed: February 16, 2022
    Date of Patent: March 26, 2024
    Assignee: Google LLC
    Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath
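The rescoring step above can be sketched in isolation; the log-score convention and the flat `boost` value are assumptions for illustration, not the patented scoring rule.

```python
# Hedged sketch: boost the scores of phoneme sequences that match the
# pronunciation of a term in the biasing term list.

def rescore_phoneme_sequences(phoneme_scores, biasing_pronunciations, boost):
    """phoneme_scores: {phoneme_seq: score}; boost sequences in the bias set."""
    return {
        seq: score + (boost if seq in biasing_pronunciations else 0.0)
        for seq, score in phoneme_scores.items()
    }
```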
  • Patent number: 11929090
    Abstract: A method for matching audio clips includes: obtaining a first feature sequence corresponding to a first audio clip and a second feature sequence corresponding to a second audio clip; constructing a distance matrix, elements in the distance matrix representing respective distances between first positions in the first feature sequence and second positions in the second feature sequence; calculating a first accumulation distance between a start position and a target position in the distance matrix, and calculating a second accumulation distance between an end position and the target position in the distance matrix; and calculating a minimum distance between the first feature sequence and the second feature sequence based on the first accumulation distance and the second accumulation distance, and determining a degree of matching between the first audio clip and the second audio clip according to the minimum distance.
    Type: Grant
    Filed: June 2, 2021
    Date of Patent: March 12, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Fang Chao Lin, Wei Biao Yun, Peng Zeng
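The two accumulation distances above can be sketched with dynamic-time-warping-style recurrences (the abstract does not name DTW; that reading is an assumption). The backward accumulation from the end position is computed here as a forward pass over the distance matrix reversed along both axes.

```python
# Sketch: minimum path distance through a target position, combining a
# forward accumulation from the start and a backward one from the end.

def forward_accumulation(dist):
    """acc[i][j]: minimum accumulated distance from (0, 0) to (i, j)."""
    n, m = len(dist), len(dist[0])
    acc = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best_prev = min(
                (acc[p][q] for p, q in ((i - 1, j), (i, j - 1), (i - 1, j - 1))
                 if p >= 0 and q >= 0),
                default=0.0,
            )
            acc[i][j] = dist[i][j] + best_prev
    return acc

def min_distance_through(dist, i, j):
    """Minimum path distance from start to end, forced through (i, j)."""
    fwd = forward_accumulation(dist)
    rev = [row[::-1] for row in dist[::-1]]
    bwd = forward_accumulation(rev)
    n, m = len(dist), len(dist[0])
    # dist[i][j] is counted in both passes, so subtract it once.
    return fwd[i][j] + bwd[n - 1 - i][m - 1 - j] - dist[i][j]
```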
  • Patent number: 11907677
    Abstract: Disclosed are a universal language assistive translation and interpretation system that is configured to verify and validate translations and interpretations by way of blockchain technology and smart contracts; multiple cross-format translation and interpretation blockchain validating and recording processes for verifying and validating cross-format translations and interpretations by smart contract and blockchain technology; and several validated cross-format translation and interpretation blockchain access processes for providing cross-format interpretations and translations of inter-communications between users regardless of ability or disability.
    Type: Grant
    Filed: March 2, 2023
    Date of Patent: February 20, 2024
    Inventor: Arash Borhany
  • Patent number: 11908456
    Abstract: Embodiments of this application disclose an azimuth estimation method performed at a computing device, the method including: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; performing a spatial spectrum estimation on the buffered multi-channel sampling signals to obtain a spatial spectrum estimation result, when the wakeup word detection scores of the one or more sampling signals indicate that a wakeup word exists in the one or more sampling signals; and determining an azimuth of a target voice associated with the multi-channel sampling signals according to the spatial spectrum estimation result and a highest wakeup word detection score, thereby improving the accuracy of the azimuth estimation in a voice interaction process.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: February 20, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Jimeng Zheng, Yi Gao, Meng Yu, Ian Ernan Liu
  • Patent number: 11900065
    Abstract: A system is capable of automatically adjusting or reconstructing a baseline expression to generate a parallelized expression. Evaluation of the parallelized expression provides a substantially similar output as evaluation of the baseline expression in a more efficient manner. In some implementations, data indicating an expression to be evaluated on a primary thread of the one or more processors is obtained. Elements of the expression are identified. The elements are grouped into a parse tree representation. Elements of the expression are classified as belonging to either a first category that includes elements that are eligible for parallel processing or a second category that includes elements that are not eligible for parallel processing. A particular element that is classified as belonging to the first category is identified and evaluated on a non-primary thread of the one or more processors. The non-primary thread executes in parallel with the primary thread.
    Type: Grant
    Filed: July 1, 2022
    Date of Patent: February 13, 2024
    Assignee: Appian Corporation
    Inventors: Brian Joseph Sullivan, Matthew David Hilliard
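The classification-and-offload scheme above can be sketched as follows (an illustration, not Appian's implementation): parse-tree nodes carry a `parallel_ok` flag standing in for the first-category classification, and eligible children are evaluated on pool threads while the rest stay on the calling (primary) thread.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(node, pool):
    """Evaluate a parse-tree node, offloading parallel-eligible children."""
    if "value" in node:  # leaf element
        return node["value"]
    futures, results = {}, {}
    for i, child in enumerate(node["children"]):
        if child.get("parallel_ok"):
            futures[i] = pool.submit(evaluate, child, pool)  # non-primary thread
        else:
            results[i] = evaluate(child, pool)  # primary thread
    for i, fut in futures.items():
        results[i] = fut.result()
    return node["op"](*(results[i] for i in range(len(node["children"]))))
```

A production version would also have to guard against exhausting the pool with nested submissions, which this sketch does not address.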
  • Patent number: 11898866
    Abstract: Annoyance caused to a subject person from whom information is collected is reduced.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: February 13, 2024
    Assignee: Faurecia Clarion Electronics Co., Ltd.
    Inventor: Masataka Motohashi
  • Patent number: 11893984
    Abstract: This disclosure proposes systems and methods for speech processing and sharing permitted entity information across speech processing systems. A first system can receive first audio data representing a first utterance. The first system can receive a first dialog identifier associated with a previous utterance. The first system can determine that the first audio data references a first entity. In some cases, the first system may not be able to resolve the first entity based on information in the first audio data. The first system can send, to a second system different from the first system, a first request for information about the first entity. The first request includes the first dialog identifier. The first system can receive first data responsive to the first request from the second system. The first system can process the first data and the first audio data to determine second data responsive to the first utterance, and output a first response representing the second data.
    Type: Grant
    Filed: June 22, 2020
    Date of Patent: February 6, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Zoe Adams, Robert Monell Kilgore
  • Patent number: 11887591
    Abstract: Embodiments herein disclose methods and systems for providing a digital assistant in a device, which can generate responses to commands from a user based on the ambience of the user. On receiving a command from the user of the device to perform an action, content stored in the device can be extracted. The embodiments include determining the degree of privacy and sensitivity of the content. The embodiments include determining the ambience of the user based on ambient noise, location of the device, presence of other humans, emotional state of the user, application parameters, user activity, and so on. The embodiments include generating a response and revealing the response based on the determined ambience and the degree of privacy and sensitivity of the extracted content. The embodiments include facilitating dialog with the user to generate appropriate responses based on the ambience of the user.
    Type: Grant
    Filed: June 24, 2019
    Date of Patent: January 30, 2024
    Inventors: Siddhartha Mukherjee, Udit Bhargava
  • Patent number: 11869503
    Abstract: Example techniques relate to offline voice control. A local voice input engine may process voice inputs locally when processing voice inputs via a cloud-based voice assistant service is not possible. Some techniques involve local (on-device) voice-assisted set-up of a cloud-based voice assistant service. Further example techniques involve local voice-assisted troubleshooting of the cloud-based voice assistant service. Other techniques relate to interactions between local and cloud-based processing of voice inputs on a device that supports both local and cloud-based processing.
    Type: Grant
    Filed: December 13, 2021
    Date of Patent: January 9, 2024
    Assignee: Sonos, Inc.
    Inventor: Connor Smith
  • Patent number: 11862142
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: January 2, 2024
    Assignee: Google LLC
    Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
  • Patent number: 11861393
    Abstract: Methods, apparatus, systems, and computer-readable media for engaging an automated assistant to perform multiple tasks through a multitask command. The multitask command can be a command that, when provided by a user, causes the automated assistant to invoke multiple different agent modules for performing tasks to complete the multitask command. During execution of the multitask command, a user can provide input that can be used by one or more agent modules to perform their respective tasks. Furthermore, feedback from one or more agent modules can be used by the automated assistant to dynamically alter tasks in order to more effectively use resources available during completion of the multitask command.
    Type: Grant
    Filed: November 2, 2022
    Date of Patent: January 2, 2024
    Assignee: GOOGLE LLC
    Inventors: Yuzhao Ni, David Schairer
  • Patent number: 11854551
    Abstract: Transcribing portions of a communication session between a user device and an on-premises device of an enterprise includes receiving, by a computer located remotely from the on-premises device, a media stream of the communication session from the on-premises device and receiving, by the computer, at least one event associated with the media stream from the on-premises device. Furthermore, the computer determines a portion of the media stream to transcribe based on the at least one event and transcribes the portion of the media stream.
    Type: Grant
    Filed: March 22, 2019
    Date of Patent: December 26, 2023
    Assignee: Avaya Inc.
    Inventors: Matthew A. Peters, Robert E. Braudes, Jeffrey L. Aigner
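The event-driven selection above can be illustrated with a hypothetical sketch; the event shapes below (start/stop markers with timestamps) are invented, since the abstract says only that the received events determine which portion of the media stream to transcribe.

```python
# Hypothetical sketch: pair start/stop events into (start, end) intervals
# of the media stream worth transcribing.

def portions_to_transcribe(events):
    """events: iterable of (kind, timestamp) pairs; return intervals."""
    intervals, start = [], None
    for kind, timestamp in events:
        if kind == "start":
            start = timestamp
        elif kind == "stop" and start is not None:
            intervals.append((start, timestamp))
            start = None
    return intervals
```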
  • Patent number: 11817093
    Abstract: There is disclosed a method and system for processing a user spoken utterance, the method comprising: receiving, from a user, an indication of the user spoken utterance; generating a text representation hypothesis based on the user spoken utterance; processing, using a first trained scenario model and a second trained scenario model, the text representation hypothesis to generate a first scenario hypothesis and a second scenario hypothesis, respectively; the first trained scenario model and the second trained scenario model having been trained using at least partially different corpora of texts; analyzing, using a Machine Learning Algorithm (MLA), the first scenario hypothesis and the second scenario hypothesis to determine a winning scenario having a higher confidence score; based on the winning scenario, determining, by an associated one of the first trained scenario model and the second trained scenario model, an action to be executed by an electronic device; and executing the action.
    Type: Grant
    Filed: December 7, 2020
    Date of Patent: November 14, 2023
    Assignee: YANDEX EUROPE AG
    Inventors: Vyacheslav Vyacheslavovich Alipov, Oleg Aleksandrovich Sadovnikov, Nikita Vladimirovich Zubkov
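The selection step above reduces to picking the higher-confidence hypothesis; a minimal sketch, with a dict shape invented for the example:

```python
# Sketch: each trained scenario model yields a hypothesis with a confidence
# score, and the higher-scoring one becomes the winning scenario.

def pick_winning_scenario(hypotheses):
    """Return the hypothesis with the highest confidence score."""
    return max(hypotheses, key=lambda h: h["confidence"])
```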
  • Patent number: 11783823
    Abstract: A vehicle control apparatus to be used in a vehicle controllable on the basis of a voice input includes a determination unit and an input unit. The determination unit is configured to determine whether a main operator of the vehicle is in a predetermined state where the main operator is unable to perform an operation or is not performing an operation. The input unit is configured to accept an operational input based on a voice of the main operator, as well as to accept an operational input based on a voice of a passenger of the vehicle in a case where the determination unit has determined that the main operator is in the predetermined state.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: October 10, 2023
    Assignee: SUBARU CORPORATION
    Inventor: Katsuo Senmyo