Patents Examined by Seong-Ah A Shin
  • Patent number: 11978435
    Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts a transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, when the long audio recording includes multiple speakers, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance.
    Type: Grant
    Filed: October 13, 2020
    Date of Patent: May 7, 2024
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
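The sliding-window decoding described in the abstract above can be sketched as follows; `transcribe_last` is a placeholder standing in for the Transformer, which receives a window of utterances and returns a transcript for only the last one.

```python
# A minimal sketch of sliding-window decoding with a one-utterance shift.
# `transcribe_last` is a hypothetical callable, not the patented model.

def recognize_recording(utterances, context_size, transcribe_last):
    """Decode a long recording one utterance at a time, each with left context."""
    transcripts = []
    for i in range(len(utterances)):
        # The window holds up to `context_size` previous utterances plus the
        # current one; it then shifts by one utterance per step.
        window = utterances[max(0, i - context_size):i + 1]
        transcripts.append(transcribe_last(window))
    return transcripts
```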
  • Patent number: 11972213
    Abstract: This application discloses an event recognition method, including: obtaining, by a terminal device, a target sentence used for recognizing a type of a target event; processing, by the terminal device, the target sentence based on an event recognition model, to obtain the type of the target event, the event recognition model being used for determining the type of the target event by using a trigger word in the target sentence and at least one context word of the trigger word, the trigger word being used for indicating candidate types of the target event, and the candidate types including the type of the target event; and outputting, by the terminal device, the type of the target event. According to the technical solutions of this application, an event recognition process is performed by using a trigger word and a context word of the trigger word.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: April 30, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventor: Shulin Liu
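The trigger-word-plus-context idea in the abstract above can be illustrated with a toy scorer (not the patented model); `TRIGGER_TYPES` and the context-hint lists are invented for the example.

```python
# Illustrative sketch: a trigger word indicates candidate event types, and
# context words around the trigger disambiguate among them.

TRIGGER_TYPES = {
    "fired": ["End-Position", "Attack"],
    "married": ["Marry"],
}

def recognize_event(sentence_words, trigger, context_hints):
    """Return the candidate type best supported by the trigger's context words."""
    candidates = TRIGGER_TYPES.get(trigger)
    if not candidates:
        return None
    idx = sentence_words.index(trigger)
    # Context words: up to three words on either side of the trigger.
    context = sentence_words[max(0, idx - 3):idx] + sentence_words[idx + 1:idx + 4]
    scores = {t: sum(w in context_hints.get(t, ()) for w in context)
              for t in candidates}
    return max(candidates, key=scores.get)
```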
  • Patent number: 11967319
    Abstract: Methods and electronic devices for processing a spoken utterance associated with a user of an electronic device are disclosed. The method includes generating a textual representation of the spoken utterance having words, identifying a nonce word and a non-normalized word amongst the words, and generating a plurality of candidate textual representations based on the textual representation. The candidates have at least one of a first set of candidate textual representations and a second set of candidate textual representations, such that candidates from the first set are missing the nonce word from the words of the textual representation, and candidates from the second set have the non-normalized word from the words of the textual representation replaced by a normalized version thereof. The method includes comparing the candidates against grammars, and in response to a match, triggering an action associated with the grammar.
    Type: Grant
    Filed: August 3, 2021
    Date of Patent: April 23, 2024
    Assignee: Direct Cursus Technology L.L.C
    Inventors: Daniil Garrievich Anastasyev, Boris Andreevich Samoylov, Vyacheslav Vyacheslavovich Alipov
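The candidate-generation step described above can be sketched directly; which word is the nonce word and which is non-normalized would come from the upstream identification step, so they are passed in here.

```python
# Sketch of the two candidate sets named in the abstract: one dropping the
# nonce word, one replacing the non-normalized word with its normalized form.

def generate_candidates(words, nonce_word, non_normalized, normalized):
    """Return [first_set, second_set] of candidate textual representations."""
    first_set = [w for w in words if w != nonce_word]
    second_set = [normalized if w == non_normalized else w for w in words]
    return [first_set, second_set]
```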
  • Patent number: 11967316
    Abstract: Embodiments of this application disclose a method and an apparatus for positioning a target audio signal by an audio interaction device, and an audio interaction device. The method includes: obtaining audio signals in a plurality of directions in a space, and performing echo cancellation on the audio signals, the audio signals including a target-audio direct signal; obtaining weights of a plurality of time-frequency points in the audio signals, a weight of each time-frequency point indicating, at the time-frequency point, a relative proportion of the target-audio direct signal in the audio signals; weighting time-frequency components of the audio signals at the plurality of time-frequency points separately for each of the plurality of directions by using the weights of the plurality of time-frequency points, to obtain a weighted audio signal energy distribution; and obtaining a sound source azimuth corresponding to the target-audio direct signal in the audio signals accordingly.
    Type: Grant
    Filed: February 23, 2021
    Date of Patent: April 23, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Jimeng Zheng, Ian Ernan Liu, Yi Gao, Weiwei Li
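The weighting step above can be sketched numerically, assuming precomputed steered energy: `energy_tfd[t, f, d]` is the energy at time-frequency point (t, f) steered toward direction d, and `weights[t, f]` approximates the relative proportion of the target-audio direct signal at that point. Both arrays are invented inputs for illustration.

```python
import numpy as np

def weighted_azimuth(energy_tfd, weights, azimuths):
    """Pick the azimuth whose weighted energy, summed over all TF points, is largest."""
    # Scale each time-frequency component by the direct-signal proportion,
    # then sum over time and frequency to get one energy value per direction.
    weighted = energy_tfd * weights[:, :, None]
    per_direction = weighted.sum(axis=(0, 1))
    return azimuths[int(np.argmax(per_direction))]
```

Down-weighting echo-dominated time-frequency points is what lets the weighted distribution point at the direct signal even when a reflection carries more raw energy.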
  • Patent number: 11961517
    Abstract: Operations are changed appropriately in accordance with the method of use. A keyword detection unit 11 generates a keyword detection result indicating a result of detecting an utterance of a predetermined keyword from an input voice. A voice detection unit 12 generates a voice section detection result indicating a result of detecting a voice section from the input voice. A sequential utterance detection unit 13 generates a sequential utterance detection result indicating that a sequential utterance has been made if the keyword detection result indicates that the keyword has been detected and if the voice section detection result indicates that the voice section has been detected.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: April 16, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Kazunori Kobayashi, Shoichiro Saito, Hiroaki Ito
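The detection rule above is a conjunction of the two detector outputs; a per-frame sketch, assuming each detector emits one boolean per frame of the input voice:

```python
# Sequential-utterance detection as described: flag frames where both the
# keyword detector and the voice-section detector fire.

def detect_sequential_utterance(keyword_flags, voice_section_flags):
    """Return per-frame sequential-utterance flags."""
    return [kw and vs for kw, vs in zip(keyword_flags, voice_section_flags)]
```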
  • Patent number: 11948570
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
    Type: Grant
    Filed: March 9, 2022
    Date of Patent: April 2, 2024
    Assignee: Google LLC
    Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. Mcgraw, Anton Bakhtin
  • Patent number: 11942076
    Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
    Type: Grant
    Filed: February 16, 2022
    Date of Patent: March 26, 2024
    Assignee: Google LLC
    Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath
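The rescoring step above can be sketched in isolation; the log-score convention and the flat `boost` value are assumptions for illustration, not the patented scoring rule.

```python
# Hedged sketch: boost the scores of phoneme sequences that match the
# pronunciation of a term in the biasing term list.

def rescore_phoneme_sequences(phoneme_scores, biasing_pronunciations, boost):
    """phoneme_scores: {phoneme_seq: score}; boost sequences in the bias set."""
    return {
        seq: score + (boost if seq in biasing_pronunciations else 0.0)
        for seq, score in phoneme_scores.items()
    }
```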
  • Patent number: 11929090
    Abstract: A method for matching audio clips includes: obtaining a first feature sequence corresponding to a first audio clip and a second feature sequence corresponding to a second audio clip; constructing a distance matrix, elements in the distance matrix representing respective distances between first positions in the first feature sequence and second positions in the second feature sequence; calculating a first accumulation distance between a start position and a target position in the distance matrix, and calculating a second accumulation distance between an end position and the target position in the distance matrix; and calculating a minimum distance between the first feature sequence and the second feature sequence based on the first accumulation distance and the second accumulation distance, and determining a degree of matching between the first audio clip and the second audio clip according to the minimum distance.
    Type: Grant
    Filed: June 2, 2021
    Date of Patent: March 12, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Fang Chao Lin, Wei Biao Yun, Peng Zeng
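The two accumulation distances above can be sketched with dynamic-time-warping-style recurrences (the abstract does not name DTW; that reading is an assumption). The backward accumulation from the end position is computed here as a forward pass over the distance matrix reversed along both axes.

```python
# Sketch: minimum path distance through a target position, combining a
# forward accumulation from the start and a backward one from the end.

def forward_accumulation(dist):
    """acc[i][j]: minimum accumulated distance from (0, 0) to (i, j)."""
    n, m = len(dist), len(dist[0])
    acc = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best_prev = min(
                (acc[p][q] for p, q in ((i - 1, j), (i, j - 1), (i - 1, j - 1))
                 if p >= 0 and q >= 0),
                default=0.0,
            )
            acc[i][j] = dist[i][j] + best_prev
    return acc

def min_distance_through(dist, i, j):
    """Minimum path distance from start to end, forced through (i, j)."""
    fwd = forward_accumulation(dist)
    rev = [row[::-1] for row in dist[::-1]]
    bwd = forward_accumulation(rev)
    n, m = len(dist), len(dist[0])
    # dist[i][j] is counted in both passes, so subtract it once.
    return fwd[i][j] + bwd[n - 1 - i][m - 1 - j] - dist[i][j]
```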
  • Patent number: 11907677
    Abstract: Disclosed are a universal language assistive translation and interpretation system that is configured to verify and validate translations and interpretations by way of blockchain technology and smart contracts; multiple cross-format translation and interpretation blockchain validating and recording processes for verifying and validating cross-format translations and interpretations by smart contract and blockchain technology; and several validated cross-format translation and interpretation blockchain access processes for providing cross-format interpretations and translations of inter-communications between users regardless of ability or disability.
    Type: Grant
    Filed: March 2, 2023
    Date of Patent: February 20, 2024
    Inventor: Arash Borhany
  • Patent number: 11908456
    Abstract: Embodiments of this application disclose an azimuth estimation method performed at a computing device, the method including: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; performing a spatial spectrum estimation on the buffered multi-channel sampling signals to obtain a spatial spectrum estimation result, when the wakeup word detection scores of the one or more sampling signals indicate that a wakeup word exists in the one or more sampling signals; and determining an azimuth of a target voice associated with the multi-channel sampling signals according to the spatial spectrum estimation result and a highest wakeup word detection score, thereby improving the accuracy of the azimuth estimation in a voice interaction process.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: February 20, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Jimeng Zheng, Yi Gao, Meng Yu, Ian Ernan Liu
  • Patent number: 11900065
    Abstract: A system is capable of automatically adjusting or reconstructing a baseline expression to generate a parallelized expression. Evaluation of the parallelized expression provides a substantially similar output as evaluation of the baseline expression in a more efficient manner. In some implementations, data indicating an expression to be evaluated on a primary thread of the one or more processors is obtained. Elements of the expression are identified. The elements are grouped into a parse tree representation. Elements of the expression are classified as belonging to either a first category that includes elements that are eligible for parallel processing or a second category that includes elements that are not eligible for parallel processing. A particular element that is classified as belonging to the first category is identified and evaluated on a non-primary thread of the one or more processors. The non-primary thread executes in parallel with the primary thread.
    Type: Grant
    Filed: July 1, 2022
    Date of Patent: February 13, 2024
    Assignee: Appian Corporation
    Inventors: Brian Joseph Sullivan, Matthew David Hilliard
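The classification-and-offload scheme above can be sketched as follows (an illustration, not Appian's implementation): parse-tree nodes carry a `parallel_ok` flag standing in for the first-category classification, and eligible children are evaluated on pool threads while the rest stay on the calling (primary) thread.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(node, pool):
    """Evaluate a parse-tree node, offloading parallel-eligible children."""
    if "value" in node:  # leaf element
        return node["value"]
    futures, results = {}, {}
    for i, child in enumerate(node["children"]):
        if child.get("parallel_ok"):
            futures[i] = pool.submit(evaluate, child, pool)  # non-primary thread
        else:
            results[i] = evaluate(child, pool)  # primary thread
    for i, fut in futures.items():
        results[i] = fut.result()
    return node["op"](*(results[i] for i in range(len(node["children"]))))
```

A production version would also have to guard against exhausting the pool with nested submissions, which this sketch does not address.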
  • Patent number: 11898866
    Abstract: Annoyance caused to a subject person from whom information is collected is reduced.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: February 13, 2024
    Assignee: Faurecia Clarion Electronics Co., Ltd.
    Inventor: Masataka Motohashi
  • Patent number: 11893984
    Abstract: This disclosure proposes systems and methods for speech processing and sharing permitted entity information across speech processing systems. A first system can receive first audio data representing a first utterance. The first system can receive a first dialog identifier associated with a previous utterance. The first system can determine that the first audio data references a first entity. In some cases, the first system may not be able to resolve the first entity based on information in the first audio data. The first system can send, to a second system different from the first system, a first request for information about the first entity. The first request includes the first dialog identifier. The first system can receive first data responsive to the first request from the second system. The first system can process the first data and the first audio data to determine second data responsive to the first utterance, and output a first response representing the second data.
    Type: Grant
    Filed: June 22, 2020
    Date of Patent: February 6, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Zoe Adams, Robert Monell Kilgore
  • Patent number: 11887591
    Abstract: Embodiments herein disclose methods and systems for providing a digital assistant in a device, which can generate responses to commands from a user based on the ambience of the user. On receiving a command from the user of the device to perform an action, content stored in the device can be extracted. The embodiments include determining the degree of privacy and sensitivity of the content. The embodiments include determining the ambience of the user based on ambient noise, location of the device, presence of other humans, emotional state of the user, application parameters, user activity, and so on. The embodiments include generating a response and revealing the response based on the determined ambience and the degree of privacy and sensitivity of the extracted content. The embodiments include facilitating dialog with the user to generate appropriate responses based on the ambience of the user.
    Type: Grant
    Filed: June 24, 2019
    Date of Patent: January 30, 2024
    Inventors: Siddhartha Mukherjee, Udit Bhargava
  • Patent number: 11869503
    Abstract: Example techniques relate to offline voice control. A local voice input engine may process voice inputs locally when processing voice inputs via a cloud-based voice assistant service is not possible. Some techniques involve local (on-device) voice-assisted set-up of a cloud-based voice assistant service. Further example techniques involve local voice-assisted troubleshooting of the cloud-based voice assistant service. Other techniques relate to interactions between local and cloud-based processing of voice inputs on a device that supports both local and cloud-based processing.
    Type: Grant
    Filed: December 13, 2021
    Date of Patent: January 9, 2024
    Assignee: Sonos, Inc.
    Inventor: Connor Smith
  • Patent number: 11862142
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: January 2, 2024
    Assignee: Google LLC
    Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
  • Patent number: 11861393
    Abstract: Methods, apparatus, systems, and computer-readable media for engaging an automated assistant to perform multiple tasks through a multitask command. The multitask command can be a command that, when provided by a user, causes the automated assistant to invoke multiple different agent modules for performing tasks to complete the multitask command. During execution of the multitask command, a user can provide input that can be used by one or more agent modules to perform their respective tasks. Furthermore, feedback from one or more agent modules can be used by the automated assistant to dynamically alter tasks in order to more effectively use resources available during completion of the multitask command.
    Type: Grant
    Filed: November 2, 2022
    Date of Patent: January 2, 2024
    Assignee: GOOGLE LLC
    Inventors: Yuzhao Ni, David Schairer
  • Patent number: 11854551
    Abstract: Transcribing portions of a communication session between a user device and an on-premises device of an enterprise includes receiving, by a computer located remotely from the on-premises device, a media stream of the communication session from the on-premises device and receiving, by the computer, at least one event associated with the media stream from the on-premises device. Furthermore, the computer determines a portion of the media stream to transcribe based on the at least one event and transcribes the portion of the media stream.
    Type: Grant
    Filed: March 22, 2019
    Date of Patent: December 26, 2023
    Assignee: Avaya Inc.
    Inventors: Matthew A. Peters, Robert E. Braudes, Jeffrey L. Aigner
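The event-driven selection above can be illustrated with a hypothetical sketch; the event shapes below (start/stop markers with timestamps) are invented, since the abstract says only that the received events determine which portion of the media stream to transcribe.

```python
# Hypothetical sketch: pair start/stop events into (start, end) intervals
# of the media stream worth transcribing.

def portions_to_transcribe(events):
    """events: iterable of (kind, timestamp) pairs; return intervals."""
    intervals, start = [], None
    for kind, timestamp in events:
        if kind == "start":
            start = timestamp
        elif kind == "stop" and start is not None:
            intervals.append((start, timestamp))
            start = None
    return intervals
```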
  • Patent number: 11817093
    Abstract: There is disclosed a method and system for processing a user spoken utterance, the method comprising: receiving, from a user, an indication of the user spoken utterance; generating a text representation hypothesis based on the user spoken utterance; processing, using a first trained scenario model and a second trained scenario model, the text representation hypothesis to generate a first scenario hypothesis and a second scenario hypothesis, respectively; the first trained scenario model and the second trained scenario model having been trained using at least partially different corpora of texts; analyzing, using a Machine Learning Algorithm (MLA), the first scenario hypothesis and the second scenario hypothesis to determine a winning scenario having a higher confidence score; based on the winning scenario, determining, by an associated one of the first trained scenario model and the second trained scenario model, an action to be executed by an electronic device; and executing the action.
    Type: Grant
    Filed: December 7, 2020
    Date of Patent: November 14, 2023
    Assignee: YANDEX EUROPE AG
    Inventors: Vyacheslav Vyacheslavovich Alipov, Oleg Aleksandrovich Sadovnikov, Nikita Vladimirovich Zubkov
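The selection step above reduces to picking the higher-confidence hypothesis; a minimal sketch, with a dict shape invented for the example:

```python
# Sketch: each trained scenario model yields a hypothesis with a confidence
# score, and the higher-scoring one becomes the winning scenario.

def pick_winning_scenario(hypotheses):
    """Return the hypothesis with the highest confidence score."""
    return max(hypotheses, key=lambda h: h["confidence"])
```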
  • Patent number: 11783823
    Abstract: A vehicle control apparatus to be used in a vehicle controllable on the basis of a voice input includes a determination unit and an input unit. The determination unit is configured to determine whether a main operator of the vehicle is in a predetermined state where the main operator is unable to perform an operation or is not performing an operation. The input unit is configured to accept an operational input based on a voice of the main operator, as well as to accept an operational input based on a voice of a passenger of the vehicle in a case where the determination unit has determined that the main operator is in the predetermined state.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: October 10, 2023
    Assignee: SUBARU CORPORATION
    Inventor: Katsuo Senmyo