Patents Examined by Shreyans A Patel

Speech recognition biasing

Patent number: 11670300

Abstract: Systems and methods are described that include a robot and/or an associated computing system that can use various cues about an environment of the robot to apply a bias to increase the accuracy of speech transcription. In some implementations, audio data corresponding to a spoken instruction to a robot is received. Candidate transcriptions of the audio data are obtained. A respective action of the robot corresponding to each of the candidate transcriptions of the audio data is determined. One or more scores indicating characteristics of a potential outcome of performing the respective action corresponding to the candidate transcription of the audio data are determined for each of the candidate transcriptions of the audio data. A particular candidate transcription is selected from among the candidate transcriptions based at least on the one or more scores. The action determined for the particular candidate transcription is performed.
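The selection step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the `Candidate` fields, the `asr_confidence` value, the linear weighting, and the example scores are all assumptions made for illustration. The abstract only states that a candidate is selected based at least on the outcome scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    transcription: str     # one candidate transcription of the audio
    action: str            # robot action this transcription would trigger
    asr_confidence: float  # hypothetical recognizer confidence in [0, 1]
    outcome_score: float   # hypothetical score of the action's likely outcome


def select_candidate(candidates, outcome_weight=0.5):
    """Pick the candidate with the best blend of ASR and outcome scores."""
    if not candidates:
        raise ValueError("at least one candidate is required")

    def combined(c):
        # Assumed linear blend; the source does not specify how scores combine.
        return (1 - outcome_weight) * c.asr_confidence + outcome_weight * c.outcome_score

    return max(candidates, key=combined)


candidates = [
    Candidate("bring me the cup", "fetch(cup)", asr_confidence=0.48, outcome_score=0.9),
    Candidate("bring me the pup", "fetch(pup)", asr_confidence=0.52, outcome_score=0.1),
]
best = select_candidate(candidates)
print(best.transcription, "->", best.action)
```

In this made-up example the recognizer slightly prefers "pup", but the outcome score (a cup is plausibly present in the environment) biases the selection toward "cup".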

Type: Grant

Filed: July 8, 2022

Date of Patent: June 6, 2023

Assignee: X Development LLC

Inventor: Daniel Alex Lam

Method and system for speech enhancement

Patent number: 11657828

Abstract: Embodiments improve speech data quality through training a neural network for de-noising audio enhancement. One such embodiment creates simulated noisy speech data from high quality speech data. In turn, training, e.g., deep normalizing flow training, is performed on a neural network using the high quality speech data and the simulated noisy speech data to train the neural network to create de-noised speech data given noisy speech data. Performing the training includes minimizing errors in the neural network according to at least one of (i) a decoding error of an Automatic Speech Recognition (ASR) system processing current de-noised speech data results generated by the neural network during the training and (ii) spectral distance between the high quality speech data and the current de-noised speech data results generated by the neural network during the training.
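One common way to create simulated noisy speech from clean speech is to add a noise signal scaled to a target signal-to-noise ratio. The sketch below assumes that approach; the abstract does not say how the simulation is done, and the function names, the sine-wave stand-in for speech, and the 10 dB target are illustrative only.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_clean / P_noise), solved for the required noise power.
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]  # stand-in for speech
noise = [rng.gauss(0.0, 1.0) for _ in range(100)]
noisy = add_noise_at_snr(clean, noise, snr_db=10.0)
```

Pairs of `clean` and `noisy` signals produced this way could then serve as training targets and inputs for a de-noising network.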

Type: Grant

Filed: January 31, 2020

Date of Patent: May 23, 2023

Assignee: Nuance Communications, Inc.

Inventor: Carl Benjamin Quillen

Generating neural network outputs using insertion commands

Patent number: 11657277

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing sequence modeling tasks using insertions. One of the methods includes receiving a system input that includes one or more source elements from a source sequence and zero or more target elements from a target sequence, wherein each source element is selected from a vocabulary of source elements and wherein each target element is selected from a vocabulary of target elements; generating a partial concatenated sequence that includes the one or more source elements from the source sequence and the zero or more target elements from the target sequence, wherein the source and target elements are arranged in the partial concatenated sequence according to a combined order; and generating a final concatenated sequence that includes a finalized source sequence and a finalized target sequence, wherein the finalized target sequence includes one or more target elements.

Type: Grant

Filed: May 26, 2020

Date of Patent: May 23, 2023

Assignee: Google LLC

Inventors: William Chan, Mitchell Thomas Stern, Nikita Kitaev, Kelvin Gu, Jakob D. Uszkoreit

Direction based end-pointing for speech recognition

Patent number: 11651780

Abstract: A speech recognition system utilizes automatic speech recognition techniques, such as end-pointing techniques, in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.

Type: Grant

Filed: June 7, 2021

Date of Patent: May 16, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Kenneth John Basye, Jeffrey Penrod Adams

Variational embedding capacity in expressive end-to-end speech synthesis

Patent number: 11646010

Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.

Type: Grant

Filed: December 9, 2021

Date of Patent: May 9, 2023

Assignee: Google LLC

Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon

Method and system for analyzing customer calls by implementing a machine learning model to identify emotions

Patent number: 11630999

Abstract: A method and system for voice emotion identification contained in audio in a call providing customer support between a customer and a service agent by implementing an emotion identification application to identify emotions captured in a voice of the customer from audio received by a media streaming device; receiving, by the emotion identification application, an audio stream of a series of voice samples contained in consecutive frames from audio received; extracting, by the emotion identification application, a set of voice emotion features from each frame in each voice sample of the audio by applying a trained machine learning (ML) model for identifying emotions utilizing a neural network to determine one or more voice emotions by a configured set of voice emotion features captured in each voice sample; and classifying, by the emotion identification application, each emotion determined by the trained ML model based on a set of classifying features to label one or more types of emotions captured in each voic

Type: Grant

Filed: December 19, 2019

Date of Patent: April 18, 2023

Inventors: Arun Lokman Gangotri, Balarama Mathukumilli, Debashish Sahoo

Hyper-square implementation of tree AllReduce algorithm for distributed parallel deep learning

Patent number: 11620502

Abstract: The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes. Each group includes a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple each of computing nodes A-D with corresponding computing nodes A-D in each of a plurality of neighboring groups. The method comprises syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node.
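The reduce-then-broadcast pattern can be sketched for a single group of four nodes using only the intra-group links named in the abstract (A-B, A-C, D-B, D-C). This is a simplified illustration, not the patented schedule: the choice of A as the root, the order of transfers, the scalar values, and the helper names are assumptions, and inter-group links are left out entirely.

```python
# Intra-group interconnects from the abstract: A-B, A-C, D-B, D-C.
LINKS = {("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")}


def _check_link(src, dst):
    """Reject transfers between nodes that are not directly connected."""
    if (src, dst) not in LINKS and (dst, src) not in LINKS:
        raise ValueError(f"no interconnect between {src} and {dst}")


def _send_add(values, src, dst):
    """Reduce step: dst accumulates the value held by src."""
    _check_link(src, dst)
    values[dst] += values[src]


def _send_copy(values, src, dst):
    """Broadcast step: dst overwrites its value with the value held by src."""
    _check_link(src, dst)
    values[dst] = values[src]


def group_allreduce(values):
    """Sum the values of nodes A-D at A, then broadcast the sum back to all."""
    values = dict(values)  # work on a copy
    # Reduce phase (assumed tree rooted at A).
    _send_add(values, "D", "B")
    _send_add(values, "C", "A")
    _send_add(values, "B", "A")
    # Broadcast phase, following the same tree in reverse.
    _send_copy(values, "A", "B")
    _send_copy(values, "A", "C")
    _send_copy(values, "B", "D")
    return values


result = group_allreduce({"A": 1, "B": 2, "C": 3, "D": 4})
print(result)
```

Note that A and D are never linked directly, so D's contribution reaches the root through B, and the broadcast reaches D the same way.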

Type: Grant

Filed: January 30, 2020

Date of Patent: April 4, 2023

Assignee: Alibaba Group Holding Limited

Inventors: Liang Han, Yang Jiao

Methods and systems for providing changes to a live voice stream

Patent number: 11610577

Abstract: Methods and systems for providing a change to a voice interacting with a user are described. Information indicating a change that can be made to the voice can be received. The voice can be changed based on the information.

Type: Grant

Filed: November 19, 2020

Date of Patent: March 21, 2023

Assignee: Capital One Services, LLC

Inventors: Anh Truong, Mark Watson, Jeremy Goodsitt, Vincent Pham, Fardin Abdi Taghi Abad, Kate Key, Austin Walters, Reza Farivar

Method and system for parametric speech synthesis

Patent number: 11605371

Abstract: Embodiments of the present systems and methods may provide techniques for synthesizing speech in any voice in any language in any accent. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.

Type: Grant

Filed: June 14, 2019

Date of Patent: March 14, 2023

Assignee: Georgetown University

Inventors: Joe Garman, Ophir Frieder

Systems and methods for providing audible flight information

Patent number: 11605370

Abstract: Disclosed are methods and systems for providing audible flight information to an operator of an aircraft. A method, for example, may include receiving flight information detected by one or more sensors positioned on the aircraft, causing an image to be displayed on a display device, the image including a plurality of text items corresponding to the flight information, receiving a first operator selection indicative of one or more of the text items, parsing the one or more text items to generate a set of intermediate data, synthesizing audio data based on the intermediate data, and causing audible content corresponding to the audio data to be emitted by one or more audio emitting devices, wherein the audible content includes speech corresponding to the flight information.

Type: Grant

Filed: August 12, 2021

Date of Patent: March 14, 2023

Assignee: Honeywell International Inc.

Inventor: Dongfang Zhang

Transcription of communications

Patent number: 11600279

Abstract: A method to transcribe communications may include obtaining audio data originating at a first device during a communication session between the first device and a second device and providing the audio data to an automated speech recognition system configured to transcribe the audio data. The method may further include obtaining multiple hypothesis transcriptions generated by the automated speech recognition system. Each of the multiple hypothesis transcriptions may include one or more words determined by the automated speech recognition system to be a transcription of a portion of the audio data. The method may further include determining one or more consistent words that are included in two or more of the multiple hypothesis transcriptions and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device.
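The "consistent words" idea can be sketched as a simple count of how many hypotheses contain each word. This is a rough illustration under stated assumptions: the function name, the `min_support` parameter, whitespace tokenization, and the example hypotheses are hypothetical, and the sketch ignores word position, which a real system would likely take into account.

```python
from collections import Counter


def consistent_words(hypotheses, min_support=2):
    """Return words found in at least `min_support` hypotheses, in first-seen order."""
    support = Counter()
    for hypothesis in hypotheses:
        # Count each word once per hypothesis, however often it repeats there.
        support.update(set(hypothesis.lower().split()))

    seen = set()
    result = []
    for hypothesis in hypotheses:
        for word in hypothesis.lower().split():
            if support[word] >= min_support and word not in seen:
                seen.add(word)
                result.append(word)
    return result


hypotheses = [
    "please call me tomorrow",
    "please call me to borrow",
    "lease call me tomorrow",
]
words = consistent_words(hypotheses)
print(words)
```

Words that appear in only one hypothesis ("to", "borrow", "lease") are held back, while the words the hypotheses agree on could be presented right away.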

Type: Grant

Filed: August 26, 2019

Date of Patent: March 7, 2023

Assignee: Sorenson IP Holdings, LLC

Inventors: Brian Chevrier, Shane Roylance, Kenneth Boehme

Learning device and learning method, recognition device and recognition method, program, and storage medium

Patent number: 11599791

Abstract: An example embodiment includes a neural network unit to which a plurality of element values based on learning target data are input, and a learning unit that trains the neural network unit. The neural network unit has a plurality of learning cells each including a plurality of input nodes that perform predetermined weighting on each of the plurality of element values and an output node that sums the plurality of weighted element values and outputs the sum, and in accordance with an output value of each of the learning cells, the learning unit updates weighting coefficients of the plurality of input nodes of each of the learning cells or adds a new learning cell to the neural network unit.
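The update-or-add behavior described here can be sketched as follows. The abstract does not give the decision rule, so the normalized-output threshold, the learning rate, the weight-update formula, and all names below are assumptions chosen to keep the example small.

```python
class LearningCell:
    """One cell: a weight per input node and an output node that sums weighted inputs."""

    def __init__(self, weights):
        self.weights = list(weights)

    def output(self, elements):
        return sum(w * x for w, x in zip(self.weights, elements))


def train_step(cells, elements, threshold=0.6, rate=0.5):
    """Update the best-matching cell, or add a new cell if none matches well enough."""
    energy = sum(x * x for x in elements)
    if energy == 0:
        raise ValueError("elements must contain at least one non-zero value")
    best = max(cells, key=lambda c: c.output(elements), default=None)
    # Assumed rule: compare the best output, normalized by input energy, to a threshold.
    if best is None or best.output(elements) / energy < threshold:
        cells.append(LearningCell(elements))
        return "added"
    # Assumed update: move the winning cell's weights toward the input.
    best.weights = [w + rate * (x - w) for w, x in zip(best.weights, elements)]
    return "updated"


cells = []
patterns = ([1, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 1])
history = [train_step(cells, p) for p in patterns]
print(history, [c.weights for c in cells])
```

The first and third patterns are unlike anything seen so far and create new cells; the second and fourth are close enough to the first cell that its weights are adjusted instead.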

Type: Grant

Filed: November 20, 2018

Date of Patent: March 7, 2023

Assignee: NEC Solution Innovators, Ltd.

Inventors: Yoshihito Miyauchi, Akio Uda, Katsuhiro Nakade

Advanced machine learning interfaces

Patent number: 11593562

Abstract: A smart assistant is disclosed that provides for interfaces to capture requirements for a technical assistance request and then execute actions responsive to the technical assistance request. Example embodiments relate to parsing natural language input defining a technical assistance request to determine a series of instructions responsive to the technical assistance request. The smart assistant may also automatically detect a condition and generate a technical assistance request responsive to the condition. One or more driver applications may control or command one or more computing systems to respond to the technical assistance request.

Type: Grant

Filed: November 11, 2019

Date of Patent: February 28, 2023

Assignee: Affirm, Inc.

Inventors: Adam Smith, Tarak Upadhyaya, Juan Lozano, Daniel Hung

Agent system, agent processing method, and non-transitory storage medium that stores an agent processing program

Patent number: 11593568

Abstract: An agent system includes a first memory and a first processor coupled to the first memory. The first processor analyzes contents of a verbal question, and carries out pre-processing that replaces vocabulary, which is used in the contents of the question, with homogenized vocabulary, and generates response information based on results of analysis. In a case in which there exists substitution vocabulary that has replaced original vocabulary in the pre-processing, the first processor changes the response information such that it can be recognized that the substitution vocabulary in the response information is synonymous with the original vocabulary, and outputs the response information.
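The pre-processing and response adjustment can be sketched with a small synonym table. Everything specific here is hypothetical: the `SYNONYMS` table, the function names, the parenthetical annotation format, and the example question and response. The abstract only says the response is changed so the substitution can be recognized as synonymous with the original vocabulary.

```python
# Hypothetical table mapping varied vocabulary to one homogenized term.
SYNONYMS = {"automobile": "car", "vehicle": "car", "petrol": "fuel"}


def homogenize(question):
    """Replace known synonyms and remember which original word each one replaced."""
    replaced = {}
    words = []
    for word in question.lower().split():
        canonical = SYNONYMS.get(word, word)
        if canonical != word:
            replaced[canonical] = word
        words.append(canonical)
    return " ".join(words), replaced


def annotate_response(response, replaced):
    """Mark substituted vocabulary in the response with the user's original word."""
    words = []
    for word in response.split():
        original = replaced.get(word.lower())
        words.append(f"{word} ({original})" if original else word)
    return " ".join(words)


normalized, replaced = homogenize("How do I lock the automobile")
response = annotate_response("Press the button on the car key", replaced)
print(normalized)
print(response)
```

The question is analyzed using the homogenized term "car", while the response shows the user's own word alongside it.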

Type: Grant

Filed: November 12, 2020

Date of Patent: February 28, 2023

Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA

Inventors: Chikage Kubo, Keiko Nakano, Eiichi Maeda, Hiroyuki Nishizawa

Electronic apparatus and method for controlling thereof

Patent number: 11587547

Abstract: An electronic apparatus which acquires input data to be input into a TTS module for outputting a voice through the TTS module, acquires a voice signal corresponding to the input data through the TTS module, detects an error in the acquired voice signal based on the input data, corrects the input data based on the detection result, and acquires a corrected voice signal corresponding to the corrected input data through the TTS module.

Type: Grant

Filed: February 12, 2020

Date of Patent: February 21, 2023

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Hosang Sung, Kyoungbo Min, Seonho Hwang, Doohwa Hong, Eunmi Oh, Jonghoon Jeong, Kihyun Choo

Synthetic speech processing

Patent number: 11580955

Abstract: A speech-processing system receives input data representing text. A first encoder processes segments of the text to determine embedding data representing the text, and a second encoder processes corresponding audio data to determine prosodic data corresponding to the text. The embedding and prosodic data is processed to create output data including a representation of speech corresponding to the text and prosody.

Type: Grant

Filed: March 31, 2021

Date of Patent: February 14, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Yixiong Meng, Roberto Barra Chicote, Grzegorz Beringer, Zeya Chen, Jie Liang, James Garnet Droppo, Chia-Hao Chang, Oguz Hasan Elibol

Multilingual speech synthesis and cross-language voice cloning

Patent number: 11580952

Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.

Type: Grant

Filed: April 22, 2020

Date of Patent: February 14, 2023

Assignee: Google LLC

Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran

Synthetic speech processing

Patent number: 11574624

Abstract: A speech-processing system receives input data representing text. An input encoder processes the input data to determine first embedding data representing the text. A local attention encoder processes a subset of the first embedding data in accordance with a predicted size to determine second embedding data. An attention encoder processes the second embedding data to determine third embedding data. A decoder processes the third embedding data to determine audio data corresponding to the text.

Type: Grant

Filed: March 31, 2021

Date of Patent: February 7, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Alexis Pierre Jean-Baptiste Moinet, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati, Syed Ammar Abbas, Simon Slangen

Unified vision and dialogue transformer with BERT

Patent number: 11562147

Abstract: A visual dialogue model receives image input and text input that includes a dialogue history between the model and a current utterance by a human user. The model generates a unified contextualized representation using a transformer encoder network, in which the unified contextualized representation includes a token level encoding of the image input and text input. The model generates an encoded visual dialogue input from the unified contextualized representation using visual dialogue encoding layers. The encoded visual dialogue input includes a position level encoding and a segment type encoding. The model generates an answer prediction from the encoded visual dialogue input using a first self-attention mask associated with discriminative settings or a second self-attention mask associated with generative settings. Dense annotation fine tuning may be performed to increase accuracy of the answer prediction. The model provides the answer prediction as a response to the current utterance of the human user.

Type: Grant

Filed: July 15, 2020

Date of Patent: January 24, 2023

Assignee: Salesforce.com, Inc.

Inventors: Yue Wang, Chu Hong Hoi, Shafiq Rayhan Joty

Voice system and voice output method of moving machine

Patent number: 11557275

Abstract: A voice system of a moving machine is a voice system of a moving machine driven by a driver who is exposed to an outside of the moving machine and includes: a noise estimating section which estimates a future noise state based on information related to a noise generation factor; and a voice control section which changes an attribute of voice in accordance with the estimated noise state, the voice being voice to be output to the driver.

Type: Grant

Filed: May 30, 2019

Date of Patent: January 17, 2023

Assignee: KAWASAKI MOTORS, LTD.

Inventors: Masanori Kinuhata, Daisuke Kawai, Shohei Terai, Hisanosuke Kawada, Hirotoshi Shimura
