Patents Examined by Shreyans A Patel
  • Patent number: 11670300
    Abstract: Systems and methods are described that include a robot and/or an associated computing system that can use various cues about an environment of the robot to apply a bias to increase the accuracy of speech transcription. In some implementations, audio data corresponding to a spoken instruction to a robot is received. Candidate transcriptions of the audio data are obtained. A respective action of the robot corresponding to each of the candidate transcriptions of the audio data is determined. One or more scores indicating characteristics of a potential outcome of performing the respective action corresponding to the candidate transcription of the audio data are determined for each of the candidate transcriptions of the audio data. A particular candidate transcription is selected from among the candidate transcriptions based at least on the one or more scores. The action determined for the particular candidate transcription is performed.
    Type: Grant
    Filed: July 8, 2022
    Date of Patent: June 6, 2023
    Assignee: X Development LLC
    Inventor: Daniel Alex Lam
  • Patent number: 11657828
    Abstract: Embodiments improve speech data quality through training a neural network for de-noising audio enhancement. One such embodiment creates simulated noisy speech data from high quality speech data. In turn, training, e.g., deep normalizing flow training, is performed on a neural network using the high quality speech data and the simulated noisy speech data to train the neural network to create de-noised speech data given noisy speech data. Performing the training includes minimizing errors in the neural network according to at least one of (i) a decoding error of an Automatic Speech Recognition (ASR) system processing current de-noised speech data results generated by the neural network during the training and (ii) spectral distance between the high quality speech data and the current de-noised speech data results generated by the neural network during the training.
    Type: Grant
    Filed: January 31, 2020
    Date of Patent: May 23, 2023
    Assignee: Nuance Communications, Inc.
    Inventor: Carl Benjamin Quillen
  • Patent number: 11657277
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing sequence modeling tasks using insertions. One of the methods includes receiving a system input that includes one or more source elements from a source sequence and zero or more target elements from a target sequence, wherein each source element is selected from a vocabulary of source elements and wherein each target element is selected from a vocabulary of target elements; generating a partial concatenated sequence that includes the one or more source elements from the source sequence and the zero or more target elements from the target sequence, wherein the source and target elements are arranged in the partial concatenated sequence according to a combined order; and generating a final concatenated sequence that includes a finalized source sequence and a finalized target sequence, wherein the finalized target sequence includes one or more target elements.
    Type: Grant
    Filed: May 26, 2020
    Date of Patent: May 23, 2023
    Assignee: Google LLC
    Inventors: William Chan, Mitchell Thomas Stern, Nikita Kitaev, Kelvin Gu, Jakob D. Uszkoreit
  • Patent number: 11651780
    Abstract: A speech recognition system utilizes automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: May 16, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kenneth John Basye, Jeffrey Penrod Adams
  • Patent number: 11646010
    Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: May 9, 2023
    Assignee: Google LLC
    Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
  • Patent number: 11630999
    Abstract: A method and system for voice emotion identification contained in audio in a call providing customer support between a customer and a service agent by implementing an emotion identification application to identify emotions captured in a voice of the customer from audio received by a media streaming device; receiving, by the emotion identification application, an audio stream of a series of voice samples contained in consecutive frames from audio received; extracting, by the emotion identification application, a set of voice emotion features from each frame in each voice sample of the audio by applying a trained machine learning (ML) model for identifying emotions utilizing a neural network to determine one or more voice emotions by a configured set of voice emotion features captured in each voice sample; and classifying, by the emotion identification application, each emotion determined by the trained ML model based on a set of classifying features to label one or more types of emotions captured in each voic
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: April 18, 2023
    Inventors: Arun Lokman Gangotri, Balarama Mathukumilli, Debashish Sahoo
  • Patent number: 11620502
    Abstract: The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes. Each group includes a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple each of computing nodes A-D with corresponding computing nodes A-D in each of a plurality of neighboring groups. The method comprises syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: April 4, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Liang Han, Yang Jiao
  • Patent number: 11610577
    Abstract: Methods and systems for providing a change to a voice interacting with a user are described. Information indicating a change that can be made to the voice can be received. The voice can be changed based on the information.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: March 21, 2023
    Assignee: Capital One Services, LLC
    Inventors: Anh Truong, Mark Watson, Jeremy Goodsitt, Vincent Pham, Fardin Abdi Taghi Abad, Kate Key, Austin Walters, Reza Farivar
  • Patent number: 11605371
    Abstract: Embodiments of the present systems and methods may provide techniques for synthesizing speech in any voice in any language in any accent. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
    Type: Grant
    Filed: June 14, 2019
    Date of Patent: March 14, 2023
    Assignee: Georgetown University
    Inventors: Joe Garman, Ophir Frieder
  • Patent number: 11605370
    Abstract: Disclosed are methods and systems for providing audible flight information to an operator of an aircraft. A method, for example, may include receiving flight information detected by one or more sensors positioned on the aircraft, causing an image to be displayed on a display device, the image including a plurality of text items corresponding to the flight information, receiving a first operator selection indicative of one or more of the text items, parsing the one or more text items to generate a set of intermediate data, synthesizing audio data based on the intermediate data, and causing audible content corresponding to the audio data to be emitted by one or more audio emitting devices, wherein the audible content includes speech corresponding to the flight information.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: March 14, 2023
    Assignee: Honeywell International Inc.
    Inventor: Dongfang Zhang
  • Patent number: 11600279
    Abstract: A method to transcribe communications may include obtaining audio data originating at a first device during a communication session between the first device and a second device and providing the audio data to an automated speech recognition system configured to transcribe the audio data. The method may further include obtaining multiple hypothesis transcriptions generated by the automated speech recognition system. Each of the multiple hypothesis transcriptions may include one or more words determined by the automated speech recognition system to be a transcription of a portion of the audio data. The method may further include determining one or more consistent words that are included in two or more of the multiple hypothesis transcriptions and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: March 7, 2023
    Assignee: Sorenson IP Holdings, LLC
    Inventors: Brian Chevrier, Shane Roylance, Kenneth Boehme
  • Patent number: 11599791
    Abstract: An example embodiment includes a neural network unit to which a plurality of element values based on learning target data are input, and a learning unit that trains the neural network unit. The neural network unit has a plurality of learning cells each including a plurality of input nodes that perform predetermined weighting on each of the plurality of element values and an output node that sums the plurality of weighted element values and outputs the sum, and in accordance with an output value of each of the learning cells, the learning unit updates weighting coefficients of the plurality of input nodes of each of the learning cells or adds a new learning cell to the neural network unit.
    Type: Grant
    Filed: November 20, 2018
    Date of Patent: March 7, 2023
    Assignee: NEC Solution Innovators, Ltd.
    Inventors: Yoshihito Miyauchi, Akio Uda, Katsuhiro Nakade
  • Patent number: 11593562
    Abstract: A smart assistant is disclosed that provides for interfaces to capture requirements for a technical assistance request and then execute actions responsive to the technical assistance request. Example embodiments relate to parsing natural language input defining a technical assistance request to determine a series of instructions responsive to the technical assistance request. The smart assistant may also automatically detect a condition and generate a technical assistance request responsive to the condition. One or more driver applications may control or command one or more computing systems to respond to the technical assistance request.
    Type: Grant
    Filed: November 11, 2019
    Date of Patent: February 28, 2023
    Assignee: Affirm, Inc.
    Inventors: Adam Smith, Tarak Upadhyaya, Juan Lozano, Daniel Hung
  • Patent number: 11593568
    Abstract: An agent system includes a first memory and a first processor coupled to the first memory. The first processor analyzes contents of a verbal question, and carries out pre-processing that replaces vocabulary, which is used in the contents of the question, with homogenized vocabulary, and generates response information based on results of analysis. In a case in which there exists substitution vocabulary that has replaced original vocabulary in the pre-processing, the first processor changes the response information such that it can be recognized that the substitution vocabulary in the response information is synonymous with the original vocabulary, and outputs the response information.
    Type: Grant
    Filed: November 12, 2020
    Date of Patent: February 28, 2023
    Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventors: Chikage Kubo, Keiko Nakano, Eiichi Maeda, Hiroyuki Nishizawa
  • Patent number: 11587547
    Abstract: An electronic apparatus which acquires input data to be input into a TTS module for outputting a voice through the TTS module, acquires a voice signal corresponding to the input data through the TTS module, detects an error in the acquired voice signal based on the input data, corrects the input data based on the detection result, and acquires a corrected voice signal corresponding to the corrected input data through the TTS module.
    Type: Grant
    Filed: February 12, 2020
    Date of Patent: February 21, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Hosang Sung, Kyoungbo Min, Seonho Hwang, Doohwa Hong, Eunmi Oh, Jonghoon Jeong, Kihyun Choo
  • Patent number: 11580955
    Abstract: A speech-processing system receives input data representing text. A first encoder processes segments of the text to determine embedding data representing the text, and a second encoder processes corresponding audio data to determine prosodic data corresponding to the text. The embedding and prosodic data are processed to create output data including a representation of speech corresponding to the text and prosody.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: February 14, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Yixiong Meng, Roberto Barra Chicote, Grzegorz Beringer, Zeya Chen, Jie Liang, James Garnet Droppo, Chia-Hao Chang, Oguz Hasan Elibol
  • Patent number: 11580952
    Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
    Type: Grant
    Filed: April 22, 2020
    Date of Patent: February 14, 2023
    Assignee: Google LLC
    Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
  • Patent number: 11574624
    Abstract: A speech-processing system receives input data representing text. An input encoder processes the input data to determine first embedding data representing the text. A local attention encoder processes a subset of the first embedding data in accordance with a predicted size to determine second embedding data. An attention encoder processes the second embedding data to determine third embedding data. A decoder processes the third embedding data to determine audio data corresponding to the text.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: February 7, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Alexis Pierre Jean-Baptiste Moinet, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati, Syed Ammar Abbas, Simon Slangen
  • Patent number: 11562147
    Abstract: A visual dialogue model receives image input and text input that includes a dialogue history between the model and a current utterance by a human user. The model generates a unified contextualized representation using a transformer encoder network, in which the unified contextualized representation includes a token level encoding of the image input and text input. The model generates an encoded visual dialogue input from the unified contextualized representation using visual dialogue encoding layers. The encoded visual dialogue input includes a position level encoding and a segment type encoding. The model generates an answer prediction from the encoded visual dialogue input using a first self-attention mask associated with discriminative settings or a second self-attention mask associated with generative settings. Dense annotation fine tuning may be performed to increase accuracy of the answer prediction. The model provides the answer prediction as a response to the current utterance of the human user.
    Type: Grant
    Filed: July 15, 2020
    Date of Patent: January 24, 2023
    Assignee: Salesforce.com, Inc.
    Inventors: Yue Wang, Chu Hong Hoi, Shafiq Rayhan Joty
  • Patent number: 11557275
    Abstract: A voice system for a moving machine driven by a driver who is exposed to the outside of the moving machine includes: a noise estimating section that estimates a future noise state based on information related to a noise generation factor; and a voice control section that changes an attribute of the voice to be output to the driver in accordance with the estimated noise state.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: January 17, 2023
    Assignee: KAWASAKI MOTORS, LTD.
    Inventors: Masanori Kinuhata, Daisuke Kawai, Shohei Terai, Hisanosuke Kawada, Hirotoshi Shimura
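
The two sections named in the abstract of patent 11557275 (a noise estimating section and a voice control section) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the choice of vehicle speed and acceleration as the noise generation factors, the linear noise model and its coefficients, the 70 dB threshold, and the use of volume and speaking rate as the voice attributes. The names `estimate_future_noise_db`, `choose_voice_attributes`, and `VoiceAttributes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VoiceAttributes:
    volume_db: float  # gain added on top of a quiet-environment baseline
    rate: float       # speaking-rate multiplier (1.0 = normal speed)


def estimate_future_noise_db(speed_kmh: float, accel_kmh_per_s: float,
                             horizon_s: float = 2.0) -> float:
    """Hypothetical noise estimator: predict speed a short time ahead and
    map it to a noise level. Coefficients are illustrative only."""
    predicted_speed = max(0.0, speed_kmh + accel_kmh_per_s * horizon_s)
    # Assumed model: 60 dB at rest, plus 0.3 dB per km/h of predicted speed.
    return 60.0 + 0.3 * predicted_speed


def choose_voice_attributes(noise_db: float) -> VoiceAttributes:
    """Hypothetical voice control: louder and slower speech as noise rises."""
    excess = max(0.0, noise_db - 70.0)    # noise above an assumed threshold
    volume_db = min(12.0, 0.5 * excess)   # cap the boost at +12 dB
    rate = max(0.8, 1.0 - 0.01 * excess)  # never slower than 0.8x
    return VoiceAttributes(volume_db=volume_db, rate=rate)


if __name__ == "__main__":
    # Rider at 50 km/h, accelerating at 10 km/h per second.
    noise = estimate_future_noise_db(speed_kmh=50.0, accel_kmh_per_s=10.0)
    print(noise, choose_voice_attributes(noise))
```

The point of the split is that the voice attributes are chosen from the predicted noise state rather than the currently measured one, so a prompt issued during acceleration is already adjusted for the louder conditions in which it will be heard.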