Patents Examined by Jakieda R Jackson
  • Patent number: 11710477
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing are described. In one aspect, a method includes the action of accessing voice query log data that includes voice queries spoken by a particular user. The actions further include determining, based on that voice query log data, a pause threshold for the particular user. The actions further include receiving, from the particular user, an utterance. The actions further include determining that the particular user has stopped speaking for at least a period of time equal to the pause threshold. The actions further include, based on that determination, processing the utterance as a voice query.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: July 25, 2023
    Assignee: Google LLC
    Inventors: Siddhi Tadpatrikar, Michael Buchanan, Pravir Kumar Gupta
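The endpointing behavior described in this abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes the per-user pause threshold is taken as a high percentile of pause durations observed in that user's voice query logs (the abstract does not specify the statistic), and all names are hypothetical.

```python
# Sketch of per-user pause-threshold endpointing (illustrative only).

def pause_threshold(logged_pauses, percentile=0.95):
    """Derive a pause threshold (seconds) from a user's logged intra-query pauses."""
    ordered = sorted(logged_pauses)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def should_endpoint(silence_seconds, threshold):
    """Process the utterance as a complete voice query once the user has been
    silent for at least the threshold duration."""
    return silence_seconds >= threshold

# A fast talker's short logged pauses yield a shorter threshold than a
# deliberate talker's, so endpointing adapts to the particular user.
fast_threshold = pause_threshold([0.2, 0.3, 0.25, 0.4, 0.35])
slow_threshold = pause_threshold([0.8, 1.1, 0.9, 1.3, 1.0])
```

With these toy logs, a 0.5-second silence would endpoint the fast talker's utterance but not the slow talker's.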
  • Patent number: 11705107
    Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
    Type: Grant
    Filed: October 1, 2020
    Date of Patent: July 18, 2023
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
  • Patent number: 11694680
    Abstract: A machine causes a touch-sensitive screen to present a graphical user interface that depicts a slider control aligned with a word that includes a first alphabetic letter and a second alphabetic letter. A first zone of the slider control corresponds to the first alphabetic letter, and a second zone of the slider control corresponds to the second alphabetic letter. The machine detects a touch-and-drag input that begins within the first zone and enters the second zone. In response to the touch-and-drag input beginning within the first zone, the machine presents a first phoneme that corresponds to the first alphabetic letter, and the presenting of the first phoneme may include audio playback of the first phoneme. In response to the touch-and-drag input entering the second zone, the machine presents a second phoneme that corresponds to the second alphabetic letter, which may include audio playback of the second phoneme.
    Type: Grant
    Filed: May 4, 2022
    Date of Patent: July 4, 2023
    Assignee: Learning Squared, Inc.
    Inventors: Vera Blau-McCandliss, Bruce Donald McCandliss
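The zone-to-phoneme behavior can be sketched as follows. This is a hedged illustration only: it assumes equal-width zones across the slider and a toy word and phoneme table; none of these details come from the patent.

```python
# Illustrative slider: one zone per letter of "cat", equal widths (assumed).

WORD = "cat"
PHONEMES = {"c": "/k/", "a": "/ae/", "t": "/t/"}
SLIDER_WIDTH = 300  # pixels (assumed)

def letter_at(x):
    """Map a horizontal touch position to the letter whose zone contains it."""
    zone = min(int(x / (SLIDER_WIDTH / len(WORD))), len(WORD) - 1)
    return WORD[zone]

def phonemes_along_drag(positions):
    """Present (here: collect) a phoneme each time the drag enters a new zone."""
    played, current = [], None
    for x in positions:
        letter = letter_at(x)
        if letter != current:
            played.append(PHONEMES[letter])  # stands in for audio playback
            current = letter
    return played
```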
  • Patent number: 11694028
    Abstract: According to one embodiment, the data generation apparatus includes a speech synthesis unit, a speech recognition unit, a matching processing unit, and a dataset generation unit. The speech synthesis unit generates speech data from an original text. The speech recognition unit generates a recognition text by speech recognition from the speech data. The matching processing unit performs matching between the original text and the recognition text. The dataset generation unit generates a dataset in such a manner that the speech data, from which the recognition text satisfying a certain condition for a matching degree relative to the original text is generated, is associated with the original text, based on the matching result.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: July 4, 2023
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Hiroshi Fujimura, Kenji Iwata, Hui Di, Pengfei Chen
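The round-trip filtering idea in this abstract can be sketched as below; `difflib.SequenceMatcher` stands in for the patent's unspecified matching procedure, and the threshold and data are illustrative assumptions.

```python
import difflib

def matching_degree(original, recognized):
    """Similarity between the original text and the ASR transcript of its TTS audio."""
    return difflib.SequenceMatcher(None, original, recognized).ratio()

def build_dataset(candidates, min_degree=0.9):
    """Keep (speech, original text) pairs whose recognition text matches closely."""
    return [(speech, text) for speech, text, recognized in candidates
            if matching_degree(text, recognized) >= min_degree]

candidates = [
    ("utt0.wav", "open the window", "open the window"),  # clean round trip
    ("utt1.wav", "open the window", "open a winter"),    # ASR diverged; drop
]
```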
  • Patent number: 11676623
    Abstract: Method, system, device, and non-transitory computer-readable medium for joining a virtual participant in a conversation. In some examples, a computer-implemented method includes: identifying a first conversation scheduled to be participated in by a first group of actual participants; joining a first virtual participant into the first conversation; obtaining, via the first virtual participant, a first set of audio data associated with the first conversation while the first conversation occurs; transcribing, via the first virtual participant, the first set of audio data into a first set of text data while the first conversation occurs; and presenting the first set of text data to the first group of actual participants while the first conversation occurs.
    Type: Grant
    Filed: February 23, 2022
    Date of Patent: June 13, 2023
    Assignee: Otter.ai, Inc.
    Inventors: Amro Younes, Winfred James, Tao Xing, Cheng Yuan, Yun Fu, Simon Lau, Robert Firebaugh, Sam Liang
  • Patent number: 11676601
    Abstract: According to one embodiment, a system for voice assistant tracking and activation includes a tracking component, a wake component, a listening component, and a link component. The tracking component is configured to track availability of a plurality of voice assistant services. The wake component is configured to determine a plurality of wake words, each wake word corresponding to a specific voice assistant service of the plurality of voice assistant services. The listening component is configured to receive audio and detect a first wake word of the plurality of wake words that corresponds to a first voice assistant service of the plurality of voice assistant services. The link component is configured to establish a voice link with the first voice assistant service for voice input by the user.
    Type: Grant
    Filed: September 28, 2021
    Date of Patent: June 13, 2023
    Assignee: Ford Global Technologies, LLC
    Inventors: Ashish Nadkar, Alan Daniel Gonzalez
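The track/wake/listen/link components map onto a small router sketch. Everything here is an assumption made for illustration: the wake words, the service names, and the simplification of matching on already-transcribed text.

```python
# Illustrative multi-assistant wake-word router (hypothetical names throughout).

WAKE_WORDS = {"hey helper": "HelperAssist", "ok pilot": "PilotAssist"}

class AssistantRouter:
    def __init__(self, available_services):
        self.available = set(available_services)  # tracked service availability
        self.linked = None

    def hear(self, transcript):
        """Detect a wake word and link the session to its assistant service."""
        for wake_word, service in WAKE_WORDS.items():
            if wake_word in transcript.lower() and service in self.available:
                self.linked = service  # voice link for subsequent voice input
                return service
        return None

router = AssistantRouter(available_services=["HelperAssist"])
```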
  • Patent number: 11676572
    Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input, where the TTS pronunciation of the particular word is different than the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting the one of the user pronunciation or the TTS pronunciation of the particular word that is associated with the highest confidence. The method also includes providing the TTS audio that includes a synthesized speech representation of the response to the query using the user pronunciation or the TTS pronunciation for the particular word.
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: June 13, 2023
    Assignee: Google LLC
    Inventors: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski
  • Patent number: 11676609
    Abstract: The present disclosure provides a speaker recognition method, an electronic device, and a storage medium. An implementation includes: segmenting the target audio file and the to-be-recognized audio file into a plurality of audio units respectively; extracting an audio feature from each of the audio units to obtain an audio feature sequence of the target audio file and an audio feature sequence of the to-be-recognized audio file; performing feature learning on the audio feature sequence of the target audio file and the audio feature sequence of the to-be-recognized audio file by using a Siamese neural network, to obtain a feature vector corresponding to the target audio file and feature vectors respectively corresponding to the plurality of audio units in the to-be-recognized audio file; and recognizing, by using an attention mechanism-based machine learning model, the audio units belonging to the target speaker in the to-be-recognized audio file based on the feature vectors.
    Type: Grant
    Filed: December 20, 2022
    Date of Patent: June 13, 2023
    Inventors: Hang Li, Wenbiao Ding, Zitao Liu
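Once a Siamese encoder has produced one vector per audio unit, the final attribution step can be approximated with plain cosine similarity. This is a stand-in for the patent's attention-based model, and the vectors and threshold below are toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def units_of_target(target_vec, unit_vecs, threshold=0.8):
    """Indices of to-be-recognized audio units attributed to the target speaker."""
    return [i for i, vec in enumerate(unit_vecs)
            if cosine(target_vec, vec) >= threshold]
```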
  • Patent number: 11670285
    Abstract: Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user to re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: June 6, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kevin Crews, Prasanna H Sridhar, Ariya Rastrow, Nicholas Matthew Jutila, Andrew Oberlin, Samarth Batra, Paul Anthony Bernhardt, Veerdhawal Pande, Roland Maximilian Rolf Maas
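The turn-taking decision can be sketched as a threshold policy on the reading-evaluation score; the thresholds and action names below are illustrative assumptions, not taken from the patent.

```python
def next_turn(evaluation_score):
    """Choose the system's next move from the user's reading evaluation."""
    if evaluation_score >= 0.8:
        return "system_reads_next_portion"     # good turn: system reads next
    if evaluation_score >= 0.5:
        return "ask_user_to_reread_portion"    # close: retry the same text
    return "ask_user_to_read_smaller_portion"  # struggling: shrink the task
```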
  • Patent number: 11669693
    Abstract: An electronic apparatus includes an input unit comprising input circuitry configured to receive a natural language input, a communicator comprising communication circuitry configured to perform communication with a plurality of external chatting servers, and a processor configured to analyze a characteristic of the natural language input and a characteristic of the user and to identify a chatting server corresponding to the natural language from among the plurality of chatting servers, and to control the communicator to transmit the natural language to the identified chatting server in order to receive a response with respect to the natural language.
    Type: Grant
    Filed: July 16, 2021
    Date of Patent: June 6, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Chang-hwan Choi, Ji-hwan Yun, Man-un Jeong
  • Patent number: 11657814
    Abstract: Embodiments of the present disclosure set forth a computer-implemented method comprising detecting an initial phrase portion included in a first auditory signal generated by a user, identifying, based on the initial phrase portion, a supplemental phrase portion that complements the initial phrase portion to form a complete phrase, and providing a command signal that drives an output device to generate an audio output corresponding to the supplemental phrase portion.
    Type: Grant
    Filed: October 8, 2020
    Date of Patent: May 23, 2023
    Assignee: Harman International Industries, Incorporated
    Inventors: Stefan Marti, Joseph Verbeke, Evgeny Burmistrov, Priya Seshadri
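A toy version of the complete-the-phrase behavior, with a hypothetical phrase table standing in for whatever model actually identifies the supplemental portion:

```python
# Illustrative phrase table mapping initial portions to supplemental portions.
PHRASES = {"happy new": "year", "better late than": "never"}

def supplemental_portion(initial_portion):
    """Return the portion that completes a recognized initial phrase, if any."""
    return PHRASES.get(initial_portion.lower().strip())
```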
  • Patent number: 11657830
    Abstract: Systems and methods are disclosed for data driven radio enhancement. For example, methods may include demodulating a radio signal to obtain a demodulated audio signal; determining a window of audio samples based on the demodulated audio signal; applying an audio enhancement network to the window of audio samples to obtain an enhanced audio segment, in which the audio enhancement network includes a machine learning network that has been trained using demodulated audio signals derived from radio signals; and storing, playing, or transmitting an enhanced audio signal based on the enhanced audio segment.
    Type: Grant
    Filed: January 26, 2021
    Date of Patent: May 23, 2023
    Assignee: BabbleLabs LLC
    Inventors: Samer Hijazi, Kamil Krzysztof Wojcicki, Dror Maydan, Christopher Rowen
  • Patent number: 11657816
    Abstract: Methods, systems, and apparatus for defining and monitoring an event for a physical entity, and for performing an action in response to the occurrence of the event. A method includes receiving data indicating an event for a physical entity, the event specified in part by a physical environment feature for which the occurrence of the event is to be monitored by the data processing apparatus; receiving data indicating an action associated with the event and to be taken in response to the occurrence of the event; monitoring for the occurrence of the event for the physical entity; and in response to the occurrence of the event, causing the action associated with the event to be performed.
    Type: Grant
    Filed: November 16, 2020
    Date of Patent: May 23, 2023
    Assignee: GOOGLE LLC
    Inventors: Bo Wang, Sunil Vemuri, Nitin Mangesh Shetti, Pravir Kumar Gupta, Scott B. Huffman, Javier Alejandro Rey, Jeffrey A. Boortz
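The define-event / attach-action / monitor loop can be sketched with events as predicates over reported physical-entity state; the class and its API are hypothetical, not the patented design.

```python
class EventMonitor:
    """Toy monitor: events are predicates over a reported state dict."""

    def __init__(self):
        self.rules = []  # (event condition, associated action)

    def define(self, condition, action):
        """Register an event and the action to perform when it occurs."""
        self.rules.append((condition, action))

    def observe(self, state):
        """Check the latest physical-environment state; run the action for
        every event whose condition now holds."""
        return [action(state) for condition, action in self.rules
                if condition(state)]
```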
  • Patent number: 11645660
    Abstract: A computerized method of representing customer interactions with an organization includes: receiving, by a computing device, customer web interaction data segments and customer conversation data segments; pre-processing the customer conversation data segments to remove specified types of information; scoring each of the pre-processed customer conversation data segments; pre-processing the customer web interaction data segments; extracting tokens from the pre-processed customer web interaction data segments; combining the pre-processed customer conversation data segments and the pre-processed customer web interaction data segments into a customer data set; parsing the customer data set into one or more windows; assigning, for each window, pre-trained weights to each of the tokens in each window; assigning a transaction theme to each window based on the tokens in each window; and generating, based on the transaction themes, a ranked list of topic keywords reflecting the customer web interaction data segments and the customer conversation data segments.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: May 9, 2023
    Assignee: FMR LLC
    Inventors: Ankush Chopra, Abhishek Desai, Aravind Chandramouli
  • Patent number: 11631404
    Abstract: Audio distortion compensation methods to improve the accuracy and efficiency of audio content identification are described; the methods also apply to speech recognition. Methods to detect interference from speakers and sources, and distortion introduced into audio by the environment and devices, are discussed, along with additional methods to detect distortion in the content after performing search and correlation. The causes of actual distortion at each client are measured, registered, and learned to generate rules for determining likely distortion and interference sources. The learned rules are applied at the client: likely distortions that are detected are compensated, or heavily distorted sections are ignored, at the audio level or at the signature and feature level, based on the compute resources available. Further methods to subtract the likely distortions in the query, both at the audio level and after processing at the signature and feature level, are described.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: April 18, 2023
    Assignee: ROKU, INC.
    Inventors: Jose Pio Pereira, Sunil Suresh Kulkarni, Mihailo M. Stojancic, Shashank Merchant, Peter Wendt
  • Patent number: 11626104
    Abstract: A device includes processors configured to determine, in a first power mode, whether an audio stream corresponds to speech of at least two talkers. The processors are configured to, based on determining that the audio stream corresponds to speech of at least two talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result. The processors are configured to perform a comparison of a plurality of user speech profiles to an audio feature data set of a plurality of audio feature data sets of a talker-homogenous audio segment to determine whether the audio feature data set matches any of the user speech profiles. The processors are configured to, based on determining that the audio feature data set does not match any of the plurality of user speech profiles, generate a user speech profile based on the plurality of audio feature data sets.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: April 11, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Soo Jin Park, Sunkuk Moon, Lae-Hoon Kim, Erik Visser
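The match-or-enroll step can be sketched as below, assuming each talker-homogenous segment is summarized as a feature vector; the Euclidean distance, threshold, and profile-id scheme are illustrative assumptions rather than the patented method.

```python
def l2(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def match_or_enroll(profiles, segment_features, threshold=0.5):
    """Return the matching user speech profile id, or enroll a new profile
    from the segment's features when nothing matches."""
    for profile_id, vec in profiles.items():
        if l2(vec, segment_features) <= threshold:
            return profile_id
    new_id = f"user{len(profiles)}"  # no match: generate a new speech profile
    profiles[new_id] = segment_features
    return new_id
```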
  • Patent number: 11615782
    Abstract: Techniques are described for training neural networks on variable length datasets. The numeric representation of the length of each training sample is randomly perturbed to yield a pseudo-length, and the samples sorted by pseudo-length to achieve lower zero padding rate (ZPR) than completely randomized batching (thus saving computation time) yet higher randomness than strictly sorted batching (thus achieving better model performance than strictly sorted batching).
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: March 28, 2023
    Assignee: Sony Interactive Entertainment Inc.
    Inventors: Zhenhao Ge, Lakshmish Kaushik, Saket Kumar, Masanori Omote
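The pseudo-length idea maps directly to code: perturb each sample's true length with random noise, sort by the perturbed value, and batch in order, so batches stay length-homogeneous (low zero padding) while batch composition varies run to run. This is a minimal sketch; the noise scale is an assumed hyperparameter.

```python
import random

def pseudo_length_batches(lengths, batch_size, noise=2.0, seed=0):
    """Group sample indices into batches ordered by randomly perturbed length."""
    rng = random.Random(seed)
    # pseudo-length = true length + uniform noise; sort indices by it
    order = sorted(range(len(lengths)),
                   key=lambda i: lengths[i] + rng.uniform(-noise, noise))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```

With `noise=0` this degenerates to strictly sorted batching; as the noise grows it approaches completely randomized batching, trading padding efficiency for randomness.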
  • Patent number: 11615777
    Abstract: A terminal may include a display that, when a real-time broadcast in which a user of the terminal is the host starts through a broadcasting channel, is divided into at least two areas, one of which is allocated to the host; an input/output interface that receives a voice of the host; a communication interface that receives one item, selected from at least one or more items, and a certain text from a terminal of a certain guest among at least one or more guests who entered the broadcasting channel; and a processor that generates a voice message converted from the certain text into the voice of the host or a voice of the certain guest.
    Type: Grant
    Filed: August 6, 2020
    Date of Patent: March 28, 2023
    Assignee: Hyperconnect Inc.
    Inventors: Sang Il Ahn, Ju Young Hong, Yong Uk Jeong
  • Patent number: 11605388
    Abstract: This specification describes a computer-implemented method of generating speech audio for use in a video game, wherein the speech audio is generated using a voice convertor that has been trained to convert audio data for a source speaker into audio data for a target speaker. The method comprises receiving: (i) source speech audio, and (ii) a target speaker identifier. The source speech audio comprises speech content in the voice of a source speaker. Source acoustic features are determined for the source speech audio. A target speaker embedding associated with the target speaker identifier is generated as output of a speaker encoder of the voice convertor. The target speaker embedding and the source acoustic features are inputted into an acoustic feature encoder of the voice convertor. One or more acoustic feature encodings are generated as output of the acoustic feature encoder. The one or more acoustic feature encodings are derived from the target speaker embedding and the source acoustic features.
    Type: Grant
    Filed: November 9, 2020
    Date of Patent: March 14, 2023
    Assignee: Electronic Arts Inc.
    Inventors: Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Harold Chaput, Navid Aghdaie, Kazi Zaman
  • Patent number: 11605368
    Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
    Type: Grant
    Filed: November 11, 2021
    Date of Patent: March 14, 2023
    Assignee: Google LLC
    Inventors: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar