Patents Examined by Jakieda R Jackson
  • Patent number: 11710477
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing are described. In one aspect, a method includes the action of accessing voice query log data that includes voice queries spoken by a particular user. The actions further include determining, based on that voice query log data, a pause threshold for the particular user. The actions further include receiving, from the particular user, an utterance. The actions further include determining that the particular user has stopped speaking for at least a period of time equal to the pause threshold. The actions further include, based on that determination, processing the utterance as a voice query.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: July 25, 2023
    Assignee: Google LLC
    Inventors: Siddhi Tadpatrikar, Michael Buchanan, Pravir Kumar Gupta
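The endpointing behavior described in this abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes the per-user pause threshold is taken as a high percentile of pause durations observed in that user's voice query logs (the abstract does not specify the statistic), and all names are hypothetical.

```python
# Sketch of per-user pause-threshold endpointing (illustrative only).

def pause_threshold(logged_pauses, percentile=0.95):
    """Derive a pause threshold (seconds) from a user's logged intra-query pauses."""
    ordered = sorted(logged_pauses)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def should_endpoint(silence_seconds, threshold):
    """Process the utterance as a complete voice query once the user has been
    silent for at least the threshold duration."""
    return silence_seconds >= threshold

# A fast talker's short logged pauses yield a shorter threshold than a
# deliberate talker's, so endpointing adapts to the particular user.
fast_threshold = pause_threshold([0.2, 0.3, 0.25, 0.4, 0.35])
slow_threshold = pause_threshold([0.8, 1.1, 0.9, 1.3, 1.0])
```

With these toy logs, a 0.5-second silence would endpoint the fast talker's utterance but not the slow talker's.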
  • Patent number: 11705107
    Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
    Type: Grant
    Filed: October 1, 2020
    Date of Patent: July 18, 2023
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
  • Patent number: 11694680
    Abstract: A machine causes a touch-sensitive screen to present a graphical user interface that depicts a slider control aligned with a word that includes a first alphabetic letter and a second alphabetic letter. A first zone of the slider control corresponds to the first alphabetic letter, and a second zone of the slider control corresponds to the second alphabetic letter. The machine detects a touch-and-drag input that begins within the first zone and enters the second zone. In response to the touch-and-drag input beginning within the first zone, the machine presents a first phoneme that corresponds to the first alphabetic letter, and the presenting of the first phoneme may include audio playback of the first phoneme. In response to the touch-and-drag input entering the second zone, the machine presents a second phoneme that corresponds to the second alphabetic letter, which may include audio playback of the second phoneme.
    Type: Grant
    Filed: May 4, 2022
    Date of Patent: July 4, 2023
    Assignee: Learning Squared, Inc.
    Inventors: Vera Blau-McCandliss, Bruce Donald McCandliss
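The zone-to-phoneme behavior can be sketched as follows. This is a hedged illustration only: it assumes equal-width zones across the slider and a toy word and phoneme table; none of these details come from the patent.

```python
# Illustrative slider: one zone per letter of "cat", equal widths (assumed).

WORD = "cat"
PHONEMES = {"c": "/k/", "a": "/ae/", "t": "/t/"}
SLIDER_WIDTH = 300  # pixels (assumed)

def letter_at(x):
    """Map a horizontal touch position to the letter whose zone contains it."""
    zone = min(int(x / (SLIDER_WIDTH / len(WORD))), len(WORD) - 1)
    return WORD[zone]

def phonemes_along_drag(positions):
    """Present (here: collect) a phoneme each time the drag enters a new zone."""
    played, current = [], None
    for x in positions:
        letter = letter_at(x)
        if letter != current:
            played.append(PHONEMES[letter])  # stands in for audio playback
            current = letter
    return played
```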
  • Patent number: 11694028
    Abstract: According to one embodiment, the data generation apparatus includes a speech synthesis unit, a speech recognition unit, a matching processing unit, and a dataset generation unit. The speech synthesis unit generates speech data from an original text. The speech recognition unit generates a recognition text by speech recognition from the speech data. The matching processing unit performs matching between the original text and the recognition text. The dataset generation unit generates a dataset in such a manner that the speech data, from which the recognition text satisfying a certain condition for a matching degree relative to the original text is generated, is associated with the original text, based on the matching result.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: July 4, 2023
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Hiroshi Fujimura, Kenji Iwata, Hui Di, Pengfei Chen
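The round-trip filtering idea in this abstract can be sketched as below; `difflib.SequenceMatcher` stands in for the patent's unspecified matching procedure, and the threshold and data are illustrative assumptions.

```python
import difflib

def matching_degree(original, recognized):
    """Similarity between the original text and the ASR transcript of its TTS audio."""
    return difflib.SequenceMatcher(None, original, recognized).ratio()

def build_dataset(candidates, min_degree=0.9):
    """Keep (speech, original text) pairs whose recognition text matches closely."""
    return [(speech, text) for speech, text, recognized in candidates
            if matching_degree(text, recognized) >= min_degree]

candidates = [
    ("utt0.wav", "open the window", "open the window"),  # clean round trip
    ("utt1.wav", "open the window", "open a winter"),    # ASR diverged; drop
]
```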
  • Patent number: 11676623
    Abstract: Method, system, device, and non-transitory computer-readable medium for joining a virtual participant in a conversation. In some examples, a computer-implemented method includes: identifying a first conversation scheduled to be participated in by a first group of actual participants; joining a first virtual participant into the first conversation; obtaining, via the first virtual participant, a first set of audio data associated with the first conversation while the first conversation occurs; transcribing, via the first virtual participant, the first set of audio data into a first set of text data while the first conversation occurs; and presenting the first set of text data to the first group of actual participants while the first conversation occurs.
    Type: Grant
    Filed: February 23, 2022
    Date of Patent: June 13, 2023
    Assignee: Otter.ai, Inc.
    Inventors: Amro Younes, Winfred James, Tao Xing, Cheng Yuan, Yun Fu, Simon Lau, Robert Firebaugh, Sam Liang
  • Patent number: 11676601
    Abstract: According to one embodiment, a system for voice assistant tracking and activation includes a tracking component, a wake component, a listening component, and a link component. The tracking component is configured to track availability of a plurality of voice assistant services. The wake component is configured to determine a plurality of wake words, each wake word corresponding to a specific voice assistant service of the plurality of voice assistant services. The listening component is configured to receive audio and detect a first wake word of the plurality of wake words that corresponds to a first voice assistant service of the plurality of voice assistant services. The link component is configured to establish a voice link with the first voice assistant service for voice input by the user.
    Type: Grant
    Filed: September 28, 2021
    Date of Patent: June 13, 2023
    Assignee: Ford Global Technologies, LLC
    Inventors: Ashish Nadkar, Alan Daniel Gonzalez
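The track/wake/listen/link components map onto a small router sketch. Everything here is an assumption made for illustration: the wake words, the service names, and the simplification of matching on already-transcribed text.

```python
# Illustrative multi-assistant wake-word router (hypothetical names throughout).

WAKE_WORDS = {"hey helper": "HelperAssist", "ok pilot": "PilotAssist"}

class AssistantRouter:
    def __init__(self, available_services):
        self.available = set(available_services)  # tracked service availability
        self.linked = None

    def hear(self, transcript):
        """Detect a wake word and link the session to its assistant service."""
        for wake_word, service in WAKE_WORDS.items():
            if wake_word in transcript.lower() and service in self.available:
                self.linked = service  # voice link for subsequent voice input
                return service
        return None

router = AssistantRouter(available_services=["HelperAssist"])
```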
  • Patent number: 11676572
    Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input, where the TTS pronunciation of the particular word is different than the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting the one of the user pronunciation or the TTS pronunciation of the particular word that is associated with the highest confidence. The method also includes providing the TTS audio that includes a synthesized speech representation of the response to the query using the user pronunciation or the TTS pronunciation for the particular word.
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: June 13, 2023
    Assignee: Google LLC
    Inventors: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski
  • Patent number: 11676609
    Abstract: The present disclosure provides a speaker recognition method, an electronic device, and a storage medium. An implementation includes: segmenting the target audio file and the to-be-recognized audio file into a plurality of audio units respectively; extracting an audio feature from each of the audio units to obtain an audio feature sequence of the target audio file and an audio feature sequence of the to-be-recognized audio file; performing feature learning on the audio feature sequence of the target audio file and the audio feature sequence of the to-be-recognized audio file by using a Siamese neural network, to obtain a feature vector corresponding to the target audio file and feature vectors respectively corresponding to the plurality of audio units in the to-be-recognized audio file; and recognizing, by using an attention mechanism-based machine learning model, the audio units belonging to the target speaker in the to-be-recognized audio file based on the feature vectors.
    Type: Grant
    Filed: December 20, 2022
    Date of Patent: June 13, 2023
    Inventors: Hang Li, Wenbiao Ding, Zitao Liu
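Once a Siamese encoder has produced one vector per audio unit, the final attribution step can be approximated with plain cosine similarity. This is a stand-in for the patent's attention-based model, and the vectors and threshold below are toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def units_of_target(target_vec, unit_vecs, threshold=0.8):
    """Indices of to-be-recognized audio units attributed to the target speaker."""
    return [i for i, vec in enumerate(unit_vecs)
            if cosine(target_vec, vec) >= threshold]
```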
  • Patent number: 11670285
    Abstract: Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user to re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: June 6, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kevin Crews, Prasanna H Sridhar, Ariya Rastrow, Nicholas Matthew Jutila, Andrew Oberlin, Samarth Batra, Paul Anthony Bernhardt, Veerdhawal Pande, Roland Maximilian Rolf Maas
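The turn-taking decision can be sketched as a threshold policy on the reading-evaluation score; the thresholds and action names below are illustrative assumptions, not taken from the patent.

```python
def next_turn(evaluation_score):
    """Choose the system's next move from the user's reading evaluation."""
    if evaluation_score >= 0.8:
        return "system_reads_next_portion"     # good turn: system reads next
    if evaluation_score >= 0.5:
        return "ask_user_to_reread_portion"    # close: retry the same text
    return "ask_user_to_read_smaller_portion"  # struggling: shrink the task
```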
  • Patent number: 11669693
    Abstract: An electronic apparatus includes an input unit comprising input circuitry configured to receive a natural language input, a communicator comprising communication circuitry configured to perform communication with a plurality of external chatting servers, and a processor configured to analyze a characteristic of the natural language input and a characteristic of the user and to identify a chatting server corresponding to the natural language from among the plurality of chatting servers, and to control the communicator to transmit the natural language to the identified chatting server in order to receive a response with respect to the natural language.
    Type: Grant
    Filed: July 16, 2021
    Date of Patent: June 6, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Chang-hwan Choi, Ji-hwan Yun, Man-un Jeong
  • Patent number: 11657814
    Abstract: Embodiments of the present disclosure set forth a computer-implemented method comprising detecting an initial phrase portion included in a first auditory signal generated by a user, identifying, based on the initial phrase portion, a supplemental phrase portion that complements the initial phrase portion to form a complete phrase, and providing a command signal that drives an output device to generate an audio output corresponding to the supplemental phrase portion.
    Type: Grant
    Filed: October 8, 2020
    Date of Patent: May 23, 2023
    Assignee: Harman International Industries, Incorporated
    Inventors: Stefan Marti, Joseph Verbeke, Evgeny Burmistrov, Priya Seshadri
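A toy version of the complete-the-phrase behavior, with a hypothetical phrase table standing in for whatever model actually identifies the supplemental portion:

```python
# Illustrative phrase table mapping initial portions to supplemental portions.
PHRASES = {"happy new": "year", "better late than": "never"}

def supplemental_portion(initial_portion):
    """Return the portion that completes a recognized initial phrase, if any."""
    return PHRASES.get(initial_portion.lower().strip())
```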
  • Patent number: 11657830
    Abstract: Systems and methods are disclosed for data driven radio enhancement. For example, methods may include demodulating a radio signal to obtain a demodulated audio signal; determining a window of audio samples based on the demodulated audio signal; applying an audio enhancement network to the window of audio samples to obtain an enhanced audio segment, in which the audio enhancement network includes a machine learning network that has been trained using demodulated audio signals derived from radio signals; and storing, playing, or transmitting an enhanced audio signal based on the enhanced audio segment.
    Type: Grant
    Filed: January 26, 2021
    Date of Patent: May 23, 2023
    Assignee: BabbleLabs LLC
    Inventors: Samer Hijazi, Kamil Krzysztof Wojcicki, Dror Maydan, Christopher Rowen
  • Patent number: 11657816
    Abstract: Methods, systems, and apparatus for defining and monitoring an event for a physical entity, and for performing an action in response to the occurrence of the event. A method includes receiving data indicating an event for a physical entity, the event specified in part by a physical environment feature for which the occurrence of the event is to be monitored by the data processing apparatus; receiving data indicating an action associated with the event and to be taken in response to the occurrence of the event; monitoring for the occurrence of the event for the physical entity; and in response to the occurrence of the event, causing the action associated with the event to be performed.
    Type: Grant
    Filed: November 16, 2020
    Date of Patent: May 23, 2023
    Assignee: GOOGLE LLC
    Inventors: Bo Wang, Sunil Vemuri, Nitin Mangesh Shetti, Pravir Kumar Gupta, Scott B. Huffman, Javier Alejandro Rey, Jeffrey A. Boortz
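The define-event / attach-action / monitor loop can be sketched with events as predicates over reported physical-entity state; the class and its API are hypothetical, not the patented design.

```python
class EventMonitor:
    """Toy monitor: events are predicates over a reported state dict."""

    def __init__(self):
        self.rules = []  # (event condition, associated action)

    def define(self, condition, action):
        """Register an event and the action to perform when it occurs."""
        self.rules.append((condition, action))

    def observe(self, state):
        """Check the latest physical-environment state; run the action for
        every event whose condition now holds."""
        return [action(state) for condition, action in self.rules
                if condition(state)]
```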
  • Patent number: 11645660
    Abstract: A computerized method of representing customer interactions with an organization includes: receiving, by a computing device, customer web interaction data segments and customer conversation data segments; pre-processing the customer conversation data segments to remove specified types of information; scoring each of the pre-processed customer conversation data segments; pre-processing the customer web interaction data segments; extracting tokens from the pre-processed customer web interaction data segments; combining the pre-processed customer conversation data segments and the pre-processed customer web interaction data segments into a customer data set; parsing the customer data set into one or more windows; assigning, for each window, pre-trained weights to each of the tokens in each window; assigning a transaction theme to each window based on the tokens in each window; and generating, based on the transaction themes, a ranked list of topic keywords reflecting the customer web interaction data segments and the customer conversation data segments.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: May 9, 2023
    Assignee: FMR LLC
    Inventors: Ankush Chopra, Abhishek Desai, Aravind Chandramouli
  • Patent number: 11631404
    Abstract: Audio distortion compensation methods to improve the accuracy and efficiency of audio content identification are described; the methods also apply to speech recognition. Methods to detect interference from speakers and sources, and distortion introduced into audio by the environment and devices, are discussed, along with additional methods to detect distortion in the content after performing search and correlation. The causes of actual distortion at each client are measured, registered, and learned to generate rules for determining likely distortion and interference sources. The learned rules are applied at the client: likely distortions that are detected are compensated, or heavily distorted sections are ignored, at the audio level or at the signature and feature level, based on the compute resources available. Further methods to subtract the likely distortions in the query, both at the audio level and after processing at the signature and feature level, are described.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: April 18, 2023
    Assignee: ROKU, INC.
    Inventors: Jose Pio Pereira, Sunil Suresh Kulkarni, Mihailo M. Stojancic, Shashank Merchant, Peter Wendt
  • Patent number: 11626104
    Abstract: A device includes processors configured to determine, in a first power mode, whether an audio stream corresponds to speech of at least two talkers. The processors are configured to, based on determining that the audio stream corresponds to speech of at least two talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result. The processors are configured to perform a comparison of a plurality of user speech profiles to an audio feature data set of a plurality of audio feature data sets of a talker-homogenous audio segment to determine whether the audio feature data set matches any of the user speech profiles. The processors are configured to, based on determining that the audio feature data set does not match any of the plurality of user speech profiles, generate a user speech profile based on the plurality of audio feature data sets.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: April 11, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Soo Jin Park, Sunkuk Moon, Lae-Hoon Kim, Erik Visser
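The match-or-enroll step can be sketched as below, assuming each talker-homogenous segment is summarized as a feature vector; the Euclidean distance, threshold, and profile-id scheme are illustrative assumptions rather than the patented method.

```python
def l2(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def match_or_enroll(profiles, segment_features, threshold=0.5):
    """Return the matching user speech profile id, or enroll a new profile
    from the segment's features when nothing matches."""
    for profile_id, vec in profiles.items():
        if l2(vec, segment_features) <= threshold:
            return profile_id
    new_id = f"user{len(profiles)}"  # no match: generate a new speech profile
    profiles[new_id] = segment_features
    return new_id
```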
  • Patent number: 11615782
    Abstract: Techniques are described for training neural networks on variable length datasets. The numeric representation of the length of each training sample is randomly perturbed to yield a pseudo-length, and the samples sorted by pseudo-length to achieve lower zero padding rate (ZPR) than completely randomized batching (thus saving computation time) yet higher randomness than strictly sorted batching (thus achieving better model performance than strictly sorted batching).
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: March 28, 2023
    Assignee: Sony Interactive Entertainment Inc.
    Inventors: Zhenhao Ge, Lakshmish Kaushik, Saket Kumar, Masanori Omote
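The pseudo-length idea maps directly to code: perturb each sample's true length with random noise, sort by the perturbed value, and batch in order, so batches stay length-homogeneous (low zero padding) while batch composition varies run to run. This is a minimal sketch; the noise scale is an assumed hyperparameter.

```python
import random

def pseudo_length_batches(lengths, batch_size, noise=2.0, seed=0):
    """Group sample indices into batches ordered by randomly perturbed length."""
    rng = random.Random(seed)
    # pseudo-length = true length + uniform noise; sort indices by it
    order = sorted(range(len(lengths)),
                   key=lambda i: lengths[i] + rng.uniform(-noise, noise))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```

With `noise=0` this degenerates to strictly sorted batching; as the noise grows it approaches completely randomized batching, trading padding efficiency for randomness.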
  • Patent number: 11615777
    Abstract: A terminal may include a display that, when a real-time broadcast in which a user of the terminal is the host starts through a broadcasting channel, is divided into at least two areas, one of which is allocated to the host; an input/output interface that receives a voice of the host; a communication interface that receives one item, selected from at least one or more items, and a certain text from a terminal of a certain guest among at least one or more guests who entered the broadcasting channel; and a processor that generates a voice message converted from the certain text into the voice of the host or a voice of the certain guest.
    Type: Grant
    Filed: August 6, 2020
    Date of Patent: March 28, 2023
    Assignee: Hyperconnect Inc.
    Inventors: Sang Il Ahn, Ju Young Hong, Yong Uk Jeong
  • Patent number: 11605388
    Abstract: This specification describes a computer-implemented method of generating speech audio for use in a video game, wherein the speech audio is generated using a voice convertor that has been trained to convert audio data for a source speaker into audio data for a target speaker. The method comprises receiving: (i) source speech audio, and (ii) a target speaker identifier. The source speech audio comprises speech content in the voice of a source speaker. Source acoustic features are determined for the source speech audio. A target speaker embedding associated with the target speaker identifier is generated as output of a speaker encoder of the voice convertor. The target speaker embedding and the source acoustic features are inputted into an acoustic feature encoder of the voice convertor. One or more acoustic feature encodings are generated as output of the acoustic feature encoder. The one or more acoustic feature encodings are derived from the target speaker embedding and the source acoustic features.
    Type: Grant
    Filed: November 9, 2020
    Date of Patent: March 14, 2023
    Assignee: Electronic Arts Inc.
    Inventors: Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Harold Chaput, Navid Aghdaie, Kazi Zaman
  • Patent number: 11605368
    Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
    Type: Grant
    Filed: November 11, 2021
    Date of Patent: March 14, 2023
    Assignee: Google LLC
    Inventors: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar