Patents Examined by James Wozniak
  • Patent number: 10304458
    Abstract: A system and method for summarizing video feeds correlated to identified speakers. A transcriber system includes multiple types of reasoning logic for identifying specific speakers contained in the video feed. Each type of reasoning logic is stored in memory and may be combined and configured to provide an aggregated speaker identification result useful for full or summarized transcription before transmission across a network for display on a network-accessible device.
    Type: Grant
    Filed: March 6, 2015
    Date of Patent: May 28, 2019
    Assignee: Board of Trustees of the University of Alabama and the University of Alabama in Huntsville
    Inventor: Daniel Newton Woo
  • Patent number: 10185544
    Abstract: Techniques for naming devices via voice commands are described herein. For instance, a user may issue a voice command to a voice-controlled device stating, “you are the kitchen device”. Thereafter, the device may respond to voice commands directed, by name, to this device. For instance, the user may issue a voice command requesting to “play music on my kitchen device”. Given that the user has configured the device to respond to this name, the device may respond to the command by outputting the requested music.
    Type: Grant
    Filed: December 28, 2015
    Date of Patent: January 22, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Rohan Mutagi, Isaac Michael Taylor
  • Patent number: 10170112
    Abstract: A computing system receives requests from client devices to process voice queries that have been detected in local environments of the client devices. The system identifies that a value that is based on a number of requests to process voice queries received by the system during a specified time interval satisfies one or more criteria. In response, the system triggers analysis of at least some of the requests received during the specified time interval to determine a set of requests that each identify a common voice query. The system can generate an electronic fingerprint that indicates a distinctive model of the common voice query. The fingerprint can then be used to detect an illegitimate voice query identified in a request from a client device at a later time.
    Type: Grant
    Filed: May 11, 2017
    Date of Patent: January 1, 2019
    Assignee: Google LLC
    Inventors: Alexander H. Gruenstein, Aleksandar Kracun, Matthew Sharifi
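The interval-triggered fingerprinting flow described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not Google's implementation: the threshold value, the feature-rounding digest used as a "fingerprint", and all function names are my assumptions.

```python
import hashlib
from collections import Counter

REQUEST_THRESHOLD = 3  # hypothetical: trigger analysis when an interval sees this many requests

def fingerprint(audio_features):
    """Reduce a voice query's feature sequence to a compact, matchable digest."""
    rounded = tuple(round(f, 1) for f in audio_features)  # tolerate tiny variation
    return hashlib.sha256(repr(rounded).encode()).hexdigest()

def detect_common_query(interval_requests):
    """If the interval is busy enough, return a fingerprint of the most common query."""
    if len(interval_requests) < REQUEST_THRESHOLD:
        return None
    counts = Counter(fingerprint(r) for r in interval_requests)
    fp, n = counts.most_common(1)[0]
    return fp if n > 1 else None  # only flag queries seen more than once

def is_illegitimate(request, known_fingerprints):
    """Later requests matching a flagged fingerprint are treated as replayed/illegitimate."""
    return fingerprint(request) in known_fingerprints
```

The key idea preserved here is that analysis runs only when request volume satisfies a criterion, and that the resulting fingerprint is consulted on later traffic.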
  • Patent number: 10157620
    Abstract: A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.
    Type: Grant
    Filed: March 4, 2015
    Date of Patent: December 18, 2018
    Inventors: Srinath Cheluvaraja, Ananth Nagaraja Iyer, Aravind Ganapathiraju, Felix Immanuel Wyss
  • Patent number: 10121488
    Abstract: Methods and systems are provided for optimizing call quality and improving network efficiency by reducing bandwidth requirements at the individual-voice-call level. Embodiments provided herein build vocal fingerprints that correspond to the frequency range of the human voice, as well as the frequency range of the voice of individual users. The vocal fingerprints are used to minimize the transmission of background noise and ambient sound captured in HD voice while retaining the frequency range of the user's voice. This filtered HD voice frequency range is then transmitted to recipients over the network. The reduced frequency range lowers bandwidth usage and conserves network resources, all while optimizing the call quality for individual users.
    Type: Grant
    Filed: February 23, 2015
    Date of Patent: November 6, 2018
    Assignee: Sprint Communications Company L.P.
    Inventors: Gregory Anderson Drews, Brian Dale Farst, Young Suk Lee, Raymond Reeves
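The per-user band-limiting idea in the abstract above can be sketched in a few lines. A minimal sketch under my own simplifications: the fingerprint is reduced to a (min, max) frequency pair, and the spectrum is a list of (frequency, magnitude) bins; none of these names come from the patent.

```python
def build_vocal_fingerprint(samples_hz):
    """Hypothetical per-user fingerprint: the min/max frequencies observed
    in the user's voiced speech."""
    return (min(samples_hz), max(samples_hz))

def filter_spectrum(bins, vocal_fingerprint):
    """Drop spectral bins outside the user's vocal range before transmission.

    `bins` is a list of (frequency_hz, magnitude) pairs; only in-range pairs
    are kept, which is what lowers the transmitted bandwidth.
    """
    lo, hi = vocal_fingerprint
    return [(f, m) for f, m in bins if lo <= f <= hi]
```

Out-of-range energy (mains hum, high-frequency hiss) is simply never sent, while everything inside the user's own vocal range passes through unchanged.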
  • Patent number: 10115394
    Abstract: The object is to provide a technique that can yield a highly valid recognition result while avoiding unnecessary processing. A voice recognition device includes first to third voice recognition units, and a control unit. When, based on the recognition results obtained by the first and second voice recognition units, it is decided to have the third voice recognition unit recognize an input voice, the control unit causes the third voice recognition unit to recognize the input voice using a dictionary that includes a candidate character string obtained by at least one of the first and second voice recognition units.
    Type: Grant
    Filed: July 8, 2014
    Date of Patent: October 30, 2018
    Assignee: MITSUBISHI ELECTRIC CORPORATION
    Inventors: Naoya Sugitani, Yohei Okato, Michihiro Yamazaki
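The three-recognizer control flow above can be sketched as below. The decision rule for invoking the third recognizer (here: the first two disagree) is my simplification; the patent only says the decision is based on the first two results.

```python
def recognize(input_voice, rec1, rec2, rec3_with_dict):
    """Run two recognizers; when their results warrant a third pass
    (here, hypothetically: they disagree), build a dictionary from their
    candidate strings and let the third recognizer decide using it."""
    c1, c2 = rec1(input_voice), rec2(input_voice)
    if c1 == c2:
        return c1  # agreement: skip the extra pass, avoiding unnecessary processing
    dictionary = {c1, c2}  # candidate strings from the first two units
    return rec3_with_dict(input_voice, dictionary)
```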
  • Patent number: 10089901
    Abstract: Provided are an apparatus and method for bi-directional sign language/speech translation in real time, which may automatically translate a sign into speech or speech into a sign by separately performing an operation of recognizing speech made externally through a microphone and outputting a sign corresponding to the speech, and an operation of recognizing a sign sensed through a camera and outputting speech corresponding to the sign.
    Type: Grant
    Filed: June 21, 2016
    Date of Patent: October 2, 2018
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Woo Sug Jung, Hwa Suk Kim, Jun Ki Jeon, Sun Joong Kim, Hyun Woo Lee
  • Patent number: 10079011
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
    Type: Grant
    Filed: May 20, 2014
    Date of Patent: September 18, 2018
    Assignee: NUANCE COMMUNICATIONS, INC.
    Inventor: Alistair D. Conkie
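The lowest-cost-path search over ordered lists of speech units described above is essentially a Viterbi-style dynamic program. A minimal sketch under my assumptions: the patent's per-unit sublists of "suitable" successors are folded into a `join_cost` function that can return `float('inf')` for unsuitable concatenations, and both cost functions are caller-supplied.

```python
def lowest_cost_path(lists, target_cost, join_cost):
    """Return, for each ordered list, the index of the unit on the cheapest
    path, considering transitions only between consecutive lists."""
    # best[i] maps unit index in lists[i] -> (cumulative cost, back-pointer)
    best = [{j: (target_cost(u), None) for j, u in enumerate(lists[0])}]
    for i in range(1, len(lists)):
        layer = {}
        for j, u in enumerate(lists[i]):
            layer[j] = min(
                (cost + join_cost(lists[i - 1][k], u) + target_cost(u), k)
                for k, (cost, _) in best[i - 1].items()
            )
        best.append(layer)
    # trace back from the cheapest final unit
    j = min(best[-1], key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(len(lists) - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    return list(reversed(path))
```

With lists ordered by pitch, `join_cost` could simply penalize pitch jumps between adjacent units, which is one way to read the abstract's pitch-ordering remark.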
  • Patent number: 10079021
    Abstract: Systems and methods for utilizing incremental processing of portions of output data to limit the time required to provide a response to a user request are provided herein. In some embodiments, portions of the user request for information can be analyzed using techniques such as automatic speech recognition (ASR), speech-to-text (STT), and natural language understanding (NLU) to determine the overall topic of the user request. Once the topic has been determined, portions of the anticipated audio output data can be synthesized independently instead of waiting for the complete response. The synthesized portions can then be provided to the electronic device in anticipation of being output through one or more speakers on the electronic device, which reduces the time needed to provide the response to the user.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: September 18, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski
  • Patent number: 10055397
    Abstract: There is provided a mechanism for synchronizing a plurality of dynamic language models residing in a plurality of devices associated with a single user, each device comprising a dynamic language model. The mechanism is configured to: receive text data representing text that has been input by a user into one or more of the plurality of devices; train at least one language model on the text data; and provide the at least one language model for synchronizing the devices. There is also provided a system comprising the mechanism and a plurality of devices, and a method for synchronizing a plurality of dynamic language models residing in a plurality of devices associated with a single user.
    Type: Grant
    Filed: May 14, 2013
    Date of Patent: August 21, 2018
    Assignee: TOUCHTYPE LIMITED
    Inventors: Michael Bell, Joe Freeman, Emanuel George Hategan, Benjamin Medlock
  • Patent number: 10043526
    Abstract: The present invention relates to transposing signals in time and/or frequency and in particular to coding of audio signals. More particularly, the present invention relates to high frequency reconstruction (HFR) methods including a frequency domain harmonic transposer. A method and system for generating a transposed output signal from an input signal using a transposition factor T is described. The system comprises an analysis window of length La, extracting a frame of the input signal, and an analysis transformation unit of order M transforming the samples into M complex coefficients. M is a function of the transposition factor T. The system further comprises a nonlinear processing unit altering the phase of the complex coefficients by using the transposition factor T, a synthesis transformation unit of order M transforming the altered coefficients into M altered samples, and a synthesis window of length Ls, generating a frame of the output signal.
    Type: Grant
    Filed: October 13, 2015
    Date of Patent: August 7, 2018
    Assignee: Dolby International AB
    Inventors: Per Ekstrand, Lars Villemoes
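The nonlinear processing step at the heart of the abstract above, altering coefficient phases by the transposition factor T, can be sketched for a single analysis frame. This shows only that one step in isolation, under the common phase-vocoder reading (magnitude kept, phase multiplied by T); it is not Dolby's implementation.

```python
import cmath

def transpose_frame(coeffs, T):
    """Keep each complex analysis coefficient's magnitude, but multiply its
    phase by the transposition factor T, the operation that shifts harmonic
    content up in frequency after resynthesis."""
    out = []
    for c in coeffs:
        mag, phase = cmath.polar(c)
        out.append(cmath.rect(mag, phase * T))
    return out
```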
  • Patent number: 10032462
    Abstract: A method for speech enhancement in speech communication devices, and more specifically in hearing aids, for suppressing stationary and non-stationary background noise in the input speech signal is disclosed. The method uses spectral subtraction wherein the noise spectrum is updated using quantile-based estimation without voice activity detection and the quantile values are approximated by dynamic quantile tracking without involving large storage and sorting of past spectral samples. The technique permits use of a different quantile at each frequency bin for noise estimation without introducing processing overheads. The preferred embodiment uses analysis-modification-synthesis based on Fast Fourier transform (FFT) and it can be integrated with other FFT-based signal processing techniques used in the hearing aids and speech communication devices.
    Type: Grant
    Filed: April 24, 2015
    Date of Patent: July 24, 2018
    Assignee: Indian Institute of Technology Bombay
    Inventors: Prem Chand Pandey, Nitya Tiwari
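The abstract's two ingredients, dynamic quantile tracking and spectral subtraction, can be sketched per frequency bin. The asymmetric-step update is the standard stochastic-approximation form for tracking a quantile without storing or sorting past frames; the step size and spectral floor here are my illustrative choices, not values from the patent.

```python
def track_quantile(estimate, sample, q, step):
    """Nudge the running noise estimate up when the new spectral sample
    exceeds it and down otherwise; the asymmetric steps make the estimate
    converge toward the q-th quantile of the samples."""
    if sample > estimate:
        return estimate + step * q
    return estimate - step * (1.0 - q)

def spectral_subtract(magnitude, noise_estimate, floor=0.05):
    """Subtract the tracked noise estimate, with a spectral floor to limit
    musical-noise artifacts."""
    return max(magnitude - noise_estimate, floor * magnitude)
```

Because `track_quantile` keeps only one number per bin, a different quantile `q` can be used at every frequency bin with no extra storage, which is the property the abstract emphasizes.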
  • Patent number: 10032457
    Abstract: A circuit to compensate for a lost audio frame, comprising: an identifier configured to identify a reference audio segment with a first length followed by the lost audio frame with a second length; a searcher coupled to the identifier and configured to search for a first audio segment similar to the reference audio segment in a cached audio segment followed by the reference audio segment by utilizing a cross-correlation search; the identifier further configured to identify a second audio segment subsequent to the first audio segment as a pre-compensated audio frame; an adjustor coupled to the identifier and configured to adjust an amplitude of the second audio segment based on a scale factor; and an output coupled to the adjustor to output the adjusted second audio segment as a compensated audio frame.
    Type: Grant
    Filed: June 8, 2017
    Date of Patent: July 24, 2018
    Assignee: BEKEN CORPORATION
    Inventors: Lianxue Liu, Weifeng Wang
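The identify/search/adjust pipeline in the abstract above can be sketched over plain sample lists. A sketch under stated assumptions: the cross-correlation is an unnormalized dot product, and the amplitude scale factor is a simple RMS ratio of my choosing.

```python
def conceal_lost_frame(cache, ref_len, lost_len):
    """Find the cached segment most similar to the last ref_len samples (the
    'reference'), then reuse the samples that followed that match as the
    concealment frame, scaled to match the reference energy."""
    ref = cache[-ref_len:]
    best_pos, best_score = 0, float('-inf')
    # cross-correlation search over earlier positions in the cache
    for pos in range(len(cache) - ref_len - lost_len + 1):
        seg = cache[pos:pos + ref_len]
        score = sum(a * b for a, b in zip(ref, seg))
        if score > best_score:
            best_pos, best_score = pos, score
    candidate = cache[best_pos + ref_len: best_pos + ref_len + lost_len]
    # amplitude adjustment via an RMS-ratio scale factor
    rms = lambda xs: (sum(x * x for x in xs) / len(xs)) ** 0.5
    scale = rms(ref) / max(rms(candidate), 1e-12)
    return [x * scale for x in candidate]
```

For quasi-periodic audio such as voiced speech, the segment after the best match tends to continue the waveform smoothly, which is why this family of concealment techniques works.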
  • Patent number: 10032463
    Abstract: An automatic speech recognition (“ASR”) system produces, for particular users, customized speech recognition results by using data regarding prior interactions of the users with the system. A portion of the ASR system (e.g., a neural-network-based language model) can be trained to produce an encoded representation of a user's interactions with the system based on, e.g., transcriptions of prior utterances made by the user. This user-specific encoded representation of interaction history is then used by the language model to customize ASR processing for the user.
    Type: Grant
    Filed: December 29, 2015
    Date of Patent: July 24, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Nikko Ström, Spyridon Matsoukas, Markus Dreyer, Ankur Gandhe, Denis Sergeyevich Filimonov, Julian Chan, Rohit Prasad
  • Patent number: 10013975
    Abstract: A method for speech modeling by an electronic device is described. The method includes obtaining a real-time noise reference based on a noisy speech signal. The method also includes obtaining a real-time noise dictionary based on the real-time noise reference. The method further includes obtaining a first speech dictionary and a second speech dictionary. The method additionally includes reducing residual noise based on the real-time noise dictionary and the first speech dictionary to produce a residual noise-suppressed speech signal at a first modeling stage. The method also includes generating a reconstructed speech signal based on the residual noise-suppressed speech signal and the second speech dictionary at a second modeling stage.
    Type: Grant
    Filed: February 23, 2015
    Date of Patent: July 3, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Yinyi Guo, Juhan Nam, Erik Visser, Shuhua Zhang, Lae-Hoon Kim
  • Patent number: 10002608
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.
    Type: Grant
    Filed: September 17, 2010
    Date of Patent: June 19, 2018
    Assignee: NUANCE COMMUNICATIONS, INC.
    Inventors: Srinivas Bangalore, Junlan Feng, Michael Johnston, Taniya Mishra
  • Patent number: 10002605
    Abstract: A method and system for achieving emotional text-to-speech (TTS). The method includes: receiving text data; generating an emotion tag for the text data by a rhythm piece; and achieving TTS of the text data corresponding to the emotion tag, where the emotion tags are expressed as a set of emotion vectors; where each emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories. A system for the same includes: a text data receiving module; an emotion tag generating module; and a TTS module for achieving TTS, wherein the emotion tag is expressed as a set of emotion vectors; and wherein each emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories.
    Type: Grant
    Filed: December 12, 2016
    Date of Patent: June 19, 2018
    Assignee: International Business Machines Corporation
    Inventors: Shenghua Bao, Jian Chen, Yong Qin, Qin Shi, Zhiwei Shuang, Zhong Su, Liu Wen, Shi Lei Zhang
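The emotion-vector representation in the abstract above is easy to make concrete. A minimal sketch: the category names, the normalization to a probability-like vector, and the "dominant emotion" helper are all illustrative assumptions, not details from the patent.

```python
EMOTION_CATEGORIES = ("neutral", "happy", "sad", "angry")  # example categories

def make_emotion_vector(scores):
    """An emotion tag for one rhythm piece: a score per emotion category,
    normalized here to sum to 1 for easy comparison."""
    total = sum(scores.values())
    return {cat: scores.get(cat, 0.0) / total for cat in EMOTION_CATEGORIES}

def dominant_emotion(vector):
    """The category a TTS backend would weight most heavily for this piece."""
    return max(vector, key=vector.get)
```

A sentence would then carry one such vector per rhythm piece, letting the synthesis stage blend emotional renderings rather than pick a single label for the whole text.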
  • Patent number: 9978380
    Abstract: An audio decoder for providing a decoded audio information includes an arithmetic decoder for providing a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values and a frequency-domain-to-time-domain converter for providing a time-domain audio representation using the decoded spectral values. The arithmetic decoder is configured to select a mapping rule describing a mapping of a code value onto a symbol code in dependence on a context state. The arithmetic decoder is configured to determine or modify the current context state in dependence on a plurality of previously-decoded spectral values. The arithmetic decoder is configured to detect a group of a plurality of previously-decoded spectral values, which fulfill, individually or taken together, a predetermined condition regarding their magnitudes, and to determine the current context state in dependence on a result of the detection. An audio encoder uses similar principles.
    Type: Grant
    Filed: November 18, 2013
    Date of Patent: May 22, 2018
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Guillaume Fuchs, Vignesh Subbaraman, Nikolaus Rettelbach, Markus Multrus, Marc Gayer, Patrick Warmbold, Christian Griebel, Oliver Weiss
  • Patent number: 9978379
    Abstract: A method comprising: receiving input signals for multiple channels; and parameterizing the received input signals into parameters defining multiple different object spectra and defining a distribution of the multiple different object spectra in the multiple channels.
    Type: Grant
    Filed: January 5, 2011
    Date of Patent: May 22, 2018
    Assignee: Nokia Technologies Oy
    Inventors: Miikka Vilermo, Joonas Nikunen, Tuomas Virtanen
  • Patent number: 9965685
    Abstract: This application discloses a method implemented by an electronic device to detect a signature event (e.g., a baby cry event) associated with an audio feature (e.g., baby sound). The electronic device obtains a classifier model from a remote server. The classifier model is determined according to predetermined capabilities of the electronic device and ambient sound characteristics of the electronic device, and distinguishes the audio feature from a plurality of alternative features and ambient noises. When the electronic device obtains audio data, it splits the audio data into a plurality of sound components each associated with a respective frequency or frequency band and including a series of time windows. The electronic device further extracts a feature vector from the sound components, classifies the extracted feature vector to obtain a probability value according to the classifier model, and detects the signature event based on the probability value.
    Type: Grant
    Filed: June 12, 2015
    Date of Patent: May 8, 2018
    Assignee: Google LLC
    Inventors: Yoky Matsuoka, Rajeev Conrad Nongpiur, Michael Dixon
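The split/extract/classify/threshold pipeline in the abstract above can be sketched end-to-end. A sketch under loud assumptions: the per-band split is a crude time-domain chunking rather than a real filter bank, and a logistic score stands in for the classifier model downloaded from the server.

```python
import math

def band_energies(samples, n_bands):
    """Stand-in for the per-band split: average energy over n_bands equal
    chunks of the audio window, used as the feature vector."""
    size = len(samples) // n_bands
    return [sum(x * x for x in samples[i * size:(i + 1) * size]) / size
            for i in range(n_bands)]

def event_probability(features, weights, bias):
    """Logistic score standing in for the remote classifier model."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def detect_signature_event(samples, weights, bias, threshold=0.5):
    """Detect the signature event when the classifier's probability value
    crosses the threshold."""
    feats = band_energies(samples, len(weights))
    return event_probability(feats, weights, bias) >= threshold
```

The abstract's point about model selection would correspond to the server choosing `weights`/`bias` (and the band layout) to suit the device's capabilities and its typical ambient noise.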