Patents Examined by James Wozniak
  • Patent number: 9484028
    Abstract: In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.
    Type: Grant
    Filed: February 19, 2014
    Date of Patent: November 1, 2016
    Assignee: Sensory, Incorporated
    Inventors: Pieter J. Vermeulen, Jonathan Shaw, Todd F. Mozer
  • Patent number: 9484043
    Abstract: Provided is a method, non-transitory computer program product and system for an improved noise suppression technique for speech enhancement. It operates on speech signals from a single source, such as the output from a single microphone or the reconstructed speech signal at the receiving end of a communication application. The system performs background noise monitoring of an incoming speech signal and determines its level, and performs a time domain gain calculation. The noise suppressed output signal is the gain shaped original speech signal.
    Type: Grant
    Filed: February 24, 2015
    Date of Patent: November 1, 2016
    Assignee: QOSOUND, INC.
    Inventor: Huan-Yu Su
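    Illustration: the abstract above describes monitoring the background noise level and applying a gain computed in the time domain. The following is only a minimal sketch of that general idea, not the patented method. The function name `suppress_noise`, the RMS level measure, the noise-tracking rule, and every parameter value are assumptions made for illustration.

```python
import math


def suppress_noise(samples, frame_len=160, floor_gain=0.1, noise_adapt=0.05):
    """Gain-shape a mono signal frame by frame (hypothetical sketch).

    All parameter defaults are illustrative assumptions, not values
    taken from the patent.
    """
    noise_level = None
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # RMS level of the current frame.
        level = math.sqrt(sum(x * x for x in frame) / len(frame))
        if noise_level is None:
            # Assumption: the first frame contains background noise only.
            noise_level = level
        elif level < 2.0 * noise_level:
            # Frame looks like background noise: adapt the estimate slowly.
            noise_level += noise_adapt * (level - noise_level)
        # Otherwise the frame is much louder than the noise estimate
        # (probably speech), so the estimate is held unchanged.

        # Time-domain gain: close to 1 for loud frames, limited by
        # floor_gain for frames at or below the noise level.
        if level > 0.0:
            gain = max(floor_gain, 1.0 - noise_level / level)
        else:
            gain = floor_gain
        # The output is the original signal shaped by the gain.
        out.extend(gain * x for x in frame)
    return out
```

    In this sketch, frames whose level matches the noise estimate are attenuated to `floor_gain`, while frames well above it pass through almost unchanged.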
  • Patent number: 9460734
    Abstract: A linear prediction coefficient of a signal represented in a frequency domain is obtained by performing linear prediction analysis in a frequency direction by using a covariance method or an autocorrelation method. After the filter strength of the obtained linear prediction coefficient is adjusted, filtering may be performed in the frequency direction on the signal by using the adjusted coefficient, whereby the temporal envelope of the signal is shaped. This reduces the occurrence of pre-echo and post-echo and improves the subjective quality of the decoded signal, without significantly increasing the bit rate in a bandwidth extension technique in the frequency domain represented by SBR.
    Type: Grant
    Filed: January 10, 2014
    Date of Patent: October 4, 2016
    Assignee: NTT DOCOMO, INC.
    Inventors: Kosuke Tsujino, Kei Kikuiri, Nobuhiko Naka
  • Patent number: 9449602
    Abstract: In some implementations, a device for providing dual uplink processing paths may include a human listening (HL) input processing unit configured to receive an audio stream and pre-process the audio stream to create a first audio signal adapted for human listening via a first uplink processing path, a machine listening (ML) input processing unit configured to receive the audio stream and pre-process the audio stream to create a second audio signal adapted for machine listening via a second uplink processing path, and a network interface unit configured to transmit the first audio signal via the first uplink processing path and transmit the second audio signal via the second uplink processing path to a remote server.
    Type: Grant
    Filed: December 3, 2013
    Date of Patent: September 20, 2016
    Assignee: Google Inc.
    Inventors: Leng Ooi, Aaron Matthew Eash, Dylan Reid
  • Patent number: 9442923
    Abstract: According to one embodiment of the present invention, a system for converting a display from a source spoken language to a target spoken language includes at least one processor. The at least one processor may be configured to determine the source spoken language of content within a selected area of the display. The at least one processor may be further configured to translate the content from the source spoken language to a selected target spoken language. In addition, the at least one processor may be configured to evaluate the translated content and remap the translated content to the selected area based on the evaluation. Finally, the at least one processor may be configured to present the translated content within the selected area on the display.
    Type: Grant
    Filed: November 24, 2015
    Date of Patent: September 13, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alaa Abou Mahmoud, Paul R. Bastide, Robert E. Loredo, Fang Lu
  • Patent number: 9412393
    Abstract: In an approach to determining speech effectiveness, a computer receives speech input. The computer determines, based, at least in part, on the received speech input, whether the speech input is one of: a conversation with words spoken by two or more people during a predetermined time interval, and a presentation with words spoken by one person and not any other person during a predetermined time interval. The computer detects at least one problem with the speech input. If the speech input is the presentation, the computer weights, by a first factor, the detected at least one problem with the speech input based on the speech input being a presentation and not a conversation, and if the speech input is a conversation, the computer weights, by a second factor, the detected at least one problem with the speech input based on the speech input being a conversation and not a presentation.
    Type: Grant
    Filed: April 24, 2014
    Date of Patent: August 9, 2016
    Assignee: International Business Machines Corporation
    Inventors: Patrick A. Spizzo, Sara H. Waggoner, Kaleb D. Walton, Aaron T. Wodrich
  • Patent number: 9401160
    Abstract: Voice activity detectors and related methods are provided. Methods include receiving a frame of an input signal; determining a first SNR of the received frame; comparing the determined first SNR with an adaptive threshold; and detecting whether the received frame comprises voice based on the comparison. The adaptive threshold is based at least on the total noise energy of a noise level, an estimate of a second SNR, and the energy variation between different frames.
    Type: Grant
    Filed: October 18, 2010
    Date of Patent: July 26, 2016
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventor: Martin Sehlstedt
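    Illustration: the abstract above compares a per-frame SNR with a threshold that adapts to the noise energy, a second SNR estimate, and the frame-to-frame energy variation. The abstract does not give the threshold formula, so the linear combination below, the smoothing constant, and all names and coefficients (`detect_voice`, `base_db`, `k_noise`, `k_snr`, `k_var`) are assumptions chosen only to make the structure concrete.

```python
import math


def detect_voice(frame_energies, noise_energy, base_db=3.0,
                 k_noise=0.05, k_snr=0.1, k_var=0.2):
    """Return one bool per frame: True if the frame SNR exceeds the threshold.

    Hypothetical sketch: energies are linear, the noise energy is assumed
    to be known, and the threshold formula is an illustrative guess.
    """
    noise_db = 10.0 * math.log10(max(noise_energy, 1e-12))
    long_term_snr_db = 0.0   # stands in for the "second SNR" estimate
    prev_db = None
    decisions = []
    for energy in frame_energies:
        energy_db = 10.0 * math.log10(max(energy, 1e-12))
        snr_db = energy_db - noise_db            # the "first SNR"
        # Energy variation between this frame and the previous one.
        variation_db = 0.0 if prev_db is None else abs(energy_db - prev_db)
        # Adaptive threshold built from the three quantities named above.
        threshold_db = (base_db + k_noise * noise_db
                        + k_snr * long_term_snr_db + k_var * variation_db)
        decisions.append(snr_db > threshold_db)
        # Smooth the per-frame SNR into the long-term estimate.
        long_term_snr_db = 0.9 * long_term_snr_db + 0.1 * snr_db
        prev_db = energy_db
    return decisions
```

    With `noise_energy=1.0` and frame energies `[1.0, 1.0, 100.0, 100.0, 1.0]`, this sketch flags only the two high-energy frames as voice.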
  • Patent number: 9384756
    Abstract: A noise elimination device according to the present invention includes a signal separation unit that divides input frequency information generated from a time-domain input signal into suppression target band information including a cyclic noise as the main component and intended sound band information including an intended sound as the main component, a first frequency reverse-conversion unit that converts the suppression target band information into time-domain information and thereby outputs a suppression target signal, a second frequency reverse-conversion unit that converts the intended sound band information into time-domain information and thereby outputs an intended sound signal, and a cyclic noise information storage unit that accumulates the suppression target signal and thereby stores noise history information including information corresponding to at least one cycle of the cyclic noise.
    Type: Grant
    Filed: March 6, 2015
    Date of Patent: July 5, 2016
    Assignee: JVC KENWOOD CORPORATION
    Inventors: Takaaki Yamabe, Keisuke Oda
  • Patent number: 9368125
    Abstract: A voice guidance system is provided in which the voice guidance is enabled to easily follow a trend of change intervals, a rapid change of change intervals, etc. in a menu operation. The voice guidance system is configured with an input analyzing unit which inputs and analyzes an operation instruction signal of a menu item, a voice guidance control unit which controls voice guidance of the menu item according to the analysis result by the input analyzing unit, and a textual guidance control unit which performs display control of the menu item according to the analysis result by the input analyzing unit. The voice guidance control unit determines reproduction speed of the voice guidance according to the analysis result, on the basis of a speed trend obtained from a speed history as a set of plural pieces of reproduction speed information.
    Type: Grant
    Filed: August 22, 2013
    Date of Patent: June 14, 2016
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Kazuyuki Ohno
  • Patent number: 9361879
    Abstract: In one aspect, a method for processing media includes accepting a query. One or more language patterns are identified that are similar to the query. A putative instance of the query is located in the media. The putative instance is associated with a corresponding location in the media. The media in a vicinity of the putative instance is compared to the identified language patterns and data characterizing the putative instance of the query is provided according to the comparing of the media to the language patterns, for example, as a score for the putative instance that is determined according to the comparing of the media to the language patterns.
    Type: Grant
    Filed: February 24, 2009
    Date of Patent: June 7, 2016
    Assignee: NEXIDIA INC.
    Inventors: Robert W. Morris, Jon A. Arrowood, Mark A. Clements, Kenneth King Griggs, Peter S. Cardillo, Marsal Gavalda
  • Patent number: 9361903
    Abstract: Various embodiments provide an ability to analyze an audio input signal and generate a counter audio signal based, at least in part, on the audio input signal. In some cases, combining the audio input signal with the counter audio signal renders the audio input signal incoherent and/or unintelligible to accidental listeners and/or listeners to whom the audio input signal is not directed. Alternately or additionally, the counter signal can mask the audio input signal to the accidental listeners.
    Type: Grant
    Filed: August 22, 2013
    Date of Patent: June 7, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Simone Leorin, Nghiep Duy Duong, Steven Wei Shaw, William George Verthein
  • Patent number: 9330665
    Abstract: Automatically adjusting confidence scoring functionality is described for a speech recognition engine. Operation of the speech recognition system is revised so as to change an associated receiver operating characteristic (ROC) curve describing performance of the speech recognition system with respect to rates of false acceptance (FA) versus correct acceptance (CA). Then a confidence scoring functionality related to recognition reliability for a given input utterance is automatically adjusted such that where the ROC curve is better for a given operating point after revising the operation of the speech recognition system, the adjusting reflects a double gain constraint to maintain FA and CA rates at least as good as before revising operation of the speech recognition system.
    Type: Grant
    Filed: January 7, 2011
    Date of Patent: May 3, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Nicolas Morales, Dermot Connolly, Andrew Halberstadt
  • Patent number: 9324323
    Abstract: Speech recognition techniques may include: receiving audio; identifying one or more topics associated with audio; identifying language models in a topic space that correspond to the one or more topics, where the language models are identified based on proximity of a representation of the audio to representations of other audio in the topic space; using the language models to generate recognition candidates for the audio, where the recognition candidates have scores associated therewith that are indicative of a likelihood of a recognition candidate matching the audio; and selecting a recognition candidate for the audio based on the scores.
    Type: Grant
    Filed: December 14, 2012
    Date of Patent: April 26, 2016
    Assignee: Google Inc.
    Inventors: Daniel M. Bikel, Kapil R. Thadini, Fernando Pereira, Maria Shugrina, Fadi Biadsy
  • Patent number: 9286886
    Abstract: Techniques for predicting prosody in speech synthesis may make use of a data set of example text fragments with corresponding aligned spoken audio. To predict prosody for synthesizing an input text, the input text may be compared with the data set of example text fragments to select a best matching sequence of one or more example text fragments, each example text fragment in the sequence being paired with a portion of the input text. The selected example text fragment sequence may be aligned with the input text, e.g., at the word level, such that prosody may be extracted from the audio aligned with the example text fragments, and the extracted prosody may be applied to the synthesis of the input text using the alignment between the input text and the example text fragments.
    Type: Grant
    Filed: January 24, 2011
    Date of Patent: March 15, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Stephen Minnis, Andrew P. Breen
  • Patent number: 9236061
    Abstract: The present invention relates to transposing signals in time and/or frequency and in particular to coding of audio signals. More particularly, the present invention relates to high frequency reconstruction (HFR) methods including a frequency domain harmonic transposer. A method and system for generating a transposed output signal from an input signal using a transposition factor T is described. The system comprises an analysis window of length La, extracting a frame of the input signal, and an analysis transformation unit of order M transforming the samples into M complex coefficients. M is a function of the transposition factor T. The system further comprises a nonlinear processing unit altering the phase of the complex coefficients by using the transposition factor T, a synthesis transformation unit of order M transforming the altered coefficients into M altered samples, and a synthesis window of length Ls, generating a frame of the output signal.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: January 12, 2016
    Assignee: Dolby International AB
    Inventors: Per Ekstrand, Lars Villemoes
  • Patent number: 9189475
    Abstract: An inventive indexing scheme to index phrases and sub-phrases for advanced leveraging for translation is presented. The scheme provides ways to match at various levels, and allows approximate matches. The system and method comprises an index structure comprising at least one phrasal marker and/or at least one sub-phrasal marker, the index structure performing advanced leveraging for translation by matching to previously stored index structures. The index structure can be a tree structure. The markers can contain constituent names, values, and a level number. Each marker can be obtained by parsing a target string, so that the parsing identifies the constituents and levels in the target string.
    Type: Grant
    Filed: June 22, 2009
    Date of Patent: November 17, 2015
    Assignee: CA, Inc.
    Inventor: Apurv Raj Shri
  • Patent number: 9190068
    Abstract: A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. The detector module is adapted to communicate with a signal enhancement module. The detector module collects data from a transmit direction and a receive direction of a data connection. The collected data from the transmit and the receive direction is used to classify at least one of data in the transmit direction and data in the receive direction. Responsive to the classification, the signal enhancement module enhances data in one of the transmit direction and the receive direction. Hence, data classification accuracy is improved by using data from both the transmit and receive directions. In one embodiment, the detector module applies a voice activity detection (VAD) process to detect the presence or absence of voice data in the collected data.
    Type: Grant
    Filed: April 4, 2011
    Date of Patent: November 17, 2015
    Assignee: Ditech Networks, Inc.
    Inventor: Mahesh Godavarti
  • Patent number: 9171066
    Abstract: An arrangement and corresponding method are described for distributed natural language processing. A set of local data sources is stored on a mobile device. A local natural language understanding (NLU) match module on the mobile device performs natural language processing of a natural language input with respect to the local data sources to determine one or more local interpretation candidates. A local NLU ranking module on the mobile device processes the local interpretation candidates and one or more remote interpretation candidates from a remote NLU server to determine a final output interpretation corresponding to the natural language input.
    Type: Grant
    Filed: November 12, 2012
    Date of Patent: October 27, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Matthieu Hebert, Jean-Philippe Robichaud, Christopher M. Parisien
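    Illustration: the abstract above describes a ranking module on the device that combines local interpretation candidates with candidates from a remote NLU server and selects a final interpretation. The sketch below shows one simple way such a merge could look. The `Candidate` type, the `rank_interpretations` function, the score field, and the optional `local_bonus` are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str      # interpretation of the natural language input
    score: float   # assumed confidence score, higher is better
    source: str    # "local" or "remote"


def rank_interpretations(local: List[Candidate],
                         remote: List[Candidate],
                         local_bonus: float = 0.0) -> Optional[Candidate]:
    """Merge local and remote candidates and return the best one.

    Hypothetical ranking rule: highest score wins, with an optional
    bonus for candidates matched against on-device data sources.
    """
    merged = list(local) + list(remote)
    if not merged:
        return None

    def adjusted(candidate: Candidate) -> float:
        bonus = local_bonus if candidate.source == "local" else 0.0
        return candidate.score + bonus

    return max(merged, key=adjusted)
```

    The `local_bonus` parameter is one possible design choice for favoring matches against on-device data such as a contact list. A real ranker could weigh the candidates in any number of other ways.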
  • Patent number: 9117446
    Abstract: A method and system for achieving emotional text to speech (TTS). The method includes: receiving text data; generating an emotion tag for the text data by a rhythm piece; and achieving TTS for the text data corresponding to the emotion tag, where the emotion tag is expressed as a set of emotion vectors, and where each emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories. A system for the same includes: a text data receiving module; an emotion tag generating module; and a TTS module for achieving TTS, wherein the emotion tag is expressed as a set of emotion vectors, and wherein each emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories.
    Type: Grant
    Filed: August 31, 2011
    Date of Patent: August 25, 2015
    Assignee: International Business Machines Corporation
    Inventors: Shenghua Bao, Jian Chen, Yong Qin, Qin Shi, Zhiwei Shuang, Zhong Su, Liu Wen, Shi Lei Zhang
  • Patent number: 9105271
    Abstract: An audio encoder receives multi-channel audio data comprising a group of plural source channels and performs channel extension coding, which comprises encoding a combined channel for the group and determining plural parameters for representing individual source channels of the group as modified versions of the encoded combined channel. The encoder also performs frequency extension coding. The frequency extension coding can comprise, for example, partitioning frequency bands in the multi-channel audio data into a baseband group and an extended band group, and coding audio coefficients in the extended band group based on audio coefficients in the baseband group. The encoder also can perform other kinds of transforms. An audio decoder performs corresponding decoding and/or additional processing tasks, such as a forward complex transform.
    Type: Grant
    Filed: October 19, 2010
    Date of Patent: August 11, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sanjeev Mehrotra, Wei-Ge Chen