Abstract: Systems, methods, apparatuses, and computer programs for transfer of recorded digital voice memos to a computing system and processing of the transferred digital voice memos by the computing system or another computing system are disclosed. A recording device is configured to record a voice memo from a user and store the voice memo. The recording device is also configured to transfer the recorded voice memo to a computing system. The computing system is configured to translate the transferred voice memo into a computer-readable format and parse the translated voice memo. The computing system is also configured to determine a type of software application to which the voice memo pertains via a preamble, a keyword, or a keyphrase in the translated voice memo. The computing system is further configured to create an item in the determined software application based on the translated voice memo.
Abstract: The inventive method provides for an encoder in a voice codec to be designed such that after a particular idle time (“Idle Period”) it recalculates the averaged energy and the autocorrelation function. Administrative points in the network inform the encoder about the idle time which has been set in the transmission network.
Type:
Grant
Filed:
February 2, 2009
Date of Patent:
February 3, 2015
Assignee:
Unify GmbH & Co. KG
Inventors:
Stefan Schandl, Panji Setiawan, Herve Taddei
Abstract: Systems and methods for modifying a computer-based speech recognition system. A speech utterance is processed with the computer-based speech recognition system using a set of internal representations, which may comprise parameters for recognizing speech in a speech utterance, such as parameters of an acoustic model and/or a language model. The computer-based speech recognition system may perform a first task in response to the processed speech utterance. The utterance may also be provided to a human who performs a second task based on the utterance. Data indicative of the first task, performed by the computer system, is compared to data indicative of a second task, performed by the human in response to the speech utterance. Based on the comparison, the set of internal representations may be updated or modified to improve the speech recognition performance and capabilities of the speech recognition system.
Abstract: A continuous speech recognition system to recognize continuous speech smoothly in a noisy environment. The system selects call commands, configures a minimal token-based recognition network consisting of the call commands and mute intervals that include noise, recognizes the input speech continuously in real time, continuously analyzes the reliability of the recognition, and thereby recognizes continuous speech from a speaker. When a speaker utters a call command, the system detects the speech interval through real-time recognition of call commands, measures the reliability of the speech after recognizing the call command, and recognizes the speaker's speech by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment the call command is recognized.
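The gating behavior this abstract describes — listen for a call command, verify its reliability, then hand the following speech interval to a continuous recognizer — can be sketched as a small state machine. Everything here (the class name, the reliability threshold, the segment interface) is an illustrative assumption, not taken from the patent.

```python
class CallCommandGate:
    """Toy state machine: forward speech to a continuous recognizer only
    after a call command has been recognized with sufficient reliability."""

    def __init__(self, call_commands, reliability_threshold=0.8):
        self.call_commands = set(call_commands)
        self.threshold = reliability_threshold
        self.active = False  # True once a reliable call command was heard

    def on_segment(self, hypothesis, reliability):
        """Feed one recognized segment; return the segment if it should be
        forwarded to the continuous speech-recognition engine, else None."""
        if not self.active:
            # Idle: only a sufficiently reliable call command activates the gate.
            if hypothesis in self.call_commands and reliability >= self.threshold:
                self.active = True
            return None
        # Active: forward the speech interval following the call command.
        self.active = False
        return hypothesis
```

For example, speech before the call command is ignored, while the interval right after a reliable call command is passed through.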
Abstract: Non-verbalized tokens, such as punctuation, are automatically predicted and inserted into a transcription of speech in which the tokens were not explicitly verbalized. Token prediction may be integrated with speech decoding, rather than performed as a post-process to speech decoding.
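The simplest way to see the token-insertion idea is as a post-process over a decoded word sequence; the patent's point is that prediction can instead be integrated with decoding, but this minimal sketch (with a hypothetical set of sentence-final words standing in for a trained model) shows only the insertion step.

```python
def insert_punctuation(tokens, sentence_final_words):
    """Insert a non-verbalized period token after words that a
    (hypothetical) model scores as sentence-final. A real system would
    score candidates with a language model during decoding."""
    out = []
    for tok in tokens:
        out.append(tok)
        if tok in sentence_final_words:
            out.append(".")
    return out
```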
Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit that stores a probability model representing voice and non-voice occurrence probabilities, and a warp estimation unit that calculates a warping factor in the frequency-axis direction using the probability model, the voice and non-voice labels, and a cepstrum.
Abstract: A time-domain system and method of modifying the time scale of digital audio signals includes a pre-processor that forms a synthesized signal for processing with minimal computation, with optional features to give preference to certain audio channels and/or frequency bands; a mechanism for adaptively characterizing the temporal features of the synthesized signal by its normalized power and zero-crossing count; and a mechanism for identifying a segment of the synthesized signal where the time scale can be modified without introducing artifacts or losing content.
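The two temporal features named in the abstract, normalized power and zero-crossing count, are straightforward to compute per frame. Normalizing power by the frame length is one plausible choice of normalization, not necessarily the patent's.

```python
def frame_features(frame):
    """Characterize one frame of the synthesized signal by its
    length-normalized power and its zero-crossing count."""
    n = len(frame)
    power = sum(s * s for s in frame) / n
    # Count sign changes between consecutive samples.
    zero_crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return power, zero_crossings
```

A time-scale modifier could then prefer splice points in frames with low power and low zero-crossing count, where edits are least audible.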
Abstract: The present invention relates to methods and systems for storing words and phrases in a data structure, and retrieving and displaying said words and phrases from said data structure. In particular, the present invention relates to a method and system of predictively suggesting words and/or phrases to a user entering a string of characters into a user interface, which may be a limited user interface.
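A trie is a common data structure for this kind of prefix-driven suggestion; the abstract does not specify the structure, so the sketch below is one plausible realization.

```python
class SuggestionTrie:
    """Minimal trie for predictively suggesting stored words/phrases
    that extend the characters typed so far."""

    def __init__(self):
        self.root = {}

    def add(self, phrase):
        node = self.root
        for ch in phrase:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-phrase marker

    def suggest(self, prefix):
        """Return all stored phrases beginning with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        results = []

        def walk(n, acc):
            if "$" in n:
                results.append(prefix + acc)
            for ch, child in n.items():
                if ch != "$":
                    walk(child, acc + ch)

        walk(node, "")
        return sorted(results)
```

On a limited interface, `suggest` would typically be truncated to the few best-ranked completions rather than the full sorted list.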
Abstract: The method comprises the steps of: digitizing sound signals picked up simultaneously by two microphones (N, M); executing a short-term Fourier transform on the signals (xn(t), xm(t)) picked up on the two channels so as to produce a succession of frames in a series of frequency bands; applying an algorithm for calculating a speech-presence confidence index on each channel, in particular a probability that speech is present; selecting one of the two microphones by applying a decision rule to the successive frames of each of the channels, which rule is a function both of a channel selection criterion and of a speech-presence confidence index; and implementing speech processing on the sound signal picked up by the one microphone that is selected.
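The selection step can be illustrated with a toy decision rule over per-frame speech-presence confidences for channels N and M. The summation over frames, the margin, and the tie-breaking toward N are all illustrative assumptions; the patent's actual rule combines a channel selection criterion with the confidence index.

```python
def select_microphone(conf_n, conf_m, margin=0.1):
    """Pick the channel whose speech-presence confidence, summed over the
    successive frames, exceeds the other's by a margin; ties keep N."""
    score_n = sum(conf_n)
    score_m = sum(conf_m)
    return "M" if score_m > score_n + margin else "N"
```

The margin acts as hysteresis, preventing the selection from flapping between channels on near-equal confidences.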
Abstract: The present invention is related to a method for coding the excitation signal of a target speech comprising the steps of: extracting, from a set of training normalized residual frames, a set of relevant normalized residual frames, said training residual frames being extracted from a training speech, synchronized on the Glottal Closure Instant (GCI), and pitch and energy normalized; determining the target excitation signal of the target speech; dividing said target excitation signal into GCI-synchronized target frames; determining the local pitch and energy of the GCI-synchronized target frames; normalizing the GCI-synchronized target frames in both energy and pitch, to obtain target normalized residual frames; and determining coefficients of a linear combination of said extracted set of relevant normalized residual frames to build a synthetic normalized residual frame close to each target normalized residual frame; wherein the coding parameters for each target residual frame comprise the determined coefficients.
Type:
Grant
Filed:
March 30, 2010
Date of Patent:
October 14, 2014
Assignees:
Universite de Mons, Acapela Group S.A.
Inventors:
Geoffrey Wilfart, Thomas Drugman, Thierry Dutoit
Abstract: A method and computer system for analyzing a text corpus in a natural language is provided. An initial morphological description having word inflection rules for various groups of words in the natural language is created by a linguist. A plurality of text corpuses are analyzed to obtain information on the occurrence of a plurality of word forms for each word token in each text corpus. A morphological dictionary which contains information about each base form and word inflection rules for each word token with verified hypothesis is generated.
Abstract: A method for detecting speech using a first microphone adapted to produce a first signal (x), and a second microphone adapted to produce a second signal (x2), the method comprising the steps of: (i) applying gain to the second signal to produce a normalised second signal, which signal is normalised relative to the first signal; (ii) constructing one or more signal components from the first signal and the normalised second signal; (iii) constructing an adaptive differential microphone (ADM) having a constructed microphone response constructed from the one or more signal components which response has at least one directional null; (iv) producing one or more ADM outputs (yf, yb) from the constructed microphone response in response to detected sound; (v) computing a ratio of a parameter of either a first signal component or a constructed microphone response to a parameter of an output of the ADM; (vi) comparing the ratio to an adaptive threshold value; (vii) detecting speech if the ratio is greater than or equal to the adaptive threshold value.
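Steps (v) through (vii) reduce to a ratio test. The sketch below uses the power of a front-facing ADM output against a back-facing one; using power as the compared parameter is a simplification of the patent's more general "parameter" ratio.

```python
def detect_speech(front_power, back_power, threshold, eps=1e-12):
    """Flag speech when the ratio of the front-facing output power to
    the back-facing output power meets or exceeds the threshold.
    `eps` guards against division by zero on silent input."""
    ratio = front_power / (back_power + eps)
    return ratio >= threshold
```

In practice the threshold would be adaptive, tracking the ratio observed during noise-only intervals, as step (vi) suggests.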
Type:
Grant
Filed:
November 19, 2010
Date of Patent:
August 5, 2014
Assignee:
NXP, B.V.
Inventors:
Patrick Kechichian, Cornelis Pieter Janse, Rene Martinus Maria Derkx, Wouter Joos Tirry
Abstract: Disclosed is a speech recognition system in which a common data processing means performs speech recognition of speech captured by a speech input means to generate recognition result hypotheses that are not biased toward any one application, and an adaptation data processing means regenerates recognition result hypotheses using adaptation data and adaptation processing for each application. The adaptation data processing means provides to each application the recognition result recalculated for that application.
Abstract: In an embodiment, a method of transmitting an input audio signal is disclosed. A first coding error of the input audio signal with a scalable codec having a first enhancement layer is encoded, and a second coding error is encoded using a second enhancement layer after the first enhancement layer. Encoding the second coding error includes coding fine spectrum coefficients of the second coding error to produce coded fine spectrum coefficients, and coding a spectral envelope of the second coding error to produce a coded spectral envelope. The coded fine spectrum coefficients and the coded spectral envelope are transmitted.
Abstract: Some embodiments of an efficient string search have been presented. In one embodiment, a string of bytes representing content written in a non-delimited language is received, wherein the content has been classified into a predetermined category. In a single pass through the string of bytes, a set of N-grams is searched for simultaneously. Statistical information on occurrences of the N-grams, if any, in the string of bytes is collected. In some embodiments, a model is generated based on the statistical information, where the model is usable by a content filter to classify content.
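The single-pass search can be illustrated naively: slide one window over the byte string and count every target N-gram at each position. A production system would use a true multi-pattern automaton (e.g. Aho-Corasick) for efficiency; this sketch only shows the one-pass counting idea.

```python
def count_ngrams(data, ngrams):
    """Count occurrences of each target N-gram in a single pass over a
    byte string; the counts feed the statistical model described above."""
    counts = {g: 0 for g in ngrams}
    max_len = max(len(g) for g in ngrams)
    for i in range(len(data)):
        window = data[i : i + max_len]
        for g in ngrams:
            if window.startswith(g):
                counts[g] += 1
    return counts
```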
Type:
Grant
Filed:
August 22, 2013
Date of Patent:
July 8, 2014
Assignee:
SonicWALL, Inc.
Inventors:
Thomas E. Raffill, Shunhui Zhu, Roman Yanovsky, Boris Yanovsky, John Gmuender
Abstract: A communication apparatus for adjusting a received voice signal in accordance with ambient noise, the communication apparatus includes: a microphone for receiving ambient noise and input voice and outputting a voice input signal corresponding to a level of the input voice and the ambient noise; a receiver for receiving the voice signal; a processor for extracting a voice component originated by a sender and an ambient noise component originated by the ambient noise, determining the ratio between the voice component and the ambient noise component, and adjusting the amplitude of the received voice signal in accordance with the ratio; and a speaker for outputting a reception voice corresponding to the adjusted reception voice signal.
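The processor's ratio-driven adjustment can be sketched as a gain law: the noisier the environment relative to the extracted voice component, the more the received signal is boosted. The square-root gain and the floor at unity gain are illustrative choices, not the patent's exact rule.

```python
def adjust_received_level(received, voice_power, noise_power):
    """Scale received samples by a gain derived from the ratio between
    the voice component and the ambient noise component."""
    ratio = voice_power / noise_power if noise_power > 0 else float("inf")
    # Boost when noise dominates (ratio < 1); never attenuate below unity.
    gain = max(1.0, (1.0 / ratio) ** 0.5) if ratio > 0 else 1.0
    return [s * gain for s in received]
```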
Abstract: Noise and channel distortion parameters in the vectorized logarithmic or the cepstral domain for an utterance may be estimated, and subsequently the distorted speech parameters in the same domain may be updated using an unscented transformation framework during online automatic speech recognition. An utterance, including speech generated from a transmission source for delivery to a receiver, may be received by a computing device. The computing device may execute instructions for applying the unscented transformation framework to speech feature vectors, representative of the speech, in order to estimate, in a sequential or online manner, static noise and channel distortion parameters and dynamic noise distortion parameters in the unscented transformation framework. The static and dynamic parameters for the distorted speech in the utterance may then be updated from clean speech parameters and the noise and channel distortion parameters using non-linear mapping.
Abstract: The present invention provides a new recursive FIR filter scheme which supports a variable order short-term predictor, and uses a pipeline stall based on the radix-2 algorithm and an autocorrelation processing time for reducing the complexity of MPEG-4 ALS hardware implementation.
Type:
Grant
Filed:
December 29, 2011
Date of Patent:
May 20, 2014
Assignee:
Korea Electronics Technology Institute
Inventors:
Byeong Ho Choi, Dong Sun Kim, Je Woo Kim, Choong Sang Cho, Seung Yerl Lee, Sang Seol Lee
Abstract: Disclosed configurations include systems, methods, and apparatus arranged to generate a sequence of spectral tilt values that is based on inactive frames of a speech signal. For each of a plurality of inactive frames of the speech signal, a transmit decision is made according to a change calculated among at least two corresponding values of the sequence. The outcome of the transmit decision determines whether a silence description is transmitted for the corresponding inactive frame.
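The per-frame transmit decision can be sketched as a change test over the spectral tilt sequence. The absolute-difference criterion, the threshold value, and always describing the first inactive frame are illustrative assumptions standing in for the patent's calculated change among corresponding sequence values.

```python
def should_transmit_sid(tilt_values, index, change_threshold=0.05):
    """For the inactive frame at `index`, decide whether to transmit a
    silence description: send one when the spectral tilt has changed
    enough relative to the previous value in the sequence."""
    if index == 0:
        return True  # describe the first inactive frame unconditionally
    change = abs(tilt_values[index] - tilt_values[index - 1])
    return change >= change_threshold
```

Skipping silence descriptions while the background noise spectrum is stable is what saves bandwidth during inactive stretches of the speech signal.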
Type:
Grant
Filed:
July 30, 2007
Date of Patent:
May 13, 2014
Assignee:
QUALCOMM Incorporated
Inventors:
Vivek Rajendran, Ananthapadmanabhan A. Kandhadai