Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses the prediction capability to provide an automatic, acoustically driven template-versus-model decision maker whose output quality is high, stable, and gradually dependent on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
Type:
Grant
Filed:
September 7, 2012
Date of Patent:
November 1, 2016
Assignee:
Nuance Communications, Inc.
Inventors:
Alexander Sorin, Slava Shechtman, Vincent Pollet
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for building an automatic speech recognition system through an Internet API. A network-based automatic speech recognition server configured to practice the method receives feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the server. The server processes the inputs to train an acoustic model and a language model, and transmits the acoustic model and the language model to the network client. The server can also generate a log describing the processing and transmit the log to the client. On the server side, a human expert can intervene to modify how the server processes the inputs. The inputs can include an additional feature stream generated from speech by algorithms in the client's proprietary feature extraction.
Type:
Grant
Filed:
November 23, 2010
Date of Patent:
November 1, 2016
Assignee:
AT&T Intellectual Property I, L.P.
Inventors:
Enrico Bocchieri, Dimitrios Dimitriadis, Horst J. Schroeter
Abstract: In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus for voice activity detection (VAD). The VAD comprises creating a signal indicative of a primary VAD decision and determining hangover addition. The determination on hangover addition is made in dependence of a short term activity measure and/or a long term activity measure. A signal indicative of a final VAD decision is then created.
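The hangover mechanism described in this abstract can be sketched roughly as follows. The energy-based primary decision, window lengths, thresholds, and hangover counts below are illustrative assumptions, not the patent's actual rules:

```python
def frame_energy(frame):
    """Mean-square energy of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

class HangoverVAD:
    """VAD sketch: a primary decision per frame, plus hangover addition
    whose length depends on short- and long-term activity measures."""
    def __init__(self, threshold=1e-4, short_win=5, long_win=50):
        self.threshold = threshold
        self.short_win, self.long_win = short_win, long_win
        self.short_hist, self.long_hist = [], []
        self.hangover = 0

    def decide(self, frame):
        primary = frame_energy(frame) > self.threshold   # primary VAD decision
        self.short_hist = (self.short_hist + [primary])[-self.short_win:]
        self.long_hist = (self.long_hist + [primary])[-self.long_win:]
        short_act = sum(self.short_hist) / len(self.short_hist)
        long_act = sum(self.long_hist) / len(self.long_hist)
        if primary:
            # grant more hangover frames when recent activity is high
            if short_act > 0.8 or long_act > 0.5:
                self.hangover = 8
            return True
        if self.hangover > 0:                            # hangover addition
            self.hangover -= 1
            return True                                  # final decision stays active
        return False
```

The final decision therefore stays "active" for a few frames after speech energy drops, which reduces clipped word endings.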
Abstract: An audio encoder has a window function controller, a windower, a time warper with a final quality check functionality, a time/frequency converter, and a TNS stage or a quantizer encoder; the window function controller, the time warper, the TNS stage, or an additional noise filling analyzer are controlled by signal analysis results obtained by a time warp analyzer or a signal classifier. Furthermore, a decoder applies a noise filling operation using a manipulated noise filling estimate that depends on a harmonic or speech characteristic of the audio signal.
Type:
Grant
Filed:
November 11, 2014
Date of Patent:
October 11, 2016
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors:
Stefan Bayer, Sascha Disch, Ralf Geiger, Guillaume Fuchs, Max Neuendorf, Gerald Schuller, Bernd Edler
Abstract: A coding method, a decoding method, a coder, and a decoder are disclosed herein. A coding method includes: obtaining the pulse distribution, on a track, of the pulses to be encoded on the track; determining a distribution identifier for identifying the pulse distribution according to the pulse distribution; and generating a coding index that includes the distribution identifier. A decoding method includes: receiving a coding index; obtaining a distribution identifier from the coding index, wherein the distribution identifier is configured to identify the pulse distribution, on a track, of the pulses to be encoded on the track; determining the pulse distribution, on a track, of all the pulses to be encoded on the track according to the distribution identifier; and reconstructing the pulse order on the track according to the pulse distribution.
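A minimal sketch of a distribution-identifier codec for this abstract, assuming the identifier is simply the rank of the pulse-position multiset in an enumeration shared by coder and decoder. The patent's actual index layout is not reproduced here:

```python
from itertools import combinations_with_replacement

def all_distributions(track_len, num_pulses):
    """Enumerate every possible pulse distribution (multiset of positions)
    on a track, in a fixed order shared by coder and decoder."""
    return list(combinations_with_replacement(range(track_len), num_pulses))

def encode(positions, track_len):
    """Coding index = distribution identifier: the rank of the sorted
    pulse-position multiset in the shared enumeration."""
    table = all_distributions(track_len, len(positions))
    return table.index(tuple(sorted(positions)))

def decode(index, track_len, num_pulses):
    """Recover the pulse distribution on the track from the identifier."""
    table = all_distributions(track_len, num_pulses)
    return list(table[index])
```

Because both sides build the same table, transmitting only the rank suffices to reconstruct the pulse positions (pulse order within a position is not preserved in this toy version).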
Abstract: The present disclosure provides a voice recognition method for use in an electronic apparatus comprising a voice input module. The method comprises: receiving voice data by the voice input module; performing a first pattern voice recognition on the received voice data, including identifying whether the voice data comprises a first voice recognition information; performing a second pattern voice recognition on the voice data if the voice data comprises the first voice recognition information; and performing or refusing an operation corresponding to the first voice recognition information according to a result of the second pattern voice recognition. The present disclosure also provides a voice controlling method, an information processing method, and an electronic apparatus.
Abstract: The present invention relates to a method and a background estimator in voice activity detector for updating a background noise estimate for an input signal. The input signal for a current frame is received and it is determined whether the current frame of the input signal comprises non-noise. Further, an additional determination is performed whether the current frame of the non-noise input comprises noise by analyzing characteristics at least related to correlation and energy level of the input signal, and background noise estimate is updated if it is determined that the current frame comprises noise.
Abstract: Disclosed is a mobile terminal device and a radio base station apparatus capable of effectively feeding back PMIs by selecting a precoder using double codebooks W1 and W2 in downlink MIMO transmission. The mobile terminal device includes a feedback control signal generating section that individually performs channel coding for the first PMI selected from the first codebook for wideband/long-period and the second PMI selected from the second codebook for subband/short-period and a transmit section that transmits the individually channel-coded first and second PMIs to the radio base station apparatus on a physical uplink shared channel (PUSCH).
Abstract: An apparatus, method, system and computer-readable medium are provided for generating one or more segments associated with content. The segments may include fragments that may correspond to portions of the content. The segments and/or the fragments may be included in a playlist, and may be based at least in part on a user selection.
Type:
Grant
Filed:
March 6, 2012
Date of Patent:
July 12, 2016
Assignee:
COMCAST CABLE COMMUNICATIONS, LLC
Inventors:
Allen Broome, Joseph Kiok, John Leddy, Brian Field, Eric Rosenfeld, Weidong Mao, Sree Kotay
Abstract: A method for speech retrieval includes: acquiring a keyword designated by a character string and by a phoneme string or a syllable string; detecting one or more coinciding segments by comparing the character string of the keyword with a character string that is the recognition result of word speech recognition (with words as recognition units) performed on the speech data to be retrieved; calculating an evaluation value for each of the detected segments by using the phoneme string or the syllable string of the keyword to evaluate the phoneme string or syllable string recognized in that segment, which is the recognition result of phoneme speech recognition (with phonemes or syllables as recognition units) performed on the speech data; and outputting each segment whose calculated evaluation value exceeds a predetermined threshold.
Type:
Grant
Filed:
April 21, 2015
Date of Patent:
June 28, 2016
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
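The two-pass retrieval above can be sketched as follows, with `difflib.SequenceMatcher` standing in for the patent's evaluation-value computation (a hypothetical substitution), and exact substring matching standing in for the word-level comparison:

```python
from difflib import SequenceMatcher

def find_segments(word_text, keyword):
    """Pass 1: coinciding segments where the word-level recognition
    result contains the keyword's character string."""
    out, start = [], 0
    while (i := word_text.find(keyword, start)) != -1:
        out.append((i, i + len(keyword)))
        start = i + 1
    return out

def phoneme_score(recognized_phonemes, keyword_phonemes):
    """Pass 2: evaluation value from the similarity between a segment's
    phoneme recognition result and the keyword's phoneme string."""
    return SequenceMatcher(None, recognized_phonemes, keyword_phonemes).ratio()

def retrieve(word_text, keyword, segment_phonemes, keyword_phonemes,
             threshold=0.7):
    """Output only segments whose evaluation value exceeds the threshold."""
    hits = []
    for seg, phon in zip(find_segments(word_text, keyword), segment_phonemes):
        if phoneme_score(phon, keyword_phonemes) > threshold:
            hits.append(seg)
    return hits
```

The phoneme-level second pass filters out word-recognizer hits whose underlying acoustics do not actually match the keyword.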
Abstract: Features are disclosed for identifying and providing command suggestions during automatic speech recognition. As utterances are interpreted, suggestions may be provided based on even partial interpretations to guide users of a client device to commands available via speech recognition.
Abstract: A method for speech retrieval includes: acquiring a keyword designated by a character string and by a phoneme string or a syllable string; detecting one or more coinciding segments by comparing the character string of the keyword with a character string that is the recognition result of word speech recognition (with words as recognition units) performed on the speech data to be retrieved; calculating an evaluation value for each of the detected segments by using the phoneme string or the syllable string of the keyword to evaluate the phoneme string or syllable string recognized in that segment, which is the recognition result of phoneme speech recognition (with phonemes or syllables as recognition units) performed on the speech data; and outputting each segment whose calculated evaluation value exceeds a predetermined threshold.
Type:
Grant
Filed:
June 22, 2015
Date of Patent:
June 21, 2016
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Abstract: The present invention provides a method for personalizing a voice assistant. First, the voice module is activated. Then, the voice message received by the voice module is recognized. According to the recognition result, the personal name in the voice message is converted to the intelligent conversation name at a remote site, thus triggering an intelligent conversation module of the server to provide the intelligent conversation service. Accordingly, the present invention maps the universal intelligent conversation name to the personal name for triggering the intelligent conversation module. Consequently, the voice assistant can be personalized.
Abstract: The embodiments of the present invention improve conventional attenuation schemes by replacing constant attenuation with an adaptive attenuation scheme that allows more aggressive attenuation without introducing an audible change in the signal's frequency characteristics.
Type:
Grant
Filed:
November 20, 2013
Date of Patent:
May 24, 2016
Assignee:
TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)
Inventors:
Sebastian Näslund, Volodya Grancharov, Erik Norvell
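One plausible reading of the adaptive scheme above is a gain that ramps toward an aggressive target in small per-frame steps, so no single frame changes abruptly. The ramp-limiting rule and step size below are illustrative assumptions:

```python
def adaptive_attenuate(frame, prev_gain, target_gain, max_step=0.05):
    """Move the applied gain toward an aggressive target by at most
    max_step per frame, avoiding audible discontinuities while still
    reaching deep attenuation over time."""
    step = max(-max_step, min(max_step, target_gain - prev_gain))
    gain = prev_gain + step
    return [gain * x for x in frame], gain
```

Called once per frame with the previous frame's gain, this converges to `target_gain` after `|target - start| / max_step` frames.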
Abstract: Briefly, a variety of embodiments, including the following, are described: a system embodiment and methods that allow random access to voice messages, in contrast to sequential access in existing system embodiments; a system embodiment and methods that allow for the optional use of voice recognition to enhance usability; and a system embodiment and methods that apply to the area of voicemail.
Type:
Grant
Filed:
December 30, 2013
Date of Patent:
May 17, 2016
Assignee:
TVG, LLC
Inventors:
Michael Demmitt, Amit Manna, Michael Smith, Luis Arellano, Chris Pedregal, Mike LeBeau, Brian Salomaki
Abstract: A system and method is presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
Type:
Grant
Filed:
February 12, 2015
Date of Patent:
May 3, 2016
Assignee:
SoundHound, Inc.
Inventors:
Timothy P. Stonehocker, Keyvan Mohajer, Bernard Mont-Reynaud
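The arbitration logic of the dual-mode system above can be sketched as follows; the recognizer interfaces (callables returning a transcript/confidence pair or `None`) are assumptions for illustration:

```python
import time

def dual_recognize(query_audio, local_rec, remote_rec, cutoff_s=2.0):
    """Accept the higher-confidence transcription when both the local
    module and the remote engine succeed; otherwise accept whichever
    one succeeded, subject to a latency cutoff for the remote call."""
    deadline = time.monotonic() + cutoff_s
    local = local_rec(query_audio)
    # only consult the remote engine if the latency budget remains
    remote = remote_rec(query_audio) if time.monotonic() < deadline else None
    if local and remote:
        return max(local, remote, key=lambda r: r[1])   # higher confidence wins
    return local or remote
```

The vocabulary-update step (adding remote-only words to the client vocabulary) would hang off the `remote` result and is omitted here.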
Abstract: A wearable terminal includes a voice data generation unit configured to generate audio data, a sensing unit configured to sense a motion of a user's upper limb in a first axis direction perpendicular to a plane defined by the vertically downward oriented direction of the upper limb and the direction of movement of the user, and to generate motion data concerning the motion, a determination unit configured to determine, based on the motion data, whether or not the user is going to perform remote control of a home electric appliance, and a data processing unit configured to process the audio data. The data processing unit includes a transmission data generation unit configured to generate transmission data corresponding to the audio data if the determination unit determines that the user is going to perform the remote control, and a transmission unit configured to transmit the transmission data to a network.
Type:
Grant
Filed:
October 1, 2014
Date of Patent:
May 3, 2016
Assignee:
PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA
Abstract: Adaptive telephone relay service systems. Embodiments herein provide technical solutions for improving text captioning of Captioned Telephone Service calls, including computer systems, computer-implemented methods, and computer program products for automating the text captioning of CTS calls. These technical solutions include, among other things, embodiments for generating text captions from speech data using an adaptive captioning service to provide full automated text captioning and/or operator assisted automated text captioning, embodiments for intercepting and modifying a calling sequence for calls to captioned telephone service users, and embodiments for generating progressive text captions from speech data.
Abstract: Systems and methods for predicting trigger events, such as an advertisement during a video program, and activating a remote control device in response to the prediction are described. By activating the remote control device at a particular time, the remote control device may save energy when listening for data from one or more terminal devices. The time to activate the remote control may be based on one or more factors, including the current presentation position and/or presentation speed of the video program. A remote control device may take additional actions the next time it listens for data, including illuminating backlights, turning on a display, displaying content on the display, interacting with other devices, etc.
Abstract: Provided is an acoustic signal processing device for producing an output sound meeting listener's preferences by adjusting attack sound, reverberation, and noise component.
Abstract: In an embodiment, a method provides for receiving commands within a mobile communications application running on a mobile communication device. The method includes monitoring text entered into a text input region of a touchscreen keyboard module within a user interface on the mobile communication device for an interrupt code, and detecting an interrupt code. The method also includes determining a command from a plurality of commands, based on user inputs following the interrupt code, identifying an action from a plurality of actions corresponding to the plurality of commands, and initiating the action corresponding to the command.
Type:
Grant
Filed:
October 30, 2013
Date of Patent:
March 1, 2016
Assignee:
Sprint Communications Company L.P.
Inventors:
John Gatewood, Kenneth Wayne Samson, Bhanu Prakash Voruganti, Matthew P. Hund
Abstract: Example embodiments described herein generally provide for adaptive audio signal coding of low-frequency and high-frequency audio signals. More specifically, audio signals are categorized into high-frequency audio signals and low-frequency audio signals. Then, a low-frequency coding manner is selected based on a set coding mode and/or characteristics of the low-frequency audio signals. In addition, a bandwidth extension mode for coding the high-frequency audio signals is selected according to the low-frequency coding manner and/or characteristics of the audio signals.
Abstract: A method and apparatus for processing encoded audio data that operates on batches of data having a predetermined time block size. An input/output memory buffer provides a delay from input to corresponding output of 2+x time blocks where x is a predetermined constant and 0<x<1.
Type:
Grant
Filed:
January 8, 2014
Date of Patent:
January 19, 2016
Assignee:
TEXAS INSTRUMENTS INCORPORATED
Inventors:
Martin Jeffrey Ambrose, Lester Anderson Longley
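The fixed (2 + x)-block delay above can be realized with a FIFO pre-filled with that many samples of silence; block size and x are parameters, and the sample-FIFO realization is an illustrative sketch:

```python
class BlockDelayBuffer:
    """Input/output memory buffer giving a fixed delay of (2 + x) time
    blocks, 0 < x < 1, expressed in samples."""
    def __init__(self, block_size, x):
        assert 0 < x < 1
        self.delay_samples = int((2 + x) * block_size)
        self.fifo = [0.0] * self.delay_samples   # pre-fill with silence

    def process(self, block):
        """Push one input block, pop one equally sized output block."""
        self.fifo.extend(block)
        out, self.fifo = self.fifo[:len(block)], self.fifo[len(block):]
        return out
```

With `block_size=4` and `x=0.5`, the first input sample emerges 10 samples (2.5 blocks) later, i.e. partway through the third output block.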
Abstract: In one aspect, the invention provides an audio encoding method characterized by a decision being made as to whether the device that will decode the resulting bit stream should apply post filtering, including attenuation of interharmonic noise. Hence, the decision whether to use the post filter, which is encoded in the bit stream, is taken separately from the decision as to the most suitable coding mode. In another aspect, there is provided an audio decoding method with a decoding step followed by a post-filtering step, including interharmonic noise attenuation, characterized by a step of disabling the post filter in accordance with post filtering information encoded in the bit stream signal. Such a method is well suited for mixed-origin audio signals by virtue of its capability to deactivate the post filter in dependence on the post filtering information only, hence independently of factors such as the current coding mode.
Type:
Grant
Filed:
June 23, 2011
Date of Patent:
December 29, 2015
Assignee:
Dolby International AB
Inventors:
Barbara Resch, Kristofer Kjörling, Lars Villemoes
Abstract: An encoder and a method for encoding a digital signal are provided. The method includes encoding a preceding frame of samples of the digital signal according to a predictive encoding process, and encoding a current frame of samples of the digital signal according to a transform encoding process. The method is implemented such that a first portion of the current frame is also encoded by predictive encoding that is limited relative to the predictive encoding of the preceding frame by reusing at least one parameter of the predictive encoding of the preceding frame and only encoding the parameters of said first portion of the current frame that are not reused. A decoder and a decoding method are also provided, which correspond to the described encoding method.
Abstract: Systems and methods are provided herein relating to audio matching. Descriptors can be generated based on anchor points and interest points that characterize the local neighborhood surrounding the anchor point. Characterizing the local spectrogram neighborhood surrounding anchor points can be more robust to pitch shift distortions and time stretch distortions. Those anchor points surrounded by a lack of spectral activity or even spectral activity can be filtered from further examination. Using these pitch shift and time stretch resistant audio features within descriptors can provide for more accurate and efficient audio matching.
Abstract: A telematics system for a vehicle to be towed is provided. The telematics system includes a vehicle communication network configured to receive vehicle data from at least one vehicle system of a plurality of vehicle systems. The telematics system also includes a telematics module configured to determine a towing mode status of the vehicle, generate telematics data based on the vehicle data, and transmit the telematics data to a remote access system based on the towing mode status of the vehicle indicating that the vehicle is configured to be towed.
Abstract: A method for automatic segmentation of pitch periods of speech waveforms takes as inputs a speech waveform, a corresponding fundamental frequency contour of the speech waveform (which can be computed by a standard fundamental frequency detection algorithm), and optionally the voicing information of the speech waveform (which can be computed by a standard voicing detection algorithm), and calculates as outputs the corresponding pitch period boundaries of the speech waveform by iteratively calculating the Fast Fourier Transform (FFT) of a speech segment approximately two periods long, the period being calculated as the inverse of the mean fundamental frequency associated with the segment, and placing the pitch period boundary either at the position where the phase of the third FFT coefficient is ±180 degrees, at the position where the correlation coefficient of two speech segments shifted within the two-period-long analysis frame is maximized, or at a position calculated as a combination
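Interpreting "the third FFT coefficient" as the bin with index 2 (one cycle per pitch period in a two-period analysis frame) — an assumption, since the abstract does not define the indexing — the phase criterion can be sketched with a direct DFT:

```python
import cmath

def dft_coeff(frame, k):
    """k-th DFT coefficient of a real or complex frame."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
               for i, x in enumerate(frame))

def boundary_phase(frame):
    """Phase (degrees) of the index-2 DFT coefficient of a roughly
    two-period frame; the method places a pitch period boundary where
    this phase reaches +/-180 degrees."""
    return cmath.phase(dft_coeff(frame, 2)) * 180 / cmath.pi
```

For a frame that starts exactly at a period boundary of a cosine-like pulse, the coefficient is real and positive (phase 0); sliding the frame by half a period flips its sign, giving the ±180-degree condition.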
Abstract: Methods and apparatuses are disclosed for identifying locations of communication devices in a simulcast network. A comparator sends a location request to communication devices in a simulcast network. The location request includes timeslot assignments for each talk group in the simulcast network. The comparator receives responses from the communication devices. Each response is received in a timeslot assigned to a talk group. The comparator assigns network resources to talk groups in the simulcast network based at least in part on the received responses.
Abstract: Systems and methods for audio processing are disclosed. Left and right channels of an audio data stream are combined to derive sum and difference signals. A time domain to frequency domain converter is provided for converting the sum and difference signals to the frequency domain. A first processing unit is provided for deriving a frequency domain noise signal based at least partly on the frequency domain difference signal. A second processing unit is provided for processing the frequency domain sum signal using the noise signal, thereby reducing noise artifacts in the sum signal. A frequency domain to time domain converter is provided for converting at least the processed frequency domain sum signal to the time domain.
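The core of the two processing units can be sketched as per-bin spectral subtraction, operating on spectra assumed to be already in the frequency domain; the over-subtraction factor and gain floor are illustrative constants:

```python
def ms_noise_reduce(sum_spec, diff_spec, oversubtract=1.0, floor=0.05):
    """The difference (L-R) spectrum, which carries little correlated
    speech, serves as the per-bin noise estimate used to attenuate the
    sum (L+R) spectrum.  Gains are floored to limit musical noise."""
    out = []
    for s, d in zip(sum_spec, diff_spec):
        noise = oversubtract * abs(d)
        gain = max(floor, (abs(s) - noise) / abs(s)) if abs(s) > 0 else 0.0
        out.append(gain * s)
    return out
```

Bins where the sum signal barely exceeds the difference-derived noise estimate are pushed down to the gain floor rather than zeroed, which reduces audible artifacts.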
Abstract: An encoder for predictively encoding a signal having a sequence of signal values has a predictor for performing an adaptive prediction in dependence on the signal, and in dependence on one or more weighting values, to obtain predicted signal values, wherein the predictor is configured to reset the weighting values at times which are dependent on the signal, and wherein the predictor is configured to adapt the weighting values to the signal between subsequent resets.
Type:
Grant
Filed:
June 13, 2013
Date of Patent:
September 1, 2015
Assignees:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Technische Universitaet Ilmenau
Inventors:
Manfred Lutzky, Gerald Schuller, Michael Schnabel, Michael Werner
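A sketch of such a predictor, using NLMS-style adaptation between resets and a signal-dependent reset trigger (here, a spike in squared prediction error relative to its running average); the order, step size, and reset rule are illustrative assumptions:

```python
class ResettingPredictor:
    """Adaptive predictor whose weighting values are reset at times that
    depend on the signal, and adapted to the signal between resets."""
    def __init__(self, order=4, mu=0.05, reset_ratio=100.0):
        self.order, self.mu, self.reset_ratio = order, mu, reset_ratio
        self.w = [0.0] * order       # weighting values
        self.hist = [0.0] * order    # most recent past samples
        self.err_avg = 1.0           # running average of squared error

    def step(self, x):
        pred = sum(w * h for w, h in zip(self.w, self.hist))
        err = x - pred
        if err * err > self.reset_ratio * self.err_avg:
            self.w = [0.0] * self.order              # signal-dependent reset
        else:
            norm = sum(h * h for h in self.hist) + 1e-9
            self.w = [w + self.mu * err * h / norm   # adapt between resets
                      for w, h in zip(self.w, self.hist)]
        self.err_avg = max(0.9 * self.err_avg + 0.1 * err * err, 1e-12)
        self.hist = [x] + self.hist[:-1]
        return pred, err
```

On a stationary signal the weights converge and the error shrinks; a sudden transient makes the error spike far above its average, triggering a reset so adaptation restarts from a clean state.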
Abstract: Remote controllers and systems thereof are disclosed. The remote controller remotely operates a receiving host, in which the receiving host provides voice input and speech recognition functions. The remote controller comprises a first input unit and a second input unit for generating a voice input request and a speech recognition request. The generated voice input and speech recognition requests are then sent to the receiving host, thereby forcing the receiving host to perform the voice input and speech recognition functions.
Type:
Grant
Filed:
February 9, 2009
Date of Patent:
September 1, 2015
Assignee:
ASUSTEK COMPUTER INC.
Inventors:
Chia-Chen Liu, Yun-Jung Wu, Liang-Yi Huang, Yi-Hsiu Lee
Abstract: A stereo coding method includes transforming a stereo left channel signal and a stereo right channel signal in a time domain to a frequency domain to form a left channel signal and a right channel signal in the frequency domain; down-mixing the left channel signal and the right channel signal in the frequency domain to generate a monophonic down-mix signal, and transmitting bits obtained after quantization coding is performed on the down-mix signal; extracting spatial parameters of the left channel signal and the right channel signal in the frequency domain; estimating a group delay and a group phase between stereo left and right channels by using the left channel signal and the right channel signal in the frequency domain; and performing quantization coding on the group delay, the group phase and the spatial parameters, so as to obtain a high-quality stereo coding performance at a low bit rate.
Type:
Grant
Filed:
August 6, 2012
Date of Patent:
August 11, 2015
Assignee:
Huawei Technologies Co., Ltd.
Inventors:
Wenhai Wu, Lei Miao, Yue Lang, Qi Zhang
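The down-mix and group-phase estimation above can be sketched per frame; estimating one global phase from the summed cross-spectrum is an illustrative simplification (the patent also derives a group delay, omitted here):

```python
import cmath

def stereo_params(left_spec, right_spec):
    """From frequency-domain left/right signals: mono down-mix plus a
    single group phase estimated from the cross-spectrum."""
    downmix = [(l + r) / 2 for l, r in zip(left_spec, right_spec)]
    cross = sum(l * r.conjugate() for l, r in zip(left_spec, right_spec))
    group_phase = cmath.phase(cross)
    return downmix, group_phase
```

Quantizing only the down-mix, the group delay/phase, and a few spatial parameters is what allows the stereo image to be carried at a low bit rate.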
Abstract: The present invention relates to an image display device, to an image display system, and to a method for analyzing the emotional state of a user, wherein information on a user's response to a scene containing content is analyzed so as to provide the user with information on his or her emotional state for the scene, or to selectively provide information added for each scene of the content, thereby rendering an interactive service. According to one embodiment of the present invention, the method for analyzing the emotional state of a user comprises the steps of: outputting a scene comprising content having identification information; receiving information on a user response to the scene; determining the emotional state of the user for the scene on the basis of the information on the user response; and storing the determined emotional state in association with the identification information.
Abstract: An electronic audio apparatus is described that uses a digital audio filter in which a splitter separates an input frame of discrete time audio into different time interval portions. Separate digital filter blocks then operate in parallel upon those time interval portions, respectively. A combiner merges the filtered portions into a single audio channel signal. Other embodiments are also described and claimed.
Abstract: Embodiments include processes, systems, and devices for reshaping virtual baseband signals for transmission on non-contiguous and variable portions of a physical baseband, such as a white space frequency band. In the transmission path, a spectrum virtualization layer maps a plurality of frequency components derived from a transmission symbol produced by a physical layer protocol to sub-carriers of the allocated physical frequency band. The spectrum virtualization layer then outputs a time-domain signal derived from the mapped frequency components. In the receive path, a time-domain signal received on the physical baseband is reshaped by the virtual spectrum layer in order to recompose a time-domain symbol in the virtual baseband.
Abstract: In a CELP coder, a combined innovation codebook coding device comprises a pre-quantizer of a first, adaptive-codebook excitation residual, and a CELP innovation-codebook search module responsive to a second excitation residual produced from the first, adaptive-codebook excitation residual. In a CELP decoder, a combined innovation codebook comprises a de-quantizer of pre-quantized coding parameters into a first excitation contribution, and a CELP innovation-codebook structure responsive to CELP innovation-codebook parameters to produce a second excitation contribution.
Abstract: An apparatus for encoding an audio signal having a stream of audio samples has: a windower for applying a prediction coding analysis window to the stream of audio samples to obtain windowed data for a prediction analysis and for applying a transform coding analysis window to the stream of audio samples to obtain windowed data for a transform analysis, wherein the transform coding analysis window is associated with audio samples within a current frame of audio samples and with audio samples of a predefined portion of a future frame of audio samples being a transform-coding look-ahead portion, wherein the prediction coding analysis window is associated with at least the portion of the audio samples of the current frame and with audio samples of a predefined portion of the future frame being a prediction coding look-ahead portion, wherein the transform coding look-ahead portion and the prediction coding look-ahead portion are identically to each other or are different from each other by less than 20%; and an enc
Type:
Grant
Filed:
August 14, 2013
Date of Patent:
June 2, 2015
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors:
Emmanuel Ravelli, Ralf Geiger, Markus Schnell, Guillaume Fuchs, Vesa Ruoppila, Tom Baeckstroem, Bernhard Grill, Christian Helmrich
Abstract: The present invention is based on the finding that parameters including: a first set of parameters of a representation of a first portion of an original signal and a second set of parameters of a representation of a second portion of the original signal can be efficiently encoded when the parameters are arranged in a first sequence of tuples and a second sequence of tuples. The first sequence of tuples includes tuples of parameters having two parameters from a single portion of the original signal and the second sequence of tuples includes tuples of parameters having one parameter from the first portion and one parameter from the second portion of the original signal. A bit estimator estimates the number of necessary bits to encode the first and the second sequence of tuples. Only the sequence of tuples, which results in the lower number of bits, is encoded.
Type:
Grant
Filed:
November 17, 2010
Date of Patent:
May 26, 2015
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors:
Ralph Sperschneider, Jürgen Herre, Karsten Linzmeier, Johannes Hilpert
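The arrangement choice above can be sketched with an ideal entropy coder standing in for the patent's bit estimator (an assumed substitution): build both tuple sequences, estimate the bits each would need, and keep the cheaper one.

```python
import math
from collections import Counter

def entropy_bits(tuples):
    """Estimated bits to encode a tuple sequence with an ideal entropy
    coder: a stand-in for the patent's bit estimator."""
    counts = Counter(tuples)
    total = sum(counts.values())
    return sum(-c * math.log2(c / total) for c in counts.values())

def choose_arrangement(params_a, params_b):
    """Sequence 1 pairs parameters within one portion; sequence 2 pairs
    one parameter from each portion.  Keep the cheaper sequence."""
    seq1 = list(zip(params_a[0::2], params_a[1::2])) + \
           list(zip(params_b[0::2], params_b[1::2]))
    seq2 = list(zip(params_a, params_b))
    return min((seq1, seq2), key=entropy_bits)
```

When the two portions are highly correlated with each other, the cross-portion tuples become repetitive and cheap; when each portion is internally smooth, the within-portion tuples win.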
Abstract: A method (700, 800) and apparatus (100, 200) process audio frames to transition between different codecs. The method can include producing (720), using a first coding method, a first frame of coded output audio samples by coding a first audio frame in a sequence of frames. The method can include forming (730) an overlap-add portion of the first frame using the first coding method. The method can include generating (740) a combination first frame of coded audio samples based on combining the first frame of coded output audio samples with the overlap-add portion of the first frame. The method can include initializing (760) a state of a second coding method based on the combination first frame of coded audio samples. The method can include constructing (770) an output signal based on the initialized state of the second coding method.
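The combining step (740) amounts to a cross-fade between the first coder's overlap-add tail and the second coder's first samples; complementary linear windows are an illustrative choice, not necessarily the patent's window shape:

```python
def codec_transition(first_codec_tail, second_codec_start):
    """Combine the first coder's overlap-add portion with the second
    coder's opening samples using complementary linear fade windows."""
    n = len(first_codec_tail)
    return [((n - i) / n) * a + (i / n) * b
            for i, (a, b) in enumerate(zip(first_codec_tail,
                                           second_codec_start))]
```

The second coder's state is then initialized from this combined frame, so its output lines up with the fade instead of starting from silence.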
Abstract: An apparatus includes a plurality of applications and an integrator having a voice recognition module configured to identify at least one voice command from a user. The integrator is configured to integrate information from a remote source into at least one of the plurality of applications based on the identified voice command. A method includes analyzing speech from a first user of a first mobile device having a plurality of applications, identifying a voice command based on the analyzed speech using a voice recognition module, and incorporating information from the remote source into at least one of a plurality of applications based on the identified voice command.
Abstract: An embedder for embedding a watermark to be embedded into an input information representation comprises an embedding parameter determiner that is implemented to apply a derivation function once or several times to an initial value to obtain an embedding parameter for embedding the watermark into the input information representation. Further, the embedder comprises a watermark adder that is implemented to provide the input information representation with the watermark using the embedding parameter. The embedder is implemented to select how many times the derivation function is to be applied to the initial value.
Type:
Grant
Filed:
March 3, 2009
Date of Patent:
May 19, 2015
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors:
Bernhard Grill, Ernst Eberlein, Stefan Kraegeloh, Joerg Pickel, Juliane Borsum
Abstract: Provided is a communication apparatus for direct communication between networks of different types. The communication apparatus includes a transmission data selector determining whether or not data input from a first communication network is speech data, a data processor digitizing and packetizing the data transferred from the transmission data selector, and a modem for converting the digitized and packetized data into analog data and then directly transmitting the analog data to a second communication network different from the first communication network through a speech channel.
Type:
Grant
Filed:
June 10, 2011
Date of Patent:
May 12, 2015
Assignee:
Electronics and Telecommunications Research Institute
Abstract: In a method of improving perceived loudness and sharpness of a reconstructed speech signal delimited by a predetermined bandwidth, the following steps are performed: providing (S10) the speech signal, and separating (S20) the provided signal into at least a first and a second signal portion. Subsequently, adapting (S30) the first signal portion to emphasize at least a predetermined frequency or frequency interval within the first bandwidth portion. Finally, reconstructing (S40) the second signal portion based on at least the first signal portion, and combining (S50) the adapted first signal portion and the reconstructed second signal portion to provide a reconstructed speech signal with overall improved perceived loudness and sharpness.
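The S20-S50 pipeline can be sketched in a toy form. The moving-average band split, the constant emphasis gain, and the way the second portion is derived from the first are all hypothetical simplifications; a real system would use proper filter banks and a spectral bandwidth-extension model:

```python
def split_bands(signal, k=3):
    """Crude two-band split (step S20): a moving-average low-pass gives
    the first (low-band) portion; the residual is the second portion."""
    low = []
    for i in range(len(signal)):
        window = signal[max(0, i - k + 1): i + 1]
        low.append(sum(window) / len(window))
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def reconstruct_and_combine(signal, emphasis_gain=1.5, hb_gain=0.5):
    """Adapt the first portion (S30), reconstruct the second portion from
    it (S40), and combine the two (S50)."""
    low, _high = split_bands(signal)
    adapted = [emphasis_gain * s for s in low]        # S30: emphasize
    reconstructed = [hb_gain * s for s in adapted]    # S40: derive from first portion
    return [a + r for a, r in zip(adapted, reconstructed)]  # S50: combine
```

The key design point the abstract describes is that the second portion is not decoded directly but reconstructed from the (already adapted) first portion, so the emphasis carries through to both bands.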
Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a predetermined threshold value, and classifying the audio signal as either speech or music.
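A minimal sketch of the analyze/compare/classify loop, assuming the analyzed component is the per-frame zero-crossing rate and the compared statistic is its variance across frames; both the feature and the threshold value are illustrative stand-ins for whatever the patented method actually uses:

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def classify(frames, threshold=0.05):
    """Speech alternates voiced and unvoiced segments, so its zero-crossing
    rate tends to vary more across frames than music's; compare the
    variance with a predetermined threshold and classify accordingly."""
    rates = [zero_crossing_rate(f) for f in frames]
    mean = sum(rates) / len(rates)
    variance = sum((r - mean) ** 2 for r in rates) / len(rates)
    return "speech" if variance > threshold else "music"
```

In a deployed system such a classifier would run continuously on the monitored link and drive downstream control (e.g. codec or gain selection) from the decision.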
Abstract: An audio coding terminal and method are provided. The terminal includes a coding mode setting unit that sets an operation mode, from plural operation modes, for input audio coding by a codec configured to code the input audio based on the set operation mode, such that when the set operation mode is a high frame erasure rate (FER) mode, the codec codes a current frame of the input audio according to a selected frame erasure concealment (FEC) mode of one or more FEC modes. Upon setting of the operation mode to the high FER mode, one FEC mode is selected, from the one or more FEC modes predetermined for the high FER mode, to control the codec by incorporating redundancy within a coding of the input audio, or as separate redundancy information apart from the coded input audio, according to the selected FEC mode.
Abstract: Methods and apparatus for voice and data interlacing in a system having a shared antenna. In one embodiment, a voice and data communication system has a shared antenna for transmitting and receiving information in time slots, wherein the antenna can be used only to transmit or to receive at a given time. The system determines timing requirements for data transmission and reception and interrupts data transmission to transmit speech in selected intervals while meeting the data transmission timing and throughput requirements. The speech can be manipulated to fit within the selected intervals while preserving the intelligibility of the manipulated speech.
Type:
Grant
Filed:
March 21, 2012
Date of Patent:
May 5, 2015
Assignee:
Raytheon Company
Inventors:
David R. Peterson, Timothy S. Loos, David F. Ring, James F. Keating
Abstract: A voice quality measurement device that measures voice quality of a decoded voice signal outputted from a voice decoder unit. The voice quality measurement device includes a packet buffer unit and a voice information monitoring unit. The packet buffer unit accumulates voice packets that arrive non-periodically as voice information, and outputs the voice information to the voice decoder unit periodically. The voice information monitoring unit monitors continuity of the voice information inputted to the voice decoder unit, and calculates an index of voice quality of the decoded voice signal that reflects acceptability of this continuity.
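The buffer-and-monitor arrangement can be sketched as below. The class name, the boolean continuity flags, and the fraction-of-continuous-periods quality index are hypothetical simplifications of the index the monitoring unit would actually compute:

```python
import collections


class PacketBuffer:
    """Accumulates voice packets that arrive non-periodically and releases
    them on a periodic tick toward the decoder, recording per tick whether
    voice information was available (the continuity the monitoring unit
    observes)."""

    def __init__(self):
        self.queue = collections.deque()
        self.continuity = []          # True per tick with data, else False

    def arrive(self, packet):
        self.queue.append(packet)     # non-periodic network arrival

    def tick(self):
        """Periodic output toward the voice decoder unit."""
        if self.queue:
            self.continuity.append(True)
            return self.queue.popleft()
        self.continuity.append(False)  # underrun: decoder must conceal
        return None

    def quality_index(self):
        """Fraction of periods with continuous voice information, used
        here as a crude stand-in for the decoded-voice quality index."""
        return sum(self.continuity) / len(self.continuity)
```

The point of measuring at the buffer output rather than at the network input is that the index reflects exactly the discontinuities the decoder experiences, and hence the quality the listener perceives.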
Abstract: An apparatus includes a user input unit, a display unit, a control unit, and a buffer unit. The display unit includes a speed setting menu. The control unit selects a mode from the speed setting menu in response to the selection signal of the user, and controls a compression ratio of a voice codec and a transfer rate of a modem corresponding to a transmission-side radio, and a reception rate of a modem and a restoration rate of a voice codec corresponding to a reception-side radio, based on the selected mode. The buffer unit performs a storage function if there is a difference between the compression ratio of the voice codec and the transfer rate of the modem or if there is a difference between the reception rate of the modem and the restoration rate of the voice codec.
Type:
Application
Filed:
August 20, 2014
Publication date:
April 30, 2015
Inventors:
Young Ho SON, CheolYong PARK, Tae uk YANG, Jang Hong YOON, Jeong-Seok LIM, Jung-Gil PARK
Abstract: Methods, apparatus, and systems for voice processing are provided herein. An exemplary method can be implemented by a terminal. A voice bit stream to be sent can be obtained. Voice control information corresponding to the voice bit stream to be sent can be obtained. The voice control information can be used for a voice server to determine a voice-mixing strategy. The voice bit stream and the voice control information can be sent to the voice server. At least one voice bit stream, returned by the voice server based on the voice-mixing strategy, can be received. The at least one voice bit stream can be outputted.
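A toy sketch of the server-side mixing step, assuming integer PCM sample lists per stream and per-speaker control information with hypothetical `mute` and `gain` fields (the abstract does not specify what the voice control information contains or how the mixing strategy is determined):

```python
def mix_streams(streams, control):
    """Mix voice bit streams under a control-driven strategy.

    streams: {speaker_id: list of PCM samples}
    control: {speaker_id: {"mute": bool, "gain": float}}  (assumed fields)
    Returns one mixed stream, clipped to the 16-bit sample range.
    """
    length = max(len(s) for s in streams.values())
    mixed = [0.0] * length
    for sid, samples in streams.items():
        info = control.get(sid, {})
        if info.get("mute"):
            continue                      # strategy: drop muted speakers
        gain = info.get("gain", 1.0)
        for i, s in enumerate(samples):
            mixed[i] += gain * s          # weighted sum of contributions
    return [max(-32768, min(32767, int(s))) for s in mixed]
```

Sending the control information alongside the bit stream, as the abstract describes, is what lets the server choose such a strategy per speaker without inspecting the audio itself.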