Abstract: A voice modulation apparatus is provided. The voice modulation apparatus includes an audio signal input unit which receives an audio signal from an external source; an extraction unit which extracts property information relating to a voice from the audio signal; a storage unit which stores the extracted property information; a control unit which modulates a target voice based on the extracted property information; and an output unit which outputs the modulated target voice.
Abstract: An apparatus comprising an ingress port configured to receive a signal comprising a plurality of encoded audio signals corresponding to a plurality of sources; and a processor coupled to the ingress port and configured to calculate a parameter for each of the plurality of encoded audio signals, wherein each parameter is calculated without decoding any of the encoded audio signals, select some, but not all, of the plurality of encoded audio signals according to the parameter for each of the encoded audio signals, decode the selected signals to generate a plurality of decoded audio signals, and combine the plurality of decoded audio signals into a first audio signal.
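As a rough illustration of the abstract above, the select-before-decode idea can be sketched with a toy codec in which a per-packet energy byte serves as the parameter computed without decoding the audio payload (the packet layout and the summing mixer are assumptions for illustration, not the patent's format):

```python
def frame_energy(packet):
    # Hypothetical: the first byte of the packet carries a coded energy
    # value, so the parameter needs no decoding of the audio payload.
    return packet[0]

def decode(packet):
    # Toy "codec": remaining bytes are 8-bit unsigned samples.
    return [b - 128 for b in packet[1:]]

def mix_selected(packets, k):
    """Rank packets by the energy parameter (computed without decoding),
    decode only the k highest-ranked, and mix by sample-wise summation."""
    ranked = sorted(packets, key=frame_energy, reverse=True)
    decoded = [decode(p) for p in ranked[:k]]
    n = min(len(d) for d in decoded)
    return [sum(d[i] for d in decoded) for i in range(n)]
```

Because ranking uses only the header byte, the cost of decoding is paid for just the k selected streams.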
Abstract: A system and method for automatically adjusting floor controls based on conversational characteristics is provided. Audio streams are received, which each originate from an audio source. Floor controls for a current configuration including at least a portion of the audio streams are maintained. Conversational characteristics shared by two or more of the audio sources are determined. Possible configurations for the audio streams are identified based on the conversational characteristics. An analysis of the current configuration and the possible configurations is performed. A change threshold comprising a minimum number of timeslices for at least one of the current configuration and one of the possible configurations is applied to the analysis. When the analysis satisfies the change threshold, the floor controls are automatically adjusted. The audio streams are mixed into one or more outputs based on the adjusted floor controls.
Type:
Grant
Filed:
February 27, 2012
Date of Patent:
June 11, 2013
Assignee:
Palo Alto Research Center Incorporated
Inventors:
Paul Masami Aoki, Margaret H. Szymanski, James D. Thornton, Daniel H. Wilson, Allison Gyle Woodruff
Abstract: An automated technique is disclosed for processing audio data and generating one or more actions in response thereto. In particular embodiments, the audio data can be obtained during a phone conversation and post-call actions can be provided to the user with contextually relevant entry points for completion by an associated application. Audio transcription services available on a remote server can be leveraged. The entry points can be generated based on keyword recognition in the transcription and passed to the application in the form of parameters.
Abstract: A first and a second data value are co-compressed by generating a sequence of symbols having a most significant symbol that is the most significant symbol of a compressed representation of the first data value and a least significant symbol that is the most significant symbol of a compressed representation of the second data value. The compressed representation of the first data value corresponds to at least a portion of the symbols of the sequence of symbols starting from the most significant symbol and extending towards the least significant symbol in a first reading direction. The compressed representation of the second data value also corresponds to at least a portion of the symbols of the sequence of symbols, however, starting from the least significant symbol and extending in an opposite reading direction towards the most significant symbol.
Type:
Application
Filed:
September 3, 2010
Publication date:
June 6, 2013
Applicant:
TELEFONAKTIEBOLAGET L M ERICSSON (publ)
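The co-compression scheme of the abstract above, with one code read forward from the most significant end and the other read backward from the least significant end of a single symbol sequence, can be sketched minimally (a non-overlapping sketch; the patent's scheme can let the two codes share symbols of the sequence):

```python
def co_compress(code1, code2):
    """Pack two compressed symbol strings into one sequence: code1 is
    read forward from the most significant end, code2 backward from
    the least significant end."""
    return code1 + code2[::-1]

def co_decompress(seq, len1, len2):
    """Recover both codes given their lengths: read len1 symbols
    forward from the front, len2 symbols backward from the back."""
    code1 = seq[:len1]
    code2 = seq[len(seq) - len2:][::-1]
    return code1, code2
```

In the patent's fuller form the two reading directions may overlap in the middle, so the combined sequence can be shorter than the sum of the two code lengths.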
Abstract: An audio signal is conveyed more efficiently by transmitting or recording a baseband of the signal with an estimated spectral envelope and a noise-blending parameter derived from a measure of the signal's noise-like quality. The signal is reconstructed by translating spectral components of the baseband signal to frequencies outside the baseband, adjusting phase of the regenerated components to maintain phase coherency, adjusting spectral shape according to the estimated spectral envelope, and adding noise according to the noise-blending parameter. Preferably, the transmitted or recorded signal also includes an estimated temporal envelope that is used to adjust the temporal shape of the reconstructed signal.
Abstract: A speech signal transmission apparatus includes an extractor to extract speech signals from speech source signals collected by a plurality of microphones, a power calculator to calculate powers of speech signals of multiple channels and set any one of the speech signals of the multiple channels as a reference speech signal, a synchronization adjustor to adjust synchronization of the other speech signals based on the reference speech signal, a signal generator to generate extraction signals by offsetting the reference speech signal from the other synchronization-adjusted speech signals, an encryptor to compress and encrypt the reference speech signal and the extraction signals, and a transmitter to transmit the compressed and encrypted reference speech signal and extraction signals.
Abstract: There is provided a method of reproducing and distributing a sound source of an electronic terminal. The method includes a step of starting to simultaneously reproduce a stream of an MR (Music Recorded) sound source file and a stream of an AR (All Recorded) sound source file, in which a voice is recorded over the MR sound source file, by a reproducing unit of the electronic terminal, and outputting one of the streams through an output unit; and a step of controlling the reproducing unit, by a reproducing switch unit of the electronic terminal and based on a selection of a user, to stop the output of the stream that is currently being output through the output unit and to output the other stream instead, while the streams of the MR and AR sound source files are being reproduced.
Abstract: In an encoding process, a CPU transforms an audio signal from the time domain to the frequency domain, producing spectra consisting of MDCT coefficients. The CPU separates the audio signal into several frequency bands, and performs bit shifting in each band such that the MDCT coefficients can be expressed with pre-configured numbers of bits. The CPU re-quantizes the MDCT coefficients at a precision differing for each band, and transmits the values acquired thereby and the shift bit numbers as encoded data. Meanwhile, in a decoding process, a CPU receives the encoded data and inverse re-quantizes and inverse bit shifts the data, thereby restoring the MDCT coefficients. Furthermore, the CPU transforms the data from the frequency domain back to the time domain by using the inverse MDCT, and restores and outputs the audio signal.
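The band-wise bit shifting described above amounts to finding, per band, the smallest right shift that brings every MDCT coefficient within a pre-configured signed bit width; a minimal integer sketch (the bit width and per-band handling are assumptions for illustration):

```python
def encode_band(coeffs, target_bits):
    """Find the smallest right shift so every coefficient magnitude fits
    in target_bits (signed), apply it, and return the shift count."""
    limit = 1 << (target_bits - 1)
    shift = 0
    while any(abs(c) >> shift >= limit for c in coeffs):
        shift += 1
    return [c >> shift for c in coeffs], shift

def decode_band(quantized, shift):
    # Inverse bit shift restores the coefficient scale; the shifted-out
    # low bits are the quantization loss.
    return [q << shift for q in quantized]
```

The shift count must accompany each band in the encoded data so the decoder can invert the shift, exactly as the abstract's "shift bit numbers" describe.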
Abstract: A method for optimizing message transmission and decoding comprises: reading data from a memory of an originating device, the data comprising information regarding the originating device; encoding the data by converting the data to a subset of words having a ranked recognition accuracy higher than the remainder of words; transmitting the encoded data from the originating device to a receiving system audibly as words via a telephone connection; utilizing a voice recognition software to recognize the words; decoding the words back to the data; and taking a predetermined action based on the data.
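One way to read the encoding step above: map raw bytes onto a small vocabulary of words that rank high for recognition accuracy. The sixteen-word list below is hypothetical (any word set the recognizer handles reliably would do); each byte becomes two words, one per hexadecimal nibble:

```python
# Hypothetical word list, assumed ordered by recognition accuracy;
# the top 16 words each encode one hexadecimal digit.
WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "alpha", "bravo", "charlie", "delta", "echo", "fox"]

def encode_to_words(data: bytes):
    """Map each byte to two words (high nibble, then low nibble)."""
    out = []
    for b in data:
        out.append(WORDS[b >> 4])
        out.append(WORDS[b & 0x0F])
    return out

def decode_from_words(words):
    """Invert the mapping: pair up words and rebuild each byte."""
    index = {w: i for i, w in enumerate(WORDS)}
    it = iter(words)
    return bytes((index[hi] << 4) | index[lo] for hi, lo in zip(it, it))
```

The receiving system's voice recognizer only ever has to distinguish sixteen well-separated words, which is what makes the audible telephone channel workable.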
Abstract: Systems and methods that can be utilized to convert a voice communication received over a telecommunication network to text are described. In an illustrative embodiment, a call processing system coupled to a telecommunications network receives a call from a caller intended for a first party, wherein the call is associated with call signaling information. At least a portion of the call signaling information is stored in a computer readable medium. A greeting is played to the caller, and a voice communication from the caller is recorded. At least a portion of the voice communication is converted to text, which is analyzed to identify portions that are inferred to be relatively more important to communicate to the first party. A text communication is generated including at least some of the identified portions and including fewer words than the recorded voice communication. At least a portion of the text communication is made available to the first party over a data network.
Type:
Grant
Filed:
March 11, 2008
Date of Patent:
May 21, 2013
Assignee:
Callwave Communications, LLC
Inventors:
Anthony Bladon, David Giannini, David F. Hofstatter, Colin Kelley, David C. McClintock, Robert F. Smith, David S. Trandal, Leland W. Kirchhoff
Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability.
Abstract: Systems, methods and apparatuses are described herein for distributing user attribute information about users of a communications system to communication terminals, which use the user attribute information to configure a speech codec to operate in a speaker-dependent manner during a communication session, thereby improving speech coding efficiency. In a network-assisted model, the user attribute information is stored on the communications network and selectively transmitted to the communication terminals while in a peer-assisted model, the user attribute information is derived by and transferred between communication terminals.
Abstract: A method and an apparatus for generating comfort noise so as to improve user experience are disclosed. The method includes: if a received data frame is a noise frame, calculating a corresponding energy attenuation parameter based on the noise frame and a data frame received earlier than the noise frame; and attenuating noise energy based on the energy attenuation parameter to obtain a comfort noise signal. An apparatus for generating comfort noise is also provided.
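A minimal sketch of attenuating a noise-energy target and synthesizing comfort noise from it (the smoothing factor and the white-noise synthesis are assumptions for illustration; the patent derives the attenuation parameter from the received frames):

```python
import math
import random

def comfort_noise(prev_energy, noise_energy, length, alpha=0.9):
    """Attenuate the energy target toward the measured noise energy,
    then synthesize white noise scaled to that target.  alpha is an
    assumed attenuation factor, not the patent's derived parameter."""
    target = alpha * prev_energy + (1 - alpha) * noise_energy
    gain = math.sqrt(target)
    rng = random.Random(0)  # fixed seed for reproducibility
    samples = [gain * rng.uniform(-1.0, 1.0) for _ in range(length)]
    return target, samples
```

Run over successive noise frames, the recursion decays the energy smoothly instead of letting the background level jump, which is the "comfortable" part.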
Abstract: A multi-layered speech recognition apparatus and method, the apparatus includes a client checking whether the client recognizes the speech using a characteristic of speech to be recognized and recognizing the speech or transmitting the characteristic of the speech according to a checked result; and first through N-th servers, wherein the first server checks whether the first server recognizes the speech using the characteristic of the speech transmitted from the client, and recognizes the speech or transmits the characteristic according to a checked result, and wherein an n-th (2≤n≤N) server checks whether the n-th server recognizes the speech using the characteristic of the speech transmitted from an (n−1)-th server, and recognizes the speech or transmits the characteristic according to a checked result.
Abstract: A method of pre-processing an audio signal transmitted to a user terminal via a communication network, and an apparatus using the method, are provided. The method may prevent deterioration of the sound quality of the audio signal transmitted to the user terminal by pre-processing the audio signal so that a codec module encoding the audio signal determines it to be a speech signal. Also, by pre-processing the audio signal using a speech codec, the method may improve the probability that the codec module determines the audio signal to be speech when the signal is transmitted via the communication network.
Type:
Application
Filed:
January 8, 2013
Publication date:
May 16, 2013
Inventors:
JAE WOONG JEONG, SEOP HYEONG PARK, JONG KYU RYU
Abstract: The present invention relates to speech coding in wireless and wireline communication systems. The present invention provides a method of saving bandwidth by a controlled dropping of speech frames at an encoder in a sending communication device. The dropping is controlled in a manner to minimize the effects on the speech quality after the decoding in the receiving communication device, by assuring that the state mismatch between the encoder and the decoder is removed or at least significantly reduced. This is achieved by letting the encoder run an ECU algorithm with a similar behavior as the one running in the decoder in the receiving communication device.
Abstract: An encoder comprising an input for inputting frames of an audio signal in a frequency band, at least a first excitation block for performing a first excitation for a speech like audio signal, and a second excitation block for performing a second excitation for a non-speech like audio signal. The encoder further comprises a filter for dividing the frequency band into a plurality of sub bands each having a narrower bandwidth than the frequency band. The encoder also comprises an excitation selection block for selecting one excitation block among the at least first excitation block and the second excitation block for performing the excitation for a frame of the audio signal on the basis of the properties of the audio signal at least at one of the sub bands. The invention also relates to a device, a system, a method and a storage medium for a computer program.
Type:
Grant
Filed:
February 22, 2005
Date of Patent:
May 7, 2013
Assignee:
Nokia Corporation
Inventors:
Janne Vainio, Hannu Mikkola, Pasi Ojala, Jari Mäkinen
Abstract: A method and apparatus for reproducing audio data using low power are provided. The apparatus may reproduce the audio data by determining a power mode based on a memory resource of an internal memory, and an amount of a memory required for reproducing the audio data, controlling a power based on the determined power mode, and decoding the audio data.
Type:
Application
Filed:
August 13, 2012
Publication date:
April 25, 2013
Applicant:
Samsung Electronics CO., LTD.
Inventors:
Chang Yong SON, Kang Eun LEE, Do Hyung KIM, Shi Hwa LEE
Abstract: The disclosure provides a multi-point sound mixing and distant view presentation method, apparatus and system, wherein the multi-point sound mixing and distant view presentation method includes: receiving audio code streams from a plurality of meeting places, wherein each meeting place comprises one or more meeting sections, and each meeting section corresponds to one audio code stream; mixing the audio code streams of the meeting sections which have a corresponding relationship among the plurality of meeting places; and outputting mixed audio code streams to the meeting sections which have the corresponding relationship among the plurality of meeting places. Sounds in different sections of the distant view presentation conference system can be distinguished by technical solutions provided by the disclosure.
Abstract: Disclosed is a method for automatically displaying a menu relating to a specific function by recognizing a user's voice command in a call mode, and for directly executing the menu, and a mobile terminal having the same. A mobile terminal may include a microphone configured to receive a user's voice in a video call mode, a display for displaying information, and a controller configured to recognize the voice, detect a voice command included in the voice, and automatically display a menu corresponding to the detected voice command on the display.
Type:
Grant
Filed:
May 21, 2009
Date of Patent:
April 23, 2013
Assignee:
LG Electronics Inc.
Inventors:
You-Hwa Oh, Ik-Hoon Kim, Kyoung-Jin Jo, Jae-Do Kwak
Abstract: An audio decoding device of the present invention includes: a decoding unit decoding a stream to a spectrum coefficient, and outputting stream information when a frame included in the stream cannot be decoded; an orthogonal transformation unit transforming the spectrum coefficient to a time signal; a correction unit generating, when the decoding unit outputs the stream information, a correction time signal based on an output waveform within a reference section that lies in the overlap between the error frame section for which the stream information is outputted and an adjacent frame section, and that is in the middle of the adjacent frame section; and an output unit generating the output waveform by synthesizing the correction time signal and the time signal.
Abstract: Voice over internet protocol (VoIP) devices and conferencing systems may include a spatial encoder associated with a first endpoint and a spatial renderer associated with a second endpoint. The spatial renderer may be configured to receive audio data. The audio data may be rendered among a plurality of speakers based on a first set of spatial information for a plurality of microphones associated with the first endpoint, and a second set of spatial information for the plurality of speakers associated with the second endpoint. A method for generating a sound field may include determining spatial information for a plurality of microphones in a local room, determining spatial information for a plurality of speakers in a remote room, mapping the spatial information for the plurality of microphones and the spatial information for the plurality of speakers, and generating a sound field in the remote room based on the mapping.
Type:
Application
Filed:
June 11, 2012
Publication date:
April 18, 2013
Applicant:
CLEARONE COMMUNICATIONS, INC.
Inventors:
Tracy A. Bathurst, Derek Graham, Michael Braithwaite, Russel S. Ericksen, Brett Harris, Sandeep Kalra, David K. Lambert, Peter H. Manley, Ashutosh Pandey, Bryan Shaw, Michael Tilelli, Paul R. Bryson
Abstract: A PIM application provides a single page natural language interface for entering and managing PIM data. The natural language interface may receive a natural language entry as a text character string. The entry may be associated with a task, calendar, contact or other PIM data type. The received entries are processed (for example, parsed) to determine the PIM data type and other information. The original entry is not discarded from the natural language interface as a result of processing. After processing one or more received natural language entries, the entries remain in the natural language interface to be viewed and managed. The entry is maintained so it can be managed with other natural language entries provided in the interface.
Abstract: A vocabulary generating method and apparatus and a speech recognition system using the same are disclosed. In the vocabulary generating method, a new system vocabulary can be generated to increase the flexibility of the speech recognition system, so a user, when unsure of a system command, can use a specially defined “unknown code word” (UCW) for the undetermined part in the command.
Abstract: The transient problem may be sufficiently addressed, and a further delay on the decoding side reduced, if a new SBR frame class is used in which the frame boundaries are not shifted, i.e. the grid boundaries remain synchronized with the frame boundaries, but in which a transient position indication is additionally carried as a syntax element, to be used on the encoder and/or decoder side to determine the grid boundaries within frames of this new frame class.
Type:
Grant
Filed:
October 18, 2007
Date of Patent:
April 9, 2013
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung E.V.
Inventors:
Markus Schnell, Michael Schuldt, Manfred Lutzky, Manuel Jander
Abstract: A method and device are provided for modifying a compounded voice message having at least one first voice component. The method includes a step of obtaining at least one second voice component, a step of updating at least one item of information belonging to a group of items of information associated with the compounded voice message as a function of the at least one second voice component and a step of making available the compounded voice message comprising the at least one first and second voice components, and the group of items of information associated with the compounded voice message. The compounded voice message is intended to be consulted by at least one recipient user.
Abstract: A server apparatus acquires content based on instruction information; decodes image data of the acquired content and compression encodes the decoded image data using a predetermined encoding scheme; decodes an audio signal and compression encodes the decoded audio signal using the predetermined encoding scheme; stores the image data and the audio signal in a packet; and sends the packet to a packet forwarding apparatus. A mobile terminal receives the packet, decodes and displays the compression encoded image data stored in the packet, and decodes and reproduces the compression encoded audio signal.
Abstract: A sound process apparatus includes a processor. The processor may execute instructions, which are stored on a memory, and when executed cause the sound process apparatus to perform operations. An obtaining operation may obtain sound data in a remote site. A first determining operation may determine volume levels of voice and noise in the remote site based on the sound data. A second determining operation may determine a volume level of noise in a local site based on the sound in the local site. A third determining operation may determine a target volume level based on the volume level of the voice in the remote site, the volume level of the noise in the remote site, and the volume level of the noise in the local site. A notifying operation may notify a user of the target volume level.
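The third determining operation above combines three measured levels into one target; the rule below is only one plausible choice (the margin and the max-of-two-constraints form are assumptions for illustration, not the patent's formula):

```python
def target_volume(remote_voice_db, remote_noise_db, local_noise_db,
                  margin_db=10.0):
    """Hypothetical rule: play the remote voice loud enough both to
    exceed the local noise by a margin and to preserve the level it
    had over the remote-site noise."""
    needed_over_local = local_noise_db + margin_db
    remote_snr = remote_voice_db - remote_noise_db
    return max(needed_over_local, local_noise_db + remote_snr)
```

Whatever the exact rule, the notifying operation then reports this target so the user (or an automatic gain stage) can adjust playback toward it.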
Abstract: A vehicle door opening indicator system comprises a base unit, a remote unit wirelessly communicating with the base unit, an on-board computer, and door opening sensors. The base unit is placed in a vehicle, is powered from a vehicle battery, and comprises a programmable CPU, a memory, a speaker, an input circuit, and an output circuit. The CPU communicates with the remote unit through the input circuit and with the computer through the output circuit. The remote unit, comprising a microphone and a recorder/player, preprograms the CPU with an individualized message. The door opening sensors are connected to the computer, and the preprogrammed individualized message is reproduced through the speaker upon opening the door.
Abstract: A speech enhancement system enhances transitions between speech and non-speech segments. The system includes a background noise estimator that approximates the magnitude of a background noise of an input signal that includes a speech and a non-speech segment. A slave processor is programmed to perform the specialized task of modifying a spectral tilt of the input signal to match a plurality of expected spectral shapes selected by a codec.
Abstract: The present invention provides a computationally efficient technique for compression encoding of an audio signal, and further provides a technique to enhance the sound quality of the encoded audio signal. This is accomplished by including more accurate attack detection and a computationally efficient quantization technique. The improved audio coder converts the input audio signal to a digital audio signal. The audio coder then divides the digital audio signal into larger frames having a long-block frame length and partitions each of the frames into multiple short-blocks. The audio coder then computes short-block audio signal characteristics for each of the partitioned short-blocks based on changes in the input audio signal.
Type:
Grant
Filed:
March 14, 2011
Date of Patent:
March 26, 2013
Assignee:
Sasken Communication Technologies Limited
Inventors:
K. P. P. Kalyan Chakravarthy, Navaneetha K. Ruthramoorthy, Pushkar P. Patwardhan, Bishwarup Mondal
Abstract: A method of transmitting an input audio signal is disclosed. A current spectral magnitude of the input audio signal is quantized. A quantization error of a previous spectral magnitude is fed back to influence quantization of the current spectral magnitude. The feeding back includes adaptively modifying a quantization criterion to form a modified quantization criterion. A current quantization error is minimized by using the modified quantization criterion. A quantized spectral envelope is formed based on the minimizing and the quantized spectral envelope is transmitted.
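The feeding back of the previous quantization error into the current quantization criterion, as described above, is the classic noise-feedback arrangement; a scalar sketch (uniform step quantizer assumed for illustration):

```python
def quantize_with_feedback(values, step):
    """Scalar-quantize each value after adding back the previous
    quantization error, so error accumulates nowhere: it is shaped
    into later samples rather than discarded."""
    out = []
    err = 0.0
    for v in values:
        target = v + err              # modified quantization criterion
        q = round(target / step) * step
        err = target - q              # feed the error forward
        out.append(q)
    return out
```

Note how three values of 0.4 with a unit step quantize to 0, 1, 0 rather than three zeros: the running error forces one level up, keeping the total close to the input total.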
Abstract: An audio decoder for decoding a multi-audio-object signal having an audio signal of a first type and an audio signal of a second type encoded therein is described, the multi-audio-object signal having a downmix signal and side information, the side information having level information of the audio signals of the first and second types in a first predetermined time/frequency resolution, and a residual signal specifying residual level values in a second predetermined time/frequency resolution, the audio decoder having a processor for computing prediction coefficients based on the level information; and an up-mixer for up-mixing the downmix signal based on the prediction coefficients and the residual signal to obtain a first up-mix audio signal approximating the audio signal of the first type and/or a second up-mix audio signal approximating the audio signal of the second type.
Type:
Grant
Filed:
April 20, 2012
Date of Patent:
March 26, 2013
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V.
Inventors:
Oliver Hellmuth, Johannes Hilpert, Leonid Terentiev, Cornelia Falch, Andreas Hoelzer, Juergen Herre
Abstract: A speech enhancement system improves speech conversion within an encoder and decoder. The system includes a first device that converts sound waves into operational signals. A second device selects a template that represents an expected signal model. The selected template models speech characteristics of the operational signals through a speech codebook that is further accessed in a communication channel.
Abstract: An audio encoder for providing an output signal using an input audio signal includes a patch generator, a comparator and an output interface. The patch generator generates at least one bandwidth extension high-frequency signal, wherein a bandwidth extension high-frequency signal includes a high-frequency band. The high-frequency band of the bandwidth extension high-frequency signal is based on a low frequency band of the input audio signal. A comparator calculates a plurality of comparison parameters. A comparison parameter is calculated based on a comparison of the input audio signal and a generated bandwidth extension high-frequency signal. Each comparison parameter of the plurality of comparison parameters is calculated based on a different offset frequency between the input audio signal and a generated bandwidth extension high-frequency signal.
Type:
Grant
Filed:
June 13, 2011
Date of Patent:
March 19, 2013
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V.
Inventors:
Frederik Nagel, Sascha Disch, Guillaume Fuchs, Juergen Herre, Christian Greibel
Abstract: Systems (1600) and methods (1500) for frame synchronization. The methods involve: extracting bit sequences S0 and S1 from a Bit Stream (“BS”) of a Data Burst (“DB”); decoding S0 and S1 to obtain decoded bit sequences S′0 and S′1; using S′0 and S′1 to determine Bit Error Rate (“BER”) estimates (516, 518); combining the BER estimates to obtain a combined BER estimate; modifying S0 and S1 so that each includes at least one bit of BS which is not included in its current set of bits and so that it is absent of at least one of the bits in the current set of bits; iteratively repeating the decoding, using, combining and modifying steps to obtain more combined BER estimates; analyzing the combined BER estimates to identify a minimum combined BER estimate; and using the minimum combined BER estimate to determine a location of a vocoder voice frame within DB.
Type:
Application
Filed:
September 2, 2011
Publication date:
March 7, 2013
Applicant:
HARRIS CORPORATION
Inventors:
Sujit Nair, Sree B. Amirapu, Eugene H. Peterson, III
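The frame-location search in the abstract above reduces to scoring candidate offsets by an error estimate and keeping the minimum; the toy below uses bit mismatches against a known sync pattern in place of the patent's decode-and-compare BER estimates:

```python
def find_frame(bitstream, sync):
    """Slide over the burst, count mismatches against the expected
    sync pattern at each offset, and return the offset with the
    minimum error count (a stand-in for the combined BER search)."""
    best_offset, best_errors = 0, len(sync) + 1
    for off in range(len(bitstream) - len(sync) + 1):
        errors = sum(a != b for a, b in zip(bitstream[off:], sync))
        if errors < best_errors:
            best_offset, best_errors = off, errors
    return best_offset, best_errors
```

The patent's estimator is more robust because it rates every candidate by how well it decodes, not by a fixed pattern, but the minimize-over-offsets structure is the same.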
Abstract: A method of encoding one or more parent blocks of values, the number of values being the length of each block, the method comprising for each parent block: (a) determining a first sum of values in the parent block; (b) splitting the parent block into smaller subblocks; (c) for at least one of the subblocks, determining a second sum of the values in the subblock, selecting a likelihood table from the plurality of likelihood tables based on said first sum of values in the parent block and encoding the second sum using the likelihood table; (d) designating each subblock a parent block; (e) carrying out steps (a), (b), (c) and (d) until at least one parent block reaches a predetermined condition.
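The recursive sum coding of steps (a) through (e) can be sketched without the entropy-coding stage: only the parent sum and each left-child sum need emitting, since a right-child sum is recoverable as parent minus left (the likelihood-table selection conditioned on the parent sum is noted in a comment but not implemented here):

```python
def encode_sums(block, min_len=1):
    """Recursively emit the parent sum, then each left-half sum, down
    to blocks of min_len values; right-half sums are implied."""
    out = [sum(block)]

    def split(b):
        if len(b) <= min_len:
            return
        mid = len(b) // 2
        left, right = b[:mid], b[mid:]
        # In the patent, a likelihood table selected by the parent sum
        # entropy-codes this child sum; here we just record it.
        out.append(sum(left))
        split(left)
        split(right)

    split(block)
    return out
```

Conditioning each child sum on its parent sum is what makes the entropy coding effective: a child sum can never exceed its parent, so the likelihood table can be narrow.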
Abstract: A method is provided for concealing a transmission error in a digital signal chopped into a plurality of successive frames associated with different time intervals in which, on reception, the signal may comprise erased frames and valid frames, the valid frames comprising information relating to the concealment of frame loss. The method is implemented during a hierarchical decoding using a core decoding and a transform-based decoding using windows introducing a time delay of less than a frame with respect to the core decoding. The method includes concealing a first set of missing samples for the erased frame, implemented in a first time interval; a step of concealing a second set of missing samples utilizing information of said valid frame and implemented in a second time interval; and a step of transition between the first and the second set of missing samples to obtain at least part of the missing frame.
Type:
Grant
Filed:
March 20, 2009
Date of Patent:
March 5, 2013
Assignee:
France Telecom
Inventors:
David Virette, Pierrick Philippe, Balazs Kovesi
Abstract: A portable device performs a multiple recording function by which data is recorded using different recording techniques. The device includes at least one of an input unit and a touch panel, which creates or supports an input signal for activating an audio-related function and an input signal for activating the multiple recording function while the audio-related function is performed. The device further includes a display panel configured to output a memo writing screen of a memo function in response to the activation of the multiple recording function, the memo writing screen allowing the activation of a voice recording function. The device also includes a control unit configured to control the output of the memo writing screen.
Type:
Application
Filed:
August 20, 2012
Publication date:
February 28, 2013
Applicant:
SAMSUNG ELECTRONICS CO., LTD.
Inventors:
Jin Young JEON, Sang Hyuk KOH, Tae Yeon KIM, Hyun Kyoung KIM, Hyun Mi PARK, Hye Bin PARK, Sae Gee OH
Abstract: A technique for improving the flexibility of controlling the accuracy of encoding a stereo signal. In a stereo signal encoding device, a sum/difference calculation section generates a monophonic signal which is the sum of first and second channel signals constituting a stereo signal and a side signal which is the difference between the first channel signal and the second channel signal. A mode setting section generates mode information that indicates either a monophonic encoding mode or a stereo encoding mode. A core layer encoding section, a first extended layer encoding section, a second extended layer encoding section, and a third extended layer encoding section individually carry out the monophonic encoding using the monophonic signal or the stereo encoding using both the monophonic signal and the side signal depending on the mode information, and output to a multiplexing section the resultant encoded information from the core layer to the third extended layer.
Abstract: A system is described that performs frame erasure concealment to generate frames of an output speech signal corresponding to a series of erased frames of an encoded bit-stream in a manner that conceals the quality-degrading effects of such erased frames. In one embodiment, responsive to the detection of a first erased frame in the series, a number of steps are performed. These steps include deriving long-term and short-term synthesis filters based on previously-generated portions of the output speech signal, calculating a ringing signal segment based on the long-term and short-term synthesis filters, and generating a frame of the output speech signal corresponding to the first erased frame by overlap-adding the ringing signal segment to an extrapolated waveform. Deriving the long-term filter includes estimating a pitch period based on a previously-generated portion of the output speech signal by finding a lag that minimizes a sum of magnitude difference function.
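The pitch-estimation step described above, finding the lag that minimizes a sum of magnitude difference function (often abbreviated AMDF/SMDF), can be sketched as follows. The window length and lag bounds are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def estimate_pitch_smdf(signal, min_lag, max_lag, window=160):
    """Estimate a pitch period over a previously-generated signal by
    searching for the lag that minimizes the sum of magnitude
    differences between the most recent `window` samples and the same
    window shifted back by `lag` samples."""
    x = np.asarray(signal, dtype=float)
    tail = x[-window:]
    best_lag, best_cost = min_lag, np.inf
    for lag in range(min_lag, max_lag + 1):
        ref = x[-window - lag:-lag]  # same-length window, `lag` samples earlier
        cost = np.sum(np.abs(tail - ref))
        if cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag
```

For a signal with a true period inside the search range, the magnitude-difference sum dips sharply at that lag, so the minimizer is a direct estimate of the pitch period in samples.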
Abstract: The present disclosure provides systems and methods for dynamically signaling encoder capabilities of vocoders of corresponding communication nodes. In one embodiment, during a call between a first communication node and a second communication node, a control node (e.g., base station controller or mobile switching center) for the first communication node sends capability information for a voice encoder of a vocoder of the first communication node to a control node for the second communication node. As a result, the second communication node is enabled to select and request a preferred encoder mode for the voice encoder of the vocoder of the first communication node based on the capabilities of the voice encoder of the vocoder of the first communication node.
Type:
Application
Filed:
August 17, 2012
Publication date:
February 21, 2013
Applicant:
TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)
Inventors:
Rafi Rabipour, Chung-Cheung Chu, Daniel Cohn
Abstract: An apparatus and a method provide a multi-communication service in a mobile communication terminal. The apparatus includes a microphone for inputting an audio signal and a speaker for outputting an audio signal. The apparatus also includes a plurality of communication modules for transmitting and receiving signals to and from a plurality of counterpart terminals. The apparatus further includes an audio processor for, when the multi-communication service is provided, combining and outputting at least two of the audio signals fed from the counterpart terminals providing the multi-communication service and the audio signal input through the microphone.
Abstract: The embodiments of a transcoding method, a transcoding device, and a communication apparatus are provided. The embodiment of a method includes: receiving a bit stream input from a sending end; determining an attribute of discontinuous transmission (DTX) used by a receiving end and a frame type of the input bit stream; and transcoding the input bit stream in a corresponding processing manner according to a determination result. Thereby, a corresponding transcoding operation is performed on the input bit stream according to the attribute of DTX used by the receiving end and the frame type of the input bit stream. In such a manner, input bit streams of various types can be processed, and the input bit streams can be correspondingly transcoded according to the requirements of the receiving end. Therefore, the average computational complexity and peak computational complexity can be effectively decreased without decreasing the quality of the synthesized speech.
Type:
Grant
Filed:
January 21, 2010
Date of Patent:
February 19, 2013
Assignee:
Huawei Technologies Co., Ltd.
Inventors:
Changchun Bao, Hao Xu, Fanrong Tang, Xiangyu Hu
Abstract: The method and system disclosed herein reduce the total bandwidth requirement for communication in a voice over Internet protocol application. Sample [101] and convert [102] the analog input audio signal into digital signals and derive sampled frames [103]. Compute spacings of order statistics [104]. Measure the entropy of each sampled frame [105]. Set a threshold for the entropy [106]. Mark the audio frames as active speech frames or inactive speech frames [107]: mark an audio frame as an inactive speech frame when the entropy is greater than the threshold, and mark the audio frame as an active speech frame when the entropy is less than the threshold [107]. Transmit only the active speech frames [108].
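The entropy computation from spacings of order statistics can be sketched as below. The abstract does not give the exact estimator, so this uses Vasicek's spacing-based entropy estimator, one standard choice; the window parameter `m` and the threshold value are illustrative assumptions.

```python
import numpy as np

def spacing_entropy(frame, m=None):
    """Estimate differential entropy of a frame from spacings of its
    order statistics (Vasicek's estimator, an assumed choice):
    H ~ mean of log( n/(2m) * (x_(i+m) - x_(i-m)) ) over sorted samples,
    with indices clamped at the boundaries."""
    x = np.sort(np.asarray(frame, dtype=float))
    n = len(x)
    if m is None:
        m = max(1, int(np.sqrt(n)))
    idx = np.arange(n)
    upper = np.minimum(idx + m, n - 1)
    lower = np.maximum(idx - m, 0)
    spacings = x[upper] - x[lower]
    spacings[spacings <= 0] = np.finfo(float).tiny  # guard against log(0)
    return float(np.mean(np.log(n / (2.0 * m) * spacings)))

def classify_frames(frames, threshold):
    """Apply the abstract's rule: entropy greater than the threshold
    marks an inactive speech frame; less marks an active speech frame."""
    return ['inactive' if spacing_entropy(f) > threshold else 'active'
            for f in frames]
```

Under this estimator, narrowly concentrated samples yield low entropy and widely spread samples yield high entropy, so a single scalar threshold separates the two frame classes.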
Abstract: In accordance with the embodiments of the present invention, a system and method for enabling preview, editing, and transmission of emergency notification messages is provided. The system includes a controller, a microphone, and a speech-to-text engine for receiving an audio message input to the microphone and for converting the audio message to a text message. The resulting text message is displayed on a local display, where a user can edit the message via a text editor. Text and/or audio notification devices are provided for displaying the edited text data as a text message. Other embodiments are disclosed and claimed.
Type:
Application
Filed:
August 10, 2011
Publication date:
February 14, 2013
Applicant:
SIMPLEXGRINNELL LP
Inventors:
Daniel G. Farley, Matthew Farley, John R. Haynes
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for tagging a known signal of interest. Initially, the system classifies the data from an input signal using a short-term classifier, wherein there are at least two classifications available, a first classification of the data as having no identified outputs and a second classification of the data as at least one potential signal of interest, wherein the short-term classifier also bypasses data that is known to be of no interest. After the short-term classifier classifies the inputs, it collapses the input data that is classified as having no identified outputs. This allows the short-term classifier to create time-variant data. Finally, the system will tag a known signal of interest in the time-variant data that was classified as having at least one potential signal of interest. Therefore, a system for tagging a known signal of interest is described.
Abstract: In one embodiment, a method includes receiving at a communication device an audio communication and a transcribed text created from the audio communication, and generating a mapping of the transcribed text to the audio communication independent of transcribing the audio. The mapping identifies locations of portions of the text in the audio communication. An apparatus for mapping the text to the audio is also disclosed.
Abstract: Estimates of spectral magnitude and phase are obtained by an estimation process using spectral information from analysis filter banks such as the Modified Discrete Cosine Transform. The estimation process may be implemented by convolution-like operations with impulse responses. Portions of the impulse responses may be selected for use in the convolution-like operations to trade off between computational complexity and estimation accuracy. Mathematical derivations of analytical expressions for filter structures and impulse responses are disclosed.