Abstract: Configurations disclosed herein include systems, methods, and apparatus that may be applied in a voice communications and/or storage application to remove, enhance, and/or replace the existing context. Enhancing the context of a voice communication may first include suppressing an existing context component from the digital audio signal to obtain a context-suppressed signal. This signal may then be mixed with a new context signal to create a context-enhanced signal, which may then be encoded before transmission. When this context-enhanced signal includes a speech component, it may be encoded and transmitted at a particular bit rate. When it does not include a speech component, it may also be encoded at a similar bit rate; however, depending on the state of a process control signal, portions of a digital audio signal that lack a speech component may instead be transmitted at a lower bit rate.
Type:
Grant
Filed:
May 29, 2008
Date of Patent:
October 8, 2013
Assignee:
QUALCOMM Incorporated
Inventors:
Nagendra Nagaraja, Khaled El-Maleh, Eddie L. T. Choy
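The abstract above describes three operations: context suppression, mixing with a replacement context, and bit-rate selection driven by speech activity and a process control signal. A minimal sketch of that pipeline follows; the function names, the time-domain subtraction, and the rate labels are illustrative assumptions, not the patented method.

```python
# Illustrative sketch of context replacement with variable-rate encoding.
# All names, the simple time-domain suppression, and the rate labels are
# assumptions for illustration only.

def suppress_context(signal, context_estimate, alpha=1.0):
    """Subtract a scaled context estimate to get a context-suppressed signal."""
    return [s - alpha * c for s, c in zip(signal, context_estimate)]

def mix_context(context_suppressed, new_context, gain=0.5):
    """Mix the suppressed signal with a new context to get a context-enhanced signal."""
    return [s + gain * c for s, c in zip(context_suppressed, new_context)]

def select_bit_rate(frame, speech_detected, low_rate_enabled):
    """Full rate for speech frames; a lower rate for non-speech frames
    when the process control signal permits it."""
    if speech_detected or not low_rate_enabled:
        return "FULL_RATE"
    return "LOW_RATE"
```

The control-signal check mirrors the abstract's point that low-rate transmission of non-speech portions is conditional, not automatic.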
Abstract: An audio encoder implements multi-channel coding decision, band truncation, multi-channel rematrixing, and header reduction techniques to improve quality and coding efficiency. In the multi-channel coding decision technique, the audio encoder dynamically selects between joint and independent coding of a multi-channel audio signal via an open-loop decision based upon (a) energy separation between the coding channels, and (b) the disparity between excitation patterns of the separate input channels. In the band truncation technique, the audio encoder performs open-loop band truncation at a cut-off frequency based on a target perceptual quality measure. In the multi-channel rematrixing technique, the audio encoder suppresses certain coefficients of a difference channel by scaling according to a scale factor, which is based on current average levels of perceptual quality, current rate control buffer fullness, coding mode, and the amount of channel separation in the source.
Type:
Grant
Filed:
August 27, 2009
Date of Patent:
October 8, 2013
Assignee:
Microsoft Corporation
Inventors:
Wei-Ge Chen, Naveen Thumpudi, Ming-Chieh Lee
Abstract: Methods and apparatus to extract data encoded in media content are disclosed. An example method includes sampling a media content signal to generate digital samples, determining a frequency domain representation of the digital samples, determining a first rank of a first frequency in the frequency domain representation, determining a second rank of a second frequency in the frequency domain representation, combining the first rank and the second rank with a set of ranks to create a combined set of ranks, comparing the combined set of ranks to a set of reference sequences, determining the data represented by the combined set of ranks based on the comparison, and storing the data in a memory device.
Type:
Grant
Filed:
December 30, 2011
Date of Patent:
October 8, 2013
Assignee:
The Nielsen Company (US), LLC
Inventors:
Venugopal Srinivasan, Alexander Pavlovich Topchy
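The rank-based extraction described in the abstract above can be sketched compactly: compute a frequency-domain representation, rank the watermark frequencies among a set of candidate bins, accumulate the ranks, and match against reference sequences. The bin choices, the nearest-match rule, and all names below are illustrative assumptions.

```python
# Sketch of rank-based data extraction from an audio signal.
# Candidate bins and the mismatch-count matching rule are assumptions.
import cmath

def dft_magnitudes(samples, bins):
    """Magnitude of the DFT at each candidate frequency bin."""
    n = len(samples)
    mags = {}
    for k in bins:
        acc = sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags[k] = abs(acc)
    return mags

def rank_of(target_bin, mags):
    """Rank 0 = largest magnitude among the candidate bins."""
    ordered = sorted(mags, key=mags.get, reverse=True)
    return ordered.index(target_bin)

def best_reference(combined_ranks, references):
    """Pick the reference sequence with the fewest mismatched ranks."""
    def mismatches(ref):
        return sum(a != b for a, b in zip(combined_ranks, ref))
    return min(references, key=mismatches)
```

In a real detector the comparison would likely be probabilistic rather than a simple mismatch count, but the rank-then-compare flow is the same.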
Abstract: Configurations disclosed herein include systems, methods, and apparatus that may be applied in a voice communications and/or storage application to remove, enhance, and/or replace the existing context. Particularly, certain embodiments contemplate suppressing the context component from the digital audio signal to obtain a context-suppressed signal; generating an audio context signal that is based on a first filter and a first plurality of sequences, each of the first plurality of sequences having a different time resolution; and mixing a first signal that is based on the generated audio context signal with a second signal that is based on the context-suppressed signal to obtain a context-enhanced signal, wherein generating the audio context signal includes applying the first filter to each of the first plurality of sequences.
Abstract: A user device provides dynamic speech processing services during variable network connectivity with a network server. The user device includes a connection determiner that monitors a level of network connectivity between the user device and the network server, a simplified speech processor that processes speech data and is initiated based on a determination by the connection determiner that the level of network connectivity between the user device and the network server is impaired, a memory that stores processed speech data processed by the simplified speech processor, and a transmitter configured to transmit the stored processed speech data. The connection determiner determines when the level of network connectivity between the user device and the network server is no longer impaired.
Abstract: A composite memory card has 4-bit and 1-bit data transfer modes. When the composite memory card is plugged into a general reader, it operates in the 4-bit data transfer mode, exchanging 4-bit-format information signals with the external device through four data pins. When it is plugged into a dedicated reader, it operates in the 1-bit data transfer mode, exchanging 1-bit-format data signals through one data pin; at the same time, it starts its internal audio processing module, which reads the internal audio file stream and transforms it into a voice output signal delivered to the dedicated reader through the other data pins for playback, and receives the voice input signal of the dedicated reader through the data pins and transforms it into an audio file stream imported for storage. The composite memory card thus combines a standard memory card with an audio processing function.
Abstract: A system and method for enabling a user to retrieve, decode, and utilize hidden data embedded in audio signals. An exemplary implementation includes a microphone structured to receive sound waves representative of an audio signal and hidden data embedded in the audio signal. The microphone then converts the received sound waves into an electrical output signal. The system also includes a processor electrically coupled to the microphone and configured to receive the electrical output signal in order to extract the hidden data and provide information represented by the hidden data as an output thereof. A user interface is also provided and is electrically coupled to the processor and configured to receive a first input from the user and activate the processor to selectively initiate extraction of the hidden data. The processor produces as an output the information represented by the hidden data. Finally, the system includes a user presentation mechanism configured to present the information to the user.
Abstract: Disclosed is a method for enrolling a voiceprint in a fraudster database, the method comprising: a) defining a fraud model comprising at least one hypothesis indicative of a fraudulent transaction; b) processing audio data based on the fraud model to identify at least one suspect voiceprint in the audio data suspected of belonging to a fraudster; and c) enrolling the at least one suspect voiceprint in the fraudster database.
Type:
Application
Filed:
May 21, 2013
Publication date:
September 26, 2013
Inventors:
Richard Gutierrez, Anthony Rajakumar, Lisa Marie Guerra, David Hartig
Abstract: Methods and apparatus for voice and data interlacing in a system having a shared antenna. In one embodiment, a voice and data communication system has a shared antenna for transmitting and receiving information in time slots, wherein the antenna can only be used for transmit or receive at a given time. The system determines timing requirements for data transmission and reception and interrupts data transmission for transmission of speech in selected intervals while meeting the data transmission timing and throughput requirements. The speech can be manipulated to fit within the selected intervals while preserving the intelligibility of the manipulated speech.
Type:
Application
Filed:
March 21, 2012
Publication date:
September 26, 2013
Applicant:
Raytheon Company
Inventors:
David R. Peterson, Timothy S. Loos, David F. Ring, James F. Keating
Abstract: Method and apparatus for processing audio signals are provided. The method for decoding an audio signal includes receiving filter information, applying spatial information to the filter information to generate surround converting information, and outputting the surround converting information. The apparatus for decoding an audio signal includes a filter information receiving part receiving filter information; an information converting part applying spatial information to the filter information to generate surround converting information; and a surround converting information output part outputting the surround converting information.
Type:
Grant
Filed:
May 26, 2006
Date of Patent:
September 24, 2013
Assignee:
LG Electronics Inc.
Inventors:
Hyen O Oh, Hee Suk Pang, Dong Soo Kim, Jae Hyun Lim, Yang-Won Jung
Abstract: A multi-channel signal enhancement system reinforces signal content and improves the signal-to-noise ratio of a multi-channel signal. The system detects, tracks, and reinforces non-stationary periodic signal components of a multi-channel signal. The periodic signal components of the signal may represent vowel sounds or other voiced sounds. The system may detect, track, or attenuate quasi-stationary signal components in the multi-channel signal.
Abstract: A computerized system for advising one communicant in electronic communication between two or more communicants has apparatus monitoring and recording interaction between the communicants, and software executing from a machine-readable medium and providing analytics, the software functions including rendering speech into text, analyzing the rendered text for topics, performing communicant verification, and detecting changes in communicant emotion. Advice is offered to the one communicant during the interaction, based on results of the analytics.
Type:
Application
Filed:
May 6, 2013
Publication date:
September 19, 2013
Applicant:
GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
Abstract: A method for reducing the call power consumption of a mobile terminal, and a corresponding mobile terminal, are disclosed in the present invention. In the method, during a voice call the mobile terminal performs voiceprint modeling on an audio signal collected by the mobile terminal itself to obtain a voiceprint model, and judges whether the obtained voiceprint model matches a stored voiceprint model of the user. If they do not match, the terminal forgoes wireless transmission of the collected audio signal, or forgoes baseband and radio frequency processing as well as wireless transmission of the collected audio signal; if they match, it performs the baseband and radio frequency processing and wireless transmission of the audio signal. With the present invention, the voice call power consumption of the mobile terminal is reduced, the battery usage time of the mobile terminal is extended, and user experience is enhanced.
Abstract: A core network connected to a mobile communication network and establishing voice communication between communication apparatuses receives a connection request that includes an identifier identifying a terminating communication apparatus, from the mobile communication network to which an originating mobile communication apparatus is connected, and temporarily signals to the originating mobile communication apparatus a first codec candidate that the originating apparatus should use. Then, the core network determines at least one codec that can be used in the terminating communication apparatus.
Abstract: A parameter decoding apparatus includes a prediction residue decoder that finds a quantized prediction residue based on encoded information included in a current frame subject to decoding, and an auto-regressive predictor that produces a predicted parameter by multiplying a predictive coefficient with a past decoded parameter. An adder decodes a parameter by adding the quantized prediction residue and the predicted parameter, wherein the prediction residue decoder, when the current frame is erased, finds a current-frame quantized prediction residue from a weighted linear sum of a parameter decoded in the past and a future-frame quantized prediction residue.
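The predictive decoding and erasure-concealment structure in the abstract above can be sketched in a few lines. The AR(1) form, the coefficient value, and the equal weights are illustrative assumptions; the patent specifies only the general structure, not these numbers.

```python
# Sketch of predictive parameter decoding with frame-erasure concealment.
# PRED_COEFF and the 0.5/0.5 weights are invented for illustration.

PRED_COEFF = 0.7  # assumed AR(1) predictive coefficient

def decode_parameter(residue, past_param):
    """Decoded parameter = predicted parameter + quantized prediction residue."""
    return PRED_COEFF * past_param + residue

def concealed_residue(past_param, future_residue, w_past=0.5, w_future=0.5):
    """When the current frame is erased, substitute a weighted linear sum
    of a past decoded parameter and a future-frame quantized residue."""
    return w_past * past_param + w_future * future_residue
```

The point of the weighted sum is that the substituted residue stays consistent with both the decoder's past state and the next frame's received data, reducing the audible discontinuity after an erasure.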
Abstract: There is provided a method of encoding audio and including said encoded audio into a digital transport stream, comprising receiving at an encoder input a plurality of temporally co-located audio signals, assigning identical time stamps per unit time to all of the plurality of temporally co-located audio signals and incorporating the identically time stamped audio signals into the digital transport stream. There is also provided a method decoding said encoded data, and encoding apparatus and decoding apparatus.
Abstract: An audio decoder for decoding a multi-audio-object signal having an audio signal of a first type and an audio signal of a second type encoded therein is described, the multi-audio-object signal having a downmix signal and side information, the side information having level information of the audio signals of the first and second types in a first predetermined time/frequency resolution, and a residual signal specifying residual level values in a second predetermined time/frequency resolution, the audio decoder having a processor for computing prediction coefficients based on the level information; and an up-mixer for up-mixing the downmix signal based on the prediction coefficients and the residual signal to obtain a first up-mix audio signal approximating the audio signal of the first type and/or a second up-mix audio signal approximating the audio signal of the second type.
Type:
Grant
Filed:
January 23, 2013
Date of Patent:
September 17, 2013
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V.
Inventors:
Oliver Hellmuth, Johannes Hilpert, Leon Terentiv, Cornelia Falch, Andreas Hoelzer, Juergen Herre
Abstract: A method of receiving an audio signal includes measuring a periodicity of the audio signal to determine a checked periodicity. At least one best available subband is determined. At least one extended subband is composed, wherein composing includes reducing a ratio of composed harmonic components to composed noise components if the checked periodicity is lower than a threshold, and scaling a magnitude of the at least one extended subband based on a spectral envelope of the audio signal.
Abstract: In one embodiment, a method of transceiving an audio signal is disclosed. The method includes providing low band spectral information having a plurality of spectrum coefficients and predicting a high band extended spectral fine structure from the low band spectral information for at least one subband, where the high band extended spectral fine structure is made of a plurality of spectrum coefficients. The predicting includes preparing the spectrum coefficients of the low band spectral information, defining prediction parameters for the high band extended spectral fine structure and index ranges of the prediction parameters, and determining possible best indices of the prediction parameters, where determining includes minimizing a prediction error between a reference subband in the high band and a predicted subband that is selected and composed from an available low band. The possible best indices of the prediction parameters are transmitted.
Abstract: A scalable speech and audio codec is provided that implements combinatorial spectrum encoding. A residual signal is obtained from a Code Excited Linear Prediction (CELP)-based encoding layer, where the residual signal is a difference between an original audio signal and a reconstructed version of the original audio signal. The residual signal is transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum having a plurality of spectral lines. The spectral lines of the transform spectrum are encoded using a combinatorial position coding technique. The combinatorial position coding technique includes generating a lexicographic index for a selected subset of spectral lines, where each lexicographic index represents one of a plurality of possible binary strings representing the positions of the selected subset of spectral lines. The lexicographic index represents non-zero spectral lines in a binary string in fewer bits than the length of the binary string.
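The lexicographic indexing described above is the standard combinadic ranking of k-subsets: the positions of k non-zero lines out of n map to an integer in [0, C(n, k)), which needs only ceil(log2(C(n, k))) bits instead of n bits for the full occupancy string. The ranking below is the textbook construction; its pairing with a CELP/DCT codec here is only what the abstract states, and the function names are our own.

```python
# Combinadic (lexicographic) ranking of spectral-line positions.
from math import comb

def positions_to_index(positions, n):
    """Lexicographic rank of a sorted k-subset of range(n)."""
    index = 0
    prev = -1
    k = len(positions)
    for j, p in enumerate(positions):
        # Count all subsets whose j-th element is smaller than p.
        for q in range(prev + 1, p):
            index += comb(n - q - 1, k - j - 1)
        prev = p
    return index

def index_to_positions(index, n, k):
    """Inverse mapping: recover the sorted k-subset from its rank."""
    positions = []
    p = 0
    while k > 0:
        c = comb(n - p - 1, k - 1)  # subsets that include position p next
        if index < c:
            positions.append(p)
            k -= 1
        else:
            index -= c
        p += 1
    return positions
```

For example, with n = 4 and k = 2 there are C(4, 2) = 6 subsets, so 3 bits suffice where the raw occupancy string needs 4.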
Abstract: A computer-implemented method of automatically generating an electronic reminder is provided. The method includes identifying, using term-recognition circuitry, at least one key term within an electronic message received with an electronic communications device. The method further includes generating at least one reminder based upon the at least one key term. One or more reminders are, according to the method, electronically conveyed to a user at a time later than when the message was received.
Type:
Grant
Filed:
July 1, 2008
Date of Patent:
September 3, 2013
Assignee:
International Business Machines Corporation
Inventors:
Lisa Bradley, Geetika Tandon, FuYi Li, Valerie Bennett
Abstract: A system and method for providing an audio representation of a name includes providing a list of a plurality of users of a network and respective presence information regarding each of the plurality of users; receiving a request from an endpoint to receive an audio representation of a name of a particular user of the plurality of users, and providing the audio representation to the endpoint. Moreover, the audio representation of the name at least generally approximates a pronunciation of the name as pronounced by the particular user.
Abstract: A system and method provide an audio/video coding system for adaptively transcoding audio streams based on content characteristics of the audio streams. An audio stream metadata extraction module of the system is configured to extract metadata of a source audio stream. An audio stream classification module of the system is configured to classify the source audio stream into one of several audio content categories based on the metadata of the source audio stream. An adaptive audio encoder of the system is configured to determine one or more transcoding parameters including target bitrate and sampling rate based on the metadata and classification of the source audio stream. An adaptive audio transcoder of the system is configured to transcode the source audio stream into an output audio stream using the transcoding parameters.
Abstract: Embodiments of methods, apparatuses, devices and systems associated with encoding and/or decoding audio data are disclosed. More particularly, the claimed subject matter relates, at least in part, to a data compression/decompression method or technique, such as a lossless, approximately lossless, and/or relatively lossless data compression/decompression method or technique, for example, along with systems or apparatuses that may relate to such a method or technique. The disclosed techniques and methods may achieve audio data compression ratios that may be comparable to lossless compression processes. In addition, under certain circumstances, such compression ratios may be achieved while also reducing or simplifying the computational complexity of the compression and/or decompression method or technique.
Abstract: A voice processing apparatus is provided in an ADPCM (Adaptive Differential Pulse Code Modulation) voice transmission system in which voice data that is differentially quantized through an ADPCM scheme is transmitted. The voice processing apparatus includes an error detector which detects whether or not an error occurs in a transmission frame containing voice data that indicates a differential value, and an error determiner which determines a level of the error detected by the error detector when the error detector detects the error. The voice processing apparatus also includes a voice processor which corrects the voice data with a correction value depending on the level of the error detected by the error detector, and an ADPCM decoder which decodes the voice data corrected by the voice processor.
Abstract: Codebook indices for a scalable speech and audio codec may be efficiently encoded based on anticipated probability distributions for such codebook indices. A residual signal from a Code Excited Linear Prediction (CELP)-based encoding layer may be obtained, where the residual signal is a difference between an original audio signal and a reconstructed version of the original audio signal. The residual signal may be transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum. The transform spectrum is divided into a plurality of spectral bands, where each spectral band has a plurality of spectral lines. A plurality of different codebooks are then selected for encoding the spectral bands, where each codebook is associated with a codebook index. A plurality of codebook indices associated with the selected codebooks are then encoded together to obtain a descriptor code that more compactly represents the codebook indices.
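The joint-encoding idea in this abstract can be sketched with a toy descriptor table: when the distribution of codebook indices is skewed, frequent combinations of adjacent-band indices can share one short descriptor code. The frequency-ranked table below is an illustrative stand-in for the codec's actual probability-derived tables.

```python
# Sketch of descriptor coding for pairs of codebook indices.
# The frequency-ranked table is an assumption; a real codec would derive
# fixed tables from anticipated index probabilities.

def build_descriptor_table(pair_counts):
    """Rank index pairs by observed/expected frequency; the rank serves
    as the descriptor code, so common pairs get small codes."""
    ranked = sorted(pair_counts, key=pair_counts.get, reverse=True)
    return {pair: rank for rank, pair in enumerate(ranked)}

def encode_pairs(indices, table):
    """Encode consecutive codebook-index pairs as descriptor codes."""
    pairs = list(zip(indices[0::2], indices[1::2]))
    return [table[p] for p in pairs]
```

With a subsequent variable-length code over the descriptors, small (frequent) descriptors cost fewer bits than transmitting each codebook index independently.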
Abstract: Historically, most audio recording and communication control has been exerted through the use of physical buttons, sliders and knobs, e.g. to start/stop recording or communicating, control speaker and microphone volume settings, etc. The present invention describes improvements to this approach, such as detecting and analyzing audio signals with human voice components, e.g. to start/stop recording and communicating, set local and remote recording and playback volumes and filters, and manage metadata associated with temporal ranges in audio streams. Audio signals have historically been seen as largely real-time, to be analyzed, acted on, recorded, transmitted, etc. immediately. The present invention treats audio signals as a buffered continuum, so that the system has historical access to audio signals and metadata, and may act on both past and future audio signals and metadata.
Abstract: A method of processing a signal is disclosed. The present invention includes receiving at least one of a first signal and a second signal, obtaining mode information and modification flag information indicating whether the first signal is modified, if it is determined as an audio coding scheme according to the mode information, decoding the first signal by the audio coding scheme, if the first signal is modified based on the modification flag information, reconstructing the first signal by applying modification reconstruction information to the first signal, determining an extension base signal corresponding to a partial region of the first signal based on extension information, and generating an extended downmix signal having a bandwidth extended by reconstructing a high frequency region signal using the extension base signal and the extension information.
Abstract: The invention provides a system, method and computer program for sharing audible word tags. The word may be an individual's name or information conveyed through a series of words. An audible word tag may be recorded by an individual. The audible word tag may be embedded in electronic correspondence and/or documents for sharing with others, or accessed dynamically via the internet and/or other applicable network connectivity on an as-required basis. The method includes generating a profile for associating one or more words to an audible word tag. An audio recording is made of the one or more words. The audio recording is linked to the profile. The audible word tag is linked to one or more items of electronic correspondence or print. The audible word tag is accessible by a receiver of the correspondence to initiate the playback of the audio recording.
Abstract: An exemplary recording method receives the personal information of a speaker transmitted from an RFID tag through an RFID reader. Then the method receives the voice of the speaker through a microphone. The method next receives the personal information of the speaker and the identifier of the audio input device transmitted from the audio input device, and associates the personal information of the speaker with the received identifier of the audio input device. Then, the method receives the voice and the identifier of the audio input device transmitted from the audio input device. The method further converts the received voice to text. The method determines the personal information corresponding to the identifier of the audio input device received with the voice, and associates the converted text with the determined personal information to generate a record.
Type:
Application
Filed:
August 2, 2012
Publication date:
August 1, 2013
Applicants:
HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (ShenZhen) CO., LTD
Abstract: A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.
Abstract: Language dictation recognition systems and methods for using the same. In at least one exemplary system for analyzing verbal records, the system comprises a database capable of receiving a plurality of verbal records, each verbal record comprising at least one identifier and at least one verbal feature, and a processor operably coupled to the database, where the processor has and executes a software program. The processor is operable to identify a subset of the plurality of verbal records from the database, extract at least one verbal feature from the identified records, analyze the at least one verbal feature of the subset of the plurality of verbal records, process the subset of the plurality of records using the analyzed feature according to at least one reasoning approach, generate a processed verbal record using the processed subset of the plurality of records, and deliver the processed verbal record to a recipient.
Type:
Application
Filed:
October 5, 2011
Publication date:
July 25, 2013
Inventors:
Nick Mahurin, Nathan Lindle, Markus Dickinson, Sandra Kuebler
Abstract: A conference bridge (1) is provided for managing an audio scene comprising two or more participants, the conference bridge comprising a mixer (2) and several user channels (3a, 3b, 3N). The conference bridge is arranged to continuously create a 3D positional audio environment signal for each participant as a listening participant, by rendering the speech of each participant as a 3D positioned virtual sound source and excluding the speech of the listening participant, and to distribute each created 3D positional audio environment signal to the corresponding listening participant. Further, the conference bridge is arranged to place the virtual sound source corresponding to each participant at the same spatial position relative to the listening participant in every created 3D positional audio environment.
Type:
Grant
Filed:
October 9, 2008
Date of Patent:
July 23, 2013
Assignee:
Telefonaktiebolaget LM Ericsson (publ)
Inventors:
Patrik Sandgren, Anders Eriksson, Tommy Falk
Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability.
Abstract: A method for generating background noise and a noise processing apparatus are provided in order to improve user experience. The method includes: if an obtained signal frame is a noise frame, obtaining a high band noise encoding parameter from the noise frame; performing weighting and/or smoothing on the high band noise encoding parameter to obtain a second high band noise encoding parameter; and generating a high band background noise signal according to the second high band noise encoding parameter. A noise processing apparatus is also provided.
Abstract: Improved audio classification is provided for encoding applications. An initial classification is performed, followed by a finer classification, to produce speech classifications and music classifications with higher accuracy and less complexity than previously available. Audio is classified as speech or music on a frame by frame basis. If the frame is classified as music by the initial classification, that frame undergoes a second, finer classification to confirm that the frame is music and not speech (e.g., speech that is tonal and/or structured that may not have been classified as speech by the initial classification). Depending on the implementation, one or more parameters may be used in the finer classification. Example parameters include voicing, modified correlation, signal activity, and long term pitch gain.
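The two-stage decision flow in the abstract above is easy to sketch: a cheap initial decision, then a finer check applied only to frames the first stage labelled music, using parameters such as voicing and long-term pitch gain. The abstract names these parameters but not their thresholds, so every numeric value below is an invented placeholder.

```python
# Sketch of cascaded speech/music classification.
# All thresholds are illustrative assumptions, not values from the patent.

def initial_classification(frame_energy_ratio):
    """Coarse per-frame decision from a single energy-based feature."""
    return "speech" if frame_energy_ratio > 0.6 else "music"

def finer_classification(voicing, pitch_gain):
    """Second pass: reclassify tonal/structured speech that the initial
    stage mistook for music."""
    if voicing > 0.8 and pitch_gain > 0.5:
        return "speech"
    return "music"

def classify_frame(frame_energy_ratio, voicing, pitch_gain):
    label = initial_classification(frame_energy_ratio)
    if label == "music":  # only music frames pay for the finer pass
        label = finer_classification(voicing, pitch_gain)
    return label
```

Running the finer classifier only on initially-music frames is what keeps the cascade cheaper than a single high-accuracy classifier applied everywhere.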
Abstract: A speech masking apparatus includes a microphone and a speaker. The microphone can detect a human voice. The speaker can output a masking language which can include phonemes resembling human speech. At least one component of the masking language can have a pitch, a volume, a theme, and/or a phonetic content substantially matching a pitch, a volume, a theme, and/or a phonetic content of the voice.
Abstract: A quick response (QR) proxy and protocol gateway for interfacing with a carrier network, a QR-equipped device, and a contact center and contact center database is disclosed. A data link is connected to a carrier network to receive QR codes and other data. Additional data links are connected to a contact center database and a QR-equipped device to obtain information used in determining routing and tagging instructions. A user interface is connected to the gateway to accept configurable conditions for determining routing instructions. There are text conversion and speech conversion functions for each target enterprise contact center. Synchronization of stored user preferences with automated or semi-automated customer service routes is provided by a consumer preference template system.
Abstract: A method and apparatus for a computer and telecommunication network which can receive, send and manage information from or to a subscriber of the network, based on the subscriber's configuration. The network is made up of at least one cluster containing voice servers which allow for telephony, speech recognition, text-to-speech and conferencing functions, and is accessible by the subscriber through standard telephone connections or through internet connections. The network also utilizes a database and file server allowing the subscriber to maintain and manage certain contact lists and administrative information. A web server is also connected to the cluster thereby allowing access to all functions through internet connections.
Abstract: An input frame data producing unit produces, from data stored in an input buffer, input frames each including a predetermined number of sub-frames of a first hopsize determined based on the first frame size and the overlapping rate. A frame processing unit applies a window function to the input frames, shifts the windowed input frames by the first hopsize, and overlaps the shifted input frames, storing the overlapped frames in an output frame. An output buffer data producing unit stores data from the output frame to an output buffer including a predetermined number of sub-frames of a second hopsize. A CPU sets the first hopsize and the overlapping rate differently in slow-speed reproduction, when the reproducing speed ratio is set lower than 1, than in high-speed reproduction, when the reproducing speed ratio is set larger than 1.
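The windowed shift-and-overlap processing described above is the classic overlap-add pattern. A minimal sketch, assuming a Hann window and separate analysis/synthesis hopsizes (the function and parameter names are hypothetical; choosing the output hopsize larger or smaller than the input hopsize is what stretches or compresses the time scale):

```python
import numpy as np

def overlap_add(input_sig, frame_size, hop_in, hop_out):
    """Window input frames taken every hop_in samples, then overlap-add them
    at hop_out spacing into the output buffer (illustrative sketch)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(input_sig) - frame_size) // hop_in
    out = np.zeros(hop_out * (n_frames - 1) + frame_size)
    for i in range(n_frames):
        frame = input_sig[i * hop_in : i * hop_in + frame_size] * window
        out[i * hop_out : i * hop_out + frame_size] += frame
    return out
```

With `hop_out > hop_in` the windowed frames are spread further apart, lengthening the output (slow-speed reproduction, ratio below 1); with `hop_out < hop_in` they are packed closer together (high-speed reproduction, ratio above 1).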
Abstract: A method (700, 800) and apparatus (100, 200) processes audio frames to transition between different codecs. The method can include producing (720), using a first coding method, a first frame of coded output audio samples by coding a first audio frame in a sequence of frames. The method can include forming (730) an overlap-add portion of the first frame using the first coding method. The method can include generating (740) a combination first frame of coded audio samples based on combining the first frame of coded output audio samples with the overlap-add portion of the first frame. The method can include initializing (760) a state of a second coding method based on the combination first frame of coded audio samples. The method can include constructing (770) an output signal based on the initialized state of the second coding method.
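The combination step in this abstract, folding the overlap-add portion back into the first codec's output before seeding the second codec, can be sketched as below. This is an illustrative stand-in, not the patented transition logic; the function names and the assumption that the overlap-add portion aligns with the tail of the frame are hypothetical.

```python
def transition_frame(first_codec_frame, overlap_add_portion):
    """Combine the first coding method's output frame with its overlap-add
    portion to form the combination frame used to initialize the second
    coding method's state (sketch; tail alignment is an assumption)."""
    n = len(overlap_add_portion)
    combined = list(first_codec_frame)
    for i in range(n):
        combined[-n + i] += overlap_add_portion[i]
    return combined
```

The second codec would then be initialized from `combined` rather than from silence, avoiding a discontinuity at the codec boundary.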
Abstract: A device receives voice and/or data from a speaker, such as a politician, and presents a signal indicative of the accuracy of the speaker's statements. The device may be a mobile device, such as a smart phone, or a fixed device, such as a TV set. The device compares a speaker segment, automatically selected from the speaker statement, with a factual segment, automatically selected from a database comprising stored facts, and presents the accuracy of the speaker statement to the user of the device. The device may be configured so that the user may manually select the speaker segment to be assessed by the device.
Abstract: A system and method for redundant transmission is provided. In one embodiment, an input signal S is encoded as a list of fragments. Each fragment includes an index value and a projection value. The index points to an entry in a dictionary of signal elements. A repetition factor is assigned to each fragment based on its importance. After a fragment is added, a reconstructed signal is generated by decoding the list of fragments. Encoding terminates once the reconstructed signal is sufficiently close to the original signal S.
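The encode loop in this abstract (pick a dictionary element, record its index and projection, assign a repetition factor, stop when the reconstruction is close enough) resembles a greedy matching-pursuit pass. A minimal sketch under that assumption; the tolerance, the importance rule for repetition factors, and all names are hypothetical:

```python
import numpy as np

def encode_fragments(signal, dictionary, tol=1e-3, max_frags=50):
    """Greedily encode `signal` as (index, projection, repetition) fragments
    over the rows of `dictionary`; stop once the residual is small (sketch)."""
    residual = signal.astype(float).copy()
    fragments = []
    for _ in range(max_frags):
        projections = dictionary @ residual
        idx = int(np.argmax(np.abs(projections)))
        coeff = projections[idx]
        residual -= coeff * dictionary[idx]
        # illustrative importance rule: larger projections repeat more often
        repetition = 3 if abs(coeff) > 1.0 else 1
        fragments.append((idx, coeff, repetition))
        if np.linalg.norm(residual) <= tol * np.linalg.norm(signal):
            break
    return fragments

def decode_fragments(fragments, dictionary, n):
    """Reconstruct the signal by summing the referenced dictionary elements."""
    out = np.zeros(n)
    for idx, coeff, _ in fragments:
        out += coeff * dictionary[idx]
    return out
```

Transmitting important fragments several times (the repetition factor) is what buys redundancy: losing one copy of a high-energy fragment does not prevent reconstruction.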
Abstract: A device may include logic configured to receive a first speech input from a party, to compare the first speech input to a second speech input to produce a result, and to determine if the party is impaired based on the result.
Abstract: This specification describes technologies relating to multi core processing for parallel speech-to-text processing. In some implementations, a computer-implemented method is provided that includes the actions of receiving an audio file; analyzing the audio file to identify portions of the audio file as corresponding to one or more audio types; generating a time-ordered classification of the identified portions, the time-ordered classification indicating the one or more audio types and position within the audio file of each portion; generating a queue using the time-ordered classification, the queue including a plurality of jobs where each job includes one or more identifiers of a portion of the audio file classified as belonging to the one or more speech types; distributing the jobs in the queue to a plurality of processors; performing speech-to-text processing on each portion to generate a corresponding text file; and merging the corresponding text files to generate a transcription file.
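The classify/queue/distribute/merge pipeline above can be sketched as follows. This is an illustrative stand-in: the portion dictionary shape, the stand-in `transcribe` function, and the use of a thread pool (rather than whatever multi-core dispatch the specification actually describes) are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def build_queue(portions):
    """Keep only speech-type portions, preserving time order (sketch)."""
    return [p for p in portions if p["type"] == "speech"]

def transcribe(portion):
    """Stand-in for per-portion speech-to-text processing."""
    return f"[text for {portion['start']}-{portion['end']}]"

def parallel_transcribe(portions, workers=4):
    """Distribute queued jobs across workers, then merge the per-portion
    text outputs, in time order, into one transcription."""
    jobs = build_queue(portions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(transcribe, jobs))
    return "\n".join(texts)
```

Because `pool.map` returns results in submission order, the merged transcription stays time-ordered even though portions finish out of order.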
Abstract: A method for gracefully extending the range and/or capacity of voice communication systems is disclosed. The method involves the persistent storage of voice media on a communication device. When the usable bit rate on the network is poor and below that necessary for conducting a live conversation, voice media is transmitted and received by the communication device at the available usable bit rate on the network. Although latency may be introduced, the persistent storage of both transmitted and received media of a conversation provides the ability to extend the useful range of wireless networks beyond what is required for live conversations. In addition, for both wired and wireless communications, capacity is increased and robustness is improved because the system is not affected by external interference.
Abstract: Provided are, among other things, systems, methods and techniques for decoding an audio signal from a frame-based bit stream. At least one frame includes processing information pertaining to the frame and entropy-encoded quantization indexes representing audio data within the frame. The processing information includes: (i) code book indexes, and (ii) code book application information specifying ranges of entropy-encoded quantization indexes to which the code books are to be applied. The entropy-encoded quantization indexes are decoded by applying the identified code books to the corresponding ranges of entropy-encoded quantization indexes.
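The decoding step described here, applying each code book to the index range the header names for it, can be sketched as below. The data shapes are hypothetical: real entropy decoding operates on a bit stream, whereas this sketch treats each code book as a simple lookup table over already-separated indexes.

```python
def decode_indexes(encoded, codebooks, applications):
    """Decode entropy-encoded quantization indexes.

    `applications` is a list of (start, end, codebook_id) half-open ranges:
    the code book application information from the frame's processing
    information (names hypothetical).
    """
    out = [None] * len(encoded)
    for start, end, cb_id in applications:
        table = codebooks[cb_id]  # code book selected by its index
        for i in range(start, end):
            out[i] = table[encoded[i]]
    return out
```

Carrying the (range, code book) pairs in the frame header lets the encoder switch code books mid-frame wherever the index statistics change.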
Abstract: A parameter decoding device performs a parameter compensation process so as to suppress degradation of a main observation quality in a prediction quantization. The parameter decoding device includes first amplifiers which multiply inputted quantization prediction residual vectors by a weighting coefficient. A further amplifier multiplies the preceding frame decoded LSF vector y_(n-1) by the weighting coefficient. An additional amplifier multiplies the code vector x_(n+1) outputted from a codebook by the weighting coefficient β_0. An adder calculates the total of the vectors outputted from the amplifiers, the further amplifier, and the additional amplifier. A selector switch selects the vector outputted from the adder if the frame erasure coding B_n of the current frame indicates that ‘the n-th frame is an erased frame’ and the frame erasure coding B_(n+1) of the next frame indicates that ‘the n+1-th frame is a normal frame’.
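The amplifier-and-adder structure above is a weighted sum used to conceal an erased frame when the next frame has already arrived. A minimal sketch, assuming plain vectors as lists; the function name, argument names, and example weights are hypothetical:

```python
def conceal_erased_frame(residuals, res_weights, y_prev, w_prev, x_next, beta0):
    """Weighted-sum compensation for an erased frame n when frame n+1 is normal.

    Combines the weighted quantization prediction residual vectors
    (`residuals` x `res_weights`), the preceding frame's decoded LSF vector
    (`y_prev` x `w_prev`), and the next frame's code vector
    (`x_next` x `beta0`), mirroring the amplifiers and adder in the abstract.
    """
    dim = len(y_prev)
    total = [w_prev * y_prev[i] + beta0 * x_next[i] for i in range(dim)]
    for w, r in zip(res_weights, residuals):
        for i in range(dim):
            total[i] += w * r[i]
    return total
```

The selector switch in the abstract would output this compensated vector only for the erased-frame/normal-next-frame case, and the normally decoded vector otherwise.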
Abstract: An apparatus for synthesizing speech includes a waveform memory that stores a plurality of speech unit waveforms, an information memory that correspondingly stores speech unit information and an address of each of the speech unit waveforms, a selector that selects a speech unit sequence corresponding to the input phoneme sequence by referring to the speech unit information, a speech unit waveform acquisition unit that acquires a speech unit waveform corresponding to each speech unit of the speech unit sequence from the waveform memory by referring to the address, and a speech unit concatenation unit that generates the speech by concatenating the acquired speech unit waveforms.
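The select/fetch-by-address/concatenate flow can be sketched as below. This collapses the selector to a direct phoneme-to-unit lookup for illustration; the data structures and names are hypothetical, and a real unit-selection synthesizer would choose among multiple candidate units per phoneme.

```python
def synthesize(phonemes, unit_info, waveform_memory):
    """Look up each unit's stored address in the information memory, fetch
    its waveform from the waveform memory, and concatenate (sketch)."""
    out = []
    for ph in phonemes:
        addr = unit_info[ph]            # address stored with the unit info
        out.extend(waveform_memory[addr])  # fetch waveform by address
    return out
```

Keeping addresses in a separate information memory means the (large) waveform memory is touched only for the units actually selected.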
Abstract: A method and system of speech recognition presented by a back channel from multiple user sites within a network supporting cable television and/or video delivery is disclosed.
Type:
Grant
Filed:
November 3, 2011
Date of Patent:
June 25, 2013
Assignee:
Promptu Systems Corporation
Inventors:
Theodore Calderone, Paul M. Cook, Mark J. Foster