Abstract: The APPARATUSES, METHODS AND SYSTEMS FOR A DIGITAL CONVERSATION MANAGEMENT PLATFORM (“DCM-Platform”) transforms digital dialogue from consumers, client demands, and Internet search inputs via DCM-Platform components into tradable digital assets and client-needs-based artificial intelligence campaign plan outputs. In one implementation, the DCM-Platform may capture and examine conversations between individuals and artificial intelligence conversation agents. These agents may be viewed as assets. One can measure the value and performance of these agents by assessing their ability to generate revenue by prolonging conversations and/or to effect sales through conversations with individuals.
Type:
Application
Filed:
May 28, 2014
Publication date:
April 23, 2015
Inventors:
Andrew Peter Nelson Jerram, Frederick Francis McMahon
Abstract: Disclosed is a method for controlling a cordless telephone device for use in a system that allows remote control of a home electric appliance. The method includes a first generation step of causing a first generation unit in a handset to encode audio input via a sound receiving unit in the handset to generate a first stream, and a first transmission step of transmitting the first stream to a base unit. The first generation step includes causing the first generation unit to generate instruction bit information and a first instruction stream when a first trigger indicating a request to start the remote control is given to the first generation unit. The first transmission step includes transmitting the instruction bit information and the first instruction stream to the base unit through a multiplexing scheme that is common to transmission of a first stream generated when the first trigger is not given.
Abstract: An audio encoder has a window function controller, a windower, a time warper with a final quality check functionality, a time/frequency converter, a TNS stage, and a quantizer encoder. The window function controller, the time warper, the TNS stage, and an additional noise filling analyzer are controlled by signal analysis results obtained by a time warp analyzer or a signal classifier. Furthermore, a decoder applies a noise filling operation using a manipulated noise filling estimate that depends on a harmonic or speech characteristic of the audio signal.
Type:
Grant
Filed:
January 11, 2011
Date of Patent:
April 21, 2015
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors:
Stefan Bayer, Sascha Disch, Ralf Geiger, Guillaume Fuchs, Max Neuendorf, Gerald Schuller, Bernd Edler
Abstract: A method and system for performing sample rate conversion is provided. The method may include configuring a system to convert a sample rate of a first audio channel of a plurality of audio channels to produce a first audio stream of samples. The system may be dynamically reconfigured to convert a sample rate of a second of the plurality of audio channels to produce a second audio stream of samples, wherein the first and second audio streams are output from the system at the same time. The method may further include arbitrating between requests for additional data from the first and second audio streams of samples, where processing of the first channel is suspended when the request corresponds to a second channel that is of higher priority.
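The arbitration step described in this abstract can be sketched as follows. This is a minimal Python illustration; the channel identifiers, priority values, and the shape of the request queue are assumptions made for the example, not details taken from the patent.

```python
# A minimal sketch of priority arbitration between audio channel requests.
# Channel ids and priorities are illustrative assumptions.

def arbitrate(requests, priorities):
    """Pick the pending request whose channel has the highest priority.

    requests   -- list of channel ids with outstanding data requests
    priorities -- dict mapping channel id to priority (higher wins)
    Returns (granted_channel, suspended_channels).
    """
    if not requests:
        return None, []
    granted = max(requests, key=lambda ch: priorities.get(ch, 0))
    # Processing of the other channels is suspended for this cycle.
    suspended = [ch for ch in requests if ch != granted]
    return granted, suspended

# Channel 2 outranks channel 1, so channel 1's processing is suspended.
granted, suspended = arbitrate([1, 2], {1: 0, 2: 5})
print(granted, suspended)  # → 2 [1]
```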
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating direct speech messages based on voice commands that include indirect speech messages. In one aspect, a method includes receiving a voice input corresponding to an utterance. A determination is made whether a transcription of the utterance includes a command to initiate a communication to a user and a segment that is classified as indirect speech. In response to determining that the transcription of the utterance includes the command and the segment that is classified as indirect speech, the segment that is classified as indirect speech is provided as input to a machine translator. In response to providing the segment that is classified as indirect speech to the machine translator, a direct speech segment is received from the machine translator. A communication is initiated that includes the direct speech segment.
Abstract: According to certain embodiments, training a transcription system includes accessing recorded voice data of a user from one or more sources. The recorded voice data comprises voice samples. A transcript of the recorded voice data is accessed. The transcript comprises text representing one or more words of each voice sample. The transcript and the recorded voice data are provided to a transcription system to generate a voice profile for the user. The voice profile comprises information used to convert a voice sample to corresponding text.
Type:
Grant
Filed:
May 5, 2010
Date of Patent:
April 14, 2015
Assignee:
Cisco Technology, Inc.
Inventors:
Todd C. Tatum, Michael A. Ramalho, Paul M. Dunn, Shantanu Sarkar, Tyrone T. Thorsen, Alan D. Gatzke
Abstract: A system that incorporates teachings of the present disclosure may include, for example, a controller configured to obtain information associated with media content, to generate a first group of tones representative of the information associated with the media content, and to generate a media stream comprising the media content and the first group of tones; and a communication interface configured to transmit the media stream to a media device whereby the media device presents the media content and a sequence of tones, where the sequence of tones is generated based at least in part on the first group of tones, where the first group of tones comprises high frequency tones and low frequency tones, and where one of the high and low frequency tones represents a binary one and the other of the high and low frequency tones represents a binary zero. Other embodiments are disclosed.
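The tone scheme in this abstract, where one of two frequencies represents a binary one and the other a binary zero, can be sketched as a simple tone generator. The specific frequencies, sample rate, and burst length below are illustrative assumptions, not values from the patent.

```python
# A minimal sketch of encoding bits as high/low frequency tone bursts.
# F_LOW/F_HIGH, RATE, and TONE_LEN are illustrative assumptions.
import math

F_LOW, F_HIGH = 1000.0, 4000.0   # Hz; binary 0 and binary 1
RATE = 16000                      # samples per second
TONE_LEN = 160                    # samples per bit (10 ms)

def bits_to_tones(bits):
    """Return a list of samples: one sine burst per bit."""
    samples = []
    for bit in bits:
        freq = F_HIGH if bit else F_LOW
        for n in range(TONE_LEN):
            samples.append(math.sin(2 * math.pi * freq * n / RATE))
    return samples

signal = bits_to_tones([1, 0, 1])
print(len(signal))  # → 480
```

A receiver would recover the bits by measuring which of the two frequencies dominates each burst, e.g. with a Goertzel filter per frequency.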
Abstract: The described method and system provide for HMI steering for a telematics-equipped vehicle based on likelihood to exceed eye glance guidelines. By determining whether a task is likely to cause the user to exceed eye glance guidelines, alternative HMI processes may be presented to a user to reduce ASGT and EORT and increase compliance with eye glance guidelines. By allowing a user to navigate through long lists of items through vocal input, T9 text input, or heuristic processing rather than through conventional presentation of the full list, a user is much more likely to comply with the eye glance guidelines. This invention is particularly useful in contexts where users may be searching for one item out of a plurality of potential items, for example, within the context of hands-free calling contacts, playing back audio files, or finding points of interest during GPS navigation.
Type:
Grant
Filed:
May 26, 2011
Date of Patent:
March 31, 2015
Assignees:
General Motors LLC, GM Global Technology Operations LLC
Inventors:
Steven C. Tengler, Bijaya Aryal, Scott P. Geisler, Michael A. Wuergler
Abstract: In the field of communications, a method and a device for determining a decoding mode of in-band signaling are provided, which improve accuracy of in-band signaling decoding. The method includes: calculating a probability of each decoding mode of in-band signaling of a received signal at a predetermined moment by using a posterior probability algorithm; and from the calculated probabilities of the decoding modes, selecting a decoding mode having a maximum probability value as a decoding mode of the in-band signaling of the received signal at the predetermined moment. The method and the device are mainly used in a process for determining a decoding mode of in-band signaling in a speech frame transmission process.
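The selection rule in this abstract, picking the decoding mode with the maximum posterior probability, reduces to an argmax. The sketch below assumes the posteriors have already been computed; the mode names are hypothetical examples, not taken from the patent.

```python
# A minimal sketch of maximum-a-posteriori decoding mode selection.
# The mode names in the example are hypothetical.

def select_decoding_mode(posteriors):
    """posteriors: dict mapping decoding mode -> posterior probability.

    Returns the mode with the maximum probability value.
    """
    return max(posteriors, key=posteriors.get)

print(select_decoding_mode({"AMR": 0.2, "EFR": 0.7, "HR": 0.1}))  # → EFR
```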
Abstract: According to one embodiment, an electronic apparatus includes: a microphone; a storage unit which stores at least one record start instruction keyword; a voice recognition section which recognizes a voice content that is input through the microphone; and a record start execution section which, in a case where the voice content recognized by the voice recognition section is coincident with the record start instruction keyword, executes a record start.
Abstract: A method for providing text to speech from digital content in an electronic device is described. Digital content including a plurality of words and a pronunciation database is received. Pronunciation instructions are determined for a word using the digital content. Audio or speech is played for the word using the pronunciation instructions. As a result, the method provides text to speech on the electronic device based on the digital content.
Type:
Grant
Filed:
September 30, 2008
Date of Patent:
March 24, 2015
Assignee:
Amazon Technologies, Inc.
Inventors:
John Lattyak, John T. Kim, Robert Wai-Chi Chu, Laurent An Minh Nguyen
Abstract: A method for managing an interaction of a calling party to a communication partner is provided. The method includes automatically determining if the communication partner expects DTMF input. The method also includes translating speech input to one or more DTMF tones and communicating the one or more DTMF tones to the communication partner, if the communication partner expects DTMF input.
Type:
Grant
Filed:
March 29, 2010
Date of Patent:
March 24, 2015
Assignee:
Microsoft Technology Licensing, LLC
Inventors:
Yun-Cheng Ju, Stefanie Tomko, Frank Liu, Ivan Tashev
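The translation step in the abstract above, mapping recognized speech to DTMF tones, ultimately resolves each recognized digit to a standard keypad frequency pair. The frequency table below is the standard DTMF assignment (ITU-T Q.23); the function wrapping it is an illustrative sketch, not the patented method itself.

```python
# Standard DTMF keypad frequencies (ITU-T Q.23):
# each key is a (low, high) frequency pair in Hz.
DTMF = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}

def digits_to_tone_pairs(recognized_digits):
    """Map digits recognized from speech to DTMF frequency pairs.

    The recognizer producing the digit string is assumed to exist
    upstream; this sketch covers only the translation step.
    """
    return [DTMF[d] for d in recognized_digits]

print(digits_to_tone_pairs("50"))  # → [(770, 1336), (941, 1336)]
```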
Abstract: Provided are an apparatus and a method for integrally encoding and decoding a speech signal and an audio signal. The encoding apparatus may include: an input signal analyzer to analyze a characteristic of an input signal; a first conversion encoder to convert the input signal to a frequency domain signal and to encode the input signal when the input signal is an audio characteristic signal; a Linear Predictive Coding (LPC) encoder to perform LPC encoding of the input signal when the input signal is a speech characteristic signal; a frequency band expander for expanding a frequency band of the input signal, whose output is transmitted to either the first conversion encoder or the LPC encoder based on the input signal's characteristic; and a bitstream generator to generate a bitstream using an output signal of the first conversion encoder and an output signal of the LPC encoder.
Type:
Grant
Filed:
July 14, 2009
Date of Patent:
March 24, 2015
Assignees:
Electronics and Telecommunications Research Institute, Kwangwoon University Industry-Academic Collaboration Foundation
Inventors:
Tae Jin Lee, Seung-Kwon Baek, Min Je Kim, Dae Young Jang, Jeongil Seo, Kyeongok Kang, Jin-Woo Hong, Hochong Park, Young-cheol Park
Abstract: A method of analyzing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analyzing the audio signal, based on the determined property of the first output function.
Abstract: Embodiments of the present invention use one or more audible tones to communicate metadata during a transfer of an audio file. Embodiments of the present invention communicate an audio file from a speaker in a recording device (e.g., a recordable book, toy, computing device) to a microphone in a receiving device. The audio file is transferred by audibly broadcasting the audio file content. The audio file may be a recording made by the user (e.g., the user singing a song, a child responding to a storybook prompt intended to elicit a response). The file transfer process uses one or more audible tones, such as dual-tone multi-frequency signaling (“DTMF”) tones to communicate metadata associated with the audio file. Audible tones may also be used to communicate commands that delineate the beginning and/or end of a file broadcast.
Type:
Application
Filed:
September 19, 2013
Publication date:
March 19, 2015
Inventors:
Scott A. Schimke, Nicholas Pedersen, Kiersten Wilmes, Max J. Younger, Ma Lap Man
Abstract: A network communication node includes an audio outputter that outputs an audible representation of data to be provided to a requester. The network communication node also includes a processor that determines a categorization of the data to be provided to the requester and that varies a pause between segments of the audible representation of the data in accordance with the categorization of the data to be provided to the requester.
Type:
Grant
Filed:
July 15, 2008
Date of Patent:
March 17, 2015
Assignee:
AT&T Intellectual Property I, L.P.
Inventors:
Gregory Pulz, Steven Lewis, Charles Rajnai
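The pause-variation idea in the abstract above can be sketched as a lookup from data category to inter-segment pause length. The categories and durations below are illustrative assumptions, not values from the patent.

```python
# A minimal sketch: pause between audible segments varies by data category.
# The category names and durations are illustrative assumptions.
PAUSE_MS = {"phone_number": 600, "address": 400, "generic": 200}

def schedule_segments(segments, category):
    """Return (segment, pause_ms) pairs with the pause chosen by category.

    Longer pauses for categories like phone numbers give the listener
    time to write each segment down.
    """
    pause = PAUSE_MS.get(category, PAUSE_MS["generic"])
    return [(seg, pause) for seg in segments]

print(schedule_segments(["555", "0199"], "phone_number"))
# → [('555', 600), ('0199', 600)]
```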
Abstract: A quantizing method is provided that includes quantizing an input signal by selecting one of a first quantization scheme not using an inter-frame prediction and a second quantization scheme using the inter-frame prediction, in consideration of one or more of a prediction mode, a predictive error and a transmission channel state.
Abstract: A method, medium, and apparatus encoding and/or decoding a multichannel audio signal. The method includes: detecting the type of spatial extension data included in an encoding result of an audio signal; if the spatial extension data indicates a core audio object type related to a technique of encoding core audio data, detecting the core audio object type and decoding the core audio data by using a decoding technique according to the detected core audio object type; and, if the spatial extension data is residual coding data, decoding the residual coding data by using the decoding technique according to the core audio object type and up-mixing the decoded core audio data by using the decoded residual coding data. According to the method, the core audio data and residual coding data may be decoded by using an identical decoding technique, thereby reducing complexity at the decoding end.
Abstract: Systems and methods that can be utilized to convert a voice communication received over a telecommunication network to text are described. In an illustrative embodiment, a call processing system coupled to a telecommunications network receives a call from a caller intended for a first party, wherein the call is associated with call signaling information. At least a portion of the call signaling information is stored in a computer readable medium. A greeting is played to the caller, and a voice communication from the caller is recorded. At least a portion of the voice communication is converted to text, which is analyzed to identify portions that are inferred to be relatively more important to communicate to the first party. A text communication is generated including at least some of the identified portions and including fewer words than the recorded voice communication. At least a portion of the text communication is made available to the first party over a data network.
Type:
Grant
Filed:
March 20, 2014
Date of Patent:
March 10, 2015
Assignee:
Callwave Communications, LLC
Inventors:
Anthony Bladon, David Giannini, David Frank Hofstatter, Colin Kelley, David C. McClintock, Robert F. Smith, David S. Trandal, Leland W. Kirchhoff
Abstract: An audio buffer is used to capture audio in anticipation of a user command to do so. Sensors and processor activity may be monitored, looking for indicia suggesting that the user command may be forthcoming. Upon detecting such indicia, a circular buffer is activated. Audio correction may be applied to the audio stored in the circular buffer. After receiving the user command instructing the device to process or record audio, at least a portion of the audio that was stored in the buffer before the command is combined with audio received after the command. The combined audio may then be processed, transmitted or stored.
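The pre-capture mechanism in this abstract maps naturally onto a bounded circular buffer: old frames fall off as new ones arrive, and on the user command the retained frames are prepended to subsequently received audio. The sketch below is a minimal illustration; the frame representation and capacity are assumptions for the example.

```python
from collections import deque

# A minimal sketch: a bounded circular buffer holds the most recent audio
# frames; on the user command, buffered frames are prepended to new audio.
PRE_ROLL_FRAMES = 4  # illustrative capacity

class PreCaptureBuffer:
    def __init__(self, capacity=PRE_ROLL_FRAMES):
        self.ring = deque(maxlen=capacity)  # oldest frames are evicted first

    def feed(self, frame):
        """Called continuously once indicia suggest a command may come."""
        self.ring.append(frame)

    def on_command(self, frames_after_command):
        """Combine pre-command audio with audio received after the command."""
        return list(self.ring) + list(frames_after_command)

buf = PreCaptureBuffer()
for f in ["f1", "f2", "f3", "f4", "f5"]:
    buf.feed(f)          # "f1" is evicted once capacity is exceeded
print(buf.on_command(["f6"]))  # → ['f2', 'f3', 'f4', 'f5', 'f6']
```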
Abstract: A method for communication includes receiving modulated signals, which convey encoded speech. A measure of information entropy associated with the received signals is estimated. A speech encoding scheme is selected responsively to the estimated measure of the information entropy. A request to encode subsequent speech using the selected speech encoding scheme is sent to a transmitter.
Type:
Grant
Filed:
December 18, 2008
Date of Patent:
March 3, 2015
Assignee:
Marvell World Trade Ltd.
Inventors:
Maor Margalit, David Ben-Eli, Paul S. Spencer
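The selection rule in the abstract above, choosing a speech encoding scheme responsively to an entropy estimate, can be sketched with an empirical Shannon entropy and a threshold. The threshold value and scheme names below are hypothetical; the patent does not specify them here.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy (bits/symbol) of a symbol sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_scheme(symbols, threshold=1.0):
    """Hypothetical rule: high-entropy signals get the higher-rate scheme.

    The threshold and scheme names are illustrative assumptions.
    """
    return "high_rate" if entropy_bits(symbols) > threshold else "low_rate"

print(select_scheme([0, 0, 0, 0]))  # → low_rate  (entropy 0 bits)
print(select_scheme([0, 1, 2, 3]))  # → high_rate (entropy 2 bits)
```

The selected scheme would then be sent back to the transmitter as a request, per the abstract.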
Abstract: A system and method for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
Type:
Grant
Filed:
June 21, 2012
Date of Patent:
March 3, 2015
Assignee:
Soundhound, Inc.
Inventors:
Timothy P. Stonehocker, Keyvan Mohajer, Bernard Mont-Reynaud
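The result-selection logic in the abstract above (accept the higher-confidence transcription when both sources succeed, fall back to whichever succeeded otherwise) can be sketched directly. The tuple shape and confidence values below are illustrative assumptions; the latency cutoff is assumed to have been applied upstream by marking a timed-out source as failed.

```python
# A minimal sketch of dual-source recognition result selection.
# A source that failed or missed the latency cutoff is passed as None.

def pick_result(local, remote):
    """local/remote: (transcript, confidence) or None if failed/timed out.

    Returns the transcript from the higher-confidence successful source,
    or None if neither source succeeded.
    """
    candidates = [r for r in (local, remote) if r is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[1])[0]

print(pick_result(("play music", 0.60), ("play muse", 0.85)))  # → play muse
print(pick_result(("call home", 0.70), None))                  # → call home
```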
Abstract: An audio data processing system is a client-server system including an audio data communication device and an audio data processing device which are linked together via a communication network. The audio data communication device includes an acoustic generator, a control device, a transmitter and a receiver in connection with first and second storage areas. The transmitter sequentially transmits a time series of unprocessed data DA[n] stored in the first storage area, while the receiver sequentially receives a time series of processing-completed data DB[n] from the acoustic data processing device so that processing-completed data are stored in the second storage area and sequentially reproduced. When specific processing-completed data is not stored in the second storage area, the control device designates and reproduces specific unprocessed data, which is unprocessed acoustic data corresponding to specific processing-completed data.
Abstract: A method for setting a voice tag is provided, which comprises the following steps. First, a number of phone calls performed between a user and a contact person is counted. If the number of phone calls exceeds a predetermined number of times within a predetermined duration, or a voice dialing performed by the user fails before calling the contact person, the user is asked, after the phone call is complete, whether or not to set a voice tag corresponding to the contact person. If the user decides to set the voice tag, a voice training procedure is executed to set the voice tag corresponding to the contact person.
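The trigger condition in this abstract reduces to a simple predicate over the call count and a dialing-failure flag. The threshold value below is a placeholder for the patent's "predetermined times", which the abstract does not specify.

```python
CALL_THRESHOLD = 5  # placeholder for the unspecified "predetermined times"

def should_offer_voice_tag(call_count, voice_dial_failed):
    """Offer to record a voice tag after frequent calls or a failed dial.

    call_count        -- calls with this contact within the tracked window
    voice_dial_failed -- True if a voice dialing attempt failed
    """
    return call_count > CALL_THRESHOLD or voice_dial_failed

print(should_offer_voice_tag(6, False))  # → True
print(should_offer_voice_tag(2, True))   # → True
print(should_offer_voice_tag(2, False))  # → False
```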
Abstract: Methods and systems for extracting speech from packet streams. The methods and systems analyze the encoded speech in a given packet stream and automatically identify the actual speech coding scheme that was used to produce it. These techniques may be used, for example, in interception systems where the identity of the actual speech coding scheme is sometimes unavailable or inaccessible. For instance, the identity of the actual speech coding scheme may be sent in a separate signaling stream that is not intercepted. As another example, the identity of the actual speech coding scheme may be sent in the same packet stream as the encoded speech, but in encrypted form.
Abstract: An apparatus for encoding includes a first domain converter, a switchable bypass, a second domain converter, a first processor and a second processor to obtain an encoded audio signal having different signal portions represented by coded data in different domains, which have been coded by different coding algorithms. Corresponding decoding stages in the decoder together with a bypass for bypassing a domain converter allow the generation of a decoded audio signal with high quality and low bit rate.
Type:
Grant
Filed:
November 6, 2012
Date of Patent:
February 17, 2015
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors:
Bernhard Grill, Stefan Bayer, Guillaume Fuchs, Stefan Geyersberger, Ralf Geiger, Johannes Hilpert, Ulrich Kraemer, Jeremie Lecomte, Markus Multrus, Max Neuendorf, Harald Popp, Nikolaus Rettelbach, Roch LeFebvre, Bruno Bessette, Jimmy LaPierre, Philippe Gournay, Redwan Salami
Abstract: A method is presented that uses steganographic codeword(s) carried in a speech payload in such a way that (i) the steganographic codeword(s) survive compression and/or transcoding as the payload travels from a transmitter to a receiver across at least one diverse network, and (ii) the embedded steganographic codeword(s) do not degrade the perceived voice quality of the received signal below an acceptable level. The steganographic codewords are combined with a speech payload by summing the amplitude of a steganographic codeword to the amplitude of the speech payload at a relatively low steganographic-to-speech bit rate. Advantageously, the illustrative embodiment of the present invention enables (i) steganographic codewords to be decoded by a compliant receiver and applied accordingly, and (ii) legacy or non-compliant receivers to play the received speech payload with resultant voice quality that is acceptable to listeners even though the steganographic codeword(s) remain in the received speech payload.
Type:
Grant
Filed:
February 18, 2010
Date of Patent:
February 10, 2015
Assignee:
Avaya Inc.
Inventors:
Anjur Sundaresan Krishnakumar, Lawrence O'Gorman
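The embedding step in the abstract above, summing a codeword's amplitude into the speech payload at a low steganographic-to-speech level, can be sketched as a scaled element-wise addition. The gain value and sample representation below are illustrative assumptions, not parameters from the patent.

```python
# A minimal sketch of the embedding step: a scaled steganographic codeword
# is summed into the speech samples so perceived quality is preserved.
# STEG_GAIN is an illustrative assumption.
STEG_GAIN = 0.01

def embed(speech, codeword, gain=STEG_GAIN):
    """Sum a scaled steganographic codeword into the speech payload.

    A legacy receiver simply plays the result; a compliant receiver
    would correlate against known codewords to decode it.
    """
    out = list(speech)
    for i, c in enumerate(codeword):
        if i < len(out):
            out[i] += gain * c
    return out

marked = embed([0.5, -0.5, 0.25], [1.0, -1.0, 1.0])
print(marked)
```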
Abstract: The present disclosure discloses a speech recognition method and a terminal, which belong to the field of communications. The method comprises: receiving speech information input by a user; acquiring current environment information and judging, according to the current environment information, whether the speech information needs to be played; and recognizing the speech information as text information when it is judged that the speech information does not need to be played. The terminal comprises an acquisition module, a judgment module, and a recognition module. The present disclosure provides the speech receiver with a speech recognition function: when speech information from instant messaging is received by the terminal, it can help the receiver acquire the content expressed by the speech sender in situations where playing the speech is inconvenient.
Abstract: In an audio output terminal device, a buffer control unit adjusts the buffer size of a jitter buffer in accordance with the setting of a sound output mode instructed in an instruction receiving unit. If the instruction receiving unit acknowledges an instruction for setting an audio output mode that requires low delay in outputting sound, the buffer control unit reduces the buffer size of the jitter buffer. Further, the buffer control unit controls, in accordance with the instructed setting of the sound output mode, timing for allowing a media buffer to transmit one or more voice packets to the jitter buffer.
Type:
Grant
Filed:
September 16, 2010
Date of Patent:
February 3, 2015
Assignees:
Sony Corporation, Sony Computer Entertainment Inc.
Abstract: A method of operating an audio processing device to improve a user's perception of an input sound includes defining a critical frequency fcrit between a low frequency range and a high frequency range, receiving an input sound by the audio processing device, and analyzing the input sound in a number of frequency bands below and above the critical frequency. The method also includes defining a cut-off frequency fcut below the critical frequency fcrit, identifying a source frequency band above the cut-off frequency fcut, and extracting an envelope of the source band. Further, the method includes identifying a corresponding target band below the critical frequency fcrit, extracting a phase of the target band, and combining the envelope of the source band with the phase of the target band.
Type:
Grant
Filed:
April 6, 2011
Date of Patent:
February 3, 2015
Assignee:
Oticon A/S
Inventors:
Marcus Holmberg, Thomas Kaulberg, Jan Mark de Haan
Abstract: Disclosed is a frame comparison apparatus and method for comparing frames included in an audio signal by using spectrum information. The frame comparison apparatus includes a spectrum information estimation apparatus for receiving an audio signal and estimating and outputting spectrum information for the respective frames included in the audio signal, an estimation operation option determiner for determining an estimation order of the spectrum information estimated from the spectrum information estimation apparatus, a frame comparison option determiner for determining a comparison order for the frames output from the spectrum information estimation apparatus, and a frame comparator for determining a comparison target frame which is a comparison target for a current frame included in the audio signal, comparing the spectrum information for the current frame with the spectrum information for the comparison target frame, and outputting a comparison result value.
Abstract: Some embodiments disclosed herein store a target application and a dictation application. The target application may be configured to receive input from a user. The dictation application interface may include a full overlay mode option, where in response to selection of the full overlay mode option, the dictation application interface is automatically sized and positioned over the target application interface to fully cover a text area of the target application interface, so as to appear as if the dictation application interface is part of the target application interface. The dictation application may be further configured to receive an audio dictation from the user, convert the audio dictation into text, provide the text in the dictation application interface, and, in response to receiving a first user command to complete the dictation, automatically copy the text from the dictation application interface and insert the text into the target application interface.
Type:
Grant
Filed:
October 16, 2013
Date of Patent:
January 13, 2015
Assignee:
Dolbey & Company, Inc.
Inventors:
Curtis A. Weeks, Aaron G. Weeks, Stephen E. Barton
Abstract: An audio decoding system including a decoder decoding a first part of audio data, and an audio buffer compressor compressing and storing the decoded first part of audio data in a first time interval and decompressing the stored first part of audio data in a second time interval.
Abstract: Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.
Type:
Grant
Filed:
March 17, 2011
Date of Patent:
January 6, 2015
Assignee:
International Business Machines Corporation
Inventors:
Shay Ben-David, Ron Hoory, Zvi Kons, David Nahamoo
Abstract: A speech enhancement system enhances transitions between speech and non-speech segments. The system includes a background noise estimator that approximates the magnitude of a background noise of an input signal that includes a speech and a non-speech segment. A slave processor is programmed to perform the specialized task of modifying a spectral tilt of the input signal to match a plurality of expected spectral shapes selected by a codec.
Type:
Grant
Filed:
November 14, 2012
Date of Patent:
January 6, 2015
Assignee:
2236008 Ontario Inc.
Inventors:
Phillip A. Hetherington, Shreyas Paranjpe, Xueman Li
Abstract: A method comprising receiving at a user equipment encrypted content. The content is stored in said user equipment in an encrypted form. At least one key for decryption of said stored encrypted content is stored in the user equipment.
Type:
Grant
Filed:
May 9, 2008
Date of Patent:
January 6, 2015
Assignee:
Nokia Corporation
Inventors:
Anssi Ramo, Mikko Tammi, Adriana Vasilache, Lasse Laaksonen
Abstract: A method of converting speech from the characteristics of a first voice to the characteristics of a second voice, the method comprising: receiving a speech input in a first voice; dividing said speech input into a plurality of frames; mapping the speech from the first voice to a second voice; and outputting the speech in the second voice. Mapping the speech from the first voice to the second voice comprises deriving kernels demonstrating the similarity between speech features derived from the frames of the speech input in the first voice and stored frames of training data for said first voice, the training data corresponding to different text from that of the speech input, wherein the mapping step uses a plurality of kernels derived for each frame of input speech with a plurality of stored frames of training data of the first voice.
Abstract: An information processing method and an electronic device are disclosed. The information processing method is applied to a first electronic device. When the device orientation of the first electronic device is a first device orientation at a first time instant, the method includes: obtaining, by a first sensor of the first electronic device, a first sensing parameter indicating that the device orientation is a second device orientation at a second time instant after the first time instant; determining, based on the first sensing parameter, whether the second device orientation differs from the first device orientation, and obtaining a first determination; and generating a first instruction for entering into a voice record state when the second device orientation differs from the first device orientation and the second device orientation meets a predetermined condition.
Abstract: A method for measuring speech signal quality by an electronic device is described. The method includes obtaining a modified single-channel speech signal. The method also includes estimating multiple objective distortions based on the modified single-channel speech signal. The multiple objective distortions include at least one foreground distortion and at least one background distortion. The method further includes estimating a foreground quality and a background quality based on the multiple objective distortions. The method additionally includes estimating an overall quality based on the foreground quality and the background quality.
Abstract: A method for decoding an audio signal in a decoder having a CELP-based decoder element including a fixed codebook component, at least one pitch period value, and a first decoder output, wherein a bandwidth of the audio signal extends beyond a bandwidth of the CELP-based decoder element. The method includes obtaining an up-sampled fixed codebook signal by up-sampling the fixed codebook component to a higher sample rate, obtaining an up-sampled excitation signal based on the up-sampled fixed codebook signal and an up-sampled pitch period value, and obtaining a composite output signal based on the up-sampled excitation signal and an output signal of the CELP-based decoder element, wherein the composite output signal includes a bandwidth portion that extends beyond a bandwidth of the CELP-based decoder element.
Type:
Grant
Filed:
September 28, 2011
Date of Patent:
December 30, 2014
Assignee:
Motorola Mobility LLC
Inventors:
Jonathan A. Gibbs, James P. Ashley, Udar Mittal
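The up-sampling step described in the abstract above can be sketched numerically. Linear interpolation and the 0.5 pitch-contribution gain are simplifying assumptions for illustration, not the patented design.

```python
import numpy as np

def upsample_excitation(fixed_codebook, pitch_period, factor):
    """Up-sample the fixed-codebook component and the pitch period to a
    higher sample rate, then form an excitation by adding a pitch-delayed
    contribution (illustrative gain of 0.5)."""
    n = len(fixed_codebook)
    x_old = np.arange(n)
    x_new = np.arange(n * factor) / factor
    up_fcb = np.interp(x_new, x_old, fixed_codebook)  # linear interpolation
    up_pitch = pitch_period * factor                  # pitch lag at new rate
    excitation = up_fcb.copy()
    excitation[up_pitch:] += 0.5 * up_fcb[:-up_pitch]
    return excitation, up_pitch
```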
Abstract: A voice correction device includes a detector that detects a response from a user, a calculator that calculates an acoustic characteristic amount of an input voice signal, an analyzer that outputs an acoustic characteristic amount of a predetermined amount when it has acquired a response signal due to the response from the detector, a storage unit that stores the acoustic characteristic amount output by the analyzer, a controller that calculates a correction amount of the voice signal on the basis of a result of a comparison between the acoustic characteristic amount calculated by the calculator and the acoustic characteristic amount stored in the storage unit, and a correction unit that corrects the voice signal on the basis of the correction amount calculated by the controller.
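The controller/correction-unit interaction above can be sketched as a compare-then-correct pair of functions. The proportional step of 0.5 and the additive application of the correction are illustrative assumptions.

```python
def correction_amount(current_feature, stored_feature, step=0.5):
    """Correction derived from comparing the current acoustic
    characteristic amount with the stored one."""
    return step * (stored_feature - current_feature)

def correct(signal, amount):
    """Apply the correction as a simple additive offset per sample."""
    return [s + amount for s in signal]
```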
Abstract: A method for speaker identification includes detecting a target speaker's utterance locally; extracting features from the detected utterance locally; analyzing the extracted features in the local device to obtain information on the speaker identification and/or encoding the extracted features locally; transmitting the encoded extracted features to a remote server; decoding and analyzing the received extracted features by the server to obtain information on the speaker identification; and transmitting the information on the speaker identification from the server to the location where the speaker's utterance was detected. The method further includes detecting speech activity locally. Extracting features, encoding the extracted features, and/or transmitting the encoded extracted features to the server are performed only if speech activity above a predetermined threshold is detected.
Type:
Application
Filed:
June 20, 2011
Publication date:
December 25, 2014
Applicant:
AGNITIO, S.L.
Inventors:
Luis Buera Rodriguez, Carlos Vaquero Aviles-Casco, Marta Garcia Gomar
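The gating behavior described above (only extract, encode, and transmit when speech activity exceeds a threshold) can be sketched with an energy-based detector. The feature extraction and encoding here are placeholder stand-ins, not the patented front end.

```python
def process_utterance(samples, vad_threshold=0.1):
    """Local front end: run a simple energy-based speech-activity check,
    and only extract/encode features when activity exceeds the threshold."""
    energy = sum(s * s for s in samples) / len(samples)
    if energy <= vad_threshold:
        return None                                  # no speech: nothing sent
    features = [min(abs(s), 1.0) for s in samples]   # placeholder "features"
    encoded = bytes(int(f * 255) for f in features)  # placeholder codec
    return encoded                                   # would go to the server
```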
Abstract: A method and apparatus are disclosed that dynamically adjust operational parameters of a text-to-speech engine in a speech-based system. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
Type:
Grant
Filed:
May 18, 2012
Date of Patent:
December 16, 2014
Assignee:
Vocollect, Inc.
Inventors:
James Hendrickson, Debra Drylie Scott, Duane Littleton, John Pecorari, Arkadiusz Slusarczyk
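A minimal sketch of environment-driven parameter adjustment follows. The 70 dB threshold, the volume and rate step sizes, and the parameter names are illustrative assumptions, not values from the patent.

```python
def adjust_tts_params(params, ambient_noise_db):
    """Raise volume and slow the speech rate in loud environments to
    increase intelligibility of the synthesized speech."""
    adjusted = dict(params)
    if ambient_noise_db > 70:  # illustrative "loud environment" threshold
        adjusted["volume"] = min(1.0, params["volume"] + 0.3)
        adjusted["rate"] = max(0.5, params["rate"] - 0.2)
    return adjusted
```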
Abstract: A system and method for providing an audio representation of a name includes providing a list of a plurality of users of a network and respective presence information regarding each of the plurality of users; receiving a request from an endpoint to receive an audio representation of a name of a particular user of the plurality of users, and providing the audio representation to the endpoint. Moreover, the audio representation of the name at least generally approximates a pronunciation of the name as pronounced by the particular user.
Abstract: A computer-implemented method and apparatus are disclosed for decoding an encoded data signal. In one embodiment, the method includes accessing, in a memory, a set of signal elements. The encoded data signal is received at a computing device. The signal includes signal fragments each having a projection value and an index value. The projection value has been calculated as a function of at least one signal element of the set of signal elements and at least a portion of the data signal. The index value associates its respective signal fragment with the at least one signal element used to calculate the projection value. The computing device determines amplitude values based on the projection values in the signal fragments. The decoded signal is determined using the amplitude values and the signal elements associated with at least some of the signal fragments.
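The reconstruction step can be sketched as a weighted sum over the stored signal elements. Treating each projection value directly as its amplitude is an assumption that holds when the stored elements are unit-norm; the patent's amplitude determination may differ.

```python
def decode_signal(fragments, signal_elements, length):
    """Reconstruct the decoded signal from (projection_value, index_value)
    fragments: each index selects a stored signal element, which is scaled
    by the projection value and accumulated."""
    decoded = [0.0] * length
    for projection, index in fragments:
        element = signal_elements[index]
        for i in range(length):
            decoded[i] += projection * element[i]
    return decoded
```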
Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
Type:
Application
Filed:
August 14, 2014
Publication date:
December 4, 2014
Inventors:
Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
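The estimation step in the dual-band scheme above (deriving the second feature type from the first) can be sketched with a precomputed linear mapping. The mapping matrix here is an illustrative stand-in for whatever trained model the system would use.

```python
def estimate_wideband(narrowband_features, mapping):
    """Estimate the second (e.g., wideband) feature type from the first
    (e.g., narrowband) type via a linear mapping: one output feature per
    row of the mapping matrix."""
    return [sum(w * f for w, f in zip(row, narrowband_features))
            for row in mapping]
```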
Abstract: There is a need to enable decompression of a speech signal even if no network synchronizing signal is output from a baseband processing portion. For this purpose, an information processing device includes a first serial interface. The first serial interface includes a notification signal generation circuit that generates a notification signal each time compressed data received from the baseband processing portion reaches a predetermined data quantity, and notifies a speech processing portion of this state using the notification signal. The speech processing portion includes a synchronizing signal generation circuit that generates a network synchronizing signal based on the notification signal. A clock signal for PCM communication is generated based on the network synchronizing signal. A speech signal can thus be decompressed even if no network synchronizing signal is output from the baseband processing portion.
Abstract: Aspects relate to machine recognition of human voices in live or recorded audio content, and delivering text derived from such live or recorded content as real time text (RTT), with contextual information derived from characteristics of the audio. For example, volume information can be encoded as larger and smaller font sizes. Speaker changes can be detected and indicated through text additions, or color changes to the font. A variety of other context information can be detected and encoded in graphical rendition commands available through RTT, or by extending the information provided with RTT packets, and processing that extended information accordingly for modifying the display of the RTT text content.
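The volume-to-font-size and speaker-change examples above can be sketched as a small rendering function. The size bands and the ">> " speaker-change marker are illustrative choices, not the RTT wire format.

```python
def render_rtt(text, volume, speaker_changed):
    """Map audio context onto RTT presentation: volume (0..1) selects a
    font size, and a detected speaker change is flagged in the text."""
    if volume > 0.7:
        size = "large"
    elif volume > 0.3:
        size = "medium"
    else:
        size = "small"
    prefix = ">> " if speaker_changed else ""
    return {"text": prefix + text, "font_size": size}
```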
Abstract: An electronic device includes a camera and two microphones. The space in front of the camera is divided into a plurality of imaginary cubic areas. Each imaginary cubic area is associated with a delay parameter. The camera locates a face of a user and determines, from the plurality of imaginary cubic areas, the imaginary cubic area in which the face is located. A wave beam pointing to the imaginary cubic area is calculated according to the delay parameter associated with the imaginary cubic area. The two microphones record voices within a range of the wave beam. A voice recording method is also provided.
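The beam calculation above amounts to delay-and-sum beamforming with the per-area delay parameter. A minimal two-microphone sketch, assuming the delay is already expressed in whole samples:

```python
def beamform(mic_a, mic_b, delay_samples):
    """Steer a beam toward the cubic area the face occupies: delay the
    second microphone's samples by the area's stored delay parameter,
    then average the two channels."""
    delayed_b = [0.0] * delay_samples + list(mic_b[:len(mic_b) - delay_samples])
    return [(a + b) / 2.0 for a, b in zip(mic_a, delayed_b)]
```

In practice the delay parameter would be precomputed per cubic area from the microphone geometry; fractional-sample delays would need interpolation.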