Modification Of At Least One Characteristic Of Speech Waves (EPO) Patents (Class 704/E21.001)
  • Publication number: 20090171664
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Application
    Filed: February 4, 2009
    Publication date: July 2, 2009
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, SR., Michael R. Kennewick, JR., Richard Kennewick, Tom Freeman
  • Publication number: 20090171670
    Abstract: The present invention includes systems and methods for altering a cellular phone user's speech so that the speech can be less bothersome to third parties in the surrounding area and so that the user has more privacy. Sound cancellation can be used to cancel, reduce, or modify the user's voice so third parties cannot hear the voice as easily or so that the user's voice cannot be understood. Furthermore, the user device can encourage the user to speak in a lower voice. The user device can accomplish this encouragement by indicating to the user their level of speech. In this manner, the user knows when he may lower his voice and yet still provide an adequate volume of speech for the cellular phone. Additionally, the user device can encourage the user to speak in a lower voice by audibly playing back the user's voice in real time.
    Type: Application
    Filed: March 28, 2008
    Publication date: July 2, 2009
    Applicant: Apple Inc.
    Inventors: Robert Bailey, Lawrence Heyl, Stephan Schell
  • Publication number: 20090164225
    Abstract: A method of audio matrix encoding/decoding, which encodes and decodes audio signals of two or more channels into an audio signal of one or more channels while preserving the direction of a sound image, includes extracting pieces of sound image information from multi-channel audio signals, encoding the extracted sound image information and allocating it to an inaudible frequency domain outside the audible frequency domain, and adding the sound image information allocated to the inaudible frequency domain to matrix-encoded stereo signals of the audible frequency domain.
    Type: Application
    Filed: June 12, 2008
    Publication date: June 25, 2009
    Applicant: Samsung Electronics Co., Ltd.
    Inventor: Sung-ho CHO
  • Publication number: 20090164208
    Abstract: The method for aligning parallel spoken language corpora comprises obtaining a statistics method and dictionaries-based word alignment set from the parallel spoken language corpora, aligning chunks of the parallel spoken language corpora by using the statistics method and dictionaries-based word alignment set, to obtain a chunk alignment set, and aligning words in aligned chunks of the parallel spoken language corpora to obtain a chunk alignment-based word alignment set. Chunk alignment set and word alignment set are obtained by aligning chunks in parallel spoken language corpora in a corpus repository using a statistics method and dictionaries-based high precision word alignment set obtained from the parallel spoken language corpora and further aligning words in the chunks, and by using them in the speech-to-speech machine translation, the ambiguities of spoken language word alignment can be decreased by using the integrality of chunks.
    Type: Application
    Filed: December 16, 2008
    Publication date: June 25, 2009
    Inventors: Ren DENGJUN, Wu HUA, Wang HAIFENG
  • Publication number: 20090164215
    Abstract: A device with a voice-assisted system that uses a voice command to adjust operations is provided. The voice-assisted system includes a voice recognition engine and a control unit. The voice recognition engine receives a voice command and outputs a voice signal based on the voice command to the control unit. The control unit adjusts the operations based on the voice signal. A user is only required to input the voice command. The voice recognition engine performs a series of actions to adjust the operations. Therefore, the voice-assisted system can enhance the convenience of adjusting the operations of the device and reduce operation complexity for the user.
    Type: Application
    Filed: February 27, 2009
    Publication date: June 25, 2009
    Applicant: DELTA ELECTRONICS, INC.
    Inventors: Yuan-Chia Lu, Liang-Sheng Huang, Jia-Lin Shen
  • Publication number: 20090157391
    Abstract: An audio fingerprint is extracted from an audio sample, where the fingerprint contains information that is characteristic of the content in the sample. The fingerprint may be generated by computing an energy spectrum for the audio sample, resampling the energy spectrum logarithmically in the time dimension, transforming the resampled energy spectrum to produce a series of feature vectors, and computing the fingerprint using differential coding of the feature vectors. The generated fingerprint can be compared to a set of reference fingerprints in a database to identify the original audio content.
    Type: Application
    Filed: February 24, 2009
    Publication date: June 18, 2009
    Inventor: Sergiy Bilobrov
  • Publication number: 20090157409
    Abstract: A method includes generating, for each parameter of the prosody vector, an initial parameter prediction model with a plurality of attributes related to difference prosody prediction and at least part of the attribute combinations of the plurality of attributes, in which each of the plurality of attributes and the attribute combinations is included as an item; calculating the importance of each item in the parameter prediction model; deleting the item having the lowest calculated importance; re-generating a parameter prediction model with the remaining items; determining whether the re-generated parameter prediction model is an optimal model; and, if the re-generated parameter prediction model is determined not to be an optimal model, repeating the step of calculating importance and the steps that follow it with the re-generated parameter prediction model, wherein the difference prosody vector and all parameter prediction models of the difference prosody vector constitute the difference pros
    Type: Application
    Filed: December 4, 2008
    Publication date: June 18, 2009
    Inventors: Yi Lifu, Li Jian, Lou Xiaoyan, Hao Jie
  • Publication number: 20090150143
    Abstract: A post-filtering apparatus and method for speech enhancement in a modified discrete cosine transform (MDCT) domain are disclosed. In the apparatus and method, previous and current MDCT coefficients are used for obtaining a speech spectrum coefficient similar to a real speech spectrum, and a convex function is used for transforming the speech spectrum coefficient and obtaining a post-filter coefficient so that difference can increase in the case where the speech spectrum coefficient is small but decrease in the case where the coefficient is large. Then, the post-filter coefficient is applied to the MDCT coefficient. With this configuration, both the current and previous MDCT values are used, so that it is possible to obtain a spectrum coefficient similar to the real speech spectrum and to obtain a more accurate filter coefficient. Further, the coefficient is adaptively transformed through the convex function, thereby enhancing speech quality.
    Type: Application
    Filed: June 5, 2008
    Publication date: June 11, 2009
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Hyun-woo Kim, Jong-mo Sung, Mi-suk Lee, Do-young Kim, Byung-sun Lee
  • Publication number: 20090144055
    Abstract: A receiver in an audio coding system receives a signal conveying frequency subband signals representing an audio signal. The subband signals are examined to assess one or more characteristics of the audio signal including temporal shape. Spectral components are synthesized having the one or more assessed characteristics, integrated with the subband signals and passed through a synthesis filterbank to generate an output signal.
    Type: Application
    Filed: February 4, 2009
    Publication date: June 4, 2009
    Applicant: Dolby Laboratories Licensing Corporation
    Inventors: Grant Allen Davidson, Michael Mead Truman, Matthew Conrad Fellers, Mark Stuart Vinton
  • Publication number: 20090135164
    Abstract: Provided are a pointing apparatus capable of providing haptic feedback, and a haptic interaction system and method using the same. The pointing apparatus includes a wireless communication unit, a controller, and a haptic stimulator. The wireless communication unit receives an event including haptic output information through wireless communication with the outside. The controller generates a control signal for reproducing a haptic pattern corresponding to the haptic output information. The haptic stimulator reproduces the haptic pattern by means of the control signal. Thus, it is possible to increase the performance and usability of a user interface of a user terminal including a touch screen.
    Type: Application
    Filed: November 21, 2008
    Publication date: May 28, 2009
    Inventors: Ki Uk KYUNG, Jun Young LEE, Jun Seok PARK, Chang Seok BAE, Dong Won HAN, Jin Tae KIM
  • Publication number: 20090132243
    Abstract: A plurality of pairs of segments to be weighted/added are selected non-linearly with respect to a time axis of audio data. A speed conversion is achieved by performing the weighting/addition on the selected pairs of segments. The non-linear selection is performed by (a) obtaining all possible pairs of segments constituting the audio data, (b) calculating a degree of similarity for each possible pair, (c) ranking all the possible pairs of segments according to their degrees of similarity, and (d) overlapping at least the one of the possible pairs of segments that has the highest degree of similarity.
    Type: Application
    Filed: January 23, 2007
    Publication date: May 21, 2009
    Inventor: Ryoji Suzuki
  • Publication number: 20090132238
    Abstract: An audio encoding system that accepts an audio signal as an input to the system. The system includes a filter bank that splits the audio signal into a plurality of frames, and a bit allocation unit that assigns a number of bits for a current frame of the plurality of frames. The system further includes a scale factor unit that calculates a scale factor, identifies a block type of a first block of a current frame, identifies a block type of a second block consecutive to the first block, and reuses a scale factor of the first block for the second block, when the block type of the first block and the block type of the second block match. The system additionally includes a quantization and coding unit that quantizes and codes the signal, and a bit rate checker that verifies whether a bit rate requirement is satisfied.
    Type: Application
    Filed: October 31, 2008
    Publication date: May 21, 2009
    Inventor: B. SUDHAKAR
  • Publication number: 20090132239
    Abstract: Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio.
    Type: Application
    Filed: January 26, 2009
    Publication date: May 21, 2009
    Inventors: Michael Finke, Detlef Koll
  • Publication number: 20090125310
    Abstract: An apparatus and method for embedding and extracting a capturing-resistant audio watermark based on discrete wavelet transform, and a copyright management system using the same are provided. The apparatus for embedding a wavelet-based audio watermark includes: a framing unit for dividing an input audio signal into small signals with a regular length; a discrete wavelet transform unit for calculating a mean value of wavelet coefficients by transforming the small signals based on a discrete wavelet transform; and an embedding unit for changing the calculated mean value according to a watermark where a synchronization signal is inserted and inserting the watermark into the audio signal.
    Type: Application
    Filed: June 11, 2007
    Publication date: May 14, 2009
    Inventors: Seungjae Lee, Sang Kwang Lee, Jin Soo Seo, Young Ho Suh, Yong Seok Seo, Seon Hwa Lee, Won Gyum Kim, Wonyoung Yoo, Sung Hwan Lee, Hye Won Jung, Young Suk Yoon
  • Publication number: 20090125307
    Abstract: A system and a method for providing each user at multiple devices with speaker-dependent speech recognition engines via networks according to the pre-stored speech sounds and characteristics of devices, by which each user can use speaker-dependent speech recognition engines in different devices without the need of repeating the same procedure of recording speech to train speech recognition engines for newly utilized devices.
    Type: Application
    Filed: November 9, 2007
    Publication date: May 14, 2009
    Inventor: Jui-Chang Wang
  • Publication number: 20090119720
    Abstract: The present rear seat entertainment system provides a second display and interface in the front section of a motor vehicle for control of a media player with a rear mounted first display. The second display shows still video images (or screen shots) from the media player for real time updates on the status of the first display in the rear section of the vehicle according to adjustments made by the second user interface. The entertainment system includes a portable controller with the second display incorporated therein.
    Type: Application
    Filed: September 7, 2006
    Publication date: May 7, 2009
    Inventors: Eric S Deuel, Peter W. Mokris, Steve Schultz, Lance E. Tinder, Douglas W. Klamer, Loren D. Vredevoogd, David Straight
  • Publication number: 20090119098
    Abstract: The present invention discloses a signal processing method adapted to process a synthesized signal in packet loss concealment. The method includes the following steps: receiving a good frame following a lost frame, obtaining an energy ratio of the energy of a signal of the good frame to the energy of a synthesized signal corresponding to the same time as the good frame, and adjusting the synthesized signal in accordance with the energy ratio. The present invention also discloses a signal processing apparatus and a voice decoder.
    Type: Application
    Filed: November 4, 2008
    Publication date: May 7, 2009
    Applicant: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Wuzhou ZHAN, Dongqi WANG, Yongfeng TU, Jing WANG, Qing ZHANG, Lei MIAO, Jianfeng XU, Chen HU, Yi YANG, Zhengzhong DU, Fengyan QI
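The energy-ratio adjustment described in the abstract of 20090119098 can be illustrated with a minimal sketch. The abstract does not say how the ratio is applied, so the uniform amplitude gain used here (the square root of the energy ratio) is an assumption, and the function names `energy` and `adjust_synthesized` are hypothetical.

```python
import math


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def adjust_synthesized(synth, good, eps=1e-12):
    """Scale the synthesized signal so that its energy matches the energy of
    the good frame over the interval where the two signals overlap.

    Assumption: a single uniform gain is applied; a real decoder may instead
    ramp the gain over time or only attenuate.
    """
    n = min(len(synth), len(good))
    ratio = energy(good[:n]) / (energy(synth[:n]) + eps)  # energy ratio
    gain = math.sqrt(ratio)  # energy scales with the square of amplitude
    return [s * gain for s in synth]


synth = [2.0, -2.0, 2.0, -2.0]   # concealment output, too loud
good = [1.0, -1.0, 1.0, -1.0]    # first good frame after the loss
adjusted = adjust_synthesized(synth, good)
```

With these inputs the synthesized signal has four times the energy of the good frame, so the sketch halves its amplitude.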
  • Publication number: 20090119110
    Abstract: An apparatus for encoding and decoding an audio signal and method thereof are disclosed, by which compatibility with a player of a general mono or stereo audio signal can be provided in coding an audio signal and by which spatial information for a multi-channel audio signal can be stored or transmitted without a presence of an auxiliary data area. The present invention includes extracting side information embedded in non-recognizable component of audio signal components and decoding the audio signal using the extracted side information.
    Type: Application
    Filed: May 26, 2006
    Publication date: May 7, 2009
    Applicant: LG ELECTRONICS
    Inventors: Hyen-O Oh, Hee Suk Pang, Dong Soo Kim, Jae Hyun Lim, Yang-Won Jung
  • Publication number: 20090112579
    Abstract: A system improves speech intelligibility by reconstructing speech segments. The system includes a low-frequency reconstruction controller programmed to select a predetermined portion of a time domain signal. The low-frequency reconstruction controller substantially blocks signals above and below the selected predetermined portion. A harmonic generator generates low-frequency harmonics in the time domain that lie within a frequency range controlled by a background noise modeler. A gain controller adjusts the low-frequency harmonics to substantially match the signal strength to the time domain original input signal.
    Type: Application
    Filed: May 23, 2008
    Publication date: April 30, 2009
    Applicant: QNX SOFTWARE SYSTEMS (WAVEMAKERS), INC.
    Inventors: Xueman Li, Rajeev Nongpiur, Frank Linseisen, Phillip A. Hetherington
  • Publication number: 20090112605
    Abstract: The present invention provides a system and method for associating freeform speech commands with one or more predefined commands from a set of predefined commands. The set of predefined commands is stored, and alternate forms associated with each predefined command are retrieved from an external data source. The external data source receives the alternate forms associated with each predefined command from multiple sources, so the alternate forms represent paraphrases of the predefined command. A representation including words from the predefined command and the alternate forms of the predefined command, such as a vector representation, is generated for each predefined command. A similarity value between received speech data and each representation of a predefined command is computed, and the speech data is classified as the predefined command whose representation has the highest similarity value to the speech data.
    Type: Application
    Filed: October 27, 2008
    Publication date: April 30, 2009
    Inventor: Rakesh Gupta
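The classification step in the abstract of 20090112605 can be sketched as follows. The abstract mentions only "a vector representation" and "a similarity value", so the bag-of-words vectors and cosine similarity used here are assumptions, and the sample commands and paraphrases are invented for illustration.

```python
import math
from collections import Counter


def bag_of_words(phrases):
    """Word-count vector built from a command and its alternate forms."""
    counts = Counter()
    for phrase in phrases:
        counts.update(phrase.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(utterance, commands):
    """Return the predefined command whose representation is most similar.

    `commands` maps each predefined command to its list of alternate forms.
    """
    query = bag_of_words([utterance])
    scores = {
        cmd: cosine(query, bag_of_words([cmd] + alternates))
        for cmd, alternates in commands.items()
    }
    return max(scores, key=scores.get)


# Hypothetical command set; the paraphrases stand in for the external data source.
commands = {
    "turn on the light": ["switch the lamp on", "lights on please"],
    "play music": ["put on a song", "start the music"],
}
best = classify("please switch on the lamp", commands)
```

Here the recognized text is assumed to be available already as a string; speech recognition itself is outside the sketch.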
  • Publication number: 20090106029
    Abstract: A voice acquisition system for a vehicle includes an interior rearview mirror assembly. The mirror assembly may include a microphone for receiving audio signals within a cabin of the vehicle and generating an output indicative of these audio signals. The microphone may provide sound capture for a hands free cell phone system, an audio recording system and/or an emergency communication system. The system may include a control that is responsive to the output from the microphone and that distinguishes vocal signals from non-vocal signals present in the output. The microphone may provide sound capture for at least one accessory of the equipped vehicle, and the accessory may be responsive to a vocal signal captured by the microphone. The interior rearview mirror assembly may include at least one accessory, such as an antenna, a video device, a security system status indicator, a tire pressure indicator display and/or a loudspeaker.
    Type: Application
    Filed: December 19, 2008
    Publication date: April 23, 2009
    Applicant: DONNELLY CORPORATION
    Inventors: Jonathan E. DeLine, Niall R. Lynam, Ralph A. Spooner, Phillip A. March
  • Publication number: 20090094036
    Abstract: A method of presenting a multi-modal help dialog move to a user in a multi-modal dialog system is disclosed. The method comprises presenting an audio portion of the multi-modal help dialog move that explains available ways of user inquiry and presenting a corresponding graphical action performed on a user interface associated with the audio portion. The multi-modal help dialog move is context-sensitive and uses current display information and dialog contextual information to present a multi-modal help move that is currently related to the user. A user request or a problematic dialog detection module may trigger the multi-modal help move.
    Type: Application
    Filed: November 7, 2008
    Publication date: April 9, 2009
    Applicant: AT&T Corp
    Inventors: Patrick Ehlen, Helen Hastie, Michael Johnston
  • Publication number: 20090089063
    Abstract: A method, system and computer program product for voice conversion. The method includes performing speech analysis on the speech of a source speaker to achieve speech information; performing spectral conversion based on said speech information, to at least achieve a first spectrum similar to the speech of a target speaker; performing unit selection on the speech of said target speaker at least using said first spectrum as a target; replacing at least part of said first spectrum with the spectrum of the selected target speaker's speech unit; and performing speech reconstruction at least based on the replaced spectrum.
    Type: Application
    Filed: September 29, 2008
    Publication date: April 2, 2009
    Inventors: Fan Ping Meng, Yong Qin, Qin Shi, Zhi Wei Shuang
  • Publication number: 20090076805
    Abstract: The present invention discloses a method for performing a frame erasure concealment to a higher-band signal, including: calculating a periodic intensity of a higher-band signal with respect to a lower-band signal; judging whether the periodic intensity of the higher-band signal is higher than or equal to a preconfigured threshold; if the periodic intensity of the higher-band signal is higher than or equal to the preconfigured threshold, using a pitch period repetition method to perform the frame erasure concealment to the higher-band signal of a current lost frame; and if the periodic intensity of the higher-band signal is lower than the preconfigured threshold, using a previous frame data repetition method to perform the frame erasure concealment to the higher-band signal of the current lost frame. The present invention further discloses a device for performing a frame erasure concealment to a higher-band signal and a speech decoder. The problem that the quality of the voice signal is lowered is avoided.
    Type: Application
    Filed: May 29, 2008
    Publication date: March 19, 2009
    Applicant: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Jianfeng Xu, Lei Miao, Chen Hu, Qing Zhang, Lijing Xu, Wei Li, Zhengzhong Du, Yi Yang, Fengyan Qi, Wuzhou Zhan, Dongqi Wang
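The threshold decision in the abstract of 20090076805 can be sketched as below. How the periodic intensity is computed is not given in the abstract, so it is taken as an input; the threshold value of 0.5 and the function name `conceal_higher_band` are hypothetical.

```python
def conceal_higher_band(history, frame_len, pitch_period,
                        periodic_intensity, threshold=0.5):
    """Synthesize one lost higher-band frame from previously decoded samples.

    history            -- past higher-band samples, at least max(frame_len,
                          pitch_period) long
    periodic_intensity -- periodicity measure, assumed to be computed elsewhere
    threshold          -- preconfigured threshold (illustrative value)
    """
    if periodic_intensity >= threshold:
        # Pitch period repetition: repeat the last pitch cycle.
        cycle = history[-pitch_period:]
        return [cycle[i % pitch_period] for i in range(frame_len)]
    # Previous frame data repetition: copy the last frame as is.
    return list(history[-frame_len:])


history = list(range(8))
periodic = conceal_higher_band(history, 4, 2, periodic_intensity=0.9)
aperiodic = conceal_higher_band(history, 4, 2, periodic_intensity=0.1)
```

The two calls exercise both branches: the first repeats the final two-sample cycle, the second copies the final four samples.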
  • Publication number: 20090076826
    Abstract: Watermarking of audio signals intends to manipulate the audio signal in a way that the changes in the audio content cannot be recognised by the human auditory system. In order to reduce the audibility of the watermark and to improve the robustness of the watermarking the invention uses phase modification of the audio signal. In the frequency domain, the phase of the audio signal is manipulated by the phase of a reference phase sequence, followed by transform into time domain. Because a change of the audio signal phase over the whole frequency range can be audible, the phase manipulation is carried out with a maximum amount only within one or more small frequency ranges which are located in the higher frequencies and/or in noisy audio signal sections, according to psycho-acoustic principles. Preferably, the allowable amplitude of the phase changes in the remaining frequency ranges is controlled according to psycho-acoustic principles.
    Type: Application
    Filed: September 4, 2006
    Publication date: March 19, 2009
    Inventors: Walter Voessing, Peter Georg Baum
  • Publication number: 20090063165
    Abstract: A system and method for providing improved adaptive multi-rate wideband (AMR-WB) discontinuous transmission (DTX) synchronization. According to various embodiments, an indication on the start of the inactive speech period is signalled to the decoder via a voice activity detection (VAD) flag a predetermined number of frames before the DTX period will start, i.e., before the SID_FIRST frame is received. When the VAD flag indicates active speech, or when the VAD flag has been set to zero less than the predetermined number of frames ago, the received NO_DATA frame can be classified with a high degree of reliability as active speech, i.e., considered as transmitter, network or terminal-initiated signalling, and can be substituted by a SPEECH_LOST frame. When the VAD flag was set to zero eight frames ago or earlier, the NO_DATA frame is classified as DTX.
    Type: Application
    Filed: August 27, 2008
    Publication date: March 5, 2009
    Inventors: Pasi Ojala, Ari Lakaniemi
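The NO_DATA classification rule in the abstract of 20090063165 can be written as a small decision function. This is a sketch only: the eight-frame count comes from the abstract's last sentence, while the function name and the way the VAD state is passed in are assumptions.

```python
PREDETERMINED_FRAMES = 8  # "eight frames ago or earlier" in the abstract


def classify_no_data(frames_since_vad_zero):
    """Classify a received NO_DATA frame.

    frames_since_vad_zero -- None while the VAD flag indicates active speech,
                             otherwise the number of frames since the flag
                             was set to zero.
    """
    if frames_since_vad_zero is None:
        return "SPEECH_LOST"  # VAD flag still signals active speech
    if frames_since_vad_zero < PREDETERMINED_FRAMES:
        return "SPEECH_LOST"  # flag cleared too recently to be DTX
    return "DTX"


results = [classify_no_data(n) for n in (None, 3, 8, 20)]
```

A NO_DATA frame is treated as lost speech unless the flag has been zero for at least the predetermined number of frames.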
  • Publication number: 20090055171
    Abstract: A system is described that performs periodic waveform extrapolation based frame erasure concealment (FEC) to generate frames of an output speech signal corresponding to erased frames of an encoded bit-stream in a manner that reduces buzzy and tonal artifacts in the output speech signal. An embodiment of the invention uses a multiple of a pitch period associated with previously-decoded speech to perform periodic waveform extrapolation for consecutively-erased frames in a frame erasure beyond the first erased frame. An embodiment of the invention also attenuates the extrapolated signal after a threshold number of erased frames so as to reduce the FEC output signal to zero, wherein the threshold number of erased frames is dependent at least in part on the pitch period associated with the previously-decoded speech.
    Type: Application
    Filed: July 24, 2008
    Publication date: February 26, 2009
    Applicant: BROADCOM CORPORATION
    Inventor: Robert W. Zopf
  • Publication number: 20090043588
    Abstract: A system capable of reducing the influence of sound reverberation or reflection to improve sound-source separation accuracy. An original signal X(ω,f) is separated from an observed signal Y(ω,f) according to a first model and a second model to extract an unknown signal E(ω,f). According to the first model, the original signal X(ω,f) of the current frame f is represented as a combined signal of known signals S(ω,f−m+1) (m=1 to M) that span a certain number M of current and previous frames. This enables extraction of the unknown signal E(ω,f) without changing the window length while reducing the influence of reverberation or reflection of the known signal S(ω,f) on the observed signal Y(ω,f).
    Type: Application
    Filed: August 7, 2008
    Publication date: February 12, 2009
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Ryu Takeda, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
  • Publication number: 20090037186
    Abstract: In one embodiment, the method includes receiving the audio signal having configuration information and multi-channels, and reading a first indicator from the configuration information. The first indicator indicates whether or not channel mapping information is included in the configuration information. The channel mapping information is read from the configuration information if the first indicator indicates that the channel mapping information is included in the configuration information. The channel mapping information indicates to which speaker in a reproduction device to map each channel in the audio signal. A second indicator is also read from the configuration information. The second indicator indicates whether or not channel rearrangement information is included in the configuration information. The channel rearrangement information is read from the configuration information if the second indicator indicates that the channel rearrangement information is included in the configuration information.
    Type: Application
    Filed: September 24, 2008
    Publication date: February 5, 2009
    Inventor: Tilman Liebchen
  • Publication number: 20090027355
    Abstract: A portable digital audio device is capable of playing a number of different data file types, such as music data files, speech data files, video data files, and the like. Different CODECs are generally used for different data types. The system determines the data file type and selects the appropriate CODEC based on the reported data file type. In addition, the reported data file type is used to select the appropriate media interface manager and appropriate user interface. The user interface, or “skin,” is selected for compatibility with the media interface manager and selected CODEC. The appropriate controls are enabled and displayed for user operation. As new CODECs are added to the system, appropriate media interface managers and skins are also added to provide the necessary user interface compatibility.
    Type: Application
    Filed: September 26, 2008
    Publication date: January 29, 2009
    Inventors: Edward C. Miller, Mark E. Phillips
  • Publication number: 20090030704
    Abstract: An acoustic signal encoding device for down-mixing at different ratios to encode a multichannel signal with a small number of channels, and an acoustic signal decoding device for decoding the signal encoded by the acoustic signal encoding device. In these devices, weighting means (103) in the acoustic signal encoding device (100) weights input signals of two channels individually according to a down-mixing coefficient thereby to calculate the level difference of the signals of two channels weighted by a level difference calculation unit (104). A separating unit (202) in the acoustic signal decoding device (200) separates the down-mixed signals into signals of two channels with the level difference information weighted.
    Type: Application
    Filed: October 13, 2005
    Publication date: January 29, 2009
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventors: Yoshiaki Takagi, Naoya Tanaka
  • Publication number: 20090030702
    Abstract: In one embodiment, the method includes receiving an audio data frame having at least one channel. The channel is subdivided into a plurality of blocks, and at least two of the blocks are capable of having different lengths. The embodiment further includes obtaining indicator information indicating whether determining of a prediction order for each block is allowed, and determining the prediction order from the audio signal indicating the prediction order for each block if the indicator information indicates that determining of the prediction order for each block is allowed. The channel is decoded using the prediction order.
    Type: Application
    Filed: September 19, 2008
    Publication date: January 29, 2009
    Inventor: Tilman Liebchen
  • Publication number: 20090024387
    Abstract: In order to enhance the quality of a communication signal derived from speech and noise, a filter divides the communication signal into a plurality of frequency band signals. A calculator generates a plurality of power band signals each having a power band value and corresponding to one of the frequency band signals. The power band values are based on estimating, over a time period, the power of one of the frequency band signals. The time period is different for different ones of the frequency band signals. The power band values are used to calculate weighting factors which are used to alter the frequency band signals that are combined to generate an improved communication signal.
    Type: Application
    Filed: August 7, 2008
    Publication date: January 22, 2009
    Applicant: Tellabs Operations, Inc.
    Inventors: Ravi Chandran, Bruce E. Dunne, Daniel J. Marchok
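    Illustration (not taken from the application): a minimal sketch of per-band power estimation over band-specific time periods, followed by weighting and recombination. It assumes the filter bank has already produced the band signals. The names `band_weights` and `enhance`, the per-band `noise_floor` values, and the Wiener-like gain rule are assumptions for this sketch; the abstract does not say how the weighting factors are derived.

```python
def band_weights(band_signals, windows, noise_floor):
    """Estimate the power of each band over its own averaging window (in
    samples) and turn it into a weighting factor between 0 and 1.
    The gain rule 1 - noise/power is an assumed, Wiener-like choice."""
    weights = []
    for samples, window, floor in zip(band_signals, windows, noise_floor):
        recent = samples[-window:]  # the time period differs per band
        power = sum(x * x for x in recent) / len(recent)
        gain = max(0.0, 1.0 - floor / power) if power > 0.0 else 0.0
        weights.append(gain)
    return weights


def enhance(band_signals, weights):
    """Scale each band by its weighting factor and sum the bands sample by
    sample to form the output signal."""
    length = len(band_signals[0])
    return [
        sum(w * band[i] for w, band in zip(weights, band_signals))
        for i in range(length)
    ]
```

    A band whose estimated power sits at the assumed noise floor receives a weight near zero, while a band well above the floor passes almost unchanged.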
  • Publication number: 20090024400
    Abstract: A multiplexed audio data decoder apparatus is provided in which integration of an audio decoder is easy, and has a high flexibility when the number of the formats to be processed is increased or when the specification is changed. In an external ROM 60 there are accumulated a plurality of decoding program codes corresponding to respective plural methods for compressing and encoding. A controller means 50 transfers the decoding program code corresponding to the method for compressing and encoding after changing thereof, from the external ROM 60 to an internal RAM 25. A DSP 22 starts decoding processing by using the decoding program code which is transmitted into the internal RAM 25.
    Type: Application
    Filed: September 23, 2008
    Publication date: January 22, 2009
    Inventors: Yukio Fujii, Shinichi Obata, Hiroaki Shirane, Eiji Yamamoto
  • Publication number: 20090024386
    Abstract: A method comprises analyzing each frame of a plurality of frames of the speech signal to determine one or more speech parameters for the speech signal; deciding, for each frame of the plurality of frames of the speech signal, based on the one or more speech parameters of the speech signal, to select one of a plurality of encoding modes including a first encoding mode and a second encoding mode for encoding each frame of the plurality of frames of the speech signal; encoding each frame of the plurality of frames of the speech signal according to the selected one of the plurality of encoding modes for each frame of the plurality of frames in the deciding; the first encoding mode supports a first encoding rate and the second encoding mode supports a second encoding rate, wherein the first encoding rate is the same encoding rate as the second encoding rate.
    Type: Application
    Filed: August 20, 2008
    Publication date: January 22, 2009
    Inventors: Huan-Yu Su, Yang Gao
  • Publication number: 20090024391
    Abstract: According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface is facilitated to implement a hands free, voice driven environment to control processes and applications. A natural language model is used to parse voice initiated commands and data, and to route those voice initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing the occurrence of distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas.
    Type: Application
    Filed: September 29, 2008
    Publication date: January 22, 2009
    Applicant: EASTERN INVESTMENTS, LLC
    Inventors: Richard Grant, Pedro E. McGregor
  • Publication number: 20090006087
    Abstract: A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data including a ratio between the respective pronunciation times of words included in the received text in the generated synthetic speech is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device.
    Type: Application
    Filed: June 25, 2008
    Publication date: January 1, 2009
    Inventors: Noriko Imoto, Tetsuya Uda, Takatoshi Watanabe
  • Publication number: 20080312916
    Abstract: The intelligibility of speech signals is improved in the many situations where a voice signal is communicated or stored. Means and methods are disclosed for developing a scheme with high voice signal intelligibility without sacrifice of voice quality. The disclosed method comprises certain steps, including, but not limited to: Learning the noise on near-end side and enhancing the far-end voice as a function of the noise level on the near-end side. The disclosed method and apparatus are especially useful to increase the intelligibility of the cell phone's loudspeaker output. The invention includes the processing of an input speech signal to generate an enhanced intelligent signal. In frequency domain, the FFT spectrum of the speech received from the far-end is modified in accordance with the LPC spectrum of the local background noise to generate an enhanced intelligent signal. In time domain, the speech is modified in accordance with the LPC coefficients of the noise to generate an enhanced intelligent signal.
    Type: Application
    Filed: June 15, 2008
    Publication date: December 18, 2008
    Inventors: Alon Konchitsky, Alberto D. Berstein, Hariharan Ganapathy Kathirvelu, Sandeep Kulakcherla, William Martin Ribble
  • Publication number: 20080306733
    Abstract: A noise reducing circuit includes a denoising unit configured to eliminate a noise band from an input voice signal; a noise recognizing unit configured to recognize noise included in the voice signal; a denoising period generating unit configured to generate a signal indicating a denoising period in accordance with an occurrence period of the recognized noise; and a selecting unit configured to select an output of the denoising unit when the denoising period is indicated and select the voice signal when the denoising period is not indicated.
    Type: Application
    Filed: March 13, 2008
    Publication date: December 11, 2008
    Applicant: Sony Corporation
    Inventor: Kazuhiko OZAWA
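    Illustration (not taken from the application): the denoising-period generation and output selection described in the abstract above can be sketched as follows. The names `denoising_periods` and `select_output`, the per-sample noise flags, and the `hold` extension are assumptions for this sketch; the abstract only states that the period follows the occurrence period of the recognized noise.

```python
def denoising_periods(noise_flags, hold):
    """Turn per-sample noise detections into a denoising-period signal.
    Each detection keeps the period active for `hold` further samples
    (the hold time is an assumption, not from the abstract)."""
    active = [False] * len(noise_flags)
    remaining = 0
    for i, flag in enumerate(noise_flags):
        if flag:
            remaining = hold + 1
        if remaining > 0:
            active[i] = True
            remaining -= 1
    return active


def select_output(voice, denoised, active):
    """Pick the denoised sample while the denoising period is indicated,
    otherwise pass the original voice sample through unchanged."""
    return [d if a else v for v, d, a in zip(voice, denoised, active)]
```

    Passing the unprocessed signal outside the denoising period means the voice is only altered while noise is actually present.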
  • Publication number: 20080300866
    Abstract: The invention concerns a system (300) and method (400) for bandwidth extension of voice for improving the quality of voice in a communication system. The method and system include the steps of filtering (402) a wideband voice signal to produce a first filtered signal (301) and a second filtered signal (331), vocoding (404) the first filtered signal to produce a narrowband vocoded signal (130), compensating (406) the second filtered signal for time alignment with the narrowband vocoded signal, and adding (335) the narrowband vocoded signal with the second filtered signal to produce a wideband vocoded signal (250). One or more features from the wideband vocoded signal can be extracted to create a wideband feature vector (147) for storage in a wideband vocoded speech database (220).
    Type: Application
    Filed: May 31, 2006
    Publication date: December 4, 2008
    Applicant: MOTOROLA, INC.
    Inventors: Adeel Mukhtar, Deepak P. Ahya
  • Publication number: 20080300886
    Abstract: In embodiments of the present invention, a system and method for enabling a user to interact with a computer platform using a voice command may comprise the steps of defining a structured grammar for handling a global voice command, defining a global voice command of the structured grammar wherein the global voice command enables access to an object of the computer platform using a single command, and mapping at least one function of the object to the global voice command, wherein upon receiving voice input from the user of the computer platform the object recognizes the global voice command and controls the function.
    Type: Application
    Filed: May 19, 2008
    Publication date: December 4, 2008
    Inventor: Kimberly Patch
  • Publication number: 20080294430
    Abstract: A noise reduction device is configured by use of: means for calculating a predetermined constant, and a predetermined reference signal Rω(T) in the frequency domain, respectively by use of adaptive coefficients Wω(m), and for thereby obtaining estimated values Nω and Qω(T) respectively of stationary noise components, and non-stationary noise components corresponding to the reference signal, which are included in a predetermined observed signal Xω(T) in the frequency domain; means for applying a noise reduction process to the observed signal on the basis of each of the estimated values, and for updating each of the adaptive coefficients on the basis of a result of the process; and an adaptive learning means for repeating the obtaining of the estimated values and the updating of the adaptive coefficients, and for thereby learning each of the adaptive coefficients.
    Type: Application
    Filed: August 5, 2008
    Publication date: November 27, 2008
    Inventor: Osamu Ichikawa
  • Publication number: 20080288259
    Abstract: The disclosed speech recognition system enables users to define personalized, context-aware voice commands without extensive software development. Command sets may be defined in a user-friendly language and stored in an eXtensible Markup Language (XML) file. Each command object within the command set may include one or more user configurable actions, one or more configurable rules, and one or more configurable conditions. The command sets may be managed by a command set loader that loads and processes each command set into computer executable code. The command set loader may enable and disable command sets. A macro processing component may provide a speech recognition grammar to an API of the speech recognition engine based on currently enabled commands. When the speech recognition engine recognizes user speech consistent with the grammar, the macro processing component may initiate the one or more computer executable actions.
    Type: Application
    Filed: March 18, 2008
    Publication date: November 20, 2008
    Applicant: Microsoft Corporation
    Inventors: Robert L. Chambers, Brian King
  • Publication number: 20080281588
    Abstract: A speech processing apparatus includes a spectrum envelope extracting unit which extracts the spectrum envelope of an input speech signal, a spectrum envelope deforming unit which applies deformation to the spectrum envelope to generate a deformed spectrum envelope, a spectrum fine structure extracting unit which extracts the spectrum fine structure of the input speech signal, a deformed spectrum generating unit which generates a deformed spectrum by combining the deformed spectrum envelope with the spectrum fine structure, and a speech generating unit which generates an output speech signal on the basis of the deformed spectrum. This apparatus emits a disrupting sound based on the output speech signal to prevent a third party from eavesdropping on a conversation.
    Type: Application
    Filed: August 31, 2007
    Publication date: November 13, 2008
    Applicants: JAPAN ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY, GLORY LTD.
    Inventors: Masato Akagi, Rieko Futonagane, Yoshihiro Irie, Hisakazu Yanagiuchi, Yoshitane Tanaka
  • Publication number: 20080281589
    Abstract: There is disclosed a noise suppression device capable of improving the noise suppression accuracy while reducing the audio distortion. In this device, a suppression unit suppresses a noise component from the audio power spectrum by using the detection result of the audio-existing band and the noise band in the audio power spectrum including the noise component. A pitch harmonic structure extracting unit (105) extracts a pitch harmonic power spectrum from the audio power spectrum. An audio-existence judgment unit (106) judges whether the audio power spectrum has audio existence according to the extracted pitch harmonic power spectrum. A pitch harmonic structure repair unit (108) repairs the extracted pitch harmonic power spectrum.
    Type: Application
    Filed: May 30, 2005
    Publication date: November 13, 2008
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventors: Youhua Wang, Takuya Kawashima, Koji Yoshida
  • Publication number: 20080275710
    Abstract: The present invention relates to a method, device (12) and computer program product for enabling detection of additional data embedded in a media signal that may have been subjected to scaling. The invention also relates to an additional data detecting device (10) comprising such a device for enabling detection. An envelope discriminating unit (ED) provides a first extracted narrow band envelope signal sample (we[n]) from an input media signal sample (yb[n]), and a variable scale down sampling unit (VSDS) down samples the narrow band envelope signal sample using a down sampling rate that is dependent on a scaling factor variable value (?) for providing at least one sample of a first additional data estimate (wn[k]) in order to allow the detection of additional data in said signal sample.
    Type: Application
    Filed: June 29, 2005
    Publication date: November 6, 2008
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.
    Inventors: Aweke Negash Lemma, Leon Maria Van De Kerkhof, Javier Francisco Aprea
  • Publication number: 20080270126
    Abstract: Provided are a vocal-cord signal recognition apparatus and a method thereof. The vocal-cord signal recognition apparatus includes a vocal-cord signal extracting unit for analyzing a feature of a vocal-cord signal inputted through a throat microphone, and extracting a vocal-cord feature vector from the vocal-cord signal using the analyzing data; and a vocal-cord signal recognition unit for recognizing the vocal-cord signal by extracting the feature of the vocal-cord signal using the vocal-cord signal feature vector extracted at the vocal-cord signal extracting unit.
    Type: Application
    Filed: October 19, 2006
    Publication date: October 30, 2008
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Young-Giu Jung, Mun-Sung Han, Kwan-Hyun Cho, Jun-Seok Park
  • Publication number: 20080270124
    Abstract: Provided is a method of encoding an audio/speech signal, the method including determining a variable length of a frame, that is, a processing unit of an input signal in accordance with a position of an attack in the input signal; transforming each frame of the input signal to a frequency domain and dividing the frame into a plurality of sub frequency bands; and, if a signal of a sub frequency band is determined to be encoded in the frequency domain, encoding the signal of the sub frequency band in the frequency domain, and if the signal of the sub frequency band is determined to be encoded in a time domain, inverse transforming the signal of the sub frequency band to the time domain and encoding the inverse transformed signal in the time domain. According to the present invention, the audio/speech signal may be efficiently encoded by controlling time resolution and frequency resolution.
    Type: Application
    Filed: October 15, 2007
    Publication date: October 30, 2008
    Applicant: Samsung Electronics Co., Ltd
    Inventors: Chang-yong SON, Eun-mi Oh, Jung-hoe Kim, Ho-sang Sung, Kang-eun Lee, Ki-hyun Choo
  • Publication number: 20080262854
    Abstract: Methods and apparatuses for encoding and decoding a multi-channel audio signal are provided. In the encoding method, spatial information that is calculated based on a multi-channel audio signal and a downmix signal is encoded, and additional configuration information is generated based on information that is selected from the encoded spatial information. The downmix signal is encoded, and then, a bitstream is generated by combining the encoded downmix signal with the encoded spatial information. Thereafter, the additional configuration information is inserted into the bitstream. Therefore, it is possible to configure an optimum bitstream according to the circumstances by retransmitting all or part of information included in a header.
    Type: Application
    Filed: October 20, 2006
    Publication date: October 23, 2008
    Applicant: LG ELECTRONICS, INC.
    Inventors: Yang-Won Jung, Hee Suk Pang, Hyen-O Oh, Dong Soo Kim, Jae Hyun Lim
  • Publication number: 20080255856
    Abstract: An audio encoder (109) has a hierarchical encoding structure and generates a data stream comprising one or more audio channels as well as parametric audio encoding data. The encoder (109) comprises an encoding structure processor (305) which inserts decoder tree structure data into the data stream. The decoder tree structure data comprises at least one data value indicative of a channel split characteristic for an audio channel at a hierarchical layer of the hierarchical decoder structure and may specifically specify the decoder tree structures to be applied by a decoder. A decoder (115) comprises a receiver (401) which receives the data stream and a decoder structure processor (405) for generating the hierarchical decoder structure in response to the decoder tree structure data. A decode processor (403) then generates output audio channels from the data stream using the hierarchical decoder structure.
    Type: Application
    Filed: July 7, 2006
    Publication date: October 16, 2008
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.
    Inventors: Erik Gosuinus Petrus Schuijers, Gerard Herman Hotho, Heiko Purnhagen, Wolfgang Schildbach, Holger Horich, Hans Magnus Kristofer Kjorling, Karl Jonas Roden