Sound Editing Patents (Class 704/278)
  • Patent number: 8515759
    Abstract: An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelator signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and a decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for a high quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: August 20, 2013
    Assignee: Dolby International AB
    Inventors: Jonas Engdegard, Heiko Purnhagen, Barbara Resch, Lars Villemoes, Cornelia Falch, Juergen Herre, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev
  • Patent number: 8515753
    Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations in a non-native speaker's speech, acoustic models are adapted in a state-tying step during a training process of acoustic models. When the present invention for adapting acoustic models and a conventional acoustic model adaptation scheme are combined, further enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: August 20, 2013
    Assignee: Gwangju Institute of Science and Technology
    Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
  • Publication number: 20130211845
    Abstract: A method for automatically generating at least one voice message with a desired voice expression, starting from a prestored voice message, including: assigning a vocal category to one word or to groups of words of the prestored message; computing, based on a vocal category/vocal parameter correlation table, a predetermined level of each of the vocal parameters; and emitting said voice message with the vocal parameter levels computed for each word or group of words.
    Type: Application
    Filed: January 24, 2013
    Publication date: August 15, 2013
    Applicant: LA VOCE.NET DI CIRO IMPARATO
    Inventor: LA VOCE.NET DI CIRO IMPARATO
  • Patent number: 8510112
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 13, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann Syrdal
  • Patent number: 8484035
    Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more of controllable parameters of input audio voice signal to produce a modified output audio voice signal in which said selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: July 9, 2013
    Assignee: Massachusetts Institute of Technology
    Inventor: Alex Paul Pentland
  • Patent number: 8478599
    Abstract: An embodiment of the present invention is a method of presenting a media work which includes: detecting media work content properties in a portion of the media work; associating a presentation rate of the portion with the detected media work content properties; and presenting the portion at the presentation rate; wherein the media work content properties include one or more of: (a) indicia of a number of syllables in utterances; (b) indicia of a number of letters in a word; (c) indicia of the complexity of grammatical structures in portions of the media work; (d) indicia of arrival rate of newly presented objects; (e) indicia of temporal proximity between events in portions of the media work; or (f) indicia of number of phonemes per unit of time in portions of the media work.
    Type: Grant
    Filed: May 18, 2009
    Date of Patent: July 2, 2013
    Assignee: Enounce, Inc.
    Inventor: Donald J. Hejna, Jr.
  • Patent number: 8463612
    Abstract: Various embodiments of the invention provide a facility for monitoring audio events on a computer, including without limitation voice conversations (which often are carried on a digital transport platform, such as VoIP and/or other technologies). In a set of embodiments, a system intercepts the audio streams that flow into and out of an application program on a monitored client computer, for instance by inserting an audio stream capture program between a monitored application and the function calls in the audio driver libraries used by the application program to handle the audio streams. In some cases, this intercept does not disrupt the normal operation of the application. Optionally, the audio stream capture program takes the input and output audio streams and passes them through audio mixer and audio compression programs to yield a condensed recording of the original conversation.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: June 11, 2013
    Assignee: Raytheon Company
    Inventors: Greg S. Neath, John W. Rosenvall
  • Patent number: 8457688
    Abstract: A mobile wireless communications device may include a housing, a wireless transceiver carried by the housing, an audio transducer carried by the housing, and a novelty voice alteration processor carried by the housing and coupled to the wireless transceiver and the audio transducer and configured to alter voice communications. For example, the novelty voice alteration processor may comprise a memory and a processor cooperating therewith to alter the voice communications.
    Type: Grant
    Filed: February 26, 2009
    Date of Patent: June 4, 2013
    Assignee: Research In Motion Limited
    Inventors: Fredrik Stenmark, Daniel Hanson
  • Patent number: 8457771
    Abstract: A data stream is filtered to produce a filtered data stream. The data stream is analyzed based on an acoustic parameter to determine whether a predetermined condition is satisfied. At least one extraneous portion of the data stream, in which the predetermined condition is satisfied, is determined. Thereafter, the at least one extraneous portion is deleted from the data stream to produce the filtered data stream.
    Type: Grant
    Filed: December 10, 2009
    Date of Patent: June 4, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, I. Dan Melamed, Bernard S. Renger, Steven Neil Tischer
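As a concrete (and deliberately simplified) illustration of this kind of filtering, the sketch below assumes the acoustic parameter is short-time frame energy and the predetermined condition is that energy falls below a dB floor; the patent itself fixes neither choice, and the `filter_stream` name is hypothetical:

```python
import numpy as np

def filter_stream(samples, rate, frame_ms=20, energy_db_floor=-40.0):
    """Delete frames whose short-time energy satisfies the condition
    (here: RMS level below a dB floor) and return the filtered stream."""
    frame = int(rate * frame_ms / 1000)
    kept = []
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        rms = np.sqrt(np.mean(chunk ** 2)) + 1e-12
        if 20 * np.log10(rms) > energy_db_floor:  # keep non-extraneous frames
            kept.append(chunk)
    return np.concatenate(kept) if kept else np.zeros(0)
```

Applied to a stream with a loud second followed by a silent second, only the loud portion survives.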
  • Patent number: 8452604
    Abstract: Recognizable visual and/or audio artifacts, such as recognizable sounds, are introduced into visual and/or audio content in an identifying pattern to generate a signed visual and/or audio recording for distribution over a digital communications medium. A library of images and/or sounds may be provided, and the image and/or sounds from the library may be selectively inserted to generate the identifying pattern. The images and/or sounds may be inserted responsive to one or more parameters associated with creation of the visual and/or audio content. A representation of the identifying pattern may be generated and stored in a repository, e.g., an independent repository configured to maintain creative rights information. The stored pattern may be retrieved from the repository and compared to an unidentified visual and/or audio recording to determine an identity thereof.
    Type: Grant
    Filed: August 15, 2005
    Date of Patent: May 28, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Steven Tischer
  • Patent number: 8447604
    Abstract: Provided in some embodiments is a method including receiving ordered script words indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with the dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that match consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard alignment points and including the corresponding sub-sets of the script and dialogue words between those points, and aligning each of the sub-matrices.
    Type: Grant
    Filed: May 28, 2010
    Date of Patent: May 21, 2013
    Assignee: Adobe Systems Incorporated
    Inventor: Walter W. Chang
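One simple way to picture the hard alignment points is as word n-grams that occur exactly once in both the script and the transcribed dialogue. The sketch below uses that heuristic as an illustrative stand-in for the patent's matrix-based alignment; `hard_anchors` is a hypothetical name:

```python
def hard_anchors(script, dialogue, n=3):
    """Return (script_index, dialogue_index) pairs for n-gram sequences
    that occur exactly once in both word lists."""
    def ngrams(words):
        table = {}
        for i in range(len(words) - n + 1):
            table.setdefault(tuple(words[i:i + n]), []).append(i)
        return table
    s, d = ngrams(script), ngrams(dialogue)
    return sorted((s[g][0], d[g][0]) for g in s
                  if g in d and len(s[g]) == 1 and len(d[g]) == 1)
```

The regions between consecutive anchors would then be aligned independently, mirroring the sub-matrix partitioning step.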
  • Patent number: 8438035
    Abstract: When there are missing voice-transmission-signals, a repetition-section calculating unit sets a plurality of repetition sections of different lengths that are determined to be similar to the voice-transmission-signals preceding the missing voice-transmission-signal, the repetition sections being determined with respect to stationary voice-transmission-signals stored in a normal signal storage unit, the stationary voice-transmission-signals being selected from the previously input voice-transmission-signals. A controller generates a concealment signal using the repetition sections.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: May 7, 2013
    Assignee: Fujitsu Limited
    Inventors: Kaori Endo, Yasuji Ota, Chikako Matsumoto
  • Patent number: 8433988
    Abstract: A method and apparatus are capable of masking a signal loss condition. According to an exemplary embodiment, the method includes steps of receiving a signal, detecting a period of loss of the signal, and enabling a received portion of the signal to be reproduced continuously and causing a portion of the signal lost during the period to be skipped.
    Type: Grant
    Filed: December 3, 2008
    Date of Patent: April 30, 2013
    Assignee: Thomson Licensing
    Inventors: Mark Alan Schultz, Ronald Douglas Johnson
  • Patent number: 8433073
    Abstract: In a sound effect applying apparatus, an input part frequency-analyzes an input signal of sound or voice for detecting a plurality of local peaks of harmonics contained in the input signal. A subharmonics provision part adds a spectrum component of subharmonics between the detected local peaks so as to provide the input signal with a sound effect. An output part converts the input signal of a frequency domain containing the added spectrum component into an output signal of a time domain for generating the sound or voice provided with the sound effect.
    Type: Grant
    Filed: June 22, 2005
    Date of Patent: April 30, 2013
    Assignee: Yamaha Corporation
    Inventors: Yasuo Yoshioka, Alex Loscos
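A rough sketch of the subharmonics idea, assuming a single FFT frame, naive local-peak picking, and a made-up mixing gain (real peak detection and phase handling are considerably more involved than this):

```python
import numpy as np

def add_subharmonics(signal, gain=0.3):
    """Insert a spectral component halfway between each pair of adjacent
    harmonic peaks, then return to the time domain."""
    spec = np.fft.rfft(signal)
    mag = np.abs(spec)
    # crude local-peak detection on the magnitude spectrum
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] > mag[i - 1] and mag[i] > mag[i + 1]
             and mag[i] > 0.1 * mag.max()]
    for a, b in zip(peaks, peaks[1:]):
        mid = (a + b) // 2
        spec[mid] += gain * 0.5 * (spec[a] + spec[b])  # subharmonic between peaks
    return np.fft.irfft(spec, n=len(signal))
```

For a two-tone input at 400 Hz and 800 Hz, a new component appears near 600 Hz while the original peaks are preserved.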
  • Patent number: 8401861
    Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.
    Type: Grant
    Filed: January 17, 2007
    Date of Patent: March 19, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Shuang Zhi Wei, Raimo Bakis, Ellen Marie Eide, Liqin Shen
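The piecewise linear warping function can be sketched directly from the formant pairs. The sketch below assumes the formants are already extracted and simply anchors a linear interpolation at the corresponding frequencies (0 and the Nyquist frequency are added as fixed endpoints):

```python
import numpy as np

def make_warp_function(src_formants, tgt_formants, nyquist=8000.0):
    """Build a piecewise-linear frequency warping function whose key
    positions are corresponding source/target formant frequencies."""
    xs = [0.0] + sorted(src_formants) + [nyquist]
    ys = [0.0] + sorted(tgt_formants) + [nyquist]
    return lambda f: np.interp(f, xs, ys)

# example: map a source speaker's first three formants onto a target's
warp = make_warp_function([500, 1500, 2500], [600, 1800, 2700])
```

Each source formant maps exactly onto its target, and frequencies in between are warped linearly.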
  • Patent number: 8401865
    Abstract: This invention relates to a method, a computer program product, apparatuses, and a system for extracting a coded parameter set from an encoded audio/speech stream, said audio/speech stream being distributed over a sequence of packets, and generating a time-scaled encoded audio/speech stream in the parameter-coded domain using said extracted coded parameter set.
    Type: Grant
    Filed: July 18, 2007
    Date of Patent: March 19, 2013
    Assignee: Nokia Corporation
    Inventors: Pasi Sakari Ojala, Ari Kalevi Lakaniemi
  • Patent number: 8392180
    Abstract: In general, the techniques are described for adjusting audio gain levels for multi-talker audio. In one example, an audio system monitors an audio stream for the presence of a new talker. Upon identifying a new talker, the system determines whether the new talker is a first-time talker. For a first-time talker, the system executes a fast-attack/decay automatic gain control (AGC) algorithm to quickly determine a gain value for the first-time talker. The system additionally executes standard AGC techniques to refine the gain for the first-time talker while the first-time talker continues speaking. When a steady state within a decibel threshold is attained using standard AGC for the first-time talker, the system stores the steady state gain for the first-time talker to storage. Upon identifying a previously-identified talker, the system retrieves from storage the steady state gain for the talker and applies the steady state gain to the audio stream.
    Type: Grant
    Filed: May 18, 2012
    Date of Patent: March 5, 2013
    Assignee: Google Inc.
    Inventors: Serge Lachapelle, Alexander Kjeldaas
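A toy model of the per-talker gain logic, assuming a simple one-pole update in place of the actual fast-attack/decay and standard AGC algorithms (class and parameter names are illustrative):

```python
class TalkerGainControl:
    """Per-talker gain management: fast attack for first-time talkers,
    cached steady-state gain plus slow refinement for returning ones."""

    def __init__(self, target_rms=0.1, fast_alpha=0.5, slow_alpha=0.05):
        self.steady = {}  # talker id -> stored steady-state gain
        self.target = target_rms
        self.fast, self.slow = fast_alpha, slow_alpha

    def gain_for(self, talker, frame_rms, current_gain=1.0):
        desired = self.target / max(frame_rms, 1e-9)
        if talker not in self.steady:            # first-time talker: fast attack
            g = current_gain + self.fast * (desired - current_gain)
        else:                                    # returning talker: start from cache
            g = self.steady[talker]
            g += self.slow * (desired - g)       # slow refinement while speaking
        self.steady[talker] = g
        return g
```

The cache plays the role of the stored steady-state gains: a returning talker resumes from the refined value instead of re-running the fast attack.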
  • Patent number: 8392197
    Abstract: A speaker speed conversion system includes: a risk site detection unit (22) for detecting sites of risk regarding sound quality from among speech that is received as input, a frame boundary detection unit (23) for searching for a plurality of points that can serve as candidates of frame boundaries from among speech that is received as input and, of these points, supplying as a frame boundary the point that is predicted to be best from the standpoint of sound quality, and an OLA unit (25) for implementing speed conversion based on the detection results in the frame boundary detection unit (23); wherein the frame boundary detection unit (23) eliminates, from candidates of frame boundaries, sites of risk regarding sound quality that were detected in the risk site detection unit (22).
    Type: Grant
    Filed: July 22, 2008
    Date of Patent: March 5, 2013
    Assignee: NEC Corporation
    Inventor: Satoshi Hosokawa
  • Patent number: 8392195
    Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams.
    Type: Grant
    Filed: May 31, 2012
    Date of Patent: March 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
  • Patent number: 8386251
    Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results, by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on subsequent utterances received from the user, increasing overall system efficiency and accuracy.
    Type: Grant
    Filed: June 8, 2009
    Date of Patent: February 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Nikko Strom, Julian Odell, Jon Hamaker
  • Patent number: 8380509
    Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that the link information (LI) relates to the speech data (SD) just played back, the currently marked word featuring the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) make it possible to synchronize the text cursor (TC) with the audio cursor (AC), or the audio cursor (AC) with the text cursor (TC), so that positioning of the respective cursor (AC, TC) is simplified considerably.
    Type: Grant
    Filed: February 13, 2012
    Date of Patent: February 19, 2013
    Assignee: Nuance Communications Austria GmbH
    Inventor: Wolfgang Gschwendtner
  • Patent number: 8380513
    Abstract: Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
    Type: Grant
    Filed: May 19, 2009
    Date of Patent: February 19, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr.
  • Patent number: 8374879
    Abstract: Systems and methods are described for speech systems that utilize an interaction manager to manage interactions (also known as dialogues) from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains an interaction list. An application wishing to utilize the speech system submits one or more interactions to the interaction manager. Interactions are normally processed in the order in which they are received. An exception to this rule is an interaction that is configured by an application to be processed immediately, which causes the interaction manager to place it at the front of the interaction list. If an application has designated an interaction to interrupt a currently processing interaction, then the newly submitted interaction will interrupt any interaction currently being processed and, therefore, will be processed immediately.
    Type: Grant
    Filed: December 16, 2005
    Date of Patent: February 12, 2013
    Assignee: Microsoft Corporation
    Inventors: Stephen Russell Falcon, Clement Yip, Dan Banay, David Miller
  • Patent number: 8374878
    Abstract: Audio or sound envelopes contain or are combined with a sound module for generating and playing prerecorded sound tracks upon the opening of the envelope or removal of its contents. Operation of the sound module may be activated by the opening of the envelope flap, or by removal of the envelope contents. The flap is configured with a removable strip which protects the sound module from damage before and after opening. The sound module can be replayed by repeated operation of the flap. Alternate embodiments of the audio envelopes have other structural or operational features which work in concert with the sound module.
    Type: Grant
    Filed: June 23, 2009
    Date of Patent: February 12, 2013
    Assignee: American Greetings Corporation
    Inventors: Carol Miller, Mary McClain, David Mayer, Sharon Bogdanski, Kimberly Bikowski, Theresa Muri, Julie Vojtko
  • Patent number: 8364294
    Abstract: Tools and techniques are provided to allow the user of a signal editing application to retain control over individual changes, while still relieving the user of the responsibility of manually identifying problems. Specifically, tools and techniques are provided which separate the automated finding of potential problems from the automated correction of those problems. Thus, editing is performed in two phases, referred to herein as the “analysis” phase and the “action” phase. During the analysis phase, the signal editing application automatically identifies target areas within the signal that may be of particular interest to the user. During the “action” phase, the user is presented with the results of the analysis phase, and is able to decide what action to take relative to each target area.
    Type: Grant
    Filed: August 1, 2005
    Date of Patent: January 29, 2013
    Assignee: Apple Inc.
    Inventors: Christopher J. Moulios, Nikhil M. Bhatt
  • Patent number: 8355918
    Abstract: A method (10) in a speech recognition application callflow can include the steps of assigning (11) an individual option and a pre-built grammar to a same prompt, treating (15) the individual option as a valid output of the pre-built grammar if the individual option is a potential valid match to a recognition phrase (12) or an annotation (13) in the pre-built grammar, and treating (14) the individual option as an independent grammar from the pre-built grammar if the individual option fails to be a potential valid match to the recognition phrase or the annotation in the pre-built grammar.
    Type: Grant
    Filed: January 5, 2012
    Date of Patent: January 15, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, Felipe Gomez, James R. Lewis, Vanessa V. Michelini
  • Patent number: 8340302
    Abstract: In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial impression at the receiving end.
    Type: Grant
    Filed: April 22, 2003
    Date of Patent: December 25, 2012
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Dirk Jeroen Breebaart, Steven Leonardus Josephus Dimphina Elisabeth Van De Par
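A toy version of the parametric idea, carrying a single interchannel level-difference parameter alongside the mono downmix; real parametric coders work per time/frequency tile and also transmit phase and coherence cues, none of which appear in this sketch:

```python
import numpy as np

def encode_parametric(left, right):
    """Downmix to mono plus one spatial parameter: the left/right
    energy ratio (a stand-in for the full spatial parameter set)."""
    mono = 0.5 * (left + right)
    ild = np.sum(left ** 2) / (np.sum(right ** 2) + 1e-12)
    return mono, ild

def decode_parametric(mono, ild):
    """Rebuild two channels whose level ratio matches the parameter."""
    g = np.sqrt(ild)
    left = mono * (2 * g / (1 + g))
    right = mono * (2 / (1 + g))
    return left, right
```

For fully coherent channels differing only in level, this round trip is exact, which illustrates why a few bits of spatial data can suffice.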
  • Patent number: 8331572
    Abstract: In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial impression at the receiving end.
    Type: Grant
    Filed: July 27, 2009
    Date of Patent: December 11, 2012
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Dirk Jeroen Breebaart, Steven Leonardus Josephus Dimphina Elisabeth Van De Par
  • Patent number: 8315723
    Abstract: A recording and/or reproducing apparatus includes a microphone, a semiconductor memory, an operating section and a controller. An output signal from the microphone is written in the semiconductor memory and the written signals are read out from the semiconductor memory. The operating section performs input processing for writing a digital signal outputted by an analog/digital converter, reading out the digital signal stored in the semiconductor memory, and erasing the digital signal stored in the semiconductor memory. The controller controls the writing of the microphone output signal in the semiconductor memory based on an input from the operating section, and the readout of the digital signal stored in the semiconductor memory.
    Type: Grant
    Filed: October 3, 2005
    Date of Patent: November 20, 2012
    Assignee: Sony Corporation
    Inventor: Kenichi Iida
  • Patent number: 8311657
    Abstract: Some embodiments of the invention provide a computer system for processing an audio track. This system includes at least one DSP for processing the audio track. It also includes an application for editing the audio track. To process audio data in a first interval of the audio track, the application first requests and obtains from the DSP an impulse response parameter related to the DSP's processing of audio data. From the received impulse response parameter, the application identifies a second audio track interval that precedes the first interval. To process audio data in the first interval, the application then directs the DSP to process audio data within the first and second intervals.
    Type: Grant
    Filed: May 23, 2008
    Date of Patent: November 13, 2012
    Assignee: Apple Inc.
    Inventors: Alan C. Cannistraro, William George Stewart, Roger A. Powell, Kevin Christopher Rogers, Kelly B. Jacklin, Doug Wyatt
  • Patent number: 8306828
    Abstract: An audio signal expansion and compression method for expanding and compressing an audio signal in a time domain, includes the steps of setting an initial value of a signal comparison length of a first comparison interval and a second comparison interval, used for detection of two similar waveforms in the audio signal, equal to or larger than a minimum waveform detection length, determining an interval length of the two similar waveforms while changing a shift amount of the first comparison interval and the second comparison interval so that the shift amount does not exceed the signal comparison length, and expanding or compressing the audio signal in the time domain on the basis of the interval length of the two similar waveforms.
    Type: Grant
    Filed: May 10, 2007
    Date of Patent: November 6, 2012
    Assignee: Sony Corporation
    Inventors: Osamu Nakamura, Mototsugu Abe, Masayuki Nishiguchi
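The similar-waveform search at the heart of such time-domain methods can be sketched as a normalized cross-correlation over a comparison interval; the fixed comparison length and shift range below are illustrative simplifications of the patent's variable-length scheme:

```python
import numpy as np

def best_similar_shift(signal, pos, min_len=80, max_shift=400):
    """Find the shift at which the waveform after `pos` best repeats
    itself, i.e. the interval length of two similar waveforms."""
    ref = signal[pos:pos + min_len]
    best_shift, best_score = min_len, -np.inf
    for shift in range(min_len, max_shift):
        cand = signal[pos + shift:pos + shift + min_len]
        if len(cand) < min_len:
            break
        score = np.dot(ref, cand) / (
            np.linalg.norm(ref) * np.linalg.norm(cand) + 1e-12)
        if score > best_score:
            best_score, best_shift = score, shift
    return best_shift
```

Expansion would then repeat, and compression would skip, one detected interval at a time.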
  • Patent number: 8300851
    Abstract: A method of managing a sound source in a digital AV device, and an apparatus thereof, are provided. The method of managing a sound source in a digital AV device includes: extracting at least one sound source from sound being reproduced through the digital AV device; mapping an image to the extracted sound source; and managing the sound sources by using the mapped image. In addition, preferably, the extracted sound source is registered, changed, deleted, selectively reproduced, or selectively deleted by using the image. Accordingly, sound being output can be visually managed by handling the sound sources separately, and a desired sound source can be selectively reproduced or removed, such that the utilization of the digital AV device is enhanced.
    Type: Grant
    Filed: November 10, 2005
    Date of Patent: October 30, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung-eun Shin, Eun-ha Lee
  • Patent number: 8301443
    Abstract: A computer implemented method, apparatus, and computer program product for generating audio cohorts. An audio analysis engine receives audio data from a set of audio input devices. The audio data is associated with a plurality of objects. The audio data comprises a set of audio patterns. The audio data is processed to identify attributes of the audio data to form digital audio data. The digital audio data comprises metadata describing the attributes of the audio data. A set of audio cohorts is generated using the digital audio data and cohort criteria. Each audio cohort in the set of audio cohorts comprises a set of objects from the plurality of objects that share at least one audio attribute in common.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: October 30, 2012
    Assignee: International Business Machines Corporation
    Inventors: Robert Lee Angell, Robert R Friedlander, James R Kraemer
  • Patent number: 8296154
    Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels, in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in the perception of low-intensity short-duration speech features in said signal.
    Type: Grant
    Filed: October 28, 2008
    Date of Patent: October 23, 2012
    Assignee: Hearworks Pty Limited
    Inventors: Andrew E. Vandali, Graeme M. Clark
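A minimal sketch of an envelope-curvature gain of this kind, assuming the gain is simply one plus the positive-clipped second difference scaled by a constant (the patent's actual gain function is not reproduced here):

```python
import numpy as np

def transient_emphasis(env, k=2.0):
    """Derive a per-sample gain from the positive part of the envelope's
    second difference, boosting short-duration transitions."""
    d2 = env[2:] - 2.0 * env[1:-1] + env[:-2]    # discrete 2nd derivative
    gain = 1.0 + k * np.clip(d2, 0.0, None)      # boost upward curvature only
    return np.concatenate(([1.0], gain, [1.0]))  # unity gain at the edges
```

A brief spike in an otherwise flat envelope gets boosted on its rising and falling edges, while steady regions pass through at unity gain.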
  • Patent number: 8296143
    Abstract: An audio waveform processing method that is high in definition and does not impart any feeling of strangeness, in which time stretch and pitch shift are performed by a vocoder method and the variation of phase over the whole waveform that the vocoder method always causes is reduced. An audio input waveform is handled as one band as it is, or is subjected to frequency band division into bands. While time stretch and pitch shift of each band waveform are performed as in conventional vocoder methods, the waveforms are combined. The combined waveform of each band is phase-synchronized at regular intervals to reduce the variation of phase. The phase-synchronized waveforms of the bands are added, thus obtaining the final output waveform.
    Type: Grant
    Filed: December 26, 2005
    Date of Patent: October 23, 2012
    Assignee: P Softhouse Co., Ltd.
    Inventor: Takuma Kudoh
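The band-split / per-band-process / recombine pipeline above can be sketched structurally. This is not a phase vocoder implementation; the FFT-bin masking used for band splitting and the `stretch_band` placeholder (standing in for the per-band vocoder stretch and phase synchronization) are assumptions for illustration:

```python
import numpy as np

def split_bands(x, n_bands):
    """Crude frequency-band split via FFT-bin masking (illustrative only;
    the patent does not specify the band-division filter)."""
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1, dtype=int)
    bands = []
    for i in range(n_bands):
        Xi = np.zeros_like(X)
        Xi[edges[i]:edges[i + 1]] = X[edges[i]:edges[i + 1]]
        bands.append(np.fft.irfft(Xi, n=len(x)))
    return bands

def process(x, n_bands, stretch_band):
    """stretch_band stands in for the per-band vocoder time stretch /
    pitch shift; the processed bands are summed into the final output."""
    stretched = [stretch_band(b) for b in split_bands(x, n_bands)]
    return np.sum(stretched, axis=0)
```

With an identity `stretch_band`, the bands sum back to the input, which checks that the split is a true partition of the spectrum.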
  • Patent number: 8285547
    Abstract: An audio font output device is disclosed that is able to effectively convert characters or text into an audio signal recognizable by the acoustic sense of human beings. The audio font output device includes a font database that stores a character corresponding to a symbol code or picture data of a symbol, together with first audio data corresponding to the symbol code; a symbol display unit that displays the character corresponding to the symbol code or the symbol based on the picture data; and an audio output unit that outputs an audio signal based on the first audio data corresponding to the symbol code.
    Type: Grant
    Filed: April 3, 2006
    Date of Patent: October 9, 2012
    Assignee: Ricoh Company, Ltd.
    Inventor: Atsushi Koinuma
  • Patent number: 8280738
    Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.
    Type: Grant
    Filed: January 31, 2011
    Date of Patent: October 2, 2012
    Assignee: Panasonic Corporation
    Inventors: Yoshifumi Hirose, Takahiro Kamai
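The two-region mixing scheme above can be sketched as follows. The log-domain mix below the boundary (standing in for per-harmonic level mixing) and the linear envelope mix above it are plausible readings, not the patent's exact formulas:

```python
import numpy as np

def mix_source_spectra(input_spec, target_spec, freqs, boundary_hz, ratio):
    """Mix input and target sound-source magnitude spectra at `ratio`
    (0 = all input, 1 = all target). Below the boundary frequency,
    levels are mixed in the log domain as a stand-in for per-harmonic
    level mixing; above it, spectra are mixed linearly. The two halves
    are joined at the boundary."""
    low = np.asarray(freqs) <= boundary_hz
    out = np.empty_like(np.asarray(input_spec, float))
    out[low] = np.exp((1 - ratio) * np.log(input_spec[low])
                      + ratio * np.log(target_spec[low]))
    out[~low] = (1 - ratio) * input_spec[~low] + ratio * target_spec[~low]
    return out
```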
  • Patent number: 8275603
    Abstract: A speech translating apparatus includes an input unit, a speech recognizing unit, a translating unit, a first dividing unit, a second dividing unit, an associating unit, and an outputting unit. The input unit inputs a speech in a first language. The speech recognizing unit generates a first text from the speech. The translating unit translates the first text into a second language and generates a second text. The first dividing unit divides the first text and generates first phrases. The second dividing unit divides the second text and generates second phrases. The associating unit associates semantically equivalent phrases within each group of phrases. The outputting unit sequentially outputs the associated phrases in a phrase order within the second text.
    Type: Grant
    Filed: September 4, 2007
    Date of Patent: September 25, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kentaro Furihata, Tetsuro Chino, Satoshi Kamatani
  • Patent number: 8271288
    Abstract: In a masking sound generation apparatus, a CPU analyzes a speech utterance speed of a received sound signal. Then, the CPU copies the received sound signal into a plurality of sound signals and performs the following processing on each of the sound signals. Namely, the CPU divides each of the sound signals into frames on the basis of a frame length determined on the basis of the speech utterance speed. Reverse process is performed on each of the frames to replace a waveform of the frame with a reverse waveform, and a windowing process is performed to achieve a smooth connection between the frames. Then, the CPU randomly rearranges the order of the frames and mixes the plurality of sound signals to generate a masking sound signal.
    Type: Grant
    Filed: September 22, 2011
    Date of Patent: September 18, 2012
    Assignee: Yamaha Corporation
    Inventors: Atsuko Ito, Yasushi Shimizu, Akira Miki, Masato Hata
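The masking-sound pipeline above (frame division, per-frame reversal, windowing, random reordering, and mixing of several copies) can be sketched directly. The window choice and the fixed `frame_len` parameter are assumptions; in the patent the frame length is derived from the analyzed speech utterance speed:

```python
import numpy as np

def make_masker(signal, frame_len, n_copies=3, seed=0):
    """Sketch of the masking-sound pipeline: split into frames, reverse
    each frame, apply a smoothing window, shuffle the frame order, and
    mix several independently shuffled copies."""
    rng = np.random.default_rng(seed)
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    window = np.hanning(frame_len)                # smooth frame joins
    processed = frames[:, ::-1] * window          # reverse + window
    mix = np.zeros(n * frame_len)
    for _ in range(n_copies):
        order = rng.permutation(n)                # random frame order
        mix += processed[order].ravel()
    return mix / n_copies
```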
  • Patent number: 8255221
    Abstract: Disclosed is a system and method for generating a web podcast interview that allows a single user to create his own multi-voice interview from his computer. The method allows the user to enter a set of questions from a text file using a text editor. (Answers may also be entered from a text file, although this is not the preferred embodiment.) For each question, the user may select one particular interviewer voice among a plurality of predefined interviewer voices, and by using a text-to-speech module in a text-to-speech server, each question is converted into an audio question having the selected interviewer voice. The user then preferably records answers to each audio question using a telephone, and a questions/answers sequence in a podcast-compliant format is generated.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Steve Groeger, Brian Heasman, Christopher von Koschembahr, Yuk-Lun Wong
  • Patent number: 8244542
    Abstract: A method, article of manufacture, and apparatus for monitoring a location having a plurality of audio sensors and video sensors are disclosed. In an embodiment, this comprises receiving auditory data, comparing a portion of the auditory data to a lexicon comprising a plurality of keywords to determine if there is a match to a keyword from the lexicon, and if a match is found, selecting at least one video sensor to monitor an area to be monitored. Video data from the video sensor is archived with the auditory data and metadata. The video sensor is selected by determining video sensors associated with the areas to be monitored. A lookup table is used to determine the association. Cartesian coordinates may be used to determine positions of components and their areas of coverage.
    Type: Grant
    Filed: March 31, 2005
    Date of Patent: August 14, 2012
    Assignee: EMC Corporation
    Inventors: Christopher Hercules Claudatos, William Dale Andruss, Richard Urmston, John Louis Acott
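The keyword-to-camera selection above (lexicon match, then a lookup table from monitored areas to video sensors) can be sketched as follows. The keywords, area names, and camera identifiers are illustrative, not from the patent:

```python
# Sketch of keyword-triggered camera selection: a lexicon maps keywords
# to monitored areas, and a lookup table maps areas to video sensors.
# All names are illustrative assumptions.

LEXICON = {"help": "lobby", "alarm": "vault", "fire": "lobby"}
AREA_TO_CAMERAS = {"lobby": ["cam1", "cam2"], "vault": ["cam3"]}

def select_cameras(transcript_words):
    """Return the video sensors covering every area whose keyword
    appears in the recognized audio transcript."""
    cameras = []
    for word in transcript_words:
        area = LEXICON.get(word.lower())
        if area:
            cameras.extend(AREA_TO_CAMERAS.get(area, []))
    return sorted(set(cameras))

print(select_cameras(["someone", "yelled", "HELP"]))  # ['cam1', 'cam2']
```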
  • Patent number: 8229748
    Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises receiving a video stream and an associated audio stream of a video program, detecting a portion of the video program that is not readily consumable by a visually impaired person, obtaining text associated with the portion of the video program, converting the text to a second audio stream, and combining the second audio stream with the associated audio stream.
    Type: Grant
    Filed: April 14, 2008
    Date of Patent: July 24, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Hisao M. Chang, Horst Schroeter
  • Patent number: 8229754
    Abstract: Systems, methods, and computer program products for graphically displaying audio data are provided. In some implementations, a method is provided. The method includes displaying a graphical visual representation of digital audio data, the representation displaying a feature of the audio data on a feature axis with respect to time on a time axis. The method also includes receiving an input in the graphical visual representation selecting a range with respect to the feature and automatically extending the selected range with respect to time to define a selected region of the visual representation, where the extended range with respect to time is predefined, ignoring any component of the received input with respect to time.
    Type: Grant
    Filed: October 23, 2006
    Date of Patent: July 24, 2012
    Assignee: Adobe Systems Incorporated
    Inventors: Daniel Ramirez, Todd Orler, Shenzhi Zhang, Jeffrey Garner
  • Patent number: 8223269
    Abstract: In a closed caption production device, video recognition processing of an input video signal is performed by a video recognizer. This causes a working object in video to be recognized. In addition, a sound recognizer performs sound recognition processing of an input sound signal. This causes a position of a sound source to be estimated. A controller performs linking processing by comparing information of the working object recognized by the video recognition processing with positional information of the sound source estimated by the sound recognition processing. This causes a position of a closed caption produced based on the sound signal to be set in the vicinity of the working object in the video.
    Type: Grant
    Filed: September 19, 2007
    Date of Patent: July 17, 2012
    Assignee: Panasonic Corporation
    Inventor: Isao Ikegami
  • Patent number: 8219400
    Abstract: Stereo to mono voice conferencing conversion is performed during a voice conference. Conferencing equipment receives audio for right and left channels and filters each of the channels into a plurality of bands. For each band of each channel, the equipment determines an energy level and compares each energy level for each band of the right channel to each energy level for each corresponding band of the left channel. Based on the comparison, the equipment determines which channel has more audio resulting from speech. Based on the determination, the equipment adjusts delivery of the audio from the right and left channels to a mono channel for transmission to endpoints only capable of mono audio in the voice conference.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: July 10, 2012
    Assignee: Polycom, Inc.
    Inventor: Peter L. Chu
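The per-band energy comparison above can be sketched as a simple majority vote across bands. FFT-bin grouping stands in for the patent's filter bank, and the vote rule and band count are assumptions:

```python
import numpy as np

def speech_channel(left, right, n_bands=8):
    """Decide which stereo channel carries more speech by comparing
    per-band energies; the channel that wins in a majority of bands
    is favored when adjusting the mono downmix."""
    def band_energies(x):
        mag2 = np.abs(np.fft.rfft(x)) ** 2
        edges = np.linspace(0, len(mag2), n_bands + 1, dtype=int)
        return np.array([mag2[edges[i]:edges[i + 1]].sum()
                         for i in range(n_bands)])
    votes = band_energies(left) > band_energies(right)
    return "left" if votes.sum() > n_bands / 2 else "right"
```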
  • Patent number: 8204750
    Abstract: Disclosed are Multipurpose Media Players that enable users to create transcriptions, closed captions, and/or logs of digitized recordings, that enable the presentation of transcripts, closed captions, logs, and digitized recordings in a correlated manner to users, that enable users to compose one or more scenes of a production, and that enable users to compose storyboards for a production. The multipurpose media players can be embodied within Internet browser environments, thereby providing high availability of the multipurpose players across software platforms, networks, and physical locations.
    Type: Grant
    Filed: February 14, 2006
    Date of Patent: June 19, 2012
    Assignee: Teresis Media Management
    Inventor: Keri DeWitt
  • Patent number: 8174981
    Abstract: A method of processing a transmitted encoded media data stream is disclosed. If a data element arrives prior to, or at, a predetermined playout deadline, the data element is decoded, the media represented by the decoded data element is played, and the data element is provided to a decoder state machine to update a decoder state. If a data element arrives after the predetermined playout deadline, the data element is provided to the decoder state machine to update the decoder state. In one embodiment, if the specified data element fails to arrive by the playout deadline, a subsequently received data element is saved in memory. Then, if the specified data element arrives after the predetermined playout deadline, the specified data element and the saved, subsequently received, data element are provided to the decoder state machine to update the decoder state.
    Type: Grant
    Filed: December 2, 2008
    Date of Patent: May 8, 2012
    Assignee: Broadcom Corporation
    Inventor: Wilfrid LeBlanc
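The deadline policy above can be sketched as a small dispatch routine: on-time elements are played and update the decoder state, late elements update state only, and any element saved while a late element was awaited is replayed into the state machine. The state machine here is a minimal stand-in, not a real decoder:

```python
class DecoderStateMachine:
    """Minimal stand-in for the decoder state machine: it records every
    element it is updated with, in order."""
    def __init__(self):
        self.history = []
    def update(self, element):
        self.history.append(element)

def handle(element, arrival, deadline, machine, played, saved):
    """Apply the deadline policy to one data element. `saved` holds
    elements buffered while this (late) element was awaited."""
    if arrival <= deadline:
        played.append(element)       # decode + play
        machine.update(element)
    else:
        machine.update(element)      # late: state update, no playout
        while saved:                 # replay buffered elements in order
            machine.update(saved.pop(0))
```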
  • Patent number: 8165888
    Abstract: Disclosed is a reproducing apparatus comprising: a reproduction section to reproduce reproduction data comprising sound data and/or image data; a selection section to calculate evaluation values between a link source set for the reproduction data and each of a plurality of link destinations corresponding to the link source by a predetermined arithmetic expression based on link information of the plurality of link destinations, and to select a link destination having a highest evaluation among the evaluation values out of the plurality of link destinations; and a reproduction control section to move a reproduction point of the reproduction data reproduced by the reproduction section to a position corresponding to the link destination by linking the link source with the link destination when the reproduction point reaches a given point with respect to a position corresponding to the link source, and to instruct the reproduction section to reproduce the reproduction data.
    Type: Grant
    Filed: March 14, 2008
    Date of Patent: April 24, 2012
    Assignees: The University of Electro-Communications, Funai Electric Co., Ltd.
    Inventors: Kota Takahashi, Yasuo Masaki
  • Patent number: 8155964
    Abstract: This invention includes: a voice quality feature database (101) holding voice quality features; a speaker attribute database (106) holding, for each voice quality feature, an identifier enabling a user to expect a voice quality of the voice quality feature; a weight setting unit (103) setting a weight for each acoustic feature of a voice quality; a scaling unit (105) calculating display coordinates of each voice quality feature based on the acoustic features in the voice quality feature and the weights set by the weight setting unit (103); a display unit (107) displaying the identifier of each voice quality feature on the calculated display coordinates; a position input unit (108) receiving designated coordinates; and a voice quality mix unit (110) (i) calculating a distance between (1) the received designated coordinates and (2) the display coordinates of each of a part or all of the voice quality features, and (ii) mixing the acoustic features of the part or all of the voice quality features together based
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: April 10, 2012
    Assignee: Panasonic Corporation
    Inventors: Yoshifumi Hirose, Takahiro Kamai
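The distance-based mixing in the abstract can be sketched with one plausible weighting: acoustic feature vectors weighted inversely by the distance from the user's designated display coordinates. The patent only says the mix is "based on" the distances, so the inverse-distance rule and data layout are assumptions:

```python
import numpy as np

def mix_by_distance(designated, features):
    """Mix acoustic feature vectors with weights inversely proportional
    to the distance between the designated display coordinates and each
    voice quality's display coordinates (illustrative weighting).

    features: list of dicts with 'xy' display coordinates and an
    'acoustic' feature vector."""
    coords = np.array([f["xy"] for f in features], float)
    vecs = np.array([f["acoustic"] for f in features], float)
    d = np.linalg.norm(coords - np.asarray(designated, float), axis=1)
    w = 1.0 / (d + 1e-9)     # small epsilon avoids division by zero
    w /= w.sum()
    return w @ vecs
```

Clicking exactly on one voice quality's coordinates yields (approximately) that voice quality's features, which matches the intuition of the display.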
  • Patent number: 8145497
    Abstract: Provided are a user interface for processing digital data, a method for processing a media interface, and a recording medium thereof. The user interface is used for converting a selected script into voice to generate digital data having a form of a voice file corresponding to the script, or for managing the generated digital data. In the method, the user interface is displayed. The user interface includes at least a text window on which a script to be converted into voice is written, and an icon to be selected for converting the script written on the text window into voice.
    Type: Grant
    Filed: July 10, 2008
    Date of Patent: March 27, 2012
    Assignee: LG Electronics Inc.
    Inventors: Tae Hee Ahn, Sung Hun Kim, Dong Hoon Lee