Sound Editing Patents (Class 704/278)
  • Patent number: 8515759
    Abstract: An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelator signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and a decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for a high quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: August 20, 2013
    Assignee: Dolby International AB
    Inventors: Jonas Engdegard, Heiko Purnhagen, Barbara Resch, Lars Villemoes, Cornelia Falch, Juergen Herre, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev
  • Patent number: 8515753
    Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations in a non-native speaker's speech, acoustic models are adapted in a state-tying step during a training process of acoustic models. When the present invention for adapting acoustic models and a conventional acoustic model adaptation scheme are combined, further enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: August 20, 2013
    Assignee: Gwangju Institute of Science and Technology
    Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
  • Publication number: 20130211845
    Abstract: A method for automatically generating at least one voice message with a desired voice expression, starting from a prestored voice message, including: assigning a vocal category to one word or to groups of words of the prestored message; computing, based on a vocal category/vocal parameter correlation table, a predetermined level of each of the vocal parameters; and emitting said voice message with the vocal parameter levels computed for each word or group of words.
    Type: Application
    Filed: January 24, 2013
    Publication date: August 15, 2013
    Applicant: LA VOCE.NET DI CIRO IMPARATO
    Inventor: LA VOCE.NET DI CIRO IMPARATO
  • Patent number: 8510112
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 13, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann Syrdal
  • Patent number: 8484035
    Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more of controllable parameters of input audio voice signal to produce a modified output audio voice signal in which said selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: July 9, 2013
    Assignee: Massachusetts Institute of Technology
    Inventor: Alex Paul Pentland
  • Patent number: 8478599
    Abstract: An embodiment of the present invention is a method of presenting a media work which includes: detecting media work content properties in a portion of the media work; associating a presentation rate of the portion with the detected media work content properties; and presenting the portion at the presentation rate; wherein the media work content properties include one or more of: (a) indicia of a number of syllables in utterances; (b) indicia of a number of letters in a word; (c) indicia of the complexity of grammatical structures in portions of the media work; (d) indicia of arrival rate of newly presented objects; (e) indicia of temporal proximity between events in portions of the media work; or (f) indicia of number of phonemes per unit of time in portions of the media work.
    Type: Grant
    Filed: May 18, 2009
    Date of Patent: July 2, 2013
    Assignee: Enounce, Inc.
    Inventor: Donald J. Hejna, Jr.
  • Patent number: 8463612
    Abstract: Various embodiments of the invention provide a facility for monitoring audio events on a computer, including without limitation voice conversations (which often are carried on a digital transport platform, such as VoIP and/or other technologies). In a set of embodiments, a system intercepts the audio streams that flow into and out of an application program on a monitored client computer, for instance by inserting an audio stream capture program between a monitored application and the function calls in the audio driver libraries used by the application program to handle the audio streams. In some cases, this intercept does not disrupt the normal operation of the application. Optionally, the audio stream capture program takes the input and output audio streams and passes them through audio mixer and audio compression programs to yield a condensed recording of the original conversation.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: June 11, 2013
    Assignee: Raytheon Company
    Inventors: Greg S. Neath, John W. Rosenvall
  • Patent number: 8457688
    Abstract: A mobile wireless communications device may include a housing, a wireless transceiver carried by the housing, an audio transducer carried by the housing, and a novelty voice alteration processor carried by the housing and coupled to the wireless transceiver and the audio transducer and configured to alter voice communications. For example, the novelty voice alteration processor may comprise a memory and a processor cooperating therewith to alter the voice communications.
    Type: Grant
    Filed: February 26, 2009
    Date of Patent: June 4, 2013
    Assignee: Research In Motion Limited
    Inventors: Fredrik Stenmark, Daniel Hanson
  • Patent number: 8457771
    Abstract: A data stream is filtered to produce a filtered data stream. The data stream is analyzed based on an acoustic parameter to determine whether a predetermined condition is satisfied. At least one extraneous portion of the data stream, in which the predetermined condition is satisfied, is determined. Thereafter, the at least one extraneous portion is deleted from the data stream to produce the filtered data stream.
    Type: Grant
    Filed: December 10, 2009
    Date of Patent: June 4, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, I. Dan Melamed, Bernard S. Renger, Steven Neil Tischer
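As a concrete (and deliberately simplified) illustration of this kind of filtering, the sketch below assumes the acoustic parameter is short-time frame energy and the predetermined condition is that energy falls below a dB floor; the patent itself fixes neither choice, and the `filter_stream` name is hypothetical:

```python
import numpy as np

def filter_stream(samples, rate, frame_ms=20, energy_db_floor=-40.0):
    """Delete frames whose short-time energy satisfies the condition
    (here: RMS level below a dB floor) and return the filtered stream."""
    frame = int(rate * frame_ms / 1000)
    kept = []
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        rms = np.sqrt(np.mean(chunk ** 2)) + 1e-12
        if 20 * np.log10(rms) > energy_db_floor:  # keep non-extraneous frames
            kept.append(chunk)
    return np.concatenate(kept) if kept else np.zeros(0)
```

Applied to a stream with a loud second followed by a silent second, only the loud portion survives.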
  • Patent number: 8452604
    Abstract: Recognizable visual and/or audio artifacts, such as recognizable sounds, are introduced into visual and/or audio content in an identifying pattern to generate a signed visual and/or audio recording for distribution over a digital communications medium. A library of images and/or sounds may be provided, and the image and/or sounds from the library may be selectively inserted to generate the identifying pattern. The images and/or sounds may be inserted responsive to one or more parameters associated with creation of the visual and/or audio content. A representation of the identifying pattern may be generated and stored in a repository, e.g., an independent repository configured to maintain creative rights information. The stored pattern may be retrieved from the repository and compared to an unidentified visual and/or audio recording to determine an identity thereof.
    Type: Grant
    Filed: August 15, 2005
    Date of Patent: May 28, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Steven Tischer
  • Patent number: 8447604
    Abstract: Provided in some embodiments is a method including receiving ordered script words indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with the dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that match consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard alignment points and including the corresponding sub-sets of the script and dialogue words between those points, and aligning each of the sub-matrices.
    Type: Grant
    Filed: May 28, 2010
    Date of Patent: May 21, 2013
    Assignee: Adobe Systems Incorporated
    Inventor: Walter W. Chang
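One simple way to picture the hard alignment points is as word n-grams that occur exactly once in both the script and the transcribed dialogue. The sketch below uses that heuristic as an illustrative stand-in for the patent's matrix-based alignment; `hard_anchors` is a hypothetical name:

```python
def hard_anchors(script, dialogue, n=3):
    """Return (script_index, dialogue_index) pairs for n-gram sequences
    that occur exactly once in both word lists."""
    def ngrams(words):
        table = {}
        for i in range(len(words) - n + 1):
            table.setdefault(tuple(words[i:i + n]), []).append(i)
        return table
    s, d = ngrams(script), ngrams(dialogue)
    return sorted((s[g][0], d[g][0]) for g in s
                  if g in d and len(s[g]) == 1 and len(d[g]) == 1)
```

The regions between consecutive anchors would then be aligned independently, mirroring the sub-matrix partitioning step.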
  • Patent number: 8438035
    Abstract: When there are missing voice-transmission-signals, a repetition-section calculating unit sets a plurality of repetition sections of different lengths that are determined to be similar to the voice-transmission-signals preceding the missing voice-transmission-signal, the repetition sections being determined with respect to stationary voice-transmission-signals stored in a normal signal storage unit, the stationary voice-transmission-signals being selected from the previously input voice-transmission-signals. A controller generates a concealment signal using the repetition sections.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: May 7, 2013
    Assignee: Fujitsu Limited
    Inventors: Kaori Endo, Yasuji Ota, Chikako Matsumoto
  • Patent number: 8433988
    Abstract: A method and apparatus are capable of masking a signal loss condition. According to an exemplary embodiment, the method includes steps of receiving a signal, detecting a period of loss of the signal, and enabling a received portion of the signal to be reproduced continuously and causing a portion of the signal lost during the period to be skipped.
    Type: Grant
    Filed: December 3, 2008
    Date of Patent: April 30, 2013
    Assignee: Thomson Licensing
    Inventors: Mark Alan Schultz, Ronald Douglas Johnson
  • Patent number: 8433073
    Abstract: In a sound effect applying apparatus, an input part frequency-analyzes an input signal of sound or voice for detecting a plurality of local peaks of harmonics contained in the input signal. A subharmonics provision part adds a spectrum component of subharmonics between the detected local peaks so as to provide the input signal with a sound effect. An output part converts the input signal of a frequency domain containing the added spectrum component into an output signal of a time domain for generating the sound or voice provided with the sound effect.
    Type: Grant
    Filed: June 22, 2005
    Date of Patent: April 30, 2013
    Assignee: Yamaha Corporation
    Inventors: Yasuo Yoshioka, Alex Loscos
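A rough sketch of the subharmonics idea, assuming a single FFT frame, naive local-peak picking, and a made-up mixing gain (real peak detection and phase handling are considerably more involved than this):

```python
import numpy as np

def add_subharmonics(signal, gain=0.3):
    """Insert a spectral component halfway between each pair of adjacent
    harmonic peaks, then return to the time domain."""
    spec = np.fft.rfft(signal)
    mag = np.abs(spec)
    # crude local-peak detection on the magnitude spectrum
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] > mag[i - 1] and mag[i] > mag[i + 1]
             and mag[i] > 0.1 * mag.max()]
    for a, b in zip(peaks, peaks[1:]):
        mid = (a + b) // 2
        spec[mid] += gain * 0.5 * (spec[a] + spec[b])  # subharmonic between peaks
    return np.fft.irfft(spec, n=len(signal))
```

For a two-tone input at 400 Hz and 800 Hz, a new component appears near 600 Hz while the original peaks are preserved.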
  • Patent number: 8401861
    Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.
    Type: Grant
    Filed: January 17, 2007
    Date of Patent: March 19, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Shuang Zhi Wei, Raimo Bakis, Ellen Marie Eide, Liqin Shen
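The piecewise linear warping function can be sketched directly from the formant pairs. The sketch below assumes the formants are already extracted and simply anchors a linear interpolation at the corresponding frequencies (0 and the Nyquist frequency are added as fixed endpoints):

```python
import numpy as np

def make_warp_function(src_formants, tgt_formants, nyquist=8000.0):
    """Build a piecewise-linear frequency warping function whose key
    positions are corresponding source/target formant frequencies."""
    xs = [0.0] + sorted(src_formants) + [nyquist]
    ys = [0.0] + sorted(tgt_formants) + [nyquist]
    return lambda f: np.interp(f, xs, ys)

# example: map a source speaker's first three formants onto a target's
warp = make_warp_function([500, 1500, 2500], [600, 1800, 2700])
```

Each source formant maps exactly onto its target, and frequencies in between are warped linearly.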
  • Patent number: 8401865
    Abstract: This invention relates to a method, a computer program product, apparatuses, and a system for extracting a coded parameter set from an encoded audio/speech stream, said audio/speech stream being distributed over a sequence of packets, and generating a time-scaled encoded audio/speech stream in the parameter-coded domain using said extracted coded parameter set.
    Type: Grant
    Filed: July 18, 2007
    Date of Patent: March 19, 2013
    Assignee: Nokia Corporation
    Inventors: Pasi Sakari Ojala, Ari Kalevi Lakaniemi
  • Patent number: 8392180
    Abstract: In general, the techniques are described for adjusting audio gain levels for multi-talker audio. In one example, an audio system monitors an audio stream for the presence of a new talker. Upon identifying a new talker, the system determines whether the new talker is a first-time talker. For a first-time talker, the system executes a fast-attack/decay automatic gain control (AGC) algorithm to quickly determine a gain value for the first-time talker. The system additionally executes standard AGC techniques to refine the gain for the first-time talker while the first-time talker continues speaking. When a steady state within a decibel threshold is attained using standard AGC for the first-time talker, the system stores the steady state gain for the first-time talker to storage. Upon identifying a previously-identified talker, the system retrieves from storage the steady state gain for the talker and applies the steady state gain to the audio stream.
    Type: Grant
    Filed: May 18, 2012
    Date of Patent: March 5, 2013
    Assignee: Google Inc.
    Inventors: Serge Lachapelle, Alexander Kjeldaas
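A toy model of the per-talker gain logic, assuming a simple one-pole update in place of the actual fast-attack/decay and standard AGC algorithms (class and parameter names are illustrative):

```python
class TalkerGainControl:
    """Per-talker gain management: fast attack for first-time talkers,
    cached steady-state gain plus slow refinement for returning ones."""

    def __init__(self, target_rms=0.1, fast_alpha=0.5, slow_alpha=0.05):
        self.steady = {}  # talker id -> stored steady-state gain
        self.target = target_rms
        self.fast, self.slow = fast_alpha, slow_alpha

    def gain_for(self, talker, frame_rms, current_gain=1.0):
        desired = self.target / max(frame_rms, 1e-9)
        if talker not in self.steady:            # first-time talker: fast attack
            g = current_gain + self.fast * (desired - current_gain)
        else:                                    # returning talker: start from cache
            g = self.steady[talker]
            g += self.slow * (desired - g)       # slow refinement while speaking
        self.steady[talker] = g
        return g
```

The cache plays the role of the stored steady-state gains: a returning talker resumes from the refined value instead of re-running the fast attack.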
  • Patent number: 8392197
    Abstract: A speaker speed conversion system includes: a risk site detection unit (22) for detecting sites of risk regarding sound quality from among speech that is received as input, a frame boundary detection unit (23) for searching for a plurality of points that can serve as candidates of frame boundaries from among speech that is received as input and, of these points, supplying as a frame boundary the point that is predicted to be best from the standpoint of sound quality, and an OLA unit (25) for implementing speed conversion based on the detection results in the frame boundary detection unit (23); wherein the frame boundary detection unit (23) eliminates, from candidates of frame boundaries, sites of risk regarding sound quality that were detected in the risk site detection unit (22).
    Type: Grant
    Filed: July 22, 2008
    Date of Patent: March 5, 2013
    Assignee: NEC Corporation
    Inventor: Satoshi Hosokawa
  • Patent number: 8392195
    Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams.
    Type: Grant
    Filed: May 31, 2012
    Date of Patent: March 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
  • Patent number: 8386251
    Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results, by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on subsequent utterances received from the user, increasing overall system efficiency and accuracy.
    Type: Grant
    Filed: June 8, 2009
    Date of Patent: February 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Nikko Strom, Julian Odell, Jon Hamaker
  • Patent number: 8380509
    Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that the link information (LI) relates to the speech data (SD) just played back, the currently marked word featuring the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) make it possible to synchronize the text cursor (TC) with the audio cursor (AC), or the audio cursor (AC) with the text cursor (TC), so that positioning of the respective cursor (AC, TC) is simplified considerably.
    Type: Grant
    Filed: February 13, 2012
    Date of Patent: February 19, 2013
    Assignee: Nuance Communications Austria GmbH
    Inventor: Wolfgang Gschwendtner
  • Patent number: 8380513
    Abstract: Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
    Type: Grant
    Filed: May 19, 2009
    Date of Patent: February 19, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr.
  • Patent number: 8374879
    Abstract: Systems and methods are described for speech systems that utilize an interaction manager to manage interactions (also known as dialogues) from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains an interaction list. An application wishing to utilize the speech system submits one or more interactions to the interaction manager. Interactions are normally processed in the order in which they are received. An exception to this rule is an interaction that is configured by an application to be processed immediately, which causes the interaction manager to place it at the front of the interaction list. If an application has designated an interaction to interrupt a currently processing interaction, then the newly submitted interaction will interrupt any interaction currently being processed and, therefore, will be processed immediately.
    Type: Grant
    Filed: December 16, 2005
    Date of Patent: February 12, 2013
    Assignee: Microsoft Corporation
    Inventors: Stephen Russell Falcon, Clement Yip, Dan Banay, David Miller
  • Patent number: 8374878
    Abstract: Audio or sound envelopes contain or are combined with a sound module for generating and playing prerecorded sound tracks upon the opening of the envelope or removal of its contents. Operation of the sound module may be activated by the opening of the envelope flap, or by removal of the envelope contents. The flap is configured with a removable strip which protects the sound module from damage before and after opening. The sound module can be replayed by repeated operation of the flap. Alternate embodiments of the audio envelopes have other structural or operational features which work in concert with the sound module.
    Type: Grant
    Filed: June 23, 2009
    Date of Patent: February 12, 2013
    Assignee: American Greetings Corporation
    Inventors: Carol Miller, Mary McClain, David Mayer, Sharon Bogdanski, Kimberly Bikowski, Theresa Muri, Julie Vojtko
  • Patent number: 8364294
    Abstract: Tools and techniques are provided to allow the user of a signal editing application to retain control over individual changes, while still relieving the user of the responsibility of manually identifying problems. Specifically, tools and techniques are provided which separate the automated finding of potential problems from the automated correction of those problems. Thus, editing is performed in two phases, referred to herein as the “analysis” phase and the “action” phase. During the analysis phase, the signal editing application automatically identifies target areas within the signal that may be of particular interest to the user. During the “action” phase, the user is presented with the results of the analysis phase, and is able to decide what action to take relative to each target area.
    Type: Grant
    Filed: August 1, 2005
    Date of Patent: January 29, 2013
    Assignee: Apple Inc.
    Inventors: Christopher J. Moulios, Nikhil M. Bhatt
  • Patent number: 8355918
    Abstract: A method (10) in a speech recognition application callflow can include the steps of assigning (11) an individual option and a pre-built grammar to a same prompt, treating (15) the individual option as a valid output of the pre-built grammar if the individual option is a potential valid match to a recognition phrase (12) or an annotation (13) in the pre-built grammar, and treating (14) the individual option as an independent grammar from the pre-built grammar if the individual option fails to be a potential valid match to the recognition phrase or the annotation in the pre-built grammar.
    Type: Grant
    Filed: January 5, 2012
    Date of Patent: January 15, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, Felipe Gomez, James R. Lewis, Vanessa V. Michelini
  • Patent number: 8340302
    Abstract: In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial impression at the receiving end.
    Type: Grant
    Filed: April 22, 2003
    Date of Patent: December 25, 2012
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Dirk Jeroen Breebaart, Steven Leonardus Josephus Dimphina Elisabeth Van De Par
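A toy version of the parametric idea, carrying a single interchannel level-difference parameter alongside the mono downmix; real parametric coders work per time/frequency tile and also transmit phase and coherence cues, none of which appear in this sketch:

```python
import numpy as np

def encode_parametric(left, right):
    """Downmix to mono plus one spatial parameter: the left/right
    energy ratio (a stand-in for the full spatial parameter set)."""
    mono = 0.5 * (left + right)
    ild = np.sum(left ** 2) / (np.sum(right ** 2) + 1e-12)
    return mono, ild

def decode_parametric(mono, ild):
    """Rebuild two channels whose level ratio matches the parameter."""
    g = np.sqrt(ild)
    left = mono * (2 * g / (1 + g))
    right = mono * (2 / (1 + g))
    return left, right
```

For fully coherent channels differing only in level, this round trip is exact, which illustrates why a few bits of spatial data can suffice.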
  • Patent number: 8331572
    Abstract: In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial impression at the receiving end.
    Type: Grant
    Filed: July 27, 2009
    Date of Patent: December 11, 2012
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Dirk Jeroen Breebaart, Steven Leonardus Josephus Dimphina Elisabeth Van De Par
  • Patent number: 8315723
    Abstract: A recording and/or reproducing apparatus includes a microphone, a semiconductor memory, an operating section and a controller. An output signal from the microphone is written in the semiconductor memory and the written signals are read out from the semiconductor memory. The operating section performs input processing for writing a digital signal outputted by an analog/digital converter, reading out the digital signal stored in the semiconductor memory, and erasing the digital signal stored in the semiconductor memory. The controller controls the writing of the microphone output signal in the semiconductor memory based on an input from the operating section, and the readout of the digital signal stored in the semiconductor memory.
    Type: Grant
    Filed: October 3, 2005
    Date of Patent: November 20, 2012
    Assignee: Sony Corporation
    Inventor: Kenichi Iida
  • Patent number: 8311657
    Abstract: Some embodiments of the invention provide a computer system for processing an audio track. This system includes at least one DSP for processing the audio track. It also includes an application for editing the audio track. To process audio data in a first interval of the audio track, the application first requests and obtains from the DSP an impulse response parameter related to the DSP's processing of audio data. From the received impulse response parameter, the application identifies a second audio track interval that precedes the first interval. To process audio data in the first interval, the application then directs the DSP to process audio data within the first and second intervals.
    Type: Grant
    Filed: May 23, 2008
    Date of Patent: November 13, 2012
    Assignee: Apple Inc.
    Inventors: Alan C. Cannistraro, William George Stewart, Roger A. Powell, Kevin Christopher Rogers, Kelly B. Jacklin, Doug Wyatt
  • Patent number: 8306828
    Abstract: An audio signal expansion and compression method for expanding and compressing an audio signal in a time domain, includes the steps of setting an initial value of a signal comparison length of a first comparison interval and a second comparison interval, used for detection of two similar waveforms in the audio signal, equal to or larger than a minimum waveform detection length, determining an interval length of the two similar waveforms while changing a shift amount of the first comparison interval and the second comparison interval so that the shift amount does not exceed the signal comparison length, and expanding or compressing the audio signal in the time domain on the basis of the interval length of the two similar waveforms.
    Type: Grant
    Filed: May 10, 2007
    Date of Patent: November 6, 2012
    Assignee: Sony Corporation
    Inventors: Osamu Nakamura, Mototsugu Abe, Masayuki Nishiguchi
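The similar-waveform search at the heart of such time-domain methods can be sketched as a normalized cross-correlation over a comparison interval; the fixed comparison length and shift range below are illustrative simplifications of the patent's variable-length scheme:

```python
import numpy as np

def best_similar_shift(signal, pos, min_len=80, max_shift=400):
    """Find the shift at which the waveform after `pos` best repeats
    itself, i.e. the interval length of two similar waveforms."""
    ref = signal[pos:pos + min_len]
    best_shift, best_score = min_len, -np.inf
    for shift in range(min_len, max_shift):
        cand = signal[pos + shift:pos + shift + min_len]
        if len(cand) < min_len:
            break
        score = np.dot(ref, cand) / (
            np.linalg.norm(ref) * np.linalg.norm(cand) + 1e-12)
        if score > best_score:
            best_score, best_shift = score, shift
    return best_shift
```

Expansion would then repeat, and compression would skip, one detected interval at a time.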
  • Patent number: 8300851
    Abstract: A method of managing a sound source in a digital AV device, and an apparatus thereof, are provided. The method of managing a sound source in a digital AV device includes: extracting at least one sound source from sound being reproduced through the digital AV device; mapping an image to the extracted sound source; and managing the sound sources by using the mapped image. In addition, preferably, the extracted sound source is registered, changed, deleted, selectively reproduced, or selectively deleted by using the image. Accordingly, sound being output can be visually managed by handling the sound sources separately, and a desired sound source can be selectively reproduced or removed, such that the utilization of the digital AV device is enhanced.
    Type: Grant
    Filed: November 10, 2005
    Date of Patent: October 30, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung-eun Shin, Eun-ha Lee
  • Patent number: 8301443
    Abstract: A computer implemented method, apparatus, and computer program product for generating audio cohorts. An audio analysis engine receives audio data from a set of audio input devices. The audio data is associated with a plurality of objects. The audio data comprises a set of audio patterns. The audio data is processed to identify attributes of the audio data to form digital audio data. The digital audio data comprises metadata describing the attributes of the audio data. A set of audio cohorts is generated using the digital audio data and cohort criteria. Each audio cohort in the set of audio cohorts comprises a set of objects from the plurality of objects that share at least one audio attribute in common.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: October 30, 2012
    Assignee: International Business Machines Corporation
    Inventors: Robert Lee Angell, Robert R Friedlander, James R Kraemer
  • Patent number: 8296154
    Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels, in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in the perception of low-intensity short-duration speech features in said signal.
    Type: Grant
    Filed: October 28, 2008
    Date of Patent: October 23, 2012
    Assignee: Hearworks Pty Limited
    Inventors: Andrew E. Vandali, Graeme M. Clark
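A minimal sketch of an envelope-curvature gain of this kind, assuming the gain is simply one plus the positive-clipped second difference scaled by a constant (the patent's actual gain function is not reproduced here):

```python
import numpy as np

def transient_emphasis(env, k=2.0):
    """Derive a per-sample gain from the positive part of the envelope's
    second difference, boosting short-duration transitions."""
    d2 = env[2:] - 2.0 * env[1:-1] + env[:-2]    # discrete 2nd derivative
    gain = 1.0 + k * np.clip(d2, 0.0, None)      # boost upward curvature only
    return np.concatenate(([1.0], gain, [1.0]))  # unity gain at the edges
```

A brief spike in an otherwise flat envelope gets boosted on its rising and falling edges, while steady regions pass through at unity gain.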
  • Patent number: 8296143
    Abstract: An audio waveform processing method that is high in definition and does not impart any feeling of strangeness, in which time stretch and pitch shift are performed by a vocoder method and the variation of phase over the whole waveform that the vocoder method always causes is reduced. An audio input waveform is handled as one band as it is, or is subjected to frequency band division into bands. While time stretch and pitch shift of each band waveform are performed as in conventional vocoder methods, the waveforms are combined. The combined waveform of each band is phase-synchronized at regular intervals to reduce the variation of phase. The phase-synchronized waveforms of the bands are added, thus obtaining the final output waveform.
    Type: Grant
    Filed: December 26, 2005
    Date of Patent: October 23, 2012
    Assignee: P Softhouse Co., Ltd.
    Inventor: Takuma Kudoh
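The band-split / per-band-process / recombine pipeline above can be sketched structurally. This is not a phase vocoder implementation; the FFT-bin masking used for band splitting and the `stretch_band` placeholder (standing in for the per-band vocoder stretch and phase synchronization) are assumptions for illustration:

```python
import numpy as np

def split_bands(x, n_bands):
    """Crude frequency-band split via FFT-bin masking (illustrative only;
    the patent does not specify the band-division filter)."""
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1, dtype=int)
    bands = []
    for i in range(n_bands):
        Xi = np.zeros_like(X)
        Xi[edges[i]:edges[i + 1]] = X[edges[i]:edges[i + 1]]
        bands.append(np.fft.irfft(Xi, n=len(x)))
    return bands

def process(x, n_bands, stretch_band):
    """stretch_band stands in for the per-band vocoder time stretch /
    pitch shift; the processed bands are summed into the final output."""
    stretched = [stretch_band(b) for b in split_bands(x, n_bands)]
    return np.sum(stretched, axis=0)
```

With an identity `stretch_band`, the bands sum back to the input, which checks that the split is a true partition of the spectrum.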
  • Patent number: 8285547
    Abstract: An audio font output device is disclosed that is able to effectively convert characters or text into an audio signal recognizable by the acoustic sense of human beings. The audio font output device includes a font database that stores a character corresponding to a symbol code or picture data of a symbol, together with first audio data corresponding to the symbol code; a symbol display unit that displays the character corresponding to the symbol code or the symbol based on the picture data; and an audio output unit that outputs an audio signal based on the first audio data corresponding to the symbol code.
    Type: Grant
    Filed: April 3, 2006
    Date of Patent: October 9, 2012
    Assignee: Ricoh Company, Ltd.
    Inventor: Atsushi Koinuma
  • Patent number: 8280738
    Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.
    Type: Grant
    Filed: January 31, 2011
    Date of Patent: October 2, 2012
    Assignee: Panasonic Corporation
    Inventors: Yoshifumi Hirose, Takahiro Kamai
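The two-region mixing scheme above can be sketched as follows. The log-domain mix below the boundary (standing in for per-harmonic level mixing) and the linear envelope mix above it are plausible readings, not the patent's exact formulas:

```python
import numpy as np

def mix_source_spectra(input_spec, target_spec, freqs, boundary_hz, ratio):
    """Mix input and target sound-source magnitude spectra at `ratio`
    (0 = all input, 1 = all target). Below the boundary frequency,
    levels are mixed in the log domain as a stand-in for per-harmonic
    level mixing; above it, spectra are mixed linearly. The two halves
    are joined at the boundary."""
    low = np.asarray(freqs) <= boundary_hz
    out = np.empty_like(np.asarray(input_spec, float))
    out[low] = np.exp((1 - ratio) * np.log(input_spec[low])
                      + ratio * np.log(target_spec[low]))
    out[~low] = (1 - ratio) * input_spec[~low] + ratio * target_spec[~low]
    return out
```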
  • Patent number: 8275603
    Abstract: A speech translating apparatus includes an input unit, a speech recognizing unit, a translating unit, a first dividing unit, a second dividing unit, an associating unit, and an outputting unit. The input unit inputs a speech in a first language. The speech recognizing unit generates a first text from the speech. The translating unit translates the first text into a second language and generates a second text. The first dividing unit divides the first text and generates first phrases. The second dividing unit divides the second text and generates second phrases. The associating unit associates semantically equivalent phrases within each group of phrases. The outputting unit sequentially outputs the associated phrases in a phrase order within the second text.
    Type: Grant
    Filed: September 4, 2007
    Date of Patent: September 25, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kentaro Furihata, Tetsuro Chino, Satoshi Kamatani
  • Patent number: 8271288
    Abstract: In a masking sound generation apparatus, a CPU analyzes a speech utterance speed of a received sound signal. Then, the CPU copies the received sound signal into a plurality of sound signals and performs the following processing on each of the sound signals. Namely, the CPU divides each of the sound signals into frames on the basis of a frame length determined on the basis of the speech utterance speed. Reverse process is performed on each of the frames to replace a waveform of the frame with a reverse waveform, and a windowing process is performed to achieve a smooth connection between the frames. Then, the CPU randomly rearranges the order of the frames and mixes the plurality of sound signals to generate a masking sound signal.
    Type: Grant
    Filed: September 22, 2011
    Date of Patent: September 18, 2012
    Assignee: Yamaha Corporation
    Inventors: Atsuko Ito, Yasushi Shimizu, Akira Miki, Masato Hata
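The masking-sound pipeline above (frame division, per-frame reversal, windowing, random reordering, and mixing of several copies) can be sketched directly. The window choice and the fixed `frame_len` parameter are assumptions; in the patent the frame length is derived from the analyzed speech utterance speed:

```python
import numpy as np

def make_masker(signal, frame_len, n_copies=3, seed=0):
    """Sketch of the masking-sound pipeline: split into frames, reverse
    each frame, apply a smoothing window, shuffle the frame order, and
    mix several independently shuffled copies."""
    rng = np.random.default_rng(seed)
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    window = np.hanning(frame_len)                # smooth frame joins
    processed = frames[:, ::-1] * window          # reverse + window
    mix = np.zeros(n * frame_len)
    for _ in range(n_copies):
        order = rng.permutation(n)                # random frame order
        mix += processed[order].ravel()
    return mix / n_copies
```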
  • Patent number: 8255221
    Abstract: Disclosed is a system and method for generating a web podcast interview that allows a single user to create his own multi-voice interview from his computer. The method allows the user to enter a set of questions from a text file using a text editor. (Answers may also be entered from a text file, although this is not the preferred embodiment.) For each question, the user may select one particular interviewer voice among a plurality of predefined interviewer voices, and by using a text-to-speech module in a text-to-speech server, each question is converted into an audio question having the selected interviewer voice. The user then preferably records answers to each audio question using a telephone, and a questions/answers sequence in a podcast-compliant format is generated.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Steve Groeger, Brian Heasman, Christopher von Koschembahr, Yuk-Lun Wong
  • Patent number: 8244542
    Abstract: A method, article of manufacture, and apparatus for monitoring a location having a plurality of audio sensors and video sensors are disclosed. In an embodiment, this comprises receiving auditory data, comparing a portion of the auditory data to a lexicon comprising a plurality of keywords to determine if there is a match to a keyword from the lexicon, and if a match is found, selecting at least one video sensor to monitor an area to be monitored. Video data from the video sensor is archived with the auditory data and metadata. The video sensor is selected by determining video sensors associated with the areas to be monitored. A lookup table is used to determine the association. Cartesian coordinates may be used to determine positions of components and their areas of coverage.
    Type: Grant
    Filed: March 31, 2005
    Date of Patent: August 14, 2012
    Assignee: EMC Corporation
    Inventors: Christopher Hercules Claudatos, William Dale Andruss, Richard Urmston, John Louis Acott
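The keyword-to-camera selection above (lexicon match, then a lookup table from monitored areas to video sensors) can be sketched as follows. The keywords, area names, and camera identifiers are illustrative, not from the patent:

```python
# Sketch of keyword-triggered camera selection: a lexicon maps keywords
# to monitored areas, and a lookup table maps areas to video sensors.
# All names are illustrative assumptions.

LEXICON = {"help": "lobby", "alarm": "vault", "fire": "lobby"}
AREA_TO_CAMERAS = {"lobby": ["cam1", "cam2"], "vault": ["cam3"]}

def select_cameras(transcript_words):
    """Return the video sensors covering every area whose keyword
    appears in the recognized audio transcript."""
    cameras = []
    for word in transcript_words:
        area = LEXICON.get(word.lower())
        if area:
            cameras.extend(AREA_TO_CAMERAS.get(area, []))
    return sorted(set(cameras))

print(select_cameras(["someone", "yelled", "HELP"]))  # ['cam1', 'cam2']
```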
  • Patent number: 8229748
    Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises receiving a video stream and an associated audio stream of a video program, detecting a portion of the video program that is not readily consumable by a visually impaired person, obtaining text associated with the portion of the video program, converting the text to a second audio stream, and combining the second audio stream with the associated audio stream.
    Type: Grant
    Filed: April 14, 2008
    Date of Patent: July 24, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Hisao M. Chang, Horst Schroeter
  • Patent number: 8229754
    Abstract: Systems, methods, and computer program products for graphically displaying audio data are provided. In some implementations, a method is provided. The method includes displaying a graphical visual representation of digital audio data, the representation displaying a feature of the audio data on a feature axis with respect to time on a time axis. The method also includes receiving an input in the graphical visual representation selecting a range with respect to the feature and automatically extending the selected range with respect to time to define a selected region of the visual representation, where the extended range with respect to time is predefined, ignoring any component of the received input with respect to time.
    Type: Grant
    Filed: October 23, 2006
    Date of Patent: July 24, 2012
    Assignee: Adobe Systems Incorporated
    Inventors: Daniel Ramirez, Todd Orler, Shenzhi Zhang, Jeffrey Garner
  • Patent number: 8223269
    Abstract: In a closed caption production device, video recognition processing of an input video signal is performed by a video recognizer. This causes a working object in video to be recognized. In addition, a sound recognizer performs sound recognition processing of an input sound signal. This causes a position of a sound source to be estimated. A controller performs linking processing by comparing information of the working object recognized by the video recognition processing with positional information of the sound source estimated by the sound recognition processing. This causes a position of a closed caption produced based on the sound signal to be set in the vicinity of the working object in the video.
    Type: Grant
    Filed: September 19, 2007
    Date of Patent: July 17, 2012
    Assignee: Panasonic Corporation
    Inventor: Isao Ikegami
  • Patent number: 8219400
    Abstract: Stereo to mono voice conferencing conversion is performed during a voice conference. Conferencing equipment receives audio for right and left channels and filters each of the channels into a plurality of bands. For each band of each channel, the equipment determines an energy level and compares each energy level for each band of the right channel to each energy level for each corresponding band of the left channel. Based on the comparison, the equipment determines which channel has more audio resulting from speech. Based on the determination, the equipment adjusts delivery of the audio from the right and left channels to a mono channel for transmission to endpoints only capable of mono audio in the voice conference.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: July 10, 2012
    Assignee: Polycom, Inc.
    Inventor: Peter L. Chu
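The per-band energy comparison above can be sketched as a simple majority vote across bands. FFT-bin grouping stands in for the patent's filter bank, and the vote rule and band count are assumptions:

```python
import numpy as np

def speech_channel(left, right, n_bands=8):
    """Decide which stereo channel carries more speech by comparing
    per-band energies; the channel that wins in a majority of bands
    is favored when adjusting the mono downmix."""
    def band_energies(x):
        mag2 = np.abs(np.fft.rfft(x)) ** 2
        edges = np.linspace(0, len(mag2), n_bands + 1, dtype=int)
        return np.array([mag2[edges[i]:edges[i + 1]].sum()
                         for i in range(n_bands)])
    votes = band_energies(left) > band_energies(right)
    return "left" if votes.sum() > n_bands / 2 else "right"
```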
  • Patent number: 8204750
    Abstract: Disclosed are Multipurpose Media Players that enable users to create transcriptions, closed captions, and/or logs of digitized recordings, that enable the presentation of transcripts, closed captions, logs, and digitized recordings in a correlated manner to users, that enable users to compose one or more scenes of a production, and that enable users to compose storyboards for a production. The multipurpose media players can be embodied within Internet browser environments, thereby providing high availability of the multipurpose players across software platforms, networks, and physical locations.
    Type: Grant
    Filed: February 14, 2006
    Date of Patent: June 19, 2012
    Assignee: Teresis Media Management
    Inventor: Keri DeWitt
  • Patent number: 8174981
    Abstract: A method of processing a transmitted encoded media data stream is disclosed. If a data element arrives prior to, or at, a predetermined playout deadline, the data element is decoded, the media represented by the decoded data element is played, and the data element is provided to a decoder state machine to update a decoder state. If a data element arrives after the predetermined playout deadline, the data element is provided to the decoder state machine to update the decoder state. In one embodiment, if the specified data element fails to arrive by the playout deadline, a subsequently received data element is saved in memory. Then, if the specified data element arrives after the predetermined playout deadline, the specified data element and the saved, subsequently received, data element are provided to the decoder state machine to update the decoder state.
    Type: Grant
    Filed: December 2, 2008
    Date of Patent: May 8, 2012
    Assignee: Broadcom Corporation
    Inventor: Wilfrid LeBlanc
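The deadline policy above can be sketched as a small dispatch routine: on-time elements are played and update the decoder state, late elements update state only, and any element saved while a late element was awaited is replayed into the state machine. The state machine here is a minimal stand-in, not a real decoder:

```python
class DecoderStateMachine:
    """Minimal stand-in for the decoder state machine: it records every
    element it is updated with, in order."""
    def __init__(self):
        self.history = []
    def update(self, element):
        self.history.append(element)

def handle(element, arrival, deadline, machine, played, saved):
    """Apply the deadline policy to one data element. `saved` holds
    elements buffered while this (late) element was awaited."""
    if arrival <= deadline:
        played.append(element)       # decode + play
        machine.update(element)
    else:
        machine.update(element)      # late: state update, no playout
        while saved:                 # replay buffered elements in order
            machine.update(saved.pop(0))
```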
  • Patent number: 8165888
    Abstract: Disclosed is a reproducing apparatus comprising: a reproduction section to reproduce reproduction data comprising sound data and/or image data; a selection section to calculate evaluation values between a link source set for the reproduction data and each of a plurality of link destinations corresponding to the link source by a predetermined arithmetic expression based on link information of the plurality of link destinations, and to select a link destination having a highest evaluation among the evaluation values out of the plurality of link destinations; and a reproduction control section to move a reproduction point of the reproduction data reproduced by the reproduction section to a position corresponding to the link destination by linking the link source with the link destination when the reproduction point reaches a given point with respect to a position corresponding to the link source, and to instruct the reproduction section to reproduce the reproduction data.
    Type: Grant
    Filed: March 14, 2008
    Date of Patent: April 24, 2012
    Assignees: The University of Electro-Communications, Funai Electric Co., Ltd.
    Inventors: Kota Takahashi, Yasuo Masaki
  • Patent number: 8155964
    Abstract: This invention includes: a voice quality feature database (101) holding voice quality features; a speaker attribute database (106) holding, for each voice quality feature, an identifier enabling a user to expect a voice quality of the voice quality feature; a weight setting unit (103) setting a weight for each acoustic feature of a voice quality; a scaling unit (105) calculating display coordinates of each voice quality feature based on the acoustic features in the voice quality feature and the weights set by the weight setting unit (103); a display unit (107) displaying the identifier of each voice quality feature on the calculated display coordinates; a position input unit (108) receiving designated coordinates; and a voice quality mix unit (110) (i) calculating a distance between (1) the received designated coordinates and (2) the display coordinates of each of a part or all of the voice quality features, and (ii) mixing the acoustic features of the part or all of the voice quality features together based
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: April 10, 2012
    Assignee: Panasonic Corporation
    Inventors: Yoshifumi Hirose, Takahiro Kamai
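The distance-based mixing in the abstract can be sketched with one plausible weighting: acoustic feature vectors weighted inversely by the distance from the user's designated display coordinates. The patent only says the mix is "based on" the distances, so the inverse-distance rule and data layout are assumptions:

```python
import numpy as np

def mix_by_distance(designated, features):
    """Mix acoustic feature vectors with weights inversely proportional
    to the distance between the designated display coordinates and each
    voice quality's display coordinates (illustrative weighting).

    features: list of dicts with 'xy' display coordinates and an
    'acoustic' feature vector."""
    coords = np.array([f["xy"] for f in features], float)
    vecs = np.array([f["acoustic"] for f in features], float)
    d = np.linalg.norm(coords - np.asarray(designated, float), axis=1)
    w = 1.0 / (d + 1e-9)     # small epsilon avoids division by zero
    w /= w.sum()
    return w @ vecs
```

Clicking exactly on one voice quality's coordinates yields (approximately) that voice quality's features, which matches the intuition of the display.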
  • Patent number: 8145497
    Abstract: Provided are a user interface for processing digital data, a method for processing a media interface, and a recording medium thereof. The user interface is used for converting a selected script into voice to generate digital data having a form of a voice file corresponding to the script, or for managing the generated digital data. In the method, the user interface is displayed. The user interface includes at least a text window on which a script to be converted into voice is written, and an icon to be selected for converting the script written on the text window into voice.
    Type: Grant
    Filed: July 10, 2008
    Date of Patent: March 27, 2012
    Assignee: LG Electronics Inc.
    Inventors: Tae Hee Ahn, Sung Hun Kim, Dong Hoon Lee