Sound Editing Patents (Class 704/278)
-
Patent number: 8515759
Abstract: An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelator signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and a decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for a high quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix.
Type: Grant
Filed: April 23, 2008
Date of Patent: August 20, 2013
Assignee: Dolby International AB
Inventors: Jonas Engdegard, Heiko Purnhagen, Barbara Resch, Lars Villemoes, Cornelia Falch, Juergen Herre, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev
-
Patent number: 8515753
Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, acoustic models are adapted in a state-tying step during a training process of acoustic models. When the present invention for adapting acoustic models and a conventional acoustic model adaptation scheme are combined, more-enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
Type: Grant
Filed: March 30, 2007
Date of Patent: August 20, 2013
Assignee: Gwangju Institute of Science and Technology
Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
-
Publication number: 20130211845
Abstract: A method for automatically generating at least one voice message with the desired voice expression, starting from a prestored voice message, including assigning a vocal category to one word or to groups of words of the prestored message, computing, based on a vocal category/vocal parameter correlation table, a predetermined level of each one of the vocal parameters, and emitting said voice message with the vocal parameter levels computed for each word or group of words.
Type: Application
Filed: January 24, 2013
Publication date: August 15, 2013
Applicant: LA VOCE.NET DI CIRO IMPARATO
Inventor: LA VOCE.NET DI CIRO IMPARATO
-
Patent number: 8510112
Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
Type: Grant
Filed: August 31, 2006
Date of Patent: August 13, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alistair Conkie, Ann Syrdal
-
Patent number: 8484035
Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more controllable parameters of the input audio voice signal to produce a modified output audio voice signal in which said selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments, and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.
Type: Grant
Filed: September 6, 2007
Date of Patent: July 9, 2013
Assignee: Massachusetts Institute of Technology
Inventor: Alex Paul Pentland
-
Patent number: 8478599
Abstract: An embodiment of the present invention is a method of presenting a media work which includes: detecting media work content properties in a portion of the media work; associating a presentation rate of the portion with the detected media work content properties; and presenting the portion at the presentation rate; wherein the media work content properties include one or more of: (a) indicia of a number of syllables in utterances; (b) indicia of a number of letters in a word; (c) indicia of the complexity of grammatical structures in portions of the media work; (d) indicia of arrival rate of newly presented objects; (e) indicia of temporal proximity between events in portions of the media work; or (f) indicia of number of phonemes per unit of time in portions of the media work.
Type: Grant
Filed: May 18, 2009
Date of Patent: July 2, 2013
Assignee: Enounce, Inc.
Inventor: Donald J. Hejna, Jr.
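The core idea above, slowing presentation where content is dense, can be illustrated with a toy sketch using one of the listed properties, syllable count. The vowel-run syllable estimate and the linear rate mapping are invented assumptions for illustration, not the patent's method:

```python
import re

def syllables(word):
    # Crude syllable estimate: count runs of vowels (an assumption,
    # standing in for the patent's "indicia of a number of syllables").
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def presentation_rate(words, base=1.6, scale=0.3, floor=0.5):
    """Denser text (more syllables per word) gets a slower playback rate."""
    mean = sum(syllables(w) for w in words) / len(words)
    return max(floor, base - scale * mean)

simple = presentation_rate(["a", "big", "cat"])   # short words: faster
dense = presentation_rate(["interoperability"])   # dense word: clamped slow
```

A real implementation would derive the rate mapping from the media work itself rather than from fixed constants.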
-
Patent number: 8463612
Abstract: Various embodiments of the invention provide a facility for monitoring audio events on a computer, including without limitation voice conversations (which often are carried on a digital transport platform, such as VoIP and/or other technologies). In a set of embodiments, a system intercepts the audio streams that flow into and out of an application program on a monitored client computer, for instance by inserting an audio stream capture program between a monitored application and the function calls in the audio driver libraries used by the application program to handle the audio streams. In some cases, this intercept does not disrupt the normal operation of the application. Optionally, the audio stream capture program takes the input and output audio streams and passes them through audio mixer and audio compression programs to yield a condensed recording of the original conversation.
Type: Grant
Filed: November 6, 2006
Date of Patent: June 11, 2013
Assignee: Raytheon Company
Inventors: Greg S. Neath, John W. Rosenvall
-
Patent number: 8457688
Abstract: A mobile wireless communications device may include a housing, a wireless transceiver carried by the housing, an audio transducer carried by the housing, and a novelty voice alteration processor carried by the housing and coupled to the wireless transceiver and the audio transducer and configured to alter voice communications. For example, the novelty voice alteration processor may comprise a memory and a processor cooperating therewith to alter the voice communications.
Type: Grant
Filed: February 26, 2009
Date of Patent: June 4, 2013
Assignee: Research In Motion Limited
Inventors: Fredrik Stenmark, Daniel Hanson
-
Patent number: 8457771
Abstract: A data stream is filtered to produce a filtered data stream. The data stream is analyzed based on an acoustic parameter to determine whether a predetermined condition is satisfied. At least one extraneous portion of the data stream, in which the predetermined condition is satisfied, is determined. Thereafter, the at least one extraneous portion is deleted from the data stream to produce the filtered data stream.
Type: Grant
Filed: December 10, 2009
Date of Patent: June 4, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Yeon-Jun Kim, I. Dan Melamed, Bernard S. Renger, Steven Neil Tischer
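The filter-by-acoustic-parameter idea can be sketched with short-term frame energy as the acoustic parameter and low energy (e.g. silence) as the predetermined condition. Frame length, threshold, and the specific parameter are invented for illustration; the patent does not prescribe them:

```python
def filter_stream(samples, frame_len=160, energy_threshold=0.01):
    """Drop frames whose mean energy is below threshold (the 'extraneous'
    portions in this toy version); keep and concatenate the rest."""
    kept = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            kept.extend(frame)
    return kept

speech = [0.5, -0.4] * 80        # one loud, "voiced" frame (160 samples)
silence = [0.001] * 160          # one near-silent, "extraneous" frame
filtered = filter_stream(speech + silence)
```

Any measurable acoustic parameter (pitch, zero-crossing rate, spectral flatness) could replace energy in the same skeleton.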
-
Patent number: 8452604
Abstract: Recognizable visual and/or audio artifacts, such as recognizable sounds, are introduced into visual and/or audio content in an identifying pattern to generate a signed visual and/or audio recording for distribution over a digital communications medium. A library of images and/or sounds may be provided, and the images and/or sounds from the library may be selectively inserted to generate the identifying pattern. The images and/or sounds may be inserted responsive to one or more parameters associated with creation of the visual and/or audio content. A representation of the identifying pattern may be generated and stored in a repository, e.g., an independent repository configured to maintain creative rights information. The stored pattern may be retrieved from the repository and compared to an unidentified visual and/or audio recording to determine an identity thereof.
Type: Grant
Filed: August 15, 2005
Date of Patent: May 28, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Steven Tischer
-
Patent number: 8447604
Abstract: Provided in some embodiments is a method including receiving ordered script words that are indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that include matching consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices.
Type: Grant
Filed: May 28, 2010
Date of Patent: May 21, 2013
Assignee: Adobe Systems Incorporated
Inventor: Walter W. Chang
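The hard-alignment step, matching consecutive word sequences between script and dialogue, resembles what Python's standard `difflib` computes. This sketch uses `difflib` as a stand-in for the patent's matrix alignment (the script and dialogue strings are invented examples):

```python
import difflib

script = "the quick brown fox um jumps over the lazy dog".split()
dialogue = "the quick brown fox jumps over a lazy dog".split()

matcher = difflib.SequenceMatcher(a=script, b=dialogue, autojunk=False)
# Matching blocks of length >= 2 act as "hard alignment points"; the gaps
# between adjacent points are the sub-matrices to be aligned separately.
hard_points = [(m.a, m.b, m.size) for m in matcher.get_matching_blocks()
               if m.size >= 2]
```

Each tuple gives the start index in the script, the start index in the dialogue, and the length of the matched run.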
-
Patent number: 8438035
Abstract: When there are missing voice-transmission-signals, a repetition-section calculating unit sets a plurality of repetition sections of different lengths that are determined to be similar to the voice-transmission-signals preceding the missing voice-transmission-signal, the repetition sections being determined with respect to stationary voice-transmission-signals stored in a normal signal storage unit, the stationary voice-transmission-signals being selected from the previously input voice-transmission-signals. A controller generates a concealment signal using the repetition sections.
Type: Grant
Filed: December 31, 2007
Date of Patent: May 7, 2013
Assignee: Fujitsu Limited
Inventors: Kaori Endo, Yasuji Ota, Chikako Matsumoto
-
Patent number: 8433988
Abstract: A method and apparatus are capable of masking a signal loss condition. According to an exemplary embodiment, the method includes steps of receiving a signal, detecting a period of loss of the signal, and enabling a received portion of the signal to be reproduced continuously and causing a portion of the signal lost during the period to be skipped.
Type: Grant
Filed: December 3, 2008
Date of Patent: April 30, 2013
Assignee: Thomson Licensing
Inventors: Mark Alan Schultz, Ronald Douglas Johnson
-
Patent number: 8433073
Abstract: In a sound effect applying apparatus, an input part frequency-analyzes an input signal of sound or voice for detecting a plurality of local peaks of harmonics contained in the input signal. A subharmonics provision part adds a spectrum component of subharmonics between the detected local peaks so as to provide the input signal with a sound effect. An output part converts the input signal of a frequency domain containing the added spectrum component into an output signal of a time domain for generating the sound or voice provided with the sound effect.
Type: Grant
Filed: June 22, 2005
Date of Patent: April 30, 2013
Assignee: Yamaha Corporation
Inventors: Yasuo Yoshioka, Alex Loscos
-
Patent number: 8401861
Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.
Type: Grant
Filed: January 17, 2007
Date of Patent: March 19, 2013
Assignee: Nuance Communications, Inc.
Inventors: Shuang Zhi Wei, Raimo Bakis, Ellen Marie Eide, Liqin Shen
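The final step, using paired formant frequencies as key positions of a piecewise linear warping function, can be sketched directly. The formant values below are invented illustrative numbers, not data from the patent:

```python
# Paired formant frequencies (Hz) from aligned source/target frames,
# anchored at 0 Hz and the Nyquist-ish top of the range (invented values).
source_formants = [0.0, 500.0, 1500.0, 2500.0, 4000.0]
target_formants = [0.0, 450.0, 1650.0, 2400.0, 4000.0]

def warp(freq):
    """Piecewise linear interpolation between formant key positions."""
    pairs = list(zip(source_formants, target_formants))
    for (s0, t0), (s1, t1) in zip(pairs, pairs[1:]):
        if s0 <= freq <= s1:
            return t0 + (freq - s0) * (t1 - t0) / (s1 - s0)
    return freq  # outside the defined range: identity
```

Applying `warp` to every frequency bin of a source spectrum shifts its formant structure toward the target speaker's.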
-
Patent number: 8401865
Abstract: This invention relates to a method, a computer program product, apparatuses and a system for extracting a coded parameter set from an encoded audio/speech stream, said audio/speech stream being distributed to a sequence of packets, and generating a time scaled encoded audio/speech stream in the parameter coded domain using said extracted coded parameter set.
Type: Grant
Filed: July 18, 2007
Date of Patent: March 19, 2013
Assignee: Nokia Corporation
Inventors: Pasi Sakari Ojala, Ari Kalevi Lakaniemi
-
Patent number: 8392180
Abstract: In general, the techniques are described for adjusting audio gain levels for multi-talker audio. In one example, an audio system monitors an audio stream for the presence of a new talker. Upon identifying a new talker, the system determines whether the new talker is a first-time talker. For a first-time talker, the system executes a fast-attack/decay automatic gain control (AGC) algorithm to quickly determine a gain value for the first-time talker. The system additionally executes standard AGC techniques to refine the gain for the first-time talker while the first-time talker continues speaking. When a steady state within a decibel threshold is attained using standard AGC for the first-time talker, the system stores the steady state gain for the first-time talker to storage. Upon identifying a previously-identified talker, the system retrieves from storage the steady state gain for the talker and applies the steady state gain to the audio stream.
Type: Grant
Filed: May 18, 2012
Date of Patent: March 5, 2013
Assignee: Google Inc.
Inventors: Serge Lachapelle, Alexander Kjeldaas
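A minimal sketch of the per-talker gain-caching idea, assuming a simple first-order gain update in place of the patent's actual fast-attack/decay and standard AGC algorithms (talker ids, rates, and target level are all invented):

```python
class TalkerGainControl:
    """Cache a gain per talker; adapt fast on first appearance, slowly after."""
    def __init__(self, target_level=1.0, fast_rate=0.5, slow_rate=0.05):
        self.gains = {}          # talker id -> last computed gain
        self.target = target_level
        self.fast = fast_rate
        self.slow = slow_rate

    def update(self, talker, measured_level):
        first_time = talker not in self.gains
        gain = self.gains.get(talker, 1.0)
        ideal = self.target / measured_level
        # Fast attack/decay for a first-time talker, slow refinement later.
        rate = self.fast if first_time else self.slow
        gain += rate * (ideal - gain)
        self.gains[talker] = gain   # "steady state" cache for reuse
        return gain

agc = TalkerGainControl()
g1 = agc.update("alice", 0.5)   # first time: jumps quickly toward 2.0
g2 = agc.update("alice", 0.5)   # returning talker: refined slowly
```

The cached gain means a returning talker is at roughly the right level immediately, instead of restarting the attack phase.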
-
Patent number: 8392197
Abstract: A speaker speed conversion system includes: a risk site detection unit (22) for detecting sites of risk regarding sound quality from among speech that is received as input, a frame boundary detection unit (23) for searching for a plurality of points that can serve as candidates of frame boundaries from among speech that is received as input and, of these points, supplying as a frame boundary the point that is predicted to be best from the standpoint of sound quality, and an OLA unit (25) for implementing speed conversion based on the detection results in the frame boundary detection unit (23); wherein the frame boundary detection unit (23) eliminates, from candidates of frame boundaries, sites of risk regarding sound quality that were detected in the risk site detection unit (22).
Type: Grant
Filed: July 22, 2008
Date of Patent: March 5, 2013
Assignee: NEC Corporation
Inventor: Satoshi Hosokawa
-
Patent number: 8392195
Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams.
Type: Grant
Filed: May 31, 2012
Date of Patent: March 5, 2013
Assignee: International Business Machines Corporation
Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
-
Patent number: 8386251
Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on following utterances received from the user, increasing an overall system efficiency and accuracy.
Type: Grant
Filed: June 8, 2009
Date of Patent: February 26, 2013
Assignee: Microsoft Corporation
Inventors: Nikko Strom, Julian Odell, Jon Hamaker
-
Patent number: 8380509
Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that is related by the link information (LI) to the speech data (SD) just played back, the currently marked word featuring the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) makes it possible to synchronize the text cursor (TC) with the audio cursor (AC) or the audio cursor (AC) with the text cursor (TC), so the positioning of the respective cursor (AC, TC) is simplified considerably.
Type: Grant
Filed: February 13, 2012
Date of Patent: February 19, 2013
Assignee: Nuance Communications Austria GmbH
Inventor: Wolfgang Gschwendtner
-
Patent number: 8380513
Abstract: Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
Type: Grant
Filed: May 19, 2009
Date of Patent: February 19, 2013
Assignee: International Business Machines Corporation
Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr.
-
Patent number: 8374879
Abstract: Systems and methods are described for speech systems that utilize an interaction manager to manage interactions, also known as dialogues, from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains an interaction list. An application wishing to utilize the speech system submits one or more interactions to the interaction manager. Interactions are normally processed in the order in which they are received. An exception to this rule is an interaction that is configured by an application to be processed immediately, which causes the interaction manager to place the interaction at the front of the interaction list. If an application has designated an interaction to interrupt a currently processing interaction, then the newly submitted interaction will interrupt any interaction currently being processed and, therefore, it will be processed immediately.
Type: Grant
Filed: December 16, 2005
Date of Patent: February 12, 2013
Assignee: Microsoft Corporation
Inventors: Stephen Russell Falcon, Clement Yip, Dan Banay, David Miller
-
Patent number: 8374878
Abstract: Audio or sound envelopes contain or are combined with a sound module for generating and playing prerecorded sound tracks upon the opening of the envelope or removal of its contents. Operation of the sound module may be activated by the opening of the envelope flap, or by removal of the envelope contents. The flap is configured with a removable strip which protects the sound module from damage before and after opening. The sound module can be replayed by repeated operation of the flap. Alternate embodiments of the audio envelopes have other structural or operational features which work in concert with the sound module.
Type: Grant
Filed: June 23, 2009
Date of Patent: February 12, 2013
Assignee: American Greetings Corporation
Inventors: Carol Miller, Mary McClain, David Mayer, Sharon Bogdanski, Kimberly Bikowski, Theresa Muri, Julie Vojtko
-
Patent number: 8364294
Abstract: Tools and techniques are provided to allow the user of a signal editing application to retain control over individual changes, while still relieving the user of the responsibility of manually identifying problems. Specifically, tools and techniques are provided which separate the automated finding of potential problems from the automated correction of those problems. Thus, editing is performed in two phases, referred to herein as the "analysis" phase and the "action" phase. During the analysis phase, the signal editing application automatically identifies target areas within the signal that may be of particular interest to the user. During the "action" phase, the user is presented with the results of the analysis phase, and is able to decide what action to take relative to each target area.
Type: Grant
Filed: August 1, 2005
Date of Patent: January 29, 2013
Assignee: Apple Inc.
Inventors: Christopher J. Moulios, Nikhil M. Bhatt
-
Patent number: 8355918
Abstract: A method (10) in a speech recognition application callflow can include the steps of assigning (11) an individual option and a pre-built grammar to a same prompt, treating (15) the individual option as a valid output of the pre-built grammar if the individual option is a potential valid match to a recognition phrase (12) or an annotation (13) in the pre-built grammar, and treating (14) the individual option as an independent grammar from the pre-built grammar if the individual option fails to be a potential valid match to the recognition phrase or the annotation in the pre-built grammar.
Type: Grant
Filed: January 5, 2012
Date of Patent: January 15, 2013
Assignee: Nuance Communications, Inc.
Inventors: Ciprian Agapi, Felipe Gomez, James R. Lewis, Vanessa V. Michelini
-
Patent number: 8340302
Abstract: In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial impression at the receiving end.
Type: Grant
Filed: April 22, 2003
Date of Patent: December 25, 2012
Assignee: Koninklijke Philips Electronics N.V.
Inventors: Dirk Jeroen Breebaart, Steven Leonardus Josephus Dimphina Elisabeth Van De Par
-
Patent number: 8331572
Abstract: In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial impression at the receiving end.
Type: Grant
Filed: July 27, 2009
Date of Patent: December 11, 2012
Assignee: Koninklijke Philips Electronics N.V.
Inventors: Dirk Jeroen Breebaart, Steven Leonardus Josephus Dimphina Elisabeth Van De Par
-
Patent number: 8315723
Abstract: A recording and/or reproducing apparatus includes a microphone, a semiconductor memory, an operating section and a controller. An output signal from the microphone is written in the semiconductor memory and the written signals are read out from the semiconductor memory. The operating section performs input processing for writing a digital signal outputted by an analog/digital converter, reading out the digital signal stored in the semiconductor memory and for erasing the digital signal stored in the semiconductor memory. The control section controls the writing of the microphone output signal in the semiconductor memory based on an input from the operating section and the readout of the digital signal stored in the semiconductor memory.
Type: Grant
Filed: October 3, 2005
Date of Patent: November 20, 2012
Assignee: Sony Corporation
Inventor: Kenichi Iida
-
Patent number: 8311657
Abstract: Some embodiments of the invention provide a computer system for processing an audio track. This system includes at least one DSP for processing the audio track. It also includes an application for editing the audio track. To process audio data in a first interval of the audio track, the application first asks and obtains from the DSP an impulse response parameter related to the DSP's processing of audio data. From the received impulse response parameter, the application identifies a second audio track interval that is before the first interval. To process audio data in the first interval, the application then directs the DSP to process audio data within the first and second intervals.
Type: Grant
Filed: May 23, 2008
Date of Patent: November 13, 2012
Assignee: Apple Inc.
Inventors: Alan C. Cannistraro, William George Stewart, Roger A. Powell, Kevin Christopher Rogers, Kelly B. Jacklin, Doug Wyatt
-
Patent number: 8306828
Abstract: An audio signal expansion and compression method for expanding and compressing an audio signal in a time domain, includes the steps of setting an initial value of a signal comparison length of a first comparison interval and a second comparison interval, used for detection of two similar waveforms in the audio signal, equal to or larger than a minimum waveform detection length, determining an interval length of the two similar waveforms while changing a shift amount of the first comparison interval and the second comparison interval so that the shift amount does not exceed the signal comparison length, and expanding or compressing the audio signal in the time domain on the basis of the interval length of the two similar waveforms.
Type: Grant
Filed: May 10, 2007
Date of Patent: November 6, 2012
Assignee: Sony Corporation
Inventors: Osamu Nakamura, Mototsugu Abe, Masayuki Nishiguchi
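The similar-waveform search and expansion can be sketched very crudely: scan shift amounts within a bounded range, pick the one where the signal best matches a shifted copy of itself, and expand by repeating one such interval. The squared-difference similarity measure and single-interval repetition are simplifying assumptions, not the patent's actual method:

```python
def find_period(samples, min_len=4, max_len=32):
    """Return the shift (within [min_len, max_len]) at which the signal is
    most similar to itself, i.e. the detected similar-waveform length."""
    best_shift, best_err = min_len, float("inf")
    for shift in range(min_len, max_len + 1):
        n = min(shift, len(samples) - shift)
        err = sum((samples[i] - samples[i + shift]) ** 2 for i in range(n))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift

def expand(samples, period):
    """Time-expand by repeating one detected interval (crude sketch;
    a real implementation would cross-fade at the join)."""
    return samples[:period] + samples

tone = [0, 1, 2, 1, 0, -1, -2, -1] * 8   # periodic test signal, period 8
p = find_period(tone)
stretched = expand(tone, p)
```

Compression would instead delete one detected interval; both directions rely on the same similarity search.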
-
Patent number: 8300851
Abstract: A method of managing a sound source in a digital AV device and an apparatus thereof are provided. The method of managing a sound source in a digital AV device includes: extracting at least one sound source from sound being reproduced through the digital AV device; mapping an image to the extracted sound source; and managing the sound sources by using the mapped image. In addition, preferably, the extracted sound source is registered, changed, deleted, selectively reproduced, or selectively deleted by using the image. Accordingly, sound being output can be visually managed by handling the sound sources separately, and a desired sound source can be selectively reproduced or removed such that utilization of the digital AV device can be enhanced.
Type: Grant
Filed: November 10, 2005
Date of Patent: October 30, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jung-eun Shin, Eun-ha Lee
-
Patent number: 8301443
Abstract: A computer implemented method, apparatus, and computer program product for generating audio cohorts. An audio analysis engine receives audio data from a set of audio input devices. The audio data is associated with a plurality of objects. The audio data comprises a set of audio patterns. The audio data is processed to identify attributes of the audio data to form digital audio data. The digital audio data comprises metadata describing the attributes of the audio data. A set of audio cohorts is generated using the digital audio data and cohort criteria. Each audio cohort in the set of audio cohorts comprises a set of objects from the plurality of objects that share at least one audio attribute in common.
Type: Grant
Filed: November 21, 2008
Date of Patent: October 30, 2012
Assignee: International Business Machines Corporation
Inventors: Robert Lee Angell, Robert R Friedlander, James R Kraemer
-
Patent number: 8296154
Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in perception of low-intensity short-duration speech features in said signal.
Type: Grant
Filed: October 28, 2008
Date of Patent: October 23, 2012
Assignee: Hearworks Pty Limited
Inventors: Andrew E. Vandali, Graeme M. Clark
-
Patent number: 8296143
Abstract: An audio waveform processing method that achieves high definition without imparting any feeling of strangeness, in which time stretch and pitch shift are performed by a vocoder method and the variation of phase over the whole waveform, always caused by the vocoder method, is reduced. An audio input waveform is handled as one band as it is, or is subjected to frequency band division into bands. While time stretch and pitch shift of each band waveform are performed as in conventional vocoder methods, the waveforms are combined. The combined waveform of each band is phase-synchronized at regular intervals to reduce the variation of phase. The phase-synchronized waveforms of the bands are added, thus obtaining the final output waveform.
Type: Grant
Filed: December 26, 2005
Date of Patent: October 23, 2012
Assignee: P Softhouse Co., Ltd.
Inventor: Takuma Kudoh
-
Patent number: 8285547
Abstract: An audio font output device is disclosed that is able to effectively convert characters or text into an audio signal recognizable by the acoustic sense of human beings. The audio font output device includes a font database that stores a character corresponding to a symbol code or picture data of a symbol, and first audio data corresponding to the symbol code; a symbol display unit that displays the character corresponding to the symbol code or the symbol based on the picture data; and an audio output unit that outputs an audio signal based on the first audio data corresponding to the symbol code.
Type: Grant
Filed: April 3, 2006
Date of Patent: October 9, 2012
Assignee: Ricoh Company, Ltd.
Inventor: Atsushi Koinuma
-
Patent number: 8280738
Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.
Type: Grant
Filed: January 31, 2011
Date of Patent: October 2, 2012
Assignee: Panasonic Corporation
Inventors: Yoshifumi Hirose, Takahiro Kamai
-
Patent number: 8275603
Abstract: A speech translating apparatus includes an input unit, a speech recognizing unit, a translating unit, a first dividing unit, a second dividing unit, an associating unit, and an outputting unit. The input unit receives speech in a first language. The speech recognizing unit generates a first text from the speech. The translating unit translates the first text into a second language and generates a second text. The first dividing unit divides the first text into first phrases; the second dividing unit divides the second text into second phrases. The associating unit associates semantically equivalent phrases between the two groups of phrases. The outputting unit sequentially outputs the associated phrases in their phrase order within the second text.
Type: Grant
Filed: September 4, 2007
Date of Patent: September 25, 2012
Assignee: Kabushiki Kaisha Toshiba
Inventors: Kentaro Furihata, Tetsuro Chino, Satoshi Kamatani
-
Patent number: 8271288
Abstract: In a masking sound generation apparatus, a CPU analyzes the speech utterance speed of a received sound signal. The CPU then copies the received sound signal into a plurality of sound signals and performs the following processing on each of them: the sound signal is divided into frames whose length is determined from the speech utterance speed; a reversal process replaces each frame's waveform with its reverse; and a windowing process smooths the connections between frames. The CPU then randomly rearranges the order of the frames and mixes the plurality of sound signals to generate a masking sound signal.
Type: Grant
Filed: September 22, 2011
Date of Patent: September 18, 2012
Assignee: Yamaha Corporation
Inventors: Atsuko Ito, Yasushi Shimizu, Akira Miki, Masato Hata
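The frame-reverse-shuffle-mix pipeline in this abstract maps almost directly onto code. The sketch below is a minimal illustration under stated assumptions: the function name, the number of copies, and the triangular window are mine (the patent only requires "a windowing process"), and the frame length is passed in rather than derived from utterance speed.

```python
import random

def make_masking_sound(signal, frame_len, num_copies=3, seed=0):
    """Copy the signal, split each copy into frames, reverse each frame's
    waveform, window it for smooth joins, shuffle the frame order, and
    mix the copies into one masking signal."""
    rng = random.Random(seed)
    mixed = [0.0] * len(signal)
    for _ in range(num_copies):
        # Split into equal-length frames (any ragged tail is dropped).
        frames = [signal[i:i + frame_len]
                  for i in range(0, len(signal) - frame_len + 1, frame_len)]
        processed = []
        for fr in frames:
            rev = fr[::-1]  # reversal process: replace with reverse waveform
            # Triangular window: zero at frame edges for smooth connections.
            win = [1.0 - abs(2.0 * n / (frame_len - 1) - 1.0)
                   for n in range(frame_len)]
            processed.append([s * w for s, w in zip(rev, win)])
        rng.shuffle(processed)  # randomly rearrange the frame order
        for i, fr in enumerate(processed):
            for n, s in enumerate(fr):
                mixed[i * frame_len + n] += s / num_copies
    return mixed

masked = make_masking_sound([0.1 * i for i in range(32)], frame_len=8)
```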
-
Patent number: 8255221
Abstract: Disclosed is a system and method for generating a web podcast interview that allows a single user to create a multi-voice interview from his computer. The user enters a set of questions from a text file using a text editor. (Answers may also be entered from a text file, although this is not the preferred embodiment.) For each question, the user may select one particular interviewer voice from a plurality of predefined interviewer voices, and a text-to-speech module in a text-to-speech server converts each question into an audio question having the selected interviewer voice. The user then preferably records answers to each audio question using a telephone, and a questions/answers sequence is generated in a podcast-compliant format.
Type: Grant
Filed: December 1, 2008
Date of Patent: August 28, 2012
Assignee: International Business Machines Corporation
Inventors: Steve Groeger, Brian Heasman, Christopher von Koschembahr, Yuk-Lun Wong
-
Patent number: 8244542
Abstract: A method, article of manufacture, and apparatus for monitoring a location having a plurality of audio sensors and video sensors are disclosed. In an embodiment, this comprises receiving auditory data, comparing a portion of the auditory data to a lexicon comprising a plurality of keywords to determine whether it matches a keyword from the lexicon, and, if a match is found, selecting at least one video sensor to monitor an area to be monitored. Video data from the video sensor is archived with the auditory data and metadata. The video sensor is selected by determining which video sensors are associated with the areas to be monitored; a lookup table is used to determine the association. Cartesian coordinates may be used to determine the positions of components and their areas of coverage.
Type: Grant
Filed: March 31, 2005
Date of Patent: August 14, 2012
Assignee: EMC Corporation
Inventors: Christopher Hercules Claudatos, William Dale Andruss, Richard Urmston, John Louis Acott
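The keyword-match and lookup-table steps above amount to two dictionary operations. The sketch below is illustrative only: all function names, keywords, area names, and camera identifiers are invented, and a real system would match against recognized speech rather than a word list.

```python
def match_keyword(transcript_words, lexicon):
    """Compare words from the auditory data against the keyword lexicon;
    return the first match, or None if nothing matches."""
    for word in transcript_words:
        if word in lexicon:
            return word
    return None

def select_sensors(area, area_to_sensors):
    """Lookup-table step: map the area to be monitored to the video
    sensors associated with it."""
    return area_to_sensors.get(area, [])

kw = match_keyword(["please", "help", "me"], {"help", "fire"})
cams = select_sensors("lobby", {"lobby": ["cam1", "cam3"], "dock": ["cam2"]})
```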
-
Patent number: 8229748
Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises receiving a video stream and an associated audio stream of a video program, detecting a portion of the video program that is not readily consumable by a visually impaired person, obtaining text associated with the portion of the video program, converting the text to a second audio stream, and combining the second audio stream with the associated audio stream.
Type: Grant
Filed: April 14, 2008
Date of Patent: July 24, 2012
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Hisao M. Chang, Horst Schroeter
-
Patent number: 8229754
Abstract: Systems, methods, and computer program products for graphically displaying audio data are provided. In some implementations, a method is provided. The method includes displaying a graphical visual representation of digital audio data, the representation displaying a feature of the audio data on a feature axis with respect to time on a time axis. The method also includes receiving an input in the graphical visual representation that selects a range with respect to the feature, and automatically extending the selected range with respect to time to define a selected region of the visual representation, where the extension with respect to time is predefined and any time component of the received input is ignored.
Type: Grant
Filed: October 23, 2006
Date of Patent: July 24, 2012
Assignee: Adobe Systems Incorporated
Inventors: Daniel Ramirez, Todd Orler, Shenzhi Zhang, Jeffrey Garner
-
Patent number: 8223269
Abstract: In a closed caption production device, video recognition processing of an input video signal is performed by a video recognizer. This causes a working object in video to be recognized. In addition, a sound recognizer performs sound recognition processing of an input sound signal. This causes a position of a sound source to be estimated. A controller performs linking processing by comparing information of the working object recognized by the video recognition processing with positional information of the sound source estimated by the sound recognition processing. This causes a position of a closed caption produced based on the sound signal to be set in the vicinity of the working object in the video.
Type: Grant
Filed: September 19, 2007
Date of Patent: July 17, 2012
Assignee: Panasonic Corporation
Inventor: Isao Ikegami
-
Patent number: 8219400
Abstract: Stereo to mono voice conferencing conversion is performed during a voice conference. Conferencing equipment receives audio for right and left channels and filters each of the channels into a plurality of bands. For each band of each channel, the equipment determines an energy level and compares each energy level for each band of the right channel to each energy level for each corresponding band of the left channel. Based on the comparison, the equipment determines which channel has more audio resulting from speech. Based on the determination, the equipment adjusts delivery of the audio from the right and left channels to a mono channel for transmission to endpoints only capable of mono audio in the voice conference.
Type: Grant
Filed: November 21, 2008
Date of Patent: July 10, 2012
Assignee: Polycom, Inc.
Inventor: Peter L. Chu
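The per-band energy comparison above can be sketched with a naive DFT in place of the filter bank. This is a minimal illustration under stated assumptions: the function names, the number of bands, and the majority-vote rule for "which channel has more speech" are mine; the patent does not specify how the comparison is aggregated.

```python
import cmath
import math

def band_energies(x, num_bands):
    """Naive DFT magnitude spectrum of a signal, grouped into equal
    frequency bands; returns the energy (sum of squared magnitudes)
    of each band. A real implementation would use a filter bank."""
    n_samp = len(x)
    half = n_samp // 2  # keep only the non-redundant half of the spectrum
    mags = [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samp)
                    for n in range(n_samp)))
            for k in range(half)]
    per = half // num_bands
    return [sum(m * m for m in mags[b * per:(b + 1) * per])
            for b in range(num_bands)]

def pick_speech_channel(left, right, num_bands=4):
    """Compare corresponding band energies of the two channels and vote
    on which channel carries more energy; ties go to the left channel."""
    votes = 0
    for le, re in zip(band_energies(left, num_bands),
                      band_energies(right, num_bands)):
        if le > re:
            votes += 1
        elif re > le:
            votes -= 1
    return "left" if votes >= 0 else "right"

left = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
right = [0.1 * s for s in left]  # same content, 20 dB quieter
channel = pick_speech_channel(left, right)
```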
-
Patent number: 8204750
Abstract: Disclosed are Multipurpose Media Players that enable users to create transcriptions, closed captions, and/or logs of digitized recordings, that enable the presentation of transcripts, closed captions, logs, and digitized recordings in a correlated manner to users, that enable users to compose one or more scenes of a production, and that enable users to compose storyboards for a production. The multipurpose media players can be embodied within Internet browser environments, thereby providing high availability of the multipurpose players across software platforms, networks, and physical locations.
Type: Grant
Filed: February 14, 2006
Date of Patent: June 19, 2012
Assignee: Teresis Media Management
Inventor: Keri DeWitt
-
Patent number: 8174981
Abstract: A method of processing a transmitted encoded media data stream is disclosed. If a data element arrives prior to, or at, a predetermined playout deadline, the data element is decoded, the media represented by the decoded data element is played, and the data element is provided to a decoder state machine to update a decoder state. If a data element arrives after the predetermined playout deadline, the data element is provided to the decoder state machine to update the decoder state only. In one embodiment, if the specified data element fails to arrive by the playout deadline, a subsequently received data element is saved in memory. Then, if the specified data element arrives after the predetermined playout deadline, the specified data element and the saved, subsequently received, data element are provided to the decoder state machine to update the decoder state.
Type: Grant
Filed: December 2, 2008
Date of Patent: May 8, 2012
Assignee: Broadcom Corporation
Inventor: Wilfrid LeBlanc
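The deadline rule above reduces to a single branch: on-time elements are played and fed to the decoder state machine, late ones only update the state so the decoder stays consistent. The sketch below is a minimal illustration; the class name, field names, and the list-based "state" are assumptions, not from the patent.

```python
class JitterBufferDecoder:
    """Toy decoder front-end applying the playout-deadline rule."""

    def __init__(self):
        self.state = []   # decoder state: every element the decoder has seen
        self.played = []  # media actually played out

    def update_state(self, element):
        self.state.append(element)

    def receive(self, element, arrival_time, deadline):
        if arrival_time <= deadline:
            # Arrived prior to, or at, the playout deadline:
            # decode, play, and update the decoder state.
            self.played.append(element)
            self.update_state(element)
        else:
            # Arrived late: too late to play, but still fed to the
            # state machine so the decoder state stays consistent.
            self.update_state(element)

dec = JitterBufferDecoder()
dec.receive("pkt1", arrival_time=10, deadline=20)  # on time
dec.receive("pkt2", arrival_time=25, deadline=20)  # late
```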
-
Patent number: 8165888
Abstract: Disclosed is a reproducing apparatus comprising: a reproduction section to reproduce reproduction data comprising sound data and/or image data; a selection section to calculate evaluation values between a link source set for the reproduction data and each of a plurality of link destinations corresponding to the link source by a predetermined arithmetic expression based on link information of the plurality of link destinations, and to select the link destination having the highest evaluation value out of the plurality of link destinations; and a reproduction control section to move a reproduction point of the reproduction data reproduced by the reproduction section to a position corresponding to the link destination by linking the link source with the link destination when the reproduction point reaches a given point with respect to a position corresponding to the link source, and to instruct the reproduction section to reproduce the reproduction data.
Type: Grant
Filed: March 14, 2008
Date of Patent: April 24, 2012
Assignees: The University of Electro-Communications, Funai Electric Co., Ltd.
Inventors: Kota Takahashi, Yasuo Masaki
-
Patent number: 8155964
Abstract: This invention includes: a voice quality feature database (101) holding voice quality features; a speaker attribute database (106) holding, for each voice quality feature, an identifier enabling a user to anticipate the voice quality of that feature; a weight setting unit (103) setting a weight for each acoustic feature of a voice quality; a scaling unit (105) calculating display coordinates for each voice quality feature based on its acoustic features and the weights set by the weight setting unit (103); a display unit (107) displaying the identifier of each voice quality feature at the calculated display coordinates; a position input unit (108) receiving designated coordinates; and a voice quality mix unit (110) that (i) calculates the distance between (1) the received designated coordinates and (2) the display coordinates of some or all of the voice quality features, and (ii) mixes the acoustic features of those voice quality features together based on the calculated distances.
Type: Grant
Filed: June 4, 2008
Date of Patent: April 10, 2012
Assignee: Panasonic Corporation
Inventors: Yoshifumi Hirose, Takahiro Kamai
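The distance-then-mix step of the voice quality mix unit can be sketched as inverse-distance weighting over the displayed features. This is an assumption for illustration: the patent only says mixing is based on the calculated distances, so the inverse-distance weighting scheme, the function name, and the feature names below are all mine.

```python
import math

def mix_voice_features(target_xy, features):
    """Compute the distance from the designated coordinates to each
    voice quality feature's display coordinates, then blend the
    acoustic feature vectors with weights that fall off with distance.
    `features` maps name -> ((x, y) display coords, feature vector)."""
    weights = {}
    for name, ((x, y), _) in features.items():
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        weights[name] = 1.0 / (d + 1e-9)  # closer features dominate
    total = sum(weights.values())
    dim = len(next(iter(features.values()))[1])
    mixed = [0.0] * dim
    for name, (_, vec) in features.items():
        w = weights[name] / total  # normalized mixing weight
        for i, v in enumerate(vec):
            mixed[i] += w * v
    return mixed

# A point halfway between two stored voices mixes them equally.
v = mix_voice_features((0.5, 0.0),
                       {"bright": ((0.0, 0.0), [1.0, 0.0]),
                        "husky":  ((1.0, 0.0), [0.0, 1.0])})
```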
-
Patent number: 8145497
Abstract: Provided are a user interface for processing digital data, a method for processing a media interface, and a recording medium thereof. The user interface is used for converting a selected script into voice to generate digital data in the form of a voice file corresponding to the script, or for managing the generated digital data. In the method, the user interface is displayed; it includes at least a text window on which the script to be converted into voice is written, and an icon selected to convert the script written in the text window into voice.
Type: Grant
Filed: July 10, 2008
Date of Patent: March 27, 2012
Assignee: LG Electronics Inc.
Inventors: Tae Hee Ahn, Sung Hun Kim, Dong Hoon Lee