Sound Editing Patents (Class 704/278)
  • Patent number: 8145497
    Abstract: Provided are a user interface for processing digital data, a method for processing a media interface, and a recording medium thereof. The user interface is used for converting a selected script into voice to generate digital data having a form of a voice file corresponding to the script, or for managing the generated digital data. In the method, the user interface is displayed. The user interface includes at least a text window on which a script to be converted into voice is written, and an icon to be selected for converting the script written on the text window into voice.
    Type: Grant
    Filed: July 10, 2008
    Date of Patent: March 27, 2012
    Assignee: LG Electronics Inc.
    Inventors: Tae Hee Ahn, Sung Hun Kim, Dong Hoon Lee
  • Patent number: 8145496
    Abstract: A programmed “Stutter Edit” creates, stores and triggers combinations of effects to be used on a repeated short sample (“slice”) of recorded audio. The combination of effects (“gesture”) acts on the sample over a specified duration (“gesture length”), with the change in parameters for each effect over the gesture length being dictated by user-defined curves. Such a system affords wide manipulation of audio recorded on-the-fly, perfectly suited for live performance. These effects preferably include not only stuttering but also imposing an amplitude envelope on the slice being triggered, sample rate and bit rate manipulation, panning (interpolation between pre-defined spatial positions), high- and low-pass filters and compression. Destructive edits, such as reversing, pitch shifting, and fading may also alter the way the Stutter Edit is heard. More advanced techniques, including the use of filters, FX processors, and other plug-ins, can increase the detail and uniqueness of a particular Stutter Edit effect.
    Type: Grant
    Filed: May 25, 2007
    Date of Patent: March 27, 2012
    Inventor: Brian Transeau
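The core gesture described above, repeating a short slice while a user-defined curve drives an effect parameter over the gesture length, can be sketched as follows. All names (`stutter_gesture`, `amp_curve`) are hypothetical illustrations, not the patented implementation:

```python
# Minimal sketch of a "stutter edit" gesture: repeat a short slice of audio
# over a gesture length while a user-defined curve scales the amplitude of
# each repeat.

def stutter_gesture(slice_samples, repeats, amp_curve):
    """Repeat `slice_samples` `repeats` times; scale repeat i by amp_curve(t),
    where t runs from 0.0 to 1.0 over the gesture length."""
    out = []
    for i in range(repeats):
        t = i / max(repeats - 1, 1)          # position within the gesture
        gain = amp_curve(t)                  # user-defined parameter curve
        out.extend(s * gain for s in slice_samples)
    return out

# A linearly decaying envelope: full volume at the start, silent at the end.
decay = lambda t: 1.0 - t
result = stutter_gesture([1.0, -1.0], repeats=3, amp_curve=decay)
```

The same loop structure extends to the other gesture parameters the abstract lists (pan position, filter cutoff, bit depth) by evaluating additional curves at each `t`.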
  • Patent number: 8121849
    Abstract: According to some embodiments, content filtering is provided for a digital audio signal.
    Type: Grant
    Filed: November 22, 2010
    Date of Patent: February 21, 2012
    Assignee: Intel Corporation
    Inventors: Christopher J. Cormack, Tony Moy
  • Patent number: 8116463
    Abstract: A method and an apparatus for detecting audio signals are disclosed. The input audio signal is inspected to check whether it is a foreground frame or a background frame; the detected background signal is further inspected according to a music characteristic value and a decision rule. Therefore, background music can be detected, and the classifying performance of the voice/music classifier is improved.
    Type: Grant
    Filed: December 27, 2010
    Date of Patent: February 14, 2012
    Assignee: Huawei Technologies Co., Ltd.
    Inventor: Zhe Wang
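The two-stage detection described in the abstract can be sketched roughly as below. The energy threshold and the lag-1 autocorrelation stand-in for the music characteristic value are assumptions for illustration, not the patented decision rule:

```python
# Stage 1: split frames into foreground/background by short-term energy.
# Stage 2: test background frames against a "music characteristic" threshold.

def frame_energy(frame):
    return sum(s * s for s in frame) / len(frame)

def classify_frames(frames, energy_thresh, music_value, music_thresh):
    """Return a label per frame: 'foreground', 'background-music', or 'background'."""
    labels = []
    for frame in frames:
        if frame_energy(frame) >= energy_thresh:
            labels.append("foreground")
        elif music_value(frame) >= music_thresh:
            labels.append("background-music")   # decision rule on background frames
        else:
            labels.append("background")
    return labels

# Toy "music characteristic": a periodicity proxy via lag-1 autocorrelation.
music_value = lambda f: sum(a * b for a, b in zip(f, f[1:])) / max(len(f) - 1, 1)
frames = [[0.9, 0.8, 0.9], [0.1, 0.1, 0.1], [0.1, -0.1, 0.1]]
labels = classify_frames(frames, 0.5, music_value, 0.005)
```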
  • Patent number: 8117034
    Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and thus establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during the acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that relates to the speech data (SD) just played back, as indicated by the link information (LI); the currently marked word features the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) now make it possible to synchronize the text cursor (TC) with the audio cursor (AC) or the audio cursor (AC) with the text cursor (TC), so that the positioning of the respective cursor (AC, TC) is simplified considerably.
    Type: Grant
    Filed: March 26, 2002
    Date of Patent: February 14, 2012
    Assignee: Nuance Communications Austria GmbH
    Inventor: Wolfgang Gschwendtner
  • Patent number: 8103511
    Abstract: An audio file generation method and system. A computing system receives a first audio file comprising first speech data associated with a first party. The computing system receives a second audio file comprising second speech data associated with a second party. The first audio file differs from the second audio file. The computing system generates a third audio file from the second audio file. The third audio file differs from the second audio file. The process to generate the third audio file includes identifying a first set of attributes missing from the second audio file and adding the first set of attributes to the second audio file. The process to generate the third audio file additionally includes removing a second set of attributes from the second audio file. The third audio file includes third speech data associated with the second party. The computing system broadcasts the third audio file.
    Type: Grant
    Filed: May 28, 2008
    Date of Patent: January 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Sara H. Basson, Brian R. Heasman, Dimitri Kanevsky, Edward Emile Kelley
  • Patent number: 8095367
    Abstract: Methods and systems of parasitic sensing are shown and described. The method includes measuring, at a first time using one or more electrical elements native to a domain, a parameter of a circuit within the domain, and measuring the parameter again at a second time using the one or more electrical elements native to the domain. The method also includes comparing the parameter measurement from the first time to the parameter measurement at the second time and determining, in response to the comparison, that an activity occurred within the domain.
    Type: Grant
    Filed: October 30, 2007
    Date of Patent: January 10, 2012
    Assignee: Raytheon BBN Technologies Corp.
    Inventors: Ronald Bruce Coleman, John Scott Knight, George Shepard, Richard Madden
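The comparison step at the heart of this method is straightforward: two measurements of the same circuit parameter are compared and activity is declared when the change exceeds a tolerance. The tolerance value and function name below are assumptions for illustration:

```python
# Sketch of the parasitic-sensing comparison: a circuit parameter (e.g. a
# measured impedance) sampled at two times is compared, and activity is
# declared when the change exceeds a tolerance.

def activity_occurred(measure_t1, measure_t2, tolerance):
    """Return True when the parameter shifted more than `tolerance`
    between the two measurements."""
    return abs(measure_t2 - measure_t1) > tolerance

quiet = activity_occurred(50.2, 50.3, tolerance=1.0)   # small drift: no activity
active = activity_occurred(50.2, 55.7, tolerance=1.0)  # large shift: activity
```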
  • Patent number: 8050415
    Abstract: A method and an apparatus for detecting audio signals are disclosed. The input audio signal is detected to determine whether it is a background frame. The detected background signal is further detected according to a music characterization value and a decision rule. Therefore, background music can be detected, and the classifying performance of the voice/music classifier is improved.
    Type: Grant
    Filed: April 25, 2011
    Date of Patent: November 1, 2011
    Assignee: Huawei Technologies Co., Ltd.
    Inventor: Zhe Wang
  • Patent number: 8050926
    Abstract: An apparatus for adjusting a prompt voice depending on the environment comprises a receiver module for receiving a background sound, an analyzer module for generating a control signal according to the background sound, and an output module for adjusting an output frequency of a prompt voice according to the control signal and outputting the adjusted prompt voice.
    Type: Grant
    Filed: November 5, 2007
    Date of Patent: November 1, 2011
    Assignee: Micro-Star Int'l Co., Ltd
    Inventor: Chien Ming Huang
  • Patent number: 8050931
    Abstract: In a masking sound generation apparatus, a CPU analyzes a speech utterance speed of a received sound signal. Then, the CPU copies the received sound signal into a plurality of sound signals and performs the following processing on each of the sound signals. Namely, the CPU divides each of the sound signals into frames on the basis of a frame length determined on the basis of the speech utterance speed. Reverse process is performed on each of the frames to replace a waveform of the frame with a reverse waveform, and a windowing process is performed to achieve a smooth connection between the frames. Then, the CPU randomly rearranges the order of the frames and mixes the plurality of sound signals to generate a masking sound signal.
    Type: Grant
    Filed: March 19, 2008
    Date of Patent: November 1, 2011
    Assignee: Yamaha Corporation
    Inventors: Atsuko Ito, Yasushi Shimizu, Akira Miki, Masato Hata
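The masking pipeline in the abstract (divide into frames, reverse each frame's waveform, window for smooth joins, randomize frame order) can be sketched as below. The frame length and triangular window are assumptions; the patent derives the frame length from the measured speech utterance speed:

```python
# Sketch of the masking-sound pipeline: frame the signal, reverse each frame,
# apply a triangular window for smooth connections, and shuffle frame order.
import random

def make_masker(signal, frame_len, seed=0):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal), frame_len)]
    processed = []
    for frame in frames:
        rev = frame[::-1]                                  # reverse waveform
        n = len(rev)
        window = [1.0 - abs(2 * i / (n - 1) - 1.0) if n > 1 else 1.0
                  for i in range(n)]                       # triangular window
        processed.append([s * w for s, w in zip(rev, window)])
    random.Random(seed).shuffle(processed)                 # randomize frame order
    return [s for frame in processed for s in frame]

masked = make_masker([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], frame_len=3)
```

The patent additionally mixes several independently shuffled copies of the signal; that is one more loop over `make_masker` with different seeds, summing the results.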
  • Publication number: 20110264453
    Abstract: In a method of adapting communications in a communication system comprising at least two terminals (1,2), a signal carrying at least a representation of at least part of an information content of an audio signal captured at a first terminal (1) and representing speech is communicated between the first terminal (1) and a second terminal (2). A modified version of the audio signal is made available at the second terminal (2). At least one of the terminals (1,2) generates the modified version by re-creating the audio signal in a version modified such that at least one prosodic aspect of the represented speech is adapted in dependence on input data (22) provided at at least one of the terminals (1,2).
    Type: Application
    Filed: December 15, 2009
    Publication date: October 27, 2011
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.
    Inventors: Dirk Brokken, Nicolle Hanneke van Schijndel, Mark Thomas Johnson, Joanne Henriette Desiree Monique Westerink, Paul Marcel Carl Lemmens
  • Patent number: 8037413
    Abstract: This specification describes technologies relating to editing digital audio data. In some implementations, a computer-implemented method is provided. The method includes displaying a visual representation of audio data, receiving an input selecting a selected portion of audio data within the visual representation, the selecting including applying a brush tool to the visual representation of the audio data, and editing the selected portion of audio data including determining a degree of opacity for the selected audio data and applying an editing effect according to the degree of opacity.
    Type: Grant
    Filed: January 30, 2008
    Date of Patent: October 11, 2011
    Assignee: Adobe Systems Incorporated
    Inventor: David E. Johnston
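The opacity-weighted editing described above amounts to blending each sample with its effected version in proportion to the brush's opacity at that sample. The function and parameter names below are illustrative only:

```python
# Sketch of opacity-weighted editing: a brush stroke leaves an opacity value
# per sample, and the edit (here, a simple gain change) is applied in
# proportion to that opacity.

def apply_effect_with_opacity(samples, opacity, effect_gain):
    """Blend each sample with its effected version, weighted by brush opacity
    (0.0 = untouched, 1.0 = fully effected)."""
    return [s * (1.0 - o) + (s * effect_gain) * o
            for s, o in zip(samples, opacity)]

samples = [1.0, 1.0, 1.0]
opacity = [0.0, 0.5, 1.0]        # brush pressed harder toward the right
edited = apply_effect_with_opacity(samples, opacity, effect_gain=0.0)  # mute effect
```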
  • Patent number: 8032360
    Abstract: A system and method for high-quality variable speed playback of audio-visual (A/V) media is provided. The system receives an encoded visual signal and an encoded audio signal. The encoded visual signal is decoded to generate a decoded visual signal and the encoded audio signal is decoded to generate a decoded audio signal. The decoded audio signal is time scale modified to generate a time scale modified audio signal. The decoded visual signal and the time scale modified audio signal are then synchronized for playback at a predefined playback speed. Only partial decoding of the encoded audio signal may be performed to conserve processing power.
    Type: Grant
    Filed: May 13, 2004
    Date of Patent: October 4, 2011
    Assignee: Broadcom Corporation
    Inventor: Juin-Hwey Chen
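Time scale modification of the audio track is the step that lets playback speed change without shifting pitch. A naive overlap-add (OLA) sketch is below; the window and hop sizes are illustrative assumptions, and a production system would add waveform-similarity alignment to avoid phase artifacts:

```python
# Overlap-add (OLA) time-scale modification sketch: frames are read from the
# input at an analysis hop that depends on playback speed and written at a
# fixed synthesis hop with a cross-fade window, changing duration without
# changing pitch.

def ola_stretch(signal, speed, frame_len=4, syn_hop=2):
    """Return `signal` played at `speed` (2.0 = twice as fast)."""
    ana_hop = int(round(syn_hop * speed))       # read faster for higher speed
    window = [0.0, 0.5, 1.0, 0.5]               # sums to 1 at 50% overlap
    n_frames = max((len(signal) - frame_len) // ana_hop + 1, 1)
    out = [0.0] * (syn_hop * (n_frames - 1) + frame_len)
    for k in range(n_frames):
        frame = signal[k * ana_hop: k * ana_hop + frame_len]
        for i, s in enumerate(frame):
            out[k * syn_hop + i] += s * window[i]
    return out

fast = ola_stretch([1.0] * 16, speed=2.0)   # roughly half the output samples
same = ola_stretch([1.0] * 16, speed=1.0)
```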
  • Publication number: 20110239107
    Abstract: A transcript editor enables text-based editing of time-based media that includes spoken dialog. It involves an augmented transcript that includes timing metadata associating words and phrases within the transcript with the corresponding temporal locations within the time-based media where the text is spoken, allowing the augmented transcript to be edited without playback of the time-based media. After editing, the augmented transcript is processed by a media editing system to automatically generate an edited version of the time-based media that includes only the segments of the time-based media containing the speech corresponding to the edited augmented transcript.
    Type: Application
    Filed: March 29, 2010
    Publication date: September 29, 2011
    Inventors: Michael E. Phillips, Glenn Lea
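Because each word carries its time span, deleting words from the transcript directly yields the list of media segments to keep. A minimal sketch, assuming a simple (word, start, end) layout for the augmented transcript:

```python
# Sketch of text-driven media editing: the edited word list is matched against
# the timed transcript, and the surviving words' time spans are merged into
# the segments of media to keep.

def segments_to_keep(augmented_transcript, edited_words):
    """`augmented_transcript` is a list of (word, start_sec, end_sec) tuples;
    return merged time ranges covering only the words kept in the edit."""
    kept, remaining = [], list(edited_words)
    for word, start, end in augmented_transcript:
        if remaining and remaining[0] == word:
            remaining.pop(0)
            if kept and abs(kept[-1][1] - start) < 1e-9:
                kept[-1] = (kept[-1][0], end)      # extend an adjacent segment
            else:
                kept.append((start, end))
    return kept

transcript = [("hello", 0.0, 0.4), ("um", 0.4, 0.6), ("world", 0.6, 1.0)]
edl = segments_to_keep(transcript, ["hello", "world"])
```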
  • Publication number: 20110224990
    Abstract: A speaker speed conversion system includes: a risk site detection unit (22) for detecting sites of risk regarding sound quality from among speech that is received as input, a frame boundary detection unit (23) for searching for a plurality of points that can serve as candidates of frame boundaries from among speech that is received as input and, of these points, supplying as a frame boundary the point that is predicted to be best from the standpoint of sound quality, and an OLA unit (25) for implementing speed conversion based on the detection results in the frame boundary detection unit (23); wherein the frame boundary detection unit (23) eliminates, from candidates of frame boundaries, sites of risk regarding sound quality that were detected in the risk site detection unit (22).
    Type: Application
    Filed: July 22, 2008
    Publication date: September 15, 2011
    Inventor: Satoshi Hosokawa
  • Publication number: 20110218798
    Abstract: Techniques implemented as systems, methods, and apparatuses, including computer program products, for obfuscating sensitive content in an audio source representative of an interaction between a contact center caller and a contact center agent. The techniques include performing, by an analysis engine of a contact center system, a context-sensitive content analysis of the audio source to identify each audio source segment that includes content determined by the analysis engine to be sensitive content based on its context; and processing, by an obfuscation engine of the contact center system, one or more identified audio source segments to generate corresponding altered audio source segments each including obfuscated sensitive content.
    Type: Application
    Filed: March 5, 2010
    Publication date: September 8, 2011
    Applicant: Nexidia Inc.
    Inventor: Marsal Gavalda
  • Patent number: 7996230
    Abstract: A marker is derived from an interaction between a person and an agent of a business and the agent's user interface. A part of a speech signal that corresponds to a portion of the person's special information is located with the marker. The speech signal results from the interaction between the person and the agent. The part of the speech signal that corresponds to the portion of the person's special information is rendered unintelligible.
    Type: Grant
    Filed: August 6, 2009
    Date of Patent: August 9, 2011
    Assignee: Intellisist, Inc.
    Inventor: G. Kevin Doren
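Once the marker locates the span of speech carrying the special information, rendering it unintelligible reduces to overwriting that span of samples. Replacing the span with silence (rather than, say, noise or a tone) is an assumption for this sketch:

```python
# Illustrative redaction step: given a marker (start, end) in seconds derived
# from the agent's interaction, overwrite that span of the speech signal so
# the sensitive portion is unintelligible.

def redact(samples, sample_rate, start_sec, end_sec):
    out = list(samples)
    lo = int(start_sec * sample_rate)
    hi = min(int(end_sec * sample_rate), len(out))
    for i in range(lo, hi):
        out[i] = 0.0                      # render the span unintelligible
    return out

speech = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
clean = redact(speech, sample_rate=4, start_sec=0.5, end_sec=1.5)
```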
  • Patent number: 7991619
    Abstract: A system, method and computer program product for performing blind change detection audio segmentation that combines hypothesized boundaries from several segmentation algorithms to achieve the final segmentation of the audio stream. Automatic segmentation of the audio streams according to the system and method of the invention may be used for many applications like speech recognition, speaker recognition, audio data mining, online audio indexing, and information retrieval systems, where the actual boundaries of the audio segments are required.
    Type: Grant
    Filed: June 19, 2008
    Date of Patent: August 2, 2011
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Mohamed Kamal Omar, Ganesh N. Ramaswamy
  • Publication number: 20110178805
    Abstract: According to one embodiment, a sound quality control device includes: a time domain analysis module configured to perform a time-domain analysis on an audio-input signal; a frequency domain analysis module configured to perform a frequency-domain analysis on a frequency-domain signal; a first calculation module configured to calculate first speech/music scores based on the analysis results; a compensation filtering processing module configured to generate a filtered signal; a second calculation module configured to calculate second speech/music scores based on the filtered signal; a score correction module configured to generate one of corrected speech/music scores based on a difference between the first speech/music score and the second speech/music score; and a sound quality control module configured to control a sound quality of the audio-input signal based on the one of the corrected speech/music scores.
    Type: Application
    Filed: September 29, 2010
    Publication date: July 21, 2011
    Inventors: Hirokazu Takeuchi, Hiroshi Yonekubo
  • Publication number: 20110145001
    Abstract: A data stream is filtered to produce a filtered data stream. The data stream is analyzed based on an acoustic parameter to determine whether a predetermined condition is satisfied. At least one extraneous portion of the data stream, in which the predetermined condition is satisfied, is determined. Thereafter, the at least one extraneous portion is deleted from the data stream to produce the filtered data stream.
    Type: Application
    Filed: December 10, 2009
    Publication date: June 16, 2011
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Yeon-Jun KIM, I. Dan MELAMED, Bernard S. RENGER, Steven Neil TISCHER
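The filtering loop the abstract describes can be sketched as follows. Using frame energy as the acoustic parameter (e.g. to drop long silences) and the threshold value are assumptions for illustration:

```python
# Sketch of acoustic-parameter filtering: frames in which the predetermined
# condition holds (here, energy below a threshold) are treated as extraneous
# and deleted from the stream.

def filter_stream(frames, energy, threshold):
    """Drop frames whose energy falls below `threshold`."""
    return [f for f in frames if energy(f) >= threshold]

energy = lambda f: sum(s * s for s in f) / len(f)
stream = [[0.5, 0.5], [0.01, 0.0], [0.6, 0.4]]
filtered = filter_stream(stream, energy, threshold=0.01)
```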
  • Publication number: 20110145002
    Abstract: A method, apparatus, and computer-readable medium for editing a data stream based on a corpus are provided. The data stream includes stream words. A sequence includes a predetermined number of sequential words of the stream words. The method, apparatus, and computer-readable medium determine whether the sequence exists in the corpus at least at a predetermined minimum frequency. When the sequence exists in the corpus at least at the predetermined minimum frequency, the sequence is edited in the data stream.
    Type: Application
    Filed: September 17, 2010
    Publication date: June 16, 2011
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Ilya Dan MELAMED, Yeon-Jun KIM
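The frequency test above is an n-gram lookup: count each fixed-length word sequence in the corpus, then check stream sequences against a minimum frequency. What "editing" does to a matched sequence (here, simply collecting it) is an assumption of this sketch:

```python
# Sketch of corpus-based editing: sequences of `n` stream words that occur in
# the corpus at least `min_freq` times qualify for editing.
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def edit_stream(stream_words, corpus_words, n, min_freq):
    counts = Counter(ngrams(corpus_words, n))
    edited = []
    for gram in ngrams(stream_words, n):
        if counts[gram] >= min_freq:
            edited.append(gram)           # sequence qualifies for editing
    return edited

corpus = "thank you for calling thank you for calling".split()
stream = "well thank you for your call".split()
matches = edit_stream(stream, corpus, n=2, min_freq=2)
```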
  • Patent number: 7962335
    Abstract: Techniques and tools related to delayed or lost coded audio information are described. For example, a concealment technique for one or more missing frames is selected based on one or more factors that include a classification of each of one or more available frames near the one or more missing frames. As another example, information from a concealment signal is used to produce substitute information that is relied on in decoding a subsequent frame. As yet another example, a data structure having nodes corresponding to received packet delays is used to determine a desired decoder packet delay value.
    Type: Grant
    Filed: July 14, 2009
    Date of Patent: June 14, 2011
    Assignee: Microsoft Corporation
    Inventors: Hosam A. Khalil, Tian Wang, Kazuhito Koishida, Xiaoqin Sun, Wei-Ge Chen
  • Publication number: 20110137658
    Abstract: Provided is a method of canceling a vocal signal, wherein the method includes obtaining a difference signal between two audio signals; and smoothing the frequency of the difference signal. Also provided is a device for canceling a vocal signal, the device including a subtracter which obtains a difference signal between two audio signals; and a frequency smoothing unit which smoothes a frequency of the difference signal.
    Type: Application
    Filed: October 12, 2010
    Publication date: June 9, 2011
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Jun-ho LEE
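The subtraction step works because material panned identically into both stereo channels (typically the lead vocal) cancels in the difference signal. In the sketch below, a 3-tap moving average stands in for the frequency-smoothing step of the abstract and is only an illustrative simplification:

```python
# Minimal sketch of center-channel vocal cancellation: subtract the two
# channels, then smooth the difference signal.

def cancel_vocal(left, right):
    diff = [l - r for l, r in zip(left, right)]       # center content cancels
    smoothed = []
    for i in range(len(diff)):
        lo, hi = max(i - 1, 0), min(i + 2, len(diff))
        smoothed.append(sum(diff[lo:hi]) / (hi - lo)) # simple smoothing
    return smoothed

vocal = [0.5, 0.5, 0.5, 0.5]                 # identical in both channels
side = [0.1, 0.2, 0.1, 0.2]                  # instruments, left channel only
left = [v + s for v, s in zip(vocal, side)]
right = list(vocal)
out = cancel_vocal(left, right)
```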
  • Patent number: 7945446
    Abstract: Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
    Type: Grant
    Filed: March 9, 2006
    Date of Patent: May 17, 2011
    Assignee: Yamaha Corporation
    Inventors: Hideki Kemmochi, Yasuo Yoshioka, Jordi Bonada
  • Patent number: 7933768
    Abstract: A vocoder system for improving the performance expression of an output sound while lightening the computational load. The system includes formant detection means and division means in which the center frequencies have been fixed. The modulation levels of each of the frequency bands divided by the division means are set by a setting means based on the levels of the corresponding frequency bands detected by the formant detection means and on formant information with which the formants are changed. Therefore, it is possible to improve the performance expression of the output sound with a light computational load and without the need to calculate and change the filter coefficients of each filter for each sample in order to change the center frequency and bandwidth of each of the filters comprising the division means.
    Type: Grant
    Filed: March 23, 2004
    Date of Patent: April 26, 2011
    Assignee: Roland Corporation
    Inventor: Tadao Kikumoto
  • Publication number: 20110054910
    Abstract: A system provided herein may perform automatic temporal alignment between a music audio signal and lyrics with higher accuracy than ever. A non-fricative section extracting portion 4 extracts non-fricative sound sections, where no fricative sounds exist, from the music audio signal. An alignment portion 17 includes a phone model 15 for singing voice capable of estimating phonemes corresponding to temporal-alignment features. The alignment portion 17 performs an alignment operation using as inputs temporal-alignment features obtained from a temporal-alignment feature extracting portion 11, information on vocal and non-vocal sections obtained from a vocal section estimating portion 9, and a phoneme network SN, on conditions that no phonemes exist at least in non-vocal sections and that no fricative phonemes exist in non-fricative sound sections.
    Type: Application
    Filed: February 5, 2009
    Publication date: March 3, 2011
    Applicant: NATIONAL INSTITUTE OF ADVANCED INDUSTRIAL SCIENCE AND TECHNOLOGY
    Inventors: Hiromasa Fujihara, Masataka Goto
  • Patent number: 7865367
    Abstract: A system that includes a speaker workstation and a system that includes an auditor device. The speaker workstation is configured to perform a method for generating a Speech Hyperlink-Time table in conjunction with a system of universal time. The speaker workstation creates a Speech Hyperlink table. While a speech is being spoken by a speaker, the speaker workstation recognizes each hyperlinked term of the Speech Hyperlink table being spoken by the speaker, and for each recognized hyperlinked term, generates a row in the Speech Hyperlink-Time table. The auditor device is configured to perform a method for processing a speech in conjunction with a system of universal time. The auditor device determines and records, in a record of a Selections Hyperlink-Time table, a universal time corresponding to a hyperlinked term spoken during a speech.
    Type: Grant
    Filed: March 12, 2009
    Date of Patent: January 4, 2011
    Assignee: International Business Machines Corporation
    Inventor: Fernando Incertis Carro
  • Patent number: 7865360
    Abstract: An audio device for modifying the voice transmitted during a telephone call particularly suitable for a mobile telephone system receives from the user of the audio device an analog speech signal. A converter converts the analog speech signal into a digital speech signal comprising at least one fundamental frequency. A set of coded data represents a musical score comprising a set of notes, each note being defined by a fundamental frequency, a duration, and an instrument that plays the note. A digital music signal is extracted from the set of coded data, and a first portion of the digital speech signal is mixed with a first portion of the digital music signal to produce a combined digital signal.
    Type: Grant
    Filed: March 18, 2004
    Date of Patent: January 4, 2011
    Assignee: IPG Electronics 504 Limited
    Inventors: Xavier Fourquin, Pierre Bonnard
  • Patent number: 7865370
    Abstract: According to some embodiments, content filtering is provided for a digital audio signal.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: January 4, 2011
    Assignee: Intel Corporation
    Inventors: Christopher J. Cormack, Tony Moy
  • Publication number: 20100332237
    Abstract: According to one embodiment, a sound quality correction apparatus calculates various feature parameters for identifying the speech signal and the music signal from an input audio signal and, based on the various feature parameters thus calculated, also calculates a speech/music identification score indicating which of the speech signal and the music signal the input audio signal is closer to. Then, based on this speech/music identification score, the correction strength of each of plural sound quality correctors is controlled to execute different types of the sound quality correction processes on the input audio signal.
    Type: Application
    Filed: February 4, 2010
    Publication date: December 30, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Hirokazu TAKEUCHI
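Score-driven control of the correctors can be sketched as below. The linear blend between two corrector strengths is an assumption about the control law; the patent controls plural correctors of different types:

```python
# Sketch of score-driven correction control: a speech/music score in [0, 1]
# (0 = speech-like, 1 = music-like) sets the strengths of two correctors so
# that, e.g., dialogue enhancement fades out as music confidence rises.

def corrector_strengths(speech_music_score):
    """Return (speech_corrector_strength, music_corrector_strength)."""
    score = min(max(speech_music_score, 0.0), 1.0)
    return (1.0 - score, score)

speech_like = corrector_strengths(0.1)   # mostly speech correction
music_like = corrector_strengths(0.9)    # mostly music correction
```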
  • Patent number: 7831421
    Abstract: Techniques and tools related to delayed or lost coded audio information are described. For example, a concealment technique for one or more missing frames is selected based on one or more factors that include a classification of each of one or more available frames near the one or more missing frames. As another example, information from a concealment signal is used to produce substitute information that is relied on in decoding a subsequent frame. As yet another example, a data structure having nodes corresponding to received packet delays is used to determine a desired decoder packet delay value.
    Type: Grant
    Filed: May 31, 2005
    Date of Patent: November 9, 2010
    Assignee: Microsoft Corporation
    Inventors: Hosam A. Khalil, Tian Wang, Kazuhito Koishida, Xiaoqin Sun, Wei-Ge Chen
  • Patent number: 7814284
    Abstract: A data redundancy elimination system.
    Type: Grant
    Filed: January 18, 2007
    Date of Patent: October 12, 2010
    Assignee: Cisco Technology, Inc.
    Inventors: Gideon Glass, Maxim Martynov, Qiwen Zhang, Etai Lev Ran, Dan Li
  • Publication number: 20100250257
    Abstract: This invention includes: a voice quality feature database (101) holding voice quality features; a speaker attribute database (106) holding, for each voice quality feature, an identifier enabling a user to expect a voice quality of the voice quality feature; a weight setting unit (103) setting a weight for each acoustic feature of a voice quality; a scaling unit (105) calculating display coordinates of each voice quality feature based on the acoustic features in the voice quality feature and the weights set by the weight setting unit (103); a display unit (107) displaying the identifier of each voice quality feature on the calculated display coordinates; a position input unit (108) receiving designated coordinates; and a voice quality mix unit (110) (i) calculating a distance between (1) the received designated coordinates and (2) the display coordinates of each of a part or all of the voice quality features, and (ii) mixing the acoustic features of the part or all of the voice quality features together based
    Type: Application
    Filed: June 4, 2008
    Publication date: September 30, 2010
    Inventors: Yoshifumi Hirose, Takahiro Kamai
  • Patent number: 7796626
    Abstract: For supporting a decoding of encoded frames, which belong to a sequence of frames received via a packet switched network, it is detected whether a particular encoded frame has been received after a scheduled decoding time for the particular encoded frame and before a scheduled decoding time for a next encoded frame. In case the particular encoded frame is detected to have been received after its scheduled decoding time and before the scheduled decoding time for the next encoded frame, the particular encoded frame is re-scheduled to be decoded at the scheduled decoding time for the next encoded frame.
    Type: Grant
    Filed: September 26, 2006
    Date of Patent: September 14, 2010
    Assignee: Nokia Corporation
    Inventors: Ari Lakaniemi, Pasi S. Ojala
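The rescheduling rule reduces to a three-way decision per frame: on time, late but still usable, or too late. The concrete times below are illustrative:

```python
# Sketch of the rescheduling rule: a frame arriving after its own decode
# deadline but before the next frame's deadline is decoded at that next
# deadline instead of being discarded.

def schedule_decode(arrival, deadline, next_deadline):
    """Return the time at which to decode the frame, or None to discard it."""
    if arrival <= deadline:
        return deadline                   # on time: decode as scheduled
    if arrival < next_deadline:
        return next_deadline              # late but usable: re-schedule
    return None                           # too late: conceal instead

on_time = schedule_decode(arrival=9.5, deadline=10.0, next_deadline=10.02)
late = schedule_decode(arrival=10.01, deadline=10.0, next_deadline=10.02)
lost = schedule_decode(arrival=10.5, deadline=10.0, next_deadline=10.02)
```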
  • Patent number: 7792674
    Abstract: A method and machine-readable medium for providing virtual spatial sound with an audio visual player are disclosed. Input audio is processed into output audio having spatial attributes associated with the spatial sound represented in a room display.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: September 7, 2010
    Assignee: Smith Micro Software, Inc.
    Inventors: Robert J. E. Dalton, Jr., Rupen Dolasia
  • Patent number: 7778837
    Abstract: Systems and methods that create a classification of sentences in a language, and further construct associated local versions of language models, based on geographical location and/or other demographic criteria, wherein such local language models can be of different levels of granularity according to the chosen demographic criteria. The subject innovation employs a classification encoder component that forms a classification (e.g. a tree structure) of sentences, and a local language models encoder component, which employs the classification of sentences in order to construct the localized language models. A decoder component can subsequently enable local word wheeling and/or local web search by blending k-best answers from local language models of varying demographic granularity that match users' demographics. Hence, k-best matches for input data by users in one demographic locality can be different from k-best matches for the same input by other users in another locality.
    Type: Grant
    Filed: November 30, 2006
    Date of Patent: August 17, 2010
    Assignee: Microsoft Corporation
    Inventors: Bo Thiesson, Kenneth W. Church
  • Patent number: 7778823
    Abstract: Some embodiments of the invention provide a method of processing audio data while creating a media presentation. The media presentation includes several audio streams. The method processes a section of a first audio stream and stores the processed section of the first audio stream. The method also processes a section of a second audio stream that overlaps with the processed section of the first audio stream. The method processes the section of the second audio stream independently of the first audio stream. In some embodiments, the method processes the first audio stream section by applying an effect to the first audio stream section. Also, in some embodiments, the processing of the first audio stream section entails performing a sample rate conversion on the first audio stream section.
    Type: Grant
    Filed: December 21, 2007
    Date of Patent: August 17, 2010
    Assignee: Apple Inc.
    Inventor: Kenneth M. Carson
  • Publication number: 20100195812
    Abstract: The claimed subject matter relates to an architecture that can preprocess audio portions of communications in order to enrich multiparty communication sessions or environments. In particular, the architecture can provide both a public channel for public communications that are received by substantially all connected parties and can further provide a private channel for private communications that are received by a selected subset of all connected parties. Most particularly, the architecture can apply an audio transform to communications that occur during the multiparty communication session based upon a target audience of the communication. By way of illustration, the architecture can apply a whisper transform to private communications, an emotion transform based upon relationships, an ambience or spatial transform based upon physical locations, or a pace transform based upon lack of presence.
    Type: Application
    Filed: February 5, 2009
    Publication date: August 5, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Dinei A. Florencio, Alejandro Acero, William Buxton, Phillip A. Chou, Ross G. Cutler, Jason Garms, Christian Huitema, Kori M. Quinn, Daniel Allen Rosenfeld, Zhengyou Zhang
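    The architecture above selects an audio transform based on the target audience of a communication (e.g., a whisper transform for private messages). A minimal dispatch sketch under assumed names; the transform functions are placeholders, not the patented signal processing:

    ```python
    def whisper_transform(audio):
        # Placeholder for the private-channel whisper transform.
        return ("whisper", audio)

    def identity_transform(audio):
        # Public communications pass through unmodified here.
        return ("public", audio)

    def preprocess(audio, target_audience, all_parties):
        """Pick an audio transform based on who will receive the message."""
        if set(target_audience) < set(all_parties):  # proper subset => private
            return whisper_transform(audio)
        return identity_transform(audio)
    ```

    The same dispatch point could select the emotion, ambience/spatial, or pace transforms the abstract mentions, keyed on relationships, locations, or presence rather than audience size.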
  • Publication number: 20100198600
    Abstract: A voice conversion training system, voice conversion system, voice conversion client-server system, and program that realize voice conversion to be performed with low load of training are provided. In a server 10, an intermediate conversion function generation unit 101 generates an intermediate conversion function F, and a target conversion function generation unit 102 generates a target conversion function G. In a mobile terminal 20, an intermediate voice conversion unit 211 uses the conversion function F to generate speech of an intermediate speaker from speech of a source speaker, and a target voice conversion unit 212 uses the conversion function G to convert the intermediate speaker's speech generated by the intermediate voice conversion unit 211 into speech of a target speaker.
    Type: Application
    Filed: November 28, 2006
    Publication date: August 5, 2010
    Inventor: Tsuyoshi Masuda
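    The two-stage scheme above composes a source-to-intermediate function F with an intermediate-to-target function G on the client. A sketch of that composition, with simple linear feature maps standing in for the trained conversion functions (the real F and G would be learned models, e.g. GMM-based feature mappings; the scale/offset values here are arbitrary):

    ```python
    import numpy as np

    def make_linear_map(scale, offset):
        """Stand-in for a trained conversion function over speech features."""
        return lambda features: scale * features + offset

    F = make_linear_map(1.1, 0.2)   # source -> intermediate (server-trained)
    G = make_linear_map(0.9, -0.1)  # intermediate -> target (server-trained)

    def convert(source_features):
        """Client-side conversion: source -> intermediate -> target."""
        intermediate = F(source_features)  # intermediate voice conversion unit
        return G(intermediate)             # target voice conversion unit

    src = np.array([1.0, 2.0, 3.0])
    out = convert(src)
    ```

    The training saving comes from the factoring: each source speaker needs only an F to the shared intermediate speaker, and each target speaker only a G from it, rather than one function per source-target pair.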
  • Publication number: 20100161326
    Abstract: A speech recognition system includes: a speed level classifier for measuring a moving speed of a moving object by using a noise signal at an initial time of speech recognition to determine a speed level of the moving object; a first speech enhancement unit for enhancing sound quality of an input speech signal of the speech recognition by using a Wiener filter, if the speed level of the moving object is equal to or lower than a specific level; and a second speech enhancement unit enhancing the sound quality of the input speech signal by using a Gaussian mixture model, if the speed level of the moving object is higher than the specific level. The system further includes an end point detection unit for detecting start and end points, and an elimination unit for eliminating sudden noise components based on a sudden noise Gaussian mixture model.
    Type: Application
    Filed: July 21, 2009
    Publication date: June 24, 2010
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Sung Joo Lee, Ho-Young Jung, Jeon Gue Park, Hoon Chung, Yunkeun Lee, Byung Ok Kang, Hyung-Bae Jeon, Jong Jin Kim, Ki-young Park, Euisok Chung, Ji Hyun Wang, Jeom Ja Kang
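    The system above gates between two enhancement paths on an estimated speed level. A minimal dispatch sketch; the speed classifier, threshold value, and both enhancement functions are crude placeholders for the patented components:

    ```python
    import numpy as np

    SPEED_THRESHOLD = 60.0  # hypothetical "specific level"

    def classify_speed(noise_frame):
        """Crude stand-in for the speed level classifier: maps the noise
        energy measured at the start of recognition to a speed estimate."""
        return float(np.mean(noise_frame ** 2)) * 1000.0

    def wiener_enhance(signal):
        return ("wiener", signal)  # placeholder for the Wiener-filter path

    def gmm_enhance(signal):
        return ("gmm", signal)     # placeholder for the GMM-based path

    def enhance(signal, noise_frame):
        speed = classify_speed(noise_frame)
        if speed <= SPEED_THRESHOLD:
            return wiener_enhance(signal)  # at or below the specific level
        return gmm_enhance(signal)         # above the specific level
    ```

    The design point is that the routing decision uses only the initial noise frame, so the enhancement path is fixed before the speech itself is processed.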
  • Patent number: 7743016
    Abstract: An apparatus for processing a signal and method thereof are disclosed. Data coding and entropy coding are performed with interconnection, and grouping is used to enhance coding efficiency. The present invention includes the steps of obtaining index information, entropy-decoding the index information, identifying a content corresponding to the entropy-decoded index information, and selecting an entropy table.
    Type: Grant
    Filed: October 4, 2006
    Date of Patent: June 22, 2010
    Assignee: LG Electronics Inc.
    Inventors: Hee Suk Pang, Hyen O Oh, Dong Soo Kim, Jae Hyun Lim, Yang-Won Jung, Hyo Jin Kim
  • Patent number: 7739112
    Abstract: A signal connecting method and apparatus is provided which can reduce noises and create natural synthesized voices. The signal connecting method (or apparatus) for connecting a plurality of waveform signals and creating a synthesized waveform signal, has: a step (or unit) for determining an upper limit frequency of a frequency spectrum of each of the plurality of waveform signals; and a step (or unit) for filtering at least a connection portion of each waveform signal by using predetermined filter characteristics having the determined upper limit frequency. The cut-off frequency of the filtering is the higher upper limit frequency in upper limit frequencies of spectra of adjacent two waveform signals before and after the connection portion of the waveform signals. Higher harmonics to be caused by discontinuity of the connection portion of waveform signals can be effectively removed and noises of synthesized waveform signals can be reduced considerably.
    Type: Grant
    Filed: June 27, 2002
    Date of Patent: June 15, 2010
    Assignees: Kabushiki Kaisha Kenwood, Advanced Telecommunications Research Institute International
    Inventors: Yasushi Sato, Davin Patrick
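    The connecting method above determines each waveform's upper limit frequency and filters with a cut-off equal to the higher of the two adjacent upper limits. A sketch using an ideal FFT low-pass as a stand-in for the patent's filter, and filtering whole segments rather than only the connection portion (both simplifications; the `threshold` heuristic for the upper limit is also an assumption):

    ```python
    import numpy as np

    def upper_limit_frequency(segment, rate, threshold=1e-3):
        """Highest frequency whose magnitude exceeds a small relative threshold."""
        spectrum = np.abs(np.fft.rfft(segment))
        freqs = np.fft.rfftfreq(len(segment), d=1.0 / rate)
        significant = freqs[spectrum > threshold * spectrum.max()]
        return significant.max() if significant.size else 0.0

    def lowpass(segment, rate, cutoff):
        """Ideal FFT low-pass: zero all components above the cut-off."""
        spectrum = np.fft.rfft(segment)
        freqs = np.fft.rfftfreq(len(segment), d=1.0 / rate)
        spectrum[freqs > cutoff] = 0.0
        return np.fft.irfft(spectrum, n=len(segment))

    def connect(seg_a, seg_b, rate):
        # Cut-off = the higher upper limit of the two adjacent segments.
        cutoff = max(upper_limit_frequency(seg_a, rate),
                     upper_limit_frequency(seg_b, rate))
        return np.concatenate([lowpass(seg_a, rate, cutoff),
                               lowpass(seg_b, rate, cutoff)])
    ```

    Choosing the higher of the two upper limits preserves all content present in either segment while still removing the higher harmonics introduced by the discontinuity at the join.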
  • Patent number: 7734472
    Abstract: The invention concerns a speech recognition enhancer (51) and a speech recognition system comprising such speech recognition enhancer (51), an audio input unit (41) and a speech recognizer (61, 3). The speech recognition enhancer (51) is arranged between the audio input unit (41) and the speech recognizer (61, 3). The speech recognition enhancer (51) has a parametrizable pre-filtering unit (511), a parametrizable dynamic voice level control unit (512), a parametrizable noise reduction unit (513) and a parametrizable voice level control unit (514). The parameters of these parametrizable units (511, 512, 513, 514) are adjusted to the characteristics of the specific audio input unit (41) and/or the characteristics of the specific speech recognizer (61, 3) for adapting the audio input unit (41) to the speech recognizer (61, 3).
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: June 8, 2010
    Assignee: Alcatel
    Inventor: Michael Walker
  • Publication number: 20100137030
    Abstract: Disclosed is a technique for presenting audible items to a user in a manner that allows the user to easily distinguish them and to select from among them. A number of audible items are rendered simultaneously to the user. To prevent the sounds from blending together into a sonic mishmash, some of the items are “conditioned” while they are being rendered. For example, one audible item might be rendered more quietly than another, or one item can be moved up in register compared with another. Some embodiments combine audible conditioning with visual avatars portrayed on, for example, a display screen of a user device. During the rendering, each audible item is paired with an avatar, the pairing based on some suitable criterion, such as a type of conditioning applied to the audible item. Audible spatial placement is mimicked by visual placement of the avatars on the user's display screen.
    Type: Application
    Filed: December 2, 2008
    Publication date: June 3, 2010
    Applicant: MOTOROLA, INC.
    Inventor: Changxue Ma
  • Patent number: 7698139
    Abstract: In a method and apparatus for a differentiated voice output, systems existing in a vehicle, such as the on-board computer, the navigation system, and others, can be connected with a voice output device. The voice outputs of different systems can be differentiated by way of voice characteristics.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: April 13, 2010
    Assignee: Bayerische Motoren Werke Aktiengesellschaft
    Inventors: Georg Obert, Klaus-Josef Bengler
  • Patent number: 7698138
    Abstract: A broadcast receiving system includes a broadcast receiving part for receiving a broadcast in which additional information that corresponds to an object appearing in broadcast contents and that contains keyword information for specifying the object is broadcasted simultaneously with the broadcast contents; a recognition vocabulary generating section for generating a recognition vocabulary set in a manner corresponding to the additional information by using a synonym dictionary; a speech recognition section for performing the speech recognition of a voice uttered by a viewing person, and for thereby specifying keyword information corresponding to a recognition vocabulary set when a word recognized as the speech recognition result is contained in the recognition vocabulary set; and a displaying section for displaying additional information corresponding to the specified keyword information.
    Type: Grant
    Filed: December 26, 2003
    Date of Patent: April 13, 2010
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai, Hideyuki Yoshida, Yoshifumi Hirose
  • Patent number: 7693717
    Abstract: An apparatus comprising a session file, session file editor, annotation window, concatenation software and training software. The session file includes one or more audio files and text associated with each audio file segment. The session file editor displays text and provides text selection capability and plays back audio. The annotation window, operably associated with the session file editor, supports user modification of the selected text; the annotation window saves modified text corresponding to the selected text from the session file editor and audio associated with the modified text. The concatenation software concatenates modified text and audio associated therewith for two or more instances of the selected text. The training software trains a speech user profile using a concatenated file formed by the concatenating software.
    Type: Grant
    Filed: April 12, 2006
    Date of Patent: April 6, 2010
    Assignee: Custom Speech USA, Inc.
    Inventors: Jonathan Kahn, Michael C. Huttinger
  • Publication number: 20100076771
    Abstract: A voice signal processing apparatus and method includes determining maximum amplitude values of a plurality of different voice frame signals obtained by applying different amounts of phase shift to frequency components of voice frame signals of a predetermined length divided from a digital voice signal, and selecting the voice frame signal whose maximum amplitude value is the minimum among the maximum amplitude values of the plurality of different voice frame signals.
    Type: Application
    Filed: September 16, 2009
    Publication date: March 25, 2010
    Applicant: Fujitsu Limited
    Inventor: Fumio AMANO
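    The method above generates phase-shifted variants of a frame and keeps the one with the smallest peak amplitude (a peak-reduction technique). A minimal sketch; the uniform phase shift applied to every component and the particular set of candidate shifts are illustrative assumptions, not the patent's specific scheme:

    ```python
    import numpy as np

    def phase_shift(frame, shift_radians):
        """Apply a uniform phase shift to the frame's frequency components.
        (irfft enforces a real result, so this is an approximation.)"""
        spectrum = np.fft.rfft(frame)
        spectrum[1:] *= np.exp(1j * shift_radians)  # leave DC untouched
        return np.fft.irfft(spectrum, n=len(frame))

    def min_peak_variant(frame, shifts=(0.0, np.pi / 2, np.pi, 3 * np.pi / 2)):
        """Return the shifted variant whose peak amplitude is smallest."""
        variants = [phase_shift(frame, s) for s in shifts]
        peaks = [np.max(np.abs(v)) for v in variants]
        return variants[int(np.argmin(peaks))]
    ```

    Because a zero shift is among the candidates, the selected variant's peak can never exceed the original frame's, so the transformation only reduces (or preserves) the maximum amplitude.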
  • Patent number: RE41939
    Abstract: An audio/video reproducing apparatus is connectable to a communications network for selectively reproducing items of audio/video material from a recording medium in response to a request received via the communications network. The audio/video reproducing apparatus may comprise a control processor operable in use to receive data representing the request for the audio/video material item via the communications network. A reproducing processor is operable in response to signals identifying the audio/video material items from the control processor to reproduce the audio/video material items. The data identifying the audio/video material items includes meta data indicative of the audio/video material items. The meta data may be one of a UMID (a Unique Material Identifier of the material items), a tape ID, and time codes.
    Type: Grant
    Filed: August 3, 2006
    Date of Patent: November 16, 2010
    Assignee: Sony United Kingdom Limited
    Inventors: Vincent Carl Harradine, Alan Turner, Morgan William David, Michael Williams, Mark John McGrath, Andrew Kydd, Jonathan Thorpe
  • Patent number: RE42647
    Abstract: The present invention provides a text-to-speech conversion system (TTS) for synchronizing with multimedia and a method for organizing input data of the TTS which can enhance the naturalness of synthesized speech and accomplish the synchronization of multimedia with TTS by defining additional prosody information, the information required to synchronize TTS with multimedia, and an interface between this information and TTS for use in the production of the synthesized speech.
    Type: Grant
    Filed: September 30, 2002
    Date of Patent: August 23, 2011
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Jung Chul Lee, Min Soo Hahn, Hang Seop Lee, Jae Woo Yang, Youngjik Lee