Subportions Patents (Class 704/249)
  • Patent number: 8010361
    Abstract: In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user.
    Type: Grant
    Filed: July 30, 2008
    Date of Patent: August 30, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
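The pipeline in the abstract above (phone lattice → morpheme detection → task-type classification) can be sketched as a toy, under invented data structures: the lattice format, the morpheme inventory, and the task map below are illustrative assumptions, not taken from the patent.

```python
# Toy sketch of a phone-lattice -> morpheme -> task-type pipeline.
# Lattice format, morpheme inventory, and task map are hypothetical.

def best_path(lattice):
    """Pick the most probable phone at each lattice slot."""
    return [max(alts, key=lambda a: a[1])[0] for alts in lattice]

def detect_morphemes(phones, inventory):
    """Spot known morphemes as substrings of the joined phone string."""
    s = "".join(phones)
    return [m for m in inventory if m in s]

def classify_task(morphemes, task_map):
    """Vote for the task types associated with the detected morphemes."""
    scores = {}
    for m in morphemes:
        for task in task_map.get(m, []):
            scores[task] = scores.get(task, 0) + 1
    return max(scores, key=scores.get) if scores else None

# Example: a three-slot lattice, each slot a list of (phone, probability).
lattice = [[("b", 0.9), ("p", 0.1)], [("ih", 0.8), ("eh", 0.2)], [("l", 1.0)]]
phones = best_path(lattice)                       # ["b", "ih", "l"]
morphemes = detect_morphemes(phones, ["bihl"])    # ["bihl"]
task = classify_task(morphemes, {"bihl": ["billing"]})
```

A real system would score morphemes over the full lattice distribution rather than only its best path; the single-path version keeps the sketch short.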
  • Publication number: 20110208525
    Abstract: A voice recognizing apparatus includes a voice start instructing section 3 for instructing to start voice recognition; a voice input section 1 for receiving uttered voice and converting to a voice signal; a voice recognizing section 2 for recognizing the voice on the basis of the voice signal; an utterance start time detecting section 4 for detecting duration from the time when the voice start instructing section instructs to the time when the voice input section delivers the voice signal; an utterance timing deciding section 5 for deciding utterance timing indicating whether the utterance start is quick or slow by comparing the duration detected by the utterance start time detecting section with a prescribed threshold; an interaction control section 6 for determining a content, which is to be shown when exhibiting a recognition result of the voice recognizing section, in accordance with the utterance timing decided; a system response generating section 7 for generating a system response on the basis of the d …

    Type: Application
    Filed: March 27, 2008
    Publication date: August 25, 2011
    Inventors: Yuzuru Inoue, Tadashi Suzuki, Fumitaka Sato, Takayoshi Chikuri
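The timing decision described above reduces to comparing one duration against a threshold and choosing a response style accordingly. A minimal sketch, assuming an illustrative threshold and invented prompt texts:

```python
def decide_utterance_timing(instruct_time_s, speech_onset_s, threshold_s=0.8):
    """Return 'quick' or 'slow' by comparing the instruct-to-speech duration
    against a prescribed threshold (the 0.8 s value is made up)."""
    duration = speech_onset_s - instruct_time_s
    return "quick" if duration <= threshold_s else "slow"

def choose_prompt(timing):
    # A quick talker gets a terse confirmation; a slow one gets more guidance.
    return "Got it." if timing == "quick" else "I heard you. Say 'yes' to confirm."
```

The interaction-control step then feeds `choose_prompt`'s text to the response generator.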
  • Patent number: 8000963
    Abstract: The sound reproducing apparatus includes a replay section receiving unit that receives a request for reproducing a specific part of sound file from a user, a replay section determining unit that determines a replay section based on the request and conversation structure information stored in a sound data holding unit, and a reproducing unit that reproduces the replay section determined by the replay section determining unit.
    Type: Grant
    Filed: March 25, 2005
    Date of Patent: August 16, 2011
    Assignee: Fujitsu Limited
    Inventors: Sachiko Onodera, Ryo Ochitani
  • Patent number: 7996221
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for processing a message received from a user to determine whether an estimate of intelligibility is below an intelligibility threshold. The method includes recognizing a portion of a user's message that contains the one or more expected utterances from a critical information list, calculating an estimate of intelligibility for the recognized portion of the user's message that contains the one or more expected utterances, and prompting the user to repeat at least the recognized portion of the user's message if the calculated estimate of intelligibility for the recognized portion of the user's message is below an intelligibility threshold. In one aspect, the method further includes prompting the user to repeat at least a portion of the message if any of a measured speech level and a measured signal-to-noise ratio of the user's message are determined to be below their respective thresholds.
    Type: Grant
    Filed: December 22, 2009
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
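The re-prompt logic above is a set of threshold tests. A compact sketch, with all threshold values invented for illustration:

```python
def should_reprompt(intelligibility, speech_level_db, snr_db,
                    intelligibility_thresh=0.7,
                    level_thresh_db=-30.0,
                    snr_thresh_db=10.0):
    """Ask the caller to repeat when the intelligibility estimate, the
    measured speech level, or the signal-to-noise ratio falls below its
    threshold (threshold values here are illustrative, not from the patent)."""
    return (intelligibility < intelligibility_thresh
            or speech_level_db < level_thresh_db
            or snr_db < snr_thresh_db)
```

In the patent's flow, only the recognized portion containing the expected utterances would be scored and re-prompted, not the whole message.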
  • Publication number: 20110184736
    Abstract: Automated methods are provided for recognizing inputted information items and selecting information items. The recognition and selection processes are performed by selecting category designations that the information items belong to. The category designations improve the accuracy and speed of the inputting and selection processes.
    Type: Application
    Filed: January 25, 2011
    Publication date: July 28, 2011
    Inventor: Benjamin SLOTZNICK
  • Publication number: 20110131045
    Abstract: Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command.
    Type: Application
    Filed: February 2, 2011
    Publication date: June 2, 2011
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Philippe Di Cristo, Min Ke, Robert A. Kennewick, Lynn Elise Armstrong
  • Publication number: 20110119052
    Abstract: A device extracts, from speech data, prosodic information including a power value and an utterance section including a period whose power value is equal to or larger than a threshold; divides the utterance section into sections in which the power value is equal to or larger than another threshold; acquires phoneme sequence data for each divided speech data by phoneme recognition; generates clusters, each a set of classified phoneme sequence data, by clustering; calculates an evaluation value for each cluster; selects clusters whose evaluation value is equal to or larger than a given value as candidate clusters; determines, for each candidate cluster, one of the phoneme sequence data constituting the cluster to be a representative phoneme sequence; and selects the divided speech data corresponding to the representative phoneme sequence as listening target speech data.
    Type: Application
    Filed: November 5, 2010
    Publication date: May 19, 2011
    Applicant: Fujitsu Limited
    Inventor: Sachiko Onodera
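The first step of the pipeline above, power-thresholded segmentation, can be sketched on its own; the clustering and representative-selection stages are omitted here, and the frame length and threshold are illustrative.

```python
def power_sections(samples, frame_len, power_thresh):
    """Return (start, end) sample ranges whose frame power meets the threshold.
    A simplified stand-in for the patent's utterance-section extraction."""
    sections, start = [], None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        power = sum(x * x for x in frame) / len(frame)
        if power >= power_thresh and start is None:
            start = i                       # section opens on a loud frame
        elif power < power_thresh and start is not None:
            sections.append((start, i))     # section closes on a quiet frame
            start = None
    if start is not None:
        sections.append((start, len(samples)))
    return sections

# Loud - quiet - loud: two sections are recovered.
samples = [1.0] * 4 + [0.0] * 4 + [1.0] * 4
sections = power_sections(samples, frame_len=4, power_thresh=0.5)   # [(0, 4), (8, 12)]
```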
  • Publication number: 20110115702
    Abstract: A method and system for computer programming using speech and one or two hand gesture input is described. The system generally uses a plurality of microphones and cameras as input devices. A configurable event recognition system is described allowing various software objects in a system to respond to speech and hand gesture and other input. From this input, program code is produced that can be compiled at any time. Various speech and hand gesture events invoke functions within programs to modify programs, move text and punctuation in a word processor, manipulate mathematical objects, perform data mining, perform natural language internet search, modify project management tasks and visualizations, perform 3D modeling, web page design and web page data entry, and television and DVR programming.
    Type: Application
    Filed: July 9, 2009
    Publication date: May 19, 2011
    Inventor: David Seaberg
  • Publication number: 20110112838
    Abstract: A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
    Type: Application
    Filed: November 10, 2009
    Publication date: May 12, 2011
    Applicant: Research In Motion Limited
    Inventor: Sasan Adibi
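The frequency-domain half of the scheme above (transform, normalize to the first harmonic, digitize harmonic amplitudes into bits) can be sketched with a plain DFT; the harmonic count, bit width, and normalization below are illustrative assumptions, not the patent's actual encoding.

```python
import cmath
import math

def dft_amplitude(samples, k):
    """Magnitude of DFT bin k, normalised by the window length."""
    n_total = len(samples)
    return abs(sum(s * cmath.exp(-2j * cmath.pi * k * n / n_total)
                   for n, s in enumerate(samples))) / n_total

def harmonic_voice_id(samples, f0_bin, n_harmonics=4, bits_per_harmonic=2):
    """Digitise harmonic amplitudes (relative to the first harmonic) into a
    bit string -- a toy stand-in for the patent's voice ID."""
    amps = [dft_amplitude(samples, f0_bin * h) for h in range(1, n_harmonics + 1)]
    ref = amps[0] or 1.0                      # first harmonic as the reference
    levels = (1 << bits_per_harmonic) - 1
    return "".join(format(min(levels, int(a / ref * levels)),
                          f"0{bits_per_harmonic}b") for a in amps)

# A pure sine at bin 4 of a 32-sample window: all energy in the first harmonic.
tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(32)]
voice_id = harmonic_voice_id(tone, f0_bin=4)
```

Authentication would then compare the enrolled and presented bit strings, e.g. by Hamming distance.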
  • Publication number: 20110112830
    Abstract: A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
    Type: Application
    Filed: November 10, 2009
    Publication date: May 12, 2011
    Applicant: Research In Motion Limited
    Inventor: Sasan Adibi
  • Publication number: 20110112839
    Abstract: A command recognition device includes: an utterance understanding unit that determines or selects word sequence information from speech information; a speech confidence degree calculating unit that calculates a degree of speech confidence based on the speech information and the word sequence information; a phrase confidence degree calculating unit that calculates a degree of phrase confidence based on image information and phrase information included in the word sequence information; and a motion control instructing unit that determines whether a command of the word sequence information should be executed based on the degree of speech confidence and the degree of phrase confidence.
    Type: Application
    Filed: September 2, 2010
    Publication date: May 12, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Kotaro FUNAKOSHI, Mikio NAKANO, Xiang ZUO, Naoto IWAHASHI, Ryo TAGUCHI
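The execute-or-not decision above combines an utterance-level confidence with per-phrase confidences. A minimal sketch, with made-up thresholds and a simple all-must-pass rule standing in for whatever combination the patent actually uses:

```python
def should_execute(speech_confidence, phrase_confidences,
                   speech_thresh=0.6, phrase_thresh=0.5):
    """Execute a spoken command only when the utterance-level confidence and
    every phrase-level confidence clear their (illustrative) thresholds."""
    return (speech_confidence >= speech_thresh
            and all(c >= phrase_thresh for c in phrase_confidences))
```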
  • Patent number: 7937269
    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
    Type: Grant
    Filed: August 22, 2005
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu Chandra Aggarwal, Philip Shilung Yu
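Micro-clustering as described above keeps constant-size summaries (counts and sums) that can absorb streamed points online. A rough sketch, with an invented absorb-radius rule in place of the patent's profile construction:

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class MicroCluster:
    """Incremental centroid: a constant-size summary of the points absorbed."""
    def __init__(self, point, label):
        self.n, self.sums, self.label = 1, list(point), label

    def centroid(self):
        return [s / self.n for s in self.sums]

    def absorb(self, point):
        self.n += 1
        self.sums = [s + x for s, x in zip(self.sums, point)]

def classify_stream(points, clusters, radius_sq):
    """Label each streamed point by its nearest micro-cluster and, when close
    enough, fold the point into that cluster (online model update)."""
    labels = []
    for p in points:
        best = min(clusters, key=lambda c: sq_dist(c.centroid(), p))
        labels.append(best.label)
        if sq_dist(best.centroid(), p) <= radius_sq:
            best.absorb(p)
    return labels

clusters = [MicroCluster([0.0, 0.0], "quiet"), MicroCluster([10.0, 10.0], "loud")]
stream = [[0.5, 0.2], [9.8, 10.1], [0.1, 0.4]]
labels = classify_stream(stream, clusters, radius_sq=4.0)
```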
  • Publication number: 20110093268
    Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains are examined. One of a significant word change or a significant word class change within the plurality of utterances is determined. A first cluster of utterances including a word or a word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances. A second cluster of utterances not including the word or the word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances.
    Type: Application
    Filed: September 14, 2010
    Publication date: April 21, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
  • Patent number: 7864987
    Abstract: An access system in one embodiment first determines that someone has correct credentials by using a non-biometric authentication method such as typing in a password, presenting a Smart card containing a cryptographic secret, or having a valid digital signature. Once the credentials are authenticated, the user must take at least two biometric tests, which can be chosen randomly. In one approach, the biometric tests need only check a template generated from the user who desires access against the stored templates matching the holder of the credentials authenticated by the non-biometric test. Access will desirably be allowed when both biometric tests are passed.
    Type: Grant
    Filed: April 18, 2006
    Date of Patent: January 4, 2011
    Assignee: Infosys Technologies Ltd.
    Inventors: Kumar Balepur Venkatanna, Rajat Moona, S V Subrahmanya
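The two-stage flow above (credential check, then two randomly chosen biometric tests) can be sketched as follows; the template representation, the distance-based `match_fn`, and the password-hash step are illustrative assumptions, not the patent's mechanisms.

```python
import hashlib
import hmac
import random

def pw_hash(password):
    # Illustrative credential check; real systems use salted, slow hashes.
    return hashlib.sha256(password.encode()).hexdigest()

def grant_access(password, stored_pw_hash, presented, enrolled, match_fn, rng=None):
    """Step 1: non-biometric credential check. Step 2: two randomly chosen
    biometric tests, both of which must match the credential holder's
    enrolled templates."""
    if not hmac.compare_digest(pw_hash(password), stored_pw_hash):
        return False
    rng = rng or random.Random()
    tests = rng.sample(sorted(enrolled), 2)     # e.g. voice + fingerprint
    return all(match_fn(presented[t], enrolled[t]) for t in tests)

enrolled = {"voice": [1.0, 2.0], "finger": [3.0, 4.0], "iris": [5.0, 6.0]}
presented = {"voice": [1.1, 2.0], "finger": [3.0, 4.1], "iris": [5.0, 6.0]}
close = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) < 0.1
ok = grant_access("s3cret", pw_hash("s3cret"), presented, enrolled,
                  close, rng=random.Random(0))
```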
  • Patent number: 7853450
    Abstract: A method of transmitting digital voice information comprises encoding raw speech into encoded digital speech data. The beginning and end of individual phonemes within the encoded digital speech data are marked. The encoded digital speech data is formed into packets. The packets are fed into a speech decoding mechanism.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: December 14, 2010
    Assignee: Alcatel-Lucent USA Inc.
    Inventor: Bryan Kadel
  • Publication number: 20100312559
    Abstract: A method of playing pictures comprises the steps of: receiving (11) a voice message; extracting (12) a key feature from the voice message; selecting (13) pictures by matching the key feature with pre-stored picture information; generating (14) a picture-voice sequence by integrating the selected pictures and the voice message; and playing (15) the picture-voice sequence. An electronic apparatus comprises a processing unit for implementing the different steps of the method.
    Type: Application
    Filed: December 11, 2008
    Publication date: December 9, 2010
    Applicant: Koninklijke Philips Electronics N.V.
    Inventors: Sheng Jin, Xin Chen, Yang Peng, Ningjiang Chen, Yunji Xia
  • Patent number: 7818170
    Abstract: A method for distributed voice searching may include receiving a search query from a user of the mobile communication device, generating a lattice of coarse linguistic representations from speech parts in the search query, extracting query features from the generated lattice of coarse linguistic representations, generating coarse search feature vectors based on the extracted query features, performing a coarse search using the generated coarse search feature vectors and transmitting the generated coarse search feature vectors to a remote voice search processing unit, receiving remote resultant web indices from the remote voice search processing unit, generating a lattice of fine linguistic representations from speech parts in the search query, generating fine search feature vectors from the lattice of fine linguistic representations, performing a fine search using the coarse search results, the remote resultant web indices and the generated fine search feature vectors, and displaying the fine search results t …
    Type: Grant
    Filed: April 10, 2007
    Date of Patent: October 19, 2010
    Assignee: Motorola, Inc.
    Inventor: Yan Ming Cheng
  • Patent number: 7813927
    Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: October 12, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
  • Patent number: 7792269
    Abstract: The present invention enables information about a service-impacting network event to be collected from network operations and automatically conveyed to a Media Server that plays a network announcement to callers into a network customer service center. The announcement can be played as an option on an Interactive Voice Response (IVR) menu and informs the caller of known service issues that are being addressed and estimates of when service should return to normal.
    Type: Grant
    Filed: December 30, 2004
    Date of Patent: September 7, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Marian Croak, Hossein Eslambolchi
  • Publication number: 20100211391
    Abstract: Speech processing is disclosed for an apparatus having a main processing unit, a memory unit, and one or more co-processors. Memory maintenance and voice recognition result retrievals upon execution are performed with a first main processor thread. Voice detection and initial feature extraction on the raw data are performed with a first co-processor. A second co-processor thread receives feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model and computes a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data. At least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit is computed with a third co-processor thread.
    Type: Application
    Filed: February 2, 2010
    Publication date: August 19, 2010
    Applicant: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
  • Publication number: 20100211387
    Abstract: Computer implemented speech processing is disclosed. First and second voice segments are extracted from first and second microphone signals originating from first and second microphones. The first and second voice segments correspond to a voice sound originating from a common source. An estimated source location is generated based on a relative energy of the first and second voice segments and/or a correlation of the first and second voice segments. A determination whether the voice segment is desired or undesired may be made based on the estimated source location.
    Type: Application
    Filed: February 2, 2010
    Publication date: August 19, 2010
    Applicant: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
  • Patent number: 7778832
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: August 17, 2010
    Assignee: American Express Travel Related Services Company, Inc.
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
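The one-to-many comparison above is, at its core, a similarity search over enrolled voice prints. A minimal sketch, assuming voice prints are fixed-length feature vectors, cosine similarity as the comparison, and an invented threshold:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_voice_print(query, known_prints, threshold=0.9):
    """One-to-many comparison: return ids of enrolled prints whose similarity
    to the caller's segmented print clears the (illustrative) threshold."""
    return [pid for pid, vp in known_prints.items()
            if cosine(query, vp) >= threshold]

known = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
matches = match_voice_print([0.95, 0.05, 0.0], known)
```

Any returned ids would then feed the downstream decision, e.g. whether to authorize the requested transaction.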
  • Patent number: 7769583
    Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.
    Type: Grant
    Filed: May 13, 2006
    Date of Patent: August 3, 2010
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
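The per-dimension bit-allocation idea above can be illustrated with the classic greedy scheme: each extra bit goes to the dimension with the largest residual variance and cuts its quantization noise by roughly 6 dB (a factor of 4). This is a textbook simplification, not the patent's log-likelihood-ratio criterion.

```python
def allocate_bits(variances, total_bits):
    """Greedy bit allocation across feature-vector dimensions."""
    alloc = [0] * len(variances)
    residual = list(variances)
    for _ in range(total_bits):
        i = max(range(len(residual)), key=residual.__getitem__)
        alloc[i] += 1
        residual[i] /= 4.0      # one extra bit ~ 6 dB less quantisation noise
    return alloc

bits = allocate_bits([16.0, 4.0, 1.0], total_bits=4)
```

High-variance dimensions soak up most of the budget, which is exactly the compression-versus-classification trade the abstract describes.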
  • Patent number: 7747435
    Abstract: The aim is to easily retrieve a speaker of encoded speech data recorded in a semiconductor storage device in an IC recorder. An information receiving unit 10 in a speaker retrieval apparatus 1 reads out the encoded speech data recorded in a semiconductor storage device 107 in an IC recorder 100. A speech decoding unit 12 decodes the encoded speech data. A speaker frequency detection unit 13 discriminates the speaker based on a feature of the decoded speech waveform to find the frequency of conversation (frequency of occurrence) of the speaker in a preset time interval. A speaker frequency graph displaying unit 14 displays the speaker frequency on a picture as a two-dimensional graph having time and frequency as its two axes.
    Type: Grant
    Filed: March 15, 2008
    Date of Patent: June 29, 2010
    Assignee: Sony Corporation
    Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
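The data behind the time-versus-frequency graph above is a per-interval speaker count. A small sketch, assuming speaker discrimination has already produced (time, speaker) segments:

```python
from collections import Counter

def speaker_frequency(segments, interval_s):
    """Bucket (time_s, speaker_id) speech segments into fixed intervals and
    count how often each speaker talks in each -- the table a
    time-vs-frequency speaker graph would plot."""
    buckets = {}
    for t, speaker in segments:
        b = int(t // interval_s)
        buckets.setdefault(b, Counter())[speaker] += 1
    return buckets

segments = [(0.5, "A"), (3.2, "A"), (7.9, "B"), (12.4, "B")]
freq = speaker_frequency(segments, interval_s=10.0)
```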
  • Publication number: 20100106495
    Abstract: A voice recognition system comprises: a voice input unit that receives an input signal from a voice input element and outputs it; a voice detection unit that detects an utterance segment in the input signal; a voice recognition unit that performs voice recognition for the utterance segment; and a control unit that outputs a control signal to at least one of the voice input unit and the voice detection unit and suppresses a detection frequency if the detection frequency satisfies a predetermined condition.
    Type: Application
    Filed: February 27, 2008
    Publication date: April 29, 2010
    Applicant: NEC Corporation
    Inventor: Toru Iwasawa
  • Publication number: 20100063905
    Abstract: Method and system for carrying out all the transactions performable at an ATM, with the exception of provision of cash, by means of a mobile telecommunications device, specifically a third generation (3G) mobile telephone and the UMTS network. The system comprises the modules (2, 4, 5) necessary for securely carrying out said transactions, by means of access to the ATM by using a virtual emulation thereof, and outside existing networks and protocols now used for these types of transactions.
    Type: Application
    Filed: February 16, 2007
    Publication date: March 11, 2010
    Applicant: NILUTESA, S.L.
    Inventor: Nicolás Luca De Tena Sainz
  • Publication number: 20100057457
    Abstract: An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9 and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word.
    Type: Application
    Filed: November 30, 2007
    Publication date: March 4, 2010
    Applicant: National Institute of Advanced Industrial Science and Technology
    Inventors: Jun Ogata, Masataka Goto
  • Patent number: 7660716
    Abstract: The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold.
    Type: Grant
    Filed: October 3, 2007
    Date of Patent: February 9, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
  • Patent number: 7653540
    Abstract: The present invention provides a speech signal compression device which allows the storage capacity of data representing speech to be efficiently compressed. In the present invention, a computer C1 divides speech data to be compressed into speech data for each phoneme on the basis of phoneme labeling data, unifies the time length of a unit pitch section of each of the divided speech data into the same value, thereby creating a pitch waveform signal, and creates sub-band data representing the variation in time of the spectrum components of the pitch waveform signal. This sub-band data is compressed so as to match a condition designated by a compression table, and the compressed data is further entropy-encoded to output entropy-coded data.
    Type: Grant
    Filed: March 26, 2004
    Date of Patent: January 26, 2010
    Assignee: Kabushiki Kaisha Kenwood
    Inventor: Yasushi Sato
  • Publication number: 20100010813
    Abstract: A voice recognition apparatus includes an extraction unit extracting a feature amount from a voice signal; a word dictionary storing a plurality of recognition words; a reject word generation unit storing reject words in the word dictionary in association with the recognition words; and a collation unit calculating a degree of similarity between the voice signal and each of the recognition words and reject words stored in the word dictionary by using the feature amount extracted by the extraction unit, determining whether or not a word having a high calculated degree of similarity corresponds to a reject word, when the word is determined as the reject word, excluding the recognition word stored in the word dictionary in association with the reject word from the result of recognition, and outputting a recognition word having a high calculated degree of similarity as a result of recognition.
    Type: Application
    Filed: April 30, 2009
    Publication date: January 14, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Shouji HARADA
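The reject-word mechanism above can be sketched over precomputed similarity scores; the score dictionary and the `#reject` naming below are invented for illustration.

```python
def recognise_with_rejects(scores, reject_to_word):
    """If the highest-scoring entry is a reject word, exclude the recognition
    word it is paired with, then return the best remaining recognition word."""
    best = max(scores, key=scores.get)
    excluded = {reject_to_word[best]} if best in reject_to_word else set()
    candidates = {w: s for w, s in scores.items()
                  if w not in reject_to_word and w not in excluded}
    return max(candidates, key=candidates.get) if candidates else None

# The reject word outscores "Kyoto", so "Kyoto" is excluded from the result.
scores = {"Kyoto": 0.80, "Kyoto#reject": 0.90, "Tokyo": 0.70}
result = recognise_with_rejects(scores, {"Kyoto#reject": "Kyoto"})
```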
  • Patent number: 7630891
    Abstract: The present invention relates to a voice region detection apparatus and method capable of accurately detecting a voice region even in a voice signal with color noise. The voice region detection method comprises the steps of, if a voice signal is input, dividing the input voice signal into frames; performing whitening of surrounding noise by combining white noise with the frames; extracting random parameters indicating randomness of frames from the frames subjected to the whitening; classifying the frames into voice frames and noise frames based on the extracted random parameters; and detecting a voice region by calculating start and end positions of a voice based on the voice and noise frames. According to the present invention, the voice region can be accurately detected even in a voice signal with a large amount of color noise mixed therewith.
    Type: Grant
    Filed: November 26, 2003
    Date of Patent: December 8, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Kwang-cheol Oh, Yong-beom Lee
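The classify-then-bound step of the method above can be sketched with a crude randomness parameter; the whitening stage is omitted here, and the zero-crossing rate below is a stand-in assumption for the patent's random parameters, not its actual measure.

```python
def zero_crossing_rate(frame):
    """Sign-change rate: noise frames change sign far more often than
    voiced frames, so a high rate suggests a noise frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (len(frame) - 1)

def detect_voice_region(frames, random_thresh=0.5):
    """Classify frames as voice/noise by randomness, then return the
    (start, end) frame indices of the voice region, or None."""
    voiced = [zero_crossing_rate(f) < random_thresh for f in frames]
    if True not in voiced:
        return None
    start = voiced.index(True)
    end = len(voiced) - 1 - voiced[::-1].index(True)
    return start, end

noise = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # alternating: rate 1.0
voice = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0]    # slow swings: rate ~0.29
region = detect_voice_region([noise, voice, voice, noise])
```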
  • Patent number: 7627472
    Abstract: A method and a system for person/speaker verification via different communications systems. The system may include a control logic (SL) having access to a voice-controlled dialog system (DS) having verification dialogs stored for querying, a biometrics customer profile (BK), in which the personal biometric data of customers are stored and a provider database (PD), which contains information regarding protected database areas and services in conjunction with the biometric methods authorized for verification. The method for person/speaker verification is designed to ascertain, transmit, analyze and evaluate via telecommunications systems the different personal biometric data that are suitable to establish unequivocally the access authorization of a customer.
    Type: Grant
    Filed: March 18, 2005
    Date of Patent: December 1, 2009
    Assignee: Deutsche Telekom AG
    Inventors: Marian Trinkel, Christel Mueller, Fred Runge
  • Patent number: 7624012
    Abstract: The invention enables to generate a general function (4) which can operate on an input signal (Sx) to extract from the latter a value (DVex) of a global characteristic value expressing a feature (De) of the information conveyed by that signal. It operates by: generating at least one compound function (CF1-CFn), said compound function being generated from at least one of a set of elementary functions (EF1, EF2, . . .
    Type: Grant
    Filed: December 16, 2003
    Date of Patent: November 24, 2009
    Assignee: Sony France S.A.
    Inventors: François Pachet, Aymeric Zils
  • Publication number: 20090271198
    Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a first frame of the signal, the first frame comprising a voiced frame. One or more cords can be extracted from the voiced frame based on occurrence of one or more events within the frame. For example, the one or more events can comprise one or more glottal pulses. The one or more cords can collectively comprise less than all of the frame. For example, each of the cords can begin with onset of a glottal pulse and extend to a point prior to an onset of neighboring glottal pulse but may exclude a portion of the frame prior to the onset of the neighboring glottal pulse. A phoneme for the voiced frame can be determined based on at least one of the extracted cords.
    Type: Application
    Filed: October 23, 2008
    Publication date: October 29, 2009
    Applicant: Red Shift Company, LLC
    Inventors: Joel K. Nyquist, Erik N. Reckase, Matthew D. Robinson, John F. Remillard
  • Publication number: 20090271197
    Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a region of the signal representing speech. The region can comprise a portion of a frame of the signal representing speech classified as a voiced frame. The region can be marked based on one or more pitch estimates for the region. A cord can be identified within the region based on occurrence of one or more events within the region of the signal. For example, the one or more events can comprise one or more glottal pulses. In such cases, cord can begin with onset of a first glottal pulse and extend to a point prior to an onset of a second glottal pulse. The cord may exclude a portion of the region of the signal prior to the onset of the second glottal pulse.
    Type: Application
    Filed: October 23, 2008
    Publication date: October 29, 2009
    Applicant: Red Shift Company, LLC
    Inventors: Joel K. Nyquist, Erik N. Reckase, Matthew D. Robinson, John F. Remillard
  • Publication number: 20090265162
    Abstract: A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.
    Type: Application
    Filed: June 30, 2009
    Publication date: October 22, 2009
    Inventors: Tony Ezzat, Evandro B. Gouvea
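    The minimal-cost partitioning described in the abstract above is naturally expressed as a dynamic program over all split points of the word. The sketch below is illustrative only; the toy cost function (cheap known particles, per-character penalty otherwise) is an assumption, not the cost defined in the publication.

    ```python
    def best_partition(word, cost):
        """Return the partitioning of `word` into particles with minimal
        total cost, where cost(p) prices a single particle p."""
        n = len(word)
        # best[i] = (minimal cost, particle list) for the prefix word[:i]
        best = [(0.0, [])] + [(float('inf'), None)] * n
        for i in range(1, n + 1):
            for j in range(i):
                c = best[j][0] + cost(word[j:i])
                if c < best[i][0]:
                    best[i] = (c, best[j][1] + [word[j:i]])
        return best[n][1]

    # Toy cost: particles already in the inventory are cheap,
    # new particles are priced per character.
    known = {'un', 'lock', 'ed'}
    parts = best_partition('unlocked', lambda p: 1.0 if p in known else 3.0 * len(p))
    # → ['un', 'lock', 'ed']
    ```

    The quadratic loop enumerates every possible partitioning implicitly, so the returned split is globally optimal for the given cost.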
  • Patent number: 7599836
    Abstract: To provide a method of specifying each of the speakers of individual voices, based on recorded voices made by a plurality of speakers, with a simple system configuration, and to provide a system using the method. The system includes: microphones individually provided for each of the speakers; a voice processing unit which gives a unique characteristic to each pair of two-channel voice signals recorded with each of the microphones, by executing different kinds of voice processing on the respective pairs of voice signals, and which mixes the voice signals for each channel; and an analysis unit which performs an analysis according to the unique characteristics, given to the voice signals concerning the respective microphones through the processing by the voice processing unit, and which specifies the speaker for each speech segment of the voice signals.
    Type: Grant
    Filed: May 25, 2005
    Date of Patent: October 6, 2009
    Assignee: Nuance Communications, Inc.
    Inventors: Osamu Ichikawa, Masafumi Nishimura, Tetsuya Takiguchi
  • Patent number: 7590537
    Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers while analyzing both the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model is found for which the model variation between a test speaker ML model and the speaker group ML model to which the test speaker belongs is most similar to a training speaker group model variation, and speaker adaptation is performed on the found model. Herein, the model variations in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm of MLLR and MAP.
    Type: Grant
    Filed: December 27, 2004
    Date of Patent: September 15, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
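    Comparing model variations by both magnitude ("quantity") and direction, as the abstract above describes, can be sketched as below. Representing a model variation as a flat mean-difference vector and combining the two terms with a fixed weight `w` are assumptions made for illustration; they are not taken from the patent.

    ```python
    import math

    def variation_similarity(v1, v2, w=0.5):
        """Score the similarity of two model-variation vectors by combining
        a quantity term (ratio of vector norms) with a directional term
        (cosine similarity)."""
        n1 = math.sqrt(sum(x * x for x in v1))
        n2 = math.sqrt(sum(x * x for x in v2))
        quantity = min(n1, n2) / max(n1, n2)  # 1.0 when magnitudes match
        direction = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)  # cosine
        return w * quantity + (1 - w) * direction

    # Identical variations score 1.0; an opposite-direction variation of the
    # same magnitude is penalized by the directional term.
    same = variation_similarity([1.0, 2.0], [1.0, 2.0])      # → 1.0
    flipped = variation_similarity([1.0, 2.0], [-1.0, -2.0])  # → 0.0
    ```

    A speaker would then be assigned to the group whose training variation maximizes this score.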
  • Publication number: 20090135177
    Abstract: Systems and methods are disclosed for performing voice personalization of video content. The personalized media content may include a composition of a background scene having a character, head model data representing an individualized three-dimensional (3D) head model of a user, audio data simulating the user's voice, and a viseme track containing instructions for causing the individualized 3D head model to lip sync the words contained in the audio data. The audio data simulating the user's voice can be generated using a voice transformation process. In certain examples, the audio data is based on a text input or selected by the user (e.g., via a telephone or computer) or a textual dialogue of a background character.
    Type: Application
    Filed: November 19, 2008
    Publication date: May 28, 2009
    Applicant: BIG STAGE ENTERTAINMENT, INC.
    Inventors: Jonathan Isaac Strietzel, Jon Hayes Snoddy, Douglas Alexander Fidaleo
  • Patent number: 7529669
    Abstract: A voice based multimodal speaker authentication method and telecommunications application thereof employing a speaker adaptive method for training phoneme-specific Gaussian mixture models. Applied to telecommunications services, the method may advantageously be implemented in contemporary wireless terminals.
    Type: Grant
    Filed: June 13, 2007
    Date of Patent: May 5, 2009
    Assignee: NEC Laboratories America, Inc.
    Inventors: Srivaths Ravi, Anand Raghunathan, Srimat Chakradhar, Karthik Nandakumar
  • Patent number: 7496512
    Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each adjacent pair of phoneme speech units includes a segmental boundary. A refining model is trained for each cluster and used to refine boundaries of contextual phoneme speech units forming the clusters.
    Type: Grant
    Filed: April 13, 2004
    Date of Patent: February 24, 2009
    Assignee: Microsoft Corporation
    Inventors: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
  • Patent number: 7496510
    Abstract: Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. Only the utterances of a particular predetermined speaker are transcribed thus providing an index and a summary of the underlying dialogue(s). In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed and only the indexed segments are transcribed into spoken or written language. In a second scenario, two or more speakers located in one room are using a multi-user speech recognition system (SRS). For each user there exists a different speaker model and optionally a different dictionary or vocabulary of words already known or trained by the speech or voice recognition system.
    Type: Grant
    Filed: November 30, 2001
    Date of Patent: February 24, 2009
    Assignee: International Business Machines Corporation
    Inventors: Joachim Frank, Werner Kriechbaum, Gerhard Stenzel
  • Publication number: 20090048836
    Abstract: Portions from segment boundary regions of a plurality of speech segments are extracted. Each segment boundary region is based on a corresponding initial unit boundary. Feature vectors that represent the portions in a vector space are created. For each of a plurality of potential unit boundaries within each segment boundary region, an average discontinuity based on distances between the feature vectors is determined. For each segment, the potential unit boundary associated with a minimum average discontinuity is selected as a new unit boundary.
    Type: Application
    Filed: July 28, 2008
    Publication date: February 19, 2009
    Inventor: Jerome R. Bellegarda
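    The boundary-selection step in the abstract above (pick the potential unit boundary with minimal average discontinuity between feature vectors) can be sketched as follows. Using plain Euclidean distance between the feature vectors is an illustrative assumption; the publication does not commit to a specific metric here.

    ```python
    def refine_boundary(candidates, neighbor_vectors):
        """For each candidate boundary b, neighbor_vectors[b] holds the
        feature vectors of the portions adjacent to b.  Return the boundary
        whose average pairwise distance (discontinuity) is smallest."""
        def avg_discontinuity(vecs):
            pairs = [(vecs[i], vecs[j])
                     for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
            dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
            return sum(dist(a, b) for a, b in pairs) / len(pairs)
        return min(candidates, key=lambda b: avg_discontinuity(neighbor_vectors[b]))

    # Candidate 12 sits where the left/right feature vectors nearly agree,
    # so it is chosen over candidate 10.
    vecs = {10: [[0.0, 0.0], [3.0, 4.0]],   # discontinuity 5.0
            12: [[1.0, 1.0], [1.0, 1.2]]}   # discontinuity 0.2
    best = refine_boundary([10, 12], vecs)  # → 12
    ```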
  • Patent number: 7480616
    Abstract: Information relating to an amount of muscle activity is extracted from a myo-electrical signal by activity amount information extraction means, and information recognition is performed by activity amount information recognition means using the information relating to the amount of muscle activity of a speaker. There is a prescribed correspondence relationship between the amount of muscle activity of a speaker and a phoneme uttered by a speaker, so the content of an utterance can be recognized with a high recognition rate by information recognition using information relating to an amount of muscle activity.
    Type: Grant
    Filed: February 27, 2003
    Date of Patent: January 20, 2009
    Assignee: NTT DoCoMo, Inc.
    Inventors: Hiroyuki Manabe, Akira Hiraiwa, Toshiaki Sugimura
  • Patent number: 7475016
    Abstract: A system, method, and apparatus for identifying problematic speech segments is provided. The system includes a clustering module for generating a first cluster of one or more consecutive speech segments if the consecutive speech segments satisfy a predetermined filtering test, and for generating a second cluster comprising at least one different consecutive speech segment selected from the ordered sequence if the at least one different consecutive speech segment satisfies the predetermined filtering test. The system also includes a combining module for combining the first and second clusters as well as the at least one intervening consecutive speech segment to form an aggregated cluster if the aggregated cluster satisfies a predetermined combining criterion. The system can further include a ranking module for ranking aggregated clusters, the ranking reflecting a relative severity of misalignments among problematic speech segments.
    Type: Grant
    Filed: December 15, 2004
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Maria E. Smith, Jie Z. Zeng
  • Patent number: 7472062
    Abstract: Methods and arrangements for facilitating data clustering. From a set of input data, a predetermined number of non-overlapping subsets are created. The input data is split recursively to create the subsets.
    Type: Grant
    Filed: January 4, 2002
    Date of Patent: December 30, 2008
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
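    The recursive splitting described in the abstract above can be sketched with a simple halving scheme. The proportional in-order split used here is an assumption for illustration; the patent's actual splitting criterion (how each recursion divides the data) is not specified in the abstract.

    ```python
    def recursive_split(data, k):
        """Split `data` recursively into k non-overlapping subsets: each
        level divides the remaining subset budget roughly in half and cuts
        the data proportionally, until single subsets remain."""
        if k <= 1:
            return [list(data)]
        left_k = k // 2
        cut = len(data) * left_k // k  # proportional split point
        return recursive_split(data[:cut], left_k) + recursive_split(data[cut:], k - left_k)

    subsets = recursive_split(list(range(8)), 4)
    # → [[0, 1], [2, 3], [4, 5], [6, 7]]
    ```

    Concatenating the subsets recovers the input exactly once, so the subsets are non-overlapping and exhaustive, matching the abstract.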
  • Publication number: 20080312926
    Abstract: An automatic dual-step, text independent, language-independent speaker voice-print creation and speaker recognition method, wherein a neural network-based technique is used in a first step and a Markov model-based technique is used in a second step. In particular, the first step uses a neural network-based technique for decoding the content of what is uttered by the speaker in terms of language-independent acoustic-phonetic classes, while the second step uses the sequence of language-independent acoustic-phonetic classes from the first step and employs a Markov model-based technique for creating the speaker voice-print and for recognizing the speaker. The combination of the two steps enables improvement in the accuracy and efficiency of the speaker voice-print creation and of the speaker recognition, without setting any constraints on the lexical content of the speaker utterance and on the language thereof.
    Type: Application
    Filed: May 24, 2005
    Publication date: December 18, 2008
    Inventors: Claudio Vair, Daniele Colibro, Luciano Fissore
  • Patent number: 7464034
    Abstract: A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. The apparatus includes a storage section, an analyzing section including a characteristic analyzer, a producing section, a synthesizing section, a memory, an alignment processor, and a target decoder.
    Type: Grant
    Filed: September 27, 2004
    Date of Patent: December 9, 2008
    Assignees: Yamaha Corporation, Pompeu Fabra University
    Inventors: Takahiro Kawashima, Yasuo Yoshioka, Pedro Cano, Alex Loscos, Xavier Serra, Mark Schiementz, Jordi Bonada
  • Patent number: 7451082
    Abstract: A method and detector provide noise-resistant utterance detection by extracting a noise estimate (15) to augment the signal-to-noise ratio of the speech signal, inverse filtering (17) the speech signal to focus on its periodic excitation part, and spectral reshaping (19) to accentuate the separation between formants.
    Type: Grant
    Filed: August 27, 2003
    Date of Patent: November 11, 2008
    Assignee: Texas Instruments Incorporated
    Inventors: Yifan Gong, Alexis P. Bernard
  • Patent number: 7447633
    Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
    Type: Grant
    Filed: November 22, 2004
    Date of Patent: November 4, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
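    The Gaussian pooling and weight normalization described in the abstract above can be sketched briefly. The `(weight, mean, var)` tuple representation of a mixture component is an illustrative simplification (real HMM states use multivariate Gaussians), not the apparatus's actual data layout.

    ```python
    def pool_states_to_gmm(states):
        """Pool the Gaussian components of several HMM states into a single
        GMM, renormalizing the mixture weights so they sum to 1 across all
        pooled states.  Each state is a list of (weight, mean, var)
        components whose weights sum to 1 within that state."""
        pooled = [comp for state in states for comp in state]
        total = sum(w for w, _, _ in pooled)
        return [(w / total, m, v) for w, m, v in pooled]

    # Two states: weights 0.6, 0.4, 1.0 renormalize to 0.3, 0.2, 0.5
    state_a = [(0.6, 0.0, 1.0), (0.4, 1.0, 1.0)]
    state_b = [(1.0, 5.0, 2.0)]
    gmm = pool_states_to_gmm([state_a, state_b])
    ```

    The pooled GMM discards the HMM's state sequence, which is what makes the text-dependent models usable in a text-independent mode.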