Subportions Patents (Class 704/249)
-
Patent number: 8010361
Abstract: In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user.
Type: Grant
Filed: July 30, 2008
Date of Patent: August 30, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
-
Publication number: 20110208525
Abstract: A voice recognizing apparatus includes a voice start instructing section 3 for instructing to start voice recognition; a voice input section 1 for receiving uttered voice and converting it to a voice signal; a voice recognizing section 2 for recognizing the voice on the basis of the voice signal; an utterance start time detecting section 4 for detecting the duration from the time when the voice start instructing section instructs to the time when the voice input section delivers the voice signal; an utterance timing deciding section 5 for deciding utterance timing, indicating whether the utterance start is quick or slow, by comparing the duration detected by the utterance start time detecting section with a prescribed threshold; an interaction control section 6 for determining a content, which is to be shown when exhibiting a recognition result of the voice recognizing section, in accordance with the utterance timing decided; a system response generating section 7 for generating a system response on the basis of the…
Type: Application
Filed: March 27, 2008
Publication date: August 25, 2011
Inventors: Yuzuru Inoue, Tadashi Suzuki, Fumitaka Sato, Takayoshi Chikuri
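The timing decision described in this abstract reduces to a threshold comparison on the measured delay between the start instruction and the detected utterance. A minimal Python sketch; the threshold value and the response wording are illustrative assumptions, not details from the publication:

```python
# Hypothetical threshold separating "quick" (practiced) from "slow" (hesitant) users.
QUICK_SLOW_THRESHOLD_MS = 1500

def decide_utterance_timing(start_instructed_ms: float, voice_detected_ms: float) -> str:
    """Classify the utterance start as 'quick' or 'slow' from the measured delay."""
    duration = voice_detected_ms - start_instructed_ms
    return "quick" if duration < QUICK_SLOW_THRESHOLD_MS else "slow"

def choose_response_content(timing: str, recognized_text: str) -> str:
    """Pick a terser confirmation for quick (experienced) users."""
    if timing == "quick":
        return recognized_text  # terse echo of the recognition result
    return f"I heard: {recognized_text}. Is that correct?"  # fuller prompt
```

The idea matches the interaction control section: the same recognition result is exhibited differently depending on the decided timing.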
-
Patent number: 8000963
Abstract: The sound reproducing apparatus includes a replay section receiving unit that receives a request for reproducing a specific part of a sound file from a user, a replay section determining unit that determines a replay section based on the request and conversation structure information stored in a sound data holding unit, and a reproducing unit that reproduces the replay section determined by the replay section determining unit.
Type: Grant
Filed: March 25, 2005
Date of Patent: August 16, 2011
Assignee: Fujitsu Limited
Inventors: Sachiko Onodera, Ryo Ochitani
-
Patent number: 7996221
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for processing a message received from a user to determine whether an estimate of intelligibility is below an intelligibility threshold. The method includes recognizing a portion of a user's message that contains the one or more expected utterances from a critical information list, calculating an estimate of intelligibility for the recognized portion of the user's message that contains the one or more expected utterances, and prompting the user to repeat at least the recognized portion of the user's message if the calculated estimate of intelligibility for the recognized portion of the user's message is below an intelligibility threshold. In one aspect, the method further includes prompting the user to repeat at least a portion of the message if any of a measured speech level and a measured signal-to-noise ratio of the user's message are determined to be below their respective thresholds.
Type: Grant
Filed: December 22, 2009
Date of Patent: August 9, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
-
Publication number: 20110184736
Abstract: Automated methods are provided for recognizing inputted information items and selecting information items. The recognition and selection processes are performed by selecting category designations that the information items belong to. The category designations improve the accuracy and speed of the inputting and selection processes.
Type: Application
Filed: January 25, 2011
Publication date: July 28, 2011
Inventor: Benjamin SLOTZNICK
-
Publication number: 20110131045
Abstract: Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command.
Type: Application
Filed: February 2, 2011
Publication date: June 2, 2011
Applicant: VoiceBox Technologies, Inc.
Inventors: Philippe Di Cristo, Min Ke, Robert A. Kennewick, Lynn Elise Armstrong
-
Publication number: 20110119052
Abstract: A device extracts prosodic information, including a power value, from speech data, and extracts from the speech data an utterance section including a period with a power value equal to or larger than a threshold; divides the utterance section into sections in which the power value is equal to or larger than another threshold; acquires phoneme sequence data for each divided speech data by phoneme recognition; generates clusters, each of which is a set of the classified phoneme sequence data, by clustering; calculates an evaluation value for each cluster; selects clusters for which the evaluation value is equal to or larger than a given value as candidate clusters; determines, for each candidate cluster, one of the phoneme sequence data constituting the cluster to be a representative phoneme sequence; and selects the divided speech data corresponding to the representative phoneme sequence as listening-target speech data.
Type: Application
Filed: November 5, 2010
Publication date: May 19, 2011
Applicant: Fujitsu Limited
Inventor: Sachiko Onodera
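The first step above, cutting the signal into sections where the power meets a threshold, can be sketched as follows. Frame-level power values are assumed to be already computed; the publication's actual thresholds are not specified here:

```python
def extract_power_sections(power, threshold):
    """Return (start, end) index pairs of contiguous runs where power >= threshold.

    `power` is a per-frame power sequence; `end` is exclusive, as in Python slices.
    """
    sections, start = [], None
    for i, p in enumerate(power):
        if p >= threshold and start is None:
            start = i                      # section begins
        elif p < threshold and start is not None:
            sections.append((start, i))    # section ends before frame i
            start = None
    if start is not None:                  # section runs to the end of the data
        sections.append((start, len(power)))
    return sections
```

The same routine serves both described uses: once with the first threshold to find the utterance section, then again with the second threshold to subdivide it.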
-
Publication number: 20110115702
Abstract: A method and system for computer programming using speech and one- or two-hand gesture input is described. The system generally uses a plurality of microphones and cameras as input devices. A configurable event recognition system is described, allowing various software objects in a system to respond to speech, hand gesture, and other input. From this input, program code is produced that can be compiled at any time. Various speech and hand gesture events invoke functions within programs to modify programs, move text and punctuation in a word processor, manipulate mathematical objects, perform data mining, perform natural language internet search, modify project management tasks and visualizations, perform 3D modeling, web page design and web page data entry, and television and DVR programming.
Type: Application
Filed: July 9, 2009
Publication date: May 19, 2011
Inventor: David Seaberg
-
Publication number: 20110112838
Abstract: A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
Type: Application
Filed: November 10, 2009
Publication date: May 12, 2011
Applicant: Research In Motion Limited
Inventor: Sasan Adibi
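A rough sketch of the final digitization step, assuming the harmonic amplitudes have already been extracted and normalized to the first harmonic. The 4-bit quantization width and the rounding scheme are illustrative assumptions, not details from the publication:

```python
def harmonics_to_voice_id(amplitudes, bits_per_harmonic=4):
    """Digitize harmonic amplitudes (relative to the first) into a bit string.

    amplitudes[0] is the reference (first) harmonic; each later harmonic's
    ratio to it is quantized to `bits_per_harmonic` bits and concatenated.
    """
    ref = amplitudes[0]
    levels = (1 << bits_per_harmonic) - 1   # e.g. 15 for 4 bits
    voice_id = ""
    for a in amplitudes[1:]:
        q = min(levels, round((a / ref) * levels))  # clamp to the top level
        voice_id += format(q, f"0{bits_per_harmonic}b")
    return voice_id
```

The resulting bit string would form (part of) the voice ID that is later compared for authentication.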
-
Publication number: 20110112830
Abstract: A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
Type: Application
Filed: November 10, 2009
Publication date: May 12, 2011
Applicant: Research In Motion Limited
Inventor: Sasan Adibi
-
Publication number: 20110112839
Abstract: A command recognition device includes: an utterance understanding unit that determines or selects word sequence information from speech information; a speech confidence degree calculating unit that calculates a degree of speech confidence based on the speech information and the word sequence information; a phrase confidence degree calculating unit that calculates a degree of phrase confidence based on image information and phrase information included in the word sequence information; and a motion control instructing unit that determines whether a command of the word sequence information should be executed based on the degree of speech confidence and the degree of phrase confidence.
Type: Application
Filed: September 2, 2010
Publication date: May 12, 2011
Applicant: HONDA MOTOR CO., LTD.
Inventors: Kotaro FUNAKOSHI, Mikio NAKANO, Xiang ZUO, Naoto IWAHASHI, Ryo TAGUCHI
-
Patent number: 7937269
Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
Type: Grant
Filed: August 22, 2005
Date of Patent: May 3, 2011
Assignee: International Business Machines Corporation
Inventors: Charu Chandra Aggarwal, Philip Shilung Yu
-
Publication number: 20110093268
Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains is examined. One of a significant word change or a significant word class change within the plurality of utterances is determined. A first cluster of utterances including a word or a word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances. A second cluster of utterances not including the word or the word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances.
Type: Application
Filed: September 14, 2010
Publication date: April 21, 2011
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
-
Patent number: 7864987
Abstract: An access system, in one embodiment, first determines that someone has correct credentials by using a non-biometric authentication method such as typing in a password, presenting a smart card containing a cryptographic secret, or having a valid digital signature. Once the credentials are authenticated, the user must take at least two biometric tests, which can be chosen randomly. In one approach, the biometric tests need only check a template generated from the user who desires access against the stored templates matching the holder of the credentials authenticated by the non-biometric test. Access desirably will be allowed when both biometric tests are passed.
Type: Grant
Filed: April 18, 2006
Date of Patent: January 4, 2011
Assignee: Infosys Technologies Ltd.
Inventors: Kumar Balepur Venkatanna, Rajat Moona, S V Subrahmanya
-
Patent number: 7853450
Abstract: A method of transmitting digital voice information comprises encoding raw speech into encoded digital speech data. The beginning and end of individual phonemes within the encoded digital speech data are marked. The encoded digital speech data is formed into packets. The packets are fed into a speech decoding mechanism.
Type: Grant
Filed: March 30, 2007
Date of Patent: December 14, 2010
Assignee: Alcatel-Lucent USA Inc.
Inventor: Bryan Kadel
-
Publication number: 20100312559
Abstract: A method of playing pictures comprises the steps of: receiving (11) a voice message; extracting (12) a key feature from the voice message; selecting (13) pictures by matching the key feature with pre-stored picture information; generating (14) a picture-voice sequence by integrating the selected pictures and the voice message; and playing (15) the picture-voice sequence. An electronic apparatus comprises a processing unit for implementing the different steps of the method.
Type: Application
Filed: December 11, 2008
Publication date: December 9, 2010
Applicant: Koninklijke Philips Electronics N.V.
Inventors: Sheng Jin, Xin Chen, Yang Peng, Ningjiang Chen, Yunji Xia
-
Patent number: 7818170
Abstract: A method for distributed voice searching may include receiving a search query from a user of the mobile communication device, generating a lattice of coarse linguistic representations from speech parts in the search query, extracting query features from the generated lattice of coarse linguistic representations, generating coarse search feature vectors based on the extracted query features, performing a coarse search using the generated coarse search feature vectors and transmitting the generated coarse search feature vectors to a remote voice search processing unit, receiving remote resultant web indices from the remote voice search processing unit, generating a lattice of fine linguistic representations from speech parts in the search query, generating fine search feature vectors from the lattice of fine linguistic representations, performing a fine search using the coarse search results, the remote resultant web indices and the generated fine search feature vectors, and displaying the fine search results…
Type: Grant
Filed: April 10, 2007
Date of Patent: October 19, 2010
Assignee: Motorola, Inc.
Inventor: Yan Ming Cheng
-
Patent number: 7813927
Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
Type: Grant
Filed: June 4, 2008
Date of Patent: October 12, 2010
Assignee: Nuance Communications, Inc.
Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
-
Patent number: 7792269
Abstract: The present invention enables information about a service-impacting network event to be collected from network operations and automatically conveyed to a Media Server that plays a network announcement to callers into a network customer service center. The announcement can be played as an option on an Interactive Voice Response (IVR) menu and informs the caller of known service issues that are being addressed and estimates of when service should return to normal.
Type: Grant
Filed: December 30, 2004
Date of Patent: September 7, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Marian Croak, Hossein Eslambolchi
-
Publication number: 20100211391
Abstract: Speech processing is disclosed for an apparatus having a main processing unit, a memory unit, and one or more co-processors. Memory maintenance and voice recognition result retrievals upon execution are performed with a first main processor thread. Voice detection and initial feature extraction on the raw data are performed with a first co-processor. A second co-processor thread receives feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model, and computes a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data. At least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit is computed with a third co-processor thread.
Type: Application
Filed: February 2, 2010
Publication date: August 19, 2010
Applicant: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
-
Publication number: 20100211387
Abstract: Computer implemented speech processing is disclosed. First and second voice segments are extracted from first and second microphone signals originating from first and second microphones. The first and second voice segments correspond to a voice sound originating from a common source. An estimated source location is generated based on a relative energy of the first and second voice segments and/or a correlation of the first and second voice segments. A determination whether the voice segment is desired or undesired may be made based on the estimated source location.
Type: Application
Filed: February 2, 2010
Publication date: August 19, 2010
Applicant: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
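The relative-energy and correlation cues mentioned in the abstract can be sketched crudely in Python. The lag window that counts as a "desired" source direction is an illustrative assumption; a real system would derive it from microphone spacing and sample rate:

```python
def segment_energy(seg):
    """Sum-of-squares energy of a voice segment."""
    return sum(x * x for x in seg)

def cross_correlation_lag(a, b, max_lag):
    """Lag (in samples) at which the two segments align best.

    The sign of the lag hints at which microphone the sound reached first,
    i.e. which side the source is on.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(a[i] * b[i - lag]
                  for i in range(max(0, lag), min(len(a), len(b) + lag)))
        if val > best_val:
            best_val, best_lag = val, lag
    return best_lag

def is_desired_source(seg1, seg2, max_lag=8, desired=(-2, 2)):
    """Accept the sound only if its estimated direction (via lag) is in range."""
    lag = cross_correlation_lag(seg1, seg2, max_lag)
    return desired[0] <= lag <= desired[1]
```

Energy ratio and correlation lag are the two cues the abstract names; either alone, or both combined, could drive the desired/undesired decision.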
-
Patent number: 7778832
Abstract: One-to-many comparisons of callers' voice prints with known voice prints identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
Type: Grant
Filed: September 26, 2007
Date of Patent: August 17, 2010
Assignee: American Express Travel Related Services Company, Inc.
Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
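The server-side one-to-many comparison can be illustrated with a simple similarity search. Cosine similarity over feature vectors and the 0.85 threshold are stand-in assumptions; the patent does not specify the matching metric:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two voice-print feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_matches(query_print, known_prints, threshold=0.85):
    """Return ids of known voice prints similar enough to the query print.

    known_prints: dict mapping person id -> feature vector.
    """
    return [pid for pid, vec in known_prints.items()
            if cosine_similarity(query_print, vec) >= threshold]
```

Any returned ids would then feed the downstream decision, such as whether to authorize the requested transaction.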
-
Patent number: 7769583
Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.
Type: Grant
Filed: May 13, 2006
Date of Patent: August 3, 2010
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
-
Patent number: 7747435
Abstract: The aim is to easily retrieve a speaker of encoded speech data recorded in a semiconductor storage device in an IC recorder. An information receiving unit 10 in a speaker retrieval apparatus 1 reads out the encoded speech data recorded in a semiconductor storage device 107 in an IC recorder 100. A speech decoding unit 12 decodes the encoded speech data. A speaker frequency detection unit 13 discriminates the speaker based on a feature of the decoded speech waveform to find the frequency of conversation (frequency of occurrence) of the speaker in a preset time interval. A speaker frequency graph displaying unit 14 displays the speaker frequency on a picture as a two-dimensional graph having time and frequency as its two axes.
Type: Grant
Filed: March 15, 2008
Date of Patent: June 29, 2010
Assignee: Sony Corporation
Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
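The speaker-frequency computation behind the two-dimensional graph amounts to counting each speaker's speech segments per time interval. A sketch, where the segment representation and the 60-second interval are illustrative assumptions:

```python
from collections import defaultdict

def speaker_frequency(segments, bin_seconds=60):
    """Count speech segments per speaker per time interval.

    segments: iterable of (speaker_id, start_time_seconds) pairs, as produced
    by a (hypothetical) upstream speaker discrimination step.
    Returns {speaker_id: {interval_index: count}} — the data behind a
    time-vs-frequency graph, one series per speaker.
    """
    freq = defaultdict(lambda: defaultdict(int))
    for speaker, start in segments:
        freq[speaker][int(start // bin_seconds)] += 1
    return {s: dict(bins) for s, bins in freq.items()}
```

Plotting interval index on one axis and count on the other for each speaker reproduces the graph the abstract describes.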
-
Publication number: 20100106495
Abstract: A voice recognition system comprises: a voice input unit that receives an input signal from a voice input element and outputs it; a voice detection unit that detects an utterance segment in the input signal; a voice recognition unit that performs voice recognition for the utterance segment; and a control unit that outputs a control signal to at least one of the voice input unit and the voice detection unit and suppresses the detection frequency if the detection frequency satisfies a predetermined condition.
Type: Application
Filed: February 27, 2008
Publication date: April 29, 2010
Applicant: NEC Corporation
Inventor: Toru Iwasawa
-
Publication number: 20100063905
Abstract: Method and system for carrying out all the transactions performable at an ATM, with the exception of provision of cash, by means of a mobile telecommunications device, specifically a third generation (3G) mobile telephone and the UMTS network. The system comprises the modules (2, 4, 5) necessary for securely carrying out said transactions, by means of access to the ATM by using a virtual emulation thereof, and outside existing networks and protocols now used for these types of transactions.
Type: Application
Filed: February 16, 2007
Publication date: March 11, 2010
Applicant: NILUTESA, S.L.
Inventor: Nicolás Luca De Tena Sainz
-
Publication number: 20100057457
Abstract: An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of a word that has already been registered in the speech recognition dictionary is additionally registered, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in the speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9, and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word.
Type: Application
Filed: November 30, 2007
Publication date: March 4, 2010
Applicant: National Institute of Advanced Industrial Science and Technology
Inventors: Jun Ogata, Masataka Goto
-
Patent number: 7660716
Abstract: The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold.
Type: Grant
Filed: October 3, 2007
Date of Patent: February 9, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
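The speech-level and signal-to-noise checks described above can be sketched as follows. The dB thresholds are illustrative assumptions, and the noise estimate is assumed to come from a separate non-speech segment of the recording:

```python
import math

def rms_db(samples):
    """RMS level of a sample buffer in dB (relative to full scale 1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log of zero

def needs_repeat(speech, noise, level_threshold_db=-30.0, snr_threshold_db=10.0):
    """True if either the speech level or the SNR falls below its threshold,
    i.e. the user should be prompted to repeat the message."""
    speech_db = rms_db(speech)
    snr_db = speech_db - rms_db(noise)
    return speech_db < level_threshold_db or snr_db < snr_threshold_db
```

The separate intelligibility estimate the abstract also mentions would be a third, independent check of the same shape: compute a score, compare to a threshold.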
-
Patent number: 7653540
Abstract: The present invention provides a speech signal compression device which allows the storage capacity of data representing speech to be efficiently compressed. In the present invention, a computer C1 divides speech data to be compressed into speech data for each phoneme on the basis of phoneme labeling data, and unifies the time length of a unit pitch section of each of the divided speech data to the same value, thereby creating a pitch waveform signal and creating sub-band data representing the variation in time of the spectrum components of the pitch waveform signal. This sub-band data is compressed so as to match a condition designated by a table for compression, and the compressed data is further entropy-coded to output the entropy-coded data.
Type: Grant
Filed: March 26, 2004
Date of Patent: January 26, 2010
Assignee: Kabushiki Kaisha Kenwood
Inventor: Yasushi Sato
-
Publication number: 20100010813
Abstract: A voice recognition apparatus includes an extraction unit extracting a feature amount from a voice signal; a word dictionary storing a plurality of recognition words; a reject word generation unit storing reject words in the word dictionary in association with the recognition words; and a collation unit calculating a degree of similarity between the voice signal and each of the recognition words and reject words stored in the word dictionary by using the feature amount extracted by the extraction unit, determining whether or not a word having a high calculated degree of similarity corresponds to a reject word, excluding, when the word is determined to be a reject word, the recognition word stored in the word dictionary in association with the reject word from the result of recognition, and outputting a recognition word having a high calculated degree of similarity as the result of recognition.
Type: Application
Filed: April 30, 2009
Publication date: January 14, 2010
Applicant: FUJITSU LIMITED
Inventor: Shouji HARADA
-
Patent number: 7630891
Abstract: The present invention relates to a voice region detection apparatus and method capable of accurately detecting a voice region even in a voice signal with color noise. The voice region detection method comprises the steps of, if a voice signal is input, dividing the input voice signal into frames; performing whitening of surrounding noise by combining white noise with the frames; extracting random parameters indicating randomness of frames from the frames subjected to the whitening; classifying the frames into voice frames and noise frames based on the extracted random parameters; and detecting a voice region by calculating start and end positions of a voice based on the voice and noise frames. According to the present invention, the voice region can be accurately detected even in a voice signal with a large amount of color noise mixed therewith.
Type: Grant
Filed: November 26, 2003
Date of Patent: December 8, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Kwang-cheol Oh, Yong-beom Lee
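A toy version of the whitening and randomness-parameter steps above. Zero-crossing rate stands in for the patent's unspecified random parameters, and the noise amplitude and classification threshold are assumptions chosen for illustration:

```python
import random

def whiten(frame, noise_amp=0.01, seed=0):
    """Mix low-level white noise into a frame to whiten colored background noise."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [x + rng.uniform(-noise_amp, noise_amp) for x in frame]

def zero_crossing_rate(frame):
    """One simple randomness indicator: noise-like frames cross zero more often."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, zcr_threshold=0.3):
    """Label a frame 'voice' or 'noise' from its post-whitening randomness."""
    return "noise" if zero_crossing_rate(whiten(frame)) > zcr_threshold else "voice"
```

Running the classifier over consecutive frames and locating the first and last "voice" frames would give the start and end positions of the voice region.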
-
Patent number: 7627472
Abstract: A method and a system for person/speaker verification via different communications systems. The system may include a control logic (SL) having access to a voice-controlled dialog system (DS) having verification dialogs stored for querying, a biometrics customer profile (BK), in which the personal biometric data of customers are stored, and a provider database (PD), which contains information regarding protected database areas and services in conjunction with the biometric methods authorized for verification. The method for person/speaker verification is designed to ascertain, transmit, analyze and evaluate, via telecommunications systems, the different personal biometric data that are suitable to establish unequivocally the access authorization of a customer.
Type: Grant
Filed: March 18, 2005
Date of Patent: December 1, 2009
Assignee: Deutsche Telekom AG
Inventors: Marian Trinkel, Christel Mueller, Fred Runge
-
Patent number: 7624012
Abstract: The invention enables generation of a general function (4) which can operate on an input signal (Sx) to extract from the latter a value (DVex) of a global characteristic value expressing a feature (De) of the information conveyed by that signal. It operates by: generating at least one compound function (CF1-CFn), said compound function being generated from at least one of a set of elementary functions (EF1, EF2, …
Type: Grant
Filed: December 16, 2003
Date of Patent: November 24, 2009
Assignee: Sony France S.A.
Inventors: François Pachet, Aymeric Zils
-
Publication number: 20090271198
Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a first frame of the signal, the first frame comprising a voiced frame. One or more cords can be extracted from the voiced frame based on occurrence of one or more events within the frame. For example, the one or more events can comprise one or more glottal pulses. The one or more cords can collectively comprise less than all of the frame. For example, each of the cords can begin with the onset of a glottal pulse and extend to a point prior to the onset of a neighboring glottal pulse, but may exclude a portion of the frame prior to the onset of the neighboring glottal pulse. A phoneme for the voiced frame can be determined based on at least one of the extracted cords.
Type: Application
Filed: October 23, 2008
Publication date: October 29, 2009
Applicant: Red Shift Company, LLC
Inventors: Joel K. Nyquist, Erik N. Reckase, Matthew D. Robinson, John F. Remillard
-
Publication number: 20090271197
Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a region of the signal representing speech. The region can comprise a portion of a frame of the signal representing speech classified as a voiced frame. The region can be marked based on one or more pitch estimates for the region. A cord can be identified within the region based on occurrence of one or more events within the region of the signal. For example, the one or more events can comprise one or more glottal pulses. In such cases, the cord can begin with the onset of a first glottal pulse and extend to a point prior to the onset of a second glottal pulse. The cord may exclude a portion of the region of the signal prior to the onset of the second glottal pulse.
Type: Application
Filed: October 23, 2008
Publication date: October 29, 2009
Applicant: Red Shift Company, LLC
Inventors: Joel K. Nyquist, Erik N. Reckase, Matthew D. Robinson, John F. Remillard
-
Publication number: 20090265162
Abstract: A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.
Type: Application
Filed: June 30, 2009
Publication date: October 22, 2009
Inventors: Tony Ezzat, Evandro B. Gouvea
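The minimal-cost partitioning described above is a natural fit for dynamic programming: the best split of a prefix depends only on the best splits of shorter prefixes. A sketch, assuming each candidate particle's cost is given as a lookup table (the publication's actual cost model is not specified here):

```python
def best_partition(word, particle_cost):
    """Minimal-cost split of `word` into particles via dynamic programming.

    particle_cost: dict mapping candidate particle -> cost (assumed given).
    Returns the particle list of the cheapest partitioning, or None if the
    word cannot be partitioned at all.
    """
    n = len(word)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[i] = minimal cost to partition word[:i]
    back = [None] * (n + 1)  # back[i] = start of the last particle in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in particle_cost and best[start] + particle_cost[piece] < best[end]:
                best[end] = best[start] + particle_cost[piece]
                back[end] = start
    if best[n] == INF:
        return None
    parts, i = [], n         # walk the backpointers to recover the particles
    while i > 0:
        parts.append(word[back[i]:i])
        i = back[i]
    return list(reversed(parts))
```

This examines every possible partitioning implicitly, in O(n²) dictionary lookups rather than enumerating the exponentially many splits.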
-
Patent number: 7599836Abstract: To provide a method of identifying each of the speakers of individual voices, based on recorded voices made by a plurality of speakers, with a simple system configuration, and to provide a system using the method. The system includes: microphones individually provided for each of the speakers; a voice processing unit which gives a unique characteristic to each pair of two-channel voice signals recorded with each of the microphones, by executing different kinds of voice processing on the respective pairs of voice signals, and which mixes the voice signals for each channel; and an analysis unit which performs an analysis according to the unique characteristics given to the voice signals of the respective microphones through the processing by the voice processing unit, and which identifies the speaker for each speech segment of the voice signals.Type: GrantFiled: May 25, 2005Date of Patent: October 6, 2009Assignee: Nuance Communications, Inc.Inventors: Osamu Ichikawa, Masafumi Nishimura, Tetsuya Takiguchi
-
Patent number: 7590537Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers, analyzing both the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model whose variation between the test speaker ML model and the speaker group ML model to which the test speaker belongs is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. In both the speaker clustering and the speaker adaptation, the model variation is calculated by analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm, such as MLLR and MAP.Type: GrantFiled: December 27, 2004Date of Patent: September 15, 2009Assignee: Samsung Electronics Co., Ltd.Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
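One way to read "quantity variation amount and directional variation amount" is as the magnitude and direction of a model-variation vector. The sketch below scores candidate speaker-group variations against a test variation using a cosine term (direction) and a magnitude-ratio term (quantity); the blend weight `alpha` and the scoring formula are illustrative assumptions, not the patented formulation:

```python
import math

def variation(model_a, model_b):
    """Model variation as the vector of per-dimension mean differences."""
    return [b - a for a, b in zip(model_a, model_b)]

def similarity(v1, v2, alpha=0.5):
    """Blend of directional similarity (cosine) and quantity similarity
    (ratio of magnitudes); alpha weights the two terms."""
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    direction = dot / (n1 * n2)
    quantity = min(n1, n2) / max(n1, n2)
    return alpha * direction + (1 - alpha) * quantity

# Pick the speaker-group variation most similar to the test variation.
test_var = variation([0.0, 0.0], [1.0, 1.0])
groups = {"g1": [1.1, 0.9], "g2": [-1.0, 1.0]}
best = max(groups, key=lambda g: similarity(test_var, groups[g]))
print(best)
```

Here `g1` wins: it is close to the test variation in both direction and magnitude, while `g2` matches in magnitude only.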
-
Publication number: 20090135177Abstract: Systems and methods are disclosed for performing voice personalization of video content. The personalized media content may include a composition of a background scene having a character, head model data representing an individualized three-dimensional (3D) head model of a user, audio data simulating the user's voice, and a viseme track containing instructions for causing the individualized 3D head model to lip sync the words contained in the audio data. The audio data simulating the user's voice can be generated using a voice transformation process. In certain examples, the audio data is based on a text input or selected by the user (e.g., via a telephone or computer) or a textual dialogue of a background character.Type: ApplicationFiled: November 19, 2008Publication date: May 28, 2009Applicant: BIG STAGE ENTERTAINMENT, INC.Inventors: Jonathan Isaac Strietzel, Jon Hayes Snoddy, Douglas Alexander Fidaleo
-
Patent number: 7529669Abstract: A voice-based multimodal speaker authentication method, and a telecommunications application thereof, employing a speaker-adaptive method for training phoneme-specific Gaussian mixture models. Applied to telecommunications services, the method may advantageously be implemented in contemporary wireless terminals.Type: GrantFiled: June 13, 2007Date of Patent: May 5, 2009Assignee: NEC Laboratories America, Inc.Inventors: Srivaths Ravi, Anand Raghunathan, Srimat Chakradhar, Karthik Nandakumar
-
Patent number: 7496512Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each adjacent pair of phoneme speech units includes a segmental boundary. A refining model is trained for each cluster and used to refine the boundaries of the contextual phoneme speech units forming the clusters.Type: GrantFiled: April 13, 2004Date of Patent: February 24, 2009Assignee: Microsoft CorporationInventors: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
-
Patent number: 7496510Abstract: Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. Only the utterances of a particular predetermined speaker are transcribed thus providing an index and a summary of the underlying dialogue(s). In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed and only the indexed segments are transcribed into spoken or written language. In a second scenario, two or more speakers located in one room are using a multi-user speech recognition system (SRS). For each user there exists a different speaker model and optionally a different dictionary or vocabulary of words already known or trained by the speech or voice recognition system.Type: GrantFiled: November 30, 2001Date of Patent: February 24, 2009Assignee: International Business Machines CorporationInventors: Joachim Frank, Werner Kriechbaum, Gerhard Stenzel
-
Publication number: 20090048836Abstract: Portions from segment boundary regions of a plurality of speech segments are extracted. Each segment boundary region is based on a corresponding initial unit boundary. Feature vectors that represent the portions in a vector space are created. For each of a plurality of potential unit boundaries within each segment boundary region, an average discontinuity based on distances between the feature vectors is determined. For each segment, the potential unit boundary associated with a minimum average discontinuity is selected as a new unit boundary.Type: ApplicationFiled: July 28, 2008Publication date: February 19, 2009Inventor: Jerome R. Bellegarda
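A minimal sketch of selecting the potential boundary with the smallest average discontinuity. The feature vectors, the use of immediate neighbours only, and the Euclidean distance are simplifying assumptions; the publication averages over segment pairs rather than adjacent frames:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def refine_boundary(candidates, feature_vectors):
    """For each potential unit boundary (an index into feature_vectors),
    average the distances to the flanking feature vectors, and keep the
    boundary with the minimum average discontinuity."""
    def avg_discontinuity(i):
        neigh = [feature_vectors[i - 1], feature_vectors[i + 1]]
        return sum(euclidean(feature_vectors[i], v) for v in neigh) / len(neigh)
    return min(candidates, key=avg_discontinuity)

feats = [[0.0], [0.1], [0.1], [2.0], [2.1]]
print(refine_boundary([1, 2, 3], feats))
```

In this toy data the region around index 1 is the smoothest, so it is chosen as the new unit boundary.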
-
Patent number: 7480616Abstract: Information relating to an amount of muscle activity is extracted from a myo-electrical signal by activity amount information extraction means, and information recognition is performed by activity amount information recognition means using the information relating to the amount of muscle activity of a speaker. There is a prescribed correspondence relationship between the amount of muscle activity of a speaker and a phoneme uttered by a speaker, so the content of an utterance can be recognized with a high recognition rate by information recognition using information relating to an amount of muscle activity.Type: GrantFiled: February 27, 2003Date of Patent: January 20, 2009Assignee: NTT DoCoMo, Inc.Inventors: Hiroyuki Manabe, Akira Hiraiwa, Toshiaki Sugimura
-
Patent number: 7475016Abstract: A system, method, and apparatus for identifying problematic speech segments is provided. The system includes a clustering module for generating a first cluster of one or more consecutive speech segments if the consecutive speech segments satisfy a predetermined filtering test, and for generating a second cluster comprising at least one different consecutive speech segment selected from the ordered sequence if the at least one different consecutive speech segment satisfies the predetermined filtering test. The system also includes a combining module for combining the first and second clusters as well as the at least one intervening consecutive speech segment to form an aggregated cluster if the aggregated cluster satisfies a predetermined combining criterion. The system can further include a ranking module for ranking aggregated clusters, the ranking reflecting a relative severity of misalignments among problematic speech segments.Type: GrantFiled: December 15, 2004Date of Patent: January 6, 2009Assignee: International Business Machines CorporationInventors: Maria E. Smith, Jie Z. Zeng
-
Patent number: 7472062Abstract: Methods and arrangements for facilitating data clustering. From a set of input data, a predetermined number of non-overlapping subsets are created. The input data is split recursively to create the subsets.Type: GrantFiled: January 4, 2002Date of Patent: December 30, 2008Assignee: International Business Machines CorporationInventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
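The recursive splitting can be sketched as repeated halving until the requested number of non-overlapping subsets is reached. Splitting by index position, rather than by a data-driven clustering criterion as the patent presumably does, is a simplification for illustration:

```python
def recursive_split(data, k):
    """Recursively split `data` into k non-overlapping subsets by
    halving the requested subset count at each level and cutting the
    data proportionally."""
    if k == 1:
        return [data]
    left_k = k // 2
    cut = len(data) * left_k // k  # proportional split point
    return (recursive_split(data[:cut], left_k)
            + recursive_split(data[cut:], k - left_k))

parts = recursive_split(list(range(10)), 4)
print(parts)
```

The recursion produces exactly k subsets whose concatenation reproduces the input, i.e. the subsets are non-overlapping and exhaustive.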
-
Publication number: 20080312926Abstract: An automatic dual-step, text-independent, language-independent speaker voice-print creation and speaker recognition method, wherein a neural network-based technique is used in a first step and a Markov model-based technique is used in a second step. In particular, the first step uses a neural network-based technique for decoding the content of what is uttered by the speaker in terms of language-independent acoustic-phonetic classes, while the second step uses the sequence of language-independent acoustic-phonetic classes from the first step and employs a Markov model-based technique for creating the speaker voice-print and for recognizing the speaker. The combination of the two steps improves the accuracy and efficiency of the speaker voice-print creation and of the speaker recognition, without setting any constraints on the lexical content of the speaker utterance or on its language.Type: ApplicationFiled: May 24, 2005Publication date: December 18, 2008Inventors: Claudio Vair, Daniele Colibro, Luciano Fissore
-
Patent number: 7464034Abstract: A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. The apparatus includes a storage section, an analyzing section including a characteristic analyzer, a producing section, a synthesizing section, a memory, an alignment processor, and a target decoder.Type: GrantFiled: September 27, 2004Date of Patent: December 9, 2008Assignees: Yamaha Corporation, Pompeu Fabra UniversityInventors: Takahiro Kawashima, Yasuo Yoshioka, Pedro Cano, Alex Loscos, Xavier Serra, Mark Schiementz, Jordi Bonada
-
Patent number: 7451082Abstract: A noise-resistant utterance detector is provided by extracting a noise estimate (15) to augment the signal-to-noise ratio of the speech signal, inverse filtering (17) the speech signal to focus on the periodic excitation part of the signal, and spectral reshaping (19) to accentuate the separation between formants.Type: GrantFiled: August 27, 2003Date of Patent: November 11, 2008Assignee: Texas Instruments IncorporatedInventors: Yifan Gong, Alexis P. Bernard
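The three steps above — noise estimation, inverse filtering, and spectral reshaping — can be caricatured in a few lines. The pre-emphasis filter standing in for inverse filtering, the energy-based decision, and the threshold value are all illustrative assumptions, not the patented detector:

```python
def detect_utterances(frames, noise_frames=3, snr_threshold=2.0):
    """Toy utterance detector: (1) estimate noise energy from leading
    frames assumed to contain no speech, (2) pre-emphasize each frame
    (a crude stand-in for inverse filtering, accentuating the periodic
    excitation part), and (3) flag frames whose energy exceeds the
    noise estimate by an SNR threshold."""
    def energy(frame):
        emphasized = [frame[i] - 0.97 * frame[i - 1] for i in range(1, len(frame))]
        return sum(x * x for x in emphasized) / len(emphasized)

    noise = sum(energy(f) for f in frames[:noise_frames]) / noise_frames
    return [energy(f) > snr_threshold * noise for f in frames]

quiet = [0.01, -0.01] * 4
loud = [1.0, -1.0] * 4
print(detect_utterances([quiet, quiet, quiet, loud, loud]))
```

The low-amplitude leading frames set the noise floor, so only the high-energy frames are flagged as utterance.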
-
Patent number: 7447633Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.Type: GrantFiled: November 22, 2004Date of Patent: November 4, 2008Assignee: International Business Machines CorporationInventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
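Pooling HMM-state Gaussians into a single GMM, with weight normalization so the pooled weights again sum to one, can be sketched as follows. The flat per-state division is one simple normalization scheme; the patent's exact weighting may differ:

```python
def pool_hmm_states(states):
    """Build a GMM by pooling the Gaussians of several HMM states.

    `states` maps state name -> list of (weight, mean, var) mixture
    components; pooled weights are divided by the number of states so
    that the resulting GMM weights sum to one."""
    n_states = len(states)
    gmm = []
    for mixtures in states.values():
        for weight, mean, var in mixtures:
            gmm.append((weight / n_states, mean, var))
    return gmm

states = {
    "s1": [(0.4, 0.0, 1.0), (0.6, 1.0, 1.0)],
    "s2": [(1.0, 5.0, 2.0)],
}
gmm = pool_hmm_states(states)
print(sum(w for w, _, _ in gmm))
```

The pooled model discards the HMM's temporal structure, which is what makes the resulting recognizer text-independent.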