Subportions Patents (Class 704/249)
-
Patent number: 8010361
Abstract: In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user.
Type: Grant
Filed: July 30, 2008
Date of Patent: August 30, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
-
Publication number: 20110208525
Abstract: A voice recognizing apparatus includes a voice start instructing section 3 for instructing to start voice recognition; a voice input section 1 for receiving uttered voice and converting it to a voice signal; a voice recognizing section 2 for recognizing the voice on the basis of the voice signal; an utterance start time detecting section 4 for detecting the duration from the time when the voice start instructing section instructs to the time when the voice input section delivers the voice signal; an utterance timing deciding section 5 for deciding utterance timing, indicating whether the utterance start is quick or slow, by comparing the duration detected by the utterance start time detecting section with a prescribed threshold; an interaction control section 6 for determining a content, which is to be shown when exhibiting a recognition result of the voice recognizing section, in accordance with the utterance timing decided; a system response generating section 7 for generating a system response on the basis of the…
Type: Application
Filed: March 27, 2008
Publication date: August 25, 2011
Inventors: Yuzuru Inoue, Tadashi Suzuki, Fumitaka Sato, Takayoshi Chikuri
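The timing decision described in this abstract reduces to a threshold comparison on the measured delay between the start instruction and the detected utterance. A minimal Python sketch; the threshold value and the response wording are illustrative assumptions, not details from the publication:

```python
# Hypothetical threshold separating "quick" (practiced) from "slow" (hesitant) users.
QUICK_SLOW_THRESHOLD_MS = 1500

def decide_utterance_timing(start_instructed_ms: float, voice_detected_ms: float) -> str:
    """Classify the utterance start as 'quick' or 'slow' from the measured delay."""
    duration = voice_detected_ms - start_instructed_ms
    return "quick" if duration < QUICK_SLOW_THRESHOLD_MS else "slow"

def choose_response_content(timing: str, recognized_text: str) -> str:
    """Pick a terser confirmation for quick (experienced) users."""
    if timing == "quick":
        return recognized_text  # terse echo of the recognition result
    return f"I heard: {recognized_text}. Is that correct?"  # fuller prompt
```

The idea matches the interaction control section: the same recognition result is exhibited differently depending on the decided timing.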
-
Patent number: 8000963
Abstract: The sound reproducing apparatus includes a replay section receiving unit that receives a request for reproducing a specific part of a sound file from a user, a replay section determining unit that determines a replay section based on the request and conversation structure information stored in a sound data holding unit, and a reproducing unit that reproduces the replay section determined by the replay section determining unit.
Type: Grant
Filed: March 25, 2005
Date of Patent: August 16, 2011
Assignee: Fujitsu Limited
Inventors: Sachiko Onodera, Ryo Ochitani
-
Patent number: 7996221
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for processing a message received from a user to determine whether an estimate of intelligibility is below an intelligibility threshold. The method includes recognizing a portion of a user's message that contains the one or more expected utterances from a critical information list, calculating an estimate of intelligibility for the recognized portion of the user's message that contains the one or more expected utterances, and prompting the user to repeat at least the recognized portion of the user's message if the calculated estimate of intelligibility for the recognized portion of the user's message is below an intelligibility threshold. In one aspect, the method further includes prompting the user to repeat at least a portion of the message if any of a measured speech level and a measured signal-to-noise ratio of the user's message are determined to be below their respective thresholds.
Type: Grant
Filed: December 22, 2009
Date of Patent: August 9, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
-
Publication number: 20110184736
Abstract: Automated methods are provided for recognizing inputted information items and selecting information items. The recognition and selection processes are performed by selecting category designations that the information items belong to. The category designations improve the accuracy and speed of the inputting and selection processes.
Type: Application
Filed: January 25, 2011
Publication date: July 28, 2011
Inventor: Benjamin SLOTZNICK
-
Publication number: 20110131045
Abstract: Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command.
Type: Application
Filed: February 2, 2011
Publication date: June 2, 2011
Applicant: VoiceBox Technologies, Inc.
Inventors: Philippe Di Cristo, Min Ke, Robert A. Kennewick, Lynn Elise Armstrong
-
Publication number: 20110119052
Abstract: A device extracts prosodic information, including a power value, from speech data, and extracts from the speech data an utterance section including a period with a power value equal to or larger than a threshold; divides the utterance section into sections in which the power value is equal to or larger than another threshold; acquires phoneme sequence data for each divided speech data by phoneme recognition; generates clusters, each of which is a set of the classified phoneme sequence data, by clustering; calculates an evaluation value for each cluster; selects clusters for which the evaluation value is equal to or larger than a given value as candidate clusters; determines, for each candidate cluster, one of the phoneme sequence data constituting the cluster to be a representative phoneme sequence; and selects the divided speech data corresponding to the representative phoneme sequence as listening-target speech data.
Type: Application
Filed: November 5, 2010
Publication date: May 19, 2011
Applicant: Fujitsu Limited
Inventor: Sachiko Onodera
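The first step above, cutting the signal into sections where the power meets a threshold, can be sketched as follows. Frame-level power values are assumed to be already computed; the publication's actual thresholds are not specified here:

```python
def extract_power_sections(power, threshold):
    """Return (start, end) index pairs of contiguous runs where power >= threshold.

    `power` is a per-frame power sequence; `end` is exclusive, as in Python slices.
    """
    sections, start = [], None
    for i, p in enumerate(power):
        if p >= threshold and start is None:
            start = i                      # section begins
        elif p < threshold and start is not None:
            sections.append((start, i))    # section ends before frame i
            start = None
    if start is not None:                  # section runs to the end of the data
        sections.append((start, len(power)))
    return sections
```

The same routine serves both described uses: once with the first threshold to find the utterance section, then again with the second threshold to subdivide it.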
-
Publication number: 20110115702
Abstract: A method and system for computer programming using speech and one- or two-hand gesture input is described. The system generally uses a plurality of microphones and cameras as input devices. A configurable event recognition system is described, allowing various software objects in a system to respond to speech, hand gesture, and other input. From this input, program code is produced that can be compiled at any time. Various speech and hand gesture events invoke functions within programs to modify programs, move text and punctuation in a word processor, manipulate mathematical objects, perform data mining, perform natural language internet search, modify project management tasks and visualizations, perform 3D modeling, web page design and web page data entry, and television and DVR programming.
Type: Application
Filed: July 9, 2009
Publication date: May 19, 2011
Inventor: David Seaberg
-
Publication number: 20110112838
Abstract: A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
Type: Application
Filed: November 10, 2009
Publication date: May 12, 2011
Applicant: Research In Motion Limited
Inventor: Sasan Adibi
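A rough sketch of the final digitization step, assuming the harmonic amplitudes have already been extracted and normalized to the first harmonic. The 4-bit quantization width and the rounding scheme are illustrative assumptions, not details from the publication:

```python
def harmonics_to_voice_id(amplitudes, bits_per_harmonic=4):
    """Digitize harmonic amplitudes (relative to the first) into a bit string.

    amplitudes[0] is the reference (first) harmonic; each later harmonic's
    ratio to it is quantized to `bits_per_harmonic` bits and concatenated.
    """
    ref = amplitudes[0]
    levels = (1 << bits_per_harmonic) - 1   # e.g. 15 for 4 bits
    voice_id = ""
    for a in amplitudes[1:]:
        q = min(levels, round((a / ref) * levels))  # clamp to the top level
        voice_id += format(q, f"0{bits_per_harmonic}b")
    return voice_id
```

The resulting bit string would form (part of) the voice ID that is later compared for authentication.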
-
Publication number: 20110112830
Abstract: A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
Type: Application
Filed: November 10, 2009
Publication date: May 12, 2011
Applicant: Research In Motion Limited
Inventor: Sasan Adibi
-
Publication number: 20110112839
Abstract: A command recognition device includes: an utterance understanding unit that determines or selects word sequence information from speech information; a speech confidence degree calculating unit that calculates a degree of speech confidence based on the speech information and the word sequence information; a phrase confidence degree calculating unit that calculates a degree of phrase confidence based on image information and phrase information included in the word sequence information; and a motion control instructing unit that determines whether a command of the word sequence information should be executed based on the degree of speech confidence and the degree of phrase confidence.
Type: Application
Filed: September 2, 2010
Publication date: May 12, 2011
Applicant: HONDA MOTOR CO., LTD.
Inventors: Kotaro FUNAKOSHI, Mikio NAKANO, Xiang ZUO, Naoto IWAHASHI, Ryo TAGUCHI
-
Patent number: 7937269
Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
Type: Grant
Filed: August 22, 2005
Date of Patent: May 3, 2011
Assignee: International Business Machines Corporation
Inventors: Charu Chandra Aggarwal, Philip Shilung Yu
-
Publication number: 20110093268
Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains is examined. One of a significant word change or a significant word class change within the plurality of utterances is determined. A first cluster of utterances including a word or a word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances. A second cluster of utterances not including the word or the word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances.
Type: Application
Filed: September 14, 2010
Publication date: April 21, 2011
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
-
Patent number: 7864987
Abstract: An access system, in one embodiment, first determines that someone has correct credentials by using a non-biometric authentication method such as typing in a password, presenting a smart card containing a cryptographic secret, or having a valid digital signature. Once the credentials are authenticated, the user must take at least two biometric tests, which can be chosen randomly. In one approach, the biometric tests need only check a template generated from the user who desires access against the stored templates matching the holder of the credentials authenticated by the non-biometric test. Access desirably will be allowed when both biometric tests are passed.
Type: Grant
Filed: April 18, 2006
Date of Patent: January 4, 2011
Assignee: Infosys Technologies Ltd.
Inventors: Kumar Balepur Venkatanna, Rajat Moona, S V Subrahmanya
-
Patent number: 7853450
Abstract: A method of transmitting digital voice information comprises encoding raw speech into encoded digital speech data. The beginning and end of individual phonemes within the encoded digital speech data are marked. The encoded digital speech data is formed into packets. The packets are fed into a speech decoding mechanism.
Type: Grant
Filed: March 30, 2007
Date of Patent: December 14, 2010
Assignee: Alcatel-Lucent USA Inc.
Inventor: Bryan Kadel
-
Publication number: 20100312559
Abstract: A method of playing pictures comprises the steps of: receiving (11) a voice message; extracting (12) a key feature from the voice message; selecting (13) pictures by matching the key feature with pre-stored picture information; generating (14) a picture-voice sequence by integrating the selected pictures and the voice message; and playing (15) the picture-voice sequence. An electronic apparatus comprises a processing unit for implementing the different steps of the method.
Type: Application
Filed: December 11, 2008
Publication date: December 9, 2010
Applicant: Koninklijke Philips Electronics N.V.
Inventors: Sheng Jin, Xin Chen, Yang Peng, Ningjiang Chen, Yunji Xia
-
Patent number: 7818170
Abstract: A method for distributed voice searching may include receiving a search query from a user of the mobile communication device, generating a lattice of coarse linguistic representations from speech parts in the search query, extracting query features from the generated lattice of coarse linguistic representations, generating coarse search feature vectors based on the extracted query features, performing a coarse search using the generated coarse search feature vectors and transmitting the generated coarse search feature vectors to a remote voice search processing unit, receiving remote resultant web indices from the remote voice search processing unit, generating a lattice of fine linguistic representations from speech parts in the search query, generating fine search feature vectors from the lattice of fine linguistic representations, performing a fine search using the coarse search results, the remote resultant web indices and the generated fine search feature vectors, and displaying the fine search results…
Type: Grant
Filed: April 10, 2007
Date of Patent: October 19, 2010
Assignee: Motorola, Inc.
Inventor: Yan Ming Cheng
-
Patent number: 7813927
Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
Type: Grant
Filed: June 4, 2008
Date of Patent: October 12, 2010
Assignee: Nuance Communications, Inc.
Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
-
Patent number: 7792269
Abstract: The present invention enables information about a service-impacting network event to be collected from network operations and automatically conveyed to a Media Server that plays a network announcement to callers into a network customer service center. The announcement can be played as an option on an Interactive Voice Response (IVR) menu and informs the caller of known service issues that are being addressed and estimates of when service should return to normal.
Type: Grant
Filed: December 30, 2004
Date of Patent: September 7, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Marian Croak, Hossein Eslambolchi
-
Publication number: 20100211391
Abstract: Speech processing is disclosed for an apparatus having a main processing unit, a memory unit, and one or more co-processors. Memory maintenance and voice recognition result retrievals upon execution are performed with a first main processor thread. Voice detection and initial feature extraction on the raw data are performed with a first co-processor. A second co-processor thread receives feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model, and computes a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data. At least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit is computed with a third co-processor thread.
Type: Application
Filed: February 2, 2010
Publication date: August 19, 2010
Applicant: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
-
Publication number: 20100211387
Abstract: Computer implemented speech processing is disclosed. First and second voice segments are extracted from first and second microphone signals originating from first and second microphones. The first and second voice segments correspond to a voice sound originating from a common source. An estimated source location is generated based on a relative energy of the first and second voice segments and/or a correlation of the first and second voice segments. A determination whether the voice segment is desired or undesired may be made based on the estimated source location.
Type: Application
Filed: February 2, 2010
Publication date: August 19, 2010
Applicant: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
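The relative-energy and correlation cues mentioned in the abstract can be sketched crudely in Python. The lag window that counts as a "desired" source direction is an illustrative assumption; a real system would derive it from microphone spacing and sample rate:

```python
def segment_energy(seg):
    """Sum-of-squares energy of a voice segment."""
    return sum(x * x for x in seg)

def cross_correlation_lag(a, b, max_lag):
    """Lag (in samples) at which the two segments align best.

    The sign of the lag hints at which microphone the sound reached first,
    i.e. which side the source is on.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(a[i] * b[i - lag]
                  for i in range(max(0, lag), min(len(a), len(b) + lag)))
        if val > best_val:
            best_val, best_lag = val, lag
    return best_lag

def is_desired_source(seg1, seg2, max_lag=8, desired=(-2, 2)):
    """Accept the sound only if its estimated direction (via lag) is in range."""
    lag = cross_correlation_lag(seg1, seg2, max_lag)
    return desired[0] <= lag <= desired[1]
```

Energy ratio and correlation lag are the two cues the abstract names; either alone, or both combined, could drive the desired/undesired decision.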
-
Patent number: 7778832
Abstract: One-to-many comparisons of callers' voice prints with known voice prints identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
Type: Grant
Filed: September 26, 2007
Date of Patent: August 17, 2010
Assignee: American Express Travel Related Services Company, Inc.
Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
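The server-side one-to-many comparison can be illustrated with a simple similarity search. Cosine similarity over feature vectors and the 0.85 threshold are stand-in assumptions; the patent does not specify the matching metric:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two voice-print feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_matches(query_print, known_prints, threshold=0.85):
    """Return ids of known voice prints similar enough to the query print.

    known_prints: dict mapping person id -> feature vector.
    """
    return [pid for pid, vec in known_prints.items()
            if cosine_similarity(query_print, vec) >= threshold]
```

Any returned ids would then feed the downstream decision, such as whether to authorize the requested transaction.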
-
Patent number: 7769583
Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.
Type: Grant
Filed: May 13, 2006
Date of Patent: August 3, 2010
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
-
Patent number: 7747435
Abstract: The aim is to easily retrieve a speaker of encoded speech data recorded in a semiconductor storage device in an IC recorder. An information receiving unit 10 in a speaker retrieval apparatus 1 reads out the encoded speech data recorded in a semiconductor storage device 107 in an IC recorder 100. A speech decoding unit 12 decodes the encoded speech data. A speaker frequency detection unit 13 discriminates the speaker based on a feature of the decoded speech waveform to find the frequency of conversation (frequency of occurrence) of the speaker in a preset time interval. A speaker frequency graph displaying unit 14 displays the speaker frequency on a picture as a two-dimensional graph having time and frequency as its two axes.
Type: Grant
Filed: March 15, 2008
Date of Patent: June 29, 2010
Assignee: Sony Corporation
Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
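The speaker-frequency computation behind the two-dimensional graph amounts to counting each speaker's speech segments per time interval. A sketch, where the segment representation and the 60-second interval are illustrative assumptions:

```python
from collections import defaultdict

def speaker_frequency(segments, bin_seconds=60):
    """Count speech segments per speaker per time interval.

    segments: iterable of (speaker_id, start_time_seconds) pairs, as produced
    by a (hypothetical) upstream speaker discrimination step.
    Returns {speaker_id: {interval_index: count}} — the data behind a
    time-vs-frequency graph, one series per speaker.
    """
    freq = defaultdict(lambda: defaultdict(int))
    for speaker, start in segments:
        freq[speaker][int(start // bin_seconds)] += 1
    return {s: dict(bins) for s, bins in freq.items()}
```

Plotting interval index on one axis and count on the other for each speaker reproduces the graph the abstract describes.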
-
Publication number: 20100106495
Abstract: A voice recognition system comprises: a voice input unit that receives an input signal from a voice input element and outputs it; a voice detection unit that detects an utterance segment in the input signal; a voice recognition unit that performs voice recognition for the utterance segment; and a control unit that outputs a control signal to at least one of the voice input unit and the voice detection unit and suppresses the detection frequency if the detection frequency satisfies a predetermined condition.
Type: Application
Filed: February 27, 2008
Publication date: April 29, 2010
Applicant: NEC Corporation
Inventor: Toru Iwasawa
-
Publication number: 20100063905
Abstract: Method and system for carrying out all the transactions performable at an ATM, with the exception of provision of cash, by means of a mobile telecommunications device, specifically a third generation (3G) mobile telephone and the UMTS network. The system comprises the modules (2, 4, 5) necessary for securely carrying out said transactions, by means of access to the ATM by using a virtual emulation thereof, and outside existing networks and protocols now used for these types of transactions.
Type: Application
Filed: February 16, 2007
Publication date: March 11, 2010
Applicant: NILUTESA, S.L.
Inventor: Nicolás Luca De Tena Sainz
-
Publication number: 20100057457
Abstract: An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of a word that has already been registered in the speech recognition dictionary is additionally registered, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in the speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9, and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word.
Type: Application
Filed: November 30, 2007
Publication date: March 4, 2010
Applicant: National Institute of Advanced Industrial Science and Technology
Inventors: Jun Ogata, Masataka Goto
-
Patent number: 7660716
Abstract: The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold.
Type: Grant
Filed: October 3, 2007
Date of Patent: February 9, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
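The speech-level and signal-to-noise checks described above can be sketched as follows. The dB thresholds are illustrative assumptions, and the noise estimate is assumed to come from a separate non-speech segment of the recording:

```python
import math

def rms_db(samples):
    """RMS level of a sample buffer in dB (relative to full scale 1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log of zero

def needs_repeat(speech, noise, level_threshold_db=-30.0, snr_threshold_db=10.0):
    """True if either the speech level or the SNR falls below its threshold,
    i.e. the user should be prompted to repeat the message."""
    speech_db = rms_db(speech)
    snr_db = speech_db - rms_db(noise)
    return speech_db < level_threshold_db or snr_db < snr_threshold_db
```

The separate intelligibility estimate the abstract also mentions would be a third, independent check of the same shape: compute a score, compare to a threshold.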
-
Patent number: 7653540
Abstract: The present invention provides a speech signal compression device which allows the storage capacity of data representing speech to be efficiently compressed. In the present invention, a computer C1 divides speech data to be compressed into speech data for each phoneme on the basis of phoneme labeling data, and unifies the time length of a unit pitch section of each of the divided speech data to the same value, thereby creating a pitch waveform signal and creating sub-band data representing the variation in time of the spectrum components of the pitch waveform signal. This sub-band data is compressed so as to match a condition designated by a table for compression, and the compressed data is further entropy-coded to output the entropy-coded data.
Type: Grant
Filed: March 26, 2004
Date of Patent: January 26, 2010
Assignee: Kabushiki Kaisha Kenwood
Inventor: Yasushi Sato
-
Publication number: 20100010813
Abstract: A voice recognition apparatus includes an extraction unit extracting a feature amount from a voice signal; a word dictionary storing a plurality of recognition words; a reject word generation unit storing reject words in the word dictionary in association with the recognition words; and a collation unit calculating a degree of similarity between the voice signal and each of the recognition words and reject words stored in the word dictionary by using the feature amount extracted by the extraction unit, determining whether or not a word having a high calculated degree of similarity corresponds to a reject word, excluding, when the word is determined to be a reject word, the recognition word stored in the word dictionary in association with the reject word from the result of recognition, and outputting a recognition word having a high calculated degree of similarity as the result of recognition.
Type: Application
Filed: April 30, 2009
Publication date: January 14, 2010
Applicant: FUJITSU LIMITED
Inventor: Shouji HARADA
-
Patent number: 7630891
Abstract: The present invention relates to a voice region detection apparatus and method capable of accurately detecting a voice region even in a voice signal with color noise. The voice region detection method comprises the steps of, if a voice signal is input, dividing the input voice signal into frames; performing whitening of surrounding noise by combining white noise with the frames; extracting random parameters indicating randomness of frames from the frames subjected to the whitening; classifying the frames into voice frames and noise frames based on the extracted random parameters; and detecting a voice region by calculating start and end positions of a voice based on the voice and noise frames. According to the present invention, the voice region can be accurately detected even in a voice signal with a large amount of color noise mixed therewith.
Type: Grant
Filed: November 26, 2003
Date of Patent: December 8, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Kwang-cheol Oh, Yong-beom Lee
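A toy version of the whitening and randomness-parameter steps above. Zero-crossing rate stands in for the patent's unspecified random parameters, and the noise amplitude and classification threshold are assumptions chosen for illustration:

```python
import random

def whiten(frame, noise_amp=0.01, seed=0):
    """Mix low-level white noise into a frame to whiten colored background noise."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [x + rng.uniform(-noise_amp, noise_amp) for x in frame]

def zero_crossing_rate(frame):
    """One simple randomness indicator: noise-like frames cross zero more often."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, zcr_threshold=0.3):
    """Label a frame 'voice' or 'noise' from its post-whitening randomness."""
    return "noise" if zero_crossing_rate(whiten(frame)) > zcr_threshold else "voice"
```

Running the classifier over consecutive frames and locating the first and last "voice" frames would give the start and end positions of the voice region.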
-
Patent number: 7627472
Abstract: A method and a system for person/speaker verification via different communications systems. The system may include a control logic (SL) having access to a voice-controlled dialog system (DS) having verification dialogs stored for querying, a biometrics customer profile (BK), in which the personal biometric data of customers are stored, and a provider database (PD), which contains information regarding protected database areas and services in conjunction with the biometric methods authorized for verification. The method for person/speaker verification is designed to ascertain, transmit, analyze and evaluate, via telecommunications systems, the different personal biometric data that are suitable to establish unequivocally the access authorization of a customer.
Type: Grant
Filed: March 18, 2005
Date of Patent: December 1, 2009
Assignee: Deutsche Telekom AG
Inventors: Marian Trinkel, Christel Mueller, Fred Runge
-
Patent number: 7624012
Abstract: The invention enables generation of a general function (4) which can operate on an input signal (Sx) to extract from the latter a value (DVex) of a global characteristic value expressing a feature (De) of the information conveyed by that signal. It operates by: generating at least one compound function (CF1-CFn), said compound function being generated from at least one of a set of elementary functions (EF1, EF2, …
Type: Grant
Filed: December 16, 2003
Date of Patent: November 24, 2009
Assignee: Sony France S.A.
Inventors: François Pachet, Aymeric Zils
-
Publication number: 20090271198
Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a first frame of the signal, the first frame comprising a voiced frame. One or more cords can be extracted from the voiced frame based on occurrence of one or more events within the frame. For example, the one or more events can comprise one or more glottal pulses. The one or more cords can collectively comprise less than all of the frame. For example, each of the cords can begin with the onset of a glottal pulse and extend to a point prior to the onset of a neighboring glottal pulse, but may exclude a portion of the frame prior to the onset of the neighboring glottal pulse. A phoneme for the voiced frame can be determined based on at least one of the extracted cords.
Type: Application
Filed: October 23, 2008
Publication date: October 29, 2009
Applicant: Red Shift Company, LLC
Inventors: Joel K. Nyquist, Erik N. Reckase, Matthew D. Robinson, John F. Remillard
-
Publication number: 20090271197
Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a region of the signal representing speech. The region can comprise a portion of a frame of the signal representing speech classified as a voiced frame. The region can be marked based on one or more pitch estimates for the region. A cord can be identified within the region based on occurrence of one or more events within the region of the signal. For example, the one or more events can comprise one or more glottal pulses. In such cases, the cord can begin with the onset of a first glottal pulse and extend to a point prior to the onset of a second glottal pulse. The cord may exclude a portion of the region of the signal prior to the onset of the second glottal pulse.
Type: Application
Filed: October 23, 2008
Publication date: October 29, 2009
Applicant: Red Shift Company, LLC
Inventors: Joel K. Nyquist, Erik N. Reckase, Matthew D. Robinson, John F. Remillard
-
Publication number: 20090265162
Abstract: A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.
Type: Application
Filed: June 30, 2009
Publication date: October 22, 2009
Inventors: Tony Ezzat, Evandro B. Gouvea
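The minimal-cost partitioning described above is a natural fit for dynamic programming: the best split of a prefix depends only on the best splits of shorter prefixes. A sketch, assuming each candidate particle's cost is given as a lookup table (the publication's actual cost model is not specified here):

```python
def best_partition(word, particle_cost):
    """Minimal-cost split of `word` into particles via dynamic programming.

    particle_cost: dict mapping candidate particle -> cost (assumed given).
    Returns the particle list of the cheapest partitioning, or None if the
    word cannot be partitioned at all.
    """
    n = len(word)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[i] = minimal cost to partition word[:i]
    back = [None] * (n + 1)  # back[i] = start of the last particle in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in particle_cost and best[start] + particle_cost[piece] < best[end]:
                best[end] = best[start] + particle_cost[piece]
                back[end] = start
    if best[n] == INF:
        return None
    parts, i = [], n         # walk the backpointers to recover the particles
    while i > 0:
        parts.append(word[back[i]:i])
        i = back[i]
    return list(reversed(parts))
```

This examines every possible partitioning implicitly, in O(n²) dictionary lookups rather than enumerating the exponentially many splits.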
-
Patent number: 7599836Abstract: To provide a method of identifying each of the speakers of individual voices, based on recorded voices made by a plurality of speakers, with a simple system configuration, and to provide a system using the method. The system includes: microphones individually provided for each of the speakers; a voice processing unit which gives a unique characteristic to each pair of two-channel voice signals recorded with each of the microphones, by executing different kinds of voice processing on the respective pairs of voice signals, and which mixes the voice signals for each channel; and an analysis unit which performs an analysis according to the unique characteristics given to the voice signals of the respective microphones through the processing by the voice processing unit, and which identifies the speaker for each speech segment of the voice signals.Type: GrantFiled: May 25, 2005Date of Patent: October 6, 2009Assignee: Nuance Communications, Inc.Inventors: Osamu Ichikawa, Masafumi Nishimura, Tetsuya Takiguchi
-
Patent number: 7590537Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers, analyzing both the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model whose variation between the test speaker ML model and the speaker group ML model to which the test speaker belongs is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. In both the speaker clustering and the speaker adaptation, the model variation is calculated by analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm, such as MLLR and MAP.Type: GrantFiled: December 27, 2004Date of Patent: September 15, 2009Assignee: Samsung Electronics Co., Ltd.Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
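One way to read "quantity variation amount and directional variation amount" is as the magnitude and direction of a model-variation vector. The sketch below scores candidate speaker-group variations against a test variation using a cosine term (direction) and a magnitude-ratio term (quantity); the blend weight `alpha` and the scoring formula are illustrative assumptions, not the patented formulation:

```python
import math

def variation(model_a, model_b):
    """Model variation as the vector of per-dimension mean differences."""
    return [b - a for a, b in zip(model_a, model_b)]

def similarity(v1, v2, alpha=0.5):
    """Blend of directional similarity (cosine) and quantity similarity
    (ratio of magnitudes); alpha weights the two terms."""
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    direction = dot / (n1 * n2)
    quantity = min(n1, n2) / max(n1, n2)
    return alpha * direction + (1 - alpha) * quantity

# Pick the speaker-group variation most similar to the test variation.
test_var = variation([0.0, 0.0], [1.0, 1.0])
groups = {"g1": [1.1, 0.9], "g2": [-1.0, 1.0]}
best = max(groups, key=lambda g: similarity(test_var, groups[g]))
print(best)
```

Here `g1` wins: it is close to the test variation in both direction and magnitude, while `g2` matches in magnitude only.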
-
Publication number: 20090135177Abstract: Systems and methods are disclosed for performing voice personalization of video content. The personalized media content may include a composition of a background scene having a character, head model data representing an individualized three-dimensional (3D) head model of a user, audio data simulating the user's voice, and a viseme track containing instructions for causing the individualized 3D head model to lip sync the words contained in the audio data. The audio data simulating the user's voice can be generated using a voice transformation process. In certain examples, the audio data is based on a text input or selected by the user (e.g., via a telephone or computer) or a textual dialogue of a background character.Type: ApplicationFiled: November 19, 2008Publication date: May 28, 2009Applicant: BIG STAGE ENTERTAINMENT, INC.Inventors: Jonathan Isaac Strietzel, Jon Hayes Snoddy, Douglas Alexander Fidaleo
-
Patent number: 7529669Abstract: A voice-based multimodal speaker authentication method, and a telecommunications application thereof, employing a speaker-adaptive method for training phoneme-specific Gaussian mixture models. Applied to telecommunications services, the method may advantageously be implemented in contemporary wireless terminals.Type: GrantFiled: June 13, 2007Date of Patent: May 5, 2009Assignee: NEC Laboratories America, Inc.Inventors: Srivaths Ravi, Anand Raghunathan, Srimat Chakradhar, Karthik Nandakumar
-
Patent number: 7496512Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each adjacent pair of phoneme speech units includes a segmental boundary. A refining model is trained for each cluster and used to refine the boundaries of the contextual phoneme speech units forming the clusters.Type: GrantFiled: April 13, 2004Date of Patent: February 24, 2009Assignee: Microsoft CorporationInventors: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
-
Patent number: 7496510Abstract: Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. Only the utterances of a particular predetermined speaker are transcribed thus providing an index and a summary of the underlying dialogue(s). In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed and only the indexed segments are transcribed into spoken or written language. In a second scenario, two or more speakers located in one room are using a multi-user speech recognition system (SRS). For each user there exists a different speaker model and optionally a different dictionary or vocabulary of words already known or trained by the speech or voice recognition system.Type: GrantFiled: November 30, 2001Date of Patent: February 24, 2009Assignee: International Business Machines CorporationInventors: Joachim Frank, Werner Kriechbaum, Gerhard Stenzel
-
Publication number: 20090048836Abstract: Portions from segment boundary regions of a plurality of speech segments are extracted. Each segment boundary region is based on a corresponding initial unit boundary. Feature vectors that represent the portions in a vector space are created. For each of a plurality of potential unit boundaries within each segment boundary region, an average discontinuity based on distances between the feature vectors is determined. For each segment, the potential unit boundary associated with a minimum average discontinuity is selected as a new unit boundary.Type: ApplicationFiled: July 28, 2008Publication date: February 19, 2009Inventor: Jerome R. Bellegarda
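A minimal sketch of selecting the potential boundary with the smallest average discontinuity. The feature vectors, the use of immediate neighbours only, and the Euclidean distance are simplifying assumptions; the publication averages over segment pairs rather than adjacent frames:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def refine_boundary(candidates, feature_vectors):
    """For each potential unit boundary (an index into feature_vectors),
    average the distances to the flanking feature vectors, and keep the
    boundary with the minimum average discontinuity."""
    def avg_discontinuity(i):
        neigh = [feature_vectors[i - 1], feature_vectors[i + 1]]
        return sum(euclidean(feature_vectors[i], v) for v in neigh) / len(neigh)
    return min(candidates, key=avg_discontinuity)

feats = [[0.0], [0.1], [0.1], [2.0], [2.1]]
print(refine_boundary([1, 2, 3], feats))
```

In this toy data the region around index 1 is the smoothest, so it is chosen as the new unit boundary.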
-
Patent number: 7480616Abstract: Information relating to an amount of muscle activity is extracted from a myo-electrical signal by activity amount information extraction means, and information recognition is performed by activity amount information recognition means using the information relating to the amount of muscle activity of a speaker. There is a prescribed correspondence relationship between the amount of muscle activity of a speaker and a phoneme uttered by a speaker, so the content of an utterance can be recognized with a high recognition rate by information recognition using information relating to an amount of muscle activity.Type: GrantFiled: February 27, 2003Date of Patent: January 20, 2009Assignee: NTT DoCoMo, Inc.Inventors: Hiroyuki Manabe, Akira Hiraiwa, Toshiaki Sugimura
-
Patent number: 7475016Abstract: A system, method, and apparatus for identifying problematic speech segments is provided. The system includes a clustering module for generating a first cluster of one or more consecutive speech segments if the consecutive speech segments satisfy a predetermined filtering test, and for generating a second cluster comprising at least one different consecutive speech segment selected from the ordered sequence if the at least one different consecutive speech segment satisfies the predetermined filtering test. The system also includes a combining module for combining the first and second clusters as well as the at least one intervening consecutive speech segment to form an aggregated cluster if the aggregated cluster satisfies a predetermined combining criterion. The system can further include a ranking module for ranking aggregated clusters, the ranking reflecting a relative severity of misalignments among problematic speech segments.Type: GrantFiled: December 15, 2004Date of Patent: January 6, 2009Assignee: International Business Machines CorporationInventors: Maria E. Smith, Jie Z. Zeng
-
Patent number: 7472062Abstract: Methods and arrangements for facilitating data clustering. From a set of input data, a predetermined number of non-overlapping subsets are created. The input data is split recursively to create the subsets.Type: GrantFiled: January 4, 2002Date of Patent: December 30, 2008Assignee: International Business Machines CorporationInventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
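The recursive splitting can be sketched as repeated halving until the requested number of non-overlapping subsets is reached. Splitting by index position, rather than by a data-driven clustering criterion as the patent presumably does, is a simplification for illustration:

```python
def recursive_split(data, k):
    """Recursively split `data` into k non-overlapping subsets by
    halving the requested subset count at each level and cutting the
    data proportionally."""
    if k == 1:
        return [data]
    left_k = k // 2
    cut = len(data) * left_k // k  # proportional split point
    return (recursive_split(data[:cut], left_k)
            + recursive_split(data[cut:], k - left_k))

parts = recursive_split(list(range(10)), 4)
print(parts)
```

The recursion produces exactly k subsets whose concatenation reproduces the input, i.e. the subsets are non-overlapping and exhaustive.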
-
Publication number: 20080312926Abstract: An automatic dual-step, text-independent, language-independent speaker voice-print creation and speaker recognition method, wherein a neural network-based technique is used in a first step and a Markov model-based technique is used in a second step. In particular, the first step uses a neural network-based technique for decoding the content of what is uttered by the speaker in terms of language-independent acoustic-phonetic classes, while the second step uses the sequence of language-independent acoustic-phonetic classes from the first step and employs a Markov model-based technique for creating the speaker voice-print and for recognizing the speaker. The combination of the two steps improves the accuracy and efficiency of the speaker voice-print creation and of the speaker recognition, without setting any constraints on the lexical content of the speaker utterance or on its language.Type: ApplicationFiled: May 24, 2005Publication date: December 18, 2008Inventors: Claudio Vair, Daniele Colibro, Luciano Fissore
-
Patent number: 7464034Abstract: A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. The apparatus includes a storage section, an analyzing section including a characteristic analyzer, a producing section, a synthesizing section, a memory, an alignment processor, and a target decoder.Type: GrantFiled: September 27, 2004Date of Patent: December 9, 2008Assignees: Yamaha Corporation, Pompeu Fabra UniversityInventors: Takahiro Kawashima, Yasuo Yoshioka, Pedro Cano, Alex Loscos, Xavier Serra, Mark Schiementz, Jordi Bonada
-
Patent number: 7451082Abstract: A noise-resistant utterance detector is provided by extracting a noise estimate (15) to augment the signal-to-noise ratio of the speech signal, inverse filtering (17) the speech signal to focus on the periodic excitation part of the signal, and spectral reshaping (19) to accentuate the separation between formants.Type: GrantFiled: August 27, 2003Date of Patent: November 11, 2008Assignee: Texas Instruments IncorporatedInventors: Yifan Gong, Alexis P. Bernard
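The three steps above — noise estimation, inverse filtering, and spectral reshaping — can be caricatured in a few lines. The pre-emphasis filter standing in for inverse filtering, the energy-based decision, and the threshold value are all illustrative assumptions, not the patented detector:

```python
def detect_utterances(frames, noise_frames=3, snr_threshold=2.0):
    """Toy utterance detector: (1) estimate noise energy from leading
    frames assumed to contain no speech, (2) pre-emphasize each frame
    (a crude stand-in for inverse filtering, accentuating the periodic
    excitation part), and (3) flag frames whose energy exceeds the
    noise estimate by an SNR threshold."""
    def energy(frame):
        emphasized = [frame[i] - 0.97 * frame[i - 1] for i in range(1, len(frame))]
        return sum(x * x for x in emphasized) / len(emphasized)

    noise = sum(energy(f) for f in frames[:noise_frames]) / noise_frames
    return [energy(f) > snr_threshold * noise for f in frames]

quiet = [0.01, -0.01] * 4
loud = [1.0, -1.0] * 4
print(detect_utterances([quiet, quiet, quiet, loud, loud]))
```

The low-amplitude leading frames set the noise floor, so only the high-energy frames are flagged as utterance.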
-
Patent number: 7447633Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.Type: GrantFiled: November 22, 2004Date of Patent: November 4, 2008Assignee: International Business Machines CorporationInventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
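Pooling HMM-state Gaussians into a single GMM, with weight normalization so the pooled weights again sum to one, can be sketched as follows. The flat per-state division is one simple normalization scheme; the patent's exact weighting may differ:

```python
def pool_hmm_states(states):
    """Build a GMM by pooling the Gaussians of several HMM states.

    `states` maps state name -> list of (weight, mean, var) mixture
    components; pooled weights are divided by the number of states so
    that the resulting GMM weights sum to one."""
    n_states = len(states)
    gmm = []
    for mixtures in states.values():
        for weight, mean, var in mixtures:
            gmm.append((weight / n_states, mean, var))
    return gmm

states = {
    "s1": [(0.4, 0.0, 1.0), (0.6, 1.0, 1.0)],
    "s2": [(1.0, 5.0, 2.0)],
}
gmm = pool_hmm_states(states)
print(sum(w for w, _, _ in gmm))
```

The pooled model discards the HMM's temporal structure, which is what makes the resulting recognizer text-independent.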