Voice Recognition Patents (Class 704/246)
  • Patent number: 9015046
    Abstract: A method and system for indicating in real time that an interaction is associated with a problem or issue, comprising: receiving a segment of an interaction in which a representative of the organization participates; extracting a feature from the segment; extracting a global feature associated with the interaction; aggregating the feature and the global feature; and classifying the segment or the interaction in association with the problem or issue by applying a model to the feature and the global feature. The method and system may also use features extracted from earlier segments within the interaction. The method and system can also evaluate the model based on features extracted from training interactions and manual tagging assigned to the interactions or segments thereof.
    Type: Grant
    Filed: June 10, 2010
    Date of Patent: April 21, 2015
    Assignee: Nice-Systems Ltd.
    Inventors: Oren Pereg, Moshe Wasserblat, Yuval Lubowich, Ronen Laperdon, Dori Shapira, Vladislav Feigin, Oz Fox-Kahana
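The aggregation-and-classification flow described in the abstract above can be illustrated with a short Python sketch. Everything here is hypothetical: the feature vectors, the linear scoring model, the weights, and the threshold are stand-ins for the trained model the patent describes.

```python
def aggregate_features(segment_features, global_features, earlier_segments=()):
    """Concatenate segment-level features, interaction-level (global)
    features, and optionally features from earlier segments into one
    vector, as the abstract's aggregation step suggests."""
    vector = list(segment_features) + list(global_features)
    for feats in earlier_segments:
        vector.extend(feats)
    return vector


def classify_segment(vector, weights, bias=0.0, threshold=0.5):
    """Toy linear model standing in for a classifier trained on manually
    tagged interactions: flag the segment as problematic if its score
    exceeds the threshold."""
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return score > threshold, score
```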
  • Patent number: 9014346
    Abstract: A method, apparatus and computer-readable medium for handling incoming calls destined for a called party. The method comprises detecting arrival of an incoming call destined for the called party and attempting to reach the called party by causing a communication device associated with the called party to emit a voice message soliciting a spoken call handling command from the called party. This allows the called party not only to recognize the calling party, but also to decide whether to accept, reject or forward the incoming call without having to physically manipulate the communication device. The network-based example of implementation is compatible with many existing communication devices and has the ability to query the calling party for identification information, whereas the communication device-based example of implementation is compatible with many existing network architectures, and does not require the called party to subscribe to any particular network service.
    Type: Grant
    Filed: September 22, 2006
    Date of Patent: April 21, 2015
    Assignee: BCE Inc.
    Inventor: Jeffrey William Dawson
  • Patent number: 9015044
    Abstract: Implementations of systems, methods and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 21, 2015
    Assignee: Malaspina Labs (Barbados) Inc.
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
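The "sufficient amount of new information" test for candidate codebook tuples might look like the following hypothetical sketch. The Euclidean distance metric, the novelty threshold, and the running-average update are illustrative assumptions, not the patent's actual criteria.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_codebook(codebook, candidate, novelty_threshold=1.0):
    """Add the candidate tuple if it is far from every existing tuple;
    otherwise use it to update the nearest existing tuple."""
    if not codebook:
        codebook.append(list(candidate))
        return "added"
    nearest = min(codebook, key=lambda t: euclidean(t, candidate))
    if euclidean(nearest, candidate) > novelty_threshold:
        codebook.append(list(candidate))
        return "added"
    for i, value in enumerate(candidate):
        nearest[i] = 0.5 * (nearest[i] + value)  # simple running average
    return "updated"
```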
  • Patent number: 9015043
    Abstract: A computer-implemented method includes receiving an electronic representation of one or more human voices, recognizing words in a first portion of the electronic representation of the one or more human voices, and sending suggested search terms to a display device for display to a user in a text format. The suggested search terms are based on the recognized words in the first portion of the electronic representation of the one or more human voices. A search query is received from the user, which includes one or more of the suggested search terms that were displayed to the user.
    Type: Grant
    Filed: October 1, 2010
    Date of Patent: April 21, 2015
    Assignee: Google Inc.
    Inventor: Scott Jenson
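A minimal sketch of the suggestion step described above, assuming a simple frequency ranking over the recognized words; a production system would use far richer signals (entities, query logs, context) than this.

```python
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}


def suggest_terms(recognized_words, max_terms=5):
    """Rank candidate search terms from the recognized words by
    frequency, skipping stopwords; ties break alphabetically."""
    counts = {}
    for word in recognized_words:
        word = word.lower()
        if word not in STOPWORDS:
            counts[word] = counts.get(word, 0) + 1
    return sorted(counts, key=lambda w: (-counts[w], w))[:max_terms]
```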
  • Publication number: 20150106098
    Abstract: A voice input device provided with an input section for inputting a voice of a user, a recognition section for recognizing the voice of the user inputted by the input section, a generation section for generating characters or a command based on a recognition result of the recognition section, a detection section for detecting the device's own posture, and an instruction section for instructing the generation section to generate the command when a detection result of the detection section represents a specific posture, and to generate the characters when the detection result represents a posture other than the specific posture. Accordingly, character input and command input during dictation are correctly distinguished; more specifically, unexpected character input during dictation is avoided.
    Type: Application
    Filed: October 10, 2012
    Publication date: April 16, 2015
    Inventor: Yusuke Inutsuka
  • Publication number: 20150106097
    Abstract: There is provided a method of determining a main speaker that is performed by a first terminal participating in a distributed telepresence service. The method of determining a main speaker according to an embodiment of the invention includes obtaining first feature information for determining a main speaker from an audio input signal, obtaining second feature information for determining a main speaker of a second terminal from the second terminal participating in the distributed telepresence service, and determining a main speaker terminal for providing a video and an audio of a main speaker who is participating in a telepresence and is speaking based on the first feature information for determining a main speaker and the second feature information for determining a main speaker.
    Type: Application
    Filed: February 21, 2014
    Publication date: April 16, 2015
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventor: Hyun-Woo KIM
  • Publication number: 20150106099
    Abstract: An image processing apparatus includes: a voice input receiver configured to receive a voice input of user; a signal processor configured to recognize and process the received voice input received through the voice input receiver; a buffer configured to store the voice input; and a controller configured to determine whether a voice recognition function of the signal processor is activated and control the signal processor to recognize the voice input stored in the buffer in response to the voice recognition function being determined to be activated wherein the controller is further configured to store the received voice input in the buffer in response to the received voice input being input through the voice input receiver while the voice recognition function is not activated, so that the received voice input is recognized by the signal processor when the voice recognition function is activated.
    Type: Application
    Filed: September 23, 2014
    Publication date: April 16, 2015
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Chan-hee CHOI, Kyung-mi PARK, Hee-seob RYU, Chan-sik BOK
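The buffering behavior described above can be sketched as a small state holder. This is a hypothetical illustration: voice input arriving while recognition is inactive is stored, then handed to the recognizer once recognition is activated.

```python
class BufferedVoiceInput:
    """Buffer voice input while the recognition function is inactive;
    drain the buffer to the recognizer upon activation."""

    def __init__(self):
        self.buffer = []
        self.recognized = []
        self.active = False

    def receive(self, chunk):
        if self.active:
            self.recognized.append(chunk)  # recognize immediately
        else:
            self.buffer.append(chunk)      # hold until activation

    def activate(self):
        self.active = True
        self.recognized.extend(self.buffer)  # buffered input first
        self.buffer.clear()
```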
  • Publication number: 20150106089
    Abstract: A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information.
    Type: Application
    Filed: December 30, 2010
    Publication date: April 16, 2015
    Inventors: Evan H. Parker, Michal R. Grabowski
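The two-power-mode flow above amounts to a small state machine. In this hypothetical sketch, audio is represented as text for simplicity: the low-power mode only checks for the computer's name, and hearing it switches to the high-power mode where the next input is treated as a command for full recognition.

```python
class WakeWordListener:
    """Listen for the computer's name in a low-power mode; switch to a
    high-power mode for full speech recognition of the command."""

    def __init__(self, name):
        self.name = name.lower()
        self.mode = "LOW"

    def hear(self, audio_text):
        if self.mode == "LOW":
            if self.name in audio_text.lower():
                self.mode = "HIGH"   # name detected: switch modes
            return None
        command = audio_text         # full recognition would run here
        self.mode = "LOW"            # drop back to conserve power
        return command
```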
  • Patent number: 9009025
    Abstract: In some implementations, a digital work provider may provide language model information related to a plurality of different contexts, such as a plurality of different digital works. For example, the language model information may include language model difference information identifying a plurality of sequences of one or more words in a digital work that have probabilities of occurrence that differ from probabilities of occurrence in a base language model by a threshold amount. The language model difference information corresponding to a particular context may be used in conjunction with the base language model to recognize an utterance made by a user of a user device. In some examples, the recognition is performed on the user device. In other examples, the utterance and associated context information are sent over a network to a recognition computing device that performs the recognition.
    Type: Grant
    Filed: December 27, 2011
    Date of Patent: April 14, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Brandon W. Porter
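The overlay of language model difference information on a base model can be sketched as a simple lookup precedence. The dictionaries, tuple keys, and probability floor here are illustrative assumptions, not the patent's representation.

```python
def combined_probability(sequence, base_lm, diff_lm):
    """Look the word sequence up in the per-work difference table first,
    falling back to the base language model otherwise. The difference
    table only stores sequences whose probability deviates from the base
    model by a threshold, so it stays small enough to ship per work."""
    floor = 1e-9  # assumed floor for unseen sequences
    return diff_lm.get(sequence, base_lm.get(sequence, floor))
```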
  • Patent number: 9009039
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: April 14, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Patent number: 9002702
    Abstract: Embodiments of the present invention provide an approach for automatically assigning a confidence level to information extracted from a transcription of a voice recording. Specifically, in a typical embodiment, an axiom is extracted from a source associated with the text of the transcription. A confidence level of the source is determined. A confidence level is assigned to the axiom based on the confidence level of the source.
    Type: Grant
    Filed: May 3, 2012
    Date of Patent: April 7, 2015
    Assignee: International Business Machines Corporation
    Inventors: James E. Bostick, John M. Ganci, Jr., John P. Kaemmerer, Craig M. Trim
  • Patent number: 9002705
    Abstract: The present invention provides an interactive device which allows quick utterance recognition results and sequential output thereof, and which diminishes the decrease in recognition rate even when the user's utterance is divided into short frames for quick decisions. The interactive device: sets a recognition section for voice recognition; performs voice recognition for the recognition section; when the voice recognition result includes a key phrase, determines response actions corresponding thereto; and executes the response actions. The interactive device repeatedly updates the set recognition terminal point to a frame which is the predetermined time length ahead of the set recognition terminal point to set a plurality of recognition sections. The interactive device performs voice recognition for each recognition section.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: April 7, 2015
    Assignee: Honda Motor Co., Ltd.
    Inventors: Yuichi Yoshida, Taku Osada
  • Patent number: 9002706
    Abstract: The invention refers to a method for comparing voice utterances, the method comprising the steps: extracting a plurality of features (201) from a first voice utterance of a given text sample and extracting a plurality of features (201) from a second voice utterance of said given text sample, wherein each feature is extracted as a function of time, and wherein each feature of the second voice utterance corresponds to a feature of the first voice utterance; applying dynamic time warping (202) to one or more time dependent characteristics of the first and/or second voice utterance e.g.
    Type: Grant
    Filed: December 10, 2009
    Date of Patent: April 7, 2015
    Assignee: Agnitio SL
    Inventors: Jesus Antonio Villalba Lopez, Alfonso Ortega Gimenez, Eduardo Lleida Solano, Sara Varela Redondo, Marta Garcia Gomar
  • Patent number: 9002713
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: April 7, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
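The metric-driven resource modification above can be sketched as a simple scaling rule. The metric names, thresholds, and scale factors are illustrative assumptions; the patent covers the general idea of adjusting allocated resources commensurate with recorded metrics.

```python
def adjust_resources(allocated, metrics):
    """Scale the allocated recognition resources commensurate with the
    recorded metrics: low confidence or frequent repeat requests earn
    the speaker more resources; consistently high confidence releases
    some."""
    if metrics["confidence"] < 0.5 or metrics["repeat_requests"] > 2:
        factor = 1.5
    elif metrics["confidence"] > 0.9:
        factor = 0.8
    else:
        factor = 1.0
    return {name: amount * factor for name, amount in allocated.items()}
```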
  • Patent number: 9002703
    Abstract: The community-based generation of audio narrations for a text-based work leverages collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be collected from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may then be combined to produce an audio reading of at least a portion of the text-based work.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 7, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Jay A. Crosley
  • Patent number: 9002710
    Abstract: The invention involves the loading and unloading of dynamic section grammars and language models in a speech recognition system. The values of the sections of the structured document are either determined in advance from a collection of documents of the same domain, document type, and speaker; or collected incrementally from documents of the same domain, document type, and speaker; or added incrementally to an already existing set of values. Speech recognition in the context of the given field is constrained to the contents of these dynamic values. If speech recognition fails or produces a poor match within this grammar or section language model, speech recognition against a larger, more general vocabulary that is not constrained to the given section is performed.
    Type: Grant
    Filed: September 12, 2012
    Date of Patent: April 7, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Alwin B. Carus, Larissa Lapshina, Raghu Vemula
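The constrained-then-fallback recognition strategy above can be sketched as follows. The hypothesis scores, value sets, and score threshold are stand-ins for real acoustic and language-model scores in an actual recognizer.

```python
def recognize_with_fallback(hypotheses, section_values, general_vocab,
                            min_score=0.7):
    """Match the utterance against the small section-specific value set
    first; if no candidate scores well enough, fall back to the larger
    general vocabulary."""
    def best(candidates):
        scored = [(score, text) for text, score in hypotheses.items()
                  if text in candidates]
        return max(scored) if scored else None

    hit = best(section_values)
    if hit and hit[0] >= min_score:
        return hit[1], "section"
    hit = best(general_vocab)
    return (hit[1], "general") if hit else (None, "none")
```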
  • Patent number: 9002707
    Abstract: An information processing apparatus includes: a plurality of information input units; an event detection unit that generates event information including estimated position information and estimated identification information of users present in the real space based on analysis of the information from the information input unit; and an information integration processing unit that inputs the event information, and generates target information including a position of each user and user identification information based on the input event information, and signal information representing a probability value of the event generation source, wherein the information integration processing unit includes an utterance source probability calculation unit, and wherein the utterance source probability calculation unit performs a process of calculating an utterance source score as an index value representing an utterance source probability of each target by multiplying weights based on utterance situations by a plurality of d
    Type: Grant
    Filed: November 6, 2012
    Date of Patent: April 7, 2015
    Assignee: Sony Corporation
    Inventor: Keiichi Yamada
  • Patent number: 9001976
    Abstract: A method for speaker adaptation includes receiving a plurality of media files, each associated with a call center agent of a plurality of call center agents and receiving a plurality of terms. Speech processing is performed on at least some of the media files to identify putative instances of at least some of the plurality of terms. Each putative instance is associated with a hit quality that characterizes a quality of recognition of the corresponding term. One or more call center agents for performing speaker adaptation are determined, including identifying call center agents that are associated with at least one media file that includes one or more putative instances with a hit quality below a predetermined threshold. Speaker adaptation is performed for each identified call center agent based on the media files associated with the identified call center agent and the identified instances of the plurality of terms.
    Type: Grant
    Filed: May 3, 2012
    Date of Patent: April 7, 2015
    Assignee: Nexidia, Inc.
    Inventors: Jon A. Arrowood, Robert W. Morris, Marsal Gavalda
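The agent-selection step above (flagging agents whose media files contain low-quality putative hits) reduces to a threshold filter. This sketch assumes hit data as (agent, hit quality) pairs; the threshold value is illustrative.

```python
def agents_needing_adaptation(putative_hits, threshold=0.6):
    """Return the call center agents associated with at least one
    putative term instance whose hit quality falls below the threshold;
    their media files would then drive speaker adaptation."""
    return sorted({agent for agent, quality in putative_hits
                   if quality < threshold})
```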
  • Publication number: 20150095028
    Abstract: Systems and methods for determining an identity of an individual are provided. Audio may be received that includes a key phrase spoken by the individual, and the key phrase may include an identifier spoken by the individual. A key phrase voice print and key phrase text corresponding to the audio may be obtained. The key phrase text may include text corresponding to the identifier spoken by the individual. Voice prints may be retrieved based on the text corresponding to the identifier, and the voice prints may be provided to a voice biometric engine for comparison to the key phrase voice print. The individual may be authenticated based on a comparison of the key phrase voice print to the voice prints. The identifier may include a first name and a last name of the individual.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: Bank of America Corporation
    Inventors: David Karpey, Mark Pender
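The retrieve-then-compare flow above can be sketched as follows. The database layout and `match_fn` are hypothetical: `match_fn` stands in for a real voice biometric engine's scoring of the key-phrase voice print against a stored print.

```python
def authenticate(identifier_text, key_phrase_print, voiceprint_db,
                 match_fn, threshold=0.8):
    """Use the name recognized in the key phrase to narrow the candidate
    voiceprints, then accept the speaker if any candidate matches the
    key-phrase voice print closely enough."""
    candidates = voiceprint_db.get(identifier_text.lower(), [])
    return any(match_fn(key_phrase_print, stored) >= threshold
               for stored in candidates)
```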
  • Patent number: 8996374
    Abstract: Embodiments of the present invention include an apparatus, method, and system for calculating senone scores for multiple concurrent input speech streams. The method can include the following: receiving one or more feature vectors from one or more input streams; accessing the acoustic model one senone at a time; and calculating separate senone scores corresponding to each incoming feature vector. The calculation uses a single read access to the acoustic model for a single senone and calculates a set of separate senone scores for the one or more feature vectors, before proceeding to the next senone in the acoustic model.
    Type: Grant
    Filed: November 6, 2012
    Date of Patent: March 31, 2015
    Assignee: Spansion LLC
    Inventor: Ojas A. Bapat
  • Patent number: 8996382
    Abstract: Systems and methods for inhibiting access to the lips of a speaking person, including a sound receiving device for receiving speech of a person speaking, the person having lips that move when the person speaks; a blocker connected to the device for blocking the lips of the person speaking while the person is speaking; and, in some aspects, such a blocker with a material addition apparatus to provide added material for the breath of a person speaking, e.g., for preventing the spread of disease or to freshen a speaker's breath.
    Type: Grant
    Filed: October 11, 2011
    Date of Patent: March 31, 2015
    Inventor: Guy L. McClung, III
  • Patent number: 8996373
    Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in the depressed state; a likelihood calculation unit to calculate a first likelihood as a likelihood of the first specific speaker model with respect to input voice, and a second likelihood as a likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine a state of the speaker of the input voice using the first likelihood and the second likelihood.
    Type: Grant
    Filed: October 5, 2011
    Date of Patent: March 31, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
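The two-model likelihood comparison above can be sketched in a few lines. Here the speaker models are plain callables returning log-likelihoods; the patent's models would be statistical models of the speaker's speech features in each state.

```python
def detect_state(features, undepressed_model, depressed_model):
    """Score the input voice features under both specific-speaker models
    and let the higher likelihood decide the speaker's state."""
    ll_undep = undepressed_model(features)
    ll_dep = depressed_model(features)
    return "undepressed" if ll_undep >= ll_dep else "depressed"
```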
  • Patent number: 8996368
    Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
    Type: Grant
    Filed: February 22, 2010
    Date of Patent: March 31, 2015
    Assignee: Nuance Communications, Inc.
    Inventor: Daniel Willett
  • Patent number: 8996387
    Abstract: For clearing transaction data selected for a processing, there is generated in a portable data carrier (1) a transaction acoustic signal (003; 103; 203) (S007; S107; S207) upon whose acoustic reproduction by an end device (10) at least transaction data selected for the processing are reproduced superimposed acoustically with a melody specific to a user of the data carrier (1) (S009; S109; S209). The generated transaction acoustic signal (003; 103; 203) is electronically transferred to an end device (10) (S108; S208), which processes the selected transaction data (S011; S121; S216) only when the user of the data carrier (1) confirms vis-à-vis the end device (10) an at least partial match both of the acoustically reproduced melody with the user-specific melody and of the acoustically reproduced transaction data with the selected transaction data (S010; S110, S116; S210).
    Type: Grant
    Filed: September 8, 2009
    Date of Patent: March 31, 2015
    Assignee: Giesecke & Devrient GmbH
    Inventors: Thomas Stocker, Michael Baldischweiler
  • Publication number: 20150088513
    Abstract: A sound processing system is provided and is executed by a processor. The processor acquires a video/audio file from a plurality of video/audio files. The processor controls a video/audio processing chip to build a voiceprint feature model of each section for use in speaker recognition, and to identify the speaker of each section based on a comparison of the built voiceprint feature model of the acquired video/audio file with the voiceprint feature models of speakers stored in a storage unit. The processor generates a tag file recording relationships between the plurality of sections of the acquired video/audio file and the speakers according to the identification result. A sound processing method is also provided.
    Type: Application
    Filed: September 17, 2014
    Publication date: March 26, 2015
    Inventors: HAI-HSING LIN, HSIN-TSUNG TUNG
  • Publication number: 20150088512
    Abstract: For context-based audio filter selection, a type module determines a recipient type for a recipient process of an audio signal. The recipient type includes a human destination recipient type and a speech recognition recipient type. A filter module selects an audio filter in response to the recipient type.
    Type: Application
    Filed: September 20, 2013
    Publication date: March 26, 2015
    Applicant: LENOVO (Singapore) PTE, LTD.
    Inventors: John Miles Hunt, John Weldon Nicholson
  • Patent number: 8990071
    Abstract: A method for managing an interaction of a calling party to a communication partner is provided. The method includes automatically determining if the communication partner expects DTMF input. The method also includes translating speech input to one or more DTMF tones and communicating the one or more DTMF tones to the communication partner, if the communication partner expects DTMF input.
    Type: Grant
    Filed: March 29, 2010
    Date of Patent: March 24, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yun-Cheng Ju, Stefanie Tomko, Frank Liu, Ivan Tashev
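The translation step above (spoken input to DTMF tones) can be sketched as a word-to-tone mapping. The word list here is an illustrative assumption; a real system would drive a tone generator rather than return a string.

```python
WORD_TO_DTMF = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "star": "*", "pound": "#",
}


def speech_to_dtmf(recognized_words):
    """Translate a recognized spoken sequence into the DTMF string to
    communicate to the partner; words with no DTMF equivalent are
    skipped."""
    return "".join(WORD_TO_DTMF[w] for w in recognized_words
                   if w in WORD_TO_DTMF)
```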
  • Publication number: 20150081300
    Abstract: An embodiment of the present invention relates to a speech recognition system and method using incremental device-based acoustic model adaptation.
    Type: Application
    Filed: April 18, 2014
    Publication date: March 19, 2015
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventor: Dong-Hyun Kim
  • Publication number: 20150081299
    Abstract: A system for use in assisting a user in a social interaction with another person is provided, the system being configured to determine whether the user recognizes the person and, if it is determined that the user does not recognize the person, to provide information to the user about the person. A corresponding method and computer program product for performing the method are also provided.
    Type: Application
    Filed: June 1, 2012
    Publication date: March 19, 2015
    Applicant: Koninklijke Philips N.V.
    Inventors: Radu Serban Jasinschi, Murtaza Bulut, Luca Bellodi
  • Patent number: 8983838
    Abstract: A global speech user interface (GSUI) comprises an input system to receive a user's spoken command, a feedback system along with a set of feedback overlays to give the user information on the progress of his spoken requests, a set of visual cues on the television screen to help the user understand what he can say, a help system, and a model for navigation among applications. The interface is extensible to make it easy to add new applications.
    Type: Grant
    Filed: September 17, 2013
    Date of Patent: March 17, 2015
    Assignee: Promptu Systems Corporation
    Inventors: Adam Jordan, Scott Lynn Maddux, Tim Plowman, Victoria Stanbach, Jody Williams
  • Patent number: 8983837
    Abstract: A computerized alert mode management method of a communication device, the communication device including a sound capture unit. Vocal sounds of the environment around the communication device are captured at regular intervals using the sound capture unit. Voice characteristic information of the captured vocal sounds is extracted using a speech recognition method and/or a voice recognition method. The communication device is controlled to work in one of a plurality of predetermined alert modes according to the extracted voice characteristic information.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: March 17, 2015
    Assignee: Hon Hai Precision Industry Co., Ltd.
    Inventor: Tsung-Jen Chuang
  • Patent number: 8982971
    Abstract: A multi-carrier signal is typically comprised of many equidistant sub-carriers. This results in periodicity of spectrum within the bandwidth of such a multi-carrier signal. An unknown multi-carrier signal with equidistant sub-carriers can thus be sensed together with its sub-carrier spacing by finding a discernable local maximum in the cepstrum (Fourier transform of the log spectrum) of the multi-carrier signal.
    Type: Grant
    Filed: March 29, 2012
    Date of Patent: March 17, 2015
    Assignee: QRC, Inc.
    Inventors: Sinisa Peric, Thomas F. Callahan, III
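The cepstral spacing estimate above can be demonstrated at toy scale: a log spectrum that repeats every s bins produces a cepstral peak at quefrency q = n / s. This sketch uses a naive pure-Python DFT; a real implementation would use an FFT over measured spectra.

```python
import cmath
import math


def cepstrum(log_spectrum):
    """Naive DFT magnitude of the log spectrum: a peak at quefrency q
    means the spectrum repeats every n / q bins (toy scale only)."""
    n = len(log_spectrum)
    return [abs(sum(log_spectrum[k] * cmath.exp(-2j * math.pi * q * k / n)
                    for k in range(n)))
            for q in range(n)]


def estimate_subcarrier_spacing(log_spectrum, min_q=2):
    """Find the discernible maximum of the cepstrum away from the
    lowest quefrencies; the corresponding sub-carrier spacing is
    n / q spectrum bins."""
    ceps = cepstrum(log_spectrum)
    q = max(range(min_q, len(ceps) // 2), key=lambda i: ceps[i])
    return len(log_spectrum) / q
```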
  • Patent number: 8983207
    Abstract: A technique for authenticating a user is described. During this authentication technique, an electronic device (such as a cellular telephone) captures multiple images of the user while the user moves the electronic device in a pre-defined manner (for example, along a path in 3-dimensional space), and determines positions of the electronic device when the multiple images were captured. Then, the electronic device compares the images at the positions with corresponding pre-existing images of the user captured at different points of view. If the comparisons achieve a match condition, the electronic device authenticates the user. In this way, the authentication technique may be used to prevent successful replay attacks.
    Type: Grant
    Filed: January 10, 2013
    Date of Patent: March 17, 2015
    Assignee: Intuit Inc.
    Inventors: Alexander S. Ran, Christopher Z. Lesner, Cynthia J. Osmon
  • Patent number: 8983836
    Abstract: Mechanisms for performing dynamic automatic speech recognition on a portion of multimedia content are provided. Multimedia content is segmented into homogeneous segments of content with regard to speakers and background sounds. For the at least one segment, a speaker providing speech in an audio track of the at least one segment is identified using information retrieved from a social network service source. A speech profile for the speaker is generated using information retrieved from the social network service source, an acoustic profile for the segment is generated based on the generated speech profile, and an automatic speech recognition engine is dynamically configured for operation on the at least one segment based on the acoustic profile. Automatic speech recognition operations are performed on the audio track of the at least one segment to generate a textual representation of speech content in the audio track corresponding to the speaker.
    Type: Grant
    Filed: September 26, 2012
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Elizabeth V. Woodward, Shunguo Yan
  • Publication number: 20150073801
    Abstract: There are provided an apparatus and a method for selecting a control object through voice recognition. The apparatus for selecting a control object through voice recognition according to the present invention includes one or more processing devices, in which the one or more processing devices are configured to obtain input information on the basis of a voice of a user, to match the input information to at least one first identification information obtained based on a control object and second identification information corresponding to the first identification information, to obtain matched identification information matched to the input information within the first identification information and the second identification information, and to select a control object corresponding to the matched identification information.
    Type: Application
    Filed: August 29, 2014
    Publication date: March 12, 2015
    Inventors: Jongwon Shin, Semi Kim, Kanglae Jung, Jeongin Doh, Jehseon Youn, Kyeogsun Kim
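    The two-tier matching in the abstract above (first identification information obtained from the control object, plus corresponding second identification information) can be sketched as a simple lookup. The control names and alias strings below are hypothetical.

```python
# Hypothetical sketch of matching a recognized phrase against two tiers of
# identification information per control object; not the patented method.

controls = [
    {"id": "btn_play", "first_id": "play", "second_id": "start playback"},
    {"id": "btn_stop", "first_id": "stop", "second_id": "halt playback"},
]

def select_control(spoken):
    spoken = spoken.strip().lower()
    for ctl in controls:
        # Try the label obtained from the control object first, then the
        # corresponding second identification information.
        if spoken in (ctl["first_id"], ctl["second_id"]):
            return ctl["id"]
    return None
```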
  • Publication number: 20150073799
    Abstract: A voice verifying system, which comprises: a microphone, which is always turned on to output at least one voice signal; a speech determining device, for determining if the voice signal is valid or not according to a reference value, wherein the speech determining device passes the voice signal if the voice signal is valid; a verifying module, for verifying a speech signal generated from the voice signal and for outputting a device activating signal to activate a target device if the speech signal matches a predetermined rule; and a reference value generating device, for generating the reference value according to speech signal information from the verifying module.

    Type: Application
    Filed: September 12, 2013
    Publication date: March 12, 2015
    Applicant: Mediatek Inc.
    Inventors: Liang-Che Sun, Yiou-Wen Cheng, Ting-Yuan Chiu
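    The gating idea above (an always-on front end that only passes frames past an adaptive reference value, with feedback from the verifier) can be sketched as follows. The energy measure, margin, and thresholds are purely illustrative assumptions.

```python
# Sketch of the gating idea: frames reach the verifier only when frame
# energy exceeds an adaptive reference value updated from feedback.
# Thresholds and the margin are illustrative, not from the patent.

def frame_energy(samples):
    return sum(s * s for s in samples) / len(samples)

class SpeechGate:
    def __init__(self, reference=0.01):
        self.reference = reference

    def is_valid(self, frame):
        return frame_energy(frame) > self.reference

    def update_reference(self, noise_energy, margin=4.0):
        # Feedback path: raise the gate above the observed noise floor
        self.reference = noise_energy * margin

gate = SpeechGate()
silence = [0.001] * 160       # near-silent frame
speech = [0.5, -0.4] * 80     # energetic frame
```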
  • Publication number: 20150073800
    Abstract: A speaker-verification digital signature system is disclosed that provides greater confidence in communications having digital signatures because a signing party may be prompted to speak a text-phrase that may be different for each digital signature, thus making it difficult for anyone other than the legitimate signing party to provide a valid signature.
    Type: Application
    Filed: June 9, 2014
    Publication date: March 12, 2015
    Inventors: Pradeep K. BANSAL, Lee BEGEJA, Carroll W. CRESWELL, Jeffrey Joseph FARAH, Benjamin J. STERN, Jay WILPON
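    The anti-replay property above rests on issuing a fresh text-phrase for each signature, which might be sketched like this. The word list and phrase length are illustrative assumptions, not details from the patent.

```python
# Sketch: generate a fresh text-phrase per signing request so a replayed
# recording of an earlier signature cannot be reused. Word list is illustrative.
import secrets

WORDS = ["amber", "falcon", "granite", "meadow", "quartz", "thicket"]

def challenge_phrase(n_words=4):
    # secrets (not random) so the phrase is unpredictable to an attacker
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))
```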
  • Patent number: 8977555
    Abstract: Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech (“TTS”) presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition (“ASR”) modules and/or natural language understanding (“NLU”) modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
    Type: Grant
    Filed: December 20, 2012
    Date of Patent: March 10, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Fred Torok, Frédéric Johan Georges Deramat, Vikram Kumar Gundeti
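    The marker flow above (markers delivered with the TTS presentation, then returned with the captured utterance as a hint) might be sketched as follows; the data shapes and resolution rule are hypothetical.

```python
# Illustrative: attach a marker to each portion of a TTS presentation, and
# use the marker returned with the utterance as a hint to resolve an
# anaphoric reference ("play that one"). Names are assumptions.

presentation = [
    {"marker": "item-1", "text": "First, Bohemian Rhapsody."},
    {"marker": "item-2", "text": "Second, Hotel California."},
]

def resolve_reference(utterance, active_marker):
    # The NLU side uses the marker as a hint when the utterance contains
    # a pronoun with no explicit antecedent.
    if "that one" in utterance:
        for item in presentation:
            if item["marker"] == active_marker:
                return item["text"]
    return None
```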
  • Patent number: 8976943
    Abstract: Provided is a method and a telephone-based system with voice-verification capabilities that enable a user to safely and securely conduct transactions with his or her online financial transaction program account over the phone in a convenient and user-friendly fashion, without having to depend on an internet connection.
    Type: Grant
    Filed: September 25, 2012
    Date of Patent: March 10, 2015
    Assignee: Ebay Inc.
    Inventor: Will Tonini
  • Patent number: 8977547
    Abstract: A voice recognition system includes: a voice input unit 11 for inputting a voice uttered a plurality of times; a registering voice data storage unit 12 for storing voice data uttered the plurality of times and input into the voice input unit 11; an utterance stability verification unit 13 for determining a similarity between the voice data uttered the plurality of times that are read from the registering voice data storage unit 12, and determining that registration of the voice data is acceptable when the similarity is greater than a threshold T1; and a standard pattern creation unit 14 for creating a standard pattern by using the voice data where the utterance stability verification unit 13 determines that registration is acceptable.
    Type: Grant
    Filed: October 8, 2009
    Date of Patent: March 10, 2015
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Hiroki Sakashita, Kazuyuki Nogi
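    The stability check above, accept enrollment only when repeated utterances are similar enough, can be sketched with a generic similarity measure. Cosine similarity over feature vectors stands in for whatever measure the patent actually uses, and the threshold is illustrative.

```python
# Sketch of the stability check: compare feature vectors from repeated
# utterances and accept enrollment only when every pairwise similarity
# clears a threshold. Cosine similarity is a stand-in measure.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def registration_acceptable(utterances, threshold=0.9):
    # All pairs of repetitions must be mutually similar
    for i in range(len(utterances)):
        for j in range(i + 1, len(utterances)):
            if cosine(utterances[i], utterances[j]) <= threshold:
                return False
    return True
```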
  • Patent number: 8977549
    Abstract: A natural language business system and method is developed to understand the underlying meaning of a person's speech, such as during a transaction with the business system. The system includes a speech recognition engine, an action classification engine, and a control module. The control module causes the system to execute an inventive method wherein the speech recognition and action classification models may be recursively optimized on an unisolated performance metric that is pertinent to the overall performance of the natural language business system, as opposed to the isolated model-specific criteria previously employed.
    Type: Grant
    Filed: September 26, 2013
    Date of Patent: March 10, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Sabine V. Deligne, Yuqing Gao, Vaibhava Goel, Hong-Kwang Kuo, Cheng Wu
  • Patent number: 8976906
    Abstract: A multi-carrier signal is typically comprised of many equidistant sub-carriers. This results in periodicity of spectrum within the bandwidth of such a multi-carrier signal. An unknown multi-carrier signal with equidistant sub-carriers can thus be sensed together with its sub-carrier spacing by finding a discernible local maximum in the cepstrum (Fourier transform of the log spectrum) of the multi-carrier signal.
    Type: Grant
    Filed: March 29, 2012
    Date of Patent: March 10, 2015
    Assignee: QRC, Inc.
    Inventors: Sinisa Peric, Thomas F. Callahan, III
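    The cepstral detection above can be demonstrated directly: a periodic log spectrum produces a cepstral peak whose quefrency is the inverse of the sub-carrier spacing. The sketch below is a minimal illustration on a synthetic signal, not the patented detector; the low-quefrency cutoff and peak-selection rule are assumptions.

```python
# Sketch of the cepstral approach: the log spectrum of an equidistant
# multi-carrier signal is periodic, so its Fourier transform (the cepstrum)
# peaks at the quefrency 1/spacing. Cutoffs here are illustrative.
import numpy as np

def estimate_subcarrier_spacing(signal, fs, min_q=8):
    power = np.abs(np.fft.fft(signal)) ** 2
    # Cepstrum: inverse transform of the log spectrum
    cepstrum = np.abs(np.fft.ifft(np.log(power + 1e-12)))
    half = len(signal) // 2
    region = cepstrum[min_q:half]       # skip the spectral-envelope region
    peak = region.max()
    # The spacing's quefrency and its multiples peak together; take the
    # first quefrency that (nearly) reaches the global maximum.
    q = min_q + int(np.argmax(region >= 0.99 * peak))
    # Quefrency index q corresponds to q/fs seconds -> spacing fs/q Hz
    return fs / q

# Synthetic multi-carrier signal: 20 sub-carriers spaced 100 Hz apart
fs, n, spacing = 8000, 8000, 100.0
t = np.arange(n) / fs
sig = sum(np.cos(2 * np.pi * k * spacing * t) for k in range(1, 21))
```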
  • Publication number: 20150066509
    Abstract: In a method for encrypting and decrypting a document based on a voiceprint recognition technology on an electronic device, an encryption key is generated and stored in a storage device of the electronic device. A voiceprint is then verified to determine whether it matches a predefined voiceprint. If the voiceprint matches the predefined voiceprint, the encryption key is obtained from the storage device to encrypt a document. When the encrypted document is decrypted, a decryption key is generated to decrypt the encrypted document.
    Type: Application
    Filed: October 22, 2013
    Publication date: March 5, 2015
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (WUHAN) CO., LTD.
    Inventors: SHI-CHAO WANG, WEN-TING PENG, JIAN LI, YI-HUNG PENG
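    The gating logic above, release the stored key only when the presented voiceprint matches the enrolled one, might be sketched as follows. This is a toy: the distance-based matcher and the XOR keystream are illustrative stand-ins, not a real voiceprint system or cipher.

```python
# Toy sketch of the gating logic (NOT a real cipher or voiceprint matcher):
# the stored key is used only when the presented voiceprint is close enough
# to the enrolled one. All thresholds and functions are illustrative.
import hashlib

def voiceprint_matches(presented, enrolled, tol=0.1):
    dist = max(abs(a - b) for a, b in zip(presented, enrolled))
    return dist <= tol

def xor_stream(data, key):
    # Symmetric toy transform: applying it twice restores the input
    stream = hashlib.sha256(key).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(d ^ s for d, s in zip(data, stream))

def encrypt_if_verified(document, presented, enrolled, key):
    if not voiceprint_matches(presented, enrolled):
        return None
    return xor_stream(document, key)
```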
  • Publication number: 20150066508
    Abstract: An apparatus, system, and computer readable media for data pre-processing and processing for voice recognition are described herein. The apparatus includes logic to pre-process multi-channel audio data and logic to resolve a source location. The apparatus also includes logic to perform wide range adaptive beam forming, and logic to perform full voice recognition.
    Type: Application
    Filed: August 30, 2013
    Publication date: March 5, 2015
    Inventor: Gangatharan Jothiswaran
  • Publication number: 20150064666
    Abstract: The present disclosure relates to a control terminal, comprising: a data communication unit for receiving a first user voice by data communication with a first audio device and receiving a second user voice by data communication with a second audio device; a turn information generating unit for generating turn information, which is voice unit information, by using the first and second user voices; and a metalanguage processing unit for determining a conversation pattern of the first and second users by using the turn information, and outputting a reminder message corresponding to a reminder event to the first user when the conversation pattern corresponds to a preset reminder event occurrence condition.
    Type: Application
    Filed: October 7, 2013
    Publication date: March 5, 2015
    Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: June Hwa Song, In Seok Hwang, Chung Kuk Yoo, Chan You Hwang, Young Ki Lee, John Dong Jun Kim, Dong Sun Jennifer Yim, Chul Hong Min
  • Patent number: 8972259
    Abstract: A method and system for teaching non-lexical speech effects includes delexicalizing a first speech segment to provide a first prosodic speech signal, and data indicative of the first prosodic speech signal is stored in a computer memory. The first speech segment is audibly played to a language student and the student is prompted to recite the speech segment. The speech uttered by the student in response to the prompt is recorded.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: March 3, 2015
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Patent number: 8972265
    Abstract: A content customization service is disclosed. The content customization service may identify one or more speakers in an item of content, and map one or more portions of the item of content to a speaker. A speaker may also be mapped to a voice. In one embodiment, the content customization service obtains portions of audio content synchronized to the mapped portions of the item of content. Each portion of audio content may be associated with a voice to which the speaker of the portion of the item of content is mapped. These portions of audio content may be combined to produce a combined item of audio content with multiple voices.
    Type: Grant
    Filed: June 18, 2012
    Date of Patent: March 3, 2015
    Assignee: Audible, Inc.
    Inventor: Kevin S. Lester
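    The mapping chain above (portions of content mapped to speakers, speakers mapped to voices, portions combined in order) might be sketched like this; the rendering stand-in and data shapes are hypothetical.

```python
# Sketch of the mapping chain: portions of a book map to speakers, speakers
# map to voices, and the per-portion audio is combined in order. The render
# function stands in for an audio lookup keyed by (text, voice).

portions = [
    {"text": "Call me Ishmael.", "speaker": "narrator"},
    {"text": "'Ahoy!'", "speaker": "captain"},
]
voice_for_speaker = {"narrator": "voice_a", "captain": "voice_b"}

def render(portion):
    voice = voice_for_speaker[portion["speaker"]]
    return f"[{voice}] {portion['text']}"

combined = " ".join(render(p) for p in portions)
```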
  • Patent number: 8972258
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: May 22, 2014
    Date of Patent: March 3, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
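    The storage-saving idea above, keeping only a small fraction of adapted parameters as a delta against the baseline model, can be sketched without the MAP math. The keep-fraction and the magnitude-based selection rule are illustrative assumptions, not Nuance's actual criterion.

```python
# Sketch of the sparseness idea (not the patented MAP adaptation): store
# only the largest-magnitude per-parameter changes from adaptation as a
# small delta instead of a full acoustic model.
import numpy as np

def sparse_delta(baseline, adapted, keep_fraction=0.1):
    delta = adapted - baseline
    k = max(1, int(keep_fraction * delta.size))
    # Indices of the k largest-magnitude changes
    idx = np.argsort(np.abs(delta))[-k:]
    sparse = np.zeros_like(delta)
    sparse[idx] = delta[idx]
    return sparse

def apply_delta(baseline, sparse):
    return baseline + sparse
```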
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
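    The frequency-based split above routes training utterances to one of two language models, which can be sketched directly. The threshold value is illustrative; following the abstract, counts exceeding it go to the grammar-based model and the rest to the statistical model.

```python
# Sketch of the split: count utterances, route frequent ones to the
# grammar-based model's training set and the rest to the statistical
# model's. The threshold is illustrative.
from collections import Counter

def split_training_data(utterances, threshold=3):
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]   # grammar-based
    low = [u for u, c in counts.items() if c <= threshold]   # statistical
    return high, low
```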
  • Patent number: 8972855
    Abstract: A method and apparatus for providing case restoration in a communication network are disclosed. For example, the method obtains one or more content sources from one or more information feeds, and extracts textual information from the one or more content sources obtained from the one or more information feeds. The method then creates or updates a capitalization model based on the textual information.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: March 3, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Zhu Liu, David Gibbon, Behzad Shahraray
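    A capitalization model of the kind described above can be sketched as simple truecasing: learn each word's most frequent surface form from feed text, then restore case in lowercased input. The unigram model below is a minimal stand-in for whatever model the patent builds.

```python
# Sketch of a capitalization (truecasing) model: learn each word's most
# frequent surface form from feed text, then restore case in lowercased
# input. A unigram model is an illustrative simplification.
from collections import Counter, defaultdict

def train_case_model(sentences):
    forms = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split():
            forms[token.lower()][token] += 1
    # Map each lowercase word to its most frequent observed casing
    return {w: c.most_common(1)[0][0] for w, c in forms.items()}

def restore_case(text, model):
    return " ".join(model.get(t, t) for t in text.split())

feed = [
    "AT&T reported earnings in New York",
    "New York traffic was heavy",
]
model = train_case_model(feed)
```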