Subportions Patents (Class 704/249)
-
Patent number: 8812318
Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
Type: Grant
Filed: February 6, 2012
Date of Patent: August 19, 2014
Assignee: III Holdings 1, LLC
Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
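The one-to-many matching step described in this abstract could be sketched as follows. The fixed-length voice-print vectors, the cosine-similarity metric, and the 0.85 threshold are all hypothetical illustration choices, not details taken from the patent:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two fixed-length voice-print vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def find_matches(customer_print, known_prints, threshold=0.85):
    """Return IDs of known voice prints likely from the same person
    as the customer's voice print (one-to-many comparison)."""
    return [pid for pid, vec in known_prints.items()
            if cosine_similarity(customer_print, vec) >= threshold]
```

Any returned ID could then drive a downstream decision such as authorizing or declining the requested transaction.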
-
Publication number: 20140214425
Abstract: A voice recognition apparatus and a method for providing response information are provided. The voice recognition apparatus according to the present disclosure includes an extractor configured to extract a first utterance element representing a user action and a second utterance element representing an object from a user's utterance voice signal; a domain determiner configured to detect an expansion domain related to the extracted first and second utterance elements based on a hierarchical domain model, and determine at least one candidate domain related to the detected expansion domain as a final domain; a communicator which performs communication with an external apparatus; and a controller configured to control the communicator to transmit information regarding the first and second utterance elements and information regarding the determined final domain.
Type: Application
Filed: January 31, 2014
Publication date: July 31, 2014
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Eun-sang BAK, Kyung-duk KIM, Myung-jae KIM, Yu LIU, Seong-han RYU, Geun-bae LEE
-
Publication number: 20140195236
Abstract: In one embodiment, a computer system stores speech data for a plurality of speakers, where the speech data includes a plurality of feature vectors and, for each feature vector, an associated sub-phonetic class. The computer system then builds, based on the speech data, an artificial neural network (ANN) for modeling speech of a target speaker in the plurality of speakers, where the ANN is configured to discriminate between instances of sub-phonetic classes uttered by the target speaker and instances of sub-phonetic classes uttered by other speakers in the plurality of speakers.
Type: Application
Filed: January 10, 2013
Publication date: July 10, 2014
Applicant: Sensory, Incorporated
Inventors: John-Paul Hosom, Pieter J. Vermeulen, Jonathan Shaw
-
Publication number: 20140195237
Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
Type: Application
Filed: January 9, 2014
Publication date: July 10, 2014
Applicant: APPLE INC.
Inventors: Jerome R. BELLEGARDA, Kim E. A. SILVERMAN
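A minimal sketch of the final threshold comparison in this abstract, under the hypothetical assumption that each speaker-specific distribution value is a (mean, standard deviation) pair and the threshold limit is measured in standard deviations:

```python
def within_threshold(characteristic_units, distribution_values, limit=2.0):
    """Authenticate the speech signal only if every characteristic
    unit lies within `limit` standard deviations of its trained
    speaker-specific distribution value (illustrative criterion)."""
    for unit, (mean, std) in zip(characteristic_units, distribution_values):
        if abs(unit - mean) > limit * std:
            return False   # this unit falls outside the trained distribution
    return True
```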
-
Patent number: 8768706
Abstract: Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
Type: Grant
Filed: August 20, 2010
Date of Patent: July 1, 2014
Assignee: Multimodal Technologies, LLC
Inventors: Kjell Schubert, Juergen Fritsch, Michael Finke, Detlef Koll
-
Patent number: 8762149
Abstract: The present invention refers to a method for verifying the identity of a speaker based on the speaker's voice, comprising the steps of: a) receiving a voice utterance; b) using biometric voice data to verify (10) that the speaker's voice corresponds to the speaker whose identity is to be verified, based on the received voice utterance; c) verifying (12, 13) that the received voice utterance is not falsified, preferably after having verified the speaker's voice; and d) accepting (16) the speaker's identity in case both verification steps give a positive result, and not accepting (15) it if either verification step gives a negative result. The invention further refers to a corresponding computer-readable medium and a computer.
Type: Grant
Filed: December 10, 2008
Date of Patent: June 24, 2014
Inventors: Marta Sánchez Asenjo, Alfredo Gutiérrez Navarro, Alberto Martín de los Santos de las Heras, Marta García Gomar
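The accept/reject logic of steps b) through d) can be sketched as below; `voice_check` and `liveness_check` are hypothetical stand-ins for the biometric verification and the falsification (anti-spoofing) verification, which the patent does not name this way:

```python
def verify_speaker(utterance, voice_model, voice_check, liveness_check):
    """Accept the asserted identity only when both verification
    steps succeed (step d of the claimed method)."""
    matches = voice_check(utterance, voice_model)   # step b: biometric match
    genuine = liveness_check(utterance)             # step c: not falsified
    return matches and genuine
```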
-
Patent number: 8751233
Abstract: A speaker-verification digital signature system is disclosed that provides greater confidence in communications having digital signatures because a signing party may be prompted to speak a text-phrase that may be different for each digital signature, thus making it difficult for anyone other than the legitimate signing party to provide a valid signature.
Type: Grant
Filed: July 31, 2012
Date of Patent: June 10, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Pradeep K. Bansal, Lee Begeja, Carroll W. Creswell, Jeffrey Farah, Benjamin J. Stern, Jay Wilpon
-
Publication number: 20140142943
Abstract: A signal processing device includes a processor and a memory which stores a plurality of instructions which, when executed by the processor, cause the processor to execute: receiving speech of a speaker as a first signal; detecting an expiration period included in the first signal; extracting a number of phonemes included in the expiration period; and controlling a second signal, which is output to the speaker, on the basis of the number of phonemes and a length of the expiration period.
Type: Application
Filed: October 15, 2013
Publication date: May 22, 2014
Applicant: FUJITSU LIMITED
Inventors: Chisato Ishikawa, Taro TOGAWA, Takeshi OTANI, Masanao SUZUKI
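The phonemes-per-expiration-period measure can be sketched as a simple speech-rate estimate; the rate thresholds and the gain adjustments applied to the second signal below are hypothetical, since the abstract only says the output is "controlled" on the basis of these quantities:

```python
def speech_rate(num_phonemes, expiration_seconds):
    """Phonemes per second within one expiration (breath) period."""
    return num_phonemes / expiration_seconds

def adjust_output(signal_gain, num_phonemes, expiration_seconds,
                  fast=12.0, slow=6.0):
    """Adjust the gain of the signal output to the speaker based on
    their speech rate (thresholds and factors are illustrative)."""
    rate = speech_rate(num_phonemes, expiration_seconds)
    if rate >= fast:
        return signal_gain * 1.2   # emphasize output for a fast speaker
    if rate <= slow:
        return signal_gain * 0.9   # soften output for a slow speaker
    return signal_gain
```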
-
Patent number: 8731940
Abstract: A method of controlling a system which includes the steps of obtaining at least one signal representative of information communicated by a user via an input device in an environment of the user, wherein a signal from a first source is available in a perceptible form in the environment; estimating at least a point in time when a transition between information flowing from the first source and information flowing from the user is expected to occur; and timing the performance of a function by the system in relation to the estimated time.
Type: Grant
Filed: September 11, 2009
Date of Patent: May 20, 2014
Assignee: Koninklijke Philips N.V.
Inventor: Aki Sakari Harma
-
Publication number: 20140136204
Abstract: Methods and systems are provided for a speech system of a vehicle. In one embodiment, the method includes: generating an utterance signature from a speech utterance received from a user of the speech system without a specific need for a user identification interaction; developing a user signature for a user based on the utterance signature; and managing a dialog with the user based on the user signature.
Type: Application
Filed: October 22, 2013
Publication date: May 15, 2014
Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: RON M. HECHT, OMER TSIMHONI, UTE WINTER, ROBERT D. SIMS, III
-
Publication number: 20140136206
Abstract: Provided are a mash-up service generation apparatus and method based on a voice command. The mash-up service generation apparatus includes a voice recognizer configured to convert a voice command into a character string, a mash-up natural language processor configured to extract a word corresponding to a mash-up module based on the character string, and convert the word into at least one of metadata of the mash-up module and metadata of a mash-up sequence in which a plurality of mash-up modules are combined, and a mash-up sequence processor configured to search for and select a target mash-up sequence corresponding to the metadata of the mash-up sequence, and newly generate the target mash-up sequence. Accordingly, a customized mash-up service can be provided to a user.
Type: Application
Filed: November 12, 2013
Publication date: May 15, 2014
Applicant: Electronics & Telecommunications Research Institute
Inventors: Jae Chul KIM, Seong Ho LEE, Young Jae LIM, Yoon Seop CHANG
-
Publication number: 20140136205
Abstract: Disclosed are a display apparatus, a voice acquiring apparatus and a voice recognition method thereof, the display apparatus including: a display unit which displays an image; a communication unit which communicates with a plurality of external apparatuses; and a controller which includes a voice recognition engine to recognize a user's voice, receives a voice signal from a voice acquiring unit, and controls the communication unit to receive candidate instruction words from at least one of the plurality of external apparatuses to recognize the received voice signal.
Type: Application
Filed: November 11, 2013
Publication date: May 15, 2014
Applicant: Samsung Electronics Co., Ltd.
Inventors: Jong-hyuk JANG, Chan-hee CHOI, Hee-seob RYU, Kyung-mi PARK, Seung-kwon PARK, Jae-hyun BAE
-
Patent number: 8725508
Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
Type: Grant
Filed: March 27, 2012
Date of Patent: May 13, 2014
Assignee: Novospeech
Inventor: Yossef Ben-Ezra
-
Publication number: 20140129219
Abstract: A computer-implemented system and method for masking special data is provided. Speakers of a call recording are identified. The call recording is separated into strands corresponding to each of the speakers. A prompt list of elements that prompt the speaker of the other strand to utter special information is applied to one of the strands. At least one of the elements of the prompt list is identified in the one strand. A special information candidate is identified in the other strand and is located after a location in time where the element was found in the voice recording of the one strand. A confidence score is assigned to the element located in the one strand and to the special information candidate in the other strand. The confidence scores are combined and a threshold is applied. The special information candidate is rendered unintelligible when the combined confidence scores satisfy the threshold.
Type: Application
Filed: November 4, 2013
Publication date: May 8, 2014
Applicant: Intellisist, Inc.
Inventors: Howard M. Lee, Steven Lutz, Gilad Odinak
-
Publication number: 20140129224
Abstract: A method and apparatus for utterance verification are provided for verifying a recognized vocabulary output from speech recognition. The apparatus for utterance verification includes a reference score accumulator, a verification score generator and a decision device. A log-likelihood score obtained from speech recognition is processed by taking a logarithm of the value of the probability of one of feature vectors of an input speech conditioned on one of states of each model vocabulary. A verification score is generated based on the processed result. The verification score is compared with a predetermined threshold value so as to reject or accept the recognized vocabulary.
Type: Application
Filed: December 17, 2012
Publication date: May 8, 2014
Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
Inventor: Shih-Chieh Chien
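The score-then-threshold decision in this abstract can be sketched as follows; averaging the per-frame log-likelihoods and the -1.0 threshold are hypothetical normalization choices, not the patent's exact verification-score formula:

```python
import math

def verification_score(frame_probs):
    """Average log-likelihood of the recognized vocabulary's
    best-state probabilities over the utterance frames."""
    return sum(math.log(p) for p in frame_probs) / len(frame_probs)

def accept(frame_probs, threshold=-1.0):
    """Accept the recognized vocabulary if the verification score
    meets the predetermined threshold; otherwise reject it."""
    return verification_score(frame_probs) >= threshold
```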
-
Publication number: 20140122077
Abstract: A voice agent device includes: a position detection unit which detects a position of a person in a conversation space to which the voice agent device is capable of providing information; a voice volume detection unit which detects a voice volume of the person from a sound signal in the conversation space obtained by a sound acquisition unit; a conversation area determination unit which determines a conversation area as a first area including the position when the voice volume has a first voice volume value and determines the conversation area as a second area including the position and being smaller than the first area when the voice volume has a second voice volume value smaller than the first voice volume value, the conversation area being a spatial range where an utterance of the person can be heard; and an information provision unit which provides provision information to the conversation area.
Type: Application
Filed: October 25, 2013
Publication date: May 1, 2014
Applicant: Panasonic Corporation
Inventors: Yuri NISHIKAWA, Kazunori YAMADA
-
Patent number: 8706499
Abstract: Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.
Type: Grant
Filed: August 16, 2011
Date of Patent: April 22, 2014
Assignee: Facebook, Inc.
Inventors: Matthew Nicholas Papakipos, David Harry Garcia
-
Publication number: 20140095162
Abstract: Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The multi-stage intent extraction approach may have more than two iterations.
Type: Application
Filed: October 1, 2013
Publication date: April 3, 2014
Applicant: Nuance Communications, Inc.
Inventors: Dimitri Kanevsky, Joseph Simon Reisinger, Roberto Sicconi, Mahesh Viswanathan
-
Patent number: 8688451
Abstract: A speech recognition method includes receiving input speech from a user, processing the input speech using a first grammar to obtain parameter values of a first N-best list of vocabulary, comparing a parameter value of a top result of the first N-best list to a threshold value, and if the compared parameter value is below the threshold value, then additionally processing the input speech using a second grammar to obtain parameter values of a second N-best list of vocabulary. Other preferred steps include: determining the input speech to be in-vocabulary if any of the results of the first N-best list is also present within the second N-best list, but out-of-vocabulary if none of the results of the first N-best list is within the second N-best list; and providing audible feedback to the user if the input speech is determined to be out-of-vocabulary.
Type: Grant
Filed: May 11, 2006
Date of Patent: April 1, 2014
Assignee: General Motors LLC
Inventors: Timothy J. Grost, Rathinavelu Chengalvarayan
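The two-grammar decision described here can be sketched directly from the abstract; the 0-to-1 score scale and the function names are illustrative assumptions:

```python
def classify_utterance(first_nbest, first_scores, second_nbest, threshold):
    """Decide in-vocabulary vs. out-of-vocabulary using two grammars.

    If the top first-grammar score meets the threshold, accept it
    without consulting the second grammar; otherwise the utterance is
    in-vocabulary only when the two N-best lists share an entry."""
    if first_scores[0] >= threshold:
        return "in-vocabulary"
    if any(word in second_nbest for word in first_nbest):
        return "in-vocabulary"
    return "out-of-vocabulary"
```

An "out-of-vocabulary" result would then trigger the audible feedback step mentioned at the end of the abstract.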
-
Publication number: 20140081639
Abstract: The communication support device includes: a storing unit configured to store an utterance of a first speaker transmitted from a first terminal as utterance information; an analyzing unit configured to obtain a holding notice which sets communications with the first terminal to a holding state, the communications being transmitted from a second terminal used by a second speaker who communicates with the first speaker, and to analyze features of utterance information which correspond to a time of a holding state; and an instructing unit configured to output to the second terminal determination information on the first speaker based on the features of the utterance information of the first speaker.
Type: Application
Filed: August 30, 2013
Publication date: March 20, 2014
Applicant: FUJITSU LIMITED
Inventors: Naoto KAWASHIMA, Naoto MATSUDAIRA, Yuusuke TOUNAI, Hiroshi YOSHIDA, Shingo HIRONO
-
Publication number: 20140081638
Abstract: The invention refers to a method for comparing voice utterances, the method comprising the steps: extracting a plurality of features (201) from a first voice utterance of a given text sample and extracting a plurality of features (201) from a second voice utterance of said given text sample, wherein each feature is extracted as a function of time, and wherein each feature of the second voice utterance corresponds to a feature of the first voice utterance; applying dynamic time warping (202) to one or more time dependent characteristics of the first and/or second voice utterance e.g.
Type: Application
Filed: December 10, 2009
Publication date: March 20, 2014
Inventors: Jesus Antonio Villalba Lopez, Alfonso Ortega Gimenez, Eduardo Lleida Solano, Sara Varela Redondo, Marta Garcia Gomar
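Dynamic time warping, the core of step (202), can be sketched on a one-dimensional time-dependent feature track (e.g. a pitch contour). This is the textbook DTW formulation, an assumption rather than the patent's exact variant:

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D feature tracks,
    allowing non-linear alignment in time between the utterances."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # step both tracks
    return cost[n][m]
```

A small distance indicates the two utterances of the same text sample have matching time courses, even when one was spoken more slowly.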
-
Publication number: 20140081640
Abstract: One aspect includes determining validity of an identity asserted by a speaker using a voice print associated with a user whose identity the speaker is asserting, the voice print obtained from characteristic features of at least one first voice signal obtained from the user uttering at least one enrollment utterance including at least one enrollment word by obtaining a second voice signal of the speaker uttering at least one challenge utterance that includes at least one word not in the at least one enrollment utterance, obtaining at least one characteristic feature from the second voice signal, comparing the at least one characteristic feature with at least a portion of the voice print to determine a similarity between the at least one characteristic feature and the at least a portion of the voice print, and determining whether the speaker is the user based, at least in part, on the similarity.
Type: Application
Filed: November 21, 2013
Publication date: March 20, 2014
Applicant: Nuance Communications, Inc.
Inventors: Kevin R. Farrell, David A. James, William F. Ganong, III, Jerry K. Carter
-
Patent number: 8676580
Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
Type: Grant
Filed: August 16, 2011
Date of Patent: March 18, 2014
Assignee: International Business Machines Corporation
Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
-
Publication number: 20140046666
Abstract: According to an embodiment, an information processing apparatus includes a dividing unit, an assigning unit, and a generating unit. The dividing unit is configured to divide speech data into pieces of utterance data. The assigning unit is configured to assign speaker identification information to each piece of utterance data based on an acoustic feature of each piece of utterance data. The generating unit is configured to generate a candidate list that indicates candidate speaker names so as to enable a user to determine a speaker name to be given to the piece of utterance data identified by instruction information, based on operation history information in which at least pieces of utterance identification information, pieces of the speaker identification information, and speaker names given by the user to the respective pieces of utterance data are associated with one another.
Type: Application
Filed: August 6, 2013
Publication date: February 13, 2014
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Osamu Nishiyama, Taira Ashikawa, Tomoo Ikeda, Kouji Ueno, Kouta Nakata
-
Publication number: 20140039893
Abstract: Disclosed embodiments provide for personalizing a voice user interface of a remote multi-user service. A voice user interface for the remote multi-user service can be provided and voice information from an identified user can be received at the multi-user service through the voice user interface. A language model specific to the identified user can be retrieved that models one or more language elements. The retrieved language model can be applied to interpret the received voice information, and a response can be generated by the multi-user service in response to the interpreted voice information.
Type: Application
Filed: July 31, 2012
Publication date: February 6, 2014
Applicant: SRI INTERNATIONAL
Inventor: Steven Weiner
-
Patent number: 8639508
Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
Type: Grant
Filed: February 14, 2011
Date of Patent: January 28, 2014
Assignee: General Motors LLC
Inventors: Xufang Zhao, Gaurav Talwar
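The user-specific confidence threshold could be sketched as a base threshold adjusted by detected characteristics; the characteristic names and adjustment values below are purely illustrative, not taken from the patent:

```python
def user_threshold(base_threshold, characteristics):
    """Derive a user-specific confidence threshold from
    characteristics detected in the acoustic data."""
    adjustments = {
        "heavy_accent": -0.05,   # be more permissive
        "noisy_channel": -0.05,
        "clear_speech": +0.05,   # be stricter
    }
    t = base_threshold + sum(adjustments.get(c, 0.0) for c in characteristics)
    return max(0.0, min(1.0, t))   # clamp to a valid confidence range

def recognize(confidence, characteristics, base_threshold=0.6):
    """Accept the hypothesis only if its confidence clears the
    user-specific threshold."""
    return confidence >= user_threshold(base_threshold, characteristics)
```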
-
Publication number: 20140025377
Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
Type: Application
Filed: August 10, 2012
Publication date: January 23, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Fernando Luiz Koch, Julio Nogima
-
Publication number: 20140025378
Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.
Type: Application
Filed: September 24, 2013
Publication date: January 23, 2014
Applicant: Google Inc.
Inventors: Petar Aleksic, Xin Lei
-
Patent number: 8630860
Abstract: Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
Type: Grant
Filed: March 3, 2011
Date of Patent: January 14, 2014
Assignee: Nuance Communications, Inc.
Inventors: Shilei Zhang, Shenghua Bao, Wen Liu, Yong Qin, Zhiwei Shuang, Jian Chen, Zhong Su, Qin Shi, William F. Ganong, III
-
Publication number: 20140012576
Abstract: A signal processing method includes separating a mixed sound signal in which a plurality of excitations are mixed into the respective excitations, and performing speech detection on the plurality of separated excitation signals, judging whether or not the plurality of excitation signals are speech and generating speech section information indicating speech/non-speech information for each excitation signal. The signal processing method also includes at least one of calculating and analyzing an utterance overlap duration using the speech section information for combinations of the plurality of excitation signals and of calculating and analyzing a silence duration. The signal processing method further includes calculating a degree of establishment of a conversation based on the extracted utterance overlap duration or the silence duration.
Type: Application
Filed: June 26, 2013
Publication date: January 9, 2014
Applicant: PANASONIC CORPORATION
Inventors: Maki YAMADA, Mitsuru ENDO, Koichiro MIZUSHIMA
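The utterance-overlap and silence measures can be sketched from per-frame speech/non-speech flags; representing the speech section information as parallel boolean frame sequences is an assumption made for illustration:

```python
def overlap_and_silence(sections_a, sections_b, frame_count):
    """Given per-frame speech flags for two separated excitation
    signals, return (overlap_frames, silence_frames)."""
    overlap = silence = 0
    for t in range(frame_count):
        a, b = sections_a[t], sections_b[t]
        if a and b:
            overlap += 1   # both speakers talking at once
        elif not a and not b:
            silence += 1   # nobody talking
    return overlap, silence
```

A conversation between the two excitations would then be judged "established" when, for example, overlap stays short and silences are filled by alternating turns.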
-
Publication number: 20140012577
Abstract: The system and method described herein may use various natural language models to deliver targeted advertisements and track advertisement interactions in voice recognition contexts. In particular, in response to an input device receiving an utterance, a conversational language processor may select and deliver one or more advertisements targeted to a user that spoke the utterance based on cognitive models associated with the user, various users having similar characteristics to the user, an environment in which the user spoke the utterance, or other criteria. Further, subsequent interaction with the targeted advertisements may be tracked to build and refine the cognitive models and thereby enhance the information used to deliver targeted advertisements in response to subsequent utterances.
Type: Application
Filed: September 3, 2013
Publication date: January 9, 2014
Applicant: VoiceBox Technologies Corporation
Inventors: Tom Freeman, Mike Kennewick
-
Publication number: 20140012575
Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence.
Type: Application
Filed: July 9, 2012
Publication date: January 9, 2014
Applicant: Nuance Communications, Inc.
Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
-
Patent number: 8626505
Abstract: A computer implemented method, system, and/or computer program product generates an audio cohort. Audio data from a set of audio sensors is received by an audio analysis engine. The audio data, which is associated with a plurality of objects, comprises a set of audio patterns. The audio data is processed to identify audio attributes associated with the plurality of objects to form digital audio data. This digital audio data comprises metadata that describes the audio attributes of the set of objects. A set of audio cohorts is generated using the audio attributes associated with the digital audio data and cohort criteria, where each audio cohort in the set of audio cohorts is a cohort of accompanied customers in a store, and where processing the audio data identifies a type of zoological creature that is accompanying each of the accompanied customers.
Type: Grant
Filed: September 6, 2012
Date of Patent: January 7, 2014
Assignee: International Business Machines Corporation
Inventors: Robert L. Angell, Robert R. Friedlander, James R. Kraemer
-
Patent number: 8626508
Abstract: Provided are a speech search device and a speech search method that perform fuzzy search with very fast search speed and excellent search performance. In addition to fuzzy search, the distance between phoneme discrimination features included in the speech data is calculated to determine similarity to the speech, using both a suffix array and dynamic programming. The search space is narrowed by dividing the search keyword on phoneme boundaries and applying search thresholds to each of the divided search keywords; the search is repeated while increasing the search thresholds in order, and whether or not to divide the keyword is determined according to the length of the search keyword. This implements speech search whose search speed is very fast and whose search performance is also excellent.
Type: Grant
Filed: February 10, 2010
Date of Patent: January 7, 2014
Assignee: National University Corporation TOYOHASHI UNIVERSITY OF TECHNOLOGY
Inventors: Koichi Katsurada, Tsuneo Nitta, Shigeki Teshima
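One way to realize the dynamic-programming similarity step is a plain phoneme edit distance; the fixed-length windowing in `fuzzy_search` below is a deliberate simplification and not the patent's suffix-array scheme:

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences, computed
    by dynamic programming over a rolling row."""
    n, m = len(a), len(b)
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
        prev = cur
    return prev[m]

def fuzzy_search(keyword, phoneme_stream, threshold):
    """Return start offsets where the keyword matches the stream
    within `threshold` edits (fixed-length windows for simplicity)."""
    k = len(keyword)
    return [i for i in range(len(phoneme_stream) - k + 1)
            if edit_distance(keyword, phoneme_stream[i:i + k]) <= threshold]
```

Re-running `fuzzy_search` with a gradually increasing `threshold` mirrors the abstract's strategy of repeating the search while loosening the search thresholds in order.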
-
Patent number: 8626507
Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes and generates one or more models to be used by one or more mode recognizers to recognize the other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Type: Grant
Filed: November 30, 2012
Date of Patent: January 7, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Michael J. Johnston
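The cross-mode compensation described above can be illustrated with a toy rescorer: a recognized gesture selects word priors that re-weight the speech recognizer's hypotheses. The gesture labels, words, and boost values below are all invented, and a dictionary of scalar boosts stands in for the lattice-derived language model of the patent.

```python
# Hypothetical sketch: a recognized gesture conditions the "language model"
# used to rescore speech hypotheses (labels and boosts are made up).

GESTURE_LM = {
    "point_at_map": {"zoom": 2.0, "here": 1.5},
    "circle": {"select": 2.0, "area": 1.5},
}

def rescore(speech_hypotheses, gesture):
    """Rescore (word, acoustic_score) pairs with gesture-conditioned priors
    and return the best word."""
    boosts = GESTURE_LM.get(gesture, {})
    scored = [(word, score * boosts.get(word, 1.0))
              for word, score in speech_hypotheses]
    return max(scored, key=lambda ws: ws[1])[0]
```

With the gesture present, an acoustically weaker but contextually likely word can overtake the acoustic best path; without it, the acoustic scores decide alone.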
-
Patent number: 8612225
Abstract: A voice recognition device that recognizes the voice of an input voice signal comprises: a voice model storage unit that stores in advance a predetermined voice model having a plurality of detail levels, the detail levels being information indicating a feature property of a voice for the voice model; a detail level selection unit that selects the detail level closest to a feature property of the input voice signal from the detail levels stored in the voice model storage unit; and a parameter setting unit that sets parameters for recognizing the input voice according to the detail level selected by the detail level selection unit.
Type: Grant
Filed: February 26, 2008
Date of Patent: December 17, 2013
Assignee: NEC Corporation
Inventors: Takayuki Arakawa, Ken Hanazawa, Masanori Tsujikawa
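The detail-level selection step reduces to a nearest-neighbor lookup. A minimal sketch, assuming scalar feature values (the patent's feature properties would be richer):

```python
# Toy sketch of detail-level selection: pick the stored level whose feature
# value is closest to the input signal's (scalars are a simplification).

def select_detail_level(input_feature, detail_levels):
    """detail_levels: mapping of level name -> stored feature value."""
    return min(detail_levels,
               key=lambda lvl: abs(detail_levels[lvl] - input_feature))
```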
-
Patent number: 8612224
Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are upd…
Type: Grant
Filed: August 23, 2011
Date of Patent: December 17, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Catherine Breslin, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
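The compare-then-update loop over stored speaker profiles can be sketched as below. Euclidean distance over a small parameter vector stands in for the patent's segment parameters, and a running average stands in for its profile update; both are illustrative assumptions.

```python
# Minimal sketch: per-segment speaker selection against stored profiles,
# followed by an incremental profile update (toy distance and update rule).
import math

def nearest_profile(segment_params, profiles):
    """Return the profile id whose stored vector is closest to the segment's."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(profiles, key=lambda pid: dist(segment_params, profiles[pid]))

def update_profile(profiles, pid, segment_params, rate=0.1):
    """Move the selected profile toward the new segment (running average)."""
    profiles[pid] = tuple(p + rate * (s - p)
                          for p, s in zip(profiles[pid], segment_params))
```

Because each segment both selects and refines a profile, the profiles track their speakers more closely as the recording proceeds, which is what makes the later adapted decoding pass useful.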
-
Publication number: 20130325473
Abstract: Embodiments of systems and methods for speaker verification are provided. In various embodiments, a method includes receiving an utterance from a speaker and determining a text-independent speaker verification score and a text-dependent speaker verification score in response to the utterance. Various embodiments include a system for speaker verification, the system comprising an audio receiving device for receiving an utterance from a speaker and converting the utterance to an utterance signal, and a processor coupled to the audio receiving device for determining speaker verification in response to the utterance signal, wherein the processor determines speaker verification in response to a UBM-independent speaker-normalized score.
Type: Application
Filed: May 23, 2013
Publication date: December 5, 2013
Applicant: Agency for Science, Technology and Research
Inventors: Anthony Larcher, Kong Aik Lee, Bin Ma, Thai Ngoc Thuy Huong
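Determining both a text-dependent and a text-independent score invites a fusion step. A hedged sketch, where the linear weighting and the decision threshold are invented for illustration and are not claimed by the publication:

```python
# Toy fusion of text-independent (TI) and text-dependent (TD) verification
# scores into a single accept/reject decision (weights are hypothetical).

def fused_score(ti_score, td_score, alpha=0.5):
    """Linear fusion of the two speaker-verification scores."""
    return alpha * ti_score + (1 - alpha) * td_score

def verify(ti_score, td_score, threshold=0.6):
    """Accept the speaker when the fused score clears the threshold."""
    return fused_score(ti_score, td_score) >= threshold
```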
-
Publication number: 20130289992
Abstract: A voice recognition method includes: detecting a vocal section including a vocal sound in a voice, based on a feature value of an audio signal representing the voice; identifying a word expressed by the vocal sound in the vocal section, by matching the feature value of the audio signal of the vocal section and an acoustic model of each of a plurality of words; and selecting, with a processor, the word expressed by the vocal sound in a word section based on a comparison result between a signal characteristic of the word section and a signal characteristic of the vocal section.
Type: Application
Filed: March 18, 2013
Publication date: October 31, 2013
Applicant: FUJITSU LIMITED
Inventor: Shouji HARADA
-
Patent number: 8571865
Abstract: Systems, methods performed by data processing apparatus and computer storage media encoded with computer programs for receiving information relating to (i) a communication device that has received an utterance and (ii) a voice associated with the received utterance, comparing the received voice information with voice signatures in a comparison group, the comparison group including one or more individuals identified from one or more connections arising from the received information relating to the communication device, attempting to identify the voice associated with the utterance as matching one of the individuals in the comparison group, and based on a result of the attempt to identify, selectively providing the communication device with access to one or more resources associated with the matched individual.
Type: Grant
Filed: August 10, 2012
Date of Patent: October 29, 2013
Assignee: Google Inc.
Inventor: Philip Hewinson
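The key idea here is narrowing the match to a comparison group derived from the device's connections, rather than matching against all known voices. An illustrative sketch, where the connection table, signature vectors, and tolerance are all made-up stand-ins:

```python
# Hypothetical sketch: voice matching restricted to a comparison group
# built from the device's known connections (all data below is invented).

CONNECTIONS = {"device-42": ["alice", "bob"]}
SIGNATURES = {"alice": (0.1, 0.9), "bob": (0.8, 0.2), "carol": (0.5, 0.5)}

def match_in_group(device_id, voice_vec, tolerance=0.2):
    """Compare the voice only against signatures of the device's connections;
    return the matched individual, or None."""
    for person in CONNECTIONS.get(device_id, []):
        sig = SIGNATURES[person]
        if all(abs(a - b) <= tolerance for a, b in zip(sig, voice_vec)):
            return person  # grant access to this individual's resources
    return None
```

Note that carol is never considered for device-42, however close her signature is: shrinking the candidate set is what makes the one-to-few comparison cheap and less error-prone than one-to-all.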
-
Patent number: 8571867
Abstract: A method (700) and system (900) for authenticating a user is provided. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to one or more spoken utterances (704), identifying a biometric voice print of the user from one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). A location of the handset or the user can be employed as criteria for granting access to one or more resources (712).
Type: Grant
Filed: September 13, 2012
Date of Patent: October 29, 2013
Assignee: Porticus Technology, Inc.
Inventors: Germano Di Mambro, Bernardas Salna
-
Publication number: 20130268273
Abstract: A method of recognizing the gender or age of a speaker according to speech emotion or arousal includes the following steps: A) segmenting speech signals into a plurality of speech segments; B) fetching the first speech segment from the plural segments to acquire at least one of an emotional feature or an arousal degree for the segment; C) determining whether at least one of the emotional feature and the arousal degree conforms to a predetermined condition; if yes, proceeding to step D); if not, returning to step B) and fetching the next segment; D) extracting the feature indicative of gender or age from the segment to acquire at least one feature parameter; and E) recognizing the at least one feature parameter to determine the gender or age of the speaker for the currently processed segment.
Type: Application
Filed: July 27, 2012
Publication date: October 10, 2013
Inventors: Oscal Tzyh-Chiang Chen, Ping-Tsung Lu, Jia-You Ke
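The A-E flow above is a gated pipeline: only segments whose emotion/arousal passes the check in step C reach the gender/age classifier. A toy sketch, where the arousal gate value and the pitch-based classification rule are invented placeholders for the publication's feature parameters:

```python
# Sketch of the A-E flow: segments failing the arousal gate (step C) are
# skipped; the rest get a toy gender label (steps D-E). All thresholds
# and features here are hypothetical.

def classify(segments, arousal_gate=0.5):
    """segments: iterable of (segment_id, arousal, pitch_hz).
    Return (segment_id, label) for segments passing the gate."""
    results = []
    for seg_id, arousal, pitch_hz in segments:
        if arousal < arousal_gate:          # step C: reject, fetch next
            continue
        label = "female" if pitch_hz > 165 else "male"  # toy D-E rule
        results.append((seg_id, label))
    return results
```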
-
Patent number: 8554566
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: November 29, 2012
Date of Patent: October 8, 2013
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8548807
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
Type: Grant
Filed: June 9, 2009
Date of Patent: October 1, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
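The "weighted sum of acoustic models" step is essentially a mixture: the likelihood of an observation under a dictionary phoneme becomes a weighted combination of the likelihoods under all plausible phonemes. A minimal sketch with one-dimensional Gaussians standing in for the real acoustic models (the weights and parameters below are made up):

```python
# Toy sketch: a dictionary phoneme's acoustic model restructured as a
# weighted sum of Gaussian models for the plausible phonemes.
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (a stand-in for a real acoustic model)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_likelihood(x, components):
    """components: list of (weight, mean, var), one per plausible phoneme;
    weights should sum to 1."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)
```

The dictionary entry itself is untouched; only the acoustic space behind each phoneme symbol is widened, which is why native-speech recognition is not rebuilt from scratch.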
-
Patent number: 8532989
Abstract: A command recognition device includes: an utterance understanding unit that determines or selects word sequence information from speech information; a speech confidence degree calculating unit that calculates a degree of speech confidence based on the speech information and the word sequence information; a phrase confidence degree calculating unit that calculates a degree of phrase confidence based on image information and phrase information included in the word sequence information; and a motion control instructing unit that determines whether a command of the word sequence information should be executed based on the degree of speech confidence and the degree of phrase confidence.
Type: Grant
Filed: September 2, 2010
Date of Patent: September 10, 2013
Assignee: Honda Motor Co., Ltd.
Inventors: Kotaro Funakoshi, Mikio Nakano, Xiang Zuo, Naoto Iwahashi, Ryo Taguchi
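The motion control instructing unit's decision can be sketched as a simple two-gate check over the two confidence degrees. The threshold values are invented for illustration; the patent does not specify how the two degrees are combined.

```python
# Hedged sketch: execute a recognized command only when both the speech
# confidence and the phrase confidence clear their (hypothetical) thresholds.

def should_execute(speech_conf, phrase_conf,
                   speech_thresh=0.7, phrase_thresh=0.6):
    """Combine the two confidence degrees into an execute/reject decision."""
    return speech_conf >= speech_thresh and phrase_conf >= phrase_thresh
```

Requiring both gates to pass means a command is rejected when either the audio evidence or the visual/phrase evidence is weak, which is the conservative choice for robot motion control.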
-
Patent number: 8521527
Abstract: A computer-implemented system and method for processing audio in a voice response environment is provided. A database of host scripts, each comprising signature files of audio phrases and actions to take when one of the audio phrases is recognized, is maintained. The host scripts are loaded and a call to a voice mail server is initiated. Incoming audio buffers are received during the call from voice messages stored on the voice mail server. The incoming audio buffers are processed. A signature data structure is created for each audio buffer. The signature data structure is compared with signatures of expected phrases in the host scripts. The actions stored in the host scripts are executed when the signature data structure matches the signature of the expected phrase.
Type: Grant
Filed: September 10, 2012
Date of Patent: August 27, 2013
Assignee: Intellisist, Inc.
Inventor: Martin R. M. Dunsmuir
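The signature-lookup-then-action loop can be illustrated with a toy host script. A content hash stands in for the patent's audio signature data structure (a real audio signature would tolerate acoustic variation, which an exact hash does not), and the phrases and action names are invented:

```python
# Illustrative sketch of a host script: each expected phrase maps, via its
# signature, to an action to run on a match (hashing is a toy stand-in).
import hashlib

def signature(audio_bytes):
    """Stand-in signature: a hash of the buffer's contents."""
    return hashlib.sha256(audio_bytes).hexdigest()

HOST_SCRIPT = {
    signature(b"press one for voicemail"): "send_dtmf_1",
    signature(b"enter your password"): "send_password",
}

def process_buffer(audio_bytes):
    """Look up the buffer's signature and return the scripted action, if any."""
    return HOST_SCRIPT.get(signature(audio_bytes))
```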
-
Patent number: 8515753
Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. To adapt the acoustic models, pronunciation variations are first examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, the acoustic models are adapted in a state-tying step during the acoustic model training process. When this acoustic model adaptation is combined with a conventional adaptation scheme, recognition performance improves further still. The example embodiment enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
Type: Grant
Filed: March 30, 2007
Date of Patent: August 20, 2013
Assignee: Gwangju Institute of Science and Technology
Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
-
Patent number: 8489397
Abstract: A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.
Type: Grant
Filed: September 11, 2012
Date of Patent: July 16, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Charles David Caldwell, John Bruce Harlow, Robert J. Sayko, Norman Shaye
-
Publication number: 20130173268
Abstract: A method for verifying that a person is registered to use a telemedical device includes identifying an unprompted trigger phrase in words spoken by a person and received by the telemedical device. The telemedical device prompts the person to state the name of a registered user and optionally prompts the person to state health tips for the person. The telemedical device verifies that the person is the registered user using utterance data generated from the unprompted trigger phrase, the name of the registered user, and the health tips.
Type: Application
Filed: December 29, 2011
Publication date: July 4, 2013
Applicant: Robert Bosch GmbH
Inventors: Fuliang Weng, Taufiq Hasan, Zhe Feng
-
Publication number: 20130166283
Abstract: A phoneme rule generating apparatus includes a spectrum analyzer configured to analyze pronunciation patterns of voices included in a plurality of voice data, a clusterer configured to cluster the plurality of voice data based on the analyzed pronunciation patterns, a voice group generator configured to generate voice groups from the clustered voice data, a phoneme rule generator configured to generate a phoneme rule corresponding to each respective voice group from among the generated voice groups, and a group mapping DB configured to store the generated voice groups and the generated phoneme rules for accurate voice recognition.
Type: Application
Filed: December 26, 2012
Publication date: June 27, 2013
Applicant: KT CORPORATION
Inventor: KT Corporation