Subportions Patents (Class 704/249)
  • Patent number: 8812318
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Grant
    Filed: February 6, 2012
    Date of Patent: August 19, 2014
    Assignee: III Holdings 1, LLC
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
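The one-to-many matching step described above could be sketched as follows. This is a minimal illustration only; the patent does not specify the comparison metric, so fixed-length embedding vectors and a cosine-similarity threshold are assumed here.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two fixed-length voice-print vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_matches(customer_print, known_prints, threshold=0.85):
    # One-to-many comparison: return ids of known prints likely to be
    # from the same person as the customer's print.
    return [pid for pid, vec in known_prints.items()
            if cosine_similarity(customer_print, vec) >= threshold]

known = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
matches = find_matches([0.98, 0.05, 0.0], known)
```

A real deployment would derive the vectors from a speaker-embedding model rather than raw audio, and would tune the threshold against false-accept/false-reject rates.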
  • Publication number: 20140214425
    Abstract: A voice recognition apparatus and a method for providing response information are provided. The voice recognition apparatus according to the present disclosure includes an extractor configured to extract a first utterance element representing a user action and a second utterance element representing an object from a user's utterance voice signal; a domain determiner configured to detect an expansion domain related to the extracted first and second utterance elements based on a hierarchical domain model, and determine at least one candidate domain related to the detected expansion domain as a final domain; a communicator which performs communication with an external apparatus; and a controller configured to control the communicator to transmit information regarding the first and second utterance elements and information regarding the determined final domain.
    Type: Application
    Filed: January 31, 2014
    Publication date: July 31, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Eun-sang BAK, Kyung-duk KIM, Myung-jae KIM, Yu LIU, Seong-han RYU, Geun-bae LEE
  • Publication number: 20140195236
    Abstract: In one embodiment, a computer system stores speech data for a plurality of speakers, where the speech data includes a plurality of feature vectors and, for each feature vector, an associated sub-phonetic class. The computer system then builds, based on the speech data, an artificial neural network (ANN) for modeling speech of a target speaker in the plurality of speakers, where the ANN is configured to discriminate between instances of sub-phonetic classes uttered by the target speaker and instances of sub-phonetic classes uttered by other speakers in the plurality of speakers.
    Type: Application
    Filed: January 10, 2013
    Publication date: July 10, 2014
    Applicant: Sensory, Incorporated
    Inventors: John-Paul Hosom, Pieter J. Vermeulen, Jonathan Shaw
  • Publication number: 20140195237
    Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
    Type: Application
    Filed: January 9, 2014
    Publication date: July 10, 2014
    Applicant: APPLE INC.
    Inventors: Jerome R. BELLEGARDA, Kim E. A. SILVERMAN
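The final comparison step, checking characteristic units against speaker-specific distribution values within a threshold limit, could be sketched as below. The representation is hypothetical: per-unit (mean, std) pairs stand in for the patent's unspecified distribution values.

```python
def authenticate(characteristic_units, distribution_values, threshold=2.0):
    # Accept the speaker only if every characteristic unit falls within
    # `threshold` standard deviations of the trained distribution mean.
    for unit, (mean, std) in zip(characteristic_units, distribution_values):
        if abs(unit - mean) > threshold * std:
            return False
    return True

trained = [(1.0, 0.2), (5.0, 0.5)]  # (mean, std) per recognition unit
accepted = authenticate([1.1, 4.8], trained)
rejected = authenticate([2.5, 5.0], trained)
```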
  • Patent number: 8768706
    Abstract: Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
    Type: Grant
    Filed: August 20, 2010
    Date of Patent: July 1, 2014
    Assignee: Multimodal Technologies, LLC
    Inventors: Kjell Schubert, Juergen Fritsch, Michael Finke, Detlef Koll
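The emphasis-by-slower-playback idea could be sketched as a per-region rate selection. The scoring rule here is an assumption; the patent describes emphasizing high-relevance and likely-incorrect regions but does not fix a formula.

```python
def playback_rate(relevance, error_likelihood, slow=0.6, normal=1.0):
    # Regions that matter most, or are most likely transcribed wrong,
    # are played back more slowly so the proofreader can catch errors.
    score = max(relevance, error_likelihood)  # both in 0.0 .. 1.0
    return slow if score >= 0.5 else normal

regions = [
    {"text": "patient denies pain", "relevance": 0.9, "error_likelihood": 0.2},
    {"text": "um, okay",            "relevance": 0.1, "error_likelihood": 0.1},
]
rates = [playback_rate(r["relevance"], r["error_likelihood"]) for r in regions]
```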
  • Patent number: 8762149
    Abstract: The present invention refers to a method for verifying the identity of a speaker based on the speaker's voice, comprising the steps of: a) receiving a voice utterance; b) using biometric voice data to verify (10) that the speaker's voice corresponds to the speaker whose identity is to be verified, based on the received voice utterance; c) verifying (12, 13) that the received voice utterance is not falsified, preferably after having verified the speaker's voice; and d) accepting (16) the asserted identity if both verification steps give a positive result, and rejecting (15) it if either verification step gives a negative result. The invention further refers to a corresponding computer-readable medium and a computer.
    Type: Grant
    Filed: December 10, 2008
    Date of Patent: June 24, 2014
    Inventors: Marta Sánchez Asenjo, Alfredo Gutiérrez Navarro, Alberto Martín de los Santos de las Heras, Marta García Gomar
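The accept/reject logic of the two verification steps could be sketched as follows, with stub callables standing in for the real biometric and anti-falsification checks (both hypothetical here):

```python
def verify_speaker(utterance, verify_biometrics, verify_liveness):
    # Accept only if the voice matches the claimed identity AND the
    # utterance is judged genuine (not a replayed or synthetic recording).
    if not verify_biometrics(utterance):
        return False
    if not verify_liveness(utterance):
        return False
    return True

# Stub checkers standing in for real biometric / anti-spoofing models.
accepted = verify_speaker("hello", lambda u: True, lambda u: True)
spoofed = verify_speaker("hello", lambda u: True, lambda u: False)
```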
  • Patent number: 8751233
    Abstract: A speaker-verification digital signature system is disclosed that provides greater confidence in communications having digital signatures because a signing party may be prompted to speak a text-phrase that may be different for each digital signature, thus making it difficult for anyone other than the legitimate signing party to provide a valid signature.
    Type: Grant
    Filed: July 31, 2012
    Date of Patent: June 10, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Pradeep K. Bansal, Lee Begeja, Carroll W. Creswell, Jeffrey Farah, Benjamin J. Stern, Jay Wilpon
  • Publication number: 20140142943
    Abstract: A signal processing device includes a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute: receiving speech of a speaker as a first signal; detecting an expiration period included in the first signal; extracting a number of phonemes included in the expiration period; and controlling a second signal, which is output to the speaker, on the basis of the number of phonemes and the length of the expiration period.
    Type: Application
    Filed: October 15, 2013
    Publication date: May 22, 2014
    Applicant: FUJITSU LIMITED
    Inventors: Chisato Ishikawa, Taro TOGAWA, Takeshi OTANI, Masanao SUZUKI
  • Patent number: 8731940
    Abstract: A method of controlling a system which includes the steps of obtaining at least one signal representative of information communicated by a user via an input device in an environment of the user, wherein a signal from a first source is available in a perceptible form in the environment; estimating at least a point in time when a transition between information flowing from the first source and information flowing from the user is expected to occur; and timing the performance of a function by the system in relation to the estimated time.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: May 20, 2014
    Assignee: Koninklijke Philips N.V.
    Inventor: Aki Sakari Harma
  • Publication number: 20140136204
    Abstract: Methods and systems are provided for a speech system of a vehicle. In one embodiment, the method includes: generating an utterance signature from a speech utterance received from a user of the speech system without a specific need for a user identification interaction; developing a user signature for a user based on the utterance signature; and managing a dialog with the user based on the user signature.
    Type: Application
    Filed: October 22, 2013
    Publication date: May 15, 2014
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: RON M. HECHT, OMER TSIMHONI, UTE WINTER, ROBERT D. SIMS, III
  • Publication number: 20140136206
    Abstract: Provided are a mash-up service generation apparatus and method based on a voice command. The mash-up service generation apparatus includes a voice recognizer configured to convert a voice command into a character string, a mash-up natural language processor configured to extract a word corresponding to a mash-up module based on the character string, and convert the word into at least one of metadata of the mash-up module and metadata of a mash-up sequence in which a plurality of mash-up modules are combined, and a mash-up sequence processor configured to search for and select a target mash-up sequence corresponding to the metadata of the mash-up sequence, and newly generate the target mash-up sequence. Accordingly, a customized mash-up service can be provided to a user.
    Type: Application
    Filed: November 12, 2013
    Publication date: May 15, 2014
    Applicant: Electronics & Telecommunications Research Institute
    Inventors: Jae Chul KIM, Seong Ho LEE, Young Jae LIM, Yoon Seop CHANG
  • Publication number: 20140136205
    Abstract: Disclosed are a display apparatus, a voice acquiring apparatus and a voice recognition method thereof, the display apparatus including: a display unit which displays an image; a communication unit which communicates with a plurality of external apparatuses; and a controller which includes a voice recognition engine to recognize a user's voice, receives a voice signal from a voice acquiring unit, and controls the communication unit to receive candidate instruction words from at least one of the plurality of external apparatuses to recognize the received voice signal.
    Type: Application
    Filed: November 11, 2013
    Publication date: May 15, 2014
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Jong-hyuk JANG, Chan-hee CHOI, Hee-seob RYU, Kyung-mi PARK, Seung-kwon PARK, Jae-hyun BAE
  • Patent number: 8725508
    Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
    Type: Grant
    Filed: March 27, 2012
    Date of Patent: May 13, 2014
    Assignee: Novospeech
    Inventor: Yossef Ben-Ezra
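The final search step, finding an element subsequence common to at least a predetermined number of the extracted sequences, could be sketched as below. A brute-force scan over contiguous runs is assumed; the patent does not prescribe the search algorithm.

```python
def common_subsequence(sequences, min_count):
    # Longest contiguous element run that appears in at least
    # `min_count` of the extracted element sequences.
    first = sequences[0]
    best = []
    for i in range(len(first)):
        for j in range(i + 1, len(first) + 1):
            cand = first[i:j]
            hits = sum(1 for seq in sequences
                       if any(seq[k:k + len(cand)] == cand
                              for k in range(len(seq) - len(cand) + 1)))
            if hits >= min_count and len(cand) > len(best):
                best = cand
    return best

# Phoneme sequences from the initial segment and two overlapping segments.
seqs = [["k", "ae", "t"], ["ae", "t", "s"], ["k", "ae", "p"]]
result = common_subsequence(seqs, min_count=2)
```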
  • Publication number: 20140129219
    Abstract: A computer-implemented system and method for masking special data is provided. Speakers of a call recording are identified. The call recording is separated into strands corresponding to each of the speakers. A prompt list of elements that prompt the speaker of the other strand to utter special information is applied to one of the strands. At least one of the elements of the prompt list is identified in the one strand. A special information candidate is identified in the other strand and is located after a location in time where the element was found in the voice recording of the one strand. A confidence score is assigned to the element located in the one strand and to the special information candidate in the other strand. The confidence scores are combined and a threshold is applied. The special information candidate is rendered unintelligible when the combined confidence scores satisfy the threshold.
    Type: Application
    Filed: November 4, 2013
    Publication date: May 8, 2014
    Applicant: Intellisist, Inc.
    Inventors: Howard M. Lee, Steven Lutz, Gilad Odinak
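The combine-and-threshold decision at the end of the abstract could be sketched as below. Simple additive combination is assumed; the patent only states that the two confidence scores are combined and a threshold applied.

```python
def should_mask(prompt_confidence, candidate_confidence, threshold=1.2):
    # Combine the confidence that a prompt was found in the agent strand
    # with the confidence that what follows in the caller strand is
    # special information; mask the candidate when the sum passes threshold.
    return (prompt_confidence + candidate_confidence) >= threshold

mask_hit = should_mask(0.7, 0.6)   # strong prompt + strong candidate
mask_miss = should_mask(0.4, 0.5)  # weak evidence on both strands
```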
  • Publication number: 20140129224
    Abstract: A method and apparatus for utterance verification are provided for verifying a recognized vocabulary output from speech recognition. The apparatus for utterance verification includes a reference score accumulator, a verification score generator and a decision device. A log-likelihood score obtained from speech recognition is processed by taking a logarithm of the value of the probability of one of feature vectors of an input speech conditioned on one of states of each model vocabulary. A verification score is generated based on the processed result. The verification score is compared with a predetermined threshold value so as to reject or accept the recognized vocabulary.
    Type: Application
    Filed: December 17, 2012
    Publication date: May 8, 2014
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventor: Shih-Chieh Chien
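The verification-score step could be sketched as follows, using a frame-averaged log-likelihood against a fixed threshold. The per-frame probabilities and the threshold value are illustrative assumptions, not the patent's actual parameters.

```python
import math

def verification_score(frame_probs):
    # Average log-likelihood of the feature vectors under the
    # recognized vocabulary's model states.
    return sum(math.log(p) for p in frame_probs) / len(frame_probs)

def accept(frame_probs, threshold=-1.0):
    # Accept the recognized word only if the verification score
    # clears the predetermined threshold; otherwise reject it.
    return verification_score(frame_probs) >= threshold

confident = accept([0.8, 0.7, 0.9])
doubtful = accept([0.1, 0.2, 0.1])
```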
  • Publication number: 20140122077
    Abstract: A voice agent device includes: a position detection unit which detects a position of a person in a conversation space to which the voice agent device is capable of providing information; a voice volume detection unit which detects a voice volume of the person from a sound signal in the conversation space obtained by a sound acquisition unit; a conversation area determination unit which determines a conversation area as a first area including the position when the voice volume has a first voice volume value and determines the conversation area as a second area including the position and being smaller than the first area when the voice volume has a second voice volume value smaller than the first voice volume value, the conversation area being a spatial range where an utterance of the person can be heard; and an information provision unit which provides provision information to the conversation area.
    Type: Application
    Filed: October 25, 2013
    Publication date: May 1, 2014
    Applicant: Panasonic Corporation
    Inventors: Yuri NISHIKAWA, Kazunori YAMADA
  • Patent number: 8706499
    Abstract: Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: April 22, 2014
    Assignee: Facebook, Inc.
    Inventors: Matthew Nicholas Papakipos, David Harry Garcia
  • Publication number: 20140095162
    Abstract: Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The multi-stage intent extraction approach may have more than two iterations.
    Type: Application
    Filed: October 1, 2013
    Publication date: April 3, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Dimitri Kanevsky, Joseph Simon Reisinger, Roberto Sicconi, Mahesh Viswanathan
  • Patent number: 8688451
    Abstract: A speech recognition method includes receiving input speech from a user, processing the input speech using a first grammar to obtain parameter values of a first N-best list of vocabulary, comparing a parameter value of a top result of the first N-best list to a threshold value, and if the compared parameter value is below the threshold value, then additionally processing the input speech using a second grammar to obtain parameter values of a second N-best list of vocabulary. Other preferred steps include: determining the input speech to be in-vocabulary if any of the results of the first N-best list is also present within the second N-best list, but out-of-vocabulary if none of the results of the first N-best list is within the second N-best list; and providing audible feedback to the user if the input speech is determined to be out-of-vocabulary.
    Type: Grant
    Filed: May 11, 2006
    Date of Patent: April 1, 2014
    Assignee: General Motors LLC
    Inventors: Timothy J. Grost, Rathinavelu Chengalvarayan
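The two-grammar control flow above could be sketched as follows, with each grammar modeled as a callable returning an N-best list of (word, confidence) pairs (a hypothetical interface, not GM's actual API):

```python
def recognize(speech, grammar1, grammar2, threshold=0.6):
    # grammar1 / grammar2 map input speech to an N-best list of
    # (word, confidence) pairs, best first.
    nbest1 = grammar1(speech)
    if nbest1[0][1] >= threshold:
        return nbest1[0][0]              # confident: accept the top result
    nbest2 = grammar2(speech)            # otherwise consult the second grammar
    words2 = {w for w, _ in nbest2}
    for word, _ in nbest1:
        if word in words2:               # in-vocabulary: the lists agree
            return word
    return None                          # out-of-vocabulary: give audible feedback

g1 = lambda s: [("call", 0.4), ("fall", 0.3)]
g2 = lambda s: [("fall", 0.5), ("tall", 0.2)]
result = recognize("audio", g1, g2)
```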
  • Publication number: 20140081639
    Abstract: The communication support device includes: a storing unit configured to store an utterance of a first speaker transmitted from a first terminal as utterance information; an analyzing unit configured to obtain a holding notice which sets communications with the first terminal to a holding state, the communications being transmitted from a second terminal used by a second speaker who communicates with the first speaker, and to analyze features of utterance information which correspond to a time of a holding state; and an instructing unit configured to output to the second terminal determination information on the first speaker based on the features of the utterance information of the first speaker.
    Type: Application
    Filed: August 30, 2013
    Publication date: March 20, 2014
    Applicant: FUJITSU LIMITED
    Inventors: Naoto KAWASHIMA, Naoto MATSUDAIRA, Yuusuke TOUNAI, Hiroshi YOSHIDA, Shingo HIRONO
  • Publication number: 20140081638
    Abstract: The invention refers to a method for comparing voice utterances, the method comprising the steps: extracting a plurality of features (201) from a first voice utterance of a given text sample and extracting a plurality of features (201) from a second voice utterance of said given text sample, wherein each feature is extracted as a function of time, and wherein each feature of the second voice utterance corresponds to a feature of the first voice utterance; and applying dynamic time warping (202) to one or more time-dependent characteristics of the first and/or second voice utterance.
    Type: Application
    Filed: December 10, 2009
    Publication date: March 20, 2014
    Inventors: Jesus Antonio Villalba Lopez, Alfonso Ortega Gimenez, Eduardo Lleida Solano, Sara Varela Redondo, Marta Garcia Gomar
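The dynamic time warping step referenced above is a standard algorithm; a minimal version over 1-D feature tracks could look like this (the patent applies it to extracted per-feature time series, for which simple scalar tracks are assumed here):

```python
def dtw_distance(a, b):
    # Classic dynamic time warping between two 1-D feature tracks
    # (e.g. pitch contours), aligning them despite tempo differences.
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

# The same contour spoken at a different tempo still aligns cheaply.
same_tempo_shift = dtw_distance([1, 2, 3], [1, 2, 2, 3])
different_content = dtw_distance([1, 2, 3], [4, 5, 6])
```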
  • Publication number: 20140081640
    Abstract: One aspect includes determining validity of an identity asserted by a speaker using a voice print associated with a user whose identity the speaker is asserting, the voice print obtained from characteristic features of at least one first voice signal obtained from the user uttering at least one enrollment utterance including at least one enrollment word by obtaining a second voice signal of the speaker uttering at least one challenge utterance that includes at least one word not in the at least one enrollment utterance, obtaining at least one characteristic feature from the second voice signal, comparing the at least one characteristic feature with at least a portion of the voice print to determine a similarity between the at least one characteristic feature and the at least a portion of the voice print, and determining whether the speaker is the user based, at least in part, on the similarity.
    Type: Application
    Filed: November 21, 2013
    Publication date: March 20, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Kevin R. Farrell, David A. James, William F. Ganong, III, Jerry K. Carter
  • Patent number: 8676580
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
  • Publication number: 20140046666
    Abstract: According to an embodiment, an information processing apparatus includes a dividing unit, an assigning unit, and a generating unit. The dividing unit is configured to divide speech data into pieces of utterance data. The assigning unit is configured to assign speaker identification information to each piece of utterance data based on an acoustic feature of the each piece of utterance data. The generating unit is configured to generate a candidate list that indicates candidate speaker names so as to enable a user to determine a speaker name to be given to the piece of utterance data identified by instruction information, based on operation history information in which at least pieces of utterance identification information, pieces of the speaker identification information, and speaker names given by the user to the respective pieces of utterance data are associated with one another.
    Type: Application
    Filed: August 6, 2013
    Publication date: February 13, 2014
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Osamu Nishiyama, Taira Ashikawa, Tomoo Ikeda, Kouji Ueno, Kouta Nakata
  • Publication number: 20140039893
    Abstract: Disclosed embodiments provide for personalizing a voice user interface of a remote multi-user service. A voice user interface for the remote multi-user service can be provided and voice information from an identified user can be received at the multi-user service through the voice user interface. A language model specific to the identified user can be retrieved that models one or more language elements. The retrieved language model can be applied to interpret the received voice information and a response can be generated by the multi-user service in response the interpreted voice information.
    Type: Application
    Filed: July 31, 2012
    Publication date: February 6, 2014
    Applicant: SRI INTERNATIONAL
    Inventor: Steven Weiner
  • Patent number: 8639508
    Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: January 28, 2014
    Assignee: General Motors LLC
    Inventors: Xufang Zhao, Gaurav Talwar
  • Publication number: 20140025377
    Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidate to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
    Type: Application
    Filed: August 10, 2012
    Publication date: January 23, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fernando Luiz Koch, Julio Nogima
  • Publication number: 20140025378
    Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.
    Type: Application
    Filed: September 24, 2013
    Publication date: January 23, 2014
    Applicant: Google Inc.
    Inventors: Petar Aleksic, Xin Lei
  • Patent number: 8630860
    Abstract: Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
    Type: Grant
    Filed: March 3, 2011
    Date of Patent: January 14, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Shilei Zhang, Shenghua Bao, Wen Liu, Yong Qin, Zhiwei Shuang, Jian Chen, Zhong Su, Qin Shi, William F. Ganong, III
  • Publication number: 20140012576
    Abstract: A signal processing method includes separating a mixed sound signal in which a plurality of excitations are mixed into the respective excitations, and performing speech detection on the plurality of separated excitation signals, judging whether or not the plurality of excitation signals are speech and generating speech section information indicating speech/non-speech information for each excitation signal. The signal processing method also includes at least one of calculating and analyzing an utterance overlap duration using the speech section information for combinations of the plurality of excitation signals and of calculating and analyzing a silence duration. The signal processing method further includes calculating a degree of establishment of a conversation based on the extracted utterance overlap duration or the silence duration.
    Type: Application
    Filed: June 26, 2013
    Publication date: January 9, 2014
    Applicant: PANASONIC CORPORATION
    Inventors: Maki YAMADA, Mitsuru ENDO, Koichiro MIZUSHIMA
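The overlap- and silence-duration calculation from the speech section information could be sketched as below, assuming discrete time units and (start, end) speech intervals per separated source (a representation chosen for illustration):

```python
def overlap_and_silence(sections_a, sections_b, total):
    # sections_* are (start, end) speech intervals for each separated
    # excitation; overlap = both speaking, silence = neither speaking.
    overlap = silence = 0
    for t in range(total):
        a = any(s <= t < e for s, e in sections_a)
        b = any(s <= t < e for s, e in sections_b)
        overlap += a and b
        silence += not a and not b
    return overlap, silence

# Two speakers over 10 time units: a brief overlap, then a shared pause.
ov, si = overlap_and_silence([(0, 4)], [(3, 7)], total=10)
```

A low overlap together with short silences would then feed the degree-of-establishment score; the scoring function itself is not specified in the abstract.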
  • Publication number: 20140012577
    Abstract: The system and method described herein may use various natural language models to deliver targeted advertisements and track advertisement interactions in voice recognition contexts. In particular, in response to an input device receiving an utterance, a conversational language processor may select and deliver one or more advertisements targeted to a user that spoke the utterance based on cognitive models associated with the user, various users having similar characteristics to the user, an environment in which the user spoke the utterance, or other criteria. Further, subsequent interaction with the targeted advertisements may be tracked to build and refine the cognitive models and thereby enhance the information used to deliver targeted advertisements in response to subsequent utterances.
    Type: Application
    Filed: September 3, 2013
    Publication date: January 9, 2014
    Applicant: VoiceBox Technologies Corporation
    Inventors: Tom Freeman, Mike Kennewick
  • Publication number: 20140012575
    Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Patent number: 8626505
    Abstract: A computer implemented method, system, and/or computer program product generates an audio cohort. Audio data from a set of audio sensors is received by an audio analysis engine. The audio data, which is associated with a plurality of objects, comprises a set of audio patterns. The audio data is processed to identify audio attributes associated with the plurality of objects to form digital audio data. This digital audio data comprises metadata that describes the audio attributes of the set of objects. A set of audio cohorts is generated using the audio attributes associated with the digital audio data and cohort criteria, where each audio cohort in the set of audio cohorts is a cohort of accompanied customers in a store, and where processing the audio data identifies a type of zoological creature that is accompanying each of the accompanied customers.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Robert L. Angell, Robert R. Friedlander, James R. Kraemer
  • Patent number: 8626508
    Abstract: Provided are a speech search device and speech search method that perform fuzzy search with very fast search speed and excellent search performance. In addition to the fuzzy search, the distance between phoneme discrimination features included in the speech data is calculated to determine similarity to the speech, using both a suffix array and dynamic programming. The search space is narrowed by dividing the search keyword into phoneme-based parts and applying search thresholds to the divided keywords; the search is repeated while increasing the thresholds in order, and whether to divide the keyword is determined from its length, thereby implementing speech search that is both very fast and highly accurate.
    Type: Grant
    Filed: February 10, 2010
    Date of Patent: January 7, 2014
    Assignee: National University Corporation TOYOHASHI UNIVERSITY OF TECHNOLOGY
    Inventors: Koichi Katsurada, Tsuneo Nitta, Shigeki Teshima
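A toy sketch of the repeated search with thresholds increased in order, as described in the abstract above: each utterance stores a per-keyword-part phoneme distance, and the distance threshold is widened step by step until some utterance matches every part. The entries, distances, and thresholds are invented; the patented method computes these distances with a suffix array plus dynamic programming over phoneme discrimination features.

```python
# Invented per-utterance phoneme distances for two keyword parts.
ENTRIES = {
    "utt1": {"to": 1.0, "kyo": 4.0},
    "utt2": {"to": 2.5, "kyo": 2.0},
}

def fuzzy_search(entries, parts, thresholds=(1.0, 2.0, 3.0)):
    """Search repeatedly, widening the distance threshold in order."""
    for thresh in thresholds:
        hits = [utt for utt, dists in entries.items()
                if all(dists.get(p, float("inf")) <= thresh for p in parts)]
        if hits:
            return hits, thresh      # stop at the first threshold that matches
    return [], None

print(fuzzy_search(ENTRIES, ["to", "kyo"]))
```

Starting strict and loosening in order keeps the candidate set small at each pass, which is the narrowing effect the abstract describes.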
  • Patent number: 8626507
    Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: January 7, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Michael J. Johnston
  • Patent number: 8612225
    Abstract: A voice recognition device that recognizes the voice of an input voice signal comprises: a voice model storage unit that stores in advance a predetermined voice model having a plurality of detail levels, the plurality of detail levels being information indicating a feature property of a voice for the voice model; a detail level selection unit that selects a detail level, closest to a feature property of an input voice signal, from the detail levels of the voice model stored in the voice model storage unit; and a parameter setting unit that sets parameters for recognizing the voice of an input voice according to the detail level selected by the detail level selection unit.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: December 17, 2013
    Assignee: NEC Corporation
    Inventors: Takayuki Arakawa, Ken Hanazawa, Masanori Tsujikawa
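The detail-level selection step above can be sketched as a nearest-neighbour choice over stored feature vectors. The feature vectors and level names below are invented stand-ins for whatever feature properties the device actually stores.

```python
import math

# Invented detail levels, each described by a small feature vector.
DETAIL_LEVELS = {
    "coarse": [1.0, 1.0],
    "medium": [2.0, 2.0],
    "fine":   [3.0, 3.0],
}

def select_detail_level(features, levels=DETAIL_LEVELS):
    """Pick the detail level whose stored features are closest to the input."""
    return min(levels, key=lambda name: math.dist(levels[name], features))

print(select_detail_level([2.1, 1.9]))
```

Here Euclidean distance is an assumption; any similarity measure over the stored feature properties would fit the same selection scheme.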
  • Patent number: 8612224
    Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are updated.
    Type: Grant
    Filed: August 23, 2011
    Date of Patent: December 17, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Catherine Breslin, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
  • Publication number: 20130325473
    Abstract: Embodiments of systems and methods for speaker verification are provided. In various embodiments, a method includes receiving an utterance from a speaker and determining a text-independent speaker verification score and a text-dependent speaker verification score in response to the utterance. Various embodiments include a system for speaker verification, the system comprising an audio receiving device for receiving an utterance from a speaker and converting the utterance to an utterance signal, and a processor coupled to the audio receiving device for determining speaker verification in response to the utterance signal, wherein the processor determines speaker verification in response to a UBM-independent speaker-normalized score.
    Type: Application
    Filed: May 23, 2013
    Publication date: December 5, 2013
    Applicant: Agency for Science, Technology and Research
    Inventors: Anthony Larcher, Kong Aik Lee, Bin Ma, Thai Ngoc Thuy Huong
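The verification method above combines a text-independent and a text-dependent score. A simple weighted sum is one common fusion rule and is used here as an illustrative assumption; the weights, threshold, and score values are invented, and the patent's actual combination (including the UBM-independent speaker-normalized score) may differ.

```python
def fused_score(ti_score, td_score, w_ti=0.5, w_td=0.5):
    """Weighted fusion of text-independent and text-dependent scores."""
    return w_ti * ti_score + w_td * td_score

def verify(ti_score, td_score, threshold=0.5):
    """Accept the speaker when the fused score clears the threshold."""
    return fused_score(ti_score, td_score) >= threshold

print(verify(0.8, 0.4))  # strong TI score lifts a weak TD score
print(verify(0.2, 0.3))  # both scores low: rejected
```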
  • Publication number: 20130289992
    Abstract: A voice recognition method includes: detecting a vocal section including a vocal sound in a voice, based on a feature value of an audio signal representing the voice; identifying a word expressed by the vocal sound in the vocal section, by matching the feature value of the audio signal of the vocal section and an acoustic model of each of a plurality of words; and selecting, with a processor, the word expressed by the vocal sound in a word section based on a comparison result between a signal characteristic of the word section and a signal characteristic of the vocal section.
    Type: Application
    Filed: March 18, 2013
    Publication date: October 31, 2013
    Applicant: FUJITSU LIMITED
    Inventor: Shouji HARADA
  • Patent number: 8571865
    Abstract: Systems, methods performed by data processing apparatus, and computer storage media encoded with computer programs for receiving information relating to (i) a communication device that has received an utterance and (ii) a voice associated with the received utterance, comparing the received voice information with voice signatures in a comparison group, the comparison group including one or more individuals identified from one or more connections arising from the received information relating to the communication device, attempting to identify the voice associated with the utterance as matching one of the individuals in the comparison group, and based on a result of the attempt to identify, selectively providing the communication device with access to one or more resources associated with the matched individual.
    Type: Grant
    Filed: August 10, 2012
    Date of Patent: October 29, 2013
    Assignee: Google Inc.
    Inventor: Philip Hewinson
  • Patent number: 8571867
    Abstract: A method (700) and system (900) for authenticating a user is provided. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to one or more spoken utterances (704), identifying a biometric voice print of the user from one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). A location of the handset or the user can be employed as criteria for granting access to one or more resources (712).
    Type: Grant
    Filed: September 13, 2012
    Date of Patent: October 29, 2013
    Assignee: Porticus Technology, Inc.
    Inventors: Germano Di Mambro, Bernardas Salna
  • Publication number: 20130268273
    Abstract: A method of recognizing the gender or age of a speaker according to speech emotion or arousal includes the following steps: A) segmenting speech signals into a plurality of speech segments; B) fetching the first speech segment from the plural speech segments to acquire at least one of an emotional feature or an arousal degree in the speech segment; C) determining whether at least one of the emotional feature and the arousal degree conforms to a given condition; if yes, proceeding to step D); if no, returning to step B) and fetching the next speech segment; D) fetching the feature indicative of gender or age from the speech segment to acquire at least one feature parameter; and E) recognizing the at least one feature parameter to determine the gender or age of the speaker at the currently-processed speech segment.
    Type: Application
    Filed: July 27, 2012
    Publication date: October 10, 2013
    Inventors: Oscal Tzyh-Chiang Chen, Ping-Tsung Lu, Jia-You Ke
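Steps A) through E) above can be sketched as an arousal-gated pipeline. The arousal values, the pitch-based gender stand-in, and all thresholds below are invented for illustration; the patent does not specify these particular features.

```python
def arousal(segment):
    return segment["arousal"]          # step B: per-segment arousal degree

def classify_gender(segment):
    # Steps D/E stand-in: a crude pitch cutoff in place of a real classifier.
    return "female" if segment["pitch_hz"] > 165 else "male"

def recognize(segments, arousal_threshold=0.5):
    results = []
    for seg in segments:               # step A: speech already segmented
        if arousal(seg) < arousal_threshold:
            continue                   # step C fails: fetch the next segment
        results.append(classify_gender(seg))
    return results

segments = [
    {"arousal": 0.2, "pitch_hz": 120},   # skipped: arousal too low
    {"arousal": 0.8, "pitch_hz": 210},
    {"arousal": 0.7, "pitch_hz": 110},
]
print(recognize(segments))
```

The gating in step C) means only sufficiently emotional or aroused segments reach the gender/age classifier, which is the core of the claimed method.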
  • Patent number: 8554566
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: October 8, 2013
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8548807
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: October 1, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
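The restructured acoustic space above replaces each dictionary phoneme's model with a weighted sum over plausible phonemes' acoustic models. A minimal numeric sketch, with invented weights and scores (real acoustic models score feature frames, not single numbers):

```python
def custom_phoneme_score(weights, model_scores):
    """Score a dictionary phoneme as a weighted sum of plausible phonemes.

    weights: plausible phoneme -> mixture weight
    model_scores: phoneme -> acoustic model score for the current input
    """
    return sum(w * model_scores[p] for p, w in weights.items())

# e.g. a target speaker's /r/ modelled as mostly /r/ with some /l/ mixed in
weights = {"r": 0.7, "l": 0.3}
model_scores = {"r": 0.9, "l": 0.5}
print(custom_phoneme_score(weights, model_scores))  # ≈ 0.78
```

Note that the pronouncing dictionary itself never changes; only the acoustic model behind each dictionary phoneme becomes a mixture.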
  • Patent number: 8532989
    Abstract: A command recognition device includes: an utterance understanding unit that determines or selects word sequence information from speech information; a speech confidence degree calculating unit that calculates a degree of speech confidence based on the speech information and the word sequence information; a phrase confidence degree calculating unit that calculates a degree of phrase confidence based on image information and phrase information included in the word sequence information; and a motion control instructing unit that determines whether a command of the word sequence information should be executed based on the degree of speech confidence and the degree of phrase confidence.
    Type: Grant
    Filed: September 2, 2010
    Date of Patent: September 10, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kotaro Funakoshi, Mikio Nakano, Xiang Zuo, Naoto Iwahashi, Ryo Taguchi
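The execute-or-reject decision above depends on both confidence degrees. The combination rule below (independent thresholds on each degree) is an assumption for illustration; the patent only states that the decision is based on both confidences, not how they are combined.

```python
def should_execute(speech_conf, phrase_conf,
                   speech_thresh=0.6, phrase_thresh=0.6):
    """Execute the command only when both confidences clear their thresholds."""
    return speech_conf >= speech_thresh and phrase_conf >= phrase_thresh

print(should_execute(0.8, 0.7))  # both confidences high: execute
print(should_execute(0.8, 0.3))  # phrase confidence too low: reject
```

Requiring both degrees guards against executing a command that was heard clearly but grounded on the wrong visual object, or vice versa.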
  • Patent number: 8521527
    Abstract: A computer-implemented system and method for processing audio in a voice response environment is provided. A database of host scripts each comprising signature files of audio phrases and actions to take when one of the audio phrases is recognized is maintained. The host scripts are loaded and a call to a voice mail server is initiated. Incoming audio buffers are received during the call from voice messages stored on the voice mail server. The incoming audio buffers are processed. A signature data structure is created for each audio buffer. The signature data structure is compared with signatures of expected phrases in the host scripts. The actions stored in the host scripts are executed when the signature data structure matches the signature of the expected phrase.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: August 27, 2013
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
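The host-script flow above (signature of each incoming audio buffer compared against expected phrases, with the matching script's action executed) can be sketched as a lookup loop. Signatures are simplified to plain strings here, and the script table is invented; the patented system computes real signature data structures from the audio.

```python
# Invented host script: expected-phrase signature -> action to take.
HOST_SCRIPTS = {
    "sig:greeting":  "press_1",
    "sig:main_menu": "press_2",
}

def process_buffers(buffer_signatures, scripts=HOST_SCRIPTS):
    """Compare each buffer's signature with the script and collect actions."""
    actions = []
    for sig in buffer_signatures:
        if sig in scripts:             # signature matches an expected phrase
            actions.append(scripts[sig])
    return actions

print(process_buffers(["sig:noise", "sig:greeting", "sig:main_menu"]))
```

Unmatched buffers (noise, silence) simply fall through, so the script only drives the call forward on recognized prompts.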
  • Patent number: 8515753
    Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, acoustic models are adapted in a state-tying step during a training process of acoustic models. When the present invention for adapting acoustic models is combined with a conventional acoustic model adaptation scheme, further enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: August 20, 2013
    Assignee: Gwangju Institute of Science and Technology
    Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
  • Patent number: 8489397
    Abstract: A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: July 16, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Charles David Caldwell, John Bruce Harlow, Robert J. Sayko, Norman Shaye
  • Publication number: 20130173268
    Abstract: A method for verifying that a person is registered to use a telemedical device includes identifying an unprompted trigger phrase in words spoken by a person and received by the telemedical device. The telemedical device prompts the person to state a name of a registered user and optionally prompts the person to state health tips for the person. The telemedical device verifies that the person is the registered user using utterance data generated from the unprompted trigger phrase, name of the registered user, and health tips.
    Type: Application
    Filed: December 29, 2011
    Publication date: July 4, 2013
    Applicant: Robert Bosch GmbH
    Inventors: Fuliang Weng, Taufiq Hasan, Zhe Feng
  • Publication number: 20130166283
    Abstract: A phoneme rule generating apparatus includes a spectrum analyzer configured to analyze pronunciation patterns of voices included in a plurality of voice data, a clusterer configured to cluster the plurality of voice data based on the analyzed pronunciation patterns, a voice group generator configured to generate voice groups from the clustered voice data, a phoneme rule generator configured to generate a phoneme rule corresponding to each respective voice group from among the generated voice groups, and a group mapping DB configured to store the generated voice groups and the generated phoneme rules for accurate voice recognition.
    Type: Application
    Filed: December 26, 2012
    Publication date: June 27, 2013
    Applicant: KT CORPORATION
    Inventor: KT Corporation