Feature Extraction For Speech Recognition; Selection Of Recognition Unit (epo) Patents (Class 704/E15.004)
-
Patent number: 11955125
Abstract: The present disclosure provides an operation method of a smart speaker. The method includes steps as follows. The linked settings among voiceprint registration data, user information and a cast setting of a user device are preloaded by the smart speaker. Wake-up words are received to set an operation mode of the smart speaker and to generate a voiceprint recognition result. In the operation mode, after receiving voice, the voice is converted into voice text and the voiceprint recognition result is compared to the voiceprint registration data. When the voiceprint recognition result matches the voiceprint registration data, the user information and the voice text are transmitted to a cloud server, so that the cloud server returns a response message to the smart speaker. According to the cast setting, the response message is sent to the user device.
Type: Grant
Filed: October 15, 2019
Date of Patent: April 9, 2024
Assignee: AmTRAN Technology Co., Ltd.
Inventors: Che-Chia Ho, Chia-Wei Lin
-
Patent number: 11948575
Abstract: Implementations set forth herein relate to an automated assistant that uses circumstantial condition data, generated based on circumstantial conditions of an input, to determine whether the input should affect an action initialized by a particular user. The automated assistant can allow each user to manipulate their respective ongoing action without necessitating interruptions for soliciting explicit user authentication. For example, when an individual in a group of persons interacts with the automated assistant to initialize or affect a particular ongoing action, the automated assistant can generate data that correlates that individual to the particular ongoing action. The data can be generated using a variety of different input modalities, which can be dynamically selected based on changing circumstances of the individual. Therefore, different sets of input modalities can be processed each time a user provides an input for modifying an ongoing action and/or initializing another action.
Type: Grant
Filed: January 12, 2023
Date of Patent: April 2, 2024
Assignee: GOOGLE LLC
Inventors: Andrew Gallagher, Caroline Pantofaru, Vinay Bettadapura, Utsav Prabhu
-
Patent number: 11694444
Abstract: Aspects of the present disclosure involve a system comprising a computer-readable storage medium storing a program and method for setting ad breakpoints in a video. The program and method provide for accessing a video; determining plural shot boundaries for the video, each shot boundary defining a shot corresponding to a contiguous sequence of video frames that is free of cuts or transitions; and for each shot boundary of the plural shot boundaries, performing a set of breakpoint tests on the shot boundary, each breakpoint test configured to return a respective score indicating whether the shot boundary corresponds to a breakpoint for potential insertion of an ad during playback of the video, calculating a combined score for the shot boundary based on combining each of the respective scores, and setting, in a case where the combined score meets a threshold value, the shot boundary as the breakpoint.
Type: Grant
Filed: April 5, 2021
Date of Patent: July 4, 2023
Assignee: Snap Inc.
Inventors: Khalil Chatoo, David Michael Hornsby, Jeffrey Kile, Chinmay Lonkar, Zhimin Wang, Ian Anthony Wehrman
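The scoring flow in this abstract (per-boundary tests, score combination, thresholding) reads naturally as a small routine. Below is a minimal sketch of that flow; the test functions, the averaging combiner, and the 0.6 threshold are illustrative assumptions, not taken from the patent.

```python
from typing import Callable, List

def select_breakpoints(
    shot_boundaries: List[float],           # boundary timestamps in seconds
    tests: List[Callable[[float], float]],  # each test returns a score in [0, 1]
    threshold: float = 0.6,                 # assumed cutoff
) -> List[float]:
    breakpoints = []
    for boundary in shot_boundaries:
        scores = [test(boundary) for test in tests]
        combined = sum(scores) / len(scores)  # simple average as the combiner
        if combined >= threshold:
            breakpoints.append(boundary)
    return breakpoints

# Example with two toy tests: prefer boundaries after 30 s, plus a stand-in
# for an audio-quietness test that always returns 0.8.
tests = [
    lambda t: 1.0 if t > 30.0 else 0.0,
    lambda t: 0.8,
]
print(select_breakpoints([12.0, 45.5, 90.2], tests))  # -> [45.5, 90.2]
```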
-
Patent number: 11562740
Abstract: In one aspect, a network microphone device (NMD) includes a plurality of microphones and is configured to capture a voice input via one or more of the microphones, detect a wake word in the voice input, transmit data associated with the voice input to one or more remote computing devices associated with a voice assistant service, and receive a response from the one or more remote computing devices, the response comprising a playback command based on the voice input. The NMD may be configured to obtain verification information characterizing the voice input and, based on the verification information indicating that the voice input was spoken by an unverified user, functionally disable the NMD from performing the playback command.
Type: Grant
Filed: January 7, 2020
Date of Patent: January 24, 2023
Assignee: Sonos, Inc.
Inventor: Connor Kristopher Smith
-
Publication number: 20140012575
Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence.
Type: Application
Filed: July 9, 2012
Publication date: January 9, 2014
Applicant: Nuance Communications, Inc.
Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
-
Publication number: 20130297307
Abstract: A dictation module is described herein which receives and interprets a complete utterance of the user in incremental fashion, that is, one incremental portion at a time. The dictation module also provides rendered text in incremental fashion. The rendered text corresponds to the dictation module's interpretation of each incremental portion. The dictation module also allows the user to modify any part of the rendered text as it becomes available. In one case, for instance, the dictation module provides a marking menu which includes multiple options by which a user can modify a selected part of the rendered text. The dictation module also uses the rendered text (as modified or unmodified by the user using the marking menu) to adjust one or more models used by the dictation module to interpret the user's utterance.
Type: Application
Filed: May 1, 2012
Publication date: November 7, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Timothy S. Paek, Bongshin Lee, Bo-June Hsu
-
Publication number: 20130080153
Abstract: An information processing apparatus includes a receiving unit that receives character sequences, a sorting unit that sorts the character sequences received by the receiving unit into known words and unknown words, and a detecting unit that detects character sequences sorted as unknown words by the sorting unit as incorrect words, and that also detects a third character sequence between a first character sequence and a second character sequence, which have been sorted as unknown words by the sorting unit, as incorrect words when the third character sequence includes words sorted as known words by the sorting unit and the number of the known words is less than or equal to a predetermined number.
Type: Application
Filed: January 6, 2012
Publication date: March 28, 2013
Applicant: FUJI XEROX CO., LTD.
Inventor: Eiichi TANAKA
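The detection rule here is mechanical enough to sketch: tokens outside the lexicon are flagged directly, and a sufficiently short run of known words bracketed by two unknown words is flagged as well. The function and variable names below, and the run-length limit of 2, are assumptions for illustration.

```python
def detect_incorrect_words(tokens, lexicon, max_known_run=2):
    unknown = [t not in lexicon for t in tokens]
    incorrect = set(i for i, u in enumerate(unknown) if u)  # unknown words
    # Also flag short known-word runs sandwiched between two unknown words.
    i = 0
    while i < len(tokens):
        if unknown[i]:
            j = i + 1
            while j < len(tokens) and not unknown[j]:
                j += 1
            if j < len(tokens) and 0 < j - (i + 1) <= max_known_run:
                incorrect.update(range(i + 1, j))
            i = j
        else:
            i += 1
    return [tokens[i] for i in sorted(incorrect)]

lexicon = {"the", "cat", "sat", "on", "mat"}
print(detect_incorrect_words(["the", "cxt", "on", "zzq", "mat"], lexicon))
# -> ['cxt', 'on', 'zzq']  ("on" is known but trapped between unknown words)
```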
-
Publication number: 20130046540
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
Type: Application
Filed: December 3, 2007
Publication date: February 21, 2013
Inventor: Alexander Sorin
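The abstract describes a three-step analysis-by-synthesis round trip. The toy sketch below keeps only that structure; synthesize_frame and compute_mfcc are linear stand-ins (a real system would use a vocoder and Mel-cepstral analysis), and N, L, and the zero seed value are assumptions.

```python
import numpy as np

N, L = 13, 5                        # full MFCC length, known low-order count
rng = np.random.default_rng(0)
A = rng.standard_normal((64, N))    # toy synthesis basis: MFCC -> "waveform"
A_pinv = np.linalg.pinv(A)          # toy analysis: "waveform" -> MFCC

def synthesize_frame(mfcc, pitch):
    return A @ mfcc                 # pitch is unused in this linear stand-in

def compute_mfcc(frame):
    return A_pinv @ frame

loc = rng.standard_normal(L)                        # trusted low-order part
candidate = np.concatenate([loc, np.zeros(N - L)])  # HOC seeded to zero
frame = synthesize_frame(candidate, pitch=120.0)    # step 2: synthesize
output = compute_mfcc(frame)                        # step 3: re-analyze
print(np.allclose(output[:L], loc))                 # LOC survive the round trip
```

In this linear toy the round trip is exact; with a real synthesizer, the spectral detail it imposes is what turns the recomputed high-order coefficients into meaningful estimates.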
-
Publication number: 20130030794
Abstract: According to one embodiment, a speaker clustering apparatus includes a clustering unit, an extraction unit, and an error detection unit. The clustering unit is configured to extract acoustic features for speakers from an acoustic signal, and to cluster utterances included in the acoustic signal into the speakers by using the acoustic features. The extraction unit is configured to acquire character strings representing contents of the utterances, and to extract linguistic features of the speakers by using the character strings. The error detection unit is configured to decide that, when one of the character strings does not fit the linguistic feature of the speaker into whose cluster the corresponding utterance was placed, that utterance was erroneously clustered by the clustering unit.
Type: Application
Filed: March 6, 2012
Publication date: January 31, 2013
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Tomoo Ikeda, Manabu Nagao, Osamu Nishiyama, Hirokazu Suzuki, Koji Ueno, Nobuhiro Shimogori
-
Publication number: 20120253812
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Application
Filed: April 1, 2011
Publication date: October 4, 2012
Applicant: Sony Computer Entertainment Inc.
Inventors: Ozlem Kalinli, Ruxin Chen
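One common reading of an "auditory gist vector" is a grid of sub-region means taken over each feature map, with the per-map vectors concatenated ("augmented") into the cumulative vector fed to a classifier. The sketch below assumes that reading; the grid size and map shapes are illustrative.

```python
import numpy as np

def gist_vector(feature_map, grid=(4, 5)):
    # Split the map into a grid of sub-regions and keep each region's mean.
    rows = np.array_split(feature_map, grid[0], axis=0)
    cells = [np.array_split(r, grid[1], axis=1) for r in rows]
    return np.array([c.mean() for row in cells for c in row])

rng = np.random.default_rng(1)
feature_maps = [rng.random((40, 50)) for _ in range(3)]   # one map per scale
cumulative = np.concatenate([gist_vector(m) for m in feature_maps])
print(cumulative.shape)   # (60,) -> input to the boundary classifier
```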
-
Publication number: 20120253813
Abstract: A speech segment determination device includes a frame division portion, a power spectrum calculation portion, a power spectrum operation portion, a spectral entropy calculation portion and a determination portion. The frame division portion divides an input signal in units of frames. The power spectrum calculation portion calculates, using an analysis length, a power spectrum of the input signal for each of the frames that have been divided. The power spectrum operation portion adds a value to the calculated power spectrum in each of the frequency bins. The spectral entropy calculation portion calculates spectral entropy using the power spectrum whose value has been increased. The determination portion determines, based on a value of the spectral entropy, whether the input signal is a signal in a speech segment.
Type: Application
Filed: February 17, 2012
Publication date: October 4, 2012
Applicant: OKI ELECTRIC INDUSTRY CO., LTD.
Inventor: Kazuhiro KATAGIRI
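The decision logic lends itself to a compact sketch: per-frame power spectrum, a per-bin additive offset, normalized spectral entropy, and a threshold test. The FFT length, offset, and threshold below are assumed values, and low entropy is taken to indicate speech (a peaky spectrum) versus flat background noise.

```python
import numpy as np

def is_speech(frame, n_fft=512, offset=1e-12, entropy_threshold=0.85):
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2   # power spectrum
    spectrum = spectrum + offset                          # per-bin added value
    p = spectrum / spectrum.sum()                         # normalize to a pmf
    entropy = -np.sum(p * np.log(p)) / np.log(len(p))     # normalized entropy
    return entropy < entropy_threshold                    # speech is "peaky"

rng = np.random.default_rng(0)
noise = rng.standard_normal(400)                          # flat spectrum
tone = np.sin(2 * np.pi * 200 * np.arange(400) / 8000)    # harmonic-like
print(is_speech(noise), is_speech(tone))                  # False True
```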
-
Publication number: 20120245919
Abstract: An automatic speech recognition (ASR) apparatus for an embedded device application is described. A speech decoder receives an input sequence of speech feature vectors in a first language and outputs an acoustic segment lattice representing a probabilistic combination of basic linguistic units in a second language. A vocabulary matching module compares the acoustic segment lattice to vocabulary models in the first language to determine an output set of probability-ranked recognition hypotheses. A detailed matching module compares the set of probability-ranked recognition hypotheses to detailed match models in the first language to determine a recognition output representing a vocabulary word most likely to correspond to the input sequence of speech feature vectors.
Type: Application
Filed: September 23, 2009
Publication date: September 27, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Guillermo Aradilla, Rainer Gruhn
-
Publication number: 20120245941
Abstract: A device can be configured to receive speech input from a user. The speech input can include a command for accessing a restricted feature of the device. The speech input can be compared to a voiceprint (e.g., a text-independent voiceprint) of the user's voice to authenticate the user to the device. Responsive to successful authentication of the user to the device, the user is allowed access to the restricted feature without having to perform additional authentication steps or speak the command again. If the user is not successfully authenticated to the device, additional authentication steps can be requested by the device (e.g., a request for a password).
Type: Application
Filed: March 21, 2011
Publication date: September 27, 2012
Inventor: Adam J. Cheyer
-
Publication number: 20120215537
Abstract: According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize a voice command after the keyword is detected by the keyword detection module, and to transmit an operation signal corresponding to the voice command.
Type: Application
Filed: September 21, 2011
Publication date: August 23, 2012
Inventor: Yoshihiro Igarashi
-
Publication number: 20120185237
Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as the office, home, or vehicle, while maintaining the accuracy of the speech recognition.
Type: Application
Filed: March 26, 2012
Publication date: July 19, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
-
Publication number: 20120173232
Abstract: An acoustic processing apparatus is provided. The acoustic processing apparatus includes a first extracting unit configured to extract a first acoustic model that corresponds with a first position among positions set in a speech recognition target area, a second extracting unit configured to extract at least one second acoustic model that corresponds with, respectively, at least one second position in proximity to the first position, and an acoustic model generating unit configured to generate a third acoustic model based on the first acoustic model, the second acoustic model, or a combination thereof.
Type: Application
Filed: July 28, 2011
Publication date: July 5, 2012
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Nam-Hoon Kim, Jeong-Su Kim, Jeong-Mi Cho
-
Publication number: 20120123780
Abstract: A video summary method comprises dividing a video into a plurality of video shots, analyzing each frame in a video shot from the plurality of video shots, determining a saliency of each frame of the video shot, determining a key frame of the video shot based on the saliency of each frame of the video shot, extracting visual features from the key frame, and performing shot clustering of the plurality of video shots to determine concept patterns based on the visual features. The method further comprises fusing different concept patterns using a saliency tuning method and generating a summary of the video based upon a global optimization method.
Type: Application
Filed: November 15, 2011
Publication date: May 17, 2012
Applicant: FutureWei Technologies, Inc.
Inventors: Jizhou Gao, Yu Huang, Hong Heather Yu
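The key-frame step can be illustrated in a few lines: score every frame of a shot for saliency and keep the argmax. The saliency measure below (pixel variance) is only an assumed stand-in for the patent's measure.

```python
import numpy as np

def key_frame_index(shot_frames):
    # Assumed saliency stand-in: per-frame pixel variance.
    saliency = [float(np.var(f)) for f in shot_frames]
    return int(np.argmax(saliency))

rng = np.random.default_rng(3)
shot = [rng.integers(0, 256, size=(8, 8)) for _ in range(5)]  # toy gray frames
print(key_frame_index(shot))   # index of the most salient frame in the shot
```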
-
Publication number: 20120116774
Abstract: An implantable system (11) for control of and communication with an implant (17) in a body, comprising a command input device (12) and a processing device (13) coupled thereto, the processing device (13) being adapted to generate input to a command generator (16) which is comprised in the system (11), coupled to the processing device (13), and adapted to generate and communicate commands to the medical implant (17) in response to input received from the processing device (13), the system (11) further comprising a memory unit (15) connected to at least one of said devices in the system (11) for storing a memory bank of commands. The command input device (12) is adapted to receive commands from a user as voice commands, and the processing device (13) comprises a filter adapted to filter voice commands against high frequency losses and frequency distortion caused by the mammal body (10).
Type: Application
Filed: July 19, 2010
Publication date: May 10, 2012
Applicant: MILUX HOLDING SA
Inventor: Peter Forsell
-
Publication number: 20120116766
Abstract: A method and apparatus combining the advantages of phonetic search, such as rapid implementation and deployment and medium accuracy, with the advantages of speech-to-text, including providing the full text of the audio and rapid search. The method and apparatus comprise steps or components for receiving the audio signal captured in the call center environment; extracting a multiplicity of feature vectors from the audio signal; creating a phoneme lattice from the multiplicity of feature vectors, the phoneme lattice comprising one or more allophones, each allophone comprising two or more phonemes; creating a hybrid phoneme-word lattice from the phoneme lattice; and extracting the word by analyzing the hybrid phoneme-word lattice.
Type: Application
Filed: November 7, 2010
Publication date: May 10, 2012
Applicant: Nice Systems Ltd.
Inventors: Moshe Wasserblat, Ronen Laperdon, Dori Shapira
-
Publication number: 20120116765
Abstract: A speech recognition unit (102) includes a phrase determination unit (103) which determines a phrase boundary based on the comparison between the hypothetical word group generated by speech recognition and set words representing phrase boundaries. In this speech processing device, the speech recognition unit (102) outputs recognition results for each phrase based on a phrase boundary determined by the phrase determination unit (103).
Type: Application
Filed: June 4, 2010
Publication date: May 10, 2012
Applicant: NEC CORPORATION
Inventors: Ken Hanazawa, Seiya Osada, Takayuki Arakawa, Koji Okabe, Daisuke Tanaka
-
Publication number: 20120095762
Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting each frame of the first speech into a frame of the second speech while also reflecting at least one of the frames positioned before that frame of the first speech.
Type: Application
Filed: October 19, 2011
Publication date: April 19, 2012
Applicants: SEOUL NATIONAL UNIVERSITY INDUSTRY FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.
Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
-
Publication number: 20120089393
Abstract: A highlight section including an exciting scene is appropriately extracted with a smaller amount of processing. A reflection coefficient calculating unit (12) calculates a parameter (reflection coefficient) representing a slope of the spectrum distribution of the input audio signal for each frame. A reflection coefficient comparison unit (13) calculates the amount of change in the reflection coefficients between adjacent frames, and compares the calculation result with a predetermined threshold. An audio signal classifying unit (14) classifies the input audio signal into a background noise section and a speech section based on the comparison result. A background noise level calculating unit (15) calculates the level of background noise in the background noise section based on signal energy in the background noise section. An event detecting unit (16) detects an event occurrence point from a sharp increase in the background noise level.
Type: Application
Filed: June 2, 2010
Publication date: April 12, 2012
Inventor: Naoya Tanaka
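For a speech frame, the first reflection coefficient is commonly approximated by the normalized lag-1 autocorrelation, which tracks spectral slope. The sketch below uses that approximation and an assumed change threshold to separate steady background noise from speech-like frames.

```python
import numpy as np

def reflection_coefficient(frame):
    # First reflection coefficient ~ normalized lag-1 autocorrelation.
    r0 = float(np.dot(frame, frame))
    r1 = float(np.dot(frame[:-1], frame[1:]))
    return r1 / r0 if r0 > 0 else 0.0

def classify_frames(frames, change_threshold=0.3):
    k = [reflection_coefficient(f) for f in frames]
    labels = ["background"]                 # first frame has no delta yet
    for prev, cur in zip(k, k[1:]):
        labels.append("speech" if abs(cur - prev) > change_threshold
                      else "background")
    return labels

rng = np.random.default_rng(2)
frames = [rng.standard_normal(160) for _ in range(3)]       # noise-like frames
frames.append(np.sin(2 * np.pi * 0.05 * np.arange(160)))    # voiced-like frame
print(classify_frames(frames))  # typically 3 x 'background', then 'speech'
```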
-
Publication number: 20120059656
Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
Type: Application
Filed: August 30, 2011
Publication date: March 8, 2012
Applicant: Nexidia Inc.
Inventors: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
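The abstract leaves the weighting function open; a TF-IDF-style weight with cosine comparison is one natural choice, and that is what the sketch below assumes. The phoneme-sequence notation and the idf table are invented for illustration.

```python
import math
from collections import Counter

def weighted_frequencies(phoneme_seqs, idf):
    tf = Counter(phoneme_seqs)
    total = sum(tf.values())
    # Frequency of occurrence scaled by an inverse-document-frequency weight.
    return {seq: (count / total) * idf.get(seq, 1.0)
            for seq, count in tf.items()}

def similarity(source_a, source_b, idf):
    wa = weighted_frequencies(source_a, idf)
    wb = weighted_frequencies(source_b, idf)
    dot = sum(wa.get(k, 0.0) * wb.get(k, 0.0) for k in set(wa) | set(wb))
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

idf = {"AH-B": 2.0, "K-AE-T": 3.5, "DH-AH": 0.5}   # invented corpus statistics
a = ["DH-AH", "K-AE-T", "K-AE-T", "AH-B"]          # phoneme sequences, source 1
b = ["DH-AH", "K-AE-T", "AH-B", "AH-B"]            # phoneme sequences, source 2
print(round(similarity(a, b, idf), 3))             # similarity score in [0, 1]
```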
-
Publication number: 20120054054
Abstract: An information providing system is disclosed. The system includes a management apparatus having a database storing facility-event information on facilities or events, and a portable apparatus capable of communicating with the management apparatus. The portable apparatus sets each place at which a movement stop time exceeds a prescribed staying time as a stay place. The management apparatus estimates that the living area of the user of the portable apparatus is a collective area covering all of the stay places of the user. The management apparatus sets the collective area as the search scope, extracts the facility-event information matching the search scope from the database, and transmits the extracted facility-event information to the portable apparatus.
Type: Application
Filed: August 24, 2011
Publication date: March 1, 2012
Applicant: DENSO CORPORATION
Inventor: Shogo Kameyama
-
Patent number: 8123615
Abstract: Herein disclosed is a gaming machine executing a game and paying out a predetermined amount of credits according to a game result; generating voice data based on a player's voice; identifying a voice pattern corresponding to the voice data by retrieving the dialogue voice database and identifying a type of voice corresponding to the voice data, so as to store the voice data along with the voice pattern into the memory; calculating a value indicative of a game result, and updating the play history data stored in the memory using the result of the calculation; comparing the play history data thus updated with predetermined threshold value data; generating voice data according to the voice pattern based on the play history data if the play history data thus updated exceeds the predetermined threshold value data; and outputting voices from the speaker.
Type: Grant
Filed: January 23, 2009
Date of Patent: February 28, 2012
Assignee: Aruze Gaming America, Inc.
Inventor: Kazuo Okada
-
Publication number: 20120041762
Abstract: An apparatus and method for tracking dialogue and other sound signals in film, television or other systems with multiple-channel sound is described. One or more audio channels which are expected to carry the speech of persons appearing in the program, or other particular types of sounds, are inspected to determine if the channel's audio includes particular sounds such as MUEVs, including phonemes corresponding to human speech patterns. If an improper number of particular sounds such as phonemes are found in the channel(s), an action such as a report, an alarm, a correction, or other action is taken. The inspection of the audio channel(s) may be made in conjunction with the appearance of corresponding images associated with the sound, such as visemes in the video signal, to improve the determination of types of sounds such as phonemes.
Type: Application
Filed: December 7, 2010
Publication date: February 16, 2012
Applicant: Pixel Instruments Corporation
Inventors: J. Carl Cooper, Mirko Vojnovic, Christopher Smith
-
Publication number: 20120035932
Abstract: In one implementation, a computer-implemented method includes receiving, at a mobile computing device, ambiguous user input that indicates more than one of a plurality of commands; and determining a current context associated with the mobile computing device that indicates where the mobile computing device is currently located. The method can further include disambiguating the ambiguous user input by selecting a command from the plurality of commands based on the current context associated with the mobile computing device; and causing output associated with performance of the selected command to be provided by the mobile computing device.
Type: Application
Filed: August 6, 2010
Publication date: February 9, 2012
Applicant: GOOGLE INC.
Inventors: John Nicholas Jitkoff, Michael J. LeBeau
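A minimal sketch of the disambiguation step, assuming a lookup table that maps an ambiguous utterance to (context, command) pairs; the contexts, commands, and fallback rule are invented for illustration.

```python
def disambiguate(utterance, current_context, command_table):
    candidates = command_table.get(utterance, [])
    for context, command in candidates:
        if context == current_context:        # context resolves the ambiguity
            return command
    return candidates[0][1] if candidates else None  # fall back to the first

command_table = {
    "play": [("car", "play_radio"), ("home", "play_tv")],
}
print(disambiguate("play", "car", command_table))   # -> play_radio
```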
-
Publication number: 20120008802
Abstract: A voice detection system and method for automatic volume controls and voice sensors are disclosed. More specifically, the invention addresses a situation where the user's own voice undesirably affects the functionality of an automatic volume control (AVC) for a two-way communication device, such as a cellular telephone. The invention proposes solutions for the case where one (voice) microphone is employed and also for the case where two (voice and noise) microphones are employed. Further, an algorithm is disclosed that addresses the issue of the user's own voice in an AVC for the two-microphone solution. Yet further, a method is disclosed that detects the presence of voice in a single non-selective (noise) microphone.
Type: Application
Filed: January 3, 2011
Publication date: January 12, 2012
Inventor: Franklin S. Felber
-
Publication number: 20120008875
Abstract: The present invention pertains to a method and a communication device (100) for associating a contact record pertaining to a remote speaker (220) with a mnemonic image (191) based on attributes of the speaker (220). The method comprises receiving voice data of the speaker (220) in a communication session with a source device (200). A source determination representing the speaker (220) is registered, and the received voice data is then analyzed so that voice data characteristics can be extracted. Based on these voice data characteristics, a mnemonic image (191) can be selected and associated with the contact record in which the source determination is stored. The mnemonic image (191) may be selected among images previously stored in the device, or derived through editing of such images.
Type: Application
Filed: May 17, 2011
Publication date: January 12, 2012
Applicant: SONY ERICSSON MOBILE COMMUNICATIONS AB
Inventor: Joakim Martensson
-
Publication number: 20110282663
Abstract: A method of and system for transient noise rejection for improved speech recognition. The method comprises the steps of (a) receiving audio including user speech and at least some transient noise associated with the speech, (b) converting the received audio into digital data, (c) segmenting the digital data into acoustic frames, and (d) extracting acoustic feature vectors from the acoustic frames. The method also comprises the steps of (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis, (f) rejecting those acoustic frames having transient noise, (g) accepting as speech frames those acoustic frames having no transient noise, and, thereafter, (h) recognizing the user speech using the speech frames.
Type: Application
Filed: May 13, 2010
Publication date: November 17, 2011
Applicant: GENERAL MOTORS LLC
Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
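Steps (e)-(g) amount to a per-frame gate. The sketch below uses an assumed detector, an energy spike relative to the median frame energy, in place of the patent's unspecified transient test.

```python
import numpy as np

def accept_speech_frames(frames, spike_ratio=4.0):
    # Frames whose short-time energy spikes well above the running baseline
    # are treated as transient noise and rejected before recognition.
    energies = np.array([float(np.dot(f, f)) for f in frames])
    baseline = np.median(energies)
    return [f for f, e in zip(frames, energies) if e <= spike_ratio * baseline]

rng = np.random.default_rng(4)
frames = [rng.standard_normal(160) for _ in range(5)]
frames[2] = frames[2] * 10.0             # simulated transient (e.g., door slam)
print(len(accept_speech_frames(frames)))  # -> 4 accepted speech frames
```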
-
Publication number: 20110257976
Abstract: Speech recognition includes structured modeling, irrelevant variability normalization, and unsupervised online adaptation of speech recognition parameters.
Type: Application
Filed: April 14, 2010
Publication date: October 20, 2011
Applicant: Microsoft Corporation
Inventor: Qiang Huo
-
Publication number: 20110251844
Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences, and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as not to be used by the retrained model.
Type: Application
Filed: June 20, 2011
Publication date: October 13, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
-
Publication number: 20110231191
Abstract: A weight coefficient generation device, a speech recognition device, a navigation system, a vehicle, a weight coefficient generation method, and a weight coefficient generation program are provided for the purpose of improving speech recognition performance on place names. To this end, an address database 12 has address information data items including country names, city names, street names, and house numbers, and manages the address information in a tree structure indicating hierarchical relationships between the place names from a wide area to a narrow area. Each of the place names stored in the address database 12 is taken as a speech recognition candidate. A weight coefficient calculation unit 11 of a weight coefficient generation device 10 calculates a weight coefficient for the likelihood of each recognition candidate based on the number of street names belonging to the lower hierarchy below the city names.
Type: Application
Filed: November 17, 2009
Publication date: September 22, 2011
Inventor: Toshiyuki Miyazaki
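An assumed form of this weighting: bias each city candidate's recognizer score by the (log) count of street names beneath it in the address tree, so that cities with many streets are favored. The data and the log form below are illustrative, not from the publication.

```python
import math

address_tree = {                      # city -> streets in the lower hierarchy
    "Springfield": ["Main St", "Oak Ave", "Elm St", "2nd St"],
    "Smallville": ["Main St"],
}

def weight_coefficient(city, tree):
    return math.log(1 + len(tree.get(city, [])))

def rerank(candidates, tree):
    # candidates: (city, acoustic log-likelihood) pairs from the recognizer
    return sorted(((city, score + weight_coefficient(city, tree))
                   for city, score in candidates),
                  key=lambda pair: pair[1], reverse=True)

print(rerank([("Smallville", -4.0), ("Springfield", -4.6)], address_tree))
# Springfield's many streets lift it above the acoustically closer Smallville.
```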
-
Publication number: 20110224982
Abstract: Described is a technology in which information retrieval (IR) techniques are used in a speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are found from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words).
Type: Application
Filed: March 12, 2010
Publication date: September 15, 2011
Applicant: Microsoft Corporation
Inventors: Alejandro Acero, James Garnet Droppo, III, Xiaoqiang Xiao, Geoffrey G. Zweig
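A toy illustration of TF-IDF-based retrieval over acoustic units: vocabulary words are indexed as "documents" of phone units, the decoded unit stream is the query, and the highest-scoring word is the output. The unit inventory and vocabulary are invented for illustration.

```python
import math
from collections import Counter

vocab = {                      # word -> its phone-unit "document"
    "speech": ["S", "P", "IY", "CH"],
    "peach":  ["P", "IY", "CH"],
    "beach":  ["B", "IY", "CH"],
}
df = Counter(u for units in vocab.values() for u in set(units))
idf = {u: math.log(len(vocab) / df[u]) for u in df}

def tfidf(units):
    tf = Counter(units)
    return {u: tf[u] * idf.get(u, 0.0) for u in tf}

def retrieve(decoded_units):
    # Score each vocabulary word by the TF-IDF dot product with the query.
    query = tfidf(decoded_units)
    scores = {word: sum(query.get(u, 0.0) * w for u, w in tfidf(units).items())
              for word, units in vocab.items()}
    return max(scores, key=scores.get)

print(retrieve(["S", "P", "IY", "CH"]))   # -> 'speech'
```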
-
Patent number: 8019593
Abstract: Embodiments of a feature generation system and process for use in machine learning applications utilizing statistical modeling systems are described. In one embodiment, the feature generation process generates large feature spaces by combining features using logical, arithmetic and/or functional operations. A first set of features in an initial feature space is defined. Some or all of the first set of features are processed using one or more arithmetic, logic, or user-defined combinatorial processes, or combinations thereof, to produce additional features. The additional features and at least some of the first set of features are combined to produce an expanded feature space. The expanded feature space is processed through a feature selection and optimization process to produce a model in a statistical modeling system.
Type: Grant
Filed: June 30, 2006
Date of Patent: September 13, 2011
Assignee: Robert Bosch Corporation
Inventors: Fuliang Weng, Zhe Feng, Qi Zhang
-
Patent number: 8019594
Abstract: Embodiments of a progressive feature selection method that selects features in multiple rounds are described. In one embodiment, the progressive feature selection method splits the feature space into tractable sub-spaces such that a feature selection algorithm can be performed on each sub-space. In a merge-split operation, the subsets of features that the feature selection algorithm selects from the different sub-spaces are merged into subsequent sets of features. Instead of re-generating the mapping table for each subsequent set from scratch, a new mapping table is created from the previous round's tables by collecting those entries that correspond to the selected features. The feature selection method is then performed again on each of the subsequent feature sets, and new features are selected from each of these feature sets. This feature selection-merge-split process is repeated on successively smaller numbers of feature sets until a single final set of features is selected.
Type: Grant
Filed: June 30, 2006
Date of Patent: September 13, 2011
Assignee: Robert Bosch Corporation
Inventors: Fuliang Weng, Zhe Feng, Qi Zhang
-
Publication number: 20110213615
Abstract: A method for configuring a voice authentication system comprises ascertaining a measure of confidence associated with a voice sample enrolled with the authentication system. The measure of confidence is derived through simulated impostor testing carried out on the enrolled sample.
Type: Application
Filed: September 7, 2009
Publication date: September 1, 2011
Applicant: Auraya Pty Ltd
Inventors: Clive Summerfield, Habib E. Talhami
-
Publication number: 20110213611
Abstract: A method and a device control the transport of an object to a predetermined destination. The object is provided with information on a destination to which the object is to be transported. The destination information with which the object is provided is input into a speech detection station. A speech recognition system evaluates the destination information detected by the speech detection station, and its evaluation result is used to determine the destination indicated on the object. A release signal is then produced which triggers two processes: the speech detection station is released for the input of destination information on another object, and a conveying device transports the object to the determined destination.
Type: Application
Filed: August 28, 2009
Publication date: September 1, 2011
Applicant: SIEMENS AKTIENGESELLSCHAFT
Inventor: Ingolf Rauh
-
Publication number: 20110208525
Abstract: A voice recognizing apparatus includes a voice start instructing section 3 for instructing to start voice recognition; a voice input section 1 for receiving uttered voice and converting it to a voice signal; a voice recognizing section 2 for recognizing the voice on the basis of the voice signal; an utterance start time detecting section 4 for detecting the duration from the time when the voice start instructing section instructs to the time when the voice input section delivers the voice signal; an utterance timing deciding section 5 for deciding utterance timing, indicating whether the utterance start is quick or slow, by comparing the duration detected by the utterance start time detecting section with a prescribed threshold; an interaction control section 6 for determining a content, which is to be shown when exhibiting a recognition result of the voice recognizing section, in accordance with the utterance timing decided; and a system response generating section 7 for generating a system response on the basis of the d…
Type: Application
Filed: March 27, 2008
Publication date: August 25, 2011
Inventors: Yuzuru Inoue, Tadashi Suzuki, Fumitaka Sato, Takayoshi Chikuri
-
Publication number: 20110208527
Abstract: A voice-activatable system for providing the correct spelling of a spoken word is disposed in an elongated body of a writing instrument such as a ballpoint pen. The system includes a microphone, the output of which is fed to an amplifier and an analog-to-digital converter and from there to a speech recognition program; the output of the speech recognition program is fed to a computer, namely a word processor/controller that includes a database. The output of the speech recognition is compared with the digital library of words, and when a match is found, it is amplified and fed to a digital-to-analog converter. The output of the digital-to-analog converter is fed to a speaker that repeats the word with the correct pronunciation, followed by the correct spelling of the word. The system includes a battery for powering the system as well as an on/off switch and a repeat button for repeating information from the system.
Type: Application
Filed: February 23, 2010
Publication date: August 25, 2011
Inventor: Fawzi Q. Behbehani
-
Publication number: 20110184730
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing voice commands. In one aspect, a method includes receiving an audio signal at a server; performing, by the server, speech recognition on the audio signal to identify one or more candidate terms that match one or more portions of the audio signal; identifying one or more possible intended actions for each candidate term; providing information for display on a client device, the information specifying the candidate terms and the actions for each candidate term; receiving from the client device an indication of an action selected by a user, where the action was selected from among the actions included in the provided information; and invoking the action selected by the user.
Type: Application
Filed: January 22, 2010
Publication date: July 28, 2011
Applicant: GOOGLE INC.
Inventors: Michael J. LeBeau, William J. Byrne, Nicholas Jitkoff, Alexander H. Gruenstein
-
Publication number: 20110178799
Abstract: Methods and systems of identifying speech sound features within a speech sound are provided. The sound features may be identified using a multi-dimensional analysis that analyzes the time, frequency, and intensity at which a feature occurs within a speech sound, and the contribution of the feature to the sound. Information about sound features may be used to enhance spoken speech sounds to improve recognizability of the speech sounds by a listener.
Type: Application
Filed: July 24, 2009
Publication date: July 21, 2011
Applicant: The Board of Trustees of the University of Illinois
Inventors: Jont B. Allen, Feipeng Li
-
Publication number: 20110161084
Abstract: An apparatus, method and system for generating a threshold for utterance verification are introduced herein. When a processing object is determined, a recommendation threshold is generated according to an expected utterance verification result. In addition, no extra collection of corpora or training of models is necessary for the utterance verification introduced here. The processing object can be a recognition object or an utterance verification object. In the apparatus, method and system for generating a threshold for utterance verification, at least one processing object is received and a speech unit sequence is generated from it. One or more values corresponding to each speech unit of the speech unit sequence are obtained accordingly, and a recommendation threshold is then generated based on an expected utterance verification result.
Type: Application
Filed: June 24, 2010
Publication date: June 30, 2011
Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
Inventors: Cheng-Hsien Lin, Sen-Chia Chang, Chi-Tien Chiu
-
Publication number: 20110125498
Abstract: One embodiment of the invention provides a computer-implemented method of handling a telephone call. The method comprises monitoring a conversation between an agent and a customer on a telephone line as part of the telephone call to extract the audio signal therefrom. Real-time voice analytics are performed on the extracted audio signal while the telephone call is in progress. The results from the voice analytics are then passed to the computer-telephony integration system responsible for the call, for use in determining future handling of the call.
Type: Application
Filed: June 19, 2009
Publication date: May 26, 2011
Applicant: NEWVOICEMEDIA LTD
Inventors: Richard Pickering, Joseph Moussalli, Ashley Unitt
-
Publication number: 20110109539
Abstract: A behavior recognition system and method combining image and speech are provided. The system includes a data analyzing module, a database, and a calculating module. A plurality of image-and-speech relation modules is stored in the database. Each image-and-speech relation module includes a feature extraction parameter and an image-and-speech relation parameter. The data analyzing module obtains a gesture image and speech data corresponding to each other, and substitutes the gesture image and the speech data into each feature extraction parameter to generate image feature sequences and speech feature sequences. The data analyzing module uses each image-and-speech relation parameter to calculate image-and-speech status parameters.
Type: Application
Filed: December 9, 2009
Publication date: May 12, 2011
Inventors: Chung-Hsien Wu, Jen-Chun Lin, Wen-Li Wei, Chia-Te Chu, Red-Tom Lin, Chin-Shun Hsu
-
Publication number: 20110093261
Abstract: Systems and methods are operable to associate each of a plurality of stored audio patterns with at least one of a plurality of digital tokens; identify a user based on user identification input; access a plurality of stored audio patterns associated with the user based on the user identification input; receive from the user at least one audio input from a custom language made up of custom language elements, wherein the elements include at least one monosyllabic representation of a number, letter or word; select one of the plurality of stored audio patterns associated with the identified user, in the case that the audio input received from the identified user corresponds with one of the plurality of stored audio patterns; determine the digital token associated with the selected one of the plurality of stored audio patterns; and generate the output signal for use in a device based on the determined digital token.
Type: Application
Filed: October 15, 2010
Publication date: April 21, 2011
Inventor: Paul Angott
-
Publication number: 20110082694
Abstract: A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.
Type: Application
Filed: August 9, 2010
Publication date: April 7, 2011
Inventors: Richard Fastow, Qamrul Hasan
-
Publication number: 20110082697
Abstract: A method is described for correcting and improving the functioning of certain devices for the diagnosis and treatment of speech that dynamically measure the functioning of the velum in the control of nasality during speech. The correction method uses an estimate of the vowel frequency spectrum to greatly reduce the variation of nasalance with the vowel being spoken, so as to result in a corrected value of nasalance that reflects the degree of velar opening with greater accuracy. A correction is also described for reducing the effect on nasalance values of energy from the oral and nasal channels crossing over into the other channel because of imperfect acoustic separation.
Type: Application
Filed: October 6, 2009
Publication date: April 7, 2011
Applicant: Rothenberg Enterprises
Inventor: Martin Rothenberg
-
Publication number: 20110077944
Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
Type: Application
Filed: November 30, 2009
Publication date: March 31, 2011
Applicant: BROADCOM CORPORATION
Inventor: Nambirajan Seshadri
-
Publication number: 20110071830
Abstract: The present invention provides a combined lip reading and voice recognition multimodal interface system, which can issue a navigation operation instruction using only voice and lip movements, thus allowing a driver to look ahead during a navigation operation and reducing vehicle accidents related to navigation operations during driving.
Type: Application
Filed: December 1, 2009
Publication date: March 24, 2011
Applicants: HYUNDAI MOTOR COMPANY, KIA MOTORS CORPORATION
Inventors: Dae Hee Kim, Dai-Jin Kim, Jin Lee, Jong-Ju Shin, Jin-Seok Lee