Feature Extraction For Speech Recognition; Selection Of Recognition Unit (epo) Patents (Class 704/E15.004)
  • Patent number: 11955125
    Abstract: The present disclosure provides an operation method for a smart speaker. The method includes the following steps. The smart speaker preloads the linked settings among voiceprint registration data, user information, and a cast setting of a user device. Wake-up words are received to set an operation mode of the smart speaker and to generate a voiceprint recognition result. In the operation mode, after voice is received, it is converted into voice text and the voiceprint recognition result is compared with the voiceprint registration data. When the voiceprint recognition result matches the voiceprint registration data, the user information and the voice text are transmitted to a cloud server, which returns a response message to the smart speaker. According to the cast setting, the response message is sent to the user device (a minimal sketch of this match-and-cast flow follows this entry).
    Type: Grant
    Filed: October 15, 2019
    Date of Patent: April 9, 2024
    Assignee: AmTRAN Technology Co., Ltd.
    Inventors: Che-Chia Ho, Chia-Wei Lin
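    A minimal Python sketch of the match-and-cast flow described in this abstract; the cosine-similarity voiceprint comparison, the threshold, and the cloud/cast callables are illustrative assumptions, not the patent's actual design.
    ```python
    import numpy as np

    def cosine_similarity(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def handle_utterance(live_print, enrolled_print, user_info, voice_text,
                         cloud_query, cast_to_device, threshold=0.75):
        # Compare the live voiceprint with the enrolled one; only on a match
        # are the user info and recognized text forwarded to the cloud server.
        if cosine_similarity(live_print, enrolled_print) < threshold:
            return None  # no match: nothing leaves the device
        response = cloud_query(user_info, voice_text)
        cast_to_device(user_info["cast_setting"], response)  # per linked settings
        return response

    # Toy usage with stand-in cloud/cast callables:
    handle_utterance(
        np.ones(8), np.full(8, 0.9),  # same direction -> similarity 1.0: match
        {"name": "alice", "cast_setting": "living-room-tv"}, "what's the weather",
        cloud_query=lambda info, text: f"answer for {info['name']}: {text}",
        cast_to_device=lambda target, msg: print(f"cast to {target}: {msg}"),
    )
    ```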
  • Patent number: 11948575
    Abstract: Implementations set forth herein relate to an automated assistant that uses circumstantial condition data, generated based on circumstantial conditions of an input, to determine whether the input should affect an action that was initialized by a particular user. The automated assistant can allow each user to manipulate their respective ongoing action without interruptions to solicit explicit user authentication. For example, when an individual in a group of persons interacts with the automated assistant to initialize or affect a particular ongoing action, the automated assistant can generate data that correlates that individual to the particular ongoing action. The data can be generated using a variety of different input modalities, which can be dynamically selected based on changing circumstances of the individual. Therefore, different sets of input modalities can be processed each time a user provides an input for modifying an ongoing action and/or initializing another action.
    Type: Grant
    Filed: January 12, 2023
    Date of Patent: April 2, 2024
    Assignee: GOOGLE LLC
    Inventors: Andrew Gallagher, Caroline Pantofaru, Vinay Bettadapura, Utsav Prabhu
  • Patent number: 11694444
    Abstract: Aspects of the present disclosure involve a system comprising a computer-readable storage medium storing a program and method for setting ad breakpoints in a video. The program and method provide for accessing a video; determining plural shot boundaries for the video, each shot boundary defining a shot corresponding to a contiguous sequence of video frames that is free of cuts or transitions; and, for each shot boundary, performing a set of breakpoint tests, each test configured to return a respective score indicating whether the shot boundary corresponds to a breakpoint for potential insertion of an ad during playback of the video, calculating a combined score for the shot boundary by combining each of the respective scores, and setting the shot boundary as the breakpoint when the combined score meets a threshold value (a sketch of the combined-score test follows this entry).
    Type: Grant
    Filed: April 5, 2021
    Date of Patent: July 4, 2023
    Assignee: Snap Inc.
    Inventors: Khalil Chatoo, David Michael Hornsby, Jeffrey Kile, Chinmay Lonkar, Zhimin Wang, Ian Anthony Wehrman
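    A short sketch of the per-boundary scoring loop: each breakpoint test returns a score, the scores are combined (a weighted average here, one plausible reading of "combining"), and the boundary becomes a breakpoint if the combined score meets the threshold. The test functions, weights, and threshold are illustrative assumptions.
    ```python
    def is_breakpoint(boundary, tests, weights, threshold=0.6):
        # Run every breakpoint test on the shot boundary and combine the scores.
        scores = [test(boundary) for test in tests]
        combined = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
        return combined >= threshold, combined

    # Stand-in tests scoring a boundary on silence and on visual change:
    tests = [
        lambda b: 1.0 if b["audio_rms"] < 0.01 else 0.0,   # quiet audio
        lambda b: min(1.0, b["histogram_delta"]),          # large visual cut
    ]
    ok, score = is_breakpoint({"audio_rms": 0.005, "histogram_delta": 0.9},
                              tests, weights=[1.0, 2.0])
    print(ok, round(score, 3))  # True 0.933
    ```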
  • Patent number: 11562740
    Abstract: In one aspect, a network microphone device (NMD) includes a plurality of microphones and is configured to capture a voice input via one or more of the microphones, detect a wake word in the voice input, transmit data associated with the voice input to one or more remote computing devices associated with a voice assistant service, and receive a response from the one or more remote computing devices, the response comprising a playback command based on the voice input. The NMD may be configured to obtain verification information characterizing the voice input and, based on the verification information indicating that the voice input was spoken by an unverified user, functionally disable the NMD from performing the playback command.
    Type: Grant
    Filed: January 7, 2020
    Date of Patent: January 24, 2023
    Assignee: Sonos, Inc.
    Inventor: Connor Kristopher Smith
  • Publication number: 20140012575
    Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether the meaning of any alternative recognition result differs from the meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets; such sets include words and/or phrases that change the meaning of a phrase or sentence when substituted into it (a sketch follows this entry).
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
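    A sketch of one way to flag a significant difference in meaning between the top hypothesis and an alternative, using hand-made confusable sets; the sets and the word-swap test are illustrative assumptions, not Nuance's actual method.
    ```python
    # Hypothetical confusable sets for a medical domain.
    CONFUSABLE = [{"hypertension", "hypotension"}, {"mg", "mcg"}]

    def has_significant_error(top, alternatives):
        # Flag the result when an alternative swaps in a confusable word,
        # i.e., its meaning may differ significantly from the top result.
        top_words = set(top.lower().split())
        for alt in alternatives:
            alt_words = set(alt.lower().split())
            for pair in CONFUSABLE:
                if top_words & pair and (alt_words & pair) - top_words:
                    return True
        return False

    print(has_significant_error("patient has hypertension",
                                ["patient has hypotension"]))  # True
    ```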
  • Publication number: 20130297307
    Abstract: A dictation module is described herein which receives and interprets a complete utterance of the user in incremental fashion, that is, one incremental portion at a time. The dictation module also provides rendered text in incremental fashion. The rendered text corresponds to the dictation module's interpretation of each incremental portion. The dictation module also allows the user to modify any part of the rendered text as it becomes available. In one case, for instance, the dictation module provides a marking menu which includes multiple options by which a user can modify a selected part of the rendered text. The dictation module also uses the rendered text (as modified or left unmodified by the user via the marking menu) to adjust one or more models used by the dictation module to interpret the user's utterance.
    Type: Application
    Filed: May 1, 2012
    Publication date: November 7, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Timothy S. Paek, Bongshin Lee, Bo-June Hsu
  • Publication number: 20130080153
    Abstract: An information processing apparatus includes a receiving unit that receives character sequences, a sorting unit that sorts the received character sequences into known words and unknown words, and a detecting unit that detects character sequences sorted as unknown words as incorrect words, and that also detects, as incorrect words, a third character sequence located between a first and a second character sequence both sorted as unknown words, when the third character sequence consists of words sorted as known words and the number of those known words is less than or equal to a predetermined number (a sketch follows this entry).
    Type: Application
    Filed: January 6, 2012
    Publication date: March 28, 2013
    Applicant: FUJI XEROX CO., LTD.
    Inventor: Eiichi TANAKA
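    A sketch of the detection rule as described: unknown tokens are incorrect, and a short run of known tokens sandwiched between two unknown tokens is also marked incorrect when it contains at most a predetermined number of known words. The lexicon and the limit are illustrative.
    ```python
    def detect_incorrect(tokens, lexicon, max_known_between=2):
        unknown = [t not in lexicon for t in tokens]
        incorrect = list(unknown)          # unknown words are incorrect
        i = 0
        while i < len(tokens):
            if unknown[i]:
                j = i + 1
                while j < len(tokens) and not unknown[j]:
                    j += 1                 # scan the known-word run after i
                # tokens[i+1:j] sits between unknown tokens i and j
                if j < len(tokens) and (j - i - 1) <= max_known_between:
                    for k in range(i + 1, j):
                        incorrect[k] = True
                i = j
            else:
                i += 1
        return incorrect

    print(detect_incorrect(["qzx", "the", "wvv"], {"the", "cat"}))
    # [True, True, True]: the known word "the" is trapped between unknowns
    ```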
  • Publication number: 20130046540
    Abstract: A method for estimating high-order Mel frequency cepstral coefficients, the method comprising: initializing the N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector; synthesizing a speech signal frame from the candidate MFCC vector and a pitch value; and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector (a sketch follows this entry).
    Type: Application
    Filed: December 3, 2007
    Publication date: February 21, 2013
    Inventor: Alexander Sorin
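    A rough sketch of the candidate-synthesize-recompute chain under stated assumptions: librosa computes the MFCCs, and a crude harmonic synthesizer (inverse DCT to a log-mel envelope, excited at pitch harmonics) stands in for whatever synthesis model the patent actually uses.
    ```python
    import numpy as np
    from scipy.fftpack import idct
    import librosa

    def synth_frame(mfcc_vec, pitch_hz, sr, n_mels=40, frame_len=2048):
        # Invert the DCT to a log-mel envelope, then drive it with harmonics
        # of the pitch (a stand-in for the patent's synthesis model).
        log_mel = idct(mfcc_vec, n=n_mels, norm="ortho")
        mel_f = librosa.mel_frequencies(n_mels=n_mels, fmax=sr / 2)
        t = np.arange(frame_len) / sr
        frame = np.zeros(frame_len)
        for k in range(1, int(sr / 2 // pitch_hz)):
            amp = np.exp(np.interp(k * pitch_hz, mel_f, log_mel))
            frame += amp * np.sin(2 * np.pi * k * pitch_hz * t)
        return frame / (np.abs(frame).max() + 1e-9)

    def estimate_hoc(loc, n_total=24, pitch_hz=120.0, sr=16000, init_val=0.0):
        # Candidate vector: L given low-order coeffs + initialized high-order ones.
        candidate = np.concatenate([loc, np.full(n_total - len(loc), init_val)])
        frame = synth_frame(candidate, pitch_hz, sr)
        # Recompute a full N-dimensional MFCC vector from the synthesized frame.
        return librosa.feature.mfcc(y=frame, sr=sr, n_mfcc=n_total).mean(axis=1)

    out = estimate_hoc(np.linspace(-200.0, 10.0, 13))  # 13 LOC -> 24-dim output
    ```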
  • Publication number: 20130030794
    Abstract: According to one embodiment, a speaker clustering apparatus includes a clustering unit, an extraction unit, and an error detection unit. The clustering unit is configured to extract acoustic features for speakers from an acoustic signal and to cluster the utterances in the acoustic signal by speaker using the acoustic features. The extraction unit is configured to acquire character strings representing the contents of the utterances and to extract linguistic features of the speakers from the character strings. The error detection unit is configured to decide that an utterance was erroneously clustered when its character string does not fit the linguistic feature of the speaker into which the utterance was clustered.
    Type: Application
    Filed: March 6, 2012
    Publication date: January 31, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Tomoo Ikeda, Manabu Nagao, Osamu Nishiyama, Hirokazu Suzuki, Koji Ueno, Nobuhiro Shimogori
  • Publication number: 20120253812
    Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm (a simplified sketch follows this entry).
    Type: Application
    Filed: April 1, 2011
    Publication date: October 4, 2012
    Applicant: Sony Computer Entertainment Inc.
    Inventors: OZLEM KALINLI, Ruxin Chen
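    A simplified sketch: a log-spectrogram stands in for the auditory spectrum, two small 2-D kernels stand in for spectro-temporal receptive filters at different scales, and each resulting feature map is grid-averaged into a gist vector; the concatenation is the cumulative gist that would feed the boundary classifier. The filter shapes and grid size are illustrative assumptions.
    ```python
    import numpy as np
    from scipy.signal import spectrogram, convolve2d

    def cumulative_gist(x, sr, grid=(4, 5)):
        f, t, S = spectrogram(x, fs=sr, nperseg=256, noverlap=128)
        A = np.log(S + 1e-10)                      # stand-in auditory spectrum
        # Two toy spectro-temporal filters at different time/frequency scales.
        kernels = [np.outer(np.hanning(3), np.hanning(9)),
                   np.outer(np.hanning(9), np.hanning(3))]
        gist = []
        for h in kernels:
            fmap = convolve2d(A, h, mode="same")   # one multi-scale feature map
            # Average the map over a coarse grid -> per-map auditory gist vector.
            for rows in np.array_split(fmap, grid[0], axis=0):
                gist += [c.mean() for c in np.array_split(rows, grid[1], axis=1)]
        return np.asarray(gist)                    # feed to a boundary classifier

    g = cumulative_gist(np.random.randn(16000), 16000)
    print(g.shape)  # (40,): 2 maps x 4 x 5 grid cells
    ```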
  • Publication number: 20120253813
    Abstract: A speech segment determination device includes a frame division portion, a power spectrum calculation portion, a power spectrum operation portion, a spectral entropy calculation portion, and a determination portion. The frame division portion divides an input signal into frames. The power spectrum calculation portion calculates, using an analysis length, a power spectrum of the input signal for each of the divided frames. The power spectrum operation portion adds the value of the calculated power spectrum to the power spectrum value in each frequency bin. The spectral entropy calculation portion calculates spectral entropy using the power spectrum whose values have been increased in this way. The determination portion determines, based on the value of the spectral entropy, whether the input signal is a signal in a speech segment (a sketch follows this entry).
    Type: Application
    Filed: February 17, 2012
    Publication date: October 4, 2012
    Applicant: OKI ELECTRIC INDUSTRY CO., LTD.
    Inventor: Kazuhiro KATAGIRI
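    A compact sketch of the determination chain: frame division, per-frame power spectrum, a value added to every frequency bin (one reading of the abstract's "increased" spectrum), normalized spectral entropy, and a threshold decision; speech frames are spectrally peaky, hence low-entropy. Frame sizes, the floor value, and the threshold are illustrative.
    ```python
    import numpy as np

    def speech_flags(signal, frame_len=512, hop=256, floor=1e-6, threshold=0.9):
        window = np.hanning(frame_len)
        flags = []
        for start in range(0, len(signal) - frame_len + 1, hop):
            frame = signal[start:start + frame_len] * window
            ps = np.abs(np.fft.rfft(frame)) ** 2
            ps = ps + floor                        # raise every bin's value
            p = ps / ps.sum()                      # spectrum as a distribution
            entropy = -(p * np.log(p)).sum() / np.log(len(p))  # in [0, 1]
            flags.append(entropy < threshold)      # peaky spectrum -> speech-like
        return np.array(flags)

    noise = np.random.randn(8000)                  # flat spectrum: high entropy
    tone = np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)  # peaky: low entropy
    print(speech_flags(noise).mean(), speech_flags(tone).mean())
    ```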
  • Publication number: 20120245919
    Abstract: An automatic speech recognition (ASR) apparatus for an embedded device application is described. A speech decoder receives an input sequence of speech feature vectors in a first language and outputs an acoustic segment lattice representing a probabilistic combination of basic linguistic units in a second language. A vocabulary matching module compares the acoustic segment lattice to vocabulary models in the first language to determine an output set of probability-ranked recognition hypotheses. A detailed matching module compares the set of probability-ranked recognition hypotheses to detailed match models in the first language to determine a recognition output representing a vocabulary word most likely to correspond to the input sequence of speech feature vectors.
    Type: Application
    Filed: September 23, 2009
    Publication date: September 27, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Guillermo Aradilla, Rainer Gruhn
  • Publication number: 20120245941
    Abstract: A device can be configured to receive speech input from a user. The speech input can include a command for accessing a restricted feature of the device. The speech input can be compared to a voiceprint (e.g., a text-independent voiceprint) of the user's voice to authenticate the user to the device. Responsive to successful authentication, the user is allowed access to the restricted feature without having to perform additional authentication steps or speak the command again. If the user is not successfully authenticated, additional authentication steps can be requested by the device (e.g., requesting a password).
    Type: Application
    Filed: March 21, 2011
    Publication date: September 27, 2012
    Inventor: Adam J. Cheyer
  • Publication number: 20120215537
    Abstract: According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize a voice command after the keyword is detected by the keyword detection module, and to transmit an operation signal corresponding to the voice command.
    Type: Application
    Filed: September 21, 2011
    Publication date: August 23, 2012
    Inventor: Yoshihiro Igarashi
  • Publication number: 20120185237
    Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value by no more than a threshold amount. These techniques of dynamically re-configurable speech recognition allow speech recognition to be deployed on small devices such as mobile phones and personal digital assistants, as well as in environments such as the office, home, or vehicle, while maintaining the accuracy of the speech recognition.
    Type: Application
    Filed: March 26, 2012
    Publication date: July 19, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Bojana GAJIC, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
  • Publication number: 20120173232
    Abstract: An acoustic processing apparatus is provided. The apparatus includes a first extracting unit configured to extract a first acoustic model corresponding to a first position among positions set in a speech recognition target area, a second extracting unit configured to extract at least one second acoustic model corresponding to at least one second position in proximity to the first position, and an acoustic model generating unit configured to generate a third acoustic model based on the first acoustic model, the second acoustic model, or a combination thereof (a sketch follows this entry).
    Type: Application
    Filed: July 28, 2011
    Publication date: July 5, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam-Hoon KIM, Jeong-Su KIM, Jeong-Mi CHO
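    A sketch of one plausible way to realize the third model: take the model at the position nearest the query plus nearby second models, and blend their parameters with inverse-distance weights. Treating a model as a flat parameter vector (e.g., stacked Gaussian means) and the weighting scheme are illustrative assumptions.
    ```python
    import numpy as np

    def third_model(positions, models, query_pos, k=3):
        pts = np.asarray(positions, dtype=float)
        d = np.linalg.norm(pts - np.asarray(query_pos, dtype=float), axis=1)
        nearest = np.argsort(d)[:k]            # first model + nearby second models
        w = 1.0 / (d[nearest] + 1e-6)          # closer positions weigh more
        w /= w.sum()
        return sum(wi * models[i] for wi, i in zip(w, nearest))

    # Toy usage: three enrolled positions, each with a 4-dim parameter vector.
    positions = [(0, 0), (0, 10), (10, 0)]
    models = [np.zeros(4), np.ones(4), 2 * np.ones(4)]
    print(third_model(positions, models, query_pos=(1, 1)))
    ```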
  • Publication number: 20120123780
    Abstract: A video summary method comprises dividing a video into a plurality of video shots, analyzing each frame in a video shot from the plurality of video shots, determining a saliency of each frame of the video shot, determining a key frame of the video shot based on the saliency of each frame of the video shot, extracting visual features from the key frame and performing shot clustering of the plurality of video shots to determine concept patterns based on the visual features. The method further comprises fusing different concept patterns using a saliency tuning method and generating a summary of the video based upon a global optimization method.
    Type: Application
    Filed: November 15, 2011
    Publication date: May 17, 2012
    Applicant: FutureWei Technologies, Inc.
    Inventors: Jizhou Gao, Yu Huang, Hong Heather Yu
  • Publication number: 20120116774
    Abstract: An implantable system (11) for control of and communication with an implant (17) in a body, comprising a command input device (12) and a processing device (13) coupled thereto. The processing device (13) is adapted to generate input to a command generator (16), which is comprised in the system (11), coupled to the processing device (13), and adapted to generate and communicate commands to the medical implant (17) in response to input received from the processing device (13). The system (11) further comprises a memory unit (15), connected to at least one of said devices in the system (11), for storing a memory bank of commands. The command input device (12) is adapted to receive commands from a user as voice commands, and the processing device (13) comprises a filter adapted to correct voice commands for high-frequency losses and frequency distortion caused by the mammal body (10).
    Type: Application
    Filed: July 19, 2010
    Publication date: May 10, 2012
    Applicant: MILUX HOLDING SA
    Inventor: Peter Forsell
  • Publication number: 20120116766
    Abstract: A method and apparatus combining the advantages of phonetic search (rapid implementation and deployment, medium accuracy) with the advantages of speech-to-text (availability of the full text of the audio, rapid search). The method and apparatus comprise steps or components for receiving the audio signal captured in the call center environment; extracting a multiplicity of feature vectors from the audio signal; creating a phoneme lattice from the multiplicity of feature vectors, the phoneme lattice comprising one or more allophones, each allophone comprising two or more phonemes; creating a hybrid phoneme-word lattice from the phoneme lattice; and extracting words by analyzing the hybrid phoneme-word lattice.
    Type: Application
    Filed: November 7, 2010
    Publication date: May 10, 2012
    Applicant: Nice Systems Ltd.
    Inventors: Moshe WASSERBLAT, Ronen Laperdon, Dori Shapira
  • Publication number: 20120116765
    Abstract: A speech recognition unit (102) includes a phrase determination unit (103) that determines phrase boundaries by comparing the hypothesis word group generated by speech recognition against preset words representing phrase boundaries. In this speech processing device, the speech recognition unit (102) outputs recognition results for each phrase based on the phrase boundaries determined by the phrase determination unit (103).
    Type: Application
    Filed: June 4, 2010
    Publication date: May 10, 2012
    Applicant: NEC CORPORATION
    Inventors: Ken Hanazawa, Seiya Osada, Takayuki Arakawa, Koji Okabe, Daisuke Tanaka
  • Publication number: 20120095762
    Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting each frame of the first speech while taking into account at least one earlier-positioned frame of the first speech.
    Type: Application
    Filed: October 19, 2011
    Publication date: April 19, 2012
    Applicants: SEOUL NATIONAL UNIVERSITY INDUSTRY FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ki-wan EOM, Chang-woo HAN, Tae-gyoon KANG, Nam-soo KIM, Doo-hwa HONG, Jae-won LEE, Hyung-joon LIM
  • Publication number: 20120089393
    Abstract: A highlight section including an exciting scene is extracted with a smaller amount of processing. A reflection coefficient calculating unit (12) calculates a parameter (reflection coefficient) representing the slope of the spectrum distribution of the input audio signal for each frame. A reflection coefficient comparison unit (13) calculates the amount of change in the reflection coefficient between adjacent frames and compares the result with a predetermined threshold. An audio signal classifying unit (14) classifies the input audio signal into background noise sections and speech sections based on the comparison result. A background noise level calculating unit (15) calculates the background noise level from the signal energy in the background noise section. An event detecting unit (16) detects an event-occurrence point from a sharp increase in the background noise level (a sketch follows this entry).
    Type: Application
    Filed: June 2, 2010
    Publication date: April 12, 2012
    Inventor: Naoya Tanaka
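    A sketch of the chain under stated assumptions: the first-order reflection coefficient is computed as energy-normalized lag-1 autocorrelation (a standard proxy for spectral tilt), adjacent-frame changes split noise from speech, and the noise-section energy gives the background level whose sharp rises mark events. Frame sizes and the change threshold are illustrative.
    ```python
    import numpy as np

    def reflection_coeff(frame):
        # Lag-1 autocorrelation over energy: tracks the spectrum's overall slope.
        return np.dot(frame[:-1], frame[1:]) / (np.dot(frame, frame) + 1e-12)

    def classify(signal, frame_len=1024, hop=512, delta=0.2):
        frames = [signal[i:i + frame_len]
                  for i in range(0, len(signal) - frame_len + 1, hop)]
        k = np.array([reflection_coeff(f) for f in frames])
        # Small change between adjacent frames -> stationary background noise.
        labels = ["noise" if abs(dk) < delta else "speech" for dk in np.diff(k)]
        energy = [np.dot(f, f) / frame_len for f in frames[1:]]
        noise_level = [e for e, lab in zip(energy, labels) if lab == "noise"]
        return labels, noise_level   # a jump in noise_level marks an event point

    labels, levels = classify(np.random.randn(16000))
    ```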
  • Publication number: 20120059656
    Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparison (a sketch follows this entry).
    Type: Application
    Filed: August 30, 2011
    Publication date: March 8, 2012
    Applicant: Nexidia Inc.
    Inventors: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
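    A sketch of the similarity pipeline: per-source phoneme-sequence frequencies, weighting by a per-sequence factor (IDF-style weights are assumed here; the abstract only says "weighted"), and a cosine comparison producing the similarity score.
    ```python
    from collections import Counter
    import math

    def weighted_freqs(phoneme_seqs, idf):
        tf = Counter(phoneme_seqs)                 # frequency of occurrence
        total = sum(tf.values())
        return {g: (n / total) * idf.get(g, 1.0) for g, n in tf.items()}

    def similarity(source_a, source_b, idf):
        wa, wb = weighted_freqs(source_a, idf), weighted_freqs(source_b, idf)
        dot = sum(wa.get(k, 0.0) * wb.get(k, 0.0) for k in set(wa) | set(wb))
        na = math.sqrt(sum(v * v for v in wa.values()))
        nb = math.sqrt(sum(v * v for v in wb.values()))
        return dot / (na * nb + 1e-12)             # cosine similarity score

    idf = {"ah-b": 2.0, "b-ae": 1.5}               # hypothetical weights
    print(similarity(["ah-b", "b-ae", "ah-b"], ["ah-b", "b-ae"], idf))
    ```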
  • Publication number: 20120054054
    Abstract: An information providing system is disclosed. The system includes a management apparatus having a database storing facility-event information about facilities or events, and a portable apparatus able to communicate with the management apparatus. The portable apparatus sets each place at which its movement-stop time exceeds a prescribed staying time as a stay place. The management apparatus estimates the living area of the portable apparatus's user to be a collective area covering all of that user's stay places. The management apparatus sets the collective area as the search scope, extracts the facility-event information matching the search scope from the database, and transmits the extracted facility-event information to the portable apparatus.
    Type: Application
    Filed: August 24, 2011
    Publication date: March 1, 2012
    Applicant: DENSO CORPORATION
    Inventor: Shogo Kameyama
  • Patent number: 8123615
    Abstract: Herein disclosed is a gaming machine executing a game and paying out a predetermined amount of credits according to a game result; generating voice data based on a player's voice; identifying a voice pattern corresponding to the voice data by retrieving the dialogue voice database and identifying a type of voice corresponding to the voice data, so as to store the voice data along with the voice pattern into the memory; calculating a value indicative of a game result, and updating the play history data stored in the memory using the result of the calculation; comparing the play history data thus updated with a predetermined threshold value data; generating voice data according to the voice pattern based on the play history data if the play history data thus updated exceeds the predetermined threshold value data; and outputting voices from the speaker.
    Type: Grant
    Filed: January 23, 2009
    Date of Patent: February 28, 2012
    Assignee: Aruze Gaming America, Inc.
    Inventor: Kazuo Okada
  • Publication number: 20120041762
    Abstract: An apparatus and method for tracking dialogue and other sound signals in film, television, or other systems with multiple channel sound is described. One or more audio channels expected to carry the speech of persons appearing in the program, or other particular types of sounds, are inspected to determine whether the channel's audio includes particular sounds such as MUEVs, including phonemes corresponding to human speech patterns. If an improper number of particular sounds such as phonemes is found in the channel(s), an action such as a report, an alarm, or a correction is taken. The inspection of the audio channel(s) may be made in conjunction with the appearance of corresponding images associated with the sound, such as visemes in the video signal, to improve the determination of types of sounds such as phonemes.
    Type: Application
    Filed: December 7, 2010
    Publication date: February 16, 2012
    Applicant: Pixel Instruments Corporation
    Inventors: J. Carl Cooper, Mirko Vojnovic, Christopher Smith
  • Publication number: 20120035932
    Abstract: In one implementation, a computer-implemented method includes receiving, at a mobile computing device, ambiguous user input that indicates more than one of a plurality of commands; and determining a current context associated with the mobile computing device that indicates where the mobile computing device is currently located. The method can further include disambiguating the ambiguous user input by selecting a command from the plurality of commands based on the current context associated with the mobile computing device; and causing output associated with performance of the selected command to be provided by the mobile computing device.
    Type: Application
    Filed: August 6, 2010
    Publication date: February 9, 2012
    Applicant: GOOGLE INC.
    Inventors: John Nicholas Jitkoff, Michael J. LeBeau
  • Publication number: 20120008802
    Abstract: A voice detection system and method for automatic volume controls (AVCs) and voice sensors is disclosed. More specifically, the invention addresses the situation where the user's own voice undesirably affects the functionality of an automatic volume control for a two-way communication device, such as a cellular telephone. The invention proposes solutions for the case where one (voice) microphone is employed and for the case where two (voice and noise) microphones are employed. Further, an algorithm is disclosed that addresses the user's-own-voice issue in an AVC for the two-microphone solution. A method is also disclosed that detects the presence of voice in a single non-selective (noise) microphone.
    Type: Application
    Filed: January 3, 2011
    Publication date: January 12, 2012
    Inventor: Franklin S. Felber
  • Publication number: 20120008875
    Abstract: The present invention pertains to a method and a communication device (100) for associating a contact record pertaining to a remote speaker (220) with a mnemonic image (191) based on attributes of the speaker (220). The method comprises receiving voice data of the speaker (220) in a communication session with a source device (200). A source determination representing the speaker (220) is registered, and the received voice data is analyzed so that voice data characteristics can be extracted. Based on these characteristics a mnemonic image (191) is selected and associated with a contact record in which the source determination is stored. The mnemonic image (191) may be selected among images previously stored in the device, or derived through editing of such images.
    Type: Application
    Filed: May 17, 2011
    Publication date: January 12, 2012
    Applicant: SONY ERICSSON MOBILE COMMUNICATIONS AB
    Inventor: Joakim MARTENSSON
  • Publication number: 20110282663
    Abstract: A method of and system for transient noise rejection for improved speech recognition. The method comprises the steps of (a) receiving audio including user speech and at least some transient noise associated with the speech, (b) converting the received audio into digital data, (c) segmenting the digital data into acoustic frames, and (d) extracting acoustic feature vectors from the acoustic frames. The method also comprises the steps of (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis, (f) rejecting those acoustic frames having transient noise, (g) accepting as speech frames those acoustic frames having no transient noise and, thereafter, (h) recognizing the user speech using the speech frames (a sketch of steps (e)-(g) follows this entry).
    Type: Application
    Filed: May 13, 2010
    Publication date: November 17, 2011
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
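    A sketch of steps (e)-(g) under a stated assumption: a frame counts as transient noise if its short-time energy jumps far above the robust local trend (median/MAD z-score); such frames are rejected and the rest pass on as speech frames. The statistic and threshold are illustrative, not the patent's actual detector.
    ```python
    import numpy as np

    def accept_speech_frames(frames, z_thresh=3.0):
        energy = np.array([np.dot(f, f) for f in frames])
        med = np.median(energy)
        mad = np.median(np.abs(energy - med)) + 1e-12
        z = (energy - med) / (1.4826 * mad)     # robust z-score per frame
        return [f for f, zi in zip(frames, z) if zi < z_thresh]  # reject spikes

    # Toy usage: 50 ordinary frames plus one loud transient (e.g., a door slam).
    frames = [np.random.randn(256) for _ in range(50)] + [20 * np.random.randn(256)]
    print(len(accept_speech_frames(frames)))    # 50: the transient is dropped
    ```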
  • Publication number: 20110257976
    Abstract: Speech recognition includes structured modeling, irrelevant variability normalization and unsupervised online adaptation of speech recognition parameters.
    Type: Application
    Filed: April 14, 2010
    Publication date: October 20, 2011
    Applicant: Microsoft Corporation
    Inventor: Qiang Huo
  • Publication number: 20110251844
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Application
    Filed: June 20, 2011
    Publication date: October 13, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Publication number: 20110231191
    Abstract: A weight coefficient generation device, a speech recognition device, a navigation system, a vehicle, a weight coefficient generation method, and a weight coefficient generation program are provided for the purpose of improving speech recognition performance on place names. To this end, an address database 12 holds address information including country names, city names, street names, and house numbers, and manages the address information in a tree structure representing the hierarchical relationships between place names from wide areas to narrow areas. Each place name stored in the address database 12 is taken as a speech recognition candidate. A weight coefficient calculation unit 11 of a weight coefficient generation device 10 calculates a weight coefficient for the likelihood of each recognition candidate based on the number of street names belonging to the hierarchy level below the city names (a sketch follows this entry).
    Type: Application
    Filed: November 17, 2009
    Publication date: September 22, 2011
    Inventor: Toshiyuki Miyazaki
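    A sketch of the weighting idea: count the street names below each city in the address tree and turn the counts into likelihood weights, so cities covering more streets get a larger prior. The tree layout and the log-normalized weighting are illustrative assumptions.
    ```python
    import math

    # Hypothetical address tree: country -> cities -> streets (house numbers omitted).
    tree = {"children": {
        "Springfield": {"children": {"Main St": {}, "Oak Ave": {}, "Elm St": {}}},
        "Shelbyville": {"children": {"First St": {}}},
    }}

    def street_count(node):
        kids = node.get("children")
        return 1 if not kids else sum(street_count(c) for c in kids.values())

    def weight_coefficients(country):
        counts = {name: street_count(c) for name, c in country["children"].items()}
        total = sum(counts.values())
        # Log-normalized share of streets under each city name.
        return {n: math.log1p(c) / math.log1p(total) for n, c in counts.items()}

    print(weight_coefficients(tree))  # Springfield outweighs Shelbyville
    ```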
  • Publication number: 20110224982
    Abstract: Described is a technology in which information retrieval (IR) techniques are used in a speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are found from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words).
    Type: Application
    Filed: March 12, 2010
    Publication date: September 15, 2011
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, James Garnet Droppo, III, Xiaoqiang Xiao, Geoffrey G. Zweig
  • Patent number: 8019593
    Abstract: Embodiments of a feature generation system and process for use in machine learning applications utilizing statistical modeling systems are described. In one embodiment, the feature generation process generates large feature spaces by combining features using logical, arithmetic, and/or functional operations. A first set of features in an initial feature space is defined. Some or all of the first set of features are processed using one or more arithmetic, logic, or user-defined combinatorial processes, or combinations thereof, to produce additional features. The additional features and at least some of the first set of features are combined to produce an expanded feature space. The expanded feature space is processed through a feature selection and optimization process to produce a model in a statistical modeling system (a sketch of the expansion step follows this entry).
    Type: Grant
    Filed: June 30, 2006
    Date of Patent: September 13, 2011
    Assignee: Robert Bosch Corporation
    Inventors: Fuliang Weng, Zhe Feng, Qi Zhang
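    A sketch of the expansion step: pairwise combination of base features with arithmetic and logical operations to grow the feature space before selection. The operation set and the naming scheme are illustrative.
    ```python
    from itertools import combinations

    def expand_features(features):
        # features: name -> numeric value; combine pairs to produce new features.
        out = dict(features)
        for (a, va), (b, vb) in combinations(features.items(), 2):
            out[f"{a}+{b}"] = va + vb            # arithmetic combination
            out[f"{a}*{b}"] = va * vb
            out[f"{a}>{b}"] = float(va > vb)     # logical combination
        return out

    print(expand_features({"f1": 2.0, "f2": 3.0}))
    # {'f1': 2.0, 'f2': 3.0, 'f1+f2': 5.0, 'f1*f2': 6.0, 'f1>f2': 0.0}
    ```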
  • Patent number: 8019594
    Abstract: Embodiments of a progressive feature selection method that selects features in multiple rounds are described. In one embodiment, the progressive feature selection method splits the feature space into tractable sub-spaces such that a feature selection algorithm can be performed on each sub-space. In a merge-split operation, the subsets of features that the feature selection algorithm selects from the different sub-spaces are merged into subsequent sets of features. Instead of re-generating the mapping table for each subsequent set from scratch, a new mapping table is created from the previous round's tables by collecting the entries that correspond to the selected features. The feature selection method is then performed again on each of the subsequent feature sets, and new features are selected from each of these feature sets. This feature selection-merge-split process is repeated on successively smaller numbers of feature sets until a single final set of features is selected (a sketch follows this entry).
    Type: Grant
    Filed: June 30, 2006
    Date of Patent: September 13, 2011
    Assignee: Robert Bosch Corporation
    Inventors: Fuliang Weng, Zhe Feng, Qi Zhang
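    A sketch of the select-merge-split loop with a toy base selector; the chunk size and the keep-top-half selector are illustrative stand-ins for the tractable sub-space size and the real feature selection algorithm.
    ```python
    import statistics

    def progressive_select(features, base_select, chunk=4):
        current = list(features)
        while len(current) > chunk:
            survivors = []
            for i in range(0, len(current), chunk):      # split into sub-spaces
                survivors.extend(base_select(current[i:i + chunk]))
            if len(survivors) == len(current):           # no progress; stop
                break
            current = survivors                          # merge the survivors
        return base_select(current)                      # final selection round

    def keep_top_half(feats):
        # Toy selector over (name, score) pairs: keep scores at or above median.
        med = statistics.median(s for _, s in feats)
        return [(n, s) for n, s in feats if s >= med]

    final = progressive_select([(f"f{i}", i % 7) for i in range(20)], keep_top_half)
    print(final)
    ```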
  • Publication number: 20110213615
    Abstract: A method for configuring a voice authentication system comprises ascertaining a measure of confidence associated with a voice sample enrolled with the authentication system. The measure of confidence is derived through simulated impostor testing carried out on the enrolled sample.
    Type: Application
    Filed: September 7, 2009
    Publication date: September 1, 2011
    Applicant: Auraya Pty Ltd
    Inventors: Clive Summerfield, Habib E. Talhami
  • Publication number: 20110213611
    Abstract: A method and a device control the transport of an object to a predetermined destination. The object is provided with information on the destination to which it is to be transported. This destination information is input into a speech detection station, and a speech recognition system evaluates the destination information detected by the station. The destination indicated by the information provided with the object is determined using the evaluation result of the speech recognition system. A release signal is then produced, which triggers two processes: the speech detection station is released for the input of destination information for another object, and a conveying device transports the object to the determined destination.
    Type: Application
    Filed: August 28, 2009
    Publication date: September 1, 2011
    Applicant: SIEMENS AKTIENGESELLSCHAFT
    Inventor: Ingolf Rauh
  • Publication number: 20110208525
    Abstract: A voice recognizing apparatus includes a voice start instructing section 3 for instructing to start voice recognition; a voice input section 1 for receiving uttered voice and converting to a voice signal; a voice recognizing section 2 for recognizing the voice on the basis of the voice signal; an utterance start time detecting section 4 for detecting duration from the time when the voice start instructing section instructs to the time when the voice input section delivers the voice signal; an utterance timing deciding section 5 for deciding utterance timing indicating whether the utterance start is quick or slow by comparing the duration detected by the utterance start time detecting section with a prescribed threshold; an interaction control section 6 for determining a content, which is to be shown when exhibiting a recognition result of the voice recognizing section, in accordance with the utterance timing decided; a system response generating section 7 for generating a system response on the basis of the d
    Type: Application
    Filed: March 27, 2008
    Publication date: August 25, 2011
    Inventors: Yuzuru Inoue, Tadashi Suzuki, Fumitaka Sato, Takayoshi Chikuri
  • Publication number: 20110208527
    Abstract: A voice-activatable system for providing the correct spelling of a spoken word is disposed in the elongated body of a writing instrument such as a ballpoint pen. The system includes a microphone whose output is fed through an amplifier and an analog-to-digital converter to a speech recognition program; the output of the speech recognition program is fed to a computer, namely a word processor/controller that includes a database. The output of the speech recognition is compared with a digital library of words, and when a match is found, it is amplified and fed to a digital-to-analog converter. The converter's output drives a speaker that repeats the word with the correct pronunciation, followed by the correct spelling of the word. The system includes a battery for powering the system, as well as an on/off switch and a repeat button for repeating information from the system.
    Type: Application
    Filed: February 23, 2010
    Publication date: August 25, 2011
    Inventor: Fawzi Q. Behbehani
  • Publication number: 20110184730
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing voice commands. In one aspect, a method includes receiving an audio signal at a server, performing, by the server, speech recognition on the audio signal to identify one or more candidate terms that match one or more portions of the audio signal, identifying one or more possible intended actions for each candidate term, providing information for display on a client device, the information specifying the candidate terms and the actions for each candidate term, receiving from the client device an indication of an action selected by a user, where the action was selected from among the actions included in the provided information, and invoking the action selected by the user.
    Type: Application
    Filed: January 22, 2010
    Publication date: July 28, 2011
    Applicant: GOOGLE INC.
    Inventors: Michael J. LeBeau, William J. Byrne, Nicholas Jitkoff, Alexander H. Gruenstein
  • Publication number: 20110178799
    Abstract: Methods and systems of identifying speech sound features within a speech sound are provided. The sound features may be identified using a multi-dimensional analysis that analyzes the time, frequency, and intensity at which a feature occurs within a speech sound, and the contribution of the feature to the sound. Information about sound features may be used to enhance spoken speech sounds to improve recognizability of the speech sounds by a listener.
    Type: Application
    Filed: July 24, 2009
    Publication date: July 21, 2011
    Applicant: The Board of Trustees of the University of Illinois
    Inventors: Jont B. Allen, Feipeng Li
  • Publication number: 20110161084
    Abstract: An apparatus, method, and system for generating a threshold for utterance verification are introduced herein. When a processing object is determined, a recommendation threshold is generated according to an expected utterance verification result. No extra collection of corpora or training of models is necessary for the utterance verification introduced here. The processing object can be a recognition object or an utterance verification object. In the apparatus, method, and system, at least one processing object is received and a speech unit sequence is generated from it. One or more values corresponding to each speech unit of the sequence are obtained, and a recommendation threshold is then generated based on an expected utterance verification result.
    Type: Application
    Filed: June 24, 2010
    Publication date: June 30, 2011
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Cheng-Hsien Lin, Sen-Chia Chang, Chi-Tien Chiu
  • Publication number: 20110125498
    Abstract: One embodiment of the invention provides a computer-implemented method of handling a telephone call. The method comprises monitoring a conversation between an agent and a customer on a telephone line as part of the telephone call to extract the audio signal therefrom. Real-time voice analytics are performed on the extracted audio signal while the telephone call is in progress. The results from the voice analytics are then passed to a computer-telephony integration system responsible for the call for use by the computer-telephony integration system for determining future handling of the call.
    Type: Application
    Filed: June 19, 2009
    Publication date: May 26, 2011
    Applicant: NEWVOICEMEDIA LTD
    Inventors: Richard Pickering, Joseph Moussalli, Ashley Unitt
  • Publication number: 20110109539
    Abstract: A behavior recognition system and method combining an image and speech are provided. The system includes a data analyzing module, a database, and a calculating module. A plurality of image-and-speech relation modules is stored in the database. Each image-and-speech relation module includes a feature extraction parameter and an image-and-speech relation parameter. The data analyzing module obtains a gesture image and speech data corresponding to each other, and substitutes them into each feature extraction parameter to generate image feature sequences and speech feature sequences. The data analyzing module uses each image-and-speech relation parameter to calculate image-and-speech status parameters.
    Type: Application
    Filed: December 9, 2009
    Publication date: May 12, 2011
    Inventors: Chung-Hsien Wu, Jen-Chun Lin, Wen-Li Wei, Chia-Te Chu, Red-Tom Lin, Chin-Shun Hsu
  • Publication number: 20110093261
    Abstract: Systems and methods are operable to associate each of a plurality of stored audio patterns with at least one of a plurality of digital tokens, identify a user based on user identification input, access a plurality of stored audio patterns associated with a user based on the user identification input, receive from a user at least one audio input from a custom language made up of custom language elements wherein the elements include at least one monosyllabic representation of a number, letter or word, select one of the plurality of stored audio patterns associated with the identified user, in the case that the audio input received from the identified user corresponds with one of the plurality of stored audio patterns, determine the digital token associated with the selected one of the plurality of stored audio patterns, and generate the output signal for use in a device based on the determined digital token.
    Type: Application
    Filed: October 15, 2010
    Publication date: April 21, 2011
    Inventor: Paul Angott
  • Publication number: 20110082694
    Abstract: A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.
    Type: Application
    Filed: August 9, 2010
    Publication date: April 7, 2011
    Inventors: Richard FASTOW, Qamrul HASAN
  • Publication number: 20110082697
    Abstract: A method is described for correcting and improving the functioning of certain devices for the diagnosis and treatment of speech that dynamically measure the functioning of the velum in the control of nasality during speech. The correction method uses an estimate of the vowel frequency spectrum to greatly reduce the variation of nasalance with the vowel being spoken, so as to result in a corrected value of nasalance that reflects with greater accuracy the degree of velar opening. Correction is also described for reducing the effect on nasalance values of energy from the oral and nasal channels crossing over into the other channel because of imperfect acoustic separation.
    Type: Application
    Filed: October 6, 2009
    Publication date: April 7, 2011
    Applicant: Rothenberg Enterprises
    Inventor: Martin ROTHENBERG
  • Publication number: 20110077944
    Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
    Type: Application
    Filed: November 30, 2009
    Publication date: March 31, 2011
    Applicant: BROADCOM CORPORATION
    Inventor: Nambirajan Seshadri
  • Publication number: 20110071830
    Abstract: The present invention provides a combined lip reading and voice recognition multimodal interface system that can issue navigation operation instructions using only voice and lip movements, thus allowing a driver to keep looking ahead during navigation operations and reducing vehicle accidents related to operating the navigation system while driving.
    Type: Application
    Filed: December 1, 2009
    Publication date: March 24, 2011
    Applicants: HYUNDAI MOTOR COMPANY, KIA MOTORS CORPORATION
    Inventors: Dae Hee Kim, Dai-Jin Kim, Jin Lee, Jong-Ju Shin, Jin-Seok Lee