Feature Extraction For Speech Recognition; Selection Of Recognition Unit (epo) Patents (Class 704/E15.004)
-
Patent number: 11955125
Abstract: The present disclosure provides an operation method of a smart speaker. The method includes steps as follows. The linked settings among voiceprint registration data, user information and a cast setting of a user device are preloaded by the smart speaker. Wake-up words are received to set an operation mode of the smart speaker and to generate a voiceprint recognition result. In the operation mode, after receiving voice, the voice is converted into voice text and the voiceprint recognition result is compared to the voiceprint registration data. When the voiceprint recognition result matches the voiceprint registration data, the user information and the voice text are transmitted to a cloud server, so that the cloud server returns a response message to the smart speaker. According to the cast setting, the response message is sent to the user device.
Type: Grant
Filed: October 15, 2019
Date of Patent: April 9, 2024
Assignee: AmTRAN Technology Co., Ltd.
Inventors: Che-Chia Ho, Chia-Wei Lin
-
Patent number: 11948575
Abstract: Implementations set forth herein relate to an automated assistant that uses circumstantial condition data, generated based on circumstantial conditions of an input, to determine whether the input should affect an action initialized by a particular user. The automated assistant can allow each user to manipulate their respective ongoing action without necessitating interruptions for soliciting explicit user authentication. For example, when an individual in a group of persons interacts with the automated assistant to initialize or affect a particular ongoing action, the automated assistant can generate data that correlates that individual to the particular ongoing action. The data can be generated using a variety of different input modalities, which can be dynamically selected based on changing circumstances of the individual. Therefore, different sets of input modalities can be processed each time a user provides an input for modifying an ongoing action and/or initializing another action.
Type: Grant
Filed: January 12, 2023
Date of Patent: April 2, 2024
Assignee: GOOGLE LLC
Inventors: Andrew Gallagher, Caroline Pantofaru, Vinay Bettadapura, Utsav Prabhu
-
Patent number: 11694444
Abstract: Aspects of the present disclosure involve a system comprising a computer-readable storage medium storing a program and method for setting ad breakpoints in a video. The program and method provide for accessing a video; determining plural shot boundaries for the video, each shot boundary defining a shot corresponding to a contiguous sequence of video frames that is free of cuts or transitions; and for each shot boundary of the plural shot boundaries, performing a set of breakpoint tests on the shot boundary, each breakpoint test configured to return a respective score indicating whether the shot boundary corresponds to a breakpoint for potential insertion of an ad during playback of the video, calculating a combined score for the shot boundary based on combining each of the respective scores, and setting, in a case where the combined score meets a threshold value, the shot boundary as the breakpoint.
Type: Grant
Filed: April 5, 2021
Date of Patent: July 4, 2023
Assignee: Snap Inc.
Inventors: Khalil Chatoo, David Michael Hornsby, Jeffrey Kile, Chinmay Lonkar, Zhimin Wang, Ian Anthony Wehrman
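The scoring flow in this abstract (per-boundary tests, score combination, thresholding) reads naturally as a small routine. Below is a minimal sketch of that flow; the test functions, the averaging combiner, and the 0.6 threshold are illustrative assumptions, not taken from the patent.

```python
from typing import Callable, List

def select_breakpoints(
    shot_boundaries: List[float],           # boundary timestamps in seconds
    tests: List[Callable[[float], float]],  # each test returns a score in [0, 1]
    threshold: float = 0.6,                 # assumed cutoff
) -> List[float]:
    breakpoints = []
    for boundary in shot_boundaries:
        scores = [test(boundary) for test in tests]
        combined = sum(scores) / len(scores)  # simple average as the combiner
        if combined >= threshold:
            breakpoints.append(boundary)
    return breakpoints

# Example with two toy tests: prefer boundaries after 30 s, plus a stand-in
# for an audio-quietness test that always returns 0.8.
tests = [
    lambda t: 1.0 if t > 30.0 else 0.0,
    lambda t: 0.8,
]
print(select_breakpoints([12.0, 45.5, 90.2], tests))  # -> [45.5, 90.2]
```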
-
Patent number: 11562740
Abstract: In one aspect, a network microphone device (NMD) includes a plurality of microphones and is configured to capture a voice input via one or more of the microphones, detect a wake word in the voice input, transmit data associated with the voice input to one or more remote computing devices associated with a voice assistant service, and receive a response from the one or more remote computing devices, the response comprising a playback command based on the voice input. The NMD may be configured to obtain verification information characterizing the voice input and, based on the verification information indicating that the voice input was spoken by an unverified user, functionally disable the NMD from performing the playback command.
Type: Grant
Filed: January 7, 2020
Date of Patent: January 24, 2023
Assignee: Sonos, Inc.
Inventor: Connor Kristopher Smith
-
Publication number: 20140012575
Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence.
Type: Application
Filed: July 9, 2012
Publication date: January 9, 2014
Applicant: Nuance Communications, Inc.
Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
-
Publication number: 20130297307
Abstract: A dictation module is described herein which receives and interprets a complete utterance of the user in incremental fashion, that is, one incremental portion at a time. The dictation module also provides rendered text in incremental fashion. The rendered text corresponds to the dictation module's interpretation of each incremental portion. The dictation module also allows the user to modify any part of the rendered text as it becomes available. In one case, for instance, the dictation module provides a marking menu which includes multiple options by which a user can modify a selected part of the rendered text. The dictation module also uses the rendered text (as modified or unmodified by the user using the marking menu) to adjust one or more models used by the dictation module to interpret the user's utterance.
Type: Application
Filed: May 1, 2012
Publication date: November 7, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Timothy S. Paek, Bongshin Lee, Bo-June Hsu
-
Publication number: 20130080153
Abstract: An information processing apparatus includes a receiving unit that receives character sequences, a sorting unit that sorts the character sequences received by the receiving unit into known words and unknown words, and a detecting unit that detects character sequences sorted as unknown words by the sorting unit as incorrect words, and that also detects a third character sequence between a first character sequence and a second character sequence, which have been sorted as unknown words by the sorting unit, as incorrect words when the third character sequence includes words sorted as known words by the sorting unit and the number of the known words is less than or equal to a predetermined number.
Type: Application
Filed: January 6, 2012
Publication date: March 28, 2013
Applicant: FUJI XEROX CO., LTD.
Inventor: Eiichi TANAKA
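The detection rule here is mechanical enough to sketch: tokens outside the lexicon are flagged directly, and a sufficiently short run of known words bracketed by two unknown words is flagged as well. The function and variable names below, and the run-length limit of 2, are assumptions for illustration.

```python
def detect_incorrect_words(tokens, lexicon, max_known_run=2):
    unknown = [t not in lexicon for t in tokens]
    incorrect = set(i for i, u in enumerate(unknown) if u)  # unknown words
    # Also flag short known-word runs sandwiched between two unknown words.
    i = 0
    while i < len(tokens):
        if unknown[i]:
            j = i + 1
            while j < len(tokens) and not unknown[j]:
                j += 1
            if j < len(tokens) and 0 < j - (i + 1) <= max_known_run:
                incorrect.update(range(i + 1, j))
            i = j
        else:
            i += 1
    return [tokens[i] for i in sorted(incorrect)]

lexicon = {"the", "cat", "sat", "on", "mat"}
print(detect_incorrect_words(["the", "cxt", "on", "zzq", "mat"], lexicon))
# -> ['cxt', 'on', 'zzq']  ("on" is known but trapped between unknown words)
```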
-
Publication number: 20130046540
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
Type: Application
Filed: December 3, 2007
Publication date: February 21, 2013
Inventor: Alexander Sorin
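The abstract describes a three-step analysis-by-synthesis round trip. The toy sketch below keeps only that structure; synthesize_frame and compute_mfcc are linear stand-ins (a real system would use a vocoder and Mel-cepstral analysis), and N, L, and the zero seed value are assumptions.

```python
import numpy as np

N, L = 13, 5                        # full MFCC length, known low-order count
rng = np.random.default_rng(0)
A = rng.standard_normal((64, N))    # toy synthesis basis: MFCC -> "waveform"
A_pinv = np.linalg.pinv(A)          # toy analysis: "waveform" -> MFCC

def synthesize_frame(mfcc, pitch):
    return A @ mfcc                 # pitch is unused in this linear stand-in

def compute_mfcc(frame):
    return A_pinv @ frame

loc = rng.standard_normal(L)                        # trusted low-order part
candidate = np.concatenate([loc, np.zeros(N - L)])  # HOC seeded to zero
frame = synthesize_frame(candidate, pitch=120.0)    # step 2: synthesize
output = compute_mfcc(frame)                        # step 3: re-analyze
print(np.allclose(output[:L], loc))                 # LOC survive the round trip
```

In this linear toy the round trip is exact; with a real synthesizer, the spectral detail it imposes is what turns the recomputed high-order coefficients into meaningful estimates.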
-
Publication number: 20130030794
Abstract: According to one embodiment, a speaker clustering apparatus includes a clustering unit, an extraction unit, and an error detection unit. The clustering unit is configured to extract acoustic features for speakers from an acoustic signal, and to cluster utterances included in the acoustic signal into the speakers by using the acoustic features. The extraction unit is configured to acquire character strings representing contents of the utterances, and to extract linguistic features of the speakers by using the character strings. The error detection unit is configured to decide that, when one of the character strings does not fit the linguistic feature of the speaker into whose cluster the corresponding utterance was placed, that utterance was erroneously clustered by the clustering unit.
Type: Application
Filed: March 6, 2012
Publication date: January 31, 2013
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Tomoo Ikeda, Manabu Nagao, Osamu Nishiyama, Hirokazu Suzuki, Koji Ueno, Nobuhiro Shimogori
-
Publication number: 20120253812
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Application
Filed: April 1, 2011
Publication date: October 4, 2012
Applicant: Sony Computer Entertainment Inc.
Inventors: Ozlem Kalinli, Ruxin Chen
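One common reading of an "auditory gist vector" is a grid of sub-region means taken over each feature map, with the per-map vectors concatenated ("augmented") into the cumulative vector fed to a classifier. The sketch below assumes that reading; the grid size and map shapes are illustrative.

```python
import numpy as np

def gist_vector(feature_map, grid=(4, 5)):
    # Split the map into a grid of sub-regions and keep each region's mean.
    rows = np.array_split(feature_map, grid[0], axis=0)
    cells = [np.array_split(r, grid[1], axis=1) for r in rows]
    return np.array([c.mean() for row in cells for c in row])

rng = np.random.default_rng(1)
feature_maps = [rng.random((40, 50)) for _ in range(3)]   # one map per scale
cumulative = np.concatenate([gist_vector(m) for m in feature_maps])
print(cumulative.shape)   # (60,) -> input to the boundary classifier
```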
-
Publication number: 20120253813
Abstract: A speech segment determination device includes a frame division portion, a power spectrum calculation portion, a power spectrum operation portion, a spectral entropy calculation portion and a determination portion. The frame division portion divides an input signal in units of frames. The power spectrum calculation portion calculates, using an analysis length, a power spectrum of the input signal for each of the frames that have been divided. The power spectrum operation portion adds a value to the calculated power spectrum in each of the frequency bins. The spectral entropy calculation portion calculates spectral entropy using the power spectrum whose value has been increased. The determination portion determines, based on a value of the spectral entropy, whether the input signal is a signal in a speech segment.
Type: Application
Filed: February 17, 2012
Publication date: October 4, 2012
Applicant: OKI ELECTRIC INDUSTRY CO., LTD.
Inventor: Kazuhiro KATAGIRI
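The decision logic lends itself to a compact sketch: per-frame power spectrum, a per-bin additive offset, normalized spectral entropy, and a threshold test. The FFT length, offset, and threshold below are assumed values, and low entropy is taken to indicate speech (a peaky spectrum) versus flat background noise.

```python
import numpy as np

def is_speech(frame, n_fft=512, offset=1e-12, entropy_threshold=0.85):
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2   # power spectrum
    spectrum = spectrum + offset                          # per-bin added value
    p = spectrum / spectrum.sum()                         # normalize to a pmf
    entropy = -np.sum(p * np.log(p)) / np.log(len(p))     # normalized entropy
    return entropy < entropy_threshold                    # speech is "peaky"

rng = np.random.default_rng(0)
noise = rng.standard_normal(400)                          # flat spectrum
tone = np.sin(2 * np.pi * 200 * np.arange(400) / 8000)    # harmonic-like
print(is_speech(noise), is_speech(tone))                  # False True
```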
-
Publication number: 20120245919
Abstract: An automatic speech recognition (ASR) apparatus for an embedded device application is described. A speech decoder receives an input sequence of speech feature vectors in a first language and outputs an acoustic segment lattice representing a probabilistic combination of basic linguistic units in a second language. A vocabulary matching module compares the acoustic segment lattice to vocabulary models in the first language to determine an output set of probability-ranked recognition hypotheses. A detailed matching module compares the set of probability-ranked recognition hypotheses to detailed match models in the first language to determine a recognition output representing a vocabulary word most likely to correspond to the input sequence of speech feature vectors.
Type: Application
Filed: September 23, 2009
Publication date: September 27, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Guillermo Aradilla, Rainer Gruhn
-
Publication number: 20120245941
Abstract: A device can be configured to receive speech input from a user. The speech input can include a command for accessing a restricted feature of the device. The speech input can be compared to a voiceprint (e.g., a text-independent voiceprint) of the user's voice to authenticate the user to the device. Responsive to successful authentication of the user to the device, the user is allowed access to the restricted feature without having to perform additional authentication steps or speak the command again. If the user is not successfully authenticated to the device, additional authentication steps can be requested by the device (e.g., a request for a password).
Type: Application
Filed: March 21, 2011
Publication date: September 27, 2012
Inventor: Adam J. Cheyer
-
Publication number: 20120215537
Abstract: According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize a voice command after the keyword is detected by the keyword detection module, and to transmit an operation signal corresponding to the voice command.
Type: Application
Filed: September 21, 2011
Publication date: August 23, 2012
Inventor: Yoshihiro Igarashi
-
Publication number: 20120185237
Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as the office, home, or vehicle, while maintaining the accuracy of the speech recognition.
Type: Application
Filed: March 26, 2012
Publication date: July 19, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
-
Publication number: 20120173232
Abstract: An acoustic processing apparatus is provided. The acoustic processing apparatus includes a first extracting unit configured to extract a first acoustic model that corresponds with a first position among positions set in a speech recognition target area, a second extracting unit configured to extract at least one second acoustic model that corresponds with, respectively, at least one second position in proximity to the first position, and an acoustic model generating unit configured to generate a third acoustic model based on the first acoustic model, the second acoustic model, or a combination thereof.
Type: Application
Filed: July 28, 2011
Publication date: July 5, 2012
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Nam-Hoon Kim, Jeong-Su Kim, Jeong-Mi Cho
-
Publication number: 20120123780
Abstract: A video summary method comprises dividing a video into a plurality of video shots, analyzing each frame in a video shot from the plurality of video shots, determining a saliency of each frame of the video shot, determining a key frame of the video shot based on the saliency of each frame of the video shot, extracting visual features from the key frame, and performing shot clustering of the plurality of video shots to determine concept patterns based on the visual features. The method further comprises fusing different concept patterns using a saliency tuning method and generating a summary of the video based upon a global optimization method.
Type: Application
Filed: November 15, 2011
Publication date: May 17, 2012
Applicant: FutureWei Technologies, Inc.
Inventors: Jizhou Gao, Yu Huang, Hong Heather Yu
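The key-frame step can be illustrated in a few lines: score every frame of a shot for saliency and keep the argmax. The saliency measure below (pixel variance) is only an assumed stand-in for the patent's measure.

```python
import numpy as np

def key_frame_index(shot_frames):
    # Assumed saliency stand-in: per-frame pixel variance.
    saliency = [float(np.var(f)) for f in shot_frames]
    return int(np.argmax(saliency))

rng = np.random.default_rng(3)
shot = [rng.integers(0, 256, size=(8, 8)) for _ in range(5)]  # toy gray frames
print(key_frame_index(shot))   # index of the most salient frame in the shot
```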
-
Publication number: 20120116774
Abstract: An implantable system (11) for control of and communication with an implant (17) in a body, comprising a command input device (12) and a processing device (13) coupled thereto, the processing device (13) being adapted to generate input to a command generator (16) which is comprised in the system (11), coupled to the processing device (13), and adapted to generate and communicate commands to the medical implant (17) in response to input received from the processing device (13), the system (11) further comprising a memory unit (15) connected to at least one of said devices in the system (11) for storing a memory bank of commands. The command input device (12) is adapted to receive commands from a user as voice commands, and the processing device (13) comprises a filter adapted to filter voice commands against high frequency losses and frequency distortion caused by the mammal body (10).
Type: Application
Filed: July 19, 2010
Publication date: May 10, 2012
Applicant: MILUX HOLDING SA
Inventor: Peter Forsell
-
Publication number: 20120116766
Abstract: A method and apparatus combining the advantages of phonetic search, such as rapid implementation and deployment and medium accuracy, with the advantages of speech-to-text, including providing the full text of the audio and rapid search. The method and apparatus comprise steps or components for receiving the audio signal captured in the call center environment; extracting a multiplicity of feature vectors from the audio signal; creating a phoneme lattice from the multiplicity of feature vectors, the phoneme lattice comprising one or more allophones, each allophone comprising two or more phonemes; creating a hybrid phoneme-word lattice from the phoneme lattice; and extracting the word by analyzing the hybrid phoneme-word lattice.
Type: Application
Filed: November 7, 2010
Publication date: May 10, 2012
Applicant: Nice Systems Ltd.
Inventors: Moshe Wasserblat, Ronen Laperdon, Dori Shapira
-
Publication number: 20120116765
Abstract: A speech recognition unit (102) includes a phrase determination unit (103) which determines a phrase boundary based on the comparison between the hypothetical word group generated by speech recognition and set words representing phrase boundaries. In this speech processing device, the speech recognition unit (102) outputs recognition results for each phrase based on a phrase boundary determined by the phrase determination unit (103).
Type: Application
Filed: June 4, 2010
Publication date: May 10, 2012
Applicant: NEC CORPORATION
Inventors: Ken Hanazawa, Seiya Osada, Takayuki Arakawa, Koji Okabe, Daisuke Tanaka
-
Publication number: 20120095762
Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting each frame of the first speech into a frame of the second speech while also reflecting at least one of the frames positioned before that frame of the first speech.
Type: Application
Filed: October 19, 2011
Publication date: April 19, 2012
Applicants: SEOUL NATIONAL UNIVERSITY INDUSTRY FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.
Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
-
Publication number: 20120089393
Abstract: A highlight section including an exciting scene is appropriately extracted with a smaller amount of processing. A reflection coefficient calculating unit (12) calculates a parameter (reflection coefficient) representing a slope of the spectrum distribution of the input audio signal for each frame. A reflection coefficient comparison unit (13) calculates the amount of change in the reflection coefficients between adjacent frames, and compares the calculation result with a predetermined threshold. An audio signal classifying unit (14) classifies the input audio signal into a background noise section and a speech section based on the comparison result. A background noise level calculating unit (15) calculates the level of background noise in the background noise section based on signal energy in the background noise section. An event detecting unit (16) detects an event occurrence point from a sharp increase in the background noise level.
Type: Application
Filed: June 2, 2010
Publication date: April 12, 2012
Inventor: Naoya Tanaka
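For a speech frame, the first reflection coefficient is commonly approximated by the normalized lag-1 autocorrelation, which tracks spectral slope. The sketch below uses that approximation and an assumed change threshold to separate steady background noise from speech-like frames.

```python
import numpy as np

def reflection_coefficient(frame):
    # First reflection coefficient ~ normalized lag-1 autocorrelation.
    r0 = float(np.dot(frame, frame))
    r1 = float(np.dot(frame[:-1], frame[1:]))
    return r1 / r0 if r0 > 0 else 0.0

def classify_frames(frames, change_threshold=0.3):
    k = [reflection_coefficient(f) for f in frames]
    labels = ["background"]                 # first frame has no delta yet
    for prev, cur in zip(k, k[1:]):
        labels.append("speech" if abs(cur - prev) > change_threshold
                      else "background")
    return labels

rng = np.random.default_rng(2)
frames = [rng.standard_normal(160) for _ in range(3)]       # noise-like frames
frames.append(np.sin(2 * np.pi * 0.05 * np.arange(160)))    # voiced-like frame
print(classify_frames(frames))  # typically 3 x 'background', then 'speech'
```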
-
Publication number: 20120059656
Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
Type: Application
Filed: August 30, 2011
Publication date: March 8, 2012
Applicant: Nexidia Inc.
Inventors: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
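The abstract leaves the weighting function open; a TF-IDF-style weight with cosine comparison is one natural choice, and that is what the sketch below assumes. The phoneme-sequence notation and the idf table are invented for illustration.

```python
import math
from collections import Counter

def weighted_frequencies(phoneme_seqs, idf):
    tf = Counter(phoneme_seqs)
    total = sum(tf.values())
    # Frequency of occurrence scaled by an inverse-document-frequency weight.
    return {seq: (count / total) * idf.get(seq, 1.0)
            for seq, count in tf.items()}

def similarity(source_a, source_b, idf):
    wa = weighted_frequencies(source_a, idf)
    wb = weighted_frequencies(source_b, idf)
    dot = sum(wa.get(k, 0.0) * wb.get(k, 0.0) for k in set(wa) | set(wb))
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

idf = {"AH-B": 2.0, "K-AE-T": 3.5, "DH-AH": 0.5}   # invented corpus statistics
a = ["DH-AH", "K-AE-T", "K-AE-T", "AH-B"]          # phoneme sequences, source 1
b = ["DH-AH", "K-AE-T", "AH-B", "AH-B"]            # phoneme sequences, source 2
print(round(similarity(a, b, idf), 3))             # similarity score in [0, 1]
```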
-
Publication number: 20120054054
Abstract: An information providing system is disclosed. The system includes a management apparatus having a database storing facility-event information on facilities or events, and a portable apparatus capable of communicating with the management apparatus. The portable apparatus sets each place at which a movement stop time exceeds a prescribed staying time as a stay place. The management apparatus estimates that the living area of the user of the portable apparatus is a collective area covering all of the stay places of the user. The management apparatus sets the collective area as the search scope, extracts the facility-event information matching the search scope from the database, and transmits the extracted facility-event information to the portable apparatus.
Type: Application
Filed: August 24, 2011
Publication date: March 1, 2012
Applicant: DENSO CORPORATION
Inventor: Shogo Kameyama
-
Patent number: 8123615
Abstract: Herein disclosed is a gaming machine executing a game and paying out a predetermined amount of credits according to a game result; generating voice data based on a player's voice; identifying a voice pattern corresponding to the voice data by retrieving the dialogue voice database and identifying a type of voice corresponding to the voice data, so as to store the voice data along with the voice pattern into the memory; calculating a value indicative of a game result, and updating the play history data stored in the memory using the result of the calculation; comparing the play history data thus updated with predetermined threshold value data; generating voice data according to the voice pattern based on the play history data if the play history data thus updated exceeds the predetermined threshold value data; and outputting voices from the speaker.
Type: Grant
Filed: January 23, 2009
Date of Patent: February 28, 2012
Assignee: Aruze Gaming America, Inc.
Inventor: Kazuo Okada
-
Publication number: 20120041762
Abstract: An apparatus and method for tracking dialogue and other sound signals in film, television or other systems with multiple-channel sound is described. One or more audio channels which are expected to carry the speech of persons appearing in the program, or other particular types of sounds, are inspected to determine if the channel's audio includes particular sounds such as MUEVs, including phonemes corresponding to human speech patterns. If an improper number of particular sounds such as phonemes are found in the channel(s), an action such as a report, an alarm, a correction, or other action is taken. The inspection of the audio channel(s) may be made in conjunction with the appearance of corresponding images associated with the sound, such as visemes in the video signal, to improve the determination of types of sounds such as phonemes.
Type: Application
Filed: December 7, 2010
Publication date: February 16, 2012
Applicant: Pixel Instruments Corporation
Inventors: J. Carl Cooper, Mirko Vojnovic, Christopher Smith
-
Publication number: 20120035932
Abstract: In one implementation, a computer-implemented method includes receiving, at a mobile computing device, ambiguous user input that indicates more than one of a plurality of commands; and determining a current context associated with the mobile computing device that indicates where the mobile computing device is currently located. The method can further include disambiguating the ambiguous user input by selecting a command from the plurality of commands based on the current context associated with the mobile computing device; and causing output associated with performance of the selected command to be provided by the mobile computing device.
Type: Application
Filed: August 6, 2010
Publication date: February 9, 2012
Applicant: GOOGLE INC.
Inventors: John Nicholas Jitkoff, Michael J. LeBeau
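A minimal sketch of the disambiguation step, assuming a lookup table that maps an ambiguous utterance to (context, command) pairs; the contexts, commands, and fallback rule are invented for illustration.

```python
def disambiguate(utterance, current_context, command_table):
    candidates = command_table.get(utterance, [])
    for context, command in candidates:
        if context == current_context:        # context resolves the ambiguity
            return command
    return candidates[0][1] if candidates else None  # fall back to the first

command_table = {
    "play": [("car", "play_radio"), ("home", "play_tv")],
}
print(disambiguate("play", "car", command_table))   # -> play_radio
```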
-
Publication number: 20120008802
Abstract: A voice detection system and method for automatic volume controls and voice sensors are disclosed. More specifically, the invention addresses a situation where the user's own voice undesirably affects the functionality of an automatic volume control (AVC) for a two-way communication device, such as a cellular telephone. The invention proposes solutions for the case where one (voice) microphone is employed and also for the case where two (voice and noise) microphones are employed. Further, an algorithm is disclosed that addresses the issue of the user's own voice in an AVC for the two-microphone solution. Yet further, a method is disclosed that detects the presence of voice in a single non-selective (noise) microphone.
Type: Application
Filed: January 3, 2011
Publication date: January 12, 2012
Inventor: Franklin S. Felber
-
Publication number: 20120008875
Abstract: The present invention pertains to a method and a communication device (100) for associating a contact record pertaining to a remote speaker (220) with a mnemonic image (191) based on attributes of the speaker (220). The method comprises receiving voice data of the speaker (220) in a communication session with a source device (200). A source determination representing the speaker (220) is registered, and the received voice data is then analyzed so that voice data characteristics can be extracted. Based on these voice data characteristics, a mnemonic image (191) can be selected and associated with the contact record in which the source determination is stored. The mnemonic image (191) may be selected among images previously stored in the device, or derived through editing of such images.
Type: Application
Filed: May 17, 2011
Publication date: January 12, 2012
Applicant: SONY ERICSSON MOBILE COMMUNICATIONS AB
Inventor: Joakim Martensson
-
Publication number: 20110282663
Abstract: A method of and system for transient noise rejection for improved speech recognition. The method comprises the steps of (a) receiving audio including user speech and at least some transient noise associated with the speech, (b) converting the received audio into digital data, (c) segmenting the digital data into acoustic frames, and (d) extracting acoustic feature vectors from the acoustic frames. The method also comprises the steps of (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis, (f) rejecting those acoustic frames having transient noise, (g) accepting as speech frames those acoustic frames having no transient noise, and, thereafter, (h) recognizing the user speech using the speech frames.
Type: Application
Filed: May 13, 2010
Publication date: November 17, 2011
Applicant: GENERAL MOTORS LLC
Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
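Steps (e)-(g) amount to a per-frame gate. The sketch below uses an assumed detector, an energy spike relative to the median frame energy, in place of the patent's unspecified transient test.

```python
import numpy as np

def accept_speech_frames(frames, spike_ratio=4.0):
    # Frames whose short-time energy spikes well above the running baseline
    # are treated as transient noise and rejected before recognition.
    energies = np.array([float(np.dot(f, f)) for f in frames])
    baseline = np.median(energies)
    return [f for f, e in zip(frames, energies) if e <= spike_ratio * baseline]

rng = np.random.default_rng(4)
frames = [rng.standard_normal(160) for _ in range(5)]
frames[2] = frames[2] * 10.0             # simulated transient (e.g., door slam)
print(len(accept_speech_frames(frames)))  # -> 4 accepted speech frames
```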
-
Publication number: 20110257976
Abstract: Speech recognition includes structured modeling, irrelevant variability normalization, and unsupervised online adaptation of speech recognition parameters.
Type: Application
Filed: April 14, 2010
Publication date: October 20, 2011
Applicant: Microsoft Corporation
Inventor: Qiang Huo
-
Publication number: 20110251844
Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences, and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as not to be used by the retrained model.
Type: Application
Filed: June 20, 2011
Publication date: October 13, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
-
Publication number: 20110231191
Abstract: A weight coefficient generation device, a speech recognition device, a navigation system, a vehicle, a weight coefficient generation method, and a weight coefficient generation program are provided for the purpose of improving speech recognition performance on place names. To this end, an address database 12 has address information data items including country names, city names, street names, and house numbers, and manages the address information in a tree structure indicating hierarchical relationships between the place names from a wide area to a narrow area. Each of the place names stored in the address database 12 is taken as a speech recognition candidate. A weight coefficient calculation unit 11 of a weight coefficient generation device 10 calculates a weight coefficient for the likelihood of each recognition candidate based on the number of street names belonging to the lower hierarchy below the city names.
Type: Application
Filed: November 17, 2009
Publication date: September 22, 2011
Inventor: Toshiyuki Miyazaki
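An assumed form of this weighting: bias each city candidate's recognizer score by the (log) count of street names beneath it in the address tree, so that cities with many streets are favored. The data and the log form below are illustrative, not from the publication.

```python
import math

address_tree = {                      # city -> streets in the lower hierarchy
    "Springfield": ["Main St", "Oak Ave", "Elm St", "2nd St"],
    "Smallville": ["Main St"],
}

def weight_coefficient(city, tree):
    return math.log(1 + len(tree.get(city, [])))

def rerank(candidates, tree):
    # candidates: (city, acoustic log-likelihood) pairs from the recognizer
    return sorted(((city, score + weight_coefficient(city, tree))
                   for city, score in candidates),
                  key=lambda pair: pair[1], reverse=True)

print(rerank([("Smallville", -4.0), ("Springfield", -4.6)], address_tree))
# Springfield's many streets lift it above the acoustically closer Smallville.
```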
-
Publication number: 20110224982
Abstract: Described is a technology in which information retrieval (IR) techniques are used in a speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are found from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words).
Type: Application
Filed: March 12, 2010
Publication date: September 15, 2011
Applicant: Microsoft Corporation
Inventors: Alejandro Acero, James Garnet Droppo, III, Xiaoqiang Xiao, Geoffrey G. Zweig
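A toy illustration of TF-IDF-based retrieval over acoustic units: vocabulary words are indexed as "documents" of phone units, the decoded unit stream is the query, and the highest-scoring word is the output. The unit inventory and vocabulary are invented for illustration.

```python
import math
from collections import Counter

vocab = {                      # word -> its phone-unit "document"
    "speech": ["S", "P", "IY", "CH"],
    "peach":  ["P", "IY", "CH"],
    "beach":  ["B", "IY", "CH"],
}
df = Counter(u for units in vocab.values() for u in set(units))
idf = {u: math.log(len(vocab) / df[u]) for u in df}

def tfidf(units):
    tf = Counter(units)
    return {u: tf[u] * idf.get(u, 0.0) for u in tf}

def retrieve(decoded_units):
    # Score each vocabulary word by the TF-IDF dot product with the query.
    query = tfidf(decoded_units)
    scores = {word: sum(query.get(u, 0.0) * w for u, w in tfidf(units).items())
              for word, units in vocab.items()}
    return max(scores, key=scores.get)

print(retrieve(["S", "P", "IY", "CH"]))   # -> 'speech'
```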
-
Patent number: 8019593
Abstract: Embodiments of a feature generation system and process for use in machine learning applications utilizing statistical modeling systems are described. In one embodiment, the feature generation process generates large feature spaces by combining features using logical, arithmetic and/or functional operations. A first set of features in an initial feature space is defined. Some or all of the first set of features are processed using one or more arithmetic, logic, or user-defined combinatorial processes, or combinations thereof, to produce additional features. The additional features and at least some of the first set of features are combined to produce an expanded feature space. The expanded feature space is processed through a feature selection and optimization process to produce a model in a statistical modeling system.
Type: Grant
Filed: June 30, 2006
Date of Patent: September 13, 2011
Assignee: Robert Bosch Corporation
Inventors: Fuliang Weng, Zhe Feng, Qi Zhang
-
Patent number: 8019594
Abstract: Embodiments of a progressive feature selection method that selects features in multiple rounds are described. In one embodiment, the progressive feature selection method splits the feature space into tractable sub-spaces such that a feature selection algorithm can be performed on each sub-space. In a merge-split operation, the subsets of features that the feature selection algorithm selects from the different sub-spaces are merged into subsequent sets of features. Instead of re-generating the mapping table for each subsequent set from scratch, a new mapping table is created from the previous round's tables by collecting those entries that correspond to the selected features. The feature selection method is then performed again on each of the subsequent feature sets, and new features are selected from each of these feature sets. This feature selection-merge-split process is repeated on successively smaller numbers of feature sets until a single final set of features is selected.
Type: Grant
Filed: June 30, 2006
Date of Patent: September 13, 2011
Assignee: Robert Bosch Corporation
Inventors: Fuliang Weng, Zhe Feng, Qi Zhang
-
Publication number: 20110213615
Abstract: A method for configuring a voice authentication system comprises ascertaining a measure of confidence associated with a voice sample enrolled with the authentication system. The measure of confidence is derived through simulated impostor testing carried out on the enrolled sample.
Type: Application
Filed: September 7, 2009
Publication date: September 1, 2011
Applicant: Auraya Pty Ltd
Inventors: Clive Summerfield, Habib E. Talhami
-
Publication number: 20110213611
Abstract: A method and a device control the transport of an object to a predetermined destination. The object is provided with information on a destination to which the object is to be transported. The destination information with which the object is provided is input into a speech detection station. A speech recognition system evaluates the destination information detected by the speech detection station, and its evaluation result is used to determine the destination indicated on the object. A release signal is then produced which triggers two processes: the speech detection station is released for the input of destination information on another object, and a conveying device transports the object to the determined destination.
Type: Application
Filed: August 28, 2009
Publication date: September 1, 2011
Applicant: SIEMENS AKTIENGESELLSCHAFT
Inventor: Ingolf Rauh
-
Publication number: 20110208525
Abstract: A voice recognizing apparatus includes a voice start instructing section 3 for instructing to start voice recognition; a voice input section 1 for receiving uttered voice and converting it to a voice signal; a voice recognizing section 2 for recognizing the voice on the basis of the voice signal; an utterance start time detecting section 4 for detecting the duration from the time when the voice start instructing section instructs to the time when the voice input section delivers the voice signal; an utterance timing deciding section 5 for deciding utterance timing, indicating whether the utterance start is quick or slow, by comparing the duration detected by the utterance start time detecting section with a prescribed threshold; an interaction control section 6 for determining a content, which is to be shown when exhibiting a recognition result of the voice recognizing section, in accordance with the utterance timing decided; and a system response generating section 7 for generating a system response on the basis of the d…
Type: Application
Filed: March 27, 2008
Publication date: August 25, 2011
Inventors: Yuzuru Inoue, Tadashi Suzuki, Fumitaka Sato, Takayoshi Chikuri
-
Publication number: 20110208527
Abstract: A voice-activatable system for providing the correct spelling of a spoken word is disposed in an elongated body of a writing instrument such as a ballpoint pen. The system includes a microphone, the output of which is fed to an amplifier and an analog-to-digital converter and from there to a speech recognition program; the output of the speech recognition program is fed to a computer, namely a word processor/controller that includes a database. The output of the speech recognition is compared with the digital library of words, and when a match is found, it is amplified and fed to a digital-to-analog converter. The output of the digital-to-analog converter is fed to a speaker that repeats the word with the correct pronunciation, followed by the correct spelling of the word. The system includes a battery for powering the system as well as an on/off switch and a repeat button for repeating information from the system.
Type: Application
Filed: February 23, 2010
Publication date: August 25, 2011
Inventor: Fawzi Q. Behbehani
-
Publication number: 20110184730
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing voice commands. In one aspect, a method includes receiving an audio signal at a server; performing, by the server, speech recognition on the audio signal to identify one or more candidate terms that match one or more portions of the audio signal; identifying one or more possible intended actions for each candidate term; providing information for display on a client device, the information specifying the candidate terms and the actions for each candidate term; receiving from the client device an indication of an action selected by a user, where the action was selected from among the actions included in the provided information; and invoking the action selected by the user.
Type: Application
Filed: January 22, 2010
Publication date: July 28, 2011
Applicant: GOOGLE INC.
Inventors: Michael J. LeBeau, William J. Byrne, Nicholas Jitkoff, Alexander H. Gruenstein
-
Publication number: 20110178799
Abstract: Methods and systems of identifying speech sound features within a speech sound are provided. The sound features may be identified using a multi-dimensional analysis that analyzes the time, frequency, and intensity at which a feature occurs within a speech sound, and the contribution of the feature to the sound. Information about sound features may be used to enhance spoken speech sounds to improve recognizability of the speech sounds by a listener.
Type: Application
Filed: July 24, 2009
Publication date: July 21, 2011
Applicant: The Board of Trustees of the University of Illinois
Inventors: Jont B. Allen, Feipeng Li
-
Publication number: 20110161084
Abstract: An apparatus, method and system for generating a threshold for utterance verification are introduced herein. When a processing object is determined, a recommendation threshold is generated according to an expected utterance verification result. In addition, no extra collection of corpora or training of models is necessary for the utterance verification introduced here. The processing object can be a recognition object or an utterance verification object. In the apparatus, method and system for generating a threshold for utterance verification, at least one processing object is received and a speech unit sequence is generated from it. One or more values corresponding to each speech unit of the speech unit sequence are obtained accordingly, and a recommendation threshold is then generated based on an expected utterance verification result.
Type: Application
Filed: June 24, 2010
Publication date: June 30, 2011
Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
Inventors: Cheng-Hsien Lin, Sen-Chia Chang, Chi-Tien Chiu
-
Publication number: 20110125498
Abstract: One embodiment of the invention provides a computer-implemented method of handling a telephone call. The method comprises monitoring a conversation between an agent and a customer on a telephone line as part of the telephone call to extract the audio signal therefrom. Real-time voice analytics are performed on the extracted audio signal while the telephone call is in progress. The results from the voice analytics are then passed to the computer-telephony integration system responsible for the call, for use in determining future handling of the call.
Type: Application
Filed: June 19, 2009
Publication date: May 26, 2011
Applicant: NEWVOICEMEDIA LTD
Inventors: Richard Pickering, Joseph Moussalli, Ashley Unitt
-
Publication number: 20110109539
Abstract: A behavior recognition system and method combining image and speech are provided. The system includes a data analyzing module, a database, and a calculating module. A plurality of image-and-speech relation modules is stored in the database. Each image-and-speech relation module includes a feature extraction parameter and an image-and-speech relation parameter. The data analyzing module obtains a gesture image and speech data corresponding to each other, and substitutes the gesture image and the speech data into each feature extraction parameter to generate image feature sequences and speech feature sequences. The data analyzing module uses each image-and-speech relation parameter to calculate image-and-speech status parameters.
Type: Application
Filed: December 9, 2009
Publication date: May 12, 2011
Inventors: Chung-Hsien Wu, Jen-Chun Lin, Wen-Li Wei, Chia-Te Chu, Red-Tom Lin, Chin-Shun Hsu
-
Publication number: 20110093261
Abstract: Systems and methods are operable to associate each of a plurality of stored audio patterns with at least one of a plurality of digital tokens; identify a user based on user identification input; access a plurality of stored audio patterns associated with the user based on the user identification input; receive from the user at least one audio input from a custom language made up of custom language elements, wherein the elements include at least one monosyllabic representation of a number, letter or word; select one of the plurality of stored audio patterns associated with the identified user, in the case that the audio input received from the identified user corresponds with one of the plurality of stored audio patterns; determine the digital token associated with the selected one of the plurality of stored audio patterns; and generate the output signal for use in a device based on the determined digital token.
Type: Application
Filed: October 15, 2010
Publication date: April 21, 2011
Inventor: Paul Angott
-
Publication number: 20110082694
Abstract: A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.
Type: Application
Filed: August 9, 2010
Publication date: April 7, 2011
Inventors: Richard Fastow, Qamrul Hasan
-
Publication number: 20110082697
Abstract: A method is described for correcting and improving the functioning of certain devices for the diagnosis and treatment of speech that dynamically measure the functioning of the velum in the control of nasality during speech. The correction method uses an estimate of the vowel frequency spectrum to greatly reduce the variation of nasalance with the vowel being spoken, so as to result in a corrected value of nasalance that reflects the degree of velar opening with greater accuracy. A correction is also described for reducing the effect on nasalance values of energy from the oral and nasal channels crossing over into the other channel because of imperfect acoustic separation.
Type: Application
Filed: October 6, 2009
Publication date: April 7, 2011
Applicant: Rothenberg Enterprises
Inventor: Martin Rothenberg
-
Publication number: 20110077944
Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
Type: Application
Filed: November 30, 2009
Publication date: March 31, 2011
Applicant: BROADCOM CORPORATION
Inventor: Nambirajan Seshadri
-
Publication number: 20110071830
Abstract: The present invention provides a combined lip reading and voice recognition multimodal interface system, which can issue a navigation operation instruction using only voice and lip movements, thus allowing a driver to look ahead during a navigation operation and reducing vehicle accidents related to navigation operations during driving.
Type: Application
Filed: December 1, 2009
Publication date: March 24, 2011
Applicants: HYUNDAI MOTOR COMPANY, KIA MOTORS CORPORATION
Inventors: Dae Hee Kim, Dai-Jin Kim, Jin Lee, Jong-Ju Shin, Jin-Seok Lee