Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20090043580
    Abstract: The present invention includes a speech recognition system comprising a light element, a power control switch, the power control switch varying the power delivered to the light element, a controller, a microphone, a speech recognizer coupled to the microphone for recognizing speech input signals and transmitting recognition results to the controller, and a speech synthesizer coupled to the controller for generating synthesized speech, wherein the controller varies the power to the light element in accordance with the recognition results received from the speech recognizer. Embodiments of the invention may alternatively include a low power wake up circuit. In another embodiment, the present invention is a method of controlling a device by voice commands.
    Type: Application
    Filed: July 24, 2008
    Publication date: February 12, 2009
    Applicant: Sensory, Incorporated
    Inventors: Todd F. Mozer, Forrest S. Mozer, Erich B. Adams
  • Publication number: 20090043579
    Abstract: A method is presented which reduces data flow and thereby increases processing capacity while preserving a high level of accuracy in a distributed speech processing environment for speaker detection. The method and system of the present invention includes filtering out data based on a target speaker specific subset of labels using data filters. The method preserves accuracy and passes only a fraction of the data by optimizing target specific performance measures. Therefore, a high level of speaker recognition accuracy is maintained while utilizing existing processing capabilities.
    Type: Application
    Filed: April 2, 2008
    Publication date: February 12, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Upendra V. Chaudhari, Juan M. Huerta, Ganesh N. Ramaswamy, Olivier Verscheure
  • Publication number: 20090043581
    Abstract: This invention relates to a method of searching spoken audio data for one or more search terms comprising performing a phonetic search of the audio data to identify likely matches to a search term and producing textual data corresponding to a portion of the spoken audio data including a likely match. An embodiment of the method comprises the steps of taking phonetic index data corresponding to the spoken audio data, searching the phonetic index data for likely matches to the search term, wherein when a likely match is detected a portion of the spoken audio data or phonetic index data is selected which includes the likely match and said selected portion of the spoken audio data or phonetic index data is processed using a large vocabulary speech recogniser. The large vocabulary speech recogniser may derive textual data which can be used for further processing or may be used to present a transcript to a user.
    Type: Application
    Filed: August 7, 2008
    Publication date: February 12, 2009
    Applicant: AURIX LIMITED
    Inventors: Martin G. Abbott, Keith M. Ponting
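The two-stage search this abstract describes (a fast phonetic scan that identifies likely matches, with only the surrounding portions handed to a large-vocabulary recogniser) can be sketched as below. The toy phonetic index, the exact-match scan, and all function names are illustrative assumptions; the second-stage recogniser is left out, with the function simply returning the portions that would be passed to it.

```python
def phonetic_search(index, query_phones):
    """Return start positions in the index where the query phone
    sequence matches exactly (a stand-in for fuzzy phonetic matching)."""
    n = len(query_phones)
    return [i for i in range(len(index) - n + 1)
            if [p for _, p in index[i:i + n]] == query_phones]

def search_audio(index, query_phones, context=1):
    """For each likely match, select a surrounding portion of the index
    that would be handed to the large-vocabulary recogniser."""
    hits = phonetic_search(index, query_phones)
    portions = []
    for h in hits:
        lo = max(0, h - context)
        hi = min(len(index), h + len(query_phones) + context)
        portions.append(index[lo:hi])
    return portions

# Toy phonetic index: (time in seconds, phone) pairs.
index = [(0.0, "k"), (0.1, "ae"), (0.2, "t"), (0.3, "s"), (0.4, "ae"), (0.5, "t")]
portions = search_audio(index, ["ae", "t"])
```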
  • Publication number: 20090037176
    Abstract: A wordspotting system is applied to a speech source in a preliminary processing phase. The putative hits corresponding to queries (e.g., keywords, key phrases, or more complex queries that may include Boolean expressions and proximity operators) are used to control a speech recognizer. The control can include one or more of: application of a time specification that is determined from the putative hits for selecting an interval of the speech source to which to apply the speech recognizer; application of a grammar specification determined from the putative hits that is used by the speech recognizer; and application of a lattice or pruning specification that is used by the recognizer to limit or guide the recognizer in recognition of the speech source.
    Type: Application
    Filed: August 1, 2008
    Publication date: February 5, 2009
    Applicant: Nexidia Inc.
    Inventor: Jon A. Arrowood
  • Publication number: 20090034750
    Abstract: A system (100) and method (300) for evaluating an audio configuration is provided. The system can include a media device (110) that receives audio (121) from an application (115) and transmits the audio, and a media system (150) that receives the audio, and plays the audio out of a speaker (170) to produce external audio (156). The media device can capture the external audio from a microphone (145) to produce captured audio (146), perform pattern matching (125) on the audio signal and the captured audio signal in real-time, and present a notification (200) identifying whether audio sourced by the media device is rendered by the media system.
    Type: Application
    Filed: July 31, 2007
    Publication date: February 5, 2009
    Applicant: MOTOROLA, INC.
    Inventors: RAMY AYOUB, GARY LEE CHRISTOPHER, BRIAN SIBILSKY
  • Publication number: 20090037174
    Abstract: In one embodiment, the present system recognizes a user's speech input using an automatically generated probabilistic context free grammar for street names that maps all pronunciation variations of a street name to a single canonical representation during recognition. A tokenizer expands the representation using position-dependent phonetic tokens and an intersection classifier classifies an intersection, despite the presence of recognition errors and incomplete street names.
    Type: Application
    Filed: July 31, 2007
    Publication date: February 5, 2009
    Applicant: Microsoft Corporation
    Inventors: Michael L. Seltzer, Yun-Cheng Ju, Ivan J. Tashev
  • Publication number: 20090037171
    Abstract: The real-time voice transcription system provides a speech recognition system and method that includes use of speech and spatial-temporal acoustic data to enhance speech recognition probabilities while simultaneously identifying the speaker. Real-time edit capability is provided enabling a user to train the system during a transcription session. The system may be connected to user computers via local network and/or wide area network means.
    Type: Application
    Filed: August 4, 2008
    Publication date: February 5, 2009
    Inventors: Tim J. McFarland, Vasudevan C. Gurunathan
  • Publication number: 20090037173
    Abstract: The present invention discloses a payment card that uses speaker identification and verification (SIV) speech processing techniques for activation purposes. For example, the invention can initially identify a payment card in a deactivated state, which is an internal state of the payment card. Speech input can then be received. Speech characteristics of the speech input can be determined and compared against a voice print of an authorized card user. The payment card can be selectively activated based on comparison results. That is, when the voice print and the speech characteristics match, the payment card can be activated. Otherwise, the card will remain deactivated. An activated payment card is one that has undergone an internal state change from the deactivated state. For example, when activated a credit card number can appear in a display and a magnetic strip can contain payment information, neither of which are present in the deactivated state.
    Type: Application
    Filed: August 2, 2007
    Publication date: February 5, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: JOSEPH A. HANSEN
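The activation flow this abstract describes (compare characteristics of the speech input to an enrolled voice print, and change the card's internal state only on a match) can be sketched as follows. The cosine comparison over toy feature vectors and the threshold value are illustrative assumptions, not the patent's actual SIV method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def try_activate(card_state, voice_print, speech_features, threshold=0.95):
    """Return the card's new internal state: activated on a voice-print
    match, otherwise unchanged (i.e., it remains deactivated)."""
    if cosine(voice_print, speech_features) >= threshold:
        return "activated"
    return card_state

enrolled = (0.9, 0.1)                               # toy enrolled voice print
state = try_activate("deactivated", enrolled, (0.88, 0.12))
```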
  • Publication number: 20090030691
    Abstract: A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech is recorded by a user using a mobile communication facility resident capture facility. A speech recognition facility generates results of the recorded speech using an unstructured language model based at least in part on information relating to the recording. An application resident on the mobile communication facility is identified, wherein the resident application is capable of taking the results generated by the speech recognition facility as an input. The generated results are input to the application.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu
  • Publication number: 20090030697
    Abstract: A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech is recorded by a user using a mobile communication facility resident capture facility. A speech recognition facility generates results of the recorded speech using an unstructured language model based at least in part on information relating to the recording. A context of the mobile communication facility is determined at the time speech is recorded and, based on the context, the generated results are delivered to a facility for performing an action on the mobile communication facility.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu
  • Publication number: 20090030684
    Abstract: A method and system for entering information into a software application resident on a mobile communication facility is provided. The method and system may include recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, transmitting the results to the mobile communications facility, loading the results into the software application and simultaneously displaying the results as a set of words and as a set of application results based on those words.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke
  • Publication number: 20090030687
    Abstract: A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech that is recorded by a user using a mobile communication facility resident capture facility is transmitted through a wireless communication facility to a speech recognition facility. The speech recognition facility generates results using an unstructured language model based at least in part on information relating to the recording. The results are transmitted to the mobile communications facility where an action is performed on the mobile communication facility based on the results and adapting the speech recognition facility based on usage.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke, Christopher Michael Micali
  • Publication number: 20090030689
    Abstract: Voice recognition methods, systems and interfaces are used to collect data and produce databases that are then searched and used to produce reports or electronic filings. The databases are developed using a hierarchically designed command structure and a hierarchy of relational databases for the entry and recognition of voice commands. The invention uses an Adaptive Grammar that allows a very high probability for accurate recognition and a rapid recognition response to be achieved. The invention allows for multiple users and multiple mobile computers to maximize voice recognition capabilities.
    Type: Application
    Filed: October 3, 2006
    Publication date: January 29, 2009
    Inventors: Vincent Perrin, Judi Perrin, Michael Joost, Kevin Wood
  • Publication number: 20090030690
    Abstract: A speech analysis apparatus analyzing prosodic characteristics of speech information and outputting a prosodic discrimination result includes an input unit inputting speech information, an acoustic analysis unit calculating relative pitch variation and a discrimination unit performing speech discrimination processing, in which the acoustic analysis unit calculates a current template relative pitch difference, determining whether a difference absolute value between the current template relative pitch difference and a previous template relative pitch difference is equal to or less than a predetermined threshold or not, when the value is not less than the threshold, calculating an adjacent relative pitch difference, and when the adjacent relative pitch difference is equal to or less than a previously set margin value, executing correction processing of adding or subtracting an octave of the current template relative pitch difference to calculate the relative pitch variation by applying the relative pitch difference.
    Type: Application
    Filed: July 21, 2008
    Publication date: January 29, 2009
    Inventor: Keiichi YAMADA
  • Publication number: 20090030683
    Abstract: Disclosed are methods, systems, and computer-readable media for tracking dialog states in a spoken dialog system. The method comprises casting a plurality of dialog states, or particles, as a network describing the probability relationships between each of a plurality of variables, sampling a subset of the plurality of dialog states, or particles, in the network, for each sampled dialog state, or particle, projecting into the future, assigning a weight to each sampled particle, and normalizing the assigned weights to yield a new estimated distribution over each variable's values, wherein the distribution over the variables is used in a spoken dialog system. Also disclosed is a method of tuning performance of the methods, systems, and computer-readable media by adding or removing particles to/from the network.
    Type: Application
    Filed: July 26, 2007
    Publication date: January 29, 2009
    Applicant: AT&T Labs, Inc
    Inventor: Jason WILLIAMS
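The particle-based tracking this abstract describes (sample dialog states, weight each sampled particle, and normalize the weights into an estimated distribution over the hidden state) can be sketched as below. The two-state space, the prior, and the toy observation likelihoods are illustrative assumptions; a real system would also project particles forward through the dialog model.

```python
import random

def track(particles, likelihood):
    """Weight each sampled particle by the observation likelihood and
    normalize into a distribution over the hidden dialog state."""
    weights = {p: 0.0 for p in set(particles)}
    for p in particles:
        weights[p] += likelihood(p)
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

random.seed(0)
states = ["want_flight", "want_hotel"]
# Draw particles from a uniform prior, then weight them with a toy
# observation model in which the utterance strongly suggests a flight.
particles = [random.choice(states) for _ in range(1000)]
obs_likelihood = {"want_flight": 0.9, "want_hotel": 0.1}
posterior = track(particles, lambda p: obs_likelihood[p])
```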
  • Publication number: 20090030686
    Abstract: In a confidence computing method and system, a processor may interpret speech signals as a text string or directly receive a text string as input, generate a syntactical parse tree representing the interpreted string and including a plurality of sub-trees which each represents a corresponding section of the interpreted text string, determine for each sub-tree whether the sub-tree is accurate, obtain replacement speech signals for each sub-tree determined to be inaccurate, and provide output based on corresponding text string sections of at least one sub-tree determined to be accurate.
    Type: Application
    Filed: July 27, 2007
    Publication date: January 29, 2009
    Inventors: Fuliang Weng, Feng Lin, Zhe Feng
  • Publication number: 20090030685
    Abstract: Speech recorded by an audio capture facility of a navigation facility is processed by a speech recognition facility to generate results that are provided to the navigation facility. When information related to a navigation application running on the navigation facility is provided to the speech recognition facility, the results generated are based at least in part on the application related information. The speech recognition facility uses an unstructured language model for generating results. The user of the navigation facility may optionally be allowed to edit the results being provided to the navigation facility. The speech recognition facility may also adapt speech recognition based on usage of the results.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Yongdeng Chen
  • Publication number: 20090030688
    Abstract: Entering information into a software application resident on a mobile communication facility comprises recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, tagging the results with information about the words in the results, transmitting the results and tags to the mobile communications facility, and loading the results and tags into the software application.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Doron Gan, Eric H. Thayer
  • Publication number: 20090030695
    Abstract: A speech recognition and control system including a sound card for receiving speech and converting the speech into digital data, the sound card removably connected to an input of a computer, recognizer software executing on the computer for interpreting at least a portion of the digital data, event detection software executing on the computer for detecting connectivity of the sound card, and command control software executing on the computer for generating a command based on at least one of the digital data and the connectivity of the sound card.
    Type: Application
    Filed: April 18, 2008
    Publication date: January 29, 2009
    Inventors: Gang Wang, Chengyi Zheng, Heinz-Werner Stiller, Matteo Contolini
  • Publication number: 20090025071
    Abstract: A process for authenticating a user to control remote access to a service, data base or data network is provided, in which during an enrolment step, an initial voice sample provided by the user is analyzed to obtain an initial user-specific voice profile and, in a later verification step, a current voice sample of the user is analyzed and compared to the initial voice profile to generate an access control signal. An additional user-dedicated authentication is generated in a pre-enrolment period, and the additional authentication is used to authenticate the user in the enrolment step and/or in an access control step prior to and independent of the enrolment step, in a provisional or supplementary authentication procedure.
    Type: Application
    Filed: January 7, 2008
    Publication date: January 22, 2009
    Applicant: VOICE.TRUST AG
    Inventors: Marc Mumm, Rajasekharan Kuppuswamy
  • Publication number: 20090024390
    Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
    Type: Application
    Filed: May 2, 2008
    Publication date: January 22, 2009
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Neeraj Deshmukh, Puming Zhan
  • Publication number: 20090024720
    Abstract: A voice-enabled web portal system includes a web portal server and a call manager software module. The web portal server is operable to download data according to parameters of a web portal. The call manager software module is operable to accept a voice query from a user via telephone, to retrieve a portion of the downloaded data in response to the voice query, and to provide the portion of downloaded data to the user via telephone. A method of providing information from a web portal includes downloading data via a web portal according to parameters of the web portal at a predetermined time interval, filtering the downloaded data to produce portal information, and selectively providing portions of the portal information in response to a voice query.
    Type: Application
    Filed: July 21, 2008
    Publication date: January 22, 2009
    Inventors: Fakhreddine Karray, Jiping Sun
  • Publication number: 20090018834
    Abstract: A computer-based virtual assistant includes a virtual assistant application running on a computer capable of receiving human voice communications from a user of a remote user interface and transmitting a vocalization to the remote user interface, the virtual assistant application enabling the user to access email and voicemail messages of the user, the virtual assistant application selecting a responsive action to a verbal query or instruction received from the remote user interface and transmitting a vocalization characterizing the selected responsive action to the remote user interface, and the virtual assistant waiting a predetermined period of time, and if no canceling indication is received from the remote user interface, proceeding to perform the selected responsive action, and if a canceling indication is received from the remote user interface halting the selected responsive action and transmitting a new vocalization to the remote user interface. Also a method of using the virtual assistant.
    Type: Application
    Filed: September 23, 2008
    Publication date: January 15, 2009
    Inventors: Robert S. Cooper, Derek Sanders, Richard M. Ulmer
  • Publication number: 20090018831
    Abstract: The invention prevents voices from being recognized with poor accuracy when a speaker is not close to a sound pickup device. A speech recognition apparatus (10) includes a sound pickup device (16) picking up sounds, a photographing device (18) photographing images of a speaker making voices into the sound pickup device (16), a voice recognition function unit (132) recognizing voices on the basis of the picked-up voices, and a recognition/learning determination unit (142) restricting the voice recognition function unit (132) from recognizing voices when the photographed images do not contain speaker images showing at least part of the speaker.
    Type: Application
    Filed: December 19, 2005
    Publication date: January 15, 2009
    Applicant: KYOCERA CORPORATION
    Inventor: Kugo Morita
  • Publication number: 20090018832
    Abstract: An information communication terminal (100) that includes: a speech recognition module (6) for recognizing speech information to identify a plurality of words in the recognized speech information; a storage medium (20) for storing keyword extraction condition setting data (24) in which a condition for extracting a keyword is set; a keyword extraction module (8) for reading the keyword extraction condition setting data (24) to extract a plurality of keywords from the plurality of words; a related information acquisition module (11) for acquiring related information related to a plurality of keywords; and a related information output module (14) for providing related information to a monitor (2).
    Type: Application
    Filed: February 8, 2006
    Publication date: January 15, 2009
    Inventors: Takeya Mukaigaito, Shinya Takada, Daigoro Yokozeki, Miki Sakai, Rie Sakai, Katsuya Arai, Takuo Nishihara, Takahiko Murayama
  • Publication number: 20090018835
    Abstract: A computer-based virtual assistant includes a virtual assistant application running on a computer capable of receiving human voice communications from a user of a remote user interface and transmitting a vocalization to the remote user interface, the virtual assistant application enabling the user to access email and voicemail messages of the user, the virtual assistant application selecting a responsive action to a verbal query or instruction received from the remote user interface and transmitting a vocalization characterizing the selected responsive action to the remote user interface, and the virtual assistant waiting a predetermined period of time, and if no canceling indication is received from the remote user interface, proceeding to perform the selected responsive action, and if a canceling indication is received from the remote user interface halting the selected responsive action and transmitting a new vocalization to the remote user interface. Also a method of using the virtual assistant.
    Type: Application
    Filed: September 23, 2008
    Publication date: January 15, 2009
    Inventors: Robert S. Cooper, Derek Sanders, Richard M. Ulmer
  • Publication number: 20090018829
    Abstract: Described is a speech recognition dialog management system that allows more open-ended conversations between virtual agents and people than are possible using just agent-directed dialogs. The system uses both novel dialog context switching and learning algorithms based on spoken interactions with people. The context switching is performed through processing multiple dialog goals in a last-in-first-out (LIFO) pattern. The recognition accuracy for these new flexible conversations is improved through automated learning from processing errors and addition of new grammars.
    Type: Application
    Filed: June 8, 2005
    Publication date: January 15, 2009
    Applicant: METAPHOR SOLUTIONS, INC.
    Inventor: Michael Kuperstein
  • Publication number: 20090018720
    Abstract: The present invention provides an emission monitoring device for a vehicle. The device comprises a lightweight housing for operationally encapsulating a processor, a memory, a sensor for each vehicle component to be monitored, a display device, and a power supply for powering the device. The processor is programmed to monitor each sensor, capture data from each sensor, store the captured data from each sensor in the memory, and display the captured data on the display device. In one embodiment of the present invention, the housing is operationally mounted onto the inner gas lid of the vehicle. In an alternative embodiment, the housing is incorporated into the dashboard.
    Type: Application
    Filed: May 8, 2006
    Publication date: January 15, 2009
    Inventor: Lee Bernard
  • Publication number: 20090012792
    Abstract: A speech recognition system is provided for selecting, via a speech input, an item from a list of items. The speech recognition system detects a first speech input, recognizes the first speech input, compares the recognized first speech input with the list of items and generates a first candidate list of best matching items based on the comparison result. The system then informs the speaker of at least one of the best matching items of the first candidate list for a selection of an item by the speaker. If the intended item is not one of the best matching items presented to the speaker, the system then detects a second speech input, recognizes the second speech input, and generates a second candidate list of best matching items taking into account the comparison result obtained with the first speech input.
    Type: Application
    Filed: December 12, 2007
    Publication date: January 8, 2009
    Applicant: Harman Becker Automotive Systems GmbH
    Inventors: Andreas Low, Lars Konig, Christian Hillebrecht
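The two-pass selection this abstract describes (a first candidate list of best-matching items; if the user rejects it, a second input is matched while taking the first comparison result into account) can be sketched as follows. Using `difflib` string similarity as a stand-in for acoustic comparison, and modeling "taking the first result into account" as simply excluding already-offered items, are both illustrative assumptions.

```python
import difflib

def candidates(recognized, items, n=3, exclude=()):
    """Return up to n best-matching list items for a recognized input,
    skipping items already offered (and rejected) in an earlier pass."""
    pool = [i for i in items if i not in exclude]
    return difflib.get_close_matches(recognized, pool, n=n, cutoff=0.0)

items = ["Main Street", "Maple Street", "Market Street", "Mill Road"]
first = candidates("Maple Street", items)
# The user rejects the first list; the second pass skips those items.
second = candidates("Maple Street", items, exclude=first)
```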
  • Publication number: 20090012787
    Abstract: A dialog processing system which includes a target expression data extraction unit for extracting a plurality of target expression data each including a pattern matching portion which matches an utterance pattern, which are inputted by an utterance pattern input unit and is an utterance structure derived from contents of field-independent general conversations, among a plurality of utterance data which are inputted by an utterance data input unit and obtained by converting contents of a plurality of conversations in one field; a feature extraction unit for retrieving the pattern matching portions, respectively, from the plurality of target expression data extracted, and then for extracting feature quantities common to the plurality of pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the feature quantities extracted.
    Type: Application
    Filed: July 3, 2008
    Publication date: January 8, 2009
    Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
  • Publication number: 20090012790
    Abstract: A speech recognition apparatus which improves the sound quality of speech output as a speech recognition result is provided. The speech recognition apparatus includes a recognition unit, which recognizes speech based on a recognition dictionary, and a registration unit, which registers a dictionary entry of a new recognition word in the recognition dictionary. The recognition unit includes a generation unit, which generates a dictionary entry including speech of the new recognition word and feature parameters of the speech, and a modification unit, which makes a modification for improving the sound quality of the speech included in the dictionary entry generated by the generation unit. The recognition unit includes a speech output unit, which outputs speech which is included in a dictionary entry corresponding to the recognition result of input speech, and is modified by the modification unit.
    Type: Application
    Filed: July 1, 2008
    Publication date: January 8, 2009
    Applicant: CANON KABUSHIKI KAISHA
    Inventors: Masayuki Yamada, Toshiaki Fukada, Yasuo Okutani, Michio Aizawa
  • Publication number: 20090012791
    Abstract: A method and apparatus for carrying out adaptation using input speech data even when reference pattern recognition performance is low. A reference pattern adaptation device 2 includes a speech recognition section 18, an adaptation data calculating section 19 and a reference pattern adaptation section 20. The speech recognition section 18 calculates a recognition result teacher label from the input speech data and the reference pattern. The adaptation data calculating section 19 calculates adaptation data composed of a teacher label and speech data. The adaptation data is composed of the input speech data and the recognition result teacher label corrected for adaptation by the recognition error knowledge, which is the statistical information of the tendency towards recognition errors of the reference pattern. The reference pattern adaptation section 20 adapts the reference pattern using the adaptation data to generate an adaptation pattern.
    Type: Application
    Filed: February 16, 2007
    Publication date: January 8, 2009
    Applicant: NEC Corporation
    Inventor: Yoshifumi Onishi
  • Publication number: 20090012785
    Abstract: A sampling-rate-independent method of automated speech recognition (ASR). Speech energies of a plurality of codebooks generated from training data created at an ASR sampling rate are compared to speech energies in a current frame of acoustic data generated from received audio created at an audio sampling rate below the ASR sampling rate. A codebook is selected from the plurality of codebooks, and has speech energies that correspond to speech energies in the current frame over a spectral range corresponding to the audio sampling rate. Speech energies above the spectral range are copied from the selected codebook and appended to the current frame.
    Type: Application
    Filed: July 3, 2007
    Publication date: January 8, 2009
    Applicant: GENERAL MOTORS CORPORATION
    Inventor: Rathinavelu Chengalvarayan
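The bandwidth-extension step this abstract describes (select the training codebook whose speech energies correspond to the narrowband frame, then copy the codebook's energies above the narrowband spectral range and append them to the frame) can be sketched as below. The band split, the squared-error selection rule, and the toy energy vectors are illustrative assumptions.

```python
def extend_frame(frame_low, codebooks, low_bands):
    """frame_low: energies over the narrowband spectral range.
    codebooks: full-band energy vectors generated from training data.
    low_bands: number of bands covered by the narrowband input."""
    def dist(cb):
        # Compare low-band energies of the frame against each codebook.
        return sum((a - b) ** 2 for a, b in zip(frame_low, cb[:low_bands]))
    best = min(codebooks, key=dist)
    # Copy energies above the narrowband range from the chosen codebook
    # and append them to the current frame.
    return list(frame_low) + list(best[low_bands:])

codebooks = [
    [1.0, 2.0, 3.0, 4.0],   # full-band training vectors (toy values)
    [5.0, 6.0, 7.0, 8.0],
]
frame = extend_frame([1.1, 2.1], codebooks, low_bands=2)
```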
  • Publication number: 20090012788
    Abstract: The translation system of a preferred embodiment includes an input element that receives an input language as audio information, an output element that displays an output language as visual information, and a remote server coupled to the input element and the output element, the remote server including a database of sign language images; and a processor that receives the input language from the input element, translates the input language into the output language, and transmits the output language to the output element, wherein the output language is a series of the sign language images that correspond to the input language and that are coupled to one another with substantially seamless continuity, such that the ending position of a first image is blended into the starting position of a second image.
    Type: Application
    Filed: July 3, 2008
    Publication date: January 8, 2009
    Inventors: Jason Andre Gilbert, Shau-yuh YU
  • Publication number: 20090006099
    Abstract: Depiction of a speech user interface via graphical elements is provided. One or more bits of a graphical user interface bitmask are re-designated as speech bits. When a software application processes the re-designated speech bits, a window manager responsible for generating and rendering a graphical user interface for the application passes information to a secondary window manager responsible for generating and rendering a speech user interface. The secondary speech window manager may load a text-to-speech engine, a speech recognizer engine, a lexicon or library of recognizable words or phrases and a set of “grammars” (recognizable words and phrasing) for building a speech user interface that will receive, recognize and act on spoken input to the associated software application.
    Type: Application
    Filed: June 29, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Timothy D. Sharpe, Cameron A. Etezadi
  • Publication number: 20090006087
    Abstract: A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data comprising the ratios between the pronunciation times, in the generated synthetic speech, of the words of the received text is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device.
    Type: Application
    Filed: June 25, 2008
    Publication date: January 1, 2009
    Inventors: Noriko Imoto, Tetsuya Uda, Takatoshi Watanabe
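The ratio idea in the abstract above can be illustrated with a small sketch. The proportional-allocation rule and all names are assumptions, not the patent's exact algorithm: per-word durations measured in the synthetic speech are rescaled so they sum to the real recording's length, giving each word an estimated time span.

```python
# Toy sketch: allocate each word a share of the recording proportional to
# its pronunciation time in the synthetic speech.

def word_times(words, synth_durations, recording_length):
    """Map each word to a (word, start, end) time in the recording."""
    total = sum(synth_durations)
    times, t = [], 0.0
    for w, d in zip(words, synth_durations):
        span = recording_length * d / total
        times.append((w, round(t, 3), round(t + span, 3)))
        t += span
    return times

# Two words with synthetic durations in a 0.4 : 0.6 ratio, 5 s recording.
schedule = word_times(["hello", "world"], [0.4, 0.6], 5.0)
```

Each word's share of the 5-second recording follows the synthetic-speech duration ratio, so "hello" is placed in the first 2 seconds and "world" in the remaining 3.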
  • Publication number: 20090006345
    Abstract: Architecture for completing search queries by using artificial intelligence based schemes to infer search intentions of users. Partial queries are completed dynamically in real time. Additionally, search aliasing can also be employed. Custom tuning can be performed based on at least query inputs in the form of text, graffiti, images, handwriting, voice, audio, and video signals. Natural language processing occurs, along with handwriting recognition and slang recognition. The system includes a classifier that receives a partial query as input, accesses a query database based on contents of the query input, and infers an intended search goal from query information stored on the query database. A query formulation engine receives search information associated with the intended search goal and generates a completed formal query for execution.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: John C. Platt, Gary W. Flake, Ramez Naam, Anoop Gupta, Oliver Hurst-Hiller, Trenholme J. Griffin
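The query-completion flow described above can be approximated with a small sketch. The patent's classifier is AI-based and infers intent from a query database; here a frequency-ranked prefix match over a hypothetical query log stands in for that inference step, and all names are assumptions.

```python
# Toy sketch: complete a partial query from a frequency-weighted query log.

from collections import Counter

query_log = Counter({
    "speech recognition": 40,
    "speech synthesis": 25,
    "spectral analysis": 10,
})

def complete(partial, log, k=2):
    """Return the k most frequent logged queries extending the partial query."""
    matches = [q for q in log if q.startswith(partial)]
    return sorted(matches, key=lambda q: -log[q])[:k]

suggestions = complete("spee", query_log)  # ranked completions in real time
```

A real system would rank candidates by an inferred search goal rather than raw frequency, but the interface (partial query in, completed formal queries out) is the same.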
  • Publication number: 20090006092
    Abstract: [PROBLEMS] To provide a language model creation system that builds a speech recognition language model capable of recognizing the meaningful speech required by speech recognition applications, such as conversational speech at a call center. [MEANS FOR SOLVING PROBLEMS] A speech recognition language model making system (1) comprises a probability estimating device (11), a language model learning corpus storage device (14), and a learning corpus emphasizing device (12). The learning corpus emphasizing device (12) emphasizes a prescribed part of the learning corpus to create an emphasized learning corpus. The probability estimating device (11) builds the speech recognition language model by estimating the probability values of the language model from the emphasized learning corpus.
    Type: Application
    Filed: December 26, 2006
    Publication date: January 1, 2009
    Inventors: Kiyokazu Miki, Kentarou Nagamoto
  • Publication number: 20090006088
    Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as the office, home or vehicle, while maintaining the accuracy of the speech recognition.
    Type: Application
    Filed: September 9, 2008
    Publication date: January 1, 2009
    Applicant: AT&T Corp.
    Inventors: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
  • Publication number: 20080319747
    Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.
    Type: Application
    Filed: August 20, 2008
    Publication date: December 25, 2008
    Applicant: Sony Deutschland GmbH
    Inventors: Ralf Kompe, Thomas Kemp
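The unknown-speaker flow in the abstract above can be sketched in miniature. The greedy threshold clustering, the 1-D "features", and the cluster-size enrollment rule are all illustrative assumptions: utterances classified as belonging to no known speaker are pooled, grouped by feature similarity, and a sufficiently large cluster is enrolled as a new speaker.

```python
# Toy sketch: cluster unknown utterances, then enroll large clusters
# as new speakers in the speaker database.

def cluster(utterances, radius=1.0):
    """Greedy single-pass clustering of 1-D feature values by cluster mean."""
    clusters = []
    for u in utterances:
        for c in clusters:
            if abs(sum(c) / len(c) - u) <= radius:
                c.append(u)
                break
        else:
            clusters.append([u])
    return clusters

def enroll_new_speakers(clusters, db, min_size=3):
    """Add a speaker model (here, the cluster mean) for each large cluster."""
    for c in clusters:
        if len(c) >= min_size:
            db.append(sum(c) / len(c))
    return db

unknown = [0.9, 1.1, 1.0, 5.0]  # three similar utterances and an outlier
db = enroll_new_speakers(cluster(unknown), [])
```

The three similar utterances form one cluster that is enrolled as a new speaker; the lone outlier stays below the enrollment threshold.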
  • Publication number: 20080319751
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Application
    Filed: July 7, 2008
    Publication date: December 25, 2008
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, SR., Michael R. Kennewick, JR., Richard Kennewick, Tom Freeman
  • Publication number: 20080319745
    Abstract: A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.
    Type: Application
    Filed: August 28, 2008
    Publication date: December 25, 2008
    Applicant: AT&T Corp.
    Inventors: Charles David Caldwell, John Bruce Harlow, Robert J. Sayko, Norman Shaye
  • Publication number: 20080319741
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Application
    Filed: June 20, 2007
    Publication date: December 25, 2008
    Applicant: AT&T Corp.
    Inventor: Mazin Gilbert
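The selection-and-normalization idea in the abstract above can be sketched as follows. The Euclidean distance, the linear frequency warp, the reference vocal tract length, and all names are illustrative assumptions: the codebook acoustically closest to the sample is selected, and that speaker's vocal tract length drives the normalization.

```python
# Toy sketch: pick the speaker codebook closest to the sample, then warp
# the sample's frequencies by that speaker's vocal tract length (VTL).

REFERENCE_VTL = 17.0  # assumed reference vocal tract length in cm

def closest_codebook(sample, codebooks):
    """codebooks: list of (vocal_tract_length, centroid_vector) pairs."""
    def dist(entry):
        _, centroid = entry
        return sum((a - b) ** 2 for a, b in zip(centroid, sample))
    return min(codebooks, key=dist)

def normalize(sample_freqs, vtl):
    """Linearly warp frequencies by the speaker's VTL relative to a reference."""
    warp = vtl / REFERENCE_VTL
    return [f * warp for f in sample_freqs]

codebooks = [(15.0, [1.0, 1.0]), (18.0, [4.0, 4.0])]
vtl, _ = closest_codebook([3.8, 4.1], codebooks)
warped = normalize([500.0, 1500.0], vtl)  # normalized sample for recognition
```

The sample is closest to the second codebook, so that speaker's vocal tract length (18.0 cm in this toy) determines the warp applied before recognition.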
  • Publication number: 20080319744
    Abstract: A method and system for producing and working with transcripts according to the invention eliminates the time inefficiencies of conventional transcription. By dispersing a source recording to a transcription team in small segments, so that team members transcribe segments in parallel, a rapid transcription process delivers a fully edited transcript within minutes. Clients can view accurate, grammatically correct, proofread and fact-checked documents that shadow live proceedings by mere minutes. The rapid transcript includes time coding, speaker identification and summary. A viewer application allows a client to view a video recording side-by-side with a transcript. Clicking on a word in the transcript locates the corresponding recorded content; advancing a recording to a particular point locates and displays the corresponding spot in the transcript. The recording is viewed using common video features, and may be downloaded. The client can edit the transcript and insert comments.
    Type: Application
    Filed: May 27, 2008
    Publication date: December 25, 2008
    Inventor: Adam Michael Goldberg
  • Publication number: 20080319749
    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
    Type: Application
    Filed: July 11, 2008
    Publication date: December 25, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
  • Publication number: 20080319748
    Abstract: A first domain satisfying a first condition concerning a current utterance understanding result and a second domain satisfying a second condition concerning a selection history are specified. For each of the first and second domains, indices representing reliability in consideration of the utterance understanding history, selection history, and utterance generation history are evaluated. Based on the evaluation results, one of the first, second, and third domains is selected as a current domain according to a selection rule.
    Type: Application
    Filed: January 31, 2007
    Publication date: December 25, 2008
    Inventors: Mikio Nakano, Hiroshi Tsujino, Yohane Takeuchi, Kazunori Komatani, Hiroshi Okuno
  • Publication number: 20080312920
    Abstract: An expressive speech-to-speech generation system which can generate expressive speech output by using expressive parameters extracted from the original speech signal to drive the standard TTS system. The system comprises: speech recognition means, machine translation means, text-to-speech generation means, expressive parameter detection means for extracting expressive parameters from the speech of language A, and expressive parameter mapping means for mapping the expressive parameters extracted by the expressive parameter detection means from language A to language B, and driving the text-to-speech generation means by the mapping results to synthesize expressive speech.
    Type: Application
    Filed: August 23, 2008
    Publication date: December 18, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shen Liqin, Shi Qin, Donald T. Tang, Zhang Wei
  • Publication number: 20080312922
    Abstract: A method of determining the speech content of a packet carrying speech encoded data missing from a speech segment in a packetised data stream communicated using at least one VOIP link between a server platform and a client platform, the method comprising, at the client platform: receiving a plurality of packets carrying speech encoded data forming said packetised data stream; processing each received packet to determine a unique message segment identifier associated with a speech segment of the received packet; processing each received packet to determine if it contains another unique message segment identifier associated with a previously received packet carrying encoded speech data; determining if the unique message segment identifier for the received packet exists in storage means provided on the client platform, and if not, storing the received packet in association with its unique message segment identifier; processing each received packet to determine a sequence identifier; checking if the
    Type: Application
    Filed: July 27, 2005
    Publication date: December 18, 2008
    Inventors: Richard J Evenden, Francis J Scahill
  • Publication number: 20080312923
    Abstract: Procedures for identifying clients in an audio event are described. In an example, a media server may order clients providing audio based on the input level. An identifier may be associated with the client for identifying the client providing input within the event. The ordered clients may be included in a list which may be inserted into a packet header carrying the audio content.
    Type: Application
    Filed: June 12, 2007
    Publication date: December 18, 2008
    Applicant: Microsoft Corporation
    Inventors: Regis J. Crinon, Humayun M. Khan, Dalibor Kukoleca
  • Publication number: 20080312934
    Abstract: A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech that is recorded by a user using a mobile communication facility resident capture facility is transmitted through a wireless communication facility to a speech recognition facility. The speech recognition facility generates results using an unstructured language model based at least in part on information relating to the recording. The results are transmitted to the mobile communications facility where an action is performed on the mobile communication facility based on the results.
    Type: Application
    Filed: March 7, 2008
    Publication date: December 18, 2008
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu