Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20090043580
    Abstract: The present invention includes a speech recognition system comprising a light element, a power control switch, the power control switch varying the power delivered to the light element, a controller, a microphone, a speech recognizer coupled to the microphone for recognizing speech input signals and transmitting recognition results to the controller, and a speech synthesizer coupled to the controller for generating synthesized speech, wherein the controller varies the power to the light element in accordance with the recognition results received from the speech recognizer. Embodiments of the invention may alternatively include a low power wake up circuit. In another embodiment, the present invention is a method of controlling a device by voice commands.
    Type: Application
    Filed: July 24, 2008
    Publication date: February 12, 2009
    Applicant: Sensory, Incorporated
    Inventors: Todd F. Mozer, Forrest S. Mozer, Erich B. Adams
  • Publication number: 20090043579
    Abstract: A method is presented which reduces data flow and thereby increases processing capacity while preserving a high level of accuracy in a distributed speech processing environment for speaker detection. The method and system of the present invention includes filtering out data based on a target speaker specific subset of labels using data filters. The method preserves accuracy and passes only a fraction of the data by optimizing target specific performance measures. Therefore, a high level of speaker recognition accuracy is maintained while utilizing existing processing capabilities.
    Type: Application
    Filed: April 2, 2008
    Publication date: February 12, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Upendra V. Chaudhari, Juan M. Huerta, Ganesh N. Ramaswamy, Olivier Verscheure
  • Publication number: 20090043581
    Abstract: This invention relates to a method of searching spoken audio data for one or more search terms comprising performing a phonetic search of the audio data to identify likely matches to a search term and producing textual data corresponding to a portion of the spoken audio data including a likely match. An embodiment of the method comprises the steps of taking phonetic index data corresponding to the spoken audio data, searching the phonetic index data for likely matches to the search term, wherein when a likely match is detected a portion of the spoken audio data or phonetic index data is selected which includes the likely match and said selected portion of the spoken audio data or phonetic index data is processed using a large vocabulary speech recogniser. The large vocabulary speech recogniser may derive textual data which can be used for further processing or may be used to present a transcript to a user.
    Type: Application
    Filed: August 7, 2008
    Publication date: February 12, 2009
    Applicant: AURIX LIMITED
    Inventors: Martin G. Abbott, Keith M. Ponting
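The two-stage search this abstract describes (a fast phonetic scan that identifies likely matches, with only the surrounding portions handed to a large-vocabulary recogniser) can be sketched as below. The toy phonetic index, the exact-match scan, and all function names are illustrative assumptions; the second-stage recogniser is left out, with the function simply returning the portions that would be passed to it.

```python
def phonetic_search(index, query_phones):
    """Return start positions in the index where the query phone
    sequence matches exactly (a stand-in for fuzzy phonetic matching)."""
    n = len(query_phones)
    return [i for i in range(len(index) - n + 1)
            if [p for _, p in index[i:i + n]] == query_phones]

def search_audio(index, query_phones, context=1):
    """For each likely match, select a surrounding portion of the index
    that would be handed to the large-vocabulary recogniser."""
    hits = phonetic_search(index, query_phones)
    portions = []
    for h in hits:
        lo = max(0, h - context)
        hi = min(len(index), h + len(query_phones) + context)
        portions.append(index[lo:hi])
    return portions

# Toy phonetic index: (time in seconds, phone) pairs.
index = [(0.0, "k"), (0.1, "ae"), (0.2, "t"), (0.3, "s"), (0.4, "ae"), (0.5, "t")]
portions = search_audio(index, ["ae", "t"])
```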
  • Publication number: 20090037176
    Abstract: A wordspotting system is applied to a speech source in a preliminary processing phase. The putative hits corresponding to queries (e.g., keywords, key phrases, or more complex queries that may include Boolean expressions and proximity operators) are used to control a speech recognizer. The control can include one or more of: application of a time specification that is determined from the putative hits for selecting an interval of the speech source to which to apply the speech recognizer; application of a grammar specification determined from the putative hits that is used by the speech recognizer; and application of a lattice or pruning specification that is used by the recognizer to limit or guide the recognizer in recognition of the speech source.
    Type: Application
    Filed: August 1, 2008
    Publication date: February 5, 2009
    Applicant: Nexidia Inc.
    Inventor: Jon A. Arrowood
  • Publication number: 20090034750
    Abstract: A system (100) and method (300) for evaluating an audio configuration is provided. The system can include a media device (110) that receives audio (121) from an application (115) and transmits the audio, and a media system (150) that receives the audio, and plays the audio out of a speaker (170) to produce external audio (156). The media device can capture the external audio from a microphone (145) to produce captured audio (146), perform pattern matching (125) on the audio signal and the captured audio signal in real-time, and present a notification (200) identifying whether audio sourced by the media device is rendered by the media system.
    Type: Application
    Filed: July 31, 2007
    Publication date: February 5, 2009
    Applicant: MOTOROLA, INC.
    Inventors: RAMY AYOUB, GARY LEE CHRISTOPHER, BRIAN SIBILSKY
  • Publication number: 20090037174
    Abstract: In one embodiment, the present system recognizes a user's speech input using an automatically generated probabilistic context free grammar for street names that maps all pronunciation variations of a street name to a single canonical representation during recognition. A tokenizer expands the representation using position-dependent phonetic tokens and an intersection classifier classifies an intersection, despite the presence of recognition errors and incomplete street names.
    Type: Application
    Filed: July 31, 2007
    Publication date: February 5, 2009
    Applicant: Microsoft Corporation
    Inventors: Michael L. Seltzer, Yun-Cheng Ju, Ivan J. Tashev
  • Publication number: 20090037171
    Abstract: The real-time voice transcription system provides a speech recognition system and method that includes use of speech and spatial-temporal acoustic data to enhance speech recognition probabilities while simultaneously identifying the speaker. Real-time edit capability is provided enabling a user to train the system during a transcription session. The system may be connected to user computers via local network and/or wide area network means.
    Type: Application
    Filed: August 4, 2008
    Publication date: February 5, 2009
    Inventors: Tim J. McFarland, Vasudevan C. Gurunathan
  • Publication number: 20090037173
    Abstract: The present invention discloses a payment card that uses speaker identification and verification (SIV) speech processing techniques for activation purposes. For example, the invention can initially identify a payment card in a deactivated state, which is an internal state of the payment card. Speech input can then be received. Speech characteristics of the speech input can be determined and compared against a voice print of an authorized card user. The payment card can be selectively activated based on comparison results. That is, when the voice print and the speech characteristics match, the payment card can be activated. Otherwise, the card will remain deactivated. An activated payment card is one that has undergone an internal state change from the deactivated state. For example, when activated a credit card number can appear in a display and a magnetic strip can contain payment information, neither of which are present in the deactivated state.
    Type: Application
    Filed: August 2, 2007
    Publication date: February 5, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: JOSEPH A. HANSEN
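The activation flow this abstract describes (compare characteristics of the speech input to an enrolled voice print, and change the card's internal state only on a match) can be sketched as follows. The cosine comparison over toy feature vectors and the threshold value are illustrative assumptions, not the patent's actual SIV method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def try_activate(card_state, voice_print, speech_features, threshold=0.95):
    """Return the card's new internal state: activated on a voice-print
    match, otherwise unchanged (i.e., it remains deactivated)."""
    if cosine(voice_print, speech_features) >= threshold:
        return "activated"
    return card_state

enrolled = (0.9, 0.1)                               # toy enrolled voice print
state = try_activate("deactivated", enrolled, (0.88, 0.12))
```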
  • Publication number: 20090030691
    Abstract: A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech is recorded by a user using a mobile communication facility resident capture facility. A speech recognition facility generates results of the recorded speech using an unstructured language model based at least in part on information relating to the recording. An application resident on the mobile communication facility is identified, wherein the resident application is capable of taking the results generated by the speech recognition facility as an input. The generated results are input to the application.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu
  • Publication number: 20090030697
    Abstract: A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech is recorded by a user using a mobile communication facility resident capture facility. A speech recognition facility generates results of the recorded speech using an unstructured language model based at least in part on information relating to the recording. A context of the mobile communication facility is determined at the time speech is recorded and, based on the context, the generated results are delivered to a facility for performing an action on the mobile communication facility.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu
  • Publication number: 20090030684
    Abstract: A method and system for entering information into a software application resident on a mobile communication facility is provided. The method and system may include recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, transmitting the results to the mobile communications facility, loading the results into the software application and simultaneously displaying the results as a set of words and as a set of application results based on those words.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke
  • Publication number: 20090030687
    Abstract: A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech that is recorded by a user using a mobile communication facility resident capture facility is transmitted through a wireless communication facility to a speech recognition facility. The speech recognition facility generates results using an unstructured language model based at least in part on information relating to the recording. The results are transmitted to the mobile communications facility where an action is performed on the mobile communication facility based on the results and adapting the speech recognition facility based on usage.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke, Christopher Michael Micali
  • Publication number: 20090030689
    Abstract: Voice recognition methods, systems and interfaces are used to collect data and produce databases that are then searched and used to produce reports or electronic filings. The databases are developed using a hierarchically designed command structure and a hierarchy of relational databases for the entry and recognition of voice commands. The invention uses an Adaptive Grammar that allows a very high probability for accurate recognition and a rapid recognition response to be achieved. The invention allows for multiple users and multiple mobile computers to maximize voice recognition capabilities.
    Type: Application
    Filed: October 3, 2006
    Publication date: January 29, 2009
    Inventors: Vincent Perrin, Judi Perrin, Michael Joost, Kevin Wood
  • Publication number: 20090030690
    Abstract: A speech analysis apparatus analyzing prosodic characteristics of speech information and outputting a prosodic discrimination result includes an input unit inputting speech information, an acoustic analysis unit calculating relative pitch variation and a discrimination unit performing speech discrimination processing, in which the acoustic analysis unit calculates a current template relative pitch difference, determining whether a difference absolute value between the current template relative pitch difference and a previous template relative pitch difference is equal to or less than a predetermined threshold or not, when the value is not less than the threshold, calculating an adjacent relative pitch difference, and when the adjacent relative pitch difference is equal to or less than a previously set margin value, executing correction processing of adding or subtracting an octave of the current template relative pitch difference to calculate the relative pitch variation by applying the relative pitch difference.
    Type: Application
    Filed: July 21, 2008
    Publication date: January 29, 2009
    Inventor: Keiichi YAMADA
  • Publication number: 20090030683
    Abstract: Disclosed are methods, systems, and computer-readable media for tracking dialog states in a spoken dialog system. The method comprises casting a plurality of dialog states, or particles, as a network describing the probability relationships between each of a plurality of variables, sampling a subset of the plurality of dialog states, or particles, in the network, for each sampled dialog state, or particle, projecting into the future, assigning a weight to each sampled particle, and normalizing the assigned weights to yield a new estimated distribution over each variable's values, wherein the distribution over the variables is used in a spoken dialog system. Also disclosed is a method of tuning performance of the methods, systems, and computer-readable media by adding or removing particles to/from the network.
    Type: Application
    Filed: July 26, 2007
    Publication date: January 29, 2009
    Applicant: AT&T Labs, Inc
    Inventor: Jason WILLIAMS
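The particle-based tracking this abstract describes (sample dialog states, weight each sampled particle, and normalize the weights into an estimated distribution over the hidden state) can be sketched as below. The two-state space, the prior, and the toy observation likelihoods are illustrative assumptions; a real system would also project particles forward through the dialog model.

```python
import random

def track(particles, likelihood):
    """Weight each sampled particle by the observation likelihood and
    normalize into a distribution over the hidden dialog state."""
    weights = {p: 0.0 for p in set(particles)}
    for p in particles:
        weights[p] += likelihood(p)
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

random.seed(0)
states = ["want_flight", "want_hotel"]
# Draw particles from a uniform prior, then weight them with a toy
# observation model in which the utterance strongly suggests a flight.
particles = [random.choice(states) for _ in range(1000)]
obs_likelihood = {"want_flight": 0.9, "want_hotel": 0.1}
posterior = track(particles, lambda p: obs_likelihood[p])
```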
  • Publication number: 20090030686
    Abstract: In a confidence computing method and system, a processor may interpret speech signals as a text string or directly receive a text string as input, generate a syntactical parse tree representing the interpreted string and including a plurality of sub-trees which each represents a corresponding section of the interpreted text string, determine for each sub-tree whether the sub-tree is accurate, obtain replacement speech signals for each sub-tree determined to be inaccurate, and provide output based on corresponding text string sections of at least one sub-tree determined to be accurate.
    Type: Application
    Filed: July 27, 2007
    Publication date: January 29, 2009
    Inventors: Fuliang Weng, Feng Lin, Zhe Feng
  • Publication number: 20090030685
    Abstract: Speech recorded by an audio capture facility of a navigation facility is processed by a speech recognition facility to generate results that are provided to the navigation facility. When information related to a navigation application running on the navigation facility is provided to the speech recognition facility, the results generated are based at least in part on the application related information. The speech recognition facility uses an unstructured language model for generating results. The user of the navigation facility may optionally be allowed to edit the results being provided to the navigation facility. The speech recognition facility may also adapt speech recognition based on usage of the results.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Yongdeng Chen
  • Publication number: 20090030688
    Abstract: Entering information into a software application resident on a mobile communication facility comprises recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, tagging the results with information about the words in the results, transmitting the results and tags to the mobile communications facility, and loading the results and tags into the software application.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Doron Gan, Eric H. Thayer
  • Publication number: 20090030695
    Abstract: A speech recognition and control system including a sound card for receiving speech and converting the speech into digital data, the sound card removably connected to an input of a computer, recognizer software executing on the computer for interpreting at least a portion of the digital data, event detection software executing on the computer for detecting connectivity of the sound card, and command control software executing on the computer for generating a command based on at least one of the digital data and the connectivity of the sound card.
    Type: Application
    Filed: April 18, 2008
    Publication date: January 29, 2009
    Inventors: Gang Wang, Chengyi Zheng, Heinz-Werner Stiller, Matteo Contolini
  • Publication number: 20090025071
    Abstract: A process for authenticating a user to control remote access to a service, data base or data network is provided, in which during an enrolment step, an initial voice sample provided by the user is analyzed to obtain an initial user-specific voice profile and, in a later verification step, a current voice sample of the user is analyzed and compared to the initial voice profile to generate an access control signal. An additional user-dedicated authentication is generated in a pre-enrolment period, and the additional authentication is used to authenticate the user in the enrolment step and/or in an access control step prior to and independent of the enrolment step, in a provisional or supplementary authentication procedure.
    Type: Application
    Filed: January 7, 2008
    Publication date: January 22, 2009
    Applicant: VOICE.TRUST AG
    Inventors: Marc Mumm, Rajasekharan Kuppuswamy
  • Publication number: 20090024390
    Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
    Type: Application
    Filed: May 2, 2008
    Publication date: January 22, 2009
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Neeraj Deshmukh, Puming Zhan
  • Publication number: 20090024720
    Abstract: A voice-enabled web portal system includes a web portal server and a call manager software module. The web portal server is operable to download data according to parameters of a web portal. The call manager software module is operable to accept a voice query from a user via telephone, to retrieve a portion of the downloaded data in response to the voice query, and to provide the portion of downloaded data to the user via telephone. A method of providing information from a web portal includes downloading data via a web portal according to parameters of the web portal at a predetermined time interval, filtering the downloaded data to produce portal information, and selectively providing portions of the portal information in response to a voice query.
    Type: Application
    Filed: July 21, 2008
    Publication date: January 22, 2009
    Inventors: Fakhreddine Karray, Jiping Sun
  • Publication number: 20090018834
    Abstract: A computer-based virtual assistant includes a virtual assistant application running on a computer capable of receiving human voice communications from a user of a remote user interface and transmitting a vocalization to the remote user interface, the virtual assistant application enabling the user to access email and voicemail messages of the user, the virtual assistant application selecting a responsive action to a verbal query or instruction received from the remote user interface and transmitting a vocalization characterizing the selected responsive action to the remote user interface, and the virtual assistant waiting a predetermined period of time, and if no canceling indication is received from the remote user interface, proceeding to perform the selected responsive action, and if a canceling indication is received from the remote user interface halting the selected responsive action and transmitting a new vocalization to the remote user interface. Also a method of using the virtual assistant.
    Type: Application
    Filed: September 23, 2008
    Publication date: January 15, 2009
    Inventors: Robert S. Cooper, Derek Sanders, Richard M. Ulmer
  • Publication number: 20090018831
    Abstract: The invention prevents voices from being recognized with poor accuracy when a speaker is not close to a sound pickup device. A speech recognition apparatus (10) includes a sound pickup device (16) picking up sounds, a photographing device (18) photographing images of a speaker making voices into the sound pickup device (16), a voice recognition function unit (132) recognizing voices on the basis of the picked-up voices, and a recognition/learning determination unit (142) restricting the voice recognition function unit (132) from recognizing voices when the photographed images do not contain speaker images showing at least part of the speaker.
    Type: Application
    Filed: December 19, 2005
    Publication date: January 15, 2009
    Applicant: KYOCERA CORPORATION
    Inventor: Kugo Morita
  • Publication number: 20090018832
    Abstract: An information communication terminal (100) that includes: a speech recognition module (6) for recognizing speech information to identify a plurality of words in the recognized speech information; a storage medium (20) for storing keyword extraction condition setting data (24) in which a condition for extracting a keyword is set; a keyword extraction module (8) for reading the keyword extraction condition setting data (24) to extract a plurality of keywords from the plurality of words; a related information acquisition module (11) for acquiring related information related to a plurality of keywords; and a related information output module (14) for providing related information to a monitor (2).
    Type: Application
    Filed: February 8, 2006
    Publication date: January 15, 2009
    Inventors: Takeya Mukaigaito, Shinya Takada, Daigoro Yokozeki, Miki Sakai, Rie Sakai, Katsuya Arai, Takuo Nishihara, Takahiko Murayama
  • Publication number: 20090018835
    Abstract: A computer-based virtual assistant includes a virtual assistant application running on a computer capable of receiving human voice communications from a user of a remote user interface and transmitting a vocalization to the remote user interface, the virtual assistant application enabling the user to access email and voicemail messages of the user, the virtual assistant application selecting a responsive action to a verbal query or instruction received from the remote user interface and transmitting a vocalization characterizing the selected responsive action to the remote user interface, and the virtual assistant waiting a predetermined period of time, and if no canceling indication is received from the remote user interface, proceeding to perform the selected responsive action, and if a canceling indication is received from the remote user interface halting the selected responsive action and transmitting a new vocalization to the remote user interface. Also a method of using the virtual assistant.
    Type: Application
    Filed: September 23, 2008
    Publication date: January 15, 2009
    Inventors: Robert S. Cooper, Derek Sanders, Richard M. Ulmer
  • Publication number: 20090018829
    Abstract: Described is a speech recognition dialog management system that allows more open-ended conversations between virtual agents and people than are possible using just agent-directed dialogs. The system uses both novel dialog context switching and learning algorithms based on spoken interactions with people. The context switching is performed through processing multiple dialog goals in a last-in-first-out (LIFO) pattern. The recognition accuracy for these new flexible conversations is improved through automated learning from processing errors and addition of new grammars.
    Type: Application
    Filed: June 8, 2005
    Publication date: January 15, 2009
    Applicant: METAPHOR SOLUTIONS, INC.
    Inventor: Michael Kuperstein
  • Publication number: 20090018720
    Abstract: The present invention provides an emission monitoring device for a vehicle. The device comprises a lightweight housing for operationally encapsulating a processor, a memory, a sensor for each vehicle component to be monitored, a display device, and a power supply for powering the device. The processor is programmed to monitor each sensor, capture data from each sensor, store the captured data from each sensor in the memory, and display the captured data on the display device. In one embodiment of the present invention, the housing is operationally mounted onto the inner gas lid of the vehicle. In an alternative embodiment, the housing is incorporated into the dashboard.
    Type: Application
    Filed: May 8, 2006
    Publication date: January 15, 2009
    Inventor: Lee Bernard
  • Publication number: 20090012792
    Abstract: A speech recognition system is provided for selecting, via a speech input, an item from a list of items. The speech recognition system detects a first speech input, recognizes the first speech input, compares the recognized first speech input with the list of items and generates a first candidate list of best matching items based on the comparison result. The system then informs the speaker of at least one of the best matching items of the first candidate list for a selection of an item by the speaker. If the intended item is not one of the best matching items presented to the speaker, the system then detects a second speech input, recognizes the second speech input, and generates a second candidate list of best matching items taking into account the comparison result obtained with the first speech input.
    Type: Application
    Filed: December 12, 2007
    Publication date: January 8, 2009
    Applicant: Harman Becker Automotive Systems GmbH
    Inventors: Andreas Low, Lars Konig, Christian Hillebrecht
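The two-pass selection this abstract describes (a first candidate list of best-matching items; if the user rejects it, a second input is matched while taking the first comparison result into account) can be sketched as follows. Using `difflib` string similarity as a stand-in for acoustic comparison, and modeling "taking the first result into account" as simply excluding already-offered items, are both illustrative assumptions.

```python
import difflib

def candidates(recognized, items, n=3, exclude=()):
    """Return up to n best-matching list items for a recognized input,
    skipping items already offered (and rejected) in an earlier pass."""
    pool = [i for i in items if i not in exclude]
    return difflib.get_close_matches(recognized, pool, n=n, cutoff=0.0)

items = ["Main Street", "Maple Street", "Market Street", "Mill Road"]
first = candidates("Maple Street", items)
# The user rejects the first list; the second pass skips those items.
second = candidates("Maple Street", items, exclude=first)
```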
  • Publication number: 20090012787
    Abstract: A dialog processing system which includes a target expression data extraction unit for extracting a plurality of target expression data each including a pattern matching portion which matches an utterance pattern, which are inputted by an utterance pattern input unit and is an utterance structure derived from contents of field-independent general conversations, among a plurality of utterance data which are inputted by an utterance data input unit and obtained by converting contents of a plurality of conversations in one field; a feature extraction unit for retrieving the pattern matching portions, respectively, from the plurality of target expression data extracted, and then for extracting feature quantities common to the plurality of pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the feature quantities extracted.
    Type: Application
    Filed: July 3, 2008
    Publication date: January 8, 2009
    Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
  • Publication number: 20090012790
    Abstract: A speech recognition apparatus which improves the sound quality of speech output as a speech recognition result is provided. The speech recognition apparatus includes a recognition unit, which recognizes speech based on a recognition dictionary, and a registration unit, which registers a dictionary entry of a new recognition word in the recognition dictionary. The recognition unit includes a generation unit, which generates a dictionary entry including speech of the new recognition word and feature parameters of the speech, and a modification unit, which makes a modification for improving the sound quality of the speech included in the dictionary entry generated by the generation unit. The recognition unit includes a speech output unit, which outputs speech which is included in a dictionary entry corresponding to the recognition result of input speech, and is modified by the modification unit.
    Type: Application
    Filed: July 1, 2008
    Publication date: January 8, 2009
    Applicant: CANON KABUSHIKI KAISHA
    Inventors: Masayuki Yamada, Toshiaki Fukada, Yasuo Okutani, Michio Aizawa
  • Publication number: 20090012791
    Abstract: A method and apparatus for carrying out adaptation using input speech data even when reference pattern recognition performance is low. A reference pattern adaptation device 2 includes a speech recognition section 18, an adaptation data calculating section 19 and a reference pattern adaptation section 20. The speech recognition section 18 calculates a recognition result teacher label from the input speech data and the reference pattern. The adaptation data calculating section 19 calculates adaptation data composed of a teacher label and speech data. The adaptation data is composed of the input speech data and the recognition result teacher label corrected for adaptation by the recognition error knowledge, which is the statistical information of the tendency towards recognition errors of the reference pattern. The reference pattern adaptation section 20 adapts the reference pattern using the adaptation data to generate an adaptation pattern.
    Type: Application
    Filed: February 16, 2007
    Publication date: January 8, 2009
    Applicant: NEC Corporation
    Inventor: Yoshifumi Onishi
  • Publication number: 20090012785
    Abstract: A sampling-rate-independent method of automated speech recognition (ASR). Speech energies of a plurality of codebooks generated from training data created at an ASR sampling rate are compared to speech energies in a current frame of acoustic data generated from received audio created at an audio sampling rate below the ASR sampling rate. A codebook is selected from the plurality of codebooks, and has speech energies that correspond to speech energies in the current frame over a spectral range corresponding to the audio sampling rate. Speech energies above the spectral range are copied from the selected codebook and appended to the current frame.
    Type: Application
    Filed: July 3, 2007
    Publication date: January 8, 2009
    Applicant: GENERAL MOTORS CORPORATION
    Inventor: Rathinavelu Chengalvarayan
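The bandwidth-extension step this abstract describes (select the training codebook whose speech energies correspond to the narrowband frame, then copy the codebook's energies above the narrowband spectral range and append them to the frame) can be sketched as below. The band split, the squared-error selection rule, and the toy energy vectors are illustrative assumptions.

```python
def extend_frame(frame_low, codebooks, low_bands):
    """frame_low: energies over the narrowband spectral range.
    codebooks: full-band energy vectors generated from training data.
    low_bands: number of bands covered by the narrowband input."""
    def dist(cb):
        # Compare low-band energies of the frame against each codebook.
        return sum((a - b) ** 2 for a, b in zip(frame_low, cb[:low_bands]))
    best = min(codebooks, key=dist)
    # Copy energies above the narrowband range from the chosen codebook
    # and append them to the current frame.
    return list(frame_low) + list(best[low_bands:])

codebooks = [
    [1.0, 2.0, 3.0, 4.0],   # full-band training vectors (toy values)
    [5.0, 6.0, 7.0, 8.0],
]
frame = extend_frame([1.1, 2.1], codebooks, low_bands=2)
```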
  • Publication number: 20090012788
    Abstract: The translation system of a preferred embodiment includes an input element that receives an input language as audio information, an output element that displays an output language as visual information, and a remote server coupled to the input element and the output element, the remote server including a database of sign language images; and a processor that receives the input language from the input element, translates the input language into the output language, and transmits the output language to the output element, wherein the output language is a series of the sign language images that correspond to the input language and that are coupled to one another with substantially seamless continuity, such that the ending position of a first image is blended into the starting position of a second image.
    Type: Application
    Filed: July 3, 2008
    Publication date: January 8, 2009
    Inventors: Jason Andre Gilbert, Shau-yuh YU
  • Publication number: 20090006099
    Abstract: Depiction of a speech user interface via graphical elements is provided. One or more bits of a graphical user interface bitmask are re-designated as speech bits. When a software application processes the re-designated speech bits, a window manager responsible for generating and rendering a graphical user interface for the application passes information to a secondary window manager responsible for generating and rendering a speech user interface. The secondary speech window manager may load a text-to-speech engine, a speech recognizer engine, a lexicon or library of recognizable words or phrases and a set of “grammars” (recognizable words and phrasing) for building a speech user interface that will receive, recognize and act on spoken input to the associated software application.
    Type: Application
    Filed: June 29, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Timothy D. Sharpe, Cameron A. Etezadi
  • Publication number: 20090006087
    Abstract: A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data comprising the ratios between the pronunciation times, in the generated synthetic speech, of the words of the received text is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device.
    Type: Application
    Filed: June 25, 2008
    Publication date: January 1, 2009
    Inventors: Noriko Imoto, Tetsuya Uda, Takatoshi Watanabe
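The ratio idea in the abstract above can be illustrated with a small sketch. The proportional-allocation rule and all names are assumptions, not the patent's exact algorithm: per-word durations measured in the synthetic speech are rescaled so they sum to the real recording's length, giving each word an estimated time span.

```python
# Toy sketch: allocate each word a share of the recording proportional to
# its pronunciation time in the synthetic speech.

def word_times(words, synth_durations, recording_length):
    """Map each word to a (word, start, end) time in the recording."""
    total = sum(synth_durations)
    times, t = [], 0.0
    for w, d in zip(words, synth_durations):
        span = recording_length * d / total
        times.append((w, round(t, 3), round(t + span, 3)))
        t += span
    return times

# Two words with synthetic durations in a 0.4 : 0.6 ratio, 5 s recording.
schedule = word_times(["hello", "world"], [0.4, 0.6], 5.0)
```

Each word's share of the 5-second recording follows the synthetic-speech duration ratio, so "hello" is placed in the first 2 seconds and "world" in the remaining 3.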
  • Publication number: 20090006345
    Abstract: Architecture for completing search queries by using artificial intelligence based schemes to infer search intentions of users. Partial queries are completed dynamically in real time. Additionally, search aliasing can also be employed. Custom tuning can be performed based on at least query inputs in the form of text, graffiti, images, handwriting, voice, audio, and video signals. Natural language processing occurs, along with handwriting recognition and slang recognition. The system includes a classifier that receives a partial query as input, accesses a query database based on contents of the query input, and infers an intended search goal from query information stored on the query database. A query formulation engine receives search information associated with the intended search goal and generates a completed formal query for execution.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: John C. Platt, Gary W. Flake, Ramez Naam, Anoop Gupta, Oliver Hurst-Hiller, Trenholme J. Griffin
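The query-completion flow described above can be approximated with a small sketch. The patent's classifier is AI-based and infers intent from a query database; here a frequency-ranked prefix match over a hypothetical query log stands in for that inference step, and all names are assumptions.

```python
# Toy sketch: complete a partial query from a frequency-weighted query log.

from collections import Counter

query_log = Counter({
    "speech recognition": 40,
    "speech synthesis": 25,
    "spectral analysis": 10,
})

def complete(partial, log, k=2):
    """Return the k most frequent logged queries extending the partial query."""
    matches = [q for q in log if q.startswith(partial)]
    return sorted(matches, key=lambda q: -log[q])[:k]

suggestions = complete("spee", query_log)  # ranked completions in real time
```

A real system would rank candidates by an inferred search goal rather than raw frequency, but the interface (partial query in, completed formal queries out) is the same.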
  • Publication number: 20090006092
    Abstract: [PROBLEMS] To provide a language model creation system that builds a speech recognition language model capable of recognizing the meaningful speech required by speech recognition applications, such as conversational speech at a call center. [MEANS FOR SOLVING PROBLEMS] A speech recognition language model making system (1) comprises a probability estimating device (11), a language model learning corpus storage device (14), and a learning corpus emphasizing device (12). The learning corpus emphasizing device (12) emphasizes a prescribed part of the learning corpus to create an emphasized learning corpus. The probability estimating device (11) builds the speech recognition language model by estimating the probability values of the language model from the emphasized learning corpus.
    Type: Application
    Filed: December 26, 2006
    Publication date: January 1, 2009
    Inventors: Kiyokazu Miki, Kentarou Nagamoto
  • Publication number: 20090006088
    Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as the office, home or vehicle, while maintaining the accuracy of the speech recognition.
    Type: Application
    Filed: September 9, 2008
    Publication date: January 1, 2009
    Applicant: AT&T Corp.
    Inventors: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
  • Publication number: 20080319747
    Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.
    Type: Application
    Filed: August 20, 2008
    Publication date: December 25, 2008
    Applicant: Sony Deutschland GmbH
    Inventors: Ralf Kompe, Thomas Kemp
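The unknown-speaker flow in the abstract above can be sketched in miniature. The greedy threshold clustering, the 1-D "features", and the cluster-size enrollment rule are all illustrative assumptions: utterances classified as belonging to no known speaker are pooled, grouped by feature similarity, and a sufficiently large cluster is enrolled as a new speaker.

```python
# Toy sketch: cluster unknown utterances, then enroll large clusters
# as new speakers in the speaker database.

def cluster(utterances, radius=1.0):
    """Greedy single-pass clustering of 1-D feature values by cluster mean."""
    clusters = []
    for u in utterances:
        for c in clusters:
            if abs(sum(c) / len(c) - u) <= radius:
                c.append(u)
                break
        else:
            clusters.append([u])
    return clusters

def enroll_new_speakers(clusters, db, min_size=3):
    """Add a speaker model (here, the cluster mean) for each large cluster."""
    for c in clusters:
        if len(c) >= min_size:
            db.append(sum(c) / len(c))
    return db

unknown = [0.9, 1.1, 1.0, 5.0]  # three similar utterances and an outlier
db = enroll_new_speakers(cluster(unknown), [])
```

The three similar utterances form one cluster that is enrolled as a new speaker; the lone outlier stays below the enrollment threshold.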
  • Publication number: 20080319751
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Application
    Filed: July 7, 2008
    Publication date: December 25, 2008
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, SR., Michael R. Kennewick, JR., Richard Kennewick, Tom Freeman
  • Publication number: 20080319745
    Abstract: A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.
    Type: Application
    Filed: August 28, 2008
    Publication date: December 25, 2008
    Applicant: AT&T Corp.
    Inventors: Charles David Caldwell, John Bruce Harlow, Robert J. Sayko, Norman Shaye
  • Publication number: 20080319741
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Application
    Filed: June 20, 2007
    Publication date: December 25, 2008
    Applicant: AT&T Corp.
    Inventor: Mazin Gilbert
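The selection-and-normalization idea in the abstract above can be sketched as follows. The Euclidean distance, the linear frequency warp, the reference vocal tract length, and all names are illustrative assumptions: the codebook acoustically closest to the sample is selected, and that speaker's vocal tract length drives the normalization.

```python
# Toy sketch: pick the speaker codebook closest to the sample, then warp
# the sample's frequencies by that speaker's vocal tract length (VTL).

REFERENCE_VTL = 17.0  # assumed reference vocal tract length in cm

def closest_codebook(sample, codebooks):
    """codebooks: list of (vocal_tract_length, centroid_vector) pairs."""
    def dist(entry):
        _, centroid = entry
        return sum((a - b) ** 2 for a, b in zip(centroid, sample))
    return min(codebooks, key=dist)

def normalize(sample_freqs, vtl):
    """Linearly warp frequencies by the speaker's VTL relative to a reference."""
    warp = vtl / REFERENCE_VTL
    return [f * warp for f in sample_freqs]

codebooks = [(15.0, [1.0, 1.0]), (18.0, [4.0, 4.0])]
vtl, _ = closest_codebook([3.8, 4.1], codebooks)
warped = normalize([500.0, 1500.0], vtl)  # normalized sample for recognition
```

The sample is closest to the second codebook, so that speaker's vocal tract length (18.0 cm in this toy) determines the warp applied before recognition.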
  • Publication number: 20080319744
    Abstract: A method and system for producing and working with transcripts according to the invention eliminates the time inefficiencies of conventional transcription. By dispersing a source recording to a transcription team in small segments, so that team members transcribe segments in parallel, a rapid transcription process delivers a fully edited transcript within minutes. Clients can view accurate, grammatically correct, proofread and fact-checked documents that shadow live proceedings by mere minutes. The rapid transcript includes time coding, speaker identification and summary. A viewer application allows a client to view a video recording side-by-side with a transcript. Clicking on a word in the transcript locates the corresponding recorded content; advancing a recording to a particular point locates and displays the corresponding spot in the transcript. The recording is viewed using common video features, and may be downloaded. The client can edit the transcript and insert comments.
    Type: Application
    Filed: May 27, 2008
    Publication date: December 25, 2008
    Inventor: Adam Michael Goldberg
  • Publication number: 20080319749
    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
    Type: Application
    Filed: July 11, 2008
    Publication date: December 25, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
  • Publication number: 20080319748
    Abstract: A first domain satisfying a first condition concerning a current utterance understanding result and a second domain satisfying a second condition concerning a selection history are specified. For each of the first and second domains, indices representing reliability in consideration of the utterance understanding history, selection history, and utterance generation history are evaluated. Based on the evaluation results, one of the first, second, and third domains is selected as a current domain according to a selection rule.
    Type: Application
    Filed: January 31, 2007
    Publication date: December 25, 2008
    Inventors: Mikio Nakano, Hiroshi Tsujino, Yohane Takeuchi, Kazunori Komatani, Hiroshi Okuno
  • Publication number: 20080312920
    Abstract: An expressive speech-to-speech generation system which can generate expressive speech output by using expressive parameters extracted from the original speech signal to drive the standard TTS system. The system comprises: speech recognition means, machine translation means, text-to-speech generation means, expressive parameter detection means for extracting expressive parameters from the speech of language A, and expressive parameter mapping means for mapping the expressive parameters extracted by the expressive parameter detection means from language A to language B, and driving the text-to-speech generation means by the mapping results to synthesize expressive speech.
    Type: Application
    Filed: August 23, 2008
    Publication date: December 18, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shen Liqin, Shi Qin, Donald T. Tang, Zhang Wei
  • Publication number: 20080312922
    Abstract: A method of determining the speech content of a packet carrying speech encoded data missing from a speech segment in a packetised data stream communicated using at least one VOIP link between a server platform and a client platform, the method comprising, at the client platform: receiving a plurality of packets carrying speech encoded data forming said packetised data stream; processing each received packet to determine a unique message segment identifier associated with a speech segment of the received packet; processing each received packet to determine if it contains another unique message segment identifier associated with a previously received packet carrying encoded speech data; determining if the unique message segment identifier for the received packet exists in storage means provided on the client platform, and if not, storing the received packet in association with its unique message segment identifier; processing each received packet to determine a sequence identifier; checking if the
    Type: Application
    Filed: July 27, 2005
    Publication date: December 18, 2008
    Inventors: Richard J Evenden, Francis J Scahill
  • Publication number: 20080312923
    Abstract: Procedures for identifying clients in an audio event are described. In an example, a media server may order clients providing audio based on the input level. An identifier may be associated with the client for identifying the client providing input within the event. The ordered clients may be included in a list which may be inserted into a packet header carrying the audio content.
    Type: Application
    Filed: June 12, 2007
    Publication date: December 18, 2008
    Applicant: Microsoft Corporation
    Inventors: Regis J. Crinon, Humayun M. Khan, Dalibor Kukoleca
  • Publication number: 20080312934
    Abstract: A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech that is recorded by a user using a mobile communication facility resident capture facility is transmitted through a wireless communication facility to a speech recognition facility. The speech recognition facility generates results using an unstructured language model based at least in part on information relating to the recording. The results are transmitted to the mobile communications facility where an action is performed on the mobile communication facility based on the results.
    Type: Application
    Filed: March 7, 2008
    Publication date: December 18, 2008
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu