Abstract: A method of enabling the selection of a language to be employed as an input language by a disambiguation routine of a handheld electronic device having stored therein a plurality of input languages and a disambiguation routine, includes detecting a selection of a language, detecting as an ambiguous input an actuation of one or more input members, outputting at least a plurality of the language objects that correspond to the ambiguous input, outputting an indicator of which one of the input languages is currently employed by the disambiguation routine, and enabling an alternate one of the input languages to be selected in response to a selection of the indicator in lieu of one of the plurality of language objects.
Type:
Grant
Filed:
September 14, 2012
Date of Patent:
March 5, 2013
Assignee:
Research In Motion Limited
Inventors:
Sherryl Lee Lorraine Scott, Zaheen Somani
Abstract: A speaker speed conversion system includes: a risk site detection unit (22) for detecting sites of risk regarding sound quality from among speech that is received as input, a frame boundary detection unit (23) for searching for a plurality of points that can serve as candidates of frame boundaries from among speech that is received as input and, of these points, supplying as a frame boundary the point that is predicted to be best from the standpoint of sound quality, and an OLA unit (25) for implementing speed conversion based on the detection results in the frame boundary detection unit (23); wherein the frame boundary detection unit (23) eliminates, from candidates of frame boundaries, sites of risk regarding sound quality that were detected in the risk site detection unit (22).
Abstract: A complex analysis filterbank is implemented by obtaining an input audio signal as a plurality of N time-domain input samples. Pair-wise additions and subtractions of the time-domain input samples are performed to obtain first and second groups of intermediate samples, each group having N/2 intermediate samples. The signs of odd-indexed intermediate samples in the second group are then inverted. A first transform is applied to the first group of intermediate samples to obtain a first group of output coefficients in the frequency domain. A second transform is applied to the second group of intermediate samples to obtain an intermediate second group of output coefficients in the frequency domain. The order of coefficients in the intermediate second group of output coefficients is then reversed to obtain a second group of output coefficients. The first and second groups of output coefficients may be stored and/or transmitted as a frequency domain representation of the audio signal.
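The butterfly-and-transform structure in the abstract above can be sketched in a few lines of NumPy. The abstract does not specify how samples are paired or which transforms are used, so the even/odd pairing and the FFT below are illustrative assumptions:

```python
import numpy as np

def complex_analysis_filterbank(x):
    """Sketch of the described filterbank. The pairing scheme and the two
    transforms are not specified in the abstract; here we pair sample 2k
    with sample 2k+1 and use an FFT as a stand-in transform."""
    N = len(x)
    assert N % 2 == 0
    even, odd = x[0::2], x[1::2]
    first = even + odd             # pair-wise additions  -> N/2 samples
    second = even - odd            # pair-wise subtractions -> N/2 samples
    second[1::2] *= -1             # invert signs of odd-indexed samples
    c1 = np.fft.fft(first)         # first transform
    c2 = np.fft.fft(second)        # second (intermediate) transform
    c2 = c2[::-1]                  # reverse coefficient order
    return c1, c2
```

Each stage maps directly to one sentence of the abstract: pair-wise add/subtract, sign inversion of odd-indexed samples, two transforms, and coefficient-order reversal.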
Abstract: Descriptions of visually presented material are provided to one or more conference participants who do not have video capabilities. This presented material could be any one or more of a document, PowerPoint® presentation, spreadsheet, Webex® presentation, whiteboard, chalkboard, interactive whiteboard, description of a flowchart, picture, or in general, any information visually presented at a conference. For this visually presented information, descriptions thereof are assembled and forwarded via one or more of a message, SMS message, whisper channel, text information, non-video channel, MSRP, or the like, to one or more conference participant endpoints. These descriptions of visually presented information, such as a document, spreadsheet, spreadsheet presentation, multi-media presentation, or the like, can be assembled in cooperation with one or more of optical character recognition (OCR), text-to-speech conversion, human input, or the like.
Abstract: Embodiments of a system include a client device (102), a voice server (106), and an application server (104). The voice server is distinct from the application server. The client device renders (316) a visual display that includes at least one display element for which input data is receivable though a visual modality and a voice modality. The client device may receive speech through the voice modality and send (502) uplink audio data representing the speech to the voice server over an audio data path (124). The application server receives (514) a speech recognition result from the voice server over an application server/voice server control path (122). The application server sends (514), over an application server/client control path (120), a message to the client device that includes the speech recognition result. The client device updates (516) one or more of the display elements according to the speech recognition result.
Type:
Grant
Filed:
December 17, 2008
Date of Patent:
February 26, 2013
Assignee:
Motorola Mobility LLC
Inventors:
Jonathan R. Engelsma, Anuraj Kunnummel Ennai, James C. Ferrans
Abstract: A method for combining speech recognition with near field communication (NFC) to enable a user to enter, store, and use web addresses on portable devices. A user of a portable device having an NFC reader, a voice input interface, a speech recognition system, and memory enables the NFC reader of the portable device to touch an NFC tag or reader found on an object. The object contains information of interest to a user of the portable device. When the NFC reader and the NFC tag or reader touch, the portable device receives a URI and default keywords associated with the URI. The portable device stores the URI in a persistent storage of the portable device based on the default keywords and the date, time, and location of when and where the URI was obtained.
Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation, a correction device (10) synchronously marks the word of the recognized text information (ETI) that is related by the link information (LI) to the speech data (SD) just played back, the currently marked word featuring the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) make it possible to synchronize the text cursor (TC) with the audio cursor (AC), or the audio cursor (AC) with the text cursor (TC), so that the positioning of the respective cursor (AC, TC) is simplified considerably.
Abstract: A food processor with recognition ability of emotion-related information and emotional signals is disclosed, which comprises: an emotion recognition module and a food processing module. The emotion recognition module is capable of receiving sound signals so as to identify an emotion contained in the received sound signals. The food processing module is capable of producing food products with a taste corresponding to the emotion recognition result of the emotion recognition module.
Type:
Grant
Filed:
April 12, 2010
Date of Patent:
February 19, 2013
Assignee:
Industrial Technology Research Institute
Inventors:
Ying-Tzu Lin, Chao-Ming Yu, Hao-Hsiang Yu, Tsang-Gang Lin
Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets, where each text string can be associated with a native string language (e.g., the language of the string). When several text strings are associated with at least two distinct languages, a series of rules can be applied to the strings to identify a single voice language to use for synthesizing the speech content from the text strings. In some embodiments, a prioritization scheme can be applied to the text strings to identify the more important text strings. The rules can include, for example, selecting a voice language based on the prioritization scheme, a default language associated with an electronic device, the ability of a voice language to speak text in a different language, or any other suitable rule.
Type:
Grant
Filed:
March 9, 2009
Date of Patent:
February 19, 2013
Assignee:
Apple Inc.
Inventors:
Kenneth Herman, Matthew Rogers, Bryan James
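The rule cascade described in the abstract above (prefer a prioritization scheme, fall back to a device default) can be sketched as follows. The tuple layout, the priority ordering, and the rule order are illustrative assumptions, not the patented algorithm:

```python
def select_voice_language(strings, device_default="en"):
    """Pick a single voice language for synthesizing several text strings.
    `strings` is a list of (text, language, priority) tuples, where a lower
    priority number means a more important string (an assumption)."""
    langs = {lang for _, lang, _ in strings}
    if len(langs) == 1:                        # no conflict: all strings agree
        return langs.pop()
    # Rule 1: prefer the language of the highest-priority string
    best = min(strings, key=lambda s: s[2])
    if best[1] is not None:
        return best[1]
    # Rule 2: fall back to the electronic device's default language
    return device_default
```

For example, a song title in French outranking an artist name in English would select French as the single voice language.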
Abstract: Methods and systems are described in which spoken voice prompts can be produced in a manner such that they will most likely have the desired effect, for example to indicate empathy, or produce a desired follow-up action from a call recipient. The prompts can be produced with specific optimized speech parameters, including duration, gender of speaker, and pitch, so as to encourage participation and promote comprehension among a wide range of patients or listeners. Upon hearing such voice prompts, patients/listeners can know immediately when they are being asked questions that they are expected to answer, and when they are being given information, as well as which information is considered sensitive.
Type:
Grant
Filed:
January 25, 2008
Date of Patent:
February 19, 2013
Assignee:
Eliza Corporation
Inventors:
Lisa Lavoie, Lucas Merrow, Alexandra Drane, Frank Rizzo, Ivy Krull
Abstract: Encoding audio signals includes selecting an encoding mode for encoding the signal, categorizing the signal into active segments having voice activity and non-active segments having substantially no voice activity by using categorization parameters that depend on the selected encoding mode, and encoding at least the active segments using the selected encoding mode.
Type:
Grant
Filed:
September 29, 2011
Date of Patent:
February 12, 2013
Assignee:
Core Wireless Licensing S.A.R.L.
Inventors:
Kari Jarvinen, Pasi Ojala, Ari Lakaniemi
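The mode-dependent categorization in the abstract above can be sketched with per-frame energies. The energy measure and the per-mode thresholds below are illustrative assumptions; the patent only states that the categorization parameters depend on the selected encoding mode:

```python
def classify_segments(frames, mode):
    """Label each frame active (True) or non-active (False) using a
    threshold that depends on the selected encoding mode. Frames are
    lists of samples; the threshold values are illustrative."""
    threshold = {"speech": 0.01, "music": 0.001}[mode]  # mode-dependent parameter
    return [sum(s * s for s in f) / len(f) > threshold for f in frames]
```

Only frames labeled active would then be passed to the encoder for the selected mode.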
Abstract: A voice activity detector indicates the presence of speech within a signal. The detector may determine whether the signal includes speech by calculating a variance of a signal-to-noise ratio across a plurality of portions of a signal, calculating a value based on the variance of the signal-to-noise ratio, performing a comparison between the value and a threshold, and identifying whether the signal contains speech based on the comparison between the value and the threshold.
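The variance-of-SNR detector described above is straightforward to sketch; the statistic used (plain variance of per-portion SNR values) and the threshold value are illustrative assumptions:

```python
def contains_speech(snr_db, threshold=2.0):
    """Decide whether a signal contains speech from the variance of its
    signal-to-noise ratio across portions. `snr_db` holds one SNR
    estimate per portion of the signal."""
    n = len(snr_db)
    mean = sum(snr_db) / n
    variance = sum((s - mean) ** 2 for s in snr_db) / n
    return variance > threshold   # speech modulates SNR more than steady noise
```

The intuition is that speech makes the SNR fluctuate from portion to portion, while stationary noise keeps it nearly constant.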
Abstract: Establishing a multimodal personality for a multimodal application, including evaluating, by the multimodal application, attributes of a user's interaction with the multimodal application; selecting, by the multimodal application, a vocal demeanor in dependence upon the values of the attributes of the user's interaction with the multimodal application; and incorporating, by the multimodal application, the vocal demeanor into the multimodal application.
Abstract: A method and apparatus for concealing frame loss and an apparatus for transmitting and receiving a speech signal that are capable of reducing speech quality degradation caused by packet loss are provided. In the method, when loss of a current received frame occurs, a random excitation signal having the highest correlation with a periodic excitation signal (i.e., a pitch excitation signal) decoded from a previous frame received without loss is used as a noise excitation signal to recover an excitation signal of a current lost frame. Furthermore, a third, new attenuation constant (AS) is obtained by summing a first attenuation constant (NS) obtained based on the number of continuously lost frames and a second attenuation constant (PS) predicted in consideration of change in amplitude of previously received frames to adjust the amplitude of the recovered excitation signal for the current lost frame.
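The combination of attenuation constants described above (AS = NS + PS) can be sketched as follows. The specific formulas for NS and PS are illustrative assumptions, since the abstract only states what each depends on and how they are combined:

```python
def concealment_gain(n_lost, amp_history):
    """Third attenuation constant AS = NS + PS, used to scale the
    recovered excitation signal for the current lost frame."""
    # NS: first attenuation constant, decays with consecutive lost frames
    NS = max(0.0, 1.0 - 0.2 * n_lost)
    # PS: second attenuation constant, predicted from the amplitude
    # trend of previously received frames
    if len(amp_history) >= 2 and amp_history[-2] > 0:
        PS = min(0.5, max(-0.5, amp_history[-1] / amp_history[-2] - 1.0))
    else:
        PS = 0.0
    return max(0.0, min(1.0, NS + PS))
```

A falling amplitude trend thus attenuates the recovered excitation faster than a run of losses alone would.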
Abstract: A speech recognition system and a personal speech profile data (PSPD) storage device that is physically distinct from the speech recognition system are provided. In the speech recognition system, a PSPD interface receives voice training data, which is associated with an individual and the operating conditions of an aircraft, from the PSPD storage device. A speech input module produces a digital speech signal derived from an utterance made by a system user. A speech processing module accesses voice training data stored on the PSPD storage device through the PSPD interface, and executes a speech processing algorithm that analyzes the digital speech signal using the voice training data, in order to identify one or more recognized terms from the digital speech signal. A command processing module initiates execution of various applications based on the recognized terms. Embodiments may be implemented in various types of host systems, including an aircraft cockpit-based system.
Abstract: The present invention aims at extracting keywords from a conversation without preparation, such as anticipating the keywords of the conversation in advance.
Abstract: Embodiments include methods and apparatus for synchronizing data and focus between visual and voice views associated with distributed multi-modal applications. An embodiment includes a client device adapted to render a visual display that includes at least one multi-modal display element for which input data is receivable though a visual modality and a voice modality. When the client detects a user utterance via the voice modality, the client sends uplink audio data representing the utterance to a speech recognizer. An application server receives a speech recognition result generated by the speech recognizer, and sends a voice event response to the client. The voice event response is sent as a response to an asynchronous HTTP voice event request previously sent to the application server by the client. The client may then send another voice event request to the application server in response to receiving the voice event response.
Type:
Grant
Filed:
December 31, 2007
Date of Patent:
February 5, 2013
Assignee:
Motorola Mobility LLC
Inventors:
Michael D. Pearce, Jonathan R. Engelsma, James C. Ferrans
Abstract: A speech recognition system includes a mobile device and a remote server. The mobile device receives the speech from the user and extracts the features and phonemes from the speech. Selected phonemes, supplemental non-verbal information not based upon or derived from the speech, and measures of uncertainty are transmitted to the server; the phonemes are processed for speech understanding remotely or locally, depending on whether the audible speech has high or low recognition uncertainty, respectively; and a text of the speech (or the context or understanding of the speech) is transmitted back to the mobile device.
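The remote/local routing decision in the abstract above reduces to a single comparison; the numeric threshold below is an illustrative assumption:

```python
def route_recognition(uncertainty, threshold=0.5):
    """Route phoneme processing: high recognition uncertainty goes to the
    remote server, low uncertainty stays on the mobile device."""
    return "remote" if uncertainty > threshold else "local"
```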
Abstract: A spoken dialogue apparatus is provided. The apparatus includes a speech recognition unit, a retrieval unit, a calculation unit, and a selection unit. The speech recognition unit performs speech recognition of an input speech to obtain a recognition candidate. The retrieval unit updates a retrieval condition by use of the recognition candidate and outputs a retrieval result. The calculation unit calculates estimated numbers of data as costs concerning a first response and a second response based on the retrieval condition and the retrieval result. The selection unit selects a response having a lowest cost and presents the response to the user.
Abstract: A list of reference terms can be provided. Text and the list of reference terms can be broken down into tokens. At least one candidate can be generated in the text for mapping to at least one of the reference terms. Characters of the candidate can be compared to characters of the reference term according to one or more mapping rules. A confidence value of the mapping can be generated based on the comparison of characters. Candidates can be ranked according to their confidence value.
Type:
Grant
Filed:
January 8, 2009
Date of Patent:
January 29, 2013
Assignee:
International Business Machines Corporation
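The character-comparison and ranking steps in the last abstract can be sketched as follows. The match-fraction confidence below is an illustrative stand-in for the patented mapping rules:

```python
def mapping_confidence(candidate, reference):
    """Confidence that `candidate` maps to `reference`: the fraction of
    position-aligned characters that match (case-insensitive)."""
    pairs = zip(candidate.lower(), reference.lower())
    matches = sum(1 for a, b in pairs if a == b)
    return matches / max(len(candidate), len(reference))

def rank_candidates(candidates, reference):
    """Rank candidate strings by descending mapping confidence."""
    return sorted(candidates, key=lambda c: -mapping_confidence(c, reference))
```

A candidate differing from the reference term in one character thus outranks an unrelated candidate, matching the abstract's confidence-based ranking.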