Abstract: The invention relates to a method of automatic recognition of an at least partly spelled speech utterance, with a speech recognition unit (2) based on statistical models that include a linguistic speech model (6).
Abstract: A method of secure remote control by voice wherein the digitization and speech recognition functions are separated, which involves receiving an audible voice password in a remote controller, digitizing the voice password, and transmitting the digitized voice password and an ID from the controller to a base station. The method also includes confirming the ID and the password in the base station, receiving an audible voice command in the controller, and digitizing the command. The method still further includes transmitting the digitized command from the controller to the base station, confirming the command to indicate transmission of a desired control signal by the base station, and transmitting the control signal from the base station in response to the command.
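The message flow in the remote-control abstract above can be sketched as follows. The controller only digitizes audio; all recognition and confirmation happen at the base station. This is a minimal sketch that assumes a few things the abstract leaves open. The enrollment table, the `recognize` callable, and the command-to-signal table are hypothetical stand-ins, and no real speech recognizer or radio link is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Set


def digitize(analog: Sequence[float], bits: int = 16) -> List[int]:
    """Controller side: quantize analog samples in [-1, 1] to signed integers."""
    peak = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, x)) * peak) for x in analog]


@dataclass
class BaseStation:
    """Base station side: all speech recognition happens here, not in the controller."""
    # Hypothetical enrollment table: controller ID -> expected spoken password.
    enrolled: Dict[str, str]
    # Hypothetical recognizer that turns digitized audio into text.
    recognize: Callable[[Sequence[int]], str]
    # Hypothetical table: recognized command text -> control signal code.
    commands: Dict[str, int]
    authenticated: Set[str] = field(default_factory=set)

    def login(self, controller_id: str, password_audio: Sequence[int]) -> bool:
        """Confirm the ID and the spoken password received from a controller."""
        expected = self.enrolled.get(controller_id)
        if expected is not None and self.recognize(password_audio) == expected:
            self.authenticated.add(controller_id)
            return True
        return False

    def command(self, controller_id: str, command_audio: Sequence[int]) -> Optional[int]:
        """Return the control signal to transmit, or None if the command is not confirmed."""
        if controller_id not in self.authenticated:
            return None
        return self.commands.get(self.recognize(command_audio))
```

Keeping the controller limited to digitization means it needs no vocabulary or recognizer. The base station refuses commands from any ID that has not first passed the password check.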
Abstract: A financial data system is disclosed that receives real-time data, uses a set of predetermined rules to prioritize the data and assign a priority value, and then delivers the highest-priority data over multiple audio channels. A key aspect of the invention is manipulating the data according to the priority value to adjust delivery volume, apply selective vocalization compression, add audio channels, or override an existing comment when required. As a result, a significant amount of information may be delivered aurally to a user, including properties of events as they change in response to changing financial conditions.
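The rule-based prioritization described above can be sketched in a few lines. The sketch assumes that each rule is a predicate paired with a priority value and that a data item takes the highest priority of the rules it matches. It also assumes that priority maps linearly to playback gain and that a higher-priority comment overrides the one currently being spoken. The `Tick` record, the linear gain curve, and the class names are hypothetical illustrations, not the patented design.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tick:
    """Hypothetical real-time data item."""
    symbol: str
    change_pct: float


Rule = Tuple[Callable[[Tick], bool], int]  # (predicate, priority value)


def priority(tick: Tick, rules: List[Rule]) -> int:
    """Highest priority among the rules that match; 0 if none match."""
    return max((p for pred, p in rules if pred(tick)), default=0)


def volume_for(prio: int, max_prio: int) -> float:
    """Map a priority value to a playback gain in [0.2, 1.0] (assumed linear mapping)."""
    if max_prio <= 0:
        return 0.2
    return 0.2 + 0.8 * min(prio, max_prio) / max_prio


class Announcer:
    """Tracks the comment currently being spoken on one audio channel."""

    def __init__(self) -> None:
        self.current: Optional[Tuple[int, str]] = None

    def offer(self, prio: int, text: str) -> bool:
        """Speak `text` if the channel is idle or `prio` beats the current comment."""
        if self.current is None or prio > self.current[0]:
            self.current = (prio, text)
            return True
        return False
```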
Abstract: A method and system for performing a confidence measure in a speech recognition system include receiving an utterance of input speech and creating a near-miss pattern, or near-miss list, of possible word entries for the utterance. Each word entry includes an associated probability value that the utterance corresponds to that word entry. The near-miss list of possible word entries is compared with corresponding stored near-miss confidence templates. Each word in the vocabulary (or keyword list) has a near-miss confidence template, which includes a list of word entries, each with an associated value. The confidence measure for a particular hypothesis word is computed by comparing the values in the near-miss list of possible word entries with the values in that word's near-miss confidence template.
November 13, 1998
Date of Patent: May 27, 2003
Hsiao-Wuen Hon, Asela J. R. Gunawardana
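The template comparison in the confidence-measure abstract above can be sketched as follows. The abstract does not name the comparison metric, so this sketch assumes cosine similarity between normalized near-miss lists, taken over the union of their word entries. The function names and the example vocabulary are hypothetical.

```python
import math
from typing import Dict

NearMiss = Dict[str, float]  # word entry -> associated probability value


def normalize(nm: NearMiss) -> NearMiss:
    """Scale values so they sum to 1 (left unchanged if the total is 0)."""
    total = sum(nm.values())
    return {w: v / total for w, v in nm.items()} if total > 0 else dict(nm)


def similarity(a: NearMiss, b: NearMiss) -> float:
    """Cosine similarity over the union of word entries (assumed metric)."""
    words = set(a) | set(b)
    dot = sum(a.get(w, 0.0) * b.get(w, 0.0) for w in words)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def confidence(hypothesis: str, near_miss: NearMiss, templates: Dict[str, NearMiss]) -> float:
    """Score a hypothesis word by how closely the utterance's near-miss list
    matches that word's stored near-miss confidence template."""
    template = templates.get(hypothesis)
    if template is None:
        return 0.0
    return similarity(normalize(near_miss), normalize(template))
```

With hypothetical templates for "nine" and "five", an utterance whose near-miss list is {"nine": 0.65, "five": 0.25, "mine": 0.10} scores about 0.99 against "nine" and about 0.72 against "five". A recognizer output of "nine" would therefore be accepted with higher confidence.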
Abstract: The model adaptation system of the present invention is a speaker verification system that can adapt the models learned during enrollment to track the aging of a user's voice. The system has the advantage of requiring only a single enrollment for the user. The model adaptation system and methods can be applied to several types of speaker recognition models, including neural tree networks (NTNs), Gaussian mixture models (GMMs), and dynamic time warping (DTW), or to multiple models (i.e., combinations of NTNs, GMMs, and DTW). Moreover, the present invention can be applied to text-dependent or text-independent systems.
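One common way to let a GMM speaker model follow gradual changes in a voice is MAP adaptation of the component means. After each accepted verification, every mean is nudged toward the new data in proportion to how much of that data it explains. The abstract does not say which update rule the invention uses, so this is a swapped-in, standard technique shown only as an illustration. It assumes diagonal covariances, fixed weights and variances, and a hypothetical relevance factor of 16.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def log_gauss(x: Vec, mean: Vec, var: Vec) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def responsibilities(x: Vec, weights: Vec, means: Sequence[Vec],
                     variances: Sequence[Vec]) -> List[float]:
    """Posterior probability of each mixture component for frame x."""
    logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max for numerical stability
    probs = [math.exp(lp - top) for lp in logs]
    total = sum(probs)
    return [p / total for p in probs]


def map_adapt_means(frames: Sequence[Vec], weights: Vec, means: Sequence[Vec],
                    variances: Sequence[Vec], relevance: float = 16.0) -> List[List[float]]:
    """Shift each component mean toward the new frames in proportion to the
    soft count of frames it explains; weights and variances stay fixed."""
    k, d = len(means), len(means[0])
    counts = [0.0] * k
    sums = [[0.0] * d for _ in range(k)]
    for x in frames:
        for i, r in enumerate(responsibilities(x, weights, means, variances)):
            counts[i] += r
            for j in range(d):
                sums[i][j] += r * x[j]
    adapted = []
    for i in range(k):
        if counts[i] == 0.0:
            adapted.append(list(means[i]))  # no evidence: keep the enrolled mean
            continue
        alpha = counts[i] / (counts[i] + relevance)
        adapted.append([alpha * sums[i][j] / counts[i] + (1 - alpha) * means[i][j]
                        for j in range(d)])
    return adapted
```

The relevance factor controls how quickly the model drifts. Components that see little new data stay close to their enrolled values, which suits tracking a slowly aging voice without re-enrollment.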
Abstract: Speech recognition software is provided in combination with application-specific software on a communications network. Analog voice data is digitized at a user's location, identified as voice data, and transmitted to the application software residing at a central location. The network server receiving data identified as voice data transmits it to a speech server. Speech recognition software resident at the speech server contains a dictionary and modules tailored to the voice of each user of the speech recognition software. As the user speaks, a transcription of the dictation is transmitted back to the user's location and is displayed on the user's computer screen for examination and, if necessary, voice or typed correction of its contents. Multiple users have interleaved access to the speech recognition software, so that transmission back to each user is contemporaneous.
June 29, 1998
Date of Patent: August 13, 2002
International Business Machines Corporation
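The abstract on networked dictation does not say how multiple users share the speech server. Round-robin service over per-user queues is one simple way to obtain the interleaved, contemporaneous behavior it describes, and this sketch assumes that approach. Voice-data chunks are represented as opaque strings, and the function name is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def interleave(queues: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Serve one pending voice-data chunk per user per round (round-robin),
    returning the (user, chunk) order in which the speech server handles them."""
    pending: Deque[Tuple[str, Deque[str]]] = deque(
        (user, deque(chunks)) for user, chunks in queues.items() if chunks
    )
    order: List[Tuple[str, str]] = []
    while pending:
        user, chunks = pending.popleft()
        order.append((user, chunks.popleft()))
        if chunks:  # user still has audio waiting: requeue at the back
            pending.append((user, chunks))
    return order
```

Because no user can hold the server for more than one chunk at a time, each user's transcription advances at roughly the same pace.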
Abstract: An audio compressor utilizes a switched charging state rectifier which produces an output proportional to the magnitude of an input signal but with controlled attack/release times. The rectified voltage is input to a logical selector, which provides logical control signals which are a function of the rectified voltage. In a preferred embodiment, the control signals of the logical selector are used to select the switch positions of a switched resistor ladder. The switched resistor ladder is used to provide a resistance path in an op-amp feedback amplifier, thereby enabling the gain of the op-amp amplifier to be adjusted in steps by the selector as the rectified signal level varies.
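The compressor abstract above describes a signal chain in hardware: a rectifier with separate attack and release behavior, a logical selector, and a switched resistor ladder that sets the amplifier gain in steps. This is a discrete-time software sketch of that chain, not the circuit itself. The per-sample smoothing coefficients, the threshold list, and the per-tap gains are hypothetical parameters chosen for illustration.

```python
from typing import List, Sequence


def rectify(samples: Sequence[float], attack: float, release: float) -> List[float]:
    """Envelope follower: the "charging" state (input above the envelope) uses
    `attack`, the "discharging" state uses `release`; both are in (0, 1]."""
    env = 0.0
    out = []
    for x in samples:
        mag = abs(x)
        coeff = attack if mag > env else release
        env += coeff * (mag - env)
        out.append(env)
    return out


def select_tap(level: float, thresholds: Sequence[float]) -> int:
    """Logical selector: ladder tap index = number of thresholds the level reaches."""
    return sum(1 for t in thresholds if level >= t)


def compress(samples: Sequence[float], thresholds: Sequence[float],
             tap_gains: Sequence[float], attack: float = 0.5,
             release: float = 0.01) -> List[float]:
    """Apply a stepped gain chosen from the rectified level of each sample."""
    if len(tap_gains) != len(thresholds) + 1:
        raise ValueError("need one gain per ladder tap (len(thresholds) + 1)")
    env = rectify(samples, attack, release)
    return [x * tap_gains[select_tap(e, thresholds)] for x, e in zip(samples, env)]
```

A fast attack and slow release let the gain drop quickly on loud onsets and recover gradually afterward. The quantized tap index reproduces the step-wise gain changes of the resistor ladder.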
Abstract: A method for N-best search in continuous speech recognition with limited storage space includes the steps of applying Viterbi pruning to word-level states (same word, different time alignment, and thus no output differentiation) and keeping the N best sub-optimal paths for sentence-level states (output differentiation).
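The storage argument in the N-best abstract above rests on the following observation. Hypotheses that differ only in time alignment produce the same output, so keeping only the best one costs nothing. Only distinct word sequences need N-best bookkeeping. This sketch collapses that idea into a single merge step over the hypotheses that reach one state, instead of a full time-synchronous search. Hypotheses are assumed to be (word sequence, log-score) pairs, and the function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hyp = Tuple[Tuple[str, ...], float]  # (word sequence, log-score)


def viterbi_merge(hyps: List[Hyp]) -> Dict[Tuple[str, ...], float]:
    """Word-level pruning: hypotheses with the same word sequence differ only
    in time alignment, so keep just the best-scoring one."""
    best: Dict[Tuple[str, ...], float] = {}
    for words, score in hyps:
        if score > best.get(words, float("-inf")):
            best[words] = score
    return best


def n_best(hyps: List[Hyp], n: int) -> List[Hyp]:
    """Sentence-level: keep the N best distinct word sequences, best first."""
    merged = viterbi_merge(hyps)
    return heapq.nlargest(n, merged.items(), key=lambda kv: kv[1])
```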
Abstract: In a speech synthesizer apparatus, a weighting coefficient training controller calculates, based on first acoustic feature parameters and prosodic feature parameters, acoustic distances in second acoustic feature parameters between one target phoneme, selected from among instances of the same phoneme, and the phoneme candidates other than the target phoneme. By executing a predetermined statistical analysis, the controller then determines a weighting coefficient vector for each target phoneme, defining the degree to which each second acoustic feature parameter contributes for the respective phoneme candidates.
Abstract: A noise suppression system implemented in a communication system provides improved quality under severe signal-to-noise ratio (SNR) conditions. The noise suppression system, inter alia, incorporates a frequency-domain comb-filtering (289) technique that supplements a traditional spectral noise suppression method. The invention generates a real cepstrum (285) of an input signal (285) G(k) to obtain a likely voiced-speech pitch lag component, converts the result to the frequency domain to obtain a comb-filter function (290) C(k), applies the input signal (291) G(k) to the comb-filter function (290) C(k), and equalizes the energies of the corresponding pre- and post-filtered subbands to produce a signal (293) G″(k) that is used for noise suppression. This prevents high-frequency components from being unnecessarily attenuated, thereby reducing the muffling effects of prior-art comb filters.
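The cepstral comb-filter steps in the noise-suppression abstract above can be sketched with the standard library alone. The sketch assumes it starts from a time-domain frame and uses a naive DFT, which is adequate only for short frames. It picks the pitch lag as the largest real-cepstrum value in a search range. The comb is a raised cosine with peaks at multiples of N/lag bins, which is an assumed shape, since the abstract obtains C(k) by transforming the cepstral result. Finally, the sketch rescales each subband so that its post-filter energy matches its pre-filter energy. All function names, the lag range, and the band width are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame: Sequence[float]) -> List[float]:
    """Inverse DFT of log|X(k)|; the small floor avoids log(0)."""
    n = len(frame)
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]
    # log|X| of a real frame is real and even, so its DFT is real and equals
    # n times the inverse DFT.
    return [v.real / n for v in dft(log_mag)]


def pitch_lag(cep: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Quefrency (in samples) of the cepstral peak within the search range."""
    return max(range(min_lag, max_lag + 1), key=lambda q: cep[q])


def comb_filter(n: int, lag: int) -> List[float]:
    """Raised-cosine comb C(k) with peaks at multiples of n/lag bins (assumed shape)."""
    return [0.5 + 0.5 * math.cos(2 * math.pi * k * lag / n) for k in range(n)]


def comb_and_equalize(mag: Sequence[float], comb: Sequence[float], band: int) -> List[float]:
    """Apply C(k) to |G(k)|, then rescale each subband to its pre-filter energy."""
    out = [g * c for g, c in zip(mag, comb)]
    for start in range(0, len(mag), band):
        pre = sum(g * g for g in mag[start:start + band])
        post = sum(g * g for g in out[start:start + band])
        if post > 0:
            scale = math.sqrt(pre / post)
            for k in range(start, min(start + band, len(out))):
                out[k] *= scale
    return out
```

The per-subband rescaling is what keeps the comb from lowering overall energy in the upper bands. Energy is redistributed toward the pitch harmonics within each band instead of being removed.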