Neural Network Patents (Class 704/232)
  • Patent number: 8315864
    Abstract: Provided herein are systems and methods for using context-sensitive speech recognition logic in a computer to create a software program, including context-aware voice entry of instructions that make up a software program, automatic context-sensitive instruction formatting, and automatic context-sensitive insertion-point positioning.
    Type: Grant
    Filed: April 24, 2012
    Date of Patent: November 20, 2012
    Inventor: Lunis Orcutt
  • Patent number: 8249870
    Abstract: A semi-automatic speech transcription system of the invention leverages the complementary capabilities of human and machine, building a system which combines automatic and manual approaches. With the invention, collected audio data is automatically distilled into speech segments, using signal processing and pattern recognition algorithms. The detected speech segments are presented to a human transcriber using a transcription tool with a streamlined transcription interface, requiring the transcriber to simply “listen and type”. This eliminates the need to manually navigate the audio, coupling the human effort to the amount of speech, rather than the amount of audio. Errors produced by the automatic system can be quickly identified by the human transcriber, which are used to improve the automatic system performance. The automatic system is tuned to maximize the human transcriber efficiency.
    Type: Grant
    Filed: November 12, 2008
    Date of Patent: August 21, 2012
    Assignee: Massachusetts Institute of Technology
    Inventors: Brandon Cain Roy, Deb Kumar Roy
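The "automatically distilled into speech segments" step in the abstract above can be sketched with a toy energy-based detector. The threshold and minimum length are illustrative; the abstract does not specify the actual signal-processing and pattern-recognition algorithms used.

```python
def detect_speech_segments(energies, threshold=0.5, min_len=2):
    """Distil per-frame energies into (start, end) speech segments:
    frames at or above the threshold are merged into runs, and runs
    shorter than min_len frames are discarded as noise."""
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i                      # a speech run begins
        elif e < threshold and start is not None:
            if i - start >= min_len:       # keep only long-enough runs
                segments.append((start, i))
            start = None
    if start is not None and len(energies) - start >= min_len:
        segments.append((start, len(energies)))
    return segments

# Six frames of audio: a two-frame speech run survives, a one-frame
# blip is discarded.
segments = detect_speech_segments([0.1, 0.9, 0.8, 0.1, 0.9, 0.1])
```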
  • Patent number: 8239196
    Abstract: An architecture and framework for speech/noise classification of an audio signal using multiple features with multiple input channels (e.g., microphones) are provided. The architecture may be implemented with noise suppression in a multi-channel environment where noise suppression is based on an estimation of the noise spectrum. The noise spectrum is estimated using a model that classifies each time/frame and frequency component of a signal as speech or noise by applying a speech/noise probability function. The speech/noise probability function estimates a speech/noise probability for each frequency and time bin. A speech/noise classification estimate is obtained by fusing (e.g., combining) data across different input channels using a layered network model.
    Type: Grant
    Filed: July 28, 2011
    Date of Patent: August 7, 2012
    Assignee: Google Inc.
    Inventor: Marco Paniconi
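A minimal sketch of the per-bin speech/noise probability and cross-channel fusion described in this abstract. The sigmoid mapping and log-odds averaging are illustrative stand-ins for the patent's probability function and layered network model.

```python
import math

def speech_probability(feature, threshold=1.0, slope=4.0):
    """Map a per-bin feature (e.g. a local SNR estimate) to a speech
    probability with a sigmoid; threshold and slope are illustrative."""
    return 1.0 / (1.0 + math.exp(-slope * (feature - threshold)))

def fuse_channels(per_channel_probs):
    """Fuse the per-channel speech probabilities for one time/frequency
    bin by averaging log-odds, a simple stand-in for the layered
    network model in the abstract."""
    logits = [math.log(p / (1.0 - p)) for p in per_channel_probs]
    mean = sum(logits) / len(logits)
    return 1.0 / (1.0 + math.exp(-mean))

# Two microphones observe the same bin; both lean toward "speech".
p = fuse_channels([speech_probability(1.5), speech_probability(2.0)])
```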
  • Patent number: 8239194
    Abstract: An architecture and framework for speech/noise classification of an audio signal using multiple features with multiple input channels (e.g., microphones) are provided. The architecture may be implemented with noise suppression in a multi-channel environment where noise suppression is based on an estimation of the noise spectrum. The noise spectrum is estimated using a model that classifies each time/frame and frequency component of a signal as speech or noise by applying a speech/noise probability function. The speech/noise probability function estimates a speech/noise probability for each frequency and time bin. A speech/noise classification estimate is obtained by fusing (e.g., combining) data across different input channels using a layered network model.
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: August 7, 2012
    Assignee: Google Inc.
    Inventor: Marco Paniconi
  • Patent number: 8209170
    Abstract: Provided herein are systems and methods for using context-sensitive speech recognition logic in a computer to create a software program, including context-aware voice entry of instructions that make up a software program, automatic context-sensitive instruction formatting, and automatic context-sensitive insertion-point positioning.
    Type: Grant
    Filed: June 2, 2011
    Date of Patent: June 26, 2012
    Assignee: Lunis Orcutt
    Inventor: Lunis Orcutt
  • Patent number: 8200486
    Abstract: Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns (“SASPs”) for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms (“SPTs”) are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.
    Type: Grant
    Filed: June 5, 2003
    Date of Patent: June 12, 2012
    Assignee: The United States of America as represented by the Administrator of the National Aeronautics & Space Administration (NASA)
    Inventors: Charles C. Jorgensen, Diana D. Lee, Shane T. Agabon
  • Patent number: 8180633
    Abstract: A system and method for semantic extraction using a neural network architecture includes indexing each word in an input sentence into a dictionary and using these indices to map each word to a d-dimensional vector (the features of which are learned). Together with this, position information for a word of interest (the word to be labeled) and a verb of interest (the verb that the semantic role is being predicted for) with respect to a given word are also used. These positions are integrated by employing a linear layer that is adapted to the input sentence. Several linear transformations and squashing functions are then applied to output class probabilities for semantic role labels. All the weights for the whole architecture are trained by backpropagation.
    Type: Grant
    Filed: February 29, 2008
    Date of Patent: May 15, 2012
    Assignee: NEC Laboratories America, Inc.
    Inventors: Ronan Collobert, Jason Weston
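The architecture in this abstract (embedding lookup, position features for the word and verb of interest, a linear layer with a squashing function, then class probabilities over role labels) can be sketched as follows. The dimensions are arbitrary and the random weights are placeholders for values learned by backpropagation.

```python
import math, random

random.seed(0)
VOCAB, DIM, HID, CLASSES = 10, 4, 5, 3

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]

# In the patent all of these are trained by backpropagation; random here.
embed = rand_matrix(VOCAB, DIM)       # d-dimensional word vectors
W1 = rand_matrix(HID, DIM + 2)        # +2 for the two position features
W2 = rand_matrix(CLASSES, HID)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def role_probs(word_idx, dist_to_word, dist_to_verb):
    """Class probabilities for semantic role labels: embedding lookup
    plus position features, a linear layer with tanh squashing, then a
    linear output layer and softmax."""
    x = embed[word_idx] + [dist_to_word, dist_to_verb]
    h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W1]
    z = [sum(w * v for w, v in zip(row, h)) for row in W2]
    return softmax(z)

probs = role_probs(3, -1.0, 2.0)
```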
  • Patent number: 8150687
    Abstract: An example embodiment of the invention includes a speech recognition processing unit for specifying speech segments for speech data, recognizing a speech in each of the speech segments, and associating a character string of obtained recognition data with the speech data for each speech segment, based on information on a time of the speech, and an output control unit for displaying/outputting the text prepared by sorting the recognition data in each speech segment. Sometimes, the system further includes a text editing unit for editing the prepared text, and a speech correspondence estimation unit for associating a character string in the edited text with the speech data by using a technique of dynamic programming.
    Type: Grant
    Filed: November 30, 2004
    Date of Patent: April 3, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Shinsuke Mori, Nobuyasu Itoh, Masafumi Nishimura
  • Patent number: 8145492
    Abstract: A behavior control system of a robot for learning a phoneme sequence includes a sound inputting device inputting a phoneme sequence, a sound signal learning unit operable to convert the phoneme sequence into a sound synthesis parameter and to learn or evaluate a relationship between a sound synthesis parameter of a phoneme sequence that is generated by the robot and a sound synthesis parameter used for sound imitation, and a sound synthesizer operable to generate a phoneme sequence based on the sound synthesis parameter obtained by the sound signal learning unit.
    Type: Grant
    Filed: April 6, 2005
    Date of Patent: March 27, 2012
    Assignee: Sony Corporation
    Inventor: Masahiro Fujita
  • Patent number: 8126710
    Abstract: A method of adapting a neural network of an automatic speech recognition device, includes the steps of: providing a neural network including an input stage, an intermediate stage and an output stage, the output stage outputting phoneme probabilities; providing a linear stage in the neural network; and training the linear stage by means of an adaptation set; wherein the step of providing the linear stage includes the step of providing the linear stage after the intermediate stage.
    Type: Grant
    Filed: June 1, 2005
    Date of Patent: February 28, 2012
    Assignee: Loquendo S.p.A.
    Inventors: Roberto Gemello, Franco Mana
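A sketch of the adaptation scheme above: a linear stage inserted after the intermediate (hidden) stage. Initializing it to the identity leaves the original network's phoneme probabilities unchanged until the linear stage is trained on the adaptation set. The weights here are illustrative.

```python
import math

def forward(x, W_hid, A, W_out):
    """Hidden stage (tanh), then the inserted linear adaptation stage A,
    then the output stage producing phoneme probabilities (softmax)."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W_hid]
    a = [sum(w * v for w, v in zip(row, h)) for row in A]  # linear stage
    z = [sum(w * v for w, v in zip(row, a)) for row in W_out]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

# Identity initialization: before training on the adaptation set, the
# linear stage leaves the original network's outputs unchanged.
W_hid = [[0.5, -0.2], [0.1, 0.3]]
A_id = [[1.0, 0.0], [0.0, 1.0]]
W_out = [[0.7, 0.2], [-0.3, 0.9]]
p = forward([1.0, -1.0], W_hid, A_id, W_out)
```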
  • Publication number: 20110307252
    Abstract: Described is the use of utterance classification based methods and other machine learning techniques to provide a telephony application or other voice menu application (e.g., an automotive application) that need not use Context-Free-Grammars to determine a user's spoken intent. A classifier receives text from an information retrieval-based speech recognizer and outputs a semantic label corresponding to the likely intent of a user's speech. The semantic label is then output, such as for use by a voice menu program in branching between menus. Also described is training, including training the language model from acoustic data without transcriptions, and training the classifier from speech-recognized acoustic data having associated semantic labels.
    Type: Application
    Filed: June 15, 2010
    Publication date: December 15, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Yun-Cheng Ju, James Garnet Droppo, III
  • Patent number: 8041571
    Abstract: A method and apparatus detect and localize electric faults in electrical power grids and circuits. High impedance faults are detected by analyzing data from remote sensor units deployed over the network using the algorithms of speech and speaker analysis software. This is accomplished by converting the voltage and/or current waveform readouts from the sensors into a digital form which is then transmitted to a computer located either near the sensors or at an operations center. The digitized data is converted by a dedicated software or software/hardware interface to a format accepted by a reliable and stable software solution, such as speech or speaker recognition software. The speech or speaker recognition software must be “trained” to recognize various signal patterns that do or do not indicate the occurrence of a fault. The readout of the speech or speaker recognition software, if indicating a fault, is transmitted to a central processor and displayed to provide information on the most likely type of fault.
    Type: Grant
    Filed: January 5, 2007
    Date of Patent: October 18, 2011
    Assignee: International Business Machines Corporation
    Inventors: Sarah C. McAllister, Tomasz J. Nowicki, Jason W. Pelecanos, Grzegorz M. Swirszcz
  • Patent number: 8032363
    Abstract: A method of processing a decoded speech (DS) signal including successive DS frames, each DS frame including DS samples. The method comprises: adaptively filtering the DS signal to produce a filtered signal; gain-scaling the filtered signal with an adaptive gain updated once a DS frame, thereby producing a gain-scaled signal; and performing a smoothing operation to smooth possible waveform discontinuities in the gain-scaled signal.
    Type: Grant
    Filed: August 9, 2002
    Date of Patent: October 4, 2011
    Assignee: Broadcom Corporation
    Inventors: Juin-Hwey Chen, Jes Thyssen, Chris C Lee
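The per-frame gain update plus smoothing can be sketched as below. The linear ramp over the first few samples of each frame is one simple way to realize "a smoothing operation to smooth possible waveform discontinuities"; the abstract does not commit to this particular ramp.

```python
def gain_scale(frames, gains, ramp=4):
    """Scale each frame by its adaptive gain (one gain per frame).  To
    smooth the waveform discontinuity at frame boundaries, the first
    `ramp` samples blend linearly from the previous frame's gain."""
    out, prev = [], gains[0]
    for frame, g in zip(frames, gains):
        scaled = []
        for i, s in enumerate(frame):
            if i < ramp:
                w = (i + 1) / ramp            # 1/ramp .. 1.0
                scaled.append(s * ((1 - w) * prev + w * g))
            else:
                scaled.append(s * g)
        out.append(scaled)
        prev = g
    return out

# Gain doubles on the second frame; the ramp spreads the jump over the
# first four samples instead of a hard step.
out = gain_scale([[1.0] * 16, [1.0] * 16], [1.0, 2.0])
```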
  • Patent number: 8032372
    Abstract: A computer program product for computing a correction rate predictor for medical record dictations, the computer program product residing on a computer-readable medium includes computer-readable instructions for causing a computer to obtain a draft medical transcription of at least a portion of a dictation, the dictation being from medical personnel and concerning a patient, determine features of the dictation to produce a feature set comprising a combination of features of the dictation, the features being relevant to a quantity of transcription errors in the transcription, analyze the feature set to compute a predicted correction rate associated with the dictation and use the predicted correction rate to determine whether to provide at least a portion of the transcription to a transcriptionist.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: October 4, 2011
    Assignee: eScription, Inc.
    Inventors: Roger Scott Zimmerman, George Zavaliagkos
  • Patent number: 7966177
    Abstract: The invention relates to a method for recognizing a phonetic sound sequence or a character sequence, e.g.
    Type: Grant
    Filed: August 13, 2001
    Date of Patent: June 21, 2011
    Inventor: Hans Geiger
  • Publication number: 20110144986
    Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios. Different calibration models may be used with different usage scenarios, e.g., during different conditions. The calibration model may comprise a maximum entropy classifier with distribution constraints, trained with continuous raw confidence scores and multi-valued word tokens, and/or other distributions and extracted features.
    Type: Application
    Filed: December 10, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Jinyu Li
  • Patent number: 7949679
    Abstract: A method of operating a storage of a finite state machine includes organizing information concerning an operation of the machine in a payload-transition matrix, in which a given number of columns of the matrix reflect features of a state of the machine and other columns describe valid transitions between the states of the machine depending on input characters, and compressing the payload-transition matrix in a row-displaced format.
    Type: Grant
    Filed: March 5, 2008
    Date of Patent: May 24, 2011
    Assignee: International Business Machines Corporation
    Inventor: Branimir Z. Lambov
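Row displacement is a classic way to compress sparse transition tables; the sketch below shows the core idea, though not the patent's exact layout, which also interleaves the per-state payload columns.

```python
def compress_row_displaced(matrix, empty=0):
    """Pack a sparse transition matrix into a single array: each row is
    shifted by a displacement chosen so its non-empty cells land in
    free slots; an owner array records which row filled each slot."""
    ncols = len(matrix[0])
    data, owner, disp = [], [], []
    for r, row in enumerate(matrix):
        cols = [c for c, v in enumerate(row) if v != empty]
        d = 0
        while True:
            while len(data) < d + ncols:      # grow the packed arrays
                data.append(empty)
                owner.append(None)
            if all(owner[d + c] is None for c in cols):
                break                         # no collision at this shift
            d += 1
        for c in cols:
            data[d + c] = row[c]
            owner[d + c] = r
        disp.append(d)
    return data, owner, disp

def lookup(data, owner, disp, r, c, empty=0):
    """Read entry (r, c); the owner check distinguishes this row's
    cells from cells packed there by other rows."""
    i = disp[r] + c
    return data[i] if owner[i] == r else empty

# A 3x3 matrix with one entry per row packs into just 3 slots.
M = [[0, 5, 0], [7, 0, 0], [0, 0, 9]]
data, owner, disp = compress_row_displaced(M)
```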
  • Publication number: 20110119057
    Abstract: Disclosed are systems, methods, and computer-program products for segmenting content of an input signal and applications thereof. In an embodiment, the system includes simulated neurons, a phase modulator, and an entity-identifier module. Each simulated neuron is connected to one or more other simulated neurons and is associated with an activity and a phase. The activity and the phase of each simulated neuron is set based on the activity and the phase of the one or more other simulated neurons connected to each simulated neuron. The phase modulator includes individual modulators, each configured to modulate the activity and the phase of each of the plurality of simulated neurons based on a modulation function. The entity-identifier module is configured to identify one or more distinct entities (e.g., objects, sound sources, etc.) included in the input signal based on the one or more distinct collections of simulated neurons that have substantially distinct phases.
    Type: Application
    Filed: November 18, 2009
    Publication date: May 19, 2011
    Applicant: The Intellisis Corporation
    Inventors: Douglas A. Moore, Kristi H. Tsukida, Paulo B. Ang
  • Patent number: 7890329
    Abstract: Disclosed are an apparatus and method to reduce recognition errors through context relations among multiple dialogue turns. The apparatus includes a rule set storage unit having a rule set containing one or more rules, an evolutionary rule generation module connected to the rule storage unit, and a rule trigger unit connected to the rule storage unit. The rule set uses dialogue turn as a unit for the information described by each rule. The method analyzes a dialogue history through an evolutionary massive parallelism approach to get a rule set describing the context relation among dialogue turns. Based on the rule set and recognition result of an ASR system, it reevaluates the recognition result and computes a confidence measure for the reevaluated recognition result. After each successful dialogue turn, the rule set is dynamically adapted.
    Type: Grant
    Filed: August 1, 2007
    Date of Patent: February 15, 2011
    Assignee: Industrial Technology Research Institute
    Inventors: Hsu-Chih Wu, Ching-Hsien Lee
  • Patent number: 7827031
    Abstract: A neural network in a speech-recognition system has computing units organized in levels including at least one hidden level and one output level. The computing units of the hidden level are connected to the computing units of the output level via weighted connections, and the computing units of the output level correspond to acoustic-phonetic units of the general vocabulary. This network executes the following steps: determining a subset of acoustic-phonetic units necessary for recognizing all the words contained in the general vocabulary subset; eliminating from the neural network all the weighted connections afferent to computing units of the output level that correspond to acoustic-phonetic units not contained in the previously determined subset of acoustic-phonetic units, thus obtaining a compacted neural network optimized for recognition of the words contained in the general vocabulary subset; and executing, at each moment in time, only the compacted neural network.
    Type: Grant
    Filed: February 12, 2003
    Date of Patent: November 2, 2010
    Assignee: Loquendo S.p.A.
    Inventors: Dario Albesano, Roberto Gemello
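The compaction step above, dropping output units (and the weighted connections afferent to them) for acoustic-phonetic units not needed by the active vocabulary subset, can be sketched as:

```python
import math

def compact_network(W_out, unit_names, needed_units):
    """Keep only output units whose acoustic-phonetic unit appears in
    the subset needed for the active vocabulary; their afferent weight
    rows survive, all others are dropped."""
    keep = [i for i, u in enumerate(unit_names) if u in needed_units]
    return [W_out[i] for i in keep], [unit_names[i] for i in keep]

def output_probs(h, W_out):
    """Softmax over the (possibly compacted) output stage."""
    z = [sum(w * v for w, v in zip(row, h)) for row in W_out]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

# Full network covers four phone units; the active subset needs two.
unit_names = ["a", "b", "k", "s"]
W_full = [[0.2, 0.1], [0.4, -0.3], [-0.1, 0.5], [0.3, 0.3]]
W_small, names = compact_network(W_full, unit_names, {"a", "k"})
```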
  • Patent number: 7818271
    Abstract: A method and apparatus are disclosed for selecting interaction policies. Values may be provided for a group of parameters for user models. Interaction policies within a specific tolerance of an optimal interaction policy for the user models may be learned. Up to a predetermined number of the learned interaction policies, within a specific tolerance of an optimal policy for the user models, may be selected for a wireless communication device. The wireless communication device, including the selected interaction policies, may determine whether any of a group of parameters, representing a user preference or contextual information with respect to use of the wireless communication device, is updated. When any of the group of parameters has been updated, the wireless communication device may select one of the selected interaction policies, such that the selected one of the selected interaction policies may determine a better interaction behavior for the wireless communication device.
    Type: Grant
    Filed: June 13, 2007
    Date of Patent: October 19, 2010
    Assignee: Motorola Mobility, Inc.
    Inventor: Michael E. Groble
  • Publication number: 20100217589
    Abstract: The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is then trained using the common phoneme label.
    Type: Application
    Filed: February 17, 2010
    Publication date: August 26, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Rainer Gruhn, Daniel Vasquez, Guillermo Aradilla
  • Patent number: 7769580
    Abstract: A method of optimizing the execution of a neural network in a speech recognition system provides for conditionally skipping a variable number of frames, depending on a distance computed between output probabilities, or likelihoods, of a neural network. The distance is initially evaluated between two frames at times t and t+k, where k is a predetermined maximum distance between frames, and if such distance is sufficiently small, the frames between times t and t+k are calculated by interpolation, avoiding further executions of the neural network. If, on the contrary, such distance is not small enough, it means that the outputs of the network are changing quickly, and it is not possible to skip too many frames. In that case, the method attempts to skip remaining frames, calculating and evaluating a new distance.
    Type: Grant
    Filed: December 23, 2002
    Date of Patent: August 3, 2010
    Assignee: Loquendo S.p.A.
    Inventors: Roberto Gemello, Dario Albesano
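A simplified sketch of the frame-skipping idea: when the distance is too large, the patent retries with smaller skips, whereas this sketch simply falls back to evaluating every intermediate frame. The L1 distance and eps are illustrative.

```python
def process_frames(frames, net, k=3, eps=0.05):
    """Run `net` at times t and t+k; if the output distributions are
    within eps (L1 distance), interpolate the frames in between
    instead of running the network."""
    outputs = [None] * len(frames)
    outputs[0] = net(frames[0])
    t = 0
    while t + 1 < len(frames):
        step = min(k, len(frames) - 1 - t)
        nxt = net(frames[t + step])
        dist = sum(abs(a - b) for a, b in zip(outputs[t], nxt))
        for j in range(1, step):
            if dist < eps:   # outputs change slowly: interpolate
                w = j / step
                outputs[t + j] = [(1 - w) * a + w * b
                                  for a, b in zip(outputs[t], nxt)]
            else:            # outputs change quickly: compute
                outputs[t + j] = net(frames[t + j])
        outputs[t + step] = nxt
        t += step
    return outputs

calls = []
def toy_net(frame):
    calls.append(frame)
    return [frame, 1.0 - frame]

# Seven identical frames: the network runs only 3 times instead of 7.
outputs = process_frames([0.5] * 7, toy_net)
```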
  • Patent number: 7743011
    Abstract: In a weighted finite state network process, a finite state network object is stored. The finite state network object includes arcs, and each arc has an associated weight stored as a weight-defining finite state network object. The finite state network object is applied to an input. The applying includes combining weights of one or more arcs matching the input using finite state network-combining operations.
    Type: Grant
    Filed: December 21, 2006
    Date of Patent: June 22, 2010
    Assignee: Xerox Corporation
    Inventor: Kenneth R. Beesley
  • Patent number: 7720012
    Abstract: A system, method, and apparatus for identifying a speaker of an utterance, particularly when the utterance has portions of it missing due to packet losses. Different packet loss models are applied to each speaker's training data in order to improve accuracy, especially for small packet sizes.
    Type: Grant
    Filed: July 11, 2005
    Date of Patent: May 18, 2010
    Assignee: Arrowhead Center, Inc.
    Inventors: Deva K. Borah, Phillip De Leon
  • Patent number: 7680332
    Abstract: Techniques for efficiently and accurately organizing freeform handwriting into lines. A global cost function is employed to find the simplest partitioning of electronic ink strokes into line groups that also maximize the “goodness” of the resulting lines and the consistency of their configuration. The “goodness” of a line may be based upon its linear regression error and the horizontal and vertical compactness of the strokes making up the line. The line consistency configuration for a grouping of strokes is measured by the angle difference between neighboring groups. The global cost function also takes into account the complexity of the stroke partitioning, measured by the number of lines into which the strokes are grouped. An initial grouping of strokes is made, and the cost for this initial grouping is determined. Alternate groupings of the initial stroke grouping are then generated.
    Type: Grant
    Filed: May 30, 2005
    Date of Patent: March 16, 2010
    Assignee: Microsoft Corporation
    Inventors: Ming Ye, Herry Sutanto, Sashi Raghupathy, Chengyang Li, Michael Shilman
  • Publication number: 20100057452
    Abstract: The described implementations relate to speech interfaces and in some instances to speech pattern recognition techniques that enable speech interfaces. One system includes a feature pipeline configured to produce speech feature vectors from input speech. This system also includes a classifier pipeline configured to classify individual speech feature vectors utilizing multi-level classification.
    Type: Application
    Filed: August 28, 2008
    Publication date: March 4, 2010
    Applicant: Microsoft Corporation
    Inventors: Kunal Mukerjee, Brendan Meeder
  • Publication number: 20100057453
    Abstract: Discrimination between at least two classes of events in an input signal is carried out in the following way. A set of frames containing an input signal is received, and at least two different feature vectors are determined for each of said frames. Said at least two different feature vectors are classified using respective sets of preclassifiers trained for said at least two classes of events. Values for at least one weighting factor are determined based on outputs of said preclassifiers for each of said frames. A combined feature vector is calculated for each of said frames by applying said at least one weighting factor to said at least two different feature vectors. Said combined feature vector is classified using a set of classifiers trained for said at least two classes of events.
    Type: Application
    Filed: November 16, 2006
    Publication date: March 4, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Zica Valsan
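The weighting-and-combination step above can be sketched as a convex blend of the two feature vectors, with the weight derived from the preclassifier outputs. The particular weighting function below is an assumption; the abstract does not specify it at this level of detail.

```python
def weight_from_preclassifiers(score_a, score_b):
    """One simple weighting: trust the feature stream whose
    preclassifier output is larger (scores assumed non-negative and
    not both zero)."""
    return score_a / (score_a + score_b)

def combine(features_a, features_b, w):
    """Combined feature vector for a frame: a convex blend of the two
    different feature vectors using the weighting factor."""
    return [w * a + (1 - w) * b for a, b in zip(features_a, features_b)]

w = weight_from_preclassifiers(0.8, 0.2)
combined = combine([1.0, 1.0], [0.0, 0.0], w)
```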
  • Publication number: 20090292538
    Abstract: Systems and methods of improving speech recognition accuracy using statistical analysis of word or phrase-based search terms are disclosed. An illustrative system for statistically analyzing search terms includes an interface adapted to receive a text-based search term, a textual-linguistic analysis module that detects textual features within the search term and generates a first score, a phonetic conversion module that converts the search term into a phoneme string, a phonetic-linguistic analysis module that detects phonemic features within the phoneme string and generates a second score, and a score normalization module that normalizes the first and second scores and outputs a search term score to a user or process.
    Type: Application
    Filed: May 20, 2008
    Publication date: November 26, 2009
    Applicant: Calabrio, Inc.
    Inventor: David M. Barnish
  • Patent number: 7624082
    Abstract: Material (G) is converted by a combustion process in a plant (1) while air (L) is supplied. The state of the system in the plant (1) is described by state variables (x, y) and is regulated at least by one control loop (3, 5, 7, 9). Groups of states (Z) are defined for at least one pair of correlated state variables (x, y), with the groups being comparable as regards changes (dx/dt, dy/dt) of the correlated state variables (x, y). Each group of comparable states (Z) is characterized, as regards their transition functions, by parameters (Kp, Tn, Tv) of a standard controller. In the event of changes in the state of the system in the plant (1), the closest groups of comparable states (Z) are selected, and their transition functions, characterized by the parameters (Kp, Tn, Tv), are used for the purpose of regulating the system.
    Type: Grant
    Filed: September 27, 2007
    Date of Patent: November 24, 2009
    Assignee: Powitec Intelligent Technologies GmbH
    Inventors: Franz Wintrich, Volker Stephan
  • Patent number: 7620546
    Abstract: A speech signal isolation system configured to isolate and reconstruct a speech signal transmitted in an environment in which frequency components of the speech signal are masked by background noise. The speech signal isolation system obtains a noisy speech signal from an audio source. The noisy speech signal may then be fed through a neural network that has been trained to isolate and reconstruct a clean speech signal from background noise. Once the noisy speech signal has been fed through the neural network, the speech signal isolation system generates an estimated speech signal with substantially reduced noise.
    Type: Grant
    Filed: March 21, 2005
    Date of Patent: November 17, 2009
    Assignee: QNX Software Systems (Wavemakers), Inc.
    Inventors: Phillip Hetherington, Pierre Zakarauskas, Shahla Parveen
  • Patent number: 7617101
    Abstract: A method and system for utterance verification is disclosed. It first extracts a sequence of feature vectors from speech signal. At least one candidate string is obtained after speech recognition. Then, speech signal is segmented into speech segments according to the verification-unit-specified structure of candidate string for making each speech segment corresponding to a verification unit. After calculating the verification feature vectors of speech segments, these verification feature vectors are sequentially used to generate verification scores of speech segments in verification process. This invention uses neural networks for calculating verification scores, where each neural network is a Multi-Layer Perceptron (MLP) developed for each verification unit. Verification score is obtained through using feed-forward process of MLP.
    Type: Grant
    Filed: July 29, 2003
    Date of Patent: November 10, 2009
    Assignee: Industrial Technology Research Institute
    Inventors: Sen-Chia Chang, Shih-Chieh Chien
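A sketch of a per-unit MLP verification score and the segment-wise decision. The weights and threshold are illustrative; in the patent each MLP is trained for its own verification unit.

```python
import math

def mlp_verification_score(feature_vec, W_hid, w_out):
    """Feed-forward pass of one verification unit's MLP: tanh hidden
    layer, then a sigmoid output giving a score in (0, 1)."""
    h = [math.tanh(sum(w * x for w, x in zip(row, feature_vec)))
         for row in W_hid]
    z = sum(w * v for w, v in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-z))

def verify_utterance(segment_scores, threshold=0.5):
    """Accept the candidate string only if every segment's unit
    verifies above the threshold."""
    return all(s >= threshold for s in segment_scores)

s = mlp_verification_score([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]],
                           [1.0, 1.0])
```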
  • Publication number: 20090259464
    Abstract: A system and method for facilitating cognitive processing of simultaneous remote voice conversations is provided. A plurality of remote voice conversations participated in by distributed participants are provided over a shared communication channel. A main conversation between at least two of the distributed participants and one or more subconversations between at least two other of the distributed participants are identified from within the remote voice conversations. Segments of interest to one of the distributed participants are defined including a conversation excerpt having a lower attention activation threshold for the one distributed participant. Each of the subconversations is parsed into conversation excerpts. The conversation excerpts are compared to the segments of interest. One or more gaps between conversation flow in the main conversation are predicted.
    Type: Application
    Filed: April 11, 2008
    Publication date: October 15, 2009
    Applicant: PALO ALTO RESEARCH CENTER INCORPORATED
    Inventors: Nicolas B. Ducheneaut, Trevor F. Smith
  • Patent number: 7603272
    Abstract: Disclosed is a system and method of decomposing a lattice transition matrix into a block diagonal matrix. The method is applicable to automatic speech recognition but can be used in other contexts as well, such as parsing, named entity extraction and any other methods. The method normalizes the topology of any input graph according to a canonical form.
    Type: Grant
    Filed: June 19, 2007
    Date of Patent: October 13, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Giuseppe Riccardi
  • Patent number: 7584098
    Abstract: A method of identifying a location of a query string in an audio signal is provided. Under the method, a segment of the audio signal is selected. A score for a query string in the segment of the audio signal is determined by determining the product of probabilities of overlapping sequences of tokens. The score is then used to decide if the segment of the audio signal is likely to contain the query string.
    Type: Grant
    Filed: November 29, 2004
    Date of Patent: September 1, 2009
    Assignee: Microsoft Corporation
    Inventors: Roger Peng Yu, Frank Torsten Seide
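The scoring rule above, the product of probabilities of overlapping sequences of tokens, can be sketched as below; `ngram_prob` stands in for the probabilities the recognizer would supply.

```python
def segment_score(tokens, ngram_prob, n=3):
    """Score a segment for the query string: the product of
    probabilities of overlapping n-token sequences in the segment."""
    score = 1.0
    for i in range(len(tokens) - n + 1):
        score *= ngram_prob(tuple(tokens[i:i + n]))
    return score

# With a flat probability of 0.5 per trigram, five tokens give three
# overlapping trigrams: 0.5 ** 3 = 0.125.
score = segment_score(["find", "the", "query", "string", "here"],
                      lambda g: 0.5)
```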
  • Publication number: 20090216528
    Abstract: A method of adapting a neural network of an automatic speech recognition device, includes the steps of: providing a neural network including an input stage, an intermediate stage and an output stage, the output stage outputting phoneme probabilities; providing a linear stage in the neural network; and training the linear stage by means of an adaptation set; wherein the step of providing the linear stage includes the step of providing the linear stage after the intermediate stage.
    Type: Application
    Filed: June 1, 2005
    Publication date: August 27, 2009
    Inventors: Roberto Gemello, Franco Mana
  • Publication number: 20090112599
    Abstract: Disclosed are systems, methods and computer readable media for applying a multi-state barge-in acoustic model in a spoken dialogue system comprising the steps of (1) presenting a prompt to a user from the spoken dialog system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
    Type: Application
    Filed: October 31, 2007
    Publication date: April 30, 2009
    Applicant: AT&T Labs
    Inventor: Andrej Ljolje
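A greatly simplified stand-in for the barge-in logic this abstract describes: each accumulated frame is classified by comparing speech and non-speech model likelihoods, and the prompt is terminated once enough consecutive frames score as speech. The patent uses multi-state HMMs per phonetic category; this sketch substitutes single Gaussians over frame energy, and all parameters are illustrative.

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of a scalar observation under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def is_barge_in(frame_energies, speech=(0.8, 0.05), nonspeech=(0.1, 0.05), need=3):
    """Classify each frame as speech or non-speech by likelihood comparison;
    report barge-in after `need` consecutive speech frames (illustrative)."""
    run = 0
    for e in frame_energies:
        if gauss_loglik(e, *speech) > gauss_loglik(e, *nonspeech):
            run += 1
            if run >= need:
                return True       # would terminate the prompt here
        else:
            run = 0
    return False
```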
  • Publication number: 20090106022
    Abstract: An improved system and method is provided for efficiently learning a network of categories using prediction. A learning engine may receive a stream of characters and incrementally segment the stream of characters beginning with individual characters into larger and larger categories. To do so, a prediction engine may be provided for predicting a target category from the stream of characters using one or more context categories. Upon predicting the target category, the edges of the network of categories may be updated. A category composer may also be provided for composing a new category from existing categories in the network of categories, and a new category composed may then be added to the network of categories. Advantageously, iterative episodes of prediction and learning of categories for large scale applications may result in hundreds of thousands of categories connected by millions of prediction edges.
    Type: Application
    Filed: October 18, 2007
    Publication date: April 23, 2009
    Applicant: Yahoo! Inc.
    Inventor: Omid Madani
  • Patent number: 7502736
    Abstract: Disclosed is a voice registration method for voice recognition, comprising the steps of: analyzing a spectrum of a sound signal inputted from the outside; extracting predetermined language units for speaker recognition from a voice signal in the sound signal; measuring the loudness of each language unit; collecting voice data on registered (background) speakers, including loudness data of the plurality of background speakers, into a voice database as a reference; determining whether the loudness of each language unit is within a predetermined loudness range based on the voice database; learning each language unit by using a multi-layer perceptron in the case that at least a predetermined number of language units are within the predetermined loudness range; and storing data on the learned language units as data for recognizing the speaker. With this configuration, the loudness of a speaker is considered both when learning to register his/her voice and when verifying the speaker.
    Type: Grant
    Filed: December 6, 2001
    Date of Patent: March 10, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Sang-jin Hong, Sung-zoo Lee, Tae-soo Kim, Tae-sung Lee, Ho-jin Choi, Byoung-won Hwang
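A small sketch of the gating step this abstract describes: each language unit's loudness is checked against a predetermined range derived from the background-speaker database, and MLP learning proceeds only if enough units pass. The dB-style loudness scale, tolerance, and minimum count are illustrative assumptions.

```python
def within_loudness_range(unit_loudness, background_mean, tolerance=6.0):
    """Check whether a language unit's loudness falls within a predetermined
    range around the registered background speakers (tolerance is illustrative)."""
    return abs(unit_loudness - background_mean) <= tolerance

def enough_units(units, background_mean, minimum=3):
    """Proceed to multi-layer perceptron learning only if at least `minimum`
    language units lie inside the predetermined loudness range."""
    ok = [u for u in units if within_loudness_range(u, background_mean)]
    return len(ok) >= minimum, ok
```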
  • Patent number: 7499892
    Abstract: An information processing apparatus includes a first learning unit adapted to learn a first SOM (self-organization map), based on a first parameter extracted from an observed value, a winner node determination unit adapted to determine a winner node on the first SOM, a searching unit adapted to search for a generation node on a second SOM having highest connection strength with the winner node, a parameter generation unit adapted to generate a second parameter from the generation node, a modification unit adapted to modify the second parameter generated from the generation node, a first connection weight modification unit adapted to modify the connection weight when an end condition is satisfied, a second connection weight modification unit adapted to modify the connection weight depending on evaluation made by a user, and a second learning unit adapted to learn the second SOM based on the second parameter obtained when the end condition is satisfied.
    Type: Grant
    Filed: April 4, 2006
    Date of Patent: March 3, 2009
    Assignee: Sony Corporation
    Inventors: Kazumi Aoyama, Katsuki Minamino, Hideki Shimomura
  • Patent number: 7454341
    Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: November 18, 2008
    Assignee: Intel Corporation
    Inventors: Jielin Pan, Baosheng Yuan
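A compact NumPy sketch of a K-means loop with a dynamic merge/split step in the spirit of this abstract: an undersized cluster is merged away by re-seeding its codeword at the largest cluster's most distorted point. The `min_size` threshold and the re-seeding rule are simplifications of the patent's size- and distortion-based criteria.

```python
import numpy as np

def modified_kmeans(data, k, iters=20, min_size=2):
    """K-means over sub-vectors with a merge/split step after each update:
    the smallest cluster, if undersized, is re-seeded inside the largest
    cluster at its most distorted (farthest) point."""
    centers = data[:k].astype(float)
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest codeword per vector
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # Update step: recompute each codeword as its cluster mean
        for j in range(k):
            pts = data[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
        # Merge/split step based on cluster size
        sizes = np.bincount(labels, minlength=k)
        if sizes.min() < min_size:
            big, small = sizes.argmax(), sizes.argmin()
            pts = data[labels == big]
            worst = pts[((pts - centers[big]) ** 2).sum(1).argmax()]
            centers[small] = worst.astype(float)
    return centers, labels

# Two well-separated blobs of 2-D sub-vectors
data = np.vstack([np.arange(10).reshape(5, 2) * 0.1,
                  np.arange(10).reshape(5, 2) * 0.1 + 10])
centers, labels = modified_kmeans(data, k=2)
```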
  • Patent number: 7444282
    Abstract: A method of automatic labeling using an optimum-partitioned classified neural network includes searching for neural networks having minimum errors with respect to a number of L phoneme combinations from a number of K neural network combinations generated at an initial stage or updated, updating weights during learning of the K neural networks by K phoneme combination groups searched with the same neural networks, and composing an optimum-partitioned classified neural network combination using the K neural networks of which a total error sum has converged; and tuning a phoneme boundary of a first label file by using the phoneme combination group classification result and the optimum-partitioned classified neural network combination, and generating a final label file reflecting the tuning result.
    Type: Grant
    Filed: March 1, 2004
    Date of Patent: October 28, 2008
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ki-hyun Choo, Jeong-su Kim, Jae-won Lee, Ki-seung Lee
  • Publication number: 20080221878
    Abstract: A system and method for semantic extraction using a neural network architecture includes indexing each word in an input sentence into a dictionary and using these indices to map each word to a d-dimensional vector (the features of which are learned). Together with this, position information for a word of interest (the word to be labeled) and a verb of interest (the verb that the semantic role is being predicted for) with respect to a given word are also used. These positions are integrated by employing a linear layer that is adapted to the input sentence. Several linear transformations and squashing functions are then applied to output class probabilities for semantic role labels. All the weights for the whole architecture are trained by backpropagation.
    Type: Application
    Filed: February 29, 2008
    Publication date: September 11, 2008
    Applicant: NEC LABORATORIES AMERICA, INC.
    Inventors: Ronan Collobert, Jason Weston
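A toy NumPy forward pass in the spirit of this abstract: words are indexed into a dictionary, mapped to d-dimensional vectors, augmented with positions relative to the word and verb of interest, then passed through linear transforms and a squashing function to produce class probabilities. The mean-pooling here is a crude stand-in for the patent's sentence-adapted linear layer; the vocabulary, dimensions, and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "chased": 2, "mouse": 3}   # toy dictionary
d = 5
E = rng.normal(scale=0.1, size=(len(vocab), d))  # learned d-dimensional word vectors
W1 = rng.normal(scale=0.1, size=(d + 2, 8))      # linear transformation
W2 = rng.normal(scale=0.1, size=(8, 4))          # -> 4 semantic-role classes

def role_probabilities(sentence, word_pos, verb_pos):
    """Embed each word, append its position relative to the word and verb of
    interest, pool over the sentence, then apply linear layers, a squashing
    function, and a softmax to get semantic-role class probabilities."""
    feats = [np.concatenate([E[vocab[w]], [i - word_pos, i - verb_pos]])
             for i, w in enumerate(sentence)]
    x = np.mean(feats, axis=0)       # fixed-size sentence summary
    h = np.tanh(x @ W1)              # squashing function
    z = h @ W2
    e = np.exp(z - z.max())
    return e / e.sum()               # class probabilities for role labels

p = role_probabilities(["the", "cat", "chased", "the", "mouse"],
                       word_pos=1, verb_pos=2)
```

In the trained system all of `E`, `W1`, and `W2` would be learned jointly by backpropagation, as the abstract states.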
  • Patent number: 7409340
    Abstract: A neural network is used to obtain more robust performance in determining prosodic markers on the basis of linguistic categories.
    Type: Grant
    Filed: January 27, 2003
    Date of Patent: August 5, 2008
    Assignee: Siemens Aktiengesellschaft
    Inventors: Martin Holzapfel, Achim Mueller
  • Publication number: 20080147391
    Abstract: Provided is a method and apparatus for transforming a speech feature vector. The method includes extracting a feature vector required for speech recognition from a speech signal and transforming the extracted feature vector using an auto-associative neural network (AANN).
    Type: Application
    Filed: August 31, 2007
    Publication date: June 19, 2008
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: So-young Jeong, Kwang-cheol Oh, Jae-hoon Jeong, Jeong-su Kim
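A minimal sketch of the transform this abstract describes: a speech feature vector is passed through an auto-associative neural network (one trained to reconstruct its own input), and the bottleneck activation serves as the transformed feature. The MFCC dimension, bottleneck size, and untrained random weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, B = 13, 5          # feature dimension and bottleneck size (illustrative)
W_enc = rng.normal(scale=0.1, size=(D, B))   # encoder half of the AANN
W_dec = rng.normal(scale=0.1, size=(B, D))   # decoder half of the AANN

def aann_transform(feature):
    """Return the bottleneck activation as the transformed speech feature."""
    return np.tanh(feature @ W_enc)

def aann_reconstruct(feature):
    """Full auto-associative pass; after training, the output would
    approximate the input, forcing the bottleneck to be informative."""
    return aann_transform(feature) @ W_dec
```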
  • Patent number: 7379507
    Abstract: A modulation recognition method and device for digitally modulated signals with multi-level magnitudes are provided. The modulation recognition method includes selecting plural quantization sizes used to construct plural statistic histograms related to the magnitude of a sequence of data, setting up an off-line processing to extract plural useful feature patterns for each modulation type of interest, receiving a sequence of samples of a modulated object signal and constructing plural statistic histograms related to the magnitude of these samples, and adopting a hierarchical classification method for modulation recognition. It can be applied to the adaptive-modulation communication system, software defined radio, digital broadcasting systems and military communication systems. It can also be integrated with modulation recognition techniques for other types of modulated signals to function in a universal demodulator.
    Type: Grant
    Filed: October 1, 2004
    Date of Patent: May 27, 2008
    Assignee: Industrial Technology Research Institute
    Inventors: Ching-Yung Chen, Chih-Chun Feng
  • Patent number: 7376553
    Abstract: An apparatus for signal processing based on an algorithm for representing harmonics in a fractal lattice. The apparatus includes a plurality of tuned segments, each tuned segment including a transceiver having an intrinsic resonant frequency, the amplitude of which can be modified either by receiving an external input signal or by internally generating a response to an applied feedback signal. A plurality of signal processing elements are arranged in an array pattern, the signal processing elements including at least one function selected from the group including buffers for storing information, a feedback device for generating a feedback signal, a controller for controlling an output signal, a connection circuit for connecting the plurality of tuned segments to signal processing elements, and a feedback connection circuit for conveying signals from the plurality of signal processing elements in the array to the tuned segments.
    Type: Grant
    Filed: July 8, 2004
    Date of Patent: May 20, 2008
    Inventor: Robert Patel Quinn
  • Patent number: 7346497
    Abstract: An automatic speech recognition system comprising a speech decoder to resolve phone- and word-level information, and a vector generator to generate information vectors on which a confidence measure is computed by a neural network classifier (ANN). An error signal is designed that is not subject to false saturation or over-specialization. The error signal is integrated into an error function which is back-propagated through the ANN.
    Type: Grant
    Filed: May 8, 2001
    Date of Patent: March 18, 2008
    Assignee: Intel Corporation
    Inventors: Xiaobo Pi, Ying Jia
  • Patent number: 7319960
    Abstract: A speech recognition system uses a phoneme counter to determine the length of a word to be recognized. The result is used to split a lexicon into one or more sub-lexicons containing only words which have the same or similar length to that of the word to be recognized, thereby restricting the search space significantly. In another aspect, a phoneme counter is used to estimate the number of phonemes in a word so that a transition bias can be calculated. This bias is applied to the transition probabilities between phoneme models in an HNN-based recognizer to improve recognition performance for relatively short or long words.
    Type: Grant
    Filed: December 19, 2001
    Date of Patent: January 15, 2008
    Assignee: Nokia Corporation
    Inventors: Soren Riis, Konstantinos Koumpis
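A short sketch of the lexicon-splitting idea in this abstract: given an estimated phoneme count for the word to be recognized, the search is restricted to lexicon entries of the same or similar length. The toy lexicon, pronunciations, and tolerance are illustrative.

```python
def split_lexicon(lexicon, estimated_len, tolerance=1):
    """Build a sub-lexicon of words whose phoneme count is within
    `tolerance` of the phoneme counter's estimate (tolerance is illustrative)."""
    return {word: phones for word, phones in lexicon.items()
            if abs(len(phones) - estimated_len) <= tolerance}

lexicon = {
    "cat":       ["k", "ae", "t"],
    "dog":       ["d", "ao", "g"],
    "catalog":   ["k", "ae", "t", "ah", "l", "ao", "g"],
    "telephone": ["t", "eh", "l", "ah", "f", "ow", "n"],
}
sub = split_lexicon(lexicon, estimated_len=3)   # only short words survive
```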
  • Patent number: 7295977
    Abstract: The method of the present invention utilizes machine-learning techniques, particularly Support Vector Machines in combination with a neural network, to process a unique machine-learning enabled representation of the audio bitstream. Using this method, a classifying machine is able to autonomously detect characteristics of a piece of music, such as the artist or genre, and classify it accordingly. The method includes transforming digital time-domain representation of music into a frequency-domain representation, then dividing that frequency data into time slices, and compressing it into frequency bands to form multiple learning representations of each song. The learning representations that result are processed by a group of Support Vector Machines, then by a neural network, both previously trained to distinguish among a given set of characteristics, to determine the classification.
    Type: Grant
    Filed: August 27, 2001
    Date of Patent: November 13, 2007
    Assignee: NEC Laboratories America, Inc.
    Inventors: Brian Whitman, Gary W. Flake, Stephen R. Lawrence
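A NumPy sketch of the front-end this abstract describes: the time-domain signal is transformed to the frequency domain, divided into time slices, and compressed into coarse frequency bands; the resulting matrix is the learning representation that would feed the Support Vector Machines. Slice length, band count, and sample rate are illustrative.

```python
import numpy as np

def learning_representation(signal, slice_len=1024, n_bands=16):
    """Per time slice: FFT magnitude spectrum compressed into coarse
    frequency bands by averaging (band edges are illustrative)."""
    n_slices = len(signal) // slice_len
    reps = []
    for s in range(n_slices):
        frame = signal[s * slice_len:(s + 1) * slice_len]
        mag = np.abs(np.fft.rfft(frame))         # frequency-domain representation
        bands = np.array_split(mag, n_bands)     # compress into frequency bands
        reps.append([b.mean() for b in bands])
    return np.array(reps)                        # shape: (n_slices, n_bands)

# One second of a 440 Hz tone at an assumed 8 kHz sample rate
t = np.arange(8000) / 8000.0
rep = learning_representation(np.sin(2 * np.pi * 440 * t))
```

Each row of `rep` is one time slice; stacking several such representations per song yields the inputs the patent's SVM/neural-network cascade would classify.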