Patents by Inventor Frank Torsten Bernd Seide

Frank Torsten Bernd Seide has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8825481
    Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
    Type: Grant
    Filed: January 20, 2012
    Date of Patent: September 2, 2014
    Assignee: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
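The multi-level combination this abstract describes can be pictured as a simple interpolation across subword levels. The level names, weights, and probabilities below are illustrative assumptions; the patent does not specify this formula.

```python
# Hypothetical sketch: combining mispronunciation probabilities from
# multiple subword levels (syllable, cluster, phone) by linear
# interpolation. All values here are invented for illustration.

def combine_levels(level_probs, weights):
    """Interpolate per-level mispronunciation probabilities.

    level_probs: dict mapping level name -> P(mispronunciation | context at that level)
    weights:     dict mapping level name -> interpolation weight (sums to 1)
    """
    return sum(weights[lvl] * p for lvl, p in level_probs.items())

# Example: probability that an accented speaker mispronounces a unit,
# estimated independently at three subword levels, then combined.
probs = {"syllable": 0.30, "cluster": 0.25, "phone": 0.40}
weights = {"syllable": 0.2, "cluster": 0.3, "phone": 0.5}
p_mispronounce = combine_levels(probs, weights)
```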
  • Publication number: 20140149468
    Abstract: Described is a technology by which a playback list comprising similar songs is automatically built based on automatically detected/generated song attributes, such as by extracting numeric features of each song. The attributes may be downloaded from a remote connection, and/or may be locally generated on the playback device. To build a playlist, a seed song's attributes may be compared against attributes of other songs to determine which other songs are similar to the seed song and thus included in the playlist. Another way to build a playlist is based on similarity of songs to a set of user-provided attributes, such as those corresponding to moods or usage modes like “resting,” “reading,” “jogging,” or “driving.” The playlist may be dynamically adjusted based on user interaction with the device, such as when a user skips a song, queues a song, or dequeues a song.
    Type: Application
    Filed: February 3, 2014
    Publication date: May 29, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Lie Lu, Frank Torsten Bernd Seide, Gabriel White
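The seed-song comparison can be sketched with a toy similarity measure. Cosine similarity over invented feature vectors stands in here for whatever attribute comparison the patent actually claims; song titles and features are made up.

```python
import math

# Toy sketch of attribute-based playlist building (not the patented
# implementation): each song is reduced to a numeric feature vector, and
# the songs closest to the seed song (by cosine similarity) form the list.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def build_playlist(seed, library, size):
    """Rank library songs by similarity to the seed's attribute vector."""
    ranked = sorted(library,
                    key=lambda s: cosine(seed["features"], s["features"]),
                    reverse=True)
    return [s["title"] for s in ranked[:size]]

library = [
    {"title": "Calm Piano", "features": [0.1, 0.9, 0.2]},
    {"title": "Fast Rock",  "features": [0.9, 0.1, 0.8]},
    {"title": "Soft Jazz",  "features": [0.2, 0.8, 0.3]},
]
seed = {"title": "Quiet Strings", "features": [0.15, 0.85, 0.25]}
playlist = build_playlist(seed, library, size=2)
```

The same ranking could be driven by user-provided mood attributes instead of a seed song's features.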
  • Publication number: 20140142929
    Abstract: The use of a pipelined algorithm that performs parallelized computations to train deep neural networks (DNNs) for performing data analysis may reduce training time. The DNNs may be either context-independent or context-dependent. The training may include partitioning training data into sample batches of a specific batch size. The partitioning may be performed based on rates of data transfers between processors that execute the pipelined algorithm, considerations of accuracy and convergence, and the execution speed of each processor. Other techniques for training may include grouping layers of the DNNs for processing on a single processor, distributing a layer of the DNNs to multiple processors for processing, or modifying an execution order of steps in the pipelined algorithm.
    Type: Application
    Filed: November 20, 2012
    Publication date: May 22, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Frank Torsten Bernd Seide, Gang Li, Dong Yu, Adam C. Eversole, Xie Chen
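The layer-grouping idea mentioned in the abstract can be sketched as follows. The greedy cost-balancing heuristic, layer costs, and processor count are illustrative assumptions, not the patented partitioning method.

```python
# Illustrative sketch (not the patented algorithm): greedily assign
# consecutive DNN layers to processors so each processor's share of the
# computation cost is roughly balanced, one concern the abstract raises
# for pipelined training.

def group_layers(layer_costs, num_procs):
    """Assign consecutive layer indices to processors, balancing total cost."""
    target = sum(layer_costs) / num_procs
    groups = [[] for _ in range(num_procs)]
    g, running = 0, 0.0
    for i, cost in enumerate(layer_costs):
        # Move to the next processor once this one has its share,
        # unless it is already the last processor.
        if running >= target and g < num_procs - 1:
            g += 1
            running = 0.0
        groups[g].append(i)
        running += cost
    return groups

# Six layers with assumed relative costs, split across two processors.
assignment = group_layers([4, 4, 2, 2, 2, 2], num_procs=2)
```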
  • Patent number: 8700552
    Abstract: Deep Neural Network (DNN) training technique embodiments are presented that train a DNN while exploiting the sparseness of non-zero hidden layer interconnection weight values. Generally, a fully connected DNN is initially trained by sweeping through a full training set a number of times. Then, for the most part, only the interconnections whose weight magnitudes exceed a minimum weight threshold are considered in further training. This minimum weight threshold can be established as a value that results in only a prescribed maximum number of interconnections being considered when setting interconnection weight values via an error back-propagation procedure during the training. It is noted that the continued DNN training tends to converge much faster than the initial training.
    Type: Grant
    Filed: November 28, 2011
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Frank Torsten Bernd Seide, Gang Li
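A minimal sketch of the sparseness idea with toy dimensions (the patent does not supply code; names and sizes here are illustrative): keep only the largest-magnitude interconnections, with the threshold chosen so at most a prescribed number survive, and restrict further weight updates to them.

```python
import random

# Toy sketch: prune a trained hidden-layer weight matrix down to its
# largest-magnitude interconnections, then mask further updates so
# pruned connections stay at zero.

random.seed(0)
n = 8
weights = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
max_connections = 16                    # prescribed maximum to keep

# Threshold = magnitude of the max_connections-th largest weight.
magnitudes = sorted(abs(w) for row in weights for w in row)
threshold = magnitudes[-max_connections]
mask = [[abs(w) >= threshold for w in row] for row in weights]

def sparse_update(weights, grad, mask, lr=0.1):
    """One back-propagation-style step applied only to surviving links;
    masked-out entries are forced to zero (this also performs the pruning)."""
    return [[w - lr * g if keep else 0.0
             for w, g, keep in zip(wr, gr, mr)]
            for wr, gr, mr in zip(weights, grad, mask)]

grad = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
weights = sparse_update(weights, grad, mask)
kept = sum(m for row in mask for m in row)
```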
  • Patent number: 8650029
    Abstract: A voice activity detection (VAD) module analyzes a media file, such as an audio file or a video file, to determine whether one or more frames of the media file include speech. A speech recognizer generates feedback relating to an accuracy of the VAD determination. The VAD module leverages the feedback to improve subsequent VAD determinations. The VAD module also utilizes a look-ahead window associated with the media file to adjust estimated probabilities or VAD decisions for previously processed frames.
    Type: Grant
    Filed: February 25, 2011
    Date of Patent: February 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Weiwu Zhu, Frank Torsten Bernd Seide
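The look-ahead adjustment can be illustrated with a toy smoother. The averaging window, threshold, and probabilities below are assumptions for illustration, not the patented method.

```python
# Illustrative sketch: revise each frame's speech probability using the
# frames that follow it, so an isolated spike inside a silent stretch is
# smoothed away while a sustained speech run survives.

def smooth_with_lookahead(frame_probs, window=2):
    """Average each frame's probability with up to `window` future frames."""
    smoothed = []
    for i in range(len(frame_probs)):
        future = frame_probs[i:i + window + 1]
        smoothed.append(sum(future) / len(future))
    return smoothed

raw = [0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.9]   # per-frame P(speech)
adjusted = smooth_with_lookahead(raw, window=2)
decisions = [p >= 0.5 for p in adjusted]     # final VAD decisions
```

Note how the lone 0.9 at frame 1 no longer triggers a speech decision once future frames are consulted.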
  • Patent number: 8642872
    Abstract: Described is a technology by which a playback list comprising similar songs is automatically built based on automatically detected/generated song attributes, such as by extracting numeric features of each song. The attributes may be downloaded from a remote connection, and/or may be locally generated on the playback device. To build a playlist, a seed song's attributes may be compared against attributes of other songs to determine which other songs are similar to the seed song and thus included in the playlist. Another way to build a playlist is based on similarity of songs to a set of user-provided attributes, such as those corresponding to moods or usage modes like “resting,” “reading,” “jogging,” or “driving.” The playlist may be dynamically adjusted based on user interaction with the device, such as when a user skips a song, queues a song, or dequeues a song.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: February 4, 2014
    Assignee: Microsoft Corporation
    Inventors: Lie Lu, Frank Torsten Bernd Seide, Gabriel White
  • Publication number: 20130191126
    Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
    Type: Application
    Filed: January 20, 2012
    Publication date: July 25, 2013
    Applicant: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
  • Publication number: 20130138589
    Abstract: Deep Neural Network (DNN) training technique embodiments are presented that train a DNN while exploiting the sparseness of non-zero hidden layer interconnection weight values. Generally, a fully connected DNN is initially trained by sweeping through a full training set a number of times. Then, for the most part, only the interconnections whose weight magnitudes exceed a minimum weight threshold are considered in further training. This minimum weight threshold can be established as a value that results in only a prescribed maximum number of interconnections being considered when setting interconnection weight values via an error back-propagation procedure during the training. It is noted that the continued DNN training tends to converge much faster than the initial training.
    Type: Application
    Filed: November 28, 2011
    Publication date: May 30, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Li Deng, Frank Torsten Bernd Seide, Gang Li
  • Publication number: 20130138436
    Abstract: Discriminative pretraining technique embodiments are presented that pretrain the hidden layers of a Deep Neural Network (DNN). In general, a one-hidden-layer neural network is trained first using labels discriminatively with error back-propagation (BP). Then, after discarding an output layer in the previous one-hidden-layer neural network, another randomly initialized hidden layer is added on top of the previously trained hidden layer along with a new output layer that represents the targets for classification or recognition. The resulting multiple-hidden-layer DNN is then discriminatively trained using the same strategy, and so on until the desired number of hidden layers is reached. This produces a pretrained DNN. The discriminative pretraining technique embodiments have the advantage of bringing the DNN layer weights close to a good local optimum, while still leaving them in a range with a high gradient so that they can be fine-tuned effectively.
    Type: Application
    Filed: November 26, 2011
    Publication date: May 30, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Li Deng, Frank Torsten Bernd Seide, Gang Li
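The greedy layer-by-layer procedure can be sketched schematically. Here `train_bp` is only a placeholder for discriminative error back-propagation, and all names and sizes are illustrative, not from the patent.

```python
import random

# Schematic sketch of discriminative pretraining as described above:
# train a one-hidden-layer network, discard its output layer, stack a new
# random hidden layer plus a fresh output layer, retrain, and repeat.

def random_layer(n_in, n_out):
    return [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def train_bp(layers, labels):
    """Placeholder for discriminative error back-propagation training."""
    return layers  # a real implementation would update the weights here

def discriminative_pretrain(layer_sizes, n_labels, labels=None):
    hidden, n_in = [], layer_sizes[0]
    for n_out in layer_sizes[1:]:
        hidden.append(random_layer(n_in, n_out))   # new random hidden layer
        output = random_layer(n_out, n_labels)     # fresh output layer
        # Train the whole stack discriminatively, then discard the output layer.
        hidden = train_bp(hidden + [output], labels)[:-1]
        n_in = n_out
    return hidden  # pretrained hidden stack, ready for fine-tuning

random.seed(0)
stack = discriminative_pretrain([10, 8, 8, 8], n_labels=4)
```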
  • Publication number: 20120226696
    Abstract: In various embodiments, a transcript that represents a media file is created. Keyword candidates that may represent topics and/or content associated with the media content are then extracted from the transcript. Furthermore, a keyword set may be generated for the media content utilizing a mutual information criterion. In other embodiments, one or more queries may be generated based at least in part on the transcript, and a plurality of web documents may be retrieved based at least in part on the one or more queries. Additional keyword candidates may be extracted from each web document and then ranked. A subset of the keyword candidates may then be selected to form a keyword set associated with the media content.
    Type: Application
    Filed: March 4, 2011
    Publication date: September 6, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Albert Joseph Kishan Thambiratnam, Sha Meng, Gang Li, Frank Torsten Bernd Seide
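One way to picture a mutual-information criterion: rank each candidate by how strongly it is associated with the media's topic rather than with language in general. The abstract does not give the exact formulation; the pointwise-MI scoring and probabilities below are invented for illustration.

```python
import math

# Toy sketch (assumed formulation, not the patent's): score each keyword
# candidate by pointwise mutual information between the candidate term
# and the media's topic, from illustrative probability estimates.

def pmi(p_joint, p_term, p_topic):
    """Pointwise mutual information: log2 P(term, topic) / (P(term) P(topic))."""
    return math.log2(p_joint / (p_term * p_topic))

# term: (P(term, topic), P(term), P(topic)) -- illustrative estimates
candidates = {
    "neural": (2e-4, 5e-4, 1e-2),   # rare term, concentrated in the topic
    "the":    (9e-3, 5e-2, 1e-2),   # frequent everywhere, weak association
}
ranked = sorted(candidates, key=lambda t: pmi(*candidates[t]), reverse=True)
```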
  • Publication number: 20120221330
    Abstract: A voice activity detection (VAD) module analyzes a media file, such as an audio file or a video file, to determine whether one or more frames of the media file include speech. A speech recognizer generates feedback relating to an accuracy of the VAD determination. The VAD module leverages the feedback to improve subsequent VAD determinations. The VAD module also utilizes a look-ahead window associated with the media file to adjust estimated probabilities or VAD decisions for previously processed frames.
    Type: Application
    Filed: February 25, 2011
    Publication date: August 30, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Albert Joseph Kishan Thambiratnam, Weiwu Zhu, Frank Torsten Bernd Seide
  • Publication number: 20100268534
    Abstract: Described is a technology that provides highly accurate speech-recognized text transcripts of conversations, particularly telephone or meeting conversations. Speech is received for recognition while it is still at high quality and separately for each user, that is, independent of any transmission. Moreover, because the speech is received separately, a personalized recognition model adapted to each user's voice and vocabulary may be used. The separately recognized text is then merged into a transcript of the communication. The transcript may be labeled with the identity of each user that spoke the corresponding speech. The output of the transcript may be dynamic as the conversation takes place, or may occur later, such as contingent upon each user agreeing to release his or her text. The transcript may be incorporated into the text or data of another program, such as to insert it as a thread in a larger email conversation or the like.
    Type: Application
    Filed: April 17, 2009
    Publication date: October 21, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Albert Joseph Kishan Thambiratnam, Frank Torsten Bernd Seide, Peng Yu, Roy Geoffrey Wallace
  • Patent number: 7693267
    Abstract: Improved systems and methods are provided for transcribing audio files of voice mails sent over a unified messaging system. Customized grammars specific to a voice mail recipient are created and utilized to transcribe a received voice mail by comparing the audio file to commonly utilized words, names, acronyms, and phrases used by the recipient. Key elements are identified from the resulting text transcription to aid the recipient in processing received voice mails based on the significant content contained in the voice mail.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: April 6, 2010
    Assignee: Microsoft Corporation
    Inventors: David Andrew Howell, Sridhar Sundararaman, David T. Fong, Frank Torsten Bernd Seide
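A toy illustration of biasing recognition toward a recipient's personal vocabulary. The rescoring scheme, vocabulary, and scores are assumptions for illustration; the patent's customized-grammar mechanism is not specified here.

```python
# Toy sketch (assumed scoring, not the patented grammar mechanism): boost
# recognition hypotheses that contain words from the recipient's personal
# vocabulary of names, acronyms, and frequent terms.

def rescore(hypotheses, personal_vocab, boost=0.2):
    """Add `boost` per personal-vocabulary word, then pick the best hypothesis."""
    rescored = []
    for text, score in hypotheses:
        bonus = sum(boost for w in text.lower().split() if w in personal_vocab)
        rescored.append((text, score + bonus))
    return max(rescored, key=lambda h: h[1])

vocab = {"contoso", "quarterly", "anjali"}          # recipient-specific terms
hyps = [("call me about the court early report", 0.50),
        ("call me about the quarterly report", 0.45)]
best = rescore(hyps, vocab)
```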
  • Patent number: 7617104
    Abstract: A method of speech recognition is provided that determines a production-related value, vocal-tract resonance frequencies in particular, for a state at a particular frame based on the production-related values associated with two preceding frames using a recursion. The production-related value is used to determine a probability distribution of the observed feature vector for the state. A probability for an observed value received for the frame is then determined from the probability distribution. Under one embodiment, the production-related value is determined using a noise-free recursive definition for the value. Use of the recursion substantially improves the decoding speed. When the decoding algorithm is applied to training data with known phonetic transcripts, forced alignment is created which improves the phone segmentation obtained from the prior art.
    Type: Grant
    Filed: January 21, 2003
    Date of Patent: November 10, 2009
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide
  • Publication number: 20090217804
    Abstract: Described is a technology by which a playback list comprising similar songs is automatically built based on automatically detected/generated song attributes, such as by extracting numeric features of each song. The attributes may be downloaded from a remote connection, and/or may be locally generated on the playback device. To build a playlist, a seed song's attributes may be compared against attributes of other songs to determine which other songs are similar to the seed song and thus included in the playlist. Another way to build a playlist is based on similarity of songs to a set of user-provided attributes, such as those corresponding to moods or usage modes like “resting,” “reading,” “jogging,” or “driving.” The playlist may be dynamically adjusted based on user interaction with the device, such as when a user skips a song, queues a song, or dequeues a song.
    Type: Application
    Filed: June 4, 2008
    Publication date: September 3, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Lie Lu, Frank Torsten Bernd Seide, Gabriel White
  • Patent number: 7206741
    Abstract: A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.
    Type: Grant
    Filed: December 6, 2005
    Date of Patent: April 17, 2007
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
  • Patent number: 7050975
    Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.
    Type: Grant
    Filed: October 9, 2002
    Date of Patent: May 23, 2006
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
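The interpolation step this abstract describes is concrete enough to sketch directly: the hidden dynamics value moves from its previous value toward a phone-dependent target under a time-dependent weight. The weight schedule and target value below are illustrative.

```python
# Sketch of the linear interpolation the abstract describes:
#   z_t = w_t * z_{t-1} + (1 - w_t) * target
# All numeric values here are invented for illustration.

def interpolate_dynamics(z_prev, target, weight_t):
    """One step of time-dependent linear interpolation toward the target."""
    return weight_t * z_prev + (1.0 - weight_t) * target

# Trajectory from 500 toward a target of 1500 (formant-like values in Hz),
# with a decaying weight so later frames move faster toward the target.
z, target = 500.0, 1500.0
trajectory = [z]
for w in [0.9, 0.8, 0.7, 0.6]:
    z = interpolate_dynamics(z, target, w)
    trajectory.append(z)
```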
  • Patent number: 6829578
    Abstract: Robust acoustic tone features are achieved first by the introduction of on-line, look-ahead trace-back of the fundamental frequency (F0) contour with adaptive pruning; this F0 contour serves as the signal preprocessing front-end. The F0 contour is subsequently decomposed into lexical tone effect, phrase intonation effect, and random effect by means of a time-variant, weighted moving average (MA) filter in conjunction with weighted (placing more emphasis on vowels) least squares of the F0 contour. The intonation effect is removed by subtraction from the F0 contour under a superposition assumption. The acoustic tone features are defined in two parts. The first is the coefficients of the second-order weighted regression of the de-intonation of the F0 contour over neighbouring frames. The second part captures the degree of periodicity of the signal: the coefficients of the second-order regression of the auto-correlation.
    Type: Grant
    Filed: July 9, 2001
    Date of Patent: December 7, 2004
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventors: Chang-Han Huang, Frank Torsten Bernd Seide
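The de-intonation step can be illustrated with an unweighted moving average. The patent uses a time-variant, weighted MA filter; the simplification, window length, and F0 values here are for illustration only.

```python
# Rough sketch: estimate the slow phrase-intonation component of the F0
# contour with a moving average, then subtract it so the faster
# lexical-tone movement remains (the de-intonation residual).

def moving_average(f0, half_window):
    out = []
    for i in range(len(f0)):
        lo, hi = max(0, i - half_window), min(len(f0), i + half_window + 1)
        out.append(sum(f0[lo:hi]) / (hi - lo))
    return out

f0 = [220, 230, 210, 225, 205, 215, 200]        # Hz, a toy F0 contour
intonation = moving_average(f0, half_window=2)  # slow phrase component
tone = [f - m for f, m in zip(f0, intonation)]  # de-intonation residual
```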
  • Publication number: 20040143435
    Abstract: A method of speech recognition is provided that determines a production-related value, vocal-tract resonance frequencies in particular, for a state at a particular frame based on the production-related values associated with two preceding frames using a recursion. The production-related value is used to determine a probability distribution of the observed feature vector for the state. A probability for an observed value received for the frame is then determined from the probability distribution. Under one embodiment, the production-related value is determined using a noise-free recursive definition for the value. Use of the recursion substantially improves the decoding speed. When the decoding algorithm is applied to training data with known phonetic transcripts, forced alignment is created which improves the phone segmentation obtained from the prior art.
    Type: Application
    Filed: January 21, 2003
    Publication date: July 22, 2004
    Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide
  • Publication number: 20040019483
    Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.
    Type: Application
    Filed: October 9, 2002
    Publication date: January 29, 2004
    Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang