Patents by Inventor Matthias Paulik

Matthias Paulik has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180330737
    Abstract: Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a plurality of speech inputs, each of the speech inputs associated with a same user of the electronic device; providing each of the plurality of speech inputs to a user-independent acoustic model, the user-independent acoustic model providing a plurality of speech results based on the plurality of speech inputs; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the plurality of speech inputs and the plurality of speech results.
    Type: Application
    Filed: September 22, 2017
    Publication date: November 15, 2018
    Inventors: Matthias PAULIK, Henry G. MASON, Jason A. SKINDER
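A minimal Python sketch of the adaptation loop described in the abstract of publication 20180330737 above. The class names and methods are illustrative stand-ins, not Apple's implementation; a real user-specific model would update acoustic parameters rather than log input/result pairs.

```python
# Sketch of: base model produces results, user-specific model is
# initiated on device and adjusted from (input, result) pairs.

class UserIndependentModel:
    def recognize(self, speech_input):
        # A real model would decode audio; here we just echo a label.
        return f"result-for-{speech_input}"

class UserSpecificModel:
    def __init__(self):
        self.adaptation_pairs = []

    def adjust(self, speech_input, speech_result):
        # A real model would update acoustic parameters; we log pairs.
        self.adaptation_pairs.append((speech_input, speech_result))

def adapt_user_model(speech_inputs):
    base = UserIndependentModel()
    results = [base.recognize(x) for x in speech_inputs]
    user_model = UserSpecificModel()      # initiated on the device
    for x, r in zip(speech_inputs, results):
        user_model.adjust(x, r)           # adjusted from input/result pairs
    return user_model

model = adapt_user_model(["utt1", "utt2", "utt3"])
print(len(model.adaptation_pairs))  # 3
```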
  • Publication number: 20180330730
    Abstract: Speech recognition is performed on a received utterance to determine a plurality of candidate text representations of the utterance, including a primary text representation and one or more alternative text representations. Natural language processing is performed on the primary text representation to determine a plurality of candidate actionable intents, including a primary actionable intent and one or more alternative actionable intents. A result is determined based on the primary actionable intent. The result is provided to the user. A recognition correction trigger is detected. In response to detecting the recognition correction trigger, a set of alternative intent affordances and a set of alternative text affordances are concurrently displayed.
    Type: Application
    Filed: August 15, 2017
    Publication date: November 15, 2018
    Inventors: Ashish GARG, Harry J. SADDLER, Shweta GRAMPUROHIT, Robert A. WALKER, Rushin N. SHAH, Matthew S. SEIGEL, Matthias PAULIK
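The control flow of publication 20180330730 above can be pictured roughly as follows; recognize() and parse_intents() are stubbed n-best producers, and the "affordances" are reduced to printed tuples.

```python
# Illustrative stubs for the n-best / correction flow; none of this
# is the patented implementation.

def recognize(audio):
    # Stand-in n-best list: primary text first, alternatives after.
    return ["play some jazz", "play sam's jazz", "play some chess"]

def parse_intents(text):
    # Stand-in intent list: primary actionable intent first.
    return [("PlayMusic", text), ("PlayGame", text)]

def handle_utterance(audio, correction_trigger_detected):
    texts = recognize(audio)
    primary_text, alt_texts = texts[0], texts[1:]
    intents = parse_intents(primary_text)
    primary_intent, alt_intents = intents[0], intents[1:]
    print("result:", primary_intent)      # result from the primary intent
    if correction_trigger_detected:
        # Display alternative intent and text affordances concurrently.
        print("alternative intents:", alt_intents)
        print("alternative texts: ", alt_texts)

handle_utterance("audio-bytes", correction_trigger_detected=True)
```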
  • Publication number: 20180330731
    Abstract: Systems and processes for performing a task with a digital assistant are provided.
    Type: Application
    Filed: September 22, 2017
    Publication date: November 15, 2018
    Inventors: Nicolas ZEITLIN, Matthias PAULIK, Henry G. MASON, Karric KWONG, Sinan AKAY, Saravana Kumar RATHINAM, Anumita BISWAS
  • Patent number: 10007663
    Abstract: An iterative language translation system includes multiple communicatively connected statistical speech translation systems. The system includes an automatic speech recognition component adapted to recognize spoken language in a source language and to create a source language hypothesis. A machine translation component is adapted to translate the source language hypothesis into a target language. The system also includes a second automatic speech recognition component and second machine translation component. The translation results are used to adapt the automatic speech recognition components and the language hypotheses are used to adapt the machine translation components.
    Type: Grant
    Filed: October 31, 2014
    Date of Patent: June 26, 2018
    Assignee: Facebook, Inc.
    Inventors: Alexander Waibel, Matthias Paulik
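One way to picture the iterative cross-adaptation loop in patent 10007663 above: each direction's ASR hypothesis feeds a machine translation component, and the outputs are fed back to adapt the opposite components. All classes below are illustrative stubs, not the patented system.

```python
# Two ASR + two MT components adapting each other across iterations.

class ASR:
    def __init__(self, lang):
        self.lang = lang
        self.bias = []

    def recognize(self, audio):
        return f"hypothesis({self.lang})"

    def adapt(self, translation):
        # Bias the recognizer toward the translated text.
        self.bias.append(translation)

class MT:
    def __init__(self, src, tgt):
        self.src, self.tgt = src, tgt
        self.context = []

    def translate(self, hypothesis):
        return f"{self.tgt}-translation-of-{hypothesis}"

    def adapt(self, hypothesis):
        self.context.append(hypothesis)

def iterate(audio_en, audio_es, rounds=2):
    asr_en, asr_es = ASR("en"), ASR("es")
    mt_en_es, mt_es_en = MT("en", "es"), MT("es", "en")
    for _ in range(rounds):
        hyp_en, hyp_es = asr_en.recognize(audio_en), asr_es.recognize(audio_es)
        tr_es, tr_en = mt_en_es.translate(hyp_en), mt_es_en.translate(hyp_es)
        asr_en.adapt(tr_en)        # translation results adapt the recognizers
        asr_es.adapt(tr_es)
        mt_en_es.adapt(hyp_en)     # language hypotheses adapt the translators
        mt_es_en.adapt(hyp_es)

iterate("audio-en", "audio-es")
```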
  • Patent number: 9972304
    Abstract: Systems and processes for evaluating embedded personalized systems are provided. In one example process, instructions that define an experiment associated with a personalized speech recognition system can be received. The instructions can define one or more experimental parameters. In accordance with the received instructions, a second personalized speech recognition system can be generated based on the personalized speech recognition system and the one or more experimental parameters. Additionally, the plurality of user speech samples can be processed using the second personalized speech recognition system to generate a plurality of speech recognition results and a plurality of accuracy scores corresponding to the plurality of speech recognition results. Second instructions can be received based on the plurality of accuracy scores. In accordance with the second instructions, the second speech recognition system can be activated.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: May 15, 2018
    Assignee: Apple Inc.
    Inventors: Matthias Paulik, Henry G. Mason, Matthew S. Seigel
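A stand-in sketch of the experiment flow in patent 9972304 above; SpeechRecognizer and score() are placeholders, and the activation rule (a mean-accuracy threshold) is our invention for illustration.

```python
import random
random.seed(0)

class SpeechRecognizer:
    def __init__(self, params):
        self.params = params
        self.active = False

    def recognize(self, sample):
        return f"text:{sample}"

def score(result, sample):
    # Placeholder accuracy metric; a real system would compare the
    # result against reference transcripts.
    return random.random()

def run_experiment(base, experimental_params, samples):
    # Generate the second personalized system from the base system
    # plus the experimental parameters.
    candidate = SpeechRecognizer({**base.params, **experimental_params})
    results = [candidate.recognize(s) for s in samples]
    scores = [score(r, s) for r, s in zip(results, samples)]
    # "Second instructions" reduced to a mean-accuracy threshold.
    if sum(scores) / len(scores) > 0.5:
        candidate.active = True
    return candidate

base = SpeechRecognizer({"lm_weight": 1.0})
print(run_experiment(base, {"lm_weight": 1.2}, ["a", "b"]).active)
```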
  • Publication number: 20180108346
    Abstract: Systems and processes are disclosed for discovering trending terms in automatic speech recognition. Candidate terms (e.g., words, phrases, etc.) not yet found in a speech recognizer vocabulary or having low language model probability can be identified based on trending usage in a variety of electronic data sources (e.g., social network feeds, news sources, search queries, etc.). When candidate terms are identified, archives of live or recent speech traffic can be searched to determine whether users are uttering the candidate terms in dictation or speech requests. Such searching can be done using open vocabulary spoken term detection to find phonetic matches in the audio archives. As the candidate terms are found in the speech traffic, notifications can be generated that identify the candidate terms, provide relevant usage statistics, identify the context in which the terms are used, and the like.
    Type: Application
    Filed: November 3, 2017
    Publication date: April 19, 2018
    Inventors: Matthias PAULIK, Gunnar EVERMANN, Laurence S. GILLICK
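A toy rendering of the trending-term pipeline in publication 20180108346 above: trending words absent from the vocabulary become candidates, and open-vocabulary spoken term detection is stubbed here with substring search over archived transcripts.

```python
from collections import Counter

VOCAB = {"weather", "music", "call"}   # stand-in recognizer vocabulary

def candidate_terms(trending_feed):
    counts = Counter(word.lower() for word in trending_feed)
    return [w for w, c in counts.items() if c >= 2 and w not in VOCAB]

def spoken_term_detect(term, archive):
    # Stand-in for open-vocabulary spoken term detection over audio.
    return [item for item in archive if term in item]

feed = ["Fortnite", "weather", "fortnite", "fortnite"]
archive = ["play fortnite stream", "what is the weather like"]
for term in candidate_terms(feed):
    hits = spoken_term_detect(term, archive)
    if hits:
        # Notification with simple usage statistics and context.
        print(f"trending term '{term}': {len(hits)} hit(s), e.g. {hits[0]!r}")
```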
  • Publication number: 20170352346
    Abstract: Systems and processes for evaluating embedded personalized systems are provided. In one example process, instructions that define an experiment associated with a personalized speech recognition system can be received. The instructions can define one or more experimental parameters. In accordance with the received instructions, a second personalized speech recognition system can be generated based on the personalized speech recognition system and the one or more experimental parameters. Additionally, the plurality of user speech samples can be processed using the second personalized speech recognition system to generate a plurality of speech recognition results and a plurality of accuracy scores corresponding to the plurality of speech recognition results. Second instructions can be received based on the plurality of accuracy scores. In accordance with the second instructions, the second speech recognition system can be activated.
    Type: Application
    Filed: September 15, 2016
    Publication date: December 7, 2017
    Inventors: Matthias PAULIK, Henry G. MASON, Matthew S. SEIGEL
  • Patent number: 9818400
    Abstract: Systems and processes are disclosed for discovering trending terms in automatic speech recognition. Candidate terms (e.g., words, phrases, etc.) not yet found in a speech recognizer vocabulary or having low language model probability can be identified based on trending usage in a variety of electronic data sources (e.g., social network feeds, news sources, search queries, etc.). When candidate terms are identified, archives of live or recent speech traffic can be searched to determine whether users are uttering the candidate terms in dictation or speech requests. Such searching can be done using open vocabulary spoken term detection to find phonetic matches in the audio archives. As the candidate terms are found in the speech traffic, notifications can be generated that identify the candidate terms, provide relevant usage statistics, identify the context in which the terms are used, and the like.
    Type: Grant
    Filed: August 28, 2015
    Date of Patent: November 14, 2017
    Assignee: Apple Inc.
    Inventors: Matthias Paulik, Gunnar Evermann, Laurence S. Gillick
  • Publication number: 20160358600
    Abstract: Systems and processes for automatic accent detection are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a user input, determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, and determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models. The method further includes determining whether the first similarity is greater than the second similarity. In accordance with a determination that the first similarity is greater than the second similarity, the first acoustic model may be selected; and in accordance with a determination that the first similarity is not greater than the second similarity, the second acoustic model may be selected.
    Type: Application
    Filed: September 4, 2015
    Publication date: December 8, 2016
    Inventors: Udhyakumar NALLASAMY, Sachin S. KAJAREKAR, Matthias PAULIK, Matthew SEIGEL
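A compact way to picture the model-selection step in publication 20160358600 above, with each acoustic model reduced to a single mean feature vector and "similarity" reduced to cosine similarity; both simplifications are ours.

```python
import numpy as np

def similarity(x, model_mean):
    return float(np.dot(x, model_mean) /
                 (np.linalg.norm(x) * np.linalg.norm(model_mean)))

def select_model(user_features, model_a, model_b):
    sim_a = similarity(user_features, model_a)
    sim_b = similarity(user_features, model_b)
    # Select the first model only when it is strictly more similar.
    return "A" if sim_a > sim_b else "B"

# Invented mean vectors standing in for two accent-specific models.
us_english = np.array([0.9, 0.1, 0.3])
in_english = np.array([0.2, 0.8, 0.5])
utterance = np.array([0.85, 0.2, 0.25])
print(select_model(utterance, us_english, in_english))  # A
```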
  • Publication number: 20160358598
    Abstract: The present disclosure generally relates to context-based endpoint detection in user speech input. A method for identifying an endpoint of a spoken request by a user may include receiving user input of natural language speech including one or more words; identifying at least one context associated with the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.
    Type: Application
    Filed: September 4, 2015
    Publication date: December 8, 2016
    Inventors: Shaun E. WILLIAMS, Henry G. MASON, Mahesh KRISHNAMOORTHY, Matthias PAULIK, Neha AGRAWAL, Sachin S. KAJAREKAR, Selen UGUROGLU, Ali S. MOHAMED
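A toy rendering of the context-conditioned endpoint decision in publication 20160358598 above; the per-context priors and the probability formula are invented numbers for illustration.

```python
# Hypothetical per-context priors for how likely a pause is an endpoint.
CONTEXT_ENDPOINT_PRIOR = {
    "dictation": 0.2,    # long pauses are common mid-dictation
    "web_search": 0.7,   # queries are short; a pause often ends them
}

def endpoint_probability(pause_seconds, context):
    prior = CONTEXT_ENDPOINT_PRIOR.get(context, 0.5)
    # Toy combination: longer pauses and endpoint-prone contexts
    # both raise the probability.
    return min(1.0, prior * pause_seconds)

def is_endpoint(pause_seconds, context, threshold=0.6):
    return endpoint_probability(pause_seconds, context) > threshold

print(is_endpoint(1.0, "web_search"))  # True
print(is_endpoint(1.0, "dictation"))   # False
```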
  • Patent number: 9502031
    Abstract: Systems and processes are disclosed for recognizing speech using a weighted finite state transducer (WFST) approach. Dynamic grammars can be supported by constructing the final recognition cascade during runtime using difference grammars. In a first grammar, non-terminals can be replaced with a weighted phone loop that produces sequences of mono-phone words. In a second grammar, at runtime, non-terminals can be replaced with sub-grammars derived from user-specific usage data including contact, media, and application lists. Interaction frequencies associated with these entities can be used to weight certain words over others. With all non-terminals replaced, a static recognition cascade with the first grammar can be composed with the personalized second grammar to produce a user-specific WFST. User speech can then be processed to generate candidate words having associated probabilities, and the likeliest result can be output.
    Type: Grant
    Filed: September 23, 2014
    Date of Patent: November 22, 2016
    Assignee: Apple Inc.
    Inventors: Matthias Paulik, Rongqing Huang
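A toy illustration of the non-terminal replacement idea from patent 9502031 above: a $CONTACT slot in a static grammar is filled at runtime with a user-specific sub-grammar whose arc weights come from interaction frequencies. Real systems compose WFSTs; this sketch merely expands strings with weights.

```python
import math

def contact_subgrammar(contact_counts):
    total = sum(contact_counts.values())
    # Negative log probability as the arc weight, as in WFST practice.
    return {name: -math.log(c / total) for name, c in contact_counts.items()}

def expand(template, subgrammar):
    # Replace the non-terminal with each weighted sub-grammar entry.
    return {template.replace("$CONTACT", name): w
            for name, w in subgrammar.items()}

contacts = {"Alice": 30, "Bob": 10}   # invented interaction frequencies
grammar = expand("call $CONTACT", contact_subgrammar(contacts))
for sentence, weight in sorted(grammar.items(), key=lambda kv: kv[1]):
    print(f"{weight:.2f}  {sentence}")
```

Frequently contacted names receive lower (better) weights, so "call Alice" outranks "call Bob" here.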
  • Patent number: 9418660
    Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
    Type: Grant
    Filed: January 15, 2014
    Date of Patent: August 16, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Matthias Paulik, Vivek Halder, Ananth Sankar
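The crowd re-speaking flow of patent 9418660 above, sketched with stubs; the round-robin speaker assignment and the one-transcript-per-segment choice are illustrative simplifications.

```python
def split_segments(audio, n):
    return [f"{audio}[{i}]" for i in range(n)]

def assign_speakers(segments, speakers, per_segment=2):
    # Round-robin subset selection; real routing could use availability.
    return {seg: [speakers[(i + j) % len(speakers)]
                  for j in range(per_segment)]
            for i, seg in enumerate(segments)}

def respeak_and_transcribe(segment, speaker):
    # Stand-in for sending a segment to a speaker, recording the
    # re-spoken version, and running ASR on it.
    return f"text({segment},{speaker})"

def transcribe(audio, speakers):
    segments = split_segments(audio, 3)
    assignment = assign_speakers(segments, speakers)
    partials = []
    for seg in segments:
        # One partial transcript per segment; a real system would
        # combine results across the assigned subset of speakers.
        partials.append(respeak_and_transcribe(seg, assignment[seg][0]))
    return " ".join(partials)   # combined complete transcript

print(transcribe("meeting.wav", ["s1", "s2", "s3"]))
```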
  • Patent number: 9338199
    Abstract: An example method is provided and includes receiving recorded meeting information, selecting a meeting participant from the recorded meeting information, determining at least one of meeting participant emotion information, meeting participant speaker role information, or meeting participant engagement information based, at least in part, on the meeting information, and determining an interaction map associated with the meeting participant based, at least in part, on at least one of the meeting participant emotion information, the meeting participant speaker role information, or the meeting participant engagement information.
    Type: Grant
    Filed: July 8, 2013
    Date of Patent: May 10, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Matthias Paulik, Vivek Halder
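The interaction-map determination in patent 9338199 above can be pictured as aggregating per-participant signals; the three detectors below are stubs and the "map" is just a nested dict.

```python
def detect_emotion(utterances):
    return "neutral"          # stand-in for an emotion classifier

def detect_role(utterances):
    return "presenter" if len(utterances) > 5 else "listener"

def detect_engagement(utterances):
    return min(1.0, len(utterances) / 10)

def interaction_map(recording):
    # recording: {participant: [utterances]}
    return {p: {"emotion": detect_emotion(u),
                "role": detect_role(u),
                "engagement": detect_engagement(u)}
            for p, u in recording.items()}

meeting = {"ana": ["hi"] * 8, "ben": ["ok"] * 2}
print(interaction_map(meeting))
```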
  • Publication number: 20160078860
    Abstract: Systems and processes are disclosed for discovering trending terms in automatic speech recognition. Candidate terms (e.g., words, phrases, etc.) not yet found in a speech recognizer vocabulary or having low language model probability can be identified based on trending usage in a variety of electronic data sources (e.g., social network feeds, news sources, search queries, etc.). When candidate terms are identified, archives of live or recent speech traffic can be searched to determine whether users are uttering the candidate terms in dictation or speech requests. Such searching can be done using open vocabulary spoken term detection to find phonetic matches in the audio archives. As the candidate terms are found in the speech traffic, notifications can be generated that identify the candidate terms, provide relevant usage statistics, identify the context in which the terms are used, and the like.
    Type: Application
    Filed: August 28, 2015
    Publication date: March 17, 2016
    Inventors: Matthias PAULIK, Gunnar EVERMANN, Laurence S. GILLICK
  • Publication number: 20160063998
    Abstract: Systems and processes for processing speech in a digital assistant are provided. In one example process, a first speech input can be received from a user. The first speech input can be processed using a first automatic speech recognition system to produce a first recognition result. An input indicative of a potential error in the first recognition result can be received. The input can be used to improve the first recognition result. For example, the input can include a second speech input that is a repetition of the first speech input. The second speech input can be processed using a second automatic speech recognition system to produce a second recognition result.
    Type: Application
    Filed: January 7, 2015
    Publication date: March 3, 2016
    Inventors: Mahesh KRISHNAMOORTHY, Matthias PAULIK
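A sketch of the two-pass repair flow in publication 20160063998 above; both recognizers are stubs, and treating the second system as the more accurate one is our reading of the abstract, not a stated requirement.

```python
def first_asr(audio):
    return "call bob mobile"          # fast first-pass stand-in

def second_asr(audio):
    return "call rob's mobile"        # stand-in for the second system

def handle(first_audio, error_signal, repeated_audio=None):
    result = first_asr(first_audio)
    if error_signal and repeated_audio is not None:
        # The repetition of the input is run through the second
        # recognizer to improve on the first result.
        result = second_asr(repeated_audio)
    return result

print(handle("utt1.wav", error_signal=True, repeated_audio="utt2.wav"))
```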
  • Publication number: 20160034811
    Abstract: Systems and processes for generating complementary acoustic models for performing automatic speech recognition system combination are provided. In one example process, a deep neural network can be trained using a set of training data. The trained deep neural network can be a deep neural network acoustic model. A Gaussian-mixture model can be linked to a hidden layer of the trained deep neural network such that any feature vector outputted from the hidden layer is received by the Gaussian-mixture model. The Gaussian-mixture model can be trained via a first portion of the trained deep neural network and using the set of training data. The first portion of the trained deep neural network can include an input layer of the deep neural network and the hidden layer. The first portion of the trained deep neural network and the trained Gaussian-mixture model can be a Deep Neural Network-Gaussian-Mixture Model (DNN-GMM) acoustic model.
    Type: Application
    Filed: September 30, 2014
    Publication date: February 4, 2016
    Inventors: Matthias PAULIK, Mahesh KRISHNAMOORTHY
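A numpy sketch of the DNN-GMM construction in publication 20160034811 above: push data through the first portion of a network (input layer through the hidden layer) and fit a Gaussian to the hidden-layer outputs. Here the network is random and the "mixture" is a single diagonal Gaussian; both are simplifications of the described model.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))     # input layer -> hidden layer
W2 = rng.normal(size=(8, 4))      # hidden layer -> output layer (unused here)

def hidden_features(x):
    # First portion of the network: input layer through the hidden layer.
    return np.maximum(x @ W1, 0.0)   # ReLU hidden activations

X = rng.normal(size=(200, 16))       # stand-in training data
H = hidden_features(X)

# "GMM" linked to the hidden layer: one diagonal Gaussian fit to the
# hidden-layer feature vectors.
mean, var = H.mean(axis=0), H.var(axis=0) + 1e-6

def log_likelihood(x):
    h = hidden_features(x)
    return float(-0.5 * np.sum((h - mean) ** 2 / var + np.log(2 * np.pi * var)))

print(log_likelihood(rng.normal(size=16)))
```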
  • Patent number: 9240181
    Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to a speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the transcription, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the transcription are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: January 19, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Aparna Khare, Neha Agrawal, Sachin S. Kajarekar, Matthias Paulik
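A toy version of the SSR/ASR/NER linkage in patent 9240181 above; the NER rule is reduced to a capitalization check and all data are invented.

```python
# SSR output: (start, end, speaker name) per time segment.
ssr = [(0, 10, "Alice"), (10, 20, "Bob")]
# ASR output: (time, word, confidence) per word region.
asr = [(2, "thanks", 0.9), (4, "Alise", 0.3),
       (12, "agreed", 0.8), (14, "Bop", 0.2)]

def speaker_at(t):
    for start, end, name in ssr:
        if start <= t < end:
            return name
    return None

def likely_name(word):
    return word[0].isupper()      # stand-in NER rule

for t, word, conf in asr:
    if conf < 0.5 and likely_name(word):
        # Associate the low-confidence region with the current
        # segment's speaker (a real system also checks the previous
        # and next segments).
        print(f"region '{word}' at t={t} -> speaker {speaker_at(t)}")
```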
  • Publication number: 20150348547
    Abstract: Systems and processes are disclosed for recognizing speech using a weighted finite state transducer (WFST) approach. Dynamic grammars can be supported by constructing the final recognition cascade during runtime using difference grammars. In a first grammar, non-terminals can be replaced with a weighted phone loop that produces sequences of mono-phone words. In a second grammar, at runtime, non-terminals can be replaced with sub-grammars derived from user-specific usage data including contact, media, and application lists. Interaction frequencies associated with these entities can be used to weight certain words over others. With all non-terminals replaced, a static recognition cascade with the first grammar can be composed with the personalized second grammar to produce a user-specific WFST. User speech can then be processed to generate candidate words having associated probabilities, and the likeliest result can be output.
    Type: Application
    Filed: September 23, 2014
    Publication date: December 3, 2015
    Inventors: Matthias PAULIK, Rongqing HUANG
  • Publication number: 20150199966
    Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
    Type: Application
    Filed: January 15, 2014
    Publication date: July 16, 2015
    Applicant: Cisco Technology, Inc.
    Inventors: Matthias Paulik, Vivek Halder, Ananth Sankar
  • Publication number: 20150058005
    Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to a speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the transcription, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the transcription are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.
    Type: Application
    Filed: August 20, 2013
    Publication date: February 26, 2015
    Applicant: Cisco Technology, Inc.
    Inventors: Aparna Khare, Neha Agrawal, Sachin S. Kajarekar, Matthias Paulik