Patents by Inventor Xavier Menendez-Pidal

Xavier Menendez-Pidal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190341025
    Abstract: A system and method for multimodal classification of user characteristics is described. The method comprises receiving audio and other inputs, extracting fundamental frequency information from the audio input, extracting other feature information from the video input, classifying the fundamental frequency information, textual information and video feature information using the multimodal neural network.
    Type: Application
    Filed: April 15, 2019
    Publication date: November 7, 2019
    Inventors: Masanori Omote, Ruxin Chen, Xavier Menendez-Pidal, Jaekwon Yoo, Koji Tashiro, Sudha Krishnamurthy, Komath Naveen Kumar
  • Patent number: 10376785
    Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video camera, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: August 13, 2019
    Assignee: SONY INTERACTIVE ENTERTAINMENT INC.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric J. Larsen, Xiaodong Mao
  • Publication number: 20190013015
    Abstract: A method for improved initialization of speech recognition system comprises mapping a trained hidden markov model based recognition node network (HMM) to a Connectionist Temporal Classification (CTC) based node label scheme. The central state of each frame in the HMM are mapped to CTC-labeled output nodes and the non-central states of each frame are mapped to CTC-blank nodes to generate a CTC-labeled HMM and each central state represents a phoneme from human speech detected and extracted by a computing device. Next the CTC-labeled HMM is trained using a cost function, wherein the cost function is not part of a CTC cost function. Finally the CTC-labeled HMM is trained using a CTC cost function to produce a CTC node network. The CTC node network may be iteratively trained by repeating the initialization steps.
    Type: Application
    Filed: July 10, 2017
    Publication date: January 10, 2019
    Inventors: Xavier Menendez-Pidal, Ruxin Chen
  • Publication number: 20160310847
    Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video camera, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
    Type: Application
    Filed: June 30, 2016
    Publication date: October 27, 2016
    Applicant: Sony Interactive Entertainment Inc.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric J. Larsen, Xiaodong Mao
  • Patent number: 9405363
    Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video camera, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
    Type: Grant
    Filed: August 13, 2014
    Date of Patent: August 2, 2016
    Assignee: SONY INTERACTIVE ENTERTAINMENT INC. (SIEI)
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric J. Larsen, Xiaodong Mao
  • Publication number: 20140347272
    Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video camera, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
    Type: Application
    Filed: August 13, 2014
    Publication date: November 27, 2014
    Applicant: Sony Computer Entertainment Inc.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric J. Larsen, Xiaodong Mao
  • Patent number: 8825482
    Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video camera, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
    Type: Grant
    Filed: September 15, 2006
    Date of Patent: September 2, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric Larsen, Xiaodong Mao
  • Patent number: 8788256
    Abstract: Computer implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
    Type: Grant
    Filed: February 2, 2010
    Date of Patent: July 22, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
  • Patent number: 8719023
    Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.
    Type: Grant
    Filed: May 21, 2010
    Date of Patent: May 6, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Xavier Menendez-Pidal, Ruxin Chen
  • Publication number: 20110288869
    Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.
    Type: Application
    Filed: May 21, 2010
    Publication date: November 24, 2011
    Inventors: Xavier Menendez-Pidal, Ruxin Chen
  • Publication number: 20100211376
    Abstract: Computer implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
    Type: Application
    Filed: February 2, 2010
    Publication date: August 19, 2010
    Applicant: Sony Computer Entertainment Inc.
    Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
  • Patent number: 7716047
    Abstract: A system and method for an automatic set-up of speech recognition engines may include a speech recognizer configured to perform speech recognition procedures to identify input speech data according to one or more operating parameters. A merit manager may be utilized to automatically calculate merit values corresponding to the foregoing recognition procedures. These merit values may incorporate recognition accuracy information, recognition speed information, and a user-specified weighting factor that shifts the relative effect of the recognition accuracy information and the recognition speed information on the merit values. The merit manager may then automatically perform a merit value optimization procedure to select operating parameters that correspond to an optimal one of the merit values.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: May 11, 2010
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Thomas Kemp, Katsuki Minamino, Helmut Lucke
  • Patent number: 7502731
    Abstract: The present invention comprises a system and method for speech recognition utilizing a multi-language dictionary, and may include a recognizer that is configured to compare input speech data to a series of dictionary entries from the multi-language dictionary to detect a recognized phrase or command. The multi-language dictionary may be implemented with a mixed-language technique that utilizes dictionary entries which incorporate multiple different languages such as Cantonese and English. The speech recognizer may thus advantageously achieve more accurate speech recognition accuracy in an efficient and compact manner.
    Type: Grant
    Filed: August 11, 2003
    Date of Patent: March 10, 2009
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
  • Patent number: 7467086
    Abstract: A system and method for effectively performing speech recognition procedures includes enhanced demiphone acoustic models that a speech recognition engine utilizes to perform the speech recognition procedures. The enhanced demiphone acoustic models each have three states that are collectively arranged to form a preceding demiphone and a succeeding demiphone. An acoustic model generator may utilize a decision tree for analyzing speech context information from a training database. The acoustic model generator then effectively configures each of the enhanced demiphone acoustic models as either a succeeding-dominant enhanced demiphone acoustic model or a preceding-dominant enhanced demiphone acoustic model to accurately model speech characteristics.
    Type: Grant
    Filed: December 16, 2004
    Date of Patent: December 16, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lex S. Olorenshaw, Gustavo Hernandez Abrego
  • Patent number: 7392186
    Abstract: A system and method for effectively implementing an optimized language model for speech recognition includes initial language models each created by combining source models according to selectable interpolation coefficients that define proportional relationships for combining the source models. A rescoring module iteratively utilizes the initial language models to process input development data for calculating word-error rates that each correspond to a different one of the initial language models. An optimized language model is then selected from the initial language models by identifying an optimal word-error rate from among the foregoing word-error rates. The speech recognizer may then utilize the optimized language model for effectively performing various speech recognition procedures.
    Type: Grant
    Filed: March 30, 2004
    Date of Patent: June 24, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Lei Duan, Gustavo Abrego, Xavier Menendez-Pidal, Lex Olorenshaw
  • Patent number: 7353174
    Abstract: The present invention comprises a system and method for effectively implementing a Mandarin Chinese speech recognition dictionary, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may efficiently be implemented by utilizing an allophone and phonemic variation technique. In addition, the foregoing vocabulary dictionary may be implemented by utilizing unified dictionary optimization techniques to provide robust and accurate speech recognition. Furthermore, the vocabulary dictionary may be implemented as an optimized dictionary to accurately recognize either Northern Mandarin Chinese speech or Southern Mandarin Chinese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
  • Patent number: 7353172
    Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
  • Patent number: 7353173
    Abstract: The present invention comprises a system and method for implementing a Mandarin Chinese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Mandarin Chinese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Mandarin Chinese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
  • Patent number: 7272560
    Abstract: A system and method for performing a refinement procedure to effectively implement a speech recognition dictionary for spontaneous speech recognition may include a problematic word identifier configured to divide vocabulary words from an initial speech recognition dictionary into problematic words and non-problematic words according to pre-defined identification criteria. A candidate generator may analyze the problematic words to produce one or more pronunciation candidates for each of the problematic words. An optimization module may then perform an optimization process for refining one or more pronunciation candidates according to certain optimization criteria to thereby generate optimized problematic pronunciations. A dictionary refinement manager may finally combine the optimized problematic pronunciations with non-problematic pronunciations of the non-problematic words to produce a refined speech recognition dictionary for use by the speech recognition system.
    Type: Grant
    Filed: March 22, 2004
    Date of Patent: September 18, 2007
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Gustavo Hernandez Abrego, Xavier Menendez-Pidal, Lex Olorenshaw
  • Publication number: 20070061142
    Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video camera, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
    Type: Application
    Filed: September 15, 2006
    Publication date: March 15, 2007
    Applicant: Sony Computer Entertainment Inc.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric Larsen, Xiaodong Mao