Patents by Inventor Mehrez Souden

Mehrez Souden has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200312315
    Abstract: An acoustic-environment-aware method for selecting a high-quality audio stream during multi-stream speech recognition. A number of input audio streams are processed to determine whether a voice trigger is detected, and if so, a voice trigger score is calculated for each stream. An acoustic environment measurement is also calculated for each audio stream. The trigger score and acoustic environment measurement are combined for each audio stream, and the audio stream with the highest combined score is selected as the preferred audio stream. The preferred audio stream is output to an automatic speech recognizer. Other aspects are also described and claimed.
    Type: Application
    Filed: March 28, 2019
    Publication date: October 1, 2020
    Inventors: Feipeng Li, Mehrez Souden, Joshua D. Atkins, John Bridle, Charles P. Clark, Stephen H. Shum, Sachin S. Kajarekar, Haiying Xia, Erik Marchi
  • Patent number: 10546593
    Abstract: A number of features are extracted from a current frame of a multi-channel speech pickup and from side information that is a linear echo estimate, a diffuse signal component, or a noise estimate of the multi-channel speech pickup. A speech presence probability (SPP) based on a deep neural network (DNN) is produced for the current frame, where the SPP value is produced in response to the extracted features being input to the DNN. The DNN-based SPP value is applied to configure a multi-channel filter whose input is the multi-channel speech pickup and whose output is a single audio signal. In one aspect, the system is designed to run online, at latency low enough for real-time applications such as voice trigger detection. Other aspects are also described and claimed.
    Type: Grant
    Filed: December 4, 2017
    Date of Patent: January 28, 2020
    Assignee: Apple Inc.
    Inventors: Jason Wung, Mehrez Souden, Ramin Pishehvar, Joshua D. Atkins
  • Patent number: 10403299
    Abstract: A digital speech enhancement system that performs a specific chain of digital signal processing operations upon a multi-channel sound pickup, resulting in a single, enhanced speech signal. The operations are designed to have low computational complexity, yet as a whole they yield an enhanced speech signal that produces accurate voice trigger detection and low word error rates from an automatic speech recognizer (ASR). The constituent operations or components of the system have been chosen so that the overall system is robust to changing acoustic conditions and can deliver the enhanced speech signal with latency low enough for the system to be used online (enabling real-time voice trigger detection and streaming ASR). Other embodiments are also described and claimed.
    Type: Grant
    Filed: June 2, 2017
    Date of Patent: September 3, 2019
    Assignee: Apple Inc.
    Inventors: Jason Wung, Joshua D. Atkins, Ramin Pishehvar, Mehrez Souden
  • Patent number: 10334357
    Abstract: Impulse responses of a device are measured. A database of sound files is generated by convolving source signals with the impulse responses of the device. The sound files from the database are transformed into the time-frequency domain. One or more sub-band directional features are estimated at each sub-band of the time-frequency domain. A deep neural network (DNN) is trained for each sub-band based on the estimated sub-band directional features and a target directional feature.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: June 25, 2019
    Assignee: Apple Inc.
    Inventors: Joshua D. Atkins, Mehrez Souden, Symeon Delikaris-Manias, Peter Raffensperger
  • Publication number: 20190172476
    Abstract: A number of features are extracted from a current frame of a multi-channel speech pickup and from side information that is a linear echo estimate, a diffuse signal component, or a noise estimate of the multi-channel speech pickup. A speech presence probability (SPP) based on a deep neural network (DNN) is produced for the current frame, where the SPP value is produced in response to the extracted features being input to the DNN. The DNN-based SPP value is applied to configure a multi-channel filter whose input is the multi-channel speech pickup and whose output is a single audio signal. In one aspect, the system is designed to run online, at latency low enough for real-time applications such as voice trigger detection. Other aspects are also described and claimed.
    Type: Application
    Filed: December 4, 2017
    Publication date: June 6, 2019
    Inventors: Jason Wung, Mehrez Souden, Ramin Pishehvar, Joshua D. Atkins
  • Publication number: 20190104357
    Abstract: Impulse responses of a device are measured. A database of sound files is generated by convolving source signals with the impulse responses of the device. The sound files from the database are transformed into the time-frequency domain. One or more sub-band directional features are estimated at each sub-band of the time-frequency domain. A deep neural network (DNN) is trained for each sub-band based on the estimated sub-band directional features and a target directional feature.
    Type: Application
    Filed: September 29, 2017
    Publication date: April 4, 2019
    Inventors: Joshua D. Atkins, Mehrez Souden, Symeon Delikaris-Manias, Peter Raffensperger
  • Publication number: 20190074009
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. In accordance with one example, a method includes, at an electronic device with one or more processors, memory, and a plurality of microphones, sampling, at each of the plurality of microphones of the electronic device, an audio signal to obtain a plurality of audio signals; processing the plurality of audio signals to obtain a plurality of audio streams; and determining, based on the plurality of audio streams, whether any of the plurality of audio signals corresponds to a spoken trigger. The method further includes, in accordance with a determination that the plurality of audio signals corresponds to the spoken trigger, initiating a session of the digital assistant; and in accordance with a determination that the plurality of audio signals does not correspond to the spoken trigger, forgoing initiating a session of the digital assistant.
    Type: Application
    Filed: November 5, 2018
    Publication date: March 7, 2019
    Inventors: Yoon Kim, John Bridle, Joshua D. Atkins, Feipeng Li, Mehrez Souden
  • Publication number: 20180350379
    Abstract: A digital speech enhancement system that performs a specific chain of digital signal processing operations upon a multi-channel sound pickup, resulting in a single, enhanced speech signal. The operations are designed to have low computational complexity, yet as a whole they yield an enhanced speech signal that produces accurate voice trigger detection and low word error rates from an automatic speech recognizer (ASR). The constituent operations or components of the system have been chosen so that the overall system is robust to changing acoustic conditions and can deliver the enhanced speech signal with latency low enough for the system to be used online (enabling real-time voice trigger detection and streaming ASR). Other embodiments are also described and claimed.
    Type: Application
    Filed: June 2, 2017
    Publication date: December 6, 2018
    Inventors: Jason Wung, Joshua D. Atkins, Ramin Pishehvar, Mehrez Souden
  • Publication number: 20180336892
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. In accordance with one example, a method includes, at an electronic device with one or more processors, memory, and a plurality of microphones, sampling, at each of the plurality of microphones of the electronic device, an audio signal to obtain a plurality of audio signals; processing the plurality of audio signals to obtain a plurality of audio streams; and determining, based on the plurality of audio streams, whether any of the plurality of audio signals corresponds to a spoken trigger. The method further includes, in accordance with a determination that the plurality of audio signals corresponds to the spoken trigger, initiating a session of the digital assistant; and in accordance with a determination that the plurality of audio signals does not correspond to the spoken trigger, forgoing initiating a session of the digital assistant.
    Type: Application
    Filed: March 13, 2018
    Publication date: November 22, 2018
    Inventors: Yoon Kim, John Bridle, Joshua D. Atkins, Feipeng Li, Mehrez Souden
  • Patent number: 9754608
    Abstract: A noise estimation apparatus that estimates a non-stationary noise component on the basis of a likelihood maximization criterion is provided. Using the complex spectra of a plurality of observed signals up to the current frame, the apparatus obtains the noise variance that maximizes a weighted sum over frames, in which each frame contributes the product of the log-likelihood of a Gaussian model of the observed signal in a speech segment and that frame's speech posterior probability, plus the product of the log-likelihood of a Gaussian model of the observed signal in a non-speech segment and that frame's non-speech posterior probability.
    Type: Grant
    Filed: January 30, 2013
    Date of Patent: September 5, 2017
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Mehrez Souden, Keisuke Kinoshita, Tomohiro Nakatani, Marc Delcroix, Takuya Yoshioka
  • Publication number: 20150032445
    Abstract: A noise estimation apparatus that estimates a non-stationary noise component on the basis of a likelihood maximization criterion is provided. Using the complex spectra of a plurality of observed signals up to the current frame, the apparatus obtains the noise variance that maximizes a weighted sum over frames, in which each frame contributes the product of the log-likelihood of a Gaussian model of the observed signal in a speech segment and that frame's speech posterior probability, plus the product of the log-likelihood of a Gaussian model of the observed signal in a non-speech segment and that frame's non-speech posterior probability.
    Type: Application
    Filed: January 30, 2013
    Publication date: January 29, 2015
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Mehrez Souden, Keisuke Kinoshita, Tomohiro Nakatani, Marc Delcroix, Takuya Yoshioka
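The stream-selection step described in publication 20200312315 (combine a voice trigger score with an acoustic environment measurement per stream, then pick the highest combined score) can be sketched as below. This is an illustration under stated assumptions, not the claimed method: the abstract does not say how the two scores are combined, so the weighted linear sum, the `weight` parameter, and the names `StreamScores` and `select_preferred_stream` are all hypothetical. The sketch also assumes a stream is a candidate only if a trigger was detected on it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StreamScores:
    """Per-stream scores (hypothetical structure for this sketch)."""
    trigger_score: Optional[float]  # None when no voice trigger was detected on the stream
    environment_score: float        # assumed: higher means a cleaner acoustic environment


def select_preferred_stream(
    streams: Sequence[StreamScores], weight: float = 0.5
) -> Optional[int]:
    """Return the index of the stream with the highest combined score.

    The linear combination controlled by `weight` is an assumption made for
    illustration. Returns None when no stream detected the trigger.
    """
    best_index, best_score = None, float("-inf")
    for i, s in enumerate(streams):
        if s.trigger_score is None:
            continue  # only streams with a detected trigger are candidates
        combined = weight * s.trigger_score + (1.0 - weight) * s.environment_score
        if combined > best_score:
            best_index, best_score = i, combined
    return best_index
```

For example, with scores `(0.9, 0.2)`, `(0.7, 0.9)`, and `(None, 1.0)`, the combined scores of the two candidates are 0.55 and 0.8, so index 1 is selected: a slightly weaker trigger score on a much cleaner stream wins.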
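Patent 10546593 and publication 20190172476 apply a DNN-based speech presence probability (SPP) to configure a multi-channel filter, but the abstract does not say how. One common way to use an SPP for this, shown here as an assumption rather than the claimed method, is SPP-weighted recursive averaging of the noise spatial covariance matrix for one frequency bin; a beamformer such as MVDR could then be built from that matrix. The function name `update_noise_covariance` and the default smoothing constant are hypothetical, and the SPP is passed in directly instead of being computed by a DNN.

```python
from typing import List, Sequence

Matrix = List[List[complex]]


def update_noise_covariance(
    phi_n: Matrix, y: Sequence[complex], spp: float, alpha: float = 0.95
) -> Matrix:
    """One SPP-weighted recursive update of a noise covariance matrix.

    phi_n: current M x M noise covariance estimate for one frequency bin.
    y:     M-channel STFT observation for the current frame and bin.
    spp:   speech presence probability in [0, 1] (assumed given, e.g. by a DNN).
    alpha: base smoothing constant (hypothetical default).
    """
    if not 0.0 <= spp <= 1.0:
        raise ValueError("spp must lie in [0, 1]")
    # When speech is likely present (spp near 1), the effective smoothing
    # approaches 1 and the noise estimate is effectively frozen.
    alpha_eff = alpha + (1.0 - alpha) * spp
    m = len(y)
    # Blend the previous estimate with the instantaneous outer product y y^H.
    return [
        [alpha_eff * phi_n[i][j] + (1.0 - alpha_eff) * y[i] * y[j].conjugate()
         for j in range(m)]
        for i in range(m)
    ]
```

The design choice here is that the SPP only slows down the noise update rather than switching it on or off, so the estimate keeps adapting during low-confidence frames.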
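Patent 9754608 and publication 20150032445 estimate the noise variance by maximizing a posterior-weighted log-likelihood. The sketch below is a deliberate simplification, not the claimed apparatus: it uses a single channel and keeps only the non-speech term. Under a zero-mean complex Gaussian model with variance λ, maximizing Σ_t (1 − p_t) log N(Y_t; 0, λ) over λ gives λ = Σ_t (1 − p_t)|Y_t|² / Σ_t (1 − p_t), where p_t is the speech posterior of frame t. The function name `track_noise_variance`, the `forgetting` factor, and the `eps` floor are hypothetical additions that turn this into an online estimate using frames up to the current one.

```python
from typing import Iterable, List, Tuple


def track_noise_variance(
    frames: Iterable[Tuple[complex, float]], forgetting: float = 1.0, eps: float = 1e-12
) -> List[float]:
    """Posterior-weighted noise-variance estimate after each frame.

    frames:     (Y_t, p_t) pairs, where Y_t is a single-channel STFT coefficient
                and p_t is the speech posterior probability of frame t.
    forgetting: weight applied to past statistics (1.0 means no forgetting).
    Returns the estimate available after each frame.
    """
    num = 0.0  # accumulated (1 - p_t) * |Y_t|^2
    den = 0.0  # accumulated (1 - p_t)
    estimates = []
    for y, p in frames:
        q = 1.0 - p  # non-speech posterior of this frame
        num = forgetting * num + q * abs(y) ** 2
        den = forgetting * den + q
        estimates.append(num / max(den, eps))  # eps avoids division by zero
    return estimates
```

A frame judged to be certain speech (p_t = 1) leaves the estimate unchanged, while a frame with p_t = 0.5 contributes with half weight.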