Patents by Inventor Ossama A. ABDELHAMID

Ossama A. ABDELHAMID has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Low-latency multi-speaker speech recognition

Patent number: 11475898

Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example, a method includes receiving mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources. The method further includes obtaining a target speaker representation, which represents speech characteristics of the target speaker; and determining, using a learning network, probability distributions of phonetic elements directly from the mixed speech data. The inputs of the learning network include the mixed speech data and the target speaker representation. An output of the learning network includes the probability distributions of phonetic elements. The method further includes generating text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and providing a response to the target speaker based on the text corresponding to the utterances of the target speaker.

Type: Grant

Filed: August 7, 2019

Date of Patent: October 18, 2022

Assignee: Apple Inc.

Inventors: Masood Delfarah, Ossama A. Abdelhamid, Kyuyeon Hwang, Donald R. McAllaster, Sabato Marco Siniscalchi

LOW-LATENCY MULTI-SPEAKER SPEECH RECOGNITION

Publication number: 20200135209

Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example, a method includes receiving mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources. The method further includes obtaining a target speaker representation, which represents speech characteristics of the target speaker; and determining, using a learning network, probability distributions of phonetic elements directly from the mixed speech data. The inputs of the learning network include the mixed speech data and the target speaker representation. An output of the learning network includes the probability distributions of phonetic elements. The method further includes generating text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and providing a response to the target speaker based on the text corresponding to the utterances of the target speaker.

Type: Application

Filed: August 7, 2019

Publication date: April 30, 2020

Inventors: Masood DELFARAH, Ossama A. ABDELHAMID, Kyuyeon HWANG, Donald R. MCALLASTER, Sabato Marco SINISCALCHI

System and method for applying a convolutional neural network to speech recognition

Patent number: 9734824

Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.

Type: Grant

Filed: May 25, 2015

Date of Patent: August 15, 2017

Assignees: THE GOVERNING COUNCIL OF THE UNIVERSITY OF TORONTO

Inventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed

System and method for applying a convolutional neural network to speech recognition

Patent number: 9190053

Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.

Type: Grant

Filed: March 25, 2013

Date of Patent: November 17, 2015

Assignees: THE GOVERNING COUNCIL OF THE UNIVERSITY OF TORONTO

Inventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed

SYSTEM AND METHOD FOR APPLYING A CONVOLUTIONAL NEURAL NETWORK TO SPEECH RECOGNITION

Publication number: 20150255062

Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.

Type: Application

Filed: May 25, 2015

Publication date: September 10, 2015

Inventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed

SYSTEM AND METHOD FOR APPLYING A CONVOLUTIONAL NEURAL NETWORK TO SPEECH RECOGNITION

Publication number: 20140288928

Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.

Type: Application

Filed: March 25, 2013

Publication date: September 25, 2014

Inventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed
