Patents by Inventor Ozlem Kalinli
Ozlem Kalinli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10424289
Abstract: A speech recognition system includes a phone classifier and a boundary classifier. The phone classifier generates combined boundary posteriors from a combination of auditory attention features and phone posteriors by feeding phone posteriors of neighboring frames of an audio signal into a machine learning algorithm to classify phone posterior context information. The boundary classifier estimates boundaries in speech contained in the audio signal from the combined boundary posteriors.
Type: Grant
Filed: August 14, 2018
Date of Patent: September 24, 2019
Assignee: Sony Interactive Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
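The two-stage design described above lends itself to a small sketch: a first classifier turns context-stacked phone posteriors (plus attention features) into boundary posteriors, and a second classifier makes the final boundary decision. This is a minimal illustration, not the patented implementation; the context width, feature dimensions, and MLP choice are all assumptions.

```python
# Minimal sketch of a two-stage boundary detector in the spirit of the
# abstract. All shapes, the context width, and the MLPs are illustrative
# assumptions, not taken from the patent.
import numpy as np
from sklearn.neural_network import MLPClassifier

def stack_context(posteriors, context=5):
    """Concatenate each frame's phone posteriors with its +/- `context` neighbors."""
    n, d = posteriors.shape
    padded = np.pad(posteriors, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

# Hypothetical training data: 1000 frames, 40 phone classes, 64 attention features.
rng = np.random.default_rng(0)
phone_post = rng.random((1000, 40))
attention_feats = rng.random((1000, 64))
boundary_labels = rng.integers(0, 2, 1000)   # 1 = frame is a boundary

# Stage 1: classify phone-posterior context (plus attention features) into
# "combined boundary posteriors".
ctx = np.hstack([stack_context(phone_post), attention_feats])
stage1 = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300).fit(ctx, boundary_labels)
combined_posteriors = stage1.predict_proba(ctx)

# Stage 2: the boundary classifier estimates boundaries from those posteriors.
stage2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(combined_posteriors, boundary_labels)
boundaries = stage2.predict(combined_posteriors)
```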
-
Publication number: 20190005943
Abstract: A speech recognition system includes a phone classifier and a boundary classifier. The phone classifier generates combined boundary posteriors from a combination of auditory attention features and phone posteriors by feeding phone posteriors of neighboring frames of an audio signal into a machine learning algorithm to classify phone posterior context information. The boundary classifier estimates boundaries in speech contained in the audio signal from the combined boundary posteriors.
Type: Application
Filed: August 14, 2018
Publication date: January 3, 2019
Inventor: Ozlem Kalinli-Akbacak
-
Patent number: 10127927
Abstract: A method for emotion or speaking style recognition and/or clustering comprises receiving one or more speech samples, generating a set of training data by extracting one or more acoustic features from every frame of the one or more speech samples, and generating a model from the set of training data, wherein the model identifies emotion or speaking style dependent information in the set of training data. The method may further comprise receiving one or more test speech samples, generating a set of test data by extracting one or more acoustic features from every frame of the one or more test speech samples, transforming the set of test data using the model to better represent emotion/speaking style dependent information, and using the transformed data for clustering and/or classification to discover speech with similar emotion or speaking style.
Type: Grant
Filed: June 18, 2015
Date of Patent: November 13, 2018
Assignee: Sony Interactive Entertainment Inc.
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
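The train/transform/cluster flow in this abstract can be sketched roughly as follows, using linear discriminant analysis as a stand-in for the model that isolates emotion-dependent structure. The abstract does not name a specific model, so the LDA and k-means choices here are assumptions.

```python
# Rough sketch of the train/transform/cluster flow. LDA and k-means are
# illustrative stand-ins; feature dimensions and labels are hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical per-frame acoustic features (e.g., MFCCs) with emotion labels.
train_feats = rng.random((500, 13))
train_emotion = rng.integers(0, 4, 500)      # 4 emotion/speaking-style classes

# Train a model that emphasizes emotion-dependent directions in the features.
lda = LinearDiscriminantAnalysis(n_components=3).fit(train_feats, train_emotion)

# Transform unlabeled test frames and cluster them by emotion/speaking style.
test_feats = rng.random((200, 13))
transformed = lda.transform(test_feats)
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(transformed)
```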
-
Patent number: 10049657
Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
Type: Grant
Filed: May 26, 2017
Date of Patent: August 14, 2018
Assignee: Sony Interactive Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
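At its simplest, the combination this abstract describes can be illustrated as frame-level feature concatenation feeding a single boundary detector; the shapes and the logistic-regression choice below are illustrative assumptions, not the patent's method.

```python
# Feature-level combination sketch: concatenate auditory attention features
# with phoneme posteriors per frame and train a single boundary detector.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
attention = rng.random((800, 64))    # auditory attention features per frame
phoneme_post = rng.random((800, 40)) # phoneme posteriors per frame
labels = rng.integers(0, 2, 800)     # 1 = phoneme boundary at this frame

combined = np.hstack([attention, phoneme_post])
detector = LogisticRegression(max_iter=1000).fit(combined, labels)
boundary_prob = detector.predict_proba(combined)[:, 1]
```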
-
Publication number: 20170263240
Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
Type: Application
Filed: May 26, 2017
Publication date: September 14, 2017
Inventor: Ozlem Kalinli-Akbacak
-
Patent number: 9672811
Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
Type: Grant
Filed: May 23, 2013
Date of Patent: June 6, 2017
Assignee: Sony Interactive Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
-
Patent number: 9251783
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Grant
Filed: June 17, 2014
Date of Patent: February 2, 2016
Assignee: Sony Computer Entertainment Inc.
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
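A rough end-to-end sketch of the gist pipeline this abstract walks through: filter the auditory spectrum at several scales, average-pool each feature map into a gist vector, concatenate those into a cumulative gist, and map it to boundary labels with a learned classifier. The Gaussian-derivative filters, pooling grid, and toy training set below are assumptions, not the patent's exact design.

```python
# Sketch of the auditory-gist pipeline. Filters, pooling grid, and the toy
# training set are illustrative assumptions.
import numpy as np
from scipy.signal import convolve2d
from sklearn.neural_network import MLPClassifier

def gist_vector(feature_map, grid=(4, 5)):
    """Average-pool a 2D feature map over a coarse grid and flatten."""
    rows = np.array_split(feature_map, grid[0], axis=0)
    return np.array([block.mean() for row in rows
                     for block in np.array_split(row, grid[1], axis=1)])

def kernel(scale):
    """Gaussian-derivative kernel: a stand-in 2D spectro-temporal filter."""
    x = np.arange(-2 * scale, 2 * scale + 1)
    g = np.exp(-x**2 / (2.0 * scale**2))
    return np.outer(g, np.gradient(g))

rng = np.random.default_rng(3)
spectrum = rng.random((128, 50))     # auditory spectrum: freq bins x frames

# Multi-scale feature maps from 2D filters at three scales.
feature_maps = [convolve2d(spectrum, kernel(s), mode="same") for s in (1, 2, 4)]

# Cumulative gist vector: concatenate the gist of every feature map.
cumulative_gist = np.concatenate([gist_vector(m) for m in feature_maps])

# Map cumulative gist vectors to boundary characteristics (toy training set).
gists = np.stack([cumulative_gist + rng.normal(0, 0.01, cumulative_gist.shape)
                  for _ in range(100)])
labels = rng.integers(0, 2, 100)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(gists, labels)
```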
-
Publication number: 20160027452
Abstract: A method for emotion or speaking style recognition and/or clustering comprises receiving one or more speech samples, generating a set of training data by extracting one or more acoustic features from every frame of the one or more speech samples, and generating a model from the set of training data, wherein the model identifies emotion or speaking style dependent information in the set of training data. The method may further comprise receiving one or more test speech samples, generating a set of test data by extracting one or more acoustic features from every frame of the one or more test speech samples, transforming the set of test data using the model to better represent emotion/speaking style dependent information, and using the transformed data for clustering and/or classification to discover speech with similar emotion or speaking style.
Type: Application
Filed: June 18, 2015
Publication date: January 28, 2016
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
-
Patent number: 9244285
Abstract: Methods of eye gaze tracking are provided using magnetized contact lenses tracked by magnetic sensors and/or reflecting contact lenses tracked by video-based sensors. Tracking information of contact lenses from magnetic sensors and video-based sensors may be used to improve eye tracking and/or combined with other sensor data to improve accuracy. Furthermore, reflective contact lenses improve blink detection while eye gaze tracking is otherwise unimpeded by magnetized contact lenses. Additionally, contact lenses may be adapted for viewing 3D information.
Type: Grant
Filed: January 21, 2014
Date of Patent: January 26, 2016
Assignee: Sony Computer Entertainment Inc.
Inventors: Ruxin Chen, Ozlem Kalinli
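The sensor-combination idea, stripped of all hardware detail, can be illustrated with a simple inverse-variance fusion of the two gaze estimates; this weighting scheme is an illustrative assumption, not taken from the patent.

```python
# Toy fusion of a magnetic-sensor gaze estimate with a video-based one.
# Inverse-variance weighting is an assumed, illustrative choice.
import numpy as np

def fuse_gaze(mag_est, mag_var, video_est, video_var):
    """Inverse-variance weighted fusion of two 2D gaze estimates."""
    w_mag = 1.0 / mag_var
    w_vid = 1.0 / video_var
    return (w_mag * mag_est + w_vid * video_est) / (w_mag + w_vid)

# Example: the magnetic sensor is noisier than the video tracker here,
# so the fused estimate leans toward the video measurement.
fused = fuse_gaze(np.array([0.12, -0.30]), 0.04,
                  np.array([0.10, -0.28]), 0.01)
print(fused)
```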
-
Patent number: 9031293
Abstract: Features, including one or more acoustic features, visual features, linguistic features, and physical features, may be extracted with a processor from signals obtained by one or more sensors. The acoustic, visual, linguistic, and physical features may be analyzed with one or more machine learning algorithms, and an emotional state of a user may be extracted from analysis of the features.
Type: Grant
Filed: October 19, 2012
Date of Patent: May 12, 2015
Assignee: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
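An early-fusion sketch of this multimodal idea: concatenate per-modality feature vectors and train one classifier on the result. Feature dimensions and the random-forest choice are assumptions for illustration.

```python
# Early-fusion sketch for multimodal emotional-state estimation.
# All feature dimensions and the classifier choice are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
acoustic = rng.random((300, 13))   # e.g., MFCCs
visual = rng.random((300, 8))      # e.g., facial action units
linguistic = rng.random((300, 5))  # e.g., sentiment scores
physical = rng.random((300, 3))    # e.g., heart rate, skin conductance
emotion = rng.integers(0, 6, 300)  # 6 emotional-state classes

features = np.hstack([acoustic, visual, linguistic, physical])
model = RandomForestClassifier(n_estimators=100).fit(features, emotion)
predicted_state = model.predict(features[:1])
```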
-
Patent number: 9020822
Abstract: Emotion recognition may be implemented on an input window of sound. One or more auditory attention features may be extracted from an auditory spectrum for the window using one or more two-dimensional spectro-temporal receptive filters. One or more feature maps corresponding to the one or more auditory attention features may be generated. Auditory gist features may be extracted from feature maps, and the auditory gist features may be analyzed to determine one or more emotion classes corresponding to the input window of sound. In addition, a bottom-up auditory attention model may be used to select emotionally salient parts of speech and execute emotion recognition only on the salient parts of speech while ignoring the rest of the speech signal.
Type: Grant
Filed: October 19, 2012
Date of Patent: April 28, 2015
Assignee: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
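The saliency-gating idea in the last sentence can be sketched with a crude bottom-up novelty score standing in for the auditory attention model: score each frame, keep the most salient ones, and run emotion recognition only on those. The energy-novelty measure and threshold below are assumptions.

```python
# Saliency gating sketch: a frame-energy novelty score stands in for the
# patent's bottom-up auditory attention model.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
frames = rng.random((400, 40))          # auditory spectrum frames
labels = rng.integers(0, 4, 400)        # emotion classes (toy labels)

energy = frames.sum(axis=1)
novelty = np.abs(np.diff(energy, prepend=energy[0]))  # frame-to-frame change
salient = novelty > np.percentile(novelty, 75)        # keep top 25% of frames

# Train and run emotion recognition on the salient frames only,
# ignoring the rest of the signal.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
clf.fit(frames[salient], labels[salient])
emotions = clf.predict(frames[salient])
```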
-
Patent number: 9009039
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic speech recognition system.
Type: Grant
Filed: June 12, 2009
Date of Patent: April 14, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
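The VTS relationship that NAT builds on is well known: in the log-mel domain, noisy features y relate to clean features x, channel h, and noise n by y ≈ x + h + log(1 + exp(n − x − h)). A minimal sketch of that mismatch function (simplified to the log-mel domain, omitting the cepstral rotation the full formulation uses):

```python
# The standard VTS mismatch function in the log-mel domain:
# y ≈ x + h + log(1 + exp(n - x - h)).
# Simplified sketch; the full cepstral-domain version also applies a DCT.
import numpy as np

def vts_mismatch(x, h, n):
    """Approximate noisy log-mel features y from clean x, channel h, noise n."""
    return x + h + np.log1p(np.exp(n - x - h))

# Example: clean log-mel frame, mild channel offset, moderate noise floor.
x = np.array([2.0, 1.5, 3.0])
h = np.array([0.1, 0.1, 0.1])
n = np.array([1.0, 1.0, 1.0])
print(vts_mismatch(x, h, n))  # noisy features lie above the clean ones
```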
-
Publication number: 20150073794
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Application
Filed: June 17, 2014
Publication date: March 12, 2015
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
-
Publication number: 20140198382
Abstract: Methods of eye gaze tracking are provided using magnetized contact lenses tracked by magnetic sensors and/or reflecting contact lenses tracked by video-based sensors. Tracking information of contact lenses from magnetic sensors and video-based sensors may be used to improve eye tracking and/or combined with other sensor data to improve accuracy. Furthermore, reflective contact lenses improve blink detection while eye gaze tracking is otherwise unimpeded by magnetized contact lenses. Additionally, contact lenses may be adapted for viewing 3D information.
Type: Application
Filed: January 21, 2014
Publication date: July 17, 2014
Applicant: Sony Computer Entertainment Inc.
Inventors: Ruxin Chen, Ozlem Kalinli
-
Patent number: 8756061
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Grant
Filed: April 1, 2011
Date of Patent: June 17, 2014
Assignee: Sony Computer Entertainment Inc.
Inventors: Ozlem Kalinli, Ruxin Chen
-
Publication number: 20140149112
Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
Type: Application
Filed: May 23, 2013
Publication date: May 29, 2014
Applicant: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
-
Publication number: 20140114655
Abstract: Emotion recognition may be implemented on an input window of sound. One or more auditory attention features may be extracted from an auditory spectrum for the window using one or more two-dimensional spectro-temporal receptive filters. One or more feature maps corresponding to the one or more auditory attention features may be generated. Auditory gist features may be extracted from feature maps, and the auditory gist features may be analyzed to determine one or more emotion classes corresponding to the input window of sound. In addition, a bottom-up auditory attention model may be used to select emotionally salient parts of speech and execute emotion recognition only on the salient parts of speech while ignoring the rest of the speech signal.
Type: Application
Filed: October 19, 2012
Publication date: April 24, 2014
Applicant: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
-
Publication number: 20140112556
Abstract: Features, including one or more acoustic features, visual features, linguistic features, and physical features, may be extracted with a processor from signals obtained by one or more sensors. The acoustic, visual, linguistic, and physical features may be analyzed with one or more machine learning algorithms, and an emotional state of a user may be extracted from analysis of the features.
Type: Application
Filed: October 19, 2012
Publication date: April 24, 2014
Applicant: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
-
Patent number: 8676574
Abstract: In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.
Type: Grant
Filed: November 10, 2010
Date of Patent: March 18, 2014
Assignee: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli
-
Patent number: 8632182
Abstract: Methods of eye gaze tracking are provided using magnetized contact lenses tracked by magnetic sensors and/or reflecting contact lenses tracked by video-based sensors. Tracking information of contact lenses from magnetic sensors and video-based sensors may be used to improve eye tracking and/or combined with other sensor data to improve accuracy. Furthermore, reflective contact lenses improve blink detection while eye gaze tracking is otherwise unimpeded by magnetized contact lenses. Additionally, contact lenses may be adapted for viewing 3D information.
Type: Grant
Filed: May 5, 2011
Date of Patent: January 21, 2014
Assignee: Sony Computer Entertainment Inc.
Inventors: Ruxin Chen, Ozlem Kalinli