Patents by Inventor Ozlem Kalinli
Ozlem Kalinli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10424289
Abstract: A speech recognition system includes a phone classifier and a boundary classifier. The phone classifier generates combined boundary posteriors from a combination of auditory attention features and phone posteriors by feeding phone posteriors of neighboring frames of an audio signal into a machine learning algorithm to classify phone posterior context information. The boundary classifier estimates boundaries in speech contained in the audio signal from the combined boundary posteriors.
Type: Grant
Filed: August 14, 2018
Date of Patent: September 24, 2019
Assignee: Sony Interactive Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
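The two-stage design described above lends itself to a small sketch: a first classifier turns context-stacked phone posteriors (plus attention features) into boundary posteriors, and a second classifier makes the final boundary decision. This is a minimal illustration, not the patented implementation; the context width, feature dimensions, and MLP choice are all assumptions.

```python
# Minimal sketch of a two-stage boundary detector in the spirit of the
# abstract. All shapes, the context width, and the MLPs are illustrative
# assumptions, not taken from the patent.
import numpy as np
from sklearn.neural_network import MLPClassifier

def stack_context(posteriors, context=5):
    """Concatenate each frame's phone posteriors with its +/- `context` neighbors."""
    n, d = posteriors.shape
    padded = np.pad(posteriors, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

# Hypothetical training data: 1000 frames, 40 phone classes, 64 attention features.
rng = np.random.default_rng(0)
phone_post = rng.random((1000, 40))
attention_feats = rng.random((1000, 64))
boundary_labels = rng.integers(0, 2, 1000)   # 1 = frame is a boundary

# Stage 1: classify phone-posterior context (plus attention features) into
# "combined boundary posteriors".
ctx = np.hstack([stack_context(phone_post), attention_feats])
stage1 = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300).fit(ctx, boundary_labels)
combined_posteriors = stage1.predict_proba(ctx)

# Stage 2: the boundary classifier estimates boundaries from those posteriors.
stage2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(combined_posteriors, boundary_labels)
boundaries = stage2.predict(combined_posteriors)
```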
-
Publication number: 20190005943
Abstract: A speech recognition system includes a phone classifier and a boundary classifier. The phone classifier generates combined boundary posteriors from a combination of auditory attention features and phone posteriors by feeding phone posteriors of neighboring frames of an audio signal into a machine learning algorithm to classify phone posterior context information. The boundary classifier estimates boundaries in speech contained in the audio signal from the combined boundary posteriors.
Type: Application
Filed: August 14, 2018
Publication date: January 3, 2019
Inventor: Ozlem Kalinli-Akbacak
-
Patent number: 10127927
Abstract: A method for emotion or speaking style recognition and/or clustering comprises receiving one or more speech samples, generating a set of training data by extracting one or more acoustic features from every frame of the one or more speech samples, and generating a model from the set of training data, wherein the model identifies emotion or speaking style dependent information in the set of training data. The method may further comprise receiving one or more test speech samples, generating a set of test data by extracting one or more acoustic features from every frame of the one or more test speech samples, transforming the set of test data using the model to better represent emotion/speaking style dependent information, and using the transformed data for clustering and/or classification to discover speech with similar emotion or speaking style.
Type: Grant
Filed: June 18, 2015
Date of Patent: November 13, 2018
Assignee: Sony Interactive Entertainment Inc.
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
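The train/transform/cluster flow in this abstract can be sketched roughly as follows, using linear discriminant analysis as a stand-in for the model that isolates emotion-dependent structure. The abstract does not name a specific model, so the LDA and k-means choices here are assumptions.

```python
# Rough sketch of the train/transform/cluster flow. LDA and k-means are
# illustrative stand-ins; feature dimensions and labels are hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical per-frame acoustic features (e.g., MFCCs) with emotion labels.
train_feats = rng.random((500, 13))
train_emotion = rng.integers(0, 4, 500)      # 4 emotion/speaking-style classes

# Train a model that emphasizes emotion-dependent directions in the features.
lda = LinearDiscriminantAnalysis(n_components=3).fit(train_feats, train_emotion)

# Transform unlabeled test frames and cluster them by emotion/speaking style.
test_feats = rng.random((200, 13))
transformed = lda.transform(test_feats)
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(transformed)
```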
-
Patent number: 10049657
Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
Type: Grant
Filed: May 26, 2017
Date of Patent: August 14, 2018
Assignee: Sony Interactive Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
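At its simplest, the combination this abstract describes can be illustrated as frame-level feature concatenation feeding a single boundary detector; the shapes and the logistic-regression choice below are illustrative assumptions, not the patent's method.

```python
# Feature-level combination sketch: concatenate auditory attention features
# with phoneme posteriors per frame and train a single boundary detector.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
attention = rng.random((800, 64))    # auditory attention features per frame
phoneme_post = rng.random((800, 40)) # phoneme posteriors per frame
labels = rng.integers(0, 2, 800)     # 1 = phoneme boundary at this frame

combined = np.hstack([attention, phoneme_post])
detector = LogisticRegression(max_iter=1000).fit(combined, labels)
boundary_prob = detector.predict_proba(combined)[:, 1]
```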
-
Publication number: 20170263240
Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
Type: Application
Filed: May 26, 2017
Publication date: September 14, 2017
Inventor: Ozlem Kalinli-Akbacak
-
Patent number: 9672811
Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
Type: Grant
Filed: May 23, 2013
Date of Patent: June 6, 2017
Assignee: Sony Interactive Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
-
Patent number: 9251783
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Grant
Filed: June 17, 2014
Date of Patent: February 2, 2016
Assignee: Sony Computer Entertainment Inc.
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
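A rough end-to-end sketch of the gist pipeline this abstract walks through: filter the auditory spectrum at several scales, average-pool each feature map into a gist vector, concatenate those into a cumulative gist, and map it to boundary labels with a learned classifier. The Gaussian-derivative filters, pooling grid, and toy training set below are assumptions, not the patent's exact design.

```python
# Sketch of the auditory-gist pipeline. Filters, pooling grid, and the toy
# training set are illustrative assumptions.
import numpy as np
from scipy.signal import convolve2d
from sklearn.neural_network import MLPClassifier

def gist_vector(feature_map, grid=(4, 5)):
    """Average-pool a 2D feature map over a coarse grid and flatten."""
    rows = np.array_split(feature_map, grid[0], axis=0)
    return np.array([block.mean() for row in rows
                     for block in np.array_split(row, grid[1], axis=1)])

def kernel(scale):
    """Gaussian-derivative kernel: a stand-in 2D spectro-temporal filter."""
    x = np.arange(-2 * scale, 2 * scale + 1)
    g = np.exp(-x**2 / (2.0 * scale**2))
    return np.outer(g, np.gradient(g))

rng = np.random.default_rng(3)
spectrum = rng.random((128, 50))     # auditory spectrum: freq bins x frames

# Multi-scale feature maps from 2D filters at three scales.
feature_maps = [convolve2d(spectrum, kernel(s), mode="same") for s in (1, 2, 4)]

# Cumulative gist vector: concatenate the gist of every feature map.
cumulative_gist = np.concatenate([gist_vector(m) for m in feature_maps])

# Map cumulative gist vectors to boundary characteristics (toy training set).
gists = np.stack([cumulative_gist + rng.normal(0, 0.01, cumulative_gist.shape)
                  for _ in range(100)])
labels = rng.integers(0, 2, 100)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(gists, labels)
```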
-
Publication number: 20160027452
Abstract: A method for emotion or speaking style recognition and/or clustering comprises receiving one or more speech samples, generating a set of training data by extracting one or more acoustic features from every frame of the one or more speech samples, and generating a model from the set of training data, wherein the model identifies emotion or speaking style dependent information in the set of training data. The method may further comprise receiving one or more test speech samples, generating a set of test data by extracting one or more acoustic features from every frame of the one or more test speech samples, transforming the set of test data using the model to better represent emotion/speaking style dependent information, and using the transformed data for clustering and/or classification to discover speech with similar emotion or speaking style.
Type: Application
Filed: June 18, 2015
Publication date: January 28, 2016
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
-
Patent number: 9244285
Abstract: Methods of eye gaze tracking are provided using magnetized contact lenses tracked by magnetic sensors and/or reflecting contact lenses tracked by video-based sensors. Tracking information of contact lenses from magnetic sensors and video-based sensors may be used to improve eye tracking and/or combined with other sensor data to improve accuracy. Furthermore, reflective contact lenses improve blink detection while eye gaze tracking is otherwise unimpeded by magnetized contact lenses. Additionally, contact lenses may be adapted for viewing 3D information.
Type: Grant
Filed: January 21, 2014
Date of Patent: January 26, 2016
Assignee: Sony Computer Entertainment Inc.
Inventors: Ruxin Chen, Ozlem Kalinli
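The sensor-combination idea, stripped of all hardware detail, can be illustrated with a simple inverse-variance fusion of the two gaze estimates; this weighting scheme is an illustrative assumption, not taken from the patent.

```python
# Toy fusion of a magnetic-sensor gaze estimate with a video-based one.
# Inverse-variance weighting is an assumed, illustrative choice.
import numpy as np

def fuse_gaze(mag_est, mag_var, video_est, video_var):
    """Inverse-variance weighted fusion of two 2D gaze estimates."""
    w_mag = 1.0 / mag_var
    w_vid = 1.0 / video_var
    return (w_mag * mag_est + w_vid * video_est) / (w_mag + w_vid)

# Example: the magnetic sensor is noisier than the video tracker here,
# so the fused estimate leans toward the video measurement.
fused = fuse_gaze(np.array([0.12, -0.30]), 0.04,
                  np.array([0.10, -0.28]), 0.01)
print(fused)
```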
-
Patent number: 9031293
Abstract: Features, including one or more acoustic features, visual features, linguistic features, and physical features, may be extracted with a processor from signals obtained by one or more sensors. The acoustic, visual, linguistic, and physical features may be analyzed with one or more machine learning algorithms, and an emotional state of a user may be extracted from analysis of the features.
Type: Grant
Filed: October 19, 2012
Date of Patent: May 12, 2015
Assignee: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
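An early-fusion sketch of this multimodal idea: concatenate per-modality feature vectors and train one classifier on the result. Feature dimensions and the random-forest choice are assumptions for illustration.

```python
# Early-fusion sketch for multimodal emotional-state estimation.
# All feature dimensions and the classifier choice are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
acoustic = rng.random((300, 13))   # e.g., MFCCs
visual = rng.random((300, 8))      # e.g., facial action units
linguistic = rng.random((300, 5))  # e.g., sentiment scores
physical = rng.random((300, 3))    # e.g., heart rate, skin conductance
emotion = rng.integers(0, 6, 300)  # 6 emotional-state classes

features = np.hstack([acoustic, visual, linguistic, physical])
model = RandomForestClassifier(n_estimators=100).fit(features, emotion)
predicted_state = model.predict(features[:1])
```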
-
Patent number: 9020822
Abstract: Emotion recognition may be implemented on an input window of sound. One or more auditory attention features may be extracted from an auditory spectrum for the window using one or more two-dimensional spectro-temporal receptive filters. One or more feature maps corresponding to the one or more auditory attention features may be generated. Auditory gist features may be extracted from feature maps, and the auditory gist features may be analyzed to determine one or more emotion classes corresponding to the input window of sound. In addition, a bottom-up auditory attention model may be used to select emotionally salient parts of speech and execute emotion recognition only on the salient parts of speech while ignoring the rest of the speech signal.
Type: Grant
Filed: October 19, 2012
Date of Patent: April 28, 2015
Assignee: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
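The saliency-gating idea in the last sentence can be sketched with a crude bottom-up novelty score standing in for the auditory attention model: score each frame, keep the most salient ones, and run emotion recognition only on those. The energy-novelty measure and threshold below are assumptions.

```python
# Saliency gating sketch: a frame-energy novelty score stands in for the
# patent's bottom-up auditory attention model.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
frames = rng.random((400, 40))          # auditory spectrum frames
labels = rng.integers(0, 4, 400)        # emotion classes (toy labels)

energy = frames.sum(axis=1)
novelty = np.abs(np.diff(energy, prepend=energy[0]))  # frame-to-frame change
salient = novelty > np.percentile(novelty, 75)        # keep top 25% of frames

# Train and run emotion recognition on the salient frames only,
# ignoring the rest of the signal.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
clf.fit(frames[salient], labels[salient])
emotions = clf.predict(frames[salient])
```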
-
Patent number: 9009039
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic speech recognition system.
Type: Grant
Filed: June 12, 2009
Date of Patent: April 14, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
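The VTS relationship that NAT builds on is well known: in the log-mel domain, noisy features y relate to clean features x, channel h, and noise n by y ≈ x + h + log(1 + exp(n − x − h)). A minimal sketch of that mismatch function (simplified to the log-mel domain, omitting the cepstral rotation the full formulation uses):

```python
# The standard VTS mismatch function in the log-mel domain:
# y ≈ x + h + log(1 + exp(n - x - h)).
# Simplified sketch; the full cepstral-domain version also applies a DCT.
import numpy as np

def vts_mismatch(x, h, n):
    """Approximate noisy log-mel features y from clean x, channel h, noise n."""
    return x + h + np.log1p(np.exp(n - x - h))

# Example: clean log-mel frame, mild channel offset, moderate noise floor.
x = np.array([2.0, 1.5, 3.0])
h = np.array([0.1, 0.1, 0.1])
n = np.array([1.0, 1.0, 1.0])
print(vts_mismatch(x, h, n))  # noisy features lie above the clean ones
```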
-
Publication number: 20150073794
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Application
Filed: June 17, 2014
Publication date: March 12, 2015
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
-
Publication number: 20140198382
Abstract: Methods of eye gaze tracking are provided using magnetized contact lenses tracked by magnetic sensors and/or reflecting contact lenses tracked by video-based sensors. Tracking information of contact lenses from magnetic sensors and video-based sensors may be used to improve eye tracking and/or combined with other sensor data to improve accuracy. Furthermore, reflective contact lenses improve blink detection while eye gaze tracking is otherwise unimpeded by magnetized contact lenses. Additionally, contact lenses may be adapted for viewing 3D information.
Type: Application
Filed: January 21, 2014
Publication date: July 17, 2014
Applicant: Sony Computer Entertainment Inc.
Inventors: Ruxin Chen, Ozlem Kalinli
-
Patent number: 8756061
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Grant
Filed: April 1, 2011
Date of Patent: June 17, 2014
Assignee: Sony Computer Entertainment Inc.
Inventors: Ozlem Kalinli, Ruxin Chen
-
Publication number: 20140149112
Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
Type: Application
Filed: May 23, 2013
Publication date: May 29, 2014
Applicant: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
-
Publication number: 20140114655
Abstract: Emotion recognition may be implemented on an input window of sound. One or more auditory attention features may be extracted from an auditory spectrum for the window using one or more two-dimensional spectro-temporal receptive filters. One or more feature maps corresponding to the one or more auditory attention features may be generated. Auditory gist features may be extracted from feature maps, and the auditory gist features may be analyzed to determine one or more emotion classes corresponding to the input window of sound. In addition, a bottom-up auditory attention model may be used to select emotionally salient parts of speech and execute emotion recognition only on the salient parts of speech while ignoring the rest of the speech signal.
Type: Application
Filed: October 19, 2012
Publication date: April 24, 2014
Applicant: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
-
Publication number: 20140112556
Abstract: Features, including one or more acoustic features, visual features, linguistic features, and physical features, may be extracted with a processor from signals obtained by one or more sensors. The acoustic, visual, linguistic, and physical features may be analyzed with one or more machine learning algorithms, and an emotional state of a user may be extracted from analysis of the features.
Type: Application
Filed: October 19, 2012
Publication date: April 24, 2014
Applicant: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli-Akbacak
-
Patent number: 8676574
Abstract: In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.
Type: Grant
Filed: November 10, 2010
Date of Patent: March 18, 2014
Assignee: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli
-
Patent number: 8632182
Abstract: Methods of eye gaze tracking are provided using magnetized contact lenses tracked by magnetic sensors and/or reflecting contact lenses tracked by video-based sensors. Tracking information of contact lenses from magnetic sensors and video-based sensors may be used to improve eye tracking and/or combined with other sensor data to improve accuracy. Furthermore, reflective contact lenses improve blink detection while eye gaze tracking is otherwise unimpeded by magnetized contact lenses. Additionally, contact lenses may be adapted for viewing 3D information.
Type: Grant
Filed: May 5, 2011
Date of Patent: January 21, 2014
Assignee: Sony Computer Entertainment Inc.
Inventors: Ruxin Chen, Ozlem Kalinli