Patents by Inventor Viktor Rozgic

Viktor Rozgic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11869535
    Abstract: Described is a system and method that determines character sequences from speech, without determining the words of the speech, and processes the character sequences to determine sentiment data indicative of the emotional state of the user who produced the speech. The emotional state may then be presented or provided as an output to the user.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: January 9, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Mohammad Taha Bahadori, Viktor Rozgic, Alexander Jonathan Pinkus, Chao Wang, David Heckerman
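
The pipeline above skips word-level recognition and classifies sentiment directly from a decoded character sequence. Below is a minimal, hypothetical PyTorch sketch of that final classification stage; the upstream character decoder is assumed to exist, and the vocabulary size, layer sizes, and three-way sentiment labels are illustrative assumptions rather than details from the patent.

```python
# Hypothetical sketch: classify sentiment from a decoded character
# sequence (no word-level transcription). All names and hyperparameters
# are illustrative, not taken from the patent.
import torch
import torch.nn as nn

class CharSentimentClassifier(nn.Module):
    def __init__(self, n_chars=30, emb_dim=32, hidden=64, n_sentiments=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)      # one vector per character
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sentiments)      # e.g. negative/neutral/positive

    def forward(self, char_ids):                         # (batch, seq_len) int tensor
        emb = self.embed(char_ids)
        _, h = self.encoder(emb)                         # final hidden state summarizes the sequence
        return self.head(h.squeeze(0))                   # sentiment logits

# Toy usage: a batch of two character-ID sequences of length 12.
model = CharSentimentClassifier()
chars = torch.randint(0, 30, (2, 12))
print(model(chars).shape)  # torch.Size([2, 3])
```
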
  • Patent number: 11854538
    Abstract: Described herein is a system for sentiment detection in audio data. The system processes audio frame level features of input audio data using a machine learning algorithm to classify the input audio data into a particular sentiment category. The machine learning algorithm may be a neural network trained using an encoder-decoder method. The training of the machine learning algorithm may include normalization techniques to avoid potential bias in the training data that may occur when the training data is annotated for a perceived sentiment of the speaker.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: December 26, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Viktor Rozgic, Chao Wang, Ming Sun, Srinivas Parthasarathy
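
As a rough illustration of frame-level sentiment classification, here is a hedged PyTorch sketch that encodes per-frame acoustic features and pools them into a single utterance-level prediction. The patent's encoder-decoder training procedure and bias-normalization techniques are not reproduced here; every dimension below is an assumption.

```python
# Illustrative sketch only: a frame-level audio encoder pooled into an
# utterance-level sentiment prediction. Layer sizes are assumptions.
import torch
import torch.nn as nn

class FrameSentimentModel(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_classes=3):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, frames):                    # (batch, n_frames, n_features)
        encoded, _ = self.encoder(frames)
        pooled = encoded.mean(dim=1)              # average over frames
        return self.classifier(pooled)

frames = torch.randn(4, 200, 40)                  # e.g. 200 log-mel frames per utterance
print(FrameSentimentModel()(frames).shape)        # torch.Size([4, 3])
```
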
  • Patent number: 11790919
    Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
    Type: Grant
    Filed: May 10, 2022
    Date of Patent: October 17, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Gustavo Alfonso Aguilar Alas, Viktor Rozgic, Chao Wang
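
This acoustic-plus-lexical abstract recurs below under publication numbers 20230027828 and 20210104245 and patent number 11335347, so one sketch serves for all of them. The sketch fuses two feature branches and falls back to the acoustic branch by zero-filling the lexical input when no transcript is available; that fallback mechanism is an illustrative assumption, not necessarily the patented method.

```python
# Hedged sketch: a two-branch model fusing acoustic and lexical
# embeddings, falling back to acoustic-only when text is absent.
# The zero-fill fallback is an assumption for illustration.
import torch
import torch.nn as nn

class AcousticLexicalSentiment(nn.Module):
    def __init__(self, acoustic_dim=40, lexical_dim=300, hidden=64, n_classes=3):
        super().__init__()
        self.acoustic = nn.Sequential(nn.Linear(acoustic_dim, hidden), nn.ReLU())
        self.lexical = nn.Sequential(nn.Linear(lexical_dim, hidden), nn.ReLU())
        self.fusion = nn.Linear(2 * hidden, n_classes)

    def forward(self, acoustic_feats, lexical_feats=None):
        a = self.acoustic(acoustic_feats)
        if lexical_feats is None:                 # no transcript available:
            l = torch.zeros_like(a)               # zero out the lexical branch
        else:
            l = self.lexical(lexical_feats)
        return self.fusion(torch.cat([a, l], dim=-1))

model = AcousticLexicalSentiment()
audio = torch.randn(2, 40)
text = torch.randn(2, 300)
print(model(audio, text).shape)   # both modalities: torch.Size([2, 3])
print(model(audio).shape)         # acoustic-only fallback: torch.Size([2, 3])
```
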
  • Publication number: 20230027828
    Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
    Type: Application
    Filed: May 10, 2022
    Publication date: January 26, 2023
    Inventors: Gustavo Alfonso Aguilar Alas, Viktor Rozgic, Chao Wang
  • Patent number: 11545174
    Abstract: Described herein is a system for emotion detection in audio data using a speaker's baseline. The baseline may represent a user's speaking style in a neutral emotional state. The system is configured to compare the user's baseline with input audio representing speech from the user to determine an emotion of the user. The system may store multiple baselines for the user, each associated with a different context (e.g., environment, activity, etc.), and select one of the baselines to compare with the input audio based on the contextual situation.
    Type: Grant
    Filed: February 18, 2021
    Date of Patent: January 3, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Daniel Kenneth Bone, Chao Wang, Viktor Rozgic
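
The same baseline-comparison abstract appears again below under publication number 20210249035 and patent number 10943604, and the toy sketch here stands for all three entries. The embedding dimension, the context keys, and the distance-threshold classifier are placeholders for whatever trained components the system actually uses.

```python
# Minimal sketch of the baseline idea: keep one neutral-speech embedding
# per context, pick the baseline matching the current context, and
# classify the deviation of the incoming utterance from that baseline.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 16

# Hypothetical per-context neutral baselines for one user.
baselines = {
    "home": rng.normal(size=EMB_DIM),
    "work": rng.normal(size=EMB_DIM),
}

def classify_emotion(utterance_emb, context):
    """Compare an utterance embedding against the context's baseline."""
    baseline = baselines[context]
    delta = utterance_emb - baseline    # deviation from the neutral speaking style
    score = np.linalg.norm(delta)       # stand-in for a trained classifier
    return "neutral" if score < 4.0 else "non-neutral"

utterance = rng.normal(size=EMB_DIM)
print(classify_emotion(utterance, "work"))
```
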
  • Patent number: 11532300
    Abstract: A device with a microphone acquires audio data of a user's speech. A neural network accepts audio data as input and provides sentiment data as output. The neural network is trained using training data based on input from raters who provide votes as to which sentiment descriptors they think are associated with a sample of speech. A vote by a rater assessing the sample for a particular semantic descriptor is distributed to a plurality of semantically similar semantic descriptors. Semantic descriptor similarity data indicates relative similarity between possible semantic descriptors in the semantic space. The distributed partial votes may be aggregated to produce training data comprising samples of speech and weights of corresponding semantic descriptors. The training data is then used to train the neural network. For example, the neural network may be trained with the training data using per-instance cosine similarity loss or correlational loss.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: December 20, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Daniel Kenneth Bone, Viktor Rozgic, Chao Wang
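
The vote-distribution mechanism lends itself to a short worked example. In the numpy sketch below, each rater's one-hot descriptor vote is spread over semantically similar descriptors through a similarity matrix and aggregated into soft training targets, and a per-instance cosine similarity loss (one of the two losses named in the abstract) compares a prediction against those targets. The descriptor set and similarity values are invented for illustration.

```python
# Illustrative numpy sketch: spread each rater's one-hot descriptor vote
# across semantically similar descriptors, then aggregate into soft targets.
import numpy as np

descriptors = ["happy", "joyful", "sad", "angry"]
# Hypothetical descriptor-descriptor semantic similarity (rows sum to 1).
S = np.array([
    [0.7, 0.3, 0.0, 0.0],   # "happy" shares mass with "joyful"
    [0.3, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.1, 0.9],
])

def soft_targets(vote_indices):
    """Distribute one-hot votes over similar descriptors and aggregate."""
    votes = np.zeros(len(descriptors))
    for i in vote_indices:
        votes += S[i]                   # partial votes go to similar descriptors
    return votes / votes.sum()          # normalized descriptor weights

def cosine_similarity_loss(pred, target):
    """Per-instance cosine loss: 1 - cos(pred, target)."""
    cos = pred @ target / (np.linalg.norm(pred) * np.linalg.norm(target))
    return 1.0 - cos

target = soft_targets([0, 1, 1])        # three raters voted happy/joyful/joyful
pred = np.array([0.5, 0.4, 0.05, 0.05])
print(target, cosine_similarity_loss(pred, target))
```
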
  • Patent number: 11335347
    Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
    Type: Grant
    Filed: June 3, 2019
    Date of Patent: May 17, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Gustavo Alfonso Aguilar Alas, Viktor Rozgic, Chao Wang
  • Publication number: 20210249035
    Abstract: Described herein is a system for emotion detection in audio data using a speaker's baseline. The baseline may represent a user's speaking style in a neutral emotional state. The system is configured to compare the user's baseline with input audio representing speech from the user to determine an emotion of the user. The system may store multiple baselines for the user, each associated with a different context (e.g., environment, activity, etc.), and select one of the baselines to compare with the input audio based on the contextual situation.
    Type: Application
    Filed: February 18, 2021
    Publication date: August 12, 2021
    Inventors: Daniel Kenneth Bone, Chao Wang, Viktor Rozgic
  • Patent number: 11069352
    Abstract: Described herein is a system for media presence detection in audio. The system analyzes audio data to recognize whether a given audio segment contains sounds from a media source as a way of differentiating recorded media source sounds from other live sounds. In exemplary embodiments, the system includes a hierarchical model architecture for processing audio data segments, where individual audio data segments are processed by a trained machine learning model operating locally, and another trained machine learning model provides historical and contextual information to determine a score indicating the likelihood that the audio data segment contains sounds from a media source.
    Type: Grant
    Filed: February 18, 2019
    Date of Patent: July 20, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Qingming Tang, Ming Sun, Chieh-Chi Kao, Chao Wang, Viktor Rozgic
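
A hedged sketch of the hierarchical architecture: a local network embeds each audio segment, and a contextual recurrent layer running over the sequence of segment embeddings supplies the historical information used to score media presence. The layer types and sizes are assumptions, not details from the patent.

```python
# Hedged sketch of a two-level scorer: a local network embeds each audio
# segment, and a contextual GRU over the segment sequence produces
# per-segment media-presence scores. Sizes are illustrative.
import torch
import torch.nn as nn

class MediaPresenceDetector(nn.Module):
    def __init__(self, seg_features=40, local_dim=32, ctx_dim=32):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(seg_features, local_dim), nn.ReLU())
        self.context = nn.GRU(local_dim, ctx_dim, batch_first=True)
        self.score = nn.Linear(ctx_dim, 1)

    def forward(self, segments):                  # (batch, n_segments, seg_features)
        local = self.local(segments)              # per-segment embedding
        ctx, _ = self.context(local)              # carries history across segments
        return torch.sigmoid(self.score(ctx))     # media-presence probability per segment

segments = torch.randn(1, 10, 40)                 # ten consecutive audio segments
print(MediaPresenceDetector()(segments).shape)    # torch.Size([1, 10, 1])
```
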
  • Publication number: 20210104245
    Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
    Type: Application
    Filed: June 3, 2019
    Publication date: April 8, 2021
    Inventors: Gustavo Alfonso Aguilar Alas, Viktor Rozgic, Chao Wang
  • Patent number: 10943604
    Abstract: Described herein is a system for emotion detection in audio data using a speaker's baseline. The baseline may represent a user's speaking style in a neutral emotional state. The system is configured to compare the user's baseline with input audio representing speech from the user to determine an emotion of the user. The system may store multiple baselines for the user, each associated with a different context (e.g., environment, activity, etc.), and select one of the baselines to compare with the input audio based on the contextual situation.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: March 9, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Daniel Kenneth Bone, Chao Wang, Viktor Rozgic
  • Publication number: 20200302952
    Abstract: A wearable device with a microphone acquires audio data of a wearer's speech. The audio data is processed to determine sentiment data indicative of perceived emotional content of the speech. For example, the sentiment data may include values for one or more of valence that is based on a particular change in pitch over time, activation that is based on speech pace, dominance that is based on pitch rise and fall patterns, and so forth. A simplified user interface provides the wearer with information about the emotional content of their speech based on the sentiment data. The wearer may use this information to assess their state of mind, facilitate interactions with others, and so forth.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 24, 2020
    Inventors: Alexander Jonathan Pinkus, Douglas Gradt, Samuel Elbert McGowan, Chad Thompson, Chao Wang, Viktor Rozgic
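
The abstract ties each sentiment dimension to a prosodic cue: valence to pitch change over time, activation to speech pace, dominance to pitch rise-and-fall patterns. The toy functions below compute crude numeric stand-ins for those cues from a pitch track and per-frame energies; the application's actual mappings are not given here, so these formulas are purely illustrative.

```python
# Toy sketch mapping simple prosodic statistics to the three sentiment
# dimensions named in the abstract. These formulas are placeholders.
import numpy as np

def sentiment_from_prosody(pitch_hz, frame_energy, frame_rate=100):
    """pitch_hz, frame_energy: per-frame arrays; frame_rate in frames/sec."""
    dpitch = np.diff(pitch_hz)
    valence = float(np.mean(dpitch))             # proxy: average pitch change over time
    peaks = np.sum((frame_energy[1:-1] > frame_energy[:-2]) &
                   (frame_energy[1:-1] > frame_energy[2:]))
    activation = peaks / (len(frame_energy) / frame_rate)  # proxy: energy peaks/sec ~ pace
    dominance = float(np.std(dpitch))            # proxy: strength of rise/fall patterns
    return {"valence": valence, "activation": activation, "dominance": dominance}

rng = np.random.default_rng(1)
pitch = 180 + rng.normal(0, 20, size=300)        # 3 seconds of pitch at 100 frames/sec
energy = np.abs(rng.normal(0, 1, size=300))
print(sentiment_from_prosody(pitch, energy))
```
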
  • Patent number: 9792823
    Abstract: Systems and methods for using multi-view learning to leverage highly informative, high-cost psychophysiological data collected in a laboratory setting to improve post-traumatic stress disorder (PTSD) screening in the field, where only less-informative, low-cost speech data are available. Partial least squares (PLS) methods can be used to learn a bilinear factor model, which projects speech and EEG responses onto latent spaces, where the covariance between the projections is maximized. The systems and methods use a speech representation based on a combination of audio and language descriptors extracted from spoken responses to open-ended questions.
    Type: Grant
    Filed: September 15, 2014
    Date of Patent: October 17, 2017
    Assignee: Raytheon BBN Technologies Corp.
    Inventors: Xiaodan Zhuang, Viktor Rozgic, Michael Roger Crystal
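
The covariance-maximizing projection the abstract describes is what scikit-learn's PLSCanonical implements, so the multi-view setup can be sketched compactly; the synthetic features below stand in for real speech and EEG responses, and the same abstract recurs under publication number 20160078771 below.

```python
# Hedged sketch of the multi-view idea using scikit-learn's PLSCanonical,
# which projects two views onto latent spaces with maximal covariance
# between projections. Synthetic data stands in for speech/EEG features.
import numpy as np
from sklearn.cross_decomposition import PLSCanonical

rng = np.random.default_rng(0)
n = 100
latent = rng.normal(size=(n, 2))                  # shared structure across views
speech = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(n, 20))
eeg = latent @ rng.normal(size=(2, 64)) + 0.1 * rng.normal(size=(n, 64))

pls = PLSCanonical(n_components=2)
speech_scores, eeg_scores = pls.fit_transform(speech, eeg)

# In the field only speech is available: project speech alone into the
# learned latent space and use those scores for downstream screening.
field_scores = pls.transform(speech)
print(field_scores.shape)                         # (100, 2)
```
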
  • Publication number: 20160078771
    Abstract: Systems and methods for using multi-view learning to leverage highly informative, high-cost psychophysiological data collected in a laboratory setting to improve post-traumatic stress disorder (PTSD) screening in the field, where only less-informative, low-cost speech data are available. Partial least squares (PLS) methods can be used to learn a bilinear factor model, which projects speech and EEG responses onto latent spaces, where the covariance between the projections is maximized. The systems and methods use a speech representation based on a combination of audio and language descriptors extracted from spoken responses to open-ended questions.
    Type: Application
    Filed: September 15, 2014
    Publication date: March 17, 2016
    Inventors: Xiaodan Zhuang, Viktor Rozgic, Michael Roger Crystal