Patents by Inventor Viktor Rozgic

Viktor Rozgic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11869535
    Abstract: Described is a system and method that determines character sequences from speech, without determining the words of the speech, and processes the character sequences to determine sentiment data indicative of the emotional state of the user who produced the speech. The emotional state may then be presented or provided as an output to the user.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: January 9, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Mohammad Taha Bahadori, Viktor Rozgic, Alexander Jonathan Pinkus, Chao Wang, David Heckerman
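
The pipeline above skips word-level recognition and classifies sentiment directly from a decoded character sequence. Below is a minimal, hypothetical PyTorch sketch of that final classification stage; the upstream character decoder is assumed to exist, and the vocabulary size, layer sizes, and three-way sentiment labels are illustrative assumptions rather than details from the patent.

```python
# Hypothetical sketch: classify sentiment from a decoded character
# sequence (no word-level transcription). All names and hyperparameters
# are illustrative, not taken from the patent.
import torch
import torch.nn as nn

class CharSentimentClassifier(nn.Module):
    def __init__(self, n_chars=30, emb_dim=32, hidden=64, n_sentiments=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)      # one vector per character
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sentiments)      # e.g. negative/neutral/positive

    def forward(self, char_ids):                         # (batch, seq_len) int tensor
        emb = self.embed(char_ids)
        _, h = self.encoder(emb)                         # final hidden state summarizes the sequence
        return self.head(h.squeeze(0))                   # sentiment logits

# Toy usage: a batch of two character-ID sequences of length 12.
model = CharSentimentClassifier()
chars = torch.randint(0, 30, (2, 12))
print(model(chars).shape)  # torch.Size([2, 3])
```
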
  • Patent number: 11854538
    Abstract: Described herein is a system for sentiment detection in audio data. The system processes audio frame level features of input audio data using a machine learning algorithm to classify the input audio data into a particular sentiment category. The machine learning algorithm may be a neural network trained using an encoder-decoder method. The training of the machine learning algorithm may include normalization techniques to avoid potential bias in the training data that may occur when the training data is annotated for a perceived sentiment of the speaker.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: December 26, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Viktor Rozgic, Chao Wang, Ming Sun, Srinivas Parthasarathy
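
As a rough illustration of frame-level sentiment classification, here is a hedged PyTorch sketch that encodes per-frame acoustic features and pools them into a single utterance-level prediction. The patent's encoder-decoder training procedure and bias-normalization techniques are not reproduced here; every dimension below is an assumption.

```python
# Illustrative sketch only: a frame-level audio encoder pooled into an
# utterance-level sentiment prediction. Layer sizes are assumptions.
import torch
import torch.nn as nn

class FrameSentimentModel(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_classes=3):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, frames):                    # (batch, n_frames, n_features)
        encoded, _ = self.encoder(frames)
        pooled = encoded.mean(dim=1)              # average over frames
        return self.classifier(pooled)

frames = torch.randn(4, 200, 40)                  # e.g. 200 log-mel frames per utterance
print(FrameSentimentModel()(frames).shape)        # torch.Size([4, 3])
```
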
  • Patent number: 11790919
    Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
    Type: Grant
    Filed: May 10, 2022
    Date of Patent: October 17, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Gustavo Alfonso Aguilar Alas, Viktor Rozgic, Chao Wang
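
This acoustic-plus-lexical abstract recurs below under publication numbers 20230027828 and 20210104245 and patent number 11335347, so one sketch serves for all of them. The sketch fuses two feature branches and falls back to the acoustic branch by zero-filling the lexical input when no transcript is available; that fallback mechanism is an illustrative assumption, not necessarily the patented method.

```python
# Hedged sketch: a two-branch model fusing acoustic and lexical
# embeddings, falling back to acoustic-only when text is absent.
# The zero-fill fallback is an assumption for illustration.
import torch
import torch.nn as nn

class AcousticLexicalSentiment(nn.Module):
    def __init__(self, acoustic_dim=40, lexical_dim=300, hidden=64, n_classes=3):
        super().__init__()
        self.acoustic = nn.Sequential(nn.Linear(acoustic_dim, hidden), nn.ReLU())
        self.lexical = nn.Sequential(nn.Linear(lexical_dim, hidden), nn.ReLU())
        self.fusion = nn.Linear(2 * hidden, n_classes)

    def forward(self, acoustic_feats, lexical_feats=None):
        a = self.acoustic(acoustic_feats)
        if lexical_feats is None:                 # no transcript available:
            l = torch.zeros_like(a)               # zero out the lexical branch
        else:
            l = self.lexical(lexical_feats)
        return self.fusion(torch.cat([a, l], dim=-1))

model = AcousticLexicalSentiment()
audio = torch.randn(2, 40)
text = torch.randn(2, 300)
print(model(audio, text).shape)   # both modalities: torch.Size([2, 3])
print(model(audio).shape)         # acoustic-only fallback: torch.Size([2, 3])
```
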
  • Publication number: 20230027828
    Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
    Type: Application
    Filed: May 10, 2022
    Publication date: January 26, 2023
    Inventors: Gustavo Alfonso Aguilar Alas, Viktor Rozgic, Chao Wang
  • Patent number: 11545174
    Abstract: Described herein is a system for emotion detection in audio data using a speaker's baseline. The baseline may represent a user's speaking style in a neutral emotional state. The system is configured to compare the user's baseline with input audio representing speech from the user to determine an emotion of the user. The system may store multiple baselines for the user, each associated with a different context (e.g., environment, activity, etc.), and select one of the baselines to compare with the input audio based on the contextual situation.
    Type: Grant
    Filed: February 18, 2021
    Date of Patent: January 3, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Daniel Kenneth Bone, Chao Wang, Viktor Rozgic
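
The same baseline-comparison abstract appears again below under publication number 20210249035 and patent number 10943604, and the toy sketch here stands for all three entries. The embedding dimension, the context keys, and the distance-threshold classifier are placeholders for whatever trained components the system actually uses.

```python
# Minimal sketch of the baseline idea: keep one neutral-speech embedding
# per context, pick the baseline matching the current context, and
# classify the deviation of the incoming utterance from that baseline.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 16

# Hypothetical per-context neutral baselines for one user.
baselines = {
    "home": rng.normal(size=EMB_DIM),
    "work": rng.normal(size=EMB_DIM),
}

def classify_emotion(utterance_emb, context):
    """Compare an utterance embedding against the context's baseline."""
    baseline = baselines[context]
    delta = utterance_emb - baseline    # deviation from the neutral speaking style
    score = np.linalg.norm(delta)       # stand-in for a trained classifier
    return "neutral" if score < 4.0 else "non-neutral"

utterance = rng.normal(size=EMB_DIM)
print(classify_emotion(utterance, "work"))
```
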
  • Patent number: 11532300
    Abstract: A device with a microphone acquires audio data of a user's speech. A neural network accepts audio data as input and provides sentiment data as output. The neural network is trained using training data based on input from raters who provide votes as to which sentiment descriptors they think are associated with a sample of speech. A vote by a rater assessing the sample for a particular semantic descriptor is distributed to a plurality of semantically similar semantic descriptors. Semantic descriptor similarity data indicates relative similarity between possible semantic descriptors in the semantic space. The distributed partial votes may be aggregated to produce training data comprising samples of speech and weights of corresponding semantic descriptors. The training data is then used to train the neural network. For example, the neural network may be trained with the training data using per-instance cosine similarity loss or correlational loss.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: December 20, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Daniel Kenneth Bone, Viktor Rozgic, Chao Wang
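
The vote-distribution mechanism lends itself to a short worked example. In the numpy sketch below, each rater's one-hot descriptor vote is spread over semantically similar descriptors through a similarity matrix and aggregated into soft training targets, and a per-instance cosine similarity loss (one of the two losses named in the abstract) compares a prediction against those targets. The descriptor set and similarity values are invented for illustration.

```python
# Illustrative numpy sketch: spread each rater's one-hot descriptor vote
# across semantically similar descriptors, then aggregate into soft targets.
import numpy as np

descriptors = ["happy", "joyful", "sad", "angry"]
# Hypothetical descriptor-descriptor semantic similarity (rows sum to 1).
S = np.array([
    [0.7, 0.3, 0.0, 0.0],   # "happy" shares mass with "joyful"
    [0.3, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.1, 0.9],
])

def soft_targets(vote_indices):
    """Distribute one-hot votes over similar descriptors and aggregate."""
    votes = np.zeros(len(descriptors))
    for i in vote_indices:
        votes += S[i]                   # partial votes go to similar descriptors
    return votes / votes.sum()          # normalized descriptor weights

def cosine_similarity_loss(pred, target):
    """Per-instance cosine loss: 1 - cos(pred, target)."""
    cos = pred @ target / (np.linalg.norm(pred) * np.linalg.norm(target))
    return 1.0 - cos

target = soft_targets([0, 1, 1])        # three raters voted happy/joyful/joyful
pred = np.array([0.5, 0.4, 0.05, 0.05])
print(target, cosine_similarity_loss(pred, target))
```
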
  • Patent number: 11335347
    Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
    Type: Grant
    Filed: June 3, 2019
    Date of Patent: May 17, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Gustavo Alfonso Aguilar Alas, Viktor Rozgic, Chao Wang
  • Publication number: 20210249035
    Abstract: Described herein is a system for emotion detection in audio data using a speaker's baseline. The baseline may represent a user's speaking style in a neutral emotional state. The system is configured to compare the user's baseline with input audio representing speech from the user to determine an emotion of the user. The system may store multiple baselines for the user, each associated with a different context (e.g., environment, activity, etc.), and select one of the baselines to compare with the input audio based on the contextual situation.
    Type: Application
    Filed: February 18, 2021
    Publication date: August 12, 2021
    Inventors: Daniel Kenneth Bone, Chao Wang, Viktor Rozgic
  • Patent number: 11069352
    Abstract: Described herein is a system for media presence detection in audio. The system analyzes audio data to recognize whether a given audio segment contains sounds from a media source as a way of differentiating recorded media source sounds from other live sounds. In exemplary embodiments, the system includes a hierarchical model architecture for processing audio data segments, where individual audio data segments are processed by a trained machine learning model operating locally, and another trained machine learning model provides historical and contextual information to determine a score indicating the likelihood that the audio data segment contains sounds from a media source.
    Type: Grant
    Filed: February 18, 2019
    Date of Patent: July 20, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Qingming Tang, Ming Sun, Chieh-Chi Kao, Chao Wang, Viktor Rozgic
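
A hedged sketch of the hierarchical architecture: a local network embeds each audio segment, and a contextual recurrent layer running over the sequence of segment embeddings supplies the historical information used to score media presence. The layer types and sizes are assumptions, not details from the patent.

```python
# Hedged sketch of a two-level scorer: a local network embeds each audio
# segment, and a contextual GRU over the segment sequence produces
# per-segment media-presence scores. Sizes are illustrative.
import torch
import torch.nn as nn

class MediaPresenceDetector(nn.Module):
    def __init__(self, seg_features=40, local_dim=32, ctx_dim=32):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(seg_features, local_dim), nn.ReLU())
        self.context = nn.GRU(local_dim, ctx_dim, batch_first=True)
        self.score = nn.Linear(ctx_dim, 1)

    def forward(self, segments):                  # (batch, n_segments, seg_features)
        local = self.local(segments)              # per-segment embedding
        ctx, _ = self.context(local)              # carries history across segments
        return torch.sigmoid(self.score(ctx))     # media-presence probability per segment

segments = torch.randn(1, 10, 40)                 # ten consecutive audio segments
print(MediaPresenceDetector()(segments).shape)    # torch.Size([1, 10, 1])
```
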
  • Publication number: 20210104245
    Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
    Type: Application
    Filed: June 3, 2019
    Publication date: April 8, 2021
    Inventors: Gustavo Alfonso Aguilar Alas, Viktor Rozgic, Chao Wang
  • Patent number: 10943604
    Abstract: Described herein is a system for emotion detection in audio data using a speaker's baseline. The baseline may represent a user's speaking style in a neutral emotional state. The system is configured to compare the user's baseline with input audio representing speech from the user to determine an emotion of the user. The system may store multiple baselines for the user, each associated with a different context (e.g., environment, activity, etc.), and select one of the baselines to compare with the input audio based on the contextual situation.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: March 9, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Daniel Kenneth Bone, Chao Wang, Viktor Rozgic
  • Publication number: 20200302952
    Abstract: A wearable device with a microphone acquires audio data of a wearer's speech. The audio data is processed to determine sentiment data indicative of perceived emotional content of the speech. For example, the sentiment data may include values for one or more of valence that is based on a particular change in pitch over time, activation that is based on speech pace, dominance that is based on pitch rise and fall patterns, and so forth. A simplified user interface provides the wearer with information about the emotional content of their speech based on the sentiment data. The wearer may use this information to assess their state of mind, facilitate interactions with others, and so forth.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 24, 2020
    Inventors: Alexander Jonathan Pinkus, Douglas Gradt, Samuel Elbert McGowan, Chad Thompson, Chao Wang, Viktor Rozgic
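
The abstract ties each sentiment dimension to a prosodic cue: valence to pitch change over time, activation to speech pace, dominance to pitch rise-and-fall patterns. The toy functions below compute crude numeric stand-ins for those cues from a pitch track and per-frame energies; the application's actual mappings are not given here, so these formulas are purely illustrative.

```python
# Toy sketch mapping simple prosodic statistics to the three sentiment
# dimensions named in the abstract. These formulas are placeholders.
import numpy as np

def sentiment_from_prosody(pitch_hz, frame_energy, frame_rate=100):
    """pitch_hz, frame_energy: per-frame arrays; frame_rate in frames/sec."""
    dpitch = np.diff(pitch_hz)
    valence = float(np.mean(dpitch))             # proxy: average pitch change over time
    peaks = np.sum((frame_energy[1:-1] > frame_energy[:-2]) &
                   (frame_energy[1:-1] > frame_energy[2:]))
    activation = peaks / (len(frame_energy) / frame_rate)  # proxy: energy peaks/sec ~ pace
    dominance = float(np.std(dpitch))            # proxy: strength of rise/fall patterns
    return {"valence": valence, "activation": activation, "dominance": dominance}

rng = np.random.default_rng(1)
pitch = 180 + rng.normal(0, 20, size=300)        # 3 seconds of pitch at 100 frames/sec
energy = np.abs(rng.normal(0, 1, size=300))
print(sentiment_from_prosody(pitch, energy))
```
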
  • Patent number: 9792823
    Abstract: Systems and methods for using multi-view learning to leverage highly informative, high-cost psychophysiological data collected in a laboratory setting to improve post-traumatic stress disorder (PTSD) screening in the field, where only less-informative, low-cost speech data are available. Partial least squares (PLS) methods can be used to learn a bilinear factor model, which projects speech and EEG responses onto latent spaces, where the covariance between the projections is maximized. The systems and methods use a speech representation based on a combination of audio and language descriptors extracted from spoken responses to open-ended questions.
    Type: Grant
    Filed: September 15, 2014
    Date of Patent: October 17, 2017
    Assignee: Raytheon BBN Technologies Corp.
    Inventors: Xiaodan Zhuang, Viktor Rozgic, Michael Roger Crystal
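
The covariance-maximizing projection the abstract describes is what scikit-learn's PLSCanonical implements, so the multi-view setup can be sketched compactly; the synthetic features below stand in for real speech and EEG responses, and the same abstract recurs under publication number 20160078771 below.

```python
# Hedged sketch of the multi-view idea using scikit-learn's PLSCanonical,
# which projects two views onto latent spaces with maximal covariance
# between projections. Synthetic data stands in for speech/EEG features.
import numpy as np
from sklearn.cross_decomposition import PLSCanonical

rng = np.random.default_rng(0)
n = 100
latent = rng.normal(size=(n, 2))                  # shared structure across views
speech = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(n, 20))
eeg = latent @ rng.normal(size=(2, 64)) + 0.1 * rng.normal(size=(n, 64))

pls = PLSCanonical(n_components=2)
speech_scores, eeg_scores = pls.fit_transform(speech, eeg)

# In the field only speech is available: project speech alone into the
# learned latent space and use those scores for downstream screening.
field_scores = pls.transform(speech)
print(field_scores.shape)                         # (100, 2)
```
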
  • Publication number: 20160078771
    Abstract: Systems and methods for using multi-view learning to leverage highly informative, high-cost psychophysiological data collected in a laboratory setting to improve post-traumatic stress disorder (PTSD) screening in the field, where only less-informative, low-cost speech data are available. Partial least squares (PLS) methods can be used to learn a bilinear factor model, which projects speech and EEG responses onto latent spaces, where the covariance between the projections is maximized. The systems and methods use a speech representation based on a combination of audio and language descriptors extracted from spoken responses to open-ended questions.
    Type: Application
    Filed: September 15, 2014
    Publication date: March 17, 2016
    Inventors: Xiaodan Zhuang, Viktor Rozgic, Michael Roger Crystal