Patents by Inventor Hagai Attias

Hagai Attias has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Detecting Beat Information Using a Diverse Set of Correlations

Publication number: 20150007708

Abstract: A beat analysis module is described for determining beat information associated with an audio item. The beat analysis module uses an Expectation-Maximization (EM) approach to determine an average beat period, where correlation is performed over diverse representations of the audio item. The beat analysis module can determine the beat information in a relatively short period of time. As such, the beat analysis module can perform its analysis together with another application task (such as a game application task) without disrupting the real-time performance of that application task. In one application, a user may select his or her own audio items to be used in conjunction with the application task.

Type: Application

Filed: September 26, 2014

Publication date: January 8, 2015

Applicant: Microsoft Corporation

Inventors: Hagai Attias, Darko Kirovski
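
The idea of correlating several representations of the same audio item can be illustrated with a simplified sketch. This is not the EM procedure the abstract describes: it only sums normalized autocorrelations of a few per-frame feature sequences (here hypothetically an energy envelope and a spectral-flux-like sequence) and picks the lag with the strongest combined peak as the beat period in frames. The function names and the lag search range are assumptions for illustration.

```python
import math


def autocorrelation(signal, max_lag):
    """Normalized autocorrelation r[lag] for lag = 0..max_lag (r[0] == 1)."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    energy = sum(c * c for c in centered) or 1.0  # avoid dividing by zero
    return [
        sum(centered[t] * centered[t + lag] for t in range(len(centered) - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def estimate_beat_period(representations, min_lag, max_lag):
    """Sum autocorrelations of several representations; return the best lag in frames."""
    combined = [0.0] * (max_lag + 1)
    for rep in representations:
        for lag, value in enumerate(autocorrelation(rep, max_lag)):
            combined[lag] += value
    # Only lags inside the plausible tempo range are candidates.
    return max(range(min_lag, max_lag + 1), key=lambda lag: combined[lag])
```

For example, an energy envelope with a pulse every 10 frames and a flux sequence with the same pulses shifted by one frame both reinforce a period of 10 frames. Summing across representations is one simple way to let features that are reliable for a given track outvote features that are not.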
Speaker detection and tracking using audiovisual data

Patent number: 8842177

Abstract: Object tracking includes an audio model that receives at least two audio input signals and a video model that receives a video input. The audio model and the video model employ probabilistic generative models which are combined to facilitate object tracking. Expectation maximization can be employed to modify trainable parameters of the audio model and the video model.

Type: Grant

Filed: March 31, 2010

Date of Patent: September 23, 2014

Assignee: Microsoft Corporation

Inventors: Matthew James Beal, Nebojsa Jojic, Hagai Attias
Detecting Beat Information Using a Diverse Set of Correlations

Publication number: 20100300271

Abstract: A beat analysis module is described for determining beat information associated with an audio item. The beat analysis module uses an Expectation-Maximization (EM) approach to determine an average beat period, where correlation is performed over diverse representations of the audio item. The beat analysis module can determine the beat information in a relatively short period of time. As such, the beat analysis module can perform its analysis together with another application task (such as a game application task) without disrupting the real-time performance of that application task. In one application, a user may select his or her own audio items to be used in conjunction with the application task.

Type: Application

Filed: May 27, 2009

Publication date: December 2, 2010

Applicant: Microsoft Corporation

Inventors: Hagai Attias, Darko Kirovski
Speaker detection and tracking using audiovisual data

Publication number: 20100194881

Abstract: Object tracking includes an audio model that receives at least two audio input signals and a video model that receives a video input. The audio model and the video model employ probabilistic generative models which are combined to facilitate object tracking. Expectation maximization can be employed to modify trainable parameters of the audio model and the video model.

Type: Application

Filed: March 31, 2010

Publication date: August 5, 2010

Applicant: Microsoft Corporation

Inventors: Matthew James Beal, Nebojsa Jojic, Hagai Attias
Speaker detection and tracking using audiovisual data

Patent number: 7692685

Abstract: A system and method facilitating object tracking is provided. The invention includes an audio model that receives at least two audio input signals and a video model that receives a video input. The audio model and the video model employ probabilistic generative models which are combined to facilitate object tracking. Expectation maximization can be employed to modify trainable parameters of the audio model and the video model.

Type: Grant

Filed: March 31, 2005

Date of Patent: April 6, 2010

Assignee: Microsoft Corporation

Inventors: Matthew James Beal, Nebojsa Jojic, Hagai Attias
Speech detection and enhancement using audio/video fusion

Patent number: 7689413

Abstract: A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-model, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30 second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.

Type: Grant

Filed: September 10, 2007

Date of Patent: March 30, 2010

Assignee: Microsoft Corporation

Inventors: John R. Hershey, Trausti Thor Kristjansson, Hagai Attias, Nebojsa Jojic
Method and apparatus for scene learning and three-dimensional tracking using stereo video cameras

Patent number: 7486815

Abstract: A method and apparatus are provided for learning a model for the appearance of an object while tracking the position of the object in three dimensions. Under embodiments of the present invention, this is achieved by combining a particle filtering technique for tracking the object's position with an expectation-maximization technique for learning the appearance of the object. Two stereo cameras are used to generate data for the learning and tracking.

Type: Grant

Filed: February 20, 2004

Date of Patent: February 3, 2009

Assignee: Microsoft Corporation

Inventors: Trausti Kristjansson, Hagai Attias, John R. Hershey
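
The particle-filtering half of this abstract can be illustrated with a generic bootstrap particle filter. The sketch below is an assumption-heavy simplification: it tracks a single 1-D position rather than a 3-D position from stereo video, uses a hypothetical Gaussian random-walk motion model and a Gaussian observation model, and omits the EM appearance learning entirely. Parameter names and values are illustrative, not taken from the patent.

```python
import math
import random


def particle_filter_1d(observations, n_particles=500, motion_sd=0.3, obs_sd=0.5, seed=0):
    """Bootstrap particle filter for a 1-D position; returns posterior-mean estimates."""
    rng = random.Random(seed)
    # Start with particles spread uniformly over a broad hypothetical range.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle through a random-walk motion model.
        particles = [p + rng.gauss(0.0, motion_sd) for p in particles]
        # Update: weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((z - p) / obs_sd) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

In a full tracker, the observation likelihood would come from comparing each hypothesized position against the learned appearance model in both camera views, which is where the EM step would plug in.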
Method of speech recognition using variational inference with switching state space models

Patent number: 7487087

Abstract: A method is developed which includes 1) defining a switching state space model for a continuous-valued hidden production-related parameter and the observed speech acoustics, and 2) approximating a posterior probability that provides the likelihood of a sequence of the hidden production-related parameters and a sequence of speech units based on a sequence of observed input values. In approximating the posterior probability, the boundaries of the speech units are not fixed but are optimally determined. Under one embodiment, a mixture-of-Gaussians approximation is used. In another embodiment, an HMM posterior approximation is used.

Type: Grant

Filed: November 9, 2004

Date of Patent: February 3, 2009

Assignee: Microsoft Corporation

Inventors: Hagai Attias, Leo Jingyu Lee, Li Deng
Method of speech recognition using multimodal variational inference with switching state space models

Patent number: 7480615

Abstract: A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two, but fewer than all, of the frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence of frames, and a separate posterior probability parameter is determined for each frame in the shifted window. This method closely approximates a more rigorous solution while reducing computational cost by two to three orders of magnitude. A method is also provided for determining the optimal discrete state sequence in the switching state space model; it directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.

Type: Grant

Filed: January 20, 2004

Date of Patent: January 20, 2009

Assignee: Microsoft Corporation

Inventors: Hagai Attias, Li Deng, Leo Lee
Variational inference and learning for segmental switching state space models of hidden speech dynamics

Patent number: 7454336

Abstract: A system and method are provided that facilitate modeling unobserved speech dynamics based upon a hidden dynamic speech model in the form of a segmental switching state space model. The model employs parameters describing the unobserved speech dynamics and parameters describing the relationship between the unobserved speech dynamic vector and the observed acoustic feature vector. The model parameters are modified based, at least in part, upon a variational learning technique. In accordance with an aspect of the present invention, novel and powerful variational expectation maximization (EM) algorithm(s) are provided for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production. For example, modification of model parameters can be based upon an approximate mixture of Gaussians (MOG) posterior and/or upon an approximate hidden Markov model (HMM) posterior using a variational technique.

Type: Grant

Filed: June 20, 2003

Date of Patent: November 18, 2008

Assignee: Microsoft Corporation

Inventors: Hagai Attias, Li Deng, Leo J. Lee
Quantum mechanical model-based system and method for global optimization

Patent number: 7398162

Abstract: A model-based system and method for global optimization that utilizes quantum mechanics in order to approximate the global minimum of a given problem (e.g., a mathematical function). A quantum mechanical particle with a sufficiently large mass has a ground state solution to the Schrödinger equation which is localized to the global minimum of the energy field, or potential, it experiences. A given function is modeled as a potential, and a quantum mechanical particle with a sufficiently large mass is placed in the potential. The ground state of the particle is determined, and the probability density function of the ground state of the particle is calculated. The peak of the probability density function is localized to the global minimum of the potential.

Type: Grant

Filed: February 21, 2003

Date of Patent: July 8, 2008

Assignee: Microsoft Corporation

Inventors: Oliver B. Downs, Hagai Attias, Christopher J. C. Burges, Robert L. Rounthwaite
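
The ground-state idea in this abstract can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the patented method: it discretizes a 1-D potential on a grid, sets ħ = 1, picks a hypothetical mass of 100, and finds the ground state by imaginary-time relaxation (repeatedly applying ψ ← ψ − Δt·Hψ and renormalizing), a standard numerical technique swapped in here for illustration. The peak of |ψ|² should then lie at or near the global minimum.

```python
import math


def quantum_global_min(V, x_min, x_max, n=201, mass=100.0, dt=0.02, steps=2000):
    """Return (x at the peak of the ground-state density, density) for potential V."""
    dx = (x_max - x_min) / (n - 1)
    xs = [x_min + i * dx for i in range(n)]
    pot = [V(x) for x in xs]
    # Finite-difference kinetic coefficient for H = -(1/2m) d^2/dx^2 + V (hbar = 1).
    kinetic = 1.0 / (2.0 * mass * dx * dx)
    psi = [1.0] * n
    psi[0] = psi[-1] = 0.0  # Dirichlet boundary: the particle is confined to the interval
    # Explicit Euler in imaginary time is stable only if dt < 2 / E_max, where
    # E_max is roughly 4 * kinetic + max(pot); the defaults satisfy this here.
    for _ in range(steps):
        new = [0.0] * n
        for i in range(1, n - 1):
            lap = psi[i - 1] - 2.0 * psi[i] + psi[i + 1]
            h_psi = -kinetic * lap + pot[i] * psi[i]
            new[i] = psi[i] - dt * h_psi  # damps excited states faster than the ground state
        norm = math.sqrt(sum(p * p for p in new))
        psi = [p / norm for p in new]
    density = [p * p for p in psi]  # probability density of the ground state
    i_peak = max(range(n), key=density.__getitem__)
    return xs[i_peak], density
```

For V(x) = (x² − 1)² + 0.3x on [−2, 2], which has its global minimum near x ≈ −1.04 and a shallower local minimum near x ≈ 0.96, the density peak lands in the left well. A larger mass narrows the ground state around the minimum but also demands a finer grid and smaller time step.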
Speech detection and enhancement using audio/video fusion

Publication number: 20080059174

Abstract: A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-model, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30 second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.

Type: Application

Filed: September 10, 2007

Publication date: March 6, 2008

Applicant: Microsoft Corporation

Inventors: John Hershey, Trausti Kristjansson, Hagai Attias, Nebojsa Jojic
Searching multimedia databases using multimedia queries

Patent number: 7325008

Abstract: A system and method for generating responsibility vectors associated with multi-media files (e.g., audio and/or video files) is provided. The responsibility vectors are based upon the responsibilities of the mixture components of a mixture model fitted to frames of the files. The responsibility vectors can be grouped by clustering based on identifiable features extracted from frames of the multi-media files. Once generated, the responsibility vectors can be searched by a multi-media searching system. Also provided is a system for multi-media searching based, at least in part, upon responsibility vectors associated with a query segment and with the multi-media files. The system can generate a query profile based, at least in part, upon responsibility vectors of frames of the query segment, and can further generate segment profiles of segments of the multi-media files.

Type: Grant

Filed: July 20, 2005

Date of Patent: January 29, 2008

Assignee: Microsoft Corporation

Inventor: Hagai Attias
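
A responsibility is the posterior probability that a given mixture component generated a given frame. The sketch below illustrates the query-profile idea under several assumptions that do not come from the abstract: frames are scalars rather than feature vectors, the mixture is a 1-D Gaussian mixture whose weights, means, and variances are already fitted, a profile is the mean responsibility vector over a segment's frames, and segments are ranked by Euclidean distance between profiles. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(frame, weights, means, variances):
    """Posterior probability of each mixture component given one frame."""
    joint = [w * gaussian_pdf(frame, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def profile(frames, mixture):
    """Average the per-frame responsibility vectors of a segment."""
    vecs = [responsibilities(f, *mixture) for f in frames]
    k = len(vecs[0])
    return [sum(v[c] for v in vecs) / len(vecs) for c in range(k)]


def rank_segments(query_frames, segments, mixture):
    """Return segment indices ordered from most to least similar to the query."""
    q = profile(query_frames, mixture)

    def distance(seg):
        p = profile(seg, mixture)
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, p)))

    return sorted(range(len(segments)), key=lambda i: distance(segments[i]))
```

With a two-component mixture centered at 0 and 5, a query whose frames lie near 0 ranks a segment of frames near 0 ahead of a segment of frames near 5. Because profiles are fixed-length vectors regardless of segment length, segments of different durations can be compared directly.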
Speech detection and enhancement using audio/video fusion

Patent number: 7269560

Abstract: A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-model, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30 second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.

Type: Grant

Filed: June 27, 2003

Date of Patent: September 11, 2007

Assignee: Microsoft Corporation

Inventors: John R. Hershey, Trausti Thor Kristjansson, Hagai Attias, Nebojsa Jojic
Audio source separation based on flexible pre-trained probabilistic source models

Publication number: 20070154033

Abstract: Improved audio source separation is achieved by providing an audio dictionary for each source to be separated. The invention can thus be regarded as providing “partially blind” source separation, as opposed to the more commonly considered “blind” source separation problem, in which no prior information about the sources is given. The audio dictionaries are probabilistic source models, and can be derived from training data from the sources to be separated, or from similar sources. Thus a library of audio dictionaries can be developed to aid in source separation. An unmixing and deconvolutive transformation can be inferred by maximum likelihood (ML) given the received signals and the selected audio dictionaries as input to the ML calculation. Optionally, frequency-domain filtering of the separated signal estimates can be performed prior to reconstructing the time-domain separated signal estimates. Such filtering can be regarded as providing an “audio skin” for a recovered signal.

Type: Application

Filed: December 1, 2006

Publication date: July 5, 2007

Inventor: Hagai Attias
Method of speech recognition using time-dependent interpolation and hidden dynamic value classes

Patent number: 7206741

Abstract: A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.

Type: Grant

Filed: December 6, 2005

Date of Patent: April 17, 2007

Assignee: Microsoft Corporation

Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
Microphone array signal enhancement using mixture models

Patent number: 7103541

Abstract: A system and method facilitating signal enhancement utilizing mixture models is provided. The invention includes a signal enhancement adaptive system having a speech model, a noise model and a plurality of adaptive filter parameters. The signal enhancement adaptive system employs probabilistic modeling to perform signal enhancement of a plurality of windowed, frequency-transformed input signals received, for example, from an array of microphones. The signal enhancement adaptive system incorporates information about the statistical structure of speech signals. The signal enhancement adaptive system can be embedded in an overall enhancement system which also includes components of signal windowing and frequency transformation.

Type: Grant

Filed: June 27, 2002

Date of Patent: September 5, 2006

Assignee: Microsoft Corporation

Inventors: Hagai Attias, Li Deng
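
The windowing and frequency-transformation stage mentioned in this abstract can be sketched on its own; the mixture-model enhancement itself is not shown. The sketch assumes a periodic Hann window, a naive DFT (for clarity, not speed), and hypothetical frame and hop sizes; in a microphone-array setting it would be applied to each channel separately.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_frames(signal, frame_len, hop):
    """Split one channel into overlapping windowed frames and return their DFTs.

    Only the non-negative frequency bins 0..frame_len // 2 are kept,
    since the input is real-valued.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spectrum = [
            sum(seg[t] * cmath.exp(-2j * math.pi * k * t / frame_len) for t in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames
```

For a sinusoid that completes exactly 8 cycles per 64-sample frame, every frame's magnitude spectrum peaks at bin 8, with the Hann window spreading some energy into bins 7 and 9. A practical front end would use an FFT library instead of the quadratic-time DFT loop.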
Managing media objects in a database

Patent number: 7076503

Abstract: A method and apparatus are provided for organizing media objects in a database using contextual information for a media object and known media objects, categories, indexes and searches, to arrive at an inference for cataloging the media object in a database. The media object may then be cataloged in the database according to the inference. A method and apparatus are provided for clustering media objects by forming groups of unlabeled data and applying a distance metric to said group.

Type: Grant

Filed: December 19, 2001

Date of Patent: July 11, 2006

Assignee: Microsoft Corporation

Inventors: John Carlton Platt, Jonathan Kagle, Hagai Attias, Victoria Elizabeth Milton
Method of speech recognition using time-dependent interpolation and hidden dynamic value classes

Patent number: 7050975

Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.

Type: Grant

Filed: October 9, 2002

Date of Patent: May 23, 2006

Assignee: Microsoft Corporation

Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
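
The interpolation step in this abstract can be written as z_t = r_t · z_(t−1) + (1 − r_t) · T, where z is the hidden production-related value, T is the target for the current phonological unit, and r_t is the time-dependent weight. The sketch below assumes scalar values, takes the weights r_t as given (the abstract does not specify their form), and uses a hypothetical linear mapping gain · z + bias with Gaussian noise of variance var to score observed acoustics. Real systems would predict acoustic feature vectors, typically through a nonlinear mapping.

```python
import math


def hidden_trajectory(z0, target, weights):
    """Interpolate toward the target: z_t = r_t * z_{t-1} + (1 - r_t) * target."""
    z, trajectory = z0, []
    for r in weights:
        z = r * z + (1.0 - r) * target
        trajectory.append(z)
    return trajectory


def acoustic_log_likelihood(observed, trajectory, gain, bias, var):
    """Gaussian log-likelihood of observed values given predictions gain * z + bias."""
    ll = 0.0
    for o, z in zip(observed, trajectory):
        predicted = gain * z + bias
        ll += -0.5 * math.log(2.0 * math.pi * var) - (o - predicted) ** 2 / (2.0 * var)
    return ll
```

With z0 = 0, target = 1, and a constant weight of 0.5, the trajectory is [0.5, 0.75, 0.875], approaching the target geometrically. In a decoder, the likelihood from such a trajectory would be combined with a path score, as the abstract describes.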
Method of speech recognition using time-dependent interpolation and hidden dynamic value classes

Publication number: 20060085191

Abstract: A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.

Type: Application

Filed: December 6, 2005

Publication date: April 20, 2006

Applicant: Microsoft Corporation

Inventors: Li Deng, Jian-lai Zhou, Frank Seide, Asela Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
