Patents by Inventor Zhengyou Zhang
Zhengyou Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8180465
Abstract: A system that facilitates managing resources (e.g., functionality, services) based at least in part upon an established context. More particularly, a context determination component can be employed to establish a context by processing sensor inputs or learning/inferring a user action/preference. Once the context is established via the context determination component, a power/mode management component can be employed to activate and/or mask resources in accordance with the established context. The power and mode management of the device can extend the life of a power source (e.g., battery) and mask functionality in accordance with a user and/or device state.
Type: Grant
Filed: January 15, 2008
Date of Patent: May 15, 2012
Assignee: Microsoft Corporation
Inventors: Michael J. Sinclair, David W. Williams, Zhengyou Zhang, Zicheng Liu
-
Patent number: 8175382
Abstract: Image enhancement techniques are described to enhance an image in accordance with a set of training images. In an implementation, an image color tone map is generated for a facial region included in an image. The image color tone map may be normalized to a color tone map for a set of training images so that the image color tone map matches the map for the training images. The normalized color tone map may be applied to the image to enhance the image in question. In further implementations, the procedure may be updated when the average color intensity in non-facial regions differs from an accumulated mean by a threshold amount.
Type: Grant
Filed: May 10, 2007
Date of Patent: May 8, 2012
Assignee: Microsoft Corporation
Inventors: Zicheng Liu, Cha Zhang, Zhengyou Zhang
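The color tone normalization described in the abstract can be approximated by matching the color statistics of the facial region to those of the training set. This is an illustrative sketch only, using simple per-channel mean/standard-deviation matching as a stand-in; the patented method builds and normalizes full color tone maps, and the function name and parameters here are hypothetical.

```python
import numpy as np

def normalize_color_tone(image, face_mask, train_mean, train_std):
    """Shift and scale each color channel so the facial region's
    statistics match training-set statistics (hypothetical helper;
    mean/std matching stands in for full color-tone-map matching)."""
    out = image.astype(np.float64).copy()
    for c in range(3):
        region = out[..., c][face_mask]          # facial-region pixels only
        mu, sigma = region.mean(), region.std() + 1e-8
        # normalize the whole channel against facial-region statistics
        out[..., c] = (out[..., c] - mu) / sigma * train_std[c] + train_mean[c]
    return np.clip(out, 0, 255).astype(np.uint8)
```

A real pipeline would also implement the abstract's update rule, re-estimating statistics when non-facial color intensity drifts from the accumulated mean by more than a threshold.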
-
Publication number: 20120105585
Abstract: A system and method are disclosed for calibrating a depth camera in a natural user interface. In general, the system obtains an objective measurement of the true distance between a capture device and one or more objects in a scene. The system then compares the true depth measurement to the depth measurement provided by the depth camera at one or more points and determines an error function describing an error in the depth camera measurement. The depth camera may then be recalibrated to correct for the error. The objective measurement of distance to one or more objects in a scene may be accomplished by a variety of systems and methods.
Type: Application
Filed: November 3, 2010
Publication date: May 3, 2012
Applicant: Microsoft Corporation
Inventors: Prafulla J. Masalkar, Szymon P. Stachniak, Tommer Leyvand, Zhengyou Zhang, Leonardo Del Castillo, Zsolt Mathe
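The error-function idea in this abstract can be sketched with a simple model: fit the relationship between the camera's reported depths and objectively measured true depths at reference points, then invert it to correct new readings. A linear error model is an assumption made here for illustration; the publication leaves the form of the error function open.

```python
import numpy as np

def fit_depth_correction(true_depths, measured_depths):
    """Fit a linear error model measured ~ a*true + b from reference
    measurements and return a correction function (illustrative
    sketch; the actual error function may be nonlinear)."""
    a, b = np.polyfit(true_depths, measured_depths, 1)
    def correct(measured):
        # invert the fitted error model to recover true depth
        return (np.asarray(measured, dtype=float) - b) / a
    return correct
```

In use, the reference depths would come from whatever objective measurement system is available, and `correct` would be applied to subsequent depth frames.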
-
Patent number: 8099288
Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, and uses the acoustical model of a speaker-independent speech recognizer as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), which is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.
Type: Grant
Filed: February 12, 2007
Date of Patent: January 17, 2012
Assignee: Microsoft Corp.
Inventors: Zhengyou Zhang, Amarnag Subramaya
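The scoring step described in the abstract, combining per-sub-unit likelihood ratios into one utterance-level decision statistic, can be sketched as a normalized weighted sum. The scores and weights below are hypothetical inputs; in the patented system the log-likelihood ratios come from the recognizer's acoustic models (claimed speaker vs. speaker-independent background model).

```python
def verification_score(sub_unit_llrs, weights):
    """Combine per-sub-unit (word/tri-phone/phone) log-likelihood
    ratios into a single utterance score via a normalized weighted
    sum (illustrative sketch of the combination step only)."""
    assert len(sub_unit_llrs) == len(weights) and sum(weights) > 0
    total = sum(w * s for w, s in zip(weights, sub_unit_llrs))
    return total / sum(weights)  # normalize so the threshold is length-invariant
```

The final accept/reject decision would compare this score against a tuned threshold, just as a conventional utterance-level LRT would.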
-
Publication number: 20110311137
Abstract: Described is a hierarchical filtered motion field technology, such as for use in recognizing actions in videos with crowded backgrounds. Interest points are detected, e.g., as 2D Harris corners with recent motion, e.g., locations with high intensities in a motion history image (MHI). A global spatial motion smoothing filter is applied to the gradients of the MHI to eliminate low-intensity corners that are likely isolated, unreliable, or noisy motions. At each remaining interest point, a local motion field filter is applied to the smoothed gradients by computing a structure proximity between sets of pixels in the local region and the interest point. The motion at a pixel/pixel set is enhanced or weakened based on its structure proximity with the interest point (nearer pixels are enhanced).
Type: Application
Filed: June 22, 2010
Publication date: December 22, 2011
Applicant: Microsoft Corporation
Inventors: Zicheng Liu, Yingli Tian, Liangliang Cao, Zhengyou Zhang
-
Publication number: 20110307260
Abstract: Gender recognition is performed using two or more modalities. For example, depth image data and one or more types of data other than depth image data are received. The data pertains to a person. The different types of data are fused together to automatically determine the gender of the person. A computing system can subsequently interact with the person based on the determination of gender.
Type: Application
Filed: June 11, 2010
Publication date: December 15, 2011
Inventors: Zhengyou Zhang, Alex Aben-Athar Kipman
-
Patent number: 8079079
Abstract: A multimodal system that employs a plurality of sensing modalities which can be processed concurrently to increase confidence in connection with authentication. The multimodal system and/or set of various devices can provide several points of information entry in connection with authentication. Authentication can be improved, for example, by combining face recognition, biometrics, speech recognition, handwriting recognition, gait recognition, retina scan, thumb/hand prints, or subsets thereof. Additionally, portable multimodal devices (e.g., a smartphone) can be used as credit cards, and authentication in connection with such use can mitigate unauthorized transactions.
Type: Grant
Filed: June 29, 2005
Date of Patent: December 13, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, David W. Williams, Yuan Kong, Zicheng Liu, David Kurlander, Mike Sinclair
-
Publication number: 20110299741
Abstract: Multiple images including a face presented by a user are accessed. One or more determinations are made based on the multiple images, such as a determination of whether the face included in the multiple images is a 3-dimensional structure or a flat surface and/or a determination of whether motion is present in one or more face components (e.g., eyes or mouth). If it is determined that the face included in the multiple images is a 3-dimensional structure or that motion is present in the one or more face components, then an indication is provided that the user can be authenticated. However, if it is determined that the face included in the multiple images is a flat surface or that motion is not present in the one or more face components, then an indication is provided that the user cannot be authenticated.
Type: Application
Filed: June 8, 2010
Publication date: December 8, 2011
Applicant: Microsoft Corporation
Inventors: Zhengyou Zhang, Qin Cai, Pieter R. Kasselman, Arthur H. Baker
-
Patent number: 8073125
Abstract: Audio in an audio conference is spatialized using either virtual sound-source positioning or sound-field capture. A spatial audio conference is provided between local and remote parties using audio conferencing devices (ACDs) interconnected by a network. Each ACD captures spatial audio information from the local party, generates either one, or three or more, audio data streams which include the captured information, and transmits the generated stream(s) to each remote party. Each ACD also receives the generated audio data stream(s) transmitted from each of the remote parties, processes the received streams to generate a plurality of audio signals, and renders the signals to produce a sound-field that is perceived by the local party, where the sound-field includes the spatial audio information captured from the remote parties. A sound-field capture device is also provided which includes at least three directional microphones symmetrically configured about a center axis in a semicircular array.
Type: Grant
Filed: September 25, 2007
Date of Patent: December 6, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, James Johnston
-
Publication number: 20110295392
Abstract: Reaction information of participants to an interaction may be sensed and analyzed to determine one or more reactions or dispositions of the participants. Feedback may be provided based on the determined reactions. The participants may be given an opportunity to opt in to having their reaction information collected, and may be provided complete control over how their reaction information is shared or used.
Type: Application
Filed: May 27, 2010
Publication date: December 1, 2011
Applicant: Microsoft Corporation
Inventors: Sharon K. Cunnington, Rajesh K. Hegde, Kori Quinn, Jin Li, Philip A. Chou, Zhengyou Zhang, Desney S. Tan
-
Publication number: 20110267419
Abstract: Techniques for recording and replay of a live conference while still attending the live conference are described. A conferencing system includes a user interface generator, a live conference processing module, and a replay processing module. The user interface generator is configured to generate a user interface that includes a replay control panel and one or more output panels. The live conference processing module is configured to extract information included in received conferencing data that is associated with one or more conferencing modalities, and to display the information in the one or more output panels in a live manner (e.g., as a live conference). The replay processing module is configured, when a replay mode is selected in the replay control panel, to present information associated with the one or more conferencing modalities from a point in the conference session prior to live, at a desired rate that may differ from the real-time rate.
Type: Application
Filed: April 30, 2010
Publication date: November 3, 2011
Applicant: Microsoft Corporation
Inventors: Kori Inkpen Quinn, Rajesh Hegde, Zhengyou Zhang, John Tang, Sasa Junuzovic, Christopher Brooks
-
Patent number: 8031967
Abstract: A video noise reduction technique is presented. Generally, the technique involves first decomposing each frame of the video into low-pass and high-pass frequency components. Then, for each frame of the video after the first frame, an estimate of a noise variance in the high pass component is obtained. The noise in the high pass component of each pixel of each frame is reduced using the noise variance estimate obtained for the frame under consideration, whenever there has been no substantial motion exhibited by the pixel since the last previous frame. Evidence of motion is determined by analyzing the high and low pass components.
Type: Grant
Filed: June 19, 2007
Date of Patent: October 4, 2011
Assignee: Microsoft Corporation
Inventors: Cha Zhang, Zhengyou Zhang, Zicheng Liu
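The general flow of the abstract, split each frame into low- and high-pass parts and attenuate the high-pass part only where no substantial motion is detected, can be sketched as below. This is a minimal illustration, not the patented method: a 3x3 box blur stands in for the low-pass decomposition, and a fixed shrinkage factor stands in for the noise-variance-driven reduction; the threshold and factor are illustrative values.

```python
import numpy as np

def box_blur(frame):
    """3x3 mean filter via edge padding; serves as the low-pass part."""
    p = np.pad(frame, 1, mode="edge")
    h, w = frame.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def denoise_frame(frame, prev_frame, motion_thresh=10.0, shrink=0.5):
    """Attenuate high-frequency content only at pixels whose low-pass
    components show no substantial change since the previous frame
    (a crude motion test standing in for the patent's analysis)."""
    low, prev_low = box_blur(frame), box_blur(prev_frame)
    high = frame - low
    static = np.abs(low - prev_low) < motion_thresh   # no-motion mask
    high[static] *= shrink                            # shrink noise where static
    return low + high
```

Pixels judged to be in motion keep their full high-pass content, so moving detail is not smeared while static noise is suppressed.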
-
Patent number: 8009880
Abstract: A subregion-based image parameter recovery system and method for recovering image parameters from a single image containing a face taken under sub-optimal illumination conditions. The recovered image parameters (including albedo, illumination, and face geometry) can be used to generate face images under a new lighting environment. The method includes dividing the face in the image into numerous smaller regions, generating an albedo morphable model for each region, and using a Markov Random Fields (MRF)-based framework to model the spatial dependence between neighboring regions. Different types of regions are defined, including saturated, shadow, regular, and occluded regions. Each pixel in the image is classified and assigned to a region based on intensity, and then weighted based on its classification.
Type: Grant
Filed: May 11, 2007
Date of Patent: August 30, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, Zicheng Liu, Gang Hua, Yang Wang
-
Patent number: 7991607
Abstract: Architecture that combines capture and translation of concepts, goals, needs, locations, objects, and items (e.g., sign text) into complete conversational utterances that take a translation of the item, and morph it with fluidity into sets of sentences that can be echoed to a user, and that the user can select to communicate speech (or textual utterances). A plurality of modalities that process images, audio, video, searches and cultural context, for example, which are representative of at least context and/or content, can be employed to glean additional information regarding a communications exchange to facilitate more accurate and efficient translation. Gesture recognition can be utilized to enhance input recognition, urgency, and/or emotional interaction, for example. Speech can be used for document annotation. Moreover, translation (e.g., speech to speech, text to speech, speech to text, handwriting to speech, text or audio, ...)
Type: Grant
Filed: June 27, 2005
Date of Patent: August 2, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, David W. Williams, Yuan Kong, Zicheng Liu
-
Publication number: 20110170739
Abstract: Described is a technology by which medical patient facial images are acquired and maintained for associating with a patient's records and/or other items. A video camera may provide video frames, such as captured when a patient is being admitted to a hospital. Face detection may be employed to clip the facial part from the frame. Multiple images of a patient's face may be displayed on a user interface to allow selection of a representative image. Also described is obtaining the patient images by processing electronic documents (e.g., patient records) to look for a face pictured therein.
Type: Application
Filed: January 12, 2010
Publication date: July 14, 2011
Applicant: Microsoft Corporation
Inventors: Michael Gillam, John Christopher Gillotte, Craig Frederick Feied, Jonathan Alan Handler, Renato Reder Cazangi, Rajesh Kutpadi Hegde, Zhengyou Zhang, Cha Zhang
-
Publication number: 20110119210
Abstract: Described is multiple category learning to jointly train a plurality of classifiers in an iterative manner. Each training iteration associates an adaptive label with each training example, in which during the iterations, the adaptive label of any example is able to be changed by the subsequent reclassification. In this manner, any mislabeled training example is corrected by the classifiers during training. The training may use a probabilistic multiple category boosting algorithm that maintains probability data provided by the classifiers, or a winner-take-all multiple category boosting algorithm that selects the adaptive label based upon the highest probability classification. The multiple category boosting training system may be coupled to a multiple instance learning mechanism to obtain the training examples. The trained classifiers may be used as weak classifiers that provide a label used to select a deep classifier for further classification, e.g., to provide a multi-view object detector.
Type: Application
Filed: November 16, 2009
Publication date: May 19, 2011
Applicant: Microsoft Corporation
Inventors: Cha Zhang, Zhengyou Zhang
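The winner-take-all relabeling step mentioned in the abstract, each example's adaptive label becomes the category whose classifier currently scores it highest, can be sketched in a few lines. The `classifiers` mapping and scoring functions here are hypothetical stand-ins; in the described system the scores would come from jointly boosted classifiers.

```python
def winner_take_all_relabel(examples, classifiers):
    """One relabeling pass: assign each example the adaptive label of
    the category whose (hypothetical) classifier gives it the highest
    probability. `classifiers` maps label -> scoring function."""
    return [max(classifiers, key=lambda lbl: classifiers[lbl](x))
            for x in examples]
```

Iterating this pass between rounds of classifier training is what lets initially mislabeled examples migrate to the category that fits them best.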
-
Publication number: 20110093820
Abstract: A gesture-based system may have default or pre-packaged gesture information, where a gesture is derived from a user's position or motion in a physical space. In other words, no controllers or devices are necessary. Depending on how a user uses his or her gesture to accomplish the task, the system may refine the properties and the gesture may become personalized. The personalized gesture information may be stored in a gesture profile and can be further updated with the latest data. The gesture-based system may use the gesture profile information for gesture recognition techniques. Further, the gesture profile may be roaming such that the gesture profile is available in a second location without requiring the system to relearn gestures that have already been personalized on behalf of the user.
Type: Application
Filed: October 19, 2009
Publication date: April 21, 2011
Applicant: Microsoft Corporation
Inventors: Zhengyou Zhang, Alex Aben-Athar Kipman, Kenneth Alan Lobb, Joseph Reginald Scott Molnar
-
Patent number: 7930178
Abstract: A frame of a speech signal is converted into the spectral domain to identify a plurality of frequency components and an energy value for the frame is determined. The plurality of frequency components is divided by the energy value for the frame to form energy-normalized frequency components. A model is then constructed from the energy-normalized frequency components and can be used for speech recognition and speech enhancement.
Type: Grant
Filed: December 23, 2005
Date of Patent: April 19, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, Alejandro Acero, Amarnag Subramanya, Zicheng Liu
-
Patent number: 7924655
Abstract: An energy-based technique to estimate the positions of people speaking from an ad hoc network of microphones. The present technique does not require accurate synchronization of the microphones. In addition, a technique to normalize the gains of the microphones based on people's speech is presented, which allows aggregation of various audio channels from the ad hoc microphone network into a single stream for audio conferencing. The technique is invariant to the speakers' volumes, thus making the system easy to deploy in practice.
Type: Grant
Filed: January 16, 2007
Date of Patent: April 12, 2011
Assignee: Microsoft Corp.
Inventors: Zicheng Liu, Zhengyou Zhang, Li-wei He, Philip A. Chou, Minghua Chen
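The gain normalization mentioned in this abstract, scaling ad hoc microphone channels so they can be aggregated into one conferencing stream, can be sketched as RMS-energy equalization. This is a simplified illustration under stated assumptions: the patented method estimates gains from detected speech segments, whereas this sketch uses all samples of each channel.

```python
import numpy as np

def normalize_gains(channels, energy_floor=1e-6):
    """Scale each microphone channel so its RMS energy matches the
    loudest channel, approximating speech-based gain normalization
    (illustrative; real gains should be estimated on speech only)."""
    rms = [max(float(np.sqrt(np.mean(c ** 2))), energy_floor) for c in channels]
    target = max(rms)  # reference level: the highest-energy channel
    return [c * (target / r) for c, r in zip(channels, rms)]
```

After such normalization, relative per-channel energies reflect speaker-to-microphone distance rather than microphone gain, which is what makes energy-based position estimation workable.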
-
Publication number: 20110063403
Abstract: Techniques and technologies for tracking a face with a plurality of cameras wherein a geometry between the cameras is initially unknown. One disclosed method includes detecting a head with two of the cameras and registering a head model with the image of the head (as detected by one of the cameras). The method also includes back projecting the other detected face image to the head model and determining a head pose from the back-projected head image. Furthermore, the determined geometry is used to track the face with at least one of the cameras.
Type: Application
Filed: September 16, 2009
Publication date: March 17, 2011
Applicant: Microsoft Corporation
Inventors: Zhengyou Zhang, Aswin Sankaranarayanan, Qing Zhang, Zicheng Liu, Qin Cai