Patents by Inventor Zhengyou Zhang
Zhengyou Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8180465
Abstract: A system that facilitates managing resources (e.g., functionality, services) based at least in part upon an established context. More particularly, a context determination component can be employed to establish a context by processing sensor inputs or learning/inferring a user action/preference. Once the context is established via the context determination component, a power/mode management component can be employed to activate and/or mask resources in accordance with the established context. The power and mode management of the device can extend the life of a power source (e.g., battery) and mask functionality in accordance with a user and/or device state.
Type: Grant
Filed: January 15, 2008
Date of Patent: May 15, 2012
Assignee: Microsoft Corporation
Inventors: Michael J. Sinclair, David W. Williams, Zhengyou Zhang, Zicheng Liu
-
Patent number: 8175382
Abstract: Image enhancement techniques are described to enhance an image in accordance with a set of training images. In an implementation, an image color tone map is generated for a facial region included in an image. The image color tone map may be normalized to a color tone map for a set of training images so that the image color tone map matches the map for the training images. The normalized color tone map may be applied to the image to enhance the image in question. In further implementations, the procedure may be updated when the average color intensity in non-facial regions differs from an accumulated mean by a threshold amount.
Type: Grant
Filed: May 10, 2007
Date of Patent: May 8, 2012
Assignee: Microsoft Corporation
Inventors: Zicheng Liu, Cha Zhang, Zhengyou Zhang
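The color tone normalization described in the abstract can be approximated by matching the color statistics of the facial region to those of the training set. This is an illustrative sketch only, using simple per-channel mean/standard-deviation matching as a stand-in; the patented method builds and normalizes full color tone maps, and the function name and parameters here are hypothetical.

```python
import numpy as np

def normalize_color_tone(image, face_mask, train_mean, train_std):
    """Shift and scale each color channel so the facial region's
    statistics match training-set statistics (hypothetical helper;
    mean/std matching stands in for full color-tone-map matching)."""
    out = image.astype(np.float64).copy()
    for c in range(3):
        region = out[..., c][face_mask]          # facial-region pixels only
        mu, sigma = region.mean(), region.std() + 1e-8
        # normalize the whole channel against facial-region statistics
        out[..., c] = (out[..., c] - mu) / sigma * train_std[c] + train_mean[c]
    return np.clip(out, 0, 255).astype(np.uint8)
```

A real pipeline would also implement the abstract's update rule, re-estimating statistics when non-facial color intensity drifts from the accumulated mean by more than a threshold.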
-
Publication number: 20120105585
Abstract: A system and method are disclosed for calibrating a depth camera in a natural user interface. In general, the system obtains an objective measurement of the true distance between a capture device and one or more objects in a scene. The system then compares the true depth measurement to the depth measurement provided by the depth camera at one or more points and determines an error function describing an error in the depth camera measurement. The depth camera may then be recalibrated to correct for the error. The objective measurement of distance to one or more objects in a scene may be accomplished by a variety of systems and methods.
Type: Application
Filed: November 3, 2010
Publication date: May 3, 2012
Applicant: Microsoft Corporation
Inventors: Prafulla J. Masalkar, Szymon P. Stachniak, Tommer Leyvand, Zhengyou Zhang, Leonardo Del Castillo, Zsolt Mathe
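The error-function idea in this abstract can be sketched with a simple model: fit the relationship between the camera's reported depths and objectively measured true depths at reference points, then invert it to correct new readings. A linear error model is an assumption made here for illustration; the publication leaves the form of the error function open.

```python
import numpy as np

def fit_depth_correction(true_depths, measured_depths):
    """Fit a linear error model measured ~ a*true + b from reference
    measurements and return a correction function (illustrative
    sketch; the actual error function may be nonlinear)."""
    a, b = np.polyfit(true_depths, measured_depths, 1)
    def correct(measured):
        # invert the fitted error model to recover true depth
        return (np.asarray(measured, dtype=float) - b) / a
    return correct
```

In use, the reference depths would come from whatever objective measurement system is available, and `correct` would be applied to subsequent depth frames.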
-
Patent number: 8099288
Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, and uses the acoustical model of a speaker-independent speech recognizer as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), which is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.
Type: Grant
Filed: February 12, 2007
Date of Patent: January 17, 2012
Assignee: Microsoft Corp.
Inventors: Zhengyou Zhang, Amarnag Subramaya
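The scoring step described in the abstract, combining per-sub-unit likelihood ratios into one utterance-level decision statistic, can be sketched as a normalized weighted sum. The scores and weights below are hypothetical inputs; in the patented system the log-likelihood ratios come from the recognizer's acoustic models (claimed speaker vs. speaker-independent background model).

```python
def verification_score(sub_unit_llrs, weights):
    """Combine per-sub-unit (word/tri-phone/phone) log-likelihood
    ratios into a single utterance score via a normalized weighted
    sum (illustrative sketch of the combination step only)."""
    assert len(sub_unit_llrs) == len(weights) and sum(weights) > 0
    total = sum(w * s for w, s in zip(weights, sub_unit_llrs))
    return total / sum(weights)  # normalize so the threshold is length-invariant
```

The final accept/reject decision would compare this score against a tuned threshold, just as a conventional utterance-level LRT would.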
-
Publication number: 20110311137
Abstract: Described is a hierarchical filtered motion field technology, such as for use in recognizing actions in videos with crowded backgrounds. Interest points are detected, e.g., as 2D Harris corners with recent motion, e.g., locations with high intensities in a motion history image (MHI). A global spatial motion smoothing filter is applied to the gradients of the MHI to eliminate low-intensity corners that are likely isolated, unreliable, or noisy motions. At each remaining interest point, a local motion field filter is applied to the smoothed gradients by computing a structure proximity between sets of pixels in the local region and the interest point. The motion at a pixel/pixel set is enhanced or weakened based on its structure proximity with the interest point (nearer pixels are enhanced).
Type: Application
Filed: June 22, 2010
Publication date: December 22, 2011
Applicant: Microsoft Corporation
Inventors: Zicheng Liu, Yingli Tian, Liangliang Cao, Zhengyou Zhang
-
Publication number: 20110307260
Abstract: Gender recognition is performed using two or more modalities. For example, depth image data and one or more types of data other than depth image data are received. The data pertains to a person. The different types of data are fused together to automatically determine the gender of the person. A computing system can subsequently interact with the person based on the determination of gender.
Type: Application
Filed: June 11, 2010
Publication date: December 15, 2011
Inventors: Zhengyou Zhang, Alex Aben-Athar Kipman
-
Patent number: 8079079
Abstract: A multimodal system that employs a plurality of sensing modalities which can be processed concurrently to increase confidence in connection with authentication. The multimodal system and/or set of various devices can provide several points of information entry in connection with authentication. Authentication can be improved, for example, by combining face recognition, biometrics, speech recognition, handwriting recognition, gait recognition, retina scan, thumb/hand prints, or subsets thereof. Additionally, portable multimodal devices (e.g., a smartphone) can be used as credit cards, and authentication in connection with such use can mitigate unauthorized transactions.
Type: Grant
Filed: June 29, 2005
Date of Patent: December 13, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, David W. Williams, Yuan Kong, Zicheng Liu, David Kurlander, Mike Sinclair
-
Publication number: 20110299741
Abstract: Multiple images including a face presented by a user are accessed. One or more determinations are made based on the multiple images, such as a determination of whether the face included in the multiple images is a 3-dimensional structure or a flat surface and/or a determination of whether motion is present in one or more face components (e.g., eyes or mouth). If it is determined that the face included in the multiple images is a 3-dimensional structure or that motion is present in the one or more face components, then an indication is provided that the user can be authenticated. However, if it is determined that the face included in the multiple images is a flat surface or that motion is not present in the one or more face components, then an indication is provided that the user cannot be authenticated.
Type: Application
Filed: June 8, 2010
Publication date: December 8, 2011
Applicant: Microsoft Corporation
Inventors: Zhengyou Zhang, Qin Cai, Pieter R. Kasselman, Arthur H. Baker
-
Patent number: 8073125
Abstract: Audio in an audio conference is spatialized using either virtual sound-source positioning or sound-field capture. A spatial audio conference is provided between local and remote parties using audio conferencing devices (ACDs) interconnected by a network. Each ACD captures spatial audio information from the local party, generates either one, or three or more, audio data streams which include the captured information, and transmits the generated stream(s) to each remote party. Each ACD also receives the generated audio data stream(s) transmitted from each of the remote parties, processes the received streams to generate a plurality of audio signals, and renders the signals to produce a sound-field that is perceived by the local party, where the sound-field includes the spatial audio information captured from the remote parties. A sound-field capture device is also provided which includes at least three directional microphones symmetrically configured about a center axis in a semicircular array.
Type: Grant
Filed: September 25, 2007
Date of Patent: December 6, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, James Johnston
-
Publication number: 20110295392
Abstract: Reaction information of participants to an interaction may be sensed and analyzed to determine one or more reactions or dispositions of the participants. Feedback may be provided based on the determined reactions. The participants may be given an opportunity to opt in to having their reaction information collected, and may be provided complete control over how their reaction information is shared or used.
Type: Application
Filed: May 27, 2010
Publication date: December 1, 2011
Applicant: Microsoft Corporation
Inventors: Sharon K. Cunnington, Rajesh K. Hegde, Kori Quinn, Jin Li, Philip A. Chou, Zhengyou Zhang, Desney S. Tan
-
Publication number: 20110267419
Abstract: Techniques for recording and replay of a live conference while still attending the live conference are described. A conferencing system includes a user interface generator, a live conference processing module, and a replay processing module. The user interface generator is configured to generate a user interface that includes a replay control panel and one or more output panels. The live conference processing module is configured to extract information included in received conferencing data that is associated with one or more conferencing modalities, and to display the information in the one or more output panels in a live manner (e.g., as a live conference). The replay processing module is configured, when a replay mode is selected in the replay control panel, to present information associated with the one or more conferencing modalities from a point in the conference session prior to live, at a desired rate that may differ from the real-time rate.
Type: Application
Filed: April 30, 2010
Publication date: November 3, 2011
Applicant: Microsoft Corporation
Inventors: Kori Inkpen Quinn, Rajesh Hegde, Zhengyou Zhang, John Tang, Sasa Junuzovic, Christopher Brooks
-
Patent number: 8031967
Abstract: A video noise reduction technique is presented. Generally, the technique involves first decomposing each frame of the video into low-pass and high-pass frequency components. Then, for each frame of the video after the first frame, an estimate of a noise variance in the high pass component is obtained. The noise in the high pass component of each pixel of each frame is reduced using the noise variance estimate obtained for the frame under consideration, whenever there has been no substantial motion exhibited by the pixel since the last previous frame. Evidence of motion is determined by analyzing the high and low pass components.
Type: Grant
Filed: June 19, 2007
Date of Patent: October 4, 2011
Assignee: Microsoft Corporation
Inventors: Cha Zhang, Zhengyou Zhang, Zicheng Liu
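The general flow of the abstract, split each frame into low- and high-pass parts and attenuate the high-pass part only where no substantial motion is detected, can be sketched as below. This is a minimal illustration, not the patented method: a 3x3 box blur stands in for the low-pass decomposition, and a fixed shrinkage factor stands in for the noise-variance-driven reduction; the threshold and factor are illustrative values.

```python
import numpy as np

def box_blur(frame):
    """3x3 mean filter via edge padding; serves as the low-pass part."""
    p = np.pad(frame, 1, mode="edge")
    h, w = frame.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def denoise_frame(frame, prev_frame, motion_thresh=10.0, shrink=0.5):
    """Attenuate high-frequency content only at pixels whose low-pass
    components show no substantial change since the previous frame
    (a crude motion test standing in for the patent's analysis)."""
    low, prev_low = box_blur(frame), box_blur(prev_frame)
    high = frame - low
    static = np.abs(low - prev_low) < motion_thresh   # no-motion mask
    high[static] *= shrink                            # shrink noise where static
    return low + high
```

Pixels judged to be in motion keep their full high-pass content, so moving detail is not smeared while static noise is suppressed.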
-
Patent number: 8009880
Abstract: A subregion-based image parameter recovery system and method for recovering image parameters from a single image containing a face taken under sub-optimal illumination conditions. The recovered image parameters (including albedo, illumination, and face geometry) can be used to generate face images under a new lighting environment. The method includes dividing the face in the image into numerous smaller regions, generating an albedo morphable model for each region, and using a Markov Random Fields (MRF)-based framework to model the spatial dependence between neighboring regions. Different types of regions are defined, including saturated, shadow, regular, and occluded regions. Each pixel in the image is classified and assigned to a region based on intensity, and then weighted based on its classification.
Type: Grant
Filed: May 11, 2007
Date of Patent: August 30, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, Zicheng Liu, Gang Hua, Yang Wang
-
Patent number: 7991607
Abstract: Architecture that combines capture and translation of concepts, goals, needs, locations, objects, and items (e.g., sign text) into complete conversational utterances that take a translation of the item, and morph it with fluidity into sets of sentences that can be echoed to a user, and that the user can select to communicate speech (or textual utterances). A plurality of modalities that process images, audio, video, searches and cultural context, for example, which are representative of at least context and/or content, can be employed to glean additional information regarding a communications exchange to facilitate more accurate and efficient translation. Gesture recognition can be utilized to enhance input recognition, urgency, and/or emotional interaction, for example. Speech can be used for document annotation. Moreover, translation (e.g., speech to speech, text to speech, speech to text, handwriting to speech, text or audio, ...)
Type: Grant
Filed: June 27, 2005
Date of Patent: August 2, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, David W. Williams, Yuan Kong, Zicheng Liu
-
Publication number: 20110170739
Abstract: Described is a technology by which medical patient facial images are acquired and maintained for associating with a patient's records and/or other items. A video camera may provide video frames, such as captured when a patient is being admitted to a hospital. Face detection may be employed to clip the facial part from the frame. Multiple images of a patient's face may be displayed on a user interface to allow selection of a representative image. Also described is obtaining the patient images by processing electronic documents (e.g., patient records) to look for a face pictured therein.
Type: Application
Filed: January 12, 2010
Publication date: July 14, 2011
Applicant: Microsoft Corporation
Inventors: Michael Gillam, John Christopher Gillotte, Craig Frederick Feied, Jonathan Alan Handler, Renato Reder Cazangi, Rajesh Kutpadi Hegde, Zhengyou Zhang, Cha Zhang
-
Publication number: 20110119210
Abstract: Described is multiple category learning to jointly train a plurality of classifiers in an iterative manner. Each training iteration associates an adaptive label with each training example, in which during the iterations, the adaptive label of any example is able to be changed by the subsequent reclassification. In this manner, any mislabeled training example is corrected by the classifiers during training. The training may use a probabilistic multiple category boosting algorithm that maintains probability data provided by the classifiers, or a winner-take-all multiple category boosting algorithm that selects the adaptive label based upon the highest probability classification. The multiple category boosting training system may be coupled to a multiple instance learning mechanism to obtain the training examples. The trained classifiers may be used as weak classifiers that provide a label used to select a deep classifier for further classification, e.g., to provide a multi-view object detector.
Type: Application
Filed: November 16, 2009
Publication date: May 19, 2011
Applicant: Microsoft Corporation
Inventors: Cha Zhang, Zhengyou Zhang
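The winner-take-all relabeling step mentioned in the abstract, each example's adaptive label becomes the category whose classifier currently scores it highest, can be sketched in a few lines. The `classifiers` mapping and scoring functions here are hypothetical stand-ins; in the described system the scores would come from jointly boosted classifiers.

```python
def winner_take_all_relabel(examples, classifiers):
    """One relabeling pass: assign each example the adaptive label of
    the category whose (hypothetical) classifier gives it the highest
    probability. `classifiers` maps label -> scoring function."""
    return [max(classifiers, key=lambda lbl: classifiers[lbl](x))
            for x in examples]
```

Iterating this pass between rounds of classifier training is what lets initially mislabeled examples migrate to the category that fits them best.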
-
Publication number: 20110093820
Abstract: A gesture-based system may have default or pre-packaged gesture information, where a gesture is derived from a user's position or motion in a physical space. In other words, no controllers or devices are necessary. Depending on how a user uses his or her gesture to accomplish the task, the system may refine the properties and the gesture may become personalized. The personalized gesture information may be stored in a gesture profile and can be further updated with the latest data. The gesture-based system may use the gesture profile information for gesture recognition techniques. Further, the gesture profile may be roaming such that the gesture profile is available in a second location without requiring the system to relearn gestures that have already been personalized on behalf of the user.
Type: Application
Filed: October 19, 2009
Publication date: April 21, 2011
Applicant: Microsoft Corporation
Inventors: Zhengyou Zhang, Alex Aben-Athar Kipman, Kenneth Alan Lobb, Joseph Reginald Scott Molnar
-
Patent number: 7930178
Abstract: A frame of a speech signal is converted into the spectral domain to identify a plurality of frequency components and an energy value for the frame is determined. The plurality of frequency components is divided by the energy value for the frame to form energy-normalized frequency components. A model is then constructed from the energy-normalized frequency components and can be used for speech recognition and speech enhancement.
Type: Grant
Filed: December 23, 2005
Date of Patent: April 19, 2011
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, Alejandro Acero, Amarnag Subramanya, Zicheng Liu
-
Patent number: 7924655
Abstract: An energy-based technique to estimate the positions of people speaking from an ad hoc network of microphones. The present technique does not require accurate synchronization of the microphones. In addition, a technique to normalize the gains of the microphones based on people's speech is presented, which allows aggregation of various audio channels from the ad hoc microphone network into a single stream for audio conferencing. The technique is invariant to the speakers' volumes, thus making the system easy to deploy in practice.
Type: Grant
Filed: January 16, 2007
Date of Patent: April 12, 2011
Assignee: Microsoft Corp.
Inventors: Zicheng Liu, Zhengyou Zhang, Li-wei He, Philip A. Chou, Minghua Chen
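The gain normalization mentioned in this abstract, scaling ad hoc microphone channels so they can be aggregated into one conferencing stream, can be sketched as RMS-energy equalization. This is a simplified illustration under stated assumptions: the patented method estimates gains from detected speech segments, whereas this sketch uses all samples of each channel.

```python
import numpy as np

def normalize_gains(channels, energy_floor=1e-6):
    """Scale each microphone channel so its RMS energy matches the
    loudest channel, approximating speech-based gain normalization
    (illustrative; real gains should be estimated on speech only)."""
    rms = [max(float(np.sqrt(np.mean(c ** 2))), energy_floor) for c in channels]
    target = max(rms)  # reference level: the highest-energy channel
    return [c * (target / r) for c, r in zip(channels, rms)]
```

After such normalization, relative per-channel energies reflect speaker-to-microphone distance rather than microphone gain, which is what makes energy-based position estimation workable.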
-
Publication number: 20110063403
Abstract: Techniques and technologies for tracking a face with a plurality of cameras wherein a geometry between the cameras is initially unknown. One disclosed method includes detecting a head with two of the cameras and registering a head model with the image of the head (as detected by one of the cameras). The method also includes back projecting the other detected face image to the head model and determining a head pose from the back-projected head image. Furthermore, the determined geometry is used to track the face with at least one of the cameras.
Type: Application
Filed: September 16, 2009
Publication date: March 17, 2011
Applicant: Microsoft Corporation
Inventors: Zhengyou Zhang, Aswin Sankaranarayanan, Qing Zhang, Zicheng Liu, Qin Cai