Patents by Inventor Cha Zhang

Cha Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20110313766
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Application
    Filed: August 30, 2011
    Publication date: December 22, 2011
    Applicant: Microsoft Corporation
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Publication number: 20110268281
    Abstract: Described are systems and methods performed by computer to reduce crosstalk produced by loudspeakers when rendering binaural sound that is emitted from the loudspeakers into a room. The room may have sound-reflecting surfaces that reflect some of the sound produced by the loudspeakers. To reduce crosstalk, a room model stored by the computer is accessed. The room model models at least sound reflected by one or more of the physical surfaces. The room model is used to calculate a model of an audio channel from the loudspeakers to a listener. The model of the audio channel models sound transmission from the loudspeakers to the listener. The computer uses the model of the audio channel to cancel crosstalk from the loudspeakers when rendering the binaural sound.
    Type: Application
    Filed: April 30, 2010
    Publication date: November 3, 2011
    Applicant: Microsoft Corporation
    Inventors: Dinei A. Florencio, Cha Zhang, Myung-Suk Song
  • Patent number: 8031967
    Abstract: A video noise reduction technique is presented. Generally, the technique involves first decomposing each frame of the video into low-pass and high-pass frequency components. Then, for each frame of the video after the first frame, an estimate of a noise variance in the high-pass component is obtained. The noise in the high-pass component of each pixel of each frame is reduced using the noise variance estimate obtained for the frame under consideration, whenever there has been no substantial motion exhibited by the pixel since the previous frame. Evidence of motion is determined by analyzing the high-pass and low-pass components.
    Type: Grant
    Filed: June 19, 2007
    Date of Patent: October 4, 2011
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Zhengyou Zhang, Zicheng Liu
  • Patent number: 8024189
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Grant
    Filed: June 22, 2006
    Date of Patent: September 20, 2011
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Patent number: 8010471
    Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system.
    Type: Grant
    Filed: July 13, 2007
    Date of Patent: August 30, 2011
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul Viola
  • Publication number: 20110170739
    Abstract: Described is a technology by which medical patient facial images are acquired and maintained for associating with a patient's records and/or other items. A video camera may provide video frames, such as captured when a patient is being admitted to a hospital. Face detection may be employed to clip the facial part from the frame. Multiple images of a patient's face may be displayed on a user interface to allow selection of a representative image. Also described is obtaining the patient images by processing electronic documents (e.g., patient records) to look for a face pictured therein.
    Type: Application
    Filed: January 12, 2010
    Publication date: July 14, 2011
    Applicant: Microsoft Corporation
    Inventors: Michael Gillam, John Christopher Gillotte, Craig Frederick Feied, Jonathan Alan Handler, Renato Reder Cazangi, Rajesh Kutpadi Hegde, Zhengyou Zhang, Cha Zhang
  • Publication number: 20110119210
    Abstract: Described is multiple category learning to jointly train a plurality of classifiers in an iterative manner. Each training iteration associates an adaptive label with each training example, in which, during the iterations, the adaptive label of any example can be changed by subsequent reclassification. In this manner, any mislabeled training example is corrected by the classifiers during training. The training may use a probabilistic multiple category boosting algorithm that maintains probability data provided by the classifiers, or a winner-take-all multiple category boosting algorithm that selects the adaptive label based upon the highest probability classification. The multiple category boosting training system may be coupled to a multiple instance learning mechanism to obtain the training examples. The trained classifiers may be used as weak classifiers that provide a label used to select a deep classifier for further classification, e.g., to provide a multi-view object detector.
    Type: Application
    Filed: November 16, 2009
    Publication date: May 19, 2011
    Applicant: c/o Microsoft Corporation
    Inventors: Cha Zhang, Zhengyou Zhang
  • Patent number: 7890443
    Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system.
    Type: Grant
    Filed: July 13, 2007
    Date of Patent: February 15, 2011
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul Viola
  • Patent number: 7885463
    Abstract: A spatial-color Gaussian mixture model (SCGMM) image segmentation technique for segmenting images is presented. The SCGMM image segmentation technique specifies foreground objects in the first frame of an image sequence, either manually or automatically. From the initial segmentation, the SCGMM segmentation system learns two spatial-color Gaussian mixture models (SCGMMs) for the foreground and background objects. These models are built into a first-order Markov random field (MRF) energy function. The minimization of the energy function leads to a binary segmentation of the images in the image sequence, which can be solved efficiently using a conventional graph cut procedure.
    Type: Grant
    Filed: March 30, 2006
    Date of Patent: February 8, 2011
    Assignee: Microsoft Corp.
    Inventors: Cha Zhang, Michael Cohen, Yong Rui, Ting Yu
  • Publication number: 20100329358
    Abstract: Multi-view video that is being streamed to a remote device in real time may be encoded. Frames of a real-world scene captured by respective video cameras are received for compression. A virtual viewpoint, positioned relative to the video cameras, is used to determine expected contributions of individual portions of the frames to a synthesized image of the scene from the viewpoint position using the frames. For each frame, compression rates for individual blocks of a frame are computed based on the determined contributions of the individual portions of the frame. The frames are compressed by compressing the blocks of the frames according to their respective determined compression rates. The frames are transmitted in compressed form via a network to a remote device, which is configured to render the scene using the compressed frames.
    Type: Application
    Filed: June 25, 2009
    Publication date: December 30, 2010
    Applicant: Microsoft Corporation
    Inventors: Cha Zhang, Dinei Florencio
  • Publication number: 20100329517
    Abstract: Techniques for face verification are described. Local binary pattern (LBP) features and boosting classifiers are used to verify faces in images. A boosted multi-task learning algorithm is used for face verification in images. Finally, boosted face verification is used to verify faces in videos.
    Type: Application
    Filed: June 26, 2009
    Publication date: December 30, 2010
    Applicant: Microsoft Corporation
    Inventors: Cha Zhang, Xiaogang Wang, Zhengyou Zhang
  • Patent number: 7840638
    Abstract: A multimedia conference technique is disclosed that allows physically remote users to participate in an immersive telecollaborative environment by synchronizing multiple data, images and sounds. The multimedia conference implementation provides users with the perception of being in the same room visually as well as acoustically according to an orientation plan which reflects each remote user's position within the multimedia conference environment.
    Type: Grant
    Filed: June 27, 2008
    Date of Patent: November 23, 2010
    Assignee: Microsoft Corporation
    Inventors: Zhengyou Zhang, Xuedong David Huang, Zicheng Liu, Cha Zhang, Philip A. Chou, Christian Huitema
  • Publication number: 20100289904
    Abstract: Systems are disclosed that provide improved transfer speed of video data from a video capture device to a computing device using multiple video feeds respectively comprising different resolutions. A high-resolution image sensor is used to convert light images into a high-resolution video data stream. A down sampler converts the high-resolution video data stream to a low-resolution video data stream, so that both a low-resolution data stream and a high-resolution data stream are available. While the low-resolution data stream can be sent to the computing device, a digital signal processor (DSP) processes the high-resolution video data stream in accordance with an input control signal comprising desired high-resolution video stream parameters derived from the low-resolution video data stream.
    Type: Application
    Filed: May 15, 2009
    Publication date: November 18, 2010
    Applicant: Microsoft Corporation
    Inventors: Cha Zhang, Zhengyou Zhang, Zicheng Liu, Wanghong Yuan, Christian Huitema
  • Patent number: 7822696
    Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system.
    Type: Grant
    Filed: July 13, 2007
    Date of Patent: October 26, 2010
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul Viola
  • Publication number: 20100225743
    Abstract: Techniques and technologies are described herein for motion parallax three-dimensional (3D) imaging. Such techniques and technologies do not require special glasses, virtual reality helmets, or other user-attachable devices. More particularly, some of the described motion parallax 3D imaging techniques and technologies generate sequential images, including motion parallax depictions of various scenes derived from clues in views obtained of or created for the displayed scene.
    Type: Application
    Filed: March 5, 2009
    Publication date: September 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Dinei Afonso Ferreira Florencio, Cha Zhang
  • Patent number: 7783075
    Abstract: Background blurring is an effective way to both preserve privacy and keep communication effective during video conferencing. The present image background blurring technique is a lightweight real-time technique to perform background blurring using a fast background modeling procedure combined with an object (e.g., face) detector/tracker. A soft decision is made at each pixel whether it belongs to the foreground or the background based on multiple vision features. The classification results are mapped to a per-pixel blurring radius image to blur the background. In another embodiment, the image background blurring technique blurs the background of the image without using the object detector.
    Type: Grant
    Filed: June 7, 2006
    Date of Patent: August 24, 2010
    Assignee: Microsoft Corp.
    Inventors: Cha Zhang, Li-wei He, Yong Rui
  • Publication number: 20100085416
    Abstract: Multi-device capture and spatial browsing of conferences is described. In one implementation, a system detects cameras and microphones, such as the webcams on participants' notebook computers, in a conference room, group meeting, or table game, and enlists an ad-hoc array of available devices to capture each participant and the spatial relationships between participants. A video stream composited from the array is browsable by a user to navigate a 3-dimensional representation of the meeting. Each participant may be represented by a video pane, a foreground object, or a 3-D geometric model of the participant's face or body displayed in spatial relation to the other participants in a 3-dimensional arrangement analogous to the spatial arrangement of the meeting.
    Type: Application
    Filed: October 6, 2008
    Publication date: April 8, 2010
    Applicant: Microsoft Corporation
    Inventors: Rajesh K. Hegde, Zhengyou Zhang, Philip A. Chou, Cha Zhang, Zicheng Liu, Sasa Junuzovic
  • Publication number: 20090327418
    Abstract: A multimedia conference technique is disclosed that allows physically remote users to participate in an immersive telecollaborative environment by synchronizing multiple data, images and sounds. The multimedia conference implementation provides users with the perception of being in the same room visually as well as acoustically according to an orientation plan which reflects each remote user's position within the multimedia conference environment.
    Type: Application
    Filed: June 27, 2008
    Publication date: December 31, 2009
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Xuedong David Huang, Zicheng Liu, Cha Zhang, Philip A. Chou, Christian Huitema
  • Publication number: 20090263010
    Abstract: A classifier is trained on a first set of examples, and the trained classifier is adapted to perform on a second set of examples. The classifier implements a parameterized labeling function. Initial training of the classifier optimizes the labeling function's parameters to minimize a cost function. The classifier and its parameters are provided to an environment in which it will operate, along with an approximation function that approximates the cost function using a compact representation of the first set of examples in place of the actual first set. A second set of examples is collected, and the parameters are modified to minimize a combined cost of labeling the first and second sets of examples. The part of the combined cost that represents the cost of the modified parameters applied to the first set is calculated using the approximation function.
    Type: Application
    Filed: April 18, 2008
    Publication date: October 22, 2009
    Applicant: Microsoft Corporation
    Inventors: Cha Zhang, Zhengyou Zhang
  • Publication number: 20090251594
    Abstract: Videos are retargeted to a target display for viewing with little to no geometric distortion or video information loss. Salient regions of video frames may be determined using scale-space spatiotemporal information. Video information loss may be a result of spatial loss, due to cropping, and resolution loss, due to resizing. A desired cropping window may be determined using a coarse-to-fine searching strategy. Video frames may be cropped with a window that matches an aspect ratio of the target display, and resized isotropically to match a size of the target display.
    Type: Application
    Filed: April 2, 2008
    Publication date: October 8, 2009
    Applicant: Microsoft Corporation
    Inventors: Gang Hua, Cha Zhang, Zhengyou Zhang, Zicheng Liu, Ying Shan
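
The abstract of publication 20090251594 above describes cropping video frames with a window that matches the target display's aspect ratio, locating that window with a coarse-to-fine search, and then resizing isotropically. The following is a minimal Python sketch of those three steps, not the method claimed in the application. All function names are hypothetical, the per-column saliency scores are assumed to be given as a plain list (the application's scale-space spatiotemporal saliency is not reproduced here), and the search is simplified to one dimension.

```python
def crop_size_for_aspect(src_w, src_h, dst_w, dst_h):
    """Largest window inside the source frame with the target's aspect ratio."""
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * dst_h >= src_h * dst_w:
        # Source is at least as wide as the target: keep full height, trim width.
        return src_h * dst_w // dst_h, src_h
    # Source is narrower than the target: keep full width, trim height.
    return src_w, src_w * dst_h // dst_w


def isotropic_scale(crop_w, crop_h, dst_w, dst_h):
    """One scale factor for both axes, so the crop is resized without distortion."""
    return min(dst_w / crop_w, dst_h / crop_h)


def best_offset(column_saliency, crop_w, coarse_step=8):
    """Coarse-to-fine search for the horizontal crop offset with the most saliency.

    Assumption: saliency has already been summed per column. The coarse pass
    samples every `coarse_step` columns, and the fine pass checks every offset
    near the best coarse candidate. This is a heuristic and can miss the
    global optimum.
    """
    # Prefix sums make each window sum an O(1) lookup.
    prefix = [0]
    for value in column_saliency:
        prefix.append(prefix[-1] + value)

    def window_sum(x):
        return prefix[x + crop_w] - prefix[x]

    max_x = len(column_saliency) - crop_w
    coarse = list(range(0, max_x + 1, coarse_step))
    if coarse[-1] != max_x:
        coarse.append(max_x)
    best = max(coarse, key=window_sum)

    lo = max(0, best - coarse_step + 1)
    hi = min(max_x, best + coarse_step - 1)
    return max(range(lo, hi + 1), key=window_sum)


if __name__ == "__main__":
    crop_w, crop_h = crop_size_for_aspect(1920, 1080, 320, 240)
    scale = isotropic_scale(crop_w, crop_h, 320, 240)
    print(crop_w, crop_h, round(scale, 4))
```

For a 1920x1080 frame and a 320x240 target, the crop window is 1440x1080 and both axes are scaled by the same factor of 2/9, so no geometric distortion is introduced. The information lost is confined to the cropped columns and the reduced resolution, which are the two loss terms the abstract names.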