Patents by Inventor Cha Zhang
Cha Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20110313766
Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
Type: Application
Filed: August 30, 2011
Publication date: December 22, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
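
The classifier described above can be approximated with off-the-shelf boosting. A minimal sketch, assuming hypothetical per-window audio and video features pooled into one matrix; the feature counts, labels, and data are illustrative, not the patented feature set:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Hypothetical per-window features: 8 audio measures (e.g., sound source
# localization energies) and 16 video measures (e.g., motion/edge sums).
audio = rng.normal(size=(500, 8))
video = rng.normal(size=(500, 16))
X = np.hstack([audio, video])                 # the pooled feature set
y = (X[:, 0] + X[:, 9] > 0).astype(int)       # synthetic "person/speaker present" label

# Boosting over decision stumps plays the role of the learning algorithm that
# picks useful features from the pool, whichever modality they come from.
clf = AdaBoostClassifier(n_estimators=50)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```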
-
Publication number: 20110268281
Abstract: Described are systems and methods performed by computer to reduce crosstalk produced by loudspeakers when rendering binaural sound that is emitted from the loudspeakers into a room. The room may have sound-reflecting surfaces that reflect some of the sound produced by the loudspeakers. To reduce crosstalk, a room model stored by the computer is accessed. The room model models at least sound reflected by one or more of the physical surfaces. The room model is used to calculate a model of an audio channel from the loudspeakers to a listener. The model of the audio channel models sound transmission from the loudspeakers to the listener. The computer uses the model of the audio channel to cancel crosstalk from the loudspeakers when rendering the binaural sound.
Type: Application
Filed: April 30, 2010
Publication date: November 3, 2011
Applicant: Microsoft Corporation
Inventors: Dinei A. Florencio, Cha Zhang, Myung-Suk Song
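
One standard way to cancel crosstalk once a channel model is available is to invert the loudspeaker-to-ear transfer matrix in each frequency bin. A numerical sketch under that assumption; the 2x2 transfer matrix, regularization constant, and signals below are placeholders rather than values derived from the patent's room model:

```python
import numpy as np

n_freq = 512
rng = np.random.default_rng(1)
# H[f] is a 2x2 complex matrix per frequency bin: H[f][ear, speaker],
# assumed to come from the room/channel model.
H = rng.normal(size=(n_freq, 2, 2)) + 1j * rng.normal(size=(n_freq, 2, 2))

beta = 1e-2          # regularization to avoid boosting ill-conditioned bins
eye = np.eye(2)

# Regularized inverse per bin: C = (H^H H + beta I)^-1 H^H, so that H @ C is
# approximately the identity, i.e., each ear hears only its own channel.
C = np.empty_like(H)
for f in range(n_freq):
    Hf = H[f]
    C[f] = np.linalg.solve(Hf.conj().T @ Hf + beta * eye, Hf.conj().T)

# Apply the canceller to a binaural spectrum B[f] (2 channels) to get speaker feeds.
B = rng.normal(size=(n_freq, 2)) + 1j * rng.normal(size=(n_freq, 2))
speaker_feeds = np.einsum('fij,fj->fi', C, B)
print(speaker_feeds.shape)   # (512, 2)
```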
-
Patent number: 8031967
Abstract: A video noise reduction technique is presented. Generally, the technique involves first decomposing each frame of the video into low-pass and high-pass frequency components. Then, for each frame of the video after the first frame, an estimate of the noise variance in the high-pass component is obtained. The noise in the high-pass component of each pixel of each frame is reduced using the noise variance estimate obtained for the frame under consideration, whenever there has been no substantial motion exhibited by the pixel since the previous frame. Evidence of motion is determined by analyzing the high- and low-pass components.
Type: Grant
Filed: June 19, 2007
Date of Patent: October 4, 2011
Assignee: Microsoft Corporation
Inventors: Cha Zhang, Zhengyou Zhang, Zicheng Liu
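
A rough sketch of that pipeline on grayscale frames; the Gaussian low-pass split, the MAD-based noise estimate, and the motion test below are simplified stand-ins chosen for illustration, not the patented procedure:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def denoise_frame(frame, prev_low, motion_thresh=4.0):
    low = gaussian_filter(frame, sigma=1.5)        # low-pass component
    high = frame - low                             # high-pass component

    # Robust noise estimate for the high-pass band (median absolute deviation).
    sigma = 1.4826 * np.median(np.abs(high - np.median(high)))

    # "No substantial motion": the low-pass component barely changed since the
    # previous frame, so high-pass change there is treated as noise.
    static = np.abs(low - prev_low) < motion_thresh

    # Shrink high-pass energy on static pixels in proportion to the noise estimate.
    shrink = np.maximum(0.0, 1.0 - sigma**2 / (high**2 + 1e-6))
    denoised = low + np.where(static, high * shrink, high)
    return denoised, low

rng = np.random.default_rng(2)
frames = rng.normal(128.0, 10.0, size=(5, 64, 64))   # synthetic noisy frames
prev_low = gaussian_filter(frames[0], sigma=1.5)
for frame in frames[1:]:
    denoised, prev_low = denoise_frame(frame, prev_low)
```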
-
Patent number: 8024189
Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
Type: Grant
Filed: June 22, 2006
Date of Patent: September 20, 2011
Assignee: Microsoft Corporation
Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
-
Patent number: 8010471
Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment, “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the pruned combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system.
Type: Grant
Filed: July 13, 2007
Date of Patent: August 30, 2011
Assignee: Microsoft Corporation
Inventors: Cha Zhang, Paul Viola
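
The core of the pruning step, setting each intermediate rejection threshold so that no positive example detected by the full classifier is rejected early, can be sketched in a few lines. The scores and final threshold below are synthetic, and the per-face multiple-instance grouping is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 20, 200                                   # weak classifiers, positive examples
weak_scores = rng.normal(0.2, 1.0, size=(N, T))  # per-stage weak classifier outputs
cum = np.cumsum(weak_scores, axis=1)             # partial sums s_t(x) after each stage

final_threshold = 0.0
detected = cum[:, -1] >= final_threshold         # positives kept by the full classifier

# For each stage t, reject anything whose partial score falls below the lowest
# partial score of any still-detected positive, so pruning cannot lose them.
rejection_thresholds = cum[detected].min(axis=0)
print(rejection_thresholds.round(2))
```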
-
Publication number: 20110170739
Abstract: Described is a technology by which medical patient facial images are acquired and maintained for associating with a patient's records and/or other items. A video camera may provide video frames, such as captured when a patient is being admitted to a hospital. Face detection may be employed to clip the facial part from the frame. Multiple images of a patient's face may be displayed on a user interface to allow selection of a representative image. Also described is obtaining the patient images by processing electronic documents (e.g., patient records) to look for a face pictured therein.
Type: Application
Filed: January 12, 2010
Publication date: July 14, 2011
Applicant: Microsoft Corporation
Inventors: Michael Gillam, John Christopher Gillotte, Craig Frederick Feied, Jonathan Alan Handler, Renato Reder Cazangi, Rajesh Kutpadi Hegde, Zhengyou Zhang, Cha Zhang
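
The face-clipping step can be illustrated with any off-the-shelf detector; the sketch below uses OpenCV's stock Haar cascade purely as a stand-in (the abstract does not name a detector), and the blank frame and output filenames are placeholders:

```python
import cv2
import numpy as np

# Stock OpenCV Haar cascade used as a stand-in face detector.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# In practice `frame` would come from the admission video camera; a blank
# image is used here so the sketch runs standalone.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for i, (x, y, w, h) in enumerate(faces):
    # Clip the facial region for association with the patient record.
    cv2.imwrite(f"patient_face_{i}.png", frame[y:y + h, x:x + w])
```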
-
Publication number: 20110119210
Abstract: Described is multiple category learning to jointly train a plurality of classifiers in an iterative manner. Each training iteration associates an adaptive label with each training example, such that during the iterations the adaptive label of any example can be changed by subsequent reclassification. In this manner, any mislabeled training example is corrected by the classifiers during training. The training may use a probabilistic multiple category boosting algorithm that maintains probability data provided by the classifiers, or a winner-take-all multiple category boosting algorithm that selects the adaptive label based upon the highest probability classification. The multiple category boosting training system may be coupled to a multiple instance learning mechanism to obtain the training examples. The trained classifiers may be used as weak classifiers that provide a label used to select a deep classifier for further classification, e.g., to provide a multi-view object detector.
Type: Application
Filed: November 16, 2009
Publication date: May 19, 2011
Applicant: Microsoft Corporation
Inventors: Cha Zhang, Zhengyou Zhang
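
A toy sketch of the winner-take-all relabeling loop; logistic regression stands in for the boosted category classifiers, and the data, initial labels, and iteration count are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
K, N, D = 3, 300, 5
X = rng.normal(size=(N, D))
labels = rng.integers(0, K, size=N)        # noisy initial labels, possibly wrong

for _ in range(5):
    if np.bincount(labels, minlength=K).min() == 0:
        break                              # a category emptied out; stop relabeling
    # Train one one-vs-rest classifier per category on the current adaptive labels.
    clfs = [LogisticRegression(max_iter=200).fit(X, (labels == k).astype(int))
            for k in range(K)]
    # Winner-take-all step: each example's adaptive label moves to the category
    # whose classifier assigns it the highest probability.
    probs = np.column_stack([c.predict_proba(X)[:, 1] for c in clfs])
    labels = probs.argmax(axis=1)
```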
-
Patent number: 7890443
Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment, “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the pruned combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system.
Type: Grant
Filed: July 13, 2007
Date of Patent: February 15, 2011
Assignee: Microsoft Corporation
Inventors: Cha Zhang, Paul Viola
-
Patent number: 7885463
Abstract: A spatial-color Gaussian mixture model (SCGMM) technique for segmenting images. The SCGMM image segmentation technique specifies foreground objects in the first frame of an image sequence, either manually or automatically. From the initial segmentation, the SCGMM segmentation system learns two spatial-color Gaussian mixture models (SCGMMs) for the foreground and background objects. These models are built into a first-order Markov random field (MRF) energy function. The minimization of the energy function leads to a binary segmentation of the images in the image sequence, which can be solved efficiently using a conventional graph cut procedure.
Type: Grant
Filed: March 30, 2006
Date of Patent: February 8, 2011
Assignee: Microsoft Corp.
Inventors: Cha Zhang, Michael Cohen, Yong Rui, Ting Yu
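
A condensed sketch of the data-term side of that model: fit one Gaussian mixture over (x, y, R, G, B) features for the foreground seed pixels and one for the background, then label pixels by likelihood. The image and seed region are synthetic, and the MRF smoothness term and graph-cut solve from the abstract are omitted:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
H, W = 60, 80
image = rng.uniform(0, 255, size=(H, W, 3))
ys, xs = np.mgrid[0:H, 0:W]
feats = np.column_stack([xs.ravel(), ys.ravel(), image.reshape(-1, 3)])  # (x, y, R, G, B)

# Hypothetical initial segmentation (e.g., a box from the first frame).
fg_seed = ((xs.ravel() > 20) & (xs.ravel() < 60) &
           (ys.ravel() > 15) & (ys.ravel() < 45))

fg_gmm = GaussianMixture(n_components=5, covariance_type="full").fit(feats[fg_seed])
bg_gmm = GaussianMixture(n_components=5, covariance_type="full").fit(feats[~fg_seed])

# Per-pixel data term: assign each pixel to the model with higher log-likelihood.
segmentation = (fg_gmm.score_samples(feats) >
                bg_gmm.score_samples(feats)).reshape(H, W)
```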
-
Publication number: 20100329358
Abstract: Multi-view video that is being streamed to a remote device in real time may be encoded. Frames of a real-world scene captured by respective video cameras are received for compression. A virtual viewpoint, positioned relative to the video cameras, is used to determine the expected contribution of individual portions of the frames to an image of the scene synthesized from that viewpoint. For each frame, compression rates for individual blocks of the frame are computed based on the determined contributions of the individual portions of the frame. The frames are compressed by compressing their blocks according to the respective determined compression rates. The frames are transmitted in compressed form via a network to a remote device, which is configured to render the scene using the compressed frames.
Type: Application
Filed: June 25, 2009
Publication date: December 30, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Cha Zhang, Dinei Florencio
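
A toy sketch of the rate-allocation idea: macroblocks that contribute more to the synthesized virtual view get finer quantization. The contribution weights and the QP range below are placeholders, not the geometry-derived values the abstract describes:

```python
import numpy as np

rng = np.random.default_rng(6)
blocks_y, blocks_x = 18, 30                     # 16x16 macroblock grid of one frame
# Placeholder per-block contribution of this frame to the virtual-view synthesis.
contribution = rng.uniform(0.0, 1.0, size=(blocks_y, blocks_x))

qp_min, qp_max = 20, 44                         # hypothetical H.264-style QP range
# Higher contribution -> lower QP (finer quantization, more bits for that block).
qp = np.round(qp_max - (qp_max - qp_min) * contribution).astype(int)
print(qp.min(), qp.max())
```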
-
Publication number: 20100329517
Abstract: Techniques for face verification are described. Local binary pattern (LBP) features and boosting classifiers are used to verify faces in images. A boosted multi-task learning algorithm is used for face verification in images. Finally, boosted face verification is used to verify faces in videos.
Type: Application
Filed: June 26, 2009
Publication date: December 30, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Cha Zhang, Xiaogang Wang, Zhengyou Zhang
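
The LBP feature-extraction stage can be sketched with scikit-image; the grid size, radius, and chi-square comparison below are illustrative defaults, and the boosting and multi-task learning stages from the abstract are not reproduced:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(face_gray, radius=1, n_points=8, grid=(4, 4)):
    """Concatenate uniform-LBP histograms over a grid of face cells."""
    lbp = local_binary_pattern(face_gray, n_points, radius, method="uniform")
    n_bins = n_points + 2
    h, w = face_gray.shape
    hists = []
    for gy in range(grid[0]):
        for gx in range(grid[1]):
            cell = lbp[gy * h // grid[0]:(gy + 1) * h // grid[0],
                       gx * w // grid[1]:(gx + 1) * w // grid[1]]
            hist, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins), density=True)
            hists.append(hist)
    return np.concatenate(hists)

face_a = np.random.default_rng(7).integers(0, 256, size=(64, 64)).astype(np.uint8)
face_b = np.random.default_rng(8).integers(0, 256, size=(64, 64)).astype(np.uint8)
ha, hb = lbp_histogram(face_a), lbp_histogram(face_b)
# Verification can then compare the descriptors, e.g. with a chi-square distance.
chi2 = 0.5 * np.sum((ha - hb) ** 2 / (ha + hb + 1e-10))
print("chi-square distance:", round(chi2, 3))
```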
-
Patent number: 7840638
Abstract: A multimedia conference technique is disclosed that allows physically remote users to participate in an immersive telecollaborative environment by synchronizing multiple data, images and sounds. The multimedia conference implementation provides users with the perception of being in the same room visually as well as acoustically according to an orientation plan which reflects each remote user's position within the multimedia conference environment.
Type: Grant
Filed: June 27, 2008
Date of Patent: November 23, 2010
Assignee: Microsoft Corporation
Inventors: Zhengyou Zhang, Xuedong David Huang, Zicheng Liu, Cha Zhang, Philip A. Chou, Christian Huitema
-
Publication number: 20100289904
Abstract: Systems are disclosed that provide improved transfer speed of video data from a video capture device to a computing device using multiple video feeds respectively comprising different resolutions. A high-resolution image sensor is used to convert light images into a high-resolution video data stream. A down sampler converts the high-resolution video data stream to a low-resolution video data stream, so that both a low-resolution data stream and a high-resolution data stream are available. While the low-resolution data stream can be sent to the computing device, a digital signal processor (DSP) processes the high-resolution video data stream in accordance with an input control signal that is comprised of desired high-resolution video stream parameters derived from the low-resolution video data stream.
Type: Application
Filed: May 15, 2009
Publication date: November 18, 2010
Applicant: Microsoft Corporation
Inventors: Cha Zhang, Zhengyou Zhang, Zicheng Liu, Wanghong Yuan, Christian Huitema
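
A rough numpy sketch of the dual-stream flow: produce the low-resolution copy by downsampling, derive control parameters from it (here, a bounding box around bright pixels as a stand-in for host-side analysis), and use those parameters to select which part of the high-resolution stream to process. All resolutions and thresholds are illustrative:

```python
import numpy as np

rng = np.random.default_rng(12)
hi = rng.uniform(0.0, 1.0, size=(1080, 1920))       # one high-resolution frame
factor = 8
# Block-mean downsampling stands in for the device's down sampler.
lo = hi.reshape(1080 // factor, factor, 1920 // factor, factor).mean(axis=(1, 3))

# Derive control parameters from the low-resolution stream: the bounding box of
# unusually bright pixels (a placeholder for detection done on the host).
ys, xs = np.where(lo > lo.mean() + 2 * lo.std())
if ys.size:
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    # The control signal then requests only this region from the high-res stream.
    hi_roi = hi[y0 * factor:y1 * factor, x0 * factor:x1 * factor]
```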
-
Patent number: 7822696
Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment, “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the pruned combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system.
Type: Grant
Filed: July 13, 2007
Date of Patent: October 26, 2010
Assignee: Microsoft Corporation
Inventors: Cha Zhang, Paul Viola
-
Publication number: 20100225743
Abstract: Techniques and technologies are described herein for motion parallax three-dimensional (3D) imaging. Such techniques and technologies do not require special glasses, virtual reality helmets, or other user-attachable devices. More particularly, some of the described motion parallax 3D imaging techniques and technologies generate sequential images, including motion parallax depictions of various scenes derived from clues in views obtained of or created for the displayed scene.
Type: Application
Filed: March 5, 2009
Publication date: September 9, 2010
Applicant: Microsoft Corporation
Inventors: Dinei Afonso Ferreira Florencio, Cha Zhang
-
Patent number: 7783075
Abstract: Background blurring is an effective way to both preserve privacy and keep communication effective during video conferencing. The present image background blurring technique is a lightweight real-time technique that performs background blurring using a fast background modeling procedure combined with an object (e.g., face) detector/tracker. A soft decision is made at each pixel as to whether it belongs to the foreground or the background, based on multiple vision features. The classification results are mapped to a per-pixel blurring radius image to blur the background. In another embodiment, the image background blurring technique blurs the background of the image without using the object detector.
Type: Grant
Filed: June 7, 2006
Date of Patent: August 24, 2010
Assignee: Microsoft Corp.
Inventors: Cha Zhang, Li-wei He, Yong Rui
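
The mapping from a soft foreground decision to a per-pixel blur can be approximated by blending a few pre-blurred copies of the frame. In the sketch below the soft mask is synthetic (in the abstract it comes from background modeling plus the detector/tracker), and the sigma levels are arbitrary:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(9)
frame = rng.uniform(0.0, 1.0, size=(120, 160, 3))

# Hypothetical per-pixel foreground probability (1 = keep sharp, 0 = blur fully).
ys, xs = np.mgrid[0:120, 0:160]
fg_prob = np.exp(-(((xs - 80) / 30.0) ** 2 + ((ys - 60) / 40.0) ** 2))

# Map the soft decision to a per-pixel blurring-radius image.
max_sigma = 6.0
radius = (1.0 - fg_prob) * max_sigma

# Approximate spatially varying blur by blending a stack of pre-blurred copies.
sigmas = [0.0, 2.0, 4.0, 6.0]
levels = [frame] + [gaussian_filter(frame, sigma=(s, s, 0)) for s in sigmas[1:]]
idx = np.clip(radius / 2.0, 0, len(levels) - 1)
low, frac = np.floor(idx).astype(int), idx - np.floor(idx)
high = np.minimum(low + 1, len(levels) - 1)
out = np.empty_like(frame)
for c in range(3):
    channel = [lv[:, :, c] for lv in levels]
    out[:, :, c] = (1 - frac) * np.choose(low, channel) + frac * np.choose(high, channel)
```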
-
Publication number: 20100085416
Abstract: Multi-device capture and spatial browsing of conferences is described. In one implementation, a system detects cameras and microphones, such as the webcams on participants' notebook computers, in a conference room, group meeting, or table game, and enlists an ad-hoc array of available devices to capture each participant and the spatial relationships between participants. A video stream composited from the array is browsable by a user to navigate a 3-dimensional representation of the meeting. Each participant may be represented by a video pane, a foreground object, or a 3-D geometric model of the participant's face or body displayed in spatial relation to the other participants in a 3-dimensional arrangement analogous to the spatial arrangement of the meeting.
Type: Application
Filed: October 6, 2008
Publication date: April 8, 2010
Applicant: Microsoft Corporation
Inventors: Rajesh K. Hegde, Zhengyou Zhang, Philip A. Chou, Cha Zhang, Zicheng Liu, Sasa Junuzovic
-
Publication number: 20090327418
Abstract: A multimedia conference technique is disclosed that allows physically remote users to participate in an immersive telecollaborative environment by synchronizing multiple data, images and sounds. The multimedia conference implementation provides users with the perception of being in the same room visually as well as acoustically according to an orientation plan which reflects each remote user's position within the multimedia conference environment.
Type: Application
Filed: June 27, 2008
Publication date: December 31, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Zhengyou Zhang, Xuedong David Huang, Zicheng Liu, Cha Zhang, Philip A. Chou, Christian Huitema
-
Publication number: 20090263010
Abstract: A classifier is trained on a first set of examples, and the trained classifier is adapted to perform on a second set of examples. The classifier implements a parameterized labeling function. Initial training of the classifier optimizes the labeling function's parameters to minimize a cost function. The classifier and its parameters are provided to an environment in which it will operate, along with an approximation function that approximates the cost function using a compact representation of the first set of examples in place of the actual first set. A second set of examples is collected, and the parameters are modified to minimize a combined cost of labeling the first and second sets of examples. The part of the combined cost that represents the cost of the modified parameters applied to the first set is calculated using the approximation function.
Type: Application
Filed: April 18, 2008
Publication date: October 22, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Cha Zhang, Zhengyou Zhang
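
A schematic sketch of that adaptation for a linear classifier, using a quadratic function around the shipped parameters as the compact stand-in for the original-set cost; the curvature matrix, learning rate, and new-environment data below are assumptions, not the patent's approximation function:

```python
import numpy as np

rng = np.random.default_rng(10)
D = 4
w_old = rng.normal(size=D)                       # parameters learned on the first example set
H_old = np.diag(rng.uniform(0.5, 2.0, size=D))   # compact curvature summary shipped with w_old

# New examples collected in the deployment environment (synthetic here).
X_new = rng.normal(size=(100, D))
y_new = (X_new @ rng.normal(size=D) > 0).astype(float) * 2 - 1   # labels in {-1, +1}

def combined_grad(w, lam=1.0):
    # Logistic-loss gradient on the new set ...
    margins = y_new * (X_new @ w)
    g_new = -(X_new * (y_new / (1 + np.exp(margins)))[:, None]).mean(axis=0)
    # ... plus the gradient of the quadratic approximation of the old-set cost.
    g_old = H_old @ (w - w_old)
    return g_new + lam * g_old

w = w_old.copy()
for _ in range(200):
    w -= 0.1 * combined_grad(w)                  # minimize the combined cost
```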
-
Publication number: 20090251594
Abstract: Videos are retargeted to a target display for viewing with little to no geometric distortion or video information loss. Salient regions of video frames may be determined using scale-space spatiotemporal information. Video information loss may be a result of spatial loss, due to cropping, and resolution loss, due to resizing. A desired cropping window may be determined using a coarse-to-fine searching strategy. Video frames may be cropped with a window that matches an aspect ratio of the target display, and resized isotropically to match a size of the target display.
Type: Application
Filed: April 2, 2008
Publication date: October 8, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Gang Hua, Cha Zhang, Zhengyou Zhang, Zicheng Liu, Ying Shan
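
A compact sketch of the crop-window search: a coarse pass over window sizes and positions at the target aspect ratio, then a finer pass around the best candidate, scoring each window by the saliency it retains (using an integral image for fast sums). The random saliency map and step sizes are placeholders for the scale-space spatiotemporal saliency the abstract describes:

```python
import numpy as np

def window_sum(integral, x0, y0, cw, ch):
    # Sum of the saliency map over the window [x0, x0+cw) x [y0, y0+ch).
    return (integral[y0 + ch, x0 + cw] - integral[y0, x0 + cw]
            - integral[y0 + ch, x0] + integral[y0, x0])

def best_crop(saliency, target_w, target_h, coarse=16):
    h, w = saliency.shape
    aspect = target_w / target_h
    integral = np.pad(saliency, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    best = (0, 0, target_w, target_h, -1.0)

    def search(widths, step):
        nonlocal best
        for cw in widths:
            ch = int(round(cw / aspect))        # keep the target aspect ratio
            if cw > w or ch > h:
                continue
            for x0 in range(0, w - cw + 1, step):
                for y0 in range(0, h - ch + 1, step):
                    s = window_sum(integral, x0, y0, cw, ch)
                    if s > best[4]:
                        best = (x0, y0, cw, ch, s)

    search(range(target_w, w + 1, coarse), coarse)                     # coarse pass
    cw0 = best[2]
    search(range(max(target_w, cw0 - coarse), min(w, cw0 + coarse) + 1, 2), 2)  # fine pass
    return best[:4]

saliency = np.random.default_rng(11).uniform(size=(240, 320))
x0, y0, cw, ch = best_crop(saliency, target_w=160, target_h=120)
# The selected window would then be resized isotropically to the target display size.
print(x0, y0, cw, ch)
```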