Patents by Inventor Cha Zhang

Cha Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20130121526
    Abstract: A three-dimensional shape parameter computation system and method for computing three-dimensional human head shape parameters from two-dimensional facial feature points. A series of images containing a user's face is captured. Embodiments of the system and method deduce the 3D parameters of the user's head by examining a series of captured images of the user over time and in a variety of head poses and facial expressions, and then computing an average. An energy function is constructed over a batch of frames containing 2D face feature points obtained from the captured images, and the energy function is minimized to solve for the head shape parameters valid for the batch of frames. Head pose parameters and facial expression and animation parameters can vary over each captured image in the batch of frames. In some embodiments this minimization is performed using a modified Gauss-Newton minimization technique using a single iteration.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Applicant: Microsoft Corporation
    Inventors: Nikolay Smolyanskiy, Christian F. Huitema, Cha Zhang, Lin Liang, Sean Eron Anderson, Zhengyou Zhang
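For illustration only (the patent solves jointly for full head-shape, pose, and expression parameters), the single-iteration Gauss-Newton idea can be sketched on a toy one-parameter least-squares problem over a batch of "frames"; the data and the scale parameter below are invented:

```python
# Toy sketch of batch Gauss-Newton (hypothetical setup, not the patented method):
# estimate a single head-shape scale parameter s from feature observations
# gathered over a batch of frames, minimizing E(s) = sum_i (y_i - s * x_i)^2.

def gauss_newton_step(s, xs, ys):
    """One Gauss-Newton iteration for the residuals r_i = y_i - s * x_i."""
    jtj = sum(x * x for x in xs)                        # J^T J (dr_i/ds = -x_i)
    jtr = sum(x * (y - s * x) for x, y in zip(xs, ys))  # -J^T r
    return s + jtr / jtj

# Batch of "frames": true scale is 2.0; observations are exact for clarity.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
s = gauss_newton_step(0.5, xs, ys)  # a single iteration, as in the abstract
print(round(s, 6))  # prints 2.0
```

For this linear toy model one iteration already reaches the least-squares optimum, which hints at why a single-iteration scheme can suffice when the energy is close to quadratic in the shape parameters.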
  • Patent number: 8406483
    Abstract: Techniques for face verification are described. Local binary pattern (LBP) features and boosting classifiers are used to verify faces in images. A boosted multi-task learning algorithm is used for face verification in images. Finally, boosted face verification is used to verify faces in videos.
    Type: Grant
    Filed: June 26, 2009
    Date of Patent: March 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Xiaogang Wang, Zhengyou Zhang
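The local binary pattern features mentioned in the abstract can be sketched in their classic 3x3 form (the patent pairs such features with boosted classifiers; the tiny image below is made up):

```python
# Minimal 3x3 local binary pattern (LBP) sketch: each pixel gets an 8-bit
# code from thresholding its 8 neighbors against the center value.

def lbp_code(img, r, c):
    """8-bit LBP code for pixel (r, c) of a grayscale image (list of rows)."""
    center = img[r][c]
    # Clockwise neighbor offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(lbp_code(img, 1, 1))  # prints 120 (bits set for the 4 brighter neighbors)
```

Histograms of such codes over face sub-regions are what a boosted verifier would typically consume.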
  • Patent number: 8401979
    Abstract: Described is multiple category learning to jointly train a plurality of classifiers in an iterative manner. Each training iteration associates an adaptive label with each training example, so that during the iterations the adaptive label of any example can be changed by subsequent reclassification. In this manner, any mislabeled training example is corrected by the classifiers during training. The training may use a probabilistic multiple category boosting algorithm that maintains probability data provided by the classifiers, or a winner-take-all multiple category boosting algorithm that selects the adaptive label based upon the highest probability classification. The multiple category boosting training system may be coupled to a multiple instance learning mechanism to obtain the training examples. The trained classifiers may be used as weak classifiers that provide a label used to select a deep classifier for further classification, e.g., to provide a multi-view object detector.
    Type: Grant
    Filed: November 16, 2009
    Date of Patent: March 19, 2013
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Zhengyou Zhang
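The winner-take-all relabeling step can be sketched as follows (toy scores, hypothetical data; the real system interleaves this with boosting rounds):

```python
# Winner-take-all adaptive labeling sketch: each round, an example's label
# becomes the category whose classifier currently scores it highest, so an
# early mislabel can be corrected by later reclassification.

def reassign_labels(scores):
    """scores[i][k] = classifier k's probability for example i."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

# Three examples scored by three category classifiers.
scores = [[0.7, 0.2, 0.1],   # stays in category 0
          [0.1, 0.3, 0.6],   # relabeled to category 2
          [0.4, 0.5, 0.1]]   # relabeled to category 1
print(reassign_labels(scores))  # prints [0, 2, 1]
```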
  • Publication number: 20130010079
    Abstract: A system described herein includes a receiver component that receives a first digital image from a color camera, wherein the first digital image comprises a planar object, and a second digital image from a depth sensor, wherein the second digital image comprises the planar object. The system also includes a calibrator component that jointly calibrates the color camera and the depth sensor based at least in part upon the first digital image and the second digital image.
    Type: Application
    Filed: July 8, 2011
    Publication date: January 10, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Cha Zhang, Zhengyou Zhang
  • Publication number: 20120294510
    Abstract: A depth construction module is described that receives depth images provided by two or more depth capture units. Each depth capture unit generates its depth image using a structured light technique, that is, by projecting a pattern onto an object and receiving a captured image in response thereto. The depth construction module then identifies at least one deficient portion in at least one depth image that has been received, which may be attributed to overlapping projected patterns that impinge on the object. The depth construction module then uses a multi-view reconstruction technique, such as a plane sweeping technique, to supply depth information for the deficient portion. In another mode, a multi-view reconstruction technique can be used to produce an entire depth scene based on captured images received from the depth capture units, that is, without first identifying deficient portions in the depth images.
    Type: Application
    Filed: May 16, 2011
    Publication date: November 22, 2012
    Applicant: Microsoft Corporation
    Inventors: Cha Zhang, Wenwu Zhu, Zhengyou Zhang, Philip A. Chou
  • Publication number: 20120287223
    Abstract: The described implementations relate to enhancing images, such as in videoconferencing scenarios. One system includes a poriferous display screen having generally opposing front and back surfaces. This system also includes a camera positioned proximate to the back surface to capture an image through the poriferous display screen.
    Type: Application
    Filed: May 11, 2011
    Publication date: November 15, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Cha Zhang, Timothy A. Large, Zhengyou Zhang, Ruigang Yang
  • Publication number: 20120281059
    Abstract: The subject disclosure is directed towards an immersive conference, in which participants in separate locations are brought together into a common virtual environment (scene), such that they appear to each other to be in a common space, with geometry, appearance, and real-time natural interaction (e.g., gestures) preserved. In one aspect, depth data and video data are processed to place remote participants in the common scene from the first person point of view of a local participant. Sound data may be spatially controlled, and parallax computed to provide a realistic experience. The scene may be augmented with various data, videos and other effects/animations.
    Type: Application
    Filed: May 4, 2011
    Publication date: November 8, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Philip A. Chou, Zhengyou Zhang, Cha Zhang, Dinei A. Florencio, Zicheng Liu, Rajesh K. Hegde, Nirupama Chandrasekaran
  • Publication number: 20120278077
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Application
    Filed: July 11, 2012
    Publication date: November 1, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Publication number: 20120242810
    Abstract: Techniques and technologies are described herein for motion parallax three-dimensional (3D) imaging. Such techniques and technologies do not require special glasses, virtual reality helmets, or other user-attachable devices. More particularly, some of the described motion parallax 3D imaging techniques and technologies generate sequential images, including motion parallax depictions of various scenes derived from clues in views obtained of or created for the displayed scene.
    Type: Application
    Filed: June 6, 2012
    Publication date: September 27, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Dinei Afonso Ferreira Florencio, Cha Zhang
  • Patent number: 8276195
    Abstract: Described herein is a method that includes receiving multiple requests for access to an exposed media object, wherein the exposed media object represents a live media stream that is being generated by a media source. The method also includes receiving data associated with each entity that provided a request, and determining, for each entity, whether the entities that provided the request are authorized to access the media stream based at least in part upon the received data and splitting the media stream into multiple media streams, wherein a number of media streams corresponds to a number of authorized entities. The method also includes automatically applying at least one policy to at least one of the split media streams based at least in part upon the received data.
    Type: Grant
    Filed: January 2, 2008
    Date of Patent: September 25, 2012
    Assignee: Microsoft Corporation
    Inventors: Rajesh K. Hegde, Cha Zhang, Philip A. Chou, Zicheng Liu
  • Patent number: 8234113
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Grant
    Filed: August 30, 2011
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Patent number: 8233353
    Abstract: A multi-sensor sound source localization (SSL) technique is presented which provides a true maximum likelihood (ML) treatment for microphone arrays having more than one pair of audio sensors. Generally, this is accomplished by selecting a sound source location that results in a time of propagation from the sound source to the audio sensors of the array, which maximizes a likelihood of simultaneously producing audio sensor output signals inputted from all the sensors in the array. The likelihood includes a unique term that estimates an unknown audio sensor response to the source signal for each of the sensors in the array.
    Type: Grant
    Filed: January 26, 2007
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Dinei Florencio, Zhengyou Zhang
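A heavily simplified version of the idea can be sketched in one dimension with a single microphone pair and integer sample delays (the patent's contribution is a true maximum-likelihood treatment of arrays with more than one pair and unknown per-sensor gains, which this toy omits; all geometry and signals below are invented):

```python
# Toy sound source localization sketch: pick the candidate source position
# whose predicted propagation-time difference best aligns the two mic signals.

SPEED, RATE = 340.0, 17000.0  # m/s and samples/s -> 50 samples per metre

def expected_lag(src, mics):
    """Predicted arrival-time difference (in samples) between the two mics."""
    d0, d1 = abs(src - mics[0]), abs(src - mics[1])
    return round((d1 - d0) / SPEED * RATE)

def localize(sig0, sig1, mics, candidates):
    """Maximize cross-correlation at each candidate's predicted lag."""
    def score(src):
        lag = expected_lag(src, mics)
        return sum(sig0[n] * sig1[n + lag]
                   for n in range(len(sig0))
                   if 0 <= n + lag < len(sig1))
    return max(candidates, key=score)

# Source at 0.5 m between mics at 0 m and 2 m: mic 1 hears it 50 samples later.
sig0 = [0.0] * 300; sig0[100] = 1.0
sig1 = [0.0] * 300; sig1[150] = 1.0
mics = (0.0, 2.0)
print(localize(sig0, sig1, mics, [0.25, 0.5, 0.75, 1.0, 1.5]))  # prints 0.5
```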
  • Patent number: 8199186
    Abstract: Techniques and technologies are described herein for motion parallax three-dimensional (3D) imaging. Such techniques and technologies do not require special glasses, virtual reality helmets, or other user-attachable devices. More particularly, some of the described motion parallax 3D imaging techniques and technologies generate sequential images, including motion parallax depictions of various scenes derived from clues in views obtained of or created for the displayed scene.
    Type: Grant
    Filed: March 5, 2009
    Date of Patent: June 12, 2012
    Assignee: Microsoft Corporation
    Inventors: Dinei Afonso Ferreira Florencio, Cha Zhang
  • Publication number: 20120141017
    Abstract: A training set for a post-filter classifier is created from the output of a face detector. The face detector can be a Viola-Jones face detector. Face detectors produce false positives and true positives. The regions in the training set are labeled so that false positives are labeled negative and true positives are labeled positive. The labeled training set is used to train a post-filter classifier. The post-filter classifier can be an SVM (Support Vector Machine). The trained post-filter classifier is placed at the end of a face detection pipeline comprising a face detector, one or more feature extractors, and the post-filter itself. The post-filter reduces the number of false positives in the face detector output while keeping the number of true positives almost unchanged, using features different from the Haar features used by the face detector.
    Type: Application
    Filed: December 3, 2010
    Publication date: June 7, 2012
    Applicant: Microsoft Corporation
    Inventors: Eyal Krupka, Igor Abramovski, Igor Kviatkovsky, Jason M. Cahill, Timothy R. O'Connor, Cha Zhang
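The shape of such a pipeline can be sketched with a linear decision function standing in for the trained SVM (the detections, feature names, and weights below are all hypothetical):

```python
# Pipeline sketch: a detector's raw hits pass through a feature extractor and
# a linear post-filter (a stand-in for the trained SVM), which discards hits
# that are likely false positives.

def post_filter(detections, extract, weights, bias):
    """Keep detections whose linear score w.x + b is positive."""
    kept = []
    for det in detections:
        x = extract(det)
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        if score > 0:
            kept.append(det)
    return kept

# Toy detections: (window, skin_ratio, edge_density) from an imagined extractor.
detections = [("face", 0.8, 0.6), ("wall", 0.1, 0.2), ("poster", 0.7, 0.1)]
extract = lambda det: det[1:]  # use the two toy features
kept = post_filter(detections, extract, weights=(2.0, 3.0), bias=-2.0)
print([d[0] for d in kept])  # prints ['face']
```

The point of the design is that the post-filter may use features (here, invented ones) that are entirely different from the Haar features the detector itself relied on.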
  • Patent number: 8175382
    Abstract: Image enhancement techniques are described to enhance an image in accordance with a set of training images. In an implementation, an image color tone map is generated for a facial region included in an image. The image color tone map may be normalized to a color tone map for a set of training images so that the image color tone map matches the map for the training images. The normalized color tone map may be applied to the image to enhance the image in question. In further implementations, the procedure may be updated when the average color intensity in non-facial regions differs from an accumulated mean by a threshold amount.
    Type: Grant
    Filed: May 10, 2007
    Date of Patent: May 8, 2012
    Assignee: Microsoft Corporation
    Inventors: Zicheng Liu, Cha Zhang, Zhengyou Zhang
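A crude stand-in for the normalization step is mean/variance matching of facial-region intensities to training statistics (the pixel values and target statistics below are made up; the patent's color tone maps are richer than this):

```python
# Illustrative mean/variance matching: shift and scale a region's intensities
# so its statistics agree with those accumulated from the training images.

def normalize_to_training(pixels, train_mean, train_std):
    """Map pixel intensities to match the training mean and std."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = (sum((p - mean) ** 2 for p in pixels) / n) ** 0.5 or 1.0
    return [(p - mean) / std * train_std + train_mean for p in pixels]

face = [80, 90, 100, 110, 120]  # a dim facial region
out = normalize_to_training(face, train_mean=140.0, train_std=20.0)
print([round(p) for p in out])  # prints [112, 126, 140, 154, 168]
```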
  • Patent number: 8098842
    Abstract: A novel enhanced beamforming technique that improves beamforming operations by incorporating a model for the directional gains of the sensors, such as microphones, and provides means of estimating these gains. The technique forms estimates of the relative magnitude responses of the sensors (e.g., microphones) based on the data received at the array and includes those in the beamforming computations.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: January 17, 2012
    Assignee: Microsoft Corp.
    Inventors: Dinei Florencio, Cha Zhang, Demba Ba
  • Publication number: 20110317522
    Abstract: Described is modeling a room to obtain estimates for walls and a ceiling, and using the model to improve sound source localization by incorporating reflection (reverberation) data into the location estimation computations. In a calibration step, reflections of a known sound are detected at a microphone array, with their corresponding signals processed to estimate wall (and ceiling) locations. In a sound source localization step, when an actual sound (including reverberations) is detected, the signals are processed into hypotheses that include reflection data predictions based upon possible locations, given the room model. The location corresponding to the hypothesis that matches (maximum likelihood) the actual sound data is the estimated location of the sound source.
    Type: Application
    Filed: June 28, 2010
    Publication date: December 29, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Dinei Afonso Ferreira Florencio, Cha Zhang, Flavio Protasio Ribeiro, Demba Elimane Ba
  • Patent number: 8085302
    Abstract: A combined digital and mechanical tracking system and process for generating a video using a single digital video camera that tracks a person or object of interest moving in a scene is presented. This generally involves operating the camera at a higher resolution than is needed for the application, and cropping a sub-region out of the image captured that is output as the output video. The person or object being tracked is at least partially contained within the cropped sub-region. As the person or object moves within the field of view (FOV) of the camera, the location of the cropped sub-region is also moved so as to keep the subject of interest within its boundaries. When the subject of interest moves to the boundary of the camera's FOV, the camera is mechanically panned to keep the person or object inside its FOV.
    Type: Grant
    Filed: November 21, 2005
    Date of Patent: December 27, 2011
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Li-wei He, Yong Rui
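The crop-then-pan policy can be sketched with toy one-dimensional geometry (the FOV and crop sizes below are invented, and real systems smooth both motions):

```python
# Combined digital/mechanical tracking sketch: keep the crop window centred on
# the subject inside the camera's FOV, and pan mechanically only when the
# subject reaches the FOV boundary.

def track(subject_x, camera_left, fov=100, crop=40):
    """Return (crop_left, new_camera_left) keeping the subject in the crop."""
    # Centre the cropped sub-region on the subject, clamped to the FOV.
    crop_left = min(max(subject_x - crop // 2, camera_left),
                    camera_left + fov - crop)
    # Mechanically pan only when the subject hits the FOV boundary.
    if subject_x <= camera_left:
        camera_left = subject_x
    elif subject_x >= camera_left + fov:
        camera_left = subject_x - fov
    return crop_left, camera_left

print(track(50, 0))   # subject mid-frame: crop follows, no pan -> (30, 0)
print(track(120, 0))  # subject past the FOV edge: camera pans -> (60, 20)
```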
  • Publication number: 20110313766
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Application
    Filed: August 30, 2011
    Publication date: December 22, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Publication number: 20110268281
    Abstract: Described are systems and methods performed by computer to reduce crosstalk produced by loudspeakers when rendering binaural sound that is emitted from the loudspeakers into a room. The room may have sound-reflecting surfaces that reflect some of the sound produced by the loudspeakers. To reduce crosstalk, a room model stored by the computer, is accessed. The room model models at least sound reflected by one or more of the physical surfaces. The room model is used to calculate a model of an audio channel from the loudspeakers to a listener. The model of the audio channel models sound transmission from the loudspeakers to the listener. The computer uses the model of the audio channel to cancel crosstalk from the loudspeakers when rendering the binaural sound.
    Type: Application
    Filed: April 30, 2010
    Publication date: November 3, 2011
    Applicant: Microsoft Corporation
    Inventors: Dinei A. Florencio, Cha Zhang, Myung-Suk Song