Patents by Inventor Zhengyou Zhang

Zhengyou Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7539616
    Abstract: Speaker authentication is performed by determining a similarity score for a test utterance and a stored training utterance. Computing the similarity score involves determining the sum of a group of functions, where each function includes the product of a posterior probability of a mixture component and a difference between an adapted mean and a background mean. The adapted mean is formed based on the background mean and the test utterance. The speech content provided by the speaker for authentication can be text-independent (i.e., any content they want to say) or text-dependent (i.e., a particular phrase used for training).
    Type: Grant
    Filed: February 20, 2006
    Date of Patent: May 26, 2009
    Assignee: Microsoft Corporation
    Inventors: Zhengyou Zhang, Ming Liu
  • Patent number: 7532359
    Abstract: A system and process for improving the appearance of improperly colored and/or improperly exposed images is presented. This involves the use of two novel techniques: an automatic color correction technique and an automatic exposure correction technique. The automatic color correction technique takes information from within an image to determine true color characteristics and improves the color in improperly colored pixels. The automatic exposure correction technique measures the average intensity of all of the pixels and adjusts the entire image pixel by pixel to compensate for over- or underexposure. These techniques are standalone in that each can be applied to an image exclusive of the other, or they can both be applied in any order desired.
    Type: Grant
    Filed: March 9, 2004
    Date of Patent: May 12, 2009
    Assignee: Microsoft Corporation
    Inventors: Po Yuan, Zhengyou Zhang
  • Patent number: 7518631
    Abstract: A visual control system controls a controlled component. In one embodiment, the visual control system controls the controlled component based on a visual location of a user. In another embodiment, input from a visual perception device is used to provide focus control for an audio input device. In additional embodiments, the visual control system stops, starts or suppresses speech recognition or other audio functions when the sound detected by the audio input device is not coming from the direction of the user's visual location.
    Type: Grant
    Filed: June 28, 2005
    Date of Patent: April 14, 2009
    Assignee: Microsoft Corporation
    Inventors: John R. Hershey, Zhengyou Zhang
  • Patent number: 7515173
    Abstract: Video images representative of a conferee's head are received and evaluated with respect to a reference model to monitor a head position of the conferee. A personalized face model of the conferee is captured to track head position of the conferee. In a stereo implementation, first and second video images representative of a first conferee taken from different views are concurrently captured. A head position of the first conferee is tracked from the first and second video images. Tracking head position through a personalized model-based approach can be used in a number of applications such as human-computer interaction and eye-gaze correction for video conferencing.
    Type: Grant
    Filed: May 23, 2002
    Date of Patent: April 7, 2009
    Assignee: Microsoft Corporation
    Inventors: Zhengyou Zhang, Ruigang Yang
  • Publication number: 20090080632
    Abstract: Audio in an audio conference is spatialized using either virtual sound-source positioning or sound-field capture. A spatial audio conference is provided between a local party and remote parties using audio conferencing devices (ACDs) interconnected by a network. Each ACD captures spatial audio information from the local party, generates either one, or three or more, audio data streams which include the captured information, and transmits the generated stream(s) to each remote party. Each ACD also receives the generated audio data stream(s) transmitted from each of the remote parties, processes the received streams to generate a plurality of audio signals, and renders the signals to produce a sound-field that is perceived by the local party, where the sound-field includes the spatial audio information captured from the remote parties. A sound-field capture device is also provided which includes at least three directional microphones symmetrically configured about a center axis in a semicircular array.
    Type: Application
    Filed: September 25, 2007
    Publication date: March 26, 2009
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, James D. Johnston
  • Patent number: 7508993
    Abstract: A system and process for improving the appearance of improperly colored and/or improperly exposed images is presented. This involves the use of two novel techniques: an automatic color correction technique and an automatic exposure correction technique. The automatic color correction technique takes information from within an image to determine true color characteristics and improves the color in improperly colored pixels. The automatic exposure correction technique measures the average intensity of all of the pixels and adjusts the entire image pixel by pixel to compensate for over- or underexposure. These techniques are standalone in that each can be applied to an image exclusive of the other, or they can both be applied in any order desired.
    Type: Grant
    Filed: March 9, 2004
    Date of Patent: March 24, 2009
    Assignee: Microsoft Corporation
    Inventors: Po Yuan, Zhengyou Zhang
  • Publication number: 20090075634
    Abstract: Multi-modal, multi-lingual devices can be employed to consolidate numerous items including, but not limited to, keys, remote controls, image capture devices, audio recorders, cellular telephone functionalities, location/direction detectors, health monitors, calendars, gaming devices, smart home inputs, pens, optical pointing devices or the like. For example, a corner of a cellular telephone can be used as an electronic pen. Moreover, the device can be used to snap multiple pictures and stitch them together to create a panoramic image. A device can automate ignition of an automobile, initiate appliances, etc. based upon relative distance. The device can provide near-to-eye capabilities for enhanced image viewing. Multiple cameras/sensors can be provided on a single device to provide stereoscopic capabilities. The device can also provide assistance to the blind, privacy, etc. by consolidating services.
    Type: Application
    Filed: November 26, 2008
    Publication date: March 19, 2009
    Applicant: Microsoft Corporation
    Inventors: Michael J. Sinclair, Yuan Kong, Zhengyou Zhang, Behrooz Chitsaz, David W. Williams, Silviu-Petru Cucerzan, Zicheng Liu
  • Patent number: 7499686
    Abstract: A mobile device is provided that includes a digit input that can be manipulated by a user's fingers or thumb, an air conduction microphone and an alternative sensor that provides an alternative sensor signal indicative of speech. Under some embodiments, the mobile device also includes a proximity sensor that provides a proximity signal indicative of the distance from the mobile device to an object. Under some embodiments, the signal from the air conduction microphone, the alternative sensor signal, and the proximity signal are used to form an estimate of a clean speech value. In further embodiments, a sound is produced through a speaker in the mobile device based on the amount of noise in the clean speech value. In other embodiments, the sound produced through the speaker is based on the proximity sensor signal.
    Type: Grant
    Filed: February 24, 2004
    Date of Patent: March 3, 2009
    Assignee: Microsoft Corporation
    Inventors: Michael J. Sinclair, Xuedong David Huang, Zhengyou Zhang
  • Patent number: 7496229
    Abstract: A system and method for transmitting a clear image of a whiteboard work surface for remote collaboration. The image is separated into two portions: the projected image of the work surface and the writing physically added to the whiteboard by participants. This separation has several benefits. The bandwidth requirements are much lower than those of video teleconferencing, and the benefits of whiteboard sharing are improved. The visual echo created on a physical whiteboard can be canceled.
    Type: Grant
    Filed: February 17, 2004
    Date of Patent: February 24, 2009
    Assignee: Microsoft Corp.
    Inventors: Zhengyou Zhang, Hanning Zhou
  • Patent number: 7477762
    Abstract: An incremental motion estimation system and process for estimating the camera pose parameters associated with each image of a long image sequence. Unlike previous approaches, which rely on point matches across three or more views, the present system and process also includes points shared by only two views. The problem is formulated as a series of localized bundle adjustments in such a way that the estimated camera motions in the whole sequence are consistent with each other. Including two-view matching points and using the localized bundle adjustment approach yields more accurate estimates of the camera pose parameters for each image in the sequence than previous incremental techniques, with accuracy approaching that of global bundle adjustment techniques at processing times about 100 to 700 times faster than the global approaches.
    Type: Grant
    Filed: August 31, 2004
    Date of Patent: January 13, 2009
    Assignee: Microsoft Corporation
    Inventors: Zhengyou Zhang, Ying Shan
  • Publication number: 20090001165
    Abstract: Systems and methods for 2-D barcode recognition are described. In one aspect, the systems and methods use a charge-coupled device (CCD) camera to capture a digital image of a 3-D scene. The systems and methods evaluate the digital image to localize and segment a 2-D barcode from the digital image of the 3-D scene. The 2-D barcode is rectified to remove non-uniform lighting and correct any perspective distortion. The rectified 2-D barcode is divided into multiple uniform cells to generate a 2-D matrix array of symbols. A barcode processing application evaluates the 2-D matrix array of symbols to present data to the user.
    Type: Application
    Filed: June 29, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Chunhui Zhang, Zhouchen Lin, Zhengyou Zhang, Shi Han
  • Publication number: 20080317371
    Abstract: A video noise reduction technique is presented. Generally, the technique involves first decomposing each frame of the video into low-pass and high-pass frequency components. Then, for each frame of the video after the first frame, an estimate of a noise variance in the high-pass component is obtained. The noise in the high-pass component of each pixel of each frame is reduced using the noise variance estimate obtained for the frame under consideration, whenever there has been no substantial motion exhibited by the pixel since the previous frame. Evidence of motion is determined by analyzing the high-pass and low-pass components.
    Type: Application
    Filed: June 19, 2007
    Publication date: December 25, 2008
    Applicant: Microsoft Corporation
    Inventors: Cha Zhang, Zhengyou Zhang, Zicheng Liu
  • Patent number: 7460884
    Abstract: Multi-modal, multi-lingual devices can be employed to consolidate numerous items including, but not limited to, keys, remote controls, image capture devices, audio recorders, cellular telephone functionalities, location/direction detectors, health monitors, calendars, gaming devices, smart home inputs, pens, optical pointing devices or the like. For example, a corner of a cellular telephone can be used as an electronic pen. Moreover, the device can be used to snap multiple pictures and stitch them together to create a panoramic image. A device can automate ignition of an automobile, initiate appliances, etc. based upon relative distance. The device can provide near-to-eye capabilities for enhanced image viewing. Multiple cameras/sensors can be provided on a single device to provide stereoscopic capabilities. The device can also provide assistance to the blind, privacy, etc. by consolidating services.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: December 2, 2008
    Assignee: Microsoft Corporation
    Inventors: Michael J. Sinclair, Yuan Kong, Zhengyou Zhang, Behrooz Chitsaz, David W. Williams, Silviu-Petru Cucerzan, Zicheng Liu
  • Patent number: 7460160
    Abstract: A digital camera having a single image sensor made up of an array of filtered photosites used to capture non-visible light wavelengths in addition to the standard red/green/blue (RGB) or other visible light intensity values is presented. Essentially, this is accomplished using a separate filter disposed over each photosite whose wavelength-dependent transmission function passes only a prescribed range of wavelengths: some filters pass light in the visible spectrum and others pass light in the non-visible spectrum. The photosites passing non-visible light wavelengths can be configured to pass light in the infrared (IR) light spectrum, which can be limited to just the near infrared (NIR) spectrum if desired, or alternately light in the ultraviolet (UV) light spectrum.
    Type: Grant
    Filed: September 24, 2004
    Date of Patent: December 2, 2008
    Assignee: Microsoft Corporation
    Inventors: John Hershey, Zhengyou Zhang
  • Publication number: 20080279423
    Abstract: A subregion-based image parameter recovery system and method for recovering image parameters from a single image containing a face taken under sub-optimal illumination conditions. The recovered image parameters (including albedo, illumination, and face geometry) can be used to generate face images under a new lighting environment. The method includes dividing the face in the image into numerous smaller regions, generating an albedo morphable model for each region, and using a Markov Random Fields (MRF)-based framework to model the spatial dependence between neighboring regions. Different types of regions are defined, including saturated, shadow, regular, and occluded regions. Each pixel in the image is classified and assigned to a region based on intensity, and then weighted based on its classification.
    Type: Application
    Filed: May 11, 2007
    Publication date: November 13, 2008
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Zicheng Liu, Gang Hua, Yang Wang
  • Publication number: 20080279467
    Abstract: Image enhancement techniques are described to enhance an image in accordance with a set of training images. In an implementation, an image color tone map is generated for a facial region included in an image. The image color tone map may be normalized to a color tone map for a set of training images so that the image color tone map matches the map for the training images. The normalized color tone map may be applied to the image to enhance the image in question. In further implementations, the procedure may be updated when the average color intensity in non-facial regions differs from an accumulated mean by a threshold amount.
    Type: Application
    Filed: May 10, 2007
    Publication date: November 13, 2008
    Applicant: Microsoft Corporation
    Inventors: Zicheng Liu, Cha Zhang, Zhengyou Zhang
  • Patent number: 7447630
    Abstract: A method and system use an alternative sensor signal received from a sensor other than an air conduction microphone to estimate a clean speech value. The estimation uses either the alternative sensor signal alone, or in conjunction with the air conduction microphone signal. The clean speech value is estimated without using a model trained from noisy training data collected from an air conduction microphone. Under one embodiment, correction vectors are added to a vector formed from the alternative sensor signal in order to form a filter, which is applied to the air conduction microphone signal to produce the clean speech estimate. In other embodiments, the pitch of a speech signal is determined from the alternative sensor signal and is used to decompose an air conduction microphone signal. The decomposed signal is then used to determine a clean signal estimate.
    Type: Grant
    Filed: November 26, 2003
    Date of Patent: November 4, 2008
    Assignee: Microsoft Corporation
    Inventors: Zicheng Liu, Michael J. Sinclair, Alejandro Acero, Xuedong D. Huang, James G. Droppo, Li Deng, Zhengyou Zhang, Yanli Zheng
  • Publication number: 20080267578
    Abstract: The present virtual video muting technique seamlessly inserts a virtual video into a live video when the user does not want to reveal his/her actual activity. The virtual video is generated from real video frames captured earlier, which makes it appear real.
    Type: Application
    Filed: April 30, 2007
    Publication date: October 30, 2008
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Aaron Fred Bobick
  • Publication number: 20080234842
    Abstract: A device controller that controls a device in response to a user tapping or rubbing the surface of microphones on the device. It allows microphones to be used as both speech sensors (to capture speech signals, the original functionality) and a device controller (the new functionality). Tapping or rubbing the surface of microphones on the device produces complex yet distinctive signals. By detecting these events, the present device controller can generate appropriate commands to control the device.
    Type: Application
    Filed: March 21, 2007
    Publication date: September 25, 2008
    Applicant: Microsoft Corporation
    Inventor: Zhengyou Zhang
  • Patent number: 7426297
    Abstract: A system that captures both whiteboard content and audio signals of a meeting using a video camera and records or transmits them in real time. The Real-Time Whiteboard Capture System (RTWCS) captures pen strokes on whiteboards in real time using an off-the-shelf video camera. Unlike many existing tools, the RTWCS does not instrument the pens or the whiteboard. It analyzes the sequence of captured video images in real time, classifies the pixels into whiteboard background, pen strokes and foreground objects (e.g., people in front of the whiteboard), and extracts newly written pen strokes. This allows the RTWCS to transmit whiteboard contents using very low bandwidth to remote meeting participants. Combined with other teleconferencing tools such as voice conferencing and application sharing, the RTWCS becomes a powerful tool to share ideas during online meetings.
    Type: Grant
    Filed: March 21, 2007
    Date of Patent: September 16, 2008
    Assignee: Microsoft Corp.
    Inventors: Zhengyou Zhang, Liwei He
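The automatic exposure correction technique in the abstracts of patents 7532359 and 7508993 measures the average intensity of all pixels and then adjusts the image pixel by pixel. The abstracts do not state the adjustment rule. The sketch below is a minimal illustration that assumes a single global gain that moves the mean toward a hypothetical target value (`target_mean`, default 128) on flat 8-bit grayscale data. It is not the patented method.

```python
def correct_exposure(pixels, target_mean=128.0):
    """Scale every pixel so the image's average intensity moves to target_mean.

    Hypothetical sketch: `pixels` is a flat list of 8-bit grayscale intensities,
    and one global gain is assumed; the patents' actual adjustment curve is not
    given in the abstracts.
    """
    if not pixels:
        return []
    mean = sum(pixels) / len(pixels)  # average intensity of all pixels
    if mean == 0:
        return list(pixels)  # all-black image: no gain can be estimated
    gain = target_mean / mean
    # Apply the same gain pixel by pixel, clamping to the valid 0..255 range.
    return [min(255, max(0, round(p * gain))) for p in pixels]


print(correct_exposure([20, 40, 60]))  # prints [64, 128, 192]
```

A global gain keeps relative contrast between pixels. Bright pixels clip at 255 when an underexposed image also contains a few highlights.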
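Publication 20090001165 describes dividing a rectified 2-D barcode into multiple uniform cells to produce a 2-D matrix array of symbols. The sketch below shows only that final step and makes several assumptions: the barcode has already been localized, rectified, and thresholded into a square 0/1 image (1 = dark), its side length is a multiple of the hypothetical grid size `n`, and each cell is read by majority vote. The function name `sample_cells` is illustrative only.

```python
def sample_cells(binary, n):
    """Read an n x n symbol matrix from a rectified, binarized 2-D barcode image.

    Hypothetical sketch: `binary` is a square list of rows of 0/1 values
    (1 = dark) that has already been localized, rectified, and thresholded,
    and its side length is assumed to be a multiple of n.
    """
    size = len(binary)
    if n <= 0 or size % n != 0 or any(len(row) != size for row in binary):
        raise ValueError("image must be square with a side length divisible by n")
    cell = size // n  # side length of one uniform cell, in pixels
    matrix = []
    for r in range(n):
        row = []
        for c in range(n):
            # Gather every pixel that falls inside cell (r, c).
            block = [binary[y][x]
                     for y in range(r * cell, (r + 1) * cell)
                     for x in range(c * cell, (c + 1) * cell)]
            # Majority vote: the cell is dark if more than half its pixels are dark.
            row.append(1 if 2 * sum(block) > len(block) else 0)
        matrix.append(row)
    return matrix


image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 1, 1, 1]]
print(sample_cells(image, 2))  # prints [[1, 0], [0, 1]]
```

Majority voting tolerates a few misclassified pixels per cell, such as the single dark pixel in the lower-left cell above.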