Patents by Inventor Yong Rui

Yong Rui has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 6934370
    Abstract: A system for communicating audio data signals comprises a source computer that performs an action, generates an event message corresponding to the action, converts the event message into an audio data signal, and communicates the audio data signal through its speaker. A source telephone receives a voice signal from a participant and the audio data signal through its microphone and communicates the audio data signal and voice as coherent sound via an audio communications medium. A recipient telephone receives the audio data signal from the coherent sound communicated via the audio communications medium and communicates the audio data signal via its speaker. A recipient computer receives the audio data signal through its microphone, extracts the event message from the audio data signal, and performs an action based on the event message from the audio data signal. The audio communications medium can comprise a telephone communications system or air.
    Type: Grant
    Filed: June 16, 2003
    Date of Patent: August 23, 2005
    Assignee: Microsoft Corporation
    Inventors: Roy Leban, Ross Garrett Cutler, Henrique S. Malvar, Yong Rui
  • Publication number: 20050160457
    Abstract: Audio/video programming content is made available to a receiver from a content provider, and meta data is made available to the receiver from a meta data provider. The meta data corresponds to the programming content, and identifies, for each of multiple portions of the programming content, an indicator of a likelihood that the portion is an exciting portion of the content. In one implementation, the meta data includes probabilities that segments of a baseball program are exciting, and is generated by analyzing the audio data of the baseball program for both excited speech and baseball hits. The meta data can then be used to generate a summary for the baseball program.
    Type: Application
    Filed: March 15, 2005
    Publication date: July 21, 2005
    Applicant: Microsoft Corporation
    Inventors: Yong Rui, Anoop Gupta, Alejandro Acero
  • Publication number: 20050159956
    Abstract: Audio/video programming content is made available to a receiver from a content provider, and meta data is made available to the receiver from a meta data provider. The meta data corresponds to the programming content, and identifies, for each of multiple portions of the programming content, an indicator of a likelihood that the portion is an exciting portion of the content. In one implementation, the meta data includes probabilities that segments of a baseball program are exciting, and is generated by analyzing the audio data of the baseball program for both excited speech and baseball hits. The meta data can then be used to generate a summary for the baseball program.
    Type: Application
    Filed: March 4, 2005
    Publication date: July 21, 2005
    Applicant: Microsoft Corporation
    Inventors: Yong Rui, Anoop Gupta, Alejandro Acero
  • Publication number: 20050147278
    Abstract: Automatic detection and tracking of multiple individuals includes receiving a frame of video and/or audio content and identifying a candidate area for a new face region in the frame. One or more hierarchical verification levels are used to verify whether a human face is in the candidate area, and an indication made that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area. A plurality of audio and/or video cues are used to track each verified face in the video content from frame to frame.
    Type: Application
    Filed: January 25, 2005
    Publication date: July 7, 2005
    Applicant: Microsoft Corporation
    Inventors: Yong Rui, Yunqiang Chen
  • Publication number: 20050129278
    Abstract: Automatic detection and tracking of multiple individuals includes receiving a frame of video and/or audio content and identifying a candidate area for a new face region in the frame. One or more hierarchical verification levels are used to verify whether a human face is in the candidate area, and an indication made that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area. A plurality of audio and/or video cues are used to track each verified face in the video content from frame to frame.
    Type: Application
    Filed: January 25, 2005
    Publication date: June 16, 2005
    Applicant: Microsoft Corporation
    Inventors: Yong Rui, Yunqiang Chen
  • Publication number: 20050114079
    Abstract: A system and process for tracking an object state over time using particle filter sensor fusion and a plurality of logical sensor modules is presented. This new fusion framework combines both the bottom-up and top-down approaches to sensor fusion to probabilistically fuse multiple sensing modalities. At the lower level, individual vision and audio trackers can be designed to generate effective proposals for the fuser. At the higher level, the fuser performs reliable tracking by verifying hypotheses over multiple likelihood models from multiple cues. Different from the traditional fusion algorithms, the present framework is a closed-loop system where the fuser and trackers coordinate their tracking information. Furthermore, to handle non-stationary situations, the present framework evaluates the performance of the individual trackers and dynamically updates their object states.
    Type: Application
    Filed: November 10, 2004
    Publication date: May 26, 2005
    Applicant: Microsoft Corporation
    Inventors: Yong Rui, Yunqiang Chen
  • Publication number: 20050086223
    Abstract: An improved image retrieval process based on relevance feedback uses a hierarchical (per-feature) approach in comparing images. Multiple query vectors are generated for an initial image by extracting multiple low-level features from the initial image. When determining how closely a particular image in an image collection matches the initial image, a distance is calculated between the query vectors and corresponding low-level feature vectors extracted from the particular image. Once these individual distances are calculated, they are combined to generate an overall distance that represents how closely the two images match. According to other aspects, relevancy feedback received regarding previously retrieved images is used during the query vector generation and the distance determination to influence which images are subsequently retrieved.
    Type: Application
    Filed: October 21, 2004
    Publication date: April 21, 2005
    Applicant: Microsoft Corporation
    Inventor: Yong Rui
  • Publication number: 20050086703
    Abstract: A program distribution system includes a plurality of set-top boxes that receive broadcast programming and segmentation data from content and information providers. The segmentation information indicates portions of programs that are to be included in skimmed or condensed versions of the received programming, and is produced using manual or automated methods. Automated methods include the use of ancillary production data to detect the most important parts of a program. A user interface allows a user to control time scale modification and skimming during playback, and also allows the user to easily browse to different points within the current program.
    Type: Application
    Filed: October 22, 2004
    Publication date: April 21, 2005
    Applicant: Microsoft Corporation
    Inventors: Anoop Gupta, Li-Wei He, Francis Li, Yong Rui
  • Publication number: 20050083849
    Abstract: Estimation of available bandwidth on a network uses packet pairs and spatial filtering. Packet pairs are transmitted over the network. The dispersion of the packet pairs is used to generate samples of the available bandwidth, which are then classified into bins to generate a histogram. The bins can have uniform bin widths, and the histogram data can be aged so that older samples are given less weight in the estimation. The histogram data is then spatially filtered. Kernel density algorithms can be used to spatially filter the histogram data. The network available bandwidth is estimated using the spatially filtered histogram data. Alternatively, the spatially filtered histogram data can be temporally filtered before the available bandwidth is estimated.
    Type: Application
    Filed: October 15, 2003
    Publication date: April 21, 2005
    Inventors: Yong Rui, Andres Vega-Garcia
  • Patent number: 6882959
    Abstract: A system and process for tracking an object state over time using particle filter sensor fusion and a plurality of logical sensor modules is presented. This new fusion framework combines both the bottom-up and top-down approaches to sensor fusion to probabilistically fuse multiple sensing modalities. At the lower level, individual vision and audio trackers can be designed to generate effective proposals for the fuser. At the higher level, the fuser performs reliable tracking by verifying hypotheses over multiple likelihood models from multiple cues. Different from the traditional fusion algorithms, the present framework is a closed-loop system where the fuser and trackers coordinate their tracking information. Furthermore, to handle non-stationary situations, the present framework evaluates the performance of the individual trackers and dynamically updates their object states.
    Type: Grant
    Filed: May 2, 2003
    Date of Patent: April 19, 2005
    Assignee: Microsoft Corporation
    Inventors: Yong Rui, Yunqiang Chen
  • Publication number: 20050076081
    Abstract: Indications of which participant is providing information during a multi-party conference. Each participant has equipment to display information being transferred during the conference. A sourcing signaler residing in the participant equipment provides a signal that indicates the identity of its participant when this participant is providing information to the conference. The source indicators of the other participant equipment receive the signal and cause a UI to indicate that the participant identified by the received signal is providing information (e.g. the UI can cause the identifier to change appearance). An audio discriminator is used to distinguish between an acoustic signal that was generated by a person speaking from that generated in a band-limited manner. The audio discriminator analyzes the spectrum of detected audio signals and generates several parameters from the spectrum and from past determinations to determine the source of an audio signal on a frame-by-frame basis.
    Type: Application
    Filed: October 1, 2003
    Publication date: April 7, 2005
    Inventors: Yong Rui, Anoop Gupta
  • Publication number: 20050065929
    Abstract: An improved image retrieval process based on relevance feedback uses a hierarchical (per-feature) approach in comparing images. Multiple query vectors are generated for an initial image by extracting multiple low-level features from the initial image. When determining how closely a particular image in an image collection matches the initial image, a distance is calculated between the query vectors and corresponding low-level feature vectors extracted from the particular image. Once these individual distances are calculated, they are combined to generate an overall distance that represents how closely the two images match. According to other aspects, relevancy feedback received regarding previously retrieved images is used during the query vector generation and the distance determination to influence which images are subsequently retrieved.
    Type: Application
    Filed: October 26, 2004
    Publication date: March 24, 2005
    Applicant: Microsoft Corporation
    Inventor: Yong Rui
  • Publication number: 20050065802
    Abstract: A system and method for automatically determining if a remote client is a human or a computer. A set of Human Interactive Proof (HIP) design guidelines which are important to ensure the security and usability of a HIP system are described. Furthermore, one embodiment of this new HIP system and method is based on human face and facial feature detection. Because the human face is the most familiar object to all human users, the embodiment of the invention employing a face is possibly the most universal HIP system so far.
    Type: Application
    Filed: September 19, 2003
    Publication date: March 24, 2005
    Applicant: Microsoft Corporation
    Inventors: Yong Rui, Zicheng Liu
  • Patent number: 6859802
    Abstract: An improved image retrieval process based on relevance feedback uses a hierarchical (per-feature) approach in comparing images. Multiple query vectors are generated for an initial image by extracting multiple low-level features from the initial image. When determining how closely a particular image in an image collection matches the initial image, a distance is calculated between the query vectors and corresponding low-level feature vectors extracted from the particular image. Once these individual distances are calculated, they are combined to generate an overall distance that represents how closely the two images match. According to other aspects, relevancy feedback received regarding previously retrieved images is used during the query vector generation and the distance determination to influence which images are subsequently retrieved.
    Type: Grant
    Filed: September 13, 2000
    Date of Patent: February 22, 2005
    Assignee: Microsoft Corporation
    Inventor: Yong Rui
  • Publication number: 20040263636
    Abstract: A system and method for teleconferencing and recording of meetings. The system uses a variety of capture devices (a novel 360° camera, a whiteboard camera, a presenter view camera, a remote view camera, and a microphone array) to provide a rich experience for people who want to participate in a meeting from a distance. The system is also combined with speaker clustering, spatial indexing, and time compression to provide a rich experience for people who miss a meeting and want to watch it afterward.
    Type: Application
    Filed: June 26, 2003
    Publication date: December 30, 2004
    Applicant: Microsoft Corporation
    Inventors: Ross Cutler, Yong Rui, Anoop Gupta
  • Publication number: 20040240680
    Abstract: A system and process for finding the location of a sound source using direct approaches having weighting factors that mitigate the effect of both correlated and reverberation noise is presented. When more than two microphones are used, the traditional time-delay-of-arrival (TDOA) based sound source localization (SSL) approach involves two steps. The first step computes TDOA for each microphone pair, and the second step combines these estimates. This two-step process discards relevant information in the first step, thus degrading the SSL accuracy and robustness. In the present invention, direct, one-step, approaches are employed. Namely, a one-step TDOA SSL approach and a steered beam (SB) SSL approach are employed. Each of these approaches provides an accuracy and robustness not available with the traditional two-step approaches.
    Type: Application
    Filed: May 28, 2003
    Publication date: December 2, 2004
    Inventors: Yong Rui, Dinei A. Florencio
  • Publication number: 20040220769
    Abstract: A system and process for tracking an object state over time using particle filter sensor fusion and a plurality of logical sensor modules is presented. This new fusion framework combines both the bottom-up and top-down approaches to sensor fusion to probabilistically fuse multiple sensing modalities. At the lower level, individual vision and audio trackers can be designed to generate effective proposals for the fuser. At the higher level, the fuser performs reliable tracking by verifying hypotheses over multiple likelihood models from multiple cues. Different from the traditional fusion algorithms, the present framework is a closed-loop system where the fuser and trackers coordinate their tracking information. Furthermore, to handle non-stationary situations, the present framework evaluates the performance of the individual trackers and dynamically updates their object states.
    Type: Application
    Filed: May 2, 2003
    Publication date: November 4, 2004
    Inventors: Yong Rui, Yunqiang Chen
  • Publication number: 20040190730
    Abstract: A system and process for estimating the time delay of arrival (TDOA) between a pair of audio sensors of a microphone array is presented. Generally, a generalized cross-correlation (GCC) technique is employed. However, this technique is improved to include provisions for both reducing the influence (including interference) from correlated ambient noise and reverberation noise in the sensor signals prior to computing the TDOA estimate. Two unique correlated ambient noise reduction procedures are also proposed. One involves the application of Wiener filtering, and the other a combination of Wiener filtering with a Gnn subtraction technique. In addition, two unique reverberation noise reduction procedures are proposed. Both involve applying a weighting factor to the signals prior to computing the TDOA which combines the effects of a traditional maximum likelihood (TML) weighting function and a phase transformation (PHAT) weighting function.
    Type: Application
    Filed: March 31, 2003
    Publication date: September 30, 2004
    Inventors: Yong Rui, Dinei A. Florencio
  • Publication number: 20040105004
    Abstract: An automated camera management system and method for capturing presentations using videography rules. The system and method use technology components and aesthetic components represented by the videography rules to capture a presentation. In general, the automated camera management method captures a presentation using videography rules to determine camera positioning, camera movement, and switching or transition between cameras. The videography rules depend on the type of presentation room and the number of audio-visual camera units used to capture the presentation. The automated camera management system of the invention uses the above method to capture a presentation in a presentation room. The system includes at least one audio-visual (A-V) camera unit for capturing and tracking a subject based on vision or sound. The A-V camera unit includes any combination of the following components: (1) a pan-tilt-zoom (PTZ) camera; (2) a fixed camera; and (3) a microphone array.
    Type: Application
    Filed: November 30, 2002
    Publication date: June 3, 2004
    Inventors: Yong Rui, Anoop Gupta, Jonathan Thomas Grudin
  • Publication number: 20040037436
    Abstract: A system and process is described for estimating the location of a speaker using signals output by a microphone array characterized by multiple pairs of audio sensors. The location of a speaker is estimated by first determining whether the signal data contains human speech components and filtering out noise attributable to stationary sources. The location of the person speaking is then estimated using a time-delay-of-arrival based SSL technique on those parts of the data determined to contain human speech components. A consensus location for the speaker is computed from the individual location estimates associated with each pair of microphone array audio sensors taking into consideration the uncertainty of each estimate. A final consensus location is also computed from the individual consensus locations computed over a prescribed number of sampling periods using a temporal filtering technique.
    Type: Application
    Filed: August 26, 2002
    Publication date: February 26, 2004
    Inventor: Yong Rui
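
The abstract of publication 20040037436 above describes computing a consensus speaker location from per-microphone-pair estimates "taking into consideration the uncertainty of each estimate". The abstract does not say which combination rule is used, so the following is only a minimal sketch that assumes inverse-variance weighting, a common way to fuse independent estimates. The function name `consensus_angle` and the representation of each estimate as an (angle in degrees, variance) pair are hypothetical and are not taken from the patent. Angle wrap-around at 360° is ignored for simplicity.

```python
from typing import Sequence, Tuple


def consensus_angle(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse (angle_deg, variance) pairs by inverse-variance weighting.

    Assumption (not from the patent): each microphone pair yields an
    independent angle estimate with a known, strictly positive variance.
    Returns the fused angle and the variance of the fused estimate.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(var <= 0 for _, var in estimates):
        raise ValueError("variances must be positive")

    # A more certain estimate (smaller variance) gets a larger weight.
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)

    fused = sum(w * angle for w, (angle, _) in zip(weights, estimates)) / total
    return fused, 1.0 / total


if __name__ == "__main__":
    # Two equally uncertain pairs, so the result is the plain average.
    print(consensus_angle([(10.0, 1.0), (20.0, 1.0)]))
```

With this rule, an estimate with twice the variance contributes half the weight, so noisy microphone pairs pull the consensus less than reliable ones.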