Patents by Inventor Hong Jiang Zhang

Hong Jiang Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20040264744
    Abstract: Improved methods and apparatuses are provided for use in face detection. The methods and apparatuses significantly reduce the number of candidate windows within a digital image that need to be processed using more complex and/or time consuming face detection algorithms. The improved methods and apparatuses include a skin color filter and an adaptive non-face skipping scheme.
    Type: Application
    Filed: June 30, 2003
    Publication date: December 30, 2004
    Applicant: MICROSOFT CORPORATION
    Inventors: Lei Zhang, Mingjing Li, Hong-Jiang Zhang
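The skin-color pre-filter described in the entry above can be illustrated with a minimal sketch. The patent's actual color model and non-face skipping scheme are not reproduced here; the fixed YCbCr skin range, the 40% skin-pixel threshold, and the function names below are assumptions chosen for illustration, with NumPy as the only dependency.

```python
import numpy as np

def skin_color_mask(rgb_image):
    """Boolean mask of likely skin pixels for an (H, W, 3) uint8 RGB image.

    Uses a commonly cited fixed YCbCr range for skin tones; the
    patented filter may use a different color model entirely.
    """
    rgb = rgb_image.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # RGB -> Cb/Cr (BT.601); luma is not needed for the range test.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

def keep_candidate_window(mask, x, y, w, h, min_skin_ratio=0.4):
    """Forward a candidate window to the full (expensive) face detector
    only if enough of its pixels look like skin."""
    window = mask[y:y + h, x:x + w]
    return window.size > 0 and window.mean() >= min_skin_ratio
```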
  • Publication number: 20040264780
    Abstract: Systems and methods for annotating a face in a digital image are described. In one aspect, a probability model is trained by mapping one or more sets of sample facial features to corresponding names of individuals. A face is then detected from an input data set of at least one digital image. Facial features are then automatically extracted from the detected face. A similarity measure is then modeled as a posterior probability that the facial features match a particular set of features identified in the probability model. The similarity measure is statistically learned. A name is then inferred as a function of the similarity measure. The face is then annotated with the name.
    Type: Application
    Filed: June 30, 2003
    Publication date: December 30, 2004
    Inventors: Lei Zhang, Longbin Chen, Mingjing Li, Hong-Jiang Zhang
  • Publication number: 20040264745
    Abstract: A face model having outer and inner facial features is matched to those of first and second models. Each facial feature of the first and second models is represented by a plurality of points that are adjusted for each matching outer and inner facial feature of the first and second models using 1) the corresponding epipolar constraint for the inner features of the first and second models and 2) the local grey-level structure of both outer and inner features of the first and second models. The matching and the adjusting are repeated, for each of the first and second models, until the points for each of the outer and inner facial features on the respective first and second models that are found to match those of the face model have a relative offset between them of no greater than a predetermined convergence tolerance. The inner facial features can include a pair of eyes, a nose and a mouth. The outer facial features can include a pair of eyebrows and a silhouette of the jaw, chin, and cheeks.
    Type: Application
    Filed: June 30, 2003
    Publication date: December 30, 2004
    Applicant: MICROSOFT CORPORATION
    Inventors: Lie Gu, Li Ziqing, Hong-Jiang Zhang
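The epipolar constraint that couples the inner features of the two models (entry above) reduces to a point-to-line distance check. The sketch below assumes a known fundamental matrix F between the two views and illustrates only that single check, not the full iterative matching procedure.

```python
import numpy as np

def epipolar_residual(p1, p2, F):
    """Distance from point p2 (second view) to the epipolar line that
    point p1 (first view) induces via the fundamental matrix F.

    p1, p2 are (x, y) pixel coordinates; a small residual means the
    pair of feature points satisfies the epipolar constraint.
    """
    x1 = np.array([p1[0], p1[1], 1.0])
    x2 = np.array([p2[0], p2[1], 1.0])
    line = F @ x1                                  # a*x + b*y + c = 0
    return abs(x2 @ line) / np.hypot(line[0], line[1])
```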
  • Publication number: 20040243541
    Abstract: An implementation of a technology, described herein, for relevance-feedback, content-based image retrieval facilitates accurate and efficient retrieval by minimizing the number of iterations of user feedback regarding the semantic relevance of exemplary images while maximizing the resulting relevance of each iteration. One technique for accomplishing this is to use a Bayesian classifier to treat positive and negative feedback examples with different strategies. In addition, query refinement techniques are applied to pinpoint the users' intended queries with respect to their feedback. These techniques further enhance the accuracy and usability of relevance feedback. This abstract itself is not intended to limit the scope of this patent. The scope of the present invention is pointed out in the appended claims.
    Type: Application
    Filed: April 26, 2004
    Publication date: December 2, 2004
    Inventors: Hong-Jiang Zhang, Zhong Su, Xingquan Zhu
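How a Bayesian classifier might "treat positive and negative feedback examples with different strategies" (entry above) can be sketched as follows: positive examples are modeled generatively with a diagonal Gaussian, while each negative example only contributes a local penalty. This is one plausible reading for illustration, not the patented formulation; the feature layout, the kernel width sigma, and the function name are assumptions.

```python
import numpy as np

def score_images(features, positives, negatives, sigma=1.0):
    """Rank database feature vectors after one round of feedback.

    features: (N, D) array of database image features.
    positives/negatives: (P, D) and (Q, D) arrays of feedback examples.
    Positives define a diagonal-Gaussian relevance model; negatives
    only penalize images that lie close to them.
    """
    mean = positives.mean(axis=0)
    var = positives.var(axis=0) + 1e-6
    log_pos = -0.5 * np.sum((features - mean) ** 2 / var, axis=1)

    penalty = np.zeros(len(features))
    for neg in negatives:
        d2 = np.sum((features - neg) ** 2, axis=1)
        penalty += np.exp(-d2 / (2.0 * sigma ** 2))   # closeness to a negative

    return log_pos - penalty                          # higher = more relevant
```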
  • Publication number: 20040240708
    Abstract: Improvements are provided to effectively assess a user's face and head pose such that a computer or like device can track the user's attention toward one or more display devices. The region of the display or graphical user interface that the user is turned toward can then be automatically selected without requiring the user to provide further inputs. A frontal face detector is applied to detect the user's frontal face, and key facial points such as the left/right eye centers, left/right mouth corners, nose tip, etc., are then detected by component detectors. The system then tracks the user's head with an image tracker and determines the yaw, tilt and roll angles and other pose information of the user's head through a coarse-to-fine process according to the key facial points and/or confidence outputs from a pose estimator.
    Type: Application
    Filed: May 30, 2003
    Publication date: December 2, 2004
    Applicant: MICROSOFT CORPORATION
    Inventors: Yuxiao Hu, Lei Zhang, Mingjing Li, Hong-Jiang Zhang
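As a rough illustration of how key facial points constrain pose (entry above), the sketch below derives the in-plane roll angle from the eye line and a coarse yaw cue from the nose offset. This is only a geometric toy; the patented system also uses mouth corners, an image tracker, and a learned coarse-to-fine pose estimator.

```python
import math

def rough_pose_from_keypoints(left_eye, right_eye, nose_tip):
    """Estimate roll (degrees) and a unitless yaw cue from three
    (x, y) facial key points."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll = math.degrees(math.atan2(dy, dx))        # in-plane head rotation

    eye_mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_span = math.hypot(dx, dy) or 1.0
    # A nose tip displaced toward one eye suggests the head is turned.
    yaw_cue = (nose_tip[0] - eye_mid_x) / eye_span
    return roll, yaw_cue

# Example: the nose sits left of the eye midpoint -> head turned to its right.
print(rough_pose_from_keypoints((100, 120), (160, 122), (118, 150)))
```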
  • Publication number: 20040228542
    Abstract: In one aspect, the present disclosure describes a process for automatic artifact compensation in a digital representation of an image. The process includes detecting, by a processor, regions corresponding to facial images within the digital representation; locating, by the processor, red-eye regions within the detected regions; and automatically modifying, by the processor, the located red-eye regions to provide a modified image.
    Type: Application
    Filed: May 13, 2003
    Publication date: November 18, 2004
    Applicant: MICROSOFT CORPORATION
    Inventors: Lei Zhang, Yanfeng Sun, Mingjing Li, Hong-Jiang Zhang
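A minimal sketch of the "locate red-eye regions inside detected faces, then modify them" flow described above. The redness ratio, the threshold, and the channel-averaging correction are illustrative assumptions rather than the patented compensation method.

```python
import numpy as np

def desaturate_red_eye(rgb_image, face_boxes, redness_thresh=1.8):
    """Attenuate strongly red pixels inside detected face regions.

    rgb_image: (H, W, 3) uint8 array; face_boxes: iterable of
    (x, y, w, h) rectangles from a face detector.
    """
    out = rgb_image.astype(np.float32)
    for x, y, w, h in face_boxes:
        roi = out[y:y + h, x:x + w]
        r, g, b = roi[..., 0], roi[..., 1], roi[..., 2]
        redness = r / (0.5 * (g + b) + 1e-3)
        mask = redness > redness_thresh
        # Replace the red channel with the green/blue average -> darker pupil.
        roi[..., 0][mask] = 0.5 * (g[mask] + b[mask])
    return out.astype(np.uint8)
```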
  • Publication number: 20040220925
    Abstract: Systems and methods for a media agent are described. In one aspect, user access of a media content source is detected. Responsive to this detection, a piece of media content and associated text is collected from the media content source. Semantic text features are extracted from the associated text and the piece of media content. The semantic text features are indexed into a media database.
    Type: Application
    Filed: May 24, 2004
    Publication date: November 4, 2004
    Applicant: Microsoft Corporation
    Inventors: Wen-Yin Liu, Hong-Jiang Zhang, Zheng Chen
  • Publication number: 20040216585
    Abstract: Systems and methods for extracting a music snippet from a music stream are described. In one aspect, one or more music sentences are extracted from the music stream. The one or more sentences are extracted as a function of peaks and valleys of acoustic energy across sequential music stream portions. The music snippet is selected based on the one or more music sentences.
    Type: Application
    Filed: June 3, 2004
    Publication date: November 4, 2004
    Applicant: Microsoft Corporation
    Inventors: Lie Lu, Hong-Jiang Zhang, Po Yuan
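The peak/valley analysis mentioned in the entry above can be approximated by treating local minima of a smoothed short-time energy curve as candidate sentence boundaries. The frame length, smoothing window, and valley test are placeholder choices; the patented snippet selection built on top of these boundaries is not shown.

```python
import numpy as np

def sentence_boundaries(samples, sr, frame_ms=20, smooth=25):
    """Return candidate music-sentence boundary times (seconds) for a
    mono float waveform sampled at rate sr."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)            # short-time energy
    energy = np.convolve(energy, np.ones(smooth) / smooth, mode="same")

    valleys = [i for i in range(1, n_frames - 1)
               if energy[i] < energy[i - 1] and energy[i] <= energy[i + 1]]
    return [v * frame_len / sr for v in valleys]
```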
  • Publication number: 20040215663
    Abstract: Systems and methods for a media agent are described. In one aspect, it is determined whether a user wants to save or download a media object from a media source. Semantic information is then extracted from the media source. A filename is then suggested to the user for the media object based on the semantic information.
    Type: Application
    Filed: May 24, 2004
    Publication date: October 28, 2004
    Applicant: Microsoft Corporation
    Inventors: Wen-Yin Liu, Hong-Jiang Zhang, Zheng Chen
  • Publication number: 20040197071
    Abstract: An algorithm identifies a salient video frame from a video sequence for use as a video thumbnail. The identification of the video thumbnail is based on a frame goodness measure. The algorithm calculates a color histogram for each frame and then calculates the entropy and standard deviation of that histogram. The frame goodness measure is a weighted combination of the entropy and the standard deviation. The video frame having the highest frame goodness measure in a video sequence is selected as the video thumbnail for that sequence.
    Type: Application
    Filed: April 1, 2003
    Publication date: October 7, 2004
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Zhang, Yijin Wang, Hong-Jiang Zhang
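The frame goodness measure above is concrete enough to sketch directly: a per-frame color histogram, its entropy and standard deviation, and a weighted combination of the two. The bin count and the equal weights below are placeholder assumptions, not values taken from the patent.

```python
import numpy as np

def frame_goodness(rgb_frame, w_entropy=0.5, w_std=0.5):
    """Score an (H, W, 3) frame by the entropy and standard deviation
    of its (concatenated per-channel) color histogram."""
    hist = [np.histogram(rgb_frame[..., c], bins=16, range=(0, 256))[0]
            for c in range(3)]
    p = np.concatenate(hist).astype(np.float64)
    p /= p.sum() + 1e-12                           # normalize to a distribution
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return w_entropy * entropy + w_std * p.std()

def pick_thumbnail(frames):
    """Return the frame with the highest goodness score."""
    return max(frames, key=frame_goodness)
```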
  • Publication number: 20040170392
    Abstract: A “music video parser” automatically detects and segments music videos in a combined audio-video media stream. Automatic detection and segmentation is achieved by integrating shot boundary detection, video text detection and audio analysis to automatically detect temporal boundaries of each music video in the media stream. In one embodiment, song identification information, such as, for example, a song name, artist name, album name, etc., is automatically extracted from the media stream using video optical character recognition (OCR). This information is then used in alternate embodiments for cataloging, indexing and selecting particular music videos, and in maintaining statistics such as the times particular music videos were played, and the number of times each music video was played.
    Type: Application
    Filed: February 19, 2003
    Publication date: September 2, 2004
    Inventors: Lie Lu, Yan-Feng Sun, Mingjing Li, Xian-Sheng Hua, Hong-Jiang Zhang
  • Patent number: 6784354
    Abstract: Systems and methods for extracting a music snippet from a music stream are described. In one aspect, the music stream is divided into multiple frames of fixed length. The most-salient frame of the multiple frames is then identified. One or more music sentences are then extracted from the music stream as a function of peaks and valleys of acoustic energy across sequential music stream portions. The music snippet is the sentence that includes the most-salient frame.
    Type: Grant
    Filed: March 13, 2003
    Date of Patent: August 31, 2004
    Assignee: Microsoft Corporation
    Inventors: Lie Lu, Hong-Jiang Zhang, Po Yuan
  • Publication number: 20040165784
    Abstract: Systems and methods for adapting images for substantially optimal presentation on client displays of heterogeneous sizes are described. In one aspect, an image is modeled with respect to multiple visual attentions to generate respective attention objects for each of the visual attentions. For each of one or more image adaptation schemes, an objective measure of information fidelity (IF) is determined for a region R of the image. The objective measures are determined as a function of a resource constraint of the display device and as a function of a weighted sum of the IF of each attention object in the region R. A substantially optimal adaptation scheme is then selected as a function of the calculated objective measures. The image is then adapted via the selected adaptation scheme to generate an adapted image as a function of at least the target area of the client display.
    Type: Application
    Filed: February 20, 2003
    Publication date: August 26, 2004
    Inventors: Xing Xie, Wei-Ying Ma, Hong-Jiang Zhang, Liqun Chen, Xin Fan
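As a simplified stand-in for the information-fidelity objective described above, the sketch below scores each candidate crop (already sized to the client display) by the total weight of the attention objects it fully contains and keeps the best one. The containment test and the simple additive score are assumptions; the patent's IF measure and adaptation schemes are richer than this.

```python
def contained(obj_box, region):
    """True if an attention object's (x, y, w, h) box lies inside the region."""
    ox, oy, ow, oh = obj_box
    rx, ry, rw, rh = region
    return ox >= rx and oy >= ry and ox + ow <= rx + rw and oy + oh <= ry + rh

def best_region(attention_objects, candidate_regions):
    """Pick the candidate crop with the largest total attention weight.

    attention_objects: list of (box, weight) pairs from the attention model.
    candidate_regions: (x, y, w, h) crops that already satisfy the
    display-size (resource) constraint.
    """
    def fidelity(region):
        return sum(w for box, w in attention_objects if contained(box, region))
    return max(candidate_regions, key=fidelity)
```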
  • Publication number: 20040161154
    Abstract: Systems and methods for learning-based automatic commercial content detection are described. In one aspect, program data is divided into multiple segments. The segments are analyzed to determine visual, audio, and context-based feature sets that differentiate commercial content from non-commercial content. The context-based features are a function of single-side left and/or right neighborhoods of segments of the multiple segments.
    Type: Application
    Filed: February 18, 2003
    Publication date: August 19, 2004
    Inventors: Xian-Sheng Hua, Lie Lu, Mingjing Li, Hong-Jiang Zhang
  • Publication number: 20040145602
    Abstract: A technique is provided for organizing and displaying digital photographs based on time. The technique includes inputting data representing a photograph and storing the data as a photograph image file. The technique then identifies the manner in which the photograph image file stores time information (such as date and time of day). For instance, the technique determines whether the time information is digitally encoded in the image file, or whether it is embedded within the image data itself. The technique next includes extracting the time information from the photograph image file using a technique appropriate to the identified manner in which the time information is stored, to produce extracted time information. The photographs are then inserted into a time sequence based on the extracted time information, and presented on a calendar display at a location representative of the chronological placement of the photograph within the time sequence.
    Type: Application
    Filed: January 24, 2003
    Publication date: July 29, 2004
    Applicant: Microsoft Corporation
    Inventors: Yan-Feng Sun, Lei Zhang, Mingjing Li, Hong-Jiang Zhang
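The "identify how the time is stored, then extract it accordingly" step described above can be sketched with Pillow: read the EXIF DateTime tag when the file carries one, otherwise fall back to the file's modification time. The tag choice (306, DateTime) and the fallback are simplifying assumptions; the patent also covers time stamps embedded in the image pixels themselves, which is not attempted here.

```python
from datetime import datetime
from pathlib import Path

from PIL import Image  # Pillow

def photo_timestamp(path):
    """Best-effort capture time for a photo file."""
    try:
        exif = Image.open(path).getexif()
        raw = exif.get(306)                  # EXIF DateTime, 'YYYY:MM:DD HH:MM:SS'
        if raw:
            return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S")
    except (OSError, ValueError):
        pass
    return datetime.fromtimestamp(Path(path).stat().st_mtime)

def sort_into_timeline(paths):
    """Order photos chronologically, as a calendar view would need."""
    return sorted(paths, key=photo_timestamp)
```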
  • Patent number: 6748398
    Abstract: An implementation of a technology, described herein, for relevance-feedback, content-based image retrieval facilitates accurate and efficient retrieval by minimizing the number of iterations of user feedback regarding the semantic relevance of exemplary images while maximizing the resulting relevance of each iteration. One technique for accomplishing this is to use a Bayesian classifier to treat positive and negative feedback examples with different strategies. In addition, query refinement techniques are applied to pinpoint the users' intended queries with respect to their feedback. These techniques further enhance the accuracy and usability of relevance feedback. This abstract itself is not intended to limit the scope of this patent. The scope of the present invention is pointed out in the appended claims.
    Type: Grant
    Filed: March 30, 2001
    Date of Patent: June 8, 2004
    Assignee: Microsoft Corporation
    Inventors: Hong-Jiang Zhang, Zhong Su, Xingquan Zhu
  • Publication number: 20040107100
    Abstract: A method is provided for real-time speaker change detection and speaker tracking in a speech signal. The method is a "coarse-to-refine" process that consists of two stages: pre-segmentation and refinement. In the pre-segmentation stage, the covariance of the feature vectors of each speech segment is built first. A distance is determined based on the covariances of the current segment and a previous segment, and the distance is used to determine whether there is a potential speaker change between the two segments. If there is no speaker change, the model of the currently identified speaker is updated by incorporating data from the current segment. Otherwise, a refinement process is used to confirm the potential speaker-change point.
    Type: Application
    Filed: November 29, 2002
    Publication date: June 3, 2004
    Inventors: Lie Lu, Hong-Jiang Zhang
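One way to realize the covariance-based distance in the pre-segmentation stage above is a symmetric divergence between Gaussians fitted to the feature vectors of consecutive segments. The specific divergence, the regularization term, and the change threshold are illustrative assumptions; the refinement stage is not reproduced.

```python
import numpy as np

def covariance_distance(feats_a, feats_b):
    """Symmetric divergence between two speech segments, each given as
    a (frames, dims) matrix of acoustic features (e.g. MFCCs)."""
    dims = feats_a.shape[1]
    ca = np.cov(feats_a, rowvar=False) + 1e-6 * np.eye(dims)
    cb = np.cov(feats_b, rowvar=False) + 1e-6 * np.eye(dims)
    # Symmetric KL divergence between zero-mean Gaussians with these covariances.
    return 0.5 * (np.trace(np.linalg.solve(ca, cb))
                  + np.trace(np.linalg.solve(cb, ca))) - dims

def potential_change(prev_feats, cur_feats, threshold=5.0):
    """Flag a possible speaker change between consecutive segments;
    the threshold is an arbitrary placeholder."""
    return covariance_distance(prev_feats, cur_feats) > threshold
```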
  • Publication number: 20040103371
    Abstract: A large web page is analyzed and partitioned into smaller sub-pages so that a user can navigate the web page on a small form factor device. The user can browse the sub-pages to find and read information in the content of the large web page. The partitioning can be performed at a web server, an edge server, at the small form factor device, or can be distributed across one or more such devices. The analysis leverages design habits of a web page author to extract a representation structure of an authored web page. The extracted representation structure includes high level structure using several markup language tag selection rules and low level structure using visual boundary detection in which visual units of the low level structure are provided by clustering markup language tags. User viewing habits can be learned to display favorite parts of a web page.
    Type: Application
    Filed: November 27, 2002
    Publication date: May 27, 2004
    Inventors: Yu Chen, Wei-Ying Ma, Ming-Yu Wang, Hong Jiang Zhang
  • Patent number: 6738512
    Abstract: Shape suppression is used to identify areas of images that include particular shapes. According to one embodiment, a Vector Quantization (VQ)-based shape classifier is designed to identify the vertical edges of a set of shapes (e.g., English letters and numbers). A shape suppression filter is applied to the candidate areas, which are identified from a vertical edge map according to the edge density, to remove the vertical edges which are not classified as characteristic of shapes. Areas with high enough edge density after the filtering are identified as potential areas of the image that include one or more of the set of shapes.
    Type: Grant
    Filed: June 19, 2000
    Date of Patent: May 18, 2004
    Assignee: Microsoft Corporation
    Inventors: Xiangrong Chen, Hong-Jiang Zhang
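The edge-density step that produces candidate areas (entry above) can be sketched as follows; the edge threshold, window size, and density cutoff are placeholders, and the VQ-based shape classifier that then filters these candidates is not reproduced.

```python
import numpy as np

def candidate_text_areas(gray, win=32, stride=16, min_density=0.15):
    """Return (x, y, density) for windows of a grayscale image whose
    vertical-edge density is high enough to be candidate shape/text areas."""
    gx = np.abs(np.diff(gray.astype(np.float32), axis=1))  # horizontal gradient
    edges = gx > gx.mean() + 2.0 * gx.std()                # crude vertical-edge map

    candidates = []
    h, w = edges.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            density = edges[y:y + win, x:x + win].mean()
            if density > min_density:
                candidates.append((x, y, float(density)))
    return candidates
```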
  • Publication number: 20040086046
    Abstract: Systems and methods to generate a motion attention model of a video data sequence are described. In one aspect, a motion saliency map B is generated to precisely indicate motion attention areas for each frame in the video data sequence. The motion saliency maps are each based on intensity I, spatial coherence Cs, and temporal coherence Ct values. These values are extracted from each block or pixel in motion fields that are extracted from the video data sequence. Brightness values of detected motion attention areas in each frame are accumulated to generate, with respect to time, the motion attention model.
    Type: Application
    Filed: November 1, 2002
    Publication date: May 6, 2004
    Inventors: Yu-Fei Ma, Hong-Jiang Zhang
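Of the three quantities named in the entry above, only the intensity term I is simple enough to sketch without guessing at the patented definitions of Cs and Ct; the code below therefore accumulates normalized motion-vector magnitude per frame and leaves the coherence terms and the block-level saliency map aside.

```python
import numpy as np

def motion_intensity(flow):
    """Normalized magnitude of a dense (H, W, 2) motion field."""
    mag = np.hypot(flow[..., 0], flow[..., 1])
    return mag / (mag.max() + 1e-6)

def motion_attention_curve(flows):
    """One attention value per frame, from motion intensity alone.
    The spatial (Cs) and temporal (Ct) coherence terms of the patented
    model are intentionally omitted."""
    return [float(motion_intensity(f).mean()) for f in flows]
```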