Patents by Inventor Tao Mei

Tao Mei has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120297038
    Abstract: Techniques are described for analyzing users and groups of a social network to identify user interests and to provide recommendations based on those interests. A content-awareness application obtains a collection of images, and the tags associated with those images, belonging to members of the social network. The application decomposes the membership into a representative matrix identifying users and groups, and calculates a similarity matrix between the users and their images based on the visual content of the images and the textual content of the tags. The application then constructs a graph Laplacian over the users and the groups, aligned with the representative matrix based at least in part on the similarity matrix, and recommends groups for a user to join in the social network based at least in part on the graph Laplacian identifying the user's interests. A minimal sketch of deriving a graph Laplacian from a similarity matrix follows this entry.
    Type: Application
    Filed: May 16, 2011
    Publication date: November 22, 2012
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Jinfeng Zhuang
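
The abstract above builds on a graph Laplacian computed over user/group similarities. The sketch below shows the standard construction of an (optionally normalized) Laplacian from a symmetric similarity matrix; the example values and the `graph_laplacian` helper are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def graph_laplacian(similarity, normalized=True):
    """Build a graph Laplacian from a symmetric similarity matrix.

    similarity: (n, n) array whose entry (i, j) measures how alike users i
    and j are (e.g., a combination of visual and tag similarity).
    """
    W = np.asarray(similarity, dtype=float)
    W = (W + W.T) / 2.0                  # enforce symmetry
    np.fill_diagonal(W, 0.0)             # drop self-similarities
    d = W.sum(axis=1)                    # weighted node degrees
    L = np.diag(d) - W                   # unnormalized Laplacian: L = D - W
    if normalized:
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
        L = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]   # D^-1/2 L D^-1/2
    return L

# Three users with made-up pairwise similarities.
S = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
print(graph_laplacian(S))
```
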
  • Publication number: 20120263433
    Abstract: Tools and techniques for acquiring key roles and their relationships from a video independent of metadata, such as cast lists and scripts, are described herein. These techniques discover key roles and their relationships by treating a video (e.g., a movie, television program, music video, or personal video) as a community. For instance, a video is segmented into a hierarchical structure with levels for scenes, shots, and key frames. In some implementations, the techniques perform face detection and grouping on the key frames. In some implementations, the techniques exploit the key roles and their correlations in the video to discover a community. The discovered community supports a wide variety of applications, including the automatic generation of visual summaries or video posters featuring the acquired key roles. A sketch of ranking key roles by scene co-occurrence follows this entry.
    Type: Application
    Filed: April 12, 2011
    Publication date: October 18, 2012
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Yan Wang
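
Assuming face detection and grouping have already produced per-scene role IDs, a minimal way to treat the video as a community is to link roles that share scenes and rank them by weighted degree. The data and the `build_role_graph` helper below are hypothetical, not the patent's method.

```python
from collections import defaultdict
from itertools import combinations

def build_role_graph(scenes):
    """scenes: list of sets of role ids seen (via face grouping) in each scene.
    Returns a weighted co-occurrence graph and a key-role ranking."""
    weights = defaultdict(int)
    for roles in scenes:
        for a, b in combinations(sorted(roles), 2):
            weights[(a, b)] += 1                 # roles sharing a scene are linked
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    key_roles = sorted(degree, key=degree.get, reverse=True)
    return weights, key_roles

# Hypothetical face-grouping output: role ids per scene.
scenes = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
graph, ranking = build_role_graph(scenes)
print(ranking)    # ['A', 'B', 'C'] -- 'A' co-occurs with others most often
```
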
  • Patent number: 8249366
    Abstract: Described is a technology by which an image is classified (e.g., grouped and/or labeled) using multi-label, multi-instance learning-based classification according to semantic labels and regions. An image is processed in an integrated framework into multi-label, multi-instance data, including region and image labels. The framework determines local association data based on each region of an image. Other multi-label, multi-instance data is based on relationships between the region labels of the image, relationships between the image labels, and relationships between the region and image labels. These data are combined to classify the image. Training is also described. A sketch of the local (multi-instance) aggregation step follows this entry.
    Type: Grant
    Filed: June 16, 2008
    Date of Patent: August 21, 2012
    Assignee: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Zheng-Jun Zha
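
The sketch below only illustrates the local association part under the common multi-instance assumption that an image carries a label if at least one of its regions does; the region scores are placeholders, and the full framework's label-relationship terms are not modeled here.

```python
import numpy as np

def image_label_scores(region_scores):
    """region_scores: (num_regions, num_labels) array of per-region label
    confidences. Aggregate with max: an image carries a label when at least
    one region (instance) supports it."""
    return np.asarray(region_scores, dtype=float).max(axis=0)

# Two regions, three candidate labels ("sky", "beach", "person").
regions = np.array([[0.9, 0.2, 0.1],
                    [0.3, 0.7, 0.1]])
print(image_label_scores(regions))   # [0.9 0.7 0.1]
```
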
  • Patent number: 8207989
    Abstract: Embodiments that provide multi-video synthesis are disclosed. In accordance with one embodiment, multi-video synthesis includes breaking a main video into a plurality of main frames and breaking a supplementary video into a plurality of supplementary frames. The multi-video synthesis also includes assigning one or more supplementary frames to each of a plurality of states of a Hidden Markov Model (HMM), where each of the plurality of states corresponds to one or more main frames. The multi-video synthesis further includes determining optimal frames in the plurality of main frames for insertion of the plurality of supplementary frames based on the plurality of states and visual properties. The optimal frames include optimal insertion positions. The multi-video synthesis additionally includes inserting the plurality of supplementary frames into the optimal insertion positions to form a synthesized video. A generic Viterbi-style decoding sketch follows this entry.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: June 26, 2012
    Assignee: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Teng Li
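
Picking the best state (here, a supplementary-frame group) for each main frame under an HMM is typically done with Viterbi decoding. The following is a generic Viterbi sketch with toy probabilities; the patent's actual transition and emission costs derived from visual properties are not reproduced.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Standard Viterbi decoding.

    log_start: (S,) log probability of starting in each state
    log_trans: (S, S) log probability of moving from state i to state j
    log_emit:  (T, S) log probability of main frame t under state s
    Returns the most likely state index for each main frame.
    """
    T, S = log_emit.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_start + log_emit[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans     # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + log_emit[t]
    path = np.zeros(T, dtype=int)
    path[-1] = dp[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy example: two states (two supplementary clips), four main frames.
log_start = np.log([0.6, 0.4])
log_trans = np.log([[0.7, 0.3],
                    [0.4, 0.6]])
log_emit = np.log([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.2, 0.8],
                   [0.1, 0.9]])
print(viterbi(log_start, log_trans, log_emit))   # [0 0 1 1]
```
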
  • Publication number: 20120109754
    Abstract: The sponsored multi-media blogging technique is an advertising-driven service on a computing device, such as a mobile phone, that makes the multi-media micro-blog or blog an effective carrier for advertising. The data collected while employing the sponsored multi-media blogging technique is used for user intent mining and for increasing advertisement relevance in mobile advertising projects. The benefits to the technique's users are a natural interface for composing multi-media micro-blogs/blogs and instant experience sharing, while the benefit to advertisers is the promoted brand impression from contextual advertising in rich-media micro-blogs/blogs.
    Type: Application
    Filed: November 3, 2010
    Publication date: May 3, 2012
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Ying-Qing Xu, Shipeng Li
  • Publication number: 20120110432
    Abstract: Techniques for the design and operation of a blogging tool for automated blog creation and automated upload to a server are described herein. A content capturing process may obtain a plurality of images, including still images or video, as well as audio capture of voices and other sound, at the direction of a user operating an image-capture device. One or more of the images may be annotated with metadata or with text, which may be derived from verbal content provided by the user. A template may be selected in either an automated or user-controlled manner. The images and other content may be assembled into the template to form a blog entry. The blog entry may be uploaded to a server or otherwise shared. In one example, the uploading may be in response to a single user command, obtained by operation of a physical user interface or from verbal user input. A sketch of assembling captured content into a templated blog entry follows this entry.
    Type: Application
    Filed: October 29, 2010
    Publication date: May 3, 2012
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li
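
A minimal sketch of the assembly step, assuming captured items and a trivial HTML template; the data classes, template strings, and field names are illustrative assumptions rather than the tool's actual format, and the upload step is omitted.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CapturedItem:
    path: str            # image, video, or audio file captured by the user
    caption: str = ""    # text derived from the user's verbal annotation

@dataclass
class BlogEntry:
    title: str
    items: List[CapturedItem] = field(default_factory=list)

# A deliberately simple template: title line, then one figure per captured item.
SIMPLE_TEMPLATE = "<h1>{title}</h1>\n{rows}"
ROW_TEMPLATE = '<figure><img src="{path}"/><figcaption>{caption}</figcaption></figure>'

def assemble(entry: BlogEntry) -> str:
    """Fill the selected template with the captured content to form the blog body."""
    rows = "\n".join(ROW_TEMPLATE.format(path=i.path, caption=i.caption)
                     for i in entry.items)
    return SIMPLE_TEMPLATE.format(title=entry.title, rows=rows)

entry = BlogEntry("Weekend hike", [CapturedItem("IMG_001.jpg", "Trailhead at dawn"),
                                   CapturedItem("IMG_002.jpg", "Summit view")])
print(assemble(entry))   # HTML body that would then be uploaded on a single command
```
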
  • Publication number: 20120095825
    Abstract: Techniques for image selection and region-of-interest analysis are described herein. A pair of two or more users is configured, and an image is displayed to the pair. The image can be a still image (i.e., a picture) or a moving image (i.e., video). In some instances, a plurality of advertisements is suggested for possible association with the image. Input is received from both users in the pair, indicating a positive or a negative association between each advertisement and the image. When the pair positively rates an advertisement, the advertisement is associated with the image. A plurality of regions of interest within the image may also be suggested. In response, positive or negative input is received from the pair indicating whether each of the plurality of regions of interest is appropriately suggested for placement of an advertisement. A sketch of recording pair-agreement associations follows this entry.
    Type: Application
    Filed: October 18, 2010
    Publication date: April 19, 2012
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li
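
A minimal sketch of the agreement rule: an advertisement is associated with an image only when every member of the pair rates it positively. The vote tuples and helper name are hypothetical.

```python
from collections import defaultdict

def associate_by_pair_agreement(votes):
    """votes: iterable of (pair_id, ad_id, image_id, is_positive) tuples,
    one vote per user in the pair. An ad is associated with an image only
    when all members of the pair rate it positively."""
    tally = defaultdict(list)
    for pair_id, ad_id, image_id, is_positive in votes:
        tally[(pair_id, ad_id, image_id)].append(is_positive)
    associations = set()
    for (pair_id, ad_id, image_id), ratings in tally.items():
        if len(ratings) >= 2 and all(ratings):
            associations.add((ad_id, image_id))
    return associations

votes = [
    ("pair1", "ad42", "img7", True),
    ("pair1", "ad42", "img7", True),    # both positive -> associated
    ("pair1", "ad13", "img7", True),
    ("pair1", "ad13", "img7", False),   # disagreement -> not associated
]
print(associate_by_pair_agreement(votes))   # {('ad42', 'img7')}
```
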
  • Publication number: 20110289015
    Abstract: Users may browse web pages, interact with a plethora of applications, search for new content, and perform a wide variety of other tasks using a mobile device. Unfortunately, useful content may be difficult for a user to locate because of the large amount of content available (e.g., hundreds of thousands of applications within an application store). Accordingly, one or more systems and/or techniques for determining recommendations are disclosed herein. In particular, user input (e.g., text, numbers, etc.) and/or a user profile (e.g., contextual information relating to a user) may be used to determine a user intent. Recommendations may be determined based upon the user intent. For example, a user may input "I am hungry" using a mobile phone having a GPS location of Downtown and a noon timestamp. Using this information, an application allowing the user to make lunch reservations at local restaurants may be provided as a recommendation. A sketch of mapping input plus context to a recommendation follows this entry.
    Type: Application
    Filed: May 21, 2010
    Publication date: November 24, 2011
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Ying-Qing Xu, Xian-Sheng Hua, Shipeng Li
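
A toy sketch of turning free-form input plus context into an intent and a matching recommendation. The keyword table, catalog, and `recommend` helper are invented placeholders standing in for whatever intent-mining model the patent actually describes.

```python
from datetime import datetime

# Hypothetical catalog of applications tagged with the intents they satisfy.
CATALOG = [
    {"name": "TableFinder", "intents": {"food"}},
    {"name": "MovieTimes",  "intents": {"entertainment"}},
]

# Hypothetical mapping from input keywords to intents.
KEYWORD_INTENTS = {"hungry": "food", "eat": "food", "bored": "entertainment"}

def recommend(user_text, gps_location, timestamp):
    """Map free-form user input plus context (location, time of day) to an
    intent, then return catalog entries matching that intent."""
    words = user_text.lower().split()
    intents = {KEYWORD_INTENTS[w] for w in words if w in KEYWORD_INTENTS}
    if not intents and 11 <= timestamp.hour <= 13:
        intents.add("food")                 # fall back on lunchtime context
    # gps_location would normally narrow results to nearby venues; unused here.
    return [app["name"] for app in CATALOG if app["intents"] & intents]

print(recommend("I am hungry", gps_location=(47.61, -122.33),
                timestamp=datetime(2024, 6, 1, 12, 0)))   # ['TableFinder']
```
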
  • Publication number: 20110288929
    Abstract: Techniques for recommending music and advertising to enhance a user's experience while browsing photos are described. In some instances, songs and ads are ranked for relevance to at least one photo from a photo album. The songs, ads, and photo(s) from the album are then mapped to a style and mood ontology to obtain vector-based representations. The vector-based representations can include real-valued terms, each term associated with a human condition defined by the ontology. A re-ranking process generates a relevancy term for each song and each ad indicating relevancy to the photo album. The relevancy terms can be calculated by summing weighted terms from the ranking and the mapping. Recommended music and ads may then be provided to a user as the user browses a series of photos obtained from the album. The ads may be seamlessly embedded into the music in a nonintrusive manner. A sketch of this weighted re-ranking follows this entry.
    Type: Application
    Filed: May 24, 2010
    Publication date: November 24, 2011
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Jinlian Guo, Fei Sheng
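
A minimal sketch of the re-ranking as a weighted sum of an initial ranking score and similarity in an ontology vector space; the mood dimensions, weights, and scores below are made up, and cosine similarity is one plausible choice rather than the patent's stated measure.

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def rerank(candidates, album_vector, w_rank=0.4, w_mood=0.6):
    """candidates: list of (name, initial_rank_score, mood_vector), where the
    mood vector holds real-valued terms over a style/mood ontology.
    Returns candidates re-ranked by a weighted sum of the initial ranking
    score and ontology-space similarity to the photo album."""
    scored = [(name, w_rank * score + w_mood * cosine(vec, album_vector))
              for name, score, vec in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

album = [0.9, 0.1, 0.3]                      # e.g. (calm, energetic, nostalgic)
songs = [("song_a", 0.7, [0.8, 0.2, 0.4]),
         ("song_b", 0.9, [0.1, 0.9, 0.0])]
print(rerank(songs, album))                  # song_a wins on mood agreement
```
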
  • Publication number: 20110267544
    Abstract: Described is perceptually near-lossless video summarization for use in maintaining video summaries, which operates to substantially reconstruct an original video from a compact summary. A video stream is summarized with little information loss by using a relatively very small piece of summary metadata. The summary metadata comprises an image set of synthesized mosaics and representative keyframes, audio data, and metadata about video structure and motion. In one implementation, the metadata is computed and maintained (e.g., as a file) to summarize a relatively large video sequence by segmenting a video shot into subshots and selecting keyframes and mosaics based upon motion data corresponding to those subshots. The motion data is maintained as a semantic description associated with the image set. A sketch of motion-driven keyframe selection follows this entry.
    Type: Application
    Filed: April 28, 2010
    Publication date: November 3, 2011
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Lin-Xie Tang
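
One simple, hedged stand-in for motion-driven selection: within each subshot, take the frame with the least motion as its representative keyframe. The subshot structure and the minimum-motion rule are illustrative assumptions, not the patent's selection criteria.

```python
import numpy as np

def select_keyframes(subshots):
    """subshots: list of dicts with 'frames' (frame ids) and 'motion'
    (per-frame motion magnitudes). Picks, per subshot, the frame with the
    least motion as its representative keyframe."""
    keyframes = []
    for shot in subshots:
        motion = np.asarray(shot["motion"], dtype=float)
        keyframes.append(shot["frames"][int(motion.argmin())])
    return keyframes

subshots = [{"frames": [0, 1, 2], "motion": [0.9, 0.2, 0.5]},
            {"frames": [3, 4],    "motion": [0.1, 0.8]}]
print(select_keyframes(subshots))   # [1, 3]
```
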
  • Publication number: 20110264700
    Abstract: Many internet users consume content through online videos. For example, users may view movies, television shows, music videos, and/or homemade videos. It may be advantageous to provide additional information to users consuming the online videos. Unfortunately, many current techniques may be unable to provide additional information relevant to the online videos from outside sources. Accordingly, one or more systems and/or techniques for determining a set of additional information relevant to an online video are disclosed herein. In particular, visual, textual, audio, and/or other features may be extracted from an online video (e.g., original content of the online video and/or embedded advertisements). Using the extracted features, additional information (e.g., images, advertisements, etc.) may be determined based upon matching the extracted features with content of a database. The additional information may be presented to a user consuming the online video. A sketch of matching extracted features against a database follows this entry.
    Type: Application
    Filed: April 26, 2010
    Publication date: October 27, 2011
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li
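
A toy sketch of the matching step, assuming both the video and the database items are described by feature vectors from the same (visual/textual/audio) extraction pipeline; the database contents, distance choice, and helper name are invented for illustration.

```python
import numpy as np

# Hypothetical database: additional-information items keyed by a feature vector.
DATABASE = {
    "sports-car ad":   np.array([0.9, 0.1, 0.2, 0.7]),
    "travel article":  np.array([0.2, 0.8, 0.1, 0.3]),
    "concert tickets": np.array([0.1, 0.2, 0.9, 0.1]),
}

def match_additional_info(video_features, top_k=2):
    """Return the database items whose feature vectors are nearest
    (Euclidean distance) to the features extracted from the online video."""
    v = np.asarray(video_features, dtype=float)
    dists = {name: float(np.linalg.norm(v - feat)) for name, feat in DATABASE.items()}
    return sorted(dists, key=dists.get)[:top_k]

print(match_additional_info([0.8, 0.2, 0.3, 0.6]))   # nearest items first
```
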
  • Publication number: 20110196859
    Abstract: An initial ranked list of a first plurality of visual documents is obtained from a first source in response to a query, and a second plurality of visual documents relevant to the query is gathered from a plurality of second sources. Visual patterns identified from the second plurality of visual documents are compared with the first plurality of visual documents in order to rerank them. A sketch of blending the initial ranking with pattern similarity follows this entry.
    Type: Application
    Filed: February 5, 2010
    Publication date: August 11, 2011
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Yuan Liu
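
A minimal sketch of one plausible reranking rule: blend each first-source document's initial score with its best visual-pattern match from the second sources. The linear blend, the `alpha` weight, and the random features are assumptions, not the patent's scoring function.

```python
import numpy as np

def rerank_with_patterns(initial_scores, doc_features, pattern_features, alpha=0.5):
    """Blend an initial ranking with visual-pattern agreement.

    initial_scores:   (N,) scores from the first source's ranked list
    doc_features:     (N, D) visual features of the first-source documents
    pattern_features: (M, D) features of visual patterns mined from the
                      second-source documents for the same query
    """
    D = np.asarray(doc_features, dtype=float)
    P = np.asarray(pattern_features, dtype=float)
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12
    P /= np.linalg.norm(P, axis=1, keepdims=True) + 1e-12
    visual_score = (D @ P.T).max(axis=1)           # best match to any pattern
    final = alpha * np.asarray(initial_scores, dtype=float) + (1 - alpha) * visual_score
    return np.argsort(-final)                      # document indices, best first

order = rerank_with_patterns([0.9, 0.5, 0.4],
                             np.random.rand(3, 8),
                             np.random.rand(5, 8))
print(order)
```
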
  • Publication number: 20110075992
    Abstract: Video advertising overlay technique embodiments are presented that generally detect a set of spatio-temporal, nonintrusive positions within a series of consecutive video frames in shots of a digital video and then overlay contextually relevant ads on these positions. In one general embodiment, this is accomplished by decomposing the video into a series of shots and then identifying a video advertisement for each of a selected set of the shots. The identified video advertisement is the one determined to be the most relevant to the content of the shot. An overlay area is also identified in each of the shots, where the selected overlay area is the least intrusive to a viewer of the video among a plurality of prescribed areas. The video advertisements identified for the shots are then scheduled to be overlaid in the identified overlay area of a shot whenever the shot is played. A sketch of scoring candidate overlay areas for intrusiveness follows this entry.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Jinlian Guo
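
A minimal sketch under a simplifying assumption: score a few prescribed corner regions of a shot by average temporal variance, a rough proxy for how visually active (and thus intrusive to cover) they are, and pick the calmest one. The corner set, box size, and variance proxy are illustrative, not the patent's intrusiveness measure.

```python
import numpy as np

def least_intrusive_corner(frames, box=(80, 160)):
    """frames: list of grayscale frames (2-D arrays) from one shot.
    Scores four corner regions of size `box` by mean temporal variance and
    returns the calmest corner as the overlay area."""
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])   # (T, H, W)
    h, w = box
    H, W = stack.shape[1:]
    corners = {"top-left":     (slice(0, h),     slice(0, w)),
               "top-right":    (slice(0, h),     slice(W - w, W)),
               "bottom-left":  (slice(H - h, H), slice(0, w)),
               "bottom-right": (slice(H - h, H), slice(W - w, W))}
    scores = {name: stack[:, ys, xs].var(axis=0).mean()
              for name, (ys, xs) in corners.items()}
    return min(scores, key=scores.get)

frames = [np.random.rand(360, 640) for _ in range(30)]   # stand-in shot frames
print(least_intrusive_corner(frames))
```
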
  • Patent number: 7890512
    Abstract: Images are automatically annotated using semantic distance learning. Training images are manually annotated and partitioned into semantic clusters. Semantic distance functions (SDFs) are learned for the clusters. The SDF for each cluster is used to compute semantic distance scores between a new image and each image in the cluster. The scores for each cluster are used to generate a ranking list which ranks each image in the cluster according to its semantic distance from the new image. An association probability is estimated for each cluster which specifies the probability of the new image being semantically associated with the cluster. Cluster-specific probabilistic annotations for the new image are generated from the manual annotations for the images in each cluster. The association probabilities and cluster-specific probabilistic annotations for all the clusters are used to generate final annotations for the new image. A sketch of the final combination step follows this entry.
    Type: Grant
    Filed: June 11, 2008
    Date of Patent: February 15, 2011
    Assignee: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Yong Wang
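
A minimal sketch of combining per-cluster association probabilities with cluster-specific probabilistic annotations by a probability-weighted sum; the specific combination rule, cluster names, and weights below are illustrative assumptions.

```python
from collections import defaultdict

def final_annotations(cluster_probs, cluster_annotations, top_k=3):
    """cluster_probs: {cluster_id: P(new image is associated with cluster)}
    cluster_annotations: {cluster_id: {tag: probabilistic weight}}
    Final tag score = sum over clusters of association prob * tag weight."""
    scores = defaultdict(float)
    for cid, p in cluster_probs.items():
        for tag, weight in cluster_annotations.get(cid, {}).items():
            scores[tag] += p * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

probs = {"beach": 0.7, "city": 0.3}
annotations = {"beach": {"sand": 0.6, "sea": 0.8},
               "city":  {"building": 0.9, "sea": 0.1}}
print(final_annotations(probs, annotations))   # [('sea', 0.59), ('sand', 0.42), ...]
```
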
  • Publication number: 20100205202
    Abstract: Techniques described herein enable a better understanding of the intent of a user who submits a particular search query. These techniques receive a search request for images associated with a particular query. In response, the techniques determine images that are associated with the query, as well as other keywords that are associated with these images. For each set of images associated with one of these keywords, the techniques cluster the set into multiple groups, rank the images, and determine a representative image for each cluster. Finally, the techniques suggest that the user who submitted the query refine the search by selecting a keyword and a representative image. In this way, the techniques better capture the user's intent by letting the user refine the search with another keyword and with an image on which the user wishes to focus. A sketch of picking a representative image per cluster follows this entry.
    Type: Application
    Filed: February 11, 2009
    Publication date: August 12, 2010
    Applicant: Microsoft Corporation
    Inventors: Linjun Yang, Meng Wang, Zhengjun Zha, Tao Mei, Xian-Sheng Hua
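
A minimal sketch, assuming image feature vectors and using k-means plus closest-to-centroid selection as one reasonable way to choose a representative image per cluster; the features are random placeholders and the clustering choice is not taken from the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

def representatives(image_features, n_clusters=3):
    """image_features: (N, D) features of images retrieved for one candidate
    keyword. Clusters them and returns, per cluster, the index of the image
    closest to the cluster centre (the image shown as that cluster's
    representative)."""
    X = np.asarray(image_features, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        reps.append(int(members[dists.argmin()]))
    return reps

print(representatives(np.random.rand(40, 16)))   # one representative index per cluster
```
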
  • Patent number: 7773813
    Abstract: Systems and methods are described for detecting capture-intention in order to analyze video content. In one implementation, a system decomposes video structure into sub-shots, extracts intention-oriented features from the sub-shots, delineates intention units via the extracted features, and classifies the intention units into intention categories via the extracted features. A video library can be organized via the categorized intention units.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: August 10, 2010
    Assignee: Microsoft Corporation
    Inventors: Xian-Sheng Hua, Shipeng Li, Tao Mei
  • Publication number: 20100153219
    Abstract: Computer program products, devices, and methods for generating in-text embedded advertising are described. Embedded advertising is "hidden" or embedded in a message by matching an advertisement to the message and identifying a place in the message to insert the advertisement. For textual messages, statistical analysis of individual sentences is performed to determine where it would be most natural to insert an advertisement. Statistical rules of grammar derived from a language model may be used to choose a natural and grammatical place in the sentence for inserting the advertisement. Insertion of the advertisement creates a modified sentence that preserves the meaning of the original sentence while including the advertisement as part of the new sentence. A sketch of scoring candidate insertion points follows this entry.
    Type: Application
    Filed: December 12, 2008
    Publication date: June 17, 2010
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Linjun Yang
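
A minimal sketch of choosing an insertion point by trying every token boundary and keeping the one a language-model scorer likes best; the `toy_lm_score` function below is a stand-in, not the statistical grammar model the abstract describes.

```python
def best_insertion_slot(sentence_tokens, ad_phrase_tokens, lm_score):
    """Try inserting the ad phrase at every token boundary in the sentence and
    keep the position where the caller-supplied language-model scorer judges
    the resulting sentence most fluent."""
    best_pos, best_text, best = None, None, float("-inf")
    for pos in range(len(sentence_tokens) + 1):
        candidate = sentence_tokens[:pos] + ad_phrase_tokens + sentence_tokens[pos:]
        score = lm_score(candidate)
        if score > best:
            best, best_pos, best_text = score, pos, candidate
    return best_pos, " ".join(best_text)

# Toy scorer: prefer candidates where the ad directly follows the product mention.
def toy_lm_score(tokens):
    return sum(1.0 for a, b in zip(tokens, tokens[1:]) if (a, b) == ("coffee", "(try"))

sentence = "I had great coffee downtown this morning".split()
ad = "(try BeanHouse)".split()
print(best_insertion_slot(sentence, ad, toy_lm_score))
# -> (4, 'I had great coffee (try BeanHouse) downtown this morning')
```
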
  • Publication number: 20100149419
    Abstract: Embodiments that provide multi-video synthesis are disclosed. In accordance with one embodiment, multi-video synthesis includes breaking a main video into a plurality of main frames and breaking a supplementary video into a plurality of supplementary frames. The multi-video synthesis also includes assigning one or more supplementary frames to each of a plurality of states of a Hidden Markov Model (HMM), where each of the plurality of states corresponds to one or more main frames. The multi-video synthesis further includes determining optimal frames in the plurality of main frames for insertion of the plurality of supplementary frames based on the plurality of states and visual properties. The optimal frames include optimal insertion positions. The multi-video synthesis additionally includes inserting the plurality of supplementary frames into the optimal insertion positions to form a synthesized video.
    Type: Application
    Filed: December 12, 2008
    Publication date: June 17, 2010
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Teng Li
  • Publication number: 20090319883
    Abstract: Described is a technology in which a new video is automatically annotated based on terms mined from the text associated with similar videos. In a search phase, searching by one or more search modalities (e.g., text, concept, and/or video) finds a set of videos that are similar to the new video. Text associated with the new video and with the set of similar videos is obtained, such as by automatic speech recognition that generates transcripts. A mining mechanism combines the associated text of the similar videos with that of the new video to find the terms that annotate the new video. For example, the mining mechanism creates a new term frequency vector by combining term frequency vectors for the set of similar videos with a term frequency vector for the new video, and provides the mined terms by fitting a Zipf curve to the new term frequency vector. A sketch of the term-frequency combination and Zipf fit follows this entry.
    Type: Application
    Filed: June 19, 2008
    Publication date: December 24, 2009
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Wei-Ying Ma, Emily Kay Moxley
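
A minimal sketch of combining term-frequency counts and fitting a Zipf (power-law) curve to the ranked frequencies; the "keep terms whose counts sit above the fitted curve" selection rule is an illustrative choice, not necessarily the patent's criterion.

```python
import numpy as np
from collections import Counter

def mine_annotation_terms(new_video_terms, similar_video_terms):
    """Combine term counts from the new video and its similar videos, fit a
    Zipf curve f(r) = C / r**s to the ranked frequencies (linear fit in
    log-log space), and keep terms whose counts exceed the fitted curve."""
    combined = Counter(new_video_terms)
    for terms in similar_video_terms:
        combined.update(terms)
    ranked = combined.most_common()
    ranks = np.arange(1, len(ranked) + 1, dtype=float)
    freqs = np.array([f for _, f in ranked], dtype=float)
    slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)  # log f = s*log r + c
    fitted = np.exp(intercept) * ranks ** slope
    return [term for (term, f), fit in zip(ranked, fitted) if f > fit]

new = ["beach", "surf", "beach", "sun"]
similar = [["beach", "surf", "wave"], ["beach", "sun", "surf", "surf"]]
print(mine_annotation_terms(new, similar))   # e.g. ['surf', 'sun'] -- counts above the curve
```
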
  • Publication number: 20090313294
    Abstract: Images are automatically annotated using semantic distance learning. Training images are manually annotated and partitioned into semantic clusters. Semantic distance functions (SDFs) are learned for the clusters. The SDF for each cluster is used to compute semantic distance scores between a new image and each image in the cluster. The scores for each cluster are used to generate a ranking list which ranks each image in the cluster according to its semantic distance from the new image. An association probability is estimated for each cluster which specifies the probability of the new image being semantically associated with the cluster. Cluster-specific probabilistic annotations for the new image are generated from the manual annotations for the images in each cluster. The association probabilities and cluster-specific probabilistic annotations for all the clusters are used to generate final annotations for the new image.
    Type: Application
    Filed: June 11, 2008
    Publication date: December 17, 2009
    Applicant: Microsoft Corporation
    Inventors: Tao Mei, Xian-Sheng Hua, Shipeng Li, Yong Wang