Patents by Inventor Benyu Zhang

Benyu Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7533094
    Abstract: A method and system for determining similarity between items is provided. To calculate similarity scores for pairs of items, the similarity system initializes a similarity score for each pair of objects and each pair of features. The similarity system then iteratively calculates the similarity scores for each pair of objects based on the similarity scores of the pairs of features calculated during a previous iteration and calculates the similarity scores for each pair of features based on the similarity scores of the pairs of objects calculated during a previous iteration. The similarity system implements an algorithm that is based on a recursive definition of the similarities between objects and between features. The similarity system continues the iterations of recalculating the similarity scores until the similarity scores converge on a solution.
    Type: Grant
    Filed: November 23, 2004
    Date of Patent: May 12, 2009
    Assignee: Microsoft Corporation
    Inventors: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen, Ning Liu, Jun Yan
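
The alternating recalculation described in the abstract for patent 7533094 can be illustrated with a minimal sketch. The abstract does not give the update formula, so this sketch assumes a SimRank-style rule on a bipartite object/feature relation: the similarity of two objects is the decayed average similarity of their features, and vice versa. The function name `mutual_similarity` and the parameters `decay`, `tol`, and `max_iter` are hypothetical and are not taken from the patent.

```python
from itertools import product


def mutual_similarity(has_feature, decay=0.8, tol=1e-6, max_iter=100):
    """Alternately recompute object and feature similarities until they settle.

    has_feature[i][a] is True when object i has feature a.
    The averaging rule and the decay factor are assumptions of this sketch.
    """
    n_obj, n_feat = len(has_feature), len(has_feature[0])
    feats_of = [[a for a in range(n_feat) if has_feature[i][a]] for i in range(n_obj)]
    objs_of = [[i for i in range(n_obj) if has_feature[i][a]] for a in range(n_feat)]

    def identity(n):
        # Initial scores: every item is fully similar to itself and to nothing else.
        return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

    def update(n, neighbours, other):
        # Score of (r, c) = decayed mean of the other side's previous scores
        # over all pairs of neighbours of r and c.
        new = identity(n)
        for r, c in product(range(n), repeat=2):
            if r != c and neighbours[r] and neighbours[c]:
                total = sum(other[x][y] for x in neighbours[r] for y in neighbours[c])
                new[r][c] = decay * total / (len(neighbours[r]) * len(neighbours[c]))
        return new

    def max_change(new, old):
        return max(abs(a - b) for ra, rb in zip(new, old) for a, b in zip(ra, rb))

    obj_sim, feat_sim = identity(n_obj), identity(n_feat)
    for _ in range(max_iter):
        # Both updates read only the scores from the previous iteration.
        new_obj = update(n_obj, feats_of, feat_sim)
        new_feat = update(n_feat, objs_of, obj_sim)
        change = max(max_change(new_obj, obj_sim), max_change(new_feat, feat_sim))
        obj_sim, feat_sim = new_obj, new_feat
        if change < tol:  # scores have converged
            break
    return obj_sim, feat_sim
```

Calling `mutual_similarity([[True, True, False], [True, True, False], [False, False, True]])` returns two matrices in which objects 0 and 1, which share both of their features, score well above zero, while object 2, which shares none, keeps a score of zero against the others.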
  • Publication number: 20090119284
    Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
    Type: Application
    Filed: June 24, 2008
    Publication date: May 7, 2009
    Applicant: Microsoft Corporation
    Inventors: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
  • Patent number: 7529719
    Abstract: Computer-readable media having computer-executable instructions and apparatuses categorize a document or a corpus of documents. A Tensor Space Model (TSM), which models the text by a higher-order tensor, represents a document or a corpus of documents. Supported by techniques of multilinear algebra, TSM provides a framework for analyzing the multifactor structures. TSM is further supported by operations and presented tools, such as the High-Order Singular Value Decomposition (HOSVD) for a reduction of the dimensions of the higher-order tensor. The dimensionally reduced tensor is compared with tensors that represent possible categories. Consequently, a category is selected for the document or corpus of documents. Experimental results on the 20 Newsgroups dataset suggest that TSM is advantageous over a Vector Space Model (VSM) for text classification.
    Type: Grant
    Filed: March 17, 2006
    Date of Patent: May 5, 2009
    Assignee: Microsoft Corporation
    Inventors: Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Hua-Jun Zeng, Jian Wang
  • Patent number: 7529735
    Abstract: A method and system for identifying information about people is provided. The information system identifies groups of people that have relationships based on their relationships to documents or more generally to objects. The information system initially is provided with an indication of which people have which relationships to which documents. The information system then identifies clusters of people based on having a relationship to the same objects. The information system may also identify clusters of related objects associated with a cluster of people. When a user wants to identify information about a person, the user can provide the name of that person to the information system. The information system then can retrieve and display the names of the other people who are in the same cluster as the person.
    Type: Grant
    Filed: February 11, 2005
    Date of Patent: May 5, 2009
    Assignee: Microsoft Corporation
    Inventors: Benyu Zhang, Wei-Ying Ma, Gu Xu, Hongbin Gao, Zheng Chen, Randy Hinrichs, Hua-Jun Zeng
  • Publication number: 20090106019
    Abstract: A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier.
    Type: Application
    Filed: October 20, 2008
    Publication date: April 23, 2009
    Applicant: Microsoft Corporation
    Inventors: Zheng Chen, Wei-Ying Ma, Hua-Jun Zeng, Benyu Zhang
  • Patent number: 7506274
    Abstract: Methods and systems for displaying data retrieved from a multi-dimensional data source via an interactive data diagram. A graphical user interface is responsive to input from a user to retrieve multi-dimensional data for display via an interactive data diagram. The interactive data diagram displays multi-dimensional data in a hierarchical structure that includes a plurality of dimension levels and one or more member levels within each dimension level. A user specifies a change to the display structure by selecting a displayed member level in the hierarchical structure. The interactive data diagram is responsive to the user specified change to generate a drilled down data diagram displaying detailed dimension and member levels related to the selected member level.
    Type: Grant
    Filed: May 18, 2005
    Date of Patent: March 17, 2009
    Assignee: Microsoft Corporation
    Inventors: Benyu Zhang, Teresa B. Mah, Lee Wang, Julie L. Hesseltine Richardson
  • Patent number: 7502495
    Abstract: A method and system for generating a projection matrix for projecting data from a high dimensional space to a low dimensional space. The system establishes an objective function based on a maximum margin criterion matrix. The system then provides data samples that are in the high dimensional space and have a class. For each data sample, the system incrementally derives leading eigenvectors of the maximum margin criterion matrix based on the derivation of the leading eigenvectors of the last data sample. The derived eigenvectors compose the projection matrix, which can be used to project data samples in a high dimensional space into a low dimensional space.
    Type: Grant
    Filed: March 1, 2005
    Date of Patent: March 10, 2009
    Assignee: Microsoft Corporation
    Inventors: Benyu Zhang, Hua-Jun Zeng, Jun Yan, Wei-Ying Ma, Zheng Chen
  • Patent number: 7502785
    Abstract: Extraction of semantic information and the generation of semantic attributes allows for improved organization and management of data. Semantic attributes are automatically generated and eliminate the need for manual entry of attribute information. A semantic file network may further be constructed based on similarities between files that are based on the semantic attribute information. Semantic links representing a semantic relationship may be built between similar or relevant files. In addition, user operations and user operation patterns may also be considered in building the file network. Semantic attributes and information may further facilitate browsing the file systems as well as improve the accuracy and speed of queries.
    Type: Grant
    Filed: March 30, 2006
    Date of Patent: March 10, 2009
    Assignee: Microsoft Corporation
    Inventors: Zheng Chen, Lei Li, Chenxi Lin, Qiaoling Liu, Jian Wang, Benyu Zhang
  • Publication number: 20090006313
    Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
  • Publication number: 20090006365
    Abstract: Techniques for identifying similar queries based on their overall similarity and partial similarity of time series of frequencies of the queries are provided. To identify queries that are similar to a target query, the query analysis system generates, for each query, an overall similarity score for that query and the target query based on the time series of the query and the target query. The query analysis system also generates, for each query, partial similarity scores for the query and the target query based on various time sub-series of the overall time series of the queries. The query analysis system then identifies queries as being similar to the target query based on the overall similarity scores and the partial similarity scores of the queries.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
  • Publication number: 20090006312
    Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
  • Publication number: 20090006284
    Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
  • Publication number: 20090006045
    Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
  • Publication number: 20090006294
    Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
  • Publication number: 20090006326
    Abstract: Representing queries and determining similarity of queries based on an autoregressive integrated moving average (“ARIMA”) model is provided. A query analysis system represents each query by its ARIMA coefficients. The query analysis system may estimate the frequency information for a desired past or future interval based on frequency information for some initial intervals. The query analysis system may also determine the similarity of a pair of queries based on the similarity of their ARIMA coefficients. The query analysis system may use various metrics, such as a correlation metric, to determine the similarity of the ARIMA coefficients.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
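
The correlation metric mentioned in the abstract for publication 20090006326 can be illustrated with a minimal sketch. It assumes the ARIMA coefficients of each query have already been fitted elsewhere and are available as equal-length lists of numbers, and it uses the Pearson correlation as one possible correlation metric. The function name `coefficient_similarity` is hypothetical.

```python
from math import sqrt


def coefficient_similarity(coeffs_a, coeffs_b):
    """Pearson correlation between two equal-length coefficient vectors.

    The vectors stand in for fitted ARIMA coefficients; fitting the model
    is outside the scope of this sketch.
    """
    if len(coeffs_a) != len(coeffs_b) or len(coeffs_a) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_a = sum(coeffs_a) / len(coeffs_a)
    mean_b = sum(coeffs_b) / len(coeffs_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(coeffs_a, coeffs_b))
    var_a = sum((a - mean_a) ** 2 for a in coeffs_a)
    var_b = sum((b - mean_b) ** 2 for b in coeffs_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector has no defined correlation; treat as unrelated
    return cov / sqrt(var_a * var_b)
```

The result lies between -1 and 1. Under the assumption above, two queries whose coefficient vectors score close to 1 would be treated as similar.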
  • Publication number: 20080288348
    Abstract: A method for ranking online advertisements using retailer reputation and product reputation. In one implementation, a query may be received. Advertisements may be selected by determining a level of relevance between the query and each advertisement and selecting the advertisements with a level of relevance above a pre-determined level of relevance. A predicted reputation for a retailer and a predicted reputation for a product may be retrieved for each of the selected advertisements. The selected advertisements may then be ranked based on the predicted reputation for the retailer and the predicted reputation of the product. The ranking of the selected advertisements may be accomplished by calculating a ranking score for each selected advertisement based on the retailer predicted reputation and the product predicted reputation. The selected advertisements may then be displayed according to the ranking.
    Type: Application
    Filed: May 15, 2007
    Publication date: November 20, 2008
    Applicant: Microsoft Corporation
    Inventors: Huajun Zeng, Chenxi Lin, Dingyi Han, Benyu Zhang, Zheng Chen, Jian Wang
  • Publication number: 20080288483
    Abstract: Described is an efficient retrieval mechanism that quickly locates documents (e.g., corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of search terms is selected according to their ranked importance, e.g., as ranked by inverse document frequency. The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and associated scores, such as computed offline by a BM25 algorithm. For example, for each document identifier of each important term, a fast search within each of the narrowed subset of rows (that also contain that document identifier) may be performed by comparing document identifiers to jump a pointer within each other row, followed by a binary search to locate a particular document. The scores of the set of particular documents may then be used to rank their relative importance for returning as results.
    Type: Application
    Filed: May 18, 2007
    Publication date: November 20, 2008
    Applicant: Microsoft Corporation
    Inventors: Chenxi Lin, Lei Ji, Huajun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
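
The term-selection step in the abstract for publication 20080288483 can be illustrated with a minimal sketch. It covers only the ranking of query terms by inverse document frequency and the choice of the topmost terms. The pointer-jumping and binary-search lookup are not shown. The index layout (a dictionary mapping each term to a list of document identifiers), the formula log(N / df), and the function name `top_terms_by_idf` are assumptions of this sketch.

```python
from math import log


def top_terms_by_idf(query_terms, postings, n_docs, k=2):
    """Return the k query terms with the highest inverse document frequency.

    postings maps a term to the list of document ids that contain it.
    Terms that are missing from the index are ignored.
    """
    def idf(term):
        # Rare terms (small document frequency) get a high score.
        return log(n_docs / len(postings[term]))

    indexed = [t for t in query_terms if postings.get(t)]
    # sorted() is stable, so ties keep their original query order.
    return sorted(indexed, key=idf, reverse=True)[:k]


# Hypothetical index: "kayak" is the rarest term, "cheap" the most common.
example_postings = {"cheap": [1, 2, 3, 4, 5, 6], "red": [2, 3, 5], "kayak": [3]}
selected = top_terms_by_idf(["cheap", "red", "kayak"], example_postings, n_docs=8)
```

Here `selected` holds the two most discriminating terms, whose rows would then be the ones used to narrow the search.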
  • Publication number: 20080288491
    Abstract: Described is a behavioral targeting technology for online advertising, by which an original attribute is uniformly expanded. Users that meet an original attribute are aggregated into a mid-result used to determine similarity relative to candidate attribute types. The most similar candidate attributes are selected for the expanded attribute. A URL/URL pattern suggestion technology is provided, with similarity computed from users/URLs visited by the users. URLs are separated into URL tree nodes, for calculating the number of users who have visited each URL and the number of users who have visited the URL on a sub-tree whose root is the node. URL/URL patterns are output based on similarity. Domains are also suggested based on user-visits. Similarities between pairs of domains may be computed (e.g., offline), with an output for a given domain provided based on its similarity with each other domain.
    Type: Application
    Filed: May 15, 2007
    Publication date: November 20, 2008
    Applicant: Microsoft Corporation
    Inventors: Min Wu, Chenxi Lin, Benyu Zhang, Zheng Chen, Jian Wang
  • Publication number: 20080288481
    Abstract: Described is a technology by which online advertisements for returning with a query response are ranked according to reputation. The reputation may correspond to a product or service and/or seller reputation. In one example, a set of relevant advertisement items are located and ranked using reputation data as a factor. For example, for each item, a ranking value is based on a mathematical combination of a product reputation score, a seller reputation score and a relevance score, with the items ranked by their computed values. The scores may be weighted differently. The reputation data may be mined from a review source, such as customer reviews available on the web. In one example implementation, a 3-gram model that considers terms in the review along with the two terms preceding each term is used to analyze the reviews to determine whether each review is positive or negative with respect to the reputation.
    Type: Application
    Filed: May 15, 2007
    Publication date: November 20, 2008
    Applicant: Microsoft Corporation
    Inventors: Huajun Zeng, Chenxi Lin, Dingyi Han, Benyu Zhang, Zheng Chen, Jian Wang
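
The weighted combination of scores in the abstract for publication 20080288481 can be illustrated with a minimal sketch. The abstract only says that the scores are combined mathematically and may be weighted differently, so the weighted sum, the default weights, and the names `Ad` and `rank_ads` are assumptions of this sketch. All three scores are assumed to be precomputed and scaled to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    relevance: float           # how well the ad matches the query
    product_reputation: float  # e.g. mined from product reviews
    seller_reputation: float   # e.g. mined from seller reviews


def rank_ads(ads, w_relevance=0.5, w_product=0.25, w_seller=0.25):
    """Order ads by a weighted sum of relevance and the two reputation scores."""
    def score(ad):
        return (w_relevance * ad.relevance
                + w_product * ad.product_reputation
                + w_seller * ad.seller_reputation)

    # Highest combined score first.
    return sorted(ads, key=score, reverse=True)


# Hypothetical data: "a" matches the query better, "b" has the better reputation.
example_ads = [Ad("a", 0.9, 0.2, 0.2), Ad("b", 0.7, 0.9, 0.9)]
ranked = rank_ads(example_ads)
```

With the default weights, the better-reputed ad "b" outranks the more relevant ad "a". Setting both reputation weights to zero reduces the ordering to relevance alone.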
  • Publication number: 20080281834
    Abstract: Described is a technology by which blocks of web pages may be selected, such as for building a user-personalized web page containing selected blocks. A selection mechanism, such as a browser toolbar add-on, provides a user interface for selecting blocks, and records information about selected blocks. A block tracking mechanism (e.g., a daemon program) uses the information to locate selected blocks of the web pages, including when the web page containing the block is updated with respect to content and/or layout. The block tracking mechanism may update a local gadget that, when invoked, such as by browsing to a particular web page, shows updated versions of the block on a personalized web page. Blocks may be efficiently located by processing trees representing web pages into reduced trees, and then by performing a minimum distance mapping algorithm on the reduced trees.
    Type: Application
    Filed: May 9, 2007
    Publication date: November 13, 2008
    Applicant: Microsoft Corporation
    Inventors: Min Wu, Chenxi Lin, Benyu Zhang, Huajun Zeng, Zheng Chen, Jian Wang