Patents by Inventor Benyu Zhang
Benyu Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7590603
Abstract: A method and system for classifying messages of a discussion thread as questions is provided. A classification system generates a classifier to classify messages of discussion threads as question messages or non-question messages. The system trains the classifier using the feature vectors and input classifications derived from a training set of discussion threads. After the classifier is trained, the classification system uses the classifier to classify messages within a corpus of discussion threads as question or non-question messages. To classify a message, the classification system generates a feature vector for the message and submits that feature vector to the classifier. The classifier generates a score for the message indicating a likelihood that the message is a question message.
Type: Grant
Filed: October 1, 2004
Date of Patent: September 15, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Zheng Chen, Hua-Jun Zeng, Wei-Ying Ma
-
Publication number: 20090228452
Abstract: A method and system for identifying information about people is provided. The information system identifies groups of people that have relationships based on their relationships to documents or more generally to objects. The information system initially is provided with an indication of which people have which relationships to which documents. The information system then identifies clusters of people based on having a relationship to the same objects. The information system may also identify clusters of related objects associated with a cluster of people. When a user wants to identify information about a person, the user can provide the name of that person to the information system. The information system then can retrieve and display the names of the other people who are in the same cluster as the person.
Type: Application
Filed: March 17, 2009
Publication date: September 10, 2009
Applicant: Microsoft Corporation
Inventors: Benyu Zhang, Wei-Ying Ma, Gu Xu, Hongbin Gao, Zheng Chen, Randy Hinrichs, Hua-Jun Zeng
-
Patent number: 7584100
Abstract: A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
Type: Grant
Filed: June 30, 2004
Date of Patent: September 1, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
-
Publication number: 20090193047
Abstract: The claimed subject matter is directed to constructing query hierarchies in response to a query request. To construct a query hierarchy, a list of related candidate queries is generated in response to the received query request. The list of related candidate queries is generated by determining the relative coverage of information shared by the candidate queries and the query request. Relationships between the submitted query request and the candidate queries in the list are determined based upon the extent of relative coverage of information shared by the candidate queries and the query request. A query hierarchy is then constructed to reflect the determined relationships between the query request and the candidate queries.
Type: Application
Filed: January 28, 2008
Publication date: July 30, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Weizhu Chen, Benyu Zhang, Zheng Chen, Jian Wang, Dou Shen
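As a rough illustration of the "relative coverage" idea in the abstract for publication 20090193047, the sketch below assumes each query is represented by the set of document IDs it retrieves. That representation, the function names, and the thresholds `high` and `low` are all assumptions made for illustration; the abstract does not say how coverage is measured.

```python
def coverage(covered, covering):
    """Fraction of `covered`'s results that also appear in `covering`'s results."""
    if not covered:
        return 0.0
    return len(covered & covering) / len(covered)


def classify_relation(request_docs, candidate_docs, high=0.8, low=0.5):
    """Label a candidate query relative to the request (thresholds are assumed)."""
    req_in_cand = coverage(request_docs, candidate_docs)
    cand_in_req = coverage(candidate_docs, request_docs)
    if req_in_cand >= high and cand_in_req >= high:
        return "equivalent"   # each query covers most of the other
    if req_in_cand >= high and cand_in_req < low:
        return "broader"      # candidate covers the request, not vice versa
    if cand_in_req >= high and req_in_cand < low:
        return "narrower"     # request covers the candidate, not vice versa
    if req_in_cand > 0:
        return "related"      # some overlap, no clear direction
    return "unrelated"


def build_hierarchy(request, results_by_query):
    """Group candidate queries by their relation to the request query."""
    request_docs = results_by_query[request]
    hierarchy = {"broader": [], "narrower": [], "equivalent": [], "related": []}
    for candidate, docs in results_by_query.items():
        if candidate == request:
            continue
        relation = classify_relation(request_docs, docs)
        if relation in hierarchy:  # unrelated candidates are dropped
            hierarchy[relation].append(candidate)
    return hierarchy
```

Using asymmetric coverage in both directions is one simple way to obtain a direction (broader versus narrower) rather than a single symmetric overlap score.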
-
Patent number: 7567895
Abstract: A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier.
Type: Grant
Filed: August 31, 2004
Date of Patent: July 28, 2009
Assignee: Microsoft Corporation
Inventors: Zheng Chen, Wei-Ying Ma, Hua-Jun Zeng, Benyu Zhang
-
Patent number: 7565372
Abstract: A summary system for evaluating summaries of documents and for generating summaries of documents based on normalized probabilities of portions of the document. A summarization system generates a summary by selecting sentences for the summary based on their normalized probabilities as derived from a document model. An evaluation system evaluates the effectiveness of a summary based on a normalized probability for the summary that is derived from a document model.
Type: Grant
Filed: September 13, 2005
Date of Patent: July 21, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen, Hua Li
-
Patent number: 7555480
Abstract: The invention provides a method of interactively crawling data records on a web page. Users may select various data records of interest on a web page to generate templates to search for similar data items on the same web page or on different web pages. A tree matching algorithm may be used to compare and extract data matching the generated template.
Type: Grant
Filed: July 11, 2006
Date of Patent: June 30, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Chenxi Lin, Hua-Jun Zeng, Jian Wang, Ke Tang, Zheng Chen
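The abstract for patent 7555480 mentions only "a tree matching algorithm" without naming one. As an illustration, the sketch below uses the well-known Simple Tree Matching dynamic program, which counts the largest number of node pairs that can be matched between two ordered trees while preserving parent-child and sibling order. The `(label, children)` tuple representation and the function name are assumptions; this is not necessarily the algorithm used in the patent.

```python
def simple_tree_matching(a, b):
    """Return the size of the maximum order-preserving matching of trees a and b.

    Each tree is a tuple (label, [child_trees]). Roots with different labels
    contribute no match at all.
    """
    label_a, children_a = a
    label_b, children_b = b
    if label_a != label_b:
        return 0
    m, n = len(children_a), len(children_b)
    # table[i][j]: best matching between the first i children of a
    # and the first j children of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i][j - 1],
                table[i - 1][j],
                table[i - 1][j - 1]
                + simple_tree_matching(children_a[i - 1], children_b[j - 1]),
            )
    return table[m][n] + 1  # +1 for the matched roots
```

A template built from a user-selected record could be compared against each candidate subtree of a page, keeping candidates whose match count is close to the template's node count.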
-
Publication number: 20090132530
Abstract: Described herein is technology for, among other things, mining pair-based data on the web. The technology involves an online pair-based data mining system as well as an offline SVM training system. By subjecting pair-based input data to the systems, one may grow a pool of pair-based data which share characteristics of the pair-based input data in a more efficient manner.
Type: Application
Filed: November 19, 2007
Publication date: May 21, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Weizhu Chen, Long Jiang, Ming Zhou, Benyu Zhang, Zheng Chen, Jian Wang
-
Method and system for determining similarity of items based on similarity objects and their features
Patent number: 7533094
Abstract: A method and system for determining similarity between items is provided. To calculate similarity scores for pairs of items, the similarity system initializes a similarity score for each pair of objects and each pair of features. The similarity system then iteratively calculates the similarity scores for each pair of objects based on the similarity scores of the pairs of features calculated during a previous iteration and calculates the similarity scores for each pair of features based on the similarity scores of the pairs of objects calculated during a previous iteration. The similarity system implements an algorithm that is based on a recursive definition of the similarities between objects and between features. The similarity system continues the iterations of recalculating the similarity scores until the similarity scores converge on a solution.
Type: Grant
Filed: November 23, 2004
Date of Patent: May 12, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen, Ning Liu, Jun Yan
-
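The mutually recursive iteration described in the abstract for patent 7533094 can be sketched as follows. The abstract does not give the update formula, so this sketch assumes a SimRank-style rule: the similarity of two objects is a damped average of the similarities of their features from the previous iteration, and likewise for features. The `decay` constant, the averaging, and all names are assumptions made for illustration.

```python
from itertools import product


def iterate_similarity(object_features, decay=0.8, tol=1e-6, max_iter=100):
    """object_features maps each object to the set of features it has."""
    objects = sorted(object_features)
    features = sorted({f for fs in object_features.values() for f in fs})
    feature_objects = {
        f: {o for o in objects if f in object_features[o]} for f in features
    }

    # Initialise: every item is fully similar to itself and to nothing else.
    obj_sim = {(a, b): float(a == b) for a, b in product(objects, repeat=2)}
    feat_sim = {(f, g): float(f == g) for f, g in product(features, repeat=2)}

    def update(items, neighbours, other_sim):
        """Recompute one side's scores from the other side's previous scores."""
        new = {}
        for a, b in product(items, repeat=2):
            na, nb = neighbours[a], neighbours[b]
            if a == b:
                new[(a, b)] = 1.0
            elif not na or not nb:
                new[(a, b)] = 0.0
            else:
                total = sum(other_sim[(x, y)] for x in na for y in nb)
                new[(a, b)] = decay * total / (len(na) * len(nb))
        return new

    for _ in range(max_iter):
        new_obj = update(objects, object_features, feat_sim)
        new_feat = update(features, feature_objects, obj_sim)
        change = max(
            max((abs(new_obj[k] - obj_sim[k]) for k in obj_sim), default=0.0),
            max((abs(new_feat[k] - feat_sim[k]) for k in feat_sim), default=0.0),
        )
        obj_sim, feat_sim = new_obj, new_feat
        if change < tol:  # scores have converged
            break
    return obj_sim, feat_sim
```

Both updates read only the previous iteration's scores, which mirrors the abstract's wording; the loop stops once no score changes by more than `tol`.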
Publication number: 20090119284
Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
Type: Application
Filed: June 24, 2008
Publication date: May 7, 2009
Applicant: Microsoft Corporation
Inventors: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
-
Patent number: 7529735
Abstract: A method and system for identifying information about people is provided. The information system identifies groups of people that have relationships based on their relationships to documents or more generally to objects. The information system initially is provided with an indication of which people have which relationships to which documents. The information system then identifies clusters of people based on having a relationship to the same objects. The information system may also identify clusters of related objects associated with a cluster of people. When a user wants to identify information about a person, the user can provide the name of that person to the information system. The information system then can retrieve and display the names of the other people who are in the same cluster as the person.
Type: Grant
Filed: February 11, 2005
Date of Patent: May 5, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Wei-Ying Ma, Gu Xu, Hongbin Gao, Zheng Chen, Randy Hinrichs, Hua-Jun Zeng
-
Patent number: 7529719
Abstract: Computer-readable media having computer-executable instructions and apparatuses categorize documents or corpus of documents. A Tensor Space Model (TSM), which models the text by a higher-order tensor, represents a document or a corpus of documents. Supported by techniques of multilinear algebra, TSM provides a framework for analyzing the multifactor structures. TSM is further supported by operations and presented tools, such as the High-Order Singular Value Decomposition (HOSVD) for a reduction of the dimensions of the higher-order tensor. The dimensionally reduced tensor is compared with tensors that represent possible categories. Consequently, a category is selected for the document or corpus of documents. Experimental results on the dataset for 20 Newsgroups suggest that TSM is advantageous to a Vector Space Model (VSM) for text classification.
Type: Grant
Filed: March 17, 2006
Date of Patent: May 5, 2009
Assignee: Microsoft Corporation
Inventors: Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Hua-Jun Zeng, Jian Wang
-
Publication number: 20090106019
Abstract: A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier.
Type: Application
Filed: October 20, 2008
Publication date: April 23, 2009
Applicant: Microsoft Corporation
Inventors: Zheng Chen, Wei-Ying Ma, Hua-Jun Zeng, Benyu Zhang
-
Patent number: 7506274
Abstract: Methods and systems for displaying data retrieved from a multi-dimensional data source via an interactive data diagram. A graphical user interface is responsive to input from a user to retrieve multi-dimensional data for display via an interactive data diagram. The interactive data diagram displays multi-dimensional data in a hierarchical structure that includes a plurality of dimension levels and one or more member levels within each dimension level. A user specifies a change to the display structure by selecting a displayed member level in the hierarchical structure. The interactive data diagram is responsive to the user specified change to generate a drilled down data diagram displaying detailed dimension and member levels related to the selected member level.
Type: Grant
Filed: May 18, 2005
Date of Patent: March 17, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Teresa B. Mah, Lee Wang, Julie L. Hesseltine Richardson
-
Patent number: 7502785
Abstract: Extraction of semantic information and the generation of semantic attributes allows for improved organization and management of data. Semantic attributes are automatically generated and eliminate the need for manual entry of attribute information. A semantic file network may further be constructed based on similarities between files that are based on the semantic attribute information. Semantic links representing a semantic relationship may be built between similar or relevant files. In addition, user operations and user operation patterns may also be considered in building the file network. Semantic attributes and information may further facilitate browsing the file systems as well as improve the accuracy and speed of queries.
Type: Grant
Filed: March 30, 2006
Date of Patent: March 10, 2009
Assignee: Microsoft Corporation
Inventors: Zheng Chen, Lei Li, Chenxi Lin, Qiaoling Liu, Jian Wang, Benyu Zhang
-
Patent number: 7502495
Abstract: A method and system for generating a projection matrix for projecting data from a high dimensional space to a low dimensional space. The system establishes an objective function based on a maximum margin criterion matrix. The system then provides data samples that are in the high dimensional space and have a class. For each data sample, the system incrementally derives leading eigenvectors of the maximum margin criterion matrix based on the derivation of the leading eigenvectors of the last data sample. The derived eigenvectors compose the projection matrix, which can be used to project data samples in a high dimensional space into a low dimensional space.
Type: Grant
Filed: March 1, 2005
Date of Patent: March 10, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Hua-Jun Zeng, Jun Yan, Wei-Ying Ma, Zheng Chen
-
Publication number: 20090006326
Abstract: Representing queries and determining similarity of queries based on an autoregressive integrated moving average (“ARIMA”) model is provided. A query analysis system represents each query by its ARIMA coefficients. The query analysis system may estimate the frequency information for a desired past or future interval based on frequency information for some initial intervals. The query analysis system may also determine the similarity of a pair of queries based on the similarity of their ARIMA coefficients. The query analysis system may use various metrics, such as a correlation metric, to determine the similarity of the ARIMA coefficients.
Type: Application
Filed: June 28, 2007
Publication date: January 1, 2009
Applicant: Microsoft Corporation
Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
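To make the "correlation metric" in the abstract for publication 20090006326 concrete, the sketch below compares two queries by the Pearson correlation of their coefficient vectors. It assumes the ARIMA coefficients were already fitted elsewhere with the same model order for every query; the fitting step, the function names, and the choice of Pearson correlation specifically are assumptions, since the abstract names correlation only as one possible metric.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length coefficient vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (norm_u * norm_v)


def most_similar(query, coefficients):
    """Return the other query whose coefficients correlate best with `query`.

    `coefficients` maps a query string to its (hypothetical, pre-fitted)
    list of ARIMA coefficients.
    """
    target = coefficients[query]
    others = [q for q in coefficients if q != query]
    return max(others, key=lambda q: pearson(target, coefficients[q]))
```

Because each query is reduced to a short coefficient vector, comparing two queries costs a handful of arithmetic operations regardless of how long their frequency histories are.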
-
Publication number: 20090006045
Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
Type: Application
Filed: June 28, 2007
Publication date: January 1, 2009
Applicant: Microsoft Corporation
Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
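The "query events" in this abstract can be illustrated with a simple spike detector. The abstract does not define what counts as a significant increase, so the sketch below assumes one: an interval is an event when its frequency exceeds the mean of the preceding `window` intervals by more than `k` standard deviations. The second function estimates how often events of one query are preceded by events of another within `max_lag` intervals, a crude stand-in for "tend to causally precede". All names, parameters, and thresholds are hypothetical.

```python
from statistics import mean, pstdev


def detect_events(freqs, window=4, k=3.0, min_std=1.0):
    """Return indices of intervals whose frequency spikes above recent history."""
    events = []
    for i in range(window, len(freqs)):
        history = freqs[i - window:i]
        baseline = mean(history)
        # Floor the spread so a perfectly flat history does not flag noise.
        spread = max(pstdev(history), min_std)
        if freqs[i] > baseline + k * spread:
            events.append(i)
    return events


def precedence_rate(events_a, events_b, max_lag=3):
    """Fraction of b's events that follow an event of a by 1..max_lag intervals."""
    if not events_b:
        return 0.0
    preceded = sum(
        1 for b in events_b if any(0 < b - a <= max_lag for a in events_a)
    )
    return preceded / len(events_b)
```

A query whose events have a high precedence rate with respect to another query's events would be a candidate predictor when forecasting that other query's frequency.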
-
Publication number: 20090006312
Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
Type: Application
Filed: June 28, 2007
Publication date: January 1, 2009
Applicant: Microsoft Corporation
Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang
-
Publication number: 20090006284
Abstract: Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
Type: Application
Filed: June 28, 2007
Publication date: January 1, 2009
Applicant: Microsoft Corporation
Inventors: Ning Liu, Jun Yan, Benyu Zhang, Zheng Chen, Jian Wang