Patents by Inventor Benyu Zhang

Benyu Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20060036596
    Abstract: A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the “important” words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
    Type: Application
    Filed: August 13, 2004
    Publication date: February 16, 2006
    Applicant: Microsoft Corporation
    Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng, Dou Shen
  • Publication number: 20060026152
    Abstract: A clustering architecture that dynamically groups the search result documents into clusters labeled by phrases extracted from the search result snippets. Documents related to the same topic usually share a common vocabulary. The words are first clustered based on their co-occurrences and each cluster forms a potentially interesting topic. Keywords are chosen and then clustered by counting co-occurrences of pairs of keywords. Documents are assigned to relevant topics based on the feature vectors of the clusters.
    Type: Application
    Filed: July 13, 2004
    Publication date: February 2, 2006
    Applicant: Microsoft Corporation
    Inventors: Hua-Jun Zeng, Qicai He, Guimei Liu, Zheng Chen, Benyu Zhang, Wei-Ying Ma
  • Publication number: 20060026298
    Abstract: A method and system for calculating the importance of persons based on interpersonal relationships and prioritizing communications based on importance of participants in the communications is provided. A prioritization system identifies relationships between persons and identifies the importance of a person to other persons based on these relationships. After the prioritization system identifies the importance of persons, the prioritization system can prioritize communications based on the importance of the senders or recipients.
    Type: Application
    Filed: July 30, 2004
    Publication date: February 2, 2006
    Applicant: Microsoft Corporation
    Inventors: Hua-Jun Zeng, Zheng Chen, Benyu Zhang, Wei-Ying Ma
  • Publication number: 20060004561
    Abstract: A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
    Type: Application
    Filed: June 30, 2004
    Publication date: January 5, 2006
    Applicant: Microsoft Corporation
    Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
  • Publication number: 20060005247
    Abstract: A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.
    Type: Application
    Filed: June 30, 2004
    Publication date: January 5, 2006
    Applicant: Microsoft Corporation
    Inventors: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen
  • Publication number: 20060004809
    Abstract: A system for calculating the importance of web pages is provided. The web pages are organized hierarchically into collections. The system calculates the importance of each collection based on inter-collection links from a web page in one collection to a web page in another collection. The system then calculates the importance of web pages in the collections with a high calculated importance based on links between the web pages in those collections using, for example, a conventional page rank algorithm. The system may also calculate the importance of web pages in each collection with a low calculated importance separately based on the links between the web pages in the collection using, for example, a conventional page rank algorithm.
    Type: Application
    Filed: June 30, 2004
    Publication date: January 5, 2006
    Applicant: Microsoft Corporation
    Inventors: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen
  • Publication number: 20050256832
    Abstract: A method and system for ranking objects based on relationships with objects of a different object type is provided. The ranking system defines an equation for each attribute of each type of object. The equations define the attribute values and are based on relationships between the attribute and the attributes associated with the same type of object and different types of objects. The ranking system iteratively calculates the attribute values for the objects using the equations until the attribute values converge on a solution. The ranking system then ranks objects based on attribute values.
    Type: Application
    Filed: May 14, 2004
    Publication date: November 17, 2005
    Applicant: Microsoft Corporation
    Inventors: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Wensi Xi, Zheng Chen
  • Publication number: 20050246410
    Abstract: A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
    Type: Application
    Filed: April 30, 2004
    Publication date: November 3, 2005
    Applicant: Microsoft Corporation
    Inventors: Zheng Chen, Dou Shen, Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma
  • Publication number: 20050246328
    Abstract: A method and system for ranking documents of search results based on information richness and diversity of topics. A ranking system determines the information richness of each document within a search result. The ranking system groups documents of a search result based on their relatedness, meaning that they are directed to similar topics. The ranking system ranks the documents to ensure that the highest ranking documents may include at least one document covering each topic, that is, one document from each of the groups. The ranking system selects the document from each group that has the highest information richness of the documents within the group. When the documents are presented to a user in rank order, the user will likely find on the first page of the search result documents that cover a variety of topics, rather than just a single popular topic.
    Type: Application
    Filed: April 30, 2004
    Publication date: November 3, 2005
    Applicant: Microsoft Corporation
    Inventors: Benyu Zhang, Zheng Chen, Hua-Jun Zeng, Wei-Ying Ma
  • Publication number: 20050234879
    Abstract: Systems and methods for related term suggestion are described. In one aspect, term clusters are generated as a function of calculated similarity of term vectors. Each term vector is generated from search results associated with a set of high frequency of occurrence (FOO) historical queries previously submitted to a search engine. Responsive to receiving a term/phrase from an entity, the term/phrase is evaluated in view of terms/phrases in the term clusters to identify one or more related term suggestions.
    Type: Application
    Filed: April 15, 2004
    Publication date: October 20, 2005
    Inventors: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Li Li, Ying Li, Tarek Najm
  • Publication number: 20050234955
    Abstract: Systems and methods for clustering-based text classification are described. In one aspect, text is clustered as a function of labeled data to generate cluster(s). The text includes the labeled data and unlabeled data. Expanded labeled data is then generated as a function of the cluster(s). The expanded labeled data includes the labeled data and at least a portion of the unlabeled data. Discriminative classifier(s) are then trained based on the expanded labeled data and remaining ones of the unlabeled data.
    Type: Application
    Filed: August 16, 2004
    Publication date: October 20, 2005
    Applicant: Microsoft Corporation
    Inventors: Hua-Jun Zeng, Xuanhui Wang, Zheng Chen, Benyu Zhang, Wei-Ying Ma
  • Publication number: 20050234973
    Abstract: Systems and methods for mining service requests for product support are described. In one aspect, unstructured service requests are converted to one or more structured answer objects. Each structured answer object includes hierarchically structured historic problem diagnosis data. In view of a product problem description, a set of the one or more structured answer data objects is identified. Each structured solution data object in the set includes term(s) and/or phrase(s) related to the product problem description. Historic and hierarchically structured problem diagnosis data from the set is provided to an end-user for product problem diagnosis.
    Type: Application
    Filed: April 15, 2004
    Publication date: October 20, 2005
    Inventors: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Ji-Rong Wen, Hang Li, Wei-Ying Ma, Gabor Hirschler, Kurt Samuelson
  • Publication number: 20050234953
    Abstract: Systems and methods for verifying relevance between terms and Web site contents are described. In one aspect, site contents from a bid URL are retrieved. Expanded term(s) semantically and/or contextually related to bid term(s) are calculated. Content similarity and expanded similarity measurements are calculated from respective combinations of the bid term(s), the site contents, and the expanded terms. Category similarity measurements between the expanded terms and the site contents are determined in view of a trained similarity classifier. The trained similarity classifier was trained from mined web site content associated with directory data. A confidence value providing an objective measure of relevance between the bid term(s) and the site contents is determined from the content, expanded, and category similarity measurements by evaluating the multiple similarity scores in view of a trained relevance classifier model.
    Type: Application
    Filed: April 15, 2004
    Publication date: October 20, 2005
    Inventors: Benyu Zhang, Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma, Li Li, Ying Li, Tarek Najm
  • Publication number: 20050234952
    Abstract: Systems and methods providing computer-implemented content propagation for enhanced document retrieval are described. In one aspect, reference information directed to one or more documents is identified. The reference information is identified from one or more sources of data that are independent of a data source that includes the one or more documents. Metadata that is proximally located to the reference information is extracted from the one or more sources of data. Relevance of respective features of the metadata to content of associated ones of the one or more documents is calculated. For each document of the one or more documents, associated portions of the metadata are indexed with the relevance of features from the respective portions into original content of the document. The indexing generates one or more enhanced documents.
    Type: Application
    Filed: April 15, 2004
    Publication date: October 20, 2005
    Inventors: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
  • Publication number: 20050234880
    Abstract: Systems and methods for enhanced document retrieval are described. In one aspect, a search query from an end-user is received. Responsive to receiving the search query, search results are retrieved. The search results include an enhanced document and a set of non-enhanced documents. The enhanced document and the non-enhanced documents include term(s) of the search query. The enhanced document is derived from a base document. The base document was modified with metadata mined from one or more different documents. The metadata is associated with one or more respective references to the base document. The one or more different documents are independent of the base document.
    Type: Application
    Filed: April 15, 2004
    Publication date: October 20, 2005
    Inventors: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Hsiao-Wuen Hon, Daniel Cook, Gabor Hirschler, Karen Fries, Kurt Samuelson
  • Publication number: 20050234972
    Abstract: Systems and methods for related term suggestion are described. In one aspect, relationships among respective ones of two or more multi-type data objects are identified. The respective ones of the multi-type data objects include at least one object of a first type and at least one object of a second type that is different from the first type. The multi-type data objects are iteratively clustered in view of respective ones of the relationships to generate reinforced clusters.
    Type: Application
    Filed: April 15, 2004
    Publication date: October 20, 2005
    Inventors: Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Li Li, Ying Li, Tarek Najm
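
The abstract of publication 20060036596 describes scoring words, classifying them as important or not, and selecting the sentences that contain the most important words. The Python sketch below illustrates that general idea only and is not the claimed method: the two word scores (relative term frequency and the fraction of sentences containing the word), the equal weighting, the 0.5 threshold, the stopword list, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the abstract).
STOPWORDS = {"a", "an", "the", "of", "and", "is", "in", "to", "it", "on"}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def important_words(sentences, threshold=0.5):
    """Combine two word scores and keep words at or above the threshold."""
    tokens = [tokenize(s) for s in sentences]
    term_freq = Counter(w for sent in tokens for w in sent)
    sent_freq = Counter(w for sent in tokens for w in set(sent))
    if not term_freq:
        return set()
    max_tf = max(term_freq.values())
    n = len(sentences)
    important = set()
    for word in term_freq:
        # Score 1: frequency relative to the most frequent word.
        # Score 2: fraction of sentences that contain the word.
        combined = 0.5 * term_freq[word] / max_tf + 0.5 * sent_freq[word] / n
        if combined >= threshold:
            important.add(word)
    return important


def summarize(text, k=1, threshold=0.5):
    """Return the k sentences with the most important words, in document order."""
    sentences = split_sentences(text)
    important = important_words(sentences, threshold)
    scored = [(len(set(tokenize(s)) & important), i)
              for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]
```

For example, calling `summarize(text, k=2)` on a three-sentence text keeps the two sentences that share the most important words and drops the third. Ties are broken in favor of earlier sentences, which is a design choice of this sketch.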
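
The abstract of publication 20060005247 describes flagging an outgoing communication when its content is similar to documents known to be confidential. It does not say which similarity measure is used, so the sketch below swaps in a common one, cosine similarity over bag-of-words counts, purely as an assumption. The function names and the 0.6 threshold are likewise hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def is_confidential(message, confidential_docs, threshold=0.6):
    """Flag the message if it is similar enough to any confidential document."""
    message_vec = vectorize(message)
    return any(cosine(message_vec, vectorize(doc)) >= threshold
               for doc in confidential_docs)
```

A caller would check `is_confidential(outgoing_text, known_confidential_docs)` before sending and hold the message when it returns `True`. A lower threshold catches more paraphrased leaks but also raises more false alarms.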
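
The abstract of publication 20050246328 describes ranking search results so that the top positions include the document with the highest information richness from each topic group. The sketch below shows one way such an ordering could be produced. It assumes the grouping and the richness scores have already been computed elsewhere, and the tuple layout and the function name `diversity_rank` are hypothetical.

```python
def diversity_rank(docs):
    """Order documents so each group's richest document comes first.

    docs: list of (doc_id, group, richness) tuples, where group and
    richness are assumed to be precomputed.
    Returns a list of doc_ids in rank order.
    """
    # Find the richest document in each group.
    best = {}
    for doc in docs:
        _, group, richness = doc
        if group not in best or richness > best[group][2]:
            best[group] = doc

    # Group leaders come first, ordered by richness.
    leaders = sorted(best.values(), key=lambda d: -d[2])
    leader_ids = {d[0] for d in leaders}

    # Remaining documents follow, also ordered by richness.
    rest = sorted((d for d in docs if d[0] not in leader_ids),
                  key=lambda d: -d[2])
    return [d[0] for d in leaders + rest]
```

With this ordering, a second document on a popular topic is ranked below the best document of every other topic, even when its own richness score is higher than theirs.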