Patents by Inventor Fuchun Peng

Fuchun Peng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20100257150
    Abstract: A computer-implemented method is disclosed for processing user-entered query data to improve the results of a search of pages that uses a local search database when searching the internet. The method includes receiving the user-entered query data, parsing each word of the query data, and segmenting words using a probabilistic dictionary to determine the likelihood that a word refers to a particular name. The particular names are then associated with a name tag to create one or more tagged name terms. Next, each of the tagged name terms is normalized; the normalizing includes boosting information if it is found in the local search database and determining the proximity between selected tagged name terms. The method then generates an optimized search query that incorporates the normalized terms and operators. The optimized search query is applied to the internet so that search results can be produced and displayed to the user in response to the entered query data.
    Type: Application
    Filed: June 17, 2010
    Publication date: October 7, 2010
    Applicant: Yahoo! Inc.
    Inventors: Yumao Lu, Nawaaz Ahmed, Fuchun Peng, Marco Zagha
  • Patent number: 7809715
    Abstract: A method for handling abbreviations in web queries includes building a dictionary of a plurality of possible word expansions for a plurality of potential abbreviations related to query terms received or anticipated to be received by a search engine; accepting a query including an abbreviation; expanding the abbreviation into one of the plurality of word expansions if the probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration the context of the abbreviation within the query, and wherein the context includes at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query.
    Type: Grant
    Filed: April 15, 2008
    Date of Patent: October 5, 2010
    Assignee: Yahoo! Inc.
    Inventors: Xing Wei, Fuchun Peng, Benoit Dumoulin
  • Patent number: 7788276
    Abstract: Techniques for determining when and how to transform words in a query to return the most relevant search results while minimizing computational overhead are provided. A dictionary is generated based upon words used in a specified number of previous most frequent search queries and comprises lists of transformations that may include variants based upon the stems of words, synonyms, and abbreviation expansions. When a query is received from a user, candidate queries are generated based upon replacing particular words in the query with a transformation of the particular words. Candidate queries are selected that have a high probability of returning relevant results by computing values of the query using language model scoring and translation scoring. The selected candidate queries and the original query are executed to return search results. The search results are displayed to the user with the words in the original query and the transformed words in bold.
    Type: Grant
    Filed: August 22, 2007
    Date of Patent: August 31, 2010
    Assignee: Yahoo! Inc.
    Inventors: Fuchun Peng, Nawaaz Ahmed, Yumao Lu, Marco J. Zagha
  • Publication number: 20100205198
    Abstract: Disclosed herein is a system and method of query disambiguation. At least one model is generated using training data, which model can be used to score, or rank, possible interpretations identified for a query, which can be used to select an interpretation from a number of possible interpretations. A selected interpretation can be used to process a web search request, e.g., to generate search results that relate to the selected query interpretation, rank or order the items in the search result based on relevance to the selected query interpretation, and/or identify a presentation to be used to display the search results based on the selected query interpretation.
    Type: Application
    Filed: February 6, 2009
    Publication date: August 12, 2010
    Inventors: Gilad Mishne, Raymond Stata, Fuchun Peng
  • Patent number: 7769746
    Abstract: Computer-implemented methods and systems are provided for processing user-entered query data to improve the results of a search of pages that uses a local search database when searching the internet. The method includes receiving the user-entered query data, parsing each word of the query data, and examining each word to determine whether it is associated with a business name, a city name, or a state name. The examining uses probabilistic dictionaries to determine the likelihood that the word corresponds to a particular term or intent. The method further includes normalizing each of the tagged business terms. The normalizing includes boosting information if it is found in the local search database and determining the proximity between selected tagged terms. Finally, an optimized internal search query is generated that incorporates constraints and ranking based on at least the boosting information and the determined proximity between the selected tagged terms.
    Type: Grant
    Filed: January 16, 2008
    Date of Patent: August 3, 2010
    Assignee: Yahoo! Inc.
    Inventors: Yumao Lu, Nawaaz Ahmed, Fuchun Peng, Marco Zagha
  • Publication number: 20100191758
    Abstract: A system and method are provided for improving search relevance using proximity boosting. A query for a web search is received from a user via a network, wherein the query comprises a plurality of query tokens. One or more concepts are identified in the query, wherein each of the concepts comprises at least two query tokens. A relative concept strength is determined for each of the identified concepts. The query is then rewritten for submission to a search engine: for each of the one or more concepts, a syntax rule associated with the concept's relative strength is applied to the query tokens that make up the concept, so that the rewritten query represents the one or more concepts and the proximity of those concepts is boosted in the search results that the search engine returns to the user in response to the rewritten query.
    Type: Application
    Filed: January 26, 2009
    Publication date: July 29, 2010
    Applicant: Yahoo! Inc.
    Inventors: Fuchun Peng, Xing Wei, Yumao Lu, Xin Li, Donald Metzler, Hang Cui, Benoit Dumoulin
  • Publication number: 20100131538
    Abstract: Methods and apparatus are described for identifying implicitly temporally qualified queries, i.e., queries for which a time period is implied but not explicitly stated, and for expanding such queries to include one or more temporal references.
    Type: Application
    Filed: November 24, 2008
    Publication date: May 27, 2010
    Applicant: Yahoo! Inc.
    Inventors: Rosie Jones, Donald Metzler, Fuchun Peng
  • Patent number: 7693865
    Abstract: To accurately classify a query as navigational, thousands of available features, extracted from major commercial search engine results, user Web search click data, query logs, and the whole Web's relational content, are explored. To obtain the most useful features for navigational query identification, a three-level system is used that integrates feature generation, feature integration, and feature selection in a pipeline. Because feature selection plays a key role in classification methodologies, the best feature selection method is coupled with the best classification approach to achieve the best performance for identifying navigational queries. According to one embodiment, a linear Support Vector Machine (SVM) is used to rank features, and the top-ranked features are fed into a Stochastic Gradient Boosting Tree (SGBT) classification method that identifies whether or not a particular query is a navigational query.
    Type: Grant
    Filed: August 30, 2006
    Date of Patent: April 6, 2010
    Assignee: Yahoo! Inc.
    Inventors: Yumao Lu, Fuchun Peng, Xin Li, Nawaaz Ahmed
  • Patent number: 7630978
    Abstract: Techniques for rewriting queries submitted to a query engine are provided. A query is submitted by a user and sent to a search mechanism. Based on the query, one or more query suggestions are generated. Features are generated based on the query and the query suggestions. Those features are input to a trained machine learning mechanism that generates a rewrite score. The rewrite score signifies a confidence score that indicates how confident the search mechanism is that the user intended to submit the original query. If the rewrite score is below a certain threshold, then the original query is rewritten to a second query. Results of executing the original query may be sent to the user along with a reference to the second query. Additionally or alternatively, results of executing the second query are sent to the user.
    Type: Grant
    Filed: December 14, 2006
    Date of Patent: December 8, 2009
    Assignee: Yahoo! Inc.
    Inventors: Xin Li, Nawaaz Ahmed, Fuchun Peng, Yumao Lu
  • Publication number: 20090259643
    Abstract: A method for normalizing query words in web search includes: populating a dictionary with join and split candidates, and the corresponding joined and split words, from an aggregate of query logs; determining a confidence score for each join and split candidate, with the highest-confidence candidates characterized in the dictionary as must-join and must-split, respectively; accepting queries with words amenable to being split or joined, or to the addition or deletion of a hyphen or an apostrophe; generating, based on the accepted queries, split candidates obtained from the dictionary and join, hyphen, or apostrophe candidates generated algorithmically; submitting to a search engine the generated candidates characterized as must-join or must-split in the dictionary, to improve the search results returned in response to the queries; and applying a language dictionary to rank the generated candidates not characterized as must-split or must-join, and submitting the highest-ranked candidates to the search engine.
    Type: Application
    Filed: April 15, 2008
    Publication date: October 15, 2009
    Applicant: Yahoo! Inc.
    Inventors: Fuchun Peng, George H. Mills, Benoit Dumoulin
  • Publication number: 20090259629
    Abstract: A method for handling abbreviations in web queries includes building a dictionary of a plurality of possible word expansions for a plurality of potential abbreviations related to query terms received or anticipated to be received by a search engine; accepting a query including an abbreviation; expanding the abbreviation into one of the plurality of word expansions if the probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration the context of the abbreviation within the query, and wherein the context includes at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query.
    Type: Application
    Filed: April 15, 2008
    Publication date: October 15, 2009
    Applicant: Yahoo! Inc.
    Inventors: Xing Wei, Fuchun Peng, Benoit Dumoulin
  • Publication number: 20090248595
    Abstract: Computer-enabled methods, apparatus, and computer-readable media are provided for verifying that a given network name, such as a URL, is an official, e.g., registered, approved, or otherwise officially recognized, network name that refers to or identifies a principal, such as a business. These techniques involve receiving a principal name and a given network name, receiving at least one feature attribute from at least one database of feature attributes, wherein the at least one feature attribute comprises a characteristic of the principal name or a characteristic of the network name, and invoking a logistic regression method to generate a probability, based upon the at least one feature attribute, that the given network name is an official network name for the principal name. The logistic regression method may include a gradient boosting tree model that generates the probability based upon the at least one feature attribute.
    Type: Application
    Filed: March 31, 2008
    Publication date: October 1, 2009
    Inventors: Yumao Lu, Nawaaz Ahmed, Fuchun Peng, Benoit Dumoulin
  • Publication number: 20090234836
    Abstract: Generally, a method and apparatus provide search results in response to a web search request that contains at least two search terms. The method and apparatus include generating a plurality of term groupings of the search terms and determining a relevance factor for each of the term groupings. The method and apparatus further determine a set of the term groupings based on the relevance factors and then conduct a web resource search using that set of term groupings to generate search results. The method and apparatus provide the search results to the requesting entity.
    Type: Application
    Filed: March 14, 2008
    Publication date: September 17, 2009
    Applicant: Yahoo! Inc.
    Inventors: Fuchun Peng, Yumao Lu, Nawaaz Ahmed, Bin Tan
  • Publication number: 20090182729
    Abstract: Computer-implemented methods and systems are provided for processing user-entered query data to improve the results of a search of pages that uses a local search database when searching the internet. The method includes receiving the user-entered query data, parsing each word of the query data, and examining each word to determine whether it is associated with a business name, a city name, or a state name. The examining uses probabilistic dictionaries to determine the likelihood that the word is the business name, the city name, or the state name. The words are then tagged according to that determination: (i) words determined to be the business name are associated with a business name tag to create one or more tagged business terms; (ii) words determined to be the city name are associated with a city name tag to create one or more tagged city terms; and (iii) words determined to be the state name are associated with a state name tag to create one or more tagged state terms. The method further includes normalizing each of the tagged business terms, the tagged city terms, and the tagged state terms.
    Type: Application
    Filed: January 16, 2008
    Publication date: July 16, 2009
    Applicant: Yahoo! Inc.
    Inventors: Yumao Lu, Nawaaz Ahmed, Fuchun Peng, Marco Zagha
  • Publication number: 20090132515
    Abstract: A method and apparatus are provided for performing multi-phase ranking of web search results by re-ranking results using feature and label calibration. According to one embodiment of the invention, a ranking function is trained by applying machine learning techniques to a set of training samples to produce ranking scores. The ranking function is used to rank the training samples by their ranking scores, in order of relevance to a particular query. Next, a re-ranking function is trained on the same training samples to re-rank the documents from the first ranking. The features and labels of the training samples are calibrated and normalized before they are reused to train the re-ranking function. In this way, training data and features used in past training are leveraged to train new functions without requiring additional training data or features.
    Type: Application
    Filed: November 19, 2007
    Publication date: May 21, 2009
    Inventors: Yumao Lu, Fuchun Peng, Xin Li, Nawaaz Ahmed
  • Publication number: 20090055380
    Abstract: Techniques for determining when and how to transform words in a query to return the most relevant search results while minimizing computational overhead are provided. A dictionary is generated based upon words used in a specified number of previous most frequent search queries and comprises lists of transformations that may include variants based upon the stems of words, synonyms, and abbreviation expansions. When a query is received from a user, candidate queries are generated based upon replacing particular words in the query with a transformation of the particular words. Candidate queries are selected that have a high probability of returning relevant results by computing values of the query using language model scoring and translation scoring. The selected candidate queries and the original query are executed to return search results. The search results are displayed to the user with the words in the original query and the transformed words in bold.
    Type: Application
    Filed: August 22, 2007
    Publication date: February 26, 2009
    Inventors: Fuchun Peng, Nawaaz Ahmed, Yumao Lu, Marco J. Zagha
  • Publication number: 20080189262
    Abstract: Techniques are provided for determining when and how to transform words in a query into their plural or non-plural form in order to provide the most relevant search results while minimizing computational overhead. A dictionary is generated from the words used in a specified number of the most frequent previous search queries and comprises lists of transformations from plural to singular and from singular to plural. Unnecessary transformations are removed from the dictionary based on language modeling. The word to transform is determined by finding the last non-stop, rewritable word of the query. The context of the transformed word is confirmed in the search documents, and a version of the query is executed using both the original form of the word and its transformation.
    Type: Application
    Filed: February 1, 2007
    Publication date: August 7, 2008
    Inventors: Fuchun Peng, Nawaaz Ahmed, Xin Li, Yumao Lu
  • Publication number: 20080147637
    Abstract: Techniques for rewriting queries submitted to a query engine are provided. A query is submitted by a user and sent to a search mechanism. Based on the query, one or more query suggestions are generated. Features are generated based on the query and the query suggestions. Those features are input to a trained machine learning mechanism that generates a rewrite score. The rewrite score signifies a confidence score that indicates how confident the search mechanism is that the user intended to submit the original query. If the rewrite score is below a certain threshold, then the original query is rewritten to a second query. Results of executing the original query may be sent to the user along with a reference to the second query. Additionally or alternatively, results of executing the second query are sent to the user.
    Type: Application
    Filed: December 14, 2006
    Publication date: June 19, 2008
    Inventors: Xin Li, Nawaaz Ahmed, Fuchun Peng, Yumao Lu
  • Publication number: 20080059508
    Abstract: To accurately classify a query as navigational, thousands of available features, extracted from major commercial search engine results, user Web search click data, query logs, and the whole Web's relational content, are explored. To obtain the most useful features for navigational query identification, a three-level system is used that integrates feature generation, feature integration, and feature selection in a pipeline. Because feature selection plays a key role in classification methodologies, the best feature selection method is coupled with the best classification approach to achieve the best performance for identifying navigational queries. According to one embodiment, a linear Support Vector Machine (SVM) is used to rank features, and the top-ranked features are fed into a Stochastic Gradient Boosting Tree (SGBT) classification method that identifies whether or not a particular query is a navigational query.
    Type: Application
    Filed: August 30, 2006
    Publication date: March 6, 2008
    Inventors: Yumao Lu, Fuchun Peng, Xin Li, Nawaaz Ahmed
  • Publication number: 20070282591
    Abstract: A method is disclosed for predicting results for input data using a model generated from clusters of related characters, clusters of related segments, and training data. The method comprises receiving a data set that includes a plurality of words in a particular language, in which words are formed by characters. Clusters of related characters are formed from the data set. A model is generated based at least on the clusters of related characters and training data; the model may also be based on the clusters of related segments. The training data includes a plurality of entries, each of which includes a character and a designated result for that character. A set of input data that includes characters not yet associated with designated results is received, and the model is applied to the input data to determine predicted results for characters within it.
    Type: Application
    Filed: June 1, 2006
    Publication date: December 6, 2007
    Inventor: Fuchun Peng
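The minimal Python sketch below illustrates threshold-gated abbreviation expansion, as described in patent 7809715 and publication 20090259629. It is not the patented implementation. The expansion table `PRIORS`, the co-occurrence counts `COOCCUR`, the naive-Bayes-style context score, and the 0.7 threshold are all hypothetical. Only the other words of the query serve as context here, although the abstract also mentions anchor text.

```python
from typing import Dict, List, Tuple

# Hypothetical expansion dictionary: abbreviation -> {expansion: prior count}.
PRIORS: Dict[str, Dict[str, int]] = {
    "ny": {"new york": 10, "new year": 5},
}
# Hypothetical co-occurrence counts of (expansion, context word).
# The abstract's context also includes anchor text, which is omitted here.
COOCCUR: Dict[Tuple[str, str], int] = {
    ("new york", "hotels"): 20,
    ("new year", "party"): 15,
}


def expansion_probabilities(abbr: str, context: List[str]) -> Dict[str, float]:
    """Score each expansion as prior * product of smoothed context counts, then normalize."""
    scores: Dict[str, float] = {}
    for expansion, prior in PRIORS.get(abbr, {}).items():
        score = float(prior)
        for word in context:
            score *= COOCCUR.get((expansion, word), 0) + 1  # add-one smoothing
        scores[expansion] = score
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()} if total else {}


def expand_abbreviations(query: str, threshold: float = 0.7) -> str:
    """Replace a token with its best expansion only when that expansion clears the threshold."""
    tokens = query.lower().split()
    out: List[str] = []
    for i, token in enumerate(tokens):
        context = tokens[:i] + tokens[i + 1:]  # the rest of the query is the context
        probs = expansion_probabilities(token, context)
        if probs:
            best, p = max(probs.items(), key=lambda kv: kv[1])
            if p > threshold:
                out.append(best)
                continue
        out.append(token)
    return " ".join(out)


print(expand_abbreviations("ny hotels"))  # new york hotels
print(expand_abbreviations("ny party"))   # new year party
print(expand_abbreviations("ny"))         # ny (best expansion has p = 0.67, below 0.7)
```

The threshold gates each replacement. An ambiguous abbreviation with no disambiguating context therefore stays as typed.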
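The query rewrite in publication 20100191758 can be sketched as a mapping from concept strength to query syntax. This hypothetical Python example makes several simplifying assumptions:
- Concept spans and their relative strengths are supplied as inputs rather than computed.
- The 0.8 and 0.5 cut-offs are invented for illustration.
- The quoted-phrase and `near(...)` syntaxes are placeholders, not the operators of any real search engine.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def rewrite_with_concepts(
    tokens: List[str],
    concepts: Dict[Span, float],
    strong: float = 0.8,
    medium: float = 0.5,
) -> str:
    """Wrap each multi-token concept in a syntax chosen by its relative strength.

    Assumes concepts do not overlap. Spans shorter than two tokens are ignored,
    since a concept must contain at least two query tokens.
    """
    starts = {start: (end, s) for (start, end), s in concepts.items() if end - start >= 2}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in starts:
            end, strength = starts[i]
            phrase = " ".join(tokens[i:end])
            if strength >= strong:
                out.append(f'"{phrase}"')      # strong concept: exact phrase
            elif strength >= medium:
                out.append(f"near({phrase})")  # medium concept: placeholder proximity constraint
            else:
                out.append(phrase)             # weak concept: tokens left unconstrained
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


query = ["new", "york", "pizza", "delivery"]
print(rewrite_with_concepts(query, {(0, 2): 0.95, (2, 4): 0.6}))
# "new york" near(pizza delivery)
```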
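A small Python sketch can illustrate the threshold decision in patent 7630978 and publication 20080147637. A logistic function with hand-set weights stands in for the trained machine learning mechanism. Everything else is an illustrative assumption: the two features (string similarity from `difflib.SequenceMatcher` and a log frequency ratio), the weights, the query-log counts, and the 0.5 threshold. The sketch also scores each suggestion separately and keeps the lowest-scoring one, which is only one possible way to handle several suggestions.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical weights standing in for a trained model (a logistic function here).
WEIGHTS = {"bias": 2.0, "similarity": -1.5, "log_freq_ratio": -0.8}


def features(query: str, suggestion: str, freq: Dict[str, int]) -> Dict[str, float]:
    """Two illustrative features relating a query to one suggestion."""
    similarity = SequenceMatcher(None, query, suggestion).ratio()
    # Positive when the suggestion is more frequent than the query in the (hypothetical) log.
    log_freq_ratio = math.log((freq.get(suggestion, 0) + 1) / (freq.get(query, 0) + 1))
    return {"bias": 1.0, "similarity": similarity, "log_freq_ratio": log_freq_ratio}


def rewrite_score(feats: Dict[str, float]) -> float:
    """Confidence in (0, 1) that the user meant the original query."""
    z = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def choose_query(query: str, suggestions: List[str], freq: Dict[str, int],
                 threshold: float = 0.5) -> str:
    """Rewrite to the lowest-scoring suggestion if its score falls below the threshold."""
    best, best_score = query, threshold
    for suggestion in suggestions:
        score = rewrite_score(features(query, suggestion, freq))
        if score < best_score:
            best, best_score = suggestion, score
    return best


log_counts = {"britney spears": 5000, "britny spears": 10,
              "python tutorial": 3000, "python tutorials": 500}
print(choose_query("britny spears", ["britney spears"], log_counts))      # britney spears
print(choose_query("python tutorial", ["python tutorials"], log_counts))  # python tutorial
```

The score follows the abstract's convention: a high value means the user probably intended the original query. Only low scores trigger a rewrite.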
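The following minimal Python sketch illustrates the two-stage normalization in publication 20090259643. Several parts are assumptions:
- The dictionary entries, the 0.95 must-join/must-split cut-off, and the unigram counts are all hypothetical.
- An add-one smoothed unigram model stands in for the abstract's "language dictionary" ranking.
- Hyphen and apostrophe candidates are left out.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical dictionary mined from query logs: surface form -> (rewrite, confidence).
DICTIONARY: Dict[str, Tuple[str, float]] = {
    "newyork": ("new york", 0.98),    # split candidate
    "face book": ("facebook", 0.97),  # join candidate
}
MUST_THRESHOLD = 0.95  # entries at or above this count as must-split / must-join
# Hypothetical unigram counts used to rank algorithmic join candidates.
UNIGRAMS: Dict[str, int] = {"web": 50, "site": 40, "website": 900,
                            "design": 300, "new": 1000, "york": 400}
TOTAL = sum(UNIGRAMS.values())
VOCAB = len(UNIGRAMS) + 1  # +1 reserves probability mass for unseen words


def logp(word: str) -> float:
    """Add-one smoothed unigram log-probability."""
    return math.log((UNIGRAMS.get(word, 0) + 1) / (TOTAL + VOCAB))


def apply_must_rules(tokens: List[str]) -> List[str]:
    """Apply dictionary entries whose confidence marks them as must-join or must-split."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and DICTIONARY.get(pair, ("", 0.0))[1] >= MUST_THRESHOLD:
            out.extend(DICTIONARY[pair][0].split())
            i += 2
        elif DICTIONARY.get(tokens[i], ("", 0.0))[1] >= MUST_THRESHOLD:
            out.extend(DICTIONARY[tokens[i]][0].split())
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return out


def join_by_lm(tokens: List[str]) -> List[str]:
    """Join an adjacent pair when the joined form is a known word the model prefers."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            joined = tokens[i] + tokens[i + 1]
            if joined in UNIGRAMS and logp(joined) > logp(tokens[i]) + logp(tokens[i + 1]):
                out.append(joined)
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out


def normalize(query: str) -> str:
    return " ".join(join_by_lm(apply_must_rules(query.lower().split())))


print(normalize("web site design"))  # website design
print(normalize("newyork pizza"))    # new york pizza
print(normalize("Face Book login"))  # facebook login
```

The sketch accepts a join candidate only when the joined form is a known word. Without that check, a unigram model tends to favor joining almost any pair, because a query with fewer tokens multiplies fewer probabilities together.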
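Publication 20080189262 selects the word to transform by finding the last non-stop, rewritable word of the query. That selection step is easy to sketch in Python. The stop-word list, the plural/singular dictionary, and the `(a OR b)` output syntax are hypothetical. The abstract's language-model pruning of the dictionary and its document-context check are not modeled.

```python
from typing import List, Optional

STOP_WORDS = {"a", "an", "the", "in", "of", "for", "to", "near"}
# Hypothetical plural/singular dictionary mined from frequent queries.
NUMBER_VARIANTS = {"hotel": "hotels", "hotels": "hotel", "shoe": "shoes", "shoes": "shoe"}


def word_to_transform(tokens: List[str]) -> Optional[int]:
    """Index of the last token that is not a stop word and has a dictionary variant."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] not in STOP_WORDS and tokens[i] in NUMBER_VARIANTS:
            return i
    return None


def add_number_variant(query: str) -> str:
    """Search for both the original form and its variant, using a placeholder OR syntax."""
    tokens = query.lower().split()
    i = word_to_transform(tokens)
    if i is not None:
        tokens[i] = f"({tokens[i]} OR {NUMBER_VARIANTS[tokens[i]]})"
    return " ".join(tokens)


print(add_number_variant("cheap hotels in paris"))  # cheap (hotels OR hotel) in paris
print(add_number_variant("running shoe"))           # running (shoe OR shoes)
```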